White Paper | AMD EMBEDDED R-SERIES PLATFORM

*Inspires Innovative New Applications based on Small Form Factor Board Designs*
# TABLE OF CONTENTS

1. INTRODUCTION
2. APUS: BIG PERFORMANCE IN SMALL PACKAGES
   2.1 UNDER THE HOOD OF APUS
3. EFFECT ON SMALL FORM FACTORS
   3.1 HOW LOW CAN YOU GO?
   3.2 THERMAL DESIGN CONSIDERATIONS
4. APPLYING THE AMD R-SERIES APU TO SFF BOARDS
   4.1 COM EXPRESS® AND ITS VARIANTS
   4.1.1 IMPLEMENTING WITH THE AMD R-SERIES APU
   4.2 THE SUMIT™ STACKABLE EXPANSION INTERFACE
   4.2.1 SUMIT SIGNALS
   4.2.2 LEGACY SUPPORT
   4.2.3 IMPLEMENTING WITH THE AMD R-SERIES APU
   4.3 COMIT™
   4.3.1 COMIT SIGNALS
   4.3.2 IMPLEMENTING WITH THE AMD R-SERIES APU
   4.4 PC/104 STANDARDS
   4.4.1 PCI/104-EXPRESS™ AND PCIE/104
   4.4.2 IMPLEMENTING WITH THE AMD R-SERIES APU
   4.5 QSEVEN®
   4.5.1 MXM CONNECTOR
   4.5.2 IMPLEMENTING WITH THE AMD R-SERIES APU
5. SUMMARY
6. WHERE TO FIND MORE INFORMATION
1. INTRODUCTION

Consumer demand for multimedia-rich portable and handheld devices, combined with industrial and military miniaturization, has spawned and accelerated several major trends in embedded systems technology and design over the last few years:

• The definition and advancement of new, compact, high-speed interconnect schemes that are completely independent of board form-factor standards (commonly referred to as “Connectorology”, a once obscure term that is becoming widely recognized in the industry).

• Modular design approaches that separate the development of processor-based boards from I/O boards as in COMs (Computer On Module) and stackable board schemes in order to limit the amount of customization that must be done by in-house designers and to also expand the variety and availability of off-the-shelf module solutions.

• Ever shrinking form factors.

• The continuance and proliferation of legacy standards in both board form factors and interconnect schemes to lower system costs and speed time to market, even as new high speed serial I/O interfaces are increasingly implemented to support higher performance peripherals.

• Diversification of feature sets due to the rapid expansion of unique applications that contain very specific requirements.

• A growing emphasis being placed on overall power efficiency and manageability as opposed to just high performance or low power consumption.

New processors and chipsets continue to drive performance up and power down. At the same time, the rapid integration of CPU, graphics, and I/O functions into two-chip (and even single-chip) solutions to meet small form factor demands is forcing board and system suppliers to come up with new standardized methods of designing modular and scalable systems that require minimal changes to existing software and chassis designs. As these CPUs, platforms, and interconnect schemes continue to evolve, a natural (and necessary) symbiotic relationship has formed between the various players, in the form of consortiums, standards committees, and other technology partnerships, to ensure that the resulting products, and the users of those products, can reap the full benefits of that technological advancement.

In recent years, system designers have become quite adept at “force-fitting” desktop and mobile PC solutions into embedded form factors requiring high levels of performance. On the other end of the scale, new low power x86 (and non-x86) solutions have inspired an entire generation of new innovative handheld and mobile devices. However, even with continued improvements in traditional CPU and Graphics engine performance and power efficiency, designers of small form factor and portable systems still face significant challenges integrating these state of the art solutions into their most ambitious platforms. For a long time, there was a gap that no solution was able to reach the necessary...handheld form factors. But in 2012, AMD filled that gap.

The AMD Embedded R-Series platform, first launched in 2012 with the 2nd generation launched in 2014, bridges the gap between high performance PC solutions and low power embedded solutions by bringing an unprecedented combination of performance, power efficiency, and integration to the embedded market, enabling immersive multi-media and visual experiences to be developed on handheld, portable, and other small form factor devices while minimizing product life cycle costs. The AMD R-Series platform is the high performance line of AMD’s proven Accelerated Processing Units (APUs), combining the power of AMD CPU technology with discrete-class DirectX® 11 capable, AMD Radeon™ graphics into a single device.

In this white paper, we’ll explore the AMD R-Series APU platform as it relates to small form factor (SFF) embedded design and review some of the latest and most popular connector and board standards that are making it possible for system designers to pack higher performance and more features into smaller packages.

2. APUS: BIG PERFORMANCE IN SMALL PACKAGES

Over the past decade, advances in semiconductor technology have continued to follow Moore’s law by roughly doubling the number of transistors available in a given area of silicon every two years. With these ever increasing transistor budgets, architects of traditional x86 CPUs have focused performance improvement efforts on techniques such as increasing clock rates, expanding the size and number of on-chip caches, and adding additional processor cores. As such, performance gains have been tremendous, enabling PCs to become much more efficient at multitasking; however, as fast as these modern PC processors are, they alone still cannot deliver the image, video, and digital signal processing horsepower that many of today’s emerging interactive multimedia embedded applications demand, and at the low power required by small form factors. Unlike traditional PC applications primarily built on scalar data structures and serial algorithms, emerging embedded applications, such as medical imaging and intelligent cameras, require processors that can handle vast amounts of data consisting of hundreds if not thousands of individual threads that must be manipulated and processed in parallel.

Graphics processing units (GPUs), originally intended to enhance and accelerate the rendering of 3D images, have evolved into powerful, programmable vector processors that can accelerate a wide variety of data-intensive algorithms and applications (commonly referred to as "stream processing"). GPUs implemented on PC add-in cards like the familiar AMD Radeon™ graphics can pack teraflops of floating-point compute horsepower onto a single PCI Express™ graphics card. With each generation from the AMD Radeon™ x1000 to the latest Radeon HD 9000 Series, features have been added and limitations removed, from vertex processing stream operations to flexible branching and array manipulation, and finally to append buffers and atomic operations. As of today, the hundreds of processing cores in AMD Radeon GPUs have deep pipelines and are nearly identical to each other, making them highly scalable and setting the stage for GPGPU (General-Purpose GPU) computations of highly parallel workloads. As opposed to the conventional sequential-processing CPUs which have been enhanced with only modest parallelism in the form of multi-threading and multiple cores, modern GPUs are optimized for massive parallel computing – whether graphics or otherwise.

Smaller die geometries and new innovations in silicon design enabled AMD to create the first family of single die CPU+GPU solutions. With hundreds of computing cores, these heterogeneous multi-core processors, or APUs, can help reduce the size and power of embedded systems dramatically while at the same time increasing performance. Modern GPUs are much more scalable than the handful of cores offered in a CPU-centric paradigm and can offer an order of magnitude increase in performance for small form factor embedded applications such as portable ultrasound systems, smart cameras in surveillance and machine vision, integrated digital signage, digital signal processing, and similar compute intensive tasks. AMD’s first embedded APU, the AMD G-Series APU, was released in 2011. It contained 80 GPU cores with a calculated single precision floating point performance of up to 91 GFLOPs. The 1st Generation AMD R-Series APU increased the number of parallel compute units to as many as 384 resulting in up to 578 GFLOPs of calculated single precision performance, greater than 6x that of the AMD G-Series APU, while increasing average power consumption by only a few watts.
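The "greater than 6x" figure can be checked directly from the numbers quoted above. The short Python sketch below only restates that arithmetic; the constant and function names are illustrative and do not come from any AMD tool.

```python
# Figures quoted in the text (calculated single-precision peak, in GFLOPs,
# and the number of parallel GPU compute units).
G_SERIES_GFLOPS = 91.0
R_SERIES_GFLOPS = 578.0
G_SERIES_GPU_UNITS = 80
R_SERIES_GPU_UNITS = 384


def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    if old <= 0:
        raise ValueError("old must be positive")
    return new / old


perf_ratio = ratio(R_SERIES_GFLOPS, G_SERIES_GFLOPS)
unit_ratio = ratio(R_SERIES_GPU_UNITS, G_SERIES_GPU_UNITS)

print(f"Peak single-precision ratio: {perf_ratio:.2f}x")
print(f"GPU compute unit ratio:      {unit_ratio:.2f}x")
```

The performance ratio works out to roughly 6.35x, while the unit count grows by 4.8x, so part of the gain comes from something other than the number of units. The text does not break this down further.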

Software development tools that enable application developers to fully exploit the benefits of APU architecture have also come a long way. Limited, proprietary tools have given way to advanced, open and portable standards. AMD’s commitment to these standards is evidenced by support for a broad range of APIs and other software development tools, most notably DirectCompute (as part of DirectX 11) and OpenCL™. AMD also provides the AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK), a complete development platform that enables fast and easy development of applications accelerated by AMD APP (Accelerated Parallel Processing) technology. These tools are gaining momentum and enabling developers to create standards-based applications that leverage the combined power of CPU cores and GPU cores, and that can run on a wide variety of hardware platforms.
2.1 Under the Hood of APUs

APU stands for Accelerated Processing Unit, which combines both x86 processing cores and discrete-level graphics processing units on a single die. They earned the name APU by making the GPU fully programmable, offering acceleration to the processor for compute intensive tasks. Until very recently, transistor budget constraints typically mandated a two-chip solution that inherently created performance constraints due to external busses that add latency to memory access. When integrated onto the same die, the APU’s x86 CPU cores and SIMD GPU engines share a common, and much higher speed path to system memory that helps to overcome these constraints. Furthermore, AMD’s APU implementations divide this shared memory into multiple regions: those managed by the operating system running on the x86 cores and others managed by software running on the GPU engines. The APU architecture provides high-speed block transfer engines that move data directly between the x86 and GPU memory partitions without needing the additional bus transactions that occur when the frame buffer and system memory are separate. By structuring the code properly, it is also possible to overlap, or interleave, memory transactions between the CPU and GPU memory partitions to gain even higher performance.
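The benefit of overlapping memory transfers with computation can be estimated with a simple two-stage pipeline model. The sketch below is an assumption-laden illustration, not a description of AMD's block transfer engines: the function names and the per-block timings are hypothetical, and the model assumes that the copy of the next block can always proceed while the current block is being processed.

```python
def serial_time(n_blocks: int, t_copy: float, t_compute: float) -> float:
    """Total time when each block is copied and then processed, one after another."""
    return n_blocks * (t_copy + t_compute)


def overlapped_time(n_blocks: int, t_copy: float, t_compute: float) -> float:
    """Total time when the copy of block i+1 overlaps the processing of block i.

    The first copy and the last compute cannot be hidden; in between, each
    step is limited by the slower of the two stages.
    """
    if n_blocks <= 0:
        return 0.0
    return t_copy + (n_blocks - 1) * max(t_copy, t_compute) + t_compute


# Hypothetical timings (milliseconds per block), chosen only for illustration.
blocks, copy_ms, compute_ms = 4, 2.0, 3.0
print("serial:    ", serial_time(blocks, copy_ms, compute_ms), "ms")
print("overlapped:", overlapped_time(blocks, copy_ms, compute_ms), "ms")
```

With these made-up timings the overlapped schedule takes 14 ms instead of 20 ms. The gain grows as the two stage times approach each other and shrinks when one stage dominates.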
Scalar processors operate on arrays of data one element at a time. Vector processors like those used in advanced GPUs have dozens, and sometimes hundreds, of calculating units that operate simultaneously. By pairing AMD’s proven x86 core technology, which efficiently cuts through scalar workloads, with enhanced versions of its GPU technology, which handles vector workloads, an APU can raise total system performance. Although AMD had to overcome many technical challenges to merge its vector and scalar technologies in a manner that preserves the advantages of each, having the core IP for both processing elements gave AMD the tools to develop this innovative architecture, resulting in APUs with a significant advantage over other hardware designs that contain only one or the other. Compared to competing platforms, the end result is either greater performance within the same power envelope, or much lower power consumption within the same performance envelope. In fact, the latest APUs from AMD can now deliver higher graphics performance than many of today’s standard desktop and notebook PCs at lower power.
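The difference between scalar and vector execution can be illustrated by counting how many steps each style needs to add two arrays. This is a conceptual sketch in plain Python, not GPU code: the `lanes` parameter stands in for the number of calculating units working in lockstep, and the value 64 is an arbitrary assumption.

```python
from typing import List, Tuple


def scalar_add(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Add element by element; each element costs one step."""
    result, steps = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        steps += 1
    return result, steps


def vector_add(a: List[int], b: List[int], lanes: int) -> Tuple[List[int], int]:
    """Add `lanes` elements per step, modelling units that operate simultaneously."""
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    result, steps = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1  # the whole chunk is counted as a single step
    return result, steps


a = list(range(1024))
b = [2 * x for x in a]
scalar_result, scalar_steps = scalar_add(a, b)
vector_result, vector_steps = vector_add(a, b, lanes=64)
print(f"scalar steps: {scalar_steps}, vector steps: {vector_steps}")
```

Both functions produce the same result; only the step count differs. Real hardware adds costs this model ignores, such as memory bandwidth and divergent branches.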

3. EFFECT ON SMALL FORM FACTORS

APUs are well suited for a wide range of new and emerging embedded applications targeted at small form factors including: medical imaging and diagnostics, video surveillance and image processing, HD and multi-display gaming platforms, interactive digital signage, thin clients, in-vehicle applications and more. As previously mentioned, AMD’s integrated architecture reduces the board footprint of a traditional three-chip x86 platform to two chips: the APU and its companion controller hub. With just 32 x 29mm of combined chip real estate and only 1510 balls, the 2nd Generation AMD R-Series platform’s high level of integration, reduced pin count, low power consumption, and rich I/O support make it an ideal choice for small form factor designs that require ever increasing levels of performance. New low-profile, expandable small form factors are here, and thanks to groundbreaking APUs, these new form factors will deliver capabilities not previously available.

3.1 How Low Can You Go?

Small form factor standards such as PC/104 and EBX were designed specifically for embedded applications and have been around for a long time. But developers trying to take advantage of the standardization and economies of scale built around PC motherboards have worked to bridge the gap between standardized PC motherboards and small form factor embedded motherboards. The ongoing march toward smaller standardized motherboards for embedded applications is illustrated below. In 2001, the smallest high-volume motherboard for embedded systems was Mini ITX, a derivative of small PC motherboards, at 170 x 170mm. Mini ITX was then replaced by Nano ITX as the smallest standard embedded motherboard size, and by 2007 it had evolved into Pico ITX, which at only 100 x 72mm represents a reduction in board area of about 75% from Mini ITX. New, smaller platforms such as Mobile ITX continue to be developed today; at 60 x 60mm, Mobile ITX is driving a new generation of handheld devices.
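The 75% figure refers to board area, and it can be reproduced from the dimensions given above. The following Python sketch uses only the sizes stated in the text; the dictionary and function names are illustrative.

```python
# Board outlines in millimetres, as quoted in the text.
BOARDS_MM = {
    "Mini ITX": (170, 170),
    "Pico ITX": (100, 72),
    "Mobile ITX": (60, 60),
}


def area_mm2(name: str) -> int:
    """Board area in square millimetres."""
    width, height = BOARDS_MM[name]
    return width * height


def area_reduction_pct(smaller: str, larger: str) -> float:
    """Percentage by which `smaller` reduces the area of `larger`."""
    return 100.0 * (1.0 - area_mm2(smaller) / area_mm2(larger))


for board in ("Pico ITX", "Mobile ITX"):
    pct = area_reduction_pct(board, "Mini ITX")
    print(f"{board}: {area_mm2(board)} mm^2, {pct:.1f}% smaller than Mini ITX")
```

Pico ITX comes out at about 75.1% less area than Mini ITX, and Mobile ITX at about 87.5% less.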

Looking at some of the embedded specific small form factor boards, COM (consisting of individual CPU and I/O-based modules) form factors have typically been smaller than the standardized embedded motherboards due to their modular/stacking nature and their need to fit onto small form factor base boards to implement I/O. And at 90 x 96mm, PC/104 has been an industry standard form factor for embedded system modules since 1992. Both PC/104 and COM standards have varied in size and connectorization to accommodate various CPU architectures. The figure below shows a comparison of some of the common small form factor embedded board sizes.

When evaluating silicon solutions for small form factor systems, it is important to “look below the surface” of just footprint size and maximum rated power consumption. While these factors are important, they can be misleading and do not provide a complete picture of the solution’s suitability for a given form factor. There are many other related factors that must also be considered before deciding whether a particular silicon solution not only meets performance and thermal system requirements, but is also practical and economically feasible. Factors to be considered include:

*Can the PCB be routed on a minimal number of layers given the available real estate?*

Highly integrated components often do not actually reduce the number of balls or signals that must be routed on the PCB. The smart integration of the AMD R-Series APU solution eliminates many external signals and busses found in traditional “high integration” solutions, making it much easier to route within a small footprint and helping to eliminate costly board layers. Another significant factor is the ball pitch (distance between balls). In order to keep the package size/area small while maintaining a large number of balls, many vendors use fine-pitch BGAs with a ball pitch of 0.6mm or less. By contrast, the AMD R-Series platform uses a 0.8mm ball pitch for the controller hub and between 0.8mm and 1.2mm ball pitch for the APU, resulting in significantly more area for routing additional traces between balls and for “breaking out” all of the signals (including power and ground) from the device to a smaller number of PCB layers.

---

![Size comparisons of the latest and most popular small form factor boards (SBCs) and modules for embedded systems. Note the relative size of the AMD R-Series APU and controller hub compared to PCB size.](image-url)

*How much heat can the system really dissipate?*

AMD adheres to the industry recognized definition of TDP, or Thermal Design Power: the maximum power consumption of a device for thermally significant periods running worst-case non-synthetic workloads. Historically, designers of PCs have relied heavily on the TDP rating of the CPU and other major system components to calculate heatsink, fan, and other system cooling requirements. While TDP is important, there are increasingly additional thermal and environmental characteristics that designers of small form factor embedded systems should consider. This is primarily because, as embedded systems continue to evolve away from traditional PC implementations, the applications they are required to run are becoming more and more device specific. For these cases, the TDP of a part may be misleading since it may not accurately represent the actual power that will be drawn during the execution of a set of device-specific tasks. Table 1 shows that while running 3DMark06, which gives the GPU, CPU cores, memory controller, and a portion of the I/O a significant workout, the AMD R-464L APU model rated at 35W TDP (the highest performance, highest power model currently in the series) consumes just over 13 Watts (and just under 14 Watts including the Controller Hub), significantly less than TDP alone would indicate. Other benchmarks that can be representative of some embedded applications result in even lower average power consumption. These low power consumption numbers, even for the highest performance AMD R-Series APU, can help keep system-level power consumption within the thresholds defined by the various small form factor specifications. In many cases, depending on the size and complexity of the chassis, the board design, and which AMD R-Series APU is selected, power consumption can be kept below the threshold at which fan-less cooling techniques can be employed.
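The gap between TDP and measured average power can be made concrete using the R-464L figures from Table 1. The Python sketch below simply sums the APU and Controller Hub readings and expresses the APU reading as a share of TDP; the variable names are illustrative.

```python
TDP_W = 35.0  # rated TDP of the R-464L APU, from Table 1

# (APU average power, Controller Hub average power) in watts, from Table 1.
MEASURED_W = {
    "3DMark 06": (13.105, 0.694),
    "YouTube 720p": (5.240, 0.762),
}

platform_w = {}
apu_share_of_tdp = {}
for workload, (apu_w, hub_w) in MEASURED_W.items():
    platform_w[workload] = apu_w + hub_w
    apu_share_of_tdp[workload] = apu_w / TDP_W
    print(
        f"{workload}: APU {apu_w:.3f} W "
        f"({apu_share_of_tdp[workload]:.0%} of TDP), "
        f"APU + hub {platform_w[workload]:.3f} W"
    )
```

For 3DMark 06 the APU alone draws a little over a third of its rated TDP, which is why sizing a cooling solution on TDP alone can overestimate what a device-specific workload actually needs.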

<table>
<thead>
<tr>
<th>APU Model</th>
<th>Core Clock Speed PR/boost</th>
<th>Max TDP</th>
<th>Average Power, 3DMark 06<sup>1</sup> (APU / Controller Hub)</th>
<th>Average Power, YouTube<sup>2</sup> (APU / Controller Hub)</th>
</tr>
</thead>
<tbody>
<tr>
<td>R-464L</td>
<td>2.3/3.2 GHz</td>
<td>35 W</td>
<td>13.105 W / 0.694 W</td>
<td>5.240 W / 0.762 W</td>
</tr>
</tbody>
</table>

Table 1: AMD R-Series platform TDP vs Average Power

1 3DMark 06 average power was measured during a single 10-12 minute iteration of the benchmark
2 Average power was measured playing a 720p YouTube video over 10 minutes

*What features are inherent in the APU and/or chipset that lead to greater power efficiency and easier power/thermal manageability?*

As mentioned previously, with higher performance processors continually being developed, greater power efficiency is being demanded by system developers to give them the flexibility they need to put the most demanding applications into the smallest form factors possible. We’ve come a long way since the days of CPUs merely sensing a core temperature breach and triggering the system to either throttle down a system clock or perform some other drastic measure in order to prevent damage to the processor or system. The AMD R-Series platform has been designed to provide maximum power efficiency while giving system and software developers unprecedented control over how the processor operates dynamically in order to provide the optimal balance of performance and power based on thermal conditions and the applications being run at any given time. The heterogeneous architecture of the AMD R-Series APU, combining one or more high performance x86 processor cores (CPUs) with a highly parallel graphics processing unit (GPU) on a single die, inherently provides greater processing efficiency for both traditional serial-based workloads and highly parallel graphics, video, and compute intensive workloads. Open programming standards such as OpenCL combined with support for DirectX 11 (including DirectCompute) are also enabling software developers to get the most performance out of the system while making it easier to partition code to execute on the most power efficient core in the system. Additionally, power management has been built in to every aspect of the AMD R-Series platform (including the APU and Controller Hub) at both the system and core level, and is easily controllable through ACPI mechanisms, giving system designers and software developers extremely granular control over performance and power attributes of the various cores and interfaces.

At the heart of the AMD R-Series platform power management architecture is a centralized, highly programmable Application Power Management (APM) controller. This controller utilizes an algorithm called Bidirectional APM (BAPM) which essentially allows the OS to maintain the temperature of the APU within pre-defined limits by controlling power limits of each individual compute unit including the GPU. Each major block within the APU and Controller Hub operates based on programmable performance and power states that utilize dynamic power management features such as clock ramping and power gating. Each major block reports its status to the centralized APM controller. By utilizing this architecture, the OS and applications can ensure that the needed level of performance is achieved while minimizing overall power consumption by ensuring that all blocks are put into low power state when idle and enabling active cores to operate at higher frequency/voltage levels when needed. So, for example, on the AMD R-464L APU for graphics or parallel processing applications, the power allocation can be shifted to the GPU for up to a 38% boost in graphics speed, while staying within the defined power limits. For CPU intensive applications, the power allocation can be shifted to the CPU for up to a 39% boost in CPU speed. Other parts of the platform contain innovative power reducing techniques including:

- Power management of all major I/O interfaces. For example, unused or inactive PCIe lanes can be individually powered down and the width (number of lanes) of certain links can be changed dynamically to save even more power.
- CPU power gating including Core C6 (CC6) enabling individual compute units to be powered down and Core powered off on OS halt.
- Dynamic DRAM speed reduces power when bandwidth requirements are low.
- For applications requiring high performance multi-media capabilities, there are a variety of new power saving features. These include the Video Compression Engine, a dedicated hardware video encoder that encodes video quickly and efficiently. The Secure Asset Management unit enables GPU-assisted encryption and decryption of content. Enhancements to the Unified Video Decoder extend the capabilities of the AMD R-Series platform to include dual, high-definition video decode. Each helps to minimize CPU utilization when dealing with video and to reduce APU power consumption.
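The idea of shifting power allocation between the CPU and GPU within a fixed limit, described above for the APM controller, can be pictured with a toy budget model. The sketch below is not AMD's BAPM algorithm, whose details are not given here: the function name, the proportional scaling rule, and the requested wattages are all assumptions made for illustration. Only the 35 W limit corresponds to a figure in the text (the R-464L TDP).

```python
from typing import Tuple


def share_budget(
    budget_w: float, cpu_request_w: float, gpu_request_w: float
) -> Tuple[float, float]:
    """Split a fixed power budget between CPU and GPU (toy model).

    If both requests fit, each block gets what it asked for. Otherwise both
    are scaled down in proportion so that the total equals the budget.
    """
    if min(budget_w, cpu_request_w, gpu_request_w) < 0:
        raise ValueError("power values must be non-negative")
    total = cpu_request_w + gpu_request_w
    if total <= budget_w:
        return cpu_request_w, gpu_request_w
    return cpu_request_w * budget_w / total, gpu_request_w * budget_w / total


BUDGET_W = 35.0  # package limit; matches the R-464L TDP quoted in the text

# Hypothetical workloads: (CPU request, GPU request) in watts.
for label, cpu_w, gpu_w in [
    ("graphics-heavy", 10.0, 40.0),
    ("CPU-heavy", 40.0, 10.0),
    ("light", 10.0, 20.0),
]:
    cpu_alloc, gpu_alloc = share_budget(BUDGET_W, cpu_w, gpu_w)
    print(f"{label}: CPU {cpu_alloc:.1f} W, GPU {gpu_alloc:.1f} W")
```

The point of the model is only that headroom left unused by one block becomes available to the other while the total stays within the limit. A real controller would also account for temperature, per-block power states, and minimum operating points.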

3.2 Thermal Design Considerations

A major consideration (and quite often the main consideration) for determining whether a given processor solution can be designed into a particular form factor is how much time and effort (and expense) must be put into designing a proper thermal solution in order to keep all major components operating below their max temperature ratings. Because there are so many factors to consider, and so many implications for the rest of the system, this is a decision that should be made very early in the product development cycle. Designing the most cost and thermally efficient solution often requires specialized expertise and the use of modeling, simulation, and analysis tools is becoming more common every day.

Generally, the answer to the question of whether a particular CPU can adequately be cooled in a given system is not a simple yes or no. There are usually levels of difficulty, time, and cost associated with different techniques, depending on the total amount of heat that must be dissipated, and so the designer must balance all of the system requirements against cost and time-to-market pressures to find the ideal solution.

The various cooling methods implemented in embedded systems rely on three types of heat transfer:

- Convection: Heat transfer by mixing action of fluids (gas or liquid). Natural convection (i.e. passive) thermal solutions are typically the simplest as they rely on this natural movement of air and are often accomplished by simply venting the chassis. Forced convection solutions employ one or more fans to increase airflow.
- Conduction: Heat transfer between molecules. Conduction is typically achieved by creating a mechanical connection between components and the chassis directly or indirectly (through a heat spreader plate).
- Radiation: Heat transfer by electromagnetic waves. Radiation is always present but is only a significant factor when power consumption is extremely low.

While a single method of cooling can be adequate, a combination of methods is often necessary to achieve the best tradeoff between efficiency and cost. Designing a thermal solution must then take several factors into consideration including:

- The specific applications that are going to be run and how they affect the actual maximum power consumption of the processor, graphics engine, and other major components of the system.
- The power manageability of the main components of the system – can all of the power management options be taken advantage of and still get the performance needed out of the system to adequately run all of the necessary applications.
- Cost constraints.
- Whether a fan is allowed under any circumstances.
- Environmental constraints such as ambient temperature range and whether the device will be subjected to shock and/or vibration.
- The size of the board, or chassis that must be cooled.
- The maximum rated operating temperature of the CPU (Tdie max) and other major components. Embedded processors such as the AMD R-Series APU are often characterized to run at a higher Tdie max than their commercial counterparts. The Tdie max for the AMD R-Series APUs is 100°C.
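Several of the factors above (power drawn, ambient temperature, and Tdie max) combine into a single first-order figure: the maximum total thermal resistance the cooling path may have. The sketch below uses the standard steady-state relation, resistance = (Tdie max − ambient) / power. The 100°C limit and the power figures come from this paper; the 50°C ambient temperature is a hypothetical value chosen for illustration, and the function name is not from any AMD tool.

```python
def max_thermal_resistance(
    t_die_max_c: float, t_ambient_c: float, power_w: float
) -> float:
    """Largest die-to-ambient thermal resistance (°C/W) that keeps the die
    at or below its maximum temperature in steady state."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    if t_ambient_c >= t_die_max_c:
        raise ValueError("ambient temperature must be below Tdie max")
    return (t_die_max_c - t_ambient_c) / power_w


T_DIE_MAX_C = 100.0   # Tdie max for AMD R-Series APUs, from the text
T_AMBIENT_C = 50.0    # hypothetical ambient temperature inside the enclosure

for label, power_w in [
    ("at 35 W TDP", 35.0),
    ("at 13.105 W (3DMark 06 APU average)", 13.105),
]:
    theta = max_thermal_resistance(T_DIE_MAX_C, T_AMBIENT_C, power_w)
    print(f"{label}: up to {theta:.2f} °C/W")
```

A larger allowable resistance means a smaller or simpler cooling solution can be used. The result covers the whole path from die to ambient, so the package, interface material, and heatsink contributions must all fit within it.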

And while recent advancements in technology and material quality are providing a wider variety of more efficient cooling solutions, there are also steps designers can take to maximize the efficiency of their thermal designs:

- Use materials with the highest thermal conductivity wherever possible such as copper vs aluminum.
- Ensure the layout of components maximizes airflow with regard to location of a chassis fan and that the air doesn’t flow over the highest heat generating component first. Also, mounting all components on the same side of the PCB allows for a heat plate to be used more effectively to conduct heat directly to the chassis.
- Use a high quality thermal compound / grease when attaching a heat sink or heat spreader to components to maximize overall thermal conductivity.
- Choose a heatsink based on whether the system has forced air or not. Heatsinks optimized for airflow have a different shape and fin design than those optimized for natural convection only.
- If a fan cannot be avoided, select one designed to run at a lower RPM, which makes it quieter and more reliable.

4. APPLYING THE AMD R-SERIES APU TO SFF BOARDS

Achieving the smallest size possible while meeting performance and I/O requirements for embedded applications has historically required lengthy and expensive custom designs. The early days of leveraging PC technology often yielded large, bulky designs based on traditional PC card edge connectors and cabling systems that weren’t very rugged or heat efficient.

Over the last several years, a plethora of standardized small form factors and I/O connectivity schemes have been developed to help designers support large amounts of I/O in extremely small systems while improving modularity, interoperability, scalability, and reliability. These form factors and connector schemes run the gamut from complementing each other to being independent of each other, and in some cases even competing against each other. A primary objective in developing the AMD R-Series platform was to bring a new level of performance and power efficiency to the broadest possible array of these new small form factor standards. Also, by supporting the vast majority of I/O contained within these standards with minimal external circuitry, the AMD R-Series platform accelerates design cycles while maximizing the number of options system designers and their customers have to choose from.

4.1 COM Express® and its variants

In the 1990s, the Computer on Module (COM) concept became widespread in the embedded industry. COMs allowed the pairing of a highly integrated CPU module with an application specific carrier board for the I/O and eliminated much of the complexity and expense associated with a full-custom single board system while keeping the resulting form factor significantly smaller than traditional PC-based systems. Early COMs had a few drawbacks, however. They were not standardized, which limited the number of commercially available modules that were compatible with each other, and the latest and greatest I/O technologies were not always supported.

COM Express, adopted by the PCI Industrial Computer Manufacturers Group (PICMG) in 2005, was developed to standardize the concept of COMs and to address the shortcomings of existing COM technology. A COM Express module typically integrates core CPU and memory functionality, the common I/O of a PC/AT, USB, audio, graphics, and Ethernet and is mated to a carrier board containing application-specific I/O. The I/O signals of the carrier board are mapped to two high density, low profile connectors on the bottom side of the PCB. Designers can select from a variety of standardized pin-out types depending on what I/O their embedded system requires.

COM Express continues to provide a smooth transition from legacy bus technologies to newer interface technologies by defining new pinouts and form factors as needed. Standardized interfaces, pinouts, and connector size and location ensure compatibility between different form factors. The COM Express specifications currently define 3 module sizes:

- **Basic**: 95 x 125 mm
- **Extended**: 110 x 155 mm
- **Compact**: 95 x 95 mm

The Compact form factor is the newest and was developed as part of revision 2.0 of the COM Express specification, adopted in August 2010 and formally known as COM.0 Revision 2.0. Revision 2.0 was developed to address the new functionalities and capabilities of the latest embedded processor families, like the AMD R-Series platform, in order to make smaller COM Express form factors possible (especially for existing Basic form factors using the popular Type 2 pin-out), and to provide new pin-out types on existing COM Express connectors that offer a migration path to the latest I/O technologies.

An even smaller “Mini” sized module was also developed and announced in January of 2011 by the nanoETXexpress Industrial Group. Known formally as the nanoETXexpress 2.0 specification, the module is 84 x 55 mm and is fully compatible with COM Express Rev. 2.0 utilizing pin-out Types 1 and 10.

### 4.1.1 Implementing with the AMD R-Series APU

There are now 7 different pin-outs defined in COM Express Rev. 2.0. The two new pin-outs added to the specification (available from www.picmg.org) are Type 6 and Type 10:

**Type 6**: A double connector (440 pins total) derived from Type 2, this pin-out expands graphics and display options by replacing legacy PCI with Digital Display Interface (DDI) and additional PCI Express lanes. Resulting I/O support includes 24 PCI Express lanes, PEG, DDI (for DisplayPort, HDMI, DVI or SDVO), 4 SATA, 1 LAN, 2 Serial COM ports, and USB 3.0.

**Type 10**: A single connector (220 pins) derived from, and compatible with, Type 1, this pin-out removes 2 SATA channels and reserves those pins for future technologies such as USB 3.0. Resulting I/O support includes 4 PCI Express lanes, 2 SATA, 1 LAN, single channel LVDS only, DDI, and 2 Serial COM ports.
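
As a rough illustration of how a designer might choose between the two new pin-outs, the following Python sketch encodes only the interface counts listed above and returns the lowest pin-count type that satisfies a set of requirements. The `PinOut` class, the `smallest_pinout` function, and the feature labels are hypothetical names introduced here. The lists above are not exhaustive, so a real selection should be checked against the specification itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinOut:
    name: str
    pins: int
    pcie_lanes: int
    sata: int
    lan: int
    serial: int
    features: frozenset


# Figures transcribed from the Type 6 and Type 10 descriptions in the text.
# Only interfaces named there are modeled (assumption: nothing else is implied).
PINOUTS = (
    PinOut("Type 10", 220, pcie_lanes=4, sata=2, lan=1, serial=2,
           features=frozenset({"DDI", "LVDS"})),
    PinOut("Type 6", 440, pcie_lanes=24, sata=4, lan=1, serial=2,
           features=frozenset({"DDI", "PEG", "USB 3.0"})),
)


def smallest_pinout(pcie_lanes=0, sata=0, lan=0, serial=0, features=()):
    """Return the lowest-pin-count pin-out meeting every requirement, or None."""
    needed = frozenset(features)
    for p in sorted(PINOUTS, key=lambda p: p.pins):
        if (p.pcie_lanes >= pcie_lanes and p.sata >= sata and p.lan >= lan
                and p.serial >= serial and needed <= p.features):
            return p
    return None


if __name__ == "__main__":
    choice = smallest_pinout(pcie_lanes=8, sata=4, features={"USB 3.0"})
    print(choice.name if choice else "no listed pin-out fits")
```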

An example of how the AMD R-Series platform supports the Type 6 pin-out is shown in Figure x below:

### 4.2 The SUMIT™ Stackable Expansion Interface

To help alleviate I/O tradeoff issues and in anticipation of new, higher performance, highly integrated, and I/O-rich processor solutions on the horizon, the Stackables Working Group of the SFF-SIG in 2008 developed a legacy-friendly stackable interconnect technology called SUMIT (Stackable Unified Module Interconnect Technology) for small form factor systems. Because it is form factor independent, SUMIT can be applied to any number of standardized SBCs, COMs, or custom board designs to smoothly integrate new, high-speed, serial technologies into legacy systems while preserving OEM investments in I/O, cabling and enclosure designs. SUMIT facilitates the vertical stacking of SBCs and processor modules with I/O boards of various sizes and technologies to create highly modular solutions that can meet a wide range of requirements. SUMIT adoption is accelerating rapidly, and the interface is currently deployed on a number of custom and standard SFF SBCs such as EPIC, EBX, ISM, and Pico-ITXe.

#### 4.2.1 SUMIT Signals

SUMIT is defined as two 52-pin, high-speed connectors capable of supporting current and future PCI Express and USB data rates as well as other moderate speed interfaces for I/O expansion. Each connector measures only 0.88 x 0.32” (22.35 x 8.13 mm) with a pitch of 0.635 mm. The size and density provide the basis for a highly compact, I/O-centric multi-board solution that is processor architecture and chipset independent.

SUMIT supports the following I/O connectivity technologies:

<table>
<thead>
<tr>
<th>SUMIT Connector A</th>
<th>SUMIT Connector B</th>
</tr>
</thead>
<tbody>
<tr>
<td>• One PCI Express x1 link</td>
<td>• One PCI Express x1 link</td>
</tr>
<tr>
<td>• Four USB 2.0 channels</td>
<td>• One PCI Express x4 link which can be configured as four x1 links instead</td>
</tr>
<tr>
<td>• Low Pin Count (LPC) bus with SERIRQ</td>
<td></td>
</tr>
<tr>
<td>• Express Card Detection including Request and Present signals</td>
<td></td>
</tr>
<tr>
<td>• Two SPI / uWire channels</td>
<td></td>
</tr>
<tr>
<td>• System Management Bus (SMbus/ I²C)</td>
<td></td>
</tr>
<tr>
<td>• ACPI Wakeup</td>
<td></td>
</tr>
</tbody>
</table>

The connectors can be implemented individually or together depending on the I/O requirements of the system and cost constraints. By using two smaller connectors rather than a single large connector, SUMIT provides maximum flexibility, allowing I/O boards containing only one connector to plug into other processor or I/O boards containing both connectors.
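
The connector options can be summarized in a small Python sketch that totals the interfaces available in an A-only, B-only, or AB configuration and checks whether an I/O board can mate with a given host. The interface counts are transcribed from the table above; the dictionary keys and function names are hypothetical and chosen only for this illustration.

```python
# Interface counts per connector, transcribed from the SUMIT table in the text.
SUMIT_CONNECTORS = {
    "A": {
        "pcie_x1": 1,
        "usb2": 4,
        "lpc_with_serirq": 1,
        "expresscard_detect": 1,
        "spi_uwire": 2,
        "smbus_i2c": 1,
        "acpi_wakeup": 1,
    },
    "B": {"pcie_x1": 1, "pcie_x4": 1},
}


def interfaces(config: str) -> dict:
    """Sum the interface counts for a configuration such as "A", "B", or "AB"."""
    totals = {}
    for connector in config:
        for name, count in SUMIT_CONNECTORS[connector].items():
            totals[name] = totals.get(name, 0) + count
    return totals


def pcie_lanes(config: str) -> int:
    """Count PCI Express lanes, treating each x4 link as four lanes."""
    totals = interfaces(config)
    return totals.get("pcie_x1", 0) + 4 * totals.get("pcie_x4", 0)


def can_plug_into(io_board: str, host: str) -> bool:
    """An I/O board mates if every connector it carries exists on the host."""
    return set(io_board) <= set(host)


if __name__ == "__main__":
    for config in ("A", "B", "AB"):
        print(config, pcie_lanes(config), "PCIe lanes")
```

Under this model an A-only board exposes one PCI Express lane, a B-only board five, and an AB board six, and an A-only I/O board can plug into an AB host but not the reverse.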

#### 4.2.2 Legacy Support

SUMIT technology preserves OEM investments by bridging existing applications smoothly to new bus technologies, and it takes the concept further by not forcing a complete system redesign for many legacy applications. Legacy I/O, especially ISA-bus-based I/O, is still widespread in the embedded systems marketplace, particularly in applications where not all I/O requires a high-speed serial bus or in long-life applications where a redesign is not needed or its cost cannot be justified. SUMIT supports these legacy applications in two ways:

1. By providing an LPC bus and the SERIRQ signal (necessary to support ISA-style interrupts) on Connector A. Together, these signals enable devices such as LPC-to-ISA bridges, LPC UARTs, or Super I/O devices to be used in order to interface to legacy devices.

2. By defining a form factor specification called SUMIT-ISM™ that incorporates both SUMIT A/B connectors with either a PC/104 (ISA) or PCI-104 (PCI) connector on an industry standard 90 x 96mm footprint. SUMIT-ISM boards can be implemented as CPU (SBC) or I/O modules and the location and use of the legacy connectors combined with I/O zones and mounting holes ensure that they are compatible and stackable with the large existing base of PC/104 and PCI-104 modules and enclosures in use around the world today. A SUMIT-ISM module using the 104-pin PC/104 connector is called a “Legacy Type 1” module and a SUMIT-ISM module using the 120-pin PCI-104 connector is called a “Legacy Type 2” module.

#### 4.2.3 Implementing with the AMD R-Series APU

SUMIT showcases the AMD R-Series platform well, providing a common set of well-supported low-, medium- and high-speed serial interfaces, including the LPC bus and SERIRQ signal as discussed earlier for legacy applications. Supported configurations include an A-only implementation supporting mostly legacy interfaces, a B-only implementation supporting multiple high bandwidth PCI Express lanes, and an AB implementation supporting SUMIT’s full range of I/O.

**Figure 4: AMD Embedded R-Series Platform interface to SUMIT in a Type AB configuration. Note that interfacing to a PC/104 or PCI-104 connector on a SUMIT-ISM module can easily be accomplished with an LPC-to-ISA bridge (as shown) or a PCI-to-ISA bridge placed on the CPU baseboard or the I/O module.**

### 4.3 COMIT™

COMIT stands for Computer On Module Interconnect Technology and is the latest modular, high-speed connector system specified by the SFF-SIG. Introduced at the Embedded Systems Conference in San Jose, California in April 2009, COMIT is considered a third generation COM specification and was developed to provide a dense, robust connector solution supporting an optimal balance of new and legacy interfaces for COMs-based small form factor embedded systems. COMIT continues the philosophy behind SUMIT, such as focusing on system-level requirements and design challenges independent of PCB form factors, but includes even more I/O (such as audio, video, and Ethernet) in a smaller package, and in a combination that appeals to the broadest possible set of small form-factor embedded applications. The COMIT connector has been designed to support even higher bandwidth serial interfaces than those currently implemented and has unused pins reserved for future revisions of these interfaces. COMIT also simplifies power supply design by including standard power rails and a large number of ground pins, while adhering to ACPI power control standards.

While a COMIT connector can be used on virtually any standardized SFF board, two specific modules have been defined by the SFF-SIG. SFF-COM™ is a 62 x 75mm module intended specifically to allow extremely small CPU modules to be used in industry-standard, embedded SBCs such as EBX, EPIC, ISM, and PC/104. COMIT-ISM™ is an ISM-sized board (90 x 96mm) intended to support slightly larger, more powerful processor/chipset combinations that may not fit on either SFF-COM sized modules or on ISM sized SBCs.

#### 4.3.1 COMIT Signals

COMIT employs a 240-pin high-density connector. The supported interfaces, pin assignments, and the connector itself were developed with the goals of optimizing cost per pin, pin density, bandwidth through the connector, and total area (footprint). COMIT supports both high-speed serial I/O and a select list of legacy I/O deemed to be the most widely adopted for current embedded applications. These interfaces include:

- Three x1 PCI Express links
- One x4 PCI Express link (which can be configured as four x1 links)
- Six USB 2.0 channels
- VGA, digital video, and dual 18/24-bit LVDS video interfaces
- Two SATA channels
- One 10/100 or Gigabit Ethernet
- One 8-bit SDIO
- HD Audio
- LPC Bus
- SPI/uWire, SMBus/I2C Bus
- Power and ground
- System clock and control signaling
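
Because the x4 PCI Express link can alternatively be configured as four x1 links, the PCI Express resources listed above can serve several different device mixes. The Python sketch below shows one possible way to check whether a set of requested link widths fits; the function name, the source labels, and the greedy strategy of serving x4 requests first are assumptions made for illustration and are not part of the COMIT specification.

```python
def allocate_pcie(requested_widths):
    """Assign each requested link width (1 or 4) to a COMIT PCI Express link.

    Returns a list of (width, source) tuples in processing order (x4 requests
    first), or None if the requests do not fit or contain an unsupported width.
    """
    x1_free = 3          # three dedicated x1 links
    x4_free = 1          # one x4 link, still unsplit
    split_x1_free = 0    # x1 links made available by splitting the x4 link

    assignment = []
    # Serve x4 requests first so the x4 link is split only when not needed whole.
    for width in sorted(requested_widths, reverse=True):
        if width == 4 and x4_free:
            x4_free -= 1
            assignment.append((4, "x4 link"))
        elif width == 1 and x1_free:
            x1_free -= 1
            assignment.append((1, "x1 link"))
        elif width == 1 and split_x1_free:
            split_x1_free -= 1
            assignment.append((1, "x4 link split"))
        elif width == 1 and x4_free:
            x4_free -= 1
            split_x1_free = 3   # splitting yields four x1 links; one is used now
            assignment.append((1, "x4 link split"))
        else:
            return None
    return assignment


if __name__ == "__main__":
    print(allocate_pcie([1, 1, 1, 4]))
    print(allocate_pcie([1] * 8))
```

In this model the connector can serve up to seven x1 devices, or one x4 device together with three x1 devices, and any larger request returns `None`.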

#### 4.3.2 Implementing with the AMD R-Series APU

The AMD R-Series platform supports the COMIT interfaces with minimal external interface logic required. Controller Hub options exist to support this specification as certain interfaces continue to evolve, such as USB 2.0 to USB 3.0 and SATA 300 to SATA 600.

### 4.4 PC/104 Standards

In the early days of embedded system design, many companies sought ways to reap the benefits of using the PC architecture in embedded applications. However, the standard PC bus form-factor (12.4" x 4.8") and its associated card cages and backplanes quickly became too bulky (and expensive) to support the requirements of newer, smaller form factor systems.

Throughout the late 1980s and early 1990s, there were few practical ways of embedding the PC architecture into space- and power-sensitive applications other than designing a full custom solution. However, to stay competitive by reducing development costs and accelerating design cycles, companies were continually searching for ways to move away from full custom designs and leverage the growing availability of off-the-shelf, modular solutions whenever possible.

A need therefore arose for a more compact, yet standardized implementation of the PC architecture, satisfying the reduced space and power constraints of embedded control applications. Yet these goals had to be realized without sacrificing full hardware and software compatibility with the popular PC bus standard.

In 1992, the PC/104 Consortium was created in response to this need. The PC/104 standard offered full architecture, hardware, and software compatibility with the PC bus, but in ultra-compact (90 x 96 mm) stackable modules. For over two decades, the PC/104 Consortium has continued to develop new standards as PC technology has evolved while continuing to support legacy standards such as ISA and PCI.

#### 4.4.1 PCI/104-Express™ and PCIe/104

The PCI/104-Express specification establishes a standard for using the high-speed PCI Express® bus in embedded applications. It was developed by the PC/104 Consortium and adopted by member vote in March 2008. Revision 2 was ratified in February of 2011 and adds a secondary connector option (referred to as “Type 2”) that provides interface support for USB 3.0, SATA, LPC, and RTC battery. The PC/104 Consortium chose PCI Express® because of its full PC market adoption, performance, scalability, and growing silicon availability worldwide. It provides a new high-performance physical interface while retaining software compatibility with existing PCI infrastructure.

Incorporating the PCI Express bus within the industry proven PC/104 architecture brings many advantages for embedded applications including fast data transfer, low cost due to PC/104’s unique self-stacking bus, high reliability due to PC/104’s inherent ruggedness, and long term sustainability.

The main objective in defining the addition of PCI Express to PC/104 was to preserve the attributes that have made PC/104 so successful in embedded applications, including maintaining its original 90mm x 96mm size, vertical self-stacking, rugged connectors, four-corner mounting holes, and full PC compatibility.

In addition, it was important that a stackable form of PCI Express take into consideration backward compatibility with current PC/104 Consortium specifications and form factors. The design had to support automatic detection of up or down stacking and had to have automatic link shifting to allow simplified, universal add-on module designs.
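
The link-shifting idea can be illustrated with a deliberately simplified Python model. It assumes that every add-on module uses the x1 link presented in the first position of the connector it plugs into and passes the remaining links on, shifted by one position, so that identical modules can be stacked without per-slot configuration. The function name, the link labels, and this behavioral model are assumptions made for illustration; the specification defines the actual signal-level behavior, including up/down stacking detection, which is not modeled here.

```python
def stack_links(host_links, module_count):
    """Return (module index, host link) pairs for a stack of identical modules.

    Simplified model: each module consumes the link in position 0 of the
    connector it plugs into and forwards the rest, shifted down by one.
    """
    links = list(host_links)
    used = []
    for module in range(module_count):
        if not links:
            raise ValueError(f"no x1 link left for module {module}")
        used.append((module, links[0]))   # every module looks at position 0
        links = links[1:]                 # shift the remaining links down
    return used


if __name__ == "__main__":
    # Hypothetical labels for four x1 links provided by a host CPU module.
    host = ["x1_0", "x1_1", "x1_2", "x1_3"]
    for module, link in stack_links(host, 3):
        print(f"module {module} uses host link {link}")
```

The point of the model is that each module is wired identically, yet every module in the stack ends up on a different host link.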

PCI/104-Express is the fourth generation of the PC/104 series of stackable bus technologies specified for the industry standard PC/104 form factor. A stackable PCI/104-Express system can be implemented starting with a standalone 90x96mm module or as part of a larger PC/104 standard embedded SBC form-factor such as EPIC or EBX. Implementing PCI/104-Express in this way results in form factors known as EPIC-Express and EBX-Express, respectively.

PCIe/104 is a recent variant of PCI/104-Express (making it the fifth PC/104 module specification overall) that provides a PCIe connector only and is intended for more compact systems that don’t require legacy PCI support. However, since PCI Express is based on PCI technology, PCI Express to PCI bridge devices are straightforward and widely available, making it fairly simple to create a board stack-up that supports both PCIe and PCI starting with a PCIe/104 CPU module. Figure x below illustrates how a PCI/104-Express CPU-based stack-up and a PCIe/104 CPU-based stack-up can support both PCI Express and PCI add-in cards.

#### 4.4.2 Implementing with the AMD R-Series APU

There are two complementary versions of PCI/104-Express that can be implemented, called Type 1 and Type 2. The main difference is that Type 2 replaces the x16 PCI Express link of Type 1 with two x4 PCI Express links, SATA, USB 3.0, LPC, and RTC battery, as shown in Table 2 below.

<table>
<thead>
<tr>
<th>Feature</th>
<th>PCIe Connector Type 1</th>
<th>PCIe Connector Type 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>x1 PCI Express Link</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>USB 2.0</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>ATX power and control</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>3.3v, 5v, 12v power</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>SMBus</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>X16 PCI Express Link (or two x8 links, two x4 links, or two SDVO)</td>
<td>1</td>
<td>-</td>
</tr>
<tr>
<td>X4 PCI Express Link</td>
<td>-</td>
<td>2</td>
</tr>
<tr>
<td>USB 3.0</td>
<td>-</td>
<td>2</td>
</tr>
<tr>
<td>SATA</td>
<td>-</td>
<td>2</td>
</tr>
<tr>
<td>LPC Bus</td>
<td>-</td>
<td>1</td>
</tr>
<tr>
<td>RTC Battery</td>
<td>-</td>
<td>1</td>
</tr>
</tbody>
</table>

Table 2: PCI/104-Express Type 1 and Type 2 Interface Support
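
The difference between the two connector types can also be read off mechanically. The Python sketch below encodes the counted interfaces from Table 2 (the two Yes/Yes power rows are omitted) and reports which interfaces are shared and which are unique to each type. The dictionary keys and the `compare` function are hypothetical names used only for this illustration, and an interface present in both types with different counts would not appear in any of the three lists.

```python
# Interface counts per connector type, transcribed from Table 2.
TYPE_1 = {"x1 PCIe": 4, "USB 2.0": 2, "SMBus": 1, "x16 PCIe": 1}
TYPE_2 = {"x1 PCIe": 4, "USB 2.0": 2, "SMBus": 1, "x4 PCIe": 2,
          "USB 3.0": 2, "SATA": 2, "LPC": 1, "RTC battery": 1}


def compare(a: dict, b: dict):
    """Return (common, only_in_a, only_in_b) as sorted lists of interface names."""
    common = sorted(k for k in a if b.get(k) == a[k])
    only_a = sorted(k for k in a if k not in b)
    only_b = sorted(k for k in b if k not in a)
    return common, only_a, only_b


if __name__ == "__main__":
    common, only_1, only_2 = compare(TYPE_1, TYPE_2)
    print("Common to both:", ", ".join(common))
    print("Type 1 only:   ", ", ".join(only_1))
    print("Type 2 only:   ", ", ".join(only_2))
```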

The AMD Embedded R-Series platform is well suited for PCI/104-Express and PCIe/104 boards utilizing either PCIe connector configuration, as shown in the diagrams below.

Figure 7: AMD Embedded R-Series Platform interface on a PCI/104-Express board containing a PCIe Type 1 and PCI connector.

Figure 8: AMD Embedded R-Series Platform interface on a PCI/104-Express board containing a PCIe Type 2 and PCI connector.

Legacy ISA support is still a requirement within the PC/104 ecosystem and is used widely in applications such as industrial PC and industrial automation. While the newer PC/104 form factors such as PCI-104, PCI/104-Express, and PCIe/104 do not specify an ISA connector, ISA can still be supported in these board stacks using a PCI-to-ISA bridge device. The ITE Technology IT8888 can support full ISA functionality, including DMA support, on AMD APU based platforms. Please see the white paper titled “Delivering Full ISA Support with the AMD Embedded G-Series APU Platform and the ITE Tech. IT8888 PCI to ISA Bridge” (Publication ID 51762) on the AMD Embedded Developers Web site.

### 4.5 Qseven®

Initially developed in 2008 by three European companies, Qseven was motivated by the goal of taking the concept of COMs and extending it to enable even smaller-sized and lower-power embedded applications. Optimized for mobile and battery-operated applications, Qseven modules provide an extremely small, off-the-shelf, multi-vendor single-board computer that integrates all the core components of a common PC onto a 70x70mm form factor and is then mounted onto an application-specific carrier board. Qseven modules have specified pin-outs based on the high speed MXM system connector, which has a standardized pin-out and is sourced by multiple vendors. The Qseven module provides the functional requirements for a typical embedded application including, but not limited to, graphics, audio, mass storage, network and multiple USB ports. The carrier board provides all the interface connectors required to attach the system to the application-specific peripherals. This versatility allows the designer to create a dense and optimized package, and allows multiple systems to be built using different combinations of processor and I/O modules.

Qseven was designed to be legacy free and so includes only the latest high-speed I/O technologies on this minimum size form factor. The supported interfaces include:

- PCI Express®
- USB 2.0
- ExpressCard™
- High Definition Digital Audio
- Serial ATA®
- LPC
- Secure Digital I/O interface
- Gigabit Ethernet
- DisplayPort™, TMDS or SDVO Interface
- LVDS Display Interface

Currently, there are dozens of Qseven modules (processor) and carrier boards (I/O) available on the market from leading manufacturers. Carrier boards are available in a wide range of form factors including popular standards such as Mini-ITX, Pico-ITX, 3.5” ESB, and PCI/104-Express.

#### 4.5.1 MXM Connector

The Qseven module utilizes a single, 230-pin, ruggedized card-edge connector that is also used for PCI Express capable notebook graphics cards following the MXM specification. Therefore, this connector type is also known as an MXM connector. The MXM connector accommodates two connector heights (5.5mm and 7.8mm) for different carrier board application needs.

#### 4.5.2 Implementing with the AMD R-Series APU

The AMD R-Series platform supports the Qseven interfaces with minimal external interface logic required. The primary exception is support for the CAN Bus, which facilitates communications between multiple embedded network nodes and is used extensively in automotive applications. The CAN Bus can be supported by connecting a CAN protocol controller chip between the SPI interface of the AMD Controller Hub and the CAN Bus data interface on the MXM connector pins.

5. SUMMARY

As embedded system design marches steadily forward with ever shrinking form factor standards, a gap has formed between the performance that is needed for new and emerging graphics intensive and visually immersive applications, and the low power that is needed to put these applications into small and highly mobile systems. The merging of advanced x86 computing capabilities with the parallel processing power of a general purpose graphics processing unit into a single device, as evidenced by the AMD Embedded R-Series APU platform, bridges the gap between the high performance and low power that embedded system designers need to achieve their most ambitious goals. AMD’s heterogeneous APU architecture provides superior performance per watt, accelerates software development through support for DirectX 11, DirectCompute, and OpenCL, mitigates thermal design challenges through its extensive power optimizations and granular programmability, provides straightforward upgrade paths via pin compatible APU and Controller Hub options, and gives designers a wealth of standardized small form factor platforms to choose from through efficient integration and rich support of advanced I/O.

6. WHERE TO FIND MORE INFORMATION

To learn more about AMD’s APU architecture and the AMD R-Series platform, including product specifications, reference designs, performance comparisons, and application information, please visit AMD’s web site at http://www.amd.com/R-Series. To learn more about notable applications and demos...
created by customers and technology partners using AMD Accelerated Parallel Processing technology, including compilers, libraries, and a multitude of multimedia applications, please visit AMD’s Accelerated Parallel Processing (APP) Developer Showcase at http://developer.amd.com/sdks/AMDAPPSDK/samples/showcase/Pages/default.aspx.

The SUMIT, SUMIT-ISM, COMIT, and Pico-I/O specifications are available as free downloads at www.sff-sig.org, administered by the Small Form Factor Special Interest Group (SFF-SIG).

PC/104 related information and form factor specifications including PCIe/104, EBX, EBX Express, EPIC, and EPIC Express can be downloaded free of charge from the PC/104 Consortium website at http://www.pc104.org/.

AMD is a member of these organizations, and encourages its board-vendor partners to join these forward-thinking trade groups.

Small Form Factor SIG
2784 Homestead Road #269
Santa Clara, CA.
www.sff-sig.org
Please e-mail your inquiries to info@sff-sig.org.

PC/104 Consortium
16795 Lark Avenue, Suite 104
Los Gatos, CA 95032 USA
Phone: 408-337-0904
Fax: 408-521-9191
www.pc104.org
Please e-mail your inquiries to info@pc104.org.

www.amd.com/embedded