

# Multicore Devices: The New Generation of Reconfigurable Architectures

Steven A. Guccione, Cmpware, Inc.



#### Abstract

For two decades, reconfigurable computing systems have provided an attractive alternative to fixed hardware solutions, demonstrating the low cost and flexibility of a software solution combined with the high performance of fixed hardware. For a variety of practical reasons, much of the work in this area has focused on commercial FPGA devices as the underlying hardware platform. Recently, several new designs have diverged from the bit-level, circuit-oriented architectures of FPGAs and produced a variety of architectures more suitable for computation and high-level language programming. These new, highly parallel architectures contain a relatively large number of programmable cores, each approaching the complexity of a traditional microprocessor. Today such devices can be found in popular consumer electronics, including game consoles and desktop PC graphics controllers, as well as in a new generation of supercomputers. These new devices, often described using the generic term 'multicore', represent the latest phase in the evolution of reconfigurable systems. Like earlier reconfigurable systems, they promise very high performance at relatively low power with high levels of programmability. These new systems also feature software development tools geared more toward traditional high-level language programming than the hardware design orientation found in earlier generations of reconfigurable systems.



## What is Multicore?

- Multicore features:
  - Programmable: cores must execute software instructions (not 'hard' IP)
  - *Explicitly parallel*: parallelism visible to the programmer
  - Single device: not a multi-device multiprocessor

Multi•core: a programmable, explicitly parallel, hardware device.



## Why Multicore?

- **Performance:** traditional uniprocessor techniques to improve performance failing
- **Power:** single core devices hitting power limitations (related to performance)
- **Price:** designing and verifying a 1B transistor CPU becoming too expensive

==> the only way to stay on the CPU performance curve (Moore's Law)

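The link between the power and performance bullets above can be illustrated with the standard first-order model of CMOS dynamic power. This model is an assumption brought in for illustration and does not come from the slides. With activity factor $\alpha$, switched capacitance $C$, supply voltage $V$ and clock frequency $f$:

$$
P_{\text{dyn}} \approx \alpha C V^{2} f
$$

If the supply voltage could be lowered in proportion to frequency (an idealization), two cores each running at $f/2$ and $V/2$ would consume

$$
P_{\text{2 cores}} \approx 2 \cdot \alpha C \left(\tfrac{V}{2}\right)^{2} \tfrac{f}{2} = \tfrac{1}{4}\,\alpha C V^{2} f
$$

That is the same nominal instruction rate at roughly a quarter of the power, provided the workload parallelizes perfectly. Real devices fall short of this because voltage cannot scale that far and leakage power is ignored here, but the direction of the trade-off is the point.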


#### **Multicore Evolution**



## **Reconfigurable Computing**

- Based on commercial FPGAs
  - Convenient, inexpensive hardware
  - Architecture not well suited to computation
  - Circuit-oriented tools slow and clumsy
  - Proprietary architectures and software
- Research focused on software:
  - HLL to HDL translation
  - HPC applications







## **ALU Array Architectures**

- Replaced FPGA LUTs with coarse grained logic elements (ALUs)
- Retained FPGA-style circuit switched routing
- Simplified tools (?)
- Improved circuit density (?)
- Many diverse approaches







## **ALU Array Architectures (cont.)**

- ALU Arrays:
  - **PACT XPP**: 8x8 24-bit ALUs. From Kaiserslautern Xputer.
  - *Elixent D-Fabrix*: IP core. From HP Chess. Acquired by Matsushita.
  - *Rapport KiloCore*: 16 x 16 8-bit ALUs. From CMU PipeRench.



# ALU Array Architectures (cont.)

- Chameleon CS2112: 80 32-bit ALUs.
- Stream Processors Storm-1: 960 32-bit cores. From Stanford Imagine.
- IPFlex DAPDNA: 955 16-bit ALUs
- **Systolix PulseDSP**: 144 16-bit ALUs. Acquired by RadioScape.
- MathStar Arrix FPOA: 256 32-bit ALUs + 64 40-bit MACs.



# Transitional: Early Multicore

- Mostly targeting DSP / wireless
  - **Texas Instruments OMAP**: Dual core RISC + DSP for mobile handsets.
  - *Xilinx Virtex-II Pro*: quad PowerPC + FPGA
  - **QuickSilver Technology**: multiple heterogeneous cores for mobile handsets.
  - Cradle CT3400: 8 DSPs + 6 CPUs (all 32-bit)





#### **Multicore: Networking**

- Networking:
  - Cisco CRS-1: 192x Tensilica RISC cores
  - **PA Semi PWRficient**: 2x PowerPC
  - Raza Micro XLR700: 8x MIPS64
  - Freescale MPC8572: 2x PowerPC
  - Broadcom (SiByte) BCM1250: 2x to 4x MIPS64





## **Multicore: Game Consoles**

- High performance / low power
- High volume consumer 'appliance'
- Game consoles:
  - Sony PlayStation 3: PowerPC + 8x VLIW cores (Sony/Toshiba/IBM 'Cell')
  - Microsoft Xbox 360: 3x PowerPC CPUs







# Multicore: Desktop and Servers

- Replicate existing CPU cores 2x to 8x
- Desktop and servers:
  - *Sun*: 2x to 8x SPARCs
  - Intel / AMD: 2x to 4x x86
- Useful for task level parallelism
- Reduces power
- Decreases design cost
- Does not address memory bottleneck






## Transitional: Multicore SoC

- Custom ASIC
- Embedded in various consumer devices (MP3 players, etc.)
- May contain multiple CPU cores + IP
- Supported by most CPU core IP companies (*MIPS, Tensilica, ARM, ARC*)
- Special purpose / niche market



# Transitional: Soft Multicore

- Configure FPGA as multicore multiprocessor
- Hundreds of RISC CPUs in an FPGA
- Program with standard HLL tools ('C')
- Permits hardware / software trade offs
  - New instructions
  - Custom coprocessors
  - Memory resizing







#### **Massively Multicore**

- Larger number of cores (10s to 100s+)
- Targeting embedded systems
  - Azul Vega 2: 48x CPUs (64-bit)
  - Ambric Am2000: 336x CPUs
  - picoChip PC200: 248x 16-bit DSPs
  - Tilera Tile64: 64x MIPS CPUs





## Massively Multicore (cont.)

- Boston Circuits gCORE: 16x ARC CPUs
- Intellasys: 24x custom cores
- Parallax Propeller: 8x custom cores
- ElementCXI CXI64: Heterogeneous multicore
- Coherent Logix: Defense DSP



#### Massively Multicore: GPUs

- Evolved from graphics ASICs
- High performance floating point
- GPUs:
  - ATI / AMD FireStream: 320x FP cores
  - *Nvidia GeForce*: 112x FP cores (32-bit)
- GPU-like devices for HPC:
  - Grape-DR: 512x FP cores (Japan)
  - ClearSpeed CSX600: 96x FP cores










#### **Multicore Software**

- Hardware available for years; software still under development
- Described as 'crisis', 'panic', etc.
- Some traditional multiprocessing influence
- Ties to Reconfigurable Computing (*Tilera*, *ElementCXI*, *Rapport*, others)
- Industry ahead of academia
  - Little academic research in multicore
  - Large number of diverse start-ups



#### **Multicore Software Tools**

- Multiprocessor tools:
  - Communication libraries (MPI)
  - Threads programming
  - Mostly ad-hoc / task level
- Languages:
  - Lots of data parallel / stream tools
  - Much experimentation
- Other tools:
  - Mostly ports of unicore tools (GNU)

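The two styles named above, task-level threads with explicit communication and data-parallel / stream mapping, can be sketched in a few lines. This is a minimal illustration and not any vendor's tool: the names `scale`, `data_parallel` and `task_parallel` are hypothetical, the queue only stands in for MPI-style send/receive, and standard CPython threads show the programming model but not real multicore speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def scale(sample: int) -> int:
    """Per-element kernel: the same operation applied to every sample."""
    return 2 * sample


def data_parallel(samples, workers=4):
    """Data-parallel style: one kernel mapped over all elements."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scale, samples))  # map preserves input order


def task_parallel(samples):
    """Task-level style: two different stages run as separate threads,
    connected by an explicit message queue (a stand-in for send/receive)."""
    channel = queue.Queue()
    results = []

    def producer():
        for s in samples:
            channel.put(scale(s))  # "send" to the next stage
        channel.put(None)          # end-of-stream marker

    def consumer():
        while (item := channel.get()) is not None:  # "receive"
            results.append(item + 1)

    stages = [threading.Thread(target=producer),
              threading.Thread(target=consumer)]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results
```

In the data-parallel version the programmer writes only the kernel and the runtime decides how to spread it across workers. In the task-level version the programmer places each stage and every communication by hand, which is the ad-hoc style the slide refers to.
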
# Commercial Multicore Software

- Design specification:
  - *RapidMind*: Data parallel / stream (GPU)
  - PeakStream: Data parallel / stream (GPU; acquired by Google)
  - National Instruments: LabVIEW (graphical)
- Simulation / debug:
  - Cmpware: CMP-DK
  - Imperas: OVP (open source)



## **Multicore Information On-Line**

- Cmpware Multiprocessor Report
  <u>http://www.MultiprocessorReport.com/</u>
- Berkeley Parallel Computing Lab
  <u>http://view.eecs.berkeley.edu/wiki/Main\_Page/</u>
- The Multicore Association
  <u>http://www.multicore-association.com/</u>
- GPGPU.ORG
  <u>http://www.gpgpu.org/</u>
- Web sites for various multicore companies

## **Multicore Information On-Line (cont.)**

- Places *not* to go:
  - <u>Multicore.com</u> recently owned by <u>Multicore Solders</u> (yes, the stuff that holds ICs to PCBs)
  - <u>Multicore.com</u> now owned by German manufacturer *Henkel* (sells laundry soap, cosmetics -- not *J. A. Henckels* knives, BTW)
  - <u>Multicore.org</u> a one-page web site for 'building systems' in Namibia, Africa
  - Multicore.net still available



#### Conclusions

- 100s to 1000s of cores on the horizon
- Great power / performance benefits
- No 'silver bullet' for software
- Various tools and techniques will evolve
- Applications will drive tools
- Apps and tools will drive architecture (we can hope and dream)



# **Extra Slides**







