Title: PlayStation and IBM Cell Architecture
1PlayStation and IBM Cell Architecture
- Benjamin Levine,1 Jacob Schroeder,2 Pavan
Tumati,3 Eric DeSturler,2 Sanjay Patel,3 Todd J.
Martínez1
1Department of Chemistry 2Computer Science
3Electrical and Computer Engineering
2Commodity-Off-The-Shelf (COTS) Computing
- Benefits
- Economy of Scale
- Ease of Upgrade
- Obstacles
- Dissimilarity between desktop applications and
physics simulations
3Commodity-Off-The-Toy-Shelf (COTTS) Computing
- Benefits
- Economics of game consoles
- Similarity between video games and physics
simulations
- Obstacles
- Complexity of hardware
- Absence of scientific software (e.g. BLAS)
- Lifespan of product
4PS2 Architecture
16 MB RDRAM
16 MB RDRAM
Emotion Engine (EE)
Graphic Synthesizer
Video
IEEE 1394
Sound Processor
16 MB SDRAM
I/O Processor
Sound
USB
Controller
Operating System ROM
PCMCIA interface
DVD
5Emotion Engine (EE) Architecture
Vector Processing Unit 1 (VPU1)
Vector Processing Unit 0 (VPU0)
to Graphics Synthesizer
CPU
System Bus (128 bit)
Direct Memory Access Controller (DMAC)
Image Processing Unit
Memory Interface
I/O Interface
to Peripherals
to Main Memory
6Key Components
Vector Unit 0 (VU0)
Floating Point Unit (FPU)
Processor Core
Instruction Memory (4 kB)
Data Memory (4 kB)
Scratchpad RAM
Inst. Cache
Data Cache
Vector Interface 0 (VIF0)
CPU: Analogous to Pentium
VPU0: Macromode (coprocessor) or Micromode
(asynchronous w/ 2 instruction streams)
7Key Components
- System Bus
- 128bit
- 2.4 GB/sec transfer
- 10-channel DMA
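The bus figure can be sanity-checked with simple arithmetic. The sketch below assumes a bus clock of roughly 150 MHz, which is not stated on the slides, and one full-width transfer per clock cycle. Under those assumptions a 128-bit bus gives about 2.4 gigabytes per second.

```python
# Back-of-envelope check of the system-bus transfer rate.
BUS_WIDTH_BITS = 128   # from the slide
BUS_CLOCK_HZ = 150e6   # ASSUMPTION: ~150 MHz bus clock, not given in the slides


def peak_bandwidth_bytes_per_sec(width_bits: int, clock_hz: float) -> float:
    """Peak rate if one bus-width word moves on every clock cycle."""
    return (width_bits / 8) * clock_hz


peak = peak_bandwidth_bytes_per_sec(BUS_WIDTH_BITS, BUS_CLOCK_HZ)
print(f"{peak / 1e9:.1f} GB/s")
```

This is a peak number only; sustained rates depend on how the DMA channels are scheduled.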
Vector Unit 1 (VU1)
Instruction Memory (16 kB)
Data Memory (16 kB)
System Bus (128 bit)
Graphics Interface (GIF)
DMA
Memory Interface
Vector Interface 1 (VIF1)
to Main Memory
VU1: Micromode only
8How do we compare to a Pentium?
(MFLOPS = Million Floating Point Operations /
Second)
9How do we compare to a Pentium?
(MFLOPS = Million Floating Point Operations /
Second)
10How is the EE used by game designers?
Vector Processing Unit 1 (VPU1)
Vector Processing Unit 0 (VPU0)
to Graphics Synthesizer
CPU
System Bus (128 bit)
Direct Memory Access Controller (DMAC)
Image Processing Unit
Memory Interface
I/O Interface
to Peripherals
to Main Memory
11How is the EE used by game designers?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
- CPU: Control, basic physics
12How is the EE used by game designers?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
- CPU: Control, basic physics
- VPU0: Basic graphics transformation
13How is the EE used by game designers?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
- CPU: Control, basic physics
- VPU0: Basic graphics transformation
- VPU1: Further graphics transformation
14How is the EE used by game designers?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
- CPU: Control, basic physics
- VPU0: Basic graphics transformation
- VPU1: Further graphics transformation
- GS: Texturing
15How is the EE used by game designers?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
- General Characteristics
- One directional flow of data
- Each VPU does one task
16Ab Initio Molecular Dynamics (AIMD)
1-e-, 2-e- Integrals
MCSCF, DFT, etc.
Classical Propagation
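The three stages above form a loop: the electronic-structure step supplies forces, and the nuclei are then moved classically. A minimal sketch of that loop follows. It assumes velocity Verlet as the integrator and uses a harmonic force as a hypothetical stand-in for the integral and MCSCF/DFT steps; neither choice comes from the slides.

```python
def harmonic_force(x, k=1.0):
    """Hypothetical stand-in for the integrals + MCSCF/DFT force evaluation."""
    return [-k * xi for xi in x]


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Propagate positions x and velocities v classically for n_steps."""
    f = force(x)
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]               # drift
        f = force(x)                                             # new forces
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
    return x, v


def total_energy(x, v, mass, k=1.0):
    """Kinetic plus harmonic potential energy, used to monitor conservation."""
    kinetic = sum(0.5 * mass * vi * vi for vi in v)
    potential = sum(0.5 * k * xi * xi for xi in x)
    return kinetic + potential


x0, v0 = [1.0], [0.0]
x1, v1 = velocity_verlet(x0, v0, mass=1.0, dt=0.01, n_steps=1000)
```

In a real AIMD run the force call dominates the cost, which is why the later slides focus on where the integrals and linear algebra should execute.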
17How would a quantum chemist use EE for AIMD?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
System Bus
- CPU: Control, 1-e- integrals
18How would a quantum chemist use the EE?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
System Bus
- CPU: Control, 1-e- integrals
- VPUs: 2-e- integrals
19How would a quantum chemist use the EE?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
System Bus
- CPU: Control, 1-e- integrals
- VPUs: 2-e- integrals
- VPUs: Linear algebra (for MCSCF, etc.)
20How would a quantum chemist use the EE?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
System Bus
- CPU: Control, 1-e- integrals
- VPUs: 2-e- integrals
- VPUs: Linear algebra (for MCSCF, etc.)
- CPU: Classical propagation
21How would a quantum chemist use the EE?
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
System Bus
- General Characteristics
- Data flow is more bus-intensive/chaotic
- Each VPU does a few tasks
22Obstacles to Overcome
to Graphics Synthesizer
VPU1
CPU
VPU0
System Bus (128 bit)
Image Processing Unit
Memory Interface
I/O Interface
DMAC
to Peripherals
to Main Memory
23Obstacles to Overcome
to Graphics Synthesizer
VPU1
CPU
VPU0
System Bus (128 bit)
Image Processing Unit
Memory Interface
I/O Interface
DMAC
to Peripherals
to Main Memory
24Obstacles to Overcome
to Graphics Synthesizer
GIF
VPU1
CPU
VPU0
VIF0
VIF1
System Bus (128 bit)
Image Processing Unit
Memory Interface
I/O Interface
DMAC
to Peripherals
to Main Memory
25Dot Product Test Macromode
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
- CPU uses VPU0 like a coprocessor
- 4 element dot products
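To illustrate how a longer dot product maps onto 4-element operations, the sketch below splits the vectors into chunks of four and accumulates the partial results. The function names are hypothetical, and plain Python stands in for the coprocessor instruction; this is not the code used in the test.

```python
def dot4(a, b):
    """One 4-element dot product, the unit of work handed to the vector unit."""
    assert len(a) == len(b) == 4
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def chunked_dot(x, y):
    """Dot product of equal-length vectors built from 4-element pieces."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    pad = (-len(x)) % 4                 # zero-pad to a multiple of four
    x = list(x) + [0.0] * pad
    y = list(y) + [0.0] * pad
    total = 0.0
    for i in range(0, len(x), 4):
        total += dot4(x[i:i + 4], y[i:i + 4])
    return total


result = chunked_dot([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0, 1.0, 2.0])
```

Zero padding keeps every call at a fixed width, so the inner operation never needs a special case for the tail.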
26Dot Product Test Macromode
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
27Micromode
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
- VPUs run independently of CPU
- Data transfer controlled by either CPU or DMAC/VIF
28Matrix-Vector Multiplication Micromode
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
29Matrix-Matrix Multiplication Micromode
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0
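One way to fit matrix-matrix multiplication into a small local memory is to work on square tiles. The sketch below assumes 4-byte single-precision elements and three tiles resident at once (one each from A, B, and C); both are assumptions, as is the tiling scheme itself. The 16 kB figure is the VU1 data memory listed under Key Components.

```python
import math


def max_tile_dim(local_bytes, bytes_per_elem=4, n_tiles=3):
    """Largest b such that n_tiles tiles of b x b elements fit in local memory."""
    return math.isqrt(local_bytes // (bytes_per_elem * n_tiles))


def tiled_matmul(a, b, tile):
    """C = A * B computed tile by tile; a is n x k, b is k x m (lists of rows)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Work confined to one tile each of A, B, and C.
                for i in range(i0, min(i0 + tile, n)):
                    for kk in range(k0, min(k0 + tile, k)):
                        aik = a[i][kk]
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += aik * b[kk][j]
    return c


tile_dim = max_tile_dim(16 * 1024)   # VU1 data memory from the slides
product = tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1)
```

Under these assumptions the tile edge comes out at 36 elements, which shows how tightly the local memory constrains the block size.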
30Playing GAMESS
- GAMESS successfully ported to PS2
- Uses CPU only, not VPUs
- Test case: 11 steps of a HF geometry
optimization of butadiene
Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.;
Elbert, S. T.; Gordon, M. S.; Jensen, J. H.;
Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S.;
Windus, T. L.; Dupuis, M.; Montgomery, J. A.
J. Comp. Chem. 1993, 14, 1347.
31Playstation Cluster
- We installed MPICH on two PS2s.
- All tests ran successfully, with no modifications
at all.
- We built and ran a driver to calculate numerical
energy derivatives in parallel.
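A driver of this kind can be sketched as follows. The example assumes central finite differences and a round-robin split of coordinates over ranks; the slides do not say which scheme was used. Ranks are simulated with a plain loop so the sketch runs without MPI, and `energy` is a hypothetical stand-in for an electronic-structure call.

```python
def energy(coords):
    """Hypothetical stand-in for an electronic-structure energy evaluation."""
    return sum(c * c for c in coords)


def coords_for_rank(rank, size, n_coords):
    """Round-robin assignment of coordinate indices to a rank."""
    return range(rank, n_coords, size)


def partial_gradient(coords, rank, size, h=1e-4, energy_fn=energy):
    """Central-difference derivatives for this rank's coordinates, zero elsewhere."""
    grad = [0.0] * len(coords)
    for i in coords_for_rank(rank, size, len(coords)):
        plus = list(coords)
        minus = list(coords)
        plus[i] += h
        minus[i] -= h
        grad[i] = (energy_fn(plus) - energy_fn(minus)) / (2 * h)
    return grad


def parallel_gradient(coords, size=2):
    """Combine per-rank results; under MPI this sum would be a reduction."""
    partials = [partial_gradient(coords, r, size) for r in range(size)]
    return [sum(column) for column in zip(*partials)]


gradient = parallel_gradient([1.0, 2.0, 3.0], size=2)
```

Each displaced energy evaluation is independent, so the work divides cleanly across machines and only the final gradient needs to be communicated.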
32Playstation3 and Cell
256 MB RAM, 3.0 GHz, 8 SPU / 1 VMX / 1 PE, 400?, Spring 2006 /
Q1 2006
33Playstation3 and Cell
Collaboration with IBM Yorktown (Ashwini Nanda)
- Implemented DP matrix-matrix multiplication on Cell simulator
- 85% of peak performance (at 3.0 GHz, 80 GFlops)
Most PS2 problems go away!
- DMA supports transfer to/from SPU and CPU
- 256K local memory / SPU (compare to 32K on PS2 VU)
- 3.0 GHz CPU/SPU (compare to 266 MHz for PS2)
- Hardware DP floating point support
- Compiler support from IBM (no more assembler)
Remaining concerns
- Limited memory: 256 MB on PS3, 512 MB on Cell blades
34Conclusion
- Game consoles can potentially outperform
conventional x86 computers for scientific
computing, at a significantly lower cost.
- With what we have learned from the PlayStation 2
and collaborations with IBM on Cell, we will be
able to utilize the PlayStation 3 soon after its
release.
35Dot Product Test Micromode
to Graphics Synthesizer (GS)
VPU1
CPU
VPU0