Title: Evaluating the Imagine Stream Processor
1Evaluating the Imagine Stream Processor
- Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das
- ISCA 2004
2Motivation
- Provide efficiency of an ASIC
- Provide flexibility of a programmable processor
- Simplify special-purpose processor design
- Lower special-purpose processor design cost
- Provide better applicability
- Target media applications
3Stream Architecture
4Development Board
- PowerPC, 150 MHz
- 2 x Imagine, 200 MHz
- FPGA Bridge, 66 MHz
- 256MB of SDRAM / Imagine, 100 MHz
5Applications
6Mapping
7Execution on a Single Stream
[Figure: Kernel 1 and the SRF, with an Input Stream and an Output Stream processed over Iteration 1 through Iteration n]
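The diagram labels above can be read as a kernel that runs once per iteration, consuming records of an input stream and producing records of an output stream. The following is a minimal sketch in plain Python, not Imagine's actual programming interface; `run_kernel` and `scale` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Iterable, List


def run_kernel(kernel: Callable[[float], float],
               input_stream: Iterable[float]) -> List[float]:
    """Run `kernel` once per input record (iterations 1..n)."""
    output_stream: List[float] = []
    for record in input_stream:
        # One loop iteration: read one record, write one result.
        output_stream.append(kernel(record))
    return output_stream


def scale(x: float) -> float:
    """Hypothetical kernel body: doubles each record."""
    return 2.0 * x


out = run_kernel(scale, [1.0, 2.0, 3.0])
```

The sketch processes one record at a time; it does not model the parallel hardware.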
8Execution of Multiple Kernels
[Figure: the SRF with Stream 1 through Stream 4, processed by Kernel 1, Kernel 2, and Kernel 3]
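One plausible reading of the labels above is a chain in which each kernel consumes one stream and produces the next, with every stream held in the SRF. The sketch below assumes exactly that ordering (Kernel i reads Stream i and writes Stream i+1), which the slide text does not state explicitly. The dictionary `srf` and the function `run_pipeline` are hypothetical stand-ins, not part of Imagine's toolchain.

```python
from typing import Callable, Dict, List, Sequence


def run_pipeline(kernels: Sequence[Callable[[int], int]],
                 stream_1: Sequence[int]) -> Dict[str, List[int]]:
    """Chain kernels, keeping every intermediate stream in `srf`."""
    # Assumption: the SRF is modeled as a name -> stream mapping.
    srf: Dict[str, List[int]] = {"stream_1": list(stream_1)}
    for i, kernel in enumerate(kernels, start=1):
        source = srf[f"stream_{i}"]
        # Kernel i writes its results as stream i+1.
        srf[f"stream_{i + 1}"] = [kernel(record) for record in source]
    return srf


srf_state = run_pipeline(
    [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3],
    [1, 2, 3],
)
```

With three kernels, `srf_state` ends up holding four streams, matching the four stream labels in the diagram.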
9Application Performance
GOPS 18
GFLOPS 60
10Sources of Overhead
11Stream Length Effects
12Access Pattern Effects
13Energy Efficiency
- Energy consumption per FLOP (when normalized to a 0.13 µm, 1.2 V process)
- Imagine @ 200 MHz: 277 pJ/FLOP
- TI C67x DSP @ 225 MHz: 889 pJ/FLOP (3.2x more)
- Intel Pentium M @ 1200 MHz: 3600 pJ/FLOP (13x more)
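The "x more" figures follow directly from dividing each processor's energy per FLOP by Imagine's. A short check of that arithmetic, using only the numbers listed above (the variable names are illustrative):

```python
# Energy per FLOP in pJ, as listed on the slide.
energy_pj_per_flop = {
    "Imagine": 277.0,
    "TI C67x DSP": 889.0,
    "Intel Pentium M": 3600.0,
}

baseline = energy_pj_per_flop["Imagine"]

# Ratio of each processor's energy per FLOP to Imagine's.
ratios = {name: energy / baseline
          for name, energy in energy_pj_per_flop.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```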
14Memory Bandwidth Requirement
15Host Processor Bandwidth Requirement
16Programming Model
17Compiler Optimizations: Stream Ordering
18Compiler Optimizations: SRF Overlapping and Packing
19Compiler Optimizations: Strip-mining
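Strip-mining, in general, splits a long data set into fixed-size strips so that each strip fits in a limited on-chip store. The sketch below shows the general technique in plain Python, not the Imagine compiler's implementation; `strip_size` is a hypothetical parameter standing in for however much of the stream fits at once.

```python
from typing import Callable, Iterator, List, Sequence


def strip_mine(stream: Sequence[int], strip_size: int) -> Iterator[List[int]]:
    """Yield consecutive strips of at most `strip_size` records."""
    for start in range(0, len(stream), strip_size):
        yield list(stream[start:start + strip_size])


def process_strip_mined(kernel: Callable[[int], int],
                        stream: Sequence[int],
                        strip_size: int) -> List[int]:
    """Apply `kernel` strip by strip and concatenate the results."""
    output: List[int] = []
    for strip in strip_mine(stream, strip_size):
        output.extend(kernel(record) for record in strip)
    return output


strips = list(strip_mine(list(range(10)), 4))
result = process_strip_mined(lambda x: x * x, list(range(10)), 4)
```

The last strip may be shorter than `strip_size`; the result is the same as applying the kernel to the whole stream at once.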
20Compiler Optimizations: Loop Unrolling and Software Pipelining
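As a generic illustration of loop unrolling only (software pipelining is not shown), the sketch below unrolls a summation loop by a factor of four and handles leftover elements in a cleanup loop. This is a hand-written example of the general technique, not output of the Imagine compiler; the function name and unroll factor are arbitrary choices.

```python
from typing import Sequence


def sum_unrolled_by_4(values: Sequence[int]) -> int:
    """Sum `values` with the loop body unrolled four times."""
    total = 0
    n = len(values)
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        # Four elements handled per loop iteration.
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
    for i in range(main, n):
        # Cleanup loop for the remaining 0 to 3 elements.
        total += values[i]
    return total


unrolled_total = sum_unrolled_by_4(list(range(10)))
```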
21Conclusions
- Provides performance close to that of an ASIC and flexibility via programming
- Can sustain between 16% and 60% of the peak arithmetic performance
- Exposed 2-level register file allows compiler to exploit locality
- Broader applicability
- Requires considerable programming effort
- Limited to media applications with regular control-flow
22Collab Questions
- How does the performance compare to other processors? (Dan, Marko, Jason, Prateeksha, Chris)
- What is the compiler efficiency? (Mario, Liang)
- How were the design decisions motivated? (Jing, Marisabel)
- How does the programming model compare to that of GPUs? (Greg)
24Kernels