Evaluating the Imagine Stream Processor - PowerPoint PPT Presentation

About This Presentation
Title:

Evaluating the Imagine Stream Processor

Description:

Title: Networks-on-Chip (NoCs)
Author: Gracjan
Last modified by: Kevin Skadron
Created Date: 8/16/2006 12:00:00 AM
Document presentation format: On-screen Show

Slides: 25
Provided by: Grac65

Transcript and Presenter's Notes

1
Evaluating the Imagine Stream Processor
  • Jung Ho Ahn, William J. Dally, Brucek Khailany,
    Ujval J. Kapasi, and Abhishek Das
  • ISCA 2004

2
Motivation
  • Provide the efficiency of an ASIC
  • Provide the flexibility of a programmable processor
  • Simplify special-purpose processor design
  • Lower special-purpose processor design cost
  • Provide better applicability
  • Target media applications

3
Stream Architecture
4
Development Board
  • PowerPC host, 150 MHz
  • 2 x Imagine, 200 MHz
  • FPGA bridge, 66 MHz
  • 256 MB of SDRAM per Imagine, 100 MHz
5
Applications
6
Mapping
7
Execution on a Single Stream
(Figure: Kernel 1 reads an input stream from the SRF and writes an output stream, running iterations 1 through n.)
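The single-kernel pattern can be sketched as a toy Python model. This is not Imagine's actual programming interface: the SRF is modeled as a dictionary of lists, each iteration is assumed to consume one input record and produce one output record, and `run_kernel` and the stream names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical model: the SRF is a dict mapping stream names to records.
# On Imagine the SRF is an on-chip memory; this is only an illustration.
SRF = Dict[str, List[float]]


def run_kernel(srf: SRF, kernel: Callable[[float], float],
               in_name: str, out_name: str) -> None:
    """Run one kernel over a whole stream: iteration i consumes input
    record i and produces output record i (iterations 1..n)."""
    output: List[float] = []
    for record in srf[in_name]:
        output.append(kernel(record))
    srf[out_name] = output


srf: SRF = {"input": [1.0, 2.0, 3.0]}
run_kernel(srf, lambda x: 2.0 * x + 1.0, "input", "output")
print(srf["output"])  # [3.0, 5.0, 7.0]
```

The one-record-in, one-record-out assumption keeps the sketch short; a real kernel may read or write a different number of records per iteration.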



8
Execution of Multiple Kernels
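(Figure: Kernels 1, 2, and 3 process Streams 1 through 4 held in the SRF. Kernel 1 reads Stream 1 and writes Stream 2, Kernel 2 reads Stream 2 and writes Stream 3, and Kernel 3 reads Stream 3 and writes Stream 4.)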
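Chaining kernels through the SRF can be sketched the same way. In this hypothetical Python model (the `run_chain` helper and the `streamN` names are illustrative, not Imagine's API), kernel k reads stream k and writes stream k+1, so intermediate streams never leave the modeled SRF.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical model: the SRF holds named streams as Python lists.
SRF = Dict[str, List[int]]
Kernel = Callable[[int], int]


def run_kernel(srf: SRF, kernel: Kernel, in_name: str, out_name: str) -> None:
    """Apply a kernel record by record and store its output stream in the SRF."""
    srf[out_name] = [kernel(record) for record in srf[in_name]]


def run_chain(srf: SRF, kernels: Sequence[Kernel]) -> None:
    """Kernel k reads stream k and writes stream k+1. In this simplified
    model only stream 1 and the final stream would need to move to or
    from off-chip memory; the intermediate streams stay in the SRF."""
    for k, kernel in enumerate(kernels, start=1):
        run_kernel(srf, kernel, f"stream{k}", f"stream{k + 1}")


srf: SRF = {"stream1": [1, 2, 3, 4]}
run_chain(srf, [lambda x: x + 1, lambda x: x * x, lambda x: x - 1])
print(srf["stream4"])  # [3, 8, 15, 24]
```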

9
Application Performance
(Chart: application performance, in GOPS and GFLOPS.)
10
Sources of Overhead
11
Stream Length Effects
12
Access Pattern Effects
13
Energy Efficiency
  • Energy consumption per FLOP, normalized to a 0.13 µm, 1.2 V process
  • Imagine @ 200 MHz: 277 pJ/FLOP
  • TI C67x DSP @ 225 MHz: 889 pJ/FLOP (3.2x more)
  • Intel Pentium M @ 1.2 GHz: 3600 pJ/FLOP (13x more)
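The "3.2x" and "13x" figures follow from dividing each processor's normalized energy per FLOP by Imagine's. A small Python check, using only the three values listed on this slide; the `ratio_to` and `joules_for` helpers are illustrative.

```python
from typing import Dict

# Energy per FLOP from the slide, normalized to a 0.13 um, 1.2 V process.
ENERGY_PJ_PER_FLOP: Dict[str, float] = {
    "Imagine @ 200 MHz": 277.0,
    "TI C67x DSP @ 225 MHz": 889.0,
    "Intel Pentium M @ 1.2 GHz": 3600.0,
}


def ratio_to(baseline: str) -> Dict[str, float]:
    """Energy per FLOP of each processor divided by the baseline's."""
    base = ENERGY_PJ_PER_FLOP[baseline]
    return {name: pj / base for name, pj in ENERGY_PJ_PER_FLOP.items()}


def joules_for(flops: float, pj_per_flop: float) -> float:
    """Total energy in joules for a given FLOP count (1 pJ = 1e-12 J)."""
    return flops * pj_per_flop * 1e-12


for name, r in ratio_to("Imagine @ 200 MHz").items():
    print(f"{name}: {r:.1f}x")
# One billion FLOPs on Imagine at 277 pJ/FLOP is about 0.277 J.
print(f"{joules_for(1e9, 277.0):.3f} J")
```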

14
Memory Bandwidth Requirement
15
Host Processor Bandwidth Requirement
16
Programming Model
17
Compiler Optimizations: Stream Ordering
18
Compiler Optimizations: SRF Overlapping and Packing
19
Compiler Optimizations: Strip-mining
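Strip-mining splits a stream that is too long to fit in the SRF into fixed-size strips that are processed one after another. A minimal Python sketch of the general technique, assuming a hypothetical `strip_len` that stands in for SRF capacity; how the actual Imagine tools choose strip sizes is not modeled here.

```python
from typing import Callable, Iterator, List, Sequence


def strips(stream: Sequence[int], strip_len: int) -> Iterator[Sequence[int]]:
    """Yield consecutive strips of at most `strip_len` records."""
    if strip_len <= 0:
        raise ValueError("strip_len must be positive")
    for start in range(0, len(stream), strip_len):
        yield stream[start:start + strip_len]


def strip_mined(stream: Sequence[int], kernel: Callable[[int], int],
                strip_len: int) -> List[int]:
    """Process a long stream one strip at a time.

    `strip_len` is a hypothetical stand-in for how many records fit in
    the SRF at once; each strip is processed completely before the
    next one is handled."""
    result: List[int] = []
    for strip in strips(stream, strip_len):
        result.extend(kernel(x) for x in strip)
    return result


data = list(range(10))
print([list(s) for s in strips(data, 4)])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(strip_mined(data, lambda x: 3 * x, strip_len=4))
```

Strip-mining does not change the result, only the working-set size: the output matches applying the kernel to the whole stream at once.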
20
Compiler Optimizations: Loop Unrolling and
Software Pipelining
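The two transformations named on this slide can be illustrated at the source level. This Python sketch shows the general techniques, not Imagine's kernel compiler: the function names are hypothetical, and because Python runs statements sequentially, the comments describe what a scheduler could overlap on real hardware rather than what happens here.

```python
from typing import Callable, List, Sequence


def sum_squares_unrolled(xs: Sequence[int]) -> int:
    """Sum of squares with the loop unrolled by 4.

    Four independent accumulators remove the dependence between
    consecutive iterations, so in compiled code a scheduler could
    issue them to different ALUs in the same cycle."""
    acc0 = acc1 = acc2 = acc3 = 0
    i, n = 0, len(xs)
    while i + 4 <= n:              # unrolled body: four iterations at once
        acc0 += xs[i] * xs[i]
        acc1 += xs[i + 1] * xs[i + 1]
        acc2 += xs[i + 2] * xs[i + 2]
        acc3 += xs[i + 3] * xs[i + 3]
        i += 4
    total = acc0 + acc1 + acc2 + acc3
    while i < n:                   # remainder loop for the last n % 4 items
        total += xs[i] * xs[i]
        i += 1
    return total


def software_pipelined(xs: Sequence[int], f: Callable[[int], int]) -> List[int]:
    """Two-stage software pipeline: stage 1 loads element i + 1 while
    stage 2 computes on element i, with a prologue and an epilogue."""
    out: List[int] = []
    if not xs:
        return out
    loaded = xs[0]                           # prologue: first load
    for i in range(len(xs) - 1):
        current, loaded = loaded, xs[i + 1]  # load for iteration i + 1
        out.append(f(current))               # compute for iteration i
    out.append(f(loaded))                    # epilogue: last compute
    return out


print(sum_squares_unrolled(list(range(10))))            # 285
print(software_pipelined([1, 2, 3], lambda x: x * 10))  # [10, 20, 30]
```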
21
Conclusions
  • Provides performance close to that of an ASIC, with
    flexibility through programmability
  • Sustains between 16% and 60% of peak arithmetic
    performance
  • Exposed two-level register file lets the compiler
    exploit locality
  • Broader applicability
  • Requires considerable programming effort
  • Limited to media applications with regular
    control flow

22
Collab Questions
  • How does the performance compare to other
    processors? (Dan, Marko, Jason, Prateeksha,
    Chris)
  • What is the compiler efficiency? (Mario, Liang)
  • How were the design decisions motivated? (Jing,
    Marisabel)
  • How does the programming model compare to that of
    GPUs? (Greg)

23
(No Transcript)
24
Kernels