Synchroscalar and Imagine - PowerPoint PPT Presentation

1
Synchroscalar and Imagine
  • Chris Thomas
  • Chris Chaney
  • 10/5/05

2
Background - Synchroscalar
  • ASICs can perform tasks with low power
  • DSPs are much more flexible, but offer lower performance
    and higher power per operation
  • ASICs are expensive to develop
  • Synchroscalar seeks ASIC-like performance and power
    with the flexibility of a DSP

3
Synchroscalar - Architecture
  • Array of processing elements arranged in a grid
  • Elements grouped into columns
  • Each column runs one stream of instructions
  • High bandwidth within each column (256-bit bus)
  • Segmented bus with static scheduling
  • Allows for broadcast or localized communication
  • Aims for high parallelism at a low clock rate
  • Each column has its own clock rate and supply voltage
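The segmented-bus bullet can be illustrated with a toy compile-time scheduler. This is a minimal sketch, not Synchroscalar's actual scheduler: it assumes a column is a linear bus whose segment i joins tile i and tile i+1, that a point-to-point transfer occupies every segment between source and destination, and that a broadcast occupies the whole column. The names `Transfer`, `segments_used`, and `schedule` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transfer:
    name: str
    src: int            # source tile index within one column
    dst: Optional[int]  # destination tile index; None means broadcast


def segments_used(t: Transfer, num_tiles: int) -> set[int]:
    """Bus segments a transfer occupies (assumed: segment i joins tiles i and i+1)."""
    if t.dst is None:
        # A broadcast ties up every segment of the column bus.
        return set(range(num_tiles - 1))
    lo, hi = sorted((t.src, t.dst))
    return set(range(lo, hi))


def schedule(transfers: list[Transfer], num_tiles: int) -> list[list[str]]:
    """Greedy first-fit static schedule: each transfer goes into the earliest
    cycle whose already-claimed segments it does not overlap."""
    cycles: list[tuple[set[int], list[str]]] = []
    for t in transfers:
        need = segments_used(t, num_tiles)
        for busy, names in cycles:
            if not (busy & need):
                busy |= need  # claim these segments for this cycle (in place)
                names.append(t.name)
                break
        else:
            cycles.append((set(need), [t.name]))
    return [names for _, names in cycles]


if __name__ == "__main__":
    demo = [Transfer("a", 0, 1), Transfer("b", 2, 3),
            Transfer("c", 0, 3), Transfer("bcast", 1, None)]
    print(schedule(demo, num_tiles=4))  # -> [['a', 'b'], ['c'], ['bcast']]
```

Transfers on disjoint segments (here `a` and `b`) share a cycle, which is the "localized communication" case; a broadcast needs the entire column bus and gets a cycle to itself.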

4
Synchroscalar - Architecture
5
Synchroscalar - Voltage/Freq
  • Each column has independent voltage/freq
  • Column frequencies chosen by analyzing producer/consumer
    relationships in the workload
  • Scale back column performance for less
    computationally intensive threads
  • Power drops steeply with frequency, because a lower
    frequency also permits a lower voltage (dynamic power ∝ V²f)
  • Up to 32% power savings on full applications
  • Up to 83% power savings on kernels
  • Power consumption only 8-30x higher than ASICs
  • Number of tiles was varied for each application
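The voltage/frequency reasoning can be made concrete with the textbook CMOS dynamic-power model P = C·V²·f. This sketch assumes (hypothetically) that supply voltage can fall linearly with frequency down to a floor `v_min`, that each column's required clock rate is its cycles per item times the stream's item rate, and that the baseline runs every column at full voltage with idle cycles perfectly clock-gated. The helper names and all numbers are illustrative, not taken from the presentation.

```python
from __future__ import annotations


def column_freq(cycles_per_item: float, items_per_sec: float) -> float:
    """Clock rate a column needs to keep pace with the stream it consumes."""
    return cycles_per_item * items_per_sec


def scaled_voltage(freq: float, f_max: float, v_max: float, v_min: float) -> float:
    """Assumed linear V-f relationship, clamped at a minimum voltage."""
    return max(v_min, v_max * freq / f_max)


def dynamic_power(v: float, freq: float, c_eff: float = 1.0) -> float:
    """Classic dynamic power: P = C * V^2 * f."""
    return c_eff * v * v * freq


def savings(cycles_per_item: list[float], items_per_sec: float,
            f_max: float, v_max: float, v_min: float) -> float:
    """Fraction of power saved by per-column V/f scaling versus running every
    column at v_max (idle cycles assumed perfectly clock-gated)."""
    base = scaled = 0.0
    for cpi in cycles_per_item:
        f = column_freq(cpi, items_per_sec)
        if f > f_max:
            raise ValueError("column cannot keep up even at f_max")
        base += dynamic_power(v_max, f)
        scaled += dynamic_power(scaled_voltage(f, f_max, v_max, v_min), f)
    return 1.0 - scaled / base


if __name__ == "__main__":
    # Three hypothetical columns needing 300, 150 and 75 MHz at 1M items/s.
    print(round(savings([300, 150, 75], 1e6, 300e6, 1.0, 0.5), 3))
```

In this example the two slower columns drop to the 0.5 V floor, which cuts their power by 4x and the total by roughly a third. Because voltage enters squared, slowing a lightly loaded column pays off far more than the frequency reduction alone.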

6
Synchroscalar - Results
7
Background - Imagine
  • Aims to provide very high performance on data-level
    parallel (DLP) workloads
  • Media processing applications are a common workload
  • High ratio of computation to memory bandwidth
  • Operations are mostly independent and very latency tolerant
  • Aims for ASIC-like performance with more flexibility
  • High percentage of area devoted to ALUs

8
Imagine - Architecture
  • 48 ALUs organized into 8 clusters
  • Each ALU has a two-ported local register file (LRF)
  • ALU outputs connect to every LRF in the cluster
  • Aggregate LRF bandwidth: 544 GB/s
  • Each cluster has 3 adders, 2 multipliers, and a
    divide/square-root unit
  • Each cluster also has a scratchpad register file
    and an inter-cluster communication unit
  • Clusters execute 576-bit VLIW instructions
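The cluster bullets imply 6 ALUs per cluster (3 + 2 + 1), and 8 such clusters give the 48 ALUs on the slide. The sketch below checks that arithmetic and models one instruction stream driving all clusters at once. The SIMD mapping (stream element i handled by cluster i mod 8) is an assumption for illustration, not stated on the slide, and `Cluster`, `total_alus`, and `run_kernel_simd` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Cluster:
    """Functional units per cluster, as listed on the slide."""
    adders: int = 3
    multipliers: int = 2
    div_sqrt: int = 1

    @property
    def alus(self) -> int:
        return self.adders + self.multipliers + self.div_sqrt


NUM_CLUSTERS = 8


def total_alus(clusters: int = NUM_CLUSTERS, cluster: Cluster = Cluster()) -> int:
    return clusters * cluster.alus


def run_kernel_simd(stream: Sequence, kernel: Callable,
                    clusters: int = NUM_CLUSTERS) -> tuple[list, int]:
    """Assumed SIMD model: element i goes to cluster i % clusters and every
    cluster applies the same kernel in lockstep. Returns the output stream
    and the number of lockstep steps (groups of `clusters` elements)."""
    out: list = [None] * len(stream)
    steps = 0
    for base in range(0, len(stream), clusters):
        steps += 1
        for c in range(clusters):  # conceptually parallel across clusters
            i = base + c
            if i < len(stream):
                out[i] = kernel(stream[i])
    return out, steps


if __name__ == "__main__":
    print(total_alus())  # -> 48
    print(run_kernel_simd(list(range(20)), lambda x: x * x)[1])  # -> 3
```

A 20-element stream takes 3 lockstep steps: two full groups of 8 and a partial group of 4, during which half the clusters idle.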

9
Imagine - Architecture
  • Large stream register file (SRF), 128 KB
  • Single-ported
  • 22 logical ports
  • Each buffer reads 2 blocks of 32 words
  • Buffers contend for the single physical port
  • This works because accesses are streaming and can be
    prefetched very effectively
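Why a single physical port can serve 22 logical ports is easiest to see in a small simulation. This sketch assumes the physical port moves one 32-word block per cycle to one stream buffer chosen round-robin, that each buffer holds 2 blocks (as on the slide), and that each buffer's consumer takes one word per cycle. The timing model and the `simulate` function are hypothetical, not Imagine's actual arbiter.

```python
from __future__ import annotations

BLOCK_WORDS = 32        # words per SRF block (from the slide)
BLOCKS_PER_BUFFER = 2   # each stream buffer holds 2 blocks (from the slide)


def simulate(num_buffers: int, cycles: int) -> tuple[int, int]:
    """Hypothetical timing model of one physical SRF port shared by
    `num_buffers` logical stream buffers.

    Each cycle: (1) a round-robin arbiter grants the port to the next buffer
    with room for a whole block, which receives BLOCK_WORDS words;
    (2) every buffer's consumer then tries to take one word.
    Returns (words_delivered, consumer_stall_cycles)."""
    capacity = BLOCK_WORDS * BLOCKS_PER_BUFFER
    fill = [0] * num_buffers  # words currently held by each buffer
    next_grant = 0            # round-robin pointer
    delivered = stalls = 0
    for _ in range(cycles):
        # Arbitration: at most one block moves through the single port.
        for k in range(num_buffers):
            b = (next_grant + k) % num_buffers
            if capacity - fill[b] >= BLOCK_WORDS:
                fill[b] += BLOCK_WORDS
                next_grant = (b + 1) % num_buffers
                break
        # Consumption: each logical port wants one word this cycle.
        for b in range(num_buffers):
            if fill[b] > 0:
                fill[b] -= 1
                delivered += 1
            else:
                stalls += 1  # consumer waits: its buffer ran dry
    return delivered, stalls


if __name__ == "__main__":
    print(simulate(22, 1000))  # stalls only while buffers first fill
    print(simulate(40, 1000))  # demand exceeds one block per cycle
```

With 22 buffers, demand (22 words per cycle) stays below what the port can supply, so consumers stall only during the initial fill; with 40 buffers, demand exceeds supply and stalls keep accumulating. The double-buffering matters: each buffer can accept its next block while its consumer drains the current one, which is what lets sequential stream accesses be prefetched.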

10
Imagine - Architecture
11
Imagine - Results
  • Comparison to NVIDIA Quadro
  • One-fifth the speed in raster-limited scenes
  • Faster in geometry-limited scenes