Datapath Synthesis of VLIW Video Signal Processor

1
Datapath Synthesis of VLIW Video Signal Processor
Zhao Wu and Wayne Wolf
Dept. of Electrical Engineering, Princeton University
2
Outline
  • Introduction
  • Architectural paradigm
  • Trace-driven simulation
  • Performance estimation
  • Conclusions

3
Introduction
  • Why a programmable VSP?
    • intensive computation
    • complex and diverse video applications
    • increased development cost
    • time-to-market pressure
  • Why VLIW?
    • easy to implement in hardware
    • high speed
    • high degree of instruction-level parallelism (ILP) available in video applications

4
Architectural Paradigm
5
Architectural Parameters
  • Register file
    • number of registers
  • Functional units
    • number and type of functional units
  • Interconnect
    • number of clusters
    • interconnect mechanism
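Together, these parameters fix one point in the design space. The following minimal Python sketch shows such a resource description. It assumes a simple record holding a register count, per-type functional-unit counts, a cluster count and an interconnect label. The class `ResourceDescription`, its fields, and the example values are hypothetical, not the authors' format.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ResourceDescription:
    """Hypothetical record of the architectural parameters on this slide."""
    num_registers: int                              # register file: number of registers
    fu_counts: dict = field(default_factory=dict)   # FU type -> number of units
    num_clusters: int = 1                           # interconnect: number of clusters
    interconnect: str = "unspecified"               # interconnect mechanism (label only)

    def issue_width(self) -> int:
        # At most one operation per functional unit per cycle.
        return sum(self.fu_counts.values())

    def with_units(self, fu_type: str, count: int) -> "ResourceDescription":
        # New design point with one FU type resized; the original is left unchanged.
        counts = dict(self.fu_counts)
        counts[fu_type] = count
        return replace(self, fu_counts=counts)


# Illustrative values only.
base = ResourceDescription(num_registers=64, fu_counts={"alu": 8, "mem": 2, "branch": 1})
bigger = base.with_units("alu", 16)
print(base.issue_width(), bigger.issue_width())  # 11 19
```

Resizing one unit type at a time mirrors the "increase resource" and "decrease resource" steps used later for performance estimation.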

6
Impact on MPEG-2 Encoder
7
Trace-Driven Scheduling
  • Binary program prog
    • run pixie -idtrace → instrumented program prog.pixie → dynamic trace
    • run dis -h → disassembled program prog.asm
  • Scheduler
    • inputs: dynamic trace, disassembled program prog.asm, resource description
    • output: result statistics
8
Block Diagram of the Scheduler
  • Inputs: disassembled program, program trace, resource description
  • Assembly code parser
  • Dependency analyzer
  • VLIW scheduler, keeping a scheduling record
  • Resource manager
    • register manager with register scoreboard
    • memory manager with memory scoreboard
    • functional unit manager with reservation station
  • Output: result statistics
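The dependency analyzer, the scoreboards and the functional-unit manager work together, which can be sketched as a greedy trace scheduler. This minimal Python sketch rests on assumptions not taken from the slides:

- a hypothetical `TraceOp` record per dynamic instruction
- unit latency
- perfect register renaming, so only read-after-write register dependences are enforced
- an unbounded scheduling window
- no register-count limit

Each operation is placed in the earliest cycle its dependences allow that still has a free unit of its type.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceOp:
    """One dynamic instruction from the trace (hypothetical format)."""
    fu: str                       # functional-unit type, e.g. "alu" or "mem"
    dests: tuple = ()             # registers written
    srcs: tuple = ()              # registers read
    load: Optional[int] = None    # address read, if this is a load
    store: Optional[int] = None   # address written, if this is a store


def schedule(trace, fu_counts, latency=1):
    """Greedily place each traced op in the earliest legal cycle."""
    reg_ready = {}                # register scoreboard: reg -> cycle its value is available
    mem_ready = {}                # memory scoreboard: addr -> cycle stored value is available
    last_load = {}                # memory scoreboard: addr -> latest cycle it was read
    last_store = {}               # memory scoreboard: addr -> latest cycle it was written
    usage = defaultdict(Counter)  # reservation table: cycle -> busy units per FU type
    record = []                   # scheduling record: issue cycle of each traced op

    for op in trace:
        # Dependency analysis: true dependences through registers and memory.
        earliest = max((reg_ready.get(r, 0) for r in op.srcs), default=0)
        if op.load is not None:
            earliest = max(earliest, mem_ready.get(op.load, 0))
        if op.store is not None:
            # A store may not pass an earlier load (WAR) or store (WAW) to the same address.
            earliest = max(earliest, last_load.get(op.store, 0),
                           last_store.get(op.store, -1) + 1)

        # Functional-unit manager: slide forward until a unit of this type is free.
        cycle = earliest
        while usage[cycle][op.fu] >= fu_counts[op.fu]:
            cycle += 1
        usage[cycle][op.fu] += 1

        # Update the scoreboards.
        for r in op.dests:
            reg_ready[r] = cycle + latency
        if op.load is not None:
            last_load[op.load] = max(last_load.get(op.load, 0), cycle)
        if op.store is not None:
            mem_ready[op.store] = cycle + latency
            last_store[op.store] = max(last_store.get(op.store, -1), cycle)
        record.append(cycle)
    return record, usage


trace = [
    TraceOp("alu", dests=("r1",), srcs=("r2", "r3")),
    TraceOp("alu", dests=("r4",), srcs=("r5",)),
    TraceOp("alu", dests=("r6",), srcs=("r1", "r4")),
    TraceOp("mem", srcs=("r6",), store=0x100),
    TraceOp("mem", dests=("r7",), load=0x100),
]
record, usage = schedule(trace, {"alu": 2, "mem": 1})
print(record)  # [0, 0, 1, 2, 3]
```

With two ALUs, the two independent ALU operations share cycle 0. With one ALU, the same trace serializes to `[0, 1, 2, 3, 4]`. The run time is one pass over the trace plus the search for a free unit.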
9
Features of the Scheduler
  • (Relatively) fast
    • instrumentation rather than interpretation
    • run time linear in trace length
  • Moderate memory requirement
    • pipelining saves storage
  • Large scheduling window
    • up to 109 instructions
    • simulates both a VLIW compiler and a VLIW processor
  • Realistic model
    • limited resources

10
Performance Estimation
  • Why do we need performance estimation?
    • trace-driven simulation is too slow (traces are very long)
    • the design space is too large
  • How do we estimate?
    • start from full-length trace simulation results
    • increase resource → lower bound on cycle count
    • decrease resource → upper bound on cycle count

Figure: the target design lies between a smaller design and a bigger design
11
IPC Histogram of ALU
Average IPC_ALU = 11.47
Average IPC_ALU = 13.24
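An average IPC figure like the ones above can be derived from a per-cycle histogram. This minimal Python sketch assumes the average is total ALU operations divided by total cycles; the slides do not say whether idle cycles are counted. `ops_per_cycle` is made-up data, not the MPEG-2 results.

```python
from collections import Counter


def ipc_histogram(ops_per_cycle):
    """Count how many cycles issued exactly k operations of one FU type."""
    return Counter(ops_per_cycle)


def average_ipc(hist):
    """Total operations divided by total cycles (idle cycles included)."""
    total_cycles = sum(hist.values())
    total_ops = sum(k * n for k, n in hist.items())
    return total_ops / total_cycles


ops_per_cycle = [8, 8, 3, 0, 5, 8]   # made-up per-cycle ALU op counts
hist = ipc_histogram(ops_per_cycle)
print(hist[8], round(average_ipc(hist), 3))  # 3 5.333
```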
12
Increase and Decrease Resources
13
Decrease resource
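  • Split cycles that issue more FU operations than the new limit, and retime
  • 16 → 8 + 8, 15 → 8 + 7, 14 → 8 + 6, 13 → 8 + 5, 12 → 8 + 4, …
  • Why an upper bound on cycle count?
    • the leftover 7, 6, 5, 4, … operations could be combined with cycles issuing 1, 2, 3, 4, … operations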
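The split-and-retime rule turns a simulated histogram into an upper bound for a design with fewer units of one type. With m units, a cycle that issued k operations now needs ceil(k / m) cycles. This minimal Python sketch assumes the histogram maps k to the number of cycles that issued k operations of that type. It also assumes cycles with no such operations are unaffected. The function name and data are hypothetical.

```python
def decrease_upper_bound(hist, new_units):
    """Upper bound on cycle count after reducing one FU type to new_units units.

    hist maps k -> number of cycles that issued exactly k ops of that FU type.
    A cycle with k > new_units ops is split into ceil(k / new_units) cycles
    (e.g. 16 -> 8 + 8, 12 -> 8 + 4 when new_units == 8); other cycles are kept.
    """
    total = 0
    for k, n in hist.items():
        pieces = max(1, -(-k // new_units))   # integer ceil(k / new_units), at least one cycle
        total += n * pieces
    return total


hist_16 = {16: 10, 12: 5, 4: 20, 1: 2}       # made-up histogram from a 16-unit design
print(sum(hist_16.values()), decrease_upper_bound(hist_16, 8))  # 37 52
```

The estimate is only an upper bound. The 4 leftover operations of a split 12-operation cycle might share a cycle with a neighboring 4-operation cycle, which this count ignores.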

14
Increase resource
Figure: this cycle removed
  • T_new = T_old − T_8
  • 16 ← 8 + 8, 15 ← 8 + 7, 14 ← 8 + 6, 13 ← 8 + 5, 12 ← 8 + 4, …
  • Why a lower bound on cycle count?
    • sometimes cycles can't be merged (e.g., when increasing from 8 to 12 units)
    • sometimes there is no parallelism to exploit
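The cycle-removal formula can also be expressed in histogram form. This Python sketch reads T_8 as the number of saturated cycles, meaning cycles that issued operations on all 8 units in the simulated design. It also assumes each saturated cycle can absorb at most one following cycle. Both readings are interpretations of the slide, and the names and data are hypothetical.

```python
def increase_lower_bound(hist, old_units):
    """Lower bound on cycle count after adding units of one FU type.

    Follows T_new = T_old - T_n, reading T_n as the number of saturated cycles
    (cycles that issued exactly old_units ops). Best case: every saturated cycle
    merges with its successor, removing one cycle each.
    """
    t_old = sum(hist.values())
    t_saturated = hist.get(old_units, 0)
    return t_old - t_saturated


hist_8 = {8: 10, 5: 4, 2: 6}                 # made-up histogram from an 8-unit design
print(increase_lower_bound(hist_8, 8))       # 10
```

The estimate does not depend on the new unit count, which is one reason it is only a bound. When going from 8 to 12 units, two saturated cycles (8 + 8 = 16 operations) cannot be merged.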

15
Change More Than One Resource
  • Must account for the correlation between resources
    • dep(res1, res2, n) = number of cycles in which at least one res1-instruction depends on n res2-instructions
  • Combine several bounds into one semi-bound
    • increase resource (m > n)
    • decrease resource (m < n)
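The dep(res1, res2, n) statistic can be read off a scheduling record. This minimal Python sketch makes two assumptions:

- Each cycle is stored as a list of pairs: an instruction's FU type, and the FU types of the instructions it depends on.
- "Depends on n res2-instructions" means exactly n.

The record layout, that reading, and the name `dep_count` are all hypothetical.

```python
def dep_count(cycles, res1, res2, n):
    """Number of cycles in which at least one res1-instruction depends on
    exactly n res2-instructions.

    cycles: list of cycles; each cycle is a list of (fu_type, producer_fu_types)
    pairs, where producer_fu_types lists the FU type of every producer.
    """
    count = 0
    for cycle in cycles:
        if any(fu == res1 and sum(1 for p in producers if p == res2) == n
               for fu, producers in cycle):
            count += 1
    return count


cycles = [                                   # made-up scheduling record
    [("mem", ["alu", "alu"]), ("alu", [])],
    [("mem", ["alu"])],
    [("alu", ["mem"]), ("mem", ["alu", "alu"])],
]
print(dep_count(cycles, "mem", "alu", 2), dep_count(cycles, "mem", "alu", 1))  # 2 1
```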

16
Results
17
Conclusions
  • Trace-driven simulation
    • quantitative evaluation of an architecture
    • too slow to apply to every possible design
  • Performance estimation
    • based on simulation results
    • automated procedure
    • accurate enough