A High Performance Application Representation for Reconfigurable Systems



1
A High Performance Application Representation
for Reconfigurable Systems
  • Wenrui Gong, Gang Wang, Ryan Kastner
  • Department of Electrical and Computer
    Engineering, University of California, Santa
    Barbara, CA 93106-9560
  • {gong, wanggang, kastner}@ece.ucsb.edu
  • http://express.ece.ucsb.edu
  • June 22, 2004

2
Outline
  • Reconfigurable computing systems
  • Compilation process
  • Synthesizing to hardware
  • Experimental results
  • Concluding remarks

3
Outline
  • Reconfigurable computing systems
      • Reconfigurable computing systems
      • Challenges of application representations
  • Compilation process
  • Synthesizing to hardware
  • Experimental results
  • Concluding remarks

4
Reconfigurable Computing Systems
  • Standard programmable platforms
  • Post-manufacturing customization
  • Designs shift from physical chips to
    configuration files
  • A software design flow
  • Feature hardware speed with software flexibility
  • Enable higher productivity

5
Application Representations
  • A common application representation is needed to
    tame the complexity of system synthesis
  • Requirements
      • Able to generate software code for
        microprocessors
      • Able to be easily translated to hardware
        configuration files
      • Allows a variety of transformations and
        optimizations to exploit the available performance

6
Parallelism Exploration
  • Fine grain parallelism
      • Multiple functional units
      • Issuing an operation to a free functional unit
      • Operations executed independently
  • Coarse grain parallelism
      • Executing multiple threads
      • With occasional synchronization
  • Reconfigurable computing systems support both
    fine and coarse grain parallelism

7
PDG+SSA
  • The PDG+SSA representation can be used for both
    hardware synthesis and software generation
  • The PDG and SSA forms are common representations
    for software generation
  • Here we concentrate on hardware synthesis

8
Outline
  • Reconfigurable computing systems
  • Compilation process
      • Overview
      • Constructing the PDG
      • Incorporating the SSA form
  • Synthesizing to hardware
  • Experimental results
  • Concluding remarks

9
Overview
10
Program Dependence Graph
  • PDG: Program Dependence Graph
  • ENTRY node: the root node of a PDG
  • PREDICATE nodes: produce predicate values from
    expressions
      • Diamond-shaped nodes 2, 3, and 4
  • STATEMENTS nodes: an arbitrary set of operations
      • Circle nodes 1, 4, 6, 7, and 8
  • REGION nodes: summarize all operations with the
    same control conditions
      • House-shaped nodes R2, R3, R4
      • R3: the predicate value of node 2 is True
  • Edges represent dependencies
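The four node kinds can be illustrated with a small data structure. The sketch below is only an illustration, not the authors' implementation: the `Node` class, the `active_statements` helper, and the region labels R1 to R4 are hypothetical names, and the graph is built by hand for a small saturating-add fragment (add `diff` to `val`, then clip to the 16-bit range) rather than for the figure on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str   # "ENTRY", "REGION", "PREDICATE", or "STATEMENTS"
    label: str
    # Control-dependence edges: (child, required predicate outcome or None).
    children: list = field(default_factory=list)

    def add(self, child, when=None):
        """Attach a child; `when` is the predicate outcome that enables it."""
        self.children.append((child, when))
        return child


def active_statements(node, outcomes):
    """List the STATEMENTS labels that execute under the given predicate outcomes."""
    found = [node.label] if node.kind == "STATEMENTS" else []
    for child, when in node.children:
        if when is None or outcomes.get(node.label) == when:
            found += active_statements(child, outcomes)
    return found


# Hand-built PDG (hypothetical labels) for:
#   val += diff; if (val > 32767) val = 32767; else if (val < -32768) val = -32768;
entry = Node("ENTRY", "entry")
r1 = entry.add(Node("REGION", "R1"))
r1.add(Node("STATEMENTS", "val += diff"))
p1 = r1.add(Node("PREDICATE", "val > 32767"))
r2 = p1.add(Node("REGION", "R2"), when=True)
r2.add(Node("STATEMENTS", "val = 32767"))
r3 = p1.add(Node("REGION", "R3"), when=False)
p2 = r3.add(Node("PREDICATE", "val < -32768"))
r4 = p2.add(Node("REGION", "R4"), when=True)
r4.add(Node("STATEMENTS", "val = -32768"))
```

Each REGION node groups everything that shares one control condition, so asking which statements run for a given set of predicate outcomes is a plain tree walk.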

11
Constructing the PDG from the CDFG
  • Implemented based on Ferrante's algorithm
  • Using the post-dominator tree

val = pred;
for (i = 0; i < len; i++) {
    val += diff;
    if (val > 32767)
        val = 32767;
    else if (val < -32768)
        val = -32768;
}
return val;
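As a rough illustration of how control dependences can be derived from post-dominance, the sketch below follows the general idea of the Ferrante, Ottenstein, and Warren construction: for each CFG edge A→B, every node on the post-dominator-tree path from B up to (but excluding) the immediate post-dominator of A is control dependent on A. It is a minimal sketch under stated assumptions, not the tool's code: the node names (`s1`, `p1`, ...), the edge labels, and the hand-written CFG for the clipping body (without the loop) are hypothetical, and post-dominators are computed with a simple iterative set algorithm.

```python
def post_dominators(succ, exit_node):
    """Iteratively compute the post-dominator set of every CFG node."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s, _ in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def immediate_post_dominator(pdom, n):
    """The strict post-dominator of n that is closest to n, or None."""
    for d in pdom[n] - {n}:
        if len(pdom[d]) == len(pdom[n]) - 1:
            return d
    return None


def control_dependences(succ, exit_node):
    """Map each node to the set of (predicate node, edge label) it depends on."""
    pdom = post_dominators(succ, exit_node)
    deps = {}
    for a, edges in succ.items():
        stop = immediate_post_dominator(pdom, a)
        for b, label in edges:
            runner = b
            # Walk up the post-dominator tree from b until reaching ipdom(a).
            while runner is not None and runner != stop:
                deps.setdefault(runner, set()).add((a, label))
                runner = immediate_post_dominator(pdom, runner)
    return deps


# Hypothetical CFG for: val += diff; if (val > 32767) ...; else if (val < -32768) ...
# The extra entry -> exit edge is the usual augmentation that makes top-level
# nodes control dependent on the entry node.
cfg = {
    "entry": [("s1", "T"), ("exit", "F")],
    "s1": [("p1", None)],     # val += diff
    "p1": [("s2", "T"), ("p2", "F")],   # val > 32767
    "s2": [("exit", None)],   # val = 32767
    "p2": [("s3", "T"), ("exit", "F")],  # val < -32768
    "s3": [("exit", None)],   # val = -32768
    "exit": [],
}
deps = control_dependences(cfg, "exit")
```

Nodes that end up with the same dependence set are the ones a REGION node would group together in the PDG.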
12
Constructing the PDG (contd)
13
The Static Single Assignment Form
  • Each variable has exactly one assignment
  • A variable is always referenced using the same
    name
  • At join points of control flow, special Φ-nodes
    are inserted.

val += diff;
if (val > 32767)
    val = 32767;
else if (val < -32768)
    val = -32768;

val_2 = val_1 + diff;
if (val_2 > 32767)
    val_3 = 32767;
else if (val_2 < -32768)
    val_4 = -32768;
val_5 = phi(val_2, val_3, val_4);
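To make the renaming concrete, the sketch below runs the original fragment and an SSA-style version side by side. It is an illustration under assumptions: the slide writes the merge as a plain phi over three names, whereas this sketch passes explicit guards to a hypothetical `phi` helper so that the selection is executable, and the function names `saturate_original` and `saturate_ssa` are invented here.

```python
def phi(candidates):
    """Return the value of the first (guard, value) pair whose guard is true."""
    for guard, value in candidates:
        if guard:
            return value
    raise ValueError("no guard selected a value")


def saturate_original(val, diff):
    # The fragment as written before SSA conversion: val is assigned up to three times.
    val += diff
    if val > 32767:
        val = 32767
    elif val < -32768:
        val = -32768
    return val


def saturate_ssa(val_1, diff):
    # Every name is assigned exactly once; the phi merges the three definitions.
    val_2 = val_1 + diff
    too_high = val_2 > 32767
    too_low = (not too_high) and val_2 < -32768
    val_3 = 32767
    val_4 = -32768
    val_5 = phi([(too_high, val_3), (too_low, val_4), (True, val_2)])
    return val_5
```

Because each name has a single definition, every use of `val_2`, `val_3`, or `val_4` refers to exactly one producer, which is the property the hardware mapping relies on later in the talk.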
14
Extending the PDG with Φ-Nodes
15
The Program Representation
  • Loop-independent Φ-nodes
      • Take two or more input values and a predicate
        value
      • Commit one of the inputs depending on this
        predicate
  • Loop-carried Φ-nodes
      • Take the initial value, the loop-carried value,
        and a predicate value as inputs
      • Have two outputs: one to the iteration body and
        the other to the loop exit
      • Direct the proper values to the proper outputs
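The two kinds of phi nodes described above can be modeled behaviorally. The sketch below is a simplified model, not the representation used in the tool: the function names are hypothetical, and the loop-carried node is given two control inputs (a first-iteration flag and the loop-continue predicate), which is an assumption made here to keep the model explicit; the slide mentions a single predicate value.

```python
def loop_independent_phi(predicate, if_true, if_false):
    """Commit one of two inputs depending on the predicate."""
    return if_true if predicate else if_false


def loop_carried_phi(first_iteration, initial, carried, continue_loop):
    """Merge the initial and loop-carried values, then route the result.

    Returns (to_body, to_exit); exactly one of the two carries the value.
    """
    merged = initial if first_iteration else carried
    if continue_loop:
        return merged, None
    return None, merged


def accumulate(pred, diffs):
    """Saturating accumulation of diffs onto pred, written with phi nodes."""
    carried = pred  # ignored on the first iteration
    i = 0
    while True:
        keep_going = i < len(diffs)
        to_body, to_exit = loop_carried_phi(i == 0, pred, carried, keep_going)
        if not keep_going:
            return to_exit
        val_2 = to_body + diffs[i]
        high = val_2 > 32767
        low = val_2 < -32768
        # Nested two-input selections stand in for one multi-input phi.
        carried = loop_independent_phi(
            high, 32767, loop_independent_phi(low, -32768, val_2)
        )
        i += 1
```

For example, `accumulate(0, [30000, 5000, -100])` clips the running value to 32767 after the second step and returns 32667.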

16
Outline
  • Reconfigurable computing systems
  • Compilation process
  • Synthesizing to hardware
      • Data-path elements
      • Φ-nodes
  • Experimental results
  • Concluding remarks

17
Synthesizing the Data-Path
  • A one-to-one mapping is used
  • Different resource allocation and binding
    algorithms can be used (on-going work)
  • Each operation has an operator and several
    operands
  • Operands are synthesized directly to wires in the
    circuit
  • Each variable in the SSA form has only one
    definition point
  • PREDICATE nodes are synthesized to Boolean logic
    signals that control next-stage transitions and
    direct multiplexers to commit the correct value.

18
Synthesizing Φ-nodes
  • A loop-independent Φ-node is synthesized to a
    multiplexer. The multiplexer selects among the
    input values depending on the predicate values.
  • For a loop-carried Φ-node, an additional switch
    is generated to direct the loop-exiting values

19
Synthesize to Hardware
  • Simplifications and optimizations
      • Removing unnecessary control dependencies
      • Cascading/expanding multipliers to obtain better
        performance
  • Flip-flops are inserted
      • Guarantee that correct values will be available
        no matter which execution path is taken

20
Outline
  • Reconfigurable computing systems
  • Compilation process
  • Synthesizing to hardware
  • Experimental results
      • Setup and benchmarks
      • Results
  • Concluding remarks

21
Setup and Benchmarks
  • Benchmark suites
  • Functions from the MediaBench suite
  • Profiled using sample data
  • Only report conservative results
  • Estimated execution time
  • Aggressive predicated execution
  • Only report conservative results
  • Area
  • One-to-one mapping without resource sharing
  • Reported in numbers of FPGA slices

22
Estimated Execution Time
23
Estimated Execution Time (contd)
24
Estimated FPGA Area
25
Outline
  • Reconfigurable computing systems
  • Compilation process
  • Synthesizing to hardware
  • Experimental results
  • Concluding remarks
      • On-going/future work

26
Concluding Remarks
  • The PDG+SSA form supports a variety of
    transformations and enables both coarse and fine
    grain parallelism
  • A method to synthesize this form to hardware
  • This form gives faster execution time using
    similar area when compared with CFG and PSSA forms

27
On-going/Future work
  • Investigate transformations to create coarse
    grained parallelism using the PDG+SSA form
  • Augment the PDG+SSA form with architectural
    information to provide fast estimation.
  • Integrate resource sharing and other
    architectural synthesis techniques

28
Thank You
  • Prof Ryan Kastner and Gang Wang
  • All audiences

29
Questions