Compiler-Generated Communication for Pipelined FPGA Applications

1
Compiler-Generated Communication for Pipelined
FPGA Applications
USC
  • University of Southern California
  • Information Sciences Institute
  • Heidi Ziegler, Mary Hall, Pedro Diniz
  • June 4th, 2003

2
Mapping Assignment
3
Mapping an Application to Hardware
4
I have to get this design out!
Due Yesterday!! Is help available?
5
Combining the Best of Two Technologies
  • Behavioral Synthesis
    • Optimizations: program analysis for scalar variables
    • Supports user-controlled unrolling
    • Manages register and inter-operator communication
    • Considers one FPGA
    • Performs hardware allocation, binding and scheduling
  • Parallelizing Compiler
    • Optimizations: program analysis for scalar and array variables across loop iterations
    • Analysis guides automatic loop transformations
    • Manages register and memory trade-offs
    • System-level view
    • No knowledge of hardware implementation

6
This Research
  • Automatically maps a C application onto a set of
    FPGAs
  • Identifies coarse-grain pipeline stages
  • Determines data to be communicated
  • Determines best communication granularity
  • Finds communication placement points
  • Performs code transformations

7
System-Level Compiler
[Figure: system-level compiler flow]
  • Input: C program
  • Communication and Pipeline Analysis (RDAD, CED) and Code Transformations (work discussed in this paper)
  • Generate VHDL
  • Behavioral Synthesis and Estimation (commercial tools)
  • Design Space Exploration (system-level metrics): good design? If no, iterate; if yes, continue
  • Logic Synthesis / Place & Route
  • Output: configuration bit-stream
8
Sequential MVIS Kernel
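[Figure: sequential MVIS kernel. Stages Sobel, Feature and Distance are shown in execution order over time. A 2-D array written row-wise by one stage is read row-wise by the next, creating a data dependence between them.]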
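A minimal Python model of the first two stages, to make the producer/consumer dependence concrete. It assumes a 32-wide image stored row-major (matching the `x*32+y` indexing on the Communication Placement slide) and 29 × 29 iterations (`0 to 28`). The real gradient terms are elided on the slides, so the simple differences used for `uh1` and `uh2` are hypothetical stand-ins, and the threshold `th` is an assumed parameter.

```python
W = 32       # row width, assumed from the x*32+y indexing
ITERS = 29   # loop bounds 0..28 in each dimension


def sobel(u):
    """Stage S1: write B in row-wise order (x slowest, y fastest)."""
    B = [0] * (W * W)
    for x in range(ITERS):
        for y in range(ITERS):
            # Hypothetical gradient terms; the slides elide the real ones.
            uh1 = u[x * W + y + 2] - u[x * W + y]
            uh2 = u[(x + 2) * W + y] - u[x * W + y]
            B[x * W + y] = abs(uh1) + abs(uh2)
    return B


def feature(B, th):
    """Stage S2: read B in the same row-wise order and mark pixels above th."""
    feature_x = [0] * (W * W)
    feature_y = [0] * (W * W)
    for x in range(ITERS):
        for y in range(ITERS):
            if th < B[x * W + y]:
                feature_x[x * W + y] = x
                feature_y[x * W + y] = y
    return feature_x, feature_y
```

Because `feature` consumes `B` in exactly the order `sobel` produces it, row x of `B` is complete long before S1 finishes. That is the property a pipelined mapping can exploit.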
9
Reaching Definition Data Access Descriptor (RDAD)
  • A set describing basic data-access information, with elements:
    • s: program point
    • r, w: read or write array access
    • α: accessed array section, as integer linear inequalities
    • τ: traversal order, a vector of dimensions from slowest to fastest
    • δ: vector of dominant induction variables, one per dimension
    • ω: set of statements this tuple describes (definition or use)
    • γ: set of reaching definitions
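One way to hold an RDAD tuple in code is sketched below. The field names are hypothetical, and sections are simplified to boxes (one inclusive bound pair per dimension) rather than general integer linear inequalities. The helper that fills in reaching definitions is also a deliberate simplification: it treats every overlapping write as reaching and ignores kills and control flow, which a real analysis must handle.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, Tuple

# Simplification (assumption): a box section, one inclusive (lo, hi) per dimension.
Section = Tuple[Tuple[int, int], ...]


@dataclass(frozen=True)
class RDAD:
    point: str                              # program point
    kind: str                               # "r" (read) or "w" (write)
    section: Section                        # accessed array section
    order: Tuple[int, ...]                  # traversal order, slowest to fastest dimension
    ind_vars: Tuple[str, ...]               # dominant induction variable per dimension
    stmts: FrozenSet[str]                   # statements this tuple describes
    reaching: FrozenSet[str] = frozenset()  # reaching definitions (reads only)


def overlaps(a: Section, b: Section) -> bool:
    """True if two box sections share at least one element."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))


def attach_reaching_defs(read: RDAD, writes: Iterable[RDAD]) -> RDAD:
    """Record every overlapping write as a reaching definition of `read` (no kills)."""
    defs = frozenset(s for w in writes
                     if w.kind == "w" and overlaps(w.section, read.section)
                     for s in w.stmts)
    return replace(read, reaching=read.reaching | defs)


# MVIS example: S1 writes all of B row-wise; S2 reads it in the same order.
write_B = RDAD("S1", "w", ((0, 28), (0, 28)), (0, 1), ("x", "y"),
               frozenset({"S1.B_write"}))
read_B = attach_reaching_defs(
    RDAD("S2", "r", ((0, 28), (0, 28)), (0, 1), ("x", "y"),
         frozenset({"S2.B_read"})),
    [write_B])
```

Here `read_B.reaching` ends up containing S1's write of `B`, which is the fact that marks `B` as data flowing from S1 to S2.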

10
Communication Requirements
  • Solve directly for the data to communicate, the communication granularity, and the placement of send and receive operations
[Figure labels: Read (4), Write (3)]
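The three quantities can be sketched for a 2-D loop nest as follows. This is an illustration under stated assumptions, not the paper's algorithm. The data to send is taken as the intersection of the producer's written box and the consumer's read box. For granularity, a level is assumed valid when both stages traverse the outer dimensions in the same order. The names `LEVELS`, `data_to_send`, `valid_levels` and `plan` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Section = Tuple[Tuple[int, int], ...]  # inclusive (lo, hi) bounds per dimension

# Level k = number of loops enclosing the send/receive point (2-D nests only).
LEVELS: Dict[int, Tuple[str, str, str]] = {
    0: ("array", "after the producer's loop nest", "before the consumer's loop nest"),
    1: ("row", "after the producer's inner loop", "before the consumer's inner loop"),
    2: ("element", "inside the producer's inner loop", "inside the consumer's inner loop"),
}


def data_to_send(write: Section, read: Section) -> Optional[Section]:
    """Elements written by the producer and read by the consumer (box intersection)."""
    box = tuple((max(a_lo, b_lo), min(a_hi, b_hi))
                for (a_lo, a_hi), (b_lo, b_hi) in zip(write, read))
    return box if all(lo <= hi for lo, hi in box) else None


def valid_levels(order_w: Tuple[int, ...], order_r: Tuple[int, ...]) -> List[int]:
    """Assumed criterion: level k is valid if both stages visit the first k
    dimensions in the same order, so data can be handed over each time
    those k loops advance."""
    k = 0
    while k < min(len(order_w), len(order_r)) and order_w[k] == order_r[k]:
        k += 1
    return list(range(k + 1))


def plan(write: Section, read: Section,
         order_w: Tuple[int, ...], order_r: Tuple[int, ...]):
    """(granularity, send point, receive point) for each valid level."""
    if data_to_send(write, read) is None:
        return []  # nothing flows between the stages
    return [LEVELS[k] for k in valid_levels(order_w, order_r)]
```

For MVIS, both stages traverse `B` row-wise, so element, row and array granularities all come out valid. If the consumer instead traversed column-wise, only whole-array communication would remain.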
11
Communication Edge Descriptor (CED)
  • A set describing communication between pipeline stages, with elements:
    • s_i, s_j: sending and receiving pipeline stages
    • α: array section transferred per communication instance
    • l: send point
    • r: receive point
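A minimal sketch of a CED as a record, instantiated for the MVIS edge that carries `B` from S1 to S2 at row granularity. Each instance carries one row, sent after S1's inner loop and received before S2's inner loop, which matches the placement on the Communication Placement slide. The field names and the helper `row_edge` are hypothetical, and sections are simplified to boxes.

```python
from dataclasses import dataclass
from typing import List, Tuple

Section = Tuple[Tuple[int, int], ...]  # inclusive (lo, hi) bounds per dimension


@dataclass(frozen=True)
class CED:
    sender: str        # sending pipeline stage
    receiver: str      # receiving pipeline stage
    section: Section   # array section moved by one communication instance
    send_point: str    # where the sender transmits
    recv_point: str    # where the receiver waits for the data


def size(section: Section) -> int:
    """Number of array elements in a box section."""
    n = 1
    for lo, hi in section:
        n *= hi - lo + 1
    return n


def row_edge(rows: int, cols: int) -> List[CED]:
    """Hypothetical helper: one row-granularity CED instance per row of B."""
    return [CED("S1", "S2", ((x, x), (0, cols - 1)),
                send_point="S1: after inner y loop",
                recv_point="S2: before inner y loop")
            for x in range(rows)]
```

With `rows = cols = 29`, the 29 instances together cover the whole 29 × 29 section of `B`, so no element is sent twice or missed.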

12
Compare Valid Granularities for MVIS
  • Element, Row, Array on-chip, Array off-chip
[Figure: one S1 → S2 pipeline diagram per granularity]
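A small worked count of what each granularity implies. It assumes the 29 × 29 iteration space from the Communication Placement slide and that both stages traverse row-wise. For each granularity the sketch returns the number of S1 → S2 communication instances, the elements carried per instance, and how many elements S1 must produce before S2 can start. Array on-chip and array off-chip move the same data in one instance; they differ in where the array is stored, which this count does not model. The function name `comm_profile` is hypothetical.

```python
ITERS = 29  # 0..28 in each dimension


def comm_profile(granularity: str, n: int = ITERS):
    """(instances, elements per instance, elements produced before S2 can start)."""
    if granularity == "element":
        return n * n, 1, 1
    if granularity == "row":
        return n, n, n
    if granularity == "array":
        return 1, n * n, n * n
    raise ValueError(f"unknown granularity: {granularity}")


for g in ("element", "row", "array"):
    print(g, comm_profile(g))
```

Finer granularity lets S2 start sooner but pays the per-instance synchronization cost many more times (841 instances versus 29 for rows). The measured results in the Conclusion favor row granularity.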
13
MVIS Execution Times
14
Communication Placement
architecture a_main of main is
  p1: process
  begin
    main_loop: loop
      FOR x IN 0 to 28 loop
        wait until clk'event;
        FOR y IN 0 to 28 loop
          wait until clk'event;
          uh1 := -3*u(x*32+y) - ....
          uh2 := 3*u(x*32+y) ....
          B(x*32+y) := uh1 + uh2;
          dPass0local(y) := B(x*32+y);
        end loop;
        -- send a row of B
        dPass0 <= dPass0local;
      end loop;
      setVariable(calcDone);
    end loop main_loop;
  end process p1;

  p2: process
  begin
    main_loop: loop
      FOR x IN 0 to 28 loop
        wait until clk'event;
        -- receive a row of B
        dPass0local := dPass0;
        wait until clk'event;
        FOR y IN 0 to 28 loop
          wait until clk'event;
          B(x*32+y) := dPass0local(y);
          if (th < B(x*32+y)) then
            feature_x(x*32+y) := x;
            feature_y(x*32+y) := y;
          else
            feature_x(x*32+y) := 0;
            feature_y(x*32+y) := 0;
          end if;
          dPass1local(y) := feature_x(x*32+y);
          dPass2local(y) := feature_y(x*32+y);
        end loop;
        -- send a row of feature_x, feature_y (Edges 2, 3)
        dPass1 <= dPass1local;
        dPass2 <= dPass2local;
      end loop;
    end loop main_loop;
  end process p2;
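The row-level handshake between `p1` and `p2` can be mimicked in software, with threads standing in for the two hardware processes and `queue.Queue` objects standing in for the `dPass` channels. This is a behavioral sketch only. The Sobel arithmetic is elided on the slide, so `p1` simply forwards precomputed values of `B`. The unbounded queues are an assumption, since the slide does not specify channel capacity.

```python
import queue
import threading
from typing import List, Tuple

ITERS = 29  # loop bounds 0..28, as in the VHDL


def p1(B: List[List[int]], d_pass0: queue.Queue) -> None:
    """Stage S1: fill a row buffer, then send the whole row (edge 1)."""
    for x in range(ITERS):
        d_pass0_local = [0] * ITERS
        for y in range(ITERS):
            # Stand-in for the elided Sobel arithmetic: B is precomputed.
            d_pass0_local[y] = B[x][y]
        d_pass0.put(d_pass0_local)  # send a row of B


def p2(th: int, d_pass0: queue.Queue, d_pass1: queue.Queue,
       d_pass2: queue.Queue) -> None:
    """Stage S2: receive a row of B, threshold it, send rows of features."""
    for x in range(ITERS):
        row = d_pass0.get()  # receive a row of B; blocks until S1 sends it
        d_pass1.put([x if th < row[y] else 0 for y in range(ITERS)])  # edge 2
        d_pass2.put([y if th < row[y] else 0 for y in range(ITERS)])  # edge 3


def run(B: List[List[int]], th: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Run both stages concurrently and collect the feature rows in order."""
    d_pass0, d_pass1, d_pass2 = queue.Queue(), queue.Queue(), queue.Queue()
    stages = [threading.Thread(target=p1, args=(B, d_pass0)),
              threading.Thread(target=p2, args=(th, d_pass0, d_pass1, d_pass2))]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    feature_x = [d_pass1.get() for _ in range(ITERS)]
    feature_y = [d_pass2.get() for _ in range(ITERS)]
    return feature_x, feature_y
```

Because `get()` blocks, S2 starts on row x as soon as S1 has sent it rather than waiting for all of `B`. This is the overlap that row granularity buys.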

15
Related Work
  • Hardware Background
    • Mapping to pipelines in hardware: Splash 2, PipeRench, Napa C
  • Parallelizing Compiler Background
    • Program analysis on arrays: Hall et al., Balasundaram and Kennedy, Amarasinghe
    • Communication: Amarasinghe and Lam, Kennedy, C.-W. Tseng

16
Conclusion
  • The system-level compiler automatically derives a pipelined implementation with explicit communication
  • Results for different communication granularities show that row granularity is:
    • 1.76 times faster than array on-chip
    • 3 times faster than element
    • 4.5 times faster than array off-chip
  • Current work integrates the communication analyses with:
    • Partitioning across multiple FPGAs
    • Global data layout
    • Design space exploration