Compiler-Generated Communication for Pipelined FPGA Applications

1
Compiler-Generated Communication for Pipelined FPGA Applications
USC

University of Southern California
Information Sciences Institute
Heidi Ziegler, Mary Hall, Pedro Diniz
June 4th, 2003

2
Mapping Assignment
3
Mapping an Application to Hardware
4
I have to get this design out!
Due Yesterday!! Is help available?
5
Combining the Best of Two Technologies

Behavioral Synthesis
Optimizations
Program analysis for scalar variables
Supports user-controlled unrolling
Manages register and inter-operator communication
Considers one FPGA
Performs hardware allocation, binding and scheduling

Parallelizing Compiler
Optimizations
Program analysis for scalar and array variables across loop iterations
Analysis guides automatic loop transformations
Manages register and memory trade-offs
System-level view
No knowledge of hardware implementation

6
This Research

Automatically maps a C application onto a set of FPGAs
Identifies coarse grain pipeline stages
Determines data to be communicated
Determines best communication granularity
Finds communication placement points
Performs code transformations

7
System-Level Compiler
[Flow diagram, in order of the boxes on the slide]
Input: C
Communication and Pipeline Analysis (RDAD, CED)
Code Transformations
Generate VHDL
Behavioral Synthesis and Estimation
Design Space Exploration (System-Level Metrics)
Good Design? No: loop back (edge labeled "Inputs"). Yes: continue.
Logic Synthesis / Place & Route
Configuration bit-stream
Diagram annotations: "Commercial Tools", "Work discussed in this paper"
8
Sequential MVIS Kernel
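[Figure: execution order over time of the sequential MVIS kernel stages Sobel, Feature and Distance, showing the write and read of a 2-D array and the data dependence between them. Access order is row-wise.]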
9
Reaching Definition Data Access Descriptor

Set describes basic data access information
s: program point
r, w: read or write array access
a: accessed array section, integer linear inequalities
t: traversal order, vector of dimensions, slowest to fastest
d: vector of dominant induction variables for each dimension
w: set of statements this tuple describes (def or use)
g: set of reaching definitions
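
The descriptor fields listed above could be held in a small record. The following Python sketch is illustrative only. The class name RDAD, the field names, the helper same_access_pattern, and the choice to represent the array section as inclusive per-dimension bounds (instead of general integer linear inequalities) are all assumptions made here, not the compiler's actual data structures. The 0..28 bounds are borrowed from the loops on the Communication Placement slide.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Section = Tuple[Tuple[int, int], ...]  # inclusive (lo, hi) bounds per dimension


@dataclass(frozen=True)
class RDAD:
    """Hypothetical record for one reaching-definition data access descriptor."""
    program_point: str                 # s: program point
    is_write: bool                     # r/w: read (False) or write (True) access
    section: Section                   # a: accessed array section (simplified to boxes)
    traversal: Tuple[int, ...]         # t: dimensions ordered slowest to fastest
    dominant_ivs: Tuple[str, ...]      # d: dominant induction variable per dimension
    statements: FrozenSet[str] = frozenset()     # w: def or use statements described
    reaching_defs: FrozenSet[str] = frozenset()  # g: reaching definitions

    def num_elements(self) -> int:
        """Number of array elements covered by the section."""
        n = 1
        for lo, hi in self.section:
            n *= hi - lo + 1
        return n


def same_access_pattern(write: RDAD, read: RDAD) -> bool:
    """True when a write and a read cover the same section in the same order."""
    return (write.is_write and not read.is_write
            and write.section == read.section
            and write.traversal == read.traversal)


# Assumed example: stage S1 writes B row-wise, stage S2 reads it row-wise.
sobel_write = RDAD("S1", True, ((0, 28), (0, 28)), (0, 1), ("x", "y"),
                   frozenset({"def_B"}))
feature_read = RDAD("S2", False, ((0, 28), (0, 28)), (0, 1), ("x", "y"),
                    frozenset({"use_B"}), frozenset({"def_B"}))
```

When a producer's write and a consumer's read agree on section and traversal order, as in this assumed example, the data can in principle be handed over in the order it is produced.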

10
Communication Requirements
Solve directly for data, granularity, placement
Read (4)
Write (3)
11
Communication Edge Descriptor

Set describes communication between stages
si, sj: sending and receiving pipeline stages
a: array section, per communication instance
l: send point
r: receive point
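
A communication edge with the fields above can be sketched the same way. This Python fragment is a hedged illustration: the class name CED, the field names, the helper instances_needed, and the textual send and receive points are assumptions introduced here. The 29 x 29 array size is taken from the 0..28 loop bounds on the Communication Placement slide.

```python
from dataclasses import dataclass
from typing import Tuple

Section = Tuple[Tuple[int, int], ...]  # inclusive (lo, hi) bounds per dimension


@dataclass(frozen=True)
class CED:
    """Hypothetical record for one communication edge descriptor."""
    sender: str          # si: sending pipeline stage
    receiver: str        # sj: receiving pipeline stage
    section: Section     # a: array section moved per communication instance
    send_point: str      # l: where the send is placed
    receive_point: str   # r: where the receive is placed

    def elements_per_instance(self) -> int:
        """Elements transferred by a single communication instance."""
        n = 1
        for lo, hi in self.section:
            n *= hi - lo + 1
        return n


def instances_needed(edge: CED, full_array: Section) -> int:
    """Communication instances needed to move the whole array over this edge."""
    total = 1
    for lo, hi in full_array:
        total *= hi - lo + 1
    return total // edge.elements_per_instance()


full_b = ((0, 28), (0, 28))
# Row granularity: the section has the shape of one row of B.
row_edge = CED("S1", "S2", ((0, 0), (0, 28)),
               "after the inner y loop of S1", "before the inner y loop of S2")
print(row_edge.elements_per_instance(), instances_needed(row_edge, full_b))
```

Changing only the section (one element, one row, or the whole array) is enough to describe a different granularity on the same edge.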

12
Compare Valid Granularities for MVIS

[Figure: timelines of pipeline stages S1 and S2 for four communication granularities: element, row, array on-chip, and array off-chip.]
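
One way to see why granularity matters is a toy cost model. The Python sketch below is not the model or the measurements from this talk. It assumes a balanced two-stage pipeline, one cycle of work per element, and a fixed, made-up overhead of four cycles per transfer. The function name pipelined_cycles and all parameter values are hypothetical, and the on-chip versus off-chip distinction is not modeled.

```python
def pipelined_cycles(total_elems: int, chunk_elems: int,
                     cycles_per_elem: int, overhead_per_transfer: int,
                     stages: int = 2) -> int:
    """Cycles for a balanced pipeline that hands off one chunk at a time."""
    if total_elems % chunk_elems != 0:
        raise ValueError("chunk size must divide the array size")
    chunks = total_elems // chunk_elems
    # Each pipeline slot covers the work on one chunk plus one transfer.
    chunk_time = chunk_elems * cycles_per_elem + overhead_per_transfer
    # The first chunk passes through every stage; each later chunk adds one slot.
    return (chunks + stages - 1) * chunk_time


ROWS, COLS = 29, 29          # from the 0..28 loop bounds used in the MVIS code
TOTAL = ROWS * COLS

estimates = {
    name: pipelined_cycles(TOTAL, size, cycles_per_elem=1, overhead_per_transfer=4)
    for name, size in [("element", 1), ("row", COLS), ("array", TOTAL)]
}
print(estimates)
```

Under these assumed costs, element granularity pays the transfer overhead on every element, array granularity loses almost all overlap between the stages, and row granularity sits between the two extremes.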
13
MVIS Execution Times
14
Communication Placement

-- Operators and punctuation lost in extraction are restored from context;
-- "...." is elided on the original slide.
architecture a_main of main is
p1: process
begin
  main_loop: loop
    FOR x IN 0 to 28 loop
      wait until clk'event;
      FOR y IN 0 to 28 loop
        wait until clk'event;
        uh1 := -3 * u(x*32+y) - ....;
        uh2 := 3 * u(x*32+y) ....;
        B(x*32+y) := uh1 + uh2;
        dPass0local(y) := B(x*32+y);
      end loop;
      -- send a row of B
      dPass0 <= dPass0local;
    end loop;
    setVariable(calcDone);
  end loop main_loop;
end process p1;

p2: process
begin
  main_loop: loop
    FOR x IN 0 to 28 loop
      wait until clk'event;
      -- receive a row of B
      dPass0local := dPass0;
      wait until clk'event;
      FOR y IN 0 to 28 loop
        wait until clk'event;
        B(x*32+y) := dPass0local(y);
        if (th < B(x*32+y)) then
          feature_x(x*32+y) := x; feature_y(x*32+y) := y;
        else
          feature_x(x*32+y) := 0; feature_y(x*32+y) := 0;
        end if;
        dPass1local(y) := feature_x(x*32+y);
        dPass2local(y) := feature_y(x*32+y);
      end loop;
      -- send row of feature_x,y (Edges 2,3)
      dPass1 <= dPass1local;
      dPass2 <= dPass2local;  -- the slide text shows dPass1 here; dPass2 assumed
    end loop;
  end loop main_loop;
end process p2;
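
The placement shown in the two processes (send after the inner loop, receive before it) can be mimicked in software. The Python sketch below is only an analogy, not generated compiler output. A generator stands in for the dPass0 channel, the index expression is read as x*32+y, the comparison is read as th < B, and the expression 2 * u[...] is a made-up placeholder for the elided Sobel computation. The names stage1, stage2, N and STRIDE are hypothetical.

```python
N = 29        # loop bounds 0..28 from the slide
STRIDE = 32   # row stride, assuming the index reads x*32+y


def stage1(u):
    """Producer: fill one row of B, then send it (send placed after the y loop)."""
    for x in range(N):
        row = [0] * N
        for y in range(N):
            row[y] = 2 * u[x * STRIDE + y]  # placeholder for the elided expression
        yield row                            # send a row of B


def stage2(rows, th):
    """Consumer: receive one row of B (before the y loop), then threshold it."""
    feature_x = [0] * (N * STRIDE)
    feature_y = [0] * (N * STRIDE)
    transfers = 0
    for x, row in enumerate(rows):           # receive a row of B
        transfers += 1
        for y in range(N):
            if th < row[y]:
                feature_x[x * STRIDE + y] = x
                feature_y[x * STRIDE + y] = y
    return feature_x, feature_y, transfers


# Assumed test input: u(x, y) = x + y.
u = [0] * (N * STRIDE)
for x in range(N):
    for y in range(N):
        u[x * STRIDE + y] = x + y

fx, fy, transfers = stage2(stage1(u), th=40)
print(transfers)
```

Because the generator yields once per outer iteration, the consumer starts on row x as soon as the producer has finished it, which is the row-granularity handoff the slide illustrates.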

15
Related Work

Hardware Background
Mapping to pipelines in hardware
Splash 2, PipeRench, Napa C
Parallelizing Compiler Background
Program analysis on arrays
Hall et al., Balasundaram & Kennedy, Amarasinghe
Communication
Amarasinghe & Lam, Kennedy, C.-W. Tseng

16
Conclusion

System-level compiler automatically derives a pipelined implementation with explicit communication
Results for different communication granularities demonstrate
1.76 times faster for row versus array on-chip
3 times faster for row versus element
4.5 times faster for row versus array off-chip
Current work integrates communication analyses with
Partitioning across multiple FPGAs
Global data layout
Design space exploration