Title: Data Communication Estimation and Reduction for Reconfigurable Systems
1. Data Communication Estimation and Reduction for Reconfigurable Systems
- Adam Kaplan, Philip Brisk: Computer Science, University of California, Los Angeles
- Ryan Kastner: Elec. and Computer Engineering, University of California, Santa Barbara
- June 4, 2003
2. From Algorithm to HDL
[Figure: compilation flow. Application specified in system-level language -> Compiler -> HDL (behavioral, structural) -> Synthesis and Physical Design]
- We focus our efforts on mapping an application written in a high-level language to a hardware description.
- We desire this mapping to have optimal characteristics (area, latency, etc.)
- In this talk, we focus on the problem of minimizing data communication in the final hardware.
3. Similar Compilation Projects
- Hardware compilers
  - Reconfigurable Architecture
    - PRISM project: synthesizes a subset of C to FPGA
    - Garp compiler (BRASS): synthesizes C to a processor + FPGA platform
    - DEFACTO: synthesizes SUIF to FPGA (Wildstar)
  - General Architecture
    - DeepC compiler: synthesizes C to HDL
    - MATCH compiler: synthesizes Matlab to HDL
    - PICO: synthesizes nested loops into a VLIW-like functional unit
4. Our Framework
[Figure: framework flow. SUIF / MachSUIF Compiler -> Control Data-Flow Graph (CDFG) -> Hardware Description]
- From the SUIF IR, we construct a CDFG representation.
- Each basic block of the CDFG becomes a separate synthesizable module in the hardware description.
5. Characterizing Data Communication
- Two examples of data communication schemes
[Figure: Control Nodes 1-4 under two schemes. Distributed: data communication is a wire between control nodes. Centralized: data communication is a storage access, with the control nodes sharing a memory (register bank, RAM) over a bus.]
6. Identifying Data Communication
- Determine the relationship between the place(s) where data is defined and where data is used
- Naïve method: all use-points of a variable depend on all definitions of that variable
- Not all use points use a variable
[Figure: example control-flow graph whose nodes contain definitions of a, b, and c (written a ←, b ←, c ←) and uses of b, c, and a (written ← b, ← c, ← a)]
Need analysis to minimize the amount of data communication
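The gap between the naïve method and a def-use analysis can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the four-node graph, the variables, and the helper names `naive_edges` and `reaching_edges` do not come from the slides. The second function uses a textbook reaching-definitions iteration, which is one possible analysis, not necessarily the one used in this work.

```python
from collections import defaultdict

# Hypothetical control-flow graph: node -> successors (a diamond).
SUCC = {1: [2, 3], 2: [4], 3: [4], 4: []}
# Variables defined and used in each control node (illustrative only).
# Uses are assumed to read values arriving from earlier nodes.
DEFS = {1: {"a", "b"}, 2: {"a"}, 3: {"c"}, 4: set()}
USES = {1: set(), 2: {"b"}, 3: {"a"}, 4: {"a", "c"}}


def naive_edges():
    """Naive method: every use of a variable depends on every definition."""
    edges = set()
    for d, dvars in DEFS.items():
        for u, uvars in USES.items():
            if d != u:
                edges |= {(d, u, v) for v in dvars & uvars}
    return edges


def reaching_edges():
    """Keep only (def node, use node, var) triples where the definition
    reaches the use along some path without being overwritten."""
    preds = defaultdict(list)
    for n, succs in SUCC.items():
        for s in succs:
            preds[s].append(n)
    # reach_in[n]: (def node, var) pairs that reach the entry of n
    reach_in = {n: set() for n in SUCC}
    changed = True
    while changed:
        changed = False
        for n in SUCC:
            new_in = set()
            for p in preds[n]:
                # OUT[p] = definitions in p, plus IN[p] entries p does not kill
                new_in |= {(p, v) for v in DEFS[p]}
                new_in |= {(d, v) for d, v in reach_in[p] if v not in DEFS[p]}
            if new_in != reach_in[n]:
                reach_in[n] = new_in
                changed = True
    return {(d, n, v) for n in SUCC for d, v in reach_in[n] if v in USES[n]}
```

On this example the naïve method reports a dependence of node 3 on the definition of `a` in node 2, although no path leads from node 2 to node 3; the reaching-definitions version drops that edge and keeps the rest.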
7. Minimizing Data Communication
- Must determine the relationship between where data is generated and where data is used
- Problem formulation: minimize the total number of bits communicated between all pairs of control nodes
- SSA (Static Single Assignment)
  - Changes each variable to have a unique definition point
  - Must add Φ-nodes to merge definitions
8. Using SSA to Minimize Data Communication
- SSA algorithms
  - Find the locations of Φ-nodes
  - Rename variables
- Three main SSA algorithms
  - Minimal, Pruned [Cytron et al.]
  - Semi-pruned [Briggs et al.]
- They differ in the number and location of Φ-nodes
  - Minimal: insert Φ-nodes at the iterated dominance frontier (IDF)
  - Semi-pruned: insert a Φ-node at the IDF if the variable is live outside some basic block
  - Pruned: insert a Φ-node at the IDF if the variable is live at that point
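The difference between the three placements can be seen on a small diamond-shaped CFG. The Python sketch below is a simplified illustration under stated assumptions, not the implementation used in this work: the CFG, the variables `a`, `t`, `u`, and all function names are hypothetical, `USES` is assumed to hold only upward-exposed uses, and the pruned variant is approximated by filtering the IDF with a liveness check.

```python
# Hypothetical CFG: entry -> (left | right) -> join -> exit.
SUCC = {"entry": ["left", "right"], "left": ["join"], "right": ["join"],
        "join": ["exit"], "exit": []}
# Definitions per block, and upward-exposed uses per block (uses that read
# a value defined in an earlier block). "t" is a block-local temporary.
DEFS = {"entry": {"u"}, "left": {"a", "t", "u"}, "right": {"a", "t", "u"},
        "join": set(), "exit": set()}
USES = {"entry": set(), "left": {"u"}, "right": set(),
        "join": set(), "exit": {"a"}}


def dominators(succ, entry):
    """Iterative dominator sets; assumes every block is reachable."""
    preds = {n: [p for p in succ if n in succ[p]] for n in succ}
    dom = {n: set(succ) for n in succ}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, preds


def dominance_frontiers(succ, entry):
    """DF(x): blocks y where x dominates a predecessor of y
    but does not strictly dominate y."""
    dom, preds = dominators(succ, entry)
    df = {n: set() for n in succ}
    for y in succ:
        for p in preds[y]:
            for x in dom[p]:
                if x == y or x not in dom[y]:
                    df[x].add(y)
    return df


def iterated_df(df, blocks):
    """Close the dominance frontier over the given definition blocks."""
    result, work = set(), list(blocks)
    while work:
        for y in df[work.pop()]:
            if y not in result:
                result.add(y)
                work.append(y)
    return result


def live_in(succ, defs, uses):
    """Backward liveness: variables live on entry to each block."""
    live = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            out = set().union(*(live[s] for s in succ[n]))
            new = uses[n] | (out - defs[n])
            if new != live[n]:
                live[n], changed = new, True
    return live


def place_phis(succ, entry, defs, uses, flavor):
    """flavor is 'minimal', 'semi-pruned', or 'pruned'."""
    df = dominance_frontiers(succ, entry)
    live = live_in(succ, defs, uses)
    global_names = set().union(*uses.values())  # live across some block edge
    phis = {n: set() for n in succ}
    for v in set().union(*defs.values()):
        if flavor == "semi-pruned" and v not in global_names:
            continue
        for b in iterated_df(df, [n for n in succ if v in defs[n]]):
            if flavor == "pruned" and v not in live[b]:
                continue
            phis[b].add(v)
    return phis
```

For this CFG, calling `place_phis(SUCC, "entry", DEFS, USES, flavor)` places Φ-nodes only in `join`: for `a`, `t`, and `u` under `"minimal"`, for `a` and `u` under `"semi-pruned"` (the local temporary `t` is skipped), and for `a` alone under `"pruned"` (`u` is not live at `join`).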
9. Experimental Setup
[Figure: experimental flow with the stages SSA Conversion, HDL Generation, and Synopsys Behavioral / Design Compiler]
10. MediaBench Benchmark Suite
- A benchmark suite of DSP applications [Lee et al.]
- DSP applications are well suited to hardware implementation
- Tend to
  - be parallelizable
  - be computationally intensive
  - often have large basic blocks
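    /* Excerpt: walk the output grid; res_pos indexes the result array. */
    for (y_pos = ygrid_start - y_fmid - 1, res_pos = 0;
         y_pos < 0;
         y_pos += ygrid_step) {
      for (x_pos = xgrid_start - x_fmid - 1;
           x_pos < 0;
           x_pos += xgrid_step, res_pos++) {
        (*reflect)(filt, x_fdim, y_fdim, x_pos, y_pos, temp, FILTER);
        sum = 0.0;
        /* Multiply-accumulate the image against the reflected filter, row by row. */
        for (y_filt_lin = x_fdim, x_filt = y_im_lin = 0;
             y_filt_lin <= filt_size;
             y_im_lin += x_dim, y_filt_lin += x_fdim)
          for (im_pos = y_im_lin;
               x_filt < y_filt_lin;
               x_filt++, im_pos++)
            sum += image[im_pos] * temp[x_filt];
        result[res_pos] = sum;
      }
      first_col = x_pos + 1;
      (*reflect)(filt, x_fdim, y_fdim, 0, y_pos, temp, FILTER);
      /* excerpt ends here; the outer loop body continues */
Sample code: internal filter of an image convolver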
11. Results: SSA for Data Comm. Minimization
- Edge weight w(i,j): number of bits communicated from node i to node j
- Total Edge Weight (TEW): corresponds to the amount of data communication
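As a small worked example of these two quantities, the Python sketch below accumulates w(i,j) from a list of transfers and sums the weights. It assumes that TEW is the plain sum of all edge weights, which matches the name and the problem formulation but is not spelled out on the slide; the bit widths, the transfer list, and the function names are hypothetical.

```python
from collections import defaultdict


def edge_weights(transfers, bit_width):
    """w(i, j): total bits sent from control node i to control node j.

    transfers is a list of (src node, dst node, variable) triples;
    bit_width maps each variable to its width in bits.
    """
    w = defaultdict(int)
    for src, dst, var in transfers:
        w[(src, dst)] += bit_width[var]
    return dict(w)


def total_edge_weight(w):
    """TEW, assumed here to be the sum of w(i, j) over all node pairs."""
    return sum(w.values())


# Hypothetical data: three variables and four transfers between nodes 1-4.
BIT_WIDTH = {"a": 32, "b": 16, "c": 8}
TRANSFERS = [(1, 2, "a"), (1, 2, "b"), (2, 4, "a"), (3, 4, "c")]

W = edge_weights(TRANSFERS, BIT_WIDTH)
TEW = total_edge_weight(W)
```

Here the edge from node 1 to node 2 carries both `a` and `b`, so w(1,2) is 48 bits, and the TEW of the whole graph is 88 bits.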
12. Results: SSA for Area Minimization
13. Relationship Between Φ-nodes and Data Communication
14. Further Minimizing Data Communication
- Current SSA algorithms place Φ-nodes temporally
- In software compilation, live ranges should be short.
- Appropriate in hardware?
[Figure: one example shown under temporal and under spatial Φ-node distribution. Definitions a1 ←, b1 ←, a2 ←, b2 ←, a3 ←, c1 ←; uses ← b1, ← c1, ← a4; Φ-node a4 ← Φ(a2, a3); annotation TEW = 3.]
15. Effect of Φ-node Distribution
[Figure: temporal Φ-node placement compared with spatial Φ-node placement]
16. Spatial Φ-node Distribution Algorithm
- d: number of uses of the Φ-node destination
- s: number of Φ-node source values
- Number of temporal links
- Number of spatial links
[Figure: example Φ-node a3 ← Φ(a0, a1, a2) with s = 3 source values and d = 2 uses of a3]
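One way to reason about the two link counts is sketched below in Python. The counting model is an assumption, since the formulas themselves are not recoverable from this slide: a temporally placed Φ-node is assumed to receive one link per source and send one link per use (s + d), while a spatially distributed Φ-node is assumed to be replicated at every use, so each source links to each use (s × d). The function names are hypothetical, and the model ignores bit widths and the case where the Φ-node shares a control node with a source or a use.

```python
def temporal_links(s, d):
    """Assumed model: s links into the single Phi-node, d links out of it."""
    return s + d


def spatial_links(s, d):
    """Assumed model: the Phi-node is copied to each of the d uses,
    and every copy receives all s source values."""
    return s * d


def prefer_spatial(s, d):
    """True when the assumed spatial count is strictly smaller."""
    return spatial_links(s, d) < temporal_links(s, d)


# The slide's example: s = 3 sources, d = 2 uses.
EXAMPLE = (temporal_links(3, 2), spatial_links(3, 2), prefer_spatial(3, 2))
# A Phi-node with two sources and a single use.
SINGLE_USE = (temporal_links(2, 1), spatial_links(2, 1), prefer_spatial(2, 1))
```

Under this model the slide's example (s = 3, d = 2) gives 5 temporal links against 6 spatial links, so the Φ-node would stay where it is, whereas a Φ-node with a single use (s = 2, d = 1) gives 3 against 2 and would be moved to its use.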
17. Spatial SSA Results: Num. Spatial Φ-nodes
18. Spatial SSA Results: Δ TEW after Spatial SSA
19. Δ Area after Spatial SSA (from Synopsys)
20. Conclusion
- In this work, we demonstrate a mapping from compiler IR (CDFG) to hardware description.
- SSA binds variables to values, which is useful in reducing data communication between control nodes.
- Spatial distribution of phi nodes can reduce data communication, modeled as total edge weight (TEW), by as much as 20%.
- However, circuit area sometimes increases
- Future research: refine the model using information from later stages of synthesis.
- Compiler techniques applied to hardware design can greatly reduce data communication.