DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs


1
DAOmap: A Depth-optimal Area Optimization Mapping
Algorithm for FPGA Designs
  • Deming Chen and Jason Cong
  • Computer Science Department
  • University of California, Los Angeles

This work is partially supported by the
California MICRO program and the NSF Grant
CCR-0306682
2
Outline
  • Introduction
  • Related Work
  • Definitions and Problem Formulation
  • Algorithm Description
    • Cut Enumeration
    • Delay and Area Propagation
    • Cost Function for a Cut
    • Global and Local Cost Adjustments
    • Iterative Cut Selection
  • Experimental Results
  • Conclusions and Future Work

3
Introduction
  • Field-programmable gate arrays (FPGAs) have become
    increasingly popular
    • Fast time to market
    • No or very low NRE (non-recurring engineering) cost
  • The LUT-based FPGA architecture dominates the
    existing programmable chip industry
  • FPGA technology mapping converts a given Boolean
    circuit into a functionally equivalent network
    composed only of LUTs
  • FPGA technology mapping is a crucial optimization
    step in the FPGA design flow
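To make the mapping target concrete, the sketch below models a K-LUT as a stored truth table of 2^K bits, which is why one K-LUT can implement any Boolean function of up to K inputs. The class name `KLUT` and its bit-ordering convention are illustrative assumptions, not part of any real FPGA tool.

```python
from itertools import product


class KLUT:
    """Illustrative K-input LUT: stores one output bit per input combination."""

    def __init__(self, k, func):
        self.k = k
        # Truth table of 2**k entries; bit j of the index is the value of input j.
        self.table = [
            func(*[(idx >> j) & 1 for j in range(k)]) & 1
            for idx in range(2 ** k)
        ]

    def __call__(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        idx = sum((bit & 1) << j for j, bit in enumerate(inputs))
        return self.table[idx]


# Any 3-input function fits in one 3-LUT; here, the majority function.
maj = KLUT(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", maj(a, b, c))
```

Technology mapping then amounts to deciding which sub-networks become the `func` of each LUT.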

4
Related Work on FPGA Mapping
  • Area Minimization
    • Chortle-crf, Francis et al., DAC91
    • MIS-pga, Murgai et al., ICCAD91
    • Praetor, Cong et al., FPGA99
    • Anti-fuse FPGA Mapper, Kang et al., ASPDAC04
  • Delay Minimization
    • DAG-Map, Chen et al., DTC92
    • FlowMap, Cong et al., ICCAD92
    • Edge-map, Yang et al., ICCAD94
  • Power Minimization
    • PowerMinMap, Li et al., ASPDAC03
    • Emap, Lamoureux et al., ICCAD03
    • DVmap, Chen et al., FPGA04
  • Simultaneous Delay and Area Minimization
    • FlowMap-r, Cong et al., TVLSI94
    • CutMap, Cong et al., FPGA95
    • BoolMap-D, Legl et al., DAC96

5
Definitions
  • DAG: a Boolean network
  • Cone $C_v$: a sub-network rooted at a node v
  • K-feasible cone
    • $|\mathrm{input}(C_v)| \le K$
  • Fanin cone $F_v$: the largest $C_v$
  • K-feasible cut
    • A K-feasible $C_v$
    • Occupies one K-LUT
  • Unit delay model
    • Each LUT contributes one unit of delay
    • No edge delay

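A minimal sketch of the K-feasibility test, assuming the network is given as a hypothetical dictionary that maps each node to its fanin list (PIs have an empty list). `cone_inputs` computes input(C_v), the signals that feed the cone from outside; the helper names are illustrative.

```python
def cone_inputs(fanins, cone):
    """input(C): fanins of cone nodes that lie outside the cone."""
    return {u for v in cone for u in fanins[v] if u not in cone}


def is_k_feasible(fanins, cone, k):
    """A cone is K-feasible if it has at most K inputs.

    Assumes `cone` is already a valid cone rooted at one node; this
    sketch does not check that property.
    """
    return len(cone_inputs(fanins, cone)) <= k


# Hypothetical network: v = f(x, y), x = f(a, b), y = f(c, d).
fanins = {
    "a": [], "b": [], "c": [], "d": [],
    "x": ["a", "b"],
    "y": ["c", "d"],
    "v": ["x", "y"],
}

print(sorted(cone_inputs(fanins, {"v", "x"})))    # ['a', 'b', 'y']
print(is_k_feasible(fanins, {"v", "x", "y"}, 3))  # False: F_v has inputs a, b, c, d
print(is_k_feasible(fanins, {"v", "x", "y"}, 4))  # True
```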
6
Problem Formulation
  • Delay-optimal area optimization problem
    • Given: a Boolean network and an integer K
    • Goal: cover the network with K-feasible cones
      (K-LUTs) such that
      • the mapping depth is optimal, and
      • the area (number of LUTs) is minimized
  • Area minimization is an NP-hard problem

7
Highlights of Our Algorithm
  • Consider potential node duplications to bring
    mapping-area estimation closer to reality
  • Search the solution space using both global and
    local optimality information
  • Carry out an iterative cut selection procedure on
    top of cost adjustment to further improve
    solution quality
  • Each technique used is simple and intuitive
  • The key is the right combination of these techniques

8
Cut Enumeration
Combine the sub-cuts on the inputs of each gate.
Process each gate in topological order from
PIs to POs.
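The enumeration step can be sketched as follows, assuming the network is a hypothetical fanin dictionary plus a topological order. For each gate, one sub-cut is taken from each fanin and their union is kept if it has at most K nodes; the trivial cut {v} is also kept so that fanouts can use v as an input. This naive version does no pruning of dominated cuts.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """Return every K-feasible cut of every node.

    fanins: node -> list of fanin nodes (empty list for a PI)
    order:  nodes in topological order, PIs first
    """
    cuts = {}
    for v in order:
        node_cuts = {frozenset([v])}  # trivial cut: v itself used as a LUT input
        if fanins[v]:
            # Combine one sub-cut from each fanin of the gate.
            for combo in product(*(cuts[u] for u in fanins[v])):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[v] = node_cuts
    return cuts


# Hypothetical network: x = f(a, b), y = f(x, c).
fanins = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["x", "c"]}
order = ["a", "b", "c", "x", "y"]
cuts = enumerate_cuts(fanins, order, k=3)
print(sorted(sorted(c) for c in cuts["y"]))
# [['a', 'b', 'c'], ['c', 'x'], ['y']]
```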
9
Complexity Analysis
  • The number of cuts on a node in the worst case is
    $O(n^K)$
  • In practice, it is a small constant for small K

Average over the 20 largest MCNC benchmarks
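The worst-case bound follows from a standard counting argument: every K-feasible cut of a node is a set of at most K nodes drawn from the n nodes of the network, so for fixed K:

```latex
|\mathrm{cuts}(v)| \;\le\; \sum_{k=1}^{K} \binom{n}{k}
  \;\le\; \sum_{k=1}^{K} \frac{n^k}{k!}
  \;\le\; K\, n^K \;=\; O(n^K)
```

The bound is loose; the practical count stays a small constant for small K.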
10
Delay and Area Propagation
[Figure: example network with cuts annotated by delay and area (Delay 1, Area 1; Delay 2, Area 2)]
The propagation process visits cuts and nodes
iteratively. The largest of the best delays at the POs is
the optimal mapping delay.
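A minimal sketch of the delay half of the forward propagation, assuming cuts have already been enumerated for every node and nodes are visited in topological order. Each node takes the cut whose latest input arrives earliest, plus one unit of LUT delay; the largest label at the POs is the optimal mapping depth. The function name and example network are hypothetical, and the area half of the propagation is omitted.

```python
def propagate_depth(cuts, order, pis):
    """Forward pass, PIs to POs: minimum LUT depth of each node.

    cuts: node -> set of frozenset cuts (the trivial cut {v} may be present)
    """
    depth, best_cut = {}, {}
    for v in order:
        if v in pis:
            depth[v] = 0
            continue
        for cut in cuts[v]:
            if cut == frozenset([v]):
                continue  # the trivial cut cannot implement v itself
            d = 1 + max(depth[i] for i in cut)  # unit delay: one LUT adds 1
            if v not in depth or d < depth[v]:
                depth[v], best_cut[v] = d, cut
    return depth, best_cut


# Cuts of a small hypothetical network (K = 3): x = f(a, b), y = f(x, c).
cuts = {
    "x": {frozenset("x"), frozenset("ab")},
    "y": {frozenset("y"), frozenset("xc"), frozenset("abc")},
}
depth, best_cut = propagate_depth(cuts, ["a", "b", "c", "x", "y"], {"a", "b", "c"})
pos = ["y"]
print(max(depth[po] for po in pos))  # optimal mapping depth: 1
print(sorted(best_cut["y"]))         # ['a', 'b', 'c']
```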
11
Area Estimation
  • $A_C = \sum_{i \in \mathrm{input}(C)} A_i / f(i) + U_C$
    • $A_i$: estimated area of the fanin cone on signal i
    • $f(i)$: fanout number of i
    • $U_C$: area of the cut itself
  • Estimates area while accounting for the fanout effect
    (Praetor, Cong et al., FPGA99)
  • Can under-estimate the area because of node
    duplications

[Figure: example with cut C, cuts $C_t$ and $C_u$, and a node p with fanout $f(p) = 2$]
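The area estimate above can be written as a one-line function. This sketch assumes U_C is one LUT (unit area) and that the A_i and f(i) values are already known for each input; the function name and example values are hypothetical.

```python
def cut_area(cut, area, fanout, unit_area=1.0):
    """Estimated area A_C = sum over i in input(C) of A_i / f(i), plus U_C.

    area:      A_i, estimated area of the fanin cone driving input i
    fanout:    f(i), fanout number of input i
    unit_area: U_C, area of the cut itself (assumed here to be one LUT)
    """
    return sum(area[i] / fanout[i] for i in cut) + unit_area


# Hypothetical inputs: p drives two fanouts, so C is charged half of A_p.
area = {"p": 2.0, "q": 1.0, "r": 0.0}  # r is a PI: no LUT area behind it
fanout = {"p": 2, "q": 1, "r": 1}
print(cut_area({"p", "q", "r"}, area, fanout))  # 3.0
```

If p's cone ends up duplicated instead of shared, the real cost exceeds this estimate, which is the under-estimation noted on the slide.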
12
Cost (Area) Function of a Cut
  • Some key parameters:
    • $I_C$: cut size of C
    • $N_C$: number of nodes covered by C
    • $f(v)$: fanout number of the root node v
    • $P_f$: duplication cost

[Figure: two cuts, C1 and C2, for node v]
13
Duplication Cost Adjustment
  • Consider potential node duplications
    • Check the sub-cuts for multiple fanouts
    • Propagate the adjusted cost globally
  • Duplication cost
    • $N_{C_f}$: number of nodes the subcut $C_f$ contains
    • $I_C$: cut size of C

[Figure: new cut C with $I_C = 4$, subcuts $C_{f1}$ and $C_{f2}$ ($N_{C_{f2}} = 1$), and nodes with multiple fanouts]
14
Cut Selection and Mapping Generation
  • Proceed from POs to PIs
  • Critical paths: optimal delay, with the best area
    available
  • Non-critical paths: relaxed delay, better area

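A simplified sketch of the PO-to-PI selection pass, assuming depth labels from the forward pass and a caller-supplied cost function. Required times start at the optimal depth for every PO, so critical nodes must keep their optimal delay while non-critical nodes may take a slower but cheaper cut. The function name, the toy cost (cut size), and the example network are hypothetical; this is not DAOmap's full selection procedure, which also applies the local cost adjustments.

```python
def select_cuts(cuts, arrival, cost, pos, pis, order):
    """Backward pass, POs to PIs: pick one cut for every node that needs a LUT.

    arrival: optimal depth label of each node (0 for PIs)
    cost:    cost(v, cut) -> estimated area of implementing v with cut
    """
    target = max(arrival[po] for po in pos)  # optimal mapping depth
    required = {po: target for po in pos}
    needed = {po for po in pos if po not in pis}
    mapping = {}
    for v in reversed(order):
        if v not in needed:
            continue
        # Only cuts that still meet v's required time are allowed.
        feasible = [
            c for c in cuts[v]
            if c != frozenset([v])
            and 1 + max(arrival[i] for i in c) <= required[v]
        ]
        best = min(feasible, key=lambda c: cost(v, c))  # cheapest feasible cut
        mapping[v] = best
        for i in best:
            # Each cut input must arrive one LUT delay before v is required.
            required[i] = min(required.get(i, target), required[v] - 1)
            if i not in pis:
                needed.add(i)
    return mapping


# Hypothetical network (K = 3): x = f(a, b), y = f(x, c), z = f(y, d), w = f(x, d).
cuts = {
    "x": {frozenset("x"), frozenset("ab")},
    "y": {frozenset("y"), frozenset("xc"), frozenset("abc")},
    "z": {frozenset("z"), frozenset("yd"), frozenset("xcd")},
    "w": {frozenset("w"), frozenset("xd"), frozenset("abd")},
}
arrival = {"a": 0, "b": 0, "c": 0, "d": 0, "x": 1, "y": 1, "z": 2, "w": 1}
order = ["a", "b", "c", "d", "x", "y", "z", "w"]
mapping = select_cuts(cuts, arrival, lambda v, c: len(c),  # toy cost: cut size
                      pos=["z", "w"], pis={"a", "b", "c", "d"}, order=order)
for v, c in mapping.items():
    print(v, sorted(c))
```

Here PO w is non-critical (its best depth is 1 but the target is 2), so the sketch lets it use the smaller cut {x, d}; PO z is critical, which forces y onto its depth-1 cut {a, b, c}.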
15
Techniques for Better Cut Selection
  • Cut selection is equivalent to a min-cover problem
  • A purely greedy approach does not work well
  • Use heuristics to guide the selection
    • Iterative Cut Selection Procedure
    • Local Cost Adjustment
      • Input Sharing
      • Slack Distribution
      • Cut Probing

16
Iterative Cut Selection (ICS)
  • Some valuable area information is unknown
    until after mapping
    • mapped LUT root nodes
    • duplicated nodes
  • ICS carries out multiple mapping iterations
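A skeleton of the iterative procedure, under the assumption that one complete mapping pass can be wrapped as a function taking the LUT roots found by the previous pass. `map_once`, `toy_map_once`, and the default of three iterations are illustrative choices; the sketch carries only root information forward, not the duplicated-node profile that the slide also lists.

```python
def iterative_cut_selection(map_once, iterations=3):
    """Run several mapping passes, each seeded with the previous pass's roots.

    map_once(prev_roots) -> (mapping, area) is a hypothetical stand-in for one
    full pass of cost propagation plus cut selection; mapping is {root: cut}.
    """
    prev_roots = frozenset()
    best_mapping, best_area = None, float("inf")
    for _ in range(iterations):
        mapping, area = map_once(prev_roots)
        if area < best_area:  # keep the best solution seen so far
            best_mapping, best_area = mapping, area
        prev_roots = frozenset(mapping)  # LUT roots known only after mapping
    return best_mapping, best_area


def toy_map_once(prev_roots):
    # Toy model only: pretend each root known from the last pass saves one LUT.
    mapping = {"x": frozenset("ab"), "y": frozenset("xc")}
    return mapping, 5 - len(prev_roots)


best_mapping, best_area = iterative_cut_selection(toy_map_once)
print(best_area)  # 3
```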

17
Local Cost Adjustment: Input Sharing
  • Takes advantage of existing resources
  • Considers roots from previous iterations
  • The more inputs a cut shares with other cuts, the
    better for the cut

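One way to turn the input-sharing idea into a cost adjustment is sketched below. The discount `base_cost / (1 + shared)` is a hypothetical formula chosen for illustration only; the slide states the principle (more shared inputs make a cut more attractive) but not DAOmap's actual adjustment.

```python
def sharing_adjusted_cost(cut, base_cost, prev_roots, other_inputs):
    """Hypothetical input-sharing adjustment (not the paper's formula).

    A cut input that is already a LUT root from a previous iteration, or an
    input of another selected cut, counts as shared; more sharing gives a
    lower adjusted cost.
    """
    shared = sum(1 for i in cut if i in prev_roots or i in other_inputs)
    return base_cost / (1 + shared)


prev_roots = {"a"}    # root found in an earlier iteration
other_inputs = {"b"}  # input already used by another cut
print(sharing_adjusted_cost(frozenset("abc"), 3.0, prev_roots, other_inputs))  # 1.0
print(sharing_adjusted_cost(frozenset("cde"), 3.0, prev_roots, other_inputs))  # 3.0
```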
18
Local Cost Adjustment: Slack Distribution
  • $\mathrm{Slack}_C = \mathrm{Req}_v - 1 - \max_{i \in \mathrm{input}(C)} \mathrm{Arr}_i$
  • If $\mathrm{Slack}_C < 0$, C is not a timing-feasible cut
  • The larger $\mathrm{Slack}_C$ is, the better C is in terms
    of the slack distribution effect

[Figure: cut C rooted at node d, marking the largest arrival time among its inputs and $\mathrm{Req}_d$, the required time of the root]
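The slack formula translates directly into code. This sketch assumes integer arrival and required times under the unit delay model; `best_by_slack` is a hypothetical helper that discards timing-infeasible cuts and ranks the rest by slack, which is only one way to apply the slack distribution effect.

```python
def cut_slack(cut, required_root, arrival, lut_delay=1):
    """Slack_C = Req_v - 1 - max_{i in input(C)} Arr_i under the unit delay model."""
    return required_root - lut_delay - max(arrival[i] for i in cut)


def best_by_slack(cuts, required_root, arrival):
    """Drop timing-infeasible cuts (negative slack); rank the rest, largest slack first."""
    scored = [(cut_slack(c, required_root, arrival), c) for c in cuts]
    feasible = [(s, c) for s, c in scored if s >= 0]
    return sorted(feasible, key=lambda sc: sc[0], reverse=True)


arrival = {"a": 0, "b": 2, "c": 1, "x": 4}
candidates = [frozenset("abc"), frozenset("xc")]
for slack, cut in best_by_slack(candidates, required_root=4, arrival=arrival):
    print(sorted(cut), slack)  # ['a', 'b', 'c'] 1
```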
19
Local Cost Adjustment: Cut Probing
  • Probe the local area gain before
    making a decision about a cut
  • Reduce connections between LUTs
  • Reduce potential node duplications based on
    previous duplication profiling
  • Handle reconvergent paths

Use $C_{final}$ to guide cut selection
20
Experimental Results: Settings
  • DAOmap is implemented in C within the
    UCLA RASP system
  • Compare LUT counts and runtime with CutMap
    (Cong et al., FPGA95)
  • Platform: a 750 MHz SunBlade-1000 Solaris machine
  • Test with LUT input sizes K = 4 to 6
  • Benchmarks
    • 20 largest MCNC benchmarks
    • A set of large industrial benchmarks

21
Experimental Results of DAOmap over CutMap on
MCNC Benchmarks
After mapping:
  LUT     Avg. area reduction (%)   Avg. runtime improvement
  4-LUT   -13.98                    13.2X
  5-LUT   -16.02                    24.2X
  6-LUT   -12.44                    4.7X
After mapping and packing (daomap + mpack vs.
cutmap + mpack):
  LUT     Avg. area reduction (%)   Avg. runtime improvement
  4-LUT   -7.50                     57.7X
  5-LUT   -11.31                    38.7X
  6-LUT   -7.90                     10.1X
22
Detailed Experimental Results on Industrial
Benchmarks
  Benchmark  CutMap LUTs  CutMap time (s)  DAOmap LUTs  DAOmap time (s)  LUT reduction (%)  Runtime improvement
  big1       9928         301              9169         93               -7.6               3.2X
  big2       -            >10 h            14625        708              -                  -
  big3       10005        28926            9031         106              -9.7               272.9X
  big4       11800        583              9364         156              -20.6              3.7X
  big5       -            >10 h            32230        3377             -                  -
  big6       39000        14437            32028        402              -17.9              35.9X
  Average                                                                -13.98             78.9X
After mapping into 5-LUTs
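As a consistency check, the averages can be recomputed from the raw LUT counts and run times. This sketch assumes they are plain arithmetic means over the four benchmarks that CutMap finished (big2 and big5 excluded), which reproduces the reported -13.98 and 78.9X.

```python
# Industrial results after mapping into 5-LUTs; big2 and big5 are omitted
# because CutMap did not finish within 10 hours.
# (benchmark, CutMap LUTs, CutMap time (s), DAOmap LUTs, DAOmap time (s))
rows = [
    ("big1", 9928, 301, 9169, 93),
    ("big3", 10005, 28926, 9031, 106),
    ("big4", 11800, 583, 9364, 156),
    ("big6", 39000, 14437, 32028, 402),
]

# Per-benchmark LUT change in percent (negative = fewer LUTs) and speedup.
reductions = [100.0 * (dao - cut) / cut for _, cut, _, dao, _ in rows]
speedups = [t_cut / t_dao for _, _, t_cut, _, t_dao in rows]

avg_reduction = sum(reductions) / len(reductions)
avg_speedup = sum(speedups) / len(speedups)
print(f"average LUT reduction: {avg_reduction:.2f}%")      # -13.98%
print(f"average runtime improvement: {avg_speedup:.1f}X")  # 78.9X
```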
23
Individual Technique Analysis
Technique dropped
  Cut enumeration
    Min-cost propagation            4.35
    Global cost adjustment          2.68
  Cut selection
    Input sharing                   4.55
    Iterative cut selection (ICS)   2.04
    Others                          <1
24
Mapping Iteration Analysis
[Figure: improvement versus number of mapping iterations, for 1 to 6 iterations]
  • For the single-iteration base case, manual
    profiling is used (Chen et al., FPGA04)
  • More than three iterations bring no further
    improvement

25
Conclusions and Future Work
  • We presented a new mapping algorithm, DAOmap, to
    minimize FPGA delay and area
  • We built several cost-adjustment heuristics and
    used an iterative mapping procedure
  • DAOmap achieved significant area and runtime
    reductions over CutMap, a state-of-the-art
    algorithm
  • Future work includes adding cut-pruning
    techniques for mapping with larger K values