DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs

1
DAOmap: A Depth-optimal Area Optimization Mapping
Algorithm for FPGA Designs
  • Deming Chen, Jason Cong
  • Computer Science Department
  • University of California, Los Angeles

This work is partially supported by the
California MICRO program and the NSF Grant
CCR-0306682
2
Outline
  • Introduction
  • Related Works
  • Definitions and Problem Formulation
  • Algorithm Description
      • Cut Enumeration
      • Delay and Area Propagation
      • Cost Function for a Cut
      • Global and Local Cost Adjustments
      • Cut Selection
  • Experimental Results
  • Conclusions and Future Work

3
Introduction
  • Field-programmable gate arrays (FPGAs) have become
    increasingly popular
      • Fast time to market
      • No or very low NRE (non-recurring engineering) cost
  • The LUT-based (look-up table) FPGA architecture dominates the
    existing programmable chip industry
  • FPGA technology mapping converts a given Boolean
    circuit into a functionally equivalent network
    composed only of LUTs
  • FPGA technology mapping is a crucial optimization
    step in the FPGA design flow

4
Related Works on FPGA Mapping
  • Area Minimization
      • Chortle-crf [Francis et al., DAC'91]
      • MIS-pga [Murgai et al., ICCAD'91]
      • Praetor [Cong et al., FPGA'99]
      • Anti-fuse FPGA Mapper [Kang et al., ASPDAC'04]
  • Delay Minimization
      • DAG-Map [Chen et al., DTC'92]
      • FlowMap [Cong et al., ICCAD'92]
      • Edge-map [Yang et al., ICCAD'94]
  • Power Minimization
      • PowerMinMap [Li et al., ASPDAC'03]
      • Emap [Lamoureux et al., ICCAD'03]
      • DVmap [Chen et al., FPGA'04]
  • Simultaneous Delay and Area Minimization
      • FlowMap-r [Cong et al., TVLSI'94]
      • CutMap [Cong et al., FPGA'95]
      • BoolMap-D [Legl et al., DAC'96]

5
Definitions
  • DAG (directed acyclic graph): Boolean network
  • Cone $C_v$: sub-network rooted on node v
  • K-feasible cone: $|input(C_v)| \le K$
  • Fanin cone $F_v$: the largest $C_v$
  • K-feasible cut: a K-feasible $C_v$
  • Unit delay model
      • One LUT contributes one unit delay
      • No edge delay

[Figure: example network with primary inputs (PIs) and nodes a, b, c, d, e, v]
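The definitions above can be checked mechanically. The following is a minimal sketch, not the authors' code: the dictionary `fanins`, the helper names `cone_inputs` and `is_k_feasible`, and the small example network are all assumptions made for illustration (the network is not the one in the slide's figure).

```python
def cone_inputs(fanins, cone):
    """Return input(C): nodes outside the cone that drive a node inside it."""
    return {u for v in cone for u in fanins.get(v, ()) if u not in cone}


def is_k_feasible(fanins, cone, k):
    """A cone is K-feasible when it has at most K inputs."""
    return len(cone_inputs(fanins, cone)) <= k


# Hypothetical network: a, b, c are primary inputs (no fanins listed).
fanins = {"d": ["a", "b"], "e": ["b", "c"], "v": ["d", "e"]}

small_cone = {"v"}            # inputs are d and e
large_cone = {"v", "d", "e"}  # inputs are a, b and c

print(sorted(cone_inputs(fanins, small_cone)))
print(sorted(cone_inputs(fanins, large_cone)))
print(is_k_feasible(fanins, large_cone, 3), is_k_feasible(fanins, large_cone, 2))
```

In this example the larger cone fits into one 3-LUT but not into one 2-LUT, which is exactly what the K-feasibility test expresses.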
6
Problem Formulation
  • Delay-optimal area optimization problem
      • Given: a Boolean network and an integer K
      • Goal: cover the network with K-feasible cones
        (K-LUTs), such that
          • the mapping depth is optimal
          • the area (number of LUTs) is minimized
  • Area minimization is an NP-hard problem

7
Highlights of Our Algorithm
  • Consider potential node duplications and make
    mapping-area estimation close to reality
  • Search solution space considering both global and
    local optimality information
  • Carry out an iterative cut selection procedure on
    top of cost adjustment to further improve
    solution quality
  • Techniques used are simple and intuitive
  • The key is the right combination of them

8
Cut Enumeration
[Figure: example network with nodes a, b, c, d, w, x, y, z]
  • Combine sub-cuts on the inputs of the gate
  • Process each gate in topological order from
    PIs to POs
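The two steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the DAOmap implementation: the function name `enumerate_cuts`, the `fanins` dictionary, and the example network are hypothetical, and no cut pruning is performed.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """Enumerate all K-feasible cuts of every node.

    fanins: dict mapping each gate to the list of its fanin nodes
            (primary inputs have no entry).
    order:  all nodes in topological order, primary inputs first.
    Each cut is a frozenset of nodes. The trivial cut {v} is kept for
    every node so that cuts of its fanouts can be built from it.
    """
    cuts = {}
    for v in order:
        result = {frozenset([v])}
        ins = fanins.get(v, [])
        if ins:
            # Pick one sub-cut per fanin and merge the picks.
            for combo in product(*(cuts[u] for u in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:  # keep only K-feasible cuts
                    result.add(merged)
        cuts[v] = result
    return cuts


# Hypothetical network: z = g(x, y), x = g(a, b), y = g(b, c).
fanins = {"x": ["a", "b"], "y": ["b", "c"], "z": ["x", "y"]}
order = ["a", "b", "c", "x", "y", "z"]

cuts3 = enumerate_cuts(fanins, order, 3)
cuts2 = enumerate_cuts(fanins, order, 2)
print(sorted(sorted(c) for c in cuts3["z"]))
print(sorted(sorted(c) for c in cuts2["z"]))
```

With K = 3 the root z gets five cuts (including the trivial one); with K = 2 only {z} and {x, y} survive, because every other merge has three inputs.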
9
Complexity Analysis
  • The number of cuts on a node for the worst case
    is $O(n^K)$
  • Practically, it is a small constant for small K
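
One way to see where the worst-case bound comes from (a standard counting argument, not taken from the slides): a cut of a node is a set of at most K nodes chosen from the n nodes of the network, so the number of K-feasible cuts of any node v satisfies

```latex
\[
  \#\mathrm{cuts}(v) \;\le\; \sum_{j=1}^{K} \binom{n}{j} \;=\; O(n^{K})
  \quad \text{for fixed } K .
\]
```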

Both the maximum and average numbers are obtained by averaging
over the 20 largest MCNC benchmarks
10
Delay and Area Propagation
[Figure: example network with nodes a to g and w to z, annotated with propagated values: three nodes with optimal delay 1 and area 1, and one node with optimal delay 2 and area 2]
  • Propagation process visits cuts and nodes
    iteratively
  • The longest best delay on the POs is
    the optimal mapping delay
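The delay half of this propagation can be sketched under the unit delay model defined earlier (one LUT costs one unit, edges cost nothing). The function name `propagate_delay`, the data layout, and the example cuts are assumptions for illustration; area propagation is omitted here.

```python
def propagate_delay(cuts, order, primary_inputs):
    """Best achievable arrival time of every node under the unit delay model.

    cuts:  dict mapping each gate to its non-trivial K-feasible cuts,
           each cut being a set of input nodes.
    order: all nodes in topological order, primary inputs first.
    """
    delay = {}
    for v in order:
        if v in primary_inputs:
            delay[v] = 0
        else:
            # A LUT on cut C is ready one unit after its slowest input.
            delay[v] = min(1 + max(delay[i] for i in cut) for cut in cuts[v])
    return delay


# Hypothetical cut sets for a small network with primary output z.
order = ["a", "b", "c", "x", "y", "z"]
pis = {"a", "b", "c"}
cuts_k3 = {"x": [{"a", "b"}], "y": [{"b", "c"}],
           "z": [{"x", "y"}, {"a", "b", "c"}]}
cuts_k2 = {"x": [{"a", "b"}], "y": [{"b", "c"}], "z": [{"x", "y"}]}

delay_k3 = propagate_delay(cuts_k3, order, pis)
delay_k2 = propagate_delay(cuts_k2, order, pis)

# The optimal mapping delay is the largest best delay over the POs.
print(max(delay_k3[po] for po in ["z"]))
print(max(delay_k2[po] for po in ["z"]))
```

With the 3-input cut available, z can be implemented by a single LUT; without it, two LUT levels are needed.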
11
Area Estimation
  • $A_C = \sum_{i \in input(C)} [A_i / f(i)] + U_C$
      • $A_i$: estimated area of the fanin cone on signal i
      • $f(i)$: fanout number of i
      • $U_C$: area of the cut itself
  • Tries to estimate area considering the fanout effect
    [Praetor, Cong et al., FPGA'99]
  • Can underestimate the area because of node
    duplications

[Figure: example network with nodes m to u, where f(p) = 2; cut C and the cuts Ct and Cu are marked]
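The area estimate on this slide can be evaluated directly. The sketch below assumes that the area of the cut itself is one LUT (`unit_area=1.0`); the function name `cut_area`, the dictionaries `est_area` and `fanout`, and the numbers are hypothetical.

```python
def cut_area(cut, est_area, fanout, unit_area=1.0):
    """Estimated area of a cut: shared fanin-cone areas plus the cut itself.

    Each input's estimated area is divided by its fanout number, so a
    signal used by several cones contributes only its share to each.
    """
    return sum(est_area[i] / fanout[i] for i in cut) + unit_area


# Hypothetical values: input p feeds two nodes, input q feeds one.
est_area = {"p": 2.0, "q": 1.0}
fanout = {"p": 2, "q": 1}

area = cut_area({"p", "q"}, est_area, fanout)
print(area)  # 2.0/2 + 1.0/1 + 1.0 = 3.0
```

Because p's area is split between its two fanouts, the estimate stays below the true LUT count whenever p's cone ends up duplicated, which is the under-estimation the slide warns about.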
12
Cost (Area) Function of a Cut
  • Some key parameters
      • $I_C$: cut size of C
      • $N_C$: number of nodes covered by C
      • $f(v)$: fanout number of the root node v
      • $P_f$: duplication cost

[Figure: example network with nodes a to e and root v, showing cuts C1 and C2]
13
Duplication Cost Adjustment
  • Consider potential node duplications
      • Check the sub-cuts for multiple fanouts
      • Propagate adjusted cost globally
  • Duplication cost
      • $N_{C_f}$: number of nodes the subcut $C_f$ contains
      • $I_C$: cut size of C

[Figure: example network with nodes m to s; a new cut C with $I_C = 4$ is built from subcuts $C_{f1}$ and $C_{f2}$, where $N_{C_{f2}} = 1$, and multiple fanouts are marked]
14
Cut Selection and Mapping Generation
  • From POs to PIs
  • Critical paths: optimal delay, best area
    available
  • Non-critical paths: relaxed delay, better area

[Figure: example network with nodes a to g and w to z; the list L of LUT roots evolves as L = {f, g}, L = {g, e, d}, L = {e, d}, L = {b}]
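The traversal from POs to PIs with a list of LUT roots can be sketched as below. This is a simplified illustration: it assumes one cut per node has already been chosen (`best_cut`, a hypothetical input) and does not model the different treatment of critical and non-critical paths. The function name `select_cuts` and the example are assumptions.

```python
from collections import deque


def select_cuts(best_cut, primary_outputs, primary_inputs):
    """Generate a mapping by walking from the POs back to the PIs.

    best_cut: dict mapping each gate to its chosen cut (a set of inputs).
    Returns a dict {LUT root: cut}. Only nodes reached from the POs
    become LUT roots; nodes covered inside a cut are never visited.
    """
    mapping = {}
    roots = deque(primary_outputs)  # the list L of pending LUT roots
    while roots:
        v = roots.popleft()
        if v in mapping or v in primary_inputs:
            continue
        mapping[v] = best_cut[v]
        # The inputs of the chosen cut become LUT roots in turn.
        roots.extend(sorted(best_cut[v]))
    return mapping


# Hypothetical choice: z is covered together with y, so y needs no LUT.
best_cut = {"z": {"x", "c"}, "x": {"a", "b"}, "y": {"b", "c"}}
mapping = select_cuts(best_cut, ["z"], {"a", "b", "c"})
print(sorted(mapping))   # LUT roots
print(len(mapping))      # area in LUTs
```

Here y has a cut available but is never reached from the PO, so the mapping uses two LUTs rather than three.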
15
Techniques for Better Cut Selection
  • Cut selection is equivalent to the min-cover
    problem
      • A greedy approach will not work well
  • Use heuristics to guide the selection procedure
      • Iterative Cut Selection Procedure
      • Local Cost Adjustment
          • Input Sharing
          • Slack Distribution
          • Cut Probing

16
Iterative Cut Selection (ICS)
  • Some valuable information on area is unknown
    until after mapping
      • mapped LUT root nodes
      • duplicated nodes
  • ICS carries out multiple mapping iterations

17
Local Cost Adjustment: Input Sharing
  • Takes advantage of existing resources
  • Considers roots from previous iterations
  • The more a cut shares inputs with others, the
    better for the cut

[Figure: example with nodes d, e, f, g]
18
Local Cost Adjustment: Slack Distribution
  • $Slack_C = Req_v - 1 - \max_{i \in input(C)} Arr_i$
  • If $Slack_C < 0$, C is not a timing-feasible cut
  • The larger $Slack_C$ is, the better C is in terms
    of the slack distribution effect

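[Figure: example network with nodes a to d and w to z; cut C is rooted at node d, with labels marking the largest arrival time among the inputs and $Req_d$, the required time of the root]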
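The slack of a cut follows directly from the required time of its root and the arrival times of its inputs. The sketch below assumes the unit delay model (a LUT costs one unit); the function name `cut_slack` and the arrival times are hypothetical.

```python
def cut_slack(cut, arrival, required_root, lut_delay=1):
    """Slack of a cut: required time of the root, minus the LUT delay,
    minus the largest arrival time among the cut's inputs."""
    return required_root - lut_delay - max(arrival[i] for i in cut)


# Hypothetical arrival times of the inputs of a cut C.
arrival = {"a": 0, "b": 1, "c": 2}
cut = {"a", "b", "c"}

slack_ok = cut_slack(cut, arrival, required_root=4)
slack_tight = cut_slack(cut, arrival, required_root=3)
slack_bad = cut_slack(cut, arrival, required_root=2)

for slack in (slack_ok, slack_tight, slack_bad):
    # A negative slack means the cut is not timing-feasible.
    print(slack, "feasible" if slack >= 0 else "not timing-feasible")
```

A cut with larger slack leaves more room to relax delay on its inputs, which is why larger slack is preferred on non-critical paths.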
19
Local Cost Adjustment: Cut Probing
  • Probe the amount of area gain locally before
    making decisions about a cut
  • Reduce connections between LUTs
  • Reduce potential node duplications based on
    previous duplication profiling
  • Reconvergent paths handling

Use Cfinal to guide cut selection
20
Experimental Results: Settings
  • DAOmap is implemented in C within the
    UCLA RASP system
  • We compare LUT counts and runtime against CutMap
    [Cong et al., FPGA'95], a state-of-the-art delay-optimal
    area minimization algorithm
  • Experiments run on a 750 MHz SunBlade 1000 Solaris machine
  • Benchmarks: the 20 largest MCNC benchmarks and a set of
    industrial benchmarks
  • LUT input counts (K) range from 4 to 6

21
Experimental Results of DAOmap over CutMap on
MCNC Benchmarks
After mapping
After mapping and packing (cutmap x mpack)
22
Detailed Experimental Results on Industrial
Benchmarks
After mapping into 5-LUTs
23
Individual Technique Analysis
24
Mapping Iteration Analysis
[Figure: improvement (y-axis, 0.0 to 2.5) versus number of mapping iterations (x-axis, 1 to 6)]
  • For a single iteration only (the base case), use
    manual profiling [Chen et al., FPGA'04]
  • Running more than 3 iterations gives
    no further improvement

25
Conclusions and Future Work
  • We presented a new mapping algorithm, DAOmap, to
    minimize FPGA delay and area
  • We built novel cost-adjustment heuristics and
    used an iterative mapping procedure
  • DAOmap achieved significant area and
    runtime reductions over CutMap, a state-of-the-art
    algorithm
  • Future work includes adding cut-pruning
    techniques for mapping with larger K values