Lokesh Subramany - PowerPoint PPT Presentation

1
Depth Optimal Area Optimization Mapping

By
Lokesh Subramany
Stu 23789289

2
Outline

Introduction to Tech mapping
FPGA Architecture
Some definitions
Problem definition
Alternate approaches
Algorithmic description
Enhancements to the basic algorithm
Results
Conclusion

3
Introduction

Need for FPGAs- Short design windows, changing
requirements, and cost favor the use of an FPGA
for emulation of logic systems.
What is FPGA tech mapping- Converting a given
Boolean circuit into a functionally equivalent
network composed only of LUTs.
Role of tech mapping- It is the actual choice of
gates to implement the equations, for example
choosing the fastest gates along the critical
path and the most area-efficient combination of
gates off the critical path.

4
FPGA structure

The basic logic element (BLE) consists of a
K-input look-up table (LUT) and a D flip-flop.
Each LUT produces a single output.
Sequential circuits are obtained by routing the
LUT output through the D flip-flop.
Combinational circuits are obtained by connecting
the output of the LUT directly to the output
buffer.
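As a rough illustration of what a K-input LUT does, it can be modelled as a truth table with 2^K entries indexed by the input bits. The sketch below is not from the presentation; the class name `LUT`, its methods, and the bit ordering are assumptions made for the example.

```python
from typing import Sequence


class LUT:
    """Hypothetical model of a K-input look-up table (illustration only)."""

    def __init__(self, k: int, truth_table: Sequence[int]):
        if len(truth_table) != 2 ** k:
            raise ValueError("a K-input LUT needs exactly 2**K table entries")
        self.k = k
        self.truth_table = list(truth_table)

    def evaluate(self, inputs: Sequence[int]) -> int:
        """Return the single output bit for one input vector."""
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:  # assumed ordering: first input is the MSB
            index = (index << 1) | (bit & 1)
        return self.truth_table[index]


# A 2-input LUT programmed as XOR: entries for inputs 00, 01, 10, 11.
xor_lut = LUT(2, [0, 1, 1, 0])
```

Any Boolean function of at most K inputs fits in one such table, which is why the mapper only needs to bound the number of inputs of each cone.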

5
Logic cluster

The logic cluster is made up of N BLEs.
This is obtained after packing the BLEs.

6
General Definition of the mapping problem

The tech mapping problem is viewed as the
optimization problem of finding a minimum-cost
covering of the subject graph by choosing from
the collection of pattern graphs created for all
gates in the library.
Cover- A collection of pattern graphs such that
every node of the subject graph is contained in
one or more of the pattern graphs.
Area optimization- The cost of the cover is
defined as the sum of the areas of the individual
gates.
Delay optimization- The cost of the cover is
defined as the critical path delay of the
resulting circuit using an appropriate delay
model.
Minimum area under timing constraint- A cover
that results in a circuit whose critical path
delay exceeds the delay allowed for any output is
considered illegal.

7
Some Definitions and notations

PI (Primary input)- A node that does not have
any incoming edges.
PO (Primary output)- A node that does not have
any outgoing edges.
Cone (Ov)- A subnetwork of the original network,
consisting of v and some of its predecessors,
such that for any node w in Ov, there is a path
from w to v in Ov.
Fanin cone (Fv)- The maximum cone of v,
consisting of all PI predecessors of v.
Input(Ov)- The set of distinct nodes outside Ov
which supply inputs to the gates in Ov.
Cut- A partitioning (X, X̄) of a cone Ov such
that X̄ is a cone of v.
Cut-set- Denoted V(X, X̄); it consists of
input(X̄).
8
More definitions

Cutsize- The cardinality of the cut-set. A cut is
said to be K-feasible if its cutsize is ≤ K.
Level- The level of a node v is the length of
the longest path from any PI to the node v.
Depth- The depth of a network is the largest
node level in the network.
L-bounded- A Boolean network is l-bounded if
|input(v)| ≤ l for each node v.
Unit delay model- Each interconnection edge in
the Boolean network is assumed to have a constant
delay, which translates to each LUT on the
critical path contributing one unit delay.
Mapping depth- The depth of the mapped circuit,
i.e. its largest delay under the unit delay
model.
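The level, depth, and l-bounded definitions can be checked on a small network. This is a minimal sketch, assuming the network is stored as a dictionary from each node to its list of fanin nodes; the helper names and the example network are hypothetical.

```python
def node_levels(fanins):
    """Level of each node: longest path length from any PI (PIs are level 0)."""
    levels = {}

    def visit(v):
        if v not in levels:
            if not fanins[v]:                      # no incoming edges: a PI
                levels[v] = 0
            else:
                levels[v] = 1 + max(visit(u) for u in fanins[v])
        return levels[v]

    for v in fanins:
        visit(v)
    return levels


def network_depth(fanins):
    """Depth of the network: the largest node level."""
    return max(node_levels(fanins).values())


def is_l_bounded(fanins, l):
    """True if every node has at most l fanins."""
    return all(len(f) <= l for f in fanins.values())


# Hypothetical example: three PIs feeding a chain of 2-input gates.
example = {
    "a": [], "b": [], "c": [],
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g2", "a"],
}
```

Here `g3` sits at level 3, so the network depth is 3, and the network is 2-bounded.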

9
Problem Formulation

The mapping problem is to cover a given l-bounded
Boolean network with K-feasible cones (K-LUTs)
such that the total LUT count after mapping is
minimized while the optimal mapping depth is
guaranteed under the unit delay model.

10
Alternate approaches

Area minimization- Chortle-crf, MIS-pga, XMap,
VisMap, TechMap and Praetor.
Delay minimization- Chortle-d, MIS-pga-delay,
TechMap-L, DAG-map, FlowMap.
Power minimization- PowerMap, PowerMinMap, Emap.
Delay and area minimization- FlowMap-r, CutMap.
FlowMap-r starts with a depth-optimal mapping
solution and applies depth relaxation techniques
such as remapping and node packing on
non-critical paths.
CutMap combines depth and area minimization
during the mapping process by computing min-cost
min-height K-feasible cuts for non-critical nodes
using the network flow method. CutMap is widely
used in various FPGA evaluation and design flows.

11
Algorithm Methodology

The method is based on cut enumeration and
consists of cut generation and cut selection.
Cut generation traverses the network from the PIs
to the POs.
The subcuts on the fanin nodes of the target node
are combined to generate all the cuts on the
target node. Each cut represents one possible LUT
implementation rooted on the target node.
After the cuts are generated, the network is
traversed from the POs to the PIs, and the cuts
are selected to produce the LUT mapping result.
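The backward traversal from the POs to the PIs can be sketched as a worklist pass. This is only an illustration under the assumption that one cut per node has already been chosen; `select_cover` and the example network are hypothetical, and the choice of cut itself is not modelled.

```python
def select_cover(chosen_cut, primary_inputs, primary_outputs):
    """Walk from the POs towards the PIs and collect the LUT roots.

    chosen_cut[v] is the cut (set of input nodes) already picked for node v.
    Only nodes that are reached as a PO or as a cut input become LUT roots;
    nodes hidden inside a chosen cone are covered by that cone's LUT.
    """
    lut_roots = set()
    worklist = list(primary_outputs)
    while worklist:
        v = worklist.pop()
        if v in primary_inputs or v in lut_roots:
            continue
        lut_roots.add(v)
        worklist.extend(chosen_cut[v])  # the inputs of this LUT must exist too
    return lut_roots


# Hypothetical network: g1 = f(a, b), g2 = f(g1, c), g3 = f(g2, d).
# With K = 3, g2 can absorb g1 by using the cut {a, b, c}.
cuts = {
    "g1": {"a", "b"},
    "g2": {"a", "b", "c"},
    "g3": {"g2", "d"},
}
roots = select_cover(cuts, {"a", "b", "c", "d"}, ["g3"])
```

In this example only `g2` and `g3` become LUTs, because `g1` lies inside the cone implemented by the LUT rooted on `g2`.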

12
Cut Enumeration

Cut enumeration means generating all K-feasible
cuts of a cone for a given node.
A cut rooted on node v can be represented using a
product term (or a p-term) of the variables
associated with the nodes in the cut-set V(Xv,
X̄v). A set of cuts can be represented by a
sum-of-product expression using the corresponding
p-terms. Cut enumeration is guided by the
following theorem [6]:
$$f(K, v) = \bigotimes^{K}_{u \in input(v)} \left[ u + f(K, u) \right]$$
where f(K, v) represents all the K-feasible
cuts rooted at node v, the operator + is Boolean
OR, and $\otimes^{K}$ is Boolean AND on its
operands, but filtering out all the resulting
p-terms with more than K variables.

13
Cut enumeration continued

In the example, all the cuts rooted on node s can
be generated by combining the cuts rooted on its
fanin nodes q and r. The cuts on the fanin nodes
are called subcuts. Combining C1 with C2 forms a
new cut Cs = {m, n, o, p} rooted on s. If the
number of inputs of the new cut exceeds K, the
cut is discarded.
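The combination of subcuts can be sketched in a few lines: for each fanin, take either the fanin node itself or one of its cuts, merge one choice per fanin, and discard merged cuts with more than K inputs. The function name and the example network are assumptions; in particular, the fanins of q and r (m, n and o, p) are guessed from the cut named on the slide, since the figure is not available.

```python
from itertools import product


def enumerate_cuts(fanins, k):
    """Return {node: set of cuts}; each cut is a frozenset of at most k nodes."""
    cuts = {}

    def visit(v):
        if v in cuts:
            return cuts[v]
        if not fanins[v]:                 # a PI has no cuts of its own
            cuts[v] = set()
            return cuts[v]
        # Per fanin u: either u itself, or any cut already rooted on u.
        choices = [[frozenset([u])] + list(visit(u)) for u in fanins[v]]
        result = set()
        for combo in product(*choices):   # one choice per fanin
            merged = frozenset().union(*combo)
            if len(merged) <= k:          # keep only K-feasible cuts
                result.add(merged)
        cuts[v] = result
        return result

    for v in fanins:
        visit(v)
    return cuts


# Assumed structure of the slide's example network.
network = {
    "m": [], "n": [], "o": [], "p": [],
    "q": ["m", "n"],
    "r": ["o", "p"],
    "s": ["q", "r"],
}
cuts_k4 = enumerate_cuts(network, 4)
cuts_k3 = enumerate_cuts(network, 3)
```

With K = 4, node s gets four cuts, including {m, n, o, p}; with K = 3 that cut is filtered out because it has four inputs.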

14
Calculating arrival time

The arrival time propagates through each cut;
each cut represents a LUT and hence a unit delay.
The minimum arrival time at a node v is
$$Arr_v = \min_{\forall C \text{ on } v} \left[ \max_{i \in input(C)} Arr_i + 1 \right]$$
where C represents every cut generated for v
through cut enumeration and Arri is the minimum
arrival time on input signal i of C.
A cut C that produces Arrv is called MCv for
node v, and these MCv form a set Xv. The minimum
arrival time for each node is propagated from the
PIs to the POs through the cuts.
The longest minimum arrival time of the POs is
the minimum arrival time of the circuit, i.e. the
optimal mapping depth of the circuit.
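The arrival-time propagation can be sketched as a single pass in topological order: each cut costs one unit delay on top of the latest arrival among its inputs, and a node takes the minimum over its cuts. The function name, the data layout, and the example cuts are assumptions for illustration.

```python
def arrival_times(topo_order, cuts):
    """Compute minimum arrival times under the unit delay model.

    topo_order lists the nodes with PIs first; cuts[v] is the set of cuts
    (frozensets of input nodes) rooted on v, empty or missing for PIs.
    Returns (arr, min_cuts), where min_cuts[v] is the set of cuts on v
    that achieve arr[v].
    """
    arr, min_cuts = {}, {}
    for v in topo_order:
        if not cuts.get(v):               # PI: arrives at time 0
            arr[v] = 0
            continue
        per_cut = {c: 1 + max(arr[i] for i in c) for c in cuts[v]}
        arr[v] = min(per_cut.values())
        min_cuts[v] = {c for c, t in per_cut.items() if t == arr[v]}
    return arr, min_cuts


fs = frozenset
# Hypothetical cuts for a node s fed by q = f(m, n) and r = f(o, p), K = 4.
example_cuts = {
    "q": {fs(["m", "n"])},
    "r": {fs(["o", "p"])},
    "s": {fs(["q", "r"]), fs(["q", "o", "p"]),
          fs(["m", "n", "r"]), fs(["m", "n", "o", "p"])},
}
arr, min_cuts = arrival_times(["m", "n", "o", "p", "q", "r", "s"], example_cuts)
mapping_depth = max(arr[po] for po in ["s"])
```

Only the cut {m, n, o, p} reaches s in one LUT level, so it is the single minimum-arrival cut for s and the mapping depth of this small example is 1.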

15
Area Propagation

Similar to the arrival time, the area can also be
propagated. The area of a cut C is calculated as
$$A_C = \sum_{i \in input(C)} \frac{A_i}{f(i)} + U_C$$
where Uc is the area contributed by the cut C, Ai
is the estimated area of the cone rooted on
signal i, and f(i) is the fanout number of signal
i. That is, the area on i is shared among the
fanout nodes of i.
This process calculates the area more accurately
by taking into consideration the effects of gate
fanouts.
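The sharing of area among fanouts can be illustrated numerically: the propagated area of a cut is its own contribution plus, for each input signal, that signal's estimated area divided by its fanout count. The function name and the numbers below are made up for the example.

```python
def cut_area(cut, node_area, fanout, cut_cost):
    """Propagated area of one cut.

    cut       : iterable of input signals of the cut
    node_area : estimated area of the cone rooted on each signal (0 for PIs)
    fanout    : fanout number of each signal (at least 1 for any cut input)
    cut_cost  : area contributed by the cut itself
    """
    return cut_cost + sum(node_area[i] / fanout[i] for i in cut)


# Hypothetical values: q fans out to two nodes, so only half of its
# area is charged to this cut; r has a single fanout and is charged fully.
area = cut_area(
    cut={"q", "r"},
    node_area={"q": 1.0, "r": 2.0},
    fanout={"q": 2, "r": 1},
    cut_cost=1.0,
)
```

Here the result is 1.0 + 1.0/2 + 2.0/1 = 3.5.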

16
Area propagation under Timing constraints

To guarantee optimal mapping depth, we need to
propagate the estimated area together with the
minimum arrival time.
The best propagated area in the fanin cone Fv is
$$A_v = \min_{\forall C \in X_v} A_C$$
where AC is the propagated area of cut C and Xv
is the set of cuts on v that produce the minimum
arrival time. Av represents the best achievable
area under the constraint that it also generates
the optimal mapping delay up to the point of v.
With these formulae, the areas of cuts and nodes
are iteratively calculated until the enumeration
process reaches the POs.
Later on, during the cut selection process, when
we know that v is not on a critical path, a cut C
not belonging to Xv can be chosen as long as it
does not violate the timing constraint.
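The two situations described here, a depth-preserving choice during enumeration and a relaxed choice for non-critical nodes during selection, can be contrasted in a small sketch. Both function names and the per-cut numbers are hypothetical.

```python
def best_area_at_min_arrival(cut_arrival, cut_area):
    """Smallest area among the cuts that achieve the minimum arrival time.

    Returns (arrival, area, cut) for the node.
    """
    arr_v = min(cut_arrival.values())
    x_v = [c for c in cut_arrival if cut_arrival[c] == arr_v]
    best = min(x_v, key=lambda c: cut_area[c])
    return arr_v, cut_area[best], best


def best_area_within_required_time(cut_arrival, cut_area, required):
    """Smallest-area cut whose arrival time does not exceed `required`."""
    legal = [c for c in cut_arrival if cut_arrival[c] <= required]
    return min(legal, key=lambda c: cut_area[c])


# Hypothetical cuts on one node: arrival time and propagated area per cut.
cut_arrival = {"C1": 2, "C2": 1, "C3": 1}
cut_area = {"C1": 1.5, "C2": 3.0, "C3": 2.5}

critical_choice = best_area_at_min_arrival(cut_arrival, cut_area)
relaxed_choice = best_area_within_required_time(cut_arrival, cut_area, required=2)
```

On a critical path only C2 and C3 are eligible and C3 wins on area; with one unit of slack the cheaper C1 becomes legal.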

17
Cost function for a cut

We need to keep the following points in mind
while obtaining a cost function:
Using a fixed area for a cut will not accurately
reflect the property of the cut.
We need to take into consideration the number of
re-convergent paths covered by a cut, as this
affects the amount of logic covered.
The third factor is the fan-out number of the
root node. The larger the fan-out, the larger the
possibility that picking this cut will reduce
potential duplications.
Cuts of different sizes have different areas.

18
Cost function example

In the example above C1 and C2 have the same
cutsize, but C2 is better
C2 covers two sets of reconvergent paths
Having a cut rooted at node 5 will reduce
potential duplications

19
Formula for area

The cost of a cut C is denoted Uc.
[Formula for Uc not preserved in this transcript]
Here Ic is the cutsize of C, Nc is the number of
nodes covered by C, f(v) is the fanout number of
the root node, Rc is the number of reconvergent
paths completely covered by C, and α and β are
positive constants (α = 0.8, β = 0.4).
The smaller the value of Uc, the better the cut.

20
Cost adjustment for Global duplication

From the example, if Cs is used to implement a
LUT and there is no duplication, the area rooted
on node s should be equally shared by t and u.
Otherwise the area will be falsely double counted.
But if the final mapping uses Ct and Cu, this
estimation is not accurate, as Cu treats node s
as not duplicated while s is actually duplicated
in Ct. We need to compensate for this effect.

21
Cut selection

After cut enumeration, we obtain the optimal
mapping depth of the network. This is set as the
required time for the network. The critical path
is the path that leads to this mapping depth.
Nodes on non-critical paths have the luxury of
selecting different cuts that offer smaller cost
with a relaxed delay value, as long as the
required time of the circuit is maintained.
The following enhancements are added to the basic
algorithm:
Iterative cut selection procedure
This procedure produces the final mapping based
on the previous cut selection iterations. A
previous iteration can be considered as a
tentative mapping that provides guidance for the
next iteration.
The profiling information includes the LUT roots
in the mapping solution of the previous iteration.
22
Algorithm

Update_profiling_info updates information about
the nodes.
Update_req_time updates the required time of the
input nodes.
Pick_cut uses the profiling data to update the
cost of the node in each iteration.

23
Cut selection continued

Local cost adjustment
To map a critical node v, only the cut that
provides Av is picked to implement the LUT, to
guarantee the optimal mapping depth.
Input sharing- While picking a cut, we check
whether some of the cut-set nodes are already LUT
roots. If so, such a node is shared among several
mapped LUTs.
Slack distribution
Slackv = Reqv - Arrv
We distribute slack along the edges of entire
paths so that more nodes on those paths have
more flexibility.
Cut probing- Looking at cuts from other
approaches. For example, C3 in the example
reduces the fanout of gate 3 to 1. Also, when we
pick node 6 as the root, we cover two
reconvergent paths in the network. This avoids
duplicating gates 1 and 3.
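The slack of a node is its required time minus its arrival time. The sketch below only shows how required times might be pushed back from the POs through the chosen LUTs and how slack follows from them; it does not model the slack distribution heuristic itself. The function name, data layout, and example network are assumptions.

```python
def required_times_and_slack(lut_cuts, reverse_topo, arr, primary_outputs):
    """Propagate required times from POs to PIs and compute slack.

    lut_cuts[v]  : inputs of the LUT chosen for node v (absent for PIs)
    reverse_topo : nodes ordered with POs first and PIs last
    arr          : minimum arrival time of every node
    """
    depth = max(arr[po] for po in primary_outputs)    # required time of the circuit
    req = {po: depth for po in primary_outputs}
    for v in reverse_topo:
        if v not in req or v not in lut_cuts:
            continue
        for i in lut_cuts[v]:
            # Each LUT adds one unit delay, so its inputs are due one unit earlier.
            req[i] = min(req.get(i, depth), req[v] - 1)
    slack = {v: req[v] - arr[v] for v in req}
    return req, slack


# Hypothetical mapping: g1 = LUT(a, b), y1 = LUT(g1, c), y2 = LUT(b, c).
lut_cuts = {"g1": ["a", "b"], "y1": ["g1", "c"], "y2": ["b", "c"]}
arr = {"a": 0, "b": 0, "c": 0, "g1": 1, "y1": 2, "y2": 1}
req, slack = required_times_and_slack(
    lut_cuts, ["y1", "y2", "g1", "c", "b", "a"], arr, ["y1", "y2"]
)
```

In this example y1 and g1 are critical (slack 0), while y2 and c have one unit of slack and could take a cheaper, slower cut.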

24
Results

With DAOmap, the researchers obtained better area
values with a lower runtime than CutMap.
The runtime advantage shrinks when moving from
4-LUTs to 5-LUTs, due to an increase in the
number of cuts generated per node.
The authors also demonstrate the scalability of
the algorithm by using it to map a few large
industrial benchmarks. In some cases, CutMap was
not able to map the circuits even after 10 hours,
while DAOmap did. The runtime was two orders of
magnitude better.

25
Impact of various techniques

The impact of the various techniques on the final
area values is shown here. "Dropped" refers to
the drop in the quality of placement in terms of
area when the particular optimization is not
used.

26
Continued

Input sharing proves to be the most important
technique to reduce area because it reduces the
number of edges and node duplications.
The mincost propagation evaluates how accurate
our cost estimation model is.
Global duplication cost adjustment offers the
next largest gain, which shows that duplication
of nodes adds to the area cost.

27
References

[1] Vaughn Betz and Jonathan Rose, Cluster-Based
Logic Blocks for FPGAs: Area-Efficiency vs. Input
Sharing and Size.
[2] Deming Chen and Jason Cong, DAOmap: A
Depth-optimal Area Optimization Mapping Algorithm
for FPGA Designs.
[3] J. Cong, C. Wu, and E. Ding, Cut Ranking and
Pruning: Enabling A General and Efficient FPGA
Mapping Solution, FPGA, Feb. 1999.

28
Questions
