Lokesh Subramany - PowerPoint PPT Presentation


1
Depth Optimal Area Optimization Mapping
  • By
  • Lokesh Subramany
  • Stu 23789289

2
Outline
  • Introduction to Tech mapping
  • FPGA Architecture
  • Some definitions
  • Problem definition
  • Alternate approaches
  • Algorithmic description
  • Enhancements to the basic algorithm
  • Results
  • Conclusion

3
Introduction
  • Need for FPGAs- Short design windows, changing
    requirements, and cost favor the use of an FPGA
    for emulating logic systems.
  • What is FPGA tech mapping- Converting a given
    Boolean circuit into a functionally equivalent
    network composed only of LUTs.
  • Role of tech mapping- It makes the actual gate
    choices that implement the equations, for example
    choosing the fastest gates along the critical
    path and the most area-efficient combination of
    gates off the critical path.

4
FPGA structure
  • The BLE (basic logic element) consists of a
    K-input look-up table (LUT) and a D flip-flop.
  • Each LUT produces a single output.
  • We can obtain sequential circuits by utilizing
    the D flip-flop.
  • Combinational circuits can be obtained by
    directly connecting the output of the LUT to the
    output buffer.

5
Logic cluster
  • The logic cluster is made up of N BLEs.
  • This is obtained after packing the BLEs.

6
General Definition of the mapping problem
  • The tech mapping problem is viewed as the
    optimization problem of finding a minimum cost
    covering of the subject graph by choosing from
    the collection of pattern graphs created for all
    gates in the library.
  • A cover is a collection of pattern graphs such
    that every node of the subject graph is contained
    in one or more of the pattern graphs
  • Area optimization- The cost of the cover is
    defined as the sum of the areas of the individual
    gates.
  • Delay optimization- The cost of the cover is
    defined as the critical path delay of the
    resulting circuit using an appropriate delay
    model.
  • Minimum area under timing constraint - A cover
    which results in a circuit with critical path
    delay greater than that allowed for any output is
    considered illegal.

7
Some Definitions and notations
  • PI (Primary input)- A node that does not have
    any incoming edges
  • PO (Primary output)- A node that does not have
    any outgoing edges.
  • Cone (O_v)- A subnetwork of the original network,
    consisting of v and some of its predecessors,
    such that for any node w in O_v, there is a path
    from w to v in O_v.
  • Fanin cone (F_v)- The maximum cone of v,
    consisting of all PI predecessors of v.
  • Input(O_v)- Denotes the set of distinct nodes
    outside O_v which supply inputs to the gates in
    O_v.
  • Cut- It is a partitioning (X, X') of a cone O_v
    such that X' is a cone of v.
  • Cut-set- It is represented as V(X, X'), and
    consists of input(X').

8
More definitions
  • Cutsize- It is the cardinality of the cut-set. A
    cut is said to be K-feasible if the cutsize is
    ≤ K.
  • Level- The level of a node v is the length of
    the longest path from any PI to the node v.
  • Depth- The depth of a network is the largest
    node level in the network.
  • l-bounded- A Boolean network is l-bounded if
    |input(v)| ≤ l for each node v.
  • Unit delay model- Each interconnection edge in
    the boolean network is assumed to have a constant
    delay, which translates to each LUT on the
    critical path contributing one unit delay.
  • Mapping depth- The depth of the mapped circuit
    under the unit delay model. The optimal mapping
    depth is the smallest depth any mapping can
    achieve.
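
The level, depth, and l-bounded definitions above can be checked on a small example. This is a minimal Python sketch; the `FANINS` network and the helper names are illustrative assumptions and do not come from the presentation.

```python
from functools import lru_cache

# Hypothetical network: each node maps to the list of its fanin nodes.
# Nodes with no fanins are primary inputs (PIs).
FANINS = {
    "a": [], "b": [], "c": [], "d": [],
    "q": ["a", "b"],
    "r": ["c", "d"],
    "s": ["q", "r"],
}


@lru_cache(maxsize=None)
def level(node):
    """Length of the longest path from any PI to `node` (a PI has level 0)."""
    fanins = FANINS[node]
    if not fanins:
        return 0
    return 1 + max(level(u) for u in fanins)


def depth():
    """Depth of the network: the largest node level."""
    return max(level(v) for v in FANINS)


def is_l_bounded(l):
    """True if every node has at most `l` fanins."""
    return all(len(fanins) <= l for fanins in FANINS.values())
```

Here `level("s")` is 2 because the longest PI-to-s path passes through q or r, and the network is 2-bounded since no node has more than two fanins.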

9
Problem Formulation
  • The mapping problem is to cover a given l-bounded
    Boolean network with K-feasible cones (K-LUTs)
    such that the total LUT count after mapping is
    minimized while the optimal mapping depth is
    guaranteed under the unit delay model.

10
Alternate approaches
  • Area Minimization- Chortle-crf, MIS-pga, XMap,
    VisMap, TechMap and Praetor
  • Delay Minimization- Chortle-d, MIS-pga-delay,
    TechMap-L, DAG-map, FlowMap
  • Power Minimization- PowerMap, PowerMinMap, Emap
  • Delay and Area minimization- FlowMap-r, CutMap
  • FlowMap-r starts with a depth-optimal mapping
    solution and applies depth relaxation techniques
    such as remapping and node packing on
    non-critical paths.
  • CutMap combines depth and area minimization
    during the mapping process by computing min-cost
    min-height K-feasible cuts for non-critical nodes
    using the network flow method. CutMap is widely
    used for various FPGA evaluation and design flows.

11
Algorithm Methodology
  • Cut enumeration based method consisting of cut
    generation and cut selection
  • Cut generation traverses the network from the PI
    to the PO.
  • The subcuts on the fanin nodes of the target node
    are combined to generate all the cuts on the
    target node. Here each cut represents one
    possible LUT implementation rooted on the target
    node.
  • After the cuts are generated, the network is
    traversed from the PO to the PI, and the cuts are
    selected to produce the LUT mapping result

12
Cut Enumeration
  • Cut enumeration means generating all K-feasible
    cuts of a cone for a given node.
  • A cut rooted on node v can be represented using a
    product term (or a p-term) of the variables
    associated with the nodes in the cut-set V(X_v,
    X'_v). A set of cuts can be represented by a
    sum-of-products expression using the corresponding
    p-terms. Cut enumeration is guided by the
    following theorem [6]:

    \[ f(K, v) = \bigotimes^{K}_{u \in input(v)} \left[ u + f(K, u) \right] \]

  • where f(K, v) represents all the K-feasible
    cuts rooted at node v, the operator + is Boolean
    OR, and \otimes^{K} is Boolean AND on its
    operands, but filtering out all the resulting
    p-terms with more than K variables.

13
Cut enumeration continued
  • In the example below, all the cuts rooted on node
    s can be generated by combining the cuts rooted
    on its fanin nodes q and r. The cuts on the
    fanin nodes are called subcuts. Combining C1
    with C2 will form a new cut C_s = {m, n, o, p}
    rooted on s. If the number of inputs of the new
    cut exceeds K, the cut is discarded.
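
The subcut-combining step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `FANINS` network, `topo_order`, and `enumerate_cuts` are hypothetical names, cuts are stored as frozensets of cut-set nodes rather than p-terms, and no cut ranking or pruning is attempted.

```python
from itertools import product

# Hypothetical network: node -> list of fanin nodes (PIs have none).
FANINS = {
    "a": [], "b": [], "c": [], "d": [],
    "q": ["a", "b"],
    "r": ["c", "d"],
    "s": ["q", "r"],
}


def topo_order(fanins):
    """Return the nodes so that every node appears after all of its fanins."""
    order, seen = [], set()

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for u in fanins[v]:
            visit(u)
        order.append(v)

    for v in fanins:
        visit(v)
    return order


def enumerate_cuts(fanins, k):
    """Map each node to the set of its K-feasible cuts (as frozensets)."""
    cuts = {}
    for v in topo_order(fanins):
        if not fanins[v]:
            cuts[v] = set()  # a PI has no LUT rooted on it
            continue
        # For each fanin u, use either u itself or one of the subcuts on u.
        choices = [[frozenset([u])] + list(cuts[u]) for u in fanins[v]]
        merged = set()
        for combo in product(*choices):
            cut = frozenset().union(*combo)
            if len(cut) <= k:  # discard cuts with more than K inputs
                merged.add(cut)
        cuts[v] = merged
    return cuts


CUTS_K4 = enumerate_cuts(FANINS, 4)
CUTS_K3 = enumerate_cuts(FANINS, 3)
```

With K = 4, node s gets the four cuts {q, r}, {a, b, r}, {q, c, d} and {a, b, c, d}; with K = 3 the last one is filtered out because it has four inputs.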

14
Calculating arrival time
  • The arrival time propagates through each of the
    cuts, and each cut represents a LUT and hence a
    unit delay. The minimum arrival time at a node v
    is

    \[ Arr_v = \min_{\forall C \text{ on } v} \left[ \max_{i \in input(C)} Arr_i + 1 \right] \]

  • where C represents every cut generated for v
    through cut enumeration. Arr_i is the minimum
    arrival time on input signal i of C.
  • The cut C that produces Arr_v is called MC_v for
    node v, and these MC_v's form a set X_v. The
    minimum arrival time for each node is propagated
    to the POs from the PIs through the cuts.
  • The longest minimum arrival time of the POs is
    the minimum arrival time of the circuit, i.e. the
    optimal mapping depth of the circuit.
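
A minimal Python sketch of this arrival-time propagation follows. The cut lists are hard-coded for a hypothetical network (s has fanins q and r, which in turn read PIs a, b and c, d, with K = 4); `arrival_times` and the other names are assumptions made for illustration.

```python
# Hypothetical example data: cuts per internal node, as frozensets of inputs.
PIS = ["a", "b", "c", "d"]
ORDER = ["q", "r", "s"]  # internal nodes in topological order
POS = ["s"]
CUTS = {
    "q": [frozenset("ab")],
    "r": [frozenset("cd")],
    "s": [frozenset("qr"), frozenset("abr"),
          frozenset("qcd"), frozenset("abcd")],
}


def arrival_times(cuts, order, pis):
    """Return (arr, min_cuts): minimum arrival times and the cuts achieving them."""
    arr = {pi: 0 for pi in pis}  # PIs arrive at time 0
    min_cuts = {}
    for v in order:
        # Each cut is one LUT: latest input arrival plus one unit delay.
        timed = [(max(arr[i] for i in c) + 1, c) for c in cuts[v]]
        arr[v] = min(t for t, _ in timed)
        min_cuts[v] = [c for t, c in timed if t == arr[v]]
    return arr, min_cuts


ARR, MIN_CUTS = arrival_times(CUTS, ORDER, PIS)
MAPPING_DEPTH = max(ARR[po] for po in POS)
```

In this example the 4-input cut {a, b, c, d} lets s be implemented as a single LUT, so the arrival time at s, and hence the mapping depth, is 1.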

15
Area Propagation
  • Similar to the arrival time, the area can also be
    propagated. The area is calculated as

    \[ A_C = \sum_{i \in input(C)} \frac{A_i}{f(i)} + U_C \]

  • where U_C is the area contributed by the cut C,
    A_i is the estimated area of the cone rooted on
    signal i, and f(i) is the fanout number of signal
    i. That means that the area on i is shared and
    distributed among the fanout nodes of i.
  • This process calculates the area more accurately
    by taking into consideration the effects of gate
    fanouts.
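
As a small worked example of the fanout sharing, here is a Python sketch of the area of a single cut. The function name `cut_area` and all numeric values are assumptions chosen for illustration only.

```python
def cut_area(cut_inputs, node_area, fanout, unit_area):
    """Area of a cut: shared area of its inputs plus the cut's own area."""
    shared = sum(node_area[i] / fanout[i] for i in cut_inputs)
    return shared + unit_area


# Hypothetical values: x feeds three fanouts, y feeds one.
NODE_AREA = {"x": 3.0, "y": 2.0}
FANOUT = {"x": 3, "y": 1}
AREA = cut_area(["x", "y"], NODE_AREA, FANOUT, unit_area=1.0)
```

Signal x contributes only 3.0 / 3 = 1.0 because its cone is shared by three fanouts, while y contributes its full 2.0, so the cut area is 1.0 + 2.0 + 1.0 = 4.0.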

16
Area propagation under Timing constraints
  • To guarantee optimal mapping depth, we need to
    propagate the estimated area together with the
    minimum arrival time
  • The best propagated area in the fanin cone F_v is

    \[ A_v = \min_{C \in X_v} A_C \]

  • A_v represents the best achievable area under the
    constraint that it also generates the optimal
    mapping delay up to the point of v.
  • With these formulae, the areas of cuts and nodes
    are iteratively calculated until the enumeration
    process reaches the POs.
  • Later on, during the cut selection process, when
    we know that v is not on a critical path, a cut C
    not belonging to X_v can be chosen as long as it
    does not violate the timing constraint.
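
Propagating area together with arrival time can be sketched in Python as below. This is a simplified illustration under stated assumptions: the network and cut lists are hypothetical (K = 3), every cut is given the same unit area instead of the cost function described on the following slides, and the names `fanout_counts` and `propagate` are made up for this example.

```python
# Hypothetical network and its 3-feasible cuts.
FANINS = {
    "a": [], "b": [], "c": [], "d": [],
    "q": ["a", "b"],
    "r": ["c", "d"],
    "s": ["q", "r"],
}
PIS = [v for v, ins in FANINS.items() if not ins]
ORDER = ["q", "r", "s"]  # internal nodes in topological order
CUTS = {
    "q": [frozenset("ab")],
    "r": [frozenset("cd")],
    "s": [frozenset("qr"), frozenset("abr"), frozenset("qcd")],
}


def fanout_counts(fanins):
    """Number of nodes each signal feeds."""
    counts = {v: 0 for v in fanins}
    for ins in fanins.values():
        for u in ins:
            counts[u] += 1
    return counts


def propagate(cuts, order, pis, fanout, unit_area=1.0):
    """Return (arr, area): minimum arrival time and best area at that delay."""
    arr = {p: 0 for p in pis}
    area = {p: 0.0 for p in pis}
    for v in order:
        scored = []
        for c in cuts[v]:
            t = max(arr[i] for i in c) + 1
            a = sum(area[i] / fanout[i] for i in c) + unit_area
            scored.append((t, a))
        arr[v] = min(t for t, _ in scored)
        # Only cuts that achieve the minimum arrival time compete on area.
        area[v] = min(a for t, a in scored if t == arr[v])
    return arr, area


ARR, AREA = propagate(CUTS, ORDER, PIS, fanout_counts(FANINS))
```

All three cuts on s reach it at time 2, so they all belong to the delay-optimal set; {a, b, r} and {q, c, d} each need two LUTs in total, against three for {q, r}, so the propagated area at s is 2.0.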

17
Cost function for a cut
  • We need to keep the following points in mind
    while obtaining a cost function:
  • Using a fixed area for a cut will not accurately
    reflect the property of the cut.
  • We need to take into consideration the number of
    re-convergent paths covered by a cut, as this
    affects the amount of logic covered.
  • The third factor is the fan-out number of the
    root node. The larger the fan-out, the larger the
    possibility that picking this cut will reduce
    potential duplications.
  • Cuts of different sizes have different areas.

18
Cost function example
  • In the example above C1 and C2 have the same
    cutsize, but C2 is better
  • C2 covers two sets of reconvergent paths
  • Having a cut rooted at node 5 will reduce
    potential duplications

19
Formula for area
  • The cost of a cut is represented as
  • where I_C is the cutsize of C, N_C is the number
    of nodes covered by C, f(v) is the fanout number
    of the root node, R_C is the number of
    reconvergent paths completely covered by C, and
    α and β are positive constants (α = 0.8,
    β = 0.4).
  • The smaller the value of U_C, the better the
    cut.

20
Cost adjustment for Global duplication
  • From the example, if C_s is used to implement a
    LUT and there is no duplication, the area rooted
    on node s should be equally shared by t and u.
    Otherwise the area will be falsely double counted.
  • But if the final mapping uses C_t and C_u, this
    estimation is not accurate, as C_u treats the node
    as not duplicated but s is actually duplicated in
    C_t. We need to compensate for this effect.

21
Cut selection
  • After cut enumeration, we obtain the optimal
    mapping depth of the network. This is set as the
    required time for the network. The critical path
    is the path that leads to this mapping depth.
    Nodes on non-critical paths have the luxury of
    selecting different cuts that offer smaller cost
    with a relaxed delay value, as long as the
    required time of the circuit is maintained.
  • The following enhancements are added to the basic
    algorithm
  • Iterative cut selection procedure
  • This procedure produces the final mapping based
    on the previous cut selection iterations. A
    previous iteration can be considered as a
    tentative mapping that provides guidance for the
    next iteration.
  • The profiling information includes the LUT roots
    in the mapping solution of the previous iteration

22
Algorithm
  • Update_profiling_info updates information about
    the nodes.
  • Update_req_time updates the required time of the
    input nodes.
  • Pick_cut uses profiling data to update the cost
    of the node in each iteration.
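
A single backward pass of cut selection can be sketched in Python as below. This is a simplified illustration, not the iterative procedure from the slides: it performs one pass without profiling information, uses the cutsize as a stand-in cost, and all names (`select_cuts`, `cut_arrival`) and the example network are assumptions.

```python
import math

# Hypothetical example: POs s and t; t is off the critical path.
PIS = {"a", "b", "c", "d", "e"}
ORDER = ["q", "r", "s", "t"]  # internal nodes in topological order
POS = ["s", "t"]
CUTS = {
    "q": [frozenset("ab")],
    "r": [frozenset("cd")],
    "s": [frozenset("qr"), frozenset("abr"), frozenset("qcd")],
    "t": [frozenset("qe"), frozenset("abe")],
}
# Minimum arrival times, as produced by the forward (enumeration) pass.
ARR = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0,
       "q": 1, "r": 1, "s": 2, "t": 1}


def cut_arrival(cut, arr):
    """Arrival time at the root if this cut is implemented as a LUT."""
    return max(arr[i] for i in cut) + 1


def select_cuts(cuts, arr, order, pos, pis, cost=len):
    """Pick one cut per needed LUT root, from the POs back to the PIs."""
    depth = max(arr[po] for po in pos)  # optimal mapping depth
    req = {po: depth for po in pos}     # required time at every PO
    mapping = {}
    for v in reversed(order):
        if v not in req:
            continue  # v is covered inside other LUTs; no LUT rooted here
        feasible = [c for c in cuts[v] if cut_arrival(c, arr) <= req[v]]
        chosen = min(feasible, key=lambda c: (cost(c), sorted(c)))
        mapping[v] = chosen
        for i in chosen:
            if i not in pis:
                # The inputs of a LUT must be ready one unit delay earlier.
                req[i] = min(req.get(i, math.inf), req[v] - 1)
    slack = {v: req[v] - arr[v] for v in mapping}
    return mapping, req, slack


MAPPING, REQ, SLACK = select_cuts(CUTS, ARR, ORDER, POS, PIS)
```

In this example t could be reached at time 1 through {a, b, e}, but since the required time is 2 it has one unit of slack and the cheaper cut {q, e} is chosen, reusing the LUT rooted on q that s needs anyway.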

23
Cut selection continued
  • Local cost adjustment
  • To map a critical node v, only the cut that
    provides A_v is picked to implement the LUT, to
    guarantee the optimal mapping depth.
  • Input sharing- While picking a cut, we check
    whether some of the cut-set nodes are already LUT
    roots. If so, these nodes are shared among
    several mapped LUTs.
  • Slack distribution
  • Slack_v = Req_v - Arr_v
  • We distribute slack along the edges of the
    entire paths to encourage more nodes on the paths
    to have more flexibility.
  • Cut probing- Looking at cuts from other
    approaches. For example, C3 in the example reduces
    the fanout of gate 3 to 1. Also, when we pick node
    6 as the root, we have two reconvergent paths in
    the network. This eliminates gates 1 and 3 being
    duplicated.

24
Results
  • With DAOmap, the researchers have obtained
    better area values with a lower runtime when
    compared to CutMap.
  • The magnitude of the difference in runtime
    shrinks when moving from a 4-LUT to a 5-LUT, due
    to an increase in the number of cuts generated
    per node.
  • The authors also demonstrate the scalability of
    the algorithm by using it to map a few large
    industrial benchmarks. In some cases, CutMap was
    not able to map the circuits even after 10 hours,
    while DAOmap did. The runtime was two orders of
    magnitude better.

25
Impact of various techniques
  • The impact of the various techniques on the
    final area values is shown here. "Dropped" refers
    to the drop in the quality of the mapping, in
    terms of area, when the particular optimization
    is not used.

26
Continued
  • Input sharing proves to be the most important
    technique to reduce area because it reduces the
    number of edges and node duplications.
  • The min-cost propagation evaluates how accurate
    our cost estimation model is.
  • Global duplication cost adjustment offers the
    next largest gain, which shows that duplication
    of nodes adds to the area cost.

27
References
  • [1] Vaughn Betz and Jonathan Rose, Cluster-Based
    Logic Blocks for FPGAs: Area-Efficiency vs. Input
    Sharing and Size.
  • [2] Deming Chen and Jason Cong, DAOmap: A
    Depth-optimal Area Optimization Mapping Algorithm
    for FPGA Designs.
  • [3] J. Cong, C. Wu, and E. Ding, Cut Ranking and
    Pruning: Enabling A General and Efficient FPGA
    Mapping Solution, FPGA, Feb. 1999.

28
Questions