July 6, 2006 - PowerPoint PPT Presentation

Provided by: jar90

1
ECO Timing Optimization Using Spare Cells and
Technology Remapping
  • July 6, 2006

2
Outline
  • Introduction and problem formulation
  • Previous work and preliminaries
  • Algorithm
  • Experimental results
  • Conclusions

4
Introduction
  • An ECO (Engineering Change Order) is usually
    performed during the chip implementation cycle
    to change the design incrementally.
  • When performing an ECO on a placed design, a
    small portion of the netlist is changed to
      • optimize the chip timing (functionality is
        unchanged), or
      • change chip functions (to fix logic bugs or
        to produce new versions).

5
Netlist Change Using Spare Cells
  • Spare cells are reserved for design changes
    after placement and are distributed evenly over
    the chip layout.
  • Using spare cells is an efficient way to make
    netlist changes.
      • Saves the time and effort of re-placing the
        netlist.
      • Saves mask production cost.
  • This is becoming more difficult in nanometer
    technology.
      • Circuit size is increasing substantially.
      • Timing issues are hard to account for when
        changing the netlist locally.

6
Problem Formulation
  • Given a placed chip layout, rewire the circuit
    using spare cells. Several techniques are
    available:
      • gate sizing
      • buffer insertion
      • technology mapping
  • Goal: shorten the delays and minimize the total
    negative slack of all ECO timing paths.

[Figure: slack values before and after optimization; labels read slack -0.7, slack 0.0, slack -0.5, and slack 0.0.]
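
As a rough illustration of the objective, total negative slack can be computed by summing only the negative path slacks. This sketch is not taken from the slides: the function name is hypothetical, and assigning the figure's slack labels to a "before" list and an "after" list is an assumption.

```python
def total_negative_slack(slacks):
    """Sum the slacks of violating paths; paths with slack >= 0 add nothing."""
    return sum(s for s in slacks if s < 0)


# Slack labels from the figure; which path each value belongs to is assumed.
slacks_before = [-0.7, -0.5]
slacks_after = [0.0, 0.0]

tns_before = total_negative_slack(slacks_before)
tns_after = total_negative_slack(slacks_after)
print(tns_before, tns_after)
```

Under this definition, the optimization loop has nothing left to do once the total reaches zero.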
7
Outline
  • Introduction and problem formulation
  • Previous work and preliminaries
  • Algorithm
  • Experimental results
  • Conclusions

8
Dynamic Programming
  • Buffer insertion on a single net.
  • van Ginneken et al. proposed a dynamic
    programming framework for slack-optimal buffer
    insertion on a net.

[Figure: routing tree with gates gS, gT1, gT2, and gT3 and buffers b1 to b4, annotated with Load and RAT (required arrival time) values.]
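
The dynamic programming idea can be sketched for the simplest case: one two-pin wire, one buffer type, and an Elmore delay model. Candidates are (load, RAT, buffer count) triples propagated from the sink toward the source, and dominated candidates are pruned at each step. All names and numbers below are assumptions made for illustration; the slides do not give an implementation.

```python
def van_ginneken_line(segments, sink_cap, sink_rat, r, c, buf):
    """Propagate (load, RAT, buffers) candidates from the sink to the source.

    segments: wire lengths, listed from the sink back toward the source;
              a buffer may be inserted at the upstream end of each segment.
    r, c:     wire resistance and capacitance per unit length.
    buf:      (output resistance, input capacitance, intrinsic delay).
    """
    r_b, c_b, t_b = buf
    cands = [(sink_cap, sink_rat, 0)]
    for length in segments:
        # Add the wire segment (Elmore delay of a distributed RC wire).
        cands = [(cap + c * length,
                  rat - r * length * (c * length / 2 + cap), n)
                 for cap, rat, n in cands]
        # Option: insert a buffer driving the best downstream candidate.
        best = max(cands, key=lambda s: s[1] - r_b * s[0])
        cands.append((c_b, best[1] - t_b - r_b * best[0], best[2] + 1))
        # Prune: a candidate with a larger load must have a better RAT.
        cands.sort(key=lambda s: (s[0], -s[1]))
        pruned = []
        for cand in cands:
            if not pruned or cand[1] > pruned[-1][1]:
                pruned.append(cand)
        cands = pruned
    return cands


candidates = van_ginneken_line([4.0, 4.0], sink_cap=1.0, sink_rat=100.0,
                               r=1.0, c=1.0, buf=(1.0, 1.0, 1.0))
# Pick the candidate with the best slack for a driver of resistance 1.0.
chosen = max(candidates, key=lambda s: s[1] - 1.0 * s[0])
print(candidates, chosen)
```

Keeping a set of non-dominated candidates, rather than one best value per node, is what lets the final choice depend on the strength of the driver at the source.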
9
Path Based Buffer Insertion
  • Shi et al. proposed a dynamic programming method
    that performs buffer insertion and gate sizing
    on a path in three steps:
      • Cut the timing-violating paths into distinct
        paths.
      • View the gates on a path as special types of
        buffers and merge the whole path into one big
        routing tree.
      • Perform gate sizing and buffer insertion
        simultaneously on the routing tree.

[Figure: a path from start point to end point through NAND, OR, and AND gates, redrawn with NAND-type, OR-type, and AND-type buffers.]
10
Logic-Physical Co-synthesis
  • Layout-driven technology mapping (proposed by
    Stok et al.)
      • Place the base gates as an initial placement.
      • Map the base gates using their coordinates as
        cost.
  • Local netlist transformation (proposed by Lou et
    al.)
      • Identify the parts of the placed netlist that
        violate some target cost.
      • Extract those critical parts from the chip
        placement.
      • Re-synthesize and re-place the extracted
        netlist according to the target cost.

11
Timing Model
  • Synopsys Liberty library format
      • Lookup tables are used to calculate gate
        delays.
      • The gate delay and the output transition time
        are functions of the output loading and the
        input transition time.

[Figure: delay lookup table indexed by input transition time and output capacitive loading.]
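
A table lookup of this kind is commonly evaluated by bilinear interpolation between the four surrounding table entries. The sketch below assumes that scheme; the axis values and delays are made up for illustration and do not come from any real library.

```python
from bisect import bisect_right


def interpolate_table(x_axis, y_axis, table, x, y):
    """Bilinear interpolation; table[i][j] is the value at (x_axis[i], y_axis[j]).

    Points outside the table are extrapolated from the nearest cell.
    """
    def segment(axis, value):
        # Index i with axis[i] <= value <= axis[i + 1], clamped to the table.
        i = bisect_right(axis, value) - 1
        return min(max(i, 0), len(axis) - 2)

    i, j = segment(x_axis, x), segment(y_axis, y)
    tx = (x - x_axis[i]) / (x_axis[i + 1] - x_axis[i])
    ty = (y - y_axis[j]) / (y_axis[j + 1] - y_axis[j])
    low = table[i][j] * (1 - ty) + table[i][j + 1] * ty
    high = table[i + 1][j] * (1 - ty) + table[i + 1][j + 1] * ty
    return low * (1 - tx) + high * tx


# Hypothetical table: rows are input transition times, columns are loads.
transitions = [0.1, 0.5]
loads = [0.01, 0.05]
delay_table = [[0.10, 0.30],
               [0.14, 0.34]]

gate_delay = interpolate_table(transitions, loads, delay_table, 0.3, 0.03)
print(gate_delay)
```

The same routine would serve for the output transition time, using a second table with the same two axes.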
12
Timing Model (cont'd)
  • Output loading consists of
      • input pin capacitance
      • output pin capacitance
      • wire loading
  • F is the amount of capacitance per unit
    wirelength.
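
A minimal sketch of the load calculation follows. It assumes that the input pin capacitances belong to the driven gates, that the output pin capacitance belongs to the driving gate, and that the wire loading is F times the wirelength. The function name and all numbers are hypothetical.

```python
def output_load(fanout_input_caps, driver_output_cap, wirelength, cap_per_unit_length):
    """Total load seen by a gate output under the assumptions stated above."""
    wire_load = cap_per_unit_length * wirelength  # F times wirelength
    return sum(fanout_input_caps) + driver_output_cap + wire_load


# Two fanout pins, 100 length units of wire, F = 0.0002 per length unit.
load = output_load([0.002, 0.003], 0.001, 100.0, 0.0002)
print(load)
```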

13
Properties of the Timing Model
  • Loading dominance
      • Output loading has a larger effect on gate
        delay and output transition time than input
        transition time does (6.74x vs. 1.48x).
  • Shielding
      • A change to the netlist affects the delay of
        neighboring gates only.

[Figure: gates gi, gj, and gk illustrating the shielding property.]
14
Properties of the Timing Model (cont'd)
  • A buffer chain built from cells of the same
    type, BUFX1

[Figure: input slope, output slope, and delay along the BUFX1 buffer chain.]
15
Outline
  • Introduction and problem formulation
  • Previous work and preliminaries
  • Algorithm
      • Overview
      • Tracing ECO paths
      • Dynamic cost programming
      • Example
      • Time complexity analysis
      • Technology remapping
  • Experimental results
  • Conclusions

16
Optimization Flow
  • Iterate the optimization loop until the total
    negative slack reaches zero or no path can be
    improved.

[Figure: optimization flowchart; one stage is labeled "Extension".]
17
Tracing ECO paths
  • During STA (static timing analysis), store at
    each gate a pointer to the fan-in with the
    largest arrival time.
  • To obtain the ECO path, follow these pointers
    from the end point of the path back to the
    corresponding start point.

[Figure: an ECO path traced from its end point back to its start point.]
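
The pointer-based tracing can be sketched as follows. This is a simplified model that assumes one fixed delay per gate and ignores wire delay; the gate names, delays, and function names are hypothetical.

```python
def arrival_times(fanins, delay, topo_order):
    """Compute arrival times and, per gate, the fan-in with the largest one."""
    arrival, worst_fanin = {}, {}
    for gate in topo_order:
        # Pointer to the latest-arriving fan-in (None for a start point).
        worst = max(fanins.get(gate, []), key=lambda f: arrival[f], default=None)
        worst_fanin[gate] = worst
        arrival[gate] = delay[gate] + (arrival[worst] if worst is not None else 0.0)
    return arrival, worst_fanin


def trace_path(worst_fanin, end_point):
    """Follow the stored pointers from the end point back to the start point."""
    path = [end_point]
    while worst_fanin[path[-1]] is not None:
        path.append(worst_fanin[path[-1]])
    return path[::-1]


fanins = {"g3": ["g1", "g2"], "g4": ["g3"]}
delay = {"g1": 1.0, "g2": 2.0, "g3": 1.0, "g4": 0.5}
arrival, worst_fanin = arrival_times(fanins, delay, ["g1", "g2", "g3", "g4"])
eco_path = trace_path(worst_fanin, "g4")
print(arrival["g4"], eco_path)
```

Storing one pointer per gate during STA means the critical path to any end point can be recovered later without a second timing traversal.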
18
Dynamic Cost Programming (DCP)
  • A dynamic programming framework with dynamic
    cost, in three steps:
      • View each gate as a special type of buffer
        and merge the whole ECO path into one big
        routing tree.
      • Perform gate sizing and buffer insertion
        simultaneously, from the end point to the
        start point.
      • Perform one buffer insertion operation for
        each net and one gate sizing operation for
        each gate.

[Figure: an ECO path from start point to end point through NAND, OR, and AND gates, redrawn with NAND-type, OR-type, and AND-type buffers.]
19
Dynamic Cost
  • Unlike in the traditional buffer insertion
    problem, the buffering/sizing cost is dynamic
    because
      • all spare cells are candidates for
        buffering/sizing, and
      • the number of available spare cells changes
        during the optimization process.
  • Optimum solutions of sub-problems do not
    necessarily lead to the optimum solution of the
    overall problem.
  • We therefore need to store a set of solutions
    for each gate/net.

[Figure: ECO paths 1 and 2 with candidate buffers b1 and b2, and a plot of inserted buffers versus path delay showing solutions S1 (no buffer insertion), S2 (insert buffer b1), and S3 (insert buffer b2).]
20
Solution Propagation during DCP
  • Store each solution as a point on a plane if it
    shortens the ECO timing path delay.
  • The two coordinates are
      • the number of inserted buffers (sized gates
        are not counted), and
      • the approximate sub-path delay from the
        current gate to the end point of the path.
  • Estimate the effect of operations without
    actually applying them.
  • Generate solutions based on the solutions of the
    driven gate/net.

[Figure: solution sets at gates g1 and g2 with candidate buffers b1 and b2, plotted as inserted buffers versus path delay (solutions S1 to S6).]
21
Judgment of Operations
  • The timing effect of a sizing/buffering operation
    can be estimated by its effect on its fan-ins.
    Below, delay'(x) denotes the delay of x after the
    operation.
  • Buffer insertion operation on net ni
      • If delay'(source of ni) + delay(buffer) <
        delay(source of ni), store the solutions
        corresponding to the operation.
  • Gate sizing operation on gate gi
      • If delay(spare cell) < delay(gi) and
        delay'(fanin of gi) < delay(fanin of gi),
        store the solutions corresponding to the
        operation.
  • The timing of non-ECO paths is preserved after
    optimization.

[Figure: buffer insertion on net ni and gate sizing of gate gi.]
22
Bounding Box Theorem
  • We find a theorem that greatly reduces the number
    of buffering/sizing candidates.
  • Assumptions
      • Gate delays are independent of the input
        transition time.
      • The driving capabilities of the sized gate
        and the sizing spare cell are the same.

23
[Figure: gates gE1, gE2, and gE3 and net nE1, with the bounding box drawn around gE1.]
  • Box centered at gE1: width = dis(gE1, gE2) +
    dis(gE1, gE3) + (CEi1 + CEi2)/F
24
Bounding Box Theorem

25
Bounding polygon
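  • Box centered at gE2: width = dis(gE1, gE2) +
    dis(gE1, gE3) + (CEo1)/F
  • Box centered at gE3: width = dis(gE1, gE2) +
    dis(gE1, gE3) + (CEo1)/F
  • Box centered at gE4: width = dis(gE1, gE4) +
    (CEi1)/F

[Figure: gates gE1, gE2, gE3, and gE4 with the resulting bounding polygon.]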
26
Solution Pruning during DCP
  • For each set of solutions, we keep at most k
    solutions (k is a user-defined parameter).
      • Discard dominated solutions.
      • Classify the remaining solutions by the
        number of buffers used.
      • Keep the best solutions in each class.

[Figure: candidate solutions plotted as inserted buffers (0 to 3) versus path delay.]
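
One possible reading of the pruning rules is sketched below, with each solution stored as an (inserted buffers, path delay) pair. The function name is hypothetical, and the rule used to cut the set down to k entries (keep the k fastest) is an assumption; the slides do not say how the k survivors are chosen.

```python
def prune(solutions, k):
    """Prune (inserted_buffers, path_delay) pairs down to at most k solutions."""
    # Classify by buffer count and keep the smallest delay in each class.
    best = {}
    for buffers, delay in solutions:
        if buffers not in best or delay < best[buffers]:
            best[buffers] = delay
    # Discard dominated solutions: sweeping by increasing buffer count, a
    # solution survives only if it beats every solution using fewer buffers.
    kept, fastest = [], float("inf")
    for buffers in sorted(best):
        if best[buffers] < fastest:
            kept.append((buffers, best[buffers]))
            fastest = best[buffers]
    # Assumed truncation policy: keep the k fastest survivors.
    kept.sort(key=lambda s: s[1])
    return sorted(kept[:k])


solutions = [(0, 9.0), (1, 7.0), (1, 8.0), (2, 7.5), (3, 6.0)]
pruned_all = prune(solutions, k=10)
pruned_two = prune(solutions, k=2)
print(pruned_all, pruned_two)
```

In this example the two-buffer solution is dropped because a one-buffer solution is already faster.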
27
End of DCP
  • At the start point of the ECO path, choose the
    solution that
      • meets the timing constraint, and
      • uses the fewest buffers.
  • Change the netlist according to the solution.
  • Run STA to update the timing information.

[Figure: solutions at the start point plotted as inserted buffers (0 to 3) versus path delay, with the clock cycle marked on the delay axis.]
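
The selection rule at the start point can be sketched in a few lines, again with solutions stored as (inserted buffers, path delay) pairs. Returning None when no solution meets the clock cycle is an assumption; the slides do not describe that case.

```python
def choose_solution(solutions, clock_cycle):
    """Among solutions meeting the clock cycle, pick the one with fewest buffers."""
    feasible = [s for s in solutions if s[1] <= clock_cycle]
    # Tuples compare by buffer count first, then by delay as a tie-breaker.
    return min(feasible, default=None)


final_candidates = [(0, 9.0), (1, 7.0), (3, 6.0)]
picked = choose_solution(final_candidates, clock_cycle=8.0)
nothing = choose_solution(final_candidates, clock_cycle=5.0)
print(picked, nothing)
```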
28
An Example for Complex ECO Paths
Path  Source-Target  Negative slack
P1    S1-T1          large
P2    S1-T2          medium
P3    S2-T3          small

[Figure: example circuit with sources S1 and S2, targets T1, T2, and T3, buffer-type and gate-type spare cells, and a path list (P1, P2, P3) with slacks marked large, small, or zero.]
29
Time Complexity Analysis of Phase 1
  • Parameters
      • Gate count: V
      • Number of spare cells: N
      • Number of DCP iterations: L
      • Maximum number of gates on an ECO path: M
      • At most k solutions are kept per operation
  • Complexity of DCP: O(kMN)
  • Complexity of STA: O(V)
  • Complexity of phase 1: O((kMN + V)L)

30
Extension: Technology Remapping
  • After DCP, we can further improve the circuit
    timing with the following steps:
      • Identify the timing-critical parts of the
        netlist.
      • Extract those parts from the netlist.
      • Re-synthesize and map the extracted netlist.
          • Decomposition by MVSIS
          • Ideal mapping locations
          • Technology mapping
      • Run STA to update the timing information.

31
Optimal Buffering to a Line
  • The optimal way to buffer a line is to insert
    buffers at equal distances.
      • No gate drives too large a load.

[Figure: optimal buffering compared with non-optimal buffering of a line.]
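
The benefit of equal spacing can be checked numerically under an Elmore delay model, which is an assumption here; the slides do not state which wire delay model was used. Every segment is driven by an identical buffer and ends at an identical buffer input, and all parameter values are illustrative.

```python
def line_delay(segment_lengths, r=1.0, c=1.0, r_buf=1.0, c_buf=1.0, t_buf=1.0):
    """Elmore delay of a buffered line, one buffer driving each segment."""
    total = 0.0
    for length in segment_lengths:
        wire_cap = c * length
        # Buffer delay: intrinsic delay plus driving the wire and next input.
        total += t_buf + r_buf * (wire_cap + c_buf)
        # Distributed wire delay, quadratic in the segment length.
        total += r * length * (wire_cap / 2 + c_buf)
    return total


equal_spacing = line_delay([5.0, 5.0])
unequal_spacing = line_delay([2.0, 8.0])
print(equal_spacing, unequal_spacing)
```

Both placements use the same number of buffers and the same total wirelength; only the quadratic wire term differs, and it is smallest when the segments are equal.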
32
Ideal Mapping Locations
  • Given the locations of the input and output pins,
    map the base gates evenly between the input and
    output pins.
      • No gate drives too large a load, and the path
        delay is smaller (delay is proportional to
        the square of the wirelength).
      • This makes buffer insertion easier.

[Figure: two placements of base gates between Input A, Input B, and Output, compared by inserted buffers and delay.]
33
Calculating Ideal Mapping Locations
  • For each path from an input pin to an output
    pin, calculate the ideal location of every base
    gate on the path by spacing the gates at equal
    distances.
  • If a base gate has more than one ideal location,
    average these values to get its final ideal
    location.

[Figure: base gates placed at ideal locations between Input A, Input B, and Output (two panels).]
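
The two rules can be sketched as follows, assuming that "equal distance" means equal spacing along the straight line between the input pin and the output pin. The pin coordinates and gate names are made up for illustration.

```python
from collections import defaultdict


def ideal_locations(paths):
    """paths: list of (input_xy, output_xy, [base gates in path order])."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # gate -> [sum x, sum y, count]
    for (x0, y0), (x1, y1), gates in paths:
        steps = len(gates) + 1
        for index, gate in enumerate(gates, start=1):
            t = index / steps  # fraction of the way from input to output
            acc = sums[gate]
            acc[0] += x0 + t * (x1 - x0)
            acc[1] += y0 + t * (y1 - y0)
            acc[2] += 1
    # A gate on several paths gets the average of its per-path locations.
    return {gate: (sx / n, sy / n) for gate, (sx, sy, n) in sums.items()}


paths = [
    ((0.0, 0.0), (9.0, 3.0), ["g1", "g3"]),  # Input A to Output
    ((0.0, 6.0), (9.0, 3.0), ["g2", "g3"]),  # Input B to Output
]
locations = ideal_locations(paths)
print(locations)
```

Gate g3 lies on both paths, so its final location is the average of the two per-path positions.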
34
Technology Mapping
  • Use the actual locations of spare cells as
    costs.
  • Cut the network into trees.
  • Apply dynamic programming to map each tree.
      • The location of a mapped base gate is the
        location of its corresponding spare cell.
      • The location of an unmapped base gate is its
        ideal location.
  • Insert buffers into the mapped circuit to
    further improve timing.

[Figure: mapped circuit between Input A, Input B, and Output.]
35
Maximum Independent Set
  • To choose a globally optimal technology
    remapping solution, we store a set of match
    solutions for each tree and use a maximum
    independent set (MIS) to find the best
    assignment.

[Figure: trees T1, T2, and T3 with their candidate matches (M1_1, M1_2; M2_1, M2_2, M2_3; M3_1, M3_2) and gates g1 to g6.]
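
A small brute-force sketch of the selection step follows. It assumes that two matches conflict when they belong to the same tree or claim the same spare cell, and that each match carries a numeric timing gain; both are assumptions, as are the cells and gains in the example. Exhaustive search is only practical for small instances, and the slides do not say which MIS method the tool uses.

```python
from itertools import combinations


def best_assignment(matches):
    """matches: name -> (tree, set of spare cells used, timing gain)."""
    def conflict(a, b):
        tree_a, cells_a, _ = matches[a]
        tree_b, cells_b, _ = matches[b]
        return tree_a == tree_b or bool(cells_a & cells_b)

    names = sorted(matches)
    best, best_gain = (), 0.0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # Skip subsets that are not independent in the conflict graph.
            if any(conflict(a, b) for a, b in combinations(subset, 2)):
                continue
            gain = sum(matches[name][2] for name in subset)
            if gain > best_gain:
                best, best_gain = subset, gain
    return set(best), best_gain


matches = {
    "M1_1": ("T1", {"g1"}, 3.0),
    "M1_2": ("T1", {"g2"}, 2.0),
    "M2_1": ("T2", {"g1"}, 4.0),
    "M2_2": ("T2", {"g3"}, 1.0),
    "M3_1": ("T3", {"g2", "g3"}, 2.5),
}
chosen_matches, total_gain = best_assignment(matches)
print(chosen_matches, total_gain)
```

Picking the best match for each tree separately would select M1_1 and M2_1, which compete for the same spare cell; solving the selection jointly avoids that clash.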
36
Outline
  • Introduction and problem formulation
  • Previous work and preliminaries
  • Algorithm
  • Experimental results
  • Conclusions

37
Experimental Results
  • The five benchmarks are industrial designs.
  • Our tool was run on a Linux workstation with a
    3.2 GHz CPU and 3 GB of memory.


38
Experimental Results (cont'd)
  • Our tool beat all competitors working on the
    same subject in the CAD contest '05.
  • We compare the results of our algorithm with
      • the same algorithm without the aid of the
        bounding box theorem, and
      • a greedy wire-cost heuristic.

39
Experimental Results (cont'd)
  • Layout of Case 2

[Figure: layout of Case 2 before and after optimization.]
40
Outline
  • Introduction and problem formulation
  • Previous work and preliminaries
  • Algorithm
  • Experimental results
  • Conclusions

41
Conclusions
  • We proposed a dynamic programming method that
    considers dynamic cost to solve the ECO timing
    optimization problem.
  • Functional change with timing taken into account
    is a harder problem, and we will extend our work
    in this direction.