July 6, 2006 - PowerPoint PPT Presentation

About This Presentation

Title:

July 6, 2006

Description:

Bounding Box Theorem. We find a theorem to greatly reduce buffering/sizing candidates. ... Our tool beat all competitors with the same subject in the CAD contest '05. ... – PowerPoint PPT presentation

Slides: 42

Provided by: jar90

Transcript and Presenter's Notes

Title: July 6, 2006

1
ECO Timing Optimization Using Spare Cells and
Technology Remapping

July 6, 2006

2
Outline

Introduction and problem formulation
Previous work and preliminaries
Algorithm
Experimental results
Conclusions

4
Introduction

ECO (Engineering Change Order) is usually
performed during the chip implementation cycle.
Change the design incrementally.
When performing ECO on a placed design, change a
small portion of the netlist to either
optimize the chip timing (functionality is
unchanged), or
change chip functions (to fix logic bugs or to
produce new versions).

5
Netlist Change Using Spare Cells

Spare cells are reserved for design changes after
placement, and they are distributed evenly across
the chip layout.
Using spare cells is an efficient way to make
netlist changes.
It saves the time and effort of re-placing the netlist.
It saves the production cost of new masks.
Netlist changes are becoming more and more
difficult in nanometer technologies.
Circuit size is increasing substantially.
Timing is hard to take into account when changing
the netlist locally.

6
Problem Formulation

Given a placed chip layout,
rewire the circuit using spare cells to
shorten the delays and minimize the total
negative slack of all ECO timing paths.
Several techniques can be applied:
gate sizing
buffer insertion
technology mapping

[Figure: path slacks before and after optimization; the labels read slack = -0.7, 0.0, -0.5, and 0.0.]
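
The objective on this slide, total negative slack, can be illustrated with a small sketch. The function name `total_negative_slack` is hypothetical, and reading the slack labels as -0.7 and -0.5 before optimization and 0.0 and 0.0 after it is an assumption about the figure, not something the slide states.

```python
def total_negative_slack(slacks):
    """Sum the negative slacks; paths that already meet timing contribute 0."""
    return sum(min(0.0, s) for s in slacks)


# Assumed reading of the slide's labels (hypothetical assignment).
before = [-0.7, -0.5]
after = [0.0, 0.0]

print("TNS before:", total_negative_slack(before))
print("TNS after: ", total_negative_slack(after))
```

Under this reading, optimization succeeds when the second value reaches zero.
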
8
Dynamic Programming

Buffer insertion on a single net.
van Ginneken proposed a dynamic
programming framework for slack-optimal buffer
insertion on a net.

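[Figure: a routing tree driven by source gate gS with sink gates gT1, gT2, and gT3 and candidate buffer positions b1 to b4; the candidate positions are annotated with Load and RAT values.]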
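
A minimal sketch of the candidate propagation and pruning used in this style of dynamic programming, restricted to a single two-pin line instead of a full routing tree. The Elmore delay model, the uniform wire segments, the single buffer type, and every name and parameter value below are assumptions for illustration; none of them come from the slides.

```python
def prune(cands):
    """Keep only non-dominated (load, required_time, buffers) candidates."""
    cands = sorted(cands, key=lambda s: (s[0], -s[1]))
    kept, best_q = [], float("-inf")
    for load, q, bufs in cands:
        if q > best_q:  # a larger load is only worth keeping with a better RAT
            kept.append((load, q, bufs))
            best_q = q
    return kept


def buffer_line(n_seg, r, c, sink_cap, sink_rat, buf, r_drv):
    """Return (slack at the driver, buffer node indices) for a line.

    buf = (input_cap, intrinsic_delay, output_resistance); nodes are
    numbered 0 (driver) to n_seg (sink). All values are hypothetical.
    """
    b_cap, b_delay, b_res = buf
    cands = [(sink_cap, sink_rat, ())]
    for seg in range(n_seg, 0, -1):  # walk from the sink toward the driver
        # Cross one wire segment (Elmore delay, pi model).
        cands = [(C + c, Q - r * (c / 2 + C), bufs) for C, Q, bufs in cands]
        if seg > 1:  # optionally insert a buffer at node seg - 1
            buffered = [(b_cap, Q - b_delay - b_res * C, (seg - 1,) + bufs)
                        for C, Q, bufs in cands]
            cands = prune(cands + buffered)
    return max((Q - r_drv * C, bufs) for C, Q, bufs in cands)


slack, positions = buffer_line(10, 1.0, 1.0, 1.0, 100.0, (1.0, 1.0, 1.0), 1.0)
print(slack, positions)
```

Each candidate carries the downstream load and the required arrival time (RAT), which matches the Load and RAT labels in the figure.
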
9
Path Based Buffer Insertion

Shi et al. proposed a dynamic programming method
to perform buffer insertion and gate sizing on a
path:
Cut the timing-violating paths into distinct paths.
View the gates on the path as special types of
buffers and merge the whole path into one big
routing tree.
Perform gate sizing and buffer insertion
simultaneously on the routing tree.

[Figure: a path from start point to end point through NAND, OR, and AND gates, which are treated as NAND-type, OR-type, and AND-type buffers.]
10
Logic and Physical Co-synthesis

Layout driven technology mapping
Proposed by Stok et al.
Place the base gates as an initial placement.
Map the base gates using the coordinates as cost.
Local netlist transformation
Proposed by Lou et al.
Identify parts of the placed netlist that violate
some target cost.
Extract those critical parts from the chip
placement.
Re-synthesize and re-place the extracted netlist
according to the target cost.

11
Timing Model

Synopsys Liberty library format
Use lookup tables to calculate gate delays.
The gate delay and the output transition time are
functions of the output loading and the input
transition time.

[Figure: lookup table indexed by input transition time and output capacitive loading.]
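
A minimal sketch of how a two-dimensional delay table of this kind can be evaluated. Bilinear interpolation is a common way to read such tables, but the slides do not say which scheme the tool uses, so treat it as an assumption. The function name `lookup`, the axis values, and the table entries are all hypothetical.

```python
from bisect import bisect_right


def lookup(table, slews, loads, slew, load):
    """Bilinear interpolation in a table indexed as table[slew][load]."""

    def bracket(axis, x):
        # Pick the interval containing x, clamped to the first or last
        # interval (values outside the axis are extrapolated linearly).
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        t = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, t

    i, ts = bracket(slews, slew)
    j, tl = bracket(loads, load)
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical axes (ns, pF) and delay values (ns).
slews = [0.05, 0.20, 0.80]
loads = [0.01, 0.05, 0.20]
delay_table = [
    [0.10, 0.18, 0.45],
    [0.12, 0.21, 0.49],
    [0.20, 0.30, 0.60],
]
print(lookup(delay_table, slews, loads, 0.10, 0.03))
```

The same routine would serve for the output transition time, using a second table with the same two indices.
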
12
Timing Model (cont'd)

Output loading consists of
input pin capacitance
output pin capacitance
wire loading
F is the amount of capacitance per unit wirelength.
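
The three components listed above can be combined as in the sketch below. It assumes that the input pin capacitances are those of the driven gates, that the output pin capacitance is that of the driving gate, and that wire loading is F times the wirelength. The function name and the numbers are hypothetical.

```python
def output_load(fanout_input_caps, driver_output_cap, wirelength, f_per_unit):
    """Total capacitive load seen by a gate output (assumed additive model)."""
    wire_load = f_per_unit * wirelength  # F times wirelength
    return sum(fanout_input_caps) + driver_output_cap + wire_load


# Hypothetical values: two fan-out pins, 100 length units of wire.
print(output_load([0.002, 0.003], 0.001, 100.0, 0.0002))
```
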

13
Properties of The Timing Model

Loading dominance
Output loading has a larger effect on gate delay
and output transition time than input transition
time does (6.74x vs. 1.48x).
Shielding
A change to the netlist affects the delay of
neighboring gates only.

[Figure: gates gi, gj, and gk, illustrating shielding.]
14
Properties of The Timing Model (cont'd)

A buffer chain of identical BUFX1 cells

[Figure: plots for the buffer chain; labels: input slope, output slope, delay.]
15
Outline

Introduction and problem formulation
Previous work and preliminaries
Algorithm
Overview
Tracing ECO paths
Dynamic cost programming
Example
Time complexity analysis
Technology remapping
Experimental results
Conclusions

16
Optimization Flow

Iterate the optimization loop until the total
negative slack reaches zero or no path can be
improved.

[Figure: flow chart of the optimization loop, including a step labeled "Extension".]
17
Tracing ECO paths

When doing STA (static timing analysis),
store a pointer at each gate that points to the
fan-in with the largest arrival time.
To obtain the ECO path,
trace these pointers from the end point of the
path back to the corresponding start point.

[Figure: an ECO path traced from the end point back to the start point.]
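
The pointer-based tracing described above can be sketched as follows. The simplified STA (one delay per gate, no wire delay), the function names, and the tiny example netlist are assumptions for illustration.

```python
def sta_with_pointers(order, fanins, delay):
    """Compute arrival times and remember each gate's latest-arriving fan-in.

    order: gates in topological order; fanins: gate -> list of fan-in gates.
    """
    arrival, worst_fanin = {}, {}
    for g in order:
        if fanins.get(g):
            w = max(fanins[g], key=lambda f: arrival[f])
            worst_fanin[g] = w  # the pointer stored during STA
            arrival[g] = arrival[w] + delay[g]
        else:
            arrival[g] = delay[g]  # start point
    return arrival, worst_fanin


def trace_path(end_point, worst_fanin):
    """Follow the stored pointers from the end point back to the start point."""
    path = [end_point]
    while path[-1] in worst_fanin:
        path.append(worst_fanin[path[-1]])
    return path[::-1]


# Hypothetical netlist: a and b feed c, c feeds d.
order = ["a", "b", "c", "d"]
fanins = {"c": ["a", "b"], "d": ["c"]}
delay = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 1.0}
arrival, ptr = sta_with_pointers(order, fanins, delay)
print(trace_path("d", ptr))
```

Storing one pointer per gate keeps the trace linear in the path length, with no search needed after STA.
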
18
Dynamic Cost Programming (DCP)

Dynamic programming framework with dynamic cost
(3 steps)
View each gate as a special type of buffer and
merge the whole ECO path into one big routing tree.
Perform gate sizing and buffer insertion
simultaneously from the end point to the
start point.
Perform one buffer insertion operation for each
net and one gate sizing operation for each gate.

[Figure: an ECO path from start point to end point through NAND, OR, and AND gates, which are treated as NAND-type, OR-type, and AND-type buffers.]
19
Dynamic Cost

Unlike in the traditional buffer insertion problem,
the buffering/sizing cost is dynamic because
all spare cells are candidates for
buffering/sizing.
the number of available spare cells changes during
the optimization process.
Optimal solutions of sub-problems do not
necessarily lead to the optimal solution of the
overall problem.
We need to store a set of solutions for each
gate/net.

[Figure: ECO paths 1 and 2 with candidate buffers b1 and b2, and a plot of number of inserted buffers (0 or 1) versus path delay for three solutions. S1: no buffer insertion; S2: insert buffer b1; S3: insert buffer b2.]
20
Solution Propagation during DCP

Store each solution as a point on a plane if it
shortens the ECO timing path delay.
The two coordinates are
the number of inserted buffers
the approximated sub-path delay from the current
gate to the end point of the path.
Sized gates are not counted.
Estimate the effect of operations without
actually applying them.
Generate solutions based on the solutions of the
driven gate/net.

[Figure: solution sets at gates g1 and g2 with candidate buffers b1 and b2; solutions S1 to S6 are plotted as number of inserted buffers (0 or 1) versus path delay.]
21
Judgment of Operations

The timing effect of a sizing/buffering operation
can be estimated by its effect on its fan-ins.
Buffer insertion operation on net ni
If delay'(source of ni) + delay(buffer) < delay(source
of ni), store the solutions corresponding to the
operation.
Gate sizing operation on gate gi
If delay(spare cell) < delay(gi) and
delay'(fanin of gi) < delay(fanin of gi), store
the solutions corresponding to the operation.
Here delay' denotes the delay after the operation.
The timing of non-ECO paths is preserved after
optimization.

[Figure: buffer insertion on net ni and gate sizing on gate gi.]
22
Bounding Box Theorem

We derive a theorem that greatly reduces the number
of buffering/sizing candidates.
Assumptions
Gate delays are independent of the input
transition time.
The driving capabilities of the sized gate and
the sizing spare cell are the same.

23
Bounding box: width = dis(gE1, gE2) + dis(gE1, gE3) + (CEi1 + CEi2)/F,
center = gE1
[Figure: net nE1 connecting gates gE1, gE2, and gE3.]
24
Bounding Box Theorem

25
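Bounding polygon
width = dis(gE1, gE2) + dis(gE1, gE3) + (CEo1)/F, center = gE2
width = dis(gE1, gE4) + (CEi1)/F, center = gE4
width = dis(gE1, gE2) + dis(gE1, gE3) + (CEo1)/F, center = gE3
[Figure: gates gE1, gE2, gE3, and gE4 with the bounding polygon formed by the three boxes.]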
26
Solution Pruning during DCP

For each set of solutions, we keep at most k
solutions (k is a user-defined parameter).
Discard dominated solutions.
Classify the remaining solutions by the number of
buffers used.
Keep the best solutions in each class.

[Figure: solutions plotted as number of inserted buffers (0 to 3) versus path delay.]
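
The pruning rules on this slide can be sketched as below. A solution is represented as a (number of buffers, path delay) pair. Keeping a single best solution per buffer count, and preferring smaller delays when more than k solutions remain, are assumptions; the slides do not spell out either choice.

```python
def prune_solutions(solutions, k):
    """Prune (num_buffers, path_delay) pairs to at most k solutions."""
    # Classify by buffer count and keep the smallest delay in each class.
    best = {}
    for bufs, delay in solutions:
        if bufs not in best or delay < best[bufs]:
            best[bufs] = delay
    # Discard dominated solutions: more buffers without a smaller delay.
    kept, min_delay = [], float("inf")
    for bufs in sorted(best):
        if best[bufs] < min_delay:
            kept.append((bufs, best[bufs]))
            min_delay = best[bufs]
    # Assumption: when truncating to k, prefer the smallest delays.
    kept.sort(key=lambda s: s[1])
    return sorted(kept[:k])


# Hypothetical solution set.
sols = [(0, 10.0), (1, 8.0), (1, 9.0), (2, 8.5), (3, 6.0)]
print(prune_solutions(sols, 3))
print(prune_solutions(sols, 2))
```

In the hypothetical set, the two-buffer solution is dropped because a one-buffer solution already achieves a smaller delay.
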
27
End of DCP

At the start point of the ECO path, choose the
solution that
meets the timing constraint
uses the fewest buffers
Change the netlist according to the solution.
Run STA to update the timing information.

[Figure: solutions at the start point plotted as number of inserted buffers (0 to 3) versus path delay, with the clock cycle marked on the delay axis.]
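
A short sketch of the selection rule at the start point, using the same (number of buffers, path delay) representation. The fallback to the smallest delay when no solution meets the clock cycle is an assumption added for completeness; the slide only covers the case where the constraint can be met.

```python
def choose_solution(solutions, clock_cycle):
    """Pick the solution that meets timing with the fewest buffers."""
    feasible = [s for s in solutions if s[1] <= clock_cycle]
    if feasible:
        return min(feasible)  # fewest buffers first, then smallest delay
    # Assumed fallback: nothing meets timing, so take the smallest delay.
    return min(solutions, key=lambda s: (s[1], s[0]))


# Hypothetical solutions at the start point.
sols = [(0, 10.0), (1, 8.0), (3, 6.0)]
print(choose_solution(sols, 9.0))
print(choose_solution(sols, 5.0))
```
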
28
An Example for Complex ECO Paths

Path  Source  Target  Negative slack
P1    S1      T1      large
P2    S1      T2      medium
P3    S2      T3      small

[Figure: layout with buffer-type and gate-type spare cells, sources S1 and S2, and targets T1, T2, and T3; paths P1, P2, and P3 are processed from a list until all slacks reach zero.]
29
Time Complexity Analysis of Phase 1

Parameters
Gate count: V
Number of spare cells: N
Number of DCP iterations: L
Maximum number of gates on an ECO path: M
Keep at most k solutions per operation
Complexity of DCP: O(kMN)
Complexity of STA: O(V)
Complexity of phase 1: O((kMN + V)L)

30
Extension: Technology Remapping

After DCP, we can further improve the circuit
timing with the following steps:
Identify the timing-critical parts of the netlist.
Extract those parts from the netlist.
Re-synthesize and map the extracted netlist.
Decomposition by MVSIS
Ideal mapping locations
Technology mapping
Run STA to update the timing information.

31
Optimal Buffering to a Line

The optimal way to buffer a line is to insert
buffers at equal distances.
No gate drives too large a load.

[Figure: a line with optimal (equally spaced) buffering compared with non-optimal buffering.]
32
Ideal Mapping Locations

Given the locations of the input and output pins,
map the base gates evenly between the input and
output pins.
No gate drives too large a load, and the path
delay is smaller (delay is proportional to the
square of the wirelength).
This makes buffer insertion easier.

[Figure: two mappings between Input A, Input B, and Output, compared by inserted buffers and delay.]
33
Calculating Ideal Mapping Locations

For each path from an input pin to an output
pin, calculate the ideal location of every base
gate on the path by spacing the gates at equal
distances.
If a base gate has more than one ideal location,
average these locations to get its final ideal
location.

[Figure: two diagrams of base gates placed between Input A, Input B, and Output.]
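
The two rules above (equal spacing along each path, then averaging) can be sketched as follows. Placing gates on the straight line between the two pins is an assumption, since the slides do not say how distance is measured. The function name, pin coordinates, and gate names are hypothetical.

```python
from collections import defaultdict


def ideal_locations(paths, pin_xy):
    """paths: lists of [input_pin, gate, ..., gate, output_pin]."""
    samples = defaultdict(list)
    for path in paths:
        (x0, y0), (x1, y1) = pin_xy[path[0]], pin_xy[path[-1]]
        steps = len(path) - 1
        # Space the gates of this path at equal distances between the pins.
        for i, gate in enumerate(path[1:-1], start=1):
            t = i / steps
            samples[gate].append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    # A gate on several paths gets the average of its per-path locations.
    return {
        g: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for g, pts in samples.items()
    }


# Hypothetical pins and base gates; g2 lies on both paths.
pins = {"A": (0.0, 0.0), "B": (2.0, 4.0), "Out": (6.0, 0.0)}
paths = [["A", "g1", "g2", "Out"], ["B", "g2", "Out"]]
print(ideal_locations(paths, pins))
```
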
34
Technology Mapping

Use the actual locations of spare cells as
costs.
Cut the network into trees.
Apply dynamic programming to map each
tree.
The locations of mapped base gates are the locations
of the corresponding spare cells.
The locations of unmapped base gates are their ideal
locations.
Insert buffers into the mapped circuit to further
improve timing.

[Figure: mapped circuit between Input A, Input B, and Output.]
35
Maximum Independent Set

To choose a globally optimal solution for
technology remapping, we store a set of match
solutions for each tree and use a maximum
independent set (MIS) formulation to find the
best assignment.

[Figure: trees T1, T2, and T3 with candidate matches M1_1, M1_2, M2_1, M2_2, M2_3, M3_1, and M3_2 over gates g1 to g6.]
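
One way to read the MIS step is as a maximum-weight independent set on a conflict graph of matches. The sketch below assumes that two matches conflict when they belong to the same tree or claim the same spare cell, and that each match has a numeric timing gain. The slides confirm neither detail, and the exhaustive search is only practical for small inputs. All names and values are hypothetical.

```python
from itertools import combinations


def best_assignment(matches):
    """matches: name -> (tree, set of spare cells used, timing gain)."""
    names = list(matches)

    def conflict(a, b):
        tree_a, cells_a, _ = matches[a]
        tree_b, cells_b, _ = matches[b]
        return tree_a == tree_b or bool(cells_a & cells_b)

    best, best_gain = (), 0.0
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            # An independent set contains no conflicting pair.
            if any(conflict(a, b) for a, b in combinations(subset, 2)):
                continue
            gain = sum(matches[m][2] for m in subset)
            if gain > best_gain:
                best, best_gain = subset, gain
    return set(best), best_gain


# Hypothetical matches for two trees competing for spare cells s1, s2, s3.
matches = {
    "M1_1": ("T1", {"s1"}, 3.0),
    "M1_2": ("T1", {"s2"}, 2.0),
    "M2_1": ("T2", {"s1"}, 2.5),
    "M2_2": ("T2", {"s3"}, 1.0),
}
print(best_assignment(matches))
```

Choosing the best match per tree in isolation would pick M1_1 and then force T2 onto M2_2; the joint search finds the better pairing.
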
37
Experimental Results

The five benchmarks are industrial designs.
Our tool was run on a Linux workstation with a
3.2 GHz CPU and 3 GB of memory.

38
Experimental Results (cont'd)

Our tool beat all competitors on the same
subject in the CAD contest '05.
We compare the results of our algorithm with
the case without the aid of the bounding box
theorem.
a greedy wire cost heuristic.

39
Experimental Results (cont'd)

Layout of Case 2

[Figure: layout of Case 2 before and after optimization.]
41
Conclusions

We proposed a dynamic programming method
with dynamic cost to solve the ECO timing
optimization problem.
Functional change that considers timing is a
harder problem, and we will extend our work in this
direction.
