Title: Analytical Minimization of Signal Delay in VLSI Placement
1Analytical Minimization of Signal Delay in VLSI Placement
- Andrew B. Kahng and Igor L. Markov
- UCSD, Univ. of Michigan
- http://www.eecs.umich.edu/imarkov
- IBM technical contact: Paul Villarrubia
2Outline
- Background: Global Placement for VLSI
- wirelength minimization
- delay minimization
- Contribution
- minimization objective
- generic minimization algorithm: outer loop and inner loop
- empirical results
- Futures
3VLSI Global Placement
- Find locations for standard cells
- Standard cells placed in rows, without overlap
- Minimize wirelength, routing congestion
- Minimize clock cycle
- Key abstractions
- standard cells → rectangular outlines
- netlist → weighted hypergraph (signal nets → hyperedges)
- signal delay → function of cell locations (interconnect dominates)
4A VLSI Global Placement Example
[Figure: example layouts labeled "bad placement" and "good placement"]
5Netlist Hypergraph and Timing Graph
- Two signal nets: 3 pins (light blue) and 4 pins (light green)
- Ovals: hyperedges
- Red edges: timing graph edges
6Top-Down Global Placement
- Placement blocks represent cells and layout area
- single block at the start, driven by recursive (min-cut) bipartitioning
- each pass: number of blocks doubles, size of blocks halves
- end case: several cells in a tiny region, etc.
- Intuition: many cells can operate in parallel.
- Partitioning finds independent groups of cells
7Analytical Global Placement
- Find a continuous placement (cell locations are real numbers)
- Efficient optimization when nonconvex constraints are relaxed (e.g., cells are allowed to overlap)
- Represent multi-pin hyperedges by sets of edges
- minimize the total weighted wirelength of all edges
- Popular objectives
- Linear (Manhattan) WL: $w_{12} (|x_1 - x_2| + |y_1 - y_2|)$
- Quadratic (squared) WL: $w_{12} ((x_1 - x_2)^2 + (y_1 - y_2)^2)$
- Constraints: fixed vertices and/or region constraints
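A minimal sketch of the two objectives above, assuming every hyperedge has already been broken into weighted two-pin edges. The function names, cell names, and coordinates are hypothetical and chosen only for illustration.

```python
def linear_wl(edges, pos):
    """Total weighted Manhattan wirelength over two-pin edges (a, b, weight)."""
    return sum(
        w * (abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1]))
        for a, b, w in edges
    )


def quadratic_wl(edges, pos):
    """Total weighted squared Euclidean wirelength over two-pin edges."""
    return sum(
        w * ((pos[a][0] - pos[b][0]) ** 2 + (pos[a][1] - pos[b][1]) ** 2)
        for a, b, w in edges
    )


# Hypothetical placement: two movable cells and one fixed pad.
pos = {"c1": (0.0, 0.0), "c2": (3.0, 4.0), "pad": (3.0, 0.0)}
edges = [("c1", "c2", 1.0), ("c2", "pad", 2.0)]

print(linear_wl(edges, pos))     # 1*(3+4) + 2*(0+4) = 15.0
print(quadratic_wl(edges, pos))  # 1*(9+16) + 2*(0+16) = 57.0
```

Both functions are sums of convex terms in the coordinates, which is what makes the relaxed (overlap-allowed) problem efficient to optimize.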
8Analytical Placement Alone is Not Enough
- Many cells overlap
- Must spread the placement
- IBM CPlace and XQ
- Remove overlap (computational geometry)
- CPlace combines min-cut with analytical techniques
9Timing-Driven Placement
- Cycle time is determined by the maximum path delay, not the total path delay (!)
- max(x,y,...) is not differentiable
- framework: pin-based timing graph
- Analytical approaches allow cell overlaps
- Cell overlaps are resolved later
- Main difficulty: cannot enumerate signal paths
- Signal paths are implicitly defined by device types
- signal path sources and sinks: I/O pins and storage elements
- Timing constraints are also implicitly defined
- actual arrival times (AATs) at sources
- required arrival times (RATs) at sinks
- source-sink path constraint: path delay ≤ RAT_at_sink - AAT_at_source
10Implicit Analysis of Path Constraints
- Static Timing Analysis (STA) methodology
- forward topological traversal in timing graph → AAT_at_every_pin
- similar backward traversal → RAT_at_every_pin
- slack_at_pin is given by RAT_at_pin - AAT_at_pin
- negative slacks ⇔ violated timing constraints
- STA-based and STA-inspired placement methods
- slacks → net weights for HPWL minimization
- top-down placement to maximize negative slack (Marek-Sadowska/Lin 86)
- note: STA requires edge delays (e.g., from placement)
- delay budgets
- zero-slack (Hauge, Nair and Yoffa 86)
- iterative min-max (Shragowitz et al. 90/92)
- limit-bumping (Frankle 92)
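The forward and backward traversals described on this slide can be sketched as follows. This is a simplified illustration, not the flow of any particular timing analyzer: it assumes an acyclic timing graph with one constant delay per edge, and the function name `static_timing` and the pin names are hypothetical.

```python
from collections import defaultdict


def static_timing(edges, aat_at_source, rat_at_sink):
    """edges: {(u, v): delay} on a DAG. Returns (aat, rat, slack) per pin."""
    succ, pred, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        indeg[v] += 1
        nodes.update((u, v))

    # Topological order (Kahn's algorithm).
    order, ready = [], [n for n in nodes if indeg[n] == 0]
    while ready:
        n = ready.pop()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)

    # Forward traversal: AAT is the latest arrival over all fan-in edges.
    aat = dict(aat_at_source)
    for n in order:
        for p in pred[n]:
            aat[n] = max(aat.get(n, float("-inf")), aat[p] + edges[(p, n)])

    # Backward traversal: RAT is the earliest requirement over all fan-out edges.
    rat = dict(rat_at_sink)
    for n in reversed(order):
        for s in succ[n]:
            rat[n] = min(rat.get(n, float("inf")), rat[s] - edges[(n, s)])

    slack = {n: rat[n] - aat[n] for n in nodes}
    return aat, rat, slack


# Hypothetical timing graph: one source pin "in", one sink pin "out".
edges = {("in", "g1"): 2, ("g1", "g2"): 3, ("in", "g2"): 1, ("g2", "out"): 2}
aat, rat, slack = static_timing(edges, {"in": 0}, {"out": 6})
print(aat["out"])    # 7: longest path in -> g1 -> g2 -> out
print(slack["out"])  # -1: the constraint of 6 is violated by one time unit
```

Both traversals visit each edge once, so slacks at every pin are obtained without enumerating paths.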
11Motivations For Novelty
- Many promising techniques available
- net reweighting
- delay budgeting
- others
- Existing frameworks have weaknesses
- speed/scalability
- loss or ignorance of input information
- delay budgeting algorithms tend to ignore fixed locations, obstacles
- optimization of wrong global objectives (e.g., average wirelength)
12The Dimensionless Path-Timing Objective
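- For a path $\pi$, consider each edge $e \in \pi$
- Dimensionless Path-Timing Objective (DPO)
- $\phi = \max_\pi t_\pi / c_\pi = \max_\pi (\sum_{e \in \pi} d_e) / c_\pi$
- Where
- $c_\pi$ is the path constraint
- $t_\pi$ is the path delay
- $d_e = d_{ij}(x_i, y_i, x_j, y_j)$ is the edge delay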
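To make the definition concrete, the sketch below evaluates the objective on a toy timing graph by brute-force path enumeration: for each source-to-sink path it divides the path delay by the path constraint and keeps the maximum. Enumeration is used only because the example is tiny; it does not scale, and it is not the computation proposed in these slides. All names and numbers are hypothetical.

```python
def enumerate_paths(succ, sources, sinks):
    """Yield every source-to-sink path of a small DAG as a list of vertices."""
    def walk(path):
        v = path[-1]
        if v in sinks:
            yield list(path)
        for nxt in succ.get(v, ()):
            yield from walk(path + [nxt])

    for s in sources:
        yield from walk([s])


def dpo(succ, delay, sources, sinks, constraint):
    """Maximum over paths of (sum of edge delays) / (path constraint)."""
    best = 0.0
    for path in enumerate_paths(succ, sources, sinks):
        t = sum(delay[(u, v)] for u, v in zip(path, path[1:]))
        best = max(best, t / constraint(path))
    return best


# Hypothetical graph with two paths from "in" to "out".
succ = {"in": ["g1", "g2"], "g1": ["g2"], "g2": ["out"]}
delay = {("in", "g1"): 2, ("g1", "g2"): 3, ("in", "g2"): 1, ("g2", "out"): 2}

# Assumed constraint: every path must finish within 10 time units.
value = dpo(succ, delay, {"in"}, {"out"}, lambda path: 10)
print(value)  # 0.7: the longest path has delay 7 against a constraint of 10
```

A value of at most 1 means no path exceeds its constraint; here the worst path uses 70% of its budget.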
13DPO Properties
- $\phi = \max_\pi t_\pi / c_\pi = \max_\pi (\sum_{e \in \pi} d_e) / c_\pi$
- $\phi \le 1$ ⇔ all timing constraints are satisfied
- Convex when edge delay models are convex
- Min DPO ⇔ max slack when all $c_\pi$ are equal
- Max slack can be reduced to min DPO
- add two new vertices: the source and the sink
- connect the source to former sources
- connect the sink to former sinks
- use constant edge delay models
14Criticalities Multiplicative Slacks
- By analogy with slack, define criticalities
- $\kappa_i = \max_{\pi \ni v} t_\pi / c_\pi$ for vertex $v = v_i$
- $\kappa_{ij} = \max_{\pi \ni e} t_\pi / c_\pi$ for edge $e = e_{ij}$
- Criticalities are multiplicative versions of slack
- DPO and criticalities are quickly computable
- STA postprocessing
- Vertex criticalities → cells on critical paths
- can be used by the proposed top-down timing-driven placement flow
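As an illustration of the definitions on this slide, the sketch below computes vertex and edge criticalities from an explicit list of paths: each vertex or edge receives the largest delay-to-constraint ratio among the paths passing through it. The explicit path list is an assumption made for a toy example (the slide indicates that in practice these values come from STA postprocessing), and all names and numbers are hypothetical.

```python
def criticalities(paths, delay, constraint):
    """Return (vertex_crit, edge_crit) for an explicit list of paths.

    paths:      list of vertex lists, each a source-to-sink path
    delay:      {(u, v): edge delay}
    constraint: {tuple(path): path constraint}
    """
    vertex_crit, edge_crit = {}, {}
    for path in paths:
        hops = list(zip(path, path[1:]))
        ratio = sum(delay[e] for e in hops) / constraint[tuple(path)]
        for v in path:
            vertex_crit[v] = max(vertex_crit.get(v, 0.0), ratio)
        for e in hops:
            edge_crit[e] = max(edge_crit.get(e, 0.0), ratio)
    return vertex_crit, edge_crit


# Hypothetical example: two paths sharing only their endpoints.
paths = [["in", "g1", "out"], ["in", "g2", "out"]]
delay = {("in", "g1"): 4, ("g1", "out"): 3, ("in", "g2"): 1, ("g2", "out"): 2}
constraint = {("in", "g1", "out"): 10, ("in", "g2", "out"): 10}

vertex_crit, edge_crit = criticalities(paths, delay, constraint)
print(vertex_crit["g1"], vertex_crit["g2"])  # 0.7 0.3
print(max(vertex_crit.values()))             # 0.7, which equals the DPO
```

Cell `g1` lies on the path that uses 70% of its budget, so it is more critical than `g2`; the largest criticality equals the DPO of the whole design.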
15Generic Minimization of DPO
- Reduce DPO to a simpler objective: $\max_{ij} w_{ij} d_{ij}$
- maximal weighted edge delay
- use reweighting iterations
- One reweighting iteration
- assume a placement
- compute edge criticalities
- compute new edge weights $w_{ij}$
- minimize $\max_{ij} w_{ij} d_{ij}$
- (New weights: $w_{ij} = \kappa_{ij} \mu / d_{ij}$, where $\mu = \max_{ij} w_{ij} d_{ij}$)
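The weight update in the last bullet can be sketched as below. This covers only the "compute new edge weights" step; the inner minimization of the maximal weighted edge delay is not shown. The symbol names are chosen here for illustration: `edge_crit` holds the edge criticalities and `mu` is the current value of the maximal weighted edge delay. Edge delays are assumed to be strictly positive.

```python
def reweight(weights, delays, edge_crit):
    """One weight update: new w = criticality * mu / delay, per edge.

    mu is the current maximal weighted edge delay, max over edges of w * d.
    Delays must be strictly positive.
    """
    mu = max(weights[e] * delays[e] for e in weights)
    new_weights = {e: edge_crit[e] * mu / delays[e] for e in weights}
    return new_weights, mu


# Hypothetical data for four timing-graph edges, all weights initially 1.
delays = {("in", "g1"): 4.0, ("g1", "out"): 3.0,
          ("in", "g2"): 1.0, ("g2", "out"): 2.0}
edge_crit = {("in", "g1"): 0.7, ("g1", "out"): 0.7,
             ("in", "g2"): 0.3, ("g2", "out"): 0.3}
weights = {e: 1.0 for e in delays}

new_weights, mu = reweight(weights, delays, edge_crit)
print(mu)  # 4.0
for e in sorted(new_weights):
    # At the current placement, each weighted delay equals criticality * mu.
    print(e, round(new_weights[e] * delays[e], 6))
```

After the update, the weighted delay of each edge at the current placement is its criticality scaled by `mu`, so edges on more critical paths dominate the next min-max step.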
16Properties of Reweighting
- Theorem 1. If $\mu = \max_{ij} w_{ij} d_{ij}$ does not increase at a particular iteration, all timing constraints must be satisfied.
- Theorem 2. A reweighting iteration either decreases DPO, or leaves it unchanged.
- Reweighting upper-bounds $d_{ij}$ because $w_{ij} d_{ij} \le \mu$
- can interpret reweighting as delay rebudgeting
- Youssef and Shragowitz used $w_{ij} = \kappa_{ij}$ in 1990/92
- interpretation of their iterative MiniMax
- no iterations with placement; ignores fixed pad locations
17Optimization of Maximal Edge Delay
- Must consider particular edge delay models
- popular choices: linear and quadratic
- Theorem 3. 2-dim max edge delay can be reduced to the 1-dim case with double vertices
- Inlined implementation: no new graph
- $\max a_{km} |t_k - t_m|$
- $\max b_{km} (t_k - t_m)^2$
- Theorem 4. Let $b_{km} = a_{km}^2$ ⇒ minimizers coincide
- Linear and quadratic WL are numerically equivalent!
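One way to see why the minimizers in Theorem 4 coincide: when each quadratic coefficient is the square of the linear one, every quadratic term is the square of the corresponding linear term, and squaring is monotone on nonnegative numbers, so the quadratic maximum is the square of the linear maximum. The sketch below checks this numerically on a hypothetical 1-dim instance with a single free vertex and three fixed neighbors, using a plain grid search rather than a real min-max solver.

```python
def argmin_on_grid(objective, lo, hi, steps):
    """Return the grid point in [lo, hi] with the smallest objective value."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=objective)


# Hypothetical fixed neighbors of one free vertex: (position, linear coefficient).
fixed = [(0.0, 1.0), (1.0, 3.0), (0.4, 0.5)]


def linear(t):
    """Maximal linear edge delay: max of a * |t - p|."""
    return max(a * abs(t - p) for p, a in fixed)


def quadratic(t):
    """Maximal quadratic edge delay with b = a**2: max of b * (t - p)**2."""
    return max((a * a) * (t - p) ** 2 for p, a in fixed)


t_lin = argmin_on_grid(linear, 0.0, 1.0, 1000)
t_quad = argmin_on_grid(quadratic, 0.0, 1.0, 1000)
print(t_lin, t_quad)  # both 0.75, where 1*t and 3*(1 - t) balance
```

The optimum sits where the two dominant weighted delays balance, and the same location minimizes both objectives.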
18Top-Down Placement Framework
- Top-down placement done in passes
- In one pass
- split every previously existing block
- Cell-to-block assignments
- viewed as region constraints
- gradually refine, converge to cell locations
- Assume we analytically minimized signal delay
- ⇒ have cell locations ⇒ can compute edge delays
- ⇒ can perform Static Timing Analysis
- ⇒ know which cells lie on critical paths
- Use delay-minimizing cell locations when splitting blocks
19Empirical Validation
- We combined min-max placement with recursive min-cut bisection (Capo → CapoT)
- Implemented minimization of edge delay objectives
- Length as delay
- Squared length as delay
- Quadratic RC delay
- MST-based Elmore delay (using
- Evaluated
- Internal evaluators (after placement): sanity check
- Industry timing analyzer
- Compared to an industry placer on 4 test-cases
- Won on three test-cases (by slack computed with industry STA)
20Results of Quadratic, Linear and Min-Max Placement
21Results of Quadratic, Linear and Min-Max Placement
22Conclusions and Ongoing Work
- New timing-driven placement framework
- can potentially be combined with budgeting or reweighting
- expected to be successful enough on its own
- leverages min-cut placement
- relies on a novel analytical delay minimization
- Dimensionless Path-Timing Objective (DPO)
- novel global timing objective; generalizes slack optimization
- New minimization algorithms
- reweighting iteration: reduction to a simpler MAX-based objective
- MAX-based objective can be minimized very quickly
- Ongoing work in the context of timing-driven flows
23Future Work
- Observation (how the proposed method works)
- a classic placement approach is split into stages
- a new timing optimization is performed between those stages
- most critical wires/gates are found first
- (traditionally placement is found first)
- Try other types of optimizations during placement
- routing of timing-critical nets
- better delay estimation
- early cross-talk detection?
- sizing of timing-critical drivers
- buffer insertion for timing-critical nets
- early detection of dangerous cross-talk
- Faster and cheaper ICs