Title: Lec 8: Topology Optimization
1. Lec 8: Topology Optimization
- Source of Slides
  - These slides are courtesy of Professor Pan, who is with Electrical and Computer Engineering at UT-Austin.
- Readings
  - Cong, J., Kahng, A., Robins, G., Sarrafzadeh, M., and Wong, C.K., Performance-driven global routing for cell based ICs, ICCD91, pp. 14-16.
  - Cong, J., Kahng, A., Robins, G., Sarrafzadeh, M., and Wong, C.K., Provably good performance-driven global routing, TCAD92, vol. 11, no. 6, pp. 739-752.
  - Rao, S., Sadayappan, P., Hwang, F., and Shor, P., The rectilinear Steiner arborescence problem, Algorithmica, vol. 7, no. 2-3, pp. 277-288, 1992.
  - Cong, J., Leung, K.-S., and Zhou, D., Performance-driven interconnect design based on distributed RC delay model, DAC93, pp. 606-611.
2. Readings
  - Cong, J., Leung, K.-S., and Zhou, D., Performance-driven interconnect design based on distributed RC delay model, UCLA Computer Science Tech. Report CSD-920043, Los Angeles, CA 90024, Sep. 1992.
  - Alpert, C.J., Hu, T.C., Huang, J.H., Kahng, A.B., and Karger, D., Prim-Dijkstra tradeoffs for improved performance-driven routing tree design, TCAD95, vol. 14, no. 7, pp. 890-896.
  - Boese, K.D., Kahng, A.B., McCoy, B.A., and Robins, G., Near-optimal critical sink routing tree constructions, TCAD95, vol. 14, pp. 1417-1436.
  - Hou, H., Hu, J., and Sapatnekar, S.S., Non-Hanan routing, TCAD99, vol. 18, no. 4, pp. 436-444.
  - Hu, J. and Sapatnekar, S.S., Simultaneous buffer insertion and non-Hanan optimization for VLSI interconnect under a higher order AWE model, ISPD99, pp. 133-138.
3. Layout Design Flow for VDSM ICs
[Figure: design flow with the stages Hierarchical Design Planning and Interconnect Planning; Placement Driven Synthesis with Interconnect Optimization; Performance/Routability Driven Global Routing with Interconnect Optimization; Detailed Routing; Clock and PG Routing. An Interconnect Optimizations Library contains Topology Optimization, Buffer Insertion, Device Sizing, Wiresizing, and others.]
4. Interconnect Topology Optimization
- Problem: given a source and a set of sinks, build the best interconnect topology to minimize different design objectives
  - Wire length: traditional (lower capacitance load and overall congestion)
  - Performance: DSM
  - New interest: speed and other non-traditional routing architectures
- In most cases, topology means tree
  - Because a tree is the most compact structure to connect everything without redundancy
  - Delay analysis is easy (cf. mesh)
5. Terminology
- For a multi-terminal net, we can easily construct a tree (spanning tree) to connect the terminals together.
  - However, the wire length may be unnecessarily large.
- Better: use a Steiner tree
  - A tree connecting all terminals as well as other added nodes (Steiner nodes).
- Rectilinear Steiner tree
  - A Steiner tree whose edges can only run horizontally and vertically.
  - Manhattan planes
  - Note: X (or Y)-architecture (non-Manhattan)
[Figure: a Steiner tree with a Steiner node marked.]
6. Prim's Algorithm for Minimum Spanning Tree
- Grow a connected subtree from the source, one node at a time.
- At each step, choose the closest unconnected node and add it to the subtree.
[Figure: terminals in the X-Y plane with source s.]
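The two steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming terminals are given as (x, y) tuples and that edge cost is Manhattan distance; the names `manhattan` and `prim_mst` are made up for this example.

```python
def manhattan(a, b):
    """Rectilinear distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def prim_mst(points, source=0):
    """Grow a tree from `source`; return edges as (tree_node, new_node) index pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n   # distance from each node to the current subtree
    parent = [None] * n
    best[source] = 0
    edges = []
    for _ in range(n):
        # choose the closest node that is not yet connected
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((parent[u], u))
        # the new node may bring other unconnected nodes closer to the subtree
        for v in range(n):
            if not in_tree[v]:
                d = manhattan(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

points = [(0, 0), (2, 0), (2, 3), (5, 1)]
edges = prim_mst(points)
total = sum(manhattan(points[a], points[b]) for a, b in edges)
print(edges, total)
```

The O(n^2) loop is chosen for readability; a heap-based variant would be the usual choice for large nets.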
7. Interconnect Topology Optimization Under Linear Delay Model
- Conventional routing algorithms are not good enough
  - A minimum spanning tree may have very long source-sink paths.
  - A shortest path tree may have very large routing cost.
- Want to minimize path lengths and routing cost at the same time.
8. Performance-Driven Interconnect Topology Design
- BPRIM and BRBC (bounded-radius bounded-cost) algorithms
  - Cong et al, ICCD91, TCAD92
- RSA algorithm (for Min. Rectilinear Steiner Arborescences)
  - Rao-Sadayappan-Hwang-Shor, Algorithmica92
- A-Tree algorithm (generalization of RSA)
  - Cong-Leung-Zhou, DAC93; Cong et al, ISPD97
- Prim-Dijkstra tradeoff algorithm
  - Alpert et al, TCAD 1995
- SERT algorithm (Steiner Elmore Routing Tree)
  - Boese-Kahng-McCoy-Robins, TCAD-95
- MVERT algorithm (Minimum Violation Elmore Routing Tree)
  - Hou-Hu-Sapatnekar, TCAD-99
- BINO alg. (Buffer Insertion and Non-Hanan Optimization)
  - Hu-Sapatnekar, ISPD-99
9. Definitions
- Given: net N with source s, connected by tree T.
- Radius of net N: distance from the source to the furthest sink.
- Radius of a routing tree, r(T): length of the longest path from the root to a leaf.
- Cost of an edge: distance between its two endpoints (other weights are ok).
- Cost of a routing tree, cost(T): sum of the edge costs in T.
- $minpath_G(u, v)$: shortest path from u to v in G
- $dist_G(u, v)$: cost of $minpath_G(u, v)$.
[Figure: a routing tree rooted at source s, showing the radius of the net and r(T).]
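These definitions translate directly into code. The sketch below assumes Manhattan edge costs and a tree stored as a hypothetical `parent` dictionary (node index to parent index, with the source mapped to `None`); neither representation comes from the slides.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def net_radius(points, source=0):
    """R: distance from the source to the furthest sink."""
    return max(manhattan(points[source], p) for p in points)

def tree_cost(points, parent):
    """cost(T): sum of the edge costs in T."""
    return sum(manhattan(points[v], points[u])
               for v, u in parent.items() if u is not None)

def tree_radius(points, parent):
    """r(T): length of the longest root-to-node path in T."""
    def path_length(v):
        length = 0
        while parent[v] is not None:
            length += manhattan(points[v], points[parent[v]])
            v = parent[v]
        return length
    return max(path_length(v) for v in parent)

points = [(0, 0), (2, 0), (2, 3), (5, 1)]
chain = {0: None, 1: 0, 2: 1, 3: 2}   # a path through all nodes
star = {0: None, 1: 0, 2: 0, 3: 0}    # every sink wired directly to the source
print(net_radius(points))                                     # 6
print(tree_cost(points, chain), tree_radius(points, chain))   # 10 10
print(tree_cost(points, star), tree_radius(points, star))     # 13 6
```

The star has the smallest possible radius (equal to the net radius) but a higher cost than the chain, which is the tension the following slides address.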
10. First Idea: Bounded Radius Minimum Spanning Tree [Cong-Kahng-Robins-Sarrafzadeh-Wong, ICCD-91]
- Basic idea: restrict the tree radius while minimizing the routing cost
- Bounded radius minimum spanning tree problem (BRMST)
  - Given a net N with radius R,
  - find a minimum cost tree with radius $r(T) \le (1+\epsilon)R$
- Parameter $\epsilon$ controls the trade-off between radius and cost
  - $\epsilon = \infty$: minimum spanning tree; $\epsilon = 0$: shortest path tree
[Figure: trade-off between radius and cost of routing trees on one net. $\epsilon = 0$: radius 1, cost 4.95. $\epsilon = 1$: radius 1.77, cost 4.26. $\epsilon = \infty$: radius 4.03, cost 4.03.]
11. BPRIM Algorithm for Bounded-Radius Minimum Spanning Trees
- Given net N with source s and radius R, and parameter $\epsilon$.
- Grow a connected subtree T from the source, one node at a time
- At each step, choose the closest pair $x \in T$ and $y \in N - T$
  - If $dist_T(s, x) + cost(x, y) \le (1+\epsilon)R$, add (x, y)
  - Else backtrace along $minpath_T(s, x)$ to find x' such that $dist_T(s, x') + cost(x', y) \le R$, then add (x', y)
- Slack $\epsilon R$ is introduced at each backtracing so we do not have to backtrace too often.
[Figure: two cases on a tree with source s. In the first, $dist_T(s,x) + cost(x,y) \le (1+\epsilon)R$ and y connects to x. In the second, y connects to an ancestor x' of x with $dist_T(s,x') + cost(x',y) \le R$.]
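A minimal sketch of the BPRIM steps listed above, assuming Manhattan edge costs on a complete graph over the terminals. The function name `bprim` and the `parent`/`dist` dictionaries are illustrative choices, and the closest-pair search is done by brute force for clarity.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def bprim(points, eps, source=0):
    """Return (parent, dist): the tree and dist_T(s, v) for every node v."""
    n = len(points)
    R = max(manhattan(points[source], p) for p in points)  # net radius
    parent = {source: None}
    dist = {source: 0}
    while len(parent) < n:
        # closest pair x in T, y in N - T
        x, y = min(((x, y) for x in parent for y in range(n) if y not in parent),
                   key=lambda e: manhattan(points[e[0]], points[e[1]]))
        if dist[x] + manhattan(points[x], points[y]) > (1 + eps) * R:
            # backtrace toward the source until the tighter bound R holds;
            # this always stops, because at the source the bound is dist(s, y) <= R
            while dist[x] + manhattan(points[x], points[y]) > R:
                x = parent[x]
        parent[y] = x
        dist[y] = dist[x] + manhattan(points[x], points[y])
    return parent, dist

points = [(0, 0), (3, 0), (3, 3), (1, 3)]
tight_parent, tight_dist = bprim(points, eps=0.0)
loose_parent, loose_dist = bprim(points, eps=0.5)
print(tight_parent, max(tight_dist.values()))   # sink 3 is reattached to the source
print(loose_parent, max(loose_dist.values()))   # sink 3 stays on the MST chain
```

On this net R = 6. With eps = 0 the last sink would sit at path length 8, so the backtrace reconnects it to the source; with eps = 0.5 the bound is 9 and the chain is kept.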
12. Experimental Results of BPRIM Algorithm
13. Worst-Case Ratio of BPRIM Algorithm for Small Nets
14. A Pathological Example for BPRIM Algorithm
- BPRIM can be arbitrarily bad.
[Figure: the same net with source s and nodes x, y, drawn twice: the optimal solution, and the solution by BPRIM in which all leaves connect directly to the source.]
15. An Improved Algorithm: BRBC [Cong-Kahng-Robins-Sarrafzadeh-Wong, TCAD92]
- Construct MST and SPT; set Q = MST
- Construct list L, a depth-first tour of MST
- Traverse L while keeping a running total S of traversed edge costs; when reaching $L_i$:
  - If $S \ge \epsilon \cdot dist_{SPT}(s, L_i)$, then add $minpath_{SPT}(s, L_i)$ to Q and reset S = 0,
  - Else continue traversing L
- Construct the shortest path tree T of Q.
[Figure: the MST with source s, its depth-first tour L with steps numbered 1 to 10, and the resulting graph Q. If $S = dist_L(L_{i'}, L_i) \ge \epsilon \cdot dist_{SPT}(s, L_i)$, where $L_{i'}$ is the point of the previous reset, then add $minpath_{SPT}(s, L_i)$ to Q and reset S = 0.]
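The BRBC steps above can be sketched as follows. This is a simplified illustration under one loud assumption: the underlying graph is the complete graph on the terminals with Manhattan edge costs, so $minpath_{SPT}(s, L_i)$ is just the direct edge from s to $L_i$. All function names are made up for this example.

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mst_children(points, source):
    """Prim's MST, returned as child lists rooted at `source`."""
    best = {v: (manhattan(points[source], points[v]), source)
            for v in range(len(points)) if v != source}
    children = {v: [] for v in range(len(points))}
    while best:
        v = min(best, key=lambda k: best[k][0])
        _, u = best.pop(v)
        children[u].append(v)
        for w in best:
            dw = manhattan(points[v], points[w])
            if dw < best[w][0]:
                best[w] = (dw, v)
    return children

def dfs_tour(children, source):
    """Depth-first tour L: every MST edge is walked exactly twice."""
    tour = [source]
    def visit(u):
        for v in children[u]:
            tour.append(v)
            visit(v)
            tour.append(u)
    visit(source)
    return tour

def brbc(points, eps, source=0):
    children = mst_children(points, source)
    q_edges = {(u, v) for u in children for v in children[u]}  # Q = MST
    tour = dfs_tour(children, source)
    running = 0  # running total S
    for prev, cur in zip(tour, tour[1:]):
        running += manhattan(points[prev], points[cur])
        if running >= eps * manhattan(points[source], points[cur]):
            q_edges.add((source, cur))  # assumption: SPT path is the direct edge
            running = 0
    # shortest path tree of Q (Dijkstra)
    adj = {v: [] for v in range(len(points))}
    for u, v in q_edges:
        if u != v:
            w = manhattan(points[u], points[v])
            adj[u].append((v, w))
            adj[v].append((u, w))
    dist, parent, heap = {source: 0}, {source: None}, [(0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue
        for v, w in adj[u]:
            if v not in dist or du + w < dist[v]:
                dist[v], parent[v] = du + w, u
                heapq.heappush(heap, (dist[v], v))
    return parent, dist

points = [(0, 0), (3, 0), (3, 3), (1, 3)]
for eps in (0.0, 1.0, 3.0):
    parent, dist = brbc(points, eps)
    print(eps, parent, max(dist.values()))
```

On this net (R = 6), eps = 3 leaves Q equal to the MST chain with radius 8, while eps = 1 adds a shortcut to the last sink and brings the radius down to 6.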
16. BRBC Trees Have Bounded Radius
- Theorem 1: $r(T) \le (1+\epsilon)R$
- Proof: For any vertex x, let y be the last vertex before x in L for which we added $minpath_{SPT}(s, y)$ to Q.
- By the choice of y, we have
  - $dist_L(y, x) \le \epsilon \cdot dist_{SPT}(s, x) \le \epsilon R$
- Therefore,
  - $dist_Q(s, x) \le dist_Q(s, y) + dist_Q(y, x) \le dist_{SPT}(s, y) + dist_L(y, x) \le R + \epsilon R = (1+\epsilon)R$
[Figure: graph Q with source s and tour L; y precedes x on L, with $dist_{SPT}(s, y) \le R$ and $dist_L(y, x) \le \epsilon \cdot dist_{SPT}(s, x) \le \epsilon R$.]
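17. BRBC Trees Have Bounded Cost
- Theorem 2: $cost(T) \le (1 + 2/\epsilon) \cdot cost(MST)$.
- Proof: Let $v_1, v_2, \ldots, v_k$ be the vertices for which we add $minpath_{SPT}(s, v_i)$ to Q.
- Note that T is a subgraph of Q.
- Idea borrowed from Awerbuch-Baratz-Peleg, PODC-90
[Figure: graph Q with source s and tour L; consecutive shortcut vertices $v_{i-1}$ and $v_i$ satisfy $dist_L(v_{i-1}, v_i) \ge \epsilon \cdot dist_{SPT}(s, v_i)$.]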
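The slide gives only the ingredients of the proof. One way to assemble them is sketched below, assuming $v_0 = s$, that the tour segments between consecutive $v_i$ do not overlap, and that the depth-first tour L walks each MST edge exactly twice, so $cost(L) = 2 \cdot cost(MST)$:

```latex
\begin{align*}
cost(T) \le cost(Q)
  &\le cost(MST) + \sum_{i=1}^{k} dist_{SPT}(s, v_i) \\
  &\le cost(MST) + \frac{1}{\epsilon} \sum_{i=1}^{k} dist_L(v_{i-1}, v_i) \\
  &\le cost(MST) + \frac{1}{\epsilon}\, cost(L) \\
  &= \left(1 + \frac{2}{\epsilon}\right) cost(MST)
\end{align*}
```

The second line uses the shortcut condition $dist_L(v_{i-1}, v_i) \ge \epsilon \cdot dist_{SPT}(s, v_i)$ from the figure, and the first inequality uses the fact that T is a subgraph of Q.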
18. Experimental Results of BRBC Algorithm
[Figure: radius as a fraction of MST radius, and cost as a fraction of MST cost.]
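19. Bounded Radius Steiner Trees
- For any weighted graph, we construct a bounded radius Steiner tree T such that
  - $r(T) \le (1+\epsilon)R$ and
  - [cost bound relative to $cost(T_{OPT})$; the formula is missing from this transcript]
  - where $T_{OPT}$ is the optimal Steiner tree
- On a Manhattan plane, we construct a bounded radius rectilinear Steiner tree T such that
  - $r(T) \le (1+\epsilon)R$ and
  - [cost bound; the formula is missing from this transcript]
- On a Euclidean plane, we construct a bounded radius Steiner tree T such that
  - $r(T) \le (1+\epsilon)R$ and
  - [cost bound; the formula is missing from this transcript]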
20. Prim-Dijkstra Algorithm
[Figure: Prim's MST, Dijkstra's SPT, and a trade-off tree between them.]
21. Prim's and Dijkstra's Algorithms
- d(i, j): length of the edge (i, j)
- p(j): length of the path from the source to j
- Prim: d(i, j)
- Dijkstra: d(i, j) + p(j)
[Figure: edge (i, j) of length d(i, j) attached to tree node j, which is at path length p(j) from the source.]
22. The Prim-Dijkstra Trade-off
- Prim: add the edge minimizing d(i, j)
- Dijkstra: add the edge minimizing p(j) + d(i, j)
- Trade-off: $c \cdot p(j) + d(i, j)$ for $0 \le c \le 1$
- When c = 0, trade-off = Prim
- When c = 1, trade-off = Dijkstra
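The trade-off rule is a one-line change to Prim's selection key. A minimal sketch, assuming Manhattan edge lengths, with j denoting a node already in the tree and i the node being added (the function name `prim_dijkstra` is illustrative):

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def prim_dijkstra(points, c, source=0):
    """Add, at each step, the edge (i, j) minimizing c * p(j) + d(i, j)."""
    n = len(points)
    parent = {source: None}
    p = {source: 0}  # p(j): path length from the source to tree node j
    while len(parent) < n:
        i, j = min(((i, j) for j in parent for i in range(n) if i not in parent),
                   key=lambda e: c * p[e[1]] + manhattan(points[e[0]], points[e[1]]))
        parent[i] = j
        p[i] = p[j] + manhattan(points[i], points[j])
    return parent, p

points = [(0, 0), (3, 0), (3, 3), (1, 3)]
for c in (0, 0.5, 1):
    parent, p = prim_dijkstra(points, c)
    cost = sum(manhattan(points[v], points[u]) for v, u in parent.items() if u is not None)
    print(c, parent, "cost", cost, "radius", max(p.values()))
```

On this net, c = 0 gives the MST (cost 8, radius 8), c = 1 gives a shortest path tree (radius 6, cost 13), and c = 0.5 lands in between with cost 9 and radius 6.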
23. Conventional Rectilinear Steiner Tree
- Extensive studies, even outside the VLSI design community
  - Minimize total wire length
- 1-Steiner and iterated 1-Steiner [Kahng-Robins, 1992]
  - One Steiner point is added at each step
  - Good performance but slow: $O(n^4 \log n)$
- BOI [Borah et al, TCAD 94]
  - $O(n^2)$ algorithm
  - Also proposed an $O(n \log n)$ algorithm, but the implementation is very complicated and was never done.
- Recent results
  - Efficient Steiner Tree Construction Based on Spanning Graphs, H. Zhou, ISPD 2003, p. 152: $O(n \log n)$
  - Highly Scalable Algorithms for Rectilinear and Octilinear Steiner Trees, A.B. Kahng, I.I. Mándoiu, A.Z. Zelikovsky, ASPDAC 2003, p. 827: $O(n \log^2 n)$
24. Hanan's Result on Rectilinear Steiner Tree
- Hanan, SIAM J. Appl. Math. 1966
- For rectilinear Steiner tree construction, there exists a routing tree with minimum total wire length on the grid formed by the horizontal and vertical lines passing through the source and sinks.
[Figure: Hanan grid for a source and its sinks, with Hanan nodes marked.]
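Building the Hanan grid is straightforward: take every x-coordinate and every y-coordinate that occurs among the terminals and form all intersections. A small sketch, with hypothetical function names:

```python
def hanan_grid(terminals):
    """All intersections of horizontal and vertical lines through the terminals."""
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    return [(x, y) for x in xs for y in ys]

def hanan_candidates(terminals):
    """Grid intersections that are not terminals: the candidate Steiner points."""
    occupied = set(terminals)
    return [p for p in hanan_grid(terminals) if p not in occupied]

terminals = [(0, 0), (2, 3), (5, 1)]   # source first, then two sinks
print(len(hanan_grid(terminals)))      # 9
print(hanan_candidates(terminals))
```

For n terminals the grid has at most n^2 points, which is what makes Hanan's result useful: a search for Steiner points can be restricted to this finite set.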
25. Rectilinear Steiner Arborescence Algorithm [Rao-Sadayappan-Hwang-Shor, Algorithmica92]
- Given n nodes lying in the first quadrant
- Purpose is to maintain shortest paths from the source to the sinks and minimize total wire length
- RSA algorithm
  - Start with a forest of n single-node A-trees.
  - Iteratively substitute min(p, q) for a pair of nodes p, q
    - where $\min(p, q) = (\min\{x_p, x_q\}, \min\{y_p, y_q\})$.
  - The pair p, q is chosen to maximize $\|\min(p, q)\|$, the distance of min(p, q) from the source, over all current nodes.
[Figure: nodes p and q with their merge point min(p, q).]
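The merging loop can be sketched by brute force. This version assumes the source sits at the origin, that sinks are distinct (x, y) tuples with non-negative coordinates, and that distance from the source is measured as x + y; it runs in cubic time, not the O(n log n) plane-sweep of the original paper.

```python
def rsa(sinks):
    """Return (edges, total_length); each edge goes from a node to its merge point."""
    roots = set(sinks)
    edges = []
    while len(roots) > 1:
        best = None
        ordered = sorted(roots)
        for a, p in enumerate(ordered):
            for q in ordered[a + 1:]:
                m = (min(p[0], q[0]), min(p[1], q[1]))
                # keep the pair whose merge point is farthest from the source
                if best is None or m[0] + m[1] > best[0]:
                    best = (m[0] + m[1], p, q, m)
        _, p, q, m = best
        edges.extend((r, m) for r in (p, q) if r != m)
        roots -= {p, q}
        roots.add(m)
    last = roots.pop()
    if last != (0, 0):
        edges.append((last, (0, 0)))   # connect the final root to the source
    total = sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in edges)
    return edges, total

edges, total = rsa([(1, 4), (3, 3), (4, 1)])
print(edges)
print(total)
```

Because every merge point is dominated by both nodes it replaces, each edge runs only toward the source, so every sink keeps a shortest path. In the example, sink (3, 3) reaches the origin through (1, 3) and (1, 1) with path length 6.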
26. Example of RSA Algorithm
27. Performance of RSA Algorithm
- Time complexity: $O(n \log n)$ when implemented using a plane-sweep technique.
- Wirelength of the tree by RSA algorithm $\le 2 \times$ optimal solution (2 $\times$ wirelength of the minimum Rectilinear Steiner Arborescence)
28. Interconnect Designs Under Distributed RC Delay Model [Cong-Leung-Zhou, DAC93]
- Routing tree T connects the source with a set of sinks
- $pl_k(T)$: pathlength from sink k to the source in T
[Figure: a driver and a distributed RC routing tree; labels Z0, Fd, R0, C0.]
29. Interconnect Topology Design Formulation Under Distributed RC Delay Model
30. Comparison of Three Types of Trees

                MST tree       SPT tree       QMST tree
  MST cost      9 (optimal)    11             10
  SPT cost      37             29 (optimal)   31
  QMST cost     45             36             34 (optimal)
31. Impact of Resistance Ratio
- Definition: Rd/R0
  - Driver resistance versus unit wire resistance
- Determined by the technology
  - Reduce device dimension
- Impact on interconnect optimization
  - Why a minimum-cost shortest path tree is useful
[Figure: labels R0, Rd, and Rd/R0.]
32. A-Tree
- Shortest path rectilinear Steiner tree
- Generalization of RSA by Rao et al
- Advantage of A-tree
  - Always an SPT ($t_2(T)$ is minimum)
  - Minimizing $t_1(T)$ $\Rightarrow$ minimizing $t_3(T)$ for most A-trees
- Objective: minimize the total wirelength of the A-tree
  - Simultaneous minimization of $t_1(T)$, $t_2(T)$ and $t_3(T)$.
[Figure: an A-tree next to a Steiner tree.]
33. A-tree Algorithm
- Start with a forest of n single-node A-trees
- Apply a sequence of moves
- Grow an existing A-tree, or
- Combine two A-trees into a new one
- Terminate when only one A-tree is left
34. Definitions for A-tree Algorithm
- Definitions
  - $F_k$ is the forest constructed after the kth move by the A-tree algorithm
  - $T(F_k)$ is the minimum-cost rectilinear arborescence containing $F_k$ as a subgraph
  - p dominates q if $p_x \ge q_x$ and $p_y \ge q_y$.
  - $DOM(p, F_k)$: the set of nodes in $F_k$ dominated by p
- Given a node p, define
  - $NW(p) = \{(x, y) \mid x < p_x,\ y > p_y\}$.
  - SE(p), SW(p), NE(p), N(p), S(p), W(p), E(p) similarly.
- Given node p, define
  - $MF(p, F_k)$: nodes in $DOM(p, F_k)$ with minimum rectilinear distance from p,
  - $df(p, F_k)$: the distance from p to any node in $MF(p, F_k)$.
  - $mfwest(p, F_k)$: the one with the smallest x-coordinate in $MF(p, F_k)$.
  - $mfsouth(p, F_k)$ defined similarly.
- Given p as a root in $F_k$, define
  - $mx(p, F_k)$: the node in $NW(p) \cap ROOT(F_k)$ that is not blocked from p and has the minimum horizontal distance from p.
  - $dx(p, F_k)$: the horizontal distance between p and $mx(p, F_k)$ (or $\infty$ if $mx(p, F_k)$ doesn't exist)
  - $my(p, F_k)$, $dy(p, F_k)$ similarly defined
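The dominance-based definitions are easy to check on a small point set. The sketch below makes two simplifying assumptions that are not in the slides: the forest is represented only by a list of node coordinates (points along wires are ignored), and p itself is excluded from DOM(p, F_k). The function names mirror the slide notation but are otherwise illustrative.

```python
def dominates(p, q):
    """p dominates q if p_x >= q_x and p_y >= q_y."""
    return p[0] >= q[0] and p[1] >= q[1]

def rdist(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def dom(p, nodes):
    """DOM(p, F_k), restricted to forest nodes other than p (assumption)."""
    return [q for q in nodes if q != p and dominates(p, q)]

def mf(p, nodes):
    """Return (MF(p, F_k), df(p, F_k)); df is infinite if nothing is dominated."""
    candidates = dom(p, nodes)
    if not candidates:
        return [], float("inf")
    df = min(rdist(p, q) for q in candidates)
    return [q for q in candidates if rdist(p, q) == df], df

def mfwest(p, nodes):
    return min(mf(p, nodes)[0], key=lambda q: q[0])   # smallest x-coordinate

def mfsouth(p, nodes):
    return min(mf(p, nodes)[0], key=lambda q: q[1])   # smallest y-coordinate

def in_nw(p, q):
    """True if q lies in NW(p)."""
    return q[0] < p[0] and q[1] > p[1]

nodes = [(0, 0), (1, 2), (2, 1), (3, 3)]
p = (3, 3)
print(dom(p, nodes))                       # [(0, 0), (1, 2), (2, 1)]
print(mf(p, nodes))                        # ([(1, 2), (2, 1)], 3)
print(mfwest(p, nodes), mfsouth(p, nodes)) # (1, 2) (2, 1)
print(in_nw((2, 1), (1, 2)))               # True
```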
35. Type-1 Safe Move
- Condition: $dx(p, F_k) \ge df(p, F_k)$ and $dy(p, F_k) \ge df(p, F_k)$
- Move: p to $mfwest(p, F_k)$ (combines two A-trees into a new one)
[Figure: root p with mx, my, and mf.]
36. Type-2 Safe Move
- Condition: $dx(p, F_k) \ge df(p, F_k)$ and $dy(p, F_k) < df(p, F_k)$
- Move southward, p to p', with length $\min\{dist_y(mfsouth(p, F_k), p),\ dy(p, F_k)\}$
[Figure: root p before and after the move to p', with mx, my, and mfsouth.]
37. Type-3 Safe Move
- Condition: $dx(p, F_k) < df(p, F_k)$ and $dy(p, F_k) \ge df(p, F_k)$
- Move westward, p to p', with length $\min\{dist_x(mfwest(p, F_k), p),\ dx(p, F_k)\}$
[Figure: root p before and after the move to p', with mx, my, and mfwest.]
38. Heuristic Moves
- Type-1 and Type-2 Heuristic Moves
  - Combine two A-trees into a new one
- H1: select node p such that $p' = mfwest(p, F_k)$ is farthest away from the source, and introduce a path from p to p'
- H2: select two nodes $p_1$, $p_2$ such that $p' = (\min\{(p_1)_x, (p_2)_x\}, \min\{(p_1)_y, (p_2)_y\})$ is farthest away from the source, and connect $p_1$ and $p_2$ to p'
[Figure: H1 with p and p'; H2 with p1, p2, and p'.]
39. Optimality of a Move
- Definitions
  - $F_k$ is the forest constructed after the kth move by the A-tree algorithm
  - $T(F_k)$ is the minimum-cost rectilinear Steiner A-tree containing $F_k$ as a subgraph
  - $T(F_0)$ is the optimal A-tree
  - $T(F_M)$ is the A-tree constructed by our algorithm
- Optimality of a Move
  - If $cost(T(F_k)) = cost(T(F_{k+1}))$, then the (k+1)th move is called an optimal move
- Theorem: All three types of safe moves are optimal moves
40. Lower Bound Computation of Optimal A-Tree
- Lower Bound Computation
  - After the kth move, we can compute an upper bound $ERROR_k$ of $cost(T(F_{k+1})) - cost(T(F_k))$
  - Note that
    - (i) $ERROR_k = 0$ if the kth move is a safe move
    - (ii) $ERROR_k > 0$ if the kth move is a heuristic move
  - $cost(T^*) \ge cost(T) - \sum_k ERROR_k$
    - T: constructed by the A-tree algorithm
    - $T^*$: optimal A-tree
- Results obtained from the Lower Bound Computation
  - 94% of the moves are safe moves
  - 45% of the A-trees constructed by the algorithm are optimal
  - the A-trees constructed by the algorithm are at most 4% from optimal
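The bound is simple bookkeeping once the per-move error terms are known. A small sketch with made-up numbers (the cost and the error list below are hypothetical, not taken from the slides or the paper):

```python
def optimal_cost_lower_bound(constructed_cost, errors):
    """cost(T*) >= cost(T) - sum of ERROR_k over all moves."""
    return constructed_cost - sum(errors)

def worst_case_gap_percent(constructed_cost, errors):
    """Upper bound on how far the constructed tree can be above optimal, in percent."""
    bound = optimal_cost_lower_bound(constructed_cost, errors)
    return 100.0 * (constructed_cost - bound) / bound

# hypothetical run: three moves, only the second one is heuristic
constructed_cost = 63
errors = [0, 3, 0]
print(optimal_cost_lower_bound(constructed_cost, errors))   # 60
print(worst_case_gap_percent(constructed_cost, errors))     # 5.0
```

If every move is a safe move, all error terms are zero and the constructed tree is certified optimal.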
41. Results on A-Tree Construction
- Difficulty: no polynomial-time optimal algorithm is known
- A-tree heuristic
  - Efficient heuristic algorithm based on three types of optimal operations and two types of heuristic operations
  - Efficient lower bound computation for optimal A-trees
- Experimental Results
  - 45% of the A-trees constructed are optimal
  - total wirelength is at most 4% above optimal
  - reduces interconnect delay by up to 12% in 0.5 um CMOS designs compared to Steiner trees, and by up to 40% in MCM designs
42. Some Recent Results
- Efficiency and other routing architectures
  - Efficient Steiner Tree Construction Based on Spanning Graphs, ISPD 2003, p. 152
    - H. Zhou (Northwestern University)
  - Highly Scalable Algorithms for Rectilinear and Octilinear Steiner Trees, ASPDAC 2003, p. 827
    - A.B. Kahng, I.I. Mándoiu, A.Z. Zelikovsky
  - Constructing Exact Octagonal Steiner Minimal Trees, GLSVLSI 2003, p. 1 (Best Paper Award?)
    - C. S. Coulston (Penn State Erie, USA)
- Not necessarily useful right now, but interesting to read
43. Additional Slides (for curious minds)
44. Steiner Elmore Routing Tree (SERT) Heuristic [Boese-Kahng-McCoy-Robins, TCAD95]
- Use the Elmore delay model directly in the construction of routing tree T.
- Add nodes to T one by one, like Prim's MST algorithm.
- Two versions
  - SERT Algorithm
    - At each step, choose $v \in T$ and $u \notin T$ s.t. the maximum Elmore delay to any sink has minimum increase.
  - SERT-C Algorithm
    - SERT with an identified critical sink
    - First connect the critical sink to the source by a shortest path,
    - then continue as in SERT, except that we minimize the Elmore delay of the critical sink rather than the max. delay.
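The greedy step can be illustrated with a simplified variant that connects each new sink only to existing tree nodes, with no Steiner points on edges, so it is closer to a plain Elmore routing tree than to full SERT. The sketch assumes Manhattan wire lengths and the usual lumped Elmore formula: driver resistance `rd`, unit wire resistance `r0`, unit wire capacitance `c0`, and per-sink load capacitances in a `loads` dictionary. All names are illustrative.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def elmore_delays(points, parent, loads, rd, r0, c0, source=0):
    """Elmore delay from the driver to every node of the tree `parent`."""
    children = {v: [] for v in parent}
    for v, u in parent.items():
        if u is not None:
            children[u].append(v)
    def length(v):
        return manhattan(points[v], points[parent[v]])
    cap = {}
    def subtree_cap(v):
        # load at v plus wire and subtree capacitance hanging below v
        cap[v] = loads.get(v, 0.0) + sum(c0 * length(w) + subtree_cap(w)
                                         for w in children[v])
        return cap[v]
    subtree_cap(source)
    delay = {source: rd * cap[source]}
    stack = [source]
    while stack:
        u = stack.pop()
        for w in children[u]:
            seg = length(w)
            delay[w] = delay[u] + r0 * seg * (c0 * seg / 2 + cap[w])
            stack.append(w)
    return delay

def greedy_elmore_tree(points, loads, rd, r0, c0, source=0):
    """Add one sink at a time, choosing the connection with the smallest max delay."""
    parent = {source: None}
    while len(parent) < len(points):
        best = None
        for u in range(len(points)):
            if u in parent:
                continue
            for v in list(parent):
                trial = dict(parent)
                trial[u] = v
                worst = max(elmore_delays(points, trial, loads, rd, r0, c0, source).values())
                if best is None or worst < best[0]:
                    best = (worst, u, v)
        parent[best[1]] = best[2]
    return parent

points = [(0, 0), (3, 0), (3, 3), (1, 3)]
print(elmore_delays([(0, 0), (2, 0)], {0: None, 1: 0}, {1: 1.0}, 1.0, 1.0, 1.0))
print(greedy_elmore_tree(points, {}, rd=1000.0, r0=1.0, c0=1.0))
```

With a very strong driver-resistance term (`rd=1000.0`) total capacitance dominates, so the greedy tree on this net coincides with the MST chain; smaller `rd` values shift the choice toward shorter source-sink paths.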
45. Steps of SERT Algorithm
[Figure: six panels showing successive steps of SERT on a net with a source and nodes labeled 1 to 9.]
46. Examples of SERT-C Construction
[Figure: six panels on a net with a source and nodes labeled 1 to 9, one per choice of critical sink:
a) Node 2 or 4 critical
b) Node 3 or 7 critical (also 1-Steiner tree)
c) Node 5 critical
d) Node 6 critical
e) Node 8 critical (also SERT)
f) Node 9 critical]
47. Some Theoretical Results on Rectilinear Steiner Tree Under Elmore Delay Model
- When minimizing a weighted sum of Elmore delays to the sinks, there exists an optimal tree on the Hanan grid.
- When minimizing the maximum Elmore delay to the sinks, the optimal tree may not lie on the Hanan grid.
[Figure: example with unit R = 1 and unit c = 1. Source at (0,0) with R = 1.75; nodes at (1,3) with c = 0, (2,3) with c = 0, (1.63,2) with c = 0.37, and (3,0) with c = 0; the point (1.5,0) is also marked.]
48. Non-Hanan Routing [Hou-Hu-Sapatnekar, TCAD-99]
- The problem formulation is
  - Minimize total wire length
  - Subject to arrival time constraints on sinks
  - or
  - Minimize maximum delay
- Shown in Boese-Kahng-McCoy-Robins, TCAD95 that non-Hanan routing is needed to obtain optimal solutions.
- Previous algorithms consider routing on the Hanan grid.
- Non-Hanan routing is considered here.
- Elmore delay model is used.
49. Effect of Non-Hanan Steiner Node
[Figure: a connection point that can slide between Root and CC (Closest Connection).]
- The delays are concave functions.
- A positive weighted sum of concave functions is concave.
  - So for the weighted sink delay objective, it is enough to consider only Root and CC (Hanan nodes).
- However, the maximum of concave functions is not concave.
  - Consideration of non-Hanan nodes is necessary.
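A toy numeric check makes the argument concrete. The two functions below are hypothetical concave stand-ins for sink delays as a function of the connection position x between Root (x = 0) and CC (x = 2); they are not Elmore delays of any real net.

```python
def delay_a(x):
    return 4.0 - x ** 2            # concave in x

def delay_b(x):
    return 4.0 - (x - 2.0) ** 2    # concave in x

xs = [i / 10 for i in range(21)]   # sample positions from Root (0.0) to CC (2.0)

weighted = [0.5 * delay_a(x) + 0.5 * delay_b(x) for x in xs]
worst = [max(delay_a(x), delay_b(x)) for x in xs]

best_weighted_x = xs[weighted.index(min(weighted))]
best_worst_x = xs[worst.index(min(worst))]
print(best_weighted_x)   # an endpoint: the weighted sum is concave
print(best_worst_x)      # an interior point: the max is not concave
```

The weighted sum is smallest at an endpoint, which corresponds to a Hanan node, while the maximum of the two delays is smallest strictly between Root and CC, which is why the max-delay objective needs non-Hanan connection points.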
50. MVERT Algorithm
- MVERT: Maximum delay Violation Elmore Routing Tree
- Phase 1: Initial Tree Construction
  - Construct a tree using a procedure similar to SERT (i.e., greedily add one node at a time)
  - Minimize maximum delay violation rather than maximum delay
  - Considers only candidate points on the Hanan grid
- Phase 2: Cost Improvement
  - The tree constructed in Phase 1 may be conservative.
  - Can modify the connection point to minimize the wire length.
51. Finding the Optimal Reconnect Point
- The maximum violation function is a piecewise concave function.
- The best connection point can be found by performing binary search on the concave pieces.
52. Non-Hanan Routing with AWE
- Hu-Sapatnekar, ISPD-99
- The fidelity of Elmore delay is good in general
  - i.e., an optimal solution according to Elmore delay should also be nearly optimal according to actual delay.
- However, Elmore delay overestimates the actual delay, and hence is not well suited to checking delay bounds.
- 4th order AWE is used instead (accurate enough and easy to calculate).
53. BINO Algorithm
- BINO: Buffer Insertion and Non-Hanan Optimization
- Consider buffer insertion and routing simultaneously.
- The problem formulation is
  - Minimize total wire length and buffer cost
  - Subject to arrival time constraints on sinks
- Phase 1: Initial Tree Construction
  - Using an algorithm similar to SERT (called SART)
  - But replace Elmore delay with 4th order AWE
- Phase 2: Buffer Insertion and Non-Hanan Optimization
  - Using an algorithm similar to MVERT (called MVART)
  - But replace Elmore delay with 4th order AWE
  - Buffer insertion is handled by some greedy heuristics.