Interconnect Optimizations

Provided by: Shiy9

Learn more at: https://www.mtu.edu

Transcript and Presenter's Notes

1
Interconnect Optimizations
2
A scaling primer

Ideal process scaling
Device geometries shrink by s (≈ 0.7x)
Device delay shrinks by s
Wire geometries shrink by s
R/µm: $\rho/(ws \cdot hs) = r/s^2$
Cc/µm: $(hs)\varepsilon/(Ss) = C_c$
C/µm: similar
R/µm doubles; C/µm and Cc/µm are unchanged

3
Interconnect role

Short interconnect
Used to connect nearby cells
Minimize wire C, i.e., use short minimum-width wires
Medium to long-distance (global) interconnect
Size wires to trade off area vs. delay
Increasing width → capacitance increases, resistance decreases
Need to find an acceptable tradeoff: the wire sizing problem
Fat wires
Thicker cross-sections in higher metal layers
Useful for reducing delays for global wires
Inductance issues, sharing of limited resource

4
Cross-Section of A Chip
5
Block scaling

Block area often stays same
Number of cells and nets doubles
Wiring histogram shape invariant
Global interconnect lengths don't shrink
Local interconnect lengths shrink by s

6
Interconnect delay scaling

Delay of a wire of length l
$t_{int} = (rl)(cl) = rcl^2$ (first order)
Local interconnects
$t_{int} = (r/s^2)(c)(ls)^2 = rcl^2$
Local interconnect delay is unchanged (compare to faster devices)
Global interconnects
$t_{int} = (r/s^2)(c)(l)^2 = (rcl^2)/s^2$
Global interconnect delay doubles: unsustainable!
Interconnect delay is increasingly dominant
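The scaling argument can be checked numerically. The following is a minimal sketch, assuming normalized values r = c = l = 1 and s = 0.7; the function names are illustrative and not from the slides.

```python
def wire_delay(r, c, length):
    """First-order wire delay t_int = (r*l)*(c*l) = r*c*l^2."""
    return r * c * length ** 2


def scaled_delays(r, c, length, s=0.7):
    """Return (local, global) wire delay after one ideal scaling step.

    Resistance per unit length becomes r/s^2 and capacitance per unit
    length stays c. Local wires shrink to length*s; global wires keep
    their length.
    """
    r_scaled = r / s ** 2
    local = wire_delay(r_scaled, c, length * s)
    global_ = wire_delay(r_scaled, c, length)
    return local, global_


base = wire_delay(1.0, 1.0, 1.0)
local, global_ = scaled_delays(1.0, 1.0, 1.0, s=0.7)
print(f"local / base  = {local / base:.2f}")    # unchanged
print(f"global / base = {global_ / base:.2f}")  # about 2x, since 1/0.7^2 is about 2.04
```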

7
Buffer Insertion For Delay Reduction
8
Analysis of Simple RC Circuit
[Figure: RC circuit with series resistor R carrying current i(t), capacitor C with voltage v(t) (the state variable), and input waveform vT(t)]
9
Analysis of Simple RC Circuit
Step-input response
match initial state
output response for step-input
10
Delays of Simple RC Circuit

$v(t) = v_0(1 - e^{-t/RC})$: waveform under step input $v_0 u(t)$
$v(t) = 0.5 v_0 \Rightarrow t = 0.7RC$
i.e., delay = 0.7RC (50% delay)
$v(t) = 0.1 v_0 \Rightarrow t = 0.1RC$
$v(t) = 0.9 v_0 \Rightarrow t = 2.3RC$
i.e., rise time = 2.2RC (if defined as time from 10% to 90% of Vdd)
Commonly used metric: $T_D = RC$ (= Elmore delay)
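The constants quoted on this slide follow from inverting the step response. A small sketch, assuming time is measured in units of RC; `crossing_time` is an illustrative name.

```python
import math


def crossing_time(fraction, rc=1.0):
    """Time at which v(t) = v0*(1 - exp(-t/RC)) reaches fraction*v0."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -rc * math.log(1.0 - fraction)


t50 = crossing_time(0.5)
t10 = crossing_time(0.1)
t90 = crossing_time(0.9)
print(f"50% delay = {t50:.3f} RC")        # ln 2, rounded to 0.7RC on the slide
print(f"10% point = {t10:.3f} RC")
print(f"90% point = {t90:.3f} RC")
print(f"rise time = {t90 - t10:.3f} RC")  # ln 9, rounded to 2.2RC on the slide
```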

11
Elmore Delay
Delay
12
Elmore Delay

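Driver is modeled as R
Driver intrinsic gate delay t(B)
Delay to a node $= \sum_{R_i \text{ on the path to the node}} \; \sum_{C_j \text{ downstream from } R_i} R_i C_j$
Elmore delay at n2 $= R(B)(C_1 + C_2) + R(w)C_2$
Elmore delay at n1 $= R(B)(C_1 + C_2)$

[Figure: buffer B with output resistance R(B) drives node n1 (capacitance C1); wire resistance R(w) connects n1 to node n2 (capacitance C2)]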
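As a sketch of the double sum for the two-node example, the helper below computes the Elmore delay at each node of an unbranched RC chain. The numeric values for R(B), R(w), C1 and C2 are hypothetical, and the intrinsic delay t(B) is left out, as in the formulas above.

```python
def elmore_chain(resistances, capacitances):
    """Elmore delay at every node of an RC chain (a path with no branches).

    resistances[i] connects node i-1 (or the source, for i = 0) to node i;
    capacitances[i] is the grounded capacitance at node i.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistance and one capacitance per node")
    delays = []
    total = 0.0
    for i, r in enumerate(resistances):
        downstream = sum(capacitances[i:])  # every C_j downstream of R_i
        total += r * downstream
        delays.append(total)
    return delays


# Hypothetical values for the circuit on this slide.
R_B, R_w = 2.0, 1.0
C1, C2 = 3.0, 4.0
d_n1, d_n2 = elmore_chain([R_B, R_w], [C1, C2])
print(d_n1)  # R(B)(C1 + C2)
print(d_n2)  # R(B)(C1 + C2) + R(w)C2
```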
13
Elmore Delay

For a uniform wire
No matter how the wire is lumped, the Elmore delay is the same

[Figure: uniform wire of length x with unit wire capacitance c and unit wire resistance r, driving load C]
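The lumping claim can be checked with a short experiment. This sketch assumes the wire is split into n equal pi-segments (half of each segment's capacitance at each end), which is one possible lumping and not necessarily the one drawn on the slide. For any n the result equals rx(cx/2 + C), the wire term that also appears on the "Buffers Reduce Wire Delay" slide. All numeric values are hypothetical.

```python
def lumped_wire_delay(r, c, x, load, n):
    """Elmore delay to the far end of a wire split into n equal pi-segments.

    Each segment has series resistance r*x/n and places half of its
    capacitance c*x/n at each of its two ends.
    """
    seg_r = r * x / n
    seg_c = c * x / n
    delay = 0.0
    for i in range(1, n + 1):
        # Capacitance downstream of segment i's resistor: its own far-end
        # half, every later segment in full, and the load.
        downstream = seg_c / 2 + (n - i) * seg_c + load
        delay += seg_r * downstream
    return delay


r, c, x, load = 0.5, 0.2, 10.0, 3.0  # hypothetical values
closed_form = r * x * (c * x / 2 + load)
for n in (1, 2, 10, 100):
    print(n, lumped_wire_delay(r, c, x, load, n), closed_form)
```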
14
Delay for Buffer
[Figure: buffer between nodes u and v, with labels C(b) and C]
Driver resistance
Input capacitance
Intrinsic buffer delay
15
Buffers Reduce Wire Delay
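[Figure: a wire of length x, driven through resistance R into load C, compared with the same wire split by a buffer into two segments of length x/2; each segment has resistance rx/2 and capacitance cx/4 at each end]
$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$
$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$
$t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$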
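The comparison can be evaluated directly. A minimal sketch, assuming (as the two expressions do) that the buffer has the same output resistance R and input capacitance C as the driver and the load; the break-even length comes from setting RC + tb - rcx^2/4 to zero, and all numeric values are hypothetical.

```python
import math


def t_unbuf(R, C, r, c, x):
    """Elmore delay of the unbuffered wire of length x."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    """Elmore delay with one buffer in the middle of the wire."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R, C, r, c, tb):
    """Wire length above which the buffered wire is faster."""
    return 2 * math.sqrt((R * C + tb) / (r * c))


R, C, r, c, tb = 1.0, 1.0, 1.0, 1.0, 1.0  # hypothetical values
x0 = break_even_length(R, C, r, c, tb)
for x in (1.0, x0, 5.0):
    print(f"x = {x:.2f}: unbuffered {t_unbuf(R, C, r, c, x):.2f}, "
          f"buffered {t_buf(R, C, r, c, x, tb):.2f}")
```

Below the break-even length the buffer's own delay outweighs the saving; above it the quadratic wire term dominates and buffering wins.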
16
Combinational Logic Delay
Register Primary Input
Register Primary Output
Combinational Logic
clock

Combinational logic delay < clock period

17
Example of Static Timing Analysis
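[Figure: timing graph with delay labels 2, 3, 11, 3, 7, 2, 4, 3; nodes are annotated as arrival time / required arrival time / slack: 7/4/-3, 9/6/-3, 5/3/-2, 20/17/-3, 23/20/-3, 4/7/3, 18/18/0, 8/8/0, 11/11/0]

Arrival time: input → output, take max
Required arrival time: output → input, take min
Slack = required arrival time - arrival time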
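The three rules can be sketched as two passes over a timing graph. The graph below is a small hypothetical example, not the one in the slide's figure, and the function assumes every node without predecessors has a given arrival time and every node without successors has a given required arrival time.

```python
from collections import defaultdict, deque


def static_timing(edges, input_arrival, output_required):
    """Compute arrival time, required arrival time and slack on a DAG.

    edges is a list of (u, v, delay) tuples.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(input_arrival) | set(output_required)
    for u, v, delay in edges:
        succ[u].append((v, delay))
        pred[v].append((u, delay))
        indegree[v] += 1
        nodes.update((u, v))

    # Topological order (Kahn's algorithm).
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for v, _ in succ[n]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Forward pass: arrival time, take max over fan-in.
    arrival = dict(input_arrival)
    for n in order:
        for u, delay in pred[n]:
            arrival[n] = max(arrival.get(n, float("-inf")), arrival[u] + delay)

    # Backward pass: required arrival time, take min over fan-out.
    required = dict(output_required)
    for n in reversed(order):
        for v, delay in succ[n]:
            required[n] = min(required.get(n, float("inf")), required[v] - delay)

    slack = {n: required[n] - arrival[n] for n in order}
    return arrival, required, slack


# Hypothetical graph: inputs a and b, output d.
edges = [("a", "c", 2), ("b", "c", 3), ("c", "d", 4), ("b", "d", 5)]
arrival, required, slack = static_timing(edges, {"a": 0, "b": 0}, {"d": 6})
print(arrival)
print(required)
print(slack)
```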

18
Buffers Improve Slack
RAT = Required Arrival Time; Slack = RAT - Delay
Without buffer:
RAT = 300, Delay = 350, Slack = -50
RAT = 700, Delay = 600, Slack = 100
slackmin = -50
With buffer (decouples capacitive load from the critical path):
RAT = 300, Delay = 250, Slack = 50
RAT = 700, Delay = 400, Slack = 300
slackmin = 50
19
ITRS projections
20
Buffered global interconnects: Intuition

Interconnect delay = $r \cdot c \cdot l^2$
Now, interconnect delay = $\sum_i r \cdot c \cdot l_i^2 < r \cdot c \cdot l^2$
(where $l = \sum_j l_j$)
since $\sum_j (l_j^2) < (\sum_j l_j)^2$
(Of course, account for buffer delay also)
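A quick numeric illustration of the inequality, ignoring buffer delay as the first-order argument does. The lengths are hypothetical.

```python
def segmented_wire_delay(r, c, lengths):
    """Sum of r*c*l_i^2 over the wire segments between buffers."""
    return sum(r * c * length ** 2 for length in lengths)


whole = segmented_wire_delay(1.0, 1.0, [12.0])
thirds = segmented_wire_delay(1.0, 1.0, [4.0, 4.0, 4.0])
uneven = segmented_wire_delay(1.0, 1.0, [2.0, 4.0, 6.0])
print(whole, thirds, uneven)  # equal segments give the smallest sum
```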

21
Optimal inter-buffer length

First order (lumped parasitic, Elmore delay)
analysis
Assume N identical buffers with equal
inter-buffer length l (L = Nl)
For minimum delay,

22
Optimal interconnect delay

Substituting lopt back into the interconnect
delay expression

Delay grows linearly with L (instead of
quadratically)
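A minimal sketch of the optimization, assuming each stage has the Elmore delay R(cl + C) + rl(cl/2 + C) + tb used on the "Buffers Reduce Wire Delay" slide and treating N = L/l as continuous. The lumped model behind the slide's own expressions may differ by constant factors, so the closed forms below belong to this assumed model only.

```python
import math


def buffered_delay(L, l, R, C, r, c, tb):
    """Total delay of a line of length L cut into stages of length l."""
    stage = R * (c * l + C) + r * l * (c * l / 2 + C) + tb
    return (L / l) * stage


def optimal_length(R, C, r, c, tb):
    """Stage length where d(delay)/dl = 0 for the model above."""
    return math.sqrt(2 * (R * C + tb) / (r * c))


def optimal_delay(L, R, C, r, c, tb):
    """Delay at the optimal stage length; note that it is linear in L."""
    return L * (R * c + r * C + math.sqrt(2 * r * c * (R * C + tb)))


R, C, r, c, tb = 1.0, 1.0, 1.0, 1.0, 1.0  # hypothetical values
l_opt = optimal_length(R, C, r, c, tb)
for L in (10.0, 20.0, 40.0):
    print(L, buffered_delay(L, l_opt, R, C, r, c, tb), optimal_delay(L, R, C, r, c, tb))
```

Doubling L doubles the optimized delay, whereas the unbuffered rcL^2 term would quadruple.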
23
Optimized interconnect delay scaling

Rewriting the optimal interconnect delay
expression,
With optimally sized buffers (using dT/dh = 0),

24
Optimized interconnect delay scaling

After scaling,
(instead of )
Even with optimal (re-)buffering, interconnects
scale worse than devices
For global interconnects, L doesn't shrink. So

25
Buffered nets
26
Total buffer count

Ever-increasing fractions of total cell count
will be buffers
70% in 32 nm

27
Buffer Insertion

Timing optimization
Slew optimization

28
Timing Driven Buffering Problem Formulation

Given
A Steiner tree
RAT at each sink
A buffer type
RC parameters
Candidate buffer locations
Find buffer insertion solution such that the
slack at the driver is maximized

29
Candidate Buffering Solutions
30
Candidate Solution Characteristics

Each candidate solution is associated with
vi: a node
ci: downstream capacitance
qi: RAT

If vi is a sink, ci is the sink capacitance
v is an internal node
31
Van Ginneken's Algorithm
Candidate solutions are propagated toward the
source: dynamic programming
32
Solution Propagation: Add Wire
[Figure: wire of length x from (v1, c1, q1) to (v2, c2, q2)]

$c_2 = c_1 + cx$
$q_2 = q_1 - rcx^2/2 - rxc_1$
r: wire resistance per unit length
c: wire capacitance per unit length

33
Solution Propagation: Insert Buffer
(v1, c1, q1) → (v1, c1b, q1b)

$c_{1b} = C_b$
$q_{1b} = q_1 - R_b c_1 - t_b$
Cb: buffer input capacitance
Rb: buffer output resistance
tb: buffer intrinsic delay

34
Solution Propagation: Merge
(v, cl, ql), (v, cr, qr)

$c_{merge} = c_l + c_r$
$q_{merge} = \min(q_l, q_r)$

35
Solution Propagation: Add Driver
(v0, c0, q0) → (v0, c0d, q0d)

$q_{0d} = q_0 - R_d c_0 = \text{slackmin}$
Rd: driver resistance
Pick solution with max slackmin

36
Example of Solution Propagation

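r = 1, c = 1
Rb = 1, Cb = 1, tb = 1
Rd = 1

[Figure: sink v1 connected to the driver through two wire segments of length 2 each]
Without buffer: (v1, 1, 20) → add wire → (v2, 3, 16) → add wire → (v3, 5, 8) → add driver → slack 3
With buffer at v2: (v2, 3, 16) → insert buffer → (v2, 1, 12) → add wire → (v3, 3, 8) → add driver → slack 5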
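The four propagation rules are small enough to write out directly. The sketch below represents a candidate as a (c, q) pair at a known node and replays this example with its parameter values; the function names are illustrative.

```python
def add_wire(cand, x, r, c):
    """Propagate a candidate across a wire of length x."""
    cap, q = cand
    return (cap + c * x, q - r * c * x ** 2 / 2 - r * x * cap)


def insert_buffer(cand, Rb, Cb, tb):
    """Place a buffer in front of the candidate."""
    cap, q = cand
    return (Cb, q - Rb * cap - tb)


def merge(left, right):
    """Combine candidates of two branches meeting at the same node."""
    return (left[0] + right[0], min(left[1], right[1]))


def add_driver(cand, Rd):
    """Slack seen at the driver."""
    cap, q = cand
    return q - Rd * cap


r = c = 1
Rb = Cb = tb = 1
Rd = 1

sink = (1, 20)                                   # (v1, 1, 20)
at_v2 = add_wire(sink, 2, r, c)                  # (v2, 3, 16)
at_v2_buf = insert_buffer(at_v2, Rb, Cb, tb)     # (v2, 1, 12)
at_v3 = add_wire(at_v2, 2, r, c)                 # (v3, 5, 8)
at_v3_buf = add_wire(at_v2_buf, 2, r, c)         # (v3, 3, 8)
print(add_driver(at_v3, Rd), add_driver(at_v3_buf, Rd))  # slack 3 and slack 5
```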
37
Example of Merging
Left candidates
Right candidates
Merged candidates
38
Solution Pruning

Two candidate solutions
(v, c1, q1)
(v, c2, q2)
Solution 1 is inferior if
c1 > c2 (larger load)
and q1 < q2 (tighter timing)

39
Pruning When Insert Buffer
All buffered candidates have the same load capacitance Cb, so only the one with
the maximum q is kept
40
Generating Candidates
From Dr. Charles Alpert
41
Pruning Candidates
42
Candidate Example Continued
43
Candidate Example Continued
After pruning
44
Merging Branches
45
Pruning Merged Branches
46
Van Ginneken Example
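[Figure: candidates written as (load capacitance, required arrival time), propagated from the sink toward the source]
Sink: (20, 400)
Wire (C = 10, d = 150): (20, 400) → (30, 250)
Buffer (C = 5, d = 30): (30, 250) → (5, 220)
Wire (C = 15): (30, 250) → (45, 50) with d = 200; (5, 220) → (20, 100) with d = 120
Buffer (C = 5): (45, 50) → (5, 0) with d = 50; (20, 100) → (5, 70) with d = 30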
47
Van Ginneken Example (Cont'd)
Candidates: (45, 50), (5, 0), (20, 100), (5, 70)
(5, 0) is inferior to (5, 70). (45, 50) is inferior to (20, 100).
Remaining candidates: (20, 100), (5, 70)
Wire (C = 10): (20, 100) → (30, 10); (5, 70) → (15, -10)
Pick the solution with the largest slack and follow the arrows
to get the solution
48
Basic Data Structure
Sorted list of candidates: (c1, q1), (c2, q2), (c3, q3)
Moving down the list: worse load cap, better timing

Sorted list such that
c1 < c2 < c3
If there are no inferior candidates, q1 < q2 < q3

49
Prune Solution List
[Figure: flowchart for pruning the list (c1, q1), (c2, q2), (c3, q3), (c4, q4), sorted by increasing c. Each test has the form qi < qj? with i < j. On N, candidate j is pruned and candidate i is compared with the next one; on Y, candidate j is kept and becomes the new reference.]
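A minimal sketch of list pruning. It sorts the candidates first and then makes the single linear pass; a candidate is treated as inferior when another one has c no larger and q no smaller, which is an assumed non-strict reading of the rule so that ties such as (5, 0) versus (5, 70) are resolved. The input is the candidate set from the Van Ginneken example.

```python
def prune(candidates):
    """Remove inferior (c, q) candidates.

    Returns the survivors sorted by increasing c; their q values are then
    strictly increasing as well.
    """
    # Sort by c ascending; for equal c, put the larger q first.
    ordered = sorted(candidates, key=lambda cq: (cq[0], -cq[1]))
    kept = []
    for c, q in ordered:
        # Every kept candidate has c no larger than this one, so this
        # candidate survives only if it improves on the best q so far.
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept


print(prune([(45, 50), (5, 0), (20, 100), (5, 70)]))
```

If the list is already kept in sorted order, as on the "Basic Data Structure" slide, the sort can be skipped and only the linear pass remains.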
50
Pruning In Merging
Left candidates: (cl1, ql1) (cl2, ql2) (cl3, ql3)
Right candidates: (cr1, qr1) (cr2, qr2)
ql1 < ql2 < qr1 < ql3 < qr2
Merged candidates: (cl1 + cr1, ql1) (cl2 + cr1, ql2) (cl3 + cr1, qr1) (cl3 + cr2, ql3)
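The merged list can be produced with one simultaneous walk over both lists. The sketch below assumes each input list is already pruned and sorted by increasing c (and therefore increasing q); the numeric values are hypothetical and chosen to satisfy ql1 < ql2 < qr1 < ql3 < qr2.

```python
def merge_lists(left, right):
    """Merge two pruned candidate lists in time linear in their total length."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        cl, ql = left[i]
        cr, qr = right[j]
        merged.append((cl + cr, min(ql, qr)))
        # Advance the side that limits q: only a candidate with larger q on
        # that side can improve the merged q.
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return merged


left = [(1, 10), (2, 20), (4, 40)]   # (cl1, ql1), (cl2, ql2), (cl3, ql3)
right = [(3, 30), (5, 50)]           # (cr1, qr1), (cr2, qr2)
print(merge_lists(left, right))
```

The output has at most len(left) + len(right) entries, which is the additive growth that the complexity argument relies on.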
51
Van Ginneken Complexity

Generate candidates from sinks to source
Quadratic runtime
Adding a wire does not change the number of candidates
Adding a buffer adds only one new candidate
Merging branches is additive, not multiplicative
Linear-time solution list pruning
Optimal for Elmore delay model
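To tie the steps together, here is a sketch of the sinks-to-source pass for the simplest case: a single path with no branches and one buffer type. It returns only the best slack and does not record buffer positions; the function names and the convention that a buffer may be placed at every segment boundary except the driver are assumptions of this sketch.

```python
def prune(candidates):
    """Keep only non-inferior (c, q) candidates, sorted by increasing c."""
    kept = []
    for c, q in sorted(candidates, key=lambda cq: (cq[0], -cq[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept


def van_ginneken_path(sink, segments, r, c, Rb, Cb, tb, Rd):
    """Best driver slack for a two-pin net.

    sink is (capacitance, RAT); segments lists wire lengths from the sink
    toward the driver.
    """
    cands = [sink]
    for k, x in enumerate(segments):
        # Add wire: every candidate is updated, none is created.
        cands = [(cap + c * x, q - r * c * x ** 2 / 2 - r * x * cap)
                 for cap, q in cands]
        # Insert buffer: all buffered candidates share load Cb, keep the best.
        if k < len(segments) - 1:
            best_q = max(q - Rb * cap - tb for cap, q in cands)
            cands.append((Cb, best_q))
        cands = prune(cands)
    # Add driver and pick the maximum slack.
    return max(q - Rd * cap for cap, q in cands)


# Parameters of the solution-propagation example: two segments of length 2.
print(van_ginneken_path((1, 20), [2, 2], r=1, c=1, Rb=1, Cb=1, tb=1, Rd=1))
# Same wire with no candidate buffer location.
print(van_ginneken_path((1, 20), [4], r=1, c=1, Rb=1, Cb=1, tb=1, Rd=1))
```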

52
Multiple Buffer Types
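http://vlsitechnology.org/html/cells/vsclib013

r = 1, c = 1
Rb = 1, Cb = 1, tb = 1
Rb2 = 0.5, Cb2 = 2, tb2 = 0.5
Rd = 1

[Figure: sink v1 reached through two wire segments of length 2 each]
(v1, 1, 20) → add wire → (v2, 3, 16)
Insert buffer type 2 at v2: (v2, 2, 14)
Insert buffer type 1 at v2: (v2, 1, 12)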
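With a buffer library, each buffer type contributes at most one new candidate per location. A small sketch using this slide's parameters; the library is represented as a list of (Rb, Cb, tb) tuples, which is an assumed encoding.

```python
def insert_buffers(candidates, library):
    """Return the candidates plus the best buffered candidate per buffer type.

    candidates: list of (c, q); library: list of (Rb, Cb, tb).
    For one buffer type all buffered candidates share load Cb, so only the
    one with maximum q needs to be kept.
    """
    new = []
    for Rb, Cb, tb in library:
        best_q = max(q - Rb * cap - tb for cap, q in candidates)
        new.append((Cb, best_q))
    return candidates + new


library = [(1, 1, 1), (0.5, 2, 0.5)]   # buffer types 1 and 2 from this slide
at_v2 = insert_buffers([(3, 16)], library)
print(sorted(at_v2))
```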
53
Handle Polarity
[Figure: nodes labeled with negative and positive polarity]
54
Consider Cost/Power

A solution is also characterized by cost w
A solution is inferior if it is worse on all of c,
q, and w
At the source, a set of solutions with a tradeoff between q
and w
w can be
total capacitance
or the number of buffers

55
Cost-Slack Trade-off
56
Data Organization
Candidates are grouped by the number of buffers inserted; each list is sorted in ascending order of (c, q)
0: (c1, q1) (c2, q2) (c3, q3)
1: (c4, q4) (c5, q5) (c6, q6)
2: (c7, q7) (c8, q8) (c9, q9) (c10, q10)
3:
4: (c11, q11)
57
Pruning Considering Cost
(ci, qi, wi) is inferior to (ck, qk, wk) if
ci > ck, qi < qk, wi > wk
Prune order
Pruning within a list is same as before
[Figure: candidate lists indexed by cost w. 0: (c1, q1) (c2, q2) (c3, q3); 1: (c4, q4) (c5, q5) (c6, q6); 2: (c7, q7) (c8, q8) (c9, q9)]
How to prune a solution with wk from a set of
solutions with w ≤ wk?
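One straightforward, unoptimized way to answer this is to compare every candidate against all others. The sketch below is quadratic and uses a non-strict dominance test (no worse in c, q and w, and different in at least one), which is an assumption of this sketch; the candidate values are hypothetical, with w taken as the number of buffers.

```python
def prune_with_cost(candidates):
    """Drop every (c, q, w) that some other candidate dominates."""
    unique = list(dict.fromkeys(candidates))  # drop exact duplicates, keep order

    def dominates(a, b):
        # a is at least as good as b in load, timing and cost, and differs.
        return a != b and a[0] <= b[0] and a[1] >= b[1] and a[2] <= b[2]

    return [b for b in unique if not any(dominates(a, b) for a in unique)]


cands = [(3, 16, 0), (1, 12, 1), (2, 14, 1), (4, 10, 1), (1, 12, 2)]
print(prune_with_cost(cands))
```

Here (4, 10, 1) is removed by the cheaper (3, 16, 0), and (1, 12, 2) by (1, 12, 1), which has the same c and q at lower cost.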
58
Blockage Recognition

Delete insertion points that run over blockages

59
References

L.P.P.P. van Ginneken, "Buffer placement in distributed RC-tree networks for minimal Elmore delay," ISCAS 1990, pp. 865-868.
J. Lillis, C.-K. Cheng, and T. T. Lin, "Optimal wire sizing and buffer insertion for low power and generalized delay model," IEEE J. Solid-State Circuits, 31(3), pp. 437-447, 1996.
W. Shi and Z. Li, "An O(n log n) time algorithm for optimal buffer insertion," Proc. DAC 2003, pp. 580-585.