Interconnect Optimizations - PowerPoint PPT Presentation

Provided by: Shiy9
Learn more at: https://www.mtu.edu
1
Interconnect Optimizations
2
A scaling primer
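  • Ideal process scaling
  • Device geometries shrink by $s$ ($\approx 0.7\times$)
  • Device delay shrinks by $s$
  • Wire geometries (width $w$, height $h$, spacing $S$) shrink by $s$
  • Resistance per unit length: $\rho/(ws \cdot hs) = r/s^2$
  • Coupling capacitance per unit length: $(hs)\,\varepsilon/(Ss) = C_c$
  • $C$ per unit length behaves similarly
  • $R$ per unit length doubles; $C$ and $C_c$ per unit length are unchanged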

3
Interconnect role
  • Short interconnect
  • Used to connect nearby cells
  • Minimize wire C, i.e., use short minwidth wires
  • Medium to long-distance (global) interconnect
  • Size wires to trade off area vs. delay
  • Increasing width → capacitance increases,
    resistance decreases. Need to find an acceptable
    tradeoff: the wire sizing problem
  • Fat wires
  • Thicker cross-sections in higher metal layers
  • Useful for reducing delays for global wires
  • Inductance issues, sharing of limited resource

4
Cross-Section of A Chip
5
Block scaling
  • Block area often stays the same
  • # cells and # nets double
  • Wiring histogram shape is invariant
  • Global interconnect lengths don't shrink
  • Local interconnect lengths shrink by $s$

6
Interconnect delay scaling
  • Delay of a wire of length $l$
  • $t_{int} = (rl)(cl) = rcl^2$ (first order)
  • Local interconnects
  • $t_{int} = (r/s^2)(c)(ls)^2 = rcl^2$
  • Local interconnect delay unchanged (compare to
    faster devices)
  • Global interconnects
  • $t_{int} = (r/s^2)(c)(l)^2 = (rcl^2)/s^2$
  • Global interconnect delay doubles:
    unsustainable!
  • Interconnect delay is increasingly dominant
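The scaling argument on this slide can be checked numerically. The sketch below is only an illustration: the function names and the normalized values r = c = l = 1 are assumptions, not part of the original deck.

```python
def wire_delay(r, c, length):
    """First-order wire delay t_int = r * c * length**2."""
    return r * c * length ** 2


def scaled_wire_delay(r, c, length, s, is_local):
    """Delay after one process generation with scaling factor s.

    Resistance per unit length grows to r / s**2, capacitance per unit
    length is unchanged, and only local wires shrink in length.
    """
    scaled_length = length * s if is_local else length
    return wire_delay(r / s ** 2, c, scaled_length)


s = 0.7
base = wire_delay(1.0, 1.0, 1.0)
local_ratio = scaled_wire_delay(1.0, 1.0, 1.0, s, is_local=True) / base
global_ratio = scaled_wire_delay(1.0, 1.0, 1.0, s, is_local=False) / base
print(f"local delay ratio:  {local_ratio:.2f}")
print(f"global delay ratio: {global_ratio:.2f}")
```

The local ratio stays at about 1 while the global ratio is 1/s^2, roughly 2 for s = 0.7.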

7
Buffer Insertion For Delay Reduction
8
Analysis of Simple RC Circuit
[Figure: RC circuit with input waveform vT(t), resistor R, current i(t), and capacitor C; the capacitor voltage v(t) is the state variable.]
9
Analysis of Simple RC Circuit
Step-input response
match initial state
output response for step-input
10
Delays of Simple RC Circuit
  • $v(t) = v_0(1 - e^{-t/RC})$: waveform
    under step input $v_0 u(t)$
  • $v(t) = 0.5 v_0 \Rightarrow t = 0.7RC$
  • i.e., delay = $0.7RC$ (50% delay)
  • $v(t) = 0.1 v_0 \Rightarrow t = 0.1RC$
  • $v(t) = 0.9 v_0 \Rightarrow t = 2.3RC$
  • i.e., rise time = $2.2RC$ (if defined as time from
    10% to 90% of Vdd)
  • Commonly used metric: $T_D = RC$ (= Elmore
    delay)
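The threshold-crossing times above follow from inverting the step response, t = -RC ln(1 - fraction). A minimal sketch, with a hypothetical helper name and RC normalized to 1:

```python
import math


def time_to_reach(fraction, rc=1.0):
    """Time at which v(t) = v0 * (1 - exp(-t / RC)) reaches fraction * v0."""
    return -rc * math.log(1.0 - fraction)


t50 = time_to_reach(0.5)   # 50% delay, in units of RC
t10 = time_to_reach(0.1)
t90 = time_to_reach(0.9)
rise_time = t90 - t10      # 10% to 90% rise time, in units of RC
print(f"t50 = {t50:.3f} RC, t10 = {t10:.3f} RC, t90 = {t90:.3f} RC")
print(f"rise time = {rise_time:.3f} RC")
```

The Elmore metric T_D = RC is the time at which the output reaches 1 - 1/e of its final value, so it slightly overestimates the 50% delay for this single-pole circuit.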

11
Elmore Delay
Delay
12
Elmore Delay
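  • Driver is modeled as R
  • Driver intrinsic gate delay t(B)
  • Delay = $\sum_{\text{all } R_i} \sum_{\text{all } C_j \text{ downstream from } R_i} R_i C_j$
  • Elmore delay at n2 = $R(B)(C_1 + C_2) + R(w)C_2$
  • Elmore delay at n1 = $R(B)(C_1 + C_2)$

[Figure: buffer B with output resistance R(B) drives node n1 (capacitance C1); wire resistance R(w) connects n1 to node n2 (capacitance C2).]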
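The double sum above can be evaluated in two passes over an RC tree: one pass accumulates downstream capacitance, the other accumulates delay along each path. The sketch below is an illustration only; the array-based tree encoding and the numeric values are assumptions, not taken from the deck.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay at every node of an RC tree.

    parent[i]: parent of node i (node 0 is the source, parent[0] is None)
    res[i]:    resistance of the edge parent[i] -> i (res[0] is unused)
    cap[i]:    capacitance lumped at node i
    Nodes must be numbered so that parent[i] < i.
    """
    n = len(parent)
    # Bottom-up pass: total capacitance at or below each node.
    downstream = list(cap)
    for i in range(n - 1, 0, -1):
        downstream[parent[i]] += downstream[i]
    # Top-down pass: each resistor contributes R_i times its downstream cap.
    delay = [0.0] * n
    for i in range(1, n):
        delay[i] = delay[parent[i]] + res[i] * downstream[i]
    return delay


# The two-node circuit from the slide with made-up values:
# source -> R(B) -> n1 (C1) -> R(w) -> n2 (C2)
R_B, R_w, C1, C2 = 2.0, 3.0, 4.0, 5.0
delays = elmore_delays(parent=[None, 0, 1], res=[0.0, R_B, R_w], cap=[0.0, C1, C2])
print("delay at n1:", delays[1])
print("delay at n2:", delays[2])
```

The driver's intrinsic delay t(B) is a constant offset and is left out here.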
13
Elmore Delay
  • For a uniform wire
  • No matter how the wire is lumped, the Elmore delay is the
    same

[Figure: uniform wire of length x with unit wire capacitance c and unit wire resistance r, driving load C.]
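One way to check the lumping claim is to cut the wire into equal segments and compare the result with the closed form rx(cx/2 + C), the wire term that appears later in the buffered-delay expressions. The sketch assumes each segment is a pi-model (half of its capacitance at each end); that model choice and the helper name are assumptions made for illustration.

```python
def lumped_wire_delay(r, c, x, load, segments):
    """Elmore delay of a uniform wire cut into equal pi-model segments."""
    seg_r = r * x / segments
    seg_c = c * x / segments
    delay = 0.0
    for k in range(segments):
        # Downstream of segment k's resistor: the far-end half of its own
        # capacitance, every later segment in full, and the load.
        downstream = seg_c / 2 + (segments - 1 - k) * seg_c + load
        delay += seg_r * downstream
    return delay


r, c, x, load = 0.5, 2.0, 10.0, 3.0
closed_form = r * x * (c * x / 2 + load)
for n in (1, 2, 5, 50):
    print(n, lumped_wire_delay(r, c, x, load, n), closed_form)
```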
14
Delay for Buffer
[Figure: buffer between nodes u and v driving load C; the buffer is characterized by its driver resistance, input capacitance C(b), and intrinsic buffer delay.]
15
Buffers Reduce Wire Delay
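[Figure: a wire of length x between a driver of resistance R and a load C, shown unbuffered and with one buffer at x/2. In the buffered case each half-wire has resistance rx/2 and capacitance cx/4 at each end. A plot shows Δt versus x.]

$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$
$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$
$t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$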
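Setting the difference RC + t_b - rcx^2/4 to zero gives the wire length beyond which a single mid-point buffer pays off. The sketch below evaluates both expressions from this slide; as on the slide, the buffer is assumed to have the same resistance R and input capacitance C as the driver and load, and the numeric values are made up.

```python
import math


def t_unbuf(R, C, r, c, x):
    """Unbuffered delay from the slide: R(cx + C) + rx(cx/2 + C)."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    """Delay with one buffer at x/2: 2R(cx/2 + C) + rx(cx/4 + C) + tb."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R, C, r, c, tb):
    """Length at which t_buf equals t_unbuf: x = 2 * sqrt((RC + tb) / (rc))."""
    return 2 * math.sqrt((R * C + tb) / (r * c))


R, C, r, c, tb = 1.0, 1.0, 1.0, 1.0, 1.0
x_star = break_even_length(R, C, r, c, tb)
print("break-even length:", x_star)
for x in (1.0, x_star, 5.0):
    print(x, t_unbuf(R, C, r, c, x), t_buf(R, C, r, c, x, tb))
```

Shorter wires are faster without the buffer; longer ones are faster with it.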
16
Combinational Logic Delay
[Figure: combinational logic between registers and primary inputs/outputs; the registers are driven by the clock.]
  • Combinational logic delay < clock period

17
Example of Static Timing Analysis
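[Figure: timing graph annotated with delays (2, 3, 11, 3, 7, 2, 4, 3) and node labels in the form arrival time / required arrival time / slack: 7/4/-3, 9/6/-3, 5/3/-2, 20/17/-3, 23/20/-3, 4/7/3, 18/18/0, 8/8/0, 11/11/0.]
  • Arrival time: input → output, take max
  • Required arrival time: output → input, take min
  • Slack = required arrival time - arrival time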
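The three rules above amount to one forward pass and one backward pass over the timing graph. The sketch below applies them to a small made-up graph; the node names, delays, and the required time at the output are hypothetical and do not reproduce the figure.

```python
def static_timing(order, edges, input_at, output_rat):
    """Arrival time, required arrival time, and slack for a timing DAG.

    order:      nodes in topological order
    edges:      list of (u, v, delay)
    input_at:   arrival times at primary inputs
    output_rat: required arrival times at primary outputs
    """
    arrival = dict(input_at)
    for node in order:                      # input -> output, take max
        for u, v, d in edges:
            if v == node:
                arrival[v] = max(arrival.get(v, float("-inf")), arrival[u] + d)
    required = dict(output_rat)
    for node in reversed(order):            # output -> input, take min
        for u, v, d in edges:
            if u == node:
                required[u] = min(required.get(u, float("inf")), required[v] - d)
    slack = {n: required[n] - arrival[n] for n in order}
    return arrival, required, slack


order = ["a", "b", "g1", "g2", "out"]
edges = [("a", "g1", 2), ("b", "g1", 3), ("g1", "g2", 4), ("b", "g2", 1), ("g2", "out", 2)]
arrival, required, slack = static_timing(order, edges, {"a": 0, "b": 0}, {"out": 8})
for n in order:
    print(n, arrival[n], required[n], slack[n])
```

The nested loops keep the sketch short; a production implementation would index edges by node so that each edge is visited once per pass.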

18
Buffers Improve Slack
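RAT = Required Arrival Time; Slack = RAT - Delay
Without buffer:
  • Sink 1: RAT = 300, Delay = 350, Slack = -50
  • Sink 2: RAT = 700, Delay = 600, Slack = 100
  • slackmin = -50
With buffer (decouples capacitive load from the critical path):
  • Sink 1: RAT = 300, Delay = 250, Slack = 50
  • Sink 2: RAT = 700, Delay = 400, Slack = 300
  • slackmin = 50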
19
ITRS projections
20
Buffered global interconnects: Intuition
  • Interconnect delay = $r c l^2$
  • Now, interconnect delay = $\sum_i r c l_i^2 < r c l^2$
    (where $l = \sum_j l_j$)
  • since $\sum_j l_j^2 < (\sum_j l_j)^2$
  • (Of course, account for buffer delay also)

21
Optimal inter-buffer length
  • First order (lumped parasitic, Elmore delay)
    analysis
  • Assume N identical buffers with equal
    inter-buffer length $l$ ($L = Nl$)
  • For minimum delay,

22
Optimal interconnect delay
  • Substituting lopt back into the interconnect
    delay expression

Delay grows linearly with L (instead of
quadratically)
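The expressions for these two slides did not survive extraction, so the sketch below works from an assumed first-order model rather than the deck's own formula: each stage is a buffer (Rb, Cb, tb, the symbols used later in the deck) driving a wire of length l into the next buffer, with stage delay Rb(cl + Cb) + rl(cl/2 + Cb) + tb, and N = L/l is treated as continuous. Under those assumptions, setting dT/dl = 0 gives l_opt = sqrt(2(Rb Cb + tb)/(rc)) and T_opt = L(Rb c + r Cb + sqrt(2rc(Rb Cb + tb))).

```python
import math


def buffered_delay(L, l, r, c, Rb, Cb, tb):
    """Total Elmore delay of a line of length L cut into stages of length l."""
    stage = Rb * (c * l + Cb) + r * l * (c * l / 2 + Cb) + tb
    return (L / l) * stage


def optimal_spacing(r, c, Rb, Cb, tb):
    """Inter-buffer length that minimizes buffered_delay (from dT/dl = 0)."""
    return math.sqrt(2 * (Rb * Cb + tb) / (r * c))


def optimal_delay(L, r, c, Rb, Cb, tb):
    """Delay at the optimal spacing; note that it is linear in L."""
    return L * (Rb * c + r * Cb + math.sqrt(2 * r * c * (Rb * Cb + tb)))


params = dict(r=1.0, c=1.0, Rb=1.0, Cb=1.0, tb=1.0)
l_opt = optimal_spacing(**params)
print("l_opt:", l_opt)
print("T(L=100):", optimal_delay(100.0, **params))
print("T(L=200):", optimal_delay(200.0, **params))
```

Doubling L doubles the optimized delay, whereas the unbuffered r c L^2 term would quadruple.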
23
Optimized interconnect delay scaling
  • Rewriting the optimal interconnect delay
    expression,
  • With optimally sized buffers (using $dT/dh = 0$),



24
Optimized interconnect delay scaling
  • After scaling,

  • (instead of )
  • Even with optimal (re-)buffering, interconnects
    scale worse than devices
  • For global interconnects, L doesn't shrink. So




25
Buffered nets
26
Total buffer count
  • Ever-increasing fractions of total cell count
    will be buffers
  • 70% in 32 nm

27
Buffer Insertion
  • Timing optimization
  • Slew optimization

28
Timing-Driven Buffering: Problem Formulation
  • Given
  • A Steiner tree
  • RAT at each sink
  • A buffer type
  • RC parameters
  • Candidate buffer locations
  • Find buffer insertion solution such that the
    slack at the driver is maximized

29
Candidate Buffering Solutions
30
Candidate Solution Characteristics
  • Each candidate solution is associated with
  • vi: a node
  • ci: downstream capacitance
  • qi: RAT

If vi is a sink, ci is the sink capacitance
v is an internal node
31
Van Ginneken's Algorithm
Candidate solutions are propagated toward the
source (dynamic programming)
32
Solution Propagation: Add Wire
[Figure: a wire of length x takes candidate (v1, c1, q1) to candidate (v2, c2, q2).]
  • $c_2 = c_1 + cx$
  • $q_2 = q_1 - rcx^2/2 - rxc_1$
  • r: wire resistance per unit length
  • c: wire capacitance per unit length

33
Solution Propagation: Insert Buffer
(v1, c1, q1) → (v1, c1b, q1b)
  • $c_{1b} = C_b$
  • $q_{1b} = q_1 - R_b c_1 - t_b$
  • Cb: buffer input capacitance
  • Rb: buffer output resistance
  • tb: buffer intrinsic delay

34
Solution Propagation: Merge
(v, cl, ql) and (v, cr, qr)
  • $c_{merge} = c_l + c_r$
  • $q_{merge} = \min(q_l, q_r)$

35
Solution Propagation: Add Driver
(v0, c0, q0) → (v0, c0d, q0d)
  • $q_{0d} = q_0 - R_d c_0 = slack_{min}$
  • Rd: driver resistance
  • Pick the solution with max slackmin

36
Example of Solution Propagation
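  • r = 1, c = 1
  • Rb = 1, Cb = 1, tb = 1
  • Rd = 1

[Figure: two wire segments of length 2 each, with a candidate buffer location at v2 between them.]
  • Sink candidate: (v1, 1, 20)
  • Add wire (length 2): (v2, 3, 16)
  • Insert buffer at v2: (v2, 1, 12)
  • Add wire (length 2) to (v2, 3, 16): (v3, 5, 8)
  • Add wire (length 2) to (v2, 1, 12): (v3, 3, 8)
  • Add driver to (v3, 5, 8): slack = 3
  • Add driver to (v3, 3, 8): slack = 5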
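The numbers in this example follow from the add-wire, insert-buffer, and add-driver rules on the preceding slides. A minimal sketch that replays them, with candidates stored as (c, q) tuples; the function names are assumptions made for illustration.

```python
def add_wire(cand, x, r, c):
    """Propagate (cap, q) across a wire of length x."""
    cap, q = cand
    return (cap + c * x, q - r * c * x * x / 2 - r * x * cap)


def insert_buffer(cand, Rb, Cb, tb):
    """Place a buffer in front of the candidate."""
    cap, q = cand
    return (Cb, q - Rb * cap - tb)


def add_driver(cand, Rd):
    """Slack at the driver for this candidate."""
    cap, q = cand
    return q - Rd * cap


r, c = 1, 1
Rb, Cb, tb = 1, 1, 1
Rd = 1

sink = (1, 20)
at_v2 = add_wire(sink, 2, r, c)                # unbuffered candidate at v2
at_v2_buf = insert_buffer(at_v2, Rb, Cb, tb)   # buffered candidate at v2
at_v3 = add_wire(at_v2, 2, r, c)
at_v3_buf = add_wire(at_v2_buf, 2, r, c)
slack_unbuffered = add_driver(at_v3, Rd)
slack_buffered = add_driver(at_v3_buf, Rd)
print(at_v2, at_v2_buf, at_v3, at_v3_buf)
print("slack without buffer:", slack_unbuffered)
print("slack with buffer:   ", slack_buffered)
```

The buffered candidate wins because it presents a smaller load to the driver for the same q.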
37
Example of Merging
Left candidates
Right candidates
Merged candidates
38
Solution Pruning
  • Two candidate solutions
  • (v, c1, q1)
  • (v, c2, q2)
  • Solution 1 is inferior if
  • c1 > c2: larger load
  • and q1 < q2: tighter timing

39
Pruning When Insert Buffer
Candidates created by inserting a buffer all have the same load cap Cb,
so only the one with max q is kept
40
Generating Candidates
From Dr. Charles Alpert
41
Pruning Candidates
42
Candidate Example Continued
43
Candidate Example Continued
After pruning
44
Merging Branches
45
Pruning Merged Branches
46
Van Ginneken Example
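Candidates are written as (c, q); each step has load C and delay d.
  • Sink: (20, 400)
  • Wire (C = 10, d = 150): (20, 400) → (30, 250)
  • Buffer (C = 5, d = 30): (30, 250) → (5, 220)
  • Wire (C = 15, d = 200): (30, 250) → (45, 50)
  • Wire (C = 15, d = 120): (5, 220) → (20, 100)
  • Buffer (C = 5, d = 50): (45, 50) → (5, 0)
  • Buffer (C = 5, d = 30): (20, 100) → (5, 70)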
47
Van Ginneken Example (Cont'd)
  • Candidates before pruning: (45, 50), (5, 0), (20, 100), (5, 70)
  • (5, 0) is inferior to (5, 70); (45, 50) is inferior
    to (20, 100)
  • Candidates after pruning: (20, 100), (5, 70)
  • Wire (C = 10): (20, 100) → (30, 10), (5, 70) → (15, -10)
  • Pick the solution with the largest slack, then follow the arrows
    back to recover the solution
48
Basic Data Structure
(c1, q1), (c2, q2), (c3, q3): load cap gets worse and timing gets better from left to right
  • Sorted list such that
  • c1 < c2 < c3
  • If there are no inferior candidates, q1 < q2 < q3

49
Prune Solution List
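[Flowchart: the list (c1, q1), (c2, q2), (c3, q3), (c4, q4) is scanned in order of increasing c. Each step tests qi < qj for a kept candidate i and the next candidate j. On N, candidate j is pruned and i is compared with the following candidate; on Y, j is kept and the scan continues from j.]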
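The flowchart corresponds to a single scan over the list sorted by load. The sketch below is one possible realization, not the deck's own code; it also drops candidates that tie on q with a larger c, which is slightly stronger than the strict definition of "inferior" given earlier.

```python
def prune_sorted(cands):
    """Remove inferior (c, q) candidates with one scan after sorting.

    Candidates are sorted by increasing c (ties broken by decreasing q), so a
    candidate survives only if its q beats the last survivor's q.
    """
    kept = []
    for c, q in sorted(cands, key=lambda t: (t[0], -t[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept


# Candidates from the Van Ginneken example before pruning.
candidates = [(45, 50), (5, 0), (20, 100), (5, 70)]
pruned = prune_sorted(candidates)
print(pruned)
```

If the list is already kept in sorted order, as on the data-structure slide, the sort can be skipped and the scan alone is linear.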
50
Pruning In Merging
Left candidates: (cl1, ql1), (cl2, ql2), (cl3, ql3)
Right candidates: (cr1, qr1), (cr2, qr2)
Assume ql1 < ql2 < qr1 < ql3 < qr2
Merged candidates: (cl1 + cr1, ql1), (cl2 + cr1, ql2), (cl3 + cr1, qr1), (cl3 + cr2, ql3)
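The merged list above can be produced with a two-pointer walk over the two pruned, sorted lists, always advancing the side whose q currently limits the merged q. A minimal sketch with made-up numbers that respect the ordering ql1 < ql2 < qr1 < ql3 < qr2; the function name is an assumption.

```python
def merge_branches(left, right):
    """Merge two pruned candidate lists, each sorted by increasing c and q.

    Produces at most len(left) + len(right) - 1 candidates.
    """
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        (cl, ql), (cr, qr) = left[i], right[j]
        merged.append((cl + cr, min(ql, qr)))
        # The branch with the smaller q is the bottleneck; only moving it to a
        # higher-q candidate can improve the merged q.
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return merged


left = [(1, 10), (2, 20), (3, 40)]
right = [(1, 30), (2, 50)]
merged = merge_branches(left, right)
print(merged)
```

The result has four entries rather than the six of a full cross product, which is the "additive, not multiplicative" behavior.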
51
Van Ginneken Complexity
  • Generate candidates from sinks to source
  • Quadratic runtime
  • Adding a wire does not change candidates
  • Adding a buffer adds only one new candidate
  • Merging branches: additive, not multiplicative
  • Linear-time solution list pruning
  • Optimal for Elmore delay model

52
Multiple Buffer Types
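http://vlsitechnology.org/html/cells/vsclib013
  • r = 1, c = 1
  • Rb = 1, Cb = 1, tb = 1
  • Rb2 = 0.5, Cb2 = 2, tb2 = 0.5
  • Rd = 1

[Figure: wire segments of length 2 with a candidate buffer location at v2.]
  • Sink candidate: (v1, 1, 20)
  • Add wire (length 2): (v2, 3, 16)
  • Insert buffer type 2 at v2: (v2, 2, 14)
  • Insert buffer type 1 at v2: (v2, 1, 12)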
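With a buffer library, each buffer type contributes at most one new candidate per location: the best q it can reach over the existing candidates. The sketch below reproduces the two buffered candidates at v2; the tuple layout (Rb, Cb, tb) and the function name are assumptions made for illustration.

```python
def insert_buffers(cands, library):
    """Extend a (c, q) candidate list with one candidate per buffer type.

    library is a list of (Rb, Cb, tb) tuples. All candidates produced by one
    buffer type share the load Cb, so only the one with the largest q is kept.
    """
    result = list(cands)
    for Rb, Cb, tb in library:
        best_q = max(q - Rb * cap - tb for cap, q in cands)
        result.append((Cb, best_q))
    return result


library = [(1, 1, 1), (0.5, 2, 0.5)]   # buffer type 1, buffer type 2
at_v2 = insert_buffers([(3, 16)], library)
print(at_v2)
```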
53
Handle Polarity
[Figure: candidate solutions marked with negative or positive polarity.]
54
Consider Cost/Power
  • A solution is also characterized by cost w
  • A solution is inferior if it is poor on all of c,
    q and w
  • At source, a set of solutions with tradeoff of q
    and w
  • w can be
  • total capacitance
  • or the number of buffers

55
Cost-Slack Trade-off
56
Data Organization
Candidates are grouped by # buffers inserted; each group is sorted in ascending order of (c, q)
  • 0: (c1, q1), (c2, q2), (c3, q3)
  • 1: (c4, q4), (c5, q5), (c6, q6)
  • 2: (c7, q7), (c8, q8), (c9, q9), (c10, q10)
  • 3:
  • 4: (c11, q11)
57
Pruning Considering Cost
(ci, qi, wi) is inferior to (ck, qk, wk) if
ci > ck, qi < qk, wi > wk
Pruning within a list is the same as before
[Figure: candidate lists indexed by cost w, with the prune order marked.]
  • w = 0: (c1, q1), (c2, q2), (c3, q3)
  • w = 1: (c4, q4), (c5, q5), (c6, q6)
  • w = 2: (c7, q7), (c8, q8), (c9, q9)
How to prune a solution with wk from a set of
solutions with w ≤ wk?
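As a baseline for the question above, three-way dominance can be checked by brute force. The sketch below is quadratic and is not the efficient scheme the slide is asking about; it also treats ties as dominated (non-strict comparisons), and the sample candidates are made up.

```python
def is_inferior(a, b):
    """True if a = (c, q, w) is no better than b on all of c, q, and w."""
    return a != b and a[0] >= b[0] and a[1] <= b[1] and a[2] >= b[2]


def prune_with_cost(cands):
    """Keep only candidates that no other candidate dominates (O(n^2))."""
    unique = list(dict.fromkeys(cands))   # drop exact duplicates, keep order
    return [a for a in unique if not any(is_inferior(a, b) for b in unique)]


candidates = [(3, 8, 1), (5, 8, 0), (4, 7, 1), (6, 9, 2)]
survivors = prune_with_cost(candidates)
print(survivors)
```

Here (4, 7, 1) is removed because (3, 8, 1) has a smaller load, a better q, and the same cost; the other three each win on at least one of c, q, or w.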
58
Blockage Recognition
  • Delete insertion points that run over blockages

59
References
  • L.P.P.P. van Ginneken, "Buffer placement in
    distributed RC-tree networks for minimal Elmore
    delay," ISCAS 1990, pp. 865-868.
  • J. Lillis, C.-K. Cheng, and T. T. Lin, "Optimal
    wire sizing and buffer insertion for low power
    and generalized delay model," IEEE J. Solid-State
    Circuits, 31(3), pp. 437-447, 1996.
  • W. Shi and Z. Li, "An O(n log n) time algorithm for
    optimal buffer insertion," Proc. DAC 2003, pp.
    580-585.