Title: Transistor and Gate Sizing
- Prof. David Pan
- dpan_at_ece.utexas.edu
- Office ACES 5.434
Thanks to Jason Cong, Chris Chu, David Kung
Transistor/Gate Sizing Optimization
- Given: a logic network, with or without a cell library
- Find: the optimal size for each transistor/gate to minimize delay, or to minimize area or power under a delay constraint
- Transistor sizing versus gate sizing (both are forms of device sizing)
Device Sizing
- Device sizing is one of the key techniques for circuit optimization
- For standard-cell designs: gate sizing
  - e.g., inverters INV-A, INV-B, INV-C, ..., INV-N, ..., each with a different driving capability
- For microprocessor or custom designs, more fine-grained control: transistor sizing
  - Size each transistor individually
- Main techniques
  - Analytical formulas, e.g., driver sizing (Lin-Linholm, JSSC75)
  - Greedy algorithms (Cong et al., 1996; Chen et al., ICCAD 1998)
  - Mathematical programming
    - Static sizing: based on timing analysis; considers all paths at once (Fishburn-Dunlop, ICCAD85; Sapatnekar et al., TCAD93; Berkelaar-Jess, EDAC90; Chen-Onodera-Tamaru, ICCAD95)
    - Dynamic sizing: based on timing simulation; considers the paths activated by given patterns (Conn et al., ICCAD96)
(I) Driver Sizing
- Given
  - A chain of cascaded drivers driving a load
  - Ignore the interconnect between drivers (i.e., assume the drivers and the load CL are close enough together)
- Obtain
  - The driver sizes that minimize delay, or that minimize total area while meeting a target delay
- Lin-Linholm, JSSC75: a classic result without wiring consideration
- Delay and area/power tradeoff
An Early Work on Driver Sizing
Lin-Linholm, JSSC75
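[Figure: a chain of drivers d1, d2, ..., dk driving the load CL]
- Constant stage ratio, $f = (C_L/C_0)^{1/k}$ for k stages whose first stage has input capacitance $C_0$
- If the number of drivers is not fixed, the optimal stage ratio is $f = e$, i.e., $k = \ln(C_L/C_0)$
- Interconnect is modeled as a lumped capacitor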
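The stage-ratio results can be derived in a few lines under the simplest model behind them, which is an assumption of this sketch: stage $j$ has delay $t_0 f_j$, where $f_j$ is the ratio of its load to its own input capacitance and self-loading is ignored. With $k$ stages and $F = C_L/C_0$:

```latex
\begin{aligned}
D &= t_0 \sum_{j=1}^{k} f_j, \qquad \prod_{j=1}^{k} f_j = F
  \;\Rightarrow\; \text{(AM-GM) } D \text{ is smallest when } f_j = f = F^{1/k},
  \quad D(k) = k\, t_0\, F^{1/k} \\
\frac{dD}{dk} &= t_0\, F^{1/k}\left(1 - \frac{\ln F}{k}\right) = 0
  \;\Rightarrow\; k^{*} = \ln F, \qquad f^{*} = F^{1/\ln F} = e \approx 2.718
\end{aligned}
```

For a fixed $k$, the first line gives the constant stage ratio; letting $k$ vary gives the ratio $e$.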
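Driver Sizing with Power Minimization
- Example (Rabaey, 1996)
- An on-chip minimum-size inverter in a 1.2 µm CMOS process, with $C_0 = 10$ fF and $t_0 = 0.2$ ns, drives an off-chip load $C_L = 20$ pF under a delay target $t_B = 10$ ns; $C_L/C_0 = 2000$
- Delay-optimal driver sizing: $N = \ln(C_L/C_0) \approx 7.6$ stages
- Power-optimal driver sizing: find the minimum N such that $N\, t_0\, (C_L/C_0)^{1/N} \le t_B$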
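The example can be checked numerically with a minimal sketch. It assumes each stage's delay is $t_0$ times its fanout ratio (no self-loading), so an N-stage chain with a constant ratio has delay $N\, t_0\, (C_L/C_0)^{1/N}$, and it treats $t_B = 10$ ns as the delay budget the power-optimal (fewest-stage) chain must meet. The function names are hypothetical.

```python
def chain_delay(n_stages, t0, load_ratio):
    """Delay of an n-stage driver chain with a constant stage ratio (no self-loading)."""
    return n_stages * t0 * load_ratio ** (1.0 / n_stages)

def delay_optimal_stages(t0, load_ratio, n_max=50):
    """Integer number of stages that minimizes the chain delay."""
    return min(range(1, n_max + 1), key=lambda n: chain_delay(n, t0, load_ratio))

def min_stages_meeting(t0, load_ratio, t_budget, n_max=50):
    """Fewest stages (less switched capacitance, hence less power) that meet t_budget."""
    for n in range(1, n_max + 1):
        if chain_delay(n, t0, load_ratio) <= t_budget:
            return n
    return None

t0, ratio, t_b = 0.2, 2000.0, 10.0      # ns, C_L/C_0, ns (values from the example)
n_delay = delay_optimal_stages(t0, ratio)
n_power = min_stages_meeting(t0, ratio, t_b)
print(n_delay, round(chain_delay(n_delay, t0, ratio), 2))   # 8 4.14
print(n_power, round(chain_delay(n_power, t0, ratio), 2))   # 3 7.56
```

Under this model the delay-optimal chain needs 8 stages, while 3 stages already meet the 10 ns target with far less total driver capacitance.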
(II) Gate Sizing by Greedy Algorithm
- Cong et al., 1996 formulated gate sizing as a weighted-delay problem
  - Each sink has a weight according to its criticality
  - Minimize the total weighted delay to all sinks
- It can be shown that a greedy algorithm (a.k.a. local refinement, LR) converges to an optimal solution that minimizes the weighted delay
  - Size one gate at a time
  - This works because of the convexity of the problem under the Elmore delay model
- Chen et al., ICCAD 1998 extend LR to Lagrangian relaxation, using the Lagrange multipliers as weights to solve constrained problems
Gate Sizing by Local Refinement
[Figure: gate i with size xi; the cost terms that depend on xi are the area of gate i, the delay of gate i, and the delay of its predecessors]
The Minimum
Iterate until Convergence
Dominance Property
Key Idea!
Convergence
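Local refinement can be sketched on a hypothetical chain of n gates, which is a simplification and not the formulation of the cited papers: gate i has size x[i], output resistance r_unit/x[i], and input capacitance c_unit*x[i], and the cost is area_weight times the total size plus the Elmore delay of the chain. The terms that depend on x[i] have the form a*x[i] + b/x[i], so each local step has the closed-form minimum sqrt(b/a), clamped to the size bounds. All names and the model are assumptions of this sketch.

```python
import math

def chain_cost(x, r_unit, c_unit, r_drv, c_load, area_weight):
    """area_weight * sum(x) + Elmore delay of the chain (hypothetical model)."""
    n = len(x)
    delay = r_drv * c_unit * x[0]                    # driver charging gate 0's input
    for i in range(n):
        c_down = c_load if i == n - 1 else c_unit * x[i + 1]
        delay += (r_unit / x[i]) * c_down            # gate i charging its load
    return area_weight * sum(x) + delay

def local_refinement(n, r_unit, c_unit, r_drv, c_load, area_weight,
                     x_min=1.0, x_max=1000.0, tol=1e-9, max_iter=1000):
    """Size one gate at a time to its local optimum until no size changes."""
    x = [x_min] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            r_up = r_drv if i == 0 else r_unit / x[i - 1]      # what drives gate i
            c_down = c_load if i == n - 1 else c_unit * x[i + 1]  # what gate i drives
            a = area_weight + r_up * c_unit          # coefficient of x[i]
            b = r_unit * c_down                      # coefficient of 1/x[i]
            new_xi = min(max(math.sqrt(b / a), x_min), x_max)
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            break
    return x

sizes = local_refinement(4, r_unit=1.0, c_unit=1.0, r_drv=1.0, c_load=243.0, area_weight=0.0)
print([round(v, 3) for v in sizes])   # [3.0, 9.0, 27.0, 81.0]
```

With no area penalty the sizes form a geometric taper (each gate three times the previous one here), which matches the constant-stage-ratio result for driver chains.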
(III) Transistor Sizing with Convex Programming
- Problem statement (delay as a constraint)
- minimize Area(x)
- subject to Delay(x) ≤ Tspec
- or
- minimize Power(x)
- subject to Delay(x) ≤ Tspec
[Figure: combinational logic block]
Mathematical Background
- n-dimensional space
- Any ordered n-tuple x = (x1, x2, ..., xn) can be thought of as a point in an n-dimensional space
- f(x1, x2, ..., xn) is a function on the n-dimensional space
- Convex functions
- f(x) is a convex function if, given any two points xa and xb, the line segment joining (xa, f(xa)) and (xb, f(xb)) lies on or above the function
[Figure: a convex f(x) and a nonconvex f(x), each plotted against x with two points xa and xb]
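Math Background (Contd.)
- Convex functions in two dimensions, e.g., $f(x_1, x_2) = x_1^2 + x_2^2$
- Formally, f(x) is convex if $f(\lambda x_a + (1 - \lambda) x_b) \le \lambda f(x_a) + (1 - \lambda) f(x_b)$ for all $0 \le \lambda \le 1$
- Another way to check convexity: $f''(x) \ge 0$ (in n dimensions, the Hessian of f is positive semidefinite)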
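For the two-dimensional example $f(x_1, x_2) = x_1^2 + x_2^2$, the second-derivative test is immediate: the Hessian is a constant positive-definite matrix, so the function is convex everywhere.

```latex
f(x_1, x_2) = x_1^2 + x_2^2, \qquad
\nabla^2 f = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \succ 0
\;\Rightarrow\; f \text{ is (strictly) convex}
```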
Math Background (Contd.)
- Convex sets
- A set S is a convex set if, given any two points xa and xb in the set, the line segment joining the two points lies entirely within the set
- Examples: the shape of Wyoming, the shape of a pizza
- Nonconvex sets: the shape of CA, the silhouette of the Taj Mahal
Math Background (Contd.)
- Mathematical characterization of a convex set S
- If $x_1, x_2 \in S$, then $\lambda x_1 + (1 - \lambda) x_2 \in S$ for $0 \le \lambda \le 1$
- If f(x) is a convex function, the set $\{x : f(x) \le c\}$ is a convex set
- An intersection of convex sets is a convex set
[Figure: convex sets in the (x1, x2) plane]
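Math Background (Contd.)
- Convex programming problem
- minimize a convex function f(x)
- such that $f_i(x) \le c_i$ for all i, with each $f_i$ convex
- The global minimum value is unique! (Any local minimum is a global minimum.)
- (Nonrigorous) explanation, from "The Handwaver's Guide to the Galaxy"
[Figure: a convex f(x) with two points xa and xb]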
Math Background (Contd.)
- A posynomial is like a polynomial except that
- all coefficients are positive
- exponents can be real numbers (positive or negative)
- Are these posynomials?
- $6.023\, x_1^{1.23} + 4.56\, x_1^{3.4} x_2^{7.89} x_3^{-0.12}$
- $x_1 - 9.78\, x_2^{4.2} x_3^{-9.1}$
- (x1 2 x2 2 x3 5)/x1 (x3 2 x4 3)/x3
Math Background (Contd.)
- In any posynomial function f(x1, x2, ..., xn), substitute $x_i = \exp(z_i)$ to get F(z1, z2, ..., zn)
- Then F(z1, z2, ..., zn) is a convex function in (z1, ..., zn)!
- minimize (posynomial objective in the xi's)
- s.t. (posynomial function in the xi's)$_i \le K$ for $1 \le i \le m$
- with $x_i = \exp(z_i)$, this becomes
- minimize (convex objective)
- over a convex set
- Therefore, any local minimum is a global minimum!
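The effect of the substitution can be checked numerically with a small sketch in plain Python. The term representation and the helper names are hypothetical: a posynomial is a list of (coefficient, exponent-vector) terms, F(z) is evaluated as the sum of c_k exp(a_k . z), and the convexity inequality $f(\lambda x_a + (1-\lambda) x_b) \le \lambda f(x_a) + (1-\lambda) f(x_b)$ is tested at random points. Sampling can only find counterexamples, not prove convexity; the proof is that each term of F is a positive multiple of the exponential of an affine function of z.

```python
import math
import random

# A posynomial is a list of terms (c, exps) with c > 0 and one real exponent per variable.

def eval_x(terms, x):
    """f(x) = sum_k c_k * prod_i x_i ** a_ki, defined for x_i > 0."""
    return sum(c * math.prod(xi ** a for xi, a in zip(x, exps)) for c, exps in terms)

def eval_z(terms, z):
    """F(z) = f(exp(z)) = sum_k c_k * exp(a_k . z)."""
    return sum(c * math.exp(sum(a * zi for a, zi in zip(exps, z))) for c, exps in terms)

def jensen_violations(func, sample_point, trials=2000, seed=0):
    """Count random (pa, pb, lam) where the convexity inequality fails."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        pa, pb, lam = sample_point(rng), sample_point(rng), rng.random()
        mid = [lam * a + (1 - lam) * b for a, b in zip(pa, pb)]
        rhs = lam * func(pa) + (1 - lam) * func(pb)
        if func(mid) > rhs + 1e-9 * max(1.0, abs(rhs)):   # relative slack for rounding
            count += 1
    return count

# f(x) = x**0.5 is a posynomial that is concave in x, but F(z) = exp(0.5 z) is convex in z.
sqrt_x = [(1.0, (0.5,))]
print(jensen_violations(lambda x: eval_x(sqrt_x, x), lambda r: [r.uniform(0.01, 4.0)]) > 0)  # True
print(jensen_violations(lambda z: eval_z(sqrt_x, z), lambda r: [r.uniform(-3.0, 3.0)]))      # 0
```

The same check on the first posynomial of the previous slide, written as `[(6.023, (1.23, 0.0, 0.0)), (4.56, (3.4, 7.89, -0.12))]`, also finds no violations in z-space.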
Properties of Transistor Sizing under the Elmore Model
- x is the set (vector) of transistor sizes
- minimize Area(x) subject to Delay(x) ≤ Tspec
- $\mathrm{Area}(x) = \sum_{i=1}^{n} x_i$ (posynomial!)
- Each path delay $= \sum R \cdot C$
- $R \propto x_i^{-1}$, $C \propto x_i$ ⇒ posynomial path-delay function
- Delay(x) ≤ Tspec ⇔ PathDelay(x) ≤ Tspec for all paths
- Therefore, the problem has a unique global minimum value
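As an illustration, take a single path of $k$ stages where stage $i$ has size $x_i$, resistance $R_u/x_i$, and drives the next stage plus a fixed wiring capacitance $C_{w,i}$; the wiring term and the unit constants $R_u$, $C_u$ are assumptions of this example, and $C_u x_{k+1}$ stands for the fixed load at the end of the path. The Elmore path delay expands into a sum of positive coefficients times products of powers of the sizes, i.e., a posynomial:

```latex
\mathrm{PathDelay}(x)
  = \sum_{i=1}^{k} \frac{R_u}{x_i}\left(C_u\, x_{i+1} + C_{w,i}\right)
  = \sum_{i=1}^{k} \left( R_u C_u\, x_i^{-1} x_{i+1} + R_u C_{w,i}\, x_i^{-1} \right)
```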
TILOS (TImed LOgic Synthesis)
- Philosophy
- Since the minimum value is unique, a simple method should find it!
- Problem
- minimize Area(x) subject to Delay(x) ≤ Tspec
- Strategy
- Set all transistors in the circuit to minimum size
- Find the critical path (the path with the largest delay)
- Reduce the delay of the critical path with a minimal increase in the objective function value
- (TILOS is a registered trademark of Lucent Technologies. The DA Group of Lucent was acquired by Cadence in 1998.)
TILOS (Contd.)
- minimize Area(x) subject to Delay(x) ≤ Tspec
- Find ΔD/ΔA for all transistors on the critical path
- Bump up the size of the transistor with the largest ΔD/ΔA (the largest delay reduction per unit area increase)
- $x_i \leftarrow M x_i + a$ (default: M = 1, a = 1 contact-head width)
[Figure: circuit from IN to OUT with the critical path highlighted]
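A gate-level sketch of the TILOS loop follows; it illustrates the greedy strategy and is not the commercial tool. It assumes a hypothetical unit-RC model (gate g has resistance r_unit/w[g] and input capacitance c_unit*w[g], primary outputs drive c_load, primary-input drivers are ignored) and the additive bump $w \leftarrow w + a$ (M = 1). Because every bump adds the same area a, ranking candidates by their finite-difference delay reduction is the same as ranking them by $|\Delta D/\Delta A|$.

```python
def tilos(fanout, order, r_unit, c_unit, c_load, t_spec, bump=1.0, max_iter=10_000):
    """Greedy TILOS-style sizing on a gate DAG (hypothetical unit-RC model).

    fanout: dict gate -> list of driven gates (empty list = primary output)
    order:  gates in topological order
    Returns (sizes, critical-path delay).
    """
    w = {g: 1.0 for g in order}                      # start at minimum size
    fanin = {g: [] for g in order}
    for g in order:
        for h in fanout[g]:
            fanin[h].append(g)

    def gate_delay(g):
        load = sum(c_unit * w[h] for h in fanout[g]) if fanout[g] else c_load
        return (r_unit / w[g]) * load

    def critical_path():
        arrival, pred = {}, {}
        for g in order:                              # forward static timing analysis
            p = max(fanin[g], key=arrival.__getitem__, default=None)
            arrival[g] = (arrival[p] if p is not None else 0.0) + gate_delay(g)
            pred[g] = p
        end = max(order, key=arrival.__getitem__)
        delay, path = arrival[end], []
        while end is not None:                       # trace back the latest-arriving fanins
            path.append(end)
            end = pred[end]
        return delay, path[::-1]

    for _ in range(max_iter):
        delay, path = critical_path()
        if delay <= t_spec:
            return w, delay
        base = sum(gate_delay(g) for g in path)
        best_gate, best_gain = None, 0.0
        for g in path:                               # finite-difference delta-D for each gate
            old = w[g]
            w[g] = old + bump
            gain = base - sum(gate_delay(h) for h in path)
            w[g] = old
            if gain > best_gain:
                best_gate, best_gain = g, gain
        if best_gate is None:                        # no single bump shortens the path
            break
        w[best_gate] += bump
    return w, critical_path()[0]

circuit = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}   # hypothetical 4-gate DAG
sizes, delay = tilos(circuit, ["a", "b", "c", "d"], r_unit=1.0, c_unit=1.0, c_load=20.0, t_spec=12.0)
print(sizes, round(delay, 3))   # {'a': 1.0, 'b': 1.0, 'c': 1.0, 'd': 3.0} 11.667
```

Upsizing d twice wins here because d drives the large output load; each bump also slows b (and c) slightly, which the finite difference over the whole path accounts for.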
Sensitivity Computation
- $D(w) = K + R_{prev} \cdot (C_u \cdot w) + R_u \cdot C / w$
- $\partial D / \partial w = R_{prev} \cdot C_u - R_u \cdot C / w^2$
- Could minimize the path delay by setting the derivative to zero
- Problem: this may cause another path delay to become very high!
[Figure: a transistor of width w, driven through Rprev and driving load C]
Cu and Ru are the unit-width transistor input capacitance and effective resistance.
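Setting the derivative of $D(w) = K + R_{prev} C_u w + R_u C / w$ to zero gives the width that minimizes this one path in isolation; the second derivative confirms it is a minimum.

```latex
\frac{\partial D}{\partial w} = R_{prev} C_u - \frac{R_u C}{w^2} = 0
\;\Rightarrow\; w^{*} = \sqrt{\frac{R_u\, C}{R_{prev}\, C_u}},
\qquad \frac{\partial^2 D}{\partial w^2} = \frac{2 R_u C}{w^3} > 0
```

The larger $w^{*}$ is, the larger the input capacitance $C_u w^{*}$ seen by the previous stage, which slows every other path through that stage.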
Why Isn't This THE Perfect Solution?
- Problems with interacting paths
- (1) Better to size A than to size all of B, C, and D
- (2) If X-E is near-critical and A-D is critical, size A (not D)
- False paths and layout considerations are not incorporated
- AND YET...
- TILOS (the commercial tool) gives good solutions
- It has handled large circuits (e.g., 250K transistors)
- Its run time grows linearly with circuit size
[Figure: interacting paths through gates A, B, C, D, E, and X]
iCONTRAST (Sapatnekar et al., TCAD 1993)
- Solves the convex optimization problem exactly
- Uses an interior-point method that is guaranteed to find the optimal solution
- Can handle circuits with about a thousand transistors
[Figure: region where the delay spec is satisfied, and the optimal solution]
(Convex) Polytopes
- Polytope: an n-dimensional convex polygon
- Half-space: $a^T x \le b$ ($a^T x = b$ is a hyperplane)
- e.g., $a_1 x_1 + a_2 x_2 \le b$ (in two dimensions)
- Polytope: an intersection of half-spaces, i.e.,
- $a_1^T x \le b_1$
- AND $a_2^T x \le b_2$
- ...
- AND $a_m^T x \le b_m$
- Represented as $A x \le b$
Convex Optimization Algorithm (Vaidya, 1992)
- (1) Enclose the solution within a polytope (invariant)
- Typically, take the box represented by $w_i \le w_{MAX}$ and $w_i \ge w_{MIN}$ as the starting polytope
- (2) Find the center of the polytope, $w_c$
- (3) Does $w_c$ satisfy the constraints (timing specs)?
- Take the transistor widths corresponding to $w_c$ and perform a static timing analysis
- (4) Add a hyperplane through the center so that the solution lies entirely in one half-space
- The hyperplane equation depends on the feasibility of $w_c$
Equation of the New Half-Space
- Half-space: $\nabla f(w_c) \cdot w \le \nabla f(w_c) \cdot w_c$
- If $w_c$ is feasible, then f = the objective function
- Find the gradient of the area function
- If $w_c$ is infeasible, then f = the violated constraint
- Find the gradient of the critical-path delay
Illustrative Example
[Figure: successive polytopes S shrinking around the solution; level curves f(w) = c, f decreasing]
Calculating the Polytope Center
- Finding the exact centroid is computationally expensive
- Estimate the center by minimizing the log-barrier function $F(x) = -\sum_{i=1}^{m} \log(b_i - a_i^T x)$
- Happy coincidence: F(x) is a convex function!
- Physical meaning: maximize the product of the perpendicular distances to each hyperplane that defines the polytope
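Steps (1) to (4), with the log-barrier center estimate, can be sketched on a hypothetical two-width toy problem: minimize the area $w_1 + w_2$ subject to a convex stand-in for the timing constraint, $1/w_1 + 1/w_2 \le T$, whose optimum for $T = 1$ is $w = (2, 2)$ with area 4. The center is computed by damped Newton steps on F, which makes this an analytic-center cutting-plane scheme, a close relative of Vaidya's method rather than his exact algorithm. The toy delay function, the stopping rule, and all names are assumptions; NumPy is assumed to be available.

```python
import numpy as np

def analytic_center(A, b, x, max_iter=100, tol=1e-10):
    """Minimize F(x) = -sum(log(b - A x)) by damped Newton, starting strictly inside."""
    for _ in range(max_iter):
        s = b - A @ x                                 # slacks, all positive
        g = A.T @ (1.0 / s)                           # gradient of F
        H = A.T @ (A / s[:, None] ** 2)               # Hessian of F
        dx = -np.linalg.solve(H, g)
        lam = np.sqrt(max(-g @ dx, 0.0))              # Newton decrement
        x = x + dx / (1.0 + lam)                      # damped step keeps x strictly inside
        if lam < tol:
            break
    return x

def cutting_plane_sizing(t_spec, w_min, w_max, n_iter=200, r_stop=1e-5):
    """Toy loop: minimize w1 + w2 subject to 1/w1 + 1/w2 <= t_spec."""
    A = np.vstack([np.eye(2), -np.eye(2)])            # (1) box: w <= w_max and -w <= -w_min
    b = np.concatenate([np.full(2, w_max), np.full(2, -w_min)])
    wc = np.full(2, 0.5 * (w_min + w_max))            # strictly interior starting point
    best = None
    for _ in range(n_iter):
        wc = analytic_center(A, b, wc)                # (2) estimate the center
        if np.sum(1.0 / wc) <= t_spec:                # (3) "timing analysis" at wc
            g = np.ones(2)                            # feasible: gradient of the area
            if best is None or wc.sum() < best.sum():
                best = wc.copy()
        else:
            g = -1.0 / wc ** 2                        # infeasible: gradient of the delay
        r = np.min((b - A @ wc) / np.linalg.norm(A, axis=1))   # distance to nearest face
        if r < r_stop:                                # polytope has collapsed: stop
            break
        start = wc - 0.5 * r * g / np.linalg.norm(g)  # strictly inside the cut polytope
        A = np.vstack([A, g])                         # (4) keep the half-space g.w <= g.wc
        b = np.append(b, g @ wc)
        wc = start
    return best

best = cutting_plane_sizing(t_spec=1.0, w_min=0.5, w_max=10.0)
print(best, best.sum())   # should approach w = (2, 2), area 4
```

Both cuts are valid for convex functions: a feasible center cannot beat the optimum's area, and by convexity the delay gradient at an infeasible center points away from every feasible point. Since the next center is computed from a point stepped slightly into the new half-space, every Newton run starts strictly inside the polytope.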
(IV) Other Methods and Issues
- LP-based approaches
- Model gate delay as a piecewise-linear function of the parameters
  - transistor widths wn, wp
  - fanout transistor widths
  - input transition time
- Formulate the problem as a linear program (LP)
- Use an efficient simplex package to solve the LP
Power-Delay Sizing
- minimize Power(w)
- subject to Delay(w) ≤ Tspec
- Area ≤ Aspec
- Each gate size ≥ Minsize
- Power = dynamic power + short-circuit power
Dynamic Power
- Dynamic power
- Power required to charge/discharge capacitances
- $P_{dynamic} = C_L V_{dd}^2 f p_T$
- $C_L$: load capacitance, f: clock frequency, $p_T$: transition probability
- Posynomial function in the w's (if $p_T$ is constant)
- Constitutes the dominant part of power in a well-designed circuit
- Minimize dynamic power ⇒ minimize $C_L$
- ⇒ minimize all transistor sizes!
- RIGHT? (Unfortunately not!)
Short-Circuit Power
- Short-circuit power
- Power dissipated through a direct path from Vdd to ground
- Approximate formula by Veendrick (many assumptions): $P_{short\text{-}ckt} = \frac{\beta}{12} (V_{dd} - 2V_T)^3 \, \tau \, f \, p_T$
- β: transconductance, τ: transition time
- Posynomial function in the w's (if $p_T$ is constant)
- Other (more accurate) models: table lookup, curve fitting
- Less than 10-20% of total power in a well-designed circuit
- So what's the catch?
The Catch
- The delay of gate A is large
- Therefore, the value of τ for B, C, ..., H is large
- Therefore, the short-circuit power for B, C, ..., H is large
- It can be reduced by reducing the delay of A
- In other words, size A!
- Trade off dynamic and short-circuit power!
- Min power ≠ min size
[Figure: gate A driving fanout gates B, C, ..., H]
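The tradeoff can be reproduced in a toy model whose parameter values are all hypothetical: gate A (width w_a, unit-RC delay) drives seven fanout gates B, ..., H. The dynamic power formula $C V_{dd}^2 f p_T$ is applied to A's input and output nets, and the Veendrick-style formula $\frac{\beta}{12}(V_{dd} - 2V_T)^3 \tau f p_T$ to each fanout gate, with A's delay used as their input transition time τ (an approximation assumed here). The dynamic term grows with w_a while the short-circuit term falls as 1/w_a, so the lowest-power width is not the minimum size.

```python
def total_power(w_a, vdd=1.2, vt=0.3, f=1e9, p_t=0.1, beta=1e-3,
                c_unit=2e-15, r_unit=5e3, fanout_widths=(4,) * 7):
    """Dynamic + short-circuit power affected by sizing gate A (toy model, SI units)."""
    c_fanout = c_unit * sum(fanout_widths)               # load driven by A
    # Dynamic power C * Vdd^2 * f * pT on A's input net (grows with w_a) and output net
    p_dyn = (c_unit * w_a + c_fanout) * vdd ** 2 * f * p_t
    # A's delay, used here as the input transition time tau of B..H
    tau = (r_unit / w_a) * c_fanout
    # Veendrick-style short-circuit power of each fanout gate
    p_sc = len(fanout_widths) * beta / 12 * (vdd - 2 * vt) ** 3 * tau * f * p_t
    return p_dyn + p_sc

widths = [1.0 + 0.01 * k for k in range(901)]            # sweep w_a from 1 to 10
best = min(widths, key=total_power)
print(round(best, 2))                                    # 3.5
print(total_power(best) < total_power(1.0))              # True
```

With these numbers the power has the form $\alpha w_a + \beta' / w_a$ plus a constant, so the minimum sits at $\sqrt{\beta'/\alpha} = 3.5$, well above the minimum size.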
Transistor/Gate Sizing (Borah-Owens-Irwin, ISLPD95, TCAD96)
[Figure: optimal transistor size; CI: internal capacitance]
Power-Optimal Sizes and Corresponding Power Savings
Power-Delay Optimization
Power, Delay, and Power-Delay Curves
Power-Delay Optimal Transistor Sizing Algorithm
- Power-optimal initial sizing
- Timing analysis
- While there exists a path with path-delay > target-delay
  - Power-delay optimal sizing of the critical path
    - if path-delay > target-delay: upsize the transistor with the minimum power-delay slope
    - if path-delay < target-delay: downsize the transistor with the minimum power-delay slope
  - Incremental timing analysis
Effect of Transistor Sizing
Other Results
- IBM's EinsTuner
- Conn et al., ICCAD96, DAC2002
- Simulation based, thus more accurate, with no false-path problems
- Needs good input vectors; good for circuits whose critical paths are known and limited
- Relatively slow (but OK for microprocessor macro tuning)
- Lagrangian relaxation
- Chen et al., ICCAD98, www-cad.eecs.berkeley.edu/cad-seminar/spring00/slides/charlie.ppt
- Tennakoon and Sechen, ICCAD02: much faster
- Gain-based synthesis and sizing (logical effort)
- Gate sizing for other metrics
- Noise reduction (Becer et al., DAC03)
- Low power, combined with multi-Vth (Choi, ISLPED03)
Misc. Slides
Transistor Ordering
- Example (Prasad-Roy, IWLPD94)
- Problem: find the best ordering of transistors in each gate such that delay and/or power is minimized
- Comment: no (or little) penalty on circuit area!
How to Determine the Best Transistor Order
- Example (Carlson-Chen, DAC93)
- CL = 0.2 pF, all transistor W/L = 7
- Rise time of 5 ns: d1/d2 = 1.23
- Rise time of 1 ns: d1/d2 = 0.92
- No easy answer!
- Need to evaluate using SPICE or switch-level simulation
Determine the Best Transistor Order at Each Gate
- Exhaustive search
  - Enumerate all possible permutations (Prasad-Roy, IWLPD94)
  - Use SP-BDDs to enumerate all possible orderings for series-parallel circuits (Glebov-Blaauw-Jones, ISLPD95)
- Heuristic search
  - Try top-critical (slowest input closest to the output node) and bottom-critical (slowest input closest to the power node), then choose the better one (Carlson-Chen, DAC93)
  - Pre-characterize each cell in a fixed library, then connect the slowest input to the pins with the smallest delay (Prasad-Roy, IWLPD94)
Optimal Transistor Ordering for the Entire Circuit
- Iterative approach: locally optimal ordering at each gate
- Example (Prasad-Roy, IWLPD94)
- Phase 1: delay minimization
  - Forward traversal to compute the delay to each gate
  - Backward traversal to compute the slack at each gate; when encountering a gate with negative slack:
    - Optimal transistor ordering for this gate
    - Forward traversal to update delay and slack
- Phase 2: power minimization (similar to Phase 1)
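The forward and backward traversals of Phase 1 amount to a basic static timing pass, sketched below over a gate DAG with fixed gate delays. The example circuit, delays, required time, and function name are hypothetical; in the ordering flow, the delay of a negative-slack gate would be recomputed after its transistors are reordered, and the forward traversal repeated.

```python
def arrival_required_slack(order, fanout, delay, t_req):
    """Forward pass for arrival times, backward pass for required times, slack = required - arrival."""
    fanin = {g: [] for g in order}
    for g in order:
        for h in fanout[g]:
            fanin[h].append(g)
    arrival = {}
    for g in order:                                  # forward traversal
        arrival[g] = max((arrival[p] for p in fanin[g]), default=0.0) + delay[g]
    required = {}
    for g in reversed(order):                        # backward traversal
        required[g] = min((required[h] - delay[h] for h in fanout[g]), default=t_req)
    return {g: required[g] - arrival[g] for g in order}

order = ["a", "b", "c", "d"]
fanout = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
delay = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
slack = arrival_required_slack(order, fanout, delay, t_req=6.0)
print(slack)                                 # {'a': -1.0, 'b': -1.0, 'c': 1.0, 'd': -1.0}
print([g for g in order if slack[g] < 0])    # candidates for reordering: ['a', 'b', 'd']
```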
Experimental Results (Prasad-Roy, IWLPD94)