Title: Linear Programming for Sizing, Vth and Vdd Assignment
- David Chinnery, Kurt Keutzer
Outline
- The standard approach to gate sizing is suboptimal
- Which circuit parameters to model in the linear program
- Specifying the best alternate cell for each gate
- Formulating the linear program and optimization flow
- Sizing results versus Design Compiler
- Comparison of multi-Vth and multi-Vdd
- Conclusions
The standard approach to gate sizing
- The best power reduction for the minimum delay increase for a gate is the alternate cell with the most negative $\Delta P/\Delta d$ (with $\Delta P < 0$) over the gate's alternate cells
- How do we choose which gates to change to minimize circuit power?
- Is there enough timing slack to change a gate?
- Greedy approach
  - Pick the gate with the highest sensitivity that has sufficient slack
  - Iteratively change gates until no further changes are possible
But changing a gate affects other paths
- Considering only the highest sensitivity gate is suboptimal.
[Figure: a small circuit of AND2 and AND4 gates. Cells: AND2X1 delay 2ns, power 1mW; AND2X2 delay 1ns, power 2mW; AND4X1 delay 2ns, power 2mW; AND4X2 delay 1ns, power 4mW. Sizings shown: power 12mW, delay 2ns; max sensitivity choice: power 10mW, delay 3ns; better solution: power 8mW, delay 3ns. How do we find this solution?]
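The slide's circuit diagram did not survive, so the sketch below uses a hypothetical topology (one AND4 driving four AND2 gates, Tmax = 3ns) that happens to give the same 12mW/10mW/8mW figures from the cell data above. It assumes fixed cell delays with no load or slew effects. The greedy loop picks the highest-sensitivity gate with enough slack; an exhaustive search (exponential, for illustration only) finds the better sizing the greedy pass misses.

```python
from itertools import product

# Cell data from the slide: size -> (delay in ns, power in mW).
CELLS = {
    "AND2": {"X1": (2.0, 1.0), "X2": (1.0, 2.0)},
    "AND4": {"X1": (2.0, 2.0), "X2": (1.0, 4.0)},
}

# Hypothetical topology (not from the slide): one AND4 driving four AND2 gates.
GATES = {"g0": "AND4", "g1": "AND2", "g2": "AND2", "g3": "AND2", "g4": "AND2"}
PATHS = [("g0", "g1"), ("g0", "g2"), ("g0", "g3"), ("g0", "g4")]
TMAX = 3.0  # ns


def delay(sizes):
    """Worst path delay; fixed cell delays, no load or slew effects."""
    return max(sum(CELLS[GATES[g]][sizes[g]][0] for g in p) for p in PATHS)


def power(sizes):
    """Total power: sum of each gate's cell power."""
    return sum(CELLS[GATES[g]][sizes[g]][1] for g in GATES)


def greedy(sizes):
    """Downsize the gate with the highest sensitivity (mW saved per ns
    added) that still meets TMAX; repeat until no change fits."""
    sizes = dict(sizes)
    while True:
        best, best_sens = None, 0.0
        for g, size in sizes.items():
            if size != "X2":
                continue
            trial = {**sizes, g: "X1"}
            if delay(trial) > TMAX:
                continue  # not enough slack for this change
            d_fast, p_fast = CELLS[GATES[g]]["X2"]
            d_slow, p_slow = CELLS[GATES[g]]["X1"]
            sens = (p_fast - p_slow) / (d_slow - d_fast)
            if sens > best_sens:
                best, best_sens = g, sens
        if best is None:
            return sizes
        sizes[best] = "X1"


def exhaustive():
    """Lowest-power sizing that meets TMAX, found by trying every assignment."""
    best = None
    for choice in product(["X1", "X2"], repeat=len(GATES)):
        sizes = dict(zip(GATES, choice))
        if delay(sizes) <= TMAX and (best is None or power(sizes) < power(best)):
            best = sizes
    return best


start = {g: "X2" for g in GATES}
g_sol = greedy(start)
opt = exhaustive()
print(power(start), delay(start))  # 12.0 2.0
print(power(g_sol), delay(g_sol))  # 10.0 3.0
print(power(opt), delay(opt))      # 8.0 3.0
```

Greedy downsizes the AND4 first (2mW/ns beats 1mW/ns), which uses up the slack on all four paths; downsizing the four AND2 gates instead saves more power at the same delay.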
Design Compiler is suboptimal
[Figure: ISCAS85 benchmark c17 netlists with gate sizes from X1 to X20. Design Compiler delay minimized: power 1.71mW, delay 0.09ns. Design Compiler power minimized: power 0.86mW, delay 0.11ns. Linear program power minimized: power 0.76mW, delay 0.11ns.]
What do we model in the linear program?
- How do gate size/Vth/Vdd changes affect the gate, fanins, and fanouts?
- Delay and slew
  - Input capacitance Cin affects fanin slews and delays
  - Drive strength affects the gate's slew and delay
  - Slews affect delay and slew of the transitive fanout; the impact is additive
- Active power ($\alpha$ is the switching activity of a given wire)
  - Input capacitance Cin affects dynamic power $\alpha_{in} C_{in} V_{dd}^2$
  - Internal capacitance and short circuit current change internal power
  - Dynamic switching power of the load changes with Vdd, $\alpha_{out} C_{load} V_{dd}^2$
  - Slew affects internal power of the transitive fanout from the gate's fanins
    - Only affects short circuit power, which is a small part of total power
- Static power
  - The gate's leakage changes exponentially with Vth
  - PMOS leakage of the gate's fanouts changes with the driving Vdd
  - Different PMOS leakage when Vin > Vdd was not modeled in the libraries
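To make the two power dependencies above concrete, here is a small sketch. The clock frequency f and the subthreshold leakage model $I_{leak} \propto e^{-V_{th}/(n kT/q)}$ with slope factor n = 1.5 are assumptions, not taken from the slides; the 25°C temperature and the 0.23V and 0.14V thresholds are the values used later in the experiments. The example capacitance and activity are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # k/q, in volts per kelvin


def switching_power(alpha, cap_farads, vdd, freq_hz):
    """Dynamic switching power alpha * C * Vdd^2 * f, in watts.
    The clock frequency f is an assumption added to the slide's alpha*C*Vdd^2."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


def leakage_ratio(vth_low, vth_high, n=1.5, temp_c=25.0):
    """Subthreshold leakage at vth_low relative to vth_high, assuming
    I_leak ~ exp(-Vth / (n * kT/q)); the slope factor n = 1.5 is a guess."""
    v_thermal = BOLTZMANN_EV_PER_K * (temp_c + 273.15)
    return math.exp((vth_high - vth_low) / (n * v_thermal))


# Lowering a wire's supply from 1.2V to 0.8V scales its switching power by (0.8/1.2)^2.
p_high = switching_power(0.1, 10e-15, 1.2, 1e9)
p_low = switching_power(0.1, 10e-15, 0.8, 1e9)
print(round(p_low / p_high, 3))  # 0.444
# Leakage grows exponentially as Vth drops from 0.23V to 0.14V.
print(round(leakage_ratio(0.14, 0.23), 1))
```

Under these assumptions a 90mV drop in Vth raises leakage by roughly an order of magnitude, while dropping Vdd from 1.2V to 0.8V cuts switching power on that wire to about 44%.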
Impact of slew propagating on path delay
- Calculate the maximum transitive fanout delay/slew sensitivity $\beta_{max}$, used if $\Delta s_{in} > 0$
- Calculate the minimum transitive fanout delay/slew sensitivity $\beta_{min}$, used if $\Delta s_{in} < 0$
- Several options; optimality varies within about 2%
  - Calculate $\beta$ over all alternate cells for a gate, not just the current cell (pessimistic)
    - Typical values are $\beta_{min} = 0.0$ and $\beta_{max} = 0.3$ over alternate cells and all conditions
  - Calculate $\beta$ about the current $s_{in}$ and Cload conditions (optimistic if outside that range)
    - Typical values are $\beta_{min} = 0.1$ and $\beta_{max} = 0.2$ for current cells about current conditions
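A minimal sketch of how the β values above could enter a delay estimate, assuming the transitive fanout delay responds linearly to the slew change ($\beta \cdot \Delta s_{in}$) and, as the earlier slide notes, that this impact simply adds to the gate's own delay change. The function names and the example cell change are hypothetical; the β pairs are the typical values quoted on the slide.

```python
def fanout_delay_change(delta_slew_in, beta_min, beta_max):
    """Change in transitive fanout delay (ns) caused by a change in the slew
    a gate presents to its fanouts. A slew increase uses beta_max and a slew
    decrease uses beta_min, so delay increases are not under-estimated and
    delay reductions are not over-estimated."""
    if delta_slew_in > 0:
        return beta_max * delta_slew_in
    return beta_min * delta_slew_in


def path_delay_change(delta_gate_delay, delta_slew_out, beta_min, beta_max):
    """The fanout impact is additive: add it to the gate's own delay change."""
    return delta_gate_delay + fanout_delay_change(delta_slew_out, beta_min, beta_max)


PESSIMISTIC = (0.0, 0.3)    # typical (beta_min, beta_max) over alternate cells, all conditions
CURRENT_CONDS = (0.1, 0.2)  # typical values for current cells about current conditions

# Hypothetical cell change: gate delay +0.010ns, output slew +0.050ns.
print(path_delay_change(0.010, 0.050, *PESSIMISTIC))
print(path_delay_change(0.010, 0.050, *CURRENT_CONDS))
```

With the pessimistic pair, a slew reduction earns no delay credit at all ($\beta_{min} = 0.0$), which is one reason the two options give slightly different results.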
How to choose the best alternative cell?
- $\Delta P$ is just one summed value for the gate
- $\Delta d$ depends on the timing arc
  - If the gate's PMOS and NMOS (P/N) drive strengths are balanced, then the change in delay on each timing arc will be similar
  - Could use the worst delay change on any timing arc
- With multiple voltages
  - Must include added/removed level converter delay
  - Rise and fall delay changes differ, e.g. if Vin > Vdd, $\Delta d_{rise} > \Delta d_{fall}$ due to input delay from VDDH to VDDL
  - Could use the total change in delay on all timing arcs
- On average the two approaches are within 1%, but in some cases one can be 4% better than the other
- Haven't tried weighting by slack on each timing arc
- Then pick the best alternative cell for a gate from $\Delta d$ and $\Delta P$ as described in the paper.
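The two ways of collapsing per-arc delay changes can pick different cells. This sketch uses a hypothetical `Alternative` record and made-up numbers for a two-arc gate; it ranks power-saving alternatives by the most negative $\Delta P/\Delta d$, in line with the sensitivity used in the detailed flow, and (as an added assumption) prefers any alternative that saves power without adding delay. The paper's exact selection rule may differ.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    """Hypothetical record for one alternate cell of a gate."""
    name: str
    delta_power: float      # mW; one summed value for the gate
    delta_delay_arcs: list  # ns; one delay change per timing arc


def delay_change(alt, mode):
    """Collapse per-arc delay changes: 'worst' arc or 'total' over arcs."""
    if mode == "worst":
        return max(alt.delta_delay_arcs)
    if mode == "total":
        return sum(alt.delta_delay_arcs)
    raise ValueError(f"unknown mode: {mode}")


def best_power_alternative(alts, mode="worst"):
    """Among alternatives that save power, prefer any that do not add delay;
    otherwise pick the most negative delta_power / delta_delay."""
    saving = [a for a in alts if a.delta_power < 0]
    if not saving:
        return None
    free = [a for a in saving if delay_change(a, mode) <= 0]
    if free:
        return min(free, key=lambda a: a.delta_power)
    return min(saving, key=lambda a: a.delta_power / delay_change(a, mode))


# Hypothetical two-arc gate where the two ways of combining arcs disagree.
alts = [
    Alternative("A", -0.10, [0.01, 0.05]),
    Alternative("B", -0.12, [0.04, 0.04]),
]
print(best_power_alternative(alts, "worst").name)  # B
print(best_power_alternative(alts, "total").name)  # A
```

Cell A has one slow arc, which dominates under "worst" but is diluted by its fast arc under "total".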
Encoding alternative cells in the linear program
- A gate may only change to the best cell alternative
- Variable $g \in [0,1]$, assigned by the linear program, indicates whether the alternative cell is used
  - $g = 0$: alternative is not used
  - $g = 1$: alternative is used
  - $g \in (0,1)$: might be able to use the alternative; need to pick appropriate thresholds
- In the linear program, the changes in delay and power from switching to the alternate gate are multiplied by g. For example, a gate's delay becomes $d + g\,\Delta d$ and its power becomes $P + g\,\Delta P$.
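Because every change is scaled by g, each gate's delay and power stay linear in the LP variables. The sketch below represents such terms with a hypothetical `LinExpr` class (real LP tools provide their own expression objects) to show how a fractional g interpolates between the current and the alternate cell.

```python
from dataclasses import dataclass, field


@dataclass
class LinExpr:
    """Hypothetical linear expression: const + sum(coef * var)."""
    const: float = 0.0
    coefs: dict = field(default_factory=dict)

    def __add__(self, other):
        coefs = dict(self.coefs)
        for var, c in other.coefs.items():
            coefs[var] = coefs.get(var, 0.0) + c
        return LinExpr(self.const + other.const, coefs)

    def value(self, assignment):
        """Evaluate the expression for given values of the g variables."""
        return self.const + sum(c * assignment[v] for v, c in self.coefs.items())


def gate_delay(d_current, delta_d, g_var):
    # d + g * delta_d: g = 0 keeps the current cell, g = 1 uses the alternative.
    return LinExpr(d_current, {g_var: delta_d})


def gate_power(p_current, delta_p, g_var):
    # P + g * delta_P: summed over gates, this is the LP's power objective.
    return LinExpr(p_current, {g_var: delta_p})


# Two hypothetical gates (power in mW) and their alternative cells.
total_power = gate_power(2.0, -1.0, "g1") + gate_power(4.0, -2.0, "g2")
print(total_power.value({"g1": 0.0, "g2": 0.0}))  # 6.0
print(total_power.value({"g1": 1.0, "g2": 1.0}))  # 3.0
print(total_power.value({"g1": 0.5, "g2": 0.0}))  # 5.5
```

A fractional solution such as g1 = 0.5 has no physical meaning on its own, which is why the flow needs thresholds to turn g values into actual cell changes.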
Linear program constraints
- Objective
  - Minimizing power with delay constraint Tmax
  - Minimizing delay (want < Tmax), with small weight on power
- Maximum delay out of combinational outputs
- Example delay constraint
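The slide's example constraint did not survive extraction. As a sketch of the general shape of arrival-time constraints (not the paper's exact formulation), each gate n gets $t_n \ge t_m + d_n + g_n \Delta d_n$ for every fanin m, and every combinational output must satisfy $t_o \le T_{max}$. For a fixed g, the tightest solution is the longest-path arrival time, computed below. This sketch ignores the $\beta$ slew terms and uses one delay per gate rather than per timing arc; the circuit is hypothetical.

```python
from graphlib import TopologicalSorter


def arrival_times(fanins, delay, delta_delay, g, input_arrival=0.0):
    """Arrival time at each gate output when gate n has delay
    delay[n] + g[n] * delta_delay[n]. For fixed g this is the tightest
    solution of the constraints t[n] >= t[m] + d[n](g) for each fanin m."""
    t = {}
    for n in TopologicalSorter(fanins).static_order():
        d = delay.get(n, 0.0) + g.get(n, 0.0) * delta_delay.get(n, 0.0)
        start = max((t[m] for m in fanins.get(n, ())), default=input_arrival)
        t[n] = start + d
    return t


def meets_tmax(fanins, outputs, delay, delta_delay, g, tmax):
    """Check t[o] <= tmax for every combinational output o."""
    t = arrival_times(fanins, delay, delta_delay, g)
    worst = max(t[o] for o in outputs)
    return worst <= tmax, worst


# Hypothetical circuit: g0 drives g1..g4; each gate is 1ns now, +1ns if downsized.
fanins = {"g0": [], "g1": ["g0"], "g2": ["g0"], "g3": ["g0"], "g4": ["g0"]}
outputs = ["g1", "g2", "g3", "g4"]
delay = {n: 1.0 for n in fanins}
delta_delay = {n: 1.0 for n in fanins}

print(meets_tmax(fanins, outputs, delay, delta_delay, {"g0": 1.0}, 3.0))  # (True, 3.0)
print(meets_tmax(fanins, outputs, delay, delta_delay, {"g0": 1.0, "g1": 1.0}, 3.0))  # (False, 4.0)
```

In the LP itself the $t_n$ are variables alongside g, so the solver trades delay on shared paths against power across the whole circuit at once, rather than one gate at a time.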
Optimization flow
- Inputs: switching activities and state probabilities (SAIF), and a delay minimized netlist from Design Compiler
- Delay reduction: calculate the cell with the best delay reduction for each gate; run a linear program to reduce delay to below Tmax without increasing power too much; change gates to reduce delay; repeat until $T \le T_{max}$
- Power reduction: calculate the cell with the best power reduction for each gate; run a linear program to minimize power and ensure $T \le T_{max}$; change gates with sufficient slack
- If $T > T_{max}$, return to delay reduction; otherwise repeat power reduction until the power change $\Delta P$ falls below a small threshold $\epsilon$
- Calculate max and min $\Delta d/\Delta s_{in}$ and $\Delta s_{out}/\Delta s_{in}$ for each cell from the library
- Open source COIN LP solver from IBM
- Cell assignment thresholds of $g > 0.01$ for delay reduction ($\Delta d < 0$), and $g > 0.99$ for power reduction ($\Delta d > 0$)
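The cell assignment thresholds above turn the LP's continuous g values into discrete cell changes. A minimal sketch, with a hypothetical function name and example g values:

```python
def cells_to_change(g, phase):
    """Gates whose alternate cell is adopted after an LP run.
    Delay reduction phase: change if g > 0.01.
    Power reduction phase: change only if g > 0.99."""
    threshold = {"delay": 0.01, "power": 0.99}[phase]
    return sorted(gate for gate, value in g.items() if value > threshold)


g = {"a": 0.005, "b": 0.5, "c": 0.995}
print(cells_to_change(g, "delay"))  # ['b', 'c']
print(cells_to_change(g, "power"))  # ['c']
```

One plausible reading of the asymmetry: a low threshold in the delay phase adopts almost any delay-reducing change the LP leans toward, while a high threshold in the power phase only commits power-saving (delay-increasing) changes the LP is nearly certain about, which helps keep $T \le T_{max}$.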
Experimental conditions
- Same setup as used by University of Michigan researchers
  - Input slew of 0.1ns and output port load of 3fF
  - PowerArc characterized libraries for the STMicroelectronics 0.13um HCMOS9D process
  - Temperature 25°C, VDD 1.2V, Vth 0.23V
  - 9 sizes for the inverter: XL, X1, X2, X3, X4, X8, X12, X16, X20
  - 4 sizes for NAND2, NAND3, NOR2 and NOR3: XL, X1, X2, X4
- ISCAS85 combinational benchmarks and larger combinational benchmarks from Professor Nikolić's group
  - Synthesized from Verilog by folks at the University of Michigan
  - We used Design Compiler to minimize the delay
- Switching activities and state probabilities determined from VCS with 10,000 iterations with random, independent inputs
- Results in the remainder of the presentation are updated from the paper (optimization improvements and several corrections).
Linear program (LP) sizing vs. Design Compiler (DC)
The delay minimization phase is very important!
[Figure: Tmax is 0.934ns; annotations: 5.90mW, 31% power savings, minimum of 4.12mW]
Runtime vs. circuit size
- Time to set up the linear program, analyzing delay and power for alternate cells for gates, is linear with circuit size
- Runtime for the LP solver dominates for larger circuits, varying between about O(V) and O(V²)
- The LP solver uses the simplex method, with worst case exponential runtime
  - There are methods with guaranteed polynomial runtime
Voltage scaling, multi-Vdd/Vth issues
- Single voltage scaling vs. multi-Vdd and/or multi-Vth
  - Multiple supplies cost layout density and routing resources
  - Multiple thresholds cost more masks
- Synchronous vs. asynchronous level converters?
  - Can't have a low VDDL input into a high VDDH gate due to static power consumption
  - Either convert to VDDH at the flip-flop (synchronous level converter)
  - Or have asynchronous level converters
  - Must factor in power and delay overhead for the converter
    - 80ps for a level converter flip-flop, with no power overhead (reduced power due to switching Cin at VDDL)
    - Characterized delay and power overheads for asynchronous level converters from Sarvesh Kulkarni
- How many Vth values? How many Vdd values?
  - When are they useful?
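The "no VDDL input into a VDDH gate" rule can be checked edge by edge. This sketch assumes the 1.2V/0.8V supplies used in the later level converter slides and a hypothetical netlist; each VDDL-to-VDDH edge needs a converter, synchronous (an LCFF) if the sink is a flip-flop and asynchronous otherwise.

```python
VDDH, VDDL = 1.2, 0.8  # volts; supply values from the level converter slides


def converters_needed(edges, vdd, is_flip_flop):
    """Split driver->sink edges where a lower-Vdd driver feeds a higher-Vdd
    sink into (asynchronous converter edges, level converter flip-flop edges)."""
    async_lc, lcff = [], []
    for driver, sink in edges:
        if vdd[driver] < vdd[sink]:
            target = lcff if is_flip_flop.get(sink, False) else async_lc
            target.append((driver, sink))
    return async_lc, lcff


# Hypothetical netlist fragment.
edges = [("u1", "u2"), ("u1", "ff1"), ("u2", "u3")]
vdd = {"u1": VDDL, "u2": VDDH, "u3": VDDL, "ff1": VDDH}
is_flip_flop = {"ff1": True}
print(converters_needed(edges, vdd, is_flip_flop))
# ([('u1', 'u2')], [('u1', 'ff1')])
```

The edge u2 to u3 needs nothing, since a VDDH gate can drive a VDDL gate directly.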
Conditions for multi-Vth comparison
- Starting from the Design Compiler delay minimized high Vth netlists with low Vth gates substituted
  - No need to re-optimize at low Vth, e.g. for c7552:
    - High Vth, min delay: T = 0.847ns, P = 27.9mW (0.2% leakage)
    - Low Vth, min delay: T = 0.695ns, P = 43.6mW (5.0% leakage)
    - Low Vth substituted for high Vth delay minimized: T = 0.695ns, P = 47.9mW (5.0% leakage)
    - Delay is as good as the low Vth delay minimized result
- Vth values in multi-Vth runs: 0.23V, 0.14V, 0.08V
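Working through the c7552 numbers listed above (these ratios are derived from the listed values):

$$
\frac{0.847 - 0.695}{0.847} \approx 17.9\%\ \text{delay reduction}, \qquad
\frac{47.9 - 43.6}{43.6} \approx 9.9\%, \qquad
\frac{47.9}{27.9} \approx 1.72
$$

So the substituted netlist reaches the low Vth minimum delay at about 10% more power than the low Vth delay minimized netlist, and at about 1.72 times the power of the high Vth netlist.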
Multi-Vth at 1% leakage for high Vth and 1.0Tmin
Multi-Vdd at 1% leakage for high Vth and 1.2Tmin
Multi-Vdd with 80ps level converter flip-flops
Conclusions
- This linear programming approach to simultaneously assign gate sizes, Vth and Vdd appears to scale well with circuit size, and gives better results than the greedy heuristic used by Design Compiler for sizing
  - At 1.1Tmin, average power savings of 12.0%
  - At 1.2Tmin, average power savings of 16.6%
  - Post-optimization with Design Compiler gets another 1% to 3% savings
  - It's not good at delay reduction at a tight delay constraint; it would do better by combining with a good delay minimizer
- Having reduced power with a very good gate sizer, the benefits of multi-Vth and multi-Vdd are not that large at a fairly tight delay constraint
  - Dual-Vth gives 5% savings vs. the optimal single Vth at 1.0Tmin and 1.2Tmin, in the scenario with 1% leakage at Vth = 0.23V
  - Triple-Vth only improves this by a further 1.3% at 1.0Tmin
  - Dual-Vdd gives about 4% savings vs. single Vth at 1.2Tmin
  - Including 80ps level converter flip-flop delays reduces this to about 2%
  - To take advantage of lower Vdd, need to relax the delay constraint
- The power savings from sizing were larger than these!
Acknowledgements
- The Semiconductor Research Corporation supported this research (task id 915.001).
- STMicroelectronics provided technology files for their 0.13um HCMOS9D process.
- Libraries characterized with PowerArc for this 0.13um process were provided by Sarvesh Kulkarni, Ashish Shrivastava and Professor Dennis Sylvester at the University of Michigan.
- Huffman, SOVA_EPR4 and R4_SOVA benchmarks were from Professor Borivoje Nikolić's group at UC Berkeley.
Extra slides
Adaptive optimization flow
[Flowchart: the optimization flow (delay minimization with Design Compiler; calculate best delay reduction for each gate; linear program to reduce delay to below Tmax without increasing power too much; change gates to reduce delay; calculate best power reduction for each gate; linear program to minimize power and ensure $T \le T_{max}$; change gates with sufficient slack) with added checks: $T \le T_{max}$?, > 20 iterations?, > 3 delay iterations?, and $\Delta P < 1\%$ over 5 LP runs?, plus a step to try different optimization parameters.]
Detailed flow
- Delay minimization with Design Compiler
- Sensitivity $\max \Delta P/\Delta d$ with $\Delta d < 0$, over a gate's alternate cells
- Linear program to assign g weighted by sensitivity: $\min\left(\max(-0.01,\, T - T_{max}) + 0.01 \sum g\,\Delta P\right)$
- Change gates to reduce delay ($g > 0.01$); check $T \le T_{max}$
- Sensitivity $\min \Delta P/\Delta d$ with $\Delta P < 0$, over a gate's alternate cells
- Linear program to assign g weighted by sensitivity: $T \le T_{max}$, $\min \sum g\,\Delta P$
- Change gates with sufficient slack ($g > 0.99$); check $T \le T_{max}$
- Stop when $\Delta P < \epsilon$
Sometimes we need to weight power less in the delay minimization step (e.g. 0.001) and allow more delay reduction (e.g. 0.02) to get back to $T < T_{max}$.
LP solver runtime vs. Design Compiler
- Runtimes for the linear program solver and Design Compiler both vary between about O(V) and O(V²)
- Design Compiler is about an order of magnitude faster (and it's running on a much slower machine)
Delay and power for a level converter
[Figure: delay and power of alternate cells for an inverter driving a 1.2V gate; the inverter needs a level converter if it changes to 0.8V. Larger gates increase delay due to loading fanins; delay and power increase due to the level shifter.]
Need to consider cases where both $\Delta d < 0$ and $\Delta P < 0$.
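When an alternate cell moves a gate to VDDL, the converter overhead has to be folded into its $\Delta d$ and $\Delta P$ before it is ranked. This sketch uses the 80ps LCFF delay and "no power overhead" from the slides; the per-gate numbers are hypothetical. It also flags the case the slide highlights: when both $\Delta d < 0$ and $\Delta P < 0$, the ratio $\Delta P/\Delta d$ is positive, so ranking by the most negative ratio alone would wrongly put that cell last.

```python
LCFF_DELAY_NS = 0.080  # level converter flip-flop delay from the slides
LCFF_POWER_MW = 0.0    # slides: no power overhead for the LCFF


def with_converter(delta_d, delta_p, needs_converter, conv_delay, conv_power):
    """Add a level converter's delay and power to an alternate cell's changes."""
    if needs_converter:
        return delta_d + conv_delay, delta_p + conv_power
    return delta_d, delta_p


def classify(delta_d, delta_p):
    """Label an alternate cell by the signs of its delay and power changes."""
    if delta_d <= 0 and delta_p < 0:
        return "reduces power without adding delay"
    if delta_p < 0:
        return "trades delay for power"
    if delta_d < 0:
        return "trades power for delay"
    return "no benefit"


# Hypothetical inverter driving an output port, changed to 0.8V (needs an LCFF).
dd, dp = with_converter(0.010, -0.004, True, LCFF_DELAY_NS, LCFF_POWER_MW)
print(round(dd, 3), dp, classify(dd, dp))  # 0.09 -0.004 trades delay for power
# Hypothetical larger VDDL cell driving another VDDL gate (no converter needed).
print(classify(-0.005, -0.002))  # reduces power without adding delay
```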
Delay and power for a level converter flip-flop
[Figure: alternate cells for an inverter (size X4) driving an output port; it needs an LCFF if it changes to 0.8V. Delay increases by 80ps due to the LCFF, with a slight power reduction as it drives at 0.8V instead of 1.2V.]
Conditions for multi-Vth comparison (Scenario B)
Multi-Vth comparison at 8% leakage for high Vth