Linear Programming for Sizing, Vth and Vdd Assignment

1
Linear Programming for Sizing, Vth and Vdd Assignment
  • David Chinnery, Kurt Keutzer

2
Outline
  • The standard approach to gate sizing is
    suboptimal
  • Which circuit parameters to model in the linear
    program
  • Specifying the best alternate cell for each gate
  • Formulating the linear program and optimization
    flow
  • Sizing results versus Design Compiler
  • Comparison of multi-Vth and multi-Vdd
  • Conclusions

4
The standard approach to gate sizing
  • Best power reduction for minimum delay increase
    for a gate is given by its sensitivity, the power
    reduction per unit of added delay, ΔP/Δd
  • How do we choose which gates to change to
    minimize circuit power?
  • Is there enough timing slack to change a gate?
  • Greedy approach:
  • Pick the gate with the highest sensitivity that has
    sufficient slack
  • Iteratively change gates, until no further
    changes are possible

5
But changing a gate affects other paths
  • Considering only the highest sensitivity gate is
    suboptimal.

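Example cells:
  • AND2X1: delay 2ns, power 1mW
  • AND2X2: delay 1ns, power 2mW
  • AND4X1: delay 2ns, power 2mW
  • AND4X2: delay 1ns, power 4mW
[Figure: three sizings of a five-gate example circuit, with each gate labeled X1 or X2]
  • All gates X2: power 12mW, delay 2ns
  • Max sensitivity (one gate changed to X1): power 10mW, delay 3ns
  • Four gates changed to X1: power 8mW, delay 3ns. "How do we find this solution?"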
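
The numbers on this slide can be reproduced with a small sketch of the greedy heuristic. The circuit topology is an assumption (four AND2 gates driving one AND4 gate, which is consistent with the 12mW, 10mW and 8mW totals), and the names `Gate`, `greedy_downsize` and `example_circuit` are hypothetical, not taken from the talk or from Design Compiler.

```python
from dataclasses import dataclass

@dataclass
class Gate:
    delay: float              # current cell delay (ns)
    power: float              # current cell power (mW)
    alt_delay: float          # delay of the smaller alternative cell (ns)
    alt_power: float          # power of the smaller alternative cell (mW)
    fanins: tuple = ()        # names of the gates driving this one
    changed: bool = False

def critical_delay(gates):
    """Longest path delay; `gates` must be listed in topological order."""
    arrival = {}
    for name, g in gates.items():
        arrival[name] = g.delay + max((arrival[f] for f in g.fanins), default=0.0)
    return max(arrival.values())

def total_power(gates):
    return sum(g.power for g in gates.values())

def greedy_downsize(gates, t_max):
    """Repeatedly apply the feasible change with the highest sensitivity."""
    while True:
        best, best_sens = None, 0.0
        for name, g in gates.items():
            d_p, d_d = g.alt_power - g.power, g.alt_delay - g.delay
            if g.changed or d_p >= 0 or d_d <= 0:
                continue
            old, g.delay = g.delay, g.alt_delay      # try the change
            fits = critical_delay(gates) <= t_max
            g.delay = old                            # undo the trial
            if fits and -d_p / d_d > best_sens:
                best, best_sens = name, -d_p / d_d
        if best is None:
            return
        g = gates[best]
        g.delay, g.power, g.changed = g.alt_delay, g.alt_power, True

def example_circuit():
    # Assumed topology: four AND2X2 gates driving one AND4X2 gate.
    gates = {f"a{i}": Gate(1.0, 2.0, 2.0, 1.0) for i in range(4)}
    gates["z"] = Gate(1.0, 4.0, 2.0, 2.0, fanins=("a0", "a1", "a2", "a3"))
    return gates

greedy = example_circuit()
greedy_downsize(greedy, t_max=3.0)

manual = example_circuit()
for name in ("a0", "a1", "a2", "a3"):    # downsize the four AND2 gates instead
    g = manual[name]
    g.delay, g.power = g.alt_delay, g.alt_power

print(total_power(greedy), critical_delay(greedy))   # greedy result
print(total_power(manual), critical_delay(manual))   # alternative sizing
```

Under these assumptions the greedy pass downsizes the AND4 gate first (sensitivity 2mW/ns versus 1mW/ns for each AND2), which uses up the slack on every path and blocks the cheaper combination of four AND2 changes.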
6
Design Compiler is suboptimal
[Figure: three sizings of ISCAS85 benchmark c17 (labels a to g), with each gate labeled by drive strength from X1 to X20]
  • Design Compiler delay minimized netlist: power 1.71mW, delay 0.09ns
  • Design Compiler power minimized: power 0.86mW, delay 0.11ns
  • Linear program power minimized: power 0.76mW, delay 0.11ns

9
What do we model in the linear program?
  • How do gate size/Vth/Vdd changes affect the gate,
    fanins, and fanouts?
  • Delay and slew:
  • Input capacitance Cin affects fanin slews and
    delays
  • Drive strength affects the gate's slew and delay
  • Slews affect the delay and slew of the transitive
    fanout; the impact is additive
  • Active power (α is the switching activity of a
    given wire):
  • Input capacitance Cin affects dynamic power
    α_in·Cin·Vdd²
  • Internal capacitance and short circuit current
    change internal power
  • Dynamic switching power of the load changes with Vdd,
    α_out·Cload·Vdd²
  • Slew affects internal power of the transitive fanout
    from the gate's fanins
  • This only affects short circuit power, which is a small
    part of total power
  • Static power:
  • Gate leakage changes exponentially with Vth
  • PMOS leakage of the gate's fanouts changes with the
    driving Vdd
  • Different PMOS leakage when Vin > Vdd was not
    modeled in the libraries
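
A minimal sketch of the two scaling laws named above, the α·C·Vdd² switching term and the exponential dependence of leakage on Vth. The clock frequency factor, the 100mV/decade subthreshold swing, and the function names are assumptions for illustration; they are not values from the characterized libraries.

```python
def switching_power(alpha, cap_farads, vdd, freq_hz):
    """Dynamic switching power alpha * C * Vdd^2 * f, in watts.

    The frequency factor is an assumption; the slide only gives alpha*C*Vdd^2.
    """
    return alpha * cap_farads * vdd ** 2 * freq_hz

def leakage_scale(vth_from, vth_to, swing_v_per_decade=0.1):
    """Relative subthreshold leakage after a Vth change.

    Assumes leakage grows 10x for every `swing_v_per_decade` volts
    of Vth reduction (an illustrative value, not a library figure).
    """
    return 10 ** ((vth_from - vth_to) / swing_v_per_decade)

# Same wire switched at 1.2V and at 0.8V (hypothetical activity and load).
p_high = switching_power(alpha=0.1, cap_farads=3e-15, vdd=1.2, freq_hz=1e9)
p_low = switching_power(alpha=0.1, cap_farads=3e-15, vdd=0.8, freq_hz=1e9)
ratio = p_low / p_high          # equals (0.8 / 1.2) ** 2, about 0.44

print(f"switching power ratio at 0.8V vs 1.2V: {ratio:.3f}")
print(f"leakage multiplier for Vth 0.23V -> 0.14V: {leakage_scale(0.23, 0.14):.1f}x")
```

The quadratic term is why a lower Vdd is attractive for dynamic power, while the exponential term is why a lower Vth is paid for in leakage.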

10
Impact of slew propagating on path delay
  • Calculate the maximum transitive fanout delay/slew
    sensitivity β_max if Δs_in > 0
  • Calculate the minimum transitive fanout delay/slew
    sensitivity β_min if Δs_in < 0
  • Several options; optimality varies within about
    2%
  • Calculate β over all alternate cells for a gate,
    not just the current cell: pessimistic
  • Typical values are β_min = 0.0 and β_max = 0.3 over
    alternate cells and all conditions
  • Calculate β about the current s_in and Cload
    conditions: optimistic if outside range
  • Typical values are β_min = 0.1 and β_max = 0.2 for
    current cells about current conditions
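
The choice between β_max and β_min can be sketched as a conservative bound: a worse input slew is charged at the largest sensitivity, and a better input slew is credited at the smallest. The function name and the 0.05ns slew changes are hypothetical; the β values are the typical ones quoted above.

```python
def fanout_delay_change(delta_s_in, beta_min, beta_max):
    """Conservative change in transitive fanout delay for an input slew change.

    delta_s_in > 0 (slew gets worse): use beta_max, the largest penalty.
    delta_s_in < 0 (slew improves):   use beta_min, the smallest credit.
    """
    beta = beta_max if delta_s_in > 0 else beta_min
    return beta * delta_s_in

# Typical bounds over all alternate cells and conditions.
worse = fanout_delay_change(+0.05, beta_min=0.0, beta_max=0.3)   # ns
better = fanout_delay_change(-0.05, beta_min=0.0, beta_max=0.3)  # ns
print(worse, better)
```

With β_min = 0.0 an improved slew earns no delay credit at all, which is the pessimistic end of the options listed on the slide.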

12
How to choose the best alternative cell?
  • ΔP is just one summed value for the gate
  • Δd depends on the timing arc
  • If gate PN drive strengths are balanced, then the
    change in delay on each timing arc will be
    similar
  • Could use the worst delay change on any timing arc
  • With multiple voltages:
  • Must include added/removed level converter delay
  • Rise and fall delay changes differ, e.g. if Vin >
    Vdd, Δd_rise > Δd_fall due to input delay from VDDH
    to VDDL
  • Could use the total change in delay on all timing
    arcs
  • On average the two approaches are within 1%, but
    in some cases one can be 4% better than the other
  • Haven't tried weighting by slack on each timing
    arc
  • Then pick the best alternative cell for a gate
    from Δd and ΔP as described in the paper.
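
A rough sketch of how the two Δd options (worst arc versus total over all arcs) can lead to different choices. The selection rule here, the largest power saving per unit of added delay, is an assumption standing in for the method in the paper, and the cell names and numbers are hypothetical.

```python
def delta_delay(arc_changes, mode="worst"):
    """Collapse per-timing-arc delay changes (ns) into a single value."""
    return max(arc_changes) if mode == "worst" else sum(arc_changes)

def best_power_alternative(alternatives, mode="worst"):
    """Pick the alternate cell with the best power saving per added delay.

    alternatives: {cell_name: (delta_power_mW, [delta_delay_ns per arc])}
    Assumed rule: only cells with delta_power < 0 qualify, and a cell that
    does not slow the gate at all wins outright.
    """
    best, best_key = None, None
    for cell, (d_p, arcs) in alternatives.items():
        if d_p >= 0:
            continue
        d_d = delta_delay(arcs, mode)
        key = float("inf") if d_d <= 0 else -d_p / d_d
        if best_key is None or key > best_key:
            best, best_key = cell, key
    return best

# Hypothetical alternatives for one gate with two timing arcs.
alternatives = {
    "NAND2X1": (-0.30, [0.010, 0.050]),   # unbalanced arcs
    "NAND2XL": (-0.40, [0.045, 0.045]),   # balanced arcs
}
print(best_power_alternative(alternatives, mode="worst"))
print(best_power_alternative(alternatives, mode="total"))
```

With unbalanced arcs the worst-arc measure penalizes NAND2X1 for its single slow arc, while the total measure favors it, which illustrates why the two approaches can differ.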

13
Encoding alternative cells in the linear program
  • A gate may only change to the best cell
    alternative
  • Variable g ∈ [0,1] assigned by the linear program
    indicates whether the alternative cell is used
  • g = 0: alternative is not used
  • g = 1: alternative is used
  • g ∈ (0,1): might be able to use the alternative; need
    to pick appropriate thresholds
  • In the linear program, the changes in delay and
    power from switching to the alternate cell are
    multiplied by g, for example g·Δd and g·ΔP

15
Linear program constraints
  • Objective
  • Minimizing power with delay constraint Tmax
  • Minimizing delay (want T < Tmax), with a small weight
    on power
  • Maximum delay out of combinational outputs
  • Example delay constraint
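
One way such a linear program could be written is sketched below. This is a simplified illustration, not the exact formulation from the talk: it omits the slew propagation (β) terms and the effect of a gate's input capacitance on its fanins, and the symbols $t_v$ (arrival time at the output of gate $v$) and $d_{uv}$ (current delay of the timing arc from $u$ to $v$) are assumed names.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{v} g_v\,\Delta P_v \\
\text{subject to}\quad & t_v \ge t_u + d_{uv} + g_v\,\Delta d_{uv}
                         && \text{for each timing arc } u \to v, \\
                       & t_o \le T_{\max}
                         && \text{for each combinational output } o, \\
                       & 0 \le g_v \le 1
                         && \text{for each gate } v .
\end{aligned}
```

Every term is linear in the unknowns $g_v$ and $t_v$, which is what lets a standard LP solver handle the problem.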

16
Optimization flow
[Flowchart: optimization flow]
  • Inputs: switching activities and state probabilities (SAIF)
  • Start: delay minimization with Design Compiler
  • Power reduction phase: calculate the cell with the best power reduction for each gate; linear program to minimize power and ensure T ≤ Tmax; change gates with sufficient slack
  • Delay reduction phase: calculate the cell with the best delay reduction for each gate; linear program to reduce delay to below Tmax, without increasing power too much; change gates to reduce delay
  • Decision boxes (yes/no): T ≤ Tmax after each phase; power change ΔP versus ε
  • Calculate max and min Δd/Δs_in and Δs_out/Δs_in for
    each cell from the library
  • Open source COIN LP solver from IBM
  • Cell assignment thresholds of g > 0.01 for delay
    reduction (Δd < 0), and g > 0.99 for power (Δd > 0)
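
To illustrate the power reduction step and the g > 0.99 threshold, here is a sketch for the special case of a single path, where the relaxed linear program reduces to a fractional knapsack that can be solved exactly by sorting on ΔP/Δd. This is not the COIN solver or the full formulation, and the gate names and numbers are hypothetical.

```python
def solve_single_path_lp(changes, slack):
    """Minimize sum(g * dP) subject to sum(g * dD) <= slack, 0 <= g <= 1.

    changes: {gate: (delta_power < 0, delta_delay > 0)} for gates on ONE path.
    For a single path this relaxed LP is a fractional knapsack, so taking
    gates in order of most negative dP/dD first is optimal.
    """
    g = {name: 0.0 for name in changes}
    order = sorted(changes, key=lambda n: changes[n][0] / changes[n][1])
    for name in order:
        _, d_d = changes[name]
        g[name] = min(1.0, max(0.0, slack) / d_d)
        slack -= g[name] * d_d
        if slack <= 0:
            break
    return g

def assign_cells(g, threshold=0.99):
    """Gates whose LP value clears the threshold get the alternate cell."""
    return [name for name, value in g.items() if value > threshold]

# Hypothetical gates: (delta power in mW, delta delay in ns), 0.05ns of slack.
changes = {"u1": (-0.4, 0.05), "u2": (-0.3, 0.02), "u3": (-0.1, 0.04)}
g = solve_single_path_lp(changes, slack=0.05)
print(g)
print(assign_cells(g))
```

Here u2 is fully selected, u1 receives a fractional value of about 0.6, and only u2 clears the 0.99 threshold; the remaining slack would be reconsidered on the next iteration of the flow.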

18
Experimental conditions
  • Same setup as used by University of Michigan
    researchers
  • Input slew of 0.1ns and output port load of 3fF
  • PowerArc characterized libraries for
    STMicroelectronics 0.13um HCMOS9D process
  • Temperature 25°C, Vdd = 1.2V, Vth = 0.23V
  • 9 sizes for inverter: XL, X1, X2, X3, X4, X8,
    X12, X16, X20
  • 4 sizes for NAND2, NAND3, NOR2 and NOR3: XL, X1,
    X2, X4
  • ISCAS85 combinational benchmarks and larger
    combinational benchmarks from Professor Nikolić's
    group
  • Synthesized from Verilog by folks at the
    University of Michigan
  • We used Design Compiler to minimize the delay
  • Switching activities and state probabilities
    determined from VCS with 10,000 iterations with
    random, independent inputs
  • Results in the remainder of the presentation are
    updated from the paper (optimization improvements
    and several corrections).

19
Linear program (LP) sizing vs. Design Compiler (DC)
20
The delay minimization phase is very important!
[Figure annotations: Tmax is 0.934ns; 5.90mW; 31% power savings; minimum of 4.12mW]
21
Runtime vs. circuit size
  • Time to set up the linear program, analyzing
    delay and power for alternate cells for gates, is
    linear with circuit size
  • Runtime for the LP solver dominates for larger
    circuits, varying between about O(V) and
    O(V²)
  • The LP solver uses the simplex method, with worst
    case exponential runtime
  • There are methods with guaranteed polynomial
    runtime, such as interior point methods
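
As a side note on reading "between O(V) and O(V²)", the empirical exponent can be estimated from two runtime measurements on a log-log scale. The measurements below are hypothetical, not results from the talk.

```python
import math

def scaling_exponent(size_a, time_a, size_b, time_b):
    """Exponent k in runtime ~ size**k implied by two measurements."""
    return math.log(time_b / time_a) / math.log(size_b / size_a)

# Hypothetical measurements: (gate count, LP solver seconds).
k = scaling_exponent(1_000, 2.0, 10_000, 60.0)
print(round(k, 2))
```

An exponent between 1 and 2 corresponds to the range quoted on the slide.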

23
Voltage scaling, multi-Vdd/Vth issues
  • Single voltage scaling vs. multi-Vdd and/or
    multi-Vth
  • Multiple supplies cost layout density and routing
    resources
  • Multiple thresholds cost more masks
  • Synchronous vs. asynchronous level converters?
  • Can't have a low VDDL input into a high VDDH gate due
    to static power consumption
  • Either convert to VDDH at a flip-flop (synchronous
    level converter)
  • Or have asynchronous level converters
  • Must factor in power and delay overhead for the
    converter
  • 80ps for a level converter flip-flop; no power
    overhead, with reduced power due to switching Cin at
    VDDL
  • Characterized delay and power overheads for
    asynchronous level converters from Sarvesh
    Kulkarni
  • How many Vth values? How many Vdd values?
  • When are they useful?
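
The level converter rule can be sketched as a simple check: a converter is needed whenever a gate at the lower supply drives a receiver at the higher supply. The 80ps flip-flop overhead comes from the slide; the function names, the 1.2V and 0.8V supplies used as defaults, and the 0.9ns path delay are assumptions for illustration.

```python
LCFF_DELAY_NS = 0.080   # 80ps level converter flip-flop overhead (from the slide)

def needs_level_converter(driver_vdd, receiver_vdd):
    """A VDDL output must be converted before it drives a VDDH input."""
    return driver_vdd < receiver_vdd

def output_path_delay(path_delay_ns, driver_vdd, vddh=1.2):
    """Path delay at an output port that is restored to VDDH at the flip-flop."""
    extra = LCFF_DELAY_NS if needs_level_converter(driver_vdd, vddh) else 0.0
    return path_delay_ns + extra

print(output_path_delay(0.9, driver_vdd=1.2))   # no converter needed
print(output_path_delay(0.9, driver_vdd=0.8))   # pays the LCFF overhead
```

Charging the overhead only on paths that end at the lower supply is what makes the delay cost of multi-Vdd visible to the optimizer.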

24
Conditions for multi-Vth comparison
  • Starting from the Design Compiler delay minimized
    high Vth netlists with low Vth gates substituted
  • No need to re-optimize at low Vth, e.g. for
    c7552:
  • High Vth, min delay: T = 0.847ns, P = 27.9mW
    (0.2% leakage)
  • Low Vth, min delay: T = 0.695ns, P = 43.6mW (5.0%
    leakage)
  • Low Vth substituted into the high Vth delay minimized
    netlist: T = 0.695ns, P = 47.9mW (5.0% leakage)
  • Delay is as good as the low Vth delay minimized
    result
  • Vth values in multi-Vth runs: 0.23V, 0.14V, 0.08V

25
Multi-Vth at 1% leakage for high Vth and 1.0×Tmin
26
Multi-Vdd at 1% leakage for high Vth and 1.2×Tmin
27
Multi-Vdd with 80ps level converter flip-flops

29
Conclusions
  • This linear programming approach to
    simultaneously assign gate sizes, Vth and Vdd
    appears to scale well with circuit size, and
    gives better results than the greedy heuristic
    used by Design Compiler for sizing
  • At 1.1×Tmin, average power savings of 12.0%
  • At 1.2×Tmin, average power savings of 16.6%
  • Post-optimization with Design Compiler gets
    another 1% to 3% savings
  • It's not good at delay reduction at a tight delay
    constraint; it would do better combined with a
    good delay minimizer
  • Having reduced power with a very good gate sizer,
    the benefits of multi-Vth and multi-Vdd are not
    that large at a fairly tight delay constraint
  • Dual-Vth gives 5% savings vs. optimal single Vth
    at 1.0×Tmin and 1.2×Tmin, in the 1% leakage at
    Vth = 0.23V scenario
  • Triple-Vth only improves this by a further 1.3%
    at 1.0×Tmin
  • Dual-Vdd gives about 4% savings vs. single Vth at
    1.2×Tmin
  • Including 80ps level converter flip-flop delays
    reduces this to about 2%
  • To take advantage of lower Vdd, need to relax the
    delay constraint
  • The power savings from sizing were larger than
    these!

30
Acknowledgements
  • The Semiconductor Research Corporation supported
    this research (task id 915.001).
  • STMicroelectronics provided technology files for
    their 0.13um HCMOS9D process.
  • Libraries characterized with PowerArc for this
    0.13um process were provided by Sarvesh Kulkarni,
    Ashish Shrivastava and Professor Dennis Sylvester
    at University of Michigan.
  • Huffman, SOVA_EPR4 and R4_SOVA benchmarks were
    from Professor Borivoje Nikolić's group at UC
    Berkeley.

31
Extra slides
32
Adaptive optimization flow
[Flowchart: adaptive optimization flow]
  • Start: delay minimization with Design Compiler
  • Power reduction phase: calculate best power reduction for each gate; linear program to minimize power and ensure T ≤ Tmax; change gates with sufficient slack
  • Delay reduction phase: calculate best delay reduction for each gate; linear program to reduce delay to below Tmax, without increasing power too much; change gates to reduce delay
  • Decision boxes (yes/no): T ≤ Tmax (twice); > 20 iterations? (twice); > 3 delay iterations?; power change ΔP versus 1% over 5 LP runs
  • Adaptive step: try different optimization parameters
33
Detailed flow
[Flowchart: detailed optimization flow]
  • Start: delay minimization with Design Compiler
  • Delay reduction phase: sensitivity = max ΔP/Δd, with Δd < 0, over a gate's alternate cells; linear program to assign g weighted by sensitivity, minimizing delay toward T ≤ Tmax (delay reduction parameter 0.01) plus power weighted by 0.01, i.e. 0.01·Σ(g·ΔP); change gates to reduce delay (g > 0.01)
  • Power reduction phase: sensitivity = min ΔP/Δd, with ΔP < 0, over a gate's alternate cells; linear program to assign g weighted by sensitivity, with T ≤ Tmax, min Σ(g·ΔP); change gates with sufficient slack (g > 0.99)
  • Decision boxes (yes/no): T ≤ Tmax after each phase; power change ΔP versus ε
  • Sometimes need to weight power less in the delay
    minimization step (e.g. 0.001) and allow more
    delay reduction (e.g. 0.02) to get back to T < Tmax.
34
LP solver runtime vs. Design Compiler
  • Runtime for the linear program solver and Design
    Compiler both vary between about O(V) and
    O(V²)
  • Design Compiler is about an order of magnitude
    faster (it's running on a much slower machine)

35
Delay and power for a level converter
[Figure: delay and power of the alternate cells for an inverter driving a 1.2V gate; the inverter needs a level converter if it changes to 0.8V]
  • Larger gates increase delay due to loading fanins
  • Delay and power increase due to the level shifter
  • Need to consider cases where both Δd < 0 and ΔP < 0
36
Delay and power for a level converter flip-flop
[Figure: delay and power of the alternate cells for an inverter (size X4) driving an output port; the inverter needs an LCFF if it changes to 0.8V]
  • Delay increase with LCFF: 80ps due to the LCFF
  • Slight power reduction, as driving at 0.8V instead of 1.2V
37
Conditions for multi-Vth comparison (Scenario B)
38
Multi-Vth comparison at 8% leakage for high Vth