Title: A Provably Good Approximation Algorithm for Power Optimization Using Multiple Supply Voltages
Hung-Yi Liu, Wan-Ping Lee, and Yao-Wen Chang
National Taiwan University
2 Outline
- Introduction
- Problem Formulation
- Algorithms
- Experimental Results
- Conclusions
3 Multiple-Supply-Voltage (MSV) Design
- Effective for dynamic power reduction
  - Dynamic power $P = 0.5\,k f C V_{dd}^2$, where $k$ is the switching activity, $f$ the clock frequency, $C$ the load capacitance, and $V_{dd}$ the supply voltage
- Trades timing slack for power reduction under timing requirements
  - Low Vdd reduces power consumption but slows the device
  - High Vdd improves device performance but incurs more power consumption
[Figure: gate-level MSV design with low-Vdd and high-Vdd regions]
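Because dynamic power is quadratic in Vdd, even a modest voltage drop pays off. A minimal sketch evaluating the formula above; the function name `dynamic_power` and the parameter values (activity 0.5, 10 MHz, 20 pF) are illustrative assumptions, not values from the slides.

```python
def dynamic_power(k: float, f: float, c: float, vdd: float) -> float:
    """Dynamic power P = 0.5 * k * f * C * Vdd^2 (SI units)."""
    return 0.5 * k * f * c * vdd ** 2


# Hypothetical unit: switching activity 0.5, 10 MHz clock, 20 pF load.
p_high = dynamic_power(k=0.5, f=10e6, c=20e-12, vdd=1.2)
p_low = dynamic_power(k=0.5, f=10e6, c=20e-12, vdd=1.0)

# Power is quadratic in Vdd, so the relative saving does not depend on k, f, or C.
saving = 1 - p_low / p_high
print(f"1.2 V -> 1.0 V saves {saving:.1%}")  # 1.2 V -> 1.0 V saves 30.6%
```

The saving equals 1 - (1.0/1.2)^2 for any unit, which is why trading timing slack for a lower Vdd is so effective.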
4 Our Contributions
- Prove the NP-hardness of our problem
- Propose an efficient approximation algorithm, which is
  - theoretically one order faster than the recent work
  - optimal for a restricted version of the NP-hard problem
  - an α²-approximation
    - α is the constant ratio of the maximum to the minimum available Vdd
n: number of functional units; k: number of available Vdds; d: number of Vdds in the final design
5 Input of Voltage Partitioning Problem (VPP)
- A set F of n functional units, {u1, ..., un}; each ui has
  - a load capacitance ci
  - an initially assigned Vdd vi such that the timing requirement is satisfied
- A set of k available voltages
- An integer d ≥ 2 (number of voltage domains in the final design)
[Figure: example instance with d = 2, 6 functional units, and 4 available voltages (0.8, 1.0, 1.1, 1.2 V)]
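6 Output of VPP
- Define the energy of a set F of m functional units as $e(F) = \left(\sum_{i=1}^{m} c_i\right) v(F)^2$, where $v(F) = \max_{i=1,\dots,m} v_i$
- Find a d-partition $F_1, \dots, F_d$ such that the total energy $e(F_1) + \dots + e(F_d)$ is minimized

$e(\{u_1,\dots,u_6\}) = (0.2+2.0+3.0+1.5+1.0+1.5) \times 1.2^2 \approx 13.25$
$e(\{u_1,u_2,u_3\}) + e(\{u_4,u_5,u_6\}) = (0.2+2.0+3.0) \times 1.0^2 + (1.5+1.0+1.5) \times 1.2^2 = 10.96$
Energy saving: $(13.25 - 10.96)/13.25 \approx 17.28\%$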
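The energy definition and the worked example can be checked with a few lines of code. A minimal sketch: `energy` is a hypothetical helper, the capacitances are the slide's, and the initial Vdds v1..v6 are an assumption (taken from the rearrangement example and assigned to u1..u6 in order) chosen so that v({u1,u2,u3}) = 1.0 V and v({u4,u5,u6}) = 1.2 V as in the example.

```python
from typing import Sequence


def energy(caps: Sequence[float], vdds: Sequence[float]) -> float:
    """e(F): total capacitance of F times the square of its largest initial Vdd."""
    return sum(caps) * max(vdds) ** 2


# Capacitances c1..c6 from the slide; initial Vdds v1..v6 are an assumption.
caps = [0.2, 2.0, 3.0, 1.5, 1.0, 1.5]
vdds = [1.0, 0.8, 1.0, 1.2, 1.0, 1.1]

single = energy(caps, vdds)  # one domain at 1.2 V
split = energy(caps[:3], vdds[:3]) + energy(caps[3:], vdds[3:])  # 1.0 V and 1.2 V domains
saving = (single - split) / single
print(f"{single:.2f} {split:.2f} {saving:.2%}")  # 13.25 10.96 17.27%
```

With the unrounded single-domain energy (13.248) the saving is 17.27%; dividing by the rounded 13.25 gives the 17.28% quoted in the example.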
7 NP-Hardness of VPP
- Prove the NP-hardness by restricting VPP to the NP-complete number partitioning problem (NPP)
  - NPP: Given a set of positive integers p1, ..., pn, determine whether there is a bi-partition P1, P2 of these integers such that $\sum_{p \in P_1} p = \sum_{p \in P_2} p$
- Proof sketch (reducing NPP to VPP)
  - In VPP, set d = 2
  - In VPP, set the only available Vdd to 1.0
  - In VPP, set the capacitances equal to the integers
  - Apply Jensen's inequality to answer NPP
- VPP cannot be solved in polynomial time unless P = NP
8 Overview of Our Algorithm
- Stage 1: functional unit rearrangement
  - Rearrange functional units into non-decreasing order of their initial Vdds
- Stage 2: ordered voltage partitioning
  - Apply dynamic programming to optimally solve the ordered VPP (OVPP)
[Figure: functional units with initial Vdds 1.0, 0.8, 1.0, 1.2, 1.0, 1.1 rearranged into 0.8, 1.0, 1.0, 1.0, 1.1, 1.2]
9 Functional Unit Rearrangement
- Avoid spreading high-initial-Vdd functional units into each cluster
  - The energy of a set of functional units is dominated by the maximum initial Vdd in the set
- Apply the concept of bucket sort for the rearrangement
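A minimal sketch of the bucket-sort rearrangement, assuming every initial Vdd is one of the k available voltages and that the available list is given in ascending order; `rearrange` is a hypothetical name. With one bucket per available Vdd the step takes O(n + k) time instead of the O(n log n) of a comparison sort.

```python
from typing import Dict, List, Sequence


def rearrange(vdds: Sequence[float], available: Sequence[float]) -> List[int]:
    """Return unit indices in non-decreasing order of initial Vdd.

    Assumes `available` is sorted ascending and contains every value in `vdds`
    (a KeyError is raised otherwise). Runs in O(n + k) time.
    """
    bucket_of: Dict[float, int] = {v: b for b, v in enumerate(available)}
    buckets: List[List[int]] = [[] for _ in available]
    for i, v in enumerate(vdds):
        buckets[bucket_of[v]].append(i)  # stable: equal Vdds keep their input order
    return [i for bucket in buckets for i in bucket]


# Initial Vdds from the overview figure, available Vdds from the input example.
order = rearrange([1.0, 0.8, 1.0, 1.2, 1.0, 1.1], [0.8, 1.0, 1.1, 1.2])
print(order)  # [1, 0, 2, 4, 5, 3]
```

Reading the Vdds in this order gives 0.8, 1.0, 1.0, 1.0, 1.1, 1.2, the ordered sequence shown in the overview.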
10 Dual-Vdd Partitioning
[Figure: cut positions examined over the ordered Vdds 0.8, 1.0, 1.0, 1.0, 1.1, 1.2 to find the optimal bi-partition cut; legend: visited functional unit, un-visited functional unit, examined cut position]
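For two domains the ordered problem reduces to choosing one cut position in the sorted sequence. A minimal sketch, assuming the units are already sorted by initial Vdd and that each domain runs at the largest initial Vdd it contains; `dual_vdd_partition` and the capacitances in the usage line are hypothetical, and this is an illustration rather than the authors' implementation. Prefix sums of capacitance make each examined cut O(1), so the whole scan is O(n).

```python
from itertools import accumulate
from typing import Sequence, Tuple


def dual_vdd_partition(caps: Sequence[float], vdds: Sequence[float]) -> Tuple[float, int]:
    """Best single cut for units sorted by non-decreasing initial Vdd.

    A cut at position p puts units [0, p) in the low domain and [p, n) in the
    high domain; p = n means no cut. Returns (minimum total energy, p).
    """
    n = len(caps)
    prefix = [0.0, *accumulate(caps)]  # prefix[p] = c_0 + ... + c_{p-1}
    best = (prefix[n] * vdds[n - 1] ** 2, n)  # single domain at the top Vdd
    for p in range(1, n):
        low = prefix[p] * vdds[p - 1] ** 2  # last unit of a sorted prefix has its max Vdd
        high = (prefix[n] - prefix[p]) * vdds[n - 1] ** 2
        best = min(best, (low + high, p))
    return best


# Ordered Vdds from the figure; capacitances are hypothetical.
e, p = dual_vdd_partition([2.0, 0.2, 3.0, 1.0, 1.5, 1.5], [0.8, 1.0, 1.0, 1.0, 1.1, 1.2])
print(f"{e:.2f} {p}")  # 10.52 4
```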
11 Triple-Vdd Partitioning
[Figure: optimal tri-partition cuts (left, right) over the ordered Vdds 0.8, 1.0, 1.0, 1.0, 1.1, 1.2; legend: functional unit, optimal left-cut position, examined right-cut position]
If the case p = 5 has the minimum total energy, the optimal triple-Vdd partition is {u1}, {u2, u3, u4}, and {u5, u6}.
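The triple-Vdd case rests on the fact that, for each right cut, only the best left cut in front of it matters. The sketch below generalizes this to any number of domains d with a textbook O(d·n²) dynamic program over contiguous groups of the sorted units. It is a swapped-in formulation for illustration, not the authors' O(kn) procedure; `ordered_partition` and the capacitances in the usage line are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def ordered_partition(caps: Sequence[float], vdds: Sequence[float], d: int) -> Tuple[float, List[int]]:
    """Optimal split of Vdd-sorted units into at most d contiguous domains.

    best[j][i] = minimum energy of placing the first i units into j non-empty
    domains. Returns (energy, cut positions in increasing order).
    """
    n = len(caps)
    prefix = [0.0, *accumulate(caps)]
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(d + 1)]
    cut = [[0] * (n + 1) for _ in range(d + 1)]
    best[0][0] = 0.0
    for j in range(1, d + 1):
        for i in range(1, n + 1):
            for l in range(j - 1, i):  # last domain holds units l..i-1, run at vdds[i-1]
                cost = best[j - 1][l] + (prefix[i] - prefix[l]) * vdds[i - 1] ** 2
                if cost < best[j][i]:
                    best[j][i], cut[j][i] = cost, l
    # Use fewer domains only if that is strictly cheaper (e.g. when n < d).
    j = min(range(1, d + 1), key=lambda m: best[m][n])
    energy, cuts, i = best[j][n], [], n
    while j > 0:  # walk back through the recorded left boundaries
        i = cut[j][i]
        cuts.append(i)
        j -= 1
    return energy, cuts[::-1][1:]  # drop the leading boundary at position 0


caps = [2.0, 0.2, 3.0, 1.0, 1.5, 1.5]  # hypothetical
vdds = [0.8, 1.0, 1.0, 1.0, 1.1, 1.2]  # ordered Vdds from the figure
e3, cuts3 = ordered_partition(caps, vdds, 3)
print(f"{e3:.2f} {cuts3}")  # 9.80 [1, 4]
```

With d = 2 the same routine reproduces the single-cut scan, and with d = 1 it returns the single-domain energy with no cuts.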
12 Algorithm Optimality and Complexity
- Our dynamic-programming-based algorithm is optimal for the ordered VPP (OVPP)
- Any algorithm is an α²-approximation algorithm for VPP
  - α is the constant ratio of the maximum to the minimum available Vdd
  - If the maximum (minimum) available Vdd is 1.2 (0.8) V, α² = (1.2/0.8)² = 2.25
- Our algorithm never produces solutions that attain the worst-case bound α²
- Our algorithm requires O(kn) time and O(n) space to find bi- and tri-partitions
  - k (n) is the number of available Vdds (functional units)
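Why any algorithm meets the α² bound: every domain voltage $v(F_j)$ is one of the initial Vdds, which (assuming they are drawn from the available set) lie between the minimum and maximum available Vdds $v_{\min}$ and $v_{\max}$. The total energy of any d-partition, including the optimum OPT and the output ALG of an arbitrary algorithm, is therefore sandwiched as follows:

$$
\sum_{i=1}^{n} c_i\, v_{\min}^2 \;\le\; \mathrm{OPT} \;\le\; \mathrm{ALG} = \sum_{j=1}^{d} \Big(\sum_{u_i \in F_j} c_i\Big) v(F_j)^2 \;\le\; \sum_{i=1}^{n} c_i\, v_{\max}^2
\quad\Longrightarrow\quad
\frac{\mathrm{ALG}}{\mathrm{OPT}} \le \frac{v_{\max}^2}{v_{\min}^2} = \alpha^2 .
$$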
13 Experiment Setup
- Platform: C, Sun Blade-2000 workstation (900 MHz CPU) running SunOS 5.9
- Benchmarks: generated by a pseudo-random data-flow-graph generator [Dick et al., CODES-98]
  - Benchmark sizes range from 1,000 to 10,000
  - The delay and capacitance of a functional unit have means of 100 ns and 20 pF, respectively
- Available Vdds: 0.8, 1.0, 1.2, 1.4, and 1.6 V
14 Triple-Vdd Partitioning
- Power savings range from 67.34% to 68.02%
  - Our algorithm saves exactly the same power as the previous work
- Our algorithm finishes each partitioning in 0.04 seconds
  - The speedups over the previous work range from 36X to 255X
15 Conclusions
- Have proven that VPP is NP-hard
- Have proposed an approximation algorithm, which
  - runs theoretically one order faster than the previous work
  - is optimal for the ordered VPP
- Have shown that our algorithm runs fast in practice and that its solution quality still equals that of the recent work
16 Thank You! Hung-Yi Liu daniel_at_eda.ee.ntu.edu.tw