Title: System-level Mitigation of WID Leakage Variations using Body-bias Islands
1System-level Mitigation of WID Leakage Variations using Body-bias Islands
- Siddharth Garg, Diana Marculescu
- Electrical and Computer Engineering
- Carnegie Mellon University
2Outline
- Introduction and Motivation
- Problem Formulation
- Computing Body-bias Distributions
- Clustering
- Post-fabrication Voltage Assignment
- Experimental Results
- Conclusion
3Introduction
- Process variations increase with technology scaling
  - Affect both leakage and frequency
- Die-to-die (D2D) and within-die (WID) variations in process parameters
  - WID expected to be dominant source of process variations in ITRS roadmap
Source: [Friedberg et al., ISQED'05]
4Adaptive Body-bias
- Powerful technique to mitigate variations in leakage power dissipation [Tschanz et al., JSSC'02]
- Each fabricated die tested and assigned appropriate body-bias voltage based on measured leakage current
- Global body-bias: single body-bias voltage for entire die
- Multiple body-bias: die partitioned into body-bias islands
VBBN > 0 (Forward Body-bias): leakage increases, delay decreases
VBBN < 0 (Reverse Body-bias): leakage decreases, delay increases
5Body-bias Islands
- Given
  - Embedded system with P PEs
  - A task graph mapped to PEs
  - Variability profile of each PE
- Find
  - Optimum partitioning into K body-bias islands (K < P)
  - Minimize leakage under system performance constraints
[Figure: task graph with tasks t1 to t4 mapped onto PE 1, PE 2 and PE 3; body-bias voltage shown for the PEs]
6Body-biasing Impact
- SPICE results for a BPTM 90 nm technology
7Related Work
- Variability mitigation using voltage/frequency islands
  - Recover performance by running each island at frequency dictated by process variations [Garg and Marculescu, TODAES'08]
  - Chip multiprocessor performance characterization under impact of process variations [Herbert and Marculescu, DAC'08]
  - Reduce power using independently tunable supply voltage for each voltage island [Das et al., ICCD'07]
- Body-bias island partitioning
  - Gate-level partitioning [Kulkarni et al., ICCAD'06]
  - Circuit-level implementation details and post-fabrication design space analysis [Nakamura et al., CICC'08]
9Body-bias Island Partitioning
- Start by assuming finest granularity partitioning
10Clustering Criteria
- Which processors should be clustered together?
- Cluster processors with similar body-bias voltage distributions
  - Intuition: what if distributions were exactly the same?
- Avoid clustering processors with large contribution to total leakage
  - Should be allowed to independently select optimum body-bias
12Computing Body-bias Distributions
- Computing body-bias distribution for each processor
- Monte-Carlo method: generate N chip instances, compute optimum body-bias values for each chip
  - Used for gate-level partitioning by Kulkarni et al. [ICCAD'06]
  - Slow but accurate
- Proposed: analytical technique
  - Formulate problem as an instance of Robust Convex Optimization [Ben-Tal et al., Math. Prog.'04]
  - Significant speed-up over MC technique with minimal loss in accuracy and quality of solution
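The Monte-Carlo baseline described on this slide can be sketched as follows. This is a toy illustration only, not the setup used in the talk: the exponential leakage model, the linear delay model, the lognormal leakage samples, the single critical path, and every constant (`LEVELS`, `A`, `B`, `NOMINAL_DELAY`, `DELAY_LIMIT`) are assumptions introduced for the example, and the per-chip optimum is found by exhaustive search over a small voltage grid rather than by a real solver.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

# All constants below are illustrative assumptions, not values from the slides.
LEVELS = [-0.4, -0.2, 0.0, 0.2, 0.4]  # candidate body-bias voltages (V)
A = 3.0                                # assumed leakage model: L_i * exp(A * V_i)
B = 0.3                                # assumed delay model: d_i * (1 - B * V_i)
NOMINAL_DELAY = [1.0, 1.0, 1.0]        # three PEs on one critical path
DELAY_LIMIT = 3.0                      # latency constraint for that path


def path_delay(bias: Sequence[float]) -> float:
    """Delay of the single path under the assumed linear delay model."""
    return sum(d * (1.0 - B * v) for d, v in zip(NOMINAL_DELAY, bias))


def optimum_bias(leak: Sequence[float]) -> Tuple[float, ...]:
    """Exhaustively find the feasible bias vector with the lowest total leakage."""
    best_cost, best_bias = math.inf, None
    for bias in itertools.product(LEVELS, repeat=len(leak)):
        if path_delay(bias) > DELAY_LIMIT + 1e-9:
            continue  # violates the performance constraint
        cost = sum(l * math.exp(A * v) for l, v in zip(leak, bias))
        if cost < best_cost:
            best_cost, best_bias = cost, bias
    return best_bias


def monte_carlo(n_chips: int, seed: int = 0) -> List[Tuple[float, ...]]:
    """Generate n_chips chip instances and record the optimum bias of each."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chips):
        # One chip instance: lognormal leakage per PE (assumed distribution).
        leak = [rng.lognormvariate(0.0, 0.5) for _ in NOMINAL_DELAY]
        samples.append(optimum_bias(leak))
    return samples


samples = monte_carlo(200)
# Empirical body-bias distribution of PE 0 across the simulated chips.
pe0_hist = {v: sum(1 for s in samples if s[0] == v) for v in LEVELS}
print(pe0_hist)
```

Every chip instance needs its own optimization, which is why the slide calls this approach slow; the per-PE histograms are the body-bias distributions that the clustering step consumes.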
13Mathematical Formulation
- P: number of processors in the system (known)
- L_i: leakage power random variable for processor i (known)
- V_i: body-bias random variable for processor i (unknown)
- Goal
  - Minimize expected leakage after finest-grained body-biasing
  - Subject to system-level performance constraints
- Search space: all possible distributions of the random vector V!
  - Computationally infeasible
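14Adjustable Robust Optimization
- Idea from Adjustable Robust Optimization literature
- For a system with 4 PEs, for example

\[
\begin{bmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{bmatrix}
=
\begin{bmatrix}
S_{11} & S_{12} & S_{13} & S_{14} \\
S_{21} & S_{22} & S_{23} & S_{24} \\
S_{31} & S_{32} & S_{33} & S_{34} \\
S_{41} & S_{42} & S_{43} & S_{44}
\end{bmatrix}
\begin{bmatrix} L_1 \\ L_2 \\ L_3 \\ L_4 \end{bmatrix}
\]

  - S_ij: unknown real numbers
  - L: known random vector
- Actually, we use [equation] where [equation]
  - Intuition: inversely proportional dependence of body-bias on leakage power dissipation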
15Convex Program
- Problem Formulation
  [equations]
- Objective function is convex; minimization is over real numbers
- Constraints
  - Latency constraint for every path in the task graph
  - Yields a set of convex second-order cone constraints
- More details in the paper
17Clustering
- Solution of convex program yields joint distribution of body-bias voltages for each PE in the system
- Define a distance metric d_ij between each pair of PEs i and j
  - Measure of the desirability of PEs to be assigned to same island
  - d_ij has two terms: one accounting for large PEs, one for similarity of body-bias distributions
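Given a matrix of pairwise distances d_ij, one possible way to form K islands is bottom-up (agglomerative) merging. The slides do not say which clustering algorithm is used, so the average-linkage rule, the function names `average_distance` and `cluster_pes`, and the example distance matrix below are all assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Sequence


def average_distance(dist: Sequence[Sequence[float]],
                     a: List[int], b: List[int]) -> float:
    """Mean pairwise distance between the PEs of two clusters (average linkage)."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def cluster_pes(dist: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Merge the two closest clusters repeatedly until k islands remain."""
    if not 1 <= k <= len(dist):
        raise ValueError("k must be between 1 and the number of PEs")
    # Start from the finest granularity: one island per PE.
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(dist, clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # a < b, so index a is unaffected by the deletion
    return clusters


# Hypothetical distances for 4 PEs: PEs 0/1 are alike, PEs 2/3 are alike.
d = [
    [0, 1, 8, 9],
    [1, 0, 7, 8],
    [8, 7, 0, 2],
    [9, 8, 2, 0],
]
islands = cluster_pes(d, 2)
print(islands)
```

Starting from one island per PE matches the finest-granularity starting point of the partitioning slide; a large d_ij for PEs that dominate total leakage would keep them in islands of their own for longer.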
19Post-fabrication Tuning
- After fabrication, leakage power dissipation of each island is measured for each die sample
- Optimum body-bias voltage set to minimize leakage power dissipation for each die under performance constraints
- Body-bias voltages can only be controlled in discrete steps (e.g., 32 levels using 5-bit control by Tschanz et al. [JSSC'02])
- Greedy algorithm starts with optimum, continuous body-bias voltage assignments from a quadratic program (QP)
  - Initialize: each voltage set to closest higher discrete level
  - Iterate: in each iteration, voltage of island that provides greatest leakage reduction from moving to lower level is reduced
  - Stop: iterations stop when no move is possible without violating performance constraints
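The initialize / iterate / stop steps of the greedy procedure above can be sketched as below. This is a minimal sketch under stated assumptions: the continuous QP solution is taken as given, and the `leakage` and `feasible` callables, the toy models and the five-level voltage grid are hypothetical stand-ins for the measured leakage and the real performance constraints. The sketch also stops when no feasible move lowers leakage, which coincides with the slide's stop rule whenever leakage decreases with body-bias voltage.

```python
import bisect
import math
from typing import Callable, List, Sequence


def greedy_discretize(
    v_cont: Sequence[float],
    levels: Sequence[float],
    leakage: Callable[[Sequence[float]], float],
    feasible: Callable[[Sequence[float]], bool],
) -> List[float]:
    """Snap continuous island voltages to discrete levels, then lower greedily."""
    levels = sorted(levels)
    # Initialize: closest discrete level at or above each continuous value
    # (clamped to the top level if the value exceeds the grid).
    idx = [min(bisect.bisect_left(levels, v), len(levels) - 1) for v in v_cont]
    while True:
        current = leakage([levels[i] for i in idx])
        best_gain, best_island = 0.0, None
        for k in range(len(idx)):
            if idx[k] == 0:
                continue  # already at the lowest level
            trial = idx.copy()
            trial[k] -= 1
            volts = [levels[i] for i in trial]
            if not feasible(volts):
                continue  # move would violate a performance constraint
            gain = current - leakage(volts)
            if gain > best_gain:
                best_gain, best_island = gain, k
        if best_island is None:
            break  # Stop: no feasible, leakage-reducing move remains
        idx[best_island] -= 1  # Iterate: take the most beneficial move
    return [levels[i] for i in idx]


# Toy models for two islands (assumptions, not from the slides).
def toy_leakage(v: Sequence[float]) -> float:
    return 2.0 * math.exp(2.0 * v[0]) + 1.0 * math.exp(2.0 * v[1])


def toy_feasible(v: Sequence[float]) -> bool:
    return sum(1.0 - 0.4 * x for x in v) <= 2.15


levels = [-0.5, -0.25, 0.0, 0.25, 0.5]
assigned = greedy_discretize([0.1, -0.3], levels, toy_leakage, toy_feasible)
print(assigned)
```

Rounding up first keeps the starting point on the safe side of the performance constraints under the assumed models, so each later step only has to check whether one more downward move is still feasible.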
21Experimental Setup
- Results based on telecom and auto-industry benchmarks from the E3S embedded benchmark suite
  - Mapped to heterogeneous set of processors also from E3S data
- 3σ = 15% of the mean of gate length, spatial correlations die out at half the die-length [Friedberg et al., ISQED'05]
- All results compared with Monte-Carlo (MC) and design time only (Static) method
22Results: Accuracy of Proposed vs. MC
[Figure: results for auto-16 benchmark; panels: partitioning using Proposed, partitioning using MC]
23Results: Leakage Savings
25Conclusion and Future Work
- Efficient and accurate algorithm to partition an application specific embedded system into body-bias islands
  - Average 28% reduction in mean and 48% reduction in s.d. of leakage power dissipation for both Proposed and MC, with a 38X speed-up in run-time over MC
  - Static performs particularly poorly in reducing s.d. of leakage power dissipation, even with increasing body-bias islands
  - Solution with 32 voltage levels within 2% of solution with continuous range of body-bias values
- Future work
  - Consider joint task-mapping, floorplanning and body-bias island partitioning to minimize leakage variations