Title: Process Variation: Modeling, Impact and Reduction Techniques
1Process Variation Modeling, Impact and Reduction
Techniques
- Yu Ching Chang, Sabiha Hasan
2References
- "Impact of Die-to-Die and Within-Die Parameter Fluctuations on the Maximum Clock Frequency Distribution for Gigascale Integration" by Keith A. Bowman, Steven G. Duvall (Intel Corp., Santa Clara, CA), and James D. Meindl. Georgia Institute of Technology, Atlanta, GA.
- "Parameter Variations and Impact on Circuits and Microarchitecture" by Shekhar Borkar, Tanay Karnik, Siva Narendra, James Tschanz, Ali Keshavarzi, and Vivek De. Circuit Research, Intel Labs.
3Impact of Die-to-Die and Within-Die Parameter
Fluctuations on the Maximum Clock Frequency
Distribution for Gigascale Integration
4Process Variation
- Die-to-Die
  - Affects every device on a chip equally
  - Lot-to-lot and wafer-to-wafer
    - E.g., processing temperature, equipment properties, wafer polishing, placement
  - A portion of within-wafer
    - E.g., resist thickness across the wafer
- Within-Die
  - Causes non-uniformity of electrical characteristics across the chip
  - Systematic
    - E.g., aberrations in the stepper lens
  - Random
    - E.g., the placement of dopant atoms in the device channel region
5The Impact of Process Variation
- Over-estimation impacts the design effort
  - Increase in design time
  - Increase in die size
  - Rejection of otherwise good designs
  - Missed market window
- Under-estimation impacts the manufacturing effort
  - Compromised product performance
  - Loss in overall yield
  - Increase in the silicon debug time
6Contributions of the FMAX Distribution Model
- Traditionally, die-to-die variations were the major concern, but as transistor feature size scales down, within-die variations become increasingly significant
- Both die-to-die and within-die parameter fluctuations significantly influence the FMAX distribution
  - Within-die primarily impacts the mean
  - Die-to-die determines the majority of the FMAX variance
- The model closely follows wafer sort data for a 0.25-µm microprocessor
- The impact of parameter fluctuations is forecast for the 180, 130, 100, 70 and 50-nm technology generations
7FMAX Distribution Model
8 Statistically Generated Critical Path Delay
Distribution for D2D and WID Variations
                      Path 1   Path 2   Path 3   Nominal Path
Normalized µTcp        1.00     0.77     0.51     1.00
D2D σTcp/µTcp (%)      8.63     8.59     9.74     8.99
WID σTcp/µTcp (%)      2.65     3.19     3.32     3.05
- The critical path delay density functions can be modeled as normal distributions
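The equation that accompanied this bullet did not survive on the slide. As a reminder only, a normal density with the table's mean µTcp and standard deviation σTcp would take the standard form below; the function name f_{T_{cp}} is an assumed notation, not taken from the slide.

```latex
f_{T_{cp}}(t) = \frac{1}{\sigma_{T_{cp}}\sqrt{2\pi}}
  \exp\!\left( -\frac{(t - \mu_{T_{cp}})^2}{2\,\sigma_{T_{cp}}^2} \right)
```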
9FMAX Distribution Model
10FMAX Distribution Model
11The Impact of WID Variations on Maximum Critical
Path Delay Distribution
- A chip may contain many critical paths, all of which must satisfy the worst-case constraint
- The D2D distribution is independent of the number of critical paths, since it affects each path on the chip equally
  - Only one distribution is needed for all critical paths
- WID variations can have non-uniform effects on different critical paths and result in different distributions for different paths
  - Completely dependent paths (correlation = 1): only one distribution is required
  - Not completely dependent paths (0 ≤ correlation < 1): the different paths must be statistically combined
12The WID Maximum Critical Path Delay Distribution
- The probability of one critical path satisfying tmax
- The probability of satisfying tmax for a chip of Ncp critical paths
- The chip's WID maximum critical path delay density function
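The three expressions these bullets introduce are missing from the slide. A plausible reconstruction, assuming the Ncp critical paths are statistically independent and share one delay density f_cp(t) with cumulative distribution F_cp(t), is sketched below. The symbols F_cp, f_cp and f_WID are assumed names.

```latex
\begin{align}
P(T_{cp} \le t_{max}) &= F_{cp}(t_{max}) = \int_{-\infty}^{t_{max}} f_{cp}(t)\,dt \\
P(T_{cp,max} \le t_{max}) &= \left[ F_{cp}(t_{max}) \right]^{N_{cp}} \\
f_{WID}(t_{max}) &= N_{cp}\, f_{cp}(t_{max}) \left[ F_{cp}(t_{max}) \right]^{N_{cp}-1}
\end{align}
```

The third line is the derivative of the second with respect to tmax, which is why a larger Ncp pushes the maximum delay toward the slow tail.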
13The Impact of WID Variations on Maximum Critical
Path Delay Distribution
14FMAX Distribution Model
15Combining the D2D and WID Maximum Critical Path
Delay Distributions
- The deviations in delay from Tcp,nom due to D2D and WID variations
- The maximum critical path delay
- The maximum critical path delay density function
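The slide's equations are not preserved. One way to write the three quantities, assuming the D2D and WID deviations are independent so that their densities combine by convolution, is given below. The names T_{cp,D2D}, T_{cp,WID} and the Δ symbols are assumed notation.

```latex
\begin{align}
\Delta T_{D2D} &= T_{cp,D2D} - T_{cp,nom}, \qquad
\Delta T_{WID} = T_{cp,WID} - T_{cp,nom} \\
T_{cp,max} &= T_{cp,nom} + \Delta T_{D2D} + \Delta T_{WID} \\
f_{T_{cp,max}}(t) &= \int_{-\infty}^{\infty}
  f_{\Delta T_{D2D}}(\tau)\, f_{\Delta T_{WID}}(t - T_{cp,nom} - \tau)\, d\tau
\end{align}
```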
16FMAX Distribution Model
17Mapping the Max Critical Path Delay Distribution
to the Maximum Clock Frequency Distribution
- The max clock frequency
- The relationship between the critical path
probability and the FMAX probability
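The mapping can be illustrated numerically. The sketch below assumes FMAX = 1/Tcp,max, so that P(FMAX ≥ f) = P(Tcp,max ≤ 1/f), and assumes normal D2D and WID deviations with independent critical paths. The function name `simulate_fmax` and all parameter values are hypothetical; this is a Monte Carlo illustration, not the closed-form model from the paper.

```python
import random
import statistics

def simulate_fmax(n_chips, n_paths, sigma_d2d, sigma_wid, t_nom=1.0, seed=0):
    """Return one normalized FMAX sample per simulated chip."""
    rng = random.Random(seed)
    fmax = []
    for _ in range(n_chips):
        # D2D deviation: one draw per chip, shared by every critical path.
        d_d2d = rng.gauss(0.0, sigma_d2d * t_nom)
        # WID deviation: the slowest of n_paths independent draws limits the chip.
        d_wid = max(rng.gauss(0.0, sigma_wid * t_nom) for _ in range(n_paths))
        t_max = t_nom + d_d2d + d_wid
        fmax.append(1.0 / t_max)  # assumed mapping: FMAX = 1 / Tcp,max
    return fmax

if __name__ == "__main__":
    for label, s_d2d, s_wid in [("D2D only", 0.09, 0.0),
                                ("WID only", 0.0, 0.03),
                                ("D2D + WID", 0.09, 0.03)]:
        samples = simulate_fmax(2000, 100, s_d2d, s_wid)
        print(f"{label:10s} mean={statistics.mean(samples):.3f} "
              f"sigma={statistics.stdev(samples):.3f}")
```

With these assumed sigmas, the WID term mostly shifts the mean downward while the D2D term supplies most of the spread, in line with the model's main conclusion.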
18FMAX Model Verification
19Generic Critical Path Model
- Motivation
  - Critical path delay distributions for D2D and WID variations are calculated using statistical simulators
  - Process files are empirically calculated to fit measured I-V data
  - Unclear how these parameters scale for future generations
- The critical path delay
- For the critical path delay distribution, a WID fluctuation model empirically derived through an analysis of manufacturing data is used
  - Represents systematic within-die variations
  - Device-to-device correlation as a function of the distance between devices
  - Correlation is significantly influenced by specific manufacturing capabilities
20Results of GCP Analysis
- Completely dependent gates
- Completely independent gates
- The GCP model establishes boundaries of the
actual FMAX distribution with two extreme cases
of completely systematic and completely random
WID fluctuations
21Impact of PV on Future FMAX Distributions
22Summary
- A model for the maximum clock frequency (FMAX) distribution is presented
- Model predictions agree closely with measured data in mean, variance and shape
- The model reveals that within-die variations primarily impact the FMAX mean, while die-to-die variations determine the variance
23Outline
- Variations
  - Process, supply voltage, and temperature
- Impact of variations on circuits and microarchitecture
- Variation tolerance and reduction
  - Process, circuit, and microarchitecture techniques
- Summary
24Frequency and Leakage Current
[Figure: frequency versus leakage current, 0.18 micron, 1000 samples; annotated spreads: 30% in frequency, 20X in leakage]
25Vt Distribution
[Figure: Vt distribution, 0.18 micron, 1000 samples; annotation: 30mV]
26Frequency Distribution
27Isb Distribution
28Supply Voltage Variation
29Supply Voltage Variation
- Max Vcc is specified as a reliability limit for a process, and Min Vcc is required for the target performance.
- Variations in switching activity across the die and diversity in the type of logic result in uneven power dissipation across the die -> uneven supply voltage distribution -> temperature hotspots across the die -> subthreshold leakage variation across the die.
- Power delivery impedance does not scale with Vcc because packaging and platform technology do not follow the scaling trend of the CMOS process. Therefore current delivery drops.
- From the figure we see that a droop in the voltage would lead to a degradation in performance.
30Temperature Variation
[Figure: on-die temperature map; Cache: 70ºC, Core: 120ºC]
- On-chip temperature variations have always posed a challenge for performance and packaging.
- Both device and interconnect performance are affected by high temperature, causing performance degradation.
- Temperature variation may also cause performance mismatches between communicating blocks on the same chip, leading to logic or functional failures.
31Circuit Design Tradeoffs
- Higher probability of target frequency with
  - Larger transistor sizes
  - Higher Low-Vt usage
- But with power penalty
32Impact of Critical Paths
- With an increasing number of critical paths
  - Both σ and µ become smaller
  - Lower mean frequency
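This trend can be reproduced with a small experiment: if chip frequency is set by the slowest of N independent, normally distributed path delays, both the mean and the spread of frequency shrink as N grows. The function name `fmax_stats` and the 3% path sigma are assumptions chosen only for illustration.

```python
import random
import statistics

def fmax_stats(n_paths, sigma_wid=0.03, n_chips=2000, seed=1):
    """Mean and standard deviation of normalized FMAX for n_paths critical paths."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chips):
        # The chip runs at the speed of its slowest (largest-delay) path.
        worst_delay = max(rng.gauss(1.0, sigma_wid) for _ in range(n_paths))
        samples.append(1.0 / worst_delay)
    return statistics.mean(samples), statistics.stdev(samples)

if __name__ == "__main__":
    for n in (1, 10, 100):
        mu, sigma = fmax_stats(n)
        print(f"paths={n:4d}  mean FMAX={mu:.3f}  sigma={sigma:.4f}")
```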
33Impact of Logic Depth
- As the number of logic gates that determine the frequency of operation decreases, the impact of variation in device parameters increases.
- The figure shows that for a 49-stage critical path, the spread of the WID delay distribution is 4x smaller than that of the device saturation current, whereas for a test chip with a 16-stage critical path the two are comparable.
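The averaging effect behind this observation can be sketched under an idealized assumption: every gate delay varies independently with the same relative sigma, so the relative spread of an n-stage path falls roughly as 1/sqrt(n). The function name `relative_delay_spread` and the 10% gate sigma are hypothetical. Measured silicon also includes systematic, correlated variation, so the ratios quoted above need not match this simple model.

```python
import random
import statistics

def relative_delay_spread(n_stages, sigma_gate=0.10, n_paths=4000, seed=3):
    """sigma/mean of path delay when each gate delay varies independently."""
    rng = random.Random(seed)
    delays = [sum(rng.gauss(1.0, sigma_gate) for _ in range(n_stages))
              for _ in range(n_paths)]
    return statistics.stdev(delays) / statistics.mean(delays)

if __name__ == "__main__":
    for depth in (16, 49):
        print(f"{depth}-stage path: sigma/mean = {relative_delay_spread(depth):.4f}")
```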
34µArchitecture Tradeoffs
- Higher target frequency with
  - Shallow logic depth
  - Larger number of critical paths
- But with lower probability
35Variation Tolerance and Reduction
36Forward Body Bias
Router chip with body bias
[Figure: two panels plotting normalized operating frequency (0 to 1.5) against forward body bias (0 to 600 mV); panel annotations: 1.2V, 110ºC, 450mV]
FBB increases circuit frequency and SD leakage
37Reverse Body Bias
- Method for reducing leakage current.
- The figure shows the variation of ICC with RBB for Lnom, Lwc and the actual measured chip ICC. The optimal RBB (minimum leakage) is at 500mV.
- At higher values of RBB, junction leakage current increases and overall power goes up.
- The effectiveness of RBB also decreases as channel length gets smaller (due to short-channel effects) and with lower channel doping.
RBB reduces SD leakage
Less effective with shorter L, lower VT, scaling
38Adaptive Body Bias--Results
[Figure: accepted die (axis ticks 0 to 100) plotted along an axis of increasing frequency; annotations: "Too Leaky: Apply RBB" and "Too Slow: Apply FBB"]
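The binning idea in this figure can be sketched as a simple rule: dies that are too slow get forward body bias, and dies that are too leaky get reverse body bias. Everything numeric below (the limits `F_MIN` and `LEAK_MAX`, the bias scaling factors, and the die model) is a hypothetical assumption for illustration and is not data from the slides.

```python
import math
import random

F_MIN = 1.0      # hypothetical minimum acceptable normalized frequency
LEAK_MAX = 2.0   # hypothetical maximum acceptable normalized leakage

def apply_abb(freq, leak):
    """Pick a body bias for one die and return (bias, freq, leak).
    The scaling factors are illustrative assumptions, not measured data."""
    if freq < F_MIN:
        # Too slow: forward bias raises frequency but also leakage.
        return "FBB", freq * 1.05, leak * 1.5
    if leak > LEAK_MAX:
        # Too leaky: reverse bias cuts leakage at a small frequency cost.
        return "RBB", freq * 0.97, leak * 0.5
    return "none", freq, leak

def accepted(freq, leak):
    return freq >= F_MIN and leak <= LEAK_MAX

def yields(n_dies=2000, seed=2):
    """Fraction of accepted dies before and after adaptive body bias."""
    rng = random.Random(seed)
    before = after = 0
    for _ in range(n_dies):
        # Assumed die model: a lower threshold voltage makes a die faster and leakier.
        dvt = rng.gauss(0.0, 1.0)
        freq = 1.02 - 0.03 * dvt
        leak = math.exp(-0.8 * dvt)
        before += accepted(freq, leak)
        _, f2, l2 = apply_abb(freq, leak)
        after += accepted(f2, l2)
    return before / n_dies, after / n_dies

if __name__ == "__main__":
    b, a = yields()
    print(f"accepted without ABB: {b:.1%}, with ABB: {a:.1%}")
```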
39Vcc Variation Reduction
- On-die decoupling capacitors reduce ΔVcc
- Cost: area and gate oxide leakage concerns
40Temperature Control
[Figure: two panels of temperature versus time (usec), showing Tmax, frequency, power and the throttle point]
- When temperature exceeds the threshold
  - Lower freq (activity)
  - Lower Vcc
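The throttling rule above can be illustrated with a toy first-order thermal model: temperature relaxes toward a steady state set by power, taken here as proportional to f * Vcc^2, and the controller lowers frequency and Vcc whenever the threshold is exceeded. The function name `simulate_throttle` and every constant are assumptions chosen only to make the behavior visible.

```python
def simulate_throttle(steps=2000, t_ambient=40.0, t_threshold=100.0,
                      heating=80.0, tau=50.0, throttling=True):
    """Return the temperature trace of a first-order thermal model.
    Steady-state temperature is t_ambient + heating * power, with power
    normalized to f * Vcc**2. All constants are illustrative assumptions."""
    temp = t_ambient
    trace = []
    for _ in range(steps):
        if throttling and temp > t_threshold:
            freq, vcc = 0.5, 0.8   # throttle: lower frequency (activity) and Vcc
        else:
            freq, vcc = 1.0, 1.0   # nominal operating point
        power = freq * vcc ** 2
        target = t_ambient + heating * power
        temp += (target - temp) / tau   # relax toward the steady-state target
        trace.append(temp)
    return trace

if __name__ == "__main__":
    print(f"peak with throttling:    {max(simulate_throttle()):.1f} C")
    print(f"peak without throttling: {max(simulate_throttle(throttling=False)):.1f} C")
```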
41Summary
- Parameter variations will become worse with technology scaling.
- Robust variation-tolerant circuits and microarchitectures are needed.
- Multi-variable design optimizations must consider parameter variations.
- Major shift from deterministic to probabilistic design.