Title: Life is CMOS: Why Chase the Life After
1 Life is CMOS: Why Chase the Life After?
- George Sery
- Shekhar Borkar
- Vivek De
- Intel Corporation
2 Outline
- Integration, performance, and power
- Device technology scaling challenges
- Leakage and leakage control techniques
- Micro architecture directions
- Summary
3 Transistors
1B transistor integration capability
4 Frequency
5 Performance
Applications will demand TIPS performance
6 Power Dissipation
Unconstrained power could reach 1,000s of watts
7 Outline
- Integration, performance, and power
- Device technology scaling challenges
- Leakage and leakage control techniques
- Micro architecture directions
- Summary
8 Physical Gate Length Trend
Facilitated by the evolution of 248 nm, 193 nm, 157 nm, and EUV lithography
9 Supply Voltage Scaling
[Figure: supply voltage trend, with data labels 2.5V, 1.8V, 1.5V, 1.3V, 1.1V, 0.85V, 0.7V, 0.6V]
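To make the effect of this voltage trend concrete, the sketch below applies the standard dynamic-power relation P ∝ C·V²·f to the supply voltages listed on this slide. It assumes switched capacitance and frequency stay fixed between generations, a simplification the slides do not make, and the function name is illustrative only.

```python
# Relative dynamic switching power under supply-voltage scaling.
# Assumption: P_dyn ~ C * V^2 * f with C and f held constant,
# so only the V^2 term changes between generations.

SUPPLY_VOLTAGES = [2.5, 1.8, 1.5, 1.3, 1.1, 0.85, 0.7, 0.6]  # volts, from the slide


def relative_dynamic_power(vdd, vdd_ref):
    """Return dynamic power at vdd relative to vdd_ref (same C and f)."""
    return (vdd / vdd_ref) ** 2


if __name__ == "__main__":
    ref = SUPPLY_VOLTAGES[0]
    for v in SUPPLY_VOLTAGES:
        print(f"{v:4.2f} V -> {relative_dynamic_power(v, ref):.3f}x of the {ref} V power")
```

In practice capacitance and frequency also change with each generation, so this isolates only the voltage term.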
10 Delay Scaling Trends
1.45 THz
0.85V
Vcc = 0.75V
11 15nm NMOS Transistor
2.63 THz at 0.8V
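The terahertz figures on the two slides above can be read as switching delays. The sketch below assumes each quoted frequency is the reciprocal of a single transistor's switching delay (not a circuit clock rate); that interpretation and the helper name are assumptions, not something the slides state.

```python
# Convert an intrinsic transistor switching frequency to a switching delay.
# Assumption: delay is taken as the reciprocal of the quoted frequency.


def switching_delay_ps(freq_thz):
    """Return the switching delay in picoseconds for a frequency in THz."""
    return 1.0 / freq_thz  # 1 / THz = ps


if __name__ == "__main__":
    for f in (1.45, 2.63):  # THz values quoted on slides 10 and 11
        print(f"{f} THz -> {switching_delay_ps(f):.2f} ps")
```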
12 Source/Drain Leakage (Ioff)
13 Gate Leakage
40% ΔVgs → 5X IG
14 THz Transistor Architecture
High K gate
Fully Depleted Channel
Raised S/D
- Eliminates subsurface leakage
- Solves high resistance
- Minimizes gate leakage
- Eliminates floating body effect
- Minimizes soft error rates
- 50% lower junction capacitance than PD SOI
15 Raised S/D with Depleted Substrate
16 High K Reduces Gate Leakage
[Figure: gate leakage current density J_OX at 1V (A/cm²), log scale from 1E-8 to 1E1, versus oxide capacitance C_ox (μF/cm²), 0.50 to 3.50, for TiO2, HfO2, Ta2O5, ZrO2, SiO2, and Al2O3]
17 High-K Gate Dielectric Formed Using Atomic Layer Deposition
Step 4
- Sequential introduction of precursors ZrCl4(g) and H2O(g)
- Surface reaction between the substrate and each precursor until saturation
18 DST Outperforms PD SOI
Ideal
- DST achieves steeper sub-threshold slope and lower Ioff than both PD SOI and bulk Si
- Up to 100x lower leakage than partially depleted SOI
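The link between sub-threshold slope and Ioff can be sketched numerically. The example below uses the textbook relation Ioff ∝ 10^(-VT/S) and the ideal room-temperature slope (kT/q)·ln 10; the threshold voltage and the two slope values are hypothetical illustration numbers, not data from the slides.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def ideal_slope_mv_per_decade(temp_k=300.0):
    """Ideal (minimum) sub-threshold slope: (kT/q) * ln(10), in mV/decade."""
    return (K_B * temp_k / Q_E) * math.log(10) * 1e3


def ioff_ratio(vt_mv, slope_a, slope_b):
    """How many times lower Ioff is with slope_a than with slope_b.

    Assumes Ioff ~ 10 ** (-VT / S) with the same VT and prefactor for both.
    """
    return 10 ** (vt_mv / slope_a - vt_mv / slope_b)


if __name__ == "__main__":
    print(f"ideal slope at 300 K: {ideal_slope_mv_per_decade():.1f} mV/decade")
    # Hypothetical values: VT = 300 mV, slopes of 65 and 85 mV/decade.
    print(f"Ioff ratio: {ioff_ratio(300, 65, 85):.1f}x")
```

A steeper slope (smaller mV/decade) means the current falls by more decades between VT and zero gate voltage, which is why a device closer to the ideal slope leaks less at the same VT.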
19 Raised Source and Drain Reduces Parasitic Resistances and Improves IDSAT
Raised S/D, 1.3V
No raised S/D, 1.3V
30% increase in drive current!
20 DST Enables Future Voltage Scaling
DST
BULK CMOS
21 Outline
- Integration, performance, and power
- Device technology scaling challenges
- Leakage and leakage control techniques
- Micro architecture directions
- Summary
22 Leakage Power
Excessive sub-threshold leakage power
23 Dual VT Design Technique
Leakage 3X smaller (active and standby)
No performance loss
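24 Leakage Control
Stack Effect
Body Bias
Sleep Transistor
[Figure: circuit diagrams for the three techniques. Labels: Vdd, Logic Block, and Equal Loading; body bias with Vbp at +Ve and Vbn at -Ve. Annotated leakage reductions: 2-10X and 2-1000X.]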
25 Effectiveness of RBB
Reverse body bias (RBB) is less effective at shorter L and lower VT
A. Keshavarzi et al., 1999 and 2001 International Symp. on Low Power Electronics and Design (ISLPED)
26 S/D Leakage of Stacks
Stack leakage is 5-10X smaller
S. Narendra et al., 2001 International Symp. on Low Power Electronics and Design (ISLPED)
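Why two stacked off transistors leak less than one can be illustrated with a small numerical model. The sketch below is not the model from the cited paper; it assumes a generic sub-threshold current expression with DIBL and body-effect terms, and every parameter value (n, dibl, body, the 1.0 V supply) is hypothetical. It solves for the intermediate node voltage at which both devices carry the same current.

```python
import math

V_T = 0.0259  # thermal voltage kT/q near room temperature, volts


def subthreshold_current(vgs, vds, vsb, n=1.5, dibl=0.1, body=0.2):
    """Normalized sub-threshold current of one NMOS device.

    The prefactor (including exp(-VT0 / (n * V_T))) is normalized to 1.
    n, dibl, and body are hypothetical illustration values.
    """
    exponent = (vgs + dibl * vds - body * vsb) / (n * V_T)
    return math.exp(exponent) * (1.0 - math.exp(-vds / V_T))


def stack_leakage(vdd, iterations=60):
    """Solve for the intermediate node of a two-high off stack by bisection."""
    lo, hi = 0.0, vdd
    for _ in range(iterations):
        vx = 0.5 * (lo + hi)
        top = subthreshold_current(-vx, vdd - vx, vx)  # gate at 0 V, source at vx
        bottom = subthreshold_current(0.0, vx, 0.0)    # source grounded
        if top > bottom:
            lo = vx  # node must rise to balance the two currents
        else:
            hi = vx
    vx = 0.5 * (lo + hi)
    return vx, subthreshold_current(0.0, vx, 0.0)


if __name__ == "__main__":
    vdd = 1.0  # hypothetical supply
    single = subthreshold_current(0.0, vdd, 0.0)
    vx, stacked = stack_leakage(vdd)
    print(f"intermediate node: {vx:.3f} V, reduction: {single / stacked:.1f}x")
```

The intermediate node settles at a small positive voltage, which gives the upper device a negative gate-to-source voltage and reverse body bias, and gives the lower device a much smaller drain-to-source voltage. All three effects cut the current.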
27 Stack Forcing
Delay Penalty
Leakage Reduction
Equal Loading
Stack Forcing Vs Longer L
Circuit technique provides additional VTs
28 Exploiting Natural Stacks
32-bit Kogge-Stone adder
Y. Ye et al., 1998 Symp. on VLSI Circuits
29 Outline
- Integration, performance, and power
- Device technology scaling challenges
- Leakage and leakage control techniques
- Micro architecture directions
- Summary
30 Performance Efficiency of μArch
- Implications (in the same technology)
- New microarchitecture takes 2-3X the die area of the last μArch
- Provides 1.5-1.7X the performance of the last μArch
We are on the wrong side of a Square Law
31 Design Efficiency: μArch
In the same process technology, compare Scalar → Super-scalar → Dynamic → Netburst
- 2-3X growth in area
- 1.4X growth in integer performance
- 1.7X growth in total performance
- 2-2.5X growth in power
Pollack's Rule in action: power inefficiency
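Pollack's Rule is commonly stated as performance growing with roughly the square root of the area (complexity) spent on a core. The sketch below applies that square-root relation to the 2-3X area growth quoted above; the function name is illustrative.

```python
import math


def pollack_performance(area_ratio):
    """Performance gain predicted by Pollack's Rule: sqrt(area growth)."""
    return math.sqrt(area_ratio)


if __name__ == "__main__":
    for area in (2.0, 3.0):  # 2-3X area growth quoted in the slides
        perf = pollack_performance(area)
        print(f"{area:.0f}X area -> {perf:.2f}X performance, "
              f"{perf / area:.2f}X performance per unit area")
```

The square roots of 2 and 3 are about 1.41 and 1.73, in line with the 1.4X to 1.7X performance growth on the slides, while performance per unit area falls below 1.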
32 Improve μArch Efficiency
Thermals and power delivery are designed for full HW utilization
[Figure: execution timelines. Single thread: ST runs, then waits for memory. Multi-threading: MT1, MT2, and MT3 run in turn while the others wait for memory.]
Multi-threading improves performance without impacting thermals and power delivery
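The utilization argument on this slide can be sketched with a simple interleaving model. It assumes every thread alternates a fixed amount of compute with a fixed memory wait and that thread switching costs nothing; the cycle counts are hypothetical and the function name is illustrative.

```python
def core_utilization(threads, compute_cycles, wait_cycles):
    """Fraction of time the core is busy with `threads` interleaved threads.

    Assumes each thread alternates compute_cycles of work with wait_cycles of
    memory stall, and that switching threads is free. Capped at 1.0.
    """
    per_thread = compute_cycles / (compute_cycles + wait_cycles)
    return min(1.0, threads * per_thread)


if __name__ == "__main__":
    # Hypothetical workload: 100 cycles of work per 200 cycles of memory wait.
    for n in (1, 2, 3):
        print(f"{n} thread(s): {core_utilization(n, 100, 200):.2f} utilization")
```

Because the hardware is already provisioned for full utilization, filling the memory-wait gaps raises throughput without raising the peak power the design must handle.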
33 Special-Purpose HW
- Special-purpose performance → more MIPS/mm²
- SIMD integer and FP instructions in several ISAs
- Integration of other platform components, e.g. memory controller, graphics
- Special-purpose logic, programmable logic, and separately programmable engines
Improve power efficiency with Valued Performance
34 Exploit Memory: Low PD
- Large on-die caches provide
- Increased data bandwidth and reduced latency
- Hence, higher performance for much lower power
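The latency benefit of a larger on-die cache can be illustrated with the standard average memory access time formula. The sketch covers latency only, not power, and all cycle counts and miss rates are hypothetical values chosen for illustration.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical numbers in cycles: a larger cache lowers the miss rate
    # at the cost of a slightly longer hit time.
    small = average_access_time(hit_time=10, miss_rate=0.10, miss_penalty=200)
    large = average_access_time(hit_time=12, miss_rate=0.04, miss_penalty=200)
    print(f"small cache: {small:.1f} cycles, large cache: {large:.1f} cycles")
```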
35 Summary
- CMOS scaling is alive and well
- Lower power, terahertz operation, scalable beyond the 65nm node
- Key enabling elements demonstrated
- Depleted Substrate Transistor with raised source/drain
- High K gate, 15nm CMOS gate length
- Design and microarchitecture approaches continue to be key enablers
- Significant focus on leakage and speed-power optimization techniques