Title: Energy Recovery from High-frequency Clocks using DC-DC Converters
1 Energy Recovery from High-frequency Clocks using DC-DC Converters
Mehdi Alimadadi, Samad Sheikhaei, Guy Lemieux, Shahriar Mirabbasi, William Dunford
University of British Columbia, Canada
Patrick Palmer
University of Cambridge, UK
2 Problem
- Clock power in high-performance CPUs
CPU               Year (process)   Clock     Power     % Power for Clock   Clock Power
Intel McKinley    2002 (180nm)     1 GHz     130W      33%                 43W
Intel Montecito   2005 (90nm)      2.5 GHz   85W       30%                 25W
IBM Power 6       2007 (65nm)      5 GHz     > 100W    22%                 > 22W
- Cause
- Charge big clock capacitor Cclk with energy
- Discharge Cclk energy to GND (WASTE IT!!)
- Repeat every clock cycle
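The charge-and-dump cycle above can be put into numbers with the standard dynamic-power relation P = Cclk · Vdd² · f. The supply delivers Cclk · Vdd² per cycle: half is lost while charging, and the other half (½ · Cclk · Vdd²) is stored on Cclk and then dumped to GND. The sketch below is only illustrative. The 20 pF and 3 GHz values are the ones quoted later in this deck, while Vdd = 1 V is an assumption (the deck does not state the test-chip supply voltage), and the function name is made up for this example.

```python
def clock_power_w(c_clk_f, vdd_v, f_clk_hz):
    """Average supply power needed to charge c_clk to vdd once per clock cycle.

    Standard CV^2f relation: the supply delivers C*V^2 per cycle; half is
    dissipated while charging and half is stored on the capacitor, then
    dissipated when the capacitor is discharged to ground.
    """
    return c_clk_f * vdd_v ** 2 * f_clk_hz


c_clk = 20e-12   # 20 pF clock load (value quoted later in the deck)
f_clk = 3e9      # 3 GHz clock (value quoted later in the deck)
vdd = 1.0        # ASSUMPTION: 1 V supply, not stated for the test chips

p_total = clock_power_w(c_clk, vdd, f_clk)
p_stored = 0.5 * p_total  # portion stored on Cclk each cycle, then dumped to GND

print(f"clock power: {p_total * 1e3:.0f} mW")               # 60 mW
print(f"stored then dumped to GND: {p_stored * 1e3:.0f} mW")  # 30 mW
```

Under these assumptions, the stored half is the part a converter could in principle redirect to a load instead of GND.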
3 Primary Contribution of This Work
- Primary contribution
- Discharge Cclk using DC-DC converter instead of GND
- Use converter to power useful load (Rload)
- Integrated clock drivers with DC-DC converters
- Net savings in power
[Figure: integrated clock driver and DC-DC converter powering a useful load, with voltage feedback (for regulation)]
4 Summary Results
- Explore 3 main DC-DC power converter topologies
- Buck converter: our previous work (ISSCC 2007)
- Boost converter: this paper (ISVLSI 2008)
- Buck-boost converter: this paper (ISVLSI 2008)
- 90nm layouts, 3GHz operation, < 0.3mm²
Converter                     Clock-only power (input)   Extra power to operate converter (input)   Converter output power   % clock energy recovered
Buck converter (ISSCC 2007)   40mW                       16mW                                       26mW                     50%
Boost converter               100mW                      25mW                                       28mW                     20%
Buck-boost converter          100mW                      72mW                                       48mW                     30%
5 Background
6 Background: Typical Clocking Architecture
[Figure: typical clocking architecture. Labels: clock source; Level 1 and Level 2 H-tree; Level 3 gaters; final drivers; final H-tree; bottom mesh]
7 Background: Typical Clocking Architecture
- Clock distribution
- Majority of energy used by final drivers
- Levels 1, 2
- H-trees
- Tunable delays (CVDs) to eliminate skew
- Low-swing, differential → low power, noise immunity
- 5W of power
- Level 3
- Gaters reduce clock activity 50-85% (Power6)
- Can't eliminate all activity → still need a clock to compute
- Final clock drivers
- Full-rail swing → tapered inverters drive hundreds of latches, high power
- H-tree with ends shorted by mesh → low skew, high power
- 15W to 40W of power
8 Background: Reducing Clock Power
- Clock distribution
- Low-swing (differential) signals
- Final drivers need full-rail
- Resonant clocking (saves 80%)
- Final drivers need square clock
- Final clock drivers
- Adiabatic switching
- Low-performance, < 100MHz
- Double-edge clocking
- Feasible, but complex flip-flops, larger loads
- Compatible with energy recovery in this paper
9 Background: Switch Mode Power Supplies
- Basic DC-DC converter topologies
- Buck
- Step down
- 0 ≤ Vout ≤ VDD
- Boost
- Step up
- VDD ≤ Vout
- Buck-boost
- Negative step up/down
- Vout ≤ 0
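For reference, the output ranges listed above follow from the standard textbook conversion ratios of ideal (lossless, continuous-conduction) converters. The sketch below evaluates those textbook ratios only. It is not taken from this deck, and it does not model the integrated clock-driven circuits, whose own 0th-order results appear on their respective slides. The function name and the sample duty cycle are illustrative.

```python
def ideal_vout(topology, duty, vdd):
    """Textbook ideal output voltage for a conventional DC-DC converter.

    Assumes a lossless converter in continuous conduction with duty cycle
    0 <= duty < 1. These are the standard ratios, not results from the deck.
    """
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty must satisfy 0 <= duty < 1")
    if topology == "buck":
        return duty * vdd                   # step down: 0 <= Vout <= Vdd
    if topology == "boost":
        return vdd / (1.0 - duty)           # step up: Vout >= Vdd
    if topology == "buck-boost":
        return -duty * vdd / (1.0 - duty)   # inverting: Vout <= 0
    raise ValueError(f"unknown topology: {topology}")


for name in ("buck", "boost", "buck-boost"):
    print(name, ideal_vout(name, duty=0.5, vdd=1.0))
# buck 0.5
# boost 2.0
# buck-boost -1.0
```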
10 Background: Switch Mode Power Supplies
- DC-DC buck converter
- CMOS inverter as power switches
- Implementation of zero-voltage switching (ZVS)
- Turn on NMOS when Vinv = 0
- Turn on PMOS when Vinv = Vdd
11 Background
- ISSCC 2007 Design
- ZVS → delay circuit
- Integrated clock driver / power converter
12 Integration of Clock and SMPS
- CPU clock: 3GHz clock and large Cclk
- SMPS: large Mp, Mn drive chain
13 Integration of Clock and SMPS
- Combine the driver circuits
14 Key Concept: Energy Recycling
- Benefits
- Shared driver chain
- Cclk added to SMPS
- Red path
- NMOS drains Cclk → wastes charge!
- Blue path
- Delay NMOS turn-on → recovers clock charge!
- ZVS (zero voltage switching) in power electronics
15 ZVS Detailed Operation
- ZVS delay circuit D
- Delay only rising edge of Vn
- Implemented inside the clock chain
16 ZVS Detailed Operation (Mode 1)
D: Duty cycle, Tsw: Switching period
- Mode 1 (0 < t < D·Tsw)
- Mp is ON
- Current builds up in the inductor
- Cclk charges up
17 ZVS Detailed Operation (Mode 2)
- Mode 2 (D·Tsw < t < D·Tsw + Tzvs)
- Both power transistors are OFF
- Inductor current discharges Cclk
- Cclk charge is recycled to output load
D: Duty cycle, Tsw: Period, Tzvs: ZVS delay
18 ZVS Detailed Operation (Mode 3)
D: Duty cycle, Tsw: Period, Tzvs: ZVS delay
- Mode 3 (D·Tsw + Tzvs < t < Tsw)
- Mn turns ON when Vclk ≈ 0
- ZVS for Mn
- Inductor current decreases linearly
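The three operating modes are defined purely by where the time t falls within one switching period, so the boundaries can be written as a small lookup. This is a minimal sketch of that timing only, not of the circuit behaviour. The function name is made up, and the example values (50% duty cycle, 30 ps ZVS delay) are hypothetical; only the 3 GHz switching frequency comes from the deck.

```python
def zvs_mode(t, duty, t_sw, t_zvs):
    """Return the operating mode (1, 2 or 3) at time t.

    Mode 1: 0 <= t < D*Tsw             (Mp ON, Cclk charges)
    Mode 2: D*Tsw <= t < D*Tsw + Tzvs  (both OFF, inductor discharges Cclk)
    Mode 3: D*Tsw + Tzvs <= t < Tsw    (Mn ON)
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must be between 0 and 1")
    if t_zvs < 0.0 or duty * t_sw + t_zvs > t_sw:
        raise ValueError("ZVS delay does not fit inside the switching period")
    t = t % t_sw  # fold any time into a single switching period
    if t < duty * t_sw:
        return 1
    if t < duty * t_sw + t_zvs:
        return 2
    return 3


t_sw = 1.0 / 3e9   # about 333 ps at 3 GHz
duty = 0.5         # HYPOTHETICAL duty cycle
t_zvs = 30e-12     # HYPOTHETICAL ZVS delay of 30 ps

for t_ps in (100, 180, 250):
    print(t_ps, "ps -> mode", zvs_mode(t_ps * 1e-12, duty, t_sw, t_zvs))
# 100 ps -> mode 1
# 180 ps -> mode 2
# 250 ps -> mode 3
```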
19 Detailed Operation
- ZVS delay circuit for Mn
- Delay rising edge of Vn
20 Detailed Operation
- ZVS delay circuit for Mn
- Falling edges of Vp and Vn are synchronized
21 Simulation Voltages
22 Simulation Currents
23 Effective Efficiency
- How to measure power efficiency after clock drivers are integrated with DC-DC converters?
- Converter gets free energy from clock
- Effective efficiency: how efficient a regular (standalone) power converter must be to equal the efficiency of the integrated clock/power converter
- Raw efficiency vs. effective efficiency
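One way to read the definition above: a standalone system spends the clock power plus Pout / η on the converter, while the integrated system spends the clock power plus the extra power needed to operate the converter. Setting the two totals equal gives η_eff = Pout / Pextra. This derivation is an interpretation of the slide, not a formula stated in the deck. The sketch below applies it to the rounded values in the Summary Results table, so the results need not reproduce the percentages quoted elsewhere in the deck exactly.

```python
def effective_efficiency(p_out_mw, p_extra_mw):
    """Efficiency a standalone converter would need to match the integrated one.

    ASSUMED reading of the slide's definition:
      standalone total input = P_clk + P_out / eta
      integrated total input = P_clk + P_extra
    Equating the two gives eta = P_out / P_extra.
    """
    return p_out_mw / p_extra_mw


# (extra input power, output power) in mW, from the Summary Results table
summary = {
    "buck": (16.0, 26.0),
    "boost": (25.0, 28.0),
    "buck-boost": (72.0, 48.0),
}

for name, (p_extra, p_out) in summary.items():
    eta = effective_efficiency(p_out, p_extra)
    print(f"{name}: {eta * 100:.1f}%")
```

With the table's numbers this gives roughly 162%, 112% and 67%. A value above 100% simply reflects that part of the output power comes from clock energy that would otherwise be discarded.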
24 Buck Converter Simulation Results
- Open loop converter (no regulation)
- Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
25 ISSCC 2007
- 90nm test chip: 1mm², buck converter: 0.27mm²
26 Buck Converter Chip Measurement vs. Simulation Results
[Figure: chip measurement vs. simulation (3GHz)]
27 ISVLSI 2008: New Design 1
28 Boost Converter
- Basic operation
- Vclk provides power timing
- 0th order result: Vout = D/(1-D) · Vdd
29 Boost Converter
30 Boost Converter Simulation Results
- Open loop converter (no regulation)
- Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
31 ISVLSI 2008: New Design 2
32 Buck-boost Converter
- Basic operation
- Vclk provides power timing
- 0th order result: Vout = -D2/(1-D) · Vdd
33 Buck-boost Converter
34 Buck-boost Converter
- Open loop converter (no regulation)
- Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
35 Results and Comparisons
36 Summary Results
Converter                     Clock-only power (input)   Extra power to operate converter (input)   Converter output power   % clock energy recovered
Buck converter (ISSCC 2007)   40mW                       16mW                                       26mW                     50%
Boost converter               100mW                      25mW                                       28mW                     20%
Buck-boost converter          100mW                      72mW                                       48mW                     30%
- 90nm layouts, 3GHz operation, < 0.3mm²
37 Comparative Results
- IBM Power6: 100W at 1V, 341mm² → Cclk 13pF/mm²
- Other work: fully on-chip DC-DC buck converter
- S. Abedinpour, B. Bakkaloglu, and S. Kiaei, "A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process," ISSCC 2006, pp. 356-357
- 27mm², 45MHz
- 65% power efficiency
- This work
- 0.27, 0.26, 0.20 mm², including 0.1mm² inductor area, 3GHz
- Cclk 20pF, equiv. to 1.6mm² of Power6 area
- DC-DC converter adds 12.5% area overhead
- LC filter: 310pH inductor, 350pF capacitor
- L and C similar and dominate layout area → can stack to cut area in half
- Buck: 75% / 185% effective power efficiency (50% recovered)
- Boost: 25% / 110% effective power efficiency (20% recovered)
- Buck-boost: 20% / 66% effective power efficiency (30% recovered)
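The capacitance density and area-overhead figures above appear consistent with a simple back-of-the-envelope estimate. The sketch assumes the Power6 clock capacitance is inferred from P = C · V² · f, using the 22 W clock power and 5 GHz frequency from the Problem slide. That derivation is an assumption, not something the deck states, and the function name is illustrative.

```python
def clock_capacitance_f(p_clk_w, vdd_v, f_clk_hz):
    """Total switched clock capacitance implied by P = C * V^2 * f."""
    return p_clk_w / (vdd_v ** 2 * f_clk_hz)


# Power6 figures quoted in the deck
p_clk = 22.0     # W, clock power (22% of roughly 100 W)
vdd = 1.0        # V
f_clk = 5e9      # Hz
die_area = 341.0 # mm^2

c_total = clock_capacitance_f(p_clk, vdd, f_clk)   # about 4.4 nF
density_pf_mm2 = c_total / die_area * 1e12         # about 12.9 pF/mm^2

# This work: 20 pF of clock load, expressed as equivalent Power6 area
equiv_area_mm2 = 20.0 / density_pf_mm2             # about 1.55 mm^2

# Quoted overhead: smallest converter (0.20 mm^2) over the quoted 1.6 mm^2
overhead = 0.20 / 1.6                              # 0.125, i.e. 12.5%

print(f"density: {density_pf_mm2:.1f} pF/mm^2")
print(f"equivalent area: {equiv_area_mm2:.2f} mm^2")
print(f"area overhead: {overhead * 100:.1f}%")
```

Under these assumptions the density comes out near the quoted 13pF/mm², and the 12.5% overhead matches 0.20mm² divided by 1.6mm².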
38 Conclusion
- Key concepts
- High switching frequency → saves area
- Combined drivers → saves area and switching loss
- Recycled charge → converter load discharges Cclk
- ZVS delay circuit → lower power loss
- Limitations
- Regulation needs variable duty cycle clock
- May introduce additional clock jitter
- Mostly suitable for edge-triggered blocks
- (no latches)
- Future work
- Lots of improvements to make!
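As a rough illustration of the "high switching frequency saves area" point, the corner frequency of the quoted LC output filter (310pH, 350pF) can be compared with the 3GHz switching frequency using the standard relation f0 = 1 / (2π√(LC)). The scaling comparison against a 45MHz design assumes the same corner-to-switching frequency ratio is kept. That is a simplifying assumption made here, not a claim from the deck.

```python
import math


def lc_corner_hz(l_h, c_f):
    """Resonant (corner) frequency of an ideal LC low-pass filter."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


f_sw = 3e9                               # switching frequency from the deck
f0 = lc_corner_hz(310e-12, 350e-12)      # filter values from the deck
ratio = f_sw / f0                        # how far the corner sits below f_sw

# ASSUMPTION: keep the same f_sw / f0 ratio at 45 MHz. Since f0 scales as
# 1/sqrt(L*C), the required L*C product grows with (3 GHz / 45 MHz)^2.
lc_scale = (f_sw / 45e6) ** 2

print(f"corner frequency: {f0 / 1e6:.0f} MHz")
print(f"switching / corner ratio: {ratio:.1f}")
print(f"L*C product needed at 45 MHz: {lc_scale:.0f}x larger")
```

The corner frequency comes out at roughly 480 MHz. The much smaller L·C product needed at 3GHz is what allows the inductor and capacitor to fit on chip.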
39 Thank you!