Title: Low Power, Fast, Robust Clock Signal Distribution
1Low Power, Fast, Robust Clock Signal Distribution
[Figure labels: Clock Source; Block 1; Block 2; FF (eight instances)]
2Traditional Clock Tree Design
3Early clock trees
- Early trees used wire width adjustment to balance
loads to reduce skew
4Buffers
- By placing buffers intelligently, power is actually
reduced.
5Our first try
- Process variation, unbalanced loads, inter-line
coupling, inductive effects
6Our automated CT generator
- Balanced tree: process variation, unbalanced
loads, inter-line coupling, inductive effects
7Chip Slide
8Single Block
9Tests & Observations: Results
- Runtime
- Approach 2 Saves 20
- X16 Saves 17
- Skew & Process Variation
- X16 Better by 50
[Chart axis units: Hours, mW, ns]
10Tests & Observations: Buffer Area
- Buffer Area
- X8 > X16 > X32
[Chart residue: values 70, 21, 23, 14.5; axis unit: DB Units]
11Cool Delay Graphs
12Advantages
- Quick
- Brainless
- Tool approximations basically met simulation
results and actual performance
- Works well on small designs
Disadvantages
- Lower limit on skew and therefore frequency
- Bad results for medium to large designs
- Not optimized for power minimization
13Design
[Diagram labels: Ease of Design; Minimal skew; Minimal power; H-Tree; Small regions; ?; Design dependent; Design independent; Grid; Giant signals; Binary tree; ?; Multi-VDD; Sloooowwwww systems (adiabatic systems, sine wave clock); High VT systems]
14Filling those boxes
- Traditional trees to distribute clock signals can
only get worse as frequency and chip size
increase.
- It'd be nice if we could use some properties of
the high frequency and thin lines of recent
technologies in some way to benefit clock
distribution.
- Higher frequency and longer lines introduce
inductive effects, and with parasitic capacitance,
a transmission line model and/or resonant
circuits come to mind.
15Ideas
- To reduce skew: Somehow link the clock tree as in a
grid.
- Problem: Power hungry as present grids are
designed (Alpha 21264).
- To reduce power: Use resonance to propagate
signals through the grid instead of exclusively
driving from external sources.
- Problem: How do you do that?
16Resonant Drivers
- Blip drivers generate rounded pulses.
- The all-resonant driver requires no input signal
but still generates a rounded signal.
- Superimposing a finite number of signal harmonics
gives a squarer wave. But constant current hurts
power consumption performance.
W. C. Athas, Theory and Practical Implementation
of Harmonic Resonant Rail Driver, 2001 ISLPED
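The point about superimposing harmonics can be illustrated with a small sketch. This is not the driver circuit from the cited paper; it only assumes the textbook Fourier series of an ideal unit square wave (odd harmonics weighted by 1/n) and measures how the RMS error against the ideal square wave shrinks as harmonics are added. The function names are hypothetical.

```python
import math

def harmonic_sum(phase, n_harmonics):
    """Partial Fourier sum of a unit square wave using the first n odd harmonics."""
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1  # odd harmonics only: 1, 3, 5, ...
        total += math.sin(n * phase) / n
    return 4.0 / math.pi * total

def ideal_square(phase):
    """Ideal +/-1 square wave with the same phase reference as the sum above."""
    return 1.0 if math.sin(phase) >= 0 else -1.0

def rms_error(n_harmonics, samples=2000):
    """RMS difference between the partial sum and the ideal square wave over one period."""
    acc = 0.0
    for i in range(samples):
        # Sample at midpoints so no sample lands exactly on an edge.
        phase = 2.0 * math.pi * (i + 0.5) / samples
        diff = harmonic_sum(phase, n_harmonics) - ideal_square(phase)
        acc += diff * diff
    return math.sqrt(acc / samples)

for count in (1, 2, 3, 5):
    print(count, "harmonic(s): RMS error =", rms_error(count))
```

A single harmonic is the rounded sine-like pulse; each added odd harmonic steepens the edges and lowers the error, which is the sense in which the wave gets "squarer".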
17Voltage Pulse-fed Harmonic Resonant Rail Driver
- Using a voltage pulse-fed system shifts the swing
from the range -Vdd to Vdd to the range 0 to Vdd
and also reduces power consumption. The R between the
input and output removes high-frequency
components.
W. C. Athas, Theory and Practical Implementation
of Harmonic Resonant Rail Driver, 2001 ISLPED
18Blip Grid
19Blip Grid
20Blip Grid
21How's this become a clock distribution network?
- Challenge: The signal needs to be propagated and
locked to synchronize the signal across a region.
- Problems with using these drivers:
- Pulse-fed, square-wave generating circuits are too
complicated to use in propagation
- All-resonant blip driver has rounded pulses:
hurts skew and power.
- Area hungry
22Design
[Diagram labels: Ease of Design; Minimal skew; Minimal power; H-Tree; Small regions; ?; Design dependent; Design independent; Grid; Grid?; Giant signals; H-Tree; Binary tree; ?; Multi-VDD; Sloooowwwww systems (adiabatic systems, sine wave clock); High VT systems; Binary tree]
23Other Ideas: Using Transmission Line Properties to
Reduce Power
- For $L > \lambda/4$ at high frequencies, swing at the output
of A is reduced. Power can be reduced at
frequencies > 3 GHz (66% at 5 GHz in the article read).
(Only appropriate for global clock network.)
M. Mizuno, On-chip Multi-GHz Clocking with
Transmission Lines, 2000 IEEE ISSCC
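To get a feel for when a line is longer than a quarter wavelength, here is a minimal sketch. It assumes a lossless line whose phase velocity is c divided by the square root of an effective relative permittivity; the default value of 3.9 (roughly silicon dioxide) and the function names are assumptions for illustration, not figures from the cited article.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s

def quarter_wavelength_m(freq_hz, eps_r_eff):
    """Quarter wavelength on a line with the given effective permittivity."""
    phase_velocity = C_VACUUM / math.sqrt(eps_r_eff)
    return phase_velocity / freq_hz / 4.0

def exceeds_quarter_wave(line_length_m, freq_hz, eps_r_eff=3.9):
    """True when the line is longer than lambda/4 at this frequency.

    eps_r_eff=3.9 is an assumed, oxide-like value; use the real stack-up value.
    """
    return line_length_m > quarter_wavelength_m(freq_hz, eps_r_eff)

for f_ghz in (1, 3, 5):
    quarter_mm = quarter_wavelength_m(f_ghz * 1e9, 3.9) * 1e3
    print(f_ghz, "GHz: lambda/4 is about", round(quarter_mm, 2), "mm")
```

Under these assumptions a 10 mm global line is shorter than a quarter wavelength at 3 GHz but longer than it at 5 GHz, which fits the slide's remark that the effect only matters for global clock networks at multi-GHz rates.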
24Other Ideas: Using Transmission Line Properties to
Reduce Power
- Local clocking networks have large
capacitance, which can degrade rise
time through the $Z_0 C_L$ time constant. A low $Z_0$ is not appropriate
for long lines.
M. Mizuno, On-chip Multi-GHz Clocking with
Transmission Lines, 2000 IEEE ISSCC
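The rise-time concern can be roughed out numerically. This sketch assumes the simplest possible model: the line looks like a source of impedance Z0 charging a lumped load CL, so the 10-90% rise time of the resulting single-pole response is ln(9) times Z0 times CL. The component values below are hypothetical.

```python
import math

def rc_rise_time_s(z0_ohm, c_load_f):
    """10-90% rise time of a single-pole Z0*CL response (about 2.2 * Z0 * CL)."""
    return math.log(9.0) * z0_ohm * c_load_f

# Hypothetical values: a 2 pF local clock load on lines of different impedance.
for z0 in (25.0, 50.0, 100.0):
    rise_ps = rc_rise_time_s(z0, 2e-12) * 1e12
    print("Z0 =", z0, "ohm -> rise time about", round(rise_ps, 1), "ps")
```

Lowering Z0 shortens the rise time into a heavy load, which is the tension the slide points at: a low Z0 helps the local load but is not appropriate for long lines.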
25Design
[Diagram labels: Ease of Design; Minimal skew; Minimal power; H-Tree; Small regions; ?; Design dependent; Design independent; Grid; Giant signals; H-Tree; Binary tree; ?; Multi-VDD; Sloooowwwww systems (adiabatic systems, sine wave clock); High VT systems; Binary tree]
26New Clock Signal Distribution Technology: Claims
- Name: Rotary Traveling-Wave Oscillator Arrays
- Clock Distribution Claims:
- Gigahertz-rate
- Multi-phase (360°)
- Square wave signal
- Low jitter
- Scalable
- Potential for low power, low-skew, no need for
clock domains.
J. Wood, T. C. Edwards, Rotary Traveling-wave
Oscillator Arrays: A New Clock Technology, IEEE
Journal of Solid-State Circuits, November 2001,
Vol. 36, No. 11
27Design
[Diagram labels: Ease of Design; Minimal skew; Minimal power; Rotary Traveling-Wave Oscillator?; H-Tree; Small regions; Design dependent; Design independent; Grid; Giant signals; Binary tree; ?; ?; Multi-VDD; Sloooowwwww systems (adiabatic systems, sine wave clock); High VT systems]
28New Clock Signal Distribution Technology: Basics
- Closed switch starts wave around loop.
- Ideally, signal will propagate forever once
started.
J. Wood, T. C. Edwards, Rotary Traveling-wave
Oscillator Arrays: A New Clock Technology, IEEE
Journal of Solid-State Circuits, November 2001,
Vol. 36, No. 11
29Architecture
- Capacitance and inductance control speed of
signal around loop.
- Loops linked by hard-wired connections.
- Inverters placed to keep signal from degrading
and to lock rotation direction.
30Performance
- $L_{perlength}$, $C_{perlength}$, and $v_p$ equations
explained
- Heavily loaded, $v_p$ can be $0.03 \times c = 9 \times 10^{6}$ m/s
- $f_c \approx v_p / (2 \times \text{RingLength})$
- Performance limited by transit time, not
inverter propagation time.
- Rise/fall times controllable by setting
cutoff frequency of transmission lines:
$F_{cutoff} = 1 / (2\pi\sqrt{L_{lump} C_{lump}})$
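The relations on this slide can be tried out in a few lines. This is a sketch under assumptions: the phase velocity uses the standard lossless-line relation 1/sqrt(L*C) per unit length, the ring frequency uses the slide's vp / (2 x RingLength), and the cutoff uses 1 / (2*pi*sqrt(Llump*Clump)) as read from the slide's formula line. The 3 mm ring length and the lumped L and C values are hypothetical, chosen only to give round numbers.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s

def phase_velocity(l_per_m, c_per_m):
    """Lossless-line phase velocity from inductance and capacitance per metre."""
    return 1.0 / math.sqrt(l_per_m * c_per_m)

def ring_frequency(vp_m_per_s, ring_length_m):
    """Oscillation frequency: one clock period is two trips around the ring."""
    return vp_m_per_s / (2.0 * ring_length_m)

def cutoff_frequency(l_lump_h, c_lump_f):
    """Cutoff frequency of one lumped L-C section of the line."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_lump_h * c_lump_f))

vp_loaded = 0.03 * C_VACUUM          # heavily loaded case quoted on the slide
ring_length = 3e-3                   # hypothetical 3 mm ring
print("vp     =", vp_loaded, "m/s")
print("f_ring =", ring_frequency(vp_loaded, ring_length) / 1e9, "GHz")
# Hypothetical lumped section: 50 pH and 100 fF.
print("f_cut  =", cutoff_frequency(50e-12, 100e-15) / 1e9, "GHz")
```

With the slide's 0.03 c velocity, a 3 mm ring lands near 1.5 GHz, so ring length is the main knob on frequency, while the lumped section size sets how sharp the edges can be.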
31More Performance metrics
- Changes in temperature and supply voltage.
- Changes in frequency due to supply voltage
variation
32Power
- 84mA source whereas line current equals 200mA.
- Losses related to $I^2R$ rather than $CV^2f$. Reducing
R, increasing C, can increase $I^2R$ to unmanageable
levels.
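The two loss mechanisms named above can be compared side by side. This is only an arithmetic sketch: the 200 mA line current is the figure from the slide (treated here as an RMS value, which is an assumption), while the loop resistance, capacitance, supply voltage and frequency are hypothetical placeholders.

```python
def resistive_loss_w(i_rms_a, r_ohm):
    """I^2 * R loss of a current circulating in a resistive loop."""
    return i_rms_a ** 2 * r_ohm

def switching_power_w(c_f, vdd_v, freq_hz):
    """C * V^2 * f power of conventionally charging and discharging a clock load."""
    return c_f * vdd_v ** 2 * freq_hz

line_current = 0.2   # 200 mA line current from the slide, assumed RMS
loop_r = 1.0         # hypothetical loop resistance, ohms
clock_c = 100e-12    # hypothetical clock load, farads
vdd = 1.0            # hypothetical supply, volts
freq = 1e9           # hypothetical clock rate, Hz

print("I^2 R   =", resistive_loss_w(line_current, loop_r), "W")
print("C V^2 f =", switching_power_w(clock_c, vdd, freq), "W")
```

Because the resistive loss grows with the square of the circulating current, anything that raises that current matters far more than it would in a conventional CV²f budget.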
33Surges
- Simultaneous switching surges low because of the
distributed switching times of the inverters.
34Tapoff Issues and Stub Loading
- Tapoff possible to extract clock. Generally can
be looked at as a capacitive stub on the ring. To
minimize signal distortion, $\tau_p \ll \tau_r, \tau_f$
- Increasing C indefinitely will reduce
impedance and eventually $R_{loop} < 2Z_0$ and
oscillation will stop.
35Logic
- Two-phase latched logic is thought to be most
compatible. Two-phase reduces problem with
current surges.
- Four-phase domino logic is possible if two loops
of RTO are available around the logic.
36Layout questions
- Top level for clock nets
- Via parasitics insignificant due to size of
inverters?
- Crossovers
- Two levels necessary to implement
- Electromigration
- Giant current. In the future, surges of > 500 mA
possible
- Inverters
- Located directly under lines
- Giant. Hard to switch? Use capacitors instead of
transistor capacitance?
37Coupling Issues
- Strong magnetic field mitigated by running routed
lines at 90° to loop lines on next lower metal
layer. Works as electrostatic shield.
- Substrate magnetic fields are to be expected.
38Interesting CAD
- A tool that targets a fixed operating frequency.
Pads with capacitance to fix impedance
discontinuities. Rotary Explorer.
39What I'd like to do
- Better understand problems with design in paper.
- Come to conclusions on technology.
- Investigate compatible logic and applications for
clock set-up.
- Check out CAD tools available.
- Solidify background on interconnect.