Title: CMOS VLSI DESIGN
1CMOS VLSI DESIGN
- Kasin Vichienchom
- kvkasin_at_kmitl.ac.th
- Lecture 6
2Timing Issues
- Clock Non-Ideality
- Clock Skew
- Jitter
- Clock Distribution Networks
- Metrics
- Types
- Examples
3Clock Non-Ideality
- Clock skew (phase shift between clocks)
- Spatial variation in temporally equivalent clock edges; deterministic + random, tSK
- Due to RC delay in the clock distribution, which depends on differences in distance, clock load, clock driver, and process variation
- Clock jitter (clock period variation)
- Temporal variation in consecutive edges of the clock signal; modulation + random noise
- Cycle-to-cycle (short-term), tJS
- Long-term, tJL
- Variation of the pulse width
- Important for level-sensitive clocking
Source: Jan Rabaey
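As a rough illustration of the two definitions above, the sketch below estimates skew and cycle-to-cycle jitter from rising-edge timestamps. The timestamps and the names `clock_skew` and `cycle_to_cycle_jitter` are hypothetical, and jitter is taken here as the largest change between consecutive periods, which is one common definition and not necessarily the one used in the lecture.

```python
def clock_skew(edges_a, edges_b):
    """Mean offset between temporally equivalent edges of two clocks (tSK)."""
    diffs = [b - a for a, b in zip(edges_a, edges_b)]
    return sum(diffs) / len(diffs)


def cycle_to_cycle_jitter(edges):
    """Largest change between consecutive clock periods (short-term jitter, tJS)."""
    periods = [t1 - t0 for t0, t1 in zip(edges, edges[1:])]
    return max(abs(p1 - p0) for p0, p1 in zip(periods, periods[1:]))


# Hypothetical rising-edge timestamps in picoseconds, observed at two
# different points of the same clock network.
clk_near = [0, 1000, 2010, 2995, 4000]
clk_far = [50, 1050, 2060, 3045, 4050]

skew_ps = clock_skew(clk_near, clk_far)        # spatial: same edge, two places
jitter_ps = cycle_to_cycle_jitter(clk_near)    # temporal: one place, successive edges
print(skew_ps, jitter_ps)
```

Skew compares the same edge at two locations, while jitter compares successive edges at one location, which is why the two functions take different inputs.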
4Clock Non-Ideality
Skew and Jitter
5Clock Skew-Negative
6Clock Skew-Positive
7Effect of Clock Skew
Constraint: $T_{CLK} \ge t_{ctq} + t_{max} + t_{setup}$
8Effect of Clock Skew
If $t_{max} > t_1$: setup time violation
If $t_{max} > t_2$: zero clocking (old data is loaded)
Constraint: $T_{CLK} \ge t_{ctq} + t_{max} + t_{setup} - \delta$
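The maximum-delay constraint can be checked numerically. The sketch below assumes the usual sign convention, in which the skew d is positive when the receiving register's clock arrives later than the launching register's, so the minimum period is tctq + tmax + tsetup minus d. The function name and the delay values are hypothetical.

```python
def min_clock_period(t_ctq, t_max, t_setup, skew=0):
    """Smallest clock period that still meets setup time.

    skew > 0 means the capturing clock edge arrives later than the
    launching edge (assumed convention), which relaxes the constraint.
    """
    return t_ctq + t_max + t_setup - skew


# Hypothetical delays in picoseconds.
T_CTQ, T_MAX, T_SETUP = 100, 700, 50

period_no_skew = min_clock_period(T_CTQ, T_MAX, T_SETUP)
period_pos_skew = min_clock_period(T_CTQ, T_MAX, T_SETUP, skew=50)
period_neg_skew = min_clock_period(T_CTQ, T_MAX, T_SETUP, skew=-50)
print(period_no_skew, period_pos_skew, period_neg_skew)
```

Under this convention positive skew buys cycle time, while negative skew costs it.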
9Effect of Clock Skew
If $t_{min} < t_1$: double clocking (takes the data of the next state)
If $t_{min} < t_2$: hold time violation
Constraint: $t_{ctq} + t_{min} \ge t_{hold} + \delta$
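The minimum-delay constraint does not involve the clock period at all, so a race cannot be fixed by slowing the clock. A minimal sketch, assuming the skew d is positive when the receiving clock arrives later and that the constraint reads tctq + tmin >= thold + d; names and numbers are hypothetical.

```python
def hold_margin(t_ctq, t_min, t_hold, skew=0):
    """Slack of the hold constraint; a negative result means a race."""
    return t_ctq + t_min - t_hold - skew


def max_tolerable_skew(t_ctq, t_min, t_hold):
    """Largest positive skew that still satisfies hold time."""
    return t_ctq + t_min - t_hold


# Hypothetical delays in picoseconds.
T_CTQ, T_MIN, T_HOLD = 100, 30, 40

margin_small_skew = hold_margin(T_CTQ, T_MIN, T_HOLD, skew=50)
margin_large_skew = hold_margin(T_CTQ, T_MIN, T_HOLD, skew=100)
skew_limit = max_tolerable_skew(T_CTQ, T_MIN, T_HOLD)
print(margin_small_skew, margin_large_skew, skew_limit)
```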
10Clock Distribution
- Clock Distribution Metrics
- Power
- Area
- Skew
- Types of Clock Distribution
- Tree
- Grid
- Hybrid
- Case Studies: DEC Alpha µProcessor
11Metrics-Power
Power dissipation of clock network: $P = \alpha C_L V_{DD}^2 f$
$\alpha = 1$
Very large
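To get a feel for the magnitude, the formula can be evaluated with the clock load and frequency quoted for the DEC 21164 case study later in this lecture (3.75 nF, 300 MHz). The supply voltage of 3.3 V is an assumption not stated on the slides, and the result covers only the switching power of the final clock load, so treat it as a rough estimate.

```python
def clock_power(c_load, v_dd, freq, alpha=1.0):
    """Dynamic power P = alpha * C_L * V_DD^2 * f, in watts."""
    return alpha * c_load * v_dd ** 2 * freq


C_LOAD = 3.75e-9   # farads, from the DEC 21164 case study
FREQ = 300e6       # hertz, from the DEC 21164 case study
V_DD = 3.3         # volts, ASSUMED (not given on the slides)

# alpha = 1 because the clock switches every cycle.
p_watts = clock_power(C_LOAD, V_DD, FREQ)
print(f"{p_watts:.2f} W")  # about 12 W under these assumptions
```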
12Metrics-Area
- Clock networks consume silicon area (clock drivers, PLL, etc.) and routing area
- Routing area is most vital
- Top-level metals are used to reduce RC delays
- These levels are precious resources (unscaled)
- Power routing, clock routing, key global signals
- By minimizing area used, we also reduce wiring capacitance and power
- Typical number: Intel Itanium, 4% of M4/5 used in clock routing
13Clock Slew Rate
- To maintain signal integrity and latch performance, minimum slew rates are required
- Too slow: clock is more susceptible to noise, latches are slowed down, eats into timing budget
- Too fast: burning too much power, overdesigned network, enhanced ground bounce
- Rule of thumb: tr and tf of the clock are each between 10% and 20% of the clock period
- Example: 1 GHz clock, tr = tf = 100-200 ps
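The rule of thumb is easy to turn into a quick check. The sketch below computes the 10% to 20% window for a given clock frequency and tests whether an edge time falls inside it; the function names are hypothetical.

```python
def slew_window_ps(freq_hz, low=0.10, high=0.20):
    """Return the (min, max) recommended edge time in picoseconds."""
    period_ps = 1e12 / freq_hz
    return low * period_ps, high * period_ps


def slew_ok(t_edge_ps, freq_hz):
    """True if a rise or fall time lies within the recommended window."""
    t_min, t_max = slew_window_ps(freq_hz)
    return t_min <= t_edge_ps <= t_max


# The 1 GHz example from the slide: window of roughly 100 ps to 200 ps.
window = slew_window_ps(1e9)
print(window, slew_ok(150.0, 1e9), slew_ok(50.0, 1e9))
```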
14Clock Technology Trend
- Heavily pipelined designs
- more latches
- more capacitive load for clock
- Larger chips
- more wirelength needed to cover the entire die
- Complexity
- more functionality and devices
- more clocked elements (loads)
- Dynamic logic
- more clocked elements (loads)
15Grid System
- No RC delay matching
- Large Power (huge drivers)
16Grid System
- Gridded clock distribution was common on earlier DEC Alpha microprocessors
- Advantages
- Skew determined by grid density and not overly sensitive to load position
- Clock signals are available everywhere
- Tolerant to process variations
- Usually yields extremely low skew values
17Grid System
Disadvantages
- Huge amounts of wiring and power
- Wire cap large
- Strong drivers needed, so pre-driver cap large
- Routing area large
- To minimize all these penalties, make grid pitch coarser
- Skew gets worse
- Losing the main advantage
- Don't overdesign: let the skew be as large as tolerable
- Grids aren't feasible for most designs due to power
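The cost side of the pitch trade-off can be sketched with a very simple geometric model: a square die covered by equally spaced horizontal and vertical clock wires. To first order, wire capacitance (and hence wiring power) scales with total wire length. The die size and pitches below are hypothetical, and the model ignores wire width, drivers, and the skew penalty of a coarser grid.

```python
def grid_wire_length(die_side_mm, pitch_mm):
    """Total wire length (mm) of a square clock grid at the given pitch.

    Assumes the pitch divides the die side evenly, with wires on both edges.
    """
    wires_per_direction = round(die_side_mm / pitch_mm) + 1
    return 2 * wires_per_direction * die_side_mm


# Hypothetical 16 mm die: each doubling of the pitch roughly halves the wiring.
lengths = {pitch: grid_wire_length(16, pitch) for pitch in (1, 2, 4)}
for pitch, length in lengths.items():
    print(f"pitch {pitch} mm -> {length} mm of clock wire")
```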
18H-Tree System
- Original H-tree (Bakoglu)
- One large central driver
- Recursive H-style structure to match wirelengths
- Halve wire width at branching points to reduce
reflections
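The recursive structure can be sketched in a few lines. The function below (a hypothetical helper, not taken from the lecture) builds the leaf positions of an H-tree by alternating horizontal and vertical branches and halving the branch length after each vertical level, then confirms that every leaf sits at the same wire distance from the central driver. It models geometry only; wire widths, buffers, and loads are not represented.

```python
def h_tree_leaves(x, y, half_span, levels, horizontal=True, path_len=0.0):
    """Return (x, y, path_length) for every leaf of a recursive H-tree."""
    if levels == 0:
        return [(x, y, path_len)]
    dx, dy = (half_span, 0.0) if horizontal else (0.0, half_span)
    # A horizontal branch followed by a vertical one forms one "H";
    # the next, nested "H" is drawn at half the scale.
    next_span = half_span if horizontal else half_span / 2
    leaves = []
    for sign in (-1, 1):
        leaves += h_tree_leaves(x + sign * dx, y + sign * dy, next_span,
                                levels - 1, not horizontal,
                                path_len + half_span)
    return leaves


# Four branching levels from a central driver at the origin give 16 leaves.
leaves = h_tree_leaves(0.0, 0.0, 4.0, levels=4)
path_lengths = {length for _, _, length in leaves}
print(len(leaves), path_lengths)
```

Because every root-to-leaf path uses the same sequence of segment lengths, the wire lengths match by construction, which is the property the H-style recursion is meant to provide.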
19H-Tree System
- Drawbacks of the original tree concept
- slew degradation along long RC paths
- unrealistically large central driver
- Clock drivers can create large temperature gradients (ex. Alpha 21064, 30 °C)
- non-uniform load distribution
- Inherently non-scalable (wire resistance skyrockets)
- Solution to some problems
- Introduce intermediate buffers along the way
- Specifically at branching points
20Buffered H-Tree
Advantages
- Ideally zero-skew
- Can be low power (depending on skew requirements)
- Low area (silicon and wiring)
- CAD tool friendly (regular)
Disadvantages
- Sensitive to process variations
- Local clocking loads are inherently non-uniform
21Realistic H-Tree
a balanced load segment (tile)
RC matched for IBM's 400 MHz microprocessor (1998)
22H-Tree
Balancing H-Tree
Some techniques:
a) Introduce dummy loads
b) Snaking of wirelength to match delays
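The dummy-load technique can be illustrated with a first-order Elmore delay model, in which a wire of resistance R and capacitance C driving a load C_L has delay R(C/2 + C_L). The sketch below computes how much dummy capacitance the lighter branch needs to match the heavier one. The model choice, the function names, and the component values are assumptions for illustration; driver resistance is ignored.

```python
def elmore_delay(r_wire, c_wire, c_load):
    """First-order delay of a distributed RC wire driving a lumped load."""
    return r_wire * (c_wire / 2 + c_load)


def dummy_load_to_match(target_delay, r_wire, c_wire, c_load):
    """Extra capacitance that brings a faster branch up to target_delay."""
    c_dummy = target_delay / r_wire - c_wire / 2 - c_load
    if c_dummy < 0:
        raise ValueError("branch is already slower than the target delay")
    return c_dummy


# Hypothetical branches: identical wires (100 ohm, 200 fF), unequal loads.
# With ohms and femtofarads the delays come out in femtoseconds.
delay_heavy = elmore_delay(100, 200, 300)           # 300 fF load
delay_light = elmore_delay(100, 200, 100)           # 100 fF load
c_dummy = dummy_load_to_match(delay_heavy, 100, 200, 100)
delay_balanced = elmore_delay(100, 200, 100 + c_dummy)
print(delay_heavy, delay_light, c_dummy, delay_balanced)
```

Dummy loads trade extra capacitance, and therefore clock power, for matched delay; wire snaking achieves the same goal by adding wire RC instead.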
23Hybrid-System
Globally: Tree
- Power requirements are reduced compared to global grid
- Smaller routing requirements, frees up global tracks
- Trees are easily balanced at the global level
- Keeps global skew low (with minimal process variation)
24Hybrid-System
Locally: Grid
- Smaller grid distribution area allows for coarser grid pitch
- Lower power in interconnect
- Lower power in predrivers
- Routing area reduced
- Local skew is kept very small
- Easy access to clock by simply tapping off grid
25Case Studies DEC 21164
tcycle = 3.3 ns
- 2 phase single wire clock, distributed globally
- 2 distributed driver channels
- Reduced RC delay/skew
- Improved thermal distribution
- 3.75 nF clock load
- 58 cm final driver width
- Local inverters for latching
- Conditional clocks in caches to reduce power
- More complex race checking
- Device variation
tskew = 150 ps
trise = 0.35 ns
Clock waveform
Location of clock driver on die
26Case Studies DEC 21164
27Case Studies DEC 21264
EV6 (Alpha 21264) clocking: 600 MHz, 0.35 micron CMOS
Global clock waveform
- 2 phase, with multiple conditional buffered clocks
- 2.8 nF clock load
- 40 cm final driver width
- Local clocks can be gated off to save power
- Reduced load/skew
- Reduced thermal issues
- Multiple clocks complicate race checking
28Case Studies DEC 21264
EV6 (Alpha 21264) clocking: 600 MHz, 0.35 micron CMOS
29Case Studies DEC 21264
GCLK Skew (at Vdd/2 Crossings)
GCLK Rise Times (20% to 80%, extrapolated to 0% to 100%)
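Extrapolating a rise time measured between two thresholds to the full swing is a one-line calculation if the edge is assumed to be a linear ramp, which is the assumption made in this sketch. The function name and the sample measurement are hypothetical.

```python
def extrapolate_rise_time(t_measured, low=0.2, high=0.8):
    """Scale a low-to-high threshold rise time to a 0% to 100% rise time.

    Assumes the edge is a straight ramp, so the measured interval covers
    a fraction (high - low) of the full transition.
    """
    return t_measured / (high - low)


# Hypothetical measurement: 120 ps between the 20% and 80% points
# corresponds to roughly 200 ps for the full 0% to 100% swing.
t_full_ps = extrapolate_rise_time(120.0)
print(round(t_full_ps, 1))
```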
30Case Studies DEC EV7
Active Skew Management and Multiple Clock Domains
- widely dispersed drivers
- DLLs compensate static and low-frequency variation
- divides design and verification effort
- DLL design and verification is added work
- tailored clocks
31DEC Alpha Clocking
- EV4 (21064): 0.75 µm, 200 MHz, 1992
- Single global clock driver, 5 levels of buffering
- 35 cm driver, 3.25 nF, 40% power
- EV5 (21164): 0.5 µm, 300 MHz, 1995
- One central, two side clock drivers
- 58 cm driver, 3.75 nF, 40% power
- EV6 (21264): 0.35 µm, 600 MHz, 1998
- Clock grid, 4 window panes, hierarchical, gated clock domains
- 40 cm driver, 2.8 nF
- EV7: 0.18 µm, 1.2 GHz, 2002
- Multiple clock domains, DLLs
32Itanium 2 H-Tree
- Four levels of buffering
- Primary driver
- Repeater
- Second-level clock buffer
- Gater
- Route around obstructions
Source: Harris
33Conclusion
- Getting the clock everywhere on a die at the exact same time is difficult
- Requires a lot of power to reduce skew (big drivers, wide wires, etc.)
- Balanced H-trees are in common use
- Design automation tools exist to synthesize these trees
- Clocks must
- be robust to variations/noise,
- have relatively sharp slew rates