Title: Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient
1 Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient
- Hao Yu, Yu Hu, Chun-Chen Liu and Lei He
- EE Department, UCLA
- Presented by Yu Hu
- Partially supported by SRC task 1116.
2 Introduction
- Both process and operation variations cause uncertainties and may lead to design failure or over-design.
- Process variations have been actively studied.
  - Statistical timing analysis
  - Stochastic optimization
  - Post-silicon configuration
- Stochastic optimization for the operation variations below has been largely ignored.
  - Fluctuation of crosstalk noise and P/G network noise due to different input vectors
  - Time-variant on-chip temperature map over different workloads
- This work is the first in-depth study on clock synthesis considering time-variant temperature variations.
3 Limitation of Existing Work
- The existing work [Cho, ICCAD'05] ignores time-variant temperature variations and assumes a fixed temperature map.
- Different workloads lead to different temperature maps (e.g., two SPEC2000 applications, Ammp and Gzip).
- Optimizing skew for one application hurts the skew for another application; this work resolves that conflict.
4 Outline
- Modeling and Problem Formulation
- Algorithms
- Experimental Results
- Conclusions
5 Stochastic Temperature Model
- The temperature map is unique for each application or program phase.
  - It can be obtained by uArch-level simulation.
- For each region of the chip, temperature is characterized by its mean and variance over a number of maps.
  - Principal component analysis (PCA) decides the number of maps.
- Temperature correlation, measured as covariance between regions, is high over the SPEC2000 benchmark set.
(i,j): correlation between region i and region j
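The per-region statistics described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the temperature values, the region count, and the function names `region_stats` and `correlation` are all made up for the example, and correlation is computed as the normalized (Pearson) form of the covariance.

```python
from math import sqrt
from statistics import fmean, pvariance

def region_stats(maps):
    """maps: list of temperature maps, each a list of per-region temperatures.
    Returns the per-region mean and variance taken across all maps."""
    regions = list(zip(*maps))  # one tuple of samples per region
    means = [fmean(r) for r in regions]
    variances = [pvariance(r) for r in regions]
    return means, variances

def correlation(maps, i, j):
    """Pearson correlation between regions i and j across the maps."""
    xi = [m[i] for m in maps]
    xj = [m[j] for m in maps]
    mi, mj = fmean(xi), fmean(xj)
    cov = fmean([(a - mi) * (b - mj) for a, b in zip(xi, xj)])
    return cov / sqrt(pvariance(xi) * pvariance(xj))

# Hypothetical data: 4 temperature maps over 3 regions (degrees Celsius).
maps = [
    [60.0, 62.0, 80.0],
    [70.0, 72.0, 75.0],
    [80.0, 82.0, 70.0],
    [90.0, 92.0, 65.0],
]
means, variances = region_stats(maps)
rho_01 = correlation(maps, 0, 1)  # regions 0 and 1 heat up together
rho_02 = correlation(maps, 0, 2)  # region 2 cools while region 0 heats
```

In this toy data regions 0 and 1 are perfectly correlated, which is the kind of structure the later clustering step relies on.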
6 Problem Formulation
- Given
  - The source, sinks and an initial tree embedding
  - A set of temperature maps for a benchmark set
- Design freedoms
  - Re-embedding of clock tree
  - Cross link insertion
- Objective: minimize the worst-case skew among the given temperature maps
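The objective can be made concrete with a small sketch. It assumes skew is the difference between the largest and smallest sink delay under one temperature map, and that the sink delays per map are already known; the delay numbers and the names `design_a` and `design_b` are hypothetical.

```python
def skew(sink_delays):
    """Skew of one clock network under one map: max minus min sink delay."""
    return max(sink_delays) - min(sink_delays)

def worst_case_skew(delays_per_map):
    """Worst skew over all temperature maps (one list of sink delays per map)."""
    return max(skew(d) for d in delays_per_map)

# Hypothetical sink delays (ps) for two candidate designs under three maps.
design_a = [[100.0, 101.0, 100.5], [100.0, 108.0, 103.0], [99.0, 100.0, 99.5]]
design_b = [[100.0, 103.0, 101.0], [101.0, 104.0, 102.0], [100.0, 102.5, 101.0]]

# The formulation prefers the design whose worst map is least bad.
best = min((design_a, design_b), key=worst_case_skew)
```

Here `design_a` has lower skew on two of the three maps, yet `design_b` wins because its worst map is better. That is the conflict between applications that a single fixed temperature map cannot capture.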
7 Outline
- Modeling and Problem Formulation
- Algorithms
- Experimental Results
- Conclusions
8 Bottom-up Greedy-based Re-embedding
[Figure labels: re-embedding option, sink, original merging point]
9 Bottom-up Greedy-based Re-embedding
[Figure label: new merging point]
10 Delay and Skew with Re-embedding
- Perturbed Modified Nodal Analysis (MNA)
  - x is the state for the source, sinks and merging points
  - L selects sink responses
- Defining a new state variable with both nominal (x) and sensitivity (δx) is key to triangularizing the system
  - Structured and parameterized state matrix
The number of re-embedding options, 5^N, is huge!
(N is the number of merging points)
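The slide's equations did not survive extraction. As a hedged reconstruction, assume the standard frequency-domain MNA form with conductance matrix G, capacitance matrix C, input matrix B and input u. None of these symbols appear in the surviving text. Write δG and δC for the first-order change in G and C caused by a re-embedding choice, and δx for the resulting sensitivity of the state x. The triangular structure then follows from a first-order expansion:

```latex
\begin{aligned}
(G + sC)\,x &= B\,u, \qquad y = L^{T} x
  && \text{(nominal system; } L \text{ selects sink responses)} \\
\bigl(G + \delta G + s(C + \delta C)\bigr)(x + \delta x) &= B\,u
  && \text{(system perturbed by re-embedding)} \\
(G + sC)\,\delta x + (\delta G + s\,\delta C)\,x &\approx 0
  && \text{(subtract nominal, drop second-order terms)} \\
\begin{bmatrix} G + sC & 0 \\ \delta G + s\,\delta C & G + sC \end{bmatrix}
\begin{bmatrix} x \\ \delta x \end{bmatrix}
&= \begin{bmatrix} B \\ 0 \end{bmatrix} u
  && \text{(augmented state, lower block triangular)}
\end{aligned}
```

Under these assumptions the perturbed sink responses are approximately L^T (x + δx), and skew follows from the differences between sink delays.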
11 Compressing Solution Space by Temperature Correlation
- Motivation
  - Highly correlated merging points should be re-embedded in the same fashion
- Solution
  - Calculate correlation between two merging points based on temperature correlations
  - Cluster merging points based on correlation strength
  - Perform the same re-embedding for all points within one cluster
12 Temperature Correlation Driven Clustering
- Correlation matrix C of merging points is low-rank, and Singular Value Decomposition (SVD) reveals the rank K
- Partition the merging points into K clusters (K-Means)
  - Maximize the correlation strength within each of the K clusters
- Example: K = 4, N = 70
  - Solution space reduced from 5^70 to 5^4
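The clustering step can be sketched as below, using NumPy. Several choices here are assumptions rather than details from the slides: the rank is taken as the number of singular values above a relative threshold `tol`, k-means runs on the rows of the correlation matrix, and the initialization is a deterministic farthest-point rule. The 6-by-6 matrix is made-up data.

```python
import numpy as np

def estimate_rank(C, tol=0.05):
    """Numerical rank: count singular values above tol * largest (assumed rule)."""
    s = np.linalg.svd(C, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def kmeans_rows(C, k, iters=20):
    """Plain Lloyd's k-means on the rows of C, with deterministic
    farthest-point initialization (an illustrative choice)."""
    centers = [C[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(C - c, axis=1) for c in centers], axis=0)
        centers.append(C[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = np.linalg.norm(C[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = C[labels == j].mean(axis=0)
    return labels

# Hypothetical correlation matrix: 6 merging points in 2 tightly correlated groups.
C = np.full((6, 6), 0.1)
C[:3, :3] = 0.95
C[3:, 3:] = 0.95
np.fill_diagonal(C, 1.0)

K = estimate_rank(C)
labels = kmeans_rows(C, K)

# With 5 options per decision, clustering shrinks the search space from 5^N to 5^K.
options_before = 5 ** C.shape[0]
options_after = 5 ** K
```

For this toy matrix the two dominant singular values are well separated from the rest, so the sketch finds two clusters and the option count drops from 5^6 to 5^2. The slide's own example follows the same arithmetic with N = 70 and K = 4.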
13 Recap of Skew Calculation with Re-embedding
K << N
[Figure label: Delay and Skew]
14 Simultaneous Re-embedding and Cross Link Insertion
- Decide crosslink candidates according to [Rajaram, DAC04]
- Cluster crosslink candidates again based on the temperature correlation
- Calculate skew sensitivities w.r.t. crosslink and re-embedding candidates
  - In a fashion similar to the previous triangular block-wise model order reduction (MOR)
- Bottom-up select the best crosslink or re-embedding
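The bottom-up selection can be illustrated with a generic greedy loop. This is a sketch under strong assumptions, not the authors' algorithm: the evaluator `toy_worst_skew` is a made-up linear sensitivity model standing in for the real delay computation, the cluster names are hypothetical, and the five integer options merely mirror the five choices per decision implied by the 5^N count.

```python
def greedy_select(clusters, options, worst_skew):
    """Visit clusters in the given (bottom-up) order. For each one, keep the
    option that minimizes worst_skew while earlier choices stay fixed."""
    choice = {c: options[0] for c in clusters}  # options[0] means "no change"
    for c in clusters:
        best_opt, best_val = choice[c], worst_skew(choice)
        for opt in options:
            trial = dict(choice)
            trial[c] = opt
            val = worst_skew(trial)
            if val < best_val:  # strict improvement only
                best_opt, best_val = opt, val
        choice[c] = best_opt
    return choice, worst_skew(choice)

# Hypothetical model: per temperature map, skew = |base + sum(sensitivity * option)|.
maps = [
    ({"leaf_a": 1.0, "leaf_b": 0.5, "root": 0.0}, 3.0),
    ({"leaf_a": 0.5, "leaf_b": 1.0, "root": 1.0}, -1.0),
]

def toy_worst_skew(choice):
    return max(abs(base + sum(sens[c] * choice[c] for c in choice))
               for sens, base in maps)

clusters = ["leaf_a", "leaf_b", "root"]  # leaves first, root last
options = [0, -2, -1, 1, 2]              # 0 keeps the original embedding
choice, final_skew = greedy_select(clusters, options, toy_worst_skew)
```

In this toy run the worst-case value falls from 3.0 with no changes to 1.5. A greedy pass like this offers no optimality guarantee; it only shows how one candidate is committed per cluster.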
15 Outline
- Modeling and Problem Formulation
- Algorithms
- Experimental Results
- Conclusions
16 Experimental Settings
- Temperature maps are obtained by a micro-architecture level power-temperature transient simulator [Liao, TCAD05] with 6 SPEC2000 applications
  - 100 temperature maps, one for each 10 million clock cycles
- Compare four algorithms (two categories)
  - Traditional optimization under nominal temperature and Elmore delay
    - DME: deferred merging-point embedding to minimize wire-length for zero-skew
    - xlink: cross-link insertion [Rajaram, ICCAD'04]
  - The proposed algorithms with temperature variation and high-order delay model
    - re-embed: re-embedding
    - xlink+re-embed: simultaneous re-embedding and cross-link insertion
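To illustrate why a tree balanced under nominal temperature can lose its zero skew, here is a minimal Elmore delay sketch for an RC tree. The linear resistance model R = R0 (1 + beta (T - T_ref)) is a common first-order approximation; the values of `beta` and `t_ref`, the tree, and the unitless R and C numbers are illustrative assumptions, not data from the experiments.

```python
def elmore_delays(parent, r, c):
    """Elmore delay from the root to every node of an RC tree.
    parent[i] is the parent of node i (-1 for the root, and parent[i] < i);
    r[i] is the resistance of the wire into node i; c[i] is the node capacitance."""
    n = len(parent)
    c_down = list(c)
    for i in range(n - 1, 0, -1):   # accumulate downstream capacitance
        c_down[parent[i]] += c_down[i]
    delay = [0.0] * n
    for i in range(1, n):           # parents are visited before children
        delay[i] = delay[parent[i]] + r[i] * c_down[i]
    return delay

def scale_resistance(r0, temps, beta=0.004, t_ref=25.0):
    """Linear temperature model; beta and t_ref are illustrative values."""
    return [ri * (1.0 + beta * (t - t_ref)) for ri, t in zip(r0, temps)]

# Hypothetical symmetric tree: root 0, merging point 1, sinks 2 and 3.
parent = [-1, 0, 1, 1]
r0 = [0.0, 1.0, 2.0, 2.0]
c = [0.0, 0.0, 1.0, 1.0]

nominal = elmore_delays(parent, r0, c)
# Only the wire into sink 3 runs hot (125 C); the rest stay at 25 C.
hot = elmore_delays(parent, scale_resistance(r0, [25.0, 25.0, 25.0, 125.0]), c)
skew_nominal = abs(nominal[2] - nominal[3])
skew_hot = abs(hot[2] - hot[3])
```

The tree has zero skew at nominal temperature, but heating one branch raises its wire resistance and produces nonzero skew. That is the effect the temperature-aware algorithms are meant to control.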
17 Skew Distribution Over 100 Temperature Maps
- XR: cross-link insertion + re-embedding
- DME: Deferred Merging-point Embedding
18 Worst-case Skew
- For tree structure, re-embed reduces the worst-case skew by 3x on average (up to 20x) compared to DME.
- For non-tree structure, xlink+re-embed reduces the worst-case skew by 30% on average (up to 7x) compared to xlink.
[Table unit: ps]
19 Wire Length
- For tree structure, re-embed has less than 1% wire length overhead compared to DME.
- For non-tree structure, xlink+re-embed has 5% LESS wire length compared to xlink.
20 Runtime
- Temperature-aware optimizations (re-embed and xlink+re-embed) are about 10x slower compared to DME and xlink, respectively, but
  - Our work uses high-order delay model
  - DME and xlink use Elmore delay
21 Conclusions
- Studied the clock optimization for workload dependent temperature variation
  - Reduced the worst-case skew by up to 7X with LESS wire-length compared to best existing method
- Correlation-aware modeling and optimization paradigm can be extended to handle PVT variations, and more design freedoms
  - Temperature Aware Microprocessor Floorplanning Considering Application Dependent Power Load [Chu et al, ICCAD07]
  - Efficient Decoupling Capacitance Budgeting Considering Operation and Processing Variations [Shi et al, finalist for Best Paper, ICCAD07]
22 Thank you!
- SRC TechCon 2007
- Hao Yu (graduated), Yu Hu (presenter), Chun-Chen Liu and Lei He (PI)
- Minimal Skew Clock Embedding Considering Time-Variant Temperature Gradient