Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient

Slides: 23
Provided by: yuhu
Learn more at: http://eda.ee.ucla.edu

Transcript and Presenter's Notes


1
Minimal Skew Clock Synthesis Considering
Time-Variant Temperature Gradient
  • Hao Yu, Yu Hu, Chun-Chen Liu and Lei He
  • EE Department, UCLA
  • Presented by Yu Hu
  • Partially supported by SRC task 1116.

2
Introduction
  • Both process and operation variations cause
    uncertainties and may lead to design failure or
    over-design.
  • Process variations have been actively studied:
      • Statistical timing analysis
      • Stochastic optimization
      • Post-silicon configuration
  • Stochastic optimization for the operation variations
    below has been largely ignored:
      • Fluctuation of crosstalk noise and P/G network
        noise due to different input vectors
      • Time-variant on-chip temperature map over
        different workloads
  • This work is the first in-depth study on clock
    synthesis considering time-variant temperature
    variations

3
Limitation of Existing Work
  • The existing work [Cho, ICCAD'05] ignores
    time-variant temperature variations and assumes a
    fixed temperature map
  • Different workloads lead to different
    temperature maps (e.g., the two SPEC2000
    applications Ammp and Gzip)
  • Optimizing skew for one application hurts the
    skew for another application; this work resolves
    that conflict

4
Outline
  • Modeling and Problem Formulation
  • Algorithms
  • Experimental Results
  • Conclusions

5
Stochastic Temperature Model
  • The temperature map is unique for each
    application or program phase
      • It can be obtained by uArch-level simulation
  • For each region of the chip, temperature is
    characterized by its mean and variance over a
    number of maps
  • Principal component analysis (PCA) is used to decide
    the number of maps
  • Temperature correlation, measured as covariance
    between regions, is high over the SPEC2000
    benchmark set

(Figure legend: entry (i,j) is the correlation between regions i and j)
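
The per-region statistics described on this slide can be sketched with NumPy. This is only an illustration: the array `maps`, its shape (one row per temperature map, one column per chip region) and its values are assumptions, not data from the presentation.

```python
import numpy as np

# Rows are temperature maps (one per workload phase); columns are chip regions.
# The values are made-up temperatures in degrees Celsius, for illustration only.
maps = np.array([
    [60.0, 62.0, 80.0],
    [70.0, 72.0, 75.0],
    [80.0, 82.0, 70.0],
])

# Mean and (population) variance of each region over all maps.
region_mean = maps.mean(axis=0)
region_var = maps.var(axis=0)

# Normalized covariance (Pearson correlation) between every pair of regions.
# rowvar=False tells NumPy that each column, not each row, is a variable.
region_corr = np.corrcoef(maps, rowvar=False)
```

In this toy data regions 0 and 1 heat up together, so their correlation is close to 1, while region 2 cools as they heat, so its correlation with them is close to -1.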
6
Problem Formulation
  • Given
      • The source, sinks and an initial tree embedding
      • A set of temperature maps for a benchmark set
  • Design freedoms
      • Re-embedding of the clock tree
      • Cross-link insertion
  • Objective: minimize the worst-case skew among the
    given temperature maps
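
A minimal sketch of the objective, assuming the sink delays under each temperature map are already available as a table. The function name `worst_case_skew` and the delay values are hypothetical. Skew under one map is taken as the largest sink delay minus the smallest, and the worst case is the maximum of that over all maps.

```python
import numpy as np

def worst_case_skew(sink_delays):
    """Return (worst skew, index of the map that produces it).

    sink_delays[m][s] is the delay to sink s under temperature map m.
    """
    d = np.asarray(sink_delays, dtype=float)
    per_map_skew = d.max(axis=1) - d.min(axis=1)  # skew under each map
    worst_map = int(per_map_skew.argmax())
    return float(per_map_skew[worst_map]), worst_map

# Hypothetical delays in ps: 3 temperature maps, 3 sinks.
delays_ps = [
    [100.0, 103.0, 101.0],
    [100.0, 108.0, 102.0],
    [ 99.0, 101.0, 104.0],
]
worst, which = worst_case_skew(delays_ps)
```

A tree tuned only for the first map would see 3 ps of skew there, but the second map still yields 8 ps, which is the value the formulation tries to minimize.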

7
Outline
  • Modeling and Problem Formulation
  • Algorithms
  • Experimental Results
  • Conclusions

8
Bottom-up Greedy-based Re-embedding
(Figure labels: re-embedding option, sink, original merging point)
9
Bottom-up Greedy-based Re-embedding
(Figure label: new merging point)
10
Delay and Skew with Re-embedding
  • Perturbed Modified Nodal Analysis (MNA)
      • x is for source, sinks and merging points
      • L selects sink responses
  • Defining a new state variable with both nominal
    (x) and sensitivity (δx) parts is key to
    triangularizing the system
  • Structured and parameterized state matrix
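
The slide's equations did not survive in this transcript, so the following is only a sketch of the general idea, reduced to a static (DC) resistive network. Under a first-order perturbation, (G0 + dG)(x0 + dx) = b splits into G0 x0 = b and G0 dx = -dG x0. Stacking the nominal and sensitivity parts into one state gives a block lower-triangular system. The matrices `G0`, `dG` and the vector `b` are hypothetical values, not taken from the presentation.

```python
import numpy as np

# Nominal conductance matrix of a small resistive network (hypothetical values).
G0 = np.array([[ 2.0, -1.0,  0.0],
               [-1.0,  2.0, -1.0],
               [ 0.0, -1.0,  2.0]])
# Small perturbation, e.g. a temperature-induced conductance change (assumed).
dG = 1e-3 * np.diag([1.0, 2.0, 3.0])
b = np.array([1.0, 0.0, 0.0])

n = G0.shape[0]
# Augmented block lower-triangular system for the stacked state [x0; dx]:
#   [ G0  0  ] [x0]   [b]
#   [ dG  G0 ] [dx] = [0]
A = np.block([[G0, np.zeros((n, n))],
              [dG, G0]])
rhs = np.concatenate([b, np.zeros(n)])
z = np.linalg.solve(A, rhs)
x0, dx = z[:n], z[n:]

# Compare against solving the perturbed system directly.
exact = np.linalg.solve(G0 + dG, b)
first_order_error = np.max(np.abs(x0 + dx - exact))
nominal_error = np.max(np.abs(x0 - exact))
```

The first-order estimate `x0 + dx` tracks the exact perturbed solution far more closely than the nominal solution alone, and the triangular block structure means the nominal part can be solved once and reused for each perturbation.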

The number of re-embedding options, 5^N, is huge!
(N is the number of merging points)
11
Compressing Solution Space by Temperature Correlation
  • Motivation
      • Highly correlated merging points should be
        re-embedded in the same fashion
  • Solution
      • Calculate correlation between two merging points
        based on temperature correlations
      • Cluster merging points based on correlation
        strength
      • Perform the same re-embedding for all points
        within one cluster

12
Temperature Correlation Driven Clustering
  • Correlation matrix C of merging points is
    low-rank, and Singular Value Decomposition
    (SVD) reveals the rank K
  • Partition the merging points into K clusters
    (K-Means)
  • Maximize the correlation strength within each of
    the K clusters
  • K = 4, N = 70
  • Solution space reduced from 5^70 to 5^4
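
The clustering step can be illustrated on a toy correlation matrix. This is a sketch under assumptions, not the authors' implementation: the matrix `C` is synthetic (two groups of three merging points), the rank threshold `tol` is an arbitrary choice, and `kmeans_rows` is a small hand-written K-Means with a deterministic farthest-point initialization.

```python
import numpy as np

def estimate_rank(C, tol=0.2):
    """Count singular values above tol times the largest one (tol is assumed)."""
    s = np.linalg.svd(C, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def kmeans_rows(X, k, iters=50):
    """Plain K-Means on the rows of X with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Synthetic correlation matrix: 0.9 within a group, 0.1 across groups.
B = np.kron(np.eye(2), np.ones((3, 3)))
C = 0.1 * np.eye(6) + 0.8 * B + 0.1 * np.ones((6, 6))

K = estimate_rank(C)
labels = kmeans_rows(C, K)

# Solution-space sizes quoted on the slide (5 options per point or cluster).
options_before = 5 ** 70
options_after = 5 ** 4
```

With one re-embedding decision per cluster instead of per merging point, the slide's example shrinks the search from 5^70 (roughly 8.5e48) combinations to 5^4 = 625.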

13
Recap of Skew Calculation with Re-embedding
K << N
Delay and Skew
14
Simultaneous Re-embedding and Cross Link Insertion
  • Decide crosslink candidates according to
    [Rajaram, DAC'04]
  • Cluster crosslink candidates again based on the
    temperature correlation
  • Calculate skew sensitivities w.r.t. crosslink and
    re-embedding candidates
      • In a fashion similar to the previous triangular
        block-wise MOR
  • Select the best crosslink or re-embedding,
    proceeding bottom-up
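
The selection step can be sketched as a greedy loop. This toy version is heavily simplified and is not the authors' algorithm: it ignores the bottom-up tree traversal, treats each candidate as a fixed first-order change to the skew under every temperature map, and uses hypothetical names (`greedy_select`, `reembed_c1`, `xlink_a`, `reembed_c2`) and made-up numbers.

```python
import numpy as np

def greedy_select(skew, candidates):
    """Repeatedly apply the candidate that most reduces the worst-case skew.

    skew:       skew under each temperature map (ps)
    candidates: name -> estimated skew change under each map (ps)
    """
    skew = np.asarray(skew, dtype=float)
    remaining = {name: np.asarray(d, dtype=float) for name, d in candidates.items()}
    chosen = []
    while remaining:
        # Candidate whose application yields the smallest worst-case |skew|.
        name, delta = min(remaining.items(),
                          key=lambda kv: np.max(np.abs(skew + kv[1])))
        new_skew = skew + delta
        if np.max(np.abs(new_skew)) >= np.max(np.abs(skew)):
            break  # no remaining candidate improves the worst case
        skew = new_skew
        chosen.append(name)
        del remaining[name]
    return chosen, skew

initial_skew = [10.0, 4.0, 8.0]          # one entry per temperature map
candidates = {
    "reembed_c1": [-6.0,  1.0, -2.0],    # re-embed cluster 1
    "xlink_a":    [-2.0, -1.0, -5.0],    # insert cross link a
    "reembed_c2": [ 1.0, -3.0,  3.0],    # re-embed cluster 2
}
chosen, final_skew = greedy_select(initial_skew, candidates)
```

Here the loop picks `reembed_c1`, then `xlink_a`, and stops because `reembed_c2` would not lower the worst-case skew any further.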

15
Outline
  • Modeling and Problem Formulation
  • Algorithms
  • Experimental Results
  • Conclusions

16
Experimental Settings
  • Temperature maps are obtained by a
    micro-architecture level power-temperature
    transient simulator [Liao, TCAD'05] with 6
    SPEC2000 applications
      • 100 temperature maps, one per 10 million
        clock cycles
  • Compare four algorithms (two categories)
  • Traditional optimization under nominal
    temperature and Elmore delay
      • DME: deferred merging-point embedding to
        minimize wire length for zero skew
      • xlink: cross-link insertion [Rajaram, ICCAD'04]
  • The proposed algorithms with temperature
    variation and a high-order delay model
      • re-embed: re-embedding
      • xlink+re-embed: simultaneous re-embedding and
        cross-link insertion
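
For reference, the Elmore delay used by the traditional baselines can be sketched for a simple RC chain, together with an illustration of how a temperature difference between two otherwise identical branches creates skew. The linear resistance model, the coefficient `beta`, the reference temperature and all component values are assumptions for this example and do not come from the presentation.

```python
def elmore_chain_delay(resistances, capacitances):
    """Elmore delay to the end of an RC chain.

    Segment i has series resistance resistances[i] feeding a node with
    capacitance capacitances[i]; each resistance sees all capacitance
    at or beyond its own node.
    """
    delay = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream
        downstream -= c
    return delay

def scaled_resistances(resistances, temp_c, beta=0.004, ref_temp_c=25.0):
    """Assumed linear model: R(T) = R0 * (1 + beta * (T - T_ref))."""
    factor = 1.0 + beta * (temp_c - ref_temp_c)
    return [r * factor for r in resistances]

r_nominal = [100.0, 100.0]     # ohms (hypothetical)
c_nodes = [10e-15, 10e-15]     # farads (hypothetical)

delay_cool = elmore_chain_delay(scaled_resistances(r_nominal, 60.0), c_nodes)
delay_hot = elmore_chain_delay(scaled_resistances(r_nominal, 100.0), c_nodes)
skew = delay_hot - delay_cool
```

Under these assumed values the two branches are balanced at equal temperature, yet a 40 degree difference between them introduces about 0.48 ps of skew, which is the kind of effect a fixed-temperature optimization cannot see.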

17
Skew Distribution Over 100 Temperature Maps
  • XR: cross-link insertion + re-embedding
  • DME: deferred merging-point embedding

18
Worst-case Skew
  • For tree structure, re-embed reduces the
    worst-case skew by 3x on average (up to 20x)
    compared to DME.
  • For non-tree structure, xlink+re-embed reduces
    the worst-case skew by 30% on average (up to 7x)
    compared to xlink.

(Chart axis unit: ps)
19
Wire Length
  • For tree structure, re-embed has less than 1%
    wire length overhead compared to DME
  • For non-tree structure, xlink+re-embed has 5%
    LESS wire length compared to xlink.

20
Runtime
  • Temperature-aware optimizations (re-embed and
    xlink+re-embed) are about 10x slower than
    DME and xlink, respectively, but:
      • Our work uses a high-order delay model
      • DME and xlink use Elmore delay

21
Conclusions
  • Studied clock optimization for workload-dependent
    temperature variation
  • Reduced the worst-case skew by up to 7x with LESS
    wire length compared to the best existing method
  • The correlation-aware modeling and optimization
    paradigm can be extended to handle PVT
    variations and more design freedoms
      • Temperature Aware Microprocessor Floorplanning
        Considering Application Dependent Power Load
        [Chu et al., ICCAD'07]
      • Efficient Decoupling Capacitance Budgeting
        Considering Operation and Processing Variations
        [Shi et al., finalist for Best Paper, ICCAD'07]

22
Thank you!
  • SRC TechCon 2007
  • Hao Yu (graduated), Yu Hu (presenter),
  • Chun-Chen Liu and Lei He (PI)
  • Minimal Skew Clock Embedding Considering Time
    Variant Temperature Gradient