A Sliding Window Scheme for Accurate Clock Mesh Analysis

1
A Sliding Window Scheme for Accurate Clock Mesh
Analysis

Hongyu Chen (2), C.Y. Yeh (3), G. Wilke (4), S. Reddy (1), H. Nguyen (1), W. Walker (1), R. Murgai (1)
(1) Fujitsu Laboratories of America, Inc., CA, USA
(2) University of California, San Diego, CA, USA
(3) University of California, Santa Barbara, USA
(4) UFRGS, Brazil

2
Outline

Problem Statement
Mesh based architectures
Sliding Window Scheme
Improving the SWS Accuracy
Optimal Window Size Selection
Conclusions

3
Mesh-based Clock Architectures

Excellent for low skew, jitter
Used in modern processors
Difficult to analyze
vs. tree: better performance, but more routing resource usage and no existing tool support

4
Pure Mesh Architecture

Three components:
- Mesh: n x n (uniform) mesh. A (uniform) array of k x k buffers drives the mesh at grid nodes.
- Global tree: drives the mesh buffers.
- Local distribution: FFs directly connected to the nearest mesh segment.

5
Clock Net Analysis Problem

Goal: Given a mesh-based clock architecture, compute the latency (delay) from the clock root to each flip-flop (FF).
Needed to determine the clock cycle and/or timing violations.
Skew imposes constraints on min and max logic path delays:
- long path analysis: $a_j - a_i \ge \mathrm{logic\_max} + t_{\mathrm{setup}} - T_{\mathrm{cycle}}$
- short path analysis: $a_j - a_i \le \mathrm{logic\_min} - t_{\mathrm{hold}}$
Mesh architectures are difficult to analyze for a real design:
- Huge number of circuit nodes in the model, needed for accuracy.
- Large number of metal loops present in the mesh structure.
- A design with a 64x64 clock mesh and 200K FFs exceeds HSPICE capacity.
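
The two skew constraints can be checked mechanically once the per-FF latencies are known. The following is a minimal sketch, assuming a_i and a_j are the clock arrival times at the launching and capturing FFs of a logic path; the function name and the numbers (in picoseconds) are illustrative and do not come from the slides.

```python
def check_clock_constraints(a_i, a_j, logic_max, logic_min,
                            t_setup, t_hold, t_cycle):
    """Return (setup_ok, hold_ok) for one launch/capture FF pair.

    a_i, a_j: clock arrival times at the launching and capturing FF
    (assumed interpretation of the symbols on the slide).
    """
    skew = a_j - a_i
    # Long path (setup): a_j - a_i >= logic_max + t_setup - T_cycle
    setup_ok = skew >= logic_max + t_setup - t_cycle
    # Short path (hold): a_j - a_i <= logic_min - t_hold
    hold_ok = skew <= logic_min - t_hold
    return setup_ok, hold_ok


# Hypothetical values in picoseconds.
ok_pair = check_clock_constraints(100, 120, 900, 50, 30, 20, 1000)
bad_pair = check_clock_constraints(100, 150, 900, 50, 30, 20, 1000)
```

In the second call the skew of 50 ps exceeds logic_min - t_hold = 30 ps, so the hold check fails while the setup check still passes.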

6
Previous Work on Clock Mesh Analysis

Break clock mesh into tree; apply smoothing algorithm to redistribute mesh loads [IBM patent, March 2001; Restle et al. 2001]
- No accuracy results shown.
Interconnect reduction using AWE [Bailey et al. 2001, DEC]
- Moment matching technique.
- Orthogonal to our scheme.
Sizing a clock mesh given latency constraints
[Desai et al. 1996, DEC]
- Break mesh into tree; use approximate model of delay.
[Vandenberghe et al. 1997]
- Use dominant time constant as measure of delay.
- Use semi-definite programming.
- Show results only for smaller meshes.

7
Sliding Window-based Simulation (SWS)

Proposed a new sliding window-based scheme for mesh analysis.
Two nodes on the mesh that are far from each other have little electrical impact on each other.
- Insight: an RC mesh constitutes a cascaded low-pass filter, so each driver has only a local effect.
Model the mesh at two different resolutions:
- Detailed model for mesh elements close to the nodes being measured.
- Simplified model for other nodes.

8
Sliding Window-based Simulation (SWS)
Preserve detailed circuit inside window.
Outside the window: lump capacitance and remove resistors (except on the mesh itself).
C_a: total load inside the rectangular area a.
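
The reduction step can be sketched as follows. This is only an illustration under stated assumptions: FFs are given as hypothetical (x, y, capacitance) tuples in units of the mesh pitch, the window is a half-open rectangle, and the load outside the window is lumped onto the nearest mesh grid node. The slides do not say exactly where the lumped capacitance is attached, so that choice is an assumption.

```python
from collections import defaultdict


def reduce_outside_window(ffs, window, pitch=1.0):
    """Split FF loads into a detailed set and lumped mesh-node loads.

    ffs:    iterable of (x, y, cap) tuples (hypothetical data layout).
    window: (x0, y0, x1, y1), half-open rectangle kept in full detail.
    Returns (detailed_ffs, lumped) where lumped maps a mesh grid node
    (col, row) to the total capacitance lumped onto it.
    """
    x0, y0, x1, y1 = window
    detailed = []
    lumped = defaultdict(float)
    for x, y, cap in ffs:
        if x0 <= x < x1 and y0 <= y < y1:
            # Inside the window: keep the FF and its local wiring as-is.
            detailed.append((x, y, cap))
        else:
            # Outside: drop local resistors, keep only the capacitance,
            # attached to the nearest grid node (assumed attachment point).
            node = (round(x / pitch), round(y / pitch))
            lumped[node] += cap
    return detailed, dict(lumped)


ffs = [(0.2, 0.3, 1.0), (0.4, 0.1, 2.0), (3.1, 2.9, 1.5), (2.8, 3.2, 0.5)]
detailed, lumped = reduce_outside_window(ffs, window=(0, 0, 2, 2))
```

Here the two FFs near the origin stay in the detailed model, and the two FFs near grid node (3, 3) collapse into a single lumped capacitance on that node.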
9
Benefits of SWS

Reduces memory usage by simplifying the model to be simulated (for each window).
Example: 64x64 mesh, 100K FFs distributed uniformly over the chip.
Assuming a 1-pi model for interconnect (2 nodes per segment):
- Golden model needs 308K nodes (8K nodes for 4K mesh segments + 300K nodes for the FFs).
- SWS with a window size of 16x16 needs 29K nodes (8K for mesh + 21K for 7K FFs).
- 10X reduction in model size!
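
The node counts on this slide can be reproduced with a few lines of arithmetic. The sketch below takes the slide's figures as given (4K mesh segments, 2 nodes per segment); the value of 3 nodes per FF is inferred from the slide's own numbers (300K nodes for 100K FFs, 21K nodes for 7K FFs), and the function name is illustrative.

```python
def model_nodes(mesh_segments, ffs, nodes_per_segment=2, nodes_per_ff=3):
    """Circuit nodes in a model: mesh segments plus detailed FF branches."""
    return mesh_segments * nodes_per_segment + ffs * nodes_per_ff


golden = model_nodes(mesh_segments=4_000, ffs=100_000)  # full detail
windowed = model_nodes(mesh_segments=4_000, ffs=7_000)  # 16x16 window
reduction = golden / windowed

print(golden, windowed, round(reduction, 1))
```

The mesh itself is kept in both models, so the saving comes entirely from the FFs outside the window, whose detailed branches are replaced by lumped capacitance.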

[Figure: 1-pi segment model, with C_w/2 at each end]
10
Benefits of SWS (contd)

Run-time:
- Assume SPICE run-time is O(N^1.5), where N is the number of nodes.
- Each window simulation is 10^1.5 ≈ 32X faster than the golden simulation.
- 16 simulations cover the mesh.
- Overall speed-up factor ≈ 2 (32/16).
Can complete on fine meshes.
Is very accurate.
Suited to parallelization or grid computing.
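
The run-time estimate follows directly from the assumed O(N^1.5) scaling. A short sketch of the arithmetic, with an illustrative function name and sequential execution of all windows assumed:

```python
def sws_speedup(size_reduction, num_windows, exponent=1.5):
    """Estimate SWS speed-up under an assumed O(N**exponent) SPICE run-time.

    size_reduction: golden-model node count / per-window node count.
    num_windows:    number of window simulations, run one after another.
    Returns (per_window_speedup, overall_speedup).
    """
    per_window = size_reduction ** exponent
    return per_window, per_window / num_windows


per_window, overall = sws_speedup(size_reduction=10, num_windows=16)
print(f"per window: {per_window:.1f}X, overall: {overall:.2f}X")
```

Because the windows are independent, running them on several machines would divide the wall-clock time further, which is why the slide calls the scheme suited to parallelization.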

11
SWS Accuracy
12
Improving Accuracy of SWS

Noticed large errors outside the window and in the window periphery.
Solution:
- Add a border to the window w to form a new window w'.
- Detailed model within w'.
- Delay measurement only for FFs inside w.
- Ignore noisy FFs in the border w' - w.
- New windows are overlapping.

Improves accuracy at the expense of runtime.
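
One way to enumerate the overlapping windows is sketched below. It assumes the mesh is tiled by non-overlapping measurement windows, each grown by a fixed border (clipped at the mesh edge) to give the region simulated in detail. The border width of 4 grid cells and the function name are illustrative, not values from the slides.

```python
def bordered_windows(n, w, border):
    """Yield (inner, outer) boxes that tile an n x n mesh.

    Boxes are (row0, col0, row1, col1), half-open, in grid-cell units.
    inner: region whose FF delays are measured.
    outer: inner grown by `border` cells, clipped to the mesh; this is
           the region modeled in detail, so neighboring outers overlap.
    """
    for r0 in range(0, n, w):
        for c0 in range(0, n, w):
            r1, c1 = min(r0 + w, n), min(c0 + w, n)
            inner = (r0, c0, r1, c1)
            outer = (max(r0 - border, 0), max(c0 - border, 0),
                     min(r1 + border, n), min(c1 + border, n))
            yield inner, outer


windows = list(bordered_windows(n=64, w=16, border=4))
```

Every FF lies in exactly one inner box, so each latency is reported once, while FFs in the border region of a given simulation are simulated but their delays are discarded.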

13
Flow of Improved SWS
14
Accuracy of SWS with Border
15
Accuracy of SWS With Border
16
Optimum Window Size

Main concerns:
- Memory limit: fit one simulation into memory.
- Run-time reduction: SPICE run-time is O(N^a), a ≈ 1.5.
- Parallelism: prefer smaller windows for parallel execution.
Experimental studies on a 64x64 mesh.
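
The trade-off between these concerns can be explored with a rough cost model before running any simulation. The sketch below is not the authors' selection procedure, which the slides describe as experimental. It assumes uniformly distributed FFs, 8K mesh nodes, 3 nodes per FF, sequential run-time proportional to N^1.5 per window, and a hypothetical per-simulation node limit standing in for the memory limit.

```python
import math


def window_cost(n, w, border, total_ffs,
                mesh_nodes=8_000, nodes_per_ff=3, a=1.5):
    """Return (num_windows, nodes_per_window, relative_total_runtime)."""
    num_windows = math.ceil(n / w) ** 2
    outer = min(w + 2 * border, n)               # detailed region side
    ffs_in_window = total_ffs * (outer / n) ** 2  # uniform FF density
    nodes = mesh_nodes + nodes_per_ff * ffs_in_window
    return num_windows, nodes, num_windows * nodes ** a


def best_window(n, sizes, border, total_ffs, node_limit):
    """Pick the feasible window size with the lowest estimated run-time."""
    best = None
    for w in sizes:
        _, nodes, cost = window_cost(n, w, border, total_ffs)
        if nodes <= node_limit and (best is None or cost < best[0]):
            best = (cost, w)
    return None if best is None else best[1]


num, nodes, _ = window_cost(n=64, w=16, border=0, total_ffs=100_000)
choice = best_window(64, [8, 16, 32, 64], border=4,
                     total_ffs=100_000, node_limit=60_000)
```

Under this model a border makes very small windows expensive, because the border area dominates each simulation, while the node limit rules out very large windows; the preferred size sits between the two.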

17
Optimum Window: 64x64 mesh, 100K FFs
18
Simulation with a real industry design

About 300K FFs
Parallel execution on a 4-processor machine

19
Conclusions

Clock mesh analysis is a difficult problem.
Proposed a new sliding window-based scheme to analyze clock meshes with respect to latency to FFs:
- Accurate to within 1% of HSPICE.
- Can complete on large mesh designs where HSPICE could not.
- Is parallelizable.
Determined a strategy for picking the optimum window size.
Future work:
- Combine SWS with model order reduction techniques.
- Jitter analysis.