Title: A Sliding Window Scheme for Accurate Clock Mesh Analysis
1 A Sliding Window Scheme for Accurate Clock Mesh Analysis
- Hongyu Chen (2), C.Y. Yeh (3), G. Wilke (4), S. Reddy (1), H. Nguyen (1), W. Walker (1), R. Murgai (1)
- (1) Fujitsu Laboratories of America, Inc., CA, USA
- (2) University of California, San Diego, CA, USA
- (3) University of California, Santa Barbara, USA
- (4) UFRGS, Brazil
2 Outline
- Problem Statement
- Mesh based architectures
- Sliding Window Scheme
- Improving the SWS Accuracy
- Optimal Window Size Selection
- Conclusions
3 Mesh-based Clock Architectures
- Excellent for low skew and jitter
- Used in modern processors
- Difficult to analyze
- vs. tree: better performance, but more routing resource usage and no existing tool support
4 Pure Mesh Architecture
- Three components
  - n x n (uniform) mesh
    - (Uniform) array of k x k buffers drives the mesh at grid nodes.
  - Global tree drives mesh buffers
  - Local distribution
    - FFs directly connected to nearest mesh segment
5 Clock Net Analysis Problem
- Goal: given a mesh-based clock architecture, compute the latency (delay) from the clock root to each flip-flop.
- Needed to determine the clock cycle and/or timing violations.
- Skew imposes constraints on min and max logic path delays.
  - Long path analysis: $a_j - a_i \ge \text{logic}_{\max} + t_{\text{setup}} - T_{\text{cycle}}$
  - Short path analysis: $a_j - a_i \le \text{logic}_{\min} - t_{\text{hold}}$
- Mesh architectures are difficult to analyze for a real design:
  - Huge number of circuit nodes in the model (needed for accuracy).
  - Large number of metal loops present in the mesh structure.
  - A design with a 64x64 clock mesh and 200K FFs exceeds HSPICE capacity.
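The two skew constraints on this slide can be checked mechanically once the clock latencies are known. The sketch below is illustrative only and not part of the original slides. It assumes the constraints read a_j - a_i >= logic_max + t_setup - T_cycle (long path) and a_j - a_i <= logic_min - t_hold (short path), with a_i and a_j taken to be the clock latencies at the launching and capturing FFs. The function name `check_path` and its parameters are hypothetical.

```python
def check_path(a_i, a_j, logic_min, logic_max, t_setup, t_hold, t_cycle):
    """Return (setup_ok, hold_ok) for a logic path from FF i to FF j.

    a_i, a_j: clock latencies at the launching and capturing FFs
    (assumed interpretation; all values in the same time unit).
    """
    skew = a_j - a_i
    # Long-path (setup) constraint: data must arrive before the next edge.
    setup_ok = skew >= logic_max + t_setup - t_cycle
    # Short-path (hold) constraint: data must not arrive too early.
    hold_ok = skew <= logic_min - t_hold
    return setup_ok, hold_ok
```

For example, `check_path(1.0, 1.5, 0.3, 0.8, 0.05, 0.05, 1.0)` passes the setup check but fails the hold check, because the 0.5 skew exceeds logic_min - t_hold = 0.25.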
6 Previous Work on Clock Mesh Analysis
- Break clock mesh into tree; apply smoothing algorithm to redistribute mesh loads [IBM patent, March 2001] [Restle et al. 2001]
  - No accuracy results shown.
- Interconnect reduction using AWE [Bailey et al. 2001, DEC]
  - Moment matching technique.
  - Orthogonal to our scheme.
- Sizing a clock mesh given latency constraints
  - Desai et al. 1996, DEC
    - Break mesh into tree
    - Use approximate model of delay
  - Vandenberghe et al. 1997
    - Use dominant time constant as measure of delay.
    - Use semi-definite programming
    - Show results only for smaller meshes.
7 Sliding Window-based Simulation (SWS)
- Proposed new sliding window-based scheme for mesh analysis
- Two nodes on the mesh that are far from each other have little electrical impact on each other
  - Insight: the RC mesh constitutes a cascaded low-pass filter, so each driver has only a local effect
- Model the mesh at two different resolutions
  - Detailed model for mesh elements close to the nodes being measured.
  - Simplified model for other nodes.
8 Sliding Window-based Simulation (SWS)
- Preserve detailed circuit inside window
- Lump capacitance and remove resistors outside window (except on the mesh itself)
- [Figure labels: mesh nodes a, b; Ca = total load inside the rectangle area]
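The two-resolution model can be sketched as a simple partition of the FF loads: FFs inside the window keep their detailed wiring, while loads outside are lumped into a single capacitance per mesh node. The code below is a minimal sketch, not the authors' implementation. The function name, the (x, y, cap) tuple layout, and the choice of lumping each outside load onto its nearest grid node are all assumptions made for illustration.

```python
from collections import defaultdict

def build_window_model(ffs, window, pitch):
    """Split FF loads into a detailed set and lumped per-node capacitances.

    ffs:    list of (x, y, cap) tuples (hypothetical layout).
    window: (x0, y0, x1, y1), half-open rectangle kept in full detail.
    pitch:  spacing between adjacent mesh grid nodes.
    """
    x0, y0, x1, y1 = window
    detailed = []                 # FFs simulated with their local wiring
    lumped = defaultdict(float)   # mesh node (col, row) -> total lumped cap
    for x, y, cap in ffs:
        if x0 <= x < x1 and y0 <= y < y1:
            detailed.append((x, y, cap))
        else:
            # Assumption: lump onto the nearest grid node; the local
            # resistors outside the window are dropped entirely.
            node = (round(x / pitch), round(y / pitch))
            lumped[node] += cap
    return detailed, dict(lumped)
```

Mesh segments themselves are left untouched by this step, matching the "except on the mesh itself" note on the slide.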
9 Benefits of SWS
- Reduces memory usage by simplifying the model to be simulated (for each window).
- 64x64 mesh, 100K FFs distributed uniformly over the chip.
- Assuming 1-pi model for interconnect (2 nodes per segment).
- Golden model needs 308K nodes (8K nodes for 4K mesh segments + 300K nodes for the FFs)
- SWS with window size of 16x16
  - Needs 29K nodes (8K for mesh + 21K for 7K FFs)
  - 10X reduction in model size!
[Figure: pi model of an interconnect segment, with Cw/2 at each end]
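The node counts on this slide follow from simple arithmetic, reproduced in the sketch below. The figure of 3 nodes per FF is not stated on the slide; it is inferred from 300K nodes for 100K FFs (and 21K nodes for 7K FFs). The function and parameter names are hypothetical.

```python
def node_counts(mesh_segments, total_ffs, ffs_in_window,
                nodes_per_segment=2, nodes_per_ff=3):
    """Return (golden_nodes, window_nodes) for the full and windowed models.

    nodes_per_segment=2 corresponds to the 1-pi interconnect model;
    nodes_per_ff=3 is inferred from the slide's totals (an assumption).
    """
    mesh_nodes = mesh_segments * nodes_per_segment   # mesh is always kept
    golden = mesh_nodes + total_ffs * nodes_per_ff
    window = mesh_nodes + ffs_in_window * nodes_per_ff
    return golden, window

golden, window = node_counts(4_000, 100_000, 7_000)
reduction = golden / window   # about 10.6, i.e. the "10X" on the slide
```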
10 Benefits of SWS (contd)
- Run-time
  - Assume SPICE run-time is $O(N^{1.5})$, where N is the number of nodes.
  - Each window simulation is $10^{1.5} \approx 32$X faster than the golden simulation.
  - 16 simulations cover the mesh.
  - Overall speed-up factor of 2.
- Can complete on fine meshes.
- Is very accurate.
- Suited to parallelization or grid-computing.
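The run-time estimate above can be reproduced in a few lines. This is only a restatement of the slide's arithmetic under its own assumption that SPICE run-time scales as N^1.5; the function name is hypothetical.

```python
def sws_speedup(model_reduction, num_windows, exponent=1.5):
    """Return (per_window_speedup, overall_sequential_speedup).

    model_reduction: golden node count divided by per-window node count.
    num_windows:     number of window simulations needed to cover the mesh.
    """
    per_window = model_reduction ** exponent    # one window vs. golden run
    overall = per_window / num_windows          # all windows run one by one
    return per_window, overall

per_window, overall = sws_speedup(10, 16)
```

With a 10X model reduction and 16 windows, `per_window` is about 31.6 and `overall` is just under 2 when the windows run sequentially; running windows in parallel would improve on this.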
11 SWS Accuracy
12 Improving Accuracy of SWS
- Noticed large errors outside the window and in the window periphery.
- Solution
  - Add a border to the window w to form a new window w'.
  - Detailed model within w'.
  - Delay measurement only for FFs inside w.
  - Ignore noisy FFs in the border w' - w.
  - New windows are overlapping.
- Improves accuracy at the expense of runtime.
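One way to enumerate the overlapping windows is sketched below: each core window (where FF delays are measured) is grown by a border and clipped to the mesh, giving the extended window that is modeled in detail. This is a minimal sketch under assumed conventions (square mesh indexed by grid cell, half-open ranges, border given in grid cells); the function name is hypothetical and the slides do not specify how windows are generated.

```python
def bordered_windows(n, w, border):
    """Yield (core, extended) index ranges tiling an n x n mesh.

    core:     (r0, r1, c0, c1), half-open; delays are measured here only.
    extended: core grown by `border` cells on each side, clipped to the
              mesh; this whole region gets the detailed model.
    """
    for r0 in range(0, n, w):
        for c0 in range(0, n, w):
            r1, c1 = min(r0 + w, n), min(c0 + w, n)
            extended = (max(r0 - border, 0), min(r1 + border, n),
                        max(c0 - border, 0), min(c1 + border, n))
            yield (r0, r1, c0, c1), extended

windows = list(bordered_windows(64, 16, 4))
```

The cores tile the mesh without overlap, so every FF is measured exactly once, while the extended windows overlap their neighbors by the border width.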
13 Flow of Improved SWS
14 Accuracy of SWS with Border
15 Accuracy of SWS with Border
16 Optimum Window Size
- Main Concerns
  - Memory limit: fit one simulation into memory
  - Run-time reduction: SPICE run-time is $O(N^a)$, $a \approx 1.5$
  - Parallelism: prefer smaller windows for parallel execution
- Experimental studies on a 64 by 64 mesh
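The slides pick the optimum window size experimentally. Purely as an illustration of how the three concerns interact, the sketch below uses an assumed analytical cost model: uniform FF density, 3 nodes per FF (inferred from the earlier node counts), run-time proportional to N^a per simulation, and windows executed in batches across the available processors. None of these function names or modeling choices come from the original work.

```python
import math

def estimate(n, w, border, total_ffs, mesh_nodes,
             nodes_per_ff=3, a=1.5, procs=1):
    """Return (nodes_per_simulation, relative_runtime) for window size w."""
    ext = min(w + 2 * border, n)                  # extended window side
    ffs_in_window = total_ffs * (ext / n) ** 2    # uniform FF density
    nodes = mesh_nodes + nodes_per_ff * ffs_in_window
    num_windows = math.ceil(n / w) ** 2
    rounds = math.ceil(num_windows / procs)       # batches of parallel runs
    return nodes, rounds * nodes ** a

def best_window(n, candidates, node_limit, **kw):
    """Pick the fastest candidate whose model fits within node_limit."""
    feasible = []
    for w in candidates:
        nodes, runtime = estimate(n, w, **kw)
        if nodes <= node_limit:                   # memory constraint
            feasible.append((runtime, w))
    return min(feasible)[1] if feasible else None

choice = best_window(64, [8, 16, 32, 64], node_limit=400_000,
                     border=0, total_ffs=100_000, mesh_nodes=8_000)
```

In this toy model, very small windows pay for re-simulating the full mesh many times, while very large windows pay the superlinear SPICE cost. With enough processors the balance shifts toward smaller windows, in line with the parallelism concern above.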
17 Optimum Window: 64x64 mesh, 100K FFs
18 Simulation with a real industry design
- About 300K FFs
- Parallel execution on a 4-processor machine
19 Conclusions
- Clock mesh analysis is a difficult problem
- Proposed a new sliding-window based scheme to analyze clock meshes with respect to latency to FFs.
  - Accurate to within 1% of HSPICE.
  - Can complete on large mesh designs when HSPICE could not.
  - Is parallelizable.
- Determined a strategy for picking the optimum window size.
- Future work
  - Combine the SWS with model order reduction techniques
  - Jitter analysis