Title: Interconnect Planning, Synthesis, and Layout for Performance, Signal Reliability and Cost Optimization SRC Task ID: 605.001
1 Interconnect Planning, Synthesis, and Layout for Performance, Signal Reliability and Cost Optimization
SRC Task ID 605.001
- PI: Prof. Jason Cong (UCLA)
- Students: Lei He, David Pan, Xin Yuan
- Mentors: Dr. Prakash Arunachalam (Intel)
- Dr. Norman Chang (HP)
- Dr. Wilm Donath (IBM)
- Dr. Stefan Rusu (Intel)
2 Project Overview
- Objective: investigate an interconnect-centric design flow and methodology, consisting of
  - Interconnect Planning
  - Interconnect Synthesis
  - Interconnect Layout
3 Key Issues in Interconnect Planning
- Three levels of planning
  - Interconnect architecture planning (pre-design)
  - Interconnect planning with RTL-floorplan
  - Interconnect planning with physical-level floorplan
- Enabling tools: interconnect estimation models for interconnect synthesis/layout
4 Review: Accomplishments in Year 1
- Efficient (constant time) and accurate (90%) interconnect delay estimation models for 2-pin nets under different interconnect optimization algorithms (Cong-Pan, IWLS98, SRC/TECHCON98, ASPDAC99)
  - Optimal wire sizing (OWS)
  - Simultaneous driver and wire sizing (SDWS)
  - Simultaneous buffer insertion/sizing and wire sizing (BISWS)
- Interconnect architecture planning (Cong-Pan, DAC99)
  - Propose a unified wire-width planning framework
  - Obtain a surprising result: our pre-determined two-width design can achieve a close-to-optimal solution over a large wire length range!
  - Can handle different objective functions
5 Accomplishments in Year 2
- Efficient and accurate interconnect estimation models for multiple-pin nets (Cong-Pan, TAU99)
- Buffer block planning for interconnect-driven floorplanning (Cong-Kong-Pan, ICCAD99)
- Further study on interconnect architecture planning
6 Interconnect Estimation for Multiple-Pin Nets
- Objective: estimate delay/area under different interconnect optimizations (e.g., OWS, BISWS) quickly (100K - 1M nets per second)
- Different targets
  - 1. Minimize the delay to a single critical sink (SCS)
  - 2. Minimize the maximum delay (defined as the tree delay) for multiple critical sinks (MCS)
7 Estimation for Multiple-Pin Nets
- Very difficult
  - No closed-form wire shaping or buffer insertion
  - All available optimization algorithms are iterative
  - Multiple critical sinks may exist at the same time!
- Our approach
  - Reduce the multiple-pin net estimation problem to one or several 2-pin net estimation problems, then use our previous (Year 1) results
8 Reduction for OWS of SCS
[Figure: net reduced to a Single-Line-Multiple-Load (SLML) structure; labels: G, S1, S2, S3, Sk, Cs1, Cs2, Csk]
9 OWS for SCS
- Transform SLML to SLSL (i.e., 2-pin net)
[Figure: SLML line; labels: R, d, W, segment lengths l1, l2, ..., lk, loads C1, C2, ..., Ck-1, Ck, sink Sk]
10 OWS for SCS
- Transform SLML to SLSL (i.e., 2-pin net)
[Figure: resulting SLSL line; labels: R, d, W, segment lengths l1, l2, ..., lk, sink Sk, capacitances C0 and CL]
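As a rough illustration of the SLML structure, the sketch below evaluates the delay to the last sink of a single line with several loads. It assumes the Elmore delay model and a uniform-width wire; the slides do not state the SLML-to-SLSL transformation itself, so this is not that transformation. The function name and all numeric values are hypothetical.

```python
def slml_elmore_delay(rd, r, c, lengths, loads):
    """Elmore delay from the driver to the last sink of a single line.

    rd      -- driver resistance (ohm)
    r, c    -- wire resistance (ohm/um) and capacitance (F/um); uniform width assumed
    lengths -- segment lengths l1..lk (um)
    loads   -- load capacitances C1..Ck at the end of each segment (F)
    """
    if len(lengths) != len(loads):
        raise ValueError("need one load per segment")
    downstream = 0.0  # capacitance beyond the segment being processed
    delay = 0.0
    # Walk from the far sink back toward the driver.
    for l_i, c_i in zip(reversed(lengths), reversed(loads)):
        downstream += c_i
        # Segment resistance sees half of its own capacitance plus everything after it.
        delay += r * l_i * (c * l_i / 2 + downstream)
        downstream += c * l_i
    # The driver resistance sees the total capacitance of the line.
    return delay + rd * downstream


# Hypothetical values: 180 ohm driver, 10 mm line, one internal load at 3 mm.
d = slml_elmore_delay(180.0, 0.08, 0.12e-15, [3000.0, 7000.0], [100e-15, 10e-15])
```

With a single segment and a single load the function reduces to the familiar 2-pin expression, which is the case the Year 1 models address.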
11 Delay/Area Estimation for OWS/SCS
- Closed-form delay estimation for the critical sink (the formula itself is not preserved in this text)
- W(x) is Lambert's W function, defined as the value w that satisfies w * e^w = x
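Because the closed-form estimate depends on Lambert's W function, evaluating it quickly matters. The following is a minimal sketch of one way to compute the principal branch for non-negative arguments using Newton's method; the function name, tolerance, and iteration limit are illustrative choices, not taken from the slides.

```python
import math


def lambert_w(x, tol=1e-12, max_iter=100):
    """Principal branch of Lambert's W for x >= 0: the w >= 0 with w * exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    # Starting guess; log(1 + x) is never below the true root for x >= 0,
    # so Newton's method on the convex function w * exp(w) - x converges from above.
    w = math.log1p(x)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))  # Newton step
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w
```

Only a handful of iterations are needed per call, which is in keeping with the constant-time flavor of the estimation models.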
12 Summary for Interconnect Estimation
- Develop delay and area estimation models for multiple-pin nets with consideration of various interconnect optimizations
- Consider different optimization objectives
  - Single critical sink (SCS)
  - Multiple critical sinks (MCS)
- Apply various optimization alternatives
  - Optimal wire sizing (OWS)
  - Buffer insertion/sizing and wire sizing (BISWS)
13 Delay/Area Comparison with TRIO
- Rd = 180 ohm, C1 = 100 fF, C2 = 10 fF
- One internal load, l1 = 0.1 to 0.9 x l (l = 5, 10 or 20 mm)
- Max. allowable wire width is 20x min. width; wire is segmented every 10 um.
15 Motivation for BBP
- For high-performance DSM designs, many buffers may be inserted to optimize/meet interconnect delay (e.g., up to 800,000 for 50nm tech., Cong97, SRC Work Paper)
- The introduction of so many buffers will significantly change a floorplan, so the buffers should be planned to ensure timing/design convergence.
- Need proper buffer block planning (BBP) to address
  - buffer location constraints (e.g., hard IP blocks)
  - dead area utilization
  - regularity for ease of layout and power/ground network sharing
16 Buffer Block Planning Problem
[Figure: floorplan showing buffer blocks, white space, grey (soft) blocks (limited buffers), and black (hard) blocks (no buffer allowed)]
- Given: initial floorplan, buffer capacity for each soft block, and performance constraint for each net
- Output: optimal location/dimension of buffer blocks such that the overall chip area and the number of buffer blocks are minimized.
17 Feasible Regions for BI
- The feasible region (FR) is the maximal region in which a buffer can be placed while still meeting the given delay constraint.
[Figure: feasible regions on a line from driver to load CL, for 1 buffer and for k buffers]
18 Feasible Regions for BI
- We obtain the closed-form formula for FRs
- Important observation: even under a tight delay constraint, the FR for BI can still be pretty large!
- => FR provides a lot of flexibility to plan buffer locations
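The slides do not reproduce the closed-form FR formula, so the following is only a sketch of how such a region can be derived for one buffer on a uniform wire, assuming the Elmore delay model. Under that assumption the delay is quadratic in the buffer position x, and the FR is the interval where the quadratic stays at or below the required delay. All parameter names and values are hypothetical.

```python
import math


def buffered_delay(x, length, rd, rb, cb, tb, cl, r, c):
    """Elmore delay with one buffer placed x um from the driver (assumed model)."""
    first = rd * (c * x + cb) + r * x * (c * x / 2 + cb)
    rest = length - x
    second = rb * (c * rest + cl) + r * rest * (c * rest / 2 + cl)
    return first + tb + second


def feasible_region(t_req, length, rd, rb, cb, tb, cl, r, c):
    """Return (x_min, x_max) meeting t_req, or None if no position works."""
    # buffered_delay expands to a * x^2 + b * x + k.
    a = r * c
    b = rd * c + r * cb - rb * c - r * c * length - r * cl
    k = (rd * cb + tb + rb * (c * length + cl)
         + r * length * (c * length / 2 + cl))
    disc = b * b - 4 * a * (k - t_req)
    if disc < 0:
        return None  # even the best position misses the constraint
    root = math.sqrt(disc)
    lo = max(0.0, (-b - root) / (2 * a))
    hi = min(length, (-b + root) / (2 * a))
    return (lo, hi) if lo <= hi else None


# Hypothetical parameters: 10 mm wire, 180 ohm driver and buffer.
P = dict(length=10000.0, rd=180.0, rb=180.0, cb=10e-15, tb=30e-12,
         cl=100e-15, r=0.08, c=0.12e-15)
best = min(buffered_delay(x, **P) for x in range(0, 10001, 25))
fr = feasible_region(1.1 * best, **P)  # allow 10% over the best achievable delay
```

With these made-up numbers, a 10% allowance over the best achievable delay already yields a region spanning close to half of the wire, which is in line with the observation that FRs can be large even under tight constraints.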
19 Feasible Regions for BI
- FR extended to 2 dimensions with obstacles
[Figure: 2-D feasible region between source and sink, with obstacles]
20 Overview of BBP Algorithm
- 1. Build polar graphs for given floorplan
- 2. Build tile data structure
- 3. For each tile, compute its area slacks
- 4. Compute FR(s) for each net
- 5. While (there exists some buffer to be inserted)
  - Pick_A_Tile: pick the tile that can accept the most buffers w/o area penalty; if no such tile exists, pick the one with the most BI demand
  - Insert_Buffers into the picked tile: insert all those buffers whose FRs intersect with the tile to create a BB w/o area penalty, or insert one buffer into the tile to expand its channel
  - Update chip dimension, FR, and area slacks, etc.
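The loop in the last step can be sketched as a greedy procedure over abstract tiles. This is a heavily simplified illustration, not the authors' implementation: geometry, polar graphs, chip-dimension updates and FR updates are left out, a tile's area slack is reduced to a buffer count, and every name below is hypothetical.

```python
def plan_buffer_blocks(tile_slack, feasible_tiles):
    """Greedy tile selection in the spirit of Pick_A_Tile / Insert_Buffers.

    tile_slack     -- {tile: number of buffers that fit in its dead area}
    feasible_tiles -- {buffer: tiles that its feasible region intersects}
    Returns ({buffer: tile}, number of insertions that needed extra area).
    """
    slack = dict(tile_slack)
    # Buffers with no feasible tile cannot be placed and are skipped.
    pending = {b: sorted(ts) for b, ts in feasible_tiles.items() if ts}
    placed, expansions = {}, 0
    while pending:
        # Current BI demand: which pending buffers could go into each tile.
        demand = {}
        for b, tiles in pending.items():
            for t in tiles:
                demand.setdefault(t, []).append(b)

        def free_fit(t):
            return min(slack.get(t, 0), len(demand[t]))

        tile = max(demand, key=free_fit)
        if free_fit(tile) > 0:
            # Mode 1: fill dead area, most constrained buffers first.
            ranked = sorted(demand[tile], key=lambda b: len(pending[b]))
            chosen = ranked[:free_fit(tile)]
            slack[tile] -= len(chosen)
        else:
            # Mode 2: no tile with demand has dead area left; serve the
            # highest-demand tile with one buffer (channel expansion).
            tile = max(demand, key=lambda t: len(demand[t]))
            chosen = [min(demand[tile], key=lambda b: len(pending[b]))]
            expansions += 1
        for b in chosen:
            placed[b] = tile
            del pending[b]
    return placed, expansions


placed, expansions = plan_buffer_blocks(
    {"A": 2, "B": 0},
    {"b1": {"A"}, "b2": {"A", "B"}, "b3": {"A", "B"}, "b4": {"B"}},
)
```

In this toy input, tile A absorbs two buffers in its dead area, and the remaining two are inserted one at a time at an area cost, mirroring the two modes of the loop.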
21 Experimental Setting
- Two scenarios (for buffer insertion flexibility)
  - RES: restricted buffer insertion position(s), so as to minimize delay
  - FR: feasible buffer region, so as to meet the delay constraint
- Two algorithms (for buffer clustering)
  - RDM: a buffer is randomly assigned to any feasible location
  - BBP: buffers are assigned with appropriate clustering
- 6 MCNC and 5 randomly generated circuits (0.18um tech)
22 Percentage of nets that meet delay constraints
FR provides a lot more flexibility than RES (e.g., to avoid obstacles) during BI, and thus can better meet delay constraints.
23 Area Increase (%) due to BI
BBP/FR can effectively cluster individual buffers together with marginal area increase (less than 2% in all above test cases), through high utilization of dead areas.
24 Comparison of Number of Buffer Blocks (BB)
BBP reduces the number of BBs relative to RDM by a factor of up to 3x.
BBP/FR further reduces the number of BBs relative to BBP/RES by up to 34%.
25 Accomplishments in Year 2
- Efficient and accurate interconnect estimation models for multiple-pin nets (Cong-Pan, TAU99)
- Buffer block planning for interconnect-driven floorplanning (Cong-Kong-Pan, ICCAD99)
- Further study on interconnect architecture planning
  - Our two-width planning is still valid for a certain range (2x) of driver size variation
  - Currently investigating a wider range of variations
26 Deliverables
- Development of efficient and accurate interconnect performance estimation models for interconnect-driven synthesis and planning (Completed - 30-Jun-1999)
- Development of interconnect architecture planning framework (Completed - 30-Jun-1999)
- Development of efficient algorithms for integrated interconnect planning and floorplanning capabilities at the physical level (Completed - 30-Sep-1999)
- Development and validation of accurate noise models to guide the interconnect synthesis algorithm for signal reliability (Planned - 31-Dec-1999)
- Development of optimal or near-optimal interconnect synthesis algorithm for multiple spatially or temporally related signal nets for performance and signal reliability optimization (Planned - 31-Dec-1999)
- Development of efficient algorithms for integrated interconnect planning and floorplanning capabilities at the RTL-level Software (Planned - 31-Dec-2000)
27 Technology Transfer
- TRIO (Tree-Repeater-Interconnect-Optimization) package
  - Integrated into Intel design technology
  - Available on the web
  - http://cadlab.cs.ucla.edu/trio
- IDEM (Interconnect Delay Estimation Model) package
  - Prototype provided to Intel
  - Package will be available this week to all SRC member companies
  - http://cadlab.cs.ucla.edu/trio
- BBP (Buffer Block Planning) for physical level floorplanning
  - Interest from Intel and HP
28 Summary and Future Work
- Efficient and accurate interconnect estimation models
- Interconnect architecture planning
- Buffer Block Planning
- Future Work
  - Noise estimation and planning
  - RTL interconnect planning
29 Milestones
- Development of a computational model for interconnect architecture planning based on a given design characterization (specified in terms of target clock rate, interconnect distribution, depths of logic, network, etc.) (31-Dec-1998)
- Development of estimation models for interconnect layout optimizations suitable for pre-layout synthesis and planning (31-Dec-1998)
- Development of efficient algorithms for integrated interconnect planning and floorplanning capabilities at the RTL-level (31-Dec-1999)
- Completion of the ongoing effort on the development of a multi-layer general-area gridless routing system (31-Dec-1999)
- Development of optimal or near-optimal interconnect synthesis algorithm for multiple spatially or temporally related signal nets for performance and signal reliability optimization (31-Dec-1999)
- Development and validation of very efficient but accurate noise models to relate the noise with the physical parameters to guide the interconnect synthesis algorithm for signal reliability optimization (31-Dec-1999)
- Development of efficient algorithms for integrated interconnect planning and floorplanning capabilities at the physical level (31-Dec-1999)
- Development of efficient algorithms for integrated interconnect planning and floorplanning capabilities at the RT-level (31-Dec-2000)
30 Review: Accomplishments in Year 1
- Efficient and accurate interconnect delay estimation models for 2-pin nets (Cong-Pan, ASPDAC99)
  - Example: optimal wire sizing (OWS)
- Interconnect architecture planning (DAC99)
  - Pre-design wire-width planning
  - Close to optimal solution by layout optimization
  - Can handle different objective functions
31 Interconnect Architecture Planning
- Problem: pre-design interconnect planning
  - wire width planning for each metal layer
  - performance-area optimization
- Motivation
  - there exist certain globally optimal widths for a range of interconnect lengths
  - use fixed globally optimal wire widths to ease routing at each metal layer, while still guaranteeing close-to-optimal performance
32 Basic Approach
- For each metal pair (tier), assume a certain wire length range
- Find the best 1-width and 2-width designs (since we show that 1-WS and 2-WS are close to OWS in terms of both delay and area)
- Different objective functions
  - T (performance optimization only)
  - AT^4 (performance-driven but area-saving)
  - ...
- Can consider different weight functions of the wire length l
33 Overall Flow
- For each tier i
  - Assume length range Lmin to Lmax
  - Find W (for 1-width design) or W1 and W2 (for 2-width design)
  - to minimize the chosen objective over that length range: T (performance only) or AT^4 (performance-driven and area-saving)
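The flow for one tier can be illustrated with a brute-force search. This sketch makes several assumptions that are not in the slides: an Elmore delay model for a uniform-width wire, wire area A taken as width times length, a uniform weight over the length range, an exhaustive search over a small candidate grid, and placeholder electrical values. It is meant only to show the shape of the optimization, not the authors' method.

```python
from itertools import combinations


def wire_delay(w, l, rd=180.0, cl=10e-15, r_sq=0.08, ca=0.06e-15, cf=0.06e-15):
    """Elmore delay (s) of a uniform wire of width w (um) and length l (um).

    All electrical defaults are placeholders, not values from the slides.
    """
    r = r_sq / w       # resistance per um shrinks as the wire widens
    c = ca * w + cf    # area capacitance grows with width, fringe does not
    return rd * (c * l + cl) + r * l * (c * l / 2 + cl)


def plan_cost(widths, lengths, area_exp=0, delay_exp=1):
    """Average A^area_exp * T^delay_exp when each wire uses its best planned width."""
    total = 0.0
    for l in lengths:
        total += min((w * l) ** area_exp * wire_delay(w, l) ** delay_exp
                     for w in widths)
    return total / len(lengths)


def best_widths(candidates, lengths, n_widths, **objective):
    """Exhaustively pick n_widths candidate widths that minimize plan_cost."""
    return min(combinations(candidates, n_widths),
               key=lambda ws: plan_cost(ws, lengths, **objective))


lengths = [8000.0 + 1000.0 * i for i in range(16)]  # 8 mm to 23 mm
candidates = [0.5 * i for i in range(1, 11)]         # 0.5 um to 5.0 um
at4 = dict(area_exp=1, delay_exp=4)                  # the AT^4 objective
one_w = best_widths(candidates, lengths, 1, **at4)
two_w = best_widths(candidates, lengths, 2, **at4)
```

Setting `area_exp=0, delay_exp=1` switches to the performance-only objective T. A 2-width plan can never score worse than the best 1-width plan here, since it may always include that single width.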
34 Example: Width Design Result for 0.10um
- Tier 4 under 0.10um tech (8 - 23 mm)
- 2-W design (1.0, 2.0 um) is superior to 1-W design (1.98 um)
  - delay reduction up to 12.4%
  - area saving up to 46%!
- 2-W vs. Many-Width (Cong, ICCAD97)
  - delay difference less than 5.4%
  - area difference less than 4.7%
35 Recommendation for Future Tech.
- 2-WS design under objective function of AT^4
- Wiring hierarchy for both performance and density!
36 Two Key Steps in BBP
- Pick_A_Tile: two modes
  - 1. If there exists dead area: pick the tile that can accept the max. number of buffers w/o chip area increase
  - 2. Otherwise: pick the tile that has maximum BI demand
- Insert_Buffers into the picked tile: also two modes
  - 1. If the tile has dead area: insert those buffers whose FRs intersect with the tile; if there are more buffers than the capacity of the tile, insert the most constrained ones first => Buffer Blocks
  - 2. Otherwise: insert one buffer (with the tightest FR constraint) into the tile => minimize the area increase due to channel expansion