1 A Priori System-Level Interconnect Prediction: The Road to Future Computer Systems
Dirk Stroobandt
Ghent University, Electronics and Information Systems Department
Presentation at Intel, June 16th, 2000
2 Outline
- Why do we need a priori interconnect prediction?
- Basic models
- Rent's rule with extensions and applications
- A priori wirelength prediction
- New evolutions
- Applications
  - CAD (extrapolation, achievable routing, layer assignment)
  - Evaluation of new computer architectures
  - Circuit characterization
- Conclusions
4 Why do we need a priori interconnect prediction?
- Importance of wires increases (they do not scale as components do).
- For future designs, very little is known. Roadmapping uses a priori estimation techniques.
- To improve CAD tools for design layout generation.
  - CAD tools have to take into account timing constraints, area constraints, performance, and power dissipation.
  - All these constraints → wires should be as short as possible.
  - Estimation at an early stage aids the CAD tools in finding a better solution through fewer design cycle iterations.
5 Why do we need a priori interconnect prediction?
- To evaluate new computer architectures.
  - To keep up with increasing performance demands, new computer architectures are needed.
  - Each of them must be evaluated thoroughly.
  - A priori estimates immediately provide a basis for drawing preliminary conclusions.
  - Different architectures can be compared to each other.
  - Applications: evaluating three-dimensional (opto-electronic) architectures, FPGAs, MCMs, ...
6 Components of the physical design step
[Figure labels: circuit, architecture, layout generation, layout]
7 Circuit model
[Figure labels: logic block, net, external net, terminal / pin]
Multi-terminal nets have a net degree > 2.
8 Model for partitioning
[Figure: example partitionings with 8 nets cut and with 4 nets cut]
Optimal partitioning: minimal number of nets cut.
9 Model for partitioning
10 The three basic models
11 The three basic models
Placement and routing model
- Optimal placement: placement with minimal total wire length over all possible placements.
- Optimal routing: routing through the shortest path
  - requires channels with sufficiently high density (or enough routing layers)
  - for multi-terminal nets: Steiner trees
- This defines the net length for known endpoints.
13 Rent's rule
Rent's rule was first described by Landman and Russo (1971). For the average number of terminals T and blocks B per module:
$T = t B^p$
- p: Rent exponent
- t: average number of terminals per block
The Rent exponent is a measure for the complexity of the interconnection topology: (simple) $0 \le p \le 1$ (complex).
Normal values: $0.5 \le p \le 0.75$.
[Figure: log-log plot of T versus B (B from 1 to 1000); curves: average, Rent's rule]
14 Rent's rule
If $\Delta B$ cells are added, what is the increase $\Delta T$? In the absence of any other information we guess
$\Delta T = \frac{T}{B} \Delta B$
This is an overestimate: many of the $\Delta T$ terminals connect to the T terminals and so do not contribute to the total. We introduce a factor p (p < 1) which indicates how self-connected the netlist is:
$\Delta T = p \frac{T}{B} \Delta B$
(statistically homogeneous system). Or, if $\Delta B$ and $\Delta T$ are small compared to B and T:
$\frac{dT}{T} = p \frac{dB}{B}$
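The incremental argument on this slide can be integrated to recover the usual power-law form. A short sketch, assuming the boundary condition that a single block (B = 1) has t terminals on average:

```latex
\frac{dT}{T} = p\,\frac{dB}{B}
\;\Longrightarrow\;
\ln T = p \ln B + \ln t
\;\Longrightarrow\;
T = t\,B^{p}
```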
15 Rent's rule
$T = t B^p$
Rent's rule is experimentally validated for a lot of real circuits and for different partitioning methodologies.
- Distinguish between
  - the intrinsic Rent exponent
  - the Rent exponent for a given placement
  - the Rent exponent for a given partitioning
[Figure: T versus B; curves: average, Rent's rule]
Deviation for high B and T: Rent's region II (cf. later).
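Because Rent's rule is a straight line on a log-log plot, the parameters t and p can be estimated from partitioning data with an ordinary least-squares fit. The following is a minimal sketch, not the procedure used in the cited experiments; the function name `fit_rent` and the synthetic data are illustrative assumptions.

```python
import math

def fit_rent(blocks, terminals):
    """Least-squares fit of log T = log t + p * log B; returns (t, p)."""
    xs = [math.log(b) for b in blocks]
    ys = [math.log(T) for T in terminals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope of the log-log line
    t = math.exp(mean_y - p * mean_x)   # intercept gives t
    return t, p

# Synthetic (hypothetical) data that follows T = 4 * B^0.6 exactly.
blocks = [1, 4, 16, 64, 256]
terminals = [4.0 * b ** 0.6 for b in blocks]
t_fit, p_fit = fit_rent(blocks, terminals)
```

For real circuits, the points at high B that fall in region II would normally be left out of the fit, since they deviate from the power law.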
16 Rent's rule
Rent's rule is a result of the self-similarity within circuits.
Assumption: interconnection complexity is equal at all levels.
17 Extension: the local Rent exponent
- Variations in Rent's rule
  - global variations (e.g., lower complexity after technology mapping of the circuit, duplication)
  - local variations.
- Two kinds of local variations in Rent's rule
  - hierarchical locality: some hierarchical levels are more complex than others
  - spatial locality: some circuit parts are more complex than others.
- Both are deviations from Rent's rule that can be modelled well.
18 Hierarchical locality: Rent's region II
- Causes of region II
  - pin limitation problem
  - parallel to serial (complexity is moved from space to time, number of pins is lowered)
  - coding (input and output stream compact).
[Figure: curves: average, Rent's rule]
19 Hierarchical locality: region III
- For some circuits there is also a deviation at the low end [Stroobandt, GLSVLSI 99].
- Mismatch between the available (library) and the desired (design) complexity of interconnect topology.
- Only for circuits with logic blocks that have many inputs.
20 Hierarchical locality: modelling
- Use the incremental Rent exponent (proportional to the slope of Rent's curve in a single point) [Van Marck et al., ISYCS 1995].
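To make the idea of a slope "in a single point" concrete, the sketch below approximates it by the slope between successive (B, T) points on a log-log scale. This is only an illustration: the slide says the incremental exponent is proportional to the slope, and taking it equal to the finite-difference slope is an assumption, as is the function name.

```python
import math

def incremental_rent_exponents(blocks, terminals):
    """Slope of log T versus log B between each pair of successive points."""
    slopes = []
    for i in range(len(blocks) - 1):
        d_log_t = math.log(terminals[i + 1]) - math.log(terminals[i])
        d_log_b = math.log(blocks[i + 1]) - math.log(blocks[i])
        slopes.append(d_log_t / d_log_b)
    return slopes

# Hypothetical data: power law with p = 0.6 that flattens at the last level,
# mimicking a region II deviation.
blocks = [1, 4, 16, 64]
terminals = [4.0, 4.0 * 4 ** 0.6, 4.0 * 16 ** 0.6, 4.0 * 16 ** 0.6]
slopes = incremental_rent_exponents(blocks, terminals)
```

In this example the first two slopes are 0.6 and the last one drops to 0, which is how a hierarchical deviation would show up.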
21 Spatial locality in Rent's rule
- Inhomogeneous circuits: different parts have different interconnection complexity.
- For separate parts:
Only one Rent exponent (heterogeneous) might not be realistic. Clustering: simple parts will be absorbed by complex parts.
22 Local Rent exponent
- Higher partitioning levels: Rent exponents will merge.
- Spreading of the values, with a steep slope (decreasing) for the complex part and a gentle slope (increasing) for the simple part.
- Local Rent exponent: tangent slope of the line that combines all partitions containing the local block(s).
[Figure: T versus B; points labelled 1 and 2]
23 Heterogeneous Rent's rule
- Suggested by [Zarkesh-Ha, Davis, Loh, and Meindl, 98]
- Weighted arithmetic average of the logarithm of T
- Heterogeneous Rent's rule (for 2 parts)
24 Use of Rent's rule in CAD
- Rent's rule is very powerful as a measure of the complexity of the interconnection topology.
- It can aid in the partitioning process.
- Benchmark generators are based on Rent's rule.
- It is the basis for a priori estimates in CAD.
25 Rent's rule in partitioning
- Actual goal: minimize the number of pins per module.
- We should use a pin count criterion.
- External multi-terminal nets lead to only one new pin instead of two when cut.
- Preferring external nets to be cut will better keep clusters together.
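The difference between a net-cut criterion and a pin-count criterion can be illustrated with a small bookkeeping routine. This is a hedged sketch, not the cost function of the cited work: the data layout (`nets`, `side`, `external`) and the function name are assumptions, and each external net is assumed to own exactly one pin on the parent module.

```python
def module_pins(nets, side, external):
    """Count cut nets, pins per module, and newly created pins for a bipartition.

    nets:     dict mapping net name -> list of blocks on that net
    side:     dict mapping block -> 0 or 1 (the module it is assigned to)
    external: set of net names that already leave the parent module
    """
    pins = [0, 0]
    cut = 0
    for name, blocks in nets.items():
        sides = {side[b] for b in blocks}
        is_cut = len(sides) == 2
        if is_cut:
            cut += 1
        # A module needs a pin for every net that leaves it, either because
        # the net is cut or because it was already external to the parent.
        if is_cut or name in external:
            for s in sides:
                pins[s] += 1
    new_pins = sum(pins) - len(external)  # pins the parent did not already have
    return cut, pins, new_pins

nets = {"n1": ["a", "c"], "n2": ["a", "b"], "n3": ["b", "d"], "n4": ["c", "d"]}
side = {"a": 0, "b": 0, "c": 1, "d": 1}
cut, pins, new_pins = module_pins(nets, side, external={"n3"})
```

Here two nets are cut, but only three new pins appear: the internal net n1 adds two, while the external net n3 adds one.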
26 Rent's rule in partitioning
- Solution: use a new ratio value (in ratio cut partitioning) based on terminal count [Stroobandt, ISCAS 99].
- Better partitions are obtained because the total number of pins for each module is taken into account by the cost function.
27 Rent's rule in partitioning
- Better (ratio cut) heuristic by using terminal count prediction [Stroobandt, ISCAS 99].
- Clustering property of the ratio cut: use Rent's rule instead of a uniformly distributed random graph.
- New ratio
- Instead of old ratio
28 Rent's rule in partitioning
- Important (especially in pin-limited designs): terminal balancing [Stroobandt, Swiss CAD/CAM 99].
- Minimizing the terminal count alone is not enough. Additional cost function for terminal balancing.
29 Rent's rule in benchmark generation
- Generating benchmarks in a hierarchical way
- Rent's rule is used for estimating the number of connections
- Other parameters have to be controlled as well
- Classical parameters
  - total number of gates
  - total number of nets
  - total number of pins
- Gate terminal distribution
- Net degree distribution
- Other issues: gate functionality, redundancy, timing constraints, ...
31 Donath's hierarchical placement model
1. Partition the circuit into 4 modules of equal size such that Rent's rule applies (minimal number of pins).
2. Partition the Manhattan grid into 4 subgrids of equal size in a symmetrical way.
32 Donath's hierarchical placement model
3. Each subcircuit (module) is mapped to a subgrid.
4. Repeat recursively until all logic blocks are assigned to exactly one grid cell in the Manhattan grid.
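The four steps above can be sketched as a recursive routine. This is an illustration under stated assumptions: the grid side is a power of two, the number of blocks equals the number of grid cells, and `naive_split4` is a hypothetical stand-in for a real min-cut partitioner (it simply cuts the block list into four equal parts and does not minimize pins).

```python
def naive_split4(blocks):
    """Stand-in partitioner: four equal consecutive parts of the list."""
    quarter = len(blocks) // 4
    return [blocks[i * quarter:(i + 1) * quarter] for i in range(4)]

def hierarchical_place(blocks, size, split4=naive_split4, x0=0, y0=0, placement=None):
    """Map blocks onto a size x size Manhattan grid by recursive quadrisection."""
    if placement is None:
        placement = {}
    if len(blocks) != size * size:
        raise ValueError("need exactly size*size blocks")
    if size == 1:
        placement[blocks[0]] = (x0, y0)  # one block per grid cell
        return placement
    half = size // 2
    # Steps 1 and 2: split the circuit and the grid into four equal parts.
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]
    # Steps 3 and 4: map each module to a subgrid and recurse.
    for module, (dx, dy) in zip(split4(blocks), offsets):
        hierarchical_place(module, half, split4, x0 + dx, y0 + dy, placement)
    return placement

placement = hierarchical_place(list(range(16)), 4)
```

A real flow would pass a partitioner that minimizes the number of pins per module as `split4`; the recursion itself stays the same.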
33 Donath's length estimation model
- At each level, Rent's rule gives the number of connections:
  - the number of terminals per module follows directly from Rent's rule (partitioning-based Rent exponent p)
  - every net not cut before (internal net): 2 new terminals
  - every net previously cut (external net): 1 new terminal
  - assumption: the ratio f = (internal nets)/(nets cut) is constant over all levels k [Stroobandt and Kurdahi, GLSVLSI 98]
  - the number of nets cut at level k ($N_k$) equals $\alpha$ times the number of new terminals at that level
  - where $\alpha = 1/(1+f)$; $\alpha$ depends on the total number of nets in the circuit and is bounded by 0.5 and 1.
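The factor 1/(1+f) follows from the terminal counting rules on this slide: if a level cuts N nets, of which a fraction f were internal, the new terminals number 2fN + (1-f)N = (1+f)N. A minimal sketch of that conversion, with a hypothetical function name:

```python
def nets_cut(new_terminals, f):
    """Convert a count of new terminals into a count of cut nets.

    f is the ratio (internal nets cut) / (all nets cut), assumed constant
    over all levels. Internal nets add 2 terminals, external nets add 1.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    alpha = 1.0 / (1.0 + f)  # lies between 0.5 (f = 1) and 1 (f = 0)
    return alpha * new_terminals

# Hypothetical level: 10 internal and 10 external nets cut, so f = 0.5
# and the level creates 2*10 + 1*10 = 30 new terminals.
n_cut = nets_cut(30, 0.5)
```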
34 Donath's length estimation model
Length of the connections at level k?
[Figure: adjacent (A-) combination and diagonal (D-) combination]
Donath assumes that all connection source and destination cells are uniformly distributed over the grid.
35 Average interconnection length
- Number of connections at level k
- Average length A-combination
- Average length D-combination
- Average length level k
- Total average length with
- and 2K G total number of gates
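Under the uniform-distribution assumption, the average lengths for the A- and D-combinations can be checked by brute-force enumeration. The sketch below is an illustration rather than Donath's own derivation. It assumes square subgrids of side `lam` with unit cell spacing, and it weights the level average by the 4 adjacent and 2 diagonal pairs among four subgrids; the function names are hypothetical.

```python
from itertools import product

def subgrid(x0, y0, lam):
    """All cells of a lam x lam subgrid whose lower-left cell is (x0, y0)."""
    return [(x0 + i, y0 + j) for i in range(lam) for j in range(lam)]

def mean_manhattan(cells_a, cells_b):
    """Average Manhattan distance over all (source, destination) cell pairs."""
    total = sum(abs(xa - xb) + abs(ya - yb)
                for (xa, ya), (xb, yb) in product(cells_a, cells_b))
    return total / (len(cells_a) * len(cells_b))

def level_lengths(lam):
    """Average lengths for adjacent and diagonal subgrid pairs, and their mix."""
    origin = subgrid(0, 0, lam)
    l_adj = mean_manhattan(origin, subgrid(lam, 0, lam))     # A-combination
    l_diag = mean_manhattan(origin, subgrid(lam, lam, lam))  # D-combination
    l_level = (4 * l_adj + 2 * l_diag) / 6                   # 4 A-pairs, 2 D-pairs
    return l_adj, l_diag, l_level

l_adj, l_diag, l_level = level_lengths(4)
```

The enumeration agrees with the closed forms (4·lam² − 1)/(3·lam) for the adjacent case and 2·lam for the diagonal case.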
36 Results Donath
Scaling of the average length L as a function of the number of logic blocks G.
Similar to measurements on placed designs.
37 Results Donath
The theoretical average wire length is too high by a factor of 2.
38 Including optimal placement model
- Keep the wire length scaling by hierarchical placement.
- Improve on the uniform probability for all connections at one level (not a good model for an optimal placement).
Enumeration: site density function (only architecture dependent).
Occupying probability: favours short interconnections (for an optimal placement) (darker in the figure).
39 Including optimal placement model
- Wirelength distributions contain two parts: site density function and probability distribution.
  - Site density function: all possibilities; requires enumeration (use generating polynomials [Stroobandt and Van Marck, SLIP 2000]).
  - Probability of occurrence: shorter wires are more probable [Stroobandt, VLSI Design vol. 10, 1999].
40 Wire length distribution
Local distributions at each level have similar shapes (self-similarity) → peak values scale.
The integral of the local distributions equals the number of connections.
The global distribution follows the peaks.
- From this we can deduce that
- For short lengths
41 Occupying probability results
Use probability on each hierarchical level (local distributions).
[Figure: average wire length L versus number of logic blocks G (10 to 10000); curves: occupying probability, Donath, experiment]
42 Occupying probability results
Effect of the occupying probability: boosting the local wire length distributions (per level) for short wire lengths.
[Figure: percent of wires versus wire length (1 to 10000), one panel for the occupying probability and one for Donath; curves: global trend, per level, total]
43 Occupying probability results
Effect of the occupying probability on the total distribution: more short wires, fewer long wires → the average wire length is shorter.
[Figure: percent of wires versus wire length (1 to 10000); curves: Donath, occupying probability]
44 Occupying probability results
[Figure: percent of wires versus wire length (1 to 10); curves: Donath, occupying probability, global trend]
45 Occupying probability results
[Figure: number of wires versus wire length (1 to 100); curves: Donath, occupying probability, measurement]
46 Davis probability function
- Introduced by [Davis, De, and Meindl, IEEE T El. Dev., 98].
- The number of interconnections at distance l is calculated for every gate separately, using Rent's rule.
- Three regions: gate under investigation (A), target gates (C), and gates in between (B).
- The number of connections between A and C is calculated.
This approach alleviates the discrete effects at the boundaries of the hierarchical levels while maintaining the scaling behaviour.
47 Davis probability function
$T_{A \to C} = T_{AB} + T_{BC} - T_B - T_{ABC}$
Assumption: a net cannot connect A, B, and C.
[Figure: gate A surrounded by the gates of region B and the target gates of region C]
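The terminal bookkeeping on this slide, T_A→C = T_AB + T_BC − T_B − T_ABC, can be evaluated by applying Rent's rule T = t B^p to each group of gates. The following is a hedged sketch of that calculation only; the function names and the example gate counts are illustrative assumptions, and the conversion from terminals to a wire length distribution is not covered.

```python
def rent_terminals(n_blocks, t, p):
    """Rent's rule: number of terminals of a group of n_blocks gates."""
    return t * n_blocks ** p

def terminals_a_to_c(n_a, n_b, n_c, t, p):
    """Terminals connecting region A to region C.

    Assumes, as on the slide, that no net connects A, B, and C at once.
    """
    t_ab = rent_terminals(n_a + n_b, t, p)
    t_bc = rent_terminals(n_b + n_c, t, p)
    t_b = rent_terminals(n_b, t, p)
    t_abc = rent_terminals(n_a + n_b + n_c, t, p)
    return t_ab + t_bc - t_b - t_abc

# Hypothetical example: one gate in A, 12 gates in between, 16 target gates.
example = terminals_a_to_c(1, 12, 16, t=4.0, p=0.66)
```

For p < 1 the result is positive, and it shrinks as region B grows, which reflects that longer connections are less likely.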
48 Davis probability function
For cells placed in an infinite 2D plane
49 Planar wirelength model A
Finite system, $B_{tot} = L^2$, no edges, approximate form for q(l)
50 Planar wirelength model B (Davis)
Finite system, $B_{tot} = L^2$, includes edge effects, use q(l)
51 Planar wirelength model comparison
$B_{tot} = 1024$, p = 0.66
Model A: $L_{av} = 4.53$; Model B: $L_{av} = 2.27$
[Figure: curves for Model A and Model B]
52 Relationship between models from Davis (planar model B) and Stroobandt (hierarchical model C)
[Figure: distributions $D_b(l)$ and $D_c(l,h)$]
Same q(l): essentially identical!
53 Hierarchical wirelength model comparison
$C_{tot} = 1024$, p = 0.66
Model C (Stroobandt): $L_{av} = 2.05$; Model D (Donath): $L_{av} = 5.14$
Model C: q(l) and hierarchy; Model D: only hierarchy (q(l) = 1)
[Figure: curves for Model C and Model D]
54 Planar and hierarchical model comparison
[Figure: curves for Models A, B, C, and D]
Models B (planar) and C (hierarchical) are equivalent if the Rent exponent used for the probability function (which depends on placement) and the one used for the number of nets per hierarchical level (based on partitioning) are the same.
56 Extension to three-dimensional grids
57 Anisotropic systems
58 Anisotropic systems
- Basic method: Donath's method in 3D
- Not all dimensions are equal (e.g., optical links in the 3rd dimension)
  - possibly larger latency of the optical link (compared to an intra-chip connection)
  - influence of the spacing of the optical links across the area (detours may have to be made)
  - limitation of the number of optical layers
- Introducing an optical cost
59 Anisotropic systems
- If the number of layers is limited: use the third dimension for the topmost hierarchical levels (fewest interconnections).
- For lower levels: 2D method.
- 2D and 1D partitioning are sometimes used to get closer to the cubic form (optimal in isotropic grids).
- Depending on the optical cost, it is advantageous either to strive for getting to the electrical planes as soon as possible (high optical cost, use at high levels only) or to partition the electrical planes first (low optical cost).
60 External nets
- Importance of good wire length estimates for external nets during the placement process.
- For highly pin-limited designs, placement will be in a ring-shaped fashion (along the border of the chip).
61 Wire lengths at system level
- At system level: many long wires (peak in distribution).
How to model these? [Davis and Meindl 98]: estimation based on Rent's rule with the floorplanning blocks as logic blocks. IMPORTANT!
63 Improving CAD tools for design layout
Digital design is complex: computer-aided design (CAD)
- More efficient layout generation requires good wire length estimates.
- Layer assignment in routing
  - effects of vias, blockages
  - congestion, ...
A priori estimates are rough but can already provide us with a lot of information.
64 Evaluating new computer architectures
- Estimation for evaluating and comparing different architectures
Circuit characterization
We need parameters to classify circuits in classes and to optimize them. Benchmark generation based on Rent's rule.
66 Technology extrapolation
- Evaluates impact of
  - design technology
  - process technology
- Evaluates impact on
  - achievable design
  - associated design problems
- Questions to be addressed
  - What is the most power-efficient noise management strategy?
  - How and when do L, SOI, SER, etc. matter?
  - Will layout tools need to perform process simulation to efficiently model cross-die and cross-wafer manufacturing variation?
- Sets new requirements for CAD tools and methodologies
- Roadmaps: familiar and influential example
67 Current extrapolation systems
- Previous and ongoing efforts
  - ITRS Roadmaps
  - Tools: SUSPENS, GENESYS, RIPE, BACPAC, ...
  - Numerous tools in industry
- Use models for
  - delay
  - power
  - architecture
  - wirelength estimation
  - ...
68 GTX: GSRC technology extrapolation system
- GTX is set up as a framework for technology extrapolation [Caldwell et al., DAC 2000]
69 GTX
- Check it out at http://vlsicad.cs.ucla.edu/GSRC/GTX/
70 Models of achievable routing
- wirelength estimation models (Donath, ...)
- actual placement information
- Required versus available resources
71 Models of achievable routing
Required versus available resources: limited by routing efficiency factor $\eta_r$
72 Models of achievable routing
Required versus available resources: limited by power/ground (signal net fraction $s_i$)
73 Models of achievable routing
Required versus available resources: limited by via impact factor $v_i$ (ripple effect) and utilization factor $U_i$ (available / supplied area)
74 Use of achievable routing models
- Optimizing interconnect process parameters for future designs (number of layers, wire width and pitch per layer, ...)
- With given layer characteristics: predict the number of layers needed
- If the number of layers is fixed: oracle "(not) routable!"
- (SUSPENS, GENESYS, RIPE, BACPAC, GTX)
- Supplying objectives that guide layout tools to promising solutions (wire planning)
- [Kahng, Mantik and Stroobandt, ISPD 2000]
75 Layer assignment in routing
- DSM design: routing tools have to account for
  - delay constraints
  - yield
  - power
  - ...
- Conventional technique
  - router assigns wires to layers
  - wire sizing, repeater insertion/sizing applied
- More interesting approach
  - wire sizing etc. used by router to assign wires
[Kahng and Stroobandt, SLIP 2000]
76 Our Layer Assignment Concept
- Search for the optimal layer for a wire with
  - optimal wire size, number and size of repeaters for each wire
  - meeting consistent stage delay constraints
  - taking the total repeater area constraint into account
  - accounting for the impact of vias
- A priori estimation techniques make it useful for application both before and after placement
- Potential applications
  - improving CAD layout tools
  - studying effects of technological parameters
  - optimizing fabrication process
77 Problem and Models
Find the optimal assignment of wires to wiring layers subject to delay constraints and total repeater area constraints.
- Optimization objective: number of layers needed
- Degrees of freedom (for each wire)
  - choice of layer parameters
  - wire width
  - number of repeaters
  - size of the repeaters
78 Layer Assignment method
- 2 phases: optimize, then round to integers
- Phase 1: calculate the minimal delay $T_{min}$.
79 A Typical Example
[Figure: wire width (mm), delay (ps), and number of repeaters versus wirelength (mm) for tier types 0, 1, and 2]
80 Target Delay Influence
[Figure: delay (ps) and wire width (mm) versus wirelength (mm) for tier types 0, 1, and 2]
81 Uniform Versus Non-uniform Stacks
82 Optimal Layer Stack Monotonic?
84 Opto-electronic FPGAs
Research question: is the use of massive optical interconnects at the logic level in general-purpose electrical systems meaningful?
- The answer depends on whether the properties of optical interconnections are comparable to (or better than) those of the electrical ones they replace or complement.
- → Such a situation presents itself in electrical FPGAs.
85 Area I/O in FPGAs
- Future multi-FPGA emulators will face the following problems
  - ASICs will keep growing: the need for multi-FPGA emulators will stay
  - FPGAs will have increasing numbers of CLBs
  - Complex designs have high Rent exponents
  - → Pin and inter-FPGA interconnect limitations will keep increasing
- Area I/O provides significant benefits in FPGAs
[J. Depreitere, H. Van Marck, J. Van Campenhout, Microprocessors and Microsystems 21, 1998, pp. 89-97]
86 Why Area I/O in FPGAs?
(a) gain because wires do not have to be routed all the way to the perimeter
(b) gain when pin limitation problems are also considered
- Area I/O provides significant benefits in FPGAs
[J. Depreitere, H. Van Marck, J. Van Campenhout, Microprocessors and Microsystems 21, 1998, pp. 89-97]
87 Why 3D FPGAs?
- Different asymptotic average wire length
Two-dimensional
Three-dimensional
[J. Van Campenhout, H. Van Marck, J. Depreitere, J. Dambre, IEEE J. Sel. Topics in Quant. Electr. on Smart Phot. Comp., Interconnects, and Proc. (5)2, 1999, pp. 306-315]
88 Why 3D FPGAs?
- Wire length distribution differs significantly
[J. Van Campenhout, H. Van Marck, J. Depreitere, J. Dambre, IEEE J. Sel. Topics in Quant. Electr. on Smart Phot. Comp., Interconnects, and Proc. (5)2, 1999, pp. 306-315]
89 Effect of Anisotropy
- Benefits are lower if anisotropy is higher
[Figure: average wire length versus relative anisotropic cost, and average wire length versus number of layers (4096 gates) for cost 1, 8, 16, and 24]
[J. Van Campenhout, H. Van Marck, J. Depreitere, J. Dambre, IEEE J. Sel. Topics in Quant. Electr. on Smart Phot. Comp., Interconnects, and Proc. (5)2, 1999, pp. 306-315]
90 What Could It Look Like?
- A 3-D extension of the electrical on-chip interconnect fabric
  - offers a highly compact and densely interconnected multi-FPGA system
  - should provide an essentially 3-D routing environment, leading to shorter average wire lengths, hence faster systems
  - should provide an increased routability of complex designs
- Other (hierarchical) interconnect schemes could be envisioned
91 Overview of the built system
Each FPGA has 128 optical receivers and 128 transmitters.
The system is designed for 80 MHz (85 MHz measured).
The FPGA chip contains about 165,000 transistors.
About 1/3 of the FPGA chip is used for the optics.
92 An optical prototype
- 0.6 mm chip
- standard 145 pin PGA socket
- 4 x 8 InP detector arrays
- 4 x 8 LEDs (VCSELs)
93 Conclusion
- Wire length estimates are becoming more and more important.
- A priori estimates can provide a lot of information at virtually no cost.
- Methods are based on Rent's rule.
- Important for future research: how can we build a priori estimates into CAD layout tools?
- More information at http://www.elis.rug.ac.be/dstr/
- Check out the International Workshop on System-level Interconnect Prediction (SLIP) at http://vlsicad.cs.ucla.edu/SLIP2000/