Title: Chickens, Ostriches, and the Interconnect Problem


1
Chickens, Ostriches, and the Interconnect Problem
  • Prof. Patrick H. Madden
  • University of Kitakyushu
  • pmadden_at_acm.org

2
nVidia chip designs....
Designs are getting larger -- and harder to
do. (Table from Chris Malachowsky's nVidia talk at
ISPD01.)
3
Power Density Will Get Even Worse
  • Need to Keep the Junctions Cool
  • Performance (Higher Frequency)
  • Lower leakage (Exponential)
  • Better reliability (Exponential)

Pat Gelsinger, ISSCC 2001
4
System Level Perspective: Improvements in each
area multiply their effect
  • Architecture
  • CISC, RISC, mass-par, TransMeta
  • Synthesis
  • VHDL, Verilog, custom datapath
  • Floorplan
  • Better representations
  • Placement
  • Annealing, better partitioners, analytic,
  • Routing
  • Linsker cost functions, multi-commodity flow
  • Switch from channel to over-the-cell
  • Lithography/Fab
  • 2x improvement every 18 months, copper, strained
    silicon
  • Chip Packaging
  • Flip-chip, BGA, SIMM,
  • System Packaging
  • Quieter fans, better batteries
  • Other
  • Better software, new markets, colored plastic

5
Overview
  • Interconnect Trouble in the Next Few Years
  • Chicken or Ostrich? I'm a Chicken.
  • Doing Design and Getting the Wires Right
  • The problem will not go away, so we need to have
    good ways to minimize the impact.
  • Summary and Future Research Directions

6
Related Publications
  • DATE 2003: Crosstalk Aware Detail Routing
    R. M. Smey, P. H. Madden
  • DAC 2003: Amplified Congestion Global Routing
    R. T. Hadsell, P. H. Madden
  • ICCAD 2003: Fractional Cut: Improved Recursive
    Bisection Based Placement
    A. Agnihotri, S. Ono, A. Khatkhate, A. Mathur,
    M. C. Yildiz, P. H. Madden
  • ISPD 2004: Mixed Size Placement
    A. Khatkhate, C. Li, A. R. Agnihotri, S. Ono,
    M. C. Yildiz, C.-K. Koh, P. H. Madden
  • MWSCAS 2004: Clustering and Combinatorial
    Placement
    S. Ono, P. H. Madden
  • SASIMI 2004: Lithography and Manufacturability
    Interconnect Synthesis
    S. Pujari, R. M. Smey, Y. Tan, H. H. Madden,
    P. H. Madden
  • ICCAD 2004: White Space Allocation
    C. Li, M. Xu, C.-K. Koh, J. Cong, P. H. Madden
  • ASPDAC 2005: Detail Placement by Branch and
    Price
    P. Ramachandran, A. Agnihotri, S. Ono,
    P. Damodaran, H. Srihari, P. H. Madden
  • ASPDAC 2005: Buffer and Repeater Insertion
    C. Li, C.-K. Koh, P. H. Madden
  • ASPDAC 2005: Optimality and Scalability Study

7
The most serious problem? Interconnect.
  • The next few slides are taken from Desmond
    Kirkpatrick and Prashant Saxena (Intel), from the
    ISPD04 Chicken panel

8
Interconnect Chicken Littles (an Ostrich
Viewpoint)
Interconnects are dominating! Interconnects are
dominating!
  • Chicken Little: a story of mass hysteria
  • An acorn hits a chick called Chicken Little on
    the head
  • Chicken Little proclaims "The sky is falling!
    The sky is falling!" and runs down the road.
  • Each character (Henny Penny, Goosy Loosy) who
    hears her story runs behind her, propagating the
    story to the next.

[Timeline figure, 1995 to 2004: "Rise of the Interconnect Chicken Little Era".
Buzzwords over time: deep-submicron, interconnect-driven, interconnect-dominated.
Marker: 1995 IEDM, Bohr, "Interconnect Scaling: The Real Limiter to High
Performance VLSI".]
9
Interconnect Ostriches (a Chicken Viewpoint)
Interconnects don't scare me! Interconnects don't
scare me!
  • Ostrich: a symbol of deep denial
  • Despite being an aggressive 40lb bird
  • When frightened, an Ostrich is reputed to hide
    its head in the sand

[Timeline figure, 1995 to 2004: "Interconnect Ostrich Malaise Era".
Markers: copper / low-k promises; 1998 ICCAD, Sylvester/Keutzer, "Getting to the
Bottom of Deep Submicron" (50k gates, no problem!); 1999 ICCAD, Ho/Horowitz,
"Interconnect Scaling Implications for CAD" (Sylvester/Keutzer used average wires).]
10
Revenge of the Interconnect Chicken Littles?
  • Interconnect Buffering debate
  • Primarily a paper debate
  • Consensus: global interconnect phenomenon
  • (e.g. Tau 99: Keutzer/Pileggi agree
    floorplanning will need more work)
  • Recent scary predictions
  • Buffers will invade synthesis blocks
  • Meshes / fabrics / grids of buffers will replace
    individual elements / stations

[Timeline figure, 1995 to 2004: "Revenge of the Chicken Littles".
Marker: 2002 ISPD, Saxena et al., "The Scaling Challenge: Can
Correct-by-Construction Design Help?" (70% of cells will be interconnect
buffers at 32nm).]
11
  • Exploding buffer counts will break today's block
    design paradigms
  • All realistic scaling projections encounter this
    problem

12
Prashant Saxena is an Optimist
  • Exploding buffer counts will break everything,
    block design or not
  • We are all in a lot more trouble than you might
    expect
  • Don't worry about the ITRS roadmap for 2010.
    We're not going to get there.
  • Stop chasing Moore's Law, and start using Jobs'
    Law
  • Colored plastic can make people buy stuff

13
What is to be done?
  • A partial solution: minimize the length of the
    interconnect.
  • Note: I am firmly in the Chicken Little camp.

14
Circuit Layout
  • We have a logic diagram
  • Where to place the cells?
  • We know that we're going to have to change gate
    sizes
  • We know that we're going to have to insert
    buffers
  • We know that we'll need additional space for
    routing
  • Reserve space to handle this?
  • No -- bad idea! Very bad!
  • So if you don't reserve space...

15
Dense Placements
  • The placements from Feng Shui contain absolutely
    no internal space
  • This is intentional
  • This leaves no space for buffers, sizing,
    routing, ...
  • Don't worry, it will all be OK

16
Fractional Cut Placement
  • By improving placement -- we strike at the root
    of the interconnect problem
  • Method must be scalable: methods that work well
    only on small problems are not relevant

17
The Placement Problem
Common approach: make each gate rectangular, and
arrange them like bricks. The problem: where
do you put each brick? (And how do you run the
wires?)
Tens of millions of tiny pieces of metal
Millions of gates
18
Leading Placement Methods
  • Force-Directed/Linear Programming
  • Simulated Annealing
  • Recursive Bisection
  • Split the logic into two groups, minimizing the
    number of wires between the groups
  • Place one group in the top half of the chip, and
    the other in the bottom half
  • Why use bisection? It scales better than the
    other methods, and still gets good results.

19
Bisection Based Placement
Logic elements
Semiconductor chip
20
Recursive Bisection Placement
21
Non-Traditional Approach
  • In bisection, cut lines are placed between rows
  • Our idea--ignore row boundaries, and place cut
    lines where the relative areas suggest
  • This is Fractional Cut bisection
  • Requires legalization to align cells with rows
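
The fractional cut idea can be sketched in a few lines. This is a minimal illustration, not Feng Shui's implementation: `split_balanced` is a hypothetical stand-in for a real min-cut partitioner (it balances area only and ignores the netlist), and cells are assumed to be `(name, area)` pairs with positive area. The point it shows is that the cut coordinate follows the area ratio of the two sides and is never snapped to a row boundary.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, float]                     # (name, area); areas assumed > 0
Region = Tuple[float, float, float, float]   # (x0, y0, x1, y1)


def split_balanced(cells: List[Cell]) -> Tuple[List[Cell], List[Cell]]:
    """Hypothetical stand-in for a min-cut partitioner: balances area only."""
    side_a: List[Cell] = []
    side_b: List[Cell] = []
    area_a = area_b = 0.0
    for cell in sorted(cells, key=lambda c: -c[1]):
        if area_a <= area_b:
            side_a.append(cell)
            area_a += cell[1]
        else:
            side_b.append(cell)
            area_b += cell[1]
    return side_a, side_b


def fractional_cut_place(
    cells: List[Cell],
    region: Region,
    horizontal_cut: bool = True,
    out: Optional[Dict[str, Tuple[float, float]]] = None,
) -> Dict[str, Tuple[float, float]]:
    """Recursively bisect; the cut line sits where the area ratio says."""
    if out is None:
        out = {}
    x0, y0, x1, y1 = region
    if not cells:
        return out
    if len(cells) == 1:
        # A single cell goes to the center of its region (may be off-row).
        out[cells[0][0]] = ((x0 + x1) / 2, (y0 + y1) / 2)
        return out
    side_a, side_b = split_balanced(cells)
    frac = sum(a for _, a in side_a) / sum(a for _, a in cells)
    if horizontal_cut:
        cut = y0 + frac * (y1 - y0)          # not rounded to a row boundary
        fractional_cut_place(side_a, (x0, y0, x1, cut), False, out)
        fractional_cut_place(side_b, (x0, cut, x1, y1), False, out)
    else:
        cut = x0 + frac * (x1 - x0)
        fractional_cut_place(side_a, (x0, y0, cut, y1), True, out)
        fractional_cut_place(side_b, (cut, y0, x1, y1), True, out)
    return out
```

With two cells of area 3 and 1 in a 4x4 region, the first cut lands at y = 3.0, three quarters of the way up, regardless of where rows are. The resulting positions still need a legalization pass.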

22
Ignoring Row Boundaries
Row boundaries are in blue. Black outlined
rectangles are the regions. Numbers indicate
total cell areas (there may be a number of cells
in each region).
23
Experimental Results
24
This is the start of a solution.
  • But large designs are not just standard cells.

25
Mixed Block Design
Hundreds of large blocks, millions of small
cells. Placement must deal with large size
differential. Also called boulders and dust
problem.
26
Recent previous work
  • Capo+Parquet - ISPD 02, ICCAD 03
  • Shred macros, global placement.
  • Form groups of standard cells, run fixed outline
    floor planner.
  • Fix macros, place standard cells.
  • mPG-ms - ASPDAC 03.
  • Coarsening - cluster macros and standard cells.
  • Refinement - large macros are fixed gradually
    removing overlaps, carry on refinement on smaller
    objects.

Objective in these works: HPWL minimization only
27
Our approach
  • Global placement using Fractional Cut based
    recursive bisection.
  • Greedy legalization.
  • Branch-and-Bound reordering on standard cells.

28
Global placement
  • Fractional cut approach (ICCAD03)
  • Recursive bisection, but cut lines are not
    restricted to row boundaries. Instead, use
    legalization after bisection.
  • Key insight: bisection can handle both standard
    cells and macro blocks
  • Partition line is located based on total area of
    each side; block shapes are not considered
  • Multilevel clustering based partitioner (hMetis),
    with multiple random starts
  • Large blocks have an opportunity to start on
    either side of the partition; we are not locked
    in place

29
Example
30
Mixed Block Enhancement
  • Output of the Global Placer - rough distribution
    of cells/macros across the core area.
  • Area constraints and fractional cut lines ensure
    that distribution is even.
  • There is some overlap.
  • Cells and macro blocks are not row-aligned.

31
IBM01 before legalization
32
Placement legalization
  • Legalization is the stage where we remove
    overlaps, align cells with rows, and make cell
    positions match the site widths in the circuit.
  • First (abandoned) approach -
  • Remove macro overlap using a recursive search
    procedure
  • Cell legalization by dynamic programming based
    method (similar to ICCAD03 paper).
  • Complex, many lines of code, and a great deal of
    work. Also not very good.
  • Better method: leverage the uniform area demand
    to allow a less complex legalizer.

33
Greedy legalization
  • Sort cells/macros by left-edge locations.
  • Initialize right edge of each row.
  • For Each Object
  • Greedy assignment of an object to a row, for min
    displacement.
  • If no overlap, leave at abstract placement X
    position; otherwise shift to the right to avoid
    overlap
  • Macro placement must check multiple rows
  • Update right edge profiles for the rows across
    which the object spans.
  • This method extends a prior standard cell
    legalization method by Dwight Hill (US Patent
    6,370,673)
  • Also used in Kahng/Wang APlace paper
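
The greedy steps above can be sketched as follows. This is an illustrative reading of the slide, not the Feng Shui code: the `Obj` record, the unit row height, and the displacement cost (horizontal shift plus row distance) are assumptions, and the sketch ignores the right chip boundary and site-width snapping.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Obj:
    name: str
    x: float        # desired left edge from global placement
    y: float        # desired bottom edge, in row units
    width: float
    rows: int = 1   # 1 for a standard cell, more for a macro block


def greedy_legalize(objs: List[Obj], num_rows: int) -> Dict[str, Tuple[float, int]]:
    """Return {name: (legal x, bottom row)} using a left-to-right sweep."""
    right_edge = [0.0] * num_rows            # first free x in each row
    result: Dict[str, Tuple[float, int]] = {}
    for o in sorted(objs, key=lambda obj: obj.x):
        best = None
        # A macro spans several rows, so it must clear all of them.
        for r in range(num_rows - o.rows + 1):
            x = max(o.x, max(right_edge[r:r + o.rows]))
            cost = abs(x - o.x) + abs(r - o.y)   # assumed displacement metric
            if best is None or cost < best[0]:
                best = (cost, x, r)
        if best is None:
            raise ValueError(f"{o.name} is taller than the row area")
        _, x, r = best
        for k in range(r, r + o.rows):       # update the right-edge profile
            right_edge[k] = x + o.width
        result[o.name] = (x, r)
    return result
```

Because objects are visited in left-edge order, each one only ever needs to look at the current right edge of its candidate rows, which is what keeps the legalizer short.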

34
Legalization
35
Experimental results
  • We tested Feng Shui 2.4 on the 18 IBM mixed block
    benchmarks on the GSRC Bookshelf web site -
  • http://www.gigascale.org/bookshelf
  • Comparison with
  • Capo+Parquet (I, II): ISPD 2002
  • mPG-MS: ASPDAC 2003
  • Capo+Parquet (III): ICCAD 2003
  • All publications focus on HPWL minimization;
    timing and routing are not considered.

36
Mixed Block Placement
37
Experimental Results
As much as 51% better on some benchmarks.
Closest is around 8%, for the design that doesn't
have macro blocks.
38
Upcoming ICCAD05 mixed block papers
  • Only one is able to improve on our results
  • APlace paper from Andrew Kahng's group -- and
    they use our legalization method and detail
    placer
  • Others have 10% to 30% higher wire lengths
  • Our mixed block work is a major step forward, and
    helps mitigate the growing interconnect problem.

39
Wire Length Minimization is good.
  • But what about routing wires?
  • These slides are from a talk to be presented by
    Chen Li at ICCAD04
  • The paper is a collaboration between Purdue,
    Binghamton, and UCLA

40
Motivation and previous work
  • Objective of placement tools: wirelength and
    routability
  • Routability control in global placement
  • Incorporating congestion into cost function
  • Cell movement
  • Routability control in detailed placement
  • Region expanding
  • White space allocation

41
Congestion Estimation
  • Congestion estimation
  • Routing resource estimation
  • based on width and spacing of wires in layers
  • Routing demand estimation
  • decompose MST of net into two-pin connections
  • two-bend L/Z routes for each two-pin connection
  • Congestion (overflow)
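
A rough sketch of the demand side of this estimate, under simplifying assumptions that are not from the slides: pins are given as grid-bin coordinates, only the two L-shaped routes are considered (Z-shaped routes are left out), each L shape contributes half a track to every bin it crosses, and every bin has the same capacity.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pin = Tuple[int, int]   # grid-bin coordinates (column, row)


def mst_edges(pins: List[Pin]) -> List[Tuple[Pin, Pin]]:
    """Prim's algorithm on Manhattan distance; yields two-pin connections."""
    if len(pins) < 2:
        return []
    in_tree = [pins[0]]
    rest = list(pins[1:])
    edges = []
    while rest:
        _, u, v = min(
            ((abs(a[0] - b[0]) + abs(a[1] - b[1]), a, b)
             for a in in_tree for b in rest),
            key=lambda t: t[0],
        )
        edges.append((u, v))
        in_tree.append(v)
        rest.remove(v)
    return edges


def l_route_bins(a: Pin, b: Pin, horizontal_first: bool) -> Set[Pin]:
    """Bins crossed by one of the two L-shaped routes between a and b."""
    corner = (b[0], a[1]) if horizontal_first else (a[0], b[1])
    bins: Set[Pin] = set()
    for p, q in ((a, corner), (corner, b)):   # each leg is axis-aligned
        for x in range(min(p[0], q[0]), max(p[0], q[0]) + 1):
            for y in range(min(p[1], q[1]), max(p[1], q[1]) + 1):
                bins.add((x, y))
    return bins


def demand_map(nets: List[List[Pin]]) -> Dict[Pin, float]:
    """Accumulate expected routing demand per bin over all nets."""
    demand: Dict[Pin, float] = defaultdict(float)
    for pins in nets:
        for a, b in mst_edges(pins):
            for horizontal_first in (True, False):
                for bin_ in l_route_bins(a, b, horizontal_first):
                    demand[bin_] += 0.5       # two candidate routes, equal weight
    return demand


def overflow(demand: Dict[Pin, float], capacity: float) -> Dict[Pin, float]:
    """Congestion: demand in excess of the (assumed uniform) bin capacity."""
    return {b: d - capacity for b, d in demand.items() if d > capacity}
```

A straight connection gets the full weight of 1.0 in each bin because both L shapes coincide; a diagonal connection spreads its demand over the two corners.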

42
WSA: White Space Allocation
  • Idea for routing demand-resource matching
  • Fractional cut
  • Cutline shifting
  • Flow of WSA
  • Slicing tree Construction
  • Congestion estimation on tree
  • White space adjustment
  • Detailed placer
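
Cutline shifting can be illustrated with a one-dimensional toy. The proportional rule used here (each child receives white space in proportion to its estimated congestion) is an assumption made for the sketch; the slides only state that cut lines are shifted to match routing resources to demand. The `Node` type, its fields, and the single cut direction are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    name: str
    cell_len: float = 0.0      # leaves: cell extent along the cut direction
    congestion: float = 0.0    # leaves: estimated routing demand
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def totals(n: Node) -> Tuple[float, float]:
    """Bottom-up sums of cell extent and congestion for a subtree."""
    if n.left is None or n.right is None:
        return n.cell_len, n.congestion
    left_len, left_cong = totals(n.left)
    right_len, right_cong = totals(n.right)
    return left_len + right_len, left_cong + right_cong


def allocate(n: Node, lo: float, hi: float,
             out: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Top-down pass: shift each cut so congested children get more space."""
    if n.left is None or n.right is None:
        out[n.name] = (lo, hi)
        return out
    left_len, left_cong = totals(n.left)
    right_len, right_cong = totals(n.right)
    white = (hi - lo) - (left_len + right_len)      # spare space to distribute
    total_cong = left_cong + right_cong
    share = left_cong / total_cong if total_cong > 0 else 0.5
    cut = lo + left_len + white * share
    allocate(n.left, lo, cut, out)
    allocate(n.right, cut, hi, out)
    return out
```

For two leaves with equal cell extent 4 in a region of length 10, congestion 3 versus 1 moves the cut from the midpoint to 5.5, so the congested side gets 1.5 of the 2 units of white space.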

43
Slicing Tree Construction
44
White Space Adjustment
Before cutline shifting
45
White Space Adjustment
Level 0
46
White Space Adjustment
Level 1
47
White Space Adjustment
Level 2 WSA finished
48
Detailed Placer
  • Objective: maintain white space distribution and
    further reduce HPWL
  • For example, DOMINO cannot be applied here
  • Greedy legalization
  • Remove overlaps
  • Sliding window-based local minimization
  • White space is considered as pseudo-cells

49
Experimental Setup
  • IBM v2 easy and hard benchmarks (16 circuits)
  • All publicly available placers
  • Dragon-fd congestion-driven mode
  • CAPO, Feng Shui, mPL
  • mPG congestion mode off, QPLACE ECO for
    legalization
  • QPLACE
  • All placers are run 5 times, except QPLACE, mPL,
    and our tools, which are run once.
  • WRoute (SE5.3) to evaluate routability

50
Experimental Results
  • Our flow vs. other tools

100% successful routings
8.1% to 24.5% reduction in routed WL
51
Experimental Results
  • Impacts of various techniques in our flow

Both techniques improve routability. The combined
flow works best.
Both techniques reduce routed WL.
52
Experimental Results
  • WSA on placements generated by other tools

Improvement in routability, except for QPLACE
1.1% to 8.0% reduction in routed WL compared to
original tools
53
Improved Routability
  • mPL-R
  • Routability-driven global placement reduces
    routing demands through cell-replacement based on
    accurate congestion estimation
  • WSA
  • Routability-driven detailed placement allocates
    routing resources into congested regions
  • Successful routings on all easy and hard IBM
    benchmarks
  • Shortest routed WL; competitive placement
    runtime

54
Dense Placement?
  • While the original placement is dense
  • We can stretch with WSA to get routability
  • We can also stretch to do gate sizing and buffer
    insertion (upcoming ASPDAC paper)
  • We can stretch for thermal and noise issues
  • AND THIS IS STABLE
  • Individual net lengths change very little.
  • Conclusion: YOU DO NOT NEED TO RESERVE TONS OF
    WHITE SPACE!
  • And on top of that--when you reserve white space,
    you can increase both the power and delay, making
    timing closure harder
  • Traditional white space approaches are based on
    people not knowing how to stretch--not because
    white space is a good idea.

55
Circuit Optimization
  • Start with a good placement
  • Use a fast and accurate delay analysis tool to
    guide placement. No Elmore delay--it's not good
    enough. No AWE or Spice--too slow. Arvind and
    Patrika can tell you what you should do....
  • Size gates and insert buffers
  • Stretch the placement as needed
  • Repeat as necessary
  • As the stretch is stable, we converge quickly
  • As the area is minimized, we have lower wire
    lengths, resulting in less up-sizing and fewer
    buffers.
  • Details on this study in the upcoming ASPDAC paper

56
So We've Reduced Interconnect Lengths.
  • But how far can we go?
  • There is an OPTIMAL solution--is there room left
    for improvement?
  • PEKO benchmarks are constructed with a known
    optimal solution
  • We use the placement tools on these to evaluate
    if there is further room for gain

57
Experimental Results
58
A Bit of Fun: Global vs. Detail Suboptimality
Improving placement results requires an
understanding of what happens during placement.
Method: map each cell in the placement to a
pixel from an image. Rearrange the cells
according to optimal placement. What does this
mean? While there's suboptimality at the global
level, we're losing a lot in detail placement.
59
Other places to reduce wire lengths?
  • Surprisingly large potential at the detail
    placement level!

60
Traditional Detail Placement
[Figure: the six orderings of a three-cell window: A B C, A C B, B A C,
B C A, C A B, C B A.]
Legalize the placement, then try permutations on
groups of cells in order to improve (wire length,
delay, congestion, ...)
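
A minimal sketch of the permutation step for a single packed row, assuming one-dimensional wire length (horizontal span of each net) as the cost and exhaustive enumeration rather than branch-and-bound pruning. The function names and the net representation are illustrative only.

```python
from itertools import permutations
from typing import Dict, List


def pack(order: List[str], widths: Dict[str, float], start: float = 0.0) -> Dict[str, float]:
    """Pack cells left to right with no gaps; return cell center x positions."""
    x: Dict[str, float] = {}
    pos = start
    for c in order:
        x[c] = pos + widths[c] / 2
        pos += widths[c]
    return x


def hpwl(x: Dict[str, float], nets: List[List[str]]) -> float:
    """1-D half-perimeter wire length: span of each net along the row."""
    return sum(max(x[c] for c in net) - min(x[c] for c in net) for net in nets)


def improve_row(order: List[str], widths: Dict[str, float],
                nets: List[List[str]], window: int = 3) -> List[str]:
    """Slide a window along the row, keeping the best ordering at each step."""
    order = list(order)
    for i in range(len(order) - window + 1):
        # Cells outside the window keep their positions because the total
        # width inside the window does not change under permutation.
        order = min(
            (order[:i] + list(p) + order[i + window:]
             for p in permutations(order[i:i + window])),
            key=lambda o: hpwl(pack(o, widths), nets),
        )
    return order
```

For a row A B C D with nets {A, C} and {B, D}, a three-cell window is enough to bring each pair together and halve the wire length.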
61
Branch-and-Bound results
Bigger window means better results, but longer
run time. About half of FS2.0 run time is in
detail placement.
62
Redefining Local
  • Placements are optimal wrt the locations of
    groups of 6 or so
  • But what about bigger groups?
  • PEKO and Grid benchmarks show that placements are
    not optimal
  • If we increase the window size, how much can we
    get?
  • How to increase window size w/o getting hammered
    on run time?

63
Branch-and-Bound Run Times
  • 2x2 window: 4! = 24 combinations
  • 3x3 window: 9! = 362880 combinations
  • Runs in about 0.7 seconds on my PC
  • 4x4 window: 16! = xxxx combinations
  • Around 1 year to find optimal
  • 5x5 window: 25! = xxxx combinations
  • Multiple expansions and contractions of the
    universe
  • With the method presented here
  • 10x10 has been solved, and we're expecting to be
    able to do much larger (with some algorithmic
    clean-up of the code)
  • It's not going to be cheap in terms of run
    time, but it should be feasible to apply
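
The growth in these numbers is easy to reproduce. The sketch below counts orderings for an n x n window and extrapolates run time from the 0.7 second figure quoted for 3x3, under the assumption (made here for illustration only) that run time scales linearly with the number of orderings examined.

```python
import math


def orderings(window_side: int) -> int:
    """Number of ways to arrange the cells of an n x n window."""
    return math.factorial(window_side * window_side)


def brute_force_seconds(window_side: int, ref_side: int = 3,
                        ref_seconds: float = 0.7) -> float:
    """Extrapolated run time, assuming time is proportional to orderings."""
    return ref_seconds * orderings(window_side) / orderings(ref_side)


for side in (2, 3, 4, 5):
    seconds = brute_force_seconds(side)
    print(f"{side}x{side}: {orderings(side):,} orderings, about {seconds:.3g} s")
```

Under that assumption the 4x4 estimate comes out at a bit over one year, in line with the slide, and 5x5 is far beyond any practical budget.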

64
Better Detail Placement
  • We know were suboptimal
  • But by how much? And what portion is from global
    problems, how much from local?
  • Global placement, we have annealing, analytic,
    recursive bisection. All heuristics.
  • Detail placement, we have enumeration/branch-and-bound,
    and some flow based methods. Optimal, but
    small windows.
  • Strategy
  • Apply techniques from OR community
  • Solve local optimization problems with bigger
    windows

65
Short Branch-and-Price Overview
  • Based on linear programming
  • Solve LP problem to find a lower bound; the
    solution is not necessarily integer
  • Use column generation to keep the problem size
    manageable
  • Decompose the problem into master and subproblems
  • Branch on non-integers
  • Try 0 and 1 values, evaluate the lower bounds
  • Traverse the decision tree, jumping to the node
    with the best lower bound
  • If we find an integer solution, any node with a
    higher value (integer or not) can be pruned

66
Summary
  • The first few slides make things sound very bad.

67
But I think it's an accurate picture
  • Absolutely essential: consider physical
    constraints in the design process
  • Making chips faster and cheaper is no longer easy
    (it was never easy, but it's very very hard now)

68
How to Survive the Future?
  • Short term: absolutely minimize circuit
    interconnect. We've made progress here, but
    there's a long way to go.
  • Long term: I'm pretty sure that colored plastic
    will help, but otherwise...

70
Design Automation Challenges
  • Handle Moore's Curse
  • The problem doubles in size every 18 months.
  • Late to Market is a disaster
  • Come closer to human-design
  • Estimates are that automated design leaves about
    7 years of technological advances on the table(!)
  • Shield the System Designers from the Device
    Details
  • Timing, Power, Signal Integrity, ...
  • Designers have enough to worry about now

71
Things that Won't Work
  • Massive parallel computers
  • Neural network paradigm shift
  • Much of the current nano and quantum hype

72
Traditional Legalization
  • Align cuts with cell rows
  • After bisection--all cells are within row
    boundaries
  • Sort cells by X position
  • Feng Shui 1.5 also packs cells to the left
  • For MCNC benchmarks, this works well
  • For IBM, Peko, not quite so good

73
After Bisection
74
Dynamic Programming Legalization
  • Process rows one at a time
  • For each row
  • Select a subset of cells such that the total
    horizontal WL of the packed subset, plus the
    penalty for the non-selected cells, is minimized
  • Simple DP formulation obtains good results

75
DP Solution
  • Suppose we have logic elements A, B, C, D, E, F
  • Assume they're all the same width
  • We have space for four of the six
  • Which ones do we put into the row, to minimize
    the TOTAL distance things move?

76
Example
B
F
A
D
E
C
All the blocks are going to be packed to the
"left." The total distance things travel depends
on which blocks we choose.
77
Some Observations
  • If A is to the left of B before packing
  • It should still be to the left after packing
  • The distance that F travelled depends only on the
    number of blocks to the left of it
  • We don't care which blocks to the left are
    taken--only how many!

78
DP Matrix
Filling the blanks in the table is easy
Cost of moving E to position 3 plus the lowest
cost for filling to location 2 with blocks before
E
Cost of filling to location 3 using blocks before
E
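
The table recurrence described above can be sketched as a small dynamic program. It assumes equal-width cells sorted by their pre-packing x position, slots packed from the left edge at x = 0, and exactly as many chosen cells as there are slots; the penalty term for non-selected cells mentioned earlier is omitted for brevity. Names are illustrative.

```python
from typing import List, Tuple


def dp_legalize(xs: List[float], slots: int, width: float = 1.0) -> Tuple[float, List[int]]:
    """Choose which cells fill the row so total movement is minimal.

    xs: desired x positions, sorted ascending (left-to-right order is kept).
    Returns (total distance moved, indices of the chosen cells).
    """
    n = len(xs)
    k = min(slots, n)
    inf = float("inf")
    # cost[i][j]: cheapest way to fill slots 0..j-1 using only cells 0..i-1
    cost = [[inf] * (k + 1) for _ in range(n + 1)]
    take = [[False] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            skip = cost[i - 1][j]                     # fill j slots without cell i-1
            use = cost[i - 1][j - 1] + abs(xs[i - 1] - (j - 1) * width)
            if use <= skip:
                cost[i][j] = use
                take[i][j] = True
            else:
                cost[i][j] = skip
    # Walk back through the table to recover the chosen cells.
    chosen: List[int] = []
    i, j = n, k
    while j > 0:
        if take[i][j]:
            chosen.append(i - 1)
            j -= 1
        i -= 1
    return cost[n][k], chosen[::-1]
```

Each table entry depends only on how many slots are already filled, not on which cells filled them, which is the observation that makes the table easy to fill.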
79
Legalization
80
Legalization
81
Standard Cell Placement Tools
  • Other methods include
  • Capo (recursive bisection, U. Michigan)
  • Dragon (simulated annealing, UCLA)
  • Kraftwerk (linear programming, TU Munich)
  • mPL (multilevel slot-based, UCLA)
  • ...
  • Objective is to minimize total wire length
  • Benchmark circuits derived from IBM designs