Next Generation VLSI Circuits: Physical Design Issues

Slides: 80

Provided by: patrick83

Transcript and Presenter's Notes

1
Next Generation VLSI Circuits: Physical Design
Issues

Prof. Patrick H. Madden
University of Kitakyushu
pmadden_at_acm.org

2
Overview

Constraints on Circuit Design
Trouble in the Next Few Years
Placement Research
Scalability, wire lengths, routing, timing, mixed
size designs
Focus on Fractional Cut and related papers
Summary and Future Research Directions
Two levels to consider
There's the "big picture" related to what we do
as a research community, and as an industry
There's the "small picture" on what we do to any
specific problem

3
Related Publications

ICCAD 2003: Fractional Cut: Improved Recursive
Bisection Based Placement
A. Agnihotri, S. Ono, A. Khatkhate, A. Mathur, M.
C. Yildiz, P. H. Madden
ISPD 2004: Mixed Size Placement
A. Khatkhate, C. Li, A. R. Agnihotri, S. Ono, M.
C. Yildiz, C.-K. Koh, P. H. Madden
ICCAD 2004: White Space Allocation
C. Li, M. Xu, C.-K. Koh, J. Cong, P. H. Madden
ASPDAC 2005: Detail Placement by Branch and
Price
P. Ramachandran, A. Agnihotri, S. Ono, P.
Damodaran, H. Srihari, P. H. Madden
ASPDAC 2005: Buffer and Repeater Insertion
C. Li, C.-K. Koh, P. H. Madden
ASPDAC 2005: Optimality and Scalability Study
S. Ono, P. H. Madden

4
Lithography
Assorted pictures from Google
5
System Level Perspective: Improvements in each
area multiply their effect

Architecture
CISC, RISC, mass-par, TransMeta
Synthesis
VHDL, Verilog, custom datapath
Floorplan
Better representations
Placement
Annealing, better partitioners, analytic,
Routing
Linsker cost functions, multi-commodity flow
Switch from channel to over-the-cell

Lithography/Fab
2x improvement every 18 months, copper, strained
silicon
Chip Packaging
Flip-chip, BGA, SIMM,
System Packaging
Quieter fans, better batteries
Other
Better software, new markets, colored plastic

6
nVidia chip designs....
Designs are getting larger -- and harder to
do. (Table from nVidia talk at ISPD01).
7
Power Density Will Get Even Worse

Need to Keep the Junctions Cool
Performance (Higher Frequency)
Lower leakage (Exponential)
Better reliability (Exponential)

Pat Gelsinger, ISSCC 2001
8
The most serious problem? Interconnect.

The next few slides are taken from Desmond
Kirkpatrick and Prashant Saxena (Intel), from the
ISPD04 Chicken panel

9
Interconnect Chicken Littles (an Ostrich
Viewpoint)
"Interconnects are dominating! Interconnects are
dominating!"

Chicken Little: a story of mass hysteria
An acorn hits a chick called Chicken Little on
the head
Chicken Little proclaims "the sky is falling,
the sky is falling" and runs down the road.
Each character (Henny Penny, Goosy Loosy) who
hears her story runs behind her, propagating the
story to the next.

Interconnect-dominated
Interconnect-driven
1995 IEDM: Bohr, "Interconnect Scaling: The Real
Limiter to High Performance ULSI"
Deep-submicron
Timeline: 95, 98, 01, 04 (Rise of the Interconnect Chicken Little Era)
10
Interconnect Ostriches (a Chicken Viewpoint)
"Interconnects don't scare me! Interconnects don't
scare me!"

Ostrich: a symbol of deep denial
Despite being an aggressive 40lb bird, an ostrich
is reputed to hide its head in the sand when
frightened

1999 ICCAD: Ho/Horowitz, "Interconnect Scaling
Implications for CAD" - Sylvester/Keutzer used
average wires
1998 ICCAD: Sylvester/Keutzer, "Getting to the
Bottom of Deep Submicron" - 50k gates no
problem!
Copper / low-k promises
Timeline: 95, 98, 01, 04 (Interconnect Ostrich Malaise Era)
11
Revenge of the Interconnect Chicken Littles?

Interconnect Buffering debate
Primarily a paper debate
Consensus: a global interconnect phenomenon
(e.g. Tau 99: Keutzer/Pileggi agree
floorplanning will need more work)
Recent scary predictions
Buffers will invade synthesis blocks
Meshes / fabrics / grids of buffers will replace
individual elements / stations

2002 ISPD: Saxena, et al., "The Scaling
Challenge: Can Correct-by-Construction Design
Help?" - 70% of cells will be interconnect buffers
at 32nm
Timeline: 95, 98, 01, 04 (Revenge of the Chicken Littles)
12

Exploding buffer counts will break today's block
design paradigms
All realistic scaling projections encounter this
problem

13
What is to be done?

A partial solution: minimize the length of the
interconnect.
Note: I am firmly in the Chicken Little camp.

14
Fractional Cut Placement

By improving placement -- we strike at the root
of the interconnect problem
Method must be scalable: methods that work well
on small problems are not relevant

15
The Placement Problem
Common approach: make each gate rectangular, and
arrange them like bricks. The problem: where
do you put each brick? (And how do you run the
wires?)
Tens of millions of tiny pieces of metal
Millions of gates
16
Leading Placement Methods

Force-Directed/Linear Programming
Simulated Annealing
Recursive Bisection
Split the logic into two groups, minimizing the
number of wires between the groups
Place one group in the top half of the chip, and
the other in the bottom half
Why use bisection? It scales better than the
other methods, and still gets good results.
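
The bisection step above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the helper names (`count_cut`, `best_bisection`, `place`) are hypothetical, the partitioner is an exhaustive search that is usable only for a handful of cells, and effects such as pins outside the current region are ignored. Production placers use heuristic partitioners instead of enumeration.

```python
from itertools import combinations


def count_cut(nets, side_a, side_b):
    """Number of nets with at least one cell on each side of the cut."""
    return sum(
        1
        for net in nets
        if any(c in side_a for c in net) and any(c in side_b for c in net)
    )


def best_bisection(cells, nets):
    """Exhaustive balanced split (assumption: tiny inputs only)."""
    cells = sorted(cells)
    best_cut, best_a = None, None
    for group in combinations(cells, len(cells) // 2):
        a = set(group)
        b = set(cells) - a
        cut = count_cut(nets, a, b)
        if best_cut is None or cut < best_cut:
            best_cut, best_a = cut, a
    return best_a, set(cells) - best_a, best_cut


def place(cells, nets, x0, y0, x1, y1, horizontal_cut=True, out=None):
    """Recursively bisect the region until each part holds one cell."""
    out = {} if out is None else out
    if not cells:
        return out
    if len(cells) == 1:
        (cell,) = cells
        out[cell] = ((x0 + x1) / 2, (y0 + y1) / 2)
        return out
    a, b, _ = best_bisection(cells, nets)
    if horizontal_cut:  # one group in the top half, the other in the bottom
        ym = (y0 + y1) / 2
        place(a, nets, x0, ym, x1, y1, False, out)
        place(b, nets, x0, y0, x1, ym, False, out)
    else:  # alternate with a vertical cut: left half and right half
        xm = (x0 + x1) / 2
        place(a, nets, x0, y0, xm, y1, True, out)
        place(b, nets, xm, y0, x1, y1, True, out)
    return out


nets = [("a", "b"), ("c", "d")]
positions = place({"a", "b", "c", "d"}, nets, 0, 0, 4, 4)
```

In this toy netlist the first cut separates {a, b} from {c, d} without cutting any net, so connected cells end up in the same half of the chip.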

17
Bisection Based Placement
Logic elements
Semiconductor chip
18
Recursive Bisection Placement
19
Non-Traditional Approach

In bisection, cut lines are placed between rows
Our idea--ignore row boundaries, and place cut
lines where the relative areas suggest
This is Fractional Cut bisection
Requires legalization to align cells with rows
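
A minimal sketch of the difference between the two cut styles, assuming a horizontal cut line and that only the total cell area on each side matters. The function names and parameters are hypothetical, not taken from Feng Shui.

```python
def fractional_cut_y(y_bottom, y_top, area_bottom, area_top):
    """Place the cut line where the relative areas suggest."""
    total = area_bottom + area_top
    if total <= 0:
        raise ValueError("region contains no cell area")
    return y_bottom + (y_top - y_bottom) * area_bottom / total


def row_aligned_cut_y(y_bottom, y_top, area_bottom, area_top, row_height):
    """Traditional style: snap the same cut to the nearest row boundary."""
    ideal = fractional_cut_y(y_bottom, y_top, area_bottom, area_top)
    rows_below = round((ideal - y_bottom) / row_height)
    return y_bottom + rows_below * row_height


# A region 10 units tall, with 30% of the cell area assigned to the bottom.
fractional = fractional_cut_y(0.0, 10.0, 30.0, 70.0)
snapped = row_aligned_cut_y(0.0, 10.0, 30.0, 70.0, row_height=4.0)
```

The snapped cut gives the bottom group 40% of the region for 30% of the area. The fractional cut avoids that mismatch, at the price of cells that no longer sit on rows and must be legalized later.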

20
Ignoring Row Boundaries
Row boundaries are in blue. Black outlined
rectangles are the regions. Numbers indicate
total cell areas (there may be a number of cells
in each region).
21
Experimental Results
22
Standard Cell Observations

Recursive Bisection
Fast, and very competitive
More than 30% better than DAC98 best paper(!)
30% in 5 years is small compared to the 8X
improvement (or more) from Lithography
Different Benchmarks change results
PEKO ! IBM?
More on this later

23
This is the start of a solution.

But large designs are not just standard cells.

24
Boulders and Dust

To speed up the design process, pre-designed
blocks are integrated with standard cell logic.

25
Mixed Block Design
Hundreds of large blocks, millions of small
cells. Placement must deal with a large size
differential. Also called the "boulders and dust"
problem.
26
Recent previous work

Capo+Parquet - ISPD 02, ICCAD 03
Shred macros, global placement.
Form groups of standard cells, run fixed outline
floor planner.
Fix macros, place standard cells.

mPG-MS - ASPDAC 03.
Coarsening - cluster macros and standard cells.
Refinement - large macros are fixed gradually,
removing overlaps; refinement then carries on with
smaller objects.

Objective in these works: only HPWL minimization
27
Our approach

Global placement using Fractional Cut based
recursive bisection.
Greedy legalization.
Branch and Bound reordering on standard cells.

28
Global placement

Fractional cut approach (ICCAD03)
Recursive bisection, but cut lines are not
restricted to row boundaries. Instead, use
legalization after bisection.
Key insight: bisection can handle both standard
cells and macro blocks
Partition line is located based on total area of
each side; block shapes are not considered
Multilevel clustering based partitioner (hMetis),
with multiple random starts
Large blocks have an opportunity to start on
either side of the partition; we are not locked
in place

29
Example
30
Mixed Block Enhancement

Output of the Global Placer - rough distribution
of cells/macros across the core area.
Area constraints and fractional cut lines ensure
that distribution is even.
There is some overlap.
Cells and macro blocks are not row-aligned.

31
IBM01 before legalization
32
Placement legalization

Legalization is the stage where we remove
overlaps, align cells with rows, and snap cell
positions to the site widths in the circuit.
First (abandoned) approach -
Remove macro overlap using a recursive search
procedure
Cell legalization by dynamic programming based
method (similar to ICCAD03 paper).
Complex, many lines of code, and a great deal of
work. Also not very good.
Better method: leverage the uniform area demand
to allow a less complex legalizer.

33
Greedy legalization

Sort cells/macros by left-edge locations.
Initialize right edge of each row.
For Each Object
Greedy assignment of an object to a row, for min
displacement.
If no overlap, leave at abstract placement X
position; otherwise shift to the right to avoid
overlap
Macro placement must check multiple rows
Update right edge profiles for the rows across
which the object spans.
This method extends a prior standard cell
legalization method by Dwight Hill (US Patent
6,370,673)
Also used in Kahng/Wang APlace paper
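
The steps above can be sketched as follows. This is a simplified illustration, not the Feng Shui code: the `Obj` record, the Manhattan displacement cost, and the absence of a right core boundary and of site snapping are all assumptions made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    x: float       # left edge from the abstract (global) placement
    y: float       # bottom edge from the abstract placement
    width: float
    rows: int = 1  # height in rows; macros span several rows


def greedy_legalize(objs, num_rows, row_height):
    """Assign each object to the row(s) giving minimum displacement."""
    right_edge = [0.0] * num_rows  # current right edge profile per row
    placed = {}
    for o in sorted(objs, key=lambda o: o.x):  # sort by left edge
        best = None
        for r in range(num_rows - o.rows + 1):
            span = range(r, r + o.rows)
            # Keep the abstract X if it is free, otherwise shift right.
            x = max(o.x, max(right_edge[i] for i in span))
            cost = abs(x - o.x) + abs(r * row_height - o.y)
            if best is None or cost < best[0]:
                best = (cost, r, x)
        if best is None:
            raise ValueError(f"{o.name} is taller than the core area")
        _, r, x = best
        for i in range(r, r + o.rows):  # update every row the object spans
            right_edge[i] = x + o.width
        placed[o.name] = (x, r * row_height)
    return placed


objs = [
    Obj("A", 0.0, 0.0, 4.0),
    Obj("B", 2.0, 0.0, 4.0),
    Obj("M", 3.0, 0.0, 5.0, rows=2),  # a two-row macro
]
legal = greedy_legalize(objs, num_rows=3, row_height=10.0)
```

Because objects are processed in left-edge order and only ever shifted right of the current row profile, no two objects overlap. The quality of the result depends on the input being evenly spread, which is what the fractional cut global placement provides.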

34
Legalization
35
Experimental results

We tested Feng Shui 2.4 on the 18 IBM mixed block
benchmarks on the GSRC Bookshelf web site -
http://www.gigascale.org/bookshelf
Comparison with
Capo+Parquet (I, II): ISPD 2002
mPG-MS: ASPDAC 2003
Capo+Parquet (III): ICCAD 2003
All publications focus on HPWL minimization;
timing and routing are not considered.

36
Mixed Block Placement
37
Experimental Results
As much as 51% better on some benchmarks.
Closest is around 8%, for the design that doesn't
have macro blocks.
38
Upcoming ICCAD05 mixed block papers

Only one is able to improve on our results: the
APlace paper from Andrew Kahng's group -- and
they use our legalization method and detail
placer
Others have from 10% to 30% higher wire lengths
Our mixed block work is a major step forward, and
helps mitigate the growing interconnect problem.

39
Wire Length Minimization is good.

But what about routing wires?
These slides are from a talk to be presented by
Chen Li at ICCAD04
The paper is a collaboration between Purdue,
Binghamton, and UCLA

40
Motivation and previous work

Objective of placement tools: wirelength and
routability
Routability control in global placement
Incorporating congestion into cost function
Cell movement
Routability control in detailed placement
Region expanding
White space allocation

41
Congestion Estimation

Congestion estimation
Routing resource estimation
based on width and spacing of wires in layers
Routing demand estimation
decompose MST of net into two-pin connections
L/Z routes (at most two bends) for each two-pin connection
Congestion (overflow)
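
A rough sketch of the demand side of this estimate, under several assumptions: the nets are already decomposed into two-pin connections, pins are given as (column, row) bin coordinates, only the two L-shaped routes are considered (Z-shaped routes are left out), each L route gets probability 0.5, and every bin has the same capacity. The function names are hypothetical.

```python
def l_route_bins(p, q, horizontal_first):
    """Bins touched by one L-shaped route between pins p and q."""
    (x1, y1), (x2, y2) = p, q
    xs = range(min(x1, x2), max(x1, x2) + 1)
    ys = range(min(y1, y2), max(y1, y2) + 1)
    if horizontal_first:
        return {(x, y1) for x in xs} | {(x2, y) for y in ys}
    return {(x1, y) for y in ys} | {(x, y2) for x in xs}


def estimate_demand(connections, nx, ny):
    """Expected routing demand per bin, indexed as demand[row][column]."""
    demand = [[0.0] * nx for _ in range(ny)]
    for p, q in connections:
        for horizontal_first in (True, False):
            for x, y in l_route_bins(p, q, horizontal_first):
                demand[y][x] += 0.5  # each of the two L routes is equally likely
    return demand


def total_overflow(demand, capacity):
    """Congestion measure: demand in excess of the routing resource."""
    return sum(max(0.0, d - capacity) for row in demand for d in row)


demand = estimate_demand([((0, 0), (2, 1))], nx=3, ny=2)
overflow = total_overflow(demand, capacity=0.5)
```

The two pin bins are used by both routes and so carry a demand of 1.0, while the other bins in the bounding box carry 0.5 each.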

42
WSA: White Space Allocation

Idea for routing demand-resource matching
Fractional cut
Cutline shifting
Flow of WSA
Slicing tree Construction
Congestion estimation on tree
White space adjustment
Detailed placer
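
One way to picture cutline shifting at a single slicing tree node is sketched below. It assumes a vertical cut line and that the node's white space is divided between the two children in proportion to their estimated congestion; the proportional rule and every name in the code are assumptions for illustration, not details stated on these slides.

```python
def shifted_cut(x0, x1, height, cell_area_left, cell_area_right,
                congestion_left, congestion_right):
    """Return the new cut line X for one node of the slicing tree."""
    region_area = (x1 - x0) * height
    white_space = region_area - cell_area_left - cell_area_right
    if white_space < 0:
        raise ValueError("cells do not fit in the region")
    total = congestion_left + congestion_right
    # With no congestion information, split the white space evenly.
    share_left = 0.5 if total == 0 else congestion_left / total
    area_left = cell_area_left + white_space * share_left
    return x0 + area_left / height


# Equal cell area on both sides, but the left child is 3x as congested.
cut = shifted_cut(0.0, 10.0, 1.0, 4.0, 4.0, 3.0, 1.0)
```

Applying the same step level by level down the tree moves white space, and with it routing resource, toward the congested regions.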

43
Slicing Tree Construction
44
White Space Adjustment
Before cutline shifting
45
White Space Adjustment
Level 0
46
White Space Adjustment
Level 1
47
White Space Adjustment
Level 2: WSA finished
48
Detailed Placer

Objective: maintain white space distribution and
further reduce HPWL
For example, DOMINO cannot be applied here
Greedy legalization
Remove overlaps
Sliding window-based local minimization
White space is considered as pseudo-cells

49
Experimental Setup

IBM v2 easy and hard benchmarks (16 circuits)
All publicly available placers
Dragon-fd congestion-driven mode
CAPO, Feng Shui, mPL
mPG congestion mode off, QPLACE ECO for
legalization
QPLACE
All placers are run 5 times, except QPLACE, mPL,
and our tools, which are run once.
WRoute (SE5.3) to evaluate routability

50
Experimental Results

Our flow vs. other tools

100% successful routings
8.1% to 24.5% reduction on routed WL
51
Experimental Results

Impacts of various techniques in our flow

Both techniques improve routability. Combined
flow works best.
Both techniques reduce routed WL
52
Experimental Results

WSA on placements generated by other tools

Improvement on routability except for QPLACE
1.1% to 8.0% reduction on routed WL compared to
original tools
53
Improved Routability

mPL-R
Routability-driven global placement reduces
routing demands through cell-replacement based on
accurate congestion estimation
WSA
Routability-driven detailed placement allocates
routing resources into congested regions
Successful routings on all easy and hard IBM
benchmarks
Shortest routed WL, competitive placement
runtime

54
Wire Lengths are Reduced

But how far can we go?
There is an OPTIMAL solution--is there room left
for improvement?
PEKO benchmarks are constructed with a known
optimal solution
We use the placement tools on these to evaluate
if there is further room for gain

55
Experimental Results
56
Other places to reduce wire lengths?

Surprisingly large potential at the detail
placement level!

57
Traditional Detail Placement
Figure: the six orderings of three cells (A B C, A C B, B A C, B C A, C A B, C B A)
Legalize the placement, then try permutations on
groups of cells in order to improve (wire length,
delay, congestion, ...)
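
A small sketch of the permutation idea for a single, fully packed row. Horizontal wire length is measured between cell centers, and every name here (`centers`, `hpwl_x`, `improve_row`) is hypothetical. Real detail placers also handle gaps, multiple rows, and fixed pins, which this example ignores.

```python
from itertools import permutations


def centers(order, widths, x_start=0.0):
    """Center X of each cell when the row is packed left to right."""
    pos, x = {}, x_start
    for cell in order:
        pos[cell] = x + widths[cell] / 2
        x += widths[cell]
    return pos


def hpwl_x(order, widths, nets):
    """Horizontal half-perimeter wire length of the row."""
    pos = centers(order, widths)
    return sum(
        max(pos[c] for c in net) - min(pos[c] for c in net) for net in nets
    )


def improve_row(order, widths, nets, window=3):
    """Slide a window along the row, keeping the best permutation found."""
    order = list(order)
    for i in range(len(order) - window + 1):
        best = min(
            permutations(order[i:i + window]),
            key=lambda p: hpwl_x(
                order[:i] + list(p) + order[i + window:], widths, nets
            ),
        )
        order[i:i + window] = best
    return order


widths = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
nets = [("A", "C"), ("B", "D")]
before = ["A", "B", "C", "D"]
after = improve_row(before, widths, nets, window=3)
```

The first permutation tried in each window is the current order, so ties never change the row and the wire length cannot increase.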
58
Branch-and-Bound results
Bigger window means better results, but longer
run time. About half of FS2.0 run time is in
detail placement.
59
Redefining Local

Placements are optimal wrt the locations of
groups of 6 or so
But what about bigger groups?
PEKO and Grid benchmarks show that placements are
not optimal
If we increase the window size, how much can we
get?
How to increase window size w/o getting hammered
on run time?

60
A Bit of Fun: Global vs. Detail Suboptimality
Improving placement results requires an
understanding of what happens during placement.
Method: map each cell in the placement to a
pixel from an image. Rearrange the cells
according to optimal placement. What does this
mean? While there's suboptimality at the global
level, we're losing a lot in detail placement.
61
Branch-and-Bound Run Times

2x2 window: 4! = 24 combinations
3x3 window: 9! = 362880 combinations
Runs in about 0.7 seconds on my PC
4x4 window: 16! (about 2.1 x 10^13) combinations
Around 1 year to find optimal
5x5 window: 25! (about 1.6 x 10^25) combinations
Multiple expansions and contractions of the
universe
With the method presented here
10x10 has been solved, and we're expecting to be
able to do much larger (with some algorithmic
clean-up of the code)
It's not going to be cheap in terms of run
time, but it should be feasible to apply
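
The growth in run time can be checked with a few lines of Python. The estimate assumes that run time scales linearly with the number of permutations, starting from the 0.7 second figure quoted for the 3x3 window; that linear scaling, like the function name, is an assumption, since real branch-and-bound prunes part of the search.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def enumeration_seconds(cells, ref_cells=9, ref_seconds=0.7):
    """Scale the 3x3 timing by the ratio of permutation counts."""
    return ref_seconds * math.factorial(cells) / math.factorial(ref_cells)


for side in (2, 3, 4, 5):
    n = side * side
    years = enumeration_seconds(n) / SECONDS_PER_YEAR
    print(f"{side}x{side}: {n}! = {math.factorial(n):.3e}, ~{years:.3g} years")
```

Under that assumption the 4x4 window comes out at a little over one year, which agrees with the slide.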

62
Better Detail Placement

We know we're suboptimal
But by how much? And what portion is from global
problems, how much from local?
Global placement: we have annealing, analytic,
recursive bisection. All heuristics.
Detail placement: we have enumeration/branch-and-bound,
and some flow based methods. Optimal, but
small windows.
Strategy
Apply techniques from OR community
Solve local optimization problems with bigger
windows

63
Short Branch-and-Price Overview

Based on linear programming
Solve LP problem to find a lower bound; the solution
is not necessarily integer
Use column generation to keep the problem size
manageable
Decompose the problem into master and subproblems
Branch on non-integers
Try 0 and 1 values, evaluate the lower bounds
Traverse the decision tree, jumping to the node
with the best lower bound
If we find an integer solution, any node with a
higher value (integer or not) can be pruned

64
Summary

The first few slides make things sound very
bad.

65
But I think it's an accurate picture

Absolutely essential
Consider physical constraints in the design
process
Making chips faster and cheaper is no longer easy
(it was never easy, but it's very very hard now)

66
How to Survive the Future?

Short term: absolutely minimize circuit
interconnect. We've made progress here, but
there's a long way to go.
Long term: If you figure it out, please let me
know!

68
Design Automation Challenges

Handle Moore's Curse
The problem doubles in size every 18 months.
Late to Market is a disaster
Come closer to human-design
Estimates are that automated design leaves about
7 years of technological advances on the table(!)
Shield the System Designers from the Device
Details
Timing, Power, Signal Integrity, ...
Designers have enough to worry about now

69
Things that Won't Work

Massive parallel computers
Neural network paradigm shift
Much of the current nano and quantum hype

70
Traditional Legalization

Align cuts with cell rows
After bisection--all cells are within row
boundaries
Sort cells by X position
Feng Shui 1.5 also packs cells to the left
For MCNC benchmarks, this works well
For IBM, PEKO, not quite so good

71
After Bisection
72
Dynamic Programming Legalization

Process rows one at a time
For each row
Select a subset of cells such that the total
horizontal WL of the packed subset, plus the
penalty for the non-selected cells, is minimized
Simple DP formulation obtains good results

73
DP Solution

Suppose we have logic elements A, B, C, D, E, F
Assume they're all the same width
We have space for four of the six
Which ones do we put into the row
to minimize the TOTAL distance things move?

74
Example
Figure: blocks B, F, A, D, E, C
All the blocks are going to be packed to the
"left." The total distance things travel depends
on which blocks we choose.
75
Some Observations

If A is to the left of B before packing
It should still be to the left after packing
The distance that F travelled depends only on the
number of blocks to the left of it
We don't care which blocks to the left are
taken--only how many!

76
DP Matrix
Filling the blanks in the table is easy. Each
entry is the smaller of two options:
Cost of moving E to position 3, plus the lowest
cost for filling to location 2 with blocks before
E
Cost of filling to location 3 using blocks before
E
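
The table above can be filled with a short dynamic program. The sketch below assumes equal-width blocks sorted by their original X position, measures cost as the distance each chosen block moves to its packed slot, and adds an optional per-block penalty for blocks left out; the function name and the exact cost model are assumptions, not the ICCAD03 formulation.

```python
def dp_select(xs, slots, width=1.0, skip_penalty=0.0):
    """Choose `slots` blocks to pack to the left with minimum total cost.

    cost[i][j] = lowest cost of filling j slots using the first i blocks.
    """
    n = len(xs)
    if slots > n:
        raise ValueError("more slots than blocks")
    inf = float("inf")
    cost = [[inf] * (slots + 1) for _ in range(n + 1)]
    take = [[False] * (slots + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(min(i, slots) + 1):
            # Option 1: block i is left out of the row.
            skip = cost[i - 1][j] + skip_penalty
            # Option 2: block i moves into slot j (left edge (j-1)*width).
            use = inf
            if j > 0:
                use = cost[i - 1][j - 1] + abs(xs[i - 1] - (j - 1) * width)
            if use <= skip:
                cost[i][j], take[i][j] = use, True
            else:
                cost[i][j] = skip
    chosen, j = [], slots
    for i in range(n, 0, -1):  # walk back through the table
        if take[i][j]:
            chosen.append(i - 1)
            j -= 1
    chosen.reverse()
    return cost[n][slots], chosen


# Six blocks, room for four: indices refer to the sorted input list.
total, picked = dp_select([0.0, 1.0, 2.0, 10.0, 11.0, 12.0], slots=4)
```

The table has only (blocks + 1) x (slots + 1) entries because, as observed above, the cost of moving a block depends only on how many blocks were taken to its left, not on which ones.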
77
Legalization
78
Legalization
79
Standard Cell Placement Tools

Other methods include
Capo (recursive bisection, U. Michigan)
Dragon (simulated annealing, UCLA)
Kraftwerk (linear programming, TU Munich)
mPL (multilevel slot-based, UCLA)
...
Objective is to minimize total wire length
Benchmark circuits derived from IBM designs
