VLSI Placement (I)

About This Presentation

Title:

VLSI Placement (I)

Description:

VLSI Placement (I) Prof. Lei He http://eda.ee.ucla.edu Thanks to Chris Chu, Jason Cong, Paul Villarrubia and David Pan for contributions to s – PowerPoint PPT presentation

Slides: 66

Provided by: David1178

Learn more at: http://eda.ee.ucla.edu

Transcript and Presenter's Notes

Title: VLSI Placement (I)

1
VLSI Placement (I)

Prof. Lei He
http://eda.ee.ucla.edu

Thanks to Chris Chu, Jason Cong, Paul Villarrubia
and David Pan for contributions to slides
2
Problem formulation

Input
Blocks (standard cells and macros) B1, ... , Bn
Shapes and Pin Positions for each block Bi
Nets N1, ... , Nm
Output
Coordinates (xi , yi ) for block Bi.
The total wire length is minimized.
The area of the resulting block is minimized, or
the placement must fit a given fixed die.
Other considerations: timing, routability, clock,
buffering, and interaction with physical synthesis
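
As an illustration of the wire length objective, here is a minimal Python sketch that estimates total wire length with the half-perimeter bounding box of each net. It assumes every pin sits at its block's coordinate (xi, yi); the function name `hpwl` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def hpwl(nets: List[List[str]], pos: Dict[str, Tuple[float, float]]) -> float:
    """Sum of half-perimeter bounding-box lengths over all nets."""
    total = 0.0
    for net in nets:
        xs = [pos[block][0] for block in net]
        ys = [pos[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical three-block example.
positions = {"B1": (0.0, 0.0), "B2": (3.0, 1.0), "B3": (1.0, 4.0)}
nets = [["B1", "B2"], ["B1", "B2", "B3"]]
total_wl = hpwl(nets, positions)
```

A placer would compare candidate coordinate assignments by this kind of estimate rather than by routed length.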

3
Placement can Make A Difference

MCNC benchmark circuit e64 (contains 230 4-LUTs).
Placed on an FPGA.

Random Initial Placement
Final Placement
After Detailed Routing
4
Importance of Placement

Placement is a fundamental problem for physical
design
Glue of the physical synthesis
Has become very active again in recent years
9 new academic placers for WL minimization since 2000
Many other publications on handling timing,
routability, etc.
Reasons
Serious interconnect issues (delay, routability,
noise) in deep-submicron design
Placement determines interconnect to the first
order
Need placement information even in early design
stages (e.g., logic synthesis)
Need to have a good placement solution
Placement problem becomes significantly larger
Cong et al. (ASPDAC-03, ISPD-03, ICCAD-03) point
out that existing placers are far from optimal,
not scalable, and not stable

5
Placement Topic in Context

Note that this course covers selected research
topics, so we cover placement at a fairly high
level, with some technical details
More fundamentals of placement are covered in
detail in a core physical design course such as
CS258F
Or a new core physical design course may be
offered next year as EE298 (depending on faculty
recruiting)

6
Benchmarking for Large-Scale Placement and Beyond
ISPD-2003

S. N. Adya, M. C. Yildiz, I. L. Markov,
P. G. Villarrubia, P. N. Parakh, P. H. Madden

7
Design Types

ASICs
Lots of fixed I/Os, few macros, millions of
standard cells
Placement densities 40-80% (IBM)
Flat and hierarchical designs
SoCs
Many more macro blocks, cores
Datapaths + control logic
Can have very low placement densities (< 20%)
Microprocessor (µP) Random Logic Macros (RLMs)
Hierarchical partitions are placement instances
(5-30K)
High placement densities 80-98% (low
whitespace)
Many fixed I/Os, relatively few standard cells
Recall: Partitioning w/ Terminals (DAC-99,
ISPD-99, ASPDAC-00)

8
Requirements for Placers

Must handle 4-10M cells, 1000s of macros
64 bits + near-linear asymptotic complexity
Scalable/compact design database (OpenAccess)
Accept fixed ports/pads/pins + fixed cells
Place macros, esp. with variable aspect ratios
Non-trivial heights and widths (e.g.,
height = 2 rows)
Honor targets and limits for net length
Respect floorplan constraints
Handle a wide range of placement densities (from
< 25% to 100% occupied), ICCAD-02

9
Placement Footprints

Standard Cell
Data Path
IP - Floorplanning
10
Placement Footprints
Reserved areas
Mixed Data Path and sea of gates
11
Placement Footprints
Perimeter IO
Area IO
12
Unconstrained Placement
13
Floor planned Placement
14
VLSI Global Placement Examples
bad placement
good placement
15
Major Placement Techniques

Simulated Annealing
TimberWolf package (JSSC-85, DAC-86)
Dragon (ICCAD-00)
Partitioning-Based Placement
Capo (DAC-00)
Fengshui (DAC-2001)
Analytical Placement
Gordian (TCAD-91)
Kraftwerk (DAC-98)
FastPlace (ISPD-04)
Hall's Quadratic Placement
Genetic Algorithm

16
Outline

Wire length driven placement
Main methods
Simulated Annealing
Gate-array: TimberWolf package
Standard-cell: TimberWolf package, Dragon
Partition-based methods
Analytical methods
Timing and congestion consideration
Newer trends

17
Simulated Annealing Based Placement
(I) The TimberWolf Placement and Routing Package,
Sechen and Sangiovanni-Vincentelli, IEEE Journal of
Solid-State Circuits, vol. SC-20, no. 2 (1985),
pp. 510-522
TimberWolf 3.2: A New Standard Cell Placement and
Global Routing Package, Sechen and
Sangiovanni-Vincentelli, 23rd DAC, 1986, pp. 432-439

TimberWolf
Stage 1
Modules are moved between different rows as well
as within the same row
Module overlaps are allowed
When the temperature is reduced below a certain
value, stage 2 begins
Stage 2
Remove overlaps
Annealing process continues, but only
interchanges adjacent modules within the same row

18
Solution Space
All possible arrangements of modules into rows
possibly with overlaps
19
Neighboring Solutions
Three types of moves
M1: Displace a module to a new location
M2: Interchange two modules
M3: Change the orientation of a module
[Figure: axes of reflection for move M3]
20
Move Selection

TimberWolf first tries to select a move between M1
and M2
Prob(M1) = 4/5
Prob(M2) = 1/5
If a move of type M1 is chosen (for a certain
module) and it is rejected, then a move of type
M3 (for the same module) will be chosen with
probability 1/10
Restrictions on
How far a module can be displaced
What pairs of modules can be interchanged

M1: Displacement, M2: Interchange, M3: Reflection
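
The move probabilities above can be sketched in a few lines of Python. This is only an illustration of the stated probabilities (4/5, 1/5, and 1/10 after a rejected M1), not TimberWolf's code; the function names are hypothetical.

```python
import random


def select_move(rng: random.Random) -> str:
    """Pick M1 (displacement) with probability 4/5, otherwise M2 (interchange)."""
    return "M1" if rng.random() < 4 / 5 else "M2"


def try_reflection_after_rejected_m1(rng: random.Random) -> bool:
    """After a rejected M1, attempt M3 on the same module with probability 1/10."""
    return rng.random() < 1 / 10


# Empirical check of the M1/M2 split with a fixed seed.
rng = random.Random(0)
counts = {"M1": 0, "M2": 0}
for _ in range(10_000):
    counts[select_move(rng)] += 1
```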
21
Move Restriction

Range Limiter
At the beginning, R is very large, big enough to
contain the whole chip
Window size shrinks slowly as the temperature
decreases. In fact, the height and width of R are
proportional to log(T)
Stage 2 begins when the window size is so small
that no inter-row module interchanges are possible

Rectangular window R
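
A rough Python sketch of a range limiter whose window dimensions scale with log(T). The normalization by the starting temperature `temp0`, the clamping, and the minimum window size are assumptions made for illustration; the slides only state the proportionality to log(T).

```python
import math
from typing import Tuple


def window_size(full_w: float, full_h: float, temp: float, temp0: float,
                min_size: float = 1.0) -> Tuple[float, float]:
    """Scale the full-chip window by log(temp) / log(temp0), clamped to [0, 1].

    Assumes temp > 0 and temp0 > 1 so that both logarithms are defined
    and the ratio equals 1 at the starting temperature.
    """
    scale = math.log(temp) / math.log(temp0)
    scale = max(0.0, min(1.0, scale))
    return max(min_size, full_w * scale), max(min_size, full_h * scale)


start = window_size(100.0, 50.0, temp=1000.0, temp0=1000.0)
middle = window_size(100.0, 50.0, temp=math.sqrt(1000.0), temp0=1000.0)
end = window_size(100.0, 50.0, temp=1.0, temp0=1000.0)
```

With these assumed settings the window covers the whole chip at the start and has shrunk to the minimum size by the time T reaches 1.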
22
Cost Function
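$\Psi = C_1 + C_2 + C_3$
$C_1 = \sum_{i} (\alpha_i w_i + \beta_i h_i)$
$w_i$, $h_i$: width and height of the bounding box of net i
$\alpha_i$, $\beta_i$ are horizontal and vertical weights,
respectively
$\alpha_i = 1$, $\beta_i = 1$ $\Rightarrow$ $C_1$ is the total 1/2 perimeter
of the net bounding boxes

Critical nets: increase both $\alpha_i$ and $\beta_i$
Double metal technology: over-the-cell routing is
possible. Fewer feedthrough cells are needed
$\Rightarrow$ vertical wiring is cheaper than horizontal
wiring. Use smaller vertical weights, i.e., $\beta_i < \alpha_i$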

23
Cost Function (Contd)
$C_2$: penalty function for module overlaps
$C_2 = \sum_{i \ne j} \left( O(i,j) + \alpha \right)^2$
$O(i,j)$: amount of overlap in the X-dimension
between modules i and j
$\alpha$: offset parameter to ensure $C_2 \to 0$ when $T \to 0$
$C_3$: penalty function that controls the row
lengths
$C_3 = \beta \sum_{r} \left| l(r) - d(r) \right|$
$d(r)$: desired length of row r
$l(r)$: sum of the widths of the modules in row r
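
The two penalty terms can be sketched in Python as follows. The sketch takes C2 as the sum of (O(i,j) + offset)^2 and C3 as a weight times the sum over rows of |l(r) - d(r)|. Two assumptions are made here that the slides do not state: only pairs that actually overlap contribute to C2, and each unordered pair is counted once. All names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Module = Tuple[float, float]  # (left x coordinate, width)


def x_overlap(a: Module, b: Module) -> float:
    """Amount of overlap in the X-dimension between two modules."""
    left = max(a[0], b[0])
    right = min(a[0] + a[1], b[0] + b[1])
    return max(0.0, right - left)


def overlap_penalty(rows: Dict[int, List[Module]], offset: float) -> float:
    """C2: sum of (O(i,j) + offset)^2 over overlapping pairs within each row."""
    total = 0.0
    for mods in rows.values():
        for i in range(len(mods)):
            for j in range(i + 1, len(mods)):
                o = x_overlap(mods[i], mods[j])
                if o > 0:  # assumption: non-overlapping pairs add nothing
                    total += (o + offset) ** 2
    return total


def row_length_penalty(rows: Dict[int, List[Module]],
                       desired: Dict[int, float], weight: float) -> float:
    """C3: weight * sum over rows of |actual row length - desired row length|."""
    return weight * sum(
        abs(sum(width for _, width in mods) - desired[r])
        for r, mods in rows.items()
    )


rows = {0: [(0.0, 4.0), (3.0, 2.0)], 1: [(0.0, 5.0)]}
c2 = overlap_penalty(rows, offset=1.0)
c3 = row_length_penalty(rows, desired={0: 5.0, 1: 5.0}, weight=2.0)
```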
24
Annealing Schedule

$T_k = r(k) \, T_{k-1}$, $k = 1, 2, 3, \ldots$
r(k) increases from 0.8 to a maximum value of 0.94
and then decreases to 0.1
At each temperature, a total of K·n attempts is
made
n: number of modules
K: user-specified constant
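
A minimal Python sketch of such a cooling schedule. The slides give only the end points of r(k) (0.8, 0.94, 0.1); the piecewise-linear shape and the quarter/half/quarter split used here are assumptions for illustration.

```python
from typing import List

R_START, R_MAX, R_END = 0.8, 0.94, 0.1


def cooling_rate(k: int, num_steps: int) -> float:
    """Assumed shape of r(k): rise to R_MAX, hold, then fall to R_END.

    Requires num_steps >= 2; k runs from 1 to num_steps.
    """
    f = (k - 1) / (num_steps - 1)  # progress through the schedule, 0..1
    if f < 0.25:
        return R_START + (R_MAX - R_START) * (f / 0.25)
    if f <= 0.75:
        return R_MAX
    return R_MAX - (R_MAX - R_END) * ((f - 0.75) / 0.25)


def temperatures(t0: float, num_steps: int) -> List[float]:
    """Apply T_k = r(k) * T_{k-1} starting from T_0 = t0."""
    temps = [t0]
    for k in range(1, num_steps + 1):
        temps.append(cooling_rate(k, num_steps) * temps[-1])
    return temps


def attempts_per_temperature(K: int, n: int) -> int:
    """Number of move attempts at each temperature: K times n."""
    return K * n


schedule = temperatures(t0=1000.0, num_steps=101)
```

Cooling is slowest in the middle of the run, where r(k) is at its maximum.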

25
Dragon2000 Standard-Cell Placement Tool for
Large Industry Circuits

M. Wang, X. Yang, and M. Sarrafzadeh,
ICCAD-2000
pages 260-263

26
Main Idea

Simulated annealing based
1.9x faster than iTools 1.4.0 (commercial version
of TimberWolf)
Comparable wirelength to iTools (i.e., very good)
Performs better for larger circuits
Still very slow compared with other approaches
Also shown to have good routability
Top-down hierarchical approach
hMetis to recursively quadrisect into 4^h bins at
level h
Swapping of bins at each level by SA to minimize
WL
Terminates when each bin contains < 7 cells
Then swap single cells locally to further
minimize WL
Detailed placement is done by a greedy algorithm
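
To get a feel for the hierarchy depth, the following Python sketch counts how many quadrisection levels are needed before every bin holds fewer than 7 cells. It assumes perfectly balanced bins, which real partitions only approximate; the function name is hypothetical.

```python
def levels_until_small_bins(num_cells: int, max_cells_per_bin: int = 7) -> int:
    """Smallest level h such that num_cells / 4**h < max_cells_per_bin."""
    h = 0
    while num_cells / 4 ** h >= max_cells_per_bin:
        h += 1
    return h


depth = levels_until_small_bins(1000)
bins_at_depth = 4 ** depth
```

For a 1000-cell circuit this gives four levels and 256 bins, i.e., about 4 cells per bin.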

27
Outline

Wire length driven placement
Main methods
Simulated Annealing
Gate-array: TimberWolf package
Standard-cell: TimberWolf package, Grover, Dragon
Partition-based methods
Analytical methods
Timing and congestion consideration
Newer trends

28
Partition based methods

Partitioning methods
FM
Multilevel techniques, e.g., hMetis
Two academic open source placement tools
Capo (UCLA/UCSD/Michigan): multilevel FM
Feng-shui (SUNY Binghamton): uses hMetis
Pros and cons
Fast
Not stable

29
Partitioning-based Approach

Try to group closely connected modules together.
Repeatedly divide a circuit into subcircuits such
that the cut value is minimized.
The placement region is also partitioned (by
cutlines) accordingly.
Each subcircuit is assigned to one partition of
the placement region.
Note: also called the min-cut placement approach.

30
An Example
Cutline
Circuit
Placement
31
Variations

There are many variations of the
partitioning-based approach. They differ in
The objective function used.
The partitioning algorithm used.
The selection of cutlines.

32
Partitioning

Objective
Given a set of interconnected blocks, produce two
sets that are of equal size, and such that the
number of nets connecting the two sets is
minimized.
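
The objective can be made concrete with a short Python sketch that counts the nets connecting the two sets and checks the equal-size condition. The data layout (a dictionary mapping each block to side 0 or 1) and the function names are hypothetical.

```python
from typing import Dict, List


def cut_size(nets: List[List[str]], side: Dict[str, int]) -> int:
    """Number of nets that have blocks in both sets."""
    return sum(1 for net in nets if len({side[b] for b in net}) > 1)


def is_balanced(side: Dict[str, int]) -> bool:
    """True if the two sets differ in size by at most one block."""
    ones = sum(side.values())
    return abs(len(side) - 2 * ones) <= 1


nets = [["a", "b"], ["b", "c"], ["a", "c", "d"]]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
cut = cut_size(nets, side)
balanced = is_balanced(side)
```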
33

FM Partitioning

list_of_sets = entire_chip;
while (any_set_has_2_or_more_objects(list_of_sets)) {
    for_each_set_in(list_of_sets)
        partition_it();
    /* each time through this loop the number of */
    /* sets in the list doubles.                 */
}

[Figures: Initial Random Placement, After Cut 1, After Cut 2]
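
A runnable Python rendering of this recursive loop, given as a sketch only. The default partitioner simply splits a list in half and stands in for a real min-cut partitioner such as FM; it ignores connectivity and is purely a placeholder.

```python
from typing import Callable, List, Sequence, Tuple

Partitioner = Callable[[List[int]], Tuple[List[int], List[int]]]


def split_in_half(cells: List[int]) -> Tuple[List[int], List[int]]:
    """Placeholder partitioner: ignores nets and just halves the list."""
    mid = len(cells) // 2
    return cells[:mid], cells[mid:]


def recursive_bisection(cells: Sequence[int],
                        partition: Partitioner = split_in_half):
    """Bisect every set with 2 or more cells until only single cells remain."""
    sets = [list(cells)]
    history = [len(sets)]  # number of sets after each pass
    while any(len(s) >= 2 for s in sets):
        next_sets: List[List[int]] = []
        for s in sets:
            if len(s) >= 2:
                next_sets.extend(partition(s))
            else:
                next_sets.append(s)
        sets = next_sets
        history.append(len(sets))
    return sets, history


final_sets, set_counts = recursive_bisection(range(8))
```

With 8 cells the number of sets goes 1, 2, 4, 8, matching the doubling noted in the pseudocode comment.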
34

FM Partitioning

Moves are made based on object gain.
Object gain: the amount of change in cut
crossings that will occur if an object is moved
from its current partition into the other partition
- each object is assigned a gain
- objects are put into a sorted gain list
- the object with the highest gain from the
  smaller of the two sides is selected and moved
- the moved object is "locked"
- gains of "touched" objects are recomputed
- gain lists are resorted
[Figure: example circuit with each object labeled by its gain]
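
The gain computation can be sketched in Python as follows. Here the gain of an object is taken as the reduction in cut crossings if that object alone is moved to the other side (positive is good); this sign convention, the data layout, and the function name are assumptions made for the example. Each net is assumed to list distinct objects.

```python
from typing import Dict, List


def fm_gains(nets: List[List[str]], side: Dict[str, int]) -> Dict[str, int]:
    """Gain per object = nets it would uncut minus nets it would newly cut."""
    gains = {obj: 0 for obj in side}
    for net in nets:
        if len(net) < 2:
            continue  # a single-pin net can never be cut
        for obj in net:
            same_side = sum(1 for other in net if side[other] == side[obj])
            if same_side == 1:
                gains[obj] += 1  # obj is alone on its side: moving it uncuts the net
            elif same_side == len(net):
                gains[obj] -= 1  # net lies entirely on obj's side: moving it cuts the net
    return gains


nets = [["a", "b"], ["a", "c"], ["b", "c", "d"]]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
gains = fm_gains(nets, side)
```

In this example only moving c reduces the cut (from 2 to 1), so c has the highest gain.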
35-48
FM Partitioning
[Figure sequence: step-by-step FM example showing the object gains after each move]
49
Breuer's Cutline Selection Schemes

M.A. Breuer, Min-Cut Placement, J.
Design Automation and Fault-Tolerant Computing,
1(4):343-382, Oct. 1977.
M.A. Breuer, A Class of Min-Cut Placement
Algorithms, DAC 1977,
pages 284-290.

50
Number of Nets Across a Cutline

For any cutline c, let v(c) be the total number
of nets cut by c.
v(c) gives a lower bound on the number of tracks
along cutline c.
Useful in standard-cell or gate-array layout.

[Figure: cutline c with v(c) = 2]
51
Three Objective Functions

Total net-cut: min $\sum_{\text{all cutlines } c} v(c)$
Equivalent to minimizing total half-perimeter wire
length.
Min-max cut value: min $\max_{\text{all cutlines } c} v(c)$
Minimizes channel widths in standard-cell or
gate-array placement.
Sequential cutline: consider cutlines
sequentially, minimizing the cut value with respect
to the constraints imposed by previous cuts.
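
The first two objectives can be illustrated with a small Python sketch that evaluates v(c) for a set of vertical cutlines. Cells are reduced to x coordinates for simplicity, and all names and data are hypothetical.

```python
from typing import Dict, List


def cut_value(nets: List[List[str]], x: Dict[str, float], cutline: float) -> int:
    """v(c): number of nets with cells strictly on both sides of cutline c."""
    count = 0
    for net in nets:
        xs = [x[cell] for cell in net]
        if min(xs) < cutline < max(xs):
            count += 1
    return count


x = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}
nets = [["a", "b"], ["a", "d"], ["c", "d"]]
cutlines = [0.5, 1.5, 2.5]

values = [cut_value(nets, x, c) for c in cutlines]
total_net_cut = sum(values)   # first objective: sum of v(c) over all cutlines
max_cut_value = max(values)   # second objective: largest v(c)
```

With unit-spaced cells and a cutline between every adjacent pair, the total net-cut (5) equals the sum of the horizontal net spans (1 + 3 + 1), which is the equivalence with half-perimeter wire length in one dimension.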

52
Two Cutline Styles

Cut Oriented Min-Cut Placement
Use the same cutlines for all sub-regions.
Realize the sequential objective function.
Block Oriented Min-Cut Placement
Different sub-regions have separate cutlines.
More flexible.

[Figure: cut-oriented cutlines labeled 1, 2; block-oriented cutlines labeled 1, 2a, 2b]
53
3 Cutline Selection Schemes

Suppose we want to partition as follows
Question: Which cutline to use first?
3 Cutline Selection Schemes
Quadrature Placement Procedure.
Bisection Placement Procedure.
Slice/Bisection.

54
Quadrature Placement Procedure
[Figure: cutlines labeled 1, 2, 3a, 3b, 4a, 4b]

Very suitable for circuits with high routing
density in the centre.

55
Bisection Placement Procedure
[Figure: cutlines labeled 1, 2a-2b, 3a-3d, 4, 5a-5b, 6a-6d]

Good for standard-cell placement.

56
Slice/Bisection Procedure
[Figure: cutlines labeled 1 through 8, 9a-9b, 10a-10d]

Most suitable when there is a high interconnect
density at the periphery.

57
Terminal Propagation Algorithm by Dunlop and
Kernighan

A Procedure for Placement of
Standard-Cell VLSI Circuits,
TCAD, 4(1):92-98, Jan. 1985.

58
Problem of Partitioning Subcircuits
[Figure: two alternative partitionings of a subcircuit with modules A and B]
The costs of these 2 partitionings are not the same.
59
Terminal Propagation

Need to consider nets connecting to external
terminals or other modules as well.
Do partitioning in a breadth-first manner (i.e.,
finish all higher-level partitioning first).

The dummy terminal will try to pull B to the top
partition.
[Figure: modules A and B with a dummy terminal on the region boundary]
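
The pull exerted by a dummy terminal can be sketched in Python. This is a deliberately simplified model, not Dunlop and Kernighan's procedure: it assumes a horizontal cutline at `cut_y`, fixes the dummy terminal on whichever side the external connection lies, and then asks whether a net is cut. All names are hypothetical.

```python
from typing import Iterable, Optional


def dummy_terminal_side(external_y: float, cut_y: float) -> str:
    """Side of the cutline on which the external connection is represented."""
    return "top" if external_y > cut_y else "bottom"


def net_is_cut(cell_sides: Iterable[str], dummy_side: Optional[str] = None) -> bool:
    """A net is cut if its cells, plus the fixed dummy terminal, span both sides."""
    sides = set(cell_sides)
    if dummy_side is not None:
        sides.add(dummy_side)
    return len(sides) > 1


# Module B connects to a module that was placed above this region.
dummy = dummy_terminal_side(external_y=9.0, cut_y=5.0)
cut_if_b_bottom = net_is_cut(["bottom"], dummy)
cut_if_b_top = net_is_cut(["top"], dummy)
```

Because placing B in the bottom partition cuts the net and placing it in the top partition does not, a min-cut partitioner is biased toward the top.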
60
Dunlop and Kernighan's Algorithm

Block-Oriented Quadrature Placement.
Partition until each region contains 6 cells.
Need to assign cells to rows after partitioning.

[Figure: partitioned regions and the candidate rows (Row 1 to Row 4) for the cells in each region]
61
Can Recursive Bisection Alone Produce Routable
Placement? (Name of placer: Capo)

Andrew Caldwell, Andrew Kahng, and Igor Markov
DAC-2000

62
Capo Overview

Standard cell placement, Fixed-die context
Pure recursive bisectioning placer
Several minor techniques to produce good
bisections
Produces good results mainly because of
Improvements in min-cut bisection using the
multilevel idea in the past few years
Attention to details in implementation
Implementation with good interface (LEF/DEF and
GSRC bookshelf) available on the web

63
Capo Approach

Recursive bisection framework
Multi-level FM for instances with > 200 cells
Flat FM for instances with 35-200 cells
Branch-and-bound for instances with < 35 cells
Careful handling of partitioning tolerance
Uncorking: prevent large cells from blocking the
movement of smaller cells
Repartitioning: several FM calls with decreasing
tolerance
Block splitting heuristics: higher tolerance for
vertical cuts
Hierarchical tolerance computation: an instance
with more whitespace can have a bigger
partitioning tolerance
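
The size-based choice of partitioner can be written as a small Python dispatch function. The thresholds come from the list above; the function name and the returned labels are hypothetical and do not reflect Capo's actual interface.

```python
def choose_partitioner(num_cells: int) -> str:
    """Pick a partitioning engine from the instance size."""
    if num_cells > 200:
        return "multilevel FM"
    if num_cells >= 35:
        return "flat FM"
    return "branch-and-bound"


choices = [choose_partitioner(n) for n in (5000, 200, 35, 34)]
```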

64
Partitioning

Pros
- very fast
- great quality
- scales nearly linearly with problem size
Cons
- non-trivial to implement
- very directed algorithm, but this limits the
  ability to deal with miscellaneous constraints
65
Summary for Partition Based Placement

Improvements in mincut partitioning are conducive
to better wirelength and congestion
Routable placements can be produced in most cases
without explicit congestion management
Explicit congestion control may still be useful
in some cases
Better weighted wirelength often implies better
routed wirelength, but not always
