Title: ECE260B - CSE241A VLSI Digital Circuits
1ECE260B CSE241A Winter 2005
Placement
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
Slides courtesy of Prof. Andrew B. Kahng
2VLSI Design Flow and Physical Design Stage
3Placement Problem
- Input
- A set of cells and their complete information (a cell library).
- Connectivity information between cells (netlist information).
- Output
- A set of locations on the chip: one location for each cell.
- Goal
- The cells are placed to produce a routable chip that meets timing and other constraints (e.g., low power, noise).
- Challenge
- The number of cells in a design is very large (> 1 million).
- The timing constraints are very tight.
4Optimal Relative Order
[Figure: cells A, B, C]
5To spread ...
[Figure: cells A, B, C]
6... or not to spread
[Figure: cells A, B, C]
7Place to the left ...
8... or to the right
9Optimal Relative Order
[Figure: cells A, B, C]
Without free space, the placement problem is dominated by order.
10Placement Problem
11Global and Detailed Placement
In global placement, we decide the approximate locations for cells by placing cells in global bins.
In detailed placement, we make local adjustments to obtain the final non-overlapping placement.
12Standard Cell
Data Path
IP - Floorplanning
13Placement Footprints
Reserved areas
Mixed Data Path sea of gates
14Placement Footprints
Perimeter IO
Area IO
15Placement objectives are subject to user constraints / design style
- Hierarchical Design Constraints
- pin location
- power rail
- reserved layers
- Flat Design with Floorplan Constraints
- Fixed Circuits
- I/O Connections
16Standard Cells
17Standard Cells
- Power connected by abutment, placed in sea-of-rows
- Rarely rotated
- DRC clean in any combination
- Circuit clean (i.e., no naked T-gates, no huge input capacitances)
- 8, 9, 10 tracks in height
- Metal 1 only used (hopefully)
- Multi-height stdcells possible
- Buffers: sizes, intrinsic delay steps, optimal repeater selection
- Special clock buffers and gates (balanced PN)
- Special metastability-hardened flops
- Cap cells (metal1 used?)
- Gap fillers (metal1 used?)
- Tie-high, tie-low
18Unconstrained Placement
19Floorplanned Placement
20Placement Cube (4D)
- Cost Function(s) to be used
- Cut, wirelength, congestion, crossing, ...
- Algorithm(s) to be used
- FM, Quadratic, annealing, ...
- Granularity of the netlist
- Coarseness of the layout domain
- 2x2, 4x4, ...
- An effective methodology picks the right mix from the above and knows when to switch from one to the next.
- Most methods today are ad hoc
21Advantages of Hierarchy
- Design is carved into smaller pieces that can be worked on in parallel (improved throughput)
- A known floor plan provides the logic design team with a large degree of placement control.
- A known floor plan provides early knowledge of long wires
- Timing closure problems can be addressed by tools, logic design, and hierarchy manipulation
- Late design changes can be done with minimal turmoil to the entire design
22Disadvantages of Hierarchy
- Results depend on the quality of the hierarchy. The logic hierarchy must be designed with Physical Design taken into account.
- Additional methodology requirements must be met to enable hierarchy, e.g., pin assignment, macro abstract management, area budgeting, floor planning, timing budgets, etc.
- Late design changes may affect multiple components.
- Hierarchy allows divergent methodologies
- Hierarchy hinders Design Automation algorithms. They can no longer perform global optimizations.
23Traditional Placement Algorithms
- Quadratic Placement
- Simulated Annealing
- Bi-Partitioning / Quadrisection
- Force Directed Placement
- Hybrid
24Quadratic Placement
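Minimize $F = (x_1 - x_3)^2 + (x_1 - x_2)^2 + (x_2 - x_4)^2$ over the movable cells $x_1, x_2$ (here $x_3, x_4$ are treated as fixed).
Setting $\partial F/\partial x_1 = 0$ and $\partial F/\partial x_2 = 0$ gives the linear system $Ax = B$ with
$A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad B = \begin{pmatrix} x_3 \\ x_4 \end{pmatrix}$
[Figure: chain of cells x3, x1, x2, x4]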
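The two-cell example above can be checked numerically. The sketch below is only an illustration under the assumption that x3 and x4 are fixed positions and x1, x2 are the unknowns; the function names `solve_2x2` and `place_two_cells` are hypothetical and not part of the slides.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a * x = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("singular system")
    x1 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x2 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return x1, x2


def place_two_cells(x3, x4):
    """Minimize (x1-x3)^2 + (x1-x2)^2 + (x2-x4)^2 over x1 and x2.

    The zero-gradient conditions are 2*x1 - x2 = x3 and -x1 + 2*x2 = x4.
    """
    a = [[2.0, -1.0], [-1.0, 2.0]]
    b = [x3, x4]
    return solve_2x2(a, b)


print(place_two_cells(0.0, 3.0))  # (1.0, 2.0)
```

With fixed cells at 0 and 3, the two movable cells land at 1 and 2, evenly spaced along the chain.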
25Analytical Placement
- Get a solution with lots of overlap
- What do we do with the overlap?
26Pros and Cons of QP
- Pros
- Very Fast Analytical Solution
- Can Handle Large Design Sizes
- Can be Used as an Initial Seed Placement Engine
- Cons
- Can Generate Overlapped Solutions: Postprocessing Needed
- Not Suitable for Timing Driven Placement
- Not Suitable for Simultaneous Optimization of Other Aspects of Physical Design (clocks, crosstalk)
- Gives Trivial Solutions without Pads (and close to trivial with pads)
27Simulated Annealing Placement
- Initial Placement Improved through
- Swaps and Moves
- Accept a Swap/Move if it improves cost
- Accept a Swap/Move that degrades cost under some probability conditions
[Figure: cost vs. time]
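The accept/reject rule can be sketched as follows. This is a minimal illustration, not the method of any particular placer: it assumes a one-dimensional row of slots, two-pin nets, a Metropolis acceptance probability exp(-delta / t), and a geometric cooling schedule, none of which are specified in the slides. The names `wirelength` and `anneal` and all parameter defaults are hypothetical.

```python
import math
import random


def wirelength(order, nets):
    """Total linear length of two-pin nets for cells placed in slot order."""
    slot = {cell: i for i, cell in enumerate(order)}
    return sum(abs(slot[a] - slot[b]) for a, b in nets)


def anneal(order, nets, t_start=5.0, t_end=0.01, cooling=0.95,
           moves_per_temp=50, seed=0):
    """Improve a placement by random swaps with Metropolis acceptance."""
    rng = random.Random(seed)
    current = list(order)
    cost = wirelength(current, nets)
    best, best_cost = list(current), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            i, j = rng.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]
            new_cost = wirelength(current, nets)
            delta = new_cost - cost
            # Always accept improvements; accept a degrading swap with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(current), cost
            else:
                current[i], current[j] = current[j], current[i]  # undo swap
        t *= cooling
    return best, best_cost


nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
start = ["c", "a", "e", "b", "d"]
best, best_cost = anneal(start, nets)
print(wirelength(start, nets), "->", best_cost)
```

Because the best placement seen is tracked separately, the returned cost is never worse than the starting cost, even though individual accepted moves may degrade it.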
28Pros and Cons of SA
- Pros
- Can Reach Globally Optimal Solution (given enough time)
- Open Cost Function.
- Can Optimize Simultaneously all Aspects of Physical Design
- Can be Used for End Case Placement
- Cons
- Extremely Slow Process of Reaching a Good Solution
29Bi-Partitioning/Quadrisection
30Pros and Cons of Partitioning Based Placement
- Pros
- More Suitable to Timing Driven Placement since it is Move Based
- New Innovations (hMetis) in Partitioning Algorithms have made this Extremely Fast
- Open Cost Function
- Move Based means Simultaneous Optimization of all Design Aspects Possible
- Cons
- Not Well Understood
- Lots of indifferent moves
- May not work well with some cost functions.
31Hypergraphs in VLSI CAD
- Circuit netlist represented by hypergraph
32Hypergraph Partitioning in VLSI
- Variants
- directed/undirected hypergraphs
- weighted/unweighted vertices, edges
- constraints, objectives,
- Human-designed instances
- Benchmarks
- up to 4,000,000 vertices
- sparse (vertex degree 4, hyperedge size 4)
- small number of very large hyperedges
- Efficiency, flexibility: KL-FM style preferred
33Context: Top-Down VLSI Placement
34Context: Top-Down Placement
- Speed
- 6,000 cells/minute to final detailed placement
- partitioning used only in top-down global placement
- implied partitioning runtime: 1 second for 25,000 cells, < 30 seconds for 750,000 cells
- Structure
- tight balance constraint on total cell areas in partitions
- widely varying cell areas
- fixed terminals (pads, terminal propagation, etc.)
35Fiduccia-Mattheyses (FM) Approach
- Pass
- start with all vertices free to move (unlocked)
- label each possible move with the immediate change in cost that it causes (gain)
- iteratively select and execute a move with highest gain, lock the moving vertex (i.e., it cannot move again during the pass), and update affected gains
- best solution seen during the pass is adopted as starting solution for next pass
- FM
- start with some initial solution
- perform passes until a pass fails to improve solution quality
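The pass structure above can be sketched compactly. This is a simplified illustration rather than the linear-time FM implementation: gains are recomputed from scratch at every step instead of being updated incrementally in gain buckets, and the balance rule (each side keeps at least `min_side` vertices) is an assumption chosen for the example. The names `cut_size`, `fm_pass`, and `fm` are hypothetical.

```python
def cut_size(nets, side):
    """Number of nets with vertices on both sides of the bipartition."""
    return sum(1 for net in nets if len({side[v] for v in net}) == 2)


def fm_pass(nets, side, min_side=1):
    """One pass: move every vertex at most once, keep the best prefix."""
    side = dict(side)
    best_side, best_cut = dict(side), cut_size(nets, side)
    locked = set()
    while len(locked) < len(side):
        current_cut = cut_size(nets, side)
        best_move, best_gain = None, None
        for v in side:
            if v in locked:
                continue
            # Balance rule (an assumption): the side that v leaves must
            # still hold at least min_side vertices afterwards.
            if sum(1 for u in side if side[u] == side[v]) - 1 < min_side:
                continue
            side[v] ^= 1                     # tentatively move v
            gain = current_cut - cut_size(nets, side)
            side[v] ^= 1                     # undo the tentative move
            if best_gain is None or gain > best_gain:
                best_move, best_gain = v, gain
        if best_move is None:
            break                            # no legal move remains
        side[best_move] ^= 1                 # execute the move, then lock
        locked.add(best_move)
        if current_cut - best_gain < best_cut:
            best_side, best_cut = dict(side), current_cut - best_gain
    return best_side, best_cut


def fm(nets, side, min_side=1):
    """Repeat passes until a pass fails to improve the cut."""
    cut = cut_size(nets, side)
    while True:
        new_side, new_cut = fm_pass(nets, side, min_side)
        if new_cut >= cut:
            return side, cut
        side, cut = new_side, new_cut


# Two triangles joined by one net, starting from an interleaved partition.
nets = [[0, 1], [1, 2], [0, 2], [3, 4], [4, 5], [3, 5], [2, 3]]
start = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
part, cut = fm(nets, start, min_side=2)
print(cut_size(nets, start), "->", cut)  # 5 -> 1
```

Note that a pass keeps executing moves even when the best available gain is negative; only the best partition seen along the way is kept, which is what lets FM climb out of some local minima.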
36Cut During One Pass (Bipartitioning)
[Figure: cut vs. moves during one pass]
37Multilevel Partitioning
Refinement
Clustering
38Force Directed Placement
- Cells are dragged by forces.
- Forces are generated by nets connecting cells. Longer nets generate bigger forces.
- Placement is obtained by either a constructive or an iterative method.
[Figure: force Fij between cells i and j]
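One way to read the iterative method is sketched below. It assumes a spring-like force proportional to net length on two-pin nets, so the zero-force position of a cell is the mean position of the cells it connects to; fixed cells (pads) anchor the solution. The function name `force_directed` and its parameters are hypothetical, and cell overlap is ignored.

```python
def force_directed(positions, fixed, nets, iterations=100):
    """Iteratively move each free cell to its zero-force location.

    positions: dict cell -> (x, y); fixed: set of cells that never move;
    nets: list of two-pin nets (a, b).
    """
    pos = dict(positions)
    neighbors = {cell: [] for cell in pos}
    for a, b in nets:
        neighbors[a].append(b)
        neighbors[b].append(a)
    for _ in range(iterations):
        for cell in pos:
            if cell in fixed or not neighbors[cell]:
                continue
            xs = [pos[n][0] for n in neighbors[cell]]
            ys = [pos[n][1] for n in neighbors[cell]]
            # Zero net force under linear springs: centroid of neighbors.
            pos[cell] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return pos


start = {"p0": (0.0, 0.0), "p1": (3.0, 0.0), "a": (5.0, 5.0), "b": (-1.0, 2.0)}
nets = [("p0", "a"), ("a", "b"), ("b", "p1")]
result = force_directed(start, fixed={"p0", "p1"}, nets=nets)
print(result["a"], result["b"])  # approaches (1, 0) and (2, 0)
```

Without the fixed pads every cell would drift to a single point, which is the trivial solution the next slide lists among the cons.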
39Pros and Cons of Force Directed Placement
- Pros
- Very Fast Analytical Solution
- Can Handle Large Design Sizes
- Can be Used as an Initial Seed Placement Engine
- The Force
- Cons
- Not sensitive to the non-overlapping constraints
- Gives Trivial Solutions without Pads
- Not Suitable for Timing Driven Placement
40Hybrid Placement
- Mix-matching different placement algorithms
- Effective algorithms are always hybrid
41GORDIAN (quadratic partitioning)
Initial Placement
Partition and Replace
42Congestion Minimization
- Traditional placement problem is to minimize interconnection length (wirelength)
- A valid placement has to be routable
- Congestion is important because it represents routability (lower congestion implies better routability)
- There is not yet enough research work on the congestion minimization problem
43Definition of Congestion
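Routing demand = 3. Assume routing supply is 1; then overflow = 3 - 1 = 2 on this edge.
Overflow on each edge:
$\text{overflow}(e) = \begin{cases} \text{Routing Demand}(e) - \text{Routing Supply}(e) & \text{if Routing Demand}(e) > \text{Routing Supply}(e) \\ 0 & \text{otherwise} \end{cases}$
Total overflow:
$\text{Overflow} = \sum_{\text{all edges } e} \text{overflow}(e)$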
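The overflow definition translates directly into code. In this small sketch the edge names and the dictionary representation of demand and supply are assumptions made for illustration; `edge_overflow` and `total_overflow` are hypothetical names.

```python
def edge_overflow(demand, supply):
    """Overflow on one routing edge: excess demand, or 0 if none."""
    return max(0, demand - supply)


def total_overflow(demands, supplies):
    """Sum of per-edge overflow over all routing edges."""
    return sum(edge_overflow(demands[e], supplies[e]) for e in demands)


demands = {"e1": 3, "e2": 1, "e3": 0}
supplies = {"e1": 1, "e2": 1, "e3": 1}
print(edge_overflow(3, 1))                 # 2
print(total_overflow(demands, supplies))   # 2
```

Edges with spare capacity contribute nothing, so unused supply on one edge cannot offset overflow on another.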
44Correlation between Wirelength and Congestion
45Wirelength ≠ Congestion
A wirelength minimized placement
A congestion minimized placement
46Congestion Map of a Wirelength Minimized Placement
47Congestion MAP
48Congestion Reduction Postprocessing
Reduce congestion globally by minimizing the
traditional wirelength
Post process the wirelength optimized placement
using the congestion objective
49Congestion Reduction Postprocessing
- Among a variety of cost functions and methods for congestion minimization, wirelength alone followed by a post-processing congestion minimization works the best and is one of the fastest.
- Cost functions such as a hybrid length plus congestion do not work very well.
50Cost Functions for Placement
- The final goal of placement is to achieve routability and meet timing constraints
- Constraints are very hard to use in optimization, thus we use cost functions (e.g., wirelength) to predict our goals.
- We will show what happens when you try constraints directly
- The main challenge is a technical understanding of various cost functions and their interaction.
51Prediction
- What is prediction?
- every system has some critical cost functions: area, wirelength, congestion, timing, etc.
- Prediction aims at estimating values of these cost functions without having to go through the time-consuming process of full construction.
- Allows quick space exploration, localizes the search
- For example
- statistical wire-load models
- Wirelength in placement
52Paradigms of Prediction
- Two fundamental paradigms
- statistical prediction
- of two-terminal nets in all designs
- of two-terminal nets with length greater than 10 in all designs
- constructive prediction
- of two-terminal nets with length greater than 10 in this design
- and everything in between, e.g.,
- of critical two-terminal nets in a design based on statistical data and a quick inspection of the design in hand.
- Absolute truth or I need it to make progress
- SLIP (System Level Interconnect Prediction) community.
53Cost Functions for Placement
- Net-cut
- Linear wirelength
- Quadratic wirelength
- Congestion
- Timing
- Coupling
- Other performance related cost functions
- Undiscovered crossing
54Net-cut Cost for Global Placement
- The net-cut cost is defined as the number of external nets between different global bins
- Minimizing net-cut in global placement tends to put highly connected cells close to each other.
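As a small illustration of the net-cut cost, the sketch below counts nets whose cells are assigned to more than one global bin. The representation (nets as lists of cell names, bins as grid coordinates) and the name `net_cut` are assumptions made for the example.

```python
def net_cut(nets, bin_of):
    """Count external nets: nets whose cells span more than one bin."""
    return sum(1 for net in nets if len({bin_of[c] for c in net}) > 1)


bin_of = {"a": (0, 0), "b": (0, 0), "c": (1, 0), "d": (1, 1)}
nets = [["a", "b"], ["a", "c"], ["b", "c", "d"]]
print(net_cut(nets, bin_of))  # 2
```

The net between a and b lies inside one bin and costs nothing, which is why minimizing this cost pulls highly connected cells into the same bin.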
55Linear Wirelength Cost
The linear length of a net between cell 1 and cell 2 is $l_{12} = |x_1 - x_2| + |y_1 - y_2|$.
The linear wirelength cost is the summation of the linear length of all nets.
56Quadratic Wirelength Cost
The quadratic length of a net between cell 1 and cell 2 is $l_{12} = (x_1 - x_2)^2 + (y_1 - y_2)^2$.
The quadratic wirelength cost is the summation of the quadratic length of all nets.
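The linear and quadratic wirelength costs differ only in the per-net length function. The sketch below assumes two-pin nets and cell positions given as (x, y) tuples; the names `linear_length`, `quadratic_length`, and `wirelength_cost` are hypothetical.

```python
def linear_length(p, q):
    """Linear (Manhattan) length between two cell positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def quadratic_length(p, q):
    """Quadratic length between two cell positions."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def wirelength_cost(nets, pos, length=linear_length):
    """Sum the chosen length measure over all two-pin nets."""
    return sum(length(pos[a], pos[b]) for a, b in nets)


pos = {1: (0, 0), 2: (3, 4), 3: (3, 0)}
nets = [(1, 2), (2, 3)]
print(wirelength_cost(nets, pos))                    # 11
print(wirelength_cost(nets, pos, quadratic_length))  # 41
```

The quadratic cost penalizes long nets much more heavily than short ones, which is what makes it differentiable and convenient for the analytical methods discussed earlier.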
57Congestion Cost
Routing demand = 3. Assume routing supply is 1; then overflow = 3 - 1 = 2 on this edge.
Overflow on each edge = Routing Demand - Routing Supply if Routing Demand > Routing Supply, and 0 otherwise.
58Cost Functions for Placement
- Various cost functions (and a mix of them) have been used in practice to model/estimate routability and timing
- We have a good feel for what each cost function is capable of doing
- We need to understand the interaction among cost functions
59Congestion Minimization and Congestion vs. Wirelength
- Congestion is important because it closely represents routability (especially at lower levels of granularity)
- Congestion is not well understood
- Ad hoc techniques have been kind of working since congestion has never been severe
- It has been observed that length minimization tends to reduce congestion.
- Goal: Reduce congestion in placement (willing to sacrifice wirelength a little bit).
60Correlation between Wirelength and Congestion
Total Wirelength = Total Routing Demand
61Wirelength ≠ Congestion
A wirelength minimized placement
A congestion minimized placement
62Congestion Map of a Wirelength Minimized Placement
63Different Routing Models for modeling congestion
- Bounding box router: fast but inaccurate.
- Real router: accurate but slow.
- A bounding box router can be used in placement if it produces correlated routing results with the real router.
- Note: For different cost functions, the answer might be different (e.g., for coupling, only a detailed router can answer).
64Different Routing Models
An MST + shortest_path routing model
A bounding box routing model
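The slides do not define either model in detail. As a rough illustration of how the two can disagree, the sketch below compares a bounding box estimate (half-perimeter of the pins' bounding box) with the length of a minimum spanning tree over Manhattan distances, built with Prim's algorithm. Both functions, `bounding_box_length` and `mst_length`, are hypothetical stand-ins and model only wire length, not the per-edge routing demand a real congestion estimator would produce.

```python
def manhattan(p, q):
    """Manhattan distance between two pins."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def bounding_box_length(pins):
    """Half-perimeter of the smallest box enclosing all pins of a net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def mst_length(pins):
    """Length of a minimum spanning tree over the pins (Prim's algorithm)."""
    if len(pins) < 2:
        return 0
    # dist[i] = cheapest connection from pin i to the tree built so far.
    dist = {i: manhattan(pins[i], pins[0]) for i in range(1, len(pins))}
    total = 0
    while dist:
        nxt = min(dist, key=dist.get)
        total += dist.pop(nxt)
        for i in dist:
            dist[i] = min(dist[i], manhattan(pins[i], pins[nxt]))
    return total


corners = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(bounding_box_length(corners), mst_length(corners))  # 4 6
```

For two- and three-pin nets the two estimates often coincide; they diverge on nets with more pins, which is where the correlation with a real router matters most.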
65Objective Functions Used in Congestion Minimization
- WL: Standard total wirelength objective.
- Ovrflw: Total overflow in a placement (a direct congestion cost).
- Hybrid: $(1 - \alpha)\,\mathrm{WL} + \alpha\,\mathrm{Ovrflw}$
- QL: A quadratic plus linear objective.
- LQ: A linear plus quadratic objective.
- LkAhd: A modified overflow cost.
- $(1 - \alpha_T)\,\mathrm{WL} + \alpha_T\,\mathrm{Ovrflw}$: A time-changing hybrid objective which lets the cost function gradually change from wirelength to overflow as optimization proceeds.
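The hybrid and time-changing hybrid objectives can be sketched as below. The slides do not say how the weight changes over time, so the linear ramp in `alpha_schedule` is purely an assumption for illustration; both function names are hypothetical.

```python
def hybrid_cost(wl, ovrflw, alpha):
    """Hybrid objective (1 - alpha) * WL + alpha * Ovrflw."""
    return (1.0 - alpha) * wl + alpha * ovrflw


def alpha_schedule(step, total_steps):
    """Assumed schedule: alpha ramps linearly from 0 to 1 over the run."""
    if total_steps <= 1:
        return 1.0
    return min(1.0, max(0.0, step / (total_steps - 1)))


# Early steps weight wirelength; late steps weight overflow.
for step in (0, 5, 10):
    a = alpha_schedule(step, 11)
    print(step, a, hybrid_cost(100.0, 10.0, a))
```

With alpha fixed this is the plain Hybrid objective; with the schedule it starts as pure wirelength and ends as pure overflow.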
66Post Processing to Reduce Congestion
Reduce congestion globally by minimizing the
traditional wirelength
Post process the wirelength optimized placement
using the congestion objective
67Post Processing Heuristics
- Greedy cell-centric algorithm: Greedily move cells around and greedily accept moves.
- Flow-based cell-centric algorithm: Use a flow-based approach to move cells.
- Net-centric algorithm: Move nets with bigger contributions to the congestion first.
68Greedy Cell-centric Heuristic
69Flow-based Cell-centric Heuristic
Bin Nodes
Cell Nodes
70Net-centric Heuristic
71From Global Placement to Detailed Placement
Global Placement: Assuming all the cells are placed at the centers of global bins.
Detailed Placement: Cells are placed without overlapping.
72Correlation Between Global and Detailed Placement
Conclusion: Congestion at the detailed placement level is correlated with congestion at the global placement level. Thus reducing congestion in global placement helps reduce congestion in the final detailed placement.
- WLg: Wirelength optimized global placement.
- CONg: Congestion optimized global placement.
- WLd: Wirelength optimized detailed placement.
- CONd: Congestion optimized detailed placement.
73Congestion
- Wirelength minimization can minimize congestion globally. A post-processing congestion minimization following wirelength minimization works the best to reduce congestion in placement.
- A number of congestion-related cost functions were tested, including a hybrid length plus congestion (commonly believed to be very effective). Experiments show that they do not work very well.
- Net-centric post-processing techniques are very effective at minimizing congestion.
- Congestion at the global placement level correlates well with congestion of detailed placement.
74Shapes of Cost Functions
[Figure: net-cut cost, wirelength, and congestion plotted over the solution space]
75Relationships Between the Three Cost Functions
- The net-cut objective function is smoother than the wirelength objective function
- The wirelength objective function is smoother than the congestion objective function
- Local minima of these three objectives are in the same neighborhood.
76Crossing: A routability estimator?
- Replace each crossing with a gate
- A planar netlist
- Easy to place
77Timing Cost
Critical Path
- Delay of the circuit is defined as the longest delay among all possible paths from primary inputs to primary outputs.
- Interconnection delay becomes more and more important in the deep sub-micron regime.
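The longest-path definition of circuit delay can be computed in one sweep over a topologically ordered netlist. The sketch below assumes an acyclic gate graph, non-negative delays, and a simple model in which each gate has one delay value and each connection one interconnect delay; `circuit_delay` and the data layout are hypothetical.

```python
from collections import deque


def circuit_delay(gate_delay, edges):
    """Longest path delay through a DAG of gates.

    gate_delay: dict gate -> delay of that gate.
    edges: list of (src, dst, wire_delay) connections.
    """
    succ = {g: [] for g in gate_delay}
    indeg = {g: 0 for g in gate_delay}
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
    # Arrival time at each gate output; primary inputs start at their own delay.
    arrival = {g: gate_delay[g] for g in gate_delay}
    ready = deque(g for g in gate_delay if indeg[g] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v, w in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + w + gate_delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(gate_delay):
        raise ValueError("netlist graph has a cycle")
    return max(arrival.values())


gates = {"in1": 0, "in2": 0, "g1": 2, "g2": 3, "out": 1}
wires = [("in1", "g1", 1), ("in2", "g1", 2), ("g1", "g2", 1),
         ("in2", "g2", 3), ("g2", "out", 1)]
print(circuit_delay(gates, wires))  # 10
```

In this example the critical path is in2, g1, g2, out; raising any wire delay on that path raises the circuit delay by the same amount, which is why placement moves on critical nets matter.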
78Timing Analysis
How do we get the delay numbers on the
gate/interconnect?
79Approaches
- Budgeting
- Inaccurate information
- Fast
- Path Analysis
- Most accurate information
- Very slow
- Path analysis with infrequent path substitution
- Somewhere in between
80Timing Metrics
- How do we assess the change in a delay due to a potential move during physical design?
- Whether it is channel routing or area routing, the problem is the same
- translate geometrical change into delay change
81Other Costs: Coupling Cost
- Hard to model during placement
- Can run a global router in the middle of placement
- Even at the global routing level it is hard to model it
Avoid it
82Coupling Solutions
- Once we have some metrics for coupling, we can
calculate sensitivities, and optimize the
physical design...
83Other Performance Costs
- Power usage of the chip.
- Weighted nets
- Dual voltages (severe constraint on placement)
- Very little is known about these cost functions and their interaction with other cost functions
- Fundamental research is needed to shed some light on their structure
84Netlist Granularity: Problem Size and Solution Space Size
- The most challenging part of the placement problem is to solve a huge system within a given amount of time
- We need to effectively reduce the size of the solution space and/or reduce the problem size
- Netlist clustering: Edge extraction in the netlist
85Layout Coarsening
- Reduce Solution Space
- Edge extraction in the solution space
- Only simple things have been tried
- GP, DP (Twolf)
- 2x1, 2x2, ...
- Coarsen only easy parts
86Incremental Placement
- Given an optimal placement for a given netlist, how do we construct optimal placements for netlists modified from the given netlist?
- Very little research in this area.
- Different types of incremental changes (in one region, or all over)
- Methods to use
- How global should the method be
- An extremely important problem.
87Incremental Placement
- A placement move changes the interconnect capacitance and resistance of the associated net
- A net topology approximation is required to estimate these changes
88Placynthesis Algorithms
buffering
resizing
restructuring
89Many other Design Metrics: Power Supply and Total Power
Source: The Incredible Shrinking Transistor, Yuan Taur, T. J. Watson Research Center, IBM, IEEE Spectrum, July 1999
90Dual Voltages: A harder problem
- Layout synthesis with dual voltages: major geometric constraints
[Figure: cell library with dual power rails (VH, VL, GND, IN, OUT, feedthrough) and the resulting layout structure. H: High Voltage Block, L: Low Voltage Block]
91Placement References
- C. J. Alpert, T. Chan, D. J.-H. Huang, I. Markov, and K. Yan, Quadratic Placement Revisited, Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 752-757
- C. J. Alpert, J.-H. Huang, and A. B. Kahng, Multilevel Circuit Partitioning, Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 530-533
- U. Brenner and A. Rohe, An Effective Congestion Driven Placement Framework, International Symposium on Physical Design, 2002, pp. 6-11
- A. E. Caldwell, A. B. Kahng, and I. L. Markov, Can Recursive Bisection Alone Produce Routable Placements, Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 477-482
- M. A. Breuer, Min-Cut Placement, J. Design Automation and Fault Tolerant Computing, 1(4), 1977, pp. 343-362
- J. Vygen, Algorithms for Large-Scale Flat Placement, Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 746-751
- H. Eisenmann and F. M. Johannes, Generic Global Placement and Floorplanning, Proc. 35th IEEE/ACM Design Automation Conference, 1998, pp. 269-274
- S.-L. Ou and M. Pedram, Timing Driven Placement Based on Partitioning with Dynamic Cut-Net Control, Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 472-476
- C. M. Fiduccia and R. M. Mattheyses, A Linear Time Heuristic for Improving Network Partitions, Proc. ACM/IEEE Design Automation Conference, 1982, pp. 175-181