Title: Partitioning and Clustering
1Partitioning and Clustering
- Professor Lei He
- lhe_at_ee.ucla.edu
- http://eda.ee.ucla.edu/
2Outline
- Circuit Partitioning formulation
- Importance of Circuit Partitioning
- Partitioning Algorithms
- Circuit Clustering Formulation
- Clustering Algorithms
3Partitioning Formulation
4A Bi-Partitioning Example
- Min-cut size = 13
- Min-bisection size = 300
- Min-ratio-cut size = 19
- Ratio-cut helps to identify natural clusters
5Circuit Partitioning Formulation (Cont'd)
6Importance of Circuit Partitioning
- Divide-and-conquer methodology
- The most effective way to solve problems of high complexity
- E.g. min-cut based placement, partitioning-based test generation, ...
- System-level partitioning for multi-chip designs or 3D
- inter-chip interconnection delay dominates system performance
- inter-layer wire pitch is much larger
- Circuit emulation/parallel simulation
- partition large circuit into multiple FPGAs (e.g. Quickturn), or multiple special-purpose processors (e.g. Zycad).
- Parallel CAD development
- Task decomposition and load balancing
7Partitioning Algorithms
- Iterative partitioning algorithms
- Multi-way partitioning
- Multi-level partitioning (to be discussed after
clustering)
8Iterative Partitioning Algorithms
- Greedy Iterative improvement method
- Kernighan-Lin 1970
- Fiduccia-Mattheyses 1982
- Krishnamurthy 1984
- Simulated Annealing
- Kirkpatrick-Gelatt-Vecchi 1983
- Greene-Supowit 1984
- (SA will be formally introduced in the Floorplan
chapter)
9Kernighan-Lin's Algorithm
- Pair-wise exchange of nodes to reduce cut size
- Allow cut size to increase temporarily within a pass
- Compute the gain of a swap
- Repeat
- Perform a feasible swap of max gain
- Mark swapped nodes "locked"
- Update swap gains
- Until no feasible swap
- Find max prefix partial sum in gain sequence $g_1, g_2, \ldots, g_m$
- Make corresponding swaps permanent.
- Start another pass if current pass reduces the cut size
- (usually converges after a few passes)
[Figure: candidate swap of nodes u and v across the cut; swapped nodes become locked]
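The pass described on this slide can be sketched in Python as follows. This is a minimal illustration, not the original 1970 implementation: it assumes an unweighted graph stored as a dict of neighbor sets, sides labeled 0 and 1, and a brute-force search for the best swap. The names `cut_size`, `d_value` and `kl_pass` are hypothetical.

```python
def cut_size(adj, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])


def d_value(adj, side, v):
    """External minus internal edge count of node v."""
    external = sum(1 for n in adj[v] if side[n] != side[v])
    return external - (len(adj[v]) - external)


def kl_pass(adj, side):
    """One Kernighan-Lin pass; returns (new partition, total gain kept)."""
    work = dict(side)          # tentative partition, updated swap by swap
    locked = set()
    gains, swaps = [], []
    while True:
        best = None
        for a in work:
            if work[a] != 0 or a in locked:
                continue
            for b in work:
                if work[b] != 1 or b in locked:
                    continue
                # Gain of swapping a and b: D(a) + D(b) - 2 * c(a, b)
                g = (d_value(adj, work, a) + d_value(adj, work, b)
                     - 2 * (b in adj[a]))
                if best is None or g > best[0]:
                    best = (g, a, b)
        if best is None:       # no feasible (unlocked) swap is left
            break
        g, a, b = best
        work[a], work[b] = 1, 0            # tentative swap, may have g < 0
        locked.update((a, b))
        gains.append(g)
        swaps.append((a, b))
    # Keep only the prefix of swaps with the maximum partial gain sum.
    best_k, best_sum, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_sum:
            best_k, best_sum = k, running
    result = dict(side)
    for a, b in swaps[:best_k]:
        result[a], result[b] = 1, 0
    return result, best_sum
```

A caller would repeat `kl_pass` while the returned gain is positive. The brute-force pair search keeps the sketch short; it is slower than the sorted-gain bookkeeping used in practical implementations.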
10Fiduccia-Mattheyses Improvement
11Simulated Annealing
[Figure: local search, cost function plotted over the solution space]
12Statistical Mechanics vs. Combinatorial Optimization
13Analogy
14Generic Simulated Annealing Algorithm
15Basic Ingredients for S.A.
- Solution space
- Neighborhood Structure
- Cost Function
- Annealing Schedule
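The four ingredients listed above map directly onto the parameters of a generic annealing loop. The sketch below is one common formulation (Metropolis acceptance with probability exp(-delta/T)), not necessarily the exact variant used in the lecture; the function name and its parameters are assumptions for illustration.

```python
import math
import random


def simulated_annealing(initial, neighbor, cost, temperatures,
                        moves_per_temp, rng=None):
    """Generic SA loop.

    initial        -- starting point in the solution space
    neighbor       -- function(solution, rng) -> neighboring solution
    cost           -- function(solution) -> number to minimize
    temperatures   -- iterable of temperatures (the annealing schedule)
    moves_per_temp -- number of moves attempted at each temperature
    """
    rng = rng or random.Random(0)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for t in temperatures:
        for _ in range(moves_per_temp):
            candidate = neighbor(current, rng)
            delta = cost(candidate) - current_cost
            # Always accept improvements; accept uphill moves with
            # probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
    return best, best_cost


# Toy usage: minimize (x - 3)^2 over the integers with +/-1 moves.
schedule = [10.0 * 0.9 ** n for n in range(60)]
best, best_cost = simulated_annealing(
    20,
    lambda x, rng: x + rng.choice((-1, 1)),
    lambda x: (x - 3) ** 2,
    schedule,
    moves_per_temp=50,
)
```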
16SA Partitioning
- "Optimization by Simulated Annealing", Kirkpatrick, Gelatt, Vecchi.
- Solution space = set of all partitions
[Figure: example solutions, each a two-way partition of cells a-f, e.g. abc | def]
- A move: randomly move one cell to the other side
17SA Partitioning
- Cost function
- $f = C + \lambda B$
- C is the partitioning cost as used before
- B is a measure of how balanced the partitioning is
- $\lambda$ is a constant.
- Example of B: $B = (|S_1| - |S_2|)^2$, where $S_1$ and $S_2$ are the two sides of the partition
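As a small illustration of this cost function, the sketch below evaluates $f = C + \lambda B$ for a bipartition and generates the single-cell move from the previous slide. It assumes that C is the number of cut edges and that all cells have unit size; the function names and the edge-list representation are hypothetical.

```python
import random


def cut_size(edges, side):
    """C: number of edges with endpoints on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def balance_penalty(side):
    """B = (|S1| - |S2|)^2, assuming unit cell sizes."""
    s1 = sum(1 for s in side.values() if s == 0)
    s2 = len(side) - s1
    return (s1 - s2) ** 2


def cost(edges, side, lam):
    """f = C + lambda * B."""
    return cut_size(edges, side) + lam * balance_penalty(side)


def random_move(side, rng):
    """Return a copy of `side` with one random cell moved across."""
    cell = rng.choice(sorted(side))
    moved = dict(side)
    moved[cell] = 1 - side[cell]
    return moved


# Chain a-b-c-d-e-f split as abc | def, then as ab | cdef.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
balanced = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
skewed = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1}
```

Both example partitions cut a single edge, so only the balance term separates them; a larger $\lambda$ penalizes the skewed one more heavily.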
18SA Partitioning
- Annealing schedule
- $T_n = (T_1/T_0)^n T_0$, with ratio $T_1/T_0 = 0.9$
- At each temperature, either
- 1. There are 10 accepted moves on the average
- or
- 2. # of attempts ≥ 100 × total # of cells
- The system is frozen if there are very few acceptances at 3 consecutive temperatures.
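The schedule above can be written as three small predicates. This sketch makes two assumptions that the slide does not state: "10 accepted moves on the average" is read as 10 accepted moves per cell, and "very low acceptances" is modeled by a hypothetical `threshold` parameter.

```python
def temperature(n, t0, ratio=0.9):
    """T_n = (T_1 / T_0)^n * T_0 with T_1 / T_0 = ratio."""
    return (ratio ** n) * t0


def level_done(accepted, attempted, num_cells):
    """Leave the current temperature when either criterion is met.

    Assumption: the '10 accepted moves' rule is counted per cell.
    """
    return accepted >= 10 * num_cells or attempted >= 100 * num_cells


def frozen(accepted_per_level, threshold=0, window=3):
    """True if the last `window` temperatures each had at most
    `threshold` accepted moves (threshold is a hypothetical knob)."""
    recent = accepted_per_level[-window:]
    return len(recent) == window and all(a <= threshold for a in recent)
```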
19Graph Partition Using Simulated Annealing
Without Rejections
- Greene and Supowit, ICCD-88 pp. 658-663
- Motivation
- At low temperature, most moves are rejected!
- e.g. 1/100 acceptance rate for 1,000 vertices
20Graph Partition Using Simulated Annealing Without Rejections (Cont'd)
- Key Idea
- (I) Biased selection
- If a move i has probability $w_i$ to be accepted, generate move i with probability $w_i / \sum_{j=1}^{N} w_j$
- N = size of neighborhood
- In general, [expression missing in source]
- In conventional model, each move has probability 1/N to be generated.
- (II) If a move is generated, it is always accepted
21Graph Partition Using Simulated Annealing Without Rejections (Cont'd)
22Solution to the Weight Selection Problem (general solution to the several problems)
23Solution to the Weight Selection Problem (Cont'd)
Let $W = w_1 + w_2 + \cdots + w_n$. How to select i with probability $w_i / W$?
Equivalent to choosing x such that $w_1 + \cdots + w_{i-1} < x \le w_1 + \cdots + w_i$.
    v ← root
    x ← random(0, 1) · w(v)
    while v is not a leaf do
        if x < w(left(v)) then v ← left(v)
        else x ← x - w(left(v)), v ← right(v)
    end
Probability of ending up at leaf i is $w_i / W$.
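The tree walk above can be sketched with an array-based binary tree in which every internal node stores the sum of the weights below it. The class name `WeightTree`, the padding to a power of two, and the `update` method are assumptions added for illustration; the `select` method follows the slide's loop step by step.

```python
import random


class WeightTree:
    """Complete binary tree over n leaf weights; node v stores w(v)."""

    def __init__(self, weights):
        self.n = len(weights)
        self.m = 1
        while self.m < self.n:          # pad leaf count to a power of two
            self.m *= 2
        self.w = [0.0] * (2 * self.m)   # root is index 1, leaves start at m
        self.w[self.m:self.m + self.n] = weights
        for v in range(self.m - 1, 0, -1):
            self.w[v] = self.w[2 * v] + self.w[2 * v + 1]

    def total(self):
        return self.w[1]

    def update(self, i, weight):
        """Change leaf i and refresh the sums on its path to the root."""
        v = self.m + i
        self.w[v] = weight
        v //= 2
        while v >= 1:
            self.w[v] = self.w[2 * v] + self.w[2 * v + 1]
            v //= 2

    def select(self, x):
        """Walk down from the root for a value 0 <= x < total()."""
        v = 1
        while v < self.m:               # v is not a leaf
            left = 2 * v
            if x < self.w[left]:
                v = left
            else:
                x -= self.w[left]
                v = left + 1
        return v - self.m

    def sample(self, rng=random):
        """Pick leaf i with probability w_i / W."""
        return self.select(rng.random() * self.total())
```

Both `select` and `update` touch one node per tree level, so each costs O(log n). This matters here because accepting a move changes the weights of neighboring moves.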
24Application to Partitioning: Special Solution to the First Problem
25Application to Partitioning: Special Solution to the First Problem (Cont'd)
Solution: two-step biased selection
- (i) choose A or B based on [expression missing in source]
- (ii) choose move i within A or B based on [expression missing in source]
- Note: the [symbol missing in source] values are the same for each move in A or B, so we keep one copy for A and one copy for B
- choose the moves within A or B using the tree algorithm
[Equation residue: two expressions involving $\Delta I_i$, $\Delta C_i$, $T$ and $a$; not recoverable]
26More Partitioning Techniques
- Spectral based partitioning algorithms
- Hagen-Kahng 1991; Cong-Hagen-Kahng 1992
- Module replication in circuit partitioning
- Kring-Newton 1991; Hwang-ElGamal 1992; Liu et al., TCAD95; Enos et al., TCAD99
- Generating uni-directional partitioning (Iman-Pedram-Fabian-Cong 1993) or acyclic partitioning (Cong-Li-Bagrodia, DAC94; Cong-Lim, ASPDAC2000)
- Logic restructuring during partitioning
- Iman-Pedram-Fabian-Cong 1993
- Communication based partitioning
- Hwang-Owens-Irwin 1990; Beardslee-Lin-Sangiovanni 1992
27Multi-Way Partitioning
- Recursive bi-partitioning: Kernighan-Lin 1970
- Generalization of Fiduccia-Mattheyses' and Krishnamurthy's algorithms: Sanchis 1989; Cong-Lim, ICCAD98
- Generalization of ratio-cut and spectral method to multi-way partitioning: Chan-Schlag-Zien 1993
- generalized ratio-cut value = sum of flux of each partition
- generalized ratio-cut cost of a k-way partition ≥ sum of the k smallest eigenvalues of the Laplacian matrix
28Circuit Clustering Formulation
- Motivation
- Reduce the size of flat netlists
- Identify natural circuit hierarchy
- Objectives
- Maximize the connectivity of each cluster
- Minimize the size, delay (or simply depth),
density of clustered circuits
29Lawler's Labeling Algorithm (Lawler-Levitt-Turner 1969)
- Assumption: cluster size ≤ K; intra-cluster delay = 0; inter-cluster delay = 1
- Objective: find a clustering of minimum delay
- Algorithm
- Phase 1: Label all nodes in topological order
- For each PI node v, L(v) = 0
- For each non-PI node v
- p = maximum label of predecessors of v
- $X_p$ = set of predecessors of v with label p
- if $|X_p| < K$ then L(v) = p else L(v) = p + 1
- Phase 2: Form clusters
- Start from PO to generate necessary clusters
- Nodes with the same label form a cluster
[Figure: node v with its predecessor set $X_p$ (label p) and other predecessors labeled p-1]
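Phase 1 of the labeling can be sketched as below. The sketch assumes that "predecessors of v" means all transitive fanin nodes of v and that the circuit is given as a dict mapping each non-PI node to its direct fanins; the function name `lawler_labels` is hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). Phase 2 (cluster formation) is not shown.

```python
from graphlib import TopologicalSorter


def lawler_labels(fanin, K):
    """Label nodes in topological order; fanin[v] = direct fanins of v."""
    order = list(TopologicalSorter(fanin).static_order())
    ancestors, label = {}, {}
    for v in order:
        preds = fanin.get(v, ())
        # Transitive predecessors of v (assumed meaning of "predecessors").
        ancestors[v] = set(preds).union(*(ancestors[u] for u in preds))
        if not preds:                       # primary input
            label[v] = 0
            continue
        p = max(label[u] for u in ancestors[v])
        x_p = [u for u in ancestors[v] if label[u] == p]
        # v joins the label-p cluster only if it still fits within size K.
        label[v] = p if len(x_p) < K else p + 1
    return label


# Chain a -> b -> c -> d with cluster size limit K = 2.
labels = lawler_labels({"b": ["a"], "c": ["b"], "d": ["c"]}, K=2)
```

On the chain example the labels come out as a: 0, b: 0, c: 1, d: 1, which matches two clusters of size 2 and one inter-cluster hop.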
30Lawler's Labeling Algorithm (Cont'd)
- Performance of the algorithm
- Efficient run-time
- Minimum delay clustering solution
- Allow node duplication
- No attempt to minimize the number of clusters
- Extension to allow arbitrary gate delays
- Heuristic solution: Murgai-Brayton-Sangiovanni 1991
- Optimal solution: Rajaraman-Wong 1993
31Maximum Fanout Free Cone (MFFC)
- Definition: for a node v in a combinational circuit,
- cone of v ($C_v$): v and all of its predecessors such that any path connecting a node in $C_v$ and v lies entirely in $C_v$
- fanout free cone at v ($FFC_v$): cone of v such that for any node $u \ne v$ in $FFC_v$, $output(u) \subseteq FFC_v$
- maximum FFC at v ($MFFC_v$): FFC of v such that for any non-PI node w, if $output(w) \subseteq MFFC_v$ then $w \in MFFC_v$
32Properties of MFFCs
- If $w \in MFFC_v$, then $MFFC_w \subseteq MFFC_v$
- Two MFFCs are either disjoint or one contains the other (CoDi93)
33Maximum Fanout Free Subgraph (MFFS)
- Definition: for a node v in a sequential circuit, $MFFS_v$ = {u | every path from u to a PO passes through v}
- Illustration
[Figure: MFFCs and MFFS illustration]
34MFFS Construction Algorithm
- For Single MFFS at Node v
- select root node v and cut all its fanout edges
- mark all nodes reachable backwards from all POs
- $MFFS_v$ = unmarked nodes
- complexity: O(N + E)
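The construction steps above can be sketched as a single backward search. The sketch assumes the netlist is a dict `fanout` that has every node as a key, and that when v is itself a PO the backward search starts only from the other POs (otherwise v would mark its own cone). The function name `mffs` is hypothetical.

```python
def mffs(fanout, primary_outputs, v):
    """Return MFFS_v as a set of nodes.

    fanout[u] lists the nodes driven by u; every node must be a key.
    """
    # Build fanin lists while skipping (cutting) all fanout edges of v.
    fanin = {u: [] for u in fanout}
    for u, sinks in fanout.items():
        if u != v:
            for w in sinks:
                fanin[w].append(u)
    # Mark everything reachable backwards from the POs other than v.
    marked = set()
    stack = [po for po in primary_outputs if po != v]
    while stack:
        u = stack.pop()
        if u not in marked:
            marked.add(u)
            stack.extend(fanin[u])
    return set(fanout) - marked


# Example netlist: a -> c, b -> c, b -> d, c -> e (PO), d -> f (PO).
netlist = {"a": ["c"], "b": ["c", "d"], "c": ["e"], "d": ["f"],
           "e": [], "f": []}
```

In the example, b is excluded from the MFFS at e because it also reaches the output f through d. Each node and edge is visited at most once, in line with the O(N + E) bound. Nodes that reach no PO at all stay unmarked and therefore end up in the result.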
38MFFS Clustering Algorithm
- Clusters Entire Netlist
- construct MFFS at a PO and remove it from netlist
- include its inputs as new POs
- repeat until all nodes are clustered
- complexity: O(N (N + E))
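The clustering loop above can be sketched by repeatedly peeling one MFFS off the netlist. The sketch assumes the netlist is a dict `fanout` with every node as a key and processes the POs in first-in, first-out order, which the slide does not specify. The helper `_mffs` (cut the root's fanout edges, then mark backwards from the other POs) and the function name `mffs_clustering` are hypothetical.

```python
def _mffs(fanout, outputs, v):
    """MFFS at v: nodes not reachable backwards from the other POs."""
    fanin = {u: [] for u in fanout}
    for u, sinks in fanout.items():
        if u != v:                      # fanout edges of v are cut
            for w in sinks:
                fanin[w].append(u)
    marked, stack = set(), [po for po in outputs if po != v]
    while stack:
        u = stack.pop()
        if u not in marked:
            marked.add(u)
            stack.extend(fanin[u])
    return set(fanout) - marked


def mffs_clustering(fanout, primary_outputs):
    """Partition the whole netlist into MFFS clusters."""
    fanout = {u: list(sinks) for u, sinks in fanout.items()}
    outputs = list(primary_outputs)
    clusters = []
    while outputs:
        v = outputs.pop(0)
        if v not in fanout:             # guard against duplicate PO entries
            continue
        cluster = _mffs(fanout, outputs, v)
        clusters.append(cluster)
        # Nodes outside the cluster that drive it become new POs.
        new_pos = [u for u in fanout if u not in cluster
                   and any(w in cluster for w in fanout[u])]
        # Remove the cluster from the netlist.
        fanout = {u: [w for w in sinks if w not in cluster]
                  for u, sinks in fanout.items() if u not in cluster}
        for u in new_pos:
            if u not in outputs:
                outputs.append(u)
    return clusters


# Example netlist: a -> c, b -> c, b -> d, c -> e (PO), d -> f (PO).
netlist = {"a": ["c"], "b": ["c", "d"], "c": ["e"], "d": ["f"],
           "e": [], "f": []}
clusters = mffs_clustering(netlist, ["e", "f"])
```

Each round runs one O(N + E) construction and removes at least the root node, so at most N rounds are needed, consistent with the O(N (N + E)) bound.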
42Summary
- Partitioning is key for applying divide-and-conquer methodology (for complexity management)
- Partitioning also defines global/local interconnects and greatly impacts circuit performance
- Growing importance of interconnect design has introduced many new partitioning formulations
- Clustering is effective in reducing circuit size and identifying natural circuit hierarchy
- Multi-level circuit clustering + iterative improvement based methods produce the best partitioning results