Valuation and Values in Application-Driven Algorithmics: Case Studies from VLSI CAD Andrew B. Kahng, UCLA Computer Science Dept. June 2, 2000 abk@cs.ucla.edu, http://vlsicad.cs.ucla.edu

1
Valuation and Values in Application-Driven
Algorithmics: Case Studies from VLSI CAD
Andrew B. Kahng, UCLA Computer Science Dept.
June 2, 2000
abk@cs.ucla.edu, http://vlsicad.cs.ucla.edu
2
My Research

Applied algorithmics
demonstrably useful solutions for real problems
best known solutions
classic (well-studied) Steiner, partition,
placement, TSP,...
toolkits discrete algorithms, global
optimization, mathematical programming,
approximation frameworks, new-age metaheuristics,
engineering
Ground truths
anatomies
limits

3
Anatomies

Technologies
semiconductor process roadmap, design-manufacturing I/F
design technology: methodology, flows, design
process
interconnect modeling/analysis: delay/noise
est., compact models
Problems
structural theory of large-scale global
optimizations
Heuristics
hypergraph partitioning and clustering
wirelength- and timing-driven placement
single/multiple topology synthesis (length,
delay, skew, buffering,...)
TSP, ..., IP protection, ..., combinatorial
exchange/auction, ...
Cultures
contexts and infrastructure for research and
technology transfer

4
Bounds

Exact methods
Provable approximations
Technology extrapolation
achievable envelope of system implementation
w.r.t. cost, speed, power, reliability, ...
ideally, should drive and be driven by system
architectures, design and implementation
methodologies

5
Today's Talk

Demonstrably useful solutions for real problems
Valuation: What problems require attention?
technology extrapolation
automatic layout of phase-shifting masks
Values: How do we advance the leading edge?
anatomy of FM-based hypergraph partitioning
heuristics
culture change: restoring time-to-market and QOR
in applied algorithmics via IP reuse

7
Technology Extrapolation
What is the most power-efficient noise management
strategy?

Evaluates impact of
design technology
process technology
Evaluates impact on
achievable design
associated design problems
What matters, when ?
Sets new requirements for CAD tools and
methodologies, capital and R&D investment, ...
right tech at the right time
Roadmaps (SIA ITRS) familiar and influential
example

How and when do L, SOI, SER, etc. matter?
Will layout tools need to perform process
simulation to effectively address cross-die and
cross-wafer manufacturing variation?
8
GTX GSRC Technology Extrapolation System

GTX is a framework for technology extrapolation

9
Graphical User Interface (GUI)

Provides user interaction
Visualization (plotting, printing, saving to
file)
4 views
Parameters
Rules
Rule chain
Values in chain

10
GTX Open, Living Roadmap

Openness in grammar, parameters and rules
easy sharing of data, models in research
environment
contributions of best known models from anywhere
Allows development of proprietary models
separation between supplied (shared) and
user-defined parameters / rules
usability behind firewalls
functionality for sharing results instead of data
Multi-platform (SUN Solaris, Windows, Linux)
http://vlsicad.cs.ucla.edu/GSRC/GTX/

11
GTX Activity

Models implemented
Cycle-time models of SUSPENS (with extension by
Takahashi), BACPAC (Sylvester, Berkeley), Fisher
(ITRS)
Currently adding
GENESYS (with help from Georgia Tech)
RIPE (with help from RPI)
New device and power modules (Synopsys /
Berkeley)
New SOI device model (Synopsys / Berkeley)
Inductance extraction (Silicon Graphics /
Berkeley / Synopsys)
Studies performed in GTX
Modeling and parameter sensitivity analyses
Design optimization studies global
interconnects, layer stack
Routability estimation, via impact models, ...

12
Today's Talk

Demonstrably useful solutions for real problems
Valuation: What problems require attention?
technology extrapolation
automatic layout of phase-shifting masks
Values: How do we advance the leading edge?
anatomy of FM-based hypergraph partitioning
heuristics
culture change: restoring time-to-market and QOR
in applied algorithmics via IP reuse

13
Subwavelength Optical Lithography
Subwavelength Gap since 0.35 µm

EUV, X-rays, E-beams all > 10 years out
huge investment in > 30 years of optical litho
infrastructure

14
Mask Types

Bright Field
opaque features
transparent background

Dark Field
transparent features
opaque background

15
Phase Shifting Masks
16
Impact of PSM

PSM enables smaller transistor gate lengths Leff
critical polysilicon features only (gate Leff)
faster device switching faster circuits
better critical dimension (CD) control
improved parametric yield
all features on polysilicon layer, local
interconnect layers
smaller die area more /wafer (full-chip
PSM BIG win)
Alternative: build a $10B fab with equipment
that won't exist for 5 years
Data points
exponential increase in price of CAD technology
for PSM
Numerical Technologies market cap 3x that of
Avant!
25 nm gates (!!!) manufactured with 248nm DUV
steppers (NTI MIT Lincoln Labs, announced 2
days ago)
90nm gates in production at Motorola,
Lucent (since late 1999)

17
Double-Exposure Bright-Field PSM
[Figure: shifter regions labeled with phases 0 and 180]
18
The Phase Assignment Problem

Assign 0, 180 phase regions such that critical
features with width (separation) < B are induced
by adjacent phase regions with opposite phases
Bright Field
(Dark Field)

[Figure: adjacent phase regions labeled 180, 0, 180, 0]
19
Key: Global 2-Colorability

If there is an odd cycle of phase implications,
the layout cannot be manufactured
layout verification becomes a global, not local,
issue

[Figure: phase regions labeled 0 and 180, with one region marked "?"]
20
Critical features F1, F2, F3, F4
[Figure: layout with critical features F1-F4]
21
[Figure: critical features F1-F4]
Opposite-Phase Shifters (0,180)
22
[Figure: critical features F1-F4 with shifters S1-S8]
Shifters S1-S8

PROPER Phase Assignment
Opposite phases for opposite shifters
Same phase for overlapping shifters

23
[Figure: critical features F1-F4 with shifters S1-S8]
Phase Conflict
Proper Phase Assignment is IMPOSSIBLE
24
Phase Conflict Resolution
[Figure: critical features F1-F4 with shifters S1-S8]
Phase Conflict
feature shifting to remove overlap
25
Phase Conflict Resolution
[Figure: critical features F1-F4 with shifters S1-S4, S7, S8]
Phase Conflict
feature widening to turn conflict into
non-conflict
26
How will VLSI CAD deal with PSM ?

UCLA first comprehensive methodology for
PSM-aware layout design
currently being integrated by Cadence, Numerical
Technologies
Approach partition responsibility for
phase-assignability
good layout practices (local geometry)
(open) problem is there a set of design rules
that guarantees phase-assignability of layout ?
(no Ts, no doglegs, even fingers...)
automatic phase conflict resolution /
bipartization (global colorability)
enabling reuse of layout (free composability)
problem how can we guarantee reusability of
phase-assigned layouts, such that no odd cycles
can occur when the layouts are composed together
in a larger layout ?

27
Automatic Conflict Resolution
28
Compaction-Oriented Approach

Analyze input layout
Find min-cost set of perturbations needed to
eliminate all odd cycles
Induce constraints for output layout
i.e., PSM-induced (shape, spacing) constraints
Compact to get phase-assignable layout
Key Minimize the set of new constraints,
i.e., break all odd cycles in conflict graph by
deleting a minimum number of edges.

29
Conflict Graph

Dark Field: build graph over feature regions
edge between two features whose separation is < B
Bright Field: build graph over shifter regions
shifters for features whose width is < B
two edge types
adjacency edge between overlapping phase regions:
endpoints must have same phase
conflict edge between shifters on opposite sides
of critical feature: endpoints must have
opposite phase
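The two edge types above can be checked with a parity-labeled breadth-first search. The following is a minimal sketch, not the talk's implementation: the function name `assign_phases` and the integer shifter indices are assumptions made for illustration. It returns a phase per shifter, or `None` when some cycle contains an odd number of conflict edges, so that no proper assignment exists.

```python
from collections import deque

def assign_phases(num_shifters, adjacency_edges, conflict_edges):
    """Return a list of phases (0 or 180), or None if no proper assignment exists."""
    # Parity 0: endpoints must share a phase. Parity 1: endpoints must differ.
    neighbors = [[] for _ in range(num_shifters)]
    for u, v in adjacency_edges:
        neighbors[u].append((v, 0))
        neighbors[v].append((u, 0))
    for u, v in conflict_edges:
        neighbors[u].append((v, 1))
        neighbors[v].append((u, 1))

    phase = [None] * num_shifters
    for start in range(num_shifters):
        if phase[start] is not None:
            continue
        phase[start] = 0              # free choice for each connected component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, parity in neighbors[u]:
                expected = phase[u] ^ parity
                if phase[v] is None:
                    phase[v] = expected
                    queue.append(v)
                elif phase[v] != expected:
                    return None       # contradictory cycle found
    return [180 * p for p in phase]

# Shifters 0,1 flank one feature; shifters 2,3 flank another; 1 and 2 overlap.
ok = assign_phases(4, [(1, 2)], [(0, 1), (2, 3)])
# Adding an overlap between 0 and 2 closes a cycle with one conflict edge.
bad = assign_phases(4, [(1, 2), (0, 2)], [(0, 1), (2, 3)])
```

The check only detects a conflict; choosing which edges to break is the optimization problem the talk addresses.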

30
Conflict Graph G

Dark Field

green feature pink conflict
conflict graph G
Bright Field
conflict edge
conflict graph G
adjacency edge
31
Optimal Odd Cycle Elimination
dark green feature pink conflict
conflict graph G
dual graph D
T-join of odd-degree nodes in D
32
Optimal Odd Cycle Elimination
- assign phases dark green and purple -
remaining pink conflicts correctly handled
dark green feature pink conflict
corresponds to broken edges in original conflict
graph
T-join of odd-degree nodes in D
33
The T-join Problem

How to delete minimum-cost set of edges from
conflict graph G to eliminate odd cycles?
Construct geometric dual graph D = dual(G)
Find odd-degree vertices T in D
Solve the T-join problem in D
find min-weight edge set J in D such that
all T-vertices have odd degree
all other vertices have even degree
Solution J corresponds to desired min-cost edge
set in conflict graph G

34
Solving T-join in Sparse Graphs

Reduction to matching
construct a complete graph T(G)
vertices = T-vertices
edge costs = shortest-path cost
find minimum-cost perfect matching
Typical example: sparse (not always planar)
graph
note that conflict graphs are sparse
#vertices = 1,000,000
#edges ≈ 5 × #vertices
#T-vertices ≈ 10% of #vertices = 100,000
Drawback: finding APSP too slow, memory-consuming
#vertices = 100,000 ⇒ #edges in T(G) ≈
5,000,000,000
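The classical reduction described on this slide can be sketched for small instances as follows. This is an illustrative sketch only: `min_t_join` and `shortest_paths` are hypothetical names, the graph is assumed simple with non-negative edge weights, and the perfect matching is found by a bitmask dynamic program that is practical only for roughly 20 T-vertices, which is exactly why the talk looks for a better reduction.

```python
import heapq
from functools import lru_cache

def shortest_paths(adj, source):
    """Dijkstra from source; returns (distance map, parent map)."""
    dist, parent = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, parent

def min_t_join(num_vertices, edges, t_vertices):
    """edges: (u, v, weight) triples. Returns (cost, set of frozenset({u, v}))."""
    adj = [[] for _ in range(num_vertices)]
    weight = {}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
        weight[frozenset((u, v))] = w
    t = sorted(t_vertices)
    paths = [shortest_paths(adj, s) for s in t]   # one row of T(G) per T-vertex

    @lru_cache(maxsize=None)
    def best(mask):
        """Min-cost perfect matching of the T-vertices whose bits are set in mask."""
        if mask == 0:
            return 0, ()
        i = (mask & -mask).bit_length() - 1       # lowest unmatched T-vertex
        rest = mask & ~(1 << i)
        result = (float("inf"), ())
        for j in range(i + 1, len(t)):
            if (rest >> j) & 1:
                sub_cost, sub_pairs = best(rest & ~(1 << j))
                cost = paths[i][0].get(t[j], float("inf")) + sub_cost
                if cost < result[0]:
                    result = (cost, sub_pairs + ((i, j),))
        return result

    cost, pairs = best((1 << len(t)) - 1)
    if cost == float("inf"):
        raise ValueError("no T-join exists (odd |T| within some component)")
    join = set()
    for i, j in pairs:                            # symmetric difference of matched paths
        parent, v = paths[i][1], t[j]
        while v != t[i]:
            join ^= {frozenset((v, parent[v]))}
            v = parent[v]
    return sum(weight[e] for e in join), join

square = [(0, 1, 1), (1, 2, 5), (2, 3, 1), (3, 0, 5)]
cost, join = min_t_join(4, square, {0, 1, 2, 3})
```

Storing one shortest-path row per T-vertex is what makes this approach memory-hungry at the sizes quoted on the slide.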

35
Solving T-join Reduction to Matching

Desirable properties of reduction to matching
exact (i.e., optimal)
not much memory (say, 2-3X more)
leads to very fast solution
Solution: gadgets!
replace each edge/vertex with gadgets s.t.
matching all vertices in gadgeted graph
⇔ T-join in original graph

36
T-join Problem Reduction to Matching

replace each vertex with a chain of triangles
one more edge for T-vertices
in graph D: m edges, n vertices, t = |T|
in gadgeted graph: 4m-2n-t vertices, 7m-5n-t
edges
cost of red edges = original dual edge costs
cost of (black) edges in triangles = 0
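As a worked comparison, the size formulas above can be evaluated on the "typical example" figures quoted earlier in the talk (1,000,000 vertices, about 5 edges per vertex, about 10% T-vertices). Applying those figures to D is an assumption made here for illustration; the function names are hypothetical.

```python
def gadgeted_size(m, n, t):
    """Vertex and edge counts of the gadgeted graph, per the slide's formulas."""
    return 4 * m - 2 * n - t, 7 * m - 5 * n - t

def complete_graph_edges(t):
    """Edge count of the complete graph T(G) on t T-vertices."""
    return t * (t - 1) // 2

n = 1_000_000          # vertices
m = 5 * n              # edges
t = n // 10            # T-vertices
gadget_vertices, gadget_edges = gadgeted_size(m, n, t)
dense_edges = complete_graph_edges(t)
```

Under these assumptions the gadgeted graph has 17.9 million vertices and 29.9 million edges, against roughly 5 billion edges for the complete graph on the T-vertices.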

vertex ∈ T
vertex ∉ T
37
Example of Gadgeted Graph
Gadgeted graph
Dual Graph
black red edges min-cost perfect matching
38
Results

Runtimes in CPU seconds on Sun Ultra-10
Greedy: breadth-first-search bicoloring
GW: Goemans/Williamson95 heuristic
Cook/Rohe98 for perfect matching
Integration w/compactor saves 9% layout area
vs. GW

39
[Figure: critical features F1-F4 with shifters S1-S8]
Can distinguish between use of shifting, widening
DOFs
40
Black points - features
Blue - shifter overlap
Red - extra nodes to distinguish opposite shifters
Bipartization Problem: delete min # of nodes
(or edges) to make graph bipartite
- blue nodes: shifting
- red nodes: widening
Bipartization by node deletion is
NP-hard (GW98: 9/4-approx)
41
Summary

New fast, optimal algorithms for edge-deletion
bipartization
Fast T-join using gadgets
applicable to any AltPSM phase conflict graphs
Approximate solution for node-deletion
bipartization
Goemans-Williamson98 9/4-approximation
If node-deletion cost < 1.5 × edge-deletion cost, GW is
better than edge deletion
Comprehensive integration w/NTI, Cadence tools

42
Today's Talk

Demonstrably useful solutions for real problems
Valuation: What problems require attention?
technology extrapolation
automatic layout of phase-shifting masks
Values: How do we advance the leading edge?
anatomy of FM-based hypergraph partitioning
heuristics
culture change: restoring time-to-market and QOR
in applied algorithmics via IP reuse

43
Applied Algorithmics R&D

Heuristics for hard problems
Problems have practical context
Choices dominated by engineering tradeoffs
QOR vs. resource usage, accessibility,
adoptability
How do you know/show that your approach is good?

44
Hypergraphs in VLSI CAD

Circuit netlist represented by hypergraph

45
Hypergraph Partitioning in VLSI

Variants
directed/undirected hypergraphs
weighted/unweighted vertices, edges
constraints, objectives,
Human-designed instances
Benchmarks
up to 4,000,000 vertices
sparse (vertex degree 4, hyperedge size 4)
small number of very large hyperedges
Efficiency, flexibility KL-FM style preferred

46
Context Top-Down VLSI Placement
etc
47
Context Top-Down Placement

Speed
6,000 cells/minute to final detailed placement
partitioning used only in top-down global
placement
implied partitioning runtime 1 second for
25,000 cells, < 30 seconds for 750,000 cells
Structure
tight balance constraint on total cell areas in
partitions
widely varying cell areas
fixed terminals (pads, terminal propagation, etc.)

48
Fiduccia-Mattheyses (FM) Approach

Pass
start with all vertices free to move (unlocked)
label each possible move with immediate change in
cost that it causes (gain)
iteratively select and execute a move with
highest gain, lock the moving vertex (i.e.,
cannot move again during the pass), and update
affected gains
best solution seen during the pass is adopted as
starting solution for next pass
FM
start with some initial solution
perform passes until a pass fails to improve
solution quality
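The pass structure above can be sketched for unit-area vertices and the net-cut objective. This is a deliberately naive illustration, not a competitive partitioner: gains are recomputed from scratch at every step instead of being kept in a gain container, ties are broken by lowest vertex index, and the names `fm`, `fm_pass`, and `max_imbalance` are assumptions.

```python
def cut_size(nets, side):
    """Number of nets with vertices on both sides."""
    return sum(1 for net in nets if len({side[v] for v in net}) == 2)

def gain(nets_of, nets, side, v):
    """Immediate reduction in cut size if v moves to the other side."""
    g = 0
    for e in nets_of[v]:
        net = nets[e]
        if len(net) < 2:
            continue
        same = sum(1 for u in net if side[u] == side[v])
        if same == 1:
            g += 1                  # v is alone on its side: the move uncuts the net
        elif same == len(net):
            g -= 1                  # net is currently uncut: the move cuts it
    return g

def fm_pass(nets, side, max_imbalance):
    n = len(side)
    nets_of = [[] for _ in range(n)]
    for e, net in enumerate(nets):
        for v in net:
            nets_of[v].append(e)
    side = list(side)
    cut = cut_size(nets, side)
    best_side, best_cut = list(side), cut
    locked = [False] * n
    for _step in range(n):
        size0 = side.count(0)
        candidates = []
        for v in range(n):
            new_size0 = size0 + (1 if side[v] == 1 else -1)
            if not locked[v] and abs(2 * new_size0 - n) <= max_imbalance:
                candidates.append((gain(nets_of, nets, side, v), -v, v))
        if not candidates:
            break
        g, _neg, v = max(candidates)    # highest gain, then lowest index
        side[v] ^= 1
        locked[v] = True                # cannot move again during this pass
        cut -= g
        if cut < best_cut:
            best_cut, best_side = cut, list(side)
    return best_side, best_cut

def fm(nets, side, max_imbalance=2):
    """Repeat passes until a pass fails to improve the cut."""
    cut = cut_size(nets, side)
    while True:
        new_side, new_cut = fm_pass(nets, side, max_imbalance)
        if new_cut >= cut:
            return side, cut
        side, cut = new_side, new_cut

# Two triangles joined by one net, starting from an interleaved partition.
nets = [[0, 1], [1, 2], [0, 2], [3, 4], [4, 5], [3, 5], [2, 3]]
final_side, final_cut = fm(nets, [0, 1, 0, 1, 0, 1])
```

Note that the pass keeps making moves with negative gain after the best solution is seen; only the best prefix of moves is kept.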

49
Cut During One Pass (Bipartitioning)
Cut
Moves
50
Multilevel Partitioning
Refinement
Clustering
51
Key Elements of FM

Three main operations
computation of initial gain values at beginning
of pass
retrieval of the best-gain (feasible) move
update of all affected gain values after a move
is made
Contribution of Fiduccia and Mattheyses
circuit hypergraphs are sparse
move gain is bounded between 2 , -2 max
vertex degree
hash moves by gains (gain bucket structure)
each gain affected by a move is updated in
constant time
linear time complexity per pass
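A gain bucket structure of the kind described above might look like the following sketch. The class name `GainBuckets` is hypothetical, buckets are plain Python lists used in LIFO order (an assumption; other insertion orders are possible), and removal from a list is not constant time, whereas a real implementation would use doubly linked lists to get the constant-time update mentioned on the slide.

```python
class GainBuckets:
    """Moves hashed by gain, with gains limited to [-max_gain, +max_gain]."""

    def __init__(self, max_gain):
        self.max_gain = max_gain
        self.buckets = [[] for _ in range(2 * max_gain + 1)]  # index = gain + max_gain
        self.gain_of = {}
        self.top = -1                   # index of the highest non-empty bucket

    def insert(self, v, gain):
        idx = gain + self.max_gain
        self.buckets[idx].append(v)     # newest element is served first (LIFO)
        self.gain_of[v] = gain
        self.top = max(self.top, idx)

    def remove(self, v):
        idx = self.gain_of.pop(v) + self.max_gain
        self.buckets[idx].remove(v)     # O(bucket size) here; O(1) with linked lists
        while self.top >= 0 and not self.buckets[self.top]:
            self.top -= 1

    def update(self, v, delta):
        gain = self.gain_of[v] + delta
        self.remove(v)
        self.insert(v, gain)

    def pop_best(self):
        """Remove and return (vertex, gain) from the highest non-empty bucket."""
        if self.top < 0:
            return None
        v = self.buckets[self.top][-1]
        gain = self.gain_of[v]
        self.remove(v)
        return v, gain

b = GainBuckets(max_gain=3)
b.insert("a", 1)
b.insert("b", 2)
b.insert("c", 2)
first = b.pop_best()        # newest element of the gain-2 bucket
b.update("a", +1)           # "a" joins the gain-2 bucket behind nothing newer
second = b.pop_best()
```

For unit-weight nets, a vertex's gain lies between minus and plus its degree, which is what keeps the bucket array small.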

52
Taxonomy of Algorithm and Implementation
Improvements

Modifications of the algorithm
Implicit decisions
Tuning that can change the result
Tuning that cannot change the result

53
Modifications of the Algorithm

Important changes to flow, new steps/features
lookahead tie-breaking
CLIP
instead of actual gain, maintain updated gain
actual gain minus
initial gain (at start of pass)
WHY ???
cut-line refinement
insert nodes into gain structure only if incident
to cut nets
multiple unlocking

54
Modifications of the Algorithm

Important changes to flow, new steps/features
lookahead tie-breaking
CLIP
instead of actual gain, maintain updated gain
actual gain minus
initial gain
promotes clustered moves (similar to LIFO
gain buckets)
cut-line refinement
insert nodes into gain structure only if incident
to cut nets
multiple unlocking
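The CLIP key can be illustrated with a few made-up numbers. In this sketch the vertex names and gain values are hypothetical; it only shows that ordering moves by updated gain (actual gain minus initial gain) can select a different move than ordering by actual gain.

```python
def clip_keys(initial_gain, actual_gain):
    """Updated gain per vertex: actual gain minus the gain at the start of the pass."""
    return {v: actual_gain[v] - initial_gain[v] for v in actual_gain}

initial = {"a": 3, "b": 0, "c": -1}
current = {"a": 3, "b": 2, "c": 0}      # hypothetical gains after a neighbor of b and c moved

keys = clip_keys(initial, current)
fm_choice = max(current, key=current.get)    # plain FM: highest actual gain
clip_choice = max(keys, key=keys.get)        # CLIP: highest updated gain
```

Here plain FM would move "a", while CLIP prefers "b", whose gain rose most because of the previous move; this is the clustering bias noted on the slide.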

55
Implicit Decisions

Tie-breaking in choosing highest gain bucket
Tie-breaking in where to attach new element in
gain bucket
LIFO vs. FIFO vs. random ... (known issue HK 95)
Whether to update, or skip updating, when delta
gain of a move is zero
Tie-breaking when selecting the best solution
seen during pass
first encountered, last encountered,
best-balance, ...

56
Tuning That Can Change the Result

Threshold large nets to reduce runtime
Skip gain update for large nets
Skip zero delta gain updates
changes resolution of hash collisions in gain
container
Loose/stable net removal
perform gain updates for only selected nets
Allow illegal solutions during pass

57
Tuning That Can't Change the Result

Skip updates for nets that cannot have
non-zero delta gain
netcut-specific optimizations
2-way specific optimizations
optimizations for nets of small degree
.....
... 30 years since KL70, 18 years since FM82,
100s of papers in literature

58
Zero Delta Gain Update

When vertex x is moved, gains for all vertices y
on nets incident to x must potentially be updated
In all FM implementations, this is done by going
through incident nets one at a time, computing
changes in gain for vertices y on these nets
Implicit decision
reinsert a vertex y when it experiences a zero
delta gain move (will shift position of y within
the same gain bucket)
skip the gain update (leave position of y
unchanged)
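The effect of this implicit decision on move order can be seen in a tiny sketch. It assumes list-based LIFO buckets whose last element is selected first; `apply_delta` and `next_move` are hypothetical helper names.

```python
def apply_delta(bucket_of_gain, gain_of, y, delta, skip_zero_delta):
    """Apply a gain change to y, optionally skipping zero-delta updates."""
    if delta == 0 and skip_zero_delta:
        return                                   # position of y is left unchanged
    bucket_of_gain[gain_of[y]].remove(y)
    gain_of[y] += delta
    bucket_of_gain.setdefault(gain_of[y], []).append(y)   # y becomes the newest entry

def next_move(bucket_of_gain):
    """Newest element of the highest non-empty bucket."""
    best = max(g for g, bucket in bucket_of_gain.items() if bucket)
    return bucket_of_gain[best][-1]

buckets = {1: ["x", "y", "z"]}
gains = {"x": 1, "y": 1, "z": 1}

apply_delta(buckets, gains, "y", 0, skip_zero_delta=True)
choice_when_skipping = next_move(buckets)

apply_delta(buckets, gains, "y", 0, skip_zero_delta=False)
choice_when_reinserting = next_move(buckets)
```

No gain value changes in either case, yet the next selected move differs, so the two variants follow different trajectories through the solution space.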

59
Tie-Breaking Between Highest-Gain Buckets

Gain container typically implemented such that
available moves are segregated, e.g., by source
or destination partition
There can be more than one highest-gain bucket
When balance constraint is anything other than
exact bisection, moves at multiple highest-gain
buckets can be legal
Implicit decision
choose the move that is from the same partition
as the last vertex moved (toward)
choose the move that is not from the same
partition as the last vertex moved (away)
choose the move in partition 0 (part0)
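The three policies can be written as a small selection function. This is a sketch under an assumed reading of the slide: "the same partition as the last vertex moved" is taken to mean the partition the last vertex was moved out of, and the function and argument names are hypothetical.

```python
def pick_source_partition(policy, last_moved_from, tied_partitions):
    """Choose which partition's highest-gain bucket supplies the next move (2-way case)."""
    if len(tied_partitions) == 1:
        return tied_partitions[0]        # no tie to break
    if last_moved_from is None:
        return min(tied_partitions)      # first move of the pass: arbitrary default
    if policy == "toward":
        return last_moved_from
    if policy == "away":
        return 1 - last_moved_from
    if policy == "part0":
        return 0
    raise ValueError(f"unknown policy: {policy!r}")
```

For example, with both partitions tied and the last move taken from partition 1, "toward" selects partition 1 while "away" and "part0" both select partition 0.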

60
How Much Can This Matter ?

5% ?
10% ?
20% ?
more ?
50% ?
more ?

61
Implicit Decision Effects IBM01
62
Effect of Implicit Decisions

Stunning average cutsize difference for flat
partitioner with worst vs. best combination
far outweighs new improvements
One wrong decision can lead to misleading
conclusions w.r.t. other decisions
part0 is worse than toward with zero delta
gain updates
better or same without zero delta gain updates
Stronger optimization engines mask flaws
ML CLIP > ML LIFO > Flat CLIP > Flat LIFO
less dynamic range ML masks bad flat
implementation

63
Tuning Effects

Comparison of two CLIP-FM implementations
Min and Ave cutsizes from 100 single-start trials
Another quiz: Why did this happen?
N.B.: original inventor of CLIP-FM couldn't
figure it out

64
Tuning Effects

Comparison of two CLIP-FM implementations
Min and Ave cutsizes from 100 single-start trials
Another quiz: Why did this happen?
Hint: some modern IBM benchmarks have large
macro-cells

65
Sheer Nightmare Stuff...

Comparison of two LIFO-FM implementations
Min and Ave cut sizes from 100 single-start
trials
Papers 1, 2 both published since mid-1998

66
In Case You Are Wondering...No, VLSI CAD
Researchers Are Not Stupid.
67
How Much Can This Matter ?

5% ?
10% ?
20% ?
more ?
50% ?
more ?
Answer 400 2000 w.r.t. recent
literature and STANDARD, WELL-UNDERSTOOD
heuristics
lots more N years leading partitioner,
placer

68
Today's Talk

Demonstrably useful solutions for real problems
Valuation: What problems require attention?
technology extrapolation
automatic layout of phase-shifting masks
Values: How do we advance the leading edge?
anatomy of FM-based hypergraph partitioning
heuristics
culture change: restoring time-to-market and QOR
in applied algorithmics via IP reuse

69
"Barriers to Entry" for Researchers

Code development barrier
bare-bones self-contained partitioner 800 lines
not leading-edge (Dutt/Deng LIFO-FM)
modern partitioner requires much more code
Expertise barrier
very small details can have stunning impact
must not only know what to do, but also what not
to do
impossible to estimate knowledge/expertise
required to do research at leading edge
Need reference implementations !
reference prose (6 pp. 9pt double-column)
insufficient

70
Barriers to Relevance for Researchers

All heuristic engines/algorithms tuned to test
cases
Test case usage must capture real use models,
driving applications
e.g., recall bipartitioning is driven by top-down
placement
until CKM99 no one considered effect of fixed
vertices !!!
Test case usage can be fatally flawed by
details
hidden or previously unrealized
previously believed insignificant
results of algorithm research will be flawed as a
result

71
Challenges for Applied Algorithmics

Research in mature areas can stall
incremental research - difficult and risky
implementations not available ⇒ duplicated effort
too much trust ⇒ which approach is really the
best?
some results may not be replicable
"not novel" is common reason for paper rejection
exploratory research - paradoxically, lower-risk
novelty for the sake of novelty
yet, novel approaches must be well-substantiated
Pitfalls questionable value, roadblocks,
obsolete contexts

72
Challenges for Applied Algorithmics

Difficult to be relevant (time-to-market, QOR
issues)
time to market 5-7 year delay from publishing to
first industrial use (cf. market lifetimes, tech
extrapolation...)
quality of results unmeasurable, unpredictable,
basically unknown
Good news barriers to entry and barriers to
relevance are self-inflicted, and possibly
curable
mature domains require mature R&D methodologies
a possible solution cultivate flexibility and
reuse
low cost update of previous work to support
reuse
future tool/algorithm development biased towards
reuse

73
Analogy Hardware Design Tool Design

Hardware design is difficult
complex electrical engineering and optimization
problems
mistakes are costly
verification and test not trivial
few can afford to truly exploit the limits of
technology
A Winning Approach Hardware IP reuse
CAD tools design is difficult
complex software engineering and optimization
problems
mistakes can be showstoppers
verification and test not trivial
few can manage complexity of leading-edge
approaches
A "Surprising" Idea: CAD-IP reuse

74
What is CAD-IP?

Data models and benchmarks
context descriptions and use models
testcases and good solutions
Algorithms and algorithm analyses
mathematical formulations
comparison and evaluation methodologies for
algorithms
executables and source code of implementations
leading-edge performance results
Traditional (paper-based) publications

75
Bookshelf A Repository for CAD-IP

Community memory for CAD-IP
data models
algorithms
implementations
Publication medium that enables efficient applied
algorithmics algorithm research
benchmarks, performance results
algorithm descriptions and analyses
quality implementations (e.g., open-source Capo,
MLPart)
Simplified comparisons to identify best
approaches
Easier for industry to communicate new use models

76
Summary Addressing Inefficiencies

Inefficiencies
lack of openness and standards ⇒ huge duplication
of effort
incomparable reporting ⇒ improvement difficult
lack of standard comparison/latest use models ⇒
best approach not clear
industry doesn't bother w/feedback ⇒ outdated use
models
Proposed solutions
widely available, up-to-date, extensible
benchmarks
standardized performance reporting for
leading-edge approaches
available detailed descriptions of algorithms
peer review of executables (and source code?)
credit for quality implementations
Better research, faster adoption, more impact
http://vlsicad.cs.ucla.edu/GSRC/bookshelf/

77
Today's Talk

Demonstrably useful solutions for real problems
Valuation: What problems require attention?
technology extrapolation
automatic layout of phase-shifting masks
Values: How do we advance the leading edge?
anatomy of FM-based hypergraph partitioning
heuristics
culture change: restoring time-to-market and QOR
in applied algorithmics via IP reuse
Thank you for your attention !!!

78
Spare Slides
79
Parameters

Description of technology, circuit and design
attributes
Importance of consistent naming cannot be
overstated
Naming conventions for parameters
<preposition> _ <principal> _ qualifier _
<place> _ <qualifier> _ <adverbial> _
<index> _ <unit>
Example r_int_tot_lyr_pu_dl
Benefits
Relatively easy to understand parameter from its
name
Distinguishable (no two parameters should have
the same name)
r_int (interconnect resistance) = r_int
(interconnect resistivity) ?
Unique (no two names for the same parameter)
R_int = R_wire ?
Sortable (important literals come first)
Software to automatically check parameter naming

80
Rules

Methods to derive unknown parameters from known
ones
ASCII rules
Laws of physics, models of electrical behavior
Statistical models (e.g., Rent's rule)
Include closed-form expressions, vector
operations, tables
Storing of calibration data (e.g., technology
files) for known process, design points in
lookup tables
Constraints
Simulated by rules that compute boolean values
Used to limit range during sweeping
Optimization over a collection of rules
Example buffer insertion for minimal delay with
area constraints

81
Rules (Cont.)

External executable rules
Assume a callable executable (e.g., PERL script)
Example optimization of number and size of
repeaters for global wires
Use command-line interface and transfer through
files
Allow complex semantics of a rule
Example: placers, IPEM executable (Cong, UCLA)
Code rules
Implemented in C and linked into the inference
engine
Useful if execution speed is an issue

82
Engine

Contains no domain-specific knowledge
Evaluates rules in topological order
Performs studies (multiple evaluations
tradeoffs/sweeping, optimization)

83
Knowledge Representation

Rules and parameters are specified separately
from the derivation engine
Human-readable ASCII grammar
Benefits
Easy creation/sharing of parameters/rules by
multiple users
D. Sylvester and C. Cao device and power, SOI
modules that drop in to GTX
P.K. Nag Yield modeling
Extensible to models of arbitrary complexity
(specialized prediction methods, technology data
sets, optimization engines)
Avant! Apollo or Cadence SE P&R tool: just
another wirelength estimator
Applies to any domain of work in semiconductors,
VLSI CAD
Transistor sizing, single wire optimizations,
system-level wiring predictions,

84
Corking Effect in CLIP

CLIP begins by placing all moves into the 0-gain
buckets
CLIP chooses moves by cumulative delta gain
(updated gain)
initially, every move has cumulative delta gain
0
Historical legacy (and for speed) FM
partitioners typically look only at the first
move in a bucket
if it is illegal, skip the rest of the bucket
(possibly skip all buckets for that partition)
If the move at the head of each bucket at the
beginning of a CLIP pass is illegal, pass
terminates without making any moves
even if first move is legal, an illegal move soon
afterward will cork
New test cases (IBM) have large cells
large cells have large degree, and often large
initial gain
CLIP inventor couldn't understand bad performance
on IBM cases

85
Tuning to Uncork CLIP

Don't place nodes with area > balance constraint
in gain container at pass initialization
actually useful for all FM variants
zero CPU overhead
Look beyond the first move in a bucket
extremely expensive
hurts quality (partitioner doesn't operate well
near balance tolerance)
not worth it, in our experience
Simply do a LIFO pass before starting CLIP
spreads out nodes in gain buckets
reduces likelihood that large node has largest
total gain
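The corking effect and the first remedy can be reproduced with a toy model. The sketch assumes a single partition, buckets listed from highest to lowest gain, and a move that is legal only when the node's area fits in the remaining balance slack; all names and numbers are hypothetical.

```python
def select_move(buckets, area, slack):
    """Examine only the head (last element) of each bucket, highest gain first."""
    for bucket in buckets:
        if bucket and area[bucket[-1]] <= slack:
            return bucket[-1]
        # head is illegal (or bucket empty): skip the rest of this bucket
    return None

def build_bucket(nodes, area, slack, filter_oversized):
    """Initial CLIP bucket; optionally leave out nodes that can never move legally."""
    return [v for v in nodes if not (filter_oversized and area[v] > slack)]

area = {"a": 1, "b": 1, "macro": 50}
slack = 10
order = ["a", "b", "macro"]      # the large cell ends up at the head of the bucket

corked = select_move([build_bucket(order, area, slack, False)], area, slack)
uncorked = select_move([build_bucket(order, area, slack, True)], area, slack)
```

With the macro at the head, the pass finds no move at all even though two legal moves sit behind it; filtering at initialization costs nothing during the pass and restores progress.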

86
Effect of Fixed Terminals
Normalized Cost for IBM01
Runtime for IBM01
87
Enabling Reuse Free
Composability
88
Conflict in Cell (Macro) Based Layouts

Consider connected components of conflict graphs
within each cell master
each component independently phase-assignable (2^k
versions)
each is a single vertex in coarse-grain
conflict graph
problem assure free composability (reusability)
of cell masters, such that no odd cycles can
arise in coarse-grain conflict graph

cell master A
cell master B
connected component
edge in coarse-grain conflict graph
89
Case I Creating CAD IP of Questionable Value

Recent hypergraph partitioning papers report FM
implementations 20x worse than leading-edge FM
previous lack of openness caused wrong
conclusions, wasted effort
some improvements may only apply to weak
implementations
duplicated effort re-implementing (incorrectly?)
well-known algorithms
difficult to find the leading edge
no standard comparison methodology
how do you know if an implementation is poor?
To make leading-edge apparent and reproducible
publish performance results on standard
benchmarks
peer review (executables, source code?)
similar to common publication standards !