Title: Valuation and Values in Application-Driven Algorithmics: Case Studies from VLSI CAD Andrew B. Kahng, UCLA Computer Science Dept. June 2, 2000 abk@cs.ucla.edu, http://vlsicad.cs.ucla.edu
1Valuation and Values in Application-Driven Algorithmics: Case Studies from VLSI CAD
Andrew B. Kahng, UCLA Computer Science Dept.
June 2, 2000
abk@cs.ucla.edu, http://vlsicad.cs.ucla.edu
2My Research
- Applied algorithmics
- demonstrably useful solutions for real problems
- best known solutions
- classic (well-studied): Steiner, partition, placement, TSP, ...
- toolkits: discrete algorithms, global optimization, mathematical programming, approximation frameworks, new-age metaheuristics, engineering
- Ground truths
- anatomies
- limits
3Anatomies
- Technologies
- semiconductor process roadmap, design-manufacturing I/F
- design technology: methodology, flows, design process
- interconnect modeling/analysis: delay/noise est., compact models
- Problems
- structural theory of large-scale global optimizations
- Heuristics
- hypergraph partitioning and clustering
- wirelength- and timing-driven placement
- single/multiple topology synthesis (length, delay, skew, buffering, ...)
- TSP, ..., IP protection, ..., combinatorial exchange/auction, ...
- Cultures
- contexts and infrastructure for research and technology transfer
4Bounds
- Exact methods
- Provable approximations
- Technology extrapolation
- achievable envelope of system implementation w.r.t. cost, speed, power, reliability, ...
- ideally, should drive and be driven by system architectures, design and implementation methodologies
5Today's Talk
- Demonstrably useful solutions for real problems
- Valuation: What problems require attention?
- technology extrapolation
- automatic layout of phase-shifting masks
- Values: How do we advance the leading edge?
- anatomy of FM-based hypergraph partitioning heuristics
- culture change: restoring time-to-market and QOR in applied algorithmics via IP reuse
7Technology Extrapolation
What is the most power-efficient noise management
strategy?
- Evaluates impact of
- design technology
- process technology
- Evaluates impact on
- achievable design
- associated design problems
- What matters, when?
- Sets new requirements for CAD tools and methodologies, capital and R&D investment, ... (right tech at the right time)
- Roadmaps (SIA ITRS): familiar and influential example
How and when do L, SOI, SER, etc. matter?
Will layout tools need to perform process simulation to effectively address cross-die and cross-wafer manufacturing variation?
8GTX: GSRC Technology Extrapolation System
- GTX is a framework for technology extrapolation
9Graphical User Interface (GUI)
- Provides user interaction
- Visualization (plotting, printing, saving to file)
- 4 views
- Parameters
- Rules
- Rule chain
- Values in chain
10GTX: Open, Living Roadmap
- Openness in grammar, parameters and rules
- easy sharing of data, models in research environment
- contributions of best known models from anywhere
- Allows development of proprietary models
- separation between supplied (shared) and user-defined parameters / rules
- usability behind firewalls
- functionality for sharing results instead of data
- Multi-platform (SUN Solaris, Windows, Linux)
- http://vlsicad.cs.ucla.edu/GSRC/GTX/
11GTX Activity
- Models implemented
- Cycle-time models of SUSPENS (with extension by Takahashi), BACPAC (Sylvester, Berkeley), Fisher (ITRS)
- Currently adding
- GENESYS (with help from Georgia Tech)
- RIPE (with help from RPI)
- New device and power modules (Synopsys / Berkeley)
- New SOI device model (Synopsys / Berkeley)
- Inductance extraction (Silicon Graphics / Berkeley / Synopsys)
- Studies performed in GTX
- Modeling and parameter sensitivity analyses
- Design optimization studies: global interconnects, layer stack
- Routability estimation, via impact models, ...
12Today's Talk
- Demonstrably useful solutions for real problems
- Valuation: What problems require attention?
- technology extrapolation
- automatic layout of phase-shifting masks
- Values: How do we advance the leading edge?
- anatomy of FM-based hypergraph partitioning heuristics
- culture change: restoring time-to-market and QOR in applied algorithmics via IP reuse
13Subwavelength Optical Lithography
Subwavelength gap since 0.35 µm
- EUV, X-rays, E-beams all > 10 years out
- huge investment in > 30 years of optical litho infrastructure
14Mask Types
- Bright Field
- opaque features
- transparent background
- Dark Field
- transparent features
- opaque background
15Phase Shifting Masks
16Impact of PSM
- PSM enables smaller transistor gate lengths Leff
- critical polysilicon features only (gate Leff)
- faster device switching -> faster circuits
- better critical dimension (CD) control -> improved parametric yield
- all features on polysilicon layer, local interconnect layers
- smaller die area -> more $/wafer (full-chip PSM: BIG win)
- Alternative: build a $10B fab with equipment that won't exist for 5 years
- Data points
- exponential increase in price of CAD technology for PSM
- Numerical Technologies market cap 3x that of Avant!
- 25 nm gates (!!!) manufactured with 248nm DUV steppers (NTI and MIT Lincoln Labs, announced 2 days ago)
- 90nm gates in production at Motorola, Lucent (since late 1999)
17Double-Exposure Bright-Field PSM
[Figure: double-exposure bright-field PSM with 0 and 180 phase regions]
18The Phase Assignment Problem
- Assign 0, 180 phase regions such that critical features with width (separation) < B are induced by adjacent phase regions with opposite phases
- width applies to Bright Field, separation to Dark Field
[Figure: critical features flanked by alternating 0 / 180 phase regions]
19Key: Global 2-Colorability
- If there is an odd cycle of phase implications, the layout cannot be manufactured
- layout verification becomes a global, not local, issue
[Figure: odd cycle of 0 / 180 phase regions in which one region, marked "?", cannot be assigned a phase]
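The global 2-colorability condition can be checked with a breadth-first bicoloring. The following is a minimal sketch, assuming the phase implications are given as a plain list of node pairs that must receive opposite phases; the function name `assign_phases` and the input format are illustrative, not taken from the talk.

```python
from collections import deque

def assign_phases(num_nodes, edges):
    """Assign phase 0 or 180 to each node so every edge joins opposite phases.

    Returns the list of phases, or None if an odd cycle makes this impossible.
    """
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    phase = [None] * num_nodes
    for start in range(num_nodes):
        if phase[start] is not None:
            continue
        phase[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if phase[v] is None:
                    phase[v] = 180 - phase[u]  # opposite phase
                    queue.append(v)
                elif phase[v] == phase[u]:
                    return None  # same phase forced on both ends: odd cycle
    return phase

even_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
odd_cycle = [(0, 1), (1, 2), (2, 0)]
print(assign_phases(4, even_cycle))  # a valid assignment
print(assign_phases(3, odd_cycle))   # None
```

A single odd cycle anywhere in the graph makes the whole component unassignable, which is why the check cannot be done feature by feature.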
20Critical features: F1, F2, F3, F4
[Figure: layout with critical features F1-F4]
21Opposite-Phase Shifters (0, 180)
[Figure: critical features F1-F4 with opposite-phase shifters]
22Shifters S1-S8
[Figure: critical features F1-F4 with shifters S1-S8]
- PROPER Phase Assignment
- Opposite phases for opposite shifters
- Same phase for overlapping shifters
23Phase Conflict
[Figure: critical features F1-F4 with shifters S1-S8; phase conflict marked]
Proper Phase Assignment is IMPOSSIBLE
24Phase Conflict Resolution
[Figure: critical features F1-F4 with shifters S1-S8; phase conflict marked]
- feature shifting to remove overlap
25Phase Conflict Resolution
[Figure: critical features F1-F4 with shifters S1-S4, S7, S8; phase conflict marked]
- feature widening to turn conflict into non-conflict
26How will VLSI CAD deal with PSM ?
- UCLA: first comprehensive methodology for PSM-aware layout design
- currently being integrated by Cadence, Numerical Technologies
- Approach: partition responsibility for phase-assignability
- good layout practices (local geometry)
- (open) problem: is there a set of design rules that guarantees phase-assignability of layout? (no Ts, no doglegs, even fingers, ...)
- automatic phase conflict resolution / bipartization (global colorability)
- enabling reuse of layout (free composability)
- problem: how can we guarantee reusability of phase-assigned layouts, such that no odd cycles can occur when the layouts are composed together in a larger layout?
27Automatic Conflict Resolution
28Compaction-Oriented Approach
- Analyze input layout
- Find min-cost set of perturbations needed to eliminate all odd cycles
- Induce constraints for output layout
- i.e., PSM-induced (shape, spacing) constraints
- Compact to get phase-assignable layout
- Key: Minimize the set of new constraints, i.e., break all odd cycles in conflict graph by deleting a minimum number of edges.
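To make the objective concrete, here is a brute-force reference sketch for tiny graphs. It relies on the fact that deleting the edges whose endpoints share a color under some 2-coloring leaves a bipartite graph, so the minimum deletion set is the smallest such set over all colorings. This exhaustive search is exponential and is only an illustration; it is not the method of the talk, and the function name is hypothetical.

```python
from itertools import product

def min_edge_deletion_bipartization(num_nodes, edges):
    """Return a smallest set of edges whose deletion breaks all odd cycles.

    Exhaustive over all 2-colorings, so usable only for very small graphs.
    """
    best = None
    for colors in product((0, 1), repeat=num_nodes):
        # edges with equal-colored endpoints must be deleted under this coloring
        broken = [(u, v) for u, v in edges if colors[u] == colors[v]]
        if best is None or len(broken) < len(best):
            best = broken
    return best

# Two triangles sharing the edge (1, 2): deleting that one edge breaks both odd cycles.
two_triangles = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 2)]
print(min_edge_deletion_bipartization(4, two_triangles))
```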
29Conflict Graph
- Dark Field: build graph over feature regions
- edge between two features whose separation is < B
- Bright Field: build graph over shifter regions
- shifters for features whose width is < B
- two edge types
- adjacency edge between overlapping phase regions: endpoints must have same phase
- conflict edge between shifters on opposite side of critical feature: endpoints must have opposite phase
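With two edge types, phase-assignability becomes a consistency check over "same phase" and "opposite phase" constraints. The sketch below uses a union-find structure with parity bits, which is one standard way to do this; the shifter numbering and the example edges are hypothetical and do not reproduce the layout in the figures.

```python
def phase_assignable(num_shifters, adjacency_edges, conflict_edges):
    """True if phases exist with equal endpoints on adjacency edges
    and opposite endpoints on conflict edges."""
    parent = list(range(num_shifters))
    parity = [0] * num_shifters  # phase parity of a node relative to its parent

    def find(x):
        # Locate the root, then compress the path while keeping parities to the root.
        path = []
        while parent[x] != x:
            path.append(x)
            x = parent[x]
        acc = 0
        for node in reversed(path):
            acc ^= parity[node]
            parent[node] = x
            parity[node] = acc
        return x

    def relate(a, b, differ):
        ra, rb = find(a), find(b)
        pa, pb = parity[a], parity[b]
        if ra == rb:
            return (pa ^ pb) == differ  # constraint already implied or violated
        parent[ra] = rb
        parity[ra] = pa ^ pb ^ differ
        return True

    constraints = [(a, b, 0) for a, b in adjacency_edges]
    constraints += [(a, b, 1) for a, b in conflict_edges]
    return all(relate(a, b, d) for a, b, d in constraints)

# Three critical features, each with a pair of opposite shifters (hypothetical).
conflicts = [(0, 1), (2, 3), (4, 5)]
print(phase_assignable(6, [(1, 2), (3, 4)], conflicts))          # True
print(phase_assignable(6, [(1, 2), (3, 4), (5, 0)], conflicts))  # False: odd cycle
```

The second call closes a cycle containing three conflict edges, which is the odd-cycle situation described on the earlier slides.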
30Conflict Graph G
[Figure: Bright Field layout (green: feature, pink: conflict) and its conflict graph G, with conflict edges and adjacency edges labeled]
31Optimal Odd Cycle Elimination
[Figure: conflict graph G (dark green: feature, pink: conflict), its dual graph D, and the T-join of odd-degree nodes in D]
32Optimal Odd Cycle Elimination
- T-join of odd-degree nodes in D: corresponds to broken edges in original conflict graph
- assign phases (dark green and purple); remaining pink conflicts correctly handled
[Figure: phase-assigned layout; dark green: feature, pink: conflict]
33The T-join Problem
- How to delete minimum-cost set of edges from conflict graph G to eliminate odd cycles?
- Construct geometric dual graph D = dual(G)
- Find odd-degree vertices T in D
- Solve the T-join problem in D
- find min-weight edge set J in D such that
- all T-vertices have odd degree
- all other vertices have even degree
- Solution J corresponds to desired min-cost edge set in conflict graph G
34Solving T-join in Sparse Graphs
- Reduction to matching
- construct a complete graph T(G)
- vertices = T-vertices
- edge costs = shortest-path cost
- find minimum-cost perfect matching
- Typical example: sparse (not always planar) graph
- note that conflict graphs are sparse
- # vertices = 1,000,000
- # edges ≈ 5 × # vertices
- # T-vertices ≈ 10% of # vertices = 100,000
- Drawback: finding APSP too slow, memory-consuming
- # vertices = 100,000, # edges in T(G) = 5,000,000,000
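The reduction above can be sketched on a toy graph: compute shortest paths between T-vertices, find a minimum-cost perfect matching on them, and take the symmetric difference of the matched paths as J. This is a minimal sketch under several assumptions: the graph is connected, weights are non-negative, there are no parallel edges, and |T| is even. The matching step here is an exponential bitmask search standing in for a real matching code, so it only works for a handful of T-vertices.

```python
import heapq
from functools import lru_cache

def dijkstra(adj, source):
    """Shortest-path distances and predecessors from source."""
    dist, pred = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], pred[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, pred

def min_t_join(num_vertices, edges, t_vertices):
    """Return (cost of the matching, edge set J) for a small instance."""
    adj = {v: [] for v in range(num_vertices)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    t = sorted(t_vertices)
    sp = {s: dijkstra(adj, s) for s in t}

    @lru_cache(maxsize=None)
    def match(mask):
        # Cheapest way to pair up the T-vertices whose bits are set in mask.
        if mask == 0:
            return 0, ()
        i = (mask & -mask).bit_length() - 1
        rest = mask & ~(1 << i)
        best = (float("inf"), ())
        for j in range(i + 1, len(t)):
            if (rest >> j) & 1:
                cost, pairs = match(rest & ~(1 << j))
                cost += sp[t[i]][0][t[j]]
                if cost < best[0]:
                    best = (cost, pairs + ((t[i], t[j]),))
        return best

    total, pairs = match((1 << len(t)) - 1)
    join = set()
    for s, target in pairs:
        pred, v = sp[s][1], target
        while v != s:
            join ^= {frozenset((v, pred[v]))}  # symmetric difference of paths
            v = pred[v]
    return total, join

square = [(0, 1, 1), (1, 2, 5), (2, 3, 1), (3, 0, 5)]
cost, J = min_t_join(4, square, {0, 1, 2, 3})
print(cost, sorted(tuple(sorted(e)) for e in J))
```

The all-pairs shortest-path table `sp` is exactly the part that the slide flags as too slow and memory-hungry at the scale of 100,000 T-vertices.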
35Solving T-join Reduction to Matching
- Desirable properties of reduction to matching
- exact (i.e., optimal)
- not much memory (say, 2-3X more)
- leads to very fast solution
- Solution: gadgets!
- replace each edge/vertex with gadgets s.t. matching all vertices in gadgeted graph <=> T-join in original graph
36T-join Problem Reduction to Matching
- replace each vertex with a chain of triangles
- one more edge for T-vertices
- in graph D: m edges, n vertices, t = |T|
- in gadgeted graph: 4m-2n-t vertices, 7m-5n-t edges
- cost of red edges = original dual edge costs; cost of (black) edges in triangles = 0
[Figure: gadgets for a vertex ∈ T and for a vertex ∉ T]
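The size formulas can be compared with the complete-graph reduction by simple arithmetic. The sketch below plugs in the "typical example" counts from the earlier slide as n, m and t; treating those counts as the sizes of D is an assumption made here for illustration only.

```python
def complete_graph_edges(t):
    """Edges in the complete graph on the t T-vertices."""
    return t * (t - 1) // 2

def gadgeted_graph_size(m, n, t):
    """(vertices, edges) of the gadgeted graph, using the formulas on the slide."""
    return 4 * m - 2 * n - t, 7 * m - 5 * n - t

n, m, t = 1_000_000, 5_000_000, 100_000  # assumed sizes, taken from the typical example
print(complete_graph_edges(t))       # 4999950000, i.e. about 5 billion edges
print(gadgeted_graph_size(m, n, t))  # (17900000, 29900000)
```

Under these assumed sizes the gadgeted graph has roughly 30 million edges, against roughly 5 billion for the complete graph on T.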
37Example of Gadgeted Graph
[Figure: dual graph and the corresponding gadgeted graph; black and red edges = min-cost perfect matching]
38Results
- Runtimes in CPU seconds on Sun Ultra-10
- Greedy: breadth-first-search bicoloring
- GW: Goemans/Williamson95 heuristic
- Cook/Rohe98 for perfect matching
- Integration w/compactor saves 9% layout area vs. GW
39[Figure: critical features F1-F4 with shifters S1-S8]
Can distinguish between use of shifting, widening DOFs
40Black points: features; Blue: shifter overlap; Red: extra nodes to distinguish opposite shifters
Bipartization Problem: delete min # of nodes (or edges) to make graph bipartite
- blue nodes: shifting
- red nodes: widening
Bipartization by node deletion is NP-hard (GW98: 9/4-approx)
41Summary
- New fast, optimal algorithms for edge-deletion bipartization
- Fast T-join using gadgets
- applicable to any AltPSM phase conflict graphs
- Approximate solution for node-deletion bipartization
- Goemans-Williamson98 9/4-approximation
- If node-deletion cost < 1.5 edge deletion, GW is better than edge deletion
- Comprehensive integration w/NTI, Cadence tools
42Today's Talk
- Demonstrably useful solutions for real problems
- Valuation: What problems require attention?
- technology extrapolation
- automatic layout of phase-shifting masks
- Values: How do we advance the leading edge?
- anatomy of FM-based hypergraph partitioning heuristics
- culture change: restoring time-to-market and QOR in applied algorithmics via IP reuse
43Applied Algorithmics R&D
- Heuristics for hard problems
- Problems have practical context
- Choices dominated by engineering tradeoffs
- QOR vs. resource usage, accessibility, adoptability
- How do you know/show that your approach is good?
44Hypergraphs in VLSI CAD
- Circuit netlist represented by hypergraph
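As a small illustration of the representation, a netlist can be stored as a list of nets, each net being the tuple of cells it connects, and a bipartition as a list of side labels. The net-cut objective then counts nets that touch both sides. The instance below is made up for illustration.

```python
def cut_size(nets, part):
    """Number of nets (hyperedges) with cells on both sides of the bipartition."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)

# Six cells, four nets; cells 0-2 on side 0 and cells 3-5 on side 1.
nets = [(0, 1, 2), (2, 3), (3, 4, 5), (0, 5)]
part = [0, 0, 0, 1, 1, 1]
print(cut_size(nets, part))  # 2: nets (2, 3) and (0, 5) are cut
```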
45Hypergraph Partitioning in VLSI
- Variants
- directed/undirected hypergraphs
- weighted/unweighted vertices, edges
- constraints, objectives, ...
- Human-designed instances
- Benchmarks
- up to 4,000,000 vertices
- sparse (vertex degree 4, hyperedge size 4)
- small number of very large hyperedges
- Efficiency, flexibility: KL-FM style preferred
46Context: Top-Down VLSI Placement
47Context: Top-Down Placement
- Speed
- 6,000 cells/minute to final detailed placement
- partitioning used only in top-down global placement
- implied partitioning runtime: 1 second for 25,000 cells, < 30 seconds for 750,000 cells
- Structure
- tight balance constraint on total cell areas in partitions
- widely varying cell areas
- fixed terminals (pads, terminal propagation, etc.)
48Fiduccia-Mattheyses (FM) Approach
- Pass
- start with all vertices free to move (unlocked)
- label each possible move with immediate change in cost that it causes (gain)
- iteratively select and execute a move with highest gain, lock the moving vertex (i.e., cannot move again during the pass), and update affected gains
- best solution seen during the pass is adopted as starting solution for next pass
- FM
- start with some initial solution
- perform passes until a pass fails to improve solution quality
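The pass structure described above can be sketched as follows. This is a deliberately naive illustration, assuming unit cell areas and a hypothetical `max_side` bound on the number of cells per side; it recomputes gains from scratch at every step instead of using incremental gain updates, so it does not have the linear-time behavior of a real FM implementation.

```python
def cut_size(nets, part):
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)

def gain(nets_of, nets, part, v):
    """Immediate decrease in cut size if v switches sides."""
    g = 0
    for e in nets_of[v]:
        same = sum(1 for u in nets[e] if part[u] == part[v])
        other = len(nets[e]) - same
        if same == 1 and other > 0:
            g += 1  # v is alone on its side: the net becomes uncut
        elif other == 0 and len(nets[e]) > 1:
            g -= 1  # the net is uncut now: moving v cuts it
    return g

def fm_pass(nets, part, max_side):
    n = len(part)
    nets_of = [[] for _ in range(n)]
    for e, net in enumerate(nets):
        for v in net:
            nets_of[v].append(e)
    part = list(part)
    locked = [False] * n
    sizes = [part.count(0), part.count(1)]
    cut = cut_size(nets, part)
    best_cut, best_prefix, moves = cut, 0, []
    for _ in range(n):
        # legal moves: unlocked vertices whose destination side has room
        candidates = [v for v in range(n)
                      if not locked[v] and sizes[1 - part[v]] + 1 <= max_side]
        if not candidates:
            break
        v = max(candidates, key=lambda u: gain(nets_of, nets, part, u))
        cut -= gain(nets_of, nets, part, v)
        sizes[part[v]] -= 1
        part[v] = 1 - part[v]
        sizes[part[v]] += 1
        locked[v] = True
        moves.append(v)
        if cut < best_cut:
            best_cut, best_prefix = cut, len(moves)
    for v in moves[best_prefix:]:  # roll back to the best solution seen
        part[v] = 1 - part[v]
    return part, best_cut

def fm(nets, part, max_side):
    best = cut_size(nets, part)
    while True:
        new_part, new_cut = fm_pass(nets, part, max_side)
        if new_cut >= best:  # pass failed to improve: stop
            return part, best
        part, best = new_part, new_cut

path_nets = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
final_part, final_cut = fm(path_nets, [0, 1, 0, 1, 0, 1], max_side=4)
print(final_part, final_cut)
```

Note that `max` silently resolves gain ties by taking the first candidate, which is itself one of the implicit decisions discussed later in the talk.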
49Cut During One Pass (Bipartitioning)
[Figure: plot of cut size versus moves during one pass]
50Multilevel Partitioning
[Figure: multilevel partitioning, showing the clustering and refinement phases]
51Key Elements of FM
- Three main operations
- computation of initial gain values at beginning of pass
- retrieval of the best-gain (feasible) move
- update of all affected gain values after a move is made
- Contribution of Fiduccia and Mattheyses
- circuit hypergraphs are sparse
- move gain is bounded between +2 × and -2 × max vertex degree
- hash moves by gains (gain bucket structure)
- each gain affected by a move is updated in constant time
- linear time complexity per pass
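A gain bucket structure can be sketched as an array of buckets indexed by gain, with constant-time insert, remove and update. The version below is an illustration only: it uses Python's insertion-ordered dicts in place of the doubly linked lists of a typical implementation, takes the gain bound as a parameter `max_gain`, and uses LIFO order within a bucket. The class and method names are hypothetical.

```python
class GainBuckets:
    """Buckets indexed by gain in [-max_gain, max_gain]; LIFO within a bucket."""

    def __init__(self, max_gain):
        self.max_gain = max_gain
        self.buckets = [dict() for _ in range(2 * max_gain + 1)]
        self.gain_of = {}
        self.top = -1  # index of the highest bucket that may be non-empty

    def insert(self, v, gain):
        idx = gain + self.max_gain
        self.buckets[idx][v] = None  # dict keeps insertion order
        self.gain_of[v] = gain
        self.top = max(self.top, idx)

    def remove(self, v):
        idx = self.gain_of.pop(v) + self.max_gain
        del self.buckets[idx][v]

    def update(self, v, delta):
        gain = self.gain_of[v] + delta
        self.remove(v)
        self.insert(v, gain)

    def best(self):
        """Most recently inserted move in the highest non-empty bucket."""
        while self.top >= 0 and not self.buckets[self.top]:
            self.top -= 1
        if self.top < 0:
            return None
        return next(reversed(self.buckets[self.top]))

b = GainBuckets(max_gain=2)
b.insert("a", 1)
b.insert("b", 1)
b.insert("c", -2)
first = b.best()   # "b": LIFO within the gain-1 bucket
b.update("c", 4)   # c now has gain 2
second = b.best()  # "c"
b.remove("c")
third = b.best()   # back to "b"
print(first, second, third)
```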
52Taxonomy of Algorithm and Implementation Improvements
- Modifications of the algorithm
- Implicit decisions
- Tuning that can change the result
- Tuning that cannot change the result
53Modifications of the Algorithm
- Important changes to flow, new steps/features
- lookahead tie-breaking
- CLIP
- instead of actual gain, maintain updated gain = actual gain minus initial gain (at start of pass)
- WHY ???
- cut-line refinement
- insert nodes into gain structure only if incident to cut nets
- multiple unlocking
54Modifications of the Algorithm
- Important changes to flow, new steps/features
- lookahead tie-breaking
- CLIP
- instead of actual gain, maintain updated gain = actual gain minus initial gain
- promotes clustered moves (similar to LIFO gain buckets)
- cut-line refinement
- insert nodes into gain structure only if incident to cut nets
- multiple unlocking
55Implicit Decisions
- Tie-breaking in choosing highest gain bucket
- Tie-breaking in where to attach new element in gain bucket
- LIFO vs. FIFO vs. random ... (known issue: HK 95)
- Whether to update, or skip updating, when delta gain of a move is zero
- Tie-breaking when selecting the best solution seen during pass
- first encountered, last encountered, best-balance, ...
56Tuning That Can Change the Result
- Threshold large nets to reduce runtime
- Skip gain update for large nets
- Skip zero delta gain updates
- changes resolution of hash collisions in gain container
- Loose/stable net removal
- perform gain updates for only selected nets
- Allow illegal solutions during pass
57Tuning That Can't Change the Result
- Skip updates for nets that cannot have non-zero delta gain
- netcut-specific optimizations
- 2-way specific optimizations
- optimizations for nets of small degree
- .....
- ... 30 years since KL70, 18 years since FM82, 100s of papers in literature
58Zero Delta Gain Update
- When vertex x is moved, gains for all vertices y on nets incident to x must potentially be updated
- In all FM implementations, this is done by going through incident nets one at a time, computing changes in gain for vertices y on these nets
- Implicit decision
- reinsert a vertex y when it experiences a zero delta gain move (will shift position of y within the same gain bucket)
- skip the gain update (leave position of y unchanged)
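The effect of this implicit decision can be seen on a single LIFO bucket. The sketch below is a toy illustration with hypothetical names: buckets are plain lists whose last element is the LIFO head, and a flag selects between the two behaviors.

```python
from collections import defaultdict

def update_gain(buckets, gains, v, delta, skip_zero_delta):
    """Apply a gain change to v; buckets map gain -> list, LIFO head is the last item."""
    if delta == 0 and skip_zero_delta:
        return  # leave v exactly where it is
    buckets[gains[v]].remove(v)  # list removal is linear; a linked list would be O(1)
    gains[v] += delta
    buckets[gains[v]].append(v)  # v becomes the new head of its bucket

def head_after_zero_delta(skip_zero_delta):
    gains = {"a": 1, "b": 1, "c": 1}
    buckets = defaultdict(list)
    for v in ("a", "b", "c"):
        buckets[gains[v]].append(v)
    update_gain(buckets, gains, "a", 0, skip_zero_delta)
    return buckets[1][-1]

print(head_after_zero_delta(skip_zero_delta=True))   # c: order unchanged
print(head_after_zero_delta(skip_zero_delta=False))  # a: reinsertion moved it to the head
```

The gain values are identical in both cases; only the order within the bucket, and therefore the next move selected, differs.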
59Tie-Breaking Between Highest-Gain Buckets
- Gain container typically implemented such that available moves are segregated, e.g., by source or destination partition
- There can be more than one highest-gain bucket
- When balance constraint is anything other than exact bisection, moves at multiple highest-gain buckets can be legal
- Implicit decision
- choose the move that is from the same partition as the last vertex moved (toward)
- choose the move that is not from the same partition as the last vertex moved (away)
- choose the move in partition 0 (part0)
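The three policies can be written down as a small selection function. This is a sketch under simplifying assumptions: moves are segregated by source partition, the head move of each partition's best bucket is taken to be legal, and `best_gain[p]` holds the highest gain among moves out of partition p. The function name and signature are hypothetical.

```python
def choose_source_partition(best_gain, last_from, policy):
    """Pick the partition to move a vertex out of.

    best_gain: [highest gain out of partition 0, highest gain out of partition 1]
    last_from: source partition of the previously moved vertex
    policy:    "toward", "away" or "part0" (used only to break ties)
    """
    if best_gain[0] != best_gain[1]:
        return 0 if best_gain[0] > best_gain[1] else 1
    if policy == "toward":
        return last_from
    if policy == "away":
        return 1 - last_from
    if policy == "part0":
        return 0
    raise ValueError(f"unknown policy: {policy}")

print(choose_source_partition([3, 3], last_from=1, policy="toward"))  # 1
print(choose_source_partition([3, 3], last_from=1, policy="away"))    # 0
print(choose_source_partition([3, 3], last_from=1, policy="part0"))   # 0
```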
60How Much Can This Matter ?
- 5%?
- 10%?
- 20%?
- more?
- 50%?
- more?
61Implicit Decision Effects IBM01
62Effect of Implicit Decisions
- Stunning average cutsize difference for flat partitioner with worst vs. best combination
- far outweighs new improvements
- One wrong decision can lead to misleading conclusions w.r.t. other decisions
- part0 is worse than toward with zero delta gain updates
- better or same without zero delta gain updates
- Stronger optimization engines mask flaws
- ML CLIP > ML LIFO > Flat CLIP > Flat LIFO
- less dynamic range: ML masks bad flat implementation
63Tuning Effects
- Comparison of two CLIP-FM implementations
- Min and Ave cutsizes from 100 single-start trials
- Another quiz: Why did this happen?
- N.B. original inventor of CLIP-FM couldn't figure it out
64Tuning Effects
- Comparison of two CLIP-FM implementations
- Min and Ave cutsizes from 100 single-start trials
- Another quiz: Why did this happen?
- Hint: some modern IBM benchmarks have large macro-cells
65Sheer Nightmare Stuff...
- Comparison of two LIFO-FM implementations
- Min and Ave cut sizes from 100 single-start trials
- Papers 1, 2 both published since mid-1998
66In Case You Are Wondering... No, VLSI CAD Researchers Are Not Stupid.
67How Much Can This Matter ?
- 5%?
- 10%?
- 20%?
- more?
- 50%?
- more?
- Answer: 400%, 2000% w.r.t. recent literature and STANDARD, WELL-UNDERSTOOD heuristics
- lots more N years leading partitioner, placer
68Today's Talk
- Demonstrably useful solutions for real problems
- Valuation: What problems require attention?
- technology extrapolation
- automatic layout of phase-shifting masks
- Values: How do we advance the leading edge?
- anatomy of FM-based hypergraph partitioning heuristics
- culture change: restoring time-to-market and QOR in applied algorithmics via IP reuse
69"Barriers to Entry" for Researchers
- Code development barrier
- bare-bones self-contained partitioner: 800 lines
- not leading-edge (Dutt/Deng LIFO-FM)
- modern partitioner requires much more code
- Expertise barrier
- very small details can have stunning impact
- must not only know what to do, but also what not to do
- impossible to estimate knowledge/expertise required to do research at leading edge
- Need reference implementations!
- reference prose (6 pp. 9pt double-column) insufficient
70Barriers to Relevance for Researchers
- All heuristic engines/algorithms tuned to test cases
- Test case usage must capture real use models, driving applications
- e.g., recall bipartitioning is driven by top-down placement
- until CKM99, no one considered effect of fixed vertices !!!
- Test case usage can be fatally flawed by details
- hidden or previously unrealized
- previously believed insignificant
- results of algorithm research will be flawed as a result
71Challenges for Applied Algorithmics
- Research in mature areas can stall
- incremental research: difficult and risky
- implementations not available -> duplicated effort
- too much trust -> which approach is really the best?
- some results may not be replicable
- "not novel" is common reason for paper rejection
- exploratory research: paradoxically, lower-risk
- novelty for the sake of novelty
- yet, novel approaches must be well-substantiated
- Pitfalls: questionable value, roadblocks, obsolete contexts
72Challenges for Applied Algorithmics
- Difficult to be relevant (time-to-market, QOR issues)
- time to market: 5-7 year delay from publishing to first industrial use (cf. market lifetimes, tech extrapolation, ...)
- quality of results: unmeasurable, unpredictable, basically unknown
- Good news: barriers to entry and barriers to relevance are self-inflicted, and possibly curable
- mature domains require mature R&D methodologies
- a possible solution: cultivate flexibility and reuse
- low cost update of previous work to support reuse
- future tool/algorithm development biased towards reuse
73Analogy: Hardware Design and Tool Design
- Hardware design is difficult
- complex electrical engineering and optimization problems
- mistakes are costly
- verification and test not trivial
- few can afford to truly exploit the limits of technology
- A Winning Approach: Hardware IP reuse
- CAD tools design is difficult
- complex software engineering and optimization problems
- mistakes can be showstoppers
- verification and test not trivial
- few can manage complexity of leading-edge approaches
- A "Surprising" Idea: CAD-IP reuse
74What is CAD-IP?
- Data models and benchmarks
- context descriptions and use models
- testcases and good solutions
- Algorithms and algorithm analyses
- mathematical formulations
- comparison and evaluation methodologies for algorithms
- executables and source code of implementations
- leading-edge performance results
- Traditional (paper-based) publications
75Bookshelf: A Repository for CAD-IP
- Community memory for CAD-IP
- data models
- algorithms
- implementations
- Publication medium that enables efficient applied algorithmics algorithm research
- benchmarks, performance results
- algorithm descriptions and analyses
- quality implementations (e.g., open-source Capo, MLPart)
- Simplified comparisons to identify best approaches
- Easier for industry to communicate new use models
76Summary: Addressing Inefficiencies
- Inefficiencies
- lack of openness and standards -> huge duplication of effort
- incomparable reporting -> improvement difficult
- lack of standard comparison/latest use models -> best approach not clear
- industry doesn't bother w/feedback -> outdated use models
- Proposed solutions
- widely available, up-to-date, extensible benchmarks
- standardized performance reporting for leading-edge approaches
- available detailed descriptions of algorithms
- peer review of executables (and source code?)
- credit for quality implementations
- Better research, faster adoption, more impact
- http://vlsicad.cs.ucla.edu/GSRC/bookshelf/
77Today's Talk
- Demonstrably useful solutions for real problems
- Valuation: What problems require attention?
- technology extrapolation
- automatic layout of phase-shifting masks
- Values: How do we advance the leading edge?
- anatomy of FM-based hypergraph partitioning heuristics
- culture change: restoring time-to-market and QOR in applied algorithmics via IP reuse
- Thank you for your attention !!!
78Spare Slides
79Parameters
- Description of technology, circuit and design attributes
- Importance of consistent naming cannot be overstated
- Naming conventions for parameters
- <preposition> _ <principal> _ qualifier _ <place> _ <qualifier> _ <adverbial> _ <index> _ <unit>
- Example: r_int_tot_lyr_pu_dl
- Benefits
- Relatively easy to understand parameter from its name
- Distinguishable (no two parameters should have the same name)
- r_int (interconnect resistance) = r_int (interconnect resistivity) ?
- Unique (no two names for the same parameter)
- R_int = R_wire ?
- Sortable (important literals come first)
- Software to automatically check parameter naming
80Rules
- Methods to derive unknown parameters from known ones
- ASCII rules
- Laws of physics, models of electrical behavior
- Statistical models (e.g., Rent's rule)
- Include closed-form expressions, vector operations, tables
- Storing of calibration data (e.g., technology files) for known process, design points in lookup tables
- Constraints
- Simulated by rules that compute boolean values
- Used to limit range during sweeping
- Optimization over a collection of rules
- Example: buffer insertion for minimal delay with area constraints
81Rules (Cont.)
- External executable rules
- Assume a callable executable (e.g., PERL script)
- Example: optimization of number and size of repeaters for global wires
- Use command-line interface and transfer through files
- Allow complex semantics of a rule
- Example: placers, IPEM executable (Cong, UCLA)
- Code rules
- Implemented in C and linked into the inference engine
- Useful if execution speed is an issue
82Engine
- Contains no domain-specific knowledge
- Evaluates rules in topological order
- Performs studies (multiple evaluations: tradeoffs/sweeping, optimization)
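Evaluating rules in topological order can be sketched with Python's standard `graphlib` module. This is only an illustration of the idea of a domain-independent engine: the rule format (output name mapped to its inputs and a function) and the parameter names are hypothetical and are not GTX's grammar or naming scheme.

```python
from graphlib import TopologicalSorter

def evaluate(known, rules):
    """Derive parameters from known ones.

    known: dict of parameter name -> value
    rules: dict of output name -> (list of input names, function of those inputs)
    """
    deps = {out: set(inputs) for out, (inputs, _) in rules.items()}
    values = dict(known)
    # static_order() yields every parameter after all the parameters it depends on
    for name in TopologicalSorter(deps).static_order():
        if name in rules:
            inputs, fn = rules[name]
            values[name] = fn(*(values[i] for i in inputs))
    return values

# Hypothetical rule chain: per-unit-length values -> wire totals -> RC product.
rules = {
    "rc_product": (["r_wire", "c_wire"], lambda r, c: r * c),
    "r_wire": (["r_per_len", "length"], lambda r, l: r * l),
    "c_wire": (["c_per_len", "length"], lambda c, l: c * l),
}
known = {"r_per_len": 2, "c_per_len": 3, "length": 10}
values = evaluate(known, rules)
print(values["r_wire"], values["c_wire"], values["rc_product"])  # 20 30 600
```

The engine itself contains no knowledge of what the parameters mean; the order in which the rules are listed does not matter, only their dependencies.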
83Knowledge Representation
- Rules and parameters are specified separately from the derivation engine
- Human-readable ASCII grammar
- Benefits
- Easy creation/sharing of parameters/rules by multiple users
- D. Sylvester and C. Cao: device and power, SOI modules that drop in to GTX
- P.K. Nag: Yield modeling
- Extensible to models of arbitrary complexity (specialized prediction methods, technology data sets, optimization engines)
- Avant! Apollo or Cadence SE P&R tool: just another wirelength estimator
- Applies to any domain of work in semiconductors, VLSI CAD
- Transistor sizing, single wire optimizations, system-level wiring predictions, ...
84Corking Effect in CLIP
- CLIP begins by placing all moves into the 0-gain buckets
- CLIP chooses moves by cumulative delta gain (updated gain)
- initially, every move has cumulative delta gain = 0
- Historical legacy (and for speed): FM partitioners typically look only at the first move in a bucket
- if it is illegal, skip the rest of the bucket (possibly skip all buckets for that partition)
- If the move at the head of each bucket at the beginning of a CLIP pass is illegal, pass terminates without making any moves
- even if first move is legal, an illegal move soon afterward will cork
- New test cases (IBM) have large cells
- large cells have large degree, and often large initial gain
- CLIP inventor couldn't understand bad performance on IBM cases
85Tuning to Uncork CLIP
- Don't place nodes with area > balance constraint in gain container at pass initialization
- actually useful for all FM variants
- zero CPU overhead
- Look beyond the first move in a bucket
- extremely expensive
- hurts quality (partitioner doesn't operate well near balance tolerance)
- not worth it, in our experience
- Simply do a LIFO pass before starting CLIP
- spreads out nodes in gain buckets
- reduces likelihood that large node has largest total gain
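The first remedy can be illustrated on a single bucket. The sketch below is a toy model with hypothetical names: a move is treated as legal when the cell's area fits within a `slack` value standing in for the balance constraint, and move selection looks only at the head of the bucket, as described for typical FM partitioners.

```python
def pick_move(bucket, areas, slack):
    """Head-only selection: if the first move is illegal, give up on the bucket."""
    if not bucket:
        return None
    head = bucket[0]
    return head if areas[head] <= slack else None

def build_bucket(cells, areas, slack, uncork):
    """Fill the initial bucket, optionally leaving out cells that can never move legally."""
    if uncork:
        return [c for c in cells if areas[c] <= slack]
    return list(cells)

areas = {"macro": 50, "c1": 1, "c2": 2}  # one large macro-cell, two small cells
cells = ["macro", "c1", "c2"]
slack = 5

corked = pick_move(build_bucket(cells, areas, slack, uncork=False), areas, slack)
uncorked = pick_move(build_bucket(cells, areas, slack, uncork=True), areas, slack)
print(corked, uncorked)  # None c1
```

Filtering happens once at pass initialization, which is consistent with the slide's remark that the fix adds no CPU overhead during the pass.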
86Effect of Fixed Terminals
[Figure: two plots for IBM01: normalized cost and runtime]
87Enabling Reuse: Free Composability
88Conflict in Cell (Macro) Based Layouts
- Consider connected components of conflict graphs within each cell master
- each component independently phase-assignable (2^k versions)
- each is a single vertex in coarse-grain conflict graph
- problem: assure free composability (reusability) of cell masters, such that no odd cycles can arise in coarse-grain conflict graph
[Figure: cell master A and cell master B, each with connected components, joined by an edge in the coarse-grain conflict graph]
89Case I: Creating CAD IP of Questionable Value
- Recent hypergraph partitioning papers report FM implementations 20x worse than leading-edge FM
- previous lack of openness caused wrong conclusions, wasted effort
- some improvements may only apply to weak implementations
- duplicated effort re-implementing (incorrectly?) well-known algorithms
- difficult to find the leading edge
- no standard comparison methodology
- how do you know if an implementation is poor?
- To make leading-edge apparent and reproducible
- publish performance results on standard benchmarks
- peer review (executables, source code?)
- similar to common publication standards!
90Case II: Roadblocks to Creating Needed CAD-IP
- Best approach to global placement?
- recursive bisection (1970s)
- force-directed (1980s)
- simulated annealing (1980s)
- analytical (1990s)
- hybrids, others
- Why is this question difficult?
- latest public placement benchmarks are from 1980s
- data formats are bulky (hard to mix and match components)
- no public implementations since early 1990s
- new ideas are not compared to old
- To match approaches to new contexts
- agree on common up-to-date data model
- publish good format descriptions, benchmarks, performance results
- publish implementations
91Case III: Developing CAD-IP for Obsolete Contexts
- Global placement example
- much of academia studies variable-die placement
- row length and spacing not fixed
- explicit feedthroughs
- majority of industrial use is fixed-die
- pre-defined layout dimensions
- HPWL-driven vs. routability- or timing-driven
- runtimes are often not even reported
- this affects benchmarks and algorithms
- Solution: perform sanity checks and request feedback
- explicitly define use model and QOR measures
- establish a repository for up-to-date formats, benchmarks etc.
- peer review (executables, source code?)
92Implicit Decision Effects IBM02
93Reference Implementations
- Documentation does not allow replication of results
- amazingly, true even for "classic" algorithms
- true for vendor R&D, true for academic R&D
- Published reference implementations will raise quality
- minimum standard for algorithm implementation quality
- reduce barrier to entry for new R&D
94Conclusions
- Work with mature heuristics requires mature methodologies
- Identified research methodology risks
- Identified reporting methodology risks
- Community needs to adopt standards for both
- reference benchmark implementations
- vigilant awareness of use-model and context
- reporting method that facilitates comparison
95Application-Driven Research
- Well-studied areas have complex, "tuned" metaheuristics
- Risks of poor research methodologies
- irreproducible results or descriptions
- no enabling account of key insights underlying the contribution
- experimental evidence not useful to others
- inconsistent with driving use model
- missing comparisons with leading-edge approaches
- Let's look at some requirements this induces...
96The GSRC Bookshelf for CAD-IP
- Bookshelf consists of slots
- slots represent active research areas with enough customers
- collectively, the slots cover the field
- Who maintains slots?
- experts in each topic collaborate to produce them
- anyone can submit
- Currently, 10 active slots
- SAT (U. Michigan, Sakallah)
- Graph Coloring (UCLA, Potkonjak)
- Hypergraph Partitioning (UCLA, Kahng)
- Block Packing (UCSC, Dai)
- Placement (UCLA, Kahng)
- Global Routing (SUNY Binghamton, Madden)
- Single Interconnect Tree Synthesis (UIC, Lillis and UCLA, Cong)
- Commitments for more: BDDs, NLP, Test and Verification
97What's in a Slot?
- Introduction
- why this area is important and recent progress
- pointers to other resources (links, publications)
- Data formats used for benchmarks
- SAT, graph formats etc.
- new XML-based formats
- Benchmarks, solutions, performance results
- including experimental methodology (e.g., runtime-quality Pareto curve)
- Binary utilities
- format converters, instance generators, solution evaluators, legality checkers
- optimizers and solvers
- executables
- Implementation source code
- Other info relevant to algorithm research and implementations
- detailed algorithm descriptions
- algorithm comparisons
98Current Progress on the CAD-IP Bookshelf
- Bookshelf@gigascale.org
- 33 members (17 developers)
- Main policies and mechanisms published
- 10 active slots
- inc. executables, performance results for leading-edge partitioners, placers
- First Bookshelf Workshop, Nov. 1999
- attendance: UCSC, UCB, NWU, UIC, SUNY Binghamton, UCLA
- agreed on abstract syntax and semantics for initial slots
- committed to XML for common data formats
- peer review of slot webpages
- Ongoing research uses components in the Bookshelf