Valuation and Values in Application-Driven Algorithmics: Case Studies from VLSI CAD. Andrew B. Kahng, UCLA Computer Science Dept. June 2, 2000. abk@cs.ucla.edu, http://vlsicad.cs.ucla.edu

Transcript and Presenter's Notes

1
Valuation and Values in Application-Driven
Algorithmics: Case Studies from VLSI CAD
Andrew B. Kahng, UCLA Computer Science Dept.
June 2, 2000
abk@cs.ucla.edu, http://vlsicad.cs.ucla.edu
2
My Research
  • Applied algorithmics
  • demonstrably useful solutions for real problems
  • best known solutions
  • classic (well-studied): Steiner, partition,
    placement, TSP, ...
  • toolkits: discrete algorithms, global
    optimization, mathematical programming,
    approximation frameworks, new-age metaheuristics,
    engineering
  • Ground truths
  • anatomies
  • limits

3
Anatomies
  • Technologies
  • semiconductor process roadmap, design-manufacturing
    I/F
  • design technology: methodology, flows, design
    process
  • interconnect modeling/analysis: delay/noise
    estimation, compact models
  • Problems
  • structural theory of large-scale global
    optimizations
  • Heuristics
  • hypergraph partitioning and clustering
  • wirelength- and timing-driven placement
  • single/multiple topology synthesis (length,
    delay, skew, buffering,...)
  • TSP, ..., IP protection, ..., combinatorial
    exchange/auction, ...
  • Cultures
  • contexts and infrastructure for research and
    technology transfer

4
Bounds
  • Exact methods
  • Provable approximations
  • Technology extrapolation
  • achievable envelope of system implementation
    w.r.t. cost, speed, power, reliability, ...
  • ideally, should drive and be driven by system
    architectures, design and implementation
    methodologies

5
Today's Talk
  • Demonstrably useful solutions for real problems
  • Valuation: What problems require attention?
  • technology extrapolation
  • automatic layout of phase-shifting masks
  • Values: How do we advance the leading edge?
  • anatomy of FM-based hypergraph partitioning
    heuristics
  • culture change: restoring time-to-market and QOR
    in applied algorithmics via IP reuse

6
Today's Talk
  • Demonstrably useful solutions for real problems
  • Valuation: What problems require attention?
  • technology extrapolation
  • automatic layout of phase-shifting masks
  • Values: How do we advance the leading edge?
  • anatomy of FM-based hypergraph partitioning
    heuristics
  • culture change: restoring time-to-market and QOR
    in applied algorithmics via IP reuse

7
Technology Extrapolation
What is the most power-efficient noise management
strategy?
  • Evaluates impact of
  • design technology
  • process technology
  • Evaluates impact on
  • achievable design
  • associated design problems
  • What matters, when?
  • Sets new requirements for CAD tools and
    methodologies, capital and R&D investment, ...;
    right tech at the right time
  • Roadmaps (SIA ITRS): familiar and influential
    example

How and when do L, SOI, SER, etc. matter?
Will layout tools need to perform process
simulation to effectively address cross-die and
cross-wafer manufacturing variation?
8
GTX: GSRC Technology Extrapolation System
  • GTX is a framework for technology extrapolation

9
Graphical User Interface (GUI)
  • Provides user interaction
  • Visualization (plotting, printing, saving to
    file)
  • 4 views
  • Parameters
  • Rules
  • Rule chain
  • Values in chain

10
GTX: Open, Living Roadmap
  • Openness in grammar, parameters and rules
  • easy sharing of data, models in research
    environment
  • contributions of best known models from anywhere
  • Allows development of proprietary models
  • separation between supplied (shared) and
    user-defined parameters / rules
  • usability behind firewalls
  • functionality for sharing results instead of data
  • Multi-platform (SUN Solaris, Windows, Linux)
  • http://vlsicad.cs.ucla.edu/GSRC/GTX/

11
GTX Activity
  • Models implemented
  • Cycle-time models of SUSPENS (with extension by
    Takahashi), BACPAC (Sylvester, Berkeley), Fisher
    (ITRS)
  • Currently adding
  • GENESYS (with help from Georgia Tech)
  • RIPE (with help from RPI)
  • New device and power modules (Synopsys /
    Berkeley)
  • New SOI device model (Synopsys / Berkeley)
  • Inductance extraction (Silicon Graphics /
    Berkeley / Synopsys)
  • Studies performed in GTX
  • Modeling and parameter sensitivity analyses
  • Design optimization studies: global
    interconnects, layer stack
  • Routability estimation, via impact models, ...

12
Today's Talk
  • Demonstrably useful solutions for real problems
  • Valuation: What problems require attention?
  • technology extrapolation
  • automatic layout of phase-shifting masks
  • Values: How do we advance the leading edge?
  • anatomy of FM-based hypergraph partitioning
    heuristics
  • culture change: restoring time-to-market and QOR
    in applied algorithmics via IP reuse

13
Subwavelength Optical Lithography
Subwavelength gap since 0.35 µm
  • EUV, X-rays, E-beams all > 10 years out
  • huge investment in > 30 years of optical litho
    infrastructure

14
Mask Types
  • Bright Field
  • opaque features
  • transparent background
  • Dark Field
  • transparent features
  • opaque background

15
Phase Shifting Masks
16
Impact of PSM
  • PSM enables smaller transistor gate lengths Leff
  • critical polysilicon features only (gate Leff)
  • faster device switching → faster circuits
  • better critical dimension (CD) control →
    improved parametric yield
  • all features on polysilicon layer, local
    interconnect layers
  • smaller die area → more $/wafer (full-chip
    PSM = BIG win)
  • Alternative: build a $10B fab with equipment
    that won't exist for 5 years
  • Data points
  • exponential increase in price of CAD technology
    for PSM
  • Numerical Technologies market cap 3x that of
    Avant!
  • 25 nm gates (!!!) manufactured with 248 nm DUV
    steppers (NTI, MIT Lincoln Labs, announced 2
    days ago); 90 nm gates in production at Motorola,
    Lucent (since late 1999)

17
Double-Exposure Bright-Field PSM
[Figure: double-exposure bright-field PSM with 0 and 180 phase regions]
18
The Phase Assignment Problem
  • Assign 0, 180 phase regions such that critical
    features with width (separation) < B are induced
    by adjacent phase regions with opposite phases
  • Bright Field
    (Dark Field)

[Figure: alternating 180 / 0 / 180 / 0 phase regions]
19
Key: Global 2-Colorability
  • If there is an odd cycle of phase implications,
    the layout cannot be manufactured
  • layout verification becomes a global, not local,
    issue
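
The global nature of the check can be illustrated with a standard breadth-first 2-coloring. This is only a minimal sketch, not the talk's implementation: the function name `two_color` and the edge-list input are assumptions made for illustration, and every edge is taken to mean "endpoints need opposite phases".

```python
from collections import deque


def two_color(num_nodes, edges):
    """Return a phase (0 or 180) per node, or None if an odd cycle exists.

    Hypothetical helper: each edge (u, v) demands opposite phases on u and v.
    """
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    phase = [None] * num_nodes
    for start in range(num_nodes):
        if phase[start] is not None:
            continue
        phase[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if phase[v] is None:
                    phase[v] = 180 - phase[u]   # opposite phase
                    queue.append(v)
                elif phase[v] == phase[u]:
                    return None                 # odd cycle: not phase-assignable
    return phase
```

An even cycle is assignable while a triangle is not, even though every individual edge of the triangle looks harmless locally.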

[Figure: phase regions labeled 0 and 180 around an odd cycle; one region marked "?"]
20
Critical features F1, F2, F3, F4
[Figure: layout with critical features F1-F4]
21
Opposite-Phase Shifters (0,180)
[Figure: features F1-F4 with opposite-phase shifters]
22
Shifters S1-S8
[Figure: features F1-F4 with shifters S1-S8]
  • PROPER Phase Assignment
  • Opposite phases for opposite shifters
  • Same phase for overlapping shifters

23
[Figure: features F1-F4 with shifters S1-S8, phase conflict marked]
Phase Conflict
Proper Phase Assignment is IMPOSSIBLE
24
Phase Conflict Resolution
[Figure: features F1-F4 with shifters S1-S8, phase conflict marked]
Phase Conflict
feature shifting to remove overlap
25
Phase Conflict Resolution
[Figure: features F1-F4 with shifters S1-S4, S7, S8, phase conflict marked]
Phase Conflict
feature widening to turn conflict into
non-conflict
26
How will VLSI CAD deal with PSM?
  • UCLA: first comprehensive methodology for
    PSM-aware layout design
  • currently being integrated by Cadence, Numerical
    Technologies
  • Approach: partition responsibility for
    phase-assignability
  • good layout practices (local geometry)
  • (open) problem: is there a set of design rules
    that guarantees phase-assignability of layout?
    (no T's, no doglegs, even fingers...)
  • automatic phase conflict resolution /
    bipartization (global colorability)
  • enabling reuse of layout (free composability)
  • problem: how can we guarantee reusability of
    phase-assigned layouts, such that no odd cycles
    can occur when the layouts are composed together
    in a larger layout?

27
Automatic Conflict Resolution
28
Compaction-Oriented Approach
  • Analyze input layout
  • Find min-cost set of perturbations needed to
    eliminate all odd cycles
  • Induce constraints for output layout
  • i.e., PSM-induced (shape, spacing) constraints
  • Compact to get phase-assignable layout
  • Key: Minimize the set of new constraints,
    i.e., break all odd cycles in conflict graph by
    deleting a minimum number of edges.

29
Conflict Graph
  • Dark Field: build graph over feature regions
  • edge between two features whose separation is < B
  • Bright Field: build graph over shifter regions
  • shifters for features whose width is < B
  • two edge types
  • adjacency edge between overlapping phase regions:
    endpoints must have same phase
  • conflict edge between shifters on opposite sides
    of critical feature: endpoints must have
    opposite phase
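
The two bright-field edge types can be checked together with a union-find structure that tracks parity. The sketch below is an illustration under assumptions, not the method from the talk: shifters are numbered 0..n-1, and `assign_phases`, `adjacency_edges`, and `conflict_edges` are hypothetical names.

```python
def assign_phases(num_shifters, adjacency_edges, conflict_edges):
    """Return a phase (0 or 180) per shifter, or None on a phase conflict.

    adjacency_edges: pairs that must share a phase (overlapping regions).
    conflict_edges:  pairs that must differ (opposite sides of a feature).
    """
    parent = list(range(num_shifters))
    parity = [0] * num_shifters        # parity of a node relative to its parent

    def find(x):
        """Return (root, parity of x relative to root), compressing the path."""
        if parent[x] == x:
            return x, 0
        root, p = find(parent[x])
        parent[x] = root
        parity[x] ^= p
        return root, parity[x]

    def union(a, b, diff):
        """Require parity(a) xor parity(b) == diff; False if contradictory."""
        ra, pa = find(a)
        rb, pb = find(b)
        if ra == rb:
            return (pa ^ pb) == diff
        parent[ra] = rb
        parity[ra] = pa ^ pb ^ diff
        return True

    for a, b in adjacency_edges:
        if not union(a, b, 0):
            return None
    for a, b in conflict_edges:
        if not union(a, b, 1):
            return None
    return [180 * find(s)[1] for s in range(num_shifters)]
```

A `None` result corresponds to an odd cycle of phase implications; the sketch only detects the conflict and does not decide which edges to break.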

30
Conflict Graph G
[Figure: conflict graph G for dark field (green = feature, pink = conflict) and for bright field (conflict edges, adjacency edges)]
31
Optimal Odd Cycle Elimination
[Figure: conflict graph G (dark green = feature, pink = conflict), dual graph D, and T-join of odd-degree nodes in D]
32
Optimal Odd Cycle Elimination
  • assign phases: dark green and purple
  • remaining pink conflicts correctly handled
  • T-join of odd-degree nodes in D corresponds to
    broken edges in original conflict graph
[Figure: phase-assigned layout (dark green = feature, pink = conflict)]
33
The T-join Problem
  • How to delete minimum-cost set of edges from
    conflict graph G to eliminate odd cycles?
  • Construct geometric dual graph D = dual(G)
  • Find odd-degree vertices T in D
  • Solve the T-join problem in D
  • find min-weight edge set J in D such that
  • all T-vertices have odd degree
  • all other vertices have even degree
  • Solution J corresponds to desired min-cost edge
    set in conflict graph G

34
Solving T-join in Sparse Graphs
  • Reduction to matching
  • construct a complete graph T(G)
  • vertices = T-vertices
  • edge costs = shortest-path cost
  • find minimum-cost perfect matching
  • Typical example: sparse (not always planar)
    graph
  • note that conflict graphs are sparse
  • vertices = 1,000,000
  • edges ≈ 5 × vertices
  • T-vertices ≈ 10% of vertices = 100,000
  • Drawback: finding APSP too slow, memory-consuming
  • vertices = 100,000 ⇒ edges in T(G) ≈
    5,000,000,000
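
The classical reduction above can be sketched on a toy graph. The memory figure follows from the complete graph on 100,000 T-vertices having 100,000 × 99,999 / 2 ≈ 5 × 10^9 edges. The code below is a minimal illustration under stated assumptions: the graph is simple, connected, and has non-negative edge costs; the matching is found by exhaustive search, which is only viable for a handful of T-vertices; and `t_join`, `shortest_paths`, and `min_perfect_matching` are hypothetical helper names, not routines from the talk.

```python
import heapq


def shortest_paths(adj, source):
    """Dijkstra from source; returns (dist, prev) dictionaries."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def min_perfect_matching(nodes, cost):
    """Exhaustive min-cost perfect matching; toy sizes only."""
    if not nodes:
        return 0, []
    first, rest = nodes[0], nodes[1:]
    best = (float("inf"), [])
    for i, partner in enumerate(rest):
        sub_cost, sub_pairs = min_perfect_matching(rest[:i] + rest[i + 1:], cost)
        total = cost[first][partner] + sub_cost
        if total < best[0]:
            best = (total, [(first, partner)] + sub_pairs)
    return best


def t_join(num_vertices, weighted_edges, T):
    """Return a min-cost T-join as a set of frozenset({u, v}) edges."""
    adj = {v: [] for v in range(num_vertices)}
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist, prev = {}, {}
    for t in T:
        dist[t], prev[t] = shortest_paths(adj, t)
    _, pairs = min_perfect_matching(sorted(T), dist)
    join = set()
    for s, t in pairs:
        v = t
        while v != s:                     # walk the shortest path back to s
            u = prev[s][v]
            join ^= {frozenset((u, v))}   # symmetric difference of paths
            v = u
    return join
```

Each matched pair contributes its shortest path, and taking the symmetric difference of those paths leaves exactly the T-vertices with odd degree.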

35
Solving T-join: Reduction to Matching
  • Desirable properties of reduction to matching
  • exact (i.e., optimal)
  • not much memory (say, 2-3X more)
  • leads to very fast solution
  • Solution: gadgets!
  • replace each edge/vertex with gadgets s.t.
    matching all vertices in gadgeted graph
    ⇔ T-join in original graph

36
T-join Problem: Reduction to Matching
  • replace each vertex with a chain of triangles
  • one more edge for T-vertices
  • in graph D: m edges, n vertices, t = |T|
  • in gadgeted graph: 4m-2n-t vertices, 7m-5n-t
    edges
  • cost of red edges = original dual edge costs;
    cost of (black) edges in triangles = 0

vertex ∈ T
vertex ∉ T
37
Example of Gadgeted Graph
[Figure: dual graph and its gadgeted graph; black + red edges = min-cost perfect matching]
38
Results
  • Runtimes in CPU seconds on Sun Ultra-10
  • Greedy: breadth-first-search bicoloring
  • GW: Goemans/Williamson 95 heuristic
  • Cook/Rohe 98 for perfect matching
  • Integration w/compactor saves 9% layout area
    vs. GW

39
[Figure: features F1-F4 with shifters S1-S8]
Can distinguish between use of shifting, widening
DOFs
40
[Figure legend] Black points: features. Blue: shifter
overlap. Red: extra nodes to distinguish
opposite shifters.
Bipartization Problem: delete min # of nodes
(or edges) to make graph bipartite
  • blue nodes: shifting
  • red nodes: widening
Bipartization by node deletion is
NP-hard (GW98: 9/4-approx)
41
Summary
  • New fast, optimal algorithms for edge-deletion
    bipartization
  • Fast T-join using gadgets
  • applicable to any AltPSM phase conflict graphs
  • Approximate solution for node-deletion
    bipartization
  • Goemans-Williamson 98 9/4-approximation
  • If node-deletion cost < 1.5 × edge deletion, GW is
    better than edge deletion
  • Comprehensive integration w/NTI, Cadence tools

42
Today's Talk
  • Demonstrably useful solutions for real problems
  • Valuation: What problems require attention?
  • technology extrapolation
  • automatic layout of phase-shifting masks
  • Values: How do we advance the leading edge?
  • anatomy of FM-based hypergraph partitioning
    heuristics
  • culture change: restoring time-to-market and QOR
    in applied algorithmics via IP reuse

43
Applied Algorithmics R&D
  • Heuristics for hard problems
  • Problems have practical context
  • Choices dominated by engineering tradeoffs
  • QOR vs. resource usage, accessibility,
    adoptability
  • How do you know/show that your approach is good?

44
Hypergraphs in VLSI CAD
  • Circuit netlist represented by hypergraph

45
Hypergraph Partitioning in VLSI
  • Variants
  • directed/undirected hypergraphs
  • weighted/unweighted vertices, edges
  • constraints, objectives, ...
  • Human-designed instances
  • Benchmarks
  • up to 4,000,000 vertices
  • sparse (vertex degree 4, hyperedge size 4)
  • small number of very large hyperedges
  • Efficiency, flexibility: KL-FM style preferred

46
Context: Top-Down VLSI Placement
etc
47
Context: Top-Down Placement
  • Speed
  • 6,000 cells/minute to final detailed placement
  • partitioning used only in top-down global
    placement
  • implied partitioning runtime: 1 second for
    25,000 cells, < 30 seconds for 750,000 cells
  • Structure
  • tight balance constraint on total cell areas in
    partitions
  • widely varying cell areas
  • fixed terminals (pads, terminal propagation, etc.)

48
Fiduccia-Mattheyses (FM) Approach
  • Pass
  • start with all vertices free to move (unlocked)
  • label each possible move with immediate change in
    cost that it causes (gain)
  • iteratively select and execute a move with
    highest gain, lock the moving vertex (i.e.,
    cannot move again during the pass), and update
    affected gains
  • best solution seen during the pass is adopted as
    starting solution for next pass
  • FM
  • start with some initial solution
  • perform passes until a pass fails to improve
    solution quality
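
The pass structure above can be sketched for unit-area, unit-weight bipartitioning. This is a deliberately naive illustration, not a leading-edge implementation: gains are recomputed from scratch instead of being kept in gain buckets, so a pass is not linear-time, and the names `fm`, `fm_pass`, `move_gain`, and the `max_imbalance` balance rule are assumptions made for the example.

```python
def cut_size(nets, side):
    """Number of nets with vertices on both sides."""
    return sum(1 for net in nets if len({side[v] for v in net}) == 2)


def move_gain(v, nets, nets_of, side):
    """Decrease in cut size if v alone moves to the other side."""
    gain = 0
    for n in nets_of[v]:
        same = sum(1 for u in nets[n] if side[u] == side[v])
        other = len(nets[n]) - same
        if same == 1 and other > 0:
            gain += 1   # v is the last vertex on its side: net becomes uncut
        elif other == 0 and same > 1:
            gain -= 1   # net lies entirely on v's side: net becomes cut
    return gain


def fm_pass(nets, side, max_imbalance=2):
    """One pass; returns (best_cut, best_side) seen during the pass."""
    n = len(side)
    side = list(side)
    nets_of = [[] for _ in range(n)]
    for i, net in enumerate(nets):
        for v in net:
            nets_of[v].append(i)
    sizes = [side.count(0), side.count(1)]
    locked = [False] * n
    cut = cut_size(nets, side)
    best_cut, best_side = cut, list(side)
    for _ in range(n):
        legal = [v for v in range(n) if not locked[v]
                 and abs((sizes[side[v]] - 1) - (sizes[1 - side[v]] + 1))
                 <= max_imbalance]
        if not legal:
            break
        # Highest gain first; ties go to the lowest vertex index,
        # which is itself an implicit decision.
        v = max(legal, key=lambda u: move_gain(u, nets, nets_of, side))
        cut -= move_gain(v, nets, nets_of, side)
        sizes[side[v]] -= 1
        side[v] = 1 - side[v]
        sizes[side[v]] += 1
        locked[v] = True                  # cannot move again during this pass
        if cut < best_cut:                # keep the first best solution seen
            best_cut, best_side = cut, list(side)
    return best_cut, best_side


def fm(nets, side, max_imbalance=2):
    """Repeat passes until a pass fails to improve the cut."""
    best_cut = cut_size(nets, side)
    while True:
        new_cut, new_side = fm_pass(nets, side, max_imbalance)
        if new_cut >= best_cut:
            return best_cut, list(side)
        best_cut, side = new_cut, new_side
```

For two triangles joined by one net, `fm([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)], [0, 1, 0, 1, 0, 1])` separates the triangles and leaves a single cut net.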

49
Cut During One Pass (Bipartitioning)
[Plot: cut size (y-axis) vs. moves (x-axis) during one pass]
50
Multilevel Partitioning
[Figure: multilevel partitioning, with clustering and refinement phases]
51
Key Elements of FM
  • Three main operations
  • computation of initial gain values at beginning
    of pass
  • retrieval of the best-gain (feasible) move
  • update of all affected gain values after a move
    is made
  • Contribution of Fiduccia and Mattheyses
  • circuit hypergraphs are sparse
  • move gain is bounded between 2 , -2 max
    vertex degree
  • hash moves by gains (gain bucket structure)
  • each gain affected by a move is updated in
    constant time
  • linear time complexity per pass
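
A gain bucket structure of the kind described above might look as follows. This is a sketch under assumptions: the class name `GainBuckets` and its methods are hypothetical, gains are taken to lie in [-max_gain, +max_gain], and Python's insertion-ordered `dict` stands in for the doubly-linked lists a production partitioner would use. Buckets here behave as LIFO stacks.

```python
class GainBuckets:
    """Array of buckets indexed by gain, with a pointer to the best bucket."""

    def __init__(self, max_gain):
        self.max_gain = max_gain
        # One bucket per possible gain value in [-max_gain, +max_gain].
        self.buckets = [dict() for _ in range(2 * max_gain + 1)]
        self.gain_of = {}
        self.top = -1                  # highest bucket index that may be non-empty

    def insert(self, v, gain):
        idx = gain + self.max_gain
        self.buckets[idx][v] = None    # dict keeps insertion order
        self.gain_of[v] = gain
        self.top = max(self.top, idx)

    def remove(self, v):
        gain = self.gain_of.pop(v)
        del self.buckets[gain + self.max_gain][v]

    def update(self, v, delta):
        """Move v to the bucket for its new gain (constant time on average)."""
        gain = self.gain_of[v] + delta
        self.remove(v)
        self.insert(v, gain)

    def pop_best(self):
        """Remove and return (vertex, gain) from the highest non-empty bucket."""
        while self.top >= 0 and not self.buckets[self.top]:
            self.top -= 1
        if self.top < 0:
            return None
        v, _ = self.buckets[self.top].popitem()   # most recently inserted (LIFO)
        return v, self.gain_of.pop(v)
```

Switching `popitem()` for removal of the oldest key would give FIFO buckets, which is one of the implicit decisions the talk discusses.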

52
Taxonomy of Algorithm and Implementation
Improvements
  • Modifications of the algorithm
  • Implicit decisions
  • Tuning that can change the result
  • Tuning that cannot change the result

53
Modifications of the Algorithm
  • Important changes to flow, new steps/features
  • lookahead tie-breaking
  • CLIP
  • instead of actual gain, maintain updated gain =
    actual gain minus
    initial gain (at start of pass)
  • WHY ???
  • cut-line refinement
  • insert nodes into gain structure only if incident
    to cut nets
  • multiple unlocking

54
Modifications of the Algorithm
  • Important changes to flow, new steps/features
  • lookahead tie-breaking
  • CLIP
  • instead of actual gain, maintain updated gain =
    actual gain minus
    initial gain
  • promotes clustered moves (similar to LIFO
    gain buckets)
  • cut-line refinement
  • insert nodes into gain structure only if incident
    to cut nets
  • multiple unlocking

55
Implicit Decisions
  • Tie-breaking in choosing highest gain bucket
  • Tie-breaking in where to attach new element in
    gain bucket
  • LIFO vs. FIFO vs. random ... (known issue: HK 95)
  • Whether to update, or skip updating, when delta
    gain of a move is zero
  • Tie-breaking when selecting the best solution
    seen during pass
  • first encountered, last encountered,
    best-balance, ...

56
Tuning That Can Change the Result
  • Threshold large nets to reduce runtime
  • Skip gain update for large nets
  • Skip zero delta gain updates
  • changes resolution of hash collisions in gain
    container
  • Loose/stable net removal
  • perform gain updates for only selected nets
  • Allow illegal solutions during pass

57
Tuning That Can't Change the Result
  • Skip updates for nets that cannot have
    non-zero delta gain
  • netcut-specific optimizations
  • 2-way specific optimizations
  • optimizations for nets of small degree
  • .....
  • ... 30 years since KL70, 18 years since FM82,
    100s of papers in literature

58
Zero Delta Gain Update
  • When vertex x is moved, gains for all vertices y
    on nets incident to x must potentially be updated
  • In all FM implementations, this is done by going
    through incident nets one at a time, computing
    changes in gain for vertices y on these nets
  • Implicit decision
  • reinsert a vertex y when it experiences a zero
    delta gain move (will shift position of y within
    the same gain bucket)
  • skip the gain update (leave position of y
    unchanged)
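
The effect of this implicit decision on bucket order can be seen in a few lines. The sketch assumes LIFO buckets stored as Python lists whose last element is the head; `update_gain`, `head_of_best_bucket`, and `demo` are hypothetical names used only for illustration.

```python
def update_gain(buckets, gain_of, v, delta, skip_zero_delta):
    """buckets maps gain -> list used as a LIFO stack (head is the last item)."""
    if delta == 0 and skip_zero_delta:
        return                          # leave v where it is in its bucket
    buckets[gain_of[v]].remove(v)       # linear here; real code uses linked lists
    gain_of[v] += delta
    buckets.setdefault(gain_of[v], []).append(v)   # v becomes the new head


def head_of_best_bucket(buckets):
    """Vertex that a head-only selection rule would pick next."""
    best = max(g for g, bucket in buckets.items() if bucket)
    return buckets[best][-1]


def demo(skip_zero_delta):
    buckets = {0: ["a", "b"]}           # "b" is currently the head
    gain_of = {"a": 0, "b": 0}
    update_gain(buckets, gain_of, "a", 0, skip_zero_delta)
    return head_of_best_bucket(buckets)
```

With `demo(True)` the next move is still `"b"`; with `demo(False)` the zero-delta reinsertion promotes `"a"` to the head, so the two variants follow different move sequences from the same starting solution.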

59
Tie-Breaking Between Highest-Gain Buckets
  • Gain container typically implemented such that
    available moves are segregated, e.g., by source
    or destination partition
  • There can be more than one highest-gain bucket
  • When balance constraint is anything other than
    exact bisection, moves at multiple highest-gain
    buckets can be legal
  • Implicit decision
  • choose the move that is from the same partition
    as the last vertex moved (toward)
  • choose the move that is not from the same
    partition as the last vertex moved (away)
  • choose the move in partition 0 (part0)
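
The three policies can be written down as a small selection function. This is a sketch for 2-way partitioning only; the function name, the `best_gain` representation, and the fallback to partition 0 on the first move of a pass are assumptions, not details given in the talk.

```python
def pick_source_partition(best_gain, last_source, policy):
    """Choose which partition to move a vertex out of.

    best_gain[p] is the highest gain among legal moves out of partition p,
    or None if partition p has no legal move. last_source is the partition
    the previously moved vertex came from (None before the first move).
    """
    legal = [p for p in (0, 1) if best_gain[p] is not None]
    if not legal:
        return None
    top = max(best_gain[p] for p in legal)
    tied = [p for p in legal if best_gain[p] == top]
    if len(tied) == 1:
        return tied[0]                  # no tie to break
    if policy == "part0" or last_source is None:
        return 0
    if policy == "toward":
        return last_source              # same partition as the last move
    if policy == "away":
        return 1 - last_source          # the other partition
    raise ValueError(f"unknown policy: {policy}")
```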

60
How Much Can This Matter?
  • 5%?
  • 10%?
  • 20%?
  • more?
  • 50%?
  • more?

61
Implicit Decision Effects: IBM01
62
Effect of Implicit Decisions
  • Stunning average cutsize difference for flat
    partitioner with worst vs. best combination
  • far outweighs new improvements
  • One wrong decision can lead to misleading
    conclusions w.r.t. other decisions
  • "part0" is worse than "toward" with zero delta
    gain updates
  • better or same without zero delta gain updates
  • Stronger optimization engines mask flaws
  • ML CLIP > ML LIFO > Flat CLIP > Flat LIFO
  • less dynamic range: ML masks bad flat
    implementation

63
Tuning Effects
  • Comparison of two CLIP-FM implementations
  • Min and Ave cutsizes from 100 single-start trials
  • Another quiz: Why did this happen?
  • N.B.: original inventor of CLIP-FM couldn't
    figure it out

64
Tuning Effects
  • Comparison of two CLIP-FM implementations
  • Min and Ave cutsizes from 100 single-start trials
  • Another quiz: Why did this happen?
  • Hint: some modern IBM benchmarks have large
    macro-cells

65
Sheer Nightmare Stuff...
  • Comparison of two LIFO-FM implementations
  • Min and Ave cut sizes from 100 single-start
    trials
  • Papers 1, 2 both published since mid-1998

66
In Case You Are Wondering... No, VLSI CAD
Researchers Are Not Stupid.
67
How Much Can This Matter?
  • 5%?
  • 10%?
  • 20%?
  • more?
  • 50%?
  • more?
  • Answer: 400%, 2000% w.r.t. recent
    literature and STANDARD, WELL-UNDERSTOOD
    heuristics
  • lots more N years leading partitioner,
    placer

68
Today's Talk
  • Demonstrably useful solutions for real problems
  • Valuation: What problems require attention?
  • technology extrapolation
  • automatic layout of phase-shifting masks
  • Values: How do we advance the leading edge?
  • anatomy of FM-based hypergraph partitioning
    heuristics
  • culture change: restoring time-to-market and QOR
    in applied algorithmics via IP reuse

69
"Barriers to Entry" for Researchers
  • Code development barrier
  • bare-bones self-contained partitioner: 800 lines
  • not leading-edge (Dutt/Deng LIFO-FM)
  • modern partitioner requires much more code
  • Expertise barrier
  • very small details can have stunning impact
  • must not only know what to do, but also what not
    to do
  • impossible to estimate knowledge/expertise
    required to do research at leading edge
  • Need reference implementations !
  • reference prose (6 pp. 9pt double-column)
    insufficient

70
Barriers to Relevance for Researchers
  • All heuristic engines/algorithms tuned to test
    cases
  • Test case usage must capture real use models,
    driving applications
  • e.g., recall bipartitioning is driven by top-down
    placement
  • until CKM99 no one considered effect of fixed
    vertices !!!
  • Test case usage can be fatally flawed by
    details
  • hidden or previously unrealized
  • previously believed insignificant
  • results of algorithm research will be flawed as a
    result

71
Challenges for Applied Algorithmics
  • Research in mature areas can stall
  • incremental research - difficult and risky
  • implementations not available ⇒ duplicated effort
  • too much trust ⇒ which approach is really the
    best?
  • some results may not be replicable
  • "not novel" is common reason for paper rejection
  • exploratory research - paradoxically, lower-risk
  • novelty for the sake of novelty
  • yet, novel approaches must be well-substantiated
  • Pitfalls: questionable value, roadblocks,
    obsolete contexts

72
Challenges for Applied Algorithmics
  • Difficult to be relevant (time-to-market, QOR
    issues)
  • time to market: 5-7 year delay from publishing to
    first industrial use (cf. market lifetimes, tech
    extrapolation...)
  • quality of results: unmeasurable, unpredictable,
    basically unknown
  • Good news: barriers to entry and barriers to
    relevance are self-inflicted, and possibly
    curable
  • mature domains require mature R&D methodologies
  • a possible solution: cultivate flexibility and
    reuse
  • low cost update of previous work to support
    reuse
  • future tool/algorithm development biased towards
    reuse

73
Analogy: Hardware Design and Tool Design
  • Hardware design is difficult
  • complex electrical engineering and optimization
    problems
  • mistakes are costly
  • verification and test not trivial
  • few can afford to truly exploit the limits of
    technology
  • A Winning Approach: Hardware IP reuse
  • CAD tools design is difficult
  • complex software engineering and optimization
    problems
  • mistakes can be showstoppers
  • verification and test not trivial
  • few can manage complexity of leading-edge
    approaches
  • A "Surprising" Idea: CAD-IP reuse

74
What is CAD-IP?
  • Data models and benchmarks
  • context descriptions and use models
  • testcases and good solutions
  • Algorithms and algorithm analyses
  • mathematical formulations
  • comparison and evaluation methodologies for
    algorithms
  • executables and source code of implementations
  • leading-edge performance results
  • Traditional (paper-based) publications

75
Bookshelf: A Repository for CAD-IP
  • Community memory for CAD-IP
  • data models
  • algorithms
  • implementations
  • Publication medium that enables efficient applied
    algorithmics algorithm research
  • benchmarks, performance results
  • algorithm descriptions and analyses
  • quality implementations (e.g., open-source Capo,
    MLPart)
  • Simplified comparisons to identify best
    approaches
  • Easier for industry to communicate new use models

76
Summary: Addressing Inefficiencies
  • Inefficiencies
  • lack of openness and standards ⇒ huge duplication
    of effort
  • incomparable reporting ⇒ improvement difficult
  • lack of standard comparison/latest use models ⇒
    best approach not clear
  • industry doesn't bother w/feedback ⇒ outdated use
    models
  • Proposed solutions
  • widely available, up-to-date, extensible
    benchmarks
  • standardized performance reporting for
    leading-edge approaches
  • available detailed descriptions of algorithms
  • peer review of executables (and source code?)
  • credit for quality implementations
  • Better research, faster adoption, more impact
  • http://vlsicad.cs.ucla.edu/GSRC/bookshelf/

77
Today's Talk
  • Demonstrably useful solutions for real problems
  • Valuation: What problems require attention?
  • technology extrapolation
  • automatic layout of phase-shifting masks
  • Values: How do we advance the leading edge?
  • anatomy of FM-based hypergraph partitioning
    heuristics
  • culture change: restoring time-to-market and QOR
    in applied algorithmics via IP reuse
  • Thank you for your attention!

78
Spare Slides
79
Parameters
  • Description of technology, circuit and design
    attributes
  • Importance of consistent naming cannot be
    overstated
  • Naming conventions for parameters
  • <preposition> _ <principal> _ qualifier _
    <place> _ <qualifier> _ <adverbial> _
    <index> _ <unit>
  • Example: r_int_tot_lyr_pu_dl
  • Benefits
  • Relatively easy to understand parameter from its
    name
  • Distinguishable (no two parameters should have
    the same name)
  • r_int (interconnect resistance) vs. r_int
    (interconnect resistivity)?
  • Unique (no two names for the same parameter)
  • R_int vs. R_wire?
  • Sortable (important literals come first)
  • Software to automatically check parameter naming

80
Rules
  • Methods to derive unknown parameters from known
    ones
  • ASCII rules
  • Laws of physics, models of electrical behavior
  • Statistical models (e.g., Rent's rule)
  • Include closed-form expressions, vector
    operations, tables
  • Storing of calibration data (e.g., technology
    files) for known process, design points in
    lookup tables
  • Constraints
  • Simulated by rules that compute boolean values
  • Used to limit range during sweeping
  • Optimization over a collection of rules
  • Example: buffer insertion for minimal delay with
    area constraints

81
Rules (Cont.)
  • External executable rules
  • Assume a callable executable (e.g., Perl script)
  • Example: optimization of number and size of
    repeaters for global wires
  • Use command-line interface and transfer through
    files
  • Allow complex semantics of a rule
  • Example: placers, IPEM executable (Cong, UCLA)
  • Code rules
  • Implemented in C and linked into the inference
    engine
  • Useful if execution speed is an issue

82
Engine
  • Contains no domain-specific knowledge
  • Evaluates rules in topological order
  • Performs studies (multiple evaluations:
    tradeoffs/sweeping, optimization)
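
Evaluating rules in topological order can be sketched with Python's standard `graphlib` module (Python 3.9+). This is not GTX code: the `evaluate` function, the rule representation as `output -> (inputs, function)`, and the parameter names in the usage note are all hypothetical.

```python
from graphlib import TopologicalSorter


def evaluate(rules, known):
    """Derive unknown parameters from known ones.

    rules: maps an output parameter name to (input names, function).
    known: maps parameter names to already-known values.
    The engine itself holds no domain knowledge; it only orders and calls rules.
    """
    # Each output depends on its inputs; inputs must be evaluated first.
    graph = {out: set(inputs) for out, (inputs, _) in rules.items()}
    values = dict(known)
    for name in TopologicalSorter(graph).static_order():
        if name in rules and name not in values:
            inputs, fn = rules[name]
            values[name] = fn(*(values[i] for i in inputs))
    return values
```

For example, with `rules = {"area": (("width", "height"), lambda w, h: w * h), "resistance": (("rho", "length", "area"), lambda rho, l, a: rho * l / a)}` and known values for `width`, `height`, `rho`, and `length`, the sketch computes `area` before `resistance` regardless of the order in which the rules were listed. A cyclic rule set makes `static_order()` raise `graphlib.CycleError`.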

83
Knowledge Representation
  • Rules and parameters are specified separately
    from the derivation engine
  • Human-readable ASCII grammar
  • Benefits
  • Easy creation/sharing of parameters/rules by
    multiple users
  • D. Sylvester and C. Cao: device and power, SOI
    modules that drop in to GTX
  • P.K. Nag: Yield modeling
  • Extensible to models of arbitrary complexity
    (specialized prediction methods, technology data
    sets, optimization engines)
  • Avant! Apollo or Cadence SE P&R tool: just
    another wirelength estimator
  • Applies to any domain of work in semiconductors,
    VLSI CAD
  • Transistor sizing, single wire optimizations,
    system-level wiring predictions, ...

84
Corking Effect in CLIP
  • CLIP begins by placing all moves into the 0-gain
    buckets
  • CLIP chooses moves by cumulative delta gain
    (updated gain)
  • initially, every move has cumulative delta gain
    0
  • Historical legacy (and for speed): FM
    partitioners typically look only at the first
    move in a bucket
  • if it is illegal, skip the rest of the bucket
    (possibly skip all buckets for that partition)
  • If the move at the head of each bucket at the
    beginning of a CLIP pass is illegal, pass
    terminates without making any moves
  • even if first move is legal, an illegal move soon
    afterward will cork
  • New test cases (IBM) have large cells
  • large cells have large degree, and often large
    initial gain
  • CLIP inventor couldn't understand bad performance
    on IBM cases
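
The corking effect can be reproduced with a head-only selection rule. The sketch below is an illustration under assumptions: buckets are plain lists ordered from highest to lowest gain, the last element of each list is the head, and `select_move`, `is_legal`, and the cell names and areas in the usage note are invented for the example.

```python
def select_move(buckets, is_legal, look_beyond_head=False):
    """Pick the next move from buckets listed from highest to lowest gain.

    Each bucket is a list whose last element is the head. With the default
    head-only rule, an illegal head hides every move behind it.
    """
    for bucket in buckets:
        if not bucket:
            continue
        if look_beyond_head:
            for v in reversed(bucket):
                if is_legal(v):
                    return v
        else:
            head = bucket[-1]
            if is_legal(head):
                return head
            # Head is illegal: skip the rest of this bucket.
    return None
```

At the start of a CLIP pass all moves sit in one zero-gain bucket. With `areas = {"a": 1, "b": 1, "big": 50}`, a balance slack of 5, and the bucket `["a", "b", "big"]`, the head-only rule returns `None` and the pass ends without a move, while `look_beyond_head=True` finds `"b"`.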

85
Tuning to Uncork CLIP
  • Don't place nodes with area > balance constraint
    in gain container at pass initialization
  • actually useful for all FM variants
  • zero CPU overhead
  • Look beyond the first move in a bucket
  • extremely expensive
  • hurts quality (partitioner doesn't operate well
    near balance tolerance)
  • not worth it, in our experience
  • Simply do a LIFO pass before starting CLIP
  • spreads out nodes in gain buckets
  • reduces likelihood that large node has largest
    total gain

86
Effect of Fixed Terminals
[Plots: normalized cost for IBM01; runtime for IBM01]
87
Enabling Reuse: Free Composability
88
Conflict in Cell (Macro) Based Layouts
  • Consider connected components of conflict graphs
    within each cell master
  • each component independently phase-assignable (2^k
    versions)
  • each is a single vertex in coarse-grain
    conflict graph
  • problem: assure free composability (reusability)
    of cell masters, such that no odd cycles can
    arise in coarse-grain conflict graph

[Figure: cell masters A and B, their connected components, and an edge in the coarse-grain conflict graph]
89
Case I: Creating CAD-IP of Questionable Value
  • Recent hypergraph partitioning papers report FM
    implementations 20x worse than leading-edge FM
  • previous lack of openness caused wrong
    conclusions, wasted effort
  • some improvements may only apply to weak
    implementations
  • duplicated effort re-implementing (incorrectly?)
    well-known algorithms
  • difficult to find the leading edge
  • no standard comparison methodology
  • how do you know if an implementation is poor?
  • To make leading-edge apparent and reproducible
  • publish performance results on standard
    benchmarks
  • peer review (executables, source code?)
  • similar to common publication standards !

90
Case II: Roadblocks to Creating Needed CAD-IP
  • Best approach to global placement?
  • recursive bisection (1970s)
  • force-directed (1980s)
  • simulated annealing (1980s)
  • analytical (1990s)
  • hybrids, others
  • Why is this question difficult?
  • latest public placement benchmarks are from
    1980s
  • data formats are bulky (hard to mix and match
    components)
  • no public implementations since early 1990s
  • new ideas are not compared to old
  • To match approaches to new contexts
  • agree on common up-to-date data model
  • publish good format descriptions, benchmarks,
    performance results
  • publish implementations

91
Case III: Developing CAD-IP for Obsolete Contexts
  • Global placement example
  • much of academia studies variable-die placement
  • row length and spacing not fixed
  • explicit feedthroughs
  • majority of industrial use is fixed-die
  • pre-defined layout dimensions
  • HPWL-driven vs. routability- or timing-driven
  • runtimes are often not even reported
  • this affects benchmarks and algorithms
  • Solution: perform sanity checks and request
    feedback
  • explicitly define use model and QOR measures
  • establish a repository for up-to-date formats,
    benchmarks etc.
  • peer review (executables, source code?)

92
Implicit Decision Effects: IBM02
93
Reference Implementations
  • Documentation does not allow replication of
    results
  • amazingly, true even for "classic" algorithms
  • true for vendor R&D, true for academic R&D
  • Published reference implementations will raise
    quality
  • minimum standard for algorithm implementation
    quality
  • reduce barrier to entry for new R&D

94
Conclusions
  • Work with mature heuristics requires mature
    methodologies
  • Identified research methodology risks
  • Identified reporting methodology risks
  • Community needs to adopt standards for both
  • reference benchmark implementations
  • vigilant awareness of use-model and context
  • reporting method that facilitates comparison

95
Application-Driven Research
  • Well-studied areas have complex, "tuned"
    metaheuristics
  • Risks of poor research methodologies
  • irreproducible results or descriptions
  • no enabling account of key insights underlying
    the contribution
  • experimental evidence not useful to others
  • inconsistent with driving use model
  • missing comparisons with leading-edge approaches
  • Let's look at some requirements this induces...

96
The GSRC Bookshelf for CAD-IP
  • Bookshelf consists of slots
  • slots represent active research areas with
    enough customers
  • collectively, the slots cover the field
  • Who maintains slots?
  • experts in each topic collaborate to produce them
    - anyone can submit
  • Currently, 10 active slots
  • SAT (U. Michigan, Sakallah)
  • Graph Coloring (UCLA, Potkonjak)
  • Hypergraph Partitioning (UCLA, Kahng)
  • Block Packing (UCSC, Dai)
  • Placement (UCLA, Kahng)
  • Global Routing (SUNY Binghamton, Madden)
  • Single Interconnect Tree Synthesis (UIC, Lillis
    and UCLA, Cong)
  • Commitments for more: BDDs, NLP, Test and
    Verification

97
What's in a Slot?
  • Introduction
  • why this area is important and recent progress
  • pointers to other resources (links, publications)
  • Data formats used for benchmarks
  • SAT, graph formats etc.
  • new XML-based formats
  • Benchmarks, solutions, performance results
  • including experimental methodology (e.g.,
    runtime-quality Pareto curve)
  • Binary utilities
  • format converters, instance generators, solution
    evaluators, legality checkers
  • optimizers and solvers
  • executables
  • Implementation source code
  • Other info relevant to algorithm research and
    implementations
  • detailed algorithm descriptions
  • algorithm comparisons

98
Current Progress on the CAD-IP Bookshelf
  • Bookshelf@gigascale.org
  • 33 members (17 developers)
  • Main policies and mechanisms published
  • 10 active slots
  • inc. executables, performance results for
    leading-edge partitioners, placers
  • First Bookshelf Workshop, Nov. 1999
  • attendance: UCSC, UCB, NWU, UIC, SUNY
    Binghamton, UCLA
  • agreed on abstract syntax and semantics for
    initial slots
  • committed to XML for common data formats
  • peer review of slot webpages
  • Ongoing research uses components in the Bookshelf