Title: Large-Scale Optimization in VLSI CAD
1. Large-Scale Optimization in VLSI CAD
- Igor Markov
- http://www.eecs.umich.edu/imarkov
2. Goals/Outline of the Talk
- Give a general idea about the field
- success stories and applications
- potential for cross-pollination
- What drives the field
- Reusable Intellectual Property in CAD
- Consequences of large-scale
- Sample wide-open problems
3. General (VLSI CAD)
- Very Large Scale Integration
- numerous components + interconnect
- emergent properties
- not apparent in isolated components
- Computer-Aided Design
- better than human design (super-human!)
- and then some
FOR MORE INFO...
http://www.eecs.umich.edu/imarkov/EECS527
4. Integrated Circuits
- Excellent examples of large systems
- manufacturing is enormously expensive
- research can prevent blunders and pays off
- two Moore's laws keep everyone busy
- circuits are growing
- circuit design is getting harder
- decreased market windows
- must design quickly (or else)
- digital circuits amenable to auto-manipulation
- have a lot of regularity (easier to represent)
5. Just How Large?
- As large as we can handle
- a priori (physical) limits are at least 20 years away
- pushing the boundaries is our goal
- Current limits
- need to solve many NP-hard problems
- poor understanding, mathematical models
- lack of efficient algorithms
- (typical problem sizes will follow)
6. Design via Optimization
- Think of all possible design solutions
- solution space
- need to choose one solution (or several)
- What parameters should be optimized?
- objective functions f1(x), f2(x), ...
- Need to observe design constraints
- The EDA revolution of the 1980s
- searching, combinatorial and mathematical optimization may outperform engineering intuition when implemented in software
7. A Meta-Approach to Optimization
- Global Optimization
- often cannot optimize accurate objectives
- they can be hopeless to evaluate
- e.g., min routed wirelength as f(placement)
- find simpler objectives that correlate well
- ditto for constraints
- Detailed Optimization
- improve global solutions by local search
- can now worry about weird constraints
- can optimize a better measure of signal delay, etc
8. Consequences of Large-Scale
- Runtimes must scale near-linearly
- strict limitation on used primitives (e.g., no Gaussian elimination)
- wide-spread use of multi-level methods
- Same goes for memory consumption
- cannot represent graphs as dense matrices
- use random sampling/walks instead of enumeration
- Trading solution quality for runtime
- especially for randomized algorithms
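The memory point above can be illustrated with a rough count. This is a minimal sketch, assuming an undirected graph given as an edge list; the helper names (`dense_entries`, `build_adjacency`, `sparse_entries`) are made up for illustration.

```python
def dense_entries(num_vertices):
    """Number of entries in a dense adjacency matrix."""
    return num_vertices * num_vertices

def build_adjacency(num_vertices, edges):
    """Adjacency lists: memory grows with the number of edges, not with n^2."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def sparse_entries(adj):
    """Total number of stored neighbor entries (two per undirected edge)."""
    return sum(len(neighbors) for neighbors in adj)

# Toy ring graph (an assumption for illustration): n vertices, n edges.
n = 1000
ring = [(i, (i + 1) % n) for i in range(n)]
adj = build_adjacency(n, ring)

print(dense_entries(n))          # 1000000 matrix entries
print(sparse_entries(adj))       # 2000 list entries
print(dense_entries(1_000_000))  # 10**12 entries for a 1M-vertex graph
```

For circuit-like graphs, where each vertex touches only a handful of others, the list form stays near-linear in size while the matrix form does not.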
9. Historic Opportunism
- In early days of VLSI CAD
- the Electronic Design Automation revolution
- enabling, but short-lived results (can easily do better)
- e.g., this new algorithm addresses objective f(x)
- many proposed approaches never picked up
- As ICs became larger, most CAD tools could not handle leading-edge circuits
- algorithms for Deep SubMicron circuits
- soon turned out that many algos were weak
- partitioning, placement, SAT, etc.
10. Competitiveness
- Outdated algorithms cause costly software rewrites and lost opportunity
- commercial tools may sell for $400,000
- Learning circuit physics, optics, semiconductor technologies, applied math, CS theory, AI, databases, proper software design, etc. is well worth the effort
- competitive edge
- As a result of competitiveness, VLSI CAD offers
- some of the best algorithms, very strong implementations
- frequent contributions to other fields
11. Success Stories
- Min-cut hypergraph partitioning
- (very good solutions)
- 200K 0/1 variables, 1-2 mins of CPU time
- Minimal Steiner trees (optimal)
- hundreds of points in 1 second
- Provably good routing (approximation)
- 500K nets in several hours (!!!)
12. Min-cut Partitioning
- Given
- hypergraph
- k bins
- each accommodates up to N vertices
- Seek
- to assign each vertex to a bin
- Minimize
- number of hyperedges between bins
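The formulation above can be made concrete with a small sketch. It assumes a hypergraph stored as a list of hyperedges (each a list of vertex ids) and a solution stored as a vertex-to-bin mapping. The names `cut_size` and `is_balanced` are illustrative and not taken from any tool mentioned in the talk.

```python
from collections import Counter

def cut_size(hyperedges, assignment):
    """Count hyperedges whose vertices land in more than one bin."""
    return sum(1 for edge in hyperedges
               if len({assignment[v] for v in edge}) > 1)

def is_balanced(assignment, k, capacity):
    """Check that only bins 0..k-1 are used and no bin exceeds `capacity`."""
    counts = Counter(assignment.values())
    return (all(0 <= b < k for b in counts)
            and all(c <= capacity for c in counts.values()))

# Toy instance (made up for illustration): 6 vertices, 4 hyperedges, k = 2 bins.
hyperedges = [[0, 1, 2], [2, 3], [3, 4, 5], [0, 5]]
assignment = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

print(cut_size(hyperedges, assignment))          # 2: [2, 3] and [0, 5] are cut
print(is_balanced(assignment, k=2, capacity=3))  # True
```

This only evaluates a given solution; finding a good one is the hard part that the partitioners discussed in the talk address.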
13. Min-cut Partitioning (cont'd)
- Numerous apps in VLSI CAD and beyond
- supercomputing, data mining, Internet, ...
- Progress in partitioning algorithms
- started in 1972 and still going
- many approaches invented / discarded
- now can auto-partition 1M-gate circuits
- better than manually, with free software
- couldn't, even commercially, just 3 years ago
- (this has nothing to do with Deep SubMicron)
14. Min-cut Partitioning (cont'd)
- UCLA MLPart (ASPDAC 2000)
- faster than hMetis per start
- returns better solutions on average
- never worse than 5% off from hMetis
- sometimes (ibm06, 2aa) 30% better
- available in source code (C) and binaries
- at the bookshelf, free for any use w/o notification
- Used at Cadence, Intel, start-ups
- Vital to UCLA Capo placer
15. Steiner Minimal Trees
- Given
- k points in the plane
- Seek
- a Steiner tree connecting the points
- add extra points
- connect all points by straight-line segments
- Minimize
- total edge-length of the tree
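A small numeric check shows why adding extra points helps. This sketch assumes Euclidean distances and uses the four corners of a unit square with two hand-placed Steiner points (a textbook configuration). It only compares tree lengths and is not an SMT solver such as GeoSteiner.

```python
import math

def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mst_length(points):
    """Length of a minimum spanning tree (Prim's algorithm, O(k^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], dist(points[u], points[v]))
    return total

def tree_length(edges):
    """Total length of a tree given as a list of point pairs."""
    return sum(dist(p, q) for p, q in edges)

corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
x = 1.0 / (2.0 * math.sqrt(3.0))      # offset that gives 120-degree angles
s1, s2 = (x, 0.5), (1.0 - x, 0.5)     # the two extra (Steiner) points
steiner_edges = [(corners[0], s1), (corners[3], s1), (s1, s2),
                 (s2, corners[1]), (s2, corners[2])]

print(mst_length(corners))         # 3.0 without extra points
print(tree_length(steiner_edges))  # about 2.732, i.e. 1 + sqrt(3)
```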
16. Steiner Minimal Trees (cont'd)
- Applications
- routing signal nets
- connecting cities by highways
- 1989, Scientific American
- cannot find an SMT for 100 US cities
- 1999, SODA (Warme/Zachariasen)
- with GeoSteiner can do that in <1 sec
- implementation available in source code
17. Routing of Multiple Nets
- Given
- n-tuples of locations to be connected
- with Steiner trees (think of signal nets)
- Constraints (not trivial to satisfy!)
- routes cannot occupy same space
- Minimize
- total length of routes, congestion
18. Routing of Multiple Nets
- One of the first circuit design automations (late 1960s)
- Has enormous solution space
- A classic AI problem
- Current commercial tools (e.g., Cadence)
- up to a day for 500K nets, no guarantees
- ISPD 2000, Albrecht (using multi-commodity flows)
- 500K nets in several hours, within 20% of opt.
- (IBM Power 3 chip)
19. What Makes a Break-through? (or at least a splash)
- Study sample splashes
- Is it enough to minimize a function? (function: relevant, minimization: efficient)
- Yes
- Yes, but
- No
- Absolutely not
20. Background: VLSI Placement
- [Figure: a bad placement next to a good placement]
21. Global WL-driven Placement
- Objective
- total Half-Perimeter WireLength
- approximates Steiner Minimal Tree
- UCLA Capo placer (DAC 2000)
- beats Cadence QPlace on many benchmarks
- <50k gates; unpublished: 30% better on a 280K-gate benchmark
- compared by routed WL after Cadence WarpRoute
- in congestion-driven mode: 1 routing violation = failure
- used for research at IBM, Intel, Philips, CMU, ...
- available in source code (C), free for any use
- (timing-driven mode not yet released)
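The half-perimeter wirelength objective named above is cheap to evaluate, which is part of its appeal. A minimal sketch, assuming each net is a list of cell names and each cell has a single (x, y) location; the data is made up for illustration.

```python
def hpwl(net, positions):
    """Half-perimeter of the bounding box of one net's cells."""
    xs = [positions[cell][0] for cell in net]
    ys = [positions[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(nets, positions):
    """Sum of half-perimeter wirelengths over all nets."""
    return sum(hpwl(net, positions) for net in nets)

positions = {"a": (0, 0), "b": (3, 1), "c": (1, 4), "d": (5, 5)}
nets = [["a", "b", "c"], ["c", "d"]]

print(total_hpwl(nets, positions))  # (3 + 4) + (4 + 1) = 12
```

For 2-pin and 3-pin nets the half-perimeter equals the rectilinear Steiner minimal tree length; for larger nets it is a lower bound.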
22. Background: Detailed Placement
- Detailed circuit placement
- given locations of circuit elements (cells), improve them by local changes (e.g., swaps)
- minimize total length of signal nets
- Local, but large-scale problem
- entails a very large number of small sub-problems
- Practically important
- local improvements directly translate to large scale
- very similar to floorplanning (a high-level problem)
23. Background: Detailed Placement
- Naïve detailed optimization
- consider 7-8 cells at a time
- enumerate all permutations
- compute HPWL for each
- pick the best permutation
- repeat for another group of 7-8
- Greater groups -> better solutions
- practical limit: 0.01 sec per group
- Use Branch-and-bound for each group (ISPD '99)
- Overall linear runtime
- Easy parallelization (optimize many groups in parallel)
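The naive scheme above can be sketched in a few lines. This assumes a single row of unit-width cells, with each cell's slot index used as its x-coordinate and nets that connect only cells in that row; real placers also handle cell widths and pins in other rows. Function names are hypothetical, and plain enumeration is used instead of branch-and-bound.

```python
from itertools import permutations

def hpwl_1d(nets, x):
    """Sum over nets of the x-span of the net's cells (single-row HPWL)."""
    return sum(max(x[c] for c in net) - min(x[c] for c in net) for net in nets)

def optimize_window(order, start, size, nets):
    """Try every permutation of order[start:start+size]; keep the best."""
    best_order, best_cost = list(order), None
    window = order[start:start + size]
    for perm in permutations(window):
        cand = order[:start] + list(perm) + order[start + size:]
        x = {cell: slot for slot, cell in enumerate(cand)}
        cost = hpwl_1d(nets, x)
        if best_cost is None or cost < best_cost:
            best_order, best_cost = cand, cost
    return best_order, best_cost

def sweep(order, size, nets):
    """Slide a window along the row, optimizing one group at a time."""
    cost = None
    for start in range(max(1, len(order) - size + 1)):
        order, cost = optimize_window(order, start, size, nets)
    return order, cost

# Toy row: a chain of 2-pin nets, initially scrambled.
row = ["a", "c", "e", "b", "d", "f"]
nets = [["a", "b"], ["b", "c"], ["c", "d"], ["d", "e"], ["e", "f"]]

print(sweep(row, 3, nets))            # cost never exceeds the initial 13
print(optimize_window(row, 0, 6, nets)[1])  # 5: one window covering the whole row
```

Each window costs size! evaluations, which is why group sizes stay small and why branch-and-bound pruning pays off.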
24. Optimal Interleaving
- ICCAD 2000, Hur and Lillis (TR available)
- [Figure: the ordered sequences A B C D E and 1 2 3 4 5 are interleaved into A 1 2 B C 3 4 D 5 E]
- Optimally in O(n^2) time by Dynamic Programming
- Can handle 30 elements at a time
- easier to implement than branch-and-bound
- the order constraint turns out very mild
- Very good result
- but, seemingly, nothing more than min f(x) !
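The dynamic-programming idea can be sketched under a simplifying assumption: each element pays a cost that depends only on the slot it ends up in. This is not the wirelength objective of the Hur/Lillis paper, only an illustration of why preserving the order within each sequence gives an O(n^2) table. The function name `interleave` and the per-element target slots are hypothetical.

```python
def interleave(a, b, cost):
    """Merge sequences a and b, keeping the order within each, so that the
    sum of cost(element, slot) is minimal. Time is O(len(a) * len(b))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j]: best cost of placing a[:i] and b[:j] into the first i+j slots.
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    take_a = [[False] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            slot = i + j - 1  # slot filled by the last element placed
            if i > 0:
                c = dp[i - 1][j] + cost(a[i - 1], slot)
                if c < dp[i][j]:
                    dp[i][j], take_a[i][j] = c, True
            if j > 0:
                c = dp[i][j - 1] + cost(b[j - 1], slot)
                if c < dp[i][j]:
                    dp[i][j], take_a[i][j] = c, False
    # Walk back from dp[n][m] to recover the merged order.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if take_a[i][j]:
            i -= 1
            merged.append(a[i])
        else:
            j -= 1
            merged.append(b[j])
    merged.reverse()
    return merged, dp[n][m]

# Assumed toy cost: distance of each element from a preferred slot.
target = {element: slot for slot, element in enumerate("A12BC34D5E")}
merged, total = interleave(list("ABCDE"), list("12345"),
                           lambda element, slot: abs(slot - target[element]))
print("".join(merged), total)  # A12BC34D5E 0.0
```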
25. Popularity Comparison w/ GeoSteiner
- The Hur/Lillis algorithm
- appeared several months ago (on paper)
- already implemented by several groups
- with great results
- but Warme's GeoSteiner
- is barely used
- source code published 2 years ago
- instead, simple heuristics that are slower are used
- Difference: ease of reuse!
- of result itself and/or of its representation
26. Intellectual Property in CAD
- Reuse?
- today hundreds of VLSI CAD engineers are implementing the same, known, but difficult algorithms
- Breakthroughs typically produce validated and reusable intellectual property
- yet another algorithm to min f(x) does not automatically qualify for validated, reusable CAD IP
- applicability, generality, quality of description, etc.
- CAD IP is not just algorithms and code
- CAD IP: benchmarks, evaluation techniques, empirical studies/results, algorithm analyses, etc.
- Studies of CAD IP suggest
- to effectively reuse, need infrastructure
27. Intellectual Property in CAD
- GSRC Bookshelf for Fundamental Algorithms in CAD
- a repository for reusable CAD IP, a publication medium
- a way to communicate with industry
- problem formulations are also considered CAD IP
- http://vlsicad.cs.ucla.edu/GSRC/bookshelf
- Existing bookshelf slots include
- SAT, Graph Coloring, Hypergraph Partitioning, Mathematical Optimization, Circuit Placement, Clock Tree Routing, Global Routing, Interconnect Optimization, etc.
- Leading-edge implementations (free for all uses)
- UCLA Physical Design Tools (graph partitioners, placers, etc.)
- many more (SAT solvers from U. Michigan, GeoSteiner, etc.)
28. Reuse and Education
- Both are necessary to sustain Moore's laws
- not enough designers to implement new chips
- not enough CAD engineers to automate design
- Need to teach/study reusable design
- hardware, software/CAD IP (similar? different?)
- note: typical promising research demos are not reusable
- Design of reusable software
- theory has been available for years (processes, code metrics, interface languages, modeling, robust public-domain tools, etc.)
- need more infrastructure, practice, experience of reuse
- first reuse software
- then design reusable software
29. Research Directions (1)
- Citius, Altius, Fortius
- faster, leaner implementations
- higher-quality solutions
- stronger impact on applications
- aid available: latest advances in CS theory, Mathematics, AI, software engineering, etc.
- Large-scale computing aspects of VLSI CAD
- memory locality (big deal for irregular circuits)
- memory-less algorithms (and trade-offs)
30. Research Directions (2)
- Quantified suboptimality of heuristics
- (for NP-hard problems)
- how close can we get to optima in practice?
- estimate suboptimality of specific solutions
- study dependence on input distributions
- related to CS theory / approximation algos
- example: detection of symmetries in Logic Synthesis
- Kravets/Sakallah, ICCAD 2000 and TR
- Lower bounds and impossibility arguments for fundamental algorithms
31. Research Directions (3)
- Using better, but still computable, models of reality
- simulation as a driver for optimization
- modeling semiconductor effects
- Alpert et al., ISPD 2000: a new interconnect delay model, better than Elmore delay; all optimizations assuming Elmore are open to porting
- inductance, noise, etc.
- effects of statistical variations
- CAD for new types of semi technologies and styles
- subwavelength lithography (optical proximity correction, etc.)
- System-On-Chip (high-level partitioning, etc.)
- CAD for analog circuits (including RF, MW)
32. Research Directions (4)
- Self-conscious optimization tools
- prediction and estimation
- of solution quality before optimization
- SLIP 2001 - http://www.ee.pdx.edu/slip
- GTX - http://www.gigascale.org/gtx
- calibration (which solutions/tools are good?)
- Support for intelligent/expert users
- computer-aided does not always mean w/o people
- efficient visualization, diagnostics and interactivity
- how do you visualize a partitioning solution?
- how do you visualize many unrouted 2-pin nets in the same row?
33. Conclusions
- Large-scale optimization in VLSI CAD
- dynamic and challenging field
- benefits from other fields and gives back
- IP reuse is paramount
- research is respected and economically justified
- opportunities available
FOR MORE INFO...
http://www.eecs.umich.edu/imarkov/