Title: Parallel Optimization Tools for High Performance Design of Integrated Circuits
1. Parallel Optimization Tools for High Performance Design of Integrated Circuits
- Azadeh Davoodi
- Assistant Professor
- (joint work with my student Tai-Hsuan Wu)
- Department of Electrical and Computer Engineering
WISCAD VLSI Design Automation Lab
http://wiscad.ece.wisc.edu
Thanks to Jeff Linderoth
2. Research: Optimality in IC Design
- Optimality
  - required to assess the quality of existing design techniques
  - current techniques use heuristics to solve large-scale, non-linear, and discrete optimization problems
  - there is no indication of how far a heuristic solution might be from the optimal solution
- Optimality matters for shortening the design cycle of integrated circuits and meeting stringent time-to-market requirements.
Source: MIPS Technologies
3. Optimization for High Performance Design
[Figure labels: j, dj, Tcons]
- Discrete optimization problem
  - Typically the relaxed continuous version is solved as a convex program and the result is discretized
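The relax-then-discretize approach mentioned above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the formulation used in this work: a hypothetical toy model minimizes total size sum(x_i) subject to a delay constraint sum(a_i / x_i) <= t_cons, the names `relax_and_round`, `delay`, `a`, and `sizes` are invented, and the relaxed problem is solved in closed form from its Lagrangian conditions rather than with a general convex solver. Each relaxed value is rounded up to the next available discrete size, which keeps the delay constraint satisfied because delay only decreases as sizes grow.

```python
import math
from bisect import bisect_left

def delay(a, x):
    """Toy delay model (assumption): each element i contributes a_i / x_i."""
    return sum(ai / xi for ai, xi in zip(a, x))

def relax_and_round(a, t_cons, sizes):
    """Solve the continuous relaxation, then round up to discrete sizes.

    Relaxed problem: minimize sum(x) subject to sum(a_i / x_i) <= t_cons, x > 0.
    Its optimum is x_i = sqrt(a_i) * S / t_cons with S = sum_j sqrt(a_j).
    """
    s = sum(math.sqrt(ai) for ai in a)
    relaxed = [math.sqrt(ai) * s / t_cons for ai in a]

    sizes = sorted(sizes)
    rounded = []
    for x in relaxed:
        # Smallest discrete size that is >= the relaxed value
        # (small tolerance guards against floating-point noise).
        k = bisect_left(sizes, x - 1e-12)
        if k == len(sizes):
            raise ValueError("no discrete size is large enough")
        rounded.append(sizes[k])
    return relaxed, rounded
```

For example, `relax_and_round([1.0, 4.0], 2.0, [1, 2, 3, 4])` yields a relaxed solution that is then snapped up to the discrete size set. The rounded solution is feasible but generally not optimal, which is the gap that an exact search is meant to measure.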
4. Examples of Optimization Complexity
Bench    # of Variables   Exhaustive Search Size   Reduced Search Size   Level in Search Tree
c5315    705              > E230                   E10                   35.11
c7552    822              > E230                   E08                   26.93
c6288    1256             > E230                   E11                   33.98
s1488    307              E230                     E11                   32.19
s1494    309              E227                     E09                   30.23
s9234    740              > E230                   E07                   18.77
s5378    930              > E230                   E09                   29.39
s38584   6950             > E230                   E09                   47.94
s35932   7260             > E230                   E10                   59.17
b20      24484            > E230                   E12                   68.34
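To give a feel for why the exhaustive search sizes above are so large, here is a minimal Python sketch. It assumes each variable independently picks one of its discrete options, so the number of full assignments is the product of the per-variable option counts. The function name `log10_search_size` and the option counts in the example are hypothetical, and the sketch does not attempt to reproduce the table entries, whose per-variable option counts are not listed here.

```python
import math

def log10_search_size(option_counts):
    """Return log10 of the number of full assignments.

    The search size is the product of the option counts; summing logarithms
    avoids building an astronomically large integer.
    """
    return sum(math.log10(k) for k in option_counts)

# Hypothetical example: 307 variables with 5 options each.
exponent = log10_search_size([5] * 307)
print(f"search size is about 1E{exponent:.0f}")
```

Working in log space keeps the arithmetic cheap even for circuits with tens of thousands of variables.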
Azadeh Davoodi--WISCAD
5. Using Master-Worker Framework of Condor for Grid Optimization
http://www.cs.wisc.edu/condor/mw
Master
- C++ APIs that facilitate
  - dynamic and opportunistic resource utilization
  - fault-tolerant implementation via checkpointing and job migration
[Figure: tasks T1 to T9 grouped as unprocessed tasks, tasks in process, and finished tasks]
6. Master-Worker Implementation for High Performance IC Design
- Master
  - imposes variable ordering in the branch-and-bound search tree
  - applies pruning of sub-optimal branches
  - checkpoints after every 5000 tasks completed by workers
- Worker
  - each worker sequentially computes upper and lower bounds for K nodes in the search tree and communicates the bounds to the Master
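The division of labor described above can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions, not the MW API and not the actual implementation: the toy problem (`OPTIONS`, `T_CONS`), the bound computations, and the names `bounds`, `worker`, and `master` are all hypothetical. The master fixes the variable order, hands out batches of K nodes, and prunes using the bounds returned by the worker. Checkpointing is omitted.

```python
from collections import deque

# Hypothetical toy problem: each variable picks one (cost, delay) option;
# minimize total cost subject to total delay <= T_CONS.
OPTIONS = [
    [(1, 5), (3, 2)],
    [(2, 4), (4, 1)],
    [(1, 6), (5, 2)],
]
T_CONS = 9
INF = float("inf")

def bounds(partial):
    """Return (lower, upper) cost bounds for a partial assignment.

    `partial` is a tuple of option indices for the first len(partial)
    variables, following the fixed variable order imposed by the master.
    """
    cost = sum(OPTIONS[i][k][0] for i, k in enumerate(partial))
    dly = sum(OPTIONS[i][k][1] for i, k in enumerate(partial))
    rest = OPTIONS[len(partial):]
    # Infeasible even if every remaining variable takes its fastest option.
    if dly + sum(min(d for _, d in opts) for opts in rest) > T_CONS:
        return INF, INF
    # Lower bound: cheapest remaining options, ignoring the delay constraint.
    lower = cost + sum(min(c for c, _ in opts) for opts in rest)
    # Upper bound: cost of the feasible completion using the fastest options.
    upper = cost + sum(min(opts, key=lambda o: o[1])[0] for opts in rest)
    return lower, upper

def worker(nodes):
    """Compute bounds for a batch of nodes sequentially and report them."""
    return [(node, *bounds(node)) for node in nodes]

def master(k=2):
    """Branch-and-bound driver; returns the optimal cost (INF if infeasible)."""
    queue = deque([()])          # unprocessed tasks, starting at the root
    best = INF                   # best upper bound seen so far
    while queue:
        batch = [queue.popleft() for _ in range(min(k, len(queue)))]
        for node, lower, upper in worker(batch):
            best = min(best, upper)
            if lower >= best or len(node) == len(OPTIONS):
                continue         # prune: cannot improve on the incumbent
            for choice in range(len(OPTIONS[len(node)])):
                queue.append(node + (choice,))
    return best
```

Calling `master(k=2)` explores the toy tree in batches of two nodes. In a real deployment each batch would go to a separate remote worker, so a larger K trades fewer messages against pruning with slightly staler bounds.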
7. Dealing with Communication Overhead
- 3 types of data exchange between the Master and each Worker
  - scalar upper and lower bounds
  - circuit information (optimization problem description)
  - partial variable assignment
- Send the above only once when the worker is allocated, and reuse each worker for future tasks as much as possible
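The benefit of reusing an allocated worker can be estimated with a rough Python sketch. It rests on an assumption about how the bullets above should be read: the large circuit description is sent once per worker, while the partial assignment and the two scalar bounds are exchanged for every task. The function names, the use of `pickle` to measure message size, and the example data are all hypothetical; the real implementation exchanges data through MW, not pickle.

```python
import pickle

def message_bytes(obj):
    """Size of a serialized message, used here as a stand-in for network cost."""
    return len(pickle.dumps(obj))

def traffic(circuit, tasks, resend_circuit):
    """Total bytes exchanged between the master and one worker.

    resend_circuit=True  models shipping the problem description with every task.
    resend_circuit=False models shipping it once, when the worker is allocated.
    """
    total = 0 if resend_circuit else message_bytes(circuit)
    for partial in tasks:
        if resend_circuit:
            total += message_bytes(circuit)
        total += message_bytes(partial)      # master -> worker: partial assignment
        total += message_bytes((0.0, 0.0))   # worker -> master: lower and upper bound
    return total

# Hypothetical data: a large problem description and a few small tasks.
circuit = {"gates": list(range(1000))}
tasks = [(0, 1), (1, 0), (1, 1)]
saved = traffic(circuit, tasks, True) - traffic(circuit, tasks, False)
print(f"bytes saved by sending the circuit once: {saved}")
```

The saving grows linearly with the number of tasks a worker handles, which is why keeping each worker busy for as long as possible matters.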
8. MW Implementation in Condor
- MASTER SUBMIT FILE
  - Universe = Scheduler
  - Executable = master_DGS_socket
  - Image_Size = 100000
  - MemoryRequirements = 100
  - Input = in_master.socket
  - Output = out_master.socket
  - Error = out_worker.socket
  - Log = _DGS.log
  - Requirements = (Arch == "INTEL" && OPSYS == "LINUX")
  - getenv = True
  - Queue
- WORKER SUBMIT FILE
  - Universe = Vanilla
  - Worker 1
    - Executable = exec0.$$(Opsys).$$(Arch).exe
    - arguments = 0 8997 8997 144.92.240.35
    - Log = log_file
    - Output = output_file.0
    - Error = error_file.0
    - Requirements = (Arch == "INTEL" && OPSYS == "LINUX")
    - should_transfer_files = Yes
    - when_to_transfer_output = ON_EXIT
    - rank = Mips
    - on_exit_remove = false
    - Queue
  - Worker 2
- Resource Information
  - 179 CAE machines (Intel/Linux)
  - If all CAE machines are in use, jobs flock to the queue of Intel/Linux machines in CS
9. Results
On average, each variable had 4.5 discrete options to choose from.
Bench    # of Variables   Runtime    Max Workers   Average Workers
c5315    705              36 min     118           105.74
c6288    1256             66 min     126           114.94
c7552    822              31 min     113           101.97
s5378    930              39 min     129           95.57
s9234    740              52 min     119           95.8
s15850   617              48 min     139           112.67
s35932   7260             36 min     163           108.15
s38584   6950             62 min     133           113.86
b18      47191            52 hours   192           189.82
b20      15699            28 hours   187           167.29
b22      24484            38 hours   190           173.73
10. Future Plans
- Install and work with personalized Condor
- Work with larger circuits and more sites in addition to CAE and CS
- Study possibilities for optimization on a grid of multi-core machines
- Better understand and work around the priority scheduling of jobs in Condor (?)