Title: Design Optimization in Cell-Based Design Environment
1Design Optimization in Cell-Based Design
Environment
- Hidetoshi Onodera
- Department of Communications and Computer Engineering
- Kyoto University
2Design Optimization in Cell-Based Design
Environment
- Design Optimization with On-Demand Library Generation
- Overview: Background and Objectives
- On-Demand Library Generation
- Design Experiment
- Post-Layout Optimization
- Cell-Based Data-Path Design with On-Demand
Library Generation
3Design Optimization with On-Demand Library
Generation
- Conventional Cell-Based Design
- Cell library is supplied by a fab or library vendor
- Designer uses the library as a black box
- Pros: High reliability
- Cons: Excessive safety margin, moderate performance
- On-Demand Library Generation
- The design of the library is part of the total design cycle
- Library can be optimized according to the spec., process, circuit structure, etc.
- Pros: Performance enhancement
- Cons: Needs automatic library generation/characterization
4Design Optimization with On-Demand Library
Generation Objective
Short TAT Design of High-Performance and
Cost-Effective SoCs
- Custom (Optimized) Design in Cell-Based (ASIC) Design Environment
- Transistor-level optimization in ASIC design environment
- Enhancement of IP Re-Usability
- Performance adjustability under various process technologies
- Short TAT Design by On-Demand Library Generation
- Management of UDSM (ultra-deep-submicron) Effects
- Interconnect delay, crosstalk, performance variability, etc.
5Design Optimization with On-Demand Library
Generation Overview
[Figure: design-flow diagram. Recoverable labels:]
- Flow: RTL, Logic Synth., Layout Synth., Post-Layout Optimization, ASIC/SoC
- Performance Estimation; Delay, Power, Noise Optimization
- On-Demand Library Generation: Spec., Circuit, Process; Optimized Library; tuning of Tr. sizes, variety, strength
- Post-Layout Optimization: Tr. sizing for delay/power/noise, based on detail-routed layout
6Design Optimization in Cell-Based Design
Environment
- Design Optimization with On-Demand Library Generation
- Overview: Background and Objectives
- On-Demand Library Generation
- Design of library structure
- Cell layout generation
- Design Experiment
- Post-Layout Optimization
- Cell-Based Data-Path Design with On-Demand
Library Generation
7Design of Library Structure
- Effect of library structure (varieties in logic and strength) is experimentally examined.
- Variety in logic: Compact set is OK.
- Basic logics (nand, nor, xor)
- Simple complex logics (aoi, oai)
- Positive logics (and, ao, etc.)
- Driving strength: Wide variety is necessary.
- Small and intermediate strength for power reduction
- Large strength for high speed
Cell Library with Variable Driving Strength
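As a rough illustration of why a wide variety of driving strengths helps, the sketch below picks the smallest strength that meets a delay target, first from a coarse set and then from a finer one. The delay model, the resistance and load values, and both strength sets are hypothetical and are not taken from the slides.

```python
def gate_delay_ns(strength, load_ff, r_unit_kohm=5.0, t_intrinsic_ns=0.05):
    """Toy delay model (assumption): intrinsic delay plus an RC term.

    kOhm * fF gives picoseconds, so divide by 1000 to get nanoseconds.
    """
    return t_intrinsic_ns + (r_unit_kohm / strength) * load_ff / 1000.0


def pick_strength(strengths, load_ff, target_ns):
    """Return the smallest strength meeting the target, or None if none does."""
    for s in sorted(strengths):
        if gate_delay_ns(s, load_ff) <= target_ns:
            return s
    return None


coarse = [1, 2, 4]               # hypothetical library with few strengths
fine = [0.5, 1, 1.5, 2, 3, 4]    # hypothetical library with intermediate strengths

coarse_choice = pick_strength(coarse, load_ff=100, target_ns=0.4)
fine_choice = pick_strength(fine, load_ff=100, target_ns=0.4)
print(coarse_choice, fine_choice)
```

Under these made-up numbers the finer set meets the same target with a smaller cell, which is the kind of saving that intermediate strengths are meant to enable.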
8Generation of Cell Layout with Variable Driving
Strength
Requirements
Symbolic layout
Real layout
- Process independence
- Dense layout
- Variable driving strength
- cell height
- Tr. width inside cell
- Coping with
- phase shift mask
- mismatch of Tr. and wire pitches
Applied to 0.35, 0.18, 0.13 µm
9Example of Variable Driving Strength
Standard size
Half size
Fixed height
Adjustable Tr. width
Fixed pin locations
10Features of Symbolic Layout
- Hierarchically defined virtual grid for
- the adaptability of design rules
- the flexibility in transistor width
11Examples of Generated Layout
9-pitch, max width.
9-pitch, different width
11-pitch, max width
12Comparison with Fixed Cells Used for Mass
Production (0.18 µm)
Similar Performance
13Comparison with Fixed Cells Used for Mass
Production (0.18 µm)
Small Area Penalty
14Design Optimization in Cell-Based Design
Environment
- Design Optimization with On-Demand Library Generation
- Overview: Background and Objectives
- On-Demand Library Generation
- Design Experiment
- Real Chip Example: DSP for Moving Picture Compression
- 32-bit RISC Core
- Post-Layout Optimization
- Cell-Based Data-Path Design with On-Demand
Library Generation
15DSP for Moving Picture Compression
- DLX-based RISC Processor
- 10 bit x 16 parallel SIMD operation
- 15 k cells
- 0.35 µm 3-Metal Process
- 2 Cores with Different Libraries
- Fixed Library: Process-Specific Library
- On-Demand Library
16Result
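[Figure: cores built with Fixed Lib. and with On-Demand Lib.; dimension label 4.9mm]
On-Demand Lib. compared with Fixed Lib.:
- Core area: 8% less (cell area: 17% less); routing resource limited
- Power dissipation (measured at 25 MHz, 1.6 V): 10% less (21% less with optimized flip-flops)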
17Design Experiments
- 32 bit RISC Core
- 9 k Cells
- 0.35 µm 3-Metal Process
- Design Specifications (Clock Freq.)
- 100 MHz, 120 MHz, 130 MHz
- Libraries under Comparison
- Fixed Library: Process-Specific Library
- On-Demand Library
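Each clock-frequency specification corresponds to a cycle-time budget that the critical path must fit within. A minimal sketch of the conversion follows; the helper name is illustrative and not from the slides.

```python
# Clock frequencies from the design specifications, in MHz.
clock_mhz = [100, 120, 130]


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


periods = {f: round(period_ns(f), 2) for f in clock_mhz}
for f, t in periods.items():
    print(f"{f} MHz -> {t} ns")
```

The step from 120 MHz to 130 MHz removes only about 0.6 ns from the budget, which gives a sense of how tight the highest specification is.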
18Timing Closure
[Figure: timing-closure results; two cases are labeled "Timing failed"]
19Design Results with On-Demand Lib.
(a) 100 MHz
(b) 120 MHz
(c) 130 MHz
9-pitch cell
11-pitch cell
13-pitch cell
20Area-Delay Trade-off Characteristics
21Design Optimization in Cell-Based Design
Environment
- Design Optimization with On-Demand Library Generation
- Overview: Background and Objectives
- On-Demand Library Generation
- Design Experiment
- Post-Layout Optimization
- Cell-Based Data-Path Design with On-Demand
Library Generation
22Post Layout Transistor Sizing for Power and
Crosstalk Reduction
Standard Size
After Optimization
Fixed Height
Width is tunable.
Pin Location is fixed.
Interconnect is preserved while tuning.
23Results of Power Optimization
- Constraints
- Minimum Delay
- Max Transition: 0.5 ns
- Noise Margin: 0.25 Vdd
- 0.35 µm Process
60% Power Reduction
Power is evaluated by PowerMill.
24An Example of Power Reduction (des)
- Initial Circuit (x1,x2,x3,x4,,): 14.7 mW
- Discrete Opt. (. x0.15, x0.5): 11.3 mW (-23%)
- Continuous Opt.: 6.4 mW (-56%)
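The quoted reductions can be checked directly from the three power figures. The helper name below is illustrative.

```python
def reduction_percent(initial_mw, optimized_mw):
    """Relative power reduction, in percent of the initial power."""
    return 100.0 * (initial_mw - optimized_mw) / initial_mw


initial_mw = 14.7      # initial circuit
discrete_mw = 11.3     # after discrete optimization
continuous_mw = 6.4    # after continuous optimization

discrete_cut = reduction_percent(initial_mw, discrete_mw)
continuous_cut = reduction_percent(initial_mw, continuous_mw)
print(f"discrete: -{discrete_cut:.1f}%  continuous: -{continuous_cut:.1f}%")
```

Both values round to the figures on the slide (23 and 56), which supports reading those numbers as percentages.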
25Initial and Optimized Layouts
Initial
Optimized
26Peak Current Reduction
66% reduction
Flattened current
Reduction of IR-drop, di/dt noise, electromigration
Reliability is enhanced.
des, fastest, Max transition 0.5 ns
27Results of Crosstalk Noise Reduction
Crosstalk noise is reduced while delay is kept
constant.
28Design Optimization with On-Demand Library
Generation
- Custom Design Quality in Cell-Based Design Environment by
- On-Demand Library Generation
- Post-Layout Transistor Sizing
- Large Power Savings
- Crosstalk Noise Reduction
- P.S.: Generated libraries are used in a Japanese MPC service for academia (similar to MOSIS).
29Design Optimization in Cell-Based Design
Environment
- Design Optimization with On-Demand Library Generation
- Overview: Background and Objectives
- On-Demand Library Generation
- Design Experiment
- Post-Layout Optimization
- Cell-Based Data-Path Design with On-Demand
Library Generation
30Data-Path Design in Cell-Based Design Environment
Data-Path Circuits
Regular flow of signals (Bit-slice layout)
Transistor-level Performance Opt.
Evaluate the impact of the above operations
31Three layout (placement) procedures
- Manual: manual placement of cells and I/Os considering regularity of signal flow; automatic routing.
- Semi-Auto: manual placement of I/Os considering regularity of signal flow; automatic placement and routing of cells.
- Automatic: automatic placement and routing of I/Os and cells.
Lay out 3 circuits in the same area and compare wire length and delay.
32Test circuits
- Carry select adder(8 bit and 32 bit)
- 16-bit tree-style multiplier
- 0.35 µm technology with three metal layers
carry select adder
multiplier
33Signal flow
4-2 adder
partial product
16-bit multiplier using 4-2 adder
34Signal flow
folded 16-bit multiplier using 4-2 adder
35Design Time
- Manual placement of I/Os and cells
- 8-bit carry select adder 3 hours
- 32-bit carry select adder 4 hours
- 16-bit tree-style multiplier 5 hours
36Design Results (Total Wire Length)
Total wire length decreased by max. 63%, ave. 22%.
Not much difference between Manual and Semi-Auto (manual I/Os, automatic cells).
[Chart: bar labels -8%, -18%, -63%, -20%, -58%, -15%]
37Design Results (Delay)
- Critical path delay evaluated by PathMill
Delay decreased by max. 12%, ave. 4%.
Not much difference between Manual and Semi-Auto (manual I/Os, automatic cells).
[Chart: bar labels -5%, -5%, -12%, -11%, -3%, -3%]
38Issues in Manual Cell Placement
- Longer design time
- Difficulty in achieving compact layout while
keeping regularity
Manually placed 16-bit multiplier Using 4-2 adder
39Area Reduction by Automatic Placement
- Automatic placement reduces dead space as far as routability allows.
- Proper placement of I/Os ensures little degradation of performance.
40Design Results (Total Wire Length)
- Core ratio, from 0.65 to 0.96
- Decrease in total wire length
41Design Results (Delay)
- Constant delay around 7.6 ns
Manual placement: I/Os; automatic placement: cells
42Data-Path Design in Cell-Based Design Environment
Data-Path Circuits
Regular flow of signals (Bit-slice layout)
Transistor-level Performance Opt.
Cell-Based Design Environment
Layout design with regular flow of signals
Transistor sizing inside leaf-cell
Evaluate the impact of the above operations
43Experiment Tr. sizing(1/2)
- Evaluate performance improvement by Tr. sizing
- Longest path delay
- Power dissipation
- Tr. sizing strategy
- 1. Minimize longest path delay
- 2. Minimize sum of Tr. sizes within a 3% delay increase
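The two-step strategy above can be sketched on a toy gate chain. This is not the optimizer used in the experiment: the delay model, the coordinate search, the greedy shrinking pass, and all constants are assumptions chosen only to make the two steps concrete.

```python
def path_delay(widths, r_src=1.0, c_wire=1.0, c_in=1.0, c_load=20.0, t0=0.1):
    """Toy path delay (assumption): the source drives the first gate, and each
    stage drives wire capacitance plus the next stage's input capacitance."""
    total = r_src * c_in * widths[0]
    for i, w in enumerate(widths):
        nxt = c_in * widths[i + 1] if i + 1 < len(widths) else c_load
        total += t0 + (c_wire + nxt) / w
    return total


def minimize_delay(n_stages, w_min=1.0, w_max=16.0, step=0.25):
    """Step 1: coordinate search over a width grid to minimize path delay."""
    candidates = [w_min + k * step for k in range(int((w_max - w_min) / step) + 1)]
    widths = [w_min] * n_stages
    for _ in range(100):  # passes; stops early once no width changes
        previous = list(widths)
        for i in range(n_stages):
            widths[i] = min(
                candidates,
                key=lambda w: path_delay(widths[:i] + [w] + widths[i + 1:]),
            )
        if widths == previous:
            break
    return widths


def shrink_within_budget(widths, limit, step=0.25, w_min=1.0):
    """Step 2: greedily shrink widths while the delay stays within the limit."""
    widths = list(widths)
    changed = True
    while changed:
        changed = False
        for i in range(len(widths)):
            trial = list(widths)
            trial[i] = max(w_min, trial[i] - step)
            if trial[i] < widths[i] and path_delay(trial) <= limit:
                widths = trial
                changed = True
    return widths


w_fast = minimize_delay(4)                              # step 1
d_min = path_delay(w_fast)
w_small = shrink_within_budget(w_fast, 1.03 * d_min)    # step 2: 3% delay budget
print(w_fast, round(d_min, 3))
print(w_small, round(path_delay(w_small), 3))
```

Because delay is nearly flat around its minimum, a small delay allowance typically buys a disproportionately large reduction in total transistor size, which is why the second step targets the sum of sizes.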
44Experiment Tr. sizing(2/2)
- Limited implementation
- Non-linear optimizer + PathMill
- Same logic cells have the same structure.
- 52 variables in 32b-CSA
- (28 Tr. in FA + 12 Tr. in MUX + 12 Tr. in MUX)
- 34 variables in 16b-multiplier
- (28 Tr. in FA + 6 Tr. in AND2)
- Optimized 32b-CSA is imported into 16b-multiplier as CPA (carry-propagate adder).
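Because every instance of a logic cell shares one structure, the number of optimization variables is the number of transistors per distinct cell type, not per instance. A small sketch of that count follows; the key names `MUX_a` and `MUX_b` are placeholders for the two MUX entries listed on the slide.

```python
# Transistors per distinct cell type; each transistor is one sizing variable.
csa_cells = {"FA": 28, "MUX_a": 12, "MUX_b": 12}   # 32b-CSA
mult_cells = {"FA": 28, "AND2": 6}                 # 16b-multiplier


def variable_count(cells):
    """Sizing variables when all instances of a cell type share one structure."""
    return sum(cells.values())


csa_vars = variable_count(csa_cells)
mult_vars = variable_count(mult_cells)
print(csa_vars, mult_vars)
```

The counts match the 52 and 34 variables stated above, independent of how many times each cell is instantiated.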
45CPU Time
- 8-bit Carry Select Adder 4 hours
- 32-bit Carry Select Adder 7 hours
- 16-bit Tree-style multiplier 5 days
46Tr-Sizing Results (Delay)
Max. 20%, ave. 15% decrease
[Chart: bar labels -14%, -11%, -20%, -15%, -19%, -13%]
Evaluated by PathMill
47Tr-Sizing Results (Power)
Max. 58%, ave. 43% decrease
[Chart: bar labels -38%, -41%, -49%, -36%, -58%, -36%]
Evaluated by PowerMill with 100 random patterns in 10 ns cycles
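The headline maximum and average figures on the two Tr-sizing result slides can be reproduced from the six values listed on each, read here as percentage decreases. Which circuit each value belongs to is not recoverable from the text, so the lists below are unlabeled.

```python
from statistics import mean

# Six decrease values from each Tr-sizing result slide, in percent.
delay_decrease = [14, 11, 20, 15, 19, 13]
power_decrease = [38, 41, 49, 36, 58, 36]


def summarize(values):
    """Return (maximum, average rounded to the nearest integer)."""
    return max(values), round(mean(values))


delay_summary = summarize(delay_decrease)
power_summary = summarize(power_decrease)
print("delay:", delay_summary)
print("power:", power_summary)
```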
48Cell-Based Data-Path Design with On-Demand
Library Generation
- Experimentally evaluates effectiveness of
- Bit-slice layout that maintains regularity and signal flow
- Transistor sizing
- Automatic cell placement realizes equal quality in arithmetic unit design
- Transistor sizing is very effective both in delay and power
49Design Optimization in Cell-Based Design
Environment
- Design Optimization with On-Demand Library Generation
- Custom Design Quality in Cell-Based Design Environment by
- On-Demand Library Generation
- Post-Layout Transistor Sizing
- Large Power Savings
- Crosstalk Noise Reduction
- Cell-Based Data-Path Design with On-Demand Library Generation
- Manual I/Os, Automatic Cells works well
- Transistor sizing is effective both in delay and power