Title: VDSM and Full-Custom Layout Design Issues(2)
Contents
- Interconnect Modeling and Design
- IR drop and Ldi/dt analysis
- Full Custom Datapath Design
- Antenna Rules
- Interconnect Modeling and Design
Interconnect Scaling Delay
(Figure: interconnect contribution to signal delay vs. feature size (0.35, 0.25, 0.18, 0.13, 0.1 um); resistance per length (ohms/um) and capacitance (fF/um) curves for Ctotal, Ccoup, Cover, and resistance.)
Interconnect Modeling from Above-1um to Below-0.18um Technology Generations
Technology    | > 1 um    | 1.0-0.5 um        | 0.5-0.25 um            | < 0.18 um
Resistance    | Ignored   | Lumped            | Distributed            | Distributed
Capacitance   | Area cap. | Area, 2D fringing | Lat. coupling dominant | 2/3D fringing, lat. coupling
Inductance    | -         | -                 | -                      | Important
Circuit model | Lumped C  | Lumped RC         | Distributed RC         | RLC transmission line
Process characterization: nominal modeling -> statistical modeling
Extraction/Backannotation Flow
(Figure: flow blocks include the interconnect model library (process model), calibrated technology file, layout file, field solver, parasitic extractor (rule-based or library-based), netlist, enhanced netlist, backannotation, and simulation/verification.)
Layout Design Methodology
(Figure: RTL Verilog -> Synopsys synthesis -> floorplan and interconnect planning -> Avant! Apollo (P&R) -> Saturn (buffer resizing) -> 3D extraction -> static timing (SDF/DSPF) -> LVS/DRC; the flow branches on slack > 2 ns vs. slack < 2 ns.)
- Wire statistics and a wire model feed the planning step
- Interconnect planning
  - Synchronization
  - Clock skew
  - Power-ground noise
  - I/O switching
  - Time-critical bus plan
  - Repeater plan
Interconnect Planning
(Figure: plan.x takes the process library, netlist, P&R (Apollo) data, and floorplan (fPlan) with block abstracts, P/G trunks, and pads (LEF, TDF, CMD). It applies static timing, high-speed I/O, IR-drop/EM, I/O SSN, coupling-noise, and clock-tree models with SPICE-level optimization, and produces a P&R guide.)
Transistor/Interconnect Sizing
(Figure: a path from primary input (PI) to primary output (PO), sized by considering both transistor widths (wi) and wire widths (Wi), compared with sizing at a fixed wire width.)
Optimization Based on Posynomiality
- Positive polynomial (posynomial) delay model
  - After a logarithmic change of variables the problem is convex, so a local optimum is also the global optimum.
- Geometric programming approach
  - Based on Newton's method
  - Too slow for large circuits
- Iterative approach
  - Local refinement until there is no further improvement
  - TILOS
  - Fast engineering solution
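The iterative, TILOS-style approach can be sketched on a single inverter chain. This is a minimal illustration under stated assumptions, not the original TILOS code. It assumes a simple Elmore-style stage delay (rd/x)*(cp*x + cw + load), whose terms are all positive monomials and therefore posynomial. At each step it bumps the one stage that gives the largest delay reduction per unit of added width. All parameter names and values are hypothetical. The input capacitance of the first stage is not charged to any upstream driver, so enlarging that stage always looks attractive. A realistic version would model the upstream driver or add an area budget.

```python
from typing import List, Tuple


def chain_delay(x: List[float], rd: float, cg: float, cp: float,
                cw: List[float], cl: float) -> float:
    """Elmore-style delay of an inverter chain with stage sizes x.

    Stage i has drive resistance rd/x[i], self-load cp*x[i], wire load cw[i],
    and drives the next gate (cg*x[i+1]) or the output load cl. Every term is
    a positive monomial, so the delay is a posynomial in x.
    """
    total = 0.0
    for i, xi in enumerate(x):
        fanout = cl if i == len(x) - 1 else cg * x[i + 1]
        total += (rd / xi) * (cp * xi + cw[i] + fanout)
    return total


def greedy_size(n: int, rd: float, cg: float, cp: float, cw: List[float],
                cl: float, target: float, bump: float = 1.1,
                max_iter: int = 1000) -> Tuple[List[float], float]:
    """TILOS-style greedy sizing (hypothetical sketch).

    Repeatedly enlarge the single stage with the best delay reduction per
    unit of added width, until the target is met or no bump helps.
    """
    x = [1.0] * n  # start from minimum size
    d = chain_delay(x, rd, cg, cp, cw, cl)
    for _ in range(max_iter):
        if d <= target:
            break
        best_i, best_gain = -1, 0.0
        for i in range(n):
            trial = x.copy()
            trial[i] *= bump
            added_width = trial[i] - x[i]
            gain = (d - chain_delay(trial, rd, cg, cp, cw, cl)) / added_width
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i < 0:  # no single bump reduces the delay: refinement is done
            break
        x[best_i] *= bump
        d = chain_delay(x, rd, cg, cp, cw, cl)
    return x, d


sizes, delay = greedy_size(3, rd=1.0, cg=1.0, cp=0.5, cw=[0.5] * 3,
                           cl=20.0, target=8.0)
print(sizes, delay)
```

Each step is a cheap local move. That is why this style is a fast engineering solution compared with solving the full geometric program with Newton's method.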
Table-Driven Iterative Sizing
- Start from an initial sizing wi (transistors) and Wi (wires)
- Optimize wi (or Wi) along the critical path
  - Determine one wi (or Wi) with the others fixed
  - Repeat until there is no further improvement
- Delay evaluation
  - Analytic linear model: inexact
  - Simulation-based: very accurate, but excessive CPU time
  - Table-driven: uses SPICE simulation data (delay/slope/power); accurate and fast
    - Linear interpolation between data points
- Iterate between transistor sizing and interconnect sizing rather than solving for both simultaneously.
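The table-driven evaluation step amounts to a lookup with linear interpolation between characterized points. The sketch below is a minimal version that assumes a 2D table indexed by input slope and output load, interpolated bilinearly. The grid values are hypothetical, not taken from the slides. A real flow would hold separate tables for delay, output slope, and power.

```python
from bisect import bisect_right
from typing import List, Sequence


def _segment(axis: Sequence[float], v: float) -> int:
    """Index i of the table segment [axis[i], axis[i+1]] used for v (clamped to the ends)."""
    i = bisect_right(axis, v) - 1
    return max(0, min(i, len(axis) - 2))


def table_delay(slopes: Sequence[float], loads: Sequence[float],
                delay: List[List[float]], slope: float, load: float) -> float:
    """Bilinear interpolation in a precharacterized delay table.

    delay[i][j] is the SPICE-measured delay for input slope slopes[i] and
    output load loads[j]. Points outside the grid are linearly extrapolated
    from the nearest edge segment.
    """
    i, j = _segment(slopes, slope), _segment(loads, load)
    ts = (slope - slopes[i]) / (slopes[i + 1] - slopes[i])
    tc = (load - loads[j]) / (loads[j + 1] - loads[j])
    return (delay[i][j] * (1 - ts) * (1 - tc)
            + delay[i][j + 1] * (1 - ts) * tc
            + delay[i + 1][j] * ts * (1 - tc)
            + delay[i + 1][j + 1] * ts * tc)


# Hypothetical characterization grid (not from the slides).
SLOPES_PS = [50.0, 100.0, 200.0]
LOADS_FF = [10.0, 20.0, 40.0]
DELAY_PS = [[30.0, 45.0, 75.0],
            [35.0, 50.0, 80.0],
            [45.0, 60.0, 90.0]]

print(table_delay(SLOPES_PS, LOADS_FF, DELAY_PS, 75.0, 15.0))  # 40.0
```

A lookup like this costs a few multiplications, which is what makes it practical to reevaluate delay after every single-width change in the iterative loop.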
Ex. Repeater and Interconnect
- Optimized buffers (buf1, buf2, buf3): 70/27, 45/20, 78/30
- Optimized wires (wire1, wire2, wire3): width w = 0.94, spacing s = 1.26
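As a point of comparison for the repeater example, the classic closed-form repeater model (Bakoglu) gives the optimum number of segments k and the repeater size h. This is not necessarily the method used for the optimization above. Here r0 and c0 are the output resistance and input capacitance of a minimum inverter, and r_int and c_int are the total wire resistance and capacitance. All numbers are hypothetical.

```python
import math


def repeater_delay(k: float, h: float, r_int: float, c_int: float,
                   r0: float, c0: float) -> float:
    """Delay of a wire split into k segments driven by repeaters of size h.

    Expanded form of k * [0.7 (r0/h)(c_int/k + h c0)
                          + (r_int/k)(0.4 c_int/k + 0.7 h c0)].
    """
    return (0.7 * r0 * c_int / h + 0.7 * k * r0 * c0
            + 0.4 * r_int * c_int / k + 0.7 * r_int * h * c0)


def optimal_repeaters(r_int: float, c_int: float, r0: float, c0: float):
    """Set d/dk and d/dh of the delay to zero and solve for k and h."""
    k_opt = math.sqrt(0.4 * r_int * c_int / (0.7 * r0 * c0))
    h_opt = math.sqrt(r0 * c_int / (r_int * c0))
    return k_opt, h_opt


# Hypothetical values: 2 kohm / 2 pF wire, 10 kohm / 2 fF minimum inverter.
k, h = optimal_repeaters(2000.0, 2e-12, 1e4, 2e-15)
print(f"k_opt = {k:.1f}, h_opt = {h:.1f}")
```

With these assumed values, k is about 10.7 and h is about 70.7. In practice k is rounded to a neighboring integer and the delay is rechecked.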
Ex. Time-Critical Bus Design
(Figure: ReadBus<63:0>, 2000 um within the cache datapath; AnyBus<i> @ phi2; an 8000 um 64-bit bus with GND shielding.)
Bus delay after each optimization step:
1. After initial TR/interconnect sizing: 2525 ps (23 GD)
2. After evaluation TR size optimization: 2306 ps (21 GD)
3. After segment 2 shielding and interleaving: 1976 ps (18 GD)
4. After segment optimization (width, space, and interleaving): 1757 ps (16 GD)
5. After segment buffer trip-point tuning: 1647 ps (15 GD)
6. After segment 3 track assignment: 1537 ps (14 GD)
- Careful interleaving of buses so that opposite transitions on neighboring nets do not occur simultaneously
  - Functionally and temporally exclusive buses
  - Power/GND implicit shielding
  - Precharged buses (ReadBus) make monotonic transitions
- Delay reduced from 23 GD to 14 GD, where 1 GD = 110 ps
- Clock frequency increased from 140 MHz to 200 MHz, a performance improvement of about 40%
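The gate-delay (GD) figures and the quoted gain follow from simple arithmetic on the slide's numbers. The sketch below rechecks them. The step labels are paraphrased, and GD_PS = 110 is the slide's conversion factor.

```python
GD_PS = 110.0  # 1 gate delay (GD) is about 110 ps, per the slide

# (optimization step, measured bus delay in ps), from the slide
STEPS = [
    ("initial TR/interconnect sizing", 2525),
    ("evaluation TR size optimization", 2306),
    ("segment 2 shielding and interleaving", 1976),
    ("segment optimization (width, space, interleaving)", 1757),
    ("segment buffer trip-point tuning", 1647),
    ("segment 3 track assignment", 1537),
]


def to_gd(delay_ps: float) -> float:
    """Express a delay in gate delays."""
    return delay_ps / GD_PS


for name, ps in STEPS:
    print(f"{name}: {ps} ps = {to_gd(ps):.1f} GD")

delay_cut = 1 - STEPS[-1][1] / STEPS[0][1]  # fractional reduction in bus delay
clock_gain = 200 / 140 - 1                  # fractional clock-frequency increase
print(f"bus delay cut {delay_cut:.0%}, clock gain {clock_gain:.0%}")
```

The 140 MHz to 200 MHz step works out to roughly 43%, which the slide rounds to about 40%. The bus delay itself drops by roughly 39%.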
- IR Drop and L di/dt Analysis
IR Drop
- Voltage drop in the supply lines caused by the current drawn by cells
  - Chip malfunctions on certain vectors
  - Performance degradation
- Biggest problem: what is the worst-case vector?
  - Current depends on driver type, loads, and how often the cell switches
  - Voltage depends on the currents of other cells
- The power supply network consists of wires of varying sizes; they must be big enough, but making them too big wastes area
Electromigration
- Power supply lines fail due to excessive current
  - More severe for DC (unidirectional) current than for AC (bidirectional) current
  - Can be used as a fuse
- Prevention: wire cross-section-to-current rules
  - Maximum current density for particular materials (and via layers)
  - Higher limits for short, thin wires due to grain effects
  - Copper has about 100x the EM resistance of aluminum: no longer a problem?
- Current limit depends on wire size
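The cross-section-to-current rule can be turned into a minimum-width check. This is a minimal sketch that assumes a single maximum current density j_max for the layer. Real foundry limits depend on material, temperature, DC vs. AC current, and the short-wire (grain) effects noted above, and this sketch ignores all of those. The 1 mA/um^2 limit and the 0.5 um thickness are hypothetical.

```python
def min_em_width_um(current_ma: float, j_max_ma_per_um2: float,
                    thickness_um: float) -> float:
    """Smallest wire width whose cross-section keeps current density <= j_max."""
    return current_ma / (j_max_ma_per_um2 * thickness_um)


def em_ok(current_ma: float, width_um: float, j_max_ma_per_um2: float,
          thickness_um: float) -> bool:
    """True if the wire's current density is within the EM limit."""
    return current_ma / (width_um * thickness_um) <= j_max_ma_per_um2


# Hypothetical numbers: 10 mA DC through a 0.5 um thick line, limit 1 mA/um^2.
print(min_em_width_um(10.0, 1.0, 0.5))  # 20.0
```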
Power IR Drop
- IR drop
  - Slows down the circuitry
  - Inhibits switching and can cause loss of state
  - Critical to maintaining functionality and performance
- IR drop is aggravated by
  - Smaller transistors and longer interconnect
  - Long interconnect, which increases resistance
  - Higher frequency, which increases current density per die area
- IR drop is more serious than electromigration(?)
Comparison between Previous Approaches and the Proposed Approach
- Previous approaches
  - Transistor-level analysis at the P&R stage
  - Long simulation time for exact current estimation in full-chip power analysis
  - Huge RC netlist for the power network
  - Large CPU time for iterations
- Proposed approach
  - RTL-stage planning method based on the floorplan
  - Reduced turnaround time
  - Virtual power grid and current sources
  - Area-based DC current model
Optimization Flow
Size the power trunk width and the number and width of power refreshes so that the IR drop stays within 10%.
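The trunk-width part of this flow can be illustrated with a one-wire resistance model. This is a minimal sketch that assumes a single trunk of sheet resistance rs carrying the block current. The current is either drawn entirely at the far end (lumped) or drawn uniformly along the trunk, which halves the drop. The 10% budget is from the slide; the current, sheet resistance, length, and VDD are hypothetical.

```python
def trunk_ir_drop(i_a: float, rs_ohm_sq: float, length_um: float,
                  width_um: float, distributed: bool = False) -> float:
    """IR drop along a power trunk.

    Lumped: all current drawn at the far end -> I * R.
    Distributed: current drawn uniformly along the trunk -> I * R / 2.
    """
    r = rs_ohm_sq * length_um / width_um  # sheet resistance times number of squares
    return i_a * r / (2.0 if distributed else 1.0)


def min_trunk_width_um(i_a, rs_ohm_sq, length_um, vdd, budget=0.10,
                       distributed=False):
    """Narrowest trunk whose IR drop stays within budget * vdd."""
    factor = 2.0 if distributed else 1.0
    return i_a * rs_ohm_sq * length_um / (factor * budget * vdd)


# Hypothetical: 0.2 A block current, 0.04 ohm/sq, 5000 um trunk, VDD = 1.8 V.
w = min_trunk_width_um(0.2, 0.04, 5000.0, 1.8)
print(f"minimum trunk width ~ {w:.0f} um")
```

The power-refresh count and width would be sized the same way, per block, once the trunk voltage is known.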
Comparison
(Figure: results of (1) exact analysis and (2) early planning.)
Standard Cell Power Distribution
- Assumptions
  - Constant current sources
  - Uniform grid resistance network
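Under these two assumptions, a single standard-cell power rail reduces to a resistor ladder. The sketch below computes the drop at each tap for a rail fed from one end. Feeding from one end only is an assumption made for simplicity. With N cells, the far-end drop is r*I*N(N+1)/2, which shows why rail length and the spacing of power refreshes matter.

```python
from typing import List


def rail_drops(n_cells: int, r_seg: float, i_cell: float) -> List[float]:
    """IR drop at each cell tap on a rail fed from one end.

    The rail is a chain of n_cells equal resistors r_seg (uniform grid
    resistance), and every cell sinks the same constant current i_cell.
    Segment k (counting from the pad) carries the current of all cells
    beyond it, (n_cells - k) * i_cell.
    """
    drops, v = [], 0.0
    for k in range(n_cells):
        v += r_seg * (n_cells - k) * i_cell
        drops.append(v)
    return drops


print(rail_drops(4, 1.0, 1.0))  # [4.0, 7.0, 9.0, 10.0]
```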
Full-Chip Power Distribution
- Power trunk
  - Resistance network
  - Determine the metal width
- PAD
  - Determine the location and number of pads
- Block
  - Determine the number and metal width of power refreshes
Memory Power/Ground
(Figure: best case, typical case, and worst case.)
Final Result: Trunk Width vs. IR Drop
(Figure: IR drop (V) vs. block current (mA) for power metal widths of 120, 60, 30, and 20 um.)
Simultaneous Switching Noise (SSN) at I/O Driver
- The large loads on output and bidirectional pads cause a large transient current (di/dt), which causes ground bounce.
- I/O-limited design: staggered I/O, power …
- High speed: full-swing issue …
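(Figure: I/O driver between VDDIO/V33IO and VSSIO, with output and input paths through the IO pad, package, and PCB, each modeled by L, R, and C.)
- Package model (388 PBGA bonding wire and lead frame): L = 20 nH, R = 0.4 ohm, C = 1.5 pF
- FR-4 PCB model (7 mil): L = 42 nH, R = 10 ohm, C = 17 pF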
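A first-order ground-bounce estimate is V = L * di/dt summed over the drivers that share a ground pin. The minimal sketch below uses the package inductance from the model above (20 nH). It assumes that this inductance applies per ground pin, that di/dt is approximately i_peak / t_rise, and that parallel pins split the current evenly with mutual inductance ignored. The driver count, peak current, and rise time are hypothetical.

```python
import math


def ground_bounce_v(n_switching: int, i_peak_a: float, t_rise_s: float,
                    l_pin_h: float, n_ground_pins: int = 1) -> float:
    """Estimate V = L * di/dt for n drivers sharing n_ground_pins ground pins."""
    di_dt_per_pin = n_switching * i_peak_a / t_rise_s / n_ground_pins
    return l_pin_h * di_dt_per_pin


def min_ground_pins(n_switching, i_peak_a, t_rise_s, l_pin_h, v_max):
    """Fewest ground pins that keep the estimated bounce at or below v_max."""
    return math.ceil(n_switching * l_pin_h * i_peak_a / (t_rise_s * v_max))


# Hypothetical: 8 outputs, 20 mA each, 1 ns edges, 20 nH per pin.
print(ground_bounce_v(8, 0.02, 1e-9, 20e-9))     # about 3.2 V on one pin
print(ground_bounce_v(8, 0.02, 1e-9, 20e-9, 4))  # about 0.8 V on four pins
print(min_ground_pins(8, 0.02, 1e-9, 20e-9, 0.5))
```

Even this crude model shows why I/O-limited designs must add power/ground pads and stagger the switching I/Os.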
Power/Ground Bouncing
(Figure: power/ground bouncing on VDDIO, VDDcore, and VSS.)
High-Speed I/O: Speed, Swing, Power
LVTTL/CMOS
- Description: push/pull I/O, LVTTL levels
- Application: on-board, chip-to-chip
- Performance: 65-100 MHz
- Features: push/pull, LVTTL
- Benefits: standard I/O
SSTL (stub series terminated logic)
- Application: improves operation in situations where buses must be isolated from large stubs, such as memory
- Performance: 200 MHz
- Features: 4 classes (I, II, III, IV)
- Benefits: more drive flexibility, low voltage swing
HSTL (high-speed transceiver logic)
- Application: high-speed data communication such as DSP, fast computing systems, processor-to-memory
- Performance: up to 250 MHz
- Features: low voltage swing, low power dissipation, good noise immunity; 4 classes offer more flexibility
- Benefits: low voltage swing, low power dissipation, good noise immunity; 4 classes and a choice of termination scheme (terminated/unterminated/series terminated) offer more design flexibility
LVDS
- Description: low-voltage differential signaling
- Application: system-to-system communications for multiple parallel transactions and caches
- Performance: 1 Gb/s
- Features: 250-400 mV voltage swing, internal termination resistor
- Benefits: high frequency, excellent noise immunity, low power consumption, low ratio of GND and Vcc pins to I/Os, no external resistor needed, handles multiple parallel transactions
GigaBlaze
- Description: full-duplex, gigabit serial interface
- Application: serial interface to high-speed protocols; parallel bus replacement in chip-to-chip and board-to-board applications
- Performance: 3.1 Gb/s
- Features: full-duplex capability doubles data rates, includes a clock recovery circuit, meets the Fibre Channel spec
- Benefits: excellent signal integrity by eliminating SSOs in board-to-board links
Direct Rambus
- Description: Rambus Channel interface
- Application: high-speed chip-to-memory, graphics, multimedia, TV set-top box converters
- Performance: 1.6 Gbytes/s
- Features: 800 MT/s, built-in testability, 0.8 V signal swing
- Benefits: high bandwidth with low pin count
- Full-Custom Datapath Design
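Interconnect Capacitance
(Figure: 0.25 um triple-level-metal (TLM) cross-section of M1-M3 showing vertical (CV) and horizontal (CH) capacitance; labeled dimensions 0.60, 0.80, 0.4, 0.61, 0.80, 0.44, and 0.69 um.)
- Pitch (0.84 um) and aspect ratio (1.36-2.25)
- Large horizontal capacitance causes higher coupling between wires.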
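Inter-Wire Cross-Coupling Effect
(Figure: a victim wire between a left aggressor (L.Agg.) and a right aggressor (R.Agg.), with vertical capacitance CV and horizontal coupling capacitance CH to each neighbor.)
- $C_{eff} = \alpha \cdot C_H$ per coupling capacitor, where $\alpha$ is the switching activity factor
  - $\alpha = 2$ (bad situation): the neighbor switches in the opposite direction
  - $\alpha = 0$ (good situation): the neighbor switches in the same direction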
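The switching activity factor can be computed directly from the transition directions of the victim and its neighbors. The sketch below is minimal and assumes a switching victim (+1 or -1) and the usual Miller-factor convention: 0 for same direction, 1 for a quiet neighbor, 2 for the opposite direction. It adds the two coupling terms to CV. The capacitance values are hypothetical.

```python
def miller_factor(victim: int, aggressor: int) -> int:
    """Switching factor alpha for one coupling capacitor.

    Directions: +1 rising, -1 falling, 0 quiet (victim must be +1 or -1).
    Same direction -> 0, quiet neighbor -> 1, opposite direction -> 2.
    """
    return 1 - victim * aggressor


def effective_cap(victim: int, left: int, right: int, cv: float, ch: float) -> float:
    """Ceff = CV + alpha_L * CH + alpha_R * CH for a wire between two neighbors."""
    return cv + (miller_factor(victim, left) + miller_factor(victim, right)) * ch


# Hypothetical CV = 1, CH = 2: both neighbors switching against the victim.
print(effective_cap(+1, -1, -1, 1.0, 2.0))  # 9.0
```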
Interconnect Delay and Switching Behavior
(Figure: delay (ps, 0-1500) vs. wire length (0.4-5.2 x 1000 um) for the bad situation (α = 2), shielded or quiet neighbors (α = 1), and the good situation (α = 0).)
- Wire RC delay on a bus strongly depends on the switching behavior of its neighbors.
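The trend in this plot can be reproduced qualitatively with a distributed RC line whose capacitance per length depends on α. This minimal sketch uses the Elmore delay 0.5*r*c*L^2 and ignores the driver and receiver. It assumes that both neighbors share the same α. The per-length r, cv, and ch values are hypothetical, not extracted from the 0.25 um process above.

```python
def wire_delay_ps(length_um: float, alpha: float, r_per_um: float,
                  cv_ff_per_um: float, ch_ff_per_um: float) -> float:
    """Elmore delay 0.5 * r * c * L^2 of a distributed RC line, in ps.

    c = cv + 2 * alpha * ch: both neighbors are assumed to have the same
    switching factor alpha (0 good, 1 quiet/shielded, 2 bad).
    ohm * fF = 1e-15 s = 1e-3 ps, hence the final 1e-3 factor.
    """
    c = cv_ff_per_um + 2 * alpha * ch_ff_per_um
    return 0.5 * r_per_um * c * length_um ** 2 * 1e-3


# Hypothetical per-um values: r = 0.1 ohm, cv = 0.05 fF, ch = 0.08 fF.
for alpha in (0, 1, 2):
    row = [wire_delay_ps(l, alpha, 0.1, 0.05, 0.08) for l in (1000, 3000, 5000)]
    print(alpha, [f"{d:.0f} ps" for d in row])
```

The quadratic growth with length and the spread between α = 0 and α = 2 both match the qualitative shape of the figure.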
Observation
- Reducing the cross-coupling
  - Reduces the effective capacitance
  - Reduces the delay and power due to dynamic switching
- Policy: reduce Ceff by reducing αij for wire pairs with large Cij.
- A chip with a datapath has many wire pairs with large Cij. The question is how to reduce αij by exploiting the properties of such wires.
Interconnect Layout in Datapath
(Figure: datapath bit slices <63>, ..., <i>, ..., <1>, <0>.)
Ordering of Control Signal Wires
(Figure: select lines sel1-sel3 driving true/complement control wires s1-s4 and s1b-s4b, modeled as distributed RC lines with coupling capacitance between neighbors.)
- Most control signals drive multiplexers and usually lie on the time-critical path.
- Control signals si and sib switch in opposite directions and are therefore vulnerable to the cross-coupling effect.
Datapath Cell Layout for mux
(Figure: mux3 cell with control wires s1, s2, s3, s1b, s2b, s3b.)
- Vertical metal 3: data signals
- Horizontal metal 2: control signals
Control Signal Transition
- Observed opposite transitions
  - select(i): (si↑, sib↓)
  - deselect(i): (si↓, sib↑)
  - selection change (i -> j): (si↓, sj↑) and (sib↑, sjb↓)
- Suggestions
  - Avoid placing an si/sib pair next to each other
  - Avoid placing an si/sj pair next to each other
  - Avoid placing an sib/sjb pair next to each other
New Control Signal Ordering Scheme
Place only si/sjb (i ≠ j) pairs next to each other. All the opposite transitions are removed.
- Wire orderings for mux3:
  - order1: s1, s1b, s2, s2b, s3, s3b
  - order2: s1, s2, s3, s1b, s2b, s3b
  - proposed: s1, s2b, s3, s1b, s2, s3b
- Number of opposite transitions, order1 (order2) -> proposed:
  - mux3: 12 (8) -> 0
  - mux4: 24 (12) -> 0
- Equivalent capacitance, order1 (order2) -> proposed:
  - mux3: 1.78 (1.94) -> 1.11
  - mux4: 1.50 (1.67) -> 1.14
- Equivalent capacitance is reduced by 24-43%.
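The opposite-transition counts for this scheme can be reproduced by enumerating selection changes. The minimal sketch below assumes a one-hot select with complementary si/sib wires. It counts only selection changes i -> j (i ≠ j), summed over all ordered pairs. Under that assumption it gives exactly the 12/8/0 (mux3) and 24/12 (mux4) counts listed above.

```python
from itertools import permutations
from typing import Dict, List


def directions(n: int, old: int, new: int) -> Dict[str, int]:
    """Transition of every select wire when a one-hot select moves old -> new.

    +1 = rising, -1 = falling, 0 = quiet; sib is always the complement of si.
    """
    d = {}
    for k in range(1, n + 1):
        d[f"s{k}"] = int(k == new) - int(k == old)
        d[f"s{k}b"] = -d[f"s{k}"]
    return d


def opposite_transitions(order: List[str], n: int) -> int:
    """Count neighboring wire pairs that switch in opposite directions,
    summed over every selection change i -> j (i != j)."""
    count = 0
    for old, new in permutations(range(1, n + 1), 2):
        d = directions(n, old, new)
        count += sum(1 for a, b in zip(order, order[1:]) if d[a] * d[b] == -1)
    return count


ORDER1 = ["s1", "s1b", "s2", "s2b", "s3", "s3b"]
ORDER2 = ["s1", "s2", "s3", "s1b", "s2b", "s3b"]
PROPOSED = ["s1", "s2b", "s3", "s1b", "s2", "s3b"]
print([opposite_transitions(o, 3) for o in (ORDER1, ORDER2, PROPOSED)])  # [12, 8, 0]
```

Any ordering in which every neighboring pair is si/sjb with i ≠ j gives zero. Such a pair moves together only when the selection changes between i and j, and then both wires move in the same direction. One such mux4 ordering is s1, s2b, s3, s4b, s2, s1b, s4, s3b. It is our own construction, not taken from the slides.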
Experimental Results
64-bit datapath consisting of 3-to-1 and 4-to-1 muxes; control signal length 1000 um
Density Control for CMP
- Causes of CMP variability
  - The pad deforms over the metal frame
  - Greater ILD thickness over dense regions of the layout
  - Dishing in sparse regions of the layout
- Layout density control
  - Modern foundry rules specify layout density bounds to minimize the impact of CMP on yield
  - Uniform density is achieved by post-processing: insertion of dummy features
  - Density rules control the local feature density in W x W windows
    - E.g., for metal layout, every 2000 um x 2000 um window must be between 35% and 70% filled.
  - Filling: insertion of dummy features to improve layout density
  - Accurate knowledge of the fill is required during physical design and verification
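A density rule like the one above can be checked by clipping the layout shapes against each window. This is a minimal sketch that assumes non-overlapping rectangles and a fixed, non-overlapping tiling whose size divides the chip dimensions. Production checkers also step the windows so that they overlap, which this sketch does not do. The 35%/70% bounds are the example from the slide; the layout is hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in um, non-overlapping


def overlap_area(r: Rect, wx0: float, wy0: float, wx1: float, wy1: float) -> float:
    """Area of rectangle r that falls inside the window."""
    dx = min(r[2], wx1) - max(r[0], wx0)
    dy = min(r[3], wy1) - max(r[1], wy0)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def window_densities(rects: List[Rect], chip_w: int, chip_h: int,
                     win: int) -> Dict[Tuple[int, int], float]:
    """Metal density of each win x win tile, keyed by its lower-left corner."""
    dens = {}
    for wx in range(0, chip_w, win):
        for wy in range(0, chip_h, win):
            filled = sum(overlap_area(r, wx, wy, wx + win, wy + win) for r in rects)
            dens[(wx, wy)] = filled / (win * win)
    return dens


def density_violations(dens, lo=0.35, hi=0.70):
    """Tiles that need dummy fill (below lo) or are too dense (above hi)."""
    return {k: d for k, d in dens.items() if not lo <= d <= hi}


# Hypothetical 4000 x 2000 um chip with one 2000 x 1000 um metal block.
dens = window_densities([(0, 0, 2000, 1000)], 4000, 2000, 2000)
print(density_violations(dens))  # {(2000, 0): 0.0}
```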
Hot Electron Effect
- A short-channel effect
- Caused by an extremely high electric field in the channel
  - Occurs when voltages are not scaled as fast as dimensions
  - Electrons pick up speed (energy) in the channel
  - The oxide and/or interface is damaged
- Symptom: threshold shift over time until the chip fails
- Prevention
  - Stay within the allowable region for input slew and output load
  - Set the maximum allowed degradation over the life of the device
  - Size devices as needed
Standard Cell
Standard Cell Implementation Flow
RTL description -> synthesis -> gate-level netlist -> P&R -> final layout
Custom Datapath
- The datapath covers up to 60% of chip area, especially in DSP applications.
- Up to 40% chip area reduction compared with a standard-cell implementation.
- More than 30% power saving is possible with an optimized datapath implementation.
- A compact datapath core is a competitive IP.
Commercial Chip Datapath
(Figure: control/buffer area and bit-slice area.)
Custom Datapath Design Flow
RTL description / schematic entry -> micro-floorplan -> final layout
Hand-Crafted Density Based on Bit-Wise Regularity
Mosaic Datapath Compiler
- Structure analysis and micro-floorplan
  - Regularity analysis
  - Bus flow analysis
  - Element ordering
  - Track assignment
  - Routing analysis
- Regular/irregular mixed placement
  - Pre-placement analysis
- Benefits
  - Reduced design turnaround time
  - Engineering changes can be analyzed quickly
  - Reduced bit-wise skew
  - Improved routability
  - Predictable results
  - Design flexibility (bus row, aspect ratio)
  - Hand-crafted density and performance
  - Mixed cell-based and bit-slice style layout
Cell-Based Datapath Compiler for Fast, Error-Free Design
(Figure: schematic library cells inv, tsbt, nand2/3/4, nor2/3, latl, lath, regr, pass, mux2s/3s/4s, mux2t/3t/4t.)
- A schematic library with many cell types enables easy and fast schematic entry.
Cell Generation with Automated Library Builder
- Symbolic layout makes technology mapping easy
- Technology migration takes advantage of existing layouts
(Figure: existing layout -> cell extraction -> symbolic layout -> cell compaction (using the technology file and cell geometry) -> target mask layout, with re-routing.)
Plasma
Plasma (Dry Etch)
- Ions in the plasma are accelerated across the sheath and strike the wafer, so etching proceeds mainly in the vertical direction (anisotropic etching).
- Because ion bombardment assists the chemical reaction (ion-assisted etching), the process is called reactive ion etching (RIE).
- The reaction products must be volatile so that they can be removed.
Plasma Damage
- Damage to devices (MOSFETs) caused by plasma processes (etch or ash) during fabrication
- Physical damage (silicon damage)
  - Energetic particle bombardment, polymer deposition, metal contamination
  - Effects: junction leakage, contact resistance, minority-carrier lifetime
- Electrical damage (gate-oxide damage)
  - Charging (electrical traps), UV radiation
  - Effects: gate-oxide leakage, device parameter shifts (Vth, Idsat), poor yield/reliability
Why Plasma Damage Has Become an Issue
- Device scaling (CD reduction)
- High-density plasma
- Multiple-metal processes
- Direct charging
- Charging paths
Cause of Plasma Damage
(Figure: dry etching uses a plasma; plasma damage arises through the antenna ratio, plasma non-uniformity, and the electron shading effect (ESE).)
Antenna Rules
- Charging in semiconductor processing
  - Many process steps use plasma, i.e., charged particles
  - Charge collects on conducting poly and metal surfaces
  - Capacitive coupling produces large electric fields across the gate oxide
  - The stress causes damage or complete breakdown
  - Induced Vt shifts affect device matching (e.g., in analog)
- Standard solution: limit the antenna ratio
  - $\text{Antenna ratio} = (A_{poly} + A_{m1} + \dots) / A_{gate\text{-}ox}$
  - E.g., antenna ratio < 300
  - $A_{mx}$: area of metal(x) electrically connected to the node without using metal(x+1), and not connected to an active area
- General solution: bridging (break the antenna by moving the route to a higher layer)
- Antennas can also be fixed with protection diodes
  - Not free (leakage power, area penalty)
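The basic rule above is a ratio check per gate node. This minimal sketch takes the collecting areas per layer as input. The caller is assumed to have applied the A_mx definition already, so that only conductor connected to the gate and not yet tied to diffusion is counted; no geometry extraction is done here. The limit of 300 is the slide's example, and the areas are hypothetical.

```python
from typing import Dict


def antenna_ratio(collector_areas_um2: Dict[str, float], gate_ox_area_um2: float) -> float:
    """(A_poly + A_m1 + ...) / A_gate-ox for one gate node."""
    return sum(collector_areas_um2.values()) / gate_ox_area_um2


def antenna_ok(collector_areas_um2: Dict[str, float], gate_ox_area_um2: float,
               limit: float = 300.0) -> bool:
    """True if the node's antenna ratio is below the limit."""
    return antenna_ratio(collector_areas_um2, gate_ox_area_um2) < limit


# Hypothetical node: 2 um^2 of gate oxide.
print(antenna_ratio({"poly": 40.0, "m1": 400.0}, 2.0))  # 220.0
print(antenna_ok({"poly": 40.0, "m1": 700.0}, 2.0))     # False (ratio 370)
```

When a node fails, the fixes listed above apply: bridging to a higher layer shrinks the collecting area, and a protection diode gives the charge another path.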
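Plasma Damage Mechanism (1/3)
1. Antenna ratio
(Figure: gate poly-Si connected to a charge-collecting antenna under the plasma source; the plasma current IP collected by the antenna drives a Fowler-Nordheim current IFN through the gate oxide of an n-channel transistor (source, drain, gate, field oxide) on a grounded p-Si substrate.)
F-N current induced by the antenna ratio.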
Plasma Damage Mechanism (2/3)
2. Plasma non-uniformity
(Figure: gate poly-Si connected to charge-collecting antennas under a non-uniform plasma; different plasma currents IP1 and IP2 drive F-N currents IFN1, IFN2, and IFN through the gate oxides on a floating substrate.)
F-N current induced by plasma non-uniformity.
Plasma Damage Mechanism (3/3)
3. ESE (Electron Shading Effect)
- In high-aspect-ratio (A/R) etching, Te >> Ti. Electrons arrive nearly isotropically and are shaded by the pattern sidewalls, while ions are accelerated vertically and reach the bottom, so positive charge builds up at the bottom.
- This charge flows through the conducting plug and causes F-N current through the gate oxide.
- Layout-topology dependent
F-N current induced by ESE.
Antenna Effect in Plasma Processes
(Figure: plasma damage.)
How To Calculate Antenna Ratio
The single-layer antenna ratio is calculated at each process step and accumulated incrementally.
1) Single-layer antenna ratio
$$AR = \frac{\text{area of charge-collecting layer on node}}{\text{area of gate dielectric on node}} \times \frac{1}{\text{antenna coeff.}}$$
Antenna coefficients: GPOLY 30, CNT/VIA 5, Metal 100.
<Example> (figure with layout dimensions): antenna ratios of 0.12 and 0.09.
How To Calculate Antenna Ratio
2) Cumulative antenna ratio (CAR)
$$CAR = \sum_{i=\text{GPOLY}}^{(\text{TOP}-1)\ \text{metal}} AR(i)$$
where AR(i) is the single-layer antenna ratio of layer i.
<Example>
1) CAR for MOS1 = (10/30) + (0.46/5) + (76.2/100) + (0.15/5) = 1.22
   CAR > 1, so MOS1 has an antenna error. Fix: a diode connected to the gate electrode of the node provides a shunt path for the collected charge and therefore protects the node against antenna damage. Both N/P-well and P/N-well diodes are effective for this purpose.
2) CAR for MOS2 = (2.5/30) + (0.23/5) + (76.2/100) + (0.15/5) = 0.92
   This transistor meets the rule.
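The MOS1/MOS2 example can be recomputed mechanically. In this minimal sketch, each term is an (area ratio, layer class) pair, divided by the antenna coefficient of its class and summed. The coefficients and area ratios are from the slides. Labeling the second and fourth terms as CNT/VIA is an inference from their coefficient of 5, and the dictionary keys are hypothetical names.

```python
# Antenna coefficients from the slide: GPOLY 30, CNT/VIA 5, Metal 100.
ANTENNA_COEFF = {"GPOLY": 30.0, "CNT/VIA": 5.0, "METAL": 100.0}


def single_layer_ar(layer: str, area_ratio: float) -> float:
    """AR(i) = (collecting area / gate-dielectric area) * (1 / antenna coeff.)."""
    return area_ratio / ANTENNA_COEFF[layer]


def cumulative_ar(terms):
    """CAR = sum of AR(i) from GPOLY up to the (TOP-1) metal layer."""
    return sum(single_layer_ar(layer, ratio) for layer, ratio in terms)


# Area ratios from the slide's example (layer labels partly inferred).
MOS1 = [("GPOLY", 10.0), ("CNT/VIA", 0.46), ("METAL", 76.2), ("CNT/VIA", 0.15)]
MOS2 = [("GPOLY", 2.5), ("CNT/VIA", 0.23), ("METAL", 76.2), ("CNT/VIA", 0.15)]

for name, terms in (("MOS1", MOS1), ("MOS2", MOS2)):
    car = cumulative_ar(terms)
    status = "antenna error" if car > 1 else "meets the rule"
    print(f"{name}: CAR = {car:.2f} -> {status}")
```

This prints CAR = 1.22 (antenna error) for MOS1 and CAR = 0.92 (meets the rule) for MOS2, which agrees with the hand calculation.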
How To Calculate Antenna Ratio
(Figure: cases (I) and (II) for a 4-metal-level process, showing the layers included in the C.A.R. (cumulative antenna ratio) calculation.)
Metal Routing for CAR
(Figure, top: a gate connected through long Met1/Met2 routing collects charge from the plasma, and the charge damages the gate.)
(Figure, bottom: the route is broken and jumped through Met3, so the plasma charge is discharged and the gate is not damaged.)
Plasma Damage Protection Diode for CAR
A protection diode is connected to the gate so that charge collected from the plasma is discharged through the diode instead of through the gate oxide.
Antenna Rule Summary
- Plasma damage occurs in dry etching processes.
- Causes of plasma damage include the antenna ratio, plasma non-uniformity, and the electron shading effect.
- The antenna ratio is accumulated incrementally over the conductors that collect plasma charge and connect to the gate.
- Antenna failures degrade device characteristics and can be fixed by metal routing or by adding protection diodes.