Title: Design Automation for Asynchronous Circuits
1Design Automation for Asynchronous Circuits
- Alex Kondratyev
- Cadence Berkeley Labs, Berkeley, CA, USA
In collaboration with Jordi Cortadella, Luciano Lavagno, Kelvin Lwin and Christos Sotiriou
2Outline
- What do we optimize?
- End of deterministic design
- Technical and business implications
- Asynchronous design with commercial tools
- Desynchronization
- Delay-insensitive datapath
- Fine-grain pipelining
3Optimization metrics
- Late 1970s
  - Literals
  - Nodes of a Boolean network
  - Levels of a Boolean network
  - Targets: area, speed
- Nowadays
  - Literals
  - Nodes of a Boolean network
  - Levels of a Boolean network
  - Wire length
  - Targets: area, speed
Tools are optimizing for area and speed!
5Universal metrics
Power: $P_{dyn} = a \cdot f_{clk} \cdot C \cdot V_{dd}^2$
Area: a small design has a small C
Delay: lower delay ⇒ the supply voltage can be lowered ⇒ power goes down
Speed can be taken as a universal metric
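The dynamic power relation on this slide can be tried out numerically. The sketch below is only an illustration: the operating point (activity, clock frequency, capacitance, voltages) is hypothetical and not taken from the talk. It shows how lowering the supply voltage, which spare speed makes possible, reduces power quadratically.

```python
def dynamic_power(activity, f_clk_hz, cap_farads, vdd_volts):
    """Dynamic power P_dyn = a * f_clk * C * Vdd^2, in watts."""
    return activity * f_clk_hz * cap_farads * vdd_volts ** 2

# Hypothetical operating point, chosen only for illustration.
nominal = dynamic_power(activity=0.1, f_clk_hz=200e6, cap_farads=1e-9, vdd_volts=1.2)

# Same circuit and clock, with Vdd lowered by 10%.
scaled = dynamic_power(activity=0.1, f_clk_hz=200e6, cap_farads=1e-9, vdd_volts=1.08)

# Power scales with the square of the voltage ratio: (1.08 / 1.2)^2.
power_ratio = scaled / nominal
print(f"nominal = {nominal * 1e3:.1f} mW, ratio after scaling = {power_ratio:.2f}")
```

A 10% lower supply gives roughly a 19% power saving in this model, which is why timing slack can be traded for power.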
6Outline
- What do we optimize?
- End of deterministic design
- Technical and business implications
- Asynchronous design with commercial tools
- Desynchronization
- Delay-insensitive datapath
- Fine-grain pipelining
7Timing margins
- Algorithms/tools (approximations)
- Modeling (process corners e.g.)
- Architecture (unbalanced computation)
8Algorithms/tools
False paths (< 5%)
Common path pessimism removal
Hierarchy hurts!
10-35% gain from floorplan flattening (Reshape)
Bad news: we do not know how far we are from the optimum
Good news: the optimum is not possible to find
9Modeling
Why panic?
New BIG players: signal integrity and process variability
10Variability sources
- Environment (T, Vdd), signal integrity
  - Within-die only
- Process variations (gate length L, wire width W, threshold voltage Vt)
  - Die-to-die (design independent)
  - Within-die (design dependent)
11Environment SI
Temperature: -40°C to 125°C
Supply voltage: 10% variation of VDD
IR drop: decrease in the supply voltage delivered from Vdd
Bad news
- $10^6$ gates x 8 metal layers
- ≈ $10^9$ RC elements in the VDD grid
Good news
- Field solvers can handle $10^7$ variables
- Abstraction, model reduction and IP reuse help further
Tools sign off IR drop at 5% Vdd (still ≈ 10% delay penalty)
12Environment SI
Crosstalk
Conservative analysis: up to 20% delay penalty (post-layout fixes)
13Process variations
- Within-die: design dependent, systematic and random!
- Die-to-die: design independent, well modeled via worst-case files
(Figure: variations of Lgate, Wwire and Tt [Nassif01])
14Measuring variability
Microprocessor: at-speed functional testing
(Figure: number of chips vs. frequency, split into Bin1, Bin2, Bin3)
ASIC: no delay testing, no binning
Strategically placed ring oscillators (RO)
Problem: up to 15% delay variation in RO [Nassif03]: vertical/horizontal (4%), poly-Si spacing (7%), distance (5%)
15Modeling variability
Model for gate delay: linear with respect to the variability sources
Independence of sources (within a group: model reduction by PCA or SVD)
For a single variability source L: $L_{var} = L_{spatial} + L_{random}$
($L_{var}$ is modeled by normally distributed random variables $N(0, \sigma)$)
Variation of path delay: $D_{var} = \sum d_{var}(L_{var})$
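A small Monte Carlo sketch can make the linear model concrete. Everything numeric here is an assumption for illustration (nominal delays, sensitivities, sigmas), as is the choice to treat the spatial component as fully shared by all gates on the path and the random component as independent per gate.

```python
import math
import random
import statistics

def path_delay_samples(nominal, sens, sigma_spatial, sigma_random,
                       trials=20000, seed=1):
    """Sample the path delay under a linear gate-delay model.

    Each gate delay is nominal + sensitivity * (L_spatial + L_random).
    Assumption: L_spatial is shared by the whole path, L_random is per gate.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        l_spatial = rng.gauss(0.0, sigma_spatial)
        total = 0.0
        for d0, k in zip(nominal, sens):
            l_random = rng.gauss(0.0, sigma_random)
            total += d0 + k * (l_spatial + l_random)
        samples.append(total)
    return samples

def analytic_sigma(sens, sigma_spatial, sigma_random):
    """Closed-form sigma of the path delay under the same assumptions."""
    shared = (sum(sens) * sigma_spatial) ** 2
    independent = sum(k * k for k in sens) * sigma_random ** 2
    return math.sqrt(shared + independent)

# Hypothetical 4-gate path: delays in ps, sensitivities in ps per unit of L.
NOMINAL_PS = [100.0, 100.0, 100.0, 100.0]
SENS = [1.0, 1.0, 1.0, 1.0]

samples = path_delay_samples(NOMINAL_PS, SENS, sigma_spatial=2.0, sigma_random=3.0)
print("sampled mean :", round(statistics.mean(samples), 1))
print("sampled sigma:", round(statistics.stdev(samples), 1))
print("analytic sigma:", analytic_sigma(SENS, 2.0, 3.0))
```

The shared component adds up linearly along the path while the independent component adds in quadrature, so correlation between gates dominates the spread of long paths.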
16Statistical timing analysis
Reconvergence needs some care
- Numerical computation of a distribution
- Approximate convolution (5% accuracy)
- Use upper and lower bounds (10% difference) [Blaauw03]
Algorithms have linear complexity!
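As a rough sketch of the numerical approach, delay distributions can be kept as discrete tables and propagated through the timing graph: a sum for stages in series, a max where paths merge. The table representation, the function names and the two-valued gate delay below are assumptions for illustration. The max shown is exact only for independent inputs, which is why reconvergent paths (which share gates) need extra care.

```python
def convolve(pdf_a, pdf_b):
    """Distribution of the sum of two independent delays (stages in series)."""
    out = {}
    for da, pa in pdf_a.items():
        for db, pb in pdf_b.items():
            out[da + db] = out.get(da + db, 0.0) + pa * pb
    return out

def stat_max(pdf_a, pdf_b):
    """Distribution of the later of two arrivals; valid for independent inputs only."""
    out = {}
    for da, pa in pdf_a.items():
        for db, pb in pdf_b.items():
            later = max(da, db)
            out[later] = out.get(later, 0.0) + pa * pb
    return out

# Hypothetical gate: delay is 1 or 2 time units with equal probability.
gate = {1: 0.5, 2: 0.5}

path = convolve(gate, gate)       # two gates in series
merged = stat_max(path, path)     # two such independent paths meeting at a gate

print("path  :", path)
print("merged:", merged)
```

Merging shifts probability toward the larger delays: the chance of the worst-case value 4 grows from 0.25 on one path to 0.4375 after the merge.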
17What does it buy?
Trading yield
STA helps to quantify risk (reduce margin and be structure specific)
STA might help to trade off confidence margin and yield (testing?)
- Open issues
  - Why normal?
  - How to derive σ?
  - How to derive sensitivity coefficients?
18Outline
- What do we optimize?
- End of deterministic design
- Technical and business implications
- Asynchronous design with commercial tools
- Desynchronization
- Delay-insensitive datapath
- Fine-grain pipelining
19Summing this up
(Figure: cycle time split into clock overhead, variability, worst-case vs. average-case margin and real computation time; values shown: 25, 30, 45)
Some designs run twice as fast as the spec requires!
Everything boils down to:
synchronous design is turning out to be a costly proposition
20Is asynchronous an option?
It is about time, but an asynchronous CAD tool must meet these requirements:
- Competitive
  - added value with minimal (or no) penalty
  - scalable (capable of handling large designs)
- Simple
  - minimal knowledge of asynchronous design
  - RTL input
- Risk-free
  - does not change sign-off (STA)
  - complete solution in verification and testing
  - backup options (synchronous implementation)
21Outline
- What do we optimize?
- End of deterministic design
- Technical and business implications
- Asynchronous design with commercial tools
- Desynchronization
- Delay-insensitive datapath
- Fine-grain pipelining
22Sliding the trade-off curve
(Figure: automation efforts along the trade-off curve, with granularity ranging from gates to blocks)
- QDI fine-grain pipelining: template-based gate-level pipelining
- QDI datapath: NCL, phased logic
- Bundled data: desynchronization
- Penalties? EMI and skew penalty, variability, average speed
23Desynchronization flow
- Think synchronous
- Design synchronous: one clock and edge-triggered flip-flops
- De-synchronize (automatically)
- Run it asynchronously
Asynchronous for dummies
24Synchronous circuit
(Figure: synchronous circuit with latches (L) holding 0/1 values, driven by CLK)
25De-synchronization
(Figure: the latches (L) and their 0/1 values, without CLK)
26De-synchronization
Distributed controllers substitute the clock network
(Figure: local controllers (C) in place of the clock network)
The data path remains intact!
27(Figure: A, B, C, D)
28(Figure: timing diagram of A, B, C, D with the falling transitions A-, B-, C-, D-)
Overlapping is also acceptable
29Concurrent model
30For any netlist
31Synchronization layer
32Synchronization layer
33Synchronization layer
This is a circuit marked graph (CMG)
34Properties of CMGs
- Any CMG is live and safe
  - Safeness: no data overwriting
  - Liveness: no deadlock
(Figure: marked graph over the transitions A, B, C and A-, B-, C-)
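The two properties can be checked on a small example by playing the token game exhaustively. The sketch below is not the construction from the talk: the four-transition ring, the `A+`/`A-` naming and the `explore` helper are hypothetical. It reports whether any reachable marking puts more than one token on an arc (unsafe) or enables no transition (deadlock).

```python
from collections import deque

def explore(arcs, max_markings=10000):
    """Exhaustively play the token game of a marked graph.

    arcs maps (source_transition, target_transition) to its initial token count.
    Returns (safe, deadlock_free) over the reachable markings that were explored.
    """
    arc_list = sorted(arcs)
    transitions = sorted({t for arc in arc_list for t in arc})
    start = tuple(arcs[arc] for arc in arc_list)
    seen = {start}
    queue = deque([start])
    safe, deadlock_free = True, True
    while queue and len(seen) <= max_markings:
        marking = queue.popleft()
        if any(tokens > 1 for tokens in marking):
            safe = False  # two tokens on one arc: data would be overwritten
        enabled = [t for t in transitions
                   if all(marking[i] > 0
                          for i, (_, dst) in enumerate(arc_list) if dst == t)]
        if not enabled:
            deadlock_free = False  # nothing can fire from this marking
        for t in enabled:
            nxt = list(marking)
            for i, (src, dst) in enumerate(arc_list):
                if dst == t:
                    nxt[i] -= 1  # consume a token from every input arc
                if src == t:
                    nxt[i] += 1  # produce a token on every output arc
            nxt = tuple(nxt)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return safe, deadlock_free

# Hypothetical ring of latch events with a single token.
ring = {("A+", "B+"): 0, ("B+", "A-"): 0, ("A-", "B-"): 0, ("B-", "A+"): 1}
print(explore(ring))
```

Changing the initial marking shows both failure modes: with no token the ring deadlocks, and with two tokens on one arc it is no longer safe.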
36Flow equivalence [Guernic, Talpin, Lann, 2003]
(Figure: A, B)
38Flow equivalence
CLK
A: 1 3 0 2 1 5 3 1 6 0
B: 5 1 2 3 1 4 2 4 3 1
Synchronous behavior
A: 1 3 0 2 1 5 3 1 6 0
B: 5 1 2 3 1 4 2 4 3 1
De-synchronized behavior (same value sequences, different timing)
Theorem: the de-synchronization model preserves flow equivalence
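Flow equivalence compares only the sequence of values each storage element holds, not when it holds them. A minimal sketch, assuming each behavior is given as a per-latch list of (time, value) pairs: the value sequences come from the slide, while the timestamps and function names are hypothetical.

```python
def value_sequence(trace):
    """Project a timed trace [(time, value), ...] onto its ordered values."""
    return [value for _, value in sorted(trace, key=lambda event: event[0])]

def flow_equivalent(behavior_x, behavior_y):
    """True if every latch stores the same sequence of values in both behaviors."""
    if behavior_x.keys() != behavior_y.keys():
        return False
    return all(value_sequence(behavior_x[latch]) == value_sequence(behavior_y[latch])
               for latch in behavior_x)

A_VALUES = [1, 3, 0, 2, 1, 5, 3, 1, 6, 0]
B_VALUES = [5, 1, 2, 3, 1, 4, 2, 4, 3, 1]

# Synchronous: both latches update on the same clock ticks (hypothetical period 10).
sync = {
    "A": [(10 * k, v) for k, v in enumerate(A_VALUES)],
    "B": [(10 * k, v) for k, v in enumerate(B_VALUES)],
}

# De-synchronized: each latch updates at its own, uneven times (hypothetical).
desync = {
    "A": [(10 * k + (k % 3), v) for k, v in enumerate(A_VALUES)],
    "B": [(10 * k + 4 + 2 * (k % 2), v) for k, v in enumerate(B_VALUES)],
}

print(flow_equivalent(sync, desync))
```

The check passes although no two latches update at the same instant in the de-synchronized trace; changing a single stored value would make it fail.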
39Timing equivalence
(Figure: A, B, C, D with delays del_a, del_b, del_c and the timing diagram of their transitions)
del_b = del_a = del_c = del_d
Synchronous-like behavior
40Timing equivalence
(Figure: A, B, C, D with delays del_a, del_b, del_c and the timing diagram of their transitions)
del_b > del_a = del_c = del_d
B keeps the same period and settles the rest
41Compatibility
Synchronous: $T_{sync} \geq T_{CQ} + T_{comb} + T_{setup} + T_{skew}$
Desynchronized: $T_{desync} \geq T_{CQ} + T_{comb} + T_{controller}$
Statement: a desynchronized design is behavior- and timing-compatible with its synchronous counterpart
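Comparing the two bounds term by term suggests that the desynchronized cycle is no longer than the synchronous one whenever the controller overhead does not exceed setup plus skew. The sketch below only evaluates the two sums; all delay values are hypothetical, in picoseconds.

```python
def sync_cycle(t_cq, t_comb, t_setup, t_skew):
    """Lower bound on the synchronous cycle time."""
    return t_cq + t_comb + t_setup + t_skew

def desync_cycle(t_cq, t_comb, t_controller):
    """Lower bound on the desynchronized cycle time."""
    return t_cq + t_comb + t_controller

# Hypothetical delays in picoseconds, for illustration only.
T_CQ, T_COMB, T_SETUP, T_SKEW, T_CONTROLLER = 300, 3000, 200, 300, 450

t_sync = sync_cycle(T_CQ, T_COMB, T_SETUP, T_SKEW)
t_desync = desync_cycle(T_CQ, T_COMB, T_CONTROLLER)

# The shared terms cancel, so only controller vs. setup + skew matters.
no_slower = T_CONTROLLER <= T_SETUP + T_SKEW
print(t_sync, t_desync, no_slower)
```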
42Synchronous environment
(Figure: A, B, C and Clk with their transitions A-, B-, C-, Clk-; one arc is marked as a timing arc)
43Implementation of a controller
- Only local handshakes with adjacent controllers are necessary
- Synthesis by using intuition, common sense, and petrify
44Implementation of a controller
45Delay matching
(Figure: combinational logic with a delay element d)
46Post-layout delay matching
Combinational logic
47Post-layout delay matching
Combinational logic
48Desynchronization. Gaining Trust
Synchronous RTL
49Async DLX block diagram
50Desynchronization. Gaining Trust
Synchronous RTL
Synchronous: cycle 4.4 ns, power 70.9 mW, area 372,656 µm²
Desynchronized: cycle 4.45 ns, power 71.2 mW, area 378,058 µm²
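The relative cost of desynchronization follows directly from the two rows above. A short calculation, using only the figures reported on the slide (the dictionary keys are just labels chosen here):

```python
SYNC = {"cycle_ns": 4.4, "power_mw": 70.9, "area_um2": 372656}
DESYNC = {"cycle_ns": 4.45, "power_mw": 71.2, "area_um2": 378058}

# Relative overhead of the desynchronized design, in percent.
overhead_pct = {key: 100.0 * (DESYNC[key] - SYNC[key]) / SYNC[key] for key in SYNC}

for key, pct in overhead_pct.items():
    print(f"{key}: +{pct:.2f}%")
```

All three overheads come out below 1.5%, which is the basis for the "no penalty, no advantage" lessons that follow.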
51DLX lessons. Positive
- Asynchronous design with no area, power or delay penalties
- Partial tolerance of variability (matched delays scale with the rest of the gates)
(Figure: req signal) Treq > Tclk ⇒ error
52DLX lessons. Negative
- Asynchronous design with no area, power or delay advantage
- Clock power is saved, but latch-based designs have higher loads
- P&R constraints of the de-synchronized design are non-trivial
- Matched delay variability might hurt
Hard work just to break even with synchronous
53Can we do better?
(Figure: blocks labeled S, M, M, S)
54Sliding the trade-off curve
(Figure: automation efforts along the trade-off curve, with granularity ranging from gates to blocks)
- QDI fine-grain pipelining: template-based gate-level pipelining
- QDI datapath: NCL, phased logic
- Bundled data: desynchronization
- EMI and skew penalty, variability, average speed
55Introduction to NCL
Two-phase operation: evaluate (DATA), then precharge (NULL)
Self-timed register interaction (acknowledgement of phases)
(Figure: combinational logic between two registers (Reg.), with completion detection (CD) and NULL)
Micropipeline with delay-insensitive (DI) datapath
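The DATA/NULL alternation can be illustrated with a dual-rail code and a completion detector that has hysteresis. This is a behavioral sketch, not NCL gate-level logic: the rail order, the names and the `CompletionDetector` class are assumptions made for the example.

```python
NULL = (0, 0)
DATA0 = (1, 0)  # assumed rail order: (rail_0, rail_1)
DATA1 = (0, 1)

def encode_word(bits):
    """Dual-rail encode a list of bits as a DATA word."""
    return [DATA1 if bit else DATA0 for bit in bits]

def null_word(width):
    """The all-NULL spacer word that separates two DATA words."""
    return [NULL] * width

class CompletionDetector:
    """Goes to 1 when every bit is DATA, back to 0 when every bit is NULL.

    For partially arrived words it holds its previous output (hysteresis).
    """
    def __init__(self):
        self.done = 0

    def update(self, word):
        if all(pair in (DATA0, DATA1) for pair in word):
            self.done = 1
        elif all(pair == NULL for pair in word):
            self.done = 0
        return self.done

cd = CompletionDetector()
print(cd.update(null_word(3)))            # spacer
print(cd.update([DATA1, NULL, NULL]))     # DATA wavefront still arriving
print(cd.update(encode_word([1, 0, 1])))  # complete DATA word
print(cd.update([NULL, DATA0, DATA1]))    # NULL wavefront still arriving
print(cd.update(null_word(3)))            # back to spacer
```

Because the detector waits for the slowest bit in each phase, the register handshake never depends on a delay assumption about the datapath.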
56NCL Design Flow
57First Attempt. Pattern Matching
(delay-insensitive 2-rail implementation)
Huge area penalty!!!
58From 2 to 3-rail Scheme
Not DI scheme!!!
59From 2 to 3-rail Scheme
Rationale behind delay-insensitivity of the 3-rail scheme
- A 2-rail circuit is hazard-free under monotonic input changes
- All input changes are observable at the outputs
60NCLX flow (MUX)
61NCL lessons. Positive
- High security of computation
62NCL lessons. Negative
- Big area overhead: 2.7-3.0x
- No performance advantage (average-case performance is swallowed by the penalty from NULL)
- Completion introduces further penalties (power and delay)
63Can we do better?
- Timing optimization of the completion network (may recover about 25% of area and power)
- Partial recovery of single-rail nodes in the datapath
- 4-rail data communication to save power
64Phased Logic
[Linden94]
LSB is the value bit (v), MSB is the timing bit (t)
Even phase: 00 (value 0), 11 (value 1)
Odd phase: 01 (value 1), 10 (value 0)
(Figure: transitions between the states even0, even1, odd0, odd1)
A signal changes phase or value (only one bit changes)
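The encoding can be written down in a few lines. The sketch assumes the code is the bit pair (t, v) as stated on the slide, with phase given by t XOR v; the function names are chosen here for illustration. It also shows why exactly one bit changes per transition: if the value stays the same the timing bit flips, otherwise the value bit flips.

```python
def decode(t, v):
    """Return (phase, value) of a phased-logic code with timing bit t and value bit v."""
    phase = "odd" if t ^ v else "even"
    return phase, v

def next_code(t, v, new_value):
    """Code of the next token: the phase always toggles, only one bit changes."""
    if new_value == v:
        return t ^ 1, v        # same value: flip the timing bit
    return t, new_value        # new value: flip the value bit

for t, v in [(0, 0), (1, 1), (0, 1), (1, 0)]:
    print(f"{t}{v} ->", decode(t, v))

print(next_code(0, 0, 0), next_code(0, 0, 1))
```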
65Phased logic gate
A PL gate has an internal state, Even or Odd. A PL gate fires when all inputs match the gate phase.
(Figure: with GatePhase E, inputs E and O and output O, the gate is not ready to fire; with GatePhase E and both inputs E, the gate is ready to fire; after firing, GatePhase is O and the output is E)
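The firing rule can be modeled behaviorally. In this sketch the class, its method names and the AND function are hypothetical; it assumes, as the figure suggests, that on firing the output takes the phase of the inputs and the gate phase toggles.

```python
def other(phase):
    """Toggle between the two phases."""
    return "odd" if phase == "even" else "even"

class PLGate:
    """Behavioral model of a phased-logic gate (illustrative, not a netlist)."""

    def __init__(self, function, gate_phase="even"):
        self.function = function
        self.gate_phase = gate_phase
        self.out_phase = other(gate_phase)  # output still carries the previous phase
        self.out_value = 0

    def try_fire(self, inputs):
        """inputs is a list of (phase, value); fire only if all phases match."""
        if any(phase != self.gate_phase for phase, _ in inputs):
            return False                    # some input has not arrived yet
        self.out_value = self.function(*[value for _, value in inputs])
        self.out_phase = self.gate_phase    # output moves to the inputs' phase
        self.gate_phase = other(self.gate_phase)
        return True

and_gate = PLGate(lambda a, b: a & b)
print(and_gate.try_fire([("even", 1), ("odd", 1)]))   # not ready
print(and_gate.try_fire([("even", 1), ("even", 1)]))  # fires
print(and_gate.out_phase, and_gate.out_value, and_gate.gate_phase)
```

After firing, the same inputs cannot trigger the gate again: it now waits for the next (odd) wavefront, which is what makes the scheme self-timed.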
66LUT-4 based implementation
(Figure: LUT4-based PL gate built from a LUT4, D-latches with enable (EN) and reset, input completion detection, gates G1, G2, G3 and a delay element; signals: a_v, b_v, c_v, d_v, new_v, v, a_t, b_t, c_t, d_t, new_t, t, t_b, gate_phase, out_phase, fi, fo, fo_b, v_rbit, t_rbit, reset)
- Functionality: v(a_v, b_v, c_v, d_v); phase: a_t, b_t, c_t, d_t, t
Area penalty!
67DI-datapath summary
- NCL and PL show a way to tolerate variability
- Both have significant penalties
- May be good for niche applications (smart cards, mixed signals)
- Average-case speed is masked by DI-coordination overhead
New optimization approaches: fine-grain pipelining
68Sliding the trade-off curve
(Figure: automation efforts along the trade-off curve, with granularity ranging from gates to blocks)
- QDI fine-grain pipelining: template-based gate-level pipelining
- QDI datapath: NCL, phased logic
- Bundled data: desynchronization
- EMI and skew penalty, variability, average speed