Title: Introduction to asynchronous circuit design: specification and synthesis
Part III: Advanced topics on synthesis of control circuits from STGs
Outline
- Logic decomposition
- Hazard-free decomposition
- Signal insertion
- Technology mapping
- Optimization based on timing information
- Relative timing
- Timing assumptions and constraints
- Automatic generation of timing assumptions
Design flow
- Specification (STG) → reachability analysis → State Graph
- State Graph → state encoding → SG with CSC
- SG with CSC → Boolean minimization → next-state functions
- Next-state functions → logic decomposition → decomposed functions
- Decomposed functions → technology mapping → gate netlist
No Hazards
Decomposition May Lead to Hazards
[Figure: state sequence with codes 1000, 1100, 0100, 0110]
Decomposition
- Acknowledgement
- Global acknowledgement
- Generating candidates
- Hazard-free signal insertion
- Event insertion
- Signal insertion
Global acknowledgement
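The slides illustrate acknowledgement with figures only. The sketch below is a rough, simplified model. It assumes a hypothetical decomposition y = a·b, z = y·c and a zero-delay simulation. It flags every internal transition of y that is not followed by a change of z before y changes again. Such unacknowledged transitions are where a slow internal wire could later produce a glitch. A real check would work on the whole state graph rather than on a single trace.

```python
from typing import List, Sequence, Tuple


def simulate(trace: Sequence[Tuple[int, int, int]]) -> Tuple[List[int], List[int]]:
    """Zero-delay evaluation of the hypothetical decomposition y = a*b, z = y*c."""
    ys, zs = [], []
    for a, b, c in trace:
        y = a & b  # new internal signal created by the decomposition
        z = y & c  # gate that consumes the internal signal
        ys.append(y)
        zs.append(z)
    return ys, zs


def unacknowledged(ys: List[int], zs: List[int]) -> List[int]:
    """Steps at which y changes but z does not change before y's next change."""
    y_changes = [i for i in range(1, len(ys)) if ys[i] != ys[i - 1]]
    result = []
    for k, i in enumerate(y_changes):
        end = y_changes[k + 1] if k + 1 < len(y_changes) else len(ys)
        if not any(zs[j] != zs[j - 1] for j in range(i, end)):
            result.append(i)
    return result


# c stays 0, so the pulse on y is never observed at z
print(unacknowledged(*simulate([(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 0)])))  # [2, 3]
# c stays 1, so every change of y propagates to z
print(unacknowledged(*simulate([(0, 0, 1), (1, 0, 1), (1, 1, 1), (1, 0, 1)])))  # []
```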
How about 2-input gates?
[Figure, over several build steps: implementing the logic with 2-input gates; signals a, b, c, d, y, z]
Strategy for logic decomposition
- Each decomposition defines a new internal signal
- Method: insert new internal signals such that
  - after resynthesis, some large gates are decomposed
  - the new specification is hazard-free
- Generate candidates for decomposition using standard logic factorization techniques:
  - Algebraic factorization
  - Boolean factorization (Boolean relations)
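The sketch below illustrates algebraic factorization with weak (algebraic) division by a single-cube divisor. This is the basic step behind common-cube extraction. Cubes are represented here as sets of hypothetical literal names. In the example, F = ab + ac + d factors as a(b + c) + d. The sub-expression b + c would then be a candidate new internal signal (called s in the comment, a name chosen only for illustration).

```python
from typing import Dict, FrozenSet, Set, Tuple

Cube = FrozenSet[str]  # product term as a set of literals, e.g. frozenset({"a", "b"})


def divide_by_cube(f: Set[Cube], d: Cube) -> Tuple[Set[Cube], Set[Cube]]:
    """Algebraic (weak) division of the SOP f by the cube d, so that f = d*q + r."""
    q = {cube - d for cube in f if d <= cube}
    r = {cube for cube in f if not d <= cube}
    return q, r


def most_shared_literal(f: Set[Cube]) -> str:
    """Candidate divisor: the literal occurring in the most cubes (ties broken alphabetically)."""
    counts: Dict[str, int] = {}
    for cube in f:
        for lit in cube:
            counts[lit] = counts.get(lit, 0) + 1
    return max(sorted(counts), key=lambda lit: counts[lit])


f = {frozenset("ab"), frozenset("ac"), frozenset("d")}  # F = ab + ac + d
lit = most_shared_literal(f)
q, r = divide_by_cube(f, frozenset({lit}))
print(lit, sorted("".join(sorted(c)) for c in q), sorted("".join(sorted(c)) for c in r))
# a ['b', 'c'] ['d']  ->  F = a(b + c) + d, with s = b + c as a new internal signal
```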
Decomposition example
[Figure, over several build steps: state graph over signals w, x, y, z with 4-bit state codes; a new internal signal s is inserted (transitions s+ and s-), splitting the states into regions s = 1 and s = 0]
z- is delayed by the new transition s-!
Decomposition (Algebraic, Boolean relations)
[Figure: candidate decompositions of F are generated and each is checked: Hazard-free? (Event insertion); the loop repeats until no more progress]
Signal insertion for function F
[Figure: insertion by input borders in the State Graph]
Event insertion
[Figure, two build steps: inserting event x into the state graph, showing region SR(x) and event b]
Properties to preserve
a is persistent
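Event insertion must preserve persistency: once a non-input event is enabled, the firing of another event must not disable it. The sketch below checks this on an explicit state graph. The dictionary encoding, state names, and event names are hypothetical. The caller chooses which events to check, for example only non-input ones.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of a state graph: state -> list of (event, successor state)
StateGraph = Dict[str, List[Tuple[str, str]]]


def enabled(sg: StateGraph, s: str) -> Dict[str, str]:
    """Events enabled in state s, mapped to the state reached by firing them."""
    return {e: t for e, t in sg.get(s, [])}


def persistency_violations(sg: StateGraph, checked: Set[str]) -> List[Tuple[str, str, str]]:
    """(state, a, b) triples where a in `checked` is enabled but firing b disables it."""
    bad = []
    for s in sg:
        en = enabled(sg, s)
        for a in en:
            if a not in checked:
                continue
            for b, t in en.items():
                if b != a and a not in enabled(sg, t):
                    bad.append((s, a, b))
    return bad


diamond = {"s0": [("a+", "s1"), ("b+", "s2")], "s1": [("b+", "s3")],
           "s2": [("a+", "s3")], "s3": []}
broken = {"s0": [("a+", "s1"), ("b+", "s2")], "s1": [("b+", "s3")],
          "s2": [], "s3": []}
print(persistency_violations(diamond, {"a+", "b+"}))  # []
print(persistency_violations(broken, {"a+", "b+"}))   # [('s0', 'a+', 'b+')]
```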
Boolean decomposition
$f = F(x_1, \dots, x_n)$
$f = G(H(x_1, \dots, x_n))$
Our problem: given F and G, find H
[Figure: f computed by G from h1 and h2, the components of H]
This is a Boolean relation
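For small n, the Boolean relation can be made concrete by brute force. The sketch below assumes F and G are given as Python functions on 0/1 values. The example F = ab + cd with G = h1 + h2 is hypothetical. For every input vector, the code computes the set of values H may take. If any set were empty, no H would exist for that G.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Tuple

Bits = Tuple[int, ...]


def boolean_relation(F: Callable[..., int], G: Callable[..., int],
                     n: int, k: int) -> Dict[Bits, FrozenSet[Bits]]:
    """For every input vector x, the set of k-bit values y allowed for H(x), i.e. G(y) == F(x)."""
    return {
        x: frozenset(y for y in product((0, 1), repeat=k) if G(*y) == F(*x))
        for x in product((0, 1), repeat=n)
    }


def F(a, b, c, d):
    return (a & b) | (c & d)  # hypothetical target function


def G(h1, h2):
    return h1 | h2            # hypothetical 2-input gate


rel = boolean_relation(F, G, n=4, k=2)

# Any H that picks one allowed value per x is a valid decomposition, e.g. H = (ab, cd):
assert all((x[0] & x[1], x[2] & x[3]) in rel[x] for x in rel)
print(sum(len(ys) == 3 for ys in rel.values()), sum(len(ys) == 1 for ys in rel.values()))  # 7 9
```

Because the relation leaves several choices wherever F = 1, selecting a cheap H among them is the optimization problem. A single function H cannot express that freedom.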
[Figure, over several build steps: decomposition example for F over signals a, c, d and y]
Technology mapping
- Merging small gates into larger gates introduces no new hazards
- Standard synchronous techniques can be applied, e.g. BDD-based Boolean matching
- Handles sequential gates and combinational feedback
- Due to hazards, there is no guarantee of finding a correct mapping (some gates cannot be decomposed)
- Timing-aware decomposition can be applied in these rare cases
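The slide mentions BDD-based Boolean matching. As a much simpler stand-in, the sketch below matches a function against a library cell by enumerating input permutations over complete truth tables. This only scales to a handful of inputs. The cell names `oa21` and `xor3` and the target function are hypothetical.

```python
from itertools import permutations, product
from typing import Callable, Optional, Tuple


def match_gate(target: Callable[..., int], gate: Callable[..., int],
               n: int) -> Optional[Tuple[int, ...]]:
    """Return a permutation p with target(v) == gate(v[p[0]], ..., v[p[n-1]]) for all v."""
    vectors = list(product((0, 1), repeat=n))
    for p in permutations(range(n)):
        if all(target(*v) == gate(*(v[i] for i in p)) for v in vectors):
            return p
    return None


def target(x, y, z):
    return x & (y | z)          # function produced by decomposition


def oa21(a, b, c):
    return (a | b) & c          # hypothetical OR-AND library cell


def xor3(a, b, c):
    return a ^ b ^ c            # hypothetical cell that cannot match


print(match_gate(target, oa21, 3))  # (1, 2, 0): oa21(y, z, x)
print(match_gate(target, xor3, 3))  # None
```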
Timing assumptions in design flow
- Speed-independent: wire delays after a fork are smaller than fan-out gate delays
- Burst-mode: the circuit stabilizes between two changes at the inputs
- Timed circuits: absolute bounds on gate/environment delays are known a priori (before physical design)
Relative Timing Circuits
- Assumptions: a before b
  - for concurrent events: reduces the reachable state space
  - for ordered events: permits early enabling
  - both increase the don't-care space for logic synthesis ⇒ simpler logic (better area and timing)
- Assume-if-useful-guarantee approach: the tool uses assumptions to derive a circuit and the timing constraints that must be met in the physical design flow
- Applied to the design of the Rotating Asynchronous Pentium Processor(TM) Instruction Decoder (K. Stevens, S. Rotem et al., Intel Corporation)
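The sketch below shows how "a before b" for concurrent events shrinks the reachable state space. It uses a hypothetical dictionary encoding of the state graph. In this simplified model, the assumption forbids firing b in any state where a is also enabled.

```python
from collections import deque
from typing import AbstractSet, Dict, List, Set, Tuple

# Hypothetical encoding: state -> list of (event, successor state)
StateGraph = Dict[str, List[Tuple[str, str]]]


def reachable(sg: StateGraph, init: str,
              before: AbstractSet[Tuple[str, str]] = frozenset()) -> Set[str]:
    """Reachable states; a pair (a, b) in `before` forbids firing b while a is enabled."""
    seen, todo = {init}, deque([init])
    while todo:
        s = todo.popleft()
        en = {e for e, _ in sg[s]}
        for e, t in sg[s]:
            if any((a, e) in before for a in en):
                continue  # the assumption "a before e" removes this firing
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


# a+ and b+ are concurrent: both interleavings are possible without timing assumptions
sg = {"00": [("a+", "10"), ("b+", "01")], "10": [("b+", "11")],
      "01": [("a+", "11")], "11": []}
print(sorted(reachable(sg, "00")))                  # ['00', '01', '10', '11']
print(sorted(reachable(sg, "00", {("a+", "b+")})))  # ['00', '10', '11']
```

After the assumption, state 01 is unreachable, so its code becomes a don't-care for logic synthesis.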
Relative Timing Asynchronous Circuits
[Figure: speed-independent C-element (signals a, b, c)]
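For reference, a Muller C-element output rises when both inputs are 1, falls when both are 0, and otherwise holds its value. The small Python model below uses input names x1, x2 and output name q, which were chosen for this sketch rather than taken from the slide.

```python
def c_element(x1: int, x2: int, q: int) -> int:
    """Next output of a Muller C-element: set when both inputs are 1, reset when both are 0."""
    return (x1 & x2) | (q & (x1 | x2))


q = 0
for x1, x2 in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    q = c_element(x1, x2, q)  # output holds while the inputs disagree
    print(x1, x2, q)
```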
State Graph (Read cycle)
[Figure: state graph of the read cycle over signals DSr, DTACK, LDS, LDTACK and D]
Lazy Transition Systems
[Figure: enabling regions ER(LDS+) and ER(LDS-) and firing region FR(LDS-) in the read-cycle state graph]
Event LDS- is lazy: its firing region is a subset of its enabling region
Timing assumptions
- (a before b) for concurrent events: concurrency reduction for firing and enabling
- (a before b) for ordered events: early enabling
- (a simultaneous to b w.r.t. c) for triples of events: combination of the above
Speed-independent Netlist
[Figure: read-cycle STG (signals DSr, DTACK, LDS, LDTACK, D) and its speed-independent netlist after CSC resolution (csc) and technology mapping (map)]
Adding timing assumptions (I)
[Figure, two build steps: the read-cycle STG and netlist with a first timing assumption added]
State space domain
[Figure, over several build steps: state space around DSr+ and LDTACK- under the first timing assumption]
Two more states become unreachable
Boolean domain
[Figure, two build steps: Karnaugh map for LDS with regions LDS = 1 and LDS = 0, don't-care cells (-), and one cell in conflict (0/1?) before the assumption]
One more DC vector for all signals
One state conflict is removed
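The sketch below shows how an unreachable state can remove a state conflict. It is a minimal complete state coding (CSC) check: two reachable states conflict if they share a binary code but enable different non-input events. The states, codes, and event names (o-, p+) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical state records: state -> (binary code, enabled non-input events)
States = Dict[str, Tuple[str, FrozenSet[str]]]


def csc_conflicts(states: States) -> List[Tuple[str, str]]:
    """Pairs of states with the same code but different enabled non-input events."""
    names = sorted(states)
    conflicts = []
    for i, s in enumerate(names):
        for t in names[i + 1:]:
            (code_s, ev_s), (code_t, ev_t) = states[s], states[t]
            if code_s == code_t and ev_s != ev_t:
                conflicts.append((s, t))
    return conflicts


untimed = {
    "s1": ("1011", frozenset({"o-"})),
    "s2": ("1011", frozenset()),        # same code, nothing enabled: CSC conflict
    "s3": ("0011", frozenset({"p+"})),
}
print(csc_conflicts(untimed))  # [('s1', 's2')]

# Suppose a timing assumption makes s2 unreachable: drop it and re-check
timed = {s: v for s, v in untimed.items() if s != "s2"}
print(csc_conflicts(timed))    # []
```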
Netlist with one constraint
[Figure, two build steps: the read-cycle STG and the netlist derived with one timing constraint]
Ordered events: early enabling
[Figure: events a, b, c implemented by gates F and G]
Adding timing assumptions (II)
[Figure: the read-cycle STG and netlist with a second timing assumption added]
State space domain
[Figure: state space around LDS-, D- and DSr-]
Reachable space is unchanged
For LDS-, enabling can be changed in one state
Boolean domain
[Figure, two build steps: Karnaugh map for LDS with regions LDS = 1 and LDS = 0 and don't-care cells (-)]
One more DC vector for one signal: LDS
If used: $\mathit{LDS} = \mathit{DSr}$; otherwise: $\mathit{LDS} = \mathit{DSr} + D$
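Assume the two equations read LDS = DSr (with the early-enabling assumption) and LDS = DSr + D (without it). A quick enumeration then shows exactly which input vector the extra don't-care must cover.

```python
from itertools import product


def lds_untimed(dsr: int, d: int) -> int:
    return dsr | d  # LDS = DSr + D


def lds_early(dsr: int, d: int) -> int:
    return dsr      # LDS = DSr


diff = [(dsr, d) for dsr, d in product((0, 1), repeat=2)
        if lds_untimed(dsr, d) != lds_early(dsr, d)]
print(diff)  # [(0, 1)]
```

The two implementations differ only at DSr = 0, D = 1. This is presumably the state after DSr- has fired and before D- has fired, where early enabling of LDS- makes LDS a don't-care.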
Before early enabling
[Figure: the read-cycle STG and netlist before early enabling is applied]
Netlist with two constraints
[Figure: the read-cycle STG and the netlist derived with both timing assumptions]
Both timing assumptions are used for optimization and become constraints
Deriving automatic timing assumptions
- Rule I (out of 6): a, b are non-input events
  - Untimed ordering: a || b (concurrent), and a is enabled before b, but not vice versa
  - Derived assumption: a fires before b
  - Justification: the delay of one gate can be made shorter than the delay of two (or more) gates: $\mathrm{del}(a) < \mathrm{del}(c) + \mathrm{del}(b)$
[Figure: timing of events a, b and c]
- Effect I: a state becomes DC for all signals
- Effect II: another state becomes a local DC for the signal of event b
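Rule I is justified by comparing one gate delay against a chain of two. The sketch below makes three assumptions, following the inequality del(a) < del(c) + del(b): a and c start at the same moment, b is enabled by c, and each gate's delay lies in a known (min, max) interval. Under those assumptions, "a fires before b" is safe when the worst case of a beats the best case of the chain. The intervals are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (min, max) delay of a gate, hypothetical units


def rule1_guaranteed(del_a: Interval, del_c: Interval, del_b: Interval) -> bool:
    """True if a always fires before b: worst-case del(a) < best-case del(c) + del(b)."""
    return del_a[1] < del_c[0] + del_b[0]


gate = (1.0, 1.5)
print(rule1_guaranteed(gate, gate, gate))        # True: 1.5 < 1.0 + 1.0
print(rule1_guaranteed((1.0, 2.5), gate, gate))  # False: 2.5 >= 2.0
```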
Backannotation of Timing Constraints
- Timed circuits require post-verification
- Can synthesis tools help?
  - Report the least stringent set of timing constraints required for the correctness of the circuit
  - Not all initial timing assumptions may be required
- Petrify reports a set of constraints on the order of firing that guarantees circuit correctness
Timing constraints generation
[Figure, over several build steps: state graph over events a, b, c, d, e; the last step highlights an incorrect behavior (marked 1 and 2)]
Assumptions: d before b, c before e, and a before d
Covering incorrect behavior
[Figure, two build steps: the same state graph with the incorrect behaviors numbered 1 to 5; the candidate constraint "c before e" is annotated with 2, 4]
Assumptions: d before b, c before e, and a before d
Other possible constraints remove states from the assumption domain ⇒ invalid
Constraints for the minimal-cost solution: d before c and c before e
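Choosing the least stringent set of constraints can be phrased as a covering problem. Every incorrect behavior must be removed by at least one selected constraint, at minimum total cost. The sketch below solves it by brute force, which is exponential in the number of candidates and only meant for small examples. The constraint names k1 to k4, their costs, and the behaviors they cover are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Set, Tuple


def min_cost_cover(covers: Dict[str, Set[int]], cost: Dict[str, int],
                   bad: Set[int]) -> Optional[Tuple[str, ...]]:
    """Cheapest set of constraints that together cover every incorrect behavior (brute force)."""
    names = sorted(covers)
    best, best_cost = None, None
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            covered = set().union(*(covers[c] for c in combo))
            if bad <= covered:
                total = sum(cost[c] for c in combo)
                if best_cost is None or total < best_cost:
                    best, best_cost = combo, total
    return best


# Hypothetical candidate constraints k1..k4 and the incorrect behaviors (1..5) each one removes
covers = {"k1": {1, 3, 5}, "k2": {2, 4}, "k3": {1, 2}, "k4": {3, 4, 5}}
cost = {"k1": 1, "k2": 1, "k3": 2, "k4": 1}
print(min_cost_cover(covers, cost, {1, 2, 3, 4, 5}))  # ('k1', 'k2')
```

Candidates that would remove states from the assumption domain are invalid and should be filtered out before the covering step.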
Timing-aware state encoding
- Solve only the state conflicts reachable in the RT assumptions domain
- Generate automatic timing assumptions for inserted state signals ⇒ state signals can be implemented as RT logic
- State variables are inserted concurrently with I/O events ⇒ latency and cycle-time reduction
Value of Relative Timing
- RT circuits provide up to 2-3x (1.3-2x) delay/area reduction with respect to SI circuits synthesized without (with) concurrency reduction
- Automatic generation of timing assumptions ⇒ foundation for automatic synthesis of RT circuits with area/performance comparable to or better than manual designs
- Back-annotation of timing constraints ⇒ minimal required timing information for the back-end tools
- Timing-aware state encoding allows significant area/performance optimization
Design Flow with Timing
- Specification (STG + user assumptions) → reachability analysis → Lazy State Graph
- Lazy State Graph → timing-aware state encoding, automatic timing assumptions → Lazy SG with CSC
- Lazy SG with CSC → Boolean minimization → next-state functions
- Next-state functions → logic decomposition → decomposed functions
- Decomposed functions → technology mapping → gate netlist and required timing constraints
FIFO example
[Figure: FIFO controller with left-side handshake signals li, lo and right-side handshake signals ro, ri]
Speed-Independent Implementation
Without concurrency reduction, 3 state signals are required
SI implementation with concurrency reduction
[Figure: STG over li, lo, ro, ri and state signal x, and its netlist built from generalized C-elements (gC)]
RT implementation
[Figure: RT netlist for signals li, lo, ro, ri and x]
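RT implementation
[Figure: RT version of the STG over li, lo, ro, ri and x]
Constraints to satisfy: $\mathrm{Delay}(x^-) < \mathrm{Delay}(\mathit{ri}^+)$ and $\mathrm{Delay}(\mathit{lo}^+) + \mathrm{Delay}(x^-) < \mathrm{Delay}(\mathit{ro}^+) + \mathrm{Delay}(\mathit{ri}^+)$
All constraints are either satisfied by default or easy to satisfy by sizing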
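Assume the FIFO constraints read Delay(x-) < Delay(ri+) and Delay(lo+) + Delay(x-) < Delay(ro+) + Delay(ri+). A post-layout check then reduces to comparing path delays. The delay values below are hypothetical.

```python
from typing import Dict


def fifo_rt_constraints(delay: Dict[str, float]) -> Dict[str, bool]:
    """Evaluate the two relative-timing constraints of the FIFO RT implementation."""
    return {
        "x- < ri+": delay["x-"] < delay["ri+"],
        "lo+ + x- < ro+ + ri+": delay["lo+"] + delay["x-"] < delay["ro+"] + delay["ri+"],
    }


# Hypothetical delays, all in the same arbitrary unit
print(fifo_rt_constraints({"x-": 1.0, "lo+": 1.0, "ro+": 1.0, "ri+": 3.0}))
# {'x- < ri+': True, 'lo+ + x- < ro+ + ri+': True}
print(fifo_rt_constraints({"x-": 1.0, "lo+": 1.0, "ro+": 1.0, "ri+": 0.5}))
# {'x- < ri+': False, 'lo+ + x- < ro+ + ri+': False}
```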