Introduction to asynchronous circuit design: specification and synthesis

Slides: 133
Provided by: Comp966
Learn more at: http://web.cecs.pdx.edu
Transcript and Presenter's Notes


1
Introduction to asynchronous circuit design:
specification and synthesis
  • Jordi Cortadella, Universitat Politècnica de
    Catalunya, Spain
  • Michael Kishinevsky, Intel Corporation, USA
  • Alex Kondratyev, Theseus Logic, USA
  • Luciano Lavagno, Università di Udine, Italy

2
Outline
  • I: Introduction to basic concepts on asynchronous
    design
  • II: Synthesis of control circuits from STGs
  • III: Advanced topics on synthesis of
    control circuits from STGs
  • IV: Synthesis from HDL and other synthesis
    paradigms

Note: no references in the tutorial

3
Introduction to asynchronous circuit design:
specification and synthesis
  • Part I
  • Introduction to basic concepts on asynchronous
    circuit design

4
Outline
  • What is an asynchronous circuit ?
  • Asynchronous communication
  • Asynchronous logic blocks
  • Micropipelines
  • Control specification and implementation
  • Delay models
  • Why asynchronous circuits ?

5
Synchronous circuit
[Figure: registers (R) separated by combinational logic blocks (CL), all clocked by CLK]
Implicit synchronization
6
Asynchronous circuit
[Figure: registers (R) separated by combinational logic blocks (CL), with Req and Ack wires between stages]
Explicit synchronization: Req/Ack handshakes
7
Synchronous communication
1
1
0
0
1
0
  • Clock edges determine the time instants at which
    data must be sampled
  • Data wires may glitch between clock edges
    (set-up/hold times must be satisfied)
  • Data are transmitted at a fixed rate (clock
    frequency)

8
Dual rail
1
1
1
0
0
0
  • Two wires per bit
  • 00 = spacer, 01 = 0, 10 = 1
  • n-bit data communication requires 2n wires
  • Each bit is self-timed
  • Other delay-insensitive codes exist
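
The dual-rail code above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the choice to write each pair as (true rail, false rail), so that "10" means 1, is an assumption about the wire order.

```python
# Dual-rail code from the slide: 00 = spacer, 01 = 0, 10 = 1 (11 is unused).
# Assumption: each pair is written as (true_rail, false_rail).
SPACER = (0, 0)

def encode_bit(bit):
    """Return the wire pair that carries one data bit."""
    return (1, 0) if bit else (0, 1)

def encode_word(bits):
    """Encode n bits as n wire pairs, i.e. 2n wires."""
    return [encode_bit(b) for b in bits]

def is_complete(pairs):
    """Completion detection: every pair holds a valid code word."""
    return all(p in ((0, 1), (1, 0)) for p in pairs)

def is_spacer(pairs):
    """True when every pair has returned to the spacer."""
    return all(p == SPACER for p in pairs)

def decode_word(pairs):
    """Recover the data bits once the whole word is valid."""
    if not is_complete(pairs):
        raise ValueError("word is not yet valid")
    return [1 if p == (1, 0) else 0 for p in pairs]

word = encode_word([1, 0, 1])
partial = [word[0], SPACER, word[2]]  # the middle bit has not arrived yet
```

Because each bit announces its own validity, a receiver can wait on `is_complete` instead of on a clock edge.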

9
Bundled data
1
1
0
0
1
0
  • Validity signal
      • Similar to an aperiodic local clock
  • n-bit data communication requires n+1 wires
  • Data wires may glitch when not valid
  • Signaling protocols
      • level sensitive (latch)
      • transition sensitive (register): 2-phase /
        4-phase

10
Example memory read cycle
Valid address
Address
A
A
Valid data
Data
D
D
  • Transition signaling, 4-phase

11
Example memory read cycle
Valid address
Address
A
A
Valid data
Data
D
D
  • Transition signaling, 2-phase
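
The difference between 4-phase and 2-phase signaling can be sketched as the sequence of wire levels seen on one request/acknowledge pair. The wire names `req` and `ack` are assumptions made for the illustration; the memory example uses the validity signals A and D in the same roles.

```python
def four_phase(n_transfers):
    """Return-to-zero protocol: req+, ack+, req-, ack- for every transfer."""
    req = ack = 0
    trace = []
    for _ in range(n_transfers):
        for wire, value in (("req", 1), ("ack", 1), ("req", 0), ("ack", 0)):
            if wire == "req":
                req = value
            else:
                ack = value
            trace.append((req, ack))
    return trace

def two_phase(n_transfers):
    """Transition signaling: one toggle of req and one of ack per transfer."""
    req = ack = 0
    trace = []
    for _ in range(n_transfers):
        req ^= 1                  # any transition of req is a request
        trace.append((req, ack))
        ack ^= 1                  # any transition of ack is an acknowledge
        trace.append((req, ack))
    return trace
```

One 4-phase transfer and two 2-phase transfers produce the same four wire states, which is why 2-phase signaling needs half as many transitions per transfer.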

12
Outline
  • What is an asynchronous circuit ?
  • Asynchronous communication
  • Asynchronous logic blocks
  • Micropipelines
  • Control specification and implementation
  • Delay models
  • Why asynchronous circuits ?

13
Asynchronous modules
DATA PATH
Data IN
Data OUT
start
done
req in
req out
CONTROL
ack in
ack out
  • Signaling protocol: reqin+ start+ [computation]
    done+ reqout+ ackout+ ackin+ reqin- start-
    [reset] done- reqout- ackout- ackin-
    (more concurrency is also possible, e.g.
    by overlapping the return-to-zero phase of step
    i-1 with the evaluation phase of step i)

14
Completion detection


15
Asynchronous latches: C element
Vdd
A
B
Z
A
B
Z
A
B
Z
A
B
Gnd
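
As a behavioral sketch of the C element (not of the transistor circuit in the figure), the output Z follows the inputs when A and B agree and holds its value otherwise. The class name and the closed-form helper are illustrative.

```python
class CElement:
    """Muller C element: Z copies the inputs when they agree, else holds."""

    def __init__(self, initial=0):
        self.z = initial

    def update(self, a, b):
        if a == b:
            self.z = a
        return self.z

def c_next(a, b, z):
    """Equivalent next-state equation: Z' = A*B + Z*(A + B)."""
    return (a & b) | (z & (a | b))

c = CElement()
outputs = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
```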
16
Dual-rail logic
Dual-rail AND gate
Valid behavior for monotonic environment
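
A stateless sketch of a dual-rail AND function follows. It assumes the pair order (true rail, false rail) and ignores the hold behavior of any C elements in a real gate, so it only describes the settled outputs under a monotonic environment (spacer, then valid, then spacer again).

```python
SPACER = (0, 0)
ONE = (1, 0)    # assumed order: (true_rail, false_rail)
ZERO = (0, 1)

def dual_rail_and(a, b):
    """Output becomes valid only when both inputs are valid."""
    a_t, a_f = a
    b_t, b_f = b
    z_t = a_t & b_t
    z_f = (a_f & b_f) | (a_f & b_t) | (a_t & b_f)
    return (z_t, z_f)

def encode(bit):
    return ONE if bit else ZERO
```

With either input still at the spacer, every product term is 0, so the output stays at the spacer and downstream completion detection is not triggered early.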
17
Differential cascode voltage switch logic
start
Z.t
Z.f
done
A.t
A.f
B.f
C.f
B.t
C.t
start
3-input AND/NAND gate
18
Bundled-data logic blocks
logic


start
done
delay
Conventional logic + matched delay
19
Micropipelines (Sutherland 89)
Aout
Ain
C
L
L
L
L
logic
logic
logic
Rin
Rout
20
Data-path / Control
L
L
L
L
logic
logic
logic
Rin
Rout
CONTROL
Ain
Aout
21
Outline
  • What is an asynchronous circuit ?
  • Asynchronous communication
  • Asynchronous logic blocks
  • Micropipelines
  • Control specification and implementation
  • Delay models
  • Why asynchronous circuits ?

22
Control specification
A
A
B
B
A-
A: input, B: output
B-
23
Control specification
A
B
B
A
A-
B-
24
Control specification
A
B-
B
A
A-
B
25
Control specification
A
B
A
C
C
B
A-
B-
C-
26
Control specification
A
B
A
C
C
A-
B
B-
C-
27
Control specification
28
A simple filter specification
IN
Rin
Ain
y := 0;
loop
  x := READ (IN);
  WRITE (OUT, (x+y)/2);
  y := x;
end loop
filter
Aout
Rout
OUT
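
The filter specification can be sketched in Python, reading the garbled expression as the average (x + y) / 2 of the current and previous samples; that reading is an assumption. Each loop iteration stands for one Rin/Ain handshake on IN followed by one Rout/Aout handshake on OUT, which the sketch does not model.

```python
def run_filter(samples):
    """y := 0; loop x := READ(IN); WRITE(OUT, (x + y) / 2); y := x; end loop"""
    y = 0
    out = []
    for x in samples:              # READ (IN)
        out.append((x + y) / 2)    # WRITE (OUT, (x + y) / 2)
        y = x                      # remember the previous sample
    return out

result = run_filter([2, 4, 6])
```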
29
A simple filter block diagram
  • x and y are level-sensitive latches (transparent
    when R=1)
  • + is a bundled-data adder (matched delay between
    Ra and Aa)
  • Rin indicates the validity of IN
  • After Ain the environment is allowed to change
    IN
  • (Rout,Aout) control a level-sensitive latch at
    the output

30
A simple filter control spec.
31
A simple filter control impl.
32
Control observable behavior
z
Ain-
Rin
Rx
Ry-
Rx-
Ax-
z-
Ay
Ay-
Ax
Ra
Aa
Rout
Aout
z
Rout-
Aout-
Ry
33
Outline
  • What is an asynchronous circuit ?
  • Asynchronous communication
  • Asynchronous logic blocks
  • Micropipelines
  • Control specification and implementation
  • Delay models
  • Why asynchronous circuits ?

34
Taking delays into account
  • Delay assumptions
      • Environment: 3 time units
      • Gates: 1 time unit

events: x → x- → y → z → z- → x- → x → z- → z → y- →
time:   3   4   5   6   7   9   10   12   13   14
35
Taking delays into account
x
x
y
z
z
very slow
Delay assumptions: unbounded delays
events: x → x- → y → z → x- → x → y-   (failure!)
time:   3   4   5   6   9   10   11
36
Gate vs wire delay models
  • Gate delay model: delays in gates, no delays in
    wires
  • Wire delay model: delays in gates and wires

37
Delay models for async. circuits
  • Bounded delays (BD): realistic for gates and
    wires.
      • Technology mapping is easy, verification is
        difficult
  • Speed independent (SI): unbounded (pessimistic)
    delays for gates and negligible (optimistic)
    delays for wires.
      • Technology mapping is more difficult,
        verification is easy
  • Delay insensitive (DI): unbounded (pessimistic)
    delays for gates and wires.
      • DI class (built out of basic gates) is almost
        empty
  • Quasi-delay insensitive (QDI): delay insensitive
    except for critical wire forks (isochronic
    forks).
      • Formally, it is the same as speed independent
      • In practice, different synthesis strategies are
        used

[Figure: diagram of circuit classes labeled BD and SI ≡ QDI]
38
Motivation (designer's view)
  • Modularity
  • Plug-and-play interconnectivity
  • Reusability
  • IPs with abstract timing behaviors
  • High-performance
  • Average-case performance (no worst-case delay
    synchronization)
  • No clock skew (local timing assumptions instead)
  • Many interfaces are asynchronous
  • Buses, networks, ...

39
Motivation (technology aspects)
  • Low power
  • Automatic clock gating
  • Electromagnetic compatibility
  • No peak currents around clock edges
  • Robustness
  • High immunity to technology and environment
    variations (in-die variations, temperature, power
    supply, ...)

40
Problems
  • Concurrent models for specification
  • CSP, Petri nets, ...
  • Difficult to design
  • Hazards, synchronization
  • Complex timing analysis
  • Difficult to estimate performance
  • Difficult to test
  • No way to stop the clock

41
But we have some success stories...
  • Philips
  • AMULET microprocessors
  • Sharp
  • Intel (RAPPID)
  • IBM (interlocked pipeline)
  • Start-up companies
  • Theseus Logic, Cogency, ADD
  • ...

42
Introduction to asynchronous circuit design:
specification and synthesis
  • Part II
  • Synthesis of control circuits from STGs

43
Outline
  • Overview of the synthesis flow
  • Specification
  • State graph and next-state functions
  • State encoding
  • Implementability conditions
  • Speed-independent circuit
  • Complex gates
  • C-element architecture

44
Design flow
45
x
x
y
y
z
z
x-
z
x
y
z-
y-
Signal Transition Graph (STG)
48
Next-state functions
49
x
y
z
50
Outline
  • Overview of the synthesis flow
  • Specification
  • State graph and next-state functions
  • State encoding
  • Implementability conditions
  • Speed-independent circuit
  • Complex gates
  • C-element architecture

51
Design flow
Specification (STG)
  → reachability analysis → State Graph
  → state encoding → SG with CSC
  → Boolean minimization → Next-state functions
  → logic decomposition → Decomposed functions
  → technology mapping → Gate netlist
52
VME bus
53
STG for the READ cycle
DTACK-
DSr
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
D
LDS
DSr
VME Bus Controller
LDTACK
DTACK
54
Choice Read and Write cycles
55
Choice Read and Write cycles
56
Choice Read and Write cycles
57
Choice Read and Write cycles
58
Circuit synthesis
  • Goal
  • Derive a hazard-free circuit under a given delay
    model and mode of operation

59
Outline
  • Overview of the synthesis flow
  • Specification
  • State graph and next-state functions
  • State encoding
  • Implementability conditions
  • Speed-independent circuit
  • Complex gates
  • C-element architecture

60
Design flow
Specification (STG)
  → reachability analysis → State Graph
  → state encoding → SG with CSC
  → Boolean minimization → Next-state functions
  → logic decomposition → Decomposed functions
  → technology mapping → Gate netlist
61
STG for the READ cycle
DTACK-
DSr
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
D
LDS
DSr
VME Bus Controller
LDTACK
DTACK
62
Binary encoding of signals
DSr
DTACK-
LDS
LDTACK-
LDTACK-
LDTACK-
DSr
DTACK-
LDS-
LDS-
LDS-
LDTACK
DSr
DTACK-
D
D-
DSr-
DTACK
63
Binary encoding of signals
DSr
DTACK-
10000
LDS
LDTACK-
LDTACK-
LDTACK-
DSr
DTACK-
10010
LDS-
LDS-
LDS-
LDTACK
DSr
DTACK-
10110
01110
10110
D
D-
DSr-
DTACK
(DSr , DTACK , LDTACK , LDS , D)
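
The step from the STG to a binary-encoded state graph can be sketched as a token game. The arcs and initial marking below are an assumed reading of the READ-cycle STG (the transcript has lost the arcs and the + signs), and treating DSr and LDTACK as the inputs is also an assumption. The bit order is the one given above: (DSr, DTACK, LDTACK, LDS, D).

```python
from collections import deque

SIGNALS = ["DSr", "DTACK", "LDTACK", "LDS", "D"]   # bit order from the slide
INPUTS = {"DSr", "LDTACK"}                           # assumed environment signals

# Assumed causality arcs (a marked graph): (cause, effect).
ARCS = [
    ("DSr+", "LDS+"), ("LDTACK-", "LDS+"), ("LDS+", "LDTACK+"),
    ("LDTACK+", "D+"), ("D+", "DTACK+"), ("DTACK+", "DSr-"),
    ("DSr-", "D-"), ("D-", "DTACK-"), ("D-", "LDS-"),
    ("DTACK-", "DSr+"), ("LDS-", "LDTACK-"),
]
INITIAL_TOKENS = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")})
EVENTS = sorted({event for arc in ARCS for event in arc})

def enabled(marking):
    """Events whose incoming arcs all carry a token."""
    return [e for e in EVENTS
            if all(arc in marking for arc in ARCS if arc[1] == e)]

def fire(marking, code, event):
    """Move tokens across the event and update the signal's bit."""
    consumed = {arc for arc in ARCS if arc[1] == event}
    produced = {arc for arc in ARCS if arc[0] == event}
    bits = list(code)
    bits[SIGNALS.index(event[:-1])] = 1 if event.endswith("+") else 0
    return (marking - consumed) | produced, tuple(bits)

def state_graph():
    """Breadth-first reachability analysis from the all-zero state."""
    start = (INITIAL_TOKENS, (0,) * len(SIGNALS))
    seen, queue = {start}, deque([start])
    while queue:
        marking, code = queue.popleft()
        for event in enabled(marking):
            nxt = fire(marking, code, event)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def csc_conflicts(states):
    """Codes shared by states that enable different output events."""
    by_code = {}
    for marking, code in states:
        outputs = frozenset(e for e in enabled(marking) if e[:-1] not in INPUTS)
        by_code.setdefault(code, set()).add(outputs)
    return {"".join(map(str, c)) for c, outs in by_code.items() if len(outs) > 1}

states = state_graph()
conflicts = csc_conflicts(states)
```

Under these assumptions the graph has 14 states and the only code shared by two states with different output behavior is 10110, the code that appears twice in the figure.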
64
Excitation / Quiescent Regions
65
Next-state function
0 → 1
0 → 0
1 → 1
1 → 0
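
The four cases can be read as a rule for the implied next value of a signal in each state: it changes exactly when one of its transitions is enabled. A minimal sketch, with a hypothetical function name and events written as the signal name followed by + or -:

```python
def next_state_value(value, enabled_events, signal):
    """Implied next value of `signal` in one state of the state graph."""
    if value == 0:
        # 0 -> 1 inside the excitation region of signal+, otherwise 0 -> 0
        return 1 if f"{signal}+" in enabled_events else 0
    # 1 -> 0 inside the excitation region of signal-, otherwise 1 -> 1
    return 0 if f"{signal}-" in enabled_events else 1
```

Evaluating this rule in every reachable state gives the truth table of the next-state function; unreachable codes become don't cares.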
66
Karnaugh map for LDS
LDS 1
LDS 0
-
-
-
0
1
-
0
1
-
-
-
-
-
-
-
-
1
1
1
-
-
-
-
-
0
0
0
0
0
0/1?
-
-
67
Outline
  • Overview of the synthesis flow
  • Specification
  • State graph and next-state functions
  • State encoding
  • Implementability conditions
  • Speed-independent circuit
  • Complex gates
  • C-element architecture

68
Design flow
Specification (STG)
  → reachability analysis → State Graph
  → state encoding → SG with CSC
  → Boolean minimization → Next-state functions
  → logic decomposition → Decomposed functions
  → technology mapping → Gate netlist
69
Concurrency reduction
LDS
LDS-
LDS-
LDS-
10110
10110
70
Concurrency reduction
DTACK-
DSr
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
71
State encoding conflicts
LDS
LDTACK-
LDS-
LDTACK
10110
10110
72
Signal Insertion
LDTACK-
LDS
LDS-
LDTACK
101101
101100
D-
DSr-
73
Outline
  • Overview of the synthesis flow
  • Specification
  • State graph and next-state functions
  • State encoding
  • Implementability conditions
  • Speed-independent circuit
  • Complex gates
  • C-element architecture

74
Design flow
Specification (STG)
  → reachability analysis → State Graph
  → state encoding → SG with CSC
  → Boolean minimization → Next-state functions
  → logic decomposition → Decomposed functions
  → technology mapping → Gate netlist
75
Complex-gate implementation
  • Under what conditions does a hazard-free
    implementation exist?

76
Implementability conditions
  • Consistency
      • Rising and falling transitions of each signal
        alternate in any trace
  • Complete state coding (CSC)
      • Next-state functions correctly defined
  • Persistency
      • No event can be disabled by another event (unless
        they are both inputs)
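
The persistency condition can be checked directly on a state graph. The sketch below uses a hypothetical representation (a dictionary from each state to its enabled events and their successor states) and a toy graph that is not taken from the slides.

```python
def persistency_violations(graph, inputs):
    """Return (state, disabled_event, disabling_event) triples.

    graph maps each state to {event: next_state}; events are written as a
    signal name followed by + or -.
    """
    violations = []
    for state, moves in graph.items():
        for a in moves:
            for b, after_b in moves.items():
                if a == b:
                    continue
                both_inputs = a[:-1] in inputs and b[:-1] in inputs
                if a not in graph[after_b] and not both_inputs:
                    violations.append((state, a, b))
    return violations

# Toy graph: in s0 both a+ and b+ are enabled, but firing b+ disables a+.
toy = {
    "s0": {"a+": "s1", "b+": "s2"},
    "s1": {"b+": "s3"},
    "s2": {},
    "s3": {},
}
found = persistency_violations(toy, inputs=set())
allowed = persistency_violations(toy, inputs={"a", "b"})
```

When both events belong to input signals, the same situation is a legal choice made by the environment, so no violation is reported.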

77
Implementability conditions
  • Consistency + CSC + persistency
  • There exists a speed-independent circuit that
    implements the behavior of the STG (under the
    assumption that any Boolean function can be
    implemented with one complex gate)

78
Persistency
a
c
b
is this a pulse ?
Speed independence ⇒ glitch-free output behavior
under any delay
79
Speed-independent implementations
  • How can the implementability conditions
      • Consistency
      • Complete state coding
      • Persistency
    be satisfied?
  • Standard circuit architectures
      • Complex (hazard-free) gates
      • C elements with monotonic covers
      • Standard gates and latches

81
ER(d)
ER(d-)
82
ab
cd
00
01
11
10
0
0
0
0
00
1
0
01
1
1
1
1
11
1
10
Complex gate
83
Implementation with C elements
→ S+ → z+ → S- → R+ → z- → R- →
  • S (set) and R (reset) must be mutually exclusive
  • S must cover ER(z+) and must not intersect
    ER(z-) ∪ QR(z-)
  • R must cover ER(z-) and must not intersect
    ER(z+) ∪ QR(z+)
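
The cover conditions for the set and reset functions translate into simple set checks when regions and covers are given as sets of state codes. The regions below are a hypothetical example (a signal z that rises when a = b = 1 and falls when a = b = 0, with states written as the string "abz"), not the example from the slides.

```python
def valid_set_cover(S, er_plus, er_minus, qr_minus):
    """S covers ER(z+) and does not intersect ER(z-) or QR(z-)."""
    return er_plus <= S and not (S & (er_minus | qr_minus))

def valid_reset_cover(R, er_minus, er_plus, qr_plus):
    """R covers ER(z-) and does not intersect ER(z+) or QR(z+)."""
    return er_minus <= R and not (R & (er_plus | qr_plus))

# Hypothetical regions; each state is the string "abz".
ER_PLUS = {"110"}
QR_PLUS = {"111", "011", "101"}
ER_MINUS = {"001"}
QR_MINUS = {"000", "100", "010"}

S_GOOD = {"110", "111"}                 # states covered by the cube a*b
R_GOOD = {"000", "001"}                 # states covered by the cube a'*b'
S_BAD = {"100", "101", "110", "111"}    # cube a alone: fires too early
```

`S_BAD` is rejected because it is already true in state 100, where z must stay low.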

84
ab
cd
00
01
11
10
0
0
0
0
00
1
0
01
1
1
1
1
11
1
10
S
d
C
R
85
but ...
S
d
C
R
86
Starting from state 0000 (R=1 and S=0)
a R- b a- c S d
S
d
C
R
87
ab
cd
00
01
11
10
0
0
0
0
00
1
0
01
1
1
1
1
11
1
10
Monotonic covers
88
C-based implementations
89
Synthesis exercise
1011
0011
0111
Derive circuits for signals x and z (complex
gates and monotonic covers)
90
Synthesis exercise
1011
wx
yz
00
01
11
10
-
1
1
0
00
0011
-
1
1
0
01
-
0
0
0
11
-
1
1
0
10
0111
Signal x
91
Synthesis exercise
1011
wx
yz
00
01
11
10
-
0
0
0
00
0011
-
0
0
0
01
-
1
1
1
11
-
1
0
0
10
0111
Signal z
92
Introduction to asynchronous circuit design:
specification and synthesis
  • Part III
  • Advanced topics on synthesis of control circuits
    from STGs

93
Outline
  • Logic decomposition
  • Hazard-free decomposition
  • Signal insertion
  • Technology mapping
  • Optimization based on timing information
  • Relative timing
  • Timing assumptions and constraints
  • Automatic generation of timing assumptions

94
Design flow
Specification (STG)
  → reachability analysis → State Graph
  → state encoding → SG with CSC
  → Boolean minimization → Next-state functions
  → logic decomposition → Decomposed functions
  → technology mapping → Gate netlist
95
No Hazards
96
Decomposition May Lead to Hazards
1000
1100
1100
0100
0110
97
Decomposition
  • Acknowledgement
  • Generating candidates
  • Hazard-free signal insertion
  • Event insertion
  • Signal insertion

98
Global acknowledgement
99
How about 2-input gates ?
100
How about 2-input gates ?
c
z
b
a
a
y
b
d
101
How about 2-input gates ?
0
c
0
z
b
a
a
y
b
d
102
How about 2-input gates ?
c
z
b
a
a
y
b
d
103
How about 2-input gates ?
c
z
y
d
104
Strategy for logic decomposition
  • Each decomposition defines a new internal signal
  • Method: insert new internal signals such that
      • After resynthesis, some large gates are
        decomposed
      • The new specification is hazard-free
  • Generate candidates for decomposition using
    standard logic factorization techniques
      • Algebraic factorization
      • Boolean factorization (Boolean relations)

105
Decomposition example
106
y-
1001
1011
z-
w-
1000
0001
w
y
x
w-
z-
1010
0000
0101
0011
w-
z-
y
x
0010
0100
x-
y
x
z
0110
0111
107
s1
y-
y-
1001
1011
z-
s-
s-
w
1001
1000
z-
s-
y
w-
z-
w-
w
0011
0001
1000
1010
y
s-
x
w-
z-
x-
0000
0101
1010
y
x
x-
w-
z-
y
x
0111
0010
0100
s
s
y
x
z
s0
z
0111
0110
108
s1
y-
s
1001
1011
z-
s-
w
1001
1000
z-
s-
y
w-
0011
0001
1000
1010
y
s-
x
w-
z-
x-
0000
0101
1010
w-
z-
y
x
0111
0010
0100
s
y
x
s0
z
0111
0110
109
y-
1011
z-
w-
1000
0001
w
y
x
w-
z-
1010
0000
0101
0011
w-
z-
y
x
0010
0100
x-
y
x
z
0110
0111
yz1
yz0
110
y-
y-
s1
1001
1011
s-
s-
w
1001
z-
w-
0011
0001
1000
z-
w-
w
y
x
w-
z-
x-
0000
0101
1010
w-
z-
y
x
y
x
x-
0111
0010
0100
s
y
x
s
s0
z
z
0111
0110
z- is delayed by the new transition s- !
111
y-
s1
1001
1011
s-
w
1001
z-
w-
0011
0001
1000
y
x
w-
z-
x-
0000
0101
1010
w-
z-
y
x
0111
0010
0100
s
y
x
y
y
y
y
y
y
y
s0
z
0111
0110
112
Decomposition (Algebraic, Boolean relations)
F
113
Decomposition (Algebraic, Boolean relations)
F
until no more progress
Hazard-free ? (Event insertion)
114
Signal insertion for function F
Insertion by input borders
State Graph
115
Event insertion
116
Event insertion
SR(x)
b
x
x
x
x
117
Properties to preserve
a is persistent
118
Boolean decomposition
f = F(x1, ..., xn)
f = G(H(x1, ..., xn))
Our problem: Given F and G, find H
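
The problem of finding H can be sketched by brute force for small functions: for every input vector, any output of H that makes G agree with F is acceptable, which gives a set of permitted values per input (a Boolean relation) rather than a single function. The functions F, G and H below are hypothetical examples.

```python
from itertools import product

def boolean_relation(F, G, n_inputs, n_h):
    """Map each input vector x to the set of h vectors with G(h) == F(x)."""
    return {
        x: {h for h in product((0, 1), repeat=n_h) if G(*h) == F(*x)}
        for x in product((0, 1), repeat=n_inputs)
    }

def is_compatible(H, relation):
    """H is a valid decomposition if it picks a permitted h for every x."""
    return all(H(*x) in allowed for x, allowed in relation.items())

F = lambda a, b, c: (a & b) | c          # function to decompose
G = lambda h1, h2: h1 | h2               # chosen top gate: a 2-input OR
relation = boolean_relation(F, G, n_inputs=3, n_h=2)

H_good = lambda a, b, c: (a & b, c)      # h1 = a*b, h2 = c
H_bad = lambda a, b, c: (a, c)           # wrong when a = 1, b = 0, c = 0
```

Where F is 1, the relation allows three different (h1, h2) pairs; this freedom is what a Boolean-relation solver exploits to find simpler or hazard-free candidates.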
119
h1
f
h2
This is a Boolean Relation
120
a
F
c
y
d
121
a
c
y
d
122
a
c
y
d
a
123
a
c
y
d
a
d
c
124
Technology mapping
  • Merging small gates into larger gates introduces
    no new hazards
  • Standard synchronous technique can be applied,
    e.g. BDD-based boolean matching
  • Handles sequential gates and combinational
    feedbacks
  • Due to hazards there is no guarantee to find
    correct mapping (some gates cannot be decomposed)
  • Timing-aware decomposition can be applied in
    these rare cases

125
Design flow
Specification (STG)
  → reachability analysis → State Graph
  → state encoding → SG with CSC
  → Boolean minimization → Next-state functions
  → logic decomposition → Decomposed functions
  → technology mapping → Gate netlist
126
Timing assumptions in design flow
  • Speed-independent: wire delays after a
    fork are smaller than fan-out gate delays
  • Burst-mode: the circuit stabilizes between two
    changes at the inputs
  • Timed circuits: absolute bounds on gate /
    environment delays are known a priori (before
    physical design)

127
Relative Timing Circuits
  • Assumptions: "a before b"
      • for concurrent events: reduces reachable state
        space
      • for ordered events: permits early enabling
      • both increase the don't-care space for logic
        synthesis => simpler logic (better area and
        timing)
  • "Assume - if useful - guarantee" approach:
    assumptions are used by the tool to derive a
    circuit and the required timing constraints that must
    be met in the physical design flow
  • Applied to the design of the Rotating Asynchronous
    Pentium Processor(TM) Instruction Decoder
    (K. Stevens, S. Rotem et al., Intel Corporation)

128
Relative Timing Asynchronous Circuits
Speed-independent C-element
b
c
a
129
State Graph (Read cycle)
DSr
DTACK-
LDS
LDTACK-
LDTACK-
LDTACK-
DSr
DTACK-
LDS-
LDS-
LDS-
LDTACK
DSr
DTACK-
D
D-
DSr-
DTACK
130
Lazy Transition Systems
ER (LDS)
LDS
LDS-
LDS-
LDS-
FR (LDS-)
DTACK-
ER (LDS-)
Event LDS- is lazy: firing = subset of enabling
131
Timing assumptions
  • (a before b) for concurrent events:
    concurrency reduction for firing and
    enabling
  • (a before b) for ordered events:
    early enabling
  • (a simultaneous to b wrt c) for triples of
    events: combination of the above

132
Speed-independent Netlist
DTACK-
DSr
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
D
DTACK
LDS
map
csc
DSr
LDTACK
133
Adding timing assumptions (I)
DTACK-
DSr
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
D
DTACK
LDS
map
csc
DSr
LDTACK
134
Adding timing assumptions (I)
DTACK-
DSr
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
D
DTACK
LDS
map
csc
DSr
LDTACK
135
State space domain
DSr
LDTACK-
136
State space domain
DSr
LDTACK-
137
State space domain
DSr
LDTACK-
Two more unreachable states
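
The effect of one timing assumption on the state space can be sketched by comparing reachability with and without it. Everything specific here is an assumption: the arcs are an assumed reading of the READ-cycle STG, the assumption is read as "LDTACK- before DSr+", and it is modeled in the simplest possible way, as an extra ordering arc with an initial token. Lazy transition systems treat firing and enabling separately, which this sketch does not do.

```python
SIGNALS = ["DSr", "DTACK", "LDTACK", "LDS", "D"]

# Assumed causality arcs of the READ cycle: (cause, effect).
ARCS = [
    ("DSr+", "LDS+"), ("LDTACK-", "LDS+"), ("LDS+", "LDTACK+"),
    ("LDTACK+", "D+"), ("D+", "DTACK+"), ("DTACK+", "DSr-"),
    ("DSr-", "D-"), ("D-", "DTACK-"), ("D-", "LDS-"),
    ("DTACK-", "DSr+"), ("LDS-", "LDTACK-"),
]
TOKENS = {("DTACK-", "DSr+"), ("LDTACK-", "LDS+")}

def reachable_codes(arcs, tokens):
    """Return the binary code of every reachable state (one entry per state)."""
    events = {e for arc in arcs for e in arc}
    start = (frozenset(tokens), (0,) * len(SIGNALS))
    seen, stack = {start}, [start]
    while stack:
        marking, code = stack.pop()
        for event in events:
            needed = {a for a in arcs if a[1] == event}
            if not needed <= marking:
                continue                      # event is not enabled here
            bits = list(code)
            bits[SIGNALS.index(event[:-1])] = int(event[-1] == "+")
            produced = {a for a in arcs if a[0] == event}
            nxt = ((marking - needed) | produced, tuple(bits))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted("".join(map(str, code)) for _, code in seen)

ASSUMPTION = ("LDTACK-", "DSr+")              # read as "LDTACK- before DSr+"
untimed = reachable_codes(ARCS, TOKENS)
timed = reachable_codes(ARCS + [ASSUMPTION], TOKENS | {ASSUMPTION})
```

Under these assumptions the timed graph has two states fewer, and the code 10110 is no longer shared by two states, so the corresponding encoding conflict disappears.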
138
Boolean domain
LDS 1
LDS 0
-
-
-
0
1
-
0
1
-
-
-
-
-
-
-
-
1
1
1
-
-
-
-
-
0
0
0
0
0
0/1?
-
-
139
Boolean domain
LDS 1
LDS 0
-
-
-
0
1
-
0
1
-
-
-
-
-
-
-
-
1
1
1
-
-
-
-
-
0
0
-
0
0
1
-
-
One more DC vector for all signals
One state conflict is removed
140
Netlist with one constraint
DTACK-
DSr
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
D
DTACK
LDS
map
csc
DSr
LDTACK
141
Netlist with one constraint
DTACK-
DSr
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
142
Timing assumptions
  • (a before b) for concurrent events:
    concurrency reduction for firing and
    enabling
  • (a before b) for ordered events:
    early enabling
  • (a simultaneous to b wrt c) for triples of
    events: combination of the above

143
Ordered events early enabling
b
b
a
c
c
F
G
a
b
c
144
Adding timing assumptions (II)
DSr
DTACK-
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
D
DTACK
LDS
DSr
LDTACK
145
State space domain
LDS-
D-
DSr-
Reachable space is unchanged
For LDS- enabling can be changed in one state
146
Boolean domain
LDS 1
LDS 0
-
-
-
0
1
-
0
1
-
-
-
-
-
-
-
-
1
1
1
-
-
-
-
-
0
0
-
0
0
1
-
-
147
Boolean domain
LDS 1
LDS 0
-
-
-
0
1
-
0
1
-
-
-
-
-
-
-
-
-
1
1
-
-
-
-
-
0
0
-
0
0
1
-
-
One more DC vector for one signal: LDS
If used: LDS = DSr, otherwise: LDS = DSr + D
148
Before early enabling
DSr
DTACK-
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
D
DTACK
LDS
DSr
LDTACK
149
Netlist with two constraints
DTACK-
DSr
LDS
LDTACK
D
DTACK
DSr-
D-
LDS-
LDTACK-
D
DTACK
DSr
LDS
LDTACK
Both timing assumptions are used for optimization
and become constraints
150
Deriving automatic timing assumptions
  • Rule I (out of 6) a,b - non-input events
  • Untimed ordering ab and a enabled before b,
    but not vice versa
  • Derived assumption a fires before b
  • Justification: the delay of one gate can be made
    shorter than the delay of two (or more) gates: del(a)
    < del(c) + del(b)

c
b
a
a
a
c
b
b
151
Deriving automatic timing assumptions
  • Rule I (out of 6) a,b - non-input events
  • Untimed ordering (ab) and (a enabled before
    b), but not vice versa
  • Derived assumption a fires before b
  • Justification delay of a gate can be made
    shorter than delay of two (or more) gates

c
b
a
a
a
c
b
b
  • Effect I a state becomes DC for all signals

152
Deriving automatic timing assumptions
  • Rule I (out of 6) a,b - non-input events
  • Untimed ordering (ab) and (a enabled before
    b), but not vice versa
  • Derived assumption a fires before b
  • Justification delay of a gate can be made
    shorter than delay of two (or more) gates

c
b
a
a
a
c
b
b
  • Effect II another state becomes local DC for
    signal of event b

153
Backannotation of Timing Constraints
  • Timed circuits require post-verification
  • Can synthesis tools help ?
  • Report the least stringent set of timing
    constraints required for the correctness of the
    circuit
  • Not all initial timing assumptions may be
    required
  • Petrify reports a set of constraints for order of
    firing that guarantee the circuit correctness

154
Timing constraints generation
a
c
b
d
d
d
d
b
c
a
e
e
e
c
b
Assumptions d before b and c before e and a
before d
155
Timing constraints generation
a
c
b
d
d
d
d
b
c
a
e
e
e
c
b
Assumptions d before b and c before e and a
before d
156
Timing constraints generation
a
c
b
d
d
d
d
b
c
a
e
e
e
c
b
Assumptions d before b and c before e and a
before d
157
Timing constraints generation
1
a
c
b
d
d
d
d
b
c
a
Incorrect behavior
e
e
e
c
b
2
Assumptions d before b and c before e and a
before d
158
Covering incorrect behavior
3
1
a
c
b
d
d
d
d
b
c
a
5
e
e
e
c
b
2
4
Assumptions d before b and c before e and a
before d
Other possible constraints remove states from the
assumption domain => invalid
159
Covering incorrect behavior
3
1
a
c
b
d
d
d
d
b
c
a
5
c before e
e
e
e
c
b
2, 4
2
4
Assumptions d before b and c before e and a
before d
Constraints for the minimal cost solution d
before c and c before e
160
Timing aware state encoding
  • Solve only state conflicts reachable in the RT
    assumptions domain
  • Generate automatic timing assumptions for
    inserted state signals => state signals can be
    implemented as RT logic
  • State variables inserted concurrently with I/O
    events => latency and cycle time reduction

161
Value of Relative Timing
  • RT circuits provide up to 2-3x (1.3-2x)
    delay-area reduction with respect to SI circuits
    synthesized without (with) concurrency reduction
  • Automatic generation of timing assumptions =>
    foundation for automatic synthesis of RT circuits
    with area/performance comparable to or better than
    manual design
  • Back-annotation of timing constraints => minimal
    required timing information for the back-end
    tools
  • Timing-aware state encoding allows significant
    area/performance optimization

162
Design Flow with Timing
Specification (STG + user assumptions)
Reachability analysis
Lazy State Graph
Timing-aware state encoding
Automatic Timing Assumptions
Lazy SG with CSC
Boolean minimization
Next-state functions
Logic decomposition
Decomposed functions
Technology mapping
Required Timing Constraints
Gate netlist
163
FIFO example
ro
li
FIFO
lo
ri
164
Speed-Independent Implementation
without concurrency reduction 3 state signals are
required
165
SI implementation with concurrency reduction
x
li
ro-
lo-
ri-
ri
li
-

gC
x
gC

ro
lo
li-
lo
ro
ri
x-
166
RT implementation
ri
li
x
lo
ro
167
RT implementation
x
li
lo-
ro-
ri-
To satisfy the constraint: Delay(x-) < Delay(ri)
and Delay(lo) + Delay(x-) < Delay(ro) + Delay(ri)
li-
lo
ro
ri
x-
All constraints are either satisfied by default
or easy to satisfy by sizing
168
Introduction to asynchronous circuit design:
specification and synthesis
  • Part IV
  • Synthesis from HDL
  • Other synthesis paradigms

169
Outline
  • Synthesis from standard HDL (Verilog) [L. Lavagno
    et al., Async'00]
      • Subset for asynchronous specification
      • Data-path/control partitioning
      • Circuit architecture. Control generation
  • Synthesis from asynchronous HDL (CSP, Tangram)
      • CSP for control generation [A. Martin et al.,
        Caltech]
      • Tangram for silicon compilation [K. van Berkel et
        al., Philips]
  • Control synthesis using FSMs [K. Yun, S. Nowick]
      • Burst-mode machines
      • Comparison with STGs
  • Disclaimer: this is NOT a comprehensive review

170
Motivation
  • Language-based design: key enabler of synchronous
    logic success
  • Use HDL as single language for
  • specification
  • logic simulation and debugging
  • synthesis
  • post-layout simulation
  • HDL must support multiple levels of abstraction

171
Control-data partitioning
  • Splitting of asynchronous control and synchronous
    data path
  • Automated insertion of bundling delays

CONTROL UNIT
DATA PATH
request
delay
acknowledge
172
Design flow
HDL specification
Synthesizable HDL (data)
Control/data splitting
STG (control)
Synthesis (Synopsys)
Logic delays
Synthesis (petrify)
Timing analysis (Synopsys)
HDL implementation
Logic implementation
Delay insertion
173
Asynchronous Verilog subset by example
always begin wait(start) R SMP 3 RES
SMP 4 R if(RES7 1) RES 0 else
begin if(RES6 1) RES 1 end done
1 wait(!start) done 0 end
SMP
R
R E S
RES
C.U.
done
start
  • begin-end for sequencing, fork-join for
    concurrency, if-else for input choice
  • Only structured mix of sequencing, concurrency
    and choice can be specified

174
Controller design flow
HDL
Syntax-directed translation
Petri Net
Transformations
Reductions
Trace Expressions
Synthesis
Circuit
175
Trace expressions example
  • ( a ( b c) ) (d e)

176
Reduction Example
a
d?a ( b f )

f
b
e
c


c


h
g h?e
d
g
177
Transformation concurrency reduction
a
  • Concurrency in TE
  • b and f have a common
  • parallel father

f
b
c
d
178
Transformation concurrency reduction
a
  • f and b are ordered

f
b

c
d
179
Synthesis
  • Place-based encoding ( based on a David-cell
    approach)
  • Transformations to improve area and performance
  • Structural methods to derive a circuit [Pastor
    et al., Transactions on CAD, Nov'98]

180
Place-based encoding
p2
p1
p2
p1
1100
p3
t1
ER(t1) 111-
t1
p3
0010
p4
t2
ER(t2) --11
t2
p3-
p4
0001
p4-
181
Synthesis example VME bus
ldtack
p2
p1-
LDS
p8-
p11-
p3
lds
D
p1
LDTACK
LDTACK-
DSr
p2-
p7-
p4
p10-
dsr
dtack
D
DTACK-
LDS-
ldtack-
p8
p3-
Place encoding
p11
p5
DTACK
D-
p9-
p6-
dsr-
lds-
dtack-
p4-
DSr-
p9
p6
p10
p7
D-
p5-
182
VME bus spec after transforms
ldtack
p2
ldtack
p1-
p8-
p11-
lds
d
p3
lds
D
dtack
p1
dsr
p2-
p7-
dsr-
p9
p9-
ldtack-
p4
p10-
dsr
dtack
ldtack-
p8
lds-
dtack-
Reductions Transforms
p3-
p11
p5
d-
p9-
p6-
dsr-
lds-
dtack-
p4-
p9
p6
p10
p7
D-
p5-
183
Deriving Next state function
Next-state function of signal y ?
184
Deriving Next State function
Next-state function of signal y ?
y x z
185
Conclusion
  • Initial prototype of automated flow without state
    explosion for ASIC design
  • From HDLs (control / data splitting)
  • Existing tools for data-path synthesis
  • Direct synthesis guarantees implementation (HDL →
    Petri net, Petri-net-based encoding)
  • Synthesis of large controllers by efficient spec
    models (free-choice Petri nets, trace
    expressions)
  • Exploration of the design space (optimization) by
    property-preserving transformations
  • Logic synthesis by structural methods
  • Quality of design often acceptable
  • Timing post-optimization can be applied

186
Synthesis from asynchronous HDL
  • CSP based languages
  • CSP: communicating sequential processes [T.
    Hoare]
  • Two synthesis techniques
      • based on program transformations [Caltech]
      • based on direct compilation [Philips]
  • Tools are more mature than for asynchronous
    synthesis from standard HDL
  • Complete shift in design methodology is required

187
Using CSP for control generation
  • After li goes high do full handshake at the
    right, then complete handshake at the left and
    iterate.

ro
li
Q element
ri
lo
STG
li
ro
ri
ro-
ri-
lo
li-
lo-
*[ [li]; ro+; [ri]; ro-; [not ri]; lo+; [not li]; lo- ]
CSP
  • ; is the sequencing operator
  • ro+: ro goes high; ro-: ro goes low
  • [li]: wait until li is high; [not li]: wait
    until li is low

188
Using CSP for control generation
*[ [li]; ro+; [ri]; ro-; [not ri]; lo+; [not li]; lo- ]
CSP
weak
ri
Production rules: li -> ro+; ri -> ro-; not ri
-> lo+; not li -> lo-
ro
li
  • Conflict: ro+ and ro- are not mutually exclusive
    (since ri and li are not)
  • Eliminate the conflict by state signal insertion
    (CSC)

189
Conflict elimination
*[ [li]; ro+; [ri]; x+; [x]; ro-; [not ri]; lo+;
   [not li]; x-; [not x]; lo- ]
CSP
Production rules:
  not x and li -> ro+
  x or not li -> ro-
  x and not ri -> lo+
  not x or ri -> lo-
  ri -> x+
  not li -> x-
ro
li
x
FF
not x
lo
ri
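
The production rules with the state signal x can be exercised in a small simulation. The rule set below is a reconstruction of this slide (the transcript has lost the arrows and the + signs), and the four environment rules are an assumption: a plain four-phase handshake partner on each port.

```python
# Each rule is (guard, signal, new_value). Circuit rules as read from the slide.
RULES = [
    (lambda s: not s["x"] and s["li"], "ro", 1),
    (lambda s: s["x"] or not s["li"], "ro", 0),
    (lambda s: s["x"] and not s["ri"], "lo", 1),
    (lambda s: not s["x"] or s["ri"], "lo", 0),
    (lambda s: s["ri"], "x", 1),
    (lambda s: not s["li"], "x", 0),
    # Assumed environment: four-phase handshake on the left and right ports.
    (lambda s: not s["lo"], "li", 1),
    (lambda s: s["lo"], "li", 0),
    (lambda s: s["ro"], "ri", 1),
    (lambda s: not s["ro"], "ri", 0),
]

def simulate(steps):
    """Fire one effective rule per step, starting with all signals low."""
    state = dict.fromkeys(["li", "ri", "ro", "lo", "x"], 0)
    trace = []
    for _ in range(steps):
        ready = [(sig, val) for guard, sig, val in RULES
                 if guard(state) and state[sig] != val]
        if not ready:
            break
        sig, val = ready[0]        # in this example only one rule is ready
        state[sig] = val
        trace.append(sig + ("+" if val else "-"))
    return trace

cycle = simulate(10)
```

The resulting trace follows the handshake expansion: a full handshake on the right port, then x+ and the completion of the left handshake, with x separating the two states that previously shared a code.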
190
Conclusions
  • Generating circuits from CSP control program is
    similar to STG synthesis
  • One can be reduced to the other
  • Particular technique may vary. Direct CSP program
    transformations can be (and were) used instead of
    methods based on state space generation
  • See reference list for more details

191
Buffer example in Tangram
(a?byte & b!byte)
begin x0: var byte |
  forever do a?x0 ; b!x0 od
end
a
b
Buffer
passive port
Each circle mapped to a netlist
active port

Q element
a
b
Data path
192
Summary
  • Tangram program is partitioned into data path and
    control
  • Data path is implemented as dual or single rail
  • Control is mapped to composition of standard
    elements ( etc)
  • Each standard element is mapped to a circuit
  • Post-optimization is done
  • Composing islands of control elements and
    re-synthesis with STG can give more aggressive
    optimization
  • Philips made a few chips using Tangram, including
    a product 8051 micro-controller in low-power
    pager Muna (25 wks battery life from one AAA
    battery)
  • Similar approach used in Balsa (Manchester Univ.,
    public domain)

193
Burst mode FSM
  • Close to synchronous FSMs with binary encoded I/O
  • Work in bursts
  • Input transitions fire
  • Output transitions fire
  • State signals change
  • Mostly limited to fundamental mode: the next input
    burst cannot arrive before stabilization at the
    outputs

s1
b-/x-
ab/y
a-/xy-
s2
s4
c-/y
c/y-
s3
194
Extended Burst mode
  • Directed don't cares (b): some concurrency is
    allowed for input transitions that do not
    influence an output burst
  • Conditional guards <b>: if b=1 then

s1
b-/x-
ab/y
<b>a-/xy-
s2
s4
c-/y
<b>c/y-
s3
195
Synthesis of XBM
  • Next state and output functions free of
    functional and logic hazards
  • Sequential feedbacks should not introduce new
    hazards
  • State assignment
  • one state of the BM spec to one layer of Karnaugh
    map
  • compatible layers are merged
  • layers are compatible if merging does not
    introduce CSC violations or hazards
  • Layers are encoded using race free encoding

196
XBM and STG
x-
a
b
s1
b-/x-
ab/y
y
<b>a-/xy-
s2
s4
c-/y
<b>c/y-
a-
c
s3
eps
y-
c-
y-
x
y
b-
197
Summary
  • Specification: XBM is a subclass of STGs
  • Synthesis techniques are extensions of
    synchronous state assignment and logic
    minimization
  • Timing
  • environment is limited to fundamental mode
    (difficult for pipelined and highly concurrent
    systems)
  • internals are delay insensitive
  • See reference list for details

198
Summary
  • Specification: Signal Transition
    Graph (formalized timing diagram)
  • Synthesis
  • state encoding
  • Boolean function derivation
  • algebraic and Boolean sequential decomposition
  • technology mapping
  • Timing
  • delay model implies timing constraints
  • exploiting timing assumptions leads to
    minimization and generates further assumptions
  • Future work
  • integrated flow
  • testing