Title: Synthesis of synchronous elastic architectures
1 Synthesis of synchronous elastic architectures
- Jordi Cortadella (Universitat Politècnica de Catalunya)
- Mike Kishinevsky (Intel Corp.)
- Bill Grundmann (Intel Corp.)
2-4 Network of Computing Units
[Figure: network of computing blocks B1, B2, B3 between input In and output Out]
5 Latency-insensitive (elastic) system
[Figure: network of computing blocks B1, B2, B3 between input In and output Out]
Every block makes a step only when all of its inputs are valid
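The firing rule on this slide can be sketched in a few lines of Python. This is only an illustration: the helper `fire` is a hypothetical name, and `None` is assumed to stand for an input that carries no valid data in the current cycle.

```python
from typing import Callable, Optional, Sequence

def fire(op: Callable[..., int], inputs: Sequence[Optional[int]]) -> Optional[int]:
    """Apply op only if every input is valid; otherwise the block stalls."""
    if any(x is None for x in inputs):
        return None  # at least one input is invalid: no step this cycle
    return op(*inputs)

def add(x: int, y: int) -> int:
    return x + y

print(fire(add, [3, 4]))     # both inputs valid: the block makes a step
print(fire(add, [3, None]))  # one input invalid: the block waits
```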
6 Why
- Scalable
- Modular (Plug & Play)
- Tolerance to variable latency
  - Communication
  - Computation
- Not asynchronous
  - Use existing design paradigms
  - CAD tools
7 Outline
- The cost of latency insensitivity
- SELF: an elastic protocol
- Basic implementation (linear pipelines)
- General netlists (forks and joins)
- Formal models and verification
- Synthesis of elastic architectures
- Related work
8 Latency-insensitive block
What's the cost of latency-insensitivity?
[Figure: a core with a Data input and a Data output]
9 Communication channel
[Figure: sender and receiver connected by a Data channel]
Long wires: slow transmission
10-12 Pipelined communication
[Figure: sender and receiver connected by a pipelined Data channel]
What if the sender does not always send valid data?
13-17 The Valid bit
[Figure: sender and receiver connected by pipelined Data and Valid signals]
What if the receiver is not always ready?
18-22 The Stop bit
- Back-pressure
- Long combinational path
23-31 Carloni's relay stations (double storage)
- Handshakes with short wires
- Double storage required
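The two bullets above can be illustrated with a small behavioral model. This is a sketch in the spirit of a relay station, not Carloni's actual implementation: it assumes a two-slot FIFO whose stop output is registered, so the sender only ever sees the previous cycle's stop (short wires), and the second slot absorbs the item that is already in flight. The names `RelayStation` and `run` are hypothetical.

```python
from collections import deque

class RelayStation:
    """Two-slot buffer with a registered stop output (behavioral sketch)."""

    def __init__(self):
        self.slots = deque()   # holds at most 2 items
        self.stop_out = False  # registered: visible to the sender next cycle

    def step(self, data_in, stop_in):
        """Simulate one clock cycle; return the item delivered downstream."""
        # The sender reacted to the stop_out value registered last cycle.
        accepted = data_in if not self.stop_out else None
        out = None
        if self.slots and not stop_in:
            out = self.slots.popleft()
        if accepted is not None:
            self.slots.append(accepted)
        assert len(self.slots) <= 2  # double storage is sufficient
        self.stop_out = len(self.slots) == 2
        return out

def run(items, stalls):
    """Drive the station with a persistent sender and a stalling receiver."""
    rs, pending, received = RelayStation(), deque(items), []
    for stop_in in stalls:
        offer = pending[0] if pending else None
        will_accept = offer is not None and not rs.stop_out
        out = rs.step(offer, stop_in)
        if will_accept:
            pending.popleft()
        if out is not None:
            received.append(out)
    return received

print(run("abcd", [0, 1, 1, 0, 0, 1, 0, 0, 0, 0]))
```

With a single slot, the same model would need `stop_out` to depend combinationally on `stop_in`, which is the long path the relay station avoids.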
32 Proposal: an elastic protocol
- SELF (Synchronous ELastic Flow)
- Simple and provably correct
- Data-path with
  - No area overhead
  - No latency overhead
  - Minimum energy
- Negligible control overhead
- Fine-grain elasticity
33 Flip-flops vs. latches
[Figure: sender and receiver separated by flip-flops (FF); 1 cycle per stage]
34-40 Flip-flops vs. latches
[Figure: each flip-flop drawn as a pair of latches labeled H and L; 1 cycle per pair]
Flip-flops already have a double storage capability, but...
41 Flip-flops vs. latches
[Figure: the same chain of H and L latches; 1 cycle per pair]
Not allowed in conventional FF-based design!
42 Flip-flops vs. latches
[Figure: sender and receiver; 1 cycle per stage]
Let's make the master/slave latches independent
43 Flip-flops vs. latches
[Figure: sender and receiver separated by independent latches; ½ cycle per latch]
Let's make the master/slave latches independent
Only half of the latches (H or L) can move tokens
44 Elastic buffer keeps data while stop is in flight
Cannot be done with single-edge flops without double pumping. Use latches inside MS.
W1R1
W2R1
W1R2
Carloni's relay station belongs to this class
W2R2
45 Shorthand notation (clock lines not shown)
[Figure: a latch with D, Q, En and clk pins, and its shorthand symbol showing only En]
46-73 SELF (linear communication)
[Figure: animation of a linear pipeline between sender and receiver. Each stage has a data latch with enable En and a controller holding a Valid (V) and a Stop (S) bit; the Valid and Stop values (0 or 1) shown on the channel change from frame to frame.]
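74 The protocol
[Figure: Sender and Receiver connected by Data, Valid and Stop signals]
75 The protocol
[Figure: Data = D, Valid = 1, Stop = 0]
Transfer cycle: $\mathit{Valid} = 1 \wedge \mathit{Stop} = 0$
76 The protocol
[Figure: Data = D, Valid = 1, Stop = 1]
Retry cycle: $\mathit{Valid} = 1 \wedge \mathit{Stop} = 1$
Persistency: $G\,\bigl(V \wedge S \wedge (\mathit{Data} = D) \Rightarrow \mathit{Next}\,(V \wedge \mathit{Data} = D)\bigr)$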
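77 The protocol
[Figure: trace on the channel between Sender and Receiver; the rightmost column is the earliest cycle, and "-" marks a cycle without valid data]
Data:   -  D  D  -  C  C  C  B  -  A
Valid:  0  1  1  0  1  1  1  1  0  1
Stop:   0  0  1  0  0  1  1  0  0  0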
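The protocol rules can be checked mechanically on the trace of slide 77. The sketch below assumes the trace is read with the earliest cycle on the right, which is the reading consistent with persistency, so the lists are written in time order. The helper names (`classify`, `persistent`, `received`) and the label "idle" for cycles with Valid = 0 are introduced here for illustration only.

```python
from typing import List, Optional, Tuple

Cycle = Tuple[Optional[str], int, int]  # (data, valid, stop)

def classify(valid: int, stop: int) -> str:
    """Name the kind of cycle seen on the channel."""
    if valid and not stop:
        return "transfer"
    if valid and stop:
        return "retry"
    return "idle"  # label assumed here for Valid = 0

def persistent(trace: List[Cycle]) -> bool:
    """A retry must be followed by a valid cycle carrying the same data."""
    for (d, v, s), (d_next, v_next, _) in zip(trace, trace[1:]):
        if v and s and not (v_next and d_next == d):
            return False
    return True

def received(trace: List[Cycle]) -> List[str]:
    """Data items actually transferred to the receiver, in order."""
    return [d for d, v, s in trace if classify(v, s) == "transfer"]

# Slide 77 trace in time order (earliest cycle first).
data = ["A", None, "B", "C", "C", "C", None, "D", "D", None]
valid = [1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
stop = [0, 0, 0, 1, 1, 0, 0, 1, 0, 0]
trace = list(zip(data, valid, stop))

print([classify(v, s) for _, v, s in trace])
print(received(trace), persistent(trace))
```

Each repeated data item (C three times, D twice) corresponds to retry cycles followed by one transfer, so the receiver sees every item exactly once.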
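78 Elastic Half Buffer
[Figure: an elastic half buffer (EHB): a data latch with enable $En_i$, and a control block with valid signals $V_{i-1}$, $V_i$ and stop signals $S_{i-1}$, $S_i$]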
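79 Join
[Figure: join controller combining input channels (V1, S1) and (V2, S2) into one output channel (V, S), with EHBs on the channels]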
80 (Lazy) Fork
[Figure: lazy fork controller splitting input channel (V, S) into output channels (V1, S1) and (V2, S2)]
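The slides show the join and lazy fork only as gate-level figures. As a hedged sketch, the Boolean equations below are one common formulation assumed here, not copied from the slides: a join offers valid data only when both inputs are valid, and a lazy fork offers data to one branch only when the other branch is not stopping. The loop checks the property that matters for the protocol: a transfer happens on all channels of a controller in the same cycle or on none.

```python
from itertools import product

def transfer(valid: bool, stop: bool) -> bool:
    """A transfer cycle on a channel: Valid = 1 and Stop = 0."""
    return valid and not stop

def join(v1: bool, v2: bool, s: bool):
    """Assumed join equations; returns (v, s1, s2)."""
    v = v1 and v2        # output is valid only if both inputs are
    s1 = s or not v2     # stall input 1 if output stalls or input 2 is missing
    s2 = s or not v1
    return v, s1, s2

def lazy_fork(v: bool, s1: bool, s2: bool):
    """Assumed lazy fork equations; returns (v1, v2, s)."""
    v1 = v and not s2    # offer data to branch 1 only if branch 2 can take it
    v2 = v and not s1
    s = s1 or s2         # stall the input if any branch stalls
    return v1, v2, s

for a, b, c in product([False, True], repeat=3):
    v, s1, s2 = join(a, b, c)
    assert transfer(a, s1) == transfer(b, s2) == transfer(v, c)
    v1, v2, s = lazy_fork(a, b, c)
    assert transfer(v1, b) == transfer(v2, c) == transfer(a, s)
print("join and lazy fork transfer on all channels simultaneously")
```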
81 Eager Fork
[Figure: eager fork controller splitting input channel (V, S) into output channels (V1, S1) and (V2, S2)]
82-84 Elastic combinational paths
[Figure: a datapath with a control layer of Fork, Join, and Join/Fork controllers; the control layer generates the enable signals to the data latches]
85 Elastic buffer formal model
[Figure: buffer with read pointer rd and write pointer wr; ports Din, Dout, Vin, Vout, Sin, Sout]
Buffer: $0..\infty$
Initial state: $rd = wr = 0$
Invariant: $wr \ge rd$
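86 Elastic buffer formal model
[Figure: buffer with read pointer rd and write pointer wr; ports Din, Dout, Vin, Vout, Sin, Sout]
- Liveness properties (finite unbounded latencies)
- Finite forward latency: $G\,(rd \neq wr \Rightarrow F\,\mathit{Vout})$
- Finite backward latency: $G\,(\neg \mathit{Sout} \Rightarrow F\,\neg \mathit{Sin})$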
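The abstract buffer can be explored with a small randomized simulation. This is a sketch under several assumptions that are not on the slides: Sin is taken to be the stop returned on the input channel and Sout the stop received on the output channel, the nondeterministic latencies are modeled with a seeded random generator, and the names `AbstractBuffer` and `simulate` and the probabilities are hypothetical.

```python
import random

class AbstractBuffer:
    """Unbounded FIFO with rd/wr pointers and nondeterministic latencies."""

    def __init__(self):
        self.cells = []        # stands in for the unbounded array 0..infinity
        self.rd = 0
        self.wr = 0
        self.retrying = False  # last output cycle was a retry

    def step(self, vin, din, sout, rng):
        # Nondeterministic choices, constrained by the protocol:
        # Vout needs a stored item and must persist after a retry.
        vout = self.rd < self.wr and (self.retrying or rng.random() < 0.7)
        sin = rng.random() < 0.3
        dout = self.cells[self.rd] if vout else None
        if vin and not sin:    # transfer on the input channel
            self.cells.append(din)
            self.wr += 1
        if vout and not sout:  # transfer on the output channel
            self.rd += 1
        self.retrying = vout and sout
        assert self.wr >= self.rd  # invariant from the slide
        return vout, dout, sin

def simulate(items, cycles, seed=0):
    rng = random.Random(seed)
    buf, pending, received = AbstractBuffer(), list(items), []
    for _ in range(cycles):
        vin = bool(pending)
        din = pending[0] if vin else None  # a persistent sender
        sout = rng.random() < 0.4          # receiver stalls at random
        vout, dout, sin = buf.step(vin, din, sout, rng)
        if vin and not sin:
            pending.pop(0)
        if vout and not sout:
            received.append(dout)
    return received, buf

received, buf = simulate("abcdef", 200)
print(received, buf.rd, buf.wr)
```

Whatever latencies the random choices produce, the received items form a prefix of the sent sequence, which is the in-order property the formal model is meant to capture.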
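87 Formal verification
[Figure: the abstract buffer model (pointers rd and wr; ports Din, Dout, Vin, Vout, Sin, Sout) compared with an Implementation that has the same ports. Is the implementation a refinement of the model?]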
88 Formal verification
- The abstract FSM model is appropriate for compositional verification
- Verification of implementations with model checking (1-bit abstractions of the datapath)
- Buffer is a refinement of the spec
- In-order data transmission
- Correct synchronization of fork/join structures
- Absence of deadlocks
- LTL specs, SMV
89-91 Formal verification
[Figure: two abstract models (NFSM) composed in sequence between Din and Dout, compared with a single abstract model (NFSM) with the same ports Din, Dout, Vin, Vout, Sin, Sout]
Assuming the same initial contents (e.g. empty)
92 Flow equivalence
Synchronous
D:  a b c d e f g h i j k
Elastic
D:  a a b b b c d e e f g g h i i i j k
En: 1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1
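The two traces on this slide can be compared directly: keeping only the elastic values for which En = 1 should give back the synchronous sequence. A minimal sketch, with `valid_stream` as a hypothetical helper name:

```python
def valid_stream(data, enable):
    """Keep only the values sampled in cycles where the enable is 1."""
    return [d for d, en in zip(data, enable) if en]

synchronous = "a b c d e f g h i j k".split()
elastic_d = "a a b b b c d e e f g g h i i i j k".split()
elastic_en = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1]

# Flow equivalence on this register: same values in the same order.
print(valid_stream(elastic_d, elastic_en) == synchronous)
```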
93 Elasticization
[Figure: a synchronous design and its elastic version]
94-98 [Figure: animation of a pipeline with registers PC, IF/ID, ID/EX, EX/MEM, MEM/WB clocked by CLK, to which FORK and JOIN controllers are added]
99 Elastic control layer: generation of gated clocks
100 Bubble (empty latch) insertion
- Does not change functionality (flow equivalence)
- May affect performance (recycling)
  - Throughput may decrease (tokens / cycle)
  - Cycle time may be shortened
  - Performance (tokens / time unit)?
- New dimension for architectural exploration (fine granularity!)
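The trade-off in the bullets above comes down to one division: performance in tokens per time unit is throughput in tokens per cycle divided by the cycle time. The numbers below are purely hypothetical and only show that a bubble which lowers throughput can still improve performance if it shortens the cycle enough.

```python
def performance(throughput_tokens_per_cycle: float, cycle_time_ns: float) -> float:
    """Tokens per nanosecond = (tokens / cycle) / (ns / cycle)."""
    return throughput_tokens_per_cycle / cycle_time_ns

# Hypothetical figures, not measurements from the talk.
baseline = performance(1.0, 1.0)     # no bubble: 1 token/cycle, 1.0 ns cycle
with_bubble = performance(0.8, 0.7)  # bubble: fewer tokens/cycle, shorter cycle

print(baseline, with_bubble, with_bubble > baseline)
```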
101 Variable-latency units
[Figure: a unit with go and done signals that takes 0 - k cycles]
102 Variable-latency units
- Telescopic units
  - 1 cycle for fast operations
  - 2 cycles for slow operations
- Examples
  - Short / long additions (carry propagation)
  - A 0, A / 1
  - Dynamic changes in latency (fast if cold, slow if hot)
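The short / long addition example can be sketched as a telescopic adder. This is an illustration under stated assumptions, not the design from the talk: the adder is assumed to finish in 1 cycle when the longest run of carry-propagate bits is at most `fast_chain`, and in 2 cycles otherwise. The names `longest_propagate_run`, `telescopic_add` and the default thresholds are hypothetical.

```python
def longest_propagate_run(a: int, b: int, width: int) -> int:
    """Longest run of bit positions where a carry would propagate (a_i xor b_i)."""
    p = (a ^ b) & ((1 << width) - 1)
    best = run = 0
    for i in range(width):
        if (p >> i) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best

def telescopic_add(a: int, b: int, width: int = 8, fast_chain: int = 4):
    """Return (sum modulo 2**width, cycles taken) under the assumed timing rule."""
    total = (a + b) & ((1 << width) - 1)
    cycles = 1 if longest_propagate_run(a, b, width) <= fast_chain else 2
    return total, cycles

print(telescopic_add(1, 1))           # short carry chain: fast operation
print(telescopic_add(0b01111111, 1))  # long carry chain: slow operation
```

In an elastic design the unit would signal completion through done, and the surrounding controllers would simply see one more stalled cycle for the slow case.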
103 Some related work
- Latency-insensitive design
  - Carloni and a few follow-ups (large overhead)
  - Wire pipelining: Svensson, Nookala, Casu, ...
  - Interlock pipelines (H. Jacobson et al.)
- Asynchronous design
  - Micropipelines (Sutherland)
  - Rings (Williams, Sparso)
  - CHP and slack-elasticity (Martin, Burns, Manohar et al.)
- De-synchronization
  - J. Cortadella et al.
  - V. Varshavsky
- Synchronous implementation of CSP
  - J. O'Leary et al.
  - A. Peeters et al.
104 Summary
- SELF adds discrete-time handshaking to the clock, with a very small buffering overhead
- Gives a last-minute extra tick to anybody who wants it
- Compositional theory proving correctness (Krstic et al., FMCAD'06)
- Libraries of controllers have been designed and their correctness verified
- Elasticization CAD in progress
- New micro-architectural opportunities based on variable-latency units