Title: Elasticity and Petri nets
1 Elasticity and Petri nets
- Jordi Cortadella, Universitat Politècnica de Catalunya, Barcelona
- Mike Kishinevsky, Intel Corp., Strategic CAD Labs, Hillsboro
2 Moore's law
Source: Intel Corp.
3 Is the GHz race over?
4 Many-core is here
Source: Intel Corp.
6 Why this tutorial?
- Digital circuits are complex concurrent systems
- Variability and power consumption are key critical aspects in deep submicron technologies
- Multi (many)-core systems will become a novel paradigm
- System design
- Applications
- Concurrent programming
- Theory of concurrency may play a relevant role in this new scenario
7 Elasticity
- Tolerance to delay variability
- Different forms of elasticity
- Asynchronous: no clock
- Synchronous: variability synchronized with a clock
- In all forms of elasticity, token-based computations are performed (req/ack, valid/stop signals are used)
8 Outline
- Asynchronous elastic systems
- The basics: circuits and elasticity
- Synthesis of asynchronous circuits from Petri nets
- Modern methods for the synthesis of large controllers
- De-synchronization: from synchronous to asynchronous
- Synchronous elastic systems
- Basics of synchronous elastic systems
- Early evaluation and performance analysis
- Optimization of elastic systems and their correctness
9 The basics: circuits and elasticity
10 Outline
- Gates, latches and flip-flops. Combinational and sequential circuits.
- Basic concepts on asynchronous circuit design.
- Petri net models for asynchronous controllers. Signal Transition Graphs.
11 Boolean functions
- Composed from logic gates
(Figure: logic gates with inputs a, b, c, d and outputs x, y, z)
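12 Memory elements: latches
(Figure: two latch symbols, labelled L and H, each with data input D, enable En and output Q)
Active high: En = 0 (opaque): Q = prev(Q); En = 1 (transparent): Q = D
Active low: En = 1 (opaque): Q = prev(Q); En = 0 (transparent): Q = D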
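The opaque/transparent rule above can be sketched in a few lines of Python. The class name `Latch` and its `update` method are illustrative names, not part of the slides; the model is purely behavioural and ignores delays.

```python
class Latch:
    """Behavioural model of a level-sensitive latch (hypothetical helper)."""

    def __init__(self, active_high=True, q=0):
        self.active_high = active_high  # True: transparent when En = 1
        self.q = q                      # stored value, i.e. prev(Q)

    def update(self, d, en):
        # Transparent: Q follows D. Opaque: Q keeps its previous value.
        transparent = (en == 1) if self.active_high else (en == 0)
        if transparent:
            self.q = d
        return self.q


stimulus = [(1, 0), (1, 1), (0, 1), (1, 0)]  # (D, En) pairs

h = Latch(active_high=True)
trace_h = [h.update(d, en) for d, en in stimulus]

l = Latch(active_high=False)
trace_l = [l.update(d, en) for d, en in stimulus]
```

With the same stimulus the two latches store different values, because they are transparent on opposite levels of En.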
13 Memory elements: flip-flop
(Figure: flip-flop FF with input D, output Q and clock CLK, built from an L latch and an H latch; waveforms of CLK, D and Q)
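A minimal sketch of how two latches can form an edge-triggered flip-flop. It assumes the usual master-slave arrangement (an active-low latch feeding an active-high latch, both driven by CLK); the slide residue does not state the order, so that ordering is an assumption, as are the names `latch` and `FlipFlop`.

```python
def latch(q, d, en, active_high):
    """Return the new latch output: D when transparent, else the old Q."""
    transparent = (en == 1) if active_high else (en == 0)
    return d if transparent else q


class FlipFlop:
    """Master-slave flip-flop (assumed order: L latch first, H latch second)."""

    def __init__(self):
        self.master = 0  # output of the active-low (L) latch
        self.q = 0       # output of the active-high (H) latch

    def step(self, d, clk):
        # The master samples D while CLK = 0; the slave passes it on while CLK = 1.
        self.master = latch(self.master, d, clk, active_high=False)
        self.q = latch(self.q, self.master, clk, active_high=True)
        return self.q


ff = FlipFlop()
stimulus = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1), (1, 1)]  # (D, CLK)
ff_trace = [ff.step(d, clk) for d, clk in stimulus]
```

In this trace Q changes only in the steps where CLK goes from 0 to 1, and changes of D while CLK = 1 are ignored.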
14 Finite-state automata
(Figure: combinational logic CL with Inputs and Outputs, and a STATE register clocked by CLK)
- Output function
- Next-state function
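The two functions named above can be illustrated with a small clocked automaton. The example machine (a detector of 0-to-1 input changes) and the names `next_state`, `output` and `run` are hypothetical and chosen only for illustration.

```python
def next_state(state, inp):
    """Next-state function: the state remembers the previous input."""
    return inp


def output(state, inp):
    """Output function: 1 when the input rises from 0 to 1."""
    return 1 if (state == 0 and inp == 1) else 0


def run(inputs, state=0):
    """Evaluate CL once per clock cycle and update the STATE register."""
    outs = []
    for inp in inputs:
        outs.append(output(state, inp))
        state = next_state(state, inp)  # register update at the clock edge
    return outs, state


outs, final_state = run([0, 1, 1, 0, 1])
```

Here the output depends on both the state and the current input (a Mealy-style machine); an output function of the state alone would work the same way.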
15 Network of Computing Units
(Figure: blocks B1, B2 and B3 connected between In and Out)
No combinational cycles
16 Marked Graph Model
(Figure: a circuit made of registers and combinational logic, and the corresponding marked graph)
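A minimal sketch of the token game on a marked graph. It assumes that combinational blocks are modelled as transitions and registers as arcs that may hold tokens; this mapping, the three-node ring and the function names are assumptions, since the figure itself is not available.

```python
def enabled(marking, t):
    """A transition is enabled when every incoming arc holds a token."""
    return all(tokens > 0 for (src, dst), tokens in marking.items() if dst == t)


def fire(marking, t):
    """Fire t: take one token from each input arc, add one to each output arc."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    new = dict(marking)
    for (src, dst) in marking:
        if dst == t:
            new[(src, dst)] -= 1
        if src == t:
            new[(src, dst)] += 1
    return new


# Ring t1 -> t2 -> t3 -> t1 with a single token on the arc into t1.
initial = {("t1", "t2"): 0, ("t2", "t3"): 0, ("t3", "t1"): 1}

m = initial
for t in ["t1", "t2", "t3"]:
    m = fire(m, t)
```

After each transition has fired once the marking returns to the initial one, and the number of tokens on the cycle never changes.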
17 Basic concepts on asynchronous circuit design
18 Outline
- What is an asynchronous circuit?
- Asynchronous communication
- Asynchronous design styles (Micropipelines)
- Asynchronous logic building blocks
- Control specification and implementation
- Delay models and classes of async circuits
- Channel-based design
- Why asynchronous circuits?
19 Synchronous circuit
Implicit (global) synchronization between blocks
Clock period > Max Delay (CL + R)
20 Asynchronous circuit
(Figure: pipeline of registers R and combinational logic blocks CL, with Req and Ack signals)
Explicit (local) synchronization: Req / Ack handshakes
21 Motivation for asynchronous
- Asynchronous design is often unavoidable
- Asynchronous interfaces, arbiters, etc.
- Modern clocking is multiphase and distributed, and virtually asynchronous (cf. GALS, next slide)
- Mesochronous (clock travels together with data)
- Local (possibly stretchable) clock generation
- Robust asynchronous design flow is coming (e.g. VLSI programming from Philips, Balsa from Univ. of Manchester, NCL from Theseus Logic)
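22 Globally Async Locally Sync (GALS)
(Figure: a clocked domain with registers R, combinational logic CL and a local CLK, inside an async-to-sync wrapper; it communicates with the asynchronous world through the handshake pairs Req1/Ack1, Req2/Ack2, Req3/Ack3 and Req4/Ack4)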
23 Key Design Differences
- Synchronous logic design
- Proceeds without taking timing correctness (hazards, signal acking, etc.) into account
- Combinational logic and memory latches (registers) are built separately
- Static timing analysis of CL is sufficient to determine the Max Delay (clock period)
- Fixed setup and hold conditions for latches
24 Key Design Differences
- Asynchronous logic design
- Must ensure hazard-freedom, signal acking, local timing constraints
- Combinational logic and memory latches (registers) are often mixed in complex gates
- Dynamic timing analysis of logic is needed to determine relative delays between paths
- To avoid complex issues, circuits may be built as delay-insensitive and/or speed-independent (as discussed later)
25 Synchronous communication
(Figure: clock and data waveforms with the sampled bit values)
- Clock edges determine the time instants where data must be sampled
- Data wires may glitch between clock edges (setup/hold times must be satisfied)
- Data are transmitted at a fixed rate (clock frequency)
26 Dual rail
(Figure: dual-rail waveforms with the transmitted bit values)
- Two wires with L (low) and H (high) per bit
- LL = spacer, LH = 0, HL = 1
- n-bit data communication requires 2n wires
- Each bit is self-timed
- Other delay-insensitive codes exist (e.g. k-of-n) and event-based signalling (choice criteria: pin and power efficiency)
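The dual-rail code listed above (LL spacer, LH for 0, HL for 1) can be sketched as a pair of lookup tables. The function names are illustrative, and the order of the two wires within each pair is simply taken as written on the slide.

```python
SPACER = ("L", "L")
ENCODE = {0: ("L", "H"), 1: ("H", "L")}
DECODE = {pair: bit for bit, pair in ENCODE.items()}


def encode_word(bits):
    """Encode n bits onto 2n wires, one (wire, wire) pair per bit."""
    return [ENCODE[b] for b in bits]


def decode_word(pairs):
    """Return the bits once every pair is a codeword, or None while any
    pair is still a spacer. The pair HH is not part of the code."""
    if any(p == ("H", "H") for p in pairs):
        raise ValueError("HH is not a valid dual-rail codeword")
    if any(p == SPACER for p in pairs):
        return None
    return [DECODE[p] for p in pairs]


word = encode_word([1, 0, 1])
wire_count = 2 * len(word)
partial = [("H", "L"), SPACER, ("H", "L")]  # middle bit has not arrived yet
```

Because validity is carried by the data wires themselves, the receiver can tell per bit whether a value has arrived, which is what makes each bit self-timed.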
27 Bundled data
(Figure: data waveforms with a validity signal and the transmitted bit values)
- Validity signal
- Similar to an aperiodic local clock
- n-bit data communication requires n+1 wires
- Data wires may glitch when not valid
- Signaling protocols
- level sensitive (latch)
- transition sensitive (register): 2-phase / 4-phase
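28 Example: memory read cycle
(Figure: Address and Data buses with their validity signals A and D; a valid address is followed by valid data)
- Transition signaling, 4-phase
29 Example: memory read cycle
(Figure: Address and Data buses with their validity signals A and D; a valid address is followed by valid data)
- Transition signaling, 2-phase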
30 Asynchronous modules
(Figure: DATA PATH block with Data IN, Data OUT, start and done; CONTROL block with req in, ack in, req out and ack out)
- Signaling protocol
- reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin-
- (more concurrency is also possible)
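The signaling protocol above can be replayed as a sequence of signal transitions. This sketch assumes that the first half of the sequence consists of rising transitions (written with "+") and the second half of the falling ones (written with "-"), as in a 4-phase, return-to-zero handshake; the function name `replay` is hypothetical.

```python
CYCLE = [
    "reqin+", "start+", "done+", "reqout+", "ackout+", "ackin+",
    "reqin-", "start-", "done-", "reqout-", "ackout-", "ackin-",
]


def replay(events):
    """Apply the transitions in order; every event must toggle its signal."""
    levels = {}
    for ev in events:
        name, sign = ev[:-1], ev[-1]
        new = 1 if sign == "+" else 0
        if levels.get(name, 0) == new:
            raise ValueError(f"{ev} does not change {name}")
        levels[name] = new
    return levels


after_half = replay(CYCLE[:6])   # all six signals have been raised
after_cycle = replay(CYCLE)      # all six signals are back to zero
after_two = replay(CYCLE * 2)    # the cycle can repeat from the same state
```

The sequence is fully sequential; a more concurrent controller would allow some of these transitions to be reordered.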
31 Asynchronous latches: C element
(Figure: C-element symbol with inputs A, B and output Z; static logic implementation between Vdd and Gnd)
van Berkel 91
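A behavioural sketch of the C element: the output Z rises when both inputs are 1, falls when both are 0, and otherwise keeps its value, which corresponds to the next-state equation Z = AB + Z(A + B). The function name is illustrative and delays are ignored.

```python
def c_element(a, b, z):
    """Next value of Z for inputs A, B and current output Z."""
    return (a & b) | (z & (a | b))


z = 0
c_trace = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]:
    z = c_element(a, b, z)
    c_trace.append(z)
```

The same input pair (0, 1) yields different outputs in the trace depending on the stored value, which is why the C element acts as a latch.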
32 C-element: other implementations
(Figure: dynamic and quasi-static transistor-level implementations with inputs A, B, output Z and a weak inverter, between Vdd and Gnd)
33 Dual-rail logic
Dual-rail AND gate
Valid behavior for monotonic environment
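One common gate-level form of a dual-rail AND gate can be sketched as follows. It is not necessarily the circuit drawn on the slide: the rail naming (t, f), the 0/1 levels and the spacer (0, 0) are assumptions made for this example.

```python
SPACER = (0, 0)
TRUE, FALSE = (1, 0), (0, 1)  # (t rail, f rail)


def dr_and(a, b):
    """Dual-rail AND: Z.t = A.t AND B.t, Z.f = A.f OR B.f."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


def decode(pair):
    """Map a valid codeword to a Boolean; the spacer carries no value."""
    return {TRUE: 1, FALSE: 0}[pair]


table = {
    (x, y): decode(dr_and(TRUE if x else FALSE, TRUE if y else FALSE))
    for x in (0, 1) for y in (0, 1)
}
```

In this form the f rail can rise as soon as one input is known to be 0, so the gate relies on the environment changing inputs monotonically (spacer to codeword and back to spacer).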
34 Completion detection
Dual-rail logic
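Completion detection for a dual-rail word can be sketched as one OR per bit followed by a combination of the per-bit results. The (t, f) rail pairs, the spacer (0, 0) and the function names are assumptions for this example; in hardware the per-bit signals are often combined with C elements rather than a plain AND.

```python
def bit_valid(pair):
    """A bit is valid when exactly one of its two rails is high."""
    t, f = pair
    return t ^ f


def word_done(word):
    """1 when every bit of the word carries a valid codeword."""
    return int(all(bit_valid(p) for p in word))


def word_empty(word):
    """1 when every bit of the word is back at the spacer."""
    return int(all(p == (0, 0) for p in word))


complete = [(1, 0), (0, 1)]
partial = [(1, 0), (0, 0)]
spacer = [(0, 0), (0, 0)]
```

A receiver would typically wait for `word_done` before using the data and for `word_empty` before accepting the next word.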
35 Differential cascode voltage switch logic
(Figure: 3-input AND/NAND gate: N-type transistor network with inputs A.t, A.f, B.t, B.f, C.t, C.f, outputs Z.t and Z.f, and signals start and done)
36 Example of dual-rail design
- Asynchronous dual-rail ripple-carry adder (A. Martin, 1991)
- Critical delay is proportional to log N (N = number of bits)
- 32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous
- Async cell transistor count = 34 versus synchronous = 28
37 Bundled-data logic blocks
Single-rail logic
Conventional logic + matched delay
38 Micropipelines (Sutherland 89)
Micropipeline (2-phase) control blocks
(Figure: control blocks Request-Grant-Done (RGD) Arbiter, Join, Merge, Select, Toggle and Call, with ports in, out0 and out1)
39 Micropipelines (Sutherland 89)
(Figure: pipeline of latches L and logic blocks, with C elements and the handshake signals Rin, Ain, Rout and Aout)
40 Data-path / Control
(Figure: data path of latches L and logic blocks; a CONTROL block with the handshake signals Rin, Ain, Rout and Aout)
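41 Control specification
(Figure: specification over signals A and B)
A: input, B: output
42 Control specification
(Figure: specification over signals A and B)
43 Control specification
(Figure: specification over signals A, B and C)
44 Control specification
(Figure: specification over signals A, B and C)
45 Control specification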
46 A simple filter: specification
(Figure: filter block with input IN (handshake Rin, Ain) and output OUT (handshake Rout, Aout))
y := 0;
loop
  x := READ (IN);
  WRITE (OUT, (x+y)/2);
  y := x;
end loop
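The filter specification can be executed directly on a finite list of input samples. This sketch assumes that the expression written to OUT is the average (x + y)/2 of the current and the previous sample; the function name `filter_stream` and the sample values are hypothetical.

```python
def filter_stream(samples):
    """Run the filter loop over a finite list standing in for channel IN."""
    y = 0                        # y := 0
    out = []
    for x in samples:            # x := READ(IN)
        out.append((x + y) / 2)  # WRITE(OUT, (x + y) / 2)
        y = x                    # y := x
    return out


result = filter_stream([2, 4, 6, 0])
```

Each output depends only on the current input and the previous one, which is why the block diagram needs just two storage elements, x and y.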
47 A simple filter: block diagram
- x and y are level-sensitive latches (transparent when R = 1)
- + is a bundled-data adder (matched delay between Ra and Aa)
- Rin indicates the validity of IN
- After Ain the environment is allowed to change IN
- (Rout, Aout) control a level-sensitive latch at the output
48 A simple filter: control spec.
49 A simple filter: control impl.
50 Taking delays into account
- Delay assumptions
- Environment: 3 time units
- Gates: 1 time unit
events  x ?  x ?  y ?  z ?  z ?  x ?  x ?  z ?  z ?  y ?
time    3    4    5    6    7    9    10   12   13   14
51 Taking delays into account
(Figure: circuit with signals x, y and z; one element is marked as very slow)
Delay assumptions: unbounded delays
events  x ?  x ?  y ?  z ?  x ?  x ?  y    failure!
time    3    4    5    6    9    10   11
52 Why asynchronous?
53 Motivation (designer's view)
- Modularity for system-on-chip design
- Plug-and-play interconnectivity
- Average-case performance
- No worst-case delay synchronization
- Many interfaces are asynchronous
- Buses, networks, ...
54 Motivation (technology aspects)
- Low power
- Automatic clock gating
- Electromagnetic compatibility
- No peak currents around clock edges
- Security
- No electromagnetic difference between logical 0 and 1 in dual-rail code
- Robustness
- High immunity to technology and environment variations (temperature, power supply, ...)
55 Dissuasion
- Concurrent models for specification
- CSP, Petri nets, ... no more FSMs
- Difficult to design
- Hazards, synchronization
- Complex timing analysis
- Difficult to estimate performance
- Difficult to test
- No way to stop the clock