Title: Advanced Tutorial on Hardware Design and Petri nets
 1Advanced Tutorial on Hardware Design and Petri 
nets
- Jordi Cortadella Univ. Politècnica de Catalunya 
- Luciano Lavagno Università di Udine 
- Alex Yakovlev Univ. Newcastle upon Tyne
2Tutorial Outline
- Introduction 
- Modeling Hardware with PNs 
- Synthesis of Circuits from PN specifications 
- Circuit verification with PNs 
- Performance analysis using PNs
3Introduction.Outline
- Role of Hardware in modern systems 
- Role of Hardware design tools 
- Role of a modeling language 
- Why Petri nets are good for Hardware Design 
- History of relationship between Hardware Design 
 and Petri nets
- Asynchronous Circuit Design
4Role of Hardware in modern systems 
- Technology will soon allow putting 1 billion
 transistors on a chip
- Systems on chip are a reality: 1 billion
 operations per second
- Hardware and software designs are no longer 
 separate
- Hardware becomes distributed, asynchronous and 
 concurrent
5Role of Hardware design tools
- Design productivity is a problem due to chip 
 complexity and time to market demands
- Need for well-integrated CAD with simulation, 
 synthesis, verification and testing tools
- Modelling of system behaviour at all levels of 
 abstraction with feedback to the designer
- Design re-use is a must but with max technology 
 independence
6Role of Modelling Language
- Design methods and tools require good modelling 
 and specification techniques
- Those must be formal and rigorous and easy to 
 comprehend (cf. timing diagrams, waveforms,
 traditionally used by logic designers)
- Today's hardware description languages allow a high
 level of abstraction
- Models must allow for equivalence-preserving 
 refinements
- They must allow for non-functional qualities such 
 as speed, size and power
7Why Petri nets are good
- Finite State Machine is still the main formal 
 tool in hardware design but it may be inadequate
 for distributed, concurrent and asynchronous
 hardware
- Petri nets offer:
- simple and easy to understand graphical capture
- modelling power adjustable to various types of
 behaviour at different abstraction levels
- formal operational semantics and verification of
 correctness (safety and liveness) properties
- possibility of mechanical synthesis of circuits
 from net models
8A bit of history of their marriage
- 1950s and 60s: Foundations (Muller & Bartky,
 Petri, Karp & Miller, ...)
- 1970s: Toward Parallel Computations (MIT,
 Toulouse, St. Petersburg, Manchester, ...)
- 1980s: First progress in VLSI and CAD,
 Concurrency theory, Signal Transition Graphs
 (STGs)
- 1990s: First asynchronous design (verification
 and synthesis) tools: SIS, Forcage, Petrify
- 2000s: Powerful asynchronous design flow
9Introduction to Asynchronous Circuits
- What is an asynchronous circuit? 
- Physical (analogue) level 
- Logical level 
- Speed-independent and delay-insensitive circuits 
- Why go asynchronous? 
- Why control logic? 
- Role of Petri nets 
- Asynchronous circuit design based on Petri nets
10What is an asynchronous circuit
- No global clock: circuits are self-timed or
 self-clocked
- Can be viewed as hardwired versions of parallel
 and distributed programs: statements are
 activated when their guards are true
- No special run-time mechanism: the program
 statements are physical components (logic gates,
 memory latches, or hierarchical modules)
- Interconnections are also physical components:
 wires, busses
11Synchronous Design
[Figure: sender register, logic and receiver register on a data path (data input, data), both registers driven by a common clock; the timing diagram marks Tsetup and Thold]
Timing constraint: input data must stay unchanged
within a setup/hold window around the clock event.
Otherwise, the latch may fail (e.g. metastability)
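The setup/hold constraint can be illustrated with a small check. This is only a sketch: the function name, the open-interval convention for the forbidden window, and the numbers are assumptions made for illustration, not taken from the slides.

```python
def violates_setup_hold(data_change_times, clock_edge, t_setup, t_hold):
    """Return the data transitions that fall inside the forbidden window
    (clock_edge - t_setup, clock_edge + t_hold)."""
    lo = clock_edge - t_setup
    hi = clock_edge + t_hold
    return [t for t in data_change_times if lo < t < hi]


# Illustrative numbers: clock edge at 10.0 ns, setup 0.5 ns, hold 0.2 ns.
changes = [3.0, 9.7, 10.1, 12.0]
bad = violates_setup_hold(changes, clock_edge=10.0, t_setup=0.5, t_hold=0.2)
print(bad)  # [9.7, 10.1]
```

Any transition reported here is one that could put the receiving latch at risk of metastability.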
 12Asynchronous Design
[Figure: sender register, logic and receiver register on a data path (data input, data), connected by Req(uest) and Ack(nowledge) signals]
Req/Ack (local) signal handshake protocol instead
of a global clock. Causal relationship. Handshake
signals are implemented with completion detection in
the data path.
 13Physical (Analogue) level
- Strict view: an asynchronous circuit is a
 (analogue) dynamical system, e.g. to be
 described by differential equations
- In most cases it can be safely approximated by a
 logic level (0-to-1 and 1-to-0 transitions)
 abstraction; even hazards can be captured
- For some anomalous effects, such as metastability
 and oscillations, there is an absolute need for
 analogue models
- Analogue aspects are not considered in this
 tutorial (cf. reference list)
14Logical Level
- Circuit behaviour is described by sequences of up 
 (0-to-1) and down (1-to-0) transitions on inputs
 and outputs
- The order of transitions is defined by causal 
 relationship, not by clock (a causes b, directly
 or transitively)
- The order is partial if concurrency is present 
- A class of async timed (yet not clocked!) 
 circuits allows special timing order relations (a
 occurs before b, due to delay assumptions)
15Simple circuit example
out = (x+y)*(a+b)
[Figure: circuit with a component labelled C, data signals x, y, a, b, out and handshake pairs req1/ack1, req2/ack2, req3/ack3]
 16Simple circuit example
[Figure: the same circuit together with its data flow graph over x, y, a, b, out]
 17Simple circuit example
[Figure: the same circuit together with its data flow graph and its control flow graph (a Petri net) over req1, ack1, req2, ack2, req3, ack3]
 18Muller C-element
Key component in asynchronous circuit design; behaves like a Petri net transition
y = x1*x2 + (x1+x2)*y
[Figure: C-element with inputs x1, x2 and output y, with the set-part and reset-part of the equation marked; animated over slides 18-30]
Animation steps (x1, x2, y):
- 0, 0, 0
- x1 0->1, x2 0, y 0
- x1 0->1, x2 0->1, y 0: excited
- 1, 1, 0: excited
- 1, 1, 1: stable (new value)
- x1 1->0, x2 1, y 1
- x1 1->0, x2 1->0, y 1: excited
- 0, 0, 0: stable (new value)
It acts symmetrically for pairs of 0-1 and 1-0
transitions: it waits for both input events to occur
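The C-element behaviour can be replayed in a few lines of Python. This is a minimal sketch of the standard next-state function y' = x1*x2 + (x1+x2)*y; the function names and the chosen input sequence are illustrative assumptions, and gate delays are not modelled.

```python
def c_element(x1: int, x2: int, y: int) -> int:
    """Next output value: y' = x1*x2 + (x1 + x2)*y."""
    return (x1 & x2) | ((x1 | x2) & y)


def is_excited(x1: int, x2: int, y: int) -> bool:
    """The output is excited when its next value differs from the current one."""
    return c_element(x1, x2, y) != y


# Replay an input sequence and record (x1, x2, new y, was the output excited?).
y = 0
trace = []
for x1, x2 in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    excited = is_excited(x1, x2, y)
    y = c_element(x1, x2, y)
    trace.append((x1, x2, y, excited))

for step in trace:
    print(step)
```

The output only changes once both inputs agree, which is the same rule as a Petri net transition that fires only when all its input places are marked.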
 31Muller C-element
NMOS circuit implementation
[Figure: transistor network between Power and Ground with inputs x1, x2 and output y; shown over slides 31-33]
 34Why asynchronous is good
- Performance (work on actual, not max delays) 
- Robustness (operationally scalable; no clock
 distribution; important when gate-to-wire delay
 ratio changes)
- Low Power (change-based computing: fewer
 signal transitions)
- Low Electromagnetic Emission (more even
 power/frequency spectrum)
- Modularity and re-use (parts designed
 independently; well-defined interfaces)
- Testability (inherent self-checking via ack 
 signals)
35Obstacles to Async Design
- Design tool support: commercial design tools are
 aimed at clocked systems
- Difficulty of production testing: production
 testing is heavily committed to use of clock
- Aversion of majority of designers, trained with
 clock: biggest obstacle
- Overbalancing effect of periodic (every 10 years) 
 asynchronous euphoria
36Why control logic
- Customary in hardware design to separate control 
 logic from datapath logic due to different design
 techniques
- Control logic implements the control flow of a 
 (possibly concurrent) algorithm
- Datapath logic deals with operational part of the 
 algorithms
- Datapath operations may have their (lower level) 
 control flow elements, so the distinction is
 relative
- Examples of control-dominated logic are a bus
 interface adapter, an arbiter, or a modulo-N
 counter
- Their behaviour is a combination of partial 
 orders of signal events
- Examples of data-dominated logic are a register 
 bank or an arithmetic-logic unit (ALU)
37Role of Petri Nets
- We concentrate here on control logic 
- Control logic is behaviourally more diverse than 
 data path
- Petri nets capture causality and concurrency 
 between signalling events, deterministic and
 non-deterministic choice in the circuit and its
 environment
- They allow:
- composition of labelled PNs (transition or place
 synchronisation)
- refinement of event annotation (from abstract
 operations down to signal transitions)
- use of observational equivalence (lambda-events)
- clear link with state-transition models in both
 directions
38Design flow with Petri nets
[Figure: design flow diagram with the following steps, models and inputs]
- Abstract behaviour synthesis
- Abstract behavioural model: Labelled Petri nets (LPNs)
- Signalling refinement
- Timing diagrams
- Verification and Performance analysis
- Logic behavioural model: Signal Transition Graphs (STGs)
- STG-based logic synthesis (deriving boolean functions)
- Syntax-directed translation (deriving circuit structure)
- Decomposition and gate mapping
- Circuit netlist
- Library cells
 39Tutorial Outline
- Introduction 
- Modeling Hardware with PNs 
- Synthesis of Circuits from PN specifications 
- Circuit verification with PNs 
- Performance analysis using PNs
40Modelling.Outline
- High level modelling and abstract refinement:
 processor example
- Low level modelling and logic synthesis:
 interface controller example
- Modelling of logic circuits: event-driven and
 level-driven parts
- Properties analysed
41High-level modelling: Processor Example
[Figure: Petri net with two transitions, Instruction Fetch and Instruction Execution]
 42High-level modelling: Processor Example
[Figure: refined Petri net of the processor, animated as a token game over slides 42-53; on slide 50 Instruction Execution is annotated "(not exactly yet!)" and on slide 52 "(now it is!)"]
Labels in the refined net:
- Instruction Fetch
- Instruction Execution
- One-word Instruction Decode
- One-word Instruction Execute
- Memory Read
- Two-word Instruction Execute
- Program Counter Update
- Memory Address Register Load
- Two-word Instruction Decode
- Instruction Register Load
 54High-level modelling: Processor Example
- The details of further refinement, circuit
 implementation (by direct translation) and
 performance estimation (using UltraSan) are in:
-  A. Semenov, A.M. Koelmans, L. Lloyd and A.
 Yakovlev. Designing an asynchronous processor
 using Petri Nets, IEEE Micro, 17(2):54-64, March
 1997
- For use of Coloured Petri net models and use of
 Design/CPN in processor modeling:
-  F. Burns, A.M. Koelmans and A. Yakovlev.
 Analysing superscalar processor architectures with
 coloured Petri nets, Int. Journal on Software
 Tools for Technology Transfer, vol.2, no.2, Dec.
 1998, pp. 182-191.
55Low-level Modelling: Interface Example
- Insert VME bus figure 1: timing diagrams
56Low-level Modelling: Interface Example
- Insert VME bus figure 2: STG
57Low-level Modelling: Interface Example
- Details of how to model interfaces and design
 controllers are in:
-  A. Yakovlev and A. Petrov,
-  complete the reference
58Low-level Modelling: Interface Example
- Insert VME bus figure 3: circuit diagram
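59Logic Circuit Modelling
Event-driven elements and their Petri net equivalents
[Figure: Muller C-element and Toggle, each with its Petri net equivalent]
 60Logic Circuit Modelling
Level-driven elements and their Petri net equivalents
[Figure: NOT gate with places x=0, x=1, y=0, y=1 (x=1 and y=0 shown in parentheses); NAND gate with places x=0, x=1, y=0, y=1, z=0, z=1 (x=1, y=1 and z=0 shown in parentheses)]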
 61Event-driven circuit example
- Insert the eps file for fast fwd pipeline cell 
-  control
62Level-driven circuit example
- Insert the eps file for the example with 
-  two inverters and OR gate
63Properties analysed
- Functional correctness (need to model 
 environment)
- Deadlocks 
- Hazards 
- Timing constraints 
- Absolute (need for Time(d) Petri nets) 
- Relative (compose with a PN model of order 
 conditions)
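As an illustration of deadlock analysis, the following sketch explores the reachability graph of a small place/transition net and reports markings with no enabled transition. The net encoding (a dictionary mapping each transition to its pre- and post-sets), the function names and the example handshake net are assumptions made for this example; the search only terminates for bounded nets.

```python
from collections import deque


def enabled(marking, pre):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= n for p, n in pre.items())


def fire(marking, pre, post):
    """Consume tokens from the pre-set and produce tokens in the post-set."""
    m = dict(marking)
    for p, n in pre.items():
        m[p] -= n
    for p, n in post.items():
        m[p] = m.get(p, 0) + n
    return m


def find_deadlocks(net, initial):
    """Breadth-first search over reachable markings; returns dead markings."""
    key = lambda m: tuple(sorted((p, n) for p, n in m.items() if n))
    seen = {key(initial)}
    queue = deque([initial])
    deadlocks = []
    while queue:
        m = queue.popleft()
        successors = [fire(m, pre, post)
                      for pre, post in net.values() if enabled(m, pre)]
        if not successors:
            deadlocks.append(key(m))
        for s in successors:
            if key(s) not in seen:
                seen.add(key(s))
                queue.append(s)
    return deadlocks


# Hypothetical four-phase handshake cycle: req+ -> ack+ -> req- -> ack-.
cycle = {
    "req+": ({"p0": 1}, {"p1": 1}),
    "ack+": ({"p1": 1}, {"p2": 1}),
    "req-": ({"p2": 1}, {"p3": 1}),
    "ack-": ({"p3": 1}, {"p0": 1}),
}
# The same net with the environment's ack- missing.
broken = {t: arcs for t, arcs in cycle.items() if t != "ack-"}

print(find_deadlocks(cycle, {"p0": 1}))   # []
print(find_deadlocks(broken, {"p0": 1}))  # [(('p3', 1),)]
```

The second net shows why the environment has to be modelled: without the acknowledging side, the circuit model gets stuck after req-.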
64Adequacy of PN modelling
- Petri nets have events with atomic action 
 semantics
- Asynchronous circuits may exhibit behaviour that
 does not fit within this domain, due to inertia
[Figure: state graph over states 00, 10, 01, 11 with arcs labelled a and b]
 65Other modelling examples
- Examples with mixed event and level based 
 signalling
- Lazy token ring arbiter spec 
- RGD arbiter with mutex
66Lazy ring adaptor
[Figure: specification of the ring adaptor over signals Lr, La, Rr, Ra, R, G, D, dummy events (dum) and token variable t with values t=0 and t=1]
t=0 (token isn't initially here)
 67Lazy ring adaptor
[Figure: ring adaptor block with ports R, G, D, Lr, La, Rr, Ra, next to the same specification]
t=0->1->0 (token must be taken from the right and
passed to the left)
 68Lazy ring adaptor
[Figure: as on slide 67]
t=1 (token is already here)
 69Lazy ring adaptor
[Figure: as on slide 67]
t=0->1 (token must be taken from the right)
 70Lazy ring adaptor
[Figure: as on slide 67]
t=1 (token is here)
 71Tutorial Outline
- Introduction 
- Modeling Hardware with PNs 
- Synthesis of Circuits from PN specifications 
- Circuit verification with PNs 
- Performance analysis using PNs
72Synthesis.Outline
- Abstract synthesis of LPNs from transition 
 systems and characteristic trace specifications
- Handshake and signal refinement (LPN-to-STG) 
- Direct translation of LPNs and STGs to circuits 
-  Examples 
- Logic synthesis from STGs 
-  Examples
73Synthesis from trace specs
- Modelling behaviour in terms of characteristic
 predicates on traces (produce LPN snippets)
- Construction of LPNs as compositions of snippets
- Examples: n-place buffer, 2-way merge
74Synthesis from transition systems
- Modelling behaviour in terms of a sequential
 capture: transition system
- Synthesis of LPN (distributed and concurrent
 object) from TS (using theory of regions)
- Examples: one-place buffer, counterflow pp
75Synthesis from process-based languages
- Modelling behaviour in terms of process
 (-algebraic) specifications (CSP, ...)
- Synthesis of LPN (concurrent object with explicit
 causality) from process-based model (concurrency
 is explicit but causality implicit)
- Examples: modulo-N counter
76Refinement at the LPN level
- Examples of refinements, and introduction of 
 silent events
- Handshake refinement 
- Signalling protocol refinement (return-to-zero 
 versus NRZ)
- Arbitration refinement 
- Brief comment on what is implemented in Petrify
 and what isn't yet
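The difference between return-to-zero and NRZ signalling can be sketched by expanding abstract operations into signal transitions. The event names, the "~" notation for a toggle and the `refine` helper are assumptions for illustration only; they are not Petrify's input format.

```python
# Four-phase (return-to-zero): both wires go up and come back down.
RTZ = ["req+", "ack+", "req-", "ack-"]
# Two-phase (non-return-to-zero): each wire toggles once; "~" marks a toggle.
NRZ = ["req~", "ack~"]


def refine(abstract_events, protocol):
    """Expand each abstract operation into its handshake signal transitions."""
    return [f"{ev}.{sig}" for ev in abstract_events for sig in protocol]


print(refine(["fetch", "exec"], RTZ))
print(refine(["fetch", "exec"], NRZ))
```

With RTZ each abstract operation costs four signal transitions, with NRZ only two, at the price of circuits that must react to both edges.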
77Translation of LPNs to circuits
- Examples of refinements, and introduction of 
 silent events
78Why direct translation?
- Logic synthesis has problems with state space 
 explosion, repetitive and regular structures
 (log-based encoding approach)
- Direct translation has linear complexity but can 
 be area inefficient (inherent one-hot encoding)
- What about performance?
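The area trade-off between the two encodings can be made concrete with a rough count of memory elements. This is a back-of-the-envelope sketch that assumes one cell per state (or place) for one-hot encoding and a minimal binary code for the log-based approach; it ignores the next-state logic, which is where the log-based approach pays its complexity cost.

```python
from math import ceil, log2


def state_bits(n_states):
    """Return (one-hot cells, log-encoded bits) for n_states >= 1."""
    one_hot = n_states
    log_based = max(1, ceil(log2(n_states)))
    return one_hot, log_based


for n in (6, 80):
    print(n, state_bits(n))
```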
79Direct Translation of Petri Nets
- Previous work dates back to the '70s
- Synthesis into event-based (2-phase) circuits
 (similar to micropipeline control)
- S. Patil, F. Furtek (MIT)
- Synthesis into level-based (4-phase) circuits
 (similar to synthesis from one-hot encoded FSMs)
- R. David ('69, translation of FSM graphs to CUSA
 cells)
- L. Hollaar ('82, translation from parallel
 flowcharts)
- V. Varshavsky et al. ('90, '96, translation from
 PN into an interconnection of David Cells)
80Synthesis into event-based circuits
- Patil's translation method for simple PNs
- Furtek's extension to 1-safe nets
- Pragmatic extensions to Patil's set (for
 non-simple PNs)
- Examples: modulo-N counter, Lazy ring adapter
81Synthesis into level-based circuits
- David's method for FSMs
- Hollaar's extensions to parallel flow charts
- Varshavsky's method for 1-safe Petri nets
- Examples: counter, VME bus, butterfly circuit
82David's original approach
[Figure: fragment of a flow graph with states a, b, c, d and inputs x1, x2, and the CUSA for storing state b, with signals ya, yb, yc]
 83Hollaar's approach
[Figure: fragment of a flow-chart with nodes K, L, M, N around A and B, and the corresponding one-hot circuit cell; the annotated signal values change step by step over slides 83-85]
 86Varshavsky's Approach
[Figure: circuit cells for places p1 and p2 with a controlled operation (output labelled "To Operation"); the annotated signal values and their 0->1 and 1->0 transitions are animated over slides 86-88]
 89Translation in brief 
This method has been used for designing the control
of a token ring adaptor [Yakovlev et al., Async.
Design Methods, 1995]. The size of the control was
about 80 David Cells with 50 controlled
handshakes.
 90Direct translation examples
- In this work we tried direct translation:
- From an STG-refined specification (VME bus
 controller): worse than logic synthesis
- From a largish abstract specification with high
 degree of repetition (mod-6 counter):
 considerable gain over logic synthesis
- From a small concurrent specification with dense
 coding space (butterfly circuit): similar or
 better than logic synthesis
91Example 1: VME bus controller
Result of direct translation (DC unoptimised)
 92VME bus controller
After DC-optimisation (in the style of Varshavsky
et al., WODES'96)
 93David Cell library 
 94Data path control logic
Example of interface with a handshake control 
(DTACK, DSR/DSW) 
 95Example 2: Flat mod-6 Counter
- TE-like Specification:
-  ((p? q!)^5 p? c!)
- Petri net (5-safe)
[Figure: Petri net with transitions p?, q! and c!, annotated with the number 5 in two places]
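The input/output behaviour of the counter specification above can be sketched as follows. It assumes that the specification repeats forever, that p? is the input event, and that each p? is answered by q! five times and by c! on every sixth occurrence; the function name is hypothetical.

```python
def mod6_counter(n_inputs):
    """Outputs produced in response to n_inputs consecutive p? events."""
    outputs = []
    for i in range(1, n_inputs + 1):
        # Every sixth p? yields the carry c!, all others yield q!.
        outputs.append("c!" if i % 6 == 0 else "q!")
    return outputs


print(mod6_counter(12))
```

A flat direct translation keeps one cell per step of this cycle, which is why the regular, repetitive structure of the counter suits it well.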
 96Flat mod-6 Counter
Refined (by hand) and optimised (by Petrify) 
Petri net 
 97Flat mod-6 counter
Result of direct translation (optimised by hand) 
 98David Cells and Timed circuits
(a) Speed-independent
(b) With Relative Timing 
 99Flat mod-6 counter
(a) speed-independent
(b) with relative timing 
 100Butterfly circuit
[Figure: two STGs, labelled "Initial Specification" and "STG after CSC resolution", over signals a, b, x, y, z]
 101Butterfly circuit
Speed-independent logic synthesis solution 
 102Butterfly circuit 
Speed-independent DC-circuit 
 103Butterfly circuit 
DC-circuit with aggressive relative timing