Title: Is the die cast for the token game?


1
Is the die cast for the token game?
  • Alex Yakovlev, Frank Burns, Alex Bystrov, Delong
    Shang, Danil Sokolov
  • University of Newcastle upon Tyne
  • ICATPN'02, Adelaide

2
Casting dice for old and new token games
3
What is this talk about?
  • Firstly, about the role of Petri nets in the modern
    hardware design process (design flow), which is a
    gamble of its own
  • Secondly, about searching for the right way of
    deriving logic circuits (computational
    structures) from Petri nets (behavioural
    specifications)
  • However, I won't talk here about the use of Petri
    nets for circuit verification

4
Int. Technology Roadmap for Semiconductors says:
  • 2010 will bring a system-on-a-chip with
  • 4 billion 50-nanometer transistors, running at 10 GHz
  • Moore's law: steady growth at 60% per year in the
    number of transistors per chip, as the
    functionality of a chip doubles every 1.5-2
    years.
  • Technology troubles: process parameter variation,
    power dissipation (IBM S/390 chip operation, PICA
    video), clock distribution etc. present new
    challenges for Design and Test
  • But the biggest threat of all is design cost
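
The growth figure and the doubling period on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes the slide's "60" means 60% growth per year; the function names are illustrative only.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant yearly growth rate (0.60 means 60%)."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


def annual_growth_for_doubling(doubling_years: float) -> float:
    """Constant yearly growth rate that doubles a quantity in `doubling_years`."""
    return 2.0 ** (1.0 / doubling_years) - 1.0


# 60% per year corresponds to a doubling time of roughly one and a half years.
print(round(doubling_time_years(0.60), 2))
# Doubling every 1.5 to 2 years corresponds to roughly 59% down to 41% per year.
print(round(annual_growth_for_doubling(1.5), 3))
print(round(annual_growth_for_doubling(2.0), 3))
```

Under that assumption the 60% figure sits at the fast end of the quoted 1.5-2 year doubling range.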

5
Design productivity gap
From ITRS'99
A design team of 1000 working for 3 years on an
MPU chip would cost some $1B (25% of time spent on
verification, 45% on redesign after first
silicon)
6
Design costs and time to market
How to reduce them?
New design approaches to facilitate design
component re-use (IP cores), but there is a
problem of timing closure
New CAD methods to minimise costs of
verification, testing and re-design
8
Timing problems
[Figure: clock frequency (GHz) versus year, starting at 2000, with curves for local clock and global clock]
Global clock cannot cope with: fewer gate delays
per clock cycle; greater clock skew
Clocks have to be localised. The number of time
zones increases to 1000s and more
9
Self-timed Systems
  • Get rid of global clocking and build systems
    based on handshaking
  • Globally asynchronous locally synchronous (GALS)
  • Design the whole system in a self-timed way
  • Whichever way is followed, new CAD tools for
    self-timed design are needed

10
The Timing Mode Spectrum
Fully delay-insensitive
Speed-independent
Asynchronous (self-timed)
With relative timing and i/o mode
Burst-mode and fundamental mode
Globally asynchronous locally synchronous (GALS)
Multiple clock domains
Clock gating and distribution
Synchronous (globally/locally clocked)
Single clock
11
GALS module with stoppable clock
[Figure: async-to-sync wrapper between the asynchronous world and a clocked domain (registers R, combinational logic CL, local CLK), with handshake pairs Req1/Ack1 to Req4/Ack4]
12
GALS: an Example
[Figure: two synchronous units (Sync Unit 1, Sync Unit 2), each with its own clock generator (clk1, clk2), data ports In/Out, enables EnIn/EnOut and clock-control handshakes RCIn/ACIn and RCOut/ACOut, connected through an async interface with handshakes R1/A1 and R2/A2]
13
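GALS Petri net model
[Figure: Petri net model of the GALS example, with clock transitions of clk1, places MutexIn1 and MutexOut1, transitions labelled t, and the handshake signals R1, A1, R2, A2, RCIn1, ACIn1, RCOut1, ACOut1, RCOut2, ACOut2, ACIn2]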
14
Main talk outline
  • Motivation: design flow problems
  • Back-end language: Petri nets?
  • New design flow: two-level control
  • Direct mapping of PNs: event-based and
    level-based
  • Direct mapping of STGs
  • Case studies
  • Conclusion

15
Motivation
  • Complex self-timed controllers still cannot be
    designed fully automatically and provably correct
    (cf. work at Philips, Theseus Logic, Fulcrum,
    Self-Timed Solutions)
  • It is important to interface to HL hardware
    description languages, e.g. VHDL, Verilog
    (standard for digital design) and/or Tangram,
    Balsa (CSP-based)
  • Success (90s) of behavioural synthesis for sync
    design
  • Parts of architectural synthesis (CDFG
    extraction, scheduling and allocation) are
    similar to sync. design
  • Synthesis of RTL control/sequencer and its
    implementation should be completely new for
    asynchronous circuits
  • Need for a good intermediate (back-end) language

16
Motivation (cont'd)
  • Existing logic synthesis tools (cf. Petrify and
    Minimalist) can only cope with small-scale low
    level designs (state-space explosion, limited
    optimisation heuristics)
  • Logic synthesis produces circuits whose structure
    does not correspond to their behavioural structure
    (bad for analysis and testing)
  • Syntax-directed translation techniques may be a way
    forward, but applied at what level?

17
Motivation for use of Petri nets
  • Implications for new research targets on:
  • Translation between HDLs and Petri nets,
    particularly formal underpinning of semantical
    links between front-end and back-end formats
  • New composition and decomposition techniques
    (incl. various forms of refinement and
    transformation) applied to labelled PNs
  • New circuit mapping and optimisation techniques
    for different types of models (under various
    delay-dependence or relative time assumptions and
    different signalling schemes)
  • Combination of direct mapping with logic
    synthesis (e.g. circuits with peep-hole
    optimisation)

18
Main talk outline
  • Motivation: design flow problems
  • Back-end language: Petri nets?
  • New design flow: two-level control
  • Direct mapping of PNs: event-based and
    level-based
  • Direct mapping of STGs
  • Case studies
  • Conclusion

19
Intermediate language
  • What is the most adequate formal language for the
    intermediate (still behavioural) level?
  • You don't need one at all - directly map syntax
    into circuit structure (Design flow 1)
  • Petri nets, at the level of Signal Transition
    Graph (STG), and then use logic synthesis (Design
    flow 2)

20
Design Flow 1 (e.g. Tangram or Balsa (currently))
HDL
Syntax-directed compilation
Handshake circuit netlist
Direct mapping with Burst Mode FSM peephole optimisation
QDI circuit netlist
21
HDL syntax-directed mapping
do
  if (X=A) then
    par
      OP1
      OP2
    rap
  else
    seq
      OP3
      OP4
    qes
  fi
od

Control flow is transferred between HDL syntax
constructs rather than between operations
22
Pros and cons of Flow 1
  • Pros
  • Simple linear-size translation, guarantees high
    productivity
  • Allows local optimisation and re-synthesis of
    parts
  • Testing can be programmed at high-level
  • Cons
  • Lack of global optimisation
  • Circuit structure follows the parsing tree of the
    specification - this leads to low performance

23
Design Flow 2 (STG logic synthesis)
STG specification
Analysis and optimisation (consistency, CSC,
relative timing) Extras (e.g. refining to FC
subclass for structural methods)
Synthesisable STG
Logic synthesis (via full State Space or
structural methods)
QDI circuit netlist
24
Logic synthesis (STGs Petrify)
State graph
STG spec
States with state coding problem
Total no. of states is 24 but only 16 binary codes
25
Logic synthesis (STGs Petrify)
EQN file for model decoupled-latch Estimated
area 16.00 Rout Aout' Rout csc2 Ain
csc0 csc0 Aout' csc1' csc2' Rin csc0
csc1 csc1 (csc0 Rin') Rout csc2
csc1' (csc2 csc0) Set/reset pins
reset(Rout) set(csc1)
Output from Petrify
csc0, csc1, csc2 state encoding signals
26
Logic synthesis (STGs Petrify)
Resulting state graph (with csc signals) has 59
states and no coding conflicts (coding space is
2^7 = 128)
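
The state counts quoted on these slides can be checked against the size of the binary code space. The sketch below is only a counting argument: it computes how many codes n signals provide, and a lower bound on the number of signals if every state needed a distinct code (a stronger requirement than complete state coding itself). The function names are illustrative.

```python
def code_space(num_signals: int) -> int:
    """Number of distinct binary codes available with `num_signals` signals."""
    return 2 ** num_signals


def min_signals_for_unique_codes(num_states: int) -> int:
    """Smallest n with 2**n >= num_states (lower bound for unique state codes)."""
    return max(1, (num_states - 1).bit_length())


# 24 states cannot all receive distinct codes from 16 available codes.
print(code_space(4), min_signals_for_unique_codes(24))
# After adding csc0, csc1 and csc2 the code space is 2^7 = 128 for 59 states.
print(code_space(7), min_signals_for_unique_codes(59))
```

The count only shows that the original signals cannot separate all states; which extra signals to insert, and where, is the hard part that the synthesis tool solves.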
27
Logic synthesis (STGs Petrify)
What if the system gets bigger?
28
Logic synthesis (STGs Petrify)
EQN file for model decoupled-latch.1-2
Estimated area 34.00 R1out A1out' R1out
csc2 csc3' R2out R2out A2out' csc4
Ain csc0 csc0 A1out' A2out' csc1 csc2
csc3 csc4' csc0 (csc1 csc4' csc3 Rin)
csc1 R2out' (csc0' Rin csc1) csc2
R1out' csc2 csc3 csc3 csc0' (R1out' Rin
csc2' csc3) csc4 csc1 (csc4 csc0)
Set/reset pins reset(R1out) reset(R2out)
reset(csc1) reset(csc2) reset(csc3)
Logic is asymmetric
Delay grows out of proportion
29
Pros and cons of Flow 2
  • Pros
  • Guarantees global optimality
  • Allows HDL-to-STG translation for a more
    pragmatic front-end (e.g. Blunno and Lavagno's
    Verilog-to-STG translation) and allows
    model-checking together with synthesis (so makes
    the design provably correct)
  • Cons
  • State space size is a problem
  • Solving state-coding in a good way is a problem

30
Main talk outline
  • Motivation: design flow problems
  • Back-end language: Petri nets?
  • New design flow: two-level control
  • Direct mapping of PNs: event-based and
    level-based
  • Direct mapping of STGs
  • Case studies
  • Conclusion

31
Towards new design flow
  • How to combine advantages of both approaches?
  • Use them at different levels
  • Introduce intermediate behavioural level -
    labelled Petri nets (LPNs)
  • Perform semantical (based on execution order)
    translation of HDLs to LPNs
  • Use direct mapping for large LPNs
  • Decompose control and use STGs and logic
    synthesis at the low level (apply structural
    methods, e.g. Pastor et al.)

32
New Design Flow
HDL: standard (VHDL, Verilog) or async-design
specific (Balsa)
CDFG and LPN compilation (semantic)
LPNs
Verification (coherence etc.); optimisation
(scheduling, dummies, fanin/fanout)
Synthesisable LPN
Direct mapping
DC netlist
33
New Design Flow: possible sources of useful
translation techniques
  • HDL to PN translation
  • VHDL to Extended Timed PNs (Linköping)
  • VHDL to Control Data Flow Graphs (Lyngby)
  • Verilog to PN/STGs (Torino)
  • B(PN^2) to M-net translation (PEP tool)
  • But none of them caters for the good PN structure
    needed for direct mapping from PNs to circuits
    (they mostly work via state space exploration, esp.
    in model-checking)

34
Design flow
HDL specification
Control/data splitting
Hierarchical control spec
Datapath spec
STG
LPN
STG to circuit synthesis (Petrify direct
mapping)
LPN to circuit synthesis (direct mapping)
Data logic synthesis
Data logic
Hierarchical control logic
Our present focus
Control/data interfacing
HDL implementation
35
HDL syntax-directed mapping
do
  if (X=A) then
    par
      OP1
      OP2
    rap
  else
    seq
      OP3
      OP4
    qes
  fi
od

Control flow is transferred between HDL syntax
constructs rather than between operations
36
HDL-to-LPN (high-level control)
do
  if (X=A) then
    par
      OP1
      OP2
    rap
  else
    seq
      OP3
      OP4
    qes
  fi
od

High-level control: Labelled Petri net (LPN)
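
One way to picture the step from the program text to a labelled Petri net is a small compositional translator. The sketch below is a hypothetical scheme written for illustration, not the authors' tool: the tuple-based syntax tree, the function `build_lpn` and the place/transition naming are all assumptions. Each construct is translated between an entry place and an exit place, and par inserts dummy ("dum") fork and join transitions.

```python
def build_lpn(ast):
    """Translate a tiny syntax tree into (places, transitions, start place).

    transitions maps a name to (label, preset, postset). The outer do ... od
    loop is modelled by using the same place as entry and exit.
    """
    places = []
    transitions = {}

    def new_place():
        name = f"p{len(places)}"
        places.append(name)
        return name

    def add_transition(label, pre, post):
        transitions[f"t{len(transitions)}"] = (label, tuple(pre), tuple(post))

    def translate(node, entry, exit_):
        kind = node[0]
        if kind == "op":                      # single operation
            add_transition(node[1], [entry], [exit_])
        elif kind == "seq":                   # chain children through new places
            children = node[1:]
            current = entry
            for i, child in enumerate(children):
                nxt = exit_ if i == len(children) - 1 else new_place()
                translate(child, current, nxt)
                current = nxt
        elif kind == "par":                   # dummy fork and join around branches
            starts, ends = [], []
            for child in node[1:]:
                s, e = new_place(), new_place()
                translate(child, s, e)
                starts.append(s)
                ends.append(e)
            add_transition("dum", [entry], starts)
            add_transition("dum", ends, [exit_])
        elif kind == "if":                    # two guarded transitions in conflict
            _, cond, then_branch, else_branch = node
            a, b = new_place(), new_place()
            add_transition(f"({cond})", [entry], [a])
            add_transition(f"(not {cond})", [entry], [b])
            translate(then_branch, a, exit_)
            translate(else_branch, b, exit_)
        else:
            raise ValueError(f"unknown construct: {kind}")

    start = new_place()
    translate(ast, start, start)
    return places, transitions, start


program = ("if", "X=A",
           ("par", ("op", "OP1"), ("op", "OP2")),
           ("seq", ("op", "OP3"), ("op", "OP4")))
places, transitions, start = build_lpn(program)
for name, (label, pre, post) in transitions.items():
    print(name, label, pre, "->", post)
```

The net grows linearly with the program text, and the dummy transitions it introduces are the kind of silent events whose placement the later slides on performance discuss.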
37
Labelled PNs and Datapath
  • An LPN is defined as (PN, OP, L): an underlying PN
    (P, T, F, M0), an operation alphabet OP and a labelling
    function L: T -> OP
  • Operations (typically assignments, comparisons,
    calls to macros such as arbiters) in OP are
    defined as signatures on the elements of the datapath
    (e.g. lists of input/output registers R and
    operation units U involved in the operation),
    e.g. op(i) = <R,U>
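
The definition can be made concrete with a small data structure. This is a minimal sketch, assuming places and transitions are plain strings with disjoint names; the class `LPN` and its methods are illustrative and do not come from any tool mentioned in the talk. Firing a transition plays the token game and returns the operation the transition is labelled with.

```python
from dataclasses import dataclass


@dataclass
class LPN:
    places: set          # P
    transitions: set     # T
    arcs: set            # F: pairs (place, transition) or (transition, place)
    marking: dict        # M0: place -> number of tokens
    label: dict          # L: transition -> operation name from OP

    def preset(self, t):
        return {x for (x, y) in self.arcs if y == t and x in self.places}

    def postset(self, t):
        return {y for (x, y) in self.arcs if x == t and y in self.places}

    def enabled(self, t):
        return all(self.marking.get(p, 0) >= 1 for p in self.preset(t))

    def fire(self, t):
        """Move tokens across t and return the operation it is labelled with."""
        if not self.enabled(t):
            raise ValueError(f"{t} is not enabled")
        for p in self.preset(t):
            self.marking[p] -= 1
        for p in self.postset(t):
            self.marking[p] = self.marking.get(p, 0) + 1
        return self.label[t]


# Two operations executed in sequence, in a loop.
net = LPN(
    places={"p0", "p1"},
    transitions={"t3", "t4"},
    arcs={("p0", "t3"), ("t3", "p1"), ("p1", "t4"), ("t4", "p0")},
    marking={"p0": 1, "p1": 0},
    label={"t3": "OP3", "t4": "OP4"},
)
print(net.fire("t3"), net.fire("t4"), net.marking)
```

In the flow described here, each returned operation name would stand for a handshake with the datapath elements listed in the operation's signature.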

38
Labelled PNs and Datapath
  • Operations in OP are associated with (req, ack)
    handshakes (two-way for assignments, or multi-way for
    comparisons and arbitration); hence the
    opr(i) and opa(i) signals
  • Interface with actual req and ack signals
    associated with registers in R and op-units in U
    is either synthesized via Petrify (low-level
    control) or hardwired using MUXes and DEMUXes

39
Low-level control
Low-level control: Signal Transition Graphs (STG)
40
Direct mapping of LPN to David cells
[Figure: high-level control logic directly mapped from the LPN, using David cells DC1 to DC5 for the guards (X=A) and (X<>A), the operations OP1 to OP4 and the dummy (dum) transitions; inset: basic David cell (DC)]
41
Direct mapping cell library
42
Main talk outline
  • Motivation: design flow problems
  • Back-end language: Petri nets?
  • New design flow: two-level control
  • Direct mapping of PNs: event-based and
    level-based
  • Direct mapping of STGs
  • Case studies
  • Conclusion

43
Direct mapping vs logic synthesis: conceptual
difference
  • Logic synthesis uses a Petri net (STG) as a
    generator of an encoded state-space. The circuit
    structure is not directly related to the net
    structure (though some correspondence exists and
    is exploited in structural logic synthesis
    methods, Pastor et al.)
  • Direct mapping considers a PN literally, as a
    prototype of the circuit structure (cf.
    Varshavsky's use of the term "modelling circuit")

44
Direct mapping vs logic synthesis
  • Direct mapping has linear computational
    complexity but can be area inefficient (inherent
    one-hot encoding)
  • Logic synthesis has problems with state space
    explosion, and with recognition of repetitive and
    regular structures (log-based encoding approach)

45
Direct Translation of Petri Nets
  • Previous work dates back to the 70s
  • Synthesis into event-based (two-phase) circuits
  • S. Patil, F. Furtek (MIT)
  • Synthesis into level-based (four-phase) circuits
  • R. David ('69, translation of FSM graphs to CUSA
    cells)
  • L. Hollaar ('82, translation from parallel
    flowcharts)
  • V. Varshavsky et al. ('90, '96, translation from
    PN into an interconnection of David cells)
  • See various examples of synthesis in both styles
    in Yakovlev and Koelmans (Petri net lectures, LNCS,
    1998)

46
Patil's set of modules
Circuit equivalent            Petri net fragment
wire                          place
inverter                      marked place
C-element (C)                 join
XOR                           merge
fork                          fan-out
effectively an RGD arbiter    shared (conflict) place
S                             switch (s)
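
The two modules that carry most of the two-phase behaviour, the C-element for a join and the XOR for a merge, are easy to model at the signal level. This is a behavioural sketch only (no delays, no hazards); the class and function names are illustrative.

```python
class CElement:
    """Muller C-element: the output follows the inputs only when they agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out          # otherwise the previous output is held


def xor_merge(a, b):
    """Two-phase merge: a transition on either input toggles the output."""
    return a ^ b


c = CElement()
# Join: the output changes only after both inputs have changed.
print([c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]])
# Merge: each single input transition produces an output transition.
print([xor_merge(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1)]])
```

In a two-phase (NRZ) protocol every signal transition is an event, which is why a state-holding C-element behaves like a Petri net join and a plain XOR behaves like a merge.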
47
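Example
Two-phase implementation (using Patil's elements)
[Figure: one-place buffer Buf(1) with a P(ut) handshake (pr, pa) and a G(et) handshake (gr, ga), one passive and one active, implemented with a C-element; waveforms of pr, gr, pa and ga under the two-phase (NRZ) protocol]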
50
Other useful elements
Select
Call
Toggle
51
Direct synthesis example (modulo-k Up-Down counter)
[Figure: Mod-k counter LPN and environment LPN]
52
Direct synthesis example (modulo-k Up-Down counter)
[Figure: decomposition (structural view)]
53
Direct synthesis example (modulo-k Up-Down counter)
[Figure: structure and LPN]
54
Direct synthesis example (modulo-k Up-Down counter)
[Figure: structure and LPN]
55
Direct synthesis example (modulo-k Up-Down counter)
56
Synthesis into level-based circuits
  • David's method for asynchronous Finite State
    Machines
  • Hollaar's extensions to parallel flow charts
  • Varshavsky's method for 1-safe persistent Petri
    nets, based on associating places with latches;
    the method works for closed (autonomous) circuits
    with no input choice or arbitration, and inputs can
    only be part of handshakes activated by control
    logic

57
David's original approach
[Figure: fragment of a State Machine flow graph (states a, b, c, d with input conditions x1, x2) and the CUSA element for storing state b (signals ya, yb, yc)]
58
Hollaar's approach
[Figure: fragment of a flow-chart (allows parallelism) with signals K, L, M, N, A, B, and the corresponding one-hot circuit cell; signal values shown in parentheses]
59
Hollaar's approach
[Figure: the same fragment of flow-chart and one-hot circuit cell, with updated signal values]
60
Hollaar's approach
[Figure: the same fragment of flow-chart and one-hot circuit cell, with updated signal values]
61
Varshavsky's Approach
[Figure: circuit stage between places p1 and p2 with a controlled operation ("To Operation"); signal values shown in parentheses]
62
Varshavsky's Approach
[Figure: the same circuit, annotated with the signal transitions 0->1 and 1->0]
63
Varshavsky's Approach
[Figure: the same circuit, annotated with the signal transitions 1->0, 0->1 and 1->0->1]
64
Varshavskys Approach
  • This method associates places with latches
    (flip-flops), so the state memory (marking) of the
    PN is directly mimicked in the circuit's state
    memory
  • Transitions are associated with controlled
    actions (e.g. activations of data path units or
    lower level control blocks by using handshake
    protocols)
  • Modelling discrepancy (be careful!):
  • in Petri nets, removal of a token from pre-places
    and adding tokens to post-places is instantaneous
    (i.e. no intermediate states)
  • in circuits, the move of a token has a duration
    and there is an intermediate state
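
The discrepancy can be shown by comparing the markings visited in the two views. The sketch below assumes, for illustration, that the circuit sets the latches of the post-places first and resets the latches of the pre-places afterwards; the exact order in a real David cell implementation depends on the cell design, and the function names are hypothetical.

```python
def pn_fire(marking, pre, post):
    """Petri net view: one atomic step, no intermediate marking."""
    return [frozenset((marking - pre) | post)]


def circuit_move(marking, pre, post):
    """Latch-level view (assumed order): set post-place latches, then reset
    pre-place latches, so an intermediate state becomes observable."""
    intermediate = frozenset(marking | post)
    final = frozenset(intermediate - pre)
    return [intermediate, final]


m0 = frozenset({"p1"})
print(pn_fire(m0, {"p1"}, {"p2"}))        # only the marking {p2}
print(circuit_move(m0, {"p1"}, {"p2"}))   # {p1, p2} first, then {p2}
```

The intermediate state in which both latches hold a token has no counterpart in the net, so any logic that reads the latches has to tolerate it.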

65
Direct mapping of LPNs and STGs
66
Fast David cell
Fast DC Timing assumptions GasP section
The same with negative gates
67
Implementability condition for LPNs
  • Autonomous control interpretation: each
    transition is associated with a handshake to the
    controlled part (datapath) or is a dummy
  • Implementability: any 1-safe labelled PN with
    autonomous control semantics of transitions and with
    no loops of fewer than three transitions can be
    directly mapped into a speed-independent control
    circuit whose behaviour is equivalent (bisimilar)
    to the PN
  • Consistency of labelling: transitions labelled by
    reference to the same datapath blocks must be
    consistent with the local semantics of those
    blocks (e.g. must not be mutually concurrent)
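
The two structural parts of the implementability condition, 1-safeness and the absence of loops with fewer than three transitions, can be checked mechanically on small nets. This is a minimal sketch: the net encoding (dictionaries from transition names to sets of places) and the function names are assumptions, and the 1-safeness check explores the full reachability set, so it is only meant for small examples.

```python
from collections import deque


def is_one_safe(pre, post, m0):
    """Explore reachable markings (sets of marked places); report False as
    soon as a firing would add a token to an already marked place."""
    start = frozenset(m0)
    seen = {start}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for t in pre:
            if pre[t] <= m:                    # t is enabled
                rest = m - pre[t]
                if rest & post[t]:
                    return False
                nxt = frozenset(rest | post[t])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return True


def has_short_loop(pre, post):
    """True if some loop of the net contains only one or two transitions."""
    for t1 in pre:
        if pre[t1] & post[t1]:                 # self-loop on a place
            return True
        for t2 in pre:
            if t1 != t2 and post[t1] & pre[t2] and post[t2] & pre[t1]:
                return True
    return False


ring_pre = {"t1": {"p1"}, "t2": {"p2"}, "t3": {"p3"}}
ring_post = {"t1": {"p2"}, "t2": {"p3"}, "t3": {"p1"}}
print(is_one_safe(ring_pre, ring_post, {"p1"}), has_short_loop(ring_pre, ring_post))
```

A three-transition ring passes both checks, whereas a two-transition ring is flagged by `has_short_loop`.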

68
Main talk outline
  • Motivation: design flow problems
  • Back-end language: Petri nets?
  • New design flow: two-level control
  • Direct mapping of PNs: event-based and
    level-based
  • Direct mapping of STGs
  • Case studies
  • Conclusion

69
Direct mapping of STGs
STG specification
Mapped circuit
Rout
Aout
Here all signal transitions are associated with
handshakes and handshake compression must be done
before mapping
70
What about direct mapping of arbitrary STGs?
[Figure: STG fragment with transitions of inp1, inp2, out1 and out2]
  • Associate with each output transition a latch
    (one per signal x), with each input some sampling
    logic, and a set (for x+) or reset (for x-)
    handshake: pull for inputs, and push for outputs.

[Figure: push handshake that sets and resets the output latch out1 in response to inp1]
71
What about direct mapping of arbitrary STGs?
[Figure: STG fragment with transitions of inp1, inp2, out1 and out2]
Problem: long delay between input event and
output response
[Figure: pull handshake sampling inp2 (input sample) through mux and demux logic, followed by a push handshake to the out1 output latch]
72
What about direct mapping of arbitrary STGs?
  • Another problem for direct mapping: STGs may
    contain self-loops (or read arcs) for testing
    level-oriented inputs and outputs

[Figure: STG fragment with transitions of x and y, where the level of x is tested]
73
Low latency approach
  • Can we connect inputs directly to the control
    structure to minimise the i/o latency?

out1
inp2
74
The problem of mapping STGs
  • Given: a 1-safe STG
  • Target: netlist of David cells, input wires and
    output flip-flops
  • Procedure: use direct mapping of elements of the
    underlying PN into elements of the netlist
  • Problem: need for an intermediate form of STG, where
    I/O is connected to control by read arcs only

75
Device environment interface
76
Device environment interface
Input wire
Output latch
tracker
To derive circuit implementation we only use
tracker and i/o subnets
77
Direct mapping
78
Optimisation
Removing places from the tracker. Latency
reduction effect if the place between an input
and the following output is removed. Coding
conflicts are possible. Places perform state
separation.
Tracker
Tracker
79
Optimisation: coding conflicts
Input signal a changes twice between p1 and
p5. Keeping p3 solves the conflict and preserves
low latency.
80
Irreducible input coding conflicts
  • Certain input labellings cannot be implemented in
    a speed-independent way without timing
    assumptions (e.g. input changes are slower than
    David cell operation) or without changing the I/O
    interface (introducing new output responses to the
    environment)

inp0
inp0
Inseparable states (for the tracker)
81
Implementability of STGs
  • Sufficient condition:
  • an STG with a 1-safe underlying PN, with
    consistent signal transition labelling
    (transitions of the same signal are in precedence
    and +/- alternate) and monotonic input bursts
    (for each connected input-labelled subgraph each
    signal changes only once)
  • A necessary and sufficient (NS) condition is an open problem!
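
Both parts of the sufficient condition can be illustrated on firing sequences. The sketch below checks sign alternation along one trace and monotonicity of one input burst; it is a trace-level illustration under assumed encodings (events written as strings such as "a+" and "a-"), not a structural check of the STG itself.

```python
def alternates(trace, initial):
    """True if, for every signal, + and - transitions alternate, starting
    from the given initial values (signal name -> 0 or 1)."""
    value = dict(initial)
    for event in trace:
        signal, sign = event[:-1], event[-1]
        new_value = 1 if sign == "+" else 0
        if value[signal] == new_value:
            return False             # e.g. a+ while a is already 1
        value[signal] = new_value
    return True


def is_monotonic_burst(burst):
    """True if every signal changes at most once within the burst."""
    signals = [event[:-1] for event in burst]
    return len(signals) == len(set(signals))


print(alternates(["a+", "b+", "a-", "b-"], {"a": 0, "b": 0}))
print(alternates(["a+", "a+"], {"a": 0}))
print(is_monotonic_burst(["a+", "b+"]), is_monotonic_burst(["a+", "a-"]))
```

A burst such as a+ followed by a- is rejected, which corresponds to the case of an input signal changing twice without the tracker observing the intermediate value.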

82
Main talk outline
  • Motivation: design flow problems
  • Back-end language: Petri nets?
  • New design flow: two-level control
  • Direct mapping of PNs: event-based and
    level-based
  • Direct mapping of STGs
  • Case studies
  • Conclusion

83
Communication channel example
  • A duplex delay-insensitive channel for low power
    and pin-efficiency proposed by Steve Furber
    (AINT2002)
  • Relatively simple data path (with handshake
    access via push and pull protocols)
  • Sophisticated control (involves arbitration,
    choice and concurrency)
  • Natural two-level control decomposition
  • Requires low latency (existing STG and BM
    solutions produce logic that is too heavy)

84
Channel Structure
N-of-M code
Master
Slave
N-of-M code
N-of-M codes: dual-rail, 3-of-6, 2-of-7
Key protocol symbols (e.g. in dual rail): Start
(01), Ack (10), Slave-Ack (11), Data (01 or 10)
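
The capacity of the codes named on this slide follows from counting the words with exactly N ones among M wires. The sketch below enumerates them, treating dual-rail as a 1-of-2 code (an assumption about the slide's terminology); note that the Slave-Ack symbol 11 listed above lies outside that data code. The function name is illustrative.

```python
from itertools import combinations
from math import comb


def n_of_m_codewords(n, m):
    """All m-bit words with exactly n ones, as strings."""
    words = []
    for ones in combinations(range(m), n):
        words.append("".join("1" if i in ones else "0" for i in range(m)))
    return words


for n, m in [(1, 2), (3, 6), (2, 7)]:
    words = n_of_m_codewords(n, m)
    # The enumeration agrees with the binomial coefficient C(m, n).
    print(f"{n}-of-{m}: {len(words)} codewords (C({m},{n}) = {comb(m, n)})")
```

Because every codeword has the same number of ones, a receiver can detect that a complete symbol has arrived without a clock, which is what makes such codes suitable for delay-insensitive channels.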
85
Protocol Specification
Protocol Automaton
Master
Slave
The protocol can be defined on an imaginary
Protocol Automaton receiving symbols from both
sides (it will hide all activity internal to
Master and Slave)
86
Protocol Specification
Protocol Automaton
Master
Slave
87
Controller Overview
Data path and low level control
High Level control
push
push
push
pull
pull
88
Low-level logic
Tx controller
Sending interface
89
LPN model for high level control (master)
Calls to local arbiters
Slave-Ack pull
pulls
Three-way pushes
pushes
Three-way pulls
dummies inserted for direct DC mapping
90
High level control (master) mapped directly from
LPN
[Figure: David cell netlist of the master's high-level control, with dummies, push and pull handshakes, arbiter1 and arbiter2]
91
Towards synthesis for higher performance
[Figure: cycle of push, dummy and pull transitions]
Is the dummy in the right place? It is on the
cycle of (output) push and (input)
pull: pull -> dummy -> push -> pull -> dummy -> push ->
92
Towards synthesis for higher performance
[Figure: critical and non-critical paths through the push, dummy and pull transitions]
Synthesis rule: don't insert dummies on critical
paths
93
Synthesis for lower I/O latency: LPN level
[Figure: high-level control (pull, push, pull and internal actions) above the pull and push logic that connects inputs and outputs to the environment (channel), with a low latency shortcut marked]
94
Channel Cycle Time
Controller implementation     Simplex mode   Duplex mode
Direct mapping from LPN       7.6 ns         8.3 ns
Logic synthesis from STG      12.7 ns        16.5 ns
  • These results were obtained for 0.6 micron CMOS
  • Further improvement can be achieved by more use
    of low latency techniques (at the gate level) and
    introducing aggressive relative timing, in David
    cells and low level logic
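
The cycle times reported above translate into speed-up factors as follows. The numbers are taken from the slide; the dictionary layout and function name are only illustrative.

```python
cycle_time_ns = {
    "direct mapping from LPN": {"simplex": 7.6, "duplex": 8.3},
    "logic synthesis from STG": {"simplex": 12.7, "duplex": 16.5},
}


def speedup(mode):
    """How many times shorter the direct-mapping cycle is than the STG one."""
    direct = cycle_time_ns["direct mapping from LPN"][mode]
    synthesised = cycle_time_ns["logic synthesis from STG"][mode]
    return synthesised / direct


for mode in ("simplex", "duplex"):
    print(mode, round(speedup(mode), 2))
```

The directly mapped controller comes out roughly 1.7 times faster in simplex mode and about 2 times faster in duplex mode.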

95
Case study: VME bus controller
96
Case study: VME bus controller
97
Case study: VME bus controller
98
Case study: VME bus controller
  • Circuit generated by logic synthesis (Petrify)
  • Smaller, though comparable in size
  • Transistor stacks are larger

99
Case study: VME bus controller
Latency comparison between our method and the Petrify
solution.
Transition               Petrify   Fast DC
ldtack -> d              0.35 ns   0.29 ns
ldtack -> d-             0.20 ns   0.16 ns
d -> dtack               0.27 ns   0.27 ns
dsw- -> dtack-           0.42 ns   0.44 ns
ldtack- -> lds (rd)      0.38 ns   0.21 ns
ldtack -> lds (wr)       0.38 ns   0.29 ns
dsw- -> lds-             0.33 ns   0.26 ns
Number of transistors    32        56
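
The table can be summarised with a short script. The latencies and transistor counts are copied from the slide (transition names as they appear there); the summary statistics are simple derived quantities, not results reported by the authors.

```python
# (transition, Petrify latency in ns, Fast DC latency in ns)
latency_ns = [
    ("ldtack -> d", 0.35, 0.29),
    ("ldtack -> d-", 0.20, 0.16),
    ("d -> dtack", 0.27, 0.27),
    ("dsw- -> dtack-", 0.42, 0.44),
    ("ldtack- -> lds (rd)", 0.38, 0.21),
    ("ldtack -> lds (wr)", 0.38, 0.29),
    ("dsw- -> lds-", 0.33, 0.26),
]
transistors = {"Petrify": 32, "Fast DC": 56}

faster = sum(1 for _, petrify, fast_dc in latency_ns if fast_dc < petrify)
equal = sum(1 for _, petrify, fast_dc in latency_ns if fast_dc == petrify)
slower = sum(1 for _, petrify, fast_dc in latency_ns if fast_dc > petrify)
total_petrify = sum(petrify for _, petrify, _ in latency_ns)
total_fast_dc = sum(fast_dc for _, _, fast_dc in latency_ns)
area_ratio = transistors["Fast DC"] / transistors["Petrify"]

print(faster, equal, slower)
print(round(total_petrify, 2), round(total_fast_dc, 2), area_ratio)
```

The Fast DC circuit is faster on five of the seven transitions, equal on one and slower on one, at the price of 1.75 times as many transistors.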
100
Conclusion
  • Hierarchical (e.g. protocol) controller synthesis
    can go via back-end LPN/STG models
  • Direct mapping from LPNs/STGs yields fast
    circuits that are easy to analyse and test
  • Translation from PNs to David cell netlists is
    implemented in the tool pn2dc
  • Translation from VHDL specs to LPNs and STGs is
    implemented in the tools fsm2lpn and fsm2stg
  • Further work is needed on:
  • Formal link between HDLs and PNs (semantics and
    equivalence), leading to better synthesis of PNs
    from HDLs
  • Optimisation techniques at LPN/STG and circuit
    levels
  • See our papers in Async'02 and ISCAS'02

101
Open problems
  • Formally characterise properties of PNs that
    make them good for circuit design, like
    optimality wrt I/O response time, worst/average
    case cycle time, positions of silent (dummy)
    events
  • Control (place/transition nets) and datapath kept
    separate, versus use of high-level nets for both
  • Testing via Petri net specification (faults in
    PNs: stuck tokens, transitions)