Title: Quasi-static Scheduling for Reactive Systems
1. Quasi-static Scheduling for Reactive Systems
- Jordi Cortadella, Universitat Politècnica de Catalunya, Spain
- Alex Kondratyev, Cadence Berkeley Labs, USA
- Luciano Lavagno, Politecnico di Torino, Italy
- Claudio Passerone, Politecnico di Torino, Italy
- Yosinori Watanabe, Cadence Berkeley Labs, USA
- Joint work with Robert Clarisó, Alex Kondratyev, Luciano Lavagno, Claudio Passerone and Yosinori Watanabe (UPC, Cadence Berkeley Labs, Politecnico di Torino)
2. Outline
- The problem
- Synthesis of concurrent specifications
- Previous work: Dataflow networks
- Static scheduling of SDF networks
- Quasi-Static Scheduling of process networks
- Petri net representation of process networks
- Scheduling and code generation
- Open problems
3. Embedded Software Synthesis
- Specification: concurrent functional netlist (Kahn processes, dataflow actors, SDL processes, ...)
- Software implementation: (smaller) set of concurrent software tasks
- Two sub-problems
- Generate code for each task
- Schedule tasks dynamically
- Goals
- minimize real-time scheduling overhead
- maximize effectiveness of compilation
4. Environmental controller
[Figure: process network. TSENSOR feeds TEMP FILTER, which writes TDATA; HSENSOR feeds HUMIDITY FILTER, which writes HDATA; CONTROLLER reads TDATA and HDATA and drives AC-on, DRYER-on and ALARM-on.]
5. Environmental controller
TEMP-FILTER
    float sample, last;
    last = 0;
    forever {
        sample = READ(TSENSOR);
        if (sample - last > DIF) {
            last = sample;
            WRITE(TDATA, sample);
        }
    }
6. Environmental controller
HUMIDITY-FILTER
    float h, max;
    forever {
        h = READ(HSENSOR);
        if (h > MAX)
            WRITE(HDATA, h);
    }
7. Environmental controller
CONTROLLER
    float tdata, hdata;
    forever {
        select (TDATA, HDATA) {
        case TDATA:
            tdata = READ(TDATA);
            if (tdata > TFIRE)
                WRITE(ALARM-on, 10);
            else if (tdata > TMAX)
                WRITE(AC-on, tdata - TMAX);
        case HDATA:
            hdata = READ(HDATA);
            if (hdata > HMAX)
                WRITE(DRYER-on, 5);
        }
    }
8. Operating system
[Timeline figure with three columns: Environ., Processes, OS]
- Tsensor: T-FILTER wakes up, executes, sleeps
- Hsensor: H-FILTER wakes up, executes (sends data to HDATA), sleeps
- CONTROLLER wakes up, executes (reads data from HDATA)
- ...
9. Operating system
- Goal: improve performance
- Reduce operating system overhead
- Reduce communication overhead
- How? Do as much as possible statically
- Scheduling
- Compiler optimizations
10. Outline
- The problem
- Synthesis of concurrent specifications
- Previous work: Dataflow networks
- Static scheduling of SDF networks
- Quasi-Static Scheduling of process networks
- Petri net representation of process networks
- Scheduling and code generation
- Open problems
11. A bit of history
- Kahn process networks (74): formal model
- Karp computation graphs (66): seminal work
- Dennis Dataflow networks (75): programming language for MIT DF machine
- Lee's Static Data Flow networks (86): efficient static scheduling
- Several recent implementations (Ptolemy, Khoros, Grape, SPW, COSSAP, SystemStudio, DSPStation, Simulink, ...)
12. Intuitive semantics
- (Often stateless) actors perform computation
- Unbounded FIFOs perform communication via sequences of tokens carrying values
- (matrix of) integer, float, fixed point
- image of pixels, ...
- Determinacy
- unique output sequences given unique input sequences
- Sufficient condition: blocking read
- (process cannot test input queues for emptiness)
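13. Intuitive semantics
- Example: FIR filter
- single input sequence i(n)
- single output sequence o(n)
- $o(n) = c_1\, i(n) + c_2\, i(n-1)$
[Figure: dataflow graph. Input i feeds a multiply-by-c1 actor directly and a multiply-by-c2 actor through a delay holding the initial token i(-1); their results are combined into output o.]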
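The FIR equation can be read as an actor that consumes one token and produces one token per firing. The following is a minimal sketch under that reading; the function name `fir`, the coefficient values and the choice of 0 for the initial token i(-1) are illustrative assumptions, not taken from the slides.

```python
def fir(tokens, c1, c2, initial=0):
    """Two-tap FIR actor: o(n) = c1*i(n) + c2*i(n-1), with i(-1) = initial."""
    previous = initial                       # delay element preloaded with i(-1)
    for current in tokens:                   # each firing consumes one input token
        yield c1 * current + c2 * previous   # and produces one output token
        previous = current


# Hypothetical input sequence and coefficients, chosen only for illustration.
outputs = list(fir([1, 2, 3], c1=2, c2=1))
```

Because the actor only reads its input in order and never tests it for emptiness, the output sequence depends only on the input sequence, which is the determinacy property stated on the previous slide.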
14. Examples of Dataflow actors
- SDF (Static Dataflow): fixed number of input and output tokens
- BDF (Boolean Dataflow): control token determines number of consumed and produced tokens
[Figure: an SDF actor with rate 1 on each port, and the BDF select and merge actors with their T and F ports]
15. Static scheduling of DF
- Key property of DF networks: output sequences do not depend on firing sequence of actors
- SDF networks can be statically scheduled at compile-time
- execute an actor when it is known to be fireable
- no overhead due to sequencing of concurrency
- static buffer sizing
- Different schedules yield different
- code size
- buffer size
- pipeline utilization
16. Balance equations
- Number of produced tokens must equal number of consumed tokens on every edge
- Repetitions (or firing) vector $v_S$ of schedule S: number of firings of each actor in S
- $v_S(A)\, n_p = v_S(B)\, n_c$
- must be satisfied for each edge
[Figure: edge from actor A, producing $n_p$ tokens per firing, to actor B, consuming $n_c$ tokens per firing]
17. Balance equations
[Figure: SDF graph with actors A, B and C and edges A→B, B→C and A→C, annotated with token rates]
- Balance for each edge
- $3\, v_S(A) - v_S(B) = 0$
- $v_S(B) - v_S(C) = 0$
- $2\, v_S(A) - v_S(C) = 0$
18. Balance equations
- $M\, v_S = 0$
- iff S is periodic
- Full rank (as in this case)
- no non-zero solution
- no periodic schedule
- (too many tokens accumulate on A→B or B→C)
19. Balance equations
- Non-full rank
- infinite solutions exist (linear space of dimension 1)
- Any multiple of $q = [1\ 2\ 2]^T$ satisfies the balance equations
- ABCBC and ABBCC are minimal valid schedules
- ABABBCBCCC is a non-minimal valid schedule
20. Static SDF scheduling
- Main SDF scheduling theorem (Lee 86)
- A connected SDF graph with n actors has a periodic schedule iff its topology matrix M has rank n-1
- If M has rank n-1 then there exists a unique smallest positive integer solution q to
- $M\, q = 0$
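A small sketch can make the theorem concrete. Instead of computing the rank of M, it propagates firing ratios along the edges of a connected graph, which detects the same inconsistency. The function name `repetition_vector` and the edge format (producer, consumer, n_p, n_c) are assumptions for illustration. The first graph uses the rates implied by the balance equations 3 v_S(A) = v_S(B), v_S(B) = v_S(C), 2 v_S(A) = v_S(C); the rates of the second graph are hypothetical, chosen only so that the result is q = [1 2 2]^T.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Smallest positive integer solution of the balance equations.

    edges: list of (producer, consumer, n_p, n_c).
    Returns a dict actor -> firings, or None if only the zero solution exists.
    """
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        actor = pending.pop()
        for src, dst, n_p, n_c in edges:
            if src == actor:      # rate[src] * n_p == rate[dst] * n_c
                other, value = dst, rate[actor] * n_p / n_c
            elif dst == actor:
                other, value = src, rate[actor] * n_c / n_p
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                return None       # two edges disagree: inconsistent rates
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer one.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    firings = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, firings.values())
    return {a: n // common for a, n in firings.items()}


inconsistent = [("A", "B", 3, 1), ("B", "C", 1, 1), ("A", "C", 2, 1)]
consistent = [("A", "B", 2, 1), ("B", "C", 1, 1), ("A", "C", 2, 1)]  # assumed rates
q_bad = repetition_vector(["A", "B", "C"], inconsistent)
q_good = repetition_vector(["A", "B", "C"], consistent)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the consistency test is not affected by rounding.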
21. From repetition vector to schedule
- Repeatedly schedule fireable actors up to number of times in repetition vector
- $q = [1\ 2\ 2]^T$
- Can find either ABCBC or ABBCC
- If deadlock before original state, no valid schedule exists (Lee 86)
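The construction above can be sketched as a token simulation. This is an illustrative sketch, not the authors' tool: the function name `build_schedule`, the `order` parameter (which fireable actor is tried first) and the edge rates are assumptions, with the rates chosen so that q = [1 2 2]^T is the repetition vector.

```python
def build_schedule(q, edges, order=None, initial_tokens=None):
    """Fire fireable actors until each has fired q[actor] times.

    q: dict actor -> number of firings per period.
    edges: list of (producer, consumer, n_p, n_c).
    initial_tokens: optional dict edge index -> tokens present at the start.
    Returns the schedule as a string, or None on deadlock.
    """
    order = list(order or q)
    tokens = [0] * len(edges)
    for index, count in (initial_tokens or {}).items():
        tokens[index] = count
    remaining = dict(q)
    schedule = []

    def fireable(actor):
        if remaining[actor] == 0:
            return False
        return all(tokens[i] >= n_c
                   for i, (_, dst, _, n_c) in enumerate(edges) if dst == actor)

    while any(remaining.values()):
        actor = next((a for a in order if fireable(a)), None)
        if actor is None:
            return None                      # deadlock before the period ends
        for i, (src, dst, n_p, n_c) in enumerate(edges):
            if dst == actor:
                tokens[i] -= n_c             # consume inputs
            if src == actor:
                tokens[i] += n_p             # produce outputs
        remaining[actor] -= 1
        schedule.append(actor)
    return "".join(schedule)


edges = [("A", "B", 2, 1), ("B", "C", 1, 1), ("A", "C", 2, 1)]  # assumed rates
q = {"A": 1, "B": 2, "C": 2}
eager_b = build_schedule(q, edges)               # tries A, then B, then C
eager_c = build_schedule(q, edges, order="CBA")  # tries C first
```

Changing only the priority among fireable actors yields the two minimal schedules named on the slide, which illustrates that the repetition vector fixes how often each actor fires but not the order.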
22. Compilation optimization
- Assumption: code stitching
- (chaining custom code for each actor)
- More efficient than C compiler for DSP
- Comparable to hand-coding in some cases
- Explicit parallelism, no artificial control dependencies
- Main problem: memory and processor/FU allocation depends on scheduling, and vice-versa
23. Code size minimization
- Assumptions (based on DSP architecture)
- subroutine calls expensive
- fixed iteration loops are cheap
- (zero-overhead loops)
- Global optimum: single appearance schedule
- e.g. ABCBC → A (2BC), ABBCC → A (2B) (2C)
- may or may not exist for an SDF graph
- buffer minimization relative to single appearance schedules
- (Bhattacharyya 94, Lauwereins 96, Murthy 97)
24. Buffer size minimization
- Assumption: no buffer sharing
- Example
[Figure: SDF graph of the example, with actors A, B, C and D]
- $q = [100\ 100\ 10\ 1]^T$
- Valid SAS: (100 A) (100 B) (10 C) D
- requires 210 units of buffer area
- Better (factored) SAS: (10 (10 A) (10 B) C) D
- requires 30 units of buffer area, but
- requires 21 loop initiations per period (instead of 3)
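The trade-off can be checked by simulating both looped schedules. The sketch below is illustrative: the graph figure is not available, so the chain A→B→C→D with rates 1:1, 1:10 and 1:10 is an assumption chosen to be consistent with q = [100 100 10 1]^T, and the nested `(count, body)` schedule encoding and function names are hypothetical.

```python
def flatten(schedule):
    """Expand a looped schedule into its flat firing sequence."""
    for item in schedule:
        if isinstance(item, str):
            yield item
        else:
            count, body = item
            for _ in range(count):
                yield from flatten(body)


def buffer_requirement(schedule, edges):
    """Sum over all edges of the peak token count (no buffer sharing)."""
    tokens = [0] * len(edges)
    peak = [0] * len(edges)
    for actor in flatten(schedule):
        for i, (src, dst, n_p, n_c) in enumerate(edges):
            if dst == actor:
                tokens[i] -= n_c
                assert tokens[i] >= 0, "actor fired before its inputs were ready"
            if src == actor:
                tokens[i] += n_p
                peak[i] = max(peak[i], tokens[i])
    return sum(peak)


def loop_initiations(schedule):
    """Number of times a loop is entered during one period."""
    total = 0
    for item in schedule:
        if not isinstance(item, str):
            count, body = item
            total += 1 + count * loop_initiations(body)
    return total


# Assumed chain: A -(1:1)-> B -(1:10)-> C -(1:10)-> D
chain = [("A", "B", 1, 1), ("B", "C", 1, 10), ("C", "D", 1, 10)]
flat_sas = [(100, ["A"]), (100, ["B"]), (10, ["C"]), "D"]
factored_sas = [(10, [(10, ["A"]), (10, ["B"]), "C"]), "D"]
```

Under these assumed rates the simulation reproduces the figures on the slide: the factored schedule trades a smaller buffer total for more loop initiations.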
25. Scheduling more powerful DF
- SDF is limited in modeling power
- More general DF is too powerful
- non-Static DF is Turing-complete (Buck 93)
- bounded-memory scheduling is not always possible
- Boolean Data Flow: Quasi-Static Scheduling of special patterns
- if-then-else, repeat-until, do-while
- Dynamic Data Flow: run-time scheduling
- may run out of memory or deadlock at run time
- Kahn Process Networks: quasi-static scheduling using Petri nets
- conservative: a schedulable network may be declared unschedulable
26. Outline
- The problem
- Synthesis of concurrent specifications
- Compiler optimizations across processes
- Previous work: Dataflow networks
- Static scheduling of SDF networks
- Code and data size optimization
- Quasi-Static Scheduling of process networks
- Petri net representation of process networks
- Scheduling and code generation
- Open problems
27. Quasi-Static Scheduling
- Sequentialize concurrent operations as much as possible
- less communication overhead (run-time task generation)
- better starting point for compilation (straight-line code from function blocks)
- Must handle
- data-dependent control
- multi-rate communication
28. The problem
- Given: a network of Kahn processes
- Kahn process: sequential function + ports
- communication: port-based, point-to-point, uni-directional, multi-rate
- Find: a single task
- functionally equivalent to the original network (modulo concurrency)
- driven by input stimuli (no OS intervention)
29. The scheduling procedure
- 1. Specify a network of processes
- process: C + communication operations
- netlist: connection between ports
- 2. Translate to the computational model: Petri nets
- 3. Find a schedule on the Petri net
- 4. Translate the schedule to a task
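30. TEMP-FILTER
TEMP-FILTER
    float sample, last;
    last = 0;
    while (1) {
        sample = READ(TSENSOR);
        if (sample - last > DIF) {
            last = sample;
            WRITE(TDATA, sample);
        }
    }
[Figure: Petri net of TEMP FILTER. After the transition "last = 0", the transition "sample = READ(TSENSOR)" consumes a token from port TSENSOR; a choice then takes either the F branch or the T branch "last = sample; WRITE(TDATA, sample)", which produces a token on port TDATA.]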
31. Petri nets for Kahn process networks
- Sequential processes (1 token per process)
- Input/Output ports (communication with the environment)
- Channels (point-to-point communication between processes)
32. Petri nets for Kahn process networks
[Figure: choice places with conflicting True and False transitions]
- Data-dependent choices
- Conservative assumption (any outcome is possible)
33. Scheduling game
- Adversary: data choices, inputs
- Scheduler: the rest of transitions
[Figure: Petri net with transitions t1 to t6 and the tree of moves played on it]
34. Scheduling game
[Figure: same game, next step; one move is still open (?)]
35. Scheduling game
[Figure: same game, next step; one move is still open (?)]
36. Schedule generation
- Schedule is a subset of the reachability graph (RG)
- Finite
- Sequential
- Live with respect to source transitions
- All transitions of an FCS are fired in a state (FCS: always conflicting transitions)
- Depth-first traversal with backtracking
37. Schedule generation
- Await states
38. Handling infinity
- A PN with source transitions has an infinite reachability space
- Need for termination conditions during traversal
- Irrelevance criterion
- Impose place bounds by the structure of the PN.
- Identify irrelevant nodes in the reachability tree.
- If the algorithm hits an irrelevant node, backtrack.
- Bounds the reachability space!
39. Irrelevance criterion
- bound of place = max of ... (example: $\max(3 + 4 - 1,\ 1) = 6$)
- v is an irrelevant node iff there is a node u such that
- 1. v succeeds u,
- 2. $\forall p,\ M(u, p) \le M(v, p)$,
- 3. $\forall p$, if $M(u, p) < M(v, p)$, then $M(u, p) \ge$ the bound of p.
- Condition 2: v is at least as capable as u. Condition 3: u already hits the bounds.
- Irrelevance is more than marking, it is marking + history!
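The three conditions translate directly into a check against the markings on the path leading to v. This is a minimal sketch under stated assumptions: markings are represented as dicts from place name to token count, `ancestors` holds the markings that v succeeds, the place bounds are supplied by the caller, and the names `is_irrelevant`, `p` and `r` are hypothetical.

```python
def is_irrelevant(ancestors, v, bound):
    """Return True if marking v is irrelevant with respect to some ancestor u.

    ancestors: markings of the nodes that v succeeds (condition 1).
    v, and each u in ancestors: dict place -> number of tokens.
    bound: dict place -> bound of that place.
    """
    for u in ancestors:
        # Condition 2: v has at least as many tokens as u in every place.
        covers = all(u[p] <= v[p] for p in bound)
        # Condition 3: wherever v has strictly more, u had already reached the bound.
        saturated = all(u[p] >= bound[p] for p in bound if u[p] < v[p])
        if covers and saturated:
            return True
    return False


bound = {"p": 6, "r": 1}                      # hypothetical place bounds
at_bound = [{"p": 6, "r": 0}]
below_bound = [{"p": 3, "r": 0}]
grown = {"p": 7, "r": 0}
```

The `ancestors` argument is what makes the test depend on history: the same marking v can be irrelevant on one path of the reachability tree and relevant on another.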
40. Quality of irrelevance criterion
- Heuristic for general Petri nets
[Figure: example net with a node marked irrelevant]
- For unique-choice and/or free-choice PNs, irrelevance may be exact (if so, schedulability is decidable in this class)
- Open issue
41. Properties of the Algorithm
- Claim 1
- If the algorithm terminates successfully, a schedule is obtained.
- Claim 2
- If the algorithm does NOT terminate successfully, no schedule exists under the given termination conditions.
- Semi-decision procedure!
42. Divide and conquer
43. Divide and conquer
44. Checking SSS independence
- Marking equations: consumption of tokens
- SSS independence: necessary and sufficient condition
- $M_0(p) + \mathrm{worst\_change}(p,a) + \mathrm{SSS\_change}(p,a) \ge 0$
- Worst consumption of p in SSS(a)
- Worst consumption of p in other SSSs
- Complexity of checking: O(? SSS)
- Composition has an exponentially larger number of states!
45. Code generation
[Figure: schedule graph with an initialization step, await states, choice nodes with T and F branches, and arcs triggered by the inputs I1 and I2]
- Generated code
- ISRs driven by input stimuli (I1 and I2)
- Each task contains threads from one await state to another await state
46. Code generation
[Figure: same schedule graph, next step]
47. Code generation
[Figure: same schedule graph annotated with the code fragments Init and C1 to C11]
48. Code generation
enum state {S1, S2, S3} S;    /* current await state */
[Figure: schedule graph annotated with the code fragments C0 to C11, the inputs I1 and I2, and the T/F choice on C8]
49. Code generation
void Init(void) {             /* runs at reset */
    C0();
    S = S1;
    return;
}
50. Code generation
void ISR1(void) {             /* runs when input I1 arrives */
    switch (S) {
    case S1: C1(); C2(); S = S2; return;
    case S2: C3(); C2(); return;
    case S3: C6(); C7(); C11(); C5(); return;
    }
}
51. Code generation
void ISR2(void) {             /* runs when input I2 arrives */
    switch (S) {
    case S1: C4(); C5(); S = S3; break;
    case S2: C10(); C11(); C5(); S = S3; return;
    case S3:
        if (C8()) { C7(); C11(); C5(); return; }
        else      { C9(); S = S1; return; }
    }
}
52. Code generation
[Complete generated code: the state declaration followed by Init, ISR1 and ISR2 as listed on slides 48 to 51]
53. Code generation
- Reset: Init()
- I1: ISR1()
- I2: ISR2()
54. Experimental Results
[Figure: MPEG-2 decoder processes Thdr, Tvld, TdecMV, Tpredict, Tisiq, Tidct and Tadd, before and after QSS]
- QSS applied to a subset of the MPEG-2 Decoder
- (5 processes out of the original 11)
55. The MPEG2 decoder
- Performance increased by 45%
- reduction of communication (no internal FIFOs between statically scheduled processes)
- reduction of run-time scheduling (OS)
- no reduction in computation
56. Open problems
- Is a system schedulable? (decidability)
- False paths in concurrent systems (data dependencies)
- Synthesis for concurrent architectures
- Timing models
57. (Quasi) Static Scheduling approaches
- Lee et al. 86 (Static Data Flow): cannot specify data-dependent control
- Buck et al. 94 (Boolean Data Flow): undecidable schedulability check, heuristic pattern-based algorithm
- Thoen et al. 99 (Event graph): no schedulability check, no task minimization
- Lin 97 (Safe Petri Net): no schedulability check, single-rate, reachability-based algorithm
- Thiele et al. 99 (Bounded Petri Net): partial schedulability check, reachability-based algorithm
- Cortadella et al. 00 (General Petri Net): maybe undecidable schedulability check, balance equation-based algorithm
58. The false path problem (example)
Process P1
    while (true) {
        a = rnd();
        Write(ct, a, 1);
        if (a > 0.5)
            Write(dt, d, 2);
        else
            Write(dt, d, 1);
    }
Process P2
    while (true) {
        Read(ct, a, 1);
        if (a > 0.5)
            Read(dt, d, 2);
        else
            Read(dt, d, 1);
    }
[Figure: P1 connected to P2 through channels ct and dt]
If P1 does Write(dt, d, 2), then P2 never does Read(dt, d, 1), i.e. this path is false!
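A short enumeration shows why this matters for scheduling. The sketch below is illustrative only: it counts the net change of tokens on channel dt after one iteration of P1 and P2, first when both processes take the same branch (the real behavior, since both test the same value a) and then when the two choices are resolved independently, as under a conservative assumption where any outcome is possible. The name `dt_change` is hypothetical.

```python
from itertools import product


def dt_change(p1_true, p2_true):
    """Net change of tokens on dt after one iteration of P1 and P2."""
    written = 2 if p1_true else 1   # Write(dt, d, 2) or Write(dt, d, 1)
    read = 2 if p2_true else 1      # Read(dt, d, 2) or Read(dt, d, 1)
    return written - read


# Real behavior: both branches agree, so dt always returns to empty.
correlated = [dt_change(b, b) for b in (True, False)]

# Conservative model: the two choices are treated as independent.
independent = [dt_change(b1, b2)
               for b1, b2 in product((True, False), repeat=2)]
```

In the independent model one combination leaves an extra token on dt in every iteration and another asks P2 to read a token that was never written. Neither combination can occur in the real system, but a scheduler that must account for them cannot bound the channel.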
59. False path elimination
- The designer manually tags sets of correlated conditions
- An implicit data-dependent correlation becomes explicit communication and control-dependent synchronization
- the tool automatically adds synchronization channels to model correlation
- Scheduling is then possible
- synchronization channels can be deleted after code generation (no overhead in final implementation)
60. The false path problem (example)
[Figure: processes P1 and P2 of slide 58, connected through channels ct and dt]
61. False path elimination algorithm
Process P1
    #pragma tag sync
    if (cond1) {
        Write(syncT, d, 1);
        stm1;
    } else {
        Write(syncF, d, 1);
        stm2;
    }
Process P2
    #pragma tag sync
    (void) (cond2);
    switch (Select(syncT, syncF)) {
    case 0: Read(syncT, d, 1); stm3; break;
    case 1: Read(syncF, d, 1); stm4; break;
    }
[Figure: P1 and P2 connected through the added ports]
62. False path elimination algorithm
- 1. For each correlated control pair, add two ports <tag>T and <tag>F to processes P1 and P2 and connect them.
- 2. Add Write statements at the beginning of both branches of the if-then-else, writing on the created ports in process P1.
- 3. Delete the if-then-else from process P2 and add a switch on the output of a Select statement on the created ports.
- 4. Fill in the case clauses with the appropriate code from the branches in process P2, reading data from the created ports.
- 5. Finally, apply QSS and eliminate the added synchronization.
63. False path elimination algorithm
[Figure: processes P1 and P2 of slide 61, connected through syncT and syncF, are merged by QSS into a single process P]
Process P after QSS
    (void) (cond2);
    if (cond1) {
        Write(syncT, P1_d, 1); Read(syncT, P2_d, 1);
        stm1; stm3;
    } else {
        Write(syncF, P1_d, 1); Read(syncF, P2_d, 1);
        stm2; stm4;
    }
Process P after eliminating the added synchronization
    (void) (cond2);
    if (cond1) {
        stm1; stm3;
    } else {
        stm2; stm4;
    }
64. Conclusions
- QSS shows significant gains in real examples
- Current theory has several open problems
- Future extensions are
QSS