Data Flow Models
1
Data Flow Models
  • ECE 253 Embedded System Design
  • Ryan Kastner
  • February 5, 2007

2
Philosophy of Dataflow Languages
  • Drastically different way of looking at
    computation
  • Von Neumann imperative language style: program
    counter is king
  • Dataflow language: movement of data is the priority
  • Scheduling is the responsibility of the system, not the
    programmer

3
Dataflow Language Model
  • Processes communicating through FIFO buffers

[Diagram: Processes 1, 2, and 3 connected by FIFO buffers]
4
Dataflow Languages
  • Every process runs simultaneously
  • Processes can be described with imperative code
  • Compute, compute, receive, compute, transmit, ...
  • Processes can only communicate through buffers

5
Dataflow Communication
  • Communication is only through buffers
  • Buffers usually treated as unbounded for
    flexibility
  • Sequence of tokens read guaranteed to be the same
    as the sequence of tokens written
  • Destructive read: reading a value from a buffer
    removes the value
  • Much more predictable than shared memory
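These buffer semantics can be sketched with Python's standard `queue.Queue` (a sketch of mine, not part of the original slides): a blocking `get()` behaves like a destructive FIFO read, and the read order is guaranteed to match the write order.

```python
import threading
from queue import Queue

buf = Queue()            # unbounded FIFO channel
received = []

def consumer():
    for _ in range(3):
        received.append(buf.get())   # blocks until a token arrives,
                                     # then removes it (destructive read)

t = threading.Thread(target=consumer)
t.start()
for token in [10, 20, 30]:
    buf.put(token)       # non-blocking write (buffer is unbounded)
t.join()
print(received)          # [10, 20, 30] - read order matches write order
```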

6
Dataflow Languages
  • Once proposed for general-purpose programming
  • Fundamentally concurrent: should map more easily
    to parallel hardware
  • A few lunatics built general-purpose dataflow
    computers based on this idea
  • Largely a failure: memory spaces are anathema to the
    dataflow formalism

7
Applications of Dataflow
  • Not a good fit for, say, a word processor
  • Good for signal-processing applications
  • Anything that deals with a continuous stream of
    data
  • Becomes easy to parallelize
  • Buffers typically used for signal processing
    applications anyway

8
Applications of Dataflow
  • Perfect fit for block-diagram specifications
  • Circuit diagrams
  • Linear/nonlinear control systems
  • Signal processing
  • All of these suggest dataflow semantics
  • Common in Electrical Engineering
  • Processes are blocks, connections are buffers

9
Kahn Process Networks
[Diagram: network of processes communicating via Send() and Wait()]
  • Proposed by Kahn in 1974 as a general-purpose
    scheme for parallel programming
  • Laid the theoretical foundation for dataflow
  • Unique attribute: deterministic
  • Difficult to schedule
  • Too flexible to make efficient, not flexible
    enough for a wide class of applications
  • Never put to widespread use

10
Kahn Process Networks
  • Key idea
  • Reading an empty channel blocks until data is
    available
  • No other mechanism for sampling a communication
    channel's contents
  • Can't check whether a buffer is empty
  • Can't wait on multiple channels at once

11
Kahn Processes
  • A C-like function (Kahn used Algol)
  • Arguments include FIFO channels
  • Language augmented with send() and wait()
    operations that write and read from channels

12
A Kahn Process
  • From Kahn's original 1974 paper:

    process f(in int u, in int v, out int w)
    {
      int i; bool b = true;
      for (;;) {
        i = b ? wait(u) : wait(v);
        printf("%i\n", i);
        send(i, w);
        b = !b;
      }
    }

[Diagram: f with inputs u, v and output w]
What does this do?
Process alternately reads from u and v, prints
the data value, and writes it to w
14
A Kahn Process
  • From Kahn's original 1974 paper:

    process g(in int u, out int v, out int w)
    {
      int i; bool b = true;
      for (;;) {
        i = wait(u);
        if (b) send(i, v); else send(i, w);
        b = !b;
      }
    }

[Diagram: g with input u and outputs v, w]
What does this do?
Process reads from u and alternately copies it to
v and w
15
A Kahn Process
  • From Kahn's original 1974 paper:

    process h(in int u, out int v, int init)
    {
      int i = init;
      send(i, v);
      for (;;) {
        i = wait(u);
        send(i, v);
      }
    }

[Diagram: h with input u and output v]
What does this do?
Process sends initial value, then passes through
values.
16
A Kahn System
  • What does this do?

Answer: prints an alternating sequence of 0s and 1s

[Diagram: h (init 0) and h (init 1) feed f's two inputs; f's
output drives g, whose two outputs feed back through the h
processes. Each h emits its initial value, then copies input to
output]
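As a sketch of mine (not from the slides), the f/g/h network can be simulated with Python threads and unbounded queues; the channel wiring below is my reading of the diagram, with wait() modeled as a blocking `get()` and send() as `put()`. Because reads block and nothing else depends on timing, the output is deterministic regardless of how the threads are scheduled.

```python
import threading
from queue import Queue

def f(u, v, w, out, limit):
    """Alternately read u and v, record the value, forward it to w."""
    b = True
    for _ in range(limit):
        i = (u if b else v).get()   # wait(): blocking, destructive read
        out.append(i)               # stands in for printf
        w.put(i)                    # send()
        b = not b

def g(u, v, w):
    """Read u; alternately copy the value to v and w."""
    b = True
    while True:
        i = u.get()
        (v if b else w).put(i)
        b = not b

def h(u, v, init):
    """Emit an initial value, then pass values through."""
    v.put(init)
    while True:
        v.put(u.get())

# Channels (unbounded FIFOs): h0 -> f.u, h1 -> f.v, f -> g, g -> h0/h1
fu, fv, fg, gh0, gh1 = (Queue() for _ in range(5))
printed = []

for target, args in [(h, (gh0, fu, 0)),
                     (h, (gh1, fv, 1)),
                     (g, (fg, gh0, gh1))]:
    threading.Thread(target=target, args=args, daemon=True).start()

f(fu, fv, fg, printed, limit=8)   # run f on the main thread
print(printed)                    # [0, 1, 0, 1, 0, 1, 0, 1]
```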
17
Determinacy
[Diagram: process F maps input sequence x1, x2, x3, ... to output
sequence y1, y2, y3, ...]
  • Process: a continuous mapping of input sequences
    to output sequences
  • Continuity: a process uses a prefix of its input
    sequences to produce a prefix of its output sequences.
    Adding more tokens does not change the tokens
    already produced
  • The state of each process depends on token values
    rather than their arrival times
  • Unbounded FIFOs: the speed of the two processes
    does not affect the sequence of data values

18
Proof of Determinism
  • Because a process can't check the contents of
    buffers, only read from them, each process sees only
    the sequence of data values coming in on its buffers
  • Behavior of a process:
  • Compute, read, compute, write, read, compute, ...
  • Values written depend only on program state
  • Computation depends only on program state
  • Reads always return a sequence of data values,
    nothing more

19
Determinism
  • Another way to see it
  • If I'm a process, I am only affected by the
    sequence of tokens on my inputs
  • I can't tell whether they arrive early, late, or
    in what order
  • I will behave the same in any case
  • Thus, the sequence of tokens I put on my outputs
    is the same regardless of the timing of the
    tokens on my inputs

20
Adding Nondeterminism
  • Allow processes to test for emptiness
  • Allow processes themselves to be nondeterminate
  • Allow more than one process to read from a
    channel
  • Allow more than one process to write to a channel
  • Allow processes to share a variable

21
Scheduling Kahn Networks
  • Challenge is running processes without
    accumulating tokens

[Diagram: processes A and B feeding process C]
22
Scheduling Kahn Networks
  • Challenge is running processes without
    accumulating tokens

[Diagram: A and B always emit tokens; C only consumes tokens
from A, so tokens accumulate in the B->C buffer]
23
Demand-driven Scheduling?
  • Apparent solution only run a process whose
    outputs are being actively solicited
  • However...

[Diagram: processes A, B, C, D; one pair always produces tokens,
the other always consumes them]
24
Other Difficult Systems
  • Not all systems can be scheduled without token
    accumulation

[Diagram: one process produces two a's for every b; the
downstream process alternates between receiving one a and one b]
25
Tom Parks' Algorithm
  • Schedules a Kahn Process Network in bounded
    memory if it is possible
  • Start with bounded buffers
  • Use any scheduling technique that avoids buffer
    overflow
  • If the system deadlocks because a buffer is full,
    increase the size of the smallest buffer and continue
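A simplified, SDF-style sketch of Parks' idea (the function name and the two-actor example are mine, not from the slides): run with bounded buffers, firing any actor that has both enough input tokens and enough output space; when only full buffers block progress, grow the smallest one and continue.

```python
def parks_run(actors, edges, caps, steps=20):
    """Run an SDF-style network with bounded buffers; on artificial
    deadlock, grow the smallest buffer that is blocking a write."""
    tokens = [0] * len(edges)
    trace = []
    for _ in range(steps):
        fired = None
        for a in actors:
            ins_ok = all(tokens[i] >= nc
                         for i, (s, d, p, nc) in enumerate(edges) if d == a)
            outs_ok = all(tokens[i] + p <= caps[i]
                          for i, (s, d, p, nc) in enumerate(edges) if s == a)
            if ins_ok and outs_ok:
                fired = a
                break
        if fired is None:
            # Artificial deadlock: some actor has its inputs but no
            # room to write. Grow the smallest such buffer.
            blocked = [i for i, (s, d, p, nc) in enumerate(edges)
                       if tokens[i] + p > caps[i]
                       and all(tokens[j] >= ncj
                               for j, (sj, dj, pj, ncj) in enumerate(edges)
                               if dj == s)]
            if not blocked:
                return trace, caps        # true deadlock: give up
            caps[min(blocked, key=lambda i: caps[i])] += 1
            continue
        for i, (s, d, p, nc) in enumerate(edges):
            if d == fired:
                tokens[i] -= nc
            if s == fired:
                tokens[i] += p
        trace.append(fired)
    return trace, caps

# A produces 2 tokens per firing; B consumes 3 per firing.
# Starting from capacity 1, the buffer grows until the system runs.
trace, caps = parks_run(['A', 'B'], [('A', 'B', 2, 3)], caps=[1])
print(caps)    # [4]
```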

26
Parks' Algorithm in Action
  • Start with buffers of size 1
  • Run A, B, C, D

[Diagram: A, B, C, D with size-1 buffers; C only consumes tokens
from A; buffer occupancies cycle 0-1-0, 0-1, 0-1-0]
27
Parks Algorithm in Action
  • B blocked waiting for space in the B->C buffer
  • Run A, then C
  • System will run indefinitely

[Diagram: same network; C only consumes tokens from A; one buffer
cycles 0-1-0, the blocked B->C buffer holds 1 token, another sits
at 0]
28
Parks' Scheduling Algorithm
  • Neat trick
  • Whether a Kahn network can execute in bounded
    memory is undecidable
  • Parks' algorithm does not violate this
  • It will run in bounded memory if possible, and
    use unbounded memory if necessary

29
Using Parks' Scheduling Algorithm
  • It works, but
  • Requires dynamic memory allocation
  • Does not guarantee minimum memory usage
  • Scheduling choices may affect memory usage
  • Data-dependent decisions may affect memory usage
  • Relatively costly scheduling technique
  • Detecting deadlock may be difficult

30
Kahn Process Networks
  • Their beauty is that the scheduling algorithm
    does not affect their functional behavior
  • Difficult to schedule because of need to balance
    relative process rates
  • System inherently gives the scheduler few hints
    about appropriate rates
  • Parks' algorithm is expensive and fussy to implement
  • Might be appropriate for coarse-grain systems
  • Scheduling overhead dwarfed by process behavior

31
Synchronous Dataflow (SDF)
  • Edward Lee and David Messerschmitt, Berkeley,
    1987
  • Restriction of Kahn networks to allow
    compile-time scheduling
  • Basic idea: each process reads and writes a fixed
    number of tokens each time it fires:

    loop
      read 3 A, 5 B, 1 C
      compute
      write 2 D, 1 E, 7 F
    end loop

32
SDF and Signal Processing
  • Restriction natural for multirate signal
    processing
  • Typical signal-processing processes
  • Unit-rate
  • Adders, multipliers
  • Upsamplers (1 in, n out)
  • Downsamplers (n in, 1 out)

33
Asynchronous Message Passing vs. Synchronous Dataflow (SDF)
  • Asynchronous message passing: tasks do not have
    to wait until output is accepted.
  • Synchronous dataflow: all tokens are consumed
    at the same time.

The SDF model allows static scheduling of token production and
consumption. In the general case, buffers may be needed at the
edges.
34
Multi-rate SDF System
  • DAT-to-CD rate converter
  • Converts a 44.1 kHz sampling rate to 48 kHz

[Diagram: chain of upsamplers and downsamplers with rate pairs
1:1, 2:3, 2:7, 8:7, 5:1 - overall 160/147 = 48/44.1]
35
Delays
  • Kahn processes often have an initialization phase
  • SDF doesn't allow this because the rates would not
    stay constant
  • Alternative: an SDF system may start with tokens
    in its buffers
  • These behave like delays (in the signal-processing
    sense)
  • Delays are sometimes necessary to avoid deadlock

36
Example SDF System
  • FIR filter (all single-rate)

[Diagram: tapped delay line; "dup" actors duplicate the input,
one-cycle delays separate the taps, "c" actors multiply by
constant filter coefficients, and adders sum the products]
37
SDF Scheduling
  • Schedule can be determined completely before the
    system runs
  • Two steps
  • 1. Establish relative execution rates by solving
    a system of linear equations
  • 2. Determine periodic schedule by simulating
    system for a single round

38
SDF Scheduling
  • Goal: a sequence of process firings that
  • Runs each process at least once, in proportion to
    its rate
  • Avoids underflow
  • no process is fired unless all the tokens it
    consumes are available
  • Returns the number of tokens in each buffer to
    its initial state
  • Result: the schedule can be executed repeatedly
    without accumulating tokens in buffers

39
Balance equations
  • The number of tokens produced must equal the number
    consumed on every edge
  • Repetitions (or firing) vector vS of schedule S:
    the number of firings of each actor in S
  • vS(A) · np = vS(B) · nc
  • must be satisfied for each edge

[Diagram: A produces np tokens per firing on an edge from which
B consumes nc tokens per firing]
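The balance equations can be solved by propagating rates along edges with exact fractions; the sketch below is mine (the helper name, and the example topology chosen to be consistent with the q = [1 2 2]^T example later, are assumptions, not from the slides).

```python
from collections import defaultdict, deque
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(actors, edges):
    """edges: (src, dst, n_produced, n_consumed). Returns the smallest
    integer firing counts, or None if the system is inconsistent or
    disconnected (no unique one-dimensional solution)."""
    rate = {actors[0]: Fraction(1)}
    adj = defaultdict(list)
    for s, d, p, c in edges:
        adj[s].append((d, Fraction(p, c)))  # vS(d) = vS(s) * p / c
        adj[d].append((s, Fraction(c, p)))
    todo = deque([actors[0]])
    while todo:
        a = todo.popleft()
        for b, factor in adj[a]:
            r = rate[a] * factor
            if b in rate:
                if rate[b] != r:
                    return None          # inconsistent: only all-zeros
            else:
                rate[b] = r
                todo.append(b)
    if len(rate) != len(actors):
        return None                      # disconnected: rates undefined
    lcm = reduce(lambda x, y: x * y // gcd(x, y),
                 (rate[a].denominator for a in actors))
    ints = [rate[a].numerator * (lcm // rate[a].denominator)
            for a in actors]
    g = reduce(gcd, ints)
    return {a: v // g for a, v in zip(actors, ints)}

# An assumed non-full-rank topology whose solution is q = [1 2 2]^T
print(repetition_vector(['A', 'B', 'C'],
                        [('A', 'B', 2, 1), ('B', 'C', 1, 1),
                         ('A', 'C', 2, 1)]))
# {'A': 1, 'B': 2, 'C': 2}
```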
40
Balance equations
  • Balance equation for each edge:
  • 3 vS(A) - vS(B) = 0
  • vS(B) - vS(C) = 0
  • 2 vS(A) - vS(C) = 0

41
Balance equations
  • M vS = 0 iff S is periodic
  • Full rank (as in this case):
  • no non-zero solution
  • no periodic schedule
  • (too many tokens accumulate on A->B or B->C)

42
Balance equations
  • Non-full rank:
  • infinitely many solutions exist (a linear space of
    dimension 1)
  • Any multiple of q = [1 2 2]^T satisfies the
    balance equations
  • ABCBC and ABBCC are minimal valid schedules
  • ABABBCBCCC is a non-minimal valid schedule

43
Static SDF scheduling
  • Main SDF scheduling theorem (Lee '86):
  • A connected SDF graph with n actors has a
    periodic schedule iff its topology matrix M has
    rank n-1
  • If M has rank n-1, then there exists a unique
    smallest integer solution q to
  • M q = 0
  • The rank must be at least n-1 because connectedness
    requires at least n-1 edges, each providing a
    linearly independent row
  • Admissibility is not guaranteed, and depends on
    initial tokens on cycles

44
Admissibility of schedules
  • No admissible schedule
  • BACBA, then deadlock
  • Adding one token on A->C makes
  • BACBACBA... valid
  • Making a periodic schedule admissible is always
    possible, but changes specification...

45
From repetition vector to schedule
  • Repeatedly schedule fireable actors, up to the
    number of times in the repetition vector
  • q = [1 2 2]^T
  • Can find either ABCBC or ABBCC
  • If the system deadlocks before returning to its
    original state, no valid schedule exists (Lee '86)
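The procedure above can be sketched as a simulation (the function name and example wiring are my own illustration, under the same assumed topology as before): fire any actor whose input tokens are available, at most its repetition count, and report deadlock if a full pass fires nothing.

```python
def find_schedule(rep, edges, delays=None):
    """Simulate one round. rep: {actor: firings}; edges:
    (src, dst, n_produced, n_consumed); delays: initial tokens per
    edge index. Returns the firing order, or None on deadlock."""
    tokens = {i: (delays or {}).get(i, 0) for i in range(len(edges))}
    remaining = dict(rep)
    schedule = []
    while any(remaining.values()):
        fired = False
        for a in rep:
            if remaining[a] == 0:
                continue
            # fireable only if every incoming edge has enough tokens
            if all(tokens[i] >= nc
                   for i, (s, d, p, nc) in enumerate(edges) if d == a):
                for i, (s, d, p, nc) in enumerate(edges):
                    if d == a:
                        tokens[i] -= nc
                    if s == a:
                        tokens[i] += p
                schedule.append(a)
                remaining[a] -= 1
                fired = True
        if not fired:
            return None   # deadlock before completing the round
    return schedule

q = {'A': 1, 'B': 2, 'C': 2}
edges = [('A', 'B', 2, 1), ('B', 'C', 1, 1), ('A', 'C', 2, 1)]
print(''.join(find_schedule(q, edges)))          # ABCBC

# A 1:1 two-actor cycle deadlocks without an initial token (delay)
cycle = [('a', 'b', 1, 1), ('b', 'a', 1, 1)]
print(find_schedule({'a': 1, 'b': 1}, cycle))               # None
print(find_schedule({'a': 1, 'b': 1}, cycle, delays={1: 1}))  # ['a', 'b']
```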

46
Calculating Rates
  • Each arc imposes a constraint

3a - 2b = 0
4b - 3d = 0
b - 3c = 0
2c - a = 0
d - 2a = 0

Solution: a = 2c, b = 3c, d = 4c

[Diagram: actors a, b, c, d connected by arcs with the
production/consumption rates behind these equations]
47
Calculating Rates
  • Consistent systems have a one-dimensional space
    of solutions
  • Usually want the smallest integer solution
  • Inconsistent systems have only the all-zeros
    solution
  • Disconnected systems have two- or
    higher-dimensional solution spaces

48
An Inconsistent System
  • No way to execute it without an unbounded
    accumulation of tokens
  • Only consistent solution is to do nothing

a - c = 0
a - 2b = 0
3b - c = 0
(combining: 3a - 2c = 0, so with a = c the only solution is
a = b = c = 0)

[Diagram: actors a, b, c in a cycle with the rates behind these
equations]
49
An Underconstrained System
  • Two or more unconnected pieces
  • Relative rates between pieces undefined

a - b = 0
3c - 2d = 0

[Diagram: two disconnected pieces: a -> b and c -> d]
50
Consistent Rates Not Enough
  • A consistent system with no schedule
  • Rates do not avoid deadlock
  • Solution here: add a delay on one of the arcs

[Diagram: actors a and b in a cycle, all rates 1; with no initial
tokens, neither can fire first]
51
SDF Scheduling
  • Fundamental SDF Scheduling Theorem
  • If rates can be established, any scheduling
    algorithm that avoids buffer underflow will
    produce a correct schedule if one exists

52
Scheduling Example
  • Theorem guarantees any valid simulation will
    produce a schedule

a = 2, b = 3, c = 1, d = 4

Possible schedules: BBBCDDDDAA, BDBDBCADDA,
BBDDBDDCAA, and many more. BC... is not valid.

[Diagram: the same four-actor graph as in the rate-calculation
example]
53
Scheduling Choices
  • SDF Scheduling Theorem guarantees a schedule will
    be found if it exists
  • Systems often have many possible schedules
  • How can we use this flexibility?
  • Reduced code size
  • Reduced buffer sizes

54
SDF Code Generation
  • Often done with prewritten blocks
  • For traditional DSP, handwritten implementation
    of large functions (e.g., FFT)
  • One copy of each block's code is made for each
    appearance in the schedule
  • I.e., no function calls

55
Code Generation
  • In this simple-minded approach, the schedule
  • BBBCDDDDAA
  • would produce code like
  • B
  • B
  • B
  • C
  • D
  • D
  • D
  • D
  • A
  • A

56
Looped Code Generation
  • Obvious improvement: use loops
  • Rewrite the schedule in looped form:
  • (3 B) C (4 D) (2 A)
  • Generated code becomes:

    for (i = 0 ; i < 3 ; i++) B;
    C;
    for (i = 0 ; i < 4 ; i++) D;
    for (i = 0 ; i < 2 ; i++) A;

57
Single-Appearance Schedules
  • Often possible to choose a looped schedule in
    which each block appears exactly once
  • Leads to efficient block-structured code
  • Only requires one copy of each block's code
  • Does not always exist
  • Often requires more buffer space than other
    schedules

58
Finding Single-Appearance Schedules
  • Always exist for acyclic graphs
  • Blocks appear in topological order
  • For SCCs, look at the number of tokens that pass
    through each arc in a period (follows from the
    balance equations)
  • If there is at least that much delay, the arc
    does not impose an ordering constraint
  • Idea: no possibility of underflow

[Diagram: arc from a (produces 3) to b (consumes 2) with a delay
of 6]

a = 2, b = 3: 6 tokens cross the arc each period, so a delay of 6
is enough
59
Finding Single-Appearance Schedules
  • Recursive strongly-connected component
    decomposition
  • Decompose into SCCs
  • Remove non-constraining arcs
  • Recurse if possible
  • Removing arcs may break the SCC into two or more

60
Minimum-Memory Schedules
  • Another possible objective
  • Often increases code size (block-generated code)
  • Static scheduling makes it possible to exactly
    predict memory requirements
  • Simultaneously improving code size, memory
    requirements, sharing buffers, etc. remain open
    research problems

61
Cyclo-static Dataflow
  • SDF suffers from requiring each process to
    produce and consume all tokens in a single firing
  • Tends to lead to larger buffer requirements
  • Example: downsampler
  • Don't really need to store 8 tokens in the buffer
  • This process simply discards 7 of them, anyway

[Diagram: downsampler consuming 8 tokens and producing 1]
62
Cyclo-static Dataflow
  • Alternative: have periodic, binary firings
  • Semantics: first firing: consume 1, produce 1
  • Second through eighth firings: consume 1, produce 0

Consumption pattern: 1,1,1,1,1,1,1,1
Production pattern: 1,0,0,0,0,0,0,0
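A small sketch of mine (not from the slides) of why the cyclo-static version needs a smaller input buffer: the upstream actor emits one token per step, and the downsampler fires whenever its current phase's demand is met. Firing once per 8 tokens (SDF) forces the buffer to hold 8; firing every phase (CSDF) keeps it at 1.

```python
def peak_occupancy(consume_phases):
    """Upstream emits one token per step; the consumer fires whenever
    its current phase's demand is met. Returns the peak buffer fill."""
    buf = peak = phase = 0
    for _ in range(8):
        buf += 1                     # upstream produces one token
        peak = max(peak, buf)
        need = consume_phases[phase % len(consume_phases)]
        if buf >= need:
            buf -= need              # consumer fires
            phase += 1
    return peak

print(peak_occupancy([8]))       # SDF downsampler: buffer must hold 8
print(peak_occupancy([1] * 8))   # CSDF downsampler: buffer holds 1
```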
63
Cyclo-Static Dataflow
  • Scheduling is much like SDF
  • Balance equations establish relative rates as
    before
  • Any scheduler that avoids underflow will produce
    a schedule if one exists
  • Advantage even more schedule flexibility
  • Makes it easier to avoid large buffers
  • Especially good for hardware implementation
  • Hardware likes moving single values at a time

64
Summary of Dataflow
  • Processes communicating exclusively through FIFOs
  • Kahn process networks
  • Blocking read, nonblocking write
  • Deterministic
  • Hard to schedule
  • Parks' algorithm requires deadlock detection and
    dynamic buffer-size adjustment

65
Summary of Dataflow
  • Synchronous Dataflow (SDF)
  • Firing rules
  • Fixed token consumption/production
  • Can be scheduled statically
  • Solve balance equations to establish rates
  • Any correct simulation will produce a schedule if
    one exists
  • Looped schedules
  • For code generation, this implies loops in the
    generated code
  • Recursive SCC Decomposition
  • CSDF breaks firing rules into smaller pieces
  • Scheduling problem largely the same