Title: Lecture 4: Clocked Data-Flow Models
1Lecture 4: Clocked Data-Flow Models
2Data Flow Model Hierarchy
- Kahn Process Networks (KPN) (asynchronous)
- Dataflow Networks
- special case of KPN
- actors, tokens and firings
- Static Data Flow (Clocked automata assumptions)
- special case of DN
- static scheduling
- code generation
- buffer sizing (resources!!)
- Other Data Flow models
- Boolean Data Flow
- Dynamic Data Flow
- Sequence Graphs, Dependency Graphs, Data Flow
Graphs
- Control Data Flow
3Data Flow Models
- Powerful formalism for data-dominated system
specification
- Partially-ordered model (avoids over-specification of ordering)
- Deterministic execution independent of
scheduling
- Used for
- simulation
- scheduling
- memory allocation
- code generation
- for Digital Signal Processors (HW and SW)
4Data Flow Networks
- A Data Flow Network is a collection of actors
which are connected and communicate over
unbounded FIFO queues
- Actors firing follows firing rules
- Firing rule: number of required tokens on inputs
- Function: number of consumed and produced tokens
- Actors are functional i.e. have no internal
state
- Breaking processes of KPNs down into smaller
units of computation makes implementation easier
(scheduling)
- Tokens carry values
- integer, float, audio samples, image of pixels
- Network state = number of tokens in FIFOs
5Intuitive semantics
- At each time, one actor is fired
- More can fire concurrently, but firing one at a time is always safe (atomic firing)
- When firing, actors consume input tokens and
produce output tokens
- Actors can be fired only if there are enough
tokens in the input queues
6Filter example
- Example FIR filter
- single input sequence i(n)
- single output sequence o(n)
- o(n) = c1·i(n) + c2·i(n-1)
(figure: data-flow graph of the FIR filter: multipliers c1 and c2, a unit-delay edge carrying the initial token i(-1), and an adder producing o)
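The semantics above can be exercised directly. The sketch below simulates the FIR filter as a small data-flow network: every edge is an unbounded FIFO queue, and each actor fires only when its input queues hold enough tokens. The coefficient values c1 = 0.5 and c2 = 0.25 are illustrative assumptions, not taken from the slides.

```python
from collections import deque

# Minimal sketch (assumed coefficients c1 = 0.5, c2 = 0.25): the FIR
# filter as a data-flow network. Each edge is an unbounded FIFO; each
# actor fires only when its input queues hold enough tokens, consuming
# and producing a fixed number of tokens per firing.
c1, c2 = 0.5, 0.25

def run_fir(samples):
    to_c1, to_delay = deque(), deque()
    delayed = deque([0.0])          # unit-delay edge carries initial token i(-1) = 0
    out = []
    for s in samples:
        # source actor fires: one input token copied onto both outgoing edges
        to_c1.append(s)
        to_delay.append(s)
        # multiplier actors fire (firing rule: 1 token on the input)
        a = c1 * to_c1.popleft()
        b = c2 * delayed.popleft()
        # unit-delay actor fires, pushing i(n) for use as i(n-1) next round
        delayed.append(to_delay.popleft())
        # adder actor fires (firing rule: 1 token on each input)
        out.append(a + b)
    return out

print(run_fir([1.0, 0.0, 0.0]))     # impulse response: [0.5, 0.25, 0.0]
```

Because firing order is constrained only by token availability, any valid firing order produces the same output, which is the determinism property noted above.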
17Scheduling Data Flow
- Given a set of Actors and Dependencies
- How to construct valid execution sequences?
- Static Scheduling
- Assume that you can predefine the execution
sequence
- FSM Scheduling
- Sequencing defined as control-dependent FSM
- Dynamic Scheduling
- Sequencing determined dynamically (at run time) by predefined rules
- In all cases, need to not violate resource or
dependency constraints
- In general, both actors and resources can
themselves have sequential (FSM) behaviors
18A RISC Instruction Execution
19Another RISC Instruction Execution
20A Complete RISC Behavior Graph
21Scheduling Best Valid Sequences
(figures, slides 21-25: the RISC Instruction Task scheduled step by step, steps 0 through 4)
26Examples of Data Flow actors
- SDF: Synchronous (or Static) Data Flow
- fixed number of input and output tokens per
invocation
- BDF: Boolean Data Flow
- control token determines consumed and produced
tokens
(figure: BDF fork and join actors; the control token value (T/F) steers each data token, e.g. control stream T T T F)
27Examples of Data Flow actors
- Sequence Graphs, Dependency Graph, Data Flow
Graph
- Each edge corresponds to exactly one value
- No buffering
- Special Case of SDF
- CDFG: Control Data Flow Graphs
- Adds branching (conditionals) and iteration
constructs
- Many different models for this
(figure: sequence graph in which every edge carries exactly one value, i.e. all token rates are 1)
Typical model in many behavioral/architectural synthesis tools
28Synthesis in Temporal Domain
- Scheduling and binding can be done in different
orders or together
- Schedule
- Mapping of operations to time slots + binding to resources
- A scheduled sequencing graph is a labeled graph
Gupta
29Operation Types
- Operations have types
- Each resource may have several types and timing
constraints
- T is a relation that maps an operation to a resource by matching types
- T : V → {1, 2, ..., nres}
- In general
- A resource type may implement more than one operation type (e.g., an ALU)
- May have family of timing constraints
(data-dependent timing?!)
- Resource binding
- Notion of exclusive mapping
- Pipeline resources or other state?
- Arbitration
- Choice linked to complexity of interconnect
network
31Scheduling and Binding
- Resource constraints
- Number of resource instances of each type: ak, k = 1, 2, ..., nres
- Link, register, and communication resources
- Scheduling
- Timing of operation
- Binding
- Location of operation
- Costs
- Resources → area (power?)
- Registers, steering logic (Muxes, busses),
wiring, control unit
- Metric
- Start time of the sink node
- Might be affected by steering logic and schedule (control logic): resource-dominated vs. control-dominated designs
32Architectural Optimization
- Optimization in view of design space flexibility
- A multi-criteria optimization problem
- Determine schedule φ and binding β
- Given area A, latency λ and cycle time τ objectives
- Find non-dominated points in solution space
- Pareto-optimal solutions
- Solution space: tradeoff curves
- Non-linear, discontinuous
- Area / latency / cycle time (Power?, Slack?,
Registers?, Simplicity?)
- Evaluate (estimate) cost functions
- Constrained optimization problems for resource
dominated circuits
- Min area: solve for minimal binding
- Min latency: solve for minimum-latency (λ) scheduling
33Operation Scheduling
- Input
- Sequencing graph G(V, E), with n vertices
- Cycle time τ
- Operation delays D = {di : i = 0..n}
- Output
- Schedule φ determines start time ti of operation vi
- Latency λ = tn − t0
- Goal: determine area / latency tradeoff
- Classes
- Unconstrained
- Latency or Resource constrained
- Hierarchical (accommodate control transfer!)
- Loop/Loop Pipelined
34Min Latency Unconstrained Scheduling
- Simplest case: no constraints, find min latency
- Given set of vertices V, delays D and a partial order on operations E, find an integer labeling of operations φ : V → Z such that
- ti = φ(vi)
- ti ≥ tj + dj ∀ (vj, vi) ∈ E
- λ = tn − t0 is minimum
- Solvable in polynomial time
- Bounds on latency for resource constrained
problems
Algorithm?
ASAP algorithm: uses topological order
35ASAP Schedules
- Schedule v0 at t0 = 0
- While (vn not scheduled)
- Select vi with all predecessors scheduled
- Schedule vi at ti = max {tj + dj}, vj being a predecessor of vi
- Return tn
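The loop above can be sketched in a few lines of Python. The graph below (v0 → v1 → v3, v0 → v2 → v3, unit delays) is a made-up example for illustration.

```python
# Minimal sketch of the ASAP loop above on a made-up 4-vertex graph
# (v0 -> v1 -> v3, v0 -> v2 -> v3) with unit delays.
def asap(preds, d):
    t = {}
    unscheduled = set(preds)
    while unscheduled:
        # select any vertex whose predecessors are all scheduled
        v = next(v for v in unscheduled if all(p in t for p in preds[v]))
        # earliest start: latest predecessor finish time (source starts at 0)
        t[v] = max((t[p] + d[p] for p in preds[v]), default=0)
        unscheduled.remove(v)
    return t

preds = {"v0": [], "v1": ["v0"], "v2": ["v0"], "v3": ["v1", "v2"]}
d = {v: 1 for v in preds}
print(asap(preds, d))   # v0 at 0, v1 and v2 at 1, v3 at 2
```

Selecting any vertex with all predecessors scheduled is exactly a topological-order traversal, which is why the algorithm runs in polynomial time.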
36ALAP Schedules
- Schedule vn at tn = λ
- While (v0 not scheduled)
- Select vi with all successors scheduled
- Schedule vi at ti = min {tj} − di, vj being a successor of vi
(figure: example sequencing graph with NOP source and sink; ALAP start times annotated over steps 1-4)
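ALAP is the mirror image of ASAP: walk backwards from the sink under a given latency bound λ. The sketch below reuses the same made-up 4-vertex graph, expressed as successor lists.

```python
# Minimal sketch of the ALAP loop above: the mirror image of ASAP,
# walking backwards from the sink with a given latency bound λ (lam).
# The graph below is a made-up 4-vertex example with unit delays.
def alap(succs, d, lam):
    t = {}
    unscheduled = set(succs)
    while unscheduled:
        # select any vertex whose successors are all scheduled
        v = next(v for v in unscheduled if all(s in t for s in succs[v]))
        # latest start: min successor start minus own delay (sink ends at lam)
        t[v] = min((t[s] for s in succs[v]), default=lam) - d[v]
        unscheduled.remove(v)
    return t

succs = {"v0": ["v1", "v2"], "v1": ["v3"], "v2": ["v3"], "v3": []}
d = {v: 1 for v in succs}
print(alap(succs, d, 3))   # slack of vi = ALAP start - ASAP start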
37Resource Constraint Scheduling
- Constrained scheduling
- General case NP-complete (3 or more resources)
- Minimize latency given constraints on area or the resources (ML-RCS)
- Minimize resources subject to bound on latency
(MR-LCS)
- Exact solution methods
- ILP: Integer Linear Programming (Lin, Gebotys)
- Symbolic Scheduling (Haynal, Radevojevic)
- Hu's heuristic algorithm for identical processors
- Heuristics
- List scheduling
- Force-directed scheduling
- Taboo search, Monte-Carlo, many others
38Simplified ILP Formulation
- Use binary decision variables
- i = 0, 1, ..., n
- l = 1, 2, ..., λ+1, where λ is a given upper bound on latency
- xil = 1 if operation i starts at step l, 0 otherwise
- Set of linear inequalities (constraints) and an objective function (min latency)
- Observations
- Σl xil = 1 (each operation has exactly one start time)
- ti = Σl l · xil : start time of op i
- Σm=l−di+1..l xim = 1 iff op vi is (still) executing at step l
39Start Time vs. Execution Time
- Each operation vi has exactly one start time
- If di = 1, then the following questions are the same
- Does operation vi start at step l?
- Is operation vi running at step l?
- But if di > 1, then the two questions should be formulated as
- Does operation vi start at step l? (Does xil = 1 hold?)
- Is operation vi running at step l? (Does Σm=l−di+1..l xim = 1 hold?)
40Operation vi Still Running at Step l ?
- Is v9 running at step 6?
- Is x9,6 + x9,5 + x9,4 = 1? (i.e., did v9, with d9 = 3, start at step 4, 5, or 6?)
- Note
- Only one (if any) of the above three cases can
happen
- To meet resource constraints, we have to ask the
same question for ALL steps, and ALL operations
of that type
41ILP Formulation of ML-RCS (cont.)
- Constraints
- Unique start times
- Sequencing (dependency) relations must be
satisfied
- Resource constraints
- Objective: min cT t
- t: start times vector; c: cost weight vector (e.g., [0 0 ... 1]T)
- When c = [0 0 ... 1]T, cT t = tn (the latency)
42ILP Example
- First, perform ASAP and ALAP (λ = 4)
- (we can write the ILP without ASAP and ALAP, but
using ASAP and ALAP will simplify the
inequalities)
(figure: ASAP and ALAP schedules of the example graph, operations v1-v11 and sink vn, over 4 steps)
43ILP Example Unique Start Times Constraint
- Without using ASAP and ALAP values
44ILP Example Dependency Constraints
- Using ASAP and ALAP, the non-trivial inequalities are (assuming unit delay for + and ×)
45ILP Example Resource Constraints
- Resource constraints (assuming 2 adders and 2
multipliers)
- Objective: Min xn,4
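The actual inequalities for the slide's 11-operation graph were on the (lost) slide graphics, but the constraints an ILP solver would check can also be verified by exhaustive search on a small instance. The 4-operation graph below is a made-up example (not the slides' example): two multiplies, two adds, unit delays, a bound of one resource per type, and a latency bound of 4 steps.

```python
from itertools import product

# Hedged illustration: dependency and resource constraints checked by
# exhaustive search on a made-up 4-operation instance.
ops   = {"v1": "*", "v2": "*", "v3": "+", "v4": "+"}
edges = [("v1", "v3"), ("v2", "v4")]           # dependencies
bound = {"*": 1, "+": 1}
steps = range(1, 5)                             # candidate start steps 1..4

def valid(t):
    # dependency: successor starts after predecessor finishes (unit delay)
    if any(t[u] + 1 > t[v] for u, v in edges):
        return False
    # resource bound: per step, per type, at most bound[k] ops running
    return all(sum(1 for v in ops if ops[v] == k and t[v] == l) <= bound[k]
               for l in steps for k in bound)

schedules = (dict(zip(ops, c)) for c in product(steps, repeat=len(ops)))
best = min((t for t in schedules if valid(t)),
           key=lambda t: max(t.values()))       # minimize latest start = min latency
print(best, "finishes at step", max(best.values()))
```

With one multiplier the two multiplies must serialize, so the minimum latency is 3 steps; an ILP formulation encodes exactly the `valid` checks as linear inequalities over the xil variables instead of enumerating.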
46ILP Formulation of Resource Minimization
- Dual problem to Latency Minimization
- Objective
- Goal is to optimize total resource usage, a
- Objective function is cT a, where the entries in c are the respective area costs of the resources
- Constraints
- Same as ML-RCS constraints, plus
- Latency constraint added
- Note: the unknowns ak now appear in the constraints
47Hu's Algorithm
- Simple case of the scheduling problem
- All operations have unit delay
- All operations (and resources) of the same type
- Graph is a forest
- Hu's algorithm
- Greedy
- Polynomial AND optimal
- Computes a lower bound on the number of resources for a given latency, OR computes a lower bound on latency subject to resource constraints
48Basic Idea: Hu's Algorithm
- Relies on labeling of operations
- Based on their distances from the sink
- Length of the longest path passing through that node
- Try to schedule nodes with higher labels first (i.e., most critical operations have priority)
- Schedule a nodes at a time
- a is the number of resources
- Only schedule nodes that have all their parents/predecessors scheduled
- Each iteration schedules one time step (step 1, then 2, 3, ...)
49Hu's Algorithm
- HU (G(V,E), a)
- Label the vertices // label = length of longest path passing through the vertex
- l = 1
- repeat
- U = unscheduled vertices in V whose predecessors have been scheduled (or have no predecessors)
- Select S ⊆ U such that |S| ≤ a and the labels in S are maximal
- Schedule the S operations at step l by setting ti = l, ∀ vi ∈ S
- l = l + 1
- until vn is scheduled
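The pseudocode above translates almost line for line into Python. The sketch below assumes unit delays and one operation type, with labels computed as the longest path to the sink; the example graph is made up.

```python
# Hedged sketch of Hu's algorithm above: unit delays, one operation
# type, a identical resources; labels measure longest path to the sink.
def hu_schedule(succs, a):
    label = {}
    def lab(v):                      # 1 + longest path from v to a leaf
        if v not in label:
            label[v] = 1 + max((lab(s) for s in succs[v]), default=0)
        return label[v]
    for v in succs:
        lab(v)
    preds = {v: [] for v in succs}
    for u, ss in succs.items():
        for s in ss:
            preds[s].append(u)
    t, l = {}, 1
    while len(t) < len(succs):
        ready = [v for v in succs if v not in t and all(p in t for p in preds[v])]
        # schedule up to a ready operations with maximal labels
        for v in sorted(ready, key=lambda v: -label[v])[:a]:
            t[v] = l
        l += 1
    return t

succs = {"v1": ["v3"], "v2": ["v3"], "v4": [], "v3": []}
print(hu_schedule(succs, 2))   # v1, v2 at step 1; v3, v4 at step 2
```

With 2 resources the two critical operations v1 and v2 (label 2) are picked before v4 (label 1), exactly the priority rule from the previous slide.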
50Hus Algorithm Example
Step 1: Label the vertices (assume all operations have unit delays)
51Hus Algorithm Example
Find unscheduled vertices with scheduled parents; pick 3 (= number of resources) that maximize labels
52Hus Algorithm Example
Repeat until all nodes are scheduled
53List Scheduling
- Heuristic methods for RCS and LCS
- Does NOT guarantee optimum solution
- Similar to Hu's algorithm
- Greedy strategy
- Operation selection decided by criticality
- O(n) time complexity
- More general input
- Works on general graphs (unlike Hu's)
- Resource constraints on different resource types
54List Scheduling Algorithm ML-RCS
LIST_L (G(V,E), a)
- l = 1
- repeat
- for each resource type k
- Ul,k = available vertices in V
- Tl,k = operations in progress
- Select Sk ⊆ Ul,k such that |Sk| + |Tl,k| ≤ ak
- Schedule the Sk operations at step l
- l = l + 1
- until vn is scheduled
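The LIST_L loop can be sketched as below, including the Tl,k term: operations still in progress reduce the free slots for their resource type. The graph, types, and delays are illustrative assumptions (two multiplies sharing one multiplier of delay 2, then an add on one ALU of delay 1), not the slides' example.

```python
# Hedged sketch of the LIST_L loop above, with multiple resource types
# and non-unit delays; priority = longest path to the sink (criticality).
def list_schedule(preds, typ, delay, a):
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    prio = {}                        # criticality: longest path to the sink
    def lab(v):
        if v not in prio:
            prio[v] = delay[v] + max((lab(s) for s in succs[v]), default=0)
        return prio[v]
    for v in preds:
        lab(v)
    t, done, l = {}, {}, 1           # start times, finish times, current step
    while len(t) < len(preds):
        for k in a:
            in_progress = sum(1 for v in t if typ[v] == k and done[v] > l)
            ready = [v for v in preds if v not in t and typ[v] == k
                     and all(p in done and done[p] <= l for p in preds[v])]
            # schedule the most critical ready ops into the free slots
            for v in sorted(ready, key=lambda v: -prio[v])[:a[k] - in_progress]:
                t[v], done[v] = l, l + delay[v]
        l += 1
    return t

preds = {"m1": [], "m2": [], "a1": ["m1", "m2"]}
typ = {"m1": "*", "m2": "*", "a1": "+"}
delay = {"m1": 2, "m2": 2, "a1": 1}
print(list_schedule(preds, typ, delay, {"*": 1, "+": 1}))
```

The single multiplier forces m2 to wait until m1 finishes (step 3), and the add can only start once both products are ready (step 5): greedy, fast, but with no guarantee of optimality.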
55List Scheduling Example
Assumptions: three multipliers with latency 2, one ALU with latency 1
56List Scheduling Algorithm MR-LCS
LIST_R (G(V,E), λ)
- a = 1, l = 1
- Compute the ALAP times tL
- if t0L < 0 return (not feasible)
- repeat
- for each resource type k
- Ul,k = available vertices in V
- Compute the slacks si = tiL − l, ∀ vi ∈ Ul,k
- Schedule operations with zero slack; update a
- Schedule additional Sk ⊆ Ul,k under the a constraints
- l = l + 1
- until vn is scheduled
57Force-Directed Scheduling
- Paulin and Knight, DAC'87
- Similar to list scheduling
- Can handle ML-RCS and MR-LCS
- For ML-RCS, schedules step-by-step
- BUT, selection of the operations tries to find
the globally best set of operations
- Difference with list scheduling in selecting
operations
- Select operations with least force
- Consider the effect on the type distribution
- Consider the effect on successor nodes and their
type distributions
- Idea
- Find the mobility mi = tiL − tiS of operations
- Look at the operation type probability
distributions
- Try to flatten the operation type distributions
58Force-Directed Scheduling
- Rationale
- Reward uniform distribution of operations across
schedule steps
- Force
- Used as a priority function
- Related to concurrency: sort operations by least force
- Mechanical analogy: Force = constant × displacement
- Constant = operation-type distribution
- Displacement = change in probability
- Definition: operation probability density
- pi(l) = Pr{ vi starts at step l }
- Assume uniform distribution
59Force-Directed Scheduling Definitions
- Operation-type distribution (NOT normalized to 1)
- qk(l) = Σ pi(l), summed over all operations vi of type k
- Operation probabilities over control steps
- pi = [ pi(0), ..., pi(λ) ]
- Distribution graph of type k over all steps
- qk(l) can be thought of as the expected operator cost for implementing operations of type k at step l
60Example
61Forces
- Self-force
- Sum of forces to other steps
- Self-force for operation vi in step l
- Fi(l) = Σm qk(m) · Δpi(m), where Δpi(m) is the change in probability caused by fixing the start time to step l
- Successor-force
- Related to the scheduling of the successor operations
- Delaying an operation may cause the delay of its successors
62Example operation v6
(figure: multiply and add distribution graphs)
- It can be scheduled in the first two steps
- p(1) = p(2) = 0.5, p(3) = p(4) = 0.0
- Distribution: q(1) = 2.8, q(2) = 2.3
- Assign v6 to step 1
- Variation in probability of step 1: 1 − 0.5 = 0.5
- Variation in probability of step 2: 0 − 0.5 = −0.5
- Self-force: 2.8 × 0.5 − 2.3 × 0.5 = 0.25
63Example operation v6
(figure: multiply and add distribution graphs)
- Assign v6 to step 2
- Variation in probability of step 1: 0 − 0.5 = −0.5
- Variation in probability of step 2: 1 − 0.5 = 0.5
- Self-force: 2.8 × (−0.5) + 2.3 × 0.5 = −0.25
64Example operation v6
(figure: multiply and add distribution graphs)
- Successor-force
- Operation v7 is then assigned to step 3
- 2.3 × (0 − 0.5) + 0.8 × (1 − 0.5) = −0.75
- Total force: −0.25 + (−0.75) = −1
- Conclusion
- Least force is for step 2
- Assigning v6 to step 2 reduces concurrency
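The arithmetic in the v6 example can be reproduced with a few lines of Python, using the slides' distribution values q(1) = 2.8, q(2) = 2.3 and v6's uniform probability over steps 1-2 (the formula shape is the standard force definition; the helper name is ours).

```python
# Numeric check of the v6 example above.
def self_force(q, p, step):
    # force = sum over steps m of q(m) * (new probability - old probability)
    return sum(q[m] * ((1.0 if m == step else 0.0) - p[m]) for m in q)

q = {1: 2.8, 2: 2.3}              # multiply distribution graph, steps 1-2
p = {1: 0.5, 2: 0.5}              # v6 equally likely to start at step 1 or 2

print(self_force(q, p, 1))        # ≈ 0.25  (assign v6 to step 1)
print(self_force(q, p, 2))        # ≈ -0.25 (assign v6 to step 2)

# fixing v6 to step 2 pushes successor v7 to step 3
q7 = {2: 2.3, 3: 0.8}             # distribution over v7's candidate steps
p7 = {2: 0.5, 3: 0.5}
succ = self_force(q7, p7, 3)      # ≈ -0.75
print("total force:", self_force(q, p, 2) + succ)   # ≈ -1.0
```

The negative total confirms the slides' conclusion: step 2 is the least-force (concurrency-flattening) choice for v6.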
65Force Directed Scheduling Algorithm
66Conclusions
- ILP optimal, but exponential runtime (often)
- Hu's
- Optimal and polynomial
- Very restricted cases
- List scheduling
- Extension of Hu's to the general case
- Greedy (fast), O(n2), but suboptimal
- Force directed O(n3)
- More complicated list scheduling algorithm
- Take into account more global view of the graph
- Still suboptimal
- Next Time Automata-Based Scheduling