Title: On Scheduling Expansive and Reductive Dags for Internet-Based Computing
1. On Scheduling Expansive and Reductive Dags for Internet-Based Computing
- Gennaro Cordasco
- University of Salerno, Italy
- Collaborators
- Arnold L. Rosenberg, Univ. of Massachusetts Amherst
- Greg Malewicz, Google Inc.
2. Outline
- Motivation
- The Internet-Based Computing Pebble Game
- Decomposition-Based Scheduling Theory
- Priority Relation
- Dag Composition
- Expanding the repertoire of building-block dags
- M- and W-Strands, and Duality
- The Optimal Schedule for M- and W-Strands
- Project in Progress
3. Motivation
- Internet-based computing platforms
- Web Computing
- Grid Computing
- P2P Computing
- Why Internet-based computing?
- Remuneration (commercial computational grids)
- Reciprocation (open computational grids)
- Altruism (e.g., FightAIDS@home)
- Curiosity (e.g., SETI@home)
4. Internet-Based Computing (IC)
- The owner of a massive job enlists the aid of remote clients to compute the job's tasks.
- The owner (Server) allocates tasks to Clients one task at a time.
- A Client receives its (k+1)-th task only after returning the results from its k-th task.
5. Challenges in Internet-Based Computing
- We focus on jobs that have inter-task dependencies (modeled as dags); we want to enhance their utilization of client resources.
- Unfortunately, IC platforms are characterized by temporal unpredictability:
- communication takes place over the Internet
- remote clients are not dedicated, hence can be unexpectedly slow
- Temporal unpredictability precludes the use of standard scheduling strategies that were developed for older platforms.
6. An Avenue of Idealization
- Fact
- Without further assumptions, adversarial Clients can defeat any strategy the Server adopts.
- Fact (Buyya-Abramson-Giddy, Kondo-Casanova-Wing-Berman, Sun-Wu)
- Monitoring Clients' past performance and present resources allows one to
- mitigate the degree of temporal unpredictability
- match task complexity to Client resources.
- Idealization
- Via monitoring, one can (approximately) ensure that the temporal unpredictability of Clients affects the timing, but not the order, of task executions.
7. The Formal Idealization
- Assumption: Tasks are executed in the order of allocation.
- This assumption allows us to
- let time be event-driven (execute one node at each step)
- derive scheduling guidelines that are totally under the control of the Server.
8. Our Overall Goal
- Determine how to schedule a dag of tasks in such a way that
- Informally
- the danger of gridlock is lessened
- the utilization of available client resources is enhanced
- Formally
- the number of tasks that are eligible for allocation is maximized at every step of the computation
9. The Computation-Dag G
- A dag G = (N, A) is used to model a computation (a computation-dag):
- each node v ∈ N represents a task in the computation
- an arc (u → v) ∈ A represents the dependence of task v on task u: v cannot be executed until u is.
- Given an arc (u → v) ∈ A, u is a parent of v, and v is a child of u in G. Each parentless node of G is a source (node), and each childless node is a sink (node).
10. The IC Pebble Game
- The Players
- A single Server, S (the owner)
- A (finite or infinite) set of Clients, C1, C2, ...
- S has unlimited supplies of two types of pebbles:
- ELIGIBLE pebbles, whose presence indicates a task eligible for execution
- EXECUTED pebbles, whose presence indicates an executed task
11. The IC Pebble Game
- Rules of the Game
- S begins by placing an ELIGIBLE pebble on each unpebbled source of G.
- Unexecuted sources are always eligible for execution, having no parent whose prior execution they depend on.
- (Pebble legend: ELIGIBLE, EXECUTED, unpebbled.)
12. The IC Pebble Game
- Rules of the Game
- S begins by placing an ELIGIBLE pebble on each unpebbled source of G.
- At each step, S
- selects a node that contains an ELIGIBLE pebble,
- replaces that pebble by an EXECUTED pebble,
- places an ELIGIBLE pebble on each unpebbled node of G all of whose parents contain EXECUTED pebbles.
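The rules of the game can be sketched as a small simulator; the arc-list dag representation and the function name below are my own illustration, not from the talk.

```python
from collections import defaultdict

def play_pebble_game(arcs, schedule):
    """Simulate the IC Pebble Game on a dag given as a list of arcs (u, v).

    `schedule` is the order in which the Server executes nodes; the returned
    history records how many nodes hold ELIGIBLE pebbles before any execution
    and after each step.
    """
    parents = defaultdict(set)
    nodes = set()
    for u, v in arcs:
        parents[v].add(u)
        nodes.update((u, v))

    executed = set()
    # Rule 1: every unpebbled source starts out ELIGIBLE.
    eligible = {v for v in nodes if not parents[v]}
    history = [len(eligible)]

    for v in schedule:
        # Rule 2: only a node holding an ELIGIBLE pebble may be executed.
        assert v in eligible, f"node {v} is not ELIGIBLE"
        eligible.remove(v)
        executed.add(v)
        # Rule 3: a node becomes ELIGIBLE once all its parents are EXECUTED.
        for w in nodes - executed - eligible:
            if parents[w] and parents[w] <= executed:
                eligible.add(w)
        history.append(len(eligible))
    return history
```

For instance, on the bipartite dag with arcs 1→a, 1→b, 2→b, 2→c, the schedule (1, 2) yields the ELIGIBLE-count history [2, 2, 3].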
13. IC Quality/Optimality of a Dag-Schedule
- The IC quality of a schedule for a dag is the rate at which it produces ELIGIBLE nodes: the larger, the better.
- A schedule for a dag is IC optimal (ICO) if it maximizes the number of ELIGIBLE nodes at every step.
14. How Important Is IC Quality?
- Consider the following dag:
- Non-optimal schedule: never more than 2 ELIGIBLE nodes.
- Optimal schedule: roughly t^(1/2) ELIGIBLE nodes at step t.
15. Optimality Is Not Always Possible
- For each step t of a play of the Game on a dag G under a schedule S, E_S(t) denotes the number of nodes of G that contain ELIGIBLE pebbles at step t.
- Consider the following dag, with sources s1, s2, s3:
- For every schedule S: E_S(0) = 3.
- max_{S'} E_{S'}(1) = E_S(1) = 3, where S = (s1, s2, s3) or (s1, s3, s2); but then E_S(2) = 2.
- However, max_{S'} E_{S'}(2) = E_S(2) = 3, where S = (s2, s3, s1) or (s3, s2, s1).
- No schedule maximal at step 1 is maximal at step 2.
16. IC-Optimal Schedules for Common Dags
- Theorem [MRY06]
- A schedule for an evolving mesh-dag (2- or 3-dimensional), a reduction-mesh, a reduction-tree, or an FFT dag is IC optimal iff it is parent-oriented.
- Parent-oriented: a node's parents are executed sequentially.
- Meshes: level by level, sequentially across each level
- Tree-dags: in sibling pairs (nodes that share a child)
- FFT-dags: in butterfly pairs (nodes that share two children)
17. Decomposition-Based Scheduling Theory
- Construct dags from schedulable building blocks.
- Choose bipartite building-block dags that have optimal schedules.
- Theorem [MRY06]
- Any schedule for these building blocks that executes all sources sequentially is IC optimal.
18. The Priority Relation
- Let two dags G1 and G2 admit IC-optimal schedules Σ1 and Σ2, respectively. Then:
- G1 ▷ G2 means that the schedule Σ that entirely executes G1's non-sinks and then entirely executes G2's non-sinks is at least as good as any other schedule that executes both G1 and G2 (the sum G1 + G2).
- Lemma [MRY06]: The relation ▷ is transitive.
19. Dag Composition
- Let G1 and G2 be two dags; the composition of G1 and G2 is obtained by merging some k sources of G2 with some k sinks of G1.
- The dag obtained is a composite of type G1 ⇑ G2.
- Composition is associative.
- Example: the composite M1,2 ⇑ M2,3 yields M1,3.
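A minimal sketch of the composition operation, assuming an arc-list dag representation (the function name and the renaming scheme are mine): merging k sinks of G1 with k sources of G2 amounts to renaming those G2 sources.

```python
def compose(arcs1, arcs2, merge):
    """Compose dag G1 (arcs1) with dag G2 (arcs2), forming a dag of type
    G1 (composed with) G2: each sink of G1 listed in `merge` is identified
    with the G2-source it maps to.  Node names of G1 and G2 are assumed
    disjoint before merging.
    """
    # Invert the map: each merged G2-source is renamed to its G1-sink.
    renamed = {g2_source: g1_sink for g1_sink, g2_source in merge.items()}
    composite = list(arcs1)
    for u, v in arcs2:
        composite.append((renamed.get(u, u), renamed.get(v, v)))
    return composite
```

Composing a W-dag W2 (arcs s→a, s→b) with an M-dag M2 (arcs a'→t, b'→t) by merging a with a' and b with b' yields the diamond dag s→a→t, s→b→t.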
20. Dag Composition
- Theorem [MRY06]: If the dag G is a composition of connected bipartite dags {Gi}, 1 ≤ i ≤ n, of type
- G1 ⇑ G2 ⇑ ... ⇑ Gn
- and if
- G1 ▷ G2 ▷ ... ▷ Gn (G is a ▷-linear composition),
- then executing G by executing the Gi in ▷-order is IC optimal.
21. Dag Composition
- Composite dags that admit IC-optimal schedules can be very non-uniform in structure. E.g.:
22. IC-Optimality via Dag-Decomposition
- The real problem is not to build a computation-dag but rather to execute a given one.
- [MRY06] provides a framework that allows one to convert a real dag G into a simplified, decomposed counterpart S_G.
23. Expanding the Repertoire of Building-Block Dags
- For any finite sequence d of integers, each > 1, the d-strands W_d and M_d are defined inductively as follows.
- For each integer d > 1, W_d is the (single-source) degree-d W-dag. It has one source and d sinks; its d arcs connect the source to each sink.
- (Figure: W_4.)
24. Expanding the Repertoire of Building-Block Dags
- For each sequence d of integers, each > 1, and each integer d' > 1, W_{d,d'} is obtained from W_d and W_{d'} by identifying (or merging) the rightmost sink of the former dag with the leftmost sink of the latter.
- (Figures: W_{4,2,4} merged with W_4 yields W_{4,2,4,4}.)
25. Expanding the Repertoire of Building-Block Dags
- Every W-strand W_d has a dual M-strand M_d that is obtained by reversing all arcs.
- (Figures: W_4 and its dual M_4; W_{4,2,4,3} and its dual M_{4,2,4,3}.)
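The inductive definition can be sketched as follows; the integer numbering of sources and sinks is my own convention (sinks are numbered left to right, with adjacent W-dags sharing one sink).

```python
def w_strand(degrees):
    """Build the W-strand W_d for a degree sequence d (each entry > 1),
    as a list of arcs (source, sink).  W_d has len(degrees) sources and
    sum(degrees) - len(degrees) + 1 sinks, since each pair of consecutive
    W-dags is merged at one shared sink.
    """
    arcs, base = [], 0
    for src, d in enumerate(degrees):
        for j in range(d):
            arcs.append((src, base + j))
        # The rightmost sink of this W-dag is the leftmost sink of the next.
        base += d - 1
    return arcs

def m_strand(degrees):
    """The dual M-strand M_d is obtained by reversing all arcs of W_d."""
    return [(v, u) for u, v in w_strand(degrees)]
```

For example, `w_strand([4, 2, 4, 4])` builds W_{4,2,4,4} with 4 sources and 4+2+4+4-3 = 11 sinks.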
26. Scheduling-Based Duality
- Expansive building blocks (W) and reductive building blocks (M) are dual to one another in two strong senses:
- Given a building block G of either type, and given an optimal schedule S for G, one can algorithmically derive from S an optimal schedule S' for the building block G' obtained by reversing all arcs of G.
- Given two building blocks G1 and G2 of the same type, if G1 ▷ G2, then G'2 ▷ G'1.
27. Scheduling-Based Duality
- Let W be a W-strand having n sources and m sinks.
- Let S be a schedule for W that executes its sources in the order u1, u2, ..., un.
- S renders W's sinks ELIGIBLE in a sequence of packets P1, P2, ..., Pn (where Pi is the set of sinks that become ELIGIBLE when S executes ui).
- A schedule for M_W is dual to S if it executes M_W's sources in an order of the form ⟨Pn⟩, ⟨Pn−1⟩, ..., ⟨P1⟩, where ⟨a, b, ..., c⟩ denotes a fixed but unspecified permutation of a, b, ..., c.
28. Scheduling-Based Duality
- Theorem
- Let the W-strand W admit the IC-optimal schedule S. Any schedule for M_W that is dual to S is IC-optimal.
- Notation (see figure): U = W's sources (|U| = n); V = W's sinks (|V| = m); A_t = the set of sources executed in the first t steps of S; B_t = the set of sinks ELIGIBLE after step t of S.
- A_t is ICO for W at step |A_t|; V \ B_t is ICO for M_W at step m − |B_t|.
29. Scheduling-Based Duality: An Example
- Consider W_{3,2} with sources 1, 2 and sinks a, b, c, d, and its dual M_{3,2}.
- S = (1, 2) is IC-optimal for W_{3,2}.
- P1 = {a, b}, P2 = {c, d}.
- S' = (⟨c, d⟩, ⟨a, b⟩) is IC-optimal for M_{3,2}, where ⟨a, b, ..., c⟩ denotes a fixed but unspecified permutation of a, b, ..., c.
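The dual-schedule construction can be sketched in code, assuming the strand layout in which each source's sinks are consecutive and adjacent sources share one sink; the function name is mine. Numbering the sinks a, b, c, d as 0..3 reproduces the example on this slide.

```python
def dual_schedule(degrees, source_order):
    """From a schedule for the W-strand W_d (an order on its sources),
    derive a dual schedule for M_d: collect the packets P_1, ..., P_n of
    sinks rendered ELIGIBLE at each step, then emit them in reverse order.
    """
    # Recover each sink's parent set from the degree sequence.
    parents, base = {}, 0
    for src, d in enumerate(degrees):
        for j in range(d):
            parents.setdefault(base + j, set()).add(src)
        base += d - 1

    done, seen, packets = set(), set(), []
    for s in source_order:
        done.add(s)
        # P_i: sinks newly ELIGIBLE (all parents EXECUTED) at this step.
        pack = sorted(v for v, ps in parents.items()
                      if ps <= done and v not in seen)
        seen.update(pack)
        packets.append(pack)
    # Any permutation within a packet is allowed; reverse across packets.
    return [v for pack in reversed(packets) for v in pack]
```

For W_{3,2} under the schedule (0, 1) (the slide's sources 1, 2), the packets are {0, 1} and {2, 3} (the slide's {a, b} and {c, d}), so the derived dual schedule for M_{3,2} is (2, 3, 0, 1), matching (c, d, a, b).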
30. Scheduling-Based Duality (2)
- Theorem: For any W-strands W1 and W2, W1 ▷ W2 if, and only if, M_{W2} ▷ M_{W1}.
- An example: (Figure: W3 and W2, and their duals M3 and M2.)
31. On Scheduling Strands IC-Optimally
- Theorem: Every sum of W-strands and every sum of M-strands admits an IC-optimal schedule.
- Let Src(S) denote S's sources.
- For X ⊆ Src(S), e(X; S) denotes the number of sinks of S that are rendered ELIGIBLE when precisely the sources in X are EXECUTED.
- For each u ∈ Src(S):
- for any k ∈ [1, n], u(k) = e({u, ..., u + k − 1}; S)
- V_u = ⟨u(1), ..., u(n)⟩
- Order the vectors V_1, ..., V_n lexicographically, using the notation V_a ≤_L V_b to denote this order.
- A source s ∈ Src(S) is maximum if V_{s'} ≤_L V_s for all s' ∈ Src(S).
- (Figure: a sum of W-strands S with sources 1, 2, 3, 4.)
32. The Greedy Schedule Σ_S
- The greedy schedule Σ_S for S operates as follows:
- Σ_S executes any maximum s ∈ Src(S).
- Σ_S removes from S the just-executed source s and all sinks having s as their only parent. This converts S to a sum of W-strands S'.
- Σ_S recursively executes S' using schedule Σ_{S'}.
- The schedule Σ_S is IC optimal for S.
33. The Greedy Schedule Σ_S: An Example
- Consider the dag S = W_{4,2} + W_{3,2} with sources 1, 2, 3, 4.
- V_1 = ⟨3,5,5,5⟩, V_2 = ⟨1,1,1,1⟩, V_3 = ⟨2,4,4,4⟩, V_4 = ⟨1,1,1,1⟩; hence 1 is maximum.
- Σ_S executes 1.
- S' = W_2 + W_{3,2}, with sources 2, 3, 4:
- V_2 = ⟨2,2,2⟩, V_3 = ⟨2,4,4⟩, V_4 = ⟨1,1,1⟩; hence 3 is maximum.
- Σ_S executes 3.
- S'' = W_2 + W_2, with sources 2, 4:
- V_2 = ⟨2,2⟩, V_4 = ⟨2,2⟩; hence 2 and 4 are both maximum.
- Σ_S executes 2 and then 4 (or vice versa).
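The vector computation behind this example can be sketched as follows. The within-strand cap on u(k) (sources beyond u's own strand are not counted) is my reading of the flattened definition on slide 31, chosen because it reproduces the vectors in the example above; the function names are mine.

```python
def strand_eligible(degrees, first, last):
    """Sinks of W_degrees rendered ELIGIBLE when exactly the consecutive
    sources first..last (0-based, inclusive) are EXECUTED."""
    parents, base = {}, 0
    for src, d in enumerate(degrees):
        for j in range(d):
            parents.setdefault(base + j, set()).add(src)
        base += d - 1  # adjacent sources share one sink
    done = set(range(first, last + 1))
    return sum(1 for ps in parents.values() if ps <= done)

def eligibility_vectors(strands):
    """V_u for every source u (numbered 1..n, left to right) of a sum of
    W-strands: entry k counts the sinks ELIGIBLE when sources u, u+1, ...
    (k of them, capped at u's own strand) are EXECUTED."""
    n = sum(len(d) for d in strands)
    vectors, u = {}, 1
    for degrees in strands:
        for p in range(len(degrees)):
            vectors[u] = tuple(
                strand_eligible(degrees, p, min(p + k - 1, len(degrees) - 1))
                for k in range(1, n + 1))
            u += 1
    return vectors
```

For S = W_{4,2} + W_{3,2} this gives V_1 = (3, 5, 5, 5), V_2 = (1, 1, 1, 1), V_3 = (2, 4, 4, 4), V_4 = (1, 1, 1, 1), so source 1 is the lexicographic maximum, as in the first greedy step above.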
34. Project in Progress
- Extend the priority relation ▷ to include topological order.
- The order of composition (rather than ▷-priority) may force a schedule to execute G1 before G2.
- Determine how to invoke schedules that execute building blocks in an interleaved rather than sequential fashion.
- We can now do this for bipartite dags.
- Experimentally determine the significance of IC-optimality.
- Initial results suggest significant speedup [MFRW06].
35. Thanks for your attention
36. References
- [R04] A.L. Rosenberg (2004): On scheduling mesh-structured computations for Internet-based computing. IEEE Trans. Comput. 53, 1176-1186.
- [RY05] A.L. Rosenberg and M. Yurkewych (2005): Guidelines for scheduling some common computation-dags for Internet-based computing. IEEE Trans. Comput. 54, 428-438.
- [MRY06] G. Malewicz, A.L. Rosenberg, M. Yurkewych (2006): Toward a theory for scheduling dags in Internet-based computing. IEEE Trans. Comput. 55. See also Intl. Parallel and Distr. Processing Symp., 2005.
- [MFRW06] G. Malewicz, I. Foster, A.L. Rosenberg, M. Wilde (2006): A tool for prioritizing DAGMan jobs and its evaluation. IEEE Intl. Symp. on High Performance Distributed Computing.