Title: Bandwidth-centric scheduling of independent tasks on a heterogeneous grid
- Olivier Beaumont, ENS
- Larry Carter, UCSD
- Jeanne Ferrante, UCSD
- Arnaud Legrand, ENS
- Yves Robert, ENS
Grid Computing
- Distributed heterogeneous computing
- Large number of independent tasks
- Data begins and ends at a specific site
- Examples
  - SETI@home
  - Factoring numbers
  - Animated films
  - Drug screening
Computational grid
[Figure: a computational grid drawn as a tree. Data starts at "My computer"; the tree includes an Internet gateway, a cluster host, a partner site, a supercomputer, and participating PCs and workstations. Intermediate nodes can compute too.]
Base Model Node
- Processor takes $w_0$ time to do one task
- Takes $c_{-1}$ time to receive a task from its parent
- Takes $c_i$ time to send a task to the $i$-th child
- These three activities can be done concurrently, but only one send at a time
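Example
A is the root of the tree: all tasks start at A.
[Figure: example tree. Root A (w = 3) has children B (w = 2) and C (w = 6); C has one child, D (w = 2). Edge labels give the time for sending one task along that edge (A to B: 2, A to C: 1); node labels give the time for computing one task on that node.]
The examples assume no communication of results back to the root.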
Example
[Figure: Gantt chart of the example schedule over time, one row per activity: A compute, A send, B receive, B compute, C receive, C compute, C send, D receive, D compute.]
How long does it take?
Steady-state
[Figure: the example schedule divided into a start-up phase, a repeated pattern, and a clean-up phase.]
- Steady state: 7 tasks every 6 time units
- Total time: 16 tasks in 16 time units
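Steady State Problem
- One-level fork graph: a node with $k$ leaves
- Concurrent activities
  - $w_0$ time to execute a task
  - $c_i$ time to send a task to the $i$-th child (only one send at a time)
- Let $R_i$ denote steady-state rates
  - $R_0$ tasks/second executed in the node
  - $R_i$ tasks/second sent to and done by child $i$
- Constraints
  - $\sum_{i=1}^{k} R_i c_i \le 1$
  - $R_i \le 1/w_i$ for $i = 0, \ldots, k$
- Goal: maximize the throughput $\sum_{i=0}^{k} R_i$
[Figure: node $w_0$ with leaves $w_1, \ldots, w_k$, reached over links $c_1, \ldots, c_k$.]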
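Solution
- Sort children by communication time: $c_1 \le c_2 \le \cdots \le c_k$
- Find the largest $p$ such that $\sum_{i=1}^{p} c_i/w_i \le 1$
- For $i = 1, \ldots, p$, set $R_i = 1/w_i$
  - keep the first $p$ children busy
  - note that $\sum_{i=1}^{p} R_i c_i \le 1$ so far
- If $p < k$, set $R_{p+1} = \varepsilon/c_{p+1}$, where $\varepsilon = 1 - \sum_{i=1}^{p} R_i c_i$
  - give the $(p+1)$-st child any leftover work
- Set $R_0 = 1/w_0$
  - keep the root's processor busy
[Figure: node $w_0$ with leaves $w_1, \ldots, w_k$, reached over links $c_1, \ldots, c_k$.]
Constraints: $R_i \le 1/w_i$ for $i = 0, \ldots, k$, and $\sum_{i=1}^{k} R_i c_i \le 1$.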
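The greedy solution above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `fork_throughput` and the `(c_i, w_i)` pair input format are assumptions, and exact fractions are used so that rates such as 7/6 come out without rounding.

```python
from fractions import Fraction


def fork_throughput(w0, children):
    """Greedy steady-state solution for a one-level fork graph (base model).

    w0:       time for the node's own processor to execute one task
    children: list of (c_i, w_i) pairs: time to send one task to child i,
              and time for child i to execute one task
    Returns (total rate, rates of the children in priority order).
    """
    ordered = sorted(children, key=lambda cw: cw[0])   # c_1 <= c_2 <= ... <= c_k
    rates = [Fraction(0)] * len(ordered)
    used = Fraction(0)                                 # sum of R_i * c_i so far
    p = 0
    # Keep the first p children busy while their send times still fit in the port.
    while p < len(ordered) and used + Fraction(ordered[p][0]) / ordered[p][1] <= 1:
        c, w = ordered[p]
        rates[p] = Fraction(1) / w
        used += rates[p] * c
        p += 1
    # Give the (p+1)-st child whatever port time is left over.
    if p < len(ordered):
        eps = 1 - used
        rates[p] = eps / ordered[p][0]
    r0 = Fraction(1) / w0                              # keep the node's processor busy
    return r0 + sum(rates), rates


total, rates = fork_throughput(3, [(2, 2), (1, Fraction(3, 2))])
print(total, rates)
```

With $w_0 = 3$ and two children, one with $c = 1$, $w = 3/2$ and one with $c = 2$, $w = 2$, the faster-communicating child is kept busy and the other receives the leftover port time, for a total rate of 7/6. Children beyond $p+1$ implicitly get rate 0.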
New law of efficient management
- Delegate work to whomever it takes you the least time to explain the problem to!
  - Provided the worker's desk isn't overloaded.
- It doesn't matter if that person is a slow worker.
  - Of course, slow workers will have full desktops more often.
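With communication from above
- Three concurrent activities
  - $w_0$ time to execute a task
  - $c_{-1}$ time to receive a task from the parent
  - $c_i$ time to send a task to the $i$-th child
- New constraints
  - $R_{-1} = \sum_{i=0}^{k} R_i$ and $R_{-1} c_{-1} \le 1$, where $R_{-1}$ is the receive rate
- Solution for the one-level tree
  - $R_{-1} = \min\left(1/c_{-1},\; 1/w_0 + \sum_{i=1}^{p} 1/w_i + \varepsilon/c_{p+1}\right)$
[Figure: node $w_0$ receiving from its parent over link $c_{-1}$, with leaves $w_1, \ldots, w_k$ reached over links $c_1, \ldots, c_k$.]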
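A small sketch of the receive-rate formula for a one-level tree fed by a parent link, written under the same hypothetical `(c_i, w_i)` input convention (the name `subtree_rate` is illustrative). The loop is a compact equivalent of the greedy: while port time remains, each child gets $\min(1/w_i, \text{leftover}/c_i)$, which is $1/w_i$ for the first $p$ children, $\varepsilon/c_{p+1}$ for the next one, and 0 afterwards.

```python
from fractions import Fraction


def subtree_rate(c_parent, w0, children):
    """Receive rate R_{-1} of a one-level tree fed over a parent link.

    c_parent: time to receive one task from the parent (c_{-1})
    w0:       time for the node to execute one task
    children: list of (c_i, w_i) pairs for the leaves
    """
    budget = Fraction(1)                      # unused fraction of the send port
    delegated = Fraction(0)
    for c, w in sorted(children, key=lambda cw: cw[0]):
        r = min(Fraction(1) / w, budget / c)  # busy child, or leftover work
        delegated += r
        budget -= r * c
    compute_bound = Fraction(1) / w0 + delegated
    # The node cannot consume tasks faster than it can receive them.
    return min(Fraction(1) / c_parent, compute_bound)


print(subtree_rate(1, 6, [(1, 2)]))
```

For the subtree rooted at C in the running example ($w_0 = 6$, one leaf D with $w = 2$, fed from A over a link of time 1), this gives $1/6 + 1/2 = 2/3$. The C-to-D link time of 1 is our assumption; any value up to 2 gives the same answer. With a slower parent link ($c_{-1} = 2$) the receive bound $1/2$ becomes the limit instead.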
Steady-state for tree
- Process the root last
- Reduce fork graphs to a single node
[Figure sequence: the grid tree (My computer, Internet Gateway, Cluster Host, Partner site, Supercomputer); then the partner site's fork graph is replaced by a summary node; finally only My computer, an Internet summary node, and a cluster summary node remain.]
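Example
First find the equivalent work time for the subtree rooted at C.
[Figure: the example tree. Root A (w = 3) has children B (w = 2) and C (w = 6); D (w = 2) is the child of C.]
The subtree's rate is $1/6 + 1/2 = 2/3$, equivalent to a node with $w = 3/2$.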
Example
Replace the subtree with an equivalent node.
[Figure: root A (w = 3) with children B (w = 2) and C, whose subtree is now a single node with w = 1.5.]
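Example
[Figure: root A (w = 3) with children B (w = 2, link time 2) and C (w = 1.5, link time 1).]
Solve the root's fork graph:
- Keep C busy, i.e. $R_C = 1/1.5 = 2/3$
- $\varepsilon = 1 - R_C c_C = 1 - 1/1.5 = 1/3$
- $R_B = \varepsilon/2 = 1/6$
- Total rate $= R_A + R_C + R_B = 1/3 + 1/1.5 + 1/6 = 7/6$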
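The bottom-up reduction ("process the root last") can be sketched as a recursion: each child subtree is first collapsed to an equivalent node of work time $1/\text{rate}$, then the node's own fork graph is solved, and the result is capped by the receive rate $1/c_{-1}$. The `Node` class and `equivalent_rate` function are hypothetical names for this illustration; link times are assumed positive, and a link time of 0 marks the root, which has no parent link.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from numbers import Rational


@dataclass
class Node:
    w: Rational                    # time to execute one task on this node
    c: Rational = 0                # time to receive one task from the parent (0 = root)
    children: list = field(default_factory=list)


def equivalent_rate(node):
    """Steady-state rate of the subtree rooted at node (base model)."""
    # Reduce every child subtree first, replacing it by a node with w = 1/rate.
    kids = [(ch.c, Fraction(1) / equivalent_rate(ch)) for ch in node.children]
    rate = Fraction(1) / node.w    # keep this node's processor busy
    budget = Fraction(1)           # unused fraction of the send port
    for c, w in sorted(kids, key=lambda cw: cw[0]):
        r = min(Fraction(1) / w, budget / c)  # busy child, or leftover work
        rate += r
        budget -= r * c
    if node.c > 0:                 # cannot exceed the receive rate 1/c_{-1}
        rate = min(rate, Fraction(1) / node.c)
    return rate


# The running example; the C-to-D link time of 1 is an assumption.
D = Node(w=2, c=1)
C = Node(w=6, c=1, children=[D])
B = Node(w=2, c=2)
A = Node(w=3, children=[B, C])
print(equivalent_rate(C), equivalent_rate(A))
```

On this tree the sketch reproduces the hand computation: the subtree at C reduces to rate 2/3 (work time 3/2), and the whole tree runs at 7/6 tasks per time unit.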
Base Model Notation
- Three concurrent activities
  - $w_0$ time to do one task
  - $c_{-1}$ time to receive a task from the parent
  - $c_i$ time to send a task to the $i$-th child (only one at a time)
[Figure: a node's receive activity (link $c_{-1}$ to the parent), its processor ($w_0$), and its send activity (links $c_1, \ldots, c_k$ to children $w_1, \ldots, w_k$). Things stacked vertically can be done concurrently.]
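Concurrent receive model
- Two concurrent activities
  - Receive a task from the parent
  - EITHER send to the $i$-th child OR execute one task
[Figure: receive ($c_{-1}$) stacked above a single activity shared by executing ($w_0$) and sending (links $c_1, \ldots, c_k$ to children $w_1, \ldots, w_k$).]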
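Reduce to previous model
- Replace the node's processor by a new child.
[Figure: a concurrent-receive-model node and its base-model equivalent. In the base model, the node's own work time becomes $\infty$, and a new child with link time $w_0$ and work time $w_0$ is added alongside children $w_1, \ldots, w_k$ (links $c_1, \ldots, c_k$).]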
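A sketch of this reduction, assuming (as we read the figure) that the node's own processor becomes a new child whose link time and work time are both $w_0$, while the node itself no longer computes ($w_0 = \infty$, passed here as `None`). The function names are hypothetical. The reduction is sound because executing a task now occupies the single send/execute activity for $w_0$, exactly like sending to a child over a link of time $w_0$.

```python
from fractions import Fraction


def base_fork_rate(w0, children):
    """Base-model fork graph; w0=None means the node itself computes nothing."""
    rate = Fraction(0) if w0 is None else Fraction(1) / w0
    budget = Fraction(1)                      # unused fraction of the send port
    for c, w in sorted(children, key=lambda cw: cw[0]):
        r = min(Fraction(1) / w, budget / c)  # busy child, or leftover work
        rate += r
        budget -= r * c
    return rate


def concurrent_receive_rate(w0, children):
    """Concurrent receive model: executing a task blocks sends for w0.

    Reduction: the processor becomes a new child (link time w0, work time w0)
    and the node's own work time becomes infinite.
    """
    return base_fork_rate(None, children + [(w0, w0)])


kids = [(2, 2), (1, Fraction(3, 2))]
print(base_fork_rate(3, kids), concurrent_receive_rate(3, kids))
```

For the reduced example root ($w_0 = 3$, children with $(c, w) = (2, 2)$ and $(1, 3/2)$), the base model gives 7/6, while the concurrent receive model gives 5/6: the new child has the slowest "link", so the root ends up delegating everything.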
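Concurrent work model
- Two concurrent activities: execute, and one communication (either receive or send)
[Figure: a concurrent-work-model node and its base-model equivalent. In the base model, the node's own work time becomes $\infty$, a new child with link time $c_{-1}$ and work time $w_0$ is added, and the link to child $i$ costs $c_{-1} + c_i$.]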
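A sketch of this second reduction, under our reading of the figure: since receiving and sending share one communication activity, every task executed locally costs $c_{-1}$ of communication time and every task forwarded to child $i$ costs $c_{-1} + c_i$. That is a base-model node with $w_0 = \infty$, a new child $(c_{-1}, w_0)$, and child links $c_{-1} + c_i$. Function names are hypothetical, and the base-model solver is repeated so the block runs on its own.

```python
from fractions import Fraction


def base_fork_rate(w0, children):
    """Base-model fork graph; w0=None means the node itself computes nothing."""
    rate = Fraction(0) if w0 is None else Fraction(1) / w0
    budget = Fraction(1)                      # unused fraction of the send port
    for c, w in sorted(children, key=lambda cw: cw[0]):
        r = min(Fraction(1) / w, budget / c)  # busy child, or leftover work
        rate += r
        budget -= r * c
    return rate


def concurrent_work_rate(c_parent, w0, children):
    """Concurrent work model: receive and send share one port; computing overlaps.

    Reduction: new child (c_parent, w0), child links become c_parent + c_i,
    and the node's own work time becomes infinite.
    """
    reduced = [(c_parent + c, w) for c, w in children] + [(c_parent, w0)]
    return base_fork_rate(None, reduced)


print(concurrent_work_rate(1, 3, [(2, 2), (1, Fraction(3, 2))]))
```

With $c_{-1} = 1$, $w_0 = 3$, and the same two children as before, local execution is now the cheapest option (one unit of port time per task), so the root keeps itself busy at 1/3 and spends the remaining port time on the child with $c_{-1} + c_i = 2$, for a total of 2/3.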
Other models have similar reductions
[Figure: two further node models, each drawn with receive, $w_0$, and send activities: the fully parallel model and the concurrent send model.]
Revised law of efficient management (if you can't work and delegate at the same time)
- Delegate work to whomever it takes you the least time to explain the problem to!
  - Provided the worker's desk isn't overloaded.
- Do it yourself only if that's faster than explaining it to any available worker.
  - And fire anyone who isn't working when you are!
  - Or reassign them to a better-communicating or slower-working manager.
OGO model of communication
- Recall the LogP model: Latency, Overhead, Gap
  - Latency is irrelevant to steady state
- We allow different Overheads at each end
- OGO model
  - the channel can send one task every $G$ time units
  - the node's processor is interrupted for time $O$
- Easy to extend our methods
  - use the no-concurrency model with the $O$ values
  - take the min with $1/G$
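One possible reading of this extension, offered as a hedged sketch rather than the authors' method: assume the "no-concurrency" model means the processor performs receives, sends, and computation one at a time, spending $O_{-1}$ on every received task, $w_0$ on every executed task, and $O_i$ on every task sent to child $i$. Each child's rate is capped by $\min(1/w_i, 1/G_i)$, and the total by $1/G_{-1}$. Maximizing the number of tasks per unit of processor time is then a fractional knapsack, which is solved optimally by filling the cheapest options first. The name `ogo_rate` and the tuple layout are hypothetical.

```python
from fractions import Fraction


def ogo_rate(o_in, g_in, w0, children):
    """Hedged OGO-model sketch for a one-level tree (our interpretation).

    o_in, g_in: overhead and gap of the link from the parent
    w0:         time for the node to execute one task
    children:   list of (o_i, g_i, w_i): overhead, gap, child work time
    """
    # Per-task processor cost and rate cap for each way of consuming a task.
    options = [(o_in + w0, Fraction(1) / w0)]
    options += [(o_in + o, min(Fraction(1) / w, Fraction(1) / g))
                for o, g, w in children]
    budget, rate = Fraction(1), Fraction(0)
    for cost, cap in sorted(options, key=lambda oc: oc[0]):
        r = min(cap, budget / cost)        # cheapest processor time first
        rate += r
        budget -= r * cost
    return min(rate, Fraction(1) / g_in)   # parent channel: one task per g_in


print(ogo_rate(0, 1, 3, [(1, 2, 2)]))
```

With no receive overhead, $w_0 = 3$, and one child with $O = 1$, $G = 2$, $w = 2$, the child is capped at 1/2 and the node computes 1/6 with the remaining processor time, for 2/3 in total; a parent channel with $G = 2$ would cap this at 1/2.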
Flaky example
- Communicate with Alice via fax
  - takes me 5 minutes to send a message
  - ties up the fax machine for 2 minutes
- Communicate with Bob via US mail
  - takes me 1 minute to address a letter
  - doesn't reach him for 3 days
- Communicate with Carol via courier
  - takes me 30 seconds to summon a courier
  - can only deliver two tasks per day
Bandwidth-centric scheduling
- Prioritize children by communication times
- Request tasks from the parent
  - initially, request enough to fill your buffers
  - make more requests as you delegate or execute
- On receiving a task
  - execute it locally if idle
  - satisfy children's requests according to priorities
- Occasionally re-adjust priorities
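The per-task decision in this protocol can be sketched as follows. This is a deliberately simplified illustration: the `Dispatcher` class and its methods are hypothetical, and it ignores timing, the one-send-at-a-time port, re-requests to the parent, and priority re-adjustment. It only shows the order of preference when a task arrives: execute locally if idle, otherwise serve the outstanding request of the child with the smallest communication time, otherwise keep the task in the buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Hypothetical per-node dispatcher for the bandwidth-centric protocol."""
    child_comm: dict                                # child name -> time to send it one task
    idle: bool = True                               # is the local processor free?
    requests: dict = field(default_factory=dict)    # child name -> outstanding requests
    buffer: int = 0                                 # tasks held locally

    def priorities(self):
        # Children with cheaper communication come first.
        return sorted(self.child_comm, key=self.child_comm.get)

    def request(self, child):
        self.requests[child] = self.requests.get(child, 0) + 1

    def on_task(self):
        """Decide what to do with one task that has just arrived."""
        if self.idle:
            self.idle = False
            return "execute"
        for child in self.priorities():
            if self.requests.get(child, 0) > 0:
                self.requests[child] -= 1
                return "send to " + child
        self.buffer += 1
        return "buffer"


d = Dispatcher({"B": 2, "C": 1})
d.request("B")
d.request("C")
print([d.on_task() for _ in range(4)])
```

With both children requesting, the first task is executed locally, the next goes to C (the cheaper link), the third to B, and the fourth is buffered.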
Simulations
- 100 randomly generated trees
  - up to 5 children per node
  - up to 10 non-leaf nodes
- Three strategies
  - all children equal priority (first come, first served)
  - theorem's subset, equal priority
  - theorem's subset, low priority to last child
- Buffer for one task per node (buffer starts full)
- Buffer for four tasks per node
Start-up issues
- Bandwidth-centric is optimal within an additive constant
  - can execute $N$ tasks in $N/w^{-1} + k$ time ($w^{-1}$: steady-state throughput; $k$: a constant)
- Bound on start-up and clean-up time
  - tree depth × longest node's steady-state period
- Bound on period
  - $\operatorname{lcm}(w_0, w_1, \ldots, w_p, c_{-1}, c_{p+1})$
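The period bound is an LCM of times that need not be integers (a reduced subtree can have $w = 3/2$). A small sketch, assuming rational inputs: for fractions in lowest terms $a_i/b_i$, the least common multiple is $\operatorname{lcm}(a_i)/\gcd(b_i)$. The helper name `fraction_lcm` is hypothetical; `math.lcm` and multi-argument `math.gcd` need Python 3.9 or later.

```python
from fractions import Fraction
from math import gcd, lcm


def fraction_lcm(values):
    """Least common multiple of positive rationals.

    Fraction keeps each value in lowest terms a/b, and the LCM of such
    fractions is lcm(a_i) / gcd(b_i).
    """
    fracs = [Fraction(v) for v in values]
    num = lcm(*(f.numerator for f in fracs))
    den = gcd(*(f.denominator for f in fracs))
    return Fraction(num, den)


# Root A of the running example: w_0 = 3, w_1 = 3/2 (C's reduced subtree),
# c_{p+1} = 2 (link to B); the root has no c_{-1}.
print(fraction_lcm([3, Fraction(3, 2), 2]))
```

For the root of the running example this gives 6, consistent with the observed steady-state pattern of 7 tasks every 6 time units.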
Start-up
[Figure: Gantt chart for a second example tree with nodes A, B, C, D (rows: A compute, A send, B receive, B compute, C receive, C compute, C send, D receive, D compute), showing a start-up phase followed by steady state.]
- Steady state: 8 tasks every 8 time units
Modeling networks as trees
- Leiserson: Fat Trees
  - recursively partition the graph into 2 parts
- Alpern, Carter, Ferrante: PMH model
  - group nodes connected with fast links into a new node, with the original ones as its children
  - successively relax the definition of "fast"
- Shao: ENV model
  - measure bandwidth to you; group ones with similar speeds
Summary
- Explicit solution to steady-state throughput
  - heterogeneous speeds
  - heterogeneous models
- Simple computation
  - uniform tasks
  - no dependences
- Suggests simple dynamic scheduling
Current work
- Application model
  - Bag of tasks
- Performance model
  - Application's workflow
- Type of schedule
  - Application
- Grid model
  - Very heterogeneous tree
Future work
- Application model
  - Bag of tasks → parallel pipeline, unequal sizes
- Performance model
  - Application's workflow → total execution time
- Type of schedule
  - Application
- Grid model
  - Very heterogeneous tree → memory latency