Title: Bandwidth-centric scheduling of independent tasks on a heterogeneous grid
1 Bandwidth-centric scheduling of independent tasks on a heterogeneous grid
- Olivier Beaumont, ENS 
- Larry Carter, UCSD 
- Jeanne Ferrante, UCSD 
- Arnaud Legrand, ENS 
- Yves Robert, ENS
2 Grid Computing
- Distributed heterogeneous computing 
- Large number of independent tasks 
- Data begins and ends at specific site 
- Examples 
- SETI@home 
- Factoring numbers 
- Animated films 
- Drug screening 
3 Computational grid
[Figure: a computational grid — the data starts at "My computer", which connects through an Internet gateway to a cluster host, a partner site, a supercomputer, and participating PCs and workstations; intermediate nodes can compute too.]
4 Base Model Node
- Processor takes w0 time to do one task 
- Takes c-1 time to receive a task from the parent 
- Takes ci time to send a task to the i-th child 
- These three activities can be done concurrently 
- but only one send at a time
5 Example
- A is the root of the tree; all tasks start at A
[Figure: example tree — A (compute time 3) with children B (compute time 2) and C (compute time 6); D (compute time 2) is a child of C. Edge labels give the time to send one task over that link: 1 and 2 on A's two links, 1 on the C-D link.]
- Examples assume no communication of results back to the root 
6-12 Example (continued)
[Figure: the Gantt chart for this tree is built up one time step per slide, with rows for A compute, A send, B receive, B compute, C receive, C compute, C send, D receive, and D compute; each slide asks "Time?"]
13 Steady-state
[Figure: the completed Gantt chart, marked with a startup phase, a repeated steady-state pattern, and a clean-up phase.]
- Steady state: 7 tasks every 6 time units
- Total time: 16 tasks in 16 time units 
14 Steady State Problem
- One-level fork graph: a node with k leaves 
- Concurrent activities 
- w0 time to execute a task 
- ci time to send to the i-th child (only one send at a time)
- Let Ri denote the steady-state rates 
- R0 = tasks/second executed in the node 
- Ri = tasks/second sent to and done by child i 
- Constraints 
- Σ Ri ci ≤ 1 
- Ri ≤ 1/wi for i = 0, ..., k 
[Figure: the one-level fork graph — root with work time w0 and links c1, ..., ck to children with work times w1, ..., wk.]
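Written out as a linear program (the objective, maximizing the total steady-state throughput, is implicit in these slides), the problem is:

    \max \sum_{i=0}^{k} R_i
    \quad \text{s.t.} \quad \sum_{i=1}^{k} R_i c_i \le 1,
    \qquad 0 \le R_i \le 1/w_i \;\; (i = 0, \dots, k)

The next slide gives a closed-form greedy solution.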
15 Solution
- Sort by communication times: c1 ≤ c2 ≤ ... 
- Find the largest p such that c1/w1 + ... + cp/wp ≤ 1 
- For i = 1, ..., p, set Ri = 1/wi 
- keep the first p children busy 
- note that Σ Ri ci ≤ 1 so far 
- Set Rp+1 = ε/cp+1, where ε = 1 − Σ Ri ci 
- give the (p+1)-st child any leftover work 
- Set R0 = 1/w0 
- keep the root's processor busy 
[Figure: the fork graph again, with the constraints Ri ≤ 1/wi (i = 0, ..., k) and Σ Ri ci ≤ 1.]
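A minimal Python sketch of this closed-form solution (the function name one_level_rates and the argument layout are mine, not from the talk):

    def one_level_rates(w0, children):
        """children: list of (c_i, w_i) pairs for a one-level fork graph.
        Returns (R0, rates) under sum(R_i * c_i) <= 1 and R_i <= 1/w_i."""
        order = sorted(range(len(children)), key=lambda i: children[i][0])
        rates = [0.0] * len(children)
        used = 0.0                      # fraction of the send channel in use
        for i in order:
            c, w = children[i]
            if used + c / w <= 1.0:     # child i can be kept fully busy
                rates[i] = 1.0 / w
                used += c / w
            else:                       # give the next child the leftover, then stop
                rates[i] = (1.0 - used) / c
                break
        return 1.0 / w0, rates          # the root's processor is always kept busy

For instance, on the reduced tree of slide 22 (w0 = 3, children (1, 1.5) and (2, 2) as (c, w) pairs), it returns 1/3 for the root and rates 2/3 and 1/6 for the children, i.e. the 7/6 total computed on slide 23.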
16 New law of efficient management
- Delegate work to whomever it takes you the least time to explain the problem to!
- Provided the worker's desk isn't overloaded. 
- It doesn't matter if that person is a slow worker.
- Of course, slow workers will have full desks more often.
17 With communication from above
- Three concurrent activities 
- w0 time to execute a task 
- c-1 time to receive a task from the parent 
- ci time to send to the i-th child 
- New constraints 
- R-1 = Σ Ri and R-1 c-1 ≤ 1, where R-1 = receive rate
- Solution 
- R-1 = min(1/c-1, 1/w0 + 1/w1 + ... + 1/wp + ε/cp+1)
- i.e., the receive link caps the solution for the one-level tree 
[Figure: the fork graph with an extra link of cost c-1 from the node up to its parent.]
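A one-function extension of the earlier sketch for this case (again, the name subtree_rate is illustrative):

    def subtree_rate(c_parent, w0, children):
        """Equivalent rate of a node whose parent link costs c_parent."""
        R0, rates = one_level_rates(w0, children)
        return min(1.0 / c_parent, R0 + sum(rates))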
18-20 Steady-state for tree
- Reduce fork graphs to a single node, working bottom-up 
- Process the root last
[Figure: the grid of slide 3, with its subtrees successively collapsed into summary nodes ("Summary node", then "Cluster Summary" and "Internet Summary") until only "My computer" and its summary children remain.]
21 Example
- First find the equivalent work-time for the subtree rooted at C
[Figure: the example tree again — A (w = 3) with children B and C, and D a child of C.]
- The subtree's rate is 1/6 + 1/2 = 2/3, equivalent to a node with w = 3/2 
22 Example
- Replace the subtree with the equivalent node
[Figure: A (w = 3) with children B (w = 2) and the summary node (w = 1.5) that replaces C's subtree.]
- The subtree's rate is 1/6 + 1/2 = 2/3, equivalent to a node with w = 3/2 
23 Example
[Figure: the reduced two-child tree — A (w = 3), B over a link of cost 2, and the summary node C (w = 1.5) over a link of cost 1.]
- Solve the root's tree: keep C busy (i.e. RC = 1/1.5), so ε = 1 − 1/1.5 = 1/3 and RB = ε/2 = 1/6
- Total rate = 1/3 + 1/1.5 + 1/6 = 7/6 
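The whole procedure of slides 18-23 can be sketched as a bottom-up recursion reusing the one_level_rates helper above; the tree encoding and the exact link costs of the example (2 to B, 1 to C, 1 from C to D, as implied by the arithmetic on this slide) are my reconstruction:

    def tree_rate(w0, children):
        """children: list of (c_i, subtree_i), where subtree_i = (w_i, its children)."""
        reduced = []
        for c, (w, grandchildren) in children:
            # slides 21-22: replace each subtree by an equivalent single node
            reduced.append((c, 1.0 / tree_rate(w, grandchildren)))
        R0, rates = one_level_rates(w0, reduced)
        return R0 + sum(rates)

    # The example tree: A (w=3), children B (link 2, w=2) and C (link 1, w=6),
    # and D (link 1, w=2) under C.
    example = (3, [(2, (2, [])), (1, (6, [(1, (2, []))]))])
    print(tree_rate(*example))   # 1.1666... = 7/6 tasks per time unit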
24 Base Model Notation
- Three concurrent activities 
- w0 time to do one task 
- c-1 time to receive a task from the parent 
- ci time to send to the i-th child (only one send at a time)
[Figure: node diagram — a receive port (c-1) to the parent, the node's processor (w0), and a send port (c1, ..., ck) to children with work times w1, ..., wk; things stacked vertically can be done concurrently.]
25 Concurrent receive model
- Two concurrent activities 
- Receive a task from the parent 
- EITHER send to the i-th child OR execute one task
[Figure: node diagram — receive (c-1) is its own activity, while the processor (w0) and the send port (c1, ..., ck) share one resource; children have work times w1, ..., wk.]
26 Reduce to previous model
- Replace the node's processor by a new child.
[Figure: the concurrent-receive node rewritten as a base-model node — the node's own compute time becomes ∞, and its processor appears as an extra child with communication cost w0 and work time w0, alongside the original children (c1, ..., ck; w1, ..., wk).]
27 Concurrent work model
- Two concurrent activities: execute, plus one communication (receive or send)
[Figure: the concurrent-work node rewritten as a base-model node — the node's own compute time becomes ∞, its processor appears as a child reached at cost c-1 with work time w0, and each original child i is reached at cost c-1 + ci.]
28 Other models have similar reductions
[Figure: node diagrams for the fully parallel model and the concurrent send model.]
29 Revised law of efficient management (if you can't work and delegate at the same time)
- Delegate work to whomever it takes you the least time to explain the problem to!
- Provided the worker's desk isn't overloaded. 
- Do it yourself only if that's faster than explaining it to any available worker.
- And fire anyone who isn't working when you are! 
- Or reassign them to a better-communicating or slower-working manager
30 OGO model of communication
- Recall the LogP model: Latency, Overhead, Gap 
- Latency is irrelevant to the steady state 
- We allow different Overheads at each end 
- OGO model 
- the channel can send one task every G time units 
- the node's processor is interrupted for time O 
- Easy to extend our methods 
- use the no-concurrency model with the O's 
- take the min with 1/G
31 Flaky example
- Communicate with Alice via fax 
- takes me 5 minutes to send a message 
- ties up the fax machine for 2 minutes 
- Communicate with Bob via US mail 
- takes me 1 minute to address the letter 
- doesn't reach him for 3 days 
- Communicate with Carol via courier 
- takes me 30 seconds to summon the courier 
- can only deliver two tasks per day
32 Bandwidth-centric scheduling
- Prioritize children by communication times 
- Request tasks from the parent 
- initially, request enough to fill your buffers 
- make more requests as you delegate and execute 
- On receiving a task (sketched below) 
- execute it locally if idle 
- otherwise satisfy children's requests according to priorities
- Occasionally re-adjust priorities 
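A hedged sketch of this demand-driven protocol for a single node; the class and field names are illustrative, and the send/execute callbacks stand in for whatever messaging layer is actually used:

    class NodeScheduler:
        def __init__(self, child_comm_times, buffer_size=4):
            # priority order: children sorted by communication time
            self.priority = sorted(range(len(child_comm_times)),
                                   key=lambda i: child_comm_times[i])
            self.waiting = set()       # children that have asked for a task
            self.buffer = []           # tasks held locally
            self.busy = False          # local processor state
            self.buffer_size = buffer_size

        def requests_to_parent(self):
            # initially enough to fill the buffer; re-invoked each time
            # a task is delegated or executed
            return self.buffer_size - len(self.buffer)

        def on_task(self, task, send, execute):
            """Handle one task arriving from the parent."""
            if not self.busy:
                self.busy = True
                execute(task)                    # execute locally if idle
                return
            for child in self.priority:          # else delegate by priority
                if child in self.waiting:
                    self.waiting.discard(child)
                    send(child, task)
                    return
            self.buffer.append(task)             # nobody is ready: buffer it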
33 Simulations
- 100 randomly-generated trees 
- up to 5 children per node 
- up to 10 non-leaf nodes 
- Three strategies 
- all children equal priority (first-come, first-served)
- the theorem's subset, equal priority 
- the theorem's subset, low priority to the last child 
- Buffer for one task per node (buffer starts full) 
- Buffer for four tasks per node
34 Start-up issues
- Bandwidth-centric scheduling is optimal within an additive constant
- Can execute N tasks in N/w-1 + k time 
- Bound on start-up and clean-up time 
- tree depth × the longest node's steady-state period 
- Bound on the period 
- lcm(w0, w1, ..., wp, c-1, cp+1) 
35 Start-up
[Figure: a second example tree (root A with children B and C, and D under C) and its Gantt chart, marked with the start-up phase that precedes the repeated steady-state pattern.]
- Steady state: 8 tasks every 8 time units 
36 Modeling networks as trees
- Leiserson: Fat Trees 
- recursively partition the graph into 2 parts 
- Alpern, Carter, Ferrante: PMH model 
- group nodes connected by fast links into a new node, with the original nodes as its children
- successively relax the definition of "fast" 
- Shao: ENV model 
- measure bandwidth to you; group nodes with similar speeds
37 Summary
- Explicit solution to steady-state throughput 
- heterogeneous speeds 
- heterogeneous models 
- Simple computation 
- uniform tasks 
- no dependences 
- Suggests simple dynamic scheduling 
38 Current work
- Application model 
- Bag of tasks 
- Performance model 
- Application's workflow 
- Type of schedule 
- Application 
- Grid model 
- Very heterogeneous tree
39 Future work
- Application model 
- Bag of tasks → parallel and pipeline, unequal sizes
- Performance model 
- Application's workflow → total execution time 
- Type of schedule 
- Application 
- Grid model 
- Very heterogeneous tree → memory and latency