Transcript and Presenter's Notes

Title: Bandwidth-centric scheduling of independent tasks on a heterogeneous grid


1
Bandwidth-centric scheduling of independent tasks on a heterogeneous grid
  • Olivier Beaumont, ENS
  • Larry Carter, UCSD
  • Jeanne Ferrante, UCSD
  • Arnaud Legrand, ENS
  • Yves Robert, ENS

2
Grid Computing
  • Distributed heterogeneous computing
  • Large number of independent tasks
  • Data begins and ends at specific site
  • Examples
  • SETI@home
  • Factoring numbers
  • Animated films
  • Drug screening

3
Computational grid
[Diagram: a tree of grid resources. The data starts at "My computer", which connects through an Internet gateway to a cluster host, a partner site, a supercomputer, and participating PCs and workstations. Intermediate nodes can compute too.]
4
Base Model Node
  • Processor takes w_0 time to do one task
  • Takes c_{-1} time to receive a task from its parent
  • Takes c_i time to send a task to the i-th child
  • These 3 activities can be done concurrently
  • but only one send at a time

5
Example
A is the root of the tree; all tasks start at A.
[Diagram: tree with root A (w = 3), children B (link cost 2, w = 2) and C (link cost 1, w = 6), and C's child D (link cost 1, w = 2). Edge labels give the time to send one task across that link (e.g., from A to B); node labels give the time to compute one task at that node (e.g., at C).]
Examples assume no communication of results back to the root.
6-12
Example (animation)
[These slides step through a Gantt chart of the schedule for this tree, one time unit at a time: rows for A compute, A send, B receive, B compute, C receive, C compute, C send, D receive, and D compute, each step asking "Time?".]
13
Steady-state
[The same Gantt chart, now annotated with a startup phase, a repeated steady-state pattern, and a clean-up phase.]
Steady-state: 7 tasks every 6 time units.
Total time: 16 tasks in 16 time units.
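As a check (assuming the link costs and work times read off the example tree above), the per-node rates in the repeated pattern add up to the stated throughput:

    R_A + R_B + R_C + R_D = 1/3 + 1/6 + 1/6 + 1/2 = 7/6 tasks per time unit,
    i.e. 7 tasks every 6 time units.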
14
Steady State Problem
  • One-level fork graph: a node with k leaves
  • Concurrent activities
  • w_0 time to execute a task
  • c_i time to send to the i-th child (only one send at a time)
  • Let R_i denote the steady-state rates
  • R_0 tasks/second executed at the node itself
  • R_i tasks/second sent to and done by child i
  • Constraints (collected into a small linear program below)
  • Σ_{i≥1} R_i c_i ≤ 1
  • R_i ≤ 1/w_i for i = 0, ..., k

[Diagram: the fork graph — a root with work time w_0 and k children with link costs c_1, ..., c_k and work times w_1, ..., w_k.]
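Collected together, the bullets above form a small linear program (a restatement in the slide's notation; maximizing total throughput is the implicit objective):

    maximize    R_0 + R_1 + ... + R_k                       (tasks completed per unit time)
    subject to  R_1 c_1 + ... + R_k c_k ≤ 1                 (only one send at a time)
                0 ≤ R_i ≤ 1/w_i        for i = 0, ..., k    (no processor overloaded)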
15
Solution
  • Sort children by communication times: c_1 ≤ c_2 ≤ ...
  • Find the largest p such that Σ_{i≤p} c_i/w_i ≤ 1
  • For i = 1, ..., p, set R_i = 1/w_i
  • keep the first p children busy
  • note that Σ R_i c_i ≤ 1 so far
  • Set R_{p+1} = ε/c_{p+1}, where ε = 1 − Σ_{i≤p} R_i c_i
  • give the (p+1)-st child any leftover work
  • Set R_0 = 1/w_0
  • keep the root's processor busy
  • (a code sketch of this construction follows below)

[Diagram: the same fork graph, with the constraints repeated: R_i ≤ 1/w_i for i = 0, ..., k and Σ R_i c_i ≤ 1.]
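A minimal Python sketch of this construction, for concreteness (the function name, argument layout, and comments are illustrative, not from the paper):

    def fork_rates(w0, children):
        """Closed-form steady-state rates for a one-level fork graph (sketch).

        w0       -- root's time per task
        children -- list of (c_i, w_i): send time and work time of each child
        Returns (R0, [R1, ..., Rk]) with children kept in the order given.
        """
        order = sorted(range(len(children)), key=lambda i: children[i][0])
        rates = [0.0] * len(children)
        used = 0.0                              # fraction of the send channel in use
        for i in order:                         # children by increasing send time
            c, w = children[i]
            if used + c / w <= 1.0:             # this child can be kept fully busy
                rates[i] = 1.0 / w
                used += c / w
            else:                               # give it whatever bandwidth is left
                rates[i] = (1.0 - used) / c
                break                           # slower-to-reach children get nothing
        return 1.0 / w0, rates                  # the root's own processor stays busy

Sorting by c_i rather than by w_i is the point of the title: the parent's outgoing bandwidth, not the children's speeds, is the scarce resource.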
16
New law of efficient management
  • Delegate work to whomever it takes you the least
    time to explain the problem to!
  • Provided the worker's desk isn't overloaded.
  • It doesn't matter if that person is a slow
    worker.
  • Of course, slow workers will have full desktops
    more often.

17
With communication from above
  • Three concurrent activities
  • w_0 time to execute a task
  • c_{-1} time to receive a task from the parent
  • c_i time to send to the i-th child
  • New constraints
  • R_{-1} = Σ_{i≥0} R_i and R_{-1} c_{-1} ≤ 1, where R_{-1} is the receive rate
  • Solution
  • R_{-1} = min( 1/c_{-1}, 1/w_0 + Σ_{i≤p} 1/w_i + ε/c_{p+1} )

[Diagram: the fork graph with an extra receive channel of cost c_{-1} to the parent; the second argument of the min is the solution for the one-level tree.]
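For instance, applied to node C of the running example (w_0 = 6, one child D with c_1 = 1 and w_1 = 2, link from A with c_{-1} = 1):

    R_{-1} = min( 1/1, 1/6 + 1/2 ) = 2/3 tasks per time unit,

so the C-D subtree behaves like a single node with w = 1/R_{-1} = 3/2 — exactly the summary used on the later example slides.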
18
Steady-state for tree
Process root last
[Diagram: the grid tree from slide 3, before any reduction.]
Reduce fork graphs to a single node, bottom-up.
19
Steady-state for tree
Process root last
[Diagram: the same tree, with the partner site's fork graph now replaced by a single summary node.]
Reduce fork graphs to a single node, bottom-up.
20
Steady-state for tree
Process root last
[Diagram: the tree fully reduced — "My computer" with an Internet summary node and a cluster summary node as its only children.]
Reduce fork graphs to a single node, bottom-up (a recursive sketch follows below).
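A minimal recursive sketch of this bottom-up reduction (data layout and names are illustrative; a subtree is passed as (w, children) and summarized by its deliverable rate, i.e. an equivalent node with w = 1/rate):

    def tree_rate(w0, children, c_in=None):
        """Steady-state task rate of a subtree in the base model (sketch).

        w0       -- this node's time per task
        children -- list of (c, (w, grandchildren)) pairs
        c_in     -- time to receive one task from the parent (None at the root)
        """
        # Reduce each child subtree to an equivalent single node: w_eq = 1 / rate.
        eq = [(c, 1.0 / tree_rate(w, gc, c)) for c, (w, gc) in children]
        rate, used = 1.0 / w0, 0.0              # keep this node's processor busy
        for c, w in sorted(eq):                 # children by increasing send time
            share = min(1.0 / w, (1.0 - used) / c)   # keep busy, or give leftovers
            rate += share
            used += share * c
            if used >= 1.0:                     # send channel saturated
                break
        # A node cannot usefully consume tasks faster than its parent can send them.
        return rate if c_in is None else min(rate, 1.0 / c_in)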
21
Example
First find the equivalent work-time for the subtree rooted at C.
[Diagram: the example tree again — A (w = 3) with children B (c = 2, w = 2) and C (c = 1, w = 6); C has child D (c = 1, w = 2).]
Subtree's rate is 1/6 + 1/2 = 2/3, equivalent to a node with w = 3/2.
22
Example
Replace the subtree with its equivalent node.
[Diagram: A (w = 3) with children B (c = 2, w = 2) and C (c = 1, w = 1.5), where C now stands for the reduced C-D subtree.]
Subtree's rate is 1/6 + 1/2 = 2/3, equivalent to a node with w = 3/2.
23
Example
[Diagram: the reduced tree — A (w = 3) with children C (c = 1, w = 1.5) and B (c = 2, w = 2).]
Solve the root's tree: keep C busy (i.e. R_C = 1/1.5); ε = 1 − R_C·c_C = 1 − (1/1.5)·1 = 1/3; R_B = ε/c_B = (1/3)/2 = 1/6; total rate = 1/3 + 1/1.5 + 1/6 = 7/6.
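The same 7/6 falls out of the tree_rate sketch given after slide 20 (illustrative encoding: each subtree as (w, [(c, subtree), ...])):

    # A (w=3) with children B (c=2, w=2) and C (c=1, w=6); C has child D (c=1, w=2)
    tree_rate(3, [(2, (2, [])), (1, (6, [(1, (2, []))]))])   # -> 7/6 tasks per time unit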
24
Base Model Notation
  • Three concurrent activities
  • w_0 time to do one task
  • c_{-1} time to receive a task from the parent
  • c_i time to send to the i-th child (only one at a time)

[Diagram: a node with a receive channel c_{-1} to its parent, the node's processor w_0, and a send port with channels c_1, ..., c_k to children w_1, ..., w_k; things stacked vertically can be done concurrently.]
25
Concurrent receive model
  • Two concurrent activities
  • Receive task from parent.
  • EITHER send to i-th child OR execute one task

[Diagram: the same node, but the processor w_0 and the send channels c_1, ..., c_k now share one resource; only the receive channel c_{-1} is separate.]
26
Reduce to previous model
  • Replace the node's processor by a new child.

[Diagram: left, the concurrent-receive-model node; right, an equivalent base-model node whose own work time becomes ∞ and which gains an extra child with communication cost w_0 and work time w_0, standing in for the original processor. A sketch of this transformation follows below.]
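A one-line sketch of that transformation (hypothetical helper; its output feeds the one-level solution above):

    def reduce_concurrent_receive(w0, children):
        """Map a concurrent-receive-model node onto the base model (sketch).

        Executing a task locally ties up the shared processor/send resource
        for w0, exactly like sending to an extra child that costs w0 to reach
        and takes w0 to compute; the node itself then does no local work.
        """
        return float("inf"), children + [(w0, w0)]   # (new w0, children as (c, w) pairs)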
27
Concurrent work model
  • Two concurrent activities: execute a task, plus one
    communication (either a receive or a send) at a time

[Diagram: left, the concurrent-work-model node; right, an equivalent base-model node whose own work time becomes ∞ and whose children are: one child with cost c_{-1} and work time w_0 (the original processor — every locally executed task must first be received on the shared channel) and, for each original child i, a child with cost c_{-1} + c_i and work time w_i (a forwarded task is received and then re-sent on the same channel).]
28
Other models have similar reductions
[Diagrams: three more node models, each reducible to the base model by a similar transformation:]
  • Fully sequential model (receive, send, and compute all share one resource)
  • Fully parallel model (receive, each send, and compute all proceed concurrently)
  • Concurrent send model
29
Revised law of efficient management (if you can't work and delegate at the same time)
  • Delegate work to whomever it takes you the least
    time to explain the problem to!
  • Provided the worker's desk isn't overloaded.
  • Do it yourself only if that's faster than
    explaining it to any available worker.
  • And fire anyone who isn't working when you are!
  • Or reassign them to a better-communicating or
    slower-working manager.

30
OGO model of communication
  • Recall the LogP model: Latency, overhead, gap
  • Latency is irrelevant to steady state
  • We allow different overheads at each end
  • OGO model
  • the channel can send one task every G time units
  • the node's processor is interrupted for time O
  • Easy to extend our methods
  • use the no-concurrency (fully sequential) model with the O's as costs
  • take the min with 1/G

31
Flaky example
  • Communicate with Alice via fax
  • takes me 5 minutes to send message
  • ties up the fax machine for 2 minutes
  • Communicate with Bob via US mail
  • takes me 1 minute to address letter
  • doesn't reach him for 3 days
  • Communicate with Carol via courier
  • takes me 30 seconds to summon courier
  • can only deliver two tasks per day
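Read against the OGO model of the previous slide, the three channels map roughly to (my mapping, for illustration):
  • Alice / fax: O ≈ 5 minutes of my time per task, G ≈ 2 minutes per task on the channel
  • Bob / US mail: O ≈ 1 minute per task; the 3-day latency is irrelevant to steady state
  • Carol / courier: O ≈ 30 seconds per task, G ≈ 12 hours per task (two tasks per day)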

32
Bandwidth-centric scheduling
  • Prioritize children by communication times
  • Request tasks from your parent
  • initially, request enough to fill your buffers
  • make more requests as you delegate or execute
  • On receiving a task
  • execute it locally if your processor is idle
  • satisfy children's requests according to their priorities
  • Occasionally re-adjust priorities
  • (a per-task sketch follows below)
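A per-task sketch of the decision rule above (names and data shapes are illustrative; a real node would also issue new requests upstream as its buffers drain):

    def place_task(processor_idle, pending_requests, children_by_priority):
        """Decide where one newly received task goes (bandwidth-centric sketch).

        processor_idle       -- True if this node's own processor is idle
        pending_requests     -- dict: child -> outstanding task requests from it
        children_by_priority -- children sorted by communication time, fastest first
        Returns "self" or the chosen child.
        """
        if processor_idle:
            return "self"                        # execute locally if idle
        for child in children_by_priority:       # otherwise delegate by priority
            if pending_requests.get(child, 0) > 0:
                pending_requests[child] -= 1
                return child
        return "self"                            # nobody is asking: keep it buffered here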

33
Simulations
  • 100 randomly-generated trees
  • up to 5 children per node
  • up to 10 non-leaf nodes
  • Three strategies
  • All children equal priority (first-come,
    first-served)
  • Theorem's subset, equal priority
  • Theorem's subset, low priority to the last child
  • Two buffer sizes
  • Buffer for one task per node (buffer starts full)
  • Buffer for four tasks per node

34
Start-up issues
  • Bandwidth-centric is optimal to within an additive
    constant
  • can execute N tasks in time N/(steady-state rate) + k
  • Bound on the start-up and clean-up time k
  • tree depth × the longest steady-state period over all nodes
  • Bound on a node's period
  • lcm(w_0, w_1, ..., w_p, c_{-1}, c_{p+1})
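Putting the three bullets together (the same quantities as above, just collected into one bound):

    time(N) ≤ N / (steady-state rate) + k,
    where k ≤ tree depth × max over nodes of that node's period,
    and each node's period ≤ lcm(w_0, w_1, ..., w_p, c_{-1}, c_{p+1}).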

35
Start-up
[Gantt chart for a second example tree (root A with children B and C, and D below C, with different link costs and work times), annotated with a noticeably longer startup phase before the repeating steady-state pattern.]
Steady-state: 8 tasks every 8 time units.
36
Modeling networks as trees
  • Leiserson's fat trees
  • recursively partition the graph into 2 parts
  • Alpern, Carter, Ferrante: the PMH model
  • group nodes connected by fast links into a new
    node, with the original ones as its children
  • successively relax the definition of "fast"
  • Shao's ENV model
  • measure bandwidth to you; group ones with similar
    speeds

37
Summary
  • Explicit solution to steady-state throughput
  • heterogeneous speeds
  • heterogeneous models
  • Simple computation
  • uniform tasks
  • no dependences
  • Suggests simple dynamic scheduling

38
Current work
  • Application model
  • Bag of tasks
  • Performance model
  • Application's workflow
  • Type of schedule
  • Application
  • Grid model
  • Very heterogeneous tree

39
Future work
  • Application model
  • Bag of tasks → parallel pipeline, unequal
    sizes
  • Performance model
  • Application's workflow → total execution time
  • Type of schedule
  • Application
  • Grid model
  • Very heterogeneous tree → memory latency