Evaluating Task Assignment Policies for Distributed Supercomputing Servers

Provided by: rob1129
Learn more at: http://www.cs.cmu.edu

1
Evaluating Task Assignment Policies for
Distributed Supercomputing Servers
Bianca Schroeder, Mor Harchol-Balter
Computer Science Dept., Carnegie Mellon University
www.cs.cmu.edu/bianca,harchol
2
The Distributed Server Model
Task Assignment Policy (TAP): rule for assigning jobs to hosts
  • Jobs are processed First-Come-First-Served (FCFS)
  • Jobs are run to completion
  • Users provide upper bounds on runtime.


Motivation: distributed servers such as Xolas, Pleiades, NASA Ames, and PSC
3
Commonly used TAPs
  • 1. Random
  • 2. Round-Robin
  • 3. Shortest-Queue: send job to the host with the fewest jobs.
  • 4. Least-Work-Left: send job to the host with the least total work left.
  • 5. Runtime-Based-E: separate jobs by runtime so that hosts have equal expected load.
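
The five policies above can be sketched as small dispatch functions. This is a minimal illustration, not the authors' simulator: the representation of each host as a list of remaining runtimes (`queues`) and all function names are assumptions made for the example.

```python
import bisect
import random


def assign_random(num_hosts, rng):
    """Random: pick a host uniformly at random."""
    return rng.randrange(num_hosts)


def make_round_robin(num_hosts):
    """Round-Robin: return a function that cycles through hosts 0, 1, 2, ..."""
    next_host = 0

    def assign():
        nonlocal next_host
        host = next_host
        next_host = (next_host + 1) % num_hosts
        return host

    return assign


def assign_shortest_queue(queues):
    """Shortest-Queue: host with the fewest jobs (ties go to the lowest index)."""
    return min(range(len(queues)), key=lambda h: len(queues[h]))


def assign_least_work_left(queues):
    """Least-Work-Left: host with the smallest sum of remaining runtimes."""
    return min(range(len(queues)), key=lambda h: sum(queues[h]))


def assign_runtime_based(runtime, cutoffs):
    """Runtime-Based: host i serves runtimes in (cutoffs[i-1], cutoffs[i]].

    `cutoffs` is a sorted list with one entry fewer than the number of hosts.
    """
    return bisect.bisect_left(cutoffs, runtime)
```

Note that Shortest-Queue and Least-Work-Left need the current state of the hosts, while Runtime-Based needs only the job's (estimated) runtime and a fixed list of cutoffs.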

4
What is a good TAP?
  • We want to minimize
  • 1. mean response time.
  • 2. mean slowdown.
  • 3. variance in slowdown.

Additionally, desire fairness.
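
The three metrics can be computed from a list of arrival times and runtimes at one FCFS host. The slides do not define slowdown; the sketch below assumes the usual definition, response time divided by runtime. Function names and the tiny example are illustrative only.

```python
from statistics import mean, pvariance


def fcfs_response_times(arrivals, runtimes):
    """Response time (waiting time plus runtime) of each job at one FCFS host.

    Jobs are given in arrival order and run to completion.
    """
    responses = []
    host_free_at = 0.0
    for arrival, runtime in zip(arrivals, runtimes):
        start = max(arrival, host_free_at)
        host_free_at = start + runtime
        responses.append(host_free_at - arrival)
    return responses


def summarize(responses, runtimes):
    """Mean response time, mean slowdown, and variance of slowdown."""
    # Assumed definition: slowdown = response time / runtime.
    slowdowns = [r / x for r, x in zip(responses, runtimes)]
    return {
        "mean_response": mean(responses),
        "mean_slowdown": mean(slowdowns),
        "var_slowdown": pvariance(slowdowns),
    }


# One long job followed by two short jobs: the short jobs wait behind the
# long one, so their slowdown is far higher than that of the long job.
example_runtimes = [10.0, 1.0, 1.0]
example_responses = fcfs_response_times([0.0, 1.0, 2.0], example_runtimes)
example_summary = summarize(example_responses, example_runtimes)
```

Slowdown penalizes a short job stuck behind a long one much more than response time does, which is why the two metrics can rank policies differently.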
5
Which TAP is best according to literature?
  • 1. Round-Robin
  • 2. Random
  • 3. Shortest-Queue
  • 4. Least-Work-Left
  • 5. Runtime-Based-E

Optimal for exponentially distributed runtimes (Wolff 1989).
Runtime-Based-E: better for heavy-tailed runtime distributions (Harchol-Balter 1998).
6
Simulation Setup
  • Runtimes are taken from PSC's Cray J90 and C90 traces.
  • Arrival times are
    A. Poisson (i.i.d.), or
    B. taken from traces.
  • The system has 2 or more hosts.
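
A simulation of this kind can be sketched in a few lines. The PSC traces are not reproduced here, so the demo at the bottom draws runtimes from a synthetic heavy-tailed distribution purely as a stand-in; that choice, the function names, and the `choose_host` callback signature are all assumptions for the example.

```python
import random


def poisson_arrivals(num_jobs, rate, rng):
    """Arrival times of a Poisson process: i.i.d. exponential interarrival times."""
    t = 0.0
    arrivals = []
    for _ in range(num_jobs):
        t += rng.expovariate(rate)
        arrivals.append(t)
    return arrivals


def simulate(arrivals, runtimes, num_hosts, choose_host):
    """Mean slowdown for FCFS hosts where jobs run to completion.

    choose_host(work_left, runtime) returns a host index; work_left[h] is the
    unfinished work at host h at the moment the job arrives.
    """
    free_at = [0.0] * num_hosts
    slowdowns = []
    for arrival, runtime in zip(arrivals, runtimes):
        work_left = [max(0.0, f - arrival) for f in free_at]
        host = choose_host(work_left, runtime)
        start = max(arrival, free_at[host])
        free_at[host] = start + runtime
        slowdowns.append((free_at[host] - arrival) / runtime)
    return sum(slowdowns) / len(slowdowns)


def least_work_left(work_left, runtime):
    """Least-Work-Left policy for use with simulate()."""
    return work_left.index(min(work_left))


if __name__ == "__main__":
    rng = random.Random(42)
    n, hosts, load = 10_000, 2, 0.5
    # Stand-in runtimes (NOT the PSC traces): Pareto with shape 1.5, mean 3.
    runtimes = [rng.paretovariate(1.5) for _ in range(n)]
    rate = load * hosts / 3.0
    arrivals = poisson_arrivals(n, rate, rng)
    print("LWL mean slowdown:", simulate(arrivals, runtimes, hosts, least_work_left))
```

Replaying trace arrival times instead (option B) only requires passing the recorded timestamps as `arrivals`.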

7
Simulation Results for Slowdown
[Figure: slowdown vs. system load for Random, LWL, and Runtime-Based.]
8
Simulation Results for Variance of Slowdown
[Figure: variance of slowdown vs. system load for Random, LWL, and Runtime-Based.]
9
WHY does Runtime-Based work so well?
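Recall the P-K formula for the M/G/1 FCFS queue:
$$E[W] = \frac{\lambda \, E[X^2]}{2(1 - \rho)}$$
where $E[W]$ is the mean waiting time, $E[X^2]$ is the second moment of the runtime distribution, $\lambda$ is the arrival rate, and $\rho = \lambda E[X]$ is the load.
Runtime-Based reduces the variance of the runtime distribution at the hosts. No other policy does this!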
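
A small numeric sketch shows how strongly the second moment drives waiting time. All numbers are made up for illustration: 99% of jobs have runtime 1, 1% have runtime 100, and jobs arrive at total rate 0.5 to two hosts. The function name is likewise an assumption.

```python
def pk_mean_wait(arrival_rate, mean_runtime, second_moment):
    """Mean waiting time in an M/G/1 FCFS queue (Pollaczek-Khinchine formula)."""
    load = arrival_rate * mean_runtime
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1) for a stable queue")
    return arrival_rate * second_moment / (2 * (1 - load))


# Illustrative workload: runtime 1 with probability 0.99, 100 with 0.01.
mean_x = 0.99 * 1 + 0.01 * 100          # E[X]   = 1.99
second_x = 0.99 * 1 + 0.01 * 100 ** 2   # E[X^2] = 100.99

# Random split: each host gets half the arrivals and the full mix of runtimes.
mixed_wait = pk_mean_wait(0.25, mean_x, second_x)

# Runtime-based split: host 1 gets only short jobs, host 2 only long jobs.
short_wait = pk_mean_wait(0.5 * 0.99, 1.0, 1.0)
long_wait = pk_mean_wait(0.5 * 0.01, 100.0, 100.0 ** 2)

# Mean waiting time over all jobs under the runtime-based split.
split_wait = 0.99 * short_wait + 0.01 * long_wait
```

In this example both splits keep the load on each host at about 0.5, yet the short jobs, which are the vast majority, no longer queue behind long ones, so `split_wait` comes out far below `mixed_wait`.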
10
Simulation Results for Slowdown
[Figure: slowdown vs. system load for Random, LWL, and Runtime-Based.]
11
Is balancing load optimal?
All policies we have seen so far balance load.


12
New: Load Unbalancing
Runtime-Based-U
13
Simulation results for Runtime-Based-U: slowdown
[Figure: slowdown vs. system load for Runtime-Based-E, Runtime-Based-U-fair, and Runtime-Based-U-opt.]
14
Simulation results for Runtime-Based-U: variance in slowdown
[Figure: variance of slowdown vs. system load for Runtime-Based-E, Runtime-Based-U-fair, and Runtime-Based-U-opt.]
15
Why does Runtime-Based-U work so well?
  • Like Runtime-Based-E, it reduces the variance in job sizes.
  • It unbalances load.

16
How unbalanced is the load under Runtime-Based-U?
[Figure: fraction of total load going to host 1 vs. system load for Runtime-Based-E, Runtime-Based-U-fair, and Runtime-Based-U-opt.]
17
Difficulties for runtime-based policies
  • Knowing runtimes.
    1. Downey 1997
    2. Gibbons 1997
    3. Smith et al. 1998
  • Finding cutoffs.
    Simple calculation using
    1. P-K formula
    2. Only 1/10 of trace data

18
Conclusion
Differences between TAPs are huge! It is not intuitive, before analysis, which TAPs are good!
  • Reducing variance at hosts is important.
  • Load unbalancing may be better than load balancing.
  • Penalizing long jobs may actually be fair.

19
Simulation Results for Slowdown
[Figure: slowdown vs. system load.]
20
Simulation Results for Slowdown
[Figure: slowdown vs. system load.]
21
Simulation results for scaled interarrival times
22
Simulation results for scaled interarrival times
23
Simulation results for more than 2 hosts
[Figure: slowdown vs. number of hosts.]
24
The SITA-E algorithm: Size Interval Task Assignment with Equal Load
[Diagram: outside arrivals are split by job size: S jobs go to Host 1, M jobs to Host 2, L jobs to Host 3, and XL jobs to Host 4.]
The cutoffs are chosen so as to balance the load at the hosts.
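
One way to pick load-balancing cutoffs from a sample of runtimes is to sort the sample and cut wherever the cumulative work crosses an equal share of the total. This is a minimal sketch under that assumption, not necessarily the procedure used for the slides; the function name is hypothetical.

```python
def equal_load_cutoffs(runtimes, num_hosts):
    """Cutoffs such that each size interval carries about 1/num_hosts of the work.

    Returns num_hosts - 1 cutoffs (fewer if the sample is too small);
    host i serves runtimes in (cutoffs[i-1], cutoffs[i]].
    """
    ordered = sorted(runtimes)
    total = sum(ordered)
    cutoffs = []
    running = 0.0
    for x in ordered:
        running += x
        # Share of total work that the hosts filled so far should carry.
        target = total * (len(cutoffs) + 1) / num_hosts
        if len(cutoffs) < num_hosts - 1 and running >= target:
            cutoffs.append(x)
    return cutoffs
```

For example, `equal_load_cutoffs([1, 1, 2, 4, 8], 2)` puts the cutoff at 4: jobs of size at most 4 sum to 8 units of work, and the single job of size 8 carries the other 8. With heavy-tailed runtimes this typically means that most jobs go to the first host while a few very long jobs occupy the last one.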
25
How do you find the optimal or fair cutoff?
  • 1. Fix the search space for cutoffs.
  • 2. For each potential cutoff, use a) the trace data or b) the P-K formula to determine the expected slowdown.
  • 3. Pick the best cutoff for your metric.
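
The search can be sketched for two hosts using option b), the P-K formula. The sketch assumes Poisson arrivals, FCFS service, and mean slowdown over all jobs as the metric; it uses the fact that under FCFS a job's waiting time is independent of its own runtime, so the expected slowdown at a host is 1 + E[W] · E[1/X]. Function names and the sample workload are illustrative assumptions.

```python
import math


def host_slowdown(jobs, arrival_rate):
    """Expected slowdown at one M/G/1 FCFS host serving the runtimes in `jobs`."""
    n = len(jobs)
    mean_x = sum(jobs) / n
    second_x = sum(x * x for x in jobs) / n
    mean_inv_x = sum(1.0 / x for x in jobs) / n
    load = arrival_rate * mean_x
    if load >= 1:
        return math.inf  # overloaded host: cutoff is infeasible
    wait = arrival_rate * second_x / (2 * (1 - load))  # P-K formula
    return 1 + wait * mean_inv_x


def best_cutoff(runtimes, arrival_rate, candidates):
    """Return (cutoff, mean slowdown) minimizing mean slowdown over all jobs.

    Jobs with runtime <= cutoff go to host 1, the rest to host 2.
    Returns None if no candidate leaves both hosts with at least one job.
    """
    best = None
    n = len(runtimes)
    for cutoff in candidates:
        short = [x for x in runtimes if x <= cutoff]
        long_ = [x for x in runtimes if x > cutoff]
        if not short or not long_:
            continue
        # Each host receives arrivals in proportion to its share of jobs.
        s_short = host_slowdown(short, arrival_rate * len(short) / n)
        s_long = host_slowdown(long_, arrival_rate * len(long_) / n)
        overall = (len(short) * s_short + len(long_) * s_long) / n
        if best is None or overall < best[1]:
            best = (cutoff, overall)
    return best


# Illustrative sample: 99 short jobs and one long job, total arrival rate 0.5.
sample = [1.0] * 99 + [100.0]
chosen = best_cutoff(sample, 0.5, candidates=[0.5, 1.0, 100.0])
```

A different metric, for instance the gap between `s_short` and `s_long` as one possible notion of fairness, can be swapped in at the comparison step without changing the rest of the search.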