Transcript and Presenter's Notes

Title: Grid Prediction and Scheduling with Variance


1
Grid Prediction and Scheduling with Variance
  • Jennifer M. Schopf
  • Argonne National Lab
  • Feb 27, 2003

2
Scheduling and Prediction on the Grid
  • First step of Grid computing: basic
    functionality
  • Run my job
  • Transfer my data
  • Security
  • Next step: more efficient use of the resources
  • Scheduling
  • Prediction
  • Monitoring

3
How can these resources be used effectively?
  • Efficient scheduling
  • Selection of resources
  • Mapping of tasks to resources
  • Allocating data
  • Accurate prediction of performance
  • Good performance prediction modeling techniques

4
Outline
  • Predicting large file transfers
  • Joint with Sudharshan Vazhkudai
  • Scheduling with variance
  • Joint with Fran Berman
  • Variance scheduling with better predictors
  • Joint with Lingyun Yang

5
Grid Scheduling Architecture (GGF)
6
Scheduling
  • Gather information
  • Make a decision
  • Take an action
  • The action can be to run a job or to transfer a file
  • Replica selection is becoming more commonplace

7
High Energy Physics Data Movement
[Diagram: tiered HEP data movement. The detector produces data at
PBytes/sec; Tier 0 (CERN Computer Centre, with an offline processor
farm of ~20 TIPS) receives it at 100 MBytes/sec; Tier 1 regional
centres (FermiLab ~4 TIPS; France, Italy, Germany) are fed at 622
Mbits/sec (or air freight, deprecated); Tier 2 institutes (~0.25 TIPS)
connect at 622 Mbits/sec and hold physics data caches; Tier 4
physicist workstations read at ~1 MBytes/sec.
Image courtesy H. Newman, Caltech and C. Kesselman, ISI]
8
Data Replication
  • Extremely large data sets
  • Distributed storage sites
  • One file may be available from a number of
    different sources
  • Question: where is the best source to copy
    it from?

9
Replica Selection
  • Why not use something like Network Weather
    Service (NWS) probes?
  • Wolski and Swany, UCSB
  • Logging and prediction
  • Small data transfers
  • CPU load, memory, etc.

10
Predictions of Large File Transfers
  • Large file transfers don't look like small file
    transfers

11
Predicting File Transfers
  • (Work with Sudharshan Vazhkudai)
  • Log GridFTP file transfers
  • Part of Globus Toolkit
  • Allows buffer tuning, parallel streams
  • De facto standard for grid file transfers
  • Use standard statistical predictions
  • Means, medians, autoregressive techniques
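
As a rough illustration of the predictor families named above (the window sizes and the exponential weighting below are assumptions for the sketch, not the settings used in the study):

```python
def mean_predictor(history, window=5):
    """Predict the next rate as the mean of the last `window` samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def median_predictor(history, window=5):
    """Predict the next rate as the median of the last `window` samples."""
    recent = sorted(history[-window:])
    n = len(recent)
    mid = n // 2
    return recent[mid] if n % 2 else (recent[mid - 1] + recent[mid]) / 2

def ewma_predictor(history, alpha=0.5):
    """Autoregressive-style predictor: exponentially weighted moving
    average, so recent transfers count more than old ones."""
    pred = history[0]
    for v in history[1:]:
        pred = alpha * v + (1 - alpha) * pred
    return pred

rates = [4.2, 3.9, 5.1, 4.8, 4.5, 4.7]   # logged transfer rates, MB/s
print(mean_predictor(rates))     # mean of the last 5 samples
print(median_predictor(rates))   # -> 4.7
print(ewma_predictor(rates))
```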

12
Sample of Predictions
13
Evaluating Predictors 1
14
Evaluating Predictors 2
15
Why didn't this work better?
  • Average 15-25% error
  • GridFTP file transfers are sporadic, but standard
    time series techniques expect periodic behavior
  • The current environment may not be captured by the
    latest measurements
  • So what if we could add information about the
    background behavior?

16
Using NWS data
17
Some Details
  • Regression techniques expect a 1-to-1 mapping
    between data streams
  • Throw away extra NWS data
  • Fill GridFTP data with last value
  • Fill GridFTP data with average
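
A minimal sketch of that alignment step, assuming each stream is a list of (timestamp, value) pairs (the layout is an assumption for illustration):

```python
def align(nws, gridftp, fill="last"):
    """Return 1-to-1 (nws_value, gridftp_value) pairs for regression.

    NWS samples arrive far more often than GridFTP transfers, so a
    sample with no matching transfer is either dropped, paired with the
    last observed transfer value ("last"), or paired with the running
    average of observed transfers ("average").
    """
    ftp_by_time = dict(gridftp)
    pairs, seen, last_ftp = [], [], None
    for t, bw in nws:
        if t in ftp_by_time:               # a transfer happened here
            last_ftp = ftp_by_time[t]
            seen.append(last_ftp)
            pairs.append((bw, last_ftp))
        elif fill == "last" and last_ftp is not None:
            pairs.append((bw, last_ftp))   # fill with last value
        elif fill == "average" and seen:
            pairs.append((bw, sum(seen) / len(seen)))  # fill with average
        # otherwise: throw the extra NWS sample away
    return pairs

nws = [(0, 6.0), (1, 6.5), (2, 5.0), (3, 5.5)]   # dense bandwidth probes
gridftp = [(0, 4.0), (2, 3.5)]                   # sparse transfer log
print(align(nws, gridftp, fill="last"))
# -> [(6.0, 4.0), (6.5, 4.0), (5.0, 3.5), (5.5, 3.5)]
```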

18
Why stop at just BW data?
  • Disk time is up to 30% of the transfer time, so
    we also looked at combining GridFTP data with
    iostat disk data as well as Network Weather Service data

19
GridFTP, NWS and I/Ostat Results
20
Summary of File Prediction Work
  • Using additional information increases the
    prediction accuracy
  • 7-17% error
  • We've also worked on predicting the variance

21
Variance
  • On-average behavior may be different from
    variance behavior
  • Example
  • Data transfer from A will take 5-7 minutes
  • Data transfer from B will take 3-9 minutes
  • Which to pick?
  • We looked at this question in the context of data
    distributions on shared clusters

22
Scheduling with Variance
  • Scheduling techniques can be developed to make
    use of the dynamic performance characteristics of
    shared resources
  • The approach
  • Structural performance models
  • Stochastic values and predictions
  • Stochastic scheduling techniques
  • Joint work with Fran Berman, UCSD

23
Stochastic Value Parameters
Point Value Parameters
Structural Prediction Models
Stochastic Prediction
Stochastic Scheduling
24
Successive Over-Relaxation (SOR)
  • Iterative solution to Laplace's equation
  • Typical stencil application
  • Divided into a red phase and a black phase
  • 2-D grid of data divided into strips

25
SOR
26
Models
27
Dedicated SOR Experiments
  • Platform: 2 Sparc 2s, 1 Sparc 5, 1 Sparc 10
  • 10 Mbit Ethernet connection
  • Quiescent machines and network
  • Prediction within 3% before memory spill

28
Non-dedicated SOR results
  • Available CPU on workstations varied from 0.43 to
    0.53

29
Platforms with Scattered Range of CPU Availability
30
Improving structural models
  • Available CPU has a range of 0.48 ± 0.05
  • Prediction should also have a range

31
Using Additional Information
  • Point value
  • Bandwidth reported as 7 Mbits/sec
  • Single value
  • Often a best guess, an estimate under ideal
    circumstances, or a value accurate only for a
    given time frame
  • Stochastic value
  • Bandwidth reported as 7 ± 2 Mbits/sec
  • A set of possible values weighted by
    probabilities
  • Represents a range of likely behavior

32
Stochastic Structural Models
  • Goal: Extend structural models so that the resulting
    predictions are distributions
  • A structural model is an equation, so
  • Need to represent stochastic information
  • Normal distribution
  • Interval
  • Histogram
  • Need to be able to mathematically combine the
    stochastic values in a timely manner
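
For the normal-distribution representation, the timely combination is cheap: independent normals combine by summing means and summing variances. A small sketch (the scenario values are illustrative):

```python
import math

def add_normals(m1, sd1, m2, sd2):
    """Sum of two independent normal values, returned as (mean, sd):
    means add, variances add."""
    return (m1 + m2, math.sqrt(sd1 ** 2 + sd2 ** 2))

def scale_normal(k, m, sd):
    """A normal value scaled by a constant k, returned as (mean, sd)."""
    return (k * m, abs(k) * sd)

# Example: total time = compute time + communication time, each stochastic.
comp = scale_normal(100, 0.5, 0.1)        # 100 units at 0.5 ± 0.1 s/unit
total = add_normals(*comp, 2.0, 0.5)      # plus a transfer of 2.0 ± 0.5 s
print(total)                              # mean is 52.0 s
```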

33
Practical issues when using stochastic data
  • Who/what can supply stochastic data?
  • User
  • Data from past runs
  • On-line measurement tools
  • Network Weather Service time series data
  • Time frame
  • Given a time series, how much data should we
    consider?

34
Accuracy of stochastic results
  • The result of a stochastic prediction will also be a
    range of values
  • Need to consider how to achieve a tight (sharp)
    interval
  • What to do if the interval isn't tight

35
How can I use these predictions in scheduling?
Point Value Parameters
Stochastic Value Parameters
Structural Prediction Models
Stochastic Prediction
Stochastic Scheduling
36
Using stochastic predictions
  • Simplest scheduling situation: Given a data
    parallel application, adjust the amount of data
    assigned to each processor to minimize execution
    time

37
Delay in one can cause delay in all
38
Stochastic Scheduling
  • Examine
  • Stochastic data represented as normal
    distributions
  • Data parallel codes
  • Fixed set of shared resources
  • Question: How should data be distributed to
    minimize execution time?
  • Approach: Adjust the data allocation so that a high
    variance machine receives less work, in order to
    minimize the effects of contention

39
Time Balancing
  • Minimize execution time by assigning data so that
    each processor finishes at roughly the same time
  • Di = data assigned to processor i
  • Ui = time per unit of data on processor i
  • Ci = time to distribute the data to processor i
  • Di·Ui + Ci = Dj·Uj + Cj for all i, j
  • Sum of Di = Dtotal
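
The constraints above have a closed-form solution: with Di·Ui + Ci = K for every processor, Di = (K - Ci)/Ui, and summing over i gives K = (Dtotal + Σ Ci/Ui) / Σ (1/Ui). A minimal sketch:

```python
def time_balance(U, C, D_total):
    """Split D_total units of data so every processor finishes together.

    U[i]: time per unit of data on processor i
    C[i]: time to distribute the data to processor i
    Solves D[i]*U[i] + C[i] = K for all i, with sum(D) = D_total.
    """
    # Common finish time K follows from the total-data constraint.
    K = (D_total + sum(c / u for c, u in zip(C, U))) / sum(1.0 / u for u in U)
    return [(K - c) / u for c, u in zip(C, U)]

# Two equally fast processors, no distribution cost: even split.
print(time_balance([1.0, 1.0], [0.0, 0.0], 100.0))   # [50.0, 50.0]
# A processor twice as slow per unit gets half as much data.
print(time_balance([2.0, 1.0], [0.0, 0.0], 90.0))    # [30.0, 60.0]
```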

40
Stochastic Time Balancing
  • Adapt the time to compute a unit of data (ui) to
    reflect stochastic information
  • Larger ui means smaller Di (less data)
  • If we have normal distributions
  • The 95% confidence interval corresponds to [m - 2·sd,
    m + 2·sd]
  • If we set ui = m + 2·sd
  • 95% conservative schedule

41
Stochastic Time Balancing (cont)
  • The set of equations is now
  • Di·(mi + 2·sdi) + Ci = Dj·(mj + 2·sdj) + Cj
  • for all i, j
  • Sum of Di = Dtotal

42
How do policies compare in a production
environment?
  • 4 contended Sparcs over 10 Mbit shared Ethernet

43
Set of Schedules
44
Tuning factor
  • The tuning factor is the 'knob' to turn to decide how
    conservative a schedule should be
  • For example, TF can determine the number of
    standard deviations to add to the mean
  • Let ui = mi + sdi·TF
  • Solve
  • Di·(mi + sdi·TF) + Ci = Dj·(mj + sdj·TF) + Cj
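
The tuning-factor schedule solves the same time-balancing system with the conservative unit cost ui = mi + sdi·TF substituted in. A self-contained sketch (closed-form solution; the example numbers are illustrative):

```python
def tf_schedule(means, sds, C, D_total, TF):
    """Time-balance with conservative unit costs u_i = m_i + TF * sd_i.

    TF = 0 reproduces mean-based scheduling; TF = 2 gives the
    95%-conservative schedule described earlier.
    """
    U = [m + TF * sd for m, sd in zip(means, sds)]
    # Solve D[i]*U[i] + C[i] = K for all i, with sum(D) = D_total.
    K = (D_total + sum(c / u for c, u in zip(C, U))) / sum(1.0 / u for u in U)
    return [(K - c) / u for c, u in zip(C, U)]

means, sds = [1.0, 1.0], [0.0, 0.5]   # same mean speed, one noisy machine
even = tf_schedule(means, sds, [0, 0], 100, TF=0)
cons = tf_schedule(means, sds, [0, 0], 100, TF=2)
print(even)   # mean-based: [50.0, 50.0]
print(cons)   # conservative: the high-variance machine gets less data
```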

45
Extensible approach
  • Don't have to use mean and standard deviation
  • TF can be defined in a variety of ways

46
Defining our stochastic scheduling policy goals
  • Decrease execution time
  • Predictable performance
  • Avoid spikes in execution behavior
  • More conservative when in doubt

47
System of benefits and penalties
  • Based on Sih and Lee's approach to scheduling
  • Benefit (give a less conservative schedule to)
  • Platforms with fewer varying machines
  • Low variance machines, especially those with
    lower power

48
Partial ordering
49
Algorithm for TF
50
Scheduling Experiments
  • Platform:
  • 4 contended PCs running Linux
  • 100 Mbit shared Ethernet connection
  • 3 policies run back to back
  • Mean: Ui based on runtime mean prediction
  • VTF: Ui based on mean and heuristic TF
    evaluation
  • 95TF: Ui based on 95% confidence interval

51
Metrics
  • Window: Which of each window of three runs has the
    fastest execution time?
  • Compare: How often was one policy better than,
    worse than, or split when compared with the
    policy run just before and just after?
  • What's the right metric?

52
SOR scheduling 1
  • Window: Mean 9, VTF 27, 95TF 22 (of 57)
  • Compare:  Better  Mixed  Worse
  • Mean:        3      4     12
  • VTF:        10      7      3
  • 95TF:        6      9      4

53
CPU performance
54
SOR scheduling 2
  • Window: Mean 8, VTF 39, 95TF 11 (of 57)
  • Compare:  Better  Mixed  Worse
  • Mean:        3      7      9
  • VTF:        15      2      3
  • 95TF:        3      8      8

55
CPU
56
Experimental Conclusions
  • Stochastic information was more beneficial when
    there was higher variability in available CPU
  • We almost always saw a reduction in the variation of
    actual execution times
  • It is unclear at this point when it is better to use
    which heuristic scheduling policy

57
What if we had a better predictor?
  • Create a predictor for the average CPU load over
    some future time interval, and for the variation of
    CPU load over that interval
  • Use these with performance models to create a
    conservative schedule
  • (joint work with Lingyun Yang, UC, and Ian
    Foster, UC/ANL)

58
Mixed Tendency Prediction
  • // Determine tendency
  • if ((V(T-1) - V(T)) < 0)
  • Tendency = Increase
  • else if ((V(T) - V(T-1)) < 0)
  • Tendency = Decrease
  • if (Tendency == Increase) then P(T+1) = V(T) +
    IncrementConstant
  • IncrementConstant: tuned by an adaptation process
  • else if (Tendency == Decrease) then P(T+1) = V(T) -
    V(T)·DecrementFactor
  • DecrementFactor: tuned by an adaptation process
  • IncrementConstant is set initially to 0.1
  • DecrementFactor is set initially to 0.01
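
The pseudocode above as a runnable sketch; the adaptation process that tunes the two constants is part of the published strategy but omitted here, so they stay at their initial values:

```python
def mixed_tendency_predict(v_prev, v_curr, inc=0.1, dec=0.01):
    """One step of the mixed tendency predictor.

    If the load is rising, predict the current value plus an increment
    constant; if falling, subtract a fraction of the current value.
    inc/dec default to the initial values from the slide (0.1, 0.01);
    their error-driven adaptation is omitted from this sketch.
    """
    if v_prev - v_curr < 0:           # load increased
        return v_curr + inc
    elif v_curr - v_prev < 0:         # load decreased
        return v_curr - v_curr * dec
    return v_curr                     # unchanged: predict no change

print(mixed_tendency_predict(0.40, 0.50))   # rising: 0.50 + 0.1
print(mixed_tendency_predict(0.50, 0.40))   # falling: 0.40 - 1% of 0.40
```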

59
Comparison to NWS
The mixed tendency prediction strategy outperforms
the NWS predictors on all 38 CPU load time
series with different properties, achieving a
prediction error that is on average 36% lower
than that achieved by NWS.
60
  • Add in slide tying this back into scheduling

61
Apply the new predictor to aggregated load
information
62
Apply the new predictor to standard deviation data
63
Use this data in a conservative scheduling algorithm
  • Cactus Application: a simulation of a 3D scalar
    field produced by two orbiting astrophysical
    sources
  • Data distribution based on
  • Ei(Di) = start_up + (Di·Compi(0) + Commi(0)) ×
    effective CPU load
  • Open question: how to define effective CPU load?

64
Compare Different Approaches
  • (1) One Step Scheduling (OSS): Use the
    one-step-ahead prediction of the CPU load
  • (2) Predicted Mean Interval Scheduling (PMIS):
    Use the interval load prediction
  • (3) Conservative Scheduling (CS): Use the
    conservative load prediction - the interval load
    prediction added to a measure of the predicted
    variance
  • (4) History Mean Scheduling (HMS): Use the mean
    of the history CPU load for the 5 minutes preceding
    the application start. This approximates the
    estimates used in other approaches.
  • (5) History Conservative Scheduling (HCS): Use
    the conservative estimate of CPU load - add the mean
    and variance of the history CPU load collected
    for the 5 minutes preceding the application run as
    the effective CPU load. This approximates Schopf
    and Berman.
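
A sketch contrasting how the five effective-load estimates are formed from a recent load history. The predictor calls are illustrative stand-ins (simple proxies), not the published tendency-based predictor:

```python
from statistics import mean, stdev

history = [0.42, 0.55, 0.48, 0.60, 0.52]   # load samples, last 5 minutes

def one_step(h):        # proxy for the one-step-ahead prediction
    return h[-1]

def interval_mean(h):   # proxy for the predicted interval mean
    return mean(h[-3:])

def interval_var(h):    # proxy for the predicted variation
    return stdev(h[-3:])

OSS  = one_step(history)                               # (1) one-step-ahead
PMIS = interval_mean(history)                          # (2) interval mean
CS   = interval_mean(history) + interval_var(history)  # (3) + variance
HMS  = mean(history)                                   # (4) history mean
HCS  = mean(history) + stdev(history)                  # (5) + history var

# The conservative estimates are at least as large as their mean-only
# counterparts, so they assign less data to each machine.
assert CS >= PMIS and HCS >= HMS
```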

65
Results
66
Translate abbrev into policies better
67
Comparing Policies
 
68
Average Mean and Average Standard Deviation
69
Summary
  • Variance information gives us stochastic values
    to help meet the prediction needs of Grid
    computing
  • A stochastic scheduling policy that can make use
    of predictions to achieve better execution times
    and more predictable application behavior
  • Better predictions of stochastic values will
    result in better policies

70
Collaborators/References
  • Sudharshan Vazhkudai (ANL/MS State)
  • IPDPS 2002, HPDC 2002, Grid2002
  • The AppLeS group Fran Berman (UCSD), Rich
    Wolski (UCSB)
  • SC99, Schopf Thesis UCSD 1998
  • Lingyun Yang (University of Chicago) and Ian
    Foster (UC, ANL)
  • IPDPS 2003, submitted to HPDC 2003

71
  • Add future work

72
Contact Information
  • Jennifer Schopf
  • jms@mcs.anl.gov
  • http://www.mcs.anl.gov/jms