Transcript and Presenter's Notes

Title: Grid Prediction and Scheduling with Variance


1
Grid Prediction and Scheduling with Variance
  • Jennifer M. Schopf
  • Argonne National Lab
  • Feb 27, 2003

2
Scheduling and Prediction on the Grid
  • First step of Grid computing: basic
    functionality
  • Run my job
  • Transfer my data
  • Security
  • Next step: more efficient use of the resources
  • Scheduling
  • Prediction
  • Monitoring

3
How can these resources be used effectively?
  • Efficient scheduling
  • Selection of resources
  • Mapping of tasks to resources
  • Allocating data
  • Accurate prediction of performance
  • Good performance prediction modeling techniques

4
Outline
  • Predicting large file transfers
  • Joint with Sudharshan Vazhkudai
  • Scheduling with variance
  • Joint with Fran Berman
  • Variance scheduling with better predictors
  • Joint with Lingyun Yang

5
Grid Scheduling Architecture (GGF)
6
Scheduling
  • Gather information
  • Make a decision
  • Take an action
  • The action can be to run a job or to transfer a file
  • Replica selection is becoming more commonplace

7
High Energy Physics Data Movement
[Diagram: tiered HEP data movement. The detector produces data at
PBytes/sec; Tier 0 (CERN Computer Centre, with an offline processor
farm of ~20 TIPS) receives it at 100 MBytes/sec; Tier 1 regional
centres (FermiLab ~4 TIPS; France, Italy, Germany) are fed at 622
Mbits/sec (or air freight, deprecated); Tier 2 institutes (~0.25 TIPS)
connect at 622 Mbits/sec and hold physics data caches; Tier 4
physicist workstations read at ~1 MBytes/sec.
Image courtesy H. Newman, Caltech and C. Kesselman, ISI]
8
Data Replication
  • Extremely large data sets
  • Distributed storage sites
  • One file may be available from a number of
    different sources
  • Question: where is the best source to copy
    it from?

9
Replica Selection
  • Why not use something like Network Weather
    Service (NWS) probes?
  • Wolski and Swany, UCSB
  • Logging and prediction
  • Small data transfers
  • CPU load, memory, etc.

10
Predictions of Large File Transfers
  • Large file transfers don't look like small file
    transfers

11
Predicting File Transfers
  • (Work with Sudharshan Vazhkudai)
  • Log GridFTP file transfers
  • Part of Globus Toolkit
  • Allows buffer tuning, parallel streams
  • De facto standard for grid file transfers
  • Use standard statistical predictions
  • Means, medians, autoregressive techniques
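
As a rough illustration of the predictor families named above (the window sizes and the exponential weighting below are assumptions for the sketch, not the settings used in the study):

```python
def mean_predictor(history, window=5):
    """Predict the next rate as the mean of the last `window` samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def median_predictor(history, window=5):
    """Predict the next rate as the median of the last `window` samples."""
    recent = sorted(history[-window:])
    n = len(recent)
    mid = n // 2
    return recent[mid] if n % 2 else (recent[mid - 1] + recent[mid]) / 2

def ewma_predictor(history, alpha=0.5):
    """Autoregressive-style predictor: exponentially weighted moving
    average, so recent transfers count more than old ones."""
    pred = history[0]
    for v in history[1:]:
        pred = alpha * v + (1 - alpha) * pred
    return pred

rates = [4.2, 3.9, 5.1, 4.8, 4.5, 4.7]   # logged transfer rates, MB/s
print(mean_predictor(rates))     # mean of the last 5 samples
print(median_predictor(rates))   # -> 4.7
print(ewma_predictor(rates))
```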

12
Sample of Predictions
13
Evaluating Predictors 1
14
Evaluating Predictors 2
15
Why didn't this work better?
  • Average 15-25% error
  • GridFTP file transfers are sporadic, but standard
    time series techniques expect periodic behavior
  • The current environment may not be captured by the
    latest measurements
  • So what if we could add information about the
    background behavior?

16
Using NWS data
17
Some Details
  • Regression techniques expect a 1-to-1 mapping
    between data streams
  • Throw away extra NWS data
  • Fill GridFTP data with last value
  • Fill GridFTP data with average
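
A minimal sketch of that alignment step, assuming each stream is a list of (timestamp, value) pairs (the layout is an assumption for illustration):

```python
def align(nws, gridftp, fill="last"):
    """Return 1-to-1 (nws_value, gridftp_value) pairs for regression.

    NWS samples arrive far more often than GridFTP transfers, so a
    sample with no matching transfer is either dropped, paired with the
    last observed transfer value ("last"), or paired with the running
    average of observed transfers ("average").
    """
    ftp_by_time = dict(gridftp)
    pairs, seen, last_ftp = [], [], None
    for t, bw in nws:
        if t in ftp_by_time:               # a transfer happened here
            last_ftp = ftp_by_time[t]
            seen.append(last_ftp)
            pairs.append((bw, last_ftp))
        elif fill == "last" and last_ftp is not None:
            pairs.append((bw, last_ftp))   # fill with last value
        elif fill == "average" and seen:
            pairs.append((bw, sum(seen) / len(seen)))  # fill with average
        # otherwise: throw the extra NWS sample away
    return pairs

nws = [(0, 6.0), (1, 6.5), (2, 5.0), (3, 5.5)]   # dense bandwidth probes
gridftp = [(0, 4.0), (2, 3.5)]                   # sparse transfer log
print(align(nws, gridftp, fill="last"))
# -> [(6.0, 4.0), (6.5, 4.0), (5.0, 3.5), (5.5, 3.5)]
```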

18
Why stop at just BW data?
  • Disk time is up to 30% of the transfer time, so
    we also looked at combining GridFTP data with
    iostat disk data as well as Network Weather Service data

19
GridFTP, NWS and I/Ostat Results
20
Summary of File Prediction Work
  • Using additional information increases the
    prediction accuracy
  • 7-17% error
  • We've also worked on predicting the variance

21
Variance
  • On-average behavior may be different from
    variance behavior
  • Example
  • Data transfer from A will take 5-7 minutes
  • Data transfer from B will take 3-9 minutes
  • Which to pick?
  • We looked at this question in the context of data
    distributions on shared clusters

22
Scheduling with Variance
  • Scheduling techniques can be developed to make
    use of the dynamic performance characteristics of
    shared resources
  • The approach
  • Structural performance models
  • Stochastic values and predictions
  • Stochastic scheduling techniques
  • Joint work with Fran Berman, UCSD

23
Stochastic Value Parameters
Point Value Parameters
Structural Prediction Models
Stochastic Prediction
Stochastic Scheduling
24
Successive Over-Relaxation (SOR)
  • Iterative solution to Laplace's equation
  • Typical stencil application
  • Divided into a red phase and a black phase
  • 2-D grid of data divided into strips

25
SOR
26
Models
27
Dedicated SOR Experiments
  • Platform: 2 Sparc 2s, 1 Sparc 5, 1 Sparc 10
  • 10 Mbit Ethernet connection
  • Quiescent machines and network
  • Prediction within 3% before memory spill

28
Non-dedicated SOR results
  • Available CPU on workstations varied from 0.43 to
    0.53

29
Platforms with Scattered Range of CPU Availability
30
Improving structural models
  • Available CPU has a range of 0.48 ± 0.05
  • Prediction should also have a range

31
Using Additional Information
  • Point value
  • Bandwidth reported as 7 Mbits/sec
  • Single value
  • Often a best guess, an estimate under ideal
    circumstances, or a value accurate only for a
    given time frame
  • Stochastic value
  • Bandwidth reported as 7 ± 2 Mbits/sec
  • A set of possible values weighted by
    probabilities
  • Represents a range of likely behavior

32
Stochastic Structural Models
  • Goal: Extend structural models so that the resulting
    predictions are distributions
  • A structural model is an equation, so
  • Need to represent stochastic information
  • Normal distribution
  • Interval
  • Histogram
  • Need to be able to mathematically combine the
    stochastic values in a timely manner
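
For the normal-distribution representation, the timely combination is cheap: independent normals combine by summing means and summing variances. A small sketch (the scenario values are illustrative):

```python
import math

def add_normals(m1, sd1, m2, sd2):
    """Sum of two independent normal values, returned as (mean, sd):
    means add, variances add."""
    return (m1 + m2, math.sqrt(sd1 ** 2 + sd2 ** 2))

def scale_normal(k, m, sd):
    """A normal value scaled by a constant k, returned as (mean, sd)."""
    return (k * m, abs(k) * sd)

# Example: total time = compute time + communication time, each stochastic.
comp = scale_normal(100, 0.5, 0.1)        # 100 units at 0.5 ± 0.1 s/unit
total = add_normals(*comp, 2.0, 0.5)      # plus a transfer of 2.0 ± 0.5 s
print(total)                              # mean is 52.0 s
```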

33
Practical issues when using stochastic data
  • Who/what can supply stochastic data?
  • User
  • Data from past runs
  • On-line measurement tools
  • Network Weather Service time series data
  • Time frame
  • Given a time series, how much data should we
    consider?

34
Accuracy of stochastic results
  • The result of a stochastic prediction will also be a
    range of values
  • Need to consider how to achieve a tight (sharp)
    interval
  • What to do if the interval isn't tight

35
How can I use these predictions in scheduling?
Point Value Parameters
Stochastic Value Parameters
Structural Prediction Models
Stochastic Prediction
Stochastic Scheduling
36
Using stochastic predictions
  • Simplest scheduling situation: Given a data
    parallel application, adjust the amount of data
    assigned to each processor to minimize execution
    time

37
Delay in one can cause delay in all
38
Stochastic Scheduling
  • Examine
  • Stochastic data represented as normal
    distributions
  • Data parallel codes
  • Fixed set of shared resources
  • Question: How should data be distributed to
    minimize execution time?
  • Approach: Adjust the data allocation so that a high
    variance machine receives less work, in order to
    minimize the effects of contention

39
Time Balancing
  • Minimize execution time by assigning data so that
    each processor finishes at roughly the same time
  • Di = data assigned to processor i
  • Ui = time per unit of data on processor i
  • Ci = time to distribute the data to processor i
  • Di·Ui + Ci = Dj·Uj + Cj for all i, j
  • Sum of Di = Dtotal
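
The constraints above have a closed-form solution: with Di·Ui + Ci = K for every processor, Di = (K - Ci)/Ui, and summing over i gives K = (Dtotal + Σ Ci/Ui) / Σ (1/Ui). A minimal sketch:

```python
def time_balance(U, C, D_total):
    """Split D_total units of data so every processor finishes together.

    U[i]: time per unit of data on processor i
    C[i]: time to distribute the data to processor i
    Solves D[i]*U[i] + C[i] = K for all i, with sum(D) = D_total.
    """
    # Common finish time K follows from the total-data constraint.
    K = (D_total + sum(c / u for c, u in zip(C, U))) / sum(1.0 / u for u in U)
    return [(K - c) / u for c, u in zip(C, U)]

# Two equally fast processors, no distribution cost: even split.
print(time_balance([1.0, 1.0], [0.0, 0.0], 100.0))   # [50.0, 50.0]
# A processor twice as slow per unit gets half as much data.
print(time_balance([2.0, 1.0], [0.0, 0.0], 90.0))    # [30.0, 60.0]
```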

40
Stochastic Time Balancing
  • Adapt the time to compute a unit of data (ui) to
    reflect stochastic information
  • Larger ui means smaller Di (less data)
  • If we have normal distributions
  • The 95% confidence interval corresponds to [m - 2·sd,
    m + 2·sd]
  • If we set ui = m + 2·sd
  • 95% conservative schedule

41
Stochastic Time Balancing (cont)
  • The set of equations is now
  • Di·(mi + 2·sdi) + Ci = Dj·(mj + 2·sdj) + Cj
  • for all i, j
  • Sum of Di = Dtotal

42
How do policies compare in a production
environment?
  • 4 contended Sparcs over 10 Mbit shared Ethernet

43
Set of Schedules
44
Tuning factor
  • The tuning factor is the 'knob' to turn to decide how
    conservative a schedule should be
  • For example, TF can determine the number of
    standard deviations to add to the mean
  • Let ui = mi + sdi·TF
  • Solve
  • Di·(mi + sdi·TF) + Ci = Dj·(mj + sdj·TF) + Cj
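
The tuning-factor schedule solves the same time-balancing system with the conservative unit cost ui = mi + sdi·TF substituted in. A self-contained sketch (closed-form solution; the example numbers are illustrative):

```python
def tf_schedule(means, sds, C, D_total, TF):
    """Time-balance with conservative unit costs u_i = m_i + TF * sd_i.

    TF = 0 reproduces mean-based scheduling; TF = 2 gives the
    95%-conservative schedule described earlier.
    """
    U = [m + TF * sd for m, sd in zip(means, sds)]
    # Solve D[i]*U[i] + C[i] = K for all i, with sum(D) = D_total.
    K = (D_total + sum(c / u for c, u in zip(C, U))) / sum(1.0 / u for u in U)
    return [(K - c) / u for c, u in zip(C, U)]

means, sds = [1.0, 1.0], [0.0, 0.5]   # same mean speed, one noisy machine
even = tf_schedule(means, sds, [0, 0], 100, TF=0)
cons = tf_schedule(means, sds, [0, 0], 100, TF=2)
print(even)   # mean-based: [50.0, 50.0]
print(cons)   # conservative: the high-variance machine gets less data
```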

45
Extensible approach
  • Don't have to use mean and standard deviation
  • TF can be defined in a variety of ways

46
Defining our stochastic scheduling policy goals
  • Decrease execution time
  • Predictable performance
  • Avoid spikes in execution behavior
  • More conservative when in doubt

47
System of benefits and penalties
  • Based on Sih and Lee's approach to scheduling
  • Benefit (give a less conservative schedule to)
  • Platforms with fewer varying machines
  • Low variance machines, especially those with
    lower power

48
Partial ordering
49
Algorithm for TF
50
Scheduling Experiments
  • Platform:
  • 4 contended PCs running Linux
  • 100 Mbit shared Ethernet connection
  • 3 policies run back to back
  • Mean: Ui based on runtime mean prediction
  • VTF: Ui based on mean and heuristic TF
    evaluation
  • 95TF: Ui based on 95% confidence interval

51
Metrics
  • Window: Which of each window of three runs has the
    fastest execution time?
  • Compare: How often was one policy better than,
    worse than, or split when compared with the
    policy run just before and just after?
  • What's the right metric?

52
SOR scheduling 1
  • Window: Mean 9, VTF 27, 95TF 22 (of 57)
  • Compare:  Better  Mixed  Worse
  • Mean:        3      4     12
  • VTF:        10      7      3
  • 95TF:        6      9      4

53
CPU performance
54
SOR scheduling 2
  • Window: Mean 8, VTF 39, 95TF 11 (of 57)
  • Compare:  Better  Mixed  Worse
  • Mean:        3      7      9
  • VTF:        15      2      3
  • 95TF:        3      8      8

55
CPU
56
Experimental Conclusions
  • Stochastic information was more beneficial when
    there was higher variability in available CPU
  • We almost always saw a reduction in the variation of
    actual execution times
  • It is unclear at this point when it is better to use
    which heuristic scheduling policy

57
What if we had a better predictor?
  • Create a predictor for the average CPU load over
    some future time interval, and for the variation of
    CPU load over that interval
  • Use these with performance models to create a
    conservative schedule
  • (joint work with Lingyun Yang, UC, and Ian
    Foster, UC/ANL)

58
Mixed Tendency Prediction
  • // Determine tendency
  • if ((V(T-1) - V(T)) < 0)
  • Tendency = Increase
  • else if ((V(T) - V(T-1)) < 0)
  • Tendency = Decrease
  • if (Tendency == Increase) then P(T+1) = V(T) +
    IncrementConstant
  • IncrementConstant: tuned by an adaptation process
  • else if (Tendency == Decrease) then P(T+1) = V(T) -
    V(T)·DecrementFactor
  • DecrementFactor: tuned by an adaptation process
  • IncrementConstant is set initially to 0.1
  • DecrementFactor is set initially to 0.01
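
The pseudocode above as a runnable sketch; the adaptation process that tunes the two constants is part of the published strategy but omitted here, so they stay at their initial values:

```python
def mixed_tendency_predict(v_prev, v_curr, inc=0.1, dec=0.01):
    """One step of the mixed tendency predictor.

    If the load is rising, predict the current value plus an increment
    constant; if falling, subtract a fraction of the current value.
    inc/dec default to the initial values from the slide (0.1, 0.01);
    their error-driven adaptation is omitted from this sketch.
    """
    if v_prev - v_curr < 0:           # load increased
        return v_curr + inc
    elif v_curr - v_prev < 0:         # load decreased
        return v_curr - v_curr * dec
    return v_curr                     # unchanged: predict no change

print(mixed_tendency_predict(0.40, 0.50))   # rising: 0.50 + 0.1
print(mixed_tendency_predict(0.50, 0.40))   # falling: 0.40 - 1% of 0.40
```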

59
Comparison to NWS
The mixed tendency prediction strategy outperforms
the NWS predictors on all 38 CPU load time
series with different properties, achieving a
prediction error that is on average 36% lower
than that achieved by NWS.
60
  • Add in slide tying this back into scheduling

61
Apply the new predictor to aggregated load
information
62
Apply the new predictor to standard deviation data
63
Use this data in a conservative scheduling algorithm
  • Cactus Application: a simulation of a 3D scalar
    field produced by two orbiting astrophysical
    sources
  • Data distribution based on
  • Ei(Di) = start_up + (Di·Compi(0) + Commi(0)) ×
    effective CPU load
  • Open question: how to define effective CPU load?

64
Compare Different Approaches
  • (1) One Step Scheduling (OSS): Use the
    one-step-ahead prediction of the CPU load
  • (2) Predicted Mean Interval Scheduling (PMIS):
    Use the interval load prediction
  • (3) Conservative Scheduling (CS): Use the
    conservative load prediction - the interval load
    prediction added to a measure of the predicted
    variance
  • (4) History Mean Scheduling (HMS): Use the mean
    of the history CPU load for the 5 minutes preceding
    the application start. This approximates the
    estimates used in other approaches.
  • (5) History Conservative Scheduling (HCS): Use
    the conservative estimate of CPU load - add the mean
    and variance of the history CPU load collected
    for the 5 minutes preceding the application run as
    the effective CPU load. This approximates Schopf
    and Berman.
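
A sketch contrasting how the five effective-load estimates are formed from a recent load history. The predictor calls are illustrative stand-ins (simple proxies), not the published tendency-based predictor:

```python
from statistics import mean, stdev

history = [0.42, 0.55, 0.48, 0.60, 0.52]   # load samples, last 5 minutes

def one_step(h):        # proxy for the one-step-ahead prediction
    return h[-1]

def interval_mean(h):   # proxy for the predicted interval mean
    return mean(h[-3:])

def interval_var(h):    # proxy for the predicted variation
    return stdev(h[-3:])

OSS  = one_step(history)                               # (1) one-step-ahead
PMIS = interval_mean(history)                          # (2) interval mean
CS   = interval_mean(history) + interval_var(history)  # (3) + variance
HMS  = mean(history)                                   # (4) history mean
HCS  = mean(history) + stdev(history)                  # (5) + history var

# The conservative estimates are at least as large as their mean-only
# counterparts, so they assign less data to each machine.
assert CS >= PMIS and HCS >= HMS
```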

65
Results
66
Translate abbrev into policies better
67
Comparing Policies
 
68
Average Mean and Average Standard Deviation
69
Summary
  • Variance information gives us stochastic values
    to help meet the prediction needs of Grid
    computing
  • A stochastic scheduling policy that can make use
    of predictions to achieve better execution times
    and more predictable application behavior
  • Better predictions of stochastic values will
    result in better policies

70
Collaborators/References
  • Sudharshan Vazhkudai (ANL/MS State)
  • IPDPS 2002, HPDC 2002, Grid2002
  • The AppLeS group Fran Berman (UCSD), Rich
    Wolski (UCSB)
  • SC99, Schopf Thesis UCSD 1998
  • Lingyun Yang (University of Chicago) and Ian
    Foster (UC, ANL)
  • IPDPS 2003, submitted to HPDC 2003

71
  • Add future work

72
Contact Information
  • Jennifer Schopf
  • jms@mcs.anl.gov
  • http://www.mcs.anl.gov/jms