Title: Online Prediction of the Running Time Of Tasks
1Online Prediction of the Running Time Of Tasks
- Peter A. Dinda
- Department of Computer Science, Northwestern University
- http://www.cs.northwestern.edu/pdinda
2Overview
- Predict running time of task
- Application supplies task size (0.1-10 seconds currently)
- Task is compute-bound (current limit)
- Prediction is a confidence interval
- Expresses prediction error
- Statistically valid decision-making in scheduler
- Based on host load prediction
- Homogenous Digital Unix hosts (current limit)
- System is portable to many operating systems
Everything in talk is publicly available
3Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
4A Universal Challenge in High Performance
Distributed Applications
- Highly variable resource availability
- Shared resources
- No reservations
- No globally respected priorities
- Competition from other users (background workload)
- Running time can vary drastically
- Adaptation
- Example goal: soft real-time for interactivity
- Example mechanism: server selection
- Performance queries
5Running Time Advisor (RTA)
[Figure: an App queries the RTA about a Host that carries a background workload]
App: "What will be the running time of this 3 second task if started now?"
RTA: "It will be 5.3 seconds"
Nominal time: running time on an empty host (the task size)
- Entirely user-level tool
- No reservations or admission control
- Query result is a prediction
6Variability and Prediction
[Figure: a Predictor takes a resource signal with high availability variability over time t and produces a prediction plus an error signal with low variability over t; the ACF characterizes the error variability]
Exchange high resource availability variability for low prediction error variability and a characterization of that variability
7Running Time Advisor (RTA)
[Figure: an App queries the RTA about a Host that carries a background workload]
App: "With 95% confidence, what will be the running time of this 3 second task if started now?"
RTA: "It will be 4.1 to 6.3 seconds"
CI captures prediction error to the extent the application is interested in it
Independent of prediction techniques
8RTA API
9Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
10Host Load Traces
- DEC Unix 5 second exponential average
- Full bandwidth captured (1 Hz sample rate)
- Long durations
http://www.cs.northwestern.edu/pdinda/LoadTraces
11Host Load Properties
- Self-similarity
- long-range dependence
- Epochal behavior
- non-stationarity
- Complex correlation structure
- LCR 98, Scientific Programming, 34, 1999
12Host Load Prediction
- Fully randomized study on traces
- MEAN, LAST, AR, MA, ARMA, ARIMA, ARFIMA models
- AR(16) models most appropriate
- Covariance matrix for prediction errors
- Low overhead: <1% CPU
- HPDC 99, Cluster Computing, 34, 2000
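A minimal sketch of how three of the listed predictors produce multi-step load predictions. It assumes the AR coefficients and mean have already been fitted elsewhere; the function names and argument layout are hypothetical and are not the RPS interface.

```python
def predict_mean(history, steps):
    """MEAN: predict the average of the observed load for every step."""
    mean = sum(history) / len(history)
    return [mean] * steps


def predict_last(history, steps):
    """LAST: predict that the most recent load value persists."""
    return [history[-1]] * steps


def predict_ar(history, coeffs, mean, steps):
    """AR(p): each prediction is a weighted sum of the p previous
    mean-removed values. coeffs[0] weights the most recent value.
    Multi-step predictions feed earlier predictions back in.
    Requires len(history) >= len(coeffs)."""
    p = len(coeffs)
    window = [x - mean for x in history[-p:]]
    predictions = []
    for _ in range(steps):
        nxt = sum(c * window[-k] for k, c in enumerate(coeffs, start=1))
        predictions.append(nxt + mean)
        window.append(nxt)
    return predictions
```

An AR(16) model would pass sixteen coefficients; the example call `predict_ar([1.0, 2.0], [0.5], 0.0, 3)` uses a single coefficient only to keep the arithmetic easy to follow.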
13RPS Toolkit
- Extensible toolkit for implementing resource signal prediction systems
- Easy buy-in for users
- C and sockets (no threads)
- Prebuilt prediction components
- Libraries (sensors, time series, communication)
- Users have bought in
- Incorporated in CMU Remos, BBN QuO
CMU-CS-99-138
http://www.cs.northwestern.edu/RPS
14Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
15A Model of the Unix Scheduler
[Figure: the Unix scheduler takes a task with nominal running time $t_{nom}$ and the background workload, expressed as the actual load $\langle z_t \rangle$, and yields the actual running time $t_{act}$]
16A Model of the Unix Scheduler
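[Figure: the same scheduler model driven by the predicted load $\langle \hat{z}_t \rangle$ in place of the actual load, yielding the predicted running time $t_{exp}$ for a task with nominal running time $t_{nom}$]
$t_{exp} = g(t_{nom}, \langle \hat{z}_t \rangle) = t_{act} + \text{error}$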
17Available Time and Average Load
Available time from 0 to t: $at(t)$
Average load from 0 to t: $al(t)$
Load signal: replace with prediction of load signal
$t_{act}$ is the minimum $t$ where $at(t) = t_{nom}$
Fluid model, Processor Sharing, Idealized Round-Robin
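One way to write the quantities named above, assumed here for illustration (the load signal $z(\tau)$ is taken as the number of competing runnable tasks under processor sharing):

```latex
al(t) = \frac{1}{t}\int_0^t z(\tau)\,d\tau, \qquad
at(t) = \frac{t}{1 + al(t)}, \qquad
t_{act} = \min\{\, t : at(t) = t_{nom} \,\}
```

Under this form, zero load gives $at(t) = t$ and so $t_{act} = t_{nom}$, while a constant load of 1 gives the task half of the processor and $t_{act} = 2\,t_{nom}$.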
18Discrete Time
- No magic here: this is the obvious discretization
- $\Delta$ is the sample interval
- $z_{t+j}$ replaced with prediction
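A minimal sketch of a discretized running-time computation, assuming the available time after $i$ samples is $at_i = i\Delta/(1 + al_i)$, where $al_i$ is the mean of the first $i$ load samples and $\Delta$ is the sample interval (`dt` below). This form and the function name are illustrative assumptions, not the RTA code.

```python
def running_time(t_nom, loads, dt=1.0):
    """Return the first multiple of dt at which the available time
    reaches the nominal time t_nom, or None if the load sequence
    is too short to decide.

    loads[j] is the (actual or predicted) load for sample j+1.
    """
    total_load = 0.0
    for i, z in enumerate(loads, start=1):
        total_load += z
        average_load = total_load / i               # al_i
        available = i * dt / (1.0 + average_load)   # at_i
        # Discrete steps can overshoot, so test >= rather than ==.
        if available >= t_nom:
            return i * dt
    return None
```

With a constant load of 1.0 the task receives half of the processor, so `running_time(3.0, [1.0] * 20)` returns `6.0`; feeding predicted loads instead of measured ones turns the same loop into a predictor.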
19Confidence Intervals
$z_{t+j}$ replaced with $\hat{z}_{t+j}$ in prediction, giving $\hat{al}_i$, $\hat{at}_i$, $\hat{at}(t)$
Confidence interval for $\hat{at}(t)$ is a CI for $\hat{al}_i$ prediction errors
Since this is a sum, the central limit theorem applies
Then a 95% confidence interval follows from the normal approximation to the sum
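As a sketch of the form such an interval takes under the normal approximation: the symbol $\sigma_i$ for the standard deviation of the error in $\hat{al}_i$ is introduced here for illustration, and the mapping to available time assumes $at_i = i\Delta/(1 + al_i)$ with sample interval $\Delta$.

```latex
\hat{al}_i - 1.96\,\sigma_i \;\le\; al_i \;\le\; \hat{al}_i + 1.96\,\sigma_i
\quad\Longrightarrow\quad
\frac{i\Delta}{1 + \hat{al}_i + 1.96\,\sigma_i}
\;\le\; at_i \;\le\;
\frac{i\Delta}{1 + \hat{al}_i - 1.96\,\sigma_i}
```

The factor 1.96 is the two-sided 95% quantile of the standard normal distribution. The right-hand bound is meaningful only while $1 + \hat{al}_i - 1.96\,\sigma_i > 0$.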
20The Variance of the Sum
- Prediction errors $a_{t+j}$ are not independent
- Predictor's covariance matrix captures this
- Predictor makes it possible to compute this variance and thus the CI
- Important detail: load discounting
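A minimal sketch of why the covariance matrix matters: the variance of a sum of correlated errors is the sum of all entries of their covariance matrix, not only the diagonal. The function names are hypothetical, the interval is for the average predicted load under a normal approximation, and load discounting is not modeled.

```python
import math


def variance_of_sum(cov):
    """Variance of e_1 + ... + e_n given the n x n covariance matrix
    of the errors: the sum of every entry, off-diagonals included."""
    return sum(sum(row) for row in cov)


def average_load_ci(predicted_loads, cov, z_crit=1.96):
    """Approximate confidence interval for the average of the predicted
    loads. z_crit=1.96 corresponds to a 95% two-sided normal interval.
    cov is the covariance matrix of the 1..n step-ahead errors."""
    n = len(predicted_loads)
    mean = sum(predicted_loads) / n
    # The average is the sum divided by n, so its std dev scales by 1/n.
    std_dev = math.sqrt(variance_of_sum(cov)) / n
    lower = max(0.0, mean - z_crit * std_dev)  # load cannot be negative
    upper = mean + z_crit * std_dev
    return lower, upper
```

For errors with covariance `[[1, 0.5], [0.5, 1]]`, `variance_of_sum` gives 3.0, whereas treating the errors as independent would give 2.0 and an interval that is too narrow.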
21Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
22Experimental Setup
- Environment
- Alphastation 255s, Digital Unix 4.0
- Workload: host load trace playback [LCR 2000]
- Prediction system on each host
- AR(16), MEAN, LAST
- Tasks
- Nominal time: U(0.1,10) seconds
- Interarrival time: U(5,15) seconds
- 95% confidence level
- Methodology
- Predict CIs
- Run task and measure
http://www.cs.northwestern.edu/pdinda/LoadTraces/playload
23Metrics
- Coverage
- Fraction of testcases within confidence interval
- Ideally should equal the target 95%
- Span
- Average length of confidence interval
- Ideally as short as possible
- $R^2$ between $t_{exp}$ and $t_{act}$
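The three metrics can be sketched as follows. The function names are hypothetical, and $R^2$ is taken here to be the squared Pearson correlation between predicted and actual running times, which is an assumption about the intended definition.

```python
def coverage(intervals, actual):
    """Fraction of test cases whose actual running time falls inside
    the predicted confidence interval (lo, hi)."""
    hits = sum(1 for (lo, hi), t in zip(intervals, actual) if lo <= t <= hi)
    return hits / len(actual)


def span(intervals):
    """Average length of the confidence intervals."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


def r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and actual times."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)
```

A predictor can reach the target coverage trivially with very wide intervals, which is why coverage and span are read together.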
24General Picture of Results
- Five classes of behavior
- I'll show you two
- RTA Works
- Coverage near 95% in most cases is possible
- Predictor quality matters
- Better predictors lead to smaller spans on lightly loaded hosts and to correct coverage on heavily loaded hosts
- AR(16) > LAST > MEAN
- Performance is slightly dependent on nominal time
25Most Common Coverage Behavior
26Most Common Span Behavior
27Uncommon Coverage Behavior
28Uncommon Span Behavior
29Related Work
- Distributed interactive applications
- QuakeViz/Dv, Aeschlimann PDPTA99
- Quality of service
- QuO, Zinky, Bakken, Schantz TPOS, April 97
- QRAM, Rajkumar, et al RTSS97
- Distributed soft real-time systems
- Lawrence, Jensen assorted
- Workload studies for load balancing
- Mutka, et al PerfEval 91
- Harchol-Balter, et al SIGMETRICS 96
- Resource signal measurement systems
- Remos HPDC98
- Network Weather Service HPDC97, HPDC99
- Host load prediction
- Wolski, et al HPDC99 (NWS)
- Samadani, et al PODC95
- Hailperin 93
- Application-level scheduling
- Berman, et al HPDC96
30Conclusions
- Predict running time of compute-bound task
- Based on host load prediction
- Prediction is a confidence interval
- Confidence interval algorithm
- Covariance matrix
- Load discounting
- Effective for domain
- Digital Unix, 0.1-10 second tasks, 5-15 second interarrival
- Extensions in progress
31For More Information
- All software and traces are available
- RPS, RTA, RTSA: http://www.cs.northwestern.edu/RPS
- Load traces and playback: http://www.cs.northwestern.edu/pdinda/LoadTraces
- Prescience Lab
- Peter Dinda, Jason Skicewicz, Dong Lu
- http://www.cs.northwestern.edu/plab
32Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions
33A Universal Problem
Which host should the application send the task to so that its running time is appropriate?
Example: real-time
Known resource requirements
What will the running time be if I...
[Figure: a Task whose destination host is undecided (?)]
34Running Time Advisor
Application notifies advisor of task's computational requirements (nominal time)
Advisor predicts running time on each host
Application assigns task to most appropriate host
[Figure: a Task with its nominal time and the Predicted Running Time on each candidate host]
35Real-time Scheduling Advisor
Application specifies task's computational requirements (nominal time) and its deadline
Advisor acquires predicted task running times for all hosts
Advisor chooses one of the hosts where the deadline can be met
[Figure: a Task with its nominal time and deadline, and the Predicted Running Time on each candidate host compared against the deadline]
36Confidence Intervals to Characterize Variability
Example: 3 to 5 seconds with 95% confidence
Application specifies confidence level (e.g., 95%)
Running time advisor predicts running times as a confidence interval (CI)
Real-time scheduling advisor chooses host where CI is less than deadline
CI captures variability to the extent the application is interested in it
[Figure: a Task with its nominal time, deadline, and 95% confidence level, and the Predicted Running Time interval on each candidate host compared against the deadline]
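A minimal sketch of the host-selection rule described above, assuming each host's prediction arrives as a `(lower, upper)` confidence interval in seconds. The function name, the dictionary layout, and the tie-breaking rule (prefer the smallest upper bound) are illustrative assumptions, not the advisor's actual interface or policy.

```python
def choose_host(predictions, deadline):
    """Pick a host whose whole confidence interval lies below the deadline.

    predictions maps host name -> (lower, upper) running-time interval.
    Returns None when no host can meet the deadline at the requested
    confidence level.
    """
    feasible = {
        host: ci for host, ci in predictions.items() if ci[1] < deadline
    }
    if not feasible:
        return None
    # Assumed tie-break: the host with the smallest upper bound.
    return min(feasible, key=lambda host: feasible[host][1])
```

For example, with intervals of 3 to 5 seconds on one host, 4.1 to 6.3 seconds on a second, and 2 to 4.5 seconds on a third, a 5.5 second deadline rules out the second host and this sketch picks the third.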
37Prototype System
This Paper
38Load Discounting Motivation
- I/O priority boost
- Short tasks less affected by load
39Load Discounting
- Apply before using load predictions
- $t_{discount}$ is an estimable machine property