The Running Time Advisor: A Resource Signal-based Approach to Predicting Task Running Time and Its Applications

1
The Running Time Advisor: A Resource Signal-based
Approach to Predicting Task Running Time and Its
Applications

Peter A. Dinda
Carnegie Mellon University
http://www.cs.cmu.edu/pdinda

2
High Level Goals
Build systems that use statistics to help
distributed applications adapt to highly variable
resource availability
Focus on information

Application-level performance predictions
Running time of compute-bound tasks
Adaptation advice
Host selection to meet soft real-time deadline
Resource signal approach
Host load signals

This Talk
3
Outline

Bird's-eye view
Adapting to highly variable resource availability
Dv/QuakeViz
Real-time scheduling advisor
Running time advisor
Confidence intervals
Performance results (feasible, practical, useful)
Prototype system
Host load prediction
Traces, structure, linear models, evaluation
RPS Toolkit
Conclusion

4
A Universal Challenge in High Performance
Distributed Applications

Highly variable resource availability
Shared resources
No reservations
No globally respected priorities
Competition from other users - background
workload
Running time can vary drastically
Adaptation

5
A Universal Problem
Which host should the application send the task
to so that its running time is appropriate?
[Figure: a task with known resource requirements and several candidate hosts; the question posed is "What will the running time be if I..."]
6
DV Framework For Distributed Interactive
Visualization

Large datasets (e.g., earthquake simulations)
Distributed VTK visualization pipelines
Active frames
Encapsulate data, computation, path through
pipeline
Launched from server by user interaction
Annotated with deadline
Dynamically choose on which host each pipeline
stage will execute and what quality settings to
use

http://www.cs.cmu.edu/dv
7
Example DV Pipeline for QuakeViz
[Figure: logical and physical views of the pipeline. Logical view labels: simulation output, reading, interpolation, morphology reconstruction, isosurface extraction, scene synthesis, rendering, local display and user, with user-controlled resolution, contours, and ROI. Physical view labels: interpolation, isosurface extraction, and scene synthesis stages, each with a deadline and an open host choice ("?"), processing active frames n, n+1, and n+2.]
8
Real-time Scheduling Advisor

Distributed interactive applications
Examples: CMU Dv/QuakeViz, BBN OpenMap
Assumptions
Sequential tasks initiated by user actions
Aperiodic arrivals
Resilient deadlines (soft real-time)
Compute-bound tasks
Known computational requirements
Best-effort semantics
Recommend host where deadline is likely to be met
Predict running time on that host
No guarantees

9
Running Time Advisor
Application notifies advisor of task's computational requirements (nominal time)
Advisor predicts running time on each host
Application assigns task to most appropriate host
[Figure: a task with a nominal time and the predicted running time on each candidate host]
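A rough sketch of the interaction on this slide follows. It assumes a compute-bound task that shares the CPU equally with the competing load, so its running time is about the nominal time multiplied by (1 + average predicted load). That model, the function names, and the "smallest prediction wins" rule are illustrative assumptions, not the estimator described in the talk.

```python
def predict_running_time(nominal_time, predicted_loads):
    """Estimate the running time (seconds) of a compute-bound task.

    Assumption: the task receives a 1 / (1 + load) share of the CPU, so it
    slows down by a factor of (1 + average predicted load).
    """
    if not predicted_loads:
        raise ValueError("need at least one load prediction")
    avg_load = sum(predicted_loads) / len(predicted_loads)
    return nominal_time * (1.0 + avg_load)


def pick_host(nominal_time, loads_by_host):
    """Return (host, predicted running time) for the smallest prediction.

    "Most appropriate" is interpreted here as "shortest predicted running
    time", which is an assumption made for this sketch.
    """
    predictions = {
        host: predict_running_time(nominal_time, loads)
        for host, loads in loads_by_host.items()
    }
    best = min(predictions, key=predictions.get)
    return best, predictions[best]
```

For example, `pick_host(2.0, {"a": [1.0], "b": [0.5]})` compares a predicted 4.0 seconds on host `a` with 3.0 seconds on host `b`.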
10
Real-time Scheduling Advisor
Application notifies advisor of task's computational requirements (nominal time) and its deadline
Advisor acquires predicted task running times for all hosts
Advisor recommends one of the hosts where the deadline can be met
[Figure: a task with a nominal time and a deadline; the predicted running time on each candidate host is compared against the deadline]
11
Variability and Prediction
[Figure: a resource signal with high variability over time t is fed to a predictor, which outputs a prediction whose error signal has low variability over t, together with a characterization of that variability (ACF)]
Exchange high resource availability
variability for low prediction error variability
and a characterization of that variability
12
Confidence Intervals to Characterize Variability
3 to 5 seconds with 95% confidence
Application specifies confidence level (e.g., 95%)
Running time advisor predicts running times as a confidence interval (CI)
Real-time scheduling advisor chooses host where CI is less than deadline
CI captures variability to the extent the application is interested in it
[Figure: the predicted running time on each host drawn as a 95% confidence interval against the deadline; the task is annotated with its nominal time and deadline]
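The selection rule on this slide can be sketched as below. The sketch assumes each host's prediction arrives as an expected running time plus a standard deviation of the prediction error, and that the error is normally distributed; both the input format and the tie-breaking rule are assumptions for illustration.

```python
from statistics import NormalDist


def confidence_interval(mean, stddev, confidence=0.95):
    """Two-sided CI for running time, assuming normally distributed error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return max(0.0, mean - z * stddev), mean + z * stddev


def recommend_host(predictions, deadline, confidence=0.95):
    """Pick a host whose whole CI lies at or below the deadline.

    predictions maps host -> (expected running time, error std dev).
    Returns None when no host is expected to meet the deadline.
    """
    feasible = {}
    for host, (mean, stddev) in predictions.items():
        _, high = confidence_interval(mean, stddev, confidence)
        if high <= deadline:
            feasible[host] = high
    if not feasible:
        return None
    # Tie-breaking is an assumption: prefer the smallest upper bound.
    return min(feasible, key=feasible.get)
```

A host with a low expected running time but a wide interval can be rejected in favor of a slower host with a tight interval, which is the point of asking for a CI rather than a single number.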
13
Confidence Intervals And Predictor Quality
Bad predictor: no obvious choice
Good predictor: two good choices
[Figure: predicted running time CIs on each host plotted against the deadline, once for a bad predictor and once for a good predictor]
Good predictors provide smaller CIs
Smaller CIs simplify scheduling decisions
14
Overview of Research Results

Predicting CIs is feasible
Host load prediction using AR(16) models
Running time estimation using host load
predictions
Predicting CIs is practical
RPS Toolkit (incorporated in CMU Remos, BBN QuO)
Extremely low-overhead online system
Predicting CIs is useful
Performance of real-time scheduling advisor

Measured performance of real system
Statistically rigorous analysis and evaluation
15
Experimental Setup

Environment
AlphaStation 255s, Digital Unix 4.0
Workload: host load trace playback
Prediction system on each host
Tasks
Nominal time: U(0.1, 10) seconds
Interarrival time: U(5, 15) seconds
Methodology
Predict CIs / Host recommendations
Run task and measure
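A minimal sketch of how a randomized task stream with these distributions could be generated is shown below. The function name and the (arrival time, nominal time) tuple layout are assumptions for illustration; the talk does not show its workload generator.

```python
import random


def generate_tasks(count, seed=None):
    """Return a list of (arrival time, nominal time) pairs, in seconds."""
    rng = random.Random(seed)
    now = 0.0
    tasks = []
    for _ in range(count):
        now += rng.uniform(5.0, 15.0)     # interarrival time ~ U(5, 15)
        nominal = rng.uniform(0.1, 10.0)  # nominal time ~ U(0.1, 10)
        tasks.append((now, nominal))
    return tasks
```

Each generated task would then be handed to the advisor for a CI or host recommendation, run, and its measured running time compared with the prediction.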

16
Predicting CIs is Feasible
Near-perfect CIs on typical hosts
3000 randomized tasks
17
Predicting CIs is Practical - RPS System
<2% of CPU at appropriate rate
1-2 ms latency from measurement to prediction
2 KB/sec transfer rate
18
Predicting CIs is Useful - Real-time Scheduling
Advisor
[Figure: results for 16000 tasks comparing three host selection strategies: host with lowest load, predicted CI < deadline, and random host]
19
Predicting CIs is Useful - Real-time Scheduling
Advisor
[Figure: results for 16000 tasks comparing three host selection strategies: predicted CI < deadline, host with lowest load, and random host]
20
Outline

Bird's-eye view
Adapting to highly variable resource availability
Dv/QuakeViz
Real-time scheduling advisor
Running time advisor
Confidence intervals
Performance results (feasible, practical, useful)
Prototype system
Host load prediction
Traces, structure, linear models, evaluation
RPS Toolkit
Conclusion

21
Design Space
Can the gap between the resources and the
application be spanned? Yes!
22
Resource Signals

Characteristics
Easily measured, time-varying scalar quantities
Strongly correlated with resource availability
Periodically sampled (discrete-time signal)
Examples
Host load (Digital Unix 5 second load average)
Network flow bandwidth and latency

Leverage existing statistical signal analysis and
prediction techniques
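As an illustration of "periodically sampled", the sketch below turns any scalar measurement into a discrete-time signal. The function name and its parameters are hypothetical, and the reader callable is supplied by the caller.

```python
import time


def sample_signal(read, count, period=1.0, sleep=time.sleep):
    """Collect `count` (timestamp, value) samples, one per `period` seconds.

    `read` is any zero-argument callable returning the current scalar
    measurement. `sleep` is injectable so the loop can be tested quickly.
    """
    samples = []
    for i in range(count):
        samples.append((time.time(), float(read())))
        if i + 1 < count:
            sleep(period)
    return samples
```

On a Unix host, `sample_signal(lambda: os.getloadavg()[0], 60)` (after `import os`) would record one minute of the 1-minute load average at 1 Hz. That is a coarser average than the Digital Unix 5 second load average named on the slide, and this simple loop does not correct for timing drift.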
23
RPS Toolkit

Extensible toolkit for implementing resource
signal prediction systems
Easy buy-in for users
C and sockets (no threads)
Prebuilt prediction components
Libraries (sensors, time series, communication)
Users have bought in
Incorporated in CMU Remos, BBN QuO
Research users: Bruce Lowekamp, Nancy Miller,
LeMonte Green

http://www.cs.cmu.edu/pdinda/RPS.html
24
Prototype System
RPS components can be composed in other ways
25
Research Results

Host load on real hosts has exploitable structure
Strong autocorrelation, self-similarity, epochal
behavior
Trace database and host load trace playback
Host load is predictable using simple linear
models
Recommendation: AR(16) models or better for 1-30
sec predictions
RPS Toolkit for low overhead systems (<2% of CPU)
C, ported to 5 OSes, incorporated in CMU Remos,
BBN QuO
Running time CIs can be computed from load
predictions
Load discounting, error covariances
Effective real-time scheduling advice can be
based on CIs
Know if deadline will be met before running task

26
Outline

Bird's-eye view
Adapting to highly variable resource availability
Dv/QuakeViz
Real-time scheduling advisor
Running time advisor
Confidence intervals
Performance results (feasible, practical, useful)
Prototype system
Host load prediction
Traces, structure, linear models, evaluation
RPS Toolkit
Conclusion

27
Questions

What are the properties of host load?
Is host load predictable?
What predictive models are appropriate?
Are host load predictions useful?

28
Overview of Answers

Host load exhibits complex behavior
Strong autocorrelation, self-similarity, epochal
behavior
Host load is predictable
1 to 30 second timeframe
Simple linear models are sufficient
Recommend AR(16) or better
Predictions are useful
Can compute effective CIs from them

29
Host Load Traces

DEC Unix 5 second exponential average
Full bandwidth captured (1 Hz sample rate)
Long durations

30
If Host Load Was Random (White Noise)...
[Figure: white noise shown in the time domain and the frequency domain, with its autocorrelation and spectrogram]
31
Host Load Has Exploitable Structure
[Figure: a host load trace shown in the time domain and the frequency domain, with its autocorrelation and spectrogram]
32
Linear Time Series Models
Pole-zero / state-space models capture
autocorrelation parsimoniously
(2000 sample fits, largest models in study, 30
secs ahead)
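To make the idea of a linear model that captures autocorrelation concrete, here is a small pure-Python sketch of an autoregressive (all-pole) model. It fits AR(p) coefficients from sample autocovariances using the standard Levinson-Durbin recursion and iterates the model to predict several steps ahead. The fitting method and all names are choices made for this sketch; the slides do not say how the RPS Toolkit fits its models.

```python
def autocovariance(series, max_lag):
    """Biased sample autocovariances r[0..max_lag] of a sequence."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    return [
        sum(centered[i] * centered[i + k] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def fit_ar(series, order):
    """Fit AR(order) with the Levinson-Durbin recursion.

    Returns (mean, coeffs) for the model
    z[t] - mean = sum(coeffs[j] * (z[t-1-j] - mean)) + noise.
    """
    r = autocovariance(series, order)
    mean = sum(series) / len(series)
    coeffs = []
    error = r[0]  # prediction error variance of the current model
    for k in range(1, order + 1):
        if error <= 0.0:
            # Degenerate signal: nothing left to model, pad with zeros.
            coeffs += [0.0] * (order - len(coeffs))
            break
        acc = r[k] - sum(coeffs[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / error
        coeffs = [
            coeffs[j] - reflection * coeffs[k - 2 - j] for j in range(k - 1)
        ] + [reflection]
        error *= 1.0 - reflection * reflection
    return mean, coeffs


def predict_ahead(history, mean, coeffs, steps):
    """Iterate the AR model to get the 1..steps step-ahead predictions.

    `history` must contain at least len(coeffs) samples.
    """
    window = [x - mean for x in history[-len(coeffs):]]
    out = []
    for _ in range(steps):
        nxt = sum(c * window[-1 - j] for j, c in enumerate(coeffs))
        out.append(nxt + mean)
        window.append(nxt)
    return out
```

With a 1 Hz load trace in `trace`, `mean, coeffs = fit_ar(trace[:2000], 16)` followed by `predict_ahead(trace[:2000], mean, coeffs, 30)` would mirror the slide's setting of a 2000-sample fit and predictions up to 30 seconds ahead.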
33
Evaluation Methodology

Ran 190,000 randomly chosen testcases on the
traces
Evaluate models independently of
prediction/evaluation framework
No monitoring
30 testcases per trace, model class, parameter
set
Data-mine results

Offline and online systems implemented using RPS
Toolkit
34
Testcases

Models
MEAN, LAST/BM(32)
Randomly chosen model from AR(1..32), MA(1..8),
ARMA(1..8,1..8), ARIMA(1..8,1..2,1..8),
ARFIMA(1..8,d,1..8)

35
Evaluating a Testcase
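[Figure: evaluation of a testcase. The modeler fits a model of the chosen type to the measurements in the fit interval, $\langle z_{t-m}, \ldots, z_{t-2}, z_{t-1} \rangle$. The resulting load predictor consumes the measurements in the test interval, $z_t, z_{t+1}, \ldots, z_{t+n-1}$, and emits a prediction stream of 1- to w-step-ahead predictions ($z_{t,t+1}, z_{t,t+2}, \ldots, z_{t,t+w}$, then $z_{t+1,t+2}, z_{t+1,t+3}, \ldots, z_{t+1,t+1+w}$, and so on) along with error estimates (characterization of variation). The evaluator compares the prediction stream with the measurements to produce error metrics (measurement of variation). Other labels in the figure: one-time use, production stream.]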
36
Measured Prediction Variance Mean Squared Error
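The load predictor consumes the measurements $\ldots, z_{t+1}, z_t$ and emits, at each time $t+i$, a prediction $z_{t+i,t+i+k}$ of the future value $z_{t+i+k}$ for $k = 1, 2, \ldots, w$:

1 step ahead predictions: $z_{t,t+1},\ z_{t+1,t+2},\ z_{t+2,t+3},\ \ldots$
2 step ahead predictions: $z_{t,t+2},\ z_{t+1,t+3},\ z_{t+2,t+4},\ \ldots$
...
w step ahead predictions: $z_{t,t+w},\ z_{t+1,t+1+w},\ z_{t+2,t+2+w},\ \ldots$

Variance of z: $\sigma_z^2$, the mean over $i$ of $(\mu - z_{t+i})^2$
1 step ahead mean squared error: $\sigma_{a,1}^2$, the mean over $i$ of $(z_{t+i,t+i+1} - z_{t+i+1})^2$
2 step ahead mean squared error: $\sigma_{a,2}^2$
...
w step ahead mean squared error: $\sigma_{a,w}^2$

Good load predictor: $\sigma_{a,1}^2, \sigma_{a,2}^2, \ldots, \sigma_{a,w}^2 \ll \sigma_z^2$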
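The comparison on this slide can be sketched in a few lines: compute the k-step-ahead mean squared error of a predictor over a trace and compare it with the variance of the trace itself. The predictor interface used here (a callable taking the history and a number of steps) is an assumption for illustration. LAST is one of the baseline models named in the talk, taken here to mean "predict the most recent measurement".

```python
def variance(series):
    """Population variance of the signal: mean of (mu - z)^2."""
    mean = sum(series) / len(series)
    return sum((mean - z) ** 2 for z in series) / len(series)


def step_ahead_mse(series, predictor, max_steps, warmup):
    """Mean squared error of the 1..max_steps step-ahead predictions.

    predictor(history, steps) must return `steps` predictions for the
    samples that follow `history`. Predictions start after `warmup`
    samples; horizons with no scorable prediction yield NaN.
    """
    sums = [0.0] * max_steps
    counts = [0] * max_steps
    for t in range(warmup, len(series)):
        preds = predictor(series[:t], max_steps)
        for k, pred in enumerate(preds):
            if t + k < len(series):
                sums[k] += (pred - series[t + k]) ** 2
                counts[k] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def last_predictor(history, steps):
    """LAST model: every future value equals the latest measurement."""
    return [history[-1]] * steps
```

A predictor is worthwhile in the slide's sense when every entry of `step_ahead_mse(...)` is much smaller than `variance(series)`.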
37
Unpaired Box Plot Comparisons
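[Figure: box plots of mean squared error for Model A, Model B, and Model C, each marking the 2.5, 25, 50, 75, and 97.5 percentiles and the mean. The three cases illustrated are inconsistent low error, consistent high error, and consistent low error.]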
Good models achieve consistently low error
38
1 second Predictions, All Hosts
[Figure: box plots of prediction error by model, marking the 2.5, 25, 50, 75, and 97.5 percentiles and the mean]
Predictive models clearly worthwhile
39
30 second Predictions, All Hosts
[Figure: box plots of prediction error by model, marking the 2.5, 25, 50, 75, and 97.5 percentiles and the mean]
Predictive models clearly beneficial even at long
prediction horizons
40
30 Second Predictions, High Load, Dynamic Host
[Figure: box plots of prediction error by model, marking the 2.5, 25, 50, 75, and 97.5 percentiles and the mean]
Predictive models clearly worthwhile
Begin to see differentiation between models
41
Outline