The Running Time Advisor: A Resource Signal-based Approach to Predicting Task Running Time and Its Applications

1
The Running Time Advisor: A Resource Signal-based
Approach to Predicting Task Running Time and Its
Applications
  • Peter A. Dinda
  • Carnegie Mellon University
  • http://www.cs.cmu.edu/~pdinda

2
High Level Goals
Build systems that use statistics to help
distributed applications adapt to highly variable
resource availability. Focus on information:
  • Application-level performance predictions
  • Running time of compute-bound tasks
  • Adaptation advice
  • Host selection to meet soft real-time deadline
  • Resource signal approach
  • Host load signals

This Talk
3
Outline
  • Bird's-eye view
  • Adapting to highly variable resource availability
  • Dv/QuakeViz
  • Real-time scheduling advisor
  • Running time advisor
  • Confidence intervals
  • Performance results (feasible, practical, useful)
  • Prototype system
  • Host load prediction
  • Traces, structure, linear models, evaluation
  • RPS Toolkit
  • Conclusion

4
A Universal Challenge in High Performance
Distributed Applications
  • Highly variable resource availability
  • Shared resources
  • No reservations
  • No globally respected priorities
  • Competition from other users - background
    workload
  • Running time can vary drastically
  • Adaptation

5
A Universal Problem
Which host should the application send the task
to so that its running time is appropriate?
[Figure: a task with known resource requirements and several candidate hosts, each raising the question "What will the running time be if I..."]
6
DV Framework For Distributed Interactive
Visualization
  • Large datasets (e.g., earthquake simulations)
  • Distributed VTK visualization pipelines
  • Active frames
  • Encapsulate data, computation, path through
    pipeline
  • Launched from server by user interaction
  • Annotated with deadline
  • Dynamically choose on which host each pipeline
    stage will execute and what quality settings to
    use

http://www.cs.cmu.edu/dv
7
Example DV Pipeline for QuakeViz
[Figure: logical view and physical view of the pipeline. Labels in the logical view: simulation output, reading, interpolation, morphology reconstruction, isosurface extraction, scene synthesis, rendering, local display and user, with the parameters resolution, contours, and ROI. Labels in the physical view: interpolation, isosurface extraction, and scene synthesis, with three "?" marks, and active frames n, n+1, and n+2, each annotated with a deadline.]
8
Real-time Scheduling Advisor
  • Distributed interactive applications
  • Examples: CMU Dv/QuakeViz, BBN OpenMap
  • Assumptions
  • Sequential tasks initiated by user actions
  • Aperiodic arrivals
  • Resilient deadlines (soft real-time)
  • Compute-bound tasks
  • Known computational requirements
  • Best-effort semantics
  • Recommend host where deadline is likely to be met
  • Predict running time on that host
  • No guarantees

9
Running Time Advisor
  • Application notifies advisor of task's computational
    requirements (nominal time)
  • Advisor predicts running time on each host
  • Application assigns task to most appropriate host
[Figure: a task with a nominal time and the predicted running time on each candidate host]
10
Real-time Scheduling Advisor
  • Application notifies advisor of task's computational
    requirements (nominal time) and its deadline
  • Advisor acquires predicted task running times for
    all hosts
  • Advisor recommends one of the hosts where the
    deadline can be met
[Figure: a task with a nominal time and a deadline; the predicted running time on each candidate host is compared against the deadline]
11
Variability and Prediction
[Figure: a resource signal with high availability variability over time t is fed to a predictor, which yields a prediction and an error signal with low variability over t; the error variability is characterized by its ACF]
Exchange high resource availability
variability for low prediction error variability
and a characterization of that variability
12
Confidence Intervals to Characterize Variability
"3 to 5 seconds with 95% confidence"
  • Application specifies confidence level (e.g., 95%)
  • Running time advisor predicts running times as a
    confidence interval (CI)
  • Real-time scheduling advisor chooses host where CI
    is less than deadline
  • CI captures variability to the extent the
    application is interested in it
[Figure: predicted running time CIs (95% confidence) on each candidate host for a task with a nominal time and a deadline]
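The host choice described on this slide can be sketched in a few lines. This is a minimal illustration, not the advisor's actual interface: the function name `recommend_host`, the dictionary layout, and the tie-break (smallest CI upper bound) are assumptions made for the example. The slides only say that the advisor recommends a host where the CI is less than the deadline.

```python
from typing import Dict, Optional, Tuple

def recommend_host(cis: Dict[str, Tuple[float, float]],
                   deadline: float) -> Optional[str]:
    """Return a host whose whole running time CI fits within the deadline.

    cis maps a host name to (lower, upper) bounds in seconds.
    Returns None when no host qualifies (best effort, no guarantees).
    """
    feasible = {host: ci for host, ci in cis.items() if ci[1] <= deadline}
    if not feasible:
        return None
    # Assumed tie-break: prefer the host with the smallest upper bound.
    return min(feasible, key=lambda host: feasible[host][1])

# Hypothetical CIs at the confidence level requested by the application.
example_cis = {"hostA": (3.0, 5.0), "hostB": (2.0, 9.0), "hostC": (4.0, 4.5)}
print(recommend_host(example_cis, deadline=6.0))
```

Here hostB is rejected even though its lower bound is the smallest, because its upper bound exceeds the deadline: the width of the CI, not just the expected running time, drives the decision.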
13
Confidence Intervals And Predictor Quality
  • Bad predictor: no obvious choice
  • Good predictor: two good choices
[Figure: predicted running time CIs on each host plotted against the deadline, for a bad predictor and for a good predictor]
Good predictors provide smaller CIs. Smaller CIs
simplify scheduling decisions.
14
Overview of Research Results
  • Predicting CIs is feasible
  • Host load prediction using AR(16) models
  • Running time estimation using host load
    predictions
  • Predicting CIs is practical
  • RPS Toolkit (inc. in CMU Remos, BBN QuO)
  • Extremely low-overhead online system
  • Predicting CIs is useful
  • Performance of real-time scheduling advisor

Measured performance of real system
Statistically rigorous analysis and evaluation
15
Experimental Setup
  • Environment
  • Alphastation 255s, Digital Unix 4.0
  • Workload: host load trace playback
  • Prediction system on each host
  • Tasks
  • Nominal time: U(0.1,10) seconds
  • Interarrival time: U(5,15) seconds
  • Methodology
  • Predict CIs / Host recommendations
  • Run task and measure
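The task workload described above is easy to reproduce in outline. The sketch below draws nominal times from U(0.1,10) seconds and interarrival times from U(5,15) seconds, as stated on the slide; the function name `generate_tasks` and the tuple layout are assumptions for illustration, and the host load trace playback part of the setup is not modeled.

```python
import random

def generate_tasks(count, rng):
    """Return a list of (arrival_time, nominal_time) pairs in seconds."""
    tasks = []
    now = 0.0
    for _ in range(count):
        now += rng.uniform(5.0, 15.0)        # interarrival time ~ U(5,15)
        nominal = rng.uniform(0.1, 10.0)     # nominal time ~ U(0.1,10)
        tasks.append((now, nominal))
    return tasks

rng = random.Random(42)                      # fixed seed for repeatability
workload = generate_tasks(3000, rng)
print(len(workload), "tasks spanning", round(workload[-1][0]), "seconds")
```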

16
Predicting CIs is Feasible
Near-perfect CIs on typical hosts
3000 randomized tasks
17
Predicting CIs is Practical - RPS System
  • <2% of CPU at appropriate rate
  • 1-2 ms latency from measurement to prediction
  • 2 KB/sec transfer rate
18
Predicting CIs is Useful - Real-time Scheduling
Advisor
[Figure: results for 16000 tasks under three host selection strategies: host with lowest load, predicted CI < deadline, and random host]
19
Predicting CIs is Useful - Real-time Scheduling
Advisor
[Figure: results for 16000 tasks under three host selection strategies: predicted CI < deadline, host with lowest load, and random host]
20
Outline
  • Bird's-eye view
  • Adapting to highly variable resource availability
  • Dv/QuakeViz
  • Real-time scheduling advisor
  • Running time advisor
  • Confidence intervals
  • Performance results (feasible, practical, useful)
  • Prototype system
  • Host load prediction
  • Traces, structure, linear models, evaluation
  • RPS Toolkit
  • Conclusion

21
Design Space
Can the gap between the resources and the
application be spanned? Yes!
22
Resource Signals
  • Characteristics
  • Easily measured, time-varying scalar quantities
  • Strongly correlated with resource availability
  • Periodically sampled (discrete-time signal)
  • Examples
  • Host load (Digital Unix 5 second load average)
  • Network flow bandwidth and latency

Leverage existing statistical signal analysis and
prediction techniques
23
RPS Toolkit
  • Extensible toolkit for implementing resource
    signal prediction systems
  • Easy buy-in for users
  • C and sockets (no threads)
  • Prebuilt prediction components
  • Libraries (sensors, time series, communication)
  • Users have bought in
  • Incorporated in CMU Remos, BBN QuO
  • Research users: Bruce Lowekamp, Nancy Miller,
    LeMonte Green

http://www.cs.cmu.edu/~pdinda/RPS.html
24
Prototype System
RPS components can be composed in other ways
25
Research Results
  • Host load on real hosts has exploitable structure
  • Strong autocorrelation, self-similarity, epochal
    behavior
  • Trace database and host load trace playback
  • Host load is predictable using simple linear
    models
  • Recommendation: AR(16) models or better for 1-30
    sec predictions
  • RPS Toolkit for low overhead systems (<2% of CPU)
  • C, ported to 5 OSes, incorporated in CMU Remos,
    BBN QuO
  • Running time CIs can be computed from load
    predictions
  • Load discounting, error covariances
  • Effective real-time scheduling advice can be
    based on CIs
  • Know if deadline will be met before running task
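As a rough illustration of how a running time CI might follow from a load prediction, the sketch below makes two simplifying assumptions that are not taken from the slides: a compute-bound task on a host with average load L receives a 1/(1+L) share of the CPU, so its running time is about nominal × (1+L), and the load prediction error is normally distributed. The function name `running_time_ci` and its parameters are hypothetical, and the load discounting and error covariances mentioned above are left out.

```python
from statistics import NormalDist

def running_time_ci(nominal, load_mean, load_stderr, confidence=0.95):
    """Return (lower, upper) running time bounds in seconds.

    nominal     -- running time on an otherwise idle host
    load_mean   -- predicted average load over the task's execution
    load_stderr -- standard error of that prediction
    """
    # Two-sided normal quantile, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low_load = max(0.0, load_mean - z * load_stderr)   # load cannot be negative
    high_load = load_mean + z * load_stderr
    # Assumed model: running time = nominal * (1 + average load).
    return nominal * (1.0 + low_load), nominal * (1.0 + high_load)

low, high = running_time_ci(nominal=2.0, load_mean=1.0, load_stderr=0.1)
print(f"{low:.2f} to {high:.2f} seconds with 95% confidence")
```

A better predictor lowers `load_stderr`, which narrows the interval; that is the link between predictor quality and CI size made earlier in the talk.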

26
Outline
  • Bird's-eye view
  • Adapting to highly variable resource availability
  • Dv/QuakeViz
  • Real-time scheduling advisor
  • Running time advisor
  • Confidence intervals
  • Performance results (feasible, practical, useful)
  • Prototype system
  • Host load prediction
  • Traces, structure, linear models, evaluation
  • RPS Toolkit
  • Conclusion

27
Questions
  • What are the properties of host load?
  • Is host load predictable?
  • What predictive models are appropriate?
  • Are host load predictions useful?

28
Overview of Answers
  • Host load exhibits complex behavior
  • Strong autocorrelation, self-similarity, epochal
    behavior
  • Host load is predictable
  • 1 to 30 second timeframe
  • Simple linear models are sufficient
  • Recommend AR(16) or better
  • Predictions are useful
  • Can compute effective CIs from them
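To make the AR(16) recommendation concrete, here is a minimal sketch of fitting an autoregressive model and predicting up to 30 steps ahead. It uses the Yule-Walker equations solved with the Levinson-Durbin recursion, which is one standard fitting method and not necessarily the one used in RPS. The function names and the synthetic series standing in for a host load trace are assumptions.

```python
import random

def autocovariance(x, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[i] * d[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def fit_ar(x, p):
    """Fit AR(p) via Yule-Walker (Levinson-Durbin recursion).

    Returns (mean, phi) where phi[0] weights the most recent sample.
    """
    r = autocovariance(x, p)
    phi, err = [], r[0]
    for k in range(1, p + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err                      # reflection coefficient
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl              # remaining prediction error
    return sum(x) / len(x), phi

def predict(history, mean, phi, steps):
    """Predict 1..steps samples ahead; history needs at least len(phi) samples."""
    buf = [v - mean for v in history[-len(phi):]]
    out = []
    for _ in range(steps):
        nxt = sum(c * buf[-1 - j] for j, c in enumerate(phi))
        out.append(mean + nxt)
        buf.append(nxt)                       # feed predictions back in
    return out

# Synthetic, strongly autocorrelated stand-in for a 1 Hz host load trace.
rng = random.Random(0)
series = [0.0]
for _ in range(4999):
    series.append(0.9 * series[-1] + rng.gauss(0.0, 0.1))
series = [1.0 + v for v in series]

mean, phi = fit_ar(series, 16)
forecast = predict(series, mean, phi, 30)     # 1 to 30 seconds ahead
print(round(phi[0], 2), [round(v, 3) for v in forecast[:3]])
```

Multi-step predictions feed earlier predictions back into the model, so they decay toward the mean as the horizon grows.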

29
Host Load Traces
  • DEC Unix 5 second exponential average
  • Full bandwidth captured (1 Hz sample rate)
  • Long durations
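For readers unfamiliar with exponentially averaged load, the sketch below shows one common way such a signal can be produced: an exponential average with a 5 second time constant, sampled once per second. The exact computation inside the Digital Unix kernel is not given in the slides, so the update rule, the use of run queue length as input, and the function name are assumptions.

```python
import math

def exponential_average(samples, dt=1.0, tau=5.0):
    """Exponentially average samples taken every dt seconds, time constant tau."""
    decay = math.exp(-dt / tau)
    avg, out = 0.0, []
    for x in samples:
        avg = decay * avg + (1.0 - decay) * x
        out.append(avg)
    return out

# Hypothetical run queue lengths: idle, then two runnable tasks, then idle.
queue_lengths = [0] * 5 + [2] * 20 + [0] * 10
load_signal = exponential_average(queue_lengths)
print([round(v, 2) for v in load_signal[4:10]])
```

The averaging smooths abrupt changes in the run queue, which is one reason successive load samples are strongly correlated.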

30
If Host Load Was Random (White Noise)...
Time domain
Autocorrelation
Spectrogram
Frequency domain
31
Host Load Has Exploitable Structure
Time domain
Autocorrelation
Spectrogram
Frequency domain
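The contrast between white noise and a signal with exploitable structure shows up directly in the sample autocorrelation function. The sketch below computes the ACF for both cases; the smoothed series is a synthetic stand-in for host load, not a real trace, and the function name `acf` is an assumption.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]

rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Structured signal: each sample depends strongly on the previous one.
structured = [0.0]
for e in white[1:]:
    structured.append(0.95 * structured[-1] + e)

white_acf = acf(white, 10)
structured_acf = acf(structured, 10)
print("white noise lag 1:", round(white_acf[1], 2))
print("structured lag 1:", round(structured_acf[1], 2))
```

For white noise the ACF is near zero at every nonzero lag, so past samples say nothing about future ones. A slowly decaying ACF means recent history carries information that a linear model can exploit.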
32
Linear Time Series Models
Pole-zero / state-space models capture
autocorrelation parsimoniously
(2000 sample fits, largest models in study, 30
secs ahead)
33
Evaluation Methodology
  • Ran 190,000 randomly chosen testcases on the
    traces
  • Evaluate models independently of
    prediction/evaluation framework
  • No monitoring
  • 30 testcases per trace, model class, parameter
    set
  • Data-mine results

Offline and online systems implemented using RPS
Toolkit
34
Testcases
  • Models
  • MEAN, LAST/BM(32)
  • Randomly chosen model from AR(1..32), MA(1..8),
    ARMA(1..8,1..8), ARIMA(1..8,1..2,1..8),
    ARFIMA(1..8,d,1..8)
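Drawing a random model from the classes listed above could look like the following sketch. Uniform selection over classes and orders is an assumption, as are the names `MODEL_SPACE` and `random_model`. The ARFIMA parameter d is left out of the draw on the assumption that it is estimated during fitting.

```python
import random

# Order ranges per model class, taken from the list above.
MODEL_SPACE = {
    "AR": [(1, 32)],
    "MA": [(1, 8)],
    "ARMA": [(1, 8), (1, 8)],
    "ARIMA": [(1, 8), (1, 2), (1, 8)],
    "ARFIMA": [(1, 8), (1, 8)],   # d is not drawn here (assumed fitted)
}

def random_model(rng):
    """Pick a model class, then pick each of its orders uniformly."""
    name = rng.choice(sorted(MODEL_SPACE))
    orders = tuple(rng.randint(lo, hi) for lo, hi in MODEL_SPACE[name])
    return name, orders

rng = random.Random(7)
for _ in range(3):
    print(random_model(rng))
```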

35
Evaluating a Testcase
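[Figure: evaluating a testcase.
One-time use: the measurements in the fit interval, $\langle z_{t-m}, \ldots, z_{t-2}, z_{t-1} \rangle$, and the model type are given to the modeler, which produces a model.
Production stream: the model yields a load predictor that consumes the measurements in the test interval, $\langle z_t, z_{t+1}, \ldots, z_{t+n-1} \rangle$, and emits a prediction stream ($z_{t,t+1}, z_{t,t+2}, \ldots, z_{t,t+w}$, then $z_{t+1,t+2}, z_{t+1,t+3}, \ldots, z_{t+1,t+1+w}$, and so on) together with error estimates (a characterization of variation).
The evaluator turns the prediction stream and the measurements into error metrics (a measurement of variation).]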
36
Measured Prediction Variance Mean Squared Error
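[Figure: the load predictor consumes the measurements $\ldots, z_{t+1}, z_t$ and at each step emits 1, 2, ..., $w$ step ahead predictions: $z_{t,t+1}, z_{t,t+2}, \ldots, z_{t,t+w}$, then $z_{t+1,t+2}, z_{t+1,t+3}, \ldots, z_{t+1,t+1+w}$, and so on]
  • Variance of $z$: $\sigma_z^2$, the mean of $(\mu - z_{t+i})^2$
  • 1 step ahead mean squared error: $\sigma_{a1}^2$, the mean of $(z_{t+i,t+i+1} - z_{t+i+1})^2$
  • 2 step ahead mean squared error: $\sigma_{a2}^2$
  • ...
  • $w$ step ahead mean squared error: $\sigma_{aw}^2$
Good load predictor: $\sigma_{a1}^2, \sigma_{a2}^2, \ldots, \sigma_{aw}^2 \ll \sigma_z^2$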
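The comparison on this slide, k step ahead mean squared error against the variance of the signal itself, can be sketched as follows. The LAST predictor (repeat the most recent measurement) is used only because it needs no fitting; the function names and the sine wave test signal are assumptions.

```python
import math

def variance(z):
    """Variance of the signal: mean of (mu - z_i)^2."""
    mu = sum(z) / len(z)
    return sum((mu - v) ** 2 for v in z) / len(z)

def last_predictor(history, k):
    """LAST model: predict the most recent measurement for all k steps."""
    return [history[-1]] * k

def mse_k_step(z, predictor, k, warmup=1):
    """Mean squared error of k step ahead predictions over the series.

    At each time t the predictor sees z[0..t-1]; its k-th prediction
    is compared with the measurement z[t + k - 1].
    """
    errors = []
    for t in range(warmup, len(z) - k + 1):
        predictions = predictor(z[:t], k)
        errors.append((predictions[k - 1] - z[t + k - 1]) ** 2)
    return sum(errors) / len(errors)

# Slowly varying synthetic signal, a stand-in for a load trace.
signal = [1.0 + math.sin(i / 20.0) for i in range(400)]
print("variance:", round(variance(signal), 4))
for k in (1, 2, 5):
    print(k, "step ahead MSE:", round(mse_k_step(signal, last_predictor, k), 4))
```

A predictor is worthwhile when its mean squared errors stay well below the signal variance, which is the error you would get by always predicting the mean.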
37
Unpaired Box Plot Comparisons
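[Figure: box plots of mean squared error for Model A, Model B, and Model C, each showing the 2.5th, 25th, 50th, 75th, and 97.5th percentiles and the mean; annotations: inconsistent low error, consistent high error, consistent low error]
Good models achieve consistently low error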
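The summary statistics drawn in these box plots (2.5th, 25th, 50th, 75th, and 97.5th percentiles plus the mean) can be computed as in the sketch below. Linear interpolation between order statistics is an assumed convention, and the error values are made-up placeholders, not results from the study.

```python
def percentile(sorted_values, q):
    """Percentile q (0..100) with linear interpolation between neighbors."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def box_summary(errors):
    """Return the statistics shown in one box of the plot."""
    s = sorted(errors)
    summary = {f"p{q}": percentile(s, q) for q in (2.5, 25, 50, 75, 97.5)}
    summary["mean"] = sum(s) / len(s)
    return summary

# Placeholder per-testcase mean squared errors for one hypothetical model.
example_errors = [0.02, 0.03, 0.03, 0.04, 0.05, 0.05, 0.06, 0.40]
print(box_summary(example_errors))
```

A single large outlier, as in the placeholder data, pulls the mean well above the median; that is how a model with inconsistent error stands out from one with consistently low error.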
38
1 second Predictions, All Hosts
[Figure: box plots per model showing the 2.5th, 25th, 50th, 75th, and 97.5th percentiles and the mean]
Predictive models clearly worthwhile
39
30 second Predictions, All Hosts
[Figure: box plots per model showing the 2.5th, 25th, 50th, 75th, and 97.5th percentiles and the mean]
Predictive models clearly beneficial even at long
prediction horizons
40
30 Second Predictions, High Load, Dynamic Host
[Figure: box plots per model showing the 2.5th, 25th, 50th, 75th, and 97.5th percentiles and the mean]
Predictive models clearly worthwhile. Begin to see
differentiation between models.
41
Outline
  • Bird's-eye view
  • Adapting to highly variable resource availability
  • Dv/QuakeViz
  • Real-time scheduling advisor
  • Running time advisor
  • Confidence intervals
  • Performance results (feasible, practical, useful)
  • Prototype system
  • Host load prediction
  • Traces, structure, linear models, evaluation
  • RPS Toolkit
  • Conclusion

42
Related Work
  • Distributed interactive applications
  • QuakeViz/Dv, Aeschlimann PDPTA99
  • Quality of service
  • QuO, Zinky, Bakken, Schantz TPOS, April 97
  • QRAM, Rajkumar, et al RTSS97
  • Distributed soft real-time systems
  • Lawrence, Jensen assorted
  • Workload studies for load balancing
  • Mutka, et al PerfEval 91
  • Harchol-Balter, et al SIGMETRICS 96
  • Resource signal measurement systems
  • Remos HPDC98
  • Network Weather Service HPDC97, HPDC99
  • Host load prediction
  • Wolski, et al HPDC99 (NWS)
  • Samadani, et al PODC95
  • Hailperin 93
  • Application-level scheduling
  • Berman, et al HPDC96

43
Conclusions
  • Help applications adapt to highly variable
    resource availability
  • Resource signal prediction
  • Predict running times as confidence intervals
  • Predicting CIs is feasible
  • Host load prediction using AR(16) models
  • Running time estimation using host load
    predictions
  • Predicting CIs is practical
  • RPS Toolkit (inc. in CMU Remos, BBN QuO)
  • Extremely low-overhead online system
  • Predicting CIs is useful
  • Performance of real-time scheduling advisor

44
Future Work
  • New resource signals
  • Network bandwidth and latency (Remos)
  • New prediction approaches
  • Wavelets, nonlinearity, cointegration
  • Resource scheduler models
  • Better Unix scheduler model
  • Network models
  • Adaptation advisors
  • Applications and workloads
  • DV/QuakeViz, GIMP, Instrumentation

45
Tools/Venues for Future Work
  • Resource signal methodology
  • RPS Toolkit
  • Remos
  • QuakeViz/DV
  • Grid Forum

46
Future Work (Long Term)
  • Experimental computer science research
  • Application-oriented view
  • Measurement studies and analysis
  • Statistical approach
  • Application services
  • Systems building

systems × applications × statistics
47
Teaching
  • Signals, systems, and statistics for computer
    scientists
  • Performance data analysis
  • Introduction to computer systems

48
Response of Typical AR(16)
49
Response of AR(1024)