Title: Performance Models for Parallel Applications in Grids
1 Performance Models for Parallel Applications in Grids
Presented by Sanjay H.A., Ph.D. Student
Research Supervisor: Dr. Sathish Vadhiyar
Grid Application Research Lab, Supercomputer Education and Research Center, Indian Institute of Science
2 Outline
- Introduction
- Rational Polynomial Model (ICPP 2006)
  - Motivation and Objective
  - Proposed Model
  - Experiments and Results
  - Limitations
- Adaptive Performance Model
  - Motivation
  - Automation Procedure
  - Experiments and Results
  - Procedure for Reducing the Modeling Overheads
- Conclusions and Future Work
3 Introduction
- Performance model
  - Predicts the execution time of an application
  - Provides insight into the performance relationships between an application and the system used to execute it
  - Serves as the objective function for a system resource negotiator/scheduler
  - Helps identify bottlenecks in applications and systems
  - Supports fine-tuning of algorithms
- Approaches
  - Trace-event simulation
  - Parameterized analytical models
  - Curve-fitting models
5 Rational Polynomial Model
- Presented at ICPP 2006, Columbus, Ohio, USA
- Motivation
  - A grid/distributed environment provides the resources required to execute a scientific application, but it is shared by multiple users
  - The performance of an application may therefore be affected in dynamic and often unpredictable ways
  - The scheduler needs a performance model to match applications to machines
  - Curve-fitting modeling strategies need less information about the application
  - Existing curve-fitting modeling strategies assume uniform loading conditions, an assumption that is unrealistic in non-dedicated environments
6 Objective
- To analyze and build a curve-fitting model that attempts to predict execution times under any load conditions that may exist on the systems during application execution
7 Rational Polynomial Model (cont.)
- We evaluate the model using the ScaLAPACK eigenvalue problem
- Measurement and prediction tool
  - Network Weather Service (NWS)
- Loading parameters
  - Available CPU
  - Available bandwidth
- Models were evaluated and compared based on
  - Average percentage prediction error
  - Trend shown by the actual and predicted values
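The average percentage prediction error used for the comparison can be computed as below. This is a minimal sketch assuming the usual definition (mean of |actual - predicted| / actual * 100 over the test runs); the slides do not give the exact formula, and the execution times are made-up illustration values.

```python
def avg_percentage_prediction_error(actual, predicted):
    """Mean of |actual - predicted| / actual * 100 over all test runs.

    Assumes the usual definition; the slides do not spell out the formula.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(a - p) / a * 100.0 for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical measured vs. predicted execution times (seconds), illustration only.
actual = [120.0, 250.0, 400.0]
predicted = [110.0, 270.0, 380.0]
print(round(avg_percentage_prediction_error(actual, predicted), 2))  # 7.11
```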
8 Proposed Model
- General model
  - Uses computation and communication complexities
  - Predicts execution times of generic parallel applications
  - Targets non-dedicated environments with large load dynamics
- Measure available CPU and bandwidth at periodic intervals from the beginning to the end of the application execution
- Average the CPU and bandwidth values collected for each machine: avg_avail_cpu, avg_avail_band
- Take the minimum of all the avg_avail_cpu and avg_avail_band values: min_avg_avail_cpu, min_avg_avail_band
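The per-machine averaging and the minimum over machines can be sketched as follows. The node names and sample values are hypothetical; taking the minimum reflects the slide, and one plausible reading is that the least-available machine bounds the application's progress.

```python
from statistics import mean


def min_of_averages(readings):
    """Average each resource's periodic samples, then return the smallest average.

    readings maps a resource name (a node for CPU, a node or link for
    bandwidth) to the values sampled from the start to the end of the run.
    """
    averages = {name: mean(values) for name, values in readings.items()}
    return min(averages.values())


# Hypothetical samples collected at every monitoring interval during one run.
cpu_samples = {"node0": [0.9, 0.7, 0.8], "node1": [0.5, 0.6, 0.4]}
band_samples = {"node0": [80.0, 60.0, 70.0], "node1": [90.0, 95.0, 100.0]}

min_avg_avail_cpu = min_of_averages(cpu_samples)    # node1 averages about 0.5
min_avg_avail_band = min_of_averages(band_samples)  # node0 averages 70.0
print(min_avg_avail_cpu, min_avg_avail_band)
```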
9 Proposed Model (cont.)
- Training
  - Training data: problem size, min_avg_avail_cpu, min_avg_avail_band, and execution time
- Prediction
  - min_avg_avail_cpu and min_avg_avail_band cannot be measured before the application executes
  - They are instead forecast from their history using the NWS forecasting tool
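The NWS forecaster itself is not reproduced here. The sketch below only illustrates the idea of forecasting a resource value from its history by running several simple predictors and trusting the one with the smallest past error, which is how we understand the NWS approach. The predictor set and all function names are assumptions, not the NWS API.

```python
from statistics import mean, median


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def sliding_median(history, window=5):
    return median(history[-window:])


# Hypothetical predictor set; NWS uses its own, larger family of forecasters.
PREDICTORS = {"last": last_value, "mean": running_mean, "median": sliding_median}


def forecast(history):
    """Return (predictor_name, forecast) for the next value in history.

    Every predictor is replayed over the history to measure its mean squared
    one-step-ahead error; the predictor with the lowest error makes the forecast.
    """
    if len(history) < 2:
        raise ValueError("need at least two past measurements")
    errors = {}
    for name, predict in PREDICTORS.items():
        squared = [(predict(history[:i]) - history[i]) ** 2
                   for i in range(1, len(history))]
        errors[name] = mean(squared)
    best = min(errors, key=errors.get)
    return best, PREDICTORS[best](history)


# Available-CPU history that drops from 0.9 to 0.4 part-way through.
print(forecast([0.9, 0.9, 0.9, 0.4, 0.4, 0.4]))  # ('last', 0.4)
```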
10 Proposed Model (cont.)
- Coefficients
  - Obtained by a polynomial fit to the data points in the training range
  - Posed as a linear regression problem, Ax = b
- Prediction of execution time
  - The fitted coefficients are combined with the system and application parameters in the model
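As an illustration of solving Ax = b for the coefficients, the sketch below fits a hypothetical model T = c0 + c1 * N^3/cpu + c2 * N^2/band, borrowing the cubic computation and quadratic communication complexities reported for the ScaLAPACK eigenvalue problem. The exact rational polynomial form of the ICPP 2006 model may differ. The training points are synthetic, problem sizes are expressed in thousands to keep the normal equations well conditioned, and least squares is solved through the normal equations in plain Python.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares(rows, b):
    """Least-squares solution of A x = b via the normal equations A^T A x = A^T b."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, b)) for i in range(k)]
    return solve(ata, atb)


def features(n, cpu, band):
    # Hypothetical model terms: constant, cubic computation / CPU, quadratic communication / bandwidth.
    return [1.0, n ** 3 / cpu, n ** 2 / band]


TRUE_COEFFS = [2.0, 1.5, 0.8]  # used only to generate synthetic training data


def synthetic_time(n, cpu, band):
    return sum(c * f for c, f in zip(TRUE_COEFFS, features(n, cpu, band)))


# Training runs: (problem size in thousands, min_avg_avail_cpu, min_avg_avail_band).
training = [(3, 0.9, 50.0), (4, 0.6, 40.0), (5, 0.8, 70.0), (6, 0.5, 30.0), (7, 0.7, 60.0)]
A = [features(n, cpu, band) for n, cpu, band in training]
b = [synthetic_time(n, cpu, band) for n, cpu, band in training]
coeffs = least_squares(A, b)

# Predict outside the training range, e.g. N = 10000 with forecast loads.
prediction = sum(c * f for c, f in zip(coeffs, features(10, 0.6, 45.0)))
print([round(c, 3) for c in coeffs], round(prediction, 1))
```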
11 Experiments and Results
- Applications
  - ScaLAPACK parallel eigenvalue problem
  - Parallel Conjugate Gradient (CG) from NPB
  - Parallel FFT application from the FFTW package
- System specifications
  - 8-processor Intel Pentium IV
  - 32-processor IBM P720
- Monitoring
  - Available CPU of the nodes and available bandwidth of the inter-node links are collected every 2 minutes
- Load dynamics
  - Random CPU and network loading during application execution
12 Load Dynamics
(Figures: Network Load; CPU Load)
13 ScaLAPACK Eigenvalue Problem
- Intel system
- Computation complexity: cubic
- Communication complexity: quadratic
- Training range: problem sizes 3000-7000
- Prediction range: problem sizes 7500-12000
- Compared our model with 3 other models
14 Prediction for Eigenvalue Problem on 8 Intel Processors
- Our Model: 11.86
- Prophesy: 16.42
- Type 1 Multi-variate: 20.15
- Type 2 Multi-variate: 13.84
Avg. Percentage Prediction Error on IBM System
15
15 Limitations
- The model does not consider the number of processors as an input
- The user has to provide approximate complexities for the model
- A large amount of data is required to train the model
- The performance model considered here is static
17 Adaptive Performance Model
- Motivation
  - Users are often unaware of the characteristics of the parallel applications they submit to the grid
  - Users submit an application with different problem sizes and different numbers of processors at different times
  - The scheduler has to make dynamic scheduling decisions efficiently
  - The performance model has to adapt dynamically to changes in system parameters
- Adaptive Performance Model: automatically determines approximate complexities and builds the model
18 Automation Procedure
(Figure; recoverable label: Prediction and Model Evaluation phase)
19 Automation Procedure (cont.)
- Metrics for goodness of fit
  - Residual sum of squares
  - Error variance
  - Standard error
- Training experiments
  - Design the training range in terms of problem size
  - Conduct training experiments on a single processor over the single-processor training range
  - Conduct training experiments on 2, 4, and 8 processors over the designed training range
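The three goodness-of-fit metrics can be computed from a model's fitted values as below. We assume the standard regression definitions (error variance = RSS / (n - p) for n observations and p fitted parameters, standard error = its square root); the slides name the metrics but not the formulas, and the numbers are illustrative.

```python
import math


def fit_metrics(actual, fitted, n_params):
    """Return (residual sum of squares, error variance, standard error).

    Assumes the standard definitions: RSS = sum of squared residuals,
    error variance = RSS / (n - p), standard error = sqrt(error variance).
    """
    n = len(actual)
    if n != len(fitted) or n <= n_params:
        raise ValueError("need equal lengths and more observations than parameters")
    rss = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    error_variance = rss / (n - n_params)
    return rss, error_variance, math.sqrt(error_variance)


# Hypothetical measured times and the values a two-parameter model fitted to them.
rss, var, se = fit_metrics([10, 20, 30, 40], [11, 19, 31, 39], n_params=2)
print(rss, var, round(se, 3))  # 4 2.0 1.414
```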
20 Automation Procedure (cont.)
- Computation modeling in terms of problem size
  - Input: predefined functions, CPU functions, and single-processor training data
  - Evaluate every combination of predefined function and CPU function against the training data
  - Choose the top computation functions based on standard error
  - Output: list of top computation complexities in terms of problem size and CPU
- Communication modeling in terms of problem size
  - Input: predefined functions, bandwidth functions, the computation complexity list, and 2-processor training data
  - With the computation complexity fixed, evaluate combinations of predefined communication functions and bandwidth functions
  - Choose the top computation and communication models based on standard error
  - Output: list of top computation and communication complexities in terms of problem size, CPU, and bandwidth
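The computation-modeling step can be sketched as an exhaustive search over (predefined function, CPU function) pairs, each fitted as T ~ a * f(N) * g(cpu) + c and ranked by standard error. The function lists, the product form, and the synthetic single-processor data are assumptions for illustration; the communication step would repeat the same search over communication and bandwidth functions with the computation term held fixed.

```python
import math

# Hypothetical predefined complexity functions and CPU functions.
PREDEFINED = {
    "n": lambda n: n,
    "n log n": lambda n: n * math.log(n),
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
}
CPU_FUNCS = {
    "1/cpu": lambda c: 1.0 / c,
    "1/cpu^2": lambda c: 1.0 / c ** 2,
}


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def standard_error(x, y, slope, intercept):
    rss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(rss / (len(x) - 2))  # two fitted parameters


def rank_computation_models(data, top=3):
    """data: list of (problem_size, avail_cpu, exec_time) from single-processor runs."""
    scored = []
    for fname, f in PREDEFINED.items():
        for gname, g in CPU_FUNCS.items():
            x = [f(n) * g(c) for n, c, _ in data]
            y = [t for _, _, t in data]
            slope, intercept = fit_line(x, y)
            scored.append((standard_error(x, y, slope, intercept), fname, gname))
    scored.sort()
    return scored[:top]


# Synthetic runs generated from T = 5 + 0.002 * n^2 / cpu (illustration only).
data = [(n, c, 5 + 0.002 * n ** 2 / c)
        for n, c in [(100, 0.9), (200, 0.5), (300, 0.8), (400, 0.6), (500, 0.7)]]
for se, f, g in rank_computation_models(data):
    print(f"{f} with {g}: standard error {se:.3g}")
```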
21 Automation Procedure (cont.)
- Tuning the model in terms of the number of processors
  - Input: processor functions, 2-, 4-, and 8-processor training data, and the top computation and communication models
  - Evaluate the computation and communication models with the processor functions
  - Choose the top computation models with processor functions based on standard error
  - With the computation model (in terms of problem size, CPU, and processors) fixed, choose the top communication models with processor functions
  - Output: list of top computation and communication complexities in terms of problem size, processors, CPU, and bandwidth
22 Automation Procedure (cont.)
- Prediction and model evaluation phase
  - User input: problem size and number of processors
  - min_avg_avail_cpu and min_avg_avail_bandwidth are predicted by feeding past CPU and bandwidth values to the NWS forecaster
  - Execution time is predicted using the top-ranked model
  - After the task executes, its data are added to the training set
  - To address dynamic changes in system parameters, the model list is re-evaluated with the training data and the model rankings are updated
  - At every function evaluation, the lowest-ranked functions are deleted using a percentile technique
  - Training data containing anomalies are removed automatically
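The re-ranking, percentile pruning, and anomaly removal in this phase can be sketched as below. The slides do not specify which percentile, which percentile definition, or what counts as an anomaly; the nearest-rank percentile, the 75th-percentile cutoff, and the median-based outlier rule (drop a point whose residual exceeds k times the median absolute residual) are our assumptions.

```python
import math
from statistics import median


def percentile(values, pct):
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[max(rank, 1) - 1]


def rerank_and_prune(errors, pct=75):
    """errors maps model name -> standard error on the updated training set.

    Models above the pct-th percentile of the errors are deleted; the
    survivors are returned best (lowest error) first.
    """
    cutoff = percentile(list(errors.values()), pct)
    survivors = {m: e for m, e in errors.items() if e <= cutoff}
    return sorted(survivors, key=survivors.get)


def drop_anomalies(points, residuals, k=5.0):
    """Hypothetical rule: drop points whose |residual| > k * median |residual|."""
    scale = median(abs(r) for r in residuals)
    if scale == 0:
        return list(points)  # perfect fit: nothing to compare against
    return [p for p, r in zip(points, residuals) if abs(r) <= k * scale]


# Standard errors of four hypothetical candidate models after a new run is added.
print(rerank_and_prune({"A": 4.0, "B": 1.0, "C": 9.0, "D": 2.5}))  # ['B', 'D', 'A']
print(drop_anomalies(["run1", "run2", "run3", "run4"], [0.1, -0.2, 0.1, 5.0]))  # ['run1', 'run2', 'run3']
```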
23 Experiments and Results
- Applications
  - ScaLAPACK parallel eigenvalue problem
  - Parallel Conjugate Gradient (CG)
  - Parallel FFT application from the FFTW package
  - Integer Sort
  - Molecular Dynamics
  - Poisson equation in 2-D (Jacobi decomposition)
  - All-pairs shortest path
- System: 8-processor Intel Pentium IV
- 200 random (problem size, number of processors) pairs are chosen for prediction
24 Experiments and Results (cont.)
- Molecular Dynamics: 15
- Integer Sort: 11
25 Procedure for Reducing the Modeling Overheads
- Previous technique: 3 hours (18000 iterations)
- Procedure
  - Predefined functions are divided into 4 groups
  - Build a list of top functions
  - Use 3 representative models for the last three groups
  - Build a template final list for each of the 3 representative models
- Current technique: 56 minutes (about 1/3 of the previous technique)
- Results
  - Parallel FFT: 20
  - Parallel Integer Sort: 11
- Computation phase
  - First, the polynomial and logarithmic functions are evaluated
  - Prepare the list of top functions whose standard error is within the threshold limit
  - The remaining predefined functions are split into 3 groups, each belonging to one of the 3 polynomial (representative) models
  - A function group is evaluated only if its representative model is in the list
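The group-gated evaluation of the computation phase can be sketched as follows: the base polynomial and logarithmic functions are always scored, and each remaining group is scored only if its representative polynomial model passed the standard-error threshold. The function names, group membership, threshold, and precomputed error values are hypothetical; in practice each score would come from fitting the candidate to the training data.

```python
def gated_search(base_functions, groups, score, threshold):
    """Return (top list sorted by standard error, number of candidates scored).

    base_functions: candidates that are always evaluated first.
    groups: representative model -> further candidates evaluated only when
            the representative is in the top list.
    score: callable giving the standard error of a candidate.
    """
    evaluated = 0
    top = []
    for cand in base_functions:
        err = score(cand)
        evaluated += 1
        if err <= threshold:
            top.append((err, cand))
    passed = {cand for _, cand in top}
    for representative, members in groups.items():
        if representative not in passed:
            continue  # skip the whole group without scoring it
        for cand in members:
            err = score(cand)
            evaluated += 1
            if err <= threshold:
                top.append((err, cand))
    return sorted(top), evaluated


# Hypothetical standard errors for each candidate computation function.
ERRORS = {"log n": 9.0, "n": 7.5, "n^2": 0.8, "n^3": 3.0,
          "n log n": 6.0, "n sqrt n": 6.5, "n^2 log n": 0.5, "n^3 log n": 2.9}
GROUPS = {"n": ["n log n", "n sqrt n"], "n^2": ["n^2 log n"], "n^3": ["n^3 log n"]}

top, evaluated = gated_search(["log n", "n", "n^2", "n^3"], GROUPS, ERRORS.get, threshold=1.0)
print(top, evaluated)  # [(0.5, 'n^2 log n'), (0.8, 'n^2')] 5
```

Here only 5 of the 8 candidates are scored, because the groups tied to "n" and "n^3" are skipped once those representatives fail the threshold.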
26 Procedure for Reducing the Modeling Overheads (cont.)
- Communication phase
  - For each computation model in the list, evaluate all polynomial and logarithmic communication functions with the bandwidth functions to obtain the top function list
  - The template bandwidth-function list is then ready for the 3 representative models
  - If a representative model is in the list, its group is evaluated with the template bandwidth functions
  - If the computation model belongs to any of the 3 polynomial models, its template final list is prepared
  - The next computation model is then evaluated; if its template final list is already prepared, only that template final list is evaluated
- Model in terms of processors
  - Evaluate each comp_comm_model with the processor functions
  - If the computation and communication models belong to representative models, prepare the template list for that combination
  - For the next comp_comm_model, if its template list is ready, only those functions are evaluated
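The template reuse in the communication phase amounts to caching: the first computation model in a representative family gets the full search over communication/bandwidth candidates, and the top candidates it keeps become the template that later models in the same family evaluate instead of the full set. The candidate names, family mapping, scoring function, and number of candidates kept are illustrative assumptions.

```python
def search_with_templates(comp_models, family_of, all_candidates, score, keep=2):
    """Return (per-model top candidates, total number of candidates scored).

    comp_models: computation models from the top list, in rank order.
    family_of: computation model -> representative polynomial model.
    all_candidates: every communication/bandwidth combination.
    score: callable (comp_model, candidate) -> standard error.
    """
    templates = {}  # representative model -> template final list
    results = {}
    evaluated = 0
    for comp in comp_models:
        rep = family_of.get(comp)
        # Use the family's template if one exists, otherwise search everything.
        candidates = templates.get(rep, all_candidates)
        ranked = sorted((score(comp, cand), cand) for cand in candidates)
        evaluated += len(candidates)
        results[comp] = ranked[:keep]
        if rep is not None and rep not in templates:
            templates[rep] = [cand for _, cand in ranked[:keep]]
    return results, evaluated


CANDIDATES = ["n/bw", "n^2/bw", "n log n/bw", "n/bw^2", "n^2/bw^2", "log n/bw"]
FAMILY = {"n^2": "n^2", "n^2 log n": "n^2", "n^3": "n^3"}


def toy_score(comp, cand):
    # Hypothetical stand-in for fitting a candidate and measuring its standard error.
    return CANDIDATES.index(cand) + 0.1 * len(comp)


results, evaluated = search_with_templates(["n^2", "n^2 log n", "n^3"], FAMILY, CANDIDATES, toy_score)
print(evaluated)  # 14 (an exhaustive search would score 18)
```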
28 Conclusion
- We developed an automatic prediction technique that
  - Determines approximate complexities for the model
  - Adapts the performance model to any loading condition
  - Achieves satisfactory prediction accuracy
- We evaluated our technique with 7 parallel applications
- We also proposed a technique to reduce the overhead of the automation procedure
- In all cases, the model gave good predictions of execution times
29 Future Work
- To develop systematic methods to automatically determine the approximate complexities of any multi-parameter application
- To develop a smart scheduler that makes efficient decisions based on our performance model
- To augment our techniques to predict execution times of complex multi-phase and multi-component applications
- To extend our work to include I/O-related parameters for predicting the behavior of I/O-intensive scientific applications
30 References
- Jay Yagnik, H.A. Sanjay, Sathish Vadhiyar. Performance Modeling based on Multidimensional Surface Learning for Performance Prediction of Parallel Application in Non-Dedicated Environment. ICPP 2006, pages 513-520, Columbus, Ohio, USA.
- V. Taylor, X. Wu, J. Geisler, X. Li, Z. Lan, M. Hereld, I. Judson, and R. Stevens. Prophesy: Automating the Modeling Process. In Proceedings of the Third Annual International Workshop on Active Middleware Services, pages 3-11, Tokyo, Japan, August 2001.
- V. Taylor, X. Wu, and R. Stevens. Prophesy: An Infrastructure for Performance Analysis and Modeling of Parallel and Grid Applications. ACM SIGMETRICS Performance Evaluation Review, 30(4):13-18, March 2003.
- J. Schopf. Structural Prediction Models for High-Performance Distributed Applications. In Proceedings of the Cluster Computing Conference (CCC '97), Atlanta, U.S.A., March 1997.
- J. Schopf and F. Berman. Using Stochastic Information to Predict Application Behavior on Contended Resources. International Journal on Foundation in Computer Science, 12(3):341-364, June 2001.
- P. Dinda. Online Prediction of the Running Time of Tasks. In Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10'01), pages 383-394, San Francisco, U.S.A., August 2001.
- DataFit. http://www.curvefitting.com/datafit.htm
31 Thank You!
Questions?