Performance Models for Parallel Applications in Grids - PowerPoint PPT Presentation


1
Performance Models for Parallel Applications in Grids
Presented by: Sanjay H.A., Ph.D. Student
Research Supervisor: Dr. Sathish Vadhiyar

Grid Application Research Lab, Supercomputer Education and Research Centre,
Indian Institute of Science
2
Outline
  • Introduction
  • Rational Polynomial Model (ICPP 2006)
      • Motivation and Objective
      • Proposed Model
      • Experiments and Results
      • Limitations
  • Adaptive Performance Model
      • Motivation
      • Automation Procedure
      • Experiments and Results
      • Procedure for Reducing the Modeling Overheads
  • Conclusions and Future Work

3
Introduction
  • Performance Model
      • Predicts the execution time of an application
      • Provides insight into the performance
        relationships between an application and the
        system used for execution
      • Serves as the objective function for a system resource
        negotiator/scheduler
      • Helps identify bottlenecks in applications and
        systems
      • Supports fine-tuning of algorithms
  • Approaches
      • Trace-event simulation
      • Parameterized analytical models
      • Curve-fitting models

5
Rational Polynomial Model
  • Presented at ICPP 2006, held in Columbus, Ohio,
    USA.
  • Motivation
      • Grid / distributed environment
          • Provides the resources required to execute a
            scientific application
          • Shared by multiple users
          • Application performance may be affected in
            dynamic and often unpredictable ways
      • The scheduler needs a performance model to match
        applications to machines
      • Curve-fitting modeling strategies need less
        information about the application
      • Existing curve-fitting modeling strategies assume
        uniform loading conditions
          • This assumption is unrealistic in
            non-dedicated environments

6
Objective
  • To analyze and build a curve-fitting model that
    attempts to predict execution times under any load
    conditions that may exist on the systems during
    application execution.

7
Rational Polynomial Model (cont.)
  • The model is evaluated using the ScaLAPACK
    eigenvalue problem
  • Measurement and prediction tool
      • Network Weather Service (NWS)
  • Loading parameters
      • Available CPU
      • Available bandwidth
  • Models were evaluated and compared based on
      • Average percentage prediction error
      • Trend shown by the actual and predicted values
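
The slides do not spell out how the average percentage prediction error is computed. A minimal sketch, assuming it is the mean of the absolute percentage errors over all predictions (the function name and sample values are hypothetical):

```python
def avg_percentage_error(actual, predicted):
    """Mean of |actual - predicted| / actual, expressed in percent (assumed definition)."""
    errors = [abs(a - p) / a * 100.0 for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Hypothetical measured and predicted execution times, in seconds.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
error = avg_percentage_error(actual, predicted)
print(f"average percentage prediction error: {error:.2f}")
```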

8
Proposed Model
  • General model
      • Computation and communication complexities
      • Predicts execution times of generic parallel
        applications
      • Targets non-dedicated environments with large load
        dynamics
  • Measure available CPU and bandwidth at periodic
    intervals from the beginning to the end of
    application execution
  • Take the average of the CPU and bandwidth values collected for
    each machine
      • avg_avail_cpu, avg_avail_band
  • Take the minimum of all the avg_avail_cpu and
    avg_avail_band values
      • min_avg_avail_cpu, min_avg_avail_band
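
The averaging and minimum steps above can be sketched as follows. The machine names, sample values, and the helper name are hypothetical; only the avg/min procedure and the two output names come from the slide.

```python
from statistics import mean

def min_avg_availability(samples_per_machine):
    """Average the periodic samples of each machine, then take the minimum average."""
    averages = {m: mean(vals) for m, vals in samples_per_machine.items()}
    return min(averages.values())

# Hypothetical periodic measurements collected during one application run.
cpu_samples = {
    "node1": [0.90, 0.80, 0.70],   # fraction of CPU available
    "node2": [0.50, 0.60, 0.40],
}
band_samples = {
    "node1": [80.0, 90.0, 100.0],  # available bandwidth, arbitrary units
    "node2": [70.0, 80.0, 90.0],
}

min_avg_avail_cpu = min_avg_availability(cpu_samples)
min_avg_avail_band = min_avg_availability(band_samples)
print(min_avg_avail_cpu, min_avg_avail_band)
```

Taking the minimum reflects the idea that the most heavily loaded resource tends to bound the progress of a tightly coupled parallel application.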

9
Proposed Model (cont.)
  • Training
      • Uses problem size, min_avg_avail_cpu,
        min_avg_avail_band, and execution time
  • Prediction
      • min_avg_avail_cpu and min_avg_avail_band cannot
        be measured before application execution
      • Forecast the min_avg_avail_cpu and
        min_avg_avail_band values based on history
      • NWS forecasting tool
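
To illustrate forecasting from history, here is a simple stand-in that does not call NWS and does not reproduce its API. It assumes three basic predictors (last value, mean, median) and picks whichever had the lowest cumulative error on the past measurements; the predictor set and the sample history are hypothetical.

```python
from statistics import mean, median

# Hypothetical predictor set; each maps a history prefix to a forecast.
PREDICTORS = {
    "last": lambda h: h[-1],
    "mean": lambda h: mean(h),
    "median": lambda h: median(h),
}

def forecast(history):
    """Return (best predictor name, its forecast for the next value)."""
    errors = {name: 0.0 for name in PREDICTORS}
    # Replay the history: forecast each point from the points before it.
    for i in range(1, len(history)):
        for name, predict in PREDICTORS.items():
            errors[name] += abs(predict(history[:i]) - history[i])
    best = min(errors, key=errors.get)
    return best, PREDICTORS[best](history)

# Hypothetical past min_avg_avail_cpu values with one load spike.
history = [0.5, 0.5, 0.5, 0.9, 0.5, 0.5]
name, value = forecast(history)
print(name, value)
```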

10
Proposed Model (cont.)
  • Coefficients
      • Polynomial fit using data points in the training
        range
      • Linear regression problem
      • Ax = b
  • Prediction of execution time
      • The coefficients, together with system and application
        parameters, are used in the model.
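
A minimal sketch of the Ax = b fit, under stated assumptions: the slides do not give the exact model form, so the example assumes T ≈ c1·n³/cpu + c2·n²/band (cubic computation, quadratic communication) and solves the least-squares problem through the normal equations. The model form, the training tuples, and the coefficients used to generate synthetic times are all hypothetical. Problem size is expressed in thousands to keep the system well conditioned.

```python
def solve_linear_system(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def least_squares(a, b):
    """Least-squares solution of A x = b via the normal equations A^T A x = A^T b."""
    cols = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) for j in range(cols)]
           for i in range(cols)]
    atb = [sum(row[i] * bi for row, bi in zip(a, b)) for i in range(cols)]
    return solve_linear_system(ata, atb)

def features(n, cpu, band):
    """One row of A for the assumed model: [n^3 / cpu, n^2 / band]."""
    return [n ** 3 / cpu, n ** 2 / band]

# Synthetic training data: (problem size in thousands, min_avg_avail_cpu, min_avg_avail_band).
true_c = [2.0, 5.0]
training = [(3.0, 0.9, 80.0), (4.0, 0.7, 60.0), (5.0, 0.8, 90.0),
            (6.0, 0.5, 70.0), (7.0, 0.6, 50.0)]
A = [features(n, cpu, band) for n, cpu, band in training]
b = [sum(c * f for c, f in zip(true_c, row)) for row in A]

coeffs = least_squares(A, b)
predicted = sum(c * f for c, f in zip(coeffs, features(8.0, 0.75, 65.0)))
print(coeffs, predicted)
```

Because the synthetic times are generated from known coefficients, the fit should recover them, which makes the sketch easy to sanity-check before swapping in real measurements.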

11
Experiments and Results
  • Applications
      • ScaLAPACK parallel eigenvalue problem
      • Parallel Conjugate Gradient (CG) from NPB
      • Parallel FFT application from the FFTW package
  • System specifications
      • 8-processor Intel Pentium IV
      • 32-processor IBM P720
  • Monitoring
      • Available CPU of the nodes and available
        bandwidth of inter-node links are collected
        every 2 minutes
  • Load dynamics
      • Random CPU and network loading during application
        execution

12
Load Dynamics
[Figures: Network Load, CPU Load]
13
ScaLAPACK Eigenvalue Problem
  • Intel system
  • Computation complexity: cubic
  • Communication complexity: quadratic
  • Training range: 3000-7000
  • Prediction range: 7500-12000
  • Our model was compared with 3 other models

14
Prediction for Eigenvalue Problem on 8 Intel
Processors
  • Our Model: 11.86
  • Prophesy: 16.42
  • Type 1 Multi-variate: 20.15
  • Type 2 Multi-variate: 13.84
Avg. Percentage Prediction Error on IBM System
15
15
Limitations
  • The model does not consider the number of processors as
    an input
  • The user has to provide approximate complexities for
    the model
  • A large amount of data is required to train the model
  • The performance model considered here is static

17
Adaptive Performance Model
  • Motivation
      • The user is unaware of the characteristics of the parallel
        application submitted to the grid.
      • The user submits the application with different
        problem sizes and different numbers of processors at
        different times.
      • The scheduler has to make dynamic scheduling decisions
        efficiently.
      • The performance model has to account for dynamic changes in
        system parameters.
  • Automatically determines approximate complexities
    and builds an adaptive performance model

18
Automation Procedure
Prediction and Model Evaluation phase
19
Automation Procedure (cont.)
  • Metrics for goodness of fit
      • Residual sum of squares
      • Error variance
      • Standard error
  • Training experiments
      • Design the training range in terms of problem size
      • Conduct training experiments on a single processor
        for the single-processor training range
      • Conduct training experiments on 2, 4, and 8
        processors for the designed training range
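
The slide names the three goodness-of-fit metrics without giving formulas. A minimal sketch using the standard regression definitions, which are assumed here rather than taken from the slides: residual sum of squares, error variance as RSS divided by the degrees of freedom, and standard error as its square root. The sample values are hypothetical.

```python
import math

def goodness_of_fit(actual, predicted, num_coeffs):
    """Return (residual sum of squares, error variance, standard error)."""
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    dof = len(actual) - num_coeffs          # degrees of freedom
    error_variance = rss / dof
    return rss, error_variance, math.sqrt(error_variance)

# Hypothetical measured vs. fitted execution times for a 2-coefficient model.
rss, variance, std_err = goodness_of_fit(
    [10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 31.0, 39.0], num_coeffs=2)
print(rss, variance, std_err)
```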

20
Automation Procedure (cont.)
  • Computation modeling in terms of problem size
      • Input: predefined functions, CPU functions, and
        single-processor training data
      • Evaluate all combinations of predefined functions and CPU
        functions against the training data
      • The top computation functions are chosen using
        the standard error metric
      • Output: list of top computation complexities in
        terms of problem size and CPU
  • Communication modeling in terms of problem size
      • Input: predefined functions, bandwidth
        functions, the computation complexity list, and
        2-processor training data
      • With the computation complexity fixed, evaluate
        combinations of predefined communication functions and
        bandwidth functions
      • The top computation and communication models are
        chosen using the standard error metric
      • Output: list of top computation and
        communication complexities in terms of
        problem size, CPU, and bandwidth
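
The computation-modeling step can be sketched as an exhaustive search over candidate combinations ranked by standard error. The slides do not list the actual predefined functions or CPU functions, so the candidate sets below, the single-coefficient model T ≈ c·f(n)·g(cpu), and the synthetic training data are assumptions for illustration only.

```python
import math
from itertools import product

# Hypothetical candidate sets; the real predefined functions are not given in the slides.
SIZE_FUNCS = {
    "n": lambda n: n,
    "n*log(n)": lambda n: n * math.log(n),
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
}
CPU_FUNCS = {
    "1/cpu": lambda cpu: 1.0 / cpu,
    "1/cpu^2": lambda cpu: 1.0 / cpu ** 2,
}

def fit_and_score(xs, ys):
    """Fit y ~ c * x by least squares; return (c, standard error)."""
    c = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    rss = sum((y - c * x) ** 2 for x, y in zip(xs, ys))
    return c, math.sqrt(rss / (len(xs) - 1))

def rank_models(training, top=3):
    """Evaluate every (size function, CPU function) pair and keep the best few."""
    ys = [t for _, _, t in training]
    results = []
    for (sname, sf), (cname, cf) in product(SIZE_FUNCS.items(), CPU_FUNCS.items()):
        xs = [sf(n) * cf(cpu) for n, cpu, _ in training]
        coeff, std_err = fit_and_score(xs, ys)
        results.append((std_err, sname, cname, coeff))
    results.sort(key=lambda r: r[0])   # lowest standard error first
    return results[:top]

# Synthetic single-processor training data generated from T = 3 * n^2 / cpu.
training = [(n, cpu, 3.0 * n ** 2 / cpu)
            for n, cpu in [(2.0, 0.9), (3.0, 0.6), (4.0, 0.8), (5.0, 0.5), (6.0, 0.7)]]
ranked = rank_models(training)
best = ranked[0]
print(best)
```

The communication step would follow the same pattern with the chosen computation term held fixed and bandwidth functions in place of CPU functions.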

21
Automation Procedure (cont.)
  • Tuning the model in terms of number of processors
      • Input: processor functions, 2-, 4-, and 8-processor
        training data, and the top computation and
        communication models
      • Evaluate the computation and communication
        models with the processor functions
      • The top computation models with processor functions
        are chosen using the standard error metric
      • With the computation model fixed in terms of problem size,
        CPU, and processors, the top communication models with
        processor functions are chosen
      • Output: list of top computation and
        communication complexities in terms of problem
        size, processors, CPU, and bandwidth

22
Automation Procedure (cont.)
  • Prediction and model evaluation phase
      • User input: problem size and number of processors
      • min_avg_avail_cpu and min_avg_avail_bandwidth are
        predicted by the NWS forecaster from past
        CPU and bandwidth values
      • Execution time is predicted using the top-ranked
        model
      • After the task executes, its data is added
        to the training set
      • To address dynamic changes in system
        parameters, the model list is re-evaluated against the
        training data and the model rankings are updated
      • At every function evaluation, the lowest-ranked
        functions are deleted using a percentile
        technique
      • Training data that contains anomalies is
        removed automatically
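
The re-ranking and pruning step might look like the sketch below. The slides do not state which percentile is used or how ties are handled, so the keep_fraction parameter, the model names, and the standard-error values are hypothetical.

```python
import math

def prune_models(scored_models, keep_fraction=0.75):
    """Re-rank models by standard error and drop the lowest-ranked tail.

    scored_models is a list of (model name, standard error) pairs.
    keep_fraction is an assumed cutoff, not a value from the slides.
    """
    ranked = sorted(scored_models, key=lambda m: m[1])
    keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:keep]

# Hypothetical standard errors after re-evaluating against the updated training set.
models = [("A", 4.0), ("B", 1.5), ("C", 9.0), ("D", 2.5)]
kept = prune_models(models)
print([name for name, _ in kept])
```

Always keeping at least one model guards against an empty list when only a few candidates remain.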

23
Experiments and Results
  • Applications
      • ScaLAPACK parallel eigenvalue problem
      • Parallel Conjugate Gradient (CG)
      • Parallel FFT application from the FFTW package
      • Integer sort
      • Molecular dynamics
      • Poisson equation in 2-D Jacobi decomposition
      • All-pairs shortest path
  • 8-processor Intel Pentium IV
  • 200 random (problem size, number of processors) pairs were
    chosen for prediction

24
Experiments and Results (cont.)
  • Molecular Dynamics: 15
  • Integer Sort: 11
25
Procedure for reducing the Modeling Overheads
  • Previous technique: 3 hrs (18000 iterations)
  • Procedure
      • Predefined functions: 4 groups
      • List of top functions
      • 3 representative models for the last three groups
      • Template final list for the 3 representative models
  • Current technique: 56 min (about 1/3 of the previous
    technique)
  • Results
      • Parallel FFT: 20
      • Parallel Integer Sort: 11
  • Computation phase
      • Polynomial and logarithmic functions
        are evaluated first
      • A list is prepared of the top functions whose standard error
        is within the threshold limit
      • The remaining predefined functions are split into
        3 groups that belong to 3 polynomial models
      • If a group's representative model is in the list,
        that function group is evaluated
26
Procedure for reducing the Modeling Overheads
  • Communication phase
      • For each computation model in the list, evaluate
        all polynomial and logarithmic communication
        functions with the bandwidth functions
      • Prepare the top function list
      • A template bandwidth function list is then ready
        for the 3 representative models
      • If a group's representative model is in the list,
        that group is evaluated with the template
        bandwidth functions
      • The template final list is ready if the computation
        model belongs to any of the 3 polynomial models
      • The next computation model is then evaluated; if its
        template final list is ready, the template
        final list is evaluated directly
  • Model in terms of processors
      • Evaluate each comp_comm_model with the
        processor functions
      • If the computation and communication models
        belong to the representative models, prepare the
        template list for that combination
      • For the next comp_comm_model, if the template list
        is ready, only those functions are
        evaluated

28
Conclusion
  • We developed an automatic prediction technique
    that
      • Determines approximate complexities for the model
      • Adapts the performance model to any loading
        condition
      • Satisfies accuracy requirements
  • We evaluated our technique with 7 parallel
    applications
  • We also proposed a technique to reduce the
    overhead of the automation technique
  • In all cases the model gave good predictions of
    execution times

29
Future Work
  • To develop systematic methods to determine the
    approximate complexities of any multiple-parameter
    application automatically
  • To develop a smart scheduler that makes
    efficient decisions based on our performance model
  • To augment our techniques to predict
    execution times for complex multi-phase and
    multi-component applications
  • To extend our work to include I/O-related
    parameters for predicting the behavior of
    I/O-intensive scientific applications

30
References
  • Jay Yagnik, H. A. Sanjay, and Sathish Vadhiyar.
    Performance modeling based on multidimensional
    surface learning for performance prediction of
    parallel applications in non-dedicated
    environments. ICPP 2006, pages 513-520,
    Columbus, Ohio, USA.
  • V. Taylor, X. Wu, J. Geisler, X. Li, Z. Lan, M.
    Hereld, I. Judson, and R. Stevens. Prophesy:
    Automating the Modeling Process. In Proceedings
    of the Third Annual International Workshop on
    Active Middleware Services, pages 3-11, Tokyo,
    Japan, August 2001.
  • V. Taylor, X. Wu, and R. Stevens. Prophesy: An
    Infrastructure for Performance Analysis and
    Modeling of Parallel and Grid Applications. ACM
    SIGMETRICS Performance Evaluation Review,
    30(4):13-18, March 2003.
  • J. Schopf. Structural Prediction Models for
    High-Performance Distributed Applications. In
    Proceedings of the Cluster Computing Conference
    (CCC '97), Atlanta, U.S.A., March 1997.
  • J. Schopf and F. Berman. Using Stochastic
    Information to Predict Application Behavior on
    Contended Resources. International Journal of
    Foundations of Computer Science, 12(3):341-364,
    June 2001.
  • P. Dinda. Online Prediction of the Running Time
    of Tasks. In Proceedings of the 10th IEEE
    International Symposium on High Performance
    Distributed Computing (HPDC-10'01), pages 383-394,
    San Francisco, U.S.A., August 2001.
  • DataFit. http://www.curvefitting.com/datafit.htm

31
Thank You!
Questions?