Improving Grid computing performance prediction using weighted templates - PowerPoint PPT Presentation
1
Improving Grid computing performance prediction
using weighted templates
  • Ariel Goyeneche
  • Centre for Parallel Computing,
  • Cavendish School of Informatics
  • University of Westminster
  • London
  • goyenea_at_wmin.ac.uk

2
Performance prediction solutions
  • Solutions migrated from traditional computing:
    instrumentation of applications and resources.
  • Grid simulation solutions were developed with
    the idea of understanding aspects of Grid
    computing environments.
  • Decision Support Systems
  • Model-driven: a pre-programmed model relating
    various parameters, e.g., the Gamma Model, Downey.
  • Data-driven: analyze pools of data accumulated
    over periods of time, e.g., Smith, Gibbons.

3
Issues in performance prediction using
data-driven systems
  • Define similarity

[Figure: a pool of Grid resources (R); which of them are similar to a given resource (R?)]
4
Define similarity?
  • Grid Resources may be compared:
  • CPU, Memory, Network, etc.
  • Grid Service similarity is a bit harder.
  • Services can be compared in different ways and using
    different parameters:
  • name, submitting user, number of nodes requested,
    etc.
  • More cryptic parameters:
  • executable size, size of input files, etc.

5
Similarity: normal distribution
  • Similar if in a normal distribution
  • The execution times are normally distributed
    about an actual mean.
  • But
  • The same service can have different execution
    times if its parameters change
  • Question
  • What are the parameters that produce a normal
    distribution?
  • Answer
  • Let's test all possible combinations of parameters
    (templates)
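The idea on this slide can be sketched in a few lines: group walltimes by a candidate template and keep only the values close to the group mean, a crude stand-in for the normality assumption. All job records, names, and the cutoff here are illustrative, not data from the presentation.

```python
import statistics
from collections import defaultdict

# Hypothetical job records: (user, executable, nodes, walltime in seconds).
jobs = [
    ("alice", "blast", 4, 310.0), ("alice", "blast", 4, 295.0),
    ("alice", "blast", 4, 305.0), ("alice", "blast", 4, 900.0),  # outlier
    ("bob", "render", 8, 120.0), ("bob", "render", 8, 118.0),
]

def group_by_template(jobs, template):
    """Group walltimes by the tuple of characteristics in `template`.

    `template` is a tuple of indices into the job record, e.g. (0, 1)
    groups by (user, executable)."""
    groups = defaultdict(list)
    for job in jobs:
        key = tuple(job[i] for i in template)
        groups[key].append(job[-1])
    return groups

def similar_under_normality(times, k=1.0):
    """Keep only times within k standard deviations of the group mean,
    a rough filter for 'belongs to the same normal distribution'."""
    if len(times) < 2:
        return times
    mu = statistics.mean(times)
    sigma = statistics.stdev(times)
    return [t for t in times if abs(t - mu) <= k * sigma]

groups = group_by_template(jobs, template=(0, 1))  # group by (user, executable)
for key, times in groups.items():
    print(key, similar_under_normality(times))
```

Testing every template means repeating this grouping for each combination of characteristics and keeping the ones whose groups look normally distributed.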

6
Similarity: normal distribution
[Figure: grouping Grid resources (R) by similarity under a normal distribution]
7
Similarity: confidence interval
  • Confidence-interval characteristics in job
    workloads.
  • It produces the best fit (among all possible
    combinations of parameters) to characterize a job.
  • A narrow interval indicates that a job can be
    classified using a given template.
  • The most stable set of parameters offers the
    narrowest confidence interval.
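A minimal sketch of selecting a template by confidence-interval width, using the normal approximation for the mean; the walltime samples and template labels are invented for illustration.

```python
import math
import statistics

def ci_width(times, z=1.96):
    """Width of an approximate 95% confidence interval for the mean
    walltime (normal approximation: mean +/- z * stdev / sqrt(n))."""
    if len(times) < 2:
        return math.inf  # a single sample gives no interval
    return 2 * z * statistics.stdev(times) / math.sqrt(len(times))

# Hypothetical walltime samples grouped by two candidate templates.
by_template = {
    ("u", "e"): [300.0, 310.0, 305.0, 295.0],   # tight, stable group
    ("u",):     [300.0, 900.0, 120.0, 2000.0],  # loose group
}

# The narrowest interval identifies the most stable parameter set.
best = min(by_template, key=lambda t: ci_width(by_template[t]))
print("best template:", best)
```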

8
Examples
  • Gibbons
  • introduced the idea of templates composed of
    different characteristics to group similar jobs
  • templates for identifying good patterns in a
    given workload, plus statistical estimators
  • Template Predictor
  • (u, e, n, age) Mean
  • (u, e) Linear regression
  • (e, n, age) Mean
  • (e) Linear regression
  • (n, age) Mean
  • () Linear regression
  • But
  • The definition of similarity uses very few
    characteristics and therefore produces
    misjudgements.
  • Only for malleable jobs
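A Gibbons-style predictor can be sketched as an ordered template list where the first template that matches the history supplies the estimator, either a mean or a linear regression against node count. The history records, the two-template list, and the field layout are illustrative assumptions, not Gibbons's actual configuration.

```python
import statistics

# Hypothetical history records: (user, executable, nodes, walltime).
history = [
    ("alice", "blast", 2, 200.0), ("alice", "blast", 4, 400.0),
    ("alice", "blast", 8, 800.0), ("bob", "blast", 4, 420.0),
]

def mean_predictor(rows, _nodes=None):
    return statistics.mean(r[-1] for r in rows)

def linear_predictor(rows, nodes):
    """Least-squares fit of walltime against node count."""
    xs = [r[2] for r in rows]
    ys = [r[-1] for r in rows]
    if len(xs) < 2 or len(set(xs)) == 1:
        return statistics.mean(ys)  # regression degenerates to the mean
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my + slope * (nodes - mx)

# Ordered template list: indices into the record, first match wins.
TEMPLATES = [((0, 1), linear_predictor),  # (u, e) -> linear regression
             ((1,),   mean_predictor)]    # (e)    -> mean

def predict(job):
    for template, predictor in TEMPLATES:
        rows = [r for r in history if all(r[i] == job[i] for i in template)]
        if rows:
            return predictor(rows, job[2])
    return None
```

For example, `predict(("alice", "blast", 6, None))` regresses over alice's blast runs, while an unknown user falls through to the coarser `(e)` template's mean.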

9
Examples
  • Smith
  • Reuses Gibbons's idea
  • Any possible template Predictor
  • (u, e, n, q) Mean / Linear regression
  • (u, e, n) Mean / Linear regression
  • (u, e) Mean / Linear regression
  • () Mean / Linear regression
  • etc. Mean / Linear regression
  • Comparison
  • Smith's experiments showed that Smith's
    solution performed between 4% and 46% better
    than Gibbons's solution
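Smith's "any possible template" extension amounts to enumerating the power set of the job characteristics as candidate templates, which can be sketched as:

```python
from itertools import combinations

CHARACTERISTICS = ("u", "e", "n", "q")  # user, executable, nodes, queue

def all_templates(chars):
    """Every subset of the job characteristics is a candidate template
    (including the empty template), per Smith's extension of Gibbons."""
    return [subset
            for size in range(len(chars) + 1)
            for subset in combinations(chars, size)]

templates = all_templates(CHARACTERISTICS)
print(len(templates))  # 2**4 = 16 candidate templates
```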

10
Prediction in the NGS
  • Recorddate, gridnode, jobId, jobState, jobName,
    jobOwner, gridjobOwner, resourcesUsedWalltime,
    Queue, ctime, execHost, etc.

Grid Node From Date To Date Entries
leeds.ac.uk 10/07/2006 19/10/2006 99263
oesc.ox.ac.uk 10/07/2006 19/10/2006 767630
man.ac.uk 10/07/2006 10/10/2006 66747
rl.ac.uk 10/07/2006 19/10/2006 122135
11
Smith's prediction in the NGS
  • Results in Grid environments
  • Template WallTime (s) Error (s) Jobs
  • (u) 311.07 178.12 4444
  • (e-u) 30.77 41.97 2093
  • (u-gn) 94.81 30.92 2783
  • Smith's problems in Grid computing
  • 46% of the jobs have names or identifications
    assigned by default by the Grid middleware
  • 14% of the jobs are either the same executable
    using slightly different names or the same name
    for different executables.

12
Grid environment characteristics
  • Parameters are not always normalized in Grid
    environments. For instance, Grid users do not
    always provide a unique job name or identification
    across several Grid nodes.
  • Parameters may be hidden or not shown. A common
    Grid user routine is to include the set of
    parameters inside the executable script.
  • Even if the identification of jobs and the
    publication of parameters can be solved, using
    only the walltime mean with the smallest
    confidence interval among all possible templates
    may group jobs that are not related to each
    other, and therefore generate misjudgements in
    future predictions.

13
Grid environment restrictions
  • Parameter classification
  • Binding group
  • Job name (e)
  • Grid user (u)
  • List of parameters (p)
  • Extended group
  • Queue (q)
  • Grid Node (gn)
  • etc.
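The classification above can be represented as a simple partition of a job's attributes; the attribute keys and sample values below are illustrative.

```python
# Binding parameters identify the job itself; extended parameters
# describe where and how it runs (per the classification above).
BINDING = {"e", "u", "p"}   # job name, Grid user, list of parameters
EXTENDED = {"q", "gn"}      # queue, Grid node, etc.

def split_characteristics(job_attrs):
    """Partition a job's attributes into binding and extended groups;
    attributes in neither group are dropped."""
    binding = {k: v for k, v in job_attrs.items() if k in BINDING}
    extended = {k: v for k, v in job_attrs.items() if k in EXTENDED}
    return binding, extended

b, x = split_characteristics(
    {"e": "blast", "u": "alice", "q": "short", "gn": "rl.ac.uk"})
```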

14
New prediction approach
  • Use the normal distribution and templates
  • But incorporate the accuracy-level concept
  • Templates are weighted according to how accurate
    they prove to be over time (after each submission,
    the best template is weighted)
  • Therefore
  • 1) All templates start with the same accuracy
    level
  • 2) Similarity is redefined taking into account
    the most accurate templates
  • 3) Among them (if more than one), the confidence
    interval is used (as explained before)
  • 4) When the submission finishes, the best
    templates are weighted.

15
Prediction algorithm
  • Prediction phase given a new job submission
  • Divide the job characteristics into two sets, as
    described before.
  • For each possible combination of templates
    composed of characteristics from the first set
    (excluding the empty template)
  • Select from historical information the level of
    accuracy for each template
  • If a template does not have any accuracy level,
    include it with level 0
  • From all possible templates from point 2, select
    all templates with the highest prediction
    accuracy.
  • If the selection produces only one template
  • Calculate the mean of the walltime
  • Otherwise, for each selected template
  • Extend it using all possible combinations of
    characteristics belonging to the second set and
    apply the dynamic template algorithm (reference)
    to all of them.
  • Select the template with the smallest confidence
    interval and calculate the mean of the walltime
  • Otherwise, select the mean of the template with
    the highest prediction accuracy as the prediction
    result.
  • Accuracy-incorporation phase once the job
    completes execution
  • Select the closest mean from all calculated
    templates and increase its prediction accuracy by
    one.
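A minimal sketch of the two phases above, assuming a toy history, the (e, u) binding / (q, gn) extended split from the earlier slide, and a plain confidence-interval comparison standing in for the dynamic-template algorithm the slide references. All records and values are invented for illustration.

```python
import math
import statistics
from collections import defaultdict
from itertools import combinations

BINDING = ("e", "u")    # first (binding) set: job name, Grid user
EXTENDED = ("q", "gn")  # second (extended) set: queue, Grid node

history = [
    {"e": "blast", "u": "alice", "q": "short", "gn": "rl.ac.uk",    "wall": 300.0},
    {"e": "blast", "u": "alice", "q": "short", "gn": "rl.ac.uk",    "wall": 310.0},
    {"e": "blast", "u": "bob",   "q": "long",  "gn": "leeds.ac.uk", "wall": 900.0},
    {"e": "blast", "u": "bob",   "q": "long",  "gn": "leeds.ac.uk", "wall": 920.0},
]
accuracy = defaultdict(int)  # template -> accuracy level (all start at 0)

def templates(chars, include_empty=False):
    start = 0 if include_empty else 1
    return [c for size in range(start, len(chars) + 1)
            for c in combinations(chars, size)]

def matching(job, keys):
    return [r["wall"] for r in history if all(r[k] == job[k] for k in keys)]

def ci_width(times, z=1.96):
    if len(times) < 2:
        return math.inf
    return 2 * z * statistics.stdev(times) / math.sqrt(len(times))

def predict(job):
    # Keep the binding templates with the highest accuracy level.
    cands = templates(BINDING)
    top = max(accuracy[t] for t in cands)
    best = [t for t in cands if accuracy[t] == top]
    if len(best) == 1:
        times = matching(job, best[0])
        return statistics.mean(times) if times else None
    # Tie: extend with the second set, pick the narrowest interval.
    scored = [(ci_width(matching(job, t + ext)), matching(job, t + ext))
              for t in best
              for ext in templates(EXTENDED, include_empty=True)]
    _, times = min(scored, key=lambda s: s[0])
    return statistics.mean(times) if times else None

def record_outcome(job, actual_wall):
    # After execution: reward the template whose mean was closest.
    def error(t):
        times = matching(job, t)
        return abs(statistics.mean(times) - actual_wall) if times else math.inf
    accuracy[min(templates(BINDING), key=error)] += 1

job = {"e": "blast", "u": "alice", "q": "short", "gn": "rl.ac.uk"}
first = predict(job)        # tie: falls through to confidence intervals
record_outcome(job, 302.0)  # weights the best-matching template
second = predict(job)       # a single most-accurate template now wins
```

On the second prediction the accuracy levels break the tie, so the confidence-interval comparison is no longer needed, matching the behaviour described on the results slide.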

16
Results
Weighted templates
T AverageWallTime Error Entries
(e-u)(gn-q) 458.66 30.95 396
(e-u)(gn-q) 350.20 19.76 333
(e-u-p)(gn-q-n) 150.96 22.84 210

Templates
T AverageWallTime Error Entries
(e) 379.66 45.35 455
(n) 79.48 93.62 363
(q) 494.33 47.11 294
17
Results
- Templates
- Weighted templates
- Accuracy level starts to change: confidence
interval is used less
- Accuracy level is well defined: confidence
interval is hardly used
- Same accuracy level: confidence interval is
used
18
Conclusion
  • In this paper, data-driven decision support
    systems for performance prediction using the
    normal distribution, the mean, and templates were
    tested in a production Grid environment.
  • This research shows that similarity is better
    defined using two sets of characteristics: a
    binding first level that concentrates only the
    relevant parameters, and a dynamic second level
    that uses the remainder of the characteristics.
  • When a weight function based on historical
    prediction accuracy is applied to templates,
    performance improves by 54%.
  • Pending issues
  • Different prediction functions within each
    similar set of data
  • Ageing-related queries
  • Minimum and maximum number of records in a
    similar group