Improving Grid computing performance prediction using weighted templates

1
Improving Grid computing performance prediction
using weighted templates

Ariel Goyeneche
Centre for Parallel Computing,
Cavendish School of Informatics
University of Westminster
London
goyenea_at_wmin.ac.uk

2
Performance prediction solutions

Solutions migrated from traditional computing:
instrumentation of applications and resources.
Grid simulation solutions, which were developed with
the idea of understanding aspects of Grid
computing environments.
Decision Support Systems
Model-driven: pre-programmed models relating
various parameters, e.g., Gamma Model, Downey.
Data-driven: analyze pools of data accumulated
over periods of time, e.g., Smith, Gibbons.

3
Issues in performance prediction using
data-driven systems

Define similarity

[Figure: a set of items labelled R, one of them marked "R ?"]

4
Define similarity ?

Grid resources may be compared:
CPU, memory, network, etc.
Grid service similarity is a bit harder.
Services can be compared in different ways and using
different parameters:
name, submitting user, number of nodes requested,
etc.
More cryptic parameters:
executable size, size of input files, etc.

5
Similarity normal distribution

Similar if in a normal distribution:
the execution times are normally distributed
about an actual mean.
But
the same service can have different execution times if
its parameters change.
Question
Which parameters produce a normal
distribution?
Answer
Let's test all possible combinations of parameters
(templates).
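
As an illustration of "all possible combinations of parameters", the sketch below enumerates every template that can be built from a set of job parameters. The function name `all_templates` and the `include_empty` flag are hypothetical, and the parameter letters (u, e, n) are simply the ones used on the later slides.

```python
from itertools import combinations


def all_templates(parameters, include_empty=True):
    """Return every combination of the given parameters as a tuple.

    Each tuple is one candidate template. The empty tuple, when included,
    stands for the template that groups all jobs together.
    """
    start = 0 if include_empty else 1
    return [
        combo
        for size in range(start, len(parameters) + 1)
        for combo in combinations(parameters, size)
    ]


if __name__ == "__main__":
    # Hypothetical parameter set: user (u), executable/job name (e), nodes (n).
    for template in all_templates(("u", "e", "n")):
        print(template)
```

With k parameters this yields 2^k templates (2^k - 1 without the empty one), so the search space grows quickly as more job characteristics are recorded.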

6
Similarity normal distribution
[Figure: a set of items labelled R, one of them marked "R ?"]

7
Similarity confidence interval

Confidence interval characteristics in job
workloads.
It produces the best fit (among all types of
combinations of parameters) to
characterize a job.
A narrow interval indicates that a job can be
classified using a given template.
The most stable set of parameters offers
the narrowest confidence interval.
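
A minimal sketch of this selection rule follows. It assumes a 95% interval under a normal approximation (z = 1.96) and a history stored as a list of dictionaries with a `walltime` field; neither detail is given on the slides, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

Z_95 = 1.96  # assumption: 95% interval, normal approximation


def interval_half_width(walltimes, z=Z_95):
    """Half-width of the confidence interval for the mean walltime."""
    if len(walltimes) < 2:
        return float("inf")  # too few samples to estimate the spread
    return z * stdev(walltimes) / sqrt(len(walltimes))


def narrowest_template(history, job, templates):
    """Return (template, half_width, mean) for the narrowest interval.

    history: list of dicts holding job parameters plus 'walltime'.
    job: dict of parameters for the job to classify.
    A past job is "similar" under a template when it matches the new
    job on every parameter named in that template.
    """
    best = None
    for template in templates:
        similar = [
            past["walltime"]
            for past in history
            if all(past.get(p) == job.get(p) for p in template)
        ]
        width = interval_half_width(similar)
        if best is None or width < best[1]:
            best = (template, width, mean(similar) if similar else None)
    return best
```

A template with fewer than two matching jobs gets an infinite width here, so it is only chosen when no other template has usable history.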

8
Examples

Gibbons
introduced the idea of templates composed of
different characteristics to group similar jobs:
templates for identifying good patterns for a
given workload, and statistical estimators.
Template           Predictor
(u, e, n, age)     Mean
(u, e)             Linear regression
(e, n, age)        Mean
(e)                Linear regression
(n, age)           Mean
()                 Linear regression
But
the definition of similarity uses very few
characteristics and therefore produces
misjudgements.
Only for malleable jobs.
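
The two predictor types in the table can be sketched as follows. The slides do not say which variable the regression is fitted against; this sketch assumes the number of nodes requested (n), and the function names are hypothetical.

```python
from statistics import mean


def predict_mean(walltimes):
    """Mean predictor: average walltime of the similar jobs."""
    return mean(walltimes)


def predict_linear(xs, ys, x_new):
    """Linear-regression predictor: least-squares line y = a + b*x.

    xs: explanatory values of the similar jobs (assumed here to be the
        number of nodes requested).
    ys: their walltimes.
    x_new: explanatory value of the job to predict.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return y_bar  # all x identical: no slope can be fitted
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar + slope * (x_new - x_bar)
```

For example, `predict_linear([1, 2, 3], [10, 20, 30], 4)` extrapolates the fitted line to a job requesting four nodes, whereas `predict_mean([10, 20, 30])` ignores the node count entirely.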

9
Examples

Smith
Reuses Gibbons' idea
Any possible template   Predictor
(u, e, n, q)            Mean / Linear regression
(u, e, n)               Mean / Linear regression
(u, e)                  Mean / Linear regression
()                      Mean / Linear regression
etc.                    Mean / Linear regression
Comparison
Smith's experiments showed that Smith's
solution performed between 4 and 46 percent
better than Gibbons' solution.

10
Prediction in the NGS

Recorddate, gridnode, jobId, jobState, jobName,
jobOwner, gridjobOwner, resourcesUsedWalltime,
Queue, ctime, execHost, etc.

Grid Node       From Date    To Date      Entries
leeds.ac.uk     10/07/2006   19/10/2006   99263
oesc.ox.ac.uk   10/07/2006   19/10/2006   767630
man.ac.uk       10/07/2006   10/10/2006   66747
rl.ac.uk        10/07/2006   19/10/2006   122135

11
Smith's prediction in NGS

Results in Grid environments
Template   WallTime (s)   Error (s)   Jobs
(u)        311.07         178.12      4444
(e-u)      30.77          41.97       2093
(u-gn)     94.81          30.92       2783
Problems with Smith's approach in Grid computing
46% of the jobs have names or identifications
assigned by default by the Grid middleware.
14% of the jobs are either the same executable
using slightly different names or the same name
for different executables.

12
Grid environment characteristics

Parameters are not always normalized in Grid
environments. For instance, Grid users do not always
provide a unique job name or identification
across several Grid nodes.
Parameters may be hidden or not shown. A common
Grid user routine is to include the set of
parameters in the executable script.
Even if the identification of jobs and
publication of parameters can be solved, using
only the walltime mean with the smallest
confidence interval of all possible templates may
group jobs that are not related to each
other except by this function, and
therefore generate misjudgements in future
predictions.

13
Grid environment restriction

Parameter classifications
Binding group
Job name (e)
Grid user (u)
List of parameters (p)
Extended group
Queue (q)
Grid node (gn)
etc.

14
New prediction approach

Use normal distribution and templates,
but incorporate the accuracy level concept:
weight templates according to how accurate
they have been over time (after each submission, the
best template is weighted).
Therefore
1) All templates start with the same accuracy
level.
2) Similarity is redefined taking into account
the most accurate templates.
3) Among them (if more than one), the confidence
interval is used (as explained before).
4) When the submission has finished, the best
template is weighted.
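
The accuracy-level bookkeeping in steps 1, 2 and 4 can be sketched as below. It assumes the level is an integer counter per template, starting at 0 and stored in a plain dictionary; the function names are hypothetical.

```python
def most_accurate(templates, accuracy):
    """Step 2: keep only the templates with the highest accuracy level.

    accuracy: dict mapping template -> level. Templates missing from the
    dict count as level 0, so all templates start equal (step 1).
    """
    top = max(accuracy.get(t, 0) for t in templates)
    return [t for t in templates if accuracy.get(t, 0) == top]


def reward_best(predictions, actual_walltime, accuracy):
    """Step 4: after the job finishes, weight the best template.

    predictions: dict mapping template -> predicted mean walltime.
    The template whose prediction was closest to the observed walltime
    has its accuracy level increased by one.
    """
    best = min(predictions, key=lambda t: abs(predictions[t] - actual_walltime))
    accuracy[best] = accuracy.get(best, 0) + 1
    return best
```

With an empty `accuracy` dictionary every template is returned by `most_accurate`, which is the situation where the confidence interval (step 3) has to break the tie.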

15
Prediction algorithm

Prediction phase: given a new job submission
1) Divide the job characteristics into two sets as
described before.
2) For each possible template
composed of characteristics from the first set
(excluding the empty template):
select from historical information the level of
accuracy for the template;
if a template does not have any accuracy level,
include it with level 0.
3) From all possible templates from point 2, select
all templates with the highest prediction
accuracy.
4) If the selection produces only one template,
calculate the mean of the walltime.
5) Otherwise, for each selected template,
extend it using all possible combinations of
characteristics belonging to the second set and
apply the dynamic template algorithm (reference)
to all of them.
Select the template with the smallest confidence
interval and calculate the mean of the walltime.
Otherwise, select the mean of the template with the
highest prediction accuracy as the prediction
result.
Incorporation of prediction accuracy phase: once
the job completes execution
Select the closest mean from all calculated
templates and increase the prediction accuracy of
that template by one.
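
The prediction phase can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: the "dynamic template algorithm" is not specified on the slide, so the sketch substitutes a plain search for the narrowest 95% confidence interval (normal approximation, z = 1.96); the history layout and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def subsets(params, min_size=0):
    """All combinations of params with at least min_size elements."""
    return [c for k in range(min_size, len(params) + 1)
            for c in combinations(params, k)]


def similar_walltimes(history, job, template):
    """Walltimes of past jobs matching the new job on the template."""
    return [h["walltime"] for h in history
            if all(h.get(p) == job.get(p) for p in template)]


def half_width(values, z=1.96):
    """Confidence-interval half-width; infinite with fewer than 2 samples."""
    if len(values) < 2:
        return float("inf")
    return z * stdev(values) / sqrt(len(values))


def predict(job, history, accuracy, binding, extended):
    """Return (template, predicted walltime or None)."""
    # Non-empty templates from the binding set; unknown accuracy counts as 0.
    candidates = subsets(binding, min_size=1)
    top = max(accuracy.get(t, 0) for t in candidates)
    selected = [t for t in candidates if accuracy.get(t, 0) == top]

    # A single most-accurate template: predict with its mean walltime.
    if len(selected) == 1:
        times = similar_walltimes(history, job, selected[0])
        return selected[0], (mean(times) if times else None)

    # Tie: extend with the extended set and take the narrowest interval
    # (assumed stand-in for the unspecified dynamic template algorithm).
    best = None
    for base in selected:
        for extra in subsets(extended):
            template = base + extra
            times = similar_walltimes(history, job, template)
            width = half_width(times)
            if best is None or width < best[0]:
                best = (width, template, times)
    width, template, times = best
    if width != float("inf"):
        return template, mean(times)

    # Fallback: no interval could be computed, use a selected template's mean.
    for base in selected:
        times = similar_walltimes(history, job, base)
        if times:
            return base, mean(times)
    return selected[0], None
```

Here the accuracy level is looked up per binding-set template, which is one reading of point 2; the extended characteristics only come into play when several binding templates share the top level.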

16
Results
Templates
Weighted templates
T                 AverageWallTime   Error   Entries
(e-u)(gn-q)       458.66            30.95   396
(e-u)(gn-q)       350.20            19.76   333
(e-u-p)(gn-q-n)   150.96            22.84   210
T                 AverageWallTime   Error   Entries
(e)               379.66            45.35   455
(n)               79.48             93.62   363
(q)               494.33            47.11   294

17
Results
- Templates
- Weighted templates
- Accuracy level starts to change - Confidence
interval is less used
- Accuracy level is well defined - Confidence
interval is hardly used
- Same accuracy level - Confidence interval is
used
18
Conclusion

In this paper, data-driven decision support
systems for performance prediction using the normal
distribution, mean and templates were tested in a
production Grid environment.
This research defines similarity
using two sets of characteristics: a binding first
level that concentrates only the relevant
parameters and a dynamic second level that uses
the remainder of the characteristics.
If a weight function based on historical
prediction accuracy is applied to templates,
performance improves by 54%.
Pending issues
Different prediction functions within each
similar set of data
Ageing-related queries
Minimum and maximum number of records in a
similar group