Title: Evaluating Provider Reliability in Risk-aware Grid Brokering Iain Gourlay
1 Evaluating Provider Reliability in Risk-aware Grid Brokering
Iain Gourlay
2 Outline
- AssessGrid background
- Problem Statement
- Basic Reliability
- Analysis of behaviour
- Stationarity Problem
- Weighted Reliability
- Simulations and Results
- What if a provider is unreliable?
- Alternative Bayesian Inference
- Summary and Conclusions
3 AssessGrid Background
- AssessGrid addresses risk management in the Grid.
- This is a necessity in the drive towards commercialisation of Grid technology.
- The goal is to move beyond best-effort service, using SLAs to specify an agreed-upon level of service.
However,
- For resource providers, offering an SLA with service guarantees and penalties is a business risk!
- For end-users, agreeing to an SLA is a business risk!
- A large part of AssessGrid is concerned with supporting providers with tools and methods to:
- Monitor and collect useful data.
- Assess the risk associated with accepting an SLA request, based on this data.
4 What is risk?
- Risk is "Hazard, danger, exposure to mischance or peril" (Oxford English Dictionary).
- Risk Management is a discipline that addresses the possibility that future events may cause adverse effects.
- Economics, Operations Research, Engineering, Gambling, ...
- In Risk Management, risk is quantified with two parameters:
- Risk = Probability of Occurrence x Impact
- Grid computing: the event is SLA failure!
5 Scenario
6 Role of the Broker
- Key role: finding and negotiating with providers on behalf of end-users.
- Broker can also act as an independent party:
- Providers may have motivation to lie!
- Providers may have unidentified problems in their infrastructure.
- Here, we assume the broker is independent and honest.
- Broker can give a second opinion on risk assessments.
- Broker can agree its own SLAs (virtual provider).
7 Problem statement: What do we mean by reliability?
- A provider makes an SLA offer
- includes an estimate of the Probability of Failure (PoF).
- Each time an offer is accepted, the details are stored in a database, including:
- Final status (Success/Fail)
- Offered PoF
- The problem is:
- Given a provider's past data, can their risk assessments be considered reliable?
8 What is reliable?
- Considering only systematic errors!
- Assume s SLAs in the database for the same provider.
- Offered PoFs,
- Assume number of fails
- We define a reliable provider as one that does not systematically underestimate or overestimate the PoF, so that
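The condition this definition points to can be written out as follows. The notation is an assumption made for illustration: $p_i$ denotes the offered PoF of the $i$-th SLA and $f$ the number of observed failures among the $s$ SLAs. If the offered PoFs carry no systematic error, the expected number of failures should match their sum:

```latex
\mathbb{E}[f] \;=\; \sum_{i=1}^{s} p_i
```

Systematic underestimation of the PoF would push $f$ above this sum on average, and systematic overestimation would push it below.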
9 Is it normal?
10 Is it normal? (2)
11 Basic Reliability: Identifying Systematic Errors
- Using the provider's offered PoFs
- The evaluation is based on the following measure
12 Basic Reliability: Identifying Systematic Errors (2)
13 Basic Reliability: Identifying Systematic Errors (3)
- We note that
- and recall the condition, leading to
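A minimal sketch of one way to flag systematic errors from stored offers and outcomes. It uses a standardised score (observed failures minus expected failures, divided by the standard deviation implied by the offered PoFs). This choice is an assumption made for illustration and is not necessarily the measure defined on these slides; the function name `standardised_score` is hypothetical.

```python
import math

def standardised_score(offered_pofs, failed):
    """Return (observed - expected failures) / std. dev. under the offered PoFs.

    offered_pofs: offered PoF for each past SLA (values strictly between 0 and 1)
    failed:       matching booleans, True if that SLA failed
    """
    if len(offered_pofs) != len(failed) or not offered_pofs:
        raise ValueError("need one outcome per offered PoF, and at least one SLA")
    observed = sum(failed)                       # number of failed SLAs
    expected = sum(offered_pofs)                 # failures expected if offers are accurate
    variance = sum(p * (1.0 - p) for p in offered_pofs)
    if variance <= 0.0:
        raise ValueError("offered PoFs must lie strictly between 0 and 1")
    return (observed - expected) / math.sqrt(variance)
```

If the offers are accurate and SLA outcomes are independent, this score has mean 0 and variance 1 and is approximately normal for large samples, so values far from 0 suggest under- or overestimation of the PoF.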
14 Analysis: How does the measure behave?
- Simple Example
- m SLAs in database.
- Offered PoF is constant, p.
- There is a systematic overestimation/underestimation of the PoF, such that
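To illustrate how a measure of this kind might behave in the simple case, suppose (both are assumptions, not taken from the slides) that the measure is the standardised score $R = (f - mp)/\sqrt{mp(1-p)}$, where $f$ is the number of failures, and that the true PoF is $p + \delta$ for every SLA. Then:

```latex
\mathbb{E}[R] = \frac{m(p+\delta) - mp}{\sqrt{m\,p(1-p)}} = \frac{\sqrt{m}\,\delta}{\sqrt{p(1-p)}},
\qquad
\operatorname{Var}(R) = \frac{(p+\delta)(1-p-\delta)}{p(1-p)}
```

Under these assumptions the mean shifts in proportion to $\sqrt{m}$ while the spread does not grow with $m$, so a fixed bias $\delta$ becomes easier to detect as more SLAs accumulate.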
15 Analysis (2)
16 Stationarity Problem
- Conditions are not static!
- Example: 60 red balls in a bag.
- 40 blue balls in the same bag.
- You try to estimate the number of red balls by taking a ball out and replacing it, repeating this 50 times.
- Someone is secretly removing a red ball and replacing it with a blue one after every sample.
- E(red) ≈ 17.5 (expected number of red draws, which suggests about 35 red balls)
- Actual number of reds at the end: 10!
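The arithmetic in this example can be checked exactly. The sketch below (function name and parameters are hypothetical) sums the probability of drawing red at each sample while one red ball is swapped for a blue one per sample. The exact value depends on whether the swap happens just after or just before each draw, which the slide does not specify; the two conventions give 17.75 and 17.25, either side of the slide's 17.5.

```python
from fractions import Fraction

def expected_red_draws(red=60, total=100, draws=50, swap_before_draw=False):
    """Return (expected number of red draws, red balls left at the end)."""
    expected = Fraction(0)
    for _ in range(draws):
        if swap_before_draw:
            red -= 1                      # a red ball is replaced by a blue one
        expected += Fraction(red, total)  # chance that this draw is red
        if not swap_before_draw:
            red -= 1
    return expected, red

print(expected_red_draws())                       # (Fraction(71, 4), 10)
print(expected_red_draws(swap_before_draw=True))  # (Fraction(69, 4), 10)
```

Either way, the sample suggests roughly 35 red balls while only 10 remain, which is the point of the example: an average over the whole history lags behind current conditions.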
17 Stationarity Problem (2)
- A provider's behaviour could change as a consequence of a variety of factors, e.g.
- A provider's infrastructure is updated.
- A provider's risk assessment methodology or model parameterisation may change.
- A provider's policy may change, for example due to economic considerations.
18 Weighted Reliability
- Use a weighted average, ensuring more recent SLAs have a larger influence.
- A total of mk SLAs are split into k categories, with the kth consisting of the most recent SLAs.
- Here, R_i is the basic measure R over the ith category.
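One possible form of such a weighted average, where $R_i$ is the basic measure over the $i$-th category and the weights $w_1 \le w_2 \le \dots \le w_k$ are an assumption (the specific weights are not taken from the slides):

```latex
R_w = \frac{\sum_{i=1}^{k} w_i R_i}{\sum_{i=1}^{k} w_i}
```

With $w_i = i$, for example, the most recent category counts $k$ times as much as the oldest; equal weights reduce to a plain average of the per-category values.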
19 Simulations
- A database of SLAs is generated.
- Each SLA object has an offered PoF, true PoF and final status.
- Reliability computed.
- Process repeated 10000 times for each scenario.
- Simple case considered here:
- Offered PoF is fixed and true PoF is fixed.
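A minimal sketch of this kind of simulation for the simple case. The score used (a standardised difference between observed and expected failures), the function name, and the parameter values (PoFs of 0.1 and 0.2, 100 SLAs, 2000 repetitions to keep it quick) are all illustrative assumptions, not the settings used for the results in these slides.

```python
import math
import random

def simulate_scores(offered_pof, true_pof, num_slas, trials, seed=0):
    """Simulate `trials` SLA databases and return one score per database."""
    rng = random.Random(seed)
    std = math.sqrt(num_slas * offered_pof * (1.0 - offered_pof))
    scores = []
    for _ in range(trials):
        # The final status of each SLA is drawn using the true PoF.
        fails = sum(rng.random() < true_pof for _ in range(num_slas))
        # The score compares the outcome with what the offered PoF predicts.
        scores.append((fails - num_slas * offered_pof) / std)
    return scores

honest = simulate_scores(offered_pof=0.1, true_pof=0.1, num_slas=100, trials=2000)
biased = simulate_scores(offered_pof=0.1, true_pof=0.2, num_slas=100, trials=2000)
print(sum(honest) / len(honest), sum(biased) / len(biased))
```

For the honest provider the scores centre on 0; for the provider that understates its PoF they centre well above 0, which is what a threshold on the score would pick up.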
20 Results
21 Results (2)
22 Results (3)
23 Results (4)
24 Results (5)
25 What if the provider is unreliable?
- Discrete approximation: when an SLA offer is received with an offered PoF of p, estimate the PoF by looking at the failure rate for all SLAs with an offered PoF of p.
- Then,
- If (reliability measure < threshold): believe the provider.
- Else: PoF estimate = numFails(PoF = p) / numSLAs(PoF = p)
- Use all SLAs with offered PoF within x of the offered PoF in the current SLA.
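The rule above can be sketched as follows. The function name, the shape of `history`, and the fallback when no comparable SLAs exist are assumptions; `reliability` is taken here to be a non-negative magnitude so that the slide's "measure < threshold" test can be applied directly.

```python
def estimate_pof(offered_pof, history, reliability, threshold, x):
    """Return the PoF the broker should use for a new SLA offer.

    history:     list of (offered_pof, failed) pairs for this provider
    reliability: magnitude of the provider's reliability measure (assumed >= 0)
    x:           half-width of the window of "similar" offered PoFs
    """
    if reliability < threshold:
        return offered_pof                 # provider looks reliable: believe it
    # Otherwise use the observed failure rate of SLAs with a similar offered PoF.
    similar = [failed for p, failed in history if abs(p - offered_pof) <= x]
    if not similar:
        return offered_pof                 # assumption: no comparable data, fall back
    return sum(similar) / len(similar)
```

Widening `x` trades precision for sample size: more past SLAs contribute to the estimate, but they are less comparable to the current offer.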
26 Weighted Average risk assessment
- Split km SLAs into k categories.
- Compute the estimated PoF for each category, i = 0, ..., k-1.
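A minimal sketch of a weighted-average PoF estimate along these lines. The linear weights (category i gets weight i + 1, so newer categories count more) and the function name are assumptions; the slides do not state which weights are used.

```python
def weighted_pof(outcomes, k, weights=None):
    """Weighted average of per-category failure rates.

    outcomes: booleans ordered oldest first, True if the SLA failed;
              the length must be a positive multiple of k
    """
    if k <= 0 or not outcomes or len(outcomes) % k != 0:
        raise ValueError("outcomes must split evenly into k categories")
    m = len(outcomes) // k                              # SLAs per category
    if weights is None:
        weights = [i + 1 for i in range(k)]             # assumed linear weights
    # Failure rate observed in each category, i = 0, ..., k-1.
    rates = [sum(outcomes[i * m:(i + 1) * m]) / m for i in range(k)]
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)
```

For a provider whose failures are concentrated in the most recent category, this estimate sits above the plain failure rate over all km SLAs.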
27 Never Trust Doctors
- You are tested for a disease, which 2% of the population has.
- The test never gives a false negative.
- If you are clear, there is still a 5% chance of a false positive.
- You test positive.
- What is the probability you have the disease?
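The question can be answered with Bayes' theorem. The short calculation below reads the figures as a 2% prevalence and a 5% false-positive rate (an interpretation of the slide's numbers) and uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

prevalence = Fraction(2, 100)           # P(disease)
sensitivity = Fraction(1)               # P(positive | disease): no false negatives
false_positive_rate = Fraction(5, 100)  # P(positive | no disease)

# Total probability of testing positive.
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate
# Bayes' theorem: P(disease | positive).
posterior = prevalence * sensitivity / p_positive
print(posterior, float(posterior))      # 20/69, roughly 0.29
```

Even with a test that never misses the disease, a positive result leaves the probability of having it below 30%, because the low prior dominates.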
28 Alternative Approach: Bayesian Inference
- The provider offers a linguistic risk assessment, e.g. the failure probability is
- extremely low: <1%
- very low: 1-5%
- low: 5-10%
- medium: 10-20%
- high: 20-30%
- very high: 30-50%
- extremely high: >50%
- If the broker/end-user requests the exact PoF value, this can be provided.
29 Alternative Approach: Bayesian Inference (2)
- The broker does not consider the provider's reliability directly. Instead it takes the following approach:
- Having received a linguistic risk assessment for a new SLA, the broker first computes a prior distribution for the PoF, given the linguistic category, by considering data across all other providers.
- The broker computes a posterior distribution, based on the failure rate observed in past SLAs from the same provider with the same linguistic risk assessment.
- The broker returns an object which contains
- (PoF_broker, confidence)
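A minimal sketch of how such a prior-to-posterior update could look. Everything specific here is an assumption: the Beta prior, the `prior_strength` cap on how much weight other providers' data receives, taking the posterior mean as PoF_broker, and defining confidence as the posterior probability that the PoF lies inside the band the provider stated. The slides do not say how either value is computed.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def broker_assessment(other_fails, other_total, own_fails, own_total,
                      band, prior_strength=10.0, steps=2000):
    """Return (pof_broker, confidence) for one linguistic category.

    other_*: failures / SLAs in this category across all other providers
    own_*:   failures / SLAs in this category for the provider in question
    band:    (low, high) PoF range implied by the category, e.g. (0.01, 0.05)
    """
    # Prior: centred on the pooled failure rate, worth `prior_strength` SLAs.
    prior_rate = (other_fails + 1) / (other_total + 2)
    a = prior_strength * prior_rate + own_fails
    b = prior_strength * (1 - prior_rate) + (own_total - own_fails)
    pof_broker = a / (a + b)                      # posterior mean
    # Posterior mass inside the band, by the midpoint rule
    # (rough if a < 1 or b < 1, where the density is unbounded at an edge).
    low, high = band
    width = (high - low) / steps
    mass = sum(beta_pdf(low + (j + 0.5) * width, a, b) for j in range(steps)) * width
    return pof_broker, min(1.0, mass)
```

For example, a provider that labels offers "very low" but has failed 8 of 40 such SLAs would receive a PoF_broker well above the 1-5% band and a low confidence in the stated category.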
30 Alternative Approach: Bayesian Inference (3)
31 Summary/Conclusions
- A detailed analysis has been carried out for a method to identify providers who are systematically unreliable.
- The stationarity problem has been addressed.
- Weighted Average
- Results indicate good performance relative to the basic measure and moving average.
- This can be extended to other measures for non-systematic errors.
- A Bayesian approach has been considered and is also promising.