Title: Towards a Radically New Theory of Software Reliability
1 Towards a Radically New Theory of Software Reliability
Aditya P. Mathur, Head and Professor
Department of Computer Science, Purdue University
ABB, Sweden, Monday, April 7, 2008
2 Reliability
- Probability of failure-free operation in a given environment over a given time.
- Mean Time To Failure (MTTF)
- Mean Time To Disruption (MTTD)
- Mean Time To Restore (MTTR)
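As a rough illustration of the metrics above, the sketch below computes MTTF and MTTR from a log of outages. The log format, the function name `mttf_mttr`, and the sample values are assumptions made for this example; they do not come from the talk.

```python
def mttf_mttr(outages, start=0.0):
    """Return (MTTF, MTTR) from a list of (failed_at, restored_at) pairs.

    Times are hours since `start`. The log format is hypothetical.
    """
    uptimes = []   # operating time that preceded each failure
    repairs = []   # time needed to restore service after each failure
    last_up = start
    for failed_at, restored_at in outages:
        uptimes.append(failed_at - last_up)
        repairs.append(restored_at - failed_at)
        last_up = restored_at
    return sum(uptimes) / len(uptimes), sum(repairs) / len(repairs)


# Hypothetical log: three failures, each followed by a restore.
log = [(100.0, 102.0), (202.0, 203.0), (303.0, 306.0)]
mttf, mttr = mttf_mttr(log)
print(mttf, mttr)
```

Both values are plain averages over the observed intervals, so they inherit whatever bias the observation window and usage pattern introduce.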
3 Claim
- Existing theories of software reliability
simplify the problem to the extent that they
(almost) maximize the uncertainty associated with
the estimated software reliability.
4 Operational profile
- Probability distribution of usage of features and/or scenarios.
- Captures the usage pattern with respect to a class of customers.
5 Reliability estimation
[Figure: reliability estimation from the operational profile]
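One way to picture profile-based reliability estimation is the sketch below: features are drawn at random according to an operational profile, each draw stands in for one test run, and the estimate is the fraction of runs that did not fail. The feature names, the per-feature failure probabilities, and the function `estimate_reliability` are all hypothetical; a real study would replace the random failure check with an actual test execution and an oracle.

```python
import random


def estimate_reliability(profile, failure_prob, n_tests=10_000, seed=1):
    """Estimate reliability by sampling features from an operational profile."""
    rng = random.Random(seed)
    features = list(profile)
    weights = [profile[f] for f in features]
    failures = 0
    for _ in range(n_tests):
        # Pick the feature to exercise according to the usage distribution.
        feature = rng.choices(features, weights=weights)[0]
        # Stand-in for running one test and consulting an oracle.
        if rng.random() < failure_prob[feature]:
            failures += 1
    return 1.0 - failures / n_tests


# Hypothetical profile and per-feature failure probabilities.
profile = {"open": 0.6, "edit": 0.3, "export": 0.1}
failure_prob = {"open": 0.001, "edit": 0.01, "export": 0.05}
estimate = estimate_reliability(profile, failure_prob)
print(round(estimate, 3))
```

The estimate is only as good as the assumed profile: changing the weights changes the result even though the software is unchanged.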
6 Issues: Operational profile
- Variable. Becomes known only after customers have access to the product. It is a stochastic process: a moving target!
- Random test generation requires an oracle. Hence it is generally limited to specific outcomes, e.g. crash, hang.
- What about an operational profile with an impulse? This creates a non-differentiable probability function of the time-to-failure.
7 Issues: Failure data
- Should we analyze the failures?
- If yes, then once the cause is removed the reliability estimate is invalid.
- If the cause is not removed because the failure is a minor incident, then the reliability estimate corresponds to irrelevant incidents.
8 Issues: Failure rate
- "That is, the failure rate, when unambiguously defined, does not have a physical reality; rather, it is a technical device, whose sole purpose is to convey the engineer's personal opinion about the life characteristic of software."
Nozer Singpurwalla, "The failure rate of software: does it exist?", IEEE Transactions on Reliability, vol. 44, no. 3, 1995.
9 Issues: Model selection
- Rarely does a model fit the failure data.
- Model selection becomes a problem. 200 models to choose from? New ones keep arriving!
10 Issues: Markovian models
[Figure: Markov chain state diagram with labeled transitions]
- Markov chain models suffer from a lack of estimates of transition probabilities.
- To compute these probabilities, you need to execute the application.
- During execution you obtain failure data. Then why proceed further with the model?
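The point that transition probabilities can only be obtained by executing the application can be made concrete with a small sketch. It counts state-to-state transitions in recorded execution traces and normalizes the counts per source state. The trace format, the state names, and the function `estimate_transition_probabilities` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def estimate_transition_probabilities(traces):
    """Estimate Markov transition probabilities from observed execution traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        # Count each consecutive (source, destination) pair in the trace.
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    probabilities = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probabilities[src] = {dst: n / total for dst, n in dsts.items()}
    return probabilities


# Hypothetical traces over states "1", "2", "3" and a terminal "exit" state.
traces = [
    ["1", "2", "3", "exit"],
    ["1", "3", "exit"],
    ["1", "2", "1", "2", "3", "exit"],
]
p = estimate_transition_probabilities(traces)
print(p["1"])
```

Collecting these traces already requires running the application under some usage pattern, which is exactly when failure data would be observed directly.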
11 Issues: Assumptions
- Software does not degrade over time. A memory leak is not degradation and is not a random process; a new version is a different piece of software.
- Reliability estimate varies with operational profile. Different customers see different reliability.
- Can we not have a reliability estimate that is independent of operational profile?
- Can we not advertise quality based on metrics that are a true representation of reliability, not with respect to a subset of features but over the entire set of features?
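A small worked example of the profile dependence noted above: with fixed per-feature failure probabilities, reliability computed as a usage-weighted sum differs between two customers. The feature names, the numbers, and the weighted-sum formula are assumptions chosen for illustration, not results from the talk.

```python
def reliability(profile, failure_prob):
    """Usage-weighted probability that a single randomly chosen use succeeds."""
    return sum(p * (1.0 - failure_prob[f]) for f, p in profile.items())


# Hypothetical per-feature failure probabilities for one software version.
failure_prob = {"open": 0.001, "edit": 0.01, "export": 0.05}

# Two hypothetical customers who use the same software differently.
customer_a = {"open": 0.8, "edit": 0.15, "export": 0.05}
customer_b = {"open": 0.2, "edit": 0.3, "export": 0.5}

r_a = reliability(customer_a, failure_prob)
r_b = reliability(customer_b, failure_prob)
print(round(r_a, 4), round(r_b, 4))
```

The software is identical in both cases; only the usage weights differ, which is why a single advertised reliability figure can mislead.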
12 Estimating Uncertainty
- Estimates of software reliability must be associated with uncertainty. But how to quantify uncertainty?
- Entropy-based approach [Katerina et al. 2002]
- Moments-based approach [Katerina et al. 2003]
- Monte Carlo approach [Katerina et al. 2003]
- Bayesian approach [Dai et al. 2007]
13 Estimating Uncertainty
- Model the parameters as random variables.
- Use statistical (e.g. moments) or simulation approaches to estimate variance.
- Problem: Does not correlate with likely faulty components in the program under test.
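A minimal sketch of the simulation route, under stated assumptions: each component reliability is treated as a Beta-distributed random variable, the system is taken to be a simple series of independent components, and the variance of system reliability is estimated by Monte Carlo sampling. The Beta parameters, the series structure, and the function name are hypothetical choices, not the method of the cited papers.

```python
import random
import statistics


def simulate_system_reliability(beta_params, n_samples=10_000, seed=0):
    """Return (mean, variance) of system reliability by Monte Carlo sampling.

    beta_params: one (a, b) pair per component; each component reliability
    is drawn from Beta(a, b). Components are assumed independent and in series.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        r = 1.0
        for a, b in beta_params:
            r *= rng.betavariate(a, b)   # one random draw per component
        samples.append(r)
    return statistics.mean(samples), statistics.variance(samples)


# Hypothetical components: means 0.99 and 0.95, with different spreads.
mean_r, var_r = simulate_system_reliability([(99, 1), (95, 5)])
print(round(mean_r, 3), var_r)
```

The variance describes the estimate as a whole. As the slide notes, it does not indicate which components of the program are likely to be faulty.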
14 Sensitivity of Reliability to test adequacy
15 Basis for an alternate approach
Why not develop a theory based on coverage of testable items and test adequacy?
Testable items: variables, statements, conditions, loops, data flows, methods, classes, etc.
Pros: Errors hide in testable items.
Cons: Coverage of testable items is inadequate. Is it a good predictor of reliability?
Yes, but only when used carefully. Let us see what happens when coverage is not used or not used carefully.
16 Saturation Effect
[Figure: reliability versus testing effort for functional, decision, dataflow, and mutation testing. Reliability levels are labeled Rf, Rd, Rdf, and Rm; testing-effort points are labeled tfs, tfe, tds, tde, tdfs, tdfe, and tms; u denotes uncertainty.]
Functional, decision, dataflow, and mutation testing provide test adequacy criteria.
17 An experiment: TeX
Tests generated randomly exercise less code than those generated using a mix of black-box and white-box techniques.
Application: TeX. Creator: Donald Knuth. [Leath 92]
18 An experiment: sort utility
UNIX sort utility [Del Frate et al. 1995]
19 An experiment: coverage-reliability correlations
Unix utilities and a space application [Garg 1995, MS Thesis]
20 Modeling an application
21 Reliability of a component
Reliability, the probability of correct operation, of function f based on a given finite set of testable items:
$R(f) = \alpha \cdot (\text{covered}/\text{total}), \quad 0 < \alpha < 1$
Issue: How to compute $\alpha$?
Approach: High correlation between coverage metrics and failures has been established via empirical studies. Such studies could provide estimates of $\alpha$ and its variance for different sets of testable items.
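The component formula can be sketched in a few lines, writing the scaling factor as `alpha` (an assumed name for the coefficient bounded between 0 and 1). The coverage counts and the value of `alpha` below are hypothetical; in practice `alpha` would come from empirical studies of the kind mentioned above.

```python
def component_reliability(covered, total, alpha):
    """Coverage-based reliability: alpha * (covered / total), with 0 < alpha < 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= covered <= total:
        raise ValueError("covered must lie between 0 and total")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return alpha * covered / total


# Hypothetical numbers: 180 of 200 testable items covered, alpha = 0.95.
r_f = component_reliability(covered=180, total=200, alpha=0.95)
print(round(r_f, 3))
```

Because `alpha` is strictly below 1, even full coverage never yields a reliability of 1 under this form.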
22 Reliability of a subsystem
$C = \{f_1, f_2, \ldots, f_n\}$ is a collection of components that collaborate with each other to provide services.
$R(C) = g(R(f_1), R(f_2), \ldots, R(f_n), R(I))$
Issue 1: How to compute R(I), the reliability of component interactions?
Issue 2: What is g?
Issue 3: The theory of systems reliability creates problems when components (a) are in a loop and (b) are dependent on each other.
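To make the open question about g concrete, here is one candidate only, offered as an assumption rather than the talk's answer: treat the components and their interactions as independent and in series, so that g is a plain product. This is precisely the classical form that breaks down under Issue 3, when components sit in loops or depend on each other. All numbers are hypothetical.

```python
from math import prod


def subsystem_reliability(component_reliabilities, interaction_reliability=1.0):
    """One possible g: product of component reliabilities and R(I).

    Assumes independent components executed in series, with no loops.
    """
    return prod(component_reliabilities) * interaction_reliability


# Hypothetical values for R(f1), R(f2), R(f3) and R(I).
r_c = subsystem_reliability([0.99, 0.98, 0.995], interaction_reliability=0.999)
print(round(r_c, 4))
```

A product can only lower the result as components are added, which is one reason a more careful choice of g matters for large systems.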
23 Scalability
Is the component-based approach scalable?
Powerful coverage measures lead to better reliability estimates, whereas measurement of coverage becomes increasingly difficult as more powerful criteria are used.
Solution: Use a component-based, incremental approach. Estimate reliability bottom-up. No need to measure coverage of components whose reliability is known.
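The bottom-up idea can be sketched as a recursive walk over a component hierarchy that skips any component whose reliability is already known. The hierarchy, the component names, the `measure` callback, and the use of a product to combine children are all assumptions for illustration.

```python
from math import prod


def estimate(name, children, known, measure, measured):
    """Estimate reliability bottom-up, reusing known values where available.

    children: maps a composite component to its sub-components.
    known:    maps component names to known reliabilities (updated in place).
    measure:  callback giving a coverage-based estimate for a leaf component.
    measured: list that records which leaves actually had to be measured.
    """
    if name in known:
        return known[name]            # no coverage measurement needed
    parts = children.get(name, [])
    if not parts:
        measured.append(name)
        r = measure(name)             # leaf: fall back to measurement
    else:
        # Composite: combine children (product is an assumed combination rule).
        r = prod(estimate(c, children, known, measure, measured) for c in parts)
    known[name] = r
    return r


# Hypothetical system: "parser" already has a known reliability.
children = {"system": ["ui", "engine"], "engine": ["parser", "solver"]}
known = {"parser": 0.99}
coverage_based = {"ui": 0.97, "solver": 0.98}
measured = []
r_system = estimate("system", children, known, coverage_based.__getitem__, measured)
print(round(r_system, 4), measured)
```

Only the components without a known reliability are measured, so the cost of coverage measurement grows with the new or changed parts rather than with the whole system.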
24 Next steps
Develop a component-based theory of reliability. Base the new theory on existing work in software testing and reliability.
Do experimentation with large systems to investigate the applicability of the theory and its effectiveness in predicting and estimating various reliability metrics.
25 The Future
Boxed and embedded software with independently
variable Levels of Confidence.
Mackie Confidence 0.99
Level 0 1.0
Level 1 0.9999
26 Select References
M. H. Chen, A. P. Mathur, and V. J. Rego. A Case Study To Investigate Sensitivity Of Reliability Estimates To Errors In The Operational Profile. Proceedings of the Fifth International Symposium on Software Reliability Engineering, IEEE Computer Society Press, Monterey, California, November 6-9, 1994, pp. 276-281.
P. Garg. On Code Coverage and Software Reliability. MS Thesis, Department of Computer Science, Purdue University, May 1995.
F. Del Frate, P. Garg, A. P. Mathur, and A. Pasquini. On the Correlation Between Code Coverage and Software Reliability. Proceedings of the Sixth International Symposium on Software Reliability Engineering, IEEE Press, Toulouse, France, October 24-27, 1995, pp. 124-132.
S. Krishnamurthy and A. P. Mathur. On the Estimation of Reliability of a Software System Using Reliabilities of its Components. Proceedings of the 8th International Symposium on Software Reliability Engineering, Albuquerque, New Mexico, November 1997.
Katerina Goseva-Popstojanova and Sunil Kamavaram. Assessing Uncertainty in Reliability of Component-Based Software. Proceedings of the 14th International Symposium on Software Reliability Engineering (ISSRE'03), 2003.
Yuan-Shun Dai, Min Xie, Quan Long, and Szu-Hui Ng. Uncertainty Analysis in Software Reliability Modeling by Bayesian Analysis with Maximum-Entropy Principle. IEEE Trans. Softw. Eng., vol. 33, no. 11, 2007, pp. 781-795.