Title: An Empirical Study on Reliability Modeling for Diverse Software Systems
1. An Empirical Study on Reliability Modeling for Diverse Software Systems
- Xia Cai and Michael R. Lyu
- Dept. of Computer Science and Engineering
- The Chinese University of Hong Kong
2. Outline
- Introduction
- Objectives and previous work
- Analyses and investigations on reliability models for diverse software systems
  - Reliability bounds model by Popov, Strigini, et al.
  - System reliability model by Dugan and Lyu
- Discussion
- Conclusion
3. Introduction
- Design diversity is one of the two main techniques for software fault tolerance
- The rationale of this approach is the expectation that software programs built differently will fail differently
- Reliability models attempt to estimate the probability of coincident failures in multiple versions
- Empirical data are in high demand for evaluating and cross-validating the usefulness and effectiveness of these models
4. Reliability models for design diversity
- Eckhardt and Lee (1985)
  - Variation of difficulty on demand space
  - Positive correlations between version failures
- Littlewood and Miller (1989)
  - Forced design diversity
  - Possibility of negative correlations
- Dugan and Lyu (1995)
  - Markov reward model
- Tomek and Trivedi (1995)
  - Stochastic reward net
- Popov, Strigini et al. (2003)
  - Subdomains on demand space
  - Upper/lower bounds for failure probability
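- Categories (slide labels, in list order): conceptual models (Eckhardt and Lee; Littlewood and Miller), structural models (Dugan and Lyu; Tomek and Trivedi), in between (Popov, Strigini et al.)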
5. Our objectives
- To study reliability and fault correlation issues in design diversity by means of mutation testing
- To investigate and compare the prediction performance of different existing reliability models for design diversity
6. Our previous work
- Motivated by the lack of empirical data, we conducted the RSDIMU project in 2002
- It took more than 100 students 12 weeks to develop 34 program versions
- 1200 test cases were executed on these program versions
- 426 mutants were generated, each by injecting a single fault identified in the testing phase
- A number of analyses and evaluations were conducted in our previous work
7. Outline
- Introduction
- Objectives and previous work
- Analyses and investigations on reliability models for diverse software systems
  - Reliability bounds model by Popov, Strigini, et al. (PS model)
  - System reliability model by Dugan and Lyu (DL model)
- Discussion
- Conclusion
8. PS Model
- Proposed by P. T. Popov, L. Strigini, J. May and S. Kuball (2003)
- Target: give the upper and likely lower bounds for the probability of coincident failures
- Assumptions
  - Knowledge is given on disjoint subdomains Si of the demand space, i.e.,
    1) the probability P(Si) of a random demand being drawn from Si
    2) the probabilities of failure on demand (pfds) of A and B for demands from Si, P(A|Si) and P(B|Si)
9. PS Model (cont)
- Alternative estimates for the probability of failure on demand (pfd) of a 1-out-of-2 system
10. PS Model (cont)
- Upper bound of system pfd
- Likely lower bound of system pfd
  - under the assumption of conditional independence
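The formulas on this slide did not survive extraction. The following is a minimal Python sketch of the two bounds as they are usually stated for the subdomain formulation: the upper bound weights min(P(A|Si), P(B|Si)) by P(Si), and the likely lower bound weights the product P(A|Si) * P(B|Si) by P(Si), which is the joint pfd if the two versions fail independently within each subdomain. The function names and all numbers are illustrative assumptions, not data from the experiment.

```python
from typing import Sequence


def ps_upper_bound(p_sub: Sequence[float],
                   pfd_a: Sequence[float],
                   pfd_b: Sequence[float]) -> float:
    """Upper bound on the 1-out-of-2 system pfd: sum of P(Si) * min(P(A|Si), P(B|Si))."""
    return sum(p * min(a, b) for p, a, b in zip(p_sub, pfd_a, pfd_b))


def ps_likely_lower_bound(p_sub: Sequence[float],
                          pfd_a: Sequence[float],
                          pfd_b: Sequence[float]) -> float:
    """Joint pfd under conditional independence: sum of P(Si) * P(A|Si) * P(B|Si)."""
    return sum(p * a * b for p, a, b in zip(p_sub, pfd_a, pfd_b))


def marginal_pfd(p_sub: Sequence[float], pfd: Sequence[float]) -> float:
    """Overall pfd of one version: sum of P(Si) * P(version fails | Si)."""
    return sum(p * q for p, q in zip(p_sub, pfd))


# Hypothetical demand profile over three subdomains (must sum to 1).
p_sub = [0.5, 0.3, 0.2]
# Hypothetical per-subdomain pfds of versions A and B.
pfd_a = [0.01, 0.10, 0.00]
pfd_b = [0.02, 0.01, 0.05]

upper = ps_upper_bound(p_sub, pfd_a, pfd_b)
lower = ps_likely_lower_bound(p_sub, pfd_a, pfd_b)
p_a = marginal_pfd(p_sub, pfd_a)
p_b = marginal_pfd(p_sub, pfd_b)

print(f"upper bound        = {upper:.6f}")
print(f"likely lower bound = {lower:.6f}")
print(f"min(PA, PB)        = {min(p_a, p_b):.6f}")
print(f"PA * PB            = {p_a * p_b:.6f}")
```

The subdomain upper bound can never exceed min(PA, PB), because a sum of per-subdomain minima is at most the minimum of the sums. That is why subdomain knowledge can only tighten the upper bound.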
11. Experimental setup
- Mutants are treated as program versions in our experiment
- 1200 test cases are divided into seven categories by the system status
- The first 800 test cases (manually designed for functionality testing) are used as the qualification test, and the other 400 test cases (randomly generated) as the operational test
12. Information on subdomains
- Failure data and demand profile
- [Tables not recovered. Surviving labels: subdomains; real; hypothetical; faults in operational test; programs passed qualification test]
13. Estimation Method
- Since no failure was observed in some subdomains, we adopt the confidence bounds method rather than the point estimates method in our experiment
- One-sided confidence bounds (Bayesian bounds) are computed for the probabilities of failure
- 90% confidence upper bounds as well as lower bounds on the pfds of mutants in subdomains under all demand profiles were estimated
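A one-sided Bayesian bound of this kind can be sketched as follows. The slides do not state the prior, so this sketch assumes a uniform Beta(1, 1) prior on the pfd, which gives a Beta(f + 1, n - f + 1) posterior after f failures in n demands. The function names and the example counts are hypothetical. The posterior CDF is evaluated through the standard binomial-sum identity for integer Beta parameters, and the quantile is found by bisection so that only the standard library is needed.

```python
from math import comb


def posterior_cdf(p: float, failures: int, demands: int) -> float:
    """CDF at p of Beta(failures + 1, demands - failures + 1).

    For integer parameters this equals P(X >= failures + 1) for
    X ~ Binomial(demands + 1, p); it is computed as 1 - P(X <= failures).
    """
    m = demands + 1
    below = sum(comb(m, j) * p**j * (1.0 - p)**(m - j)
                for j in range(failures + 1))
    return 1.0 - below


def one_sided_bound(failures: int, demands: int, prob: float) -> float:
    """Posterior quantile: the pfd value p with P(pfd <= p | data) = prob."""
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection; the CDF is increasing in p
        mid = (lo + hi) / 2.0
        if posterior_cdf(mid, failures, demands) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical subdomain: 100 demands, no failure observed.
upper_90 = one_sided_bound(failures=0, demands=100, prob=0.90)  # 90% upper bound
lower_90 = one_sided_bound(failures=0, demands=100, prob=0.10)  # 90% lower bound

print(f"90% upper bound on pfd: {upper_90:.5f}")
print(f"90% lower bound on pfd: {lower_90:.5f}")
```

With zero observed failures a point estimate would be 0, which carries no information about how many demands were run. The bound still shrinks as the number of failure-free demands grows.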
14. Bayesian Bounds under DP4
- 90% confidence upper bounds on pfds in subdomains
- 90% confidence lower bounds on pfds in subdomains
15. Upper bounds
- Upper bounds on the joint pfds under all demand profiles
16. Lower bounds
- Likely lower bounds on the joint pfds under the demand profiles
17. Analysis on upper/lower bounds
Mutant pair | Failure features | Performance comparison | Covariance in failures | Upper bounds | Lower bounds
(117, 305) | No correlation observed | Fail differently | Positive (DP1), negative (others) | Smaller than min(PA, PB) | Larger than PA·PB in DP1
(215, 382) | Correlation observed | Mutant 382 performs worse in all subdomains | Always positive | Equal to P215 | Larger in all DPs
(382, 403) | Correlation observed | Perform differently | Positive (DP1&2), negative (DP3&4) | Smaller than min(PA, PB) | Larger in DP1&2
18. Discussion
- With our data, the confidence bounds in the PS model are tighter than PA·PB and min(PA, PB) under most circumstances, except when
  - One program performs worse than the other in all subdomains
  - Negative covariance holds between the failure probabilities of the two programs
- Difficulties and limitations of the PS model
  - The way to divide the demand space into disjoint subdomains
  - The thorough knowledge required on the probability of each subdomain and on the performance of all the versions in it
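The two exceptions follow from simple identities, sketched here in Python with made-up numbers. Under conditional independence the joint pfd equals PA * PB plus the covariance of the per-subdomain pfds (taken over the demand profile), so a negative covariance pushes the likely lower bound below PA * PB. If one program is worse in every subdomain, the per-subdomain minimum is always the better program's pfd, so the upper bound collapses to min(PA, PB). All names and values below are hypothetical.

```python
def mean(p_sub, values):
    """Expectation of a per-subdomain quantity under the demand profile."""
    return sum(p * v for p, v in zip(p_sub, values))


def covariance(p_sub, pfd_a, pfd_b):
    """Covariance of the per-subdomain pfds of A and B under the demand profile."""
    pa, pb = mean(p_sub, pfd_a), mean(p_sub, pfd_b)
    return sum(p * (a - pa) * (b - pb) for p, a, b in zip(p_sub, pfd_a, pfd_b))


def joint_cond_indep(p_sub, pfd_a, pfd_b):
    """Likely lower bound: sum of P(Si) * P(A|Si) * P(B|Si)."""
    return sum(p * a * b for p, a, b in zip(p_sub, pfd_a, pfd_b))


def upper_bound(p_sub, pfd_a, pfd_b):
    """Upper bound: sum of P(Si) * min(P(A|Si), P(B|Si))."""
    return sum(p * min(a, b) for p, a, b in zip(p_sub, pfd_a, pfd_b))


p_sub = [0.5, 0.5]  # hypothetical demand profile

# Case 1: B is worse than A in every subdomain.
dom_a, dom_b = [0.01, 0.02], [0.03, 0.05]
dom_upper = upper_bound(p_sub, dom_a, dom_b)
dom_min = min(mean(p_sub, dom_a), mean(p_sub, dom_b))

# Case 2: A and B fail in different subdomains (negative covariance).
neg_a, neg_b = [0.10, 0.00], [0.00, 0.10]
neg_joint = joint_cond_indep(p_sub, neg_a, neg_b)
neg_cov = covariance(p_sub, neg_a, neg_b)
neg_product = mean(p_sub, neg_a) * mean(p_sub, neg_b)

print("case 1: upper bound equals min(PA, PB):", abs(dom_upper - dom_min) < 1e-12)
print("case 2: covariance is negative:", neg_cov < 0)
print("case 2: joint pfd below PA * PB:", neg_joint < neg_product)
```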
19. DL Model
- Proposed by Dugan and Lyu (1995)
- 3-level reliability model
  - A Markov model detailing the system structure
  - Two fault trees presenting the causes of failures in the initial configuration and the reconfigured state
- Assumptions
  - Unrelated faults produce different erroneous results
  - Related faults produce similar erroneous results
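Read as a labeling rule, the two assumptions suggest how coincident failures observed in testing could be sorted before estimating the model parameters. The sketch below is one possible reading, not the procedure used in the study: it treats "similar" as exact equality of outputs, and the enum, function name, and sample data are all hypothetical.

```python
from collections import Counter
from enum import Enum


class FailureKind(Enum):
    NONE = "no coincident failure"
    UNRELATED = "unrelated"   # both versions wrong, with different results
    RELATED = "related"       # both versions wrong, with the same result


def classify_coincident_failure(expected, out_a, out_b) -> FailureKind:
    """Label one test case for a pair of versions.

    Assumption: "similar erroneous results" is approximated by equality.
    """
    a_fails = out_a != expected
    b_fails = out_b != expected
    if not (a_fails and b_fails):
        return FailureKind.NONE
    return FailureKind.RELATED if out_a == out_b else FailureKind.UNRELATED


# Hypothetical test log: (expected output, output of A, output of B).
observations = [
    (10, 10, 10),  # both correct
    (10, 10, 7),   # only B fails
    (10, 3, 7),    # both fail, different results
    (10, 4, 4),    # both fail, same result
    (10, 4, 4),    # both fail, same result
]

tally = Counter(classify_coincident_failure(e, a, b) for e, a, b in observations)
for kind in FailureKind:
    print(f"{kind.value}: {tally[kind]} of {len(observations)} test cases")
```

The distinction matters for a decider that compares outputs: differing wrong results can be detected by comparison, while matching wrong results cannot.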
20. DL Model
- Example: reliability model of DRB
21. DL Model (cont)
- Fault tree models for 2-, 3-, and 4-version systems
22. Results of DL model with our project data
- The new experimental data are applied to verify the effectiveness and consistency of the DL model
- Six mutants with various failure characteristics are employed in the operational test
23. Results of DL model with our project data
- Failure characteristics for 2-, 3-, and 4-version configurations
24. Results of DL model with our project data
- Summary of parameter values
  - Prob. of unrelated faults
  - Prob. of related faults between two versions
  - Prob. of related faults in all versions
25. Results of DL model with our project data
- Predicted reliability by different configurations
26. Results of DL model with our project data
- Predicted safety by different configurations
27. Discussion
- Comparing our project with the former project, the reliability and safety performance of DRB, NVP and NSCP shows the consistency of the DL model with respect to our experimental data
- The discrepancy in the first thousands of hours may indicate dependence on operational domains
- The simplified classification of related and unrelated faults needs to be improved by including real-life scenarios
- To achieve more accurate results, information about the correlation between successive executions should be included
28. Comparison of PS & DL Models
Aspect | PS Model | DL Model
Assumptions | The whole demand space can be partitioned into disjoint subdomains; knowledge on subdomains should be given | The faults among program versions can be classified into unrelated faults and related faults
Prerequisite | 1. Probability of subdomains; 2. Failure probabilities of programs on subdomains | 1. Number of unrelated and related faults among versions; 2. Probability of hardware and decider failure
Target system | Specific 1-out-of-2 system configurations | All multi-version system combinations
Measurement objective | Upper and lower bounds for failure probability | Average failure probability
Experimental results | Gives tighter bounds under most circumstances, yet whether they are tight enough needs further investigation | The prediction results agree well with observation, yet may deviate for a specific system
29. Conclusion
- Mutants are employed to investigate the prediction performance of two reliability models
- Advantages, limitations and performance of the PS and DL models are compared
- With our data, the confidence bounds in the PS model are tighter than PA·PB and min(PA, PB) under most circumstances
30. Conclusion
- With our data, the PS approach is helpful for analyzing the behaviors of the versions under subdomains, revealing the features of fault correlation among diverse programs
- Our analyses with the DL model of the reliability and safety features of DRB, NVP and NSCP are consistent with the original experiment, although there are crossovers in the first thousands of hours in the reliability curves
31. Future work
- More test cases should be employed for cross-validation of the prediction accuracy of the PS model and the DL model
- Other existing reliability models can be applied for further comparisons with our experimental data
32. Q & A
Dept. of Computer Science and Engineering