Title: Software Reliability Methods and Experience - Dave Dwyer, USA
1. Software Reliability Methods and Experience
Dave Dwyer, USA EIS
david.j.dwyer_at_baesystems.com
2. Overview and outline
- Definitions
- Similarities and differences: hardware and software reliability
- Foundations of Musa's models reviewed
- Trachtenberg (Trachtenberg, Martin, "The Linear Software Reliability Model and Uniform Testing", IEEE Transactions on Reliability, 1985, pp 8-16)
- Downs (Downs, Thomas, "An Approach to the Modeling of Software Testing with Some Applications", IEEE Transactions on Software Engineering, Vol. SE-11, No. 4, April 1985, pp 375-386)
- Instantaneous failure rate, a.k.a. failure intensity
- Hardware: Duane, Codier
- Software: analogous derivation
- Testing results
- SW reliability calculator
3. SW reliability defined
- Software reliability defined:
- The probability of failure-free operation for a specified time in a specified environment for a specified purpose (Software Engineering, 5th edition, I. Sommerville, Addison-Wesley, 1995)
- The probability of failure-free operation of a computer program for a specified time in a specified environment (Software Reliability, Musa, Iannino, Okumoto, McGraw-Hill, 1987)
- We will use MTBF or its reciprocal, λ
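For reference, with a constant failure intensity λ (the usual assumption between fault corrections), the two definitions above reduce to the standard exponential relation:

    \[ R(t) = e^{-\lambda t}, \qquad \mathrm{MTBF} = \frac{1}{\lambda} \]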
4. HW vs. SW reliability
- The hardware reliability discipline provided an impetus for safety margins in the stresses, both mechanical and electrical
- But margins of safety don't mean much in software, because it doesn't wear out
- Software has x failures per million unique executions; if y executions/hour, then xy failures/million hours
- Once a process has been successfully executed, that identical process is not going to fail in the future
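A minimal sketch of the arithmetic in the third bullet above; the values of x and y are assumed purely for illustration:

    # Assumed values, for illustration only
    x = 5.0      # failures per million unique executions
    y = 200.0    # unique executions per hour

    failures_per_hour = (x / 1e6) * y        # = x*y failures per million hours
    mtbf_hours = 1.0 / failures_per_hour

    print(f"failure intensity: {failures_per_hour:.2e} failures/hour")
    print(f"MTBF: {mtbf_hours:,.0f} hours")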
5. Martin Trachtenberg (1985)
- Simulation testing showed that:
- Testing the functions of the software system in a random or round-robin order and fixing the failures gives linearly decaying system error rates
- Testing and fixing each function exhaustively, one at a time, gives flat system error rates
- Testing and fixing different functions at widely different frequencies gives exponentially decaying system error rates (operational-profile testing), and
- Testing strategies that result in linearly decaying error rates tend to require the fewest tests to detect a given number of errors
- Testing to the operational profile gives the lowest time to reach an operational MTBF
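A toy sketch (not a reproduction of Trachtenberg's simulation) of the kind of experiment described above: functions with seeded faults are exercised in round-robin versus random order, and each detected fault is fixed on the spot. All parameters are assumed.

    import random

    def run(order_fn, n_funcs=20, faults_per_func=5, tests=4000, p=0.05, seed=1):
        """Exercise functions in the given order; fix a fault whenever one is hit."""
        rng = random.Random(seed)
        faults = [faults_per_func] * n_funcs
        found = 0
        for t in range(tests):
            f = order_fn(t, n_funcs, rng)
            # chance of hitting a fault grows with the faults left in that function
            if faults[f] > 0 and rng.random() < p * faults[f]:
                faults[f] -= 1     # failure observed, fault corrected
                found += 1
        return found

    round_robin = lambda t, n, rng: t % n            # uniform, round-robin order
    random_order = lambda t, n, rng: rng.randrange(n)

    for name, order in [("round-robin", round_robin), ("random", random_order)]:
        print(f"{name:11s}: {run(order)} errors found in 4000 tests")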
6. Downs' pure approach reflected the nature of software (1985)
- The execution of a sequence of M paths
- The actual number of paths affected by a fault is treated as a random variable, c
- Not all paths are equally likely to be executed
- λj = (N - j)φ, where
- N = the total number of faults,
- j = the number of corrected faults,
- φ = -r log(1 - c/M),
- r = the number of paths executed/unit time
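A small sketch of evaluating the Downs expression just quoted; all parameter values are assumed for illustration:

    import math

    # Assumed illustrative values for the Downs (1985) parameters above
    N = 100       # total faults initially present
    M = 10_000    # execution paths
    c = 4         # average number of paths affected by one fault
    r = 50        # paths executed per unit time

    phi = -r * math.log(1 - c / M)

    for j in (0, 25, 50, 75, 99):
        lam_j = (N - j) * phi      # failure intensity after j corrections
        print(f"after {j:3d} fixes: lambda = {lam_j:.3f} failures/unit time")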
7. Downs' execution path parameters
8. Our data analysis approach
- Cumulative 8-hour test shifts are recorded
- Failures plotted:
- All
- First instance
- The last data point will be put at the end of the test time
- Only integration and system test data
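A minimal sketch of this bookkeeping, assuming a made-up shift log and made-up first-instance failure times (in cumulative test hours):

    # Assumed data, for illustration only
    shifts_per_day = [1, 2, 3, 2, 3]              # shifts worked each test day
    total_hours = 8 * sum(shifts_per_day)         # cumulative 8-hour test shifts

    first_instance_failures = [3.0, 9.5, 20.0, 31.0, 47.0]   # cumulative hours

    # cumulative MTBF after the i-th failure = test time so far / failures so far
    for i, t in enumerate(first_instance_failures, start=1):
        print(f"failure {i}: cumulative MTBF = {t / i:5.1f} h")

    # the last plotted point is placed at the end of the recorded test time
    n = len(first_instance_failures)
    print(f"last point: {n} failures in {total_hours} h "
          f"-> cumulative MTBF = {total_hours / n:.1f} h")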
9. Failure rate is proportional to failure number; Downs: λj ≈ (N - j)r(c/M)
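This is the first-order approximation of the expression on slide 6: for c much smaller than M,

    \[ \lambda_j = (N-j)\,\phi = -(N-j)\,r\,\ln\!\Bigl(1-\tfrac{c}{M}\Bigr) \;\approx\; (N-j)\,r\,\tfrac{c}{M} \qquad (c \ll M), \]

so the failure intensity falls off linearly as faults are corrected.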
10. Failure rate plotted against failure number for a range of non-uniform testing profiles, with M1, M2 paths and N1, N2 initial faults in those paths
11. Instantaneous failure intensity derivation: Duane's for hardware
- Instantaneous λ for HW
- Instantaneous λ for SW
- Same approach, similar result
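For reference, the hardware-side (Duane) relations behind this slide, in the usual notation: cumulative failures follow a power law in test time, and the instantaneous failure intensity is a constant fraction of the cumulative one. The software derivation proceeds analogously.

    \[ N(T) = K\,T^{\beta}, \qquad \lambda_c(T) = \frac{N(T)}{T} = K\,T^{\beta-1}, \qquad \lambda_i(T) = \frac{dN}{dT} = \beta\,K\,T^{\beta-1} = (1-\alpha)\,\lambda_c(T), \quad \alpha = 1-\beta \]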
12. Background test example
- Console operation and operating profile
- Necessity of distinguishing failure priorities:
- Priority 1: Prevents mission-essential capability
- Priority 2: Adversely affects mission-essential capability with no alternative workaround
- Priority 3: Adversely affects mission-essential capability with an alternative workaround
- Work shifts varied over the test duration (1-3/day)
- Calculation of failure intensity
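A rough sketch of a per-priority failure-intensity calculation; the shift count and the failure log below are invented for illustration:

    from collections import Counter

    # Assumed data, for illustration only
    total_test_hours = 8 * 120                      # e.g., 120 eight-hour shifts
    failure_log = [                                 # (cumulative hour, priority)
        (12, 1), (40, 2), (55, 1), (130, 3), (300, 2), (610, 1),
    ]

    counts = Counter(priority for _, priority in failure_log)
    for priority in sorted(counts):
        lam = counts[priority] / total_test_hours   # cumulative failure intensity
        print(f"Priority {priority}: lambda = {lam:.2e} failures/hour, "
              f"MTBF = {1 / lam:,.0f} h")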
13. Corrective action for Priority 2 failures suspended while Priority 1 failures were corrected
14. Codier, Duane 1964 RAMS: HW reliability growth
- Ref. Appendix B, Notes on Plotting (Codier, Ernest O., "Reliability Growth in Real Life", Proceedings, 1968 Annual Symposium on Reliability, New York, IEEE, January 1968, pp 458-469)
- 1. The latter points, having more information content, must be given more weight than earlier points (Trachtenberg, too)
- 2. The normal curve-fitting procedure of drawing the line through the center of gravity of all the points should not be used
- 3. Start the line on the last data point and seek the region of highest density of points to the left (to the right for Musa plots of it)
15. How do I draw a growth line through the points on a reliability growth plot?
- Is there one point that is most important?
- Yes, the last point; it represents the cumulative MTBF to date and has the most degrees of freedom
- Should the trend line go through that point?
- Yes, it has the best measure of cumulative MTBF
- Would an Excel trend line go through that point?
- No, it's just a least-squares fit with all points weighted the same
- What is the least important point?
- The first; it has the fewest degrees of freedom
16. Questions: drawing a line through the points (cont.)
- If the line goes through the last point, what else should it go through?
- The center of density of the other points (ref. back to Duane, Codier)
- What is the center of density?
- The center of density is where the center of mass would be if the latter points are given more weight than earlier points
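One simple way to encode this rule of thumb: anchor the trend line at the last point and take a weighted least-squares slope in which later points count more. This is an illustrative sketch, not Codier's or Duane's exact procedure, and the data values are assumed. (For a Duane-style plot, the same fit would be done on log test time and log cumulative MTBF.)

    def anchored_weighted_slope(xs, ys):
        """Slope of a line forced through the last point (xs[-1], ys[-1]),
        fitted to the earlier points with weights that grow toward the end."""
        x_n, y_n = xs[-1], ys[-1]
        weights = range(1, len(xs))      # 1, 2, ..., n-1: later points weigh more
        num = den = 0.0
        for x, y, w in zip(xs[:-1], ys[:-1], weights):
            dx, dy = x - x_n, y - y_n
            num += w * dx * dy
            den += w * dx * dx
        return num / den

    # Assumed data: cumulative test hours and cumulative MTBF at each failure
    t = [10, 20, 40, 80, 160, 320]
    mtbf = [2.0, 2.6, 3.1, 4.2, 5.0, 6.3]

    m = anchored_weighted_slope(t, mtbf)
    print(f"trend line: MTBF = {mtbf[-1]:.2f} + {m:.4f} * (T - {t[-1]})")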
17. Example: Priority 1 data plotted
18. Point estimates vs. instantaneous
19. The formula for calculation of λi correlates with interval estimates of failure intensity
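A hedged sketch of such a comparison, using the Duane-style relation λi = (1 - α)λc shown earlier against a simple interval estimate (failures in a recent window divided by the window length). The failure times, the window, and the growth slope α are all assumed:

    # Assumed data: cumulative test hours at which failures occurred
    failure_times = [5, 14, 30, 52, 90, 140, 205, 290]

    T = failure_times[-1]
    n = len(failure_times)
    lambda_c = n / T                      # cumulative failure intensity to date
    alpha = 0.4                           # assumed growth slope from a Duane-style fit
    lambda_i = (1 - alpha) * lambda_c     # instantaneous estimate

    window = 100                          # hours in the most recent interval
    recent = sum(1 for t in failure_times if t > T - window)
    lambda_interval = recent / window     # interval estimate of failure intensity

    print(f"lambda_c = {lambda_c:.4f}/h, lambda_i = {lambda_i:.4f}/h, "
          f"last-{window}h interval estimate = {lambda_interval:.4f}/h")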
20. Most recent data plot
21. A calculator has been developed for the BAE Systems SW reliability practice 8349714
22. Priority 1 data graph
23. Questions?
- Anybody want a grad course in SW Reliability? I need 5 more students
- Rivier College can do that through teleconference (e-mail david.j.dwyer_at_baesystems.com)
- You will solve a real problem at no charge to your department (except tuition)