Title: Software Reliability Research
1Software Reliability Research
- Pankaj Jalote
- Professor, CSE, IIT Kanpur, India
2System Reliability
- System: an entity that provides defined behavior at its interfaces
- A system is a hierarchy of subsystems, each subsystem being a system
- Reliability of a system: its ability to provide failure-free operation
- Failure: the system behavior is incorrect or not as expected; it is a random phenomenon
3Reliability Quantification
- Reliability of a system is defined as the probability of failure-free operation over a time period: R(t) = Prob(system has not failed by time t)
- For reliability work, the distribution underlying R(t) is often specified
4Reliability Quantification..
- Reliability can also be quantified by Mean Time to Failure (MTTF)
- Also by failure rate (number of failures per unit time)
- From R(t), MTTF or failure rate can be determined
- Under some assumptions, failure rate and MTTF are inversely related
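
As an illustration of the last two bullets, here is a minimal sketch assuming a constant failure rate (an exponential time-to-failure distribution, which is one common assumption and not necessarily the one used in the talk). Under that assumption R(t) = exp(-rate * t), and the MTTF, computed as the integral of R(t) over time, comes out as 1 / rate. The function names and the numeric rate are hypothetical.

```python
import math

def reliability(t, failure_rate):
    """R(t) under the assumption of a constant failure rate."""
    return math.exp(-failure_rate * t)

def mttf_from_reliability(r, horizon, steps=100_000):
    """Approximate MTTF as the integral of R(t) over [0, horizon] (trapezoidal rule)."""
    dt = horizon / steps
    total = 0.5 * (r(0.0) + r(horizon))
    for k in range(1, steps):
        total += r(k * dt)
    return total * dt

rate = 0.01  # hypothetical value: 0.01 failures per hour
mttf = mttf_from_reliability(lambda t: reliability(t, rate), horizon=2000.0)

# With a constant failure rate the result should be close to 1 / rate.
print(mttf, 1.0 / rate)
```

The horizon only needs to be long enough that R(t) has decayed to nearly zero; for other distributions the same integral applies, but the simple inverse relation does not hold in general.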
5Software Reliability
- Software (un)reliability is caused not by aging but by bugs
- The more bugs, the lower the reliability of the software
- Failures still seem random, hence reliability theory can be applied
6Software Reliability Research
- Two main threads
  - Software reliability modeling: how to model and predict software reliability
  - Improving software reliability by removing defects through program checking, verification, testing, etc.
- Will discuss some of the work being done here on these two threads
7Software Reliability Modeling
8Software Reliability
- Software systems are often one-off
- Measuring reliability in the lab is not practical: too much failure data is needed, which requires time
- Failures often result in fault removal, leading to reliability improvement
- Predicting future reliability from measured reliability is harder
- Hence different models are needed
9Software Reliability Growth Models
- Assume that reliability is a function of the defect level, and that reliability improves as defects are removed
- Model the failure-fix process of software evolution
- Many models have been proposed in the last 3 decades
- Model parameters are determined from past data on failures and fixes
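
To make the idea of a growth model concrete, the sketch below uses the Goel-Okumoto model, a widely cited SRGM chosen here purely as an example; the slides do not say which models were used. It assumes the expected cumulative number of failures by time t is total_defects * (1 - exp(-detection_rate * t)), so the failure intensity falls as defects are found and fixed. Both parameter values are hypothetical; in practice they would be estimated from past failure and fix data.

```python
import math

def expected_failures(t, total_defects, detection_rate):
    """Expected cumulative failures by time t (Goel-Okumoto mean value function)."""
    return total_defects * (1.0 - math.exp(-detection_rate * t))

def failure_intensity(t, total_defects, detection_rate):
    """Expected failures per unit time at time t (derivative of the mean value function)."""
    return total_defects * detection_rate * math.exp(-detection_rate * t)

TOTAL_DEFECTS = 100.0   # hypothetical: defects initially present
DETECTION_RATE = 0.05   # hypothetical: per-defect detection rate per week

for week in (0, 10, 20, 40):
    found = expected_failures(week, TOTAL_DEFECTS, DETECTION_RATE)
    intensity = failure_intensity(week, TOTAL_DEFECTS, DETECTION_RATE)
    print(week, round(found, 1), round(intensity, 3))
```

The printed intensity decreases week by week, which is the reliability growth the model is meant to capture.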
10Reliability of Software Products
- For software products, a large population exists in the field, and faults are not removed as failures occur
- According to SRGMs, the reliability should remain the same
- I.e. the failure rate should be constant
11Average Failure Rate of a MS Product
12Reasons for this Phenomenon
- Users learn with time and avoid failure-causing situations
- Users start by exploring more, then limit themselves to some part of the product
- Most users use only a few product features
- Configuration-related failures are much more frequent at the start
- These failures reduce with time
13A New Model for Product Rel.
- For a user, there is a transient failure rate, which decays by a factor
- With time the transient dies out, and the failure rate reaches a steady state
- The steady-state failure rate represents the reliability of the product
14Failure Rate of a Unit
- Failure rate for one unit is $\lambda(i) = \lambda_0 a^i + \lambda_f$
- $\lambda_0$ is the initial transient rate
- $\lambda_f$ is the final steady-state rate
- $a$ is the decay factor
15Applying it to a Product
- Considered the failure and sales data of a real MS product
- Applying the model to the data and determining the parameters, we get
- $\lambda_0$ = 0.04 failures/month
- $\lambda_f$ = 0.008 failures/month
- $a$ = 0.4 (i.e. 40% decay each month)
16Example
- Steady-state failure rate is 1/6th of the average rate in month 2, 1/3rd of the average rate in month 4
- I.e. the initial MTTF could be 1/6th of the steady-state MTTF
- Steady state is reached quite soon, in two to three months
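
The per-unit model can be tried out with the parameter values reported above. This is a minimal sketch assuming the failure rate of a unit in its i-th month of use is the initial transient rate times the decay factor raised to the power i, plus the steady-state rate; that form is a reading of the slide, not a confirmed formula. The averaging function and the sales figures are hypothetical illustrations of how units sold in different months might be combined, since the actual sales data is not given.

```python
LAMBDA_0 = 0.04   # initial transient rate, failures/month (from the slides)
LAMBDA_F = 0.008  # steady-state rate, failures/month (from the slides)
DECAY = 0.4       # decay factor a (from the slides)

def unit_failure_rate(age_months, lambda_0=LAMBDA_0, lambda_f=LAMBDA_F, decay=DECAY):
    """Assumed form: lambda(i) = lambda_0 * a**i + lambda_f."""
    return lambda_0 * decay ** age_months + lambda_f

def average_failure_rate(sales_by_month, current_month):
    """Sales-weighted average rate over all units in the field (hypothetical helper).

    A unit sold in month k has age current_month - k.
    """
    sold = sales_by_month[: current_month + 1]
    weighted = sum(units * unit_failure_rate(current_month - k)
                   for k, units in enumerate(sold))
    return weighted / sum(sold)

# Per-unit rate and its ratio to the steady-state rate, month by month.
for month in range(7):
    rate = unit_failure_rate(month)
    print(month, round(rate, 4), round(rate / LAMBDA_F, 2))

hypothetical_sales = [1000, 1500, 2000, 2000, 1500, 1000]  # made-up units sold per month
print(round(average_failure_rate(hypothetical_sales, 5), 4))
```

Under this assumed form the rate in month 0 is six times the steady-state rate, which agrees with the remark that the initial MTTF could be 1/6th of the steady-state MTTF.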
17Software Architecture Based Rel Estimation
18Sw Architecture
- Architecture is the components in the system and how they are connected
- It is decided very early in a software project
- If reliability and performance can be modeled from the architecture, the architecture can be improved
- Some work is going on in architecture-based performance and reliability modeling
19Program Verification
20Program Verification
- Basic goal: to ensure that the program is as free of defects (bugs) as possible
- Good program verification leads to higher reliability
21Program Verification Techniques
- Testing: the program is executed with test data to find bugs
- Static analysis: the program source code is analyzed
- Dynamic analysis: the program is run on some data and assertions are made
- Model checking
- Formal verification
22Techniques
- Most techniques work in isolation
- Sometimes they are complementary in their defect detection capability
- Combining techniques meaningfully can improve reliability
- We are working on techniques for combining testing and static analysis
23State-based Testing Automation
24Testing
- Testing remains the main verification activity; most reliance is placed on it
- Consumes as much as half of the total effort in a software product
- Testing involves test case design, execution, checking the results, then debugging, fixing, and retesting
- Each step is expensive
25Test Automation
- Test automation can help reduce cost and make testing more effective
- Most test automation approaches focus on data collection and re-testing
- Little effort has gone into complete end-to-end automation
- We are working on automating OO testing using state-based models
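
To give a flavor of what state-based test automation for an OO class can look like, here is a small sketch. It is not the approach from the talk, whose details are not given; it assumes a hypothetical `BoundedStack` class and a hand-written state model, then generates one test per transition (reach the source state by the shortest method sequence, fire the transition, check the resulting state) and runs them automatically.

```python
from collections import deque

class BoundedStack:
    """Hypothetical class under test."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.items = []

    def push(self, value=0):
        if len(self.items) >= self.capacity:
            raise OverflowError("stack is full")
        self.items.append(value)

    def pop(self):
        return self.items.pop()

    def state(self):
        if not self.items:
            return "EMPTY"
        return "FULL" if len(self.items) == self.capacity else "PARTIAL"

# Assumed state model for capacity 2: (state, method) -> expected next state.
MODEL = {
    ("EMPTY", "push"): "PARTIAL",
    ("PARTIAL", "push"): "FULL",
    ("PARTIAL", "pop"): "EMPTY",
    ("FULL", "pop"): "PARTIAL",
}

def path_to(target, start="EMPTY"):
    """Shortest method sequence from start to target in the model (breadth-first search)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == target:
            return path
        for (src, method), dst in MODEL.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [method]))
    raise ValueError(f"state {target} is unreachable")

def generate_tests():
    """One test per transition: (method sequence, expected final state)."""
    return [(path_to(src) + [method], dst) for (src, method), dst in MODEL.items()]

def run_tests():
    """Execute each generated sequence on a fresh object and compare states."""
    results = []
    for sequence, expected in generate_tests():
        stack = BoundedStack(capacity=2)
        for method in sequence:
            getattr(stack, method)()
        results.append((sequence, expected, stack.state() == expected))
    return results

for sequence, expected, passed in run_tests():
    print(sequence, expected, "pass" if passed else "FAIL")
```

Test design, execution, and result checking are all derived from the model here; covering every transition once is only one possible coverage criterion.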
26Summary
- Software reliability is a rich and wide area
- Exciting work is going on across the world in modeling, analysis, program checking, testing, etc.
- Lots of open issues