Architecture Based Software Reliability

Slides: 71

Provided by: katerin8

1
Architecture Based Software Reliability

Katerina Goseva-Popstojanova
Kishor S. Trivedi
Center for Advanced Computing & Communication
Department of Electrical and Computer Engineering
Duke University, Durham, NC

2
Outline

Motivation and advantages
Common requirements and classification
Models elaboration
Assumptions, limitations and applicability
Conclusion

3
Software Reliability

Increasing dependence on computer systems
Failures are due more often to software faults than to hardware faults
Examples of failures due to software
Excessive radiotherapy doses (1985-1987)
9-hour outage of the long-distance phone network in the USA (1990)
Scud missile missed by the Patriot (1991)
Ariane 5 crash (1996)
8-hour delay in opening the London Stock Exchange (2000)
Need for evaluation, prediction and improvement of software reliability

4
Motivation

Black-box models treat software as a monolithic whole, considering only its interactions with the external environment, without an attempt to model its internal structure
With growing emphasis on reuse, the software development process moves toward component-based software design
A white-box approach can be used to analyze a system with many software components and how they fit together

5
Advantages

Analyzing reliability and performance of the application built from reusable and COTS software components
Studying sensitivity of the application reliability/performance to the reliability/performance of components and interfaces
Guiding the process of identifying critical components and interfaces
Allocation of resources to each of the components, so that the system reliability objective is achieved
Evaluation of design alternatives

6
Common Requirements and Classification

Component identification
Architecture of the software
Failure behavior of the components and interfaces
Combining the architecture with failure behavior
state-based models
path-based models
additive models

7
State-based Models

Estimate software reliability analytically by combining software architecture with failure behavior
Method of solution
Composite method
Combine the architecture and failure behavior into a composite model
Solve the composite model for reliability and performance measures
Hierarchical method
Solve the architectural model
Superimpose the failure behavior on the solution of the architectural model in order to predict reliability

8
Software Architecture

Software behavior with respect to the manner in
which different components interact
May include the information about the execution
time of each component
Use control flow graph to represent architecture
Assume that transfer of control between
components has Markov property

9
Software Architecture - Contd

Sequential program architecture modeled by
Discrete Time Markov Chain (DTMC)
Continuous Time Markov Chain (CTMC)
Semi-Markov process (SMP)
These can be
absorbing - terminating applications
irreducible - continuously running applications

10
Failure Behavior of Components and
Interfaces

Failure can happen
during the execution of any component or
during the transfer of control between components
Failure behavior can be specified in terms of
reliability
constant failure rate
time-dependent failure intensity

11
Terminating Applications: Cheung model (1980)

Architecture: DTMC
$p_{ij}$ = Pr{transfer of control from module $i$ to module $j$}
Failure behavior: component reliabilities $R_i$
Solution method: Composite
Two absorbing states S (correct output) and F (failure) are added. The transition probability matrix P is modified appropriately: each $p_{ij}$ becomes $R_i p_{ij}$, and each state $i$ gains a transition to F with probability $1 - R_i$. W is the matrix obtained by deleting the rows and columns corresponding to S and F. The element $M(1,n)$ of the fundamental matrix $M = (I - W)^{-1}$ represents the probability of reaching state $n$ from state 1
System reliability is $R = M(1,n) R_n$
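The composite solution can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `solve` and `cheung_reliability` are hypothetical, the transition matrix follows the control flow of Example 1, and the component reliabilities are made-up values. It assumes W is formed by weighting each transition probability with the reliability of the source component, W(i,j) = R_i p_ij.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def cheung_reliability(P, R):
    """Return M(1, n) * R_n, where M = (I - W)^-1 and W[i][j] = R[i] * P[i][j]."""
    n = len(R)
    # The first row x of M satisfies x (I - W) = e_1, i.e. (I - W)^T x = e_1.
    A = [[(1.0 if i == j else 0.0) - R[j] * P[j][i] for j in range(n)]
         for i in range(n)]
    e1 = [1.0] + [0.0] * (n - 1)
    first_row_of_M = solve(A, e1)
    return first_row_of_M[-1] * R[-1]


# Assumed inputs: the control flow of Example 1 with p24 = 0.8.
p24 = 0.8
p23 = 1.0 - p24
P = [
    [0, 1, 0, 0, 0],      # 1 -> 2
    [0, 0, p23, p24, 0],  # 2 -> 3 or 2 -> 4
    [0, 0, 0, 0, 1],      # 3 -> 5
    [0, 1, 0, 0, 0],      # 4 -> 2
    [0, 0, 0, 0, 0],      # 5 is the final component
]
R = [0.999, 0.98, 0.99, 0.96, 0.995]  # made-up component reliabilities

composite = cheung_reliability(P, R)
# Closed form for this particular graph, for comparison.
closed_form = p23 * R[0] * R[1] * R[2] * R[4] / (1 - p24 * R[1] * R[3])
print(composite, closed_form)
```

Only the first row of M is needed, so the sketch solves one linear system instead of inverting the whole matrix.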

12
Terminating Applications: Kubat model (1989)

Architecture: SMP
$g_i(t)$ - density of the sojourn time in state $i$
Failure behavior: component failure rate $\lambda_i$
Solution method: Hierarchical
Reliability of component $i$ is $R_i = \int_0^\infty e^{-\lambda_i t} g_i(t)\,dt$
Embedded DTMC provides the expected number of times $V_i$ each component is executed

13
Terminating Applications: Kubat model - Contd

$R_i^{V_i}$ can be considered as the equivalent reliability of the component that takes into account the component utilization
System reliability becomes $R \approx \prod_i R_i^{V_i}$

14
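Terminating Applications: Gokhale et al. model (1998)

Architecture: DTMC
$t_i$ - expected time spent in component $i$ per visit
Failure behavior: time-dependent failure intensity $\lambda_i(t)$
Solution method: Hierarchical
Given $\lambda_i(t)$ and the utilization of components, represented by the cumulative expected time $V_i t_i$ spent in the component per execution (with $V_i$ the expected number of visits), component reliability is $R_i = \exp\left(-\int_0^{V_i t_i} \lambda_i(t)\,dt\right)$
System reliability becomes $R = \prod_i R_i$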
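A small numerical sketch of this hierarchical computation follows. It assumes the usual form of the model, in which the reliability of component i is exp(-integral of its failure intensity from 0 to V_i t_i) and the system reliability is the product over components. The function names, the two intensity functions and all numeric values are hypothetical, and the integral is approximated with the composite Simpson rule.

```python
import math


def component_reliability(intensity, visits, time_per_visit, steps=1000):
    """exp(-integral of intensity over [0, visits * time_per_visit])."""
    upper = visits * time_per_visit
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = upper / steps
    total = intensity(0.0) + intensity(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * intensity(k * h)
    return math.exp(-total * h / 3)


def system_reliability(components):
    """Product of component reliabilities; items are (intensity, visits, time)."""
    result = 1.0
    for intensity, visits, time_per_visit in components:
        result *= component_reliability(intensity, visits, time_per_visit)
    return result


# Made-up components: one with constant intensity, one with decaying intensity.
def constant(t):
    return 0.01


def decaying(t):
    return 0.05 * math.exp(-0.1 * t)


components = [(constant, 1.0, 2.0), (decaying, 5.0, 1.0)]
print(system_reliability(components))
```

With a constant intensity the integral is simply the rate times V_i t_i, which gives a convenient check of the numerical integration.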

15
Comments on Terminating Application Models

Once the component reliabilities are estimated, the solution method in the Kubat model reduces to a hierarchical treatment of the Cheung model
The special case of the Kubat model that assumes deterministic execution times is equivalent to the special case of the Gokhale et al. model that assumes constant failure intensities
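The equivalence of the two special cases can be checked directly. The derivation below is a sketch that assumes the standard forms of the two models and uses symbols chosen here for illustration: $\lambda_i$ for the constant failure rate of component $i$, $t_i$ for its deterministic execution time per visit, $g_i(t)$ for the sojourn time density and $V_i$ for the expected number of visits.

```latex
% Kubat model with deterministic execution time t_i
% (g_i is a point mass at t_i):
R_i = \int_0^\infty e^{-\lambda_i t}\, g_i(t)\,dt = e^{-\lambda_i t_i},
\qquad
R \approx \prod_i R_i^{V_i} = \exp\Bigl(-\sum_i \lambda_i V_i t_i\Bigr)

% Gokhale et al. model with constant failure intensity \lambda_i(t) = \lambda_i:
R_i = \exp\Bigl(-\int_0^{V_i t_i} \lambda_i \,dt\Bigr) = e^{-\lambda_i V_i t_i},
\qquad
R = \prod_i R_i = \exp\Bigl(-\sum_i \lambda_i V_i t_i\Bigr)
```

Both expressions reduce to the same exponent, the sum of failure rates weighted by the cumulative expected execution time of each component.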

16
Example 1

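Terminating application
architecture described by a DTMC with transition probability matrix $P = [p_{ij}]$
component reliabilities are $R_i$, $i = 1, \ldots, 5$

[Figure: DTMC of the application. Transitions: 1 -> 2 (probability 1), 2 -> 3 ($p_{23}$), 2 -> 4 ($p_{24}$), 4 -> 2 (probability 1), 3 -> 5 (probability 1)]
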
17
Example 1 - Contd

Solution method - Hierarchical

$V_i$, the expected number of executions of component $i$, is a clear indication of component usage
When $p_{24} = 0.8$, components 2 and 4 are invoked within a loop many times, which results in a significantly higher expected number of executions compared to the case when $p_{24} = 0.2$
Application reliability is highly dependent on component usage
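The effect of the loop on component usage can be reproduced with a short sketch. It assumes the Example 1 control flow (1 -> 2, 2 -> 3 or 4, 4 -> 2, 3 -> 5) and computes the expected number of visits from V = q + V P by fixed-point iteration, then applies the first-order hierarchical approximation, the product of R_i raised to V_i. Function names and reliability values are hypothetical.

```python
def expected_visits(P, start=0, tol=1e-13, max_iter=100_000):
    """Expected visits per execution, from V = q + V P (absorbing DTMC)."""
    n = len(P)
    q = [1.0 if i == start else 0.0 for i in range(n)]
    V = q[:]
    for _ in range(max_iter):
        new = [q[j] + sum(V[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, V)) < tol:
            return new
        V = new
    raise RuntimeError("iteration did not converge")


def example1_matrix(p24):
    """Transition matrix among components 1..5 (component 5 exits)."""
    p23 = 1.0 - p24
    return [
        [0, 1, 0, 0, 0],
        [0, 0, p23, p24, 0],
        [0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]


def hierarchical_reliability(R, V):
    """First-order approximation: product of R_i ** V_i."""
    result = 1.0
    for r, v in zip(R, V):
        result *= r ** v
    return result


R = [0.999, 0.98, 0.99, 0.96, 0.995]  # made-up component reliabilities
for p24 in (0.2, 0.8):
    V = expected_visits(example1_matrix(p24))
    print(p24, [round(v, 3) for v in V], hierarchical_reliability(R, V))
```

For this graph the visits have a simple closed form, V_2 = 1/p23 and V_4 = p24/p23, so component 2 is executed 5 times on average when p24 = 0.8 and 1.25 times when p24 = 0.2.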

18
Example 1 - Contd
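
Solution method - Composite

[Figure: composite model with absorbing states S (correct output) and F (failure). Transitions: 1 -> 2 ($R_1$), 2 -> 3 ($p_{23} R_2$), 2 -> 4 ($p_{24} R_2$), 4 -> 2 ($R_4$), 3 -> 5 ($R_3$), 5 -> S ($R_5$); each component $i$ -> F ($1 - R_i$)]
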
19
Example 1 - Contd

$p_{24} = 0.8$
Application reliability varies significantly with the variation in the reliability of components 2 and 4
This is due to the fact that these two components are invoked a large number of times

[Figure: application reliability (y-axis, 0 to 1) versus reliability of a component (x-axis, 0 to 1), with curves for $R_2$ and $R_4$]
Application reliability as a function of the reliability of one component while reliabilities of other components are fixed

20
Example 1 - Contd

$p_{24} = 0.2$
Application reliability does not vary significantly with the variation in the reliability of component 4
This is due to the fact that component 4 is invoked few times
Even when $R_4$ is low (including $R_4 = 0$), overall application reliability is still high since component 4, unlike other components, is not necessarily invoked in each execution

[Figure: application reliability (y-axis, 0 to 1) versus reliability of a component (x-axis, 0 to 1), with curves for $R_2$ and $R_4$]
Application reliability as a function of the reliability of one component while reliabilities of other components are fixed

21
Continuously Running Applications: Littlewood model (1975)

Architecture: CTMC
$a_{ij}$ - transition rate from state $i$ to state $j$
Failure behavior
When component $i$ is executed, failures occur according to a Poisson process with parameter $\lambda_i$
Transfer of control between component $i$ and component $j$ fails with probability $\nu_{ij}$
Solution method: Composite
Moment generating function of the number of failures in the time interval $(0, t]$
Asymptotic analysis leads to a Poisson process with parameter $\sum_i \pi_i \bigl(\lambda_i + \sum_{j \ne i} a_{ij} \nu_{ij}\bigr)$, where $\pi_i$ is the steady-state probability of state $i$

22
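Continuously Running Applications: Littlewood model (1979)

Architecture: SMP
Failure behavior
When component $i$ is executed, failures occur according to a Poisson process with parameter $\lambda_i$
Transfer of control between component $i$ and component $j$ fails with probability $\nu_{ij}$
Solution method: Composite
Asymptotic analysis leads to a Poisson process with parameter $\sum_i a_i \lambda_i + \sum_{i,j} b_{ij} \nu_{ij}$, where $a_i$ is the proportion of time spent in component $i$ and $b_{ij}$ is the frequency of transfer of control from component $i$ to component $j$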

23
Continuously Running Applications: Laprie model (1984)

Architecture: CTMC
Failure behavior: constant failure rate $\lambda_i$
Solution method: Hierarchical
Assuming that failure rates are much smaller than execution rates leads to asymptotic behavior relative to the execution process
System failure rate tends to $\sum_i \pi_i \lambda_i$, where $\pi_i$ is the steady-state proportion of time spent in component $i$

24
Continuously Running Applications: Ledoux model (1999)

Architecture: CTMC
Failure behavior
primary failures, which lead to an execution break, occur with constant failure rates
secondary failures are described as a Poisson process with a constant rate
primary and secondary interface failures occur with given probabilities
Solution method: Composite
Using a matrix-analytic approach, the following are evaluated numerically
distribution function of the number of failures
reliability
failure intensity function

25
Comments on Continuously Running Application
Models

Models that describe software architecture with a CTMC are special cases of the Versatile Markov point processes introduced by Neuts (1979), which were shown to be equivalent to Batch Markovian Arrival Processes, BMAP (1991)
This is a rich class of point processes that has been used extensively to model arrival processes in queueing theory
The close relation of the BMAP to finite CTMCs results in a matrix-analytic approach that substantially reduces the computational complexity of the algorithmic solution

26
Comments on Continuously Running Application
Models - Contd

Littlewood model (1975) and Ledoux model (1999), which consider both component and interface failures (with batch size of 1), are Markovian Arrival Processes (MAP)
Laprie model (1984), which considers only component failures, is a doubly stochastic Poisson process known as a Markov Modulated Poisson Process (MMPP)

27
Example 2

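Continuously running application
architecture described by a CTMC
transition probabilities $p_{ij}$
expected execution time of component $i$ is $1/\gamma_i$
component failure rates $\lambda_i$

[Figure: CTMC of the application. Transitions: 1 -> 2 (rate $\gamma_1$), 2 -> 3 ($p_{23} \gamma_2$), 2 -> 4 ($p_{24} \gamma_2$), 4 -> 2 ($\gamma_4$), 3 -> 5 ($\gamma_3$), 5 -> 1 ($\gamma_5$)]
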
28
Example 2 - Contd

This is a composite model that can be solved exactly
Since failure rates are much smaller than execution rates, many exchanges of control would take place between successive program failures
This leads to the asymptotic behavior relative to the execution process and allows us to adopt the hierarchical solution method

[Figure: composite model obtained by adding a failure state F to the CTMC; each component $i$ -> F with rate $\lambda_i$]

29
Example 2 - Contd
Solution method - Hierarchical

The proportion of time spent in each component, $\pi_i$, is a measure of component usage
When $p_{24} = 0.8$, the proportion of time spent in components 2 and 4 is significantly higher compared to the case when $p_{24} = 0.2$
The application failure rate, and hence reliability, is affected by component usage
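The hierarchical solution for a continuously running application can be sketched as follows. The sketch assumes the Example 2 structure (transitions 1 -> 2, 2 -> 3 or 4, 4 -> 2, 3 -> 5, 5 -> 1), solves the steady-state equations of the CTMC, and then uses the approximation that the system failure rate is the sum of component failure rates weighted by the proportion of time spent in each component. The execution rates, failure rates and function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]  # transpose of Q
    A[-1] = [1.0] * n  # replace one balance equation by the normalization
    b = [0.0] * (n - 1) + [1.0]
    return solve(A, b)


def example2_generator(p24, gamma):
    """Generator matrix for the assumed Example 2 control flow."""
    p23 = 1.0 - p24
    g1, g2, g3, g4, g5 = gamma
    return [
        [-g1, g1, 0, 0, 0],
        [0, -g2, p23 * g2, p24 * g2, 0],
        [0, 0, -g3, 0, g3],
        [0, g4, 0, -g4, 0],
        [g5, 0, 0, 0, -g5],
    ]


gamma = [1.0, 1.0, 1.0, 1.0, 1.0]     # made-up execution rates
lam = [1e-3, 5e-3, 1e-3, 5e-3, 1e-3]  # made-up component failure rates

for p24 in (0.2, 0.8):
    pi = steady_state(example2_generator(p24, gamma))
    failure_rate = sum(p * l for p, l in zip(pi, lam))
    # Reliability at t = 100 under the asymptotic (Poisson) approximation.
    print(p24, [round(p, 3) for p in pi], failure_rate,
          math.exp(-failure_rate * 100))
```

With equal execution rates the proportions of time are proportional to (p23, 1, p23, p24, p23), so a larger p24 shifts time toward components 2 and 4.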

30
Example 2 - Contd

[Figure: application reliability (y-axis, 0 to 1) versus time $t$ (x-axis, 0 to 100), with curves for $p_{24} = 0.2$ and $p_{24} = 0.8$]
Application reliability as a function of time t

31
Path-based Models

Method used to combine software architecture
with failure behavior is not analytical
The sequence of components executed along each
path is obtained either experimentally by testing
or algorithmically
Path reliability is obtained by multiplying the
reliabilities of the components and interfaces
along the path
System reliability is estimated by averaging path
reliabilities over all paths

32
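Path-based Models: Shooman model (1976)

Architecture
Knowledge of all paths and their frequencies of execution $f_i$
Failure behavior
Failure probability $q_i$ of path $i$
Solution method
Total number of failures $n_f$ in $N$ test runs is $n_f = N f_1 q_1 + N f_2 q_2 + \cdots + N f_m q_m$
System failure probability on any test run is $q_0 = \lim_{N \to \infty} n_f / N = \sum_{i=1}^{m} f_i q_i$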

33
Path-based Models: Krishnamurthy-Mathur model (1997)

Architecture: sequence of components along different paths is observed using the component traces collected during testing
Failure behavior: component reliability $R_m$
Solution method: reliability of the path traversed when P is executed on test case $t$ is $R_t = \prod_{m \in M(P,t)} R_m$, where $M(P,t)$ is the component trace of P on $t$
Reliability of P with respect to test set $T$ is $R = \frac{1}{|T|} \sum_{t \in T} R_t$

34
Path-based Models: Yacoub, Cukic, Ammar model (1999)

Architecture: probabilistic model named Component Dependency Graph is constructed using scenarios
Failure behavior
component reliability
transition reliability
Solution method: tree traversal algorithm
breadth expansions represent logical OR paths, translated into a summation of reliabilities weighted by the transition probability along each path
depth of each path represents the sequential execution of components (logical AND), translated into multiplication of reliabilities
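The traversal can be illustrated with a small recursive sketch. It is an interpretation of the description above rather than the published algorithm: the graph encoding, the function name `cdg_reliability`, the rule that a path is cut once the accumulated execution time exceeds the average execution time, and all numbers are assumptions. Execution times are assumed to be strictly positive so that loops are eventually cut.

```python
def cdg_reliability(graph, start, terminal, avg_time):
    """Traverse a component dependency graph and return the system reliability.

    graph[node] = (component_reliability, execution_time, edges), where edges
    is a list of (next_node, transition_probability, transition_reliability).
    """
    def expand(node, elapsed):
        reliability, exec_time, edges = graph[node]
        elapsed += exec_time
        # Stop at the terminal node, at a dead end, or when the time budget is spent.
        if node == terminal or not edges or elapsed > avg_time:
            return reliability
        # OR across outgoing edges (weighted sum), AND along the path (product).
        return reliability * sum(
            prob * trans_rel * expand(nxt, elapsed)
            for nxt, prob, trans_rel in edges
        )

    return expand(start, 0.0)


# Made-up acyclic graph: A branches to B or C, both continue to D.
graph = {
    "A": (0.99, 1.0, [("B", 0.6, 0.999), ("C", 0.4, 0.999)]),
    "B": (0.98, 1.0, [("D", 1.0, 0.999)]),
    "C": (0.97, 1.0, [("D", 1.0, 0.999)]),
    "D": (0.995, 1.0, []),
}
print(cdg_reliability(graph, "A", "D", avg_time=100.0))
```

In a graph with loops the time budget bounds the depth of each path, which is how this family of models avoids enumerating infinitely many paths.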

35
Comments on Path-based Models

Account for each component's utilization along each path, as well as among different paths
The difference between the state-based and path-based approaches becomes evident when the control flow graph contains loops
state-based models analytically account for the infinite number of paths
in path-based models, either
the number of paths is restricted to those observed experimentally during testing, or
the depth traversal of each path is terminated using the average execution time of the application

36
Example 3

Path-based approach

[Figure: control flow graph with components 1 to 5]

37
Example 3 - Contd

This sample of test cases results in the same value for the transition probability, $p_{24} = 0.2$

Assuming that components along each path fail independently:
$R_{in} = (6 R_1 R_2 R_3 R_5 + 2 R_1 R_2^2 R_4 R_3 R_5)/8$

Considering intra-component dependency by collapsing multiple occurrences of a component:
$R_{dep} = (6 R_1 R_2 R_3 R_5 + 2 R_1 R_2 R_4 R_3 R_5)/8$

[Figure: control flow graph with estimated transition probabilities $p_{23} = 0.8$ and $p_{24} = 0.2$]

38
Example 3 - Contd
Comparison of the results
39
Example 3 - Contd
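
Another sample of test cases, 12424235 and 1242424242424235, results in $p_{24} = 0.8$

$R_{in} = (R_1 R_2^3 R_4^2 R_3 R_5 + R_1 R_2^7 R_4^6 R_3 R_5)/2$
$R_{dep} = (R_1 R_2 R_4 R_3 R_5 + R_1 R_2 R_4 R_3 R_5)/2$

[Figure: control flow graph with estimated transition probabilities $p_{23} = 0.2$ and $p_{24} = 0.8$]
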
40
Example 3 - Contd

In this case the sample paths traverse components 2 and 4 within the loop a significantly larger number of times
Assuming intra-component dependency results in significantly higher reliability compared to the independent case
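The two path-based estimates of Example 3 can be reproduced from the traces themselves. The sketch below is illustrative: paths are encoded as strings of component numbers, the function names are hypothetical and the component reliabilities are made-up values. With `collapse=True` repeated occurrences of a component on a path are counted once, which corresponds to the intra-component dependency case.

```python
from collections import Counter
from math import prod


def path_reliability(path, R, collapse=False):
    """Product of component reliabilities along one path."""
    components = set(path) if collapse else path
    return prod(R[c] for c in components)


def system_reliability(paths, R, collapse=False):
    """Average of path reliabilities over the observed test cases."""
    return sum(path_reliability(p, R, collapse) for p in paths) / len(paths)


def transition_probability(paths, src, dst):
    """Relative frequency of the transfer src -> dst in the traces."""
    counts = Counter((a, b) for p in paths for a, b in zip(p, p[1:]))
    total = sum(n for (a, _), n in counts.items() if a == src)
    return counts[(src, dst)] / total


# Made-up component reliabilities, keyed by component number.
R = {"1": 0.999, "2": 0.98, "3": 0.99, "4": 0.96, "5": 0.995}

sample_a = ["1235"] * 6 + ["124235"] * 2          # first sample of test cases
sample_b = ["12424235", "1242424242424235"]       # second sample of test cases

for sample in (sample_a, sample_b):
    print(transition_probability(sample, "2", "4"),
          system_reliability(sample, R),
          system_reliability(sample, R, collapse=True))
```

The second sample visits components 2 and 4 many more times per path, so the gap between the independent and the collapsed estimate is much wider there.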

41
Example 3 - Contd

Path-based model restricts the number of paths to those observed experimentally
Considering all possible paths
$R = p_{23} R_1 R_2 R_3 R_5 \left[1 + p_{24} R_2 R_4 + (p_{24} R_2 R_4)^2 + (p_{24} R_2 R_4)^3 + \cdots\right]$
leads to the same solution as in the case of the composite model
$R = p_{23} R_1 R_2 R_3 R_5 / (1 - p_{24} R_2 R_4)$

[Figure: control flow graph with transition probabilities $p_{23}$ and $p_{24}$]

42
Additive Models

Do not consider software architecture explicitly
Focus on estimating overall application reliability using component failure data
Consider software reliability growth
Component failure processes are modeled by non-homogeneous Poisson processes (NHPP)
System failure process is also an NHPP, with cumulative number of failures and failure intensity function that are sums of the corresponding functions for each component

43
Additive Models: Xie and Wohlin model (1995)

Failure behavior: component reliabilities are modeled by NHPP with failure intensity $\lambda_i(t)$
Solution method
System failure intensity is $\lambda_s(t) = \sum_{i=1}^{n} \lambda_i(t)$
Expected cumulative number of system failures by time $t$, known as the mean value function, is $m_s(t) = \int_0^t \lambda_s(\tau)\,d\tau = \sum_{i=1}^{n} m_i(t)$
Time has to be adjusted appropriately to account for the different starting points of different components

44
Additive Models: Everett model (1999)

Failure behavior: component reliabilities are modeled by the Extended Execution Time model, which includes information about the relative usage stress imposed on each component
Solution method: cumulative number of failures and failure intensity functions for the superposition of such models are just the sums of the corresponding functions for each component
Keeps track of the cumulative processing time per component during testing, that is, considers software architecture implicitly

45
Example 4
Additive model

Consider software that consists of two components, which are tested independently
Second component is introduced into the system at
t23
Component reliabilities are modeled by the log-power model, which has mean value function $m(t) = a \ln^b(1 + t)$
Using a set of data from a large communication software project results in the following mean value functions
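The additive construction can be sketched as follows. It assumes the log-power mean value function m(t) = a ln^b(1 + t) and shifts the time axis of the second component by its introduction time. The parameter values, the introduction time and the function names are all hypothetical; they are not the values fitted to the project data.

```python
import math


def log_power_mvf(a, b):
    """Mean value function of the log-power model, m(t) = a * ln(1 + t) ** b."""
    def m(t):
        return a * math.log(1.0 + t) ** b
    return m


def additive_mvf(components):
    """System mean value function; components is a list of (m_i, start_time)."""
    def m_sys(t):
        # A component contributes only after it has been introduced,
        # measured on its own time axis (t - start).
        return sum(m(t - start) for m, start in components if t >= start)
    return m_sys


# Made-up parameters and introduction time for the second component.
m1 = log_power_mvf(a=20.0, b=1.8)
m2 = log_power_mvf(a=60.0, b=1.5)
second_start = 23.0
m_sys = additive_mvf([(m1, 0.0), (m2, second_start)])

for month in (10, 23, 30, 50):
    print(month, round(m_sys(month), 1))
```

Before the second component is introduced the system curve coincides with the first component's curve; afterwards the two mean value functions add, which produces the change of slope seen in such data.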

46
Example 4 - Contd

The model fits very well the sudden change in failure behavior upon introduction of the second component into the system
It overestimates the number of failures because the log-power model is not the best software reliability growth model for this set of data

[Figure: number of failures (y-axis, 0 to 600) versus months (x-axis, 0 to 50)]
Estimated expected number of failures together with the empirical failure data for the whole system

47
Discussion on Model Choice

The choice can be based on different criteria
validity of the assumptions
accuracy of the solution
number of parameters in the model
ability to collect data
insight gained from the model evaluation
The relative weight to be placed on different
criteria
may depend on the context in which the model is
being applied

48
Validity of Assumptions and Accuracy of Solution

Terminating Applications
Cheung model assumes that component reliabilities are known and uses the composite method to obtain an exact solution for system reliability
Kubat model and Gokhale et al. model use
two different approaches to estimate component reliabilities
hierarchical solution method
first order approximation for the reliability is based on the assumptions that
components are highly reliable
variances of the number of times each component is executed are very small

49
Validity of Assumptions and Accuracy of Solution

Continuously Running Applications
Littlewood (1975), Laprie and Ledoux models assume that the sojourn times in each component are exponentially distributed
If that is not the case one should use the Littlewood model (1979)
Asymptotic solutions are based on the additional assumption that times between failures are much larger than times between exchanges of control

50
Validity of Assumptions and Accuracy of Solution

Path-based models
Unlike state-based models that analytically account for the infinite number of paths, path-based models
restrict the number of paths (Krishnamurthy, Mathur)
terminate the depth traversal of each path (Yacoub, Cukic, Ammar)
System reliability should not differ significantly since long paths are usually highly improbable

51
Validity of Assumptions and Accuracy of Solution

Additive models
Xie and Wohlin model and Everett model could be
the choice when interest is focused on the
testing phase, when the reliability growth is
considered
Component reliabilities need to be modeled with
NHPP

52
Number of Parameters and Ability to Collect Data

Each model requires knowledge of component failure behavior
Some models also require execution times of each component to be measured
this may pose difficulties, especially when the distribution function is required
Granularity of required data is different
many models consider software architecture explicitly in terms of the transfer of control between components
other models deal directly with quantities such as
path reliabilities
cumulative execution time per component

53
Unavailability of High Quality Data

Major limitation to comparing and validating
software reliability models is the lack of high
quality data
This limitation is even more significant for the
architecture based models which need far more
sophisticated data than black-box models
Availability of high quality data should provide
sound basis for comparison
help with the clear choice between the models

54
Assumptions, Limitations, Applicability

Level of decomposition
Estimation of individual component reliabilities
Estimation of interface reliabilities
Validity of Markov assumption
Estimation of transition probabilities
Operational profile
Considering failure dependencies
Extracting software architecture
Sensitivity analysis
Considering multiple software quality attributes
Considering different architectural styles

55
Level of Decomposition

Decomposition level depends on factors such as the system being analyzed, the possibility of getting required data, etc.
Too many small components may pose difficulties in measurement, parametrization, and solution of the model
Too few components may cause the distinction of how components contribute to the system failure to be lost

56
Level of Decomposition - Contd

Choices for level of decomposition in experimental studies published so far
Telephone switching software system - four components according to the main functions (Kanoun et al., 1987)
Unix utility grep - 8 components (Krishnamurthy et al., 1997)
SHARPE - 30 components, each corresponding to a single file (Gokhale et al., 1998)
Simulation of waiting queues - 6 reused components (Yacoub et al., 1999)

57
Estimation of Component Reliabilities

Depends on whether or not component code is available, how well the component has been tested, whether it is a reused or a new component, etc.
Reliability growth models
difficulty due to the scarcity of failure data
Explicit consideration of non-failed executions,
possibly together with failures
high number of executions is required
Fault seeding and fault injection
depends on the range of fault classes that are
simulated

58
Estimation of Interface Reliabilities

Interface between two components could be
another component
collection of global variables
set of files
any combination of these
Little information is available about interface
failures, apart from the general agreement that
they exist separately from component failures
revealed during unit testing

59
Validity of Markov Assumption

State-space models assume that the next component to be executed depends only on the present component and is independent of the past history
the embedded Markov chain is a first order chain
Hypothesis that the chain is of a given order needs to be tested
Higher order Markov chain
makes it possible to consider dependencies among components
can be represented as a first order chain by redefining the state space appropriately
size of the state space grows fast
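Redefining the state space can be illustrated with execution traces. In the sketch below a chain of order k is represented as a first order chain whose states are k-tuples of consecutive components, and transition probabilities are estimated as relative frequencies. The function name `estimate_chain` and the trace encoding are hypothetical; the traces reuse the first sample of Example 3.

```python
from collections import Counter, defaultdict


def estimate_chain(traces, order=1):
    """Estimate an order-k chain as a first order chain over k-tuples."""
    counts = defaultdict(Counter)
    for trace in traces:
        # Sliding window of `order` consecutive components defines the states.
        states = [tuple(trace[i:i + order])
                  for i in range(len(trace) - order + 1)]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(successors.values())
                for nxt, n in successors.items()}
        for state, successors in counts.items()
    }


traces = ["1235"] * 6 + ["124235"] * 2

first_order = estimate_chain(traces, order=1)
second_order = estimate_chain(traces, order=2)

print(first_order[("2",)])         # where control goes after component 2
print(second_order[("1", "2")])    # ... when component 2 was entered from 1
print(second_order[("4", "2")])    # ... when component 2 was entered from 4
```

In these traces the first order estimate sends control from component 2 to component 4 with probability 0.2 regardless of history, while the second order chain shows that this never happens when component 2 was entered from component 4. The price is a state space of tuples, which grows quickly with the order.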

60
Estimation of Transition Probabilities

During the early phases transition probabilities may be available by analyzing program structure and using a known operational profile
During the design phase, before actual development, simulation can be used
During the integration phase, as new data become available, the estimates have to be updated, thereby improving predictions

61
Operational Profile

Test selection aimed at
finding faults
increasing various structure coverages
demonstrating different functional requirements
is not representative of the users' operational profile
Upgrades of software might invalidate any existing estimate of the operational profile because new features can change the way the software is used
Change of the operational profile must be considered in assessing component reliabilities

62
Considering Failure Dependencies among
Components and Interfaces

Existing models assume
failure processes associated with different components are independent
when considered, interface failures are assumed to be mutually independent and independent of component failure processes
If a component's failure behavior is affected by the previous component being executed, or by the interface between them, these assumptions are no longer acceptable, that is, inter-component and intra-component dependencies need to be considered

63
Extracting Software Architecture

If the software architecture is not available, it has to be extracted from the source or object code
static architectural information
parser-based or lexically-based tools
dynamic architectural information
profilers or test coverage tools

64
Extracting Software Architecture - Contd

Workbench for architectural extraction recently developed at the Software Engineering Institute
Used at Duke University
in-house developed parser
GNU profiler gprof
coverage testing tool ATAC (Telcordia Technologies)
toolkit ATOM (Compaq Tru64 UNIX)

65
Sensitivity analysis

Helps to identify the critical components which have the greatest impact on system reliability and performance
Can be used for planning and certification activities during different phases of the software life cycle
reliability allocation to each component based on the target reliability for the entire system and the sensitivity of the system to the component
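A simple way to rank components is to estimate the derivative of system reliability with respect to each component reliability. The sketch below does this by central finite differences for the closed-form reliability of Example 1, R = p23 R1 R2 R3 R5 / (1 - p24 R2 R4). The function names, the step size and the reliability values are assumptions made for illustration.

```python
def example1_reliability(R, p24):
    """Closed-form system reliability of the Example 1 control flow."""
    R1, R2, R3, R4, R5 = R
    return (1.0 - p24) * R1 * R2 * R3 * R5 / (1.0 - p24 * R2 * R4)


def sensitivities(R, p24, h=1e-6):
    """Central finite-difference estimate of dR/dR_i for each component."""
    result = []
    for i in range(len(R)):
        up, down = list(R), list(R)
        up[i] += h
        down[i] -= h
        derivative = (example1_reliability(up, p24)
                      - example1_reliability(down, p24)) / (2 * h)
        result.append(derivative)
    return result


R = [0.99, 0.99, 0.99, 0.99, 0.99]  # made-up component reliabilities
for p24 in (0.2, 0.8):
    ranked = sorted(enumerate(sensitivities(R, p24), start=1),
                    key=lambda item: item[1], reverse=True)
    print(p24, [(component, round(s, 3)) for component, s in ranked])
```

With p24 = 0.8 the components inside the loop (2 and 4) come out as the most critical, in line with the curves of Example 1; such a ranking is a natural input to reliability allocation.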

66
Considering Multiple Software Quality Attributes

Architecture based models are mainly focused on
reliability
Performance as a software quality attribute
characterizes timeliness of the service delivered
Terminating application
expected execution time of the application
Continuously running application
expected time of one cycle

67
Considering Multiple Software Quality Attributes
- Contd

This tutorial presents an overview from the perspective of the software reliability engineering community
Software performance engineering perspective is given in Smith, Williams, 1993
Unifying approach for reasoning about multiple software quality attributes needs to be developed
first step in that direction - Architecture Tradeoff Analysis Method (Software Engineering Institute)

68
Considering Different Architectural Styles

Today's software applications are far more complex, frequently run on two or more processors, under different operating systems, and on geographically distributed machines
Architectural style is determined by
set of component types: clients, servers, filters, databases, objects, etc.
topological layout of these components indicating their interrelationships
set of interaction mechanisms: as simple as procedure calls, pipes and event broadcast, or much more complex, such as client-server protocols, database accessing protocols, etc.

69
Considering Different Architectural Styles - Contd

In today's network-centric world most software applications run in a distributed environment
Assumptions like sequential execution of components and instantaneous transfer of control are not applicable
Additional challenges
race conditions
deadlocks
communication errors
node failures
failures associated with deadline violations due to communication overheads
etc.

70
Conclusion

Architectural decisions are made early in the life cycle; they are the hardest to change, most critical and far-reaching
State of the research and practice of the architecture-based approach to software reliability assessment
common requirements and classification
model elaboration
usefulness and limitations
key challenges for applicability
Standardized architectural styles have to be developed, along with the methods for their qualitative and quantitative assessment