1
Architecture Based Software Reliability
  • Katerina Goseva Popstojanova
  • Kishor S. Trivedi
  • Center for Advanced Computing & Communication
  • Department of Electrical and Computer Engineering
  • Duke University, Durham, NC

2
Outline
  • Motivation and advantages
  • Common requirements and classification
  • Models elaboration
  • Assumptions, limitations and applicability
  • Conclusion

3
Software Reliability
  • Increasing dependence on computer systems
  • Failures are more often due to software faults than to hardware faults
  • Examples of failures due to software
    • Excessive radiotherapy doses (1985-1987)
    • 9-hour outage of the long-distance phone network in the USA (1990)
    • Scud missile missed by the Patriot (1991)
    • Ariane 5 crash (1996)
    • 8-hour delay in opening the London Stock Exchange (2000)
  • Need for evaluation, prediction and improvement of software reliability

4
Motivation
  • Black-box models treat software as a monolithic whole, considering only its interactions with the external environment, without an attempt to model its internal structure
  • With the growing emphasis on reuse, the software development process moves toward component-based software design
  • A white-box approach can be used to analyze a system with many software components and how they fit together

5
Advantages
  • Analyzing reliability and performance of an application built from reusable and COTS software components
  • Studying the sensitivity of the application reliability/performance to the reliability/performance of components and interfaces
  • Guiding the process of identifying critical components and interfaces
  • Allocating resources to each of the components so that the system reliability objective is achieved
  • Evaluating design alternatives

6
Common Requirements and Classification
  • Component identification
  • Architecture of the software
  • Failure behavior of the components and interfaces
  • Combining the architecture with failure behavior
    • state-based models
    • path-based models
    • additive models

7
State-based Models
  • Estimate software reliability analytically by combining software architecture with failure behavior
  • Method of solution
    • Composite method
      • Combine the architecture and failure behavior into a composite model
      • Solve the composite model for reliability and performance measures
    • Hierarchical method
      • Solve the architectural model
      • Superimpose the failure behavior on the solution of the architectural model in order to predict reliability

8
Software Architecture
  • Software behavior with respect to the manner in
    which different components interact
  • May include the information about the execution
    time of each component
  • Use control flow graph to represent architecture
  • Assume that the transfer of control between components has the Markov property

9
Software Architecture - Contd
  • Sequential program architecture is modeled by
    • Discrete Time Markov Chain (DTMC)
    • Continuous Time Markov Chain (CTMC)
    • Semi-Markov process (SMP)
  • These can be
    • absorbing - terminating applications
    • irreducible - continuously running applications

10
Failure Behavior of Components and Interfaces
  • Failure can happen
    • during the execution of any component, or
    • during the transfer of control between components
  • Failure behavior can be specified in terms of
    • reliability
    • constant failure rate
    • time-dependent failure intensity

11
Terminating Applications: Cheung model (1980)
  • Architecture: DTMC
    • $p_{ij}$ = Pr{transfer of control from module $i$ to module $j$}
  • Failure behavior: component reliabilities $R_i$
  • Solution method: Composite
  • Two absorbing states, S (correct output) and F (failure), are added, and the transition probability matrix P is modified appropriately. W is the matrix obtained by deleting the rows and columns corresponding to S and F. The element $M(1,n)$ of the fundamental matrix $M = (I - W)^{-1}$ represents the probability of reaching state $n$ from state 1
  • System reliability is $R = M(1,n) R_n$
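
The composite solution can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it assumes the modified matrix has entries $R_i p_{ij}$, assumes the last component exits to S, and replaces the explicit matrix inverse with a fixed-point iteration that solves the same linear system. The five-component structure (a loop between components 2 and 4) mirrors the example used later in these slides; the reliability values are hypothetical.

```python
def cheung_reliability(P, R, tol=1e-12, max_iter=100_000):
    """Probability of reaching S (correct output) when execution starts in component 1.

    P[i][j]: probability that control passes from component i to component j.
    R[i]:    reliability of component i.
    Assumption: the last component hands over to S after a successful execution.
    """
    n = len(R)
    x = [0.0] * n  # x[i] = probability of reaching S given that component i starts
    for _ in range(max_iter):
        new = []
        for i in range(n):
            onward = 1.0 if i == n - 1 else sum(P[i][j] * x[j] for j in range(n))
            new.append(R[i] * onward)  # component i must not fail, then continue
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new[0]
        x = new
    raise RuntimeError("fixed-point iteration did not converge")


def example_matrix(p24):
    """Hypothetical 5-component DTMC: 1->2, 2->3 (p23), 2->4 (p24), 4->2, 3->5."""
    p23 = 1.0 - p24
    return [
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, p23, p24, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0],
    ]


R = [0.99, 0.98, 0.99, 0.96, 0.98]  # hypothetical component reliabilities
print(cheung_reliability(example_matrix(0.8), R))
```

For larger models a direct linear solve of $(I - W)$ would normally be preferred; the iteration is used here only to keep the sketch dependency-free.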

12
Terminating Applications: Kubat model (1989)
  • Architecture: SMP, specified by the density of the sojourn time in each state
  • Failure behavior: component failure rates
  • Solution method: Hierarchical
  • The reliability of each component is obtained from its failure rate and the density of its sojourn time
  • The embedded DTMC provides the expected number of times each component is executed
13
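Terminating Applications: Kubat model - Contd
  • $R_i^{V_i}$, the reliability $R_i$ of component $i$ raised to its expected number of executions $V_i$, can be considered as the equivalent reliability of the component that takes into account the component utilization
  • System reliability becomes $R \approx \prod_i R_i^{V_i}$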
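
The expressions on these two slides did not survive extraction. As a hedged sketch of the usual form of this hierarchical calculation, the code below assumes the component reliability is $R_i = \int_0^\infty e^{-\lambda_i t} g_i(t)\,dt$ (failure rate $\lambda_i$, sojourn-time density $g_i$) and that the system reliability is approximated by $\prod_i R_i^{V_i}$, with $V_i$ the expected number of executions. The notation, the exponential sojourn time, and all numbers are assumptions made for illustration.

```python
import math


def kubat_component_reliability(failure_rate, sojourn_pdf, horizon, steps=20_000):
    """Approximate R_i = integral_0^horizon exp(-lambda_i * t) * g_i(t) dt (trapezoidal rule).

    `horizon` must be large enough that the density is negligible beyond it.
    """
    h = horizon / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-failure_rate * t) * sojourn_pdf(t)
    return total * h


def system_reliability(component_reliabilities, expected_visits):
    """First-order approximation: product of the equivalent reliabilities R_i ** V_i."""
    r = 1.0
    for R_i, V_i in zip(component_reliabilities, expected_visits):
        r *= R_i ** V_i
    return r


# Hypothetical component: exponential sojourn time with rate mu, failure rate lam.
mu, lam = 2.0, 0.01
R1 = kubat_component_reliability(lam, lambda t: mu * math.exp(-mu * t), horizon=40.0)
print(R1, mu / (mu + lam))  # numerical value next to the closed form for this special case
```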

14
Terminating Applications: Gokhale et al. model (1998)
  • Architecture: DTMC, together with the expected time spent in each component per visit
  • Failure behavior: time-dependent failure intensity
  • Solution method: Hierarchical
  • Given the failure intensity and the utilization of each component, represented by the cumulative expected time spent in the component per execution, the reliability of each component is computed
  • System reliability becomes the product of the component reliabilities
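
The formulas on this slide were lost in extraction. A minimal sketch of the calculation, assuming the commonly quoted form $R_i = \exp\left(-\int_0^{V_i t_i} \lambda_i(t)\,dt\right)$, where $V_i$ is the expected number of visits, $t_i$ the expected time per visit and $\lambda_i(t)$ the failure intensity. The symbols, the decaying intensity function and the numbers are illustrative assumptions.

```python
import math


def gokhale_component_reliability(intensity, visits, time_per_visit, steps=10_000):
    """R_i = exp(-integral of intensity(t) from 0 to visits * time_per_visit).

    The integral is approximated with the midpoint rule.
    """
    upper = visits * time_per_visit  # cumulative expected time per execution
    h = upper / steps
    integral = sum(intensity((k + 0.5) * h) for k in range(steps)) * h
    return math.exp(-integral)


# Hypothetical decaying failure intensity a * b * exp(-b * t).
a, b = 0.05, 0.1
decaying = gokhale_component_reliability(lambda t: a * b * math.exp(-b * t), 4, 2.5)

# With a constant intensity the result reduces to exp(-lambda * t_i) ** V_i,
# the deterministic-execution-time special case mentioned on the next slide.
constant = gokhale_component_reliability(lambda t: 0.01, 4, 2.5)
print(decaying, constant, math.exp(-0.01 * 2.5) ** 4)
```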

15
Comments on Terminating Application Models
  • Once the reliabilities are estimated, the solution method in the Kubat model reduces to a hierarchical treatment of the Cheung model
  • The special case of the Kubat model that assumes deterministic execution times is equivalent to the special case of the Gokhale et al. model that assumes constant failure intensities

16
Example 1
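  • Terminating application
    • architecture described by a DTMC with transition probability matrix $P = [p_{ij}]$
    • component reliabilities are $R_1, \dots, R_5$

[Figure: DTMC of the application. Component 1 passes control to component 2 with probability 1; component 2 passes control to component 3 with probability p23 or to component 4 with probability p24; component 4 returns to component 2 and component 3 passes control to component 5, each with probability 1]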
17
Example 1 - Contd
  • Solution method - Hierarchical
  • $V_i$, the expected number of executions of component $i$, is a clear indication of component usage
    • when $p_{24} = 0.8$, components 2 and 4 are invoked within a loop many times, which results in a significantly higher expected number of executions compared to the case when $p_{24} = 0.2$
  • Application reliability is highly dependent on component usage
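
The effect of $p_{24}$ on component usage can be checked with a short calculation. This sketch assumes the loop structure implied by the example (1→2, 2→3 with $p_{23}$, 2→4 with $p_{24}$, 4→2, 3→5) and computes the expected number of visits $V_i$ by iterating $V = e_1 + V P$. The component reliabilities are hypothetical, since the values on the slide were not preserved.

```python
def expected_visits(P, start=0, tol=1e-12, max_iter=100_000):
    """Expected number of visits to each state of an absorbing DTMC, starting in `start`."""
    n = len(P)
    V = [0.0] * n
    for _ in range(max_iter):
        new = [
            (1.0 if j == start else 0.0) + sum(V[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, V)) < tol:
            return new
        V = new
    raise RuntimeError("iteration did not converge")


def example1_matrix(p24):
    p23 = 1.0 - p24
    return [
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, p23, p24, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0],  # component 5 exits the application
    ]


def hierarchical_reliability(R, V):
    """First-order approximation: product of R_i ** V_i."""
    r = 1.0
    for R_i, V_i in zip(R, V):
        r *= R_i ** V_i
    return r


R = [0.99, 0.98, 0.99, 0.96, 0.98]  # hypothetical component reliabilities
for p24 in (0.8, 0.2):
    V = expected_visits(example1_matrix(p24))
    print(p24, [round(v, 3) for v in V], hierarchical_reliability(R, V))
```

With $p_{24} = 0.8$ the visit counts for components 2 and 4 come out as 5 and 4, against 1.25 and 0.25 for $p_{24} = 0.2$, which is the usage difference the slide describes.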

18
Example 1 - Contd
Solution method - Composite
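[Figure: composite model. Each transition out of component i in the DTMC is multiplied by R_i (for example, 2 → 3 with probability p23 R2 and 2 → 4 with probability p24 R2); each component i moves to the failure state F with probability 1 - R_i; component 5 moves to the correct-output state S with probability R5]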
19
Example 1 - Contd
  • $p_{24} = 0.8$
  • Application reliability varies significantly with the variation in the reliability of components 2 and 4
  • This is due to the fact that these two components are invoked a large number of times

[Figure: application reliability (vertical axis, 0 to 1) as a function of the reliability of one component (horizontal axis, 0 to 1), with curves for R2 and R4, while the reliabilities of the other components are fixed]
20
Example 1 - Contd
  • $p_{24} = 0.2$
  • Application reliability does not vary significantly with the variation in the reliability of component 4
  • This is due to the fact that component 4 is invoked only a few times
  • Even when $R_4$ is low (including $R_4 = 0$), overall application reliability is still high since component 4, unlike the other components, is not necessarily invoked in each execution

[Figure: application reliability (vertical axis, 0 to 1) as a function of the reliability of one component (horizontal axis, 0 to 1), with curves for R2 and R4, while the reliabilities of the other components are fixed]
21
Continuously Running Applications: Littlewood model (1975)
  • Architecture: CTMC, specified by the transition rate from state $i$ to state $j$
  • Failure behavior
    • When component $i$ is executed, failures occur according to a Poisson process with a component-specific parameter
    • Transfer of control between component $i$ and component $j$ fails with a given probability
  • Solution method: Composite
    • Moment generating function of the number of failures in a time interval
    • Asymptotic analysis leads to a Poisson process
22
Continuously Running Applications: Littlewood model (1979)
  • Architecture: SMP
  • Failure behavior
    • When component $i$ is executed, failures occur according to a Poisson process with a component-specific parameter
    • Transfer of control between component $i$ and component $j$ fails with a given probability
  • Solution method: Composite. Asymptotic analysis leads to a Poisson process

23
Continuously Running Applications: Laprie model (1984)
  • Architecture: CTMC
  • Failure behavior: constant failure rate $\lambda_i$ of component $i$
  • Solution method: Hierarchical
  • Assuming that failure rates are much smaller than execution rates leads to asymptotic behavior relative to the execution process
  • System failure rate tends to $\sum_i \pi_i \lambda_i$, where $\pi_i$ is the proportion of time spent in component $i$
24
Continuously Running Applications: Ledoux model (1999)
  • Architecture: CTMC
  • Failure behavior
    • primary failures, which lead to an execution break, occur with constant failure rates
    • secondary failures are described as a Poisson process
    • primary (secondary) interface failures occur with given probabilities
  • Solution method: Composite. Using the matrix-analytic approach, the following are evaluated numerically
    • distribution function of the number of failures
    • reliability
    • failure intensity function

25
Comments on Continuously Running Application Models
  • Models that describe software architecture with a CTMC are special cases of the Versatile Markov point processes introduced by Neuts (1979), which are shown to be equivalent to Batch Markovian Arrival Processes (1991)
  • This is a rich class of point processes that have been used extensively to model arrival processes in queuing theory
  • The close relation of the BMAP with a finite CTMC results in a matrix-analytic approach that substantially reduces the computational complexity of the algorithmic solution

26
Comments on Continuously Running Application Models - Contd
  • Littlewood model (1975) and Ledoux model (1999), which consider both component and interface failures (with batch size of 1), are Markovian Arrival Processes (MAP)
  • Laprie model (1984), which considers only component failures, is a doubly stochastic Poisson process known as a Markov Modulated Poisson Process (MMPP)

27
Example 2
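  • Continuously running application
    • architecture described by a CTMC
    • transition probabilities $p_{ij}$
    • expected execution time of each component
    • component failure rates $\lambda_i$

[Figure: CTMC of the application. Components 1 to 5 are the states, and the transitions are labeled with the rates γ1, p23 γ2, p24 γ2, γ3, γ4 and γ5]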
28
Example 2 - Contd
  • This is a composite model that can be solved exactly
  • Since failure rates are much smaller than execution rates, many exchanges of control would take place between successive program failures
  • This leads to asymptotic behavior relative to the execution process and allows us to adopt the hierarchical solution method

[Figure: composite model. The CTMC of the application extended with a failure state F, which is entered from component i at rate λi]
29
Example 2 - Contd
Solution method - Hierarchical
  • The proportion of time spent in each component, $\pi_i$, is a measure of component usage
    • when $p_{24} = 0.8$ the proportion of time spent in components 2 and 4 is significantly higher compared to the case when $p_{24} = 0.2$
  • Application failure rate, and hence reliability, is affected by component usage
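
A minimal sketch of this hierarchical calculation, under stated assumptions: the application cycles through the five-component structure of the example (component 5 returns control to component 1), the proportion of time in component $i$ is taken as proportional to its visits per cycle times its expected execution time, the system failure rate is approximated by $\sum_i \pi_i \lambda_i$, and reliability over time is $e^{-\lambda t}$. Execution times and failure rates below are hypothetical.

```python
import math


def time_proportions(p24, exec_times):
    """Proportion of time spent in each of the 5 components of the cyclic example."""
    p23 = 1.0 - p24
    visits_per_cycle = [1.0, 1.0 / p23, 1.0, p24 / p23, 1.0]
    weighted = [v * t for v, t in zip(visits_per_cycle, exec_times)]
    total = sum(weighted)
    return [w / total for w in weighted]


def system_failure_rate(p24, exec_times, failure_rates):
    """Approximate system failure rate: sum of pi_i * lambda_i."""
    proportions = time_proportions(p24, exec_times)
    return sum(p * lam for p, lam in zip(proportions, failure_rates))


def reliability(t, rate):
    """Probability of no failure up to time t for a constant failure rate."""
    return math.exp(-rate * t)


exec_times = [1.0, 1.0, 1.0, 1.0, 1.0]             # hypothetical
failure_rates = [0.001, 0.02, 0.001, 0.03, 0.001]  # hypothetical
for p24 in (0.2, 0.8):
    rate = system_failure_rate(p24, exec_times, failure_rates)
    print(p24, time_proportions(p24, exec_times), reliability(50.0, rate))
```

With these made-up rates, the less reliable components 2 and 4 dominate when $p_{24} = 0.8$, so the reliability curve decays faster than for $p_{24} = 0.2$.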

30
Example 2 - Contd
[Figure: application reliability (vertical axis, 0 to 1) as a function of time t (horizontal axis, 0 to 100), with one curve for p24 = 0.2 and one for p24 = 0.8]
31
Path-based Models
  • The method used to combine software architecture with failure behavior is not analytical
  • The sequence of components executed along each path is obtained either experimentally by testing or algorithmically
  • Path reliability is obtained by multiplying the reliabilities of the components and interfaces along the path
  • System reliability is estimated by averaging path reliabilities over all paths

32
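Path-based Models: Shooman model (1976)
  • Architecture
    • Knowledge of all paths and their frequencies of execution
  • Failure behavior
    • Failure probability of each path
  • Solution method
    • The total number of failures in a set of test runs is the number of runs times the sum, over all paths, of the path frequency multiplied by the path failure probability
    • The system failure probability on any test run is the frequency-weighted sum of the path failure probabilities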
33
Path-based Models: Krishnamurthy-Mathur model (1997)
  • Architecture: the sequence of components along different paths is observed using the component traces collected during testing
  • Failure behavior: component reliability
  • Solution method: the reliability of the path traversed when program P is executed on a test case is the product of the reliabilities of the components in its trace
  • The reliability of P with respect to a test set is the average of the path reliabilities over the test cases in the set
34
Path-based Models: Yacoub, Cukic, Ammar model (1999)
  • Architecture: a probabilistic model named Component Dependency Graph is constructed using scenarios
  • Failure behavior
    • component reliability
    • transition reliability
  • Solution method: Tree traversal algorithm
    • breadth expansions represent logical OR paths, translated into a summation of reliabilities weighted by the transition probability along each path
    • the depth of each path represents the sequential execution of components (logical AND), translated into a multiplication of reliabilities

35
Comments on Path-based Models
  • Account for each component's utilization along each path, as well as among different paths
  • The difference between the state-based and path-based approaches becomes evident when the control flow graph contains loops
    • state-based models analytically account for the infinite number of paths
    • path-based models
      • the number of paths is restricted to those observed experimentally during testing, or
      • the depth traversal of each path is terminated using the average execution time of the application

36
Example 3
Path based approach
[Figure: control flow graph of the application with components 1 to 5]
37
Example 3 - Contd
This sample of test cases results in the same value for the transition probability, $p_{24} = 0.2$

[Figure: control flow graph with $p_{23} = 0.8$ and $p_{24} = 0.2$; all other transitions have probability 1]

  • Assuming that components along each path fail independently
    $R_{in} = (6 R_1 R_2 R_3 R_5 + 2 R_1 R_2^2 R_4 R_3 R_5)/8$
  • Considering intra-component dependency by collapsing multiple occurrences of a component
    $R_{dep} = (6 R_1 R_2 R_3 R_5 + 2 R_1 R_2 R_4 R_3 R_5)/8$
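
The two estimates can be reproduced from component traces. The sketch below assumes the sample consists of six runs along 1-2-3-5 and two runs along 1-2-4-2-3-5, which is what the coefficients 6 and 2 and the divisor 8 in the expressions imply; the component reliabilities are hypothetical.

```python
from math import prod


def path_reliability(trace, R, collapse=False):
    """Product of component reliabilities along one trace.

    collapse=True counts each component once per path (intra-component dependency).
    """
    components = set(trace) if collapse else trace
    return prod(R[c] for c in components)


def suite_reliability(traces, R, collapse=False):
    """Average path reliability over all test cases in the sample."""
    return sum(path_reliability(t, R, collapse) for t in traces) / len(traces)


R = {1: 0.99, 2: 0.98, 3: 0.99, 4: 0.96, 5: 0.98}  # hypothetical
traces = [[1, 2, 3, 5]] * 6 + [[1, 2, 4, 2, 3, 5]] * 2

print("independent:", suite_reliability(traces, R))
print("dependent:  ", suite_reliability(traces, R, collapse=True))
```

Collapsing repeated components can only raise a path's reliability, so the dependent estimate is never lower than the independent one.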
38
Example 3 - Contd
Comparison of the results
39
Example 3 - Contd
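Another sample of test cases, 12424235 and 1242424242424235, results in $p_{24} = 0.8$

[Figure: control flow graph with $p_{23} = 0.2$ and $p_{24} = 0.8$; all other transitions have probability 1]

$R_{in} = (R_1 R_2^3 R_4^2 R_3 R_5 + R_1 R_2^7 R_4^6 R_3 R_5)/2$
$R_{dep} = (R_1 R_2 R_4 R_3 R_5 + R_1 R_2 R_4 R_3 R_5)/2$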
40
Example 3 - Contd
  • In this case the sample paths traverse components 2 and 4 within the loop a significantly larger number of times
  • Assuming intra-component dependency results in significantly higher reliability compared to the independent case

41
Example 3 - Contd
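The path-based model restricts the number of paths to those observed experimentally

[Figure: control flow graph with transition probabilities p23 and p24; all other transitions have probability 1]

Considering all possible paths

$R = p_{23} R_1 R_2 R_3 R_5 \left[1 + p_{24} R_2 R_4 + (p_{24} R_2 R_4)^2 + (p_{24} R_2 R_4)^3 + \cdots\right]$

leads to the same solution as in the case of the composite model

$R = p_{23} R_1 R_2 R_3 R_5 / (1 - p_{24} R_2 R_4)$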
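
The convergence of the path sum to the composite result can be checked numerically. In this sketch each term $k$ of the sum corresponds to the path that goes around the 2-4 loop $k$ times; the reliabilities are hypothetical.

```python
def all_paths_reliability(p24, R, max_loops):
    """Sum of probability-weighted path reliabilities for paths with up to max_loops loops."""
    p23 = 1.0 - p24
    base = p23 * R[1] * R[2] * R[3] * R[5]  # path 1-2-3-5 without any loop
    loop = p24 * R[2] * R[4]                # factor contributed by one 2-4-2 loop
    return base * sum(loop ** k for k in range(max_loops + 1))


def composite_reliability(p24, R):
    """Closed form obtained from the composite (state-based) model."""
    p23 = 1.0 - p24
    return p23 * R[1] * R[2] * R[3] * R[5] / (1.0 - p24 * R[2] * R[4])


R = {1: 0.99, 2: 0.98, 3: 0.99, 4: 0.96, 5: 0.98}  # hypothetical
for loops in (0, 1, 5, 20, 100):
    print(loops, all_paths_reliability(0.8, R, loops))
print("composite:", composite_reliability(0.8, R))
```

Truncating the sum after a few loops, as a path-based model effectively does, gives a lower value than the composite model; the gap shrinks geometrically with the number of loops included.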
42
Additive Models
  • Do not consider software architecture explicitly
  • Focused on estimating overall application
    reliability using components failure data
  • Consider software reliability growth
  • Component failure processes are modeled by a non-homogeneous Poisson process (NHPP)
  • System failure process is also NHPP with
    cumulative number of failures and failure
    intensity function that are sums of the
    corresponding functions for each component

43
Additive Models: Xie and Wohlin model (1995)
  • Failure behavior: component reliabilities are modeled by NHPPs with failure intensities $\lambda_i(t)$
  • Solution method
    • System failure intensity is $\lambda_s(t) = \sum_i \lambda_i(t)$
    • Expected cumulative number of system failures by time $t$, known as the mean value function, is $m_s(t) = \sum_i m_i(t)$
  • Time has to be adjusted appropriately to consider different starting points for different components

44
Additive Models: Everett model (1999)
  • Failure behavior: component reliabilities are modeled by the Extended Execution Time model, which includes information about the relative usage stress imposed on each component
  • Solution method: the cumulative number of failures and the failure intensity function for a superposition of such models are just the sums of the corresponding functions for each component
  • Keeps track of the cumulative processing time per component during testing, that is, considers software architecture implicitly

45
Example 4
Additive model
  • Consider a software system that consists of two components, which are tested independently
  • Second component is introduced into the system at
    t23
  • Component reliabilities are modeled by the log-power model, which has mean value function $m(t) = a \ln^b(1 + t)$
  • Fitting a set of data from a large communication software project gives the mean value functions of the two components
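
The additive construction can be sketched as follows. The fitted parameter values from the slide were not preserved, so the parameters and the start time of the second component below are hypothetical; the log-power form $m(t) = a \ln^b(1+t)$ is assumed for each component, and each component's clock starts when it enters the system.

```python
import math


def log_power_mvf(a, b):
    """Mean value function m(t) = a * ln(1 + t) ** b of a log-power NHPP."""
    def m(t):
        return a * math.log(1.0 + t) ** b
    return m


def system_mvf(components):
    """Sum of component mean value functions, each shifted to its own start time.

    components: list of (start_time, mean_value_function) pairs.
    """
    def m(t):
        return sum(mvf(t - start) for start, mvf in components if t > start)
    return m


m1 = log_power_mvf(10.0, 1.5)  # hypothetical parameters, component 1
m2 = log_power_mvf(5.0, 2.0)   # hypothetical parameters, component 2
m_sys = system_mvf([(0.0, m1), (20.0, m2)])  # hypothetical start time for component 2

for month in (10, 20, 30, 40):
    print(month, round(m_sys(month), 2))
```

The system curve follows the first component alone until the second one is introduced, then bends upward, which is the kind of sudden change in failure behavior an additive model can capture.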

46
Example 4 - Contd
  • The model fits very well the sudden change in failure behavior upon introduction of the second component into the system
  • It overestimates the number of failures because the log-power model is not the best software reliability growth model for this set of data

[Figure: estimated expected number of failures together with the empirical failure data for the whole system; vertical axis: number of failures (0 to 600), horizontal axis: months (0 to 50)]
47
Discussion on Model Choice
  • The choice can be based on different criteria
    • validity of the assumptions
    • accuracy of the solution
    • number of parameters in the model
    • ability to collect data
    • insight gained from the model evaluation
  • The relative weight to be placed on different criteria may depend on the context in which the model is being applied

48
Validity of Assumptions and Accuracy of Solution
  • Terminating Applications
    • Cheung model assumes that component reliabilities are known and uses the composite method to obtain the exact solution for system reliability
    • Kubat model and Gokhale et al. model use
      • two different approaches to estimate component reliabilities
      • the hierarchical solution method
    • The first order approximation for the reliability is based on the assumptions that
      • components are highly reliable
      • variances of the number of times each component is executed are very small

49
Validity of Assumptions and Accuracy of Solution
  • Continuously Running Applications
    • Littlewood (1975), Laprie and Ledoux models assume that the sojourn times in each component are exponentially distributed
    • If that is not the case, one should use the Littlewood model (1979)
    • Asymptotic solutions are based on the additional assumption that the times between failures are much larger than the times between exchanges of control

50
Validity of Assumptions and Accuracy of Solution
  • Path-based models
    • Unlike state-based models that analytically account for the infinite number of paths, path-based models
      • restrict the number of paths [Krishnamurthy, Mathur]
      • terminate the depth traversal of each path [Yacoub, Cukic, Ammar]
    • System reliability should not differ significantly since long paths are usually highly improbable

51
Validity of Assumptions and Accuracy of Solution
  • Additive models
    • Xie and Wohlin model and Everett model could be the choice when interest is focused on the testing phase, when reliability growth is considered
    • Component reliabilities need to be modeled with NHPP

52
Number of Parameters and Ability to Collect Data
  • Each model requires the knowledge of component failure behavior
  • Some models also require the execution times of each component to be measured
    • this may impose difficulties, especially when the distribution function is required
  • Granularity of required data is different
    • many models consider software architecture explicitly in terms of the transfer of control between components
    • other models deal directly with quantities such as
      • path reliabilities
      • cumulative execution time per component

53
Unavailability of High Quality Data
  • Major limitation to comparing and validating
    software reliability models is the lack of high
    quality data
  • This limitation is even more significant for the
    architecture based models which need far more
    sophisticated data than black-box models
  • Availability of high quality data should
    • provide a sound basis for comparison
    • help with the clear choice between the models

54
Assumptions, Limitations, Applicability
  • Level of decomposition
  • Estimation of individual component reliabilities
  • Estimation of interface reliabilities
  • Validity of Markov assumption
  • Estimation of transition probabilities
  • Operational profile
  • Considering failure dependencies
  • Extracting software architecture
  • Sensitivity analysis
  • Considering multiple software quality attributes
  • Considering different architectural styles

55
Level of Decomposition
  • Decomposition level depends on factors such as the system being analyzed, the possibility of getting the required data, etc.
  • Too many small components may pose difficulties in measurement, parametrization, and solution of the model
  • Too few components may cause the distinction of how components contribute to the system failure to be lost

56
Level of Decomposition - Contd
  • Choices for level of decomposition in experimental studies published so far
    • Telephone switching software system - four components according to the main functions [Kanoun et al. 1987]
    • Unix utility grep - 8 components [Krishnamurthy et al. 1997]
    • SHARPE - 30 components, each corresponding to a single file [Gokhale et al. 1998]
    • Simulation of waiting queues - 6 reused components [Yacoub et al. 1999]

57
Estimation of Components Reliabilities
  • Depends on whether or not component code is available, how well the component has been tested, whether it is a reused or a new component, etc.
  • Reliability growth models
    • difficulty due to the scarcity of failure data
  • Explicit consideration of non-failed executions, possibly together with failures
    • high number of executions is required
  • Fault seeding and fault injection
    • depends on the range of fault classes that are simulated

58
Estimation of Interface Reliabilities
  • Interface between two components could be
    • another component
    • collection of global variables
    • set of files
    • any combination of these
  • Little information is available about interface failures, apart from the general agreement that they exist separately from component failures revealed during unit testing

59
Validity of Markov Assumption
  • State-space models assume that the next component to be executed depends only on the present component and is independent of the past history
    • the embedded Markov chain is a first order chain
  • The hypothesis that the chain is of a given order needs to be tested
  • Higher order Markov chain
    • makes it possible to consider dependency among components
    • can be represented as a first order chain by redefining the state space appropriately
    • size of the state space grows fast
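
Redefining the state space can be illustrated with a second order chain, where each new state is a pair (previous component, current component). This is a sketch with hypothetical traces, not a procedure taken from the slides.

```python
from collections import Counter, defaultdict


def second_order_as_first_order(traces):
    """Estimate a first order chain over pair states (previous, current)."""
    counts = defaultdict(Counter)
    for trace in traces:
        pairs = list(zip(trace, trace[1:]))          # pair states visited by this trace
        for state, next_state in zip(pairs, pairs[1:]):
            counts[state][next_state] += 1
    return {
        state: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for state, ctr in counts.items()
    }


# Hypothetical traces: after returning from component 4, control always goes to 3.
chain = second_order_as_first_order([[1, 2, 4, 2, 3, 5], [1, 2, 3, 5]])
for state, row in chain.items():
    print(state, row)
```

Here the pair state (4, 2) always moves on to (2, 3), while (1, 2) splits evenly between (2, 4) and (2, 3), a dependency on history that a first order chain over single components cannot express. With n components there are up to n² pair states, which is the state space growth noted above.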

60
Estimation of Transition Probabilities
  • During the early phases, transition probabilities may be available by analyzing program structure and using a known operational profile
  • During the design phase, before actual development, simulation can be used
  • During the integration phase, as new data become available, the estimates have to be updated, thereby improving predictions
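
When execution traces are available, transition probabilities can be estimated by counting transfers of control. The sketch below uses the two trace samples from the path-based example in these slides (reconstructed from the coefficients of the path reliability expressions); the function name is illustrative.

```python
from collections import Counter, defaultdict


def estimate_transition_probabilities(traces):
    """Relative frequency of each observed transfer of control i -> j."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            counts[current][nxt] += 1
    return {
        i: {j: c / sum(ctr.values()) for j, c in ctr.items()}
        for i, ctr in counts.items()
    }


first_sample = [[1, 2, 3, 5]] * 6 + [[1, 2, 4, 2, 3, 5]] * 2
second_sample = [
    [1, 2, 4, 2, 4, 2, 3, 5],
    [1, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 3, 5],
]
print(estimate_transition_probabilities(first_sample)[2])
print(estimate_transition_probabilities(second_sample)[2])
```

Re-running the estimate on the accumulated traces as new data arrive is one simple way to update the probabilities during integration.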

61
Operational Profile
  • Test selections aimed at
    • finding faults
    • increasing various structure coverages
    • demonstrating different functional requirements
    are not representative of the users' operational profile
  • Upgrades of software might invalidate any existing estimate of the operational profile because new features can change the way the software is used
  • Change of the operational profile must be considered in assessing component reliabilities

62
Considering Failure Dependencies among Components and Interfaces
  • Existing models assume
    • failure processes associated with different components are independent
    • when considered, interface failures are assumed to be mutually independent and independent of the component failure processes
  • If a component's failure behavior is affected by the previous component being executed, or by the interface between them, these assumptions are no longer acceptable, that is, inter-component and intra-component dependencies need to be considered

63
Extracting Software Architecture
  • If the software architecture is not available, it has to be extracted from the source or object code
    • static architectural information
      • parser-based or lexically-based tools
    • dynamic architectural information
      • profilers or test coverage tools

64
Extracting Software Architecture - Contd
  • Workbench for architectural extraction recently developed at the Software Engineering Institute
  • Used at Duke University
    • in-house developed parser
    • GNU profiler gprof
    • coverage testing tool ATAC (Telcordia Technologies)
    • toolkit ATOM (Compaq Tru64 Unix)

65
Sensitivity analysis
  • Helps to identify the critical components which have the greatest impact on system reliability and performance
  • Can be used for planning and certification activities during different phases of the software life cycle
    • reliability allocation to each component based on the target reliability for the entire system and the sensitivity of the system to the component
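
One simple way to rank components is to estimate the partial derivative of system reliability with respect to each component reliability. The sketch below uses central finite differences on the closed-form reliability of the five-component example from these slides; the reliability values and the step size are assumptions.

```python
def composite_reliability(p24, R):
    """System reliability of the example: p23 R1 R2 R3 R5 / (1 - p24 R2 R4)."""
    p23 = 1.0 - p24
    return p23 * R[1] * R[2] * R[3] * R[5] / (1.0 - p24 * R[2] * R[4])


def sensitivities(p24, R, h=1e-6):
    """Central-difference estimate of dR/dR_i for every component i."""
    result = {}
    for i in R:
        up, down = dict(R), dict(R)
        up[i] += h
        down[i] -= h
        result[i] = (composite_reliability(p24, up) - composite_reliability(p24, down)) / (2 * h)
    return result


R = {i: 0.98 for i in range(1, 6)}  # hypothetical: all components equally reliable
for p24 in (0.8, 0.2):
    ranked = sorted(sensitivities(p24, R).items(), key=lambda kv: kv[1], reverse=True)
    print(p24, [(i, round(s, 3)) for i, s in ranked])
```

With these values, components 2 and 4 rank highest when $p_{24} = 0.8$, while component 4 ranks lowest when $p_{24} = 0.2$, in line with the usage argument made for Example 1.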

66
Considering Multiple Software Quality Attributes
  • Architecture based models are mainly focused on reliability
  • Performance as a software quality attribute characterizes the timeliness of the service delivered
    • Terminating application
      • expected execution time of the application
    • Continuously running application
      • expected time of one cycle

67
Considering Multiple Software Quality Attributes - Contd
  • This tutorial presents an overview from the perspective of the software reliability engineering community
  • Software performance engineering perspective in [Smith, Williams, 1993]
  • A unifying approach for reasoning about multiple software quality attributes needs to be developed
    • first step in that direction - Architecture Tradeoff Analysis Method (Software Engineering Institute)

68
Considering Different Architectural Styles
  • Today's software applications are far more complex, and frequently run
    • on two or more processors
    • under different operating systems
    • on geographically distributed machines
  • Architectural style is determined by
    • set of component types
      • clients, servers, filters, databases, objects, etc.
    • topological layout of these components indicating their interrelationships
    • set of interaction mechanisms
      • as simple as procedure calls, pipes and event broadcast, or much more complex, such as client-server protocols, database accessing protocols, etc.

69
Considering Different Architectural Styles - Contd
  • In today's network-centric world most software applications run in a distributed environment
  • Assumptions like sequential execution of components and instantaneous transfer of control are not applicable
  • Additional challenges
    • race conditions
    • deadlocks
    • communication errors
    • node failures
    • failures associated with deadline violations due to communication overheads
    • etc.

70
Conclusion
  • Architectural decisions are made early in the life cycle; they are the hardest to change, most critical and far-reaching
  • State of the research and practice of the architecture based approach to software reliability assessment
    • common requirements and classification
    • model elaboration
    • usefulness and limitations
    • key challenges for applicability
  • Standardized architectural styles have to be developed, along with the methods for their qualitative and quantitative assessment