Dependability - PowerPoint PPT Presentation

Title: Dependability


1
Dependability & Maintainability Theory and Methods
5. Markov Models
  • Andrea Bobbio
  • Dipartimento di Informatica
  • Università del Piemonte Orientale, A. Avogadro
  • 15100 Alessandria (Italy)
  • bobbio_at_unipmn.it - http://www.mfn.unipmn.it/bobbio/IFOA

IFOA, Reggio Emilia, June 17-18, 2003
2
State-Space-Based Models
  • States and labeled state transitions
  • State can keep track of:
      • Number of functioning resources of each type
      • States of recovery for each failed resource
      • Number of tasks of each type waiting at each
        resource
      • Allocation of resources to tasks
  • A transition:
      • Can occur from any state to any other state
      • Can represent a simple or a compound event

3
State-Space-Based Models (Continued)
  • Transitions between states represent the change
    of the system state due to the occurrence of an
    event
  • Drawn as a directed graph
  • Transition label:
      • Probability: homogeneous discrete-time Markov
        chain (DTMC)
      • Rate: homogeneous continuous-time Markov chain
        (CTMC)
      • Time-dependent rate: non-homogeneous CTMC
      • Distribution function: semi-Markov process (SMP)

4
Modeler's Options
  • Should I Use Markov Models?
  • State-Space-Based Methods
  • Model Dependencies
  • Model Fault-Tolerance and Recovery/Repair
  • Model Contention for Resources
  • Model Concurrency and Timeliness
  • Generalize to Markov Reward Models for Modeling
    Degradable Performance

5
Modeler's Options
  • Should I Use Markov Models?
  • Generalize to Markov Regenerative Models for
    Allowing Generally Distributed Event Times
  • Generalize to Non-Homogeneous Markov Chains for
    Allowing Weibull Failure Distributions
  • Performance, Availability and Performability
    Modeling Possible
  • Drawback: large (exponential) state space

6
In order to fulfil our goals
  • Modeling Performance, Availability and
    Performability
  • Modeling Complex Systems
  • We Need
  • Automatic Generation and Solution of Large Markov
    Reward Models

7
Model-based evaluation
  • Choice of the model type is dictated by
  • Measures of interest
  • Level of detailed system behavior to be
    represented
  • Ease of model specification and solution
  • Representation power of the model type
  • Access to suitable tools or toolkits

8
State space models
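[Figure: a transition from state s to state s' caused by component x_i]
A transition represents the change of state of a
single component.
$Z(t)$ is the stochastic process; $\Pr\{Z(t) = s\}$ is
the probability of finding $Z(t)$ in state $s$ at
time $t$.
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t + \Delta t) = s' \mid Z(t) = s\}$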
9
State space models
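[Figure: a transition from state s to state s' caused by component x_i]
If $s \to s'$ represents a failure event:
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t + \Delta t) = s' \mid Z(t) = s\} = \lambda_i \Delta t$
If $s \to s'$ represents a repair event:
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t + \Delta t) = s' \mid Z(t) = s\} = \mu_i \Delta t$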
10
Markov Process definition
11
Transition Probability Matrix
initial
12
State Probability Vector
13
Chapman-Kolmogorov Equations
14
Time-homogeneous CTMC
15
Time-homogeneous CTMC
16
The transition rate matrix
17
C-K Equations for CTMC
18
Solution equations
19
Transient analysis
Given the initial state of the Markov chain, the
system of differential equations is written for
each state as: rate of buildup = rate of flow in -
rate of flow out (continuity equation).
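As a minimal sketch of the continuity equation, the block below integrates dP/dt = P(t) Q for a single repairable component with two states (up, down). The rates `lam` and `mu`, the function name `transient`, and the Runge-Kutta integrator are assumptions made for illustration; they are not taken from the slides. The result is compared with the well-known closed form for the two-state model.

```python
import math

def transient(Q, p0, t_end, steps=10000):
    """Integrate dP/dt = P(t) Q with the classical 4th-order Runge-Kutta method."""
    n = len(p0)

    def deriv(p):
        # Entry j is (rate of flow into state j) - (rate of flow out of state j),
        # because the diagonal entry Q[j][j] is minus the total outgoing rate.
        return [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]

    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([p[i] + 0.5 * h * k1[i] for i in range(n)])
        k3 = deriv([p[i] + 0.5 * h * k2[i] for i in range(n)])
        k4 = deriv([p[i] + h * k3[i] for i in range(n)])
        p = [p[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(n)]
    return p

# Hypothetical rates (per hour): failure rate lam, repair rate mu.
lam, mu = 0.01, 1.0
Q = [[-lam, lam],   # state 0 = up
     [mu, -mu]]     # state 1 = down
t_end = 5.0
p = transient(Q, [1.0, 0.0], t_end)  # start in the up state

# Closed-form instantaneous availability of the two-state model.
closed_form = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t_end)
print(p[0], closed_form)
```

For larger models a stiff ODE solver or uniformization is usually preferred; the fixed-step integrator here is only meant to make the flow-in minus flow-out structure visible.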
20
Steady-state condition
If the process reaches a steady-state condition,
then the state probabilities no longer change
with time, so the rate of buildup is zero in
every state.
21
Steady-state analysis (balance equation)
The steady-state equation can be written as a
flow balance equation with a normalization
condition on the state probabilities. Since (rate of
buildup) = rate of flow in - rate of flow out = 0,
we have rate of flow in = rate of flow out for each
state (balance equation).
22
State Classification
23
2-component system
24
2-component system
25
2-component system
26
2-component series system
2-component parallel system
27
2-component stand-by system
28
Markov Models Repairable systems - Availability
29
Repairable system Availability
30
Repairable system 2 identical components
31
Repairable system 2 identical components
32
2-component Markov availability model
  • Assume we have a two-component parallel redundant
    system with repair rate $\mu$.
  • Assume that the failure rate of both the
    components is $\lambda$.
  • When both the components have failed, the system
    is considered to have failed.

33
Markov availability model
  • Let the number of properly functioning components
    be the state of the system.
  • The state space is {0, 1, 2}, where 0 is the system
    down state.
  • We wish to examine effects of shared vs.
    non-shared repair.

34
Markov availability model
[Figure: two state diagrams over the states 2, 1, 0, one for non-shared (independent) repair and one for shared repair]
35
Markov availability model
  • Note: the non-shared case can be modeled and solved
    using an RBD or a fault tree (FTREE), but the shared
    case requires the use of Markov chains.

36
Steady-state balance equations
  • For any state:
  • Rate of flow in = rate of flow out
  • Considering the shared case
  • $\pi_i$: steady-state probability that the system is in
    state $i$

37
Steady-state balance equations
  • Hence
  • Since
  • We have
  • Or
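The equations on this slide did not survive in the transcript. A plausible reconstruction of the shared-repair derivation is sketched below. It assumes the usual rates for this model, which are not stated in the text: $2 \to 1$ at rate $2\lambda$, $1 \to 0$ at rate $\lambda$, and a single repair facility giving $1 \to 2$ and $0 \to 1$ at rate $\mu$.

```latex
% Balance equations for states 2 and 0 (shared repair, assumed rates)
\begin{align*}
2\lambda\,\pi_2 &= \mu\,\pi_1, &
\lambda\,\pi_1 &= \mu\,\pi_0 .
\intertext{Hence}
\pi_1 &= \frac{2\lambda}{\mu}\,\pi_2, &
\pi_0 &= \frac{\lambda}{\mu}\,\pi_1 = \frac{2\lambda^2}{\mu^2}\,\pi_2 .
\intertext{Since $\pi_0 + \pi_1 + \pi_2 = 1$, we have}
\pi_2\left(1 + \frac{2\lambda}{\mu} + \frac{2\lambda^2}{\mu^2}\right) &= 1,
\intertext{or}
\pi_2 &= \frac{1}{1 + \dfrac{2\lambda}{\mu} + \dfrac{2\lambda^2}{\mu^2}}, &
\pi_0 &= \frac{2\lambda^2}{\mu^2 + 2\lambda\mu + 2\lambda^2} .
\end{align*}
```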

38
Steady-state balance equations (Continued)
  • Steady-state unavailability for the shared case:
    $\pi_0 = 1 - A_{shared}$
  • Similarly, for the non-shared case,
    steady-state unavailability $= 1 - A_{non-shared}$
  • Downtime in minutes per year $= (1 - A) \times 8760 \times 60$
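The downtime formula can be tried out with a short sketch. The closed forms below follow from the balance equations under assumed transition rates (failures $2 \to 1$ at $2\lambda$ and $1 \to 0$ at $\lambda$; repair $1 \to 2$ at $\mu$; repair $0 \to 1$ at $\mu$ when shared and at $2\mu$ when non-shared). The rates, the numeric values, and the function names are illustrative assumptions, not values from the slides.

```python
MINUTES_PER_YEAR = 8760 * 60  # hours per year times minutes per hour

def unavailability_shared(lam, mu):
    """pi_0 for the assumed shared-repair chain (0 -> 1 at rate mu)."""
    return 2 * lam**2 / (mu**2 + 2 * lam * mu + 2 * lam**2)

def unavailability_non_shared(lam, mu):
    """pi_0 for the assumed non-shared chain (0 -> 1 at rate 2*mu).

    With independent repair each component is down with probability
    lam / (lam + mu), and the system is down only when both are down.
    """
    return (lam / (lam + mu)) ** 2

def downtime_minutes_per_year(unavailability):
    """Downtime in minutes per year = (1 - A) * 8760 * 60."""
    return unavailability * MINUTES_PER_YEAR

# Hypothetical rates per hour.
lam, mu = 1e-4, 1.0
for name, u in (("shared", unavailability_shared(lam, mu)),
                ("non-shared", unavailability_non_shared(lam, mu))):
    print(name, u, downtime_minutes_per_year(u))
```

When $\lambda \ll \mu$, the shared case has roughly twice the unavailability of the non-shared case, because the last repair proceeds at rate $\mu$ instead of $2\mu$.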

39
Steady-state balance equations
40
Absorbing states MTTF
41
Absorbing states - MTTF
42
Markov Reliability Model with Imperfect Coverage
43
Markov model with imperfect coverage
  • Next consider a modification of the 2-component
    parallel system proposed by Arnold as a model of
    duplex processors of an electronic switching
    system.
  • We assume that not all faults are recoverable and
    that c is the coverage factor which denotes the
    conditional probability that the system recovers
    given that a fault has occurred.
  • The state diagram is now given by the following
    picture

44
Now allow for Imperfect coverage
[Figure: state diagram of the 2-component model with coverage factor c]
45
Markov model with imperfect coverage
  • Assume that the initial state is 2, so that
    $P_2(0) = 1$ and $P_1(0) = P_0(0) = 0$
  • Then the system of differential equations is

46
Markov model with imperfect coverage
  • After solving the differential equations we
    obtain
  • $R(t) = P_2(t) + P_1(t)$
  • From R(t), we can obtain system MTTF
  • It should be clear that the system MTTF and
    system reliability are critically dependent on
    the coverage factor.
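To illustrate how strongly MTTF depends on the coverage factor, here is a minimal sketch. The state diagram is not preserved in the transcript, so the model is an assumption: from state 2 a covered fault leads to state 1 at rate $2\lambda c$, an uncovered fault leads directly to failure at rate $2\lambda(1-c)$, state 1 fails at rate $\lambda$ and is repaired back to state 2 at rate $\mu$. The function name and the numeric rates are hypothetical.

```python
def mttf_imperfect_coverage(lam, mu, c):
    """MTTF of the assumed duplex model, starting in state 2.

    The transient states are 2 and 1, with generator restricted to them
        Q_B = [[-2*lam,     2*lam*c    ],
               [ mu,       -(lam + mu) ]].
    Solving tau * Q_B = -[1, 0] gives the mean time spent in each
    transient state before absorption; MTTF is their sum.
    """
    det = 2 * lam * (lam + mu) - 2 * lam * c * mu  # determinant of Q_B
    tau2 = (lam + mu) / det      # mean time spent in state 2
    tau1 = 2 * lam * c / det     # mean time spent in state 1
    return tau2 + tau1

# Hypothetical rates per hour; sweep the coverage factor.
lam, mu = 1e-3, 1.0
for c in (0.9, 0.99, 0.999, 1.0):
    print(c, mttf_imperfect_coverage(lam, mu, c))
```

With these assumed rates, a small drop in c from 1 produces a large drop in MTTF, since uncovered faults bypass the repair loop entirely.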

47
Source of fault coverage data
  • Measurement data from an operational system
  • Large amount of data needed
  • Improved instrumentation needed
  • Fault-injection experiments
  • Expensive but badly needed
  • Tools from CMU, Illinois, LAAS (Toulouse)
  • A fault/error handling submodel (FEHM)
  • Phases: detection, location, retry, reconfig,
    reboot
  • Estimate duration and probability of success of
    each phase

48
Redundant System with Finite Detection Switchover
Time
  • Modify the Markov model with imperfect coverage
    to allow for finite time to detect as well as
    imperfect detection.
  • You will need to add an extra state, say D.
  • The rate at which detection occurs is $\delta$.
  • Draw the state diagram and investigate the
    effects of detection delay on system reliability
    and mean time to failure.

49
Redundant System with Finite Detection Switchover
Time
  • Assumptions
  • Two units have the same MTTF and MTTR
  • Single shared repair person
  • Average detection/switchover time $t_{sw} = 1/\delta$
  • We need to use a Markov model.

50
Redundant System with Finite Detection Switchover
Time
[Figure: state diagram with states 2, 1D, 1, 0]
51
Redundant System with Finite Detection Switchover
Time
  • After solving the Markov model, we obtain
    steady-state probabilities

52
Closed-form
53
  • WFS Example

54
A Workstations-Fileserver Example
  • Computing system consisting of:
      • A file-server
      • Two workstations
      • Computing network connecting them
  • System is operational as long as at least one of the
    workstations and the file-server are operational
  • Computer network is assumed to be fault-free

55
The WFS Example
56
Markov Chain for WFS Example
  • Assuming exponentially distributed times to
    failure:
      • $\lambda_w$: failure rate of workstation
      • $\lambda_f$: failure rate of file-server
  • Assume that components are repairable:
      • $\mu_w$: repair rate of workstation
      • $\mu_f$: repair rate of file-server
  • File-server has priority for repair over
    workstations (such repair priority cannot be
    captured by non-state-space models)

57
Markov Availability Model for WFS
Since every state is reachable from every other
state, the CTMC is irreducible. Furthermore, all
states are positive recurrent.
58
Markov Availability Model for WFS (Continued)
  • In the figure, the label (i,j) of each state
    is interpreted as follows
  • i represents the number of workstations that are
    still functioning
  • j is 1 or 0 depending on whether the file-server
    is up or down respectively.

59
Markov Availability Model for WFS (Continued)
  • For the example problem, with the states ordered
    as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0) the Q
    matrix is given by

Q
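The matrix itself is not preserved in the transcript. The sketch below is derived from the description of the model rather than copied from the slide. It assumes a single repair facility (so a workstation is repaired only while the file-server is up) and that workstations can still fail while the file-server is down.

```latex
% Rows and columns ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0)
\[
Q =
\begin{pmatrix}
-(\lambda_f + 2\lambda_w) & \lambda_f & 2\lambda_w & 0 & 0 & 0 \\
\mu_f & -(\mu_f + 2\lambda_w) & 0 & 2\lambda_w & 0 & 0 \\
\mu_w & 0 & -(\mu_w + \lambda_f + \lambda_w) & \lambda_f & \lambda_w & 0 \\
0 & 0 & \mu_f & -(\mu_f + \lambda_w) & 0 & \lambda_w \\
0 & 0 & \mu_w & 0 & -(\mu_w + \lambda_f) & \lambda_f \\
0 & 0 & 0 & 0 & \mu_f & -\mu_f
\end{pmatrix}
\]
```

Each row sums to zero, as required of a CTMC generator matrix.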
60
Markov Model (steady-state)
$\pi$: steady-state probability vector, satisfying
$\pi Q = 0$ with $\sum_{i,j} \pi_{i,j} = 1$. These are
called the steady-state balance equations: rate of
flow in = rate of flow out. After solving for $\pi$,
we obtain the steady-state availability.
61
Markov Availability Model
  • We compute the availability of the system
  • System is available as long as it is in state
    (2,1) or (1,1)
  • Instantaneous availability of the system:
    $A(t) = P_{2,1}(t) + P_{1,1}(t)$
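A minimal sketch of the steady-state version of this computation follows. It builds the generator from the model description, solves the balance equations with a normalization condition, and sums the probabilities of the up states (2,1) and (1,1). The numeric rates, the single-repair-facility reading of the priority rule, and all function names are assumptions made for illustration; the transcript gives no numeric values.

```python
# Assumed rates per hour; these values are not given in the transcript.
LAM_W, LAM_F = 1e-4, 5e-5   # workstation / file-server failure rates
MU_W, MU_F = 1.0, 0.5       # workstation / file-server repair rates

STATES = [(2, 1), (2, 0), (1, 1), (1, 0), (0, 1), (0, 0)]

def build_generator():
    """Generator matrix Q for state (w, f): w workstations up, f = 1 if server up."""
    n = len(STATES)
    idx = {s: k for k, s in enumerate(STATES)}
    Q = [[0.0] * n for _ in range(n)]
    for (w, f) in STATES:
        moves = []
        if w > 0:
            moves.append(((w - 1, f), w * LAM_W))  # one of w workstations fails
        if f == 1:
            moves.append(((w, 0), LAM_F))          # file-server fails
        if f == 0:
            moves.append(((w, 1), MU_F))           # file-server is repaired first
        elif w < 2:
            moves.append(((w + 1, f), MU_W))       # workstation repair (server is up)
        for target, rate in moves:
            Q[idx[(w, f)]][idx[target]] += rate
        Q[idx[(w, f)]][idx[(w, f)]] = -sum(rate for _, rate in moves)
    return Q

def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by Gaussian elimination."""
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]  # transpose of Q
    b = [0.0] * n
    A[n - 1] = [1.0] * n   # replace one redundant balance equation
    b[n - 1] = 1.0         # by the normalization condition
    for col in range(n):   # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(A[r][k] * pi[k] for k in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    return pi

Q = build_generator()
pi = steady_state(Q)
availability = pi[STATES.index((2, 1))] + pi[STATES.index((1, 1))]
print(availability)
```

The instantaneous availability $A(t)$ would instead require the transient probabilities, for example by numerically integrating dP/dt = P(t) Q from the initial state (2,1).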

62
Markov Availability Model (Continued)
63
Markov Reliability Model with Repair
  • Assume that the computer system does not recover
    if both workstations fail, or if the file-server
    fails

64
Markov Reliability Model with Repair
States (0,1), (1,0) and (2,0) become absorbing
states, while (2,1) and (1,1) are transient
states. Note: we have made the simplification that,
once the CTMC reaches a system failure state, we
do not allow any more transitions.
65
Markov Model with Absorbing States
  • If we solve for $P_{2,1}(t)$ and $P_{1,1}(t)$, then
  • $R(t) = P_{2,1}(t) + P_{1,1}(t)$
  • For a Markov chain with absorbing states:
      • $A$: the set of absorbing states
      • $B = \Omega - A$: the set of remaining states
      • $z_{i,j}$: mean time spent in state $(i,j)$ until
        absorption

66
Markov Model with Absorbing States (Continued)
$Q_B$: derived from $Q$ by restricting it to only the
states in $B$
Mean time to absorption (MTTA) is given as
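The formula is missing from the transcript. The standard result for a CTMC with absorbing states, written with the symbols of the preceding slide, is sketched below. The notation $P_B(0)$ for the initial probability vector restricted to the states in $B$ is an assumption introduced here.

```latex
% z is the row vector of mean times z_{i,j} spent in each transient state
\[
z\,Q_B = -P_B(0),
\qquad
\mathrm{MTTA} = \sum_{(i,j) \in B} z_{i,j} .
\]
```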
67
Markov Reliability Model with Repair (Continued)
68
Markov Reliability Model with Repair (Continued)
  • Mean time to failure is 19992 hours.

69
Markov Reliability Model without Repair
  • Assume that neither workstations nor file-server
    is repairable

70
Markov Reliability Model without Repair
(Continued)
States (0,1), (1,0) and (2,0) become absorbing
states
71
Markov Reliability Model without Repair
(Continued)

  • Mean time to failure is 9333 hours.
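The two mean-time-to-failure figures can be checked with a short sketch of the mean-time-to-absorption computation over the transient states (2,1) and (1,1). The numeric rates below are assumptions, since the transcript does not state them; with these particular values the results round to the 19992 and 9333 hours quoted on the slides. The function name is hypothetical.

```python
def mttf_wfs(lam_w, lam_f, mu_w):
    """Mean time to absorption starting in state (2,1).

    Transient states B = [(2,1), (1,1)], with
        Q_B = [[-(2*lam_w + lam_f),   2*lam_w                ],
               [  mu_w,              -(lam_w + lam_f + mu_w) ]].
    Solving z * Q_B = -[1, 0] by Cramer's rule gives the mean time
    spent in each transient state; their sum is the MTTF.
    """
    q11, q12 = -(2 * lam_w + lam_f), 2 * lam_w
    q21, q22 = mu_w, -(lam_w + lam_f + mu_w)
    det = q11 * q22 - q12 * q21
    z_21 = -q22 / det   # mean time spent in state (2,1)
    z_11 = q12 / det    # mean time spent in state (1,1)
    return z_21 + z_11

# Assumed rates per hour (not given in the transcript).
LAM_W, LAM_F, MU_W = 1e-4, 5e-5, 1.0

mttf_with_repair = mttf_wfs(LAM_W, LAM_F, MU_W)
mttf_without_repair = mttf_wfs(LAM_W, LAM_F, 0.0)  # no workstation repair
print(round(mttf_with_repair), round(mttf_without_repair))
```

The file-server repair rate does not appear because every state with the file-server down is absorbing in both reliability models.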