Title: Dependability Theory and Methods 5. Markov Models
- Andrea Bobbio
- Dipartimento di Informatica
- Università del Piemonte Orientale, A. Avogadro
- 15100 Alessandria (Italy)
- bobbio_at_unipmn.it - http://www.mfn.unipmn.it/bobbio
Bertinoro, March 10-14, 2003
2. State-Space-Based Models
- States and labeled state transitions
- State can keep track of
- Number of functioning resources of each type
- States of recovery for each failed resource
- Number of tasks of each type waiting at each resource
- Allocation of resources to tasks
- A transition
- Can occur from any state to any other state
- Can represent a simple or a compound event
3. State-Space-Based Models (Continued)
- Transitions between states represent the change of the system state due to the occurrence of an event
- Drawn as a directed graph
- Transition label:
- Probability: homogeneous discrete-time Markov chain (DTMC)
- Rate: homogeneous continuous-time Markov chain (CTMC)
- Time-dependent rate: non-homogeneous CTMC
- Distribution function: semi-Markov process (SMP)
4. Modeler's Options
- Should I Use Markov Models?
- State-Space-Based Methods
- Model Dependencies
- Model Fault-Tolerance and Recovery/Repair
- Model Contention for Resources
- Model Concurrency and Timeliness
- Generalize to Markov Reward Models for Modeling Degradable Performance
5. Modeler's Options
- Should I Use Markov Models?
- Generalize to Markov Regenerative Models for Allowing Generally Distributed Event Times
- Generalize to Non-Homogeneous Markov Chains for Allowing Weibull Failure Distributions
- Performance, Availability and Performability Modeling Possible
- Large (Exponential) State Space
6. In order to fulfill our goals
- Modeling Performance, Availability and Performability
- Modeling Complex Systems
- We Need:
- Automatic Generation and Solution of Large Markov Reward Models
7. Model-based evaluation
- Choice of the model type is dictated by
- Measures of interest
- Level of detail of the system behavior to be represented
- Ease of model specification and solution
- Representation power of the model type
- Access to suitable tools or toolkits
8. State space models
[State diagram: two states s and s′ joined by a transition labeled x_i]
A transition represents the change of state of a single component.
Z(t) is the stochastic process; Pr{Z(t) = s} is the probability of finding Z(t) in state s at time t.
Pr{s → s′, Δt} = Pr{Z(t + Δt) = s′ | Z(t) = s}
9. State space models
[State diagram: two states s and s′ joined by a transition labeled x_i]
If s → s′ represents a failure event:
Pr{s → s′, Δt} = Pr{Z(t + Δt) = s′ | Z(t) = s} ≈ λ_i Δt
If s → s′ represents a repair event:
Pr{s → s′, Δt} = Pr{Z(t + Δt) = s′ | Z(t) = s} ≈ μ_i Δt
10. Markov Process definition
11. Transition Probability Matrix
12. State Probability Vector
13. Chapman-Kolmogorov Equations
14. Time-homogeneous CTMC
15. Time-homogeneous CTMC
16. The transition rate matrix
17. C-K Equations for CTMC
18. Solution equations
19. Transient analysis
Given the initial state of the Markov chain, the system of differential equations is written based on: rate of buildup = rate of flow in - rate of flow out, for each state (continuity equation).
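As a sketch of this transient solution, the system dP(t)/dt = P(t)Q can be integrated numerically. The uniformization (Jensen) series below does this in pure NumPy for a one-component repairable model; the rates λ = 10⁻³/h and μ = 10⁻¹/h are illustrative assumptions, not values from the slides.

```python
import numpy as np

lam, mu = 1e-3, 1e-1          # failure and repair rates (1/hour), assumed values
Q = np.array([[-lam, lam],
              [mu, -mu]])      # states: 0 = up, 1 = down

def transient(Q, p0, t, eps=1e-12):
    """Return p0 * exp(Q t) via the uniformization (Jensen) series."""
    q = max(-Q.diagonal()) * 1.1     # uniformization rate, above the max exit rate
    P = np.eye(len(Q)) + Q / q       # embedded DTMC kernel
    term = np.array(p0, dtype=float)
    out = np.zeros_like(term)
    w = np.exp(-q * t)               # Poisson(q t) weight for k = 0
    k = 0
    while w > eps or k < q * t:
        out += w * term              # accumulate Poisson-weighted DTMC steps
        k += 1
        term = term @ P
        w *= q * t / k
    return out

p = transient(Q, [1.0, 0.0], t=200.0)
print(p)   # for large t this approaches the steady state (mu/(lam+mu), lam/(lam+mu))
```

Uniformization avoids the stiffness issues of naive ODE integration because every term in the series is a proper probability vector.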
20. Steady-state condition
If the process reaches a steady state condition,
then
21. Steady-state analysis (balance equations)
The steady-state equations can be written as flow balance equations with a normalization condition on the state probabilities:
(rate of buildup) = rate of flow in - rate of flow out = 0
rate of flow in = rate of flow out, for each state (balance equation).
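The balance equations plus the normalization condition form a linear system, solved below for a small birth-death chain (states = number of working units, single repairer). The rates are illustrative assumptions.

```python
import numpy as np

lam, mu = 1e-3, 1e-1   # failure and repair rates (1/hour), assumed values
# Birth-death chain on states 2, 1, 0 (number of working units), one repairer.
Q = np.array([[-2 * lam, 2 * lam, 0.0],
              [mu, -(lam + mu), lam],
              [0.0, mu, -mu]])

A = Q.T.copy()
A[-1, :] = 1.0          # replace one (redundant) balance equation by sum(pi) = 1
b = np.zeros(3); b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)               # steady-state probabilities of states 2, 1, 0
```

Replacing one row is needed because the balance equations πQ = 0 are linearly dependent (the rows of Q sum to zero), so the system is singular without the normalization.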
22. 2-component system
23. 2-component system
24. 2-component system
25. 2-component series system
2-component parallel system
26. 2-component stand-by system
27. Repairable system: Availability
28. Repairable system: 2 identical components
29. Repairable system: 2 identical components
30. 2-component Markov availability model
- Assume we have a two-component parallel redundant system with repair rate μ.
- Assume that the failure rate of both components is λ.
- When both components have failed, the system is considered to have failed.
31. Markov availability model
- Let the number of properly functioning components be the state of the system.
- The state space is {0, 1, 2}, where 0 is the system down state.
- We wish to examine the effects of shared vs. non-shared repair.
32. Markov availability model
[State diagram: states 2 → 1 → 0, non-shared (independent) repair]
[State diagram: states 2 → 1 → 0, shared repair]
33. Markov availability model
- Note: the non-shared case can be modeled and solved using an RBD or a FTREE, but the shared case needs the use of Markov chains.
34. Steady-state balance equations
- For any state:
- Rate of flow in = Rate of flow out
- Considering the shared case:
- π_i = steady-state probability that the system is in state i
35. Steady-state balance equations
36. Steady-state balance equations (Continued)
- Steady-state unavailability:
- For the shared case: π_0 = 1 - A_shared
- Similarly, for the non-shared case: steady-state unavailability = 1 - A_non-shared
- Downtime in minutes per year = (1 - A) × 8760 × 60
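The shared vs. non-shared comparison can be sketched numerically. The only difference between the two chains is the repair rate out of state 0 (μ when a single repairer is shared, 2μ when each component has its own). The rates below are illustrative assumptions.

```python
import numpy as np

lam, mu = 1e-3, 1e-1   # per-component failure and repair rates (assumed values)

def unavailability(repair_rate_from_0):
    """pi_0 for the chain 2 -> 1 -> 0 with the given repair rate out of state 0."""
    Q = np.array([[-2 * lam, 2 * lam, 0.0],
                  [mu, -(lam + mu), lam],
                  [0.0, repair_rate_from_0, -repair_rate_from_0]])
    A = Q.T.copy(); A[-1, :] = 1.0       # normalization replaces one equation
    b = np.zeros(3); b[-1] = 1.0
    return np.linalg.solve(A, b)[2]      # pi_0: both components down

for label, r in [("shared", mu), ("non-shared", 2 * mu)]:
    U = unavailability(r)
    print(f"{label}: unavailability {U:.3e}, downtime {U * 8760 * 60:.1f} min/year")
```

With these rates the shared-repair unavailability is roughly twice the non-shared one, which is the effect the slides set out to demonstrate.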
37. Steady-state balance equations
38. Absorbing states: MTTF
39. Absorbing states: MTTF
40. Markov Reliability Model with Imperfect Coverage
41. Markov model with imperfect coverage
- Next consider a modification of the 2-component parallel system, proposed by Arnold as a model of the duplex processors of an electronic switching system.
- We assume that not all faults are recoverable and that c is the coverage factor, which denotes the conditional probability that the system recovers given that a fault has occurred.
- The state diagram is now given by the following picture.
42. Now allow for Imperfect coverage
[State diagram with coverage factor c]
43. Markov model with imperfect coverage
- Assume that the initial state is 2, so that P2(0) = 1.
- Then the system of differential equations is:
44. Markov model with imperfect coverage
- After solving the differential equations we obtain:
- R(t) = P2(t) + P1(t)
- From R(t), we can obtain the system MTTF.
- It should be clear that the system MTTF and system reliability are critically dependent on the coverage factor.
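The dependence on c can be made concrete with the absorbing-state method: restrict the generator to the transient states and solve QB·x = -1 for the mean times to absorption. The rate λ and coverage c below are illustrative assumptions; setting μ = 0 gives the non-repairable case, for which the closed form 1/(2λ) + c/λ is known.

```python
import numpy as np

lam, c = 1e-3, 0.9   # failure rate per unit and coverage factor (assumed values)
mu = 0.0             # repair rate from the simplex state; 0 gives the no-repair case

# Transient states: 2 (duplex) and 1 (simplex). A fault in state 2 occurs at
# rate 2*lam and is covered with probability c; uncovered faults, and a fault
# in state 1, absorb into the failure state (omitted from QB).
QB = np.array([[-2 * lam, 2 * lam * c],
               [mu, -(lam + mu)]])
x = np.linalg.solve(QB, -np.ones(2))   # x[i] = MTTF starting from state i
print(f"MTTF from the duplex state: {x[0]:.1f} hours")
```

Rerunning with c = 1.0 versus c = 0.9 shows how strongly even a small coverage deficit cuts the MTTF.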
45. Sources of fault coverage data
- Measurement data from an operational system
- Large amount of data needed
- Improved instrumentation needed
- Fault-injection experiments
- Expensive but badly needed
- Tools from CMU, Illinois, LAAS (Toulouse)
- A fault/error handling submodel (FEHM)
- Phases: detection, location, retry, reconfiguration, reboot
- Estimate duration and probability of success of each phase
46. Redundant System with Finite Detection/Switchover Time
- Modify the Markov model with imperfect coverage to allow for a finite time to detect as well as imperfect detection.
- You will need to add an extra state, say D.
- The rate at which detection occurs is δ.
- Draw the state diagram and investigate the effects of detection delay on system reliability and mean time to failure.
47. Redundant System with Finite Detection/Switchover Time
- Assumptions:
- Two units have the same MTTF and MTTR
- Single shared repair person
- Average detection/switchover time t_sw = 1/δ
- We need to use a Markov model.
48. Redundant System with Finite Detection/Switchover Time
[State diagram: states 2, 1D, 1, 0]
49. Redundant System with Finite Detection/Switchover Time
- After solving the Markov model, we obtain the steady-state probabilities.
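Since drawing the diagram is left as an exercise, the transition structure below is one plausible encoding of it, not the slides' definitive answer: state 1D is "one unit failed but not yet detected", and only after detection (rate δ) does repair begin. All numeric rates are assumptions.

```python
import numpy as np

# States: 2 (both up), 1D (one unit failed, failure not yet detected),
# 1 (failure detected, repair under way), 0 (both units down).
lam, mu, delta = 1e-3, 1e-1, 60.0   # failure, repair, detection rates (assumed)

Q = np.array([
    # to:      2              1D        1       0
    [-2 * lam,        2 * lam,        0.0,    0.0],   # 2: a unit fails
    [0.0,    -(delta + lam),        delta,    lam],   # 1D: detect, or 2nd failure
    [mu,              0.0,   -(mu + lam),     lam],   # 1: repair, or 2nd failure
    [0.0,             0.0,            mu,     -mu],   # 0: shared repairer restores a unit
])
A = Q.T.copy(); A[-1, :] = 1.0      # normalization replaces one balance equation
b = np.zeros(4); b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(dict(zip(["2", "1D", "1", "0"], np.round(pi, 8))))
```

Varying delta (i.e., the mean switchover time 1/δ) and re-solving shows the detection-delay effect the exercise asks about.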
50. Closed-form
52. A Workstations-Fileserver Example
- Computing system consisting of
- A file-server
- Two workstations
- Computing network connecting them
- System operational as long as:
- One of the workstations, and
- The file-server, are operational
- The computer network is assumed to be fault-free
53. The WFS Example
54. Markov Chain for WFS Example
- Assuming exponentially distributed times to failure:
- λ_w = failure rate of a workstation
- λ_f = failure rate of the file-server
- Assume that components are repairable:
- μ_w = repair rate of a workstation
- μ_f = repair rate of the file-server
- The file-server has priority for repair over workstations (such repair priority cannot be captured by non-state-space models)
55. Markov Availability Model for WFS
Since all states are reachable from every other state, the CTMC is irreducible. Furthermore, all states are positive recurrent.
56. Markov Availability Model for WFS (Continued)
- In the figure, the label (i, j) of each state is interpreted as follows:
- i represents the number of workstations that are still functioning
- j is 1 or 0 depending on whether the file-server is up or down, respectively.
57. Markov Availability Model for WFS (Continued)
- For the example problem, with the states ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0), the generator matrix Q is given by:
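The Q matrix itself did not survive extraction, but it can be rebuilt from the transition rules stated on the surrounding slides (workstation failures at i·λ_w, file-server failure at λ_f, and file-server repair priority, so no workstation repair while j = 0). The numeric rates below are assumed values for illustration.

```python
import numpy as np

lw, lf = 1e-4, 5e-5     # lambda_w, lambda_f (1/hour), assumed values
mw, mf = 1.0, 1.0       # mu_w, mu_f (1/hour), assumed values

S = [(2, 1), (2, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
idx = {s: k for k, s in enumerate(S)}
Q = np.zeros((6, 6))
for i, j in S:
    if j == 1 and i > 0:
        Q[idx[(i, 1)], idx[(i - 1, 1)]] = i * lw    # a workstation fails
    if j == 1:
        Q[idx[(i, 1)], idx[(i, 0)]] = lf            # the file-server fails
    if j == 1 and i < 2:
        Q[idx[(i, 1)], idx[(i + 1, 1)]] = mw        # workstation repair
    if j == 0:
        Q[idx[(i, 0)], idx[(i, 1)]] = mf            # file-server repair (priority)
    if j == 0 and i > 0:
        Q[idx[(i, 0)], idx[(i - 1, 0)]] = i * lw    # workstations keep failing
np.fill_diagonal(Q, -Q.sum(axis=1))                 # diagonal = minus row sum

# Steady-state availability: system up in states (2,1) and (1,1).
M = Q.T.copy()
M[-1, :] = 1.0                      # normalization replaces one balance equation
b = np.zeros(6); b[-1] = 1.0
pi = np.linalg.solve(M, b)
avail = pi[idx[(2, 1)]] + pi[idx[(1, 1)]]
print("steady-state availability:", avail)
```

Generating Q programmatically from the state labels, rather than typing the 6×6 matrix by hand, is exactly the "automatic generation" approach the earlier slides advocate for larger models.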
58. Markov Model (steady-state)
- π = steady-state probability vector
- πQ = 0 with Σ_i π_i = 1: these are called the steady-state balance equations (rate of flow in = rate of flow out)
- After solving for π, we obtain the steady-state availability
59. Markov Availability Model
- We compute the availability of the system:
- The system is available as long as it is in states (2,1) or (1,1).
- Instantaneous availability of the system:
60. Markov Availability Model (Continued)
61. Markov Reliability Model with Repair
- Assume that the computer system does not recover if both workstations fail, or if the file-server fails.
62. Markov Reliability Model with Repair
States (0,1), (1,0) and (2,0) become absorbing states, while (2,1) and (1,1) are transient states. Note: we have made the simplification that, once the CTMC reaches a system failure state, we do not allow any more transitions.
63. Markov Model with Absorbing States
- If we solve for P2,1(t) and P1,1(t), then
- R(t) = P2,1(t) + P1,1(t)
- For a Markov chain with absorbing states:
- A = the set of absorbing states
- B = the set of remaining (transient) states
- zi,j = mean time spent in state (i,j) until absorption
64. Markov Model with Absorbing States (Continued)
QB is derived from Q by restricting it to only the states in B.
The mean time to absorption, MTTA, is given as the sum of the zi,j over all states (i,j) in B.
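A sketch of the MTTA computation for this model: with B = {(2,1), (1,1)}, solve QB·z = -1 so that z[i] is the mean time to absorption starting from state i. The rates below are assumptions, chosen so that the result reproduces the 19992-hour MTTF quoted on a later slide.

```python
import numpy as np

lw, lf = 1e-4, 5e-5   # lambda_w, lambda_f (1/hour), assumed values
mw = 1.0              # mu_w (1/hour); the file-server failure states are absorbing

# QB: generator restricted to the transient states, ordered (2,1), (1,1).
# From (2,1): workstation failure at 2*lw (covered), absorption at lf.
# From (1,1): repair mw back to (2,1); absorption at lw + lf.
QB = np.array([[-(2 * lw + lf), 2 * lw],
               [mw, -(lw + lf + mw)]])
z = np.linalg.solve(QB, -np.ones(2))   # z[i] = mean time to absorption from state i
print(f"MTTA from (2,1): {z[0]:.0f} hours")
```

Note that solving QB·z = -1 directly avoids explicitly inverting QB, which is the numerically preferred way to evaluate -QB⁻¹·1.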
65. Markov Reliability Model with Repair (Continued)
66. Markov Reliability Model with Repair (Continued)
- Mean time to failure is 19992 hours.
67. Markov Reliability Model without Repair
- Assume that neither the workstations nor the file-server is repairable.
68. Markov Reliability Model without Repair (Continued)
States (0,1), (1,0) and (2,0) become absorbing states.
69. Markov Reliability Model without Repair (Continued)
- Mean time to failure is 9333 hours.
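Without repair the MTTF has a simple closed form: the chain spends 1/(2λ_w + λ_f) hours in (2,1) on average, then with probability 2λ_w/(2λ_w + λ_f) visits (1,1) for a further 1/(λ_w + λ_f) hours. The rates below are assumed values consistent with the 9333-hour figure quoted above.

```python
# Assumed rates: lambda_w = 1e-4/h, lambda_f = 5e-5/h.
lw, lf = 1e-4, 5e-5
mttf = 1 / (2 * lw + lf) + (2 * lw / (2 * lw + lf)) * (1 / (lw + lf))
print(round(mttf))    # 9333
```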