Title: Continuous Time Markov Chains
1Chapter 8
- Continuous Time Markov Chains
2Definition
- A discrete-state continuous-time stochastic process $\{X(t),\ t \ge 0\}$ is called a Markov chain if, for $t_0 < t_1 < t_2 < \dots < t_n < t$, the conditional pmf satisfies the relation
  $P[X(t)=x \mid X(t_n)=x_n, X(t_{n-1})=x_{n-1}, \dots, X(t_0)=x_0] = P[X(t)=x \mid X(t_n)=x_n]$
- A CTMC is characterized by state changes that can occur at any arbitrary time.
- The index space is continuous.
- The state space is discrete valued.
3Continuous Time Markov Chain (CTMC)
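- A CTMC can be completely described by
  - the initial state probability vector for X(t0), and
  - the transition probabilities $p_{ij}(v,t) = P[X(t)=j \mid X(v)=i]$, $0 \le v \le t$.
- Also, by the theorem of total probability, $\pi_j(t) = P[X(t)=j] = \sum_i p_{ij}(v,t)\,\pi_i(v)$.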
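4Homogeneous CTMCs
- $\{X(t),\ t \ge 0\}$ is a time-homogeneous CTMC iff the transition probabilities $p_{ij}(v,t)$ depend only on the time difference $t - v$.
- Or, the conditional pmf satisfies $p_{ij}(t) = P[X(t+u)=j \mid X(u)=i]$ for any $u \ge 0$.
- A CTMC is said to be irreducible if every state can be reached from every other state with non-zero probability.
- A state is said to be absorbing if no other state can be reached from it with non-zero probability.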
5CTMC Chapman-Kolmogorov Equation
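- For $0 \le v < u < t$: $p_{ij}(v,t) = \sum_k p_{ik}(v,u)\,p_{kj}(u,t)$
- For a homogeneous CTMC it can also be written as $\frac{d\pi_j(t)}{dt} = \sum_i \pi_i(t)\,q_{ij}$, where $q_{ij}$ ($i \ne j$) is the transition rate from state i to state j and $q_{jj} = -\sum_{k \ne j} q_{jk}$.
- In matrix form, $\frac{d\pi(t)}{dt} = \pi(t)\,Q$, where the matrix $Q = [q_{ij}]$ is called the infinitesimal generator matrix (or simply the generator matrix).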
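The matrix differential equation for the state probabilities can be solved numerically in several ways. The following is a minimal sketch using uniformization (a technique chosen here for illustration; the slides do not name a numerical method). The function name `transient_probabilities`, the tolerance, and the example rates are assumptions.

```python
import math

def transient_probabilities(Q, pi0, t, tol=1e-12, max_terms=100000):
    """Approximate pi(t) for d pi/dt = pi Q using uniformization.

    Q is the generator matrix as a list of rows, pi0 the initial
    probability vector. Intended for moderate values of q*t; for very
    large q*t the leading Poisson weight underflows and the loop stops
    at max_terms.
    """
    n = len(Q)
    # Uniformization rate: the largest exit rate (1.0 if Q is all zeros).
    q = max(-Q[i][i] for i in range(n)) or 1.0
    # Embedded discrete-time kernel P = I + Q/q.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)      # Poisson(q*t) probability of 0 jumps
    v = list(pi0)                  # pi0 * P^k, starting with k = 0
    result = [weight * x for x in v]
    total = weight
    k = 0
    while 1.0 - total > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k        # Poisson(q*t) probability of k jumps
        total += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result

# Hypothetical two-state example: state 0 = down, state 1 = up,
# failure rate 0.5 and repair rate 2.0 (made-up values).
Q_example = [[-2.0, 2.0], [0.5, -0.5]]
print(transient_probabilities(Q_example, [0.0, 1.0], t=1.0))
```

Uniformization avoids subtracting large numbers, since every term in the sum is non-negative, which is one reason it is a common choice for transient CTMC analysis.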
6CTMC Steady-state Solution
- Steady-state solution of a CTMC: $\pi Q = 0$, $\sum_j \pi_j = 1$.
- Irreducible CTMCs having positive steady-state $\pi_j$ values are called recurrent non-null.
- Performance measures may be computed by assigning reward rates $r_j$ to states and computing the expected steady-state reward rate, $\sum_j r_j \pi_j$.
- Accumulated reward (over an interval of time): $Y(t) = \int_0^t r_{X(\tau)}\,d\tau$.
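A small sketch of the steady-state computation: it solves the linear system "pi Q = 0 with the probabilities summing to 1" by Gauss-Jordan elimination and then forms an expected reward rate. The function names and the three-state example chain are assumptions made for illustration; the sketch presumes an irreducible finite chain so that the solution is unique.

```python
def steady_state(Q):
    """Solve pi Q = 0, sum(pi) = 1 for an irreducible finite CTMC."""
    n = len(Q)
    # Row i holds the balance equation for state i: sum_j pi_j * Q[j][i] = 0.
    A = [[Q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    # The balance equations are linearly dependent, so replace the last one
    # with the normalization condition sum(pi) = 1.
    A[n - 1] = [1.0] * n + [1.0]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def expected_reward_rate(pi, rewards):
    """Expected steady-state reward rate sum_j r_j * pi_j."""
    return sum(p * r for p, r in zip(pi, rewards))

# Hypothetical 3-state chain: arrivals at rate 1, departures at rate 2.
Q_example = [[-1.0, 1.0, 0.0],
             [2.0, -3.0, 1.0],
             [0.0, 2.0, -2.0]]
pi = steady_state(Q_example)
print(pi, expected_reward_rate(pi, [0, 1, 2]))
```

With reward rates equal to the state index, the expected reward rate is the mean number of jobs; other choices of reward vector give other measures from the same probabilities.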
7Continuous Time Birth-Death Process
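- The CTMC $\{X(t),\ t \ge 0\}$ with state space $\{i : i = 0, 1, 2, \dots\}$ forms a birth-death (B-D) process if there exist constants $\lambda_i$, $i = 0, 1, 2, \dots$, and $\mu_i$, $i = 1, 2, \dots$, such that the only non-zero transition rates are $q_{i,i+1} = \lambda_i$ and $q_{i,i-1} = \mu_i$.
- $\lambda_i$: birth rate (> 0); $\mu_i$: death rate (> 0).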
8Continuous Time Birth-Death Process (contd.)
In steady state, $d\pi_k(t)/dt = 0$ for every state k.
9Steady State Equations
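$0 = -\lambda_0\pi_0 + \mu_1\pi_1$
$0 = -(\lambda_k + \mu_k)\pi_k + \lambda_{k-1}\pi_{k-1} + \mu_{k+1}\pi_{k+1}$, $k \ge 1$
These are called the balance equations. Re-arranging the above,
$\lambda_k\pi_k - \mu_{k+1}\pi_{k+1} = \lambda_{k-1}\pi_{k-1} - \mu_k\pi_k = \dots = \lambda_0\pi_0 - \mu_1\pi_1 = 0$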
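Because the net flow across each pair of neighbouring states is zero, each steady-state probability is the previous one times the ratio of birth rate to death rate, followed by normalization. The sketch below applies this to a finite birth-death chain; the function name and the argument layout (`birth[k]` is the birth rate in state k, `death[k]` is the death rate in state k+1) are assumptions.

```python
def birth_death_steady_state(birth, death):
    """Steady-state probabilities of a finite birth-death chain.

    birth[k] is the birth rate in state k      (k = 0 .. n-1)
    death[k] is the death rate in state k + 1  (k = 0 .. n-1)
    Returns [pi_0, ..., pi_n].
    """
    if len(birth) != len(death):
        raise ValueError("birth and death must have the same length")
    weights = [1.0]                     # unnormalized pi_0
    for lam, mu in zip(birth, death):
        # pi_{k+1} = pi_k * lambda_k / mu_{k+1}
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: three states, birth rate 1, death rate 2.
print(birth_death_steady_state([1.0, 1.0], [2.0, 2.0]))
```

For an infinite chain the same recursion applies, but the normalizing sum must converge, which is where stability conditions come from.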
10M/M/1 Queue
- Arrivals follow a Poisson process, i.e., inter-arrival times are i.i.d. EXP(λ).
- Service times (inter-departure times while the server is busy) are i.i.d. EXP(µ).
- N(t) is a birth-death process with λ_k = λ, µ_k = µ.
- Define ρ = λ/µ (traffic intensity, in Erlangs).
[Figure: single-server queue fed by a Poisson arrival process with rate λ.]
11M/M/1 queue (contd.)
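- From the balance flow equations, we get $\pi_k = \rho^k \pi_0$ and, by normalization, $\pi_0 = 1 - \rho$.
- $\rho < 1$ (for reasons of stability).
- Expected number of customers, $E[N] = \sum_{k=0}^{\infty} k\,\pi_k = \frac{\rho}{1-\rho}$.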
12M/M/1 queue (contd.)
- This measure can be viewed as a weighted average, $E[N] = \sum_k r_k \pi_k$ with weights $r_k = k$.
- By choosing suitable weights for the states of a CTMC, we can get most measures of interest; the resulting model is known as a Markov reward model (MRM).
- Other measures
  - Average queue length (E[N])
  - Average (expected) response time
  - Average (expected) wait time, etc.
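13M/M/1 queue Little's formula
- Let the random variable R denote the response time (defined as the time elapsed from the instant of job arrival until its completion).
- Little's law states $E[N] = \lambda E[R]$, so $E[R] = E[N]/\lambda$.
- Here $E[R] = \frac{\rho}{\lambda(1-\rho)} = \frac{1}{\mu(1-\rho)}$.
- Response time (R) = wait time (W) + service time (S), so $E[W] = E[R] - E[S] = \frac{1}{\mu(1-\rho)} - \frac{1}{\mu}$.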
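The M/M/1 measures can be collected in a few lines. This is a sketch using the standard closed form E[N] = ρ/(1-ρ) together with Little's law; the function name, the returned keys, and the example rates are assumptions.

```python
def mm1_measures(lam, mu):
    """Steady-state measures of an M/M/1 queue (requires lam < mu)."""
    if not 0 < lam < mu:
        raise ValueError("need 0 < lam < mu for a stable queue")
    rho = lam / mu                     # traffic intensity
    mean_jobs = rho / (1.0 - rho)      # E[N]
    mean_response = mean_jobs / lam    # E[R] by Little's law
    mean_service = 1.0 / mu            # E[S]
    mean_wait = mean_response - mean_service   # E[W] = E[R] - E[S]
    return {
        "rho": rho,
        "mean_jobs": mean_jobs,
        "mean_response": mean_response,
        "mean_wait": mean_wait,
    }

# Hypothetical rates: 8 arrivals and 10 service completions per unit time.
print(mm1_measures(8.0, 10.0))
```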
14Response time distribution (tagged job approach)
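- Assuming FCFS and steady-state conditions
- If there are already N = n jobs in the system, the next, (n+1)st, job will experience a response time $R = S + S_1' + S_2 + \dots + S_n$, where S is the service time of the (n+1)st job, $S_1'$ is the residual service time of the job currently undergoing service, and $S_2, \dots, S_n$ are the service times of the jobs waiting ahead of it.
- Because of the memoryless property, these times are all EXP(µ).
- Hence, given N = n, the LST of R is $L_{R \mid N}(s \mid n) = \left(\frac{\mu}{s+\mu}\right)^{n+1}$.
- Therefore, with $P[N = n] = (1-\rho)\rho^n$, $L_R(s) = \sum_{n=0}^{\infty} \left(\frac{\mu}{s+\mu}\right)^{n+1} (1-\rho)\rho^n = \frac{\mu(1-\rho)}{s + \mu(1-\rho)}$, i.e., R is EXP(µ(1-ρ)).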
15M/M/m queue
- m-servers service the queue.
[Figure: Poisson arrivals (rate λ) join a single queue served by m servers, each with service rate µ.]
16M/M/m Queue Solution
17M/M/m Queue performance measures
- Average queue length E[N]: reward rate $r_k = k$
18M/M/m Queue performance measures
- Server utilization: let the random variable M denote the number of busy servers. For 0 < k < m customers in the system, the number of busy servers is k. Beyond that, the number of busy servers is m.
- A customer may have to join the queue.
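The measures above can be sketched with the standard M/M/m steady-state formulas (the Erlang C probability of queueing), which the transcript does not preserve. The function name, the returned keys, and the example rates are assumptions.

```python
import math

def mmm_measures(lam, mu, m):
    """Steady-state measures of an M/M/m queue with a common queue."""
    a = lam / mu                 # offered load in Erlangs
    rho = a / m                  # per-server utilization
    if not 0 < rho < 1:
        raise ValueError("need 0 < lam < m * mu for a stable queue")
    # Normalization: states with fewer than m jobs, then the geometric tail.
    head = sum(a ** k / math.factorial(k) for k in range(m))
    tail = a ** m / (math.factorial(m) * (1.0 - rho))
    pi0 = 1.0 / (head + tail)
    p_queue = tail * pi0         # probability an arrival must join the queue
    mean_busy = a                # E[M], mean number of busy servers
    mean_jobs = a + rho * p_queue / (1.0 - rho)   # E[N]
    return {
        "pi0": pi0,
        "p_queue": p_queue,
        "mean_busy": mean_busy,
        "mean_jobs": mean_jobs,
        "mean_response": mean_jobs / lam,         # Little's law
    }

# Hypothetical example: two servers, arrival rate 1, service rate 1 each.
print(mmm_measures(1.0, 1.0, 2))
```

Setting m = 1 reduces the formulas to the M/M/1 results, which is a convenient sanity check.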
19Poisson stream behavior
- M/M/m input and output both form Poisson streams.
- m = 2 case
  - Case 1: two independent queues. Two separate Poisson streams ⇒ 2 separate M/M/1 queues.
  - Case 2: M/M/2 case. Two separate Poisson streams are combined into a single Poisson stream feeding a common queue.
20Comparative performance
- Case 1: for each M/M/1 queue,
- Case 2: common queue, M/M/2
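The two cases can be compared numerically. The sketch below assumes a total arrival rate λ that is split evenly into two streams of rate λ/2 in Case 1 and fed to a common queue in Case 2, with service rate µ per server; this split is an assumption, since the slide's equations are not preserved. The function names are hypothetical.

```python
def separate_queues_response(lam, mu):
    """Mean response time with two M/M/1 queues, each fed at rate lam/2."""
    if not 0 < lam < 2 * mu:
        raise ValueError("need 0 < lam < 2 * mu")
    return 1.0 / (mu - lam / 2.0)

def common_queue_response(lam, mu):
    """Mean response time of an M/M/2 queue fed at rate lam."""
    if not 0 < lam < 2 * mu:
        raise ValueError("need 0 < lam < 2 * mu")
    return 4.0 * mu / (4.0 * mu ** 2 - lam ** 2)

# Hypothetical rates: total arrival rate 1, service rate 1 per server.
lam, mu = 1.0, 1.0
print(separate_queues_response(lam, mu), common_queue_response(lam, mu))
```

Under these assumptions the ratio of the two response times is 1 + λ/(2µ), so the common queue always gives the smaller mean response time.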
21M/M/1/n Queue
- Finite queue size, finite buffer space ⇒ finite state space.
22M/M/1/n Queue Performance Measures
- Mean queue length (expected number of jobs in the system)
  - $r_k = k$
- Loss probability
  - $r_n = 1$; $r_k = 0$, k = 0, 1, ..., n-1
- Throughput
  - $r_k = \mu$, k = 1, 2, ..., n; $r_0 = 0$ (or, $r_k = \lambda$, k = 0, 1, ..., n-1; $r_n = 0$)
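The reward assignments above translate directly into code. This sketch computes the M/M/1/n steady-state probabilities (proportional to powers of λ/µ) and evaluates each measure as an expected reward rate; the function name and returned keys are assumptions.

```python
def mm1n_measures(lam, mu, n):
    """Steady-state measures of an M/M/1/n queue via reward rates."""
    if lam <= 0 or mu <= 0 or n < 1:
        raise ValueError("need lam > 0, mu > 0, n >= 1")
    rho = lam / mu
    weights = [rho ** k for k in range(n + 1)]   # pi_k proportional to rho^k
    total = sum(weights)
    pi = [w / total for w in weights]

    def expected(rewards):
        # Expected steady-state reward rate for a given reward vector.
        return sum(r * p for r, p in zip(rewards, pi))

    return {
        "mean_jobs": expected(range(n + 1)),           # r_k = k
        "loss_probability": expected([0] * n + [1]),   # r_n = 1, else 0
        "throughput": expected([0] + [mu] * n),        # r_k = mu, k >= 1
        "throughput_alt": expected([lam] * n + [0]),   # r_k = lam, k < n
    }

# Hypothetical example: equal arrival and service rates, room for 2 jobs.
print(mm1n_measures(1.0, 1.0, 2))
```

The two throughput assignments agree because, in steady state, the rate of accepted arrivals equals the rate of departures.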
23M/M/1/n Response time distribution
- Response time distribution: a job may be rejected (or accepted)
  - Unconditional
  - Conditional (conditioned on the job being accepted)
- Reward assignment for the kth state: the response time experienced by the tagged task is the sum of k service times, each of which is EXP(µ), i.e., k-stage Erlang.
- Unconditional
- Conditional
24Special cases of Birth-Death Process
- Pure birth processes
  - Poisson process
  - Software reliability growth model (NHPP)
    - The number of software failures occurring in (0, t] is N(t), and N(t) is Poisson with failure intensity $\lambda(t) = a b e^{-bt}$ and mean $m(t) = E[N(t)] = a(1 - e^{-bt})$.
    - Instantaneous failure intensity: $\lambda(t) = b[a - m(t)]$
  - Transient solution may be found using Laplace transforms
- Pure death processes
  - No repairs
25Markov Availability Model
262-State Markov Availability Model
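- 1) Steady-state balance equations for each state (state 1 = up, state 0 = down; failure rate λ, repair rate µ)
  - Rate of flow IN = rate of flow OUT
  - State 1: $\mu\pi_0 = \lambda\pi_1$
  - State 0: $\lambda\pi_1 = \mu\pi_0$
- 2 unknowns, 2 equations, but there is only one independent equation.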
272-State Markov Availability Model(Continued)
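- Need an additional equation: $\pi_0 + \pi_1 = 1$. With failure rate λ and repair rate µ, this gives the steady-state availability $A_{ss} = \pi_1 = \frac{\mu}{\lambda+\mu}$.
- Downtime in minutes per year = $(1 - A_{ss}) \times 8760 \times 60$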
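A short sketch of the downtime calculation for the two-state model, assuming rates are given per hour so that a year has 8760 hours. The function names and the example rates are assumptions.

```python
MINUTES_PER_YEAR = 8760 * 60

def two_state_availability(failure_rate, repair_rate):
    """Steady-state availability of the 2-state model: mu / (lambda + mu)."""
    if failure_rate < 0 or repair_rate <= 0:
        raise ValueError("need failure_rate >= 0 and repair_rate > 0")
    return repair_rate / (failure_rate + repair_rate)

def downtime_minutes_per_year(availability):
    """Expected downtime per year for a given steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR

# Hypothetical rates: one failure per 1000 hours, one-hour mean repair time.
a = two_state_availability(1.0 / 1000.0, 1.0)
print(a, downtime_minutes_per_year(a))
```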
282-State Markov Availability Model(Continued)
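- 2) Transient availability
  - For each state: rate of buildup = rate of flow IN - rate of flow OUT
  - $\frac{dP_1(t)}{dt} = \mu P_0(t) - \lambda P_1(t)$ and, since $P_0(t) + P_1(t) = 1$, $\frac{dP_1(t)}{dt} + (\lambda+\mu)P_1(t) = \mu$
  - This equation can be solved, assuming $P_1(0) = 1$, to obtain $A(t) = P_1(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}e^{-(\lambda+\mu)t}$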
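For reference, one way to solve the transient equation is with an integrating factor. The steps below assume failure rate λ (state 1 to state 0), repair rate µ (state 0 to state 1), and the initial condition P1(0) = 1; the integrating-factor route is one choice among several (Laplace transforms would work equally well).

```latex
\begin{align*}
\frac{dP_1(t)}{dt} + (\lambda+\mu)P_1(t) &= \mu
  && \text{(using } P_0(t) = 1 - P_1(t)\text{)} \\
\frac{d}{dt}\!\left[e^{(\lambda+\mu)t}P_1(t)\right] &= \mu\,e^{(\lambda+\mu)t}
  && \text{(multiply by } e^{(\lambda+\mu)t}\text{)} \\
e^{(\lambda+\mu)t}P_1(t) - 1 &= \frac{\mu}{\lambda+\mu}\left(e^{(\lambda+\mu)t} - 1\right)
  && \text{(integrate over } [0,t],\ P_1(0)=1\text{)} \\
A(t) = P_1(t) &= \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t}
\end{align*}
```

As t grows, the exponential term vanishes and A(t) tends to µ/(λ+µ), the steady-state availability.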
292-State Markov Availability Model(Continued)
- 3)
- 4) Steady State Availability
30Using SHARPE to Solve the models
48Markov availability model
- Assume we have a two-component parallel redundant system with repair rate µ.
- Assume that the failure rate of both the components is λ.
- When both the components have failed, the system is considered to have failed.
49Markov availability model (Continued)
- Let the number of properly functioning components be the state of the system. The state space is {0, 1, 2}, where 0 is the system down state.
- We wish to examine the effects of shared vs. non-shared repair.
50Markov availability model (Continued)
[Figure: two state diagrams over states 2, 1, 0, one for non-shared (independent) repair and one for shared repair.]
51Markov availability model (Continued)
- Note: the non-shared case can be modeled and solved using a reliability block diagram (RBD) or a fault tree (FTREE), but the shared case needs the use of Markov chains.
52Steady-state balance equations
- For any state:
  - Rate of flow in = rate of flow out
- Consider the shared case
  - $\pi_i$: steady-state probability that the system is in state i
53Steady-state balance equations (Continued)
54Steady-state balance equations (Continued)
- Steady-state unavailability = $\pi_0 = 1 - A_{shared}$
- Similarly, for the non-shared case, steady-state unavailability = $1 - A_{non\text{-}shared}$
- Downtime in minutes per year = $(1 - A) \times 8760 \times 60$
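The shared and non-shared cases can be compared with a few lines of code. The transition rates are assumptions, because the state diagrams are not preserved in the transcript: failures occur at rate 2λ from state 2 and λ from state 1; repair from state 1 occurs at rate µ in both cases; repair from state 0 occurs at rate µ with a shared repair facility and 2µ with independent repair. The function name and the example rates are also hypothetical.

```python
MINUTES_PER_YEAR = 8760 * 60

def parallel_unavailability(lam, mu, shared):
    """Steady-state probability of state 0 for the 2-component system.

    Assumed rates: 2 -> 1 at 2*lam, 1 -> 0 at lam, 1 -> 2 at mu,
    0 -> 1 at mu (shared repair) or 2*mu (non-shared repair).
    """
    repair_from_0 = mu if shared else 2.0 * mu
    w2 = 1.0                           # unnormalized pi_2
    w1 = w2 * 2.0 * lam / mu           # balance between states 2 and 1
    w0 = w1 * lam / repair_from_0      # balance between states 1 and 0
    return w0 / (w0 + w1 + w2)

# Hypothetical rates per hour: MTTF 1000 hours, mean repair time 1 hour.
lam, mu = 1.0 / 1000.0, 1.0
for shared in (True, False):
    u = parallel_unavailability(lam, mu, shared)
    print("shared" if shared else "non-shared", u, u * MINUTES_PER_YEAR)
```

Under these assumptions the non-shared unavailability equals (λ/(λ+µ))², the product of two independent component unavailabilities, while the shared case is roughly twice as large when λ is much smaller than µ.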
55Steady-state balance equations
56Homework
- Return to the 2 control and 3 voice channels example and assume that the control channel failure rate is λ_c and the voice channel failure rate is λ_v.
- Repair rates are µ_c and µ_v, respectively. Assuming a single shared repair facility, with the control channel having preemptive repair priority over voice channels, draw the state diagram of a Markov availability model. Using SHARPE GUI, solve the Markov chain for steady-state and instantaneous availability.
64Markov Reliability Model
65Markov reliability model with repair
- Consider the 2-component parallel system but disallow repair from the system down state.
- Note that state 0 is now an absorbing state. The state diagram is given in the following figure.
- This reliability model with repair cannot be modeled using a reliability block diagram or a fault tree; we need to resort to Markov chains. (This is a form of dependency, since in order to repair a component you need to know the status of the other component.)
66Markov reliability model with repair (Continued)
[Figure: state diagram of the reliability model; state 0 is the absorbing state.]
- Markov chain has an absorbing state. In the
steady-state, system will be in state 0 with
probability 1. Hence transient analysis is of
interest. States 1 and 2 are transient states.
67Markov reliability model with repair (Continued)
- Assume that the initial state of the Markov chain is 2, that is, $P_2(0) = 1$, $P_k(0) = 0$ for k = 0, 1.
- Then the system of differential equations is written based on: rate of buildup = rate of flow in - rate of flow out, for each state.
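68Markov reliability model with repair (Continued)
- $\frac{dP_2(t)}{dt} = -2\lambda P_2(t) + \mu P_1(t)$
- $\frac{dP_1(t)}{dt} = 2\lambda P_2(t) - (\lambda+\mu)P_1(t)$
- $\frac{dP_0(t)}{dt} = \lambda P_1(t)$
69Markov reliability model with repair (Continued)
- After solving these equations for $P_2(t)$ and $P_1(t)$, we get the reliability $R(t) = P_2(t) + P_1(t)$.
- Recalling that $\mathrm{MTTF} = \int_0^\infty R(t)\,dt$, we get $\mathrm{MTTF} = \frac{3}{2\lambda} + \frac{\mu}{2\lambda^2}$.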
70Markov reliability model with repair (Continued)
- Note that the MTTF of the two-component parallel redundant system, in the absence of a repair facility (i.e., µ = 0), would have been equal to the first term, 3/(2λ), in the above expression.
- Therefore, the effect of a repair facility is to increase the mean life by µ/(2λ²), or, relative to the no-repair MTTF, by a factor $\frac{\mu/(2\lambda^2)}{3/(2\lambda)} = \frac{\mu}{3\lambda}$.
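As a cross-check on the two MTTF terms, a mean-time-to-absorption argument (a different route from integrating R(t)) gives the same result. The sketch assumes the transition rates of the usual two-component model, which the transcript does not preserve: 2 to 1 at rate 2λ, 1 to 0 at rate λ, and 1 to 2 at rate µ. The symbol m_i, the mean time to reach state 0 starting from state i, is introduced here for illustration.

```latex
\begin{align*}
m_2 &= \frac{1}{2\lambda} + m_1, \\
m_1 &= \frac{1}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\,m_2 .
\end{align*}
Substituting the first equation into the second:
\begin{align*}
m_1\,\frac{\lambda}{\lambda+\mu} &= \frac{2\lambda+\mu}{2\lambda(\lambda+\mu)}
  \;\Longrightarrow\; m_1 = \frac{2\lambda+\mu}{2\lambda^2}, \\
\mathrm{MTTF} = m_2 &= \frac{1}{2\lambda} + \frac{2\lambda+\mu}{2\lambda^2}
  = \frac{3}{2\lambda} + \frac{\mu}{2\lambda^2}.
\end{align*}
```

Setting µ = 0 leaves only 3/(2λ), the mean life without a repair facility.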
71Markov Reliability Model with Imperfect Coverage
72Markov model with imperfect coverage
- Next consider a modification of the above example, proposed by Arnold as a model of duplex processors of an electronic switching system. We assume that not all faults are recoverable and that c is the coverage factor, which denotes the conditional probability that the system recovers given that a fault has occurred. The state diagram is now given by the following picture.
73Now allow for Imperfect coverage
[Figure: state diagram of the two-component model with coverage factor c.]
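74Markov model with imperfect coverage (Continued)
- Assume that the initial state is 2, so that $P_2(0) = 1$, $P_0(0) = P_1(0) = 0$.
- Then the system of differential equations is
  - $\frac{dP_2(t)}{dt} = -2\lambda c\,P_2(t) - 2\lambda(1-c)\,P_2(t) + \mu P_1(t)$
  - $\frac{dP_1(t)}{dt} = 2\lambda c\,P_2(t) - (\lambda+\mu)P_1(t)$
  - $\frac{dP_0(t)}{dt} = 2\lambda(1-c)\,P_2(t) + \lambda P_1(t)$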
- After solving the differential equations we obtain $R(t) = P_2(t) + P_1(t)$.
- From R(t), we can obtain the system MTTF.
- It should be clear that the system MTTF and system reliability are critically dependent on the coverage factor.
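To see how strongly the MTTF depends on coverage, the sketch below evaluates a closed form obtained from a mean-time-to-absorption argument. It assumes the usual rates for this model, which the transcript does not preserve: from state 2, a covered fault leads to state 1 at rate 2λc and an uncovered fault leads to state 0 at rate 2λ(1-c); from state 1, failure occurs at rate λ and repair at rate µ. The function name and the example rates are hypothetical.

```python
def mttf_with_coverage(lam, mu, c):
    """MTTF of the 2-component model with repair and coverage factor c.

    Derived from m2 = 1/(2*lam) + c*m1 and
    m1 = 1/(lam + mu) + (mu/(lam + mu))*m2 under the assumed rates.
    """
    if lam <= 0 or mu < 0 or not 0.0 <= c <= 1.0:
        raise ValueError("need lam > 0, mu >= 0 and 0 <= c <= 1")
    return (lam * (1.0 + 2.0 * c) + mu) / (2.0 * lam * (lam + mu * (1.0 - c)))

# Hypothetical rates per hour: MTTF 1000 hours per component, 1-hour repair.
lam, mu = 1.0 / 1000.0, 1.0
for c in (1.0, 0.99, 0.9, 0.5):
    print(c, mttf_with_coverage(lam, mu, c))
```

With c = 1 the expression reduces to 3/(2λ) + µ/(2λ²), the perfect-coverage result; with c = 0 the first failure is fatal and the MTTF is 1/(2λ).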
822-component availability model with detection delay
- 2-component availability model
  - Steady-state availability $A_{ss} = 1 - \pi_0$
- Failure detection stage takes a random time, EXP(δ)
- Down states are 0 and 1D ⇒ $A_{ss} = 1 - \pi_0 - \pi_{1D}$
- Therefore, the steady-state unavailability U(δ) is given by $U(\delta) = 1 - A_{ss} = \pi_0 + \pi_{1D}$
832-component availability model with finite coverage
- Coverage factor c (probability that the fault is covered)
- The 1C state is a re-boot (down) state.
842-component availability model: delay + finite coverage
- Model has detection delay + coverage factor
- Down states are 0, 1C and 1D.
85Preventive Maintenance example
- Prolonged usage of a component may lead to an increased failure rate (i.e., the IFR situation).
- Hence, the lifetime may be modeled as a HypoEXP() distribution, say 2-stage Hypo.
- The component is inspected randomly. The time between inspections is random, following EXP(λ_i). Inspection completion time is EXP(µ_i).
- What does inspection do?
  - First stage of life: no action
  - Second stage of life: repair
  - That is, preventive maintenance
- State: <stage, faulty>
86Performance Models
- Example: 2 servers with different service times.
- State: <n1, n2>
- Performance: average number of jobs in the system, E[n1 + n2]
- Reward rate: $r_{n_1,n_2} = n_1 + n_2$
- Except for <0,0>, in all other states, viz., <k,0> and <k,1>, there are k jobs in the system.
87SOURCES OF COVERAGE DATA
- Measurement data from an operational system
  - Large amount of data needed
  - Improved instrumentation needed
- Fault/error injection experiments
  - Costly, yet badly needed
  - Tools from CMU, Illinois, Toulouse
88SOURCES OF COVERAGE DATA (Continued)
- A fault/error handling submodel (FEHM)
  - Phases of FEHM: detection, location, retry, reconfig, reboot
  - Estimate the duration and probability of success of each phase
- IBM (EDFI), HARP (FEHM), Draper (FDIR)
89Homework 6
- Modify the Markov model with imperfect coverage to allow for finite time to detect as well as imperfect detection. You will need to add an extra state, say D. The rate at which detection occurs is δ. Draw the state diagram and, using SHARPE GUI, investigate the effects of detection delay on system reliability and mean time to failure.