Title: MARKOV MODELS
1. Section 8
- MARKOV MODELS
- Prepared and presented by Saman Halgamuge
2. Markov Chains: An Introduction
- E.g. a non-returning random walk: the walker never returns to the location visited immediately before.
- A Markov chain is a stochastic process with the memoryless (Markov) property, meaning that the present state fully captures all the information about the future evolution of the process.
- A Markov chain is a triplet (i.e. characterized by 3 parameters):
- Q is a set of states; each state emits a symbol in the alphabet Σ.
- p is the probability of the initial state being s, for each s ∈ Q.
- A is the set of state transition probabilities, a_st for each s, t ∈ Q.
- For each s, t ∈ Q the transition probability is a_st = P(x_i = t | x_{i-1} = s).
- For a random process X = (x_1, x_2, …, x_L), a Markov chain has the memoryless property: the variable x_i depends only on the previous value (x_{i-1}) and not on the history of the process.
3. Markov Chains
- For the sequence X = (x_1, x_2, …, x_L), the probability of the sequence is P(X) = P(x_L | x_{L-1}, …, x_1) P(x_{L-1} | x_{L-2}, …, x_1) … P(x_1).
- Using the memoryless property of Markov chains, we get P(X) = P(x_1) ∏_{i=2}^{L} a_{x_{i-1} x_i}, where P(x_1) is the probability of starting in a particular state.
- Add begin and end states with the corresponding symbols x_0 and x_{L+1}. Define p(s) = a_{0s} as the initial probability of symbol s, so that P(X) = ∏_{i=1}^{L+1} a_{x_{i-1} x_i}.
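This calculation translates directly into code. Below is a minimal Python sketch (not from the slides; the names `init_prob` and `trans_prob` are illustrative) that evaluates P(X) as the initial probability times the product of the transition probabilities.

```python
# Minimal sketch: probability of a sequence under a first-order Markov chain.
# init_prob[s]     = p(s), the probability of starting in state s
# trans_prob[s][t] = a_st, the probability of moving from state s to state t

def sequence_probability(seq, init_prob, trans_prob):
    """Return P(X) = p(x_1) * prod_{i=2..L} a_{x_{i-1} x_i}."""
    if not seq:
        return 1.0
    prob = init_prob[seq[0]]
    for prev, curr in zip(seq, seq[1:]):
        prob *= trans_prob[prev][curr]
    return prob

# Toy two-state chain (the numbers are made up for illustration).
init_prob = {"A": 0.5, "B": 0.5}
trans_prob = {"A": {"A": 0.9, "B": 0.1},
              "B": {"A": 0.2, "B": 0.8}}
print(sequence_probability("AABB", init_prob, trans_prob))  # 0.5 * 0.9 * 0.1 * 0.8
```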
4. Markov Chain to Represent a DNA Sequence
- The probability of the sequence becomes P(X) = P(x_1) ∏_{i=2}^{L} a_{x_{i-1} x_i}.
- Arrows represent transition probabilities.
- Each state emits the corresponding symbol, i.e. there is a one-to-one correspondence between symbols and states.
A Markov chain for modeling a DNA sequence
Example: AAACCCCTTTTGGG. Construct the Markov chain to represent the above sequence (a counting sketch follows below).
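One way to construct the chain for the example sequence is to count the observed transitions and normalise each row; the sketch below (the function name `estimate_transitions` is my own) does exactly that.

```python
def estimate_transitions(seq, alphabet="ACGT"):
    """Estimate a_st as count(s followed by t) / count(s followed by anything)."""
    counts = {s: {t: 0 for t in alphabet} for s in alphabet}
    for prev, curr in zip(seq, seq[1:]):
        counts[prev][curr] += 1
    trans = {}
    for s in alphabet:
        total = sum(counts[s].values())
        # If a symbol never appears as a predecessor, leave its row at zero.
        trans[s] = {t: (counts[s][t] / total if total else 0.0) for t in alphabet}
    return trans

trans = estimate_transitions("AAACCCCTTTTGGG")
print(trans["A"])  # {'A': 0.666..., 'C': 0.333..., 'G': 0.0, 'T': 0.0}
```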
5. Using Markov Chains: CpG Islands
- In DNA, the dinucleotide CG (abbreviated CpG) is typically modified by a process called methylation, which tends to mutate the C into a T.
- Consequently, CpG dinucleotides are relatively rare in the human genome and most other genomes.
- For biologically important reasons, this mutation is suppressed in short stretches of DNA (a few hundred nucleotides long) around the promoter (start) site of genes. In these regions CpG is more frequent.
- These regions are called CpG islands. The "p" in the CpG notation refers to the phosphodiester bond between the cytidine and the guanosine.
6. Using Markov Chains: CpG Islands
- Questions:
- Given a short sequence of DNA, how do we decide whether it comes from a CpG island or not?
- Given a long genome, how do we locate the CpG islands in it?
- Two Markov chain models can be used to solve the problem:
- The '+' model represents sequences in which CpG is frequent (CpG islands).
- The '-' model represents sequences in which CpG is rare (non-island regions).
7. Identifying CpG Islands
- Let a^+_st be the transition probabilities in the '+' model and a^-_st those in the '-' model.
- These probabilities have been calculated from some known CpG islands and non-CpG regions.
8. Identifying CpG Islands
- For a given sequence X of length L, we can now calculate the probability of the sequence under each model using the equation P(X | model) = ∏_{i=1}^{L} a_{x_{i-1} x_i}.
- However, for computational accuracy, we calculate the log-odds ratio S(X) = log2 [ P(X | '+') / P(X | '-') ] = ∑_{i=1}^{L} log2 ( a^+_{x_{i-1} x_i} / a^-_{x_{i-1} x_i} ).
- It is customary to use logarithmic base 2 when calculating log-odds ratios (the answer is then in bits).
The histogram of scores for given sequences. The
CpG islands (black) clearly stand out from
non-CpG islands (gray).
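A minimal sketch of the log-odds computation, assuming the two transition tables are stored as nested dictionaries; the numbers in `plus` and `minus` below are placeholders, not the published CpG-island values.

```python
import math

def log_odds_score(seq, plus, minus):
    """S(X) = sum_i log2( a^+_{x_{i-1} x_i} / a^-_{x_{i-1} x_i} ), in bits."""
    score = 0.0
    for prev, curr in zip(seq, seq[1:]):
        score += math.log2(plus[prev][curr] / minus[prev][curr])
    return score

# Placeholder tables (NOT the published values): the '+' model favours C -> G.
plus = {s: {t: 0.25 for t in "ACGT"} for s in "ACGT"}
plus["C"] = {"A": 0.15, "C": 0.30, "G": 0.30, "T": 0.25}
minus = {s: {t: 0.25 for t in "ACGT"} for s in "ACGT"}
minus["C"] = {"A": 0.30, "C": 0.30, "G": 0.10, "T": 0.30}

print(log_odds_score("ACGCGCGT", plus, minus))  # a positive score suggests a CpG island
```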
9. Locating CpG Islands in a Genome
- To solve this problem, we need to combine the two
Markov chains considered earlier into one unified
model.
- Add a small probability of switching from one
chain to the other at each state transition event
(shown by arrows). - There are 2 states corresponding to each
nucleotide symbol, so a symbol emitted does not
reveal the internal state.
- We have 8 states emitting only 4 symbols → we need to introduce emission probabilities in addition to transition probabilities.
- This is a hidden Markov model (HMM).
10. Hidden Markov Models
- A hidden Markov model (HMM) is a stochastic process with an underlying stochastic state transition process that is not observable (hidden). The underlying process can only be inferred through the set of symbols emitted sequentially by the process.
- Example: the dishonest casino dealer. Hidden states: F (fair) or L (loaded). Set of emitted symbols: {1, …, 6}.
11. Hidden Markov Model
- An HMM is a triplet M = (Σ, Q, Θ) where:
- Σ is an alphabet of symbols.
- Q is a set of states capable of emitting symbols from the alphabet Σ.
- Θ is a set of probabilities comprising:
- the state transition probabilities, a_kl for each k, l ∈ Q;
- the emission probabilities, e_k(b) for each k ∈ Q and b ∈ Σ.
- A path π = (π_1, …, π_L) is a sequence of states with the corresponding symbol sequence X = (x_1, …, x_L).
- The path itself follows a Markov chain (i.e. it is memoryless).
- There is no one-to-one correspondence between the states and the symbols.
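As a concrete illustration of this triplet, a minimal Python container might look as follows (the class and field names are my own, not from the slides); it stores Σ, Q, a_kl and e_k(b), and checks that each row is a valid probability distribution.

```python
class HMM:
    """Minimal container for the triplet M = (alphabet, states, probabilities)."""

    def __init__(self, alphabet, states, trans, emit):
        self.alphabet = list(alphabet)   # Sigma
        self.states = list(states)       # Q
        self.trans = trans               # trans[k][l] = a_kl
        self.emit = emit                 # emit[k][b]  = e_k(b)
        for k in self.states:
            assert abs(sum(trans[k].values()) - 1.0) < 1e-9, f"a_{k}* must sum to 1"
            assert abs(sum(emit[k].values()) - 1.0) < 1e-9, f"e_{k} must sum to 1"
```

The dishonest-casino parameters introduced on the later slides fit directly into this layout.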
12. Hidden Markov Model
- State transition probabilities: a_kl = P(π_i = l | π_{i-1} = k).
- Emission probabilities: e_k(b) = P(x_i = b | π_i = k).
- The probability that the sequence X was generated by the model M along the path π is P(X, π) = a_{π_0 π_1} ∏_{i=1}^{L} e_{π_i}(x_i) a_{π_i π_{i+1}}, where π_0 is the begin state and π_{L+1} is the end state.
13. HMM for Detecting CpG Islands in a Genome
- The HMM consists of 8 states and 4 symbols.
- States: A+, C+, G+, T+, A-, C-, G-, T-.
- Emitted symbols: A, C, G, T.
- Probability of staying in a CpG island: p.
- Probability of staying outside a CpG island: q.
- Emission probability of symbol A while in state A+ or A-: 1.0; emission probability of symbol C while in state C+ or C-: 1.0, etc.
- All other emission probabilities are zero (e.g. e_{A+}(C) = 0.0).
- Transition probabilities can be derived from the two tables considered earlier.
14. HMM for Detecting CpG Islands in a Genome
- Transition probabilities of the HMM.
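The slides do not spell out the exact combination rule, but one simple construction consistent with the description above is: from a '+' state, stay in the island sub-model with probability p and follow a^+, or switch to the '-' sub-model with probability 1 - p and follow a^-; symmetrically for '-' states with q. The sketch below assumes this rule, and the 4x4 tables used are placeholders.

```python
def build_cpg_transitions(plus, minus, p, q, alphabet="ACGT"):
    """Combine the '+' and '-' 4x4 tables into one 8-state transition table.

    Assumed rule: from s+, stay inside the island sub-model with probability p
    (following a^+_{st}), or switch to the '-' sub-model with probability 1 - p
    (following a^-_{st}); symmetrically for s- with the staying probability q.
    """
    trans = {}
    for s in alphabet:
        trans[s + "+"], trans[s + "-"] = {}, {}
        for t in alphabet:
            trans[s + "+"][t + "+"] = p * plus[s][t]
            trans[s + "+"][t + "-"] = (1 - p) * minus[s][t]
            trans[s + "-"][t + "-"] = q * minus[s][t]
            trans[s + "-"][t + "+"] = (1 - q) * plus[s][t]
    return trans

# Deterministic emissions: state b+ or b- always emits the symbol b.
emissions = {s + sign: {t: (1.0 if t == s else 0.0) for t in "ACGT"}
             for s in "ACGT" for sign in "+-"}

# Placeholder 4x4 tables and switching parameters (not the published values).
uniform = {s: {t: 0.25 for t in "ACGT"} for s in "ACGT"}
transitions = build_cpg_transitions(plus=uniform, minus=uniform, p=0.99, q=0.999)
```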
15. Example: HMM for Modeling a Dishonest Casino
- A casino dealer uses a fair die most of the time, but occasionally switches to a loaded die. Assume:
- With the loaded die, the probability of a six is 0.5 and each other number has probability 0.1.
- The probability of switching from the fair to the loaded die is 0.05 at each roll.
- The probability of switching from the loaded to the fair die is 0.1 at each roll.
- Switching between dice is a Markov process.
- In each state of the Markov process, the outcomes have different probabilities.
- The whole process is an HMM.
16. Example: Dishonest Casino
- There are two possible states, Fair and Loaded: Q = {F, L}.
- There are six possible outcomes: Σ = {1, 2, 3, 4, 5, 6}.
- The transition probabilities are shown by arrows.
- The emission probabilities are shown inside each state box.
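To make the example concrete, here is a minimal sketch that encodes these parameters and simulates a run of the process, emitting rolls together with the hidden states (starting in the fair state is an assumption):

```python
import random

TRANS = {"F": {"F": 0.95, "L": 0.05},   # fair -> loaded with probability 0.05
         "L": {"F": 0.10, "L": 0.90}}   # loaded -> fair with probability 0.1
EMIT = {"F": {str(k): 1 / 6 for k in range(1, 7)},               # fair die
        "L": {**{str(k): 0.1 for k in range(1, 6)}, "6": 0.5}}   # loaded die

def sample(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    r, acc = random.random(), 0.0
    for outcome, prob in dist.items():
        acc += prob
        if r < acc:
            return outcome
    return outcome  # guard against floating-point round-off

def roll(n, start="F"):
    """Generate n rolls (symbols) and the hidden state path."""
    state, symbols, path = start, [], []
    for _ in range(n):
        path.append(state)
        symbols.append(sample(EMIT[state]))
        state = sample(TRANS[state])
    return "".join(symbols), "".join(path)

rolls, hidden = roll(30)
print(rolls)   # e.g. '4152663...' (random)
print(hidden)  # e.g. 'FFFFLLL...' (random)
```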
17. Decoding Problem: Most Probable State Path
- Given the HMM M = (Σ, Q, Θ) and a sequence of symbols X over Σ for which the generating path π = (π_1, …, π_L) is unknown:
- In general, there could be many state sequences π that could give rise to the particular sequence of symbols X.
- Find the most probable generating path π* for X, i.e. a path such that P(X, π) is maximized.
18. Most Probable State Path
- The solution π* will reveal the hidden states that generated the sequence X.
- CpG island case:
- All parts of π* that pass through '+' states are CpG islands.
- Dishonest casino case:
- All parts of π* that pass through state L are suspected rolls of the loaded die.
- A solution for the most probable path is given by the Viterbi algorithm.
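The Viterbi algorithm itself is not spelled out on these slides, but a standard dynamic-programming sketch (using log probabilities to avoid underflow, and an explicit initial distribution `init` as my own convention) looks like this:

```python
import math

def viterbi(symbols, states, init, trans, emit):
    """Return the most probable state path for `symbols` (log probabilities used)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # v[k] = log probability of the best path for the first i symbols ending in state k.
    v = {k: log(init[k]) + log(emit[k][symbols[0]]) for k in states}
    backpointers = []
    for b in symbols[1:]:
        ptr, new_v = {}, {}
        for l in states:
            best_k = max(states, key=lambda k: v[k] + log(trans[k][l]))
            ptr[l] = best_k
            new_v[l] = v[best_k] + log(trans[best_k][l]) + log(emit[l][b])
        v, backpointers = new_v, backpointers + [ptr]
    # Trace back from the best final state.
    path = [max(states, key=lambda k: v[k])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Dishonest-casino parameters from the earlier slides (the uniform start is an assumption).
states = ["F", "L"]
init = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {str(k): 1 / 6 for k in range(1, 7)},
        "L": {**{str(k): 0.1 for k in range(1, 6)}, "6": 0.5}}
print("".join(viterbi("12566666662134", states, init, trans, emit)))
```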