MARKOV MODELS - PowerPoint PPT Presentation

1
Section 8
  • MARKOV MODELS
  • Prepared and presented by Saman Halgamuge

2
Markov Chains: An Introduction
  • E.g. a non-returning random walk: "non-returning"
    means the walker does not go back to the location
    just previously visited.
  • A Markov chain is a stochastic process with the
    memoryless (Markov) property, meaning that the
    description of the present state fully captures
    all the information about the future evolution of
    the process.
  • A Markov chain is a triplet (Q, p, A), i.e. it is
    characterized by 3 parameters:
  • Q is a set of states; each state emits a symbol
    in the alphabet Σ.
  • p is the probability of the initial state being s,
    for each s ∈ Q.
  • A is the set of state transition probabilities ast,
    for each s, t ∈ Q.
  • For each s, t ∈ Q the transition probability is
    ast = P(xi = t | xi-1 = s).
  • For a random process X = (x1, x2, …, xL), a
    Markov chain has the memoryless property: the
    variable xi depends only on the previous value
    (xi-1) and not on the history of the process.
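The memoryless property can be sketched with a short simulation. The two-state chain below is illustrative only (not from the slides); note that each sampling step looks only at the current state, never at the earlier history.

```python
import random

# Illustrative two-state chain (assumed values, not from the slides).
states = ["sunny", "rainy"]
# a[s][t] = probability of moving from state s to state t
a = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
p = {"sunny": 0.5, "rainy": 0.5}  # initial-state probabilities

def sample_chain(length, rng=random):
    """Sample a state sequence; each step depends only on the previous state."""
    x = [rng.choices(states, weights=[p[s] for s in states])[0]]
    for _ in range(length - 1):
        prev = x[-1]  # memoryless: only x[-1] matters, not the full history
        x.append(rng.choices(states, weights=[a[prev][t] for t in states])[0])
    return x
```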

3
Markov Chains
  • For the sequence X = (x1, x2, …, xL), the
    probability of the sequence is
    P(X) = P(xL | xL-1, …, x1) · P(xL-1 | xL-2, …, x1) ··· P(x1).
  • Using the memoryless property of Markov chains,
    we get
    P(X) = p(x1) · ∏ i=2..L a(xi-1, xi),
    where p(x1) is the probability of starting in a
    particular state.
  • Add begin and end states with the corresponding
    symbols x0 and xL+1. Defining p(s) as the initial
    probability of symbol s, i.e. a(x0, s) = p(s), gives
    P(X) = ∏ i=1..L+1 a(xi-1, xi).
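The factorised probability can be computed directly. The chain values below are illustrative (not from the slides):

```python
def sequence_probability(x, p, a):
    """P(X) = p(x1) * prod over i of a[x_{i-1}][x_i] (memoryless property)."""
    prob = p[x[0]]
    for prev, cur in zip(x, x[1:]):
        prob *= a[prev][cur]
    return prob

# Toy two-state chain (illustrative values, not from the slides)
p = {"s": 0.5, "t": 0.5}
a = {"s": {"s": 0.9, "t": 0.1}, "t": {"s": 0.2, "t": 0.8}}
sequence_probability(["s", "s", "t"], p, a)  # 0.5 * 0.9 * 0.1 = 0.045
```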

4
Markov Chain to represent a DNA Sequence
  • The probability of the sequence becomes
    P(X) = ∏ i=1..L+1 a(xi-1, xi).
  • Arrows represent transition probabilities.
  • Each state emits the corresponding symbol, i.e.
    there is a one-to-one correspondence between
    symbols and states.

A Markov chain for modeling a DNA sequence.
Example: AAACCCCTTTTGGG. Construct the Markov
chain to represent the above sequence.
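The slide presents the chain as a diagram; as a sketch, the transition probabilities ast for the example sequence can be estimated by counting dinucleotide transitions:

```python
from collections import Counter, defaultdict

def estimate_transitions(seq):
    """Estimate a_st by counting symbol-to-symbol transitions in the sequence."""
    counts = defaultdict(Counter)
    for prev, cur in zip(seq, seq[1:]):
        counts[prev][cur] += 1
    # Normalise each row of counts into probabilities
    return {s: {t: c / sum(row.values()) for t, c in row.items()}
            for s, row in counts.items()}

a = estimate_transitions("AAACCCCTTTTGGG")
# From A there are two A->A transitions and one A->C, so a["A"]["A"] = 2/3
```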
5
Using Markov Chains: CpG Islands
  • In DNA, the nucleotide sequence CG (abbreviated CpG)
    is typically modified by a process called
    methylation, mutating C into T.
  • Consequently, CpG dinucleotides are relatively
    rare in the human and most other genomes.
  • For biologically important reasons, this mutation
    is suppressed in short stretches of DNA (a few
    hundred nucleotides long) around the promoter
    (start) site of genes. In these regions CpG is
    more frequent.
  • These regions are called CpG islands. The "p" in
    the CpG notation refers to the phosphodiester bond
    between the cytidine and the guanosine.

6
Using Markov Chains: CpG Islands
  • Questions:
  • Given a short sequence of DNA, how do we decide
    whether it comes from a CpG island or not?
  • Given a long genome, how do we locate the CpG
    islands in it?
  • Two Markov chain models can be used to solve the
    problem:
  • The + model represents sequences where CpG is
    frequent (CpG islands).
  • The − model represents sequences where CpG is
    rarer.

7
Identifying CpG Islands
  • Let a+st be the transition probabilities in the +
    model and a−st those in the − model.
  • These probabilities have been calculated from some
    known CpG islands and non-CpG regions.

8
Identifying CpG Islands
  • For a given sequence X of length L, we can now
    calculate the probability of the sequence under
    each model using the equation
    P(X) = ∏ i=1..L a(xi-1, xi).
  • However, for computational accuracy, we calculate
    the log-odds ratio
    S(X) = log2 [ P(X | +) / P(X | −) ]
         = ∑ i=1..L log2 ( a+ xi-1 xi / a− xi-1 xi ).
  • It is customary to use logarithm base 2 when
    calculating log-odds ratios (the answer is then in
    bits).

The histogram of scores for given sequences: the
CpG islands (black) clearly stand out from the
non-CpG regions (gray).
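A minimal sketch of the log-odds computation. The +/− transition entries below are made up for illustration; they are not the trained tables referred to on the slides:

```python
import math

# Illustrative transition tables (assumed values, only the entries
# needed for the example sequence are shown).
a_plus = {"C": {"G": 0.27}, "G": {"C": 0.34}}
a_minus = {"C": {"G": 0.08}, "G": {"C": 0.25}}

def log_odds(x, a_plus, a_minus):
    """S(X) = sum_i log2( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} ), in bits."""
    return sum(math.log2(a_plus[s][t] / a_minus[s][t])
               for s, t in zip(x, x[1:]))

log_odds("CGCG", a_plus, a_minus)  # positive score suggests a CpG island
```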
9
Locating CpG Islands in a Genome
  • To solve this problem, we need to combine the two
    Markov chains considered earlier into one unified
    model.
  • Add a small probability of switching from one
    chain to the other at each state transition
    (shown by arrows).
  • There are 2 states corresponding to each
    nucleotide symbol, so an emitted symbol does not
    reveal the internal state.
  • We have 8 states emitting only 4 symbols → we need
    to introduce emission probabilities in addition
    to transition probabilities.
  • This is a hidden Markov model (HMM).

10
Hidden Markov Models
  • A hidden Markov model (HMM) is a stochastic process
    with an underlying stochastic state transition
    process that is not observable (hidden). The
    underlying process can only be inferred through a
    set of symbols emitted sequentially by the
    stochastic process.
  • Example: dishonest casino dealer. States
    (hidden): F or L.
  • The set of symbols emitted:
    {1, …, 6}.

11
Hidden Markov Model
  • An HMM is a triplet M = (Σ, Q, Θ) where:
  • Σ is an alphabet of symbols.
  • Q is a set of states capable of emitting symbols
    from the alphabet Σ.
  • Θ is a set of probabilities comprising:
  • state transition probabilities akl, for each k, l
    ∈ Q;
  • emission probabilities ek(b), for each k ∈ Q and
    b ∈ Σ.
  • A path π = (π1, …, πL) is a sequence of states
    with the corresponding symbol sequence X = (x1,
    …, xL).
  • The path itself follows a Markov chain (i.e. it is
    memoryless).
  • There is no one-to-one correspondence between the
    states and the symbols.

12
Hidden Markov Model
  • State transition probabilities:
    akl = P(πi = l | πi-1 = k).
  • Emission probabilities:
    ek(b) = P(xi = b | πi = k).
  • The probability that the sequence X was
    generated by the model M given the path π is
    P(X, π) = a(π0, π1) · ∏ i=1..L e(πi)(xi) · a(πi, πi+1),
    where π0 = begin state and πL+1 = end state.
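A sketch of P(X, π), using the casino parameters given on the later slides. The transcript does not specify a begin state for that model, so a uniform initial distribution is assumed in place of a(π0, π1), and the end-state transition is omitted:

```python
# Casino HMM parameters from the slides: fair (F) and loaded (L) dice.
states = ["F", "L"]
a = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
e = {"F": {r: 1 / 6 for r in "123456"},
     "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}
p0 = {"F": 0.5, "L": 0.5}  # assumed initial distribution (not on the slides)

def joint_probability(x, path):
    """P(X, pi) = p0(pi_1) e_{pi_1}(x_1) * prod a_{pi_{i-1} pi_i} e_{pi_i}(x_i)."""
    prob = p0[path[0]] * e[path[0]][x[0]]
    for i in range(1, len(x)):
        prob *= a[path[i - 1]][path[i]] * e[path[i]][x[i]]
    return prob

joint_probability("66", "LL")  # 0.5 * 0.5 * 0.9 * 0.5 = 0.1125
```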

13
HMM for Detecting CpG Islands in Genome
  • The HMM model consists of 8 states and 4 symbols.
  • States: A+ C+ G+ T+ A− C− G− T−
  • Emitted symbols: A C G T A C G T
  • Probability of staying in a CpG island: p
  • Probability of staying outside a CpG island: q
  • Emission probability of symbol A while in state
    A+ or A−: 1.0;
  • emission probability of symbol B while in state
    B+ or B−: 1.0, etc.
  • All other emission probabilities are zero (e.g.
    eA+(B) = 0.0).
  • Transition probabilities can be derived from the
    two tables considered earlier.

14
HMM for Detecting CpG Islands in Genome
  • Transition probabilities of the HMM.
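The slide's table of transition probabilities is not reproduced in the transcript. One common way of assembling the 8-state matrix from p, q and the +/− tables can be sketched as follows; the scaling of within-chain moves by p (or q) and cross-chain moves by 1−p (or 1−q), reusing the destination chain's table, is an assumption rather than something spelled out on the slide:

```python
def combine(a_plus, a_minus, p, q):
    """Assemble an 8x8 transition matrix over states like 'A+' and 'A-'."""
    nucleotides = "ACGT"
    A = {}
    for s in nucleotides:
        A[s + "+"] = {}
        A[s + "-"] = {}
        for t in nucleotides:
            A[s + "+"][t + "+"] = p * a_plus[s][t]         # stay in the island
            A[s + "+"][t + "-"] = (1 - p) * a_minus[s][t]  # leave the island
            A[s + "-"][t + "-"] = q * a_minus[s][t]        # stay outside
            A[s + "-"][t + "+"] = (1 - q) * a_plus[s][t]   # enter an island
    return A
```

With any row-stochastic a_plus and a_minus, each row of the combined matrix sums to 1, since p + (1 − p) = 1 and q + (1 − q) = 1.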

15
Example: HMM for Modeling a Dishonest Casino
  • A casino dealer uses a fair die most of the time,
    but occasionally switches to a loaded die.
    Assume:
  • With the loaded die, the probability of a six is
    0.5; all other numbers have probability 0.1.
  • The probability of switching from the fair to the
    loaded die is 0.05 at each roll.
  • The probability of switching from the loaded to the
    fair die is 0.1 at each roll.
  • Switching between dice is a Markov process.
  • In each state of the Markov process, the outcomes
    have different probabilities.
  • The whole process is an HMM.

16
Example: Dishonest Casino
  • There are two possible states, Fair and Loaded:
    Q = {F, L}.
  • There are six possible outcomes:
    Σ = {1, 2, 3, 4, 5, 6}.
  • The transition probabilities are shown by arrows.
  • The emission probabilities are shown inside each
    state box.
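The model above can be simulated directly (a sketch; the initial state is fixed to F as an assumption, since the slides do not give a start distribution):

```python
import random

def simulate(n_rolls, rng=random):
    """Simulate the dishonest casino; return (rolls, hidden states)."""
    a = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
    e = {"F": [1 / 6] * 6,            # fair die: uniform
         "L": [0.1] * 5 + [0.5]}      # loaded die: six has probability 0.5
    state, hidden, rolls = "F", [], []
    for _ in range(n_rolls):
        hidden.append(state)
        rolls.append(rng.choices([1, 2, 3, 4, 5, 6], weights=e[state])[0])
        state = rng.choices(["F", "L"],
                            weights=[a[state]["F"], a[state]["L"]])[0]
    return rolls, hidden
```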

17
Decoding Problem: Most Probable State Path
  • Given the HMM M = (Σ, Q, Θ) and a sequence of
    symbols X over Σ, for which the generating path
    π = (π1, …, πL) is unknown:
  • In general, there can be many state sequences π
    that could give rise to the particular sequence
    of symbols X.
  • Find the most probable generating path π* for X,
    i.e. a path such that P(X, π) is maximized:
    π* = argmax π P(X, π).

18
Most Probable State Path
  • The solution π* will reveal the hidden states
    that generated the sequence X.
  • CpG island case:
  • All parts of π* that pass through + states are
    CpG islands.
  • Dishonest casino case:
  • All parts of π* that pass through state L are
    suspected rolls of the loaded die.
  • A solution for the most probable path is given by
    the Viterbi algorithm.
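A sketch of Viterbi decoding for the casino HMM, working in log space to avoid numerical underflow. A uniform start distribution is assumed, since the slides do not give one:

```python
import math

STATES = ["F", "L"]
LOG_A = {"F": {"F": math.log(0.95), "L": math.log(0.05)},
         "L": {"F": math.log(0.1), "L": math.log(0.9)}}
LOG_E = {"F": {r: math.log(1 / 6) for r in range(1, 7)},
         "L": {**{r: math.log(0.1) for r in range(1, 6)}, 6: math.log(0.5)}}

def viterbi(rolls):
    """Return the most probable hidden state path for a sequence of rolls."""
    # v[k] = best log-probability of any path ending in state k
    v = {k: math.log(0.5) + LOG_E[k][rolls[0]] for k in STATES}
    back = []  # back[i][k] = best predecessor of state k at position i+1
    for x in rolls[1:]:
        ptr, nv = {}, {}
        for k in STATES:
            best = max(STATES, key=lambda j: v[j] + LOG_A[j][k])
            ptr[k] = best
            nv[k] = v[best] + LOG_A[best][k] + LOG_E[k][x]
        back.append(ptr)
        v = nv
    # Traceback from the best final state
    path = [max(STATES, key=lambda k: v[k])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

viterbi([6, 6, 6, 6, 6, 6])  # an unbroken run of sixes decodes as all 'L'
```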