Transcript and Presenter's Notes

Title: Hidden Markov Models


1
Hidden Markov Models
  • An Introduction

2
Markov Chains
3
Markov Chains
  • We want a model that generates sequences in which
    the probability of a symbol depends on the
    previous symbol only.
  • Transition probabilities
  • Probability of a sequence
  • Note
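  • In the usual notation, the transition probabilities are $a_{st} = P(x_i = t \mid x_{i-1} = s)$, and the probability of a sequence factorises as $P(x) = P(x_1)\prod_{i=2}^{L} a_{x_{i-1} x_i}$, which follows by applying $P(X, Y) = P(X \mid Y)\,P(Y)$ repeatedly.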

4
Markov Chains
  • The key property of a Markov Chain is that the probability of each symbol xi depends only on the value of the preceding symbol
  • Modelling the beginning and end of sequences
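  • One common treatment adds a silent begin state B and end state E, so that $P(x_1 = s) = a_{Bs}$ and $P(\text{end} \mid x_L = t) = a_{tE}$; the sequence probability then picks up these two extra factors.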

5
Markov Chains
  • Markov Chains can be used to discriminate between two options by calculating a likelihood ratio
  • Example: CpG islands in human DNA
  • Regions labelled as CpG islands → the + model
  • Regions labelled as non-CpG islands → the - model
  • Maximum Likelihood estimators for the transition probabilities of each model
  • and analogously for the - model. c+st is the number of times letter t followed letter s in the labelled regions
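  • In the usual notation, $a^{+}_{st} = c^{+}_{st} \big/ \sum_{t'} c^{+}_{st'}$, with the corresponding counts $c^{-}_{st}$ used for the - model.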

6
Markov Chains
  • From 48 putative CpG islands in human DNA one estimates the following transition probabilities
  • Note that the tables are asymmetric

7
Markov Chains
  • To use the model for discrimination one
    calculates the log-odds ratio
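  • In the usual notation, $S(x) = \log \dfrac{P(x \mid \text{model } +)}{P(x \mid \text{model } -)} = \sum_{i=1}^{L} \log \dfrac{a^{+}_{x_{i-1} x_i}}{a^{-}_{x_{i-1} x_i}}$; a positive score indicates a CpG island.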

8
Hidden Markov Models
  • How can one find CpG islands in a long chain of
    nucleotides?
  • Merge both models into one model with small
    transition probabilities between the chains.
  • Within each chain the transition probabilities
    should remain close to the original ones
  • Relabeling of the states
  • The states A+, C+, G+, T+ and A-, C-, G-, T- emit the symbols A, C, G, T
  • The relabeling is critical, as there is no one-to-one correspondence between the states and the symbols. From looking at a C in isolation one cannot tell whether it was emitted from C+ or C-

9
Hidden Markov Models
  • Formal Definitions
  • Distinguish the sequence of states from the
    sequence of symbols
  • Call the state sequence the path π. It follows a simple Markov model
  • with transition probabilities
  • As the symbols b are decoupled from the states k, new parameters are needed giving the probability that symbol b is seen when in state k
  • These are known as emission probabilities
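  • In the usual notation, $a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$ and $e_k(b) = P(x_i = b \mid \pi_i = k)$.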

10
Hidden Markov Models
  • The Viterbi Algorithm
  • It is the most commonly used decoding algorithm for HMMs
  • It is a dynamic programming algorithm
  • There may be many state sequences which give rise
    to any particular sequence of symbols
  • But the corresponding probabilities are very
    different
  • CpG islands
  • (C+, G+, C+, G+), (C-, G-, C-, G-), (C+, G-, C+, G-)
  • They all generate the symbol sequence CGCG
  • but the first has the highest probability

11
Hidden Markov Models
  • Search recursively for the most probable path
  • Suppose the probability vk(i) of the most probable path ending in state k with observation xi is known for all states k
  • Then the probability vl(i+1) for the next observation xi+1 can be calculated by
  • with initial condition
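  • In the usual notation, $v_l(i+1) = e_l(x_{i+1}) \max_k \big( v_k(i)\, a_{kl} \big)$, with the initial condition $v_B(0) = 1$ for the begin state and $v_k(0) = 0$ otherwise.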

12
Hidden Markov Models
  • Viterbi Algorithm
  • Initialisation (i = 0)
  • Recursion (i = 1..L)
  • Termination
  • Traceback (i = L..1)
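  • A minimal Python sketch of these steps, in log space (nonzero probabilities assumed); the dictionary-based model layout (states, start, a, e) and the toy numbers below are assumptions for illustration only:

    import math

    def viterbi(x, states, a, e, start):
        # Initialisation: v[k] holds log v_k(1) for the first observation
        v = {k: math.log(start[k]) + math.log(e[k][x[0]]) for k in states}
        ptr = []  # back-pointers for the traceback
        # Recursion over the remaining observations
        for sym in x[1:]:
            prev_v, v, back = v, {}, {}
            for l in states:
                # predecessor k maximising v_k(i) * a_kl
                k_best = max(states, key=lambda k: prev_v[k] + math.log(a[k][l]))
                v[l] = prev_v[k_best] + math.log(a[k_best][l]) + math.log(e[l][sym])
                back[l] = k_best
            ptr.append(back)
        # Termination: best final state, then traceback from the end
        last = max(states, key=lambda k: v[k])
        path = [last]
        for back in reversed(ptr):
            path.append(back[path[-1]])
        return list(reversed(path)), v[last]

    # Hypothetical two-state toy model (all numbers made up for illustration)
    states = ["+", "-"]
    start = {"+": 0.5, "-": 0.5}
    a = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.2, "-": 0.8}}
    e = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
         "-": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
    path, logp = viterbi("CGCG", states, a, e, start)  # path favours the + states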

13
Hidden Markov Models
  • CpG Islands and CGCG sequence

14
Hidden Markov Models
  • The Forward Algorithm
  • As many different paths π can give rise to the same sequence,
  • the probability of a sequence P(x) is
  • Brute force enumeration is not practical, as the number of paths rises exponentially with the length of the sequence
  • A simple approximation is to evaluate the probability at the most probable path only.
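  • In the usual notation, $P(x) = \sum_{\pi} P(x, \pi)$, and the approximation uses $P(x) \approx P(x, \pi^{*})$ with the Viterbi path $\pi^{*}$.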

15
Hidden Markov Models
  • The full probability P(x) can be calculated in a recursive way with dynamic programming. This is called the forward algorithm.
  • Calculate the probability fk(i) of the observed sequence up to and including xi under the constraint that πi = k
  • The recursion equation is
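  • In the usual notation, $f_k(i) = P(x_1 \dots x_i,\ \pi_i = k)$ and $f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)\, a_{kl}$.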

16
Hidden Markov Model
  • Forward Algorithm
  • Initialization (i = 0)
  • Recursion (i = 1..L)
  • Termination
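  • A minimal Python sketch of the forward recursion (plain probability space with no scaling, so only suitable for short sequences; same hypothetical model layout as in the Viterbi sketch above):

    def forward(x, states, a, e, start):
        # f[k] holds f_k(i) = P(x_1..x_i, pi_i = k)
        f = {k: start[k] * e[k][x[0]] for k in states}
        for sym in x[1:]:
            f = {l: e[l][sym] * sum(f[k] * a[k][l] for k in states) for l in states}
        # Termination: P(x) is the sum over the final states
        return sum(f.values())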

17
Hidden Markov Model
  • The Backward Algorithm
  • What is the most probable state for an observation xi?
  • What is the probability P(πi = k | x) that observation xi came from state k, given the observed sequence? This is the posterior probability of state k at time i when the emitted sequence is known.
  • First calculate the probability of producing the entire observed sequence with the i-th symbol being produced by state k
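  • In the usual notation, $P(x, \pi_i = k) = f_k(i)\, b_k(i)$, where $b_k(i) = P(x_{i+1} \dots x_L \mid \pi_i = k)$ is computed by the backward recursion.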

18
Hidden Markov Model
  • The Backward Algorithm
  • Initialisation (i = L)
  • Recursion (i = L-1, ..., 1)
  • Termination
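  • In the usual notation, the initialisation is $b_k(L) = a_{kE}$ when an end state is modelled (and $b_k(L) = 1$ otherwise), the recursion is $b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$, and the termination gives $P(x) = \sum_l a_{Bl}\, e_l(x_1)\, b_l(1)$.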

19
Hidden Markov Models
  • Posterior Probabilities
  • From the backward algorithm posterior
    probabilities can be obtained
  • where P(x) is the result of the forward
    algorithm.
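  • In the usual notation, $P(\pi_i = k \mid x) = \dfrac{f_k(i)\, b_k(i)}{P(x)}$.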

20
Hidden Markov Model
  • Parameter Estimation for HMMs
  • Two problems remain
  • 1) how to choose an appropriate model
    architecture
  • 2) how to assign the transition and emission
    probabilities
  • Assumption: independent training sequences x1, ..., xn are given
  • Consider the log likelihood
  • where θ represents the set of values of all parameters (akl, ek(b))
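  • In the usual notation, $\ell(x^1, \dots, x^n \mid \theta) = \log P(x^1, \dots, x^n \mid \theta) = \sum_{j=1}^{n} \log P(x^j \mid \theta)$.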

21
Hidden Markov Models
  • Estimation with known state sequence
  • Assume the paths are known for all training
    sequences
  • Count the number Akl and Ek(b) of times each
    particular transition or emission is used in the
    set of training sequences plus pseudocounts rkl
    and rk(b), respectively.
  • The Maximum Likelihood estimators for akl and
    ek(b) are then given by
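  • In the usual notation, $a_{kl} = \dfrac{A_{kl}}{\sum_{l'} A_{kl'}}$ and $e_k(b) = \dfrac{E_k(b)}{\sum_{b'} E_k(b')}$, where the counts include the pseudocounts.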

22
Hidden Markov Models
  • Estimation with unknown paths
  • Iterative procedures must be used to estimate the
    parameters
  • All standard algorithms for optimization of
    continuous functions can be used
  • One particular iterative method is commonly used: the Baum-Welch algorithm
  • -- first, estimate Akl and Ek(b) by considering probable paths for the training sequences, using the current values of akl and ek(b)
  • -- second, use the maximum likelihood estimators to obtain new transition and emission parameters
  • -- iterate this process until a stopping criterion is met
  • -- many local maxima exist, particularly with large HMMs

23
Hidden Markov Models
  • Baum-Welch Algorithm
  • It calculates the Akl and Ek(b) as the expected
    number of times each transition or emission is
    used in the training sequence
  • It uses the values of the forward and backward
    algorithms
  • The probability that akl is used at position i in
    sequence x is
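  • In the usual notation, $P(\pi_i = k, \pi_{i+1} = l \mid x, \theta) = \dfrac{f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)}{P(x)}$.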

24
Hidden Markov Models
  • Baum Welch Algorithm
  • The expected number of times akl is used can be
    derived then by summing over all positions and
    over all training sequences
  • The expected number of times that letter b appears in state k is given by
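  • In the usual notation, $A_{kl} = \sum_j \dfrac{1}{P(x^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x^j_{i+1})\, b_l^j(i+1)$ and $E_k(b) = \sum_j \dfrac{1}{P(x^j)} \sum_{\{i \,:\, x^j_i = b\}} f_k^j(i)\, b_k^j(i)$, with $f^j$ and $b^j$ the forward and backward values for sequence $x^j$.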

25
Hidden Markov Models
  • Baum-Welch Algorithm
  • Initialisation: pick arbitrary model parameters
  • Recurrence: set all A and E variables to their pseudocount values r (or to zero)
  • For each sequence j = 1..n
  • -- calculate fk(i) for sequence j using the forward algorithm
  • -- calculate bk(i) for sequence j using the backward algorithm
  • -- add the contribution of sequence j to A and E
  • -- calculate the new model parameters (maximum likelihood estimators)
  • -- calculate the new log likelihood of the model
  • Termination: stop if the change in log likelihood is less than some threshold
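  • A minimal Python sketch of the accumulation and re-estimation steps for one sequence; it assumes full forward and backward tables f[i][k] and b[i][k] have already been computed (for example by extending the sketches above), and all names are illustrative:

    def accumulate(x, states, a, e, f, b, A, E):
        # Add the expected transition and emission counts of one sequence to A and E
        px = sum(f[len(x) - 1][k] for k in states)  # P(x) from the forward table
        for i, sym in enumerate(x):
            for k in states:
                # expected emissions of symbol sym from state k at position i
                E[k][sym] += f[i][k] * b[i][k] / px
                if i + 1 < len(x):
                    for l in states:
                        # expected uses of transition k -> l between positions i and i+1
                        A[k][l] += f[i][k] * a[k][l] * e[l][x[i + 1]] * b[i + 1][l] / px

    def reestimate(states, symbols, A, E):
        # Maximum likelihood re-estimation from the accumulated counts
        a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        e = {k: {s: E[k][s] / sum(E[k].values()) for s in symbols} for k in states}
        return a, e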

26
Hidden Markov Models
  • Baum-Welch Algorithm
  • The Baum-Welch algorithm is a special case of an Expectation Maximization algorithm
  • As an alternative, Viterbi training can be used as well. There the most probable paths are estimated with the Viterbi algorithm. These are used in the iterative re-estimation process.
  • Convergence is guaranteed, as the assignment of the paths is a discrete process
  • Unlike Baum-Welch, this procedure does not maximise the true likelihood P(x1, .., xn | θ), regarded as a function of the model parameters θ
  • It finds the value of θ that maximises the contribution to the likelihood P(x1, .., xn | θ, π*(x1), .., π*(xn)) from the most probable paths for all sequences.