Transcript and Presenter's Notes

Title: Hidden Markov Models


1
Hidden Markov Models
  • An Introduction

2
Markov Chains
3
Markov Chains
  • We want a model that generates sequences in which
    the probability of a symbol depends on the
    previous symbol only.
  • Transition probabilities
  • Probability of a sequence
  • Note
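  • In the usual notation, the transition probabilities are $a_{st} = P(x_i = t \mid x_{i-1} = s)$, and the probability of a sequence factorises as $P(x) = P(x_1)\prod_{i=2}^{L} a_{x_{i-1} x_i}$, which follows by applying $P(X, Y) = P(X \mid Y)\,P(Y)$ repeatedly.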

4
Markov Chains
  • The key property of a Markov Chain is that the probability of each symbol xi depends only on the value of the preceding symbol
  • Modelling the beginning and end of sequences
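  • One common treatment adds a silent begin state B and end state E, so that $P(x_1 = s) = a_{Bs}$ and $P(\text{end} \mid x_L = t) = a_{tE}$; the sequence probability then picks up these two extra factors.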

5
Markov Chains
  • Markov Chains can be used to discriminate between two options by calculating a likelihood ratio
  • Example: CpG islands in human DNA
  • Regions labelled as CpG islands → the + model
  • Regions labelled as non-CpG islands → the - model
  • Maximum Likelihood estimators for the transition probabilities of each model
  • and analogously for the - model. c+st is the number of times letter t followed letter s in the labelled regions
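  • In the usual notation, $a^{+}_{st} = c^{+}_{st} \big/ \sum_{t'} c^{+}_{st'}$, with the corresponding counts $c^{-}_{st}$ used for the - model.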

6
Markov Chains
  • From 48 putative CpG islands in human DNA one estimates the following transition probabilities
  • Note that the tables are asymmetric

7
Markov Chains
  • To use the model for discrimination one
    calculates the log-odds ratio
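  • In the usual notation, $S(x) = \log \dfrac{P(x \mid \text{model } +)}{P(x \mid \text{model } -)} = \sum_{i=1}^{L} \log \dfrac{a^{+}_{x_{i-1} x_i}}{a^{-}_{x_{i-1} x_i}}$; a positive score indicates a CpG island.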

8
Hidden Markov Models
  • How can one find CpG islands in a long chain of
    nucleotides?
  • Merge both models into one model with small
    transition probabilities between the chains.
  • Within each chain the transition probabilities
    should remain close to the original ones
  • Relabeling of the states
  • The states A+, C+, G+, T+ and A-, C-, G-, T- emit the symbols A, C, G, T
  • The relabeling is critical, as there is no one-to-one correspondence between the states and the symbols. From looking at a C in isolation one cannot tell whether it was emitted from C+ or C-

9
Hidden Markov Models
  • Formal Definitions
  • Distinguish the sequence of states from the
    sequence of symbols
  • Call the state sequence the path π. It follows a simple Markov model
  • with transition probabilities
  • As the symbols b are decoupled from the states k, new parameters are needed giving the probability that symbol b is seen when in state k
  • These are known as emission probabilities
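  • In the usual notation, $a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$ and $e_k(b) = P(x_i = b \mid \pi_i = k)$.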

10
Hidden Markov Models
  • The Viterbi Algorithm
  • It is the most commonly used decoding algorithm for HMMs
  • It is a dynamic programming algorithm
  • There may be many state sequences which give rise
    to any particular sequence of symbols
  • But the corresponding probabilities are very
    different
  • CpG islands
  • (C+, G+, C+, G+), (C-, G-, C-, G-), (C+, G-, C+, G-)
  • They all generate the symbol sequence CGCG
  • but the first has the highest probability

11
Hidden Markov Models
  • Search recursively for the most probable path
  • Suppose the probability vk(i) of the most probable path ending in state k with observation xi is known for all states k
  • Then the probability vl(i+1) for the next observation xi+1 can be calculated by
  • with initial condition
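  • In the usual notation, $v_l(i+1) = e_l(x_{i+1}) \max_k \big( v_k(i)\, a_{kl} \big)$, with the initial condition $v_B(0) = 1$ for the begin state and $v_k(0) = 0$ otherwise.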

12
Hidden Markov Models
  • Viterbi Algorithm
  • Initialisation (i = 0)
  • Recursion (i = 1..L)
  • Termination
  • Traceback (i = L..1)
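  • A minimal Python sketch of these steps, in log space (nonzero probabilities assumed); the dictionary-based model layout (states, start, a, e) and the toy numbers below are assumptions for illustration only:

    import math

    def viterbi(x, states, a, e, start):
        # Initialisation: v[k] holds log v_k(1) for the first observation
        v = {k: math.log(start[k]) + math.log(e[k][x[0]]) for k in states}
        ptr = []  # back-pointers for the traceback
        # Recursion over the remaining observations
        for sym in x[1:]:
            prev_v, v, back = v, {}, {}
            for l in states:
                # predecessor k maximising v_k(i) * a_kl
                k_best = max(states, key=lambda k: prev_v[k] + math.log(a[k][l]))
                v[l] = prev_v[k_best] + math.log(a[k_best][l]) + math.log(e[l][sym])
                back[l] = k_best
            ptr.append(back)
        # Termination: best final state, then traceback from the end
        last = max(states, key=lambda k: v[k])
        path = [last]
        for back in reversed(ptr):
            path.append(back[path[-1]])
        return list(reversed(path)), v[last]

    # Hypothetical two-state toy model (all numbers made up for illustration)
    states = ["+", "-"]
    start = {"+": 0.5, "-": 0.5}
    a = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.2, "-": 0.8}}
    e = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
         "-": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
    path, logp = viterbi("CGCG", states, a, e, start)  # path favours the + states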

13
Hidden Markov Models
  • CpG Islands and CGCG sequence

14
Hidden Markov Models
  • The Forward Algorithm
  • As many different paths π can give rise to the same sequence,
  • the probability of a sequence P(x) is
  • Brute force enumeration is not practical, as the number of paths rises exponentially with the length of the sequence
  • A simple approximation is to evaluate the probability at the most probable path only.
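  • In the usual notation, $P(x) = \sum_{\pi} P(x, \pi)$, and the approximation uses $P(x) \approx P(x, \pi^{*})$ with the Viterbi path $\pi^{*}$.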

15
Hidden Markov Models
  • The full probability P(x) can be calculated in a recursive way with dynamic programming. This is called the forward algorithm.
  • Calculate the probability fk(i) of the observed sequence up to and including xi under the constraint that πi = k
  • The recursion equation is
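  • In the usual notation, $f_k(i) = P(x_1 \dots x_i,\ \pi_i = k)$ and $f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)\, a_{kl}$.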

16
Hidden Markov Model
  • Forward Algorithm
  • Initialization (i = 0)
  • Recursion (i = 1..L)
  • Termination
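  • A minimal Python sketch of the forward recursion (plain probability space with no scaling, so only suitable for short sequences; same hypothetical model layout as in the Viterbi sketch above):

    def forward(x, states, a, e, start):
        # f[k] holds f_k(i) = P(x_1..x_i, pi_i = k)
        f = {k: start[k] * e[k][x[0]] for k in states}
        for sym in x[1:]:
            f = {l: e[l][sym] * sum(f[k] * a[k][l] for k in states) for l in states}
        # Termination: P(x) is the sum over the final states
        return sum(f.values())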

17
Hidden Markov Model
  • The Backward Algorithm
  • What is the most probable state for an observation xi?
  • What is the probability P(πi = k | x) that observation xi came from state k, given the observed sequence? This is the posterior probability of state k at time i when the emitted sequence is known.
  • First calculate the probability of producing the entire observed sequence with the i-th symbol being produced by state k
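  • In the usual notation, $P(x, \pi_i = k) = f_k(i)\, b_k(i)$, where $b_k(i) = P(x_{i+1} \dots x_L \mid \pi_i = k)$ is computed by the backward recursion.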

18
Hidden Markov Model
  • The Backward Algorithm
  • Initialisation (i = L)
  • Recursion (i = L-1, ..., 1)
  • Termination
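  • In the usual notation, the initialisation is $b_k(L) = a_{kE}$ when an end state is modelled (and $b_k(L) = 1$ otherwise), the recursion is $b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$, and the termination gives $P(x) = \sum_l a_{Bl}\, e_l(x_1)\, b_l(1)$.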

19
Hidden Markov Models
  • Posterior Probabilities
  • From the backward algorithm posterior
    probabilities can be obtained
  • where P(x) is the result of the forward
    algorithm.
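  • In the usual notation, $P(\pi_i = k \mid x) = \dfrac{f_k(i)\, b_k(i)}{P(x)}$.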

20
Hidden Markov Model
  • Parameter Estimation for HMMs
  • Two problems remain
  • 1) how to choose an appropriate model
    architecture
  • 2) how to assign the transition and emission
    probabilities
  • Assumption: independent training sequences x1, ..., xn are given
  • Consider the log likelihood
  • where θ represents the set of values of all parameters (akl, ek(b))
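  • In the usual notation, $\ell(x^1, \dots, x^n \mid \theta) = \log P(x^1, \dots, x^n \mid \theta) = \sum_{j=1}^{n} \log P(x^j \mid \theta)$.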

21
Hidden Markov Models
  • Estimation with known state sequence
  • Assume the paths are known for all training
    sequences
  • Count the number Akl and Ek(b) of times each
    particular transition or emission is used in the
    set of training sequences plus pseudocounts rkl
    and rk(b), respectively.
  • The Maximum Likelihood estimators for akl and
    ek(b) are then given by
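  • In the usual notation, $a_{kl} = \dfrac{A_{kl}}{\sum_{l'} A_{kl'}}$ and $e_k(b) = \dfrac{E_k(b)}{\sum_{b'} E_k(b')}$, where the counts include the pseudocounts.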

22
Hidden Markov Models
  • Estimation with unknown paths
  • Iterative procedures must be used to estimate the
    parameters
  • All standard algorithms for optimization of
    continuous functions can be used
  • One particular iterative method is commonly used: the Baum-Welch algorithm
  • -- first, estimate Akl and Ek(b) by considering probable paths for the training sequences, using the current values of akl and ek(b)
  • -- second, use the maximum likelihood estimators to obtain new transition and emission parameters
  • -- iterate this process until a stopping criterion is met
  • -- many local maxima exist, particularly with large HMMs

23
Hidden Markov Models
  • Baum-Welch Algorithm
  • It calculates the Akl and Ek(b) as the expected
    number of times each transition or emission is
    used in the training sequence
  • It uses the values of the forward and backward
    algorithms
  • The probability that akl is used at position i in
    sequence x is
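  • In the usual notation, $P(\pi_i = k, \pi_{i+1} = l \mid x, \theta) = \dfrac{f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)}{P(x)}$.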

24
Hidden Markov Models
  • Baum Welch Algorithm
  • The expected number of times akl is used can be
    derived then by summing over all positions and
    over all training sequences
  • The expected number of times that letter b appears in state k is given by
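  • In the usual notation, $A_{kl} = \sum_j \dfrac{1}{P(x^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x^j_{i+1})\, b_l^j(i+1)$ and $E_k(b) = \sum_j \dfrac{1}{P(x^j)} \sum_{\{i \,:\, x^j_i = b\}} f_k^j(i)\, b_k^j(i)$, with $f^j$ and $b^j$ the forward and backward values for sequence $x^j$.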

25
Hidden Markov Models
  • Baum-Welch Algorithm
  • Initialisation: pick arbitrary model parameters
  • Recurrence: set all A and E variables to their pseudocount values r (or to zero)
  • For each sequence j = 1..n
  • -- calculate fk(i) for sequence j using the forward algorithm
  • -- calculate bk(i) for sequence j using the backward algorithm
  • -- add the contribution of sequence j to A and E
  • -- calculate the new model parameters (maximum likelihood estimators)
  • -- calculate the new log likelihood of the model
  • Termination: stop if the change in log likelihood is less than some threshold
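  • A minimal Python sketch of the accumulation and re-estimation steps for one sequence; it assumes full forward and backward tables f[i][k] and b[i][k] have already been computed (for example by extending the sketches above), and all names are illustrative:

    def accumulate(x, states, a, e, f, b, A, E):
        # Add the expected transition and emission counts of one sequence to A and E
        px = sum(f[len(x) - 1][k] for k in states)  # P(x) from the forward table
        for i, sym in enumerate(x):
            for k in states:
                # expected emissions of symbol sym from state k at position i
                E[k][sym] += f[i][k] * b[i][k] / px
                if i + 1 < len(x):
                    for l in states:
                        # expected uses of transition k -> l between positions i and i+1
                        A[k][l] += f[i][k] * a[k][l] * e[l][x[i + 1]] * b[i + 1][l] / px

    def reestimate(states, symbols, A, E):
        # Maximum likelihood re-estimation from the accumulated counts
        a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        e = {k: {s: E[k][s] / sum(E[k].values()) for s in symbols} for k in states}
        return a, e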

26
Hidden Markov Models
  • Baum-Welch Algorithm
  • The Baum-Welch algorithm is a special case of an Expectation Maximization algorithm
  • As an alternative, Viterbi training can be used as well. There the most probable paths are estimated with the Viterbi algorithm. These are used in the iterative re-estimation process.
  • Convergence is guaranteed, as the assignment of the paths is a discrete process
  • Unlike Baum-Welch, this procedure does not maximise the true likelihood P(x1, .., xn | θ), regarded as a function of the model parameters θ
  • It finds the value of θ that maximises the contribution to the likelihood P(x1, .., xn | θ, π*(x1), .., π*(xn)) from the most probable paths for all sequences.