Transcript and Presenter's Notes

Title: Hidden Markov Models


1
Hidden Markov Models
2
A Hidden Markov Model consists of
  1. A sequence of states {X_t : t ∈ T} = {X_1, X_2, ..., X_T}, and
  2. A sequence of observations {Y_t : t ∈ T} = {Y_1, Y_2, ..., Y_T}.

3
  • The sequence of states X_1, X_2, ..., X_T forms a Markov chain moving among the M states 1, 2, ..., M.
  • The observation Y_t comes from a distribution determined by the current state of the process, X_t (or possibly by past observations and past states).
  • The states X_1, X_2, ..., X_T are unobserved (hence hidden).

4
[Diagram: The Hidden Markov Model. A chain of hidden states X_1 → X_2 → X_3 → ... → X_T, with each state X_t emitting an observation Y_t (Y_1, Y_2, Y_3, ..., Y_T).]
5
  • Some basic problems, given the observations Y_1, Y_2, ..., Y_T:
  • 1. Determine the sequence of states X_1, X_2, ..., X_T.
  • 2. Determine (or estimate) the parameters of
    the stochastic process that is generating the
    states and the observations.

6
Examples
7
Example 1
  • A person is rolling two sets of dice (one set is balanced, the other is unbalanced). He switches between the two sets of dice according to a Markov transition matrix.
  • The states are which set of dice is in use.
  • The observations are the numbers rolled each time (a small simulation sketch follows below).
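As a quick illustration of this example, here is a minimal Python sketch that simulates the hidden state (which set of dice is in use) and the observed rolls. The transition matrix, initial distribution and unbalanced-die probabilities are illustrative values only, and each roll is simplified to a single die per state.

```python
# Sketch of Example 1: a two-state HMM whose hidden state is which set of
# dice is being rolled.  All numerical values below are assumptions made for
# illustration; they are not taken from the slides.
import numpy as np

rng = np.random.default_rng(0)

M = 2                                    # states: 0 = balanced, 1 = unbalanced
pi = np.array([0.5, 0.5])                # initial state distribution (assumed)
P = np.array([[0.95, 0.05],              # P[X_{t+1} = j | X_t = i] (assumed)
              [0.10, 0.90]])
emit = np.array([[1/6] * 6,                        # balanced die
                 [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]])  # unbalanced die (assumed)

def simulate(T):
    """Generate hidden states X_1..X_T and observed rolls Y_1..Y_T."""
    states, rolls = [], []
    x = rng.choice(M, p=pi)
    for _ in range(T):
        states.append(x)
        rolls.append(rng.choice(6, p=emit[x]) + 1)   # observed face 1..6
        x = rng.choice(M, p=P[x])                    # Markov transition
    return np.array(states), np.array(rolls)

states, rolls = simulate(20)
print(states)
print(rolls)
```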

8
Balanced Dice
9
Unbalanced Dice
10
Example 2
  • The Markov chain has two states.
  • The observations (given the states) are independent Normal.
  • Both the mean and the variance depend on the state.
  • HMM AR.xls

11
Example 3: The Dow Jones
12
Daily Changes in the Dow Jones
13
Hidden Markov Model??
14
Bear and Bull Market?
15
Speech Recognition
  • When a word is spoken the vocalization process
    goes through a sequence of states.
  • The sound produced is relatively constant when
    the process remains in the same state.
  • Recognizing the sequence of states and the
    duration of each state allows one to recognize
    the word being spoken.

16
  • The interval of time when the word is spoken is
    broken into small (possibly overlapping)
    subintervals.
  • In each subinterval one measures the amplitudes of various frequencies in the sound (using Fourier analysis). The vector of amplitudes Y_t is assumed to have a multivariate normal distribution in each state, with the mean vector and covariance matrix being state dependent.

17
Hidden Markov Models for Biological Sequences
  • Consider the motif
  • ATCGACACGTATGGC
  • Some realizations
  • A C A - - - A T G
  • T C A A C T A T C
  • A C A C - - A G C
  • A G A - - - A T C
  • A C C G - - A T C

18
Hidden Markov model of the same motif
ATCGACACGTATGGC
19
Profile HMMs
20
Computing Likelihood
  • Let p_ij = P[X_{t+1} = j | X_t = i] and let P = (p_ij) be the M × M transition matrix.
  • Let π_i = P[X_1 = i] and let π = (π_1, π_2, ..., π_M) be the initial distribution over the states.
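As a concrete illustration of this notation, the transition matrix P and initial distribution π for a two-state model can be stored as arrays; the values below are placeholders, not parameters from the slides.

```python
# Minimal illustration of the notation for an M = 2 state model.
import numpy as np

M = 2
P = np.array([[0.9, 0.1],     # p_ij = P[X_{t+1} = j | X_t = i]; each row sums to 1
              [0.2, 0.8]])
pi = np.array([0.6, 0.4])     # pi_i = P[X_1 = i]; entries sum to 1

assert np.allclose(P.sum(axis=1), 1.0) and np.isclose(pi.sum(), 1.0)
```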

21
  • Now assume that
  • P[Y_t = y_t | X_1 = i_1, X_2 = i_2, ..., X_t = i_t] = P[Y_t = y_t | X_t = i_t] = p_{i_t}(y_t)
  • Then
  • P[X_1 = i_1, X_2 = i_2, ..., X_T = i_T, Y_1 = y_1, Y_2 = y_2, ..., Y_T = y_T]
  • = P[X = i, Y = y] = π_{i_1} p_{i_1}(y_1) ∏_{t=2..T} p_{i_{t-1} i_t} p_{i_t}(y_t)

22
  • Therefore
  • P[Y_1 = y_1, Y_2 = y_2, ..., Y_T = y_T] = P[Y = y]
  • = Σ_{i_1, ..., i_T} π_{i_1} p_{i_1}(y_1) ∏_{t=2..T} p_{i_{t-1} i_t} p_{i_t}(y_t),
  • where the sum runs over all M^T possible state sequences (a brute-force computation is sketched below).
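The sum above can be evaluated directly at a cost of M^T terms, which is what motivates the efficient recursions introduced later. A small brute-force sketch, using hypothetical discrete emission probabilities p_i(y):

```python
# Brute-force evaluation of P[Y = y] as a sum over all M**T state sequences,
# matching the formula above.  The parameters are illustrative placeholders;
# the emission model is discrete with 3 possible symbols.
import itertools
import numpy as np

pi = np.array([0.6, 0.4])                    # P[X_1 = i]
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])                   # p_ij
emit = np.array([[0.5, 0.4, 0.1],            # p_i(y) = P[Y_t = y | X_t = i]
                 [0.1, 0.3, 0.6]])
y = [0, 2, 1, 2]                             # an observed sequence of symbols

def likelihood_brute_force(y):
    M, T = len(pi), len(y)
    total = 0.0
    for path in itertools.product(range(M), repeat=T):
        p = pi[path[0]] * emit[path[0], y[0]]
        for t in range(1, T):
            p *= P[path[t - 1], path[t]] * emit[path[t], y[t]]
        total += p
    return total

print(likelihood_brute_force(y))
```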

23
  • In the case when Y_1, Y_2, ..., Y_T are continuous random variables or continuous random vectors, let f_i(y) denote the conditional density of Y_t given X_t = i. Then the joint density of Y_1, Y_2, ..., Y_T is given by
  • f(y_1, y_2, ..., y_T) = f(y) = Σ_{i_1, ..., i_T} π_{i_1} f_{i_1}(y_1) ∏_{t=2..T} p_{i_{t-1} i_t} f_{i_t}(y_t)

24
Efficient Methods for Computing the Likelihood
  • The Forward Method
  • Consider the forward probabilities α_t(i) = P[Y_1 = y_1, ..., Y_t = y_t, X_t = i].

25

26
(No Transcript)
27
The Backward Procedure

28
(No Transcript)
29
(No Transcript)
30
Prediction of states from the observations and
the model

31
The Viterbi Algorithm (Viterbi Paths)
  • Suppose that we know the parameters of the Hidden
    Markov Model.
  • Suppose, in addition, that we have observed the sequence of observations Y_1, Y_2, ..., Y_T.
  • Now consider determining the sequence of states X_1, X_2, ..., X_T.

32
  • Recall that
  • P[X_1 = i_1, ..., X_T = i_T, Y_1 = y_1, ..., Y_T = y_T] = P[X = i, Y = y]
  • Consider the problem of determining the sequence of states i_1, i_2, ..., i_T that maximizes the above probability.
  • This is equivalent to maximizing
  • P[X = i | Y = y] = P[X = i, Y = y] / P[Y = y]

33
  • The Viterbi Algorithm
  • We want to maximize P[X = i, Y = y].
  • Equivalently, we want to minimize U(i_1, i_2, ..., i_T), where
  • U(i_1, i_2, ..., i_T) = -ln(P[X = i, Y = y])
  • = -ln(π_{i_1} p_{i_1}(y_1)) - Σ_{t=2..T} ln(p_{i_{t-1} i_t} p_{i_t}(y_t))

34
  • Minimization of U(i_1, i_2, ..., i_T) can be achieved by dynamic programming.
  • This can be thought of as finding the shortest path through the following grid of points, starting at the unique point in stage 0 and moving from a point in stage t to a point in stage t+1 in an optimal way.
  • The distances between points in stage t and points in stage t+1 are equal to -ln(p_{i_t i_{t+1}} p_{i_{t+1}}(y_{t+1})).

35
Dynamic Programming
36
  • By starting at the unique point in stage 0 and moving from a point in stage t to a point in stage t+1 in an optimal way.
  • The distances between points in stage t and points in stage t+1 are equal to -ln(p_{i_t i_{t+1}} p_{i_{t+1}}(y_{t+1})).

37
Dynamic Programming
38
Dynamic Programming
[Diagram: dynamic programming grid with M points in each of stages 0, 1, 2, ..., T-1, T.]
39
  • Let U_1(i_1) = -ln(π_{i_1} p_{i_1}(y_1)),
i_1 = 1, 2, ..., M.
Then
U_{t+1}(i_{t+1}) = min over i_t of [ U_t(i_t) - ln(p_{i_t i_{t+1}} p_{i_{t+1}}(y_{t+1})) ],
and the minimizing value of i_t is recorded for backtracking,
i_{t+1} = 1, 2, ..., M; t = 1, ..., T-2
40
  • Finally,
  • min over i_1, ..., i_T of U(i_1, ..., i_T) = min over i_{T-1}, i_T of [ U_{T-1}(i_{T-1}) - ln(p_{i_{T-1} i_T} p_{i_T}(y_T)) ],
  • and the optimal (Viterbi) path is recovered by backtracking through the recorded minimizing states.
41
Summary of calculations of the Viterbi Path
  • 1. U_1(i_1) = -ln(π_{i_1} p_{i_1}(y_1)), i_1 = 1, 2, ..., M.
  • 2. U_{t+1}(i_{t+1}) = min over i_t of [ U_t(i_t) - ln(p_{i_t i_{t+1}} p_{i_{t+1}}(y_{t+1})) ], recording the minimizing i_t; i_{t+1} = 1, 2, ..., M; t = 1, ..., T-2.
  • 3. Complete the final stage and backtrack through the recorded states to obtain the Viterbi path (a code sketch follows below).
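A compact Python sketch of these three steps, using the same kind of illustrative placeholder parameters as before; U[t, i] plays the role of U_t(i) and back[t, i] records the minimizing previous state.

```python
# Viterbi recursion in the -ln (shortest-path) form summarized above.
# Parameters are illustrative placeholders, not values from the slides.
import numpy as np

pi = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.3, 0.6]])            # p_i(y), discrete emissions
y = [0, 2, 1, 2, 0]

def viterbi(y):
    M, T = len(pi), len(y)
    U = np.zeros((T, M))
    back = np.zeros((T, M), dtype=int)
    U[0] = -np.log(pi) - np.log(emit[:, y[0]])           # step 1
    for t in range(1, T):                                 # step 2
        for j in range(M):
            cand = U[t - 1] - np.log(P[:, j]) - np.log(emit[j, y[t]])
            back[t, j] = np.argmin(cand)
            U[t, j] = cand[back[t, j]]
    path = [int(np.argmin(U[T - 1]))]                     # step 3: backtrack
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi(y))
```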

42
An alternative approach to prediction of states
from the observations and the model
  • It can be shown that P[X_t = i | Y = y] = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j), where α_t(i) and β_t(i) are the forward and backward probabilities defined on the next two slides.

43

Forward Probabilities  α_t(i) = P[Y_1 = y_1, ..., Y_t = y_t, X_t = i]
1. α_1(i) = π_i p_i(y_1)
2. α_{t+1}(j) = [ Σ_i α_t(i) p_ij ] p_j(y_{t+1})
44

Backward Probabilities  β_t(i) = P[Y_{t+1} = y_{t+1}, ..., Y_T = y_T | X_t = i]
1. β_T(i) = 1
2. β_t(i) = Σ_j p_ij p_j(y_{t+1}) β_{t+1}(j)  (a code sketch of both recursions follows below)
HMM generator (normal).xls
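A minimal sketch of the forward and backward recursions, together with the posterior state probabilities P[X_t = i | Y = y] from slide 42, again with illustrative placeholder parameters.

```python
# Forward and backward recursions and the resulting posterior state
# probabilities.  Parameters are illustrative placeholders.
import numpy as np

pi = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.3, 0.6]])
y = [0, 2, 1, 2, 0]

def forward(y):
    T, M = len(y), len(pi)
    alpha = np.zeros((T, M))
    alpha[0] = pi * emit[:, y[0]]                       # alpha_1(i)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ P) * emit[:, y[t]]   # alpha_{t+1}(j)
    return alpha

def backward(y):
    T, M = len(y), len(pi)
    beta = np.ones((T, M))                              # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = P @ (emit[:, y[t + 1]] * beta[t + 1])
    return beta

alpha, beta = forward(y), backward(y)
likelihood = alpha[-1].sum()                            # P[Y = y]
posterior = alpha * beta / likelihood                   # P[X_t = i | Y = y]
print(likelihood)
print(posterior)
```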
45
Estimation of Parameters of a Hidden Markov Model
  • If both the sequence of observations Y_1 = y_1, Y_2 = y_2, ..., Y_T = y_T and the sequence of states X_1 = i_1, X_2 = i_2, ..., X_T = i_T are observed, then the Likelihood is given by
  • L = π_{i_1} p_{i_1}(y_1) ∏_{t=2..T} p_{i_{t-1} i_t} p_{i_t}(y_t)

46
  • the log-Likelihood is given by ln L = ln π_{i_1} + Σ_{t=1..T} ln p_{i_t}(y_t) + Σ_{t=2..T} ln p_{i_{t-1} i_t}

47
  • In this case the Maximum Likelihood estimates are
  • the transition probability estimates p̂_ij = n_ij / Σ_k n_ik, where n_ij is the number of observed transitions from state i to state j, and
  • the MLE of θ_i computed from the observations y_t for which X_t = i.


48
MLE (states unknown)
  • If only the sequence of observations Y_1 = y_1, Y_2 = y_2, ..., Y_T = y_T is observed, then the Likelihood is given by
  • L = P[Y = y] = Σ_{i_1, ..., i_T} π_{i_1} p_{i_1}(y_1) ∏_{t=2..T} p_{i_{t-1} i_t} p_{i_t}(y_t)

49
  • It is difficult to find the Maximum Likelihood
    Estimates directly from the Likelihood function.
  • The techniques that are used are:
  • 1. The Segmental K-means Algorithm
  • 2. The Baum-Welch (E-M) Algorithm

50
The Segmental K-means Algorithm
  • In this method the parameters (the initial distribution, the transition matrix and the emission parameters) are adjusted to maximize P[X = î, Y = y],
  • where î = (î_1, î_2, ..., î_T) is the Viterbi path.

51
  • Consider this in the special case:
  • Case: The observations Y_1, Y_2, ..., Y_T are continuous multivariate Normal with mean vector μ_i and covariance matrix Σ_i when X_t = i,
  • i.e. f_i(y) = (2π)^{-d/2} |Σ_i|^{-1/2} exp( -(1/2)(y - μ_i)' Σ_i^{-1} (y - μ_i) )

52
  1. Pick arbitrarily M centroids a_1, a_2, ..., a_M. Assign each of the T observations y_t (kT if k multiple realizations are observed) to a state i_t by determining the nearest centroid: i_t = argmin_i ||y_t - a_i||.
  2. Then estimate the mean vector of each state as the average of the observations assigned to it: μ̂_i = (1/n_i) Σ_{t: i_t = i} y_t, where n_i is the number of observations assigned to state i.

53
  3. And estimate the covariance matrices, Σ̂_i = (1/n_i) Σ_{t: i_t = i} (y_t - μ̂_i)(y_t - μ̂_i)', together with the initial and transition probabilities from the counts of the assigned states.
  4. Calculate the Viterbi path (i_1, i_2, ..., i_T) based on the parameters of steps 2 and 3.
  5. If there is a change in the sequence (i_1, i_2, ..., i_T), repeat steps 2 to 4 (a compact code sketch follows below).
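A compact 1-D sketch of this iteration, under several simplifying assumptions (scalar observations, small smoothing constants on the counts, and toy data generated on the spot); it is an illustration of the procedure, not the slides' own implementation.

```python
# Segmental k-means sketch: assign observations to states (first by nearest
# centroid, afterwards by the Viterbi path), re-estimate means, variances,
# initial and transition probabilities, and repeat until the path is stable.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 50), rng.normal(4, 1, 50)])  # toy data
M = 2
centroids = np.array([-1.0, 5.0])                                 # arbitrary a_1, a_2

def viterbi_path(y, pi, P, mu, sigma):
    T = len(y)
    U = -np.log(pi) - norm.logpdf(y[0], mu, sigma)
    back = np.zeros((T, M), dtype=int)
    for t in range(1, T):
        cand = U[:, None] - np.log(P) - norm.logpdf(y[t], mu, sigma)
        back[t] = cand.argmin(axis=0)
        U = cand.min(axis=0)
    path = [int(U.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return np.array(path[::-1])

# Step 1: initial assignment by nearest centroid.
assign = np.abs(y[:, None] - centroids[None, :]).argmin(axis=1)

for _ in range(50):
    # Steps 2-3: re-estimate parameters from the current assignment.
    mu = np.array([y[assign == i].mean() for i in range(M)])
    sigma = np.array([y[assign == i].std() + 1e-6 for i in range(M)])
    pi = np.bincount(assign[:1], minlength=M) + 1e-6
    pi = pi / pi.sum()
    P = np.full((M, M), 1e-6)
    for a, b in zip(assign[:-1], assign[1:]):
        P[a, b] += 1
    P = P / P.sum(axis=1, keepdims=True)
    # Step 4: recompute the Viterbi path; step 5: stop when it is unchanged.
    new_assign = viterbi_path(y, pi, P, mu, sigma)
    if np.array_equal(new_assign, assign):
        break
    assign = new_assign

print(mu, sigma)
```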

54
The Baum-Welch (E-M) Algorithm
  • The E-M algorithm was designed originally to handle missing observations.
  • In this case the missing observations are the states X_1, X_2, ..., X_T.
  • Assuming a model, the states are estimated by finding their expected values under this model (the E step of the E-M algorithm).

55
  • With these values the model is estimated by Maximum Likelihood Estimation (the M step of the E-M algorithm).
  • The process is repeated until the estimated model converges.

56
The E-M Algorithm
  • Let f(y, x | θ) denote the joint distribution of Y, X.
  • Consider the function Q(θ | θ') = E[ ln f(y, X | θ) | Y = y, θ' ].
  • Starting with an initial estimate θ^(0) of θ, a sequence of estimates θ^(m) is formed by finding θ^(m+1) to maximize Q(θ | θ^(m)) with respect to θ.

57
  • The sequence of estimates θ^(m)
  • converges to a local maximum of the likelihood
  • L(θ) = f(y | θ).

58
  • Example: Sampling from Mixtures
  • Let y_1, y_2, ..., y_n denote a sample from the mixture density
f(y) = φ_1 g(y | θ_1) + ... + φ_m g(y | θ_m),
where φ_k ≥ 0
and φ_1 + φ_2 + ... + φ_m = 1.
59
  • Suppose that m = 2 and let x_1, x_2, ..., x_n denote independent random variables taking on the value 1 with probability φ and 0 with probability 1 - φ.
  • Suppose that y_i comes from the density g(y | θ_1) if x_i = 1 and from g(y | θ_2) if x_i = 0.
We will also assume that g(y | θ_i) is normal with mean μ_i and standard deviation σ_i.
60
  • Thus the joint distribution of x_1, x_2, ..., x_n and y_1, y_2, ..., y_n is
  • ∏_{i=1..n} [ φ g(y_i | θ_1) ]^{x_i} [ (1 - φ) g(y_i | θ_2) ]^{1 - x_i}
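A minimal E-M sketch for this two-component normal mixture: the E step computes w_i = P[x_i = 1 | y_i] and the M step updates φ, the means and the standard deviations. The data and starting values are illustrative only.

```python
# EM for a two-component normal mixture (the example above).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1.5, 100)])

phi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])  # initial guesses
for _ in range(200):
    # E-step: posterior probability that each y_i came from component 1.
    d1 = phi * norm.pdf(y, mu[0], sigma[0])
    d2 = (1 - phi) * norm.pdf(y, mu[1], sigma[1])
    w = d1 / (d1 + d2)
    # M-step: weighted maximum likelihood updates.
    phi = w.mean()
    mu = np.array([np.average(y, weights=w), np.average(y, weights=1 - w)])
    sigma = np.sqrt([np.average((y - mu[0])**2, weights=w),
                     np.average((y - mu[1])**2, weights=1 - w)])

print(phi, mu, sigma)
```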

61
(No Transcript)
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
(No Transcript)
66
  • In the case of an HMM the complete-data log-Likelihood is given by
  • ln f(y, x | θ) = ln π_{x_1} + Σ_{t=1..T} ln p_{x_t}(y_t) + Σ_{t=2..T} ln p_{x_{t-1} x_t}

67
  • Recall γ_t(i) = P[X_t = i | Y = y] = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)
  • and Σ_{t=1..T-1} γ_t(i) = Expected no. of transitions from state i.

68
  • Let ξ_t(i, j) = P[X_t = i, X_{t+1} = j | Y = y] = α_t(i) p_ij p_j(y_{t+1}) β_{t+1}(j) / P[Y = y]
  • Then Σ_{t=1..T-1} ξ_t(i, j) = Expected no. of transitions from state i to state j.

69
The E-M Re-estimation Formulae
  • Case 1: The observations Y_1, Y_2, ..., Y_T are discrete with K possible values and p_i(k) = P[Y_t = k | X_t = i] (a code sketch of the standard updates follows below).
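A sketch of one pass of the standard Baum-Welch (E-M) re-estimation for this discrete case, built from the quantities γ_t(i) and ξ_t(i, j) defined on the previous slides; all numerical values are placeholders, and the exact presentation on the slides may differ.

```python
# One Baum-Welch re-estimation pass for discrete observations with K symbols.
import numpy as np

pi = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.3, 0.6]])              # p_i(k), K = 3 symbols
y = np.array([0, 2, 1, 2, 0, 0, 1, 2])

def baum_welch_step(y, pi, P, emit):
    T, M = len(y), len(pi)
    alpha = np.zeros((T, M)); beta = np.ones((T, M))
    alpha[0] = pi * emit[:, y[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ P) * emit[:, y[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = P @ (emit[:, y[t + 1]] * beta[t + 1])
    L = alpha[-1].sum()                          # P[Y = y]
    gamma = alpha * beta / L                     # P[X_t = i | Y = y]
    xi = np.array([np.outer(alpha[t], emit[:, y[t + 1]] * beta[t + 1]) * P / L
                   for t in range(T - 1)])       # P[X_t = i, X_{t+1} = j | Y = y]
    # Re-estimation.
    new_pi = gamma[0]
    new_P = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_emit = np.array([[gamma[y == k, i].sum() for k in range(emit.shape[1])]
                         for i in range(M)]) / gamma.sum(axis=0)[:, None]
    return new_pi, new_P, new_emit

print(baum_welch_step(y, pi, P, emit))
```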

70

71
  • Case 2: The observations Y_1, Y_2, ..., Y_T are continuous multivariate Normal with mean vector μ_i and covariance matrix Σ_i when X_t = i,
  • i.e. f_i(y) = (2π)^{-d/2} |Σ_i|^{-1/2} exp( -(1/2)(y - μ_i)' Σ_i^{-1} (y - μ_i) )

72

73
Measuring distance between two HMMs
  • Let λ_1 = (π^(1), P^(1), θ^(1))
  • and λ_2 = (π^(2), P^(2), θ^(2))
  • denote the parameters of two different HMM models. We now consider defining a distance between these two models.

74
The Kullback-Leibler distance
  • Consider the two discrete distributions
  • p_1 = (p_1(1), ..., p_1(K)) and p_2 = (p_2(1), ..., p_2(K))
  • (densities f_1(y) and f_2(y) in the continuous case)
  • then define I(1, 2) = Σ_x p_1(x) ln [ p_1(x) / p_2(x) ]

75
  • and in the continuous case, I(1, 2) = ∫ f_1(y) ln [ f_1(y) / f_2(y) ] dy

76
  • These measures of distance between the two distributions are not symmetric but can be made symmetric by the following:
  • J(1, 2) = (1/2) [ I(1, 2) + I(2, 1) ]
  • (a small code sketch follows below).
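A small sketch of these quantities for two discrete distributions; the averaging used in the symmetrized version is one common choice. The two distributions are arbitrary illustrative values.

```python
# Kullback-Leibler distance I(1, 2) and a symmetrized version.
import numpy as np

p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.2, 0.5, 0.3])

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))     # I(p, q); not symmetric

symmetric = 0.5 * (kl(p1, p2) + kl(p2, p1))     # one common symmetrization
print(kl(p1, p2), kl(p2, p1), symmetric)
```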

77
  • In the case of a Hidden Markov model,
  • I(1, 2) = Σ_y P[Y = y | λ_1] ln ( P[Y = y | λ_1] / P[Y = y | λ_2] ),
  • where the sum is over all possible observation sequences y and P[Y = y | λ_k] is the likelihood of y under model λ_k.
  • The computation of I(1, 2) in this case is formidable.

78
Juang and Rabiner distance
  • Let y^(2) = (y_1^(2), y_2^(2), ..., y_T^(2)) denote a sequence of observations generated from the HMM with parameters λ_2.
  • Let x̂^(2) denote the optimal (Viterbi) sequence of states for y^(2) assuming HMM model λ_1.

79
  • Then the Juang and Rabiner distance between λ_1 and λ_2 is defined from these quantities, together with its symmetrized version (an illustrative form is sketched in the code below).
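As an illustration only, the sketch below uses one commonly cited form of the Juang and Rabiner distance: generate a long observation sequence from one model and compare the average log-likelihood per observation under the two models, computed with a scaled forward recursion. The exact definition used on the slides may differ, and all parameters are placeholders.

```python
# Monte Carlo sketch of a Juang-Rabiner style distance between two HMMs.
import numpy as np

rng = np.random.default_rng(3)

def sample(pi, P, emit, T):
    x = rng.choice(len(pi), p=pi); ys = []
    for _ in range(T):
        ys.append(rng.choice(emit.shape[1], p=emit[x]))
        x = rng.choice(len(pi), p=P[x])
    return np.array(ys)

def loglik(y, pi, P, emit):
    """log P[Y = y] via the forward recursion with per-step scaling."""
    alpha = pi * emit[:, y[0]]; ll = 0.0
    for t in range(1, len(y) + 1):
        c = alpha.sum(); ll += np.log(c); alpha = alpha / c
        if t < len(y):
            alpha = (alpha @ P) * emit[:, y[t]]
    return ll

lam1 = (np.array([0.5, 0.5]), np.array([[0.9, 0.1], [0.1, 0.9]]),
        np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]))
lam2 = (np.array([0.5, 0.5]), np.array([[0.7, 0.3], [0.3, 0.7]]),
        np.array([[0.4, 0.4, 0.2], [0.2, 0.4, 0.4]]))

T = 5000
y2 = sample(*lam2, T)
D12 = (loglik(y2, *lam2) - loglik(y2, *lam1)) / T   # how much worse lam1 fits data from lam2
y1 = sample(*lam1, T)
D21 = (loglik(y1, *lam1) - loglik(y1, *lam2)) / T
print(D12, D21, 0.5 * (D12 + D21))                  # symmetrized version
```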