Title: Hidden Markov Models
Hidden Markov Models

A Hidden Markov Model consists of
- A sequence of states $\{X_t : t \in T\} = X_1, X_2, \ldots, X_T$, and
- A sequence of observations $\{Y_t : t \in T\} = Y_1, Y_2, \ldots, Y_T$.
- The sequence of states $X_1, X_2, \ldots, X_T$ forms a Markov chain moving amongst the $M$ states $1, 2, \ldots, M$.
- The observation $Y_t$ comes from a distribution that is determined by the current state of the process, $X_t$ (or possibly by past observations and past states).
- The states $X_1, X_2, \ldots, X_T$ are unobserved (hence hidden).
[Diagram: the Hidden Markov Model. The hidden states $X_1, X_2, X_3, \ldots, X_T$ form a chain, and each state $X_t$ emits the observation $Y_t$.]
- Some basic problems, given the observations $Y_1, Y_2, \ldots, Y_T$:
- 1. Determine the sequence of states $X_1, X_2, \ldots, X_T$.
- 2. Determine (or estimate) the parameters of the stochastic process that is generating the states and the observations.
Examples
Example 1
- A person is rolling two sets of dice (one balanced, the other unbalanced), switching between the two sets according to a Markov transition matrix.
- The states are the dice.
- The observations are the numbers rolled each time. (A simulation sketch follows below.)
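As an illustration, here is a minimal Python simulation of such a two-dice HMM; the transition matrix and the unbalanced-die face probabilities below are hypothetical values chosen for the sketch, not taken from the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical transition matrix between the two dice (states 0 and 1).
    P = np.array([[0.95, 0.05],
                  [0.10, 0.90]])
    # Emission probabilities for faces 1..6: balanced vs. (hypothetical) unbalanced.
    emit = np.array([[1/6] * 6,
                     [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])

    T = 20
    states, rolls = [], []
    x = 0                                            # start with the balanced die
    for t in range(T):
        rolls.append(int(rng.choice(6, p=emit[x])) + 1)  # observed roll Y_t
        states.append(x)                                 # hidden state X_t
        x = int(rng.choice(2, p=P[x]))                   # Markov transition
    print(states)
    print(rolls)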
Balanced Dice

Unbalanced Dice
Example 2
- The Markov chain has two states.
- The observations (given the states) are independent Normal.
- Both the mean and the variance depend on the state.
- HMM AR.xls
Example 3: Dow Jones
[Chart: daily changes of the Dow Jones index.]
Hidden Markov Model? Bear and bull market?
Speech Recognition
- When a word is spoken, the vocalization process goes through a sequence of states.
- The sound produced is relatively constant while the process remains in the same state.
- Recognizing the sequence of states and the duration of each state allows one to recognize the word being spoken.
- The interval of time when the word is spoken is broken into small (possibly overlapping) subintervals.
- In each subinterval one measures the amplitudes of various frequencies in the sound (using Fourier analysis). The vector of amplitudes $Y_t$ is assumed to have a multivariate normal distribution in each state, with the mean vector and covariance matrix being state dependent.
Hidden Markov Models for Biological Sequences
- Consider the motif ATCGACACGTATGGC.
- Some realizations:
  A C A - - - A T G
  T C A A C T A T C
  A C A C - - A G C
  A G A - - - A T C
  A C C G - - A T C
Hidden Markov model of the same motif: ATCGACACGTATGGC
Profile HMMs
Computing Likelihood
- Let $p_{ij} = P(X_{t+1} = j \mid X_t = i)$ and let $\mathbf{P} = (p_{ij})$ be the $M \times M$ transition matrix.
- Let $\pi_i = P(X_1 = i)$ and let $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_M)$ be the initial distribution over the states.
- Now assume that
  $P(Y_t = y_t \mid X_1 = i_1, X_2 = i_2, \ldots, X_t = i_t) = P(Y_t = y_t \mid X_t = i_t) = p_{i_t}(y_t)$.
- Then
  $P(X_1 = i_1, \ldots, X_T = i_T, Y_1 = y_1, \ldots, Y_T = y_T) = P(X = i, Y = y) = \pi_{i_1}\, p_{i_1}(y_1) \prod_{t=2}^{T} p_{i_{t-1} i_t}\, p_{i_t}(y_t)$.
- Therefore
  $P(Y_1 = y_1, \ldots, Y_T = y_T) = P(Y = y) = \sum_{i_1, \ldots, i_T} \pi_{i_1}\, p_{i_1}(y_1) \prod_{t=2}^{T} p_{i_{t-1} i_t}\, p_{i_t}(y_t)$,
  where the sum runs over all $M^T$ possible state sequences.
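Evaluated directly, this sum enumerates all $M^T$ state sequences, so it is only feasible for very small $T$; a brute-force sketch in Python, assuming discrete emissions stored as a matrix with entries emit[i, y] = $p_i(y)$:

    from itertools import product
    import numpy as np

    def likelihood_brute(y, pi, P, emit):
        """P(Y = y) by summing the joint probability over all M**T state paths."""
        M, T = len(pi), len(y)
        total = 0.0
        for path in product(range(M), repeat=T):
            p = pi[path[0]] * emit[path[0], y[0]]
            for t in range(1, T):
                p *= P[path[t-1], path[t]] * emit[path[t], y[t]]
            total += p
        return total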
- In the case when $Y_1, Y_2, \ldots, Y_T$ are continuous random variables or continuous random vectors, let $f_i(y)$ denote the conditional density of $Y_t$ given $X_t = i$. Then the joint density of $Y_1, Y_2, \ldots, Y_T$ is given by
  $f(y_1, y_2, \ldots, y_T) = \sum_{i_1, \ldots, i_T} \pi_{i_1}\, f_{i_1}(y_1) \prod_{t=2}^{T} p_{i_{t-1} i_t}\, f_{i_t}(y_t)$.
Efficient Methods for computing Likelihood
- The Forward Method
- Consider the forward probabilities $\alpha_t(i) = P(Y_1 = y_1, \ldots, Y_t = y_t, X_t = i)$. They satisfy
  $\alpha_1(i) = \pi_i\, p_i(y_1)$ and $\alpha_{t+1}(j) = \Big[\sum_{i=1}^{M} \alpha_t(i)\, p_{ij}\Big]\, p_j(y_{t+1})$,
  so that $P(Y = y) = \sum_{i=1}^{M} \alpha_T(i)$ can be computed in $O(M^2 T)$ operations rather than $O(M^T)$.
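A minimal sketch of this recursion in Python, under the same discrete-emission convention as above (unscaled, so it can underflow for long sequences; practical implementations rescale each $\alpha_t$):

    import numpy as np

    def forward(y, pi, P, emit):
        """alpha[t, i] = P(Y_1..Y_t, X_t = i); returns alpha and P(Y = y)."""
        T, M = len(y), len(pi)
        alpha = np.zeros((T, M))
        alpha[0] = pi * emit[:, y[0]]                      # alpha_1(i)
        for t in range(1, T):
            alpha[t] = (alpha[t-1] @ P) * emit[:, y[t]]    # forward recursion
        return alpha, alpha[-1].sum()                      # P(Y=y) = sum_i alpha_T(i)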
The Backward Procedure
- Consider the backward probabilities $\beta_t(i) = P(Y_{t+1} = y_{t+1}, \ldots, Y_T = y_T \mid X_t = i)$. They satisfy
  $\beta_T(i) = 1$ and $\beta_t(i) = \sum_{j=1}^{M} p_{ij}\, p_j(y_{t+1})\, \beta_{t+1}(j)$,
  so that $P(Y = y) = \sum_{i=1}^{M} \pi_i\, p_i(y_1)\, \beta_1(i)$.
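A matching sketch of the backward recursion; forward(y, pi, P, emit) and backward(y, pi, P, emit) should return the same value of $P(Y = y)$:

    import numpy as np

    def backward(y, pi, P, emit):
        """beta[t, i] = P(Y_{t+1}..Y_T | X_t = i); returns beta and P(Y = y)."""
        T, M = len(y), len(pi)
        beta = np.zeros((T, M))
        beta[-1] = 1.0                                     # beta_T(i) = 1
        for t in range(T - 2, -1, -1):
            beta[t] = P @ (emit[:, y[t+1]] * beta[t+1])    # backward recursion
        return beta, (pi * emit[:, y[0]] * beta[0]).sum()  # P(Y=y)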
Prediction of states from the observations and the model
The Viterbi Algorithm (Viterbi Paths)
- Suppose that we know the parameters of the Hidden Markov Model.
- Suppose, in addition, that we have observed the sequence of observations $Y_1, Y_2, \ldots, Y_T$.
- Now consider determining the sequence of states $X_1, X_2, \ldots, X_T$.
- Recall that
  $P(X_1 = i_1, \ldots, X_T = i_T, Y_1 = y_1, \ldots, Y_T = y_T) = P(X = i, Y = y) = \pi_{i_1}\, p_{i_1}(y_1) \prod_{t=2}^{T} p_{i_{t-1} i_t}\, p_{i_t}(y_t)$.
- Consider the problem of determining the sequence of states $i_1, i_2, \ldots, i_T$ that maximizes the above probability.
- This is equivalent to maximizing
  $P(X = i \mid Y = y) = P(X = i, Y = y)\,/\,P(Y = y)$,
  since the denominator does not depend on the state sequence.
- The Viterbi Algorithm
- We want to maximize $P(X = i, Y = y)$.
- Equivalently, we want to minimize $U(i_1, i_2, \ldots, i_T)$, where
  $U(i_1, i_2, \ldots, i_T) = -\ln P(X = i, Y = y) = -\ln\big[\pi_{i_1}\, p_{i_1}(y_1)\big] - \sum_{t=2}^{T} \ln\big[p_{i_{t-1} i_t}\, p_{i_t}(y_t)\big]$.
- Minimization of $U(i_1, i_2, \ldots, i_T)$ can be achieved by dynamic programming.
- This can be thought of as finding the shortest path through the following grid of points, starting at the unique point in stage 0 and moving from a point in stage $t$ to a point in stage $t+1$ in an optimal way.
- The distance from the stage-0 point to point $i_1$ in stage 1 is $-\ln\big[\pi_{i_1}\, p_{i_1}(y_1)\big]$, and the distance between point $i_t$ in stage $t$ and point $i_{t+1}$ in stage $t+1$ is $-\ln\big[p_{i_t i_{t+1}}\, p_{i_{t+1}}(y_{t+1})\big]$.
Dynamic Programming
[Diagram: the dynamic-programming grid, with the single starting point at stage 0 and $M$ points in each of stages $1, 2, \ldots, T$.]
- For $i_1 = 1, 2, \ldots, M$, let $\delta_1(i_1) = -\ln\big[\pi_{i_1}\, p_{i_1}(y_1)\big]$.
- Then, for $i_{t+1} = 1, 2, \ldots, M$ and $t = 1, \ldots, T-2$,
  $\delta_{t+1}(i_{t+1}) = \min_{i_t}\Big\{\delta_t(i_t) - \ln\big[p_{i_t i_{t+1}}\, p_{i_{t+1}}(y_{t+1})\big]\Big\}$,
- and finally
  $U_{\min} = \min_{i_T}\, \min_{i_{T-1}}\Big\{\delta_{T-1}(i_{T-1}) - \ln\big[p_{i_{T-1} i_T}\, p_{i_T}(y_T)\big]\Big\}$.
Summary of calculations of Viterbi Path
- 1. $\delta_1(i_1) = -\ln\big[\pi_{i_1}\, p_{i_1}(y_1)\big]$, for $i_1 = 1, 2, \ldots, M$.
- 2. $\delta_{t+1}(i_{t+1}) = \min_{i_t}\big\{\delta_t(i_t) - \ln\big[p_{i_t i_{t+1}}\, p_{i_{t+1}}(y_{t+1})\big]\big\}$, recording the minimizing $i_t$, for $i_{t+1} = 1, 2, \ldots, M$ and $t = 1, \ldots, T-2$.
- 3. Complete the final stage as above, then recover the Viterbi path $(\hat{i}_1, \ldots, \hat{i}_T)$ by backtracking through the recorded minimizers.
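A compact sketch of the algorithm in Python, working directly with log probabilities (maximizing $\ln P(X = i, Y = y)$ is the same as minimizing $U$); discrete emissions and strictly positive $\boldsymbol{\pi}$, $\mathbf{P}$, and emission probabilities are assumed:

    import numpy as np

    def viterbi(y, pi, P, emit):
        """Most probable state path; maximizes ln P(X = i, Y = y) (= minimizes U)."""
        T, M = len(y), len(pi)
        logd = np.log(pi) + np.log(emit[:, y[0]])          # equals -delta_1
        back = np.zeros((T, M), dtype=int)                 # recorded minimizers
        for t in range(1, T):
            cand = logd[:, None] + np.log(P)               # cand[i, j]: extend path i -> j
            back[t] = cand.argmax(axis=0)                  # best predecessor of each j
            logd = cand.max(axis=0) + np.log(emit[:, y[t]])
        path = [int(logd.argmax())]                        # best final state
        for t in range(T - 1, 0, -1):                      # backtrack
            path.append(int(back[t][path[-1]]))
        return path[::-1]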
An alternative approach to prediction of states from the observations and the model
Forward Probabilities
- 1. $\alpha_t(i) = P(Y_1 = y_1, \ldots, Y_t = y_t, X_t = i)$
- 2. $\alpha_1(i) = \pi_i\, p_i(y_1)$, $\quad \alpha_{t+1}(j) = \Big[\sum_{i=1}^{M} \alpha_t(i)\, p_{ij}\Big]\, p_j(y_{t+1})$
Backward Probabilities
- 1. $\beta_t(i) = P(Y_{t+1} = y_{t+1}, \ldots, Y_T = y_T \mid X_t = i)$
- 2. $\beta_T(i) = 1$, $\quad \beta_t(i) = \sum_{j=1}^{M} p_{ij}\, p_j(y_{t+1})\, \beta_{t+1}(j)$
- HMM generator (normal).xls
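Combining the two sets of probabilities gives the posterior state probabilities $P(X_t = i \mid Y = y) = \alpha_t(i)\,\beta_t(i)\,/\,P(Y = y)$, which can be maximized separately at each $t$; a sketch that reuses the forward and backward functions defined earlier:

    # Posterior ("smoothed") state probabilities, reusing forward/backward above.
    def posterior_states(y, pi, P, emit):
        alpha, py = forward(y, pi, P, emit)
        beta, _ = backward(y, pi, P, emit)
        gamma = alpha * beta / py           # gamma[t, i] = P(X_t = i | Y = y)
        return gamma, gamma.argmax(axis=1)  # most probable state at each t

Unlike the Viterbi path, this picks the most probable state at each time individually; the two answers can differ.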
Estimation of Parameters of a Hidden Markov Model
- If both the sequence of observations $Y_1 = y_1, \ldots, Y_T = y_T$ and the sequence of states $X_1 = i_1, \ldots, X_T = i_T$ are observed, then the likelihood is given by
  $L = \pi_{i_1}\, p_{i_1}(y_1) \prod_{t=2}^{T} p_{i_{t-1} i_t}\, p_{i_t}(y_t)$.
- The log-likelihood is given by
  $\ln L = \ln \pi_{i_1} + \sum_{t=2}^{T} \ln p_{i_{t-1} i_t} + \sum_{t=1}^{T} \ln p_{i_t}(y_t)$.
- In this case the maximum likelihood estimates are
  $\hat{p}_{ij} = n_{ij} \big/ \sum_{j'} n_{ij'}$, where $n_{ij}$ is the observed number of transitions from state $i$ to state $j$, and
- the MLE of $\theta_i$ (the parameters of the emission distribution of state $i$) computed from the observations $y_t$ where $X_t = i$.
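For the transition matrix this amounts to counting transitions; a minimal sketch (rows for states that are never left are undefined and would need a convention):

    import numpy as np

    def mle_transitions(states, M):
        """MLE of the transition matrix from a fully observed state sequence."""
        n = np.zeros((M, M))
        for a, b in zip(states[:-1], states[1:]):
            n[a, b] += 1                            # n_ij = no. of i -> j transitions
        return n / n.sum(axis=1, keepdims=True)     # p_ij-hat = n_ij / sum_j n_ij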
MLE (states unknown)
- If only the sequence of observations $Y_1 = y_1, \ldots, Y_T = y_T$ is observed, then the likelihood is given by
  $L = P(Y = y) = \sum_{i_1, \ldots, i_T} \pi_{i_1}\, p_{i_1}(y_1) \prod_{t=2}^{T} p_{i_{t-1} i_t}\, p_{i_t}(y_t)$.
- It is difficult to find the maximum likelihood estimates directly from this likelihood function.
- The techniques that are used are
- 1. The Segmental K-means Algorithm
- 2. The Baum-Welch (E-M) Algorithm
The Segmental K-means Algorithm
- In this method the parameters $\lambda = (\boldsymbol{\pi}, \mathbf{P}, \theta_1, \ldots, \theta_M)$ are adjusted to maximize $P(X = \hat{i}, Y = y)$,
- where $\hat{i} = (\hat{i}_1, \hat{i}_2, \ldots, \hat{i}_T)$ is the Viterbi path.
- Consider this with the special case where the observations $Y_1, Y_2, \ldots, Y_T$ are continuous multivariate normal with mean vector $\boldsymbol{\mu}_i$ and covariance matrix $\boldsymbol{\Sigma}_i$ when $X_t = i$, i.e.
  $f_i(y) = (2\pi)^{-d/2}\, \lvert \boldsymbol{\Sigma}_i \rvert^{-1/2} \exp\big\{-\tfrac{1}{2}(y - \boldsymbol{\mu}_i)' \boldsymbol{\Sigma}_i^{-1}(y - \boldsymbol{\mu}_i)\big\}$.
- 1. Pick arbitrarily $M$ centroids $a_1, a_2, \ldots, a_M$. Assign each of the $T$ observations $y_t$ ($kT$ if $k$ multiple realizations are observed) to a state $i_t$ by determining the nearest centroid: $\hat{i}_t = \arg\min_i \lVert y_t - a_i \rVert$.
- 2. Then re-estimate the initial and transition probabilities from the assigned sequence: $\hat{p}_{ij} = n_{ij} / \sum_{j'} n_{ij'}$.
- 3. And re-estimate the state parameters: $\hat{\boldsymbol{\mu}}_i$ and $\hat{\boldsymbol{\Sigma}}_i$ are the sample mean vector and covariance matrix of the observations assigned to state $i$.
- 4. Calculate the Viterbi path $(\hat{i}_1, \hat{i}_2, \ldots, \hat{i}_T)$ based on the parameters of steps 2 and 3.
- 5. If there is a change in the sequence $(\hat{i}_1, \hat{i}_2, \ldots, \hat{i}_T)$, repeat steps 2 to 4. (A sketch follows below.)
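A sketch of this scheme for univariate normal emissions; the quantile-based initial centroids and the small 1e-3 smoothing constants (added so no probability or standard deviation is exactly zero) are illustrative assumptions, not part of the algorithm as stated:

    import numpy as np
    from scipy.stats import norm

    def viterbi_normal(y, pi, P, mu, sd):
        """Viterbi path for normal emissions, in log space."""
        T, M = len(y), len(pi)
        logf = norm.logpdf(np.asarray(y)[:, None], mu, sd)  # logf[t, i] = ln f_i(y_t)
        logd = np.log(pi) + logf[0]
        back = np.zeros((T, M), dtype=int)
        for t in range(1, T):
            cand = logd[:, None] + np.log(P)
            back[t] = cand.argmax(axis=0)
            logd = cand.max(axis=0) + logf[t]
        path = [int(logd.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return np.array(path[::-1])

    def segmental_kmeans(y, M, iters=20):
        y = np.asarray(y, float)
        # Step 1: initial assignment by nearest of M (arbitrary) centroids.
        a = np.quantile(y, np.linspace(0.1, 0.9, M))
        path = np.abs(y[:, None] - a).argmin(axis=1)
        for _ in range(iters):
            # Steps 2-3: re-estimate parameters from the current assignment.
            pi = np.bincount(path[:1], minlength=M) + 1e-3
            pi = pi / pi.sum()
            P = np.full((M, M), 1e-3)
            for s, u in zip(path[:-1], path[1:]):
                P[s, u] += 1
            P = P / P.sum(axis=1, keepdims=True)
            mu = np.array([y[path == i].mean() if np.any(path == i) else a[i]
                           for i in range(M)])
            sd = np.array([y[path == i].std() + 1e-3 if np.any(path == i) else 1.0
                           for i in range(M)])
            # Step 4: new Viterbi path; step 5: stop when it no longer changes.
            new = viterbi_normal(y, pi, P, mu, sd)
            if np.array_equal(new, path):
                break
            path = new
        return pi, P, mu, sd, path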
The Baum-Welch (E-M) Algorithm
- The E-M algorithm was originally designed to handle missing observations.
- In this case the missing observations are the states $X_1, X_2, \ldots, X_T$.
- Assuming a model, the states are estimated by finding their expected values under this model (the E part of the E-M algorithm).
- With these values the model is re-estimated by maximum likelihood estimation (the M part of the E-M algorithm).
- The process is repeated until the estimated model converges.
The E-M Algorithm
- Let $f(y, x \mid \lambda)$ denote the joint distribution of $Y, X$ under parameters $\lambda$.
- Consider the function
  $Q(\lambda, \lambda') = E_{\lambda'}\big[\ln f(Y, X \mid \lambda) \mid Y = y\big]$.
- Starting with an initial estimate $\lambda^{(0)}$, a sequence of estimates $\lambda^{(m)}$ is formed by finding $\lambda^{(m+1)}$ to maximize $Q(\lambda, \lambda^{(m)})$ with respect to $\lambda$.
- The sequence of estimates $\lambda^{(0)}, \lambda^{(1)}, \lambda^{(2)}, \ldots$ converges to a local maximum of the likelihood $L(\lambda) = f(y \mid \lambda)$.
- Example: Sampling from Mixtures
- Let $y_1, y_2, \ldots, y_n$ denote a sample from the density
  $f(y) = \sum_{i=1}^{m} \phi_i\, g(y \mid \theta_i)$,
  where $0 \le \phi_i \le 1$ and $\sum_{i=1}^{m} \phi_i = 1$.
- Suppose that $m = 2$ and let $x_1, x_2, \ldots, x_n$ denote independent random variables taking on the value 1 with probability $\phi$ and 0 with probability $1 - \phi$.
- Suppose that $y_i$ comes from the density $g(y \mid \theta_1)$ if $x_i = 1$ and from $g(y \mid \theta_2)$ if $x_i = 0$.
- We will also assume that $g(y \mid \theta_i)$ is normal with mean $\mu_i$ and standard deviation $\sigma_i$.
- Thus the joint distribution of $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ is
  $f(x, y) = \prod_{i=1}^{n} \big[\phi\, g(y_i \mid \theta_1)\big]^{x_i} \big[(1 - \phi)\, g(y_i \mid \theta_2)\big]^{1 - x_i}$.
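Carrying the E and M steps through for this model yields standard closed-form updates: the E step computes $w_i = E[x_i \mid y_i]$, the posterior probability that $y_i$ came from component 1, and the M step replaces $\phi, \mu_1, \mu_2, \sigma_1, \sigma_2$ by the corresponding weighted MLEs. A minimal sketch (initial values are arbitrary):

    import numpy as np
    from scipy.stats import norm

    def em_mixture(y, iters=100):
        """E-M for a two-component normal mixture: phi, mu1, mu2, s1, s2."""
        y = np.asarray(y, float)
        phi, mu1, mu2 = 0.5, y.min(), y.max()   # arbitrary starting values
        s1 = s2 = y.std()
        for _ in range(iters):
            # E step: w_i = E[x_i | y_i], posterior prob. of component 1.
            a = phi * norm.pdf(y, mu1, s1)
            b = (1 - phi) * norm.pdf(y, mu2, s2)
            w = a / (a + b)
            # M step: weighted maximum likelihood estimates.
            phi = w.mean()
            mu1 = (w * y).sum() / w.sum()
            mu2 = ((1 - w) * y).sum() / (1 - w).sum()
            s1 = np.sqrt((w * (y - mu1) ** 2).sum() / w.sum())
            s2 = np.sqrt(((1 - w) * (y - mu2) ** 2).sum() / (1 - w).sum())
        return phi, mu1, mu2, s1, s2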
- In the case of an HMM the (complete-data) log-likelihood is given by
  $\ln L = \ln \pi_{i_1} + \sum_{t=2}^{T} \ln p_{i_{t-1} i_t} + \sum_{t=1}^{T} \ln p_{i_t}(y_t)$.
- Recall the forward probabilities $\alpha_t(i)$ and the backward probabilities $\beta_t(i)$, and let
  $\gamma_t(i) = P(X_t = i \mid Y = y) = \alpha_t(i)\,\beta_t(i)\,/\,P(Y = y)$.
- Then $\sum_{t=1}^{T-1} \gamma_t(i)$ = expected no. of transitions from state $i$.
- Let
  $\xi_t(i, j) = P(X_t = i, X_{t+1} = j \mid Y = y) = \alpha_t(i)\, p_{ij}\, p_j(y_{t+1})\, \beta_{t+1}(j)\,/\,P(Y = y)$.
- Then $\sum_{t=1}^{T-1} \xi_t(i, j)$ = expected no. of transitions from state $i$ to state $j$.
The E-M Re-estimation Formulae
- Case 1: The observations $Y_1, Y_2, \ldots, Y_T$ are discrete with $K$ possible values. Then
  $\hat{\pi}_i = \gamma_1(i)$,
  $\hat{p}_{ij} = \sum_{t=1}^{T-1} \xi_t(i, j) \Big/ \sum_{t=1}^{T-1} \gamma_t(i)$, and
  $\hat{p}_i(k) = \sum_{t\,:\,y_t = k} \gamma_t(i) \Big/ \sum_{t=1}^{T} \gamma_t(i)$, for $k = 1, \ldots, K$.
- Case 2: The observations $Y_1, Y_2, \ldots, Y_T$ are continuous multivariate normal with mean vector $\boldsymbol{\mu}_i$ and covariance matrix $\boldsymbol{\Sigma}_i$ when $X_t = i$. Then
  $\hat{\boldsymbol{\mu}}_i = \sum_{t=1}^{T} \gamma_t(i)\, y_t \Big/ \sum_{t=1}^{T} \gamma_t(i)$ and
  $\hat{\boldsymbol{\Sigma}}_i = \sum_{t=1}^{T} \gamma_t(i)\,(y_t - \hat{\boldsymbol{\mu}}_i)(y_t - \hat{\boldsymbol{\mu}}_i)' \Big/ \sum_{t=1}^{T} \gamma_t(i)$.
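Putting the pieces together for the discrete case (Case 1), here is a sketch of a single Baum-Welch re-estimation step, built from the unscaled forward and backward recursions shown earlier (short sequences only; practical code rescales):

    import numpy as np

    def baum_welch_step(y, pi, P, emit):
        """One E-M re-estimation step for an HMM with discrete emissions."""
        y = np.asarray(y)
        T, M = len(y), len(pi)
        K = emit.shape[1]
        # Forward and backward passes (unscaled).
        alpha = np.zeros((T, M)); beta = np.ones((T, M))
        alpha[0] = pi * emit[:, y[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t-1] @ P) * emit[:, y[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = P @ (emit[:, y[t+1]] * beta[t+1])
        py = alpha[-1].sum()                             # P(Y = y)
        gamma = alpha * beta / py                        # gamma[t, i] = P(X_t=i | Y=y)
        # xi[t, i, j] = P(X_t = i, X_{t+1} = j | Y = y)
        xi = (alpha[:-1, :, None] * P[None] *
              (emit[:, y[1:]].T * beta[1:])[:, None, :]) / py
        # Re-estimation formulae (Case 1).
        pi_new = gamma[0]
        P_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        emit_new = np.zeros((M, K))
        for k in range(K):
            emit_new[:, k] = gamma[y == k].sum(axis=0)
        emit_new /= gamma.sum(axis=0)[:, None]
        return pi_new, P_new, emit_new

Iterating this step until the parameters stabilize implements the full Baum-Welch algorithm.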
Measuring distance between two HMMs
- Let $\lambda_1 = (\boldsymbol{\pi}^{(1)}, \mathbf{P}^{(1)}, \theta^{(1)})$
- and $\lambda_2 = (\boldsymbol{\pi}^{(2)}, \mathbf{P}^{(2)}, \theta^{(2)})$
- denote the parameters of two different HMM models. We now consider defining a distance between these two models.
The Kullback-Leibler distance
- Consider the two discrete distributions $\mathbf{p} = (p_1, \ldots, p_K)$
- and $\mathbf{q} = (q_1, \ldots, q_K)$
- ($f(y)$ and $g(y)$ in the continuous case),
- then define
  $I(\mathbf{p}, \mathbf{q}) = \sum_{k=1}^{K} p_k \ln\!\big(p_k / q_k\big)$.
- and in the continuous case
  $I(f, g) = \int f(y) \ln\!\big(f(y)/g(y)\big)\, dy$.
- These measures of distance between the two distributions are not symmetric, but they can be made symmetric by defining
  $J(\mathbf{p}, \mathbf{q}) = \tfrac{1}{2}\big[I(\mathbf{p}, \mathbf{q}) + I(\mathbf{q}, \mathbf{p})\big]$.
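A small sketch for the discrete case, assuming strictly positive probabilities:

    import numpy as np

    def kl(p, q):
        """I(p, q) = sum_k p_k ln(p_k / q_k); p, q strictly positive."""
        p, q = np.asarray(p, float), np.asarray(q, float)
        return float((p * np.log(p / q)).sum())

    def symmetric_kl(p, q):
        """Symmetrized distance J(p, q) = (I(p, q) + I(q, p)) / 2."""
        return 0.5 * (kl(p, q) + kl(q, p))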
- In the case of a Hidden Markov model,
- $\mathbf{p}$ and $\mathbf{q}$ are the distributions of the observation sequence $Y_1, \ldots, Y_T$ under $\lambda_1$ and $\lambda_2$ respectively.
- The computation of $I(\lambda_1, \lambda_2)$ in this case is formidable, since each distribution is a sum over all $M^T$ state sequences.
Juang and Rabiner distance
- Let $y^{(2)} = (y_1^{(2)}, y_2^{(2)}, \ldots, y_T^{(2)})$ denote a sequence of observations generated from the HMM with parameters $\lambda_2$.
- Let $\hat{i}(y^{(2)}; \lambda_1)$
- denote the optimal (Viterbi) sequence of states assuming HMM model $\lambda_1$.
- The distance is then based on comparing, per observation, the log-likelihood of $y^{(2)}$ along its Viterbi path under $\lambda_1$ with the corresponding quantity under $\lambda_2$.