Title: Hidden Markov Models
A Hidden Markov Model consists of
- A sequence of states $\{X_t : t \in T\} = \{X_1, X_2, \dots, X_T\}$, and
- A sequence of observations $\{Y_t : t \in T\} = \{Y_1, Y_2, \dots, Y_T\}$
- The sequence of states $X_1, X_2, \dots, X_T$ forms a Markov chain moving amongst the $M$ states $\{1, 2, \dots, M\}$.
- The observation $Y_t$ comes from a distribution that is determined by the current state of the process, $X_t$ (or possibly by past observations and past states).
- The states $X_1, X_2, \dots, X_T$ are unobserved (hence hidden).
The Hidden Markov Model
Some basic problems
- From the observations $Y_1, Y_2, \dots, Y_T$:
- 1. Determine the sequence of states $X_1, X_2, \dots, X_T$.
- 2. Determine (or estimate) the parameters of the stochastic process that is generating the states and the observations.
Example 1
- A person is rolling two sets of dice (one is balanced, the other is unbalanced). He switches between the two sets of dice using a Markov transition matrix.
- The states are the dice.
- The observations are the numbers rolled each time.
Balanced Dice
Unbalanced Dice
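The dice example can be simulated in a few lines. The sketch below is only an illustration: the transition matrix, the initial distribution, and the face probabilities of the unbalanced die are hypothetical values chosen here, since the slides' own numbers are not reproduced in this transcript. The function name `simulate` is also an assumption.

```python
import random

# Hypothetical parameters (not taken from the slides).
P = [[0.9, 0.1],    # P[i][j] = probability of switching from die i to die j
     [0.2, 0.8]]
INIT = [0.5, 0.5]   # initial distribution over the two dice
FACE_PROBS = [
    [1 / 6] * 6,                       # state 0: balanced die
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],    # state 1: unbalanced die (favours 6)
]

def simulate(T, seed=0):
    """Return (hidden states, observed rolls) for T time steps."""
    rng = random.Random(seed)
    states, rolls = [], []
    state = rng.choices([0, 1], weights=INIT)[0]
    for _ in range(T):
        states.append(state)
        # The roll depends only on which die is currently in use.
        rolls.append(rng.choices(range(1, 7), weights=FACE_PROBS[state])[0])
        # The next die depends only on the current die (Markov property).
        state = rng.choices([0, 1], weights=P[state])[0]
    return states, rolls

states, rolls = simulate(20)
print("hidden dice :", states)
print("observed    :", rolls)
```

Only the second printed line would be available to an observer; the first is the hidden state sequence.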
Example 2
- The Markov chain has two states.
- The observations (given the states) are independent Normal.
- Both the mean and the variance depend on the state.
- HMM AR.xls
Example 3: Dow Jones
Daily Changes Dow Jones
Hidden Markov Model??
Bear and Bull Market?
Speech Recognition
- When a word is spoken, the vocalization process goes through a sequence of states.
- The sound produced is relatively constant when the process remains in the same state.
- Recognizing the sequence of states and the duration of each state allows one to recognize the word being spoken.
- The interval of time when the word is spoken is broken into small (possibly overlapping) subintervals.
- In each subinterval one measures the amplitudes of various frequencies in the sound (using Fourier analysis). The vector of amplitudes $Y_t$ is assumed to have a multivariate normal distribution in each state, with the mean vector and covariance matrix being state dependent.
Hidden Markov Models for Biological Sequence
- Consider the Motif
- Some realizations
    A C A - - - A T G
    T C A A C T A T C
    A C A C - - A G C
    A G A - - - A T C
    A C C G - - A T C
Hidden Markov model of the same motif
Profile HMMs
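Computing Likelihood
- Let $p_{ij} = P[X_{t+1} = j \mid X_t = i]$ and $P = (p_{ij})$ = the $M \times M$ transition matrix.
- Let $\pi_i = P[X_1 = i]$ and $\pi = (\pi_1, \pi_2, \dots, \pi_M)$ = the initial distribution over the states.
- Now assume that
- $P[Y_t = y_t \mid X_1 = i_1, X_2 = i_2, \dots, X_t = i_t] = P[Y_t = y_t \mid X_t = i_t] = p(y_t \mid \theta_{i_t})$
- Then
- $P[X_1 = i_1, X_2 = i_2, \dots, X_T = i_T, Y_1 = y_1, Y_2 = y_2, \dots, Y_T = y_T] = P[X = i, Y = y] = \pi_{i_1}\, p(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} p_{i_{t-1} i_t}\, p(y_t \mid \theta_{i_t})$
- Therefore
- $P[Y_1 = y_1, Y_2 = y_2, \dots, Y_T = y_T] = P[Y = y] = \sum_{i_1, \dots, i_T} \pi_{i_1}\, p(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} p_{i_{t-1} i_t}\, p(y_t \mid \theta_{i_t})$
- In the case when $Y_1, Y_2, \dots, Y_T$ are continuous random variables or continuous random vectors, let $f(y \mid \theta_i)$ denote the conditional density of $Y_t$ given $X_t = i$. Then the joint density of $Y_1, Y_2, \dots, Y_T$ is given by
- $f(y_1, y_2, \dots, y_T) = f(y) = \sum_{i_1, \dots, i_T} \pi_{i_1}\, f(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} p_{i_{t-1} i_t}\, f(y_t \mid \theta_{i_t})$
- where $f(y_t \mid \theta_{i_t})$ is the density of $Y_t$ given $X_t = i_t$.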
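To make the likelihood sum concrete, here is a minimal sketch that evaluates it literally, by enumerating every state sequence. The model values and the names `PI`, `P`, `B`, `joint_prob` and `likelihood_brute_force` are hypothetical, chosen for illustration; observations are assumed discrete with symbols 0, 1, 2.

```python
from itertools import product

# Hypothetical two-state model with three observation symbols (0, 1, 2).
PI = [0.6, 0.4]                          # PI[i]   = P[X_1 = i]
P = [[0.7, 0.3], [0.4, 0.6]]             # P[i][j] = P[X_{t+1} = j | X_t = i]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[i][k] = P[Y_t = k | X_t = i]

def joint_prob(path, obs):
    """P[X = path, Y = obs]: initial term times transition and emission terms."""
    prob = PI[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        prob *= P[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return prob

def likelihood_brute_force(obs):
    """P[Y = obs], summing the joint probability over all M**T state paths."""
    M = len(PI)
    return sum(joint_prob(path, obs)
               for path in product(range(M), repeat=len(obs)))

print(likelihood_brute_force([0, 1, 2, 2]))
```

The sum has $M^T$ terms, so this direct approach is only usable for very short sequences.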
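Efficient Methods for computing Likelihood
- The Forward Method
- Consider the forward probabilities $\alpha_t(i) = P[Y_1 = y_1, \dots, Y_t = y_t, X_t = i]$, for $t = 1, \dots, T$ and $i = 1, \dots, M$.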
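A minimal sketch of the forward method for discrete observations follows. It assumes the usual definition of the forward probability, alpha_t(i) = P[Y_1 = y_1, ..., Y_t = y_t, X_t = i], and the standard recursion alpha_{t+1}(j) = [sum_i alpha_t(i) p_ij] P[Y_{t+1} = y_{t+1} | X_{t+1} = j]. The model values and the names `PI`, `P`, `B`, `forward` and `likelihood` are hypothetical.

```python
# Hypothetical two-state model with three observation symbols (0, 1, 2).
PI = [0.6, 0.4]                          # PI[i]   = P[X_1 = i]
P = [[0.7, 0.3], [0.4, 0.6]]             # P[i][j] = P[X_{t+1} = j | X_t = i]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[i][k] = P[Y_t = k | X_t = i]

def forward(obs):
    """Return the list of forward vectors alpha[t][i], t = 0..T-1 (0-based)."""
    M = len(PI)
    # Initial step: alpha_1(i) = P[X_1 = i] * P[Y_1 = y_1 | X_1 = i]
    alpha = [[PI[i] * B[i][obs[0]] for i in range(M)]]
    for y in obs[1:]:
        prev = alpha[-1]
        # Recursion: propagate through the transition matrix, then emit y.
        alpha.append([sum(prev[i] * P[i][j] for i in range(M)) * B[j][y]
                      for j in range(M)])
    return alpha

def likelihood(obs):
    """P[Y = obs] is the sum of the final forward vector."""
    return sum(forward(obs)[-1])

print(likelihood([0, 1, 2, 2]))
```

The recursion needs on the order of $T M^2$ multiplications, compared with $M^T$ terms when summing over every state sequence.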
The Backward Procedure
Prediction of states from the observations and the model
The Viterbi Algorithm (Viterbi Paths)
- Suppose that we know the parameters of the Hidden Markov Model.
- Suppose, in addition, that we have observed the sequence of observations $Y_1, Y_2, \dots, Y_T$.
- Now consider determining the sequence of states $X_1, X_2, \dots, X_T$.
- Recall that
- $P[X_1 = i_1, \dots, X_T = i_T, Y_1 = y_1, \dots, Y_T = y_T] = P[X = i, Y = y]$
- Consider the problem of determining the sequence of states $i_1, i_2, \dots, i_T$ that maximizes the above probability.
- This is equivalent to maximizing
- $P[X = i \mid Y = y] = P[X = i, Y = y] / P[Y = y]$
- The Viterbi Algorithm
- We want to maximize
- $P[X = i, Y = y]$
- Equivalently, we want to minimize
- $U(i_1, i_2, \dots, i_T)$
- where
- $U(i_1, i_2, \dots, i_T) = -\ln(P[X = i, Y = y])$
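- Minimization of $U(i_1, i_2, \dots, i_T)$ can be achieved by Dynamic Programming.
- This can be thought of as finding the shortest distance through the following grid of points,
- by starting at the unique point in stage 0 and moving from a point in stage $t$ to a point in stage $t+1$ in an optimal way. The distance between the point in stage 0 and point $i_1$ in stage 1 is $-\ln(P[X_1 = i_1]\, P[Y_1 = y_1 \mid X_1 = i_1])$. The distances between points $i_t$ in stage $t$ and points $i_{t+1}$ in stage $t+1$ are equal to $-\ln(p_{i_t i_{t+1}}\, P[Y_{t+1} = y_{t+1} \mid X_{t+1} = i_{t+1}])$.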
Dynamic Programming
(Figure: grid of points arranged in columns labelled Stage 0, Stage 1, Stage 2, ..., Stage T-1, Stage T.)
- $i_1 = 1, 2, \dots, M$
- $i_{t+1} = 1, 2, \dots, M$; $t = 1, \dots, T-2$
Summary of calculations of Viterbi Path
- 1. $i_1 = 1, 2, \dots, M$
- 2. $i_{t+1} = 1, 2, \dots, M$; $t = 1, \dots, T-2$
- 3.
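The shortest-path view of the Viterbi calculation can be sketched as follows. This is an illustrative implementation under stated assumptions, not the slides' own formulae: observations are discrete, the distance on each edge is the negative log of (transition probability times emission probability), and the names `PI`, `P`, `B`, `neg_log` and `viterbi` as well as the model values are hypothetical.

```python
import math

# Hypothetical two-state model with three observation symbols (0, 1, 2).
PI = [0.6, 0.4]                          # PI[i]   = P[X_1 = i]
P = [[0.7, 0.3], [0.4, 0.6]]             # P[i][j] = P[X_{t+1} = j | X_t = i]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[i][k] = P[Y_t = k | X_t = i]

def neg_log(p):
    """Distance contribution -ln(p); impossible moves get infinite length."""
    return -math.log(p) if p > 0 else math.inf

def viterbi(obs):
    """Return (best state path, minimal U = -ln P[X = path, Y = obs])."""
    M = len(PI)
    # Stage 0 -> stage 1: distance from the unique start point to each state.
    cost = [neg_log(PI[i]) + neg_log(B[i][obs[0]]) for i in range(M)]
    back = []  # back[t][j] = best predecessor of state j at the next stage
    for y in obs[1:]:
        new_cost, pointers = [], []
        for j in range(M):
            best_i = min(range(M), key=lambda i: cost[i] + neg_log(P[i][j]))
            new_cost.append(cost[best_i] + neg_log(P[best_i][j]) + neg_log(B[j][y]))
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)
    # Pick the closest end point, then trace the stored pointers backwards.
    last = min(range(M), key=lambda i: cost[i])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, cost[last]

path, u = viterbi([0, 1, 2, 2])
print(path, math.exp(-u))
```

Working with sums of negative logs rather than products of probabilities also avoids numerical underflow on long sequences.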
An alternative approach to prediction of states from the observations and the model
Forward Probabilities
Backward Probabilities
HMM generator (normal).xls
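One common way to combine forward and backward probabilities for state prediction is to compute, for each time t, the posterior probability P[X_t = i | Y = y] and pick the most probable state at each t. The sketch below assumes that this is the alternative approach intended here; the transcript does not preserve the slides' formulae. The backward probability is taken as beta_t(i) = P[Y_{t+1} = y_{t+1}, ..., Y_T = y_T | X_t = i]. Model values and all function names are hypothetical.

```python
# Hypothetical two-state model with three observation symbols (0, 1, 2).
PI = [0.6, 0.4]                          # PI[i]   = P[X_1 = i]
P = [[0.7, 0.3], [0.4, 0.6]]             # P[i][j] = P[X_{t+1} = j | X_t = i]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[i][k] = P[Y_t = k | X_t = i]

def forward(obs):
    M = len(PI)
    alpha = [[PI[i] * B[i][obs[0]] for i in range(M)]]
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * P[i][j] for i in range(M)) * B[j][y]
                      for j in range(M)])
    return alpha

def backward(obs):
    M = len(PI)
    beta = [[1.0] * M]                   # beta_T(i) = 1
    for y in reversed(obs[1:]):          # y plays the role of y_{t+1}
        nxt = beta[0]
        beta.insert(0, [sum(P[i][j] * B[j][y] * nxt[j] for j in range(M))
                        for i in range(M)])
    return beta

def posterior(obs):
    """gamma[t][i] = P[X_t = i | Y = obs] = alpha_t(i) beta_t(i) / P[Y = obs]."""
    alpha, beta = forward(obs), backward(obs)
    lik = sum(alpha[-1])
    return [[a * b / lik for a, b in zip(alpha[t], beta[t])]
            for t in range(len(obs))]

gamma = posterior([0, 1, 2, 2])
print([max(range(len(PI)), key=lambda i: g[i]) for g in gamma])
```

Unlike the Viterbi path, the sequence of individually most probable states is not guaranteed to be a jointly most probable path.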
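Estimation of Parameters of a Hidden Markov Model
- If both the sequence of observations $Y_1, Y_2, \dots, Y_T$ and the sequence of states $X_1, X_2, \dots, X_T$ are observed, with $Y_1 = y_1, Y_2 = y_2, \dots, Y_T = y_T$ and $X_1 = i_1, X_2 = i_2, \dots, X_T = i_T$, then the Likelihood is given by
- $L = P[X_1 = i_1] \prod_{t=1}^{T-1} p_{i_t i_{t+1}} \prod_{t=1}^{T} p(y_t \mid \theta_{i_t})$
- the log-Likelihood is given by
- $\ln L = \ln P[X_1 = i_1] + \sum_{t=1}^{T-1} \ln p_{i_t i_{t+1}} + \sum_{t=1}^{T} \ln p(y_t \mid \theta_{i_t})$
- In this case the Maximum Likelihood estimates are
- $\hat{p}_{ij} = n_{ij} / \sum_{k} n_{ik}$, where $n_{ij}$ is the number of observed transitions from state $i$ to state $j$, and
- $\hat{\theta}_i$ = the MLE of $\theta_i$ computed from the observations $y_t$ where $X_t = i$.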
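When the states are observed, the estimates reduce to counting transitions and fitting each state's observations separately. The sketch below assumes scalar Normal observations in each state (as in Example 2), so that the per-state MLE is the sample mean and the (biased) sample variance. The function name `mle_complete_data` and the toy data are hypothetical.

```python
def mle_complete_data(states, obs, M):
    """Estimate the transition matrix and per-state (mean, variance)."""
    # Count observed transitions i -> j.
    counts = [[0] * M for _ in range(M)]
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1
    P_hat = []
    for row in counts:
        total = sum(row)
        # Assumption: a state with no outgoing transitions gets a uniform row.
        P_hat.append([c / total if total else 1.0 / M for c in row])
    # Fit each state's Normal from the observations made while in that state.
    theta_hat = []
    for i in range(M):
        ys = [y for s, y in zip(states, obs) if s == i]
        if not ys:
            theta_hat.append(None)      # state never visited: nothing to fit
            continue
        mean = sum(ys) / len(ys)
        var = sum((y - mean) ** 2 for y in ys) / len(ys)
        theta_hat.append((mean, var))
    return P_hat, theta_hat

P_hat, theta_hat = mle_complete_data([0, 0, 1, 1, 0],
                                     [1.0, 3.0, 10.0, 12.0, 2.0], M=2)
print(P_hat)
print(theta_hat)
```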
MLE (states unknown)
- If only the sequence of observations $Y_1 = y_1, Y_2 = y_2, \dots, Y_T = y_T$ is observed, then the Likelihood is given by
- $L = P[Y = y] = \sum_{i_1, \dots, i_T} P[X = i, Y = y]$
- It is difficult to find the Maximum Likelihood Estimates directly from the Likelihood function.
- The techniques that are used are
- 1. The Segmental K-means Algorithm
- 2. The Baum-Welch (E-M) Algorithm
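The Segmental K-means Algorithm
- In this method the parameters are adjusted to maximize $P[X = i^*, Y = y]$,
- where $i^* = (i_1^*, i_2^*, \dots, i_T^*)$ is the Viterbi path.
- Consider this with the special case
- Case: The observations $Y_1, Y_2, \dots, Y_T$ are continuous Multivariate Normal with mean vector $\mu_i$ and covariance matrix $\Sigma_i$ when $X_t = i$,
- i.e. $f(y \mid \theta_i) = (2\pi)^{-d/2} |\Sigma_i|^{-1/2} \exp\{-\tfrac{1}{2}(y - \mu_i)' \Sigma_i^{-1} (y - \mu_i)\}$, where $d$ is the dimension of $Y_t$.
- 1. Pick arbitrarily $M$ centroids $a_1, a_2, \dots, a_M$. Assign each of the $T$ observations $y_t$ ($kT$ if multiple realizations are observed) to a state $i_t$ by determining $i_t = \arg\min_i \|y_t - a_i\|$.
- 2. Then $\hat{p}_{ij}$ = (number of transitions from state $i$ to state $j$) / (number of transitions from state $i$) in the assigned sequence.
- 3. And $\hat{\mu}_i$ = the mean vector and $\hat{\Sigma}_i$ = the covariance matrix of the observations $y_t$ assigned to state $i$.
- 4. Calculate the Viterbi path $(i_1, i_2, \dots, i_T)$ based on the parameters of steps 2 and 3.
- 5. If there is a change in the sequence $(i_1, i_2, \dots, i_T)$, repeat steps 2 to 4.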
The Baum-Welch (E-M) Algorithm
- The E-M algorithm was designed originally to handle missing observations.
- In this case the missing observations are the states $X_1, X_2, \dots, X_T$.
- Assuming a model, the states are estimated by finding their expected values under this model (the E part of the E-M algorithm).
- With these values the model is estimated by Maximum Likelihood Estimation (the M part of the E-M algorithm).
- The process is repeated until the estimated model converges.
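The E-M Algorithm
- Let $f(y, x \mid \theta)$ denote the joint distribution of $Y, X$.
- Consider the function $Q(\theta, \theta') = E[\ln f(y, X \mid \theta) \mid Y = y, \theta']$.
- Starting with an initial estimate $\theta^{(1)}$ of $\theta$, a sequence of estimates $\theta^{(m)}$ is formed by finding $\theta = \theta^{(m+1)}$ to maximize $Q(\theta, \theta^{(m)})$ with respect to $\theta$.
- The sequence of estimates $\theta^{(m)}$
- converges to a local maximum of the likelihood
- $L(\theta) = f(y \mid \theta)$.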
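- Example: Sampling from Mixtures
- Let $y_1, y_2, \dots, y_n$ denote a sample from the mixture density $f(y) = \sum_{j=1}^{m} \phi_j\, g(y \mid \theta_j)$.
- Suppose that $m = 2$ and let $x_1, x_2, \dots, x_n$ denote independent random variables taking on the value 1 with probability $\phi$ and 0 with probability $1 - \phi$.
- Suppose that $y_i$ comes from the density $g(y \mid \theta_1)$ if $x_i = 1$ and from $g(y \mid \theta_2)$ if $x_i = 0$. We will also assume that $g(y \mid \theta_j)$ is normal with mean $\mu_j$ and standard deviation $\sigma_j$.
- Thus the joint distribution of $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$ is $\prod_{i=1}^{n} [\phi\, g(y_i \mid \theta_1)]^{x_i}\, [(1 - \phi)\, g(y_i \mid \theta_2)]^{1 - x_i}$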
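The two-component normal mixture gives a compact illustration of the E and M steps. In the sketch below the E step replaces each unobserved indicator x_i by its conditional expectation w_i = E[x_i | y_i], and the M step computes weighted maximum likelihood estimates. The simulated data, the starting values, and the names `normal_pdf`, `log_likelihood` and `em_step` are all hypothetical choices made for illustration.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(ys, phi, mu1, s1, mu2, s2):
    return sum(math.log(phi * normal_pdf(y, mu1, s1)
                        + (1.0 - phi) * normal_pdf(y, mu2, s2)) for y in ys)

def em_step(ys, phi, mu1, s1, mu2, s2):
    # E step: w[i] = E[x_i | y_i] under the current parameters.
    w = []
    for y in ys:
        a = phi * normal_pdf(y, mu1, s1)
        b = (1.0 - phi) * normal_pdf(y, mu2, s2)
        w.append(a / (a + b))
    # M step: weighted MLEs, with w[i] standing in for the unobserved x_i.
    n1 = sum(w)
    n2 = len(ys) - n1
    new_mu1 = sum(wi * y for wi, y in zip(w, ys)) / n1
    new_mu2 = sum((1.0 - wi) * y for wi, y in zip(w, ys)) / n2
    new_s1 = math.sqrt(sum(wi * (y - new_mu1) ** 2 for wi, y in zip(w, ys)) / n1)
    new_s2 = math.sqrt(sum((1.0 - wi) * (y - new_mu2) ** 2 for wi, y in zip(w, ys)) / n2)
    return n1 / len(ys), new_mu1, new_s1, new_mu2, new_s2

# Simulated sample: half from N(0, 1), half from N(5, 1.5) (hypothetical values).
rng = random.Random(0)
ys = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.5) for _ in range(200)]

params = (0.5, min(ys), 1.0, max(ys), 1.0)   # crude starting values
history = [log_likelihood(ys, *params)]
for _ in range(50):
    params = em_step(ys, *params)
    history.append(log_likelihood(ys, *params))
print(params)
```

The list `history` records the log-likelihood after each iteration; it should never decrease, which is a convenient check on an E-M implementation.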
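- In the case of an HMM the log-Likelihood is given by $\ln L = \ln P[X_1 = i_1] + \sum_{t=1}^{T-1} \ln p_{i_t i_{t+1}} + \sum_{t=1}^{T} \ln p(y_t \mid \theta_{i_t})$
- Recall $\gamma_t(i) = P[X_t = i \mid Y = y]$
- and $\sum_{t=1}^{T-1} \gamma_t(i)$ = Expected no. of transitions from state $i$.
- Let $\xi_t(i, j) = P[X_t = i, X_{t+1} = j \mid Y = y]$; then
- $\sum_{t=1}^{T-1} \xi_t(i, j)$ = Expected no. of transitions from state $i$ to state $j$.
The E-M Re-estimation Formulae
- Case 1: The observations $Y_1, Y_2, \dots, Y_T$ are discrete with $K$ possible values and $b_i(k) = P[Y_t = k \mid X_t = i]$.
- $\hat{P}[X_1 = i] = \gamma_1(i)$, $\hat{p}_{ij} = \sum_{t=1}^{T-1} \xi_t(i, j) / \sum_{t=1}^{T-1} \gamma_t(i)$, $\hat{b}_i(k) = \sum_{t : y_t = k} \gamma_t(i) / \sum_{t=1}^{T} \gamma_t(i)$
- Case 2: The observations $Y_1, Y_2, \dots, Y_T$ are continuous Multivariate Normal with mean vector $\mu_i$ and covariance matrix $\Sigma_i$ when $X_t = i$,
- i.e. $\hat{\mu}_i = \sum_{t=1}^{T} \gamma_t(i)\, y_t / \sum_{t=1}^{T} \gamma_t(i)$ and $\hat{\Sigma}_i = \sum_{t=1}^{T} \gamma_t(i) (y_t - \hat{\mu}_i)(y_t - \hat{\mu}_i)' / \sum_{t=1}^{T} \gamma_t(i)$, with the initial and transition probabilities re-estimated as in Case 1.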
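For the discrete case, one re-estimation pass can be sketched as below. It uses the standard Baum-Welch quantities, stated here as assumptions since the slides' formulae are not preserved: gamma[t][i] = P[X_t = i | Y = y] and xi[t][i][j] = P[X_t = i, X_{t+1} = j | Y = y], both obtained from forward and backward probabilities. No scaling is applied, so the sketch is only suitable for short sequences. The model values, the observation sequence, and all names are hypothetical.

```python
def forward(pi, P, B, obs):
    M = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(M)]]
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * P[i][j] for i in range(M)) * B[j][y]
                      for j in range(M)])
    return alpha

def backward(P, B, obs):
    M = len(P)
    beta = [[1.0] * M]
    for y in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(P[i][j] * B[j][y] * nxt[j] for j in range(M))
                        for i in range(M)])
    return beta

def baum_welch_step(pi, P, B, obs):
    """One E-M pass; returns new (pi, P, B) and the likelihood of the old model."""
    M, K, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, P, B, obs), backward(P, B, obs)
    lik = sum(alpha[-1])
    # E step: expected state occupancies and expected transitions.
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(M)] for t in range(T)]
    xi = [[[alpha[t][i] * P[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(M)] for i in range(M)] for t in range(T - 1)]
    # M step: ratios of expected counts.
    new_pi = gamma[0][:]
    new_P = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(M)] for i in range(M)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(K)] for i in range(M)]
    return new_pi, new_P, new_B, lik

obs = [0, 0, 1, 2, 2, 1, 0, 2, 2, 2, 0, 1]
pi = [0.6, 0.4]
P = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
liks = []
for _ in range(10):
    pi, P, B, lik = baum_welch_step(pi, P, B, obs)
    liks.append(lik)
print(liks[0], liks[-1])
```

Each pass should leave the likelihood of the observed sequence no smaller than before, which the recorded values in `liks` make easy to check.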
Measuring distance between two HMMs
- Let $\lambda_1$ and $\lambda_2$ denote the parameters of two different HMM models. We now consider defining a distance between these two models.
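The Kullback-Leibler distance
- Consider the two discrete distributions $p_1(y)$ and $p_2(y)$
- ($f_1(y)$ and $f_2(y)$ in the continuous case);
- then define $I(p_1, p_2) = \sum_{y} p_1(y) \ln \dfrac{p_1(y)}{p_2(y)}$
- and in the continuous case $I(f_1, f_2) = \int f_1(y) \ln \dfrac{f_1(y)}{f_2(y)}\, dy$
- These measures of distance between the two distributions are not symmetric but can be made symmetric by the following: $I_s(p_1, p_2) = \tfrac{1}{2}[I(p_1, p_2) + I(p_2, p_1)]$
- In the case of a Hidden Markov model, $p_k(y) = P[Y = y] = \sum_{i_1, \dots, i_T} P[X = i, Y = y]$ computed under model $k$ ($k = 1, 2$),
- where the sum runs over all $M^T$ state sequences $i = (i_1, i_2, \dots, i_T)$.
- The computation of $I(p_1, p_2)$ in this case is formidable.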
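For two discrete distributions on the same finite set, the Kullback-Leibler measure and a symmetrized version are easy to compute directly. The sketch below uses the average of the two directed values as the symmetric form; that is one common choice and is an assumption here. The function names `kl` and `symmetric_kl` and the example distributions are hypothetical.

```python
import math

def kl(p, q):
    """Directed Kullback-Leibler measure: sum of p(y) ln(p(y) / q(y))."""
    total = 0.0
    for py, qy in zip(p, q):
        if py == 0:
            continue            # terms with p(y) = 0 contribute nothing
        if qy == 0:
            return math.inf     # p puts mass where q has none
        total += py * math.log(py / qy)
    return total

def symmetric_kl(p, q):
    """Average of the two directed measures (assumed symmetrization)."""
    return 0.5 * (kl(p, q) + kl(q, p))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl(p, q), kl(q, p), symmetric_kl(p, q))
```

For an HMM the distributions are over whole observation sequences, so the sum would run over every possible sequence of length T, which is why the direct computation becomes impractical.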
Juang and Rabiner distance
- Let $y = (y_1, y_2, \dots, y_T)$ denote a sequence of observations generated from one of the two HMMs.
- Let $i^*$ denote the optimal (Viterbi) sequence of states for these observations assuming a given HMM model.