Title: Introduction to Hidden Markov Models
1 Introduction to Hidden Markov Models
- Wang Rui
- State Key Lab of CAD&CG
- 2004-5-26
2 Outline
- Example ---- Video Texture
- Markov Chains
- Hidden Markov Models
- Example ---- Motion Texture
3 Example ---- Video Texture
video clip
video texture
4 The approach
- How do we find good transitions?
5 Finding good transitions
- Compute the L2 distance D_{i,j} between all pairs of frames
frame i vs. frame j
- Similar frames make good transitions
6 Fish Tank
7 Mathematical model of Video Texture
- A sequence of random variables: ADEABEDADBCAD
- A sequence of random variables: BDACBDCACDBCADCBADCA
- Mathematical model: the future is independent of the past given the present.
- Markov Model
8 Markov Property
- Formal definition
- Let X_0, X_1, ..., X_N be a sequence of random variables taking values in a state set {s_k}.
- X fulfills the Markov property iff P(X_m = s_m | X_0 = s_0, ..., X_{m-1} = s_{m-1}) = P(X_m = s_m | X_{m-1} = s_{m-1}) for all m.
- Informal definition
- The future is independent of the past given the present.
9 History
- Markov chain theory developed around 1900.
- Hidden Markov Models developed in late 1960s.
- Used extensively in speech recognition in the 1960s-70s.
- Introduced to computer science in 1989.
Applications
- Bioinformatics.
- Signal Processing
- Data analysis and Pattern recognition
10 Markov Chain
- A Markov chain is specified by
- A state space S = {s_1, s_2, ..., s_n}
- An initial distribution a_0
- A transition matrix A
- where A = (a_ij), with a_ij = P(q_t = s_j | q_{t-1} = s_i)
- Graphical Representation
- as a directed graph where
- Vertices represent states
- Edges represent transitions with positive
probability
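As a concrete illustration of the (S, a_0, A) specification above, here is a minimal Python sketch; the three states and all probabilities are invented for the example, not taken from the slides.

```python
import numpy as np

states = ["A", "B", "C"]                      # state space S (hypothetical)
a0 = np.array([0.5, 0.3, 0.2])                # initial distribution a_0
A = np.array([[0.1, 0.6, 0.3],                # A[i, j] = P(q_t = s_j | q_{t-1} = s_i)
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])

def sample_chain(length, rng=np.random.default_rng(0)):
    """Draw a state sequence from the Markov chain."""
    q = [rng.choice(len(states), p=a0)]       # start state ~ a_0
    for _ in range(length - 1):
        q.append(rng.choice(len(states), p=A[q[-1]]))  # next state ~ row of A
    return [states[i] for i in q]

print(sample_chain(10))
```

In the directed-graph view, each row of A lists the outgoing edge weights of one vertex; rows sum to 1.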
11 Probability Axioms
- Marginal probability: sum the joint probability over the other variable, P(X = x) = Σ_y P(X = x, Y = y)
- Conditional probability: P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
12 Calculating with Markov chains
- Probability of an observation sequence
- Let X = x_0 x_1 ... x_L be an observation sequence from the Markov chain (S, a_0, A); then
- P(X) = a_0(x_0) · a_{x_0 x_1} · a_{x_1 x_2} · ... · a_{x_{L-1} x_L}
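A small sketch of this computation; the chain below is hypothetical, and the product formula is the one stated above.

```python
import numpy as np

a0 = np.array([0.5, 0.3, 0.2])            # initial distribution over states 0, 1, 2
A = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])            # transition matrix

def chain_sequence_prob(x):
    """P(X) = a_0(x_0) * prod_t A[x_{t-1}, x_t] for a fully observed chain."""
    p = a0[x[0]]
    for prev, cur in zip(x[:-1], x[1:]):
        p *= A[prev, cur]
    return p

print(chain_sequence_prob([0, 1, 1, 2, 0]))  # probability of one observed path
```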
15 Motivation of Hidden Markov Models
- Hidden states
- The state of the entity we want to model is often not observable.
- The state is then said to be hidden.
- Observables
- Sometimes we can instead observe the state of entities influenced by the hidden state.
- A system can be modeled by an HMM if
- The sequence of hidden states is Markov
- The sequence of observations is independent (or Markov) given the hidden states
16 Hidden Markov Model
- Definition
- Set of states S = {s_1, s_2, ..., s_N}
- Observation symbols V = {v_1, v_2, ..., v_M}
- Transition probabilities A between any two states
- a_ij = P(q_t = s_j | q_{t-1} = s_i)
- Emission probabilities B within each state
- b_j(O_t) = P(O_t = v_j | q_t = s_j)
- Start probabilities π = a_0
- Use λ = (A, B, π) to denote the parameter set of the model.
17 Generating a sequence by the model
- Given an HMM, we can generate a sequence of length n as follows (see the sampling sketch after the figure):
- Start at state q_1 according to prob a_{0 q_1}
- Emit letter o_1 according to prob b_{q_1}(o_1)
- Go to state q_2 according to prob a_{q_1 q_2}
- ... until emitting o_n
[Figure: trellis of hidden states 1..N over time, with start probabilities a_0i, transitions a_ij, and emissions b_i(o_t) producing o_1, o_2, o_3, ..., o_n]
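A minimal sampling sketch of the procedure above; the two-state HMM (states, symbols, and all probabilities) is hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical HMM parameters lambda = (A, B, pi); values are made up.
pi = np.array([0.6, 0.4])                  # start probabilities a_0
A = np.array([[0.7, 0.3],                  # a_ij = P(q_t = j | q_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],             # b_j(k) = P(O_t = v_k | q_t = j)
              [0.1, 0.3, 0.6]])

def generate(n, rng=np.random.default_rng(1)):
    """Sample a state path q_1..q_n and observations o_1..o_n from the HMM."""
    states, obs = [], []
    q = rng.choice(len(pi), p=pi)                    # start at q_1 ~ a_0
    for _ in range(n):
        obs.append(rng.choice(B.shape[1], p=B[q]))   # emit o_t ~ b_{q_t}
        states.append(q)
        q = rng.choice(len(pi), p=A[q])              # move to the next state ~ a_{q_t, .}
    return states, obs

print(generate(8))
```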
18 Example
19 Calculating with Hidden Markov Models
- Consider one such fixed state sequence Q = q_1 q_2 ... q_T
- The observation sequence O for this Q has probability
- P(O | Q, λ) = Π_{t=1}^{T} b_{q_t}(O_t)
20 The probability of such a state sequence Q can be written as
- P(Q | λ) = a_{0 q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}
The probability that O and Q occur simultaneously is simply the product of the above two terms, i.e.,
- P(O, Q | λ) = P(O | Q, λ) P(Q | λ)
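A sketch of the two factors and their product for one fixed path; the parameters are the same hypothetical two-state model as before, restated here so the snippet runs on its own.

```python
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

Q = [0, 0, 1, 1]        # a fixed state sequence q_1..q_T
O = [0, 1, 2, 2]        # the corresponding observations O_1..O_T

# P(Q | lambda) = a_{0 q1} * prod_t a_{q_{t-1} q_t}
p_Q = pi[Q[0]] * np.prod([A[Q[t - 1], Q[t]] for t in range(1, len(Q))])
# P(O | Q, lambda) = prod_t b_{q_t}(O_t)
p_O_given_Q = np.prod([B[Q[t], O[t]] for t in range(len(Q))])

print("P(Q) =", p_Q, " P(O|Q) =", p_O_given_Q, " P(O,Q) =", p_Q * p_O_given_Q)
```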
21 Example
22 The three main questions on HMMs
- Evaluation
- GIVEN an HMM (S, V, A, B, π) and a sequence O,
- FIND P(O | λ)
- Decoding
- GIVEN an HMM (S, V, A, B, π) and a sequence O,
- FIND the sequence Q of states that maximizes P(O, Q | λ)
- Learning
- GIVEN an HMM (S, V, A, B, π) with unspecified transition/emission probabilities, and a sequence of observations O,
- FIND parameters λ = (a_ij, b_i(·)) that maximize P(O | λ)
23 Evaluation
- Find the likelihood that a sequence is generated by the model
- A straightforward way
- The probability of O is obtained by summing over all possible state sequences Q, giving
- P(O | λ) = Σ_Q P(O | Q, λ) P(Q | λ)
- Complexity is O(N^T), so the direct calculation is infeasible
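A brute-force sketch of this sum over all N^T state sequences, using the same hypothetical model; it is only practical for tiny T, which is the point of the complexity remark above.

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
O = [0, 1, 2, 2]
N, T = len(pi), len(O)

def joint(Q):
    """P(O, Q | lambda) for one state sequence Q."""
    p = pi[Q[0]] * B[Q[0], O[0]]
    for t in range(1, T):
        p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]
    return p

# P(O | lambda) = sum over all N^T possible state sequences
p_O = sum(joint(Q) for Q in itertools.product(range(N), repeat=T))
print("P(O) =", p_O)
```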
24 The Forward Algorithm
- A more elaborate algorithm
- The Forward Algorithm
[Figure: forward-algorithm trellis of states 1..N over observations o_1, o_2, o_3, ..., o_n, with transition probabilities a_0i and a_ij]
25 The Forward Algorithm
- The forward variable: α_t(i) = P(O_1 O_2 ... O_t, q_t = s_i | λ)
- We can compute α_t(i) for all i = 1..N and all t
- Initialization
- α_1(i) = a_{0i} b_i(O_1), i = 1..N
- Iteration
- α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(O_{t+1}), t = 1..T-1, j = 1..N
- Termination
- P(O | λ) = Σ_{i=1}^{N} α_T(i)
[Figure: forward-algorithm trellis over states 1..N and observations o_1, o_2, o_3, ..., o_n]
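A sketch of the forward recursion above, on the same hypothetical model; alpha[t, i] stores α_{t+1}(i) with 0-based indexing.

```python
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
O = [0, 1, 2, 2]

def forward(O):
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # initialization: a_{0i} b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # iteration: [sum_i alpha_t(i) a_ij] b_j(O_{t+1})
    return alpha, alpha[-1].sum()                   # termination: P(O) = sum_i alpha_T(i)

alpha, p_O = forward(O)
print("P(O) =", p_O)   # matches the brute-force sum, at O(N^2 T) cost
```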
26 The Backward Algorithm
- The backward variable: β_t(i) = P(O_{t+1} O_{t+2} ... O_T | q_t = s_i, λ)
- Similarly, we can compute β_t(i) for all i = 1..N and all t
- Initialization
- β_T(i) = 1, i = 1..N
- Iteration
- β_t(i) = Σ_{j=1}^{N} a_ij b_j(O_{t+1}) β_{t+1}(j), t = T-1..1
- Termination
- P(O | λ) = Σ_{i=1}^{N} a_{0i} b_i(O_1) β_1(i)
[Figure: backward-algorithm trellis over states 1..N and observations o_1, o_2, o_3, ..., o_n]
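A matching sketch of the backward recursion, again on the hypothetical model; it recovers the same P(O | λ) as the forward pass.

```python
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
O = [0, 1, 2, 2]

def backward(O):
    T, N = len(O), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                      # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])    # iteration: sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
    return beta, np.sum(pi * B[:, O[0]] * beta[0])      # termination: sum_i a_{0i} b_i(O_1) beta_1(i)

beta, p_O = backward(O)
print("P(O) =", p_O)   # same value as the forward algorithm
```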
27 Forward, α_t(i) vs. Backward, β_t(i)
[Figure: forward and backward recursions on the trellis]
28 Decoding
- Decoding
- GIVEN an HMM and a sequence O:
- Suppose we know the parameters of the Hidden Markov Model and the observed sequence of observations O_1, O_2, ..., O_T.
- FIND the sequence Q of states that maximizes P(Q | O, λ)
- That is, determine the sequence of states q_1, q_2, ..., q_T that is optimal in some meaningful sense (i.e. best explains the observations).
29 - Consider P(Q | O, λ) = P(O, Q | λ) / P(O | λ)
- So the Q that maximizes the above probability
- is equivalently the Q that maximizes P(O, Q | λ)
- A best-path-finding problem
[Figure: trellis of states 1..N over observations o_1, o_2, o_3, ..., o_n]
30 Viterbi Algorithm
- A dynamic programming algorithm
- Initialization
- δ_1(i) = a_{0i} b_i(O_1), i = 1..N
- ψ_1(i) = 0
- Recursion
- δ_t(j) = max_i [δ_{t-1}(i) a_ij] b_j(O_t), t = 2..T, j = 1..N
- ψ_t(j) = argmax_i [δ_{t-1}(i) a_ij], t = 2..T, j = 1..N
- Termination
- P* = max_i δ_T(i)
- q*_T = argmax_i δ_T(i)
- Traceback
- q*_t = ψ_{t+1}(q*_{t+1}), t = T-1, T-2, ..., 1
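A sketch of the Viterbi recursion on the same hypothetical model; delta and psi follow the definitions above with 0-based indexing.

```python
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
O = [0, 1, 2, 2]

def viterbi(O):
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))           # delta_t(j): best score of any path ending in j at time t
    psi = np.zeros((T, N), dtype=int)  # psi_t(j): backpointer to the best previous state
    delta[0] = pi * B[:, O[0]]         # initialization
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]    # recursion
    q = [int(delta[-1].argmax())]      # termination: best final state
    for t in range(T - 1, 0, -1):      # traceback
        q.append(int(psi[t][q[-1]]))
    return delta[-1].max(), q[::-1]

p_star, q_star = viterbi(O)
print("best path", q_star, "with P(O, Q*) =", p_star)
```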
31 Learning
- Estimation of the parameters of a Hidden Markov Model
- 1. Both the sequence of observations O and the sequence of states Q are observed: learn λ = (A, B, π)
- 2. Only the sequence of observations O is observed: learn Q and λ = (A, B, π)
32 - Given O and Q, the likelihood is given by
- L(λ) = P(O, Q | λ) = a_{0 q_1} b_{q_1}(O_1) · Π_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(O_t)
- and the log-likelihood is given by
- log L(λ) = log a_{0 q_1} + Σ_{t=2}^{T} log a_{q_{t-1} q_t} + Σ_{t=1}^{T} log b_{q_t}(O_t)
33 In this case the Maximum Likelihood estimates are the observed transition frequencies, a_ij = n_ij / Σ_k n_ik (where n_ij counts transitions from s_i to s_j), and the MLE of b_i is computed from the observations o_t for which q_t = s_i.
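A counting sketch of these Maximum Likelihood estimates for the fully observed case (case 1 on slide 31); the toy state and observation sequences are invented.

```python
import numpy as np

N, M = 2, 3                             # number of states / observation symbols (toy sizes)
Q = [0, 0, 1, 1, 0, 1, 1, 0]            # observed state sequence
O = [0, 1, 2, 2, 0, 1, 2, 0]            # observed emissions

A_counts = np.zeros((N, N))
B_counts = np.zeros((N, M))
for t in range(len(Q)):
    B_counts[Q[t], O[t]] += 1           # count emissions of each symbol in each state
    if t > 0:
        A_counts[Q[t - 1], Q[t]] += 1   # count transitions s_i -> s_j

# MLE: normalize the counts row-wise (a_ij = n_ij / sum_k n_ik, similarly for b_i)
A_hat = A_counts / A_counts.sum(axis=1, keepdims=True)
B_hat = B_counts / B_counts.sum(axis=1, keepdims=True)
print(A_hat)
print(B_hat)
```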
34 - Only the sequence of observations O is observed
- It is difficult to find the Maximum Likelihood estimates directly from the likelihood function.
- The techniques that are used are:
- 1. The Segmental K-means Algorithm
- 2. The Baum-Welch (E-M) Algorithm
35 The Baum-Welch (E-M) Algorithm
- The E-M algorithm was originally designed to handle missing observations.
- In this case the missing observations are the states q_1, q_2, ..., q_T.
- Assuming a model, the states are estimated by finding their expected values under this model (the E part of the E-M algorithm).
36 - With these values the model is re-estimated by Maximum Likelihood Estimation (the M part of the E-M algorithm).
- The process is repeated until the estimated model converges.
37 The E-M Algorithm
- Let P(O, Q | λ) denote the joint distribution of O and Q.
- Consider the function Q(λ, λ') = Σ_Q P(Q | O, λ') log P(O, Q | λ).
- Starting with an initial estimate λ^(0), a sequence of estimates λ^(m) is formed by finding λ^(m+1) to maximize Q(λ, λ^(m)) with respect to λ.
38 - The sequence of estimates λ^(m)
- converges to a local maximum of the likelihood P(O | λ).
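A compact sketch of one Baum-Welch (E-M) iteration, combining the forward and backward passes shown earlier; the starting parameters and the observation sequence are hypothetical, and practical issues (scaling for long sequences, multiple training sequences) are ignored.

```python
import numpy as np

# Hypothetical starting model and observations.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
O = [0, 1, 2, 2, 0, 1]

def baum_welch_step(pi, A, B, O):
    T, N = len(O), len(pi)
    # E-step: forward and backward variables.
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    p_O = alpha[-1].sum()

    gamma = alpha * beta / p_O                     # gamma_t(i) = P(q_t = i | O, lambda)
    xi = np.zeros((T - 1, N, N))                   # xi_t(i, j) = P(q_t = i, q_{t+1} = j | O, lambda)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1] / p_O

    # M-step: re-estimate parameters from the expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, p_O

pi, A, B, p_O = baum_welch_step(pi, A, B, O)
print("P(O) under the previous model:", p_O)
```

Repeating this step and monitoring P(O | λ) gives the iterative procedure described on slides 35-38: the likelihood is non-decreasing and converges to a local maximum.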