Introduction to Hidden Markov Models - PowerPoint PPT Presentation

1
Introduction to Hidden Markov Models
  • Wang Rui
  • State Key Lab of CAD&CG
  • 2004-5-26

2
Outline
  • Example ---- Video Texture
  • Markov Chains
  • Hidden Markov Models
  • Example ---- Motion Texture

3
Example ---- Video Texture
  • Problem statement: given a finite video clip, synthesize
    an endless, continuously varying video texture

video clip → video texture
4
The approach
  • How do we find good transitions?

5
Finding good transitions
  • Compute the L2 distance D_{i,j} between all pairs of frames

frame i  vs.  frame j

  • Similar frames make good transitions

6
Fish Tank
7
Mathematical model of Video Texture
A sequence of random variables: ADEABEDADBCAD
A sequence of random variables: BDACBDCACDBCADCBADCA
Mathematical model: the future is independent of the
past, given the present.
Markov Model
8
Markov Property
  • Formal definition
  • Let X = {X_n}, n = 0..N, be a sequence of random
    variables taking values s_k in a state set S. If
    P(X_m = s_m | X_0 = s_0, ..., X_{m-1} = s_{m-1})
    = P(X_m = s_m | X_{m-1} = s_{m-1}),
  • then X fulfills the Markov property.
  • Informal definition
  • The future is independent of the past, given the
    present.

9
History
  • Markov chain theory was developed around 1900.
  • Hidden Markov Models were developed in the late 1960s.
  • Used extensively in speech recognition from the
    1960s and 70s onward.
  • Introduced to computer science applications in 1989.

Applications
  • Bioinformatics.
  • Signal Processing
  • Data analysis and Pattern recognition

10
Markov Chain
  • A Markov chain is specified by
  • A state space S = {s1, s2, ..., sn}
  • An initial distribution a0
  • A transition matrix A
  • where A_ij = a_ij = P(q_t = s_j | q_{t-1} = s_i)
  • Graphical representation
  • as a directed graph where
  • vertices represent states
  • edges represent transitions with positive
    probability
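As a concrete illustration of the specification above, here is a minimal Python sketch that samples a state sequence from a chain (S, a0, A). The two-state "weather" chain and all its probabilities are made up for the example:

```python
import random

# Hypothetical two-state chain, used only for illustration.
states = ["rain", "sun"]
a0 = {"rain": 0.5, "sun": 0.5}              # initial distribution
A = {"rain": {"rain": 0.7, "sun": 0.3},     # A[i][j] = P(q_t = j | q_{t-1} = i)
     "sun":  {"rain": 0.2, "sun": 0.8}}

def draw(dist, rng):
    """Sample a key of `dist` with probability proportional to its value."""
    r, acc = rng.random(), 0.0
    for s, p in dist.items():
        acc += p
        if r < acc:
            return s
    return s  # guard against floating-point rounding

def sample_chain(a0, A, length, seed=0):
    """Draw a state sequence of the given length from the chain (S, a0, A)."""
    rng = random.Random(seed)
    seq = [draw(a0, rng)]            # q_1 ~ a0
    while len(seq) < length:
        seq.append(draw(A[seq[-1]], rng))   # q_t ~ A[q_{t-1}]
    return seq

print(sample_chain(a0, A, 10))
```

Each row of A sums to 1, matching the requirement that outgoing transition probabilities from every state form a distribution.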

11
Probability Axioms
  • Marginal probability: sum the joint probability over the
    other variable, P(X = x) = Σ_y P(X = x, Y = y)
  • Conditional probability: P(X | Y) = P(X, Y) / P(Y)

12
Calculating with Markov chains
  • Probability of an observation sequence
  • Let X = {x_t}, t = 0..L, be an observation sequence from
    the Markov chain (S, a0, A); then
    P(X) = a0(x_0) · Π_{t=1..L} a_{x_{t-1}, x_t}
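The chain rule product above translates directly into code. A minimal sketch with a hypothetical two-state chain (the symbols and probabilities are invented for the example):

```python
# Hypothetical two-state chain; a0 is the initial distribution and
# A[i][j] = P(q_t = j | q_{t-1} = i).
a0 = {"A": 0.6, "B": 0.4}
A = {"A": {"A": 0.9, "B": 0.1},
     "B": {"A": 0.5, "B": 0.5}}

def sequence_probability(obs, a0, A):
    """P(x_0 .. x_L) = a0(x_0) * prod_t A[x_{t-1}][x_t]."""
    p = a0[obs[0]]
    for prev, cur in zip(obs, obs[1:]):
        p *= A[prev][cur]
    return p

print(sequence_probability(["A", "A", "B"], a0, A))  # 0.6 * 0.9 * 0.1
```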

15
Motivation of Hidden Markov Models
  • Hidden states
  • The state of the entity we want to model is often
    not observable
  • The state is then said to be hidden.
  • Observables
  • Sometimes we can instead observe the state of
    entities influenced by the hidden state.
  • A system can be modeled by an HMM if
  • The sequence of hidden states is Markov
  • The sequence of observations is independent (or
    Markov) given the hidden states

16
Hidden Markov Model
  • Definition
  • Set of states S = {s1, s2, ..., sN}
  • Observation symbols V = {v1, v2, ..., vM}
  • Transition probabilities A between any two states
  • a_ij = P(q_t = s_j | q_{t-1} = s_i)
  • Emission probabilities B within each state
  • b_j(O_t) = P(O_t = v_j | q_t = s_j)
  • Start probabilities π = a0
  • Use λ = (A, B, π) to denote the parameter set
    of the model.

17
Generating a sequence by the model
  • Given an HMM, we can generate a sequence of length
    n as follows
  • Start at state q1 according to prob a_{0,q1}
  • Emit letter o1 according to prob b_{q1}(o1)
  • Go to state q2 according to prob a_{q1,q2}
  • ... until emitting o_n

[Trellis diagram: start state 0 with arc a_02 into states 1..N; emission b_2(o1); observations o1, o2, o3, ..., on]
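The generation steps above can be sketched in a few lines of Python. The 2-state model below is hypothetical (made-up numbers, binary observation symbols V = {0, 1}):

```python
import random

# Hypothetical 2-state HMM with binary observation symbols, for illustration.
a0 = [0.5, 0.5]                   # start probabilities a_{0i}
A  = [[0.8, 0.2], [0.3, 0.7]]     # a_{ij} = P(q_t = s_j | q_{t-1} = s_i)
B  = [[0.9, 0.1], [0.2, 0.8]]     # b_j(v) = P(O_t = v | q_t = s_j)

def generate(a0, A, B, n, seed=1):
    """Start at q1 ~ a0, emit o1 ~ b_{q1}, move to q2 ~ a_{q1,.}, ...
    until n symbols have been emitted."""
    rng = random.Random(seed)
    def draw(probs):
        r, acc = rng.random(), 0.0
        for k, p in enumerate(probs):
            acc += p
            if r < acc:
                return k
        return k  # guard against floating-point rounding
    states, obs = [], []
    q = draw(a0)
    for _ in range(n):
        states.append(q)      # current hidden state
        obs.append(draw(B[q]))  # emit a symbol from this state
        q = draw(A[q])        # transition to the next state
    return states, obs

print(generate(a0, A, B, 5))
```

Only `obs` would be visible to an observer; `states` is the hidden path.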
18
Example
19
Calculating with Hidden Markov Model
  • Consider one such fixed state sequence
  • Q = q1 q2 ... qT
  • The probability of the observation sequence O given Q is
  • P(O | Q, λ) = Π_{t=1..T} b_{qt}(O_t)

20
The probability of such a state sequence Q can be
written as P(Q | λ) = a_{0,q1} a_{q1,q2} ... a_{q_{T-1},qT}.
The probability that O and Q occur
simultaneously is simply the product of the
above two terms, i.e.,
P(O, Q | λ) = P(O | Q, λ) P(Q | λ)
21
Example
22
The three main questions on HMMs
  • Evaluation
  • GIVEN an HMM λ = (S, V, A, B, π) and a sequence O,
  • FIND P(O | λ)
  • Decoding
  • GIVEN an HMM λ = (S, V, A, B, π) and a sequence O,
  • FIND the sequence Q of states that maximizes
    P(O, Q | λ)
  • Learning
  • GIVEN an HMM (S, V) with unspecified
    transition/emission probs and a sequence O,
  • FIND the parameters λ = (A, B, π) that maximize
    P(O | λ)

23
Evaluation
  • Find the likelihood that a sequence is generated by
    the model
  • A straightforward way
  • The probability of O is obtained by summing over all
    possible state sequences Q, giving
  • P(O | λ) = Σ_Q P(O | Q, λ) P(Q | λ)

Complexity is O(T · N^T); direct calculation is infeasible
24
The Forward Algorithm
  • A more elaborate algorithm
  • The Forward Algorithm

[Trellis diagram: start state 0 with arcs a_01, a_02, ..., a_0n into states 1..N; transitions a_11, a_21, ..., a_n1; observations o1, o2, o3, ..., on]
25
The Forward Algorithm
  • The forward variable α_t(i) = P(O_1 ... O_t, q_t = s_i | λ)
  • We can compute α_t(i) for all i, t
  • Initialization
  • α_1(i) = a_{0i} b_i(O_1), i = 1..N
  • Iteration
  • α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_{ij}] b_j(O_{t+1})
  • Termination
  • P(O | λ) = Σ_{i=1..N} α_T(i)

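The initialization, iteration, and termination steps translate into a short dynamic program. A minimal Python sketch (the toy 2-state parameters in the example call are invented):

```python
def forward(a0, A, B, O):
    """Forward algorithm: alpha_t(i) = P(O_1..O_t, q_t = s_i | lambda);
    returns P(O | lambda) in O(T * N^2) time."""
    N, T = len(a0), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                                   # initialization
        alpha[0][i] = a0[i] * B[i][O[0]]
    for t in range(1, T):                                # iteration
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j]
                              for i in range(N)) * B[j][O[t]]
    return sum(alpha[T-1])                               # termination

# Toy 2-state model with binary symbols (hypothetical numbers).
p = forward([0.6, 0.4],
            [[0.7, 0.3], [0.4, 0.6]],
            [[0.9, 0.1], [0.2, 0.8]],
            [0, 1, 0])
print(p)
```

This replaces the O(T · N^T) sum over all state sequences with an O(T · N^2) recursion, which is why the evaluation problem becomes tractable.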
26
The Backward Algorithm
  • The backward variable β_t(i) = P(O_{t+1} ... O_T | q_t = s_i, λ)
  • Similarly, we can compute the backward variable for all
    i, t
  • Initialization
  • β_T(i) = 1, i = 1..N
  • Iteration
  • β_t(i) = Σ_{j=1..N} a_{ij} b_j(O_{t+1}) β_{t+1}(j)
  • Termination
  • P(O | λ) = Σ_{i=1..N} a_{0i} b_i(O_1) β_1(i)

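The backward recursion mirrors the forward one but runs right to left. A minimal Python sketch (the toy parameters in the example call are invented; either pass should yield the same P(O | λ)):

```python
def backward(a0, A, B, O):
    """Backward variable beta_t(i) = P(O_{t+1}..O_T | q_t = s_i, lambda);
    returns P(O | lambda)."""
    N, T = len(a0), len(O)
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):                                   # initialization
        beta[T-1][i] = 1.0
    for t in range(T-2, -1, -1):                         # iteration (right to left)
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j]
                             for j in range(N))
    # termination: P(O | lambda) = sum_i a0_i * b_i(O_1) * beta_1(i)
    return sum(a0[i] * B[i][O[0]] * beta[0][i] for i in range(N))

p = backward([0.6, 0.4],
             [[0.7, 0.3], [0.4, 0.6]],
             [[0.9, 0.1], [0.2, 0.8]],
             [0, 1, 0])
print(p)
```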
27
[Figure: forward variable α_t(i) and backward variable β_t(i) meeting on the trellis]
28
Decoding
  • Decoding
  • GIVEN an HMM and a sequence O.
  • Suppose that we know the parameters of the
    Hidden Markov Model and the observed sequence of
    observations O1, O2, ..., OT.
  • FIND the sequence Q of states that maximizes
    P(Q | O, λ)
  • Determine the sequence of states q1, q2, ...,
    qT which is optimal in some meaningful sense
    (i.e. best explains the observations)

29
  • Consider P(Q | O, λ) = P(O, Q | λ) / P(O | λ)
  • We seek the Q that maximizes the above probability
  • Since P(O | λ) does not depend on Q, this is
    equivalent to maximizing P(O, Q | λ)

A best-path-finding problem on the trellis
[Trellis diagram: start state 0 with arc a_02, states 1..N, observations o1, o2, o3, ..., on]
30
Viterbi Algorithm
  • A dynamic programming algorithm
  • Initialization
  • δ_1(i) = a_{0i} b_i(O_1), i = 1..N
  • ψ_1(i) = 0
  • Recursion
  • δ_t(j) = max_i [δ_{t-1}(i) a_{ij}] b_j(O_t), t = 2..T,
    j = 1..N
  • ψ_t(j) = argmax_i [δ_{t-1}(i) a_{ij}], t = 2..T,
    j = 1..N
  • Termination
  • P* = max_i δ_T(i)
  • q*_T = argmax_i δ_T(i)
  • Traceback
  • q*_t = ψ_{t+1}(q*_{t+1}), t = T-1, T-2, ..., 1
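The δ/ψ recursion and traceback above can be sketched directly in Python (the toy 2-state parameters in the test call are invented):

```python
def viterbi(a0, A, B, O):
    """Most likely state path: delta/psi dynamic program plus traceback.
    Returns (path, probability of the best path)."""
    N, T = len(a0), len(O)
    delta = [[0.0] * N for _ in range(T)]
    psi   = [[0] * N for _ in range(T)]
    for i in range(N):                                   # initialization
        delta[0][i] = a0[i] * B[i][O[0]]
    for t in range(1, T):                                # recursion
        for j in range(N):
            best = max(range(N), key=lambda i: delta[t-1][i] * A[i][j])
            psi[t][j] = best                             # remember best predecessor
            delta[t][j] = delta[t-1][best] * A[best][j] * B[j][O[t]]
    last = max(range(N), key=lambda i: delta[T-1][i])    # termination
    path = [last]
    for t in range(T-1, 0, -1):                          # traceback
        path.append(psi[t][path[-1]])
    return list(reversed(path)), delta[T-1][last]
```

Unlike the forward algorithm, the sum over predecessors is replaced by a max, and ψ records which predecessor achieved it so the optimal path can be recovered.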

31
Learning
  • Estimation of the parameters of a Hidden Markov Model
  • 1. Both the sequence of observations O and the
    sequence of states Q are observed
  • learn λ = (A, B, π)
  • 2. Only the sequence of observations O is
    observed
  • learn Q and λ = (A, B, π)

32
  • Given O and Q, the likelihood is given by
  • L(λ) = P(O, Q | λ) = a_{0,q1} Π_{t=2..T} a_{q_{t-1},q_t} · Π_{t=1..T} b_{qt}(O_t)

the log-likelihood is given by
log L(λ) = log a_{0,q1} + Σ_{t=2..T} log a_{q_{t-1},q_t} + Σ_{t=1..T} log b_{qt}(O_t)

33
In this case the parameters computed by maximum
likelihood estimation are relative frequencies:
a_ij is the fraction of transitions out of s_i that go to s_j, and
b_i is computed from the observations O_t where q_t = s_i.
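In this fully observed case the MLEs are just normalized counts. A minimal sketch (the state/symbol names in the example are invented):

```python
from collections import Counter

def mle_estimate(states, obs):
    """Count-based MLEs when both Q and O are observed:
    a_ij = n(i -> j) / n(i -> .),  b_i(v) = n(q = i, o = v) / n(q = i)."""
    trans  = Counter(zip(states, states[1:]))   # transition counts n(i -> j)
    emit   = Counter(zip(states, obs))          # emission counts n(q = i, o = v)
    n_from = Counter(states[:-1])               # transitions leaving each state
    n_in   = Counter(states)                    # visits to each state
    A = {(i, j): c / n_from[i] for (i, j), c in trans.items()}
    B = {(i, v): c / n_in[i] for (i, v), c in emit.items()}
    return A, B

A_hat, B_hat = mle_estimate(["s1", "s1", "s2", "s1"], ["a", "b", "a", "a"])
print(A_hat, B_hat)
```

Unseen transitions or emissions get no entry (probability 0); in practice one would smooth the counts.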

34
  • Only the sequence of observations O is observed
  • It is difficult to find the maximum likelihood
    estimates directly from the likelihood function.
  • The techniques used are
  • 1. The segmental K-means algorithm
  • 2. The Baum-Welch (E-M) algorithm

35
The Baum-Welch (E-M) Algorithm
  • The E-M algorithm was originally designed to
    handle missing observations.
  • In this case the missing observations are the
    states q1, q2, ... , qT.
  • Assuming a model, the states are estimated by
    finding their expected values under this model.
    (The E part of the E-M algorithm).

36
  • With these values the model is estimated by
    Maximum Likelihood Estimation (The M part of the
    E-M algorithm).
  • The process is repeated until the estimated model
    converges.

37
The E-M Algorithm
  • Let L(λ) = P(O, Q | λ) denote
    the joint distribution of Q and O.
  • Consider the function Q(λ, λ') = E[log P(O, Q | λ) | O, λ']
  • Starting with an initial estimate λ^(0),
    a sequence of estimates λ^(m) is formed
    by choosing λ^(m+1) to maximize Q(λ, λ^(m))
    with respect to λ.

38
  • The sequence of estimates λ^(m)
  • converges to a local maximum of the likelihood
    P(O | λ).