Hidden Markov Models - PowerPoint PPT Presentation

Title: Forward/Backward Algorithms for Hidden Markov Models
Author: Richard M. Golden
Last modified by: Marek Perkowski
Created Date: 4/14/2003 12:35:54 PM

Transcript and Presenter's Notes

Title: Hidden Markov Models


1
Hidden Markov Models
  • Richard Golden
  • (following the approach of Chapter 9 of Manning and
    Schütze, 2000)
  • REVISION DATE April 15 (Tuesday), 2003

2
VMM (Visible Markov Model)
3
HMM Notation
  • State sequence variables: X1, …, XT+1
  • Output sequence variables: O1, …, OT
  • Set of hidden states: {S1, …, SN}
  • Output alphabet: {K1, …, KM}
  • Initial state probabilities (π1, …, πN):
    πi = p(X1 = Si), i = 1, …, N
  • State transition probabilities (aij), i, j ∈ {1, …, N}:
    aij = p(Xt+1 = Sj | Xt = Si), t = 1, …, T
  • Emission probabilities (bij), i ∈ {1, …, N}, j ∈ {1, …, M}:
    bij = p(Ot = Kj | Xt = Si), t = 1, …, T

4
HMM State-Emission Representation

[Diagram: two-state HMM with hidden states S1, S2 and outputs K1, K2, K3.
Initial probabilities π1 = 1, π2 = 0; transition probabilities a11 = 0.7,
a12 = 0.3, a21 = 0.5, a22 = 0.5; emission probabilities b11 = 0.6, b12 = 0.1,
b13 = 0.3, b21 = 0.1, b22 = 0.7, b23 = 0.2.]
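For later reference, the numbers read off this diagram can be collected in code. This is a minimal sketch, assuming the values recovered from the figure; the array names pi, A, B, obs are ours, and obs encodes the observation sequence K3, K2, K1 used in the worked examples on the following slides.

```python
import numpy as np

# Example HMM from the diagram above (values as recovered from the figure).
pi = np.array([1.0, 0.0])            # initial state probabilities (pi_1, pi_2)
A = np.array([[0.7, 0.3],            # A[i, j] = a_ij = p(X_{t+1} = S_j | X_t = S_i)
              [0.5, 0.5]])
B = np.array([[0.6, 0.1, 0.3],       # B[i, k] = b_ik = p(O_t = K_k | X_t = S_i)
              [0.1, 0.7, 0.2]])
obs = [2, 1, 0]                      # observation sequence K3, K2, K1 (0-based symbol indices)
```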
5
Arc-Emission Representation
  • Note that a Hidden Markov Model is sometimes represented by
    having the emission arrows come off the arcs rather than off
    the states.
  • In this representation you have many more emission arrows,
    because there are many more arcs.
  • But the transition and emission probabilities are the same; it
    just takes longer to draw on your PowerPoint presentation
    (self-conscious presentation).

6
Fundamental Questions for HMMs
  • MODEL FIT
  • How can we compute the likelihood of the observations and
    hidden states given known emission and transition
    probabilities?
  • Compute p(Dog/NOUN, is/VERB, Good/ADJ | aij, bkm)
  • How can we compute the likelihood of the observations given
    known emission and transition probabilities?
    p(Dog, is, Good | aij, bkm)

7
Fundamental Questions for HMMs
  • INFERENCE
  • How can we infer the sequence of hidden states given the
    observations and the known emission and transition
    probabilities?
  • Maximize p(Dog/?, is/?, Good/? | aij, bkm) with respect to
    the unknown labels

8
Fundamental Questions for HMMs
  • LEARNING
  • How can we estimate the emission and transition probabilities
    given the observations, assuming that the hidden states are
    observable during the learning process?
  • How can we estimate the emission and transition probabilities
    given the observations only?

9
Direct Calculation of Model Fit (note use of Markov Assumptions), Part 1
Follows directly from the definition of a conditional probability:
p(o, x) = p(o | x) p(x)
EXAMPLE: p(Dog/NOUN, is/VERB, Good/ADJ | aij, bij)
  = p(Dog, is, Good | NOUN, VERB, ADJ, aij, bij) × p(NOUN, VERB, ADJ | aij, bij)
10
Direct Calculation of Likelihood of Labeled Observations (note use of Markov
Assumptions), Part 2

EXAMPLE: Compute p(DOG/NOUN, is/VERB, good/ADJ | aij, bkm)
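The formula itself appeared on the slide as an image and is not in the transcript. Under the Markov assumptions it is presumably the standard factorization of the joint likelihood of observations and labels, consistent with the product evaluated on the next slide:

```latex
p(O_1,\dots,O_T, X_1,\dots,X_T \mid a_{ij}, b_{km})
  \;=\; \pi_{x_1}\, b_{x_1,o_1} \prod_{t=2}^{T} a_{x_{t-1},x_t}\, b_{x_t,o_t}.
```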
11
Graphical Algorithm Representation of Direct Calculation of Likelihood of
Observations and Hidden States (not hard!)
Note that Good is the name of the dog, so it is a NOUN!
The likelihood of a particular labeled sequence of observations (e.g.,
p(Dog/NOUN, is/VERB, Good/NOUN | aij, bkm)) may be computed with the
direct calculation method using the following simple graphical algorithm.
Specifically,
p(K3/S1, K2/S2, K1/S1 | aij, bkm) = π1 b13 a12 b22 a21 b11
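As a quick check, the product above can be evaluated with the example parameters collected after slide 4 (a sketch; pi, A, B are the arrays defined there):

```python
# p(K3/S1, K2/S2, K1/S1 | a, b) = pi_1 * b13 * a12 * b22 * a21 * b11
p_labeled = pi[0] * B[0, 2] * A[0, 1] * B[1, 1] * A[1, 0] * B[0, 0]
print(p_labeled)   # 1.0 * 0.3 * 0.3 * 0.7 * 0.5 * 0.6 = 0.0189
```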
12
Extension to the case where the likelihood of the observations given the
parameters is needed (e.g., p(Dog, is, good | aij, bij))
KILLER EQUATION!!!!!
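The equation itself was shown as an image and is not in the transcript; it sums the labeled-sequence likelihood above over every possible labeling. A brute-force sketch of that sum (the function name direct_likelihood is our own; pi, A, B, obs are the arrays defined after slide 4):

```python
from itertools import product

def direct_likelihood(pi, A, B, obs):
    """Sum p(observations, labeling) over all N**T possible labelings."""
    N, T = len(pi), len(obs)
    total = 0.0
    for path in product(range(N), repeat=T):          # every possible state sequence X_1..X_T
        p = pi[path[0]] * B[path[0], obs[0]]          # start, emit O_1
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]   # jump, emit O_t
        total += p
    return total

print(direct_likelihood(pi, A, B, obs))               # 0.0315 for the K3, K2, K1 example
```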
13
Efficiency of Calculations is Important (e.g., Model-Fit)
  • Assume 1 multiplication per microsecond.
  • Assume an N = 1000 word vocabulary and a T = 7 word sentence.
  • (2T+1)·N^(T+1) multiplications by direct calculation:
    (2·7+1)·1000^(7+1) multiplications is about 475,000 million
    years of computer time!!!
  • 2N²T multiplications using the forward method is about 14
    seconds of computer time!!!

14
Forward, Backward, and Viterbi Calculations
  • Forward calculation methods are thus very useful.
  • Forward, Backward, and Viterbi Calculations will
    now be discussed.

15
Forward Calculations Overview
[Trellis diagram, times 2-4: hidden states S1 and S2 at each time step, each
emitting one of K1, K2, K3, with the same initial, transition, and emission
probabilities as the diagram on slide 4.]
16
Forward Calculations Time 2 (1 word example)
NOTE that α1(2) + α2(2) is the likelihood of the observation/word K3 in this
1-word example.
[Trellis diagram, time 2 slice, with the same probabilities as the diagram on
slide 4.]
17
Forward Calculations Time 3 (2 word example)
[Trellis diagram, times 2-3, showing the computation of α1(3), with the same
probabilities as the diagram on slide 4.]
18
Forward Calculations Time 4 (3 word example)
[Trellis diagram, times 2-4, with the same probabilities as the diagram on
slide 4.]
19
Forward Calculation of Likelihood Function
(emit and jump)
             t=1 (0-word)   t=2 (1-word)   t=3 (2-word)   t=4 (3-word)
α1(t)        1.0 (= π1)     0.21           0.0462         0.021294
α2(t)        0.0 (= π2)     0.09           0.0378         0.010206
L(t)         1.0            0.3            0.084          0.0315
where α1(2) = α1(1) a11 b13 + α2(1) a21 b23, α2(2) = α1(1) a12 b13 + α2(1) a22 b23,
α1(3) = α1(2) a11 b12 + α2(2) a21 b22, α2(3) = α1(2) a12 b12 + α2(2) a22 b22, and so on;
L(t) = α1(t) + α2(t) is the likelihood of the first t−1 observations.
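A forward-pass sketch that reproduces this table, reusing the pi, A, B, obs arrays defined after slide 4 (the function name forward and the column layout are our own):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Emit-and-jump forward pass; a sketch, not the presenter's code.

    Columns 0..T of alpha correspond to the table's times 1..T+1; the column
    for time t holds alpha_i(t) = p(O1, ..., O_{t-1}, X_t = S_i)."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((N, T + 1))
    alpha[:, 0] = pi                                  # alpha_i(1) = pi_i
    for t, o in enumerate(obs):
        # emit observation o from state i, then jump i -> j
        alpha[:, t + 1] = (alpha[:, t] * B[:, o]) @ A
    return alpha

# With pi, A, B, obs from the sketch after slide 4, the columns reproduce the table:
# (1.0, 0.0), (0.21, 0.09), (0.0462, 0.0378), (0.021294, 0.010206);
# the observation likelihood is forward(pi, A, B, obs)[:, -1].sum() = 0.0315.
```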
20
Backward Calculations Overview
[Trellis diagram, times 2-4, same as the forward overview, with the same
probabilities as the diagram on slide 4.]
21
Backward Calculations Time 4
[Trellis diagram, time 4 slice, with the same probabilities as the diagram on
slide 4.]
22
Backward Calculations Time 3
[Trellis diagram, time 3 slice, with the same probabilities as the diagram on
slide 4.]
23
Backward Calculations Time 2
NOTE that β1(2) + β2(2) is the likelihood of the observation/word sequence
K2, K1 in this 2-word example.
[Trellis diagram, times 2-4, with the same probabilities as the diagram on
slide 4.]
24
Backward Calculations Time 1
[Trellis diagram, times 2-4, with the same probabilities as the diagram on
slide 4.]
25
Backward Calculation of Likelihood Function
(EMIT AND JUMP)
             t=1            t=2            t=3          t=4
β1(t)        0.0315         0.045          0.6 (= b11)  1
β2(t)        0.029          0.245          0.1 (= b21)  1
L(t)         0.0315         0.290          0.7          1
where each βi(t) emits the word at time t from state Si and then jumps:
βi(t) = bi,o(t) (ai1 β1(t+1) + ai2 β2(t+1)), e.g. β1(2) = b12 (a11 β1(3) + a12 β2(3))
= 0.1 × (0.7 × 0.6 + 0.3 × 0.1) = 0.045; L(t) = β1(t) + β2(t) is the likelihood of the
observations from time t on, and L(1) = π1 β1(1) + π2 β2(1) = 0.0315.
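A backward-pass sketch that reproduces this table, in the same style as the forward sketch after slide 19 (the function name backward and the column layout are our own):

```python
import numpy as np

def backward(pi, A, B, obs):
    """Emit-and-jump backward pass; a sketch, not the presenter's code.

    Columns 0..T of beta correspond to the table's times 1..T+1; the column
    for time t holds beta_i(t) = p(O_t, ..., O_T | X_t = S_i)."""
    N, T = len(pi), len(obs)
    beta = np.ones((N, T + 1))                        # beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):
        # emit observation obs[t] from state i, then jump i -> j and continue
        beta[:, t] = B[:, obs[t]] * (A @ beta[:, t + 1])
    return beta

# With pi, A, B, obs from the sketch after slide 4, the columns reproduce the table:
# (0.0315, 0.029), (0.045, 0.245), (0.6, 0.1), (1, 1);
# the observation likelihood is (pi * backward(pi, A, B, obs)[:, 0]).sum() = 0.0315.
```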
26
You get the same answer going forward or backward!!
(The forward table from slide 19 and the backward table from slide 25,
repeated here, both give L = 0.0315 for the 3-word sequence.)
27
The Forward-Backward Method
  • Note that the forward method computes αi(t) = p(O1, …, Ot−1, Xt = Si).
  • Note that the backward method computes (for t > 1)
    βi(t) = p(Ot, …, OT | Xt = Si).
  • We can do the forward-backward method, which computes p(O1, …, OT)
    using the formula p(O1, …, OT) = Σi αi(t) βi(t) (using any choice of
    t = 1, …, T+1!); a small check appears below.
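A small check of this identity on the running example (a sketch reusing the forward() and backward() functions and the pi, A, B, obs arrays defined earlier):

```python
# Forward-backward check: for every t, the sum over states of
# alpha_i(t) * beta_i(t) is the same observation likelihood.
alpha = forward(pi, A, B, obs)
beta = backward(pi, A, B, obs)
print((alpha * beta).sum(axis=0))   # approximately [0.0315, 0.0315, 0.0315, 0.0315]
```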

28
Example Forward-Backward Calculation!
(Using the forward and backward tables above: for any choice of t,
Σi αi(t) βi(t) = 0.0315; e.g., at t = 2,
α1(2) β1(2) + α2(2) β2(2) = 0.21 × 0.045 + 0.09 × 0.245 = 0.0315.)
29
Solution to Problem 1
  • The hard part of the 1st problem was to find the likelihood of
    the observations for an HMM.
  • We can now do this using either the forward, backward, or
    forward-backward method.

30
Solution to Problem 2: Viterbi Algorithm (Computing the Most Probable Labeling)
  • Consider the direct calculation of labeled observations.
  • Previously we summed these likelihoods together across all
    possible labelings to solve the first problem, which was to
    compute the likelihood of the observations given the parameters
    (the hard part of HMM Question 1!).
  • We solved this problem using the forward or backward method.
  • Now we want to compute all possible labelings and their
    respective likelihoods and pick the labeling which is the
    largest!

EXAMPLE: Compute p(DOG/NOUN, is/VERB, good/ADJ | aij, bkm)
31
Efficiency of Calculations is Important (e.g., Most Likely Labeling Problem)
  • Just as in the forward-backward calculations, we can solve the
    problem of computing the likelihood of every possible one of
    the N^T labelings efficiently.
  • Instead of millions of years of computing time we can solve the
    problem in several seconds!!

32
Viterbi Algorithm Overview (same setup as
forward algorithm)
[Trellis diagram, times 2-4, same setup as the forward algorithm, with the
same probabilities as the diagram on slide 4.]
33
Forward Calculations Time 2 (1 word example)
[Trellis diagram, time 2 slice, with the same probabilities as the diagram on
slide 4.]
34
Backtracking Time 2 (1 word example)
[Trellis diagram, time 2 slice (backtracking step), with the same
probabilities as the diagram on slide 4.]
35
Forward Calculations (2 word example)
[Trellis diagram, times 2-3, with the same probabilities as the diagram on
slide 4.]
36
BACKTRACKING (2 word example)
[Trellis diagram, times 2-3 (backtracking step), with the same probabilities
as the diagram on slide 4.]
37
Formal Analysis of 2 word case
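The formal analysis on this slide is not reproduced in the transcript. Presumably it is the Viterbi recursion, which in the deck's emit-and-jump setup replaces the sum of the forward recursion with a max and records a backpointer (δ and ψ are our notation, not defined elsewhere in the transcript):

```latex
\delta_i(1) = \pi_i, \qquad
\delta_j(t+1) = \max_{i}\ \delta_i(t)\, a_{ij}\, b_{i,o_t}, \qquad
\psi_j(t+1) = \arg\max_{i}\ \delta_i(t)\, a_{ij}\, b_{i,o_t}.
```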
38
Forward Calculations Time 4 (3 word example)
[Trellis diagram, times 2-4, with the same probabilities as the diagram on
slide 4.]
39
Backtracking to Obtain Labeling for 3 word case
[Trellis diagram, times 2-4 (backtracking step), with the same probabilities
as the diagram on slide 4.]
40
Formal Analysis of 3 word case
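The formal analysis is likewise not in the transcript. A minimal Viterbi sketch for the 3-word example, implementing the recursion sketched after slide 37 and reusing the pi, A, B, obs arrays defined after slide 4 (the function name viterbi and the return convention are our own):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable state sequence over the forward-algorithm trellis; a sketch.

    Columns 0..T of delta correspond to times 1..T+1, as in the forward pass."""
    N, T = len(pi), len(obs)
    delta = np.zeros((N, T + 1))               # best path score ending in each state
    psi = np.zeros((N, T + 1), dtype=int)      # backpointer to the best predecessor
    delta[:, 0] = pi
    for t, o in enumerate(obs):
        scores = delta[:, t, None] * B[:, o, None] * A   # scores[i, j]: emit o in S_i, jump to S_j
        delta[:, t + 1] = scores.max(axis=0)
        psi[:, t + 1] = scores.argmax(axis=0)
    # backtrack from the best state in the last time slice
    path = [int(delta[:, T].argmax())]
    for t in range(T, 0, -1):
        path.append(int(psi[path[-1], t]))
    path.reverse()
    return path                                 # 0-based states for times 1..T+1

# With pi, A, B, obs from the sketch after slide 4:
# viterbi(pi, A, B, obs) -> [0, 1, 0, 0], i.e. the 3 words are labeled S1, S2, S1,
# the same labeling whose likelihood (0.0189) was computed on slide 11.
```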
41
Third Fundamental Question: Parameter Estimation
  • Make an initial guess for aij and bkm.
  • Compute the probability that one hidden state follows another,
    given aij and bkm and the sequence of observations (computed
    using the forward-backward algorithm).
  • Compute the probability of an observed state given a hidden
    state, given aij and bkm and the sequence of observations
    (computed using the forward-backward algorithm).
  • Use these computed probabilities to make an improved guess for
    aij and bkm.
  • Repeat this process until convergence (a one-step sketch
    appears below).
  • It can be shown that this algorithm does in fact converge to
    the correct choice for aij and bkm, assuming that the initial
    guess was close enough.
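A one-iteration sketch of this re-estimation procedure (Baum-Welch / EM) for a single observation sequence, reusing the forward() and backward() functions defined earlier. The function and variable names (baum_welch_step, gamma, xi) are our own, and the update formulas assume the deck's emit-and-jump convention:

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch (EM) re-estimation step for a single observation sequence.

    A sketch reusing forward() and backward() from the earlier sketches; in
    practice the step is repeated until the estimates stop changing."""
    N, M, T = len(pi), B.shape[1], len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
    likelihood = alpha[:, -1].sum()

    # gamma[:, t]: probability of each hidden state at time t+1 given the observations
    gamma = alpha * beta / likelihood

    # xi[t, i, j]: probability that state S_j follows state S_i at time t+1
    xi = np.zeros((T, N, N))
    for t, o in enumerate(obs):
        xi[t] = (alpha[:, t] * B[:, o])[:, None] * A * beta[:, t + 1][None, :] / likelihood

    # improved guesses for the parameters
    new_pi = gamma[:, 0]                                         # state distribution at time 1
    new_A = xi.sum(axis=0) / gamma[:, :T].sum(axis=1)[:, None]   # expected jumps / expected visits
    new_B = np.zeros((N, M))
    for k in range(M):
        emitted_k = np.array(obs) == k
        new_B[:, k] = gamma[:, :T][:, emitted_k].sum(axis=1) / gamma[:, :T].sum(axis=1)
    return new_pi, new_A, new_B
```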