Hidden Markov Models - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Hidden Markov Models
  • Richard Golden
  • (following the approach of Chapter 9 of Manning and
    Schütze, 2000)
  • REVISION DATE April 15 (Tuesday), 2003

2
VMM (Visible Markov Model)
3
HMM Notation
  • State Sequence Variables X1, ..., XT+1
  • Output Sequence Variables O1, ..., OT
  • Set of Hidden States (S1, ..., SN)
  • Output Alphabet (K1, ..., KM)
  • Initial State Probabilities (π1, ..., πN):
    πi = p(X1 = Si), i = 1, ..., N
  • State Transition Probabilities (aij), i, j = 1, ..., N:
    aij = p(Xt+1 = Sj | Xt = Si), t = 1, ..., T
  • Emission Probabilities (bij), i = 1, ..., N, j = 1, ..., M:
    bij = p(Ot = Kj | Xt = Si), t = 1, ..., T

4
HMM State-Emission Representation
  • Note that sometimes a Hidden Markov Model is
    represented by having the emission arrows come
    off the arcs
  • In this situation you would have a lot more
    emission arrows because there are a lot more arcs
  • But the transition and emission probabilities are
    the same; it just takes longer to draw on your
    PowerPoint presentation (self-conscious
    presentation)

[State diagram: hidden states S1 (π1 = 1) and S2 (π2 = 0); transition probabilities a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5; emission probabilities b11 = 0.6, b12 = 0.1, b13 = 0.3, b21 = 0.1, b22 = 0.7, b23 = 0.2 over the output symbols K1, K2, K3.]
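These parameters are used in all of the worked examples that follow. A minimal sketch (not part of the original slides) of how they might be encoded with NumPy:

    # Assumed sketch: the slide 4 parameters as NumPy arrays.
    # State 1 = S1, state 2 = S2; output symbols K1, K2, K3.
    import numpy as np

    pi = np.array([1.0, 0.0])            # initial state probabilities (pi1, pi2)
    A = np.array([[0.7, 0.3],            # aij = p(next state Sj | current state Si)
                  [0.5, 0.5]])
    B = np.array([[0.6, 0.1, 0.3],       # bik = p(emit Kk | state Si)
                  [0.1, 0.7, 0.2]])

    # Each row of A and B is a probability distribution.
    assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)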
5
Arc-Emission Representation
  • Note that sometimes a Hidden Markov Model is
    represented by having the emission arrows come
    off the arcs
  • In this situation you would have a lot more
    emission arrows because there are a lot more arcs
  • But the transition and emission probabilities are
    the same; it just takes longer to draw on your
    PowerPoint presentation (self-conscious
    presentation)

6
Fundamental Questions for HMMs
  • MODEL FIT
  • How can we compute the likelihood of observations and
    hidden states given known emission and transition
    probabilities?
  • Compute p(Dog/NOUN, is/VERB, Good/ADJ | aij, bkm)
  • How can we compute the likelihood of observations
    given known emission and transition
    probabilities? p(Dog, is, Good | aij, bkm)

7
Fundamental Questions for HMMs
  • INFERENCE
  • How can we infer the sequence of hidden states
    given the observations and the known emission and
    transition probabilities?
  • Maximize p(Dog/?, is/?, Good/? | aij, bkm) with
    respect to the unknown labels

8
Fundamental Questions for HMMs
  • LEARNING
  • How can we estimate the emission and transition
    probabilities given observations, assuming
    that the hidden states are observable during the
    learning process?
  • How can we estimate emission and transition
    probabilities given observations only?

9
Direct Calculation of Model Fit (note use of Markov Assumptions), Part 1
Follows directly from the definition of a conditional probability: p(o, x) = p(o | x) p(x)
EXAMPLE: P(Dog/NOUN, is/VERB, Good/ADJ | aij, bij)
= p(Dog, is, Good | NOUN, VERB, ADJ, aij, bij) × p(NOUN, VERB, ADJ | aij, bij)
10
Direct Calculation of Likelihood of Labeled Observations (note use of Markov Assumptions), Part 2

EXAMPLE: Compute p(Dog/NOUN, is/VERB, Good/ADJ | aij, bkm)
11
Graphical Algorithm Representation of Direct
Calculation of Likelihood of Observations and
Hidden States (not hard!)
Note that "Good" is the name of the dog, so it is a NOUN!
The likelihood of a particular labeled sequence of observations (e.g., p(Dog/NOUN, is/VERB, Good/NOUN | aij, bkm)) may be computed with the direct calculation method, using the following simple graphical algorithm. Specifically,
p(K3/S1, K2/S2, K1/S1 | aij, bkm) = π1 · b13 · a12 · b22 · a21 · b11
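As a quick check, the product above can be evaluated directly with the slide 4 numbers (a hypothetical sketch, not from the slides):

    # Direct calculation of the labeled-sequence likelihood from this slide:
    # K3 emitted in S1, K2 in S2, K1 in S1, using the slide 4 parameters.
    pi1, b13, a12, b22, a21, b11 = 1.0, 0.3, 0.3, 0.7, 0.5, 0.6
    print(pi1 * b13 * a12 * b22 * a21 * b11)   # 0.0189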
12
Extension to the case where the likelihood of the observations given the parameters is needed, e.g., p(Dog, is, Good | aij, bij): sum the labeled-sequence likelihood over every possible assignment of hidden states.
KILLER EQUATION!!!!!
13
Efficiency of Calculations is Important (e.g.,
Model-Fit)
  • Assume 1 multiplication per microsecond
  • Assume an N = 1000 word vocabulary and a T = 7 word
    sentence.
  • (2T+1)·N^(T+1) multiplications by direct
    calculation yields (2·7+1)·1000^(7+1), which is
    about 475,000 million years of computer time!!!
  • 2N^2·T multiplications using the forward method is
    about 14 seconds of computer time!!! (see the quick
    check below)
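A small arithmetic check of those two operation counts (an assumed sketch, not from the slides):

    # Operation counts from this slide, at one multiplication per microsecond.
    N, T = 1000, 7
    direct = (2 * T + 1) * N ** (T + 1)        # ~1.5e25 multiplications
    forward = 2 * N ** 2 * T                   # 1.4e7 multiplications
    seconds_per_mult = 1e-6
    print(direct * seconds_per_mult / (3600 * 24 * 365))   # ~4.8e11 years
    print(forward * seconds_per_mult)                      # 14.0 seconds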

14
Forward, Backward, and Viterbi Calculations
  • Forward calculation methods are thus very useful.
  • Forward, Backward, and Viterbi Calculations will
    now be discussed.

15
Forward Calculations Overview
[Trellis diagram: hidden states S1 and S2 unrolled over times 2-4, with transition arcs (a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5) and emission arrows to the symbols K1, K2, K3 (slide 4 parameters).]
16
Forward Calculations Time 2 (1 word example)
NOTE that α1(2) + α2(2) is the likelihood of the observation/word K3 in this 1-word example.
[Trellis diagram for time 2 only, slide 4 parameters.]
17
Forward Calculations Time 3 (2 word example)
[Trellis diagram for times 2-3, slide 4 parameters; α1(3) is computed at time 3.]
18
Forward Calculations Time 4 (3 word example)
[Trellis diagram for times 2-4, slide 4 parameters.]
19
Forward Calculation of Likelihood Function
(emit and jump)
t = 1 (0 words): α1(1) = π1 = 1.0; α2(1) = π2 = 0.0; L(1) = α1(1) + α2(1) = 1.0
t = 2 (1 word): α1(2) = α1(1)·a11·b13 + α2(1)·a21·b23 = 0.21; α2(2) = α1(1)·a12·b13 + α2(1)·a22·b23 = 0.09; L(2) = α1(2) + α2(2) = 0.3
t = 3 (2 words): α1(3) = α1(2)·a11·b12 + α2(2)·a21·b22 = 0.0462; α2(3) = α1(2)·a12·b12 + α2(2)·a22·b22 = 0.0378; L(3) = α1(3) + α2(3) = 0.084
t = 4 (3 words): α1(4) = α1(3)·a11·b11 + α2(3)·a21·b21 = 0.021294; α2(4) = α1(3)·a12·b11 + α2(3)·a22·b21 = 0.010206; L(4) = α1(4) + α2(4) = 0.0315
Here L(t) is the likelihood of the words observed so far (0, 1, 2, and 3 words respectively).
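A hedged sketch (assumed code, not the author's) of this emit-and-jump forward recursion, reproducing the table above with the slide 4 parameters and the observation sequence K3, K2, K1:

    import numpy as np

    pi = np.array([1.0, 0.0])
    A = np.array([[0.7, 0.3], [0.5, 0.5]])
    B = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    obs = [2, 1, 0]                       # K3, K2, K1 as 0-based symbol indices

    alpha = [pi]                          # alpha_i(1) = pi_i
    for o in obs:
        # alpha_j(t+1) = sum_i alpha_i(t) * b_i(o_t) * a_ij   ("emit, then jump")
        alpha.append((alpha[-1] * B[:, o]) @ A)

    for t, a in enumerate(alpha, start=1):
        print(t, a, a.sum())              # sums: 1.0, 0.3, 0.084, 0.0315 (the L(t) row)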
20
Backward Calculations Overview
[Trellis diagram: the same unrolled state/emission lattice as in the forward overview (slide 15), now traversed from right to left.]
21
Backward Calculations Time 4
[Trellis diagram for time 4 only, slide 4 parameters.]
22
Backward Calculations Time 3
[Trellis diagram for time 3 only, slide 4 parameters.]
23
Backward Calculations Time 2
NOTE that β1(2) + β2(2) is the likelihood of the observation/word sequence K2, K1 in this 2-word example.
[Trellis diagram for times 2-4, slide 4 parameters.]
24
Backward Calculations Time 1
[Trellis diagram for the full sequence (times 1-4), slide 4 parameters.]
25
Backward Calculation of Likelihood Function
(EMIT AND JUMP)
t = 4: β1(4) = 1; β2(4) = 1; L(4) = 1
t = 3: β1(3) = b11·(a11·β1(4) + a12·β2(4)) = 0.6; β2(3) = b21·(a21·β1(4) + a22·β2(4)) = 0.1; L(3) = β1(3) + β2(3) = 0.7
t = 2: β1(2) = b12·(a11·β1(3) + a12·β2(3)) = 0.045; β2(2) = b22·(a21·β1(3) + a22·β2(3)) = 0.245; L(2) = β1(2) + β2(2) = 0.290
t = 1: β1(1) = b13·(a11·β1(2) + a12·β2(2)) = 0.0315; β2(1) = b23·(a21·β1(2) + a22·β2(2)) = 0.029; L(1) = π1·β1(1) + π2·β2(1) = 0.0315
Here L(t) = p(Kt, ..., KT) is the likelihood of the words from position t onward.
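A matching hedged sketch of the emit-and-jump backward recursion for this table (same slide 4 parameters, observations K3, K2, K1):

    import numpy as np

    pi = np.array([1.0, 0.0])
    A = np.array([[0.7, 0.3], [0.5, 0.5]])
    B = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    obs = [2, 1, 0]                       # K3, K2, K1

    beta = [np.ones(2)]                   # beta_i at the final time index is 1
    for o in reversed(obs):
        # beta_i(t) = b_i(o_t) * sum_j a_ij * beta_j(t+1)
        beta.insert(0, B[:, o] * (A @ beta[0]))

    for t, b in enumerate(beta, start=1):
        print(t, b)                       # t = 1 row is [0.0315, 0.029]
    print(float(pi @ beta[0]))            # P(K3, K2, K1) = 0.0315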
26
You get the same answer going forward or backward!!
Backward: β1(t) = 0.0315, 0.045, 0.6, 1; β2(t) = 0.029, 0.245, 0.1, 1; L(t) = 0.0315, 0.290, 0.7, 1 (t = 1, ..., 4)
Forward: α1(t) = 1.0, 0.21, 0.0462, 0.021294; α2(t) = 0.0, 0.09, 0.0378, 0.010206; L(t) = 1.0, 0.3, 0.084, 0.0315 (t = 1, ..., 4)
27
The Forward-Backward Method
  • Note the forward method computes
    αi(t) = p(K1, ..., Kt-1, Xt = Si)
  • Note the backward method computes
    βi(t) = p(Kt, ..., KT | Xt = Si) (for t > 1)
  • We can do the forward-backward method, which
    computes p(K1, ..., KT) as Σi αi(t)·βi(t), using any
    choice of t = 1, ..., T+1! (see the sketch below)
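A self-contained sketch (assumed, combining the two earlier recursions) checking that identity for every choice of t:

    # Forward-backward check: sum_i alpha_i(t) * beta_i(t) is the same for every t.
    import numpy as np

    pi = np.array([1.0, 0.0])
    A = np.array([[0.7, 0.3], [0.5, 0.5]])
    B = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    obs = [2, 1, 0]                               # K3, K2, K1

    alpha = [pi]
    for o in obs:
        alpha.append((alpha[-1] * B[:, o]) @ A)   # emit, then jump
    beta = [np.ones(2)]
    for o in reversed(obs):
        beta.insert(0, B[:, o] * (A @ beta[0]))

    for t in range(len(obs) + 1):
        print(t + 1, float(alpha[t] @ beta[t]))   # 0.0315 for every choice of t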

28
Example Forward-Backward Calculation!
Backward: β1(t) = 0.0315, 0.045, 0.6, 1; β2(t) = 0.029, 0.245, 0.1, 1; L(t) = 0.0315, 0.290, 0.7, 1 (t = 1, ..., 4)
Forward: α1(t) = 1.0, 0.21, 0.0462, 0.021294; α2(t) = 0.0, 0.09, 0.0378, 0.010206; L(t) = 1.0, 0.3, 0.084, 0.0315 (t = 1, ..., 4)
For example, at t = 2: α1(2)·β1(2) + α2(2)·β2(2) = 0.21·0.045 + 0.09·0.245 = 0.0315, the same full-sequence likelihood.
29
Solution to Problem 1
  • The hard part of the 1st problem was to find
    the likelihood of the observations for an HMM
  • We can now do this using either the forward,
    backward, or forward-backward method.

30
Solution to Problem 2: Viterbi Algorithm (Computing the Most Probable Labeling)
  • Consider direct calculation of labeled observations
  • Previously we summed these likelihoods together
    across all possible labelings to solve the first
    problem, which was to compute the likelihood of
    the observations given the parameters (hard part
    of HMM Question 1!).
  • We solved this problem using the forward or backward
    method.
  • Now we want to compute all possible labelings and
    their respective likelihoods and pick the
    labeling whose likelihood is largest!

EXAMPLE: Compute p(Dog/NOUN, is/VERB, Good/ADJ | aij, bkm)
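A hedged Viterbi sketch (assumed code, not the author's): replace the sum in the forward recursion with a max and keep backpointers, again with the slide 4 parameters and the observations K3, K2, K1:

    # Viterbi: delta_j(t+1) = max_i delta_i(t) * b_i(o_t) * a_ij, with backpointers.
    import numpy as np

    pi = np.array([1.0, 0.0])
    A = np.array([[0.7, 0.3], [0.5, 0.5]])
    B = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    obs = [2, 1, 0]                                   # K3, K2, K1

    delta, back = [pi], []
    for o in obs:
        scores = (delta[-1] * B[:, o])[:, None] * A   # score of moving from Si to Sj
        back.append(scores.argmax(axis=0))            # best predecessor of each Sj
        delta.append(scores.max(axis=0))

    # Backtrack the most probable state sequence X1, ..., XT+1
    states = [int(delta[-1].argmax())]
    for bp in reversed(back):
        states.insert(0, int(bp[states[0]]))
    print([f"S{s + 1}" for s in states])              # ['S1', 'S2', 'S1', 'S1']
    print(float(delta[-1].max()))                     # 0.01323

Under the emit-and-jump convention the path includes the final state X4; the first three states give the labels for the three observed words (K3/S1, K2/S2, K1/S1).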
31
Efficiency of Calculations is Important (e.g.,
Most Likely Labeling Problem)
  • Just as in the forward-backward calculations, we
    can solve the problem of computing the likelihood of
    every possible one of the N^T labelings
    efficiently
  • Instead of millions of years of computing time we
    can solve the problem in several seconds!!

32
Viterbi Algorithm Overview (same setup as
forward algorithm)
[Trellis diagram: the same unrolled state/emission lattice as in the forward overview (slide 15), slide 4 parameters.]
33
Forward Calculations Time 2 (1 word example)
[Trellis diagram for time 2 only, slide 4 parameters.]
34
Backtracking Time 2 (1 word example)
[Trellis diagram for time 2 (backtracking step), slide 4 parameters.]
35
Forward Calculations (2 word example)
[Trellis diagram for times 2-3, slide 4 parameters.]
36
BACKTRACKING (2 word example)
[Trellis diagram for times 2-3 (backtracking step), slide 4 parameters.]
37
Formal Analysis of 2 word case
38
Forward Calculations Time 4 (3 word example)
[Trellis diagram for times 2-4, slide 4 parameters.]
39
Backtracking to Obtain Labeling for 3 word case
[Trellis diagram for times 2-4 (backtracking step), slide 4 parameters.]
40
Formal Analysis of 3 word case
41
Third Fundamental Question: Parameter Estimation
  • Make an initial guess for aij and bkm
  • Compute the probability that one hidden state follows
    another, given aij and bkm and the sequence of
    observations (computed using the forward-backward
    algorithm)
  • Compute the probability of an observation given a
    hidden state, given aij and bkm and the sequence
    of observations (computed using the forward-backward
    algorithm)
  • Use these computed probabilities to make an
    improved guess for aij and bkm
  • Repeat this process until convergence (a rough
    sketch of one such re-estimation step follows below)
  • It can be shown that this algorithm does in fact
    converge to the correct choice for aij and
    bkm, assuming that the initial guess was close
    enough.
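A rough sketch (assumed, not from the slides) of one such re-estimation step in the Baum-Welch / EM style, built on the same emit-and-jump forward and backward recursions:

    # One EM-style re-estimation step for an HMM (emit-and-jump convention).
    import numpy as np

    def reestimate(pi, A, B, obs):
        T, N = len(obs), len(pi)
        alpha = [pi]
        for o in obs:
            alpha.append((alpha[-1] * B[:, o]) @ A)            # forward pass
        beta = [np.ones(N)]
        for o in reversed(obs):
            beta.insert(0, B[:, o] * (A @ beta[0]))            # backward pass
        L = alpha[-1].sum()                                    # likelihood of obs
        # xi[t, i, j]: probability that Si at time t is followed by Sj, given obs
        xi = np.array([np.outer(alpha[t] * B[:, obs[t]], beta[t + 1]) * A / L
                       for t in range(T)])
        gamma = xi.sum(axis=2)                                 # gamma[t, i] = P(Si at t | obs)
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]
        new_B = np.zeros_like(B)
        for t, o in enumerate(obs):
            new_B[:, o] += gamma[t]
        new_B /= gamma.sum(axis=0)[:, None]
        return new_pi, new_A, new_B

    # Make an initial guess, then re-estimate repeatedly until the guesses stop changing.
    pi = np.array([0.6, 0.4])
    A = np.array([[0.6, 0.4], [0.3, 0.7]])
    B = np.array([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
    pi, A, B = reestimate(pi, A, B, [2, 1, 0])   # one improved guess; repeat as needed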