Part II. Statistical NLP

Transcript and Presenter's Notes

1
Advanced Artificial Intelligence
  • Part II. Statistical NLP

Hidden Markov Models
Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme
Most slides taken (or adapted) from David Meir Blei; figures from Manning and Schuetze and from Rabiner
2
Contents
  • Markov Models
  • Hidden Markov Models
  • Three problems - three algorithms
  • Decoding
  • Viterbi
  • Baum-Welch
  • Next chapter
  • Application to part-of-speech-tagging
    (POS-tagging)
  • Largely chapter 9 of Statistical NLP, Manning and
    Schuetze, or Rabiner, A tutorial on HMMs and
    selected applications in Speech Recognition,
    Proc. IEEE

3
Motivations and Applications
  • Part-of-speech tagging / Sequence tagging
  • The representative put chairs on the table
  • AT NN VBD NNS IN AT NN
  • AT JJ NN VBZ IN AT NN
  • Some tags
  • AT = article, NN = singular or mass noun, VBD =
    verb (past tense), NNS = plural noun, IN =
    preposition, JJ = adjective

4
Bioinformatics
  • Durbin et al. Biological Sequence Analysis,
    Cambridge University Press.
  • Several applications, e.g. proteins
  • From primary structure ATCPLELLLD
  • Infer secondary structure HHHBBBBBC..

5
Other Applications
  • Speech Recognition
  • From acoustic signals
  • Infer the sentence
  • Robotics
  • From sensory readings
  • Infer the trajectory / location

6
What is a (Visible) Markov Model ?
  • Graphical Model (Can be interpreted as Bayesian
    Net)
  • Circles indicate states
  • Arrows indicate probabilistic dependencies
    between states
  • State depends only on the previous state
  • The past is independent of the future given the
    present.
  • Recall the introduction to N-grams!

7
Markov Model Formalization
[Figure: Markov chain of states S → S → S → S → S]
  • (S, Π, A)
  • S = {s_1, ..., s_N} are the values for the hidden states
  • Limited Horizon (Markov Assumption)
  • Time Invariant (Stationary)
  • Transition Matrix A
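The accompanying equations are missing from the transcript; in this notation (following Manning and Schuetze, ch. 9) they read:

```latex
% Limited horizon (Markov assumption)
P(X_{t+1} = s_k \mid X_1, \ldots, X_t) = P(X_{t+1} = s_k \mid X_t)
% Time invariance (stationarity)
P(X_{t+1} = s_k \mid X_t) = P(X_2 = s_k \mid X_1)
% Transition matrix
a_{ij} = P(X_{t+1} = s_j \mid X_t = s_i), \qquad a_{ij} \ge 0, \qquad \sum_{j=1}^{N} a_{ij} = 1
```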

8
Markov Model Formalization
[Figure: Markov chain of states S → S → S → S → S with transition probabilities A on the arrows]
  • (S, Π, A)
  • S = {s_1, ..., s_N} are the values for the hidden states
  • Π = {π_i} are the initial state probabilities
  • A = {a_ij} are the state transition probabilities

9
What is the probability of a sequence of states ?
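The slide's formula is missing from the transcript; by the chain rule and the Markov assumption it is:

```latex
P(X_1, \ldots, X_T) = P(X_1)\, P(X_2 \mid X_1) \cdots P(X_T \mid X_{T-1})
                    = \pi_{X_1} \prod_{t=1}^{T-1} a_{X_t X_{t+1}}
```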
10
What is an HMM?
  • Graphical Model
  • Circles indicate states
  • Arrows indicate probabilistic dependencies
    between states
  • HMM = Hidden Markov Model

11
What is an HMM?
  • Green circles are hidden states
  • Dependent only on the previous state

12
What is an HMM?
  • Purple nodes are observed states
  • Dependent only on their corresponding hidden
    state
  • The past is independent of the future given the
    present

13
HMM Formalism
[Figure: hidden state chain S → S → S → S → S, each state emitting an observation K]
  • (S, K, Π, A, B)
  • S = {s_1, ..., s_N} are the values for the hidden states
  • K = {k_1, ..., k_M} are the values for the observations

14
HMM Formalism
[Figure: hidden state chain with transition probabilities A between the states S and emission probabilities B from each state to its observation K]
  • (S, K, Π, A, B)
  • Π = {π_i} are the initial state probabilities
  • A = {a_ij} are the state transition probabilities
  • B = {b_ik} are the observation state probabilities
  • Note: sometimes one uses B = {b_ijk}
  • the output then depends on the previous state /
    transition as well
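Written out (the defining equations are not in the transcript), the standard state-emission definitions are:

```latex
\pi_i = P(X_1 = s_i)
a_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)
b_{ik} = P(O_t = k \mid X_t = s_i)
% arc-emission variant:
b_{ijk} = P(O_t = k \mid X_t = s_i,\; X_{t+1} = s_j)
```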

15
The crazy soft drink machine
  • Fig. 9.2 of Manning and Schuetze

16
Probability of lem, ice?
  • Sum over all paths taken through the HMM
  • Start in CP
  • 1 x 0.3 x 0.7 x 0.1 = 0.021
  • 1 x 0.3 x 0.3 x 0.7 = 0.063
  • Total: 0.021 + 0.063 = 0.084
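A minimal brute-force sketch of this sum, assuming the soft drink machine's parameters from Fig. 9.2 (states CP and IP, start in CP; outputs cola, ice_t, lem). The entries that actually enter this calculation are the ones visible in the two products above; the remaining entries are assumptions that do not affect this particular result.

```python
import itertools

# Assumed crazy soft drink machine parameters (Fig. 9.2 style):
states = ["CP", "IP"]
pi = {"CP": 1.0, "IP": 0.0}                              # start in CP
A = {"CP": {"CP": 0.7, "IP": 0.3},                       # transitions
     "IP": {"CP": 0.5, "IP": 0.5}}
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},      # emissions
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def prob_obs(obs):
    """Sum P(path) * P(obs | path) over all hidden state paths."""
    total = 0.0
    for path in itertools.product(states, repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

print(prob_obs(["lem", "ice_t"]))   # 0.3*0.7*0.1 + 0.3*0.3*0.7 = 0.084
```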

17
HMMs and Bayesian Nets (1)
[Figure: Markov chain x_1 → ... → x_{t-1} → x_t → x_{t+1} → ... → x_T]
18
HMM and Bayesian Nets (2)
[Figure: HMM as a Bayesian network: hidden chain x_1 → ... → x_{t-1} → x_t → x_{t+1} → ... → x_T, each hidden state x emitting an observation o]
Because of d-separation, the future states and
observations are conditionally independent of the past
states and observations given the present state.
The past is independent of the future given the
present.
19
Inference in an HMM
  • Compute the probability of a given observation
    sequence
  • Given an observation sequence, compute the most
    likely hidden state sequence
  • Given an observation sequence and set of possible
    models, which model most closely fits the data?

20
Decoding
[Figure: observation sequence o_1, ..., o_{t-1}, o_t, o_{t+1}, ...]
Given an observation sequence and a model,
compute the probability of the observation
sequence
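The derivation on the following Decoding slides did not survive the transcript; the standard computation, and the reason dynamic programming is needed, is:

```latex
P(O \mid \mu) = \sum_{X} P(O \mid X, \mu)\, P(X \mid \mu)
             = \sum_{x_1 \cdots x_T} \pi_{x_1}\, b_{x_1 o_1}
               \prod_{t=2}^{T} a_{x_{t-1} x_t}\, b_{x_t o_t}
% Direct evaluation sums over N^T state sequences, roughly O(T N^T) work;
% the forward procedure reduces this to O(N^2 T).
```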
21
Decoding
22
Decoding
23
Decoding
24
Decoding
25
Decoding
26
Dynamic Programming
27
Forward Procedure
  • Special structure gives us an efficient solution
    using dynamic programming.
  • Intuition: the probability of the first t
    observations is the same for all possible t+1
    length state sequences.
  • Define
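The definition and recursion are missing from the transcript; in one common convention (Rabiner's) the forward variable is:

```latex
\alpha_i(t) = P(o_1 \cdots o_t,\; x_t = i \mid \mu)
% Initialization
\alpha_i(1) = \pi_i\, b_{i o_1}
% Induction
\alpha_j(t+1) = \Big[ \sum_{i=1}^{N} \alpha_i(t)\, a_{ij} \Big]\, b_{j o_{t+1}}
% Termination
P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T)
```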

28
Forward Procedure
29
Forward Procedure
30
Forward Procedure
31
Forward Procedure
32
Forward Procedure
33
Forward Procedure
34
Forward Procedure
35
Forward Procedure
36
Dynamic Programming
37
Backward Procedure
[Figure: HMM as a Bayesian network: hidden chain x_1, ..., x_{t-1}, x_t, x_{t+1}, ..., x_T with observations o_1, ..., o_T]
Probability of the rest of the observation sequence
given the state at time t
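In the same convention as the forward variable, the backward variable and its recursion (not in the transcript) are:

```latex
\beta_i(t) = P(o_{t+1} \cdots o_T \mid x_t = i, \mu)
% Initialization
\beta_i(T) = 1
% Induction
\beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)
```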
38
(No Transcript)
39
Decoding Solution
[Figure: HMM trellis over hidden states x_1, ..., x_{t-1}, x_t, x_{t+1}, ..., x_T and observations o_1, ..., o_T]
Forward Procedure
Backward Procedure
Combination
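The three expressions on this slide are missing from the transcript; with the conventions above they are:

```latex
% Forward procedure
P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T)
% Backward procedure
P(O \mid \mu) = \sum_{i=1}^{N} \pi_i\, b_{i o_1}\, \beta_i(1)
% Combination (valid for any t)
P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(t)\, \beta_i(t)
```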
40
(No Transcript)
41
Best State Sequence
  • Find the state sequence that best explains the
    observations
  • Two approaches
  • Individually most likely states
  • Most likely sequence (Viterbi)
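For the first approach, the per-state posterior (its formula is not in the transcript) is, in the notation above:

```latex
\gamma_i(t) = P(x_t = i \mid O, \mu)
            = \frac{\alpha_i(t)\, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t)\, \beta_j(t)},
\qquad \hat{x}_t = \arg\max_{i} \gamma_i(t)
```

Choosing each state individually can yield a sequence that is itself very unlikely (it may even contain a zero-probability transition), which is why the Viterbi sequence is usually preferred.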

42
Best State Sequence (1)
43
Best State Sequence (2)
  • Find the state sequence that best explains the
    observations
  • Viterbi algorithm

44
Viterbi Algorithm
[Figure: trellis of hidden states x_1, ..., x_{t-1} leading to state j at time t, over observations o_1, ..., o_{t-1}, o_t, o_{t+1}, ..., o_T]
The state sequence which maximizes the
probability of seeing the observations to time
t-1, landing in state j, and seeing the
observation at time t
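In symbols (the slide's equations are missing), this defines:

```latex
\delta_j(t) = \max_{x_1 \cdots x_{t-1}} P(x_1 \cdots x_{t-1},\; o_1 \cdots o_{t-1},\; x_t = j,\; o_t \mid \mu)
% Initialization
\delta_j(1) = \pi_j\, b_{j o_1}
% Recursion, with back-pointers \psi
\delta_j(t+1) = \max_{i} \delta_i(t)\, a_{ij}\, b_{j o_{t+1}},
\qquad \psi_j(t+1) = \arg\max_{i} \delta_i(t)\, a_{ij}
```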
45
Viterbi Algorithm
[Figure: trellis over hidden states x_1, ..., x_{t-1}, x_t, x_{t+1}]
Recursive Computation
46
Viterbi Algorithm
[Figure: trellis over hidden states x_1, ..., x_{t-1}, x_t, x_{t+1}, ..., x_T]
Compute the most likely state sequence by working
backwards
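A minimal NumPy sketch of the recursion and backtrace just described (not from the slides; the function and variable names are my own):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden state sequence for an observation sequence.

    pi:  (N,)   initial state probabilities
    A:   (N,N)  transitions, A[i, j] = P(x_{t+1}=j | x_t=i)
    B:   (N,M)  emissions,   B[i, k] = P(o_t=k | x_t=i)
    obs: list of observation indices o_1 ... o_T
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))            # best path probability ending in state j at time t
    psi = np.zeros((T, N), dtype=int)   # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Work backwards from the most likely final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return list(reversed(path))
```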
47
HMMs and Bayesian Nets (1)
[Figure: Markov chain x_1 → ... → x_{t-1} → x_t → x_{t+1} → ... → x_T]
48
HMM and Bayesian Nets (2)
[Figure: HMM as a Bayesian network: hidden chain x_1 → ... → x_{t-1} → x_t → x_{t+1} → ... → x_T, each hidden state x emitting an observation o]
Because of d-separation, the future states and
observations are conditionally independent of the past
states and observations given the present state.
The past is independent of the future given the
present.
49
Inference in an HMM
  • Compute the probability of a given observation
    sequence
  • Given an observation sequence, compute the most
    likely hidden state sequence
  • Given an observation sequence and set of possible
    models, which model most closely fits the data?

50
Dynamic Programming
51
Parameter Estimation
[Figure: HMM trellis with transition probabilities A and emission probabilities B]
  • Given an observation sequence, find the model
    that is most likely to produce that sequence.
  • No analytic method
  • Given a model and observation sequence, update
    the model parameters to better fit the
    observations.

52
(No Transcript)
53
Parameter Estimation
[Figure: HMM trellis with transition probabilities A and emission probabilities B]
Probability of traversing an arc
Probability of being in state i
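The formulas for these two quantities are missing from the transcript; in the standard Baum-Welch notation (Manning and Schuetze write p_t(i, j), Rabiner writes ξ_t(i, j)) they are:

```latex
% Probability of traversing the arc i -> j at time t
p_t(i, j) = P(x_t = i,\; x_{t+1} = j \mid O, \mu)
          = \frac{\alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)}{P(O \mid \mu)}
% Probability of being in state i at time t
\gamma_i(t) = P(x_t = i \mid O, \mu)
            = \frac{\alpha_i(t)\, \beta_i(t)}{P(O \mid \mu)}
            = \sum_{j=1}^{N} p_t(i, j) \quad (t < T)
```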
54
Parameter Estimation
[Figure: HMM trellis with transition probabilities A and emission probabilities B]
Now we can compute the new estimates of the model
parameters.
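The re-estimation formulas themselves are not in the transcript; the standard Baum-Welch updates are:

```latex
\hat{\pi}_i = \gamma_i(1)
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} p_t(i, j)}{\sum_{t=1}^{T-1} \gamma_i(t)}
\hat{b}_{ik} = \frac{\sum_{t \,:\, o_t = k} \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}
```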
55
Instance of Expectation Maximization
  • We have that P(O | new model) ≥ P(O | old model)
    after every re-estimation step
  • We may get stuck in a local maximum (or even a
    saddle point)
  • Nevertheless, Baum-Welch is usually effective

56
Some Variants
  • So far, ergodic models
  • All states are connected
  • Not always wanted
  • Epsilon or null-transitions
  • Not all states/transitions emit output symbols
  • Parameter tying
  • Assuming that certain parameters are shared
  • Reduces the number of parameters that have to be
    estimated
  • Logical HMMs (Kersting, De Raedt, Raiko)
  • Working with structured states and observation
    symbols
  • Working with log probabilities and addition
    instead of multiplication of probabilities
    (typically done)
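As a small illustration of the last point (my own sketch, not from the slides): one step of the forward recursion carried out with log probabilities, so that products become sums and underflow is avoided. The array names are hypothetical; scipy.special.logsumexp does the stable summation.

```python
import numpy as np
from scipy.special import logsumexp

def forward_step_log(log_alpha, log_A, log_b_next):
    """One forward-recursion step in log space.

    log_alpha:  (N,)   log alpha_i(t)
    log_A:      (N,N)  log a_ij
    log_b_next: (N,)   log b_j(o_{t+1})
    """
    # log alpha_j(t+1) = logsumexp_i(log alpha_i(t) + log a_ij) + log b_j(o_{t+1})
    return logsumexp(log_alpha[:, None] + log_A, axis=0) + log_b_next
```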

57
The Most Important Thing
[Figure: HMM trellis with transition probabilities A and emission probabilities B]
We can use the special structure of this model to
do a lot of neat math and solve problems that are
otherwise not solvable.
58
HMMs from an Agent Perspective
  • AI: A Modern Approach
  • AI is the study of rational agents
  • Third part by Wolfram Burgard on Reinforcement
    learning
  • HMMs can also be used here
  • Typically one is interested in P(state | observations)

59
Example
  • Possible states
  • snow, no snow
  • Observations
  • skis, no skis
  • Questions
  • Was there snow the day before yesterday (given a
    sequence of observations)?
  • Is there snow now (given a sequence of
    observations)?
  • Will there be snow tomorrow, given a sequence of
    observations? Next week?

60
HMM and Agents
  • Question: compute P(x_t | o_1 ... o_T)
  • Case 1: often called smoothing
  • t < T: see last time
  • Only the part of the trellis between t and T is
    needed

61
HMM and Agents
  • Case 2: often called filtering
  • t = T: the last time point
  • Can we make it recursive? I.e., go from T-1 to T?

62
HMM and Agents
  • Case 2: often called filtering
  • t = T: the last time point

63
HMM and Agents
  • Case 3: often called prediction
  • t = T+1 (or T+k): not yet seen
  • Interesting: recursive
  • Easily extended towards k > 1
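The recursions referred to in Cases 2 and 3 are not in the transcript; with the forward variable defined earlier they can be written as:

```latex
% Filtering (t = T)
P(x_T = i \mid o_1 \cdots o_T) = \frac{\alpha_i(T)}{\sum_{j} \alpha_j(T)},
\qquad \alpha_j(T) = \Big[\sum_{i} \alpha_i(T-1)\, a_{ij}\Big]\, b_{j o_T}
% Prediction (t = T+1)
P(x_{T+1} = j \mid o_1 \cdots o_T) = \sum_{i} P(x_T = i \mid o_1 \cdots o_T)\, a_{ij}
% For k > 1, apply the transition matrix k times.
```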

64
Extensions
  • Use Dynamic Bayesian networks instead of HMMs
  • One state corresponds to a Bayesian Net
  • Observations can become more complex
  • Involve actions of the agent as well
  • Cf. Wolfram Burgard's part