Title: HMM for POS Tagging


1
HMM for POS Tagging
  • Heng Ji
  • hengji_at_cs.qc.cuny.edu
  • Feb 4, 2008

Acknowledgement: some slides are from Ralph Grishman
and Nicolas Nicolov
2
Outline
  • HMMs and Viterbi algorithm

3
Machine Learning based POS Tagging
  • Statistical approaches
  • Machine learning of rules
  • Role of corpus
  • No corpus (hand-written)
  • No machine learning (hand-written)
  • Unsupervised learning from raw data
  • Supervised learning from annotated data

4
The Basic Idea
  • For a string of words
  • W = w1 w2 w3 … wn
  • find the string of POS tags
  • T = t1 t2 t3 … tn
  • which maximizes P(T | W)
  • i.e., the probability of tag string T given that
    the word string was W
  • i.e., that W was tagged T

5
But, the Sparse Data Problem
  • Rich models often require vast amounts of data
  • Count up instances of the string "heat oil in a
    large pot" in the training corpus, and pick the
    most common tag assignment to the string.
  • Too many possible combinations

6
POS Tagging as Sequence Classification
  • We are given a sentence (an observation or
    sequence of observations)
  • Secretariat is expected to race tomorrow
  • What is the best sequence of tags that
    corresponds to this sequence of observations?
  • Probabilistic view
  • Consider all possible sequences of tags
  • Out of this universe of sequences, choose the tag
    sequence which is most probable given the
    observation sequence of n words w1 … wn.

7
Getting to HMMs
  • We want, out of all sequences of n tags t1 … tn, the
    single tag sequence such that P(t1 … tn | w1 … wn) is
    highest.
  • The hat (^) means our estimate of the best one
  • argmax_x f(x) means the x such that f(x) is
    maximized
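The two bullets above combine into the equation the slide renders graphically; in LaTeX:

```latex
\hat{T} \;=\; \operatorname*{argmax}_{t_1 \ldots t_n} \; P(t_1 \ldots t_n \mid w_1 \ldots w_n)
```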

8
Getting to HMMs
  • This equation is guaranteed to give us the best
    tag sequence
  • But how to make it operational? How to compute
    this value?
  • Intuition of Bayesian classification
  • Use Bayes rule to transform this equation into a
    set of other probabilities that are easier to
    compute

9
Goal of POS Tagging
  • We want the best set of tags for a sequence of
    words (a sentence)
  • W = a sequence of words
  • T = a sequence of tags

Our Goal
  • Example
  • P((NN NN P DET ADJ NN) | (heat oil in a
    large pot))

10
Reminder: Apply Bayes' Theorem (1763)

P(T | W) = P(W | T) × P(T) / P(W)

posterior = likelihood × prior / marginal likelihood

Our goal: maximize the posterior!
Reverend Thomas Bayes, Presbyterian minister (1702-1761)
11
How to Count
  • P(W | T) and P(T) can be counted from a large
    hand-tagged corpus, with smoothing to get
    rid of the zeroes
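As a minimal sketch of this counting step in Python (the two-sentence tagged corpus and the add-one smoothing are illustrative assumptions, not from the slides):

```python
from collections import Counter

# Estimate P(word | tag) and P(tag_i | tag_{i-1}) by counting a
# hand-tagged corpus. The tiny corpus below is an illustrative
# stand-in, not data from the slides.
corpus = [
    [("fish", "NN"), ("sleep", "VB")],
    [("heat", "VB"), ("oil", "NN"), ("in", "IN"), ("a", "DT"),
     ("large", "JJ"), ("pot", "NN")],
]

emit, trans, tag_count = Counter(), Counter(), Counter()
for sent in corpus:
    prev = "<s>"                     # sentence-start pseudo-tag
    tag_count[prev] += 1
    for word, tag in sent:
        emit[(tag, word)] += 1       # counts for P(word | tag)
        trans[(prev, tag)] += 1      # counts for P(tag | prev tag)
        tag_count[tag] += 1
        prev = tag

vocab = {w for s in corpus for w, _ in s}
tags = {t for s in corpus for _, t in s}

def p_emit(word, tag):
    # add-one (Laplace) smoothing so unseen (tag, word) pairs keep
    # a little probability mass instead of becoming hard zeroes
    return (emit[(tag, word)] + 1) / (tag_count[tag] + len(vocab))

def p_trans(prev, tag):
    return (trans[(prev, tag)] + 1) / (tag_count[prev] + len(tags))
```

Add-one smoothing is only the simplest choice; any smoothing scheme that removes the zero counts works here.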

12
Count P(W | T) and P(T)
  • Assume each word in the sequence depends only on
    its corresponding tag
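Written out, the assumption this bullet states (a sketch; the slide's own rendering is in the image):

```latex
P(W \mid T) \;\approx\; \prod_{i=1}^{n} P(w_i \mid t_i)
```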

13
Count P(T)
  • Make a Markov assumption and use N-grams over
    tags ...
  • P(T) is a product of the probability of N-grams
    that make it up
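With a bigram (first-order Markov) assumption over tags, the product the bullets describe is:

```latex
P(T) \;\approx\; \prod_{i=1}^{n} P(t_i \mid t_{i-1})
```

where t_0 is a sentence-start pseudo-tag.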

14
Example: a Moore Machine
  • Goal: What is the most probable sequence of
    animals if you hear Moo, Hello, Quack?

15
A Hidden Markov Model (HMM)
16
The State Space of a Moore Machine
17
Viterbi Decoding of a Moore Machine
  • Observations: moo (t1), hello (t2), quack (t3)

          t0    t1    t2     t3      t4
  START   1     0     0      0       0
  COW     0     0.9   0.045  0       0
  DUCK    0     0     0.108  0.0324  0
  END     0     0     0      0       0.00648

  • COW, t1: 1 × 1 × 0.9 = 0.9
  • COW, t2: 0.9 × 0.5 × 0.1 = 0.045
  • DUCK, t2: 0.9 × 0.3 × 0.4 = 0.108
  • DUCK, t3: max(0.108 × 0.5 × 0.6, 0.045 × 0.3 × 0.6)
    = max(0.0324, 0.0081) = 0.0324
  • END, t4: 0.0324 × 0.2 × 1 = 0.00648
18
Computing Probabilities
  • viterbi[s, t] = max over s' of ( viterbi[s', t-1]
    × transition probability P(s | s')
    × emission probability P(token_t | s) )
  • for each s, t
  • record which (s', t-1) contributed the maximum
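A runnable sketch of this recurrence on the cow/duck machine from the preceding slides. The probabilities are read off the worked table; any value not shown there (e.g. P(COW | DUCK)) is an assumption chosen so each state's outgoing transitions sum to 1:

```python
# Transition probabilities P(next | current); values not visible in the
# transcript (e.g. P(COW | DUCK)) are assumptions.
trans = {
    "START": {"COW": 1.0},
    "COW":   {"COW": 0.5, "DUCK": 0.3, "END": 0.2},
    "DUCK":  {"COW": 0.3, "DUCK": 0.5, "END": 0.2},
}
# Emission probabilities P(token | state), read off the worked example.
emit = {
    "COW":  {"moo": 0.9, "hello": 0.1, "quack": 0.0},
    "DUCK": {"moo": 0.0, "hello": 0.4, "quack": 0.6},
}

def viterbi(tokens):
    # v[s] = (best probability of any path ending in state s, that path)
    v = {"START": (1.0, [])}
    for tok in tokens:
        v = {s: max((p * trans[prev].get(s, 0.0) * emit[s][tok],
                     path + [s])
                    for prev, (p, path) in v.items())
             for s in ("COW", "DUCK")}
    # close off the best path with the transition into END
    return max((p * trans[s].get("END", 0.0), path)
               for s, (p, path) in v.items())

prob, path = viterbi(["moo", "hello", "quack"])
# path is COW, DUCK, DUCK with probability 0.00648, matching the table
```

Keeping the whole back-path in each cell stands in for the back pointers the slide mentions; a real tagger would store only the argmax predecessor per cell.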

19
Analyzing
  • Fish sleep.

20
A Simple POS HMM
21
Word Emission Probabilities P(word | state)
  • A two-word language fish and sleep
  • Suppose in our training corpus,
  • fish appears 8 times as a noun and 4 times as a
    verb
  • sleep appears twice as a noun and 6 times as a
    verb
  • Emission probabilities
  • Noun
  • P(fish | noun) = 0.8
  • P(sleep | noun) = 0.2
  • Verb
  • P(fish | verb) = 0.4
  • P(sleep | verb) = 0.6

22
Viterbi Probabilities

23

24

Token 1 fish
25

Token 1 fish
26

Token 2 sleep (if fish is verb)
27

Token 2 sleep (if fish is verb)
28

Token 2 sleep (if fish is a noun)
29

Token 2 sleep (if fish is a noun)
30

Token 2 sleep: take maximum, set back pointers
31

Token 2 sleep: take maximum, set back pointers
32

Token 3 end
33

Token 3 end: take maximum, set back pointers
34

Decode: fish = noun, sleep = verb
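The fish/sleep walk-through can be reproduced with the same Viterbi recurrence. The emission probabilities below are the ones from slide 21; the transition probabilities are not in this transcript (they live in the slide images), so the values used here are illustrative assumptions only:

```python
# Transition probabilities are ASSUMED for illustration; only the
# emission probabilities below come from slide 21.
trans = {
    "start": {"noun": 0.8, "verb": 0.2},               # assumed
    "noun":  {"noun": 0.1, "verb": 0.8, "end": 0.1},   # assumed
    "verb":  {"noun": 0.2, "verb": 0.1, "end": 0.7},   # assumed
}
emit = {
    "noun": {"fish": 0.8, "sleep": 0.2},
    "verb": {"fish": 0.4, "sleep": 0.6},
}

def decode(tokens):
    # v[s] = (best probability of any path ending in state s, that path)
    v = {"start": (1.0, [])}
    for tok in tokens:
        v = {s: max((p * trans[prev].get(s, 0.0) * emit[s][tok],
                     path + [s])
                    for prev, (p, path) in v.items())
             for s in ("noun", "verb")}
    return max((p * trans[s].get("end", 0.0), path)
               for s, (p, path) in v.items())[1]

best_tags = decode(["fish", "sleep"])
# under these assumed transitions the decode agrees with the slide:
# fish = noun, sleep = verb
```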