POS Tagging & Markov Models

1
POS Tagging & Markov Models
2
POS Tagging
  • Purpose: to give us explicit information about
    the structure of a text, and of the language
    itself, without necessarily having a complete
    understanding of the text
  • To feed other NLP applications/processes
  • Chunking (feeds IE tasks)
  • Speech Recognition
  • IR
  • Stemming (to more accurately stem)
  • QA
  • Adding more structure (Parsing in all its
    flavors)

3
Tags
  • Most common: the PTB's 45 tags
  • Another common one: CLAWS7 (BNC), 140 tags (up
    from a historic 62 tags)

4
Approaches to Tagging
  • Rule-based tagging
  • Hand constructed
  • ENGTWOL (Voutilainen 1995)
  • Stochastic tagging
  • Tag probabilities learned from training corpus
    drive tagging
  • Transformation-based tagging
  • Rule-based
  • Rules learned from training corpus
  • Brill's tagger (Brill 1995)

5
A Really Stupid Tagger
  • Read the words and tags from a POS tagged corpus
  • Count the # of tags for any given word
  • Calculate the frequency for each tag-word pair
  • Ignore all but the most frequent (for each word)
  • Use the frequencies thus learned to tag a text
  • Sound familiar?
  • HW3! (All but last 2 steps.)
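The steps above amount to a most-frequent-tag baseline, which can be sketched in a few lines of Python (the corpus here is a toy stand-in, not a real tagged corpus):

```python
from collections import Counter, defaultdict

def train_most_frequent(tagged_corpus):
    """tagged_corpus: iterable of (word, tag) pairs from a POS-tagged corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1                  # count the tags seen for each word
    # ignore all but the most frequent tag for each word
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(words, best_tag, default="NN"):
    """Tag a text using the learned frequencies; unknown words get a default tag."""
    return [(w, best_tag.get(w, default)) for w in words]

# Toy illustration only:
corpus = [("a", "DT"), ("play", "NN"), ("play", "NN"), ("play", "VB")]
model = train_most_frequent(corpus)
print(tag(["a", "play"], model))  # [('a', 'DT'), ('play', 'NN')]
```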

6
A Really Stupid Tagger
  • But Charniak (1993) showed
  • Such a tagger has an accuracy of 90%
  • An early rule-based tagger (Greene and Rubin
    1971), using hand-coded rules and patterns, got
    77% right
  • The best stochastic taggers around hit about 95%
    (controlled experiments approach 99%)
  • Let's just give up and go home!

7
A Smarter Tagger
  • Assume that a word's tag is dependent on the
    tags that precede it.
  • Therefore, we would assume that the history of
    a word affects how it will be tagged.
  • So which is more likely?
  • 1. a/DT truly/RB fabulous/JJ play/NN
  • 2. a/DT truly/RB fabulous/JJ play/VB

8
A Smarter Tagger
  • So which is more likely?
  • 1. a/DT truly/RB fabulous/JJ play/NN
  • 2. a/DT truly/RB fabulous/JJ play/VB
  • P(NN|JJ) = C(JJ,NN) / C(JJ) ≈ 0.45
  • P(VB|JJ) = C(JJ,VB) / C(JJ) ≈ 0.0005
  • ⇒ 1 is more likely than 2 (because P(NN|JJ) gt
    P(VB|JJ))
  • Nothing beyond the JJ,NN vs. JJ,VB transitions
    matters (well, almost)
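The two conditional probabilities can be read straight off bigram counts. A tiny sketch, with made-up counts chosen only to reproduce the numbers on the slide:

```python
# Hypothetical bigram tag counts over a tagged corpus (illustrative numbers only).
C = {("JJ", "NN"): 45_000, ("JJ", "VB"): 50, "JJ": 100_000}

p_nn_given_jj = C[("JJ", "NN")] / C["JJ"]  # P(NN|JJ) = C(JJ,NN)/C(JJ) = 0.45
p_vb_given_jj = C[("JJ", "VB")] / C["JJ"]  # P(VB|JJ) = C(JJ,VB)/C(JJ) = 0.0005
print(p_nn_given_jj > p_vb_given_jj)       # True: ...play/NN is more likely
```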

9
Stochastic Tagging
  • Assume that a word's tag is dependent only on
    the preceding tag(s)
  • Could be just one
  • Could be more than one
  • Train on a tagged corpus to
  • Learn probabilities for various tag-tag sequences
  • Learn the possible tags for each word (and the
    associated probabilities)
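Both things learned in training — tag-tag sequence probabilities and per-word tag probabilities — are just relative frequencies. A minimal sketch, assuming sentences arrive as lists of (word, tag) pairs and a single-tag (bigram) history:

```python
from collections import Counter, defaultdict

def train(tagged_sentences, start="<s>"):
    """MLE estimates of P(tag | prev_tag) and P(word | tag) from a tagged corpus."""
    trans = defaultdict(Counter)  # trans[prev][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word]  -> count
    for sent in tagged_sentences:
        prev = start
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    # normalize counts into probabilities
    P_trans = {p: {t: c / sum(cs.values()) for t, c in cs.items()}
               for p, cs in trans.items()}
    P_emit = {t: {w: c / sum(cs.values()) for w, c in cs.items()}
              for t, cs in emit.items()}
    return P_trans, P_emit

# Toy corpus: "play" is ambiguous between NN and VB after DT.
sents = [[("a", "DT"), ("play", "NN")], [("a", "DT"), ("play", "VB")]]
P_trans, P_emit = train(sents)
print(P_trans["DT"])  # {'NN': 0.5, 'VB': 0.5}
```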

10
Markov Tagger
  • What is the goal of a Markov tagger?
  • To find the tag sequence that maximizes
  • ∏j P(wj | tj) · P(tj | t1,j-1)
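Maximizing that product over all tag sequences is what the Viterbi algorithm does. A minimal log-space sketch, assuming transition and emission probabilities stored as nested dicts (the small model at the bottom is illustrative, not learned):

```python
import math

def viterbi(words, tags, P_trans, P_emit, start="<s>"):
    """Tag sequence maximizing prod_j P(w_j|t_j) * P(t_j|t_{j-1}) (bigram history)."""
    def lp(d, k):  # log-probability; unseen events get -inf
        p = d.get(k, 0.0)
        return math.log(p) if p > 0 else float("-inf")

    V = {t: lp(P_trans.get(start, {}), t) + lp(P_emit.get(t, {}), words[0])
         for t in tags}
    backptrs = []
    for w in words[1:]:
        V2, bp = {}, {}
        for t in tags:
            best = max(tags, key=lambda p: V[p] + lp(P_trans.get(p, {}), t))
            V2[t] = V[best] + lp(P_trans.get(best, {}), t) + lp(P_emit.get(t, {}), w)
            bp[t] = best
        V, backptrs = V2, backptrs + [bp]
    # trace back the best path from the best final tag
    path = [max(V, key=V.get)]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return list(reversed(path))

# Hand-built illustrative model:
P_trans = {"<s>": {"DT": 1.0}, "DT": {"JJ": 1.0}, "JJ": {"NN": 0.9, "VB": 0.1}}
P_emit = {"DT": {"a": 1.0}, "JJ": {"fabulous": 1.0},
          "NN": {"play": 1.0}, "VB": {"play": 1.0}}
print(viterbi(["a", "fabulous", "play"], ["DT", "JJ", "NN", "VB"],
              P_trans, P_emit))  # ['DT', 'JJ', 'NN']
```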

11
Markov Tagger
  • A sequence of tags in text can be thought of as a
    Markov chain
  • Markov chains have the following property
  • Limited horizon
  • P(Xi+1 = tj | X1, ..., Xi) = P(Xi+1 = tj | Xi)
  • or, following Charniak's notation
  • P(ti+1 | t1,i) = P(ti+1 | ti)
  • Thus a word's tag depends only on the previous
    tag (limited memory).
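Under the limited-horizon assumption, the probability of a whole tag sequence factorizes into bigram transition probabilities. A small sketch with illustrative numbers:

```python
# Illustrative transition probabilities (not estimated from a real corpus).
P = {("<s>", "DT"): 0.3, ("DT", "JJ"): 0.2, ("JJ", "NN"): 0.45}

def seq_prob(tags, P, start="<s>"):
    """P(t_1..t_n) = prod_i P(t_i | t_{i-1}) under the limited-horizon assumption."""
    prob, prev = 1.0, start
    for t in tags:
        prob *= P.get((prev, t), 0.0)  # unseen transitions get probability 0
        prev = t
    return prob

print(seq_prob(["DT", "JJ", "NN"], P))  # 0.3 * 0.2 * 0.45 ≈ 0.027
```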

12
Next Time
  • For next time, bring MS Charniak 93
  • Read the appropriate sections in 9 and 10. Study
    10 over 9 (for now).