Part-of-speech Tagging - PowerPoint PPT Presentation

Provided by: Tim885
Learn more at: https://nlp.stanford.edu

Transcript and Presenter's Notes


1
Part-of-speech Tagging
  • cs224n Final project
  • Spring, 2008
  • Tim Lai

2
POS Tagging: 3 general techniques
  • 1. Rule-based systems
      • Rely on a hand-picked set of rules
      • Performance is not very good
  • 2. Stochastic methods
      • HMM with the Viterbi algorithm to determine the
        best tagging
      • Uses emission probabilities, i.e. P(word | tag),
        and transition probabilities, i.e.
        P(currentTag | prevTag)
      • Maximum Entropy models are also useful
  • 3. Hybrid of the two
      • Rule-based system to do POS tagging
      • Uses rule templates and learns useful rules
        during training
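
The HMM approach on this slide can be sketched as follows. This is a minimal illustration, not the project's code: the function name `viterbi`, the dictionary layout of the probability tables, the `floor` value used for unseen events, and the toy numbers are all assumptions made for the example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely tag sequence for a non-empty `words` list under a bigram HMM.

    start_p[tag]          ~ P(tag at sentence start)
    trans_p[(prev, tag)]  ~ P(tag | prev)
    emit_p[(word, tag)]   ~ P(word | tag)
    Unseen events get the small probability `floor` (an assumption of this sketch).
    """
    def logp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t]: best log-probability of any tagging of words[:i+1] that ends in t
    best = [{t: logp(start_p, t) + logp(emit_p, (words[0], t)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + logp(trans_p, (p, t))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + logp(emit_p, (words[i], t))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model with made-up probabilities, for illustration only.
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {("DT", "NN"): 0.9, ("DT", "VB"): 0.05, ("DT", "DT"): 0.05,
           ("NN", "VB"): 0.6, ("NN", "NN"): 0.3, ("NN", "DT"): 0.1,
           ("VB", "DT"): 0.5, ("VB", "NN"): 0.4, ("VB", "VB"): 0.1}
emit_p = {("the", "DT"): 0.9, ("dog", "NN"): 0.5,
          ("barks", "VB"): 0.4, ("barks", "NN"): 0.1}

result = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
print(result)
```

Working in log space avoids numeric underflow on long sentences. A real tagger would estimate the tables from a tagged corpus and smooth them rather than use a fixed floor.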

3
Simple HMM vs Max-Ent
  • HMM using bigrams for transition probabilities
  • Max-Ent using simple features such as previous
    tag and current word
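
As a rough illustration of the simple Max-Ent setup, the sketch below builds the two features named on this slide (previous tag and current word) and turns feature weights into a tag distribution with a softmax. The names `simple_features` and `tag_distribution`, the feature-string format, and the weights are hypothetical; in practice the weights would be learned during training.

```python
import math

def simple_features(words, i, prev_tag):
    """Features for position i: the current word and the previous tag."""
    return [f"word={words[i].lower()}", f"prev_tag={prev_tag}"]

def tag_distribution(features, weights, tags):
    """P(tag | features) as a softmax over summed feature weights."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in features) for t in tags}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

# Hand-set weights, for illustration only.
weights = {("word=dog", "NN"): 2.0, ("prev_tag=DT", "NN"): 1.0,
           ("word=dog", "VB"): 0.5}
feats = simple_features(["The", "dog", "barks"], 1, "DT")
probs = tag_distribution(feats, weights, ["DT", "NN", "VB"])
print(feats, probs)
```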

4
Error Analysis
  • HMM and Max-Ent both perform well when tested on
    data from same domain
  • Only 6.6% of words were ambiguous, making known
    words easy to tag
  • Accuracy drops when using test data from another
    domain
  • Most errors are caused by unknown words, or the
    POS tagging of words near unknown words.
  • In sentences without unknown words, accuracy
    is 99%!
  • Most common mistake is mis-tagging JJ as NN
  • Need to enhance both taggers to deal with
    unknowns.
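
The two measurements behind this analysis can be sketched as below. This is an assumption-laden illustration: the slide does not say whether ambiguity was counted over word types or tokens (this sketch counts types), and the function names and toy data are made up.

```python
from collections import defaultdict

def ambiguity_rate(tagged_corpus):
    """Fraction of distinct words seen with more than one tag."""
    tags_for = defaultdict(set)
    for word, tag in tagged_corpus:
        tags_for[word].add(tag)
    ambiguous = sum(1 for seen in tags_for.values() if len(seen) > 1)
    return ambiguous / len(tags_for)

def accuracy_by_known(gold, predicted, vocabulary):
    """Tagging accuracy split into words inside and outside the training vocabulary."""
    counts = {"known": [0, 0], "unknown": [0, 0]}  # [correct, total]
    for (word, gold_tag), pred_tag in zip(gold, predicted):
        key = "known" if word in vocabulary else "unknown"
        counts[key][0] += int(gold_tag == pred_tag)
        counts[key][1] += 1
    return {k: (c / n if n else None) for k, (c, n) in counts.items()}

# Toy data, for illustration only.
train = [("the", "DT"), ("run", "NN"), ("run", "VB"), ("dog", "NN")]
vocab = {word for word, _ in train}
gold = [("the", "DT"), ("blorf", "JJ"), ("dog", "NN")]
pred = ["DT", "NN", "NN"]

rate = ambiguity_rate(train)
split = accuracy_by_known(gold, pred, vocab)
print(rate, split)
```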

5
Enhancement ideas
  • For HMM
      • Transition probabilities can be modeled using
        trigrams, taking more context into account
        when a word is unknown
  • For Max-Ent
      • Word shapes, word features, and more context can
        help
  • Results
      • HMM: switching from unigram to bigram helps a
        lot, but using trigrams doesn't help much.
      • Max-Ent: hand-picked features did not help much,
        but adding prefixes and suffixes was most
        helpful.
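
A minimal sketch of the kind of word-shape and prefix/suffix features mentioned above. The exact feature definitions used in the project are not given, so the shape alphabet (`X`, `x`, `d`), the collapsing of repeated symbols, and the maximum affix length of 3 are assumptions.

```python
def word_shape(word):
    """Collapse a word into a coarse shape, e.g. 'Stanford' -> 'Xx'."""
    shape = []
    for ch in word:
        if ch.isupper():
            symbol = "X"
        elif ch.islower():
            symbol = "x"
        elif ch.isdigit():
            symbol = "d"
        else:
            symbol = ch  # keep punctuation such as '-' as-is
        if not shape or shape[-1] != symbol:  # merge runs of the same symbol
            shape.append(symbol)
    return "".join(shape)

def affix_features(word, max_len=3):
    """Shape plus prefixes and suffixes of up to max_len characters."""
    feats = [f"shape={word_shape(word)}"]
    lower = word.lower()
    # Stop one character short of the full word so an affix is never the whole word.
    for n in range(1, min(max_len, len(lower) - 1) + 1):
        feats.append(f"prefix{n}={lower[:n]}")
        feats.append(f"suffix{n}={lower[-n:]}")
    return feats

print(affix_features("tagging"))
```

Such features let the model generalize to unknown words: a word never seen in training that ends in "ing" or starts with a capital letter still activates informative features.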

6
Transformation-based tagging
  • One more idea to try: using rule-based templates
    to learn POS tagging rules
  • Sample rule templates
      • Change tag A to tag B when the
        preceding/following word is tagged Z.
      • Change tag A to tag B when tag Z appears
        within N positions of the current word.
  • Result
      • Using a very restricted set of rule templates,
        accuracy went up 0.5%
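
The first sample template can be sketched as follows. This is an illustration under stated assumptions: the class name `NeighborRule`, the `errors_fixed` scoring helper, and the choice to test triggers against the tags as they were before the rule fired are all hypothetical, not details from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NeighborRule:
    """Change from_tag to to_tag when the word at `offset` is tagged `trigger`.

    offset = -1 means the preceding word, +1 the following word.
    """
    from_tag: str
    to_tag: str
    trigger: str
    offset: int

    def apply(self, tags):
        out = list(tags)
        for i, tag in enumerate(tags):
            j = i + self.offset
            # Triggers are checked against the input tags, not the updated ones.
            if tag == self.from_tag and 0 <= j < len(tags) and tags[j] == self.trigger:
                out[i] = self.to_tag
        return out

def errors_fixed(rule, current, gold):
    """Net reduction in tagging errors if `rule` is applied to `current`."""
    updated = rule.apply(current)
    before = sum(c != g for c, g in zip(current, gold))
    after = sum(u != g for u, g in zip(updated, gold))
    return before - after

# Toy example: fix a JJ that was mis-tagged as NN before another NN.
current = ["DT", "NN", "NN"]
gold = ["DT", "JJ", "NN"]
rule = NeighborRule(from_tag="NN", to_tag="JJ", trigger="NN", offset=1)
print(rule.apply(current), errors_fixed(rule, current, gold))
```

During training, a learner in this style would instantiate each template with many (A, B, Z) combinations, keep the rule with the largest net error reduction, apply it, and repeat.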

7
Final results
  • HMM with bigram and rule-based adjustments
  • Max-Ent with prefix/suffix, word shape features
    and rule-based adjustments
  • Max-Ent performs better, with 97% accuracy
    achievable