Classifying Parts of Speech Based on Sparse Data - PowerPoint PPT Presentation

About This Presentation
Title:

Classifying Parts of Speech Based on Sparse Data

Description:

Classifying Parts of Speech Based on Sparse Data Katherine Brainard The Problem Sparse data has little contextual information Many words fall into this category ... – PowerPoint PPT presentation

Number of Views:79
Avg rating:3.0/5.0
Slides: 6
Provided by: Kathe175
Learn more at: https://nlp.stanford.edu
Category:

less

Transcript and Presenter's Notes

Title: Classifying Parts of Speech Based on Sparse Data


1
Classifying Parts of Speech Based on Sparse Data

Katherine Brainard
2
The Problem
  • Sparse data has little contextual information
  • Many words fall into this category
  • Automatic PoS taggers and finders are useful

3
Approach
  • Relatively easy to learn categories from frequent
    words
  • Infrequent words often more regular than their
    common counterparts
  • Learn frequent words, then use these to classify
    infrequent
  • Uses clustering for the frequent words

4
Evaluating the Model
  • Somewhat tricky - want eval function that doesnt
    encourage degenerate behavior
  • Evaluation separated from clustering
  • Used both bigram probability model and comparison
    with already-tagged data

5
Results
  • Improvement of 36 from delaying processing of
    data
  • About 2.5 times better than classifying
    infrequent words into one lump
  • Using just contextual data produced the best
    performance
Write a Comment
User Comments (0)
About PowerShow.com