Unsupervised Word Sense Disambiguation Rivaling Supervised Methods - David Yarowsky

1
Unsupervised Word Sense Disambiguation Rivaling
Supervised Methods
David Yarowsky
  • G22.2591 Presentation, Sonjia Waxmonsky

2
Introduction
  • Presents an unsupervised learning algorithm for
    word sense disambiguation that can be applied to
    completely untagged text
  • Based on a supervised machine learning algorithm
    that uses decision lists
  • Performance matches that of the supervised system

3
Properties of Language
  • One sense per collocation: nearby words provide
    strong and consistent clues to the sense of a
    target word
  • One sense per discourse: the sense of a target
    word is highly consistent within a single
    document

4
Decision List Algorithm
  • Supervised algorithm
  • Based on the One sense per collocation property
  • Start with a large set of possible collocations
  • Calculate the log-likelihood ratio of the
    word-sense probabilities for each collocation:
    Abs(Log(P(Sense-A | Collocation) /
    P(Sense-B | Collocation)))
  • A higher log-likelihood means more predictive
    evidence
  • Collocations are ordered in a decision list, with
    the most predictive collocations ranked highest
    (a sketch of this step follows below)
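A minimal sketch of this training step in Python (the
function name, data layout, and smoothing constant alpha
are illustrative assumptions, not from the paper):

    import math
    from collections import defaultdict

    def build_decision_list(tagged_examples, alpha=0.1):
        # Rank collocations by Abs(Log(P(Sense-A|c) / P(Sense-B|c))).
        # tagged_examples: list of (collocation_set, sense) pairs,
        # where sense is "A" or "B".
        counts = defaultdict(lambda: {"A": 0, "B": 0})
        for collocations, sense in tagged_examples:
            for c in collocations:
                counts[c][sense] += 1

        rules = []
        for c, n in counts.items():
            # alpha-smoothing keeps the ratio finite when a
            # collocation has been seen with only one sense.
            p_a = (n["A"] + alpha) / (n["A"] + n["B"] + 2 * alpha)
            log_l = abs(math.log(p_a / (1.0 - p_a)))
            rules.append((log_l, c, "A" if p_a > 0.5 else "B"))

        # Most predictive evidence first.
        return sorted(rules, reverse=True)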

5
Decision List Algorithm
  • The decision list is used to classify instances
    of the target word
  • Example context: ... the loss of animal and
    plant species through extinction ...
  • Classification is based on the highest-ranking
    rule that matches the target context (see the
    sketch after the table)

LogL   Collocation                   Sense
9.31   flower (within +/- k words)   A (living)
9.24   job (within +/- k words)      B (factory)
9.03   fruit (within +/- k words)    A (living)
9.02   plant species                 A (living)
...    ...                           ...

  • Here the highest-ranking rule that matches is
    plant species, so this instance is tagged A
    (living)
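Classification with the resulting list can be sketched as
follows (it reuses the rule format of build_decision_list
above; plain set membership stands in for the paper's
richer collocation tests):

    def classify(decision_list, context):
        # Apply only the single highest-ranked rule whose
        # collocation appears in this occurrence's context.
        for log_l, collocation, sense in decision_list:
            if collocation in context:
                return sense
        return None  # no rule fires; the instance stays untagged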

6
Advantage of Decision Lists
  • Multiple collocations may match a single context
  • But only the single most predictive piece of
    evidence is used to classify the target word
  • Result: the classification procedure combines a
    large amount of non-independent information
    without complex modeling

7
Bootstrapping Algorithm
[Figure: occurrences of the target word, with seed
regions labeled Sense-A (life) and Sense-B (factory)]
  • All occurrences of the target word are identified
  • A small training set of seed data is tagged with
    word sense

8
Selecting Training Seeds
  • The initial training set should accurately
    distinguish among the possible senses
  • Strategies:
  • Select a single, defining seed collocation for
    each possible sense
  • Ex: life and manufacturing for the target word
    plant
  • Use words from dictionary definitions
  • Hand-label the most frequent collocates

9
Bootstrapping Algorithm
  • Iterative procedure (sketched below):
  • Train the decision list algorithm on the seed set
  • Classify the residual data with the decision list
  • Create a new seed set from the samples that are
    tagged with a probability above a certain
    threshold
  • Retrain the classifier on the new seed set
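The loop itself might look like this (it reuses
build_decision_list from the earlier sketch; the
threshold value and the exact convergence test are
assumptions, not values from the paper):

    def bootstrap(occurrences, seed_tags, threshold=4.0):
        # occurrences: list of collocation sets, one per instance
        # of the target word; seed_tags: dict mapping a few
        # instance indices to a hand-chosen sense ("A" or "B").
        tagged = dict(seed_tags)
        while True:
            examples = [(occurrences[i], s) for i, s in tagged.items()]
            rules = build_decision_list(examples)

            # Re-tag everything, seeds included, so early mistakes
            # in the seed data can be corrected later (see slide 12).
            new_tagged = {}
            for i, context in enumerate(occurrences):
                match = next(((s, ll) for ll, c, s in rules
                              if c in context), None)
                if match is not None and match[1] >= threshold:
                    new_tagged[i] = match[0]

            if new_tagged == tagged:  # residual set has stabilized
                return rules, tagged
            tagged = new_tagged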

10
Bootstrapping Algorithm
  • The seed set grows and the residual set shrinks
    on each iteration

11
Bootstrapping Algorithm
  • Convergence: stop when the residual set
    stabilizes

12
Final Decision List
  • The original seed collocations may not
    necessarily end up at the top of the list
  • It is possible for a sample in the original seed
    data to be reclassified
  • Initial misclassifications in the seed data can
    therefore be corrected

13
One Sense per Discourse
  • The algorithm can be improved by applying the One
    sense per discourse constraint
  • After the algorithm has converged:
  • identify tokens tagged with low confidence and
    label them with the dominant tag of that document
  • After each iteration:
  • extend a tag to all examples in a single document
    once enough examples there are tagged with a
    single sense (a sketch follows below)
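The per-iteration variant of the constraint could be
sketched like this (doc_of and the dominance threshold
min_count are illustrative assumptions, not values from
the paper):

    from collections import Counter, defaultdict

    def one_sense_per_discourse(tagged, doc_of, min_count=3):
        # tagged: dict mapping instance index to sense;
        # doc_of[i] is the document id of instance i.
        by_doc = defaultdict(Counter)
        for i, sense in tagged.items():
            by_doc[doc_of[i]][sense] += 1

        extended = dict(tagged)
        for i, doc in enumerate(doc_of):
            if by_doc[doc]:
                sense, n = by_doc[doc].most_common(1)[0]
                # Once one sense clearly dominates the document,
                # extend it to every example in that document.
                if n >= min_count:
                    extended[i] = sense
        return extended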

14
Evaluation
  • Test corpus extracted from a 460 million-word
    corpus of multiple sources (news articles,
    transcripts, novels, etc.)
  • Performance of multiple models is compared with:
  • supervised decision lists
  • the unsupervised learning algorithm of Schütze
    (1992), based on aligning clusters with word
    senses

15
Results
  • Applying the One sense per discourse constraint
    improves performance

Word     Senses              Unsupervised             One Sense per Discourse
                             (dictionary seed data)   After last iter.   After each iter.
plant    living/factory      97.3                     98.3               98.6
space    volume/outer        92.3                     93.3               93.6
tank     vehicle/container   94.6                     97.8               96.5
motion   legal/physical      97.4                     98.5               97.9
Average  -                   94.8                     96.1               96.5

Accuracy (%)
16
Results
  • Accuracy exceeds that of the Schütze algorithm
    for all target words and matches that of the
    supervised algorithm

Word     Senses              Supervised   Unsupervised   Unsupervised
                                          (Schütze)      (bootstrapping)
plant    living/factory      97.7         92             98.6
space    volume/outer        93.9         90             93.6
tank     vehicle/container   97.1         95             96.5
motion   legal/physical      98.0         92             97.9
Average  -                   96.1         92.2           96.5

Accuracy (%)