Unsupervised Word Sense Disambiguation Rivaling Supervised Methods - David Yarowsky

1
Unsupervised Word Sense Disambiguation Rivaling
Supervised Methods
David Yarowsky
  • G22.2591 Presentation, Sonjia Waxmonsky

2
Introduction
  • Presents an unsupervised learning algorithm for
    word sense disambiguation that can be applied to
    completely untagged text
  • Based on a supervised machine learning algorithm
    that uses decision lists
  • Performance matches that of the supervised system

3
Properties of Language
  • One sense per collocation: nearby words provide
    strong and consistent clues to the sense of a
    target word
  • One sense per discourse: the sense of a target
    word is highly consistent within a single
    document

4
Decision List Algorithm
  • Supervised algorithm
  • Based on the One sense per collocation property
  • Start with a large set of possible collocations
  • Calculate the log-likelihood ratio of the
    word-sense probabilities for each collocation:
    Abs(Log(P(Sense-A | Collocation) /
    P(Sense-B | Collocation)))
  • A higher log-likelihood means more predictive
    evidence
  • Collocations are ordered in a decision list, with
    the most predictive collocations ranked highest
    (a sketch of this step follows below)
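A minimal sketch of this training step in Python (the
function name, data layout, and smoothing constant alpha
are illustrative assumptions, not from the paper):

    import math
    from collections import defaultdict

    def build_decision_list(tagged_examples, alpha=0.1):
        # Rank collocations by Abs(Log(P(Sense-A|c) / P(Sense-B|c))).
        # tagged_examples: list of (collocation_set, sense) pairs,
        # where sense is "A" or "B".
        counts = defaultdict(lambda: {"A": 0, "B": 0})
        for collocations, sense in tagged_examples:
            for c in collocations:
                counts[c][sense] += 1

        rules = []
        for c, n in counts.items():
            # alpha-smoothing keeps the ratio finite when a
            # collocation has been seen with only one sense.
            p_a = (n["A"] + alpha) / (n["A"] + n["B"] + 2 * alpha)
            log_l = abs(math.log(p_a / (1.0 - p_a)))
            rules.append((log_l, c, "A" if p_a > 0.5 else "B"))

        # Most predictive evidence first.
        return sorted(rules, reverse=True)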

5
Decision List Algorithm
  • The decision list is used to classify instances
    of the target word
  • Example context: ... the loss of animal and
    plant species through extinction ...
  • Classification is based on the highest-ranking
    rule that matches the target context (see the
    sketch after the table)

LogL   Collocation                   Sense
9.31   flower (within +/- k words)   A (living)
9.24   job (within +/- k words)      B (factory)
9.03   fruit (within +/- k words)    A (living)
9.02   plant species                 A (living)
...    ...                           ...

  • Here the highest-ranking rule that matches is
    plant species, so this instance is tagged A
    (living)
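Classification with the resulting list can be sketched as
follows (it reuses the rule format of build_decision_list
above; plain set membership stands in for the paper's
richer collocation tests):

    def classify(decision_list, context):
        # Apply only the single highest-ranked rule whose
        # collocation appears in this occurrence's context.
        for log_l, collocation, sense in decision_list:
            if collocation in context:
                return sense
        return None  # no rule fires; the instance stays untagged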

6
Advantage of Decision Lists
  • Multiple collocations may match a single context
  • But only the single most predictive piece of
    evidence is used to classify the target word
  • Result: the classification procedure combines a
    large amount of non-independent information
    without complex modeling

7
Bootstrapping Algorithm
[Figure: occurrences of the target word, with seed
regions labeled Sense-A (life) and Sense-B (factory)]
  • All occurrences of the target word are identified
  • A small training set of seed data is tagged with
    word sense

8
Selecting Training Seeds
  • The initial training set should accurately
    distinguish among the possible senses
  • Strategies:
  • Select a single, defining seed collocation for
    each possible sense
  • Ex: life and manufacturing for the target word
    plant
  • Use words from dictionary definitions
  • Hand-label the most frequent collocates

9
Bootstrapping Algorithm
  • Iterative procedure (sketched below):
  • Train the decision list algorithm on the seed set
  • Classify the residual data with the decision list
  • Create a new seed set from the samples that are
    tagged with a probability above a certain
    threshold
  • Retrain the classifier on the new seed set
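The loop itself might look like this (it reuses
build_decision_list from the earlier sketch; the
threshold value and the exact convergence test are
assumptions, not values from the paper):

    def bootstrap(occurrences, seed_tags, threshold=4.0):
        # occurrences: list of collocation sets, one per instance
        # of the target word; seed_tags: dict mapping a few
        # instance indices to a hand-chosen sense ("A" or "B").
        tagged = dict(seed_tags)
        while True:
            examples = [(occurrences[i], s) for i, s in tagged.items()]
            rules = build_decision_list(examples)

            # Re-tag everything, seeds included, so early mistakes
            # in the seed data can be corrected later (see slide 12).
            new_tagged = {}
            for i, context in enumerate(occurrences):
                match = next(((s, ll) for ll, c, s in rules
                              if c in context), None)
                if match is not None and match[1] >= threshold:
                    new_tagged[i] = match[0]

            if new_tagged == tagged:  # residual set has stabilized
                return rules, tagged
            tagged = new_tagged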

10
Bootstrapping Algorithm
  • The seed set grows and the residual set shrinks
    on each iteration

11
Bootstrapping Algorithm
  • Convergence: stop when the residual set
    stabilizes

12
Final Decision List
  • The original seed collocations may not
    necessarily end up at the top of the list
  • It is possible for a sample in the original seed
    data to be reclassified
  • Initial misclassifications in the seed data can
    therefore be corrected

13
One Sense per Discourse
  • The algorithm can be improved by applying the One
    sense per discourse constraint
  • After the algorithm has converged:
  • identify tokens tagged with low confidence and
    label them with the dominant tag of that document
  • After each iteration:
  • extend a tag to all examples in a single document
    once enough examples there are tagged with a
    single sense (a sketch follows below)
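The per-iteration variant of the constraint could be
sketched like this (doc_of and the dominance threshold
min_count are illustrative assumptions, not values from
the paper):

    from collections import Counter, defaultdict

    def one_sense_per_discourse(tagged, doc_of, min_count=3):
        # tagged: dict mapping instance index to sense;
        # doc_of[i] is the document id of instance i.
        by_doc = defaultdict(Counter)
        for i, sense in tagged.items():
            by_doc[doc_of[i]][sense] += 1

        extended = dict(tagged)
        for i, doc in enumerate(doc_of):
            if by_doc[doc]:
                sense, n = by_doc[doc].most_common(1)[0]
                # Once one sense clearly dominates the document,
                # extend it to every example in that document.
                if n >= min_count:
                    extended[i] = sense
        return extended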

14
Evaluation
  • Test corpus extracted from a 460 million-word
    corpus of multiple sources (news articles,
    transcripts, novels, etc.)
  • Performance of multiple models is compared with:
  • supervised decision lists
  • the unsupervised learning algorithm of Schütze
    (1992), based on aligning clusters with word
    senses

15
Results
  • Applying the One sense per discourse constraint
    improves performance

Word     Senses              Unsupervised             One Sense per Discourse
                             (dictionary seed data)   After last iter.   After each iter.
plant    living/factory      97.3                     98.3               98.6
space    volume/outer        92.3                     93.3               93.6
tank     vehicle/container   94.6                     97.8               96.5
motion   legal/physical      97.4                     98.5               97.9
Average  -                   94.8                     96.1               96.5

Accuracy (%)
16
Results
  • Accuracy exceeds that of the Schütze algorithm
    for all target words and matches that of the
    supervised algorithm

Word     Senses              Supervised   Unsupervised   Unsupervised
                                          (Schütze)      (bootstrapping)
plant    living/factory      97.7         92             98.6
space    volume/outer        93.9         90             93.6
tank     vehicle/container   97.1         95             96.5
motion   legal/physical      98.0         92             97.9
Average  -                   96.1         92.2           96.5

Accuracy (%)