Title: Word Senses and Word Sense Disambiguation
1. Word Senses and Word Sense Disambiguation
- CIS 530: Introduction to NLP
2. I. Lexical Meaning: Word Sense
- Slides adapted from slides by
- Bonnie Dorr, Martha Palmer, David Yarowsky,
3. An Ambiguous Word: bank
- The bank on State Street
- Two clear senses
  - The rising ground bordering a lake, river or sea
  - An establishment for the custody, loan, exchange, or issue of money, for the extension of credit, and for facilitating the transmission of funds
- Different senses are not always so easily delineated.
4. Word Sense Disambiguation Is Important for Machine Translation
- Iraq lost the battle.
  - Korean: Ilakuka centwey ciessta.
  - Gloss: Iraq battle lost.
- John lost his computer.
  - Korean: John-i computer-lul ilepelyessta.
  - Gloss: John computer misplaced.
5. WSD Is Required for Speech Synthesis
- slightly elevated lead levels
  - lead role (rhymes with seed), or
  - lead mines (rhymes with bed)
- The speaker produces too little bass
  - string bass (rhymes with vase), or
  - sea bass (rhymes with lass)
6. French/Spanish Accent Restoration
- une famille des pecheurs
  - pêcheurs (meaning fishermen), or
  - pécheurs (meaning sinners)
7. WSD Really Requires Semantic Constraints
- Iraq lost the battle.
  - Korean: Ilakuka centwey ciessta.
  - Gloss: Iraq battle lost.
- John lost his computer.
  - Korean: John-i computer-lul ilepelyessta.
  - Gloss: John computer misplaced.
- Semantic Constraints
  - lose1(Agent, Patient: competition) <=> ciessta
  - lose2(Agent, Patient: physobj) <=> ilepelyessta
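The semantic constraints above can be read as a lookup: the semantic type of the Patient selects the sense of "lose", and the sense selects the Korean verb. The following is a minimal sketch of that idea. The `SEMANTIC_TYPE` table and the function name `translate_lose` are hypothetical and exist only for illustration; a real system would take semantic types from a lexicon or ontology.

```python
# Hypothetical semantic types for a few nouns (illustrative only,
# not taken from a real lexicon).
SEMANTIC_TYPE = {
    "battle": "competition",
    "game": "competition",
    "election": "competition",
    "computer": "physobj",
    "keys": "physobj",
    "wallet": "physobj",
}

# Each sense of "lose" pairs a constraint on the Patient with a Korean verb,
# mirroring the lose1 / lose2 entries on the slide.
LOSE_SENSES = [
    ("lose1", "competition", "ciessta"),
    ("lose2", "physobj", "ilepelyessta"),
]


def translate_lose(patient_noun):
    """Return (sense, Korean verb) for 'lose' given its object, or None."""
    patient_type = SEMANTIC_TYPE.get(patient_noun)
    for sense, required_type, korean_verb in LOSE_SENSES:
        if patient_type == required_type:
            return sense, korean_verb
    return None  # no constraint matched; a real system would need a fallback


print(translate_lose("battle"))    # ('lose1', 'ciessta')
print(translate_lose("computer"))  # ('lose2', 'ilepelyessta')
```

Nouns missing from the table return `None`, which shows why hand-written constraints alone do not scale and why learned disambiguators are attractive.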
8. Lexical Relations I: Homonymy
- A bank holds investments in a custodial account
- Agriculture is burgeoning on the east bank
- Variants
  - homophones: read vs. red
  - homographs: bass vs. bass
9. Lexical Relations II: Polysemy
- The bank is constructed from red brick
- I withdrew the money from the bank
- Distinguishing polysemy from homonymy is not straightforward
10. Word Sense Disambiguation
- For any given lexeme, can its senses be reliably distinguished?
- Assumes a fixed set of senses for each lexical item
11. Lexical Relations III: Synonymy
- What is synonymy?
- How big is that plane?
- How large is that plane?
- Very hard to find true synonyms
- A big fat apple
- ?A large fat apple
- Influences on substitutability
  - subtle differences in shades of meaning
  - polysemy
  - register
  - collocational constraints
12. WordNet
- Most widely used hierarchically organized lexical database for English (Fellbaum, 1998)
- Demo: http://www.cogsci.princeton.edu/wn/
13. Word Sense and OntoNotes
- The meanings of nouns and verbs are specified using a catalog of possible senses
- All the senses are annotatable at 90% ITA (inter-annotator agreement)
- Example: Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons, as well as major reductions and realignments of troops in central Europe -- also are being registered at the Pentagon.
  - registered: Enter into an official record
  - aimed: Wish, purpose or intend to achieve something
14. WSD with OntoNotes Verbs
- Picked the 217 verbs with the most instances annotated with sense groupings
- 35K instances total
- WN polysemy of 10.4 reduced to 5.1
- WN polysemy range: 59 to 2
- Coarse polysemy range: 16 to 2
- Results
  - Average baseline accuracy: 0.6803
  - Average ITA: 0.8253
  - Average MaxEnt accuracy: 0.8272 (not statistically significantly different from ITA)
  - Average SVM accuracy: 0.8220 (not statistically significantly different from ITA)
  - Also tried other classifiers, with worse results
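The slide does not say how the baseline accuracy was computed. A common choice in WSD is the most-frequent-sense baseline, which always predicts the sense seen most often in training. The sketch below assumes that reading; the sense labels and counts are toy data, not OntoNotes figures.

```python
from collections import Counter


def most_frequent_sense_accuracy(train_labels, test_labels):
    """Predict the most frequent training sense for every test instance."""
    mfs = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == mfs)
    return mfs, correct / len(test_labels)


# Toy sense labels for one verb (hypothetical).
train = ["s1", "s1", "s1", "s2", "s3"]
test = ["s1", "s2", "s1", "s1"]

mfs, accuracy = most_frequent_sense_accuracy(train, test)
print(mfs, accuracy)  # s1 0.75
```

A classifier is only useful to the extent that it beats this number, which is why the results are reported next to both the baseline and the ITA.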
15. Format of WordNet Entries
16. Distribution of Senses among WordNet Verbs
17. Lexical Relations in WordNet
18. Synsets in WordNet
- Example: {chump, fish, fool, gull, mark, patsy, fall guy, sucker, schlemiel, shlemiel, soft touch, mug}
- Definition: a person who is gullible and easy to take advantage of.
- Important: This exact synset makes up one sense for each of the entries listed in the synset.
- Theoretically, each synset can be viewed as a concept in a taxonomy
- WN represents "give" as 45 senses, one of which is the synset {supply, provide, render, furnish}.
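The point that one synset supplies one sense to each of its members can be illustrated with a small data structure. This is a toy sketch, not WordNet's actual storage format or API: the synset identifiers are invented, and the second "fish" synset and its gloss are added only to show a lemma with more than one sense.

```python
from collections import defaultdict

# Toy synsets with hypothetical IDs. The members and gloss of the first
# entry come from the slide; the second entry is invented for illustration.
SYNSETS = {
    "gullible_person.n": {
        "lemmas": ["chump", "fish", "fool", "gull", "mark", "patsy",
                   "fall guy", "sucker", "schlemiel", "shlemiel",
                   "soft touch", "mug"],
        "gloss": "a person who is gullible and easy to take advantage of",
    },
    "aquatic_animal.n": {
        "lemmas": ["fish"],
        "gloss": "an aquatic vertebrate (illustrative gloss)",
    },
}


def build_sense_index(synsets):
    """Map each lemma to the list of synsets (senses) it belongs to."""
    index = defaultdict(list)
    for synset_id, entry in synsets.items():
        for lemma in entry["lemmas"]:
            index[lemma].append(synset_id)
    return dict(index)


SENSES = build_sense_index(SYNSETS)
print(SENSES["fish"])    # ['gullible_person.n', 'aquatic_animal.n']
print(SENSES["sucker"])  # ['gullible_person.n']
```

In this view, disambiguating "fish" means choosing one synset ID from its list.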
19. Hyponymy in WordNet
20. II. Decision Lists for Word Sense Disambiguation
- Slides adapted from slides by David Yarowsky (describing David's PhD dissertation work)
21. Decision Lists for Homonym Disambiguation
22. Outline of Decision List Algorithm I
23. Step 2: Collect Training Contexts
24. Step 3: Measure Collocational Distributions
25. Step 3: Measure Collocational Distributions
26. Step 4: Sort by Log-Likelihood
27. Step 5: Classify New Data
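The steps named on the slides above (collect training contexts, measure collocational distributions, sort by log-likelihood, classify new data) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the exact procedure from the slides: the only features are words appearing anywhere in the context, `alpha` is an assumed add-alpha smoothing constant, only two senses are handled, and the training sentences are toy data.

```python
import math
from collections import defaultdict


def train_decision_list(examples, senses, alpha=0.1):
    """Build rules (score, word, sense) sorted by |log-likelihood ratio|."""
    sense_a, sense_b = senses
    # Step 3: count how often each context word occurs with each sense.
    counts = defaultdict(lambda: {sense_a: 0, sense_b: 0})
    for words, sense in examples:
        for word in set(words):
            counts[word][sense] += 1
    rules = []
    for word, c in counts.items():
        # Smoothed log-likelihood ratio; alpha avoids division by zero.
        llr = math.log((c[sense_a] + alpha) / (c[sense_b] + alpha))
        if llr != 0.0:  # words with no preference carry no evidence
            rules.append((abs(llr), word, sense_a if llr > 0 else sense_b))
    # Step 4: the most reliable evidence comes first.
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def classify(rules, context_words, default):
    """Step 5: use only the single strongest matching rule."""
    present = set(context_words)
    for _, word, sense in rules:
        if word in present:
            return sense
    return default


# Step 2: toy training contexts for "bass" (hypothetical data).
train = [
    (["caught", "a", "bass", "in", "the", "lake"], "fish"),
    (["grilled", "sea", "bass", "for", "dinner"], "fish"),
    (["fishing", "for", "bass", "on", "the", "lake"], "fish"),
    (["plays", "bass", "guitar", "in", "a", "band"], "music"),
    (["the", "bass", "player", "joined", "the", "band"], "music"),
    (["turn", "up", "the", "bass", "on", "the", "speaker"], "music"),
]
rules = train_decision_list(train, ("fish", "music"))

print(classify(rules, ["a", "bass", "from", "the", "lake"], "fish"))   # fish
print(classify(rules, ["bass", "solo", "with", "the", "band"], "fish"))  # music
```

Unlike a classifier that sums all the evidence, a decision list answers with the first rule that fires, so the ordering produced in Step 4 carries most of the weight.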
28. Performance: Accent Restoration
29. Performance: WSD for Machine Translation
30. Performance: Speech Synthesis
31. Comparative Evaluation I
32. Comparative Evaluation II
33. An Unsupervised(!) Algorithm
- Yarowsky, D. "Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French." In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. Las Cruces, NM, pp. 88-95, 1994.
34. One Sense per Discourse Hypothesis
- Words tend to exhibit only one sense in a given
discourse or document
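One way to exploit this hypothesis is to let the tagged instances in a document vote and then give every instance of the target word in that document the winning sense. The sketch below assumes this simple voting scheme; the function name, document IDs, and sense labels are hypothetical, and ties are left untouched.

```python
from collections import Counter


def apply_one_sense_per_discourse(tagged):
    """tagged: list of (doc_id, sense or None), one entry per instance."""
    votes_by_doc = {}
    for doc_id, sense in tagged:
        if sense is not None:
            votes_by_doc.setdefault(doc_id, Counter())[sense] += 1
    result = []
    for doc_id, sense in tagged:
        votes = votes_by_doc.get(doc_id)
        if votes:
            ranked = votes.most_common(2)
            # Override only when one sense is strictly the most frequent.
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                sense = ranked[0][0]
        result.append((doc_id, sense))
    return result


tagged = [
    ("d1", "life"), ("d1", None), ("d1", "life"), ("d1", "manufacturing"),
    ("d2", None),
    ("d3", "manufacturing"), ("d3", "life"),
]
for item in apply_one_sense_per_discourse(tagged):
    print(item)
```

In document `d1` the untagged instance and the minority instance both become "life"; `d2` has no votes and `d3` is tied, so both are left as they were.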
35. Step 1: Identify all examples of target word
- Store contexts in initial untagged training set
36. Step 2: Tag examples
- For each sense, identify a small set of labelled training examples
- Use seed words, e.g. for plant: manufacturing vs. life
37. Sample Initial State after Step 2
38. Step 3a: Run supervised algorithm
39. OSPD constraint
40. Steps 3b, 3c: Apply classifier (+ OSPD) to all
41. Iterate until Done.
42. Final Decision List
43. Evaluation
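The bootstrapping steps above (tag a few examples with seed words, train a supervised decision list, apply it to all contexts, iterate until nothing changes) can be sketched as a small loop. This is a simplified illustration, not the exact algorithm from the slides: the OSPD step is omitted for brevity, the features are plain context words, and `alpha`, `threshold`, `max_iters`, and the toy contexts for "plant" are all assumptions.

```python
import math
from collections import defaultdict

SENSES = ("life", "manufacturing")


def train(labelled, alpha=0.1):
    """Step 3a: build a decision list sorted by |log-likelihood ratio|."""
    a, b = SENSES
    counts = defaultdict(lambda: {a: 0, b: 0})
    for words, sense in labelled:
        for word in set(words):
            counts[word][sense] += 1
    rules = []
    for word, c in counts.items():
        llr = math.log((c[a] + alpha) / (c[b] + alpha))
        rules.append((abs(llr), word, a if llr > 0 else b))
    return sorted(rules, key=lambda rule: rule[0], reverse=True)


def classify(rules, words, threshold):
    """Return the sense of the strongest matching rule, or None."""
    present = set(words)
    for score, word, sense in rules:
        if score < threshold:
            break  # remaining rules are too weak to trust
        if word in present:
            return sense
    return None


def bootstrap(contexts, seeds, threshold=1.0, max_iters=10):
    # Step 2: tag only contexts that contain seeds of exactly one sense.
    labels = []
    for words in contexts:
        hits = {seeds[w] for w in words if w in seeds}
        labels.append(hits.pop() if len(hits) == 1 else None)
    rules = []
    for _ in range(max_iters):
        labelled = [(w, s) for w, s in zip(contexts, labels) if s is not None]
        rules = train(labelled)
        # Step 3b: relabel everything the current list is confident about.
        new_labels = [classify(rules, w, threshold) or old
                      for w, old in zip(contexts, labels)]
        if new_labels == labels:  # converged: nothing changed
            break
        labels = new_labels
    return labels, rules


# Step 1: contexts of the target word "plant" (target word itself removed).
contexts = [
    ["life", "cell", "species"],
    ["manufacturing", "equipment", "workers"],
    ["cell", "growth", "species"],
    ["equipment", "assembly", "workers"],
    ["growth", "animal"],
    ["assembly", "car"],
]
seeds = {"life": "life", "manufacturing": "manufacturing"}

labels, rules = bootstrap(contexts, seeds)
print(labels)
```

On this toy data the seeds tag only the first two contexts; the third and fourth are picked up in the first iteration through shared words, and the last two in the second iteration through words learned from the newly tagged contexts.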