Title: Word Sense Disambiguation (1)
1. Word Sense Disambiguation (1)
- Instructor: Rada Mihalcea
- Note: Some of the material in this slide set was adapted from a tutorial given by Rada Mihalcea and Ted Pedersen at ACL 2005
2. Definitions
- Word sense disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities.
  - The sense inventory usually comes from a dictionary or thesaurus.
  - Knowledge-intensive methods, supervised learning, and (sometimes) bootstrapping approaches.
- Word sense discrimination is the problem of dividing the usages of a word into different meanings, without regard to any particular existing sense inventory.
  - Unsupervised techniques.
3. Computers versus Humans
- Polysemy: most words have many possible meanings.
- A computer program has no basis for knowing which one is appropriate, even if it is obvious to a human.
- Ambiguity is rarely a problem for humans in their day-to-day communication, except in extreme cases.
4. Ambiguity for Humans: Newspaper Headlines!
- DRUNK GETS NINE YEARS IN VIOLIN CASE
- FARMER BILL DIES IN HOUSE
- PROSTITUTES APPEAL TO POPE
- STOLEN PAINTING FOUND BY TREE
- RED TAPE HOLDS UP NEW BRIDGE
- DEER KILL 300,000
- RESIDENTS CAN DROP OFF TREES
- INCLUDE CHILDREN WHEN BAKING COOKIES
- MINERS REFUSE TO WORK AFTER DEATH
5. Ambiguity for a Computer
- The fisherman jumped off the bank and into the water.
- The bank down the street was robbed!
- Back in the day, we had an entire bank of computers devoted to this problem.
- The bank in that road is entirely too steep and is really dangerous.
- The plane took a bank to the left, and then headed off towards the mountains.
6. Early Days of WSD
- Noted as a problem for Machine Translation (Weaver, 1949)
  - A word can often only be translated if you know the specific sense intended (a bill in English could be a pico or a cuenta in Spanish)
- Bar-Hillel (1960) posed the following:
  - "Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy."
  - Is "pen" a writing instrument or an enclosure where children play?
  - He declared the problem unsolvable and left the field of MT!
7. Since Then
- 1970s-1980s
  - Rule-based systems
  - Rely on hand-crafted knowledge sources
- 1990s
  - Corpus-based approaches
  - Dependence on sense-tagged text
  - (Ide and Veronis, 1998) give an overview of the history from the early days to 1998
- 2000s
  - Hybrid systems
  - Minimizing or eliminating the use of sense-tagged text
  - Taking advantage of the Web
8. Practical Applications
- Machine Translation
  - Translate "bill" from English to Spanish
  - Is it a pico or a cuenta? Is it a bird jaw or an invoice?
- Information Retrieval
  - Find all Web pages about cricket
  - The sport or the insect?
- Question Answering
  - What is George Miller's position on gun control?
  - The psychologist or the US congressman?
- Knowledge Acquisition
  - Add to KB: Herb Bergson is the mayor of Duluth
  - Minnesota or Georgia?
9. Knowledge-based WSD
- Task definition
  - Knowledge-based WSD: the class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text
- Resources
  - Yes: Machine Readable Dictionaries, raw corpora
  - No: manually annotated corpora
- Scope
  - All open-class words
10. Machine Readable Dictionaries
- In recent years, most dictionaries have been made available in Machine Readable format (MRD)
  - Oxford English Dictionary
  - Collins
  - Longman Dictionary of Contemporary English (LDOCE)
- Thesauruses add synonymy information
  - Roget's Thesaurus
- Semantic networks add more semantic relations
  - WordNet
  - EuroWordNet
11. MRD: A Resource for Knowledge-based WSD
- For each word in the language vocabulary, an MRD provides:
  - A list of meanings
  - Definitions (for all word meanings)
  - Typical usage examples (for most word meanings)
12. MRD: A Resource for Knowledge-based WSD
- A thesaurus adds:
  - An explicit synonymy relation between word meanings
- A semantic network adds:
  - Hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, entailment, etc.

WordNet synsets for the noun "plant" (a code sketch follows):
  1. plant, works, industrial plant
  2. plant, flora, plant life

WordNet concepts related to the meaning "plant, flora, plant life":
  hypernym: organism, being
  hyponym: house plant, fungus, ...
  meronym: plant tissue, plant part
  holonym: Plantae, kingdom Plantae, plant kingdom
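
As a concrete illustration, a minimal sketch of querying these relations with NLTK's WordNet interface (the slides do not prescribe a toolkit, so this choice is an assumption; it requires the nltk package with the 'wordnet' corpus downloaded):

    from nltk.corpus import wordnet as wn

    # List all noun synsets for "plant" with their definitions
    for synset in wn.synsets('plant', pos=wn.NOUN):
        print(synset.name(), '-', synset.definition())

    # Related concepts for the "plant, flora, plant life" sense
    flora = wn.synset('plant.n.02')
    print('hypernyms:', flora.hypernyms())
    print('hyponyms :', flora.hyponyms()[:5])
    print('meronyms :', flora.part_meronyms())
    print('holonyms :', flora.member_holonyms())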
13. Lesk Algorithm
- (Michael Lesk, 1986): identify the senses of words in context using definition overlap
- Algorithm (a code sketch follows the example):
  - Retrieve from the MRD all sense definitions of the words to be disambiguated
  - Determine the definition overlap for all possible sense combinations
  - Choose the senses that lead to the highest overlap
- Example: disambiguate PINE CONE
- PINE
  1. kinds of evergreen tree with needle-shaped leaves
  2. waste away through sorrow or illness
- CONE
  1. solid body which narrows to a point
  2. something of this shape whether solid or hollow
  3. fruit of certain evergreen trees

  Pine#1 ∩ Cone#1 = 0    Pine#1 ∩ Cone#2 = 1    Pine#1 ∩ Cone#3 = 2
  Pine#2 ∩ Cone#1 = 0    Pine#2 ∩ Cone#2 = 0    Pine#2 ∩ Cone#3 = 0
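
A minimal sketch of this procedure on the PINE/CONE example. The toy tokenizer, stopword list, and crude plural stripping are choices made here for illustration, so the overlap counts only approximate the slide's:

    PINE = {1: "kinds of evergreen tree with needle-shaped leaves",
            2: "waste away through sorrow or illness"}
    CONE = {1: "solid body which narrows to a point",
            2: "something of this shape whether solid or hollow",
            3: "fruit of certain evergreen trees"}

    STOPWORDS = {"of", "to", "or", "a", "this", "which", "whether", "through", "with"}

    def content_words(definition):
        """Lowercase content words, with a crude plural 's' stripped."""
        words = set()
        for w in definition.lower().split():
            w = w.strip(".,")
            if w in STOPWORDS:
                continue
            if w.endswith("s") and not w.endswith("ss"):
                w = w[:-1]          # crude plural stripping: trees -> tree
            words.add(w)
        return words

    def overlap(def1, def2):
        return len(content_words(def1) & content_words(def2))

    # Score every sense combination and keep the best one
    best = max(((p, c) for p in PINE for c in CONE),
               key=lambda pc: overlap(PINE[pc[0]], CONE[pc[1]]))
    print(best)  # (1, 3): "evergreen tree(s)" gives the largest overlap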
14. Lesk Algorithm for More than Two Words?
- "I saw a man who is 98 years old and can still walk and tell jokes"
- Nine open-class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)
- 43,929,600 sense combinations! How do we find the optimal sense combination?
- Simulated annealing (Cowie, Guthrie, Guthrie 1992); a sketch follows this list
  - Define a function E over the combinations of word senses in a given text; find the combination of senses that leads to the highest definition overlap (redundancy)
  - 1. Start with the most frequent sense for each word
  - 2. At each iteration, replace the sense of a random word in the set with a different sense, and measure E
  - 3. Stop iterating when there is no change in the configuration of senses
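
A minimal sketch of this search loop. It is closer to random hill-climbing; the full method of Cowie et al. also accepts worse configurations with a temperature-controlled probability. The `definitions` structure (word -> list of sense definitions, most frequent first) is an assumption for illustration:

    import random

    def energy(config, definitions):
        """E = total pairwise word overlap between the chosen sense definitions."""
        bags = [set(definitions[w][s].lower().split()) for w, s in config.items()]
        return sum(len(a & b) for i, a in enumerate(bags) for b in bags[i + 1:])

    def search(definitions, iterations=10000):
        # 1. Start with the most frequent sense (index 0) for each word.
        config = {w: 0 for w in definitions}
        best_e = energy(config, definitions)
        for _ in range(iterations):
            # 2. Replace the sense of a random word with a different sense.
            word = random.choice(list(definitions))
            old = config[word]
            others = [s for s in range(len(definitions[word])) if s != old]
            if not others:
                continue
            config[word] = random.choice(others)
            new_e = energy(config, definitions)
            if new_e >= best_e:
                best_e = new_e        # keep the change
            else:
                config[word] = old    # 3. revert; true annealing would
                                      #    sometimes keep worse moves
        return config, best_e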
15. Lesk Algorithm: A Simplified Version
- Original Lesk: measure the overlap between sense definitions for all words in context
  - Identifies simultaneously the correct senses for all words in context
- Simplified Lesk (Kilgarriff and Rosenzweig 2000): measure the overlap between the sense definitions of a word and its current context
  - Identifies the correct sense for one word at a time
  - Search space significantly reduced
16. Lesk Algorithm: A Simplified Version
- Algorithm for simplified Lesk (a code sketch follows the example):
  - Retrieve from the MRD all sense definitions of the word to be disambiguated
  - Determine the overlap between each sense definition and the current context
  - Choose the sense that leads to the highest overlap
- Example: disambiguate PINE in "Pine cones hanging in a tree"
- PINE
  1. kinds of evergreen tree with needle-shaped leaves
  2. waste away through sorrow or illness

  Pine#1 ∩ Sentence = 1    Pine#2 ∩ Sentence = 0
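
A minimal sketch of simplified Lesk over WordNet glosses (assumes nltk with the 'wordnet' corpus; NLTK also ships its own variant as nltk.wsd.lesk):

    from nltk.corpus import wordnet as wn

    def simplified_lesk(word, context, pos=None):
        """Pick the sense of `word` whose gloss overlaps most with `context`."""
        context_words = set(context.lower().split())
        best_sense, best_overlap = None, -1
        for sense in wn.synsets(word, pos=pos):
            gloss_words = set(sense.definition().lower().split())
            overlap = len(gloss_words & context_words)
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        return best_sense

    # Expected: the evergreen-tree sense of "pine"
    print(simplified_lesk('pine', 'pine cones hanging in a tree', pos=wn.NOUN))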
17. Evaluations of the Lesk Algorithm
- Initial evaluation by M. Lesk
  - 50-70% on short, manually annotated samples of text, with respect to the Oxford Advanced Learner's Dictionary
- Simulated annealing
  - 47% on 50 manually annotated sentences
- Evaluation on Senseval-2 all-words data, with back-off to a random sense (Mihalcea and Tarau 2004)
  - Original Lesk: 35%
  - Simplified Lesk: 47%
- Evaluation on Senseval-2 all-words data, with back-off to the most frequent sense (Vasilescu, Langlais, Lapalme 2004)
  - Original Lesk: 42%
  - Simplified Lesk: 58%
18. Selectional Preferences
- A way to constrain the possible meanings of words in a given context
- E.g. "wash a dish" vs. "cook a dish"
  - WASH-OBJECT vs. COOK-FOOD
- Capture information about the possible relations between semantic classes
  - Common sense knowledge
- Alternative terminology:
  - Selectional restrictions
  - Selectional preferences
  - Selectional constraints
19. Acquiring Selectional Preferences
- From annotated corpora
  - Circular relationship with the WSD problem:
    - Need WSD to build the annotated corpus
    - Need selectional preferences to derive WSD
- From raw corpora
  - Frequency counts
  - Information theory measures
  - Class-to-class relations
20. Preliminaries: Learning Word-to-Word Relations
- An indication of the semantic fit between two words
- 1. Frequency counts
  - Count the pairs of words connected by a syntactic relation
- 2. Conditional probabilities
  - Condition on one of the words (a sketch of both measures follows)
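
A minimal sketch of both measures over toy (verb, object) pairs; in practice the pairs would come from a syntactically parsed corpus:

    from collections import Counter

    # Toy stand-in for verb-object pairs extracted by a parser
    pairs = [("drink", "coffee"), ("drink", "water"), ("drink", "coffee"),
             ("read", "book"), ("read", "paper")]

    pair_counts = Counter(pairs)                     # 1. frequency counts
    verb_counts = Counter(verb for verb, _ in pairs)

    def p_object_given_verb(noun, verb):
        """2. conditional probability P(noun | verb)."""
        return pair_counts[(verb, noun)] / verb_counts[verb]

    print(pair_counts[("drink", "coffee")])          # 2
    print(p_object_given_verb("coffee", "drink"))    # 2/3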
21. Learning Selectional Preferences (1)
- Word-to-class relations (Resnik 1993)
  - Quantify the contribution of a semantic class using all the concepts subsumed by that class (the slide's formula is reconstructed below)
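
The formula itself did not survive the conversion of the slide; reconstructed here from Resnik's published formulation, the selectional preference strength S_R(p) of a predicate p for relation R, and the selectional association A_R(p, c) of p with a semantic class c, are:

    S_R(p) = \sum_{c} P(c \mid p)\,\log\frac{P(c \mid p)}{P(c)}

    A_R(p, c) = \frac{1}{S_R(p)}\, P(c \mid p)\,\log\frac{P(c \mid p)}{P(c)}

where P(c | p) is estimated from the counts of all the concepts subsumed by class c.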
22. Learning Selectional Preferences (2)
- Determine the contribution of a word sense based on the assumption of equal sense distributions
  - E.g. "plant" has two senses → 50% of its occurrences are counted as sense 1, 50% as sense 2
- Example: learning restrictions for the verb "to drink"
  - Find high-scoring verb-object pairs
  - Find prototypical object classes (high association score)
23. Using Selectional Preferences for WSD
- Algorithm (a code sketch follows the example):
  - 1. Learn a large set of selectional preferences for a given syntactic relation R
  - 2. Given a pair of words W1-W2 connected by a relation R
  - 3. Find all selectional preferences W1-C (word-to-class) or C1-C2 (class-to-class) that apply
  - 4. Select the meanings of W1 and W2 based on the selected semantic class
- Example: disambiguate "coffee" in "drink coffee"
  - 1. (beverage) a beverage consisting of an infusion of ground coffee beans
  - 2. (tree) any of several small trees native to the tropical Old World
  - 3. (color) a medium to dark brown color
  - Given the selectional preference DRINK → BEVERAGE: select coffee#1
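
A minimal sketch of steps 3-4 on this example, picking the sense of the object noun that falls under the class preferred by the verb (assumes nltk with 'wordnet'; the one-entry preference table is toy data):

    from nltk.corpus import wordnet as wn

    # Toy word-to-class preference: DRINK selects for BEVERAGE objects
    preferences = {"drink": wn.synset("beverage.n.01")}

    def disambiguate_object(verb, noun):
        """Return the first sense of `noun` subsumed by the verb's preferred class."""
        preferred = preferences[verb]
        for sense in wn.synsets(noun, pos=wn.NOUN):
            # hypernym_paths() lists the IS-A chains from the sense to the root
            if any(preferred in path for path in sense.hypernym_paths()):
                return sense
        return None

    print(disambiguate_object("drink", "coffee"))  # the beverage sense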
24. Evaluation of Selectional Preferences for WSD
- Data set
  - Mainly verb-object and subject-verb relations extracted from SemCor
- Compared against a random baseline
- Results (Agirre and Martinez, 2000)
  - Average results on 8 nouns
  - Similar figures reported in (Resnik 1997)
25. Semantic Similarity
- Words in a discourse must be related in meaning for the discourse to be coherent (Halliday and Hasan, 1976)
- Use this property for WSD: identify related meanings for words that share a common context
- Context span:
  - 1. Local context: semantic similarity between pairs of words
  - 2. Global context: lexical chains
26. Semantic Similarity in a Local Context
- Similarity determined between pairs of concepts, or between a word and its surrounding context
- Relies on similarity metrics over semantic networks (Rada et al. 1989)

[Figure: a fragment of a semantic network rooted at "carnivore", branching into "bear", "feline, felid", "canine, canid", and "fissiped mammal, fissiped", and continuing down through "wild dog", "wolf", "hyena", "dog", "hunting dog", "hyena dog", "dingo", "dachshund", and "terrier"]
27. Semantic Similarity Metrics for WSD
- Disambiguate target words based on their similarity with one word to the left and one word to the right (Patwardhan, Banerjee, Pedersen 2002)
- Evaluation
  - 1,723 ambiguous nouns from Senseval-2
  - Among 5 similarity metrics, (Jiang and Conrath 1997) provides the best precision (39%)
- Example: disambiguate PLANT in "plant with flowers" (a code sketch follows)
- PLANT
  1. plant, works, industrial plant
  2. plant, flora, plant life
- Similarity(plant#1, flower) = 0.2
- Similarity(plant#2, flower) = 1.5
- Highest similarity → plant#2
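
A minimal sketch of this selection using the Jiang-Conrath metric as implemented in NLTK (assumes nltk plus the 'wordnet' and 'wordnet_ic' corpora; the scores will not match the slide's numbers exactly):

    from nltk.corpus import wordnet as wn
    from nltk.corpus import wordnet_ic

    brown_ic = wordnet_ic.ic('ic-brown.dat')   # information content from Brown
    flower = wn.synset('flower.n.01')

    senses = wn.synsets('plant', pos=wn.NOUN)[:2]   # factory and flora senses
    best = max(senses, key=lambda s: s.jcn_similarity(flower, brown_ic))
    print(best)   # the flora sense scores higher against "flower"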
28. Semantic Similarity in a Global Context
- Lexical chains (Hirst and St-Onge 1998), (Halliday and Hasan 1976)
- A lexical chain is a sequence of semantically related words, which creates a context and contributes to the continuity of meaning and the coherence of a discourse
- Algorithm for finding lexical chains (a code sketch follows this list):
  - 1. Select the candidate words from the text. These are words for which we can compute similarity measures, and therefore most of the time they have the same part of speech.
  - 2. For each such candidate word, and for each meaning of this word, find a chain to receive the candidate word sense, based on a semantic relatedness measure between the concepts already in the chain and the candidate word meaning.
  - 3. If such a chain is found, insert the word in this chain; otherwise, create a new chain.
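
A minimal sketch of this greedy chaining loop, using WordNet path similarity as the relatedness measure and an arbitrary threshold (both choices are illustrative, not from the slide; assumes nltk with 'wordnet'):

    from nltk.corpus import wordnet as wn

    THRESHOLD = 0.2  # minimum relatedness to join an existing chain

    def build_chains(nouns):
        chains = []  # each chain is a list of chosen synsets
        for word in nouns:
            best_chain, best_sense, best_score = None, None, THRESHOLD
            for sense in wn.synsets(word, pos=wn.NOUN):
                for chain in chains:
                    # a sense must relate to everything already in the chain
                    score = min(sense.path_similarity(s) or 0 for s in chain)
                    if score > best_score:
                        best_chain, best_sense, best_score = chain, sense, score
            if best_chain is not None:
                best_chain.append(best_sense)    # insert into the chain
            else:
                senses = wn.synsets(word, pos=wn.NOUN)
                if senses:
                    # seed a new chain with the most frequent sense
                    chains.append([senses[0]])
        return chains

    for chain in build_chains(["train", "rail", "car", "bird"]):
        print([s.name() for s in chain])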
29. Semantic Similarity in a Global Context
"A very long train traveling along the rails with a constant velocity v in a certain direction ..."
- train
  - 1: public transport
  - 2: order set of things
  - 3: piece of cloth
- travel
  - 1: change location
  - 2: undergo transportation
- rail
  - 1: a barrier
  - 2: a bar of steel for trains
  - 3: a small bird
30. Lexical Chains for WSD
- Identify the lexical chains in a text
  - Usually target one part of speech at a time
- Identify the meaning of words based on their membership in a lexical chain
- Evaluation
  - (Galley and McKeown 2003): lexical chains on 74 SemCor texts give 62.09%
  - (Mihalcea and Moldovan 2000): lexical chains anchored on monosemous words give 90% precision with 60% recall on five SemCor texts
  - (Okumura and Honda 1994): lexical chains on five Japanese texts give 63.4%
31. Heuristics: Most Frequent Sense
- Identify the most often used meaning and use this meaning by default (a code sketch follows)
- Word meanings exhibit a Zipfian distribution
  - E.g. the distribution of word senses in SemCor
- Example: plant/flora is used more often than plant/factory → annotate any instance of PLANT as plant/flora
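
A minimal sketch: NLTK returns WordNet senses ordered by decreasing frequency in the sense-tagged corpora behind WordNet, so taking the first synset implements this heuristic:

    from nltk.corpus import wordnet as wn

    def most_frequent_sense(word, pos=None):
        """WordNet lists senses most-frequent first, so take the head of the list."""
        senses = wn.synsets(word, pos=pos)
        return senses[0] if senses else None

    # Prints whichever sense of "plant" WordNet counts as most frequent
    print(most_frequent_sense('plant', pos=wn.NOUN))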
32. Heuristics: One Sense Per Discourse
- A word tends to preserve its meaning across all its occurrences in a given discourse (Gale, Church, Yarowsky 1992)
- What does this mean?
  - E.g. if the ambiguous word PLANT occurs 10 times in a discourse, all instances of "plant" carry the same meaning
- Evaluation
  - 8 words with two-way ambiguity, e.g. plant, crane, etc.
  - 98% of pairs of occurrences in the same discourse carry the same meaning
- The grain of salt: performance depends on sense granularity
  - (Krovetz 1998) experiments with words with more than two senses
  - Performance of one sense per discourse measured on SemCor is approx. 70%
33. Heuristics: One Sense per Collocation
- A word tends to preserve its meaning when used in the same collocation (Yarowsky 1993)
  - Strong for adjacent collocations
  - Weaker as the distance between words increases
- An example: the ambiguous word PLANT preserves its meaning in all its occurrences within the collocation "industrial plant", regardless of the context in which this collocation occurs
- Evaluation
  - 97% precision on words with two-way ambiguity
- Finer granularity:
  - (Martinez and Agirre 2000) tested the one-sense-per-collocation hypothesis on text annotated with WordNet senses
  - 70% precision on SemCor words