Title: Extended Gloss Overlaps as a Measure of Semantic Relatedness
1Extended Gloss Overlaps as a Measure of Semantic Relatedness
- Satanjeev Banerjee, Carnegie Mellon University
- Ted Pedersen, University of Minnesota Duluth
- Supported by NSF Grants 0092784, REC-9979894
2Semantic Relatedness
- Some pairs of words are closer in meaning than others
- E.g. car and tire are strongly related
- car and tree are not strongly related
- Relatedness between words can consist of
- Synonymy, e.g. car and automobile
- Is-a/has-a relationships, e.g. car and tire
- Co-occurrence, e.g. car and insurance
3Goal of this Paper
- Create a measure to quantify semantic relatedness
- Most existing work measures noun-noun only.
- Resnik (1995), Lin (1997), Jiang-Conrath (1997), Leacock-Chodorow (1998)
- We can measure across parts of speech.
- Based on WordNet definitions and relations.
- Evaluate
- Using word sense disambiguation.
- Compare to human relatedness judgments (in paper)
4Description of WordNet
- Online English lexical database.
- Like dictionaries, contains word senses and their definitions or glosses
- E.g. sentence: the penalty meted out to one adjudged guilty
- Word senses that mean the same are grouped into synonym sets or synsets
- E.g. sentence, conviction, condemnation
9Semantic Relations in WordNet
- Synsets are connected to other synsets through semantic relations
- sentence: the penalty meted out to one adjudged guilty
- final judgment: a judgment disposing of the case before the court of law
- a sentence is a final judgment, so final judgment is a hypernym of sentence
- hard time: term served in a maximum security prison
- death penalty: punishment by death via execution
- hard time is a sentence and death penalty is a sentence, so both are hyponyms of sentence
12Gloss Overlaps Relatedness
- Lesk's (1986) idea: Related word senses are (often) defined using the same words. E.g.
- bank(1): a financial institution
- bank(2): sloping land beside a body of water
- lake: a body of water surrounded by land
- Gloss overlaps = number of content words common to two glosses ≈ relatedness
- Thus, relatedness(bank(2), lake) = 3
- And, relatedness(bank(1), lake) = 0
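The overlap count on this slide can be sketched in a few lines of Python. This is only an illustration, not Lesk's original implementation: the function names and the small FUNCTION_WORDS list are assumptions chosen to cover the example glosses.

```python
# Hypothetical stop list: just enough function words for the example glosses.
FUNCTION_WORDS = {"a", "an", "the", "of", "by", "beside"}

def content_words(gloss):
    """Return the set of content (non-function) words in a gloss."""
    return {w for w in gloss.lower().split() if w not in FUNCTION_WORDS}

def gloss_overlap(gloss_a, gloss_b):
    """Count the content words shared by two glosses."""
    return len(content_words(gloss_a) & content_words(gloss_b))

bank_1 = "a financial institution"
bank_2 = "sloping land beside a body of water"
lake = "a body of water surrounded by land"

print(gloss_overlap(bank_2, lake))  # 3 (land, body, water)
print(gloss_overlap(bank_1, lake))  # 0
```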
13Limitations of (Lesk's) Gloss Overlaps
- Most glosses are very short.
- So not enough words to find overlaps with.
- Solution: Extended gloss overlaps
- Add glosses of synsets connected to the input synsets.
14Extending a Gloss
- sentence: the penalty meted out to one adjudged guilty
- bench: persons who hear cases in a court of law
- overlapped words = 0
16Extending a Gloss
- final judgment (hypernym of sentence): a judgment disposing of the case before the court of law
- sentence: the penalty meted out to one adjudged guilty
- bench: persons who hear cases in a court of law
- overlapped words = 2 (court, law)
17Creating the Extended Gloss Overlap Measure
- How to measure overlaps?
- Which relations to use for gloss extension?
18How to Score Overlaps?
- Lesk simply summed up overlapped words.
- But matches involving phrases (phrasal matches) are rarer, and more informative
- E.g. court of law
- Aim: Score of n words in a phrase > sum of scores of n words in shorter phrases
- Solution: Give a phrase of n words a score of n^2
- court of law gets a score of 3^2 = 9.
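The n^2 scoring can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: it repeatedly takes the longest phrase shared by both glosses, scores it, and blanks it out. The FUNCTION_WORDS list, the rule that a phrase may not start or end with a function word, and the function names are all assumptions made for the example.

```python
# Hypothetical stop list covering the example glosses.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "by", "who", "before", "beside"}

def longest_common_phrase(a, b):
    """Return (i, j, n) so that a[i:i+n] == b[j:j+n] is the longest shared phrase."""
    best = (0, 0, 0)
    for i, word in enumerate(a):
        if word is None or word in FUNCTION_WORDS:
            continue  # assumption: a phrase may not start with a function word
        for j in range(len(b)):
            n = 0
            while (i + n < len(a) and j + n < len(b)
                   and a[i + n] is not None and a[i + n] == b[j + n]):
                n += 1
            while n > 0 and a[i + n - 1] in FUNCTION_WORDS:
                n -= 1  # assumption: nor may it end with one
            if n > best[2]:
                best = (i, j, n)
    return best

def phrasal_score(gloss_a, gloss_b):
    """Sum n^2 over shared phrases, taking the longest phrase first."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    total = 0
    while True:
        i, j, n = longest_common_phrase(a, b)
        if n == 0:
            return total
        total += n * n
        a[i:i + n] = [None] * n  # blank out matched words so they match only once
        b[j:j + n] = [None] * n

final_judgment = "a judgment disposing of the case before the court of law"
bench = "persons who hear cases in a court of law"
print(phrasal_score(final_judgment, bench))  # 9, from "court of law"
```

Scoring the three-word phrase as 9 rather than as two separate content words keeps rare multi-word matches from being swamped by many single-word matches.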
19Which Relations to Use?
- Hypernyms: car → vehicle
- Hyponyms: car → convertible
- Meronyms: car → accelerator
- Holonym: car → train
- Also-see relation: enter → move in
- Attribute: measure → standard
- Pertainym: centennial → century
20Extended Gloss Overlap Measure
- Input: two synsets A and B
- Find phrasal gloss overlaps between A and B
- Next, find phrasal gloss overlaps between every synset connected to A and every synset connected to B
- Compute phrasal scores for all such overlaps
- Add phrasal scores to get relatedness of A and B
- A and B can be from different parts of speech.
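The steps on this slide can be put together in a small sketch. It is an illustration under assumptions, not the WordNet::Similarity implementation: SYNSETS is a toy stand-in for WordNet built from the glosses on the earlier slides, the FUNCTION_WORDS list is hypothetical, and every gloss on A's side is simply compared with every gloss on B's side.

```python
from itertools import product

FUNCTION_WORDS = {"a", "an", "the", "of", "in", "to", "who", "before"}

# Toy stand-in for WordNet: each synset has a gloss and directly related synsets.
SYNSETS = {
    "sentence": {"gloss": "the penalty meted out to one adjudged guilty",
                 "related": ["final judgment"]},
    "final judgment": {"gloss": "a judgment disposing of the case before the court of law",
                       "related": []},
    "bench": {"gloss": "persons who hear cases in a court of law",
              "related": []},
}

def phrasal_score(gloss_a, gloss_b):
    """Sum n^2 over shared phrases, longest first (function words may not start or end one)."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    total = 0
    while True:
        best = (0, 0, 0)  # (start in a, start in b, length)
        for i, j in product(range(len(a)), range(len(b))):
            if a[i] is None or a[i] in FUNCTION_WORDS:
                continue
            n = 0
            while (i + n < len(a) and j + n < len(b)
                   and a[i + n] is not None and a[i + n] == b[j + n]):
                n += 1
            while n and a[i + n - 1] in FUNCTION_WORDS:
                n -= 1
            if n > best[2]:
                best = (i, j, n)
        i, j, n = best
        if n == 0:
            return total
        total += n * n
        a[i:i + n] = [None] * n
        b[j:j + n] = [None] * n

def glosses(synset, extend):
    """Gloss of the synset, plus glosses of its related synsets when extend is True."""
    names = [synset] + (SYNSETS[synset]["related"] if extend else [])
    return [SYNSETS[name]["gloss"] for name in names]

def relatedness(a, b, extend=True):
    """Add up the phrasal scores over every pair of glosses from the two sides."""
    return sum(phrasal_score(ga, gb)
               for ga, gb in product(glosses(a, extend), glosses(b, extend)))

print(relatedness("sentence", "bench", extend=False))  # 0
print(relatedness("sentence", "bench"))                # 9
```

With extend=False the two input glosses share nothing; adding the hypernym gloss of sentence exposes the phrase "court of law".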
21Evaluation On WSD
- Test semantic relatedness measures on Word Sense Disambiguation (WSD) task.
- WSD: determine the intended sense of a multi-sense word in a sentence
- E.g. I sat on the bank of the lake.
- Our WSD algorithm: Pick that sense of the target word that is most strongly related to its neighboring words. (based on Lesk 86)
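A minimal sketch of this selection rule, using the bank and lake example. Several things here are assumptions made for illustration: SENSES is a toy sense inventory, a plain content-word overlap stands in for the relatedness measure, words missing from the inventory are skipped, and each target sense is scored by summing its relatedness to every sense of every neighboring word.

```python
FUNCTION_WORDS = {"a", "an", "the", "of", "by", "beside"}

# Toy sense inventory (hypothetical): word -> {sense label: gloss}.
SENSES = {
    "bank": {
        "bank(1)": "a financial institution",
        "bank(2)": "sloping land beside a body of water",
    },
    "lake": {"lake": "a body of water surrounded by land"},
}

def overlap(gloss_a, gloss_b):
    """Stand-in relatedness measure: number of shared content words."""
    words_a = {w for w in gloss_a.split() if w not in FUNCTION_WORDS}
    words_b = {w for w in gloss_b.split() if w not in FUNCTION_WORDS}
    return len(words_a & words_b)

def disambiguate(target, sentence, relatedness=overlap):
    """Pick the sense of target most strongly related to its neighboring words."""
    words = sentence.lower().rstrip(".").split()
    neighbors = [w for w in words if w != target and w in SENSES]
    scores = {}
    for sense, gloss in SENSES[target].items():
        scores[sense] = sum(relatedness(gloss, other_gloss)
                            for word in neighbors
                            for other_gloss in SENSES[word].values())
    return max(scores, key=scores.get)

print(disambiguate("bank", "I sat on the bank of the lake."))  # bank(2)
```

Any other relatedness function that takes two glosses can be passed in place of overlap.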
22Word sense disambiguation using a relatedness measure
- Example: the bench pronounced the sentence
- bench: a long seat for more than one person
- bench: persons who hear cases in a court of law
- pronounce: speak or utter in a certain way
- pronounce: pronounce judgment on
- sentence: a string of words that satisfies grammar rules
- sentence: the penalty meted out to one adjudged guilty
35Evaluation Data
- Data from SENSEVAL-2 WSD exercise.
- 4,328 passages, each 2-3 sentences long and containing 1 multi-sense target word.
- Each target word labeled by humans with its most appropriate WordNet sense.
- The WSD algorithm's output senses are compared against these human labels.
- Precision, recall, and f-measure reported.
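The three reported figures can be computed as in the sketch below. It uses the usual definitions (precision over attempted instances, recall over all instances, f-measure as their harmonic mean); the instance ids and sense labels are made-up toy data, not SENSEVAL-2 results.

```python
def evaluate(gold, predicted):
    """Return (precision, recall, f_measure).

    gold: instance id -> correct sense.
    predicted: instance id -> sense, or None when no answer was attempted.
    """
    attempted = [i for i in gold if predicted.get(i) is not None]
    correct = sum(1 for i in attempted if predicted[i] == gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical toy data: four instances, one left unanswered.
gold = {"d1": "bank(2)", "d2": "bank(1)", "d3": "bench(2)", "d4": "sentence(2)"}
predicted = {"d1": "bank(2)", "d2": "bank(2)", "d3": "bench(2)", "d4": None}

precision, recall, f_measure = evaluate(gold, predicted)
print(precision, recall, f_measure)  # 2/3, 1/2 and 4/7
```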
36Evaluation Results
37Which WN Relations Help?
- Evaluation with a single relation at a time
- E.g., comparing only hypernyms, only hyponyms, etc.
- Result: No single comparison is a big source of information.
- No pair exceeded f-measure of 0.136, as compared to overall f-measure of 0.346
38Which WN Relations Help?
- Most helpful were
- Hyponym relation
- kinds of car → compact, SUV, coupe, etc.
- Meronym relation
- parts of car → accelerator, wheel, hood, etc.
- These relations are usually one-to-many.
- Thus they give access to many glosses.
- Implies more glosses → more useful.
39Conclusions
- We presented a new measure of semantic relatedness
- Can operate across parts of speech.
- We evaluated on the task of WSD.
- Performed much better than the Lesk baseline
- Performance comparable to other systems.
- Future work
- Augment using corpus statistics.
- Evaluate on a different task.
40Resources
- WordNet::Similarity (relatedness measures) (http://search.cpan.org/dist/WordNet-Similarity)
- Extended gloss overlaps
- Resnik, Lin, Jiang-Conrath
- Leacock-Chodorow, Hirst-St. Onge
- Edge Counting, Random
- SenseRelate (WSD using relatedness)
- (http://www.d.umn.edu/~tpederse/senserelate.html)