Word sense disambiguation (1)

1
  • Word sense disambiguation (1)
  • Instructor: Rada Mihalcea
  • Note: Some of the material in this slide set was
    adapted from a tutorial given by Rada Mihalcea and
    Ted Pedersen at ACL 2005

2
Definitions
  • Word sense disambiguation is the problem of
    selecting a sense for a word from a set of
    predefined possibilities.
  • The sense inventory usually comes from a dictionary
    or thesaurus.
  • Typically addressed with knowledge-intensive methods,
    supervised learning, and (sometimes) bootstrapping
    approaches
  • Word sense discrimination is the problem of
    dividing the usages of a word into different
    meanings, without regard to any particular
    existing sense inventory.
  • Typically addressed with unsupervised techniques

3
Computers versus Humans
  • Polysemy: most words have many possible meanings.
  • A computer program has no basis for knowing which
    one is appropriate, even if it is obvious to a
    human.
  • Ambiguity is rarely a problem for humans in their
    day-to-day communication, except in extreme cases.

4
Ambiguity for Humans - Newspaper Headlines!
  • DRUNK GETS NINE YEARS IN VIOLIN CASE
  • FARMER BILL DIES IN HOUSE
  • PROSTITUTES APPEAL TO POPE
  • STOLEN PAINTING FOUND BY TREE
  • RED TAPE HOLDS UP NEW BRIDGE
  • DEER KILL 300,000
  • RESIDENTS CAN DROP OFF TREES
  • INCLUDE CHILDREN WHEN BAKING COOKIES
  • MINERS REFUSE TO WORK AFTER DEATH

5
Ambiguity for a Computer
  • The fisherman jumped off the bank and into the
    water.
  • The bank down the street was robbed!
  • Back in the day, we had an entire bank of
    computers devoted to this problem.
  • The bank in that road is entirely too steep and
    is really dangerous.
  • The plane took a bank to the left, and then
    headed off towards the mountains.

6
Early Days of WSD
  • Noted as a problem for Machine Translation (Weaver,
    1949)
  • A word can often only be translated if you know the
    specific sense intended (a "bill" in English could
    be a "pico" or a "cuenta" in Spanish)
  • Bar-Hillel (1960) posed the following example:
  • Little John was looking for his toy box. Finally,
    he found it. The box was in the pen. John was
    very happy.
  • Is "pen" a writing instrument or an enclosure
    where children play?
  • He declared the problem unsolvable and left the
    field of MT!

7
Since then
  • 1970s - 1980s
  • Rule-based systems
  • Rely on hand-crafted knowledge sources
  • 1990s
  • Corpus-based approaches
  • Dependence on sense-tagged text
  • (Ide and Veronis, 1998) give an overview of the
    history from the early days to 1998.
  • 2000s
  • Hybrid systems
  • Minimizing or eliminating the use of sense-tagged
    text
  • Taking advantage of the Web

8
Practical Applications
  • Machine Translation
  • Translate "bill" from English to Spanish
  • Is it a "pico" or a "cuenta"?
  • Is it a bird's beak or an invoice?
  • Information Retrieval
  • Find all Web pages about "cricket"
  • The sport or the insect?
  • Question Answering
  • What is George Miller's position on gun control?
  • The psychologist or the US congressman?
  • Knowledge Acquisition
  • Add to KB: Herb Bergson is the mayor of Duluth.
  • Minnesota or Georgia?

9
Knowledge-based WSD
  • Task definition
  • Knowledge-based WSD: the class of WSD methods
    relying (mainly) on knowledge drawn from
    dictionaries and/or raw text
  • Resources
  • Yes
  • Machine Readable Dictionaries
  • Raw corpora
  • No
  • Manually annotated corpora
  • Scope
  • All open-class words

10
Machine Readable Dictionaries
  • In recent years, most dictionaries have been made
    available in Machine Readable format (MRD)
  • Oxford English Dictionary
  • Collins
  • Longman Dictionary of Contemporary English (LDOCE)
  • Thesauruses add synonymy information
  • Roget's Thesaurus
  • Semantic networks add more semantic relations
  • WordNet
  • EuroWordNet

11
MRD: A Resource for Knowledge-based WSD
  • For each word in the language vocabulary, an MRD
    provides:
  • A list of meanings
  • Definitions (for all word meanings)
  • Typical usage examples (for most word meanings)

12
MRD: A Resource for Knowledge-based WSD
  • A thesaurus adds
  • An explicit synonymy relation between word
    meanings
  • A semantic network adds
  • Hypernymy/hyponymy (IS-A), meronymy/holonymy
    (PART-OF), antonymy, entailment, etc.

WordNet synsets for the noun "plant":
1. plant, works, industrial plant
2. plant, flora, plant life

WordNet related concepts for the meaning "plant life":
hypernym: organism, being
hyponym: house plant, fungus, ...
meronym: plant tissue, plant part
holonym: Plantae, kingdom Plantae, plant kingdom
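
These relations can be queried programmatically. A minimal sketch with
NLTK's WordNet interface (assuming NLTK and its wordnet corpus are
installed; in WordNet the "plant life" meaning is plant.n.02, and the
full sense list for "plant" is longer than the two synsets above):

    from nltk.corpus import wordnet as wn

    # list the noun synsets for "plant" with their lemma names
    for syn in wn.synsets("plant", pos=wn.NOUN):
        print(syn.name(), "-", ", ".join(syn.lemma_names()))

    # related concepts for the "plant life" meaning (plant.n.02)
    flora = wn.synset("plant.n.02")
    print("hypernyms:", flora.hypernyms())
    print("hyponyms: ", flora.hyponyms()[:3])
    print("holonyms: ", flora.member_holonyms())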
13
Lesk Algorithm
  • (Michael Lesk, 1986): identify the senses of words
    in context using definition overlap
  • Algorithm (sketched in code below):
  • Retrieve from the MRD all sense definitions of the
    words to be disambiguated
  • Determine the definition overlap for all possible
    sense combinations
  • Choose the senses that lead to the highest overlap
  • Example: disambiguate PINE CONE
  • PINE
  • 1. kinds of evergreen tree with needle-shaped
    leaves
  • 2. waste away through sorrow or illness
  • CONE
  • 1. solid body which narrows to a point
  • 2. something of this shape whether solid or
    hollow
  • 3. fruit of certain evergreen trees

Pine#1 ∩ Cone#1 = 0    Pine#2 ∩ Cone#1 = 0
Pine#1 ∩ Cone#2 = 1    Pine#2 ∩ Cone#2 = 0
Pine#1 ∩ Cone#3 = 2    Pine#2 ∩ Cone#3 = 0
→ choose Pine#1 and Cone#3
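
A minimal sketch of this pairwise overlap computation on the slide's toy
inventory; the stopword list and the naive plural stripping (so that
"tree" matches "trees") are assumptions:

    from itertools import product

    STOPWORDS = {"of", "the", "a", "to", "or", "which", "this", "with"}

    SENSES = {
        "pine": ["kinds of evergreen tree with needle-shaped leaves",
                 "waste away through sorrow or illness"],
        "cone": ["solid body which narrows to a point",
                 "something of this shape whether solid or hollow",
                 "fruit of certain evergreen trees"],
    }

    def content_words(definition):
        # crude normalization: drop stopwords, strip a plural "s"
        return {w[:-1] if w.endswith("s") else w
                for w in definition.lower().split() if w not in STOPWORDS}

    def lesk_pair(w1, w2):
        # score every sense combination by definition overlap, keep the best
        combos = product(enumerate(SENSES[w1], 1), enumerate(SENSES[w2], 1))
        (i, d1), (j, d2) = max(
            combos,
            key=lambda c: len(content_words(c[0][1]) & content_words(c[1][1])))
        return i, j, len(content_words(d1) & content_words(d2))

    print(lesk_pair("pine", "cone"))   # (1, 3, 2): Pine#1 + Cone#3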
14
Lesk Algorithm for More than Two Words?
  • I saw a man who is 98 years old and can still
    walk and tell jokes
  • nine open-class words: see(26), man(11), year(4),
    old(8), can(5), still(4), walk(10), tell(8),
    joke(3) (number of senses in parentheses)
  • 26 x 11 x 4 x 8 x 5 x 4 x 10 x 8 x 3 = 43,929,600
    sense combinations! How to find the optimal sense
    combination?
  • Simulated annealing (Cowie, Guthrie, Guthrie
    1992)
  • Define a score E of a configuration of word senses
    (one sense per word) in a given text: the total
    definition overlap (redundancy)
  • Find the combination of senses that maximizes E
  • 1. Start with the configuration that assigns each
    word its most frequent sense
  • 2. At each iteration, replace the sense of a
    random word in the set with a different sense,
    and measure E
  • 3. Stop iterating when there is no change in
    the configuration of senses
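
A schematic sketch of this search; the score function E (total
definition overlap of a configuration) and the sense definitions are
assumed to be supplied, and the cooling schedule is an illustrative
choice:

    import math
    import random

    def anneal(words, definitions, score, temp=1.0, cooling=0.95, steps=1000):
        # 1. start from the first-listed (most frequent) sense of each word
        config = {w: 0 for w in words}
        current = score(config)
        for _ in range(steps):
            # 2. replace the sense of a random word and re-measure E
            w = random.choice(words)
            candidate = dict(config)
            candidate[w] = random.randrange(len(definitions[w]))
            new = score(candidate)
            # keep improvements; keep worse moves with a probability
            # that shrinks as the temperature cools
            if new >= current or random.random() < math.exp((new - current) / temp):
                config, current = candidate, new
            temp *= cooling
        # 3. return the (near-)stable configuration of senses
        return config, current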

15
Lesk Algorithm A Simplified Version
  • Original Lesk: measure the overlap between sense
    definitions for all words in context
  • Identify simultaneously the correct senses for
    all words in context
  • Simplified Lesk (Kilgarriff and Rosenzweig, 2000):
    measure the overlap between the sense definitions
    of a word and its current context
  • Identify the correct sense for one word at a time
  • Search space significantly reduced

16
Lesk Algorithm A Simplified Version
  • Algorithm for Simplified Lesk (sketched in code
    below):
  • Retrieve from the MRD all sense definitions of the
    word to be disambiguated
  • Determine the overlap between each sense
    definition and the current context
  • Choose the sense that leads to the highest overlap
  • Example: disambiguate PINE in
  • "Pine cones hanging in a tree"
  • PINE
  • 1. kinds of evergreen tree with needle-shaped
    leaves
  • 2. waste away through sorrow or illness

Pine#1 ∩ Sentence = 1    Pine#2 ∩ Sentence = 0  → choose Pine#1
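
NLTK ships a Simplified Lesk implementation (nltk.wsd.lesk). A quick
check on the slide's example; note that WordNet's sense inventory for
"pine" differs from the two-sense toy inventory above, so the returned
synset is WordNet's:

    from nltk.wsd import lesk

    context = "pine cones hanging in a tree".split()
    sense = lesk(context, "pine", pos="n")
    if sense is not None:
        print(sense.name(), "-", sense.definition())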
17
Evaluations of Lesk Algorithm
  • Initial evaluation by M. Lesk
  • 50-70% on short samples of manually annotated
    text, with respect to the Oxford Advanced
    Learner's Dictionary
  • Simulated annealing
  • 47% on 50 manually annotated sentences
  • Evaluation on Senseval-2 all-words data, with
    back-off to a random sense (Mihalcea and Tarau 2004)
  • Original Lesk: 35%
  • Simplified Lesk: 47%
  • Evaluation on Senseval-2 all-words data, with
    back-off to the most frequent sense (Vasilescu,
    Langlais, Lapalme 2004)
  • Original Lesk: 42%
  • Simplified Lesk: 58%

18
Selectional Preferences
  • A way to constrain the possible meanings of words
    in a given context
  • E.g. "wash a dish" vs. "cook a dish"
  • WASH-OBJECT vs. COOK-FOOD
  • Capture information about possible relations
    between semantic classes
  • Common sense knowledge
  • Alternative terminology
  • Selectional Restrictions
  • Selectional Preferences
  • Selectional Constraints

19
Acquiring Selectional Preferences
  • From annotated corpora
  • Circular relationship with the WSD problem
  • Need WSD to build the annotated corpus
  • Need selectional preferences to derive WSD
  • From raw corpora
  • Frequency counts
  • Information theory measures
  • Class-to-class relations

20
Preliminaries Learning Word-to-Word Relations
  • An indication of the semantic fit between two
    words
  • 1. Frequency counts
  • Pairs of words connected by a syntactic relation
  • 2. Conditional probabilities
  • Condition on one of the words (both measures are
    sketched below)
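
A minimal sketch of both measures over (verb, object) pairs; in
practice the pairs come from a parser, and the ones below are invented
for illustration:

    from collections import Counter

    pairs = [("drink", "coffee"), ("drink", "tea"),
             ("drink", "coffee"), ("grow", "coffee")]

    pair_counts = Counter(pairs)                 # 1. frequency counts
    verb_counts = Counter(v for v, _ in pairs)

    def p_obj_given_verb(verb, obj):
        # 2. conditional probability P(object | verb)
        return pair_counts[(verb, obj)] / verb_counts[verb]

    print(pair_counts[("drink", "coffee")])      # 2
    print(p_obj_given_verb("drink", "coffee"))   # 0.666...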

21
Learning Selectional Preferences (1)
  • Word-to-class relations (Resnik 1993)
  • Quantify the contribution of a semantic class
    using all the concepts subsumed by that class
  • Resnik's selectional association between a
    predicate p and a class c:
    A(p, c) = P(c|p) log( P(c|p) / P(c) ) / S(p)
  • where S(p) = Σc P(c|p) log( P(c|p) / P(c) ) is the
    selectional preference strength of p, and P(c|p) is
    estimated from the counts of all the words subsumed
    by class c

22
Learning Selectional Preferences (2)
  • Determine the contribution of a word sense based
    on the assumption of equal sense distributions
  • e.g. "plant" has two senses → 50% of its
    occurrences are counted toward sense 1, 50% toward
    sense 2
  • Example: learning restrictions for the verb "to
    drink" (see the toy sketch below)
  • Find high-scoring verb-object pairs
  • Find prototypical object classes (high
    association score)
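
A toy sketch of this procedure for "to drink", splitting each noun's
count equally among its classes as described above; all counts and the
noun-to-class map are invented for illustration:

    import math
    from collections import defaultdict

    # noun -> semantic classes subsuming it (in practice: from WordNet)
    CLASSES = {"coffee": ["beverage", "plant"], "tea": ["beverage", "plant"],
               "water": ["beverage"], "oak": ["plant"]}

    def class_counts(noun_counts):
        # split each noun's count equally among its classes
        cc = defaultdict(float)
        for noun, n in noun_counts.items():
            for c in CLASSES[noun]:
                cc[c] += n / len(CLASSES[noun])
        return cc

    def association(verb_objects, corpus_objects):
        pcv, pc = class_counts(verb_objects), class_counts(corpus_objects)
        nv, n = sum(pcv.values()), sum(pc.values())
        raw = {c: (pcv[c] / nv) * math.log((pcv[c] / nv) / (pc[c] / n))
               for c in pcv}
        strength = sum(raw.values())             # S(verb)
        return {c: v / strength for c, v in raw.items()}

    drink = {"coffee": 4, "tea": 2, "water": 6}       # objects of "drink"
    corpus = {"coffee": 10, "tea": 6, "water": 8, "oak": 12}
    print(association(drink, corpus))   # "beverage" scores highest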

23
Using Selectional Preferences for WSD
  • Algorithm
  • 1. Learn a large set of selectional preferences
    for a given syntactic relation R
  • 2. Given a pair of words W1-W2 connected by a
    relation R
  • 3. Find all selectional preferences W1-C
    (word-to-class) or C1-C2 (class-to-class) that
    apply
  • 4. Select the meanings of W1 and W2 based on the
    selected semantic class
  • Example: disambiguate "coffee" in "drink coffee"
    (see the sketch below)
  • 1. (beverage) a beverage consisting of an
    infusion of ground coffee beans
  • 2. (tree) any of several small trees native to
    the tropical Old World
  • 3. (color) a medium to dark brown color

Given the selectional preference DRINK → BEVERAGE: choose coffee#1
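
A minimal sketch of steps 3-4 for this example; the preference table
and the class attached to each sense are toy assumptions:

    PREFERENCES = {("drink", "object"): "beverage"}    # learned, e.g. as above
    COFFEE_SENSES = {1: "beverage", 2: "tree", 3: "color"}

    def select_sense(verb, relation, senses):
        wanted = PREFERENCES.get((verb, relation))
        matches = [s for s, cls in senses.items() if cls == wanted]
        return matches[0] if matches else None

    print(select_sense("drink", "object", COFFEE_SENSES))   # 1 -> coffee#1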
24
Evaluation of Selectional Preferences for WSD
  • Data set
  • mainly verb-object and subject-verb relations
    extracted from SemCor
  • Compare against a random baseline
  • Results (Agirre and Martinez, 2000)
  • Average results on 8 nouns
  • Similar figures reported in (Resnik 1997)

25
Semantic Similarity
  • Words in a discourse must be related in meaning
    for the discourse to be coherent (Halliday and
    Hasan, 1976)
  • Use this property for WSD: identify related
    meanings for words that share a common context
  • Context span
  • 1. Local context: semantic similarity between
    pairs of words
  • 2. Global context: lexical chains

26
Semantic Similarity in a Local Context
  • Similarity determined between pairs of concepts,
    or between a word and its surrounding context
  • Relies on similarity metrics on semantic networks
  • (Rada et al. 1989)

[Figure: a fragment of the WordNet IS-A hierarchy under "carnivore",
with nodes such as bear, feline/felid, canine/canid, fissiped
mammal/fissiped, wild dog, wolf, hyena, dog, hunting dog, hyena dog,
dingo, dachshund, terrier]
27
Semantic Similarity Metrics for WSD
  • Disambiguate target words based on their
    similarity with one word to the left and one word
    to the right
  • (Patwardhan, Banerjee, Pedersen 2002)
  • Evaluation
  • 1,723 ambiguous nouns from Senseval-2
  • Among 5 similarity metrics, (Jiang and Conrath
    1997) provides the best precision (39%)
  • Example: disambiguate PLANT in "plant with
    flowers"
  • PLANT
  • plant, works, industrial plant
  • plant, flora, plant life
  • Similarity(plant#1, flower) = 0.2
  • Similarity(plant#2, flower) = 1.5 → choose plant#2
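
The comparison can be reproduced with NLTK's Jiang-Conrath measure
(assuming the wordnet and wordnet_ic corpora are installed); the
absolute scores will not match the slide's 0.2 / 1.5, but the flora
sense should still score higher:

    from nltk.corpus import wordnet as wn, wordnet_ic

    ic = wordnet_ic.ic("ic-brown.dat")
    flower = wn.synset("flower.n.01")
    for plant in (wn.synset("plant.n.01"),    # industrial plant
                  wn.synset("plant.n.02")):   # flora, plant life
        print(plant.name(), plant.jcn_similarity(flower, ic))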

28
Semantic Similarity in a Global Context
  • Lexical chains (Hirst and St-Onge 1998), (Halliday
    and Hasan 1976)
  • A lexical chain is a sequence of semantically
    related words, which creates a context and
    contributes to the continuity of meaning and the
    coherence of a discourse
  • Algorithm for finding lexical chains (sketched in
    code after this list):
  • Select the candidate words from the text. These
    are words for which we can compute similarity
    measures, and therefore most of the time they
    have the same part of speech.
  • For each such candidate word, and for each
    meaning of this word, find a chain to receive
    the candidate word sense, based on a semantic
    relatedness measure between the concepts already
    in the chain and the candidate word meaning.
  • If such a chain is found, insert the word in this
    chain; otherwise, create a new chain.
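
A greedy sketch of this chaining procedure; the related() test stands
in for a real semantic relatedness measure (WordNet relations in
practice) and here just compares toy class labels:

    def related(sense_a, sense_b):
        # assumption: a sense is a (word, class) pair; same class = related
        return sense_a[1] == sense_b[1]

    def build_chains(candidate_senses):
        """candidate_senses: one list of possible senses per candidate word."""
        chains = []
        for senses in candidate_senses:
            placed = False
            for sense in senses:              # try each meaning of the word
                for chain in chains:
                    if all(related(sense, s) for s in chain):
                        chain.append(sense)   # chain receives this word sense
                        placed = True
                        break
                if placed:
                    break
            if not placed:                    # no chain found: start a new one
                chains.append([senses[0]])
        return chains

    words = [[("train", "transport"), ("train", "clothing")],
             [("rail", "transport"), ("rail", "bird")],
             [("velocity", "physics")]]
    print(build_chains(words))
    # [[('train', 'transport'), ('rail', 'transport')], [('velocity', 'physics')]]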

29
Semantic Similarity in a Global Context
Example: "A very long train traveling along the rails with a constant
velocity v in a certain direction"
train:  1. public transport   2. ordered set of things  3. piece of cloth
travel: 1. change location    2. undergo transportation
rail:   1. a barrier          2. a bar of steel for trains  3. a small bird
The chain train#1 - travel#2 - rail#2 selects the transportation-related
senses.
30
Lexical Chains for WSD
  • Identify lexical chains in a text
  • Usually target one part of speech at a time
  • Identify the meaning of words based on their
    membership in a lexical chain
  • Evaluation
  • (Galley and McKeown 2003) lexical chains on 74
    SemCor texts give 62.09%
  • (Mihalcea and Moldovan 2000) on five SemCor texts
    give 90% precision with 60% recall
  • lexical chains anchored on monosemous words
  • (Okumura and Honda 1994) lexical chains on five
    Japanese texts give 63.4%

31
Heuristics Most Frequent Sense
  • Identify the most often used meaning and use this
    meaning by default
  • Word meanings exhibit a Zipfian distribution
  • E.g. distribution of word senses in SemCor

Example: plant/flora is used more often than plant/factory → annotate
any instance of PLANT as plant/flora
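
With WordNet this baseline is essentially one line, since synsets are
listed in roughly descending sense-frequency order. Note that WordNet's
first noun synset for "plant" is the industrial-plant sense, so the
baseline can disagree with the SemCor-based example above:

    from nltk.corpus import wordnet as wn

    def most_frequent_sense(word, pos="n"):
        synsets = wn.synsets(word, pos=pos)
        return synsets[0] if synsets else None   # first = most frequent

    print(most_frequent_sense("plant").definition())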
32
Heuristics One Sense Per Discourse
  • A word tends to preserve its meaning across all
    its occurrences in a given discourse (Gale,
    Church, Yarowsky 1992)
  • What does this mean?
  • Evaluation
  • 8 words with two-way ambiguity, e.g. plant,
    crane, etc.
  • 98% of pairs of occurrences within the same
    discourse carry the same meaning
  • The grain of salt: performance depends on sense
    granularity
  • (Krovetz 1998) experiments with words with more
    than two senses
  • Performance of one sense per discourse measured
    on SemCor is approx. 70%

E.g. the ambiguous word PLANT occurs 10 times in a discourse: all
instances of "plant" carry the same meaning
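
A minimal sketch of one way to apply the heuristic: disambiguate each
occurrence independently, then propagate the majority sense across the
discourse (classify() stands in for any single-instance WSD method):

    from collections import Counter

    def tag_discourse(occurrences, classify):
        # occurrences: the contexts of one word within a single discourse
        votes = Counter(classify(ctx) for ctx in occurrences)
        majority = votes.most_common(1)[0][0]   # the discourse-level sense
        return [majority] * len(occurrences)    # assign it everywhere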
33
Heuristics One Sense per Collocation
  • A word tends to preserve its meaning when used
    in the same collocation (Yarowsky 1993)
  • Strong for adjacent collocations
  • Weaker as the distance between words increases
  • An example is given below
  • Evaluation
  • 97% precision on words with two-way ambiguity
  • Finer granularity
  • (Martinez and Agirre 2000) tested the one-sense-
    per-collocation hypothesis on text annotated
    with WordNet senses
  • 70% precision on SemCor words

The ambiguous word PLANT preserves its meaning in all its occurrences
within the collocation "industrial plant", regardless of the context in
which this collocation occurs
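
A minimal sketch of the heuristic as a lookup from collocations to
senses, backing off to the most frequent sense for unseen collocations;
the table entries are toy assumptions:

    COLLOCATION_SENSE = {("industrial", "plant"): "plant/factory",
                         ("plant", "life"): "plant/flora"}

    def wsd_by_collocation(left, word, right, fallback="plant/flora"):
        return (COLLOCATION_SENSE.get((left, word))
                or COLLOCATION_SENSE.get((word, right))
                or fallback)             # back off to the most frequent sense

    print(wsd_by_collocation("industrial", "plant", "workers"))  # plant/factory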