
1
CPSC 503 Computational Linguistics
  • Computational Lexical Semantics
  • Lecture 12
  • Giuseppe Carenini

2
Today 22/10
  • Three well-defined semantic tasks:
  • Word Sense Disambiguation
  • Corpus and Thesaurus
  • Word Similarity
  • Thesaurus and Corpus
  • Semantic Role Labeling

3
WSD example: table (senses 1-6)
  • The noun "table" has 6 senses in WordNet:
  • 1. table, tabular array -- (a set of data ...)
  • 2. table -- (a piece of furniture ...)
  • 3. table -- (a piece of furniture with tableware ...)
  • 4. mesa, table -- (flat tableland ...)
  • 5. table -- (a company of people ...)
  • 6. board, table -- (food or meals ...)

4
WSD methods
  • Machine Learning
  • Supervised
  • Unsupervised
  • Dictionary / Thesaurus (Lesk)

5
Supervised ML Approaches to WSD
6
Training Data Example
((word, context) → sense)_i
  • ..after the soup she had bass with a big salad

7
WordNet Bass music vs. fish
  • The noun "bass" has 8 senses in WordNet:
  • bass -- (the lowest part of the musical range)
  • bass, bass part -- (the lowest part in polyphonic music)
  • bass, basso -- (an adult male singer with ...)
  • sea bass, bass -- (flesh of lean-fleshed saltwater fish of the family Serranidae)
  • freshwater bass, bass -- (any of various North American lean-fleshed ...)
  • bass, bass voice, basso -- (the lowest adult male singing voice)
  • bass -- (the member with the lowest range of a family of musical instruments)
  • bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

8
Representations for Context
  • GOAL: an informative characterization of the window of text surrounding the target word
  • TASK: select relevant linguistic information and encode it as a feature vector

9
Relevant Linguistic Information(1)
  • Collocational info about the words that appear
    in specific positions to the right and left of
    the target word

Typically the words and their POS:
[word in position -n, POS in position -n, ..., word in position +n, POS in position +n]
Assume a window of +/- 2 around the target.
  • Example text (WSJ)
  • An electric guitar and bass player stand off to one side, not really part of the scene, ...

guitar, NN, and, CJC, player, NN, stand, VVB
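A minimal sketch of this extraction in Python, assuming the sentence is already tokenized and POS-tagged (the padding token is an illustrative choice, not from the lecture):

def collocational_features(tagged, i, n=2):
    """tagged: list of (word, POS) pairs; i: index of the target word."""
    feats = []
    for offset in list(range(-n, 0)) + list(range(1, n + 1)):
        j = i + offset
        if 0 <= j < len(tagged):
            feats.extend(tagged[j])            # word and POS at this offset
        else:
            feats.extend(("<PAD>", "<PAD>"))   # hypothetical padding token
    return feats

tagged = [("An", "AT0"), ("electric", "AJ0"), ("guitar", "NN"),
          ("and", "CJC"), ("bass", "NN"), ("player", "NN"),
          ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, 4))
# ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']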
10
Relevant Linguistic Information(2)
  • Co-occurrence info about the words that occur
    anywhere in the window regardless of position
  • Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, ..., guitar, band)

Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), ..., c(guitar), c(band)]
  • Example text (WSJ)
  • An electric guitar and bass player stand off to one side, not really part of the scene, ...

0,0,0,1,0,0,0,0,0,0,1,0
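A minimal sketch of this co-occurrence encoding, assuming a fixed list of the k content words; the slide elides the middle of the list, so the filler words below are hypothetical:

content_words = ["fishing", "big", "sound", "player", "fly", "rod",
                 "pound", "double", "runs", "playing", "guitar", "band"]

def cooccurrence_vector(tokens, target, content_words, n=2):
    """Count each content word within +/-n positions of any target occurrence."""
    counts = [0] * len(content_words)
    index = {w: k for k, w in enumerate(content_words)}
    for i, tok in enumerate(tokens):
        if tok == target:
            window = tokens[max(0, i - n):i] + tokens[i + 1:i + n + 1]
            for w in window:
                if w in index:
                    counts[index[w]] += 1
    return counts

tokens = "an electric guitar and bass player stand off to one side".split()
print(cooccurrence_vector(tokens, "bass", content_words))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]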
11
Training Data Examples
Let's assume bass-music is encoded as 0 and bass-fish as 1.
Labeled training examples (the last value is the sense label):
0,0,0,1,0,0,0,0,0,0,1,0, 0
guitar, NN, and, CJC, player, NN, stand, VVB, 0
a, AT0, sea, CJC, to, PRP, me, PNP, 1
1,0,0,0,0,0,0,0,0,0,0,0, 1
play, VVB, the, AT0, with, PRP, others, PNP, 0
1,0,0,0,0,0,0,0,0,0,0,1, 1
...
  • Inputs to classifiers (unlabeled):

1,1,0,0,0,1,0,0,0,0,0,0
guitar, NN, and, CJC, could, VM0, be, VVI
12
ML for Classifiers
  • Training Data
  • Co-occurrence
  • Collocational
  • Naïve Bayes
  • Decision lists
  • Decision trees
  • Neural nets
  • Support vector machines
  • Nearest neighbor methods

[Diagram: Training Data → Machine Learning → Classifier]
13
Naïve Bayes
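The formula on this slide did not survive the transcript; the standard Naive Bayes decision rule for WSD picks s* = argmax_s P(s) ∏_j P(f_j | s). A minimal sketch with add-one smoothing, assuming training examples shaped like those on the previous slides:

from collections import Counter, defaultdict
import math

def train_nb(examples):
    """examples: list of (feature_list, sense) pairs."""
    sense_counts, feat_counts, vocab = Counter(), defaultdict(Counter), set()
    for feats, sense in examples:
        sense_counts[sense] += 1
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab

def classify_nb(feats, sense_counts, feat_counts, vocab):
    total = sum(sense_counts.values())
    best, best_logp = None, float("-inf")
    for s, c in sense_counts.items():
        logp = math.log(c / total)                        # log P(s)
        denom = sum(feat_counts[s].values()) + len(vocab)
        for f in feats:                                   # + sum of log P(f|s)
            logp += math.log((feat_counts[s][f] + 1) / denom)
        if logp > best_logp:
            best, best_logp = s, logp
    return best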
14
Naïve Bayes Evaluation
  • Experiment comparing different classifiers [Mooney 96]
  • Naïve Bayes and Neural Network achieved the highest performance
  • 73% in assigning one of six senses to "line"
  • Is this good?
  • Simplest baseline: most frequent sense
  • Ceiling: human inter-annotator agreement
  • 75-80% on refined sense distinctions (WordNet)
  • Closer to 90% for binary distinctions

15
Bootstrapping
  • What if you don't have enough data to train a system?

16
Bootstrapping: how to pick the seeds
  • Hand-labeling (Hearst 1991)
  • Likely correct
  • Likely to be prototypical
  • One sense per collocation (Yarowsky 1995)
  • E.g., for "bass", play is strongly associated with the music sense whereas fish is strongly associated with the fish sense
  • One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense
17
Unsupervised Methods [Schütze 98]
Machine Learning (Clustering)
Training Data: (word, vector)_1 ... (word, vector)_n
K clusters c_i
18
Agglomerative Clustering
  • Assign each instance to its own cluster
  • Repeat:
  • merge the two clusters that are most similar
  • until the specified number of clusters is reached
  • If there are too many training instances → random sampling
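A minimal sketch of this procedure, assuming cosine similarity between cluster centroids (one of several possible linkage choices; the lecture does not specify one):

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(cluster):
    return [sum(dim) / len(cluster) for dim in zip(*cluster)]

def agglomerate(vectors, k):
    clusters = [[v] for v in vectors]        # each instance starts alone
    while len(clusters) > k:
        # merge the two most similar clusters (by centroid cosine)
        i, j = max(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: cosine(centroid(clusters[p[0]]),
                                        centroid(clusters[p[1]])))
        clusters[i].extend(clusters.pop(j))
    return clusters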

19
Problems
  • Given these general ML approaches, how many classifiers do I need to perform WSD robustly?
  • One for each ambiguous word in the language
  • How do you decide what set of tags/labels/senses to use for a given word?
  • Depends on the application

20
WSD: Dictionary and Thesaurus Methods
  • Most common: the Lesk method
  • Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood
  • Exclude stop-words

Def: the set of words in the gloss for a sense is called the signature of that sense
21
Lesk Example
Two senses for "channel":
S1 (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rainwater into a series of channels under the street"
S2 (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels"

Target: "... most streets close to the TV station were flooded because the main channel was clogged by heavy rain ..."
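A minimal sketch of the overlap computation, assuming glosses plus example sentences are flattened into strings and using a small hypothetical stop list:

STOP = {"a", "the", "of", "to", "was", "by", "and", "for", "or", "in", "its"}

def signature(text):
    return {w for w in text.lower().split() if w not in STOP}

def lesk(context, senses):
    """senses: dict mapping sense id -> gloss text (with examples)."""
    ctx = signature(context)
    return max(senses, key=lambda s: len(signature(senses[s]) & ctx))

senses = {
    "S1": "a passage for water or other fluids to flow through "
          "the fields were crossed with irrigation channels",
    "S2": "a television station and its programs a satellite TV channel",
}
print(lesk("most streets close to the TV station were flooded because "
           "the main channel was clogged by heavy rain", senses))  # S2

On this sentence the gloss overlap favors S2 (tv, station, channel), even though the flooding context suggests the water sense; richer signatures, as in Corpus Lesk on the next slide, address exactly this weakness.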
22
Corpus Lesk
  • Best performer
  • If a corpus with annotated senses is available:
  • For each sense, add all the words in the sentences containing that sense to the signature for that sense

CORPUS: "... most streets close to the TV station were flooded because the main channel was clogged by heavy rain." ... → these words are added to the signature of the sense annotated in the corpus
23
WSD: More Recent Trends
  • Better ML techniques (e.g., Combining
    Classifiers)
  • Combining ML and Lesk
  • Other Languages
  • Building better/larger corpora

24
Today 22/10
  • Word Sense Disambiguation
  • Word Similarity
  • Semantic Role Labeling

25
Word Similarity
  • Actually a relation between two senses
  • Similarity vs. relatedness:
  • sun vs. moon; mouth vs. food; hot vs. cold
  • Applications?
  • Thesaurus methods: measure distance in online thesauri (e.g., WordNet)
  • Distributional methods: determine whether the two words appear in similar contexts

26
WS: Thesaurus Methods (1)
  • Path-length based similarity on hypernym/hyponym hierarchies
  • Information-content word similarity (not all edges are equal)

[Slide figure: probability P(c), information content, Lowest Common Subsumer (LCS)]
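The slide's own formulas did not survive the transcript; the standard definitions behind these notions are:

sim_path(c1, c2)   = 1 / pathlen(c1, c2)
IC(c)              = -log P(c)                       (information content)
sim_Resnik(c1, c2) = IC(LCS(c1, c2)) = -log P(LCS(c1, c2))

where P(c) is the probability that a randomly selected word is an instance of concept c, and LCS(c1, c2) is the Lowest Common Subsumer of the two concepts in the hierarchy.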
27
WS: Thesaurus Methods (2)
  • One of the best performers: Jiang-Conrath distance
  • This is a measure of distance; take its reciprocal for similarity!
  • See also: Extended Lesk
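In the standard formulation (with IC and LCS as on the previous slide):

dist_JC(c1, c2) = IC(c1) + IC(c2) - 2 * IC(LCS(c1, c2))
sim_JC(c1, c2)  = 1 / dist_JC(c1, c2)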

28
WS: Distributional Methods
  • You may not have a thesaurus for the target language
  • Even if you have a thesaurus, there are still problems:
  • Missing domain-specific terms (e.g., technical words)
  • Poor hyponym knowledge for verbs, and nothing for adjectives and adverbs
  • Difficult to compare senses from different hierarchies
  • Solution: extract similarity from corpora
  • Basic idea: two words are similar if they appear in similar contexts

29
WS: Distributional Methods (1)
  • Simple: context feature vector (computed after applying a stop list)
  • Example: f_i = how many times w_i appeared in the neighborhood of w
  • More complex: context feature matrix
  • a_ij = how many times w_i appeared in the neighborhood of w and was related to w by the syntactic relation r_j
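A minimal sketch of the simple case, assuming a tokenized corpus and a chosen list of feature words (window size and stop list are illustrative):

from collections import Counter

def context_vector(sentences, w, feature_words, n=3, stop=frozenset()):
    """f_i = how many times feature word w_i occurs within +/-n of w."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == w:
                window = tokens[max(0, i - n):i] + tokens[i + 1:i + n + 1]
                counts.update(t for t in window if t not in stop)
    return [counts[wi] for wi in feature_words]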
30
WS: Distributional Methods (2)
  • More informative values (referred to as weights or measures of association in the literature):
  • Point-wise Mutual Information
  • t-test
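The standard definitions of these association measures (reconstructed; the slide's equations are not in the transcript):

assoc_PMI(w, f) = log2( P(w, f) / (P(w) * P(f)) )

assoc_t(w, f)   = ( P(w, f) - P(w) * P(f) ) / sqrt( P(w) * P(f) )

where P(w, f) is the probability of observing feature f with word w.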

31
WS: Distributional Methods (3)
  • Similarity between vectors

Desired properties: not sensitive to extreme values; a normalized (weighted) count of overlapping features
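Those two annotations describe the standard weighted overlap measures, which in their usual form are:

sim_Jaccard(v, w) = Σ_i min(v_i, w_i) / Σ_i max(v_i, w_i)
sim_Dice(v, w)    = 2 * Σ_i min(v_i, w_i) / Σ_i (v_i + w_i)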
32
WS: Distributional Methods (4)
  • Best combination overall:
  • t-test for weights
  • Jaccard (or Dice) for vector similarity
33
Today 22/10
  • Word Sense Disambiguation
  • Word Similarity
  • Semantic Role Labeling

34
Semantic Role Labeling
  • Typically framed as a classification problem [Gildea & Jurafsky 2002]
  • Assign a parse tree to the input
  • Find all predicate-bearing words (PropBank, FrameNet)
  • For each predicate:
  • determine, for each syntactic constituent, which role (if any) it plays with respect to the predicate

Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position, and many others
35
Semantic Role Labeling Example
issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, ...
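A minimal sketch of computing the path feature from a parse tree, using nested (label, children) tuples; here "^" marks upward steps and "v" downward ones, standing in for the arrows garbled in the transcript:

def find_path(tree, target):
    """Labels from the root down to the target node, or [] if absent."""
    label, children = tree
    if tree is target:
        return [label]
    for child in children:
        sub = find_path(child, target)
        if sub:
            return [label] + sub
    return []

def path_feature(root, constituent, predicate):
    up = find_path(root, constituent)
    down = find_path(root, predicate)
    k = 0
    while k < min(len(up), len(down)) and up[k] == down[k]:
        k += 1                                # shared prefix ends at the LCA
    ups = list(reversed(up[k - 1:]))          # constituent up to the LCA
    return "^".join(ups) + "v" + "v".join(down[k:])

vbd = ("VBD", [])
np = ("NP", [])
s = ("S", [np, ("VP", [vbd])])
print(path_feature(s, np, vbd))   # NP^SvVPvVBD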
36
Next Time
  • Discourse and Dialog
  • Overview of Chapters 21 and 24