
1
CPSC 503 Computational Linguistics
  • Computational Lexical Semantics
  • Lecture 12
  • Giuseppe Carenini

2
Today 22/10
  • Three well-defined semantic tasks:
  • Word Sense Disambiguation
  • Corpus and Thesaurus
  • Word Similarity
  • Thesaurus and Corpus
  • Semantic Role Labeling

3
WSD example: table (senses 1-6)
  • The noun "table" has 6 senses in WordNet:
  • 1. table, tabular array -- (a set of data ...)
  • 2. table -- (a piece of furniture ...)
  • 3. table -- (a piece of furniture with tableware ...)
  • 4. mesa, table -- (flat tableland ...)
  • 5. table -- (a company of people ...)
  • 6. board, table -- (food or meals ...)

4
WSD methods
  • Machine Learning
  • Supervised
  • Unsupervised
  • Dictionary / Thesaurus (Lesk)

5
Supervised ML Approaches to WSD
6
Training Data Example
((word, context) → sense)_i
  • ..after the soup she had bass with a big salad

7
WordNet Bass music vs. fish
  • The noun "bass" has 8 senses in WordNet:
  • bass -- (the lowest part of the musical range)
  • bass, bass part -- (the lowest part in polyphonic music)
  • bass, basso -- (an adult male singer with ...)
  • sea bass, bass -- (flesh of lean-fleshed saltwater fish of the family Serranidae)
  • freshwater bass, bass -- (any of various North American lean-fleshed ...)
  • bass, bass voice, basso -- (the lowest adult male singing voice)
  • bass -- (the member with the lowest range of a family of musical instruments)
  • bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

8
Representations for Context
  • GOAL: an informative characterization of the window of text surrounding the target word
  • TASK: select relevant linguistic information and encode it as a feature vector

9
Relevant Linguistic Information(1)
  • Collocational info about the words that appear
    in specific positions to the right and left of
    the target word

Typically the words and their POS:
[word in position -n, POS in position -n, ..., word in position +n, POS in position +n]
Assume a window of +/- 2 around the target.
  • Example text (WSJ)
  • An electric guitar and bass player stand off to one side, not really part of the scene, ...

guitar, NN, and, CJC, player, NN, stand, VVB
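A minimal sketch of this extraction in Python, assuming the sentence is already tokenized and POS-tagged (the padding token is an illustrative choice, not from the lecture):

def collocational_features(tagged, i, n=2):
    """tagged: list of (word, POS) pairs; i: index of the target word."""
    feats = []
    for offset in list(range(-n, 0)) + list(range(1, n + 1)):
        j = i + offset
        if 0 <= j < len(tagged):
            feats.extend(tagged[j])            # word and POS at this offset
        else:
            feats.extend(("<PAD>", "<PAD>"))   # hypothetical padding token
    return feats

tagged = [("An", "AT0"), ("electric", "AJ0"), ("guitar", "NN"),
          ("and", "CJC"), ("bass", "NN"), ("player", "NN"),
          ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, 4))
# ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']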
10
Relevant Linguistic Information(2)
  • Co-occurrence info about the words that occur
    anywhere in the window regardless of position
  • Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, ..., guitar, band)

Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), ..., c(guitar), c(band)]
  • Example text (WSJ)
  • An electric guitar and bass player stand off to one side, not really part of the scene, ...

0,0,0,1,0,0,0,0,0,0,1,0
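A minimal sketch of this co-occurrence encoding, assuming a fixed list of the k content words; the slide elides the middle of the list, so the filler words below are hypothetical:

content_words = ["fishing", "big", "sound", "player", "fly", "rod",
                 "pound", "double", "runs", "playing", "guitar", "band"]

def cooccurrence_vector(tokens, target, content_words, n=2):
    """Count each content word within +/-n positions of any target occurrence."""
    counts = [0] * len(content_words)
    index = {w: k for k, w in enumerate(content_words)}
    for i, tok in enumerate(tokens):
        if tok == target:
            window = tokens[max(0, i - n):i] + tokens[i + 1:i + n + 1]
            for w in window:
                if w in index:
                    counts[index[w]] += 1
    return counts

tokens = "an electric guitar and bass player stand off to one side".split()
print(cooccurrence_vector(tokens, "bass", content_words))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]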
11
Training Data Examples
Let's assume bass-music is encoded as 0 and bass-fish as 1.
Labeled training examples (the last value is the sense label):
0,0,0,1,0,0,0,0,0,0,1,0, 0
guitar, NN, and, CJC, player, NN, stand, VVB, 0
a, AT0, sea, CJC, to, PRP, me, PNP, 1
1,0,0,0,0,0,0,0,0,0,0,0, 1
play, VVB, the, AT0, with, PRP, others, PNP, 0
1,0,0,0,0,0,0,0,0,0,0,1, 1
...
  • Inputs to classifiers (unlabeled):

1,1,0,0,0,1,0,0,0,0,0,0
guitar, NN, and, CJC, could, VM0, be, VVI
12
ML for Classifiers
  • Training Data
  • Co-occurrence
  • Collocational
  • Naïve Bayes
  • Decision lists
  • Decision trees
  • Neural nets
  • Support vector machines
  • Nearest neighbor methods

[Diagram: Training Data → Machine Learning → Classifier]
13
Naïve Bayes
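The formula on this slide did not survive the transcript; the standard Naive Bayes decision rule for WSD picks s* = argmax_s P(s) ∏_j P(f_j | s). A minimal sketch with add-one smoothing, assuming training examples shaped like those on the previous slides:

from collections import Counter, defaultdict
import math

def train_nb(examples):
    """examples: list of (feature_list, sense) pairs."""
    sense_counts, feat_counts, vocab = Counter(), defaultdict(Counter), set()
    for feats, sense in examples:
        sense_counts[sense] += 1
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab

def classify_nb(feats, sense_counts, feat_counts, vocab):
    total = sum(sense_counts.values())
    best, best_logp = None, float("-inf")
    for s, c in sense_counts.items():
        logp = math.log(c / total)                        # log P(s)
        denom = sum(feat_counts[s].values()) + len(vocab)
        for f in feats:                                   # + sum of log P(f|s)
            logp += math.log((feat_counts[s][f] + 1) / denom)
        if logp > best_logp:
            best, best_logp = s, logp
    return best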
14
Naïve Bayes Evaluation
  • Experiment comparing different classifiers [Mooney 96]
  • Naïve Bayes and Neural Network achieved the highest performance
  • 73% in assigning one of six senses to "line"
  • Is this good?
  • Simplest baseline: most frequent sense
  • Ceiling: human inter-annotator agreement
  • 75-80% on refined sense distinctions (WordNet)
  • Closer to 90% for binary distinctions

15
Bootstrapping
  • What if you don't have enough data to train a system?

16
Bootstrapping: how to pick the seeds
  • Hand-labeling (Hearst 1991)
  • Likely correct
  • Likely to be prototypical
  • One sense per collocation (Yarowsky 1995)
  • E.g., for "bass", play is strongly associated with the music sense whereas fish is strongly associated with the fish sense
  • One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense
17
Unsupervised Methods [Schütze 98]
Machine Learning (Clustering)
Training Data: (word, vector)_1 ... (word, vector)_n
K clusters c_i
18
Agglomerative Clustering
  • Assign each instance to its own cluster
  • Repeat:
  • merge the two clusters that are most similar
  • until the specified number of clusters is reached
  • If there are too many training instances → random sampling
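A minimal sketch of this procedure, assuming cosine similarity between cluster centroids (one of several possible linkage choices; the lecture does not specify one):

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(cluster):
    return [sum(dim) / len(cluster) for dim in zip(*cluster)]

def agglomerate(vectors, k):
    clusters = [[v] for v in vectors]        # each instance starts alone
    while len(clusters) > k:
        # merge the two most similar clusters (by centroid cosine)
        i, j = max(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: cosine(centroid(clusters[p[0]]),
                                        centroid(clusters[p[1]])))
        clusters[i].extend(clusters.pop(j))
    return clusters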

19
Problems
  • Given these general ML approaches, how many classifiers do I need to perform WSD robustly?
  • One for each ambiguous word in the language
  • How do you decide what set of tags/labels/senses to use for a given word?
  • Depends on the application

20
WSD: Dictionary and Thesaurus Methods
  • Most common: the Lesk method
  • Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood
  • Exclude stop-words

Def: the set of words in the gloss for a sense is called the signature of that sense
21
Lesk Example
Two senses for "channel":
S1 (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rainwater into a series of channels under the street"
S2 (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels"

Target: "... most streets close to the TV station were flooded because the main channel was clogged by heavy rain ..."
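A minimal sketch of the overlap computation, assuming glosses plus example sentences are flattened into strings and using a small hypothetical stop list:

STOP = {"a", "the", "of", "to", "was", "by", "and", "for", "or", "in", "its"}

def signature(text):
    return {w for w in text.lower().split() if w not in STOP}

def lesk(context, senses):
    """senses: dict mapping sense id -> gloss text (with examples)."""
    ctx = signature(context)
    return max(senses, key=lambda s: len(signature(senses[s]) & ctx))

senses = {
    "S1": "a passage for water or other fluids to flow through "
          "the fields were crossed with irrigation channels",
    "S2": "a television station and its programs a satellite TV channel",
}
print(lesk("most streets close to the TV station were flooded because "
           "the main channel was clogged by heavy rain", senses))  # S2

On this sentence the gloss overlap favors S2 (tv, station, channel), even though the flooding context suggests the water sense; richer signatures, as in Corpus Lesk on the next slide, address exactly this weakness.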
22
Corpus Lesk
  • Best performer
  • If a corpus with annotated senses is available:
  • For each sense, add all the words in the sentences containing that sense to the signature for that sense

CORPUS: "... most streets close to the TV station were flooded because the main channel was clogged by heavy rain." ... → these words are added to the signature of the sense annotated in the corpus
23
WSD: More Recent Trends
  • Better ML techniques (e.g., Combining
    Classifiers)
  • Combining ML and Lesk
  • Other Languages
  • Building better/larger corpora

24
Today 22/10
  • Word Sense Disambiguation
  • Word Similarity
  • Semantic Role Labeling

25
Word Similarity
  • Actually a relation between two senses
  • Similarity vs. relatedness:
  • sun vs. moon; mouth vs. food; hot vs. cold
  • Applications?
  • Thesaurus methods: measure distance in online thesauri (e.g., WordNet)
  • Distributional methods: determine whether the two words appear in similar contexts

26
WS: Thesaurus Methods (1)
  • Path-length based similarity on hypernym/hyponym hierarchies
  • Information-content word similarity (not all edges are equal)

[Slide figure: probability P(c), information content, Lowest Common Subsumer (LCS)]
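The slide's own formulas did not survive the transcript; the standard definitions behind these notions are:

sim_path(c1, c2)   = 1 / pathlen(c1, c2)
IC(c)              = -log P(c)                       (information content)
sim_Resnik(c1, c2) = IC(LCS(c1, c2)) = -log P(LCS(c1, c2))

where P(c) is the probability that a randomly selected word is an instance of concept c, and LCS(c1, c2) is the Lowest Common Subsumer of the two concepts in the hierarchy.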
27
WS: Thesaurus Methods (2)
  • One of the best performers: Jiang-Conrath distance
  • This is a measure of distance; take its reciprocal for similarity!
  • See also: Extended Lesk
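In the standard formulation (with IC and LCS as on the previous slide):

dist_JC(c1, c2) = IC(c1) + IC(c2) - 2 * IC(LCS(c1, c2))
sim_JC(c1, c2)  = 1 / dist_JC(c1, c2)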

28
WS: Distributional Methods
  • You may not have a thesaurus for the target language
  • Even if you have a thesaurus, there are still problems:
  • Missing domain-specific terms (e.g., technical words)
  • Poor hyponym knowledge for verbs, and nothing for adjectives and adverbs
  • Difficult to compare senses from different hierarchies
  • Solution: extract similarity from corpora
  • Basic idea: two words are similar if they appear in similar contexts

29
WS: Distributional Methods (1)
  • Simple: context feature vector (computed after applying a stop list)
  • Example: f_i = how many times w_i appeared in the neighborhood of w
  • More complex: context feature matrix
  • a_ij = how many times w_i appeared in the neighborhood of w and was related to w by the syntactic relation r_j
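A minimal sketch of the simple case, assuming a tokenized corpus and a chosen list of feature words (window size and stop list are illustrative):

from collections import Counter

def context_vector(sentences, w, feature_words, n=3, stop=frozenset()):
    """f_i = how many times feature word w_i occurs within +/-n of w."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == w:
                window = tokens[max(0, i - n):i] + tokens[i + 1:i + n + 1]
                counts.update(t for t in window if t not in stop)
    return [counts[wi] for wi in feature_words]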
30
WS: Distributional Methods (2)
  • More informative values (referred to as weights or measures of association in the literature):
  • Point-wise Mutual Information
  • t-test
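The standard definitions of these association measures (reconstructed; the slide's equations are not in the transcript):

assoc_PMI(w, f) = log2( P(w, f) / (P(w) * P(f)) )

assoc_t(w, f)   = ( P(w, f) - P(w) * P(f) ) / sqrt( P(w) * P(f) )

where P(w, f) is the probability of observing feature f with word w.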

31
WS: Distributional Methods (3)
  • Similarity between vectors

Desired properties: not sensitive to extreme values; a normalized (weighted) count of overlapping features
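Those two annotations describe the standard weighted overlap measures, which in their usual form are:

sim_Jaccard(v, w) = Σ_i min(v_i, w_i) / Σ_i max(v_i, w_i)
sim_Dice(v, w)    = 2 * Σ_i min(v_i, w_i) / Σ_i (v_i + w_i)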
32
WS: Distributional Methods (4)
  • Best combination overall:
  • t-test for weights
  • Jaccard (or Dice) for vector similarity
33
Today 22/10
  • Word Sense Disambiguation
  • Word Similarity
  • Semantic Role Labeling

34
Semantic Role Labeling
  • Typically framed as a classification problem [Gildea & Jurafsky 2002]
  • Assign a parse tree to the input
  • Find all predicate-bearing words (PropBank, FrameNet)
  • For each predicate:
  • determine, for each syntactic constituent, which role (if any) it plays with respect to the predicate

Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position, and many others
35
Semantic Role Labeling Example
issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, ...
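A minimal sketch of computing the path feature from a parse tree, using nested (label, children) tuples; here "^" marks upward steps and "v" downward ones, standing in for the arrows garbled in the transcript:

def find_path(tree, target):
    """Labels from the root down to the target node, or [] if absent."""
    label, children = tree
    if tree is target:
        return [label]
    for child in children:
        sub = find_path(child, target)
        if sub:
            return [label] + sub
    return []

def path_feature(root, constituent, predicate):
    up = find_path(root, constituent)
    down = find_path(root, predicate)
    k = 0
    while k < min(len(up), len(down)) and up[k] == down[k]:
        k += 1                                # shared prefix ends at the LCA
    ups = list(reversed(up[k - 1:]))          # constituent up to the LCA
    return "^".join(ups) + "v" + "v".join(down[k:])

vbd = ("VBD", [])
np = ("NP", [])
s = ("S", [np, ("VP", [vbd])])
print(path_feature(s, np, vbd))   # NP^SvVPvVBD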
36
Next Time
  • Discourse and Dialog
  • Overview of Chapters 21 and 24