Title: CPSC 503 Computational Linguistics
1 CPSC 503: Computational Linguistics
- Computational Lexical Semantics
- Lecture 12
- Giuseppe Carenini
2 Today 22/10
- Three well-defined semantic tasks
- Word Sense Disambiguation
- Corpus and Thesaurus
- Word Similarity
- Thesaurus and Corpus
- Semantic Role Labeling
3 WSD example: "table" (senses 1-6)
- The noun "table" has 6 senses in WordNet:
  1. table, tabular array -- (a set of data)
  2. table -- (a piece of furniture)
  3. table -- (a piece of furniture with tableware)
  4. mesa, table -- (flat tableland)
  5. table -- (a company of people)
  6. board, table -- (food or meals)
4 WSD methods
- Machine Learning
- Supervised
- Unsupervised
- Dictionary / Thesaurus (Lesk)
5 Supervised ML Approaches to WSD
6 Training Data Example
- ((word, context) → sense)_i
- "... after the soup she had bass with a big salad ..."
7 WordNet: "bass", music vs. fish
- The noun "bass" has 8 senses in WordNet:
  1. bass -- (the lowest part of the musical range)
  2. bass, bass part -- (the lowest part in polyphonic music)
  3. bass, basso -- (an adult male singer with ...)
  4. sea bass, bass -- (flesh of lean-fleshed saltwater fish of the family Serranidae)
  5. freshwater bass, bass -- (any of various North American lean-fleshed ...)
  6. bass, bass voice, basso -- (the lowest adult male singing voice)
  7. bass -- (the member with the lowest range of a family of musical instruments)
  8. bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
8 Representations for Context
- GOAL: an informative characterization of the window of text surrounding the target word
- TASK: select the relevant linguistic information and encode it as a feature vector
9 Relevant Linguistic Information (1)
- Collocational: info about the words that appear in specific positions to the right and left of the target word
- Typically the words themselves and their POS
- word in position -n, part-of-speech in position -n, ..., word in position +n, part-of-speech in position +n
- Assume a window of +/- 2 around the target
- Example text (WSJ)
- "An electric guitar and bass player stand off to one side, not really part of the scene, ..."
- Feature vector: guitar, NN, and, CJC, player, NN, stand, VVB (a small extraction sketch follows below)
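A minimal sketch of how such a collocational feature vector could be extracted; the tokenization, the CLAWS-style POS tags, and the padding token are illustrative assumptions:

```python
# Minimal sketch: collocational features for a target word.
# Assumes the sentence is already tokenized and POS-tagged.

def collocational_features(tagged_tokens, target_index, window=2):
    """Return [word_-n, pos_-n, ..., word_+n, pos_+n] around the target."""
    features = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        i = target_index + offset
        if 0 <= i < len(tagged_tokens):
            word, pos = tagged_tokens[i]
        else:
            word, pos = "<PAD>", "<PAD>"   # positions outside the sentence
        features.extend([word, pos])
    return features

tagged = [("an", "AT0"), ("electric", "AJ0"), ("guitar", "NN"),
          ("and", "CJC"), ("bass", "NN"), ("player", "NN"),
          ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, target_index=4))
# -> ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```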
10 Relevant Linguistic Information (2)
- Co-occurrence: info about the words that occur anywhere in the window, regardless of position
- Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, guitar, band, ...)
- Vector for one case: c(fishing), c(big), c(sound), c(player), c(fly), ..., c(guitar), c(band)
- Example text (WSJ)
- "An electric guitar and bass player stand off to one side, not really part of the scene, ..."
- Feature vector: 0,0,0,1,0,0,0,0,0,0,1,0 (sketched below)
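A minimal sketch of the co-occurrence representation; only the content words named on the slide are used here (the slide's 12-dimensional vector implies a longer list), and the window size is an assumption:

```python
# Minimal sketch: a bag-of-words co-occurrence vector over a fixed
# list of content words chosen for the target word.

CONTENT_WORDS = ["fishing", "big", "sound", "player", "fly", "guitar", "band"]

def cooccurrence_vector(tokens, target_index, window=2, vocab=CONTENT_WORDS):
    """Count how often each vocabulary word occurs in the window around the target."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    context = tokens[lo:target_index] + tokens[target_index + 1:hi]
    return [context.count(w) for w in vocab]

tokens = ["an", "electric", "guitar", "and", "bass", "player", "stand", "off"]
print(cooccurrence_vector(tokens, target_index=4))
# -> [0, 0, 0, 1, 0, 1, 0]   (player and guitar fall in the +/-2 window)
```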
11 Training Data Examples
- Let's assume bass-music is encoded as 0 and bass-fish as 1 (the sense label is the last value of each labeled example)
- 0,0,0,1,0,0,0,0,0,0,1,0, 0
- guitar, NN, and, CJC, player, NN, stand, VVB, 0
- a, AT0, sea, CJC, to, PRP, me, PNP, 1
- 1,0,0,0,0,0,0,0,0,0,0,0, 1
- play, VVB, the, AT0, with, PRP, others, PNP, 0
- 1,0,0,0,0,0,0,0,0,0,0,1, 1
- ...
- 1,1,0,0,0,1,0,0,0,0,0,0
- guitar, NN, and, CJC, could, VM0, be, VVI
12 ML for Classifiers
- Training Data
- Co-occurrence
- Collocational
- Naïve Bayes
- Decision lists
- Decision trees
- Neural nets
- Support vector machines
- Nearest neighbor methods
(Diagram: training data feeds a machine learning algorithm, which produces a classifier)
13 Naïve Bayes
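The standard Naïve Bayes decision rule for WSD (written here in its usual textbook form) chooses the sense s of word w that maximizes the sense prior times the product of the feature likelihoods, with both probabilities estimated (with smoothing) from sense-labeled training data:

\[ \hat{s} = \operatorname*{argmax}_{s \in S(w)} \; P(s) \prod_{j=1}^{n} P(f_j \mid s) \]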
14 Naïve Bayes Evaluation
- Experiment comparing different classifiers (Mooney 96)
- Naïve Bayes and a Neural Network achieved the highest performance
- 73% accuracy in assigning one of six senses to "line"
- Is this good?
- Simplest baseline: most frequent sense
- Ceiling: human inter-annotator agreement
- 75-80% on refined sense distinctions (WordNet)
- Closer to 90% for binary distinctions
15 Bootstrapping
- What if you don't have enough data to train a system?
16 Bootstrapping: how to pick the seeds
- Hand-labeling (Hearst 1991)
- Likely correct
- Likely to be prototypical
- One sense per collocation (Yarowsky 1995)
- E.g., "play" is strongly associated with the music sense of bass, whereas "fish" is strongly associated with the fish sense
- One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense
17 Unsupervised Methods (Schütze 98)
- Training data: (word, vector)_1 ... (word, vector)_n
- Machine Learning (Clustering)
- Output: K clusters c_i
18 Agglomerative Clustering
- Assign each instance to its own cluster
- Repeat
- Merge the two clusters that are most similar
- Until the specified number of clusters is reached (sketched below)
- If there are too many training instances: random sampling
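A minimal sketch of this loop; the cosine similarity between cluster centroids is an illustrative choice, since the slide does not fix a similarity function:

```python
# Minimal sketch of agglomerative clustering: repeatedly merge the two
# most similar clusters (here compared by centroid cosine similarity).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def agglomerative(instances, k):
    """Start with one cluster per instance; merge the most similar pair until k remain."""
    clusters = [[v] for v in instances]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Each merge step scans all pairs of clusters, which is why random sampling is suggested when there are too many training instances.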
19 Problems
- Given these general ML approaches, how many classifiers do I need to perform WSD robustly? One for each ambiguous word in the language
- How do you decide what set of tags/labels/senses to use for a given word? It depends on the application
20 WSD: Dictionary and Thesaurus Methods
- Most common: the Lesk method
- Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood
- Exclude stop-words
- Def: the set of words in the gloss for a sense is called its signature
21Lesk Example
Two SENSES for channel S1 (n) channel (a passage
for water (or other fluids) to flow through) "the
fields were crossed with irrigation channels"
"gutters carried off the rainwater into a series
of channels under the street" S2 (n) channel,
television channel, TV channel (a television
station and its programs) "a satellite TV
channel" "surfing through the channels" "they
offer more than one hundred channels" ..
most streets closed to the TV station were
flooded because the main channel was clogged by
heavy rain .
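A minimal sketch of simplified Lesk on this example; the signatures (gloss plus example words), the sense names, and the stop list are abbreviated illustrative assumptions:

```python
# Minimal sketch of simplified Lesk: pick the sense whose signature
# (gloss + example words) overlaps most with the target's context.
STOP = {"a", "the", "of", "to", "was", "by", "for", "or", "and", "its",
        "were", "because", "most", "through", "with", "under", "off"}

signatures = {
    "channel#water": "a passage for water or other fluids to flow through "
                     "the fields were crossed with irrigation channels "
                     "gutters carried off the rainwater into a series of "
                     "channels under the street",
    "channel#tv": "a television station and its programs a satellite TV "
                  "channel surfing through the channels they offer more "
                  "than one hundred channels",
}

def simplified_lesk(context, signatures, stop=STOP):
    context_words = {w.lower() for w in context.split()} - stop
    best_sense, best_overlap = None, -1
    for sense, signature in signatures.items():
        sig_words = {w.lower() for w in signature.split()} - stop
        overlap = len(context_words & sig_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap

context = ("most streets close to the TV station were flooded "
           "because the main channel was clogged by heavy rain")
print(simplified_lesk(context, signatures))
```

With these abbreviated signatures, plain gloss overlap actually favors the TV sense here (overlap 3 vs. 0), since "TV", "station", and "channel" all match; this is the kind of case the corpus variant on the next slide helps with.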
22 Corpus Lesk
- Best performer
- Requires a corpus with annotated senses
- For each sense, add all the words in the corpus sentences containing that sense to the signature for that sense
- CORPUS: "... most streets close to the TV station were flooded because the main channel was clogged by heavy rain. ..."
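Continuing the sketch above, corpus Lesk simply grows each signature with the words of sense-labeled corpus sentences; the water-sense label on the corpus sentence below is an illustrative assumption:

```python
# Minimal continuation of the Lesk sketch: corpus Lesk adds the words of
# sense-labeled corpus sentences to the corresponding signature.
corpus = [
    # (sentence, sense label) -- the label here is an illustrative assumption
    ("most streets close to the TV station were flooded because the "
     "main channel was clogged by heavy rain", "channel#water"),
]
for sentence, sense in corpus:
    signatures[sense] += " " + sentence
# Words such as "flooded", "clogged", "heavy", and "rain" now count toward
# the water sense when they appear around "channel" in new contexts.
```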
23 WSD: More Recent Trends
- Better ML techniques (e.g., combining classifiers)
- Combining ML and Lesk
- Other languages
- Building better/larger corpora
24 Today 22/10
- Word Sense Disambiguation
- Word Similarity
- Semantic Role Labeling
25 Word Similarity
- Actually a relation between two senses
- Similarity vs. relatedness
- sun vs. moon; mouth vs. food; hot vs. cold
- Applications?
- Thesaurus methods: measure distance in online thesauri (e.g., WordNet)
- Distributional methods: determine whether the two words appear in similar contexts
26 WS: Thesaurus Methods (1)
- Path-length based similarity on hypernym/hyponym hierarchies
- Information-content word similarity (not all edges are equal)
- Key notions: probability of a concept, information content, lowest common subsumer (formulas below)
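In the standard information-content formulation (e.g., Resnik's measure, to which these notions belong), P(c) is the probability that a randomly selected word in a corpus is an instance of concept c, information content is its negative log, and similarity is the information content of the lowest common subsumer (LCS) of the two concepts:

\[ IC(c) = -\log P(c), \qquad \mathrm{sim}_{\mathrm{Resnik}}(c_1, c_2) = -\log P\big(\mathrm{LCS}(c_1, c_2)\big) \]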
27 WS: Thesaurus Methods (2)
- One of the best performers: the Jiang-Conrath distance
- This is a measure of distance; take the reciprocal to get a similarity!
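In terms of the same quantities, the Jiang-Conrath distance is usually written as

\[ \mathrm{dist}_{\mathrm{JC}}(c_1, c_2) = 2\log P\big(\mathrm{LCS}(c_1, c_2)\big) - \big(\log P(c_1) + \log P(c_2)\big) \]

which equals IC(c_1) + IC(c_2) - 2\,IC(LCS(c_1, c_2)); its reciprocal serves as the similarity.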
28 WS: Distributional Methods
- You may not have a thesaurus for the target language
- Even if you have a thesaurus:
- Missing domain-specific (e.g., technical) words
- Poor hyponym knowledge for verbs, and nothing for adjectives and adverbs
- Difficult to compare senses from different hierarchies
- Solution: extract similarity from corpora
- Basic idea: two words are similar if they appear in similar contexts
29 WS: Distributional Methods (1)
- Simple: context feature vector
- Use a stop list
- Example: f_i = how many times word w_i appeared in the neighborhood of the target word w (sketched below)
- More complex: context feature matrix
- a_ij = how many times w_i appeared in the neighborhood of w and was related to w by the syntactic relation r_j
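A minimal sketch of building such context count vectors from a corpus; the window size and stop list are illustrative assumptions:

```python
# Minimal sketch: simple distributional context vectors.
# f_i counts how many times vocabulary word w_i appears within a fixed
# window of the target word, summed over the whole (toy) corpus.
from collections import Counter

STOP = {"a", "the", "of", "and", "to", "in"}

def context_vector(corpus_tokens, target, vocab, window=2):
    counts = Counter()
    for idx, tok in enumerate(corpus_tokens):
        if tok != target:
            continue
        lo, hi = max(0, idx - window), min(len(corpus_tokens), idx + window + 1)
        for neighbor in corpus_tokens[lo:idx] + corpus_tokens[idx + 1:hi]:
            if neighbor not in STOP:
                counts[neighbor] += 1
    return [counts[w] for w in vocab]
```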
30 WS: Distributional Methods (2)
- More informative values (referred to as weights or measures of association in the literature)
- Pointwise Mutual Information (defined below)
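The pointwise mutual information between a target word w and a context feature f is standardly defined as

\[ \mathrm{assoc}_{\mathrm{PMI}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(f)} \]

where P(w, f) is the probability of w occurring with feature f, and P(w) and P(f) are their individual probabilities.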
31 WS: Distributional Methods (3)
- Similarity between vectors
Not sensitive to extreme values
Normalized (weighted) number of overlapping
features
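Two measures with these properties, and the ones named on the next slide, are the weighted Jaccard and Dice similarities, in their standard form:

\[ \mathrm{sim}_{\mathrm{Jaccard}}(\vec v, \vec w) = \frac{\sum_i \min(v_i, w_i)}{\sum_i \max(v_i, w_i)}, \qquad \mathrm{sim}_{\mathrm{Dice}}(\vec v, \vec w) = \frac{2\sum_i \min(v_i, w_i)}{\sum_i (v_i + w_i)} \]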
32 WS: Distributional Methods (4)
- Best combination overall
- t-test for weights
- Jaccard (or Dice) for vector similarity
33 Today 22/10
- Word Sense Disambiguation
- Word Similarity
- Semantic Role Labeling
34 Semantic Role Labeling
- Typically framed as a classification problem (Gildea and Jurafsky 2002)
- Assign a parse tree to the input
- Find all predicate-bearing words (PropBank, FrameNet)
- For each predicate:
- determine for each syntactic constituent which role (if any) it plays with respect to the predicate
- Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position, and many others
35 Semantic Role Labeling Example
- Feature vector for one constituent: issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, ... (a representation sketch follows below)
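A minimal sketch of how this constituent might be handed to a role classifier; the dictionary keys are illustrative names, not from the slide:

```python
# Minimal sketch: the example constituent as a feature dict for role
# classification (key names are illustrative).
constituent_features = {
    "predicate": "issued",        # the predicate-bearing word
    "phrase_type": "NP",          # syntactic category of the constituent
    "head_word": "Examiner",
    "head_pos": "NNP",
    "path": "NP↑S↓VP↓VBD",        # parse-tree path from constituent to predicate
    "voice": "active",
    "position": "before",         # constituent precedes the predicate
}
# A trained classifier maps such a feature vector to a semantic role
# (e.g., an agent-like role here) or to "none".
```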
36 Next Time
- Discourse and Dialog
- Overview of Chapters 21 and 24