Aquesta - PowerPoint PPT Presentation

About This Presentation

Title:

Aquesta

Description:

This is a small test (Aquesta és una prova petita) - PowerPoint PPT presentation

Slides: 24

Provided by: llu5

Learn more at: https://cogcomp.seas.upenn.edu

Transcript and Presenter's Notes

Title: Aquesta

1
Word Sense Disambiguation: Another NLP working
problem for learning with constraints
Lluís Màrquez
TALP, LSI, Technical University of Catalonia
UIUC, June 10, 2004
2
Word Sense Disambiguation

The problem
WSD is the problem of assigning the correct
meaning to the words occurring in a text or
discourse (sense tagging).
Example:
He was mad about stars at the age#1 of nine.
About 20,000 years ago the last ice age#2
ended.
age#1: the length of time something (or someone)
has existed
age#2: a historic period
Origins: the beginnings of AI (1960s), around the first
MT models.
Renewed interest with the explosion of
statistical and ML-based approaches to NLP (1990s).

3
Word Sense Disambiguation

Usual approaches
Supervised learning (ML): a multiclass
classification problem, with one classifier per word (word-experts). Results:
about 75% accuracy on subsets of selected
polysemous words; sometimes better (over 90%) on
some specific words.
Unsupervised, knowledge-based: heuristic
rules based on preexisting knowledge sources
(WordNet, MRDs, multilingual aligned corpora,
etc.). Accuracy around 60% (allwords WSD).
Combined approaches: around 65% (allwords WSD).
Supervised methods are better but difficult to
apply to allwords WSD.
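The word-expert idea above can be sketched as one small multiclass classifier per target word. The following is a minimal illustration only: it uses a Naive Bayes model with add-one smoothing (the slides do not say which learner is meant here), and the `WordExpert` class, the `age#1`/`age#2` labels and the four training examples are made up for the sketch.

```python
import math
from collections import Counter, defaultdict


class WordExpert:
    """Naive Bayes classifier for the senses of ONE target word (illustrative)."""

    def __init__(self):
        self.sense_counts = Counter()               # how often each sense was seen
        self.feature_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocabulary = set()

    def train(self, examples):
        # examples: iterable of (context_words, sense_label) pairs
        for context, sense in examples:
            self.sense_counts[sense] += 1
            for word in context:
                self.feature_counts[sense][word] += 1
                self.vocabulary.add(word)

    def predict(self, context):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)  # log prior of the sense
            denom = sum(self.feature_counts[sense].values()) + len(self.vocabulary)
            for word in context:
                if word in self.vocabulary:  # ignore words never seen in training
                    score += math.log((self.feature_counts[sense][word] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy, hand-made training data for the word "age" (not from any real corpus).
training = [
    (["he", "was", "nine", "years", "old"], "age#1"),
    (["at", "the", "of", "nine"], "age#1"),
    (["the", "last", "ice", "ended"], "age#2"),
    (["bronze", "historic", "period"], "age#2"),
]
expert = WordExpert()
expert.train(training)
prediction = expert.predict(["ice", "years", "ago", "ended"])
print(prediction)
```

Because every word needs its own labeled examples and its own model, this design also makes the difficulty of scaling to allwords WSD visible.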

4
WSD ML Approach

Usual Features
Local context patterns (POS, words, lemmas):
the <age> of, <age> CD,
<age> limit, mean <age>
Broad context features: bag of (relevant) words:
"atomic" occurs in the sentence,
"dark" occurs in the sentence
Also syntactic features capturing
predicate-argument relations
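The two feature families above (local context patterns and bag of words) can be sketched as follows. This is an assumed, simplified feature extractor: the function name `extract_features`, the feature-string format and the window of two tokens are illustrative choices, and the POS tags are supplied by hand rather than by a tagger.

```python
def extract_features(tokens, pos_tags, target_index):
    """Return a set of string features for the word at target_index."""
    features = set()
    n = len(tokens)

    # Local context: words and POS tags in a +/-2 window around the target.
    for offset in (-2, -1, 1, 2):
        i = target_index + offset
        if 0 <= i < n:
            features.add(f"word[{offset:+d}]={tokens[i].lower()}")
            features.add(f"pos[{offset:+d}]={pos_tags[i]}")

    # Local pattern: left word, target slot, right word (e.g. "the <target> of").
    left = tokens[target_index - 1].lower() if target_index > 0 else "<s>"
    right = tokens[target_index + 1].lower() if target_index + 1 < n else "</s>"
    features.add(f"pattern={left}_<target>_{right}")

    # Broad context: bag of words over the whole sentence, excluding the target.
    for i, token in enumerate(tokens):
        if i != target_index and token.isalpha():
            features.add(f"bow={token.lower()}")
    return features


tokens = "He was mad about stars at the age of nine".split()
pos_tags = ["PRP", "VBD", "JJ", "IN", "NNS", "IN", "DT", "NN", "IN", "CD"]
feats = extract_features(tokens, pos_tags, tokens.index("age"))
print(sorted(feats))
```

Syntactic (predicate-argument) features are left out of the sketch because they would require a parser.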

5
WSD ML Approach

Main difficulties
Each word is a separate classification problem -> data
scarcity
High granularity of the sense repositories used ->
many classes
Difficulty in capturing the semantic information
present in the context words (sparseness
problem), which are themselves ambiguous (no
interactions between word-classifiers have been
exploited).

6
WSD Difficulties

Example (from WSJ)

The jury further said in term end presentments
that the City Executive Committee, which had
over-all charge of the election, deserves the
praise and thanks of the City of Atlanta for the
manner in which the election was conducted.
7
WSD Difficulties

Example (from WSJ, WordNet senses)

The jury/NN#1 further/RB#2 said/VB#1 in term/NN#2
end/NN#2 presentments/NN#1 that the
City_Executive_Committee#1 , which had/VB#4
over-all/JJ#2 charge/NN#6 of the election/NN#1 ,
deserves/VB#1 the praise/NN#1 and thanks/NN#1
of the City_of_Atlanta#1 for the manner/NN#1 in
which the election/NN#1 was conducted/VB#1 .
8
WSD Difficulties

Example (from WSJ, WordNet senses)

jury/NN#1 further/RB#2 said/VB#1 term/NN#2
end/NN#2 presentments/NN#1 had/VB#4
over-all/JJ#2 charge/NN#6 election/NN#1
deserves/VB#1 praise/NN#1 thanks/NN#1
manner/NN#1 election/NN#1 conducted/VB#1 .
9
WSD Difficulties

Example (from WSJ, WordNet senses)

The jury(2) further(5) said(11) in term(6)
end(15) presentments(3) that the
City_Executive_Committee , which had(21) over-all(2) charge(15)
of the election(2) , deserves the praise(2) and
thanks(2) of the City_of_Atlanta for the
manner(3) in which the election(2) was
conducted(5) .
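Assuming the parenthesized numbers on this slide are the number of candidate senses for each word, a short calculation shows how quickly the joint search space grows when all words are disambiguated together. The variable names below are illustrative.

```python
import math

# (word, number of candidate senses), copied from the slide in sentence order.
sense_counts = [
    ("jury", 2), ("further", 5), ("said", 11), ("term", 6), ("end", 15),
    ("presentments", 3), ("had", 21), ("over-all", 2), ("charge", 15),
    ("election", 2), ("praise", 2), ("thanks", 2), ("manner", 3),
    ("election", 2), ("conducted", 5),
]

# Number of distinct joint sense assignments for the whole sentence.
total_combinations = math.prod(count for _, count in sense_counts)

# Average ambiguity per word.
average_senses = sum(count for _, count in sense_counts) / len(sense_counts)

print(f"{total_combinations:,} joint assignments")
print(f"{average_senses:.1f} senses per word on average")
```

For this single sentence the product is 4,490,640,000 joint assignments, with 6.4 senses per word on average, which is why exhaustive joint search is not an option for full sentences.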
10
WSD ML Approach

Utility?
Useful for IR / IE / Semantic parsing / Knowledge
acquisition?
Accurately resolving WSD is more difficult than
most of the NLP tasks for which it is potentially
helpful.
Evaluation exercises for WSD: Senseval-1/2/3
Senseval-3: co-located with ACL-2004
2 major types of task: lexical sample and
allwords
10 different languages, plus 1 multilingual lexical
sample task
Several new tasks: automatic subcategorization
acquisition, WSD of WordNet glosses, Semantic
Roles (English and Swedish), Logic Forms, etc.

11
Word Sense Disambiguation

Our involvement in Senseval-3
(TALP research group)
As organizers:
Lexical sample tasks for Catalan and Spanish
Coarse sense dictionary developed for the tasks,
with additional information (collocations,
examples, etc.)
Manual annotation of about 300 examples for 50
different words in each language. Context of 3
sentences. Also POS and lemma annotation
Large corpus of about 1,500 unannotated examples
for each word
Best results: 85% accuracy
But nothing new was presented!

12
Word Sense Disambiguation

As participants
English lexical sample task: SVMs, constraint
classification, thorough feature optimization and
parameter tuning, (semantically) rich feature
set. Accuracy: 71.6% - 78.2%, state of the art.
English allwords task: combination (cascaded
weighted voting scheme) of several supervised and
knowledge-based modules. Supervised modules trained on
frequent words of the SemCor corpus. Knowledge-based
modules rely on WordNet and WordNet
Domains. Accuracy: 62.40% (67.4%)
Disambiguation of WordNet glosses (best results)
Five papers are already available. Resources
(datasets and dictionaries) will also be
available after the workshop in July.

13
New Direction

Allwords WSD in context

... The jury/NN#1 further/RB#2 said/VB#1 in
term/NN#2 end/NN#2 presentments/NN#1 that the
City_Executive_Committee#1 , which had/VB#4
over-all/JJ#2 charge/NN#6 of the election/NN#1 ,
deserves/VB#1 the praise/NN#1 and thanks/NN#1
of the City_of_Atlanta#1 for the manner/NN#1 in
which the election/NN#1 was conducted/VB#1 . ...
14
Allwords WSD in context

Example (WSJ, only nouns)

jury term end presentments charge
election praise thanks manner
election
15
Allwords WSD in context

Example (WSJ, only nouns)

jury term end presentments charge
election praise thanks manner
election
One sense per discourse constraint
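The "one sense per discourse" constraint can be sketched as a post-processing step: every occurrence of the same lemma in a discourse is forced to the sense with the highest accumulated classifier confidence. The function name, the input format and the confidence values below are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def one_sense_per_discourse(tagged):
    """tagged: list of (lemma, predicted_sense, confidence) in text order.

    Returns a list of (lemma, sense) where all occurrences of a lemma
    share the sense with the largest summed confidence.
    """
    votes = defaultdict(Counter)
    for lemma, sense, confidence in tagged:
        votes[lemma][sense] += confidence
    winner = {lemma: counts.most_common(1)[0][0] for lemma, counts in votes.items()}
    return [(lemma, winner[lemma]) for lemma, _, _ in tagged]


# Made-up classifier output: the two occurrences of "election" disagree.
tagged = [
    ("jury", "jury#1", 0.8),
    ("election", "election#1", 0.9),
    ("manner", "manner#1", 0.7),
    ("election", "election#2", 0.6),
]
resolved = one_sense_per_discourse(tagged)
print(resolved)
```

Here the second occurrence of "election" is switched to `election#1`, because that sense collected more confidence over the discourse.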
16
Allwords WSD in context

Example (WSJ, only nouns)

Candidate senses (glosses) per noun:
jury: body of citizens... / committee, panel
term: word or expression / limited period of time
end: point in time in which something ends / surface of a three dimensional object
presentments: an accusation of crime... / the act of presenting something
charge: electrical charge / an impetuous rush toward someone... / a pleading / a command to do something
thanks: acknowledgement of appreciation / with the help or owing to
election, praise, manner: no glosses listed
Sense pairs likely to occur together
17
Allwords WSD in context

Example (WSJ, only nouns)

Same candidate senses as on slide 16.
Incompatible sense pairs
18
Allwords WSD in context

Example (WSJ, only nouns)

Same candidate senses as on slide 16.
Lots of irrelevant/unknown sense pairs
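The three views of this example (likely pairs, incompatible pairs, irrelevant pairs) can be combined in a small joint-inference sketch: each word has local classifier scores, likely pairs add a bonus, incompatible pairs add a penalty, and unknown pairs contribute nothing. The exhaustive search, the function name and all scores below are assumptions for illustration; they are not the inference method or values used by the author.

```python
from itertools import product


def best_joint_assignment(local_scores, pair_scores):
    """Pick one sense per word, maximizing local plus pairwise scores.

    local_scores: {word: {sense: score}}
    pair_scores:  {frozenset({sense_a, sense_b}): bonus or penalty}
    Pairs missing from pair_scores are treated as irrelevant (score 0).
    """
    words = list(local_scores)
    best, best_total = None, float("-inf")
    for combo in product(*(local_scores[w] for w in words)):
        total = sum(local_scores[w][s] for w, s in zip(words, combo))
        for i in range(len(combo)):
            for j in range(i + 1, len(combo)):
                total += pair_scores.get(frozenset((combo[i], combo[j])), 0.0)
        if total > best_total:
            best, best_total = combo, total
    return dict(zip(words, best)), best_total


# Made-up local scores for three of the nouns on the slide.
local_scores = {
    "jury": {"jury:body_of_citizens": 0.6, "jury:committee_panel": 0.4},
    "charge": {"charge:electrical": 0.5, "charge:command": 0.3, "charge:pleading": 0.2},
    "presentments": {"presentments:accusation": 0.5, "presentments:act_of_presenting": 0.5},
}
# Made-up soft constraints: positive = likely together, negative = incompatible.
pair_scores = {
    frozenset(("jury:body_of_citizens", "presentments:accusation")): 0.5,
    frozenset(("presentments:accusation", "charge:pleading")): 0.4,
    frozenset(("jury:body_of_citizens", "charge:electrical")): -0.5,
}
assignment, score = best_joint_assignment(local_scores, pair_scores)
print(assignment)
```

With these toy numbers the locally preferred "electrical charge" sense is overruled by the pairwise constraints. The brute-force loop is only workable for a handful of words, so a real system would need a more efficient inference procedure.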
19
Allwords WSD in context

Selectional preferences
Used to produce compatibility constraints between
verbs and subject/object head nouns
For instance, when money#1 appears as object, the
preferred verbs are raise#4 (1.44), take_in#5,
collect#2 (0.45), earn#2, garner#2 (0.23), ...
Needs syntactic information
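A selectional-preference table like the one in this example can be used to rank the candidate senses of a verb once the sense of its object is known. The sketch below assumes each weight on the slide belongs to the verb sense it follows; the senses listed without a weight (take_in, earn) are left out rather than guessed, and the function name and the default score of 0.0 are illustrative choices.

```python
# Preference weights copied from the slide, keyed by object sense.
SELECTIONAL_PREFERENCES = {
    "money#1": {"raise#4": 1.44, "collect#2": 0.45, "garner#2": 0.23},
}


def rank_verb_senses(candidate_senses, object_sense, table=SELECTIONAL_PREFERENCES):
    """Order candidate verb senses by preference for the given object sense.

    Senses absent from the table get a score of 0.0 (no known preference).
    """
    preferences = table.get(object_sense, {})
    return sorted(candidate_senses, key=lambda s: preferences.get(s, 0.0), reverse=True)


# Hypothetical candidate senses for the verb "raise" with object money#1.
ranking = rank_verb_senses(["raise#1", "raise#2", "raise#4"], "money#1")
print(ranking)
```

The object relation itself has to come from a parser, which is the syntactic information this slide asks for.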

20
Allwords WSD in context

A very good starting point
Funding: MEANING, European research project
Resources: MCR (Multilingual Central Repository), including WordNets from different
languages, ontologies (Domains, SUMO,
TopOntology, SemFile) linked to WordNet synsets,
selectional preferences, etc.
Tools: the Senseval-3 allwords WSD system and all
its components
People: Lluís Villarejo (PhD student at TALP)
ML approach: Inference and Learning with Linear
Constraints

21
Allwords WSD in context

Potential problems
Computational requirements
Soft constraints
Lots of irrelevant sense pairs
Can compatibility constraints be reliably
estimated from existing labeled corpora?
We have to codify only the most relevant
constraints between pairs of related words at a
coarse level of granularity (very general
semantic class labels)
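One possible way to estimate coarse compatibility constraints from a labeled corpus is pointwise mutual information (PMI) between semantic class labels that co-occur in the same sentence. This is only an assumed estimator, not necessarily the one the author has in mind; the class labels and the four-sentence corpus are made up.

```python
import math
from collections import Counter
from itertools import combinations


def class_pair_pmi(sentences):
    """sentences: list of lists of coarse semantic class labels, one list per sentence.

    Returns {(class_a, class_b): PMI}, with each pair in alphabetical order.
    Positive PMI suggests a compatible pair, negative PMI an unlikely one.
    """
    class_counts = Counter()
    pair_counts = Counter()
    for classes in sentences:
        unique = sorted(set(classes))  # count each class once per sentence
        class_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    n = len(sentences)
    return {
        (a, b): math.log((joint * n) / (class_counts[a] * class_counts[b]))
        for (a, b), joint in pair_counts.items()
    }


# Toy sense-tagged corpus reduced to coarse class labels (made up).
corpus = [
    ["GROUP", "COMMUNICATION", "ACT"],
    ["GROUP", "COMMUNICATION"],
    ["SUBSTANCE", "PHENOMENON"],
    ["GROUP", "PHENOMENON"],
]
pmi = class_pair_pmi(corpus)
print(pmi[("COMMUNICATION", "GROUP")], pmi[("GROUP", "PHENOMENON")])
```

Working at the level of a few general classes keeps the number of pairs small, which speaks to both the data-sparseness and the irrelevant-pair problems listed above; keeping the scores real-valued rather than thresholding them would make the constraints soft.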

22
Allwords WSD in context

Current status
Semantic-class attributes of the context words
have already been incorporated as features for
capturing interactions: gain of 1-2 points (but
context words are very ambiguous)
Training/testing the system assuming that we know
the actual senses of context words (upper bounds)
(Near) future:
Inference on top of the classifiers' output
Learning with global feedback (coming from
inference)

23
Thanks again for your attention!!!
