Aquesta - PowerPoint PPT Presentation

About This Presentation
Title:

Aquesta

Description:

This is a small test – PowerPoint PPT presentation

Slides: 24
Provided by: llu5

Transcript and Presenter's Notes

Title: Aquesta


1
Word Sense Disambiguation: Another NLP working
problem for learning with constraints
Lluís Màrquez, TALP, LSI, Technical University of Catalonia
UIUC, June 10, 2004
2
Word Sense Disambiguation
  • The problem:
  • WSD is the problem of assigning the correct
    meaning to the words occurring in a text or
    discourse (sense tagging)
  • Example:
  • He was mad about stars at the age#1 of nine
  • About 20,000 years ago the last ice age#2
    ended
  • age#1: the length of time something (or someone)
    has existed
  • age#2: a historic period
  • Origin: in the beginnings of AI (1960s), around
    the first MT models
  • Renewed interest with the explosion of
    statistical and ML-based approaches to NLP (1990s)

3
Word Sense Disambiguation
  • Usual approaches
  • Supervised learning (ML): multiclass
    classification problem, word-experts. Results:
    about 75% accuracy on subsets of selected
    polysemous words. Sometimes better (over 90%) on
    some specific words
  • Unsupervised, knowledge-based: heuristic
    rules based on preexisting knowledge sources
    (WordNet, MRDs, multilingual aligned corpora,
    etc.). Accuracy around 60% (allwords WSD)
  • Combined approaches: 65% (allwords WSD)
  • Supervised methods are better but difficult to
    apply to allwords WSD

4
WSD ML Approach
  • Usual Features
  • Local context patterns (POS, words, lemmas):
  • the <age> of, <age> CD
  • <age> limit, mean <age>
  • Broad context features: bag of (relevant) words
  • "atomic" occurs in the sentence
  • "dark" occurs in the sentence
  • Also syntactic features capturing
    predicate-argument relations
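
A minimal sketch of how the local and broad context features listed above could be extracted for one target word. The function name `extract_features`, the feature-string format, the one-token window, and the set of "relevant" words are all assumptions made for illustration, not the feature set actually used by the author.

```python
from typing import List, Set, Tuple

def extract_features(tokens: List[Tuple[str, str]], target: int,
                     relevant_words: Set[str], window: int = 1) -> Set[str]:
    """Return binary features for the token at index `target`.

    tokens: (word, POS) pairs for one sentence (hypothetical input format).
    """
    feats: Set[str] = set()
    # Local context: words and POS tags at fixed offsets around the target.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        i = target + offset
        if 0 <= i < len(tokens):
            word, pos = tokens[i]
            feats.add(f"word[{offset:+d}]={word.lower()}")
            feats.add(f"pos[{offset:+d}]={pos}")
    # Local pattern such as "the <age> of", with the target abstracted away.
    if 0 < target < len(tokens) - 1:
        left = tokens[target - 1][0].lower()
        right = tokens[target + 1][0].lower()
        feats.add(f"pattern={left} <target> {right}")
    # Broad context: bag of (relevant) words anywhere else in the sentence.
    for i, (word, _) in enumerate(tokens):
        if i != target and word.lower() in relevant_words:
            feats.add(f"bow={word.lower()}")
    return feats

sentence = [("He", "PRP"), ("was", "VBD"), ("mad", "JJ"), ("about", "IN"),
            ("stars", "NNS"), ("at", "IN"), ("the", "DT"), ("age", "NN"),
            ("of", "IN"), ("nine", "CD")]
features = extract_features(sentence, target=7, relevant_words={"stars", "ice"})
```

Each feature is a plain string, so the resulting set can be fed to any sparse linear classifier after hashing or indexing.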

5
WSD ML Approach
  • Main difficulties
  • Each word is a classification problem => data
    scarcity
  • High granularity of the sense repositories used =>
    many classes
  • Difficulty in capturing the semantic information
    present in the context words (sparseness
    problem), which are themselves ambiguous (no
    interactions between word-classifiers have been
    exploited).

6
WSD Difficulties
  • Example (from WSJ)

The jury further said in term end presentments
that the City Executive Committee, which had
over-all charge of the election, deserves the
praise and thanks of the City of Atlanta for the
manner in which the election was conducted.
7
WSD Difficulties
  • Example (from WSJ, WordNet senses)

The jury#NN#1 further#RB#2 said#VB#1 in term#NN#2
end#NN#2 presentments#NN#1 that the
City_Executive_Committee#1 , which had#VB#4
over-all#JJ#2 charge#NN#6 of the election#NN#1 ,
deserves#VB#1 the praise#NN#1 and thanks#NN#1
of the City_of_Atlanta#1 for the manner#NN#1 in
which the election#NN#1 was conducted#VB#1 .
8
WSD Difficulties
  • Example (from WSJ, WordNet senses)

jury#NN#1 further#RB#2 said#VB#1 term#NN#2
end#NN#2 presentments#NN#1 had#VB#4
over-all#JJ#2 charge#NN#6 election#NN#1
deserves#VB#1 praise#NN#1 thanks#NN#1
manner#NN#1 election#NN#1 conducted#VB#1 .
9
WSD Difficulties
  • Example (from WSJ, WordNet senses)

The jury(2) further(5) said(11) in term(6)
end(15) presentments(3) that the City_Executive_
Committee , which had(21) over-all(2) charge(15)
of the election(2) , deserves the praise(2) and
thanks(2) of the City_of_Atlanta for the
manner(3) in which the election(2) was
conducted(5) .
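
To give a rough sense of the search space, the sketch below multiplies the numbers in parentheses above, reading them as the number of candidate senses per word (an assumption; the slide does not label them). The variable names are illustrative only.

```python
from math import prod

# (word, number of candidate senses), read off the annotated sentence above.
candidates = [
    ("jury", 2), ("further", 5), ("said", 11), ("term", 6), ("end", 15),
    ("presentments", 3), ("had", 21), ("over-all", 2), ("charge", 15),
    ("election", 2), ("praise", 2), ("thanks", 2), ("manner", 3),
    ("election", 2), ("conducted", 5),
]

# Number of distinct joint sense assignments if every word is free to vary.
joint_assignments = prod(n for _, n in candidates)

# Average number of candidate senses per annotated word.
average_polysemy = sum(n for _, n in candidates) / len(candidates)

print(joint_assignments)   # 4490640000
print(average_polysemy)    # 6.4
```

Under this reading, a single sentence already admits about 4.5 billion joint assignments, which is why treating each word as an independent classifier is the usual simplification.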
10
WSD ML Approach
  • Utility?
  • Useful for IR / IE / Semantic parsing / Knowledge
    acquisition?
  • Accurately resolving WSD is more difficult than
    most of the NLP tasks for which it is potentially
    helpful
  • Evaluation exercises for WSD: Senseval-1/2/3
  • Senseval-3 collocated with ACL-2004
  • 2 major types of task: lexical sample and
    allwords
  • 10 different languages and 1 multilingual lexical
    sample task
  • Several new tasks: automatic subcategorization
    acquisition, WSD of WordNet glosses, Semantic
    Roles (English and Swedish), Logic Forms, etc.

11
Word Sense Disambiguation
  • Our involvement in Senseval-3
    (TALP research group)
  • As organizers:
  • Lexical sample tasks for Catalan and Spanish
  • Coarse sense dictionary developed for the tasks
    with additional information (collocations,
    examples, etc.)
  • Manual annotation of about 300 examples for 50
    different words in each language. Context of 3
    sentences. Also POS and lemma annotation
  • Large corpus of about 1,500 unannotated examples
    for each word
  • Best results: 85% accuracy
  • But nothing new was presented!!!

12
Word Sense Disambiguation
  • As participants
  • English lexical sample task: SVMs, constraint
    classification, thorough feature optimization and
    parameter tuning, (semantically) rich feature
    set. Accuracy: 71.6% - 78.2%, state-of-the-art.
  • English allwords task: combination (cascade
    weighted voting scheme) of several supervised and
    knowledge-based modules. Supervised modules trained
    on frequent words of the SemCor corpus. Knowledge-
    based modules rely on WordNet and WordNet
    Domains. Accuracy: 62.40% (67.4%)
  • Disambiguation of WordNet glosses (best results)
  • Five papers already available. Resources
    (datasets and dictionaries) will also be
    available after the workshop in July.

13
New Direction
  • Allwords WSD in context

... The jury#NN#1 further#RB#2 said#VB#1 in
term#NN#2 end#NN#2 presentments#NN#1 that the
City_Executive_Committee#1 , which had#VB#4
over-all#JJ#2 charge#NN#6 of the election#NN#1 ,
deserves#VB#1 the praise#NN#1 and thanks#NN#1
of the City_of_Atlanta#1 for the manner#NN#1 in
which the election#NN#1 was conducted#VB#1 . ...
14
Allwords WSD in context
  • Example (WSJ, only nouns)

jury term end presentments charge
election praise thanks manner
election
15
Allwords WSD in context
  • Example (WSJ, only nouns)

jury term end presentments charge
election praise thanks manner
election
One sense per discourse constraint
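
One simple way to apply a one-sense-per-discourse constraint is to pool the per-occurrence scores of each lemma and give every occurrence the pooled winner. The sketch below assumes each occurrence comes with a score per candidate sense; the function name, the sense labels, and the scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def one_sense_per_discourse(
        occurrences: List[Tuple[str, Dict[str, float]]]) -> List[str]:
    """occurrences: (lemma, {sense: score}) per token; returns one sense per token."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    # Pool the evidence from all occurrences of the same lemma.
    for lemma, scores in occurrences:
        for sense, score in scores.items():
            totals[lemma][sense] += score
    # Pick a single best sense per lemma and reuse it for every occurrence.
    best = {lemma: max(s, key=s.get) for lemma, s in totals.items()}
    return [best[lemma] for lemma, _ in occurrences]

# Hypothetical classifier scores: the two "election" tokens disagree locally.
occurrences = [
    ("election", {"election#1": 0.70, "election#2": 0.30}),
    ("manner",   {"manner#1": 0.60, "manner#2": 0.40}),
    ("election", {"election#1": 0.45, "election#2": 0.55}),
]
tagged = one_sense_per_discourse(occurrences)
print(tagged)  # ['election#1', 'manner#1', 'election#1']
```

This treats the constraint as hard; a soft version would add a penalty for disagreeing occurrences instead of forcing agreement.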
16
Allwords WSD in context
  • Example (WSJ, only nouns)

jury: body of citizens... / committee, panel
term: word or expression / limited period of time
end: point in time in which something ends / surface of a three dimensional object
presentments: an accusation of crime... / the act of presenting something
charge: electrical charge / an impetuous rush toward someone... / a pleading / a command to do something
election
praise
thanks: acknowledgement of appreciation / with the help or owing to
manner
Sense pairs likely to occur together
17
Allwords WSD in context
  • Example (WSJ, only nouns)

jury: body of citizens... / committee, panel
term: word or expression / limited period of time
end: point in time in which something ends / surface of a three dimensional object
presentments: an accusation of crime... / the act of presenting something
charge: electrical charge / an impetuous rush toward someone... / a pleading / a command to do something
election
praise
thanks: acknowledgement of appreciation / with the help or owing to
manner
Incompatible sense pairs
18
Allwords WSD in context
  • Example (WSJ, only nouns)

jury: body of citizens... / committee, panel
term: word or expression / limited period of time
end: point in time in which something ends / surface of a three dimensional object
presentments: an accusation of crime... / the act of presenting something
charge: electrical charge / an impetuous rush toward someone... / a pleading / a command to do something
election
praise
thanks: acknowledgement of appreciation / with the help or owing to
manner
Lots of irrelevant/unknown sense pairs
19
Allwords WSD in context
  • Selectional preferences
  • To produce compatibility constraints between
    verbs and subject/object head nouns
  • For instance, when money#1 appears as object, the
    preferred verbs are raise#4 (1.44), take_in#5,
    collect#2 (0.45), earn#2, garner#2 (0.23), ...
  • Requires syntactic information
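
A small sketch of how such preferences might be used to rank the candidate senses of a verb given the sense of its object. It assumes that each score on the slide applies to the group of verb senses listed just before it; the table layout, the function name, and the candidate `raise#1` are hypothetical.

```python
from typing import Dict, List, Tuple

# (verb sense, object noun sense) -> preference strength.
# Scores are copied from the slide; grouping them this way is an assumption.
PREFS: Dict[Tuple[str, str], float] = {
    ("raise#4", "money#1"): 1.44,
    ("take_in#5", "money#1"): 0.45,
    ("collect#2", "money#1"): 0.45,
    ("earn#2", "money#1"): 0.23,
    ("garner#2", "money#1"): 0.23,
}

def rank_verb_senses(candidates: List[str], object_sense: str) -> List[str]:
    """Order candidate verb senses by preference for the given object sense."""
    # Unknown pairs get a neutral score of 0.0; sorted() is stable, so
    # candidates with equal scores keep their original order.
    return sorted(candidates,
                  key=lambda v: -PREFS.get((v, object_sense), 0.0))

ranking = rank_verb_senses(["raise#1", "raise#4"], "money#1")
print(ranking)  # ['raise#4', 'raise#1']
```

Finding the object head noun in the first place is where the syntactic information comes in; the sketch takes that pairing as given.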

20
Allwords WSD in context
  • A very good starting point
  • Funding: MEANING, European research project
  • Resources: MCR, including WordNets from different
    languages, ontologies (Domains, SUMO,
    TopOntology, SemFile) linked to WordNet synsets,
    selectional preferences, etc.
  • Tools: the Senseval-3 allwords WSD system and all
    its components
  • People: Lluís Villarejo (PhD student at TALP)
  • ML approach: Inference Learning with Linear
    Constraints

21
Allwords WSD in context
  • Potential problems
  • Computational requirements
  • Soft constraints
  • Lots of irrelevant sense pairs
  • Can compatibility constraints be reliably
    estimated from existing labeled corpora?
  • We have to codify only the most relevant
    constraints between pairs of related words at a
    coarse level of granularity (very general
    semantic class labels)

22
Allwords WSD in context
  • Current status
  • Semantic-class attributes of the context words
    have already been incorporated as features for
    capturing interactions: gain of 1-2 points (but
    context words are very ambiguous)
  • Training/testing the system assuming that we know
    the actual senses of context words (upper bounds)
  • (near) Future
  • Inference on top of the classifiers' output
  • Learning with global feedback (coming from
    inference)
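
To make "inference on top of the classifiers' output" concrete, here is a toy sketch that combines local classifier scores with soft pairwise compatibility weights and picks the best joint assignment. It uses brute-force enumeration rather than the linear-constraint formulation named in the talk, so it only works for a handful of words; the function name, labels, scores, and weights are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

def infer(local_scores: List[Dict[str, float]],
          compat: Dict[Tuple[str, str], float]) -> Tuple[str, ...]:
    """Return the joint labelling with the highest local + pairwise score."""
    best, best_score = (), float("-inf")
    # Enumerate every combination of one label per word (exponential cost).
    for assignment in product(*(s.keys() for s in local_scores)):
        score = sum(local_scores[i][lab] for i, lab in enumerate(assignment))
        # Soft constraints: add a bonus or penalty for each label pair.
        for i in range(len(assignment)):
            for j in range(i + 1, len(assignment)):
                a, b = assignment[i], assignment[j]
                score += compat.get((a, b), compat.get((b, a), 0.0))
        if score > best_score:
            best, best_score = assignment, score
    return best

# Hypothetical classifier outputs for "jury" and "charge".
local = [
    {"jury:body_of_citizens": 0.60, "jury:committee": 0.40},
    {"charge:electrical": 0.55, "charge:pleading": 0.45},
]
# Hypothetical compatibility weights between sense pairs.
compat = {
    ("jury:body_of_citizens", "charge:pleading"): 0.5,
    ("jury:body_of_citizens", "charge:electrical"): -0.5,
}
result = infer(local, compat)
print(result)  # ('jury:body_of_citizens', 'charge:pleading')
```

Taken independently, the classifiers would choose the electrical sense of "charge"; the pairwise weights flip that decision, which is the kind of interaction between word-classifiers the talk proposes to exploit.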

23
Thanks again for your attention!!!