Title: Domain Driven Disambiguation DDD


1
  • Domain Driven Disambiguation (DDD)
  • Alfio Gliozzo, Bernardo Magnini, Carlo
    Strapparava.
  • {gliozzo, magnini, strappa}@itc.it
  • ITC-irst
  • http://tcc.itc.it/research/textec/topics/disambiguation/

2
Outline 1/2: DDD system architecture
  • A general object-oriented architecture for easily
    developing a wide variety of WSD systems that
    implement different categorization and feature
    extraction (FE) techniques.
  • Domain Driven Disambiguator (DDD) as an instance
    of the general architecture.

3
Outline 2/2: DDD Methodology
  • WordNet Domains
  • Textual properties of semantic domains.
  • Text categorization using WN-Domains
  • DDD classifier

4
A general WSD System Architecture
[Diagram: three modules and the data exchanged between them]
  • Corpus Reader: get-text (text-id)
  • Feature Extractor: get-token-features (text, tok-id)
  • Classifier: classify-instance (instance),
    import-model (model), build-model (instances)
  • Data labels in the diagram: Corpus, Text,
    Instance ((Feat1, Value1) ... (FeatN, ValueN)),
    sense, Sense, Model, Supervised Learning, ARFF
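
The module boundaries named on this slide can be sketched as abstract interfaces. This is only an illustration under stated assumptions: the method names follow the slide labels, but the type signatures, the toy implementations (InMemoryReader, ContextWindowExtractor, OverlapClassifier) and the disambiguate helper are hypothetical and are not taken from the DDD system.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Set, Tuple

# An instance is a set of (feature, value) pairs, as in the diagram.
Instance = Dict[str, float]


class CorpusReader(ABC):
    @abstractmethod
    def get_text(self, text_id: str) -> List[str]:
        """Return the tokens of the text with the given id."""


class FeatureExtractor(ABC):
    @abstractmethod
    def get_token_features(self, text: Sequence[str], tok_id: int) -> Instance:
        """Describe the token at position tok_id as an instance."""


class Classifier(ABC):
    @abstractmethod
    def build_model(self, instances: Sequence[Tuple[Instance, str]]) -> None:
        """Learn a model from (instance, sense) training pairs."""

    @abstractmethod
    def import_model(self, model: Dict[str, Set[str]]) -> None:
        """Load a previously built model."""

    @abstractmethod
    def classify_instance(self, instance: Instance) -> str:
        """Return the sense chosen for the instance."""


# --- Hypothetical toy implementations, for illustration only ---

class InMemoryReader(CorpusReader):
    def __init__(self, texts: Dict[str, str]) -> None:
        self.texts = texts

    def get_text(self, text_id: str) -> List[str]:
        return self.texts[text_id].split()


class ContextWindowExtractor(FeatureExtractor):
    """Features are the lowercased words around the target token."""

    def __init__(self, size: int = 3) -> None:
        self.size = size

    def get_token_features(self, text: Sequence[str], tok_id: int) -> Instance:
        lo = max(0, tok_id - self.size)
        hi = tok_id + self.size + 1
        return {w.lower(): 1.0
                for i, w in enumerate(text[lo:hi], start=lo) if i != tok_id}


class OverlapClassifier(Classifier):
    """Picks the sense whose known features overlap the instance most."""

    def __init__(self) -> None:
        self.model: Dict[str, Set[str]] = {}

    def build_model(self, instances: Sequence[Tuple[Instance, str]]) -> None:
        for instance, sense in instances:
            self.model.setdefault(sense, set()).update(instance)

    def import_model(self, model: Dict[str, Set[str]]) -> None:
        self.model = model

    def classify_instance(self, instance: Instance) -> str:
        return max(self.model,
                   key=lambda s: len(self.model[s] & set(instance)))


def disambiguate(reader: CorpusReader, extractor: FeatureExtractor,
                 classifier: Classifier, text_id: str, tok_id: int) -> str:
    """Chain the three modules: read text, extract features, classify."""
    text = reader.get_text(text_id)
    instance = extractor.get_token_features(text, tok_id)
    return classifier.classify_instance(instance)


reader = InMemoryReader({"t1": "He cashed a check at the bank"})
classifier = OverlapClassifier()
classifier.import_model({"bank1": {"check", "cashed"},
                         "bank2": {"river", "slope"}})
sense = disambiguate(reader, ContextWindowExtractor(size=3),
                     classifier, "t1", tok_id=6)
print(sense)  # prints bank1
```

Keeping the feature extractor and the classifier behind separate interfaces is what would let a specific system swap in its own versions of both without touching the corpus reader.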
5
DDD system
  • Is a specific implementation of the general WSD
    system architecture.
  • The feature extractor and classifier modules are
    specialized.

WN-D
6
Domain Driven Disambiguation
  • Underlying assumption
  • The polysemy of terms in domain-specific corpora
    tends to disappear.
  • Knowing in advance the relevant semantic
    domain(s) of a text makes Word Sense
    Disambiguation easier.
  • Semantic domains play an important role in the
    disambiguation process

7
WordNet Domains 1/3 (Magnini and Cavaglià, 2000)
  • Developed at ITC-irst (Magnini and Cavaglià,
    2000)
  • All synsets in WordNet have been annotated with a
    domain label (e.g. SPORT, ART, ECONOMY, ...)
  • The FACTOTUM label is used for generic synsets
    (not belonging to any domain).

8
WordNet Domains 2/3: Polysemy Reduction
9
WordNet Domains 3/3: Semantic Domains Organization
  • 163 Domain labels collected from dictionaries
  • Four level hierarchy (Based on Dewey Decimal
    Classification)
  • Mapping between DDC and WND (completeness)
  • 41 basic domains used for the experiments (2nd
    level)

10
Domains and Texts (Magnini et al. 2002)
  • Many words in a document belong to the domain of
    the text (statistics on Semcor).
  • One Domain per Discourse is stronger than One
    Sense per Discourse (10 vs. 31 exceptions in
    Semcor)

11
TC using WND 1/2 (Magnini et al., 2001)
  • Domain frequency is computed for each domain
    (excluding FACTOTUM) by counting its occurrences
    in the text.
  • Domain relevance is the normalized domain
    frequency (in the range [0, 1]), computed on
    windows of fixed size (20-100 tokens).
  • A text vector collects the relevance values of
    all domains.
  • Example: ((sport . 0.93) (economy . 0.70)
    (art . 0))
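
The frequency, relevance and text-vector steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the WORD_DOMAINS lexicon is a hypothetical stand-in for WordNet Domains, and the slide does not say how frequencies are normalized, so dividing by the number of tokens in the window is an assumption made here only to keep values in [0, 1].

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy lexicon: word -> domain labels of its senses.
# In the DDD system these labels would come from WordNet Domains.
WORD_DOMAINS: Dict[str, Set[str]] = {
    "team": {"SPORT"},
    "goal": {"SPORT"},
    "bank": {"ECONOMY", "FACTOTUM"},
    "price": {"ECONOMY"},
    "the": {"FACTOTUM"},
}
DOMAINS: List[str] = ["SPORT", "ECONOMY", "ART"]


def domain_frequency(tokens: List[str]) -> Counter:
    """Count how often each domain (excluding FACTOTUM) occurs."""
    freq: Counter = Counter()
    for tok in tokens:
        for domain in WORD_DOMAINS.get(tok.lower(), ()):
            if domain != "FACTOTUM":
                freq[domain] += 1
    return freq


def text_vector(tokens: List[str], center: int,
                window: int = 50) -> Dict[str, float]:
    """Relevance of each domain in a fixed-size window around `center`.

    Assumption: relevance = domain frequency / tokens in the window.
    """
    half = window // 2
    context = tokens[max(0, center - half): center + half + 1]
    freq = domain_frequency(context)
    n = max(len(context), 1)  # avoid division by zero on empty input
    return {domain: freq[domain] / n for domain in DOMAINS}


tokens = "the team scored a goal and the bank raised the price".split()
vec = text_vector(tokens, center=7, window=50)
print(vec)
```

Because each token contributes at most once per domain, the relevance values computed this way cannot exceed 1.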

12
Text Categorization using WND 2/2: example

From the plush Connolly hide leather sofa and
chairs in the living room to the Bang and Olufsen
stereo, and remote control television complete
with video, you're surrounded by the HIGHEST
QUALITY. The inlaid chequerboard top of the
coffee table houses all kind of games, including
backgammon, chess and Scrabble. You'll also find
a selection of books, from Queen Victoria's
Highland journals, to the very latest bestselling
thriller. The dinner table and chairs are
elegant yet comfortable, and you can be assured
of the finest tableware and crystal for meals at
home.
16
Sense Vector
  • Represents the domain(s) of the typical contexts
    in which the sense occurs
  • The domain vector (DV) for bank1 is
    ((economy . 1.75) (sport . 0.2) (law . 0) ...)
  • The DV for a generic sense (FACTOTUM) is uniform
  • Can be estimated from:
  • Sense-tagged corpora, by summing the text vectors
    of the contexts of the examples (supervised)
  • WordNet Domains (unsupervised)
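
The supervised estimate described above (summing the text vectors of the contexts in which each sense was tagged) can be sketched like this. The function names and all numeric values are hypothetical; they are chosen for illustration and are not the figures from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Vector = Dict[str, float]


def sense_vectors(examples: Iterable[Tuple[str, Vector]]) -> Dict[str, Vector]:
    """Sum the context text vectors of each sense's tagged examples."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for sense, context_vector in examples:
        for domain, relevance in context_vector.items():
            totals[sense][domain] += relevance
    return {sense: dict(vec) for sense, vec in totals.items()}


def uniform_vector(domains: List[str], value: float = 1.0) -> Vector:
    """Vector for a generic (FACTOTUM) sense: same value everywhere."""
    return {domain: value for domain in domains}


# Made-up (sense, text vector of the example's context) pairs.
tagged = [
    ("bank1", {"ECONOMY": 0.5, "SPORT": 0.125}),
    ("bank1", {"ECONOMY": 0.75}),
    ("bank2", {"GEOGRAPHY": 0.5}),
]
vectors = sense_vectors(tagged)
print(vectors["bank1"])
```

The unsupervised alternative would build the same kind of vector directly from the domain labels of the synset, with no tagged examples required.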

17
DDD Classifier
  • Two steps:
  • Context categorization (text vector)
  • Similarity between synset vectors and context
    vectors
  • Example:
  • Bank1: depository financial institution ...
  • Bank2: sloping land
  • TEXT: He cashed a check at the bank

Dot Product
1.731878
0.06185
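
The second step, scoring each candidate sense by the dot product between its sense vector and the text vector of the context, can be sketched as below. The vectors are made-up values for illustration and do not reproduce the scores shown on the slide; the function names are hypothetical.

```python
from typing import Dict, Tuple

Vector = Dict[str, float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two sparse domain vectors."""
    return sum(value * v.get(domain, 0.0) for domain, value in u.items())


def classify(text_vec: Vector,
             sense_vecs: Dict[str, Vector]) -> Tuple[str, Dict[str, float]]:
    """Return the best-scoring sense and the score of every sense."""
    scores = {sense: dot(text_vec, vec) for sense, vec in sense_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical vectors: a context about money, and two senses of "bank".
text_vec = {"ECONOMY": 0.5, "GEOGRAPHY": 0.0}
sense_vecs = {
    "bank1": {"ECONOMY": 1.0},    # depository financial institution
    "bank2": {"GEOGRAPHY": 1.0},  # sloping land
}
best, scores = classify(text_vec, sense_vecs)
print(best, scores)
```

With these values the financial sense wins because the context has no relevance for the domain of the other sense.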
18
IRST results at SENSEVAL-2
19
Domains and Texts (Magnini et al. 2002)
  • Many words in a document belong to the domain of
    the text (statistics on Semcor).

20
Conclusions and Future Work
  • The DDD architecture is an experimental framework
    for easily designing and developing several WSD
    techniques.
  • So far the domain-oriented methodology has been
    (partially) explored and implemented.
  • We plan to improve the system using machine
    learning techniques (e.g. Support Vector Machines,
    AdaBoost, ...)
  • We plan to integrate the domain approach with
    other WSD techniques so as to also take syntactic
    information into account.

21
Other directions
  • Acquisition of domain information from
    text-categorized corpora (Magnini et al., 2002a)
  • Investigation of the connections among polysemy,
    synonymy and domains
  • Using domain information to improve and develop
    ML and memory-based TC and IR techniques
  • Word Domain Disambiguation (WDD)

22
References 1/2
  • (Magnini and Cavaglià, 2000) Bernardo Magnini
    and Gabriela Cavaglià. Integrating Subject Field
    Codes into WordNet. In Gavrilidou M., Carayannis
    G., Markantonatou S., Piperidis S. and Stainhaouer
    G. (Eds.), Proceedings of LREC-2000, Second
    International Conference on Language Resources
    and Evaluation, Athens, Greece, 31 May - 2 June
    2000, pp. 1413-1418.
  • (Magnini and Strapparava, 2000) Bernardo Magnini
    and Carlo Strapparava. Experiments in Word Domain
    Disambiguation for Parallel Texts. Proceedings of
    the ACL Workshop on Word Senses and
    Multilinguality, pp. 27-33, October 7, 2000,
    Hong Kong.
  • (Magnini and Strapparava, 2001) Bernardo Magnini
    and Carlo Strapparava. Using WordNet to Improve
    User Modelling in a Web Document Recommender
    System. Proceedings of the NAACL Workshop
    "WordNet and Other Lexical Resources:
    Applications, Extensions and Customizations",
    Pittsburgh, June 3-4, 2001, pp. 132-137. Also
    ITC-irst Technical Report Ref. No. 0106-02.

23
References 2/2
  • (Magnini et al., 2001) Bernardo Magnini, Carlo
    Strapparava, Giovanni Pezzulo and Alfio Gliozzo.
    Using Domain Information for Word Sense
    Disambiguation. Proceedings of Senseval-2, Second
    International Workshop on Evaluating Word Sense
    Disambiguation Systems, pp. 111-114, 5-6 July
    2001, Toulouse, France.
  • (Magnini et al., 2002a) Bernardo Magnini, Carlo
    Strapparava, Giovanni Pezzulo and Alfio Gliozzo.
    Comparing Ontology-Based and Corpus-Based Domain
    Annotations in WordNet. In Proceedings of the
    First Global WordNet Conference, Mysore, India,
    2002.
  • (Magnini et al., 2002b) B. Magnini, C.
    Strapparava, G. Pezzulo and A. Gliozzo (2002). The
    Role of Domain Information in Word Sense
    Disambiguation. To appear in the Special Issue of
    the Journal of Natural Language Engineering on
    "Evaluating Word Sense Disambiguation Systems".
  • (Gliozzo, 2001) Alfio Gliozzo. Ruolo dei campi
    semantici nella struttura di un lessico
    computazionale: utilizzo per la disambiguazione
    automatica di senso. Thesis, University of
    Bologna, 2001.
