Title: German Rigau i Claramunt
1 Ontologies
- German Rigau i Claramunt
- http://www.lsi.upc.es/rigau
- TALP Research Center
- Departament de Llenguatges i Sistemes Informàtics
- Universitat Politècnica de Catalunya
2 Ontologies: Outline
- WordNet (Miller et al. 90, Fellbaum 98)
- EuroWordNet (Vossen et al. 98)
- Spanish WordNet
- Combining Methods (Atserias et al. 97)
- Mapping hierarchies (Daudé et al. 01)
- Mikrokosmos (Viegas et al. 96)
- Cyc (Mahesh et al. 96)
- WordNet 2 (Harabagiu 98)
- MindNet (Richardson et al. 97)
- ThoughtTreasure (Mueller 00)
- Meaning ...
3 WordNet EuroWordNet
4 WordNet EuroWordNet: WordNet
- Princeton University (Miller et al. 1990)
- Lexicalized concepts (words, lexical units)
- Linked to one another by semantic relations
- synonymy
- antonymy
- hypernymy-hyponymy
- meronymy
- entailment
- cause
- ...
5 WordNet EuroWordNet: Semantic Relations in WN1.5
- Synonymy
- Lexicalized concepts (SYNSETS)
- Weak notion of synonymy: synonymy in context
- Synset: a set of words or lexical units that express one concept in a given context
- Hypernymy / Hyponymy
- Class-to-subclass relation
6 WordNet EuroWordNet: Semantic Relations in WN1.5
- Meronymy
- Component part
- hand → arm
- Member of a collection
- person → people
- Substance
- newspaper → paper
7 WordNet EuroWordNet: Semantic Relations in WN1.5
- Antonymy
- big ↔ small
- Cause
- kill → die
- Entailment
- divorce → marry
- Derivation
- presidential → president
- Similarity
- good ↔ positive
8 WordNet EuroWordNet: WordNet Example
<conveyance>
<vehicle>
<doorlock>
<car door>
<motor vehicle, automobile, ...>
<cruiser, squad car, patrol car, ...>
<cab, taxi, hack, ...>
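The hypernymy part of this example can be pictured as a small data structure. This is a minimal sketch, assuming that four of the synsets listed form a single chain (conveyance, vehicle, motor vehicle, cab). The `Synset` class and the `hypernym_chain` helper are illustrative names, not part of WordNet.

```python
from dataclasses import dataclass


@dataclass
class Synset:
    """A lexicalized concept: words that are synonyms in a given context."""
    words: tuple
    hypernym: "Synset | None" = None  # link from subclass up to its class


def hypernym_chain(synset):
    """Return the synsets from `synset` up to the top of the hierarchy."""
    chain = []
    while synset is not None:
        chain.append(synset)
        synset = synset.hypernym
    return chain


# Assumed chain, simplified from the example slide.
conveyance = Synset(("conveyance",))
vehicle = Synset(("vehicle",), hypernym=conveyance)
motor_vehicle = Synset(("motor vehicle", "automobile"), hypernym=vehicle)
cab = Synset(("cab", "taxi", "hack"), hypernym=motor_vehicle)

chain = [s.words[0] for s in hypernym_chain(cab)]
print(" -> ".join(chain))
```

Meronymy links such as doorlock / car door would need a second, separate pointer; they are left out here to keep the sketch short.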
9 WordNet EuroWordNet: EuroWordNet
- Project LE-2 4003
- Telematics Application Programme of the EU
- Semantic networks for several languages
- Integrated and interconnected
- English: University of Sheffield
- Dutch: University of Amsterdam
- Italian: I.L.C., Pisa
- Spanish: UB, UPC, UNED
- Computers and the Humanities
- (special issue, 1998)
- http://www.hum.uva.nl/ewn/
10 WordNet EuroWordNet: EuroWordNet Extensions
- EWN2
- German, French, Czech, Swedish, Estonian
- ITEM project
- Spanish, Catalan, Basque
- CREL (Centre de Referència d'Enginyeria Lingüística)
- Catalan (UB, UPC)
11 WordNet EuroWordNet: Applications
- Development of basic resources
- Cross-lingual information processing
- Multilingual information retrieval systems (e.g., Internet)
- Lexical-semantic module of language engineering systems
- Information extraction
- Machine translation
12 WordNet EuroWordNet: Design Requirements
- Preservation of the semantic relations specific to each language
- Maximum compatibility among the different resources
- Relative independence of the wordnets
- in the construction process
- in the final result
14 WordNet EuroWordNet: Components of EuroWordNet
- Core
- The ILI
- The Top Concept Ontology (TCO)
- Domain Ontology (DO)
- Periphery
- Language-specific wordnets
15 WordNet EuroWordNet: Interlingual Index of EuroWordNet
- Unstructured collection of records
- Linked to
- at least one synset of an EWN wordnet
- an element of the TCO or DO
- Associated with WN 1.5 synsets
16 WordNet EuroWordNet: Top Concept Ontology of EuroWordNet
- Hierarchy of language-independent concepts
- semantic distinctions, e.g. object, place, dynamic
- abstract (non-lexical)
- Superimposed on the ILI
- Three types of entities
- First order: concrete entities
- Second order: static or dynamic situations
- Third order: abstract propositions
17 WordNet EuroWordNet: Top Concept Ontology of EuroWordNet
18 WordNet EuroWordNet: Domain Ontology of EuroWordNet
- Hierarchy of domain labels
- Reduction of polysemy
- Domains
- Traffic
- Road traffic, air traffic
- International news
- Mycology
- Medicine
19 WordNet EuroWordNet: EuroWordNet Relations
- Richer than WN
- Between
- synsets (monolingual modules)
- ILI records (multilingual)
- actuar-1 EQ-SYNONYM "behave in a certain manner"
- ILI records and the TCO or DO
20 WordNet EuroWordNet: Interlingual Relations of EuroWordNet
21 WordNet EuroWordNet: EuroWordNet Relations
22 Spanish WordNet: Building Process
23 Spanish WordNet: General Methodology
- 1) Mapping to WN1.5
- manual work
- automatic derivation of equivalents, using bilingual dictionaries
- 2) Manual correction
- 3) Re-structuring
24 Spanish WordNet: Main Steps, First Core (Manual Translation)
- Nouns
- A) WN1.5's Tops file plus the first level of hyponyms (about 800 synsets).
- B) The rest of EWN's Common Base Concepts (those not already in our set).
- C) Manual translation of the synsets intermediate between (A) and (B), following the WN1.5 hierarchy, thus building a compact taxonomy equivalent to WN1.5 without gaps.
- Verbs
- Manual translation of EWN's Base Concepts (about 150 synsets)
25 Spanish WordNet: Main Steps, Subset 1 (Semi-automatic)
- Nouns
- Applying automatic methods using bilingual dictionaries
- Manual validation of several subsets to check whether each link is correct
- Deriving a Confidence Score (CS) for every automatic method (heuristic)
- Selecting synset-word pairs with a CS above 85
- Some manual correction of this Subset 1 (mainly, filling gaps)
- Verbs
- 3600 English verbs connected to WN1.5 senses and ambiguously translated to Spanish are manually inspected and disambiguated
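The noun procedure (validate a sample per method, derive a Confidence Score, keep pairs above 85) can be sketched as follows. This is only an illustration under stated assumptions: the CS is taken to be the percentage of validated links judged correct, and the method names, sample judgements and synset-word pairs are all hypothetical.

```python
def confidence_score(validated):
    """Percentage of manually validated links judged correct for one method."""
    correct = sum(1 for ok in validated if ok)
    return 100.0 * correct / len(validated)


# Hypothetical validation samples: True means the human judged the link correct.
samples = {
    "method_a": [True] * 9 + [False],      # 9 of 10 correct
    "method_b": [True] * 3 + [False] * 2,  # 3 of 5 correct
}
scores = {name: confidence_score(v) for name, v in samples.items()}

# Hypothetical synset-word pairs, each tagged with the method that proposed it.
pairs = [
    ("synset_1", "coche", "method_a"),
    ("synset_2", "banco", "method_b"),
]

THRESHOLD = 85.0  # assumed to be a percentage
selected = [(s, w) for s, w, m in pairs if scores[m] > THRESHOLD]
print(scores, selected)
```

Only pairs proposed by a method whose score clears the threshold survive, so a weak heuristic contributes nothing to Subset 1 in this sketch.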
26 Spanish WordNet: Main Steps, Subset 1 (Results 1)
27 Spanish WordNet: Main Steps, Subset 1 (Results 2)
28 Spanish WordNet: Main Steps, Subset 2
- Main goals
- enhance the quality of Subset 1 by manual revision
- extend it by manually building synsets
- 4 sub-tasks
29 Spanish WordNet: Main Steps, Subset 2
- 1) Manually covering those gaps in the hyponymy chains that are covered by other languages
- 2) Manual cleaning of some automatically generated variants.
- (a) pairs of synsets which are adjacent in the hyponymy chain and share at least one variant.
- deleting redundant variants
- re-locating to either pre-existing or newly created synsets
- (b) multi-word expressions present in synsets.
- Deleting non-lexicalized ones
30 Spanish WordNet: Main Steps, Subset 2
- 3) Manual addition of new vocabulary considered relevant.
- It mainly comes from the Catalan WordNet: since we are building both wordnets in parallel, we detected those synsets which had been built for Catalan but not for Spanish
- 4) Manual addition of cross-part-of-speech relations between nominal and verbal synsets.
- This work has been based mainly on noun-verb pairs obtained by means of morphological criteria. (Work carried out by UNED, Madrid)
31 Spanish WordNet: Main Steps, Subset 2 (Results)
32 Spanish WordNet: Main Steps, Subset 2 (Results)
33 Spanish WordNet: Main Steps, Beyond Subset 2
- Massive manual checking (from Nov. 98)
- Using WEI
- Automatically generated variants
- Filling gaps in the hierarchy
- New vocabulary
- New adjectives
35 Spanish WordNet: Main Steps, Beyond Subset 2
36 Spanish WordNet: Main Steps, Beyond Subset 2
37 Spanish WordNet: Main Steps, Parole Coverage
38 Spanish WordNet: Current Figures
- Spanish, Catalan, Basque, (English)
- http://nipadio.lsi.upc.es/wei2.html
39 Combining Multiple Methods for the Automatic Construction of Multilingual WordNets
40 Combining Multiple Methods ...: Outline
- Ten class methods
- Four monosemic criteria
- Four polysemic criteria
- Two hybrid criteria
- Three conceptual distance methods
- CD1: using pairwise word co-occurrences
- CD2: using headword and genus
- CD3: using bilingual Spanish entries with multiple translations
41 Combining Multiple Methods ...: Ten class methods
42 Combining Multiple Methods ...: Ten class methods
[Figure: diagrams connecting SW, EW and Synset nodes]
43 Combining Multiple Methods ...: Ten class methods
[Figure: diagrams connecting SW, EW and Synset nodes]
44 Combining Multiple Methods ...: Ten class methods
- Variant criterion
<..., EW, ..., EW, ...>
SW
- Field criterion
<..., headword-EW, ..., Ind-EW, ...>
SW
45 Combining Multiple Methods ...: Ten class methods
46 Combining Multiple Methods ...: Conceptual Distance methods
- Conceptual Distance (Agirre et al. 94)
- length of the shortest path
- specificity of the concepts
- using WordNet
- Bilingual dictionary
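The two ingredients named above, shortest path and concept specificity, can be combined in a short sketch. It assumes that the distance between two concepts is the sum of 1/depth over the concepts on the shortest path joining them, and that the distance between two words is the minimum over their sense pairs; the exact definition should be checked against (Agirre et al. 94). The toy taxonomy and all function names are assumptions made for illustration.

```python
# Toy taxonomy: child -> parent (the root has parent None). The links are
# assumptions loosely inspired by the abbey example, not real WordNet data.
PARENT = {
    "entity": None,
    "object": "entity",
    "artifact": "object",
    "structure": "artifact",
    "building": "structure",
    "house": "building",
    "place of worship": "building",
    "church": "place of worship",
}


def ancestors(concept):
    """Return [concept, parent, ..., root]."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    """Depth in the taxonomy, with the root at depth 1."""
    return len(ancestors(concept))


def path(c1, c2):
    """Concepts on the shortest path between c1 and c2 in the tree."""
    up1, up2 = ancestors(c1), ancestors(c2)
    common = next(c for c in up1 if c in up2)  # lowest common ancestor
    left = up1[: up1.index(common) + 1]
    right = up2[: up2.index(common)]
    return left + right[::-1]


def concept_distance(c1, c2):
    """Deeper (more specific) concepts on the path contribute less."""
    return sum(1.0 / depth(c) for c in path(c1, c2))


def conceptual_distance(senses1, senses2):
    """Minimum distance over all sense pairs of two words."""
    return min(concept_distance(a, b) for a in senses1 for b in senses2)


print(path("house", "church"))
print(round(concept_distance("house", "church"), 3))
```

Because each term is 1/depth, two concepts joined low in the hierarchy come out closer than two concepts joined through the top, even when the paths have the same length.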
47 Combining Multiple Methods ...: Conceptual Distance methods
- Three conceptual distance methods
- CD1: using pairwise word co-occurrences
- CD2: using headword and genus
- CD3: using bilingual Spanish entries with multiple translations
48 Combining Multiple Methods ...: Conceptual Distance methods (Example CD2)
<entity>
<object, ...>
<artifact, artefact>
<house, lodging>
<religious residence, cloister>
abadía_1_2: Iglesia o monasterio regido por un abad o abadesa (abbey: a church or a monastery ruled by an abbot or an abbess)
49 Combining Multiple Methods ...: Conceptual Distance methods (Example CD2)
<entity>
<object, ...>
<artifact, artefact>
<structure, construction>
<house, lodging>
<building, edifice>
<place of worship, ...>
<religious residence, cloister>
<church, church building>
<abbey> 06 ARTIFACT
abadía_1_2: Iglesia o monasterio regido por un abad o abadesa (abbey: a church or a monastery ruled by an abbot or an abbess)
50 Combining Multiple Methods ...: Three CD methods
51 Combining Multiple Methods ...: Combining methods
52 Combining Multiple Methods ...: Resulting Spanish WordNets
53 Mapping Conceptual Hierarchies Using Relaxation Labelling
54 Mapping Conceptual Hierarchies using Relaxation Labelling: Outline
- Setting
- Relaxation Labelling Algorithm
- Constraints
- Experiments Results I (multilingual)
- Experiments Results II (monolingual)
- Further work
55 Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
56 Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
[Figure: nodes C1, C2, C3, C4, C5, C6]
57 Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
- Connecting already existing hierarchies
- Relaxation labelling algorithm
- Constraints
- Between
- Spanish taxonomy automatically derived from an MRD (Rigau et al. 98)
- WordNet
- using a bilingual MRD
58 Mapping Conceptual Hierarchies using Relaxation Labelling: Setting
animal
(Tops <animal, animate_being, ...>)
(person <beast, brute, ...>)
(person <dunce, blockhead, ...>)
ave
(animal <bird>)
(artifact <bird, shuttle, ...>)
(food <fowl, poultry, ...>)
(person <dame, doll, ...>)
faisán
(animal <pheasant>)
(food <pheasant>)
rapaz
(animal <bird>)
(artifact <bird, shuttle, ...>)
(food <fowl, poultry, ...>)
(person <dame, doll, ...>)
60 Mapping Conceptual Hierarchies using Relaxation Labelling: Relaxation Labelling Algorithm
- Iterative algorithm for function optimization based on local information
- it can deal with any kind of constraints
- variables (senses of the taxonomy)
- labels (synsets)
- Finds a weight assignment for each possible label for each variable
- weights for the labels of the same variable add up to one
- the weight assignment satisfies, to the maximum possible extent, the set of constraints
61 Mapping Conceptual Hierarchies using Relaxation Labelling: Relaxation Labelling Algorithm
- 1) Start with a random weight assignment
- 2) Compute the support value for each label of each variable (according to the constraints)
- 3) Increase the weights of the labels more compatible with the context and decrease those of the less compatible labels.
- 4) If a stopping/convergence criterion is satisfied, stop;
- otherwise go to step 2.
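The four steps can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a uniform start instead of a random one (for repeatability), one common update rule (multiply each weight by 1 + support, then renormalize), and a single toy constraint that rewards labels whose semantic file matches a label of the taxonomic neighbour. The `LABELS` and `HYPERNYM` data are simplified from the ave / faisán example and are hypothetical.

```python
def relaxation_labelling(labels, support, iterations=50, tol=1e-6):
    """labels: {variable: [label, ...]}; support(var, label, weights) >= 0."""
    # Step 1: initial weights (uniform here rather than random).
    weights = {v: {lab: 1.0 / len(labs) for lab in labs}
               for v, labs in labels.items()}
    for _ in range(iterations):
        new_weights = {}
        for v, labs in labels.items():
            # Step 2: support of each label given the current weights.
            s = {lab: support(v, lab, weights) for lab in labs}
            # Step 3: reward supported labels, renormalize so weights sum to 1.
            raw = {lab: weights[v][lab] * (1.0 + s[lab]) for lab in labs}
            total = sum(raw.values())
            new_weights[v] = {lab: raw[lab] / total for lab in labs}
        # Step 4: stop once the weights no longer change appreciably.
        change = max(abs(new_weights[v][lab] - weights[v][lab])
                     for v, labs in labels.items() for lab in labs)
        weights = new_weights
        if change < tol:
            break
    return weights


# Hypothetical toy data: candidate labels written as "semantic_file:synset".
LABELS = {
    "ave": ["animal:bird", "artifact:bird"],
    "faisán": ["animal:pheasant", "food:pheasant"],
}
HYPERNYM = {"faisán": "ave"}  # faisán IS-A ave in the Spanish taxonomy


def category(label):
    return label.split(":")[0]


def neighbours(var):
    up = [HYPERNYM[var]] if var in HYPERNYM else []
    down = [child for child, parent in HYPERNYM.items() if parent == var]
    return up + down


def support(var, label, weights):
    """Toy constraint: neighbours' labels in the same semantic file add support."""
    return sum(w for n in neighbours(var)
               for other, w in weights[n].items()
               if category(other) == category(label))


result = relaxation_labelling(LABELS, support)
best = {v: max(ws, key=ws.get) for v, ws in result.items()}
print(best)
```

In this toy run the animal readings reinforce each other across the hypernymy link while the artifact and food readings receive no support, which is the local-information, global-effect behaviour the slides describe.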
63 Mapping Conceptual Hierarchies using Relaxation Labelling: Constraints
- Rely on the taxonomy structure
- Coded with three characters (XYZ)
- X: Spanish taxonomy, I (immediate) or A (ancestor)
- Y: English taxonomy, I (immediate) or A (ancestor)
- Z: Relation, E (hypernym), O (hyponym), B (both)
- Examples
IIE
AAB
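A small decoder makes the three-character coding concrete. It is a sketch that assumes the first character refers to the Spanish taxonomy, the second to the English taxonomy and the third to the relation; `decode_constraint` and the dictionary keys are illustrative names.

```python
SCOPE = {"I": "immediate", "A": "ancestor"}
RELATION = {"E": "hypernym", "O": "hyponym", "B": "both"}


def decode_constraint(code):
    """Expand a code such as 'IIE' into a readable description."""
    spanish, english, relation = code  # exactly three characters expected
    return {
        "spanish_taxonomy": SCOPE[spanish],
        "english_taxonomy": SCOPE[english],
        "relation": RELATION[relation],
    }


for code in ("IIE", "AAB"):
    print(code, decode_constraint(code))
```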
64 Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints
NAACL2001
65 Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints
AIE
AIB
AIO
66 Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints
IAE
IAB
IAO
67 Mapping Conceptual Hierarchies using Relaxation Labelling: Hierarchical Constraints
AAE
AAB
AAO
69 Combining Multiple Methods ... RANLP97: Eight class methods
[Figure: SW, EW and Synset diagrams annotated with Prec. / Cov. values: 92 / 5, 89 / 1, 89 / 2]
70 Combining Multiple Methods ... RANLP97: Eight class methods
[Figure: SW, EW and Synset diagrams annotated with Prec. / Cov. values: 80 / 8, 75 / 2, 58 / 17, 61 / 60]
71 Combining Multiple Methods ... RANLP97: Experiments Results
- Poly | TOK, FOK | TOK, FNOK | total
- animal | 279 (90) | 30 (91) | 209 (90)
- food | 166 (94) | 3 (100) | 169 (94)
- cognition | 198 (67) | 27 (90) | 225 (69)
- communication | 533 (77) | 40 (97) | 573 (78)
- all | TOK, FOK | TOK, FNOK | total
- animal | 424 (93) | 62 (95) | 486 (90)
- food | 166 (94) | 83 (100) | 249 (96)
- cognition | 200 (67) | 245 (90) | 445 (82)
- communication | 536 (77) | 234 (97) | 760 (81)
72 Combining Multiple Methods ... RANLP97: Experiments Results
piel
(substance <skin, fur, peel>)
marta
(substance <sable, marte, coal_back>)
visón
(substance <mink, mink_coat>)
74 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Generalized Constraints
- All relationships
- also-see, similar-to, attribute, antonym, etc.
75 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Generalized Constraints
- Non-structural constraints
- W: number of word coincidences
- G: word coincidences in glosses
- F: number of frame coincidences (verbs)
76 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: POS mapping dependences
Nouns
Adjectives
Adverbs
Verbs
77 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Constraints for Verbs
- Structural constraints
- hyper/hyponymy
- antonymy
- also-see
- Non-structural constraints
- W, G and F
78 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Constraints for Adjectives
- Structural constraints
- Adj-to-Adj
- antonymy, similar-to and also-see
- Adj-to-Verb
- participle-of
- Adj-to-Noun
- pertains and attribute
- Non-structural constraints
- W and G
79 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Constraints for Adverbs
- Structural constraints
- Adv-to-Adv
- antonymy
- Adv-to-Adj
- derived
- Non-structural constraints
- W and G
80 A Complete ... ACL00, NAACL01: Example extra-POS
WN1.6
00843344a evangelical evangelistic
WN1.5
Similar to
02025107a evangelical evangelistic
00842521a enthusiastic
pertainym
02025107a evangelical
04237485n Gospel Gospels evangel
pertainym
04853575n Gospel Gospels evangel
81 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Example extra-POS
82 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Results
- Basic constraint set: structural constraints
- Nouns: AA hyper/hyponym
- Verbs: AA hyper/hyponym, II also-see
- Adjectives: II antonymy, similar-to, also-see
- Adverbs: II antonymy
83 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Results
- Basic constraint set: structural constraints
Precision - recall
84 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Results
- Basic constraint set + W, G and F for verbs
Precision - recall
85 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Results
- Basic + extra-POS relationships
Precision - recall
86 A Complete WN1.5 to WN1.6 Mapping ... ACL00, NAACL01: Results
- Basic + extra-POS relationships + WGF
Precision - recall
87 Mapping Conceptual Hierarchies using Relaxation Labelling: Conclusions
- First complete mapping between WordNet versions
- Combining structural and non-structural information
- Robust approach based on local information, but with global effects
- Incremental POS approach
- http://www.lsi.upc.es/nlp
- 90 downloads (since November 2000)
88 Mapping Conceptual Hierarchies using Relaxation Labelling: Further Work
- mapping other structures
- WN-EDR, WN-LDOCE, etc.
- Other language taxonomies to EuroWordNet
- Spanish EWN to WN1.6
- symmetrical philosophy rather than source-target
89 Mikrokosmos
90 Mikrokosmos: Outline
- Introduction
- Representational Issues
- The Lexicon
- The Ontology
- Acquisition Process
- Lexicon Acquisition
- Guidelines
- Ontology/Lexicon Trade-off
- Semantics in Action
91 Mikrokosmos: Introduction
- Knowledge-Based Machine Translation (KBMT)
- CRL, NMSU
- 5,000 concepts
- Events
- Objects
- Properties
- 7,000 Spanish word senses
- 40,000 word senses
- after expansion with productive Lexical Rules
- comprar -> comprador, comprable, ...
- Text Meaning Representation
92 Mikrokosmos: Representational Issues, The Lexicon
- Typed Feature Structures (Pollard and Sag 87)
- language-dependent
- 10 zones
- phonology
- orthography
- morphology
- syntactic (subcategorization)
- semantic (Lexical Semantic Representation)
- syntax-semantic linking
- stylistics
- paradigmatic
- syntagmatic
93 Mikrokosmos: Representational Issues, The Lexicon
- Adquirir-V1
- syn subj cat NP
- obj cat NP
- sem acquire
- agent HUMAN
- theme OBJECT
- Adquirir-V2
- syn subj cat NP
- obj cat NP
- sem acquire
- agent HUMAN
- theme INFORMATION
94 Mikrokosmos: Representational Issues, The Ontology
- Taxonomic, multi-hierarchical
- 14 local or inherited links on average
- language-impartial
- EVENTS, OBJECTS, PROPERTIES
- Methodology, Guidelines
95 Mikrokosmos: Representational Issues, The Ontology
- ACQUIRE
- DEFINITION: The transfer-of-possession event where the agent transfers an object to its possession
- IS-A: TRANSFER-POSSESSION
- SOURCE: HUMAN PLACE
- THEME: OBJECT (NOT HUMAN)
- AGENT: ANIMAL (DEFAULT HUMAN)
- DESTINATION: ANIMAL PLACE (DEFAULT HUMAN)
- INHERITED
- BENEFICIARY: HUMAN
96 Mikrokosmos: Acquisition Process, The Lexicon
- Multi-lingual
- French, English, Japanese, Russian, Spanish, etc.
- Multi-media
- Multi-process
- Analysis
- Generation (mono and multilingual)
- MT
- Summarization
- IE
- Speech Processing
- Tools
- corpus-search, lookup dictionary, ontology browser
97 Mikrokosmos: Acquisition Process, The Ontology
- Guidelines
- 1) Do not add instances as concepts
- Instances do not have their own instances
- Concepts do not have a fixed position in space/time
- 2) Do not decompose concepts further
- 3) Use close concepts
- 4) Do not add EVENTs with particular arguments
- 5) Do not add concepts with instance-specific aspects, temporal relations
- 6) Do not add language-specific concepts
- 7) Do not add ontological concepts for collections
98 Mikrokosmos: Acquisition Process, Ontology/Lexicon Trade-off
- Daily negotiations
- lexicon acquirers
- ontology acquirers
- Possibilities
- one-to-one mapping
- lexicon unspecification
- lexicon-ontology balance
99 Mikrokosmos: Acquisition Process, Ontology/Lexicon Trade-off
- one-to-one mapping
- Problems
- Lexical: every word in a language is a concept
- Conceptual: cuire in French is not ambiguous
PREPARE-FOOD INST COOKING-EQUIPMENT
COOK INST STOVE
BAKE INST OVEN
cook: cuire sur le feu
bake: cuire au four
100 Mikrokosmos: Acquisition Process, Ontology/Lexicon Trade-off
- Lexicon Unspecification
- Problems
- BAKE is not in the ontology
PREPARE-FOOD INST COOKING-EQUIPMENT
bake: cuire au four, INST OVEN
cook: cuire sur le feu
101 Mikrokosmos: Acquisition Process, Ontology/Lexicon Trade-off
PREPARE-FOOD INST COOKING-EQUIPMENT
BAKE INST OVEN
FRY INST STOVE INST FRYING-PAN
cook: cuire
bake
102 Mikrokosmos: Semantics in Action
- El grupo Roche, a través de su compañía en España, adquirió Doctor Andreu. (The Roche group, through its company in Spain, acquired Doctor Andreu.)
- El grupo Roche adquirió Doctor Andreu a través de su compañía en España. (The Roche group acquired Doctor Andreu through its company in Spain.)
- La adquisición de Doctor Andreu por el grupo Roche fue hecha a través de su compañía en España. (The acquisition of Doctor Andreu by the Roche group was made through its company in Spain.)
- ACQUIRE-1 Agent: ORGANIZATION-1
- Theme: ORGANIZATION-2
- Instrument: ORGANIZATION-3
- ORGANIZATION-1 Object-Name: Grupo Roche
- ORGANIZATION-2 Object-Name: Doctor Andreu
- ORGANIZATION-3 Location: España
103 Mikrokosmos: Semantics in Action
- Onto-Search: ontological search mechanism to check constraints
- check-onto(ACQUIRE, EVENT) = 1
- since ACQUIRE is a type of EVENT
- check-onto(ORGANIZATION, HUMAN) = 0.9
- since ORGANIZATION HAS-MEMBER HUMAN
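The two check-onto values can be reproduced with a toy constraint checker. This is only a sketch of the idea and not the Mikrokosmos Onto-Search mechanism, whose internals are not described here. It assumes a score of 1 for a pure IS-A path, a fixed penalty of 0.9 when one non-taxonomic link such as HAS-MEMBER has to be crossed, and 0 otherwise. The link TRANSFER-POSSESSION IS-A EVENT and the names `IS_A`, `RELATED` and `RELATION_PENALTY` are assumptions.

```python
# Tiny hypothetical ontology fragment.
IS_A = {"ACQUIRE": "TRANSFER-POSSESSION", "TRANSFER-POSSESSION": "EVENT"}
RELATED = {"ORGANIZATION": {"HAS-MEMBER": "HUMAN"}}
RELATION_PENALTY = 0.9  # assumed score when one non-IS-A link is crossed


def is_a(concept, target):
    """True if `target` is `concept` itself or one of its IS-A ancestors."""
    while concept is not None:
        if concept == target:
            return True
        concept = IS_A.get(concept)
    return False


def check_onto(concept, constraint):
    """Score how well `concept` satisfies the selectional `constraint`."""
    if is_a(concept, constraint):
        return 1.0
    for filler in RELATED.get(concept, {}).values():
        if is_a(filler, constraint):
            return RELATION_PENALTY
    return 0.0


print(check_onto("ACQUIRE", "EVENT"))
print(check_onto("ORGANIZATION", "HUMAN"))
```

A graded score of this kind lets the analyser prefer a reading that satisfies a constraint directly while still accepting a metonymic one, such as an organization standing in for its human members.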
104 Mikrokosmos: Semantics in Action
- 1) a-través-de: INSTRUMENT, LOCATION
- adquirir requires PHYSICAL-OBJECT
- 2) en: LOCATION, TEMPORAL
- España is not a TEMPORAL-OBJECT
- 3) adquirir: ACQUIRE, LEARN
- Doctor Andreu is not an INFORMATION
- 4) Doctor Andreu: ORGANIZATION, HUMAN
- the Theme of ACQUIRE is not HUMAN
- 5) compañía: CORPORATION, SOCIAL-EVENT
- ORGANIZATIONs typically fill the INSTRUMENT slot of ACQUIRE acts
105 Mikrokosmos: Experiment WSD
- Text | 1 | 2 | 3 | 4 | Mean
- words | 347 | 385 | 370 | 353 | 364
- words/sentence | 16.5 | 24.0 | 26.4 | 20.8 | 21.4
- open-class words | 183 | 167 | 177 | 177 | 176
- ambiguous words | 57 | 42 | 57 | 35 | 48
- syntax | 21 | 19 | 20 | 12 | 18
- correct | 51 | 41 | 45 | 34 | 43
- % | 97 | 99 | 93 | 99 | 97
106 Mikrokosmos: Experiment WSD
- Text | Mean | Mean Unseen
- words | 364 | 390
- words/sentence | 21.4 | 26
- open-class words | 176 | 104
- ambiguous words | 48 | 26
- syntax | 18 | 9
- correct | 43 | 23
- % | 97 | 97
107 WordNet2
108 WordNet2: Outline
- Introduction
- Text Inferences
- Defining Features
- Plausible inferences
- Inference Rules
- Semantic Paths
- What WordNet cannot do
109 WordNet2: Introduction
- (Harabagiu 98)
- Commonsense reasoning requires extensive knowledge
- 100 million concepts and relations
- WordNet
- represents almost all English words
- 100,000 synsets
- linked by semantic relations
- WordNet2
- each synset has a gloss that, when disambiguated, may increase the number of relations
- WordNet glosses into semantic networks
- NEW RELATIONS
110 WordNet2: Text Inferences
- German was hungry
- He opened the refrigerator
- hungry (feeling a need or desire to eat)
- eat (take in solid food)
- refrigerator (an appliance in which foods can be stored at low temperature)
111 WordNet2: Defining Features
- Transform each concept's gloss into a graph where concepts are nodes and lexical relations are links
- <culture> (all the knowledge shared by society)
- <share> --AGENT--> <society>
- <doctor> (licensed medical practitioner)
- <medical practitioner> --ATTRIBUTE--> <licensed>
112 WordNet2: Defining Features
[Figure: graph with nodes pilot, person, guide, ship, water, difficult, qualified and links GLOSS, PURPOSE, OBJECT, LOCATION, ATTRIBUTE]
Inference Rules
- Rule 1 Rule 2
- VC1 IS-A VC2 VC1 IS-A VC2
- VC2 IS-A VC3 VC2 ENTAIL VC3
- ------------------------- ----------------------
--- - VC1 IS-A VC3 VC1 ENTAIL VC3
- Rule 3 Rule 2
- VC1 IS-A VC2 VC1 IS-A VC2
- VC2 R_IS-A VC3 VC2 R_ENTAIL VC3
- ------------------------- ----------------------
--- - VC1 PLAUSIBLE (not VC3) VC1 EXPLAINS VC3
- 16 1 regles
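The rules shown compose two consecutive links into one inferred link, so they can be held in a lookup table. The sketch below encodes only the four rules listed on this slide; the remaining rules are not given here and are not guessed. The label `PLAUSIBLE-NOT` and the function name `infer` are illustrative.

```python
# (relation VC1->VC2, relation VC2->VC3) -> inferred relation VC1->VC3
RULES = {
    ("IS-A", "IS-A"): "IS-A",
    ("IS-A", "ENTAIL"): "ENTAIL",
    ("IS-A", "R_IS-A"): "PLAUSIBLE-NOT",  # VC1 PLAUSIBLE (not VC3)
    ("IS-A", "R_ENTAIL"): "EXPLAINS",
}


def infer(first, second):
    """Return the inferred relation, or None if no listed rule applies."""
    return RULES.get((first, second))


print(infer("IS-A", "ENTAIL"))
```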
114 WordNet2: Semantic Paths
- 0) Create and load the KB
- 1) Place markers on KB concepts
- 2) Propagate markers
- The algorithm avoids cycles
- 3) Detect collisions
- Each marker collision corresponds to a path
- 4) Extract Inferences
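Steps 1 to 3 can be sketched as a two-sided breadth-first search: one marker starts at each concept, markers spread over the links, and the first node reached from both sides is a collision that yields a path. This is a simplified illustration, not the actual WordNet2 marker-propagation algorithm. The graph is undirected and unlabelled, its edges are a hypothetical reading of the hungry / refrigerator glosses, and the function names are assumptions.

```python
from collections import deque


def find_collision_path(graph, source, target):
    """Propagate a marker from each concept; return the path at the first collision."""
    if source == target:
        return [source]
    # parents[side][node] records where the marker on `node` came from.
    parents = [{source: None}, {target: None}]
    frontiers = [deque([source]), deque([target])]
    while frontiers[0] or frontiers[1]:
        for side in (0, 1):
            if not frontiers[side]:
                continue
            node = frontiers[side].popleft()
            for nxt in graph.get(node, ()):
                if nxt in parents[side]:
                    continue  # already marked from this side: avoids cycles
                parents[side][nxt] = node
                if nxt in parents[1 - side]:
                    return _join(parents, nxt)  # collision detected
                frontiers[side].append(nxt)
    return None


def _join(parents, meeting):
    """Rebuild source ... meeting ... target from the two marker trails."""
    left, node = [], meeting
    while node is not None:
        left.append(node)
        node = parents[0][node]
    right, node = [], parents[1][meeting]
    while node is not None:
        right.append(node)
        node = parents[1][node]
    return left[::-1] + right


# Hypothetical links read off the glosses of hungry, eat and refrigerator.
edges = [("hungry", "eat"), ("eat", "food"), ("refrigerator", "food")]
graph = {}
for a, b in edges:
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)

semantic_path = find_collision_path(graph, "hungry", "refrigerator")
print(semantic_path)
```

In the full method the links carry relation labels, so each collision path can then be turned into an inference by composing the labels along it.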
115 WordNet2: Semantic Paths
- Inference sequence
- German was hungry
- German felt a desire to eat
- German felt a desire to take in food
- COLLISION: German = he felt a desire to take food, stored in an appliance, which he opened
- He opened an appliance where food is stored
- He opened the refrigerator
116 WordNet2: What WordNet cannot do
- Major WordNet limitations
- 1) The lack of compound concepts
- 2) The small number of causation and entailment relations
- 3) The lack of preconditions for verbs
- 4) The absence of case relations
117 ThoughtTreasure
118 ThoughtTreasure: Overview
- a comprehensive platform for
- NLP: English, French
- commonsense reasoning
- A hotel room has a bed, night table, ...
- People have fingernails
- soda is a drink
- one hangs up at the end of a phone call
- the sky is blue
- dogs bark
- someone who is 16 years old is a teenager
119 ThoughtTreasure: Overview
- 25,000 concepts organized into a hierarchy
- EVIAN -> FLAT-WATER -> DRINKING-WATER
- 55,000 words (English, French)
- food <-> aliment <-> FOOD
- 50,000 assertions about concepts
- green-pea is green
- 100 scripts
120 ThoughtTreasure: Overview
- Text agents for recognizing names, phones, etc.
- mechanisms for learning new words
- X-phile is someone who likes X
- a syntactic parser
- a NL generator
- a semantic parser
- an anaphoric parser
- planning agents for achieving goals
- understanding agents
121 ThoughtTreasure: Example
- Who created Bugs Bunny?
- 1.0 (create human-interrogative-pronoun Bugs-Bunny)
- 0.9 (create rock-group-the-Who Bugs-Bunny)
- 1.0 (create Tex-Avery Bugs-Bunny)
- 0.1 (not (create rock-group-the-Who Bugs-Bunny))
122 Meaning
123 Meaning: Overview
- Knowledge bases
- Automatic enrichment of EWN (verbal models, etc.)
- Mixed approach (KB + ML)
- Q/A
- Problem
- structural and lexical ambiguity
- Approach
- automatically locating sense examples (Leacock et al. 98, Mihalcea and Moldovan 99)
- large-scale WSD (Boosting, SVM, transductive methods)
- Knowledge acquisition (Ribas 95, McCarthy 01)
124 Meaning: Exploiting EWN Semantic Relations
125 Meaning: Exploiting EWN Semantic Relations
partido 1 (political party):
- Todos los partidos piden reformas legales para TV3.
- La derecha planea agruparse en un partido.
- El diputado reiteró que ni él ni UDC, como partido, han recibido dinero de Pellerols.
partido 2 (match):
- Pero España puso al partido intensidad, ritmo y coraje.
- El seleccionador cree que el partido de hoy contra Italia dará la medida de España.
- El Racing no gana en su campo desde hace seis partidos.
126 Meaning: Exploiting EWN Semantic Relations
partido 1 (political party):
- No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.
- Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político.
- Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.
partido 2 (match):
- Rivera pide el suporte de la afición para encarrilar las semifinales.
- Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado.
- El Racing ganó los cuartos de final en su campo.
127 Meaning: Architecture
[Figure: architecture diagram linking a Multilingual Central Repository, through PORT, UPLOAD, ACQ and WSD steps, to the English, Italian, Basque, Spanish and Catalan EWNs and their Web Corpora]