Title: Combining Resources: Taxonomy Extraction from Multiple Dictionaries
- Rogelio Nazar, Maarten Janssen
- IULA, Universitat Pompeu Fabra, Barcelona
Information from Dictionaries
- Dictionaries are a good source of information
- Long tradition of taxonomy extraction
- Calzolari (1977), Amsler (1981), Chodorow et al. (1985), Fox et al. (1988), Alshawi (1989), Boguraev (1991), Barrière & Popowich (1996), Chang (1998), Renau & Battaner (2008)
- Exploiting machine-readable dictionaries
- Parsing definitional phrases
- Pattern extraction, Shallow parsing
- Full treatment of a single dictionary
Combining Resources
- There is a lot of information available
- Hand-crafted, high-quality resources
- Combining yields new data
- Taxonomy from multiple dictionaries
- Language-independent shallow method
- Combining definitions of the same word
- Various dictionaries, online versions
- DRAE, DGLE, Clave, DEM
- Frequency-based
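Combining dictionaries starts by regrouping their entries by headword. The sketch below is a minimal illustration under our own assumptions: each dictionary is already available as a mapping from headword to raw entry text, and `combine_entries`, the placeholder source names and the `min_dicts` threshold are hypothetical, not the authors' code.

```python
from collections import defaultdict

def combine_entries(dictionaries, min_dicts=2):
    """Merge {dictionary: {headword: entry}} into {headword: {dictionary: entry}}.

    Headwords found in fewer than `min_dicts` dictionaries are dropped
    (min_dicts is a hypothetical threshold).
    """
    merged = defaultdict(dict)
    for source, entries in dictionaries.items():
        for headword, entry in entries.items():
            merged[headword][source] = entry
    return {hw: by_source for hw, by_source in merged.items()
            if len(by_source) >= min_dicts}
```

Nothing in this step looks inside the entry text, which keeps it independent of the language of the dictionaries.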
Consolidated Genus Terms
- Dictionaries differ
- Different lexicon and definitions
- Even if only for legal reasons
- Hyperonym should be the same
- A cat is an animal
- Unless there is uncertainty in the hyperonym
- Most dictionaries should use the same genus
- Statistically relevant
- 3x: ablandabrevas, persona
- 2x: com., inútil
- 1x: substantivo, común, fig.
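The 3x/2x/1x listing is a frequency tally over all the entries collected for one headword. A minimal sketch of such a tally, assuming plain-text entries and a naive letters-only tokenizer (both our choices, so "com." is counted as "com"); the entries in the test are invented toy strings, not dictionary data.

```python
import re
from collections import Counter

def tally(entries):
    """Raw frequency of every word over all entries for one headword."""
    counts = Counter()
    for text in entries:
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))  # letters only
    return counts

def group_by_count(counts):
    """Group words by frequency, highest first, as in the 3x/2x/1x listing."""
    groups = {}
    for word, n in counts.items():
        groups.setdefault(n, []).append(word)
    return {n: sorted(groups[n]) for n in sorted(groups, reverse=True)}
```

Such a raw tally also ranks the headword itself and grammatical labels highly, which is why the candidates still need filtering.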
Raw HTML input
- Directly from harvested text
- With begin/end tags
- No textual analysis
- More than definitions
- Examples, multiple senses, etc.
- Sense matching impossible
- Entries unsystematic
- Dictionaries do not match in senses
- Minimum number of dictionaries
- Raw frequency count
- Hyperonym tends to be repeated
- Candidates have to be words
- Of the same word class
- Use of a stop-list
- Dictionary-generated
- Words that occur in more than 10 entries
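The filters on this slide can be combined into a small candidate selector. This is a sketch under stated assumptions: tags are stripped with a regular expression, the letters-only pattern is our way of keeping only word candidates, the 10-entry stop-list limit comes from the slide, `min_dicts` and `min_count` are hypothetical thresholds (the slides give no values), and the word-class check is not implemented because it would need a part-of-speech resource.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # letters only, so candidates are words

def tokens(raw_html):
    """Remove begin/end tags and lowercase; no further textual analysis."""
    return WORD.findall(re.sub(r"<[^>]+>", " ", raw_html).lower())

def build_stoplist(all_entries, max_entries=10):
    """Words that occur in more than `max_entries` entries of a dictionary."""
    entry_freq = Counter()
    for raw in all_entries:
        entry_freq.update(set(tokens(raw)))  # count each word once per entry
    return {w for w, n in entry_freq.items() if n > max_entries}

def select_candidates(headword, counts, n_dicts, stoplist,
                      min_dicts=2, min_count=2, top=5):
    """Turn raw frequency counts for one headword into ranked candidates.

    NOTE: min_dicts and min_count are hypothetical thresholds, and the
    word-class filter is left out (it needs a POS lexicon).
    """
    if n_dicts < min_dicts:  # headword defined in too few dictionaries
        return []
    kept = Counter({w: n for w, n in counts.items()
                    if w != headword and w not in stoplist and n >= min_count})
    return kept.most_common(top)  # ties keep first-seen order
```

Building the stop-list from the dictionaries themselves avoids a hand-made, language-specific list.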
Example results
- deconstrucción (3 dictionaries): teoría 2 1
  - EWN: 0.desconstrucción, 0.deconstrucción, 1.teoría filosófica, 1.doctrina filosófica, 2.filosofía, 3.creencia, 4.contenido mental, 5.conocimiento, 5.cognición, 6.rasgo psicológico
- descubrimiento (5 dictionaries): acción 3 3, cosa 3 5, efecto 2 -
  - EWN: 0.descubrimiento, 1.logro, 1.presentación, 1.revelación, 2.realización, 2.información, 2.exposición, 3.acción, 3.hecho, 3.acto de habla, 3.comunicación visual, 4.acto, 4.actividad humana, 4.comunicación, 5.relación social, 6.relación, 7.abstracción
- cumbia (5 dictionaries): danza 2 -
  - EWN: 0.cumbiamba, 0.cumbia, 1.baile regional, 1.danza popular, 2.baile social, 3.baile, 4.recreación, 4.diversión, 5.actividad, 6.acto, 6.actividad humana
- asta (5 dictionaries): mar 6 -, lanza 6 -, media 5 -, toro 5 -, cuerno 5 -, bandera 4 -
  - EWN: 0.cuerno, 0.asta, 1.tomadero, 1.materia animal, 1.cogedero, 1.bastón, 1.agarradera, 1.asimiento, 1.asidero, 1.asa, 2.materia, 2.apéndice, 2.vara, 2.palo, 3.porción, 3.sustancia, 3.parte, 3.herramienta, 4.utillaje, 5.artefacto, 6.objeto físico, 6.cosa, 6.objeto, 6.objeto inanimado, 7.competente, 7.respirar, 7.capaz, 7.entidad
WordNet Verification
- WordNet (still) best available taxonomy
- Not the best resource for evaluation
- Automatic Verification
- 100 random nouns
- Best 5 hyperonym candidates
- Match when a candidate appears in the noun's hyperonym chain
- Only about 50% accuracy
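The automatic check can be sketched as a membership test of the best candidates against a noun's EuroWordNet (EWN) hyperonym chain. Here the chain is given as (depth, lemma) pairs, and skipping depth 0 (the noun's own synset, i.e. its synonyms) through `min_depth=1` is our assumption; the slides only say a match occurs when the candidate is in the chain. The test data are abbreviated chains copied from the descubrimiento and asta examples in this deck.

```python
def chain_match(candidates, chain, top=5, min_depth=1):
    """True if one of the best `top` candidates names a synset in the chain.

    `chain` is a list of (depth, lemma) pairs; depth 0 is the noun's own
    synset. Skipping depth 0 (min_depth=1) is an assumption.
    """
    lemmas = {lemma for depth, lemma in chain if depth >= min_depth}
    return any(c in lemmas for c in candidates[:top])

def accuracy(matches):
    """Share of test nouns with at least one matching candidate."""
    return sum(matches) / len(matches) if matches else 0.0

# Abbreviated chains taken from the example results in this deck.
descubrimiento = [(0, "descubrimiento"), (1, "logro"), (1, "revelación"),
                  (2, "información"), (3, "acción"), (4, "acto"),
                  (7, "abstracción")]
asta = [(0, "cuerno"), (0, "asta"), (1, "bastón"), (2, "palo"),
        (5, "artefacto"), (6, "cosa"), (7, "entidad")]

results = [
    chain_match(["acción", "cosa", "efecto"], descubrimiento),
    chain_match(["mar", "lanza", "media", "toro", "cuerno"], asta),
]
```

An exact-lemma test like this cannot credit near misses such as danza against "danza popular", one reason the automatic figure understates quality and a manual check follows.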
Manual post-verification
WordNet vs. Dictionary
- WordNet
- Many intermediate/artificial levels
- Compulsory hyperonym
- Contains proper names
- Dictionaries
- More word-senses
- Alternative definitions (synonymy, paraphrase)
- Differences
- Different choice of hyperonym
- Different lexicon
Human post-evaluation
Effect of Dictionaries
Thank you