Departament de Llenguatges i Sistemes
Informàtics, Universitat Politècnica de Catalunya
Automatic Assignment of Domain Labels to WordNet
Mauro Castillo V. Francis Real V. German Rigau C.
GWC 2004
Outline
- Introduction
- WordNet
- WN Domains
- Experimentation
- Evaluation and results
- Discussion
- Conclusions
Introduction
- To semantically enrich any WN version with the semantic domain labels of MultiWordNet Domains
- WN is a standard resource for semantic processing
- Effectiveness of Word Domain Disambiguation
- The work presented explores the automatic and systematic assignment of domain labels to glosses
- The proposed method can be used to correct and verify the suggested labeling
WordNet
- WN1.6 was used because WN Domains is available for that version
WN Domains
[Figure: the WordNet Domains hierarchy (root: TOP), developed at IRST (Magnini and Cavaglià, 2000)]
WN Domains
- The synsets have been annotated semi-automatically with one or more labels
- Most synsets have a single label
[Figure: distribution of domain labels per synset]
Average number of labels per synset:
  noun  1.170
  verb  1.078
  adj   1.076
  adv   1.033
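Each average above is the total number of domain labels in a category divided by its number of synsets. A minimal sketch of that arithmetic, assuming a hypothetical in-memory mapping from synset IDs (written `word#pos#sense` for illustration only) to label lists; the toy entries reuse examples from these slides and are not the real WN Domains data.

```python
from collections import defaultdict


def average_labels_per_pos(synset_labels: dict[str, list[str]]) -> dict[str, float]:
    """Average number of domain labels per synset, grouped by part of speech.

    Keys are hypothetical IDs of the form 'word#pos#sense'.
    """
    total_labels = defaultdict(int)  # pos -> labels summed over its synsets
    n_synsets = defaultdict(int)     # pos -> number of synsets seen
    for synset_id, labels in synset_labels.items():
        pos = synset_id.split("#")[1]
        total_labels[pos] += len(labels)
        n_synsets[pos] += 1
    return {pos: total_labels[pos] / n_synsets[pos] for pos in n_synsets}


# Toy entries built from examples in these slides, not the real WN Domains data.
toy = {
    "sultana#n#1": ["botany", "gastronomy"],
    "morocco#n#2": ["anatomy", "zoology"],
    "doctor#n#1": ["medicine"],
    "operate#v#7": ["medicine"],
}
print(average_labels_per_pos(toy))  # {'n': 1.6666666666666667, 'v': 1.0}
```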
WN Domains
- A domain may include synsets of different syntactic categories, e.g. MEDICINE:
  - doctor1 (n)
  - operate7 (v)
  - medical1 (a)
  - clinically1 (r)
- A domain label may also contain senses from different WN subhierarchies, e.g. SPORT:
  - athlete1 → life-form1
  - game-equipment1 → physical-object1
  - sport1 → act2
  - playing-field1 → location1
WN Domains
- Synsets that have more than one label do not seem to follow any pattern:
  - sultana1 (n): pale yellow seedless grape used for raisins and wine
    Domains: Botany, Gastronomy
  - morocco2 (n): a soft pebble-grained leather made from goatskin used for shoes and book bindings etc.
    Domains: Anatomy, Zoology
  - canicola_fever1 (n): an acute feverish disease in people and in dogs marked by gastroenteritis and mild jaundice
    Domains: Medicine, Physiology, Zoology
  - blue1, blueness1 (n): the color of the clear sky in the daytime; "he had eyes of bright blue"
    Domains: Color, Quality
WN Domains
- FACTOTUM: used to mark WN senses that do not have a specific domain
- STOP senses: synsets that appear frequently in different contexts, for instance numbers, colours, etc.
Experimentation
- A process to automatically assign domain labels to WN1.6 glosses
- Procedures to validate the consistency of the domain assignment in WN1.6, especially the automatic assignment of the FACTOTUM label
[Figure: distribution of synsets with and without the domain label FACTOTUM in WN1.6]
Experimentation
The test set was randomly selected (around 1% of the synsets) and the remaining synsets were used as the training set.
[Table: test corpus for nouns and verbs]
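The slides state only that about 1% of the synsets were drawn at random as the test set. A minimal sketch of such a hold-out split; the function name, the fixed seed and the `test_fraction` parameter are illustrative assumptions, not the authors' setup.

```python
import random


def split_synsets(synset_ids, test_fraction=0.01, seed=0):
    """Randomly hold out about `test_fraction` of the synsets as a test set."""
    ids = sorted(synset_ids)            # sort first so the split is reproducible
    rng = random.Random(seed)           # private generator with a fixed seed
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]   # (training set, test set)


train, test = split_synsets([f"s{i}" for i in range(1000)])
print(len(train), len(test))  # 990 10
```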
Experimentation
Example: castle4, castling1 (n); domains: CHESS, SPORT
Variants and gloss: castle, castling (interchanging the positions of the king and a rook)
Word-domain pairs:
castle chess, castle sport, castling chess, castling sport,
interchanging chess, interchanging sport, interchanging chess, interchanging sport,
interchanging chess, interchanging sport, king chess, king sport, rook chess, rook sport
Calculation of frequency
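In the castle example, every word from the synset's variants and gloss is paired with every domain label of the synset, and the pairs are then counted to obtain frequencies. A minimal sketch, assuming whitespace tokenisation and a small hypothetical stop-word list; the slide does not specify its filtering (for instance, "positions" is absent from its pair list but is kept here), so this will not reproduce it exactly.

```python
from collections import Counter

# Hypothetical stop-word list; the original preprocessing is not specified.
STOP_WORDS = {"the", "of", "a", "an", "and", "for", "in", "on", "to"}


def word_domain_pairs(variants, gloss, domains):
    """Pair every content word of a synset's variants and gloss with each of its domains."""
    words = [w.lower() for w in variants + gloss.split()]
    words = [w for w in words if w not in STOP_WORDS]
    return [(w, d) for w in words for d in domains]


def count_pairs(training):
    """Accumulate c(w, D) over (variants, gloss, domains) training triples."""
    counts = Counter()
    for variants, gloss, domains in training:
        counts.update(word_domain_pairs(variants, gloss, domains))
    return counts


castle = (["castle", "castling"],
          "interchanging the positions of the king and a rook",
          ["chess", "sport"])
counts = count_pairs([castle])
print(counts[("king", "chess")])  # 1
```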
Experimentation
Measures
- M1: square root formula
- M2: association ratio, $\mathrm{AR}(w,D) = \Pr(w \mid D)\,\log_2 \frac{\Pr(w \mid D)}{\Pr(w)}$
- M3: logarithm formula, $\log_2 \frac{N\,c(w,D)}{c(w)\,c(D)}$
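A sketch of M2 and M3 computed from word-domain pair counts. It assumes c(w,D) is the count of a pair, c(w) and c(D) its marginals, N the total number of pairs, and relative-frequency estimates Pr(w|D) = c(w,D)/c(D) and Pr(w) = c(w)/N; these readings, and the handling of unseen pairs, are assumptions. M1 is omitted because its formula is not given here.

```python
import math
from collections import Counter


def marginals(pair_counts: Counter):
    """Derive c(w), c(D) and N from a Counter of (word, domain) pair counts."""
    c_w, c_d = Counter(), Counter()
    for (w, d), n in pair_counts.items():
        c_w[w] += n
        c_d[d] += n
    return c_w, c_d, sum(pair_counts.values())


def association_ratio(w, d, pair_counts, c_w, c_d, n_total):
    """M2: Pr(w|D) * log2(Pr(w|D) / Pr(w)) with relative-frequency estimates."""
    c = pair_counts[(w, d)]
    if c == 0:
        return 0.0                      # unseen pair: treated as no evidence
    p_w_given_d = c / c_d[d]
    p_w = c_w[w] / n_total
    return p_w_given_d * math.log2(p_w_given_d / p_w)


def log_formula(w, d, pair_counts, c_w, c_d, n_total):
    """M3: log2(N * c(w,D) / (c(w) * c(D)))."""
    c = pair_counts[(w, d)]
    if c == 0:
        return float("-inf")            # log2(0) is undefined; report -inf
    return math.log2(n_total * c / (c_w[w] * c_d[d]))


pairs = Counter({("king", "chess"): 2, ("king", "sport"): 1, ("goal", "sport"): 3})
c_w, c_d, n_total = marginals(pairs)
print(log_formula("king", "chess", pairs, c_w, c_d, n_total))        # 1.0
print(association_ratio("king", "sport", pairs, c_w, c_d, n_total))  # -0.25
```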
Experimentation
TRAINING → MATRIX OF WEIGHTS → CALCULATION → VALIDATION
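Experimentation
Ranked label proposals (example):
  Position 1: person 30.23%
  Position 2: politics 13.40%
  Position 3: law 11.08%
  ...
Vote for each domain $D_j$, expressed as a percentage: $V(D_j) = \sum_i \mathrm{weight}(w_i, D_j)$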
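One reading of the voting step: for each domain D_j, sum weight(w_i, D_j) over the words w_i of a test gloss, then report each domain's share of the total vote as a percentage and rank the domains. A minimal sketch under that reading; the `weights` layout is hypothetical, negative weights (possible with M2 and M3) are clipped to zero as an illustrative choice, and the toy weights are invented.

```python
from collections import defaultdict


def rank_domains(gloss_words, weights, top_k=3):
    """Sum per-word domain weights; return the top domains with their vote share in %.

    `weights` maps word -> {domain: weight}; this layout is an assumption.
    """
    votes = defaultdict(float)
    for w in gloss_words:
        for domain, weight in weights.get(w, {}).items():
            votes[domain] += max(weight, 0.0)   # assumption: ignore negative evidence
    total = sum(votes.values())
    if total == 0:
        return []                               # no known word: no proposal
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    return [(d, 100.0 * v / total) for d, v in ranked[:top_k]]


weights = {
    "king": {"chess": 2.0, "history": 1.0},
    "rook": {"chess": 3.0},
    "castle": {"chess": 1.0, "architecture": 3.0},
}
print(rank_domains(["castle", "king", "rook"], weights))
# [('chess', 60.0), ('architecture', 30.0), ('history', 10.0)]
```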
Evaluation and Results: nouns
- AP: accuracy of the first label
- AT: accuracy over all labels
- P: precision; R: recall; F1 = 2PR/(P+R)
- MiA: measures the success of each formula (M1, M2 or M3) when the first proposed label is correct
- MiD: measures the success of each formula (M1, M2 or M3) when the first proposed label is correct or is subsumed as correct by the domain hierarchy
[Table: results for nouns with factotum (CF)]
[Table: results for nouns without factotum (SF)]
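A sketch of some of these measures under one plausible reading: P is the share of proposed labels that are correct, R the share of reference labels that are recovered, F1 = 2PR/(P+R), and AP the share of synsets whose first proposed label is correct. These definitions are assumptions; AT, MiA and MiD are not reproduced, the synset IDs are hypothetical, and the toy data (apart from the CHESS/SPORT and QUALITY/MEDICINE labels taken from these slides) is invented.

```python
def evaluate(predictions, gold):
    """Micro-averaged precision, recall, F1 and first-label accuracy.

    predictions: synset -> ranked list of proposed domain labels
    gold:        synset -> set of reference domain labels
    """
    proposed = reference = correct = first_ok = 0
    for synset, ranked in predictions.items():
        ref = gold[synset]
        proposed += len(ranked)
        reference += len(ref)
        correct += sum(1 for label in ranked if label in ref)
        if ranked and ranked[0] in ref:
            first_ok += 1               # first proposed label is a correct one
    p = correct / proposed if proposed else 0.0
    r = correct / reference if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    ap = first_ok / len(predictions) if predictions else 0.0
    return {"AP": ap, "P": p, "R": r, "F1": f1}


predictions = {"castle#n#4": ["chess", "sport", "history"], "birthmark#n#1": ["medicine"]}
gold = {"castle#n#4": {"chess", "sport"}, "birthmark#n#1": {"quality"}}
print(evaluate(predictions, gold))
```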
Evaluation and Results: verbs
[Table: results for verbs with factotum (CF)]
[Table: results for verbs without factotum (SF)]
Evaluation and Results
- On average, the method assigns:
  - nouns: 1.23 domain labels per synset (WN Domains: 1.170)
  - verbs: 1.20 domain labels per synset (WN Domains: 1.078)
- We obtain better results with nouns
- The best average results were obtained with the M1 measure
- For nouns, the first proposed label reaches 70% accuracy
- Results for verbs are worse than for nouns; one reason may be the high number of verb synsets labelled FACTOTUM
Discussion
Monosemic words
credit_application1 (n): an application for a line of credit
Domains: SCHOOL
Proposal 1: Banking
Proposal 2: Economy
[Figure: domain hierarchy fragment: economy, banking]
Discussion
Relation between labels
academic_program1 (n): a program of education in liberal arts and sciences (usually in preparation for higher education)
Domains: PEDAGOGY
Proposal 1: School
Proposal 2: University
[Figure: domain hierarchy fragment: pedagogy, school, university]
Discussion
Relation between labels
shopping1 (n): searching for or buying goods or services; "went shopping for a reliable plumber"; "does her shopping at the mall rather than down town"
Domains: ECONOMY
Proposal 1: Commerce
[Figure: domain hierarchy fragment: social_science, commerce, economy]
Discussion
Relation between labels
fire_control_radar1 (n): radar that controls the delivery of fire on a military target
Domains: MERCHANT_NAVY
Proposal 1: Military
[Figure: domain hierarchy fragment: social_science, transport, military, merchant_navy]
Discussion
Uncertain cases
birthmark1 (n): a blemish on the skin formed before birth
Domains: QUALITY
Proposal 1: Medicine
bardolatry1 (n): idolization of William Shakespeare
Domains: RELIGION
Proposal 1: History
Proposal 2: Literature
Conclusions
- Automatically assigning domain labels to WN glosses appears to be difficult
- The proposal process is very reliable for the first proposed labels
- The proposed labels are ordered by priority
- It is possible to add new correct labels or to validate the existing ones