Title: Describing and Measuring Lexical Resources
1Describing and Measuring Lexical Resources
- Núria Bel
- PAROLE Workshop, July 2009
2Lexical Resources
- ("fiesta" NST
- ALO "fiest"
- CL (PF-AS SF-A)
- GD (F)
- KN MS
- PLC (NF)
- TYN (ABS)
- AUTHOR "juan"
- DATE "28-Aug-99"
- SITE "FB52")
Entry for "fiesta", borrowed from MT Incyta (Metal
family)
3Lexical resources
- ("abandonar" VST
- ALO "abandon"
- ARGS (((SUBJ N1 (TYPE P1))
- OPT (POBJ N1 (TYPE P0) (N1-PREP "a"))
- GFT (RFX T) (ABS-PAPL ADV/PRED))
- ((SUBJ N1)
- (DOBJ N1)
- (POBJ N1 (TYPE P0) (N1-PREP "a")))
- ((SUBJ N1 (TYPE P0)) (DOBJ N1 (TYPE P1)))
- ((SUBJ N1 (TYPE P1)) OPT (DOBJ N1)))
- CL (AR)
- PLC (NF)
- AUTHOR "juan"
- DATE "31-Aug-99"
- SITE "FB52")
Entry for "abandonar", borrowed from MT Incyta (Metal
family)
4Three problems
- Costs of development
- Coverage
- Quality and consistency
51) Costs: Manual Development
- Someone once told me: at an average of 10
minutes per entry, and 60,000 entries in an average
operational dictionary, that is about 5 person-years.
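The estimate above can be checked with a little arithmetic. The sketch below assumes roughly 2,000 working hours per person-year, a figure that is not given in the slides; under that assumption, 60,000 entries at 10 minutes each come to 5 person-years.

```python
# Back-of-the-envelope cost of building a lexicon by hand.
# ASSUMPTION: about 2,000 working hours per person-year (not stated in the slides).
HOURS_PER_PERSON_YEAR = 2000


def person_years(entries: int, minutes_per_entry: float) -> float:
    """Return the manual effort, in person-years, to encode `entries` entries."""
    total_hours = entries * minutes_per_entry / 60
    return total_hours / HOURS_PER_PERSON_YEAR


effort = person_years(entries=60_000, minutes_per_entry=10)
print(f"{effort:.1f} person-years")  # 5.0 person-years
```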
6Coverage? What happens when lexica are used?
- Briscoe and Carroll (1993): half of parse
failures on unseen test data were caused by
missing or inaccurate lexical information.
- A terminological database contains 1,400,000
terms, an e-lexicon 60,000. Why?
- Yallop et al. (2005) calculated that in the
100M-word British National Corpus, of a total of
124,120 distinct adjectives, 70,246 occur only
once.
- Why not merge existing dictionaries?
- lack of common encoding,
- no agreement on how to represent linguistic
properties,
- lack of trust.
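The Yallop et al. figure means that 70,246 of 124,120 adjective types, roughly 57%, occur only once. A minimal sketch of how such a proportion could be computed from a token list follows; the function name `hapax_share` and the toy tokens are illustrative assumptions, not part of the cited study.

```python
from collections import Counter


def hapax_share(tokens):
    """Fraction of distinct word types that occur exactly once."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)


# Toy data (hypothetical): 4 types, 2 of which occur once.
toy = ["red", "red", "big", "big", "azure", "mauve"]
print(hapax_share(toy))  # 0.5

# The figures quoted on the slide for BNC adjectives:
print(round(70_246 / 124_120, 3))  # 0.566
```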
7Quality and Consistency
- In the encoding of 2,300 Spanish adjectives by 3
people, the Kappa coefficient of human intercoder
agreement was only 0.79.
- Merlo and Stevenson (2001) ran a similar
experiment with more complex verb cases. They
say:
- "Evaluating the experts' performance ...
confirm our expectations. First, the task is
difficult, i.e. not performed at 100% or close
even by trained experts, when compared to the
gold standard, with the highest percent agreement
with Levin at 0.85. Second, with respect to
comparison of the experts among themselves, the
rate of agreement is never very high, and the
variability in agreement is considerable, ranging
from 0.53 to 0.66."
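Kappa corrects raw agreement for the agreement expected by chance. The sketch below computes Cohen's kappa for two coders. The slides do not say which kappa variant was used for the three coders, so a multi-rater variant such as Fleiss' kappa may be the closer match. The toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders: (p_o - p_e) / (1 - p_e)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotation of 10 adjectives as gradable (G) or not (N).
coder_1 = list("GGGGGNNNNN")
coder_2 = list("GGGGNNNNNG")
print(round(cohen_kappa(coder_1, coder_2), 2))  # 0.6
```

Here the coders agree on 8 of 10 items (0.8), but half of that agreement is expected by chance, so kappa drops to 0.6.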
8Probably a lexicon looks like this
9Agnese, 1544
10And at least, we want it to look like this
11Hase, 1737
12What happened in between?
- There was some normalization and standardization
but, more crucially, the changes were mostly
related to the emergence of instruments to
measure a position.
- Latitude and longitude, measured by standard in
degrees and representing angular distances from
the centre of the Earth, together with the
specially created instruments (compass,
astrolabe, chronometer), were a series of
necessary factors for the stability required to
start accumulating data (Latour, 1987).
- Note that these instruments delivered values in
a standard metric, and the values were considered
uncontroversial and were therefore trusted and
accumulated.
13Other components for success
- The magnitude of the enterprise.
- Thousands of ships sent around the world, not
2.5 person-years ...
- Instruments fostered massive acquisition.
14Other components for success
- 2) Data gathered by the navy with these
instruments was trusted, and hence accumulated
without revision, also because
- these records were sent in by sailors as
geographical measurements and were later plotted by
experts according to a standard projection,
depending on the use the data was to have.
15Some complained about losing information
17Lessons
- Data gathering is independent of its representation.
- Data gathering and its projection are done with
objective instruments.
- Measuring instead of describing? Measuring
instead of introspection?
18Measuring?
- To give a representation of facts by abstracting
them into a set of properties that can be observed
(and replicated).
- How to observe lexical properties?
- ALO "fiest"
- CL (PF-AS SF-A)
- GD (F)
- KN MS
- TYN (ABS)
19Another analogy: subatomic particles
The history of physics is a quest to demonstrate
which entities exist. Researchers need to build
experiments to demonstrate the existence of
invisible particles. The experiments turned out to be
specially built devices, instruments that were
intended to identify traces of the existence of
subatomic particles. Emulsions and film
development were able to register tracks, and
hence made the particles observable.
Differences in the traces were used to define
different particles.
20Bubble Chamber, 1950
21LRL Bubble Chamber, 1959
22And observations could be measured
23Again a problem of magnitude
- The lab had to measure a million pictures per
year.
- And quickly. Kowarski (1960):
- "Clearly the problem is that of speed, and since
human attention and action introduce a
rock-bottom bottleneck, speed can be achieved
either by pouring in parallel through many
bottlenecks, or by eliminating them altogether.
Either vast armies of slaves armed with templates
and desk calculators or few people operating a
lot of discriminating and thinking machinery. The
evolution is towards the elimination of humans,
function by function." (Image and Logic, Peter
Louis Galison)
24Already existing experiments can turn out to be
measuring instruments
- Work done in Lexical Acquisition, for instance,
can be our thinking machinery.
- Stevenson and Merlo (2001), Joanis et al.
(2007), Lapata and Keller (2004) show how
observable cues (property traces) can be
related to lexical properties.
- We have also done some experiments (Bel et al.,
2008) on classifying words (71% precision for
Spanish mass nouns, but 92% for gradual
adjectives).
- These are experiments for predicting properties
but, can they be used for measuring properties?
25Define contexts as cues of linguistic properties
and measure percentages of matched contexts
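A minimal sketch of this idea, assuming cues are written as regular expressions over the contexts in which a word occurs: the measure is the percentage of contexts each cue matches. The cue patterns and example contexts below are hypothetical illustrations and are not the cues used in Bel et al. (2008).

```python
import re


def cue_percentages(word, contexts, cues):
    """For each named cue, percentage of `contexts` in which it matches `word`."""
    result = {}
    for name, template in cues.items():
        pattern = re.compile(template.format(w=re.escape(word)), re.IGNORECASE)
        hits = sum(1 for c in contexts if pattern.search(c))
        result[name] = 100.0 * hits / len(contexts) if contexts else 0.0
    return result


# HYPOTHETICAL cues for gradable adjectives in Spanish.
CUES = {
    "degree_adverb": r"\b(muy|bastante|tan)\s+{w}\b",
    "comparative": r"\bmás\s+{w}\s+que\b",
}

contexts = [
    "una casa muy grande",
    "es más grande que la mía",
    "la casa grande del pueblo",
    "un error tan grande",
]
print(cue_percentages("grande", contexts, CUES))
# {'degree_adverb': 50.0, 'comparative': 25.0}
```

The result is a numeric profile per word rather than a yes/no label, which is what allows it to be treated as a measurement.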
26Some automatic measures might not be perfect, but
the error is known and systematic!!
Applications using these measures can take the
error into account.
27From this information, standard metrics can be
defined to map to categorical lexical properties
with a known uncertainty and error. Decisions
such as prioritizing precision can be adopted to
reduce manual revision, etc.
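One way to picture such a mapping, as a sketch only: choose a threshold on the numeric measure using a small gold-labelled sample, and report the precision and recall that the threshold yields. The sample data, the function names, and the choice of the lowest threshold that reaches a target precision are all assumptions made for illustration.

```python
def precision_recall(scores, gold, threshold):
    """Precision and recall when score >= threshold is mapped to the category."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores, gold, target_precision):
    """Lowest observed score that, used as threshold, reaches the target precision."""
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, gold, t)
        if p >= target_precision:
            return t, p, r
    return None


# HYPOTHETICAL gold sample: percentage of matched cue contexts per word,
# and whether a human coder assigned the property.
scores = [5.0, 12.0, 20.0, 35.0, 50.0, 80.0]
gold = [False, False, True, False, True, True]

print(pick_threshold(scores, gold, target_precision=0.9))
```

On this toy sample the chosen threshold is 50.0, with precision 1.0 and recall 2/3. Words scoring below the threshold would be the ones left for manual revision, and the measured precision and recall can be published alongside the lexicon so applications know the error they are working with.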
28Some proposals
- Lexical properties can be registered on a numeric
scale rather than as categories.
- The mapping of measures to categories must
follow standard projections or particular
use cases.
- The standard mapping must have a known error that
can be handled by application developers.
29- Thanks for discussing this with me!