Title: Describing and Measuring Lexical Resources

1
Describing and Measuring Lexical Resources

Núria Bel
PAROLE Workshop, July 2009

2
Lexical Resources

("fiesta" NST
ALO "fiest"
CL (PF-AS SF-A)
GD (F)
KN MS
PLC (NF)
TYN (ABS)
AUTHOR "juan"
DATE "28-Aug-99"
SITE "FB52")

Entry for fiesta, borrowed from MT Incyta (Metal family)
3
Lexical resources

("abandonar" VST
ALO "abandon"
ARGS (((SUBJ N1 (TYPE P1))
OPT (POBJ N1 (TYPE P0) (N1-PREP "a"))
GFT (RFX T) (ABS-PAPL ADV/PRED))
((SUBJ N1)
(DOBJ N1)
(POBJ N1 (TYPE P0) (N1-PREP "a")))
((SUBJ N1 (TYPE P0)) (DOBJ N1 (TYPE P1)))
((SUBJ N1 (TYPE P1)) OPT (DOBJ N1)))
CL (AR)
PLC (NF)
AUTHOR "juan"
DATE "31-Aug-99"
SITE "FB52")

Entry for abandonar, borrowed from MT Incyta (Metal family)
4
Three problems

Costs of development
Coverage
Quality and consistency

5
1) Costs: Manual Development

I was once told that, at an average of 10 minutes per entry and an average of 60,000 entries per operational dictionary, manual development takes about 5 person-years.
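
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 10 minutes and 60,000 entries come from the slide, while the figure of 2,000 working hours per person-year is an assumption made here for illustration.

```python
# Back-of-the-envelope check of the manual development cost.
MINUTES_PER_ENTRY = 10          # from the slide
ENTRIES = 60_000                # from the slide
HOURS_PER_PERSON_YEAR = 2_000   # assumption: roughly 50 weeks x 40 hours


def person_years(entries, minutes_per_entry, hours_per_year):
    """Convert a per-entry encoding time into person-years of effort."""
    total_hours = entries * minutes_per_entry / 60
    return total_hours / hours_per_year


print(person_years(ENTRIES, MINUTES_PER_ENTRY, HOURS_PER_PERSON_YEAR))  # 5.0
```

With a different assumption about working hours per year the result shifts accordingly, but it stays in the range of several person-years per dictionary.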

6
Coverage? What happens when lexica are used?

Briscoe and Carroll (1993): half of the parse failures on unseen test data were caused by missing or inaccurate lexical information.
A terminological database contains 1,400,000 terms, an electronic lexicon 60,000. Why?
Yallop et al. (2005) calculated that in the 100M-word British National Corpus, of a total of 124,120 distinct adjectives, 70,246 occur only once.
Why not merge existing dictionaries?
lack of a common encoding,
no agreement on how to represent linguistic properties,
lack of trust.

7
Quality and Consistency

In the encoding of 2,300 Spanish adjectives by 3 people, the Kappa coefficient of human intercoder agreement was only 0.79.
Merlo and Stevenson (2001) ran a similar experiment with more complex verb cases. They say:
"Evaluating the experts' performance..... confirm our expectations. First, the task is difficult, i.e. not performed at 100% or close even by trained experts, when compared to the gold standard, with the highest percent agreement with Levin at 0.85. Second, with respect to comparison of the experts among themselves, the rate of agreement is never very high, and the variability in agreement is considerable, ranging from 0.53 to 0.66."
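
To make the agreement figure concrete, here is a minimal sketch of how a kappa coefficient for more than two coders can be computed. The slide does not say which variant of kappa was used; Fleiss' kappa is assumed here as one common choice for three coders. The labels and the toy data are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per coder.

    Every item must have the same number of coders. The result is undefined
    (division by zero) if all coders use a single label for every item.
    """
    n_items = len(ratings)
    n_coders = len(ratings[0])
    label_totals = Counter()
    observed = 0.0
    for labels in ratings:
        counts = Counter(labels)
        label_totals.update(counts)
        # Proportion of agreeing coder pairs for this item.
        agreeing = sum(c * c for c in counts.values()) - n_coders
        observed += agreeing / (n_coders * (n_coders - 1))
    observed /= n_items
    # Agreement expected by chance, from the overall label proportions.
    expected = sum((t / (n_items * n_coders)) ** 2 for t in label_totals.values())
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): 4 adjectives, 3 coders, two labels.
toy = [
    ["gradable", "gradable", "gradable"],
    ["non-gradable", "non-gradable", "non-gradable"],
    ["gradable", "gradable", "non-gradable"],
    ["non-gradable", "non-gradable", "non-gradable"],
]
print(round(fleiss_kappa(toy), 3))  # 0.657
```

A single disagreement in four items already pulls the coefficient well below 1, which gives a sense of how much disagreement a value of 0.79 over 2,300 items can hide.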

8
Probably a lexicon looks like this
9
Agnese, 1544
10
And at least, we want it to look like this
11
Hase, 1737
12
What happened in between?

There was some normalization and standardization but, more crucially, the changes were mostly related to the emergence of instruments for measuring a position.
Latitude and longitude, measured by standard in degrees and representing angular distances from the centre of the Earth, together with specially created instruments (compass, astrolabe, chronometer), provided the stability required to start accumulating data (Latour, 1987).
Note that these instruments delivered values in a standard metric, and the values were considered uncontroversial and were therefore trusted and accumulated.

13
Other components for success

The magnitude of the enterprise.
Thousands of ships were sent around the world, not 2.5 person-years ...
Instruments fostered massive acquisition.

14
Other components for success

2) Data gathered by the navy with these instruments was trusted, and hence accumulated without revision, also because these records were sent in by sailors as geographical measurements and were later plotted by experts according to a standard projection, depending on the use the data was to have.

15
Some complained about losing information
16
(No Transcript)
17
Lessons

Data gathering is independent of its representation.
Data gathering and its projection are done with objective instruments.
Measuring instead of describing? Measuring instead of introspection?

18
Measuring?

To give a representation of facts, by abstracting
into a set of properties that can be observed
(and replicated).
How to observe lexical properties?
ALO "fiest"
CL (PF-AS SF-A)
GD (F)
KN MS
TYN (ABS)

19
Another analogy: subatomic particles
The history of physics is a quest to demonstrate which entities exist. Researchers need to build experiments to demonstrate the existence of invisible particles. These experiments turned out to be specially built devices, instruments intended to identify traces of the existence of subatomic particles. Emulsions and film development were able to register tracks, and hence made the particles observable.
Differences in the traces were used to define different particles.
20
Bubble Chamber, 1950
21
LRL Bubble Chamber, 1959
22
And observations could be measured
23
Again a problem of magnitude

The lab had to measure a million pictures per year.
And quickly. Kowarski (1960):
"Clearly the problem is that of speed, and since human attention and action introduce a rock-bottom bottleneck, speed can be achieved either by pouring in parallel through many bottlenecks, or by eliminating them altogether. Either vast armies of slaves armed with templates and desk calculators or few people operating a lot of discriminating and thinking machinery. The evolution is towards the elimination of humans, function by function" (Image and Logic, Peter Louis Galison)

24
Already existing experiments can turn out to be
measuring instruments

Work done in Lexical Acquisition, for instance, can be our thinking machinery.
Merlo and Stevenson (2001), Joanis et al. (2007), and Lapata and Keller (2004) show how observable cues (property traces) can be related to lexical properties.
We have also done some experiments (Bel et al., 2008) on classifying words (71% precision for Spanish Mass Nouns, but 92% for Gradual Adjectives).
These are experiments for predicting properties, but can they be used for measuring properties?

25
Define contexts as cues of linguistic properties
and measure percentages of matched contexts
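
As a rough illustration of this idea, the sketch below defines a few contexts as regular-expression cues and reports, for a given word, the percentage of its occurrences that match each cue. The cue names, the patterns, and the toy sentences are all hypothetical and are not the cues used in the experiments cited in this presentation.

```python
import re

# Hypothetical cues for a "gradable adjective" property; {w} marks the word slot.
CUES = {
    "degree_modifier": r"\b(very|quite|extremely)\s+{w}\b",
    "comparative": r"\bmore\s+{w}\s+than\b",
}


def cue_percentages(word, contexts, cues=CUES):
    """Per cue, the percentage of contexts containing `word` that match it."""
    word_re = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    relevant = [c for c in contexts if word_re.search(c)]
    if not relevant:
        return {name: 0.0 for name in cues}
    result = {}
    for name, template in cues.items():
        pattern = template.replace("{w}", re.escape(word))
        cue_re = re.compile(pattern, re.IGNORECASE)
        matched = sum(1 for c in relevant if cue_re.search(c))
        result[name] = 100.0 * matched / len(relevant)
    return result


# Toy corpus (hypothetical); the last sentence does not contain the word.
sentences = [
    "She is very tall.",
    "He is more tall than wide.",
    "A tall building stood there.",
    "The tall, quiet man left.",
    "The dog barked.",
]
print(cue_percentages("tall", sentences))
# {'degree_modifier': 25.0, 'comparative': 25.0}
```

The output is a numeric profile of the word rather than a category, which is the kind of value that can later be mapped to a lexical property.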
26
Some automatic measures might not be perfect, but
the error is known and systematic!!
Applications using these measures can take the
error into account.
27
From this information, standard metrics can be defined to map to categorical lexical properties with a known uncertainty and error. Decisions such as prioritizing precision can be adopted to reduce manual revision, etc.
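
One possible way to read this mapping is sketched below: a threshold on a numeric measure is chosen from a small hand-checked sample so that the accepted items reach a target precision, and everything below the threshold is left for manual revision. The function names, the category labels, and the sample values are hypothetical.

```python
def precision_at(threshold, scored):
    """Precision of accepting every item whose measure is >= threshold.

    `scored` is a list of (measure, has_property) pairs from a checked sample.
    Returns None when no item is accepted.
    """
    accepted = [gold for measure, gold in scored if measure >= threshold]
    return sum(accepted) / len(accepted) if accepted else None


def pick_threshold(scored, min_precision):
    """Lowest observed measure that, used as threshold, reaches min_precision."""
    for threshold in sorted({measure for measure, _ in scored}):
        precision = precision_at(threshold, scored)
        if precision is not None and precision >= min_precision:
            return threshold, precision
    return None


def project(measure, threshold):
    """Map a numeric measure to a category for a precision-first use case."""
    return "has_property" if measure >= threshold else "needs_review"


# Hypothetical checked sample: (percentage of matched contexts, gold label).
sample = [(10.0, False), (20.0, False), (30.0, True), (40.0, False),
          (50.0, True), (60.0, True), (70.0, True)]
threshold, precision = pick_threshold(sample, min_precision=0.9)
print(threshold, precision)        # 50.0 1.0
print(project(55.0, threshold))    # has_property
print(project(35.0, threshold))    # needs_review
```

Taking the lowest threshold that meets the target keeps as many items as possible out of manual revision while the expected error of the accepted set remains known.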
28
Some proposals

Lexical properties can be registered on a numeric scale rather than as categories.
The mapping of measures to categories must follow standard projections or particular use cases.
The standard mapping must have a known error that can be handled by application developers.