1
Describing and Measuring Lexical Resources
  • Núria Bel
  • PAROLE Workshop, July 2009

2
Lexical Resources
  • ("fiesta" NST
  • ALO "fiest"
  • CL (PF-AS SF-A)
  • GD (F)
  • KN MS
  • PLC (NF)
  • TYN (ABS)
  • AUTHOR "juan"
  • DATE "28-Aug-99"
  • SITE "FB52")

Entry of fiesta borrowed from MT Incyta (Metal
family)
3
Lexical Resources
("abandonar" VST
  ALO "abandon"
  ARGS (((SUBJ N1 (TYPE P1))
         OPT (POBJ N1 (TYPE P0) (N1-PREP "a"))
         GFT (RFX T) (ABS-PAPL ADV/PRED))
        ((SUBJ N1)
         (DOBJ N1)
         (POBJ N1 (TYPE P0) (N1-PREP "a")))
        ((SUBJ N1 (TYPE P0)) (DOBJ N1 (TYPE P1)))
        ((SUBJ N1 (TYPE P1)) OPT (DOBJ N1)))
  CL (AR)
  PLC (NF)
  AUTHOR "juan"
  DATE "31-Aug-99"
  SITE "FB52")

Entry of abandon borrowed from MT Incyta (Metal
family)
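As an aside, these entries are feature structures in a Lisp-like notation. Below is a minimal, purely illustrative Python sketch (not the actual Metal/Incyta tooling; the helper names are invented) that reads such an entry into a lemma, a category and an attribute-value map. The flat pairing suits simple entries like "fiesta"; the nested ARGS frames of "abandonar" would need a deeper walk of the parsed tree.

import re

# Tokenize a Metal-style entry: parentheses, quoted strings, bare symbols.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse(tokens):
    """Build nested lists from a flat token stream (recursive descent)."""
    items = []
    while tokens:
        tok = tokens.pop(0)
        if tok == '(':
            items.append(parse(tokens))
        elif tok == ')':
            return items
        elif tok.startswith('"'):
            items.append(tok.strip('"'))
        else:
            items.append(tok)
    return items

ENTRY = ('("fiesta" NST ALO "fiest" CL (PF-AS SF-A) GD (F) KN MS '
         'PLC (NF) TYN (ABS) AUTHOR "juan" DATE "28-Aug-99" SITE "FB52")')

tree = parse(TOKEN_RE.findall(ENTRY))[0]
lemma, category = tree[0], tree[1]
# Pair up attribute names with their values: ALO -> "fiest", CL -> [...], etc.
features = dict(zip(tree[2::2], tree[3::2]))
print(lemma, category, features)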
4
Three problems
  • Costs of development
  • Coverage
  • Quality and consistency

5
1) Costs: Manual Development
  • Someone once told me that, at an average of 10
    minutes per entry and an average of 60,000 entries
    per operative dictionary, a lexicon costs about
    5 person-years (a back-of-the-envelope check
    follows below).
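The figure follows from simple arithmetic; in the quick check below, the 2,000 working hours per person-year is my own assumption, not a number from the slide:

# Back-of-the-envelope check of the manual development cost.
minutes_per_entry = 10           # slide's figure
entries = 60_000                 # slide's figure: average operative dictionary
hours_per_person_year = 2_000    # assumption: ~8 h/day x ~250 days/year

total_hours = entries * minutes_per_entry / 60         # 10,000 hours
person_years = total_hours / hours_per_person_year     # 5.0
print(f"{total_hours:,.0f} hours  ~  {person_years:.1f} person-years")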

6
Coverage? What happens when lexica are used?
  • Briscoe and Carroll (1993): half of parse
    failures on unseen test data were caused by
    missing or inaccurate lexical information.
  • A terminological database contains 1,400,000
    terms; an e-lexicon only 60,000. Why?
  • Yallop et al. (2005) calculated that in the
    100M-word British National Corpus, of a total of
    124,120 distinct adjectives, 70,246 occur only
    once.
  • Why not merge existing dictionaries?
  • lack of common encoding,
  • no agreement on how to represent linguistic
    properties,
  • lack of trust.

7
Quality and Consistency
  • In the encoding of 2,300 Spanish adjectives by 3
    people, the Kappa coefficient of human intercoder
    agreement was only 0.79 (the statistic is sketched
    below).
  • Merlo and Stevenson (2001) performed a similar
    experiment with more complex verb cases. They
    say:
  • "Evaluating the experts' performance ...
    confirm[s] our expectations. First, the task is
    difficult, i.e. not performed at 100% or close
    even by trained experts, when compared to the
    gold standard, with the highest percent agreement
    with Levin at 0.85. Second, with respect to
    comparison of the experts among themselves, the
    rate of agreement is never very high, and the
    variability in agreement is considerable, ranging
    from 0.53 to 0.66."
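For reference, the kappa statistic quantifies agreement beyond chance. A minimal sketch of the two-coder (Cohen) case with invented labels follows; the study above involved three coders and 2,300 adjectives, for which agreement is usually reported as averaged pairwise kappa or Fleiss' kappa:

from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Pairwise Cohen's kappa: agreement between two coders beyond chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    labels = set(labels_a) | set(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy example: two coders tagging adjectives as gradual (G) or not (N).
coder1 = ["G", "G", "N", "G", "N", "N", "G", "G"]
coder2 = ["G", "N", "N", "G", "N", "G", "G", "G"]
print(round(cohen_kappa(coder1, coder2), 2))   # 0.47 on this toy data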

8
Probably a lexicon looks like this
9
Agnese, 1544
10
And, at least, we want it to look like this
11
Hase, 1737
12
What happened in between?
  • There was some normalization and standardization,
    but, more crucially, the changes were mostly
    related to the emergence of instruments to
    measure a position.
  • Latitude and longitude, measured by standard in
    degrees and representing angular distances from
    the centre of the Earth, together with the
    specially created instruments (compass,
    astrolabe, chronometer), were the factors
    necessary for the stability required to start
    accumulating data (Latour, 1987).
  • Note that these instruments delivered values in
    a standard metric, and the values were considered
    uncontroversial and therefore trusted and
    accumulated.

13
Other components for success
  • 1) The magnitude of the enterprise.
  • Thousands of ships were sent around the world;
    not a 2.5 person-year effort...
  • Instruments fostered massive acquisition.

14
Other components for success
  • 2) Data gathered by the navy with these
    instruments was trusted, and hence accumulated
    without revision, also because
  • these registers were sent in by sailors as
    geographical measures and were later plotted by
    experts according to a standard projection,
    depending on the use the data was to have.

15
Some complained about losing information
16
(No Transcript)
17
Lessons
  • Data gathering independent of its representation.
  • Data gathering and its projection are done by
    objective instruments.
  • Measuring instead of describing? Measuring
    instead of introspection?

18
Measuring?
  • To give a representation of facts by abstracting
    them into a set of properties that can be observed
    (and replicated).
  • How to observe lexical properties?
    ALO "fiest"
    CL (PF-AS SF-A)
    GD (F)
    KN MS
    TYN (ABS)

19
Another analogy: subatomic particles
The history of physics is a quest to demonstrate
which entities exist. Researchers need to build
experiments to demonstrate the existence of
invisible particles. These experiments turned out
to be specially built devices, instruments intended
to identify traces of the existence of subatomic
particles. Emulsions and film development were able
to register tracks, and hence made the particles
observable. Differences in the traces were used to
define different particles.
20
Bubble Chamber, 1950
21
LRL Bubble Chamber, 1959
22
And observations could be measured
23
Again a problem of magnitude
  • The lab had to measure a million pictures per
    year.
  • And quickly. Kowarski (1960):
  • "Clearly the problem is that of speed, and since
    human attention and action introduce a
    rock-bottom bottleneck, speed can be achieved
    either by pouring in parallel through many
    bottlenecks, or by eliminating them altogether.
    Either vast armies of slaves armed with templates
    and desk calculators or few people operating a
    lot of discriminating and thinking machinery. The
    evolution is towards the elimination of humans,
    function by function." (Image and Logic, Peter
    Louis Galison)

24
Already existing experiments can turn out to be
measuring instruments
  • Work done in Lexical Acquisition, for instance,
    can be our "thinking machinery".
  • Stevenson and Merlo (2001), Joanis et al.
    (2007), and Lapata and Keller (2004) show how
    observable cues (property traces) can be put in
    relation with lexical properties.
  • We have also done some experiments (Bel et al.,
    2008) on classifying words (71% precision for
    Spanish Mass Nouns, but 92% for Gradual
    Adjectives).
  • These are experiments for predicting properties,
    but can they be used for measuring properties?

25
Define contexts as cues of linguistic properties
and measure percentages of matched contexts (a
sketch follows below).
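A minimal sketch of what such a measurement could look like; the cue patterns, the toy corpus and the target noun are invented for illustration and are not the cues used in the cited experiments:

import re

# Illustrative cue patterns for Spanish mass-noun behaviour (invented,
# not the cues actually used in the cited experiments).
CUES = {
    "bare_singular_object": r"\b(?:bebe|compra|vende)\s+{w}\b",  # "bebe agua"
    "quantified_by_mucho":  r"\bmuch[oa]\s+{w}\b",               # "mucha agua"
    "plural_form":          r"\b{w}s\b",                         # evidence against a mass reading
}

def cue_profile(word, sentences):
    """For each cue, the share of the word's occurrences that match it."""
    occurrences = [s for s in sentences if re.search(rf"\b{word}s?\b", s)]
    profile = {}
    for name, pattern in CUES.items():
        regex = re.compile(pattern.format(w=word))
        hits = sum(bool(regex.search(s)) for s in occurrences)
        profile[name] = hits / len(occurrences) if occurrences else 0.0
    return profile

# Toy corpus; real measurements would run over a large parsed corpus.
corpus = [
    "juan bebe agua cada mañana",
    "hay mucha agua en el pozo",
    "el agua del río baja fría",
    "compra agua y pan",
]
print(cue_profile("agua", corpus))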
26
Some automatic measures might not be perfect, but
the error is known and systematic!!
Applications using these measures can take the
error into account.
27
From this information, standard metrics can be
defined that map to categorical lexical properties
with a known uncertainty and error. Decisions
such as prioritizing precision can be adopted to
reduce manual revision, etc. (see the sketch below).
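A minimal sketch of such a precision-first mapping, assuming a small labelled development set is available; the scores, gold labels and the 0.75 precision floor are invented for illustration:

def pick_threshold(scores, gold, min_precision=0.75):
    """Lowest score cutoff whose precision on a labelled development set
    stays at or above min_precision (a precision-first mapping policy)."""
    best = None
    for cut in sorted(set(scores)):
        predicted = [s >= cut for s in scores]
        tp = sum(p and g for p, g in zip(predicted, gold))
        fp = sum(p and not g for p, g in zip(predicted, gold))
        if tp and tp / (tp + fp) >= min_precision:
            best = cut if best is None else min(best, cut)
    return best

# Invented development data: cue-based scores and gold mass-noun labels.
dev_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
dev_gold   = [False, False, False, True, False, True, True, True]

cut = pick_threshold(dev_scores, dev_gold, min_precision=0.75)
print("cutoff:", cut)   # words scoring >= cutoff get the category automatically;
                        # the rest are routed to manual revision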
28
Some proposals
  • Lexical properties can be registered on a numeric
    scale rather than as categories.
  • The mapping of measures to categories must be done
    according to standard projections or to particular
    use cases.
  • The standard mapping must have a known error that
    can be handled by application developers.

29
  • Thank you for discussing this with me!