EMPIRICAL INVESTIGATIONS OF ANAPHORA AND SALIENCE



1
EMPIRICAL INVESTIGATIONS OF ANAPHORA AND SALIENCE
  • Massimo Poesio, Università di Trento and
    University of Essex

Vilem Mathesius Lectures Praha, 2007
2
Plan of the series
  • Wednesday: Annotating context dependence, and
    particularly anaphora
  • Yesterday: Using anaphorically annotated corpora
    to investigate local and global salience
  • Today: Using anaphorically annotated corpora to
    investigate anaphora resolution

3
Today's lecture
  • The Vieira / Poesio work on robust definite
    description resolution
  • Bridging references
  • Discourse-new
  • (If time allows) Task-oriented evaluation

4
Outline/Punchline
  • Starting point: the robust system for resolving
    definite descriptions in (Poesio and Vieira,
    1998, 2000)
  • Using heuristics and accessing WordNet
  • Large-scale evaluation
  • Weaker aspect of the system: bridging
    descriptions
  • Difficult to annotate reliably
  • Main problem: lack of lexical and commonsense
    knowledge
  • Need to keep track of salient entities
    (focusing?)

5
Subsequent work
  • We have been working to improve the performance
    of the system by developing:
  • Better annotation methods (MATE, GNOME)
  • Unsupervised methods for lexical acquisition
  • Basic vector-space methods for synonymy (Poesio,
    Schulte im Walde, and Brew, 1998)
  • Syntactic constructions for meronymy (Poesio et
    al, LREC 2002)
  • Focus-tracking methods (Poesio et al, 2000, to
    appear)

6
Background: the Vieira / Poesio algorithm
7
Preliminary corpus study (Poesio and Vieira, 1998)
Annotators asked to classify about 1,000 definite
descriptions from the ACL/DCI corpus (Wall Street
Journal texts) into three classes
  • DIRECT ANAPHORA: a house ... the house
  • DISCOURSE-NEW: the belief that ginseng tastes
    like spinach is more widespread than one would
    expect
  • BRIDGING DESCRIPTIONS: the flat ... the living
    room; the car ... the vehicle

8
Poesio and Vieira, 1998
  • Results:
  • More than half of the def descriptions are
    first-mention
  • Subjects didn't always agree on the
    classification of an antecedent (bridging
    descriptions 8)

9
The Vieira / Poesio system for robust definite
description resolution
  • Follows a SHALLOW PROCESSING approach (Carter,
    1987; Mitkov, 1998); it only uses:
  • Structural information (extracted from Penn
    Treebank)
  • Existing lexical sources (WordNet)
  • (Very little) hand-coded information

(Vieira & Poesio, 1996 / Vieira, 1998 /
Vieira & Poesio, 2001)
10
Methods for resolving direct anaphors
  • DIRECT ANAPHORA
  • the red car, the car, the blue car:
    premodification heuristics
  • segmentation: approximated with "loose" windows

11
Methods for resolving discourse-new definite
descriptions
  • DISCOURSE-NEW DEFINITES
  • the first man on the Moon, the fact that Ginseng
    tastes of spinach: a list of the most common
    functional predicates (fact, result, belief) and
    modifiers (first, last, only, ...)
  • heuristics based on structural information (e.g.,
    establishing relative clauses)

12
A knowledge-based classification of bridging
descriptions (Vieira, 1998)
  • Based on LEXICAL RELATIONS such as synonymy,
    hyponymy, and meronymy, available from a lexical
    resource such as WordNet: the flat ... the living
    room
  • The antecedent is introduced by a PROPER
    NAME: Bach ... the composer
  • The anchor is a NOMINAL MODIFIER introduced as
    part of the description of a discourse
    entity: selling discount packages ... the discounts

13
continued (cases NOT attempted by our system)
  • The anchor is introduced by a VP: Kadane Oil is
    currently drilling two oil wells. The activity ...
  • The anchor is not explicitly mentioned in the
    text, but is a discourse topic: the industry (in
    a text about oil companies)
  • The resolution depends on more general
    commonsense knowledge: last week's earthquake ...
    the suffering people

14
Distribution of bridging descriptions
Class Total Percentage
Syn/Hyp/Mer 12/14/12 19
Names 49 24
Compound Nouns 25 12
Events 40 20
Discourse Topic 15 7
Inference 37 18
Total 204 100
15
The (hand-coded) decision tree
  1. Apply safe discourse-new recognition heuristics
  2. Attempt to resolve as same-head anaphora
  3. Attempt to classify as discourse new
  4. Attempt to resolve as bridging description.
    Search backward 1 sentence at a time and apply
    heuristics in the following order:
    a. Named entity recognition heuristics (R=.66,
       P=.95)
    b. Heuristics for identifying compound nouns
       acting as anchors (R=.36)
    c. Access WordNet (R, P about .28)
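The cascade above can be sketched as a chain of heuristic tests. The predicate and modifier lists come from the earlier slide, but every function here is a toy stand-in for the actual Vieira / Poesio rules, not the system itself:

```python
# Sketch of the hand-coded decision cascade (illustrative stand-ins only).
SPECIAL_PREDICATES = {"fact", "result", "belief"}   # functional predicates from the slides
SPECIAL_MODIFIERS = {"first", "last", "only"}       # modifiers from the slides

def safe_discourse_new(head, modifiers):
    # Step 1: high-precision discourse-new cues.
    return head in SPECIAL_PREDICATES or bool(SPECIAL_MODIFIERS & set(modifiers))

def same_head_antecedent(head, prior_heads):
    # Step 2: direct anaphora = same head noun seen earlier in the text.
    return head in prior_heads

def classify_dd(head, modifiers, prior_heads):
    """Classify a definite description by the cascade order on the slide."""
    if safe_discourse_new(head, modifiers):
        return "discourse-new"
    if same_head_antecedent(head, prior_heads):
        return "direct-anaphora"
    # Steps 3-4 (more DN heuristics; bridging via NE / compounds / WordNet) omitted.
    return "bridging-or-unknown"

print(classify_dd("car", [], {"car"}))      # direct-anaphora
print(classify_dd("belief", [], set()))     # discourse-new
```

The key design point the slide makes is ordering: high-precision discourse-new cues fire before the (recall-oriented) anaphora and bridging heuristics.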

16
Overall Results
  • Evaluated on a test corpus of 464 definite
    descriptions
  • Overall results

R P F
Version 1 53 76 62
Version 2 57 70 62
D-N def 77 77 77
17
Overall Results
  • Results for each type of definite description

R P F
Direct anaphora 62 83 71
Disc new 69 72 70
Bridging 29 38 32.9
18
Questions raised by the Vieira / Poesio work
  • Do these results hold for larger datasets?
  • Do discourse-new detectors help?
  • Bridging
  • How to define the phenomenon?
  • Where to get the information?
  • How to combine salience with lexical /
    commonsense knowledge?
  • Can such a system be helpful for applications?

19
Direction 2: bridging
  • Check out ACL 04 presentation

20
Mereological bridging references
Cartonnier (Filing Cabinet) with Clock: This piece
of mid-eighteenth-century furniture was meant to
be used like a modern filing cabinet: papers were
placed in leather-fronted cardboard boxes (now
missing) that were fitted into the open shelves.
A large table decorated in the same manner would
have been placed in front for working with those
papers. Access to the cartonnier's lower half
can only be gained by the doors at the sides,
because the table would have blocked the front.
21
PREVIOUS RESULTS
  • A series of experiments using the Poesio / Vieira
    dataset, containing 204 bridging references,
    including 39 WordNet bridges
  • (Vieira and Poesio, 2000; but also Carter, 1985;
    Hobbs, in a number of papers; etc.): need lexical
    knowledge
  • But even large lexical resources such as WordNet
    are not enough, particularly for mereological
    references (Poesio et al, 1997; Vieira and
    Poesio, 2000; Poesio, 2003; Garcia-Almanza, 2003)
  • Partial solution: use lexical acquisition (HAL,
    Hearst-style construction method). Best results
    (for mereology): construction-style

22
FINDING MERONYMICAL RELATIONS USING SYNTACTIC
INFORMATION
  • Some syntactic constructions suggest semantic
    relations
  • (Cf. Hearst 1992, 1998 for hyponyms)
  • Ishikawa 1998, Poesio et al 2002 use syntactic
    constructions to extract mereological information
    from corpora:
  • The WINDOW of the CAR
  • The CAR's WINDOW
  • The CAR WINDOW
  • See also Berland & Charniak 1999, Girju et al
    2002
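The three constructions above can be approximated with surface patterns. This is only a toy sketch: the extraction in Ishikawa 1998 / Poesio et al 2002 worked over parsed corpora, and the bare noun-noun compound pattern over-generates badly on raw text:

```python
import re

# Toy extractor for the three constructions on the slide:
#   "the WINDOW of the CAR", "the CAR's WINDOW", "the CAR WINDOW".
# Each pattern yields a (whole, part) candidate pair.
PATTERNS = [
    (re.compile(r"\bthe (\w+) of the (\w+)\b"),
     lambda m: (m.group(2), m.group(1))),            # part-of-whole order reversed
    (re.compile(r"\bthe (\w+)'s (\w+)\b"),
     lambda m: (m.group(1), m.group(2))),            # genitive
    (re.compile(r"\bthe (\w+) (\w+)\b"),
     lambda m: (m.group(1), m.group(2))),            # compound: very noisy on raw text!
]

def extract_part_of(text):
    """Return (whole, part) candidate pairs suggested by the constructions."""
    pairs = []
    for pattern, order in PATTERNS:
        for m in pattern.finditer(text):
            pairs.append(order(m))
    return pairs

print(extract_part_of("the car's window"))   # [('car', 'window')]
```

In practice such candidate pairs would be filtered by frequency over a large corpus before being trusted as meronymy evidence.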

23
LEXICAL RESOURCES FOR BRIDGING A SUMMARY
Class Syn Hyp Mer Total WN
Total 12 14 12 38
WordNet 4 (33.3) 8 (57.1) 3 (33.3) 15 (39)
HAL 4 (33.3) 2 (14.3) 2 (16.7) 8 (22.2)
Constructions 1 (8.3) 0 8 (66.7) 9 (23.7)
(All using the Vieira / Poesio dataset.)
24
FOCUSING AND MEREOLOGICAL BRIDGES
Cartonnier (Filing Cabinet) with Clock: This piece
of mid-eighteenth-century furniture was meant to
be used like a modern filing cabinet: papers were
placed in leather-fronted cardboard boxes (now
missing) that were fitted into the open shelves.
A large table decorated in the same manner would
have been placed in front for working with those
papers. Access to the cartonnier's lower half
can only be gained by the doors at the sides,
because the table would have blocked the front.
(See Sidner, 1979; Markert et al, 1995.)
25
FOCUS (CB) TRACKING + GOOGLE SEARCH (POESIO, 2003)
  • Analyzed 169 associative BDs in GNOME corpus (58
    mereology)
  • Correlation between distance and focusing (Poesio
    et al, 2004) and choice of anchor
  • 77.5% of anchors in same or previous sentence;
    95.8% in last five sentences
  • CB(U-1) anchor for only 33.6% of BDs,
  • but 89% of anchors had been CB or CP
  • Using Google distance to choose among salient
    anchor candidates

26
FINDING MEREOLOGICAL RELATIONS USING GOOGLE
  • Lexical vicinity measure (for MERONYMS) between
    NBD and NPA
  • Search in Google for "the NBD of the NPA" (cf.
    Ishikawa, 1998; Poesio et al, 2002)
  • E.g., the drawer of the cabinet
  • Choose as anchor the PA whose NPA results in the
    greater number of hits
  • Preliminary results for associative BDs: around
    70% P/R (by hand)
  • See also Markert et al, 2003, 2005; Modjeska et
    al, 2003

27
NEW EXPERIMENTS (Poesio et al, 2004)
  • Using the GNOME corpus
  • 58 mereological bridging refs realized by the-nps
  • 153 mereological bridging references in total
  • Reliably annotated
  • Completely automatic feature extraction
  • Google and WordNet for lexical distance
  • Using (an approximation of) salience
  • Using machine learning to combine the features

28
More (and reliably annotated) data the GNOME
corpus
  • Texts from 3 genres (museum descriptions,
    pharmaceutical leaflets, tutorial dialogues)
  • Reliably annotated syntactic, semantic and
    discourse information
  • grammatical function, agreement features
  • anaphoric relations
  • uniqueness, ontological information, animacy,
    genericity, ...
  • Reliable annotation of bridging references
  • http://cswww.essex.ac.uk/Research/NLE/corpora/GNOME

29
METHODS
  • Salience features
  • Utterance distance
  • First mention
  • Global first mention (approximate CB)
  • Lexical distance
  • WordNet (using a pure hypernym-based search
    strategy)
  • Google
  • Tried both separately and together
  • Statistical classifiers: MLP, Naïve Bayes
  • (MatLab / Weka ML Library)

30
Lexical Distance 1 (WordNet)
  • Computing WordNet Distance:
  • Get the head noun of the anaphor and find all the
    (noun) senses for the head noun.
  • Get all the noun senses for the head noun of the
    potential antecedent under consideration.
  • Retrieve the hypernym trees from WordNet for each
    sense of the anaphor and the antecedent.
  • Traverse each unique path in these trees, find
    a common parent for the anaphor and the
    antecedent, and count the no. of nodes they are
    apart.
  • Select the least-distance path across all
    combinations.
  • If no common parent is found, assign a
    hypothetical distance (30).
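The distance computation above can be sketched as follows. A tiny hand-coded taxonomy stands in for WordNet's hypernym trees, and per-sense enumeration is omitted; only the traversal and the fallback distance of 30 follow the slide:

```python
# Sketch of hypernym-tree distance with a toy taxonomy in place of WordNet.
TOY_HYPERNYMS = {
    "car": "vehicle", "vehicle": "artifact",
    "window": "opening", "opening": "artifact",
    "artifact": "entity",
}

def hypernym_path(noun):
    """Path from a noun up to the taxonomy root."""
    path = [noun]
    while path[-1] in TOY_HYPERNYMS:
        path.append(TOY_HYPERNYMS[path[-1]])
    return path

def wn_distance(a, b, no_path=30):
    """Nodes apart via the closest common parent; 30 if none (as on the slide)."""
    pa, pb = hypernym_path(a), hypernym_path(b)
    best = no_path
    for i, node in enumerate(pa):
        if node in pb:
            best = min(best, i + pb.index(node))
    return best

print(wn_distance("car", "vehicle"))   # 1
print(wn_distance("window", "car"))    # 4 (window->opening->artifact<-vehicle<-car)
```

The real system would run this over every sense pair of the two head nouns and keep the minimum.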

31
Lexical Distance 1 (WordNet)
32
Lexical Distance 2 (Google)
  • As in (Poesio, 2003)
  • But use the Google API to access the Google
    search engine
  • Computing Google hits:
  • Get the head noun for the BR and the potential
    candidate.
  • Check whether the potential candidate is a mass
    or count noun.
  • If count, build the query as "the body of the
    person" and search for the pattern.
  • Retrieve the no. of Google hits.
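The hit-based anchor choice can be sketched like this. The stub dictionary stands in for live Google API counts (the real system queried the search engine), and all numbers below are invented for illustration:

```python
# Sketch of anchor selection by web-hit counts for "the X of the Y" patterns.
# FAKE_HITS is a stand-in for Google API hit counts; all values are made up.
FAKE_HITS = {
    "the drawer of the cabinet": 120_000,
    "the drawer of the table": 45_000,
    "the drawer of the clock": 900,
}

def hits(query):
    """Stand-in for a live search-engine hit count."""
    return FAKE_HITS.get(query, 0)

def choose_anchor(bridging_head, candidate_heads):
    """Pick the candidate whose 'the X of the Y' pattern gets the most hits."""
    def score(cand):
        return hits(f"the {bridging_head} of the {cand}")
    return max(candidate_heads, key=score)

print(choose_anchor("drawer", ["cabinet", "table", "clock"]))  # cabinet
```

On the slides this lexical score is not used alone: it chooses among anchor candidates already restricted by salience.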

33
WN vs GOOGLE
Description Results
No path in WordNet 503/1720
No path in WordNet between BD and correct anchor 10/58
Anchor with Min WN Distance correct 8/58
Zero Google Hits 1089/1720
Zero Google Hits for correct anchor 24/58
Max Google Hits identify correct candidate 8/58
34
A FEW ML ISSUES
  • Linear scaling applied to WN distance and Google
    hits so that all values are within the same range.
  • When the entire training data set was used the
    results were poor due to heavy imbalance in the
    data:
  • 58 in class A (4%), 1350 in class B
  • Henceforth used 1:1 and 1:3 training data sets
  • Tested first with the 58 the-nps, then with all
    153 mereological references
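The linear scaling mentioned above is standard min-max scaling; a minimal sketch (not the exact code used in the experiments):

```python
# Min-max linear scaling so WN distances and Google hit counts
# share a common range before being fed to the classifiers.
def linear_scale(values, lo=0.0, hi=1.0):
    """Map values linearly onto [lo, hi]; constant inputs all map to lo."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

print(linear_scale([30, 5, 17.5]))  # [1.0, 0.0, 0.5]
```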

35
BASELINES
BASELINE ACCURACY
Random choice (previous 5) 4
Random choice (previous) 19
Random choice among FM 21.3
Min Google Distance 13.8
Min WN Distance 13.8
FM entity in previous sentence 31
Min Google in previous sentence 17.2
Min WN in previous sentence 25.9
Min Google among FM 12
Min WN among FM 24.1
36
RESULTS (58 THE-NPs, 50/50)
WN DISTANCE GOOGLE DISTANCE
MatLab NN, self-tuned 92 (79.3) 89 (76.7)
Weka NN Algorithm 91 (78.4) 86 (74.1)
Weka Naïve Bayes 88 (75.9) 85 (73.3)
Prec Recall F
WN distance 75.4 84.5 79.6
Google distance 70.6 86.2 77.6
37
MORE RESULTS
1:3 dataset
Accuracy F
WN distance 80.6 55.7
Google distance 82 56.7
all 153 mereological BRs
Accuracy F
WN distance 224 (74.2) 76.3
Google distance 230 (75.2) 75.8
38
MEREOLOGICAL BDS REALIZED WITH BARE-NPS
The combination of rare and expensive materials
used on this cabinet indicates that it was a
particularly expensive commission. The four
Japanese lacquer panels date from the mid- to
late 1600s and were created with a technique
known as kijimaki-e. For this type of lacquer,
artisans sanded plain wood to heighten its strong
grain and used it as the background of each
panel. They then added the scenic elements of
landscape, plants, and animals in raised lacquer.
Although this technique was common in Japan, such
large panels were rarely incorporated into French
eighteenth-century furniture. Heavy Ionic
pilasters, whose copper-filled flutes give an
added rich color and contrast to the gilt-bronze
mounts, flank the panels. Yellow jasper, a
semiprecious stone, rather than the usual marble,
forms the top.
39
HARDER TEST
Using classifiers trained on balanced / slightly
unbalanced data (the-nps) on unbalanced ones
(10-fold cross validation)
Distance Balance Acc. on bal F on bal Acc. on unbal F on unbal
WN 1:1 / 1:3 70.2 / 75.9 .7 / .4 80.2 / 91.7 .2 / 0
Google 1:1 / 1:3 64.4 / 79.8 .7 / .5 63.6 / 88.4 .1 / .3
WN + Google 1:1 / 1:3 66.3 / 77.9 .6 / .4 65.3 / 92.5 .2 / .5
40
DISCUSSION
  • Previous results:
  • Construction-based techniques provide adequate
    lexical resources, particularly when using the
    Web as a corpus
  • But need to combine lexical knowledge and
    salience modeling
  • This work:
  • Combining (simple) salience with lexical
    resources results in significant improvements
  • Future work:
  • Larger dataset
  • Better approximation of focusing

41
Back to discourse-new detection
  • The GUITAR system
  • Recent results

42
GUITAR (Kabadjov, to appear)
  • A robust, usable anaphora resolution system
    designed to work as part of an XML pipeline
  • Incorporates:
  • Pronouns: the Mitkov algorithm
  • Definite descriptions: the Vieira / Poesio
    algorithm
  • Proper nouns: the Bontcheva alg.
  • Several versions:
  • Version 1 (Poesio & Kabadjov, 2004): direct
    anaphora
  • Version 2: DN detection
  • Version 3: proper name resolution
  • Freely available from
    http://privatewww.essex.ac.uk/malexa/GuiTAR/

43
DISCOURSE-NEW DEFINITE DESCRIPTIONS
(1) Toni Johnson pulls a tape measure across the
front of what was once a stately Victorian
home. (2) The Federal Communications Commission
allowed American Telephone & Telegraph Co.
to continue offering discount phone services
for large-business customers and said it
would soon re-examine its regulation of
the long-distance market.
Poesio and Vieira (1998): about 66% of definite
descriptions in their texts (WSJ) are
discourse-new
44
WOULD DNEW RECOGNITION HELP?
First version of GUITAR without DN detection on a
subset of DDs in the GNOME corpus: 574 DDs, of which
- 184 anaphoric (32%)
- 390 discourse-new (67.9%)
Total Sys Ana Corr NM WM SM R P F
574(184) 198 457(119) 38 27 52 (26.3) 79.6(60.1) 79.6(64.7) 79.6(62.3)
45
SPURIOUS MATCHES
  • If your doctor has told you in detail HOW MUCH to
    use and HOW OFTEN then keep to this advice.
  • ..
  • If you are not sure then follow the advice on
    the back of this leaflet.

46
GOALS OF THE WORK
  • Vieira and Poesio's (2000) system incorporated
    DISCOURSE-NEW DD DETECTORS (P=69, R=72, F=70.5)
  • Two subsequent strands of work:
  • Bean and Riloff (1999), Uryupina (2003) developed
    improved detectors (e.g., Uryupina F=86.9)
  • Ng and Cardie (2002) questioned whether such
    detectors improve results
  • Our project: systematic investigation of whether
    DN detectors actually help
  • ACL 04: ref res features, preliminary results
  • THIS WORK: results of further experiments

47
DN CLASSIFIER: THE UPPER BOUND
  • Current number of SMs: 52/198 (26.3%)
  • If SM = 0, P/R/F overall: 509/574 = 88.7
  • (P/R/F on anaphora only: 119/146 = 81.5)

48
VIEIRA AND POESIO'S DN DETECTORS
Recognize SEMANTICALLY FUNCTIONAL descriptions:
SPECIAL PREDICATES / PREDICATE MODIFIERS
(HAND-CODED): the front of what was once a
stately Victorian home; the best chance of
saving the youngest children. PROPER
NAMES: the Federal Communications
Commission.
LARGER SITUATION descriptions (HAND-CODED):
the City, the sun, ...
49
VIEIRA AND POESIO'S DN DETECTORS, II
PREDICATIVE descriptions: COPULAR CLAUSES:
he is the hardworking son of a Church of
Scotland minister.
APPOSITIONS: Peter Kenyon, the Chelsea
chief executive.
Descriptions ESTABLISHED by modification:
The warlords and private militias who were once
regarded as the West's staunchest allies are
now a greater threat to the country's
security than the Taliban. (Guardian,
July 13th 2004, p.10)
50
VIEIRA AND POESIO'S DECISION TREES
Tried both hand-coded and ML.
Hand-coded decision tree:
1. Try the DN detectors with highest accuracy
(attempt to classify as functional using special
predicates, and as predicative by looking for
apposition)
2. Attempt to resolve the DD as direct anaphora
3. Try other DN detectors in order: proper name,
establishing clauses, proper name
modification ...
ML DT: swap 1. and 2.
51
VIEIRA AND POESIO'S RESULTS
P R F
Baseline 50.8 100 67.4
DN detection 69 72 70
Hand-coded DT(partial) 62 85 71.7
Hand-coded DT(total) 77 77 77
ID3 75 75 75
52
BEAN AND RILOFF (1999)
Developed a system for identifying DN definites.
Adopted syntactic heuristics from Vieira and
Poesio, and developed several new techniques:
SENTENCE-ONE (S1) EXTRACTION: identify as
discourse-new every description found in the
first sentence of a text.
DEFINITE PROBABILITY: create a list of nominal
groups encountered at least 5 times with a
definite article, but never with an indefinite.
VACCINES: block heuristics when prob. too low.
53
BEAN AND RILOFF'S ALGORITHM
1. If the head noun appeared earlier, classify as
anaphoric
2. If DD occurs in S1 list, classify as DN unless
vaccine
3. Classify DD as DN if one of the following
applies: (a) high definite probability
(b) matches an EHP pattern (c) matches one of
the syntactic heuristics
4. Classify as anaphoric
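The definite-probability list described above can be sketched as a simple count over (determiner, noun) pairs. The corpus, helper name, and threshold below are toy illustrations of the idea, not Bean and Riloff's code:

```python
from collections import Counter

# Sketch of the "definite probability" list: nominals seen at least 5 times
# with a definite article and never with an indefinite one.
def definite_only_nominals(np_tokens, min_count=5):
    """np_tokens: iterable of (determiner, head_noun) pairs from a corpus."""
    defs, indefs = Counter(), Counter()
    for det, noun in np_tokens:
        if det == "the":
            defs[noun] += 1
        elif det in ("a", "an"):
            indefs[noun] += 1
    return {n for n, c in defs.items() if c >= min_count and indefs[n] == 0}

# Toy corpus: "the FBI" is definite-only; "house" also appears indefinite.
corpus = [("the", "fbi")] * 6 + [("the", "house")] * 6 + [("a", "house")]
print(definite_only_nominals(corpus))  # {'fbi'}
```

Nominals on this list (like "the FBI") are good discourse-new candidates even on their first mention, which is exactly how step 3(a) of the algorithm uses them.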
54
BEAN AND RILOFF'S RESULTS
P R
Baseline 100 72.2
Syn heuristics 43 93.1
Syn heuristics + S1 66.3 84.3
Syn heuristics + EHP 60.7 87.3
Syn heuristics + DO 69.2 83.9
Syn heuristics + S1 + EHP + DO 81.7 82.2
Syn heuristics + S1 + EHP + DO + V 79.1 84.5
55
NG AND CARDIE (2002)
  • Directly investigate the question of whether
    discourse-new detectors improve the performance
    of an anaphora resolution system
  • Dealing with ALL types of anaphoric expressions

56
NG AND CARDIE'S FEATURES
NP TYPE (11): pronoun, the-np, a-np, etc.
String-level (LEXICAL) features (4):
str_match, head_match, embedded
SYNTACTIC PROPERTIES OF ANAPHOR (7):
contains superlative; anaphor occurs in
appositive / predicative construction; pre-/post-
modified
57
NG AND CARDIE'S FEATURES (II)
POSITION IN TEXT OF NP (3): header / first
sentence / first para
SEMANTIC FEATURES (4): alias (UK /
United Kingdom); whether NP is a title (the
president)
COMPOSITION OF DD: the_2n, the_num_n
58
NG AND CARDIE'S METHODS
  • DN detectors:
  • statistical classifiers trained using C4.5 and
    RIPPER
  • Features: predicate superlative detection /
    head match / position in text of NP
  • Tested over MUC-6 (F=86) and MUC-7 (F=84)
  • 2 architectures for integration of detectors and
    AR:
  • Run DN detector first, apply AR on NPs
    classified as anaphoric
  • Run AR if str_match or alias=Y; otherwise, as in
    1.

59
NG AND CARDIE'S RESULTS (DNEW CLASSIFIER)
F
Baseline(Majority class) MUC-6 63.8
Baseline(Majority class) MUC-7 73.2
DNEW classifier MUC-6 86.1
DNEW classifier MUC-7 84.0
60
NG AND CARDIE'S RESULTS
MUC-6 MUC-6 MUC-6 MUC-7 MUC-7 MUC-7
P R F P R F
Baseline (no DN detection) 70.3 58.3 63.8 65.5 58.2 61.6
DN detection runs first 57.4 71.6 63.7 47.0 77.1 58.4
Same head runs first 63.4 68.3 65.8 59.7 69.3 64.2
62
URYUPINA (2003)
  • Trained two separate classifiers
  • A DNEW detector
  • A UNIQUENESS detector (to identify proper names,
    semantic definites, and larger situation DDs)

63
URYUPINA'S METHODS
  • A DN statistical classifier trained using RIPPER
  • Trained / tested over Ng and Cardie's MUC-7 data

64
URYUPINA'S FEATURES
STRING-LEVEL INFORMATION: capitalized
words, digits
SYNTACTIC STRUCTURE OF ANAPHOR: contains
apposition; post-modified
CONTEXT FEATURES: distance between anaphor
and previous NP with same head
65
URYUPINA'S FEATURES: WEB-BASED DEFINITE
PROBABILITY
66
URYUPINA'S RESULTS (DNEW CLASSIFIER)
P R F
All NPs No Def Prob 87.9 86.0 86.9
All NPs Def Prob 88.5 84.3 86.3
Def NPs No Def Prob 82.5 79.3 80.8
Def NPs Def Prob 84.8 82.3 83.5
(On MUC-7)
68
PRELIMINARY CONCLUSIONS
  • Quite a lot of agreement on features for DN
    recognition:
  • Recognizing predicative NPs
  • Recognizing establishing relatives
  • Recognizing DNEW proper names
  • Identifying functional DDs
  • Automatic detection of these is better
  • Using the Web is best
  • All these systems integrate DN detection with
    some form of AR resolution
  • See Ng's results concerning how globally
    optimized classifiers are better than locally
    optimized ones (ACL 2004)

69
PRELIMINARY CONCLUSIONS, II
  • Ng and Cardie's results not the last word:
  • Performance of their DN detector not as high as
    Uryupina's (F=84 vs. F=87 on same dataset, MUC-7)
  • Overall performance of their resolution system
    not that high:
  • best performance F=65.8 on ALL NPs
  • But on full NPs (i.e., excluding PNs and
    pronouns) F=31.7 (GUITAR on DDs, unparsed text:
    F=56.4)
  • Room for improvement

70
A NEW SET OF EXPERIMENTS
  • Incorporate the improvements in DN detection
    technology to
  • the Vieira / Poesio algorithm, as reimplemented
    in a state-of-the-art specialized AR system,
    GUITAR
  • a statistical general purpose AR resolver
    (Uryupina, in progress)
  • Test over a large variety of data
  • New GNOME corpus (623 DDs)
  • Original Vieira and Poesio dataset (1400 DDs)
  • MUC-7 (for comparison with Ng and Cardie,
    Uryupina) (3000 DDs)

71
ARCHITECTURE
  • A two-level system:
  • Run GUITAR's direct anaphora resolution
  • Results used as one of the features of a
    statistical discourse-new classifier
  • A globally optimized system (Ng, ACL 2004)
  • Trained / tested over:
  • GNOME corpus
  • Vieira / Poesio dataset, converted to MMAX,
    converted to MAS-XML (still correcting the
    annotation)

72
A NEW SET OF FEATURES
DIRECT ANAPHORA: run the Vieira / Poesio
algorithm; -1 if no result, else distance
PREDICATIVE NP DETECTOR: DD occurs in
apposition; DD occurs in copular construction
PROPER NAMES: c-head, c-premod, Bean
and Riloff's S1
73
A REVISED SET OF FEATURES (II)
FUNCTIONALITY: Uryupina's four definite
probabilities (computed off the Web);
superlative
ESTABLISHING RELATIVE (a single feature)
POSITION IN TEXT OF NP (Ng and Cardie):
header / first sentence / first para
74
LEARNING A DN CLASSIFIER
  • Use of the data:
  • 8 for parameter tuning
  • 10-fold cross-validation over the rest
  • Classifiers from the Weka package:
  • Decision Tree (C4.5), NN (MLP), SVM
  • 3 evaluations (overall, DN, DA)
  • Performance comparison: t-test (cf. Dietterich,
    1998)

75
3 EVALUATIONS
OVERALL
DN
DA
76
RESULTS: OVERALL
T Res C P/R/F
GuiTAR 574 574 457 79.6
GuiTAR + MLP 574 574 473 82.4
GuiTAR + C4.5 574 574 466 81.18
p > .1,
not sig.
77
RESULTS: DNEW CLASSIFICATION
P R F A
DN C4.5 86.9 92.3 89.3 85.04
DN MLP 86.4 94.6 90.2 85.89
DN SVM 90.0 86.4 88.1 84.15
BASELINE (all DDs are DN) 67.5 100 80.6 67.5
78
RESULTS: DIRECT ANAPHORA RESOLUTION
T Res C NM WM SM P R F
GuiTAR 184 198 119 38 27 52 60.1 64.7 62.3
GuiTAR + MLP 184 142 104 60 20 18 74.1 56.5 63.4
GuiTAR + C4.5 184 158 106 56 22 30 68.9 57.7 62.1
GuiTAR + SVM 184 198 119 38 27 52 60.1 64.7 62.3
79
RESULTS: ANAPHORA RESOLUTION
CROSS CROSS CROSS UNSEEN UNSEEN UNSEEN
P R F P R F
Without DN detection 62.6 69.1 65.7 76.5 72.2 74.3
With DN detection 84.2 59.2 69.6 92.3 66.7 77.4
80
ERROR ANALYSIS
  • A 65% reduction in spurious matches:
  • the answer to any of these questions
  • the title of cabinet maker and sculptor to Louis
    XIV, King of France
  • the other half of the plastic
  • But a 58% increase in no matches:
  • the palm of the hand

81
THE DECISION TREE
DirectAna < -1?
82
RESULTS: THE VIEIRA/POESIO CORPUS
  • Tested on 400 DDs (the test corpus)
  • Initial results at DN detection very poor
  • Problem: the two conversions resulted in the loss
    of much information about modification,
    particularly relatives
  • Currently correcting the annotation by hand

83
RESULTS: AUTOMATIC PARSING
  • GUITAR without DN detection over the same texts,
    but using a chunker: 10% less accuracy
  • Main problem: many DDs not detected (particularly
    possessives)
  • Currently experimenting with full parsers (tried
    several, settled on Charniak's)

84
CONCLUSIONS AND DISCUSSION
  • All results so far support the idea that DN
    detectors improve the performance of AR with DDs
    (if perhaps by only a few percent)
  • Some agreement on what features are useful
  • One clear lesson interleave AR and DN detection!
  • But will need to test on larger corpora (also to
    improve performance of classifier)
  • Current work
  • Test on unparsed text
  • Test on MUC-7 data

85
Task-based evaluation
  • RANLP / EMNLP slides

86
Conclusions
87
URLs
  • Massimo Poesio: http://cswww.essex.ac.uk/staff/poesio
  • GUITAR: http://privatewww.essex.ac.uk/malexa/GuiTAR/
  • WEKA: http://www.cs.waikato.ac.nz/ml