Title: Learning noun phrase coreference resolution
1Learning noun phrase coreference resolution
Veronique Hoste ATILA Meeting 2004
2Outline
- Definition
- Data sets
- Experimental setup
- The effect of optimization
- The effect of skewedness
- Results on the test set
3Definition (Hirst, 81)
- Anaphora is the device of making in discourse an
abbreviated reference to some entity in the
expectation that the perceiver will be able to
disabbreviate the reference and thereby determine
the identity of the entity.
6Definition (Hirst, 81)
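- Anaphora is the device of making in discourse an
abbreviated reference to some entity in the
expectation that the perceiver will be able to
disabbreviate the reference and thereby determine
the identity of the entity.
- ANAPHOR: the abbreviated reference
- ANTECEDENT or REFERENT: the entity referred to
- RESOLUTION: disabbreviating the reference and
thereby determining the identity of the entity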
7Example (KNACK-2002)
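- (...) In de praktijk is er van autonomie of
vrijheid in de beide Kashmirs geen sprake, want
ze zijn sinds jaar en dag de twistappel tussen
Pakistan en India. Die twee landen ontstonden in
1947 om een conflict tussen moslims en hindoes te
vermijden. (...) De Verenigde Staten probeerden
vruchteloos van Pakistan en India de belofte af
te dwingen dat ze geen kernwapens zouden
inzetten. Dat leidde zelfs tot economische
sancties tegen beide landen.
- English translation: (...) In practice there is no
question of autonomy or freedom in the two
Kashmirs, because they have long been the bone of
contention between Pakistan and India. Those two
countries came into being in 1947 to avoid a
conflict between Muslims and Hindus. (...) The
United States tried in vain to extract from
Pakistan and India the promise that they would
not use nuclear weapons. That even led to
economic sanctions against both countries.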
10Which anaphora?
- Identity relation
  - In contrast to the type-token relation: "I prefer the red car, but my husband wanted the grey one."
  - In contrast to the part-whole relation: "If the gas tank is empty, you should refuel the car."
- NPs
  - Personal, possessive and demonstrative pronouns
  - Definite and indefinite NPs
11MUC-6 and MUC-7
- Message Understanding Conference
- Extensively used for evaluation
- Articles from WSJ and NYT
- Identity relation between NPs
- MUC-6: 2141 coreferential NPs in training set
and 2091 in test set
- MUC-7: 2569 coreferential NPs in training
set and 1728 in test set
12KNACK-2002
- Articles from KNACK 2002 on different topics:
national and international politics, science,
culture, ...
- Annotation: adapted version of MUC guidelines
- Identity, bound, ISA, modality relations between
NPs
- http://cnts.uia.ac.be/hoste/manual_dutch.ps
- Ca. 12,546 coreferentially annotated NPs
14Approaches
- The past: mostly knowledge-based techniques
(constraints and preferences)
  - e.g. Lappin & Leass (1994), Baldwin (CogNIAC, 1996)
- Recently: machine learning (C4.5, Ripper, maximum
entropy)
- Redefine coreference resolution as a
CLASSIFICATION task
15A classification based approach
- Given two entities in a text, NP1 and NP2,
classify the pair as coreferent or not
coreferent.
- E.g. "De Verenigde Staten probeerden vruchteloos van
Pakistan en India de belofte af te dwingen dat
ze geen kernwapens zouden inzetten." (The United
States tried in vain to extract from Pakistan and
India the promise that they would not use nuclear
weapons.)
  - ze - de belofte: not coreferential
  - ze - Pakistan en India: coreferential
  - ze - De Verenigde Staten: not coreferential
16Selected features (41)
- Positional features (e.g. dist_sent, dist_NP)
- Local context features
- Morphological and lexical features (e.g.
i/j/ij-pron, j_demon, j_def, i/j/ij-proper,
num_agree)
- Syntactic features (e.g. i/j/ij_SBJ, appos)
- String-matching features (comp_match,
part_match, alias, same_head)
- Semantic features (syn, hyper, same_NE, 4
features indicating semantic class)
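A few of these features can be illustrated with a minimal sketch. The feature names (dist_sent, comp_match, part_match, same_head) come from the slide, but the definitions below are assumptions made for illustration and may differ from the ones used in the actual system. In particular, taking the last token as the head is a crude stand-in for real head finding.

```python
def pair_features(np_i, sent_i, np_j, sent_j):
    """Toy features for a candidate antecedent np_i and an anaphor np_j.

    The definitions are illustrative assumptions, not the original ones.
    """
    tokens_i = np_i.lower().split()
    tokens_j = np_j.lower().split()
    both_nonempty = bool(tokens_i) and bool(tokens_j)
    return {
        # dist_sent: number of sentences between the two NPs
        "dist_sent": sent_j - sent_i,
        # comp_match: the two NPs are identical after lowercasing
        "comp_match": both_nonempty and tokens_i == tokens_j,
        # part_match: the two NPs share at least one token
        "part_match": bool(set(tokens_i) & set(tokens_j)),
        # same_head: naive head = last token (assumption, no parser involved)
        "same_head": both_nonempty and tokens_i[-1] == tokens_j[-1],
    }
```

For the pair "Hughes Electronics Corp." / "Hughes", this sketch reports a partial match but no complete match and no shared head.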
17Positive and negative instances
- Per NP type (pronouns / proper nouns / common
nouns)
- Positive: combination of the anaphor with each
preceding element in the coreference chain.
- Negative: combination of the anaphor with each
preceding NP which is not part of the coreference
chain (search scope < 20 sentences)
- E.g. MUC-7: 1,905 coreferential NPs
  - positive: 11,266 inst.
  - negative: 159,815 inst.
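The instance construction described above can be sketched as follows. The function name and data layout are hypothetical. The sketch assumes that only NPs with at least one preceding element in their chain count as anaphors, and that the 20-sentence search scope applies to negative instances only. The slide does not settle either point.

```python
def make_training_instances(nps, chain_of, scope=20):
    """Build (candidate, anaphor, is_coreferent) training instances.

    nps:      list of (np_id, sentence_index) in document order
    chain_of: dict mapping np_id -> chain id (coreferential NPs only)
    """
    instances = []
    for j, (anaphor, sent_j) in enumerate(nps):
        chain = chain_of.get(anaphor)
        if chain is None:
            continue  # not part of any coreference chain
        preceding = nps[:j]
        if not any(chain_of.get(c) == chain for c, _ in preceding):
            continue  # first mention of its chain: nothing to refer back to
        for candidate, sent_i in preceding:
            if chain_of.get(candidate) == chain:
                instances.append((candidate, anaphor, True))
            elif sent_j - sent_i < scope:  # assumed: scope limits negatives only
                instances.append((candidate, anaphor, False))
    return instances
```

With the MUC-7 counts listed above, positives make up 11,266 of 171,081 instances, about 6.6%.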
18Two step procedure
- First step: validation
  - Application of Timbl and Ripper on train set (10-fold cross-validation)
  - Evaluation: accuracy, precision, recall, F-beta
- Second step: testing
  - Training of Timbl and Ripper on train set, testing on test set.
  - Reconstruction of coreference chains
  - Evaluation using MUC scoring software
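As a reminder of how the validation metrics relate, here is a small sketch that computes precision, recall and F-beta from raw counts for the positive (coreferent) class. The function name is hypothetical. This is the standard textbook definition, not the MUC scoring software used in the second step.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # F-beta weights recall beta times as much as precision
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta
```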
19Algorithms compared
- Ripper
  - Cohen, 95
  - Rule induction
  - Algorithm parameters: different class ordering principles, negative conditions or not, loss ratio values, cover parameter values
- TiMBL
  - Memory-based learning
  - Algorithm parameters: ib1, igtree; overlap, mvdm; 5 feature weighting methods; 4 distance weighting methods; 10 values of k
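The idea behind memory-based learning with an overlap metric can be sketched in a few lines. This is not TiMBL's API. It is a plain nearest-neighbour classifier with a weighted overlap distance, and it does not try to reproduce TiMBL's own handling of k, ties, or distance weighting. All names are hypothetical.

```python
from collections import Counter

def overlap_distance(a, b, weights):
    """Sum the weights of the features on which two instances disagree."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def knn_classify(memory, query, weights, k=1):
    """Classify query by majority vote among its k nearest stored instances.

    memory: list of (feature_tuple, label) pairs kept verbatim in memory
    """
    ranked = sorted(memory, key=lambda item: overlap_distance(item[0], query, weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Feature weighting enters only through `weights`: raising the weight of a feature makes a mismatch on that feature count for more.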
20Default classifier results (MUC6)
21Conclusions default experiments
- The concatenation of the NP-type classifiers is
beneficial for Ripper, not for Timbl.
- Low precision scores for Timbl (large number of
false positives). The scores are up to 30% lower
than the ones for Ripper.
  - Reason: feature weighting?
- Higher recall for Timbl: it distinguishes better
between true and false negatives.
22GA optimization
23GA individuals
- Parameters
  - Feature weighting: 0, 1, 2, 3, 4
  - Neighbour weighting: 0, 1, 2, 3
  - k
- Features
  - Values: 0, 1, 2
- Example individual: 0 1 0 1 2 0 2 1 0 2 0 0 2 1 0 2 2 0 3 2 2.0288721872
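One way to picture such an individual is as a record with one gene per feature plus a few parameter genes. The sketch below is an assumption-laden illustration: the value ranges are the ones shown on the slide, but the meaning of the feature values, the range used for k, and the mutation operator are hypothetical and not taken from the original experiments.

```python
import random

FEATURE_VALUES = (0, 1, 2)            # per-feature gene values shown on the slide
FEATURE_WEIGHTING = (0, 1, 2, 3, 4)   # index of a feature weighting method
NEIGHBOUR_WEIGHTING = (0, 1, 2, 3)    # index of a neighbour weighting method

def random_individual(n_features, rng, k_range=(0.0, 3.0)):
    """Draw a random individual; k_range is a hypothetical choice."""
    return {
        "features": [rng.choice(FEATURE_VALUES) for _ in range(n_features)],
        "feature_weighting": rng.choice(FEATURE_WEIGHTING),
        "neighbour_weighting": rng.choice(NEIGHBOUR_WEIGHTING),
        "k": rng.uniform(*k_range),
    }

def mutate(individual, rng, rate=0.1):
    """Return a copy in which each feature gene is redrawn with probability rate."""
    child = dict(individual)
    child["features"] = [
        rng.choice(FEATURE_VALUES) if rng.random() < rate else gene
        for gene in individual["features"]
    ]
    return child
```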
24GA optimization results MUC6
25Is skewedness a problem?
- In an unbalanced data set, the majority class is
represented by a large portion of all the
instances, whereas the other class, the minority
class, has only a small part of the instances.
  - E.g. MUC-7: only 6% positive instances
- Imbalanced data sets may result in poor
performance of standard classification
algorithms
  - => problem of ignoring the minority class
26Strategies for dealing with skewed data sets
- Sampling
  - undersampling
  - oversampling
- Adjusting misclassification costs (high cost to
misclassification of the minority class)
- Weighting of examples (focus on the minority
class)
27Sampling
- Undersampling: examples from the majority class
are removed
  - Problem: throws away possibly useful information
- Oversampling: examples from the minority class
are duplicated
  - Problem: no increase of information, overfitting
- General observation in ML literature
  - undersampling leads to better performance
  - oversampling does not help
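Both sampling strategies are easy to sketch. The function names and the (features, label) layout are hypothetical. The sketch assumes that the negative class is the majority class, as in the coreference data described here.

```python
import random

def undersample(instances, rng, keep_fraction):
    """Keep all positives and a random fraction of the (majority) negatives."""
    positives = [inst for inst in instances if inst[1]]
    negatives = [inst for inst in instances if not inst[1]]
    kept = rng.sample(negatives, round(len(negatives) * keep_fraction))
    return positives + kept

def oversample(instances, times):
    """Duplicate every (minority) positive so that it occurs `times` times."""
    positives = [inst for inst in instances if inst[1]]
    return instances + positives * (times - 1)
```

Undersampling discards instances, while oversampling only repeats instances that are already there, which is why it adds no new information.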
28Skewedness (MUC6)
29Downsampling results MUC6
30Changing loss ratio in Ripper
- Loss ratio parameter allows one to specify the
relative cost of false positives and false negatives
- Focus on recall: loss ratio < 1
- Focus on precision: loss ratio > 1
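To see why a loss ratio below 1 favours recall, consider a generic cost-sensitive decision rule. This is not how Ripper applies the parameter internally. The sketch assumes a classifier that outputs a probability for the positive class and defines the loss ratio as cost(false positive) / cost(false negative), a definition chosen here only because it matches the direction stated above. Predicting positive is cheaper in expectation when (1 - p) * cost_FP < p * cost_FN, that is, when p > L / (1 + L).

```python
def positive_threshold(loss_ratio):
    """Probability above which predicting 'positive' has lower expected cost.

    loss_ratio is assumed to be cost(false positive) / cost(false negative).
    """
    return loss_ratio / (1.0 + loss_ratio)

def decide(p_positive, loss_ratio=1.0):
    """Return True (coreferent) when that is the cheaper prediction."""
    return p_positive > positive_threshold(loss_ratio)
```

With a loss ratio of 1 the threshold is 0.5. Lowering the ratio lowers the threshold, so more pairs are labelled positive (higher recall). Raising it does the opposite (higher precision).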
31Skewedness summary
- Comparison of the sensitivity of Timbl and Ripper
to the skewed data set (ML literature in the past: C4.5)
- Both learners: large number of FN
- Ripper has a much poorer performance on the
minority class (forgetting exceptions?)
- Ripper is also more sensitive to rebalancing
- No particular downsampling level or loss ratio
value leads to overall best performance
  - => yet another optimization step ...
32Testing
- Construction of test instances: all NPs starting
from the second NP in the document are considered
a possible anaphor, whereas all preceding NPs are
considered possible antecedents of the anaphor
under consideration.
  - Useful?
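The construction of test instances described above amounts to pairing every NP with every NP before it. A minimal sketch, with a hypothetical function name:

```python
def make_test_pairs(nps):
    """Pair each NP from the second one onwards with all preceding NPs.

    nps: list of NPs in document order
    Returns (candidate_antecedent, possible_anaphor) pairs.
    """
    return [(nps[i], nps[j]) for j in range(1, len(nps)) for i in range(j)]
```

A document with n NPs yields n * (n - 1) / 2 test pairs, so the number of instances grows quadratically with document length.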
34Testing (ctd.)
- Application of optimized classifiers
- Antecedent selection
- New evaluation procedure: evaluation of the
equivalence classes (transitive closure of a
coreference chain)
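Equivalence classes can be obtained from pairwise coreference decisions by taking their transitive closure. The sketch below uses a small union-find structure. It is one common way to do this and is not necessarily the procedure used in the experiments. The function name is hypothetical.

```python
def coreference_chains(coreferent_pairs):
    """Group mentions into equivalence classes from pairwise links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in coreferent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    chains = {}
    for mention in list(parent):
        chains.setdefault(find(mention), set()).add(mention)
    return list(chains.values())
```

If "Lockheed" is linked to "Lockheed Martin Corp." and a later mention is linked to "Lockheed", all three end up in the same class even though the first and third were never classified as a pair.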
35Testing results (MUC6)
36What about ...
- One reason Lockheed Martin Corp. did not
announce a full acquisition of Loral Corp. was
that Lockheed could not meet the price he had
placed on Loral's 31 percent ownership of
Globalstar Telecommunications Ltd. Lockheed will
invest $344 million in Loral Space and
Communications Corp., a new company whose
principal holding will be Loral's interest in
Globalstar.
37What about ...
- Hughes pays U.S. $4 mln fine from
whistleblower case. Hughes Electronics Corp. has
paid the U.S. government $4 million to settle a
1990 lawsuit.
38What about ...
- China's Foreign Trade Minister Wu Yi has
extended an olive branch to Taiwan, saying Beijing
remained committed in talks with the breakaway
island to establish direct trade and
communication links.