Title: Incorporating Contextual Cues in Trainable Models for Coreference Resolution
1. Incorporating Contextual Cues in Trainable Models for Coreference Resolution
- 14 April 2003
- Ryu Iida
- Computational Linguistics Laboratory
- Graduate School of Information Science
- Nara Institute of Science and Technology
2. Background
- Two approaches to coreference resolution
- Rule-based approach [Mitkov 97, Baldwin 95, Nakaiwa 96, Okumura 95, Murata 97]
  - Many attempts to encode linguistic cues into rules
  - Significantly influenced by Centering Theory [Grosz 95, Walker et al. 94, Kameyama 86]
  - Best performance at MUC (Message Understanding Conference): precision roughly 70%, recall roughly 60%
- Corpus-based machine learning approach [Aone and Bennett 95, Soon et al. 01, Ng and Cardie 02, Seki 02]
  - Cost effective
  - Has achieved performance comparable to the best-performing rule-based systems

Problem: further manual refinement of rule-based models would be prohibitively costly.
Problem: previous corpus-based work tends to lack an appropriate reference to the theoretical linguistic work on coherence and coreference.
3. Background
- Challenging issue
  - Achieving a good union between theoretical linguistic findings and corpus-based empirical methods
4. Outline of this Talk
- Background
- Problems with previous statistical approaches
- Two methods
- Centering features
- Tournament-based search model
- Experiments
- Conclusions
5. Statistical approaches [Soon et al. 01, Ng and Cardie 02]
- Reach a level of performance comparable to state-of-the-art rule-based systems
- Recast the task of anaphora resolution as a sequence of classification problems
6. Statistical approaches [Soon et al. 01, Ng and Cardie 02]
- The task is to classify pairs of noun phrases as positive or negative
- Positive instance: the pair of an anaphor and its antecedent
- Negative instances: pairs of the anaphor and each NP located between the anaphor and the antecedent

Example (MUC-6):
A federal judge in Pittsburgh issued a temporary restraining order [antecedent] preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order [anaphor], requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.
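The instance construction above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; the function name and the toy candidate list are my own.

```python
# Build pairwise training instances as described on the slide:
# the (antecedent, anaphor) pair is positive; pairs of the anaphor
# with each NP between the antecedent and the anaphor are negative.

def make_pair_instances(candidates, antecedent_index, anaphor):
    """candidates: NPs preceding the anaphor, in textual order."""
    instances = [(candidates[antecedent_index], anaphor, "positive")]
    for np in candidates[antecedent_index + 1:]:
        instances.append((np, anaphor, "negative"))
    return instances

nps = ["a federal judge", "Pittsburgh", "a temporary restraining order",
       "Trans World Airlines"]
pairs = make_pair_instances(nps, antecedent_index=2, anaphor="The order")
```

Each instance would then be converted to a feature vector before training.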
7. Statistical approaches [Soon et al. 01, Ng and Cardie 02]
- Feature set [Ng and Cardie 02]: POS, DEMONSTRATIVE, STRING_MATCH, NUMBER, GENDER, SEM_CLASS, DISTANCE, SYNTACTIC ROLE, ...
- Each candidate-anaphor pair is encoded as a feature vector (e.g. Organization=1, Prp_noun=1, Pronoun=0, STR_MATCH=0, SENT_DIST=0) with a positive/negative class label
- Training with C4.5 yields a model (decision tree)
8. Statistical approaches [Soon et al. 01, Ng and Cardie 02]
- Test phase [Ng and Cardie 02]
  1. Extract NPs as candidates
  2. Input each pair of the given anaphor and one of these candidates to the decision tree
  3. Select the best-scored candidate as the output (example scores in the figure: -2.0, -1.1, -0.4, -1.0, -3.5, -0.3, -2.5)
- We refer to Ng and Cardie's model as the baseline of our empirical evaluation
- Precision 78.0%, Recall 64.2%
- Slightly better than the best-performing rule-based model at MUC-7
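The test phase above reduces to scoring every candidate and taking the maximum. A minimal sketch, with `score_pair` standing in for the trained decision tree and the toy scores taken from the figure:

```python
# Baseline test phase: score each (anaphor, candidate) pair with the
# learned classifier and select the best-scored candidate.

def resolve_baseline(anaphor, candidates, score_pair):
    """Return the candidate with the highest classifier score."""
    return max(candidates, key=lambda c: score_pair(anaphor, c))

# Toy scorer standing in for the learned model (illustrative values).
scores = {"judge": -2.0, "USAir": -1.1, "the order": -0.4}
best = resolve_baseline("it", list(scores), lambda a, c: scores[c])
```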
9. A drawback of the previous statistical models
- The previous models do not capture local context appropriately [Kameyama 98]
- Positive and negative instances may have identical feature vectors: both the antecedent and a competing candidate can yield, e.g., POS=Noun, Prop_Noun=Yes, Pronoun=No, NE=PERSON, SEM_CLASS=Person, SENT_DIST=0
10. Two methods
11Two methods
- Use more sophisticated linguistic cues centering
features - Augmentation of a set of new features inspired by
Centering Theory that implement local contextual
factors - Improve the search algorithm tournament model
- A new model which makes pair-wise comparisons
between candidates
12. Centering Features
- The problem is that the current feature set does not tell the difference between two candidates with the identical vector (e.g. POS=Noun, Prop_Noun=Yes, Pronoun=No, NE=PERSON, SEM_CLASS=Person, SENT_DIST=0)
- Introduce extra devices such as the forward-looking center list
- Encode state transitions on them into a set of additional features
13Two methods
- Use more sophisticated linguistic cues centering
features - We augment the feature set with a set of new
features inspired by Centering theory that
implement local contextual factors - Improve the search algorithm tournament model
- We propose a new model which makes pair-wise
comparisons between antecedent candidates
14. Tournament model
- What we want to do is answer the question: which is more likely to be coreferent, Sarah or Glendora?
- Conduct a tournament consisting of a series of matches in which candidates compete with each other
- Match victory is determined by a pairwise comparison between candidates, cast as a binary classification problem
- The most likely candidate is selected through a single-elimination tournament of matches

Sarah went downstairs and received another curious shock, for when Glendora flapped into the dining room in her home-made moccasins, Sarah asked her when she had brought coffee to her room, and Glendora said she hadn't.
15. Tournament model
- Training instances (left candidate, right candidate, anaphor → class):
  NP1 vs. NP5 (ANP) → right
  NP4 vs. NP5 (ANP) → right
  NP5 vs. NP7 (ANP) → left
  NP5 vs. NP8 (ANP) → left
- The class label names the side of the pair that wins (is more likely to be the antecedent)
- In the tournament, the correct antecedent NP5 must prevail over any of the other four candidates, so we extract four training instances
- Induce a pairwise classifier from the set of extracted training instances
- The classifier classifies a given pair of candidates into left or right

(Figure: candidates NP1-NP8 appear between the beginning of the document and the anaphor ANP; NP5 is the correct antecedent.)
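The left/right instance extraction described above can be sketched as follows. The function name is illustrative; candidates are assumed to be listed in textual order.

```python
# Pair the true antecedent with every rival candidate, preserving
# textual order in the pair, and label the side the antecedent is on
# (that side is the winner of the match).

def extract_tournament_instances(candidates, antecedent):
    """candidates: NPs in textual order, containing the antecedent."""
    ant_pos = candidates.index(antecedent)
    instances = []
    for i, np in enumerate(candidates):
        if np == antecedent:
            continue
        if i < ant_pos:   # rival precedes the antecedent in the text
            instances.append((np, antecedent, "right"))
        else:             # rival follows the antecedent in the text
            instances.append((antecedent, np, "left"))
    return instances

insts = extract_tournament_instances(
    ["NP1", "NP4", "NP5", "NP7", "NP8"], antecedent="NP5")
```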
16. Tournament model
1. The first match is arranged between the nearest candidates (NP7 and NP8)
2. Each of the following matches is arranged in turn between the winner of the previous match (NP8) and a new challenger (NP5)

(Figure: candidates NP1-NP8 precede the anaphor ANP; NP5 is the correct antecedent.)
17. Tournament model
3. The winner is next matched against the next challenger (NP4)
4. This process is repeated until the last candidate has participated
5. The model selects the candidate that prevails through the final round as the answer

(Figure: candidates NP1-NP8 precede the anaphor ANP; NP5 is the correct antecedent.)
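The single-elimination search described above can be sketched in Python. `classify` stands in for the learned pairwise classifier, which returns "left" or "right" for a (left NP, right NP) pair; the toy classifier below is illustrative only.

```python
# Run the tournament from the two candidates nearest the anaphor,
# repeatedly matching the current winner against the next challenger.

def run_tournament(candidates, classify):
    """candidates: antecedent candidates in textual order."""
    ordered = list(reversed(candidates))  # nearest to the anaphor first
    winner = ordered[0]                   # first match: two nearest NPs
    for challenger in ordered[1:]:
        # The challenger always precedes the current winner in the text,
        # so it takes the left slot of the pair.
        if classify(challenger, winner) == "left":
            winner = challenger
    return winner

# Toy classifier that always prefers NP5 (the true antecedent).
pick_np5 = lambda left, right: "left" if left == "NP5" else "right"
best = run_tournament(["NP1", "NP4", "NP5", "NP7", "NP8"], pick_np5)
```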
18. Experiments
19. Experiments
- Empirical evaluation on Japanese zero-anaphora resolution
  - Japanese does not normally use personal pronouns as anaphors; instead, Japanese uses zero-pronouns
- Comparison among four models
  - Baseline model
  - Baseline model with centering features
  - Tournament model
  - Tournament model with centering features
20. Centering Features in Japanese
- Japanese anaphora resolution model [Nariyama 02]
  - An expansion of Kameyama's work on applying Centering Theory to Japanese zero-anaphora resolution
  - Expands the original forward-looking center list into the Salience Reference List (SRL) to take broader contextual information into account
  - Makes more use of linguistic information
- In the experiments, we introduced two features to reflect the SRL-related contextual factors
21. Method
- Data: GDA-tagged Japanese newspaper article corpus

                                   GDA      MUC-6
  Texts                            2,176    60
  Sentences                        24,475   -
  Tags of anaphoric relation       14,743   8,946
  Tags of ellipsis (zero-anaphor)  5,966    0

- As a preliminary test, we resolve only subject zero-anaphors: 2,155 instances in total
- Conduct five-fold cross-validation on that data set with support vector machines
22. Feature set (see our paper for details)
- Features simulating Ng and Cardie's feature set
- Centering features
  - Order in SRL
  - Heuristic rule of preference
- Features capturing the relations between two candidates (introduced only in the tournament model, not in the baseline model)
  - Preference in SRL between the two candidates
  - Preference in animacy between the two candidates
  - Distance between the two candidates
23. Results
(Chart: learning curves for the baseline model, baseline model + centering features, tournament model, and tournament model + centering features.)
24. Results (1/3): the effect of incorporating centering features
- Baseline model + centering features: 67.0
- Baseline model: 64.0
- Centering features were reasonably effective
25. Results (2/3)
- Tournament model: 70.8
- Baseline model + centering features: 67.0
- Baseline model: 64.0
- Introducing the tournament model significantly improved performance regardless of the size of the training data
26. Results (3/3)
- Tournament model: 70.8
- Tournament model + centering features: 69.7
- Baseline model + centering features: 67.0
- Baseline model: 64.0
- The most complex model did not outperform the tournament model without centering features
- However, this model's improvement ratio with respect to data size is the best of all
27. Results after cleaning the data (March 03)
- Tournament model + centering features: 74.3
- Tournament model: 72.5
- After cleaning, the tournament model with centering features is more effective than the one without centering features
28. Conclusions
- Our concern is achieving a good union between theoretical linguistic findings and corpus-based empirical methods
- We presented a trainable coreference resolution model that is designed to incorporate contextual cues by means of centering features and a tournament-based search algorithm. These two improvements worked effectively in our experiments on Japanese zero-anaphora resolution.
29. Future Work
- In Japanese zero-anaphora resolution:
  - Identification of relations between the topic and subtopics
  - Analysis of complex and quoted sentences
  - Refinement of the treatment of selectional restrictions
31. Tournament model
- Training instances (left candidate, right candidate, anaphor → class):
  NP1 vs. NP5 (ANP) → right
  NP4 vs. NP5 (ANP) → right
  NP5 vs. NP7 (ANP) → left
  NP5 vs. NP8 (ANP) → left
- In the tournament, the correct antecedent NP5 must prevail over any of the other four candidates, so we extract four training instances
- From the set of extracted training instances, induce a pairwise classifier

(Figure: candidates NP1-NP8 appear between the beginning of the document and the anaphor ANP; NP5 is the correct antecedent.)
32. Tournament model
- A tournament consists of a series of matches in which candidates compete with each other

(Figure: candidates NP1-NP8 appear between the beginning of the document and the anaphor ANP.)
33. Tournament model
- What we want to do is answer the question: which is more likely to be coreferent, Sarah or Glendora?
- Implement a pairwise comparison between candidates as a binary classification problem

Sarah went downstairs and received another curious shock, for when Glendora flapped into the dining room in her home-made moccasins, Sarah asked her when she had brought coffee to her room, and Glendora said she hadn't.

(Figure: candidate NPs such as downstairs, dining room, Sarah, and Glendora, with the centering transition CHAIN (Cb=Cp=Sarah) → CHAIN (Cb=Cp=Glendora).)
34Tournament model
She
extract NPs
downstairs
Training instances
Glendora
lt
Glendora
downstairs
she
moccasins
Sarah
lt
Glendora
moccasins
she
her
she
lt
Glendora
coffee
she
coffee
her
lt
Glendora
Sarah
she
room
lt
Glendora
room
she
Glendora
she
output class
coreferred
coreferent
35. Conclusions
- To incorporate linguistic cues into trainable approaches:
  - Add features that take into consideration linguistic cues such as Centering Theory (centering features)
  - Propose a novel search model in which the candidates are compared in terms of their likelihood as antecedents (tournament model)
- In the Japanese zero-anaphora resolution task, the tournament model significantly outperforms earlier machine learning approaches [Ng and Cardie 02]
- Incorporating linguistic cues in machine learning models is effective
36. Data
- GDA-tagged Japanese newspaper article corpus

                                   GDA      MUC-6
  Texts                            2,176    60
  Sentences                        24,475   -
  Tags of anaphoric relation       14,743   8,946
  Tags of ellipsis (zero-anaphor)  5,966    0

- GDA tagging example (the Japanese text was lost in extraction):
  <n id="tagid1">…</n> … <n id="tagid2">…</n> … <n eq="tagid2">…</n> … <n eq="tagid1">…</n> … <n eq="tagid1">…</n> … <v agt="tagid1">…</v> …
- Extract 2,155 examples
37. Statistical approaches [Soon et al. 01, Ng and Cardie 02]
- Reach a level of performance comparable to state-of-the-art rule-based systems
- Recast the task of anaphora resolution as a sequence of classification problems
  - Pair of an anaphor and the antecedent: positive instance
  - Pairs of the anaphor and the NPs located between the anaphor and the antecedent: negative instances
- The task is to classify these pairs of noun phrases as positive or negative
38. Centering Features
- Centering Theory [Grosz 95, Walker et al. 94, Kameyama 86]
  - Part of an overall theory of discourse structure and meaning
  - Two levels of discourse coherence: global and local
  - Centering models the local-level component of attentional state
  - e.g. intrasentential centering [Kameyama 97]
39. Centering Features in English [Kameyama 97]

Sarah went downstairs and received another curious shock, for when Glendora flapped into the dining room in her home-made moccasins, Sarah asked her when she had brought coffee to her room, and Glendora said she hadn't.

Centering transitions across the clauses [Kameyama 97]:
CHAIN (Cb=Cp=Sarah)
ESTABLISH (Cb=Cp=Glendora)
CHAIN (Cb=Glendora, Cp=Sarah)
CHAIN (Cb=Cp=Glendora)
CHAIN (Cb=NULL, Cp=Glendora)
CHAIN (Cb=Cp=Glendora)
40. Centering Features in English [Kameyama 97]
- The essence is that it takes into account the preference between candidates
- Cb and Cp distinguish the two candidates

Sarah went downstairs and received another curious shock, … she hadn't.
CHAIN (Cb=Cp=Sarah) → transition → CHAIN (Cb=Cp=Glendora)

- Implement this local contextual factor as centering features
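One simple way to turn such centering states into classifier features is to flag whether a candidate fills the Cb or Cp role of the preceding utterance. This encoding is my illustration, not the paper's exact feature definition:

```python
# Illustrative centering features: is the candidate the backward-looking
# center (Cb) or the preferred center (Cp) of the previous utterance?

def centering_features(candidate, prev_cb, prev_cp):
    return {
        "is_prev_cb": int(candidate == prev_cb),
        "is_prev_cp": int(candidate == prev_cp),
    }

# State from the example: Cb=Glendora, Cp=Sarah in the previous clause.
f_sarah = centering_features("Sarah", prev_cb="Glendora", prev_cp="Sarah")
f_glendora = centering_features("Glendora", prev_cb="Glendora", prev_cp="Sarah")
```

The two flags give the classifier exactly the kind of preference information that distinguishes candidates with otherwise identical feature vectors.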
41. Tournament model
- A tournament consists of a series of matches in which candidates compete with each other
42. Rule-based Approaches
- Encoding linguistic cues into rules manually
  - Thematic roles of the candidates
  - Order of the candidates
  - Semantic relations between anaphors and antecedents
  - etc.
- These approaches are influenced by Centering Theory [Grosz 95, Walker et al. 94, Kameyama 86]
- The Coreference Resolution Task of the Message Understanding Conference (MUC-6 / MUC-7)
  - Precision roughly 70%
  - Recall roughly 60%
- Further manual refinement of rule-based models will be prohibitively costly
43. Statistical Approaches with a Tagged Corpus
- The statistical approaches have achieved performance comparable to the best-performing rule-based systems
- But they lack an appropriate reference to theoretical linguistic work on coherence and coreference
44. Test Phase [Soon et al. 01]
- Extract NPs from the text:

A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.

- Candidates: judge, Pittsburgh, order, Trans World Airlines, share, USAir Group Inc, order, a suit, USAir (the anaphor is marked in the figure)
- Precision 67.3%, Recall 58.6% on the MUC data set
45. Improving Soon's model
- Ng and Cardie 02
  - Expanding the feature set: 12 features → 53 features (POS, DEMONSTRATIVE, STRING_MATCH, NUMBER, GENDER, SEM_CLASS, DISTANCE, SYNTACTIC ROLE, ...)
  - Introducing a new search algorithm
47. Task of Coreference Resolution
- Two processes:
  - Resolution of anaphors
  - Resolution of antecedents
- Applications: machine translation, IR, etc.

Example (MUC-6; NPs of the same color in the slide are coreferent):
A federal judge in Pittsburgh issued a temporary restraining order [antecedent] preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order [anaphor], requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.
48. Future Work
- Evaluate some examples
- The tournament model does not handle direct quotation
- The proposed methods cannot deal with different discourse structures
(Japanese example sentences illustrating the SRL problem; the text was lost in extraction.)
49. Centering Features in Japanese
- Add the likelihood of antecedents into the features
- In Japanese, wa-marked NPs tend to be topics, and topics tend to be omitted
- Salience Reference List (SRL) [Nariyama 02]
  - Store NPs in the SRL from the beginning of the text
  - Overwrite the old entity if a new entity fills the same slot
- Preference order:
  Topic (wa) > Focus (ga) > I-Obj (ni) > D-Obj (wo) > Others
(Example: NP1-wa NP2-wo …, NP3-ga …, NP4-wa …, NP5-ni (φ-ga) V; the Japanese text was partly lost in extraction.)
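The SRL update and preference order above can be sketched as a small Python structure. The slot names follow the particles; the function names are my own, and the example entities are from the President A / President B example later in the deck:

```python
# Salience Reference List: one slot per particle class, ordered by
# salience. A new entity overwrites the old occupant of its slot.

SRL_SLOTS = ["wa", "ga", "ni", "wo", "others"]  # wa > ga > ni > wo > others

def update_srl(srl, np, particle):
    slot = particle if particle in SRL_SLOTS else "others"
    srl = dict(srl)
    srl[slot] = np          # overwrite the previous entity in this slot
    return srl

def most_salient(srl):
    """Return the entity in the highest-ranked filled slot."""
    for slot in SRL_SLOTS:
        if srl.get(slot) is not None:
            return srl[slot]
    return None

srl = {s: None for s in SRL_SLOTS}
srl = update_srl(srl, "President A", "wa")
srl = update_srl(srl, "the armistice", "wo")
srl = update_srl(srl, "President B", "ga")
```

A zero-pronoun resolver can then prefer the most salient SRL entry as the antecedent.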
50. Evaluation of models
- Introduce a confidence measure
- The confidence coefficient is the classifier's output value for the match in which the two candidates are closest in the tournament (example values in the figure: 0.9, 2.4, 3.2, 3.8)
51. Evaluation of the Tournament model
- Investigate the tournament model (the best-performing model)
52. Centering Features
- Example (the Japanese text was lost in extraction): President A proposed the armistice, but President B ignored this. And he started action.
- SRL states after each sentence (Japanese entries lost in extraction):
  … > NULL > NULL > NULL > NULL
  … > NULL > … > NULL > NULL
53. Features (1/3): Ng's model, Tournament model
- Features determined by a single candidate:
  - POS
  - Pronoun
  - Particle
  - Named entity
  - The number of anaphoric relations
  - First NP in a sentence
  - Order in SRL
54. Features (2/3): Ng's model, Tournament model
- Features determined by the pair of the anaphor and a candidate:
  - Selectional restrictions: whether the pair of candidate and anaphor satisfies a selectional restriction in Nihongo Goi Taikei
  - Log-likelihood ratio calculated from cooccurrence data
  - Distance in sentences between the anaphor and the candidate
55. Features (3/3): Tournament model only
- Features determined by the relation between the two candidates:
  - Distance in sentences between the two candidates
  - Animacy: whether or not a candidate belongs to the class PERSON or ORGANIZATION
  - Which candidate is preferred in the SRL
56. Anaphoric relations
- endophora: the antecedent exists in the context
  - anaphora: antecedents precede anaphors
  - cataphora: anaphors precede antecedents
- exophora: the antecedent does not exist in the context
- Variety of antecedents: noun phrase (NP), sentence, text, etc.
- Most previous work deals with anaphora resolution; antecedent-anaphor examples are the most numerous of all
57. Results (examples: 2,155 → 2,681)
- Tournament model: the model using centering features performed worse than the model without centering features
- Modified some tagging errors by hand; examples: 2,155 → 2,681