Title: Reporter: Jun Lang
1Automatic Acquisition of Gender Information for
Anaphora Resolution
Reading Group
- Reporter: Jun Lang
- Date: 2005-12-26
- Email: bill_lang_at_ir.hit.edu.cn
2Author Paper Information
- Author: Shane Bergsma
- Postgraduate student, Dept. of CS, Univ. of Alberta
- Supervisor: Dekang Lin (with Minipar)
- Research interests: anaphora resolution
- Publications
- Shane Bergsma, Automatic Acquisition of Gender Information for Anaphora Resolution, Canadian AI 2005, Best Paper Award
- Colin Cherry and Shane Bergsma, An Expectation Maximization Approach to Pronoun Resolution, CoNLL 2005
- Three-month internship at Google in 2005
- Homepage: http://www.cs.ualberta.ca/bergsma/
3Outline
- Why do such research?
- Related Works
- Automatic Acquisition Noun-Gender Pairs
- Parsed Corpus Frequencies
- Web Frequencies
- Gender Information Modeling
- Testing Gender Classification
- Pronoun Resolution with Enhanced Gender
- Conclusion
4Why do such research?
- Anaphora resolution (Richard Evans, DAARC 2000 [1])
- Number agreement
- Minipar for plural info.
- his, him, her, himself, herself
- Gender agreement
- Mr., Mrs.
- actress, actor, etc.
- With probability
- Masculine: he
- Feminine: she
- Neutral: it
- Plural: they
- John never saw the car. He arrived late.
- John never saw the car. It arrived late.
5Related Works
- Typically two steps
- Filter: gender and number agreement, and binding theory
- Select best: more recent, more frequent
- Recent trend
- Machine learning classifier using annotated corpus
- Gender information
- WordNet (Soon and Ng, CL 2001 [2])
- Learn gender from unlabelled text, on un-aware pronoun, accuracy 70%
- Web for anaphora resolution
- Page counts for patterns (Natalia Modjeska, EACL Workshop 2003 [3])
7Automatic Acquisition Noun-Gender Pairs
- Government and Binding Theory
- Principle A: a reflexive pronoun must be bound by an antecedent in its governing category (Niyu Ge, 1994 [4])
- Pattern matching
- E.g., John explained himself.
- Method for determining probability
- Over a large amount of text, count the number of times each noun binds with a pronoun of each gender
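The counting step above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes the parsed text has already been reduced to (noun, reflexive pronoun) pairs, and the names `REFLEXIVE_GENDER` and `count_gender_bindings` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from reflexive pronouns to gender/number classes.
REFLEXIVE_GENDER = {
    "himself": "masculine",
    "herself": "feminine",
    "itself": "neutral",
    "themselves": "plural",
}


def count_gender_bindings(pairs):
    """Count, for each noun, how often it binds a reflexive of each gender.

    `pairs` is an iterable of (noun, reflexive_pronoun) tuples, assumed to
    come from pattern matching on sentences like "John explained himself."
    """
    counts = defaultdict(Counter)
    for noun, pronoun in pairs:
        gender = REFLEXIVE_GENDER.get(pronoun.lower())
        if gender is not None:  # skip anything that is not a known reflexive
            counts[noun.lower()][gender] += 1
    return counts


# Toy input, invented for illustration only.
pairs = [
    ("John", "himself"),
    ("John", "himself"),
    ("committee", "itself"),
    ("John", "herself"),
]
counts = count_gender_bindings(pairs)
print(dict(counts["john"]))
```

Keeping one `Counter` per noun means a gender that was never observed simply reads as zero, which matches entries such as "doctor-itself 0" later in the slides.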
8Parsed Corpus Frequencies
Dependency Relation
Noun-Pronoun pair
9Web Frequencies
- Out of vocabulary: using the Web (Frank Keller, EMNLP 2002 [5])
- Count the number of pages returned by Google
11Gender Information Modeling(1/5) Maximum
likelihood
- Counting for Maximum likelihood formulation
- Five parsed corpus templates
- Five web mining templates
- Example
- doctor-himself 224 doctor-herself 126
- doctor-itself 0 doctor-themselves 14
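As a worked version of the doctor example above, the maximum likelihood estimate is just each count divided by the total. The sketch below assumes the four reflexives map to masculine, feminine, neutral and plural; the function name `maximum_likelihood` is hypothetical.

```python
def maximum_likelihood(counts):
    """Return the relative frequency of each gender for one noun."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations for this noun")
    return {gender: n / total for gender, n in counts.items()}


# Counts from the slide: himself 224, herself 126, itself 0, themselves 14.
doctor = {"masculine": 224, "feminine": 126, "neutral": 0, "plural": 14}
probs = maximum_likelihood(doctor)
for gender, p in probs.items():
    print(f"{gender}: {p:.3f}")
```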
12Gender Information Modeling(2/5) Fault of
Maximum Likelihood
- Two issues
- Small counts will result in large probability swings.
- (1,2) 33%, (2,2) 50%, (3,2) 60%
- Need a measure of how certain we are about the probability.
- For example, 3/(3+2) = 300/(300+200) = 0.6
- Solution
- Each noun-pronoun pair for a gender is treated as a separate event, which follows a Beta distribution.
13Gender Information Modeling(3/5) Beta
Distribution
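For reference, the standard Beta density on a probability p, with its mean and variance, is given below. These are textbook formulas; the symbols p, α and β are notation assumed here, not taken from the slide.

```latex
\[
f(p;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
  p^{\alpha-1}(1-p)^{\beta-1}, \qquad 0 \le p \le 1,\ \alpha,\beta > 0
\]
\[
\mu = \frac{\alpha}{\alpha+\beta}, \qquad
\sigma^{2} = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}
\]
```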
14Gender Information Modeling(4/5) Example of Beta
Distribution
- Example 1
- Gretzky-his 4650
- Gretzky-her 0
- Gretzky-its 54
- Gretzky-their 40
- Beta_his(4651, 95)
- Example 2
- Beta(3, 2)
- Beta(300, 200)
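The two examples above can be checked numerically. The sketch assumes the parameters are formed as alpha = count + 1 and beta = (sum of the other counts) + 1, which is what reproduces the slide's Beta_his(4651, 95). The function names are hypothetical.

```python
import math


def beta_params(count, other_counts):
    """Beta parameters for one gender, assuming alpha = count + 1 and
    beta = (sum of the other genders' counts) + 1."""
    return count + 1, sum(other_counts) + 1


def beta_mean_std(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)


# Example 1: Gretzky-his 4650 against her 0, its 54, their 40.
a_his, b_his = beta_params(4650, [0, 54, 40])
print(a_his, b_his)

# Example 2: identical means, very different certainty.
for a, b in [(3, 2), (300, 200)]:
    mean, std = beta_mean_std(a, b)
    print(f"Beta({a}, {b}): mean={mean:.3f}, std={std:.3f}")
```

Both distributions in Example 2 have mean 0.6, but the one built from larger counts has a much smaller standard deviation. That spread is the measure of certainty that the plain ratio lacks.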
15Gender Information Modeling(5/5) Features and
Classifier
- Feature vector for machine learning classifier
- One versus all, as in Example 1 on the previous slide
- μ
- σ
- Each gender attribute has its own feature space and a separate classifier
17Testing Gender Classification(1/3) Data Set
- Gather gender info.: AQUAINT corpus and Reuters corpus (6 gigabytes of text in total)
- Training and testing data (manually labelled)
- Third-person pronoun-antecedent pairs in 118 documents from the Slate section of the American National Corpus
- Training set: 1398 labeled pronouns in 79 documents
- Testing set: 1381 labeled pronouns in 41 documents
- Masculine 24%, Feminine 7.6%, Neutral 33.8%, Plural 34.6%
18Testing Gender Classification(2/3) Tool and
features
- Machine learning tool
- SVMlight
- Why this tool?
- Efficient implementation
- Easy to use with continuous-valued gender features
- It has shown good performance on various machine learning tasks
- Configuration
- With a linear kernel
- Without normalization
- Features
- μ and σ of the Beta distribution: each pattern has two features
- Each gender attribute has its own feature space and a separate classifier
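To illustrate how continuous-valued features could be handed to SVMlight, the sketch below writes one example in SVMlight's input format: a target of +1 or -1 followed by `index:value` pairs in increasing index order, with indices starting at 1. The feature values and the function name `to_svmlight_line` are hypothetical.

```python
def to_svmlight_line(label, features):
    """Format one example as an SVMlight input line.

    `label` is +1 or -1; `features` is a list of floats. Zero-valued
    features are omitted, which the sparse format allows.
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    parts = [f"{label:+d}"]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:.6g}")
    return " ".join(parts)


# Hypothetical (mean, std) pairs for two patterns, for one gender's classifier.
features = [0.98, 0.002, 0.6, 0.2]
line = to_svmlight_line(+1, features)
print(line)
```

A file of such lines could then be passed to SVMlight's `svm_learn`, where the `-t 0` option selects the linear kernel.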
19Testing Gender Classification(3/3) Experiment
result
Masculine 24%, Feminine 7.6%, Neutral 33.8%, Plural 34.6%
21Pronoun Resolution with Enhanced Gender(1/5)
Framework
- Purpose
- Whether gender information can improve the performance of anaphora resolution.
- Example: he...Susan...she...
- Searching scope: previous and current sentence
- Motivation: 97% of antecedents are in the current or previous sentence
- If the search fails, search backward again by decreasing the margin distance, and repeat until it succeeds
22Pronoun Resolution with Enhanced Gender(2/5)
baseline
- Always choose the previous noun phrase.
- Reject a match where the gender is known and it does not agree with the pronoun.
- Reject a match where the gender, whether known or predicted by the corresponding gender classifier, does not agree with the pronoun.
- Examples (in the diagram, candidate antecedents are marked X or v):
- he...Susan...she...
- he...Susan...John Smith...she...
- he...Susan...the actress...John Smith...she...
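The baselines above can be sketched as a single nearest-first search with an optional gender filter. This is an illustration under assumptions, not the paper's system: the gender dictionaries are invented, and the function name `resolve_pronoun` is hypothetical.

```python
def resolve_pronoun(pronoun_gender, candidates, known_gender):
    """Pick the nearest preceding noun phrase not ruled out by gender.

    `candidates` lists the preceding noun phrases, nearest first.
    `known_gender` maps a noun phrase to its gender when one is available;
    noun phrases missing from the map are never rejected.
    """
    for phrase in candidates:
        gender = known_gender.get(phrase)
        if gender is not None and gender != pronoun_gender:
            continue  # gender is known and disagrees with the pronoun
        return phrase
    return None  # every candidate was rejected


# "he ... Susan ... John Smith ... she", candidates nearest first.
candidates = ["John Smith", "Susan", "he"]
names_only = {"John Smith": "masculine", "Susan": "feminine", "he": "masculine"}

no_filter = resolve_pronoun("feminine", candidates, {})
filtered = resolve_pronoun("feminine", candidates, names_only)
print(no_filter, "|", filtered)

# A classifier can supply a gender for a phrase the name list does not cover.
candidates_he = ["the actress", "John Smith"]
with_classifier = {**names_only, "the actress": "feminine"}
without = resolve_pronoun("masculine", candidates_he, names_only)
enhanced = resolve_pronoun("masculine", candidates_he, with_classifier)
print(without, "|", enhanced)
```

Passing an empty dictionary reduces the search to "always choose the previous noun phrase". Adding entries to the dictionary only removes candidates, so the three baselines differ only in how much gender information is available.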
23Pronoun Resolution with Enhanced Gender(3/5)
Robust
- A machine learning anaphora resolution system
- Built on SVMlight
- Construct positive and negative instances
- Same as Soon and Ng, CL 2001 [2]
- Positive: 1251
- Negative: 2909
- Example: ...he...the actress...John Smith...U.S.A....she...
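The instance construction referred to above, in the style of Soon et al., can be sketched as follows: the pronoun paired with its closest antecedent is a positive instance, and the pronoun paired with each noun phrase between the two is a negative instance. The function name `make_training_instances` is hypothetical, and treating "the actress" as the antecedent of "she" is an assumption made for illustration.

```python
def make_training_instances(noun_phrases, antecedent_index, pronoun_index):
    """Build (candidate, pronoun, label) instances for one labeled pronoun.

    The antecedent paired with the pronoun is the positive instance (+1);
    every noun phrase strictly between them is a negative instance (-1).
    """
    if not 0 <= antecedent_index < pronoun_index < len(noun_phrases):
        raise ValueError("antecedent must precede the pronoun")
    pronoun = noun_phrases[pronoun_index]
    instances = [(noun_phrases[antecedent_index], pronoun, +1)]
    for i in range(antecedent_index + 1, pronoun_index):
        instances.append((noun_phrases[i], pronoun, -1))
    return instances


# Mentions in text order, taken from the slide's example string.
mentions = ["he", "the actress", "John Smith", "U.S.A.", "she"]
instances = make_training_instances(mentions, 1, 4)
for instance in instances:
    print(instance)
```

Noun phrases before the antecedent (here "he") produce no instance at all. This is one reason negatives outnumber positives without growing with the length of the document.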
24Pronoun Resolution with Enhanced Gender(4/5)
Features Extraction
- Features (44 in total)
- Syntactic and Semantic features
- Beta distribution of gender sources
- Preprocess
- Tokenizing
- Parsing
- Linking nouns (matching strings and sharing
gender info)
25Syntax
NE
Special
27Pronoun Resolution with Enhanced Gender(5/5)
with and without gender
- Christopher Kennedy, COLING 1996 [6]: 75% on 306 anaphoric pronouns
- Ruslan Mitkov, CICLing 2002 [7]: 62% on 2263 anaphoric pronouns
29Conclusion
Gender Information
Anaphora resolution
Improve
Best on broadest and accurate
Improvement
QA
First using web mining
Mining new features
Improvement
New approaches to employing the existing features
Improved parser; larger corpus
Growth of the WWW
Detect pleonastic pronouns
Resolve cataphora
30References
- [1] Richard Evans and Constantin Orasan. "Improving Anaphora Resolution By Identifying Animate Entities in Texts." DAARC 2000.
- [2] Wee Meng Soon, Hwee Tou Ng and Daniel Chung Yong Lim. "A Machine Learning Approach to Coreference Resolution of Noun Phrases." Computational Linguistics 27, No. 4 (2001): 521-44.
- [3] Natalia Modjeska, Katja Markert and Malvina Nissim. "Using the Web for Nominal Anaphora Resolution." EACL 2003 Workshop on The Computational Treatment of Anaphora, Budapest, Hungary, 14 April 2003.
- [4] Niyu Ge, John Hale, and Eugene Charniak. Introduction to Government Binding theory Second Edition. Basil Blackwell, Cambridge, UK, 1994.
- [5] Frank Keller, Maria Lapata, and Olga Ourioupina. "Using the Web to Overcome Data Sparseness." In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 230-237, 2002.
- [6] Christopher Kennedy and Branimir Boguraev. "Anaphora for Everyone: Pronominal Anaphora Resolution Without a Parser." In Proceedings of the 16th International Conference on Computational Linguistics (COLING 96), 1996.
- [7] Ruslan Mitkov, Richard Evans and Constantin Orasan. "A New, Fully Automatic Version of Mitkov's Knowledge-Poor Pronoun Resolution Method." In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), 2002.
31Reading Group
Thanks! Any questions?