Reporter: Jun Lang - PowerPoint PPT Presentation

1
Automatic Acquisition of Gender Information for
Anaphora Resolution
Reading Group
  • Reporter: Jun Lang
  • Date: 2005-12-26
  • Email: bill_lang_at_ir.hit.edu.cn

2
Author and Paper Information
  • Author: Shane Bergsma
  • Graduate student, Dept. of CS, Univ. of Alberta
  • Supervisor: Dekang Lin (developer of Minipar)
  • Research interests: anaphora resolution
  • Publications
  • Shane Bergsma, Automatic Acquisition of Gender
    Information for Anaphora Resolution, Canadian
    AI 2005 (Best Paper Award)
  • Colin Cherry and Shane Bergsma, An Expectation
    Maximization Approach to Pronoun Resolution,
    CoNLL 2005
  • Three-month internship at Google in 2005
  • Homepage: http://www.cs.ualberta.ca/bergsma/

3
Outline
  • Why do such research?
  • Related Work
  • Automatic Acquisition of Noun-Gender Pairs
  • Parsed Corpus Frequencies
  • Web Frequencies
  • Gender Information Modeling
  • Testing Gender Classification
  • Pronoun Resolution with Enhanced Gender
  • Conclusion

4
Why do such research?
  • Anaphora resolution (Richard Evans, DAARC 2000 [1])
  • Number agreement
  • Minipar for plural information
  • his, him, her, himself, herself
  • Gender agreement
  • Mr., Mrs.
  • actress, actor, etc.
  • Gender with probability
  • Masculine: he; Feminine: she; Neutral: it; Plural: they
  • John never saw the car. He arrived late.
  • John never saw the car. It arrived late.

5
Related Work
  • Typically two steps
  • Filter: gender and number agreement, and binding theory
  • Select the best candidate: more recent, more frequent
  • Recent trend
  • Machine learning classifier trained on an annotated
    corpus
  • Gender information
  • WordNet (Soon and Ng, CL 2001 [2])
  • Learning gender from unlabelled text with a
    gender-unaware pronoun resolver: accuracy 70%
  • Web for anaphora resolution
  • Page counts for patterns (Natalia Modjeska, EACL
    Workshop 2003 [3])

7
Automatic Acquisition of Noun-Gender Pairs
  • Government and Binding Theory
  • Principle A: a reflexive pronoun must be bound by
    an antecedent in its governing category [4]
  • Pattern matching
  • E.g., "John explained himself."
  • Method for determining probability
  • Over a large amount of text, count the number of
    times each noun binds a reflexive pronoun of each gender
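
The counting step can be sketched in Python as follows. This is only an illustration, assuming the noun-reflexive pairs have already been extracted from parsed text; the `REFLEXIVE_GENDER` table and the `count_gender_pairs` function are hypothetical names, not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from a reflexive pronoun to its gender/number class.
REFLEXIVE_GENDER = {
    "himself": "masculine",
    "herself": "feminine",
    "itself": "neutral",
    "themselves": "plural",
}

def count_gender_pairs(pairs):
    """Count how often each noun binds a reflexive of each gender class.

    `pairs` is an iterable of (noun, reflexive) tuples, assumed to come
    from pattern matches such as "John explained himself."
    """
    counts = defaultdict(Counter)
    for noun, reflexive in pairs:
        gender = REFLEXIVE_GENDER.get(reflexive.lower())
        if gender is not None:  # ignore anything that is not a reflexive
            counts[noun.lower()][gender] += 1
    return counts

pairs = [("John", "himself"), ("John", "himself"), ("doctor", "herself")]
counts = count_gender_pairs(pairs)
print(dict(counts["john"]))
```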

8
Parsed Corpus Frequencies
Dependency Relation
Noun-Pronoun pair
9
Web Frequencies
  • Out-of-vocabulary nouns: use the Web (Frank Keller, EMNLP
    2002 [5])
  • Count the number of pages returned by Google

11
Gender Information Modeling(1/5) Maximum
likelihood
  • Counting for the maximum likelihood formulation
  • Five parsed corpus templates
  • Five web mining templates
  • Example
  • doctor-himself: 224, doctor-herself: 126
  • doctor-itself: 0, doctor-themselves: 14
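
As a small worked example, the maximum likelihood estimate for the "doctor" counts above is each count divided by the total. The function name `maximum_likelihood` and the gender labels are illustrative choices, not from the slides.

```python
def maximum_likelihood(counts):
    """Relative-frequency estimate of each gender's probability."""
    total = sum(counts.values())
    if total == 0:
        return {gender: 0.0 for gender in counts}
    return {gender: count / total for gender, count in counts.items()}

# Reflexive counts for "doctor" from the slide.
doctor = {"masculine": 224, "feminine": 126, "neutral": 0, "plural": 14}
probs = maximum_likelihood(doctor)
for gender, p in probs.items():
    print(f"{gender}: {p:.3f}")
```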

12
Gender Information Modeling(2/5) Fault of
Maximum Likelihood
  • Two issues
  • Small counts result in large probability
    swings.
  • Counts (1,2): 33%, (2,2): 50%, (3,2): 60%
  • A measure of how certain the probability estimate is is needed.
  • For example, 3/(3+2) = 300/(300+200) = 0.6
  • Solution
  • Each gender's noun-pronoun pairing is treated as a
    separate event, modelled with a Beta
    distribution.

13
Gender Information Modeling(3/5) Beta
Distribution
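
The formula on this slide was not transcribed. For reference, the standard Beta density and its first two moments are given below. The mapping from counts to parameters (α is the count for one gender plus one, β is the count for all other genders plus one) is an assumption inferred from the Gretzky example on the following slide.

```latex
f(p;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
                    p^{\alpha-1}(1-p)^{\beta-1}, \qquad 0 \le p \le 1

\mu = \frac{\alpha}{\alpha+\beta}, \qquad
\sigma^{2} = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}
```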
14
Gender Information Modeling(4/5) Example of Beta
Distribution
  • Example 1
  • Gretzky-his: 4650
  • Gretzky-her: 0
  • Gretzky-its: 54
  • Gretzky-their: 40
  • Beta_his(4651, 95)
  • Example 2
  • Beta(3, 2)
  • Beta(300, 200)
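
A short Python sketch shows why the two distributions in Example 2 differ even though both have the same mean. It uses the standard closed-form mean and variance of a Beta distribution; the helper name `beta_mean_std` is illustrative.

```python
import math

def beta_mean_std(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)

for alpha, beta in [(3, 2), (300, 200), (4651, 95)]:
    mean, std = beta_mean_std(alpha, beta)
    print(f"Beta({alpha}, {beta}): mean={mean:.3f}, std={std:.4f}")
```

Beta(3, 2) and Beta(300, 200) both have mean 0.6, but the standard deviation drops from 0.2 to about 0.02, which is the certainty measure that the maximum likelihood estimate lacks.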

15
Gender Information Modeling(5/5) Features and
Classifier
  • Feature vector for the machine learning classifier
  • One-versus-all, as in Example 1 on the previous slide
  • μ
  • Each gender attribute has its own feature space and a
    separate classifier
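
One way the one-versus-all features could be built from a noun's counts is sketched below. It assumes the two features per pattern are the mean and the standard deviation of the Beta distribution, and that one is added to each side of the count; both are assumptions, and `gender_features` is a hypothetical name.

```python
import math

GENDERS = ("masculine", "feminine", "neutral", "plural")

def gender_features(counts):
    """Return {gender: (mean, std)} from one-versus-all Beta distributions."""
    total = sum(counts.get(g, 0) for g in GENDERS)
    features = {}
    for g in GENDERS:
        alpha = counts.get(g, 0) + 1         # this gender, plus one
        beta = total - counts.get(g, 0) + 1  # all other genders, plus one
        n = alpha + beta
        mean = alpha / n
        std = math.sqrt(alpha * beta / (n ** 2 * (n + 1)))
        features[g] = (mean, std)
    return features

# Counts for "Gretzky" from Example 1.
gretzky = {"masculine": 4650, "feminine": 0, "neutral": 54, "plural": 40}
features = gender_features(gretzky)
print(features["masculine"])
```

For the masculine attribute this reproduces the Beta(4651, 95) parameters of Example 1; each gender's pair of values would then go to that gender's own classifier.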

17
Testing Gender Classification(1/3) Data Set
  • Gender information gathered from the AQUAINT corpus and
    the Reuters corpus (6 gigabytes of text in total)
  • Training and testing data (manually labelled)
  • Third-person pronoun-antecedent pairs in 118
    documents from the Slate section of the American
    National Corpus
  • Training set: 1398 labelled pronouns in 79
    documents
  • Testing set: 1381 labelled pronouns in 41
    documents
  • Masculine 24%, Feminine 7.6%, Neutral 33.8%,
    Plural 34.6%

18
Testing Gender Classification(2/3) Tool and
features
  • Machine learning tool
  • SVMlight
  • Why this tool?
  • Efficient implementation
  • Handles continuous-valued gender features easily
  • It has shown good performance on various machine
    learning tasks
  • Configuration
  • Linear kernel
  • No normalization
  • Features
  • μ and σ of the Beta distribution: each pattern has
    two features
  • Each gender attribute has its own feature space and a
    separate classifier

19
Testing Gender Classification(3/3) Experimental
results
Class distribution: Masculine 24%, Feminine 7.6%,
Neutral 33.8%, Plural 34.6%
21
Pronoun Resolution with Enhanced Gender(1/5)
Framework
  • Purpose
  • Test whether gender information can improve the
    performance of anaphora resolution.

he...Susan...she...
Search scope: the previous and current sentences.
Motivation: 97% of antecedents are in the current or previous sentence.
If the search fails, search again further back, widening the
scope step by step until an antecedent is found.
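
The search described above can be sketched as a loop that starts with the current and previous sentence and looks one sentence further back after each failure. This is a simplified illustration, not the presented system: `find_antecedent` and `is_acceptable` are hypothetical names, and every candidate in the pronoun's own sentence is assumed to precede the pronoun.

```python
def find_antecedent(sentences, pronoun_sentence, is_acceptable):
    """Search for an antecedent, widening the window backward on failure.

    `sentences` is a list of lists of candidate noun phrases, one list per
    sentence, in document order.
    """
    window = 1  # start with the current and the previous sentence
    while True:
        start = max(0, pronoun_sentence - window)
        # Walk backward so that the most recent candidate is tried first.
        for index in range(pronoun_sentence, start - 1, -1):
            for candidate in reversed(sentences[index]):
                if is_acceptable(candidate):
                    return candidate
        if start == 0:
            return None  # the whole preceding text has been searched
        window += 1

sentences = [["Susan"], ["the car"], ["the report"]]
result = find_antecedent(sentences, 2, lambda np: np == "Susan")
print(result)
```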
22
Pronoun Resolution with Enhanced Gender(2/5)
baseline
  • Reject a match where the gender is known and
    does not agree with the pronoun.
  • Always choose the previous noun phrase.
  • Reject a match where the gender is known and
    does not agree with the pronoun or with the gender
    from the corresponding gender classifier.

Examples:
he...Susan...she...
he...Susan...John Smith...she...
he...Susan...the actress...John Smith...she...
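
A minimal sketch of the gender filter follows, under the assumption that gender knowledge is a dictionary from noun phrase to label and that classifier output can be merged into the same dictionary. The names `agrees`, `resolve`, `lexicon_only`, and `with_classifier` are hypothetical, and the candidate list is made up for illustration.

```python
def agrees(pronoun_gender, candidate_gender):
    """A candidate is rejected only when its gender is known and differs."""
    return candidate_gender is None or candidate_gender == pronoun_gender

def resolve(pronoun_gender, candidates, known_gender):
    """Pick the most recent candidate that passes the gender filter.

    `candidates` are noun phrases in document order; `known_gender` maps a
    noun phrase to a gender label and omits phrases of unknown gender.
    """
    for candidate in reversed(candidates):
        if agrees(pronoun_gender, known_gender.get(candidate)):
            return candidate
    return None

candidates = ["John Smith", "the actress"]
lexicon_only = {"Susan": "feminine", "John Smith": "masculine"}
with_classifier = {**lexicon_only, "the actress": "feminine"}

print(resolve("masculine", candidates, lexicon_only))
print(resolve("masculine", candidates, with_classifier))
```

With only the small lexicon, "the actress" has unknown gender and is wrongly accepted for a masculine pronoun; once a classifier supplies its gender, the filter rejects it and falls back to "John Smith".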
23
Pronoun Resolution with Enhanced Gender(3/5)
Robust
  • A machine learning anaphora resolution system
  • Built on SVMlight
  • Construct positive and negative instances
  • Same method as Soon and Ng, CL 2001 [2]

Example: ...he...the actress...John Smith...U.S.A....she...
Positive instances: 1251
Negative instances: 2909
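
The instance construction of Soon et al. pairs each anaphor with its closest antecedent as a positive instance and with every noun phrase in between as a negative instance. The sketch below illustrates that scheme; the function name `make_instances`, the tuple layout, and the choice of "the actress" as the antecedent are assumptions for illustration.

```python
def make_instances(mentions, antecedent_index, pronoun_index):
    """Build training pairs for one pronoun.

    `mentions` lists noun phrases in document order. The pair (antecedent,
    pronoun) is positive; every noun phrase strictly between the two forms
    a negative pair with the pronoun.
    """
    pronoun = mentions[pronoun_index]
    positive = [(mentions[antecedent_index], pronoun, 1)]
    negative = [
        (mentions[i], pronoun, 0)
        for i in range(antecedent_index + 1, pronoun_index)
    ]
    return positive + negative

mentions = ["the actress", "John Smith", "U.S.A.", "she"]
instances = make_instances(mentions, antecedent_index=0, pronoun_index=3)
for instance in instances:
    print(instance)
```

Because each pronoun yields one positive pair and possibly several negative pairs, negative instances outnumber positive ones, as in the counts above.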
24
Pronoun Resolution with Enhanced Gender(4/5)
Features Extraction
  • Features (44 in total)
  • Syntactic and semantic features
  • Beta distribution of gender sources
  • Preprocessing
  • Tokenizing
  • Parsing
  • Linking nouns (matching strings and sharing
    gender info)

25
Feature groups (table not transcribed): Syntax, NE, Special
27
Pronoun Resolution with Enhanced Gender(5/5)
with and without gender
Christopher Kennedy, COLING 1996 [6]: 75% on 306
anaphoric pronouns
Ruslan Mitkov, CICLing 2002 [7]: 62%
on 2263 anaphoric pronouns
29
Conclusion
  • Gender information improves anaphora resolution
  • Best results with the broadest and most accurate gender information
  • Improvement for QA
  • First to use web mining
  • Further improvement
  • Mining new features
  • New approaches to employing the existing features
  • Improved parser, larger corpus
  • Growth of the WWW
  • Detect pleonastic pronouns
  • Resolve cataphora
30
References
  • [1] Richard Evans and Constantin Orasan. "Improving
    Anaphora Resolution by Identifying Animate
    Entities in Texts." DAARC 2000.
  • [2] Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong
    Lim. "A Machine Learning Approach to Coreference
    Resolution of Noun Phrases." Computational
    Linguistics 27(4), 2001, pages 521-544.
  • [3] Natalia Modjeska, Katja Markert, and Malvina
    Nissim. "Using the Web for Nominal Anaphora
    Resolution." EACL 2003 Workshop on the
    Computational Treatment of Anaphora, Budapest,
    Hungary, 14 April 2003.
  • [4] Liliane Haegeman. Introduction to Government
    and Binding Theory, Second Edition. Basil
    Blackwell, Cambridge, UK, 1994.
  • [5] Frank Keller, Maria Lapata, and Olga Ourioupina.
    "Using the Web to Overcome Data Sparseness."
    Proceedings of the Conference on Empirical
    Methods in Natural Language Processing, 2002,
    pages 230-237.
  • [6] Christopher Kennedy and Branimir Boguraev.
    "Anaphora for Everyone: Pronominal Anaphora
    Resolution Without a Parser." Proceedings of the
    16th International Conference on Computational
    Linguistics (COLING 96), 1996.
  • [7] Ruslan Mitkov, Richard Evans, and Constantin
    Orasan. "A New, Fully Automatic Version of
    Mitkov's Knowledge-Poor Pronoun Resolution
    Method." Proceedings of the Third International
    Conference on Intelligent Text Processing and
    Computational Linguistics (CICLing-2002), 2002.

31
Reading Group
Thanks! Any questions?
  • Reporter: Jun Lang
  • Date: 2005-12-26
  • Email: bill_lang_at_ir.hit.edu.cn