Title: Improving LSA-based Summarization with Anaphora Resolution
1 Improving LSA-based Summarization with Anaphora Resolution
- Josef Steinberger, Mijail A. Kabadjov, Massimo Poesio, Olivia Sanchez-Graillet
HLT/EMNLP 2005, Vancouver, Canada
2 Content
- Exploiting coherence for summarization
- Why anaphora resolution might help summarization
- LSA-based Summarization
- Anaphora resolver GuiTAR
- Combining lexical and anaphoric knowledge
- Evaluation results
- Conclusion and future work
3 Exploiting coherence for summarization
- Lexical approaches
- Lexical relations used to identify central terms
(Barzilay and Elhadad, 1997; Gong and Liu, 2002)
- Coreference-based approaches
- Identifying central terms by running a coreference
(anaphora) resolver over the text (Boguraev and
Kennedy, 1997; Baldwin and Morton, 1998)
- A combination of both?
- Does adding the anaphoric information improve
summarization performance?
4 Why anaphora resolution might help summarization
- PRIEST IS CHARGED WITH POPE ATTACK
- A Spanish priest was charged here today with
attempting to murder the Pope. Juan Fernandez
Krohn, aged 32, was arrested after a man armed
with a bayonet approached the Pope while he was
saying prayers at Fatima on Wednesday night.
According to the police, Fernandez told the
investigators today that he trained for the past
six months for the assault. . . . If found
guilty, the Spaniard faces a prison sentence of
15-20 years.
- (Boguraev and Kennedy, 1997)
5 Latent Semantic Analysis (LSA)
- Technique for extracting hidden dimensions of the
semantic representation of terms, sentences, or
documents, on the basis of their contextual use
(Landauer, 1997)
- Used in various NLP applications (information
retrieval: Berry et al., 1995; text segmentation:
Choi et al., 2001)
- Gong and Liu (2002): first LSA-based
summarization approach
6 Singular Value Decomposition
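The slide body here was a figure. As a reminder, the standard (thin) decomposition that the following slides rely on can be written as below. The symbol names are assumptions rather than taken from the slide: A is the term-by-sentence matrix with m terms and n sentences.

```latex
A = U \Sigma V^{T}, \qquad
A \in \mathbb{R}^{m \times n},\;
U \in \mathbb{R}^{m \times k},\;
\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_k),\;
V^{T} \in \mathbb{R}^{k \times n},\;
k = \min(m, n),
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k \ge 0
```

Under this reading, each row of V^T corresponds to a latent topic and each column to a sentence.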
7 LSA-based Summarization
- Gong and Liu
- For each row in V^T (topic), choose the sentence
with the highest value (best description of the
topic)
- Our approach
- Compute the length of each sentence vector in
matrix Σ·V^T
- Dimensionality reduction level (r) is learned
from the data (take dimension i if σ_max / σ_i <
threshold)
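The two selection strategies on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy numbers, and the threshold value of 2.0 are all assumptions, and the toy `vt` is hand-written rather than the output of a real SVD (in practice the singular values and V^T would come from an SVD routine such as `numpy.linalg.svd`).

```python
import math

def choose_rank(singular_values, threshold):
    """Count leading dimensions i for which sigma_max / sigma_i < threshold.

    Assumes singular_values is sorted in descending order.
    """
    sigma_max = singular_values[0]
    r = 0
    for sigma in singular_values:
        if sigma > 0 and sigma_max / sigma < threshold:
            r += 1
        else:
            break
    return r

def gong_liu_select(vt, num_topics):
    """For each of the first num_topics rows of V^T, pick the sentence
    (column index) with the highest value. Skipping a sentence that was
    already picked is an assumption of this sketch."""
    chosen = []
    for row in vt[:num_topics]:
        best = max(range(len(row)), key=lambda j: row[j])
        if best not in chosen:
            chosen.append(best)
    return chosen

def sentence_lengths(singular_values, vt, r):
    """Length of each sentence's column vector in Sigma * V^T,
    using only the first r dimensions."""
    n = len(vt[0])
    return [
        math.sqrt(sum((singular_values[i] * vt[i][j]) ** 2 for i in range(r)))
        for j in range(n)
    ]

# Hypothetical toy values: 3 topics (rows) x 4 sentences (columns).
singular_values = [4.0, 2.5, 1.0]
vt = [
    [0.5, 0.7, 0.1, 0.5],
    [0.6, -0.2, 0.7, -0.3],
    [0.1, 0.3, -0.2, 0.9],
]

r = choose_rank(singular_values, threshold=2.0)
lengths = sentence_lengths(singular_values, vt, r)
# Sentences ordered from highest to lowest score.
ranking = sorted(range(len(lengths)), key=lambda j: lengths[j], reverse=True)
topic_picks = gong_liu_select(vt, num_topics=3)
```

The length-based score lets a sentence that is moderately strong on several important topics outrank one that is strong on a single minor topic, because each dimension is weighted by its singular value.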
8 GuiTAR
- Stands for General Tool for Anaphora Resolution
- Object-oriented architecture
- XML in / XML out
- Version 2.1 resolves
- Definite Descriptions (Vieira and Poesio, 2000)
- Includes Discourse-new classifier (Poesio et
al., 2005)
- Personal Pronouns (Mitkov, 1998)
- Possessive Pronouns (adapted Mitkov's algorithm)
9 Combining lexical and anaphoric knowledge: Substitution method
- GuiTAR as pre-processor
- Example
- S (original): If we don't do it now, Australia is going to
be in deficit and debt into the next century.
- S (after substitution): If Australia don't do spending cuts now,
Australia is going to be in deficit and debt into
the next century.
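The substitution step in the example above can be sketched as a token-level replacement. This is only an illustration: the function name and the `(start, end, antecedent)` format are hypothetical, and the resolutions are hard-coded here as a stand-in for whatever GuiTAR actually outputs.

```python
def substitute_anaphors(tokens, resolutions):
    """Return a new token list in which each resolved anaphor span is
    replaced by its antecedent.

    resolutions: list of (start, end, antecedent_tokens), end exclusive.
    This format is an assumption of the sketch, not GuiTAR's output format.
    """
    out = list(tokens)
    # Apply replacements right to left so earlier indices stay valid.
    for start, end, antecedent in sorted(
        resolutions, key=lambda res: res[0], reverse=True
    ):
        out[start:end] = antecedent
    return out

tokens = "If we don't do it now , Australia is going to be in deficit".split()
# Hypothetical resolver output: "we" -> "Australia", "it" -> "spending cuts".
resolutions = [(1, 2, ["Australia"]), (4, 5, ["spending", "cuts"])]
result = " ".join(substitute_anaphors(tokens, resolutions))
```

As in the slide's example, the substitution is purely textual, so the result need not be grammatical; it only has to expose the antecedent terms to the lexical summarizer.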
10 Combining lexical and anaphoric knowledge: Addition method
- Modifying the source SVD matrix
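The slide only states that the source matrix is modified. One plausible reading, assumed here and not confirmed by the slide, is that each anaphoric chain (discourse entity) becomes an extra row of the term-by-sentence matrix, with a count for every sentence that mentions the entity. All names and the toy data below are hypothetical.

```python
def add_chain_rows(term_sentence, chains, weight=1.0):
    """Return a copy of the matrix with one extra row per anaphoric chain.

    term_sentence: list of rows (terms), each a list over sentences.
    chains: one list per entity, holding the sentence index of each mention.
    weight: value added per mention (an assumed, tunable parameter).
    """
    n = len(term_sentence[0])
    extended = [list(row) for row in term_sentence]
    for mentions in chains:
        row = [0.0] * n
        for j in mentions:
            row[j] += weight
        extended.append(row)
    return extended

# Toy matrix: 2 terms x 3 sentences.
matrix = [
    [1.0, 0.0, 0.0],  # "priest"
    [0.0, 1.0, 1.0],  # "fernandez"
]
# One hypothetical chain for the attacker: mentioned once in sentences
# 0 and 1, and twice in sentence 2 (e.g. "Fernandez" and "he").
chains = [[0, 1, 2, 2]]
extended = add_chain_rows(matrix, chains)
```

Compared with substitution, this variant leaves the lexical rows untouched and lets the decomposition see the entity as an additional co-occurrence signal.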
11 Evaluation
- 37 files from the CAST corpus of manually produced
summaries (Orasan et al., 2003)
- Anaphoric relations annotation
- Parsed the corpus with Charniak's parser (2000)
- Annotated with MMAX (Mueller and Strube, 2003)
- Evaluation Measures
- Relative Utility (Radev et al., 2000)
- Cosine Similarity
- F-score
- Main Topic Similarity (Steinberger and Jezek,
2004)
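Two of the measures listed above are simple enough to sketch. The version below is an assumption-laden illustration: cosine similarity is computed over raw term-frequency vectors with whitespace tokenization, and the F-score is computed over sets of extracted sentence indices; the evaluation on the slides may have used different weighting or units.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def f_score(system, reference):
    """Balanced F-score between system and reference sentence selections."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

# Hypothetical extracts given as sentence indices.
f = f_score([0, 2, 5], [0, 1, 2, 3])
same = cosine_similarity("the pope was attacked", "the pope was attacked")
disjoint = cosine_similarity("a spanish priest", "deficit and debt")
```

The F-score rewards only exact sentence overlap, while cosine similarity gives partial credit when a different sentence with similar wording is selected.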
12 Evaluation: Upper bound
Table legend: not significant / significant (by t-test at 95% confidence)
13 Evaluation: AR Performance
- Anaphora resolution performance of GuiTAR v2.1
14 Evaluation: GuiTAR improvement
Table legend: not significant / significant (by t-test at 95% confidence)
15 Conclusion and Future Work
- Anaphoric information leads to significant
improvement in summarization performance
- Results suggest the better the AR performance,
the greater the improvement
- Next steps
- Evaluate summarizer with GuiTAR v3.0 (PN)
- Evaluate summarizer on DUC 2002 data:
preliminary results rank our summarizer 3rd out of
15 systems (measured by ROUGE)
- Explore a different weighting scheme (i.e., giving
anaphors a higher score)