Title: Summarization
1. Summarization
- Ryan Davies
- Laura Emond
- CSI5386, NLP
- March 16, 2005
2. Outline
- Introduction
- Summary of the articles
- Demo
- Microsoft AutoSummarize
- MEAD
3. Introduction
- Why summarize?
- Thousands of articles, too much to read; summaries help selectivity
- To be concise: get just the most crucial information
- Types of summaries: extract, summary, abstract, abridgement, précis, digest, highlight, synopsis
4. Articles
- Our articles are from this journal:
- Computational Linguistics, Volume 28, Number 4, December 2002
- Articles:
- Dragomir R. Radev, Eduard Hovy, Kathleen McKeown: "Introduction to the Special Issue on Summarization"
- Simone Teufel, Marc Moens: "Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status"
- Klaus Zechner: "Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres"
- Horacio Saggion, Guy Lapalme: "Generating Indicative-Informative Summaries with SumUM"
5. Introduction to the Special Issue on Summarization
- Driver: the increase of online information
- Present the main ideas in less space
- Could not easily summarize a document in which everything was equally important
- Try to keep it as informative as possible
- Information content appears in bursts
- Indicative summary: basically keywords
- Informative summary: reproduces the content
6. Definition
- Summary:
- Produced from one or more texts
- Less than 50% of the original length
- Conveys the most important information
7. Processes
- Extraction: identify important material
- Abstraction: reformulate (novel wording)
- Fusion: combine extracted passages
- Compression: get rid of peripheral parts
- Early approaches came from IR, now NLP
8. Approaches
- 1. Single document (extraction)
- Take sentences from the original document
- Surface level (signals)
- Scoring: key phrases, frequency, position (see the sketch after this list)
- Now ML and NLP are used for key passages and relations between words, instead of bags of words
- Word relatedness, discourse structure
- Connected words (anaphora, synonyms, shared words)
- Topic / theme
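A minimal sketch of surface-level scoring of this kind, combining frequency, position, and cue-phrase signals in Python. The weights, cue list, and example sentences are illustrative assumptions, not taken from any of the papers surveyed here.

    # Toy surface-level sentence scorer: frequency + position + cue phrases.
    # Weights and the cue list are illustrative assumptions.
    from collections import Counter

    CUE_PHRASES = {"in conclusion", "we show", "this paper"}  # assumed cues

    def score_sentences(sentences):
        # Document-wide word frequencies (a crude stand-in for tf or tf-idf)
        freq = Counter(w for s in sentences for w in s.lower().split())
        scores = []
        for i, s in enumerate(sentences):
            words = s.lower().split()
            f = sum(freq[w] for w in words) / max(len(words), 1)  # frequency
            p = 1.0 / (i + 1)                                     # position
            c = 1.0 if any(cue in s.lower() for cue in CUE_PHRASES) else 0.0
            scores.append(0.5 * f + 0.3 * p + 0.2 * c)
        return scores

    sentences = ["This paper presents a summarizer.",
                 "We ran many experiments.",
                 "In conclusion, extraction works well."]
    best = max(zip(score_sentences(sentences), sentences))
    print(best[1])  # the highest-scoring sentence becomes the extract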
9. Approaches
- 2. Single document (abstraction)
- Abstraction encompasses any method that does not rely strictly on extraction
- Information Extraction
- Compressive Summarization
- Ontological Abstraction
10. Information Extraction
- Designer specifies slots to be filled by certain information (a toy template follows below)
- E.g. an earthquake description must include date, location, severity
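A toy version of the slide's earthquake example in Python: the slot names come from the slide, while the regular expressions and the sample report are our illustrative assumptions, not a real IE system.

    # Toy slot-filling template for an earthquake description.
    import re

    TEMPLATE = {
        "date": r"on (\w+ \d{1,2}, \d{4})",
        "location": r"struck ([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
        "severity": r"magnitude (\d\.\d)",
    }

    def fill_slots(text):
        # Try each slot's pattern against the text; None marks an unfilled slot.
        return {slot: (m.group(1) if (m := re.search(pat, text)) else None)
                for slot, pat in TEMPLATE.items()}

    report = "A magnitude 6.1 quake struck Central Chile on March 3, 2005."
    print(fill_slots(report))
    # {'date': 'March 3, 2005', 'location': 'Central Chile', 'severity': '6.1'}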
11. Compressive Summarization
- Borrows from language generation. Two approaches (a sketch of the first follows below):
- Words are extracted from the document and re-formed using a bigram language model.
- Sentences are selected, combined, and then reduced by dropping the least important fragments (similar to how humans cut and paste).
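A minimal sketch of the first approach: scoring a candidate re-formed word sequence with a bigram language model estimated from the document. The add-one smoothing and the toy corpus are our assumptions, not the cited system's.

    # Toy bigram language model over document words, used to score a
    # candidate re-formed word sequence. Add-one smoothing is an assumption.
    from collections import Counter

    def train_bigrams(words):
        return Counter(zip(words, words[1:])), Counter(words)

    def sequence_prob(candidate, bigrams, unigrams, vocab_size):
        p = 1.0
        for prev, cur in zip(candidate, candidate[1:]):
            # P(cur | prev) with add-one smoothing
            p *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        return p

    doc = "the system extracts words and the system reforms words".split()
    bigrams, unigrams = train_bigrams(doc)
    vocab = len(set(doc))
    print(sequence_prob("the system extracts words".split(), bigrams, unigrams, vocab))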
12. Ontological Abstraction
- Leaves plenty of room for innovation.
- Relies on an external knowledge base to recognize new information in the document.
- Seems to be less NLP and more AI. Not feasible until a complete knowledge base is available.
13. Approaches
- 3. Multiple-document summarization
- Similar to single-document summarization. Fills in pre-determined slot values to construct a briefing (big picture / whole story / information synthesis).
- Uses: biographies, describing events (e.g. a hurricane)
- Differences are the need to:
- Avoid redundancy (measure similarity between sentence pairs; see the sketch after this list)
- Identify differences (discourse rules)
- Ensure summary coherence (time order of articles, text order within articles, time stamps)
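A minimal sketch of the redundancy check in Python: cosine similarity between bag-of-words sentence vectors, dropping near-duplicates. The 0.7 threshold is an illustrative assumption.

    # Toy redundancy filter for multi-document summarization.
    import math
    from collections import Counter

    def cosine(s1, s2):
        v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
        dot = sum(v1[w] * v2[w] for w in v1)
        norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
        return dot / (norm(v1) * norm(v2)) if v1 and v2 else 0.0

    a = "the hurricane hit the coast on monday"
    b = "on monday the hurricane hit the coast"
    if cosine(a, b) > 0.7:       # near-duplicate pair: keep only one
        print("redundant pair, drop one sentence")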
14. Evaluation
- Humans agree just 60% of the time on sentence content, even for straightforward news articles
- Better results for short summaries (people probably agree on the objective keystone data, but not on the supporting evidence, which is more subjective)
- Metrics:
- Form (grammar, coherence, organization)
- Content (compare to human abstracts)
- Extraneous information → precision
- Omitted information → recall
- Tasks: categorization, question answering, ad hoc
15. State of the art
- Simple sentence extraction is evolving into a process involving extracting, merging, and editing phrases
- Analysis of simple news articles → analysis of longer documents (scientific articles, medical journals, patents): Teufel & Moens, Saggion & Lapalme
- Analysis of written text → speech: Zechner
- Novel system development to automate the summarization process: Saggion and Lapalme
17. Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status
- Simone Teufel, Marc Moens
- Contribution: focus on scientific articles
- Scientific articles are dissimilar from news articles, etc.
- The first paragraph is not usually a good summary
- A given sentence may be an important result, or a criticism of a previous result
18. Basic Method
- The authors believe that sentence (or clause) extraction is currently still the best basic method in most cases
- Need to recognize scientifically significant information and distinguish it from supporting data
19. Identification Approach
- To identify candidate sentences, the authors suggest classification by rhetorical status
- Rhetorical status: categorization of sentences based on their purpose in the document
- AIM, TEXTUAL, OWN, BACKGROUND, CONTRAST, BASIS, OTHER
20. Classification Tools Used
- Traditional text extraction characteristics
- Metadiscourse & agentivity
- Citations & relatedness
21. Traditional Characteristics
- Feature-based attribution
22. Feature-Based Attribution (1)
- Features mostly drawn from already-developed summarization techniques.
- Particular features chosen (and weighted) informally, by intuition and trial-and-error, i.e. "it just works".
23. Feature-Based Attribution (2)
- Some features, evaluated individually, give poor results
- i.e. some are even worse than chance,
- or kappa < 0
- (where -1 means always wrong,
- 1 means always right,
- and 0 means equivalent to random sentence classification; the standard formula follows below)
- However, removing any one feature degrades the overall performance of the system, even if that feature performs poorly individually.
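For reference, the kappa statistic cited here is the standard chance-corrected agreement measure; this is the textbook definition, not a formula quoted from the paper:

    \kappa = \frac{P_o - P_e}{1 - P_e}

where P_o is the observed agreement with the correct classification and P_e is the agreement expected by chance, so kappa is 0 at chance level and 1 at perfect agreement.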
24. Metadiscourse & Agentivity
- Metadiscourse: explicit phrases which indicate the purpose of the statement ("they argue that", "we conclude that", "agree", "suggest")
- List generated manually, again by intuition and by examining the corpus.
- Agentivity: "we", "they", "our"
- Superior anaphora resolution techniques would help here.
25. Citations & Relatedness
- Need to recognize the context of citations:
- Negative: critical, contrastive
- Positive: basis for current work
- Neutral
26. More Rhetorical Status Indicators
- Problem structure (problems, solutions, results): texts do not always make this clear, but when they do, this information can be useful
- Intellectual attribution ("other researchers claim that", "we have discovered that"): uses metadiscourse heavily
- Scientific argumentation (progression of rhetorically coherent statements that convey the scientific contribution): related to problem structure
- Attitude toward others' work (rival, flawed, contributing): often explicit
27. Relevance (1)
- The authors also suggest ordering statements by relevance
- Relevance: importance of the statement to the meaning of the document
28. Relevance (2)
- Many methods are used
- Most follow the pattern of matching certain words or phrases against a manually generated list of (dis)qualifiers
- E.g. some action phrases disqualify ("argue", "intend", "lacks")
- E.g. some agent phrases qualify or disqualify sentences ("we" qualifies, "they" usually disqualifies)
- Negation is also considered
29. Summarization Process (1)
- The set of relevant sentences is considered
- Humans followed this decision tree for categorizing sentences (for the learning stage and the gold standard)
30. Summarization Process (2)
- Sentences in general were found to have the following category distribution (top)
- Relevant sentences were more evenly distributed (bottom)
31. Summarization Process (3)
- The end-result summary focuses on AIM, CONTRAST, and BASIS sentences.
- Sentences from other categories are considered as well.
- The decision to include a sentence is based on its relevance score.
32. Evaluation
- The authors evaluated the system's two main components individually:
- Categorization
- Relevance determination
33. Categorization Performance (1)
- Used F-measure as an indicator of accuracy (performance)
- F = 2PR / (P + R)
- P = precision, R = recall
34. Categorization Performance (2)
- Human performance and stability were fairly high in this task
- The performance of the system was below human performance, but far higher than the TF-IDF text-extraction baseline.
35. Categorization Performance (3)
- The authors also examined which features had the most impact on performance.
- Some features greatly impacted disambiguation of some categories, while performing worse than chance for others
- Regardless, removing any one feature from the pool consistently decreased overall performance.
36. Relevance Det. Performance (1)
- The system performed well when only the three categories AIM, CONTRAST, and BASIS were considered:
- Recall: 0.44 average
- Precision: 0.79 average (a quick F-measure check follows below)
- Once other categories, such as BACKGROUND, are considered, recall and precision drop quickly, but remain much higher than the baseline performance.
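Plugging those averages into the F-measure from slide 33 gives a quick sanity check (our arithmetic, not a figure reported in the paper):

    F = \frac{2PR}{P + R} = \frac{2 \times 0.79 \times 0.44}{0.79 + 0.44} \approx 0.57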
37. Conclusion (1)
- The authors of this paper made two major contributions to this area:
- They applied rhetorical status information to scientific articles, and
- They chose a set of classification features which produces higher performance on scientific articles than generic sets.
38. Conclusion (2)
- Areas of future improvement identified in the article:
- Automatic gathering of metadiscourse features
- Using a more sophisticated statistical classifier
- Improving anaphora resolution for the agent feature
40. Generating Indicative-Informative Summaries with SumUM
- Horacio Saggion, Guy Lapalme (Sheffield, U of Montreal)
- Software that produces technical indicative-informative summaries
- Indicative: identifies topics
- Informative: elaborates on these topics (qualitative and quantitative)
- Purpose: give the reader an exact and concise idea of what is in the source
- Dynamic summarization
- Shallow syntactic and semantic analysis
- Concept identification
- Text regeneration
41. Steps
- Interpret the text
- Extract relevant information (topics)
- Condense and construct the summary
- Present the summary in natural language
- Coherent selection and expression are not solved yet
42. Selective Analysis
- Selective analysis: a process of conceptual identification and text re-generation
- Imitates how humans write abstracts.
- Indicative selection: indicative terms are used to find concepts, definitions, etc.
- Informative selection: looks for an informative marker and matches a pattern.
- Indicative generation: put topics in conceptual order, merge them, and put them in one paragraph.
- Informative generation: provides more information to the reader by filling in more templates.
43. Template
- Shows what SumUM looks for to construct the indicative abstract
44. Pattern Matching
45. Methodology
- The corpus studied had 100 document-abstract pairs from computer and information science journals.
- Studied how professional abstracts are written.
- Mapped the location of abstract sentences to the document manually (using photocopies).
- Interpreted sentences with 334 FSTs representing linguistic and domain-specific patterns (a toy pattern is sketched below).
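A toy illustration of one such pattern, using a Python regular expression in place of a real finite-state transducer; the pattern and the sample sentence are our assumptions, not one of SumUM's 334.

    # Toy stand-in for one linguistic pattern: detect a "topic announcement".
    import re

    TOPIC_PATTERN = re.compile(
        r"this (?:paper|article) (?:presents|describes|proposes) (?P<topic>.+)",
        re.IGNORECASE)

    sentence = "This paper presents a method for summarizing dialogues."
    m = TOPIC_PATTERN.search(sentence)
    if m:
        print("topic found:", m.group("topic"))
        # -> topic found: a method for summarizing dialogues.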
46. Conceptual Information in Technical Documents
- Some is generic (author, research institution, date)
- Some is discipline-specific (algorithms in CS, treatments in medicine)
- Identified and classified with thesauri:
- 55 concepts (research activity, article)
- 39 relations (studying, reporting, thinking)
- 52 types of information (background, elaboration)
- Extract the information → sort → edit (language understanding and production: deduction, generalization, paraphrase)
- Remove peripheral linguistics ("it appears that" → "apparently"), concatenate, truncate, delete phrases, etc.
- Impersonalize (syntactic verb transformation)
47. Conceptual Information
48. Transformations in Human-Generated Summaries
- Domain verbs (40%), noun editing (38%), merge/split (38%), complex reformulation (23%), no changes (11%)
- 70% come from the intro, conclusion, title, and captions
- 89% of sentences were edited
49. SumUM Architecture
- Steps (sketched as a pipeline below):
- Text segmentation
- P.O.S. tagging
- Partial syntactic and semantic analysis
- Sentence classification
- Template instantiation
- Content selection
- Text regeneration
- Topic elaboration
Source: Saggion, Horacio.
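A sketch of this pipeline as a chain of stages in Python. Every stage body below is a trivial stub of our own; only the order of the steps comes from the slide.

    # SumUM pipeline order, with placeholder stage implementations.
    def segment(text):             return text.split(". ")
    def pos_tag(sents):            return [(s, None) for s in sents]   # stub
    def shallow_parse(tagged):     return tagged                       # stub
    def classify(parsed):          return parsed                       # stub
    def instantiate(classed):      return classed                      # stub
    def select_content(templates): return templates[:2]                # stub
    def regenerate(selected):      return ". ".join(s for s, _ in selected)
    def elaborate_topics(draft):   return draft                        # stub

    def summarize(raw_text):
        return elaborate_topics(regenerate(select_content(
            instantiate(classify(shallow_parse(pos_tag(segment(raw_text))))))))

    print(summarize("SumUM reads a paper. It finds topics. It writes a summary."))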
50. Evaluation
- Compared SumUM to n-STEIN and Microsoft AutoSummarize
- Based on content and quality
- Co-selection of sentences to include was low (37%) between humans, so it's very subjective; no ideal summary exists
- Evaluation can be extrinsic or intrinsic
- Extrinsic: how well the summary helps to perform a task
- Intrinsic: comparison with the source document; how many main ideas does it capture?
- Parsing:
- Recall measures the number of correct syntactic constructions identified by the algorithm compared to the number existing overall
- Precision: ratio of the number of correct syntactic constructions to the total number of constructions identified (see the formulas below)
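In symbols, these are the standard definitions, stated here for the parsing setting of this slide:

    \text{Recall} = \frac{\#\ \text{correct constructions found}}{\#\ \text{constructions existing overall}}
    \qquad
    \text{Precision} = \frac{\#\ \text{correct constructions found}}{\#\ \text{constructions proposed}}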
51. Future
- Anaphora resolution
- Lexical cohesion (elaboration on topics)
- Local discourse analysis (coherence)
- SumUM does not demonstrate intelligent behavior
(question answering, paraphrase, anaphora
resolution) - Currently ignores enumerations
- Currently overlooks paragraph structure
53. Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres
- Klaus Zechner
54. Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres
- Additional challenges exist due to informality.
- The following are addressed:
- Coping with speech disfluencies
- Identifying the units for extraction
- Maintaining cross-speaker coherence
55. Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres
- Some issues not addressed:
- Topic segmentation
- Anaphora resolution
- Discourse structure detection
- Speech recognition error compensation
- Prosodic information integration
56. Process
- Pre-process the speech into a transcript:
- Disfluency detection
- Sentence-boundary detection
- Distributed-information tagging
- Then text-based summarization approaches are applied
57. Disfluency Detection
- Disfluencies decrease the readability and conciseness of summaries
- The goal is to tag the disfluencies so that the final summarization step can ignore them (a toy tagger is sketched below)
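A toy version of such a tagger in Python, flagging filled pauses and immediate word repetitions; the filler list and rules are our assumptions and far simpler than Zechner's actual components.

    # Toy disfluency tagger: flag filled pauses and immediate repetitions
    # so a later summarization stage can skip them.
    FILLERS = {"um", "uh", "er"}   # assumed non-lexical filler list

    def tag_disfluencies(tokens):
        tagged = []
        for i, tok in enumerate(tokens):
            is_filler = tok.lower() in FILLERS
            is_repeat = i > 0 and tok.lower() == tokens[i - 1].lower()
            tagged.append((tok, "DISFL" if is_filler or is_repeat else "OK"))
        return tagged

    utterance = "well um i i think uh we should go".split()
    print([t for t, tag in tag_disfluencies(utterance) if tag == "OK"])
    # ['well', 'i', 'think', 'we', 'should', 'go']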
58. Sentence Boundary Detection
- Most speech recognition processors split the transcript into segments based on pauses in the flow of sound
- Some spoken sentences contain pauses internally,
- while several sentences can run together without a pause between them
- The goal is to recognize both cases
59. Distributed Information
- Most common case: question-answer pairs.
- The goal is to mark such groups as inseparable to the final-stage summarizer.
60. Methodology
- Corpus:
- 8 recorded dialogues were used (including news shows, which are formal dialogue, and recordings of phone calls and meetings, which are informal dialogue); these were transcribed and tagged.
- Used the Penn Treebank for training.
- The corpus was then annotated.
61. Annotations
- Annotation was done by humans: nucleus ideas and satellite ideas, done independently, then collaboratively
- Gold standard: look at topic boundaries determined by the majority of annotators
- Disfluencies were also annotated: non-lexical ("um", "uh") and lexical ("like", "you know"); repetitions (insertion, substitution, repetition); interruptions
- Detected with 3 things: a POS tagger, a false-start detector (abandoned, incomplete clauses; 10% of speech), and a repetition filter
- Questions were annotated according to type: wh- or yes/no
- Back-channel questions ("is that right?", "Really?") and rhetorical questions were not included
62. Tokenization
- Removal of noises (human and non-human)
- Expand contractions
- Eliminate case and punctuation information
- Removal of stop words (using SMART, and a list of closed-class words)
- Truncation (stemming; a toy version of these steps is sketched below)
63. Tagging speech
- Components:
- POS tagger (Brill, rule-based)
- Repetition finder
- Decision tree (false starts)
- Sentence tagging:
- Complete
- Non-complete
- Turn tagging:
- Performance is very good for sentence boundaries at the ends of turns
- Question-answer pairs:
- Identifying these improves fluency in the summary significantly
- More informative
- More coherent
64. Decision Making
- Weighting:
- Compute vectors for each sentence and for the topical segment, compare them, and promote those with the most similarity
- Uses the Maximum Marginal Relevance algorithm to find highly weighted sentences while preventing redundancy, so each pick is maximally similar to the segment and maximally dissimilar to the other sentences already included in the summary (see the sketch after this list)
- Emphasis factors:
- Lead emphasis (for material at the beginning)
- Q-A emphasis
- False-start de-emphasis
- Speaker emphasis
- Q-A linking:
- Ensures that the answer immediately follows the question
- Can produce clean summaries or summaries taken directly from the transcript
65. Evaluation
- Components were evaluated individually
- Two baselines:
- LEAD (first n sentences)
- MMR (Maximum Marginal Relevance)
66. Eval: Disfluencies & Boundaries
- Each component individually increased the relevance vector (as compared to the gold standard) consistently
- Both components combined also consistently had a better performance effect than each one individually
- They had more effect on the less-formal corpora (10-15% vs. 5%)
67. Eval: Distributed Info
- This component only had a significant positive effect on two of the corpora, each of which contained many question-answer pairs.
- Relevance vectors were largely unaffected
- Summary coherence is significantly improved whenever question-answer pairs are present (not measurable quantitatively)
68. Conclusion
- The system with all components applied consistently performs better than the LEAD and MMR baselines (except on the news corpus)
- The system makes its most significant improvement on the informal conversation corpora
69. Related Work
- Restricted-domain dialogue summarization
- E.g. spoken news
- Prosody-based emphasis detection
- Authors' own previous work
71. Demo of Text Summarization Systems
- Microsoft's AutoSummarize
- MEAD
- www.newsinessence.com
72. MEAD Demo
- An example of statistically motivated text summarization, based on sentence extraction.
- Written in Perl; uses XML.
- Uses centroid, position, and overlap with the first sentence (often this is the title); a scoring sketch follows below
- Does multi-document summarization
- Computational Linguistics And Information Retrieval (CLAIR) at the University of Michigan
- Led by one of the computer scientists who wrote the introduction to the Computational Linguistics issue we studied
- http://tangra.si.umich.edu/clair/md/demo.cgi
73. Screenshot
http://articles.health.msn.com/id/100101065?GT1=6305
74. NewsInEssence: a Deployment of MEAD
- Interactive multi-source news summarization
- A system for finding and summarizing clusters of related news articles from multiple sources on the Web
- 2001, Dragomir Radev
- NIE can start from a URL and retrieve documents that are similar, or NIE can retrieve documents that match a given set of keywords
- www.newsinessence.com
75. NIE
- Creating a cluster: keywords or a seed article as input
- Sources: Chicago Sun-Times, Globe and Mail, Guardian, International Herald Tribune, Newsday, Reuters, San Francisco Chronicle, Seattle Post-Intelligencer, The Boston Herald
76. NIE
It picks up a lot of page-navigation features, since they are prominently located
77. NIE 10% summary
Maybe it'd do better if it ignored short phrases ("Science", "Travel"); it wastes its quota on them
78. Questions?
79. References
- ACL Anthology: A Digital Archive of Research Papers in Computational Linguistics. Computational Linguistics, Volume 28, Number 4, December 2002. <http://acl.ldc.upenn.edu/J/J02/>
- CLAIR. News in Essence. <http://www.newsinessence.com>
- NewsInEssence. About. <http://lada.si.umich.edu:8080/clair/nie1/docs/about.html>
- NewsInEssence. Help. <http://lada.si.umich.edu:8080/clair/nie1/docs/help.html>
- Pullum, Geoff. "Geoff Pullum's Six Golden Rules of giving an academic presentation." <http://people.ucsc.edu/~pullum/goldenrules.html>
- Radev, Dragomir; Hovy, Eduard; McKeown, Kathleen. "Introduction to the Special Issue on Summarization." Computational Linguistics, Volume 28, Number 4, December 2002.
- Radev, Dragomir; Blair-Goldensohn, Sasha; Zhang, Zhu. "Experiments in Single and Multi-Document Summarization Using MEAD." <http://tangra.si.umich.edu/~radev/papers/duc01.pdf>
- Radev, Dragomir, et al. MEAD Documentation, v3.07. Nov. 2002.
80. References (2)
- Teufel, Simone; Moens, Marc. "Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status." Computational Linguistics, Volume 28, Number 4, December 2002. Pg. 409-445.
- Text Summarization Project. <http://www.site.uottawa.ca/~tanka/ts.html>
- Saggion, Horacio; Lapalme, Guy. "Generating Indicative-Informative Summaries with SumUM." Computational Linguistics, Volume 28, Number 4, December 2002. Pg. 497-526.
- Saggion, Horacio. "Génération automatique de résumés par analyse sélective." <http://www.dcs.shef.ac.uk/~saggion/TheThesis.ps>
- Saggion, Horacio. "SumUM: Summarization at the Université de Montréal." <http://www.dcs.shef.ac.uk/~saggion/sumumweb01.html>
- Zechner, Klaus. "Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres." Computational Linguistics, Volume 28, Number 4, December 2002. Pg. 447-485.