Title: Summarization
1. Summarization
- Ryan Davies
- Laura Emond
- CSI5386, NLP
- March 16, 2005
2. Outline
- Introduction
- Summary of the articles
- Demo
- Microsoft AutoSummarize
- MEAD
3. Introduction
- Why summarize?
- Thousands of articles, too much to read; summaries help selectivity
- To be concise: get just the most crucial information
- Types of summaries: extract, summary, abstract, abridgement, précis, digest, highlight, synopsis
4. Articles
- Our articles are from this journal:
- Computational Linguistics, Volume 28, Number 4, December 2002
- Articles:
- Dragomir R. Radev, Eduard Hovy, Kathleen McKeown: "Introduction to the Special Issue on Summarization"
- Simone Teufel, Marc Moens: "Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status"
- Klaus Zechner: "Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres"
- Horacio Saggion, Guy Lapalme: "Generating Indicative-Informative Summaries with SumUM"
5. Introduction to the Special Issue on Summarization
- Driver: the increase of online information
- Present the main ideas in less space
- Could not easily summarize a document in which everything was equally important
- Try to keep it as informative as possible
- Information content appears in bursts
- Indicative summary: basically keywords
- Informative summary: reproduces the content
6. Definition
- Summary:
- Produced from one or more texts
- Less than 50% of the original length
- Conveys the most important information
7. Processes
- Extraction: identify important material
- Abstraction: reformulate (novel wording)
- Fusion: combine extracted passages
- Compression: get rid of peripheral parts
- Early approaches came from IR, now NLP
8. Approaches
- 1. Single document (extraction)
- Take sentences from the original document
- Surface level (signals)
- Scoring: key phrases, frequency, position (see the sketch after this list)
- Now ML and NLP are used for key passages and relations between words, instead of bags of words
- Word relatedness, discourse structure
- Connected words (anaphora, synonyms, shared words)
- Topic / theme
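A minimal sketch of surface-level scoring of this kind, combining frequency, position, and cue-phrase signals in Python. The weights, cue list, and example sentences are illustrative assumptions, not taken from any of the papers surveyed here.

    # Toy surface-level sentence scorer: frequency + position + cue phrases.
    # Weights and the cue list are illustrative assumptions.
    from collections import Counter

    CUE_PHRASES = {"in conclusion", "we show", "this paper"}  # assumed cues

    def score_sentences(sentences):
        # Document-wide word frequencies (a crude stand-in for tf or tf-idf)
        freq = Counter(w for s in sentences for w in s.lower().split())
        scores = []
        for i, s in enumerate(sentences):
            words = s.lower().split()
            f = sum(freq[w] for w in words) / max(len(words), 1)  # frequency
            p = 1.0 / (i + 1)                                     # position
            c = 1.0 if any(cue in s.lower() for cue in CUE_PHRASES) else 0.0
            scores.append(0.5 * f + 0.3 * p + 0.2 * c)
        return scores

    sentences = ["This paper presents a summarizer.",
                 "We ran many experiments.",
                 "In conclusion, extraction works well."]
    best = max(zip(score_sentences(sentences), sentences))
    print(best[1])  # the highest-scoring sentence becomes the extract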
9. Approaches
- 2. Single document (abstraction)
- Abstraction encompasses any method that does not rely strictly on extraction
- Information Extraction
- Compressive Summarization
- Ontological Abstraction
10. Information Extraction
- Designer specifies slots to be filled by certain information (a toy template follows below)
- E.g. an earthquake description must include date, location, severity
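A toy version of the slide's earthquake example in Python: the slot names come from the slide, while the regular expressions and the sample report are our illustrative assumptions, not a real IE system.

    # Toy slot-filling template for an earthquake description.
    import re

    TEMPLATE = {
        "date": r"on (\w+ \d{1,2}, \d{4})",
        "location": r"struck ([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
        "severity": r"magnitude (\d\.\d)",
    }

    def fill_slots(text):
        # Try each slot's pattern against the text; None marks an unfilled slot.
        return {slot: (m.group(1) if (m := re.search(pat, text)) else None)
                for slot, pat in TEMPLATE.items()}

    report = "A magnitude 6.1 quake struck Central Chile on March 3, 2005."
    print(fill_slots(report))
    # {'date': 'March 3, 2005', 'location': 'Central Chile', 'severity': '6.1'}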
11. Compressive Summarization
- Borrows from language generation. Two approaches (a sketch of the first follows below):
- Words are extracted from the document and re-formed using a bigram language model.
- Sentences are selected, combined, and then reduced by dropping the least important fragments (similar to how humans cut and paste).
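A minimal sketch of the first approach: scoring a candidate re-formed word sequence with a bigram language model estimated from the document. The add-one smoothing and the toy corpus are our assumptions, not the cited system's.

    # Toy bigram language model over document words, used to score a
    # candidate re-formed word sequence. Add-one smoothing is an assumption.
    from collections import Counter

    def train_bigrams(words):
        return Counter(zip(words, words[1:])), Counter(words)

    def sequence_prob(candidate, bigrams, unigrams, vocab_size):
        p = 1.0
        for prev, cur in zip(candidate, candidate[1:]):
            # P(cur | prev) with add-one smoothing
            p *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        return p

    doc = "the system extracts words and the system reforms words".split()
    bigrams, unigrams = train_bigrams(doc)
    vocab = len(set(doc))
    print(sequence_prob("the system extracts words".split(), bigrams, unigrams, vocab))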
12. Ontological Abstraction
- Leaves plenty of room for innovation.
- Relies on an external knowledge base to recognize new information in the document.
- Seems to be less NLP and more AI. Not feasible until a complete knowledge base is available.
13. Approaches
- 3. Multiple-document summarization
- Similar to single-document summarization. Fills in pre-determined slot values to construct a briefing (big picture / whole story / information synthesis).
- Uses: biographies, describing events (e.g. a hurricane)
- Differences are the need to:
- Avoid redundancy (measure similarity between sentence pairs; see the sketch after this list)
- Identify differences (discourse rules)
- Ensure summary coherence (time order of articles, text order within articles, time stamps)
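A minimal sketch of the redundancy check in Python: cosine similarity between bag-of-words sentence vectors, dropping near-duplicates. The 0.7 threshold is an illustrative assumption.

    # Toy redundancy filter for multi-document summarization.
    import math
    from collections import Counter

    def cosine(s1, s2):
        v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
        dot = sum(v1[w] * v2[w] for w in v1)
        norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
        return dot / (norm(v1) * norm(v2)) if v1 and v2 else 0.0

    a = "the hurricane hit the coast on monday"
    b = "on monday the hurricane hit the coast"
    if cosine(a, b) > 0.7:       # near-duplicate pair: keep only one
        print("redundant pair, drop one sentence")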
14. Evaluation
- Humans agree just 60% of the time on sentence content, even for straightforward news articles
- Better results for short summaries (people probably agree on the objective keystone data, but not on the supporting evidence, which is more subjective)
- Metrics:
- Form (grammar, coherence, organization)
- Content (compare to human abstracts)
- Extraneous information → precision
- Omitted information → recall
- Tasks: categorization, question answering, ad hoc
15. State of the art
- Simple sentence extraction is evolving into a process involving extracting, merging, and editing phrases
- Analysis of simple news articles → analysis of longer documents (scientific articles, medical journals, patents): Teufel & Moens, Saggion & Lapalme
- Analysis of written text → speech: Zechner
- Novel system development to automate the summarization process: Saggion and Lapalme
17. Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status
- Simone Teufel, Marc Moens
- Contribution: focus on scientific articles
- Scientific articles are dissimilar from news articles, etc.
- The first paragraph is not usually a good summary
- A given sentence may be an important result, or a criticism of a previous result
18. Basic Method
- The authors believe that sentence (or clause) extraction is currently still the best basic method in most cases
- Need to recognize scientifically significant information and distinguish it from supporting data
19. Identification Approach
- To identify candidate sentences, the authors suggest classification by rhetorical status
- Rhetorical status: categorization of sentences based on their purpose in the document
- AIM, TEXTUAL, OWN, BACKGROUND, CONTRAST, BASIS, OTHER
20. Classification Tools Used
- Traditional text extraction characteristics
- Metadiscourse & agentivity
- Citations & relatedness
21. Traditional Characteristics
- Feature-based attribution
22. Feature-Based Attribution (1)
- Features mostly drawn from already-developed summarization techniques.
- Particular features chosen (and weighted) informally, by intuition and trial-and-error, i.e. "it just works".
23. Feature-Based Attribution (2)
- Some features, evaluated individually, give poor results
- i.e. some are even worse than chance,
- or kappa < 0
- (where -1 means always wrong,
- 1 means always right,
- and 0 means equivalent to random sentence classification; the standard formula follows below)
- However, removing any one feature degrades the overall performance of the system, even if that feature performs poorly individually.
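For reference, the kappa statistic cited here is the standard chance-corrected agreement measure; this is the textbook definition, not a formula quoted from the paper:

    \kappa = \frac{P_o - P_e}{1 - P_e}

where P_o is the observed agreement with the correct classification and P_e is the agreement expected by chance, so kappa is 0 at chance level and 1 at perfect agreement.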
24. Metadiscourse & Agentivity
- Metadiscourse: explicit phrases which indicate the purpose of the statement ("they argue that", "we conclude that", "agree", "suggest")
- List generated manually, again by intuition and by examining the corpus.
- Agentivity: "we", "they", "our"
- Superior anaphora resolution techniques would help here.
25. Citations & Relatedness
- Need to recognize the context of citations:
- Negative: critical, contrastive
- Positive: basis for current work
- Neutral
26. More Rhetorical Status Indicators
- Problem structure (problems, solutions, results): texts do not always make this clear, but when they do, this information can be useful
- Intellectual attribution ("other researchers claim that", "we have discovered that"): uses metadiscourse heavily
- Scientific argumentation (progression of rhetorically coherent statements that convey the scientific contribution): related to problem structure
- Attitude toward others' work (rival, flawed, contributing): often explicit
27. Relevance (1)
- The authors also suggest ordering statements by relevance
- Relevance: importance of the statement to the meaning of the document
28. Relevance (2)
- Many methods are used
- Most follow the pattern of matching certain words or phrases against a manually generated list of (dis)qualifiers
- E.g. some action phrases disqualify ("argue", "intend", "lacks")
- E.g. some agent phrases qualify or disqualify sentences ("we" qualifies, "they" usually disqualifies)
- Negation is also considered
29. Summarization Process (1)
- The set of relevant sentences is considered
- Humans followed this decision tree for categorizing sentences (for the learning stage and the gold standard)
30. Summarization Process (2)
- Sentences in general were found to have the following category distribution (top)
- Relevant sentences were more evenly distributed (bottom)
31. Summarization Process (3)
- The end-result summary focuses on AIM, CONTRAST, and BASIS sentences.
- Sentences from other categories are considered as well.
- The decision to include a sentence is based on its relevance score.
32. Evaluation
- The authors evaluated the system's two main components individually:
- Categorization
- Relevance determination
33. Categorization Performance (1)
- Used F-measure as an indicator of accuracy (performance)
- F = 2PR / (P + R)
- P = precision, R = recall
34. Categorization Performance (2)
- Human performance and stability were fairly high in this task
- The performance of the system was below human performance, but far higher than the TF-IDF text-extraction baseline.
35. Categorization Performance (3)
- The authors also examined which features had the most impact on performance.
- Some features greatly impacted disambiguation of some categories, while performing worse than chance for others
- Regardless, removing any one feature from the pool consistently decreased overall performance.
36. Relevance Det. Performance (1)
- The system performed well when only the three categories AIM, CONTRAST, and BASIS were considered:
- Recall: 0.44 average
- Precision: 0.79 average (a quick F-measure check follows below)
- Once other categories, such as BACKGROUND, are considered, recall and precision drop quickly, but remain much higher than the baseline performance.
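Plugging those averages into the F-measure from slide 33 gives a quick sanity check (our arithmetic, not a figure reported in the paper):

    F = \frac{2PR}{P + R} = \frac{2 \times 0.79 \times 0.44}{0.79 + 0.44} \approx 0.57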
37. Conclusion (1)
- The authors of this paper made two major contributions to this area:
- They applied rhetorical status information to scientific articles, and
- They chose a set of classification features which produces higher performance on scientific articles than generic sets.
38. Conclusion (2)
- Areas of future improvement identified in the article:
- Automatic gathering of metadiscourse features
- Using a more sophisticated statistical classifier
- Improving anaphora resolution for the agent feature
40. Generating Indicative-Informative Summaries with SumUM
- Horacio Saggion, Guy Lapalme (Sheffield, U of Montreal)
- Software that produces technical indicative-informative summaries
- Indicative: identifies topics
- Informative: elaborates on these topics (qualitative and quantitative)
- Purpose: give the reader an exact and concise idea of what is in the source
- Dynamic summarization
- Shallow syntactic and semantic analysis
- Concept identification
- Text regeneration
41. Steps
- Interpret the text
- Extract relevant information (topics)
- Condense and construct the summary
- Present the summary in natural language
- Coherent selection and expression are not solved yet
42. Selective Analysis
- Selective analysis: a process of conceptual identification and text re-generation
- Imitates how humans write abstracts.
- Indicative selection: indicative terms are used to find concepts, definitions, etc.
- Informative selection: looks for an informative marker and matches a pattern.
- Indicative generation: put topics in conceptual order, merge them, and put them in one paragraph.
- Informative generation: provides more information to the reader by filling in more templates.
43. Template
- Shows what SumUM looks for to construct the indicative abstract
44. Pattern Matching
45. Methodology
- The corpus studied had 100 document-abstract pairs from computer and information science journals.
- Studied how professional abstracts are written.
- Mapped the location of abstract sentences to the document manually (using photocopies).
- Interpreted sentences with 334 FSTs representing linguistic and domain-specific patterns (a toy pattern is sketched below).
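A toy illustration of one such pattern, using a Python regular expression in place of a real finite-state transducer; the pattern and the sample sentence are our assumptions, not one of SumUM's 334.

    # Toy stand-in for one linguistic pattern: detect a "topic announcement".
    import re

    TOPIC_PATTERN = re.compile(
        r"this (?:paper|article) (?:presents|describes|proposes) (?P<topic>.+)",
        re.IGNORECASE)

    sentence = "This paper presents a method for summarizing dialogues."
    m = TOPIC_PATTERN.search(sentence)
    if m:
        print("topic found:", m.group("topic"))
        # -> topic found: a method for summarizing dialogues.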
46. Conceptual Information in Technical Documents
- Some is generic (author, research institution, date)
- Some is discipline-specific (algorithms in CS, treatments in medicine)
- Identified and classified with thesauri:
- 55 concepts (research activity, article)
- 39 relations (studying, reporting, thinking)
- 52 types of information (background, elaboration)
- Extract the information → sort → edit (language understanding and production: deduction, generalization, paraphrase)
- Remove peripheral linguistics ("it appears that" → "apparently"), concatenate, truncate, delete phrases, etc.
- Impersonalize (syntactic verb transformation)
47. Conceptual Information
48. Transformations in Human-Generated Summaries
- Domain verbs (40%), noun editing (38%), merge/split (38%), complex reformulation (23%), no changes (11%)
- 70% come from the intro, conclusion, title, and captions
- 89% of sentences were edited
49. SumUM Architecture
- Steps (sketched as a pipeline below):
- Text segmentation
- P.O.S. tagging
- Partial syntactic and semantic analysis
- Sentence classification
- Template instantiation
- Content selection
- Text regeneration
- Topic elaboration
Source: Saggion, Horacio.
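A sketch of this pipeline as a chain of stages in Python. Every stage body below is a trivial stub of our own; only the order of the steps comes from the slide.

    # SumUM pipeline order, with placeholder stage implementations.
    def segment(text):             return text.split(". ")
    def pos_tag(sents):            return [(s, None) for s in sents]   # stub
    def shallow_parse(tagged):     return tagged                       # stub
    def classify(parsed):          return parsed                       # stub
    def instantiate(classed):      return classed                      # stub
    def select_content(templates): return templates[:2]                # stub
    def regenerate(selected):      return ". ".join(s for s, _ in selected)
    def elaborate_topics(draft):   return draft                        # stub

    def summarize(raw_text):
        return elaborate_topics(regenerate(select_content(
            instantiate(classify(shallow_parse(pos_tag(segment(raw_text))))))))

    print(summarize("SumUM reads a paper. It finds topics. It writes a summary."))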
50. Evaluation
- Compared SumUM to n-STEIN and Microsoft AutoSummarize
- Based on content and quality
- Co-selection of sentences to include was low (37%) between humans, so it's very subjective; no ideal summary exists
- Evaluation can be extrinsic or intrinsic
- Extrinsic: how well the summary helps to perform a task
- Intrinsic: comparison with the source document; how many main ideas does it capture?
- Parsing:
- Recall measures the number of correct syntactic constructions identified by the algorithm compared to the number existing overall
- Precision: ratio of the number of correct syntactic constructions to the total number of constructions identified (see the formulas below)
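In symbols, these are the standard definitions, stated here for the parsing setting of this slide:

    \text{Recall} = \frac{\#\ \text{correct constructions found}}{\#\ \text{constructions existing overall}}
    \qquad
    \text{Precision} = \frac{\#\ \text{correct constructions found}}{\#\ \text{constructions proposed}}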
51. Future
- Anaphora resolution
- Lexical cohesion (elaboration on topics)
- Local discourse analysis (coherence)
- SumUM does not demonstrate intelligent behavior
(question answering, paraphrase, anaphora
resolution) - Currently ignores enumerations
- Currently overlooks paragraph structure
53. Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres
- Klaus Zechner
54. Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres
- Additional challenges exist due to informality.
- The following are addressed:
- Coping with speech disfluencies
- Identifying the units for extraction
- Maintaining cross-speaker coherence
55. Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres
- Some issues not addressed:
- Topic segmentation
- Anaphora resolution
- Discourse structure detection
- Speech recognition error compensation
- Prosodic information integration
56. Process
- Pre-process the speech into a transcript:
- Disfluency detection
- Sentence-boundary detection
- Distributed-information tagging
- Then text-based summarization approaches are applied
57. Disfluency Detection
- Disfluencies decrease the readability and conciseness of summaries
- The goal is to tag the disfluencies so that the final summarization step can ignore them (a toy tagger is sketched below)
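A toy version of such a tagger in Python, flagging filled pauses and immediate word repetitions; the filler list and rules are our assumptions and far simpler than Zechner's actual components.

    # Toy disfluency tagger: flag filled pauses and immediate repetitions
    # so a later summarization stage can skip them.
    FILLERS = {"um", "uh", "er"}   # assumed non-lexical filler list

    def tag_disfluencies(tokens):
        tagged = []
        for i, tok in enumerate(tokens):
            is_filler = tok.lower() in FILLERS
            is_repeat = i > 0 and tok.lower() == tokens[i - 1].lower()
            tagged.append((tok, "DISFL" if is_filler or is_repeat else "OK"))
        return tagged

    utterance = "well um i i think uh we should go".split()
    print([t for t, tag in tag_disfluencies(utterance) if tag == "OK"])
    # ['well', 'i', 'think', 'we', 'should', 'go']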
58. Sentence Boundary Detection
- Most speech recognition processors split the transcript into segments based on pauses in the flow of sound
- Some spoken sentences contain pauses internally,
- while several sentences can run together without a pause between them
- The goal is to recognize both cases
59. Distributed Information
- Most common case: question-answer pairs.
- The goal is to mark such groups as inseparable to the final-stage summarizer.
60. Methodology
- Corpus:
- 8 recorded dialogues were used (including news shows, which are formal dialogue, and recordings of phone calls and meetings, which are informal dialogue); these were transcribed and tagged.
- Used the Penn Treebank for training.
- The corpus was then annotated.
61. Annotations
- Annotation was done by humans: nucleus ideas and satellite ideas, done independently, then collaboratively
- Gold standard: look at topic boundaries determined by the majority of annotators
- Disfluencies were also annotated: non-lexical ("um", "uh") and lexical ("like", "you know"); repetitions (insertion, substitution, repetition); interruptions
- Detected with 3 things: a POS tagger, a false-start detector (abandoned, incomplete clauses; 10% of speech), and a repetition filter
- Questions were annotated according to type: wh- or yes/no
- Back-channel questions ("is that right?", "Really?") and rhetorical questions were not included
62. Tokenization
- Removal of noises (human and non-human)
- Expand contractions
- Eliminate case and punctuation information
- Removal of stop words (using SMART, and a list of closed-class words)
- Truncation (stemming; a toy version of these steps is sketched below)
63. Tagging speech
- Components:
- POS tagger (Brill, rule-based)
- Repetition finder
- Decision tree (false starts)
- Sentence tagging:
- Complete
- Non-complete
- Turn tagging:
- Performance is very good for sentence boundaries at the ends of turns
- Question-answer pairs:
- Identifying these improves fluency in the summary significantly
- More informative
- More coherent
64. Decision Making
- Weighting:
- Compute vectors for each sentence and for the topical segment, compare them, and promote those with the most similarity
- Uses the Maximum Marginal Relevance algorithm to find highly weighted sentences while preventing redundancy, so each pick is maximally similar to the segment and maximally dissimilar to the other sentences already included in the summary (see the sketch after this list)
- Emphasis factors:
- Lead emphasis (for material at the beginning)
- Q-A emphasis
- False-start de-emphasis
- Speaker emphasis
- Q-A linking:
- Ensures that the answer immediately follows the question
- Can produce clean summaries or summaries taken directly from the transcript
65. Evaluation
- Components were evaluated individually
- Two baselines:
- LEAD (first n sentences)
- MMR (Maximum Marginal Relevance)
66. Eval: Disfluencies & Boundaries
- Each component individually increased the relevance vector (as compared to the gold standard) consistently
- Both components combined also consistently had a better performance effect than each one individually
- They had more effect on the less-formal corpora (10-15% vs. 5%)
67. Eval: Distributed Info
- This component only had a significant positive effect on two of the corpora, each of which contained many question-answer pairs.
- Relevance vectors were largely unaffected
- Summary coherence is significantly improved whenever question-answer pairs are present (not measurable quantitatively)
68. Conclusion
- The system with all components applied consistently performs better than the LEAD and MMR baselines (except on the news corpus)
- The system makes its most significant improvement on the informal conversation corpora
69. Related Work
- Restricted-domain dialogue summarization
- E.g. spoken news
- Prosody-based emphasis detection
- Authors' own previous work
71. Demo of Text Summarization Systems
- Microsoft's AutoSummarize
- MEAD
- www.newsinessence.com
72. MEAD Demo
- An example of statistically motivated text summarization, based on sentence extraction.
- Written in Perl; uses XML.
- Uses centroid, position, and overlap with the first sentence (often this is the title); a scoring sketch follows below
- Does multi-document summarization
- Computational Linguistics And Information Retrieval (CLAIR) at the University of Michigan
- Led by one of the computer scientists who wrote the introduction to the Computational Linguistics issue we studied
- http://tangra.si.umich.edu/clair/md/demo.cgi
73. Screenshot
http://articles.health.msn.com/id/100101065?GT1=6305
74. NewsInEssence: a Deployment of MEAD
- Interactive multi-source news summarization
- A system for finding and summarizing clusters of related news articles from multiple sources on the Web
- 2001, Dragomir Radev
- NIE can start from a URL and retrieve documents that are similar, or NIE can retrieve documents that match a given set of keywords
- www.newsinessence.com
75. NIE
- Creating a cluster: keywords or a seed article as input
- Sources: Chicago Sun-Times, Globe and Mail, Guardian, International Herald Tribune, Newsday, Reuters, San Francisco Chronicle, Seattle Post-Intelligencer, The Boston Herald
76. NIE
It picks up a lot of page-navigation features, since they are prominently located
77. NIE 10% summary
Maybe it'd do better if it ignored short phrases ("Science", "Travel"); it wastes its quota on them
78. Questions?
79. References
- ACL Anthology: A Digital Archive of Research Papers in Computational Linguistics. Computational Linguistics, Volume 28, Number 4, December 2002. <http://acl.ldc.upenn.edu/J/J02/>
- CLAIR. News in Essence. <http://www.newsinessence.com>
- NewsInEssence. About. <http://lada.si.umich.edu:8080/clair/nie1/docs/about.html>
- NewsInEssence. Help. <http://lada.si.umich.edu:8080/clair/nie1/docs/help.html>
- Pullum, Geoff. "Geoff Pullum's Six Golden Rules of giving an academic presentation." <http://people.ucsc.edu/~pullum/goldenrules.html>
- Radev, Dragomir; Hovy, Eduard; McKeown, Kathleen. "Introduction to the Special Issue on Summarization." Computational Linguistics, Volume 28, Number 4, December 2002.
- Radev, Dragomir; Blair-Goldensohn, Sasha; Zhang, Zhu. "Experiments in Single and Multi-Document Summarization Using MEAD." <http://tangra.si.umich.edu/~radev/papers/duc01.pdf>
- Radev, Dragomir, et al. MEAD Documentation, v3.07. Nov. 2002.
80. References (2)
- Teufel, Simone; Moens, Marc. "Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status." Computational Linguistics, Volume 28, Number 4, December 2002. Pg. 409-445.
- Text Summarization Project. <http://www.site.uottawa.ca/~tanka/ts.html>
- Saggion, Horacio; Lapalme, Guy. "Generating Indicative-Informative Summaries with SumUM." Computational Linguistics, Volume 28, Number 4, December 2002. Pg. 497-526.
- Saggion, Horacio. "Génération automatique de résumés par analyse sélective." <http://www.dcs.shef.ac.uk/~saggion/TheThesis.ps>
- Saggion, Horacio. "SumUM: Summarization at the Université de Montréal." <http://www.dcs.shef.ac.uk/~saggion/sumumweb01.html>
- Zechner, Klaus. "Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres." Computational Linguistics, Volume 28, Number 4, December 2002. Pg. 447-485.