Multi-Document Summary Space: What do People Agree is Important?

1
Multi-Document Summary Space: What do People Agree
is Important?
  • John M. Conroy
  • Institute for Defense Analyses
  • Center for Computing Sciences
  • Bowie, MD

2
Outline
  • Problem statement.
  • Human Summaries.
  • Oracle Estimates.
  • Algorithms.

3
Query-Based Multi-document Summarization
  • User types query.
  • Relevant documents are retrieved.
  • Retrieved documents are clustered.
  • Summaries for each cluster are displayed.

4
Example Query: hurricane earthquake
5
Example Query: columbia
6
Example Query: michigan
7
Recent Evaluation and Problem Definition Efforts
  • Document Understanding Conferences
  • 2001-2004: 100-word generic summaries.
  • 2005-2006: 250-word focused summaries.
  • http://duc.nist.gov/
  • Multi-lingual Summarization Evaluation (MSE),
    2005-2006.
  • Given a cluster of translated documents and
    English documents, produce a 100-word summary.
  • http://www.isi.edu/cyl/MTSE2005/

8
Overview of Techniques
  • Linguistic tools (find sentence boundaries,
    shorten sentences, extract features):
    • Part of speech tagging.
    • Parsing.
    • Entity extraction.
    • Bag of words, position in document.
  • Statistical classifier:
    • Linear classifiers.
    • Bayesian methods, HMM, SVM, etc.
  • Redundancy removal (an MMR sketch follows
    below):
    • Maximum marginal relevance (MMR).
    • QR decomposition.
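As a concrete illustration of the redundancy-removal step, here is a minimal sketch of maximum marginal relevance (MMR) over bag-of-words cosine similarity; the tokenization, the lambda value of 0.7, and the function names are illustrative assumptions, not details from the talk.

```python
# Minimal MMR sketch: trade off query relevance against similarity to
# already-selected sentences. Bag-of-words cosine and lam=0.7 are
# illustrative choices, not values from the talk.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(sentences, query, k=5, lam=0.7):
    """Greedily pick k sentences balancing relevance and non-redundancy."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    qvec = Counter(query.lower().split())
    selected = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = None, float("-inf")
        for i, v in enumerate(vecs):
            if i in selected:
                continue
            relevance = cosine(v, qvec)
            redundancy = max((cosine(v, vecs[j]) for j in selected), default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]
```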

9
Sample Data
  • DUC 2005.
  • 50 topics.
  • 25 to 50 relevant documents per topic.
  • 4 or 9 human summaries.

10
Linguistic Processing
  • Use heuristic patterns to find
    phrases/clauses/words to eliminate.
  • Shallow processing.
  • Value of full sentence elimination?

11
Linguistic Processing
  • Phrase elimination:
    • Gerund phrases.
  • Example (a crude removal sketch follows below):
  • Suicide bombers targeted a crowded open-air
    market Friday, setting off blasts that killed the
    two assailants, injured 21 shoppers and passersby
    and prompted the Israeli Cabinet to put off
    action on ...
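A very crude sketch of the gerund-phrase elimination idea, assuming the phrase is a trailing ", ...ing ..." clause; this regex heuristic is only illustrative and will misfire on words that merely end in "ing" (the talk's own rules are richer heuristic patterns over shallow parses).

```python
# Crude illustration only: drop everything from ", <word>ing" to the end of
# the sentence. Real patterns need POS information to avoid false matches.
import re

GERUND_CLAUSE = re.compile(r",\s+\w+ing\b.*$")

def drop_trailing_gerund_phrase(sentence: str) -> str:
    return GERUND_CLAUSE.sub("", sentence.rstrip(". ")) + "."

print(drop_trailing_gerund_phrase(
    "Suicide bombers targeted a crowded open-air market Friday, "
    "setting off blasts that killed the two assailants."))
# -> Suicide bombers targeted a crowded open-air market Friday.
```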

12
Example Topic Description
  • Title: Reasons for Train Wrecks
  • Narrative: What causes train wrecks and what can
    be done to prevent them? Train wrecks are those
    events that result in actual damage to the trains
    themselves, not just accidents where people are
    killed or injured.
  • Type: General

13
Example Human Summary
  • Train wrecks are caused by a number of factors:
    human, mechanical and equipment errors, spotty
    maintenance, insufficient training, load
    shifting, vandalism, and natural phenomenon. The
    most common types of mechanical and equipment
    errors are brake failures, signal light and gate
    failures, track defects, and rail bed collapses.
    Spotty maintenance is characterized by failure to
    consistently inspect and repair equipment. Lack
    of electricians and mechanics results in letting
    equipment run down until someone complains.
    Engineers are often unprepared to detect or
    prevent operating problems because of the lack of
    follow-up training needed to handle updated high
    technology equipment. Load shiftings derail
    trains when a curve is taken too fast or there is
    a track defect. Natural phenomenon such as heavy
    fog, torrential rain, or floods causes some
    accidents. Vandalism in the form of leaving
    switches open or stealing parts from them leads
    to serious accidents. Human errors may be the
    most common cause of train accidents. Cars and
    trucks carelessly crossing or left on tracks
    cause frequent accidents. Train crews often make
    inaccurate tonnage measurements that cause
    derailments or brake failures, fail to heed
    single-track switching precautions, make faulty
    car hook-ups, and, in some instances, operate
    locomotives while under the influence of alcohol
    or drugs. Some freak accidents occur when moving
    trains are not warned about other trains stalled
    on the tracks. Recommendations for preventing
    accidents are: increase the number of inspectors,
    improve emergency training procedures, install
    state-of-the-art warning, control, speed and
    weight monitoring mechanisms, and institute
    closer driver fitness supervision.

14
Another Example Topic
  • Title: Human Toll of Tropical Storms
  • What has been the human toll in death or injury
    of tropical storms in recent years? Where and
    when have each of the storms caused human
    casualties? What are the approximate total
    number of casualties attributed to each of the
    storms?
  • Granularity: Specific

15
Example Human Summary
  • January 1989 through October 1994 tolled 641,257
    tropical storm deaths and 5,277 injuries
    world-wide.
  • In May 1991, Bangladesh suffered 500,000 deaths;
    140,000 in March 1993; and 110 deaths and 5,000
    injuries in May 1994.
  • The Philippines had 29 deaths in July 1989 and
    149 in October; 30 in June 1990, 13 in August and
    14 in November.
  • South Carolina had 18 deaths and two injuries in
    October 1989; 29 deaths in April 1990 and three
    in October.
  • North Carolina had one death in July 1989 and
    three in October 1990.
  • Louisiana had three deaths in July 1989 and two
    deaths and 75 injuries in August 1992.
  • Georgia had three deaths in October 1990 and 19
    in July 1994.
  • Florida had 15 in August 1992.
  • Alabama had one in July 1994.
  • Mississippi had five in July 1989.
  • Texas had four in July 1989 and two in October.
  • September 1989 Atlantic storms killed three.
  • The Bahamas had four in August 1992.
  • The Virgin Islands had five in December 1990.
  • Mexico had 19 in July 1993.
  • Martinique had six in October 1990 and 10
    injuries in August 1993.
  • September 1993 Caribbean storms killed three
    Puerto Ricans and 22 others.
  • China had 48 deaths and 190 injuries in September
    1989, and 216 deaths in August 1990.
  • Taiwan had 30 in October 1994.

16
Inter-Human Word Agreement
17
Evaluation of Summaries
  • Ideally each machine summary would be judged by
    multiple humans for:
  • 1. Responsiveness to the query.
  • 2. Cohesiveness, grammar, etc.
  • Reality: This would take too much time!
  • Plan: Use a metric which correlates at 90-97%
    with human responsiveness judgments.

18
ROUGE: Recall-Oriented Understudy for Gisting
Evaluation
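For orientation, here is a simplified sketch of ROUGE-n recall against multiple reference summaries; it assumes plain whitespace tokenization and omits the stemming, stopword, and jackknifing options of the official toolkit.

```python
# Simplified ROUGE-n recall: clipped n-gram matches over total reference
# n-grams, summed across references. Not the official scorer.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=1):
    cand = ngrams(candidate.lower().split(), n)
    matches = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matches += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matches / total if total else 0.0

print(rouge_n_recall("trains derailed after brake failures",
                     ["brake failures caused trains to derail"], n=1))
```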
19
ROUGE-1 Scores
20
ROUGE-2 Scores
21
Frequency and Summarization
  • Ani Nenkova (Columbia) and Lucy Vanderwende
    (Microsoft) report:
  • High-frequency content words correlate with the
    high-frequency words chosen by humans.
  • SumBasic, a simple method based on this
    principle, produces state-of-the-art generic
    summaries, e.g., DUC 2004 and MSE 2005 (a sketch
    follows below).
  • Van Halteren and Teufel 2003; Radev et al. 2003;
    Copeck and Szpakowicz 2004.
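A minimal sketch of the SumBasic idea reported by Nenkova and Vanderwende: score sentences by average word probability, pick greedily, and square the probabilities of the words just used. The stopword handling and the 100-word budget are illustrative assumptions, not the published configuration.

```python
from collections import Counter

def sumbasic(sentences, budget_words=100, stopwords=frozenset()):
    tokenized = [[w for w in s.lower().split() if w not in stopwords]
                 for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    summary, used = [], 0
    remaining = set(range(len(sentences)))
    while remaining and used < budget_words:
        # Pick the unused sentence with the highest average word probability.
        best = max(remaining, key=lambda i:
                   sum(prob[w] for w in tokenized[i]) / max(len(tokenized[i]), 1))
        summary.append(sentences[best])
        used += len(sentences[best].split())
        remaining.remove(best)
        for w in tokenized[best]:        # dampen words already covered
            prob[w] = prob[w] ** 2
    return summary
```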

22
What is Summary Space?
  • Is there enough information in the documents to
    approach human performance as measured by ROUGE?
  • Do humans abstract so much that extracts don't
    suffice?
  • Is a unigram distribution enough?

23
A Candidate
  • Suppose an oracle gave us:
  • Pr(t) = the probability that a human will choose
    term t to be included in a summary.
  • t is a non-stop-word term.
  • Estimate based on our data (see the sketch
    below).
  • E.g., 0, 1/4, 1/2, 3/4, or 1 if 4 human summaries
    are provided.
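A small sketch of this oracle estimate: Pr(t) is simply the fraction of the human summaries that contain term t. The tokenization and stopword set are assumptions for illustration.

```python
def oracle_term_probs(human_summaries, stopwords=frozenset()):
    """Pr(t) = (# human summaries containing t) / (# human summaries)."""
    n = len(human_summaries)
    term_sets = [set(w for w in s.lower().split() if w not in stopwords)
                 for s in human_summaries]
    vocab = set().union(*term_sets)
    return {t: sum(t in ts for ts in term_sets) / n for t in vocab}

# With 4 human summaries, every Pr(t) is one of 1/4, 1/2, 3/4, or 1.
```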

24
A Simple Oracle Score
  • Generate extracts:
  • Score sentences by the expected percentage of
    abstract terms they contain (a scoring sketch
    follows below).
  • Discard sentences that are too short or too
    long.
  • Pivoted QR to remove redundancy.
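A sketch of the scoring step, assuming the Pr(t) map from the previous sketch; the length bounds used to discard short and long sentences are illustrative, not the talk's actual thresholds.

```python
def oracle_sentence_score(sentence, pr, stopwords=frozenset(),
                          min_terms=8, max_terms=40):
    """Expected fraction of the sentence's terms that humans would use."""
    terms = [w for w in sentence.lower().split() if w not in stopwords]
    if not (min_terms <= len(terms) <= max_terms):
        return 0.0                   # discard too-short / too-long sentences
    return sum(pr.get(t, 0.0) for t in terms) / len(terms)
```

Sentences ranked by this score are then passed to the pivoted QR step sketched later to remove redundancy.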

25
The Oracle Pleases Everyone!
26
Approximate Pr(t)
  • Two bits of information:
  • Topic description:
    • Extract query phrases.
  • Documents retrieved:
    • Extract terms which are indicative, i.e., give
      the signature of the documents.

27
Query Terms
  • Given the topic description:
  • Tag it for part of speech (see the sketch below).
  • Take any NN (noun), VB (verb), JJ (adjective), RB
    (adverb), and multi-word groupings of NNP (proper
    nouns).
  • E.g., train, wrecks, train wrecks, causes,
    prevent, events, result, actual, actual damage,
    trains, accidents, killed, injured.
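A rough sketch of the query-term extraction, assuming NLTK's default tokenizer and Penn Treebank POS tagger (the corresponding NLTK data must be downloaded); the talk's exact extraction rules may differ.

```python
import nltk  # needs the punkt tokenizer and perceptron tagger data

def query_terms(topic_description):
    tagged = nltk.pos_tag(nltk.word_tokenize(topic_description))
    terms, nnp_run = [], []
    for word, tag in tagged:
        if tag == "NNP":                     # gather multi-word proper-noun runs
            nnp_run.append(word)
            continue
        if nnp_run:
            terms.append(" ".join(nnp_run))
            nnp_run = []
        if tag.startswith(("NN", "VB", "JJ", "RB")):
            terms.append(word.lower())
    if nnp_run:
        terms.append(" ".join(nnp_run))
    return terms

print(query_terms("What causes train wrecks and what can be done to prevent them?"))
```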

28
Signature Terms
  • Term: a space-delimited string of characters
    from a, b, c, ..., z, after the text is
    lower-cased and all other characters and stop
    words are removed (a tokenization sketch follows
    below).
  • Need to restrict our attention to indicative
    terms (signature terms):
  • Terms that occur more often than expected.
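The term definition above, written out as a short sketch; the stopword list here is a tiny illustrative stand-in.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that"}  # stand-in list

def terms(text):
    cleaned = re.sub(r"[^a-z ]", " ", text.lower())   # keep only a-z and spaces
    return [w for w in cleaned.split() if w not in STOPWORDS]

print(terms("Suicide bombers targeted a crowded open-air market Friday."))
# -> ['suicide', 'bombers', 'targeted', 'crowded', 'open', 'air', 'market', 'friday']
```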

29
Signature Terms
  • Terms that occur more often than expected.
  • Based on a 2×2 contingency table of relevance
    counts.
  • Log-likelihood ratio, equivalent to mutual
    information.
  • Dunning 1993; Hovy & Lin 2000.

30
Hypothesis Testing
  • H0: P(C | ti) = p = P(C | ¬ti)
  • H1: P(C | ti) = p1 ≠ p2 = P(C | ¬ti)
  • ML estimates of p, p1, and p2 from the 2×2 table
    of observed counts (a sketch of the resulting
    statistic follows below):

          C     ¬C
   ti    O11   O12
  ¬ti    O21   O22
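A sketch of the log-likelihood ratio statistic (Dunning 1993) computed from the table above, where O11 is the count of term ti inside the relevant set C, O12 its count outside C, and O21/O22 the corresponding counts for all other terms; the example counts are made up. Terms whose statistic exceeds a chi-square cutoff become signature terms.

```python
import math

def _ll(k, n, p):
    # Log binomial likelihood (constant term omitted).
    if p <= 0.0 or p >= 1.0:
        return 0.0 if k in (0, n) else float("-inf")
    return k * math.log(p) + (n - k) * math.log(1 - p)

def log_likelihood_ratio(o11, o12, o21, o22):
    """-2 log lambda for H0: P(C|ti) = P(C|~ti) vs. H1: the rates differ."""
    k1, n1 = o11, o11 + o12          # occurrences of ti, and how many fall in C
    k2, n2 = o21, o21 + o22          # occurrences of other terms, how many in C
    p = (k1 + k2) / (n1 + n2)        # H0: one shared rate
    p1, p2 = k1 / n1, k2 / n2        # H1: separate rates
    return 2 * (_ll(k1, n1, p1) + _ll(k2, n2, p2)
                - _ll(k1, n1, p) - _ll(k2, n2, p))

print(log_likelihood_ratio(o11=30, o12=5, o21=970, o22=3995))  # made-up counts
```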
31
Likelihood of H0 vs. H1 and Mutual Information
32
Example Signature Terms
  • accident accidents ammunition angeles avenue
    beach bernardino blamed board boulevard boxcars
    brake brakes braking cab car cargo cars caused cc
    cd collided collision column conductor coroner
    crash crew crews crossing curve derail derailed
    desk driver edition emergency engineer engineers
    equipment failures fe fog freight ft grade
    holland injured injuries investigators killed
    line loaded locomotives los maintenance
    mechanical metro miles nn ntsb occurred pacific
    page part passenger path photo pipeline rail
    railroad railroads railway runaway safety san
    santa scene seal shells sheriff signals southern
    speed staff station switch track tracks train
    trains transportation truck weight westminster
    words workers wreck yard yesterday

33
An Approximation of Pr(t)
  • For a given data set and topic description:
  • Let Q be the set of query terms.
  • Let S be the set of signature terms.
  • Estimate Pr(t) ≈ (χQ(t) + χS(t)) / 2, where
    χA(t) = 1 if t ∈ A and 0 otherwise (see the
    one-liner below).
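The estimate written as a one-liner; query_terms and signature_terms are assumed to be Python sets of strings.

```python
def estimate_pr(t, query_terms, signature_terms):
    # Average of two set-membership indicators: 0, 0.5, or 1.
    return ((t in query_terms) + (t in signature_terms)) / 2
```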

34
Our Approach
  • Use the expected abstract-word score to select
    candidate sentences (2w words' worth).
  • Terms as sentence features:
  • Terms t1, ..., tm and sentences s1, ..., sn form
    an m × n term-sentence matrix; each sentence is a
    vector in R^m over the terms.
  • Each column is scaled by its sentence score.
  • Use pivoted QR to select sentences (a sketch of
    the matrix construction follows below).
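A sketch of the scaled term-sentence matrix described above: rows are terms, columns are sentences, and each column is scaled by that sentence's score (e.g., the approximate Pr(t) score). NumPy and the 0/1 indicator weighting are assumptions for illustration.

```python
import numpy as np

def term_sentence_matrix(sentences, vocabulary, scores):
    """Rows = terms, columns = sentences; column j scaled by scores[j]."""
    index = {t: i for i, t in enumerate(vocabulary)}
    A = np.zeros((len(vocabulary), len(sentences)))
    for j, sent in enumerate(sentences):
        for w in set(sent.lower().split()):
            if w in index:
                A[index[w], j] = 1.0
        A[:, j] *= scores[j]
    return A
```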

35
Redundancy Removal
  • Pivoted QR:
  • Choose the column with maximum norm (aj).
  • Subtract the components along aj from the
    remaining columns, i.e., the remaining columns
    are made orthogonal to the chosen column.
  • Stopping criterion: the chosen sentences
    (columns) total w (of the 2w candidate) words.
  • Removes semantic redundancy (see the sketch
    below).
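A sketch of the pivoted-QR selection applied to the matrix from the previous sketch: repeatedly take the column of largest norm and orthogonalize the remaining columns against it, until the word budget is met. The tolerance and the exact stopping rule are illustrative.

```python
import numpy as np

def pivoted_qr_select(A, sentence_word_counts, budget_w):
    """Return indices of selected sentences (columns) up to budget_w words."""
    A = np.array(A, dtype=float, copy=True)
    chosen, words = [], 0
    while words < budget_w and A.size:
        norms = np.linalg.norm(A, axis=0)
        j = int(np.argmax(norms))            # pivot: column with maximum norm
        if norms[j] < 1e-12:                 # nothing substantive left
            break
        chosen.append(j)
        words += sentence_word_counts[j]
        q = A[:, j] / norms[j]
        A -= np.outer(q, q @ A)              # orthogonalize against the pivot
    return chosen
```

Because each chosen column is subtracted out of the rest, later picks cannot repeat content already covered, which is the semantic-redundancy removal named on the slide.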

36
Results
37
Conclusions
  • Pr(t), the oracle score, produces summaries that
    please everyone.
  • A simple estimate of Pr(t), induced by query and
    signature terms, gives rise to a top-scoring
    system.

38
Future Work
  • Better estimates for Pr(t).
  • Pseudo-relevance feedback.
  • LSI or similar dimension-reduction tricks?
  • Ordering of sentences for readability is
    important (with Dianne O'Leary).
  • A 250-word summary has approximately 12
    sentences.
  • Two directions in linguistic preprocessing:
  • Eugene Charniak's parser (with Bonnie Dorr and
    David Zajic).
  • Simple rule-based "POS lite" (Judith
    Schlesinger).

39
On Brevity