Multi-Document Summary Space: What do People Agree is Important?

1
Multi-Document Summary Space: What do People Agree
is Important?
  • John M. Conroy
  • Institute for Defense Analyses
  • Center for Computing Sciences
  • Bowie, MD

2
Outline
  • Problem statement.
  • Human Summaries.
  • Oracle Estimates.
  • Algorithms.

3
Query-Based Multi-document Summarization
  • User types query.
  • Relevant documents are retrieved.
  • Retrieved documents are clustered.
  • Summaries for each cluster are displayed.

4
Example Query: hurricane earthquake
5
Example Query: columbia
6
Example Query: michigan
7
Recent Evaluation and Problem Definition Efforts
  • Document Understanding Conferences
  • 2001-2004: 100-word generic summaries.
  • 2005-2006: 250-word focused summaries.
  • http://duc.nist.gov/
  • Multi-lingual Summarization Evaluation (MSE),
    2005-2006.
  • Given a cluster of translated documents and
    English documents, produce a 100-word summary.
  • http://www.isi.edu/cyl/MTSE2005/

8
Overview of Techniques
  • Linguistic tools (find sentence boundaries,
    shorten sentences, extract features):
    • Part of speech tagging.
    • Parsing.
    • Entity extraction.
    • Bag of words, position in document.
  • Statistical classifier:
    • Linear classifiers.
    • Bayesian methods, HMM, SVM, etc.
  • Redundancy removal (an MMR sketch follows
    below):
    • Maximum marginal relevance (MMR).
    • QR decomposition.
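As a concrete illustration of the redundancy-removal step, here is a minimal sketch of maximum marginal relevance (MMR) over bag-of-words cosine similarity; the tokenization, the lambda value of 0.7, and the function names are illustrative assumptions, not details from the talk.

```python
# Minimal MMR sketch: trade off query relevance against similarity to
# already-selected sentences. Bag-of-words cosine and lam=0.7 are
# illustrative choices, not values from the talk.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(sentences, query, k=5, lam=0.7):
    """Greedily pick k sentences balancing relevance and non-redundancy."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    qvec = Counter(query.lower().split())
    selected = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = None, float("-inf")
        for i, v in enumerate(vecs):
            if i in selected:
                continue
            relevance = cosine(v, qvec)
            redundancy = max((cosine(v, vecs[j]) for j in selected), default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]
```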

9
Sample Data
  • DUC 2005.
  • 50 topics.
  • 25 to 50 relevant documents per topic.
  • 4 or 9 human summaries.

10
Linguistic Processing
  • Use heuristic patterns to find
    phrases/clauses/words to eliminate.
  • Shallow processing.
  • Value of full sentence elimination?

11
Linguistic Processing
  • Phrase elimination:
    • Gerund phrases.
  • Example (a crude removal sketch follows below):
  • Suicide bombers targeted a crowded open-air
    market Friday, setting off blasts that killed the
    two assailants, injured 21 shoppers and passersby
    and prompted the Israeli Cabinet to put off
    action on ...
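A very crude sketch of the gerund-phrase elimination idea, assuming the phrase is a trailing ", ...ing ..." clause; this regex heuristic is only illustrative and will misfire on words that merely end in "ing" (the talk's own rules are richer heuristic patterns over shallow parses).

```python
# Crude illustration only: drop everything from ", <word>ing" to the end of
# the sentence. Real patterns need POS information to avoid false matches.
import re

GERUND_CLAUSE = re.compile(r",\s+\w+ing\b.*$")

def drop_trailing_gerund_phrase(sentence: str) -> str:
    return GERUND_CLAUSE.sub("", sentence.rstrip(". ")) + "."

print(drop_trailing_gerund_phrase(
    "Suicide bombers targeted a crowded open-air market Friday, "
    "setting off blasts that killed the two assailants."))
# -> Suicide bombers targeted a crowded open-air market Friday.
```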

12
Example Topic Description
  • Title: Reasons for Train Wrecks
  • Narrative: What causes train wrecks and what can
    be done to prevent them? Train wrecks are those
    events that result in actual damage to the trains
    themselves, not just accidents where people are
    killed or injured.
  • Type: General

13
Example Human Summary
  • Train wrecks are caused by a number of factors:
    human, mechanical and equipment errors, spotty
    maintenance, insufficient training, load
    shifting, vandalism, and natural phenomenon. The
    most common types of mechanical and equipment
    errors are brake failures, signal light and gate
    failures, track defects, and rail bed collapses.
    Spotty maintenance is characterized by failure to
    consistently inspect and repair equipment. Lack
    of electricians and mechanics results in letting
    equipment run down until someone complains.
    Engineers are often unprepared to detect or
    prevent operating problems because of the lack of
    follow-up training needed to handle updated high
    technology equipment. Load shiftings derail
    trains when a curve is taken too fast or there is
    a track defect. Natural phenomenon such as heavy
    fog, torrential rain, or floods causes some
    accidents. Vandalism in the form of leaving
    switches open or stealing parts from them leads
    to serious accidents. Human errors may be the
    most common cause of train accidents. Cars and
    trucks carelessly crossing or left on tracks
    cause frequent accidents. Train crews often make
    inaccurate tonnage measurements that cause
    derailments or brake failures, fail to heed
    single-track switching precautions, make faulty
    car hook-ups, and, in some instances, operate
    locomotives while under the influence of alcohol
    or drugs. Some freak accidents occur when moving
    trains are not warned about other trains stalled
    on the tracks. Recommendations for preventing
    accidents are: increase the number of inspectors,
    improve emergency training procedures, install
    state-of-the-art warning, control, speed and
    weight monitoring mechanisms, and institute
    closer driver fitness supervision.

14
Another Example Topic
  • Title: Human Toll of Tropical Storms
  • What has been the human toll in death or injury
    of tropical storms in recent years? Where and
    when have each of the storms caused human
    casualties? What are the approximate total
    number of casualties attributed to each of the
    storms?
  • Granularity: Specific

15
Example Human Summary
  • January 1989 through October 1994 tolled 641,257
    tropical storm deaths and 5,277 injuries
    world-wide.
  • In May 1991, Bangladesh suffered 500,000 deaths;
    140,000 in March 1993; and 110 deaths and 5,000
    injuries in May 1994.
  • The Philippines had 29 deaths in July 1989 and
    149 in October; 30 in June 1990, 13 in August and
    14 in November.
  • South Carolina had 18 deaths and two injuries in
    October 1989; 29 deaths in April 1990 and three
    in October.
  • North Carolina had one death in July 1989 and
    three in October 1990.
  • Louisiana had three deaths in July 1989 and two
    deaths and 75 injuries in August 1992.
  • Georgia had three deaths in October 1990 and 19
    in July 1994.
  • Florida had 15 in August 1992.
  • Alabama had one in July 1994.
  • Mississippi had five in July 1989.
  • Texas had four in July 1989 and two in October.
  • September 1989 Atlantic storms killed three.
  • The Bahamas had four in August 1992.
  • The Virgin Islands had five in December 1990.
  • Mexico had 19 in July 1993.
  • Martinique had six in October 1990 and 10
    injuries in August 1993.
  • September 1993 Caribbean storms killed three
    Puerto Ricans and 22 others.
  • China had 48 deaths and 190 injuries in September
    1989, and 216 deaths in August 1990.
  • Taiwan had 30 in October 1994.

16
Inter-Human Word Agreement
17
Evaluation of Summaries
  • Ideally each machine summary would be judged by
    multiple humans for:
  • 1. Responsiveness to the query.
  • 2. Cohesiveness, grammar, etc.
  • Reality: This would take too much time!
  • Plan: Use a metric which correlates at 90-97%
    with human responsiveness judgments.

18
ROUGE: Recall-Oriented Understudy for Gisting
Evaluation
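For orientation, here is a simplified sketch of ROUGE-n recall against multiple reference summaries; it assumes plain whitespace tokenization and omits the stemming, stopword, and jackknifing options of the official toolkit.

```python
# Simplified ROUGE-n recall: clipped n-gram matches over total reference
# n-grams, summed across references. Not the official scorer.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=1):
    cand = ngrams(candidate.lower().split(), n)
    matches = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matches += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matches / total if total else 0.0

print(rouge_n_recall("trains derailed after brake failures",
                     ["brake failures caused trains to derail"], n=1))
```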
19
ROUGE-1 Scores
20
ROUGE-2 Scores
21
Frequency and Summarization
  • Ani Nenkova (Columbia) and Lucy Vanderwende
    (Microsoft) report:
  • High-frequency content words correlate with the
    high-frequency words chosen by humans.
  • SumBasic, a simple method based on this
    principle, produces state-of-the-art generic
    summaries, e.g., DUC 2004 and MSE 2005 (a sketch
    follows below).
  • Van Halteren and Teufel 2003; Radev et al. 2003;
    Copeck and Szpakowicz 2004.
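A minimal sketch of the SumBasic idea reported by Nenkova and Vanderwende: score sentences by average word probability, pick greedily, and square the probabilities of the words just used. The stopword handling and the 100-word budget are illustrative assumptions, not the published configuration.

```python
from collections import Counter

def sumbasic(sentences, budget_words=100, stopwords=frozenset()):
    tokenized = [[w for w in s.lower().split() if w not in stopwords]
                 for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    summary, used = [], 0
    remaining = set(range(len(sentences)))
    while remaining and used < budget_words:
        # Pick the unused sentence with the highest average word probability.
        best = max(remaining, key=lambda i:
                   sum(prob[w] for w in tokenized[i]) / max(len(tokenized[i]), 1))
        summary.append(sentences[best])
        used += len(sentences[best].split())
        remaining.remove(best)
        for w in tokenized[best]:        # dampen words already covered
            prob[w] = prob[w] ** 2
    return summary
```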

22
What is Summary Space?
  • Is there enough information in the documents to
    approach human performance as measured by ROUGE?
  • Do humans abstract so much that extracts don't
    suffice?
  • Is a unigram distribution enough?

23
A Candidate
  • Suppose an oracle gave us:
  • Pr(t) = the probability that a human will choose
    term t to be included in a summary.
  • t is a non-stop-word term.
  • Estimate based on our data (see the sketch
    below).
  • E.g., 0, 1/4, 1/2, 3/4, or 1 if 4 human summaries
    are provided.
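A small sketch of this oracle estimate: Pr(t) is simply the fraction of the human summaries that contain term t. The tokenization and stopword set are assumptions for illustration.

```python
def oracle_term_probs(human_summaries, stopwords=frozenset()):
    """Pr(t) = (# human summaries containing t) / (# human summaries)."""
    n = len(human_summaries)
    term_sets = [set(w for w in s.lower().split() if w not in stopwords)
                 for s in human_summaries]
    vocab = set().union(*term_sets)
    return {t: sum(t in ts for ts in term_sets) / n for t in vocab}

# With 4 human summaries, every Pr(t) is one of 1/4, 1/2, 3/4, or 1.
```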

24
A Simple Oracle Score
  • Generate extracts:
  • Score sentences by the expected percentage of
    abstract terms they contain (a scoring sketch
    follows below).
  • Discard sentences that are too short or too
    long.
  • Pivoted QR to remove redundancy.
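A sketch of the scoring step, assuming the Pr(t) map from the previous sketch; the length bounds used to discard short and long sentences are illustrative, not the talk's actual thresholds.

```python
def oracle_sentence_score(sentence, pr, stopwords=frozenset(),
                          min_terms=8, max_terms=40):
    """Expected fraction of the sentence's terms that humans would use."""
    terms = [w for w in sentence.lower().split() if w not in stopwords]
    if not (min_terms <= len(terms) <= max_terms):
        return 0.0                   # discard too-short / too-long sentences
    return sum(pr.get(t, 0.0) for t in terms) / len(terms)
```

Sentences ranked by this score are then passed to the pivoted QR step sketched later to remove redundancy.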

25
The Oracle Pleases Everyone!
26
Approximate Pr(t)
  • Two bits of information:
  • Topic description:
    • Extract query phrases.
  • Documents retrieved:
    • Extract terms which are indicative, i.e., give
      the signature of the documents.

27
Query Terms
  • Given the topic description:
  • Tag it for part of speech (see the sketch below).
  • Take any NN (noun), VB (verb), JJ (adjective), RB
    (adverb), and multi-word groupings of NNP (proper
    nouns).
  • E.g., train, wrecks, train wrecks, causes,
    prevent, events, result, actual, actual damage,
    trains, accidents, killed, injured.
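A rough sketch of the query-term extraction, assuming NLTK's default tokenizer and Penn Treebank POS tagger (the corresponding NLTK data must be downloaded); the talk's exact extraction rules may differ.

```python
import nltk  # needs the punkt tokenizer and perceptron tagger data

def query_terms(topic_description):
    tagged = nltk.pos_tag(nltk.word_tokenize(topic_description))
    terms, nnp_run = [], []
    for word, tag in tagged:
        if tag == "NNP":                     # gather multi-word proper-noun runs
            nnp_run.append(word)
            continue
        if nnp_run:
            terms.append(" ".join(nnp_run))
            nnp_run = []
        if tag.startswith(("NN", "VB", "JJ", "RB")):
            terms.append(word.lower())
    if nnp_run:
        terms.append(" ".join(nnp_run))
    return terms

print(query_terms("What causes train wrecks and what can be done to prevent them?"))
```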

28
Signature Terms
  • Term: a space-delimited string of characters
    from a, b, c, ..., z, after the text is
    lower-cased and all other characters and stop
    words are removed (a tokenization sketch follows
    below).
  • Need to restrict our attention to indicative
    terms (signature terms):
  • Terms that occur more often than expected.
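The term definition above, written out as a short sketch; the stopword list here is a tiny illustrative stand-in.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that"}  # stand-in list

def terms(text):
    cleaned = re.sub(r"[^a-z ]", " ", text.lower())   # keep only a-z and spaces
    return [w for w in cleaned.split() if w not in STOPWORDS]

print(terms("Suicide bombers targeted a crowded open-air market Friday."))
# -> ['suicide', 'bombers', 'targeted', 'crowded', 'open', 'air', 'market', 'friday']
```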

29
Signature Terms
  • Terms that occur more often than expected.
  • Based on a 2×2 contingency table of relevance
    counts.
  • Log-likelihood ratio, equivalent to mutual
    information.
  • Dunning 1993; Hovy & Lin 2000.

30
Hypothesis Testing
  • H0: P(C | ti) = p = P(C | ¬ti)
  • H1: P(C | ti) = p1 ≠ p2 = P(C | ¬ti)
  • ML estimates of p, p1, and p2 from the 2×2 table
    of observed counts (a sketch of the resulting
    statistic follows below):

          C     ¬C
   ti    O11   O12
  ¬ti    O21   O22
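A sketch of the log-likelihood ratio statistic (Dunning 1993) computed from the table above, where O11 is the count of term ti inside the relevant set C, O12 its count outside C, and O21/O22 the corresponding counts for all other terms; the example counts are made up. Terms whose statistic exceeds a chi-square cutoff become signature terms.

```python
import math

def _ll(k, n, p):
    # Log binomial likelihood (constant term omitted).
    if p <= 0.0 or p >= 1.0:
        return 0.0 if k in (0, n) else float("-inf")
    return k * math.log(p) + (n - k) * math.log(1 - p)

def log_likelihood_ratio(o11, o12, o21, o22):
    """-2 log lambda for H0: P(C|ti) = P(C|~ti) vs. H1: the rates differ."""
    k1, n1 = o11, o11 + o12          # occurrences of ti, and how many fall in C
    k2, n2 = o21, o21 + o22          # occurrences of other terms, how many in C
    p = (k1 + k2) / (n1 + n2)        # H0: one shared rate
    p1, p2 = k1 / n1, k2 / n2        # H1: separate rates
    return 2 * (_ll(k1, n1, p1) + _ll(k2, n2, p2)
                - _ll(k1, n1, p) - _ll(k2, n2, p))

print(log_likelihood_ratio(o11=30, o12=5, o21=970, o22=3995))  # made-up counts
```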
31
Likelihood of H0 vs. H1 and Mutual Information
32
Example Signature Terms
  • accident accidents ammunition angeles avenue
    beach bernardino blamed board boulevard boxcars
    brake brakes braking cab car cargo cars caused cc
    cd collided collision column conductor coroner
    crash crew crews crossing curve derail derailed
    desk driver edition emergency engineer engineers
    equipment failures fe fog freight ft grade
    holland injured injuries investigators killed
    line loaded locomotives los maintenance
    mechanical metro miles nn ntsb occurred pacific
    page part passenger path photo pipeline rail
    railroad railroads railway runaway safety san
    santa scene seal shells sheriff signals southern
    speed staff station switch track tracks train
    trains transportation truck weight westminster
    words workers wreck yard yesterday

33
An Approximation of Pr(t)
  • For a given data set and topic description:
  • Let Q be the set of query terms.
  • Let S be the set of signature terms.
  • Estimate Pr(t) ≈ (χQ(t) + χS(t)) / 2, where
    χA(t) = 1 if t ∈ A and 0 otherwise (see the
    one-liner below).
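The estimate written as a one-liner; query_terms and signature_terms are assumed to be Python sets of strings.

```python
def estimate_pr(t, query_terms, signature_terms):
    # Average of two set-membership indicators: 0, 0.5, or 1.
    return ((t in query_terms) + (t in signature_terms)) / 2
```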

34
Our Approach
  • Use the expected abstract-word score to select
    candidate sentences (2w words' worth).
  • Terms as sentence features:
  • Terms t1, ..., tm and sentences s1, ..., sn form
    an m × n term-sentence matrix; each sentence is a
    vector in R^m over the terms.
  • Each column is scaled by its sentence score.
  • Use pivoted QR to select sentences (a sketch of
    the matrix construction follows below).
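A sketch of the scaled term-sentence matrix described above: rows are terms, columns are sentences, and each column is scaled by that sentence's score (e.g., the approximate Pr(t) score). NumPy and the 0/1 indicator weighting are assumptions for illustration.

```python
import numpy as np

def term_sentence_matrix(sentences, vocabulary, scores):
    """Rows = terms, columns = sentences; column j scaled by scores[j]."""
    index = {t: i for i, t in enumerate(vocabulary)}
    A = np.zeros((len(vocabulary), len(sentences)))
    for j, sent in enumerate(sentences):
        for w in set(sent.lower().split()):
            if w in index:
                A[index[w], j] = 1.0
        A[:, j] *= scores[j]
    return A
```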

35
Redundancy Removal
  • Pivoted QR:
  • Choose the column with maximum norm (aj).
  • Subtract the components along aj from the
    remaining columns, i.e., the remaining columns
    are made orthogonal to the chosen column.
  • Stopping criterion: the chosen sentences
    (columns) total w (of the 2w candidate) words.
  • Removes semantic redundancy (see the sketch
    below).
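A sketch of the pivoted-QR selection applied to the matrix from the previous sketch: repeatedly take the column of largest norm and orthogonalize the remaining columns against it, until the word budget is met. The tolerance and the exact stopping rule are illustrative.

```python
import numpy as np

def pivoted_qr_select(A, sentence_word_counts, budget_w):
    """Return indices of selected sentences (columns) up to budget_w words."""
    A = np.array(A, dtype=float, copy=True)
    chosen, words = [], 0
    while words < budget_w and A.size:
        norms = np.linalg.norm(A, axis=0)
        j = int(np.argmax(norms))            # pivot: column with maximum norm
        if norms[j] < 1e-12:                 # nothing substantive left
            break
        chosen.append(j)
        words += sentence_word_counts[j]
        q = A[:, j] / norms[j]
        A -= np.outer(q, q @ A)              # orthogonalize against the pivot
    return chosen
```

Because each chosen column is subtracted out of the rest, later picks cannot repeat content already covered, which is the semantic-redundancy removal named on the slide.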

36
Results
37
Conclusions
  • Pr(t), the oracle score, produces summaries that
    please everyone.
  • A simple estimate of Pr(t), induced by query and
    signature terms, gives rise to a top-scoring
    system.

38
Future Work
  • Better estimates for Pr(t).
  • Pseudo-relevance feedback.
  • LSI or similar dimension-reduction tricks?
  • Ordering of sentences for readability is
    important (with Dianne O'Leary).
  • A 250-word summary has approximately 12
    sentences.
  • Two directions in linguistic preprocessing:
  • Eugene Charniak's parser (with Bonnie Dorr and
    David Zajic).
  • Simple rule-based "POS lite" (Judith
    Schlesinger).

39
On Brevity