Methods for Automatic Evaluation of Sentence Extract Summaries
1
Methods for Automatic Evaluation of Sentence
Extract Summaries
  • G. Ravindra, N. Balakrishnan, K. R. Ramakrishnan
  • Supercomputer Education Research Center
  • Department of Electrical Engineering
  • Indian Institute of Science
  • Bangalore-INDIA

2
Agenda
  • Introduction to Text Summarization
  • Need for summarization, types of summaries
  • Evaluating Extract Summaries
  • Challenges in manual and automatic evaluation
  • Fuzzy Summary Evaluation
  • Complexity Scores

3
What is Text Summarization?
  • Reductive transformation of source text to
    summary text by content generalization and/or
    selection
  • Loss of information
  • What can be lost and what should not be lost
  • How much can be lost
  • What is the size of the summary
  • Types of Summaries
  • Extracts and Abstracts
  • Influence of genre on the performance of a
    summarization algorithm
  • Newswire stories favor sentence-position
    features

4
Need for Summarization
  • Explosive growth in availability of digital
    textual data
  • Books in digital libraries, mailing-list
    archives, on-line news portals
  • Duplication of textual segments in books
  • E.g. 10 introductory books on quantum physics
    have a number of paragraphs common to all of them
    (syntactically different but semantically the
    same)
  • Hand-held devices
  • Small screens and limited memory
  • Low power devices and hence limited processing
    capability
  • E.g. Stream a book from a digital library to a
    hand-held device
  • Production of information is faster than
    consumption

5
Types of Summaries
  • Extracts
  • Text selection
  • E.g. Paragraphs from books, sentences from
    editorials, phrases from e-mails
  • Application of statistical techniques
  • Abstracts
  • Text selection followed by generalization
  • Need for linguistic processing
  • E.g. Convert a sentence to a phrase
  • Generic Summaries
  • Independent of genre
  • Indicative Summaries
  • Gives a general idea as to the topic of
    discussion in the text being summarized
  • Informational Summaries
  • Serves as a surrogate to the original text

6
Evaluating Extract Summaries
  • Manual evaluation
  • Human judges score a summary on a well-defined
    scale based on well-defined criteria
  • Subject to each judge's understanding of the
    subject
  • Depends on judges' opinions
  • Guidelines constrain opinions
  • Individual judges' scores are combined to
    generate the final score
  • Re-evaluation might result in different scores
  • Logistic problems for researchers

7
Automatic Evaluation
  • Machine-based evaluation
  • Consistent over multiple runs
  • Fast, avoids logistic problems
  • Suitable for researchers experimenting with new
    algorithms
  • Flip-side
  • Not as accurate as human evaluation
  • Should be used as a precursor to a detailed human
    evaluation
  • Must algorithmically handle various sentence
    constructs and linguistic variants

8
Fuzzy Summary Evaluation (FuSE)
  • Proposing the use of Fuzzy union theory to
    quantify the similarity of two extract summaries
  • Similarity between the reference (human
    generated) summary and candidate (machine
    generated) summary is evaluated
  • Each sentence is a fuzzy set
  • Each sentence in the reference summary has a
    membership grade in every sentence of the
    candidate machine generated summary
  • Membership grade of a reference summary sentence
    in the candidate summary is the union of
    membership grades across all candidate summary
    sentences
  • Use membership grades to compute an f-score value
  • Membership grade is the Hamming distance between
    two sentences based on collocations

9
Fuzzy F-score
  • Computed from Fuzzy Precision and Fuzzy Recall
  • Notation used in the slide's equations:
  • Candidate summary sentence set
  • Reference summary sentence set
  • Union function
  • Membership grade of a candidate sentence in a
    reference sentence
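The slide's equations are not reproduced in the transcript. A minimal sketch of how fuzzy precision, recall, and F-score can be computed from membership grades, assuming a simple `max` union in place of the Frank S-norm the talk actually proposes, and a toy word-overlap membership grade (both are stand-ins, not the paper's collocation-based grade):

```python
# Sketch of fuzzy precision/recall/F-score from membership grades.
# Assumptions: membership(c, r) in [0, 1] measures how well candidate
# sentence c matches reference sentence r; max() stands in for the
# Frank S-norm union proposed in the talk.

def membership(s1, s2):
    """Toy grade: word overlap normalized by the larger word set."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / max(len(w1), len(w2))

def fuzzy_scores(candidate, reference):
    # Grade of each reference sentence in the candidate summary is the
    # union (here: max) of its grades across all candidate sentences.
    recall = sum(max(membership(c, r) for c in candidate)
                 for r in reference) / len(reference)
    precision = sum(max(membership(c, r) for r in reference)
                    for c in candidate) / len(candidate)
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```

With identical summaries the sketch gives precision = recall = F-score = 1, and wholly disjoint summaries give 0.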
10
Choice of Union operator
  • Propose the use of Frank's S-norm operator
  • Allows combining partial matches non-linearly
  • Membership grade of a sentence in a summary is
    dependent on its length
  • Automatically includes brevity-bonus into the
    scheme

11
Frank's S-norm operator
  • Notation used in the slide's equations:
  • Damping coefficient
  • Mean of non-zero membership grades for a sentence
  • Sentence length
  • Length of the longest sentence
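The operator itself is lost in the transcript. The standard Frank t-conorm (S-norm) with base s has the form below; this sketch omits the talk's length-dependent damping of the base, which is not recoverable from the transcript:

```python
import math

def frank_s_norm(a, b, s=2.0):
    """Frank t-conorm with base s (s > 0, s != 1): a non-linear union
    of two membership grades a, b in [0, 1]."""
    if s <= 0 or s == 1:
        raise ValueError("base s must be positive and != 1")
    num = (s ** (1 - a) - 1) * (s ** (1 - b) - 1)
    return 1 - math.log(1 + num / (s - 1), s)
```

As a sanity check, `frank_s_norm(a, 0)` returns `a` (the boundary condition of every t-conorm), and the result never falls below `max(a, b)`.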
12
Characteristics of Frank's base
13
Performance of FuSE for various sentence lengths
14
Dictionary-enhanced Fuzzy Summary
Evaluation (DeFuSE)
  • FuSE does not understand sentence similarity
    based on synonymy and hypernymy
  • Identifying synonymous words makes evaluation
    more accurate
  • Identifying hypernymous word relationships allows
    consideration of gross information during
    evaluation
  • Note: Very deep hypernymy trees could result in
    topic drift and hence improper evaluation
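A toy illustration of the idea, with a hypothetical hand-built hypernym map standing in for WordNet: lifting words to a shared hypernym lets superficially different sentences match.

```python
# Toy sketch of DeFuSE-style hypernym generalization. The hypernym
# map below is hand-made for illustration; the talk uses WordNet.
HYPERNYMS = {
    "hurricane": "physical_phenomenon",
    "storm": "physical_phenomenon",
    "cuba": "region",
    "havana": "region",
    "devastated": "destroy",
    "destroyed": "destroy",
}

def generalize(sentence):
    """Replace each word with its hypernym, if one is known."""
    return [HYPERNYMS.get(w, w) for w in sentence.lower().split()]

def overlap(s1, s2):
    """Word-set overlap after generalization."""
    a, b = set(generalize(s1)), set(generalize(s2))
    return len(a & b) / max(len(a), len(b))
```

"hurricane devastated cuba" and "storm destroyed havana" share no surface words, but after generalization both reduce to {physical_phenomenon, destroy, region}, so their overlap is 1.0.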

15
Use of WordNet
16
Example: Use of hypernymy
  • HURRICANE GILBERT DEVASTATED DOMINICAN REPUBLIC
    AND PARTS OF CUBA
  • (PHYSICAL PHENOMENON) GILBERT (DESTROY,RUIN)
    (REGION) AND PARTS OF (REGION)
  • TROPICAL STORM GILBERT DESTROYED PARTS OF HAVANA
  • TROPICAL (PHYSICAL PHENOMENON) GILBERT DESTROYED
    PARTS OF (REGION)

17
Complexity Score
  • Attempts to quantify the summarization algorithm
    based on the difficulty in generating a summary
    of a particular accuracy
  • Generating a 9 sentence summary from a 10
    sentence document is very easy.
  • An algorithm which randomly selects 9 sentences
    will have a worst-case accuracy of 90%
  • A complicated AI/NLP-based algorithm cannot do
    any better
  • If a 2-sentence summary is to be generated from a
    10-sentence document, there are 45 possible
    candidates, only one of which is accurate
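The 45 comes from counting the 2-element subsets of 10 sentences:

```python
from math import comb

# Number of distinct 2-sentence extracts of a 10-sentence document;
# a purely random selector picks the right one with probability 1/45.
print(comb(10, 2))  # → 45
```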

18
Computing Complexity Score
  • Probability of generating a summary of length m
    with l accurate sentences, when the human summary
    has h sentences and the document being summarized
    has n sentences
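The formula itself is not reproduced in the transcript. Under the slide's definitions the natural form is the hypergeometric probability; this is an assumed reconstruction, not a verbatim copy of the slide:

```python
from math import comb

def random_baseline_prob(n, h, m, l):
    """Probability that a random m-sentence extract of an n-sentence
    document contains exactly l of the h human-summary sentences
    (hypergeometric; assumed form of the slide's equation)."""
    return comb(h, l) * comb(n - h, m - l) / comb(n, m)
```

For the previous slide's example (n = 10, h = 2, m = 2), the chance of a fully accurate random extract (l = 2) is 1/45, matching the "one in 45 candidates" count.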

19
Complexity Score (Cont..)
  • To compare two summaries of equal length, the
    performance of one relative to the baseline is
    given by
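The slide's formula is lost in the transcript. One plausible reading, assuming the complexity score is the negative log of the random-baseline probability from the previous slide, compares two equal-length summaries by a ratio of their scores:

```latex
% Assumed form -- the original slide's equation is not in the transcript.
C(l) = -\log P(l \mid n, h, m), \qquad
\text{relative performance} = \frac{C(l_{\text{cand}})}{C(l_{\text{base}})}
```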

20
Complexity Score (Cont..)
  • Complexity in generating a 10% extract with 12
    correct sentences is higher than generating a 30%
    extract with 12 correct sentences

21
Conclusion
  • Summary evaluation is as complicated as summary
    generation
  • Fuzzy schemes are ideal for evaluating extract
    summaries
  • Use of synonymy and hypernymy relations improves
    evaluation accuracy
  • Complexity score is a new way of looking at
    summary evaluation