Automatic summarization's goal (PowerPoint presentation)

Provided by: eda8

1
Automatic summarization's goal
  • Take an information source
  • extract content from it
  • present the most important content to the user in
    a condensed form and in a manner sensitive to the
    user's or application's needs.

2
Imagine everyday life without some form of
summarization
  • Newspaper headlines are summaries, written in a
    terse stylized language, of material in a news
    story.
  • The body of a news story may also contain a
    summary, e.g., a news story written so that a
    summary of the main events occurs at the
    beginning.
  • A preview or trailer of a show is a summary.
  • Abstracts of scientific articles are a
    traditional form of summary, written by the
    authors, or else by a professional abstractor
    following certain guidelines.
  • A table showing baseball statistics for a player
    over a season is very much a summary.

3
Other varieties of summaries include
  • Reviews (of books and movies)
  • Digests such as TV guides
  • Minutes of a meeting
  • A program for a conference
  • A weather forecast
  • A stock market bulletin
  • A resume
  • An obituary
  • An abridgment of a book
  • A map of a neighborhood
  • A library catalog of abstracts of articles in new
    journals
  • A web page listing resources in a particular
    subject area
  • A table of contents for a book or magazine
  • A summary that appears on the back cover of a
    book
  • A catalog of various products available from a
    vendor

4
Other varieties of summaries include
  • Almost any retrospective account of events could
    be a summary.

5
Summary Input/Output
  • A summary's output can take forms such as a
    picture, a movie or an audio segment.
  • The input to be summarized may likewise be in
    these multimedia forms.

6
Some genres of summarization
  • The word "summary" is associated with a variety
    of meanings and is used in a variety of contexts.

7
Genres of summarization: Single/Multiple document
summary
  • Depending on the input, one can have single- or
    multiple-document summaries.

8
Genres of summarization: Extract/Abstract
  • Depending on the output, one can have extract- or
    abstract-like summaries.

9
Some genres of summarization: Extract
  • An extract is a summary consisting entirely of
    material copied from the input.
  • It is customary to speak of an extract of K%
    condensation: K% of the input's
    words/sentences/paragraphs may appear in the
    extract.
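To make the K% notion concrete, here is a minimal Python sketch that checks whether a candidate summary qualifies as a K% extract. It assumes the input has already been split into units (sentences, say), treats "copied from the input" as exact string membership, and rounds the budget down; the names `unit_budget` and `is_k_percent_extract` are hypothetical.

```python
import math


def unit_budget(n_source_units: int, k: float) -> int:
    """How many units (words, sentences or paragraphs) a K% extract may contain."""
    if not 0 < k <= 100:
        raise ValueError("k must be in (0, 100]")
    # Assumption: the budget is rounded down to a whole number of units.
    return math.floor(n_source_units * k / 100)


def is_k_percent_extract(extract: list[str], source: list[str], k: float) -> bool:
    """True if every unit is copied verbatim from the source and the K% budget holds."""
    # Membership check only: a unit is "copied" if it appears unchanged in the source.
    copied = all(unit in source for unit in extract)
    return copied and len(extract) <= unit_budget(len(source), k)
```

For a 10-sentence input and K = 20, the budget is 2 sentences, so any two sentences taken verbatim from the input qualify, while a rewritten sentence would make the summary an abstract instead.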

10
Some genres of summarization: Abstract
  • In contrast, an abstract is a summary at least
    some of whose material is not present in the
    input.
  • In general, abstracts offer the possibility of
    higher degrees of condensation: a short abstract
    may offer more information than a longer extract.

11
Some genres of summarization: Indicative/Informative
  • Depending on the usage, a summary can be
    indicative or informative.

12
Some genres of summarization: Indicative
  • An indicative summary can provide only an
    indication of the main topics in the input text.
  • Thus, an indicative abstract is aimed at helping
    the user to decide whether to read the
    information source, or not.
  • The main purpose is to suggest the contents of
    the document without giving away detail on the
    document's content.

13
Some genres of summarization: Informative
  • An informative summary is meant to represent (and
    often replace) the original document.
  • Therefore, it must contain all the pertinent
    information necessary to convey the core
    information, and omit ancillary information.

14
Some genres of summarization: Generic/Query-Oriented
  • Depending on the purpose, a summary can be:
      • generic, i.e., it reflects the author's point
        of view with respect to all important topics
        in the input text, or
      • query-oriented (also user-focused or
        topic-focused), i.e., it reflects only the
        topics in the input text that are specific to
        a given query.

15
Automatic summarization's goal

16
Automatic summarization's goal
  • Overall, a summarization system has many
    parameters (some of which have been discussed
    above).

17
Variety of Parameters to a summarization system
  • 1. compression rate (summary length / source
    length)
  • 2. audience (user-focused vs. generic)
  • 3. relation to source (extract vs. abstract)
  • 4. function (indicative vs. informative)
  • 5. coherence (coherent vs. incoherent)
  • 6. span (single- vs. multi-document
    summarization)
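Parameter 1 is a simple ratio. The sketch below assumes length is counted in whitespace-separated words; sentences or characters would work the same way. `compression_rate` is a hypothetical helper.

```python
def compression_rate(summary: str, source: str) -> float:
    """Compression rate = summary length / source length, measured in words (assumption)."""
    source_words = len(source.split())
    if source_words == 0:
        raise ValueError("source text is empty")
    return len(summary.split()) / source_words
```

A 3-word summary of a 10-word source has a compression rate of 0.3, i.e. 30%.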

18
Variety of Parameters to a summarization system
  • 7. language (monolingual, multilingual or
    cross-lingual)
  • 8. genre (special strategies for different
    varieties of text)
  • 9. media (type of media or their combination in
    input/output)
  • 10. linguistic space (three dimensions: level,
    elements, position)
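One way to see how these parameters fit together is to write them down as a configuration object. The sketch below is hypothetical and implies no particular system: the enum and field names are illustrative, genre and media are kept as free-form strings, and parameter 10 (linguistic space) is left out because the slide names only its three dimensions.

```python
from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    GENERIC = "generic"
    USER_FOCUSED = "user-focused"


class Relation(Enum):
    EXTRACT = "extract"
    ABSTRACT = "abstract"


class Function(Enum):
    INDICATIVE = "indicative"
    INFORMATIVE = "informative"


class Span(Enum):
    SINGLE_DOCUMENT = "single-document"
    MULTI_DOCUMENT = "multi-document"


class Language(Enum):
    MONOLINGUAL = "monolingual"
    MULTILINGUAL = "multilingual"
    CROSS_LINGUAL = "cross-lingual"


@dataclass(frozen=True)
class SummarizerParams:
    """Hypothetical bundle of parameters 1-9; linguistic space is omitted."""
    compression_rate: float               # summary length / source length
    audience: Audience
    relation: Relation
    function: Function
    coherent: bool                        # coherent vs. incoherent output
    span: Span
    language: Language
    genre: str                            # e.g. "newswire"; free-form here
    media: tuple[str, ...] = ("text",)    # media types in input/output

    def __post_init__(self) -> None:
        # Reject ratios that cannot describe a summary of the source.
        if not 0 < self.compression_rate <= 1:
            raise ValueError("compression_rate must be in (0, 1]")
```

Freezing the dataclass keeps a configuration fixed for a whole run; a summarizer that supports only some of these settings could reject the others when the object is constructed.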

19
Variety of Parameters to a summarization system
  • In any given application, the importance of these
    parameters will vary.
  • It is unlikely that any one summarizer will
    handle all of these parameters.

20
Overview of a Summarizer

21
Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
  • This paper focuses on text-span extraction and
    ranking using a methodology that assigns weighted
    scores for both statistical and linguistic
    features in the text span.

22
Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
  • An analysis illustrates that the weights assigned
    to a feature may differ according to the type of
    summary and corpus/document genre.
  • These weights can then be optimized (the article
    does not define how optimality is measured) for
    specific applications and genres.
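The paper's actual scoring formula is not reproduced in these slides, so the following is only a generic sketch of the idea described here: a sentence's score as a weighted sum of feature values, with a separate weight profile per genre. The feature names, the `WEIGHTS` profiles and their values are invented for illustration; in practice the weights would be tuned for the application.

```python
def weighted_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Linear combination sum_i w_i * f_i; features missing from `features` count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


# Hypothetical weight profiles: the same features, weighted differently per genre.
WEIGHTS = {
    "newswire": {"centroid_sim": 1.0, "position": 0.8, "quotation": -0.5},
    "scientific": {"centroid_sim": 1.0, "position": 0.2, "quotation": 0.0},
}

# A lead sentence (position = 1.0) with no quotation and moderate centroid similarity.
lead_sentence = {"centroid_sim": 0.6, "position": 1.0, "quotation": 0.0}
news_score = weighted_score(lead_sentence, WEIGHTS["newswire"])
sci_score = weighted_score(lead_sentence, WEIGHTS["scientific"])
```

With identical feature values, the lead sentence scores higher under the newswire profile because that profile weights position more heavily, which is the kind of genre dependence the slide describes.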

23
Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
  • To determine possible linguistic features to use
    in the scoring methodology, several syntactical
    and lexical characteristics of newswire summaries
    have been identified.
  • The statistical features used were those that
    have proven effective in standard monolingual
    retrieval techniques.

24
Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
  • The approach to text summarization described in
    the paper allows both generic and query-relevant
    summaries by scoring sentences with respect to
    both statistical and linguistic features.
  • For generic summarization, a centroid query
    vector is calculated using high frequency
    document words and the title of the document.
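The slide does not say how many frequent words are used or how title words are weighted, so the sketch below makes its own assumptions: a tiny illustrative stopword list, the `top_n` most frequent content words with their counts, title words added with a count of one each, and cosine similarity between bag-of-words vectors as the sentence score. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for"}


def tokens(text: str) -> list[str]:
    """Lowercased alphabetic words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def centroid_query(sentences: list[str], title: str, top_n: int = 10) -> Counter:
    """Hypothetical centroid: the top_n most frequent document words plus the title words."""
    counts = Counter(w for s in sentences for w in tokens(s))
    query = Counter(dict(counts.most_common(top_n)))
    query.update(tokens(title))  # each title word adds 1 to its count
    return query


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def generic_scores(sentences: list[str], title: str, top_n: int = 10) -> list[float]:
    """Score every sentence by its similarity to the centroid query."""
    q = centroid_query(sentences, title, top_n)
    return [cosine(Counter(tokens(s)), q) for s in sentences]
```

For a query-relevant summary, the same `cosine` scoring could be run against a vector built from the user's query instead of the centroid.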

25
Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
  • Each sentence is scored according to the
    following formula and then ordered in the summary
    by rank.
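The rank-ordering step can be sketched as follows, assuming the per-sentence scores have already been computed by whatever formula is used (the formula itself is not included in this transcript). The hypothetical `rank_ordered_summary` returns the top `n` sentences highest score first, rather than in their original document order.

```python
def rank_ordered_summary(sentences: list[str], scores: list[float], n: int) -> list[str]:
    """Return the n highest-scoring sentences, highest score first (rank order)."""
    if len(sentences) != len(scores):
        raise ValueError("one score per sentence is required")
    # Stable sort: equally scored sentences keep their document order.
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in order[:n]]
```

Rank order puts the most important content first, at the possible cost of coherence, since the selected sentences no longer follow the source's narrative order.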

26
Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
  • Table 1 goes here.

27
Features and hints
  • Several features were inspected in summaries.
  • Conclusions and characteristics were identified.

28
Hints and features
  • Summary length was independent of document
    length.

29
Hints and features which are more frequent in
summary sentences
  • Indefinite articles
  • Location names (not for all sources)
  • Named Entities (proper nouns) in general
  • Days of week

30
Hints and features which are more frequent in
non-summary sentences
  • Words and phrases in direct and indirect
    quotations
  • "according", "adding", "said" and other verbs
    related to communication (75% more frequent in
    non-summaries)
  • Anaphoric references
  • Honorifics such as Dr. and Mrs.
  • Negation words
  • Auxiliary verbs such as "no", "don't" and "never"
  • Numerals (either in digits or words)

31
Hints and features which are more frequent in
non-summary sentences
  • Conjunctions such as "and", "or", "but", "so",
    "although" and "however"
  • Prepositions such as "at", "by", "for", "of",
    "in", "to" and "with"
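The hints on these slides translate naturally into countable surface features. The sketch below is an assumption-laden illustration: its word lists contain only the examples given on the slides, numerals are counted only when written in digits, quotation is detected by a straight double-quote character, and location names, other named entities and anaphoric references are skipped because they would need a named-entity recognizer or a coreference resolver. `cue_features` is a hypothetical name.

```python
import re

# Cue lists taken only from the examples on the slides (assumption: real lists are longer).
COMMUNICATION_VERBS = {"according", "adding", "said"}
HONORIFICS = {"dr", "mrs"}
NEGATIONS = {"no", "don't", "never"}
CONJUNCTIONS = {"and", "or", "but", "so", "although", "however"}
PREPOSITIONS = {"at", "by", "for", "of", "in", "to", "with"}
DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}


def cue_features(sentence: str) -> dict[str, int]:
    """Count surface cues reported as more frequent in summary or non-summary sentences."""
    # Lowercased words, keeping simple contractions such as "don't" intact.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    return {
        # Summary-leaning cues.
        "indefinite_articles": sum(w in {"a", "an"} for w in words),
        "days_of_week": sum(w in DAYS for w in words),
        # Non-summary-leaning cues.
        "quotation": int('"' in sentence),
        "communication_verbs": sum(w in COMMUNICATION_VERBS for w in words),
        "honorifics": sum(w in HONORIFICS for w in words),
        "negations": sum(w in NEGATIONS for w in words),
        "numerals": len(re.findall(r"\d+", sentence)),  # digits only, not number words
        "conjunctions": sum(w in CONJUNCTIONS for w in words),
        "prepositions": sum(w in PREPOSITIONS for w in words),
    }
```

Counts like these could serve as the linguistic features in a weighted sentence score, with negative weights for the non-summary cues.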

32
And the point is
  • There are many features (even more than reported
    here).
  • It is not clear how many of them are meaningful
    in the same way for other corpora and other
    domains.
  • It seems obvious that such features can be
    learned through supervised learning on similar
    input.
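As an illustration of that supervised-learning idea, the sketch below fits a logistic-regression model by batch gradient descent, so that each learned weight says how strongly a feature points toward "summary sentence". The toy data (two made-up features: an indefinite-article count and a quotation flag) is invented for the example; real training data would be sentences labeled by whether they appear in human-written summaries. `train_logistic` is a hypothetical name.

```python
import math


def train_logistic(X: list[list[float]], y: list[int],
                   lr: float = 0.1, epochs: int = 500) -> tuple[list[float], float]:
    """Fit weights for P(summary sentence | features) by batch gradient descent on log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "summary"
            err = p - yi                    # derivative of log-loss with respect to z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the batch and take one step downhill.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Toy data: [indefinite_article_count, has_quotation]; label 1 = summary sentence.
X = [[1, 0], [2, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(X, y)
```

On this toy data the first weight comes out positive and the second negative, mirroring the direction of the hints on the earlier slides; whether such weights transfer to other corpora is exactly the open question raised above.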

33
Evaluation
  • There is no widely agreed-upon set of methods for
    carrying out summarization evaluation.
  • Ezra will now present the evaluation approach
    used in the article.
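Since the article's own evaluation approach is left to the next part of the talk, the sketch below shows only one common baseline for evaluating extracts, not necessarily the article's: sentence-level precision and recall against a reference extract chosen by a human. Sentences are compared by identifier; `extract_precision_recall` is a hypothetical name.

```python
def extract_precision_recall(system: set[str], reference: set[str]) -> tuple[float, float]:
    """Sentence-level precision and recall of a system extract against a reference extract."""
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall
```

A system extract {S1, S2, S5} against a reference {S1, S3, S5, S6} shares two sentences, giving precision 2/3 and recall 2/4 = 0.5. One limitation of this measure is that a system choosing a different but equally informative sentence gets no credit.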