Title: Automatic summarization's goal
1. Automatic summarization's goal
- Take an information source
- Extract content from it
- Present the most important content to the user in
a condensed form and in a manner sensitive to the
user's or application's needs.
2. Imagine everyday life without some form of
summarization
- Newspaper headlines are summaries, written in a
terse, stylized language, of material in a news
story.
- The body of a news story may also contain a
summary, e.g., a news story written so that a
summary of the main events occurs at the
beginning.
- A preview or trailer of a show is a summary.
- Abstracts of scientific articles are a
traditional form of summary, written by the
authors, or else by a professional abstractor
following certain guidelines.
- A table showing baseball statistics for a player
over a season is very much a summary.
3. Other varieties of summaries include
- Reviews (of books and movies)
- Digests such as TV guides
- Minutes of a meeting
- A program for a conference
- A weather forecast
- A stock market bulletin
- A resume
- An obituary
- An abridgment of a book
- A map of a neighborhood
- A library catalog of abstracts of articles in new
journals
- A web page listing resources in a particular
subject area
- A table of contents for a book or magazine
- A summary that appears on the back cover of a
book
- A catalog of various products available from a
vendor
4. Other varieties of summaries include
- Almost any retrospective account of events could
be a summary.
5. Summary Input/Output
- A summary's output can take forms such as a
picture, a movie, or an audio segment.
- The input to be summarized may also be in these
different multimedia forms.
6. Some genres of summarization
- The word "summary" is associated with a variety of
meanings and is used in a variety of contexts.
7. Genres of summarization: Single/Multiple
document summary
- Depending on the input, one can have single- or
multiple-document summaries.
8. Genres of summarization: Extract/Abstract
- Depending on the output, one can have extract- or
abstract-like summaries.
9. Some genres of summarization: Extract
- An extract is a summary consisting entirely of
material copied from the input.
- It is customary to speak of an extract of K%
condensation, meaning that K% of the input's
words/sentences/paragraphs may appear in the
extract.
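As a minimal sketch of what an extract at a given condensation level could look like, the function below keeps the highest-scoring sentences, reading K as a percentage of the input's sentences (an assumption about the unit). The names `extract`, `scores`, and `k_percent` are hypothetical, and the scores are assumed to come from some separate ranking step that is not shown here.

```python
import math


def extract(sentences, scores, k_percent):
    """Return roughly k_percent of the sentences, copied verbatim.

    `scores` holds one hypothetical importance score per sentence.
    The selected sentences keep their original document order.
    """
    if len(sentences) != len(scores):
        raise ValueError("need exactly one score per sentence")
    if not 0 < k_percent <= 100:
        raise ValueError("k_percent must be in (0, 100]")
    if not sentences:
        return []
    # Keep at least one sentence so the extract is never empty.
    n_keep = max(1, math.ceil(len(sentences) * k_percent / 100))
    # Rank sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Restore document order among the selected sentences.
    chosen = sorted(ranked[:n_keep])
    return [sentences[i] for i in chosen]
```

Because every returned sentence is copied unchanged from the input, the result is an extract in the sense defined above, not an abstract.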
10. Some genres of summarization: Abstract
- In contrast, an abstract is a summary at least
some of whose material is not present in the
input.
- In general, abstracts offer the possibility of
higher degrees of condensation: a short abstract
may offer more information than a longer extract.
11. Some genres of summarization: Indicative/Informative
- Depending on the usage, a summary can be
indicative or informative.
12. Some genres of summarization: Indicative
- An indicative summary provides only an
indication of the main topics in the input text.
- Thus, an indicative abstract is aimed at helping
the user decide whether or not to read the
information source.
- The main purpose is to suggest the contents of
the document without giving away detail on the
document's content.
13. Some genres of summarization: Informative
- An informative summary is meant to represent (and
often replace) the original document.
- Therefore, it must contain all the pertinent
information necessary to convey the core
information, and omit ancillary information.
14. Some genres of summarization: Generic/Query-Oriented
- Depending on the purpose, a summary can be
  - generic, i.e., it can reflect the author's point
  of view with respect to all important topics in
  the input text, or
  - query-oriented (also user-focused or
  topic-focused), i.e., it can reflect only the
  topics in the input text that are specific to a
  given query.
16. Automatic summarization's goal
- Overall, there are a variety of different
parameters to a summarization system (some of
these have been discussed above).
17. Variety of parameters to a summarization system
- 1. compression rate (summary length / source
length)
- 2. audience (user-focused vs. generic)
- 3. relation to source (extract vs. abstract)
- 4. function (indicative vs. informative)
- 5. coherence (coherent vs. incoherent)
- 6. span (single- vs. multi-document
summarization)
18. Variety of parameters to a summarization system
- 7. language (monolingual, multilingual or
cross-lingual)
- 8. genre (special strategies for different
varieties of text)
- 9. media (type of media or their combination in
input/output)
- 10. linguistic space (3 dimensions: level,
elements, position)
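The first parameter in the list above, compression rate, is defined as summary length divided by source length. A minimal sketch of that ratio follows, assuming length is measured in whitespace-separated words (the slides do not fix the unit, so this is an assumption, as is the function name `compression_rate`).

```python
def compression_rate(summary, source):
    """Return summary length / source length, measured in words.

    Counting whitespace-separated words is an assumption; characters
    or sentences would work equally well as the unit of length.
    """
    summary_len = len(summary.split())
    source_len = len(source.split())
    if source_len == 0:
        raise ValueError("source text must not be empty")
    return summary_len / source_len
```

For example, a 2-word summary of an 8-word source has a compression rate of 0.25 under this definition.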
19. Variety of parameters to a summarization system
- In any given application, the importance of these
parameters will vary.
- It is unlikely that any one summarizer will
handle all of these parameters.
20. Overview of a Summarizer
21. Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
- This paper focuses on text-span extraction and
ranking using a methodology that assigns weighted
scores for both statistical and linguistic
features in the text span.
22. Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
- An analysis illustrates that the weights assigned
to a feature may differ according to the type of
summary and corpus/document genre.
- These weights can then be optimized (the article
does not define in what way optimality is
measured) for specific applications and genres.
23. Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
- To determine possible linguistic features to use
in the scoring methodology, several syntactical
and lexical characteristics of newswire summaries
have been identified.
- The statistical features used were those that
have proven efficient in standard monolingual
retrieval techniques.
24. Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
- The approach to text summarization described in
the paper allows both generic and query-relevant
summaries by scoring sentences with respect to
both statistical and linguistic features.
- For generic summarization, a centroid query
vector is calculated using high-frequency
document words and the title of the document.
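To make the centroid query idea concrete, here is a rough sketch that builds a term-weight vector from the most frequent document words plus the title words. The tokenizer, the tiny stopword list, the `top_n` cutoff, and the choice to add 1 to the weight of each title word are all illustrative assumptions; the slides do not say how the paper combines the two sources.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "from", "to", "in", "on"})


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def centroid_query(document, title, top_n=10):
    """Build a query vector from high-frequency document words and the title."""
    counts = Counter(t for t in tokenize(document) if t not in STOPWORDS)
    # Keep only the top_n most frequent content words of the document.
    query = Counter(dict(counts.most_common(top_n)))
    # Add the title words; each occurrence contributes a weight of 1.
    for token in tokenize(title):
        if token not in STOPWORDS:
            query[token] += 1
    return query
```

In the query-relevant case the user's query would presumably play the role of this vector, so the same sentence-scoring code could serve both kinds of summary.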
25. Summarizing Text Documents: Sentence Selection
and Evaluation Metrics
- Each sentence is scored by a weighted combination
of its statistical and linguistic features, and
the sentences are then placed in the summary in
rank order.
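The paper's actual scoring formula is not reproduced in these slides, so the following is only a sketch of the general scheme: a linear weighted sum over per-sentence feature values, followed by ranking. The feature names (`query_overlap`, `has_quotation`) and the weights are hypothetical.

```python
def score_sentence(features, weights):
    """Weighted sum of feature values; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank_sentences(sentence_features, weights):
    """Return sentence indices ordered from highest to lowest score.

    Ties are broken by original position, so earlier sentences come first.
    """
    scored = [(score_sentence(f, weights), i) for i, f in enumerate(sentence_features)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]
```

A negative weight lets a feature that is typical of non-summary sentences push a sentence down the ranking, while a positive weight does the opposite.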
27. Features and hints
- Several features were inspected in summaries.
- Conclusions and characteristics were identified.
28. Hints and features
- Summary length was independent of document
length.
29. Hints and features which are more frequent in
summary sentences
- Indefinite articles
- Location names (not for all sources)
- Named entities (proper nouns) in general
- Days of the week
30. Hints and features which are more frequent in
non-summary sentences
- Words and phrases in direct and indirect
quotations
- "according", "adding", "said", and other verbs
related to communication (75% more frequent in
non-summaries)
- Anaphoric references
- Honorifics such as "Dr." and "Mrs."
- Negation words
- Auxiliary verbs such as "no", "don't" and "never"
- Numerals (either in digits or words)
31. Hints and features which are more frequent in
non-summary sentences
- Conjunctions such as "and", "or", "but", "so",
"although" and "however"
- Prepositions such as "at", "by", "for", "of",
"in", "to", and "with"
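Several of the hints above are simple word-list checks, so they can be turned into per-sentence feature counts with very little code. The sketch below uses only the example words given on the slides; the function name `hint_counts`, the grouping into categories, and the simple tokenizer are assumptions made for illustration.

```python
import re

# Word lists taken from the examples on the slides (lowercased).
HINT_WORDS = {
    "communication_verbs": {"according", "adding", "said"},
    "honorifics": {"dr", "mrs"},
    "negations": {"no", "don't", "never"},
    "conjunctions": {"and", "or", "but", "so", "although", "however"},
    "prepositions": {"at", "by", "for", "of", "in", "to", "with"},
    "days_of_week": {"monday", "tuesday", "wednesday", "thursday",
                     "friday", "saturday", "sunday"},
}


def hint_counts(sentence):
    """Count how many tokens of the sentence fall into each hint category."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {
        name: sum(1 for token in tokens if token in words)
        for name, words in HINT_WORDS.items()
    }
```

Hints such as named entities, anaphoric references, or quotations would need more than a word list (for example a tagger or parser) and are left out of this sketch.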
32. And the point is
- There are many features (even more than reported
here).
- It is not clear how many of them are meaningful
in the same way for other corpora and other
domains.
- It seems obvious that such features can be
learned by supervised learning from similar
input.
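As a first step toward learning such features from labeled data, one can simply compare how often a feature occurs in summary sentences versus non-summary sentences. This is only a sketch: the function name `feature_rates` and the boolean input format are hypothetical, and a real supervised learner would go on to turn such statistics into feature weights.

```python
def feature_rates(has_feature, in_summary):
    """Return (rate in summary sentences, rate in non-summary sentences).

    Both arguments are parallel lists of booleans, one entry per
    sentence in a labeled corpus.
    """
    if len(has_feature) != len(in_summary):
        raise ValueError("inputs must have the same length")
    summary_total = sum(1 for s in in_summary if s)
    other_total = len(in_summary) - summary_total
    if summary_total == 0 or other_total == 0:
        raise ValueError("need at least one sentence of each class")
    summary_hits = sum(1 for f, s in zip(has_feature, in_summary) if s and f)
    other_hits = sum(1 for f, s in zip(has_feature, in_summary) if f and not s)
    return summary_hits / summary_total, other_hits / other_total
```

A feature whose two rates differ strongly is a candidate hint; whether it carries over to other corpora and domains is the open question raised above.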
33. Evaluation
- There is no widely agreed upon set of methods for
carrying out summarization evaluation.
- Ezra will now present the evaluation approach
used in the article.