Title: STARCH
1. Text Summarization in Data Mining
ConverSpeech LLC, Palo Alto, California 94301, USA
2. What is text data mining?
- Applying linguistic and statistical techniques to natural-language text to
  - Identify and extract valuable information from the text
  - Uncover useful, interesting, possibly unexpected
    - patterns of information
    - relationships between pieces of information
3. How can summaries help?
- Guide searching/browsing/reading: too much information out there!
  - Example: biomedicine
    - 11 million citations (articles) in MEDLINE alone!
    - Hundreds of additional biological information sources worldwide, many with textual annotations
- Reveal similarities in content across documents
- Uncover information in document sets that exists only at the aggregate level
4. Three interesting questions...
- How are summaries automatically generated?
- What do they really tell you?
5. Abstraction vs. Extraction...
- Is there a feasible alternative to simple extraction?
- Instead of identifying key words and phrases in the text (company, Nasdaq, working, capital, stock, share), identify key concepts, which may or may not be expressed using words in the text
[Slide graphic labels: time, business, day]
6. What methods are used to evaluate extractive summaries?
Standard methods compare the sentences in a summary against
- Sentences extracted by human judges, using recall and precision
- Human-written abstracts (by the author or someone else), using recall and precision on sub-sentence units
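
The first comparison above can be illustrated with a minimal sketch. It assumes both the summarizer's extract and the human judge's extract are given as sentence positions in the same document; the function name and the example positions are hypothetical, not taken from the slides.

```python
def precision_recall(system_sentences, reference_sentences):
    """Score a system extract against a human-judged extract.

    Both arguments are collections of sentence positions (an assumption
    made for this sketch; real evaluations may match sentences differently).
    """
    system = set(system_sentences)
    reference = set(reference_sentences)
    overlap = len(system & reference)
    # Precision: share of extracted sentences that the judge also chose.
    precision = overlap / len(system) if system else 0.0
    # Recall: share of the judge's sentences that the summarizer found.
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical positions chosen by a summarizer and by a human judge.
system_extract = [0, 3, 7, 9]
judge_extract = [0, 2, 3, 8, 9]
p, r = precision_recall(system_extract, judge_extract)
print(f"precision={p:.2f} recall={r:.2f}")
```

The same function could be applied to sub-sentence units when comparing against a human-written abstract, provided the units are first aligned to comparable identifiers.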
7. A new method of summary evaluation
- Standard methods of summary evaluation neglect to measure something important: robustness
- How sensitive is a summarizer to surface perturbations in the text?
- Let's substitute synonyms and change sentence order and see what happens
8. Measuring robustness...
"firm" and "company" are used interchangeably: substitute "firm" for "company"
What happens? Are the same (or equivalent) sentences extracted? Is the summary stable?
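
A minimal sketch of the substitution test follows. The frequency-based extractor `toy_extract` is a stand-in invented for illustration, not one of the summarizers discussed in the talk, and the Jaccard overlap used as a stability score is likewise an assumption about how "stable" might be quantified.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_extract(sentences, k=2):
    """Hypothetical extractor: indices of the k sentences whose words
    have the highest average frequency in the document."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    freq = Counter(w for words in tokenized for w in words)
    scores = [sum(freq[w] for w in words) / max(len(words), 1) for words in tokenized]
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return set(ranked[:k])


def substitute(text, old, new):
    """Replace whole-word, exact-case occurrences of `old` with `new`."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


def stability(a, b):
    """Jaccard overlap of two sets of extracted sentence indices (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


text = (
    "The company reported higher earnings. Analysts said the firm beat forecasts. "
    "Shares of the company rose on the Nasdaq. The weather was mild."
)
perturbed = substitute(text, "company", "firm")

before = toy_extract(split_sentences(text))
after = toy_extract(split_sentences(perturbed))
score = stability(before, after)
print(f"stability under substitution: {score:.2f}")
```

Because substitution leaves the number and order of sentences unchanged, sentence indices can be compared directly before and after the perturbation.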
9. Measuring robustness...
Displace the first two paragraphs and tack them on at the end of the article
What happens? Are the same (or equivalent) sentences extracted? Is the summary stable?
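
The reordering test can be sketched in the same spirit. Here `lead_extract`, which simply takes the first sentences of the article, is a hypothetical position-based extractor chosen because it makes the effect easy to see; the example article is invented, and extracts are compared by sentence text because positions shift after reordering.

```python
import re


def displace_leading_paragraphs(text, n=2):
    """Move the first n paragraphs (separated by blank lines) to the end."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n\n".join(paragraphs[n:] + paragraphs[:n])


def lead_extract(text, k=2):
    """Hypothetical position-based extractor: the first k sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return set(sentences[:k])


def overlap(a, b):
    """Jaccard overlap of two sets of extracted sentences (1.0 = identical)."""
    return len(a & b) / len(a | b) if (a | b) else 1.0


article = (
    "The company posted record sales.\n\n"
    "Its stock rose on the Nasdaq.\n\n"
    "Working capital also improved.\n\n"
    "Analysts expect further gains."
)
reordered = displace_leading_paragraphs(article, n=2)
score = overlap(lead_extract(article), lead_extract(reordered))
print(f"stability under reordering: {score:.2f}")
```

An extractor that relies only on sentence position selects entirely different sentences once the leading paragraphs move, so its overlap score here drops to zero.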
10. Three summarizers...
Stable? (Substitutions / Reorderings)
- Extraction summarizer 1: X
- Extraction summarizer 2: X / ?
- Abstraction summarizer: ? / ?
11. Summarization in text data mining...
- Use summarizers that abstract from the text in addition to extracting from the text.
- The result is more robust, and potentially uncovers unexpected information.