STARCH - PowerPoint PPT Presentation

Provided by: seanbec

Transcript and Presenter's Notes

1
Text Summarization in Data Mining
ConverSpeech LLC Palo Alto, California 94301, USA
2
What is text data mining?
  • Applying linguistic and statistical techniques to
    natural-language text to
  • Identify and extract from the text valuable
    information
  • Uncover in the text useful, interesting, possibly
    unexpected
  • patterns of information
  • relationships between pieces of information

3
How can summaries help?
  • Guide searching/browsing/reading: too much
    information out there!
  • Example: biomedicine
  • 11 million citations (articles) in MEDLINE alone!
  • Hundreds of additional biological information
    sources worldwide, many with textual annotations
  • Reveal similarities in content across documents
  • Uncover information in document sets that exists
    only at the aggregate level

4
Three interesting questions...
  • How are summaries automatically generated?
  • Are they any good?
  • What do they really tell you?

5
Abstraction vs. Extraction...
  • Is there a feasible alternative to simple
    extraction?
  • Instead of identifying key words and phrases in
    the text, identify key concepts, which may or
    may not be expressed using words in the text

time, business, day
company, Nasdaq, working, capital, stock, share
6
What methods are used to evaluate extractive
summaries?
Standard methods compare sentences in the summary
against
  • Sentences extracted by human judges using
    recall and precision
  • Human-written abstracts (author or other) using
    recall and precision on sub-sentence units
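
The sentence-level comparison above can be sketched in a few lines. This is a minimal illustration, not the evaluation procedure used by the presenters: the function name `precision_recall` and the sentence identifiers are assumptions made for the example, and sentences are compared by exact identity (here, their position in the article).

```python
def precision_recall(system_sentences, reference_sentences):
    """Return (precision, recall) of extracted sentences against a reference set.

    Both arguments are collections of sentence identifiers, for example
    the positions of the sentences in the article (an assumption of this sketch).
    """
    system = set(system_sentences)
    reference = set(reference_sentences)
    overlap = len(system & reference)
    # Precision: share of system-extracted sentences the human judges also chose.
    precision = overlap / len(system) if system else 0.0
    # Recall: share of human-chosen sentences the system managed to extract.
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical picks: sentence positions chosen by a summarizer and by human judges.
system_pick = [0, 3, 7, 9]
human_pick = [0, 2, 3, 5, 7]
p, r = precision_recall(system_pick, human_pick)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

The same function could score sub-sentence units against a human-written abstract, provided the units are first mapped to comparable identifiers.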

7
A new method of summary evaluation
  • Standard methods of summary evaluation
  • are human-intensive
  • neglect to measure something important:
    robustness

How sensitive is a summarizer to surface
perturbations in the text? Let's substitute
synonyms and change sentence order and see what
happens.
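
One way to put a number on robustness is to compare the summary of the original text with the summary of the perturbed text. The slides do not name a specific score, so the sketch below assumes a Jaccard overlap over extracted sentence identifiers; the function name `stability` and the sample identifiers are hypothetical.

```python
def stability(summary_before, summary_after):
    """Jaccard overlap between two extracted summaries.

    Each summary is a collection of sentence identifiers. Returns 1.0 when
    the same sentences are extracted before and after the perturbation,
    and 0.0 when the two summaries share no sentence.
    """
    before, after = set(summary_before), set(summary_after)
    if not before and not after:
        return 1.0  # two empty summaries are trivially identical
    return len(before & after) / len(before | after)


# Hypothetical identifiers of the sentences extracted from the original
# article and from its perturbed version.
original_summary = ["s1", "s4", "s9"]
perturbed_summary = ["s1", "s9", "s12"]
print(stability(original_summary, perturbed_summary))  # 0.5
```

Identifiers rather than raw sentence strings are used because a synonym substitution changes the surface text of an otherwise equivalent sentence.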
8
Measuring robustness...
  • Substitute synonyms

"firm" and "company" are used interchangeably:
substitute "firm" for "company".
What happens? Are the same (or equivalent)
sentences extracted? Is the summary stable?
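
The substitution step can be sketched as a whole-word replacement. This is an illustrative helper only: the name `substitute_synonym` and the sample article are assumptions, and a real experiment would need a synonym list that is valid in context.

```python
import re


def substitute_synonym(text, old, new):
    """Replace whole-word occurrences of `old` with `new`.

    Matching is case-insensitive, and a capitalised match gets a
    capitalised replacement so sentence-initial words still look right.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b", flags=re.IGNORECASE)

    def replace(match):
        word = match.group(0)
        return new.capitalize() if word[0].isupper() else new

    return pattern.sub(replace, text)


# Hypothetical two-sentence article.
article = "The company reported higher earnings. Company shares rose on Nasdaq."
print(substitute_synonym(article, "company", "firm"))
# The firm reported higher earnings. Firm shares rose on Nasdaq.
```

Running a summarizer on both `article` and the substituted text, then comparing which sentences are extracted, answers the stability question posed above.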
9
Measuring robustness...
  • Reorder sentences

Displace the first two paragraphs and tack them
on at the end of the article
What happens? Are the same (or equivalent)
sentences extracted? Is the summary stable?
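
The reordering perturbation can be sketched as follows, assuming paragraphs are separated by blank lines. The function name `displace_leading_paragraphs` and the placeholder article are hypothetical.

```python
def displace_leading_paragraphs(article, count=2):
    """Move the first `count` paragraphs to the end of the article.

    Paragraphs are assumed to be separated by blank lines; empty
    paragraphs are dropped before reordering.
    """
    paragraphs = [p for p in article.split("\n\n") if p.strip()]
    count = min(count, len(paragraphs))
    return "\n\n".join(paragraphs[count:] + paragraphs[:count])


# Placeholder article with four paragraphs.
article = "P1\n\nP2\n\nP3\n\nP4"
print(displace_leading_paragraphs(article))
```

A summarizer that leans heavily on sentence position (for example, favouring the opening paragraphs) would be expected to extract different sentences after this change.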
10
Three summarizers...
Perturbations: Substitutions, Reorderings
  • Extraction summarizer 1: X. Stable?
  • Extraction summarizer 2: X, ?. Stable?
  • Abstraction summarizer: ?, ?. Stable?
11
Summarization in text data mining...
  • Use summarizers that abstract from the text in
    addition to extracting from the text.

The result is more robust, and potentially
uncovers unexpected information.
12
Text Summarization in Data Mining
ConverSpeech LLC Palo Alto, California 94301, USA