CSC 9010: Text Mining Summarization Lab

1
CSC 9010: Text Mining Summarization Lab
  • Dr. Paula Matuszek
  • Paula_A_Matuszek_at_glaxosmithkline.com
  • (610) 270-6851

2
Summarization Tools
  • We will try out two summarization tools:
  • MEAD (tangra.si.umich.edu/clair/meaddemo/demo.cgi)
  • A text summarization project under development at
    the University of Michigan. There is an online
    demo that we will be using.
  • TextAnalyst (www.megaputer.com/products/ta/index.php3)
  • An application marketed by Megaputer, a
    data-mining/text-mining company. An evaluation
    copy is installed on our lab machines. We will
    spend most of our time here.

3
MEAD

4
TextAnalyst (extracted primarily from the
TextAnalyst 2.1 Help System)
  • TextAnalyst is a natural language text analysis
    software tool that provides a number of
    capabilities:
  • document summarization
  • topic structure extraction
  • document navigation
  • "natural language search"
  • The basis of the tool is a network of terms found
    in a document and the relations between them.
  • Words that often occur together are considered
    related.
  • See http://www.megaputer.com/tech/wp/tm.php3 for
    additional information.

5
Definitions
  • Concept: Refers to a word or words (term or
    terms) that TextAnalyst identifies as significant
    in your text. Concepts appear as hyperlinks in
    text and as list items in tree structures.
  • Text: Refers to a document you load in
    TextAnalyst. Both .TXT and .RTF file formats are
    acceptable.
  • Semantic network: A tree structure of concepts
    from your text and the relationships between
    them. This is a concise representation of your
    text.
  • Knowledge base: The collection of your text, the
    semantic network related to your text, any edits
    you made, the results of your analyses, and
    hyperlinks within your text.
  • (Information from the TextAnalyst Help system)

6
More Definitions
  • Semantic search: Synonymous with natural language
    query. You type a question in conventional,
    common English, and TextAnalyst returns results
    for your examination.
  • Semantic weight: The semantic weight of a
    concept is a measure of its importance in your
    document. In a tree structure that displays
    semantic weights, it is the number closest to the
    concept. The semantic weight of the relationship
    between a concept and its parent concept is the
    leftmost number in the pair. It measures the
    strength of the relationship between the concept
    and its parent.
  • (Information from the TextAnalyst Help system)

7
Semantic Network
  • Concepts and relationships among them
  • Common natural language representation. Example:
    "Tom gave Mary a rose."

[Diagram: a Give node (verb: gave) linked to three roles: Giver = Tom, Recipient = Mary, Gift = Rose]
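
The frame on this slide can be sketched as a small labeled graph. The Python below is only an illustration of the idea, not how TextAnalyst stores its network. The names SemanticNetwork, add_relation, and roles_of are made up for this example.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy semantic network: nodes joined by labeled, directed relations."""

    def __init__(self):
        # node -> list of (relation label, target node)
        self.edges = defaultdict(list)

    def add_relation(self, source, label, target):
        self.edges[source].append((label, target))

    def roles_of(self, node):
        """Return the relations leaving `node` as a {label: target} dict."""
        return dict(self.edges[node])


# "Tom gave Mary a rose." expressed as a Give event with three roles.
net = SemanticNetwork()
net.add_relation("Give", "Giver", "Tom")
net.add_relation("Give", "Recipient", "Mary")
net.add_relation("Give", "Gift", "Rose")

roles = net.roles_of("Give")
print(roles)
```

Asking for the roles of "Give" recovers who gave what to whom, which is the information the sentence carries.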

8
Neural Net
  • Machine Learning Algorithms
  • Input Layer of nodes
  • Output Layer of nodes
  • Zero or more hidden layers
  • Weighted Links among nodes
  • Learning methods
  • back propagation
  • others
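
The pieces listed above (input layer, one hidden layer, output node, weighted links, back propagation) fit into a short pure-Python sketch. This is a generic textbook-style network trained on logical OR, chosen only because it is tiny. It is not the algorithm TextAnalyst uses, and the class name TinyNet, the layer sizes, and the learning rate are assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Input layer -> one hidden layer -> single output node."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Weighted links; the last weight in each row is a bias term.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One back-propagation update for a single example."""
        hidden, out = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        # Error signal at the output node (squared error, sigmoid unit).
        delta_out = (out - target) * out * (1.0 - out)
        # Propagate the error back through the output weights.
        delta_hidden = [delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(len(hidden))]
        for j in range(len(self.w_out)):
            self.w_out[j] -= lr * delta_out * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= lr * delta_hidden[j] * xb[i]


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Logical OR as a toy training set.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
net = TinyNet(n_in=2, n_hidden=3)
loss_before = mean_squared_error(net, data)
for _ in range(2000):
    for x, t in data:
        net.train_step(x, t)
loss_after = mean_squared_error(net, data)
print(loss_before, loss_after)
```

The printed error after training should be lower than the error before it, which is all "learning" means here.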

9
TextAnalyst Algorithms
  • Preprocessing (language-specific)
  • Eliminate stop words
  • Stem
  • Statistical Analysis
  • proprietary neural network algorithm
  • word frequencies
  • word combination frequencies
  • joint occurrence of words within sentences
  • yields network of term strengths and relation
    strengths
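
To make the steps above concrete, here is a rough Python sketch of stop-word removal, stemming, and counting joint occurrences of words within sentences. TextAnalyst's statistical analysis uses a proprietary neural network algorithm that is not described here, so this sketch swaps in plain counting. The stop list, the crude_stem helper, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "are", "from", "in", "is", "of", "the", "to"}


def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def build_term_network(text):
    """Return (term counts, counts of term pairs sharing a sentence)."""
    term_counts = Counter()
    pair_counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        terms = preprocess(sentence)
        term_counts.update(terms)
        # Each unordered pair of distinct terms is counted once per sentence.
        for pair in combinations(sorted(set(terms)), 2):
            pair_counts[pair] += 1
    return term_counts, pair_counts


text = ("Text mining tools summarize documents. "
        "Summarization tools extract sentences from documents. "
        "Mining text is useful.")
term_counts, pair_counts = build_term_network(text)
print(term_counts.most_common(3))
print(pair_counts[("document", "tool")])
```

The term counts play the role of term strengths and the pair counts play the role of relation strengths. A real system would normalize both rather than use raw counts.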

10
Capabilities
  • Semantic analysis: tree structure of concepts
    and relations
  • Navigation: concepts in the tree are linked to
    their occurrences in the text
  • Summarization: identify "most important"
    sentences
  • Natural language query: query in English
  • (Information from the TextAnalyst Help system)
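
Summarization by picking "most important" sentences can be sketched in a few lines of Python. The scoring below (average word frequency per sentence) is a simple stand-in chosen for illustration. It is not the scoring used by TextAnalyst or MEAD, and the function names, stop list, and sample text are assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]


def summarize(text, n_sentences=1):
    """Pick the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return [sentences[i] for i in chosen]


text = ("Summarization tools select important sentences. "
        "The weather was pleasant yesterday. "
        "Important sentences contain important terms.")
summary = summarize(text, n_sentences=1)
print(summary)
```

Because the output is made only of sentences copied from the input, this is extraction rather than rewriting, which is worth keeping in mind when judging summary quality.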

11
Capabilities 2
  • Knowledge base development: maintain semantic
    network, links, related dictionaries, etc.
  • Topic Structure View
  • Cluster View (especially for multiple documents)
  • Dictionary development: add or delete terms from
    the automated dictionary
  • Focused analysis: narrow the terms in a search
  • (Information from the TextAnalyst Help system)

12
LAB
  • Using the set of documents from assignment 2,
    create summaries using both MEAD and TextAnalyst.
  • Explore some of the different parameters,
    including using multiple documents.
  • How well did each tool do with single documents?
    Multiple documents?
  • Do you think your documents could have been
    well-summarized by extracting sentences?