Summarization of XML Documents - PowerPoint PPT Presentation

About This Presentation
Title:

Summarization of XML Documents

Description:

A fully automated XML summary generator. Ranking of tags and text based on the ranking model ...

Slides: 26
Provided by: mpiin

Transcript and Presenter's Notes

Title: Summarization of XML Documents


1
Summarization of XML Documents
  • K Sarath Kumar

2
Outline
  • Motivation
  • System for XML Summarization
  • Ranking Model and Summary Generation
  • Example Summaries
  • Conclusion and Future Work

3
Motivation
XML Document Collection (e.g. IMDB)
XML Document
  • Types of XML Document Summaries
  • Generic summary: summarizes the entire contents of
    the document.
  • Query-biased summary: summarizes those parts of
    the document that are relevant to the user's query.

4
  • Aims: we aim at summaries that are
      • generated automatically
      • highly constrained by size
      • highly informative
      • high in coverage
  • Challenges
      • Structure is as important as text
      • Varying text length

5
System for XML Summarization
[Figure: system architecture. Components: Info Unit Generator, RANKING UNIT (Tag Ranker, Text Ranker), SUMMARY GENERATOR. Inputs: XML Doc, Corpus Statistics, Summary Size. Intermediate data: Tag Units, Text Units, Ranked Tag Units, Ranked Text Units. Output: Summary.]
6
Information Units of an XML Document
7
Ranking Unit
I. Tag Ranking
8
II. Text Ranking
  • Two categories of text
  • Entities
  • Regular text

9
  • Ranking is done based on the context of occurrence.
  • No redundancy in tag context (e.g. actor
    names, genre)
  • Redundancy in tag context (e.g. plots, goofs,
    trivia items)

Tag context
Document context
Corpus context
10
Correlated tags and text
Related tag units often occur as siblings of each
other, e.g. Actor and Role
Inclusion Principle
Case 1
Case 2
11
Generation of Summary
Consider the following tag rank table
To generate a summary with 30 tags, 15 actor
tags, 9 keyword tags and 6 trivia tags would be
required.
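The tag rank table itself is not part of this transcript. The 15 / 9 / 6 split is consistent with dividing the tag budget in proportion to the tag scores, so the sketch below assumes relative weights of 5 : 3 : 2 for actor, keyword and trivia. Those weights, the function name `allocate_tag_budget` and the largest-remainder rounding are illustrative assumptions, not the presenter's stated method.

```python
from fractions import Fraction


def allocate_tag_budget(tag_scores, budget):
    """Split `budget` tags across tag types in proportion to their scores.

    tag_scores: dict mapping tag name -> non-negative score (assumed values).
    Rounding uses the largest-remainder rule so the counts sum to `budget`.
    """
    total = sum(Fraction(s) for s in tag_scores.values())
    # Exact (possibly fractional) share of the budget for each tag type.
    exact = {t: Fraction(s) * budget / total for t, s in tag_scores.items()}
    # Start from the floor of each share (scores are non-negative).
    counts = {t: int(q) for t, q in exact.items()}
    leftover = budget - sum(counts.values())
    # Hand out the remaining slots to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    return counts


# Hypothetical weights chosen to match the 15 / 9 / 6 example on the slide.
example = allocate_tag_budget({"actor": 5, "keyword": 3, "trivia": 2}, 30)
```

Exact fractions are used instead of floats so that shares such as 30 × 3/10 land exactly on 9 without rounding surprises.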
12
Generating the summary with 30 tags
13
Few Example Summaries
Titanic.xml - Summaries
14
(No Transcript)
15
(No Transcript)
16
Thanks!
17
Appendix
Informativeness
18
Coverage
19
Ranking Model
I. TAG RANKER
Mixture Model of Typicality and Specialty
  • Typicality: how typical is the tag in the
    corpus?

20
Specialty: how unusually frequent or infrequent is
the tag in the current document compared to an
average document of the corpus?
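The formulas for typicality and specialty are not preserved in this transcript, so the following is only a sketch of how such a mixture could be computed. It assumes typicality is the fraction of corpus documents containing the tag, specialty is the absolute log-ratio of the tag's count in the current document to its average count per document, and the two are combined linearly with a hypothetical weight `lam`. All three choices are assumptions made for illustration.

```python
import math


def typicality(tag, corpus):
    """Assumed definition: fraction of corpus documents that contain the tag.

    corpus: list of dicts, one per document, mapping tag name -> count.
    """
    return sum(1 for doc in corpus if doc.get(tag, 0) > 0) / len(corpus)


def specialty(tag, doc, corpus):
    """Assumed definition: how far the tag's count in `doc` deviates from
    the corpus average, as an absolute log-ratio with add-one smoothing."""
    average = sum(d.get(tag, 0) for d in corpus) / len(corpus)
    return abs(math.log((doc.get(tag, 0) + 1) / (average + 1)))


def tag_score(tag, doc, corpus, lam=0.5):
    """Linear mixture of the two components; `lam` is a hypothetical weight."""
    return lam * typicality(tag, corpus) + (1 - lam) * specialty(tag, doc, corpus)
```

With these definitions, a tag that appears in every document with its usual frequency scores high on typicality and zero on specialty, while a tag that is unusually frequent or unusually rare in the current document gains specialty.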
21
  • Text with redundancy in tag context

Sort terms by frequency and take the top m terms
as the centroid query
Relevance
Similarity: calculated using Maximal Marginal
Relevance (MMR)
Finally,
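The scoring formulas for this slide are missing from the transcript. As a rough illustration of the idea, the sketch below builds a centroid query from the top m terms and then orders the text units with a standard MMR loop. The whitespace tokenizer, the bag-of-words cosine similarity, the trade-off weight `lam` and all function names are assumptions, not details taken from the presentation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid_query(texts, m):
    """Top-m most frequent terms across all text units, each with weight 1."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return Counter({term: 1 for term, _ in counts.most_common(m)})


def mmr_rank(texts, m=3, lam=0.7):
    """Order text units by MMR: relevance to the centroid query, penalised
    by similarity to the units that were already selected."""
    vectors = [Counter(t.lower().split()) for t in texts]
    query = centroid_query(texts, m)
    selected, remaining = [], list(range(len(texts)))
    while remaining:
        def mmr(i):
            relevance = cosine(vectors[i], query)
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return [texts[i] for i in selected]
```

A lower `lam` pushes near-duplicate items (for example two trivia entries that say almost the same thing) further down the ranking, which is the behaviour wanted for tags whose text is redundant.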
22
Text without redundancy in tag context
Redundancy at tag level
No redundancy at tag level
is set empirically
23
  • A Relative Count Matrix is constructed
  • Given two tags Ti and Tj, the relative
    importance of Tj with respect to the higher-ranked Ti is
    calculated by dividing both P(Ti|D) and P(Tj|D) by P(Tj|D)
    (this gives the number of Ti tags per one Tj tag)
  • Tj is considered only after P(Ti|D)/P(Tj|D)
    Ti tags have been considered.
  • Extending the above concept, a matrix with
    relative counts can be formed.
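A minimal sketch of such a matrix, assuming each entry is simply the ratio P(Ti|D)/P(Tj|D) as described in the bullets above. The probabilities in the example (1/2, 3/10, 1/5) are hypothetical values chosen to match the 15 / 9 / 6 example earlier in the deck, and the function name is illustrative.

```python
from fractions import Fraction


def relative_count_matrix(tag_probs):
    """Build the matrix of relative counts from per-document tag probabilities.

    tag_probs: dict mapping tag name -> P(T|D) (assumed to be non-zero).
    Returns the tags in rank order and a nested dict where
    matrix[ti][tj] = P(ti|D) / P(tj|D), i.e. how many ti tags
    correspond to one tj tag.
    """
    tags = sorted(tag_probs, key=tag_probs.get, reverse=True)
    matrix = {
        ti: {tj: Fraction(tag_probs[ti]) / Fraction(tag_probs[tj]) for tj in tags}
        for ti in tags
    }
    return tags, matrix


# Hypothetical probabilities, not taken from the presentation.
probs = {
    "actor": Fraction(1, 2),
    "keyword": Fraction(3, 10),
    "trivia": Fraction(1, 5),
}
tags, matrix = relative_count_matrix(probs)
```

Under these assumed values, `matrix["actor"]["trivia"]` is 5/2, meaning roughly five actor tags are emitted for every two trivia tags.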

24
Oceans Eleven.xml - Summaries
25
Generating the summary with 30 tags