Summarization of XML Documents

Slides: 26

Provided by: mpiin

1
Summarization of XML Documents

K Sarath Kumar

2
Outline

Motivation
System for XML Summarization
Ranking Model and Summary Generation
Example Summaries
Conclusion and Future Work

3
Motivation
XML Document Collection (e.g. IMDB)
XML Document

Types of XML Document Summaries
Generic summary: summarizes the entire contents of the document.
Query-biased summary: summarizes those parts of the document that are relevant to the user's query.

Aims
We aim at summaries which are
Generated Automatically
Highly constrained by size
Highly Informative
High Coverage
Challenges
Structure is as important as text
Varying text length

5
System for XML Summarization
(Figure: system architecture.)
Components: Info Unit Generator, RANKING UNIT (Tag Ranker, Text Ranker), SUMMARY GENERATOR
Labels: XML Doc, Corpus Statistics, Summary Size, Tag Units, Text Units, Ranked Tag units, Ranked Text units, Summary
6
Information Units of an XML Document
7
Ranking Unit
I. Tag Ranking
8
II. Text Ranking

Two categories of text
Entities
Regular text

Ranking is done based on context of occurrence.
- No redundancy in tag context (e.g. actor names, genre)
- Redundancy in tag context (e.g. plots, goofs, trivia items)

Tag context
Document context
Corpus context
10
Correlated tags and text
Related tag units are often siblings of each other, e.g. Actor and Role
Inclusion Principle
Case 1
Case 2
11
Generation of Summary
Consider the following tag rank table
A summary with 30 tags would require 15 actor tags, 9 keyword tags and 6 trivia tags.
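The split of a tag budget across tag types can be sketched as a proportional allocation. This is a minimal illustration, not the presenter's implementation: the function name `allocate_tags` and the scores 0.5, 0.3 and 0.2 are assumptions, chosen only because they are consistent with the 15/9/6 split stated on the slide (the actual tag rank table was not transcribed).

```python
def allocate_tags(tag_scores, budget):
    """Split a tag budget across tag types in proportion to their scores."""
    total = sum(tag_scores.values())
    # Ideal (possibly fractional) share for every tag type.
    shares = {tag: budget * score / total for tag, score in tag_scores.items()}
    # Start from the integer part of each share.
    counts = {tag: int(share) for tag, share in shares.items()}
    # Hand out any leftover slots by largest fractional remainder.
    leftover = budget - sum(counts.values())
    by_remainder = sorted(shares, key=lambda t: shares[t] - counts[t], reverse=True)
    for tag in by_remainder[:leftover]:
        counts[tag] += 1
    return counts


# Hypothetical scores; the real tag rank table is not in the transcript.
tag_rank = {"actor": 0.5, "keyword": 0.3, "trivia": 0.2}
print(allocate_tags(tag_rank, 30))
```

The largest-remainder step only matters when the shares are not whole numbers; it guarantees the counts always add up to the requested budget.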
12
Generating the summary with 30 tags
13
Few Example Summaries
Titanic.xml - Summaries
16
Thanks!
17
Appendix
Informativeness
18
Coverage
19
Ranking Model
I. TAG RANKER
Mixture Model of Typicality and Specialty

Typicality: How typical is the tag in the corpus?

20
Specialty: How unusually frequent or infrequent is the tag in the current document compared to an average document of the corpus?
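The formulas for the two components were not transcribed, so the following is only a hedged sketch of what a mixture of typicality and specialty could look like. Everything in it is an assumption made for illustration: typicality is taken as the fraction of corpus documents that contain the tag, specialty as a squashed log-ratio of the tag's count in the document to its corpus average, and `lam` is a hypothetical mixing weight.

```python
import math


def tag_scores(doc_counts, corpus_doc_freq, corpus_avg_counts, n_docs, lam=0.5):
    """Mix typicality and specialty into one normalized score per tag.

    The definitions below are illustrative stand-ins, not the slide's formulas.
    """
    scores = {}
    for tag, count in doc_counts.items():
        # Assumed typicality: share of corpus documents containing the tag.
        typicality = corpus_doc_freq[tag] / n_docs
        # Assumed specialty: deviation from the corpus average, in either
        # direction, squashed into [0, 1).
        deviation = abs(math.log(count / corpus_avg_counts[tag]))
        specialty = deviation / (1.0 + deviation)
        scores[tag] = lam * typicality + (1.0 - lam) * specialty
    total = sum(scores.values())
    # Normalize so the scores can be read as a distribution over tags.
    return {tag: score / total for tag, score in scores.items()}


# Hypothetical statistics for one movie document in a 100-document corpus.
probs = tag_scores(
    doc_counts={"actor": 40, "trivia": 5},
    corpus_doc_freq={"actor": 100, "trivia": 20},
    corpus_avg_counts={"actor": 40.0, "trivia": 1.0},
    n_docs=100,
)
print(probs)
```

Under these assumed definitions, a tag that appears in every document scores high on typicality, while a tag that is far more (or far less) frequent than usual scores high on specialty.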
21

Text with redundancy in tag context

Sort terms by frequency and take the top m terms as the centroid query
Relevance
Similarity: calculated using Maximal Marginal Relevance (MMR)
Finally,
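The centroid-query and MMR steps for redundant text can be sketched as follows. The slide's own relevance and similarity formulas were not transcribed, so this sketch makes several assumptions: whitespace tokenization, Jaccard overlap as the similarity measure, and a hypothetical trade-off weight `lam`. The function names are illustrative only.

```python
from collections import Counter


def jaccard(a, b):
    """Set overlap used here as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_texts(texts, m=5, lam=0.5):
    """Order text units by MMR against a centroid query of the top m terms."""
    token_sets = [set(text.lower().split()) for text in texts]
    # Centroid query: the m most frequent terms across all text units.
    freq = Counter(tok for text in texts for tok in text.lower().split())
    centroid = {term for term, _ in freq.most_common(m)}
    relevance = [jaccard(tokens, centroid) for tokens in token_sets]

    selected, remaining = [], list(range(len(texts)))
    while remaining:
        def mmr(i):
            # Penalize units that resemble something already selected.
            redundancy = max(
                (jaccard(token_sets[i], token_sets[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return [texts[i] for i in selected]


plots = [
    "the ship sinks after hitting an iceberg",
    "the ship sinks after hitting an iceberg",
    "jack wins a ticket in a poker game",
]
for text in rank_texts(plots):
    print(text)
```

The redundancy penalty is what pushes a repeated plot line behind a less central but novel one; with `lam` close to 1 the ranking falls back to plain relevance ordering.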
22
Text without redundancy in tag context
Redundancy at tag level
No redundancy at tag level
is set empirically
23

A Relative Count Matrix is constructed
Given two tags Ti and Tj, where Ti is ranked higher, the relative importance of Ti with respect to Tj is calculated by dividing both P(Ti|D) and P(Tj|D) by P(Tj|D) (this shows how many Ti tags are worth one Tj tag).
Tj is considered only after P(Ti|D)/P(Tj|D) Ti tags have been considered.
Extending this concept to all tag pairs, a matrix of relative counts can be formed.
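A relative count matrix of this kind can be sketched as a table of pairwise probability ratios. This is a minimal illustration under stated assumptions: the function name `relative_count_matrix` and the per-document tag probabilities below are hypothetical, since the slide's actual values were not transcribed.

```python
def relative_count_matrix(tag_probs):
    """Return tags in rank order and the matrix of pairwise probability ratios.

    matrix[ti][tj] is the probability of ti divided by that of tj, read as
    the number of ti tags to consider before one tj tag is considered.
    """
    tags = sorted(tag_probs, key=tag_probs.get, reverse=True)
    matrix = {
        ti: {tj: tag_probs[ti] / tag_probs[tj] for tj in tags}
        for ti in tags
    }
    return tags, matrix


# Hypothetical per-document tag probabilities.
tags, matrix = relative_count_matrix({"actor": 0.5, "keyword": 0.3, "trivia": 0.2})
for ti in tags:
    print(ti, [round(matrix[ti][tj], 2) for tj in tags])
```

Entries above the diagonal are greater than 1 (a higher-ranked tag measured against a lower-ranked one), the diagonal is 1, and entries below the diagonal are the reciprocals.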

24
Oceans Eleven.xml - Summaries
25
Generating the summary with 30 tags
