Timestamped Graphs: Evolutionary Models of Text for Multi-document Summarization - PowerPoint PPT Presentation

About This Presentation

Title:

Timestamped Graphs: Evolutionary Models of Text for Multi-document Summarization

Description:

Freely skewed. d1 d2 d3 d4. Freely skewed = Only add a new ... Freely skewed model. Empirical and theoretical properties of TSGs (e.g., in-degree distribution) ... – PowerPoint PPT presentation

Number of Views:205

Avg rating:3.0/5.0

Slides: 31

Provided by: tanyeefanm

Category:

more less

Transcript and Presenter's Notes

Title: Timestamped Graphs: Evolutionary Models of Text for Multi-document Summarization

1
Timestamped GraphsEvolutionary Models of Text
for Multi-document Summarization

Ziheng Lin and Min-Yen Kan
Department of Computer ScienceNational
University of Singapore, Singapore

2
Summarization

Traditionally, heuristics for extractive
summarization
Cue/stigma phrases
Sentence position (relative to document,
section, paragraph)
Sentence length
TFIDF, TF scores
Similarity (with title, context, query)
With the advent of machine learning, heuristic
weights for different features are tuned by
supervised learning
In last few years, graphical representations of
text have shed new light on the summarization
problem

3
Prestige as sentence selection

One motivation of using graphical methods was to
model the problem as finding prestige of nodes in
a social network
PageRank used random walk to smooth the effect
of non-local context
HITS and SALSA to model hubs and authorities
In summarization, lead to TextRank and LexRank
Contrast with previous graphical approaches
(Salton et al. 1994)
Did we leave anything out of our representation
for summarization?
Yes, the notion of an evolving network

4
Social networks change!

Natural evolving networks (Dorogovtsev and
Mendes, 2001)
Citation networks New papers can cite old ones,
but the old network is static
The Web new pages are added with an old page
connecting it to the web graph, old pages may
update links

5
Evolutionary models for summarization

Writers and readers often follow conventional
rhetorical styles - articles are not written or
read in an arbitrary way
Consider the evolution of texts using a very
simplistic model
Writers write from the first sentence onwards in
a text
Readers read from the first sentence onwards of
a text
A simple model sentences get added
incrementally to the graph

6
Timestamped Graph Construction

Approach
These assumptions suggest us to iteratively add
sentences into the graph in chronological order.
At each iteration, consider which edges to add
to the graph.
For single document simple and straightforward
add 1st sentence, followed by the 2nd, and so
forth, until the last sentence is added
For multi-document treat it as multiple
instances of single documents, which evolve in
parallel i.e., add 1st sentences of all
documents, followed by all 2nd sentences, and so
forth
Doesnt really model chronological ordering
between articles, fix later

7
Timestamped Graph Construction

Model
Documents as columns
di document i
Sentences as rows
sj jth sentence of document

8
Timestamped Graph Construction

A multi document example

doc3
doc2
doc1
sent1
sent2
sent3
9

An example TSG DUC 2007 D0703A-A

10
Timestamped Graph Construction

These are just one instance of TSGs
Lets generalize and formalize them
Def A timestamped graph algorithm tsg(M) is a
9-tuple (d, e, u, f,s, t, i, s, t) that
specifies a resulting algorithm that takes as
input the set of texts M and outputs a graph G

Input text transformation function
Properties of nodes
Properties of edges
11
Edge properties (d, e, u, f)

Edge Direction (d)
Forward, backward, or undirected
Edge Number (e)
number of edges to instantiate per timestep
Edge Weight (u)
weighted or unweighted edges
Inter-document factor (f)
penalty factor for links between documents in
multi-document sets.

12
Node properties (s, t, i, s)

Vertex selection function s(u, G)
One strategy among those nodes not yet
connected to u in G, choose the one with highest
similarity according to u
Similarity functions Jaccard, cosine, concept
links (Ye et al.. 2005)
Text unit type (t)
Most extractive algorithms use sentences as
elementary units
Node increment factor (i)
How many nodes get added at each timestep
Skew degree (s)
Models how nodes in multi-document graphs are
added
Skew degree how many iterations to wait before
adding the 1st sentence of the next document
Lets illustrate this

13
Skew Degree Examples

time(d1) lt time(d2) lt time(d3) lt time(d4)

d1 d2 d3 d4
d1 d2 d3 d4
Freely skewed Only add a new document when it
would be linked by some node using vertex
function s
Skewed by 1
Skewed by 2
14
Input text transformation function (t)

Document Segmentation Function (t)
Problem observed in some clusters where some
documents in a multi-document cluster are very
long
Takes many timestamps to introduce all of the
sentences, causing too many edges to be drawn
?(G) segments long documents into several sub
docs
Solution is too hacked hope to investigate
more in current and future work

d5b
d5a
d5
15
Timestamped Graph Construction

Representations
We can model a number of different algorithms
using this 9-tuple formalism
(d, e, u, f, s, t, i, s, t)
The given toy example
(f, 1, 0, 1, max-cosine-based, sentence, 1, 0,
null)
LexRank graphs
(u, N, 1, 1, cosine-based, sentence, Lmax, 0,
null)
N total number of sentences in the cluster
Lmax the max document length
i.e., all sentences are added into the graph in
one timestep, each connected to all others, and
cosine scores are given to edge weights

16
TSG-based summarization

Methodology
Evaluation
Analysis

17
System Overview

Sentence splitting
Detect and mark sentence boundaries
Annotate each sentence with the doc ID and the
sentence number
E.g., XIE19980304.0061 4 March 1998 from Xinhua
News XIE19980304.0061-14 the 14th sentence of
this document
Graph construction
Construct TSG in this phase

18
System Overview

Sentence Ranking
Apply topic-sensitive random walk on the graph
to redistribute the weights of the nodes
Sentence extraction
Extract the top-ranked sentences
Two different modified MMR re-rankers are used,
depending on whether it is main or update task

19
Evaluation

Dataset DUC 2005, 2006 and 2007.
Evaluation tool ROUGE n-gram based automatic
evaluation
Each dataset contains 50 or 45 clusters, each
cluster contains a query and 25 documents
Evaluate on some parameters
Do different e values affect the summarization
process?
How do topic-sensitivity and edge weighting
perform in running PageRank?
How does skewing the graph affect the information
flow in the graph?

20
Evaluation on number of edges (e)

Tried different e values
Optimal performance e 2
At e 1, graph is too loosely connected, not
suitable for PageRank ? very low performance
At e N, a LexRank system

e 2
e 2
N
N
N
21
Evaluation (other edge parameters)

PageRank generic vs topic-sensitive
Edge weight (u) unweighted vs weighted
Optimal performance topic-sensitive PageRank
and weighted edges

Topic-sensitive Weighted edges ROUGE-1 ROUGE-2
No No 0.39358 0.07690
Yes No 0.39443 0.07838
No Yes 0.39823 0.08072
Yes Yes 0.39845 0.08282
22
Evaluation on skew degree (s)

Different skew degrees s 0, 1 and 2
Optimal performance s 1
s 2 introduces a delay interval that is too
large
Need to try freely skewed graphs

Skew degree ROUGE-1 ROUGE-2
0 0.36982 0.07580
1 0.37268 0.07682
2 0.36998 0.07489
23
Holistic Evaluation in DUC

We participated in DUC 2007 with an
extractive-based TSG system
Main task 12th for ROUGE-2, 10th for ROUGE-SU4
among 32 systems
Update task 3rd for ROUGE-2, 4th for ROUGE-SU4
among 24 systems
Used a modified version of maximal marginal
relevance to penalize links in previously read
articles
Extension of inter-document factor (f)
TSG formalism better tailored to deal with
update / incremental text tasks
New method that may be competitive with current
approaches
Other top scoring systems may do sentence
compression, not just extraction

24
Conclusion

Proposed a timestamped graph model for text
understanding and summarization
Adds sentences one at a time
Parameterized model with nine variables
Canonicalizes representation for several graph
based summarization algorithms
Future Work
Freely skewed model
Empirical and theoretical properties of TSGs
(e.g., in-degree distribution)

25
Backup Slides

25 Minute talk total
26 Apr 2007, 1150-1215

26
Differences for main and update task processing

Main task
Construct a TSG for input cluster
Run topic-sensitive PageRank on the TSG
Apply first modified version of MMR to extract
sentences

Update task
Cluster A
Construct a TSG for cluster A
Run topic-sensitive PageRank on the TSG
Apply the second modified version of MMR to
extract sentences
Cluster B
Construct a TSG for clusters A and B
Run topic-sensitive PageRank on the TSG only
retain sentences from B
Apply the second modified version of MMR to
extract sentences
Cluster C
Construct a TSG for clusters A, B and C
Run topic-sensitive PageRank on the TSG only
retain sentences from C
Apply the second modified version of MMR to
extract sentences

27
Sentence Ranking

Once a timestamped graph is built, we want to
compute an prestige score for each node
PageRank use an iterative method that allows
the weights of the nodes to redistribute until
stability is reached
Similarities as edges ? weighted edges query ?
topic-sensitive

Topic sensitive (Q) portion
Standard random walk term
28
Sentence Extraction Main task

Original MMR integrates a penalty of the
maximal similarity of the candidate document and
one selected document
Ye et al. (2005) introduced a modified MMR
integrates a penalty of the total similarity of
the candidate sentence and all selected sentences
Score(s) PageRank score of s S selected
sentences
This is used in the main task

Penalty All previous sentence similarity
29
Sentence Extraction Update task

Update task assumes readers already read previous
cluster(s)
implies we should not select sentences that have
redundant information with previous cluster(s)
Propose a modified MMR for the update task
consider the total similarity of the candidate
sentence with all selected sentences and
sentences in previously-read cluster(s)
P contains some top-ranked sentences in previous
cluster(s)

Previous cluster overlap
30
References

Günes Erkan and Dragomir R. Radev. 2004.
LexRank Graph-based centrality as salience in
text summari-zation. Journal of Artificial
Intelligence Research, (22).
Rada Mihalcea and Paul Tarau. 2004. TextRank
Bring-ing order into texts. In Proceedings of
EMNLP 2004.
S.N. Dorogovtsev and J.F.F. Mendes. 2001.
Evolution of networks. Submitted to Advances in
Physics on 6th March 2001.
Sergey Brin and Lawrence Page. 1998. The anatomy
of a large-scale hypertextual Web search engine.
Com-puter Networks and ISDN Systems, 30(1-7).
Jon M. Kleinberg. 1999. Authoritative sources in
a hy-perlinked environment. In Proceedings of
ACM-SIAM Symposium on Discrete Algorithms, 1999.
Shiren Ye, Long Qiu, Tat-Seng Chua, and Min-Yen
Kan. 2005. NUS at DUC 2005 Understanding
docu-ments via concepts links. In Proceedings of
DUC 2005.