Title: Timestamped Graphs: Evolutionary Models of Text for Multi-document Summarization
1Timestamped GraphsEvolutionary Models of Text
for Multi-document Summarization
- Ziheng Lin and Min-Yen Kan
- Department of Computer ScienceNational
University of Singapore, Singapore
2Summarization
- Traditionally, heuristics for extractive
summarization - Cue/stigma phrases
- Sentence position (relative to document,
section, paragraph) - Sentence length
- TFIDF, TF scores
- Similarity (with title, context, query)
- With the advent of machine learning, heuristic
weights for different features are tuned by
supervised learning - In last few years, graphical representations of
text have shed new light on the summarization
problem
3Prestige as sentence selection
- One motivation of using graphical methods was to
model the problem as finding prestige of nodes in
a social network - PageRank used random walk to smooth the effect
of non-local context - HITS and SALSA to model hubs and authorities
- In summarization, lead to TextRank and LexRank
- Contrast with previous graphical approaches
(Salton et al. 1994) - Did we leave anything out of our representation
for summarization? - Yes, the notion of an evolving network
4Social networks change!
- Natural evolving networks (Dorogovtsev and
Mendes, 2001) - Citation networks New papers can cite old ones,
but the old network is static - The Web new pages are added with an old page
connecting it to the web graph, old pages may
update links
5Evolutionary models for summarization
- Writers and readers often follow conventional
rhetorical styles - articles are not written or
read in an arbitrary way - Consider the evolution of texts using a very
simplistic model - Writers write from the first sentence onwards in
a text - Readers read from the first sentence onwards of
a text - A simple model sentences get added
incrementally to the graph
6Timestamped Graph Construction
- Approach
- These assumptions suggest us to iteratively add
sentences into the graph in chronological order. - At each iteration, consider which edges to add
to the graph. - For single document simple and straightforward
add 1st sentence, followed by the 2nd, and so
forth, until the last sentence is added - For multi-document treat it as multiple
instances of single documents, which evolve in
parallel i.e., add 1st sentences of all
documents, followed by all 2nd sentences, and so
forth - Doesnt really model chronological ordering
between articles, fix later
7Timestamped Graph Construction
- Model
- Documents as columns
- di document i
- Sentences as rows
- sj jth sentence of document
8Timestamped Graph Construction
doc3
doc2
doc1
sent1
sent2
sent3
9- An example TSG DUC 2007 D0703A-A
10Timestamped Graph Construction
- These are just one instance of TSGs
- Lets generalize and formalize them
- Def A timestamped graph algorithm tsg(M) is a
9-tuple (d, e, u, f,s, t, i, s, t) that
specifies a resulting algorithm that takes as
input the set of texts M and outputs a graph G
Input text transformation function
Properties of nodes
Properties of edges
11Edge properties (d, e, u, f)
- Edge Direction (d)
- Forward, backward, or undirected
-
- Edge Number (e)
- number of edges to instantiate per timestep
-
- Edge Weight (u)
- weighted or unweighted edges
-
- Inter-document factor (f)
- penalty factor for links between documents in
multi-document sets.
12Node properties (s, t, i, s)
- Vertex selection function s(u, G)
- One strategy among those nodes not yet
connected to u in G, choose the one with highest
similarity according to u - Similarity functions Jaccard, cosine, concept
links (Ye et al.. 2005) -
- Text unit type (t)
- Most extractive algorithms use sentences as
elementary units -
- Node increment factor (i)
- How many nodes get added at each timestep
-
- Skew degree (s)
- Models how nodes in multi-document graphs are
added - Skew degree how many iterations to wait before
adding the 1st sentence of the next document - Lets illustrate this
13Skew Degree Examples
- time(d1) lt time(d2) lt time(d3) lt time(d4)
d1 d2 d3 d4
d1 d2 d3 d4
Freely skewed Only add a new document when it
would be linked by some node using vertex
function s
Skewed by 1
Skewed by 2
14Input text transformation function (t)
- Document Segmentation Function (t)
- Problem observed in some clusters where some
documents in a multi-document cluster are very
long - Takes many timestamps to introduce all of the
sentences, causing too many edges to be drawn - ?(G) segments long documents into several sub
docs - Solution is too hacked hope to investigate
more in current and future work
d5b
d5a
d5
15Timestamped Graph Construction
- Representations
- We can model a number of different algorithms
using this 9-tuple formalism - (d, e, u, f, s, t, i, s, t)
- The given toy example
- (f, 1, 0, 1, max-cosine-based, sentence, 1, 0,
null) - LexRank graphs
- (u, N, 1, 1, cosine-based, sentence, Lmax, 0,
null) - N total number of sentences in the cluster
Lmax the max document length - i.e., all sentences are added into the graph in
one timestep, each connected to all others, and
cosine scores are given to edge weights
16TSG-based summarization
- Methodology
- Evaluation
- Analysis
17System Overview
- Sentence splitting
- Detect and mark sentence boundaries
- Annotate each sentence with the doc ID and the
sentence number - E.g., XIE19980304.0061 4 March 1998 from Xinhua
News XIE19980304.0061-14 the 14th sentence of
this document - Graph construction
- Construct TSG in this phase
18System Overview
- Sentence Ranking
- Apply topic-sensitive random walk on the graph
to redistribute the weights of the nodes - Sentence extraction
- Extract the top-ranked sentences
- Two different modified MMR re-rankers are used,
depending on whether it is main or update task
19Evaluation
- Dataset DUC 2005, 2006 and 2007.
- Evaluation tool ROUGE n-gram based automatic
evaluation - Each dataset contains 50 or 45 clusters, each
cluster contains a query and 25 documents - Evaluate on some parameters
- Do different e values affect the summarization
process? - How do topic-sensitivity and edge weighting
perform in running PageRank? - How does skewing the graph affect the information
flow in the graph?
20Evaluation on number of edges (e)
- Tried different e values
- Optimal performance e 2
- At e 1, graph is too loosely connected, not
suitable for PageRank ? very low performance - At e N, a LexRank system
e 2
e 2
N
N
N
21Evaluation (other edge parameters)
- PageRank generic vs topic-sensitive
- Edge weight (u) unweighted vs weighted
- Optimal performance topic-sensitive PageRank
and weighted edges
Topic-sensitive Weighted edges ROUGE-1 ROUGE-2
No No 0.39358 0.07690
Yes No 0.39443 0.07838
No Yes 0.39823 0.08072
Yes Yes 0.39845 0.08282
22Evaluation on skew degree (s)
- Different skew degrees s 0, 1 and 2
- Optimal performance s 1
- s 2 introduces a delay interval that is too
large - Need to try freely skewed graphs
Skew degree ROUGE-1 ROUGE-2
0 0.36982 0.07580
1 0.37268 0.07682
2 0.36998 0.07489
23Holistic Evaluation in DUC
- We participated in DUC 2007 with an
extractive-based TSG system - Main task 12th for ROUGE-2, 10th for ROUGE-SU4
among 32 systems - Update task 3rd for ROUGE-2, 4th for ROUGE-SU4
among 24 systems - Used a modified version of maximal marginal
relevance to penalize links in previously read
articles - Extension of inter-document factor (f)
- TSG formalism better tailored to deal with
update / incremental text tasks - New method that may be competitive with current
approaches - Other top scoring systems may do sentence
compression, not just extraction
24Conclusion
- Proposed a timestamped graph model for text
understanding and summarization - Adds sentences one at a time
- Parameterized model with nine variables
- Canonicalizes representation for several graph
based summarization algorithms - Future Work
- Freely skewed model
- Empirical and theoretical properties of TSGs
(e.g., in-degree distribution)
25Backup Slides
- 25 Minute talk total
- 26 Apr 2007, 1150-1215
26Differences for main and update task processing
- Main task
- Construct a TSG for input cluster
- Run topic-sensitive PageRank on the TSG
- Apply first modified version of MMR to extract
sentences
- Update task
- Cluster A
- Construct a TSG for cluster A
- Run topic-sensitive PageRank on the TSG
- Apply the second modified version of MMR to
extract sentences - Cluster B
- Construct a TSG for clusters A and B
- Run topic-sensitive PageRank on the TSG only
retain sentences from B - Apply the second modified version of MMR to
extract sentences - Cluster C
- Construct a TSG for clusters A, B and C
- Run topic-sensitive PageRank on the TSG only
retain sentences from C - Apply the second modified version of MMR to
extract sentences
27Sentence Ranking
- Once a timestamped graph is built, we want to
compute an prestige score for each node - PageRank use an iterative method that allows
the weights of the nodes to redistribute until
stability is reached - Similarities as edges ? weighted edges query ?
topic-sensitive
Topic sensitive (Q) portion
Standard random walk term
28Sentence Extraction Main task
- Original MMR integrates a penalty of the
maximal similarity of the candidate document and
one selected document - Ye et al. (2005) introduced a modified MMR
integrates a penalty of the total similarity of
the candidate sentence and all selected sentences -
- Score(s) PageRank score of s S selected
sentences - This is used in the main task
Penalty All previous sentence similarity
29Sentence Extraction Update task
- Update task assumes readers already read previous
cluster(s) - implies we should not select sentences that have
redundant information with previous cluster(s) - Propose a modified MMR for the update task
- consider the total similarity of the candidate
sentence with all selected sentences and
sentences in previously-read cluster(s) - P contains some top-ranked sentences in previous
cluster(s)
Previous cluster overlap
30References
- Günes Erkan and Dragomir R. Radev. 2004.
LexRank Graph-based centrality as salience in
text summari-zation. Journal of Artificial
Intelligence Research, (22). - Rada Mihalcea and Paul Tarau. 2004. TextRank
Bring-ing order into texts. In Proceedings of
EMNLP 2004. - S.N. Dorogovtsev and J.F.F. Mendes. 2001.
Evolution of networks. Submitted to Advances in
Physics on 6th March 2001. - Sergey Brin and Lawrence Page. 1998. The anatomy
of a large-scale hypertextual Web search engine.
Com-puter Networks and ISDN Systems, 30(1-7). - Jon M. Kleinberg. 1999. Authoritative sources in
a hy-perlinked environment. In Proceedings of
ACM-SIAM Symposium on Discrete Algorithms, 1999. - Shiren Ye, Long Qiu, Tat-Seng Chua, and Min-Yen
Kan. 2005. NUS at DUC 2005 Understanding
docu-ments via concepts links. In Proceedings of
DUC 2005.