Using Random Walks for Question-focused Sentence Retrieval


Provided by: jahnaott


Transcript and Presenter's Notes


1
Using Random Walks for Question-focused Sentence Retrieval
Jahna Otterbacher, Gunes Erkan, Dragomir Radev
Computational Linguistics and Information Retrieval (CLAIR)
http://tangra.si.umich.edu/clair
2
Milan plane crash
[CNN, 4/18/02, 12:22] A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN. The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (14:50 GMT) on Thursday, said journalist Desideria Cavina.
[MSNBC, 4/18/02, 12:42] A small airplane crashed into a government building in the heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district.
[FoxNews, 4/18/02, 12:44] A small plane crashed into the tallest building in downtown Milan Thursday evening, causing smoke to pour out of the top floors of the skyscraper. Officials said the crash appeared to have been an accident and that only the pilot was aboard. There were reports that one person had died.
3
Where was the plane's destination?
1 The plane was destined for Italy's capital Rome.
2 The plane was in route from Locarno in Switzerland, to its destination, Rome, Italy.
3 The pilot was on a 20-minute flight from Locarno, Switzerland to Milan.
4 The plane had taken off from Locarno, Switzerland and was heading to Milan's Linate airport.
Answer extraction
1 Italy's capital Rome
2 Rome, Italy
3 Milan
4 Milan's Linate airport
4
Where was the plane's destination?
D1S4 The plane was destined for Italy's capital Rome.
D3S7 The plane was in route from Locarno in Switzerland, to its destination, Rome, Italy.
D4S4 The pilot was on a 20-minute flight from Locarno, Switzerland to Milan.
D5S5 The aircraft had taken off from Locarno, Switzerland, and was heading to Milan's Linate airport.
5
Where was the plane's destination?
The plane was destined for Italy's capital Rome.
The plane was in route from Locarno in Switzerland, to its destination, Rome, Italy.
The pilot was on a 20-minute flight from Locarno, Switzerland to Milan.
The aircraft had taken off from Locarno, Switzerland, and was heading to Milan's Linate airport.
6
Where was the plane's destination?
The plane was destined for Italy's capital Rome. RELEVANT
The plane was in route from Locarno in Switzerland, to its destination, Rome, Italy. RELEVANT
The pilot was on a 20-minute flight from Locarno, Switzerland to Milan. NOT RELEVANT
The aircraft had taken off from Locarno, Switzerland, and was heading to Milan's Linate airport. NOT RELEVANT
7
Where was the plane's destination?
The plane was destined for Italy's capital Rome. RELEVANT
The plane was in route from Locarno in Switzerland, to its destination, Rome, Italy. RELEVANT
The pilot was on a 20-minute flight from Locarno, Switzerland to Milan. RELEVANT
The aircraft had taken off from Locarno, Switzerland, and was heading to Milan's Linate airport. RELEVANT
8
Graph-based methods

Sentence similarity graphs
Nodes are sentences
Edges between similar sentences
Find important sentences in graph
Random walks on the Markov chain
View each sentence as state
Normalized adjacency matrix is stochastic
Stationary distribution: long-run probability of ending up in a given state
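
The random-walk view on this slide can be sketched in a few lines. This is a minimal illustration, not the presenters' code: the three-sentence similarity matrix is made up, and the helper names `row_normalize` and `stationary` are hypothetical. It row-normalizes an adjacency matrix so that every row sums to 1 (a stochastic matrix), then repeats the walk until the state probabilities stop changing.

```python
def row_normalize(adj):
    """Divide each row by its sum so the matrix becomes row-stochastic."""
    return [[w / sum(row) for w in row] for row in adj]


def stationary(P, tol=1e-10, max_iter=1000):
    """Approximate the stationary distribution of P by repeated walking."""
    n = len(P)
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the walk: new_p[j] = sum_i p[i] * P[i][j]
        new_p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Made-up symmetric similarities between three sentences (self-similarity 1.0).
adj = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
P = row_normalize(adj)
pi = stationary(P)
```

Because the made-up matrix is symmetric, the long-run probabilities come out proportional to each sentence's total similarity (its row sum), so the middle sentence, which is similar to both others, ranks highest.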

9
Graph-based methods

LexRank [Erkan & Radev, 04]
Ranking sentences for generic multi-document summarization (MDS)
Random walker moves to adjacent node or takes a
random jump
Biased LexRank
Ranking sentences for question-focused retrieval
Random walker moves to adjacent node or jumps to
a node similar to the question

10
LexRank
Square adjacency matrix
threshold 0.10
[Figure: sentence similarity graph with nodes S1-S8 and edge weights between 0.12 and 0.30]
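
One way to build such a thresholded square adjacency matrix is sketched below. It is an illustration under stated assumptions, not the presenters' implementation: the idf formula `log(N / df)`, the toy sentences, and the function names `idf_table`, `idf_cosine`, and `similarity_graph` are all hypothetical. Only the use of an IDF-weighted cosine and the 0.10 threshold come from the slides.

```python
import math
from collections import Counter


def idf_table(sentences):
    """Assumed idf: log(N / df(w)) over the toy sentence collection."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    return {w: math.log(n / df[w]) for w in df}


def idf_cosine(x, y, idf):
    """Cosine similarity between two token lists using tf * idf weights."""
    tx, ty = Counter(x), Counter(y)
    dot = sum(tx[w] * ty[w] * idf[w] ** 2 for w in tx if w in ty)
    nx = math.sqrt(sum((tx[w] * idf[w]) ** 2 for w in tx))
    ny = math.sqrt(sum((ty[w] * idf[w]) ** 2 for w in ty))
    return dot / (nx * ny) if nx and ny else 0.0


def similarity_graph(sentences, threshold=0.10):
    """Square matrix that keeps only similarities above the threshold."""
    idf = idf_table(sentences)
    n = len(sentences)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sim = idf_cosine(sentences[i], sentences[j], idf)
            adj[i][j] = sim if sim > threshold else 0.0
    return adj


# Toy sentences, tokenized by whitespace (illustrative only).
sentences = [
    "the plane was heading to rome".split(),
    "the plane was heading to milan".split(),
    "rescue workers cleared the area".split(),
]
adj = similarity_graph(sentences)
```

In this toy input the first two sentences share four informative words and get an edge of about 0.35, while the third shares only "the" (idf 0 here) and stays disconnected from them.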
11
LexRank
Square adjacency matrix
threshold 0.10
[Figure: sentence similarity graph over S1-S8, with the same edge weights as on slide 10]
12
LexRank

PageRank [Brin & Page, 98]
B: normalized adjacency matrix
U: uniform matrix
d: probability that the random walker jumps to any state; determined empirically
p: obtained by the power method
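
The slide's equation did not survive in the transcript. The sketch below assumes the usual PageRank-style update that these definitions suggest, p = [dU + (1 - d)B]^T p, and solves it with the power method. The function name `lexrank`, the default d = 0.15, and the three-node matrix are illustrative assumptions.

```python
def lexrank(B, d=0.15, tol=1e-10, max_iter=1000):
    """Power method for p = [d*U + (1 - d)*B]^T p, where U has entries 1/n.

    B must be row-stochastic. d = 0.15 is a placeholder value; the slide
    says only that d is determined empirically.
    """
    n = len(B)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        # Jump to any state with probability d, else follow an edge of B.
        new_p = [
            d / n + (1 - d) * sum(p[i] * B[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Made-up chain graph 0 - 1 - 2, already row-normalized.
B = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
ranks = lexrank(B)
```

The middle node receives the highest score because both neighbors send it all of their probability mass. The random jump also makes the walk converge even though the undamped chain is periodic.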

13
Biased LexRank
[Figure: sentence similarity graph with nodes S1-S8 plus a question node Q; edge weights between 0.12 and 0.30]
14
Biased LexRank

Relevance to the question
[Allan et al., 2003]
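
The relevance formula itself is missing from the transcript. The sketch below assumes a tf-idf-style score of the kind often used for this purpose: a sum over question words w of log(tf(w, s) + 1) * log(tf(w, q) + 1) * idf(w). Treat that form, the function name `relevance`, and the idf values as assumptions for illustration rather than the slide's exact definition.

```python
import math
from collections import Counter


def relevance(sentence, question, idf):
    """Assumed form: sum over question words w of
    log(tf(w, sentence) + 1) * log(tf(w, question) + 1) * idf(w)."""
    tf_s, tf_q = Counter(sentence), Counter(question)
    return sum(
        math.log(tf_s[w] + 1) * math.log(tf_q[w] + 1) * idf.get(w, 0.0)
        for w in tf_q
    )


# Hypothetical idf values, for illustration only.
idf = {"plane": 0.5, "destination": 2.0, "rome": 1.5}
question = "where was the plane destination".split()
s1 = "the plane reached its destination rome".split()
s2 = "rescue workers cleared the area".split()

rel_s1 = relevance(s1, question, idf)
rel_s2 = relevance(s2, question, idf)
```

Here `s1` scores above zero because it shares "plane" and "destination" with the question, while `s2` shares only words that have no idf weight in this toy table and scores zero.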

15
Biased LexRank

Mixture model
C: all sentences in the cluster
sim(s,v): IDF-weighted cosine similarity
d: question bias
A: for a given i, all elements in the ith column are proportional to rel(i|q)
B: each B(i,j) is proportional to sim(i,j)
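
The mixture described on this slide can be sketched as follows. This is a minimal reading of the definitions, assuming the matrix form p = [dA + (1 - d)B]^T p with each column of A normalized by the total relevance and each row of B normalized by its sum. The function name `biased_lexrank` and all numbers are made up for illustration.

```python
def biased_lexrank(rel, sim, d=0.9, tol=1e-10, max_iter=1000):
    """Power method for the mixture of question relevance and similarity.

    rel[i] is rel(i|q); sim[i][j] is sim(i,j). Assumes sum(rel) > 0 and
    that every row of sim has a positive sum (true with self-similarity).
    """
    n = len(rel)
    total_rel = sum(rel)
    # Matrix A: every entry of column i equals rel[i] / total_rel, so its
    # contribution to sentence i is the same whatever the current state is.
    bias = [r / total_rel for r in rel]
    # Matrix B: row-normalized similarities.
    B = [[w / sum(row) for w in row] for row in sim]
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new_p = [
            d * bias[j] + (1 - d) * sum(p[i] * B[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Made-up inputs: only sentence 0 matches the question directly;
# sentence 1 is similar to sentence 0; sentence 2 is unrelated to both.
rel = [1.2, 0.0, 0.0]
sim = [
    [1.0, 0.4, 0.0],
    [0.4, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
scores = biased_lexrank(rel, sim)
```

Sentence 1 has no direct relevance to the question but still outranks sentence 2, because some of sentence 0's score reaches it through the similarity edge.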

16
Biased LexRank Experiments

Baseline model
Consider similarity to question only
3 phases
Training: determined which ranges of d and a to use for our task
Development/testing: re-evaluated all combinations that outperformed the baseline
Testing: evaluated the best model

17
Corpus

20 clusters (1136)
Breaking news; various sources
341 total question and answer sets
Question annotation
Read all documents
Generate a list of key questions important to
understanding the story
Factual: concern a single fact

18
Corpus

Sentence relevance judgments
Read documents in chronological order
Does sentence X contain the answer to question Y?
Interjudge agreement
2 judges annotated the first 9 news stories
Kappa [Carletta, 96] = 0.68
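
For readers unfamiliar with kappa, the sketch below computes Cohen's kappa for two judges making RELEVANT / NOT RELEVANT decisions. The slide does not say which kappa variant was used, so Cohen's form is an assumption. The judgments are invented and do not reproduce the reported 0.68.

```python
def cohen_kappa(labels_a, labels_b):
    """Agreement between two judges, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Chance agreement from each judge's own label frequencies.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Invented judgments for ten sentence/question pairs (1 = relevant).
judge_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
judge_b = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohen_kappa(judge_a, judge_b)
```

In this invented example the judges agree on 8 of 10 items, chance agreement is 0.52, and kappa is therefore (0.80 - 0.52) / (1 - 0.52), roughly 0.58.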

19
Examples

Milan plane crash: 9 articles from 5 sources
How many people were in the building?
How many people were injured?
DC sniper: 8 articles from 6 sources
What kinds of weapons were used in the killings?
How many people were attacked?

20
Evaluation Method

System extract: the 20 top-ranking sentences
Multiple answers for a given question; may perform further processing
Compare to human extracts: all sentences chosen by at least one judge
Total Reciprocal Document Rank (TRDR) [Radev et al., 00]
Total of the reciprocal ranks of all relevant sentences found by the system
Mean Reciprocal Rank (MRR) [Voorhees & Tice, 00] also reported
Average TRDR and MRR over all questions in a
given data set
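
The two measures can be sketched directly from their definitions on this slide. The function names, the cutoff parameter `k`, and the rankings are assumptions for illustration. The sentence IDs follow the DxSy pattern used earlier in the deck, but the rankings themselves are invented.

```python
def trdr(ranked, relevant, k=20):
    """Sum of 1/rank over all relevant sentences in the top k."""
    return sum(
        1.0 / rank
        for rank, sent in enumerate(ranked[:k], start=1)
        if sent in relevant
    )


def reciprocal_rank(ranked, relevant, k=20):
    """1/rank of the first relevant sentence in the top k, or 0 if none."""
    for rank, sent in enumerate(ranked[:k], start=1):
        if sent in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    return sum(values) / len(values)


# Invented system rankings and human judgments for two questions.
runs = [
    (["D3S7", "D2S1", "D1S4", "D5S5"], {"D3S7", "D1S4"}),
    (["D2S1", "D1S4"], {"D1S4"}),
]
avg_trdr = mean([trdr(ranked, rel) for ranked, rel in runs])
mrr = mean([reciprocal_rank(ranked, rel) for ranked, rel in runs])
```

For the first question, relevant sentences sit at ranks 1 and 3, so TRDR is 1 + 1/3 while the reciprocal rank is 1. TRDR rewards every relevant sentence retrieved, whereas MRR looks only at the first.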

21
Training Phase

Notation: LR[a,d]
a is the similarity threshold; d is the question bias
Effect of a and d on TRDR
High question bias is needed (d ≥ 0.90)
Similarity threshold between 0.14 and 0.20
If a is too high, LR does not find sentences that
are lexically diverse

22
Dev/Test Phase

Re-evaluate the top 4 configurations

23
Test Phase
24
Conclusions

Considering inter-sentence similarity can improve
retrieval
If baseline finds seed sentences, LexRank can be
used to propagate relevance
Tie breaker in cases where many or few
sentences are similar to the input query
Additional applications
Retrieval of sentences describing
protein-to-protein interactions in biomedical
articles

25
Future Work