1
Using Random Walks for Question-focused Sentence Retrieval
Jahna Otterbacher, Gunes Erkan, Dragomir Radev
Computational Linguistics and Information Retrieval (CLAIR)
http://tangra.si.umich.edu/clair
2
Milan plane crash
CNN, 4/18/02, 12:22: A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN. The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
MSNBC, 4/18/02, 12:42: A small airplane crashed into a government building in the heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district.
FoxNews, 4/18/02, 12:44: A small plane crashed into the tallest building in downtown Milan Thursday evening, causing smoke to pour out of the top floors of the skyscraper. Officials said the crash appeared to have been an accident and that only the pilot was aboard. There were reports that one person had died.
3
Where was the plane's destination?
1. The plane was destined for Italy's capital Rome.
2. The plane was en route from Locarno in Switzerland, to its destination, Rome, Italy.
3. The pilot was on a 20-minute flight from Locarno, Switzerland to Milan.
4. The plane had taken off from Locarno, Switzerland and was heading to Milan's Linate airport.
Answer extraction
1. Italy's capital Rome
2. Rome, Italy
3. Milan
4. Milan's Linate airport
4
Where was the plane's destination?
D1S4: The plane was destined for Italy's capital Rome.
D3S7: The plane was en route from Locarno in Switzerland, to its destination, Rome, Italy.
D4S4: The pilot was on a 20-minute flight from Locarno, Switzerland to Milan.
D5S5: The aircraft had taken off from Locarno, Switzerland, and was heading to Milan's Linate airport.
5
Where was the plane's destination?
The plane was destined for Italy's capital Rome.
The plane was en route from Locarno in Switzerland, to its destination, Rome, Italy.
The pilot was on a 20-minute flight from Locarno, Switzerland to Milan.
The aircraft had taken off from Locarno, Switzerland, and was heading to Milan's Linate airport.
6
Where was the plane's destination?
The plane was destined for Italy's capital Rome. RELEVANT
The plane was en route from Locarno in Switzerland, to its destination, Rome, Italy. RELEVANT
The pilot was on a 20-minute flight from Locarno, Switzerland to Milan. NOT RELEVANT
The aircraft had taken off from Locarno, Switzerland, and was heading to Milan's Linate airport. NOT RELEVANT
7
Where was the plane's destination?
The plane was destined for Italy's capital Rome. RELEVANT
The plane was en route from Locarno in Switzerland, to its destination, Rome, Italy. RELEVANT
The pilot was on a 20-minute flight from Locarno, Switzerland to Milan. RELEVANT
The aircraft had taken off from Locarno, Switzerland, and was heading to Milan's Linate airport. RELEVANT
8
Graph-based methods
  • Sentence similarity graphs
  • Nodes are sentences
  • Edges between similar sentences
  • Find important sentences in the graph
  • Random walks on the Markov chain
  • View each sentence as a state
  • Normalized adjacency matrix is stochastic
  • Stationary distribution: the long-run probability of ending up in a given state (see the sketch below)
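A minimal sketch of these ideas in Python, assuming IDF-weighted cosine similarity between bag-of-words sentence vectors; the function names, the 0.10 threshold, and the iteration count are illustrative choices, not taken from the slides.

```python
import math
from collections import Counter

def idf_weights(sentences):
    """IDF of each word, treating each sentence as a 'document'."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s.lower().split()))
    return {w: math.log(n / df[w]) for w in df}

def idf_cosine(s1, s2, idf):
    """IDF-weighted cosine similarity between two sentences."""
    t1, t2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(t1[w] * t2[w] * idf[w] ** 2 for w in t1 if w in t2)
    n1 = math.sqrt(sum((c * idf[w]) ** 2 for w, c in t1.items()))
    n2 = math.sqrt(sum((c * idf[w]) ** 2 for w, c in t2.items()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def stationary_distribution(sentences, threshold=0.10, iters=200):
    """Nodes are sentences; edges link sentences whose similarity exceeds
    the threshold. Row-normalizing the adjacency matrix gives a stochastic
    matrix; power iteration then yields the stationary distribution."""
    n = len(sentences)
    idf = idf_weights(sentences)
    sim = [[idf_cosine(a, b, idf) for b in sentences] for a in sentences]
    adj = [[s if s > threshold else 0.0 for s in row] for row in sim]
    B = []
    for row in adj:
        total = sum(row)
        # Rows with no above-threshold neighbor fall back to a uniform jump.
        B.append([a / total for a in row] if total > 0 else [1.0 / n] * n)
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        p = [sum(p[i] * B[i][j] for i in range(n)) for j in range(n)]
    return p  # p[j]: long-run probability of ending up at sentence j
```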

9
Graph-based methods
  • LexRank (Erkan & Radev, '04)
  • Ranking sentences for generic MDS
  • Random walker moves to an adjacent node or takes a random jump
  • Biased LexRank
  • Ranking sentences for question-focused retrieval
  • Random walker moves to an adjacent node or jumps to a node similar to the question (see the sketch below)
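To make the contrast concrete, here is a small sketch, under the assumption that both walks are damped mixtures of the row-normalized adjacency matrix B with a "jump" distribution: uniform for plain LexRank, proportional to question relevance for biased LexRank. Names and signatures are illustrative, not from the slides.

```python
def lexrank_transition(B, d):
    """Plain LexRank walk: with probability d jump to any sentence uniformly
    at random, otherwise follow an edge of the similarity graph (row i of B)."""
    n = len(B)
    return [[d / n + (1 - d) * B[i][j] for j in range(n)] for i in range(n)]

def biased_lexrank_transition(B, rel, d):
    """Biased LexRank walk: with probability d jump to a sentence in
    proportion to its relevance to the question (rel[j] > 0 for at least
    one sentence), otherwise follow an edge of the similarity graph."""
    n = len(B)
    total = sum(rel)
    return [[d * rel[j] / total + (1 - d) * B[i][j] for j in range(n)]
            for i in range(n)]
```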

10
LexRank
(Figure: sentence similarity graph built from the square adjacency matrix with similarity threshold 0.10; nodes S1-S8, edges labeled with similarity weights between 0.12 and 0.30.)
12
LexRank
  • PageRank (Brin & Page, '98)
  • B: normalized adjacency matrix
  • U: uniform square matrix
  • d: probability that the random walker jumps to any state; determined empirically
  • p: obtained by the power method (equation reconstructed below)
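The formula itself does not survive in this transcript; reconstructed from the definitions above (and from Erkan & Radev, '04), the stationary distribution p satisfies

  p = [ d U + (1 - d) B ]^T p

where U is the square matrix with all entries equal to 1/N. The power method iterates p ← [ d U + (1 - d) B ]^T p from the uniform vector until convergence.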

13
Biased LexRank
(Figure: the same sentence similarity graph, nodes S1-S8, with an added question node Q linked by weighted edges to sentences similar to the question.)
14
Biased LexRank
  • Relevance to the question (Allan et al., 2003); the formula is reconstructed below
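The slide's formula is not preserved in the transcript; the relevance measure described in the accompanying paper, following Allan et al. (2003), has the form

  rel(s|q) = \sum_{w \in q} \log(tf_{w,s} + 1) \cdot \log(tf_{w,q} + 1) \cdot idf_w

where tf_{w,s} and tf_{w,q} are the frequencies of word w in sentence s and in question q, and idf_w is the word's inverse document frequency.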

15
Biased LexRank
  • Mixture model (reconstructed below)
  • C: all sentences in the cluster
  • sim(s,v): IDF-weighted cosine similarity
  • d: question bias
  • A: for a given i, all elements in the ith column are proportional to rel(i|q)
  • B: each B(i,j) is proportional to sim(i,j)
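Reconstructed from these definitions (the formula image is not in the transcript), the biased LexRank score of sentence s given question q is the fixed point of a mixture of question relevance and similarity-weighted votes from neighboring sentences:

  p(s|q) = d \cdot \frac{rel(s|q)}{\sum_{z \in C} rel(z|q)} + (1 - d) \cdot \sum_{v \in C} \frac{sim(s,v)}{\sum_{z \in C} sim(z,v)} \, p(v|q)

The first term is the question-biased jump (matrix A above); the second follows the similarity graph (matrix B).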

16
Biased LexRank Experiments
  • Baseline model
  • Considers similarity to the question only
  • 3 phases
  • Training: determined which ranges of d and a to use for our task
  • Development/testing: re-evaluated all combinations that outperformed the baseline
  • Testing: evaluated the best model

17
Corpus
  • 20 clusters (1136)
  • Breaking news; various sources
  • 341 total question and answer sets
  • Question annotation
  • Read all documents
  • Generate a list of key questions important to understanding the story
  • Factual: each concerns a single fact

18
Corpus
  • Sentence relevance judgments
  • Read documents in chronological order
  • Does sentence X contain the answer to question Y?
  • Interjudge agreement
  • 2 judges annotated the first 9 news stories
  • Kappa (Carletta, '96): 0.68 (formula below)
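For reference, the kappa statistic corrects observed agreement for agreement expected by chance:

  \kappa = \frac{P(A) - P(E)}{1 - P(E)}

where P(A) is the observed proportion of agreement between the two judges and P(E) is the proportion expected by chance.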

19
Examples
  • Milan plane crash: 9 articles from 5 sources
  • How many people were in the building?
  • How many people were injured?
  • DC sniper: 8 articles from 6 sources
  • What kinds of weapons were used in the killings?
  • How many people were attacked?

20
Evaluation Method
  • System extract: the top-ranking 20 sentences
  • Multiple answers may exist for a given question; further processing may be performed
  • Compare to human extracts - all sentences chosen
    by at least one judge
  • Total Reciprocal Document Rank (TRDR; Radev et al., '00)
  • Total of the reciprocal ranks of all relevant sentences found by the system
  • Mean Reciprocal Rank (MRR; Voorhees & Tice, '00) is also reported
  • Average TRDR and MRR over all questions in a given data set (sketch below)
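A minimal sketch of the two per-question scores, assuming ranked is the system's ranked sentence extract and relevant is the set of sentences chosen by at least one judge; the names are illustrative.

```python
def reciprocal_ranks(ranked, relevant):
    """1/rank for every relevant sentence the system found (ranks are 1-based)."""
    return [1.0 / r for r, s in enumerate(ranked, start=1) if s in relevant]

def trdr(ranked, relevant):
    """Total Reciprocal Document Rank: sum of the reciprocal ranks of all
    relevant sentences found by the system."""
    return sum(reciprocal_ranks(ranked, relevant))

def rr(ranked, relevant):
    """Reciprocal rank of the first relevant sentence (0 if none is found);
    averaging this over questions gives MRR."""
    ranks = reciprocal_ranks(ranked, relevant)
    return ranks[0] if ranks else 0.0

# Both scores are averaged over all questions in a data set, e.g.:
# mean_trdr = sum(trdr(r, g) for r, g in zip(extracts, gold)) / len(gold)
```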

21
Training Phase
  • Notation: LRa,d
  • a is the similarity threshold; d is the question bias
  • Effect of a and d on TRDR
  • High question bias is needed (d ≥ 0.90)
  • Similarity threshold between 0.14 and 0.20
  • If a is too high, LR does not find sentences that are lexically diverse

22
Dev/Test Phase
  • Re-evaluate the top 4 configurations

23
Test Phase
24
Conclusions
  • Considering inter-sentence similarity can improve retrieval
  • If the baseline finds seed sentences, LexRank can be used to propagate relevance
  • Acts as a tie-breaker in cases where many (or few) sentences are similar to the input query
  • Additional applications
  • Retrieval of sentences describing protein-to-protein interactions in biomedical articles

25
Future Work
  • Experiment with mixed LR strategies
  • Given features of the input question or sentences
  • We motivated the problem of sentence retrieval from dynamic news stories
  • The LexRank method does not specifically address the time-dependency issue

26
Thank You