Title: A Support Vector Method for Optimizing Average Precision
1. A Support Vector Method for Optimizing Average Precision
- SIGIR 2007
- Yisong Yue
- Cornell University
- In collaboration with Thomas Finley, Filip Radlinski, Thorsten Joachims (Cornell University)
2. Motivation
- Learn to rank documents
- Optimize for IR performance measures
  - Mean Average Precision
- Leverage Structural SVMs
  - Tsochantaridis et al. 2005
3. MAP vs Accuracy
- Average Precision is the average of the precision scores at the rank positions of each relevant document.
- Mean Average Precision (MAP) is the mean of the Average Precision scores over a group of queries.
- A machine learning algorithm optimizing for accuracy might learn a very different model than one optimizing for MAP.
- In the slides' example, one ranking has an Average Precision of about 0.64 but a maximum accuracy of 0.8, versus 0.6 accuracy for the other ranking.
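As a concrete illustration (a minimal sketch, not from the slides), Average Precision for a single ranked list can be computed directly from its binary relevance labels:

```python
def average_precision(labels):
    """AP of a ranked list given binary relevance labels (1 = relevant),
    listed in rank order from position 1 downward."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant doc's rank
    return sum(precisions) / len(precisions) if precisions else 0.0

# Two rankings of the same documents: AP rewards placing relevant docs early.
print(average_precision([1, 1, 0, 0, 0]))  # 1.0 (perfect ranking)
print(average_precision([0, 0, 1, 1, 0]))  # (1/3 + 2/4) / 2 ≈ 0.4167
```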
4. Recent Related Work
- Greedy local search
  - Metzler & Croft 2005 optimized for MAP using gradient descent; expensive for large numbers of features.
  - Caruana et al. 2004 iteratively built an ensemble to greedily improve arbitrary performance measures.
- Surrogate performance measures
  - Burges et al. 2005 used neural nets optimizing for cross entropy.
  - Cao et al. 2006 used SVMs optimizing for a modified ROC-Area.
- Relaxations
  - Xu & Li 2007 used boosting with an exponential loss relaxation.
5. Conventional SVMs
- Input examples denoted by x (a high-dimensional point)
- Output targets denoted by y (either 1 or -1)
- SVMs learn a hyperplane w; predictions are sign(w^T x)
- Training involves finding the w that minimizes
    (1/2) ||w||^2 + C * Σ_i ξ_i
- subject to, for all i,
    y_i (w^T x_i) >= 1 - ξ_i,   ξ_i >= 0
- The sum of slacks Σ_i ξ_i upper bounds the accuracy loss
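The claim that the sum of slacks upper bounds the accuracy loss can be checked numerically: at the optimum each slack equals the hinge loss max(0, 1 - y·(w^T x)), which is never smaller than the 0/1 loss. A small sketch with made-up scores:

```python
def hinge_loss(y, score):
    # slack variable value at the optimum: max(0, 1 - y * w^T x)
    return max(0.0, 1.0 - y * score)

def zero_one_loss(y, score):
    # 1 if the prediction sign(w^T x) is wrong, else 0
    return 0.0 if y * score > 0 else 1.0

# Hypothetical (y, w^T x) pairs: correct with margin, wrong side, borderline.
examples = [(1, 2.0), (-1, 0.3), (1, -0.5), (-1, -1.2)]
for y, s in examples:
    assert hinge_loss(y, s) >= zero_one_loss(y, s)
print(sum(hinge_loss(y, s) for y, s in examples))  # upper bounds the error count
```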
7. Adapting to Average Precision
- Let x denote the set of documents for a query
- Let y denote a (weak) ranking (each pairwise label y_ij ∈ {-1, 0, 1})
- Same objective function
- Constraints are defined for each incorrect labeling y' over the set of documents x
- The joint discriminant score for the correct labeling must be at least as large as that of any incorrect labeling plus the performance loss:
    w^T Ψ(x, y) >= w^T Ψ(x, y') + Δ(y, y') - ξ
8. Adapting to Average Precision
- Minimize
    (1/2) ||w||^2 + (C/n) * Σ_q ξ_q
- subject to, for every query q and every incorrect ranking y',
    w^T Ψ(x_q, y_q) >= w^T Ψ(x_q, y') + Δ(y_q, y') - ξ_q
- where
    Ψ(x, y) = (1 / (|P| |N|)) * Σ_{i ∈ P} Σ_{j ∈ N} y_ij (x_i - x_j)
  (P = relevant documents, N = non-relevant documents)
- and
    Δ(y_q, y') = 1 - AP(y')
- The sum of slacks upper bounds the MAP loss.
- After learning w, a prediction is made by sorting documents on w^T x_i
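The joint feature map and loss from this formulation can be sketched in code (a minimal illustration; function and variable names are ours, and the tiny document matrix is made up):

```python
import numpy as np

def joint_feature(X, rel_idx, nonrel_idx, y):
    """Psi(x, y): mean of y_ij * (x_i - x_j) over all (relevant, non-relevant)
    pairs, where y[(i, j)] = +1 if doc i is ranked above doc j, else -1."""
    total = np.zeros(X.shape[1])
    for i in rel_idx:
        for j in nonrel_idx:
            total += y[(i, j)] * (X[i] - X[j])
    return total / (len(rel_idx) * len(nonrel_idx))

def map_loss(ap_of_incorrect_ranking):
    # Delta(y, y') = 1 - AP(y')
    return 1.0 - ap_of_incorrect_ranking

# Tiny example: 3 docs with 2 features; doc 0 relevant, docs 1-2 not.
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
perfect = {(0, 1): 1, (0, 2): 1}  # relevant doc ranked above both others
psi = joint_feature(X, [0], [1, 2], perfect)
print(psi)  # mean of (x0 - x1) and (x0 - x2)
```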
10. Too Many Constraints!
- For Average Precision, the true labeling is a ranking where the relevant documents are all ranked at the front.
- An incorrect labeling would be any other ranking.
- For example, a ranking with Average Precision of about 0.8 has loss Δ(y, y') = 0.2.
- There is an exponential number of rankings, and thus an exponential number of constraints!
11. Structural SVM Training
- STEP 1: Solve the SVM objective function using only the current working set of constraints.
- STEP 2: Using the model learned in STEP 1, find the most violated constraint from the exponential set of constraints.
- STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set by some small constant, add it to the working set.
- Repeat STEPs 1-3 until no additional constraints are added. Return the most recent model trained in STEP 1.
- STEPs 1-3 are guaranteed to loop for at most a polynomial number of iterations (Tsochantaridis et al. 2005).
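The loop can be sketched on a toy problem (entirely made up: a 1-D hard-margin version where each constraint is w·a_k >= 1, the "QP solver" just returns the smallest feasible w, and the oracle returns the most violated point):

```python
def solve_qp(working):
    # STEP 1 stand-in: minimal w satisfying w * a >= 1 for every a in the set.
    return max((1.0 / a for a in working), default=0.0)

def violation(w, a):
    return max(0.0, 1.0 - w * a)

def cutting_plane(points, eps=1e-3):
    working = []
    while True:
        w = solve_qp(working)                        # STEP 1
        worst = min(points, key=lambda a: w * a)     # STEP 2: most violated
        worst_in_set = max((violation(w, a) for a in working), default=0.0)
        if violation(w, worst) > worst_in_set + eps:
            working.append(worst)                    # STEP 3: grow working set
        else:
            return w, working                        # converged

w, working = cutting_plane([0.5, 1.0, 2.0, 3.0])
print(w, working)  # a single constraint suffices to satisfy all of them
```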
12. Illustrative Example
- Original SVM problem
  - Exponential number of constraints
  - Most are dominated by a small set of important constraints
- Structural SVM approach
  - Repeatedly finds the next most violated constraint
  - until the set of constraints is a good approximation
16. Finding Most Violated Constraint
- The Structural SVM is an oracle framework.
- It requires a subroutine to find the most violated constraint.
- That subroutine depends on the formulation of the loss function and the joint feature representation.
- There is an exponential number of constraints!
- An efficient algorithm exists in the case of optimizing MAP.
17. Finding Most Violated Constraint
- Observation
  - MAP is invariant to the order of documents within a relevance class
  - Swapping two relevant (or two non-relevant) documents does not change MAP
  - The joint SVM score is maximized by sorting documents by score, w^T x
- Reduces to finding an interleaving between two sorted lists of documents
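The invariance is easy to check directly (a small sketch; the AP helper and document names are ours): swapping two documents from the same relevance class leaves the relevance pattern, and hence AP, unchanged.

```python
def average_precision(labels):
    hits, precisions = 0, []
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Ranking as (doc_id, relevant) pairs; AP depends only on the relevance pattern.
ranking = [("a", 1), ("b", 0), ("c", 1), ("d", 1), ("e", 0)]
swapped = [("c", 1), ("b", 0), ("a", 1), ("d", 1), ("e", 0)]  # swap relevant a <-> c

ap = lambda r: average_precision([rel for _, rel in r])
print(ap(ranking), ap(swapped))  # identical: swapping within a class changes nothing
```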
18. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
19. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
- Find the best feasible ranking of the non-relevant document
20. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
- Find the best feasible ranking of the non-relevant document
- Repeat for the next non-relevant document
21. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
- Find the best feasible ranking of the non-relevant document
- Repeat for the next non-relevant document
- Never want to swap past the previous non-relevant document
22. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
- Find the best feasible ranking of the non-relevant document
- Repeat for the next non-relevant document
- Never want to swap past the previous non-relevant document
- Repeat until all non-relevant documents have been considered
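Putting the pieces together: the procedure above finds the ranking maximizing H(y') = Δ(y, y') + w^T Ψ(x, y'). For a handful of documents the same maximizer can be found by brute force over all interleavings of the two score-sorted lists (an illustrative sketch, not the paper's efficient greedy algorithm; the scores are made up):

```python
from itertools import combinations

def average_precision(labels):
    hits, precisions = 0, []
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def most_violated(rel_scores, nonrel_scores):
    """Brute-force argmax of Delta + w^T Psi over all interleavings of the
    score-sorted relevant and non-relevant lists (MAP is invariant within
    each class, so only the interleaving matters)."""
    P, N = len(rel_scores), len(nonrel_scores)
    n = P + N
    best_labels, best_h = None, float("-inf")
    for rel_pos in combinations(range(n), P):
        labels = [1 if k in rel_pos else 0 for k in range(n)]
        # rebuild the interleaved score list (each class stays score-sorted)
        it_rel, it_non = iter(rel_scores), iter(nonrel_scores)
        scores = [next(it_rel) if lab else next(it_non) for lab in labels]
        # w^T Psi: averaged pairwise margins, sign given by relative order
        disc = sum((1 if a < b else -1) * (scores[a] - scores[b])
                   for a in range(n) if labels[a]
                   for b in range(n) if not labels[b]) / (P * N)
        h = (1.0 - average_precision(labels)) + disc
        if h > best_h:
            best_h, best_labels = h, labels
    return best_labels, best_h

# Relevant docs score 2.0 and 1.0; non-relevant score 1.5, 0.5, 0.2.
labels, h = most_violated([2.0, 1.0], [1.5, 0.5, 0.2])
print(labels, round(h, 4))
```

Here the highest-scoring non-relevant document (score 1.5) is worth interleaving above both relevant documents: the gain in Δ outweighs the drop in discriminant score.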
23. Quick Recap
- SVM formulation
  - SVMs optimize a tradeoff between model complexity and MAP loss
  - Exponential number of constraints (one for each incorrect ranking)
  - Structural SVMs find a small subset of important constraints
  - Requires a sub-procedure to find the most violated constraint
- Finding the most violated constraint
  - The loss function is invariant to re-ordering within each relevance class
  - The SVM score imposes an ordering on the relevant documents
  - Reduces to finding an interleaving of two sorted lists
  - The loss function has certain monotonic properties
  - Efficient algorithm
24. Experiments
- Used the TREC 9 and TREC 10 Web Track corpora.
- Features of document/query pairs were computed from the outputs of existing retrieval functions (Indri retrieval functions and TREC submissions).
- The goal is to learn a recombination of these outputs that improves Mean Average Precision.
27. Moving Forward
- The approach also works (in theory) for other measures.
- Some promising results when optimizing for NDCG (with only one level of relevance).
- Currently working on optimizing for NDCG with multiple levels of relevance.
- Preliminary MRR results are not as promising.
28. Conclusions
- A principled approach to optimizing Average Precision (avoids difficult-to-control heuristics).
- Performs at least as well as alternative SVM methods.
- Can be generalized to a large class of rank-based performance measures.
- Software available at http://svmrank.yisongyue.com