Title: A Support Vector Method for Optimizing Average Precision
1. A Support Vector Method for Optimizing Average Precision
- SIGIR 2007
- Yisong Yue
- Cornell University
- In collaboration with Thomas Finley, Filip Radlinski, Thorsten Joachims (Cornell University)
2. Motivation
- Learn to rank documents
- Optimize for IR performance measures
  - Mean Average Precision
- Leverage Structural SVMs
  - Tsochantaridis et al. 2005
3. MAP vs Accuracy
- Average Precision is the average of the precision scores at the rank positions of each relevant document.
- Mean Average Precision (MAP) is the mean of the Average Precision scores over a group of queries.
- A machine learning algorithm optimizing for accuracy might learn a very different model than one optimizing for MAP.
- In the slides' example, one ranking has an Average Precision of about 0.64 but a maximum accuracy of 0.8, versus 0.6 accuracy for the other ranking.
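As a concrete illustration (a minimal sketch, not from the slides), Average Precision for a single ranked list can be computed directly from its binary relevance labels:

```python
def average_precision(labels):
    """AP of a ranked list given binary relevance labels (1 = relevant),
    listed in rank order from position 1 downward."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant doc's rank
    return sum(precisions) / len(precisions) if precisions else 0.0

# Two rankings of the same documents: AP rewards placing relevant docs early.
print(average_precision([1, 1, 0, 0, 0]))  # 1.0 (perfect ranking)
print(average_precision([0, 0, 1, 1, 0]))  # (1/3 + 2/4) / 2 ≈ 0.4167
```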
4. Recent Related Work
- Greedy local search
  - Metzler & Croft 2005 optimized for MAP using gradient descent; expensive for large numbers of features.
  - Caruana et al. 2004 iteratively built an ensemble to greedily improve arbitrary performance measures.
- Surrogate performance measures
  - Burges et al. 2005 used neural nets optimizing for cross entropy.
  - Cao et al. 2006 used SVMs optimizing for a modified ROC-Area.
- Relaxations
  - Xu & Li 2007 used boosting with an exponential loss relaxation.
5. Conventional SVMs
- Input examples denoted by x (a high-dimensional point)
- Output targets denoted by y (either 1 or -1)
- SVMs learn a hyperplane w; predictions are sign(w^T x)
- Training involves finding the w that minimizes
    (1/2) ||w||^2 + C * Σ_i ξ_i
- subject to, for all i,
    y_i (w^T x_i) >= 1 - ξ_i,   ξ_i >= 0
- The sum of slacks Σ_i ξ_i upper bounds the accuracy loss
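The claim that the sum of slacks upper bounds the accuracy loss can be checked numerically: at the optimum each slack equals the hinge loss max(0, 1 - y·(w^T x)), which is never smaller than the 0/1 loss. A small sketch with made-up scores:

```python
def hinge_loss(y, score):
    # slack variable value at the optimum: max(0, 1 - y * w^T x)
    return max(0.0, 1.0 - y * score)

def zero_one_loss(y, score):
    # 1 if the prediction sign(w^T x) is wrong, else 0
    return 0.0 if y * score > 0 else 1.0

# Hypothetical (y, w^T x) pairs: correct with margin, wrong side, borderline.
examples = [(1, 2.0), (-1, 0.3), (1, -0.5), (-1, -1.2)]
for y, s in examples:
    assert hinge_loss(y, s) >= zero_one_loss(y, s)
print(sum(hinge_loss(y, s) for y, s in examples))  # upper bounds the error count
```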
7. Adapting to Average Precision
- Let x denote the set of documents for a query
- Let y denote a (weak) ranking (each pairwise label y_ij ∈ {-1, 0, 1})
- Same objective function
- Constraints are defined for each incorrect labeling y' over the set of documents x
- The joint discriminant score for the correct labeling must be at least as large as that of any incorrect labeling plus the performance loss:
    w^T Ψ(x, y) >= w^T Ψ(x, y') + Δ(y, y') - ξ
8. Adapting to Average Precision
- Minimize
    (1/2) ||w||^2 + (C/n) * Σ_q ξ_q
- subject to, for every query q and every incorrect ranking y',
    w^T Ψ(x_q, y_q) >= w^T Ψ(x_q, y') + Δ(y_q, y') - ξ_q
- where
    Ψ(x, y) = (1 / (|P| |N|)) * Σ_{i ∈ P} Σ_{j ∈ N} y_ij (x_i - x_j)
  (P = relevant documents, N = non-relevant documents)
- and
    Δ(y_q, y') = 1 - AP(y')
- The sum of slacks upper bounds the MAP loss.
- After learning w, a prediction is made by sorting documents on w^T x_i
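The joint feature map and loss from this formulation can be sketched in code (a minimal illustration; function and variable names are ours, and the tiny document matrix is made up):

```python
import numpy as np

def joint_feature(X, rel_idx, nonrel_idx, y):
    """Psi(x, y): mean of y_ij * (x_i - x_j) over all (relevant, non-relevant)
    pairs, where y[(i, j)] = +1 if doc i is ranked above doc j, else -1."""
    total = np.zeros(X.shape[1])
    for i in rel_idx:
        for j in nonrel_idx:
            total += y[(i, j)] * (X[i] - X[j])
    return total / (len(rel_idx) * len(nonrel_idx))

def map_loss(ap_of_incorrect_ranking):
    # Delta(y, y') = 1 - AP(y')
    return 1.0 - ap_of_incorrect_ranking

# Tiny example: 3 docs with 2 features; doc 0 relevant, docs 1-2 not.
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
perfect = {(0, 1): 1, (0, 2): 1}  # relevant doc ranked above both others
psi = joint_feature(X, [0], [1, 2], perfect)
print(psi)  # mean of (x0 - x1) and (x0 - x2)
```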
10. Too Many Constraints!
- For Average Precision, the true labeling is a ranking where the relevant documents are all ranked at the front.
- An incorrect labeling would be any other ranking.
- For example, a ranking with Average Precision of about 0.8 has loss Δ(y, y') = 0.2.
- There is an exponential number of rankings, and thus an exponential number of constraints!
11. Structural SVM Training
- STEP 1: Solve the SVM objective function using only the current working set of constraints.
- STEP 2: Using the model learned in STEP 1, find the most violated constraint from the exponential set of constraints.
- STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set by some small constant, add it to the working set.
- Repeat STEPs 1-3 until no additional constraints are added. Return the most recent model trained in STEP 1.
- STEPs 1-3 are guaranteed to loop for at most a polynomial number of iterations (Tsochantaridis et al. 2005).
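The loop can be sketched on a toy problem (entirely made up: a 1-D hard-margin version where each constraint is w·a_k >= 1, the "QP solver" just returns the smallest feasible w, and the oracle returns the most violated point):

```python
def solve_qp(working):
    # STEP 1 stand-in: minimal w satisfying w * a >= 1 for every a in the set.
    return max((1.0 / a for a in working), default=0.0)

def violation(w, a):
    return max(0.0, 1.0 - w * a)

def cutting_plane(points, eps=1e-3):
    working = []
    while True:
        w = solve_qp(working)                        # STEP 1
        worst = min(points, key=lambda a: w * a)     # STEP 2: most violated
        worst_in_set = max((violation(w, a) for a in working), default=0.0)
        if violation(w, worst) > worst_in_set + eps:
            working.append(worst)                    # STEP 3: grow working set
        else:
            return w, working                        # converged

w, working = cutting_plane([0.5, 1.0, 2.0, 3.0])
print(w, working)  # a single constraint suffices to satisfy all of them
```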
12. Illustrative Example
- Original SVM problem
  - Exponential number of constraints
  - Most are dominated by a small set of important constraints
- Structural SVM approach
  - Repeatedly finds the next most violated constraint
  - until the set of constraints is a good approximation
16. Finding Most Violated Constraint
- The Structural SVM is an oracle framework.
- It requires a subroutine to find the most violated constraint.
- That subroutine depends on the formulation of the loss function and the joint feature representation.
- There is an exponential number of constraints!
- An efficient algorithm exists in the case of optimizing MAP.
17. Finding Most Violated Constraint
- Observation
  - MAP is invariant to the order of documents within a relevance class
  - Swapping two relevant (or two non-relevant) documents does not change MAP
  - The joint SVM score is maximized by sorting documents by score, w^T x
- Reduces to finding an interleaving between two sorted lists of documents
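The invariance is easy to check directly (a small sketch; the AP helper and document names are ours): swapping two documents from the same relevance class leaves the relevance pattern, and hence AP, unchanged.

```python
def average_precision(labels):
    hits, precisions = 0, []
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Ranking as (doc_id, relevant) pairs; AP depends only on the relevance pattern.
ranking = [("a", 1), ("b", 0), ("c", 1), ("d", 1), ("e", 0)]
swapped = [("c", 1), ("b", 0), ("a", 1), ("d", 1), ("e", 0)]  # swap relevant a <-> c

ap = lambda r: average_precision([rel for _, rel in r])
print(ap(ranking), ap(swapped))  # identical: swapping within a class changes nothing
```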
18. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
19. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
- Find the best feasible ranking of the non-relevant document
20. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
- Find the best feasible ranking of the non-relevant document
- Repeat for the next non-relevant document
21. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
- Find the best feasible ranking of the non-relevant document
- Repeat for the next non-relevant document
- Never want to swap past the previous non-relevant document
22. Finding Most Violated Constraint
- Start with the perfect ranking
- Consider swapping adjacent relevant/non-relevant documents
- Find the best feasible ranking of the non-relevant document
- Repeat for the next non-relevant document
- Never want to swap past the previous non-relevant document
- Repeat until all non-relevant documents have been considered
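Putting the pieces together: the procedure above finds the ranking maximizing H(y') = Δ(y, y') + w^T Ψ(x, y'). For a handful of documents the same maximizer can be found by brute force over all interleavings of the two score-sorted lists (an illustrative sketch, not the paper's efficient greedy algorithm; the scores are made up):

```python
from itertools import combinations

def average_precision(labels):
    hits, precisions = 0, []
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def most_violated(rel_scores, nonrel_scores):
    """Brute-force argmax of Delta + w^T Psi over all interleavings of the
    score-sorted relevant and non-relevant lists (MAP is invariant within
    each class, so only the interleaving matters)."""
    P, N = len(rel_scores), len(nonrel_scores)
    n = P + N
    best_labels, best_h = None, float("-inf")
    for rel_pos in combinations(range(n), P):
        labels = [1 if k in rel_pos else 0 for k in range(n)]
        # rebuild the interleaved score list (each class stays score-sorted)
        it_rel, it_non = iter(rel_scores), iter(nonrel_scores)
        scores = [next(it_rel) if lab else next(it_non) for lab in labels]
        # w^T Psi: averaged pairwise margins, sign given by relative order
        disc = sum((1 if a < b else -1) * (scores[a] - scores[b])
                   for a in range(n) if labels[a]
                   for b in range(n) if not labels[b]) / (P * N)
        h = (1.0 - average_precision(labels)) + disc
        if h > best_h:
            best_h, best_labels = h, labels
    return best_labels, best_h

# Relevant docs score 2.0 and 1.0; non-relevant score 1.5, 0.5, 0.2.
labels, h = most_violated([2.0, 1.0], [1.5, 0.5, 0.2])
print(labels, round(h, 4))
```

Here the highest-scoring non-relevant document (score 1.5) is worth interleaving above both relevant documents: the gain in Δ outweighs the drop in discriminant score.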
23. Quick Recap
- SVM formulation
  - SVMs optimize a tradeoff between model complexity and MAP loss
  - Exponential number of constraints (one for each incorrect ranking)
  - Structural SVMs find a small subset of important constraints
  - Requires a sub-procedure to find the most violated constraint
- Finding the most violated constraint
  - The loss function is invariant to re-ordering within each relevance class
  - The SVM score imposes an ordering on the relevant documents
  - Reduces to finding an interleaving of two sorted lists
  - The loss function has certain monotonic properties
  - Efficient algorithm
24. Experiments
- Used the TREC 9 and TREC 10 Web Track corpora.
- Features of document/query pairs were computed from the outputs of existing retrieval functions (Indri retrieval functions and TREC submissions).
- The goal is to learn a recombination of these outputs that improves Mean Average Precision.
27. Moving Forward
- The approach also works (in theory) for other measures.
- Some promising results when optimizing for NDCG (with only one level of relevance).
- Currently working on optimizing for NDCG with multiple levels of relevance.
- Preliminary MRR results are not as promising.
28. Conclusions
- A principled approach to optimizing Average Precision (avoids difficult-to-control heuristics).
- Performs at least as well as alternative SVM methods.
- Can be generalized to a large class of rank-based performance measures.
- Software available at http://svmrank.yisongyue.com