A Support Vector Method for Optimizing Average Precision

Transcript and Presenter's Notes

1
A Support Vector Method for Optimizing Average
Precision
  • SIGIR 2007
  • Yisong Yue
  • Cornell University
  • In Collaboration With
  • Thomas Finley, Filip Radlinski, Thorsten Joachims
  • (Cornell University)

2
Motivation
  • Learn to Rank Documents
  • Optimize for IR performance measures
  • Mean Average Precision
  • Leverage Structural SVMs
  • Tsochantaridis et al. 2005

3
MAP vs Accuracy
  • Average precision is the average of the precision
    scores at the rank positions of each relevant
    document.
  • Mean Average Precision (MAP) is the mean of the
    average precision scores over a group of queries.
  • A machine learning algorithm optimizing for
    accuracy might learn a very different model than
    one optimizing for MAP.
  • Example (ranking figures not transcribed): the
    second example ranking has an average precision of
    about 0.64, but a max accuracy of 0.8 vs. 0.6 for
    the first ranking.
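The definitions above can be sketched directly in a few lines (a minimal illustration; the example ranking is made up):

```python
def average_precision(relevance):
    """AP of a ranked list of binary relevance labels (1 = relevant)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """MAP: mean of the AP scores over a group of queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# relevant documents at ranks 1, 4, and 5
print(average_precision([1, 0, 0, 1, 1]))  # (1/1 + 2/4 + 3/5) / 3 = 0.7
```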

4
Recent Related Work
  • Greedy Local Search
  • Metzler & Croft 2005 optimized for MAP using
    gradient descent; expensive for a large number of
    features.
  • Caruana et al. 2004 iteratively built an
    ensemble to greedily improve arbitrary
    performance measures.
  • Surrogate Performance Measures
  • Burges et al. 2005 used neural nets
    optimizing for cross entropy.
  • Cao et al. 2006 used SVMs optimizing for
    modified ROC-Area.
  • Relaxations
  • Xu & Li 2007 used boosting with an exponential
    loss relaxation.

5
Conventional SVMs
  • Input examples denoted by x (high-dimensional
    points)
  • Output targets denoted by y (either 1 or -1)
  • SVMs learn a hyperplane w; predictions are
    sign(wTx)
  • Training involves finding the w which minimizes
    (1/2)||w||^2 + C Σi ξi
  • subject to yi(wTxi) ≥ 1 − ξi and ξi ≥ 0 for all i
  • The sum of slacks Σi ξi upper-bounds the
    accuracy loss
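The slack construction can be illustrated with a tiny numeric sketch (all values are made up; NumPy is used for brevity):

```python
import numpy as np

def slacks(w, X, y):
    """Slack xi_i = max(0, 1 - y_i * (w . x_i)) for each training example."""
    return np.maximum(0.0, 1.0 - y * (X @ w))

# toy data (illustrative only, not from the talk)
w = np.array([1.0, -0.5])
X = np.array([[2.0, 0.0], [0.4, 1.0], [-1.0, 0.2]])
y = np.array([1, -1, -1])

xi = slacks(w, X, y)
zero_one_loss = int(np.sum(np.sign(X @ w) != y))
assert xi.sum() >= zero_one_loss  # sum of slacks upper-bounds the accuracy loss
```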

7
Adapting to Average Precision
  • Let x denote the set of document/query examples
    for a query
  • Let y denote a (weak) ranking (each yij ∈ {−1, 0, 1})
  • Same objective function
  • Constraints are defined for each incorrect
    labeling y' over the set of documents x.
  • Joint discriminant score for the correct labeling
    must be at least as large as for any incorrect
    labeling plus the performance loss.

8
Adapting to Average Precision
  • Minimize (1/2)||w||^2 + C ξ
  • subject to wT Ψ(x, y) ≥ wT Ψ(x, y') + Δ(y', y) − ξ
    for every incorrect ranking y'
  • where Ψ(x, y) = (1 / (R·N)) Σi Σj yij (xi − xj),
    summing over relevant i and non-relevant j
  • and Δ(y', y) = 1 − AP(y')
  • The sum of slacks upper-bounds the MAP loss.
  • After learning w, a prediction is made by sorting
    on wTxi
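The joint feature map Ψ can be sketched as follows (a minimal illustration; the pairwise normalization by the number of relevant/non-relevant pairs follows the formulation above, and all data values are made up):

```python
import numpy as np

def joint_feature(X_rel, X_non, y_pairs):
    """Psi(x, y): normalized sum of feature-vector differences over all
    (relevant, non-relevant) pairs; y_pairs[i, j] = +1 if relevant doc i
    is ranked above non-relevant doc j, -1 otherwise."""
    R, N = X_rel.shape[0], X_non.shape[0]
    psi = np.zeros(X_rel.shape[1])
    for i in range(R):
        for j in range(N):
            psi += y_pairs[i, j] * (X_rel[i] - X_non[j])
    return psi / (R * N)

# with the correct labeling (all +1), Psi is the mean rel/non-rel difference
X_rel = np.array([[1.0, 0.0], [0.5, 0.5]])
X_non = np.array([[0.0, 1.0]])
y_correct = np.ones((2, 1))
psi = joint_feature(X_rel, X_non, y_correct)
```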

10
Too Many Constraints!
  • For Average Precision, the true labeling is a
    ranking where the relevant documents are all
    ranked in the front, e.g.,
  • An incorrect labeling would be any other ranking,
    e.g.,
  • This ranking has an Average Precision of about 0.8,
    with Δ(y', y) ≈ 0.2
  • Exponential number of rankings, thus an
    exponential number of constraints!

11
Structural SVM Training
  • STEP 1 Solve the SVM objective function using
    only the current working set of constraints.
  • STEP 2 Using the model learned in STEP 1, find
    the most violated constraint from the exponential
    set of constraints.
  • STEP 3 If the constraint returned in STEP 2 is
    more violated than the most violated constraint in
    the working set by some small constant ε, add that
    constraint to the working set.
  • Repeat STEP 1-3 until no additional constraints
    are added. Return the most recent model that was
    trained in STEP 1.

STEPS 1-3 are guaranteed to loop for at most a
polynomial number of iterations [Tsochantaridis
et al. 2005].
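STEPs 1-3 can be sketched as a generic loop; `solve_qp` and `most_violated` below are hypothetical stand-ins for the QP solver and the constraint oracle:

```python
def cutting_plane(solve_qp, most_violated, eps=1e-3, max_iter=1000):
    """Grow a working set of constraints until no remaining constraint
    is violated by more than eps beyond the current slack.
    solve_qp(working_set) -> (model, slack)
    most_violated(model)  -> (constraint, violation)"""
    working_set = []
    model, slack = solve_qp(working_set)              # STEP 1
    for _ in range(max_iter):
        constraint, violation = most_violated(model)  # STEP 2
        if violation <= slack + eps:                  # STEP 3: nothing to add
            break
        working_set.append(constraint)
        model, slack = solve_qp(working_set)          # back to STEP 1
    return model

# toy usage: the "QP" just counts constraints, the oracle's violation shrinks
def toy_qp(ws):
    return len(ws), 0.0

def toy_oracle(model):
    return "constraint", 3 - model

print(cutting_plane(toy_qp, toy_oracle))  # 3
```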
12
Illustrative Example
  • Original SVM Problem
  • Exponential constraints
  • Most are dominated by a small set of important
    constraints
  • Structural SVM Approach
  • Repeatedly finds the next most violated
    constraint
  • until set of constraints is a good
    approximation.

16
Finding Most Violated Constraint
  • Structural SVM is an oracle framework.
  • Requires subroutine to find the most violated
    constraint.
  • Dependent on formulation of loss function and
    joint feature representation.
  • Exponential number of constraints!
  • Efficient algorithm in the case of optimizing MAP.

17
Finding Most Violated Constraint
  • Observation
  • MAP is invariant to the order of documents within
    a relevance class
  • Swapping two relevant or non-relevant documents
    does not change MAP.
  • Joint SVM score is optimized by sorting by
    document score, wTx
  • Reduces to finding an interleaving
  • between two sorted lists of documents

18
Finding Most Violated Constraint
  • Start with the perfect ranking
  • Consider swapping adjacent relevant/non-relevant
    documents
  • Find the best feasible ranking of the
    non-relevant document
  • Repeat for the next non-relevant document
  • Never want to swap past the previous non-relevant
    document
  • Repeat until all non-relevant documents have been
    considered

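The greedy procedure above finds this ranking efficiently. As an illustrative cross-check only (not the talk's algorithm), a brute-force sketch can enumerate every interleaving of the two score-sorted lists and maximize Δ(y', y) + wTΨ(x, y') directly; feasible only for tiny lists, with scores standing in for wTx:

```python
from itertools import combinations

def most_violated_ranking(rel_scores, non_scores):
    """Return the interleaving of the score-sorted relevant and non-relevant
    lists maximizing (1 - AP) + w.Psi, by exhaustive enumeration."""
    rel = sorted(rel_scores, reverse=True)
    non = sorted(non_scores, reverse=True)
    R, N = len(rel), len(non)
    best, best_val = None, float("-inf")
    for rel_pos in combinations(range(R + N), R):  # positions of relevant docs
        rel_iter, non_iter = iter(rel), iter(non)
        ranking = [(1, next(rel_iter)) if p in rel_pos else (0, next(non_iter))
                   for p in range(R + N)]
        # Delta = 1 - average precision of this ranking
        hits, ap = 0, 0.0
        for rank, (is_rel, _) in enumerate(ranking, start=1):
            if is_rel:
                hits += 1
                ap += hits / rank
        ap /= R
        # w.Psi: normalized sum over rel/non pairs of y_ij * (score_i - score_j)
        psi = 0.0
        for pi, (flag_i, s_i) in enumerate(ranking):
            if not flag_i:
                continue
            for pj, (flag_j, s_j) in enumerate(ranking):
                if flag_j:
                    continue
                y_ij = 1 if pi < pj else -1
                psi += y_ij * (s_i - s_j) / (R * N)
        val = (1.0 - ap) + psi
        if val > best_val:
            best_val, best = val, ranking
    return best, best_val
```

With a large score margin, the loss gained by a swap cannot pay for the margin lost, so the most violated constraint stays close to the perfect ranking; with near-tied scores, a swapped ranking wins.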
23
Quick Recap
  • SVM Formulation
  • SVMs optimize a tradeoff between model complexity
    and MAP loss
  • Exponential number of constraints (one for each
    incorrect ranking)
  • Structural SVMs find a small subset of important
    constraints
  • Requires sub-procedure to find most violated
    constraint
  • Find Most Violated Constraint
  • Loss function invariant to re-ordering of
    relevant documents
  • SVM score imposes an ordering of the relevant
    documents
  • Reduces to finding an interleaving of two sorted
    lists
  • Loss function has certain monotonic properties
  • Efficient algorithm

24
Experiments
  • Used the TREC 9 & 10 Web Track corpora.
  • Features of document/query pairs computed from
    outputs of existing retrieval functions.
  • (Indri retrieval functions & TREC submissions)
  • Goal is to learn a recombination of outputs which
    improves mean average precision.

25
(No Transcript)
26
(No Transcript)
27
Moving Forward
  • Approach also works (in theory) for other
    measures.
  • Some promising results when optimizing for NDCG
    (with only 1 level of relevance).
  • Currently working on optimizing for NDCG with
    multiple levels of relevance.
  • Preliminary MRR results not as promising.
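For reference, the NDCG measure mentioned above can be sketched as follows, assuming the common rel / log2(rank + 1) gain (other gain functions exist):

```python
import math

def ndcg(relevance):
    """NDCG of a ranked list of graded relevance labels:
    DCG of the ranking divided by DCG of the ideal (sorted) ranking."""
    def dcg(labels):
        return sum(r / math.log2(rank + 1)
                   for rank, r in enumerate(labels, start=1))
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0
```

With only one level of relevance, the labels are simply 0/1, matching the binary setting the slide refers to.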

28
Conclusions
  • Principled approach to optimizing average
    precision.
  • (avoids difficult-to-control heuristics)
  • Performs at least as well as alternative SVM
    methods.
  • Can be generalized to a large class of rank-based
    performance measures.
  • Software available at http://svmrank.yisongyue.com