Transcript and Presenter's Notes

Title: Optimizing F-Measure with Support Vector Machines


1
Optimizing F-Measure with Support Vector Machines
David R. Musicant, Vipin Kumar, Aysel Ozgur
FLAIRS 2003, Tuesday, May 13, 2003
Carleton College
2
Overview
  • Classification algorithms often evaluated by test
    set accuracy
  • Test set accuracy can be a poor measure when one
    of the classes is rare
  • Support Vector Machines (SVMs) are designed to
    optimize test set accuracy
  • SVMs have been used in an ad-hoc manner on
    datasets with rare classes
  • Our new results: current ad-hoc heuristic
    techniques can be theoretically justified.

3
Roadmap
  • The Traditional SVM and variants
  • Precision, Recall, and F-measure metrics
  • The F-measure Maximizing SVM
  • Equivalence of traditional SVM and F-measure SVM
    (for the right parameters)
  • Implications and Conclusions

4
The Classification Problem
  [Figure: points from the two classes A+ and A-, the
  separating surface, and the margin between the
  bounding planes]

5
The Classification Problem
  • Given m points in the n-dimensional space R^n
  • Each point is represented as x_i
  • Membership of each point x_i in the classes A+ or
    A- is specified by y_i = ±1
  • Separate by two bounding planes such that
    w·x_i + b ≥ +1 when y_i = +1, and
    w·x_i + b ≤ -1 when y_i = -1
  • More succinctly: y_i(w·x_i + b) ≥ 1 for
    i = 1, 2, ..., m (see the sketch below)
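
A minimal numeric sketch of these constraints, assuming Python/NumPy; the data, w, and b below are made up for illustration, not taken from the slides:

```python
import numpy as np

# Toy data: rows of X are the points x_i in R^n, y holds the labels y_i = +/-1.
X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])   # illustrative, not fitted
b = -1.0

# A point satisfies the bounding-plane constraints when y_i (w . x_i + b) >= 1.
margins = y * (X @ w + b)
print(margins)                 # [4. 6. 4. 4.]
print(np.all(margins >= 1))    # True: every point is on the correct side
```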

6
Misclassification Count SVM
  • Minimize C Σ_i (1 - y_i(w·x_i + b))_* + ½ w'w,
    where (t)_* is the step function (1 if t > 0, 0
    otherwise)
  • Push the planes apart, and minimize the number of
    misclassified points.
  • C balances the two competing objectives
  • Minimizing w'w pushes the planes apart
  • Problem is NP-complete and the objective is
    non-differentiable (see the sketch below)
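
A sketch of this non-differentiable misclassification-count objective, assuming the formulation written above; the toy data and parameters are illustrative:

```python
import numpy as np

def misclassification_count_objective(w, b, X, y, C):
    """C * (number of points violating y_i(w.x_i + b) >= 1) + 0.5 * w'w."""
    t = 1.0 - y * (X @ w + b)          # positive entries are violations
    step = (t > 0).astype(float)       # the step function (.)_*
    return C * step.sum() + 0.5 * w @ w

X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [0.5, 0.5]])
y = np.array([1, 1, -1, -1])
print(misclassification_count_objective(np.array([1.0, 1.0]), -1.0, X, y, C=10.0))
# 11.0 for this toy data: one misclassified point plus the margin term
```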

7
Approx Misclassification Count SVM
  • Same objective, but with the step function replaced
    by some differentiable approximation, such as the
    sigmoid s(t) = 1 / (1 + e^(-αt))
  • α > 0 is an arbitrary fixed constant that
    determines the closeness of the approximation.
  • This is still difficult to solve (see the sketch
    below).
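
A sketch of smoothing the step function with a sigmoid, assuming the 1 / (1 + e^(-αt)) form above; α and the sample values are illustrative:

```python
import numpy as np

def step(t):
    return (t > 0).astype(float)

def sigmoid_step(t, alpha=5.0):
    # Differentiable surrogate for (.)_*; larger alpha => closer to the step.
    return 1.0 / (1.0 + np.exp(-alpha * t))

t = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
print(step(t))                      # [0. 0. 0. 1. 1.]
print(np.round(sigmoid_step(t), 3)) # approaches the step values as alpha grows
```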

8
Standard Soft Margin SVM
  • Push the planes apart, and minimize the distance of
    misclassified points:
    minimize C Σ_i ξ_i + ½ w'w
    subject to y_i(w·x_i + b) ≥ 1 - ξ_i, ξ_i ≥ 0
  • We minimize the total distance from misclassified
    points to the bounding planes, not the actual
    number of them.
  • Much more tractable; does quite well at optimizing
    accuracy
  • Does poorly when one class is rare (see the sketch
    below)
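
A sketch of the soft-margin objective in its unconstrained hinge form, which is equivalent to the constrained form above with ξ_i = max(0, 1 - y_i(w·x_i + b)); data and parameters are again made up:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    # Hinge slacks: how far each margin-violating point sits from its
    # bounding plane, rather than a 0/1 count.
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return C * xi.sum() + 0.5 * w @ w

X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [0.5, 0.5]])
y = np.array([1, 1, -1, -1])
print(soft_margin_objective(np.array([1.0, 1.0]), -1.0, X, y, C=10.0))  # 11.0 here
```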

9
Weighted Standard SVM
  • Push the planes apart, and minimize the weighted
    distance of misclassified points: use C+ for errors
    on class A+ and C- for errors on class A-.
  • Allows one to choose different C values for the
    two classes.
  • Often used to weight the rare class more heavily
    (see the sketch below).
  • How do we measure success when one class is rare?
    Assume from here on that A+ is the rare class.
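
One common way to realize this weighted SVM in practice is scikit-learn's per-class weighting of C. A sketch, with made-up imbalanced data and an illustrative weight for the rare +1 class (standing in for A+):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Imbalanced toy data: 10 rare positives (A+), 200 negatives (A-).
X = np.vstack([rng.normal(loc=2.0, size=(10, 2)),
               rng.normal(loc=0.0, size=(200, 2))])
y = np.array([1] * 10 + [-1] * 200)

# class_weight multiplies C per class: effectively C+ = 20*C, C- = C.
clf = SVC(kernel="linear", C=1.0, class_weight={1: 20.0, -1: 1.0})
clf.fit(X, y)
print(clf.predict(X[:5]))  # predictions on the first few (rare-class) points
```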

10
Measures of success
  • Precision and Recall are better descriptors when
    one class is rare.
  • Precision P = TP / (TP + FP): the fraction of
    points predicted as A+ that really are A+.
  • Recall R = TP / (TP + FN): the fraction of actual
    A+ points that are predicted as A+.

11
F-measure
  • F-measure: the commonly used harmonic mean of
    precision and recall, F = 2PR / (P + R)
  • Can C+ and C- in the weighted SVM be balanced to
    optimize F-measure? (see the sketch below)
  • Can we start over and invent an SVM to optimize
    F-measure?
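
A quick sketch of computing precision, recall, and F-measure from predictions; the labels are illustrative, with the +1 class playing the role of A+:

```python
import numpy as np

def precision_recall_f(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)        # harmonic mean of precision and recall
    return p, r, f

y_true = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])
y_pred = np.array([1, 1, -1, 1, -1, -1, -1, -1, -1, -1])
print(precision_recall_f(y_true, y_pred))  # (0.666..., 0.666..., 0.666...)
```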

12
Constructing an F-measure SVM
  • How do we appropriately represent F-measure in an
    SVM?
  • Substitute P and R into F:
    F = 2PR / (P + R) = 2TP / (2TP + FP + FN)
  • Thus, to maximize F-measure, we minimize
    (FP + FN) / TP; since TP = |A+| - FN, this is
    (FP + FN) / (|A+| - FN).

13
Constructing an F-measure SVM
  • Want to minimize (FP + FN) / (|A+| - FN)
  • FP = number of misclassified A- points;
    FN = number of misclassified A+ points
  • New F-measure maximizing SVM: express FP and FN as
    step-function counts, as in the misclassification
    count SVM

14
The F-measure Maximizing SVM
  • Approximate the step-function counts with a
    sigmoid, as before (see the sketch below)
  • Can we connect this with the standard SVM?
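
A sketch of the smoothed F-measure objective suggested by the last few slides: FP and FN are replaced by sigmoid-smoothed counts and the ratio (FP + FN) / (|A+| - FN) is evaluated for a candidate (w, b). The exact formulation in the paper may differ; this only illustrates the idea.

```python
import numpy as np

def sigmoid(t, alpha=5.0):
    return 1.0 / (1.0 + np.exp(-alpha * t))

def smoothed_f_objective(w, b, X, y, alpha=5.0):
    """Smoothed version of (FP + FN) / (|A+| - FN); smaller is better."""
    margins = y * (X @ w + b)
    viol = sigmoid(1.0 - margins, alpha)   # soft indicator of a violated constraint
    fn = viol[y == 1].sum()                # soft count of misclassified A+ points
    fp = viol[y == -1].sum()               # soft count of misclassified A- points
    n_pos = np.sum(y == 1)                 # |A+|
    return (fp + fn) / (n_pos - fn)

X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [0.5, 0.5]])
y = np.array([1, 1, -1, -1])
print(smoothed_f_objective(np.array([1.0, 1.0]), -1.0, X, y))  # ~0.5 here
```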

15
Weighted misclassification count SVM vs. the
F-measure maximizing SVM
  • How do these two formulations relate?
  • We show:
  • Pick a parameter C.
  • Find the classifier that optimizes the F-measure
    maximizing SVM.
  • There exist parameters C+ and C- such that the
    misclassification counting SVM has the same
    solution.
  • Proof and formulas to obtain C+ and C- are in the
    paper.

16
Implications of result
  • Since there exist C+, C- that yield the same
    solution as the F-measure maximizing SVM, finding
    the best C+ and C- for the weighted standard SVM is
    the right thing to do (modulo approximations).
  • In practice, a common trick is to choose C+, C-
    inversely proportional to the class sizes, i.e.
    C+ · |A+| = C- · |A-|. This heuristic seems
    reasonable but is not optimal. (Good first guess?)
    A grid-search sketch follows below.
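
A sketch of searching over the class-weight ratio to maximize F-measure on held-out data, using scikit-learn for the weighted SVM; the data, the candidate ratios, and the train/test split are all illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (30, 2)),     # rare class A+ (label +1)
               rng.normal(0.0, 1.0, (300, 2))])   # common class A- (label -1)
y = np.array([1] * 30 + [-1] * 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

best = (None, -1.0)
for ratio in [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]:   # candidate C+ / C- ratios
    clf = SVC(kernel="linear", C=1.0, class_weight={1: ratio, -1: 1.0})
    clf.fit(X_tr, y_tr)
    f = f1_score(y_te, clf.predict(X_te), pos_label=1)
    if f > best[1]:
        best = (ratio, f)
print("best C+/C- ratio:", best[0], "validation F-measure:", round(best[1], 3))
```

The inverse-frequency heuristic mentioned above (here, a ratio of about 10) is one of the candidates; the search simply checks whether a different ratio does better on held-out F-measure.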

17
Implications of result
  • Suppose that the SVM fails to provide good
    F-measure for a given problem, for a wide range of
    C+ and C- values.
  • Q: Is there another SVM formulation that would
    yield better F-measure?
    A: Our evidence suggests not.
  • Q: Is there another SVM formulation that would
    find the best possible F-measure more directly?
    A: Yes, the F-measure maximizing SVM.

18
Conclusions / Summary
  • We provide theoretical evidence that standard
    heuristic practices in using SVMs for optimizing
    F-measure are reasonable.
  • We provide a framework for continued research in
    F-measure maximizing SVMs.
  • All our results apply directly to SVMs with
    kernels (see paper).
  • Future work: attacking the F-measure maximizing
    SVM directly to find faster algorithms.

19
The Classification Problem
  [Figure: points from the two classes A+ and A-, with
  two candidate separating lines]
  • Which line is the better classifier?

20
The Classification Problem
  [Figure: points from the two classes A+ and A-, the
  separating surface, and the margin between the
  bounding planes]

21
Hard Margin SVM
  • Push the planes as far apart as possible, while
    keeping all points on the proper sides of the
    bounding planes.
  • Distance between the planes is 2 / ||w||.
  • Minimizing w'w pushes the planes apart.
  • What if there are no planes that correctly
    separate the classes?