Title: Optimizing F-Measure with Support Vector Machines
1 Optimizing F-Measure with Support Vector Machines
David R. Musicant, Vipin Kumar, Aysel Ozgur
FLAIRS 2003, Tuesday, May 13, 2003
Carleton College
2 Overview
- Classification algorithms often evaluated by test set accuracy
- Test set accuracy can be a poor measure when one of the classes is rare
- Support Vector Machines (SVMs) are designed to optimize test set accuracy
- SVMs have been used in an ad-hoc manner on datasets with rare classes
- Our new results: current ad-hoc heuristic techniques can be theoretically justified.
3 Roadmap
- The Traditional SVM and variants
- Precision, Recall, and F-measure metrics
- The F-measure Maximizing SVM
- Equivalence of traditional SVM and F-measure SVM (for the right parameters)
- Implications and Conclusions
4 The Classification Problem
[Figure: points of classes A+ and A- divided by a separating surface]
5 The Classification Problem
- Given m points in the n-dimensional space R^n
- Each point represented as x_i
- Membership of each point x_i in the classes A+ or A- is specified by y_i = ±1
- Separate by two bounding planes, one constraint per class
- More succinctly, both constraints combine into a single one for i = 1, 2, ..., m.
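The bounding-plane conditions described on this slide are commonly written as follows. This is a sketch in standard SVM notation: the normal vector w matches the w'w used later in the deck, while the offset symbol b and its sign convention are assumptions.

```latex
\begin{aligned}
x_i^\top w - b &\ge +1 && \text{if } y_i = +1 \quad (x_i \in A^+),\\
x_i^\top w - b &\le -1 && \text{if } y_i = -1 \quad (x_i \in A^-),\\[4pt]
\text{or, combined:}\quad y_i\,(x_i^\top w - b) &\ge 1, && i = 1, 2, \dots, m.
\end{aligned}
```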
6 Misclassification Count SVM
- (·) is the step function (1 if its argument is > 0, 0 otherwise)
- Push the planes apart, and minimize number of misclassified points.
- C balances two competing objectives
- Minimizing w'w pushes planes apart
- Problem NP-complete, objective non-differentiable
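To make the two competing terms concrete, here is a minimal sketch of a counting objective of this kind. The function name, the offset `b`, the factor 1/2, and the choice to count points with positive slack (points on the wrong side of their bounding plane) are all assumptions made for illustration, not the slide's exact formulation.

```python
def misclassification_count_objective(w, b, C, points, labels):
    """Return 0.5 * w'w + C * (number of points with positive slack)."""

    def step(t):
        # Step function: 1 if t > 0, 0 otherwise.
        return 1 if t > 0 else 0

    regularizer = 0.5 * sum(wj * wj for wj in w)  # small w'w = planes far apart
    violations = 0
    for x, y in zip(points, labels):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) - b)
        slack = max(0.0, 1.0 - margin)  # how far the point violates y(x'w - b) >= 1
        violations += step(slack)
    return regularizer + C * violations


# Toy data (hypothetical): the last point lies on the wrong side.
points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, 1, -1, -1]
value = misclassification_count_objective((1.0, 0.0), 0.0, 10.0, points, labels)
print(value)
```

Because the count changes only in jumps, the objective gives no gradient information, which is the non-differentiability the slide refers to.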
7 Approx Misclassification Count SVM
- Same problem, where the step function is replaced by some differentiable approximation
- An arbitrary fixed constant greater than 0 determines closeness of approximation.
- This is still difficult to solve.
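As an illustration of such an approximation, the sketch below uses a sigmoid, which the deck mentions later for the F-measure SVM. Whether this slide used the same function is an assumption, and the names `smooth_step` and `alpha` are hypothetical.

```python
import math


def step(t):
    """Exact step function: 1 if t > 0, 0 otherwise."""
    return 1.0 if t > 0 else 0.0


def smooth_step(t, alpha):
    """Sigmoid approximation of the step function; larger alpha fits more closely."""
    z = alpha * t
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


samples = (-2.0, -0.5, 0.5, 2.0)
gaps = []
for alpha in (1.0, 10.0, 100.0):
    worst = max(abs(smooth_step(t, alpha) - step(t)) for t in samples)
    gaps.append(worst)
    print(f"alpha={alpha:6.1f}  largest gap at sampled points: {worst:.4f}")
```

Note that this particular sigmoid equals 0.5 at t = 0 for every alpha, so the fit improves only away from the jump.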
8 Standard Soft Margin SVM
- Push the planes apart, and minimize distance of misclassified points.
- We minimize total distances from misclassified points to bounding planes, not actual number of them.
- Much more tractable, does quite well in optimizing accuracy
- Does poorly when one class is rare
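For reference, the soft margin problem described above is usually stated as the quadratic program below. The slack variables ξ_i, the offset b, and the factor 1/2 are assumed notation; each ξ_i measures how far point i falls on the wrong side of its bounding plane.

```latex
\min_{w,\,b,\,\xi}\quad \tfrac{1}{2}\, w^\top w \;+\; C \sum_{i=1}^{m} \xi_i
\qquad \text{subject to}\quad
y_i\,(x_i^\top w - b) + \xi_i \ge 1,\quad \xi_i \ge 0,\quad i = 1, \dots, m.
```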
9 Weighted Standard SVM
- Push the planes apart, and minimize weighted distance of misclassified points.
- Allows one to choose different C values for the two classes.
- Often used to weight rare class more heavily.
- How do we measure success when one class is rare?
Assuming that A+ is the rare class
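The effect of per-class C values can be sketched by evaluating a weighted soft margin objective for a fixed plane. The names `c_pos` and `c_neg` stand in for C+ and C-; the offset `b`, the factor 1/2, and the toy data are assumptions for illustration.

```python
def weighted_soft_margin_objective(w, b, c_pos, c_neg, points, labels):
    """Return 0.5 * w'w + sum of class-weighted slacks for labels in {+1, -1}."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for x, y in zip(points, labels):
        score = sum(wj * xj for wj, xj in zip(w, x)) - b
        slack = max(0.0, 1.0 - y * score)  # violation of y(x'w - b) >= 1
        penalty += (c_pos if y == 1 else c_neg) * slack
    return regularizer + penalty


# Hypothetical 1-D data with two A+ points and three A- points.
points = [(2.0,), (0.5,), (-1.0,), (-3.0,), (0.0,)]
labels = [1, 1, -1, -1, -1]
equal = weighted_soft_margin_objective((1.0,), 0.0, 1.0, 1.0, points, labels)
rare_heavy = weighted_soft_margin_objective((1.0,), 0.0, 5.0, 1.0, points, labels)
print(equal, rare_heavy)
```

With the larger `c_pos`, the same violation by an A+ point costs five times as much, so a solver is pushed toward planes that protect the rare class.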
10 Measures of success
- Precision and Recall are better descriptors when one class is rare.
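A short sketch of why accuracy misleads on rare classes, using the usual definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). Returning 0.0 when a denominator is zero is a convention chosen here, and the data is hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute (precision, recall) for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Convention (an assumption): report 0.0 when the ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# 5 rare positives among 100 points; the classifier always predicts "negative".
y_true = [1] * 5 + [-1] * 95
always_negative = [-1] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
precision, recall = precision_recall(y_true, always_negative)
print(accuracy, precision, recall)
```

The trivial classifier scores 95% accuracy while finding none of the rare class, which precision and recall expose immediately.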
11 F-measure
- F-measure: commonly used average (harmonic mean) of precision and recall
- Can C+ and C- in the weighted SVM be balanced to optimize F-measure?
- Can we start over and invent an SVM to optimize F-measure?
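A minimal sketch of the F-measure, assuming the balanced form F = 2PR / (P + R); the zero-handling convention is an assumption. It also shows how the harmonic mean differs from a plain arithmetic average when precision and recall are far apart.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention chosen here for the undefined case
    return 2 * precision * recall / (precision + recall)


# Perfect precision but only 10% recall.
arithmetic = (1.0 + 0.1) / 2
harmonic = f_measure(1.0, 0.1)
print(arithmetic, harmonic)
```

The harmonic mean stays close to the weaker of the two numbers, so a classifier cannot score well by excelling at only one of them.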
12 Constructing an F-measure SVM
- How do we appropriately represent F-measure in an SVM?
- Substitute P and R into F
- Thus to maximize F-measure, we minimize (FP + FN) / TP
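The substitution step can be worked out as follows, assuming the balanced F-measure and the usual count-based definitions of precision P and recall R (TP, FP, FN are true positives, false positives, false negatives).

```latex
\begin{aligned}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN},\\[4pt]
F &= \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN},\\[4pt]
\frac{1}{F} &= 1 + \frac{FP + FN}{2\,TP}
\quad\Longrightarrow\quad
\max F \;\Longleftrightarrow\; \min \frac{FP + FN}{TP}.
\end{aligned}
```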
13 Constructing an F-measure SVM
- Want to minimize (FP + FN) / TP
- FP = number of misclassified A- points, FN = number of misclassified A+ points
- New F-measure maximizing SVM
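As a sanity check on this reformulation, the sketch below enumerates every possible (FP, FN) outcome for small hypothetical class sizes and confirms that ranking by F-measure and ranking by the ratio (FP + FN) / TP agree. Exact fractions are used to avoid rounding ties; all names and sizes are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def f_measure_from_counts(tp, fp, fn):
    """F = 2 TP / (2 TP + FP + FN), computed exactly."""
    return Fraction(2 * tp, 2 * tp + fp + fn)


def error_ratio(tp, fp, fn):
    """The quantity to minimize: (FP + FN) / TP."""
    return Fraction(fp + fn, tp)


m_pos, m_neg = 6, 20  # hypothetical sizes of A+ and A-
# fn < m_pos keeps TP = m_pos - fn strictly positive.
outcomes = [(m_pos - fn, fp, fn)
            for fn, fp in product(range(m_pos), range(m_neg + 1))]

best_by_f = max(outcomes, key=lambda o: f_measure_from_counts(*o))
best_by_ratio = min(outcomes, key=lambda o: error_ratio(*o))
same_ranking = (sorted(outcomes, key=lambda o: -f_measure_from_counts(*o))
                == sorted(outcomes, key=lambda o: error_ratio(*o)))
print(best_by_f, best_by_ratio, same_ranking)
```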
14 The F-measure Maximizing SVM
- Approximate the step function with a sigmoid
- Can we connect with standard SVM?
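One way to picture a sigmoid-smoothed F-measure objective is sketched below. It is an illustration under stated assumptions, not the formulation from the slides or the paper: the sigmoid is applied to the signed score of each point, the soft counts replace FP, FN and TP in the ratio (FP + FN) / TP, and the names `alpha`, `b`, and `smooth_f_objective` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_f_objective(w, b, C, alpha, points, labels):
    """0.5 * w'w + C * (soft FP + soft FN) / soft TP for labels in {+1, -1}."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    soft_fp = soft_fn = 0.0
    m_pos = 0
    for x, y in zip(points, labels):
        score = sum(wj * xj for wj, xj in zip(w, x)) - b
        soft_error = sigmoid(-alpha * y * score)  # near 1 when misclassified
        if y == 1:
            m_pos += 1
            soft_fn += soft_error
        else:
            soft_fp += soft_error
    soft_tp = m_pos - soft_fn
    if soft_tp <= 0:
        return math.inf  # no positives recovered: worst possible value
    return regularizer + C * (soft_fp + soft_fn) / soft_tp


# Hypothetical 1-D data: one A- point (1.5) sits on the A+ side.
points = [(2.0,), (1.0,), (-1.0,), (-2.0,), (1.5,)]
labels = [1, 1, -1, -1, -1]
value = smooth_f_objective((1.0,), 0.0, 1.0, 50.0, points, labels)
print(value)
```

With a large `alpha` the soft counts approach FP = 1, FN = 0, TP = 2, so the value is close to 0.5 + 1/2.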
15 Weighted misclassification count SVM
F-measure maximizing SVM
- How do these two formulations relate?
- We show:
- Pick a parameter C.
- Find classifier to optimize F-measure SVM.
- There exist parameters C+ and C- such that misclassification counting SVM has same solution.
- Proof and formulas to obtain C+ and C- in paper.
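An informal way to see why such weights can exist is a first-order argument on the ratio being minimized. This is only a heuristic sketch under the smooth approximation, not the proof or the formulas from the paper; the symbol m_+ (the number of A+ points) is an assumption.

```latex
\begin{aligned}
g(FP, FN) &= \frac{FP + FN}{m_+ - FN}, \qquad TP = m_+ - FN,\\[4pt]
\frac{\partial g}{\partial FP} &= \frac{1}{TP}, \qquad
\frac{\partial g}{\partial FN} = \frac{m_+ + FP}{TP^{2}},\\[4pt]
\frac{\partial g / \partial FN}{\partial g / \partial FP}
&= \frac{m_+ + FP}{TP} = 1 + \frac{FP + FN}{TP}.
\end{aligned}
```

Near a solution, the ratio therefore behaves like a weighted error count in which each missed A+ point costs more than each false alarm, which is the shape of a weighted SVM with C+ larger than C-.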
16 Implications of result
- Since there exist C+, C- to yield same solution as F-measure maximizing SVM, finding best C+ and C- for the weighted standard SVM is the right thing to do (modulo approximations).
- In practice, common trick is to choose C+, C- by a fixed formula. This heuristic seems reasonable but is not optimal. (Good first guess?)
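A frequently used rule of this kind sets the weights inversely proportional to class sizes, so that C+ / C- equals the number of A- points divided by the number of A+ points. Whether this is the exact rule shown on the slide is an assumption; the function name and `base_c` parameter are hypothetical.

```python
def inverse_frequency_weights(labels, base_c=1.0):
    """Return (c_pos, c_neg) with c_pos / c_neg = m_neg / m_pos for labels in {+1, -1}."""
    m_pos = sum(1 for y in labels if y == 1)
    m_neg = sum(1 for y in labels if y == -1)
    if m_pos == 0 or m_neg == 0:
        raise ValueError("both classes must be present")
    c_neg = base_c
    c_pos = base_c * m_neg / m_pos  # rare class gets the larger weight
    return c_pos, c_neg


labels = [1] * 5 + [-1] * 95
c_pos, c_neg = inverse_frequency_weights(labels)
print(c_pos, c_neg)
```

In line with the slide, such a rule is best treated as a starting point for a search over C+ and C- rather than as the final choice.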
17 Implications of result
- Suppose that SVM fails to provide good F-measure for a given problem, for a wide range of C+ and C- values.
- Q: Is there another SVM formulation that would yield better F-measure? A: Our evidence suggests not.
- Q: Is there another SVM formulation that would find best possible F-measure more directly? A: Yes, the F-measure maximizing SVM.
18 Conclusions / Summary
- We provide theoretical evidence that standard heuristic practices in using SVMs for optimizing F-measure are reasonable.
- We provide a framework for continued research in F-measure maximizing SVMs.
- All our results apply directly to SVMs with kernels (see paper).
- Future work: attacking F-measure maximizing SVM directly to find faster algorithms.
19 The Classification Problem
[Figure: points of classes A+ and A- with candidate separating lines]
- Which line is the better classifier?
20 The Classification Problem
[Figure: points of classes A+ and A- divided by a separating surface]
21 Hard Margin SVM
- Push the planes as far apart as possible, while maintaining points on proper sides of bounding planes.
- Distance between planes: 2 / ||w||
- Minimizing w'w pushes planes apart.
- What if there are no planes that correctly separate classes?
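The link between w'w and the distance between the planes can be checked numerically. The sketch assumes the usual normalization in which the bounding planes are x'w = b + 1 and x'w = b - 1, giving a distance of 2 / ||w||; the function name is hypothetical.

```python
import math


def margin_width(w):
    """Distance between the planes x'w = b + 1 and x'w = b - 1."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    if norm == 0:
        raise ValueError("w must be nonzero")
    return 2.0 / norm


wide = margin_width((1.0, 0.0))    # ||w|| = 1
narrow = margin_width((3.0, 4.0))  # ||w|| = 5
print(wide, narrow)
```

A smaller norm of w gives a wider gap, which is why minimizing w'w pushes the planes apart.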