Title: Considering Cost Asymmetry in Learning Classifiers
Considering Cost Asymmetry in Learning Classifiers
by Bach, Heckerman, and Horvitz
Presented by Chunping Wang
Machine Learning Group, Duke University, May 21, 2007
Outline
- Introduction
- SVM with Asymmetric Cost
- SVM Regularization Path (Hastie et al., 2005)
- Path with Cost Asymmetry
- Results
- Conclusions
Introduction (1)
Binary classification:
- real-valued predictors $x \in \mathbb{R}^d$
- binary response $y \in \{-1, +1\}$
A classifier can be defined as $\hat{y} = \operatorname{sign}(f(x))$, based on a linear decision function $f(x) = w^{\top}x + b$.
Parameters: the weight vector $w$ and the intercept $b$.
Introduction (2)
- Two types of misclassification cost:
- false negative cost $C_+$ (a positive example classified as negative)
- false positive cost $C_-$ (a negative example classified as positive)
The quantity to control is the expected cost, written in terms of the 0-1 loss function. The 0-1 loss is the real loss function of interest, but it is non-convex and non-differentiable.
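As a sketch in standard notation (the symbols $C_+$, $C_-$ and the margin convention are assumptions, since the slide's own equation is not recoverable), the expected cost reads

\[
R(f) \;=\; C_{+}\,\mathbb{P}\left(y = +1,\; f(x) \le 0\right)
      \;+\; C_{-}\,\mathbb{P}\left(y = -1,\; f(x) > 0\right),
\]

or, in terms of the 0-1 loss $\phi_{0/1}(u) = 1_{\{u \le 0\}}$ applied to the margin $y\,f(x)$,

\[
R(f) \;=\; \mathbb{E}\left[\, C_{y}\,\phi_{0/1}\!\left(y\,f(x)\right) \right],
\qquad C_{y} = C_{+}\,1_{\{y=+1\}} + C_{-}\,1_{\{y=-1\}}.
\]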
Introduction (3)
Convex loss functions serve as surrogates for the 0-1 loss function (for training purposes).
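The slide's plot of these surrogates is not recoverable; commonly used convex surrogates of the 0-1 loss, written on the margin $u = y\,f(x)$, include

\[
\begin{aligned}
\text{hinge (SVM):}   \quad & \phi(u) = \max(0,\, 1 - u), \\
\text{squared hinge:} \quad & \phi(u) = \max(0,\, 1 - u)^2, \\
\text{logistic:}      \quad & \phi(u) = \log\left(1 + e^{-u}\right), \\
\text{exponential:}   \quad & \phi(u) = e^{-u}.
\end{aligned}
\]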
Introduction (4)
Empirical cost, given $n$ labeled data points: the objective function combines an asymmetric empirical cost term and a regularization term.
Since convex surrogates of the 0-1 loss function are used for training, the cost asymmetries for training and testing are mismatched: the best training asymmetry is generally not the testing asymmetry.
Motivation: efficiently explore many training asymmetries, even when the testing asymmetry is given.
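A sketch of this objective under the assumed notation (surrogate loss $\phi$, costs $C_+$ and $C_-$ encoding the training asymmetry):

\[
\min_{w,\,b}\;
\frac{1}{n}\left[
  C_{+}\!\!\sum_{i:\,y_i=+1}\!\!\phi\left(y_i\,(w^{\top}x_i + b)\right)
+ C_{-}\!\!\sum_{i:\,y_i=-1}\!\!\phi\left(y_i\,(w^{\top}x_i + b)\right)
\right]
+ \frac{\lambda}{2}\,\|w\|^{2},
\]

where the bracketed term carries the cost asymmetry and the last term is the regularization.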
SVM with Asymmetric Cost (1)
The SVM uses the hinge loss, $\phi(u) = \max(0,\, 1 - u)$, as its convex surrogate.
The SVM with asymmetric cost weights the slacks of positive and negative examples by $C_+$ and $C_-$ respectively, as sketched below.
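A sketch of the primal in slack form ($I_+$ and $I_-$ denote the index sets of positive and negative examples; notation assumed):

\[
\begin{aligned}
\min_{w,\,b,\,\xi}\;\; & \frac{1}{2}\,\|w\|^{2}
  \;+\; C_{+}\!\sum_{i \in I_{+}}\!\xi_i
  \;+\; C_{-}\!\sum_{i \in I_{-}}\!\xi_i \\
\text{s.t.}\;\; & y_i\,(w^{\top}x_i + b) \;\ge\; 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1,\dots,n.
\end{aligned}
\]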
SVM with Asymmetric Cost (2)
The Lagrangian introduces dual variables for the margin and slack constraints; the Karush-Kuhn-Tucker (KKT) conditions characterize the optimum.
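A sketch, with $\alpha_i \ge 0$ for the margin constraints, $\mu_i \ge 0$ for the slack constraints, and $C_i$ standing for $C_+$ or $C_-$ according to $y_i$:

\[
\mathcal{L} \;=\; \frac{1}{2}\|w\|^{2} + \sum_i C_i\,\xi_i
  \;-\; \sum_i \alpha_i\left[\,y_i(w^{\top}x_i + b) - 1 + \xi_i\,\right]
  \;-\; \sum_i \mu_i\,\xi_i.
\]

Stationarity gives

\[
w = \sum_i \alpha_i\,y_i\,x_i, \qquad
\sum_i \alpha_i\,y_i = 0, \qquad
\alpha_i + \mu_i = C_i \;\Rightarrow\; 0 \le \alpha_i \le C_i,
\]

together with the complementary slackness conditions $\alpha_i\left[y_i(w^{\top}x_i + b) - 1 + \xi_i\right] = 0$ and $\mu_i\,\xi_i = 0$.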
SVM with Asymmetric Cost (3)
The dual problem is a quadratic optimization problem for one given cost structure (sketched below); re-solving it separately over the whole space of cost settings would be computationally intractable.
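A sketch of that dual in the assumed notation ($K(x_i, x_j) = x_i^{\top}x_j$ in the linear case):

\[
\begin{aligned}
\max_{\alpha}\;\; & \sum_i \alpha_i
  \;-\; \frac{1}{2}\sum_{i,j} \alpha_i\,\alpha_j\,y_i\,y_j\,K(x_i, x_j) \\
\text{s.t.}\;\; & \sum_i \alpha_i\,y_i = 0, \qquad
  0 \le \alpha_i \le C_{+} \text{ for } y_i = +1, \qquad
  0 \le \alpha_i \le C_{-} \text{ for } y_i = -1.
\end{aligned}
\]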
Following the SVM regularization path algorithm (Hastie et al., 2005), the authors work with the stationarity conditions (1)-(3) and the KKT conditions instead of solving the dual problem directly.
SVM Regularization Path (1)
- Define active sets of data points by their position relative to the margin:
- M: on the margin, $y_i f(x_i) = 1$
- L: left of the margin, $y_i f(x_i) < 1$
- R: right of the margin, $y_i f(x_i) > 1$
The KKT conditions tie each set to the value of its dual variable, as shown below.
In the SVM regularization path of Hastie et al., the cost is symmetric, and the search is along a single axis (the regularization parameter).
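The link between the sets and the dual variables (a standard consequence of the KKT conditions; $C_i$ is example $i$'s cost bound, equal to a single $C$ in the symmetric case):

\[
\begin{aligned}
i \in R:\;\; & y_i f(x_i) > 1 \;\Rightarrow\; \alpha_i = 0, \\
i \in M:\;\; & y_i f(x_i) = 1 \;\Rightarrow\; 0 \le \alpha_i \le C_i, \\
i \in L:\;\; & y_i f(x_i) < 1 \;\Rightarrow\; \alpha_i = C_i.
\end{aligned}
\]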
SVM Regularization Path (2)
Initialization:
Consider $\lambda$ sufficiently large (i.e., $C$ very small); then all the points are in L, with every dual variable at its upper bound.
As $\lambda$ decreases, the sets remain unchanged until one or more positive and negative examples hit the margin simultaneously.
SVM Regularization Path (3)
Initialization (continued):
Define the critical condition for the first two points, one positive and one negative, hitting the margin.
For $n_+ \neq n_-$, this initial condition keeps the same form, with only the definitions adjusted for the class imbalance.
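A hedged sketch of the balanced-case initialization, following Hastie et al.'s scaling (the rescaled dual variables $\alpha_i \in [0,1]$ and the kernel expansion of $f$ are assumptions here): with all points in L, every $\alpha_i$ sits at its upper bound, so

\[
f(x) \;=\; \frac{1}{\lambda}\sum_{i} \alpha_i\,y_i\,K(x, x_i) \;+\; \beta_0,
\qquad \alpha_i = 1 \text{ for all } i \quad (n_+ = n_-),
\]

and the path starts at the largest $\lambda$ for which some positive-negative pair of points first satisfies $y_i f(x_i) = 1$.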
SVM Regularization Path (4)
- Along the path, as $\lambda$ decreases, $\alpha_i$ changes only for points $i \in M$, until one of the following events happens:
- a point from L or R enters M
- a point in M leaves the set to join either R or L
Between events, we need to consider only the points on the margin, for which each $\alpha_i$ is a function of $\lambda$.
Therefore, the $\alpha_i$ for points on the margin proceed linearly in $\lambda$, while the decision function changes in a piecewise-inverse manner in $\lambda$.
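Why this holds, under the assumptions above: each point on the margin pins down one linear equation,

\[
y_i\!\left(\frac{1}{\lambda}\sum_{j} \alpha_j\,y_j\,K(x_i, x_j) + \beta_0\right) = 1
\qquad \text{for all } i \in M,
\]

and since the $\alpha_j$ for $j \notin M$ are fixed at $0$ or at their bounds, multiplying through by $\lambda$ yields a linear system whose matrix is constant and whose right-hand side is affine in $\lambda$. Hence each $\alpha_i$, $i \in M$, is affine between events, $\alpha_i(\lambda) = a_i + b_i\,\lambda$, while $f$ itself, carrying the $1/\lambda$ factor, changes in a piecewise-inverse manner in $\lambda$.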
SVM Regularization Path (5)
- Update the regularization parameter to the next event.
- Update the active sets and the solutions:
- move the event point between M, L, and R
- recompute the linear coefficients of the $\alpha_i$ on the margin
- Stopping condition:
- in the separable case, we terminate when L becomes empty
- in the non-separable case, we terminate when the candidate $\lambda$ is non-positive for all the possible events
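A minimal Python sketch of the "update solutions" step: solving for the dual variables of the margin points at a given $\lambda$, directly from the linear system above. This is an illustration under the assumed notation, not the authors' code; the array layout and the function name margin_system are assumptions.

import numpy as np

def margin_system(K, y, C, M, L, lam):
    """Solve for the alphas of the margin points at regularization lam.

    K: (n, n) kernel matrix; y: +1/-1 labels; C: per-point cost bounds.
    M, L: index lists for the margin and left-of-margin sets (R is
    implicit: its alphas are 0). Points in L sit at their bounds C_i.
    """
    M, L = list(M), list(L)
    m = len(M)
    A = np.zeros((m + 1, m + 1))
    rhs = np.zeros(m + 1)
    # Margin equations: sum_{j in M} alpha_j y_j K[i,j] + lam*beta0
    #                   = lam*y_i - sum_{j in L} C_j y_j K[i,j]
    A[:m, :m] = K[np.ix_(M, M)] * y[M][None, :]
    A[:m, m] = 1.0                                  # unknown u = lam * beta0
    rhs[:m] = lam * y[M] - K[np.ix_(M, L)] @ (C[L] * y[L])
    # Equality constraint: sum_j alpha_j y_j = 0, with L at its bounds
    A[m, :m] = y[M]
    rhs[m] = -np.sum(C[L] * y[L])
    sol = np.linalg.solve(A, rhs)
    alpha_M, beta0 = sol[:m], sol[m] / lam
    return alpha_M, beta0

Since the matrix A does not depend on lam and the right-hand side is affine in lam, the solution is affine in lam, which is exactly the piecewise-linear behavior exploited between events.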
Path with Cost Asymmetry (1)
Exploration in the 2-d space of cost settings:
Path initialization: start at situations where all points are in L.
Follow the updating procedure of the 1-d case along a line on which the overall regularization changes while the cost asymmetry is fixed.
Among all the classifiers visited, find the best one given the user's cost function.
Paths are started from many different asymmetries.
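One common way to parameterize this 2-d exploration (an assumption here, consistent with the fixed-asymmetry lines just described): write the two costs through a total level $C$ and an asymmetry $\gamma \in (0,1)$,

\[
C_{+} = \gamma\,C, \qquad C_{-} = (1-\gamma)\,C,
\qquad \gamma = \frac{C_{+}}{C_{+} + C_{-}},
\]

so that each path fixes $\gamma$ and follows the 1-d algorithm as $C$ varies.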
Path with Cost Asymmetry (2)
Producing ROC curves:
Collecting the solutions along $R$ such lines, one per training asymmetry, we can build three ROC curves.
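A small Python sketch of turning a collection of classifiers into an ROC curve (the (w, b) representation and the upper-envelope construction are illustrative assumptions, not the paper's exact procedure):

import numpy as np

def roc_from_classifiers(classifiers, X, y):
    """Build an ROC curve from a set of linear classifiers (w, b).

    Each classifier contributes one (FPR, TPR) point; the curve is the
    upper envelope of these points. X is (n, d); y holds +1/-1 labels.
    """
    pos, neg = (y == +1), (y == -1)
    points = []
    for w, b in classifiers:
        pred = np.sign(X @ w + b)
        tpr = np.mean(pred[pos] == +1)      # true positive rate
        fpr = np.mean(pred[neg] == +1)      # false positive rate
        points.append((fpr, tpr))
    points.sort()
    # Keep only non-dominated points (the upper envelope).
    roc, best_tpr = [], -np.inf
    for fpr, tpr in points:
        if tpr > best_tpr:
            roc.append((fpr, tpr))
            best_tpr = tpr
    return roc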
Results (1)
- For 1000 testing asymmetries, three methods are compared:
- one: take the testing asymmetry as the training cost asymmetry
- int: vary the intercept of the one classifier to build an ROC, then select the optimal classifier
- all: select the optimal classifier from the ROC obtained by varying both the training asymmetry and the intercept
- A nested cross-validation is used (see the sketch below):
- the outer cross-validation produces overall accuracy estimates for the classifier
- the inner cross-validation selects the optimal classifier parameters (training asymmetry and/or intercept)
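A schematic Python sketch of this protocol using scikit-learn. Note the assumptions: the paper computes solutions efficiently via the path algorithm, whereas this sketch re-fits a class-weighted SVC on a grid of asymmetries; c_pos/c_neg are the user's testing costs, and the fold counts are illustrative.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def nested_cv_cost(X, y, c_pos, c_neg, gammas=np.linspace(0.05, 0.95, 19)):
    """Estimate test cost when the training asymmetry is chosen by inner CV."""
    def cost(clf, X_, y_):
        pred = clf.predict(X_)
        fn = np.mean((y_ == +1) & (pred == -1))   # false negative rate
        fp = np.mean((y_ == -1) & (pred == +1))   # false positive rate
        return c_pos * fn + c_neg * fp

    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    outer_costs = []
    for tr, te in outer.split(X, y):
        # Inner CV: pick the training asymmetry with lowest validation cost.
        inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
        inner_cost = np.zeros(len(gammas))
        for itr, iva in inner.split(X[tr], y[tr]):
            for k, g in enumerate(gammas):
                clf = SVC(kernel="linear", class_weight={+1: g, -1: 1 - g})
                clf.fit(X[tr][itr], y[tr][itr])
                inner_cost[k] += cost(clf, X[tr][iva], y[tr][iva])
        g_best = gammas[np.argmin(inner_cost)]
        # Outer evaluation: refit on the full training fold, score on the test fold.
        clf = SVC(kernel="linear", class_weight={+1: g_best, -1: 1 - g_best})
        clf.fit(X[tr], y[tr])
        outer_costs.append(cost(clf, X[te], y[te]))
    return float(np.mean(outer_costs))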
Results (2)
[Result figures comparing the one, int, and all methods.]
Conclusions
- An efficient algorithm is presented to build ROC curves by varying the training cost asymmetries for SVMs.
- The main contribution is generalizing the SVM regularization path (Hastie et al., 2005) from a 1-d axis to a 2-d plane.
- Because a convex surrogate is used, training with the testing asymmetry leads to a non-optimal classifier.
- Results show the advantages of considering more training asymmetries.