Considering Cost Asymmetry in Learning Classifiers

Title: PowerPoint Presentation Author: Hui Li Last modified by: cw36 Created Date: 5/17/2005 7:36:42 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 21

Provided by: Hui46

Learn more at: https://people.ee.duke.edu

Transcript and Presenter's Notes

1
Considering Cost Asymmetry in Learning Classifiers
by Bach, Heckerman and Horvitz
Presented by Chunping Wang, Machine Learning Group, Duke University
May 21, 2007
2
Outline

Introduction
SVM with Asymmetric Cost
SVM Regularization Path (Hastie et al., 2005)
Path with Cost Asymmetry
Results
Conclusions

3
Introduction (1)
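Binary classification
real-valued predictors $x \in \mathbb{R}^d$
binary response $y \in \{-1, +1\}$
A classifier could be defined as $g(x) = \mathrm{sign}(f(x))$,
based on a linear decision function $f(x) = w^\top x + b$
Parameters $w \in \mathbb{R}^d$, $b \in \mathbb{R}$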
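A minimal sketch of the linear classifier described on this slide, in plain Python. The function names are illustrative, and treating a score of exactly zero as the positive class is an assumption made here, not something the slide states.

```python
def decision_function(w, b, x):
    """Linear decision function f(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def classify(w, b, x):
    """Predict +1 when f(x) >= 0, otherwise -1 (tie-breaking is an assumption)."""
    return 1 if decision_function(w, b, x) >= 0 else -1
```

For example, with `w = [1.0, -1.0]` and `b = 0.5`, the point `[2.0, 1.0]` has score 1.5 and is classified as +1.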
4
Introduction (2)

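Two types of misclassification
false negative cost $C_+$
false positive cost $C_-$

Expected cost: $C_+ \, P\{\text{predict } -1,\ y = +1\} + C_- \, P\{\text{predict } +1,\ y = -1\}$
In terms of the 0-1 loss function $\phi_{0\text{-}1}(u) = 1_{u<0}$, applied to the signed margin $u = y\,(w^\top x + b)$
This is the real loss function, but it is non-convex and non-differentiable.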
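The asymmetric cost on this slide can be estimated on a labeled sample by counting the two kinds of error. The sketch below assumes labels in {-1, +1}, and the names `c_plus` (false negative cost) and `c_minus` (false positive cost) are chosen here for illustration.

```python
def asymmetric_zero_one_cost(y_true, y_pred, c_plus, c_minus):
    """Average misclassification cost under the 0-1 loss.

    A false negative (true +1 predicted as -1) costs c_plus;
    a false positive (true -1 predicted as +1) costs c_minus.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        if y == 1 and p == -1:
            total += c_plus
        elif y == -1 and p == 1:
            total += c_minus
    return total / len(y_true)
```

With one false negative and one false positive among four points, and costs 3 and 1, the average cost is (3 + 1) / 4 = 1.0.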
5
Introduction (3)
Convex loss functions serve as surrogates for the 0-1 loss function (for training purposes).
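The slide's figure is not preserved in this transcript, so the sketch below only lists a few common textbook surrogates next to the 0-1 loss. All of them are written as functions of the signed margin u = y f(x). Which surrogates the original figure plotted is not known from the text.

```python
import math


def zero_one_loss(u):
    """0-1 loss on the signed margin u = y * f(x): 1 for an error, else 0."""
    return 1.0 if u < 0 else 0.0


def hinge_loss(u):
    """Hinge loss max(0, 1 - u), the surrogate used by the SVM."""
    return max(0.0, 1.0 - u)


def logistic_loss(u):
    """Logistic loss log(1 + exp(-u))."""
    return math.log1p(math.exp(-u))


def square_loss(u):
    """Square loss (1 - u)^2."""
    return (1.0 - u) ** 2
```

Each surrogate is convex in u, which is what makes training tractable, while the 0-1 loss is a step function.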
6
Introduction (4)
Empirical cost given n labeled data points
Objective function: an empirical loss term weighted by the cost asymmetry, plus a regularization term
Since convex surrogates of the 0-1 loss function are used for training, the cost asymmetries for training and testing are mismatched.
Motivation: efficiently look at many training asymmetries, even if the testing asymmetry is given.
7
SVM with Asymmetric Cost (1)
hinge loss: $\phi(u) = \max(0, 1 - u)$
SVM with asymmetric cost
where
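The slide's formula is missing from the transcript. As a hedged illustration, the sketch below evaluates one common form of the asymmetric-cost SVM objective: a squared-norm regularizer plus hinge losses weighted by `c_plus` on positive examples and `c_minus` on negative examples. The exact scaling used on the slide is an assumption.

```python
def hinge(u):
    """Hinge loss max(0, 1 - u)."""
    return max(0.0, 1.0 - u)


def asymmetric_svm_objective(w, b, X, y, c_plus, c_minus):
    """0.5 * ||w||^2 + sum_i C_i * hinge(y_i * (w^T x_i + b)).

    C_i is c_plus for positive examples and c_minus for negative ones.
    """
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        cost = c_plus if yi == 1 else c_minus
        loss += cost * hinge(margin)
    return reg + loss
```

Raising `c_plus` relative to `c_minus` penalizes margin violations on positive examples more heavily, which pushes the decision boundary away from the positive class.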
8
SVM with Asymmetric Cost (2)
The Lagrangian with dual variables
Karush-Kuhn-Tucker (KKT) conditions
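The formulas for this slide did not survive extraction. The block below is the standard derivation for a soft-margin SVM with per-example costs, written under the assumption that $C_i = C_+$ for positive examples and $C_i = C_-$ for negative ones. The symbols $\xi_i$, $\alpha_i$ and $\beta_i$ are conventional choices and may differ from the slide's notation.

```latex
% Assumed primal problem with slack variables \xi_i
\min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + \sum_{i=1}^{n} C_i\,\xi_i
\quad\text{s.t.}\quad \xi_i \ge 0,\quad y_i(w^\top x_i + b) \ge 1 - \xi_i

% Lagrangian with dual variables \alpha_i \ge 0 and \beta_i \ge 0
\mathcal{L} = \tfrac{1}{2}\|w\|^2 + \sum_{i} C_i\,\xi_i
  - \sum_{i} \alpha_i\,\bigl[y_i(w^\top x_i + b) - 1 + \xi_i\bigr]
  - \sum_{i} \beta_i\,\xi_i

% Stationarity in w, b and \xi_i
w = \sum_{i} \alpha_i y_i x_i, \qquad
\sum_{i} \alpha_i y_i = 0, \qquad
\alpha_i + \beta_i = C_i

% Complementary slackness
\alpha_i\,\bigl[y_i(w^\top x_i + b) - 1 + \xi_i\bigr] = 0, \qquad
\beta_i\,\xi_i = 0
```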
9
SVM with Asymmetric Cost (3)
The dual problem
where
This is a quadratic optimization problem for a given cost structure. Solving it separately over the whole space of cost structures would be computationally intractable.
Following the SVM regularization path algorithm (Hastie et al., 2005), the authors work with (1)-(3) and the KKT conditions instead of the dual problem.
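For reference, the dual of a soft-margin SVM with per-example costs takes the standard form below. It is written under the assumption that $C_i = C_+$ for positive examples and $C_i = C_-$ for negative ones; the slide's own notation is not preserved.

```latex
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
      \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
\quad\text{s.t.}\quad 0 \le \alpha_i \le C_i, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

The costs enter only through the box constraints, so each choice of $(C_+, C_-)$ defines a separate quadratic program.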
10
SVM Regularization Path (1)

Define active sets of data points
Margin: $M = \{i : y_i f(x_i) = 1\}$
Left of margin: $L = \{i : y_i f(x_i) < 1\}$
Right of margin: $R = \{i : y_i f(x_i) > 1\}$

KKT conditions
SVM regularization path
The cost is symmetric, so the search runs along a single axis.
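The three active sets can be computed from the signed margins y_i f(x_i). The sketch below assumes that convention and uses a small numerical tolerance `tol` for the equality test, which is an implementation choice made here.

```python
def partition_active_sets(margins, tol=1e-9):
    """Split point indices into (M, L, R) by signed margin y_i * f(x_i).

    M: on the margin (margin == 1 within tol)
    L: left of the margin (margin < 1)
    R: right of the margin (margin > 1)
    """
    M, L, R = [], [], []
    for i, m in enumerate(margins):
        if abs(m - 1.0) <= tol:
            M.append(i)
        elif m < 1.0:
            L.append(i)
        else:
            R.append(i)
    return M, L, R
```

For margins `[1.0, 0.3, 2.5, 1.0, -0.2]` this gives M = [0, 3], L = [1, 4] and R = [2].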
11
SVM Regularization Path (2)
Initialization ( )
Consider $\lambda$ sufficiently large ($C$ is very small): all the points are in L
with
Decrease $\lambda$
Remain
One or more positive and negative examples hit
the margin simultaneously
12
SVM Regularization Path (3)
Initialization ( )
Define
The critical condition for the first two points hitting the margin
For , this initial condition stays the same except for the definition of .
13
SVM Regularization Path (4)

The path: as $\lambda$ decreases, $\alpha_i$ changes only for points in M, until one of the following events happens:
A point from L or R has entered M
A point in M has left the set to join either R
or L

consider only the points on the margin
where is some function of
,
Therefore, the $\alpha_i$ for points on the margin proceed linearly in $\lambda$, and the function $f(x)$ changes in a piecewise-inverse manner in $\lambda$.
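Because a margin point's dual variable moves linearly between events, the value of the regularization parameter at which it would leave the margin can be found in closed form. The sketch below assumes the rescaled symmetric setting where each dual variable lies in [0, 1]. The argument `slope` is a hypothetical per-point rate that a path algorithm would obtain by solving a small linear system, which is not shown here.

```python
def alpha_on_path(alpha_l, lam_l, slope, lam):
    """Linear segment alpha(lam) = alpha_l - (lam_l - lam) * slope."""
    return alpha_l - (lam_l - lam) * slope


def exit_lambda(alpha_l, lam_l, slope):
    """Value of lam below lam_l where alpha reaches 0 or 1, or None."""
    if slope > 0:
        # alpha shrinks as lam decreases, so it reaches 0 (point joins R)
        return lam_l - alpha_l / slope
    if slope < 0:
        # alpha grows as lam decreases, so it reaches 1 (point joins L)
        return lam_l - (alpha_l - 1.0) / slope
    # alpha stays constant on this segment
    return None
```

A path-following routine would compute such candidates for every margin point, combine them with the candidates for points entering the margin, and step to the largest one below the current value.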
14
SVM Regularization Path (4)

Update regularization
Update active sets and solutions
Stopping condition
In the separable case, we terminate when L becomes empty
In the non-separable case, we terminate
when

for all the possible events
16
Path with Cost Asymmetry (1)
Exploration in the 2-d space
Path initialization: start at situations where all points are in L
Follow the updating procedure of the 1-d case along the line
Regularization changes while the cost asymmetry is fixed.
Among all the classifiers, find the best one, given the user's cost function
Paths starting from
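Once the paths have produced a set of candidate classifiers, picking the best one for a user's cost function is a simple minimization. The sketch below assumes each candidate is summarized by its operating point (false positive rate, true positive rate) on held-out data and that the positive-class prior `p_pos` is known. These names are illustrative.

```python
def expected_cost(fpr, tpr, p_pos, c_plus, c_minus):
    """Expected cost of an operating point.

    c_plus is the false negative cost, c_minus the false positive cost.
    """
    return c_plus * p_pos * (1.0 - tpr) + c_minus * (1.0 - p_pos) * fpr


def select_best(points, p_pos, c_plus, c_minus):
    """Index of the (fpr, tpr) point with the lowest expected cost."""
    costs = [expected_cost(f, t, p_pos, c_plus, c_minus) for f, t in points]
    return min(range(len(points)), key=lambda i: costs[i])
```

With points `[(0.0, 0.5), (0.5, 1.0)]` and balanced classes, costly false negatives favor the second point, while costly false positives favor the first.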
17
Path with Cost Asymmetry (2)
Produce ROC
Collecting R lines in the direction of
, we can build three ROC curves
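An ROC curve can be assembled from the operating points of many classifiers. The sketch below assumes each classifier is represented by its list of predicted labels in {-1, +1} on a common evaluation set. It is a generic construction, not the authors' specific procedure.

```python
def roc_point(y_true, y_pred):
    """Return (false positive rate, true positive rate) for one classifier."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == -1 and p == 1)
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    return fp / neg, tp / pos


def roc_points(y_true, predictions):
    """Sorted, de-duplicated operating points of several classifiers."""
    return sorted({roc_point(y_true, p) for p in predictions})
```

Each classifier obtained along a path contributes one point, and the upper envelope of those points gives the achievable ROC curve.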
18
Results (1)

For 1000 testing asymmetries, three methods are compared:
"one": take the testing asymmetry as the training cost asymmetry
"int": vary the intercept of "one" to build an ROC curve, then select the optimal classifier
"all": select the optimal classifier from the ROC curve obtained by varying both the training asymmetry and the intercept.

Use nested cross-validation:
The outer cross-validation produces overall accuracy estimates for the classifier.
The inner cross-validation selects the optimal classifier parameters (training asymmetry and/or intercept).
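The nested cross-validation described above can be sketched in a few lines. The callable `fit_and_score(param, train_idx, test_idx)` is a hypothetical hook that trains with one parameter setting on the training indices and returns a cost on the test indices, where lower is better. The interleaved fold assignment is also a choice made here for brevity.

```python
def k_fold_indices(n, k):
    """Yield (train, held_out) index lists for k interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        yield train, held_out


def nested_cv(n, outer_k, inner_k, params, fit_and_score):
    """Average outer-fold cost, with params chosen by inner folds only."""
    outer_costs = []
    for outer_train, outer_test in k_fold_indices(n, outer_k):

        def inner_cost(p):
            costs = []
            for tr, va in k_fold_indices(len(outer_train), inner_k):
                tr_idx = [outer_train[i] for i in tr]
                va_idx = [outer_train[i] for i in va]
                costs.append(fit_and_score(p, tr_idx, va_idx))
            return sum(costs) / len(costs)

        # Parameter selection never sees the outer test fold.
        best = min(params, key=inner_cost)
        outer_costs.append(fit_and_score(best, outer_train, outer_test))
    return sum(outer_costs) / len(outer_costs)
```

Keeping the selection step inside the outer training fold is what prevents the reported estimate from being biased by the parameter search.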

19
Results (2)
20
Conclusions

An efficient algorithm is presented to build ROC curves by varying the training cost asymmetries for SVMs.
The main contribution is generalizing the SVM regularization path (Hastie et al., 2005) from a 1-d axis to a 2-d plane.
Because a convex surrogate is used, training with the testing asymmetry leads to a non-optimal classifier.
Results show the advantages of considering more training asymmetries.