Considering Cost Asymmetry in Learning Classifiers

Title: PowerPoint Presentation Author: Hui Li Last modified by: cw36 Created Date: 5/17/2005 7:36:42 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

1
Considering Cost Asymmetry in Learning Classifiers
by Bach, Heckerman and Horvitz
Presented by Chunping Wang, Machine Learning Group, Duke University
May 21, 2007
2
Outline
  • Introduction
  • SVM with Asymmetric Cost
  • SVM Regularization Path (Hastie et al., 2005)
  • Path with Cost Asymmetry
  • Results
  • Conclusions

3
Introduction (1)
Binary classification
real-valued predictors $x \in \mathbb{R}^d$
binary response $y \in \{-1, 1\}$
A classifier can be defined as $C(x) = \mathrm{sign}(f(x))$,
based on a linear decision function $f(x) = w^\top x + b$
Parameters: $(w, b)$
4
Introduction (2)
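  • Two types of misclassification
  • false negative, with cost $C_+$
  • false positive, with cost $C_-$

Expected cost
$R(C_+, C_-, w, b) = C_+ P\{w^\top x + b < 0,\ y = 1\} + C_- P\{w^\top x + b \ge 0,\ y = -1\}$
In terms of the 0-1 loss function $\phi_{0-1}(u) = 1_{u < 0}$
$R(C_+, C_-, w, b) = C_+ E\{1_{y=1}\, \phi_{0-1}(w^\top x + b)\} + C_- E\{1_{y=-1}\, \phi_{0-1}(-w^\top x - b)\}$
The 0-1 loss is the real loss function, but it is non-convex and
non-differentiable.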
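As an illustration of the expected cost on this slide, the following minimal sketch estimates it empirically from a handful of decision values. The function names, the toy scores, and the cost values are assumptions made for this example and do not come from the presentation.

```python
def zero_one_loss(u):
    """0-1 loss on a signed margin u = y * f(x): 1 if misclassified (u < 0)."""
    return 1.0 if u < 0 else 0.0


def empirical_cost(scores, labels, c_pos, c_neg):
    """Average asymmetric 0-1 cost over a sample.

    scores: decision values f(x_i); labels: +1 or -1.
    c_pos:  cost charged for a false negative (a missed positive).
    c_neg:  cost charged for a false positive.
    Ties (a score of exactly 0) are counted as correct in this sketch.
    """
    total = 0.0
    for f, y in zip(scores, labels):
        weight = c_pos if y == 1 else c_neg
        total += weight * zero_one_loss(y * f)
    return total / len(labels)


# Toy data (hypothetical): one false negative and one false positive.
scores = [2.0, -0.5, 0.3, -1.2]
labels = [1, 1, -1, -1]
print(empirical_cost(scores, labels, c_pos=5.0, c_neg=1.0))  # (5 + 1) / 4 = 1.5
```

With symmetric costs the same function reduces to the ordinary misclassification rate.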
5
Introduction (3)
Convex loss functions serve as surrogates for the 0-1 loss function
(for training purposes).
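To make the idea of a convex surrogate concrete, here is a small sketch that tabulates the 0-1 loss next to three common convex surrogates. Which surrogates the original slide plotted is not preserved in this transcript, so the hinge, scaled logistic, and square losses below are assumptions chosen as typical examples.

```python
import math


def zero_one(u):
    return 1.0 if u < 0 else 0.0


def hinge(u):
    return max(0.0, 1.0 - u)


def logistic(u):
    # Scaled by 1 / log(2) so that the loss equals 1 at u = 0.
    return math.log(1.0 + math.exp(-u)) / math.log(2.0)


def square(u):
    return (1.0 - u) ** 2


# Each surrogate is convex in u and upper-bounds the 0-1 loss.
grid = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
print(f"{'u':>5} {'0-1':>6} {'hinge':>6} {'logit':>6} {'square':>6}")
for u in grid:
    print(f"{u:5.1f} {zero_one(u):6.2f} {hinge(u):6.2f} "
          f"{logistic(u):6.2f} {square(u):6.2f}")
```

All three are differentiable almost everywhere, which is what makes them usable for training where the 0-1 loss is not.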
6
Introduction (4)
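Empirical cost given n labeled data points
Objective function: the empirical cost under a convex surrogate loss plus a
regularization term. It is governed by two quantities:
  • the cost asymmetry
  • the amount of regularization

Since convex surrogates of the 0-1 loss function are used for training, the
cost asymmetries for training and testing are mismatched: the training
asymmetry that performs best at test time need not equal the testing
asymmetry.
Motivation: efficiently look at many training asymmetries, even if the
testing asymmetry is given.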
7
SVM with Asymmetric Cost (1)
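hinge loss $\phi(u) = \max(0, 1 - u)$
SVM with asymmetric cost
$\min_{w, b, \xi}\ C_+ \sum_{i \in I_+} \xi_i + C_- \sum_{i \in I_-} \xi_i + \frac{1}{2}\|w\|^2$ subject to $\xi_i \ge 0$, $\xi_i \ge 1 - y_i (w^\top x_i + b)$
where $I_+ = \{i : y_i = 1\}$ and $I_- = \{i : y_i = -1\}$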
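The following sketch shows one way an SVM objective with asymmetric cost could be evaluated and minimized. It assumes the usual form (a hinge loss weighted by `c_pos` on positives and `c_neg` on negatives, plus half the squared norm of `w`) and uses plain subgradient descent, which is a swapped-in simple optimizer and not the path-following method the presentation describes. All names and the toy data are hypothetical.

```python
def hinge(u):
    return max(0.0, 1.0 - u)


def decision(w, b, x):
    """Linear decision function f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def objective(w, b, X, y, c_pos, c_neg):
    """Class-weighted hinge losses plus 0.5 * ||w||^2 (assumed form)."""
    data_term = 0.0
    for xi, yi in zip(X, y):
        ci = c_pos if yi == 1 else c_neg
        data_term += ci * hinge(yi * decision(w, b, xi))
    return data_term + 0.5 * sum(wj * wj for wj in w)


def train(X, y, c_pos, c_neg, lr=0.01, steps=500):
    """Subgradient descent; returns the best (objective, w, b) seen."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    best = (objective(w, b, X, y, c_pos, c_neg), list(w), b)
    for _ in range(steps):
        gw, gb = list(w), 0.0  # gradient of 0.5 * ||w||^2 is w
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:  # hinge term is active
                ci = c_pos if yi == 1 else c_neg
                for j in range(d):
                    gw[j] -= ci * yi * xi[j]
                gb -= ci * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
        val = objective(w, b, X, y, c_pos, c_neg)
        if val < best[0]:
            best = (val, list(w), b)
    return best


# Toy data (hypothetical): two positives, two negatives in the plane.
X = [[2.0, 2.0], [1.5, 1.0], [-1.0, -1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
start = objective([0.0, 0.0], 0.0, X, y, c_pos=2.0, c_neg=1.0)
best_val, w, b = train(X, y, c_pos=2.0, c_neg=1.0)
print(start, best_val, w, b)
```

Raising `c_pos` relative to `c_neg` penalizes margin violations on positive examples more heavily, which pushes the decision boundary away from the positive class.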
8
SVM with Asymmetric Cost (2)
The Lagrangian with dual variables
Karush-Kuhn-Tucker (KKT) conditions
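For reference, a standard form of these two objects can be written out. The block below assumes the usual soft-margin primal with per-example costs ($C_i = C_+$ for positive examples and $C_i = C_-$ for negative examples), dual variables $\alpha_i$ for the margin constraints, and $\beta_i$ for the constraints $\xi_i \ge 0$. The slide's own notation and scaling are not preserved in this transcript, so every symbol here is an assumption.

```latex
% Assumed primal:
%   min_{w,b,\xi}  \sum_i C_i \xi_i + \tfrac{1}{2}\|w\|^2
%   s.t.  \xi_i \ge 0,  \xi_i \ge 1 - y_i (w^\top x_i + b)
\begin{aligned}
L(w, b, \xi, \alpha, \beta)
  &= \sum_i C_i \xi_i + \tfrac{1}{2}\|w\|^2
   + \sum_i \alpha_i \bigl(1 - y_i (w^\top x_i + b) - \xi_i\bigr)
   - \sum_i \beta_i \xi_i \\
\text{stationarity:}\quad
  & w = \sum_i \alpha_i y_i x_i, \qquad
    \sum_i \alpha_i y_i = 0, \qquad
    \alpha_i + \beta_i = C_i \\
\text{feasibility:}\quad
  & \alpha_i \ge 0,\ \ \beta_i \ge 0,\ \ \xi_i \ge 0,\ \
    \xi_i \ge 1 - y_i (w^\top x_i + b) \\
\text{complementary slackness:}\quad
  & \alpha_i \bigl(1 - y_i (w^\top x_i + b) - \xi_i\bigr) = 0, \qquad
    \beta_i \xi_i = 0
\end{aligned}
```

Under these assumptions each $\alpha_i$ lies in $[0, C_i]$, so the cost asymmetry shows up in the dual only through the per-class upper bounds.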
9
SVM with Asymmetric Cost (3)
The dual problem
where
For a given cost structure, this is a quadratic optimization problem.
Solving it separately at every point of the whole cost space would be
computationally intractable.
Following the SVM regularization path algorithm (Hastie et al., 2005), the
authors work with (1)-(3) and the KKT conditions instead of the dual
problem.
10
SVM Regularization Path (1)
  • Define active sets of data points
  • Margin M: $y_i f(x_i) = 1$
  • Left of margin L: $y_i f(x_i) < 1$
  • Right of margin R: $y_i f(x_i) > 1$

KKT conditions
SVM regularization path
The cost is symmetric, so the search runs along a single axis (the
regularization parameter).
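The three active sets on this slide can be computed directly from the signed margins $y_i f(x_i)$. The sketch below is a minimal illustration; the function name, the tolerance `tol`, and the example margins are assumptions, and the numerical tolerance is needed only because floating-point margins rarely equal 1 exactly.

```python
def partition_active_sets(margins, tol=1e-8):
    """Split example indices into (M, L, R) by signed margin y_i * f(x_i).

    M: on the margin        (margin == 1, up to tol)
    L: left of the margin   (margin < 1)
    R: right of the margin  (margin > 1)
    """
    on_margin, left, right = [], [], []
    for i, m in enumerate(margins):
        if abs(m - 1.0) <= tol:
            on_margin.append(i)
        elif m < 1.0:
            left.append(i)
        else:
            right.append(i)
    return on_margin, left, right


# Hypothetical signed margins for five training points.
margins = [1.0, 0.4, 2.3, -0.7, 1.0]
M, L, R = partition_active_sets(margins)
print(M, L, R)  # [0, 4] [1, 3] [2]
```

A path-following algorithm only needs to track how these index sets change, since the solution varies smoothly between changes.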
11
SVM Regularization Path (2)
Initialization ( )
Consider a sufficiently large $\lambda$ (C is very small): all the points
are in L, with $\alpha_i = 1$
Decrease $\lambda$
The points remain in L until one or more positive and negative examples hit
the margin simultaneously
12
SVM Regularization Path (3)
Initialization ( )
Define
The critical condition for first two points
hitting the margin
For , this initial condition stays
the same except for the definition of .
13
SVM Regularization Path (4)
  • Along the path, as $\lambda$ decreases, $\alpha_i$ changes only for the
    points in M, until one of the following events happens:
  • A point from L or R has entered M
  • A point in M has left the set to join either R
    or L

consider only the points on the margin
where is some function of
,
Therefore, the $\alpha_i$ for points on the margin proceed linearly in
$\lambda$, and the function $f(x)$ changes in a piecewise-inverse manner in
$\lambda$.
15
SVM Regularization Path (5)
  • Update regularization
  • Update active sets and solutions
  • Stopping condition
  • In the separable case, we terminate when
    L becomes empty
  • In the non-separable case, we terminate
    when

for all the possible events
16
Path with Cost Asymmetry (1)
Exploration in the 2-d space
Path initialization: start at situations where all points are in L
Follow the updating procedure of the 1-d case along a line on which the
regularization changes while the cost asymmetry stays fixed.
Among all the classifiers, find the best one, given the user's cost function
Paths starting from
17
Path with Cost Asymmetry (2)
Produce ROC
Collecting R lines in the direction of
, we can build three ROC curves
18
Results (1)
  • For 1000 testing asymmetries, three methods
    are compared:
  • one: take the testing asymmetry as the training cost asymmetry
  • int: vary the intercept of "one" and build an
    ROC, then select the optimal classifier
  • all: select the optimal classifier from the
    ROC obtained by varying both the training
    asymmetry and the intercept.
  • Use a nested cross-validation:
  • The outer cross-validation produces overall
    accuracy estimates for the classifier
  • The inner cross-validation selects the optimal
    classifier parameters (training asymmetry and/or
    intercept).
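The nested cross-validation described above can be sketched as two loops: the inner one selects a parameter, the outer one scores that choice on data it never saw. The toy below tunes only an intercept for a one-dimensional threshold classifier, which has no fitted weights, so the inner folds are used purely for validation; a real experiment would also fit the classifier on the inner training folds. All names, the fold scheme, and the data are assumptions for illustration.

```python
def k_folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def cost(threshold, xs, ys, c_pos, c_neg):
    """Average asymmetric cost of the rule: predict +1 if x >= threshold."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = 1 if x >= threshold else -1
        if y == 1 and pred == -1:
            total += c_pos  # false negative
        elif y == -1 and pred == 1:
            total += c_neg  # false positive
    return total / len(ys)


def nested_cv(xs, ys, candidates, c_pos, c_neg, k_outer=3, k_inner=2):
    """Outer folds estimate test cost; inner folds pick the intercept."""
    n = len(xs)
    outer_costs = []
    for test_idx in k_folds(n, k_outer):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]

        def inner_score(t):
            fold_costs = []
            for fold in k_folds(len(train_idx), k_inner):
                val = [train_idx[j] for j in fold]
                fold_costs.append(cost(t, [xs[i] for i in val],
                                       [ys[i] for i in val], c_pos, c_neg))
            return sum(fold_costs) / len(fold_costs)

        chosen = min(candidates, key=inner_score)
        outer_costs.append(cost(chosen, [xs[i] for i in test_idx],
                                [ys[i] for i in test_idx], c_pos, c_neg))
    return sum(outer_costs) / len(outer_costs)


# Hypothetical separable data and three candidate intercepts.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [-1, -1, -1, 1, 1, 1]
result = nested_cv(xs, ys, candidates=[-2.5, 0.0, 2.5], c_pos=5.0, c_neg=1.0)
print(result)
```

Keeping the selection step inside the outer loop is what prevents the reported cost from being optimistically biased by the parameter search.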

19
Results (2)
20
Conclusions
  • An efficient algorithm is presented to build ROC
    curves by varying the training cost asymmetries
    for SVMs.
  • The main contribution is generalizing the SVM
    regularization path (Hastie et al., 2005) from a
    1-d axis to a 2-d plane.
  • Because a convex surrogate is used, training
    with the testing asymmetry can lead to a
    non-optimal classifier.
  • Results show advantages of considering more
    training asymmetries.