Title: Minimal Kernel Classifiers
1 Minimal Kernel Classifiers
INFORMS 2002, San Jose, California, Nov 17-20, 2002
- Glenn Fung
- Olvi Mangasarian
- Alexander Smola
Data Mining Institute, University of Wisconsin - Madison
2 Outline of Talk
- Linear Support Vector Machines (SVM)
- Linear separating surface
- Quadratic programming (QP) formulation
- Linear programming (LP) formulation
- Nonlinear Support Vector Machines
- Nonlinear kernel separating surface
- LP formulation
- The Minimal Kernel Classifier (MKC)
- The pound (#) loss function
- MKC Algorithm
- Numerical experiments
- Conclusion
3 What is a Support Vector Machine?
- An optimally defined surface
- Linear or nonlinear in the input space
- Linear in a higher dimensional feature space
- Implicitly defined by a kernel function (one-line reminder below)
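As a quick aside (standard material, not from the slide itself), the kernel identity behind the last two bullets can be written as:

```latex
% A kernel implicitly defines a feature map \phi into a higher
% dimensional space; a plane that is linear in \phi(x) is a
% nonlinear surface in the input x:
K(x, z) = \langle \phi(x), \phi(z) \rangle
```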
4 What are Support Vector Machines Used For?
- Classification
- Regression (Data Fitting)
- Supervised and Unsupervised Learning
5 Generalized Support Vector Machines: 2-Category Linearly Separable Case
[Figure: a separating plane between the two point sets A+ and A-]
6 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: the two bounding planes around the sets A+ and A-, with the margin between them]
7 Support Vector Machine Formulation: Algebra of the 2-Category Linearly Separable Case
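The slide's algebra is missing from the transcript; a hedged reconstruction in the standard notation of this line of work (the rows A_i of A are the points, D is the diagonal matrix of +/-1 labels, e is a vector of ones):

```latex
% Bounding planes for the two classes:
x'w = \gamma + 1, \qquad x'w = \gamma - 1
% Linear separation written compactly:
D(Aw - e\gamma) \ge e
% Margin between the bounding planes:
\frac{2}{\|w\|}
```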
8 QP Support Vector Machine Formulation
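The QP itself is not in the transcript; the standard soft-margin formulation this slide presumably shows, with slack vector y and weight ν > 0 on the error term:

```latex
\min_{w,\gamma,y}\; \nu\, e'y + \tfrac{1}{2}\, w'w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0
```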
9 Support Vector Machines: Linear Programming Formulation
- Use the 1-norm instead of the 2-norm
- This is equivalent to the linear program below
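A hedged reconstruction of the missing equations: replacing the 2-norm of w by its 1-norm, and bounding |w| componentwise by a new variable s, gives the LP:

```latex
\min_{w,\gamma,y}\; \nu\, e'y + \|w\|_1
\;\;\Longleftrightarrow\;\;
\min_{w,\gamma,y,s}\; \nu\, e'y + e's
\;\;\text{s.t.}\;\; D(Aw - e\gamma) + y \ge e,\;\; -s \le w \le s,\;\; y \ge 0
```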
10 Nonlinear Kernel LP Formulation
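Again reconstructed, since the transcript omits the formulas: setting w = A'Du and replacing AA' by a kernel K(A, A') yields the nonlinear kernel LP:

```latex
\min_{u,\gamma,y,s}\; \nu\, e'y + e's
\quad \text{s.t.} \quad D\bigl(K(A,A')Du - e\gamma\bigr) + y \ge e,
\quad -s \le u \le s, \quad y \ge 0
```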
11 The Nonlinear Classifier
- Where K is a nonlinear kernel, e.g. the Gaussian kernel written out below
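The classifier and the kernel example are missing from the transcript; a hedged reconstruction in the same notation, with the Gaussian kernel as the example (μ > 0):

```latex
% Nonlinear separating surface and the resulting classifier:
K(x', A')\,Du = \gamma, \qquad
x \;\mapsto\; \operatorname{sign}\bigl(K(x', A')\,Du - \gamma\bigr)
% Gaussian (radial basis) kernel:
K(A, A')_{ij} = \exp\bigl(-\mu\, \|A_i - A_j\|^2\bigr)
```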
12 Nonlinear PSVM: Spiral Dataset (94 Red Dots, 94 White Dots)
13 Model Simplification
- Goal 1: Minimize the number of kernel functions used.
- Why? Simplifies the separating surface.
- Goal 2: Minimize the number of active constraints.
- Why? Reduces data dependence.
- Useful for massive incremental classification.
14 Model Simplification, Goal 1: Simplifying the Separating Surface
15 Model Simplification, Goal 2: Minimizing Data Dependence
16 Achieving Model Simplification: Minimal Kernel Classifier Formulation
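The formulation is missing from the transcript. A loosely hedged sketch, consistent with the two goals above but not necessarily the authors' exact objective: replace the sums over the slacks y and kernel coefficients u by counts, using the pound function # defined on the next slide:

```latex
\min_{u,\gamma,y}\; \nu\, e'\,\#(y) + e'\,\#(|u|)
\quad \text{s.t.} \quad D\bigl(K(A,A')Du - e\gamma\bigr) + y \ge e, \quad y \ge 0
```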
17 The # (Pound) Loss Function
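The definition is not in the transcript; the standard pound (step) function used in this literature, applied componentwise to vectors:

```latex
\#(t) =
\begin{cases}
1 & t \ne 0 \\
0 & t = 0
\end{cases}
\qquad\text{so}\qquad
e'\,\#(u) = \text{number of nonzero components of } u
```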
18 Approximating the Pound Loss Function
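Also reconstructed: the concave exponential approximation Mangasarian uses for such step functions elsewhere, which is what turns the problem on the next slide into a concave minimization (α > 0):

```latex
\#(|t|) \;\approx\; 1 - \exp(-\alpha\, |t|), \qquad \alpha > 0
% concave and nondecreasing in |t|; approaches #(|t|) as alpha grows
```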
19 Minimal Kernel Classifier as a Concave Minimization Problem
- This problem can be solved effectively using the finite Successive Linearization Algorithm (SLA) (Mangasarian 1996).
20 Minimal Kernel Algorithm (SLA)
21 Minimal Kernel Algorithm (SLA)
- Each iteration of the algorithm solves a linear program.
- The algorithm terminates in a finite number of iterations (typically 5 to 7).
- The solution obtained satisfies the Minimum Principle necessary optimality condition.
- A hedged implementation sketch follows below.
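A minimal runnable sketch of an SLA of this kind, under stated assumptions: the objective is the hedged one sketched on slide 16 (ν e'y plus the concave exponential surrogate for #(|u|)); the Gaussian kernel, the parameters nu, alpha, mu, and the helper gaussian_kernel are illustrative, not the authors' exact setup.

```python
import numpy as np
from scipy.optimize import linprog

def gaussian_kernel(A, B, mu=1.0):
    """K_ij = exp(-mu * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

def sla_mkc(A, d, nu=1.0, alpha=5.0, mu=1.0, max_iter=20, tol=1e-6):
    """Successive Linearization Algorithm for
       min  nu*e'y + e'(1 - exp(-alpha*s))
       s.t. D(K(A,A')Du - e*gamma) + y >= e, -s <= u <= s, y >= 0.
       The concave term is linearized at the current s, so each
       iteration solves one linear program."""
    m = A.shape[0]
    K = gaussian_kernel(A, A, mu)
    DKD = d[:, None] * K * d[None, :]            # D K D, with d = +/-1 labels
    # variable layout: x = [u (m) | gamma (1) | y (m) | s (m)]
    # margin constraints rewritten as <=:  -DKD u + gamma*d - y <= -e
    A1 = np.hstack([-DKD, d[:, None], -np.eye(m), np.zeros((m, m))])
    # |u| <= s as two inequalities: u - s <= 0 and -u - s <= 0
    A2 = np.hstack([np.eye(m), np.zeros((m, 1)), np.zeros((m, m)), -np.eye(m)])
    A3 = np.hstack([-np.eye(m), np.zeros((m, 1)), np.zeros((m, m)), -np.eye(m)])
    A_ub = np.vstack([A1, A2, A3])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * m)])
    bounds = [(None, None)] * (m + 1) + [(0, None)] * (2 * m)  # u, gamma free
    s, obj_old = np.zeros(m), np.inf
    for _ in range(max_iter):
        grad = alpha * np.exp(-alpha * s)        # gradient of concave part at s
        c = np.concatenate([np.zeros(m + 1), nu * np.ones(m), grad])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        u, gamma = res.x[:m], res.x[m]
        y, s = res.x[m + 1:2 * m + 1], res.x[2 * m + 1:]
        obj = nu * y.sum() + (1.0 - np.exp(-alpha * s)).sum()
        if obj_old - obj < tol:                  # finite termination in practice
            break
        obj_old = obj
    return u, gamma
```

Only the rows of A with u_i != 0 are needed at test time: a new point x would be classified by sign(gaussian_kernel(x[None, :], A[u != 0], mu) @ (d[u != 0] * u[u != 0]) - gamma), which is where the reduction in kernel data dependence comes from.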
23 Checkerboard Separating Surface
[Figure: checkerboard dataset separating surface; # of kernel functions: 27, # of active constraints: 30]
24 Numerical Experiments: Results for Six Public Datasets
25 Conclusion
- A finite algorithm generating a classifier that depends on only a fraction of the input data.
- Important for fast online testing of unseen data, e.g. fraud or intrusion detection.
- Useful for incremental training on massive data.
- The overall algorithm consists of solving 5 to 7 LPs.
- Kernel data dependence reduced by up to 98.8% relative to the data used by a standard SVM.
- Testing time reduced by up to 98.2%.
- MKC testing set correctness is comparable to that of a more complex standard SVM.