Title: Minimal Kernel Classifiers
1 Minimal Kernel Classifiers
INFORMS 2002, San Jose, California, Nov 17-20, 2002
- Glenn Fung
- Olvi Mangasarian
- Alexander Smola
Data Mining Institute, University of Wisconsin - Madison
2 Outline of Talk
- Linear Support Vector Machines (SVM)
- Linear separating surface
- Quadratic programming (QP) formulation
- Linear programming (LP) formulation
- Nonlinear Support Vector Machines
- Nonlinear kernel separating surface
- LP formulation
- The Minimal Kernel Classifier (MKC)
- The pound (#) loss function
- MKC Algorithm
- Numerical experiments
- Conclusion
3 What is a Support Vector Machine?
- An optimally defined surface
- Linear or nonlinear in the input space
- Linear in a higher dimensional feature space
- Implicitly defined by a kernel function (one-line reminder below)
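As a quick aside (standard material, not from the slide itself), the kernel identity behind the last two bullets can be written as:

```latex
% A kernel implicitly defines a feature map \phi into a higher
% dimensional space; a plane that is linear in \phi(x) is a
% nonlinear surface in the input x:
K(x, z) = \langle \phi(x), \phi(z) \rangle
```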
4 What are Support Vector Machines Used For?
- Classification
- Regression (Data Fitting)
- Supervised and Unsupervised Learning
5 Generalized Support Vector Machines: 2-Category Linearly Separable Case
[Figure: a separating plane between the two point sets A+ and A-]
6 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: the two bounding planes around the sets A+ and A-, with the margin between them]
7 Support Vector Machine Formulation: Algebra of the 2-Category Linearly Separable Case
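The slide's algebra is missing from the transcript; a hedged reconstruction in the standard notation of this line of work (the rows A_i of A are the points, D is the diagonal matrix of +/-1 labels, e is a vector of ones):

```latex
% Bounding planes for the two classes:
x'w = \gamma + 1, \qquad x'w = \gamma - 1
% Linear separation written compactly:
D(Aw - e\gamma) \ge e
% Margin between the bounding planes:
\frac{2}{\|w\|}
```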
8 QP Support Vector Machine Formulation
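The QP itself is not in the transcript; the standard soft-margin formulation this slide presumably shows, with slack vector y and weight ν > 0 on the error term:

```latex
\min_{w,\gamma,y}\; \nu\, e'y + \tfrac{1}{2}\, w'w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0
```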
9 Support Vector Machines: Linear Programming Formulation
- Use the 1-norm instead of the 2-norm
- This is equivalent to the linear program below
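A hedged reconstruction of the missing equations: replacing the 2-norm of w by its 1-norm, and bounding |w| componentwise by a new variable s, gives the LP:

```latex
\min_{w,\gamma,y}\; \nu\, e'y + \|w\|_1
\;\;\Longleftrightarrow\;\;
\min_{w,\gamma,y,s}\; \nu\, e'y + e's
\;\;\text{s.t.}\;\; D(Aw - e\gamma) + y \ge e,\;\; -s \le w \le s,\;\; y \ge 0
```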
10 Nonlinear Kernel LP Formulation
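Again reconstructed, since the transcript omits the formulas: setting w = A'Du and replacing AA' by a kernel K(A, A') yields the nonlinear kernel LP:

```latex
\min_{u,\gamma,y,s}\; \nu\, e'y + e's
\quad \text{s.t.} \quad D\bigl(K(A,A')Du - e\gamma\bigr) + y \ge e,
\quad -s \le u \le s, \quad y \ge 0
```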
11 The Nonlinear Classifier
- Where K is a nonlinear kernel, e.g. the Gaussian kernel written out below
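The classifier and the kernel example are missing from the transcript; a hedged reconstruction in the same notation, with the Gaussian kernel as the example (μ > 0):

```latex
% Nonlinear separating surface and the resulting classifier:
K(x', A')\,Du = \gamma, \qquad
x \;\mapsto\; \operatorname{sign}\bigl(K(x', A')\,Du - \gamma\bigr)
% Gaussian (radial basis) kernel:
K(A, A')_{ij} = \exp\bigl(-\mu\, \|A_i - A_j\|^2\bigr)
```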
12 Nonlinear PSVM: Spiral Dataset (94 Red Dots, 94 White Dots)
13 Model Simplification
- Goal 1: Minimize the number of kernel functions used.
- Why? Simplifies the separating surface.
- Goal 2: Minimize the number of active constraints.
- Why? Reduces data dependence.
- Useful for massive incremental classification.
14 Model Simplification, Goal 1: Simplifying the Separating Surface
15 Model Simplification, Goal 2: Minimizing Data Dependence
16 Achieving Model Simplification: Minimal Kernel Classifier Formulation
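The formulation is missing from the transcript. A loosely hedged sketch, consistent with the two goals above but not necessarily the authors' exact objective: replace the sums over the slacks y and kernel coefficients u by counts, using the pound function # defined on the next slide:

```latex
\min_{u,\gamma,y}\; \nu\, e'\,\#(y) + e'\,\#(|u|)
\quad \text{s.t.} \quad D\bigl(K(A,A')Du - e\gamma\bigr) + y \ge e, \quad y \ge 0
```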
17 The # (Pound) Loss Function
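The definition is not in the transcript; the standard pound (step) function used in this literature, applied componentwise to vectors:

```latex
\#(t) =
\begin{cases}
1 & t \ne 0 \\
0 & t = 0
\end{cases}
\qquad\text{so}\qquad
e'\,\#(u) = \text{number of nonzero components of } u
```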
18 Approximating the Pound Loss Function
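Also reconstructed: the concave exponential approximation Mangasarian uses for such step functions elsewhere, which is what turns the problem on the next slide into a concave minimization (α > 0):

```latex
\#(|t|) \;\approx\; 1 - \exp(-\alpha\, |t|), \qquad \alpha > 0
% concave and nondecreasing in |t|; approaches #(|t|) as alpha grows
```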
19 Minimal Kernel Classifier as a Concave Minimization Problem
- This problem can be solved effectively using the finite Successive Linearization Algorithm (SLA) (Mangasarian 1996).
20 Minimal Kernel Algorithm (SLA)
21 Minimal Kernel Algorithm (SLA)
- Each iteration of the algorithm solves a linear program.
- The algorithm terminates in a finite number of iterations (typically 5 to 7).
- The solution obtained satisfies the Minimum Principle necessary optimality condition.
- A hedged implementation sketch follows below.
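A minimal runnable sketch of an SLA of this kind, under stated assumptions: the objective is the hedged one sketched on slide 16 (ν e'y plus the concave exponential surrogate for #(|u|)); the Gaussian kernel, the parameters nu, alpha, mu, and the helper gaussian_kernel are illustrative, not the authors' exact setup.

```python
import numpy as np
from scipy.optimize import linprog

def gaussian_kernel(A, B, mu=1.0):
    """K_ij = exp(-mu * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

def sla_mkc(A, d, nu=1.0, alpha=5.0, mu=1.0, max_iter=20, tol=1e-6):
    """Successive Linearization Algorithm for
       min  nu*e'y + e'(1 - exp(-alpha*s))
       s.t. D(K(A,A')Du - e*gamma) + y >= e, -s <= u <= s, y >= 0.
       The concave term is linearized at the current s, so each
       iteration solves one linear program."""
    m = A.shape[0]
    K = gaussian_kernel(A, A, mu)
    DKD = d[:, None] * K * d[None, :]            # D K D, with d = +/-1 labels
    # variable layout: x = [u (m) | gamma (1) | y (m) | s (m)]
    # margin constraints rewritten as <=:  -DKD u + gamma*d - y <= -e
    A1 = np.hstack([-DKD, d[:, None], -np.eye(m), np.zeros((m, m))])
    # |u| <= s as two inequalities: u - s <= 0 and -u - s <= 0
    A2 = np.hstack([np.eye(m), np.zeros((m, 1)), np.zeros((m, m)), -np.eye(m)])
    A3 = np.hstack([-np.eye(m), np.zeros((m, 1)), np.zeros((m, m)), -np.eye(m)])
    A_ub = np.vstack([A1, A2, A3])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * m)])
    bounds = [(None, None)] * (m + 1) + [(0, None)] * (2 * m)  # u, gamma free
    s, obj_old = np.zeros(m), np.inf
    for _ in range(max_iter):
        grad = alpha * np.exp(-alpha * s)        # gradient of concave part at s
        c = np.concatenate([np.zeros(m + 1), nu * np.ones(m), grad])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        u, gamma = res.x[:m], res.x[m]
        y, s = res.x[m + 1:2 * m + 1], res.x[2 * m + 1:]
        obj = nu * y.sum() + (1.0 - np.exp(-alpha * s)).sum()
        if obj_old - obj < tol:                  # finite termination in practice
            break
        obj_old = obj
    return u, gamma
```

Only the rows of A with u_i != 0 are needed at test time: a new point x would be classified by sign(gaussian_kernel(x[None, :], A[u != 0], mu) @ (d[u != 0] * u[u != 0]) - gamma), which is where the reduction in kernel data dependence comes from.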
23 Checkerboard Separating Surface
[Figure: checkerboard dataset separating surface; # of kernel functions: 27, # of active constraints: 30]
24 Numerical Experiments: Results for Six Public Datasets
25 Conclusion
- A finite algorithm generating a classifier that depends on only a fraction of the input data.
- Important for fast online testing of unseen data, e.g. fraud or intrusion detection.
- Useful for incremental training on massive data.
- The overall algorithm consists of solving 5 to 7 LPs.
- Kernel data dependence reduced by up to 98.8% relative to the data used by a standard SVM.
- Testing time reduced by up to 98.2%.
- MKC testing set correctness is comparable to that of a more complex standard SVM.