Title: Reduced Data Classifiers via Support Vector Machines. SIAM International Conference on Data Mining, Chicago, April 5-7, 2001
1 Reduced Data Classifiers via Support Vector Machines
SIAM International Conference on Data Mining, Chicago, April 5-7, 2001
- O. L. Mangasarian and Y. J. Lee
Data Mining Institute, University of Wisconsin - Madison
Second Annual Review June 1, 2001
2 Key Objective
3 Outline of Talk
- What is a support vector machine (SVM)?
- What is a smooth support vector machine (SSVM)?
- An SVM solvable without optimization software (LP, QP)
- Difficulties with nonlinear SVM classifiers
- Storage: classifier depends on almost the entire dataset
- Reduced Support Vector Machines (RSVMs)
- Speeds computation, reduces storage
4 What is a Support Vector Machine?
- An optimally defined surface
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function
5 What are Support Vector Machines Used For?
- Classification
- Regression (Data Fitting)
- Supervised and Unsupervised Learning
(Will concentrate on classification)
6 Geometry of the Classification Problem: 2-Category Linearly Separable Case
(figure: the two point sets A+ and A-)
7 Support Vector Machines: Algebra of the 2-Category Linearly Separable Case
8 Support Vector Machines: Maximizing the Margin between Bounding Planes
(figure: bounding planes and margin between point sets A+ and A-)
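The algebra of slides 7-8 did not survive extraction; the following is a standard reconstruction of the bounding planes and margin for the linearly separable case, with notation assumed to follow the talk (point sets A+ and A-, normal w, threshold gamma):

```latex
% Bounding planes x' w = \gamma \pm 1 for the two point sets:
\begin{aligned}
  A_i w &\ge \gamma + 1, \quad A_i \in A+,\\
  A_i w &\le \gamma - 1, \quad A_i \in A-,
\end{aligned}
\qquad
\text{margin between the planes} \;=\; \frac{2}{\lVert w \rVert_2}.
```

Maximizing the margin is thus equivalent to minimizing \(\lVert w \rVert\) subject to the two sets of plane constraints.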
9 Support Vector Machine Formulation
10 SVM as an Unconstrained Minimization Problem
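The equation on slide 10 was lost in extraction; the unconstrained form is likely the one from the authors' SSVM work, reconstructed here with e a vector of ones, D the diagonal matrix of ±1 labels, and \((\cdot)_+\) the plus function \(\max(\cdot, 0)\):

```latex
\min_{(w,\gamma)} \;
  \frac{\nu}{2}\,\bigl\lVert \bigl(e - D(Aw - e\gamma)\bigr)_+ \bigr\rVert_2^2
  \;+\; \frac{1}{2}\bigl(w^{\mathsf T} w + \gamma^2\bigr)
```

The plus function makes this objective non-smooth, which motivates the smoothing step on the next slide.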
11 Smoothing the Plus Function: Integrate the Sigmoid Function
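Integrating the sigmoid \(1/(1+e^{-\alpha x})\) gives the smooth approximation \(p(x,\alpha) = x + \frac{1}{\alpha}\log(1 + e^{-\alpha x})\) of the plus function. A minimal sketch of this approximation (the function names and the choice alpha = 10 are illustrative):

```python
import numpy as np

def plus(x):
    """Plus function (x)_+ = max(x, 0)."""
    return np.maximum(x, 0.0)

def smooth_plus(x, alpha=10.0):
    """Smooth approximation p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha*x)),
    obtained by integrating the sigmoid 1/(1 + exp(-alpha*x))."""
    # logaddexp(0, -alpha*x) computes log(1 + exp(-alpha*x)) without overflow
    return x + np.logaddexp(0.0, -alpha * x) / alpha

xs = np.linspace(-3.0, 3.0, 7)
gap = np.max(np.abs(smooth_plus(xs, 10.0) - plus(xs)))
print(gap)  # largest gap is log(2)/alpha, attained at x = 0
```

The approximation error is largest at the kink x = 0, where it equals log(2)/alpha, so increasing alpha tightens the smoothing.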
12 SSVM: The Smooth Support Vector Machine
13 Nonlinear Smooth Support Vector Machine: Nonlinear Separating Surface (Instead of a Linear Surface)
- Use the Newton algorithm to solve the problem
- The nonlinear separating surface depends on the entire dataset
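Since the slide equations were lost, here is a minimal sketch of training a linear SSVM on toy data. It minimizes the smoothed objective (nu/2)·||p(e - D(Aw - e·gamma), alpha)||² + (1/2)(w'w + gamma²) by plain gradient descent rather than the Newton method the talk uses; the data, parameter values (nu, alpha, the learning rate), and function names are all illustrative assumptions:

```python
import numpy as np

def smooth_plus(x, alpha):
    """p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha*x))."""
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -50.0, 50.0)))

def ssvm_fit(A, d, nu=1.0, alpha=5.0, lr=0.001, iters=5000):
    """Gradient descent on the smoothed SVM objective (Newton in the talk)."""
    m, n = A.shape
    w, gamma = np.zeros(n), 0.0
    for _ in range(iters):
        r = 1.0 - d * (A @ w - gamma)       # residuals e - D(Aw - e*gamma)
        s = smooth_plus(r, alpha)           # smoothed plus of the residuals
        coef = nu * s * sigmoid(alpha * r)  # chain rule: p'(r) = sigmoid(alpha*r)
        gw = -(A * (coef * d)[:, None]).sum(axis=0) + w
        gg = (coef * d).sum() + gamma
        w -= lr * gw
        gamma -= lr * gg
    return w, gamma

# toy linearly separable 2-D data with ±1 labels
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2.0, 0.5, size=(30, 2)),
               rng.normal(-2.0, 0.5, size=(30, 2))])
d = np.array([1.0] * 30 + [-1.0] * 30)

w, gamma = ssvm_fit(A, d)
pred = np.sign(A @ w - gamma)
print("training accuracy:", (pred == d).mean())
```

Newton's method converges in far fewer iterations on this smooth, strongly convex objective; gradient descent is used here only to keep the sketch short.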
14 Examples of Kernels
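The kernel examples on slide 14 were lost in extraction; the two kernels most commonly used in this line of work, the Gaussian (RBF) kernel and the polynomial kernel, can be sketched as follows (parameter names mu, c, and degree are illustrative):

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.5):
    """K(A, B')_{ij} = exp(-mu * ||A_i - B_j||^2)  (Gaussian / RBF kernel)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * sq)

def polynomial_kernel(A, B, c=1.0, degree=2):
    """K(A, B')_{ij} = (A_i . B_j + c)^degree  (polynomial kernel)."""
    return (A @ B.T + c) ** degree

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(A, A)
print(K.shape)  # (3, 3)
```

With B = A these give the usual square m x m kernel matrix; the rectangular kernel of the next slides simply evaluates the same functions with B a small subset of A's rows.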
15 Difficulties with Nonlinear SVM for Large Problems
- Separating surface depends on almost the entire dataset
- Need to store the entire dataset after solving the problem
16 Overcoming Computational and Storage Difficulties: Use a Rectangular Kernel
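The rectangular-kernel idea can be made concrete: instead of the full m x m kernel K(A, A'), RSVM evaluates K(A, Abar') where Abar is a small random subset of the rows of A, giving an m x mbar matrix. A minimal sketch, with the Gaussian kernel, the subset size, and all names chosen for illustration:

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.5):
    """K(A, B')_{ij} = exp(-mu * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * sq)

rng = np.random.default_rng(42)
m, n = 1000, 2
A = rng.normal(size=(m, n))

# Full kernel would be m x m = 1,000,000 entries.
# Rectangular kernel: pick mbar << m rows at random to form Abar.
mbar = 50
Abar = A[rng.choice(m, size=mbar, replace=False)]
K_rect = gaussian_kernel(A, Abar)  # m x mbar = 50,000 entries
print(K_rect.shape)  # (1000, 50)
```

The resulting classifier only needs the mbar rows of Abar at test time, which is the source of both the storage and the speed savings claimed on the later slides.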
17 Reduced Support Vector Machine Algorithm: Nonlinear Separating Surface
18 How to Choose the Reduced Set in RSVM?
19 A Nonlinear Kernel Application: Checkerboard Training Set of 1000 Points; Separate 486 Asterisks from 514 Dots
20 Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000
21 RSVM Result on Checkerboard Using the SAME 50 Random Points Out of 1000
22 RSVM on Moderately Sized Problems (Best Test Set Correctness, CPU Seconds)
23 RSVM on Large UCI Adult Dataset (Standard Deviation over 50 Runs: 0.001)
24 CPU Times on UCI Adult Dataset: RSVM, SMO, and PCGC with a Gaussian Kernel
25 CPU Time Comparison on UCI Dataset: RSVM, SMO, and PCGC with a Gaussian Kernel
(plot: CPU time in seconds vs. training set size)
26 Conclusion
- RSVM: an effective classifier for large datasets
- Classifier uses 10% or less of the dataset
- Can handle massive datasets
- Much faster than other algorithms
- Applicable to all nonlinear kernel problems