Title: RSVM: Reduced Support Vector Machines
1 RSVM: Reduced Support Vector Machines
- Y.-J. Lee and O. L. Mangasarian
University of Wisconsin-Madison
First SIAM International Conference on Data
Mining, Chicago, April 6, 2001
2 Outline of Talk
- What is a support vector machine (SVM)
classifier?
- The smooth support vector machine (SSVM)
- A new SVM solvable without an optimization
package
- Difficulties with nonlinear SVMs
- Storage: Separating surface depends on almost
entire dataset
- Reduced Support Vector Machines (RSVMs)
- Speeds computation and reduces storage
3 What is a Support Vector Machine?
- An optimally defined surface
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function
4 What are Support Vector Machines Used For?
- Classification
- Regression and Data Fitting
- Supervised and Unsupervised Learning
(Will concentrate on classification)
5 Geometry of the Classification Problem: 2-Category
Linearly Separable Case
[Figure: the two point sets A+ and A-]
6 Support Vector Machines: Maximizing the Margin
between Bounding Planes
[Figure: the point sets A+ and A- with their bounding planes]
7 Support Vector Machines Formulation
8 SVM as an Unconstrained Minimization Problem
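The equations on these two slides did not survive extraction. As a hedged reminder of what such a formulation usually looks like, here is the form commonly given in the smooth SVM literature. The notation is an assumption, not recovered from the slides: A is the m x n data matrix, D is the diagonal matrix of +1/-1 labels, e is a vector of ones, y is the slack vector, nu > 0 weights the error term, and (.)_+ replaces negative components by zero.

```latex
\begin{align*}
\text{(constrained SVM)}\quad
  & \min_{w,\gamma,y}\; \nu\, e^{\top} y + \tfrac12\, w^{\top} w
    \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\;\; y \ge 0, \\
\text{(unconstrained form)}\quad
  & \min_{w,\gamma}\; \tfrac{\nu}{2}\,
    \bigl\| \bigl(e - D(Aw - e\gamma)\bigr)_{+} \bigr\|_2^2
    + \tfrac12 \bigl( w^{\top} w + \gamma^2 \bigr).
\end{align*}
```

The second line assumes the usual modification in which the slack is penalized in the squared 2-norm and gamma^2 is added to the regularization term. Under that modification the optimal slack equals (e - D(Aw - e gamma))_+, so the constraints can be eliminated.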
9 SSVM: The Smooth Support Vector Machine
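The slide's own formula is missing from this transcript. In the smooth SVM literature, the non-differentiable plus function is commonly replaced by the smooth approximation p(x, alpha) = x + (1/alpha) log(1 + exp(-alpha x)), which makes the objective twice differentiable so that a Newton method applies. A minimal sketch of that approximation, with the function names and the value alpha = 5 chosen purely for illustration:

```python
import numpy as np

def plus(x):
    """Plus function (x)_+ = max(x, 0), applied elementwise."""
    return np.maximum(x, 0.0)

def smooth_plus(x, alpha=5.0):
    """Smooth approximation p(x, alpha) = x + log(1 + exp(-alpha*x)) / alpha.

    np.logaddexp(0, -alpha*x) evaluates log(1 + exp(-alpha*x)) without overflow.
    """
    x = np.asarray(x, dtype=float)
    return x + np.logaddexp(0.0, -alpha * x) / alpha

x = np.linspace(-2.0, 2.0, 9)
gap = smooth_plus(x, alpha=5.0) - plus(x)  # positive everywhere, largest at x = 0
```

The gap between the two functions peaks at x = 0, where it equals log(2)/alpha, so a larger alpha gives a tighter approximation.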
10 Nonlinear Smooth Support Vector Machine:
Nonlinear Separating Surface
- Use Newton algorithm to solve the problem
- Nonlinear separating surface depends on entire
dataset
11 Examples of Kernels
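The kernel formulas shown on this slide are not preserved here. Two kernels that are standard in SVM work, the Gaussian and the polynomial kernel, can be sketched as follows. The function names and the parameters mu, degree and coef0 are assumptions made for illustration, and each row of A or B is taken to be one data point.

```python
import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    """K(A, B')_ij = exp(-mu * ||A_i - B_j||^2) for rows A_i of A and B_j of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))  # clip tiny negative round-off

def polynomial_kernel(A, B, degree=2, coef0=1.0):
    """K(A, B')_ij = (A_i . B_j + coef0)^degree."""
    return (A @ B.T + coef0) ** degree

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
K = gaussian_kernel(A, A, mu=0.5)   # square 6 x 6 kernel matrix
P = polynomial_kernel(A, A)         # square 6 x 6 kernel matrix
```

Both functions accept two different row sets, which is what allows a kernel to be evaluated between the full dataset and a smaller subset.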
12 Difficulties with Nonlinear SVM for Large
Problems
- Separating surface depends on almost entire
dataset
- Need to store the entire dataset after solving
the problem
13 Overcoming Computational and Storage
Difficulties: Use a Rectangular Kernel
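To make the storage argument concrete, the sketch below builds a rectangular kernel between a full dataset and a small random subset of its rows and compares the entry counts with those of a full square kernel. The sizes (1000 points, 50 kept), the Gaussian kernel, and all variable names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    """K(A, B')_ij = exp(-mu * ||A_i - B_j||^2)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
m, n, m_bar = 1000, 2, 50            # hypothetical sizes: 5% of the rows kept
A = rng.random((m, n))               # full dataset, one point per row
idx = rng.choice(m, size=m_bar, replace=False)
A_bar = A[idx]                       # reduced set: random subset of the rows

K_rect = gaussian_kernel(A, A_bar)   # rectangular kernel, shape (m, m_bar)
full_entries = m * m                 # entries a full square kernel would need
rect_entries = K_rect.size           # entries in the rectangular kernel
```

With these sizes the rectangular kernel holds 50,000 entries against 1,000,000 for the square one, and only the 50 rows of A_bar are needed to evaluate the kernel at a new point.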
14 Reduced Support Vector Machine Algorithm: Nonlinear
Separating Surface
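The algorithm steps on this slide are not preserved, so the following is only a sketch of the reduced-kernel idea, not the authors' method. It picks a random subset of rows, forms the rectangular Gaussian kernel, and fits the surface K(x', A_bar') u = gamma. The solve is deliberately a stand-in: a regularized least-squares fit replaces the Newton-based smooth SVM solve. The toy data, function names and the parameters mu and nu are all hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, mu):
    """K(A, B')_ij = exp(-mu * ||A_i - B_j||^2)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))

def fit_reduced(A, d, m_bar, mu=1.0, nu=100.0, seed=0):
    """Fit on the rectangular kernel K(A, A_bar'); returns (A_bar, u, gamma)."""
    rng = np.random.default_rng(seed)
    A_bar = A[rng.choice(len(A), size=m_bar, replace=False)]
    H = np.hstack([gaussian_kernel(A, A_bar, mu), -np.ones((len(A), 1))])
    # Stand-in solve: min nu/2 * ||H z - d||^2 + 1/2 * ||z||^2, with z = [u; gamma]
    z = np.linalg.solve(H.T @ H + np.eye(m_bar + 1) / nu, H.T @ d)
    return A_bar, z[:-1], z[-1]

def predict(X, A_bar, u, gamma, mu=1.0):
    """Classify by the sign of K(x', A_bar') u - gamma; only A_bar is needed."""
    return np.where(gaussian_kernel(X, A_bar, mu) @ u - gamma >= 0, 1.0, -1.0)

# Hypothetical toy data: label is +1 inside a disc, -1 outside
rng = np.random.default_rng(1)
A = rng.uniform(-1.0, 1.0, size=(400, 2))
d = np.where((A**2).sum(axis=1) < 0.4, 1.0, -1.0)
A_bar, u, gamma = fit_reduced(A, d, m_bar=40, mu=2.0)
train_acc = (predict(A, A_bar, u, gamma, mu=2.0) == d).mean()
```

All 400 points enter the fit through the rows of the rectangular kernel, but the resulting classifier stores only the 40 rows of A_bar plus u and gamma.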
15 How to Choose the Reduced Set in RSVM?
16 A Nonlinear Kernel Application: Checkerboard
Training Set: 1000 Points in R^2. Separate 486
Asterisks from 514 Dots
17 Conventional SVM Result on Checkerboard Using 50
Randomly Selected Points Out of 1000
18 RSVM Result on Checkerboard Using SAME 50 Random
Points Out of 1000
19 RSVM on Moderate Sized Problems (Best Test Set
Correctness %, CPU seconds)
Dataset          m x n      Reduced   RSVM,               Conventional SVM,   Conventional SVM,
                            set size  rectangular kernel  full kernel         reduced set only
                                      Test %   CPU sec    Test %   CPU sec    Test %   CPU sec
Cleveland Heart  297 x 13   30        86.47    3.04       85.92    32.42      76.88    1.58
BUPA Liver       345 x 6    35        74.86    2.68       73.62    32.61      68.95    2.04
Ionosphere       351 x 34   35        95.19    5.02       94.35    59.88      88.70    2.13
Pima Indians     768 x 8    50        78.64    5.72       76.59    328.3      57.32    4.64
Tic-Tac-Toe      958 x 9    96        98.75    14.56      98.43    1033.5     88.24    8.87
Mushroom         8124 x 22  215       89.04    466.20     N/A      N/A        83.90    221.50
20 RSVM on Large UCI Adult Dataset: Standard
Deviation over 50 Runs = 0.001
Average Correctness % and Standard Deviation, 50 Runs
(Train, test)     RSVM                 Reduced set only     Reduced    Reduced / train
sizes             Avg. %   Std. dev.   Avg. %   Std. dev.   set size   size (%)
(6414, 26148)     84.47    0.001       77.03    0.014       210        3.2
(11221, 21341)    84.71    0.001       75.96    0.016       225        2.0
(16101, 16461)    84.90    0.001       75.45    0.017       242        1.5
(22697, 9865)     85.31    0.001       76.73    0.018       284        1.2
(32562, 16282)    85.07    0.001       76.95    0.013       326        1.0
21 CPU Times on UCI Adult Dataset: RSVM, SMO and
PCGC with a Gaussian Kernel
Adult Dataset: Training Set Size vs. CPU Time in Seconds
Training set size   RSVM     SMO       PCGC
3185                44.2     66.2      380.5
4781                83.6     146.6     1137.2
6414                123.4    258.8     2530.6
11221               227.8    781.4     11910.6
16101               342.5    1784.4    Ran out of memory
22697               587.4    4126.4    Ran out of memory
32562               980.2    7749.6    Ran out of memory
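The CPU times above can be turned into speedup ratios with a few lines of arithmetic. This sketch only divides the reported SMO and PCGC times by the RSVM time at the same training set size; the variable names are illustrative.

```python
sizes = [3185, 4781, 6414, 11221, 16101, 22697, 32562]
rsvm = [44.2, 83.6, 123.4, 227.8, 342.5, 587.4, 980.2]
smo = [66.2, 146.6, 258.8, 781.4, 1784.4, 4126.4, 7749.6]
pcgc = [380.5, 1137.2, 2530.6, 11910.6]  # ran out of memory beyond size 11221

# Ratio of each competitor's CPU time to the RSVM time at the same size
smo_ratio = [s / r for s, r in zip(smo, rsvm)]
pcgc_ratio = [p / r for p, r in zip(pcgc, rsvm)]

for size, ratio in zip(sizes, smo_ratio):
    print(f"{size:>6}  SMO/RSVM = {ratio:.1f}")
```

On these figures the SMO-to-RSVM ratio grows steadily with training set size, from about 1.5 at 3185 points to about 7.9 at 32562 points, and PCGC is already more than 50 times slower than RSVM at 11221 points.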
22 CPU Time Comparison on UCI Dataset: RSVM, SMO and
PCGC with a Gaussian Kernel
[Figure: CPU time (sec.) plotted against training set size]
23 Conclusion
- RSVM: An effective classifier for large datasets
- Classifier uses 10% or less of dataset
- Can handle massive datasets
- Much faster than other algorithms
- Applicable to all nonlinear kernel problems