Title: Nonlinear Data Discrimination via Generalized Support Vector Machines
1. Nonlinear Data Discrimination via Generalized Support Vector Machines
- David R. Musicant and Olvi L. Mangasarian
- University of Wisconsin - Madison
www.cs.wisc.edu/musicant
2. Outline
- The linear support vector machine (SVM)
- Linear kernel
- Generalized support vector machine (GSVM)
- Nonlinear indefinite kernel
- Linear Programming Formulation of GSVM
- MINOS
- Quadratic Programming Formulation of GSVM
- Successive Overrelaxation (SOR)
- Numerical comparisons
- Conclusions
3. The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
[Figure: point sets A+ and A- divided by a separating surface]
4. The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
- Given m points in the n-dimensional space R^n
- Represented by an m x n matrix A
- Membership of each point A_i in the classes +1 or -1 is specified by:
- An m x m diagonal matrix D with +1 or -1 along its diagonal
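The setup above can be sketched in Python as follows. The toy points, the labels, and the helper name `diagonal_label_matrix` are hypothetical illustrations and are not taken from the slides.

```python
def diagonal_label_matrix(labels):
    """Return the m x m diagonal matrix D with the +1/-1 labels on its diagonal."""
    for label in labels:
        if label not in (1, -1):
            raise ValueError("each label must be +1 or -1")
    m = len(labels)
    return [[labels[i] if i == j else 0 for j in range(m)] for i in range(m)]


# Hypothetical toy data: m = 4 points in R^2, stored as the rows of A.
A = [[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, 0.0]]
labels = [1, 1, -1, -1]  # class membership of each row A_i
D = diagonal_label_matrix(labels)
```

Storing D as a full matrix mirrors the notation on the slide; in practice a plain label vector carries the same information.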
5. Preliminary Attempt at the (Linear) Support Vector Machine: Robust Linear Programming
- Solve the following mathematical program
- where y is a nonnegative error (slack) vector
- Note: y = 0 if the convex hulls of A+ and A- do not intersect.
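The program itself did not survive in this transcript. One form that is consistent with the slack vector y and with the matrices A and D is sketched here; it is an assumption, not the slide's exact formula. The symbols w (normal to the separating plane), γ (its threshold), and e (a vector of ones) are likewise assumed.

```latex
\min_{w,\gamma,y}\; e^{\top} y
\quad \text{s.t.} \quad D\,(Aw - e\gamma) + y \ge e, \qquad y \ge 0
```

Under this form, y = 0 is attainable exactly when some plane x'w = γ places every point of A+ at or beyond x'w = γ + 1 and every point of A- at or beyond x'w = γ - 1.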
6. The (Linear) Support Vector Machine: Maximize Margin Between Separating Planes
[Figure: point sets A+ and A-]
7. The (Linear) Support Vector Machine Formulation
- Solve the following mathematical program
- where y is a nonnegative error (slack) vector
- Note: y = 0 if the convex hulls of A+ and A- do not intersect.
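The formulation did not survive in this transcript. A widely used 2-norm form of the linear SVM is sketched here as an assumption; the slide may have used a different norm or weighting. The symbols w, γ, e (a vector of ones), and the weight ν > 0 are assumed names.

```latex
\min_{w,\gamma,y}\; \nu\, e^{\top} y + \tfrac{1}{2}\, w^{\top} w
\quad \text{s.t.} \quad D\,(Aw - e\gamma) + y \ge e, \qquad y \ge 0
```

In this form the bounding planes x'w = γ + 1 and x'w = γ - 1 are a distance 2/||w||_2 apart, so shrinking w'w widens the margin while ν trades margin width against the total slack e'y.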
8. GSVM (Generalized Support Vector Machine): Linear Programming Formulation
9. Examples of Kernels
- Examples
- Polynomial Kernel, which uses componentwise exponentiation as in MATLAB
- Radial Basis Kernel
- Neural Network Kernel, which applies the step function componentwise
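The kernel formulas did not survive in this transcript, so the following is a minimal Python sketch of common textbook forms of the three kernels named above. The exact expressions, the parameter names `mu` and `d`, and the helper `kernel_matrix` are assumptions rather than the slide's definitions.

```python
import math


def dot(x, z):
    """Inner product of two points given as equal-length lists."""
    return sum(xi * zi for xi, zi in zip(x, z))


def polynomial_kernel(x, z, mu=1.0, d=2):
    # Assumed form: (x'z + mu)^d, applied to each kernel-matrix entry.
    return (dot(x, z) + mu) ** d


def radial_basis_kernel(x, z, mu=1.0):
    # Assumed form: exp(-mu * ||x - z||^2).
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-mu * squared_distance)


def neural_network_kernel(x, z, mu=0.0):
    # Assumed form: step(x'z + mu), with step(t) = 1 for t > 0 and 0 otherwise.
    return 1.0 if dot(x, z) + mu > 0 else 0.0


def kernel_matrix(kernel, A, B, **params):
    """Entry (i, j) is kernel(A_i, B_j); A and B are lists of points (rows)."""
    return [[kernel(a_i, b_j, **params) for b_j in B] for a_i in A]


# Hypothetical usage on two points in R^2.
points = [[1.0, 2.0], [3.0, 4.0]]
K = kernel_matrix(radial_basis_kernel, points, points, mu=0.5)
```

The step-function kernel is not positive semidefinite in general, which is one reason a formulation that tolerates indefinite kernels is of interest.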
10. A Nonlinear Kernel Application
Checkerboard Training Set: 1000 Points in R^2
Separate 486 Asterisks from 514 Dots
11. Previous Work
12. Polynomial Kernel
13. Large Margin Classifier: (SOR) Reformulation in (w, γ) Space
[Figure: point sets A+ and A-]
14. (SOR) Linear Support Vector Machine: Quadratic Programming Formulation
- Solve the following mathematical program
- The quadratic term here maximizes the distance between the bounding planes in the (w, γ) space
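The program did not survive in this transcript. A sketch of a quadratic program that matches the description, assuming the threshold γ is added to the quadratic term alongside the normal w, is given here. The symbols w, γ, y, e (a vector of ones), and ν > 0 are assumed names.

```latex
\min_{w,\gamma,y}\; \nu\, e^{\top} y + \tfrac{1}{2}\,\bigl(w^{\top} w + \gamma^{2}\bigr)
\quad \text{s.t.} \quad D\,(Aw - e\gamma) + y \ge e, \qquad y \ge 0
```

Including γ² makes the objective strongly convex in (w, γ) jointly, which is what removes the equality constraint from the dual and leaves only simple bounds.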
15. Introducing a Nonlinear Kernel
- The Wolfe Dual for the SOR Linear SVM is
- Linear separating surface
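The dual and the surface formulas did not survive in this transcript. As a sketch, assume the primal minimizes ν e'y + (w'w + γ²)/2 subject to D(Aw - eγ) + y ≥ e and y ≥ 0. Setting the gradient of the Lagrangian to zero then gives w = A'Du and γ = -e'Du, and the Wolfe dual works out as follows. The symbols u (dual variables), ν, e (a vector of ones), and K (a kernel) are assumed names.

```latex
\begin{aligned}
&\min_{u}\; \tfrac{1}{2}\, u^{\top} D\,(AA^{\top} + ee^{\top})\,D\,u - e^{\top} u
\quad \text{s.t.} \quad 0 \le u \le \nu e, \\
&w = A^{\top} D u, \qquad \gamma = -e^{\top} D u, \\
&\text{linear surface: } x^{\top} A^{\top} D u = \gamma, \\
&\text{kernel surface: } K(x^{\top}, A^{\top})\, D u = \gamma
\quad \text{with } AA^{\top} \text{ replaced by } K(A, A^{\top}) \text{ in the dual.}
\end{aligned}
```

The data enter the dual only through the products AA', so substituting a kernel matrix changes the separating surface from linear to nonlinear without changing the structure of the problem.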
16. SVM Optimality Conditions
- Define
- Then dual SVM becomes much simpler!
- Gradient projection necessary and sufficient optimality condition
- The projection in this condition maps u onto the feasible region
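The condition can be checked numerically. The sketch below assumes the dual has the form min (1/2) u'Mu - e'u subject to 0 ≤ u ≤ ν, with M symmetric and having a positive diagonal E, and tests whether u equals the projection of u - ω E^{-1}(Mu - e) onto the box. The names `M`, `nu`, `omega`, `project`, and `optimality_residual` are hypothetical.

```python
def project(u, nu):
    """Project u componentwise onto the box 0 <= u_i <= nu."""
    return [min(max(ui, 0.0), nu) for ui in u]


def optimality_residual(M, u, nu, omega=1.0):
    """Largest componentwise gap between u and project(u - omega * E^{-1} (M u - e)).

    M is a square matrix (list of rows) with positive diagonal, and E = diag(M).
    A residual of zero means u satisfies the gradient projection condition.
    """
    m = len(u)
    stepped = []
    for i in range(m):
        gradient_i = sum(M[i][j] * u[j] for j in range(m)) - 1.0
        stepped.append(u[i] - omega * gradient_i / M[i][i])
    projected = project(stepped, nu)
    return max(abs(ui - pi) for ui, pi in zip(u, projected))


# Hypothetical 2 x 2 example: the unconstrained minimizer is u = (0.5, 0.5).
M = [[2.0, 0.0], [0.0, 2.0]]
residual_at_solution = optimality_residual(M, [0.5, 0.5], nu=1.0)
residual_at_origin = optimality_residual(M, [0.0, 0.0], nu=1.0)
```

When the upper bound is tightened to ν = 0.25, the solution moves to the corner u = (0.25, 0.25) and the residual there is again zero, because the projection absorbs the remaining gradient step.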
17. SOR Algorithm and Convergence
- Above optimality conditions lead to the SOR
algorithm
- Remember, optimality conditions are expressed as
- SOR Linear Convergence [Luo-Tseng 1993]
- The iterates of the SOR algorithm converge R-linearly to a solution of the dual problem
- The objective function values converge Q-linearly to the objective value at that solution
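A minimal Python sketch of a projected SOR iteration of this kind follows. It assumes the dual min (1/2) u'HH'u - e'u subject to 0 ≤ u ≤ ν, where row i of H is d_i [A_i, -1], so that H'u = (w, γ). The update formula, the function names, the toy data, and the default parameters are assumptions and are not the authors' implementation.

```python
def sor_linear_svm(A, labels, nu=10.0, omega=1.0, tol=1e-10, max_iter=10000):
    """Projected SOR sweeps on the assumed dual; returns (w, gamma, u)."""
    m, n = len(A), len(A[0])
    # Row i of H is d_i * [A_i, -1]; the dual Hessian is M = H H'.
    H = [[labels[i] * a for a in A[i]] + [-float(labels[i])] for i in range(m)]
    diag = [sum(h * h for h in row) for row in H]  # E = diag(M), each entry >= 1
    u = [0.0] * m
    s = [0.0] * (n + 1)  # running value of H'u = (w, gamma)
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(m):
            # Gradient component i, using the components already updated this sweep.
            gradient_i = sum(h * sj for h, sj in zip(H[i], s)) - 1.0
            new_ui = min(max(u[i] - omega * gradient_i / diag[i], 0.0), nu)
            delta = new_ui - u[i]
            if delta != 0.0:
                for j in range(n + 1):
                    s[j] += delta * H[i][j]
                u[i] = new_ui
            largest_change = max(largest_change, abs(delta))
        if largest_change < tol:
            break
    return s[:n], s[n], u


def classify(x, w, gamma):
    """Return +1 or -1 according to the side of the plane x'w = gamma."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) - gamma >= 0 else -1


# Hypothetical toy problem: 4 linearly separable points in R^2.
A = [[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, 0.0]]
labels = [1, 1, -1, -1]
w, gamma, u = sor_linear_svm(A, labels)
predictions = [classify(x, w, gamma) for x in A]
```

Each update touches a single row of the data and a vector of length n + 1, which is why an iteration of this shape needs very little memory beyond the data itself. The relaxation factor `omega` is taken in the interval (0, 2).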
18. Numerical Testing
- Comparison of Linear and Nonlinear Kernels using
- Linear Programming
- Quadratic Programming - SOR Formulations
- Data Sets
- UCI Liver Disorders: 345 points in R^6
- Bell Labs Checkerboard: 1000 points in R^2
- Gaussian Synthetic: 1000 points in R^32
- SCDS Synthetic: 1 million points in R^32
- Massive Synthetic: 10 million points in R^32
- Machines
- Cluster of 4 Sun Enterprise E6000 machines, each consisting of 16 UltraSPARC II 250 MHz processors with 2 GB of RAM
- Total: 64 processors, 8 GB of RAM
19. Comparison of Linear and Nonlinear SVMs: Linear Programming Generated
- Nonlinear kernels yield better training and
testing set correctness
20. SOR Results
- Comparison of linear and nonlinear kernels
- Examples of training on massive data
- 1 million point dataset generated by SCDS generator
- Trained completely in 9.7 hours
- Tuning set reached 99.7% of final accuracy in 0.3 hours
- 10 million point randomly generated dataset
- Tuning set reached 95% of final accuracy in 14.3 hours
- Under 10,000 iterations
21. Conclusions
- Linear programming and successive overrelaxation can generate complex nonlinear separating surfaces via GSVMs
- Nonlinear separating surfaces improve generalization over linear ones
- SOR can handle very large problems not (easily) solvable by other methods
- SOR scales up with virtually no changes
- Future directions
- Parallel SOR for very large problems not resident in memory
- Massive multicategory discrimination via SOR
- Support vector regression
22. Questions?