Title: Mathematical Programming in Support Vector Machines

1. Mathematical Programming in Support Vector Machines
- Olvi L. Mangasarian
- University of Wisconsin - Madison
- High Performance Computation for Engineering Systems Seminar, MIT, October 4, 2000
2. What is a Support Vector Machine?
- An optimally defined surface
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function
3. What are Support Vector Machines Used For?
- Classification
- Regression (Data Fitting)
- Supervised & Unsupervised Learning
(Will concentrate on classification)
4. Example of a Nonlinear Classifier: Checkerboard Classifier
5. Outline of Talk
- Generalized support vector machines (SVMs)
  - Completely general kernel allows complex classification (No Mercer condition!)
- Smooth support vector machines
  - Solve SVM by a fast Newton method
- Lagrangian support vector machines
  - Very fast, simple iterative scheme
  - One small matrix inversion; no LP, no QP
- Reduced support vector machines
  - Handle large datasets with nonlinear kernels
6. Generalized Support Vector Machines: 2-Category Linearly Separable Case
(Figure: the two point sets A+ and A-)
7. Generalized Support Vector Machines: Algebra of the 2-Category Linearly Separable Case
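The algebra on this slide is an image that did not survive the transcript; the following is a reconstruction of the standard formulation used throughout Mangasarian's papers (m points in R^n as the rows of a matrix A, labels on the diagonal of D):

```latex
% A is the m x n matrix of points; D = diag(d), d_i = ±1, holds the labels;
% e is a column vector of ones.
% Separate A+ (d_i = +1) from A- (d_i = -1) by two bounding planes
x'w = \gamma + 1, \qquad x'w = \gamma - 1,
% so that
A_i w \ge \gamma + 1 \ \text{for}\ d_i = +1, \qquad
A_i w \le \gamma - 1 \ \text{for}\ d_i = -1,
% or, compactly:
D(Aw - e\gamma) \ge e.
```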
8. Generalized Support Vector Machines: Maximizing the Margin between Bounding Planes
(Figure: bounding planes between the point sets A+ and A-)
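The figure is missing from the transcript; the quantity it illustrates is standard:

```latex
% The distance (margin) between the bounding planes
% x'w = \gamma + 1 and x'w = \gamma - 1 is
\text{margin} = \frac{2}{\lVert w \rVert_2},
% so maximizing the margin amounts to minimizing \lVert w \rVert.
```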
9. Generalized Support Vector Machines: The Linear Support Vector Machine Formulation
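The formulation on this slide is also an image; the standard linear SVM it refers to (slack vector y, tradeoff parameter ν) can be reconstructed as:

```latex
\min_{w,\gamma,y}\;\; \nu\, e'y + \tfrac{1}{2}\, w'w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \;\; y \ge 0.
```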
10. Breast Cancer Diagnosis Application: 97% Tenfold Cross-Validation Correctness; 780 Samples (494 Benign, 286 Malignant)
11. Another Application: Disputed Federalist Papers (Bosch & Smith 1998); 56 Hamilton, 50 Madison, 12 Disputed
12. Generalized Support Vector Machine Motivation (Nonlinear Kernel Without Mercer Condition)
13. SSVM: Smooth Support Vector Machine (SVM as Unconstrained Minimization Problem)
Changing to the 2-norm and measuring the margin in (w, γ) space.
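With those two changes (2-norm slack, margin measured in (w, γ) space), the constrained SVM becomes the unconstrained minimization SSVM solves; a reconstruction from the SSVM formulation:

```latex
\min_{w,\gamma}\;\; \tfrac{\nu}{2}\,
\bigl\lVert \bigl(e - D(Aw - e\gamma)\bigr)_+ \bigr\rVert_2^2
+ \tfrac{1}{2}\bigl(w'w + \gamma^2\bigr),
% where (x)_+ = \max(x, 0) componentwise (the "plus function").
```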
14. Smoothing the Plus Function: Integrate the Sigmoid Function
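The smoothing itself is a one-line calculation: integrating the sigmoid function of neural networks yields a smooth approximation p(x, α) of the plus function:

```latex
% Integrating the sigmoid s(x,\alpha) = \frac{1}{1 + e^{-\alpha x}} gives
p(x, \alpha) = x + \frac{1}{\alpha}\log\bigl(1 + e^{-\alpha x}\bigr),
% with \frac{\partial}{\partial x}\, p(x,\alpha) = s(x,\alpha)
% and p(x, \alpha) \to (x)_+ as \alpha \to \infty.
```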
15. SSVM: The Smooth Support Vector Machine (Smoothing the Plus Function)
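As a quick numerical check (a Python sketch, not part of the talk), the smoothed plus function p(x, α) = x + (1/α) log(1 + e^(-αx)) approaches (x)_+ as α grows:

```python
import math

def plus(x):
    """The plus function (x)_+ = max(x, 0)."""
    return max(x, 0.0)

def p(x, alpha):
    """Smooth approximation of (x)_+: the integral of the sigmoid
    1/(1 + exp(-alpha*x)). Converges to plus(x) as alpha -> infinity."""
    if alpha * x < -50.0:
        return 0.0  # avoid overflow: (1/alpha)*log(1 + e^{-alpha x}) ~ -x here
    return x + math.log1p(math.exp(-alpha * x)) / alpha

# The gap p(x, alpha) - (x)_+ is at most log(2)/alpha, attained at x = 0,
# so the approximation tightens uniformly as alpha grows.
```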
16. Newton-Armijo Algorithm
- Newton: Minimize a sequence of quadratic approximations to the strongly convex objective function, i.e. solve a sequence of linear equations in n+1 variables. (Small dimensional input space.)
- Armijo: Shorten the distance between successive iterates so as to generate a sufficient decrease in the objective function. (In computational reality, not needed!)
- Global quadratic convergence: Starting from any point, the iterates are guaranteed to converge to the unique solution at a quadratic rate, i.e. the errors get squared. (Typically 6 to 8 iterations, without an Armijo step.)
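The scheme on this slide can be sketched in a few lines (a 1-D Python illustration with an assumed test function, not the SSVM code itself):

```python
import math

def newton_armijo(f, g, h, x, tol=1e-10, itmax=50):
    """Newton's method with an Armijo stepsize, in one dimension.
    f: strongly convex objective, g: its derivative, h: its second
    derivative. Each step solves a linear equation for the Newton
    direction, then backtracks until the decrease is sufficient."""
    for it in range(itmax):
        grad = g(x)
        if abs(grad) < tol:
            return x, it
        d = -grad / h(x)                 # Newton direction (linear solve)
        lam = 1.0                        # Armijo: halve the step until a
        while f(x + lam * d) > f(x) + 0.25 * lam * grad * d:
            lam /= 2.0                   # sufficient decrease holds
        x += lam * d
    return x, itmax

# Example: minimize f(x) = x^2 + e^x (strongly convex); the unique
# minimizer solves 2x + e^x = 0.
xstar, its = newton_armijo(lambda x: x * x + math.exp(x),
                           lambda x: 2.0 * x + math.exp(x),
                           lambda x: 2.0 + math.exp(x),
                           x=5.0)
```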
17. SSVM with a Nonlinear Kernel: Nonlinear Separating Surface in Input Space
18. Examples of Kernels: Generate Nonlinear Separating Surfaces in Input Space
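The kernel examples themselves are images in the transcript; two common members of this family, sketched in NumPy (function and parameter names are illustrative, not from the talk):

```python
import numpy as np

def polynomial_kernel(A, B, degree=2, c=1.0):
    """Polynomial kernel K(A, B') = (A B' + c)^degree, elementwise power.
    Induces a polynomial separating surface in the input space."""
    return (A @ B.T + c) ** degree

def gaussian_kernel(A, B, mu=1.0):
    """Gaussian (radial basis) kernel: K_ij = exp(-mu * ||A_i - B_j||^2)."""
    sq = (np.sum(A * A, axis=1)[:, None]
          + np.sum(B * B, axis=1)[None, :] - 2.0 * (A @ B.T))
    return np.exp(-mu * sq)
```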
19-24. (No Transcript)
25. LSVM: Lagrangian Support Vector Machine; Dual of SVM
26. LSVM: Lagrangian Support Vector Machine; Dual SVM as Symmetric Linear Complementarity Problem
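The equations on this slide are images; reconstructed from the LSVM formulation, the dual and its complementarity form are:

```latex
% Dual of the (2-norm) SVM:
\min_{u}\; \tfrac{1}{2}\, u'Qu - e'u \quad \text{s.t.}\quad u \ge 0,
\qquad Q = \tfrac{I}{\nu} + HH', \quad H = D\,[A \;\; {-e}].
% Its optimality conditions are the symmetric linear complementarity problem
0 \le u \;\perp\; Qu - e \ge 0,
% which, for any \alpha > 0, is equivalent to the fixed-point equation
Qu - e = \bigl((Qu - e) - \alpha u\bigr)_+,
% iterated by LSVM as
u^{i+1} = Q^{-1}\bigl(e + ((Qu^i - e) - \alpha u^i)_+\bigr).
```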
27. LSVM Algorithm: Simple, Linearly Convergent, One Small Matrix Inversion
Key idea: the Sherman-Morrison-Woodbury formula allows the inversion of an extremely large m-by-m matrix Q by merely inverting a much smaller (n+1)-by-(n+1) matrix, as follows
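The formula image is missing from the transcript; the Sherman-Morrison-Woodbury identity being used is:

```latex
Q^{-1} = \Bigl(\tfrac{I}{\nu} + HH'\Bigr)^{-1}
       = \nu\Bigl(I - H\bigl(\tfrac{I}{\nu} + H'H\bigr)^{-1} H'\Bigr).
% H is m x (n+1), so only an (n+1) x (n+1) matrix is ever inverted.
```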
28. LSVM Algorithm (Linear Kernel): 11 Lines of MATLAB Code

function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u>=0,
% Q=I/nu+H*H', H=D*[A -e]
% Input: A, D, nu, itmax, tol;  Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
[m,n]=size(A);alpha=1.9/nu;e=ones(m,1);H=D*[A -e];it=0;
S=H*inv((speye(n+1)/nu+H'*H));
u=nu*(1-S*(H'*e));oldu=u+1;
while it<itmax & norm(oldu-u)>tol
  z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));
  oldu=u;
  u=nu*(z-S*(H'*z));
  it=it+1;
end
opt=norm(u-oldu);w=A'*D*u;gamma=-e'*D*u;

function pl = pl(x); pl = (abs(x)+x)/2;
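For readers without MATLAB, here is a line-for-line NumPy sketch of the same iteration (the function name and the choice of passing the labels as a vector d rather than a diagonal matrix D are ours):

```python
import numpy as np

def lsvm(A, d, nu=1.0, itmax=1000, tol=1e-8):
    """NumPy sketch of the LSVM linear-kernel iteration above.
    Solves min 1/2 u'Qu - e'u s.t. u >= 0, with Q = I/nu + HH' and
    H = D[A -e], inverting only an (n+1) x (n+1) matrix via the
    Sherman-Morrison-Woodbury identity."""
    m, n = A.shape
    alpha = 1.9 / nu                     # any 0 < alpha < 2/nu works
    e = np.ones(m)
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])   # H = D[A -e]
    S = H @ np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)  # SMW factor
    u = nu * (e - S @ (H.T @ e))
    oldu = u + 1.0
    it = 0
    while it < itmax and np.linalg.norm(oldu - u) > tol:
        # z = e + ((Qu - e) - alpha*u)_+, with Qu = u/nu + H(H'u)
        z = 1.0 + np.maximum((u / nu + H @ (H.T @ u)) - alpha * u - 1.0, 0.0)
        oldu = u
        u = nu * (z - S @ (H.T @ z))     # u <- Q^{-1} z via SMW
        it += 1
    w = A.T @ (d * u)                    # separating plane: x'w = gamma
    gamma = -e @ (d * u)
    return it, np.linalg.norm(u - oldu), w, gamma
```

A point x is then classified by sign(x'w - gamma).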
29. LSVM Algorithm (Linear Kernel): Computational Results
- 2 million random points in 10-dimensional space
  - Classified in 6.7 minutes in 6 iterations to 1e-5 accuracy
  - 250 MHz UltraSPARC II with 2 gigabytes of memory
- 32562 points in 123-dimensional space (UCI Adult dataset)
  - Classified in 141 seconds, 55 iterations, to 85% correctness
  - 400 MHz Pentium II with 2 gigabytes of memory
30. LSVM with a Nonlinear Kernel: Formulation
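The formulation is an image in the transcript; reconstructed from the LSVM formulation (G = [A -e], K a completely general kernel):

```latex
% Replace the linear term HH' = D[A\; {-e}][A\; {-e}]'D by a nonlinear kernel:
\min_{u}\; \tfrac{1}{2}\, u'Qu - e'u \quad \text{s.t.}\quad u \ge 0,
\qquad Q = \tfrac{I}{\nu} + D\,K(G, G')\,D, \quad G = [A \;\; {-e}].
% K(G, G') is a dense m x m matrix, so Sherman-Morrison-Woodbury no longer
% helps and Q must be inverted at its full size; this is what motivates RSVM.
```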
31. LSVM Algorithm (Nonlinear Kernel) Application: 100 Iterations, 58 Seconds on a Pentium II, 95.9% Accuracy
32. Reduced Support Vector Machines (RSVM): Large Nonlinear Kernel Classification Problems
- RSVM can solve very large problems
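The reduction can be sketched as follows (a NumPy illustration with assumed names, not the authors' code): instead of the full m x m kernel K(A, A'), RSVM builds the thin m x m-bar rectangular kernel K(A, Abar') for a small random subset Abar of the data.

```python
import numpy as np

def reduced_gaussian_kernel(A, m_bar=50, mu=1.0, seed=0):
    """RSVM idea: the m x m_bar rectangular Gaussian kernel K(A, Abar')
    for a random subset Abar of m_bar rows of A, in place of the full
    m x m kernel K(A, A'). (Illustrative sketch.)"""
    rng = np.random.default_rng(seed)
    idx = rng.choice(A.shape[0], size=m_bar, replace=False)
    Abar = A[idx]
    sq = (np.sum(A * A, axis=1)[:, None]
          + np.sum(Abar * Abar, axis=1)[None, :] - 2.0 * (A @ Abar.T))
    return np.exp(-mu * sq), idx
```

The downstream classifier then has only m_bar kernel columns to work with, which is what lets RSVM handle large datasets with nonlinear kernels.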
33. Conventional SVM Result on Checkerboard Using 50 Random Points Out of 1000
34. RSVM Result on Checkerboard Using the SAME 50 Random Points Out of 1000
35. RSVM on Large Classification Problems: Standard Error over 50 Runs 0.001 to 0.002; RSVM Time = 1.24 x (Random Points Time)
36. Conclusion
- Mathematical programming plays an essential role in SVMs
- New formulations
  - Generalized SVMs
- New algorithm-generating concepts
  - Smoothing (SSVM)
  - Implicit Lagrangian (LSVM)
37. Future Research
- Concave minimization
- Concurrent feature & data selection
- Multiple-instance problems
- SVMs as complementarity problems
- Kernel methods in nonlinear programming
- Multicategory classification algorithms
38. Talk & Papers Available on the Web