Title: Mathematical Programming in Support Vector Machines

1. Mathematical Programming in Support Vector Machines
- Olvi L. Mangasarian
- University of Wisconsin - Madison
- High Performance Computation for Engineering Systems Seminar, MIT, October 4, 2000
2. What is a Support Vector Machine?
- An optimally defined surface
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function
3. What are Support Vector Machines Used For?
- Classification
- Regression (Data Fitting)
- Supervised & Unsupervised Learning
(Will concentrate on classification)
4. Example of a Nonlinear Classifier: Checkerboard Classifier
5. Outline of Talk
- Generalized support vector machines (SVMs)
  - Completely general kernel allows complex classification (No Mercer condition!)
- Smooth support vector machines
  - Solve SVM by a fast Newton method
- Lagrangian support vector machines
  - Very fast, simple iterative scheme
  - One small matrix inversion; no LP, no QP
- Reduced support vector machines
  - Handle large datasets with nonlinear kernels
6. Generalized Support Vector Machines: 2-Category Linearly Separable Case
(Figure: the two point sets A+ and A-)
7. Generalized Support Vector Machines: Algebra of the 2-Category Linearly Separable Case
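The algebra on this slide is an image that did not survive the transcript; the following is a reconstruction of the standard formulation used throughout Mangasarian's papers (m points in R^n as the rows of a matrix A, labels on the diagonal of D):

```latex
% A is the m x n matrix of points; D = diag(d), d_i = ±1, holds the labels;
% e is a column vector of ones.
% Separate A+ (d_i = +1) from A- (d_i = -1) by two bounding planes
x'w = \gamma + 1, \qquad x'w = \gamma - 1,
% so that
A_i w \ge \gamma + 1 \ \text{for}\ d_i = +1, \qquad
A_i w \le \gamma - 1 \ \text{for}\ d_i = -1,
% or, compactly:
D(Aw - e\gamma) \ge e.
```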
8. Generalized Support Vector Machines: Maximizing the Margin between Bounding Planes
(Figure: bounding planes between the point sets A+ and A-)
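The figure is missing from the transcript; the quantity it illustrates is standard:

```latex
% The distance (margin) between the bounding planes
% x'w = \gamma + 1 and x'w = \gamma - 1 is
\text{margin} = \frac{2}{\lVert w \rVert_2},
% so maximizing the margin amounts to minimizing \lVert w \rVert.
```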
9. Generalized Support Vector Machines: The Linear Support Vector Machine Formulation
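The formulation on this slide is also an image; the standard linear SVM it refers to (slack vector y, tradeoff parameter ν) can be reconstructed as:

```latex
\min_{w,\gamma,y}\;\; \nu\, e'y + \tfrac{1}{2}\, w'w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \;\; y \ge 0.
```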
10. Breast Cancer Diagnosis Application: 97% Tenfold Cross-Validation Correctness; 780 Samples (494 Benign, 286 Malignant)
11. Another Application: Disputed Federalist Papers (Bosch & Smith 1998); 56 Hamilton, 50 Madison, 12 Disputed
12. Generalized Support Vector Machine Motivation (Nonlinear Kernel Without Mercer Condition)
13. SSVM: Smooth Support Vector Machine (SVM as Unconstrained Minimization Problem)
Changing to the 2-norm and measuring the margin in (w, γ) space.
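With those two changes (2-norm slack, margin measured in (w, γ) space), the constrained SVM becomes the unconstrained minimization SSVM solves; a reconstruction from the SSVM formulation:

```latex
\min_{w,\gamma}\;\; \tfrac{\nu}{2}\,
\bigl\lVert \bigl(e - D(Aw - e\gamma)\bigr)_+ \bigr\rVert_2^2
+ \tfrac{1}{2}\bigl(w'w + \gamma^2\bigr),
% where (x)_+ = \max(x, 0) componentwise (the "plus function").
```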
14. Smoothing the Plus Function: Integrate the Sigmoid Function
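The smoothing itself is a one-line calculation: integrating the sigmoid function of neural networks yields a smooth approximation p(x, α) of the plus function:

```latex
% Integrating the sigmoid s(x,\alpha) = \frac{1}{1 + e^{-\alpha x}} gives
p(x, \alpha) = x + \frac{1}{\alpha}\log\bigl(1 + e^{-\alpha x}\bigr),
% with \frac{\partial}{\partial x}\, p(x,\alpha) = s(x,\alpha)
% and p(x, \alpha) \to (x)_+ as \alpha \to \infty.
```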
15. SSVM: The Smooth Support Vector Machine (Smoothing the Plus Function)
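As a quick numerical check (a Python sketch, not part of the talk), the smoothed plus function p(x, α) = x + (1/α) log(1 + e^(-αx)) approaches (x)_+ as α grows:

```python
import math

def plus(x):
    """The plus function (x)_+ = max(x, 0)."""
    return max(x, 0.0)

def p(x, alpha):
    """Smooth approximation of (x)_+: the integral of the sigmoid
    1/(1 + exp(-alpha*x)). Converges to plus(x) as alpha -> infinity."""
    if alpha * x < -50.0:
        return 0.0  # avoid overflow: (1/alpha)*log(1 + e^{-alpha x}) ~ -x here
    return x + math.log1p(math.exp(-alpha * x)) / alpha

# The gap p(x, alpha) - (x)_+ is at most log(2)/alpha, attained at x = 0,
# so the approximation tightens uniformly as alpha grows.
```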
16. Newton-Armijo Algorithm
- Newton: Minimize a sequence of quadratic approximations to the strongly convex objective function, i.e. solve a sequence of linear equations in n+1 variables. (Small dimensional input space.)
- Armijo: Shorten the distance between successive iterates so as to generate a sufficient decrease in the objective function. (In computational reality, not needed!)
- Global quadratic convergence: Starting from any point, the iterates are guaranteed to converge to the unique solution at a quadratic rate, i.e. the errors get squared. (Typically 6 to 8 iterations, without an Armijo step.)
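The scheme on this slide can be sketched in a few lines (a 1-D Python illustration with an assumed test function, not the SSVM code itself):

```python
import math

def newton_armijo(f, g, h, x, tol=1e-10, itmax=50):
    """Newton's method with an Armijo stepsize, in one dimension.
    f: strongly convex objective, g: its derivative, h: its second
    derivative. Each step solves a linear equation for the Newton
    direction, then backtracks until the decrease is sufficient."""
    for it in range(itmax):
        grad = g(x)
        if abs(grad) < tol:
            return x, it
        d = -grad / h(x)                 # Newton direction (linear solve)
        lam = 1.0                        # Armijo: halve the step until a
        while f(x + lam * d) > f(x) + 0.25 * lam * grad * d:
            lam /= 2.0                   # sufficient decrease holds
        x += lam * d
    return x, itmax

# Example: minimize f(x) = x^2 + e^x (strongly convex); the unique
# minimizer solves 2x + e^x = 0.
xstar, its = newton_armijo(lambda x: x * x + math.exp(x),
                           lambda x: 2.0 * x + math.exp(x),
                           lambda x: 2.0 + math.exp(x),
                           x=5.0)
```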
17. SSVM with a Nonlinear Kernel: Nonlinear Separating Surface in Input Space
18. Examples of Kernels: Generate Nonlinear Separating Surfaces in Input Space
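The kernel examples themselves are images in the transcript; two common members of this family, sketched in NumPy (function and parameter names are illustrative, not from the talk):

```python
import numpy as np

def polynomial_kernel(A, B, degree=2, c=1.0):
    """Polynomial kernel K(A, B') = (A B' + c)^degree, elementwise power.
    Induces a polynomial separating surface in the input space."""
    return (A @ B.T + c) ** degree

def gaussian_kernel(A, B, mu=1.0):
    """Gaussian (radial basis) kernel: K_ij = exp(-mu * ||A_i - B_j||^2)."""
    sq = (np.sum(A * A, axis=1)[:, None]
          + np.sum(B * B, axis=1)[None, :] - 2.0 * (A @ B.T))
    return np.exp(-mu * sq)
```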
19-24. (No Transcript)
25. LSVM: Lagrangian Support Vector Machine; Dual of SVM
26. LSVM: Lagrangian Support Vector Machine; Dual SVM as Symmetric Linear Complementarity Problem
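The equations on this slide are images; reconstructed from the LSVM formulation, the dual and its complementarity form are:

```latex
% Dual of the (2-norm) SVM:
\min_{u}\; \tfrac{1}{2}\, u'Qu - e'u \quad \text{s.t.}\quad u \ge 0,
\qquad Q = \tfrac{I}{\nu} + HH', \quad H = D\,[A \;\; {-e}].
% Its optimality conditions are the symmetric linear complementarity problem
0 \le u \;\perp\; Qu - e \ge 0,
% which, for any \alpha > 0, is equivalent to the fixed-point equation
Qu - e = \bigl((Qu - e) - \alpha u\bigr)_+,
% iterated by LSVM as
u^{i+1} = Q^{-1}\bigl(e + ((Qu^i - e) - \alpha u^i)_+\bigr).
```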
27. LSVM Algorithm: Simple, Linearly Convergent, One Small Matrix Inversion
Key idea: the Sherman-Morrison-Woodbury formula allows the inversion of an extremely large m-by-m matrix Q by merely inverting a much smaller (n+1)-by-(n+1) matrix, as follows
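The formula image is missing from the transcript; the Sherman-Morrison-Woodbury identity being used is:

```latex
Q^{-1} = \Bigl(\tfrac{I}{\nu} + HH'\Bigr)^{-1}
       = \nu\Bigl(I - H\bigl(\tfrac{I}{\nu} + H'H\bigr)^{-1} H'\Bigr).
% H is m x (n+1), so only an (n+1) x (n+1) matrix is ever inverted.
```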
28. LSVM Algorithm (Linear Kernel): 11 Lines of MATLAB Code

function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u>=0,
% Q=I/nu+H*H', H=D*[A -e]
% Input: A, D, nu, itmax, tol;  Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
[m,n]=size(A);alpha=1.9/nu;e=ones(m,1);H=D*[A -e];it=0;
S=H*inv((speye(n+1)/nu+H'*H));
u=nu*(1-S*(H'*e));oldu=u+1;
while it<itmax & norm(oldu-u)>tol
  z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));
  oldu=u;
  u=nu*(z-S*(H'*z));
  it=it+1;
end
opt=norm(u-oldu);w=A'*D*u;gamma=-e'*D*u;

function pl = pl(x); pl = (abs(x)+x)/2;
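For readers without MATLAB, here is a line-for-line NumPy sketch of the same iteration (the function name and the choice of passing the labels as a vector d rather than a diagonal matrix D are ours):

```python
import numpy as np

def lsvm(A, d, nu=1.0, itmax=1000, tol=1e-8):
    """NumPy sketch of the LSVM linear-kernel iteration above.
    Solves min 1/2 u'Qu - e'u s.t. u >= 0, with Q = I/nu + HH' and
    H = D[A -e], inverting only an (n+1) x (n+1) matrix via the
    Sherman-Morrison-Woodbury identity."""
    m, n = A.shape
    alpha = 1.9 / nu                     # any 0 < alpha < 2/nu works
    e = np.ones(m)
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])   # H = D[A -e]
    S = H @ np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)  # SMW factor
    u = nu * (e - S @ (H.T @ e))
    oldu = u + 1.0
    it = 0
    while it < itmax and np.linalg.norm(oldu - u) > tol:
        # z = e + ((Qu - e) - alpha*u)_+, with Qu = u/nu + H(H'u)
        z = 1.0 + np.maximum((u / nu + H @ (H.T @ u)) - alpha * u - 1.0, 0.0)
        oldu = u
        u = nu * (z - S @ (H.T @ z))     # u <- Q^{-1} z via SMW
        it += 1
    w = A.T @ (d * u)                    # separating plane: x'w = gamma
    gamma = -e @ (d * u)
    return it, np.linalg.norm(u - oldu), w, gamma
```

A point x is then classified by sign(x'w - gamma).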
29. LSVM Algorithm (Linear Kernel): Computational Results
- 2 million random points in 10-dimensional space
  - Classified in 6.7 minutes in 6 iterations to 1e-5 accuracy
  - 250 MHz UltraSPARC II with 2 gigabytes of memory
- 32562 points in 123-dimensional space (UCI Adult dataset)
  - Classified in 141 seconds, 55 iterations, to 85% correctness
  - 400 MHz Pentium II with 2 gigabytes of memory
30. LSVM with a Nonlinear Kernel: Formulation
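The formulation is an image in the transcript; reconstructed from the LSVM formulation (G = [A -e], K a completely general kernel):

```latex
% Replace the linear term HH' = D[A\; {-e}][A\; {-e}]'D by a nonlinear kernel:
\min_{u}\; \tfrac{1}{2}\, u'Qu - e'u \quad \text{s.t.}\quad u \ge 0,
\qquad Q = \tfrac{I}{\nu} + D\,K(G, G')\,D, \quad G = [A \;\; {-e}].
% K(G, G') is a dense m x m matrix, so Sherman-Morrison-Woodbury no longer
% helps and Q must be inverted at its full size; this is what motivates RSVM.
```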
31. LSVM Algorithm (Nonlinear Kernel) Application: 100 Iterations, 58 Seconds on a Pentium II, 95.9% Accuracy
32. Reduced Support Vector Machines (RSVM): Large Nonlinear Kernel Classification Problems
- RSVM can solve very large problems
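The reduction can be sketched as follows (a NumPy illustration with assumed names, not the authors' code): instead of the full m x m kernel K(A, A'), RSVM builds the thin m x m-bar rectangular kernel K(A, Abar') for a small random subset Abar of the data.

```python
import numpy as np

def reduced_gaussian_kernel(A, m_bar=50, mu=1.0, seed=0):
    """RSVM idea: the m x m_bar rectangular Gaussian kernel K(A, Abar')
    for a random subset Abar of m_bar rows of A, in place of the full
    m x m kernel K(A, A'). (Illustrative sketch.)"""
    rng = np.random.default_rng(seed)
    idx = rng.choice(A.shape[0], size=m_bar, replace=False)
    Abar = A[idx]
    sq = (np.sum(A * A, axis=1)[:, None]
          + np.sum(Abar * Abar, axis=1)[None, :] - 2.0 * (A @ Abar.T))
    return np.exp(-mu * sq), idx
```

The downstream classifier then has only m_bar kernel columns to work with, which is what lets RSVM handle large datasets with nonlinear kernels.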
33. Conventional SVM Result on Checkerboard Using 50 Random Points Out of 1000
34. RSVM Result on Checkerboard Using the SAME 50 Random Points Out of 1000
35. RSVM on Large Classification Problems: Standard Error over 50 Runs 0.001 to 0.002; RSVM Time = 1.24 x (Random Points Time)
36. Conclusion
- Mathematical programming plays an essential role in SVMs
- New formulations
  - Generalized SVMs
- New algorithm-generating concepts
  - Smoothing (SSVM)
  - Implicit Lagrangian (LSVM)
37. Future Research
- Concave minimization
- Concurrent feature & data selection
- Multiple-instance problems
- SVMs as complementarity problems
- Kernel methods in nonlinear programming
- Multicategory classification algorithms
38. Talk & Papers Available on the Web