1
Mathematical Programming in Support Vector
Machines
  • Olvi L. Mangasarian
  • University of Wisconsin - Madison

High Performance Computation for Engineering
Systems Seminar, MIT, October 4, 2000
2
What is a Support Vector Machine?
  • An optimally defined surface
  • Typically nonlinear in the input space
  • Linear in a higher dimensional space
  • Implicitly defined by a kernel function

3
What are Support Vector Machines Used For?
  • Classification
  • Regression (Data Fitting)
  • Supervised & Unsupervised Learning

(Will concentrate on classification)
4
Example of Nonlinear Classifier: Checkerboard
Classifier
5
Outline of Talk
  • Generalized support vector machines (SVMs)
    • Completely general kernel allows complex
      classification (no Mercer condition!)
  • Smooth support vector machines
    • Smooth & solve SVM by a fast Newton method
  • Lagrangian support vector machines
    • Very fast, simple iterative scheme: one matrix
      inversion, no LP, no QP
  • Reduced support vector machines
    • Handle large datasets with nonlinear kernels

6
Generalized Support Vector Machines: 2-Category
Linearly Separable Case
(Figure: the two point sets A+ and A-)
7
Generalized Support Vector Machines: Algebra of
the 2-Category Linearly Separable Case
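The algebra on this slide was an image and did not survive the transcript. As a reconstruction of the standard linearly separable setup used in Mangasarian's SVM papers (not verbatim from the slide), the two point sets are characterized by a pair of bounding planes:

```latex
% Rows A_i of A with label D_ii = +1 (the set A+) and D_ii = -1 (the set A-)
% lie on opposite sides of the bounding planes x'w = \gamma + 1 and
% x'w = \gamma - 1:
A_i w \ge \gamma + 1 \quad \text{for } D_{ii} = +1, \qquad
A_i w \le \gamma - 1 \quad \text{for } D_{ii} = -1,
% which, with the diagonal label matrix D and e = (1,\dots,1)',
% combine into the single system:
D(Aw - e\gamma) \ge e.
```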
8
Generalized Support Vector Machines: Maximizing
the Margin between Bounding Planes
(Figure: point sets A+ and A- with the margin between the bounding planes)
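The margin formula on this slide was an image; the standard result (a reconstruction, assuming the 2-norm is used to measure distance) is:

```latex
% Distance between the bounding planes x'w = \gamma + 1 and x'w = \gamma - 1:
\text{margin} = \frac{2}{\|w\|_2},
% so maximizing the margin is equivalent to minimizing \|w\|.
```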
9
Generalized Support Vector Machines: The Linear
Support Vector Machine Formulation
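The formulation itself was an image in the original slide. A standard linear soft-margin SVM of the kind used in this talk (a reconstruction, with slack vector y and weight ν on the classification error) is:

```latex
\min_{w,\gamma,y}\; \nu\, e'y + \tfrac{1}{2}\, w'w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0.
```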
10
Breast Cancer Diagnosis Application: 97% Tenfold
Cross Validation Correctness, 780 Samples (494
Benign, 286 Malignant)
11
Another Application: Disputed Federalist Papers
(Bosch & Smith 1998): 56 Hamilton, 50
Madison, 12 Disputed
12
Generalized Support Vector Machine
Motivation (Nonlinear Kernel Without Mercer
Condition)
13
SSVM: Smooth Support Vector Machine (SVM as an
Unconstrained Minimization Problem)
Changing to the 2-norm and measuring the margin in
( ) space
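The equations on this slide were images. From the SSVM approach, replacing the 1-norm slack by the squared 2-norm and adding γ² to the margin term yields an equivalent unconstrained minimization (a reconstruction from the SSVM paper of Lee & Mangasarian, not verbatim from the slide):

```latex
\min_{w,\gamma}\; \tfrac{\nu}{2}\, \|(e - D(Aw - e\gamma))_+\|_2^2
  \;+\; \tfrac{1}{2}\,(w'w + \gamma^2)
% where (x)_+ = \max(x, 0) componentwise; SSVM then replaces (x)_+
% by a smooth approximation so Newton's method applies.
```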
14
Smoothing the Plus Function: Integrate the
Sigmoid Function
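As a small sketch of this smoothing step: integrating the sigmoid 1/(1 + exp(-αx)) gives p(x, α) = x + (1/α) log(1 + exp(-αx)), a smooth approximation of the plus function (x)₊ that tightens as α grows (function names here are illustrative):

```python
import numpy as np

def plus(x):
    # the plus function (x)_+ = max(x, 0)
    return np.maximum(x, 0.0)

def p(x, alpha):
    # integral of the sigmoid 1/(1 + exp(-alpha*x)): a smooth
    # approximation of the plus function; logaddexp keeps the
    # computation stable for large |x|
    return x + np.logaddexp(0.0, -alpha * x) / alpha

x = np.linspace(-5.0, 5.0, 1001)
for alpha in (1.0, 5.0, 25.0):
    gap = np.max(np.abs(p(x, alpha) - plus(x)))
    # worst-case gap is log(2)/alpha, attained at x = 0
    print(alpha, gap)
```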
15
SSVM: The Smooth Support Vector Machine
(Smoothing the Plus Function)
16
Newton: Minimize a sequence of quadratic
approximations to the strongly convex objective
function, i.e. solve a sequence of linear
equations in n+1 variables. (Small dimensional
input space.)
Armijo: Shorten the distance between successive
iterates so as to generate sufficient decrease in
the objective function. (In computational reality,
not needed!)
Global Quadratic Convergence: Starting from any
point, the iterates are guaranteed to converge to
the unique solution at a quadratic rate, i.e.
errors get squared. (Typically, 6 to 8
iterations without an Armijo step.)
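A minimal NumPy sketch of this Newton-Armijo scheme applied to the smoothed SSVM objective (all names, data, and parameter values are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(x, a):
    # numerically stable logistic function: derivative of the smoothed plus
    return np.exp(-np.logaddexp(0.0, -a * x))

def ssvm_newton(A, d, nu=10.0, a=5.0, tol=1e-8, itmax=50):
    """Newton-Armijo on the smoothed SSVM objective; z = [w; gamma]."""
    m, n = A.shape
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])   # H = D[A -e]

    def obj(z):
        r = 1.0 - H @ z                                 # e - D(Aw - e*gamma)
        p = r + np.logaddexp(0.0, -a * r) / a           # smoothed plus
        return (nu / 2.0) * (p @ p) + 0.5 * (z @ z)

    z = np.zeros(n + 1)
    for it in range(itmax):
        r = 1.0 - H @ z
        p = r + np.logaddexp(0.0, -a * r) / a
        s = sigmoid(r, a)                               # p'(r)
        grad = -nu * H.T @ (p * s) + z
        if np.linalg.norm(grad) < tol:
            break
        # Newton: one (n+1)-by-(n+1) linear system per iteration
        curv = s * s + p * a * s * (1.0 - s)            # (p')^2 + p * p''
        hess = nu * H.T @ (curv[:, None] * H) + np.eye(n + 1)
        step = np.linalg.solve(hess, -grad)
        # Armijo: halve the stepsize until sufficient decrease
        t = 1.0
        while obj(z + t * step) > obj(z) + 0.05 * t * (grad @ step):
            t /= 2.0
        z = z + t * step
    return z[:n], z[n], it

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 2))
d = np.where(A @ np.array([1.0, -1.0]) + 0.25 > 0, 1.0, -1.0)
w, gamma, it = ssvm_newton(A, d)
acc = np.mean(np.sign(A @ w - gamma) == d)
```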
17
SSVM with a Nonlinear Kernel: Nonlinear
Separating Surface in Input Space
18
Examples of Kernels That Generate Nonlinear
Separating Surfaces in Input Space
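Two kernels of the kind this slide illustrates, as a small sketch (the Gaussian and polynomial kernels; parameter names mu, c, d are illustrative):

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.5):
    # K(A, B')_ij = exp(-mu * ||A_i - B_j||^2): yields separating
    # surfaces that are nonlinear in input space
    sq = (np.sum(A * A, axis=1)[:, None]
          + np.sum(B * B, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-mu * np.maximum(sq, 0.0))

def polynomial_kernel(A, B, c=1.0, d=3):
    # K(A, B') = (A B' + c)^d
    return (A @ B.T + c) ** d

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
K = gaussian_kernel(A, A)
# K is symmetric with unit diagonal: K_ii = exp(0) = 1
```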
19-24
(No Transcript: image-only slides)
25
LSVM: Lagrangian Support Vector Machine (Dual of
SVM)
26
LSVM: Lagrangian Support Vector Machine (Dual SVM
as Symmetric Linear Complementarity Problem)
27
LSVM Algorithm: Simple, Linearly Convergent, One
Small Matrix Inversion
Key Idea: The Sherman-Morrison-Woodbury formula
allows the inversion of an extremely
large m-by-m matrix Q by merely inverting a much
smaller (n+1)-by-(n+1) matrix, as follows
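A quick numerical check of this key idea (sizes are illustrative): the Sherman-Morrison-Woodbury identity gives inv(I/ν + H H') = ν (I - H inv(I/ν + H'H) H'), so only an (n+1)-by-(n+1) matrix is ever inverted:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, nu = 300, 10, 0.5
H = rng.standard_normal((m, n + 1))          # H = D[A -e] in LSVM

# direct inverse of the large m-by-m matrix Q = I/nu + H H'
Q = np.eye(m) / nu + H @ H.T
Q_inv_direct = np.linalg.inv(Q)

# SMW: only the small (n+1)-by-(n+1) matrix is inverted
small = np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)
Q_inv_smw = nu * (np.eye(m) - H @ small @ H.T)

err = np.max(np.abs(Q_inv_direct - Q_inv_smw))
```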
28
LSVM Algorithm (Linear Kernel): 11 Lines of MATLAB
Code
function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u>=0,
% Q=I/nu+H*H', H=D*[A -e]
% Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
[m,n]=size(A);alpha=1.9/nu;e=ones(m,1);H=D*[A -e];it=0;
S=H*inv((speye(n+1)/nu+H'*H));
u=nu*(1-S*(H'*e));oldu=u+1;
while it<itmax & norm(oldu-u)>tol
  z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));
  oldu=u;
  u=nu*(z-S*(H'*z));
  it=it+1;
end;
opt=norm(u-oldu);w=A'*D*u;gamma=-e'*D*u;
function pl = pl(x); pl = (abs(x)+x)/2;
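For readers without MATLAB, a line-for-line NumPy port of the code above (a sketch; the label vector d replaces the diagonal matrix D, and the test data are illustrative):

```python
import numpy as np

def svml(A, d, nu=1.0, itmax=1000, tol=1e-5):
    # LSVM with SMW for min 1/2 u'Qu - e'u  s.t. u >= 0,
    # Q = I/nu + H H', H = D[A -e]
    m, n = A.shape
    e = np.ones(m)
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])
    alpha = 1.9 / nu
    it = 0
    S = H @ np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)   # SMW small inverse
    u = nu * (1.0 - S @ (H.T @ e))
    oldu = u + 1.0
    while it < itmax and np.linalg.norm(oldu - u) > tol:
        z = 1.0 + np.maximum(((u / nu + H @ (H.T @ u)) - alpha * u) - 1.0, 0.0)
        oldu = u
        u = nu * (z - S @ (H.T @ z))
        it += 1
    w = A.T @ (d * u)
    gamma = -e @ (d * u)
    return it, np.linalg.norm(u - oldu), w, gamma

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 4))
d = np.where(A @ np.array([1.0, -1.0, 0.5, 0.0]) > 0.2, 1.0, -1.0)
it, opt, w, gamma = svml(A, d)
acc = np.mean(np.sign(A @ w - gamma) == d)
```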
29
LSVM Algorithm (Linear Kernel): Computational
Results
  • 2 million random points in 10-dimensional space
  • Classified in 6.7 minutes in 6 iterations to e-5
    accuracy
  • 250 MHz UltraSPARC II with 2 gigabytes of memory
  • CPLEX ran out of memory
  • 32,562 points in 123-dimensional space (UCI Adult
    Dataset)
  • Classified in 141 seconds, 55 iterations, to 85%
    correctness
  • 400 MHz Pentium II with 2 gigabytes of memory
30
LSVM (Nonlinear Kernel): Formulation
31
LSVM Algorithm (Nonlinear Kernel) Application: 100
Iterations, 58 Seconds on Pentium II, 95.9%
Accuracy
32
Reduced Support Vector Machines (RSVM): Large
Nonlinear Kernel Classification Problems
  • RSVM can solve very large problems
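A sketch of the RSVM idea (sizes are illustrative): instead of the full m-by-m kernel K(A, A'), use a thin rectangular kernel K(A, Ā') built from a small random subset Ā of the rows of A, so storage and work scale with the subset size:

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.5):
    # K(A, B')_ij = exp(-mu * ||A_i - B_j||^2)
    sq = (np.sum(A * A, axis=1)[:, None]
          + np.sum(B * B, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-mu * np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
A = rng.standard_normal((10000, 5))           # m = 10000 points

# RSVM: keep only mbar = 50 random rows for the kernel columns
idx = rng.choice(A.shape[0], size=50, replace=False)
K_reduced = gaussian_kernel(A, A[idx])        # shape (10000, 50)
# the full kernel would be 10000 x 10000; the reduced one is 200x smaller
```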

33
Conventional SVM Result on Checkerboard Using 50
Random Points Out of 1000
34
RSVM Result on Checkerboard Using SAME 50 Random
Points Out of 1000
35
RSVM on Large Classification Problems: Standard
Error over 50 Runs: 0.001 to 0.002; RSVM Time =
1.24 * (Random Points Time)
36
Conclusion
  • Mathematical Programming plays an essential role
    in SVMs
  • Theory
    • New formulations: Generalized SVMs
    • New algorithm-generating concepts: Smoothing
      (SSVM), Implicit Lagrangian (LSVM)
  • Algorithms
    • Fast: SSVM
    • Massive: LSVM, RSVM

37
Future Research
  • Theory
    • Concave minimization
    • Concurrent feature & data selection
    • Multiple-instance problems
    • SVMs as complementarity problems
    • Kernel methods in nonlinear programming
  • Algorithms
    • Multicategory classification algorithms

38
Talk & Papers Available on Web
  • www.cs.wisc.edu/~olvi