Nonlinear Data Discrimination via Generalized Support Vector Machines

Learn more at: https://ftp.cs.wisc.edu


1
Nonlinear Data Discrimination via Generalized Support Vector Machines

David R. Musicant and Olvi L. Mangasarian
University of Wisconsin - Madison

www.cs.wisc.edu/musicant
2
Outline

The linear support vector machine (SVM)
  Linear kernel
Generalized support vector machine (GSVM)
  Nonlinear indefinite kernel
Linear Programming Formulation of GSVM
  MINOS
Quadratic Programming Formulation of GSVM
  Successive Overrelaxation (SOR)
Numerical comparisons
Conclusions

3
The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
[Figure: point sets A+ and A- divided by a separating surface]
4
The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case

Given m points in the n-dimensional space R^n
Represented by an m x n matrix A
Membership of each point A_i in the classes +1 or -1 is specified by
An m x m diagonal matrix D with +1 or -1 along its diagonal
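
As a small illustration of this data representation, the sketch below stores A as a list of rows and D as a diagonal matrix of labels, then evaluates D(Aw - e*gamma) for a candidate plane x'w = gamma. The toy points, the plane parameters w and gamma, and the helper name `margins` are assumptions made for illustration and do not come from the slides.

```python
# Toy data: m = 4 points in R^2 (illustrative values only).
A = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, -1.0]]
labels = [1, 1, -1, -1]  # class of each row of A: +1 or -1
m = len(A)

# The m x m diagonal matrix D carries the labels on its diagonal.
D = [[labels[i] if i == j else 0 for j in range(m)] for i in range(m)]

def margins(A, labels, w, gamma):
    """Return the vector D(Aw - e*gamma).

    Entry i is positive exactly when point i lies on its own side
    of the plane x'w = gamma (hypothetical helper, not from the slides).
    """
    return [
        labels[i] * (sum(a * wk for a, wk in zip(A[i], w)) - gamma)
        for i in range(len(A))
    ]

# A candidate plane chosen by hand for this toy set.
print(margins(A, labels, [0.5, 0.5], 1.0))
```

Every entry of the result being at least 1 means the candidate plane separates the two classes with no slack.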

5
Preliminary Attempt at the (Linear) Support Vector Machine: Robust Linear Programming

Solve the following mathematical program

where y is a nonnegative error (slack) vector
Note: y = 0 if the convex hulls of A+ and A- do not intersect.
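
The program itself is not reproduced in this transcript. One form consistent with the slack description above is sketched here, under the assumption that w and gamma define the separating plane x'w = gamma and that e is a vector of ones (these symbols are assumed, not taken from the surviving slide text):

```latex
\[
\begin{aligned}
\min_{w,\gamma,y}\quad & e^{\top} y \\
\text{s.t.}\quad & D(Aw - e\gamma) + y \ge e, \\
& y \ge 0 .
\end{aligned}
\]
```

With this form, y = 0 is feasible exactly when some plane puts every point on its own side with margin at least 1, which is the separable case described in the note.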

6
The (Linear) Support Vector Machine: Maximize Margin Between Separating Planes
[Figure: point sets A+ and A- with the margin between the separating planes]
7
The (Linear) Support Vector Machine Formulation

Solve the following mathematical program

where y is a nonnegative error (slack) vector
Note: y = 0 if the convex hulls of A+ and A- do not intersect.
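
The formulation is not reproduced in this transcript. A standard soft-margin form that matches the slack description is sketched below. It assumes a plane x'w = gamma, a vector of ones e, and a positive weight nu that trades off error against margin; the symbol nu and the exact weighting are assumptions and may differ from the original slide.

```latex
\[
\begin{aligned}
\min_{w,\gamma,y}\quad & \nu\, e^{\top} y + \tfrac{1}{2}\, w^{\top} w \\
\text{s.t.}\quad & D(Aw - e\gamma) + y \ge e, \\
& y \ge 0 .
\end{aligned}
\]
```

The term w'w is what enlarges the margin: the distance between the planes x'w = gamma + 1 and x'w = gamma - 1 is 2/||w||, so a smaller norm of w means a wider margin.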

8
GSVM: Generalized Support Vector Machine, Linear Programming Formulation
9
Examples of Kernels

Examples
Polynomial Kernel
  The exponent denotes componentwise exponentiation, as in MATLAB
Radial Basis Kernel
Neural Network Kernel
  The kernel applies the step function componentwise
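
The three kernel families named above can be sketched in plain Python. The kernel formulas are not reproduced in this transcript, so the forms below are common textbook versions chosen as an assumption: a scalar offset `mu`, a degree `d`, and a step function that returns 1 for positive arguments. All function and parameter names are hypothetical.

```python
import math

def gram(A, B):
    """Matrix of inner products A_i . B_j (the linear kernel AB')."""
    return [[sum(x * y for x, y in zip(a, b)) for b in B] for a in A]

def polynomial_kernel(A, B, mu=1.0, d=2):
    """(AB' + mu) raised to the power d, entry by entry."""
    return [[(g + mu) ** d for g in row] for row in gram(A, B)]

def rbf_kernel(A, B, mu=1.0):
    """exp(-mu * ||A_i - B_j||^2) for every pair of rows."""
    return [
        [math.exp(-mu * sum((x - y) ** 2 for x, y in zip(a, b))) for b in B]
        for a in A
    ]

def step_kernel(A, B, mu=1.0):
    """Step function applied entry by entry to (AB' + mu)."""
    return [[1.0 if g + mu > 0 else 0.0 for g in row] for row in gram(A, B)]

A = [[1.0, 0.0], [0.0, 1.0]]
print(polynomial_kernel(A, A))
print(rbf_kernel(A, A))
print(step_kernel(A, A, mu=-0.5))
```

The step kernel is not positive semidefinite in general, which is one reason a formulation that tolerates indefinite kernels is of interest.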

10
A Nonlinear Kernel Application
Checkerboard Training Set: 1000 Points in R^2
Separate 486 Asterisks from 514 Dots
11
Previous Work
12
Polynomial Kernel
13
Large Margin Classifier: (SOR) Reformulation in (w, γ) Space
[Figure: point sets A+ and A- with the bounding planes]
14
(SOR) Linear Support Vector Machine: Quadratic Programming Formulation

Solve the following mathematical program

The quadratic term here maximizes the distance between the bounding planes in the (w, γ) space
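
The program is not reproduced in this transcript. A form consistent with the remark about the quadratic term is sketched below, assuming a plane x'w = gamma, a vector of ones e, a slack vector y, and a positive weight nu (the symbol names are assumptions). The only change from the usual soft-margin program is the added gamma squared in the objective:

```latex
\[
\begin{aligned}
\min_{w,\gamma,y}\quad & \nu\, e^{\top} y + \tfrac{1}{2}\left( w^{\top} w + \gamma^{2} \right) \\
\text{s.t.}\quad & D(Aw - e\gamma) + y \ge e, \\
& y \ge 0 .
\end{aligned}
\]
```

Penalizing gamma together with w measures the margin in the (n+1)-dimensional space of (w, gamma), and it removes the equality constraint that would otherwise appear in the dual, leaving only simple bounds.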

15
Introducing a Nonlinear Kernel

The Wolfe Dual for the SOR Linear SVM is

Linear separating surface
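
The dual and the surfaces are not reproduced in this transcript. The derivation below is a sketch that assumes the primal objective is nu e'y + (1/2)(w'w + gamma^2) with constraints D(Aw - e gamma) + y >= e and y >= 0; the symbols nu, e, u, and K are assumed names. Setting the gradient of the Lagrangian to zero gives w = A'Du and gamma = -e'Du, and substituting back yields:

```latex
\[
\begin{aligned}
\min_{u}\quad & \tfrac{1}{2}\, u^{\top} D\left( AA^{\top} + ee^{\top} \right) D\, u - e^{\top} u \\
\text{s.t.}\quad & 0 \le u \le \nu e, \\[4pt]
\text{linear surface:}\quad & x^{\top} A^{\top} D u = \gamma, \qquad \gamma = -e^{\top} D u, \\
\text{nonlinear surface:}\quad & K(x^{\top}, A^{\top})\, D u = \gamma
\quad \text{after replacing } AA^{\top} \text{ by } K(A, A^{\top}).
\end{aligned}
\]
```

The data enter the dual only through the product AA', which is why a kernel can be introduced by a direct substitution.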

16
SVM Optimality Conditions

Define
Then the dual SVM becomes much simpler!

Gradient projection necessary and sufficient optimality condition

The projection maps u onto the feasible region of the dual problem
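
The definitions on this slide are not reproduced in this transcript. One consistent reading, stated as an assumption, takes H = D[A  -e], so that HH' = D(AA' + ee')D, and lets E be the diagonal of HH'. The symbols H, E, omega, nu, and the subscript # for projection are assumed names:

```latex
\[
\begin{aligned}
& \min_{0 \le u \le \nu e}\ \tfrac{1}{2}\, \lVert H^{\top} u \rVert^{2} - e^{\top} u, \\[4pt]
& u = \left( u - \omega E^{-1} \left( H H^{\top} u - e \right) \right)_{\#}, \qquad \omega > 0, \\[4pt]
& \left( (v)_{\#} \right)_{i} = \min\{ \max\{ v_{i},\, 0 \},\, \nu \}.
\end{aligned}
\]
```

Because the feasible region is a box, the projection acts on each component separately, which is what makes a cheap component-by-component update possible.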

17
SOR Algorithm and Convergence

The optimality conditions above lead to the SOR algorithm

Remember, the optimality conditions are expressed as

SOR Linear Convergence [Luo-Tseng 1993]
  The iterates of the SOR algorithm converge R-linearly to a solution of the dual problem
  The objective function values converge Q-linearly to the optimal value of the dual objective
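
A projected SOR sweep for the linear-kernel case can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the dual min (1/2)||H'u||^2 - e'u subject to 0 <= u <= nu e with H = D[A  -e], updates one component of u at a time using the newest values of the others, and keeps v = H'u = (w, gamma) current so each update costs O(n). The function name, defaults, and toy data are hypothetical.

```python
def sor_linear_svm(A, d, nu=1.0, omega=1.0, iters=200):
    """Projected SOR sketch for the dual with bounds 0 <= u <= nu.

    A: list of m rows in R^n, d: list of m labels (+1 or -1).
    omega: relaxation factor, assumed to lie in (0, 2).
    Returns (w, gamma, u) for the plane x'w = gamma.
    """
    m, n = len(A), len(A[0])
    # Row i of H = D [A, -e] is d_i * (A_i, -1).
    H = [[d[i] * a for a in A[i]] + [-d[i]] for i in range(m)]
    diag = [sum(h * h for h in row) for row in H]  # diagonal of HH'
    u = [0.0] * m
    v = [0.0] * (n + 1)  # v = H'u, so v = (w, gamma)
    for _ in range(iters):
        for i in range(m):
            # i-th component of the dual gradient HH'u - e.
            grad = sum(hk * vk for hk, vk in zip(H[i], v)) - 1.0
            # Relaxed step, then projection onto [0, nu].
            new = min(max(u[i] - omega * grad / diag[i], 0.0), nu)
            delta = new - u[i]
            if delta != 0.0:
                for k in range(n + 1):
                    v[k] += delta * H[i][k]
                u[i] = new
    return v[:n], v[n], u

# Toy usage: two points per class on a line through the origin.
A = [[2, 2], [3, 3], [0, 0], [-1, -1]]
d = [1, 1, -1, -1]
w, gamma, u = sor_linear_svm(A, d, nu=10.0)
print(w, gamma)
```

Only one row of H is touched per update, so the data can be processed one point at a time, which is what makes the approach attractive for very large datasets.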

18
Numerical Testing

Comparison of Linear & Nonlinear Kernels using
  Linear Programming
  Quadratic Programming (SOR) Formulations
Data Sets
  UCI Liver Disorders: 345 points in R^6
  Bell Labs Checkerboard: 1000 points in R^2
  Gaussian Synthetic: 1000 points in R^32
  SCDS Synthetic: 1 million points in R^32
  Massive Synthetic: 10 million points in R^32
Machines
  Cluster of 4 Sun Enterprise E6000 machines, each consisting of 16 UltraSPARC II 250 MHz processors with 2 GB RAM
  Total: 64 processors, 8 GB RAM

19
Comparison of Linear & Nonlinear SVMs: Linear Programming Generated

Nonlinear kernels yield better training and
testing set correctness

20
SOR Results

Comparison of linear and nonlinear kernels

Examples of training on massive data
  1 million point dataset generated by the SCDS generator
    Trained completely in 9.7 hours
    Tuning set reached 99.7% of final accuracy in 0.3 hours
  10 million point randomly generated dataset
    Tuning set reached 95% of final accuracy in 14.3 hours
    Under 10,000 iterations

21
Conclusions

Linear programming and successive overrelaxation can generate complex nonlinear separating surfaces via GSVMs
Nonlinear separating surfaces improve generalization over linear ones
SOR can handle very large problems not (easily) solvable by other methods
SOR scales up with virtually no changes
Future directions
  Parallel SOR for very large problems not resident in memory
  Massive multicategory discrimination via SOR
  Support vector regression

22
Questions?
