Nonlinear Data Discrimination via Generalized Support Vector Machines

1
Nonlinear Data Discrimination via Generalized
Support Vector Machines
  • David R. Musicant and Olvi L. Mangasarian
  • University of Wisconsin - Madison

www.cs.wisc.edu/musicant
2
Outline
  • The linear support vector machine (SVM)
  • Linear kernel
  • Generalized support vector machine (GSVM)
  • Nonlinear indefinite kernel
  • Linear Programming Formulation of GSVM
  • MINOS
  • Quadratic Programming Formulation of GSVM
  • Successive Overrelaxation (SOR)
  • Numerical comparisons
  • Conclusions

3
The Discrimination Problem: The Fundamental
2-Category Linearly Separable Case
[Figure: point sets A+ and A- divided by a separating surface]
4
The Discrimination Problem: The Fundamental
2-Category Linearly Separable Case
  • Given m points in the n-dimensional space $R^n$
  • Represented by an m × n matrix A
  • Membership of each point $A_i$ in the class +1 or
    -1 is specified by
  • An m × m diagonal matrix D with +1 or -1 entries along its
    diagonal

5
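Preliminary Attempt at the (Linear) Support
Vector Machine: Robust Linear Programming
  • Solve the following mathematical program:
    $\min_{w,\gamma,y}\ e'y \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\ \ y \ge 0$
  • where y is a nonnegative error (slack) vector and e is a vector of ones
  • Note: y = 0 if the convex hulls of A+ and A- do not
    intersect.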
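Once a plane $(w, \gamma)$ is fixed, the smallest slack satisfying $D(Aw - e\gamma) + y \ge e,\ y \ge 0$ is the componentwise plus function $y = (e - D(Aw - e\gamma))_+$, so $e'y$ measures how badly that plane misclassifies. The plain-Python sketch below evaluates this for a given plane; it assumes the program is $\min e'y$ under those constraints, it does not solve the linear program over $(w, \gamma)$ itself, and the helper names `rlp_slack` and `rlp_objective` and the toy data are hypothetical.

```python
def rlp_slack(A, d, w, gamma):
    """Minimal slack y for a FIXED plane (w, gamma).

    A is a list of rows A_i, d the labels (+1/-1), i.e. the diagonal of D.
    y_i = max(0, 1 - d_i * (A_i w - gamma)), the plus function.
    """
    y = []
    for row, label in zip(A, d):
        score = sum(a * b for a, b in zip(row, w)) - gamma  # A_i w - gamma
        y.append(max(0.0, 1.0 - label * score))
    return y


def rlp_objective(A, d, w, gamma):
    """e'y: total error of the plane x'w = gamma."""
    return sum(rlp_slack(A, d, w, gamma))


# Hypothetical toy data: A+ on the right, A- on the left.
A = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
d = [1, 1, -1, -1]
print(rlp_slack(A, d, [1.0, 0.0], 0.0))       # [0.0, 0.0, 0.0, 0.0]
print(rlp_objective(A, d, [0.2, 0.0], 0.0))   # about 2.0
```

The first plane separates the toy points with room to spare, so y = 0, matching the note on non-intersecting convex hulls; shrinking w to [0.2, 0] keeps the same direction but violates the unit-margin constraints, and the slack absorbs the violation.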

6
The (Linear) Support Vector Machine: Maximize
Margin Between Separating Planes
[Figure: point sets A+ and A- with the margin between the separating planes]
7
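The (Linear) Support Vector Machine Formulation
  • Solve the following mathematical program:
    $\min_{w,\gamma,y}\ \nu e'y + \tfrac{1}{2}w'w \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\ \ y \ge 0$
  • where y is a nonnegative error (slack) vector and $\nu > 0$ weights the error against the margin term
  • Note: y = 0 if the convex hulls of A+ and A- do not
    intersect.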
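Assuming the standard form of this program, $\min \nu e'y + \tfrac{1}{2}w'w$ subject to $D(Aw - e\gamma) + y \ge e,\ y \ge 0$, and assuming the bounding planes are $x'w = \gamma \pm 1$ (a convention not spelled out in this transcript), the margin between them is $2/\|w\|_2$. The sketch below, with hypothetical helpers `svm_objective` and `margin` and toy data, shows why the quadratic term favors a wide margin: both planes separate the data with zero slack, and the one with the smaller w has both the wider margin and the smaller objective.

```python
import math


def svm_objective(A, d, w, gamma, nu):
    """nu*e'y + 0.5*w'w, with y the minimal slack for the fixed plane (w, gamma)."""
    y = [max(0.0, 1.0 - label * (sum(a * b for a, b in zip(row, w)) - gamma))
         for row, label in zip(A, d)]
    return nu * sum(y) + 0.5 * sum(wi * wi for wi in w)


def margin(w):
    """Euclidean distance between the planes x'w = gamma + 1 and x'w = gamma - 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


A = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]   # hypothetical toy data
d = [1, 1, -1, -1]
for w in ([1.0, 0.0], [0.5, 0.0]):
    # Both planes have zero slack; the smaller w wins on margin and objective.
    print(w, margin(w), svm_objective(A, d, w, 0.0, nu=1.0))
```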

8
GSVM: Generalized Support Vector Machine, Linear
Programming Formulation
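The formulation itself did not survive in this transcript. A hedged reconstruction, assuming the usual GSVM linear program with a general kernel $K(A, A')$, error weight $\nu > 0$, and a 1-norm penalty on the dual-like variable u (bounded through the auxiliary vector s); treat the exact objective as an assumption rather than the slide's text:

```latex
\min_{u,\gamma,y,s}\ \nu e'y + e's
\quad \text{s.t.} \quad
D\bigl(K(A,A')Du - e\gamma\bigr) + y \ge e, \qquad
-s \le u \le s, \qquad
y \ge 0
```

With the linear kernel $K(A, A') = AA'$ and $w = A'Du$, the first constraint reduces to $D(Aw - e\gamma) + y \ge e$; in general the separating surface is $K(x', A')Du = \gamma$.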
9
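Examples of Kernels
  • Examples
  • Polynomial Kernel: $(AA' + \mu aa')^{d}_{\bullet}$
  • $(\cdot)^{d}_{\bullet}$ denotes componentwise exponentiation, as in
    MATLAB
  • Radial Basis Kernel: $\varepsilon^{-\mu\|A_i - A_j\|^2},\ i, j = 1, \dots, m$ ($\varepsilon$ is the base of natural logarithms)
  • Neural Network Kernel: $(AA' + \mu aa')_{*}$
  • $(\cdot)_{*}$ denotes the step function, applied componentwise.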
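A plain-Python sketch of the three kernel families named on this slide, evaluated entry by entry to build $K(A, B')$. It assumes the polynomial entry is $(A_iB_j' + \mu)^d$, reads the radial basis kernel with base e (`math.exp`), and takes the neural network entry to be the step function of $A_iB_j' + \mu$, equal to 1 for positive arguments and 0 otherwise; the function names and default parameters are hypothetical.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, mu=1.0, degree=2):
    """(x'z + mu)^degree."""
    return (dot(x, z) + mu) ** degree


def rbf_kernel(x, z, mu=1.0):
    """exp(-mu * ||x - z||^2)."""
    return math.exp(-mu * sum((a - b) ** 2 for a, b in zip(x, z)))


def neural_network_kernel(x, z, mu=0.0):
    """Step function of x'z + mu: 1.0 if positive, else 0.0 (assumed convention)."""
    return 1.0 if dot(x, z) + mu > 0 else 0.0


def kernel_matrix(A, B, kernel, **params):
    """Entry (i, j) is kernel(A_i, B_j) for rows A_i of A and B_j of B."""
    return [[kernel(a, b, **params) for b in B] for a in A]


A = [[1.0, 0.0], [0.0, 1.0]]
print(kernel_matrix(A, A, polynomial_kernel, mu=1.0, degree=2))  # [[4.0, 1.0], [1.0, 4.0]]
```

Passing the kernel as a function keeps the SVM code independent of the kernel choice; only `kernel_matrix` changes when the kernel does. Note that the step-function kernel is generally not positive semidefinite, which is one reason the GSVM framework allows indefinite kernels.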

10
A Nonlinear Kernel Application: Checkerboard
Training Set, 1000 Points in $R^2$. Separate 486
Asterisks from 514 Dots
11
Previous Work
12
Polynomial Kernel
13
Large Margin Classifier: (SOR) Reformulation in
(w, γ) Space
[Figure: point sets A+ and A- with the bounding planes]
14
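(SOR) Linear Support Vector Machine: Quadratic
Programming Formulation
  • Solve the following mathematical program:
    $\min_{w,\gamma,y}\ \nu e'y + \tfrac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\ \ y \ge 0$
  • The quadratic term here maximizes the distance
    between the bounding planes in the (w, γ) space $R^{n+1}$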

15
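Introducing a Nonlinear Kernel
  • The Wolfe Dual for the SOR Linear SVM is
    $\min_{0 \le u \le \nu e}\ \tfrac{1}{2}u'D(AA' + ee')Du - e'u$
  • Linear separating surface: $x'w = \gamma$, with $w = A'Du$ and $\gamma = -e'Du$
  • Replacing $AA'$ by a nonlinear kernel $K(A, A')$ gives the nonlinear separating surface $K(x', A')Du + e'Du = 0$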
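From a dual solution u, the primal plane follows as $w = A'Du$ and $\gamma = -e'Du$ (relations that follow from the Wolfe dual of the SOR formulation with the $\gamma^2/2$ term), so a new point x is classified by the sign of $x'A'Du + e'Du$; replacing $x'A'$ by $K(x', A')$ gives the nonlinear classifier. The sketch below uses hypothetical helper names and toy data; for these two points, the linear kernel, and $\nu \ge 0.125$, the vector u = [0.125, 0.125] solves the dual.

```python
def linear_kernel(x, z):
    """K(x', z) = x'z, the linear case."""
    return sum(a * b for a, b in zip(x, z))


def dual_to_plane(A, d, u):
    """Recover w = A'Du and gamma = -e'Du from a dual vector u (linear case)."""
    n = len(A[0])
    w = [sum(d[i] * u[i] * A[i][k] for i in range(len(A))) for k in range(n)]
    gamma = -sum(di * ui for di, ui in zip(d, u))
    return w, gamma


def kernel_decision(x, A, d, u, kernel=linear_kernel):
    """K(x', A')Du + e'Du; positive predicts class +1, negative predicts class -1."""
    kdu = sum(kernel(x, A[i]) * d[i] * u[i] for i in range(len(A)))
    return kdu + sum(di * ui for di, ui in zip(d, u))


A = [[2.0, 0.0], [-2.0, 0.0]]   # hypothetical toy data: one point per class
d = [1, -1]
u = [0.125, 0.125]
w, gamma = dual_to_plane(A, d, u)
print(w)                                      # [0.5, 0.0]
print(kernel_decision([3.0, 1.0], A, d, u))   # 1.5, equal to x'w - gamma
```

Only the kernel evaluations against the training rows are needed at prediction time, so the same `kernel_decision` works unchanged for any kernel passed in.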

16
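SVM Optimality Conditions
  • Define $M = D(K(A,A') + ee')D$, where $K(A,A') = AA'$ for the linear SVM
  • Then the dual SVM becomes much simpler: $\min_{0 \le u \le \nu e}\ \tfrac{1}{2}u'Mu - e'u$
  • Gradient projection necessary and sufficient
    optimality condition: $u = \left(u - \omega E^{-1}(Mu - e)\right)_{\#}$, $\omega > 0$, where E is the diagonal of M
  • $(\cdot)_{\#}$ denotes projecting u onto the region $0 \le u \le \nu e$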
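The optimality condition can be checked numerically: clip $u - \omega E^{-1}(Mu - e)$ to the box $0 \le u \le \nu e$ and compare the result with u. A hedged plain-Python sketch, assuming M is a dense list of rows with a positive diagonal (E is that diagonal) and the dual is $\min_{0 \le u \le \nu e} \tfrac{1}{2}u'Mu - e'u$; `project_box` and `optimality_residual` are hypothetical names. The example matrix [[5, 3], [3, 5]] is $M = D(AA' + ee')D$ for the points (2, 0) and (-2, 0) labeled +1 and -1.

```python
def project_box(u, upper):
    """(u)_#: clip each component to the interval [0, upper]."""
    return [min(max(ui, 0.0), upper) for ui in u]


def optimality_residual(M, u, nu, omega=1.0):
    """Largest |u_j - ((u - omega*E^{-1}(Mu - e))_#)_j|; zero exactly at a dual solution."""
    m = len(u)
    grad = [sum(M[j][k] * u[k] for k in range(m)) - 1.0 for j in range(m)]  # Mu - e
    step = [u[j] - omega * grad[j] / M[j][j] for j in range(m)]            # E^{-1} scaling
    return max(abs(a - b) for a, b in zip(u, project_box(step, nu)))


M = [[5.0, 3.0], [3.0, 5.0]]
print(optimality_residual(M, [0.125, 0.125], nu=1.0))  # 0.0
print(optimality_residual(M, [0.0, 0.0], nu=1.0))      # 0.2
```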

17
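SOR Algorithm & Convergence
  • Above optimality conditions lead to the SOR
    algorithm: $u^{i+1} = \left(u^{i} - \omega E^{-1}\left(Mu^{i} - e + L(u^{i+1} - u^{i})\right)\right)_{\#}$,
    where $M = L + E + L'$ with L strictly lower triangular, and $0 < \omega < 2$
  • Remember, optimality conditions are expressed as
    $u = \left(u - \omega E^{-1}(Mu - e)\right)_{\#}$
  • SOR Linear Convergence (Luo-Tseng 1993)
  • The iterates $u^{i}$ of the SOR algorithm
    converge R-linearly to a solution $\bar{u}$ of the
    dual problem
  • The objective function values $f(u^{i})$, where $f(u) = \tfrac{1}{2}u'Mu - e'u$,
    converge Q-linearly to $f(\bar{u})$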
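The iteration can be run as a componentwise sweep: component j is updated using the already-updated components $u_1, \dots, u_{j-1}$ (which is what the $L(u^{i+1} - u^i)$ term expresses) and is immediately clipped to $[0, \nu]$. A minimal plain-Python sketch, assuming dense lists, a zero starting point, $M = D(K(A,A') + ee')D$ built from labels $d_i \in \{+1, -1\}$, and a stopping rule on the largest componentwise change; `build_M`, `sor_svm`, and their parameters are hypothetical, and this is not the authors' implementation.

```python
def build_M(A, d, kernel):
    """M = D(K(A, A') + ee')D for labels d_i in {+1, -1}."""
    m = len(A)
    return [[d[i] * d[j] * (kernel(A[i], A[j]) + 1.0) for j in range(m)]
            for i in range(m)]


def sor_svm(M, nu, omega=1.3, tol=1e-10, max_iter=10_000):
    """Projected SOR for min 0.5*u'Mu - e'u subject to 0 <= u <= nu*e.

    Assumes M is symmetric positive semidefinite with a positive diagonal
    and 0 < omega < 2. Returns (u, number_of_sweeps).
    """
    m = len(M)
    u = [0.0] * m
    for sweep in range(max_iter):
        largest_change = 0.0
        for j in range(m):
            # (Mu - e)_j with the current u, whose entries 0..j-1 were already
            # updated in this sweep: the L(u^{i+1} - u^i) term of the iteration.
            grad_j = sum(M[j][k] * u[k] for k in range(m)) - 1.0
            # Relaxed step scaled by E^{-1}, then projection onto [0, nu].
            new_uj = min(max(u[j] - omega * grad_j / M[j][j], 0.0), nu)
            largest_change = max(largest_change, abs(new_uj - u[j]))
            u[j] = new_uj
        if largest_change < tol:
            return u, sweep + 1
    return u, max_iter


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


A = [[2.0, 0.0], [-2.0, 0.0]]       # hypothetical toy data
d = [1, -1]
M = build_M(A, d, linear_kernel)     # [[5.0, 3.0], [3.0, 5.0]]
u, sweeps = sor_svm(M, nu=1.0)
print(u, sweeps)
```

Each update reads a single row of M, so the method only ever needs one row of the kernel matrix at a time; the sketch keeps M in memory purely for simplicity.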

18
Numerical Testing
  • Comparison of linear and nonlinear kernels using
  • Linear programming formulations
  • Quadratic programming (SOR) formulations
  • Data sets
  • UCI Liver Disorders: 345 points in $R^6$
  • Bell Labs Checkerboard: 1000 points in $R^2$
  • Gaussian Synthetic: 1000 points in $R^{32}$
  • SCDS Synthetic: 1 million points in $R^{32}$
  • Massive Synthetic: 10 million points in $R^{32}$
  • Machines
  • Cluster of 4 Sun Enterprise E6000 machines, each
    consisting of 16 UltraSPARC II 250 MHz processors
    with 2 GB RAM
  • Total: 64 processors, 8 GB RAM

19
Comparison of Linear & Nonlinear SVMs: Linear
Programming Generated
  • Nonlinear kernels yield better training and
    testing set correctness

20
SOR Results
  • Comparison of linear and nonlinear kernels
  • Examples of training on massive data
  • 1 million point dataset generated by SCDS
    generator
  • Trained completely in 9.7 hours
  • Tuning set reached 99.7% of final accuracy in 0.3
    hours
  • 10 million point randomly generated dataset
  • Tuning set reached 95% of final accuracy in 14.3
    hours
  • Under 10,000 iterations

21
Conclusions
  • Linear programming and successive overrelaxation
    can generate complex nonlinear separating
    surfaces via GSVMs
  • Nonlinear separating surfaces improve
    generalization over linear ones
  • SOR can handle very large problems not (easily)
    solvable by other methods
  • SOR scales up with virtually no changes
  • Future directions
  • Parallel SOR for very large problems not resident
    in memory
  • Massive multicategory discrimination via SOR
  • Support vector regression

22
Questions?