Title: Support Vector Machines (S.V.M.): Special session
1 Support Vector Machines (S.V.M.): Special session
Bernhard Schölkopf, GMD-FIRST, http://svm.first.gmd.de/
Stéphane Canu, I.N.S.A. - P.S.I., http://psichaud.insa-rouen.fr/scanu/
2 Radial SVM
3 Road map
- linear discrimination: the separable case
- linear discrimination: the non-separable case
- quadratic discrimination
- radial SVM
  - principle
  - 3 regularization hyperparameters
  - some benchmark results (glass data)
- SVM for regression
4 What's new with SVM
Artificial Neural Networks
- from biology to machine learning
- it works! Some reason
Support Vector Machine
- from maths to machine learning
- formalization of learning: statistical learning theory
- learning from data: minimization
- universality (learn everything): kernel trick
- complexity control (but not anything): margin
- minimization with constraints
5 Functional space
Kernel trick
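As an illustration of the kernel trick named on this slide, the sketch below checks numerically that a kernel evaluated in input space equals a dot product in a feature space. The homogeneous quadratic kernel, the feature map `phi`, and the sample points are assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D homogeneous quadratic kernel (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def quadratic_kernel(x, y):
    """k(x, y) = (x . y)^2, computed without ever building phi(x)."""
    return float(np.dot(x, y)) ** 2

# Hypothetical sample points.
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_input_space = quadratic_kernel(x, y)            # (1*3 + 2*(-1))^2
k_feature_space = float(np.dot(phi(x), phi(y)))   # same quantity via the feature map
```

Both quantities agree, so an algorithm that only needs dot products can work in the feature space while evaluating nothing but the kernel.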
6 Minimization with constraints
$L(x, \lambda)$: the Lagrangian (Lagrange, 1788)
7 Minimization with constraints: dual formulation
Phase 1
Phase 2
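A small worked example may make the two phases concrete. It assumes that phase 1 means minimizing the Lagrangian over the primal variable and that phase 2 means maximizing the resulting dual function over the multiplier. The toy problem itself is an illustration and does not come from the slides.

```latex
\begin{align*}
&\text{Problem:}    && \min_x \tfrac12 x^2 \quad \text{subject to } x \ge 1 \\
&\text{Lagrangian:} && L(x,\lambda) = \tfrac12 x^2 - \lambda\,(x - 1), \qquad \lambda \ge 0 \\
&\text{Phase 1:}    && \frac{\partial L}{\partial x} = x - \lambda = 0
                       \;\Rightarrow\; x = \lambda,
                       \qquad q(\lambda) = \lambda - \tfrac12 \lambda^2 \\
&\text{Phase 2:}    && \max_{\lambda \ge 0} q(\lambda):\quad 1 - \lambda = 0
                       \;\Rightarrow\; \lambda^* = 1,\quad x^* = 1
\end{align*}
```

The dual optimum $q(\lambda^*) = \tfrac12$ equals the primal optimum $\tfrac12 (x^*)^2 = \tfrac12$, as expected for a convex problem of this kind.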
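8 Linear discrimination: the separable case
$w^\top x + b = 0$
9 Linear discrimination: the separable case
With the largest MARGIN
[Figure: separating hyperplane $w^\top x + b = 0$ with the margin on either side]
10 Linear discrimination: the separable case
[Figure: labels y = 1 and y = -1 plotted against x]
11 Linear discrimination: the separable case
$y = wx$
[Figure: labels y = 1 and y = -1 plotted against x, with the MARGIN marked on both sides]
12 Linear discrimination: the separable case
With the largest MARGIN
[Figure: separating hyperplane $w^\top x + b = 0$ with the margin on either side]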
13 Linear classification: the separable case
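For reference, the standard maximal-margin formulation for separable data is recalled below. The notation (training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, multipliers $\lambda_i$) is an assumption, since the slide's own equations are not available in this transcript.

```latex
\begin{align*}
\text{Primal:}\quad & \min_{w,b}\ \tfrac12 \|w\|^2
  \quad \text{s.t. } y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n \\
\text{Dual:}\quad & \max_{\lambda}\ \sum_{i} \lambda_i
  - \tfrac12 \sum_{i,j} \lambda_i \lambda_j\, y_i y_j\, x_i^\top x_j
  \quad \text{s.t. } \lambda_i \ge 0,\quad \sum_i \lambda_i y_i = 0 \\
\text{Solution:}\quad & w = \sum_i \lambda_i y_i x_i,
  \qquad \text{margin width} = \frac{2}{\|w\|}
\end{align*}
```

Only the points with $\lambda_i > 0$ contribute to $w$; these are the support vectors.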
14 Equality constraint integration
15 Inequality constraint integration
QP
While (???) do not verify the optimality conditions:
  ? = M^{-1} b and ? = -H ? + c + ??y
  if ? < 0, a constraint is blocked (?_i = 0): an active variable is eliminated
  else if ? < 0, a constraint is relaxed
16 Linear classification: the non-separable case
Error variables
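The error variables can be illustrated with a short sketch. It assumes the usual soft-margin definition, in which each point gets a slack $\xi_i = \max(0,\ 1 - y_i (w^\top x_i + b))$; the data, `w` and `b` below are hypothetical values picked only to show the three possible cases.

```python
import numpy as np

def slack_variables(X, y, w, b):
    """Return xi_i = max(0, 1 - y_i (w . x_i + b)) for each training point."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)

# Hypothetical 1-D data and separator (not from the slides).
X = np.array([[2.0], [0.5], [-1.0]])
y = np.array([1.0, 1.0, 1.0])
w = np.array([1.0])
b = 0.0

xi = slack_variables(X, y, w, b)
```

A slack of zero means the point lies outside the margin, a value between 0 and 1 means it lies inside the margin but on the correct side, and a value above 1 means it is misclassified.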
17 Quadratic SVM
18 Polynomial classification
rank(H) = 5: regularization needed
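The rank remark can be reproduced numerically. The sketch below is one setting that happens to give rank 5 (ten 1-D points with the kernel $(1 + xy)^4$, whose feature space has dimension 5); whether the slide used this setting is not known, and the variable `ridge` is an illustrative regularization strength.

```python
import numpy as np

# Assumed setup: 10 one-dimensional points and the kernel (1 + x*y)^4,
# whose feature space (monomials of degree 0..4) has dimension 5.
x = np.linspace(-1.0, 1.0, 10)
y = np.where(x > 0, 1.0, -1.0)

K = (1.0 + np.outer(x, x)) ** 4       # Gram matrix of the polynomial kernel
H = np.outer(y, y) * K                # matrix of the dual QP

rank_H = np.linalg.matrix_rank(H, tol=1e-8)

ridge = 1e-3                          # illustrative regularization strength
H_reg = H + ridge * np.eye(len(x))    # adding a multiple of the identity restores full rank
rank_H_reg = np.linalg.matrix_rank(H_reg, tol=1e-8)
```

With more points than feature dimensions, H is singular, so a linear system built on it has no unique solution until the diagonal term is added.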
19 Gaussian kernel based S.V.M.
20 1-D example
Class 1: mixture of 2 Gaussians. Class 2: Gaussian.
[Figure: training set, and output of the SVM for the test set, with the margin and the support vectors marked]
21 Three regularization parameters
- $C$: the upper bound
- $\sigma$: the kernel bandwidth, $K_\sigma(x, y)$
- $\gamma$: the linear system regularization, $H\lambda = b \;\Rightarrow\; (H + \gamma I)\lambda = b$
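The effect of the kernel bandwidth can be sketched as follows. The parametrization $\exp(-\|x - y\|^2 / (2\sigma^2))$ and the name `sigma` are assumptions, since the slide's own kernel formula is not available in this transcript; the three sample points are hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y, sigma):
    """Gram matrix with entries exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)   # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.array([[0.0], [1.0], [2.0]])
K_small = gaussian_kernel_matrix(X, X, sigma=0.05)   # nearly the identity matrix
K_large = gaussian_kernel_matrix(X, X, sigma=50.0)   # nearly a matrix of ones
```

A very small bandwidth makes every point similar only to itself, which favors memorizing the training set, while a very large bandwidth makes all points look alike and yields a very smooth decision function.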
22 Small bandwidth and large C
23 Large bandwidth and large C
24 Large bandwidth and small C
25 SVM for regression
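The slide's own formulation is not available in this transcript. As a hedged reminder, support vector regression is commonly built on Vapnik's $\varepsilon$-insensitive loss, which ignores errors smaller than a tolerance; the sketch below assumes that loss, and the sample values are hypothetical.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss max(0, |y - f(x)| - epsilon): errors inside the tube cost nothing."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Hypothetical targets and predictions.
y_true = np.array([1.0, 1.0, 1.0])
y_pred = np.array([1.05, 1.5, 0.0])
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
```

Points whose error stays inside the tube contribute nothing to the cost, which is what makes the regression solution sparse in the same way as the classifier.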
26 Example...
27 ? small and ? also
28 Geostatistics
29 Another way to see things (Girosi, 97)
30 SVM history and trends
The pioneers
Vapnik, V. and Lerner, A., 1963: statistical learning theory
Mangasarian, O., 1965, 1968: optimization
Kimeldorf, G. and Wahba, G., 1971: non-parametric regression, splines
The 2nd start: ANN, learning computers...
Boser, B., Guyon, I. and Vapnik, V., 1992
Bennett, K. and Mangasarian, O., 1992
Trends...
- Optimization
  - Vapnik
  - Osuna, E. and Girosi
  - John C. Platt
  - Linda Kaufman
  - Thorsten Joachims
- Applications
  - on-line handwritten character recognition
  - face recognition
  - text mining
  - ...
- Learning theory (Cortes, C., 1995)
  - soft margin classifier
  - effective VC-dimensions
  - other formalisms, ...
31 Optimization issues: QP with constraints
- Box constraints
- H is positive semidefinite (beware of commercial solvers)
- Size of H! But a lot of the $\lambda$ are 0 or C
  - active constraint set, starting with $\lambda = 0$
  - do not compute (store) the whole H
  - chunking
- multiclass issue!
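The sparsity remark above can be sketched in a few lines: because most multipliers are zero, the decision function needs only the kernel columns of the support vectors, never the full matrix H. The function name, the linear kernel and the data are assumptions made for illustration.

```python
import numpy as np

def decision_function(X_train, y_train, lam, b, X_test, kernel):
    """Evaluate f(x) = sum_i lam_i y_i k(x_i, x) + b using support vectors only."""
    sv = np.flatnonzero(lam > 0)             # indices with a nonzero multiplier
    out = np.full(len(X_test), b, dtype=float)
    for i in sv:                             # one kernel column at a time, never the full H
        k_col = np.array([kernel(X_train[i], x) for x in X_test])
        out += lam[i] * y_train[i] * k_col
    return out, sv

def linear_kernel(u, v):
    return float(np.dot(u, v))

# Hypothetical 1-D problem: only the two points nearest the boundary are support vectors.
X_train = np.array([[-3.0], [-1.0], [1.0], [3.0]])
y_train = np.array([-1.0, -1.0, 1.0, 1.0])
lam = np.array([0.0, 0.5, 0.5, 0.0])
b = 0.0

X_test = np.array([[2.0], [-2.0]])
values, sv = decision_function(X_train, y_train, lam, b, X_test, linear_kernel)
```

Here two of the four training points carry all the information, so only two kernel columns are ever computed.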
32 Optimization issues
- Solve the whole problem
  - commercial: LOQO (primal-dual approach), MINOS, Matlab!!!
  - Vapnik: Moré and Toraldo (1991)
- Decompose the problem
  - Chunking (Vapnik, 82, 92)
  - Osuna and Girosi (implemented in SVMlight by Thorsten Joachims, 98)
  - Sequential Minimal Optimization (SMO), John C. Platt, 98
- No H: start from 0, active set technique (Linda Kaufman, 98)
- minimize the cost function
  - 2nd order: Newton
  - conjugate gradient, projected conjugate gradient (PCG), Burges, 98
- select the relevant constraints
- Interior point methods
  - Moré, 91, Z. Dostal, 97 and others...
33 Some benchmark considerations (Platt 98)
- Osuna's decomposition technique permits the solution of SVMs via fixed-size QP subproblems
- Using two-variable QP subproblems (SMO) does not require a QP library
- SMO trades off QP time for kernel evaluation time
- Optimizations can dramatically reduce kernel time
  - Linear SVMs (useful for text categorization)
  - Sparse dot products
  - Kernel caching (good for smaller problems, Thorsten Joachims, 98)
- SMO can be much faster than other techniques for some problems
- What about active set and interior point techniques?
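The claim that two-variable subproblems need no QP library can be illustrated with a minimal sketch of a single SMO pair update, following the closed-form step described by Platt (98). Pair selection, the bias update and the degenerate cases are left out, and the function name and toy data are assumptions made for this example.

```python
import numpy as np

def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K12, K22, C):
    """Analytic solution of the two-variable QP subproblem (sketch after Platt, 98).

    a1, a2 : current multipliers; y1, y2 : labels in {-1, +1};
    E1, E2 : prediction errors f(x_i) - y_i; C : upper bound of the box.
    Returns the updated pair, or the old pair if no progress is possible.
    """
    # Feasible segment for a2 implied by the box and the equality constraint.
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = K11 + K22 - 2.0 * K12            # curvature along the constraint line
    if low >= high or eta <= 0.0:
        return a1, a2                      # degenerate case skipped in this sketch
    a2_new = float(np.clip(a2 + y2 * (E1 - E2) / eta, low, high))
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keeps y1*a1 + y2*a2 unchanged
    return a1_new, a2_new

# Toy problem: x1 = -1 (label -1), x2 = +1 (label +1), linear kernel, multipliers at 0.
y1, y2 = -1.0, 1.0
K11, K12, K22 = 1.0, -1.0, 1.0
E1, E2 = 0.0 - y1, 0.0 - y2                # f(x) = 0 everywhere at the start
a1_new, a2_new = smo_pair_update(0.0, 0.0, y1, y2, E1, E2, K11, K12, K22, C=10.0)
```

Each step costs a handful of arithmetic operations plus three kernel evaluations, which is the trade of QP time for kernel evaluation time mentioned in the list.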
34 Open issues
- VC entropy for margin classifiers: learning bounds
- other margin classifiers: boosting
- Non-L2 (quadratic) cost function: sparse coding (Drezet and Harrison)
- curse of dimensionality: local vs global
- kernel influence (Tsuda)
- applications
  - classification (Weston and Watkins)
  - to regression (Pontil et al.)
  - face detection (Fernandez and Viennet)
- algorithms (Cristianini and Campbell)
- making bridges: other formalisms
  - Bayesian (Kwok)
  - statistical mechanics (Buhot and Gordon)
  - logic (Sebag)
35 Books in Support Vector Research
- V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
- V. Vapnik, Statistical Learning Theory. Wiley, 1998.
- SVM introductory chapters in:
  - S. Haykin, Neural Networks, a Comprehensive Foundation. Macmillan, New York, NY, 1998 (2nd ed.).
  - V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. Wiley, 1998.
- C.J.C. Burges, 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, Number 2.
- Schölkopf, B., 1997. Support Vector Learning. PhD Thesis. Published by R. Oldenbourg Verlag, Munich, 1997. ISBN 3-486-24632-1.
- Smola, A. J., 1998. Learning with Kernels. PhD Thesis. Published by GMD, Birlinghoven, 1999.
- NIPS 97 workshop book: B. Schölkopf, C. Burges, A. Smola. Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, MA, December 1998.
- NIPS 98 workshop book on large margin classifiers is coming.
36 Events in Support Vector Research
- ACAI '99 workshop: Support Vector Machine Theory and Applications
- Workshop on Support Vector Machines, IJCAI'99, August 2, 1999, Stockholm, Sweden
- EUROCOLT'99 workshop on Kernel Methods, March 27, 1999, Nordkirchen Castle, Germany
37 Conclusion
- SVMs select relevant patterns in a robust way
- svm.cs.rhbnc.ac.uk
- Matlab code available on request: scanu_at_insa-rouen.fr
- Multi-class problems
- Small error