Support Vector Machines S.V.M. Special session

ESANN'99: Special session 7 on Support Vector Machines

1
Support Vector Machines S.V.M. Special session
Bernhard Schölkopf, GMD-FIRST, http://svm.first.gmd.de/
Stéphane Canu, I.N.S.A. - P.S.I., http://psichaud.insa-rouen.fr/scanu/
2
radial SVM
3
Road map
  • linear discrimination: the separable case
  • linear discrimination: the non-separable case
  • quadratic discrimination
  • radial SVM
      • principle
      • 3 regularization hyperparameters
      • some benchmark results (glass data)
  • SVM for regression

4
What's new with SVM
Artificial Neural Networks vs. Support Vector Machine
  • From biology to machine learning
  • It works! Some reason
  • formalization of learning: statistical learning theory, learning from data
  • From maths! to machine learning: minimization
  • universality (learn everything): kernel trick
  • complexity control (but not anything): margin
  • minimization: constraints

5
Functional space
Kernel trick
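
The slide's formulas did not survive extraction. As a minimal sketch of the kernel trick, assuming a homogeneous degree-2 polynomial kernel on 2-D inputs (the kernel choice and the names poly2_kernel and poly2_features are illustrative, not from the slides), the kernel value equals an inner product in an explicit feature space that never has to be built:

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def poly2_features(x):
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly2_kernel(x, y)                          # computed in input space
k_explicit = dot(poly2_features(x), poly2_features(y))   # computed in feature space
```

Both values agree up to rounding, which is why an algorithm written only in terms of inner products can work in the feature space at the cost of one kernel evaluation per pair.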
6
Minimization with constraints
L(x, λ): the Lagrangian (Lagrange, 1788)
7
Minimization with constraints: dual formulation
Phase 1
Phase 2
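
The equations on these two slides were lost. A standard statement of the Lagrangian and its dual is sketched below, assuming an objective f and inequality constraints g_i(x) ≤ 0 (f, g_i, m and θ are assumed symbols, and reading "Phase 1" as the minimization over x and "Phase 2" as the maximization over the multipliers is an interpretation, not something the transcript confirms):

```latex
\begin{aligned}
&\min_{x} f(x) \quad \text{subject to} \quad g_i(x) \le 0,\ i = 1,\dots,m \\
&L(x,\lambda) = f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x), \qquad \lambda_i \ge 0 \\
&\text{Phase 1:}\quad \theta(\lambda) = \min_{x} L(x,\lambda) \\
&\text{Phase 2:}\quad \max_{\lambda \ge 0} \theta(\lambda)
\end{aligned}
```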
8
Linear discrimination: the separable case
wx + b = 0
9
Linear discrimination: the separable case
With the largest MARGIN
[Figure: the line wx + b = 0 with a margin on each side]
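
To make "largest margin" concrete, here is a small sketch that measures the margin of a candidate line wx + b = 0 on a separable sample. The data, the two candidate lines and the function name geometric_margin are made up for illustration:

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y_i (w . x_i + b) / ||w|| over the sample."""
    norm_w = math.sqrt(sum(wk * wk for wk in w))
    return min(
        yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b) / norm_w
        for xi, yi in zip(points, labels)
    )

# Toy separable sample: two points per class.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

margin_a = geometric_margin((1.0, 1.0), -2.0, points, labels)  # line x1 + x2 = 2
margin_b = geometric_margin((1.0, 0.0), -1.0, points, labels)  # line x1 = 1
```

Both lines separate the sample, but the first leaves a wider margin, so it is the one a maximum-margin criterion prefers.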
10
Linear discrimination: the separable case
[Figure: y plotted against x, with the levels 1 and -1 marked]
11
Linear discrimination: the separable case
[Figure: y = wx plotted against x, with the levels 1 and -1 and the MARGIN marked]
12
Linear discrimination: the separable case
With the largest MARGIN
[Figure: the line wx + b = 0 with a margin on each side]
13
Linear classification: the separable case
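
The slide body is missing from the transcript. For reference, the usual hard-margin formulation is sketched below, assuming n training pairs (x_i, y_i) with y_i in {-1, +1} and multipliers λ_i as on the Lagrangian slide (the indexing is an assumption):

```latex
\begin{aligned}
&\min_{w,\,b}\ \tfrac{1}{2}\,\|w\|^2
\quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1,\quad i = 1,\dots,n \\
&L(w,b,\lambda) = \tfrac{1}{2}\,\|w\|^2 - \sum_{i=1}^{n} \lambda_i \bigl[\, y_i\,(w \cdot x_i + b) - 1 \,\bigr], \qquad \lambda_i \ge 0 \\
&\max_{\lambda}\ \sum_{i=1}^{n} \lambda_i - \tfrac{1}{2} \sum_{i,j=1}^{n} \lambda_i \lambda_j\, y_i y_j\, x_i \cdot x_j
\quad \text{subject to} \quad \lambda_i \ge 0,\ \sum_{i=1}^{n} \lambda_i y_i = 0
\end{aligned}
```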
14
Equality constraint integration
15
Inequality constraint integration
QP
While (???) do not verify the optimality conditions:
    ? = M^-1 b and ? = - H ? + c + ??y
    if ? < 0, a constraint is blocked (?_i = 0) (an active variable is eliminated)
    else if ? < 0, a constraint is relaxed
16
Linear classification: the non-separable case
Error variables
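
A minimal sketch of what the error variables measure, assuming the standard soft-margin cost 0.5 ||w||^2 + C * (sum of errors). The slide's own formula is not in the transcript, and the data and function names are illustrative:

```python
def slack_variables(w, b, points, labels):
    """xi_i = max(0, 1 - y_i (w . x_i + b)): zero when the margin constraint holds."""
    return [
        max(0.0, 1.0 - yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b))
        for xi, yi in zip(points, labels)
    ]

def soft_margin_objective(w, b, points, labels, C):
    """0.5 ||w||^2 + C * sum of slacks (assumed form of the cost)."""
    return 0.5 * sum(wk * wk for wk in w) + C * sum(slack_variables(w, b, points, labels))

# 1-D sample that no threshold separates: the last point sits on the wrong side.
points = [(2.0,), (0.5,), (-2.0,), (0.25,)]
labels = [1, 1, -1, -1]
xi = slack_variables((1.0,), 0.0, points, labels)
cost = soft_margin_objective((1.0,), 0.0, points, labels, C=1.0)
```

A point inside the margin gets an error between 0 and 1, and a misclassified point gets an error above 1.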
17
quadratic SVM
18
polynomial classification
Rank(H) = 5: regularization needed
19
Gaussian Kernel based S.V.M.
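
A sketch of a Gaussian kernel and its Gram matrix, assuming the common parameterization exp(-||x - y||^2 / (2 sigma^2)). The transcript does not show which parameterization the slides used, and sigma is an assumed name for the bandwidth:

```python
import math

def gaussian_kernel(x, y, sigma):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); parameterization assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, sigma):
    """Matrix of all pairwise kernel values."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]

points = [(0.0,), (1.0,), (3.0,)]
K_small = gram_matrix(points, sigma=0.1)    # small bandwidth: close to the identity
K_large = gram_matrix(points, sigma=10.0)   # large bandwidth: close to all ones
```

With a small bandwidth each point only "sees" itself, while with a large one all points look alike to the kernel.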
20
1-D example
Class 1: mixture of 2 Gaussians; Class 2: Gaussian
Training set
Output of the SVM for the test set
Margin
Support vectors
21
3 regularization parameters
  • C: the upper bound (on the multipliers)
  • the kernel bandwidth, which parameterizes the kernel K(x, y)
  • the linear system regularization: H is replaced by H plus a multiple of the identity I
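
A tiny sketch of the third parameter, the regularization of the linear system: a rank-deficient H cannot be inverted, but H plus a small multiple of the identity can. The symbol eps, the 2x2 example and the helper names are assumptions for illustration:

```python
def solve_2x2(H, b):
    """Solve H x = b for a 2x2 matrix by Cramer's rule; raises if H is singular."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if abs(det) < 1e-12:
        raise ZeroDivisionError("H is (numerically) singular")
    return [
        (b[0] * H[1][1] - H[0][1] * b[1]) / det,
        (H[0][0] * b[1] - b[0] * H[1][0]) / det,
    ]

def add_ridge(H, eps):
    """Return H + eps * I."""
    n = len(H)
    return [[H[i][j] + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]

H = [[1.0, 1.0], [1.0, 1.0]]   # rank 1, e.g. two identical training points
b = [1.0, 1.0]
x = solve_2x2(add_ridge(H, 1e-3), b)   # solvable once the diagonal is shifted
```

Calling solve_2x2(H, b) directly raises, which is the situation the regularization is meant to avoid.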

22
Small bandwidth and large C
23
Large bandwidth and large C
24
Large bandwidth and small C
25
SVM for regression
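
The slide content is missing here. As a hedged illustration, the sketch below assumes the standard ε-insensitive loss of support vector regression; the transcript does not confirm that this is the loss shown, and eps is an assumed name for the tube width:

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """max(0, |y - f(x)| - eps): errors inside the eps-tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - eps)

# Target 1.0, four predictions at increasing distance from it.
losses = [eps_insensitive_loss(1.0, p, eps=0.5) for p in (1.0, 1.25, 2.0, -1.0)]
```

Only the predictions outside the tube contribute, and they do so linearly rather than quadratically.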
26
Example...
27
? small and ? also
28
Geostatistics
29
Another way to see things (Girosi, 97)
30
SVM history and trends
The pioneers
Vapnik, V. and Lerner, A., 1963: statistical learning theory
Mangasarian, O., 1965, 1968: optimization
Kimeldorf, G. and Wahba, G., 1971: nonparametric regression (splines)
The 2nd start: ANN, learning computers...
Boser, B., Guyon, I. and Vapnik, V., 1992
Bennett, K. and Mangasarian, O., 1992
Trends...
  • Optimization
      • Vapnik
      • Osuna, E. and Girosi
      • John C. Platt
      • Linda Kaufman
      • Thorsten Joachims
  • Applications
      • on-line handwritten character recognition
      • face recognition
      • text mining
      • ...
  • Learning theory
      • Cortes, C., 1995: soft margin classifier
      • effective VC-dimensions
      • other formalisms, ...

31
Optimization issues: QP with constraints
  • Box constraints
  • H is positive semidefinite (beware of commercial solvers)
  • Size of H! But a lot of the λ are 0 or C
  • active constraint set, starting with λ = 0
  • do not compute (store) the whole H
  • chunk
  • multiclass issue!
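
One way to avoid storing the whole H is to compute its rows on demand and keep only the most recently used ones. The sketch below assumes H_ij = y_i y_j K(x_i, x_j), the usual dual Hessian; make_h_row, the cache size and the toy data are illustrative, not from the slides:

```python
from functools import lru_cache

def make_h_row(points, labels, kernel, cache_size=128):
    """Return a function computing row i of H, with H_ij = y_i y_j K(x_i, x_j).

    Rows are computed on demand and memoized, so the full n x n matrix
    is never stored; cache_size bounds the number of rows kept.
    """
    @lru_cache(maxsize=cache_size)
    def h_row(i):
        xi, yi = points[i], labels[i]
        return tuple(yi * yj * kernel(xi, xj) for xj, yj in zip(points, labels))
    return h_row

def linear_kernel(x, y):
    """Plain inner product, standing in for any kernel."""
    return sum(a * b for a, b in zip(x, y))

points = [(1.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]
labels = [1, -1, -1]
h_row = make_h_row(points, labels, linear_kernel, cache_size=2)
row0 = h_row(0)
row0_again = h_row(0)   # second request is served from the cache
```

Since many multipliers stay at 0 or C, an active-set or chunking solver tends to revisit a small set of rows, which is what makes a bounded cache worthwhile.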

32
Optimization issues
  • Solve the whole problem
  • commercial: LOQO (primal-dual approach), MINOS, Matlab!!!
  • Vapnik: Moré and Toraldo (1991)
  • Decompose the problem
  • Chunking (Vapnik, 82, 92)
  • Osuna and Girosi (implemented in SVMlight by Thorsten Joachims, 98)
  • Sequential Minimal Optimization (SMO): John C. Platt, 98
  • No H: start from 0, active set technique (Linda Kaufman, 98)
  • minimize the cost function
  • 2nd order Newton
  • conjugate gradient, projected conjugate gradient (PCG): Burges, 98
  • select the relevant constraints
  • Interior point methods
  • Moré, 91, Z. Dostal, 97 and others...

33
Some benchmark considerations (Platt 98)
  • Osuna's decomposition technique permits the solution of SVMs via fixed-size QP subproblems
  • Using two-variable QP subproblems (SMO) does not require a QP library
  • SMO trades off QP time for kernel evaluation time
  • Optimizations can dramatically reduce kernel time
  • Linear SVMs (useful for text categorization)
  • Sparse dot products
  • Kernel caching (good for smaller problems, Thorsten Joachims, 98)
  • SMO can be much faster than other techniques for some problems
  • what about active set and interior point techniques?
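
Why a two-variable subproblem needs no QP library: along the equality constraint it reduces to a one-dimensional quadratic with a closed-form, clipped solution. The sketch below follows the update usually attributed to Platt's SMO; it is a simplified illustration that leaves out the pair-selection heuristics and the threshold update, and the function name and toy problem are assumptions:

```python
def smo_pair_step(alpha, y, K, C, i, j, b=0.0):
    """One analytic two-variable update of the SVM dual (after Platt's SMO).

    alpha: current multipliers, y: labels in {-1, +1}, K: kernel matrix,
    C: upper bound. Returns a new list; sum(alpha_k * y_k) is preserved.
    """
    def error(k):
        f_k = sum(alpha[m] * y[m] * K[m][k] for m in range(len(alpha))) + b
        return f_k - y[k]

    eta = K[i][i] + K[j][j] - 2.0 * K[i][j]   # curvature along the feasible line
    if eta <= 0.0:
        return list(alpha)                    # degenerate pair: skipped in this sketch

    # Interval [lo, hi] for alpha_j implied by the box and the equality constraint.
    if y[i] != y[j]:
        lo, hi = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        lo, hi = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])

    aj = alpha[j] + y[j] * (error(i) - error(j)) / eta
    aj = min(hi, max(lo, aj))                 # clip to the interval
    ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)

    new_alpha = list(alpha)
    new_alpha[i], new_alpha[j] = ai, aj
    return new_alpha

# Two 1-D points, x = +1 (label +1) and x = -1 (label -1), linear kernel.
y = [1, -1]
K = [[1.0, -1.0], [-1.0, 1.0]]
alpha = smo_pair_step([0.0, 0.0], y, K, C=10.0, i=0, j=1)
```

On this toy problem a single step already reaches the optimum: both multipliers become 0.5, giving w = 1 and the two points exactly on the margin.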

34
open issues
  • VC entropy for margin classifiers: learning bounds
  • other margin classifiers: boosting
  • Non-L2 (quadratic) cost function: sparse coding (Drezet and Harrison)
  • curse of dimensionality: local vs global
  • kernel influence (Tsuda)
  • applications
      • classification (Weston and Watkins)
      • to regression (Pontil et al.)
      • face detection (Fernandez and Viennet)
  • algorithms (Cristianini and Campbell)
  • making bridges: other formalisms
      • Bayesian (Kwok)
      • statistical mechanics (Buhot and Gordon)
      • logic (Sebag)

35
Books in Support Vector Research
  • V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
  • V. Vapnik, Statistical Learning Theory. Wiley, 1998.
  • SVM introductory chapters in:
      • S. Haykin, Neural Networks, a Comprehensive Foundation. Macmillan, New York, NY, 1998 (2nd ed.).
      • V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. Wiley, 1998.
  • C.J.C. Burges, 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, Number 2.
  • Schölkopf, B., 1997. Support Vector Learning. PhD Thesis. Published by R. Oldenbourg Verlag, Munich, 1997. ISBN 3-486-24632-1.
  • Smola, A. J., 1998. Learning with Kernels. PhD Thesis. Published by GMD, Birlinghoven, 1999.
  • NIPS 97 workshops book: B. Schölkopf, C. Burges, A. Smola, Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, MA, December 1998.
  • NIPS 98 workshops book on large margin classifiers is forthcoming.

36
Events in Support Vector Research
ACAI '99 workshop: Support Vector Machine Theory and Applications
Workshop on Support Vector Machines, IJCAI'99, August 2, 1999, Stockholm, Sweden
EUROCOLT'99 workshop on Kernel Methods, March 27, 1999, Nordkirchen Castle, Germany
37
Conclusion
SVMs select relevant patterns in a robust way
svm.cs.rhbnc.ac.uk
Matlab code available upon request: scanu_at_insa-rouen.fr
Multi-class problems
Small error