Transcript and Presenter's Notes

Title: Support Vector Machines


1
Support Vector Machines
  • 1. Introduction to SVMs
  • 2. Linear SVMs
  • 3. Non-linear SVMs

References
1. S.Y. Kung, M.W. Mak, and S.H. Lin. Biometric Authentication: A Machine Learning Approach, Prentice Hall, to appear.
2. S.R. Gunn, 1998. Support Vector Machines for Classification and Regression. (http://www.isis.ecs.soton.ac.uk/resources/svminfo/)
3. Bernhard Schölkopf. Statistical Learning and Kernel Methods. MSR-TR-2000-23, Microsoft Research, 2000. (ftp://ftp.research.microsoft.com/pub/tr/tr-2000-23.pdf)
4. For more resources on support vector machines, see http://www.kernel-machines.org/
2
Introduction
  • SVMs were developed by Vapnik in 1995 and are
    becoming popular due to their attractive features
    and promising performance.
  • Conventional neural networks are based on empirical risk
    minimization, where the network weights are determined by
    minimizing the mean squared error between the actual
    outputs and the desired outputs.
  • SVMs are based on the structural risk minimization
    principle, where the parameters are optimized by
    minimizing an upper bound on the generalization error.
  • SVMs have been shown to possess better generalization
    capability than conventional neural networks.

3
Introduction (Cont.)
  • Given N labeled empirical data
    (x_1, y_1), ..., (x_N, y_N) ∈ X × {-1, +1}                  (1)
    where X ⊆ R^d is the set of input data and
    y_i ∈ {-1, +1} are the class labels.
[Figure: labeled data points of the two classes in the domain X]
4
Introduction (Cont.)
  • We construct a simple classifier by computing the
    means of the two classes:
    c_+ = (1/N_1) Σ_{i: y_i = +1} x_i,   c_- = (1/N_2) Σ_{i: y_i = -1} x_i      (2)
    where N_1 and N_2 are the numbers of data points in the
    classes with positive and negative labels, respectively.
  • We assign a new point x to the class whose mean is
    closer to it.
  • To achieve this, we compute the mid-point
    c = (c_+ + c_-)/2 and the vector w = c_+ - c_-
    connecting the two means.
5
Introduction (Cont.)
  • Then, we determine the class of x by checking whether
    the vector x - c connecting x to c encloses an angle
    smaller than π/2 with the vector w = c_+ - c_-:
    y = sgn(⟨x - c, w⟩) = sgn(⟨x, c_+⟩ - ⟨x, c_-⟩ + b)
    where b = (1/2)(‖c_-‖² - ‖c_+‖²).
[Figure: the point x, the class means c_+ and c_-, and the mid-point c in the domain X]
6
Introduction (Cont.)
  • In the special case where b = 0, we have
    y = sgn( (1/N_1) Σ_{i: y_i = +1} ⟨x, x_i⟩ - (1/N_2) Σ_{i: y_i = -1} ⟨x, x_i⟩ )      (3)
  • This means that we use ALL data points x_i, each being
    weighted equally by 1/N_1 or 1/N_2, to define the
    decision plane.

7
Introduction (Cont.)
[Figure: the decision plane defined by the two class means, with a test point x in the domain X]
8
Introduction (Cont.)
  • However, we might want to remove the influence of
    patterns that are far away from the decision
    boundary, because their influence is usually
    small.
  • We may also select only a few important data
    points (called support vectors) and weight them
    differently.
  • Then, we have a support vector machine.

9
Introduction (Cont.)
[Figure: decision plane in the domain X with its margin; the support vectors lie on the margin boundaries]
  • We aim to find a decision plane that maximizes
    the margin.

10
Linear SVMs
  • Assume that all training data satisfy the constraints
    w·x_i + b ≥ +1   for y_i = +1
    w·x_i + b ≤ -1   for y_i = -1                            (4)
    which means
    y_i(w·x_i + b) - 1 ≥ 0   for all i.                      (5)
  • Training data points for which the above equality holds
    lie on the hyperplanes w·x + b = ±1, which are parallel
    to the decision plane.

11
Linear SVMs (Conts.)
[Figure: the margin d between the hyperplanes w·x + b = +1 and w·x + b = -1]
  • The margin is d = 2/‖w‖. Therefore, maximizing the
    margin is equivalent to minimizing ‖w‖².
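For completeness, a short derivation (not shown on the slide) of the margin width, using only the two hyperplanes from Eq. 4:

% Distance between the hyperplanes w·x + b = +1 and w·x + b = -1.
% Take x_+ on the first and x_- on the second hyperplane, with x_+ - x_- parallel to w.
\begin{align*}
\mathbf{w}\cdot\mathbf{x}_+ + b = +1,\qquad \mathbf{w}\cdot\mathbf{x}_- + b = -1
&\;\Rightarrow\; \mathbf{w}\cdot(\mathbf{x}_+ - \mathbf{x}_-) = 2 \\
&\;\Rightarrow\; d = \|\mathbf{x}_+ - \mathbf{x}_-\| = \frac{2}{\|\mathbf{w}\|}.
\end{align*}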

12
Linear SVMs (Lagrangian)
  • We minimize ‖w‖² subject to the constraint that
    y_i(w·x_i + b) ≥ 1   for all i.                          (6)
  • This can be achieved by introducing Lagrange
    multipliers α_i ≥ 0 and a Lagrangian
    L(w, b, α) = (1/2)‖w‖² - Σ_i α_i [y_i(w·x_i + b) - 1].   (7)
  • The Lagrangian has to be minimized with respect
    to w and b and maximized with respect to α_i.

13
Linear SVMs (Lagrangian)
  • Setting ∂L/∂w = 0 and ∂L/∂b = 0, we obtain
    w = Σ_i α_i y_i x_i   and   Σ_i α_i y_i = 0.             (8)
  • Patterns for which α_i > 0 are called Support
    Vectors. These vectors lie on the margin and satisfy
    y_i(w·x_i + b) = 1,   i ∈ S,
    where S contains the indexes of the support vectors.
  • Patterns for which α_i = 0 are considered to be
    irrelevant to the classification.

14
Linear SVMs (Wolfe Dual)
  • Substituting (8) into (7), we obtain the Wolfe dual:
    maximize  W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i·x_j
    subject to  α_i ≥ 0  and  Σ_i α_i y_i = 0.               (9)
  • The decision hyperplane is thus
    f(x) = Σ_{i∈S} α_i y_i x_i·x + b = 0.
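As an illustration (not from the original slides), the Wolfe dual in Eq. 9 can be solved with an off-the-shelf quadratic-programming package. The sketch below assumes the cvxopt solver and a made-up, linearly separable toy data set:

import numpy as np
from cvxopt import matrix, solvers

# Made-up linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
N = len(y)

# Wolfe dual (Eq. 9) as a QP in standard form:
#   minimize (1/2) a^T P a - 1^T a   s.t.  a_i >= 0,  y^T a = 0,
# where P_ij = y_i y_j x_i . x_j.
P = matrix(np.outer(y, y) * (X @ X.T))
q = matrix(-np.ones(N))
G = matrix(-np.eye(N))          # -a_i <= 0  <=>  a_i >= 0
h = matrix(np.zeros(N))
A = matrix(y.reshape(1, -1))
b = matrix(0.0)

solvers.options['show_progress'] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

# Support vectors have alpha_i > 0 (Eq. 8); recover w and b from them.
sv = alpha > 1e-6
w = (alpha * y)[sv] @ X[sv]
b_out = np.mean(y[sv] - X[sv] @ w)   # y_i (w.x_i + b) = 1 on the margin

print("support vectors:", np.where(sv)[0], " w =", w, " b =", b_out)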

15
Linear SVMs (Example)
  • Analytical example (3-point problem)
  • Objective function

16
Linear SVMs (Example)
  • We introduce another Lagrange multiplier λ to
    obtain the Lagrangian F(α, λ).
  • Differentiating F(α, λ) with respect to λ and α_i
    and setting the results to zero, we obtain

17
Linear SVMs (Example)
  • Substituting the Lagrange multipliers into Eq. 8 gives
    the weight vector w and the bias b.

18
Linear SVMs (Example)
  • 4-point linearly separable problem

[Figure: decision boundaries for the 4-point problem; one solution has 4 support vectors, the other has 3]
19
Linear SVMs (Non-linearly separable)
  • Non-linearly separable patterns are patterns that cannot
    be separated by a linear decision boundary without
    incurring classification errors.

[Figure: a data point that causes a classification error in a linear SVM]
20
Linear SVMs (Non-linearly separable)
  • We introduce a set of slack variables ξ_i with ξ_i ≥ 0,
    one for each training point.
  • The slack variables allow some data to violate the
    constraints defined for the linearly separable case
    (Eq. 6), which become
    y_i(w·x_i + b) ≥ 1 - ξ_i.
  • Therefore, for some points x_k with ξ_k > 0, we have
    y_k(w·x_k + b) < 1.

21
Linear SVMs (Non-linearly separable)
  • E.g., the slack variables of x_10 and x_19 are non-zero
    because x_10 and x_19 are inside the margin, i.e., they
    violate the constraint (Eq. 6).
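A small sketch of how the slack values can be inspected in practice, assuming scikit-learn and a made-up overlapping data set (the indices 10 and 19 above belong to the slide's figure, not to this example):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Made-up overlapping 2-D data: two Gaussian blobs.
X = np.vstack([rng.normal([2, 2], 1.0, size=(20, 2)),
               rng.normal([0, 0], 1.0, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

clf = SVC(kernel='linear', C=1.0).fit(X, y)

# Slack of each point: xi_i = max(0, 1 - y_i (w.x_i + b)).
xi = np.maximum(0.0, 1.0 - y * clf.decision_function(X))

print("points inside or beyond the margin:", np.where(xi > 0)[0])
print("their slack values:", np.round(xi[xi > 0], 3))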

22
Linear SVMs (Non-linearly separable)
  • For non-separable cases, we minimize
    (1/2)‖w‖² + C Σ_i ξ_i
    subject to  y_i(w·x_i + b) ≥ 1 - ξ_i  and  ξ_i ≥ 0,
    where C is a user-defined penalty parameter to
    penalize any violation of the margins.
  • The Lagrangian becomes
    L(w, b, ξ, α, μ) = (1/2)‖w‖² + C Σ_i ξ_i
        - Σ_i α_i [y_i(w·x_i + b) - 1 + ξ_i] - Σ_i μ_i ξ_i,
    with multipliers α_i ≥ 0 and μ_i ≥ 0.

23
Linear SVMs (Non-linearly separable)
  • Wolfe dual optimization:
    maximize  W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i·x_j
    subject to  0 ≤ α_i ≤ C  and  Σ_i α_i y_i = 0.
  • The output weight vector and bias term are
    w = Σ_{i∈S} α_i y_i x_i   and   b = y_k - w·x_k
    for any support vector x_k on the margin (0 < α_k < C).
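For reference, a sketch of how these quantities can be read off a soft-margin linear SVM fitted with scikit-learn (the data set is made up); in scikit-learn, dual_coef_ stores the products α_i y_i of the support vectors:

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [1.2, 1.2],
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.5, 1.4]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)

alpha_times_y = clf.dual_coef_[0]          # alpha_i * y_i for each support vector
w = alpha_times_y @ clf.support_vectors_   # w = sum_i alpha_i y_i x_i (Eq. 8)
b = clf.intercept_[0]

print("w from dual coefficients:", w)
print("w reported by the library:", clf.coef_[0])   # should match
print("b:", b, " number of SVs:", len(clf.support_))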

24
2. Linear SVMs (Types of SVs)
  • Three types of support vectors:
  1. On the margin (0 < α_i < C, ξ_i = 0)
  2. Inside the margin (α_i = C, 0 < ξ_i ≤ 1, still correctly classified)
  3. Outside the margin (α_i = C, ξ_i > 1, misclassified)
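A sketch of how the three types could be identified from a fitted scikit-learn model (the data and the numerical tolerance are illustrative assumptions):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([2, 2], 1.2, size=(30, 2)),
               rng.normal([0, 0], 1.2, size=(30, 2))])
y = np.hstack([np.ones(30), -np.ones(30)])

C = 1.0
clf = SVC(kernel='linear', C=C).fit(X, y)

alpha = np.abs(clf.dual_coef_[0])                      # alpha_i of each support vector
margin_dist = y[clf.support_] * clf.decision_function(X[clf.support_])
xi = np.maximum(0.0, 1.0 - margin_dist)                # slack of each support vector

on_margin = alpha < C - 1e-6                           # type 1: 0 < alpha_i < C
inside = (~on_margin) & (xi <= 1.0)                    # type 2: alpha_i = C, 0 < xi_i <= 1
outside = (~on_margin) & (xi > 1.0)                    # type 3: alpha_i = C, xi_i > 1

print("on margin:", on_margin.sum(), " inside margin:", inside.sum(),
      " outside margin (misclassified):", outside.sum())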
25
2. Linear SVMs (Types of SVs)
26
2. Linear SVMs (Types of SVs)
[Figure: the same example with Class 1 and Class 2 swapped]
27
2. Linear SVMs (Types of SVs)
  • Effect of varying C

[Figure: linear SVM decision boundaries and margins for C = 0.1 and C = 100]
28
3. Non-linear SVMs
  • In case the training data X are not linearly separable,
    we may use a non-linear mapping to transform the data
    from the input space to a feature space where the data
    become linearly separable; a kernel function computes
    the dot products in that feature space.

[Figure: a non-linear decision boundary in the input space (domain X) becomes a linear decision boundary in the feature space]
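To make the mapping idea concrete, here is a small sketch (not from the slides) showing that the degree-2 polynomial kernel K(x, z) = (x·z)² equals an ordinary dot product after an explicit map φ into a 3-dimensional feature space:

import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def k_poly2(x, z):
    """Degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(k_poly2(x, z))            # (3 - 2)^2 = 1
print(np.dot(phi(x), phi(z)))   # same value: the kernel is a dot product in feature space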
29
3. Non-linear SVMs (Conts.)
  • The decision function becomes
    f(x) = sgn( Σ_{i∈S} α_i y_i φ(x_i)·φ(x) + b )            (a)
    where φ(·) is the mapping to the feature space.
30
3. Non-linear SVMs (Conts.)
31
3. Non-linear SVMs (Conts.)
  • The decision function becomes
    f(x) = sgn( Σ_{i∈S} α_i y_i K(x_i, x) + b ).
  • For RBF kernels:
    K(x, x_i) = exp( -‖x - x_i‖² / (2σ²) )
  • For polynomial kernels:
    K(x, x_i) = (x·x_i + 1)^p
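A direct Python transcription of these two kernels, with σ and p as free parameters (the forms follow the standard definitions above):

import numpy as np

def rbf_kernel(x, xi, sigma=1.0):
    """RBF kernel K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))

def poly_kernel(x, xi, p=2):
    """Polynomial kernel K(x, x_i) = (x . x_i + 1)^p."""
    return (np.dot(x, xi) + 1.0) ** p

x = np.array([1.0, 0.0])
xi = np.array([0.0, 1.0])
print(rbf_kernel(x, xi, sigma=1.0))  # exp(-1) ~= 0.3679
print(poly_kernel(x, xi, p=2))       # (0 + 1)^2 = 1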

32
3. Non-linear SVMs (Conts.)
  • The optimization problem becomes
    maximize  W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)
    subject to  0 ≤ α_i ≤ C  and  Σ_i α_i y_i = 0.           (9)
  • The decision function becomes
    f(x) = sgn( Σ_{i∈S} α_i y_i K(x_i, x) + b ).
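The kernelized decision function can be cross-checked against a library implementation. The sketch below assumes scikit-learn and made-up data; it rebuilds Σ α_i y_i K(x_i, x) + b from the fitted model's dual coefficients and compares it with the library's decision_function:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([2, 2], 0.8, size=(25, 2)),
               rng.normal([0, 0], 0.8, size=(25, 2))])
y = np.hstack([np.ones(25), -np.ones(25)])

gamma = 0.5                      # scikit-learn's RBF kernel: K(x, z) = exp(-gamma ||x - z||^2)
clf = SVC(kernel='rbf', C=10.0, gamma=gamma).fit(X, y)

def decision(x):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b over the support vectors."""
    k = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    return clf.dual_coef_[0] @ k + clf.intercept_[0]

x_test = np.array([1.0, 1.0])
print(decision(x_test))                                   # manual evaluation
print(clf.decision_function(x_test.reshape(1, -1))[0])    # library value (should match)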

33
3. Non-linear SVMs (Conts.)
  • The effect of varying C on RBF-SVMs

[Figure: RBF-SVM decision boundaries for C = 1000 and C = 10]
34
3. Non-linear SVMs (Conts.)
  • The effect of varying C on Polynomial-SVMs

[Figure: polynomial-SVM decision boundaries for C = 1000 and C = 10]