Title: Support Vector Machines
Support Vector Machines
- 1. Introduction to SVMs
- 2. Linear SVMs
- 3. Non-linear SVMs
References
1. S.Y. Kung, M.W. Mak, and S.H. Lin. Biometric Authentication: A Machine Learning Approach. Prentice Hall, to appear.
2. S.R. Gunn. Support Vector Machines for Classification and Regression, 1998. (http://www.isis.ecs.soton.ac.uk/resources/svminfo/)
3. Bernhard Schölkopf. Statistical Learning and Kernel Methods. MSR-TR-2000-23, Microsoft Research, 2000. (ftp://ftp.research.microsoft.com/pub/tr/tr-2000-23.pdf)
4. For more resources on support vector machines, see http://www.kernel-machines.org/
Introduction
- SVMs were developed by Vapnik in 1995 and are becoming popular due to their attractive features and promising performance.
- Conventional neural networks are based on empirical risk minimization, where the network weights are determined by minimizing the mean squared error between the actual outputs and the desired outputs.
- SVMs are based on the structural risk minimization principle, where the parameters are optimized by minimizing a bound on the generalization error rather than just the error on the training data.
- SVMs have been shown to possess better generalization capability than conventional neural networks.
Introduction (Cont.)
- Given $N$ labeled empirical data points
$$(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N) \in \mathcal{X} \times \{-1, +1\}, \tag{1}$$
where $\mathcal{X} \subseteq \mathbb{R}^d$ is the set (domain) of input data and the $y_i$ are the class labels.
[Figure: labeled training data of the two classes in the domain $\mathcal{X}$]
Introduction (Cont.)
- We construct a simple classifier by computing the means of the two classes:
$$\mathbf{c}_1 = \frac{1}{N_1}\sum_{i:\,y_i=+1}\mathbf{x}_i, \qquad \mathbf{c}_2 = \frac{1}{N_2}\sum_{i:\,y_i=-1}\mathbf{x}_i, \tag{2}$$
where $N_1$ and $N_2$ are the numbers of data points in the classes with positive and negative labels, respectively.
- We assign a new point $\mathbf{x}$ to the class whose mean is closer to it.
- To achieve this, we compute the midpoint $\mathbf{c} = (\mathbf{c}_1 + \mathbf{c}_2)/2$ of the two means.
Introduction (Cont.)
- Then, we determine the class of $\mathbf{x}$ by checking whether the vector connecting $\mathbf{x}$ and $\mathbf{c}$ encloses an angle smaller than $\pi/2$ with the vector $\mathbf{w} = \mathbf{c}_1 - \mathbf{c}_2$:
$$y = \operatorname{sgn}\,\langle \mathbf{x}-\mathbf{c},\; \mathbf{w}\rangle = \operatorname{sgn}\,\langle \mathbf{x}-\mathbf{c},\; \mathbf{c}_1-\mathbf{c}_2\rangle = \operatorname{sgn}\bigl(\langle\mathbf{x},\mathbf{c}_1\rangle - \langle\mathbf{x},\mathbf{c}_2\rangle + b\bigr),$$
where $b = \tfrac{1}{2}\bigl(\lVert\mathbf{c}_2\rVert^{2} - \lVert\mathbf{c}_1\rVert^{2}\bigr)$.
[Figure: the point $\mathbf{x}$, the class means $\mathbf{c}_1$ and $\mathbf{c}_2$, and their midpoint $\mathbf{c}$ in the domain $\mathcal{X}$]
Introduction (Cont.)
- In the special case where $b = 0$, we have
$$y = \operatorname{sgn}\left(\frac{1}{N_1}\sum_{i:\,y_i=+1}\langle\mathbf{x},\mathbf{x}_i\rangle - \frac{1}{N_2}\sum_{i:\,y_i=-1}\langle\mathbf{x},\mathbf{x}_i\rangle\right). \tag{3}$$
- This means that we use ALL data points $\mathbf{x}_i$, each weighted equally by $1/N_1$ or $1/N_2$, to define the decision plane, as sketched below.
Introduction (Cont.)
[Figure: the decision plane separating the two classes in the domain $\mathcal{X}$]
Introduction (Cont.)
- However, we might want to remove the influence of patterns that are far away from the decision boundary, because their influence is usually small.
- We may also select only a few important data points (called support vectors) and weight them differently.
- Then, we have a support vector machine.
Introduction (Cont.)
[Figure: a maximum-margin decision plane in the domain $\mathcal{X}$; the support vectors lie on the margin]
- We aim to find a decision plane that maximizes the margin.
Linear SVMs
- Assume that all training data satisfy the constraints
$$\mathbf{w}^{T}\mathbf{x}_i + b \ge +1 \;\; \text{for } y_i = +1, \qquad \mathbf{w}^{T}\mathbf{x}_i + b \le -1 \;\; \text{for } y_i = -1, \tag{4}$$
which means
$$y_i(\mathbf{w}^{T}\mathbf{x}_i + b) - 1 \ge 0 \quad \forall i. \tag{5}$$
- Training data points for which the above equality holds lie on hyperplanes parallel to the decision plane.
Linear SVMs (Cont.)
[Figure: the margin $d$ between the two bounding hyperplanes]
- The two bounding hyperplanes are separated by the margin $d = 2/\lVert\mathbf{w}\rVert$.
- Therefore, maximizing the margin is equivalent to minimizing $\lVert\mathbf{w}\rVert^{2}$.
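The margin value can be justified in one line from the distance between parallel hyperplanes; this derivation is standard and is reconstructed here rather than copied from the slide. For the hyperplanes $\mathbf{w}^{T}\mathbf{x} + b = +1$ and $\mathbf{w}^{T}\mathbf{x} + b = -1$,
$$d = \frac{\bigl|(+1) - (-1)\bigr|}{\lVert\mathbf{w}\rVert} = \frac{2}{\lVert\mathbf{w}\rVert},$$
so maximizing $d$ is equivalent to minimizing $\lVert\mathbf{w}\rVert^{2}$ (the square, and the factor $\tfrac{1}{2}$ used later, are only for convenience of the optimization).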
Linear SVMs (Lagrangian)
- We minimize $\lVert\mathbf{w}\rVert^{2}$ subject to the constraint that
$$y_i(\mathbf{w}^{T}\mathbf{x}_i + b) \ge 1, \quad i = 1, \ldots, N. \tag{6}$$
- This can be achieved by introducing Lagrange multipliers $\alpha_i \ge 0$ and a Lagrangian
$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\lVert\mathbf{w}\rVert^{2} - \sum_{i=1}^{N}\alpha_i\bigl[y_i(\mathbf{w}^{T}\mathbf{x}_i + b) - 1\bigr]. \tag{7}$$
- The Lagrangian has to be minimized with respect to $\mathbf{w}$ and $b$ and maximized with respect to $\alpha_i \ge 0$.
Linear SVMs (Lagrangian)
- Setting the derivatives of the Lagrangian with respect to $b$ and $\mathbf{w}$ to zero gives
$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N}\alpha_i y_i = 0, \qquad \frac{\partial L}{\partial \mathbf{w}} = \mathbf{0} \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N}\alpha_i y_i \mathbf{x}_i. \tag{8}$$
- Patterns for which $\alpha_i > 0$ are called support vectors. By the complementarity condition $\alpha_i\bigl[y_i(\mathbf{w}^{T}\mathbf{x}_i + b) - 1\bigr] = 0$, these vectors lie on the margin and satisfy
$$y_i(\mathbf{w}^{T}\mathbf{x}_i + b) = 1, \quad i \in S,$$
where $S$ contains the indexes of the support vectors.
- Patterns for which $\alpha_i = 0$ are considered to be irrelevant to the classification.
Linear SVMs (Wolfe Dual)
- Substituting (8) into (7), we obtain the Wolfe dual: maximize
$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j\, \mathbf{x}_i^{T}\mathbf{x}_j \tag{9}$$
subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{N}\alpha_i y_i = 0$.
- The decision hyperplane is thus
$$\mathbf{w}^{T}\mathbf{x} + b = \sum_{i \in S}\alpha_i y_i\, \mathbf{x}_i^{T}\mathbf{x} + b = 0.$$
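As a hedged illustration (not part of the original slides), the Python sketch below approximates the hard-margin solution of Eq. (9) with scikit-learn's SVC, using a linear kernel and a very large C, and reads off the support vectors, the signed multipliers α_i y_i, and the resulting w and b; the toy data set is assumed.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative assumption)
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
              [3.0, 3.0], [3.5, 2.5], [2.5, 3.5]])
y = np.array([-1, -1, -1, +1, +1, +1])

# A very large C approximates the hard-margin SVM of Eq. (9)
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("Support vectors:\n", clf.support_vectors_)
print("alpha_i * y_i:", clf.dual_coef_)   # signed multipliers of the SVs
print("w =", clf.coef_[0])                # w = sum_i alpha_i y_i x_i
print("b =", clf.intercept_[0])

# Decision rule f(x) = sgn(w.x + b) on a new point
x_new = np.array([[1.0, 1.0]])
print("f(x_new) =", np.sign(clf.decision_function(x_new)))
```

Note that `dual_coef_` stores the products α_i y_i rather than α_i alone, so `dual_coef_ @ support_vectors_` reproduces the weight vector of Eq. (8).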
Linear SVMs (Example)
- Analytical example (3-point problem)
Linear SVMs (Example)
- We introduce another Lagrange multiplier $\lambda$ for the equality constraint $\sum_i \alpha_i y_i = 0$ to obtain the Lagrangian
$$F(\boldsymbol{\alpha}, \lambda) = \sum_{i}\alpha_i - \frac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j y_i y_j\, \mathbf{x}_i^{T}\mathbf{x}_j + \lambda\sum_{i}\alpha_i y_i.$$
- Differentiating $F(\boldsymbol{\alpha}, \lambda)$ with respect to $\lambda$ and $\alpha_i$ and setting the results to zero, we obtain a set of linear equations whose solution gives the $\alpha_i$ and $\lambda$.
Linear SVMs (Example)
- Substituting the resulting Lagrange multipliers into Eq. (8) gives the weight vector $\mathbf{w}$; the bias $b$ then follows from the margin condition $y_i(\mathbf{w}^{T}\mathbf{x}_i + b) = 1$ on any support vector, as worked through in the sketch below.
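The slides' actual three points are not reproduced in this text, so the sketch below runs the same procedure on an assumed 3-point problem in Python: build the stationarity equations of F(α, λ), solve the linear system, and recover w and b from Eq. (8). The points, labels, and variable names are illustrative assumptions.

```python
import numpy as np

# Assumed 3-point linearly separable problem (not the slides' original data)
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
y = np.array([-1.0, 1.0, 1.0])

# H_ij = y_i y_j <x_i, x_j>
H = (y[:, None] * y[None, :]) * (X @ X.T)

# Stationarity of F(alpha, lambda):
#   dF/dalpha_i = 0  ->  (H alpha)_i - lambda * y_i = 1
#   dF/dlambda  = 0  ->  sum_i alpha_i y_i = 0
# Assemble as one linear system in (alpha_1, alpha_2, alpha_3, lambda).
A = np.block([[H, -y[:, None]],
              [y[None, :], np.zeros((1, 1))]])
rhs = np.array([1.0, 1.0, 1.0, 0.0])
sol = np.linalg.solve(A, rhs)
alpha, lam = sol[:3], sol[3]
assert np.all(alpha >= 0), "valid only if every point is a support vector"

# Eq. (8): w = sum_i alpha_i y_i x_i; b from y_k (w.x_k + b) = 1 on any SV
w = (alpha * y) @ X
b = y[1] - w @ X[1]
print("alpha =", alpha, "lambda =", lam)   # alpha = [1.0, 0.5, 0.5]
print("w =", w, "b =", b)                  # w = [1.0, 1.0], b = -1.0
```

Solving the stationarity equations directly is only valid when every α_i turns out non-negative (here all three points are support vectors); otherwise the active set has to be determined as in a general quadratic program.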
Linear SVMs (Example)
- 4-point linearly separable problem.
[Figures: two configurations of four points, one solved with 4 support vectors and one with 3 support vectors]
Linear SVMs (Non-linearly separable)
- Non-linearly separable patterns are patterns that cannot be separated by a linear decision boundary without incurring classification errors.
[Figure: overlapping classes; data that cause classification errors in linear SVMs]
Linear SVMs (Non-linearly separable)
- We introduce a set of slack variables $\xi_i \ge 0$, $i = 1, \ldots, N$.
- The slack variables allow some data to violate the constraints defined for the linearly separable case (Eq. 6).
- Therefore, for some $\xi_i > 0$ we have
$$y_i(\mathbf{w}^{T}\mathbf{x}_i + b) \ge 1 - \xi_i.$$
Linear SVMs (Non-linearly separable)
- E.g., $\xi_{10} > 0$ and $\xi_{19} > 0$ because $\mathbf{x}_{10}$ and $\mathbf{x}_{19}$ are inside the margins, i.e., they violate the constraint (Eq. 6).
Linear SVMs (Non-linearly separable)
- The optimization problem becomes
$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i(\mathbf{w}^{T}\mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0,$$
where $C$ is a user-defined penalty parameter that penalizes any violation of the margins.
Linear SVMs (Non-linearly separable)
- The output weight vector and bias term are
$$\mathbf{w} = \sum_{i \in S}\alpha_i y_i \mathbf{x}_i, \qquad b = y_k - \mathbf{w}^{T}\mathbf{x}_k \;\; \text{for any } k \text{ with } 0 < \alpha_k < C,$$
where $S$ is the set of support vectors; in practice $b$ is averaged over all such unbounded support vectors, as in the sketch below.
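To connect these formulas to a library implementation, the hedged sketch below fits scikit-learn's linear SVC on assumed overlapping data and rebuilds w and b from the support vectors and their multipliers, checking them against the solver's own `coef_` and `intercept_`; the data and tolerance values are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Assumed overlapping 2-D data, so some slack variables are non-zero
X = np.vstack([rng.normal([0, 0], 1.2, (40, 2)),
               rng.normal([2, 2], 1.2, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

alpha_y = clf.dual_coef_[0]            # alpha_i * y_i for the support vectors
sv = clf.support_vectors_

# w = sum_{i in S} alpha_i y_i x_i
w = alpha_y @ sv

# b averaged over unbounded SVs (0 < alpha_i < C); assumes at least one exists
on_margin = np.abs(np.abs(alpha_y) - clf.C) > 1e-8
y_sv = y[clf.support_]
b = np.mean(y_sv[on_margin] - sv[on_margin] @ w)

print("w (manual)  =", w, "  b (manual)  =", b)
print("w (sklearn) =", clf.coef_[0], "  b (sklearn) =", clf.intercept_[0])
```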
2. Linear SVMs (Types of SVs)
- Three types of support vectors (see the sketch after this list):
  1. On the margin
  2. Inside the margin
  3. Outside the margin
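As an illustrative, assumed way to see these three categories with scikit-learn, the sketch below computes the slack ξ_i = max(0, 1 − y_i f(x_i)) for each support vector of a fitted linear SVC and bins the vectors into on-margin (ξ ≈ 0), inside-margin (0 < ξ ≤ 1), and beyond-the-margin (ξ > 1) groups; the data and tolerances are assumptions, not slide material.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Assumed overlapping classes so that all three SV types appear
X = np.vstack([rng.normal([0, 0], 1.5, (60, 2)),
               rng.normal([2, 2], 1.5, (60, 2))])
y = np.hstack([-np.ones(60), np.ones(60)])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

sv_idx = clf.support_
f = clf.decision_function(X[sv_idx])          # f(x_i) = w.x_i + b
xi = np.maximum(0.0, 1.0 - y[sv_idx] * f)     # slack xi_i for each SV

tol = 1e-6
on_margin = np.sum(xi <= tol)                 # type 1: xi = 0
inside = np.sum((xi > tol) & (xi <= 1.0))     # type 2: 0 < xi <= 1
outside = np.sum(xi > 1.0)                    # type 3: xi > 1 (misclassified)
print(f"on margin: {on_margin}, inside margin: {inside}, beyond margin: {outside}")
```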
2. Linear SVMs (Types of SVs)
2. Linear SVMs (Types of SVs)
- Swapping Class 1 and Class 2.
2. Linear SVMs (Types of SVs)
[Figures: support vectors and decision boundaries for C = 0.1 and C = 100]
3. Non-linear SVMs
- In case the training data $\mathcal{X}$ are not linearly separable, we may map the data from the input space to a higher-dimensional feature space where they become linearly separable; the mapping is carried out implicitly through a kernel function, as sketched below.
[Figure: a non-linear decision boundary in the input space (domain $\mathcal{X}$) corresponds to a linear decision boundary in the feature space]
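To make the feature-space mapping concrete, the hedged sketch below maps assumed 1-D data that are not linearly separable (one class surrounded by the other) through φ(x) = (x, x²), after which a linear SVM separates them; the mapping, data, and parameter values are illustrative assumptions, not from the slides.

```python
import numpy as np
from sklearn.svm import SVC

# Assumed 1-D data: class +1 near the origin, class -1 further out
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([-1, -1, -1, +1, +1, +1, -1, -1, -1])

# Explicit feature map phi(x) = (x, x^2): the classes become linearly
# separable by a horizontal line in the (x, x^2) plane.
Phi = np.column_stack([x, x ** 2])

clf = SVC(kernel="linear", C=1e6).fit(Phi, y)
print("training accuracy in feature space:", clf.score(Phi, y))  # 1.0

# A degree-2 polynomial kernel achieves a comparable mapping implicitly,
# working directly on the raw 1-D inputs.
clf_kernel = SVC(kernel="poly", degree=2, coef0=1.0, C=1e6).fit(x.reshape(-1, 1), y)
print("training accuracy with poly kernel:", clf_kernel.score(x.reshape(-1, 1), y))
```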
3. Non-linear SVMs (Cont.)
- The decision function becomes
$$f(\mathbf{x}) = \operatorname{sgn}\!\left(\sum_{i \in S}\alpha_i y_i\, \boldsymbol{\phi}(\mathbf{x}_i)^{T}\boldsymbol{\phi}(\mathbf{x}) + b\right),$$
where $\boldsymbol{\phi}(\cdot)$ is the mapping to the feature space.
3. Non-linear SVMs (Cont.)
3. Non-linear SVMs (Cont.)
- The decision function becomes
$$f(\mathbf{x}) = \operatorname{sgn}\!\left(\sum_{i \in S}\alpha_i y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right),$$
where $K(\mathbf{x}_i, \mathbf{x}) = \boldsymbol{\phi}(\mathbf{x}_i)^{T}\boldsymbol{\phi}(\mathbf{x})$ is the kernel function.
3. Non-linear SVMs (Cont.)
- The optimization problem becomes: maximize
$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j\, K(\mathbf{x}_i, \mathbf{x}_j)$$
subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^{N}\alpha_i y_i = 0$.
- The decision function becomes
$$f(\mathbf{x}) = \operatorname{sgn}\!\left(\sum_{i \in S}\alpha_i y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right).$$
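As a hedged check of the kernelized decision function (not part of the original slides), the sketch below fits scikit-learn's SVC with an RBF kernel on assumed ring-shaped data and re-evaluates Σ_{i∈S} α_i y_i K(x_i, x) + b by hand from `dual_coef_`, `support_vectors_`, and `intercept_`, comparing it with `decision_function`; the data and parameter values are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(3)
# Assumed ring-shaped data: inner class +1, outer class -1
r = np.hstack([rng.uniform(0, 1, 100), rng.uniform(2, 3, 100)])
t = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([r * np.cos(t), r * np.sin(t)])
y = np.hstack([np.ones(100), -np.ones(100)])

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

# Manual evaluation of f(x) = sum_{i in S} alpha_i y_i K(x_i, x) + b
X_test = rng.normal(0, 2, (5, 2))
K = rbf_kernel(X_test, clf.support_vectors_, gamma=gamma)   # K(x, x_i)
f_manual = K @ clf.dual_coef_[0] + clf.intercept_[0]

print("manual :", f_manual)
print("sklearn:", clf.decision_function(X_test))            # should match
```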
3. Non-linear SVMs (Cont.)
- The effect of varying C on RBF-SVMs.
[Figures: RBF-SVM decision boundaries for C = 1000 and C = 10]
3. Non-linear SVMs (Cont.)
- The effect of varying C on polynomial-SVMs.
[Figures: polynomial-SVM decision boundaries for C = 1000 and C = 10]
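Since the original figures are not reproduced in this text, the hedged sketch below sweeps C for RBF and polynomial SVMs on assumed noisy data and reports the number of support vectors and the training accuracy, which is one way to observe the same effect numerically; the data, kernel parameters, and C values are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# Assumed noisy two-class data
X = np.vstack([rng.normal([0, 0], 1.3, (100, 2)),
               rng.normal([2, 2], 1.3, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

# A larger C penalizes margin violations more heavily, giving a more
# complex boundary that fits the training data more tightly; a smaller C
# yields a smoother boundary with more (bounded) support vectors.
for kernel, params in [("rbf", {"gamma": 0.5}), ("poly", {"degree": 3, "coef0": 1.0})]:
    for C in (10.0, 1000.0):
        clf = SVC(kernel=kernel, C=C, **params).fit(X, y)
        print(f"{kernel:4s}  C={C:7.1f}  #SV={clf.n_support_.sum():3d}  "
              f"train acc={clf.score(X, y):.3f}")
```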