Title: Lecture 19. SVM (III): Kernel Formulation
Outline
- Kernel representation
- Mercer's Theorem
- SVM using Kernels
Inner Product Kernels

In general, the input x is first transformed via a set of nonlinear functions φ_i(x) and then subjected to the hyperplane classifier

    y = sign( Σ_{i=1}^{p} w_i φ_i(x) + b )

Define the inner product kernel as

    K(x, y) = φ(x)^T φ(y) = Σ_i φ_i(x) φ_i(y)

One may then obtain a dual optimization problem formulation:

    max Q(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j d_i d_j K(x_i, x_j)
    subject to Σ_i α_i d_i = 0, α_i ≥ 0

Often, dim of φ (= p + 1) ≫ dim of x!
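As a sketch (in Python, with illustrative names that are not from the lecture), the kernelized dual objective can be evaluated directly from a kernel function, without ever forming φ:

```python
# Sketch of the kernelized SVM dual objective (illustrative names):
# Q(alpha) = sum_i alpha_i
#            - 1/2 * sum_i sum_j alpha_i alpha_j d_i d_j K(x_i, x_j)

def linear_kernel(x, y):
    """Plain inner product x^T y."""
    return sum(a * b for a, b in zip(x, y))

def dual_objective(alpha, X, d, K):
    """Evaluate Q(alpha) for multipliers alpha, samples X, labels d, kernel K."""
    first = sum(alpha)
    second = sum(
        alpha[i] * alpha[j] * d[i] * d[j] * K(X[i], X[j])
        for i in range(len(X))
        for j in range(len(X))
    )
    return first - 0.5 * second

# Two toy samples with orthogonal features, so the cross terms vanish.
X = [[1.0, 0.0], [0.0, 1.0]]
d = [1, -1]
print(dual_objective([0.5, 0.5], X, d, linear_kernel))  # 0.75
```

Swapping `linear_kernel` for any other kernel changes the machine without changing this code, which is the point of the kernel formulation.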
Polynomial Kernel

Consider a polynomial kernel

    K(x, y) = (1 + x^T y)^2

Let K(x, y) = φ^T(x) φ(y); then

    φ(x) = [1, x_1², …, x_m², √2 x_1, …, √2 x_m, √2 x_1 x_2, …, √2 x_1 x_m, √2 x_2 x_3, …, √2 x_2 x_m, …, √2 x_{m−1} x_m]
         = [φ_1(x), …, φ_p(x)]

where p = 1 + m + m + (m−1) + (m−2) + … + 1 = (m+2)(m+1)/2.

Hence, using a kernel, a low-dimensional pattern classification problem (with dimension m) is solved in a higher-dimensional space (dimension p + 1). But only the φ_j(x) corresponding to support vectors are used for pattern classification!
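A quick numerical check of this identity for m = 2 (a sketch; the function names are mine): the kernel computed directly equals the inner product of the expanded feature vectors.

```python
import math

def poly_kernel(x, y):
    """K(x, y) = (1 + x^T y)^2, computed directly."""
    return (1 + sum(a * b for a, b in zip(x, y))) ** 2

def phi(x):
    """Feature map for m = 2, in the ordering given on the slide."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, x1 * x1, x2 * x2, r2 * x1, r2 * x2, r2 * x1 * x2]

x, y = [0.5, -1.0], [2.0, 3.0]
lhs = poly_kernel(x, y)                           # kernel evaluation
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))  # phi(x)^T phi(y)
print(lhs, rhs)  # both equal 1.0
```

The kernel side costs one inner product in dimension m; the right-hand side requires forming vectors of dimension (m+2)(m+1)/2.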
Numerical Example: XOR Problem

Training samples (x_1, x_2; d): (−1, −1; −1), (−1, 1; 1), (1, −1; 1), (1, 1; −1)

x = [x_1, x_2]^T. Using K(x, y) = (1 + x^T y)^2, one has

    φ(x) = [1, x_1², x_2², √2 x_1, √2 x_2, √2 x_1 x_2]^T

Note dim φ(x) = 6 > dim x = 2! dim(K) = N_s = number of support vectors.
XOR Problem (Continued)

Note that K(x_i, x_j) can be calculated directly without using φ!

The corresponding Lagrange multipliers are α = (1/8)[1 1 1 1]^T.

Hence the hyperplane is y = w^T φ(x) = −x_1 x_2:

    (x_1, x_2):     (−1, −1)   (−1, 1)   (1, −1)   (1, 1)
    y = −x_1 x_2:      −1         1         1        −1
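This table can be reproduced numerically. The sketch below (my variable names) builds the decision function from the multipliers α = (1/8)[1 1 1 1]^T and the labels, and evaluates it at all four training points:

```python
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # XOR inputs
d = [-1, 1, 1, -1]                         # XOR labels
alpha = [0.125, 0.125, 0.125, 0.125]       # Lagrange multipliers from the slide

def K(x, y):
    """Polynomial kernel (1 + x^T y)^2, evaluated without forming phi."""
    return (1 + x[0] * y[0] + x[1] * y[1]) ** 2

def decision(x):
    """y(x) = sum_j alpha_j d_j K(x_j, x)."""
    return sum(a * dj * K(xj, x) for a, dj, xj in zip(alpha, d, X))

for x in X:
    print(x, decision(x))  # matches -x1*x2: -1, 1, 1, -1
```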
Other Types of Kernels

    Type of SVM                   K(x, y)                     Comments
    Polynomial learning machine   (x^T y + 1)^p               p selected a priori
    Radial-basis function         exp(−‖x − y‖² / (2σ²))      σ² selected a priori
    Two-layer perceptron          tanh(β_0 x^T y + β_1)       only some β_0 and β_1 values are feasible

Which kernels are feasible? A feasible kernel must satisfy Mercer's theorem!
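The three kernels in the table can be sketched as follows; the parameter defaults here are my own illustrative choices, not values from the lecture:

```python
import math

def polynomial_kernel(x, y, p=2):
    """(x^T y + 1)^p; the degree p is selected a priori."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** p

def rbf_kernel(x, y, sigma2=1.0):
    """Gaussian RBF exp(-||x - y||^2 / (2 sigma^2)); sigma^2 selected a priori."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma2))

def perceptron_kernel(x, y, beta0=1.0, beta1=-1.0):
    """tanh(beta0 x^T y + beta1); only some (beta0, beta1) satisfy Mercer."""
    return math.tanh(beta0 * sum(a * b for a, b in zip(x, y)) + beta1)

print(polynomial_kernel([1, 0], [1, 0]))  # (1 + 1)^2 = 4
print(rbf_kernel([1, 0], [1, 0]))         # zero distance -> 1.0
```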
Mercer's Theorem

Let K(x, y) be a continuous, symmetric kernel defined on a ≤ x, y ≤ b. Then K(x, y) admits an eigenfunction expansion

    K(x, y) = Σ_{i=1}^{∞} λ_i φ_i(x) φ_i(y)

with λ_i > 0 for each i. This expansion converges absolutely and uniformly if and only if

    ∫_a^b ∫_a^b K(x, y) ψ(x) ψ(y) dx dy ≥ 0

for all ψ(x) such that ∫_a^b ψ²(x) dx < ∞.
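A discrete analogue of the Mercer condition is that every Gram matrix built from the kernel is positive semidefinite, i.e. z^T G z ≥ 0 for all z. The randomized check below is a sketch of that idea using the Gaussian RBF kernel (which does satisfy Mercer's condition):

```python
import math
import random

def rbf(x, y, sigma2=1.0):
    """Gaussian RBF kernel; a known Mercer kernel."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma2))

random.seed(0)
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(6)]
G = [[rbf(p, q) for q in pts] for p in pts]  # Gram matrix

def quad_form(G, z):
    """z^T G z, the discrete analogue of the double integral in the theorem."""
    n = len(z)
    return sum(z[i] * G[i][j] * z[j] for i in range(n) for j in range(n))

# For a Mercer kernel, every quadratic form is (numerically) nonnegative.
ok = all(
    quad_form(G, [random.gauss(0, 1) for _ in pts]) >= -1e-9
    for _ in range(200)
)
print(ok)  # True
```

A kernel that violates Mercer's condition, such as tanh with a poor (β_0, β_1) choice, can produce a Gram matrix with negative eigenvalues, and this check would fail.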
Testing with Kernels

For many types of kernels, φ(x) cannot be explicitly represented or even found. However,

    y = w^T φ(x) = Σ_j α_j d_j φ^T(x_j) φ(x) = Σ_j α_j d_j K(x_j, x)

Hence there is no need to know φ(x) explicitly!

For example, in the XOR problem, f = (α_j d_j) = (1/8)[−1 1 1 −1]^T. Suppose that x = (−1, 1); then

    y = Σ_j f_j K(x_j, x) = (1/8)(−1·1 + 1·9 + 1·1 − 1·1) = 1
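The XOR test evaluation above can be checked directly (my variable names); note that the feature map φ is never formed:

```python
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # training inputs
f = [-0.125, 0.125, 0.125, -0.125]        # f_j = alpha_j d_j = (1/8)[-1 1 1 -1]

def K(x, y):
    """(1 + x^T y)^2, computed without the feature map phi."""
    return (1 + x[0] * y[0] + x[1] * y[1]) ** 2

x_test = (-1, 1)
y_out = sum(fj * K(xj, x_test) for fj, xj in zip(f, X))
print(y_out)  # 1.0 -> class +1, with no explicit phi(x)
```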
SVM Using Nonlinear Kernels

[Figure: two equivalent network views of the SVM. One view passes the input x_1, …, x_N through a nonlinear transform φ_0, …, φ_P and combines the results with weights W to produce the output f. The other realizes the same machine by kernel evaluations K(x, x_j), with no explicit nonlinear transform.]
- Using a kernel, low-dimensional feature vectors are mapped to a high-dimensional (possibly infinite-dimensional) kernel feature space, where the data are likely to be linearly separable.