1
Lecture 19. SVM (III): Kernel Formulation
2
Outline
  • Kernel representation
  • Mercer's Theorem
  • SVM using Kernels

3
Inner Product Kernels
In general, the input is first transformed via a set of nonlinear functions $\varphi_i(x)$ and then fed to the hyperplane classifier
$$y = \sum_{j=0}^{p} w_j \varphi_j(x) = w^T \varphi(x).$$
Define the inner product kernel as
$$K(x, x_j) = \varphi^T(x)\,\varphi(x_j) = \sum_{i=0}^{p} \varphi_i(x)\,\varphi_i(x_j).$$
One may then obtain a dual optimization problem formulation:
$$\max_{\alpha}\; Q(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j K(x_i, x_j), \quad \text{subject to } \sum_{i=1}^{N} \alpha_i d_i = 0,\; \alpha_i \ge 0.$$
Often, $\dim \varphi = p + 1 \gg \dim x$!
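As a minimal sketch (not from the slides), the dual objective above can be evaluated from the Gram matrix alone; the toy data, kernel choice, and variable names below are illustrative assumptions:

```python
# Minimal sketch (illustrative, not from the slides): evaluating the dual
# objective Q(a) = sum_i a_i - 1/2 sum_{i,j} a_i a_j d_i d_j K(x_i, x_j)
# on a toy dataset using only kernel evaluations.
import numpy as np

def poly_kernel(x, y, p=2):
    """Inner product kernel K(x, y) = (1 + x^T y)^p."""
    return (1.0 + x @ y) ** p

# Toy data: rows of X are samples x_i; d holds labels in {-1, +1}.
X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
d = np.array([-1., 1., 1., -1.])

# Gram matrix K[i, j] = K(x_i, x_j) -- no feature map needed.
K = np.array([[poly_kernel(xi, xj) for xj in X] for xi in X])

def dual_objective(alpha):
    v = alpha * d
    return alpha.sum() - 0.5 * v @ K @ v

alpha = np.full(4, 1 / 8)        # the multipliers found later for XOR
print(dual_objective(alpha))     # Q at this alpha (0.25 here)
```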
4
Polynomial Kernel
Consider a polynomial kernel
$$K(x, y) = (1 + x^T y)^2.$$
Let $K(x, y) = \varphi^T(x)\,\varphi(y)$. Then
$$\varphi(x) = [\,1,\; x_1^2, \ldots, x_m^2,\; \sqrt{2}\,x_1, \ldots, \sqrt{2}\,x_m,\; \sqrt{2}\,x_1 x_2, \ldots, \sqrt{2}\,x_1 x_m,\; \sqrt{2}\,x_2 x_3, \ldots, \sqrt{2}\,x_2 x_m, \ldots, \sqrt{2}\,x_{m-1} x_m\,]^T = [\,1,\; \varphi_1(x), \ldots, \varphi_p(x)\,]^T,$$
where $p + 1 = 1 + m + m + (m-1) + (m-2) + \cdots + 1 = (m+2)(m+1)/2$.
Hence, using a kernel, a low-dimensional pattern classification problem (with dimension $m$) is solved in a higher-dimensional space (dimension $p+1$). But only the $\varphi_j(x)$ corresponding to support vectors are used for pattern classification!
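As a quick check (a sketch under the assumption $m = 2$; the function name `phi` is illustrative), the explicit feature map reproduces the kernel value exactly:

```python
# Sketch: for m = 2, the explicit feature map of K(x, y) = (1 + x^T y)^2,
# in the slide's ordering, gives the same value as the kernel itself.
import numpy as np

def phi(x):
    """Feature map with p + 1 = (m + 2)(m + 1)/2 = 6 components for m = 2."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1**2, x2**2, s * x1, s * x2, s * x1 * x2])

x = np.array([0.5, -1.0])
y = np.array([2.0, 0.3])
lhs = (1.0 + x @ y) ** 2     # kernel evaluated directly in input space
rhs = phi(x) @ phi(y)        # inner product in the 6-dim feature space
print(np.isclose(lhs, rhs))  # True
```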
5
Numerical Example XOR Problem
Training samples (with labels): $(-1, -1;\, -1)$, $(-1, 1;\, +1)$, $(1, -1;\, +1)$, $(1, 1;\, -1)$, where $x = [x_1, x_2]^T$. Using $K(x, y) = (1 + x^T y)^2$, one has
$$\varphi(x) = [\,1,\; x_1^2,\; x_2^2,\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2,\; \sqrt{2}\,x_1 x_2\,]^T.$$
Note $\dim \varphi(x) = 6 > \dim x = 2$! Also, $\dim(K) = N_s$, the number of support vectors.
6
XOR Problem (Continued)
Note that $K(x_i, x_j)$ can be calculated directly, without using $\varphi$! For the four training samples,
$$K = \begin{bmatrix} 9 & 1 & 1 & 1 \\ 1 & 9 & 1 & 1 \\ 1 & 1 & 9 & 1 \\ 1 & 1 & 1 & 9 \end{bmatrix}.$$
The corresponding Lagrange multipliers are $\alpha = (1/8)\,[1\; 1\; 1\; 1]^T$. Hence the hyperplane is
$$y = w^T \varphi(x) = -x_1 x_2:$$

(x1, x2):      (-1, -1)   (-1, 1)   (1, -1)   (1, 1)
y = -x1 x2:       -1          1         1        -1
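The numbers above can be reproduced in a few lines; this is a sketch (library-free, names illustrative) that builds the Gram matrix from the kernel alone and confirms the hyperplane:

```python
# Sketch: checking the slide's XOR numbers. The Gram matrix comes straight
# from K(x, y) = (1 + x^T y)^2, and w = sum_i alpha_i d_i phi(x_i) confirms
# that y = w^T phi(x) = -x1*x2 at all four corners.
import numpy as np

X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
d = np.array([-1., 1., 1., -1.])
alpha = np.full(4, 1 / 8)            # (1/8)[1 1 1 1]^T from the slide

def phi(x):                          # explicit 6-dim feature map (slide 5)
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1**2, x2**2, s * x1, s * x2, s * x1 * x2])

K = (1.0 + X @ X.T) ** 2             # Gram matrix: 9 on the diagonal, 1 off
print(K)
w = sum(a * di * phi(xi) for a, di, xi in zip(alpha, d, X))
print(w)                             # only the sqrt(2)*x1*x2 entry is nonzero
print([w @ phi(x) for x in X])       # [-1, 1, 1, -1] = -x1*x2 at each corner
```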
7
Other Types of Kernels
Type of SVM                      K(x, y)                            Comments
Polynomial learning machine      (xᵀy + 1)^p                        p selected a priori
Radial-basis function network    exp(−‖x − y‖² / (2σ²))             σ² selected a priori
Two-layer perceptron             tanh(β₀ xᵀy + β₁)                  only some β₀, β₁ values are feasible

Which kernels are feasible? A feasible kernel must satisfy Mercer's theorem!
8
Mercer's Theorem
Let $K(x, y)$ be a continuous, symmetric kernel defined on $a \le x, y \le b$. Then $K(x, y)$ admits the eigenfunction expansion
$$K(x, y) = \sum_{i=1}^{\infty} \lambda_i\, \varphi_i(x)\, \varphi_i(y)$$
with $\lambda_i > 0$ for each $i$. This expansion converges absolutely and uniformly if and only if
$$\int_a^b \int_a^b K(x, y)\, \psi(x)\, \psi(y)\, dx\, dy \ge 0$$
for all $\psi(x)$ such that $\int_a^b \psi^2(x)\, dx < \infty$.
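A finite-sample proxy for this condition (a sketch, with arbitrary random data): the Gram matrix of a Mercer kernel on any point set is positive semidefinite, so its eigenvalues should all be nonnegative up to round-off:

```python
# Sketch: empirical Mercer check. For a valid kernel, the Gram matrix on
# any point set is positive semidefinite; its smallest eigenvalue should
# be >= 0 (up to numerical round-off).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))         # arbitrary sample points

def rbf(x, y, sigma2=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma2))

G = np.array([[rbf(xi, xj) for xj in X] for xi in X])
print(np.linalg.eigvalsh(G).min())   # ~0 or positive for the RBF kernel
```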
9
Testing with Kernels
For many types of kernels, $\varphi(x)$ cannot be explicitly represented or even found. However,
$$y = w^T \varphi(x) = \sum_{i=1}^{N_s} \alpha_i d_i\, \varphi^T(x_i)\, \varphi(x) = \sum_{i=1}^{N_s} \alpha_i d_i\, K(x_i, x).$$
Hence there is no need to know $\varphi(x)$ explicitly! For example, in the XOR problem, $f = (1/8)\,[-1\; 1\; 1\; -1]^T$ (the products $f_i = \alpha_i d_i$). Suppose that $x = (-1, 1)$; then
$$y = \sum_{i=1}^{4} f_i\, K(x_i, x) = \tfrac{1}{8}\,(-1 \cdot 1 + 1 \cdot 9 + 1 \cdot 1 - 1 \cdot 1) = 1.$$
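This test-time computation is easy to sketch in code (reusing the XOR numbers from the slides; `predict` is an illustrative name):

```python
# Sketch: kernel-based prediction without ever forming phi(x).
import numpy as np

X_sv = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
f = (1 / 8) * np.array([-1., 1., 1., -1.])   # f_i = alpha_i * d_i

def K(x, y):
    return (1.0 + x @ y) ** 2

def predict(x):
    # y(x) = sum_i f_i K(x_i, x); only kernel evaluations are needed.
    return sum(fi * K(xi, x) for fi, xi in zip(f, X_sv))

print(predict(np.array([-1., 1.])))   # 1.0, matching the slide's example
```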
10
SVM Using Nonlinear Kernels
[Figure: two equivalent network views of the trained SVM. In one, the inputs $x_1, \ldots, x_N$ feed a layer of nonlinear transforms $\varphi_0, \ldots, \varphi_P$, whose outputs are combined by weights $W$ to produce the output $f$. In the other, the nonlinear-transform layer is replaced by kernel evaluations $K(x, x_j)$.]
  • Using a kernel, low-dimensional feature vectors are mapped to a
    high-dimensional (possibly infinite-dimensional) kernel feature space,
    where the data are likely to be linearly separable; see the sketch
    below.
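As a closing sketch (the library choice is an assumption; the slides do not mention one), scikit-learn's `SVC` with a degree-2 polynomial kernel, `gamma=1.0` and `coef0=1.0`, matches $K(x, y) = (1 + x^T y)^2$ from the XOR example:

```python
# Sketch (assumes scikit-learn is installed): XOR solved with a library SVM
# whose polynomial kernel reproduces (1 + x^T y)^2 from the lecture.
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
d = np.array([-1, 1, 1, -1])

# degree=2, gamma=1, coef0=1 -> K(x, y) = (1 + x^T y)^2; a large C
# approximates the hard-margin formulation used in the lecture.
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1e6)
clf.fit(X, d)
print(clf.predict(X))    # [-1  1  1 -1]: XOR separated in kernel space
print(clf.dual_coef_)    # alpha_i * d_i, cf. (1/8)[-1 1 1 -1]^T
```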