Title: Advanced Issues on SVM
1 Advanced Issues on SVM
- A simple learning algorithm
- A geometric method to improve the performance of SVM
2 Basic Ideas of SVM (1)
3 Basic Ideas of SVM (2)
The final solution
4 In Summary
- What SVM does
- To use a kernel function to map the original input data into a high-dimensional feature space, so that the data points become more linearly separable.
- To look for a maximum-margin classifier in the feature space.
- Advantages
- Due to the kernel trick, everything is very simple.
- The solution depends only on the support vectors.
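The summary above can be sketched with a toy example. This is a minimal illustration, assuming scikit-learn is available; the ring-shaped data set and all parameter values are illustrative, not from the slides:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two concentric rings: not separable by any line in the 2-D input space,
# but separable after the RBF kernel's implicit feature mapping.
angles = rng.uniform(0.0, 2 * np.pi, 200)
radii = np.concatenate([rng.normal(1.0, 0.1, 100), rng.normal(3.0, 0.1, 100)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.concatenate([np.zeros(100), np.ones(100)])

clf = SVC(kernel="rbf", C=10.0).fit(X, y)
print(clf.score(X, y))        # training accuracy
print(len(clf.support_))      # the solution depends only on the SVs
```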
5 The Dual Problem
The quadratic optimization
The final solution
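The slide's equations do not survive extraction; the standard soft-margin dual, which the deck's discussion appears to follow, is

```latex
\max_{\alpha}\; W(\alpha)=\sum_{i=1}^{n}\alpha_i
 -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\, y_i y_j\, K(x_i,x_j)
\quad\text{s.t.}\quad 0\le\alpha_i\le C,\qquad \sum_{i=1}^{n}\alpha_i y_i=0,
```

with the final solution f(x) = \sum_i \alpha_i y_i K(x_i, x) + b, where \alpha_i \neq 0 only for the support vectors.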
6 A Gradient Descent Method
- Consider the dual problem.
The approximation is good, provided the non-linear mapping has a constant component. This is true for many kernels.
7 The Algorithm
- At each step, calculate the gradient.
- The updating rule
Reference: Vijayakumar & Wu (1999)
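The algorithm can be sketched as gradient ascent on the dual in the spirit of Vijayakumar & Wu (1999): the equality constraint (and bias) is dropped, which is justified when the kernel has a constant component, so a "+ 1" is added to the RBF here. All parameter values are illustrative:

```python
import numpy as np

def kernel(A, B, gamma=1.0):
    # RBF kernel plus a constant component
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2) + 1.0

def train_dual(X, y, C=10.0, eta=0.05, epochs=500):
    H = (y[:, None] * y[None, :]) * kernel(X, X)
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        grad = 1.0 - H @ alpha                      # gradient of W(alpha)
        alpha = np.clip(alpha + eta * grad, 0.0, C) # the updating rule
    return alpha

def decision(alpha, X, y, Xnew):
    return kernel(Xnew, X) @ (alpha * y)

# XOR: not linearly separable in the input space
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([1., 1., -1., -1.])
alpha = train_dual(X, y)
print(np.sign(decision(alpha, X, y, X)))   # class predictions on the training set
```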
8 Model Selection
- The kernel mapping should let the two classes of data be, as far as possible, linearly separable.
- The performance of SVM largely depends on the choice of kernel.
- The kernel function implies a smoothness assumption on the discriminating function (regularization theory, Gaussian processes).
- Without prior knowledge, the kernel has to be chosen in a data-dependent way.
9 The Geometry underlying Kernel Mapping
- The induced Riemannian metric
The volume element
References: 1. C. Burges (1999); 2. Amari and Wu (1999).
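The formulas named on this slide are lost to extraction; the standard expressions (see Burges, 1999; Amari and Wu, 1999) for the metric induced on input space by the kernel mapping, and the corresponding volume element, are

```latex
g_{ij}(x)=\left.\frac{\partial^2 K(x,x')}{\partial x_i\,\partial x'_j}\right|_{x'=x},
\qquad dV=\sqrt{\det g(x)}\;dx .
```

For the RBF kernel K(x,x') = \exp(-\|x-x'\|^2/2\sigma^2) this gives the flat metric g_{ij} = \delta_{ij}/\sigma^2.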
10 Scaling the Kernel
- Enlarge the separation between the two classes.
- D(x) is chosen to have a relatively larger value around the boundary.
- Conformal transformation of the kernel
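The conformal transformation referred to here is, in Amari and Wu's notation,

```latex
\tilde{K}(x,x') = D(x)\,K(x,x')\,D(x'),
```

which, for kernels with K(x,x)=1 and vanishing first derivatives on the diagonal (such as the RBF), changes the induced metric to \tilde{g}_{ij}(x) = D(x)^2 g_{ij}(x) + D_i(x)D_j(x), with D_i = \partial D/\partial x_i. A relatively large D around the boundary therefore magnifies distances precisely there.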
11 A Two-Step Training Procedure
- The dilemma: where is the boundary?
- Two-step training
- First step: apply a primary kernel to identify where the boundary is roughly located.
- Second step: modify the primary kernel properly, and re-train the SVM.
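The two-step procedure can be sketched as follows, assuming scikit-learn is available. The scaling function D(x) = exp(-kappa f(x)^2), built from the first-pass decision function f, is one concrete choice; kappa is an illustrative parameter, kept small here so that D varies moderately:

```python
import numpy as np
from sklearn.svm import SVC

def rbf(A, B, gamma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def two_step_train(X, y, gamma=1.0, kappa=0.1, C=10.0):
    # First step: a primary RBF kernel locates the boundary roughly.
    first = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    D = lambda A: np.exp(-kappa * first.decision_function(A) ** 2)
    # Second step: re-train with the conformally scaled kernel
    # K~(x, x') = D(x) K(x, x') D(x').
    scaled = lambda A, B: D(A)[:, None] * rbf(A, B, gamma) * D(B)[None, :]
    second = SVC(kernel=scaled, C=C).fit(X, y)
    return first, second
```

scikit-learn accepts a callable `kernel` that returns the Gram matrix between two sample sets, which is what makes the second pass a one-liner here.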
12 Choice of D(x)
- Based on the positions of SVs
The fact: SVs are often located around the boundary.
The shortcoming: D is susceptible to the distribution of the data.
References: 1. Amari and Wu (1999); 2. Wu and Amari (2001)
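A sketch of the SV-based choice (after Amari and Wu, 1999): D(x) is a sum of Gaussian bumps centred on the support vectors, so it is large near the approximate boundary. The width `tau` is an illustrative parameter:

```python
import numpy as np

def D_from_svs(X, svs, tau=1.0):
    # D(x) = sum over SVs s of exp(-|x - s|^2 / (2 tau^2))
    d2 = np.sum((X[:, None, :] - svs[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * tau ** 2)).sum(axis=1)
```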
13 Choice of D(x)
- Based on the distance measure
The fact: f(x), given by the first-pass solution, is a suitable distance measure in terms of discrimination, with the following properties:
1. At the boundary, f(x) = 0.
2. At the margin of the separating region, |f(x)| = 1.
3. Outside the separating region, |f(x)| > 1.
Reference: Williams, Sheng, Feng & Wu (2005)
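A sketch of this |f|-based choice (after Williams, Sheng, Feng & Wu, 2005): D(x) = exp(-kappa f(x)^2) is maximal on the boundary f(x) = 0 and decays with the distance measure |f(x)| on both sides. `kappa` is an illustrative parameter:

```python
import numpy as np

def D_from_f(f, kappa=1.0):
    # f: first-pass decision values; D peaks where f = 0 (the boundary)
    return np.exp(-kappa * np.asarray(f) ** 2)
```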
14 An Example: the RBF Kernel
- The RBF kernel
- After scaling
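The two formulas on this slide are lost to extraction; for the RBF kernel with conformal scaling they presumably read

```latex
K(x,x')=\exp\!\left(-\frac{\|x-x'\|^2}{2\sigma^2}\right),
\qquad
\tilde{K}(x,x')=D(x)\,\exp\!\left(-\frac{\|x-x'\|^2}{2\sigma^2}\right)D(x').
```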
15 Simulation
The training data
16 Simulation
The first-pass solution
The contour of f(x) is illustrated by the color level.
17 Simulation
The magnification effect
18 Simulation
The second-pass solution