Title: Study of Nearest Point Algorithm for SVM Classifier Design
1. Study of Nearest Point Algorithm for SVM Classifier Design
- EE645 Final Project
- Hui Ou
2. Introduction
- The classical method solves the SVM optimization problem as a quadratic programming (QP) problem, which requires enormous matrix storage and intensive matrix operations.
- Fast iterative algorithms were therefore introduced to solve the SVM problem.
- Some solve the QP problem analytically in the dual space:
- Chunking
- Decomposition methods
- Sequential Minimal Optimization (SMO)
- Others solve the Nearest Point Problem directly, based on the geometric interpretation:
- The goal of classification is to find the best decision rule to separate two classes (U and V) of points.
- The best decision boundary can be constructed by finding the two closest points of the two convex hulls generated respectively by the two classes.
3. Motivation
- The SVM has been formulated as a special sort of optimization problem, and the NPA is a method for solving it in a geometric way.
- The SVM classification problem is converted into the problem of computing the nearest points between two convex polytopes. (The two-category classification problem is considered.)
- The performance of the NPA is competitive with SMO and SVM-light.
4. Contents
- Introduction to the Nearest Point Problem (NPP).
- General idea of the NPP.
- Reformulation of the SVM as a Nearest Point Problem.
- Hard convex hull, based on the L-2 norm SVM.
- Soft convex hull, based on the L-1 norm SVM.
- Optimality criteria for the NPP.
- Iterative algorithms based on the L-2 norm for the NPP.
- Gilbert's Algorithm
- Mitchell-Demyanov-Malozemov Algorithm (MDM)
- A fast iterative algorithm combining the ideas of the two algorithms.
- Discussion of the L-1 norm.
- Simulation results.
- Conclusion.
5. General Idea of the Nearest Point Algorithm
- Let U and V denote the two classes.
- Among all pairs of parallel hyperplanes that separate the two classes, the pair with the largest margin is the one which has (u - v) as the normal direction, where (u, v) is a pair of closest points of U and V.
- The solution (u, v) of the NPP therefore gives the maximum-margin hyperplane: its normal is w = u - v, and it passes through the midpoint (u + v)/2.
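As a small illustration of this geometry (not part of the original slides), the following Python sketch builds the separating hyperplane from a hypothetical pair of nearest points u and v:

    import numpy as np

    def hyperplane_from_nearest_points(u, v):
        # Normal direction is u - v; the decision boundary bisects the segment [u, v].
        w = u - v
        b = -w.dot((u + v) / 2.0)  # chosen so that w.x + b = 0 at the midpoint
        return w, b

    # Hypothetical nearest points of the two convex hulls.
    u = np.array([2.0, 1.0])
    v = np.array([0.0, 0.0])
    w, b = hyperplane_from_nearest_points(u, v)
    print(np.sign(w.dot(np.array([1.5, 1.0])) + b))  # +1: the U side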
6. Reformulation of SVM as a Nearest Point Problem
- Notation:
- x: input vector of the support vector machine.
- z: feature space vector, z = f(x).
- As in all SVM designs, we do not assume f to be known; all computations are done using only the kernel function.
- I, J: the index sets for class 1 and class 2, respectively.
- The SVM problem:
- Without violations: minimize the norm of w subject to the margin constraints (see below).
- With violations, depending on how the slack variables enter the cost function, there are two cases:
- L-1 norm, as in the v-SVM.
- L-2 norm, which uses a sum of squared violations in the cost function.
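The formulations referred to above, reconstructed here in their standard form (assuming labels y_k = +1 for k in I and y_k = -1 for k in J):

    \min_{w,b}\ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_k (w \cdot z_k + b) \ge 1 \ \ \forall k

and, when violations are allowed through slack variables \xi_k \ge 0 with constraints y_k (w \cdot z_k + b) \ge 1 - \xi_k:

    \min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_k \xi_k \quad \text{(L-1 norm)}
    \qquad
    \min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_k \xi_k^2 \quad \text{(L-2 norm)}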
7. Reformulation of SVM as a Nearest Point Problem
- Based on the L-2 norm: given a set S, we use co S to denote the (hard) convex hull of S.
- Since I and J are finite sets, U and V are convex polytopes.
- Based on the L-1 norm, we can instead define a soft convex hull, which has one more constraint due to the definition of the v-SVM.
- Nearest Point Problem:
- We can rewrite the constraints of the NPP as shown below.
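A reconstruction of the definitions and the problem, in the standard forms consistent with reference [1]:

    \text{co}\, S = \Big\{ \sum_k \lambda_k s_k \ :\ s_k \in S,\ \lambda_k \ge 0,\ \sum_k \lambda_k = 1 \Big\},
    \qquad U = \text{co}\{ z_i : i \in I \},\quad V = \text{co}\{ z_j : j \in J \}

    \text{NPP:}\quad \min_{u \in U,\ v \in V} \ \|u - v\|^2

That is, minimize \|\sum_{i \in I} \lambda_i z_i - \sum_{j \in J} \lambda_j z_j\|^2 over the convex coefficients. In the L-1 norm (v-SVM) case, the soft convex hull adds an upper bound \lambda_k \le \mu on each coefficient.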
8. Optimality Criteria for NPP
- It is well known that the maximum of a linear function over a convex polytope is attained at an extreme point.
- We can therefore search along the direction z = u - v: in U we find the point u(i) that maximizes -z.u, and in V we find the point v(j) that maximizes z.v.
- After that, we search the line segments co{u, u(i)} and co{v, v(j)} for the pair of points that minimizes the distance between the two convex hulls.
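A minimal sketch of the extreme-point search this slide relies on (illustrative Python, assuming each polytope is given by its generating points as the rows of a matrix):

    import numpy as np

    def extreme_point(points, direction):
        # The maximum of a linear function over a convex polytope is attained
        # at a vertex, so it suffices to scan the generating points.
        return points[np.argmax(points @ direction)]

    # Example: search U along -z and V along z, where z = u - v.
    U_pts = np.array([[0.0, 2.0], [1.0, 3.0], [2.0, 2.0]])
    V_pts = np.array([[0.0, 0.0], [1.0, -1.0], [2.0, 0.0]])
    u, v = U_pts[0], V_pts[0]
    z = u - v
    u_i = extreme_point(U_pts, -z)
    v_j = extreme_point(V_pts, z)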
9. Algorithms for NPP
- The best general-purpose algorithms for NPP, such as Wolfe's algorithm, terminate within a finite number of steps.
- However, they require expensive matrix storage and matrix operations in each step, which makes them unsuitable for large SVM designs.
- Iterative algorithms:
- The memory required is linear in the number of training vectors.
- They reach the solution asymptotically as the number of iterations goes to infinity.
- They are better suited for SVM design.
- Some popular iterative algorithms for NPP:
- Gilbert's Algorithm
- Mitchell-Demyanov-Malozemov Algorithm (MDM)
10. Gilbert's Algorithm
- Gilbert's algorithm was one of the first algorithms suggested for solving the NPP.
- The NPP is equivalent to the following minimum norm problem over the difference set Z = {u - v : u in U, v in V}: minimize ||z||^2 subject to z in Z.
- Gilbert step:
- Choose z in Z.
- If this point minimizes ||z||^2 over Z, then stop with z* = z; else set z_bar = u - v, where u and v maximize -z.u over U and z.v over V, respectively.
- Compute z', the point on the line segment joining z and z_bar which has least norm. Set z = z', and go back to step 2.
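A minimal, self-contained sketch of this loop (illustrative Python working with explicit coordinates; an actual SVM implementation would carry out the same inner products through the kernel function):

    import numpy as np

    def min_norm_on_segment(z, z_bar):
        # Least-norm point on the segment joining z and z_bar:
        # minimize ||z + t (z_bar - z)||^2 over t in [0, 1] (closed form).
        d = z_bar - z
        dd = d.dot(d)
        if dd == 0.0:
            return z
        t = np.clip(-z.dot(d) / dd, 0.0, 1.0)
        return z + t * d

    def gilbert(P, Q, iters=1000, eps=1e-6):
        # P, Q: rows are the points generating U and V; Z = U - V.
        z = P[0].astype(float) - Q[0].astype(float)
        for _ in range(iters):
            u = P[np.argmax(P @ (-z))]   # maximizes -z.u over U
            v = Q[np.argmax(Q @ z)]      # maximizes  z.v over V
            z_bar = u - v                # minimizes z.w over w in Z
            if z.dot(z) - z.dot(z_bar) < eps:
                break                    # z (approximately) has least norm in Z
            z = min_norm_on_segment(z, z_bar)
        return z                         # approximates u* - v*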
11. A Few Iterations of Gilbert's Algorithm on a Two-Dimensional Example
12. Mitchell-Demyanov-Malozemov Algorithm (MDM)
- Unlike Gilbert's algorithm, the MDM algorithm fundamentally uses the representation z = sum_t alpha_t (z_i(t) - z_j(t)) in its basic operations.
- In this case, only the coefficients alpha_t and the index pairs (i(t), j(t)) need to be stored and maintained in order to represent z.
- In each iteration, the MDM algorithm attempts to decrease the gap Delta(z) = max{ z.(z_i(t) - z_j(t)) : alpha_t > 0 } - min{ z.z_bar : z_bar in Z }.
- MDM looks for an improvement of z along the direction joining the two points that attain this gap.
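One MDM-style iteration, sketched in Python under the simplifying assumption that the difference vectors w_t = z_i(t) - z_j(t) are available as the rows of a matrix W (a kernel implementation would keep only alpha and the index pairs):

    import numpy as np

    def mdm_step(W, alpha):
        # z is represented through convex coefficients: z = alpha @ W.
        z = alpha @ W
        scores = W @ z
        t_max = np.argmax(np.where(alpha > 0, scores, -np.inf))  # worst supporting point
        t_min = np.argmin(scores)                                # point of Z most opposed to z
        d = W[t_min] - W[t_max]
        dd = d.dot(d)
        if dd == 0.0:
            return alpha                 # gap already closed
        # Transfer weight delta from t_max to t_min, minimizing ||z + delta*d||^2.
        delta = np.clip(-z.dot(d) / dd, 0.0, alpha[t_max])
        alpha = alpha.copy()
        alpha[t_max] -= delta
        alpha[t_min] += delta
        return alpha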
13. The Idea of the MDM Iteration
- (Figure) A = z_i(t_min) - z_j(t_min); B = z_bar, the point which minimizes z.z_bar.
- The MDM algorithm tries to crush the total slab toward zero, while Gilbert's algorithm only attempts to push the lower slab to zero.
14. Comments on the Two Algorithms
- Gilbert's algorithm makes rapid movement toward the solution during its initial iterations; however, it becomes very slow as it approaches the final solution. This is because, once the correction steps get small, the algorithm is slow in driving the norm of z down any further.
- The MDM algorithm works faster than Gilbert's algorithm, especially in the end stages, when z approaches z*.
- Algorithms which are much faster than the MDM algorithm can be designed using the following two observations:
- It is easy to combine the ideas of Gilbert's algorithm and the MDM algorithm into a hybrid algorithm which is faster.
- One can work directly in the space where U and V are located.
15. Combination of Gilbert's Algorithm and MDM Algorithm
16. A Fast Iterative Algorithm for NPP
- In this section we discuss an algorithm for the NPP that works directly in the space in which U and V are located.
- The key idea is to combine Gilbert's algorithm and the MDM algorithm.
- The cost function is in the L-2 norm.
- The stopping criterion for this algorithm:
- Let us define the stop condition first.
- We say that an index k satisfies the stop condition at (u, v) if the corresponding training point can still improve the pair; with z = u - v and a small tolerance eps > 0, this amounts to z.(u - z_k) > eps for k in I (and z.(z_k - v) > eps for k in J).
- If we can find such an index k (let us suppose it is in I), then on the line segment joining u and z_k there must be a point u' which is closer to v than u is.
- If we cannot find an index k that satisfies the above condition, then it is a good time to stop the algorithm.
17. A Fast Iterative Algorithm for NPP
- Steps:
- Choose u in U, v in V, and set z = u - v.
- Find an index k satisfying the stop condition. If such an index cannot be found, stop with the conclusion that the approximate optimality criterion is satisfied. Else go to step 3 with the k found.
- Choose two convex polytopes U' and V' (sub-polytopes of U and V built from the current pair and z_k).
- Compute (u', v'), a pair of closest points minimizing the distance between U' and V'.
- Set u = u', v = v', and go back to step 2.
- In step 3, suppose we choose U' to be the line segment joining u and z_k, and V' = {v} (supposing k is in I). Then the algorithm is closer to Gilbert's algorithm; a sketch of this instance follows.
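A minimal sketch of that Gilbert-like instance of the loop (illustrative Python with explicit coordinates; the actual algorithm in [1] also uses MDM-type sub-polytopes and kernel caching):

    import numpy as np

    def closest_point_on_segment(p, a, b):
        # Projection of p onto the segment [a, b].
        d = b - a
        dd = d.dot(d)
        t = np.clip((p - a).dot(d) / dd, 0.0, 1.0) if dd > 0 else 0.0
        return a + t * d

    def fast_npp(P, Q, eps=1e-6, iters=1000):
        # P, Q: rows are the training points of class 1 (I) and class 2 (J).
        u, v = P[0].astype(float), Q[0].astype(float)
        for _ in range(iters):
            z = u - v
            k_u = np.argmax((u - P) @ z)   # most violating index in I
            k_v = np.argmax((Q - v) @ z)   # most violating index in J
            if (u - P[k_u]).dot(z) <= eps and (Q[k_v] - v).dot(z) <= eps:
                break                      # approximate optimality criterion met
            if (u - P[k_u]).dot(z) >= (Q[k_v] - v).dot(z):
                # U' = co{u, z_k}, V' = {v}: project v onto the segment.
                u = closest_point_on_segment(v, u, P[k_u])
            else:
                v = closest_point_on_segment(u, v, Q[k_v])
        return u, v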
18. Final Solution of NPP
- Support vectors:
- The z_k whose coefficient alpha_k in the final representation of (u, v) is nonzero serve as the support vectors.
- Use the sign of the following decision function to decide whether a given x belongs to class 1 or class 2.
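The decision function implied by the nearest-point pair, written here in the standard kernel form (with y_k = +1 for k in I and y_k = -1 for k in J):

    h(x) = \operatorname{sign}\Big( \sum_k \alpha_k y_k K(x_k, x) + b \Big),
    \qquad b = \tfrac{1}{2}\big( \|v\|^2 - \|u\|^2 \big)

since w = u - v and the decision boundary bisects the segment joining u and v; both b and the sum are computable from kernel evaluations alone.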
19. Simulations
- The fast iterative nearest point algorithm in the L-2 norm is implemented in MATLAB and tested on the following data sets:
- WDBC data
- Problem Set 1, Problem 5 data set
- Iris data
- The results are compared with SVM-light.
20. Simulations
- WDBC data
- Randomly choose 304 examples as training examples; test on the other 202 examples.
- Percentage misclassification on the test set:

                Gaussian Kernel   Polynomial Kernel
    NPA         10.2              (optimality criteria could not be satisfied)
    SVM-light   33.9              6.6
21. NPA Performance for WDBC Data, Gaussian Kernel
22. Simulations
- PS1, Problem 5 data
- 100 training examples, 1000 test examples.
- Percentage misclassification on the test set:

                Gaussian Kernel   Polynomial Kernel
    NPA         7.4               5.4
    SVM-light   7                 5.3
23. NPA Performance for PS1 Problem 5 Data, Gaussian Kernel
24. NPA Performance for PS1 Problem 5 Data, Polynomial Kernel
25. Simulations
- Iris data
- Case 1: separate the second 50 examples from the other 100. Randomly pick 50 examples as training data; use the other 100 as test data.
- Case 2: separate the third 50 examples from the other 100. Randomly pick 50 examples as training data; use the other 100 as test data.
- Percentage misclassification on the test set, Case 1:

                Gaussian Kernel   Polynomial Kernel
    NPA         2                 5
    SVM-light   4                 5

- Percentage misclassification on the test set, Case 2:

                Gaussian Kernel   Polynomial Kernel
    NPA         3                 4
    SVM-light   5                 4
26. NPA Performance for Iris Data 1, Gaussian Kernel
27. NPA Performance for Iris Data 1, Polynomial Kernel
28. NPA Performance for Iris Data 2, Gaussian Kernel
29. NPA Performance for Iris Data 2, Polynomial Kernel
30. Conclusion for Simulation Results
- Advantages:
- The misclassification percentage is competitive with SVM-light.
- The algorithm is quite straightforward to implement.
- Disadvantages:
- The CPU time needed to train an SVM classifier using the Nearest Point Algorithm is greater than that of SVM-light.
- For some data sets, the optimality criteria for the Nearest Point Algorithm cannot be satisfied, and the algorithm cannot improve the situation any further.
31. Discussion of NPP Based on the L-1 Norm Cost Function
- Gilbert's algorithm and the MDM algorithm based on the L-2 norm can be modified to solve L-1 norm v-SVM classification problems, using the soft convex hulls.
- The simulation results provided by Qing Tao and Gao-wei Wu [2] show that this approach is not as good as the L-2 norm algorithms:
- The classification error is larger than that of the MDM algorithm.
- The computational cost is also higher than that of the MDM algorithm.
32. Conclusion for NPA
- The Nearest Point Algorithm provides a method for solving the SVM classification problem based on its geometric interpretation.
- For two-class classification problems, the NPA is competitive with other fast algorithms, such as SVM-light.
- Unfortunately, this algorithm is not as popular as the other fast iterative algorithms, for the following reasons:
- For the two-class classification problems examined above, the computational cost of the NPA is higher than that of SVM-light.
- For some data sets, the optimality criteria for the NPA cannot be satisfied.
- For classification problems with three or more classes, the algorithm has to find the convex hulls and margins for every pair of classes, which takes much more CPU time to converge.
- The algorithm is not suitable for SVM regression problems.
33. References
- [1] S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, and K.R.K. Murthy. A Fast Iterative Nearest Point Algorithm for Support Vector Machine Classifier Design. IEEE Transactions on Neural Networks, 2000, 11(1):124-136. Online: http://guppy.mpe.nus.edu.sg/~mpessk.
- [2] Qing Tao and Gao-wei Wu. A General Soft Method for Learning SVM Classifiers with L1 Norm.