Title: Kernel Matching Reduction Algorithms
1  Kernel Matching Reduction Algorithms for Classification
- Jianwu Li and Xiaocheng Deng
- Beijing Institute of Technology
2  Introduction
- Kernel-based pattern classification techniques
- Support vector machines (SVM)
- Kernel linear discriminant analysis (KLDA)
- Kernel Perceptrons
3  Introduction
- Support vector machines (SVM)
- Structural risk minimization (SRM)
- Maximum margin classification
- Quadratic optimization problem
- Kernel trick
4  Introduction
- Support vector machines (SVM)
- Support vectors (SV)
- Sparse solutions
5  Introduction
- Kernel matching pursuit (KMP)
- KMP appends functions sequentially to an initially empty basis, from a redundant dictionary of functions, to approximate a classification function under a certain loss criterion.
- KMP can produce much sparser models than SVMs.
6  Introduction
- Kernel Matching Reduction Algorithms (KMRAs)
- Inspired by KMP and SVMs, we propose kernel matching reduction algorithms (KMRAs).
- Different from KMP, the KMRAs proposed in this paper perform a reverse procedure: instead of appending functions, they remove them.
7  Introduction
- Kernel Matching Reduction Algorithms (KMRAs)
- Firstly, all training examples are selected to construct a function dictionary.
- Then the function dictionary is reduced iteratively by linear support vector machines (SVMs).
- During the reduction process, the parameters of the functions in the dictionary can be adjusted dynamically.
8  Kernel Matching Reduction Algorithms
- Constructing a Kernel-Based Dictionary
- For a binary classification problem, assume there exist l training examples, which form the training set S = {(x1, y1), (x2, y2), . . . , (xl, yl)},
- where xi ∈ R^d, yi ∈ {−1, +1}, and yi represents the class label of the point xi, i = 1, 2, . . . , l.
9  Kernel Matching Reduction Algorithms
- Constructing a Kernel-Based Dictionary
- Given a kernel function K: R^d × R^d → R, similar to KMP, we use kernel functions, centered on the training points, as our dictionary
- D = {K(x, xi) | i = 1, . . . , l}.
10  Kernel Matching Reduction Algorithms
- Constructing a Kernel-Based Dictionary
- Here, the Gaussian kernel function, with a per-point width σi, is selected.
11  Kernel Matching Reduction Algorithms
- Constructing a Kernel-Based Dictionary
- The value of σi should be set to keep the influence of the local domain around xi and to prevent K(·, xi) from having a high activation in regions far from xi.
12  Kernel Matching Reduction Algorithms
- Constructing a Kernel-Based Dictionary
- Therefore, we adopt the following heuristic method, equation (2), which sets σi according to the p nearest neighbors of xi. In this way, the receptive width of each point is determined to cover a certain region in the sample space (see the sketch below).
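The slides do not reproduce equations (1) and (2); the sketch below shows one common reading, assuming the Gaussian form K(x, xi) = exp(−‖x − xi‖²/(2σi²)) and σi taken as the mean distance from xi to its p nearest neighbors. Both are illustrative assumptions, not the paper's exact formulas.

    import numpy as np

    def receptive_widths(C, p=2):
        # C: (m, d) array of dictionary centres. sigma_i is assumed to be the
        # mean distance from centre c_i to its p nearest neighbouring centres
        # (a stand-in for equation (2)).
        d = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)              # exclude the centre itself
        p = min(p, len(C) - 1)                   # guard for very small dictionaries
        return np.sort(d, axis=1)[:, :p].mean(axis=1)

    def gaussian_dictionary(X, C, sigma):
        # Entry [j, i] = K(x_j, c_i), assuming the Gaussian form
        # exp(-||x - c_i||^2 / (2 * sigma_i^2)) with per-centre width sigma_i.
        sq = np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2)
        return np.exp(-sq / (2.0 * sigma[None, :] ** 2))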
13  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- Using all the kernel functions from the kernel-based dictionary D = {K(x, xi) | i = 1, . . . , l}, we construct a mapping from the original space to a feature space.
- Any training example xi in S is mapped to a corresponding point zi in the feature space, where zi = (K(xi, x1), K(xi, x2), . . . , K(xi, xl)).
- The training set S = {(x1, y1), (x2, y2), . . . , (xl, yl)} in the original space is mapped to S′ = {(z1, y1), (z2, y2), . . . , (zl, yl)} in the feature space.
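With the helpers sketched above, mapping S to S′ amounts to evaluating every dictionary function on every training point; X_train below is an assumed (l, d) array of the training inputs.

    # Row j of Z_train is z_j = (K(x_j, x_1), ..., K(x_j, x_l)), i.e. the mapped set S'.
    sigma = receptive_widths(X_train, p=2)
    Z_train = gaussian_dictionary(X_train, X_train, sigma)   # shape (l, l)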
14  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- We design a linear decision function gl(zt) = sign(fl(zt)) in the feature space, where
- fl(zt) = w · zt + b,    (3)
- which corresponds to the nonlinear form in the original space
- f(xt) = Σ_{i=1}^{l} wi K(xt, xi) + b,    (4)
- where w = (w1, w2, . . . , wl) represents the weights of every dimension in z.
15  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- We can decide which kernel functions are important for classification, and which are not, according to their weight magnitudes |wi| in (3) or (4), where |wi| denotes the absolute value of wi. Those redundant kernel functions, which have the lowest weight magnitudes, can be deleted from the dictionary to reduce the model.
16  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- Using the usual least-squares error criterion to find this function is not practical, since the number of training examples is, at the beginning, equal or close to the dimensionality of the feature space S′, and we would confront the problem of a non-invertible matrix.
17  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- In fact, support vector machines (SVMs), based on structural risk minimization, are well suited to supervised classification problems in high dimensions. Therefore, we adopt linear SVMs to find the classification function in (3) or (4) on S′.
18  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- The optimization objective of linear SVMs is to minimize
- (1/2) ‖w‖² + C Σ_{i=1}^{l} ξi    (5)
- subject to the constraints
- yi(w · zi + b) ≥ 1 − ξi, and ξi ≥ 0, i = 1, 2, . . . , l.
19  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- |wi| denotes the contribution of zi to the classifier in (3): the higher the value of |wi|, the larger the contribution of zi to the model.
- Consequently, we can rank the zi according to the values of |wi| (i = 1, 2, . . . , l) from large to small. We can also rank the xi by |wi|, because xi is the preimage of zi in the original space.
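A minimal sketch of this ranking step, assuming scikit-learn's LinearSVC as the linear SVM (the experiments in the paper use LIBSVM); Z is the mapped training set and y the labels.

    import numpy as np
    from sklearn.svm import LinearSVC

    def rank_by_weight(Z, y, C=1.0):
        # Train a linear SVM on S' = (Z, y); the i-th weight corresponds to the
        # dictionary function K(., x_i), so |w_i| ranks the centres x_i.
        clf = LinearSVC(C=C).fit(Z, y)
        w = clf.coef_.ravel()
        order = np.argsort(-np.abs(w))   # centre indices, largest |w_i| first
        return order, w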
20  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- The xi with the smallest |wi| can be deleted from the dictionary D, reducing D to D′.
- Then we can continue this procedure on the new dictionary D′. Thus, the process is performed iteratively until a given stopping criterion is satisfied.
- Note that each σi should be recomputed on the new dictionary D′, according to (2), every time D is reduced to D′, so that the receptive widths of the kernel functions in D′ always cover the whole sample space.
21  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- We can set a tolerable minimum accuracy δ on the training examples as the termination criterion of this procedure.
- We expect to obtain the simplest model that still guarantees a satisfactory classification accuracy on the training examples.
- This idea accords with the principles of minimum description length and Occam's Razor.
- Therefore, this algorithm can be expected to have good generalization ability.
22  Kernel Matching Reduction Algorithms
- Reducing the Kernel-Based Dictionary by Linear SVMs
- Different from KMP, which appends kernel functions to the final model gradually, this reduction strategy can be expected to avoid local optima, precisely because it deletes redundant functions from the function dictionary iteratively.
23  Kernel Matching Reduction Algorithms
- The Detailed Procedure of KMRAs
- Step 1. Set the parameter p in (2), the cross-validation fold number v for determining C in (5), and the required classification accuracy δ on the training examples.
- Step 2. Input the training examples S = {(x1, y1), (x2, y2), . . . , (xl, yl)}.
- Step 3. Compute each σi by equation (2), and construct the kernel-based dictionary D = {K(x, xi) | i = 1, . . . , l}.
24  Kernel Matching Reduction Algorithms
- The Detailed Procedure of KMRAs
- Step 4. Transform S to S′ by the dictionary D.
- Step 5. Determine C by v-fold cross validation.
- Step 6. Train the linear SVM with penalty factor C on S′, and obtain the classification model, including wi, i = 1, 2, . . . , l.
- Step 7. Rank the xi by their weight magnitudes |wi|, i = 1, 2, . . . , l.
25  Kernel Matching Reduction Algorithms
- The Detailed Procedure of KMRAs
- Step 8. If the classification accuracy of this model on the training data is higher than δ, delete from D the K(x, xi) with the smallest |wi|, then adjust each σi for the new D by (2), and go to Step 4. Otherwise, go to Step 9.
- Step 9. Output the classification model, which satisfies the accuracy δ with the simplest structure.
26  Kernel Matching Reduction Algorithms
- The Detailed Procedure of KMRAs
- Step 8 of the reduction can be generalized to remove more than one basis function per iteration, to improve training speed. A sketch of the whole procedure is given below.
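A compact sketch of Steps 3–9, reusing the receptive_widths and gaussian_dictionary helpers from the earlier sketch and assuming scikit-learn's LinearSVC with a fixed penalty C (the paper instead re-determines C by v-fold cross validation in Step 5 of every round).

    import numpy as np
    from sklearn.svm import LinearSVC

    def kmra(X, y, p=2, C=1.0, delta=0.9):
        keep = np.arange(len(X))                 # Step 3: every training point is a centre
        model, centres = None, keep.copy()
        while len(keep) > 1:
            sigma = receptive_widths(X[keep], p)             # re-adjust widths, eq. (2)
            Z = gaussian_dictionary(X, X[keep], sigma)       # Step 4: map S to S'
            clf = LinearSVC(C=C).fit(Z, y)                   # Step 6: linear SVM on S'
            if clf.score(Z, y) <= delta:                     # Step 8: training-accuracy check
                break                                        # accuracy no longer above delta
            model, centres = clf, keep.copy()                # last model satisfying delta
            w = np.abs(clf.coef_.ravel())                    # Step 7: rank by |w_i|
            keep = np.delete(keep, np.argmin(w))             # delete centre with smallest |w_i|
        return model, centres                                # Step 9: simplest satisfactory model

Here centres holds the indices of the kernel functions that remain in the final dictionary.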
27  Comparing with Other Machine Learning Algorithms
- Although KMRAs, KMP, SVMs, HSSVMs, and RBFNNs can all generate decision functions of a shape similar to equation (4), KMRAs have distinct characteristics, in essence, compared with these other algorithms.
28  Comparing with Other Machine Learning Algorithms
- Differences with KMP
- Both KMRAs and KMP build kernel-based dictionaries, but they adopt different ways of selecting basis functions for the final solutions. KMP appends kernel functions iteratively to the classification model. By contrast, KMRAs reduce the size of the dictionary step by step, by deleting redundant kernel functions.
- Moreover, different from KMP, KMRAs utilize linear SVMs to find solutions in the feature space.
29  Comparing with Other Machine Learning Algorithms
- KMRA Versus SVM
- The main difference between KMRAs and SVMs lies in how the feature spaces are produced. KMRAs create the feature space through a kernel-based dictionary, whereas SVMs create it implicitly through the kernel function.
- Kernel functions in SVMs must satisfy Mercer's theorem, while KMRAs place no restrictions on the kernel functions in the dictionary. The comparison between KMRAs and SVMs is similar to that between KMP and SVMs. In fact, we select Gaussian kernel functions in this paper, which can have different kernel widths obtained by equation (2), whereas the Gaussian kernel functions for all support vectors of an SVM share the same kernel width.
30  Comparing with Other Machine Learning Algorithms
- Linking with HSSVMs
- Hidden space support vector machines (HSSVMs) also map input patterns into a high-dimensional hidden space by a set of nonlinear functions, and then train linear SVMs in that hidden space. From this viewpoint of constructing feature spaces and applying linear SVMs, KMRAs are similar to HSSVMs. But we adopt an iterative procedure to eliminate redundant kernel functions, until a compact solution is obtained.
- KMRAs can be considered an improved version of HSSVMs.
31  Comparing with Other Machine Learning Algorithms
- Relation with RBFNNs
- Although RBFNNs also build feature spaces, usually with Gaussian kernel functions, they create discrimination functions in the least-squares sense. KMRAs, however, use linear SVMs, i.e. the idea of structural risk minimization, to find solutions.
- In a broad sense, we can think of KMRAs as a special RBFNN model with a new configuration-design strategy.
32  Experiments
- Description of Data Sets and Parameter Settings
- We compare KMRAs with SVMs on four datasets: Wisconsin Breast Cancer, Pima Indians Diabetes, Heart, and Australian; the former two are from the UCI machine learning repository, and the latter two from the Statlog database.
- We directly use the LIBSVM software package to run the standard SVM.
33  Experiments
- Description of Data Sets and Parameter Settings
- Throughout the experiments:
- 1. All training data and test data are normalized to [-1, 1].
- 2. Two-thirds of the examples are randomly selected as training examples, and the remaining one-third as test examples.
- 3. Gaussian kernel functions are chosen for the SVMs, with the kernel width σ and the penalty parameter C decided by ten-fold cross validation on the training set (see the sketch after this list).
- 4. p = 2 is adopted in equation (2).
- 5. v = 5 is set in Step 5 of the KMRA.
- 6. For each dataset, the SVM is trained first, and then, according to its classification accuracy, we determine the stop accuracy δ for the KMRA.
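A sketch of the shared experimental setup (items 1–3 and 6), assuming scikit-learn; X_raw and y are the assumed raw feature matrix and labels, and the grids of candidate C and kernel-width values are purely illustrative.

    import numpy as np
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC

    # 1. Normalize all data to [-1, 1];  2. random 2/3 train / 1/3 test split.
    X = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X_raw)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

    # 3. Gaussian-kernel SVM with C and the width chosen by ten-fold cross validation.
    grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}   # gamma ~ 1 / (2 sigma^2)
    svm = GridSearchCV(SVC(kernel="rbf"), grid, cv=10).fit(X_tr, y_tr)

    # 6. The SVM's test accuracy guides the choice of the stop accuracy delta for the KMRA.
    baseline_acc = svm.score(X_te, y_te)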
34  Experiments
- Experimental Results
- We first present the results of the standard SVMs: their parameters C and σ in Table 1, and their numbers of support vectors (SVs) and prediction accuracies in Table 2.
35  Experiments
- Experimental Results
- We set the termination accuracy δ = 0.97, 0.8, 0.8, and 0.9 in the KMRAs for these four datasets respectively, according to the classification accuracies of the SVMs in Table 2.
- We run the KMRAs on these datasets and record the classification accuracy on the test set at every iteration. The results are shown in Fig. 1.
36  Experiments
- Experimental Results
- In Fig. 1, the accuracies of the SVMs on test examples are shown as thick straight lines, and the thin curves represent the classification performance of the KMRAs. The horizontal axis denotes the KMRA iteration number; that is, the number of kernel functions in the dictionary decreases gradually from left to right.
37  Experiments
- Experimental Results
- For Diabetes and Australian, the prediction accuracies of the KMRAs improve gradually as the kernel functions in the dictionary are reduced. We can conclude that overfitting happens at the beginning of the KMRA runs. Before the KMRAs terminate, their performance approaches, and even exceeds, that of the SVMs.
- For Breast and Heart, from beginning to end, the KMRA curves fluctuate up and down around the accuracy lines of the SVMs.
38  Experiments
- Experimental Results
- We further report, in Table 2, the numbers of kernel functions (i.e. SVs) that appear in the final classification functions, as well as the corresponding prediction accuracies when the KMRAs terminate.
- Moreover, we record the best performance during the iterative process of the KMRAs and also list it in Table 2.
- Table 2 shows that, compared with SVMs, KMRAs use far fewer support vectors while obtaining comparable results.
42  Conclusions
- We propose KMRAs, which iteratively delete redundant kernel functions from a kernel-based dictionary. Therefore, we expect KMRAs to avoid local optima and to have good generalization ability.
- Experimental results demonstrate that, compared with SVMs, KMRAs achieve comparable accuracies, but typically with much sparser representations. This means that KMRAs can classify test examples faster than SVMs.
- In addition, analogous to SVMs, KMRAs can be extended to multi-class classification problems, though we only consider the two-class case in this paper.
43  Conclusions
- We also find that KMRAs gain sparser models at the expense of a longer training time. Consequently, future work should explore how to reduce the training cost.
- In conclusion, KMRAs provide a new problem-solving approach for classification.
44  Thanks!