Title: Learning Larger Margin Machine Locally and Globally
Slide 1: Learning Larger Margin Machine Locally and Globally
- Kaizhu Huang (kzhuang_at_cse.cuhk.edu.hk)
- Haiqin Yang, Irwin King, Michael R. Lyu
- Dept. of Computer Science and Engineering
- The Chinese University of Hong Kong
- July 5, 2004
Slide 2: Learning Larger Margin Machine Locally and Globally
- Contributions
- Background
  - Linear Binary Classification
- Motivation
- Maxi-Min Margin Machine (M4)
  - Model Definition
  - Geometrical Interpretation
  - Solving Methods
  - Connections with Other Models
  - Nonseparable Case
  - Kernelization
- Experimental Results
- Future Work
- Conclusion
Slide 3: Contributions
- Theory: A unified model of the Support Vector Machine (SVM), the Minimax Probability Machine (MPM), and Linear Discriminant Analysis (LDA).
- Practice: A sequential Conic Programming problem.
Slide 4: Background: Linear Binary Classification
Given two classes of data sampled from x and y, we are trying to find a linear decision plane $w^T z + b = 0$ that correctly discriminates x from y:
- if $w^T z + b < 0$, z is classified as y;
- if $w^T z + b > 0$, z is classified as x.
[Figure: the decision hyperplane $w^T z + b = 0$ separating the x class from the y class.]
Only partial information is available, so we need to choose a criterion to select among candidate hyperplanes.
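As a concrete illustration, a minimal sketch of this decision rule in Python (the names classify, w, b, and z are placeholders, not from the slides):

```python
import numpy as np

def classify(w: np.ndarray, b: float, z: np.ndarray) -> str:
    """Linear decision rule: the sign of w^T z + b picks the class."""
    return "x" if float(w @ z + b) > 0.0 else "y"
```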
Slide 5: Background: Support Vector Machine
Support Vector Machines (SVM): the optimal hyperplane is the one that maximizes the margin between the two classes of data.
The boundary of SVM is exclusively determined by a few critical points called support vectors; all other points are irrelevant to the decision plane. SVM thus discards global information.
[Figure: the SVM hyperplane $w^T z + b = 0$ and its margin between the x and y classes.]
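For reference, the standard hard-margin SVM problem this slide describes can be written as

\[
\min_{w,\,b} \ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad
w^T x_i + b \ \ge\ 1, \qquad -(w^T y_j + b) \ \ge\ 1 \quad \forall\, i, j,
\]

so that the margin $2/\|w\|$ between the two classes is maximized.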
Slide 6: Learning Locally and Globally
Along the dashed axis, the y data have a larger data trend than the x data. Therefore, a more reasonable hyperplane may lie closer to the x data, rather than locating itself in the middle of the two classes as in SVM.
[Figure: the SVM hyperplane $w^T z + b = 0$ versus a hyperplane shifted toward the x class.]
Slide 7: M4: Learning Locally and Globally
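The M4 optimization problem (reconstructed here from the companion paper; $\Sigma_x$ and $\Sigma_y$ denote the estimated class covariance matrices):

\[
\max_{\rho,\; w \neq 0,\; b} \ \rho
\quad \text{s.t.} \quad
\frac{w^T x_i + b}{\sqrt{w^T \Sigma_x w}} \ \ge\ \rho, \quad i = 1, \dots, N_x,
\qquad
\frac{-(w^T y_j + b)}{\sqrt{w^T \Sigma_y w}} \ \ge\ \rho, \quad j = 1, \dots, N_y.
\]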
Slide 8: M4: Geometric Interpretation
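Briefly, each M4 constraint can be read geometrically:

\[
\frac{w^T x_i + b}{\sqrt{w^T \Sigma_x w}} \ \ge\ \rho
\quad \Longleftrightarrow \quad
w^T x_i + b \ \ge\ \rho \,\sqrt{w^T \Sigma_x w},
\]

i.e., every point must lie at least $\rho$ away from the hyperplane, with distance measured in units of its own class's standard deviation along $w$. The margin is therefore maximized relative to each class's local data trend.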
Slide 9: M4 Solving Method
Divide and Conquer: if we fix $\rho$ to a specific value $\rho_n$, the problem reduces to checking whether this $\rho_n$ satisfies the constraints, i.e., whether some $(w, b)$ achieves margin $\rho_n$ for every training point. If yes, we increase $\rho_n$; otherwise, we decrease it.
Each such feasibility check is a Second Order Cone Programming Problem!!!
Slide 10: M4 Solving Method (Cont.)
Iterate the two Divide and Conquer steps: check the feasibility of the current $\rho_n$, then increase or decrease it accordingly. The result is a Sequential Second Order Cone Programming Problem!!! (A sketch of this loop follows below.)
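A minimal sketch of this sequential procedure in Python, assuming numpy arrays and the cvxpy package; the helper names (m4_feasible, solve_m4), the normalization that excludes the trivial $w = 0$, and the bisection bracket are illustrative choices, not the authors' Sedumi-based implementation:

```python
import cvxpy as cp

def m4_feasible(rho, X, Y, Sx_sqrt, Sy_sqrt):
    """One SOCP feasibility check: does some (w, b) attain margin rho?

    Each constraint  w^T x_i + b >= rho * ||Sx_sqrt @ w||  is a
    second-order cone constraint (Sx_sqrt is a square root of Sigma_x).
    """
    w = cp.Variable(X.shape[1])
    b = cp.Variable()
    cons = [cp.SOC((X[i] @ w + b) / rho, Sx_sqrt @ w) for i in range(len(X))]
    cons += [cp.SOC(-(Y[j] @ w + b) / rho, Sy_sqrt @ w) for j in range(len(Y))]
    cons += [cp.sum(w) == 1.0]  # illustrative normalization ruling out w = 0
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    return prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE)

def solve_m4(X, Y, Sx_sqrt, Sy_sqrt, lo=1e-6, hi=10.0, tol=1e-4):
    """Divide and Conquer: bisection on rho, one SOCP per iteration."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m4_feasible(mid, X, Y, Sx_sqrt, Sy_sqrt):
            lo = mid  # rho attainable: try a larger margin
        else:
            hi = mid  # infeasible: shrink the margin
    return lo
```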
Slide 11: M4 Solving Method (Cont.)
Slide 12: M4 Links with MPM
Relaxing the M4 constraints by averaging them over each class, so that only the class means and covariances remain, yields exactly the MPM optimization problem!!!
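For reference, the MPM problem of Lanckriet et al. in the same notation, where $\bar{x}$ and $\bar{y}$ are the class means:

\[
\max_{\rho,\; w \neq 0,\; b} \ \rho
\quad \text{s.t.} \quad
w^T \bar{x} + b \ \ge\ \rho \sqrt{w^T \Sigma_x w},
\qquad
-(w^T \bar{y} + b) \ \ge\ \rho \sqrt{w^T \Sigma_y w},
\]

which is equivalent to $\min_{w} \sqrt{w^T \Sigma_x w} + \sqrt{w^T \Sigma_y w}$ subject to $w^T(\bar{x} - \bar{y}) = 1$.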
Slide 13: M4 Links with MPM (Cont.)
- Remarks
  - The procedure is not reversible: MPM is a special case of M4.
  - MPM focuses on building the decision boundary GLOBALLY, i.e., it depends exclusively on the means and covariances.
  - However, the means and covariances may not be accurately estimated.
Slide 14: M4 Links with SVM
If one assumes $\Sigma_x = \Sigma_y = I$, the magnitude of $w$ can scale up without influencing the optimization, and the M4 problem reduces to Support Vector Machines!!!
SVM is the special case of M4.
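Spelled out with the Slide 7 formulation (a reconstruction): setting $\Sigma_x = \Sigma_y = I$ gives

\[
\max_{\rho,\; w \neq 0,\; b} \ \rho
\quad \text{s.t.} \quad
w^T x_i + b \ \ge\ \rho \|w\|,
\qquad
-(w^T y_j + b) \ \ge\ \rho \|w\|,
\]

and since scaling $(w, b)$ leaves the problem unchanged, one may fix $\rho \|w\| = 1$; maximizing $\rho = 1/\|w\|$ is then exactly $\min_{w,b} \|w\|$ subject to $w^T x_i + b \ge 1$ and $-(w^T y_j + b) \ge 1$, the SVM problem.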
Slide 15: M4 Links with SVM (Cont.)
The reduction "if one assumes $\Sigma_x = \Sigma_y = I$" rests on two implicit assumptions:
- Assumption 1: the two classes share the same covariance, $\Sigma_x = \Sigma_y$.
- Assumption 2: the common covariance is the identity, $\Sigma = I$.
These two assumptions of SVM are inappropriate.
Slide 16: M4 Links with LDA
If one assumes $\Sigma_x = \Sigma_y = (\Sigma_x + \Sigma_y)/2$ and performs a procedure similar to the MPM reduction, M4 becomes LDA.
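Concretely (a reconstruction under the stated assumption): averaging the constraints over each class as in the MPM link, with the pooled covariance $\Sigma_0 = (\Sigma_x + \Sigma_y)/2$, gives

\[
\max_{\rho,\; w \neq 0,\; b} \ \rho
\quad \text{s.t.} \quad
w^T \bar{x} + b \ \ge\ \rho \sqrt{w^T \Sigma_0 w},
\qquad
-(w^T \bar{y} + b) \ \ge\ \rho \sqrt{w^T \Sigma_0 w},
\]

whose optimal direction is the classical LDA solution $w \propto (\Sigma_x + \Sigma_y)^{-1}(\bar{x} - \bar{y})$.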
Slide 17: M4 Links with LDA (Cont.)
Assumption: $\Sigma_x = \Sigma_y = (\Sigma_x + \Sigma_y)/2$, i.e., the two classes are assumed to share a common covariance. This assumption is still inappropriate.
Slide 18: Nonseparable Case
Introducing slack variables
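A reconstruction of the slack formulation, consistent with the separable problem on Slide 7 ($C > 0$ is a trade-off parameter, with one slack variable per training point):

\[
\max_{\rho,\; w \neq 0,\; b,\; \xi \ge 0} \ \rho \;-\; C \sum_{k=1}^{N_x + N_y} \xi_k
\quad \text{s.t.} \quad
w^T x_i + b \ \ge\ \rho \sqrt{w^T \Sigma_x w} - \xi_i,
\qquad
-(w^T y_j + b) \ \ge\ \rho \sqrt{w^T \Sigma_y w} - \xi_{N_x + j}.
\]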
Slide 19: Nonlinear Classifier: Kernelization
- Map the data to a higher-dimensional feature space $R^f$:
  - $x_i \mapsto \varphi(x_i)$
  - $y_j \mapsto \varphi(y_j)$
- Construct the linear decision plane $f(w, b) = w^T \varphi(z) + b$ in the feature space $R^f$, with $w \in R^f$, $b \in R$.
- In $R^f$, we need to solve the M4 optimization problem.
- However, we do not want to solve this with an explicit form of $\varphi$. Instead, we want to solve it in a kernelized form via $K(z_1, z_2) = \varphi(z_1)^T \varphi(z_2)$.
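As a small illustration of the kernel trick, a sketch computing a Gaussian-kernel Gram matrix in Python (the function name and the kernel-width parameter are illustrative):

```python
import numpy as np

def gram_matrix(A, B, sigma=1.0):
    """Gaussian-kernel Gram matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2)),
    standing in for phi(A[i])^T phi(B[j]) without ever forming phi explicitly."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```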
Slide 20: Nonlinear Classifier: Kernelization
Slide 21: Nonlinear Classifier: Kernelization
Notation
Slide 22: Experimental Results
Toy example: two Gaussian classes with different data trends.
Slide 23: Experimental Results
- Data sets: UCI Machine Learning Repository
- Procedure: 10-fold cross-validation
- Solving packages: SVM: Libsvm 2.4; M4: Sedumi 1.05; MPM: MPM 1.0

In the linear cases, M4 outperforms SVM and MPM. In the Gaussian-kernel cases, M4 is slightly better than or comparable to SVM, for two reasons: (1) sparsity in the feature space results in inaccurate estimation of the covariance matrices; (2) kernelization may not preserve the topology of the original data, so maximizing the margin in the feature space does not necessarily maximize the margin in the original space.
Slide 24: Experimental Results
An example illustrating that maximizing the margin in the feature space does not necessarily maximize the margin in the original space.
Slide 25: Future Work
- Speeding up M4
  - M4 contains support vectors: can we exploit this sparsity as has been done in SVM?
  - Can we remove redundant points?
- How can we impose constraints on the kernelization to preserve the topology of the data?
- Generalization error bound?
  - Both SVM and MPM have error bounds.
- How can we extend M4 to multi-category classification?
Slide 26: Conclusion
- Proposed a new large margin classifier, M4, which learns the decision boundary both locally and globally.
- Built theoretical connections with other models: a unified model of SVM, MPM, and LDA.
- Developed a sequential Second Order Cone Programming algorithm for M4.
- Experimental results demonstrated the advantages of our new model.
Slide 27: Thanks!