Title: Learning Larger Margin Machine Locally and Globally
Slide 1: Learning Larger Margin Machine Locally and Globally
- Kaizhu Huang (kzhuang_at_cse.cuhk.edu.hk)
- Haiqin Yang, Irwin King, Michael R. Lyu
- Dept. of Computer Science and Engineering
- The Chinese University of Hong Kong
- July 5, 2004
Slide 2: Learning Larger Margin Machine Locally and Globally
- Contributions
- Background
  - Linear Binary Classification
- Motivation
- Maxi-Min Margin Machine (M4)
  - Model Definition
  - Geometrical Interpretation
  - Solving Methods
  - Connections with Other Models
  - Nonseparable Case
  - Kernelization
- Experimental Results
- Future Work
- Conclusion
Slide 3: Contributions
- Theory: A unified model of the Support Vector Machine (SVM), the Minimax Probability Machine (MPM), and Linear Discriminant Analysis (LDA).
- Practice: A sequential Conic Programming problem.
Slide 4: Background: Linear Binary Classification
Given two classes of data sampled from x and y, we are trying to find a linear decision plane $w^T z + b = 0$ that correctly discriminates x from y:
- if $w^T z + b < 0$, z is classified as y;
- if $w^T z + b > 0$, z is classified as x.
[Figure: the decision hyperplane $w^T z + b = 0$ separating the x class from the y class.]
Only partial information is available, so we need to choose a criterion to select among candidate hyperplanes.
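As a concrete illustration, a minimal sketch of this decision rule in Python (the names classify, w, b, and z are placeholders, not from the slides):

```python
import numpy as np

def classify(w: np.ndarray, b: float, z: np.ndarray) -> str:
    """Linear decision rule: the sign of w^T z + b picks the class."""
    return "x" if float(w @ z + b) > 0.0 else "y"
```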
Slide 5: Background: Support Vector Machine
Support Vector Machines (SVM): the optimal hyperplane is the one that maximizes the margin between the two classes of data.
The boundary of SVM is exclusively determined by a few critical points called support vectors; all other points are irrelevant to the decision plane. SVM thus discards global information.
[Figure: the SVM hyperplane $w^T z + b = 0$ and its margin between the x and y classes.]
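For reference, the standard hard-margin SVM problem this slide describes can be written as

\[
\min_{w,\,b} \ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad
w^T x_i + b \ \ge\ 1, \qquad -(w^T y_j + b) \ \ge\ 1 \quad \forall\, i, j,
\]

so that the margin $2/\|w\|$ between the two classes is maximized.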
Slide 6: Learning Locally and Globally
Along the dashed axis, the y data have a larger data trend than the x data. Therefore, a more reasonable hyperplane may lie closer to the x data, rather than locating itself in the middle of the two classes as in SVM.
[Figure: the SVM hyperplane $w^T z + b = 0$ versus a hyperplane shifted toward the x class.]
Slide 7: M4: Learning Locally and Globally
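The M4 optimization problem (reconstructed here from the companion paper; $\Sigma_x$ and $\Sigma_y$ denote the estimated class covariance matrices):

\[
\max_{\rho,\; w \neq 0,\; b} \ \rho
\quad \text{s.t.} \quad
\frac{w^T x_i + b}{\sqrt{w^T \Sigma_x w}} \ \ge\ \rho, \quad i = 1, \dots, N_x,
\qquad
\frac{-(w^T y_j + b)}{\sqrt{w^T \Sigma_y w}} \ \ge\ \rho, \quad j = 1, \dots, N_y.
\]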
Slide 8: M4: Geometric Interpretation
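Briefly, each M4 constraint can be read geometrically:

\[
\frac{w^T x_i + b}{\sqrt{w^T \Sigma_x w}} \ \ge\ \rho
\quad \Longleftrightarrow \quad
w^T x_i + b \ \ge\ \rho \,\sqrt{w^T \Sigma_x w},
\]

i.e., every point must lie at least $\rho$ away from the hyperplane, with distance measured in units of its own class's standard deviation along $w$. The margin is therefore maximized relative to each class's local data trend.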
Slide 9: M4 Solving Method
Divide and Conquer: if we fix $\rho$ to a specific value $\rho_n$, the problem reduces to checking whether this $\rho_n$ satisfies the constraints, i.e., whether some $(w, b)$ achieves margin $\rho_n$ for every training point. If yes, we increase $\rho_n$; otherwise, we decrease it.
Each such feasibility check is a Second Order Cone Programming Problem!!!
Slide 10: M4 Solving Method (Cont.)
Iterate the two Divide and Conquer steps: check the feasibility of the current $\rho_n$, then increase or decrease it accordingly. The result is a Sequential Second Order Cone Programming Problem!!! (A sketch of this loop follows below.)
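A minimal sketch of this sequential procedure in Python, assuming numpy arrays and the cvxpy package; the helper names (m4_feasible, solve_m4), the normalization that excludes the trivial $w = 0$, and the bisection bracket are illustrative choices, not the authors' Sedumi-based implementation:

```python
import cvxpy as cp

def m4_feasible(rho, X, Y, Sx_sqrt, Sy_sqrt):
    """One SOCP feasibility check: does some (w, b) attain margin rho?

    Each constraint  w^T x_i + b >= rho * ||Sx_sqrt @ w||  is a
    second-order cone constraint (Sx_sqrt is a square root of Sigma_x).
    """
    w = cp.Variable(X.shape[1])
    b = cp.Variable()
    cons = [cp.SOC((X[i] @ w + b) / rho, Sx_sqrt @ w) for i in range(len(X))]
    cons += [cp.SOC(-(Y[j] @ w + b) / rho, Sy_sqrt @ w) for j in range(len(Y))]
    cons += [cp.sum(w) == 1.0]  # illustrative normalization ruling out w = 0
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    return prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE)

def solve_m4(X, Y, Sx_sqrt, Sy_sqrt, lo=1e-6, hi=10.0, tol=1e-4):
    """Divide and Conquer: bisection on rho, one SOCP per iteration."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m4_feasible(mid, X, Y, Sx_sqrt, Sy_sqrt):
            lo = mid  # rho attainable: try a larger margin
        else:
            hi = mid  # infeasible: shrink the margin
    return lo
```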
Slide 11: M4 Solving Method (Cont.)
Slide 12: M4 Links with MPM
Relaxing the M4 constraints by averaging them over each class, so that only the class means and covariances remain, yields exactly the MPM optimization problem!!!
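For reference, the MPM problem of Lanckriet et al. in the same notation, where $\bar{x}$ and $\bar{y}$ are the class means:

\[
\max_{\rho,\; w \neq 0,\; b} \ \rho
\quad \text{s.t.} \quad
w^T \bar{x} + b \ \ge\ \rho \sqrt{w^T \Sigma_x w},
\qquad
-(w^T \bar{y} + b) \ \ge\ \rho \sqrt{w^T \Sigma_y w},
\]

which is equivalent to $\min_{w} \sqrt{w^T \Sigma_x w} + \sqrt{w^T \Sigma_y w}$ subject to $w^T(\bar{x} - \bar{y}) = 1$.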
Slide 13: M4 Links with MPM (Cont.)
- Remarks
  - The procedure is not reversible: MPM is a special case of M4.
  - MPM focuses on building the decision boundary GLOBALLY, i.e., it depends exclusively on the means and covariances.
  - However, the means and covariances may not be accurately estimated.
Slide 14: M4 Links with SVM
If one assumes $\Sigma_x = \Sigma_y = I$, the magnitude of $w$ can scale up without influencing the optimization, and the M4 problem reduces to Support Vector Machines!!!
SVM is the special case of M4.
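Spelled out with the Slide 7 formulation (a reconstruction): setting $\Sigma_x = \Sigma_y = I$ gives

\[
\max_{\rho,\; w \neq 0,\; b} \ \rho
\quad \text{s.t.} \quad
w^T x_i + b \ \ge\ \rho \|w\|,
\qquad
-(w^T y_j + b) \ \ge\ \rho \|w\|,
\]

and since scaling $(w, b)$ leaves the problem unchanged, one may fix $\rho \|w\| = 1$; maximizing $\rho = 1/\|w\|$ is then exactly $\min_{w,b} \|w\|$ subject to $w^T x_i + b \ge 1$ and $-(w^T y_j + b) \ge 1$, the SVM problem.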
Slide 15: M4 Links with SVM (Cont.)
The reduction "if one assumes $\Sigma_x = \Sigma_y = I$" rests on two implicit assumptions:
- Assumption 1: the two classes share the same covariance, $\Sigma_x = \Sigma_y$.
- Assumption 2: the common covariance is the identity, $\Sigma = I$.
These two assumptions of SVM are inappropriate.
Slide 16: M4 Links with LDA
If one assumes $\Sigma_x = \Sigma_y = (\Sigma_x + \Sigma_y)/2$ and performs a procedure similar to the MPM reduction, M4 becomes LDA.
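Concretely (a reconstruction under the stated assumption): averaging the constraints over each class as in the MPM link, with the pooled covariance $\Sigma_0 = (\Sigma_x + \Sigma_y)/2$, gives

\[
\max_{\rho,\; w \neq 0,\; b} \ \rho
\quad \text{s.t.} \quad
w^T \bar{x} + b \ \ge\ \rho \sqrt{w^T \Sigma_0 w},
\qquad
-(w^T \bar{y} + b) \ \ge\ \rho \sqrt{w^T \Sigma_0 w},
\]

whose optimal direction is the classical LDA solution $w \propto (\Sigma_x + \Sigma_y)^{-1}(\bar{x} - \bar{y})$.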
Slide 17: M4 Links with LDA (Cont.)
Assumption: $\Sigma_x = \Sigma_y = (\Sigma_x + \Sigma_y)/2$, i.e., the two classes are assumed to share a common covariance. This assumption is still inappropriate.
Slide 18: Nonseparable Case
Introducing slack variables
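A reconstruction of the slack formulation, consistent with the separable problem on Slide 7 ($C > 0$ is a trade-off parameter, with one slack variable per training point):

\[
\max_{\rho,\; w \neq 0,\; b,\; \xi \ge 0} \ \rho \;-\; C \sum_{k=1}^{N_x + N_y} \xi_k
\quad \text{s.t.} \quad
w^T x_i + b \ \ge\ \rho \sqrt{w^T \Sigma_x w} - \xi_i,
\qquad
-(w^T y_j + b) \ \ge\ \rho \sqrt{w^T \Sigma_y w} - \xi_{N_x + j}.
\]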
Slide 19: Nonlinear Classifier: Kernelization
- Map the data to a higher-dimensional feature space $R^f$:
  - $x_i \mapsto \varphi(x_i)$
  - $y_j \mapsto \varphi(y_j)$
- Construct the linear decision plane $f(w, b) = w^T \varphi(z) + b$ in the feature space $R^f$, with $w \in R^f$, $b \in R$.
- In $R^f$, we need to solve the M4 optimization problem.
- However, we do not want to solve this with an explicit form of $\varphi$. Instead, we want to solve it in a kernelized form via $K(z_1, z_2) = \varphi(z_1)^T \varphi(z_2)$.
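As a small illustration of the kernel trick, a sketch computing a Gaussian-kernel Gram matrix in Python (the function name and the kernel-width parameter are illustrative):

```python
import numpy as np

def gram_matrix(A, B, sigma=1.0):
    """Gaussian-kernel Gram matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2)),
    standing in for phi(A[i])^T phi(B[j]) without ever forming phi explicitly."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```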
Slide 20: Nonlinear Classifier: Kernelization
Slide 21: Nonlinear Classifier: Kernelization
Notation
Slide 22: Experimental Results
Toy example: two Gaussian classes with different data trends.
Slide 23: Experimental Results
- Data sets: UCI Machine Learning Repository
- Procedure: 10-fold cross-validation
- Solving packages: SVM: Libsvm 2.4; M4: Sedumi 1.05; MPM: MPM 1.0

In the linear cases, M4 outperforms SVM and MPM. In the Gaussian-kernel cases, M4 is slightly better than or comparable to SVM, for two reasons: (1) sparsity in the feature space results in inaccurate estimation of the covariance matrices; (2) kernelization may not preserve the topology of the original data, so maximizing the margin in the feature space does not necessarily maximize the margin in the original space.
Slide 24: Experimental Results
An example illustrating that maximizing the margin in the feature space does not necessarily maximize the margin in the original space.
Slide 25: Future Work
- Speeding up M4
  - M4 contains support vectors: can we exploit this sparsity as has been done in SVM?
  - Can we remove redundant points?
- How can we impose constraints on the kernelization to preserve the topology of the data?
- Generalization error bound?
  - Both SVM and MPM have error bounds.
- How can we extend M4 to multi-category classification?
Slide 26: Conclusion
- Proposed a new large margin classifier, M4, which learns the decision boundary both locally and globally.
- Built theoretical connections with other models: a unified model of SVM, MPM, and LDA.
- Developed a sequential Second Order Cone Programming algorithm for M4.
- Experimental results demonstrated the advantages of our new model.
Slide 27: Thanks!