Title: Online Learning of Maximum Margin Classifiers
1. Online Learning of p-Norm Maximum Margin Classifiers with Bias
- Kohei HATANO
- Kyushu University
- (Joint work with K. Ishibashi and M. Takeda)
- COLT 2008
2. Plan of this talk
- Introduction
- Preliminaries
  - ROMMA
- Our result
  - Our new algorithm PUMMA
  - Our implicit reduction
- Experiments
3. Maximum Margin Classification
- SVMs [Boser et al. 92]: 2-norm margin
- Boosting [Freund & Schapire 97]: ∞-norm margin (approximately)
- Why maximum (or large) margin?
  - Good generalization [Schapire et al. 98; Shawe-Taylor et al. 98]
- Formulated as convex optimization problems (QP, LP).
[Figure: a maximum margin separating hyperplane.]
4. Scaling up Max. Margin Classification
- Decomposition methods (for SVMs)
  - Break the original QP into smaller QPs.
  - SMO [Platt 99], SVMlight [Joachims 99], LIBSVM [Chang & Lin 01]
  - State-of-the-art implementations.
- Online learning (our approach)
5. Online Learning
- Online learning algorithm:
  - For t = 1 to T:
    - Receive an instance x_t ∈ R^n.
    - Guess a label ŷ_t = sign(w_t·x_t + b_t).
    - Receive the label y_t ∈ {−1, +1}.
    - Update (w_{t+1}, b_{t+1}) = UPDATE_RULE(w_t, b_t, x_t, y_t).
  - end
- Advantages of online learning:
  - Simple & easy to implement.
  - Uses less memory.
  - Adaptive to changing concepts.
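The protocol above can be sketched in a few lines of Python. The slide leaves UPDATE_RULE abstract, so the classic perceptron rule below is only a hypothetical stand-in, and the separable data stream is made up for illustration:

```python
# Sketch of the generic online learning protocol from the slide.
import numpy as np

def online_learn(stream, n, update_rule):
    """Generic online protocol: predict, receive the true label, update."""
    w, b = np.zeros(n), 0.0                  # (w_t, b_t)
    mistakes = 0
    for x, y in stream:                      # receive an instance x_t in R^n
        y_hat = 1 if w @ x + b >= 0 else -1  # guess sign(w_t . x_t + b_t)
        if y_hat != y:                       # receive the label y_t in {-1,+1}
            w, b = update_rule(w, b, x, y)   # (w_{t+1}, b_{t+1}) = UPDATE_RULE(...)
            mistakes += 1
    return w, b, mistakes

def perceptron_update(w, b, x, y):
    # Classic perceptron rule, one possible UPDATE_RULE: add y_t * x_t.
    return w + y * x, b + y

# Tiny separable stream where the bias matters: label = sign(x[0] - 2),
# with points too close to the boundary filtered out to guarantee a margin.
rng = np.random.default_rng(0)
xs = rng.uniform(0, 4, size=(300, 2))
data = [(x, 1 if x[0] > 2 else -1) for x in xs if abs(x[0] - 2) > 0.2]
w, b, m = online_learn(data * 20, 2, perceptron_update)  # several passes
accuracy = np.mean([(1 if w @ x + b >= 0 else -1) == y for x, y in data])
```

The loop touches one example at a time and keeps only (w, b), which is where the "uses less memory" advantage comes from.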
6. Online Learning Algorithms for maximum margin classification
- Max Margin Perceptron [Kowalczyk 00]
- ROMMA [Li & Long 02]
- ALMA [Gentile 01]
- LASVM [Bordes et al. 05]
- MICRA [Tsampouka & Shawe-Taylor 07]
- Pegasos [Shalev-Shwartz et al. 07]
- Etc.
[Figure: a hyperplane with bias vs. a hyperplane without bias (through the origin).]
7. Typical Reduction to deal with bias [cf. Cristianini & Shawe-Taylor 00]
- Add an extra dimension corresponding to the bias: an instance x in the original space becomes (x, R) in the augmented space, and a hyperplane (w, b) with bias corresponds to the hyperplane (w, b/R) without bias.
- NOTE: the margin (over normalized instances) can shrink under this reduction.
[Figure: instances and hyperplanes in the original vs. augmented space.]
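A minimal sketch of this reduction; the constant R = 1 for the extra coordinate is a common choice, not necessarily the one in the slide's figure:

```python
# Sketch of the standard bias reduction: append a constant coordinate R to
# every instance so a homogeneous hyperplane in R^{n+1} encodes the bias.
import numpy as np

R = 1.0  # constant used for the extra coordinate (an assumed choice)

def augment(x):
    """Map x in R^n to (x, R) in R^{n+1}."""
    return np.append(x, R)

# A hyperplane with bias in the original space ...
w, b = np.array([2.0, -1.0]), 0.5
# ... corresponds to the homogeneous hyperplane (w, b/R) in augmented space.
w_aug = np.append(w, b / R)

x = np.array([0.3, 1.2])
assert np.isclose(w @ x + b, w_aug @ augment(x))  # same decision value
```

The decision values agree exactly, but the norms of the augmented instances and weights change, which is why the margin over normalized instances can shrink.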
8. Our New Online Learning Algorithm
- PUMMA (P-norm Utilizing Maximum Margin Algorithm)
- PUMMA can learn maximum margin classifiers with bias directly (without using the typical reduction!).
- Margin is defined as p-norm (p ≥ 2).
  - For p = 2, similar to Perceptron.
  - For p = O(ln n) [Gentile 03], similar to Winnow [Littlestone 89].
    - Fast when the target is sparse.
- Extended to the linearly inseparable case (omitted): soft margin with 2-norm slack variables.
9. Problem of finding the p-norm maximum margin hyperplane [cf. Mangasarian 99]
- Given a (linearly separable) sample S = ((x_1, y_1), ..., (x_T, y_T)):
- Goal: find an approximate solution (w, b) of
    min ‖w‖_q²  subject to  y_t(w·x_t + b) ≥ 1  (t = 1, ..., T),
  where q is the dual norm: 1/p + 1/q = 1 (e.g. p = 2 ⇒ q = 2; p = ∞ ⇒ q = 1).
- We want an online algorithm solving the problem with a small # of updates.
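The dual-norm pairing above can be checked numerically; the q-norm of the weights pairs with the p-norm of the instances because of Hoelder's inequality. The particular values of p below are illustrative:

```python
# Quick numeric check of the dual-norm pairing 1/p + 1/q = 1.
import numpy as np

def dual(p):
    """Dual exponent q with 1/p + 1/q = 1."""
    if p == np.inf:
        return 1.0
    if p == 1:
        return np.inf
    return p / (p - 1)

assert dual(2) == 2 and dual(np.inf) == 1  # the two cases named on the slide

# Hoelder's inequality: |w . x| <= ||w||_q * ||x||_p for every dual pair.
rng = np.random.default_rng(1)
w, x = rng.normal(size=5), rng.normal(size=5)
for p in (2.0, 4.0, np.inf):
    q = dual(p)
    assert abs(w @ x) <= np.linalg.norm(w, q) * np.linalg.norm(x, p) + 1e-12
```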
10. ROMMA (Relaxed Online Maximum Margin Algorithm) [Li & Long 02]
- Given S = ((x_1, y_1), ..., (x_{t−1}, y_{t−1})) and x_t:
- Predict ŷ_t = sign(w_t·x_t), and receive y_t.
- If y_t(w_t·x_t) < 1−δ (margin is insufficient), update:
    w_{t+1} = argmin ‖w‖₂²  subject to
      y_t(w·x_t) ≥ 1  (constraint over the last example which caused an update), and
      w·w_t ≥ ‖w_t‖₂²  (constraint over the last hyperplane).
- Otherwise, w_{t+1} = w_t.
- NOTE: the bias is fixed to 0.
11. ROMMA [Li & Long 02]
[Figure: in weight space, ROMMA's successive hyperplanes w_0, w_1, w_2, w_3, w_4 approach the feasible region of the SVM.]
12. Solution of ROMMA
- The solution of ROMMA is an additive update: w_{t+1} is a linear combination of w_t and x_t.
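Assuming both constraints are tight at the optimum, the minimizer can be written in closed form by solving the 2×2 linear system the two active constraints induce on the coefficients; the sketch below (standard Lagrangian algebra, not copied from the slides) makes the "additive" claim concrete:

```python
# Closed-form sketch of ROMMA's update: the min-norm w satisfying
# y*(w . x) = 1 and w . w_t = ||w_t||^2 is a combination c*w_t + d*x_t.
import numpy as np

def romma_update(w, x, y):
    A, B, a = w @ w, x @ x, w @ x   # ||w_t||^2, ||x_t||^2, w_t . x_t
    D = A * B - a ** 2              # > 0 unless x_t and w_t are parallel
    c = (A * B - y * a) / D
    d = A * (y - a) / D
    return c * w + d * x

rng = np.random.default_rng(2)
w_t, x_t, y_t = rng.normal(size=4), rng.normal(size=4), 1.0
w_next = romma_update(w_t, x_t, y_t)

assert np.isclose(y_t * (w_next @ x_t), 1.0)   # example constraint is tight
assert np.isclose(w_next @ w_t, w_t @ w_t)     # old-hyperplane constraint is tight
```

On rounds where the old-hyperplane constraint is not active the actual update differs, but the solution stays in span{w_t, x_t}, i.e. additive.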
13. PUMMA
- Given S = ((x_1, y_1), ..., (x_{t−1}, y_{t−1})) and x_t:
- Predict ŷ_t = sign(w_t·x_t + b_t), and receive y_t.
- If y_t(w_t·x_t + b_t) < 1−δ (margin is insufficient), update; otherwise, (w_{t+1}, b_{t+1}) = (w_t, b_t).
- The update uses the q-norm (1/p + 1/q = 1) and the link function of Grove et al. 97.
- x_t^pos, x_t^neg: the last positive and negative examples which incurred updates.
- The bias is optimized (not fixed to 0).
14. Solution of PUMMA
- The solution of PUMMA is found numerically.
- x_t^pos, x_t^neg: the last positive and negative examples which incurred updates.
- Observation: for p = 2, the solution is the same as that of ROMMA run on z_t = x_t^pos − x_t^neg.
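The link function mentioned above can be sketched as follows. The identities checked at the end, norm preservation and invertibility under swapping p and q, are standard properties of this map; the test vector and the dual pair (p, q) = (3, 1.5) are arbitrary choices:

```python
# Numeric sketch of the p-norm link function of Grove et al. 97:
#   f(w)_i = sign(w_i) * |w_i|^{q-1} / ||w||_q^{q-2},
# which maps the q-norm weight space to its dual p-norm space (1/p + 1/q = 1).
import numpy as np

def link(w, q):
    return np.sign(w) * np.abs(w) ** (q - 1) / np.linalg.norm(w, q) ** (q - 2)

rng = np.random.default_rng(3)
w = rng.normal(size=6)
p, q = 3.0, 1.5            # a dual pair: 1/3 + 1/1.5 = 1
v = link(w, q)

assert np.isclose(np.linalg.norm(v, p), np.linalg.norm(w, q))  # norm preserved
assert np.allclose(link(v, p), w)                              # inverse with p, q swapped
```

For q = 2 the link function is the identity, which is consistent with the observation that PUMMA with p = 2 reduces to ROMMA on the differences z_t.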
15. Our (implicit) reduction, which preserves the margin
[Figure: a hyperplane with bias over the original instances corresponds to a hyperplane without bias over pairs of positive and negative instances, with the same margin.]
16. Main Result
- Thm: Suppose that, given S = ((x_1, y_1), ..., (x_T, y_T)), there exists a linear classifier (u, b) s.t. y_t(u·x_t + b) ≥ 1 for t = 1, ..., T. Then
    (# of updates of PUMMA_p(δ)) ≤ (p−1)‖u‖_q² R² / δ².
- After (p−1)‖u‖_q² R² / δ² updates, PUMMA_p(δ) outputs a hypothesis with p-norm margin ≥ (1−δ)γ (γ: margin of (u, b)).
- The bound is similar to those of previous algorithms.
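Plugging illustrative numbers into the bound shows the trade-off behind the p = 2 ln n choice for sparse targets; the specific n, target u, and δ below are hypothetical, with instances assumed to lie in {−1, +1}^n as in the artificial-data experiment:

```python
# Evaluating the update bound (p-1) * ||u||_q^2 * R^2 / delta^2 for two
# choices of p, where R is the largest p-norm of an instance in {-1,+1}^n.
import numpy as np

def update_bound(p, u, R, delta):
    q = p / (p - 1)   # dual exponent: 1/p + 1/q = 1
    return (p - 1) * np.linalg.norm(u, q) ** 2 * R ** 2 / delta ** 2

n = 100
u = np.zeros(n); u[0] = 1.0          # a 1-sparse target (hypothetical)
R = lambda p: n ** (1.0 / p)         # p-norm of a {-1,+1}^n instance

b_2   = update_bound(2.0, u, R(2.0), 0.1)
b_log = update_bound(2 * np.log(n), u, R(2 * np.log(n)), 0.1)
assert b_log < b_2   # the sparse target favors p = 2 ln n (Winnow-like)
```

For p = 2 the bound is dominated by R² = n, while for p = 2 ln n the instance norm R² stays O(1) and only the (p−1) factor grows, which is why the p = O(ln n) setting is fast for sparse targets.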
17. Experiment over artificial data
- Each example (x, y):
  - x: an n(=100)-dimensional {−1, +1}-valued vector
  - y = f(x), where f is a fixed linear threshold function with bias b
- Generate 1000 examples randomly.
- 3 datasets: b = 1 (small), 9 (medium), 15 (large).
- Compare with ROMMA (p = 2) and ALMA (p = 2 ln n).
18. Results over Artificial Data
[Figure: margin vs. # of updates; left panel (p = 2): PUMMA vs. ROMMA; right panel (p = 2 ln n): PUMMA vs. ALMA.]
- NOTE 1: the margin is defined over the original space (w/o the reduction).
- NOTE 2: we omit the results for b = 9 for clarity.
19. Computation Time
[Figure: computation time (sec.) vs. bias (from large to small); left panel (p = 2): PUMMA vs. ROMMA; right panel (p = 2 ln n): PUMMA vs. ALMA.]
- For p = 2, PUMMA is faster than ROMMA. For p = 2 ln n, PUMMA is faster than ALMA even though PUMMA uses Newton's method.
20. Results over UCI Adult data

  dataset: adult (# of data: 32561)

  algorithm     sec.     margin rate (%)
  SVMlight      5893     100
  ROMMA (99%)   71296    99.03
  PUMMA (99%)   44480    99.14

- Fix p = 2.
- 2-norm soft margin formulation for linearly inseparable data.
- Run ROMMA and PUMMA until they achieve 99% of the maximum margin.
21. Results over MNIST data

  dataset: MNIST (# of data: —)

  algorithm     sec.      margin rate (%)
  SVMlight      401.36    100
  ROMMA (99%)   1715.57   93.5
  PUMMA (99%)   1971.30   99.2

- Fix p = 2.
- Use polynomial kernels.
- 2-norm soft margin formulation for linearly inseparable data.
- Run ROMMA and PUMMA until they achieve 99% of the maximum margin.
22. Summary
- PUMMA can learn p-norm maximum margin classifiers with bias directly.
- The # of updates is similar to those of previous algorithms.
- PUMMA achieves (1−δ) times the maximum p-norm margin.
- PUMMA outperforms other online algorithms when the underlying hyperplane has a large bias.
23. Future work
- Maximizing the ∞-norm margin directly.
- Tighter bounds on the # of updates:
  - In our experiments, PUMMA is faster especially when the bias is large (like Winnow).
  - Our current bound does not reflect this fact.