Online Learning of Maximum Margin Classifiers - PowerPoint PPT Presentation

1
Online Learning of Maximum p-Norm Margin
Classifiers with Bias
  • Kohei HATANO
  • Kyushu University
  • (Joint work with K. Ishibashi and M. Takeda)

COLT 2008
2
Plan of this talk
  • Introduction
  • Preliminaries
  • ROMMA
  • Our result (PUMMA)
  • Our new algorithm
  • Our implicit reduction
  • Experiments

3
Maximum Margin Classification
  • SVMs [Boser et al. '92]
  • 2-norm margin
  • Boosting [Freund & Schapire '97]
  • ∞-norm margin (approximately)
  • Why maximum (or large) margin?
  • Good generalization
  • [Schapire et al. '98]
  • [Shawe-Taylor et al. '98]
  • Formulated as convex optimization problems (QP, LP)

4
Scaling up Max. Margin Classification
  • Decomposition methods (for SVMs)
  • Break the original QP into smaller QPs
  • SMO [Platt '99], SVMlight [Joachims '99],
  • LIBSVM [Chang & Lin '01]
  • state-of-the-art implementations
  • Online learning (our approach)

5
Online Learning
  • Online Learning Algorithm
  • For $t = 1$ to $T$
  • Receive an instance $x_t \in \mathbb{R}^n$
  • Guess a label $\hat{y}_t = \mathrm{sign}(w_t \cdot x_t + b_t)$
  • Receive the label $y_t \in \{-1, +1\}$
  • Update $(w_{t+1}, b_{t+1}) = \mathrm{UPDATE\_RULE}(w_t, b_t, x_t, y_t)$
  • end

Figure: an instance $x_t$ to be labeled $-1$ or $+1$.
  • Advantages of online learning
  • Simple and easy to implement
  • Uses less memory
  • Adaptive to changing concepts
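
The protocol above can be sketched in a few lines of Python. This is only an illustration: the classic Perceptron rule stands in for UPDATE_RULE (it is not PUMMA), and the names `run_online`, `perceptron_update`, `dot` and `sign`, as well as the tie-breaking at a score of 0, are assumptions made here.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sign(z):
    # Ties (z == 0) go to +1; the slide does not specify a tie rule.
    return 1 if z >= 0 else -1


def perceptron_update(w, b, x, y):
    """Stand-in UPDATE_RULE: the classic Perceptron step (not PUMMA)."""
    if y * (dot(w, x) + b) <= 0:
        w = [wi + y * xi for wi, xi in zip(w, x)]
        b = b + y
    return w, b


def run_online(stream, n, update_rule=perceptron_update):
    """Run the online protocol over (x, y) pairs; return (w, b, mistakes)."""
    w, b = [0.0] * n, 0.0
    mistakes = 0
    for x, y in stream:                  # For t = 1 to T: receive x_t
        y_hat = sign(dot(w, x) + b)      # guess a label
        if y_hat != y:                   # receive the true label y_t
            mistakes += 1
        w, b = update_rule(w, b, x, y)   # update (w, b)
    return w, b, mistakes
```

Only the current hypothesis `(w, b)` is kept in memory, which is the "uses less memory" advantage listed above; any other update rule with the same signature can be passed as `update_rule`.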

6
Online Learning Algorithms for maximum margin
classification
  • Max Margin Perceptron [Kowalczyk '00]
  • ROMMA [Li & Long '02]
  • ALMA [Gentile '01]
  • LASVM [Bordes et al. '05]
  • MICRA [Tsampouka & Shawe-Taylor '07]
  • Pegasos [Shalev-Shwartz et al. '07]
  • Etc.

Figure: a hyperplane with bias and a hyperplane w/o bias, which passes through the origin 0.
7
Typical reduction to deal with bias (cf.
[Cristianini & Shawe-Taylor '00])
  • Add an extra dimension corresponding to the bias.

Figure: the instance and the hyperplane in the original space and in the augmented space, with a NOTE on the margin (over normalized instances).
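
A minimal sketch of this kind of reduction follows. The exact mapping used on the slide did not survive in the transcript, so the constant `c` appended to each instance and the helper names `augment` and `split_weights` are assumptions made here for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def augment(x, c=1.0):
    """Map x in R^n to (x, c) in R^(n+1); c is an assumed constant."""
    return list(x) + [c]


def split_weights(w_aug, c=1.0):
    """Recover (w, b) from an augmented weight vector.

    Since w_aug . (x, c) = w . x + c * w_aug[-1], the bias is c * w_aug[-1].
    """
    return w_aug[:-1], c * w_aug[-1]
```

A bias-free learner run on `augment(x)` then predicts exactly like a learner with bias on `x`. The norm of the augmented weight vector includes the bias coordinate, however, so a margin measured in the augmented space is generally not the margin in the original space.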
8
Our New Online Learning Algorithm
  • PUMMA (P-norm Utilizing Maximum Margin Algorithm)
  • PUMMA can learn maximum margin classifiers
    with bias directly (without using the typical
    reduction!).
  • Margin is defined in the p-norm ($p \ge 2$)
  • For $p = 2$, similar to Perceptron.
  • For $p = O(\ln n)$ [Gentile '03], similar to Winnow
    [Littlestone '89].
  • Fast when the target is sparse.
  • Extended to the linearly inseparable case (omitted).
  • Soft margin with 2-norm slack variables.

9
Problem of finding the p-norm maximum margin
hyperplane (cf. [Mangasarian '99])
Given (linearly separable) $S = ((x_1, y_1), \dots, (x_T, y_T))$,
Goal: find an approximate solution $(w, b)$.
q-norm (dual norm): $1/p + 1/q = 1$. E.g., $p = 2$, $q = 2$; $p = \infty$, $q = 1$.
We want an online algorithm that solves the problem with a
small # of updates.
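
The optimization problem itself did not survive in the transcript. A standard hard-margin form that matches the definitions on this slide is given below as a reconstruction, not a quotation; the starred symbols are notation introduced here.

```latex
(w^*, b^*) \;=\; \arg\min_{w,\,b} \; \frac{1}{2}\,\|w\|_q^2
\quad \text{subject to} \quad
y_t\,(w \cdot x_t + b) \ge 1 \quad (t = 1, \dots, T),
\qquad \frac{1}{p} + \frac{1}{q} = 1 .
```

Under this form, the p-norm margin of the optimal hyperplane is $1/\|w^*\|_q$, so minimizing the q-norm of $w$ maximizes the margin.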
10
ROMMA (Relaxed Online Maximum Margin
Algorithm) [Li & Long '02]
  • Given $S = ((x_1, y_1), \dots, (x_{t-1}, y_{t-1}))$ and $x_t$,
  • Predict $\hat{y}_t = \mathrm{sign}(w_t \cdot x_t)$, and receive $y_t$
  • If $y_t (w_t \cdot x_t) < 1 - \delta$ (margin is insufficient),
  • update $w_{t+1}$ subject to two constraints:
    a constraint over the last example which causes an update, and
    a constraint over the last hyperplane
  • Otherwise, $w_{t+1} = w_t$

NOTE: the bias is fixed at 0.
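
The update equation is missing from the transcript. As a hedged reconstruction, the ROMMA update as usually stated in Li & Long's formulation has the two constraints named on this slide:

```latex
w_{t+1} \;=\; \arg\min_{w} \; \frac{1}{2}\,\|w\|_2^2
\quad \text{subject to} \quad
\underbrace{y_t\,(w \cdot x_t) \ge 1}_{\text{last example causing an update}},
\qquad
\underbrace{w \cdot w_t \ge \|w_t\|_2^2}_{\text{last hyperplane}} .
```

The second constraint relaxes all earlier constraints into a single half-space, which is what makes each step cheap compared with solving the full SVM problem.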
11
ROMMA [Li & Long '02]
Figure: weight space showing the feasible region of the SVM, with points labeled 0 to 4.
12
Solution of ROMMA
Solution of ROMMA is an additive update
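
The additive form can be sketched as follows. The coefficients are derived here by assuming that both ROMMA constraints hold with equality (or, in the special case, that only the example constraint is active). The slide's own formula is lost, so this is a reconstruction, and the names `romma_update`, `A`, `B`, `m`, `c` and `d` are ours.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def romma_update(w, x, y):
    """One ROMMA-style update (2-norm, bias fixed at 0); x must be nonzero.

    Returns w' = c * w + d * x, the minimum-norm vector satisfying
    y * (w' . x) >= 1 and w' . w >= ||w||^2.
    """
    A, B, m = dot(w, w), dot(x, x), dot(w, x)
    if y * m >= A * B:
        # Only the example constraint is active: w' = y * x / ||x||^2.
        # This branch also covers the initial hypothesis w = 0.
        return [y * xi / B for xi in x]
    denom = A * B - m * m
    if denom == 0.0:
        # w is anti-parallel to y * x: the two constraints conflict.
        raise ValueError("degenerate update: constraints are infeasible")
    c = (A * B - y * m) / denom
    d = A * (y - m) / denom
    return [c * wi + d * xi for wi, xi in zip(w, x)]
```

For example, starting from `w = [0.0, 0.0]`, an update on `x = [2.0, 0.0]` with `y = 1` gives `[0.5, 0.0]`, and a further update on `x = [0.0, 1.0]` with `y = 1` gives `[0.5, 1.0]`, which meets both constraints with equality.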

13
PUMMA
  • Given $S = ((x_1, y_1), \dots, (x_{t-1}, y_{t-1}))$ and $x_t$,
  • Predict $\hat{y}_t = \mathrm{sign}(w_t \cdot x_t + b_t)$, and receive $y_t$
  • If $y_t (w_t \cdot x_t + b_t) < 1 - \delta$, update
  • Otherwise, $w_{t+1} = w_t$

q-norm ($1/p + 1/q = 1$)
$x^{pos}_t$, $x^{neg}_t$: last positive and negative
examples which incur updates
bias is optimized
link function [Grove et al. '97]
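
The slide names a link function but its formula is not in the transcript. A common choice in the p-norm algorithm literature is the gradient of $\frac{1}{2}\|w\|_q^2$. The sketch below assumes that choice, which may differ from the authors' definition; the names `q_norm` and `link` are ours.

```python
import math


def q_norm(w, q):
    """The q-norm of w, for finite q >= 1."""
    return sum(abs(wi) ** q for wi in w) ** (1.0 / q)


def link(w, q):
    """Assumed link function: gradient of 0.5 * ||w||_q^2, for 1 < q <= 2.

    Component i is sign(w_i) * |w_i|^(q-1) / ||w||_q^(q-2).
    """
    norm = q_norm(w, q)
    if norm == 0.0:
        return [0.0] * len(w)
    scale = norm ** (q - 2)
    return [math.copysign(abs(wi) ** (q - 1), wi) / scale for wi in w]
```

With this choice, `link(w, 2)` returns `w` itself, and the inner product of `w` with `link(w, q)` equals the squared q-norm of `w`. This is the same role that $\|w_t\|_2^2$ plays in the 2-norm case.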
14
Solution of PUMMA
Solution of PUMMA is found numerically

$x^{pos}_t$, $x^{neg}_t$: last positive and negative
examples which incur updates
  • Observation
  • For $p = 2$, the solution is the same as that of
    ROMMA for $z_t$ defined from the difference $x^{pos}_t - x^{neg}_t$.

15
Our (implicit) reduction which preserves the
margin
Figure: a hyperplane with bias and the corresponding hyperplane without bias over pairs of positive
and negative instances; the margin is the same in both.
  • For $p = 2$,
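
The margin-preserving idea can be checked numerically for $p = 2$. The sketch below assumes the pair instance is the half difference $z = (x^{pos} - x^{neg})/2$ and that, for a fixed $w$, the bias is set midway between the closest positive and negative examples. Both choices, and all function names, are assumptions made here, since the slide's formula is lost.

```python
import math
from itertools import product


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_bias(w, examples):
    """Bias that balances the closest positive and negative examples."""
    lo = min(dot(w, x) for x, y in examples if y == 1)
    hi = max(dot(w, x) for x, y in examples if y == -1)
    return -(lo + hi) / 2.0


def margin_with_bias(w, b, examples):
    """2-norm margin of the hyperplane (w, b) over the original instances."""
    return min(y * (dot(w, x) + b) for x, y in examples) / math.sqrt(dot(w, w))


def margin_over_pairs(w, examples):
    """2-norm margin of w (no bias) over z = (x_pos - x_neg) / 2."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == -1]
    norm = math.sqrt(dot(w, w))
    return min(
        dot(w, [(a - c) / 2.0 for a, c in zip(xp, xn)])
        for xp, xn in product(pos, neg)
    ) / norm
```

For any separating `w`, `margin_with_bias(w, best_bias(w, examples), examples)` equals `margin_over_pairs(w, examples)`. This is why a bias-free learner over pairs loses nothing in margin, unlike the extra-dimension reduction.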

16
Main Result
  • Thm
  • Suppose that given $S = ((x_1, y_1), \dots, (x_T, y_T))$,
  • there exists a linear classifier $(u, b)$ s.t.
    $y_t (u \cdot x_t + b) \ge 1$ for $t = 1, \dots, T$.
  • (# of updates of PUMMA$_p(\delta)$) $\le (p-1)\|u\|_q^2 R^2 / \delta^2$
  • After $(p-1)\|u\|_q^2 R^2 / \delta^2$ updates,
  • PUMMA$_p(\delta)$ outputs a hypothesis with p-norm
    margin
  • $\ge (1-\delta)\gamma$ ($\gamma$: margin of $(u, b)$).

The # of updates is similar to those of previous algorithms.
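
To get a feel for the size of the bound, the expression $(p-1)\|u\|_q^2 R^2/\delta^2$ can be evaluated directly. The slide does not define $R$; the sketch below assumes it is a bound on the p-norm of the instances. The function names are ours.

```python
def q_from_p(p):
    """Dual exponent q with 1/p + 1/q = 1, for finite p > 1."""
    return p / (p - 1.0)


def update_bound(p, u, R, delta):
    """Evaluate (p - 1) * ||u||_q^2 * R^2 / delta^2.

    R is assumed to bound the p-norm of every instance.
    """
    q = q_from_p(p)
    u_q = sum(abs(ui) ** q for ui in u) ** (1.0 / q)
    return (p - 1.0) * u_q ** 2 * R ** 2 / delta ** 2
```

For instance, with $p = 2$, $u = (3, 4)$, $R = 1$ and $\delta = 0.1$, the expression evaluates to about 2500. Halving $\delta$ quadruples it, which is the price of demanding a margin closer to the maximum.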
17
Experiment over artificial data
  • example $(x, y)$
  • $x$: $n$ (= 100)-dimensional $\{-1, +1\}$-valued vector
  • $y = f(x)$ for a target function $f$ with bias $b$
  • generate 1000 examples randomly
  • 3 datasets ($b = 1$ (small), 9 (medium), 15 (large))
  • Compare with ROMMA ($p = 2$) and ALMA ($p = 2 \ln n$).
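
A data generator in the spirit of this setup might look like the sketch below. The definition of $f$ is missing from the transcript, so the target used here (the sign of the sum of the first `k` coordinates plus `b`, with `k = 16`) is purely a hypothetical stand-in, as are the function name, the seed and the uniform sampling.

```python
import random


def make_dataset(b, n=100, m=1000, k=16, seed=0):
    """Generate m examples (x, y) with x uniform in {-1, +1}^n.

    HYPOTHETICAL target: y = sign(x_1 + ... + x_k + b). The slide's
    actual f is not recoverable from the transcript.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        s = sum(x[:k]) + b
        # With k even and b odd, s is odd and therefore never 0.
        y = 1 if s > 0 else -1
        data.append((x, y))
    return data
```

Under this stand-in, a larger `b` makes the labels more unbalanced and pushes the separating hyperplane further from the origin, which is the regime where the slides report the biggest gap between the algorithms.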

18
Results over Artificial Data
Figure: margin vs. # of updates in two panels: $p = 2$ (ROMMA, PUMMA) and $p = 2 \ln n$ (ALMA, PUMMA).
  • NOTE 1: margin is defined over the original space
    (w/o reduction).
  • NOTE 2: we omit the results for $b = 9$ for clarity.

19
Computation Time
Figure: computation time (sec.) against bias, from large to small, in two panels: $p = 2$ (ROMMA, PUMMA) and $p = 2 \ln n$ (ALMA, PUMMA).
For $p = 2$, PUMMA is faster than ROMMA. For $p = 2 \ln n$,
PUMMA is faster than ALMA even though PUMMA
uses the Newton method.
20
Results over UCI Adult data
Dataset: adult (# of data: 32561)

algorithm      sec.     margin rate (%)
SVMlight       5893     100
ROMMA (99%)    71296    99.03
PUMMA (99%)    44480    99.14

  • Fix $p = 2$.
  • 2-norm soft margin formulation for linearly
    inseparable data.
  • Run ROMMA and PUMMA until they achieve 99% of
    the maximum margin.

21
Results over MNIST data
Dataset: MNIST

algorithm      sec.       margin rate (%)
SVMlight       401.36     100
ROMMA (99%)    1715.57    93.5
PUMMA (99%)    1971.30    99.2

  • Fix $p = 2$.
  • Use polynomial kernels.
  • 2-norm soft margin formulation for linearly
    inseparable data.
  • Run ROMMA and PUMMA until they achieve 99% of
    the maximum margin.

22
Summary
  • PUMMA can learn p-norm maximum margin classifiers
    with bias directly.
  • # of updates is similar to those of previous
    algs.
  • achieves $(1-\delta)$ times the maximum p-norm margin.
  • PUMMA outperforms other online algs
    when the underlying hyperplane has large bias.

23
Future work
  • Maximizing ∞-norm margin directly.
  • Tighter bounds on the # of updates
  • In our experiments, PUMMA is faster especially
    when bias is large (like WINNOW).
  • Our current bound does not reflect this fact.