PEGASOS: Primal Efficient sub-GrAdient SOlver for SVM
YASSO: Yet Another Svm SOlver
- Shai Shalev-Shwartz
- Yoram Singer
- Nati Srebro
The Hebrew University, Jerusalem, Israel
Support Vector Machines
- QP form:
  min_{w,b,ξ}  (1/2)‖w‖² + C Σ_i ξ_i   s.t.  y_i(⟨w,x_i⟩ + b) ≥ 1 − ξ_i,  ξ_i ≥ 0
- More natural form:
  min_w  (λ/2)‖w‖² + (1/m) Σ_{(x,y)∈S} max{0, 1 − y⟨w,x⟩}
  (first term: regularization term, second term: empirical loss; a code sketch follows below)
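To make the "more natural form" concrete, here is a minimal sketch (not from the slides) of evaluating that objective with NumPy, assuming a data matrix X of shape (m, n), labels y in {−1, +1}, and regularization parameter lam:

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Primal SVM objective: (lam/2)*||w||^2 + average hinge loss."""
    margins = y * (X @ w)                      # y_i * <w, x_i> for every example
    hinge = np.maximum(0.0, 1.0 - margins)     # max{0, 1 - y <w, x>}
    return 0.5 * lam * np.dot(w, w) + hinge.mean()
```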
Outline
- Previous Work
- The Pegasos algorithm
- Analysis: faster convergence rates
- Experiments: outperforms state-of-the-art
- Extensions
- kernels
- complex prediction problems
- bias term
Previous Work
- Dual-based methods
  - Interior Point methods: memory m², time m³ log(log(1/ε))
  - Decomposition methods: memory m, time super-linear in m
- Online learning & stochastic gradient
  - Memory O(1), time 1/ε² (linear kernel)
  - Memory 1/ε², time 1/ε⁴ (non-linear kernel)
- Typically, online learning algorithms do not converge to the optimal solution of SVM
Better rates exist for finite-dimensional instances (Murata, Bottou)
PEGASOS
On iteration t, pick a subset A_t ⊆ S, take a subgradient step on the objective restricted to A_t, and project (a code sketch follows below):
- A_t = S: subgradient method
- |A_t| = 1: stochastic gradient
- Subgradient step: w_{t+½} = w_t − η_t ∇_t, with ∇_t = λ w_t − (1/|A_t|) Σ_{(x,y)∈A_t : y⟨w_t,x⟩<1} y·x and η_t = 1/(λt)
- Projection: w_{t+1} = min{1, (1/√λ)/‖w_{t+½}‖} · w_{t+½}
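For concreteness, a minimal NumPy sketch of the update just described (an illustration, not the authors' code), assuming a linear kernel, labels in {−1, +1}, and A_t drawn uniformly at random with |A_t| = k:

```python
import numpy as np

def pegasos(X, y, lam, T, k=1, seed=0):
    """Mini-batch Pegasos: subgradient step + projection, T iterations."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        eta = 1.0 / (lam * t)                        # step size eta_t = 1/(lambda*t)
        idx = rng.choice(m, size=k, replace=False)   # A_t: random subset of size k
        Ax, Ay = X[idx], y[idx]
        violated = Ay * (Ax @ w) < 1                 # examples whose hinge loss is active
        grad = lam * w - (Ay[violated] @ Ax[violated]) / k
        w = w - eta * grad                           # subgradient step
        radius = 1.0 / np.sqrt(lam)                  # project onto ball of radius 1/sqrt(lambda)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```

With k = 1 this is the stochastic-gradient variant; with k = m it is the deterministic subgradient method.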
Run-Time of Pegasos
- Choosing |A_t| = 1 and a linear kernel over Rⁿ:
  the run-time required for Pegasos to find an ε-accurate solution w.p. 1−δ is Õ(n/(λε))
- Run-time does not depend on the number of examples
- Depends only on the difficulty of the problem (λ and ε)
Formal Properties
- Definition: w is ε-accurate if  f(w) ≤ min_{w'} f(w') + ε
- Theorem 1: Pegasos finds an ε-accurate solution w.p. 1−δ after at most Õ(1/(λεδ)) iterations
- Theorem 2: Pegasos finds log(1/δ) candidate solutions s.t., w.p. 1−δ, at least one of them is ε-accurate after Õ(1/(λε)) iterations
Proof Sketch
- A second look at the update step: define the instantaneous objective g_t(w) = (λ/2)‖w‖² + (1/|A_t|) Σ_{(x,y)∈A_t} max{0, 1 − y⟨w,x⟩}; the Pegasos update is exactly a subgradient step on g_t with step size η_t = 1/(λt), followed by a projection
Proof Sketch
- Lemma (free projection): projecting onto the ball of radius 1/√λ, which contains the optimum, does not hurt the analysis
- Logarithmic regret for OCP (Hazan et al. '06): each g_t is λ-strongly convex, so the regret after T steps is O(log(T)/λ)
- Take expectation over the random choice of the A_t's
- Since f(w_r) − f(w*) ≥ 0, Markov's inequality gives the bound w.p. 1−δ, at the cost of a 1/δ factor
- Amplify the confidence by running log(1/δ) independent copies and keeping the best (the chain is written out below)
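The chain above can be written compactly as follows (a sketch, assuming the subgradients of the instantaneous objectives g_t are bounded in norm by G, and writing w* for the SVM optimum and w_r for an iterate chosen uniformly at random):

```latex
% regret bound for strongly convex online convex programming (Hazan et al. '06)
\frac{1}{T}\sum_{t=1}^{T} g_t(w_t) \;-\; \min_{w}\,\frac{1}{T}\sum_{t=1}^{T} g_t(w)
  \;\le\; \frac{G^{2}\,(1+\ln T)}{2\lambda T}

% taking expectation over the random draws of A_t, using E[g_t(w_t) | w_t] = f(w_t)
\mathbb{E}\bigl[f(w_r)\bigr] - f(w^{\ast}) \;\le\; \frac{G^{2}\,(1+\ln T)}{2\lambda T}

% Markov's inequality, valid because f(w_r) - f(w^*) >= 0
\Pr\!\left[\, f(w_r) - f(w^{\ast}) \;\ge\; \frac{G^{2}\,(1+\ln T)}{2\lambda T\,\delta} \,\right] \;\le\; \delta
```

Setting the right-hand side to ε and solving for T gives the Õ(1/(λεδ)) bound of Theorem 1; running log(1/δ) independent copies and keeping the best candidate trades the 1/δ factor for a log(1/δ) one, which is Theorem 2.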
Experiments
- 3 datasets (provided by Joachims)
- Reuters CCAT (800K examples, 47k features)
- Physics ArXiv (62k examples, 100k features)
- Covertype (581k examples, 54 features)
- 4 competing algorithms
- SVM-light (Joachims)
- SVM-Perf (Joachims '06)
- Norma (Kivinen, Smola, Williamson '02)
- Zhang '04 (stochastic gradient descent)
- Source code available online
Training Time (in seconds)
Compare to Norma (on Physics)
(plots: objective value and test error)
Compare to Zhang (on Physics)
(plot: objective value)
But tuning Zhang's step-size parameter is more expensive than the learning itself
Effect of k = |A_t| when T is fixed
(plot: objective value)
Effect of k = |A_t| when k·T is fixed
(plot: objective value)
I want my kernels!
- Pegasos can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function (see the sketch below)
- No need to switch to the dual problem
- The number of support vectors is bounded by the number of iterations, Õ(1/(λεδ))
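A minimal sketch of how the kernelized variant can be run (an illustration, not the authors' implementation), representing w implicitly through counts α on the training examples, taking |A_t| = 1 and omitting the projection step for simplicity:

```python
import numpy as np

def kernel_pegasos(K, y, lam, T, seed=0):
    """Kernelized Pegasos with |A_t| = 1.

    K: (m, m) precomputed kernel matrix; y: labels in {-1, +1}.
    alpha[j] counts how many times example j triggered an update;
    the score of example i at step t is (1/(lam*t)) * sum_j alpha[j]*y[j]*K[j, i].
    """
    rng = np.random.default_rng(seed)
    m = len(y)
    alpha = np.zeros(m)
    for t in range(1, T + 1):
        i = rng.integers(m)                              # pick one random example
        score = np.dot(alpha * y, K[:, i]) / (lam * t)
        if y[i] * score < 1:                             # margin violated -> update
            alpha[i] += 1
    return alpha
```

Only examples with alpha[j] > 0 act as support vectors, and since at most one coordinate of alpha grows per iteration, their number is bounded by the number of iterations, as stated above.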
Complex Decision Problems
- Pegasos works whenever we know how to calculate subgradients of the loss function ℓ(w; (x,y))
- Example: structured output prediction, with ℓ(w; (x,y)) = max_{y'} [γ(y,y') + ⟨w, φ(x,y')⟩ − ⟨w, φ(x,y)⟩]
- A subgradient is φ(x,ŷ) − φ(x,y), where ŷ is the maximizer in the definition of ℓ (see the sketch below)
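To illustrate that last bullet (a sketch with hypothetical helpers: `feature_map(x, y)` returning φ(x,y) and `label_cost(y, y2)` returning γ(y,y2) are assumptions, not part of the slides), one subgradient computation for a small, enumerable label set could look like:

```python
import numpy as np

def structured_subgradient(w, x, y, labels, feature_map, label_cost):
    """Subgradient of the structured hinge loss at example (x, y)."""
    # Loss-augmented decoding: find the maximizer y_hat in the definition of the loss.
    scores = [label_cost(y, y2) + w @ (feature_map(x, y2) - feature_map(x, y))
              for y2 in labels]
    y_hat = labels[int(np.argmax(scores))]
    # The subgradient is phi(x, y_hat) - phi(x, y).
    return feature_map(x, y_hat) - feature_map(x, y)
```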
Bias term
- Popular approach: increase the dimension of x by appending a constant feature (see the sketch below). Cons: we pay for b in the regularization term
- Calculate subgradients w.r.t. w and w.r.t. b, leaving b unregularized. Cons: convergence rate becomes 1/ε²
- Define the loss on A_t with b set optimally for that subset. Cons: A_t needs to be large
- Search for b in an outer loop. Cons: evaluating the objective costs 1/ε²
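To make the first option concrete (a sketch; the constant value 1.0 for the appended feature is an arbitrary choice, not from the slides):

```python
import numpy as np

def add_bias_feature(X, c=1.0):
    """Append a constant feature so the last weight plays the role of the bias b.

    Note: b is then penalized by the regularization term, which is the 'cons' above.
    """
    m = X.shape[0]
    return np.hstack([X, np.full((m, 1), c)])
```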
Discussion
- Pegasos: a simple, efficient solver for SVM
- Sample vs. computational complexity
  - Sample complexity: how many examples do we need, as a function of the VC-dim (λ), accuracy (ε), and confidence (δ)
  - In Pegasos, we aim at analyzing computational complexity based on λ, ε, and δ (as also in Bottou & Bousquet)
- Finding the argmin vs. calculating the min: it seems that Pegasos finds the argmin more easily than it can calculate the min value