Title: Semidefinite Programming Machines
1. Semidefinite Programming Machines
- Thore Graepel and Ralf Herbrich
- Microsoft Research Cambridge
2. Overview
- Invariant Pattern Recognition
- Semidefinite Programming (SDP)
- From Support Vector Machines (SVMs) to Semidefinite Programming Machines (SDPMs)
- Experimental Illustration
- Future Work
3-4. Typical Invariances for Images
Translation
Shear
Rotation
5. Toy Features for Handwritten Digits
[Figure: handwritten digit with toy feature values φ1 = 0.48, φ2 = 0.58, φ3 = 0.37]
6-7. Warning: Highly Non-Linear
[Plot: feature space with axes φ1 and φ2]
8-9. Motivation: Classification Learning
- Can we learn with infinitely many examples?
[Plot: training examples in the (f1(x), f2(x)) feature plane]
10. Motivation: Version Spaces
[Figure: version spaces for the original patterns and for the transformed patterns]
11. Semidefinite Programs (SDPs)
- Linear objective function
- Positive semidefinite (psd) constraints
- Infinitely many linear constraints (a psd constraint on X encodes vᵀXv ≥ 0 for all vectors v; see the sketch below)
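For concreteness, here is a toy SDP in cvxpy (my own example, with made-up data C and A, not from the talk): the objective is linear in X, and the single psd constraint on X stands in for infinitely many linear constraints.

```python
# A minimal SDP sketch in cvxpy (made-up data): linear objective,
# one psd constraint on X, and one linear (trace) constraint.
import cvxpy as cp
import numpy as np

C = np.array([[1.0, 0.5], [0.5, 2.0]])    # objective coefficients
A = np.array([[1.0, 0.0], [0.0, 1.0]])    # constraint coefficients

X = cp.Variable((2, 2), PSD=True)         # X psd: encodes v' X v >= 0 for all v
objective = cp.Minimize(cp.trace(C @ X))  # linear in the entries of X
constraints = [cp.trace(A @ X) == 1]      # one linear equality constraint
cp.Problem(objective, constraints).solve()
print(X.value)
```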
12. SVM as a Quadratic Program
- Given: a sample ((x1, y1), ..., (xm, ym)).
- SVMs find the weight vector w that maximises the margin on the sample (see the sketch below).
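A minimal sketch of this QP in cvxpy (my own code, not from the talk): hard-margin, no bias term; X and y are assumed to hold the m training examples and their ±1 labels. The same problem is re-expressed as an SDP on the following slides.

```python
# Hard-margin SVM as a quadratic program (minimal sketch, no bias term).
# X: (m, n) array of examples x_1..x_m, y: (m,) array of labels in {-1, +1}.
import cvxpy as cp

def train_svm(X, y):
    m, n = X.shape
    w = cp.Variable(n)
    # Margin constraints y_i <w, x_i> >= 1 for all m training examples.
    constraints = [cp.multiply(y, X @ w) >= 1]
    # Maximising the margin 1/||w|| is equivalent to minimising ||w||^2.
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    return w.value
```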
13-14. SVM as a Semidefinite Program (I)
- A (block-)diagonal matrix is psd if and only if all its blocks are psd.
[Figure: block-diagonal matrix with blocks Aj and B]
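The underlying identity, written out (standard linear algebra, not taken from the slides):

```latex
% Split v conformally into blocks v_1, ..., v_k: psd-ness of a block-diagonal
% matrix reduces to psd-ness of its blocks.
v^{\top} \operatorname{diag}(A_1,\dots,A_k)\,v
  \;=\; \sum_{j=1}^{k} v_j^{\top} A_j v_j \;\ge\; 0
  \quad \text{for all } v
  \quad\Longleftrightarrow\quad
  A_j \succeq 0 \ \text{for all } j .
```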
15. SVM as a Semidefinite Program (II)
- Transform the quadratic objective into a linear one.
- Use Schur's complement lemma (see below).
- Adds a new (n+1)×(n+1) block to Aj and B.
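The step written out (standard Schur-complement argument, matching the (n+1)×(n+1) block mentioned above):

```latex
% Schur's complement with the block I_n positive definite:
%   [[I_n, w], [w^T, t]] psd  <=>  t >= w^T w,
% so the quadratic objective becomes linear in the auxiliary variable t.
\min_{w}\ \lVert w\rVert^{2}
  \;=\;
  \min_{w,\,t}\ \Bigl\{\, t \;:\;
  \begin{pmatrix} I_n & w \\ w^{\top} & t \end{pmatrix} \succeq 0 \Bigr\}.
```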
16. Taylor Approximation of Invariance
- Let T(x, µ) be an invariance transformation with parameter µ (e.g., angle of rotation).
- Taylor expansion about µ = 0 gives a polynomial approximation to the trajectory (see below).
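The expansion referred to above, spelled out (r denotes the order of the Taylor approximation; notation assumed):

```latex
% r-th order Taylor approximation of the trajectory T(x, mu) about mu = 0.
x(\mu) \;:=\; \sum_{k=0}^{r} \frac{\mu^{k}}{k!}\,
  \left.\frac{\partial^{k} T(x,\tilde\mu)}{\partial \tilde\mu^{k}}\right|_{\tilde\mu=0}
  \;\approx\; T(x,\mu).
```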
17. Extension to Polynomials
- Consider the polynomial trajectory x(µ).
- An infinite number of constraints arises from the training example (x(0), ..., x(r), y): one margin constraint for every value of µ (see below).
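Spelled out, assuming the margin constraints of slide 12 and writing x^(k) for the k-th derivative of the trajectory at µ = 0:

```latex
% One margin constraint per value of mu: a polynomial non-negativity
% constraint whose coefficients are linear in w,
% with x(mu) = sum_k (mu^k / k!) x^{(k)} from the Taylor slide.
p(\mu) \;:=\; y\,\langle w,\,x(\mu)\rangle - 1
  \;=\; \sum_{k=0}^{r} \frac{y\,\langle w,\,x^{(k)}\rangle}{k!}\,\mu^{k}
  \;\ge\; 0
  \quad\text{for all } \mu .
```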
18. Non-Negative Polynomials (I)
- Theorem (Nesterov, 2000): Let r = 2l and write m(µ) = (1, µ, ..., µ^l)ᵀ for the vector of monomials. Then
  (1) for every psd matrix P the polynomial p(µ) = m(µ)ᵀ P m(µ) is non-negative everywhere;
  (2) for every non-negative polynomial p of degree r there exists a psd matrix P such that p(µ) = m(µ)ᵀ P m(µ).
- Example (see below).
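A worked instance of the theorem for r = 2, l = 1 (my own illustration):

```latex
% p(mu) = (1 - mu)^2 is non-negative everywhere and factors through a psd P.
p(\mu) = 1 - 2\mu + \mu^{2} = (1-\mu)^{2}
= \begin{pmatrix} 1 & \mu \end{pmatrix}
  \underbrace{\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}}_{P \,\succeq\, 0}
  \begin{pmatrix} 1 \\ \mu \end{pmatrix},
\qquad \lambda(P) = \{0,\,2\}.
```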
19. Non-Negative Polynomials (II)
- (1) follows directly from the psd definition.
- (2) follows from the sum-of-squares lemma.
- Note that (2) states mere existence of such a P.
- Polynomial of degree r: r+1 parameters.
- Coefficient matrix P: (r+2)(r+4)/8 parameters.
- For r > 2, we have to introduce another r(r-2)/8 auxiliary variables to find P (worked check below).
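A quick sanity check of these counts for r = 4 (so l = 2 and P is a 3×3 symmetric matrix):

```latex
% r = 4: five polynomial coefficients, six free entries of P,
% hence one auxiliary variable.
r + 1 = 5, \qquad
\frac{(r+2)(r+4)}{8} = \frac{6 \cdot 8}{8} = 6, \qquad
\frac{r(r-2)}{8} = \frac{4 \cdot 2}{8} = 1 .
```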
20-21. Semidefinite Programming Machines
- Extension of SVMs as (non-trivial) SDP.
[Figure: block-diagonal constraint matrix with blocks G1,j, ..., Gi,j, ..., Gm,j (and vectors g1,j, ..., gm,j), together with Aj and B]
22-23. Example: Second-Order SDPMs
- 2nd order Taylor expansion
- Resulting polynomial in µ
- Set of constraint matrices (see the sketch below)
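A minimal sketch of the second-order case as a cvxpy problem (my own code; hypothetical data layout, no bias or slack terms, and the restriction of µ to a segment from slides 24-25 is ignored). Each example i is given by its value X0[i], first derivative X1[i] and second derivative X2[i] at µ = 0, with label y[i] in {-1, +1}.

```python
# Second-order SDPM sketch: the margin along the Taylor trajectory
# x_i(mu) = x_i + mu * x_i' + (mu^2 / 2) * x_i'' must stay >= 1 for all mu.
import cvxpy as cp

def train_sdpm2(X0, X1, X2, y):
    m, n = X0.shape
    w = cp.Variable(n)
    constraints = []
    for i in range(m):
        # Coefficients of the margin polynomial p(mu) = c0 + c1*mu + c2*mu^2,
        # all linear in w.
        c0 = y[i] * (X0[i] @ w) - 1
        c1 = y[i] * (X1[i] @ w)
        c2 = y[i] * (X2[i] @ w) / 2
        # p(mu) >= 0 for all mu  <=>  the 2x2 coefficient matrix is psd.
        P = cp.Variable((2, 2), PSD=True)
        constraints += [P[0, 0] == c0, P[0, 1] == c1 / 2, P[1, 1] == c2]
    # Maximise the margin by minimising ||w||^2 (cvxpy accepts the quadratic
    # objective directly; the Schur trick of slide 15 is not needed here).
    problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
    problem.solve()
    return w.value
```

The 2×2 psd constraint here is Nesterov's characterisation from slide 18 applied to the degree-2 margin polynomial.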
24-25. Non-Negative on Segment
- Given a polynomial p of degree 2l, consider the polynomial q (one possible construction is sketched below).
- Note that q is a polynomial of degree 4l.
- If q is positive everywhere, then p is positive everywhere in [-τ, +τ].
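One standard way to obtain such a q (the deck's exact construction is not recoverable from the extracted text, so the following is an assumption): substitute a rational parametrisation of the segment [-τ, +τ] and clear denominators.

```latex
% The map s -> 2*tau*s / (1 + s^2) covers [-tau, +tau] as s ranges over R;
% multiplying by (1 + s^2)^{2l} clears denominators and gives degree 4l.
q(s) \;:=\; \bigl(1+s^{2}\bigr)^{2l}\,
  p\!\left(\frac{2\tau s}{1+s^{2}}\right),
\qquad
q(s) \ge 0 \ \ \forall s
\;\Longrightarrow\;
p(\mu) \ge 0 \ \ \forall \mu \in [-\tau,+\tau].
```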
26. Truly Virtual Support Vectors
- Dual complementarity yields an expansion of w (schematic form below).
- The truly virtual support vectors are linear combinations of derivatives.
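Schematically (my notation; the coefficients βi stand for the dual variables from the complementarity conditions, whose exact form in the paper may differ):

```latex
% Each truly virtual support vector x_i(mu_i) lies on the approximated
% trajectory and is a linear combination of the derivatives x_i^{(0)},...,x_i^{(r)}.
w \;=\; \sum_{i} \beta_i\, y_i\, x_i(\mu_i),
\qquad
x_i(\mu_i) \;=\; \sum_{k=0}^{r} \frac{\mu_i^{k}}{k!}\, x_i^{(k)} .
```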
27. Truly Virtual Support Vectors
[Plot: truly virtual support vectors in feature space for USPS digits 1 and 9]
28. Visualisation: USPS 1 vs. 9
[Figure: USPS digits 1 and 9 under rotation (20°)]
29. Results: Experimental Setup
- All 45 USPS classification tasks (1-vs-1).
- 20 training images, 250 test images.
- Rotation is applied to all training images (10°).
- All results are averaged over 50 random training sets.
- Compared to SVM and virtual SVM.
30. Results: SDPM vs. SVM
31. Results: SDPM vs. Virtual SVM
32. Results: Curse of Dimensionality
33. Results: Curse of Dimensionality
[Plot: results for 2 parameters vs. 1 parameter]
34. Extensions and Future Work
- Multiple parameters µ1, µ2,..., µD.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).
- Sparsification by efficiently finding the example x and transformation µ with maximal information (idea of Neil Lawrence).
- Expectation propagation for BPMs (idea of Tom Minka).
35. Conclusions and Future Work
- Learning from infinitely many examples.
- Truly virtual support vectors xi(µi).
- Multiple parameters µ1, µ2,..., µD.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).