Title: Semidefinite Programming Machines
1. Semidefinite Programming Machines
- Thore Graepel and Ralf Herbrich
- Microsoft Research Cambridge
2. Overview
- Invariant Pattern Recognition
- Semidefinite Programming (SDP)
- From Support Vector Machines (SVMs) to Semidefinite Programming Machines (SDPMs)
- Experimental Illustration
- Future Work
3-4. Typical Invariances for Images
Translation
Shear
Rotation
5. Toy Features for Handwritten Digits
[Figure: handwritten digit with toy feature values φ1 = 0.48, φ2 = 0.58, φ3 = 0.37]
6-7. Warning: Highly Non-Linear
[Plot: feature space with axes φ1 and φ2]
8-9. Motivation: Classification Learning
- Can we learn with infinitely many examples?
[Plot: training examples in the (f1(x), f2(x)) feature plane]
10. Motivation: Version Spaces
[Figure: version spaces for the original patterns and for the transformed patterns]
11. Semidefinite Programs (SDPs)
- Linear objective function
- Positive semidefinite (psd) constraints
- Infinitely many linear constraints (a psd constraint on X encodes vᵀXv ≥ 0 for all vectors v; see the sketch below)
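For concreteness, here is a toy SDP in cvxpy (my own example, with made-up data C and A, not from the talk): the objective is linear in X, and the single psd constraint on X stands in for infinitely many linear constraints.

```python
# A minimal SDP sketch in cvxpy (made-up data): linear objective,
# one psd constraint on X, and one linear (trace) constraint.
import cvxpy as cp
import numpy as np

C = np.array([[1.0, 0.5], [0.5, 2.0]])    # objective coefficients
A = np.array([[1.0, 0.0], [0.0, 1.0]])    # constraint coefficients

X = cp.Variable((2, 2), PSD=True)         # X psd: encodes v' X v >= 0 for all v
objective = cp.Minimize(cp.trace(C @ X))  # linear in the entries of X
constraints = [cp.trace(A @ X) == 1]      # one linear equality constraint
cp.Problem(objective, constraints).solve()
print(X.value)
```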
12. SVM as a Quadratic Program
- Given: a sample ((x1, y1), ..., (xm, ym)).
- SVMs find the weight vector w that maximises the margin on the sample (see the sketch below).
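A minimal sketch of this QP in cvxpy (my own code, not from the talk): hard-margin, no bias term; X and y are assumed to hold the m training examples and their ±1 labels. The same problem is re-expressed as an SDP on the following slides.

```python
# Hard-margin SVM as a quadratic program (minimal sketch, no bias term).
# X: (m, n) array of examples x_1..x_m, y: (m,) array of labels in {-1, +1}.
import cvxpy as cp

def train_svm(X, y):
    m, n = X.shape
    w = cp.Variable(n)
    # Margin constraints y_i <w, x_i> >= 1 for all m training examples.
    constraints = [cp.multiply(y, X @ w) >= 1]
    # Maximising the margin 1/||w|| is equivalent to minimising ||w||^2.
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    return w.value
```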
13-14. SVM as a Semidefinite Program (I)
- A (block-)diagonal matrix is psd if and only if all its blocks are psd.
[Figure: block-diagonal matrix with blocks Aj and B]
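The underlying identity, written out (standard linear algebra, not taken from the slides):

```latex
% Split v conformally into blocks v_1, ..., v_k: psd-ness of a block-diagonal
% matrix reduces to psd-ness of its blocks.
v^{\top} \operatorname{diag}(A_1,\dots,A_k)\,v
  \;=\; \sum_{j=1}^{k} v_j^{\top} A_j v_j \;\ge\; 0
  \quad \text{for all } v
  \quad\Longleftrightarrow\quad
  A_j \succeq 0 \ \text{for all } j .
```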
15. SVM as a Semidefinite Program (II)
- Transform the quadratic objective into a linear one.
- Use Schur's complement lemma (see below).
- Adds a new (n+1)×(n+1) block to Aj and B.
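The step written out (standard Schur-complement argument, matching the (n+1)×(n+1) block mentioned above):

```latex
% Schur's complement with the block I_n positive definite:
%   [[I_n, w], [w^T, t]] psd  <=>  t >= w^T w,
% so the quadratic objective becomes linear in the auxiliary variable t.
\min_{w}\ \lVert w\rVert^{2}
  \;=\;
  \min_{w,\,t}\ \Bigl\{\, t \;:\;
  \begin{pmatrix} I_n & w \\ w^{\top} & t \end{pmatrix} \succeq 0 \Bigr\}.
```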
16. Taylor Approximation of Invariance
- Let T(x, µ) be an invariance transformation with parameter µ (e.g., angle of rotation).
- Taylor expansion about µ = 0 gives a polynomial approximation to the trajectory (see below).
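The expansion referred to above, spelled out (r denotes the order of the Taylor approximation; notation assumed):

```latex
% r-th order Taylor approximation of the trajectory T(x, mu) about mu = 0.
x(\mu) \;:=\; \sum_{k=0}^{r} \frac{\mu^{k}}{k!}\,
  \left.\frac{\partial^{k} T(x,\tilde\mu)}{\partial \tilde\mu^{k}}\right|_{\tilde\mu=0}
  \;\approx\; T(x,\mu).
```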
17. Extension to Polynomials
- Consider the polynomial trajectory x(µ).
- An infinite number of constraints arises from the training example (x(0), ..., x(r), y): one margin constraint for every value of µ (see below).
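Spelled out, assuming the margin constraints of slide 12 and writing x^(k) for the k-th derivative of the trajectory at µ = 0:

```latex
% One margin constraint per value of mu: a polynomial non-negativity
% constraint whose coefficients are linear in w,
% with x(mu) = sum_k (mu^k / k!) x^{(k)} from the Taylor slide.
p(\mu) \;:=\; y\,\langle w,\,x(\mu)\rangle - 1
  \;=\; \sum_{k=0}^{r} \frac{y\,\langle w,\,x^{(k)}\rangle}{k!}\,\mu^{k}
  \;\ge\; 0
  \quad\text{for all } \mu .
```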
18. Non-Negative Polynomials (I)
- Theorem (Nesterov, 2000): Let r = 2l and write m(µ) = (1, µ, ..., µ^l)ᵀ for the vector of monomials. Then
  (1) for every psd matrix P the polynomial p(µ) = m(µ)ᵀ P m(µ) is non-negative everywhere;
  (2) for every non-negative polynomial p of degree r there exists a psd matrix P such that p(µ) = m(µ)ᵀ P m(µ).
- Example (see below).
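A worked instance of the theorem for r = 2, l = 1 (my own illustration):

```latex
% p(mu) = (1 - mu)^2 is non-negative everywhere and factors through a psd P.
p(\mu) = 1 - 2\mu + \mu^{2} = (1-\mu)^{2}
= \begin{pmatrix} 1 & \mu \end{pmatrix}
  \underbrace{\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}}_{P \,\succeq\, 0}
  \begin{pmatrix} 1 \\ \mu \end{pmatrix},
\qquad \lambda(P) = \{0,\,2\}.
```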
19. Non-Negative Polynomials (II)
- (1) follows directly from the psd definition.
- (2) follows from the sum-of-squares lemma.
- Note that (2) states mere existence of such a P.
- Polynomial of degree r: r+1 parameters.
- Coefficient matrix P: (r+2)(r+4)/8 parameters.
- For r > 2, we have to introduce another r(r-2)/8 auxiliary variables to find P (worked check below).
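A quick sanity check of these counts for r = 4 (so l = 2 and P is a 3×3 symmetric matrix):

```latex
% r = 4: five polynomial coefficients, six free entries of P,
% hence one auxiliary variable.
r + 1 = 5, \qquad
\frac{(r+2)(r+4)}{8} = \frac{6 \cdot 8}{8} = 6, \qquad
\frac{r(r-2)}{8} = \frac{4 \cdot 2}{8} = 1 .
```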
20-21. Semidefinite Programming Machines
- Extension of SVMs as (non-trivial) SDP.
[Figure: block-diagonal constraint matrix with blocks G1,j, ..., Gi,j, ..., Gm,j (and vectors g1,j, ..., gm,j), together with Aj and B]
22-23. Example: Second-Order SDPMs
- 2nd order Taylor expansion
- Resulting polynomial in µ
- Set of constraint matrices (see the sketch below)
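A minimal sketch of the second-order case as a cvxpy problem (my own code; hypothetical data layout, no bias or slack terms, and the restriction of µ to a segment from slides 24-25 is ignored). Each example i is given by its value X0[i], first derivative X1[i] and second derivative X2[i] at µ = 0, with label y[i] in {-1, +1}.

```python
# Second-order SDPM sketch: the margin along the Taylor trajectory
# x_i(mu) = x_i + mu * x_i' + (mu^2 / 2) * x_i'' must stay >= 1 for all mu.
import cvxpy as cp

def train_sdpm2(X0, X1, X2, y):
    m, n = X0.shape
    w = cp.Variable(n)
    constraints = []
    for i in range(m):
        # Coefficients of the margin polynomial p(mu) = c0 + c1*mu + c2*mu^2,
        # all linear in w.
        c0 = y[i] * (X0[i] @ w) - 1
        c1 = y[i] * (X1[i] @ w)
        c2 = y[i] * (X2[i] @ w) / 2
        # p(mu) >= 0 for all mu  <=>  the 2x2 coefficient matrix is psd.
        P = cp.Variable((2, 2), PSD=True)
        constraints += [P[0, 0] == c0, P[0, 1] == c1 / 2, P[1, 1] == c2]
    # Maximise the margin by minimising ||w||^2 (cvxpy accepts the quadratic
    # objective directly; the Schur trick of slide 15 is not needed here).
    problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
    problem.solve()
    return w.value
```

The 2×2 psd constraint here is Nesterov's characterisation from slide 18 applied to the degree-2 margin polynomial.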
24-25. Non-Negative on Segment
- Given a polynomial p of degree 2l, consider the polynomial q (one possible construction is sketched below).
- Note that q is a polynomial of degree 4l.
- If q is positive everywhere, then p is positive everywhere in [-τ, +τ].
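One standard way to obtain such a q (the deck's exact construction is not recoverable from the extracted text, so the following is an assumption): substitute a rational parametrisation of the segment [-τ, +τ] and clear denominators.

```latex
% The map s -> 2*tau*s / (1 + s^2) covers [-tau, +tau] as s ranges over R;
% multiplying by (1 + s^2)^{2l} clears denominators and gives degree 4l.
q(s) \;:=\; \bigl(1+s^{2}\bigr)^{2l}\,
  p\!\left(\frac{2\tau s}{1+s^{2}}\right),
\qquad
q(s) \ge 0 \ \ \forall s
\;\Longrightarrow\;
p(\mu) \ge 0 \ \ \forall \mu \in [-\tau,+\tau].
```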
26. Truly Virtual Support Vectors
- Dual complementarity yields an expansion of w (schematic form below).
- The truly virtual support vectors are linear combinations of derivatives.
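Schematically (my notation; the coefficients βi stand for the dual variables from the complementarity conditions, whose exact form in the paper may differ):

```latex
% Each truly virtual support vector x_i(mu_i) lies on the approximated
% trajectory and is a linear combination of the derivatives x_i^{(0)},...,x_i^{(r)}.
w \;=\; \sum_{i} \beta_i\, y_i\, x_i(\mu_i),
\qquad
x_i(\mu_i) \;=\; \sum_{k=0}^{r} \frac{\mu_i^{k}}{k!}\, x_i^{(k)} .
```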
27. Truly Virtual Support Vectors
[Plot: truly virtual support vectors in feature space for USPS digits 1 and 9]
28. Visualisation: USPS 1 vs. 9
[Figure: USPS digits 1 and 9 under rotation (20°)]
29. Results: Experimental Setup
- All 45 USPS classification tasks (1-vs-1).
- 20 training images, 250 test images.
- Rotation is applied to all training images (10°).
- All results are averaged over 50 random training sets.
- Compared to SVM and virtual SVM.
30. Results: SDPM vs. SVM
31. Results: SDPM vs. Virtual SVM
32. Results: Curse of Dimensionality
33. Results: Curse of Dimensionality
[Plot: results for 2 parameters vs. 1 parameter]
34. Extensions and Future Work
- Multiple parameters µ1, µ2,..., µD.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).
- Sparsification by efficiently finding the example x and transformation µ with maximal information (idea of Neil Lawrence).
- Expectation propagation for BPMs (idea of Tom Minka).
35. Conclusions and Future Work
- Learning from infinitely many examples.
- Truly virtual support vectors xi(µi).
- Multiple parameters µ1, µ2,..., µD.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).