
Transcript and Presenter's Notes

Title: Semidefinite Programming Machines


1
Semidefinite Programming Machines
  • Thore Graepel and Ralf Herbrich
  • Microsoft Research Cambridge

2
Overview
  • Invariant Pattern Recognition
  • Semidefinite Programming (SDP)
  • From Support Vector Machines (SVMs) to
    Semidefinite Programming Machines (SDPMs)
  • Experimental Illustration
  • Future Work

3
Typical Invariances for Images
[Figures: an example image under translation, shear, and rotation.]
5
Toy Features for Handwritten Digits
φ1 = 0.48, φ2 = 0.58, φ3 = 0.37
6
Warning: Highly Non-Linear
[Plot: feature space with axes φ1 and φ2.]
8
Motivation: Classification Learning
Can we learn with infinitely many examples?
[Plot: training examples in the (f1(x), f2(x)) feature plane.]
10
Motivation: Version Spaces
[Figures: version spaces for the original patterns and for the transformed patterns.]
11
Semidefinite Programs (SDPs)
  • Linear objective function
  • Positive semidefinite (psd) constraints
  • Infinitely many linear constraints
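A standard primal form ties these three ingredients together (a sketch in common notation, assumed rather than taken from the slides):

\[
\min_{x \in \mathbb{R}^n} \; c^\top x
\quad \text{s.t.} \quad
F(x) := F_0 + \sum_{j=1}^{n} x_j F_j \succeq 0 .
\]

The single psd constraint F(x) \succeq 0 is equivalent to the infinitely many linear constraints u^\top F(x) u \ge 0 for all vectors u.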

12
SVM as a Quadratic Program
  • Given: a sample ((x1, y1), ..., (xm, ym)).
  • SVMs find the weight vector w that maximises the margin on the sample.
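In its hard-margin form this is the standard QP:

\[
\min_{w} \; \tfrac{1}{2} \|w\|^2
\quad \text{s.t.} \quad
y_i \, w^\top x_i \ge 1, \qquad i = 1, \dots, m,
\]

since maximising the margin 1/\|w\| is equivalent to minimising \|w\|^2 subject to these constraints.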

13
SVM as a Semidefinite Program (I)
  • A (block-)diagonal matrix is psd if and only if all its blocks are psd.

[Diagram: block-diagonal matrices Aj and B.]
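This is what lets the m linear SVM constraints be stacked into one psd constraint (a sketch; the name A(w) is assumed):

\[
A(w) := \mathrm{diag}\!\left( y_1 w^\top x_1 - 1, \, \dots, \, y_m w^\top x_m - 1 \right) \succeq 0
\;\iff\;
y_i \, w^\top x_i \ge 1 \;\text{ for } i = 1, \dots, m,
\]

since every constraint occupies its own 1x1 block.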
15
SVM as a Semidefinite Program (II)
  • Transform the quadratic objective into a linear one.
  • Use Schur's complement lemma.
  • This adds a new (n+1)x(n+1) block to Aj and B.
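Schur's complement lemma says that, for a symmetric matrix with A \succ 0,

\[
\begin{pmatrix} A & b \\ b^\top & c \end{pmatrix} \succeq 0
\;\iff\;
c - b^\top A^{-1} b \ge 0 .
\]

Applied here: minimise a new scalar variable t subject to

\[
\begin{pmatrix} I_n & w \\ w^\top & t \end{pmatrix} \succeq 0
\;\iff\;
t \ge w^\top w ,
\]

which is exactly the extra (n+1)x(n+1) block, and the objective t is now linear.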

16
Taylor Approximation of Invariance
  • Let T(x, µ) be an invariance transformation with parameter µ (e.g., angle of rotation).
  • Taylor expansion about µ0 = 0 gives a polynomial approximation to the trajectory.
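The expansion has the standard form (degree r assumed, matching the later slides):

\[
x(\mu) = T(x, \mu) \;\approx\; \sum_{k=0}^{r} \frac{\mu^k}{k!} \left. \frac{\partial^k T(x, \mu)}{\partial \mu^k} \right|_{\mu = 0},
\]

a degree-r polynomial in µ whose vector coefficients are the derivatives x^{(k)} of the trajectory at µ = 0, scaled by 1/k!.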

17
Extension to Polynomials
  • Consider the polynomial trajectory x(µ).
  • Each training example (x^{(0)}, ..., x^{(r)}, y) yields an infinite number of constraints.
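Concretely, correct classification along the whole trajectory requires

\[
y \, w^\top x(\mu) \ge 1 \quad \text{for all } \mu
\quad\Longleftrightarrow\quad
p(\mu) := \sum_{k=0}^{r} \frac{\mu^k}{k!} \, y \, w^\top x^{(k)} - 1 \;\ge\; 0 \quad \text{for all } \mu :
\]

one linear constraint on w per value of µ, i.e. a polynomial in µ with coefficients linear in w that must be non-negative everywhere.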

18
Non-Negative Polynomials (I)
  • Theorem (Nesterov, 2000): if r = 2l, then:
  • For every psd matrix P, the polynomial p(µ) = m(µ)^T P m(µ), with monomial vector m(µ) = (1, µ, ..., µ^l)^T, is non-negative everywhere.
  • For every non-negative polynomial p there exists a psd matrix P such that p(µ) = m(µ)^T P m(µ).
  • Example: see below.
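An illustrative instance (chosen here; l = 1, r = 2): the non-negative polynomial p(µ) = 1 - 2µ + µ² = (1 - µ)² is certified by

\[
p(\mu) =
\begin{pmatrix} 1 & \mu \end{pmatrix}
\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ \mu \end{pmatrix},
\qquad
P = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = v v^\top \succeq 0, \quad v = (1, -1)^\top .
\]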

19
Non-Negative Polynomials (II)
  • (1) follows directly from the definition of psd.
  • (2) follows from the sum-of-squares lemma.
  • Note that (2) states mere existence.
  • A polynomial of degree r has r + 1 parameters.
  • The coefficient matrix P has (r + 2)(r + 4)/8 parameters.
  • For r > 2, we have to introduce another r(r - 2)/8 auxiliary variables to find P.
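A worked count for r = 4 (illustrative): with m(µ) = (1, µ, µ²)^T, matching the five coefficients of p(µ) = Σ c_k µ^k against P gives

\[
c_0 = P_{11}, \quad
c_1 = 2 P_{12}, \quad
c_2 = 2 P_{13} + P_{22}, \quad
c_3 = 2 P_{23}, \quad
c_4 = P_{33} ,
\]

so the symmetric 3x3 matrix P has (r + 2)(r + 4)/8 = 6 free entries against r + 1 = 5 coefficients; the split of c_2 between P_{13} and P_{22} is the r(r - 2)/8 = 1 auxiliary variable.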

20
Semidefinite Programming Machines
  • Extension of SVMs as a (non-trivial) SDP.

[Diagram: the constraint matrices Aj and B are block-diagonal, with one block Gi,j and entries gi,j and 1 per training example i.]
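Put together, the SDPM reads roughly as follows (a sketch; each certificate P_i here plays the role of the slides' blocks Gi,j):

\[
\min_{t,\, w,\, P_1, \dots, P_m} \; t
\quad \text{s.t.} \quad
\begin{pmatrix} I_n & w \\ w^\top & t \end{pmatrix} \succeq 0 ,
\qquad
P_i \succeq 0 , \quad
m(\mu)^\top P_i \, m(\mu) = y_i \, w^\top x_i(\mu) - 1 \;\; \text{for all } \mu .
\]

Matching coefficients of powers of µ turns the last identity into finitely many linear equations in w and the entries of P_i, so the whole problem is one block-diagonal SDP of the form on the previous slides.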
22
Example: Second-Order SDPMs
  • 2nd-order Taylor expansion
  • Resulting polynomial in µ
  • Set of constraint matrices (see the sketch below)
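A minimal CVXPY sketch of this construction on assumed toy data (the arrays X, dX, d2X and all names are hypothetical, not the authors' code). Each example's quadratic polynomial p_i(µ) = y_i w^T x_i(µ) - 1 is matched, coefficient by coefficient, to a 2x2 psd certificate:

import cvxpy as cp
import numpy as np

# Toy, linearly separable data (assumed for illustration).
X = np.array([[2.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
rng = np.random.default_rng(0)
dX = 0.1 * rng.standard_normal(X.shape)  # dx_i/dmu at mu = 0
d2X = 0.2 * X                            # d2x_i/dmu2, chosen so the toy problem is feasible

m, n = X.shape
w = cp.Variable(n)
constraints = []
for i in range(m):
    # Enforce p_i(mu) = (1, mu) P (1, mu)^T >= 0 for all mu via P psd.
    P = cp.Variable((2, 2), PSD=True)
    constraints += [
        P[0, 0] == y[i] * (X[i] @ w) - 1,       # constant term of p_i
        2 * P[0, 1] == y[i] * (dX[i] @ w),      # coefficient of mu
        P[1, 1] == 0.5 * y[i] * (d2X[i] @ w),   # coefficient of mu^2
    ]

# Maximise the margin, i.e. minimise ||w||^2, as in the SVM primal.
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
prob.solve()
print(prob.status, w.value)

Note that this enforces non-negativity of each p_i over all real µ; the construction on the next slide restricts attention to a segment instead.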

24
Non-Negative on Segment
  • Given a polynomial p of degree 2l, consider the polynomial q below.
  • Note that q is a polynomial of degree 4l.
  • If q is positive everywhere, then p is positive everywhere in [-τ, τ].
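One standard construction with exactly this degree count (assumed here, writing the segment as [-τ, τ]): substitute µ by 2µ/(1 + µ²), which maps the real line onto [-1, 1], and clear denominators,

\[
q(\mu) \;=\; \left(1 + \mu^2\right)^{2l} \, p\!\left( \tau \, \frac{2\mu}{1 + \mu^2} \right),
\]

a polynomial of degree 4l. Non-negativity of q on all of ℝ then certifies non-negativity of p on [-τ, τ], so the previous theorem applies to q.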

26
Truly Virtual Support Vectors
  • Dual complementarity yields an expansion of w.
  • The truly virtual support vectors are linear combinations of derivatives.
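For the second-order case this expansion reads roughly (a reconstruction; the coefficient names νi and optimal parameters µi* are assumed):

\[
w \;=\; \sum_i \nu_i \, y_i \, x_i(\mu_i^{*}),
\qquad
x_i(\mu_i^{*}) \;=\; x_i + \mu_i^{*} \, x_i' + \tfrac{1}{2} \left(\mu_i^{*}\right)^2 x_i'' ,
\]

so each support vector is a transformed pattern evaluated at an optimal parameter value, i.e. a linear combination of xi and its derivatives: a "truly virtual" example that need not occur in the training set.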

27
Truly Virtual Support Vectors
[Plot: truly virtual support vectors for the digits 1 and 9.]
28
Visualisation: USPS 1 vs. 9
[Figure: USPS digit images rotated by 20º.]
29
Results: Experimental Setup
  • All 45 USPS classification tasks (1-v-1).
  • 20 training images, 250 test images.
  • Rotation of 10º is applied to all training images.
  • All results are averaged over 50 random training sets.
  • Compared to SVM and virtual SVM.

30
Results: SDPM vs. SVM
31
Results: SDPM vs. Virtual SVM
32
Results: Curse of Dimensionality
[Plot: results with 1 transformation parameter vs. 2 transformation parameters.]
34
Extensions and Future Work
  • Multiple parameters µ1, µ2,..., µD.
  • (Efficient) adaptation to kernel space.
  • Semidefinite Perceptrons (NIPS poster with A.
    Kharechko and J. Shawe-Taylor).
  • Sparsification by efficiently finding the example
    x and transformation µ with maximal information
    (idea of Neil Lawrence).
  • Expectation propagation for BPMs (idea of Tom
    Minka).

35
Conclusions and Future Work
  • Learning from infinitely many examples.
  • Truly virtual support vectors xi(µi).
  • Multiple parameters µ1, µ2,..., µD.
  • (Efficient) adaptation to kernel space.
  • Semidefinite Perceptrons (NIPS poster with A.
    Kharechko and J. Shawe-Taylor).