Support Vector Machine (Chapters 5 & 6) - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Support Vector Machine (Chapters 5 & 6)
  • Maximum margin classifier (Chapter 6)
  • Optimisation theory (Chapter 5)
  • Soft margin hyperplane (Chapter 6)
  • Support vector regression (Chapter 6)

2
Simple Classification Problem: the Linearly Separable Case
  • Many decision boundaries can separate these two classes.
  • Which one should we choose?

[Figure: scatter plot of Class 1 and Class 2 points with several candidate decision boundaries]
3
Separating Hyperplane
  • Linearly separable data: w·x + b > 0 on the Class 2 side, w·x + b < 0 on the Class 1 side.
  • Canonical hyperplane: rescale (w, b) so that the closest points satisfy w·x + b = ±1, with the decision boundary at w·x + b = 0.

[Figure: left, a hyperplane w·x + b = 0 separating Class 1 from Class 2; right, the canonical hyperplanes w·x + b = 1 and w·x + b = -1 bounding the margin]
4
Margins
  • Support vectors: the training points that lie closest to the decision boundary.
  • Functional margin: the margin measured from the output of the function, y_i (w·x_i + b).
  • Geometric margin: the functional margin scaled by the norm of w, y_i (w·x_i + b) / ||w|| (a numeric sketch follows below).

[Figure: Class 1 and Class 2 points with the hyperplanes w·x + b = 0 and w·x + b = ±1; the support vectors lie on the ±1 hyperplanes]
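A minimal numeric sketch of the two margin notions; the weight vector, bias, and data points below are invented for illustration and are not from the slides.

```python
# Illustrative only: w, b, and the points are made up for this sketch.
import numpy as np

w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 1.0]])
y = np.array([1, 1, -1])

functional_margin = y * (X @ w + b)                        # y_i (w.x_i + b)
geometric_margin = functional_margin / np.linalg.norm(w)   # distance-based margin

print(functional_margin)   # [1. 3. 2.]
print(geometric_margin)    # approximately [0.71 2.12 1.41]
```

Rescaling (w, b) changes the functional margin but not the geometric one, which is why the canonical hyperplane fixes the functional margin of the closest points at 1.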
5
Importance of margin
Given a training point x_i, suppose the test points lie within a distance r of x_i. The hyperplane correctly classifies all such test points when r is smaller than the geometric margin, so a larger margin gives more robustness to perturbations of the inputs.
6
Error bound
The error of the maximal margin hyperplane is bounded as follows: for any distribution D on X × {-1, 1}, with probability 1 - δ over l random examples, the generalisation error is bounded in terms of d, the number of support vectors.
7
Maximum Margin = Minimum Norm
  • x⁺ and x⁻ are the nearest positive and negative data points, so w·x⁺ + b = 1 and w·x⁻ + b = -1.
  • Computing the geometric margin (to be maximised): projecting x⁺ - x⁻ onto the unit normal w/||w|| gives a margin of 1/||w|| on each side, so maximising the margin is the same as minimising ||w||.
  • And here are the constraints: y_i (w·x_i + b) ≥ 1 for every training point.

8
Maximum Margin: Summing Up
  • Given a linearly separable training set (x_i, y_i), i = 1, 2, ..., l, with y_i ∈ {1, -1}:
  • Minimise (1/2)||w||²
  • Subject to y_i (w·x_i + b) ≥ 1, i = 1, ..., l
  • This is a quadratic programming problem with linear inequality constraints; a direct solution is sketched below.
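A minimal sketch of solving exactly this primal QP with a general-purpose solver (scipy's SLSQP) on an invented toy dataset; a dedicated QP package would normally be used instead, and all names and values below are illustrative.

```python
# Hard-margin primal: minimise (1/2)||w||^2 s.t. y_i (w.x_i + b) >= 1.
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable 2-D training set: y in {+1, -1}.
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],   # class +1
              [0.0, 0.5], [0.5, 0.0], [1.0, 0.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

def objective(theta):
    w = theta[:-1]
    return 0.5 * w @ w                  # (1/2)||w||^2

def margin_constraints(theta):
    w, b = theta[:-1], theta[-1]
    return y * (X @ w + b) - 1.0        # y_i (w.x_i + b) - 1 >= 0

res = minimize(objective, x0=np.zeros(X.shape[1] + 1),
               constraints={'type': 'ineq', 'fun': margin_constraints},
               method='SLSQP')
w_opt, b_opt = res.x[:-1], res.x[-1]
print("w =", w_opt, "b =", b_opt)
print("margins:", y * (X @ w_opt + b_opt))  # all >= 1 at the optimum (up to tolerance)
```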

9
Optimisation Theory
  • Primal optimisation problem:
  • minimise f(w)  (the objective function)
  • subject to g_i(w) ≤ 0, i = 1, ..., k  (the inequality constraints)

10
Convexity
11
Primal to Dual
  • Minimise (1/2)||w||²
  • Subject to y_i (w·x_i + b) ≥ 1
  • This is difficult to solve directly in primal form because of the inequality constraints.
  • Transform from the primal to the dual problem, which is obtained by introducing Lagrange multipliers α_i ≥ 0.
  • Construct and minimise the primal Lagrangian
    L(w, b, α) = (1/2)||w||² - Σ_i α_i [y_i (w·x_i + b) - 1],
    where the α_i are the Lagrange multipliers.
12
Primal to Dual (2)
  • Find the minimum with respect to w and b by taking derivatives and setting them to 0:
    ∂L/∂w = 0  ⇒  w = Σ_i α_i y_i x_i,   ∂L/∂b = 0  ⇒  Σ_i α_i y_i = 0.
  • Plug these back into the Lagrangian to obtain the dual formulation
    W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j).

13
Primal to Dual (3)
Find the maximum of W(α) with respect to α, subject to α_i ≥ 0 and Σ_i α_i y_i = 0. The optimal α can then be found by quadratic programming (a sketch follows below).
The data enter only in the form of dot products, so kernels can be used.
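A minimal sketch of solving this dual directly, again with scipy's SLSQP and the same invented toy dataset as above; note that the objective touches the data only through the Gram matrix of dot products, which is exactly where a kernel would be substituted.

```python
# Illustrative: maximise W(a) = sum(a) - (1/2) a^T (yy^T * K) a
# subject to a_i >= 0 and sum_i a_i y_i = 0.  Toy data, not from the slides.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.5], [0.5, 0.0], [1.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

K = X @ X.T                              # Gram matrix of dot products (linear kernel)
Q = (y[:, None] * y[None, :]) * K

def neg_dual(a):                         # minimise the negative of W(a)
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(neg_dual, x0=np.zeros(len(y)), method='SLSQP',
               bounds=[(0.0, None)] * len(y),
               constraints={'type': 'eq', 'fun': lambda a: a @ y})
alpha = res.x
w = (alpha * y) @ X                      # w = sum_i a_i y_i x_i
sv = np.flatnonzero(alpha > 1e-6)        # support vectors: a_i > 0
b = np.mean(y[sv] - X[sv] @ w)           # from y_i (w.x_i + b) = 1 on the SVs
print("alpha =", np.round(alpha, 3))
print("w =", w, "b =", b)
```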
14
Why Primal and Dual are Equal ?
  • Assume (w, b) is an optimal solution of the primal with optimal objective value γ.
  • Thus, every feasible (w, b) satisfies
  • There exists α ≥ 0 such that, for all (w, b),
  • On the other hand,

15
Solving
  • In addition, putting (w, b) into the constraints,
  • with α > 0,

Karush-Kuhn-Tucker (KKT) condition: α_i [y_i (w·x_i + b) - 1] = 0 for all i.
  • Only training points whose functional margin is exactly 1 can have non-zero α_i; they are the support vectors.
  • The decision boundary is determined only by the support vectors.

Important!
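Continuing the dual sketch above (the X, y, w, b, alpha variables are from that illustrative snippet), a quick check of this complementarity condition:

```python
# Complementary slackness: alpha_i * (y_i (w.x_i + b) - 1) should be ~0 for all i,
# and only points sitting on the margin keep alpha_i > 0.
slack = y * (X @ w + b) - 1.0
print(np.round(alpha * slack, 6))     # all entries close to zero
print(np.flatnonzero(alpha > 1e-6))   # indices of the support vectors
```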
16
A Geometrical Interpretation
The α_i of the support vectors indicate how important a given training point is in forming the final solution.

[Figure: Class 1 and Class 2 points with the hyperplanes w·x + b = 0 and w·x + b = ±1; most points have α_i = 0, while the support vectors on the margin carry non-zero multipliers, e.g. α_1 = 0.8, α_6 = 1.4, α_8 = 0.6]
17
Solving
  • The parameters are expressed as a linear combination of the training points: w = Σ_i α_i y_i x_i.
  • Except in the abnormal situation where all optimal α_i are zero, b can be solved for using the KKT conditions.
  • For testing with a new data point z, compute f(z) = Σ_i α_i y_i (x_i · z) + b,
  • and classify z as class 1 if the sum is positive,
  • class 2 otherwise (a sketch follows below).
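A hedged sketch of this test-time computation using scikit-learn's SVC (assuming scikit-learn is available; the data and test point are invented): the manual kernel expansion over the support vectors matches the library's own decision_function.

```python
# Illustrative: reproduce f(z) = sum_i alpha_i y_i (x_i . z) + b by hand
# from a fitted sklearn SVC (dual_coef_ stores alpha_i * y_i).
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.5], [0.5, 0.0], [1.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # very large C ~ hard margin

z = np.array([1.5, 1.5])                      # a new test point (made up)
manual = clf.dual_coef_[0] @ (clf.support_vectors_ @ z) + clf.intercept_[0]
print(manual, clf.decision_function([z])[0])  # the two values agree
print("predicted sign:", 1 if manual > 0 else -1)
```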

18
What If the Data Are Not Linearly Separable?
  • We allow an error ξ_i in classification.

[Figure: overlapping Class 1 and Class 2 points with the hyperplanes w·x + b = 0 and w·x + b = ±1; points violating the margin incur a slack ξ_i]
19
Soft Margin Hyperplane
  • The ξ_i are just slack variables in optimisation theory, with ξ_i ≥ 0 and y_i (w·x_i + b) ≥ 1 - ξ_i.
  • We want to minimise (1/2)||w||² + C Σ_i ξ_i.
  • C is the trade-off parameter between error and margin.
20
1-Norm Soft Margin: the Box Constraint
  • The optimisation problem becomes: minimise (1/2)||w||² + C Σ_i ξ_i, subject to y_i (w·x_i + b) ≥ 1 - ξ_i and ξ_i ≥ 0.
  • Incorporating kernels and rewriting in terms of Lagrange multipliers, this leads to the dual problem:
    maximise W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j), subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.
  • The only difference from the linearly separable case is the upper bound C on the α_i (the box constraint).
  • The influence of individual patterns (which could be outliers) is thereby limited (see the check below).
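A small illustrative check of the box constraint with scikit-learn's soft-margin SVC on a synthetic overlapping dataset (all parameter values are invented): the stored dual coefficients α_i y_i never exceed C in magnitude, so no single pattern can dominate the solution.

```python
# Illustrative: with a soft margin, 0 <= alpha_i <= C, so |alpha_i * y_i| <= C.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, cluster_std=2.5, random_state=0)
for C in (0.1, 1.0, 10.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(f"C={C}: #SV={len(clf.support_)}, "
          f"max |alpha_i y_i| = {np.abs(clf.dual_coef_).max():.3f}")
```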

21
1-Norm Soft Margin: the Box Constraint (2)
  • The related KKT conditions are α_i [y_i (w·x_i + b) - 1 + ξ_i] = 0 and ξ_i (C - α_i) = 0.
  • This implies that non-zero slack variables can only occur when α_i = C.

[Figure: margin hyperplanes w·x + b = 1 and w·x + b = -1, with slack points lying inside or beyond the margin]
22
Support Vector Regression
  • ε-Insensitive Loss Regression
  • Kernel Ridge Regression

23
ε-Insensitive Loss Regression
The ε-insensitive loss ignores errors smaller than ε: L_ε(y, f(x)) = max(0, |y - f(x)| - ε).

[Figure: plot of the loss L against y - f(x), zero inside the tube |y - f(x)| ≤ ε and growing linearly outside it]
24
Quadratic ε-Insensitive Loss
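A minimal numeric sketch of the linear and quadratic ε-insensitive losses; the function names, ε value, and sample values below are illustrative only.

```python
# L_eps(y, f) = max(0, |y - f| - eps); the quadratic variant squares it.
import numpy as np

def eps_insensitive(y, f, eps=0.1):
    return np.maximum(0.0, np.abs(y - f) - eps)

def quadratic_eps_insensitive(y, f, eps=0.1):
    return eps_insensitive(y, f, eps) ** 2

y_true = np.array([1.00, 1.00, 1.00])
y_pred = np.array([1.05, 1.20, 0.70])
print(eps_insensitive(y_true, y_pred))            # approximately [0, 0.1, 0.2]
print(quadratic_eps_insensitive(y_true, y_pred))  # approximately [0, 0.01, 0.04]
```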
25
Primal function
subject to
26
Lagrangian function
  • Optimality Conditions

27
Dual Form
maximise
subject to
KKT Optimality Conditions
28
Another form
If
Subject to
29
Solving, and Generalising to the Nonlinear Case
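A hedged sketch using scikit-learn's SVR, which solves the ε-insensitive dual and handles the nonlinear case via an RBF kernel; the dataset, kernel width, C, and ε below are invented for illustration.

```python
# Illustrative: epsilon-SVR with an RBF kernel on a noisy sine curve.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80))[:, None]
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)

svr = SVR(kernel='rbf', C=10.0, epsilon=0.1, gamma=1.0).fit(X, y)
print("support vectors:", len(svr.support_), "of", len(X))
print("prediction at x=2.5:", svr.predict([[2.5]])[0])  # roughly sin(2.5) ~ 0.6
```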
30
Kernel Ridge Regression
under constraints
Lagrangian
Differentiating with respect to w and b, we obtain
31
Dual Form of Kernel Ridge Regression
dual form
under constraint
the regression function
32
Vector form of Kernel Ridge Regression
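A minimal numpy sketch of the vector (dual) form under the standard formulation: with Gram matrix K and ridge parameter λ, the dual variables are α = (K + λI)⁻¹ y and the regression function is f(z) = Σ_i α_i k(x_i, z). The dataset, kernel width, and λ are invented for illustration.

```python
# Illustrative dual-form kernel ridge regression with an RBF kernel.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 60))[:, None]
y = np.sin(X).ravel() + 0.1 * rng.normal(size=60)

lam = 0.1
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # alpha = (K + lam I)^-1 y

Z = np.array([[1.0], [2.5], [4.0]])                    # new inputs (made up)
f = rbf_kernel(Z, X) @ alpha                           # f(z) = sum_i alpha_i k(x_i, z)
print(np.round(f, 2), np.round(np.sin(Z).ravel(), 2))  # predictions vs. sin(z)
```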