Support Vector Machine (SVM) - PowerPoint PPT Presentation

About This Presentation
Title:

Support Vector Machine (SVM)

Description:

Support Vector Machine (SVM) Based on Nello Cristianini presentation ... Sports, news, business, science, ... Feature space. Bag of words. Huge sparse vector! ... – PowerPoint PPT presentation

Slides: 30
Provided by: yishaym4

Transcript and Presenter's Notes

Title: Support Vector Machine (SVM)


1
Support Vector Machine (SVM)
  • Based on Nello Cristianini presentation
  • http://www.support-vector.net/tutorial.html

2
Basic Idea
  • Use Linear Learning Machine (LLM).
  • Overcome the linearity constraints
  • Map non-linearly to a higher dimension.
  • Select between hyperplanes
  • Use margin as a test
  • Generalization depends on the margin.

3
General idea
[Figure: Original Problem and Transformed Problem]

4
Kernel Based Algorithms
  • Two separate learning functions
  • Learning Algorithm
  • in an embedded space
  • Kernel function
  • performs the embedding

5
Basic Example Kernel Perceptron
  • Hyperplane classification
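  • $f(x) = \langle w, x \rangle + b$ (or $\langle w, x \rangle$ with $b$ absorbed into $w$)
  • $h(x) = \mathrm{sign}(f(x))$
  • Perceptron Algorithm
  • Sample $(x_i, t_i)$, $t_i \in \{-1, +1\}$
  • IF $t_i \langle w_k, x_i \rangle < 0$ THEN /* Error */
  • $w_{k+1} = w_k + t_i x_i$
  • $k = k + 1$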
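
The update rule on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's code: the names `train_perceptron`, `predict` and `max_epochs` are invented here, the bias is handled by appending a constant 1 to each input (an assumption), and the error test uses `<= 0` rather than the slide's `< 0` so that the all-zero starting vector also triggers an update.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(samples, max_epochs=100):
    """Primal perceptron; returns (w, k) with k the number of updates made.

    samples: list of (x, t) pairs, x a tuple of floats, t in {-1, +1}.
    """
    # Assumption: append a constant 1 to every x so that <w, x> includes b.
    data = [(tuple(x) + (1.0,), t) for x, t in samples]
    w = [0.0] * len(data[0][0])
    k = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in data:
            # "<= 0" (not "< 0") so the zero start vector counts as an error.
            if t * dot(w, x) <= 0:
                w = [wi + t * xi for wi, xi in zip(w, x)]  # w_{k+1} = w_k + t_i x_i
                k += 1
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: stop
            break
    return w, k


def predict(w, x):
    """h(x) = sign(f(x)), with f(x) = <w, (x, 1)>."""
    return 1 if dot(w, tuple(x) + (1.0,)) > 0 else -1


samples = [((2.0, 1.0), 1), ((1.0, 3.0), 1),
           ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
w, k = train_perceptron(samples)
```

On this toy sample the loop stops once a full pass produces no errors; for data that is not linearly separable it would simply run until `max_epochs`.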

6
Recall
  • Margin of hyperplane w
  • Mistake bound

7
Observations
  • Solution is a linear combination of inputs
  • $w = \sum_i \alpha_i t_i x_i$
  • where $\alpha_i \ge 0$
  • Mistake driven
  • Only points on which we make a mistake influence the solution!
  • Support vectors
  • The non-zero ai

8
Dual representation
  • Rewrite basic function
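  • $f(x) = \langle w, x \rangle + b = \sum_i \alpha_i t_i \langle x_i, x \rangle + b$
  • $w = \sum_i \alpha_i t_i x_i$
  • Change update rule
  • IF $t_j \left( \sum_i \alpha_i t_i \langle x_i, x_j \rangle + b \right) < 0$
  • THEN $\alpha_j = \alpha_j + 1$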
  • Observation
  • Data only inside inner product!
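
A possible rendering of the dual update in plain Python follows. It is a sketch under stated assumptions: the function names and the `inner` parameter are invented for illustration, the bias `b` is held at 0 because the transcript gives no update rule for it, and the test uses `<= 0` so that the all-zero start counts as a mistake. The training points are touched only through `inner`, which is the slide's observation.

```python
def dot(u, v):
    """Plain inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def train_dual_perceptron(xs, ts, inner=dot, max_epochs=100):
    """Dual perceptron: one coefficient alpha_i per training point."""
    n = len(xs)
    alpha = [0] * n
    b = 0.0  # assumption: bias fixed at 0 in this sketch
    # All pairwise inner products, computed once.
    gram = [[inner(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    for _ in range(max_epochs):
        mistakes = 0
        for j in range(n):
            f_j = sum(alpha[i] * ts[i] * gram[i][j] for i in range(n)) + b
            if ts[j] * f_j <= 0:   # mistake on x_j
                alpha[j] += 1      # alpha_j = alpha_j + 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b


def decision(x, xs, ts, alpha, b, inner=dot):
    """f(x) = sum_i alpha_i t_i <x_i, x> + b."""
    return sum(a * t * inner(xi, x) for a, t, xi in zip(alpha, ts, xs)) + b


xs = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
ts = [1, 1, -1, -1]
alpha, b = train_dual_perceptron(xs, ts)
```

Here only the first point ever causes a mistake, so it is the only one with a non-zero coefficient.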

9
Limitation of Perceptron
  • Only linear separations
  • Only converges for linearly separable data
  • Only defined on vectorial data

10
The idea of a Kernel
  • Embed data to a different space
  • Possibly higher dimension
  • Linearly separable in the new space.

11
Kernel Mapping
  • Need only to compute inner-products.
  • Mapping M(x)
  • Kernel $K(x, y) = \langle M(x), M(y) \rangle$
  • Dimensionality of M(x) unimportant!
  • Need only to compute K(x,y)
  • Using it in the embedded space
  • Replace $\langle x, y \rangle$ by $K(x, y)$

12
Example
  • $x = (x_1, x_2)$, $z = (z_1, z_2)$, $K(x, z) = \langle x, z \rangle^2$
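
The rest of this slide did not survive in the transcript. One standard way to complete the example is to expand the square: (x1 z1 + x2 z2)^2 = x1^2 z1^2 + x2^2 z2^2 + 2 x1 x2 z1 z2, which equals the inner product of M(x) = (x1^2, x2^2, sqrt(2) x1 x2) with M(z). The check below is a small sketch; `feature_map` is a name chosen here, and this particular M is one valid choice rather than necessarily the one shown on the original slide.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kernel(x, z):
    """K(x, z) = <x, z>^2, computed in the original 2-D space."""
    return dot(x, z) ** 2


def feature_map(x):
    """One explicit embedding M(x) in 3-D that realises this kernel."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
lhs = kernel(x, z)                          # never leaves the input space
rhs = dot(feature_map(x), feature_map(z))   # same value via the embedding
```

Both routes give the same number up to rounding, but the kernel route never builds the 3-D vectors.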

13
Polynomial Kernel
[Figure: Original Problem and Transformed Problem]

14
Kernel Matrix
15
Example of Basic Kernels
  • Polynomial
  • $K(x, z) = \langle x, z \rangle^d$
  • Gaussian
  • $K(x, z) = \exp\left( -\|x - z\|^2 / (2\sigma) \right)$
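
As a rough illustration, the two kernels and the kernel (Gram) matrix they induce on a sample can be written as below. The function names and default parameters are assumptions of this sketch. The Gaussian divides by `2 * sigma`, as the slide's formula appears to; many texts write `2 * sigma**2` instead, which only rescales the width parameter.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(x, z, d=2):
    """K(x, z) = <x, z>^d."""
    return dot(x, z) ** d


def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma))


def gram_matrix(points, kernel):
    """Kernel matrix G with G[i][j] = K(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
G_poly = gram_matrix(points, polynomial_kernel)
G_gauss = gram_matrix(points, gaussian_kernel)
```

The Gram matrix is symmetric, and for the Gaussian kernel its diagonal is all ones because every point is at distance zero from itself.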

16
Kernel Closure Properties
  • $K(x, z) = K_1(x, z) + c$
  • $K(x, z) = c K_1(x, z)$
  • $K(x, z) = K_1(x, z) + K_2(x, z)$
  • $K(x, z) = K_1(x, z) K_2(x, z)$
  • Create new kernels using basic ones!
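
The closure rules can be spot-checked numerically. A valid kernel gives a positive semi-definite Gram matrix, so the quadratic form v^T G v should never be negative. The sketch below samples random vectors v, which is a sanity check rather than a proof. The helper names are invented here, and the constant c is assumed to be non-negative.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def k1(x, z):
    return dot(x, z)        # linear kernel


def k2(x, z):
    return dot(x, z) ** 2   # quadratic kernel


# Each helper builds a new kernel from existing ones (assumes c >= 0).
def add_const(k, c):
    return lambda x, z: k(x, z) + c


def scale(k, c):
    return lambda x, z: c * k(x, z)


def add(ka, kb):
    return lambda x, z: ka(x, z) + kb(x, z)


def mul(ka, kb):
    return lambda x, z: ka(x, z) * kb(x, z)


def gram(points, k):
    return [[k(p, q) for q in points] for p in points]


def quad_form(G, v):
    n = len(v)
    return sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


points = [(1.0, 2.0), (-1.0, 0.5), (3.0, -1.0), (0.0, 1.0)]
candidates = {
    "K1 + c": add_const(k1, 2.0),
    "c * K1": scale(k1, 3.0),
    "K1 + K2": add(k1, k2),
    "K1 * K2": mul(k1, k2),
}
rng = random.Random(0)
worst = {}
for name, k in candidates.items():
    G = gram(points, k)
    # Smallest value of v^T G v over 500 random vectors v.
    worst[name] = min(
        quad_form(G, [rng.uniform(-1.0, 1.0) for _ in points])
        for _ in range(500)
    )
```

Every entry of `worst` should be non-negative up to floating-point rounding.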

17
Support Vector Machines
  • Linear Learning Machines (LLM)
  • Use dual representation
  • Work in the kernel induced feature space
  • $f(x) = \sum_i \alpha_i t_i K(x_i, x) + b$
  • Which hyperplane to select?

18
Generalization of SVM
  • PAC theory
  • error $= O(\mathrm{VCdim} / m)$
  • Problem: $\mathrm{VCdim} \gg m$
  • No preference between consistent hyperplanes

19
Margin based bounds
  • H: basic hypothesis class
  • conv(H): finite convex combinations of H
  • D: distribution over $X \times \{+1, -1\}$
  • S: sample of size m over D

20
Margin based bounds
  • THEOREM for every f in conv(H)

21
Maximal Margin Classifier
  • Maximizes the margin
  • Minimizes the overfitting due to margin
    selection.
  • Increases margin
  • Rather than reduce dimensionality

22
SVM Support Vectors
23
Margins
  • Geometric margin: $\min_i t_i f(x_i) / \|w\|$
  • Functional margin: $\min_i t_i f(x_i)$

24
Main trick in SVM
  • Insist on a functional margin of at least 1.
  • Support vectors have margin 1.
  • Geometric margin: $1 / \|w\|$
  • Proof.
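
The proof itself is not preserved in the transcript. A standard argument, given here as a sketch and not necessarily the one on the original slide, uses the point-to-hyperplane distance formula together with the definitions $f(x) = \langle w, x \rangle + b$ and $t_i \in \{-1, +1\}$:

```latex
\begin{align*}
\text{distance of } x_i \text{ to the hyperplane}
  &= \frac{|\langle w, x_i \rangle + b|}{\|w\|}
   = \frac{t_i f(x_i)}{\|w\|}
   \quad \text{(for correctly classified } x_i\text{)} \\
\min_i t_i f(x_i) = 1
  &\;\Longrightarrow\;
  \min_i \frac{t_i f(x_i)}{\|w\|} = \frac{1}{\|w\|}
\end{align*}
```

With the functional margin fixed at 1, a larger geometric margin therefore corresponds to a smaller $\|w\|$.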

25
SVM criteria
  • Find a hyperplane (w,b)
  • That minimizes $\|w\|^2 = \langle w, w \rangle$
  • Subject to
  • for all i
  • $t_i (\langle w, x_i \rangle + b) \ge 1$

26
Quadratic Programming
  • Quadratic goal function.
  • Linear constraint.
  • Unique optimum (the problem is convex, so no local optima).
  • Polynomial time algorithms.

27
Dual Problem
  • Maximize
  • $W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i t_i \alpha_j t_j K(x_i, x_j)$
  • Subject to
  • $\sum_i \alpha_i t_i = 0$
  • $\alpha_i \ge 0$
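
To make the dual concrete, the sketch below evaluates the objective on a two-point toy problem. With one positive and one negative example, the equality constraint forces both coefficients to share a single value a, so a one-dimensional grid search is enough. The names are invented for illustration, the objective is the standard form without a bias term, and a real solver would use quadratic programming, not a grid.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, ts, xs, kernel):
    """W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i t_i alpha_j t_j K(x_i, x_j)."""
    n = len(xs)
    quad = sum(
        alpha[i] * ts[i] * alpha[j] * ts[j] * kernel(xs[i], xs[j])
        for i in range(n) for j in range(n)
    )
    return sum(alpha) - 0.5 * quad


xs = [(1.0, 1.0), (-1.0, -1.0)]
ts = [1, -1]

# sum_i alpha_i t_i = 0 forces alpha_1 = alpha_2 = a, with a >= 0.
grid = [i / 1000.0 for i in range(1001)]
best_a = max(grid, key=lambda a: dual_objective([a, a], ts, xs, dot))

# Recover w = sum_i alpha_i t_i x_i and the geometric margin 1 / ||w||.
alpha = [best_a, best_a]
w = [sum(a * t * x[d] for a, t, x in zip(alpha, ts, xs)) for d in range(2)]
margin = 1.0 / math.sqrt(dot(w, w))
```

For this data the objective reduces to 2a - 4a^2, so the maximum sits at a = 0.25, giving w = (0.5, 0.5) and a margin of sqrt(2), which is the distance from either point to the line x1 + x2 = 0.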

28
Applications Text
  • Classify a text into given categories
  • Sports, news, business, science, ...
  • Feature space
  • Bag of words
  • Huge sparse vector!

29
Applications Text
  • Practicalities
  • $M_w(x) = \mathrm{tf}_w \log(\mathrm{idf}_w) / K$
  • $\mathrm{tf}_w$: text frequency of w
  • $\mathrm{idf}_w$: inverse document frequency
  • $\mathrm{idf}_w$ = # documents / # documents with w
  • Inner product $\langle M(x), M(z) \rangle$
  • sparse vectors
  • SVM finds a hyperplane in document space
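
A small sketch of this embedding with sparse vectors stored as dictionaries is given below. The transcript does not define K; the code assumes it normalises each document vector to unit length. Whitespace tokenisation, the function names and the three toy documents are likewise illustrative choices.

```python
import math
from collections import Counter


def idf_table(docs):
    """idf_w = (# documents) / (# documents containing w)."""
    n = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc.split()))
    return {w: n / count for w, count in doc_freq.items()}


def embed(doc, idf):
    """Sparse M(x) as {word: tf_w * log(idf_w) / K}.

    Assumption: K is the Euclidean norm, so that ||M(x)|| = 1.
    """
    tf = Counter(doc.split())
    raw = {w: c * math.log(idf[w]) for w, c in tf.items() if w in idf}
    raw = {w: v for w, v in raw.items() if v != 0.0}  # keep the vector sparse
    K = math.sqrt(sum(v * v for v in raw.values())) or 1.0
    return {w: v / K for w, v in raw.items()}


def sparse_dot(u, v):
    """<M(x), M(z)>, iterating only over the shorter vector's entries."""
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v[w] for w, val in u.items() if w in v)


docs = [
    "the team won the match",
    "the market fell today",
    "the team lost the match today",
]
idf = idf_table(docs)
v0, v1, v2 = (embed(d, idf) for d in docs)
```

Words that occur in every document get log(idf) = 0 and drop out, so the first two documents, which share only "the", end up with inner product 0, while the first and third overlap on "team" and "match".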