Transcript and Presenter's Notes

Title: Support Vector Machines


1
Support Vector Machines
Summer Course: Data Mining
Support Vector Machines and other penalization classifiers
  • Presenter: Georgi Nalbantov
August 2009
2
Contents
  • Purpose
  • Linear Support Vector Machines
  • Nonlinear Support Vector Machines
  • (Theoretical justifications of SVM)
  • Marketing Examples
  • Other penalization classification methods
  • Conclusion and Q&A
  • (some extensions)

3
Purpose
  • Task to be solved (The Classification Task):
    classify cases (customers) into type 1 or
    type 2 on the basis of some known attributes
    (characteristics)
  • Chosen tool to solve this task: Support Vector
    Machines

4
The Classification Task
  • Given data on explanatory and explained
    variables, where the explained variable can take
    two values $\{\pm 1\}$, find a function that gives
    the best separation between the $-1$ cases and
    the $+1$ cases
  • Given: $(x_1, y_1), \dots, (x_m, y_m) \in \mathbb{R}^n \times \{\pm 1\}$
  • Find: $f \colon \mathbb{R}^n \to \{\pm 1\}$
  • Best function: the expected error on unseen
    data $(x_{m+1}, y_{m+1}), \dots, (x_{m+k}, y_{m+k})$
    is minimal
  • Existing techniques to solve the classification
    task:
  • Linear and Quadratic Discriminant Analysis
  • Logit choice models (Logistic Regression)
  • Decision trees, Neural Networks, Least Squares
    SVM

5
Support Vector Machines: Definition
  • Support Vector Machines are a non-parametric tool
    for classification/regression
  • Support Vector Machines are used for prediction
    rather than description purposes
  • Support Vector Machines have been developed by
    Vapnik and co-workers

6
Linear Support Vector Machines
  • A direct marketing company wants to sell a new
    book, "The Art History of Florence"
  • Nissan Levin and Jacob Zahavi in Lattin, Carroll
    and Green (2003)
  • Problem: how to identify buyers and non-buyers
    using two variables:
  • Months since last purchase
  • Number of art books purchased

[Figure: buyers and non-buyers plotted by number of art books purchased (vertical axis) against months since last purchase (horizontal axis)]
7
Linear SVM: Separable Case
  • Main idea of SVM: separate the groups by a line.
  • However, there are infinitely many lines that
    have zero training error:
  • which line shall we choose?

[Figure: buyers and non-buyers with several candidate separating lines; axes as before]
8
Linear SVM: Separable Case
  • SVMs use the idea of a margin around the
    separating line.
  • The thinner the margin, the more complex the
    model.
  • The best line is the one with the largest margin.

[Figure: buyers and non-buyers with the separating line and its margin; axes as before]
9
Linear SVM: Separable Case
  • The line having the largest margin is
    $w_1 x_1 + w_2 x_2 + b = 0$
  • where
  • $x_1$ = months since last purchase
  • $x_2$ = number of art books purchased
  • Note:
  • $w_1 x_{i1} + w_2 x_{i2} + b \ge +1$ for $i \in$ buyers
  • $w_1 x_{j1} + w_2 x_{j2} + b \le -1$ for $j \in$ non-buyers

[Figure: the parallel lines $w_1 x_1 + w_2 x_2 + b = +1$, $= 0$, and $= -1$, with the margin between them; $x_2$ = number of art books purchased, $x_1$ = months since last purchase]
10
Linear SVM: Separable Case
  • The width of the margin is given by
    $\dfrac{2}{\|w\|} = \dfrac{2}{\sqrt{w_1^2 + w_2^2}}$
  • Note: maximizing the margin is therefore
    equivalent to minimizing $\|w\|$

[Figure: the margin of width $2/\|w\|$ between the lines $w_1 x_1 + w_2 x_2 + b = \pm 1$; axes as before]
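The width formula follows from the distance between two parallel lines; a short derivation (not on the slide) using a point $x^+$ on the $+1$ boundary and a point $x^-$ on the $-1$ boundary:

```latex
% w.x+ + b = +1 and w.x- + b = -1, so subtracting gives w.(x+ - x-) = 2.
% Projecting (x+ - x-) onto the unit normal w/||w|| gives the distance
% between the two boundary lines:
\text{margin width} \;=\; \frac{w^\top (x^+ - x^-)}{\lVert w \rVert} \;=\; \frac{2}{\lVert w \rVert}
```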
11
Linear SVM: Separable Case
  • The optimization problem for SVM is:
    minimize $\tfrac{1}{2}\,(w_1^2 + w_2^2)$
  • subject to
  • $w_1 x_{i1} + w_2 x_{i2} + b \ge +1$ for $i \in$ buyers
  • $w_1 x_{j1} + w_2 x_{j2} + b \le -1$ for $j \in$ non-buyers

[Figure: the separating line and its margin; axes $x_1$, $x_2$]
12
Linear SVM: Separable Case
  • Support vectors are those points that lie on
    the boundaries of the margin.
  • The decision surface (line) is determined only by
    the support vectors. All other points are
    irrelevant.

[Figure: the support vectors highlighted on the margin boundaries; axes $x_1$, $x_2$]
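A minimal sketch (not part of the original slides) of this separable-case setup in Python with scikit-learn: a linear SVM fitted on made-up book-marketing data, with a very large C so the soft-margin solver behaves like the hard-margin problem above.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: columns are (months since last purchase,
# number of art books purchased); labels: +1 = buyer, -1 = non-buyer.
X = np.array([[2.0, 5.0], [3.0, 4.0], [4.0, 6.0],
              [10.0, 0.0], [12.0, 1.0], [9.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin (separable) SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # (w1, w2) of the separating line
b = clf.intercept_[0]   # intercept b
print("w =", w, " b =", b)
print("margin width =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)  # points on the margin boundaries
```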
13
Linear SVM Nonseparable Case
  • Non-separable case there is no line separating
    errorlessly the two groups
  • Here, SVM minimize L(w,C)
  • subject to
  • w1xi 1 w2xi 2 b ? 1 ?i for i ? ?
  • w1xj 1 w2xj 2 b ? 1 ?i for j ? ?
  • ?I,j ? 0

Training set: 1000 targeted customers
[Figure: buyers and non-buyers with the line $w_1 x_1 + w_2 x_2 + b = 1$, the margin, and the slack variables $\xi$ marked for points that violate it; $L(w, C)$ = Complexity + Errors; axes $x_1$, $x_2$]
14
Linear SVM: The Role of C
[Figure: the separating line and margin for C = 5; axes $x_1$, $x_2$]
  • Bigger C: increased complexity (thinner margin),
    smaller number of errors (better fit on the data)
  • Smaller C: decreased complexity (wider margin),
    bigger number of errors (worse fit on the data)
  • Vary both complexity and empirical error via C,
    which affects the optimal w and the optimal
    number of training errors

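A small illustrative sketch (not from the slides) of this trade-off, using scikit-learn on synthetic two-class data; the data set and the specific C values are made up, but the pattern is the one described above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, slightly overlapping two-class data (labels mapped to -1/+1).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)
y = 2 * y - 1

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])      # width of the margin
    errors = int(np.sum(clf.predict(X) != y))        # training errors
    print(f"C = {C:>6}: margin width = {margin:.2f}, training errors = {errors}")

# Smaller C -> wider margin (lower complexity), more training errors;
# bigger C  -> thinner margin (higher complexity), fewer training errors.
```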
15
Bias-Variance Trade-off
16
From Regression into Classification
  • We have a linear model, such as
  • We have to estimate this relation using our
    training data set and having in mind the
    so-called accuracy, or 0-1 loss function (our
    evaluation criterion).
  • The training data set we have consists of only
    finitely many observations, for instance:

[Figure: training data]
17
From Regression into Classification
  • We have a linear model, such as
  • We have to estimate this relation using our
    training data set and having in mind the
    so-called accuracy, or 0-1 loss function (our
    evaluation criterion).
  • The training data set we have consists of only
    finitely many observations, for instance:

[Figure: training data plotted against x, with y taking the values +1 and -1]
18
From Regression into Classification: Support
Vector Machines
  • flatter line → greater penalization

[Figure: regression line with its margin; y = ±1 plotted against x]
19
From Regression into Classification: Support
Vector Machines
  • flatter line → greater penalization
  • equivalently: smaller slope → bigger margin

[Figure: the same margin shown in the regression view (y against x) and in the classification view ($x_2$ against $x_1$)]
20
Nonlinear SVM: Nonseparable Case
  • Mapping into a higher-dimensional space
  • Optimization task: minimize $L(w, C)$
  • subject to the margin constraints, now written in
    the transformed (higher-dimensional) feature space

[Figure: data that are not linearly separable in the original space; axes $x_1$, $x_2$]
21
Nonlinear SVM: Nonseparable Case
  • Map the data into a higher-dimensional space,
    e.g. $\mathbb{R}^2 \to \mathbb{R}^3$

[Figure: the data before and after the mapping; original axis $x_1$, example point $(1, -1)$]
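A small sketch (not from the slides) of this idea in Python: a hypothetical explicit quadratic map from R^2 to R^3 (the map corresponding to a degree-2 polynomial kernel), after which a linear SVM can separate data that are not linearly separable in the original space.

```python
import numpy as np
from sklearn.svm import SVC

def phi(X):
    """Explicit quadratic feature map R^2 -> R^3:
    (x1, x2) -> (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

# Toy data: +1 inside a circle, -1 outside (not linearly separable in R^2).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 1.5, 1, -1)

# A linear SVM in the transformed space finds a separating hyperplane,
# which corresponds to a nonlinear boundary in the original space.
clf = SVC(kernel="linear", C=10.0).fit(phi(X), y)
print("training accuracy in the transformed space:", clf.score(phi(X), y))
```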
22
Nonlinear SVM: Nonseparable Case
  • Find the optimal hyperplane in the transformed
    space

[Figure: the separating hyperplane in the transformed space; original axis $x_1$, example point $(1, -1)$]
23
Nonlinear SVM: Nonseparable Case
  • Observe the decision surface in the original
    space (optional)

[Figure: the resulting nonlinear decision surface in the original space; axes $x_1$, $x_2$]
24
Nonlinear SVM: Nonseparable Case
  • Dual formulation of the (primal) SVM
    minimization problem

[Side-by-side formulas: the primal problem with its constraints and the dual problem with its constraints]
25
Nonlinear SVM: Nonseparable Case
  • Dual formulation of the (primal) SVM
    minimization problem

[Formula: the dual problem and its constraints, in which the data enter only through the kernel function]
26
Nonlinear SVM: Nonseparable Case
  • Dual formulation of the (primal) SVM
    minimization problem

[Formula: the dual problem and its constraints, in which the data enter only through the kernel function]
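For reference, a standard statement of the soft-margin primal problem and its dual, with $K(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$ denoting the kernel function (this is the textbook formulation, not copied from the slides):

```latex
% Primal (soft margin):
\min_{w,\,b,\,\xi}\;\; \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{m}\xi_i
\quad\text{s.t.}\quad y_i\bigl(w^\top\phi(x_i) + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0

% Dual:
\max_{\alpha}\;\; \sum_{i=1}^{m}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\, y_i y_j\, K(x_i, x_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{m}\alpha_i y_i = 0
```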
27
Strengths and Weaknesses of SVM
  • Strengths of SVM
  • Training is relatively easy
  • No local minima
  • It scales relatively well to high dimensional
    data
  • Trade-off between classifier complexity and error
    can be controlled explicitly via C
  • Robustness of the results
  • The curse of dimensionality is avoided
  • Weaknesses of SVM
  • What is the best trade-off parameter C?
  • Need a good transformation of the original space

28
The Ketchup Marketing Problem
  • Two types of ketchup: Heinz and Hunts
  • Seven attributes:
  • Feature Heinz
  • Feature Hunts
  • Display Heinz
  • Display Hunts
  • Feature + Display Heinz
  • Feature + Display Hunts
  • Log price difference between Heinz and Hunts
  • Training data: 2498 cases (89.11% Heinz is chosen)
  • Test data: 300 cases (88.33% Heinz is chosen)

29
The Ketchup Marketing Problem
  • Choose a kernel mapping:
  • Linear kernel, polynomial kernel, or RBF kernel
  • Do a (5-fold) cross-validation procedure to find
    the best combination of the manually adjustable
    parameters (here C and s)

[Table: cross-validation mean squared errors for the SVM with the RBF kernel, over a grid of C and s values ranging from min to max]
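A minimal sketch (not from the slides) of such a 5-fold cross-validation grid search in Python with scikit-learn. Note that scikit-learn parameterizes the RBF kernel by gamma = 1 / (2 s^2) rather than by the width s itself, and the X and y below are random placeholders standing in for the seven ketchup attributes and the Heinz/Hunts label.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Placeholder data: 300 cases, 7 attributes, labels +1 (Heinz) / -1 (Hunts).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 7))
y = rng.choice([-1, 1], size=300)

# Grid over the two manually adjustable parameters (C and the RBF width).
param_grid = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1e-3, 1e-2, 1e-1, 1, 10],   # plays the role of s
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```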
30
The Ketchup Marketing Problem: Training Set
31
The Ketchup Marketing Problem: Training Set
32
The Ketchup Marketing Problem: Training Set
33
The Ketchup Marketing Problem: Training Set
34
The Ketchup Marketing Problem: Test Set
35
The Ketchup Marketing Problem: Test Set
36
The Ketchup Marketing Problem: Test Set
37
  • Part II
  • Penalized classification and regression methods
  • Support Hyperplanes
  • Nearest Convex Hull classifier
  • Soft Nearest Neighbor
  • Application: an example Support Vector
    Regression financial study
  • Conclusion

38
  • Classification
  • Support Hyperplanes
  • There are infinitely many hyperplanes that are
    semi-consistent (i.e., commit no error) with the
    training data.
  • Consider a (separable) binary classification
    case: training data (+, -) and a test point x.

39
  • Classification
  • Support Hyperplanes
  • Support hyperplane of x
  • For the classification of the test point x, use
    the farthest-away h-plane that is semi-consistent
    with training data.
  • The SH decision surface. Each point on it has 2
    support h-planes.

40
  • Classification
  • Support Hyperplanes
  • Toy problem: experiment with Support Hyperplanes
    and Support Vector Machines

41
  • Classification
  • Support Vector Machines and Support Hyperplanes
  • Support Vector Machines
  • Support Hyperplanes

42
  • Classification
  • Support Vector Machines and Nearest Convex Hull
    classification
  • Support Vector Machines
  • Nearest Convex Hull classification

43
  • Classification
  • Support Vector Machines and Soft Nearest Neighbor
  • Support Vector Machines
  • Soft Nearest Neighbor

44
  • Classification: Support Hyperplanes
  • Support Hyperplanes
  • Support Hyperplanes
  • (bigger penalization)

45
  • Classification: Nearest Convex Hull classification
  • Nearest Convex Hull classification
  • Nearest Convex Hull classification
  • (bigger penalization)

46
  • Classification: Soft Nearest Neighbor
  • Soft Nearest Neighbor
  • (bigger penalization)
  • Soft Nearest Neighbor

47
  • Classification: Support Vector Machines,
  • Nonseparable Case
  • Support Vector Machines

48
  • Classification: Support Hyperplanes,
  • Nonseparable Case
  • Support Hyperplanes

49
  • Classification: Nearest Convex Hull
    classification,
  • Nonseparable Case
  • Nearest Convex Hull classification

50
  • Classification: Soft Nearest Neighbor,
  • Nonseparable Case
  • Soft Nearest Neighbor

51
Summary: Penalization Techniques for Classification
  • Penalization methods for classification: Support
    Vector Machines (SVM), Support Hyperplanes (SH),
    Nearest Convex Hull classification (NCH), and
    Soft Nearest Neighbour (SNN). In all cases, the
    classification of test point x is determined
    using the hyperplane h. Equivalently, x is
    labelled +1 (-1) if it is farther away from the
    set S_- (S_+).

52
Conclusion
  • Support Vector Machines (SVM) can be applied to
    binary and multi-class classification problems
  • SVM behave robustly in multivariate problems
  • Further research in various Marketing areas is
    needed to justify or refute the applicability of
    SVM
  • Support Vector Regressions (SVR) can also be
    applied
  • http://www.kernel-machines.org
  • Email: nalbantov_at_few.eur.nl