Pattern Analysis using Convex Optimization: Part 2 of Chapter 7 Discussion

Transcript and Presenter's Notes

Title: Pattern Analysis using Convex Optimization: Part 2 of Chapter 7 Discussion


1
Pattern Analysis using Convex Optimization Part
2 of Chapter 7 Discussion
  • Presenter Brian Quanz

2
About today's discussion
  • Last time: discussed convex optimization
  • Today: will apply what we learned to 4 pattern
    analysis problems given in the book
  • (1) Smallest enclosing hypersphere (one-class
    SVM)
  • (2) SVM classification
  • (3) Support vector regression (SVR)
  • (4) On-line classification and regression

3
About today's discussion
  • This time, for the most part:
  • Describe problems
  • Derive solutions ourselves on the board!
  • Apply convex opt. knowledge to solve
  • Mostly board work today

4
Recall KKT Conditions
  • What we will use
  • Key to remembering Ch. 7
  • Complementary slackness -> sparse dual
    representation
  • Convexity -> efficient global solution
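As a quick reference (a standard statement, not taken from the slides): for a convex problem \min_w f(w) subject to g_i(w) \le 0, the KKT conditions at an optimum w^* with multipliers \alpha_i^* are

  \nabla f(w^*) + \sum_i \alpha_i^* \nabla g_i(w^*) = 0, \quad
  g_i(w^*) \le 0, \quad \alpha_i^* \ge 0, \quad \alpha_i^*\, g_i(w^*) = 0 .

The last condition (complementary slackness) is what zeroes out most dual variables and gives the sparse representations used throughout this chapter.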

5
Novelty Detection Hypersphere
  • Training data: learn its support
  • Capture with hypersphere
  • Outside the sphere: novel, abnormal, or anomalous
  • Smaller sphere: more fine-tuned novelty detection

6
1st Smallest Enclosing Hypersphere
  • Given: a training set S
  • Find the center c of the smallest hypersphere
    containing S

7
S.E.H. Optimization Problem
  • Optimization problem (standard form reconstructed
    below)
  • Let's solve it using the Lagrangian and KKT
    conditions and discuss
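The problem on the slide is an image; a reconstruction of the standard smallest-enclosing-hypersphere primal (feature map \phi, training set S = \{x_1, \dots, x_\ell\}) is

  \min_{c,\, r} \; r^2 \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le r^2, \quad i = 1, \dots, \ell .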

8
Cheat
9
S.E.H. Solution
  • H(x) = 1 if x > 0, 0 otherwise (Heaviside step
    function)

Dual -> primal at the optimum
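For reference, the usual dual and resulting novelty detector (a reconstruction of the standard result, with kernel \kappa) are

  \max_{\alpha} \; \sum_i \alpha_i \kappa(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j \kappa(x_i, x_j)
  \quad \text{s.t.} \quad \sum_i \alpha_i = 1, \; \alpha_i \ge 0,

with center c = \sum_i \alpha_i \phi(x_i) and novelty indicator h(x) = H(\|\phi(x) - c\|^2 - r^2).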
10
Theorem on bound of false positive
11
Hypersphere that only contains some of the data: the
soft hypersphere
  • Balance missing some points against reducing the
    radius
  • Robustness: a single point could throw the sphere
    off
  • Introduce slack variables (the recurring approach)
  • Slack is 0 within the sphere, squared distance
    outside

12
Hypersphere optimization problem
  • Now with a trade-off between radius and
    training-point error
  • Let's derive the solution again (primal
    reconstructed below)
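A reconstruction of the slack-variable (soft hypersphere) primal, with C controlling the radius/error trade-off:

  \min_{c,\, r,\, \xi} \; r^2 + C \sum_i \xi_i
  \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le r^2 + \xi_i, \; \xi_i \ge 0 .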

13
Cheat
14
Soft hypersphere solution
15
Linear Kernel Example
16
Similar theorem
17
Remarks
  • If the data lies in a subspace of the feature
    space
  • The hypersphere overestimates the support in the
    perpendicular directions
  • Can use kernel PCA (next week's discussion)
  • If the data is normalized (kappa(x,x) = 1)
  • The sphere corresponds to a hyperplane separating
    the data from the origin

18
Maximal Margin Classifier
  • Data and a linear classifier
  • Hinge loss, margin gamma
  • Linearly separable if every point meets the margin
    condition below
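For concreteness (standard definitions, not verbatim from the slide): with f(x) = \langle w, \phi(x) \rangle + b, the margin-\gamma hinge loss on example (x_i, y_i) is \max(0,\; \gamma - y_i f(x_i)), and the data are linearly separable with margin \gamma iff y_i f(x_i) \ge \gamma for all i.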

19
Margin Example
20
Typical formulation
  • The typical formulation fixes gamma (the
    functional margin) to 1 and lets w vary; since
    scaling doesn't affect the decision, the geometric
    margin, proportional to 1/norm(w), varies instead.
  • Here we instead fix the norm of w and vary the
    functional margin gamma

21
Hard Margin SVM
  • Arrive at the optimization problem (reconstructed
    below)
  • Let's solve it
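The slide's problem is an image; in the fixed-norm formulation just described it reads roughly (bias term omitted for brevity)

  \max_{w,\, \gamma} \; \gamma \quad \text{s.t.} \quad y_i \langle w, \phi(x_i) \rangle \ge \gamma, \quad \|w\|^2 = 1 .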

22
Cheat
23
Solution
  • Recall

24
Example with Gaussian kernel
25
Soft Margin Classifier
  • Non-separable case: introduce slack variables as
    before
  • Trade off against the 1-norm of the error vector
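In the more common, equivalent-up-to-rescaling form (functional margin fixed to 1), the 1-norm soft margin primal is

  \min_{w,\, b,\, \xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
  \quad \text{s.t.} \quad y_i(\langle w, \phi(x_i) \rangle + b) \ge 1 - \xi_i, \; \xi_i \ge 0 .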

26
Solve Soft Margin SVM
  • Let's solve it!

27
Soft Margin Solution
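The corresponding dual (standard form; the only change from the hard-margin case is the box constraint on \alpha):

  \max_{\alpha} \; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j)
  \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \; \sum_i \alpha_i y_i = 0 .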
28
Soft Margin Example
29
Support Vector Regression
  • Similar idea to classification, except turned
    inside-out
  • Epsilon-insensitive loss instead of hinge
  • Ridge regression: squared-error loss

30
Support Vector Regression
  • But we want to encourage sparseness
  • Need inequalities
  • Epsilon-insensitive loss

31
Epsilon-insensitive
  • Defines a band around the function with zero loss
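Explicitly, the epsilon-insensitive loss is L_\varepsilon(y, f(x)) = \max(0,\; |y - f(x)| - \varepsilon), so any prediction within \pm\varepsilon of the target incurs zero loss.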

32
SVR (linear epsilon)
  • Optimization problem (standard form below)
  • Let's solve it again
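A reconstruction of the standard linear epsilon-insensitive SVR primal, with slacks \xi_i, \hat\xi_i for points above and below the band:

  \min_{w,\, b,\, \xi,\, \hat\xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_i (\xi_i + \hat\xi_i)
  \quad \text{s.t.} \quad (\langle w, \phi(x_i) \rangle + b) - y_i \le \varepsilon + \xi_i, \quad
  y_i - (\langle w, \phi(x_i) \rangle + b) \le \varepsilon + \hat\xi_i, \quad \xi_i, \hat\xi_i \ge 0 .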

33
SVR Dual and Solution
  • Dual problem
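In the standard form the dual works out to

  \max_{\alpha,\, \hat\alpha} \; \sum_i y_i (\hat\alpha_i - \alpha_i) - \varepsilon \sum_i (\hat\alpha_i + \alpha_i)
  - \tfrac{1}{2} \sum_{i,j} (\hat\alpha_i - \alpha_i)(\hat\alpha_j - \alpha_j)\, \kappa(x_i, x_j)
  \quad \text{s.t.} \quad \sum_i (\hat\alpha_i - \alpha_i) = 0, \quad 0 \le \alpha_i, \hat\alpha_i \le C,

with regression function f(x) = \sum_i (\hat\alpha_i - \alpha_i)\, \kappa(x_i, x) + b; sparseness comes from most (\alpha_i, \hat\alpha_i) pairs being zero at the optimum.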

34
Online
  • So far: batch setting, all data processed at once
  • Many tasks require data to be processed one
    example at a time from the start
  • Learner:
  • Makes a prediction
  • Gets feedback (the correct value)
  • Updates
  • Conservative: only updates on non-zero loss

35
Simple On-line Algorithm: the Perceptron
  • Threshold linear function
  • At t+1, the weight is updated if there was an
    error
  • Dual update rule
  • If the current example is misclassified, its dual
    coefficient alpha_i is incremented (sketch below)

36
Algorithm Pseudocode
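A minimal sketch of the dual (kernelized) perceptron loop described on the previous slide, assuming a precomputed kernel matrix K and labels y in {-1, +1}; the function name and defaults are illustrative, not from the book:

import numpy as np

def kernel_perceptron(K, y, max_epochs=100):
    # Dual perceptron: alpha[i] counts the mistakes made on example i.
    y = np.asarray(y, dtype=float)
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # Prediction via the kernel expansion sum_j alpha_j * y_j * k(x_j, x_i).
            if y[i] * np.dot(alpha * y, K[:, i]) <= 0:
                alpha[i] += 1  # conservative update: only on error
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: converged (separable data)
            break
    return alpha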
37
Novikoff Theorem
  • Convergence bound for hard-margin case
  • If the training points are contained in a ball of
    radius R around the origin
  • w is the hard-margin SVM solution with no bias and
    geometric margin gamma
  • Initial weight w_0 = 0
  • Number of updates bounded by (R/gamma)^2

38
Proof
  • From two inequalities (sketched below)
  • Putting these together we have
  • Which leads to the bound
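Sketch of the two inequalities (the standard Novikoff argument, with w^* the unit-norm hard-margin normal, w_t the weight vector after t updates, and w_0 = 0):

  \langle w^*, w_t \rangle \ge t\gamma, \qquad \|w_t\|^2 \le t R^2,

so t\gamma \le \langle w^*, w_t \rangle \le \|w_t\| \le \sqrt{t}\, R, which gives t \le (R/\gamma)^2.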

39
Kernel Adatron
  • A simple modification to the perceptron that
    models a hard-margin SVM with zero threshold

At convergence alpha stops changing: either alpha_i is
positive and the right-hand term is 0, or the
right-hand term is negative
40
Kernel Adatron Soft Margin
  • 1-norm soft margin version
  • Add an upper bound C to the values of alpha (see
    the sketch below)
  • 2-norm soft margin version
  • Add a constant to the diagonal of the kernel
    matrix
  • SMO
  • To allow a variable threshold, updates must be
    made on a pair of examples at once
  • This results in SMO
  • Rate of convergence of both algorithms is
    sensitive to the order of presentation
  • Good heuristics exist, e.g. choose first the
    points that most violate the conditions
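A minimal sketch of the Kernel Adatron update covering both the hard-margin case and the 1-norm soft-margin cap described above; K is a precomputed kernel matrix, y holds labels in {-1, +1}, and the function name, learning rate, and fixed epoch count are illustrative assumptions, not from the book:

import numpy as np

def kernel_adatron(K, y, eta=0.1, epochs=100, C=None):
    # C=None gives the hard-margin version; a finite C caps alpha (1-norm soft margin).
    y = np.asarray(y, dtype=float)
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            # Margin of example i under the current expansion sum_j alpha_j y_j k(x_j, x_i).
            g_i = y[i] * np.dot(alpha * y, K[:, i])
            # Gradient-ascent step, clipped at zero: alpha_i stops changing once
            # either alpha_i > 0 and 1 - g_i = 0, or 1 - g_i < 0 with alpha_i = 0.
            alpha[i] = max(0.0, alpha[i] + eta * (1.0 - g_i))
            if C is not None:
                alpha[i] = min(alpha[i], C)
    return alpha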

41
On-line regression
  • Also works for regression case
  • Basic gradient ascent with additional constraints

42
Online SVR
43
Questions
  • Questions, Comments?