Title: Pattern Analysis using Convex Optimization: Part 2 of Chapter 7 Discussion
Slide 1: Pattern Analysis using Convex Optimization: Part 2 of Chapter 7 Discussion
Slide 2: About Today's Discussion
- Last time: discussed convex optimization
- Today: apply what we learned to four pattern analysis problems given in the book:
  - (1) Smallest enclosing hypersphere (one-class SVM)
  - (2) SVM classification
  - (3) Support vector regression (SVR)
  - (4) On-line classification and regression
Slide 3: About Today's Discussion
- This time, for the most part, we will:
  - Describe the problems
  - Derive the solutions ourselves on the board!
  - Apply our convex optimization knowledge to solve them
- Mostly board work today
Slide 4: Recall KKT Conditions
- What we will use today (a refresher is sketched below)
- The key to remembering Chapter 7:
- Complementary slackness -> sparse dual representation
- Convexity -> efficient global solution
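For reference, a standard statement of the KKT conditions for a convex problem min f(x) subject to g_i(x) <= 0 (sketched here from the usual definitions, not copied from the slides):

    \begin{aligned}
    \nabla f(x^*) + \sum_i \alpha_i^* \nabla g_i(x^*) &= 0 && \text{(stationarity)} \\
    g_i(x^*) &\le 0 && \text{(primal feasibility)} \\
    \alpha_i^* &\ge 0 && \text{(dual feasibility)} \\
    \alpha_i^*\, g_i(x^*) &= 0 && \text{(complementary slackness)}
    \end{aligned}

Complementary slackness forces alpha_i^* = 0 whenever g_i(x^*) < 0, which is exactly why only the active ("support") constraints appear in the dual representation.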
Slide 5: Novelty Detection Hypersphere
- Use the training data to learn the support of the distribution
- Capture it with a hypersphere
- Points falling outside are flagged as novel, abnormal, or anomalous
- A smaller sphere gives more fine-tuned novelty detection
Slide 6: First Problem: Smallest Enclosing Hypersphere
- Given a training set S
- Find the center c of the smallest hypersphere containing S
Slide 7: S.E.H. Optimization Problem
- The optimization problem (stated below)
- Let's solve it using the Lagrangian and KKT conditions, and discuss
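In symbols, a standard statement of the problem (with feature map phi induced by the kernel, as in the book's setup):

    \begin{aligned}
    \min_{c,\, r} \quad & r^2 \\
    \text{s.t.} \quad & \|\phi(x_i) - c\|^2 \le r^2, \qquad i = 1, \dots, \ell
    \end{aligned}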
Slide 8: Cheat
Slide 9: S.E.H. Solution
- Dual problem (reconstructed; this is the standard S.E.H. dual in the book's setup):

    \begin{aligned}
    \max_{\alpha} \quad & W(\alpha) = \sum_{i=1}^{\ell} \alpha_i\, \kappa(x_i, x_i) - \sum_{i,j=1}^{\ell} \alpha_i \alpha_j\, \kappa(x_i, x_j) \\
    \text{s.t.} \quad & \sum_{i=1}^{\ell} \alpha_i = 1, \qquad \alpha_i \ge 0
    \end{aligned}

- Primal solution recovered at the optimum: c^* = \sum_i \alpha_i^* \phi(x_i), with the squared radius equal to the optimal dual value (a numerical sketch follows)
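A minimal numerical sketch of this dual using cvxpy (my own toy example, not the book's code; seh_dual and the linear-kernel choice are illustrative):

    import numpy as np
    import cvxpy as cp

    def seh_dual(K):
        """Solve the smallest-enclosing-hypersphere dual for kernel matrix K."""
        n = K.shape[0]
        a = cp.Variable(n)
        # maximize sum_i a_i K_ii - a^T K a
        objective = cp.Maximize(cp.sum(cp.multiply(a, np.diag(K))) - cp.quad_form(a, K))
        problem = cp.Problem(objective, [cp.sum(a) == 1, a >= 0])
        problem.solve()
        return a.value

    X = np.random.randn(20, 2)                    # toy 2-D data
    K = X @ X.T + 1e-8 * np.eye(len(X))           # linear kernel (jitter keeps K numerically PSD)
    alpha = seh_dual(K)
    center = alpha @ X                            # c* = sum_i alpha_i x_i
    r2 = alpha @ np.diag(K) - alpha @ K @ alpha   # squared radius = optimal dual value

Only points on the sphere's surface receive alpha_i > 0, matching the complementary-slackness argument above.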
Slide 10: Theorem on the Bound of False Positives
Slide 11: Hypersphere That Only Contains Some Data (Soft Hypersphere)
- Balance missing some points against reducing the radius
- Robustness: a single point could throw off the hard sphere
- Introduce slack variables (an approach we will use repeatedly)
- Slack is 0 within the sphere, and the squared distance outside it
Slide 12: Hypersphere Optimization Problem
- Now with a trade-off between the radius and the training-point error (stated below)
- Let's derive the solution again
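With slack variables xi_i and a trade-off parameter C, the standard soft formulation reads:

    \begin{aligned}
    \min_{c,\, r,\, \xi} \quad & r^2 + C \sum_{i=1}^{\ell} \xi_i \\
    \text{s.t.} \quad & \|\phi(x_i) - c\|^2 \le r^2 + \xi_i, \qquad \xi_i \ge 0
    \end{aligned}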
Slide 13: Cheat
Slide 14: Soft Hypersphere Solution
Slide 15: Linear Kernel Example
Slide 16: Similar Theorem
Slide 17: Remarks
- If the data lies in a subspace of the feature space:
  - The hypersphere overestimates the support in the perpendicular directions
  - Kernel PCA can help (next week's discussion)
- If the data is normalized (k(x, x) = 1):
  - The problem corresponds to separating the data from the origin with a hyperplane
Slide 18: Maximal Margin Classifier
- Data and a linear classifier
- Hinge loss, with margin gamma (definitions below)
- Linearly separable if some (w, b) achieves a positive margin on every training point
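The standard definitions, for reference: with decision function g(x) = <w, phi(x)> + b, the hinge loss at functional margin gamma and the margin of the training set are

    \mathcal{L}_{\text{hinge}}\bigl(y, g(x)\bigr) = \max\bigl(0,\; \gamma - y\, g(x)\bigr),
    \qquad
    \gamma(S) = \min_{1 \le i \le \ell} y_i\, g(x_i)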
Slide 19: Margin Example
Slide 20: Typical Formulation
- The typical formulation fixes the functional margin gamma to 1 and lets the norm of w vary; since rescaling doesn't affect the decision boundary, the geometric margin is proportional to 1/||w||
- Here we instead fix ||w|| = 1 and let the functional margin gamma vary
Slide 21: Hard Margin SVM
- We arrive at the optimization problem below
- Let's solve it
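In the fixed-norm form just described, the optimization problem is:

    \begin{aligned}
    \max_{w,\, b,\, \gamma} \quad & \gamma \\
    \text{s.t.} \quad & y_i \bigl(\langle w, \phi(x_i)\rangle + b\bigr) \ge \gamma, \quad i = 1, \dots, \ell, \\
    & \|w\|^2 = 1
    \end{aligned}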
Slide 22: Cheat
Slide 23: Solution
Slide 24: Example with a Gaussian Kernel
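To play with this kind of example numerically, scikit-learn's SVC with an RBF kernel is a convenient stand-in for the board example (a sketch; the data and parameter values here are my own, not the book's):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)),   # class -1 cluster
                   rng.normal(+1.0, 0.5, (20, 2))])  # class +1 cluster
    y = np.array([-1] * 20 + [1] * 20)

    # Very large C approximates the hard margin; gamma sets the Gaussian width.
    clf = SVC(kernel="rbf", C=1e6, gamma=1.0).fit(X, y)
    print("support vectors:", clf.support_.size)     # sparsity from complementary slackness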
Slide 25: Soft Margin Classifier
- Non-separable case
- Introduce slack variables as before
- Trade off against the 1-norm of the error vector (standard formulation below)
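In the more common gamma-fixed-to-1 form (the book's fixed-norm variant differs only by the rescaling discussed on slide 20), the 1-norm soft margin problem is:

    \begin{aligned}
    \min_{w,\, b,\, \xi} \quad & \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \xi_i \\
    \text{s.t.} \quad & y_i \bigl(\langle w, \phi(x_i)\rangle + b\bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0
    \end{aligned}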
Slide 26: Solve the Soft Margin SVM
Slide 27: Soft Margin Solution
Slide 28: Soft Margin Example
Slide 29: Support Vector Regression
- A similar idea to classification, except turned inside-out
- Epsilon-insensitive loss instead of the hinge loss
- (Ridge regression, by contrast, uses the squared-error loss)
Slide 30: Support Vector Regression
- But we want to encourage sparseness
- That requires inequality constraints
- Hence the epsilon-insensitive loss
Slide 31: Epsilon-Insensitive Loss
- Defines a band around the function within which the loss is zero (in symbols below)
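In symbols, the linear epsilon-insensitive loss:

    \mathcal{L}^{\epsilon}\bigl(y, g(x)\bigr) = \max\bigl(0,\; |y - g(x)| - \epsilon\bigr)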
Slide 32: SVR (Linear Epsilon-Insensitive Loss)
- The optimization problem (below)
- Let's solve it again
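A standard statement of the primal, with one set of slacks for each side of the epsilon-band:

    \begin{aligned}
    \min_{w,\, b,\, \xi,\, \hat{\xi}} \quad & \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \bigl(\xi_i + \hat{\xi}_i\bigr) \\
    \text{s.t.} \quad & \bigl(\langle w, \phi(x_i)\rangle + b\bigr) - y_i \le \epsilon + \xi_i, \\
    & y_i - \bigl(\langle w, \phi(x_i)\rangle + b\bigr) \le \epsilon + \hat{\xi}_i, \\
    & \xi_i,\, \hat{\xi}_i \ge 0
    \end{aligned}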
Slide 33: SVR Dual and Solution
Slide 34: On-line Learning
- So far: batch learning, with all data processed at once
- Many tasks require the data to be processed one example at a time, from the start
- The learner:
  - Makes a prediction
  - Gets feedback (the correct value)
  - Updates its hypothesis
- A conservative learner updates only on non-zero loss
Slide 35: Simple On-line Algorithm: the Perceptron
- Thresholded linear function h(x) = sgn(<w, phi(x)>)
- At step t+1, the weights are updated if the prediction was in error
- Dual update rule: on a mistake on example i, alpha_i <- alpha_i + 1
- The update fires if y_i g(x_i) <= 0 (a mistake)
Slide 36: Algorithm Pseudocode
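A runnable sketch of the dual (kernel) perceptron the pseudocode describes, with zero threshold; the function and variable names here are mine:

    import numpy as np

    def kernel_perceptron(K, y, epochs=100):
        """Dual perceptron: K is the kernel matrix, y has entries in {-1, +1}."""
        n = len(y)
        alpha = np.zeros(n)
        for _ in range(epochs):
            mistakes = 0
            for i in range(n):
                # dual decision function: g(x_i) = sum_j alpha_j y_j K(x_j, x_i)
                if y[i] * ((alpha * y) @ K[:, i]) <= 0:
                    alpha[i] += 1          # conservative: update only on a mistake
                    mistakes += 1
            if mistakes == 0:              # separable case: converged (Novikoff)
                break
        return alpha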
Slide 37: Novikoff's Theorem
- A convergence bound for the hard-margin case
- If the training points are contained in a ball of radius R around the origin,
- w* is the hard-margin SVM solution (no bias) with geometric margin gamma,
- and the initial weight vector is w_0 = 0,
- then the number of updates is bounded by (R/gamma)^2
Slide 38: Proof
- The proof follows from two inequalities (spelled out below)
- Putting these together bounds the inner product against the norm
- Which leads to the bound t <= (R/gamma)^2
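The two inequalities, reconstructed from the standard proof (w_0 = 0, t counts updates, and ||w*|| = 1):

    \langle w_t, w^* \rangle \ge \langle w_{t-1}, w^* \rangle + \gamma \ge t\gamma,
    \qquad
    \|w_t\|^2 \le \|w_{t-1}\|^2 + R^2 \le t R^2

Putting these together via Cauchy-Schwarz, t\gamma \le \langle w_t, w^* \rangle \le \|w_t\| \le \sqrt{t}\, R, hence t \le (R/\gamma)^2.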
Slide 39: Kernel Adatron
- A simple modification of the perceptron that models the hard-margin SVM with zero threshold (a sketch follows)
- At convergence each alpha_i stops changing: either alpha_i is positive and the right-hand term is zero, or the right-hand term is negative (and alpha_i stays at zero), that is, the KKT conditions hold
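A sketch of the kernel Adatron loop as just described (the learning rate eta, the tolerance, and the np.clip handling of C are my choices; pass a finite C to get the 1-norm soft margin version of the next slide):

    import numpy as np

    def kernel_adatron(K, y, eta=0.1, epochs=1000, tol=1e-6, C=np.inf):
        """Hard-margin kernel Adatron with zero threshold (finite C => soft margin)."""
        n = len(y)
        alpha = np.zeros(n)
        for _ in range(epochs):
            biggest_change = 0.0
            for i in range(n):
                # "right term": gradient step on the dual objective for alpha_i
                step = eta * (1.0 - y[i] * ((alpha * y) @ K[:, i]))
                new = float(np.clip(alpha[i] + step, 0.0, C))
                biggest_change = max(biggest_change, abs(new - alpha[i]))
                alpha[i] = new
            if biggest_change < tol:   # alphas stopped changing -> KKT satisfied
                break
        return alpha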
Slide 40: Kernel Adatron: Soft Margin
- 1-norm soft margin version:
  - Add an upper bound C on the values of alpha
- 2-norm soft margin version:
  - Add a constant to the diagonal of the kernel matrix
- SMO:
  - To allow a variable threshold, updates must be made on a pair of examples at once
  - This results in the SMO algorithm
- The rate of convergence of both algorithms is sensitive to the order in which examples are processed
- Good heuristics exist, e.g. choose the points that most violate the conditions first
Slide 41: On-line Regression
- The same approach also works for the regression case
- Basic gradient ascent on the dual, with additional constraints
Slide 42: On-line SVR
Slide 43: Questions