Title: Review
Comparison of Different Classification Models
- The goal of all classifiers
  - Predict the class label y for an input x
  - Estimate p(y|x)
K Nearest Neighbor (kNN) Approach
- What is the appropriate size for the neighborhood N(x)?
  - Leave-one-out approach
- Weighted K nearest neighbor
  - The neighborhood is defined through a weight function
  - Estimate p(y|x)
  - How to estimate the appropriate value for σ²?
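A minimal sketch of the weighted-kNN estimate of p(y|x), assuming a Gaussian weight function exp(-||x - x_i||² / 2σ²) (a common choice; the slide's exact weight function is not shown here):

```python
import numpy as np

def weighted_knn_proba(X_train, y_train, x, sigma2=1.0):
    """Estimate p(y|x): weight every training example by a Gaussian
    kernel around x, then normalize the per-class weight sums."""
    w = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2.0 * sigma2))
    classes = np.unique(y_train)
    p = np.array([w[y_train == c].sum() for c in classes])
    return classes, p / p.sum()
```

Unlike plain kNN, every training example contributes, so there is no hard choice of K; the bandwidth σ² plays that role instead.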
Weighted K Nearest Neighbor
- Leave-one-out maximum likelihood
  - Estimate the leave-one-out probability
  - Compute the leave-one-out likelihood of the training data
  - Search for the optimal σ² by maximizing the leave-one-out likelihood
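The steps above can be sketched as follows (a grid search standing in for whatever search procedure the slides use, with the same Gaussian-weight assumption):

```python
import numpy as np

def loo_log_likelihood(X, y, sigma2):
    """Leave-one-out log-likelihood of the training labels under the
    weighted-kNN estimate of p(y|x)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)  # leave each point out of its own estimate
    ll = 0.0
    for i in range(len(y)):
        p_i = W[i, y == y[i]].sum() / W[i].sum()
        ll += np.log(max(p_i, 1e-300))
    return ll

def best_sigma2(X, y, grid):
    """Pick sigma^2 from a grid by maximizing the LOO likelihood."""
    return max(grid, key=lambda s2: loo_log_likelihood(X, y, s2))
```

An overly large σ² flattens the weights toward the class priors, which the leave-one-out likelihood penalizes on well-separated data.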
Gaussian Generative Model
- p(y|x) ∝ p(x|y) p(y)  (posterior ∝ likelihood × prior)
- Estimate p(x|y) and p(y)
- Allocate a separate set of parameters for each class
  - θ = {θ₁, θ₂, …, θ_c}
  - p(x|y, θ) = p(x|θ_y)
- Maximum likelihood estimation
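A sketch of the full pipeline, assuming a multivariate Gaussian for each p(x|θ_y) (the small ridge added to the covariance is a numerical safeguard, not part of the MLE):

```python
import numpy as np

def fit_gaussian_generative(X, y):
    """MLE per class: prior p(y), mean and covariance of p(x|y)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(y),
                     Xc.mean(axis=0),
                     np.cov(Xc, rowvar=False, bias=True)
                     + 1e-6 * np.eye(X.shape[1]))
    return params

def posterior(params, x):
    """p(y|x) ∝ p(x|y) p(y) via Bayes' rule, in log space."""
    scores = {}
    for c, (prior, mu, cov) in params.items():
        diff = x - mu
        # log N(x; mu, cov) up to a constant shared by all classes
        log_like = -0.5 * (np.log(np.linalg.det(cov))
                           + diff @ np.linalg.solve(cov, diff))
        scores[c] = np.log(prior) + log_like
    m = max(scores.values())
    unnorm = {c: np.exp(s - m) for c, s in scores.items()}
    Z = sum(unnorm.values())
    return {c: v / Z for c, v in unnorm.items()}
```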
Gaussian Generative Model
- Difficult to estimate p(x|y) if x is of high dimensionality
  - Naïve Bayes
  - Essentially a linear model
- How to make a Gaussian generative model discriminative?
  - (μ_m, Σ_m) of each class are based only on the data belonging to that class → lack of discriminative power
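The Naïve Bayes workaround can be sketched with per-dimension Gaussians (a standard instantiation; the slides may use a different factorized form):

```python
import numpy as np

def fit_gnb(X, y):
    """Naive Bayes: factor p(x|y) into independent 1-D Gaussians per
    dimension, avoiding a full covariance in high dimensions."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (len(Xc) / len(y), Xc.mean(axis=0), Xc.var(axis=0) + 1e-6)
    return stats

def predict_gnb(stats, x):
    """argmax_y  log p(y) + sum_d log N(x_d; mu_{y,d}, var_{y,d})."""
    best, best_score = None, -np.inf
    for c, (prior, mu, var) in stats.items():
        score = np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var)
                                             + (x - mu) ** 2 / var)
        if score > best_score:
            best, best_score = c, score
    return best
```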
Gaussian Generative Model
- Maximum likelihood estimation
- Bound optimization algorithm
- We have decomposed the interaction of parameters between different classes
- Question: how to handle x with multiple features?
Logistic Regression Model
- A linear decision boundary w·x + b
- A probabilistic model p(y|x)
- Maximum likelihood approach for estimating the weights w and threshold b
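A minimal sketch of the maximum-likelihood fit by gradient ascent (batch gradient ascent is one choice of optimizer; the slides may use another):

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit p(y=1|x) = sigmoid(w·x + b) by gradient ascent on the
    mean log-likelihood, with labels y in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w += lr * (X.T @ (y - p)) / len(y)  # gradient w.r.t. w
        b += lr * np.mean(y - p)            # gradient w.r.t. b
    return w, b
```

The gradient (y - p)·x is zero exactly when the predicted probabilities match the labels on average, which is the maximum-likelihood condition.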
Logistic Regression Model
- Overfitting issue
  - Example: text classification
  - Words that appear in only one document will be assigned infinitely large weights
- Solution: regularization
Non-linear Logistic Regression Model
- Kernelize logistic regression model
Non-linear Logistic Regression Model
- Hierarchical Mixture of Experts model
  - Group linear classifiers into a tree structure
  - Products generate nonlinearity in the prediction function
Non-linear Logistic Regression Model
- It could be a rough assumption that all data points can be fitted by a single linear model
- But it is usually appropriate to assume a locally linear model
- kNN can be viewed as a localized model without any parameters
- Can we extend the kNN approach by introducing a localized linear model?
Localized Logistic Regression Model
- Similar to weighted kNN
  - Weight each training example by the weight function
  - Build a logistic regression model using the weighted examples
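A sketch of this idea, again assuming a Gaussian kernel around the query point as the weight function:

```python
import numpy as np

def localized_logreg(X, y, x_query, sigma2=1.0, lr=0.1, epochs=500):
    """Localized logistic regression: weight each training example by a
    Gaussian kernel centered at the query, fit a weighted logistic
    model, and return p(y=1|x_query)."""
    k = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2.0 * sigma2))
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        # each example's gradient contribution is scaled by its kernel weight
        w += lr * (X.T @ (k * (y - p))) / k.sum()
        b += lr * np.sum(k * (y - p)) / k.sum()
    return 1.0 / (1.0 + np.exp(-(x_query @ w + b)))
```

Note that, like kNN, a fresh model is fitted per query, which trades training cost at prediction time for local flexibility.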
Conditional Exponential Model
- An extension of the logistic regression model to the multi-class case
- A different set of weights w_y and threshold b_y for each class y
- Translation invariance
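The model and its translation invariance in a few lines (the max-shift trick below is a standard numerical-stability device, and it works precisely because of that invariance):

```python
import numpy as np

def softmax_proba(W, b, x):
    """Conditional exponential model:
    p(y|x) = exp(w_y·x + b_y) / sum_y' exp(w_y'·x + b_y').
    Adding the same constant to every score leaves p(y|x) unchanged."""
    scores = W @ x + b
    z = np.exp(scores - scores.max())  # shift by the max for stability
    return z / z.sum()
```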
Maximum Entropy Model
- Finding the simplest model that matches the data
- Iterative scaling methods for optimization
Support Vector Machine
- Classification margin
- Maximum margin principle
  - Separate the data as far as possible from the decision boundary
- Two objectives
  - Minimize the classification error over the training data
  - Maximize the classification margin
- Support vectors
  - Only the support vectors have an impact on the location of the decision boundary
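The two objectives can be combined into a single regularized hinge loss; a subgradient-descent sketch (one of several ways to train an SVM, not necessarily the formulation on the slides):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.05):
    """Minimize  lam/2 ||w||^2 + mean(max(0, 1 - y(w·x + b)))  with
    labels y in {-1, +1}. Points with margin >= 1 contribute no
    gradient -- only margin violators (support-vector candidates)
    move the boundary."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1.0          # margin violators
        w -= lr * (lam * w - X[viol].T @ y[viol] / len(y))
        b -= lr * (-np.sum(y[viol]) / len(y))
    return w, b
```

The `lam` term maximizes the margin (smaller ||w|| means a wider margin), while the hinge term penalizes training errors.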
Support Vector Machine
- Separable case
- Noisy case
Logistic Regression Model vs. Support Vector Machine
- Logistic regression model
- Support vector machine
- Logistic regression differs from the support vector machine only in the loss function
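The two loss functions, written as functions of the margin m = y(w·x + b), make the comparison concrete:

```python
import numpy as np

def logistic_loss(m):
    """Logistic regression's loss: log(1 + e^{-m})."""
    return np.log1p(np.exp(-m))

def hinge_loss(m):
    """SVM's loss: max(0, 1 - m), exactly zero once the margin
    exceeds 1."""
    return np.maximum(0.0, 1.0 - m)
```

Both decrease with the margin; the key difference is that the hinge loss is exactly zero past margin 1, while the logistic loss never reaches zero.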
Kernel Tricks
- Introducing nonlinearity into discriminative models
- Diffusion kernel
  - A graph Laplacian L for local similarity
  - Propagates local similarity information into a global one
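A sketch of the diffusion kernel K = exp(-βL) from an adjacency matrix, computed via eigendecomposition (valid because L is symmetric; β is a diffusion-rate parameter assumed here):

```python
import numpy as np

def diffusion_kernel(A, beta=1.0):
    """Diffusion kernel K = exp(-beta * L), with L = D - A the graph
    Laplacian of adjacency matrix A. Since all eigenvalues of K are
    e^{-beta*lambda} > 0, K is positive definite."""
    L = np.diag(A.sum(axis=1)) - A
    vals, vecs = np.linalg.eigh(L)           # L is symmetric
    return (vecs * np.exp(-beta * vals)) @ vecs.T
```

Matrix exponentiation sums walks of all lengths, which is how purely local edge similarities become a global similarity.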
Fisher Kernel
- Derive a kernel function from a generative model
- Key idea
  - Map a point x in the original input space into the model space
  - The similarity of two data points is measured in the model space
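A toy sketch for a 1-D Gaussian generative model, using the identity matrix in place of the inverse Fisher information (a common simplification; the full Fisher kernel includes that normalization):

```python
import numpy as np

def fisher_score(x, mu, var):
    """Fisher score U_x = grad_theta log N(x; mu, var) for
    theta = (mu, var): the map from input space into model space."""
    return np.array([(x - mu) / var,
                     ((x - mu) ** 2 - var) / (2 * var ** 2)])

def fisher_kernel(x1, x2, mu, var):
    """Similarity measured in model space: inner product of the
    Fisher scores (identity approximation to the Fisher information)."""
    return fisher_score(x1, mu, var) @ fisher_score(x2, mu, var)
```

Two points that pull the model parameters in the same direction get a large positive kernel value; points that pull in opposite directions get a negative one.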
Kernel Methods in Generative Models
- Usually, kernels can be introduced into a generative model through a Gaussian process
  - Define a kernelized covariance matrix
  - Positive semi-definite, similar to Mercer's condition
Multi-class SVM
- SVMs can only handle two-class outputs
- One-against-all
  - Learn N SVMs
  - SVM 1 learns Output = 1 vs. Output ≠ 1
  - SVM 2 learns Output = 2 vs. Output ≠ 2
  - …
  - SVM N learns Output = N vs. Output ≠ N
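The one-against-all scheme, sketched with a simple hinge-loss subgradient SVM as the base learner (the base learner and its hyperparameters are illustrative choices):

```python
import numpy as np

def train_one_vs_all(X, y, n_classes, lam=0.01, epochs=200, lr=0.05):
    """Learn one linear SVM per class c, treating class c as +1 and
    every other class as -1."""
    models = []
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            viol = t * (X @ w + b) < 1.0  # margin violators
            w -= lr * (lam * w - X[viol].T @ t[viol] / len(t))
            b -= lr * (-np.sum(t[viol]) / len(t))
        models.append((w, b))
    return models

def predict_one_vs_all(models, x):
    """Pick the class whose SVM assigns the largest score w·x + b."""
    return int(np.argmax([w @ x + b for w, b in models]))
```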
Error-Correcting Output Codes (ECOC)
- Encode each class into a bit vector
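The decoding step can be sketched in a few lines (the 3-class, 4-bit code matrix in the test is an invented example, not the one from the slide):

```python
import numpy as np

def ecoc_decode(code_matrix, bits):
    """ECOC decoding: each row of code_matrix is one class's bit
    vector; classify by minimum Hamming distance to the bit vector
    predicted by the binary classifiers."""
    dists = np.sum(code_matrix != bits, axis=1)
    return int(np.argmin(dists))
```

With enough Hamming distance between the rows, a few misfiring binary classifiers still decode to the right class, which is the error-correcting property.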
Ordinal Regression
- A special class of multi-class classification problems
- There is a natural ordinal relationship between the classes
- Maximum margin principle
  - The computation of the margin involves multiple classes
Decision Tree
From slides of Andrew Moore
- A greedy approach for generating a decision tree
  - Choose the most informative feature
    - Using mutual information measurements
  - Split the data set according to the values of the selected feature
  - Recurse until each data item is classified correctly
- Attributes with real values
  - Quantize the real value into a discrete one
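The "most informative feature" criterion can be sketched as the mutual information between a discrete feature and the labels:

```python
import numpy as np

def entropy(y):
    """Entropy H(Y) of a discrete label vector, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(x, y):
    """Mutual information I(X; Y) = H(Y) - H(Y|X) between a discrete
    feature x and the labels y: the greedy split criterion."""
    gain = entropy(y)
    for v in np.unique(x):
        mask = x == v
        gain -= mask.mean() * entropy(y[mask])
    return gain
```

The greedy step simply evaluates `information_gain` for every candidate feature and splits on the argmax.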
Decision Tree
- The overfitting problem
- Tree pruning
  - Reduced-error pruning
  - Rule post-pruning
Generalized Decision Tree
Each node is a linear classifier
(Figures: a decision tree with simple data partition vs. a decision tree using classifiers for data partition)