Title: Linear Models I
1. Linear Models (I)
2. Review of Information Theory
- What is information?
- What is entropy?
- Average information
- Minimum coding length
- Important inequality
[Figure: distribution for generating symbols vs. distribution for coding symbols]
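A minimal Python sketch (not from the slides) of these ideas: the entropy of the generating distribution is the minimum average coding length, and coding with a mismatched distribution never does better, which is presumably the inequality the slide refers to. The distributions p and q below are illustrative.

import math

def entropy(p):
    """Entropy H(p) = -sum_i p_i log2 p_i: the minimum average coding length."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def expected_code_length(p, q):
    """Average code length when symbols drawn from p are coded with lengths -log2 q_i
    (the cross-entropy); it is never smaller than the entropy H(p)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]      # generating distribution
q = [0.25, 0.25, 0.25, 0.25]       # coding distribution
print(entropy(p))                  # 1.75 bits
print(expected_code_length(p, q))  # 2.0 bits >= 1.75 bits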
3. Review of Information Theory (cont'd)
- Mutual information
- Measures the correlation between two random variables
- Symmetric
- Kullback-Leibler (KL) distance
- Measures the difference between two distributions
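A short Python sketch (illustrative, not from the slides) of both quantities; note that the KL distance is not symmetric, while mutual information is.

import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log2(p_i / q_i); zero iff p equals q, and not symmetric."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X;Y) = KL(p(x,y) || p(x)p(y)) for a joint distribution given as a 2-D table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

p = [0.9, 0.1]
q = [0.5, 0.5]
print(kl_divergence(p, q), kl_divergence(q, p))   # asymmetric: ~0.53 vs ~0.74 bits

joint = [[0.4, 0.1],
         [0.1, 0.4]]
print(mutual_information(joint))                  # ~0.28 bits > 0: X and Y are dependent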
4. Outline
- Classification problems
- Information theory for text classification
- Gaussian generative
- Naïve Bayes
- Logistic regression
5. Classification Problems
- Given input X = (x1, x2, ..., xm)
- Predict the class label y
- y ∈ {-1, +1}: binary classification problems
- y ∈ {1, 2, 3, ..., c}: multi-class classification problems
- Goal: need to learn the function f: X → y
6. Examples of Classification Problems
- Text categorization
- Input features: words such as campaigning, efforts, Iowa, Democrats, ...
- Class label: politics vs. non-politics
- Image classification
- Input features: color histogram, texture distribution, edge distribution, ...
- Class label: bird image vs. non-bird image
7. Learning Setup for Classification Problems
- Training examples
- {(x1, y1), (x2, y2), ..., (xn, yn)}
- Independent and Identically Distributed (i.i.d.)
- Training examples are drawn from the same distribution as the testing examples
- Goal
- Find a model or a function that is consistent with the training data
8. Information Theory for Text Classification
[Figure: distribution for generating symbols vs. distribution for coding symbols]
- If the coding distribution is similar to the generating distribution → short coding length → good compression rate
9. Compression Algorithm for TC
[Figure: a new document whose true topic is Sports is compressed with Compression Model M1 (Politics), giving 16K bits, and with Compression Model M2 (Sports), giving 10K bits; the shorter code identifies the topic as Sports.]
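Below is a minimal Python sketch of this compression-based classifier, assuming per-class unigram coding models with add-one smoothing; the toy documents and the smoothing choice are illustrative, not from the slides.

import math
from collections import Counter

def train_coding_model(documents):
    """Fit a unigram coding distribution from all words of one class's documents
    (add-one smoothing so unseen words still get a finite code length)."""
    counts = Counter(word for doc in documents for word in doc.split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def code_length_bits(model, document):
    """Coding length of a document: sum of -log2 p(word) under the class model."""
    return sum(-math.log2(model(w)) for w in document.split())

# Toy corpora standing in for the Politics / Sports classes (illustrative only).
politics_model = train_coding_model(["democrats campaigning iowa", "election campaign efforts"])
sports_model = train_coding_model(["team wins game", "player scores goal in game"])

new_doc = "the team scores in the game"
bits = {"Politics": code_length_bits(politics_model, new_doc),
        "Sports": code_length_bits(sports_model, new_doc)}
print(min(bits, key=bits.get))   # the class whose model compresses the document best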
10. Probabilistic Models for Classification Problems
- Apply statistical inference methods
- Key: finding the best parameters θ
- Maximum likelihood estimation (MLE) approach
- Log-likelihood of the data
- Find the parameters θ that maximize the log-likelihood
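A small Python sketch of MLE on a toy problem (a Bernoulli coin model; the data are illustrative, not from the slides): the θ that maximizes the log-likelihood coincides with the closed-form MLE.

import math

def log_likelihood(theta, flips):
    """Log-likelihood of i.i.d. coin flips (1 = heads) under a Bernoulli(theta) model."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in flips)

flips = [1, 0, 1, 1, 0, 1, 1, 1]          # illustrative data

# Grid search over theta; the maximizer matches the closed-form MLE (the sample mean).
grid = [i / 100 for i in range(1, 100)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, flips))
print(theta_hat, sum(flips) / len(flips))  # both 0.75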
11. Generative Models
- Do not directly estimate p(y|x; θ)
- Use Bayes' rule
- Estimate p(x|y; θ) instead of p(y|x; θ)
- Why p(x|y; θ)?
- Most well-known distributions have the form p(x|θ)
- Allocate a separate set of parameters for each class: θ → {θ_1, θ_2, ..., θ_c}
- p(x|y; θ) → p(x|θ_y)
- Describes the characteristic input patterns of each class y
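A minimal Python sketch of classification by Bayes' rule: combine per-class likelihoods p(x|θ_y) with class priors p(y) to get the posterior. The likelihood functions below are hypothetical placeholders.

def posterior(x, class_priors, class_likelihoods):
    """Bayes' rule: p(y|x) = p(x|theta_y) p(y) / sum_y' p(x|theta_y') p(y').
    class_likelihoods maps each class y to a function x -> p(x | theta_y)."""
    joint = {y: class_likelihoods[y](x) * class_priors[y] for y in class_priors}
    evidence = sum(joint.values())
    return {y: v / evidence for y, v in joint.items()}

# Illustrative usage with hypothetical per-class likelihood functions:
priors = {"+1": 0.5, "-1": 0.5}
likelihoods = {"+1": lambda x: 0.8 if x > 0 else 0.2,
               "-1": lambda x: 0.3 if x > 0 else 0.7}
print(posterior(1.0, priors, likelihoods))   # {'+1': ~0.73, '-1': ~0.27}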
12. Gaussian Generative Model (I)
- Assume a Gaussian model for each class
- One-dimensional case: p(x|y) = N(x; μ_y, σ_y²)
- Results for MLE: μ̂_y is the sample mean and σ̂_y² the sample variance of the class-y examples
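A minimal Python sketch of the one-dimensional MLE result (the closed form is standard; the toy class values below are illustrative).

def fit_gaussian(values):
    """MLE for a 1-D Gaussian: the sample mean and the (biased) sample variance."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var

# Two toy classes: one Gaussian is fitted per class.
print(fit_gaussian([1.0, 1.2, 0.9, 1.1]))   # (1.05, 0.0125)
print(fit_gaussian([3.0, 3.4, 2.8, 3.2]))   # (3.1, 0.05)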
13. Example
- Height histogram for males and females
- Using a Gaussian generative model
- P(male | 1.8) = ?, P(female | 1.4) = ?
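A hypothetical worked version of this example: plug two class-conditional Gaussians and equal class priors into Bayes' rule. All numeric values here are made up for illustration, not taken from the slides.

import math

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical fitted parameters (heights in meters).
mu_m, var_m = 1.78, 0.004      # male height model
mu_f, var_f = 1.63, 0.004      # female height model
prior_m = prior_f = 0.5

def p_male_given(x):
    """Posterior P(male | x) by Bayes' rule with the two class-conditional Gaussians."""
    num = gaussian_pdf(x, mu_m, var_m) * prior_m
    den = num + gaussian_pdf(x, mu_f, var_f) * prior_f
    return num / den

print(p_male_given(1.8))       # close to 1: 1.80 m is far more likely under the male model
print(1 - p_male_given(1.4))   # P(female | 1.4): also close to 1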
14. Gaussian Generative Model (II)
- Consider multiple input features
- X = (x1, x2, ..., xm)
- Multivariate Gaussian distribution
- Σ_y is an m×m covariance matrix
- Results for MLE
- Problem
- Singularity of Σ_y; too many parameters
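A small numpy sketch showing the multivariate MLE and why Σ_y becomes singular when there are fewer examples than features; the random data are purely illustrative.

import numpy as np

def fit_multivariate_gaussian(X):
    """MLE for a multivariate Gaussian: sample mean vector and m x m sample covariance."""
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]
    return mu, sigma

# With fewer examples than features, the covariance estimate has rank at most n - 1,
# so it is singular and cannot be inverted inside the Gaussian density.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))         # 20 examples, 100 features (illustrative)
mu, sigma = fit_multivariate_gaussian(X)
print(np.linalg.matrix_rank(sigma))    # 19 < 100: singular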
15. Overfitting Issue
- Complex model
- Insufficient training data
- Consider a classification problem with multiple inputs
- 100 input features
- 5 classes
- 1000 training examples
- The total number of parameters for a full Gaussian model is
- 5 means → 500 parameters
- 5 covariance matrices → 50,000 parameters
- 50,500 parameters → insufficient training data
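A quick check of the counts above (the slide counts every entry of each full m×m covariance matrix, not just the m(m+1)/2 free entries of a symmetric matrix):

features, classes, examples = 100, 5, 1000
mean_params = classes * features             # 5 mean vectors of length 100
cov_params = classes * features * features   # 5 full 100 x 100 covariance matrices
print(mean_params, cov_params, mean_params + cov_params)
# 500 50000 50500 -- far more parameters than the 1000 training examples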
16-19. Another Example of Overfitting (figures)
20. Naïve Bayes
- Reduce the model complexity
- Restrict the covariance matrix Σ_y to be diagonal
- Simplified Gaussian distribution
- Feature independence assumption
- Naïve Bayes assumption
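A minimal numpy sketch of Gaussian Naïve Bayes under these assumptions: a diagonal covariance means one mean and one variance per feature per class, so the class-conditional density factorizes into a product of one-dimensional Gaussians. The toy data are illustrative.

import numpy as np

def fit_naive_bayes(X, y):
    """Gaussian Naive Bayes: per class, one mean and one variance per feature
    (2m parameters per class instead of m + m^2), plus a class prior."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
    return model

def log_posterior_scores(model, x):
    """log p(x|theta_c) + log p(c), with p(x|theta_c) a product of 1-D Gaussians."""
    scores = {}
    for c, (mu, var, prior) in model.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        scores[c] = log_lik + np.log(prior)
    return scores

# Illustrative 2-feature data for two classes.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [3.0, 0.5], [3.2, 0.4], [2.8, 0.7]])
y = np.array([0, 0, 0, 1, 1, 1])
model = fit_naive_bayes(X, y)
scores = log_posterior_scores(model, np.array([1.1, 2.0]))
print(max(scores, key=scores.get))   # predicts class 0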
21. Naïve Bayes
- A terrible estimator for the class-conditional density p(x|y)
- But a very reasonable estimator for the posterior p(y|x)
- Why?
- The ratio of likelihoods is what matters
- Naïve Bayes does a reasonable job of estimating this ratio
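A short derivation (standard, reconstructed rather than taken from the slides) of why only the ratio of the two class-conditional likelihoods matters for the binary decision:

\[
p(y=+1 \mid x)
= \frac{p(x \mid y=+1)\,p(y=+1)}{p(x \mid y=+1)\,p(y=+1) + p(x \mid y=-1)\,p(y=-1)}
= \frac{1}{1 + \dfrac{p(x \mid y=-1)\,p(y=-1)}{p(x \mid y=+1)\,p(y=+1)}}
\]

so the posterior, and hence the predicted label, depends on the two likelihoods only through their ratio; estimation errors in each p(x|y) that roughly cancel in the ratio do not change the decision.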
22. The Ratio of Likelihood
- Binary classification
- Both classes share the same variance
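A sketch of the one-dimensional derivation under these assumptions (reconstructed; the slide's own derivation is not reproduced here):

\[
\log \frac{p(x \mid y=+1)}{p(x \mid y=-1)}
= \frac{-(x-\mu_{+1})^2 + (x-\mu_{-1})^2}{2\sigma^2}
= \frac{\mu_{+1}-\mu_{-1}}{\sigma^2}\, x + \frac{\mu_{-1}^2 - \mu_{+1}^2}{2\sigma^2}
\]

which is linear in x: the quadratic x² terms cancel exactly because both classes share the same σ².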
23. Decision Boundary
- Gaussian generative models end up finding a linear decision boundary
- Why not find it directly?
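This question points toward logistic regression, which the outline lists next: model the posterior with a linear function of x directly. Stated here for reference (a standard result, not quoted from the slides), the shared-variance Gaussian model implies a posterior of exactly this logistic form:

\[
p(y=+1 \mid x) = \frac{1}{1 + \exp\!\big(-(w^{\top} x + b)\big)}
\]

where logistic regression learns w and b directly instead of first estimating class-conditional Gaussians.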