Linear Models I - PowerPoint PPT Presentation

Slides: 24
Provided by: rong7
Learn more at: http://www.cse.msu.edu
Transcript and Presenter's Notes



1
Linear Models (I)
  • Rong Jin

2
Review of Information Theory
  • What is information?
  • What is entropy?
  • Average information
  • Minimum coding length
  • Important inequality

Distribution for Generating Symbols
Distribution for Coding Symbols
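The formulas behind these bullets were lost in transcription; the standard statements they refer to are:

```latex
% Entropy of a source with symbol distribution p (average information,
% and the minimum achievable average coding length in bits):
H(p) = -\sum_i p_i \log_2 p_i

% Average coding length when symbols generated from p are coded with
% lengths optimal for q (the cross-entropy):
H(p, q) = -\sum_i p_i \log_2 q_i

% The important inequality (Gibbs' inequality):
H(p, q) \ge H(p), \quad \text{with equality iff } q = p
```

That is, coding is shortest exactly when the coding distribution matches the generating distribution.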
3
Review of Information Theory (cont'd)
  • Mutual information
  • Measures the dependence between two random
    variables
  • Symmetric: I(X;Y) = I(Y;X)
  • Kullback-Leibler (KL) distance
  • Measures the difference between two distributions
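The slide's equations did not survive transcription; the standard definitions consistent with these bullets are:

```latex
% Mutual information (symmetric in X and Y):
I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} = I(Y;X)

% Kullback-Leibler divergence between distributions p and q:
D_{KL}(p\,\|\,q) = \sum_x p(x)\,\log\frac{p(x)}{q(x)} \ge 0
```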

4
Outline
  • Classification problems
  • Information theory for text classification
  • Gaussian generative
  • Naïve Bayes
  • Logistic regression

5
Classification Problems
  • Given input X = (x1, x2, …, xm)
  • Predict the class label y
  • y ∈ {−1, +1}: binary classification problems
  • y ∈ {1, 2, 3, …, c}: multi-class
    classification problems
  • Goal: learn the function f that maps X to y

6
Examples of Classification Problems
  • Text categorization
  • Input features: words ('campaigning', 'efforts',
    'Iowa', 'Democrats', …)
  • Class labels: politics and non-politics
  • Image classification
  • Input features: color histogram, texture
    distribution, edge distribution, …
  • Class labels: bird image and non-bird image

7
Learning Setup for Classification Problems
  • Training examples
  • Independent and identically distributed (i.i.d.):
    training examples are drawn from the same
    distribution as testing examples
  • Goal
  • Find a model or a function that is consistent
    with the training data

8
Information Theory for Text Classification
Distribution for Generating Symbols
Distribution for Coding Symbols
  • If the coding distribution is similar to the
    generating distribution → short coding length →
    good compression rate

9
Compression Algorithm for TC
  • A new document (true topic: Sports) is compressed
    under a model trained on each class
  • Compression model M1 (Politics): 16K bits
  • Compression model M2 (Sports): 10K bits
  • The shorter code length wins → predict Sports
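A minimal sketch of this compression-based classifier, assuming per-class unigram models with add-one smoothing; the toy documents, function names, and smoothing choice are illustrative, not from the slides:

```python
import math
from collections import Counter

def unigram_model(docs):
    """Estimate a smoothed unigram word distribution from training docs."""
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # add-one smoothing, +1 slot for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

def coding_length_bits(doc, model):
    """Bits needed to code the document under the model: -sum log2 q(w)."""
    return -sum(math.log2(model(w)) for w in doc.split())

politics = ["campaigning efforts iowa democrats", "democrats campaigning iowa"]
sports = ["game score team win", "team game score"]
m_pol = unigram_model(politics)
m_spo = unigram_model(sports)

new_doc = "team win game"
# The model that compresses the document best determines the label
label = ("sports"
         if coding_length_bits(new_doc, m_spo) < coding_length_bits(new_doc, m_pol)
         else "politics")
```

Here the Sports model assigns the shorter code length, so the new document is labeled sports, the compression analogue of picking the class with the highest likelihood.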
10
Probabilistic Models for Classification Problems
  • Apply statistical inference methods
  • Key: finding the best parameters θ
  • Maximum likelihood estimation (MLE) approach
  • Compute the log-likelihood of the data
  • Find the parameters θ that maximize the
    log-likelihood
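A small sketch of MLE for a one-dimensional Gaussian (the data values are made up for illustration): the sample mean and the biased sample variance are the parameters that maximize the log-likelihood.

```python
import math

def gaussian_log_likelihood(data, mu, var):
    """Sum of log N(x | mu, var) over the data points."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in data)

data = [1.6, 1.7, 1.8, 1.9]
# Closed-form MLE solution: sample mean and (biased) sample variance
mu_hat = sum(data) / len(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)
best = gaussian_log_likelihood(data, mu_hat, var_hat)
```

Any other choice of (mu, var) yields a log-likelihood no larger than `best`.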

11
Generative Models
  • Do not estimate p(y|x; θ) directly
  • Using Bayes' rule
  • Estimate p(x|y; θ) instead of p(y|x; θ)
  • Why p(x|y; θ)?
  • Most well-known distributions have the form p(x|θ)
  • Allocate a separate set of parameters for each
    class
  • θ → {θ1, θ2, …, θc}
  • p(x|y; θ) → p(x|θy)
  • Describes the characteristic input patterns of each
    class y
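In symbols, the Bayes'-rule step the bullets describe is:

```latex
% Generative classification: from class-conditional models to the posterior
p(y \mid x; \theta)
  = \frac{p(x \mid \theta_y)\, p(y)}
         {\sum_{y'=1}^{c} p(x \mid \theta_{y'})\, p(y')}
```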

12
Gaussian Generative Model (I)
  • Assume a Gaussian model for each class
  • One-dimensional case
  • Results for MLE

13
Example
  • Height histograms for males and females
  • Using the Gaussian generative model:
  • P(male | height = 1.8) = ?  P(female | height = 1.4) = ?
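The example above can be sketched end to end; the height samples here are hypothetical stand-ins for the slide's histogram data, and equal class priors are assumed:

```python
import math

def fit_gaussian(xs):
    """MLE for a 1-D Gaussian: sample mean and (biased) sample variance."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical height samples in metres (illustrative, not from the slides)
male = [1.70, 1.75, 1.80, 1.85, 1.90]
female = [1.50, 1.55, 1.60, 1.65, 1.70]
mu_m, var_m = fit_gaussian(male)
mu_f, var_f = fit_gaussian(female)

def p_male(height, prior_male=0.5):
    """P(male | height) via Bayes' rule."""
    lm = pdf(height, mu_m, var_m) * prior_male
    lf = pdf(height, mu_f, var_f) * (1 - prior_male)
    return lm / (lm + lf)
```

With these samples, a 1.8 m person is classified as male with high probability and a 1.4 m person as female with high probability.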

14
Gaussian Generative Model (II)
  • Consider multiple input features
  • X = (x1, x2, …, xm)
  • Multivariate Gaussian distribution
  • Σy is an m × m covariance matrix
  • Results for MLE
  • Problem
  • Singularity of Σy: too many parameters
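The multivariate density the slide relies on (its equation did not survive transcription) has the standard form:

```latex
% Class-conditional multivariate Gaussian, x \in \mathbb{R}^m:
p(x \mid \theta_y)
  = \frac{1}{(2\pi)^{m/2}\,|\Sigma_y|^{1/2}}
    \exp\!\Big(-\tfrac{1}{2}(x-\mu_y)^{\top}\Sigma_y^{-1}(x-\mu_y)\Big)
% MLE: \mu_y is the sample mean and \Sigma_y the sample covariance
% of the class-y training examples.
```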

15
Overfitting Issue
  • Complex model
  • Insufficient training data
  • Consider a classification problem with multiple
    input features
  • 100 input features
  • 5 classes
  • 1000 training examples
  • Total number of parameters for a full Gaussian
    model:
  • 5 means → 500 parameters
  • 5 covariance matrices → 50,000 parameters
  • 50,500 parameters → insufficient training data
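The parameter count above can be checked directly (counting each full covariance matrix as m² entries, as the slide does):

```python
m, c = 100, 5  # input features, classes

mean_params = c * m      # one m-dimensional mean vector per class
cov_params = c * m * m   # one full m-by-m covariance matrix per class
total = mean_params + cov_params  # 50,500 parameters vs. 1,000 examples
```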

16
Another Example of Overfitting
  • (Slides 16–19 show a figure sequence illustrating
    overfitting; the images are not in the transcript)
20
Naïve Bayes
  • Simplify the model complexity
  • Diagonalize the covariance matrix Σy
  • Simplified Gaussian distribution
  • Feature independence assumption
  • Naïve Bayes assumption

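The naïve Bayes assumption named above is the following factorization (diagonal Σy in the Gaussian case):

```latex
% Features are conditionally independent given the class:
p(x \mid y) = \prod_{j=1}^{m} p(x_j \mid y)
```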
21
Naïve Bayes
  • A terrible estimator for the class-conditional
    density p(x|y)
  • But a very reasonable estimator for the posterior
    p(y|x)
  • Why?
  • The ratio of likelihoods is what matters for
    classification
  • Naïve Bayes does a reasonable job on the
    estimation of this ratio

22
The Ratio of Likelihood
  • Binary class
  • Both classes share the same variance
  • A linear model !
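Filling in the algebra the slide's (lost) equations presumably showed, for per-feature Gaussians with a variance shared across the two classes:

```latex
\log\frac{p(y{=}{+}1 \mid x)}{p(y{=}{-}1 \mid x)}
  = \log\frac{p(x \mid {+}1)\,p({+}1)}{p(x \mid {-}1)\,p({-}1)}
  = \sum_{j=1}^{m}\frac{\mu_{+,j}-\mu_{-,j}}{\sigma_j^{2}}\,x_j + b
  = w^{\top}x + b
```

The quadratic terms in each feature cancel because the variances are shared, and b absorbs the remaining constants and the log-prior ratio, leaving a linear function of x.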

23
Decision Boundary
  • Gaussian generative models end up finding a linear
    decision boundary
  • Why not find it directly?