1
Linear Models (II)
  • Rong Jin

2
Recap
  • Classification problems
  • Input x → output y
  • y is from a discrete set
  • Example: height 1.8m → male or female?
  • Statistical learning approaches for
    classification problems

(1.8m, m), (1.87, m), (1.65, f), (1.66, m), (1.58, f),
(1.63, f)
p(h|male), p(male); p(h|female), p(female)
p(male|1.8) vs. p(female|1.8)
3
Recap
  • Generative Model
  • p(y|x): determines the class y for object x
  • p(y): how frequently class y appears
  • p(x|y): the input pattern for class y
  • Example
  • 1.8m → male or female?
  • p(male|1.8m) = p(male) p(1.8m|male) / p(1.8m)
  • p(female|1.8m) = p(female) p(1.8m|female) / p(1.8m)
  • p(1.8m) = p(1.8m|male) p(male) +
    p(1.8m|female) p(female)
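A quick worked example of the Bayes rule above (a minimal Python sketch; the priors and class-conditional density values are made up for illustration, not taken from the lecture):

  # Bayes rule for the height example; the numbers below are invented.
  p_male, p_female = 0.5, 0.5             # class priors p(y)
  p_h_given_male = 1.2                    # assumed density value p(1.8m | male)
  p_h_given_female = 0.3                  # assumed density value p(1.8m | female)

  # Evidence: p(1.8m) = p(1.8m|male) p(male) + p(1.8m|female) p(female)
  p_h = p_h_given_male * p_male + p_h_given_female * p_female

  # Posteriors via Bayes rule
  print(p_h_given_male * p_male / p_h)      # p(male|1.8m)   = 0.8
  print(p_h_given_female * p_female / p_h)  # p(female|1.8m) = 0.2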

4
Recap
  • Learning p(x|y) and p(y)
  • p(y) = #examples(y) / #examples
  • Maximum likelihood estimation for p(x|y)
  • Example
  • Training examples
  • (1.8m, m), (1.87, m), (1.65, f), (1.66, m), (1.58, f),
    (1.63, f)
  • p(male) = N_male/N, p(female) = N_female/N
  • Assume that the height distributions for male and
    female are Gaussian
  • (μ_male, σ_male), (μ_female, σ_female)
  • MLE estimation (a sketch follows below)
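Below is a minimal Python sketch of this MLE step on the six training examples listed above; the variable names and the use of the biased variance estimate are my own choices, not from the slides.

  import math

  data = [(1.80, 'm'), (1.87, 'm'), (1.65, 'f'), (1.66, 'm'), (1.58, 'f'), (1.63, 'f')]
  heights = {'m': [h for h, y in data if y == 'm'],
             'f': [h for h, y in data if y == 'f']}

  # Class priors: p(y) = N_y / N
  prior = {y: len(hs) / len(data) for y, hs in heights.items()}

  # MLE for a Gaussian: sample mean and (biased) sample variance per class
  def mle_gaussian(xs):
      mu = sum(xs) / len(xs)
      var = sum((x - mu) ** 2 for x in xs) / len(xs)
      return mu, var

  params = {y: mle_gaussian(hs) for y, hs in heights.items()}

  def gaussian_pdf(x, mu, var):
      return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

  # Posterior p(male | 1.8m) via Bayes rule, using the estimated parameters
  x = 1.80
  joint = {y: prior[y] * gaussian_pdf(x, *params[y]) for y in prior}
  evidence = sum(joint.values())
  print({y: joint[y] / evidence for y in joint})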

5
Recap
6
Recap
7
Recap
  • Naïve Bayes
  • Input x is a vector x = (x1, x2, …, xm)
  • Assume the features are independent of each other
    given the class y
  • p(x|y) = p(x1|y) p(x2|y) … p(xm|y)
  • Each p(xi|y) is estimated using the MLE approach

8
Text Classification (I)
  • Learning to classify text
  • Input x: a document
  • Represented by a vector of words
  • Output y: interesting or not
  • +1 for an interesting document, -1 for an
    uninteresting one
  • Generative model for text classification (TC)
  • p(+), p(-)
  • p(doc|+), p(doc|-)
  • Naïve Bayes approach

9
Text Classification (II)
  • Learning parameters for TC
  • p(+) = n(+)/N, p(-) = n(-)/N
  • n(±): number of positive (or negative) documents
  • N: total number of documents
  • Apply MLE to estimate p(w|+) and p(w|-) (see the
    sketch below)
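A minimal Python sketch of this estimation step; the two tiny labeled documents are invented for illustration, and real training would use the whole corpus.

  from collections import Counter

  docs = [("the world people center".split(), +1),
          ("the world people company".split(), -1)]

  # Class priors: p(+) = n(+)/N, p(-) = n(-)/N
  n_pos = sum(1 for _, y in docs if y == +1)
  prior = {+1: n_pos / len(docs), -1: 1 - n_pos / len(docs)}

  # MLE for p(w|y): count of w in class y divided by the total word count of class y
  counts = {+1: Counter(), -1: Counter()}
  for words, y in docs:
      counts[y].update(words)
  p_word = {y: {w: c / sum(cnt.values()) for w, c in cnt.items()}
            for y, cnt in counts.items()}

  # Naive Bayes score for a new document: p(y) * prod_i p(x_i | y)
  def score(words, y):
      s = prior[y]
      for w in words:
          s *= p_word[y].get(w, 0.0)   # an unseen word zeroes the score (the issue raised on the next slides)
      return s

  print(score("the world center".split(), +1), score("the world center".split(), -1))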

10
Text Classification (III)
An example: the Twenty Newsgroups dataset
11
Text Classification (IV)
  • Any problems with the naïve Bayes text classifier?

12
Text Classifier (V)
  • Problems
  • Irrelevant words
  • Unseen words
  • Solution
  • Select relevant words using the mutual information
    I(x, y) (see the sketch after this list)
  • x: whether or not word x appears in a document
  • y: whether or not the document is of interest
  • Unseen words
  • Word class approach
  • Introduce word classes T = {t1, t2, …, tm}
  • Compute p(ti|+), p(ti|-)
  • When w is unseen, replace p(w|±) with p(ti|±) for
    its word class ti
  • Word correlation approach
  • Find the correlations p(w|w') between words
  • Using web information
  • p(w|±) = Σ_w' p(w|w') p(w'|±)
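As a sketch of the mutual-information selection mentioned above, the snippet below scores each word by I(x, y) computed from document counts; the four toy documents and their labels are invented for illustration.

  import math
  from collections import Counter

  docs = [({"world", "people", "center"}, +1),
          ({"world", "company", "irrigation"}, -1),
          ({"people", "center", "education"}, +1),
          ({"irrigation", "company"}, -1)]

  def mutual_information(word, docs):
      n = len(docs)
      joint = Counter((word in ws, y) for ws, y in docs)   # counts over (x, y)
      px = Counter(word in ws for ws, _ in docs)           # counts over x
      py = Counter(y for _, y in docs)                     # counts over y
      # I(x, y) = sum_{x,y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]
      return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
                 for (x, y), c in joint.items())

  # Rank candidate words; larger I(x, y) means more relevant to the class label
  vocab = {w for ws, _ in docs for w in ws}
  for w in sorted(vocab, key=lambda w: -mutual_information(w, docs)):
      print(w, round(mutual_information(w, docs), 3))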

13
Logistic Regression Model
  • The Gaussian generative model finds a linear
    decision boundary.
  • Why not learn a linear decision boundary
    directly?

14
Logistic Regression Model
  • The log-ratio of the positive class to the negative
    class (sketched below)
  • Results
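The equations themselves are not reproduced in the transcript; a sketch of the standard form, using the same w (weights) and c (threshold) notation as the later slides, is:

  log [ p(+|x) / p(-|x) ] = w·x + c
    =>  p(+|x) = 1 / (1 + exp(-(w·x + c)))
        p(-|x) = 1 / (1 + exp(+(w·x + c)))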

15
Logistic Regression Model
  • Assume the inputs and outputs are related by the
    log-linear function
  • Estimate the weights: MLE approach (a sketch follows
    below)
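A minimal Python sketch of this MLE step via gradient ascent on the log-likelihood; the learning rate, iteration count, and the toy usage data are my own choices, not from the slides.

  import math

  def sigmoid(z):
      return 1.0 / (1.0 + math.exp(-z))

  def fit_logistic(xs, ys, lr=0.1, iters=5000):
      """xs: scalar inputs; ys: labels in {+1, -1}; returns (w, c)."""
      w, c = 0.0, 0.0
      for _ in range(iters):
          gw = gc = 0.0
          for x, y in zip(xs, ys):
              # gradient of log p(y|x) = log sigmoid(y * (w*x + c))
              g = y * (1.0 - sigmoid(y * (w * x + c)))
              gw += g * x
              gc += g
          w += lr * gw / len(xs)
          c += lr * gc / len(xs)
      return w, c

  # Toy usage with the height data from the earlier recap slides (+1 = male, -1 = female)
  print(fit_logistic([1.80, 1.87, 1.65, 1.66, 1.58, 1.63], [+1, +1, -1, +1, -1, -1]))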

16
Example 1 Heart Disease
  Age group ID   1      2      3      4      5      6      7      8
  Age range      25-29  30-34  35-39  40-44  45-49  50-54  55-59  60-64
  • Input feature x: age group ID
  • Output y: having heart disease or not
  • +1: having heart disease
  • -1: no heart disease

17
Example 1 Heart Disease
  • Logistic regression model
  • Learning w and c: MLE approach
  • Numerical optimization: w = 0.58, c = -3.34

18
Example 1 Heart Disease
  • w = 0.58
  • An older person is more likely to have heart
    disease
  • c = -3.34
  • i·w + c < 0 → p(+|i) < p(-|i)
  • i·w + c > 0 → p(+|i) > p(-|i)
  • i·w + c = 0 → decision boundary at i = -c/w
  • i = 5.78 → about 53 years old

19
Naïve Bayes Solution
  • Inaccurate fitting
  • Non-Gaussian distribution
  • i = 5.59
  • Close to the estimate from logistic regression
  • Even though naïve Bayes does not fit the input
    patterns well, it still works fine for the
    decision boundary

20
Problems with Using Histogram Data?
21
Uneven Sampling for Different Ages
22
Solution
w = 0.63, c = -3.56 → i = 5.65
23
Example Text Classification
  • Input x: a binary vector
  • Each word is a different dimension
  • xi = 0 if the i-th word does not appear in the
    document
  • xi = 1 if it appears in the document
  • Output y: interesting document or not
  • +1: interesting
  • -1: uninteresting

24
Example Text Classification
Doc 1: The purpose of the Lady Bird Johnson
Wildflower Center is to educate people around the
world, …
Doc 2: Rain Bird is one of the leading irrigation
manufacturers in the world, providing complete
irrigation solutions for people …

  term    the   world   people   company   center
  Doc 1    1      1       1        0         1
  Doc 2    1      1       1        1         0
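A minimal Python sketch of the binary representation in this table; the term list comes from the table, while the simplistic tokenizer and the handling of the truncated documents are my own choices.

  import re

  terms = ["the", "world", "people", "company", "center"]

  doc1 = ("The purpose of the Lady Bird Johnson Wildflower Center is to "
          "educate people around the world")
  doc2 = ("Rain Bird is one of the leading irrigation manufacturers in the "
          "world, providing complete irrigation solutions for people")

  def to_binary_vector(text, terms):
      words = set(re.findall(r"[a-z]+", text.lower()))
      # x_i = 1 if the i-th term appears in the document, 0 otherwise
      return [1 if t in words else 0 for t in terms]

  print(to_binary_vector(doc1, terms))  # [1, 1, 1, 0, 1]
  print(to_binary_vector(doc2, terms))  # [1, 1, 1, 0, 0] -- the slide's table marks
                                        # "company" as 1 using the elided rest of Doc 2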
25
Example 2 Text Classification
  • Logistic regression model
  • Every term ti is assigned a weight wi
  • Learning the parameters: MLE approach
  • Numerical solutions are needed (a sketch of applying
    the learned weights follows below)
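As a sketch of how the learned per-term weights wi and the threshold c would be applied to a new document; the weight values below are invented for illustration, not learned from any corpus.

  import math

  weights = {"world": 0.4, "people": 0.9, "company": -1.1, "center": 0.7}
  c = -0.5   # threshold / bias

  def p_interesting(doc_terms):
      # p(+|x) = 1 / (1 + exp(-(sum_i w_i x_i + c))) for the binary term vector x
      score = c + sum(weights.get(t, 0.0) for t in set(doc_terms))
      return 1.0 / (1.0 + math.exp(-score))

  print(p_interesting(["the", "world", "people", "center"]))   # > 0.5 -> interesting
  print(p_interesting(["the", "world", "company"]))            # < 0.5 -> uninteresting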

26
Example 2 Text Classification
  • Weight wi
  • wi > 0: term ti is positive evidence
  • wi < 0: term ti is negative evidence
  • wi = 0: term ti is irrelevant to whether the
    document is interesting
  • The larger wi is, the more important the term ti is
    in determining whether the document is interesting.
  • Threshold c

27
Example 2 Text Classification
  • Dataset: Reuters-21578
  • Classification accuracy
  • Naïve Bayes: 77%
  • Logistic regression: 88%

28
Why Does Logistic Regression Work Better for Text
Classification?
  • Common words
  • Small weights in logistic regression
  • Large weights in naïve Bayes
  • Weight: p(w|+), p(w|-)
  • Independence assumption
  • Naïve Bayes assumes that each word is generated
    independently
  • Logistic regression is able to take the correlation
    of words into account

29
Comparison
  • Generative Model
  • Model P(x|y)
  • Model the input patterns
  • Usually fast convergence
  • Cheap computation
  • Robust to noisy data
  • But
  • Usually performs worse
  • Discriminative Model
  • Model P(y|x) directly
  • Model the decision boundary
  • Usually good performance
  • But
  • Slow convergence
  • Expensive computation
  • Sensitive to noisy data