CS546: Machine Learning and Natural Language
Discriminative vs Generative Classifiers

Transcript and Presenter's Notes
1
CS546: Machine Learning and Natural Language
Discriminative vs Generative Classifiers
  • This lecture is based on the (Ng & Jordan, 2002) paper,
    and some slides are based on Tom Mitchell's slides
2
Outline
  • Reminder: Naive Bayes and Logistic Regression
    (MaxEnt)
  • Asymptotic analysis
  • What is better if you have an infinite dataset?
  • Non-asymptotic analysis
  • What is the rate of convergence of the parameters?
  • More important: convergence of the expected error
  • Empirical evaluation
  • Why this lecture?
  • A nice and simple application of the Large Deviation
    bounds we considered before
  • We will analyze specifically NB vs logistic
    regression, but the hope is that it generalizes to
    other models (e.g., models for sequence labeling or
    parsing)

3
Discriminative vs Generative
  • Training classifiers involves estimating f : X → Y,
    or P(Y|X)
  • Discriminative classifiers (conditional models)
  • Assume some functional form for P(Y|X)
  • Estimate the parameters of P(Y|X) directly from
    training data
  • Generative classifiers (joint models)
  • Assume some functional form for P(X|Y), P(Y)
  • Estimate the parameters of P(X|Y), P(Y) directly from
    training data
  • Use Bayes rule to calculate P(Y | X = xi)
    (see the identity below)
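
As a reminder (a standard identity, not reproduced on the slide), the Bayes rule step that turns the generative model into a conditional classifier is

    P(Y = y \mid X = x) = \frac{P(X = x \mid Y = y)\, P(Y = y)}{\sum_{y'} P(X = x \mid Y = y')\, P(Y = y')}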

4
Naive Bayes
  • Example: assume Y is boolean and X = <x1, x2, ..., xn>,
    where the xi are binary
  • Generative model: Naive Bayes
  • Classify a new example x based on the ratio
    (see the sketch below)
  • You can do it in log-scale

s indicates the size of a set; l is the smoothing parameter
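
The classification rule and the smoothed estimates on this slide are not reproduced in the transcript. A plausible reconstruction, assuming standard add-l smoothing and writing #{...} for the size of a set (as in the note above): classify x as positive iff

    \log\frac{\hat P(Y=1)}{\hat P(Y=0)}
      + \sum_{i=1}^{n} \log\frac{\hat P(x_i \mid Y=1)}{\hat P(x_i \mid Y=0)} > 0,
    \qquad
    \hat P(x_i = 1 \mid Y = y) = \frac{\#\{x_i = 1,\, Y = y\} + l}{\#\{Y = y\} + 2l}
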
5
Naive Bayes vs Logistic Regression
  • Generative model: Naive Bayes
  • Classify a new example x based on the ratio
  • Logistic Regression
  • Recall: both classifiers are linear
    (see the sketch below)
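
The slide formulas are not reproduced here; a sketch of the two models in a common notation (weights w_i and b_i, \sigma the logistic function) is

    \text{LogReg:}\quad P(Y=1 \mid x) = \sigma\Big(w_0 + \sum_{i=1}^{n} w_i x_i\Big),
    \qquad
    \text{NB:}\quad \log\frac{\hat P(Y=1 \mid x)}{\hat P(Y=0 \mid x)} = b_0 + \sum_{i=1}^{n} b_i x_i,
    \quad b_i = \log\frac{\hat\theta_{i|1}\,(1-\hat\theta_{i|0})}{\hat\theta_{i|0}\,(1-\hat\theta_{i|1})},
    \quad \hat\theta_{i|y} = \hat P(x_i = 1 \mid Y = y)

so both define linear decision boundaries in x; they differ only in how the weights are estimated.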

6
What is the difference asymptotically?
  • Notation: let ε(hA,m) denote the error of the
    hypothesis learned via algorithm A from m examples
  • If the Naive Bayes model is true: ...
  • Otherwise: ...
  • The logistic regression estimator is consistent:
  • ε(hDis,m) converges to the error of the best
    classifier in H, where H is the class of all
    linear classifiers
  • Therefore, it is asymptotically better than the
    linear classifier selected by the NB algorithm
    (see the statement below)
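
In the notation above, with hGen the naive Bayes classifier and hDis logistic regression, the asymptotic comparison the slide is stating (as in Ng & Jordan, 2002) is

    \varepsilon(h_{\mathrm{Dis},\infty}) \;\le\; \varepsilon(h_{\mathrm{Gen},\infty})

since logistic regression converges to the best linear classifier, while the linear classifier that naive Bayes converges to need not be the best one (the two asymptotic errors coincide when the naive Bayes assumptions actually hold).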

7
Rate of convergence: logistic regression
  • Converges to the best linear classifier with on the
    order of n examples (see the sketch below)
  • This follows from Vapnik's structural risk bound
    (the VC-dimension of n-dimensional linear separators
    is n + 1)
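
A sketch of the bound behind this slide (standard structural risk / VC form; constants and low-order terms omitted): with probability at least 1 - \delta over a sample of size m,

    \varepsilon(h_{\mathrm{Dis},m}) \;\le\; \varepsilon(h_{\mathrm{Dis},\infty})
        + O\!\left(\sqrt{\tfrac{n}{m}\log\tfrac{m}{n} + \tfrac{1}{m}\log\tfrac{1}{\delta}}\right)

so on the order of n examples (up to logarithmic factors) suffice to come within any fixed tolerance of the asymptotic error.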

8
Rate of convergence: Naive Bayes
  • We will proceed in two stages
  • First, consider how fast the parameters converge to
    their optimal values
  • (we do not actually care about this in itself)
  • What we do care about: deriving how this translates
    into convergence of the error to the asymptotic
    error
  • The authors also consider a continuous case (where
    the input is continuous), but it is not very
    interesting for NLP
  • However, similar techniques apply

9
Convergence of Parameters
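
The lemma on this slide is not reproduced in the transcript. Based on the proof sketched two slides below, it says roughly (a reconstruction with constants omitted, not the slide's exact wording) that for any ε, δ > 0, a training set of size

    m = O\!\left(\frac{1}{\varepsilon^2}\,\log\frac{n}{\delta}\right)

suffices for all 2n + 1 naive Bayes parameter estimates (the class prior and each \hat P(x_i = 1 \mid y)) to be within ε of their asymptotic values, with probability at least 1 - δ.
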
10
Recall Chernoff Bound
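
The bound itself is not shown in the transcript; the standard Chernoff/Hoeffding form for the empirical mean \hat p of m i.i.d. Bernoulli variables with true mean p is

    P\big(|\hat p - p| > \varepsilon\big) \;\le\; 2\exp\!\big(-2\varepsilon^2 m\big)
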
11
Recall Union Bound
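
Likewise, the union bound: for any events A_1, ..., A_k (independent or not),

    P\!\left(\bigcup_{i=1}^{k} A_i\right) \;\le\; \sum_{i=1}^{k} P(A_i)
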
12
Proof of Lemma (no smoothing, for simplicity)
  • By the Chernoff bound, with probability at least
    1 - δ the fraction of positive examples will be
    within ε of its expectation
  • Therefore we have at least a constant fraction of m
    positive and negative examples
  • By the Chernoff bound, for every feature and class
    label (2n cases), the corresponding estimate is
    accurate with high probability
  • We have one event with small failure probability and
    2n events with small failure probabilities; their
    joint failure probability is not greater than the sum
  • Solve this for m, and you get the logarithmic bound
    (see the sketch below)
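
A sketch of that last step (a reconstruction consistent with the bullets above; constants omitted, and assuming the class prior is bounded away from 0 and 1 so each class receives a constant fraction of the m examples): the total failure probability is at most

    \delta \;\le\; (2n + 1)\cdot 2\exp\!\big(-c\,\varepsilon^2 m\big)

for some constant c, and solving for m gives m = O\!\big(\tfrac{1}{\varepsilon^2}\log\tfrac{n}{\delta}\big), i.e. logarithmic in the number of features n.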

13
Implications
  • With a number of samples logarithmic in n (not
    linear, as for logistic regression!), the parameters
    of the learned NB classifier approach the parameters
    of its asymptotic version
  • Are we done?
  • Not really: this does not automatically imply that
    the error approaches the asymptotic error at the
    same rate

14
Implications
  • We need to show that the learned classifier and its
    asymptotic version often agree if their parameters
    are close
  • We compare the log-scores given by the two models,
    i.e. the learned one and the asymptotic one
    (see the sketch below)
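
The log-scores are not reproduced in the transcript; one natural way to write them (notation mine, not from the slide) is

    l_m(x) = \log\frac{\hat P_m(Y=1)\prod_i \hat P_m(x_i \mid Y=1)}{\hat P_m(Y=0)\prod_i \hat P_m(x_i \mid Y=0)},
    \qquad
    l_\infty(x) = \log\frac{P(Y=1)\prod_i P(x_i \mid Y=1)}{P(Y=0)\prod_i P(x_i \mid Y=0)}

where the hats denote parameters estimated from m examples; the two classifiers agree on x whenever l_m(x) and l_\infty(x) have the same sign.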

15
Convergence of Classifiers
  • G defines the fraction of points very close to
    the decision boundary
  • What is this fraction? See later

16
Proof of Theorem (sketch)
  • By the Lemma, with high probability the parameters
    of the learned NB model are within a small ε of
    those of its asymptotic version
  • This implies that every term in the log-score sum is
    also within a small amount of the corresponding term
    of the asymptotic log-score, and hence so is the
    whole sum (up to a factor of n)
  • Let this bound on the log-score difference define a
    margin around the decision boundary
  • So the learned and asymptotic classifiers can have
    different predictions only if the asymptotic
    log-score falls within that margin
  • The probability of this event is the fraction G
    (see the sketch below)
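
A sketch of the chain of inequalities these bullets describe (using the log-score notation introduced above; constants omitted, and assuming the asymptotic parameters are bounded away from 0 and 1 so each log-term changes by at most O(ε) when its parameter does): if every estimate is within ε of its asymptotic value, then

    |l_m(x) - l_\infty(x)| \;\le\; O\big((n+1)\,\varepsilon\big)
    \quad\Longrightarrow\quad
    \text{predictions differ only if } |l_\infty(x)| \le O\big((n+1)\,\varepsilon\big)

and the probability of drawing such an x is precisely the fraction G of points that close to the decision boundary.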

17
Convergence of Classifiers
  • G -- What is this fraction?
  • This is somewhat more difficult

18
Convergence of Classifiers
  • G -- What is this fraction?
  • This is somewhat more difficult

19
What to do with this theorem
  • This is easy to prove; no proof here, just the
    intuition
  • A fraction of the terms in the log-score sum have
    large expectation
  • Therefore, the sum also has large expectation

20
What to do with this theorem
  • But this is weaker than what we need
  • We have that the expectation is large
  • We need that the probability of small values is
    low
  • What about the Chebyshev inequality?
    (see the statement below)
  • The terms are not independent ... How to deal with
    this?
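
For reference (standard form, not from the slide), the Chebyshev inequality for a random variable Z with finite variance is

    P\big(|Z - \mathrm{E}[Z]| \ge t\big) \;\le\; \frac{\mathrm{Var}(Z)}{t^2}

the difficulty being that the terms of the log-score sum are not independent, so its variance cannot be bounded by simply summing per-term variances.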

21
Corollary from the theorem
  • Is this condition realistic?
  • Yes (e.g., we can show it holds under rather
    realistic conditions)

22
Empirical Evaluation (UCI dataset)
  • Dashed line is logistic regression
  • Solid line is Naive Bayes

23
Empirical Evaluation (UCI dataset)
  • Dashed line is logistic regression
  • Solid line is Naive Bayes

24
Empirical Evaluation (UCI dataset)
  • Dashed line is logistic regression
  • Solid line is Naive Bayes

25
Summary
  • Logistic regression has lower asymptotic error
  • ... But Naive Bayes needs less data to approach
    its asymptotic error

26
First Assignment
  • I am still checking it; I will let you know by/on
    Friday
  • Note, though:
  • Do not perform multiple tests (model selection)
    on the final test set!
  • It is a form of cheating

27
Term Project / Substitution
  • This Friday I will distribute the first phase --
    due after the Spring break
  • I will be away for the next two weeks
  • During the first week (Mar 9 - Mar 15) I will be
    slow to respond to email
  • This week I will be substituted by:
  • Active Learning (Kevin Small)
  • Indirect Supervision (Alex Klementiev)
  • Presentation by Ryan Cunningham on Friday
  • In the week of Mar 16 - Mar 23 there are no
    lectures
  • Work on the project; send questions if needed