Title: Naïve Bayes Classifier
1. Naïve Bayes Classifier
- Adapted from slides by Ke Chen (University of Manchester) and YangQiu Song (MSRA)
2. Generative vs. Discriminative Classifiers
- Training classifiers involves estimating f: X → Y, or P(Y|X)
- Discriminative classifiers (also called informative by Rubinstein & Hastie)
  - Assume some functional form for P(Y|X)
  - Estimate parameters of P(Y|X) directly from training data
- Generative classifiers
  - Assume some functional form for P(X|Y), P(Y)
  - Estimate parameters of P(X|Y), P(Y) directly from training data
  - Use Bayes rule to calculate P(Y|X = x_i)
3. Bayes Formula
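- For a class C and observation X: P(C|X) = P(X|C) P(C) / P(X), i.e., posterior = likelihood × prior / evidence, where the evidence P(X) = Σ_c P(X|c) P(c) normalizes the posterior.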
4. Generative Model
- [Figure: the class label generates the observed features Color, Size, Texture, Weight]
5. Discriminative Model
- [Figure: the observed features Color, Size, Texture, Weight feed directly into the class label]
6. Comparison
- Generative models
  - Assume some functional form for P(X|Y), P(Y)
  - Estimate parameters of P(X|Y), P(Y) directly from training data
  - Use Bayes rule to calculate P(Y|X = x)
- Discriminative models
  - Directly assume some functional form for P(Y|X)
  - Estimate parameters of P(Y|X) directly from training data
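A minimal sketch of the contrast, assuming scikit-learn is available and using made-up 2-D toy data: GaussianNB estimates P(X|Y) and P(Y) and applies Bayes rule, while LogisticRegression fits the parameters of P(Y|X) directly.

    # Minimal sketch (assumes scikit-learn): generative vs. discriminative training
    import numpy as np
    from sklearn.naive_bayes import GaussianNB            # generative: models P(X|Y), P(Y)
    from sklearn.linear_model import LogisticRegression   # discriminative: models P(Y|X)

    rng = np.random.default_rng(0)
    # Two hypothetical classes, each a 2-D Gaussian blob
    X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)

    gen = GaussianNB().fit(X, y)             # estimates class priors and per-class feature densities
    disc = LogisticRegression().fit(X, y)    # estimates weights of P(Y|X) by maximizing conditional likelihood

    print(gen.predict_proba([[1.0, 1.0]]))   # posterior obtained via Bayes rule
    print(disc.predict_proba([[1.0, 1.0]]))  # posterior modeled directly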
7. Probability Basics
- Prior, conditional, and joint probability for random variables
  - Prior probability: P(X)
  - Conditional probability: P(X1|X2), P(X2|X1)
  - Joint probability: X = (X1, X2), P(X) = P(X1, X2)
  - Relationship: P(X1, X2) = P(X2|X1) P(X1) = P(X1|X2) P(X2)
  - Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1) P(X2)
  - Bayesian rule: P(C|X) = P(X|C) P(C) / P(X)
8. Probability Basics
- Quiz: We have two six-sided dice. When they are rolled, the following events can occur: (A) die 1 lands on side 3, (B) die 2 lands on side 1, and (C) the two dice sum to eight. Answer the following questions (a worked enumeration is sketched below).
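As a quick check of the event probabilities this quiz involves, a minimal sketch that enumerates the 36 equally likely outcomes (the helper names are illustrative):

    # Enumerate the 36 equally likely outcomes of rolling two fair dice
    from fractions import Fraction

    outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

    def prob(event):
        return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

    A = lambda o: o[0] == 3            # die 1 lands on side 3
    B = lambda o: o[1] == 1            # die 2 lands on side 1
    C = lambda o: o[0] + o[1] == 8     # the two dice sum to eight

    print(prob(A), prob(B), prob(C))                    # 1/6, 1/6, 5/36
    print(prob(lambda o: A(o) and B(o)))                # 1/36 = P(A)P(B), so A and B are independent
    print(prob(lambda o: A(o) and C(o)) / prob(C))      # P(A|C) = 1/5, not equal to P(A) = 1/6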
9. Probabilistic Classification
- Establishing a probabilistic model for classification
  - Discriminative model: P(C|X), with C = c_1, ..., c_L and X = (X_1, ..., X_n)
10. Probabilistic Classification
- Establishing a probabilistic model for classification (cont.)
  - Generative model: P(X|C), with C = c_1, ..., c_L and X = (X_1, ..., X_n)
11. Probabilistic Classification
- MAP classification rule
  - MAP: Maximum A Posteriori
  - Assign x to c* if P(C = c*|X = x) > P(C = c|X = x) for all c ≠ c*
- Generative classification with the MAP rule
  - Apply Bayes rule to convert P(X|C) and P(C) into posterior probabilities, P(C|X = x) ∝ P(X = x|C) P(C)
  - Then apply the MAP rule (a small sketch follows)
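A minimal sketch of the decision itself, with made-up priors and class-conditional likelihoods for a single observed x:

    # MAP rule: pick the class c maximizing P(x|c) * P(c), which is proportional to P(c|x)
    priors = {"c1": 0.7, "c2": 0.3}            # hypothetical P(c)
    likelihoods = {"c1": 0.02, "c2": 0.09}     # hypothetical P(x|c) for the observed x
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    print(max(scores, key=scores.get))         # -> "c2" (0.027 beats 0.014)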
12. Naïve Bayes
- Bayes classification: P(C|X) ∝ P(X|C) P(C) = P(X_1, ..., X_n|C) P(C)
  - Difficulty: learning the joint probability P(X_1, ..., X_n|C)
- Naïve Bayes classification
  - Assumption: all input attributes are conditionally independent given the class, so P(X_1, ..., X_n|C) = P(X_1|C) P(X_2|C) ··· P(X_n|C)
  - MAP classification rule: for x = (x_1, ..., x_n), assign label c* if [P(x_1|c*) ··· P(x_n|c*)] P(c*) > [P(x_1|c) ··· P(x_n|c)] P(c) for all c ≠ c*
13. Naïve Bayes
- Naïve Bayes algorithm (for discrete input attributes)
  - Learning Phase: given a training set S, estimate P(C = c_i) and P(X_j = a_jk|C = c_i) from the examples in S; output the conditional probability tables (N_j × L entries for an attribute X_j with N_j values and L classes)
  - Test Phase: given an unknown instance x' = (a_1', ..., a_n'), look up the tables and assign the label c* to x' if [P(a_1'|c*) ··· P(a_n'|c*)] P(c*) > [P(a_1'|c) ··· P(a_n'|c)] P(c) for all c ≠ c* (see the sketch below)
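A minimal sketch of the two phases for discrete attributes, assuming the training set is a list of (attribute-dict, label) pairs; all names here are illustrative, not from the slides.

    from collections import Counter, defaultdict

    def learn(examples):
        """Learning phase: estimate the class priors and the conditional probability tables."""
        class_counts = Counter(label for _, label in examples)
        tables = defaultdict(Counter)                    # (attribute, class) -> value counts
        for attrs, label in examples:
            for attr, value in attrs.items():
                tables[(attr, label)][value] += 1
        priors = {c: n / len(examples) for c, n in class_counts.items()}
        return priors, tables, class_counts

    def classify(x, priors, tables, class_counts):
        """Test phase: look up the tables and apply the MAP rule."""
        def score(c):
            s = priors[c]
            for attr, value in x.items():
                s *= tables[(attr, c)][value] / class_counts[c]   # P(x_attr = value | c); 0 if never seen
            return s
        return max(priors, key=score)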
14. Example
- Training data: 14 days of the PlayTennis problem, with attributes Outlook, Temperature, Humidity, Wind and label Play = Yes/No
15. Example
- Learning Phase: conditional probability tables estimated from the training data

Outlook      Play=Yes  Play=No
Sunny        2/9       3/5
Overcast     4/9       0/5
Rain         3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity     Play=Yes  Play=No
High         3/9       4/5
Normal       6/9       1/5

Wind         Play=Yes  Play=No
Strong       3/9       3/5
Weak         6/9       2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14
16. Example
- Test Phase
  - Given a new instance x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
  - Look up the tables:
    P(Outlook=Sunny|Play=Yes) = 2/9       P(Outlook=Sunny|Play=No) = 3/5
    P(Temperature=Cool|Play=Yes) = 3/9    P(Temperature=Cool|Play=No) = 1/5
    P(Humidity=High|Play=Yes) = 3/9       P(Humidity=High|Play=No) = 4/5
    P(Wind=Strong|Play=Yes) = 3/9         P(Wind=Strong|Play=No) = 3/5
    P(Play=Yes) = 9/14                    P(Play=No) = 5/14
  - MAP rule:
    P(Yes|x) ∝ P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes) P(Play=Yes) ≈ 0.0053
    P(No|x) ∝ P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No) P(Play=No) ≈ 0.0206
    Since P(Yes|x) < P(No|x), we label x as No.
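The two products can be checked directly; a quick sketch mirroring the numbers above:

    # Unnormalized MAP scores for x = (Sunny, Cool, High, Strong)
    p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)    # lookups under Play=Yes
    p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)    # lookups under Play=No
    print(round(p_yes, 4), round(p_no, 4))            # 0.0053 0.0206 -> predict No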
18. Relevant Issues
- Violation of the independence assumption
  - For many real-world tasks, P(X_1, ..., X_n|C) ≠ P(X_1|C) ··· P(X_n|C)
  - Nevertheless, naïve Bayes works surprisingly well anyway!
- Zero conditional probability problem
  - If no training example of class c_i contains the attribute value X_j = a_jk, the estimated P(X_j = a_jk|c_i) = 0
  - In this circumstance, the whole product P(x_1|c_i) ··· P(a_jk|c_i) ··· P(x_n|c_i) = 0 during test
  - For a remedy, conditional probabilities can be estimated with the m-estimate P(X_j = a_jk|c_i) = (n_c + m·p) / (n + m), where n_c counts examples with X_j = a_jk and C = c_i, n counts examples with C = c_i, p is a prior estimate, and m weights the prior; Laplace (add-one) smoothing is the special case p = 1/t, m = t for t possible values (sketched below)
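A minimal sketch of the add-one special case, applied to the Outlook=Overcast, Play=No cell that is zero in the tables above (the function name is illustrative):

    # Add-one (Laplace) smoothing: never let an unseen attribute value zero out the whole product
    def smoothed_prob(count_value_in_class, count_class, n_values):
        # (n_c + 1) / (n + t): the m-estimate with p = 1/t and m = t
        return (count_value_in_class + 1) / (count_class + n_values)

    # Outlook=Overcast never occurs with Play=No (0 of 5 examples; Outlook has 3 possible values)
    print(smoothed_prob(0, 5, 3))    # 0.125 instead of 0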
19. Relevant Issues
- Continuous-valued input attributes
  - Numberless (infinitely many) possible values for an attribute
  - Conditional probability is modeled with the normal distribution: P(X_j|C = c_i) = (1 / (√(2π) σ_ji)) exp(−(X_j − μ_ji)² / (2σ_ji²))
  - Learning Phase: for X = (X_1, ..., X_n) and C = c_1, ..., c_L, output n × L normal distributions (a mean μ_ji and standard deviation σ_ji per attribute per class) and the priors P(C = c_i)
  - Test Phase: calculate conditional probabilities with all the normal distributions, then apply the MAP rule to make a decision (see the sketch below)
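A minimal sketch of both phases for continuous attributes, assuming numeric features held in NumPy arrays; the tiny dataset at the end is made up purely for illustration.

    import numpy as np

    def fit_gaussian_nb(X, y):
        """Learning phase: per-class prior plus a mean and standard deviation per attribute."""
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            params[c] = (len(Xc) / len(X),           # prior P(c)
                         Xc.mean(axis=0),            # mu_ji for each attribute j
                         Xc.std(axis=0) + 1e-9)      # sigma_ji for each attribute j (small floor)
        return params

    def predict(params, x):
        """Test phase: evaluate the normal densities (in log space) and apply the MAP rule."""
        def log_score(prior, mu, sigma):
            log_pdf = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
            return np.log(prior) + log_pdf.sum()
        return max(params, key=lambda c: log_score(*params[c]))

    X = np.array([[6.0, 180.0], [5.9, 190.0], [5.5, 100.0], [5.4, 110.0]])
    y = np.array([0, 0, 1, 1])
    print(predict(fit_gaussian_nb(X, y), np.array([5.6, 130.0])))   # -> 1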
20. Conclusions
- Naïve Bayes is based on the independence assumption
  - Training is very easy and fast: it just requires considering each attribute in each class separately
  - Testing is straightforward: just look up tables or calculate conditional probabilities with the normal distributions
- A popular generative model
  - Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
  - Many successful applications, e.g., spam mail filtering
  - A good candidate as a base learner in ensemble learning
  - Apart from classification, naïve Bayes can do more
21. Extra Slides
22. Naïve Bayes (1)
- Revisit: P(y|x_1, ..., x_n) = P(x_1, ..., x_n|y) P(y) / P(x_1, ..., x_n)
- Which is equal to P(x_1, ..., x_n, y) / P(x_1, ..., x_n)
- Naïve Bayes assumes conditional independence: P(x_i|y, x_1, ..., x_{i−1}) = P(x_i|y)
- Then the inference of the posterior is P(y|x_1, ..., x_n) ∝ P(y) ∏_i P(x_i|y)
23. Naïve Bayes (2)
- Training: the observations are multinomial; supervised, with label information
  - Maximum Likelihood Estimation (MLE)
  - Maximum a Posteriori (MAP): put a Dirichlet prior on the parameters
- Classification
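For reference (standard multinomial estimates; the notation here is an assumption, not taken from the slides): the MLE of P(x_i = v|y = c) is count(x_i = v, y = c) / count(y = c); with a symmetric Dirichlet(α) prior over the V possible values, the MAP estimate becomes (count(x_i = v, y = c) + α − 1) / (count(y = c) + V(α − 1)), so α = 2 recovers add-one smoothing. Classification then applies the MAP rule argmax_c P(y = c) ∏_i P(x_i|y = c).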
24. Naïve Bayes (3)
- What if we have continuous X_i?
- Generative training: estimate a class-conditional Gaussian (mean and variance) for each attribute within each class, plus the class priors
- Prediction: evaluate the class-conditional densities at the observed values and apply the MAP rule
25. Naïve Bayes (4)
- Problems
  - Features may overlap
  - Features may not be independent
    - e.g., the size and weight of a tiger
  - It uses a joint distribution estimation (P(X|Y), P(Y)) to solve a conditional problem (P(Y|X = x))
- Can we discriminatively train?
  - Logistic regression
  - Regularization
  - Gradient ascent (a small sketch follows)
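A minimal sketch of that discriminative alternative: binary logistic regression trained by gradient ascent on an L2-regularized log-likelihood (every name, constant, and the tiny dataset are illustrative assumptions, not from the slides).

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_logreg(X, y, lr=0.1, reg=0.01, epochs=500):
        """Gradient ascent on the regularized log-likelihood of P(y|x)."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            p = sigmoid(X @ w + b)                   # current estimate of P(y = 1 | x)
            grad_w = X.T @ (y - p) - reg * w         # gradient of log-likelihood minus L2 penalty
            grad_b = np.sum(y - p)
            w += lr * grad_w / len(y)                # ascend, averaging over the examples
            b += lr * grad_b / len(y)
        return w, b

    # Tiny illustrative dataset: one feature that separates the classes
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0, 0, 1, 1])
    w, b = train_logreg(X, y)
    print(sigmoid(X @ w + b).round(2))               # predicted probabilities increase with the feature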