1
Naïve Bayes Classifier
  • Adapted from slides by Ke Chen (University of
    Manchester) and YangQiu Song (MSRA)

2
Generative vs. Discriminative Classifiers
  • Training classifiers involves estimating f: X → Y,
    or P(Y|X)
  • Discriminative classifiers
  • Assume some functional form for P(Y|X)
  • Estimate parameters of P(Y|X) directly from
    training data
  • Generative classifiers (also called informative by
    Rubinstein & Hastie)
  • Assume some functional form for P(X|Y), P(Y)
  • Estimate parameters of P(X|Y), P(Y) directly from
    training data
  • Use Bayes rule to calculate P(Y | X = xi)

3
Bayes Formula
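
For a class c and an observed feature vector x, Bayes' rule relates
posterior, likelihood, prior and evidence:

  P(c | x) = P(x | c) P(c) / P(x)        (posterior = likelihood × prior / evidence)
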
4
Generative Model
  • Color
  • Size
  • Texture
  • Weight

5
Discriminative Model
  • Logistic Regression
  • Color
  • Size
  • Texture
  • Weight

6
Comparison
  • Generative models
  • Assume some functional form for P(X|Y), P(Y)
  • Estimate parameters of P(X|Y), P(Y) directly from
    training data
  • Use Bayes rule to calculate P(Y | X = x)
  • Discriminative models
  • Directly assume some functional form for P(Y|X)
  • Estimate parameters of P(Y|X) directly from
    training data

7
Probability Basics
  • Prior, conditional and joint probability for
    random variables
  • Prior probability
  • Conditional probability
  • Joint probability
  • Relationship
  • Independence
  • Bayesian Rule
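
In symbols, for random variables X and Y:

  Prior probability:        P(X)
  Conditional probability:  P(X | Y) = P(X, Y) / P(Y)
  Joint probability:        P(X, Y)
  Relationship:             P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
  Independence:             P(X | Y) = P(X),  P(Y | X) = P(Y),  P(X, Y) = P(X) P(Y)
  Bayes rule:               P(Y | X) = P(X | Y) P(Y) / P(X)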

8
Probability Basics
  • Quiz: We have two six-sided dice. When they are
    rolled, the following events can occur: (A) die 1
    lands on side 3, (B) die 2 lands on side 1, and
    (C) the two dice sum to eight. Answer the
    following questions
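
For illustration, assuming two fair, independent dice:

  P(A) = 1/6,  P(B) = 1/6
  P(C) = 5/36  (the outcomes (2,6), (3,5), (4,4), (5,3), (6,2))
  P(A, B) = 1/36 = P(A) P(B), so A and B are independent
  P(C | A) = P(die 2 lands on 5) = 1/6 ≠ P(C) = 5/36, so A and C are not independent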

9
Probabilistic Classification
  • Establishing a probabilistic model for
    classification
  • Discriminative model

10
Probabilistic Classification
  • Establishing a probabilistic model for
    classification (cont.)
  • Generative model

11
Probabilistic Classification
  • MAP classification rule
  • MAP: Maximum A Posteriori
  • Assign x to c* if it has the largest posterior
    probability (see the rule below)
  • Generative classification with the MAP rule
  • Apply Bayes rule to convert the priors and
    likelihoods into posterior probabilities
  • Then apply the MAP rule
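
In symbols, the MAP rule assigns x to the class

  c* = argmax_c P(c | x)

and with a generative model Bayes' rule turns this into

  c* = argmax_c P(x | c) P(c)

since the evidence P(x) is the same for every class.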

12
Naïve Bayes
  • Bayes classification
  • Difficulty: learning the joint probability
    P(x1, ..., xn | c) directly is infeasible
  • Naïve Bayes classification
  • Assumption: all input attributes are conditionally
    independent given the class!
  • MAP classification rule for x = (x1, ..., xn)
    (see below)
13
Naïve Bayes
  • Naïve Bayes Algorithm (for discrete input
    attributes)
  • Learning Phase: Given a training set S, estimate the
    priors P(C = ci) and the conditionals P(Xj = ajk | C = ci)
    by their relative frequencies in S
  • Output: conditional probability tables for each
    attribute and class
  • Test Phase: Given an unknown instance
    x' = (a1', ..., an'),
  • Look up the tables and assign to x' the label c* that
    maximises P(c*) Πj P(aj' | c*) (see the sketch below)
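
A minimal sketch of the two phases for discrete attributes, in plain
Python (illustrative function names, not from the original slides):

from collections import Counter, defaultdict

def train_nb(examples, labels):
    # Learning phase: estimate P(c) and P(x_j = a | c) by relative frequencies.
    n = len(labels)
    class_count = Counter(labels)
    prior = {c: cnt / n for c, cnt in class_count.items()}
    counts = defaultdict(Counter)                    # counts[(c, j)][a] = frequency
    for x, c in zip(examples, labels):
        for j, a in enumerate(x):
            counts[(c, j)][a] += 1
    tables = {key: {a: cnt / class_count[key[0]] for a, cnt in vals.items()}
              for key, vals in counts.items()}
    return prior, tables

def classify_nb(x, prior, tables):
    # Test phase: assign the label c* that maximises P(c) * prod_j P(x_j | c).
    def score(c):
        s = prior[c]
        for j, a in enumerate(x):
            s *= tables.get((c, j), {}).get(a, 0.0)  # unseen value -> 0 (see slide 18)
        return s
    return max(prior, key=score)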

14
Example
  • Example: Play Tennis

15
Example
  • Learning Phase

Outlook      Play=Yes  Play=No
Sunny        2/9       3/5
Overcast     4/9       0/5
Rain         3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity     Play=Yes  Play=No
High         3/9       4/5
Normal       6/9       1/5

Wind         Play=Yes  Play=No
Strong       3/9       3/5
Weak         6/9       2/5

P(Play=Yes) = 9/14
P(Play=No)  = 5/14
16
Example
  • Test Phase
  • Given a new instance
    x' = (Outlook=Sunny, Temperature=Cool,
    Humidity=High, Wind=Strong)
  • Look up tables
  • MAP rule

P(Outlook=Sunny | Play=No) = 3/5
P(Temperature=Cool | Play=No) = 1/5
P(Humidity=High | Play=No) = 4/5
P(Wind=Strong | Play=No) = 3/5
P(Play=No) = 5/14

P(Outlook=Sunny | Play=Yes) = 2/9
P(Temperature=Cool | Play=Yes) = 3/9
P(Humidity=High | Play=Yes) = 3/9
P(Wind=Strong | Play=Yes) = 3/9
P(Play=Yes) = 9/14

P(Yes | x') ∝ P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes) P(Play=Yes) = 0.0053
P(No | x')  ∝ P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No) P(Play=No) = 0.0206
Given that P(Yes | x') < P(No | x'), we label x' as No.
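
The arithmetic above can be checked with a few lines of Python, using the
values from the learning-phase tables:

# Unnormalised posteriors for x' = (Sunny, Cool, High, Strong)
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # attributes | Yes, times P(Play=Yes)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # attributes | No,  times P(Play=No)
print(round(p_yes, 4), round(p_no, 4))           # 0.0053 0.0206 -> predict Play = No
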
18
Relevant Issues
  • Violation of the Independence Assumption
  • For many real-world tasks the attributes are not
    conditionally independent given the class
  • Nevertheless, naïve Bayes works surprisingly well
    anyway!
  • Zero conditional probability problem
  • If no training example contains a particular attribute
    value for some class, its estimated conditional
    probability is zero
  • In this circumstance the whole product of conditionals
    becomes zero during test, regardless of the other
    attribute values
  • As a remedy, conditional probabilities are estimated
    with a smoothed estimator (see below)
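
A common smoothed estimator (of which Laplace/add-one smoothing is a
special case) is the m-estimate:

  P̂(Xj = ajk | C = ci) = (nc + m·p) / (n + m)

where n is the number of training examples with C = ci, nc the number of
those that also have Xj = ajk, p a prior estimate of the probability
(e.g. uniform, p = 1/t for t possible values), and m a weight; choosing
m = t and p = 1/t gives Laplace smoothing (nc + 1) / (n + t).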

19
Relevant Issues
  • Continuous-valued Input Attributes
  • An attribute can take uncountably many values, so
    frequency tables cannot be used
  • The conditional probability is instead modelled with
    the normal (Gaussian) distribution
  • Learning Phase:
  • Output: normal distributions (one per attribute per
    class) and the class priors P(C = ci)
  • Test Phase:
  • Calculate conditional probabilities with all the
    normal distributions
  • Apply the MAP rule to make a decision
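
A minimal Gaussian naïve Bayes sketch in Python (illustrative code,
assuming numeric feature vectors; names are mine):

import math
from collections import defaultdict

def train_gaussian_nb(X, y):
    # Learning phase: one normal distribution (mu, sigma) per attribute per class.
    rows_by_class = defaultdict(list)
    for x, c in zip(X, y):
        rows_by_class[c].append(x)
    prior, params = {}, {}
    for c, rows in rows_by_class.items():
        prior[c] = len(rows) / len(y)
        params[c] = []
        for col in zip(*rows):
            mu = sum(col) / len(col)
            sigma = (sum((v - mu) ** 2 for v in col) / len(col)) ** 0.5 or 1e-9
            params[c].append((mu, sigma))
    return prior, params

def normal_pdf(v, mu, sigma):
    return math.exp(-(v - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def classify_gaussian_nb(x, prior, params):
    # Test phase: MAP rule with Gaussian likelihoods.
    return max(prior, key=lambda c: prior[c] *
               math.prod(normal_pdf(v, mu, s) for v, (mu, s) in zip(x, params[c])))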

20
Conclusions
  • Naïve Bayes is based on the conditional independence
    assumption
  • Training is very easy and fast: it only requires
    considering each attribute in each class separately
  • Testing is straightforward: just look up tables or
    calculate conditional probabilities with normal
    distributions
  • A popular generative model
  • Performance is competitive with most state-of-the-art
    classifiers even when the independence assumption is
    violated
  • Many successful applications, e.g., spam mail
    filtering
  • A good candidate for a base learner in ensemble
    learning
  • Apart from classification, naïve Bayes can do more

21
Extra Slides
22
Naïve Bayes (1)
  • Revisit the posterior P(y | x1, ..., xn)
  • Which is equal to P(x1, ..., xn | y) P(y) / P(x1, ..., xn)
  • Naïve Bayes assumes conditional independence of the
    attributes given the class
  • Then the inference of the posterior is
    P(y | x1, ..., xn) ∝ P(y) Πi P(xi | y)

23
Naïve Bayes (2)
  • Training: the observations are multinomial;
    supervised, with label information
  • Maximum Likelihood Estimation (MLE)
  • Maximum a Posteriori (MAP): put a Dirichlet prior on
    the parameters
  • Classification
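
In symbols, with Nc training examples of class c and Nc,k of them having
Xi = k:

  MLE:  P̂(Xi = k | y = c) = Nc,k / Nc
  MAP with a Dirichlet(α1, ..., αK) prior:
        P̂(Xi = k | y = c) = (Nc,k + αk − 1) / (Nc + Σj αj − K)

so a symmetric Dirichlet prior simply adds pseudo-counts to the observed
frequencies (additive smoothing).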

24
Naïve Bayes (3)
  • What if we have continuous Xi?
  • Generative training
  • Prediction

25
Naïve Bayes (4)
  • Problems
  • Features may overlap
  • Features may not be independent
  • e.g., the size and the weight of a tiger
  • Uses a joint distribution estimate (P(X|Y), P(Y)) to
    solve a conditional problem (P(Y | X = x))
  • Can we train discriminatively?
  • Logistic regression
  • Regularization
  • Gradient ascent (see the sketch below)
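
A minimal sketch of the discriminative alternative: binary logistic
regression trained by gradient ascent on the L2-regularised conditional
log-likelihood (illustrative code, not from the original slides):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, lam=0.01, epochs=200):
    # Maximise sum_i log P(y_i | x_i; w, b) - (lam / 2) * ||w||^2 by gradient ascent.
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):                           # labels t in {0, 1}
            err = t - sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj + lr * (gj - lam * wj) for wj, gj in zip(w, grad_w)]   # ascent step
        b += lr * grad_b
    return w, b

def predict(x, w, b):
    # P(y = 1 | x) is modelled directly; threshold at 0.5.
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0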