1
Naive Bayes Classifiers, an Overview
  • By Roozmehr Safi

2
What is a Naive Bayes Classifier (NBC)?
  • NBC is a probabilistic classification method.
  • Classification (a.k.a. discrimination, or
    supervised learning) is the task of assigning new
    cases to one of several pre-defined classes, given
    a sample of cases for which the true classes are
    known.
  • NBC is one of the oldest and simplest
    classification methods.

3
Some NBC Applications
  • Credit scoring
  • Marketing applications
  • Employee selection
  • Image processing
  • Speech recognition
  • Search engines

4
How does NBC Work?
  • NBC applies Bayes' theorem with a (naive)
    assumption of feature independence.
  • A more descriptive term for it would be
    "independent feature model".

5
How does NBC Work? (Cont'd)
  • Let X1, …, Xm denote our features (height, weight,
    foot size), let Y be the class label (1 for men, 2
    for women), and let C be the number of classes (2).
    The problem consists of classifying the case
    (x1, …, xm) to the class c maximizing
    P(Y = c | X1 = x1, …, Xm = xm) over c = 1, …, C.
    Applying Bayes' rule gives
  • P(Y = c | X1 = x1, …, Xm = xm) =
    P(X1 = x1, …, Xm = xm | Y = c) P(Y = c) /
    P(X1 = x1, …, Xm = xm)
  • Under NB's assumption of conditional independence,
    P(X1 = x1, …, Xm = xm | Y = c) is replaced by the
    product P(X1 = x1 | Y = c) × … × P(Xm = xm | Y = c).
  • NB thus reduces the original problem to finding the
    class c that maximizes
    P(Y = c) × P(X1 = x1 | Y = c) × … × P(Xm = xm | Y = c);
    the denominator is the same for every class and can
    be dropped. (A plain-Python sketch follows.)
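A minimal sketch of this decision rule in Python; the priors and per-feature likelihoods below are hypothetical numbers, assumed to have been estimated already:

    from math import prod

    def nb_classify(priors, likelihoods):
        # Pick the class c maximizing P(Y=c) * prod_i P(Xi=xi | Y=c).
        # priors:      {class: P(Y=c)}
        # likelihoods: {class: [P(Xi=xi | Y=c) for each feature i]}
        scores = {c: priors[c] * prod(likelihoods[c]) for c in priors}
        return max(scores, key=scores.get)

    # Hypothetical estimates for one observed case (1 = men, 2 = women):
    priors = {1: 0.5, 2: 0.5}
    likelihoods = {1: [0.30, 0.25, 0.40],   # height, weight, foot size | men
                   2: [0.10, 0.15, 0.20]}   # height, weight, foot size | women
    print(nb_classify(priors, likelihoods))  # -> 1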

6
An Example
  • P(Observed Height | Male) = a
  • P(Observed Weight | Male) = b
  • P(Observed Foot Size | Male) = c
  • P(Male | observed case) ∝ P(Male) × a × b × c
  • P(Observed Height | Female) = d
  • P(Observed Weight | Female) = e
  • P(Observed Foot Size | Female) = f
  • P(Female | observed case) ∝ P(Female) × d × e × f
  • Pick the class whose score is larger (a worked
    sketch follows).
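A worked sketch of this comparison with Gaussian class-conditional densities; all means, variances, and the test case are made-up illustration values:

    from math import exp, pi, sqrt

    def gaussian_pdf(x, mean, var):
        # Density of N(mean, var) at x, used for P(feature value | class).
        return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

    # Made-up per-class (mean, variance) for height (m), weight (kg), foot (cm).
    params = {"Male":   {"height": (1.78, 0.01), "weight": (80.0, 100.0), "foot": (28.0, 2.0)},
              "Female": {"height": (1.65, 0.01), "weight": (62.0, 90.0),  "foot": (24.0, 2.0)}}
    priors = {"Male": 0.5, "Female": 0.5}
    case = {"height": 1.70, "weight": 65.0, "foot": 25.0}  # the observed case

    scores = {}
    for cls in params:
        score = priors[cls]                               # P(class)
        for feat, (mean, var) in params[cls].items():
            score *= gaussian_pdf(case[feat], mean, var)  # the a, b, c / d, e, f terms
        scores[cls] = score
    print(max(scores, key=scores.get))                    # pick the larger score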

7
NBC Advantages
  • Despite its unrealistic independence assumption,
    NBC is remarkably successful in practice, even when
    independence is violated.
  • Due to its simple structure, NBC is appealing when
    the set of variables is large.
  • NBC requires only a small amount of training data:
  • it only needs to estimate the mean and variance of
    each variable within each class (see the sketch
    after this list);
  • there is no need to form a covariance matrix.
  • Computationally inexpensive.
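A sketch of how little must be estimated, assuming Gaussian features: one prior plus m means and m variances per class, never an m-by-m covariance matrix:

    import numpy as np

    def fit_gaussian_nb(X, y):
        # X: (n, m) feature matrix; y: (n,) class labels.
        # Per class: one prior, m means, m variances -- no covariances.
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            params[c] = {"prior": len(Xc) / len(X),
                         "mean": Xc.mean(axis=0),
                         "var": Xc.var(axis=0)}   # diagonal terms only
        return params

    # Tiny made-up sample: (height, weight) for classes 1 and 2.
    X = np.array([[1.78, 80.0], [1.65, 62.0], [1.80, 85.0], [1.60, 58.0]])
    y = np.array([1, 2, 1, 2])
    print(fit_gaussian_nb(X, y)[1]["mean"])   # per-feature means for class 1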

8
A Demonstration
  • Data: from an online B2B exchange (1,220 cases).
  • Purpose: to distinguish cheaters from good sellers.
  • Predictors:
  • Member type: enterprise, personal, other
  • Years since joining: 1 to 10 years
  • Number of months since last membership renewal
  • Membership renewal duration
  • Type of service bought: standard, limited edition
  • Whether the member has a registered company
  • Whether the company page is decorated
  • Number of days the member logged in during the
    past 60 days
  • Industry: production, distribution, investment
  • Target: to predict whether a seller is likely to
    cheat buyers, based on data from past sellers
    (a hypothetical setup is sketched below).
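The original data is not reproduced here; the sketch below is a hypothetical stand-in for the setup, with randomly generated placeholder codes and labels, assuming the predictors have already been discretized into integer categories and that scikit-learn is available:

    import numpy as np
    from sklearn.naive_bayes import CategoricalNB

    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(1220, 9))   # 1,220 cases, 9 integer-coded predictors
    y = rng.integers(0, 2, size=1220)        # 1 = cheater, 0 = good seller (invented)

    clf = CategoricalNB(alpha=0.5)           # alpha smooths zero counts (see later slide)
    clf.fit(X, y)
    print(clf.predict(X[:5]))                # predicted labels for five members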

9
Issues Involved: Probability Distribution
  • With discrete (categorical) features, estimating
    the probabilities can be done using frequency
    counts.
  • With continuous features, one can instead assume a
    parametric form for the probability distribution
    (e.g., normal).
  • There is evidence that discretizing the data before
    applying NB is effective.
  • Equal Frequency Discretization (EFD) divides the
    sorted values of a continuous variable into k
    equally populated bins (sketched below).
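A minimal sketch of EFD using quantile cut points; the sample data is synthetic and NumPy is assumed:

    import numpy as np

    def equal_frequency_bins(values, k):
        # Interior quantile cut points split the sorted values
        # into k bins holding roughly the same number of cases.
        edges = np.quantile(values, np.linspace(0, 1, k + 1)[1:-1])
        return np.digitize(values, edges)    # bin codes 0 .. k-1

    x = np.random.default_rng(1).normal(size=1000)    # synthetic continuous feature
    print(np.bincount(equal_frequency_bins(x, k=4)))  # about 250 cases per bin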

10
Issues Involved: Zero Probabilities
  • When a class and a feature value never occur
    together in the training set, the estimated
    probability is zero, which is a problem: a single
    zero term causes the whole product to evaluate to
    zero.
  • The zero probability can be replaced by a small
    constant, such as 0.5/n, where n is the number of
    observations in the training set (sketched below).
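A sketch of this fix; the counts are invented, and n is the training-set size:

    def cond_prob(count, class_count, n):
        # Frequency estimate of P(feature value | class), with a zero
        # count replaced by the small constant 0.5/n from the slide.
        return count / class_count if count > 0 else 0.5 / n

    # A feature value never seen with the class in 1,220 training cases:
    print(cond_prob(0, 600, 1220))   # 0.5/1220, about 0.00041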

11
Issues Involved: Missing Values
  • In some applications, values are missing not at
    random and can be meaningful; such missing values
    are best treated as a separate category (see the
    sketch after this list).
  • If one does not want to treat missing values as a
    separate category, they should be handled before
    applying the classifier, either by imputing them or
    by excluding the cases in which they occur.
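A minimal sketch of the separate-category treatment: missing values (None here) simply receive their own integer code, like any other category:

    def encode_with_missing(values):
        # Integer-encode categories; None becomes its own category
        # instead of being imputed or dropped.
        mapping = {}
        return [mapping.setdefault(v, len(mapping)) for v in values]

    print(encode_with_missing(["a", None, "b", "a", None]))   # [0, 1, 2, 0, 1]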

12
  • Thank you