1
Naïve Bayes Classifier for Text Classification
  • ZhuoRan Chen
  • 2006-4-3

2
Table of Contents
  • Background
  • Methods
  • Evaluations
  • Conclusion

3
Naïve Bayes Assumption
  • The assumption: features are independent.
  • P(Fi | C, Fj) = P(Fi | C), where Fi, Fj stand
    for any two features and C is the class variable.
  • Theoretically oversimplified, but practically
    works well
  • Parameter estimation is easy: maximum likelihood

4
The Framework of Naïve Bayes Classifiers
  • P(Ci | F1, F2, ..., Fn) P(F1, F2, ..., Fn)
    = P(Ci) P(F1, F2, ..., Fn | Ci)
  • P(Ci | F1, F2, ..., Fn)
    = P(Ci) P(F1, F2, ..., Fn | Ci) × 1/P(F1, F2, ..., Fn)
  • P(Ci | F1, F2, ..., Fn)
    = P(Ci) P(F1, F2, ..., Fn | Ci) × lambda,
    where lambda = 1/P(F1, F2, ..., Fn)
  • P(Ci | F1, F2, ..., Fn)
    = lambda P(Ci) P(F1 | Ci) P(F2, ..., Fn | Ci, F1)
  • P(Ci | F1, F2, ..., Fn)
    ∝ P(Ci) P(F1 | Ci) P(F2, ..., Fn | Ci)
    (by the independence assumption)
  • P(Ci | F1, F2, ..., Fn)
    ∝ P(Ci) P(F1 | Ci) P(F2 | Ci) P(F3 | Ci) ... P(Fn | Ci)
  • P(Ci): the prior
  • E.g. in a test dataset of 1000 apples, 200
    originated from Japan, 300 from the US, 500 from
    China → P(Cjapan) = 0.2, P(Cus) = 0.3,
    P(Cchina) = 0.5
  • P(Fi | Ci): the class-conditional probability
    distribution (CPD)
  • P(Fi | Ci) = P(Fi, Ci) / P(Ci)
  • E.g. F1 ∈ {big, small}, F2 ∈ {red, yellow, green};
    out of the 200 Japan apples, 80 are big and 120
    are small; among the 300 US apples, 200 are red,
    50 are yellow, etc.
  • → P(F1=big | Cjapan) = (80/1000) / (200/1000) = 0.4,
    P(F1=small | Cjapan) = 0.6
  • P(F2=red | Cus) = 200/300 = 0.67,
    P(F2=yellow | Cus) = 50/300 = 0.17
    (see the counting sketch below)
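Both quantities above are plain relative frequencies over training counts. Below is a minimal Python sketch (not part of the original slides) that reproduces these estimates from the apple counts stated here; the variable names are illustrative.

# Estimate the prior P(Ci) and the CPD P(Fi | Ci) by relative frequency,
# using only the counts given on this slide.
from collections import Counter

class_counts = Counter({"japan": 200, "us": 300, "china": 500})  # 1000 apples
total = sum(class_counts.values())

# Prior: P(Ci) = count(Ci) / N
prior = {c: n / total for c, n in class_counts.items()}
# -> {'japan': 0.2, 'us': 0.3, 'china': 0.5}

# Joint counts stated on the slide (the remaining ones are omitted there).
feature_class_counts = {
    ("size=big", "japan"): 80,
    ("size=small", "japan"): 120,
    ("color=red", "us"): 200,
    ("color=yellow", "us"): 50,
}

# CPD: P(Fi | Ci) = P(Fi, Ci) / P(Ci) = count(Fi, Ci) / count(Ci)
cpd = {(f, c): n / class_counts[c] for (f, c), n in feature_class_counts.items()}

print(prior["japan"], cpd[("size=big", "japan")])   # 0.2 0.4
print(round(cpd[("color=red", "us")], 2))           # 0.67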

5
The Framework of Naïve Bayes Classifiers (cont)
  • The decision rule: 'maximum a posteriori'
    probability (MAP)
  • arg max_i P(Ci | F1, F2, ..., Fn)
  • E.g. given a big red apple, which country did it
    originate from? (a code sketch of this decision
    follows below)
  • P(Cjapan | F1=big, F2=red)
    ∝ P(Cjapan) P(F1=big | Cjapan) P(F2=red | Cjapan)
    = 0.2 × 0.4 × ... = 0.04
  • P(Cus | F1=big, F2=red)
    ∝ P(Cus) P(F1=big | Cus) P(F2=red | Cus)
    = 0.3 × ... × 0.67 = 0.12
  • P(Cchina | F1=big, F2=red) ∝ 0.08
  • → It most likely originated from the US.
  • Note:
  • Fi could be a continuous variable (e.g. degree of
    sweetness)
  • P(Fi | Ci) can be any distribution.
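A minimal Python sketch of this MAP decision (not from the original slides). The class-conditional values marked ASSUMED are not stated in the presentation; they are placeholders chosen only so that the example reproduces the scores 0.04, 0.12, and 0.08 quoted above.

# MAP decision rule: pick argmax_i P(Ci) * prod_j P(Fj | Ci)
prior = {"japan": 0.2, "us": 0.3, "china": 0.5}
cpd = {
    "japan": {"big": 0.4, "red": 0.5},    # big: from slide 4; red: ASSUMED
    "us":    {"big": 0.6, "red": 0.67},   # red: from slide 4; big: ASSUMED
    "china": {"big": 0.4, "red": 0.4},    # both ASSUMED
}

def map_class(features):
    """Score every class and return the maximum a posteriori class."""
    scores = {}
    for c in prior:
        score = prior[c]
        for f in features:
            score *= cpd[c][f]
        scores[c] = score
    return max(scores, key=scores.get), scores

best, scores = map_class(["big", "red"])
print(scores)   # {'japan': 0.04, 'us': ~0.12, 'china': 0.08}
print(best)     # 'us'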

6
The multi-variate Bernoulli model
  • Every feature is a binary variable: big/small,
    red/not-red, presence/absence
  • Does not capture the number of occurrences
  • The number of parameters to train is small
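A minimal sketch (assumed function and parameter names, not from the slides) of how a document would be scored under this Bernoulli event model: every vocabulary word contributes a presence or absence factor, and repeated occurrences add nothing.

import math

def bernoulli_log_score(doc_words, c, prior, p_word, vocabulary):
    """log P(c) + sum over the vocabulary of log P(word present/absent | c).

    p_word[c][w] is the (smoothed) probability that word w occurs at least
    once in a class-c document; vocabulary is a set of words. Working in
    log space avoids underflow."""
    present = set(doc_words) & vocabulary       # only presence/absence matters
    score = math.log(prior[c])
    for w in vocabulary:
        if w in present:
            score += math.log(p_word[c][w])
        else:
            score += math.log(1.0 - p_word[c][w])
    return score

One Bernoulli parameter per (word, class) pair plus the class priors is all this model needs, which is the sense in which the number of parameters to train stays small.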

7
The multinomial model
  • The frequency of each word matters
  • Bag-of-words assumption: the position of a word
    in the document is not utilized
  • P(Fi | Ci): a multinomial distribution
  • P(Ci): the same as in the Bernoulli model
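A matching sketch for the multinomial event model (again with assumed names): here the per-word count enters the score, so frequency matters, but the positions of words do not.

import math
from collections import Counter

def multinomial_log_score(doc_words, c, prior, p_word):
    """log P(c) + sum over word tokens of log P(word | c).

    p_word[c][w] is the (smoothed) probability of drawing word w at any
    position of a class-c document."""
    counts = Counter(doc_words)                 # word frequencies are used
    score = math.log(prior[c])
    for w, n in counts.items():
        if w in p_word[c]:                      # skip out-of-vocabulary words
            score += n * math.log(p_word[c][w])
    return score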

8
Feature Selection
  • Goal: reduce the vocabulary size
  • Method: mutual information (sketched below)
  • The optimal number of features depends on the
    dataset
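The slides do not spell out the selection procedure; a common reading (sketched below with assumed names, not from the presentation) is to score each vocabulary word by the mutual information between its presence and the class label, then keep the top-ranked words.

import math
from collections import Counter

def mutual_information(docs, labels, word):
    """I(W; C), where W = 1 if `word` occurs in a document and 0 otherwise.

    docs is a list of word collections (e.g. sets of tokens), labels the
    corresponding class labels."""
    n = len(docs)
    joint = Counter((word in d, c) for d, c in zip(docs, labels))
    p_w = Counter(word in d for d in docs)
    p_c = Counter(labels)
    mi = 0.0
    for (w, c), n_wc in joint.items():
        p_joint = n_wc / n
        mi += p_joint * math.log(p_joint / ((p_w[w] / n) * (p_c[c] / n)))
    return mi

def select_features(docs, labels, vocabulary, k):
    """Keep the k words with the highest mutual information."""
    ranked = sorted(vocabulary,
                    key=lambda w: mutual_information(docs, labels, w),
                    reverse=True)
    return set(ranked[:k])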

9
Empirical Evaluations
  • Task: text classification
  • Datasets: webpages, newsgroups, newswire articles
  • Criteria: recall/precision and the
    precision-recall breakeven point (sketched below)
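For reference, a small sketch (assumed names, not from the slides) of these criteria: precision and recall for one class, plus the precision-recall breakeven point found by sweeping a decision threshold over the classifier's scores.

def precision_recall(y_true, y_pred):
    """Precision and recall for one class, given boolean labels/predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def breakeven_point(y_true, scores):
    """Value on the precision-recall curve where precision and recall are
    (as nearly as possible) equal, found by sweeping the threshold."""
    best = None
    for threshold in sorted(set(scores)):
        y_pred = [s >= threshold for s in scores]
        p, r = precision_recall(y_true, y_pred)
        if best is None or abs(p - r) < abs(best[0] - best[1]):
            best = (p, r)
    return (best[0] + best[1]) / 2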

10
Results
  • Performance depends on the dataset and the
    number of features
  • In most cases, the multinomial model is better
    than the Bernoulli model
  • For the Bernoulli model, performance usually
    degrades as the number of features increases.
  • For some datasets, 100 features work best; for
    all datasets, more than 10,000 features won't
    help

11
Conclusions
  • Naïve Bayes Classifier works well
  • No single model or parameter setting is optimal
    for all situations
  • Feature selection can be critical

12
THE END
  • Discussion?