Naïve Bayes Models for Probability Estimation

1
Naïve Bayes Models for Probability Estimation
  • Daniel Lowd
  • January 20th, 2005
  • (Joint work with Pedro Domingos)

2
Outline
  • Contributions
  • Background
  • General probability estimation
  • Naïve Bayes and Bayesian networks
  • Naïve Bayes Estimation (NBE)
  • Experiments
  • Methodology
  • Results
  • Conclusion and Future Work

3
Contributions
  • Reminder that naïve Bayes can solve more general
    probabilistic problems.
  • NBE algorithm for training naïve Bayes models.
  • Extensive empirical evaluation on 50 datasets.

4
Outline
  • Contributions
  • Background
  • General probability estimation
  • Naïve Bayes and Bayesian networks
  • Naïve Bayes Estimation (NBE)
  • Experiments
  • Methodology
  • Results
  • Conclusion and Future Work

5
General-purpose probability estimation
  • Want to efficiently
  • Learn joint probability distribution from data
  • Infer marginal and conditional distributions
  • Applications
  • Collaborative filtering
  • Medical diagnosis
  • Demographic analysis

6
Bayesian networks
[Diagram: example Bayesian network with nodes Family history, Smoking, Season, Cold, Lung cancer, Coughing, Sneezing, and Runny nose]
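For reference (added here, not on the slide), a Bayesian network factors the joint distribution into one conditional distribution per variable given its parents in the graph:

    P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))

In the diagram above, for example, Coughing presumably depends directly on Cold and Lung cancer.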
7
Bayesian network inference
  • Exact inference
  • Good: correct, deterministic
  • Bad: #P-complete
  • Gibbs sampling
  • Good: simple, converges to true probabilities
  • Bad: slow, hard to detect convergence
  • Belief propagation
  • Good: faster than Gibbs
  • Bad: no guarantees

8
Naïve Bayes
  • All other variables are independent, given the
    class/cluster variable.
  • Typically applied to classification

[Diagram: naïve Bayes spam classifier with class node Spam? and word features Vi@gra, m0rtgage, quals, Bayes, Pedro]
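As a reminder (added here, not from the slide), the naïve Bayes assumption yields the factored joint distribution

    P(C, X_1, \dots, X_n) = P(C) \prod_{i=1}^{n} P(X_i \mid C)

so each feature (Vi@gra, m0rtgage, ...) is modeled independently given the Spam? class.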
9
Naïve Bayes clustering
[Diagram: naïve Bayes clustering model with latent cluster node C and movie variables Shrek, Toy Story, Kill Bill, Pulp Fiction, etc.]
  • Model can be learned from data using expectation
    maximization (EM)

10
Inference example
  • Want to determine

11
Inference example
  • Want to determine
  • We know

12
Inference example
  • Want to determine
  • We know
  • Sum out C
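The equations on these inference-example slides were images and are not preserved in the transcript. A plausible reconstruction of the "sum out C" step, using the movie variables from the clustering example (the specific query shown on the slide is an assumption), is:

    P(\text{Shrek}, \text{Kill Bill}) = \sum_{c} P(C = c)\, P(\text{Shrek} \mid C = c)\, P(\text{Kill Bill} \mid C = c)

where the "we know" quantities are the learned cluster prior P(C) and the per-cluster conditionals P(X_i \mid C).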

13
Naïve Bayes inference
  • Marginal queries
  • Conditional queries
  • Running time
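The formulas on this slide are likewise missing from the transcript. In standard naïve Bayes mixture inference (a sketch of the idea, not necessarily the slide's exact notation), a marginal query sums out C as in the example above, and a conditional query is a ratio of two marginals:

    P(Q = q \mid E = e) = \frac{\sum_c P(c) \prod_{i \in Q} P(q_i \mid c) \prod_{j \in E} P(e_j \mid c)}{\sum_c P(c) \prod_{j \in E} P(e_j \mid c)}

For a model with k clusters, each such query takes time linear in k times the number of instantiated variables, i.e. O(k(|Q| + |E|)).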

14
Naïve Bayes for general probabilistic modeling
[Diagram: naïve Bayes model with latent class node C and variables Smoking, Fam. History, Cancer, Coughing, etc.]
15
Related Work
  • AutoClass: naïve Bayes clustering
    (Cheeseman et al., 1988)
  • Naïve Bayes clustering applied to collaborative
    filtering (Breese et al., 1998)
  • Mixture of Trees: efficient alternative to
    Bayesian networks (Meila and Jordan, 2000)

16
Outline
  • Contributions
  • Background
  • General probability estimation
  • Naïve Bayes and Bayesian networks
  • Naïve Bayes Estimation (NBE)
  • Experiments
  • Methodology
  • Results
  • Conclusion and Future Work

17
Generic clustering algorithm
  • Start with k randomly initialized clusters
  • Iterate the EM algorithm
  • Stop when likelihood on hold-out data decreases
  • Running time
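A minimal, self-contained sketch of this loop for binary variables (an illustration of the generic algorithm above, not the implementation used in the talk) might be:

    import numpy as np

    def nb_em(train, holdout, k, iters=100, alpha=1.0, seed=0):
        # Naive Bayes clustering of binary data by EM (illustrative sketch).
        # train, holdout: (num_examples, num_vars) arrays of 0/1.
        rng = np.random.default_rng(seed)
        n, d = train.shape
        prior = np.full(k, 1.0 / k)                    # P(C = c)
        theta = rng.uniform(0.25, 0.75, size=(k, d))   # P(X_i = 1 | C = c)

        def scores(data):
            # log [ P(c) * prod_i P(x_i | c) ] for every example and cluster
            return (np.log(prior)
                    + data @ np.log(theta).T
                    + (1 - data) @ np.log(1 - theta).T)

        def loglik(data):
            s = scores(data)
            m = s.max(axis=1, keepdims=True)
            return float(np.sum(m[:, 0] + np.log(np.exp(s - m).sum(axis=1))))

        best = loglik(holdout)
        for _ in range(iters):
            # E-step: responsibilities P(C = c | example)
            s = scores(train)
            s -= s.max(axis=1, keepdims=True)
            resp = np.exp(s)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: re-estimate mixture weights and per-cluster parameters (smoothed)
            nk = resp.sum(axis=0)
            prior = (nk + alpha) / (n + k * alpha)
            theta = (resp.T @ train + alpha) / (nk[:, None] + 2 * alpha)
            # Stop when the hold-out log-likelihood stops improving
            cur = loglik(holdout)
            if cur <= best:
                break
            best = cur
        return prior, theta

For this sketch, each EM iteration costs O(n·d·k) time for n examples, d variables, and k clusters.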

18
The Expectation Maximization (EM) algorithm
  • Method for learning with missing data
  • Consists of two alternating steps
  • E-step: predict missing data, given the model
  • M-step: adjust the model, given the completed data
  • In clustering, the missing data is the cluster
    variable, C
  • Analogous to k-means algorithm
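Concretely, for naïve Bayes clustering the two steps are the standard mixture-model updates (added here for reference):

    \text{E-step:}\quad r_{jc} = P(C = c \mid x_j) \propto P(C = c) \prod_i P(x_{ji} \mid C = c)

    \text{M-step:}\quad P(C = c) \leftarrow \frac{1}{N} \sum_j r_{jc}, \qquad P(X_i = v \mid C = c) \leftarrow \frac{\sum_{j:\, x_{ji} = v} r_{jc}}{\sum_j r_{jc}}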

19
Naïve Bayes Estimation (NBE)
  • Similar to generic clustering algorithm
  • Improvements
  • Use training examples to initialize clusters
  • Keep adding clusters as we go
  • Prune low-weight clusters
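A rough skeleton of how these improvements might fit together (helper names here are hypothetical, and the exact schedule for adding and pruning clusters follows the NBE paper rather than this sketch):

    import random

    def nbe(train, holdout, add_per_round=8, weight_threshold=1e-4):
        # Hypothetical helpers: cluster_from_example, run_em, holdout_loglik.
        # Initialize clusters directly from training examples instead of randomly.
        clusters = [cluster_from_example(x) for x in random.sample(train, add_per_round)]
        best = float("-inf")
        while True:
            clusters = run_em(clusters, train)           # refine the current mixture with EM
            clusters = [c for c in clusters
                        if c.weight > weight_threshold]  # prune low-weight clusters
            ll = holdout_loglik(clusters, holdout)
            if ll <= best:                               # stop once hold-out likelihood stops improving
                return clusters
            best = ll
            # keep adding clusters, seeded from more training examples, and repeat
            clusters += [cluster_from_example(x)
                         for x in random.sample(train, add_per_round)]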

20
Outline
  • Contributions
  • Background
  • General probability estimation
  • Naïve Bayes and Bayesian networks
  • Naïve Bayes Estimation (NBE)
  • Experiments
  • Methodology
  • Results
  • Conclusion and Future Work

21
Experiments
  • Compare NBE to Bayesian networks (WinMine
    Toolkit)
  • Metrics
  • Learning time
  • Accuracy (log likelihood)
  • Query speed/accuracy
  • Query types
  • Marginal queries
  • Conditional queries

22
Datasets
  • 47 from UCI machine learning repository
  • 3 other real-world datasets: EachMovie, Jester,
    and KDDCup 2000
  • Statistics
  • 5 to 1,648 variables
  • 57 to 67,507 examples
  • 2 to 41 states per variable
  • Processing
  • Discretized continuous variables
  • Created train/test/hold-out sets
  • Cross-validated small datasets

23
Learning time
24
Overall accuracy
25
Inference Details
  • NBE: exact inference
  • Bayesian networks:
  • Gibbs sampling, 3 configurations:
  • 1 chain, 1,000 sampling iterations
  • 10 chains, 1,000 sampling iterations per chain
  • 10 chains, 10,000 sampling iterations per chain
  • Belief propagation, when possible

26
Marginal queries of up to 5 variables
  • Examples
  • Pr(Toy Story, Kill Bill)
  • Pr(Coughing, Sneezing, RunnyNose)
  • 1-5 query variables (e.g., Coughing)
  • No evidence
  • NBE exact inference vs. Gibbs

27
Marginal query accuracy
28
Marginal query speed
[Chart, built up over slides 28-32: marginal query speed comparison; annotated values 2,200; 26,000; 580,000; and 188,000,000]
33
Conditional queries of 1 variable
  • Examples
  • 1 query variable (e.g., Sneezing)
  • 0-4 hidden variables
  • All other variables are evidence
  • NBE exact inference vs. Gibbs sampling and belief
    propagation

34
Single-variable conditional query accuracy
35
Detailed accuracy comparison
36
Single-variable conditional query speed
[Chart, built up over slides 36-40: single-variable conditional query speed comparison; annotated values 55; 420; 5,200; and 200,000]
41
Conditional queries of up to 5 variables
  • Examples
  • 1-5 query variables
  • All other variables are evidence
  • NBE exact inference vs. Gibbs sampling

42
Multiple-variable conditional query accuracy
43
Detailed accuracy: 1 variable
44
Detailed accuracy: 5 variables
45
Multiple-variable conditional query speed
[Chart, built up over slides 45-46: multiple-variable conditional query speed comparison; annotated values 42; 430; 4,200; and 130,000]
47
Summary of results
  • Marginal queries
  • NBE at least as accurate as Gibbs sampling
  • NBE thousands, even millions of times faster
  • Conditional queries
  • Easy for Gibbs: few hidden variables
  • NBE almost as accurate as Gibbs
  • NBE still several orders of magnitude faster
  • Belief propagation often failed or ran slowly

48
Outline
  • Contributions
  • Background
  • General probability estimation
  • Naïve Bayes and Bayesian networks
  • Naïve Bayes Estimation (NBE)
  • Experiments
  • Methodology
  • Results
  • Conclusion and Future Work

49
Conclusion
  • Compared to Bayesian networks, NBE offers
  • Similar learning time
  • Similar accuracy
  • Exponentially faster inference
  • Lessons
  • Simple models are sometimes the best models.
  • Even when simple models do worse, the simplicity
    and speed may be worthwhile.
  • Naïve Bayes is good for a lot more than
    classification and clustering.

50
Contributions (yes, again)
  • Reminder that naïve Bayes can solve more general
    probabilistic problems.
  • NBE algorithm for training naïve Bayes models.
  • Extensive empirical evaluation on 50 datasets.

51
Future work
  • Compare to related mixture models, such as Mixture
    of Trees (Meila and Jordan, 2000).
  • Extend NBE to relational domains.
  • Investigate specific applications of NBE.