Title: Naïve Bayes Models for Probability Estimation
1 Naïve Bayes Models for Probability Estimation
- Daniel Lowd
- January 20th, 2005
- (Joint work with Pedro Domingos)
2 Outline
- Contributions
- Background
- General probability estimation
- Naïve Bayes and Bayesian networks
- Naïve Bayes Estimation (NBE)
- Experiments
- Methodology
- Results
- Conclusion and Future Work
3 Contributions
- Reminder that naïve Bayes can solve more general probabilistic problems.
- NBE algorithm for training naïve Bayes models.
- Extensive empirical evaluation on 50 datasets.
4 Outline
- Contributions
- Background
- General probability estimation
- Naïve Bayes and Bayesian networks
- Naïve Bayes Estimation (NBE)
- Experiments
- Methodology
- Results
- Conclusion and Future Work
5 General purpose probability estimation
- Want to efficiently
- Learn joint probability distribution from data
- Infer marginal and conditional distributions
- Applications
- Collaborative filtering
- Medical diagnosis
- Demographic analysis
6 Bayesian networks
- (Diagram: example network over Family history, Smoking, Season, Cold, Lung cancer, Coughing, Sneezing, Runny nose)
7 Bayesian network inference
- Exact inference
- Good: correct, deterministic
- Bad: #P-complete
- Gibbs sampling
- Good: simple, converges to true probabilities
- Bad: slow, hard to detect convergence
- Belief propagation
- Good: faster than Gibbs
- Bad: no guarantees
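Since Gibbs sampling is the main comparison method later in the talk, here is a minimal Python sketch of the generic Gibbs loop. This is an illustrative assumption about how such a sampler is structured, not the WinMine Toolkit's implementation; full_conditional is a hypothetical model-supplied hook, and the binary initialization is an assumption for the sketch.

import random

def gibbs_sample(variables, evidence, full_conditional, n_iters=1000, seed=0):
    """Minimal Gibbs sampling sketch (illustrative only).

    variables:        list of variable names
    evidence:         dict {var: observed value}, kept fixed throughout
    full_conditional: function (var, state) -> a value for `var` sampled
                      from Pr(var | all other variables in `state`);
                      this is the model-specific piece a real BN supplies.
    Returns a list of sampled states; marginal and conditional probabilities
    are then estimated by counting configurations in the samples.
    """
    rng = random.Random(seed)
    hidden = [v for v in variables if v not in evidence]
    # Start from an arbitrary full assignment consistent with the evidence.
    state = dict(evidence)
    for v in hidden:
        state[v] = rng.choice([0, 1])  # assume binary variables in this sketch
    samples = []
    for _ in range(n_iters):
        for v in hidden:  # resample each non-evidence variable in turn
            state[v] = full_conditional(v, state)
        samples.append(dict(state))
    return samples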
8 Naïve Bayes
- All other variables are independent given the class/cluster variable.
- Typically applied to classification
- (Diagram: class node Spam? with feature nodes Vi_at_gra, m0rtgage, quals, Bayes, Pedro)
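To make the independence assumption concrete, the joint distribution factors as Pr(C, X1, ..., Xn) = Pr(C) · Pr(X1 | C) · ... · Pr(Xn | C). Instantiated with the spam example's variables (my illustration, not a slide formula):
Pr(Spam, Vi_at_gra, m0rtgage, quals, Bayes, Pedro) = Pr(Spam) · Pr(Vi_at_gra | Spam) · Pr(m0rtgage | Spam) · Pr(quals | Spam) · Pr(Bayes | Spam) · Pr(Pedro | Spam)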
9 Naïve Bayes clustering
- (Diagram: cluster node C with children Shrek, Toy Story, Kill Bill, Pulp Fiction, etc.)
- Model can be learned from data using expectation maximization (EM)
10 Inference example
11 Inference example
- Want to determine
- We know
12 Inference example
- Want to determine
- We know
- Sum out C
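As a hedged illustration of the "sum out C" step, using the movie variables from the clustering model above (my instantiation, not the slide's own query):
Pr(Shrek, Toy Story) = Σ over clusters c of Pr(C = c) · Pr(Shrek | C = c) · Pr(Toy Story | C = c)
Variables that are neither queried nor observed contribute nothing to the product, so no other summations are needed.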
13 Naïve Bayes inference
- Marginal queries
- Conditional queries
- Running time
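A hedged summary of the standard naïve Bayes query formulas these bullets refer to (my notation), for a model with k clusters:
- Marginal: Pr(Q) = Σ over c of Pr(C = c) · Π over q in Q of Pr(q | C = c)
- Conditional: Pr(Q | E) = Pr(Q, E) / Pr(E), with numerator and denominator computed as marginals
- Running time: O(k · |Q ∪ E|) per query, since only queried and evidence variables enter the product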
14 Naïve Bayes for general probabilistic modeling
- (Diagram: cluster node C with children Smoking, Fam. History, Cancer, Coughing, etc.)
15 Related Work
- AutoClass: naïve Bayes clustering (Cheeseman et al., 1988)
- Naïve Bayes clustering applied to collaborative filtering (Breese et al., 1998)
- Mixture of Trees: efficient alternative to Bayesian networks (Meila and Jordan, 2000)
16 Outline
- Contributions
- Background
- General probability estimation
- Naïve Bayes and Bayesian networks
- Naïve Bayes Estimation (NBE)
- Experiments
- Methodology
- Results
- Conclusion and Future Work
17 Generic clustering algorithm
- Start with k randomly initialized clusters
- Iterate the EM algorithm
- Stop when likelihood on hold-out data decreases
- Running time
18 The Expectation Maximization (EM) algorithm
- Method for learning with missing data
- Consists of two alternating steps
- E-step: predict missing data, given model
- M-step: adjust model, given complete data
- In clustering, the missing data is the cluster variable, C (see the sketch below)
- Analogous to k-means algorithm
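A minimal EM sketch for a naïve Bayes mixture over binary variables, to make the two steps concrete. Illustrative assumptions: binary variables, add-one smoothing, and a fixed iteration count; this is the generic algorithm, not NBE, whose refinements appear on the next slide.

import math
import random

def em_naive_bayes_mixture(data, k, n_iters=20, seed=0):
    """Minimal EM sketch for a naive Bayes mixture over binary variables.

    data: list of examples, each a list of 0/1 values of length n.
    k:    number of clusters.
    Returns (prior, cond) where prior[c] = Pr(C=c) and
    cond[c][i] = Pr(X_i = 1 | C=c).
    """
    rng = random.Random(seed)
    n = len(data[0])
    prior = [1.0 / k] * k
    cond = [[rng.random() for _ in range(n)] for _ in range(k)]

    for _ in range(n_iters):
        # E-step: posterior responsibility of each cluster for each example.
        resp = []
        for x in data:
            logp = []
            for c in range(k):
                lp = math.log(prior[c])
                for i, v in enumerate(x):
                    p = cond[c][i] if v == 1 else 1.0 - cond[c][i]
                    lp += math.log(max(p, 1e-12))
                logp.append(lp)
            m = max(logp)
            w = [math.exp(lp - m) for lp in logp]
            z = sum(w)
            resp.append([wi / z for wi in w])

        # M-step: re-estimate priors and conditionals from soft counts,
        # with add-one smoothing to avoid zero probabilities.
        for c in range(k):
            total = sum(r[c] for r in resp)
            prior[c] = (total + 1.0) / (len(data) + k)
            for i in range(n):
                ones = sum(r[c] for r, x in zip(resp, data) if x[i] == 1)
                cond[c][i] = (ones + 1.0) / (total + 2.0)
    return prior, cond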
19 Naïve Bayes Estimation (NBE)
- Similar to generic clustering algorithm
- Improvements
- Use training examples to initialize clusters
- Keep adding clusters as we go
- Prune low-weight clusters
20 Outline
- Contributions
- Background
- General probability estimation
- Naïve Bayes and Bayesian networks
- Naïve Bayes Estimation (NBE)
- Experiments
- Methodology
- Results
- Conclusion and Future Work
21 Experiments
- Compare NBE to Bayesian networks (WinMine Toolkit)
- Metrics
- Learning time
- Accuracy (log likelihood)
- Query speed/accuracy
- Query types
- Marginal queries
- Conditional queries
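For reference, the accuracy metric is log-likelihood on held-out test data; in the standard form (my notation, hedged), LL = Σ over test examples x of log Pr(x), with higher values indicating a better fit.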
22 Datasets
- 47 from UCI machine learning repository
- 3 other real-world datasets: EachMovie, Jester, and KDDCup 2000
- Statistics
- 5 to 1,648 variables
- 57 to 67,507 examples
- 2 to 41 states per variable
- Processing
- Discretized continuous variables
- Created train/test/hold-out sets
- Cross-validated small datasets
23 Learning time
24 Overall accuracy
25 Inference Details
- NBE: exact inference
- Bayesian networks
- Gibbs sampling: 3 configurations
- 1 chain, 1,000 sampling iterations
- 10 chains, 1,000 sampling iterations per chain
- 10 chains, 10,000 sampling iterations per chain
- Belief propagation, when possible
26 Marginal queries of up to 5 variables
- Examples
- Pr(Toy Story, Kill Bill)
- Pr(Coughing, Sneezing, RunnyNose)
- 1-5 query variables (e.g., Coughing)
- No evidence
- NBE exact inference vs. Gibbs
27 Marginal query accuracy
28-32 Marginal query speed
- (Chart build: NBE vs. Gibbs query-speed comparison, with the values 2,200; 26,000; 580,000; and 188,000,000 appearing as the build progresses)
33 Conditional queries of 1 variable
- Examples
- 1 query variable (e.g., Sneezing)
- 0-4 hidden variables
- All other variables are evidence
- NBE exact inference vs. Gibbs sampling and belief propagation
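As a hedged note on why a few hidden variables are easy to handle in the naïve Bayes model (standard algebra, not a slide formula): unobserved non-query variables drop out of the per-cluster product, so
Pr(Sneezing | E) = [ Σ over c of Pr(C = c) · Pr(Sneezing | C = c) · Π over e in E of Pr(e | C = c) ] / [ Σ over c of Pr(C = c) · Π over e in E of Pr(e | C = c) ]
where E is the set of evidence variables.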
34 Single-variable conditional query accuracy
35 Detailed accuracy comparison
36-40 Single-variable conditional query speed
- (Chart build: query-speed comparison, with the values 55; 420; 5,200; and 200,000 appearing as the build progresses)
41 Conditional queries of up to 5 variables
- Examples
- 1-5 query variables
- All other variables are evidence
- NBE exact inference vs. Gibbs sampling
42 Multiple-variable conditional query accuracy
43 Detailed accuracy: 1 variable
44 Detailed accuracy: 5 variables
45-46 Multiple-variable conditional query speed
- (Chart build: query-speed comparison, with the values 42; 430; 4,200; and 130,000 appearing as the build progresses)
47 Summary of results
- Marginal queries
- NBE at least as accurate as Gibbs sampling
- NBE thousands, even millions of times faster
- Conditional queries
- Easy for Gibbs: few hidden variables
- NBE almost as accurate as Gibbs
- NBE still several orders of magnitude faster
- Belief propagation often failed or ran slowly
48 Outline
- Contributions
- Background
- General probability estimation
- Naïve Bayes and Bayesian networks
- Naïve Bayes Estimation (NBE)
- Experiments
- Methodology
- Results
- Conclusion and Future Work
49 Conclusion
- Compared to Bayesian networks, NBE offers
- Similar learning time
- Similar accuracy
- Exponentially faster inference
- Lessons
- Simple models are sometimes the best models.
- Even when simple models do worse, the simplicity and speed may be worthwhile.
- Naïve Bayes is good for a lot more than classification and clustering.
50 Contributions (yes, again)
- Reminder that naïve Bayes can solve more general probabilistic problems.
- NBE algorithm for training naïve Bayes models.
- Extensive empirical evaluation on 50 datasets.
51 Future work
- Compare related mixture models, such as Mixture of Trees (Meila and Jordan, 2000).
- Extend NBE to relational domains.
- Investigate specific applications of NBE.