1
Improving the Reliability of Decision Trees and
Naïve Bayes Learners
ICDM Nov 2004

David Lindsay and Siân Cox
Computer Learning Research Centre,
Royal Holloway University of London,
Egham, Surrey, UK
PhD Supervisors Alex Gammerman and Volodya Vovk

2
Motivation and Outline

Aim: To assess how much the VPM meta-learner improves the quality of
probability forecasts when applied to Naïve Bayes and C4.5 base learners.
Define the problem of probability forecasting.
Introduce resolution and reliability.
Detail the methods we used for assessing reliability, with a focus on ERC plots.
Summarise results.
Discuss conclusions and future work.

3
Reliable Probability Forecasts

Pattern recognition → Probability forecasting
Good quality forecasts should be:
Resolute: forecasts are useful for ranking labels in order of likelihood of occurring
Reliable: forecasts do not lie; labels assigned forecast p should occur with frequency p (a.k.a. calibrated)

We focus on Reliability!
[Figure: predicted probability plotted against empirical frequency]
4
Methods for Assessing Quality of Probability Forecasts

Assessment method                           What does it test?
ROC (Area Under Curve)                      FP/TP rates, Resolution
Loss Functions (square loss and log loss)   Mixture of Reliability and Resolution
Empirical Reliability Curve (ERC)           Under- and over-estimation, Reliability
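
The two loss functions named in the table can be sketched as follows. This is a minimal illustration only: it assumes the common multi-class definitions (square loss summed over classes, log loss as the negative natural log of the probability given to the true class, both averaged over examples). The slides do not state the normalisation or log base actually used, and the function names are hypothetical.

```python
import math

def square_loss(forecasts, labels):
    """Mean over examples of sum_k (p_k - 1[y == k])^2 (assumed definition)."""
    total = 0.0
    for probs, y in zip(forecasts, labels):
        total += sum((p - (1.0 if k == y else 0.0)) ** 2
                     for k, p in enumerate(probs))
    return total / len(labels)

def log_loss(forecasts, labels, eps=1e-15):
    """Mean negative natural log of the probability given to the true class."""
    total = 0.0
    for probs, y in zip(forecasts, labels):
        # Clip at eps so a forecast of exactly 0 does not give an infinite loss.
        total -= math.log(max(probs[y], eps))
    return total / len(labels)

# Toy binary example: each row is a forecast over classes 0 and 1.
forecasts = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
labels = [0, 1, 1]
print(square_loss(forecasts, labels))
print(log_loss(forecasts, labels))
```

Both losses penalise a confident wrong forecast more heavily than a hesitant one, which is why they reflect a mixture of reliability and resolution rather than either property alone.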
5
Empirical Reliability Curves (ERC)
ERC plots generated from Naïve Bayes learners' forecasts on the Abdominal Pain medical dataset
Naïve Bayes: ERC Dev. Area 0.153
VPM Naïve Bayes: ERC Dev. Area 0.006
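
One way a reliability curve and a deviation measure might be computed is sketched below. It assumes equal-width bins over the predicted probability and a count-weighted mean absolute gap between forecast and empirical frequency. The slides give no formula for the ERC or its deviation area, so this simplified stand-in is not the authors' method and will not reproduce the figures quoted above; all names are hypothetical.

```python
def reliability_curve(probs, outcomes, n_bins=10):
    """Return (mean forecast, empirical frequency, count) per non-empty bin.

    probs:    predicted probability for the label in question
    outcomes: 1 if that label actually occurred, else 0
    """
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[b] += p
        hits[b] += o
        counts[b] += 1
    return [(sums[b] / counts[b], hits[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b] > 0]

def deviation_area(curve):
    """Count-weighted mean |empirical frequency - mean forecast| (assumed measure)."""
    n = sum(count for _, _, count in curve)
    return sum(count * abs(freq - mean_p) for mean_p, freq, count in curve) / n

# Over-confident forecasts: 0.95 predicted, but the label occurs half the time.
overconfident = reliability_curve([0.95] * 4, [1, 1, 0, 0])
# Well-calibrated forecasts: 0.5 predicted, label occurs half the time.
calibrated = reliability_curve([0.5] * 4, [1, 0, 1, 0])
print(deviation_area(overconfident), deviation_area(calibrated))
```

A perfectly reliable learner lies on the diagonal (empirical frequency equal to forecast), so its deviation is zero; points below the diagonal indicate over-estimation and points above it under-estimation.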
6
Learners Tested

Base learners: Naïve Bayes and C4.5 Decision Tree
Meta-learners applied:
Binning
Venn Probability Machine (VPM)
Laplace (applied to C4.5 only)
Tested on multi-class and binary data sets from the UCI repository
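
Two of the simpler meta-learners listed above can be sketched briefly. The code assumes the standard Laplace correction for class counts at a tree leaf, (n_k + 1) / (n + C), and an equal-frequency binning of base-learner scores. The slides do not specify which binning scheme was used, so that part is an assumption, and all function names are hypothetical. VPM is not sketched here.

```python
from bisect import bisect_left

def laplace_forecast(class_counts):
    """Laplace-corrected class frequencies at a leaf: (n_k + 1) / (n + C)."""
    n = sum(class_counts)
    c = len(class_counts)
    return [(n_k + 1) / (n + c) for n_k in class_counts]

def fit_bins(scores, outcomes, n_bins=3):
    """Equal-frequency binning of training scores (assumed scheme).

    Needs at least n_bins examples. Returns (edges, freqs): the largest score
    in each bin except the last, and the fraction of positives in each bin.
    """
    pairs = sorted(zip(scores, outcomes))
    size = len(pairs) // n_bins
    edges, freqs = [], []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any remainder.
        stop = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = pairs[start:stop]
        freqs.append(sum(o for _, o in chunk) / len(chunk))
        if b < n_bins - 1:
            edges.append(chunk[-1][0])
    return edges, freqs

def binned_forecast(score, edges, freqs):
    """Replace a raw score by the positive frequency of the bin it falls in."""
    return freqs[bisect_left(edges, score)]

# A pure leaf with 8 examples of one class: raw frequencies would be [1.0, 0.0].
print(laplace_forecast([8, 0]))  # [0.9, 0.1]

edges, freqs = fit_bins([0.1, 0.2, 0.5, 0.6, 0.9, 0.95], [0, 0, 0, 1, 1, 1])
print(binned_forecast(0.55, edges, freqs))  # 0.5
```

Both corrections pull extreme forecasts towards observed frequencies, which is the kind of over-confidence that an unmodified decision tree or Naïve Bayes learner tends to show.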

7
Results

Notice that the VPM C4.5 error rate is slightly higher than that of Binned C4.5 (by 0.1).
But VPM performs best in improving reliability!
And VPM improves resolution → overall loss is improved

Learner        Error   ERC Dev.   ROC Area   Sqr Loss   Log Loss
C4.5           15.2    0.05       0.61       0.24       1.26
C4.5 Laplace   15.2    0.03       0.70       0.23       0.76
Binned C4.5    14.6    0.02       0.71       0.22       0.75
VPM C4.5       14.7    0.006      0.77       0.20       0.69
8
Conclusions

VPM is slow, but gives a good improvement in both reliability and resolution.
ERC is a nice visualisation, and a measure of reliability on its own.
Reliability should not be overlooked; classification accuracy is not always
useful to look at!

9
Current and Future Work

Have tested a much larger set of meta-learners:
ERC re-calibration
Bagging
Boosting
Find Best Weights (FBW)
Tested other base learners:
SVM
Neural Networks
K-Nearest Neighbours
Bayesian Belief Networks
Developed an extension to WEKA

10
Bibliography

A. P. Dawid. Calibration-based empirical probability (with discussion). Annals of Statistics, 13:1251-1285, 1985.
V. Vovk, G. Shafer, and I. Nouretdinov. Self-calibrating probability forecasting. In Advances in Neural Information Processing Systems 16, 2003.
D. Lindsay. Visualising and improving reliability: a machine learning perspective. CLRC-TR-04-01, Royal Holloway University, England, 2004.
A. H. Murphy. A new vector partition of the probability score. Journal of Applied Meteorology, 12:595-600, 1973.

11
VPM Compared With Underlying Naïve Bayes Learner
Probability forecasts for each class

Patient Number   Normal   Benign   Malignant   Metastases
Naïve Bayes forecasts
1653             8.2e-3   0.99     4.2e-3      1.38e-3
2490             0.06     0.77     0.17        2.2e-4
5831             0.93     0.02     0.03        0.02
VPM Naïve Bayes forecasts
1653             0.14     0.73     0.07        0.06
2490             0.16     0.14     0.14        0.56
5831             0.51     1.0e-3   0.46        0.029

Key: Predicted underlined, Actual class
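
As a reading aid for the table, the predicted class in each row is the class with the largest forecast. The sketch below transcribes the forecasts from the table and picks that class for each learner; the dictionary layout and function name are hypothetical. The actual classes are not included because they are not given in the table as text.

```python
CLASSES = ["Normal", "Benign", "Malignant", "Metastases"]

# Forecasts transcribed from the table, keyed by patient number.
naive_bayes = {
    1653: [8.2e-3, 0.99, 4.2e-3, 1.38e-3],
    2490: [0.06, 0.77, 0.17, 2.2e-4],
    5831: [0.93, 0.02, 0.03, 0.02],
}
vpm_naive_bayes = {
    1653: [0.14, 0.73, 0.07, 0.06],
    2490: [0.16, 0.14, 0.14, 0.56],
    5831: [0.51, 1.0e-3, 0.46, 0.029],
}

def predicted_class(forecast):
    """Return (class name, probability) for the largest forecast."""
    best = max(range(len(forecast)), key=lambda k: forecast[k])
    return CLASSES[best], forecast[best]

for patient in naive_bayes:
    print(patient,
          predicted_class(naive_bayes[patient]),
          predicted_class(vpm_naive_bayes[patient]))
```

For patient 2490 the two learners disagree: Naïve Bayes puts 0.77 on Benign, while VPM Naïve Bayes puts 0.56 on Metastases. In all three rows the largest VPM forecast is less extreme than the largest Naïve Bayes forecast.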
