1
Improving the Reliability of Decision Trees and
Naïve Bayes Learners
ICDM Nov 2004
  • David Lindsay and Siân Cox
  • Computer Learning Research Centre,
  • Royal Holloway University of London,
  • Egham, Surrey, UK
  • PhD Supervisors: Alex Gammerman and Volodya Vovk

2
Motivation and Outline
  • Aim: To assess how much the VPM meta-learner
    improves the quality of probability forecasts when
    applied to Naïve Bayes and C4.5 base-learners.
  • Define the problem of probability forecasting.
  • Introduce resolution and reliability.
  • Detail the methods we used for assessing reliability,
    focusing on ERC plots.
  • Summarise results.
  • Discuss conclusions and future work.

3
Reliable Probability Forecasts
  • Pattern recognition → Probability forecasting
  • Good quality forecasts should be
      • Resolute: forecasts are useful for ranking labels
        in order of likelihood of occurring
      • Reliable: forecasts do not lie; labels assigned
        forecast p should occur with frequency p
        (a.k.a. calibrated)

We focus on Reliability!
[Figure: plot with axes "Predicted Probability" and "Empirical frequency"]
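
The reliability property above (labels forecast with probability p should occur with frequency p) can be checked numerically. The following is a minimal sketch, assuming binary outcomes and equal-width bins; the function name `reliability_table` and the toy data are illustrative and do not come from the presentation.

```python
from collections import defaultdict

def reliability_table(forecasts, outcomes, n_bins=5):
    """Compare the mean forecast with the empirical frequency in each bin.

    forecasts: probabilities in [0, 1] for the event.
    outcomes:  1 if the event occurred, else 0.
    Returns a list of (mean_forecast, empirical_frequency, count) per
    non-empty bin, ordered by bin.
    """
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        table.append((mean_p, freq, len(pairs)))
    return table

# Toy data, made up for illustration only.
forecasts = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0, 1]

for mean_p, freq, n in reliability_table(forecasts, outcomes):
    print(f"forecast ~{mean_p:.2f}  observed {freq:.2f}  (n={n})")
```

A reliable forecaster would show observed frequencies close to the mean forecast in every bin; large gaps indicate under- or over-estimation.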
4
Methods for Assessing Quality of Probability
Forecasts
Assessment method                            What does it test?
ROC (Area Under Curve)                       FP/TP rates, Resolution
Loss Functions (square loss and log loss)    Mixture of Reliability and Resolution
Empirical Reliability Curve (ERC)            Under- and over-estimation, Reliability
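
As an illustration of the two loss functions named above, here is a hedged sketch for multi-class forecasts. The exact normalisation used in the presentation is not stated, so the definitions below (square loss summed over classes, natural-log loss on the true label) are assumptions, and the function names are hypothetical.

```python
import math

def square_loss(forecast, true_label):
    """Sum over classes of (p_c - 1[c == true_label])^2 for one example."""
    return sum((p - (1.0 if c == true_label else 0.0)) ** 2
               for c, p in forecast.items())

def log_loss(forecast, true_label, eps=1e-12):
    """Negative natural log of the probability given to the true label.

    eps guards against log(0) when the true label was forecast as impossible.
    """
    return -math.log(max(forecast[true_label], eps))

def mean_loss(loss_fn, forecasts, labels):
    """Average a per-example loss over a test set."""
    return sum(loss_fn(f, y) for f, y in zip(forecasts, labels)) / len(labels)

# Toy forecasts, made up for illustration only.
moderate = {"a": 0.7, "b": 0.2, "c": 0.1}          # true label "a"
overconfident = {"a": 0.99, "b": 0.005, "c": 0.005}  # true label "b"

print(square_loss(moderate, "a"), log_loss(moderate, "a"))
print(square_loss(overconfident, "b"), log_loss(overconfident, "b"))
```

Both losses penalise the confident wrong forecast heavily, which is why they reflect a mixture of reliability and resolution rather than either property alone.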
5
Empirical Reliability Curves (ERC)
ERC plots generated from Naïve Bayes learners'
forecasts on the Abdominal Pain medical dataset
[Figure: two ERC panels]
  • Naïve Bayes: ERC Dev. Area 0.153
  • VPM Naïve Bayes: ERC Dev. Area 0.006
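
The slide does not say how the ERC deviation area is computed. One plausible reading, assumed here, is the area between the reliability curve and the diagonal of perfect reliability. The sketch below approximates that area with the trapezoid rule; the function name `deviation_area` and the example curves are hypothetical.

```python
def deviation_area(curve):
    """Approximate the area between a reliability curve and the diagonal.

    curve: list of (predicted_probability, empirical_frequency) points.
    Applies the trapezoid rule to |empirical - predicted| between
    consecutive points, so it is only an approximation where the curve
    crosses the diagonal inside a segment.
    """
    pts = sorted(curve)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        d0, d1 = abs(y0 - x0), abs(y1 - x1)
        area += 0.5 * (d0 + d1) * (x1 - x0)
    return area

# Example curves, made up for illustration only.
perfect = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
overconfident = [(0.0, 0.2), (0.5, 0.5), (1.0, 0.8)]

print(deviation_area(perfect))
print(deviation_area(overconfident))
```

Under this reading, a smaller area means the curve hugs the diagonal more closely, i.e. the forecasts are more reliable.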
6
Learners Tested
  • Naïve Bayes and C4.5 Decision Tree
  • Meta-learners applied
      • Binning
      • Venn Probability Machine (VPM)
      • Laplace (just to C4.5)
  • Tested on multi-class and binary data from UCI
    repository
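
Two of the meta-learners listed above are simple enough to sketch. The code below assumes the standard Laplace correction for class counts at a decision-tree leaf and an equal-frequency form of binning for binary labels; the presentation does not give its exact variants, and all function names are illustrative. The VPM is not sketched here.

```python
def laplace_forecast(leaf_counts):
    """Laplace-corrected class probabilities at a tree leaf.

    Uses (n_c + 1) / (N + K) for K classes and N examples at the leaf,
    which avoids forecasts of exactly 0 or 1 from small leaves.
    """
    total = sum(leaf_counts.values())
    k = len(leaf_counts)
    return {c: (n + 1) / (total + k) for c, n in leaf_counts.items()}

def fit_bins(scores, labels, n_bins):
    """Equal-frequency binning on held-out (score, label) pairs.

    Assumes binary labels (0/1) and n_bins <= len(scores).
    Returns a list of (upper_score, empirical_frequency) per bin.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        upper = chunk[-1][0]
        freq = sum(y for _, y in chunk) / len(chunk)
        bins.append((upper, freq))
    return bins

def binned_forecast(bins, score):
    """Replace a raw score by the empirical frequency of its bin."""
    for upper, freq in bins:
        if score <= upper:
            return freq
    return bins[-1][1]  # scores above the last boundary use the last bin

# Toy data, made up for illustration only.
print(laplace_forecast({"pos": 3, "neg": 0}))
bins = fit_bins([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 1, 0, 1, 1], n_bins=2)
print(binned_forecast(bins, 0.25), binned_forecast(bins, 0.95))
```

In the Laplace example the raw leaf frequencies would be 1.0 and 0.0; the correction pulls them towards the uniform distribution.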

7
Results
  • Notice that the VPM C4.5 error rate is slightly higher
    (by 0.1) than that of Binned C4.5
  • But VPM performs best in improving reliability!
  • And VPM improves resolution → overall loss is
    improved

Learner         Error   ERC Dev.   ROC Area   Sqr Loss   Log Loss
C4.5            15.2    0.05       0.61       0.24       1.26
C4.5 Laplace    15.2    0.03       0.70       0.23       0.76
Binned C4.5     14.6    0.02       0.71       0.22       0.75
VPM C4.5        14.7    0.006      0.77       0.20       0.69
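
The claims on this slide can be checked directly against the results table. The short sketch below copies the figures from the table and picks the best learner per measure, assuming lower is better for every column except ROC Area; the variable and function names are illustrative.

```python
# Figures copied from the results table:
# (Error, ERC Dev., ROC Area, Sqr Loss, Log Loss)
results = {
    "C4.5":         (15.2, 0.05,  0.61, 0.24, 1.26),
    "C4.5 Laplace": (15.2, 0.03,  0.70, 0.23, 0.76),
    "Binned C4.5":  (14.6, 0.02,  0.71, 0.22, 0.75),
    "VPM C4.5":     (14.7, 0.006, 0.77, 0.20, 0.69),
}
metrics = ["Error", "ERC Dev.", "ROC Area", "Sqr Loss", "Log Loss"]
higher_is_better = {"ROC Area"}  # assumption: all other columns are losses

def best_learner(metric):
    """Return the learner with the best value for the given column."""
    i = metrics.index(metric)
    pick = max if metric in higher_is_better else min
    return pick(results, key=lambda name: results[name][i])

best = {m: best_learner(m) for m in metrics}
error_gap = round(results["VPM C4.5"][0] - results["Binned C4.5"][0], 1)

for m in metrics:
    print(f"{m:9s} best: {best[m]}")
print("VPM vs Binned error gap:", error_gap)
```

Binned C4.5 wins only on error, by 0.1, while VPM C4.5 is best on the reliability, resolution and loss columns.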
8
Conclusions
  • VPM is slow, but gives good improvement in
    reliability and resolution.
  • ERC is a nice visualisation and a measure of
    reliability on its own (separately from resolution).
  • Reliability should not be overlooked;
    classification accuracy is not always useful to
    look at!

9
Current and Future Work
  • Have tested a much larger set of meta-learners
      • ERC re-calibration
      • Bagging
      • Boosting
      • Find Best Weights (FBW)
  • Tested other base learners
      • SVM
      • Neural Networks
      • K-Nearest Neighbours
      • Bayesian Belief Networks
  • Developed extension to WEKA

10
Bibliography
  • A. P. Dawid. Calibration-based empirical
    probability (with discussion). Annals of
    Statistics, 13:1251-1285, 1985
  • V. Vovk, G. Shafer, and I. Nouretdinov.
    Self-calibrating probability forecasting. In
    Advances in Neural Information Processing Systems
    16, 2003
  • D. Lindsay. Visualising and improving reliability:
    a machine learning perspective. CLRC-TR-04-01,
    Royal Holloway University, England, 2004
  • A. H. Murphy. A new vector partition of the
    probability score. Journal of Applied
    Meteorology, 12:595-600, 1973

11
VPM Compared With Underlying Naïve Bayes Learner
Probability Forecasts For Each Class

Patient Number   Normal    Benign    Malignant   Metastases
Naïve Bayes Forecasts
1653             8.2e-3    0.99      4.2e-3      1.38e-3
2490             0.06      0.77      0.17        2.2e-4
5831             0.93      0.02      0.03        0.02
VPM Naïve Bayes Forecasts
1653             0.14      0.73      0.07        0.06
2490             0.16      0.14      0.14        0.56
5831             0.51      1.0e-3    0.46        0.029

Key (formatting from the original slide not preserved): predicted class underlined, actual class marked
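
Because the underlining in the key did not survive, the predicted class for each row can be recovered as the class with the largest forecast. The sketch below does this using the figures from the table; the variable and function names are illustrative, and the actual classes are not available in this transcript, so no accuracy is computed.

```python
classes = ["Normal", "Benign", "Malignant", "Metastases"]

# Forecasts copied from the table, keyed by patient number.
naive_bayes = {
    1653: [8.2e-3, 0.99, 4.2e-3, 1.38e-3],
    2490: [0.06, 0.77, 0.17, 2.2e-4],
    5831: [0.93, 0.02, 0.03, 0.02],
}
vpm_naive_bayes = {
    1653: [0.14, 0.73, 0.07, 0.06],
    2490: [0.16, 0.14, 0.14, 0.56],
    5831: [0.51, 1.0e-3, 0.46, 0.029],
}

def predicted(forecast):
    """Return (class name, probability) for the largest forecast."""
    p = max(forecast)
    return classes[forecast.index(p)], p

# Patients whose predicted class differs between the two learners.
changed = [pid for pid in naive_bayes
           if predicted(naive_bayes[pid])[0] != predicted(vpm_naive_bayes[pid])[0]]

for pid in naive_bayes:
    print(pid, predicted(naive_bayes[pid]), predicted(vpm_naive_bayes[pid]))
print("prediction changed for:", changed)
```

In all three rows the VPM assigns a lower probability to its top class than Naïve Bayes does, and for patient 2490 the top class itself changes.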