Title: Bagging a Stacked Classifier
1. Bagging a Stacked Classifier
- Lemmens A., Joossens K., and Croux C.
2. Outline of the talk
- Combining different classifiers by stacking (Wolpert 1992; LeBlanc and Tibshirani 1996)
- We propose:
  - A new combination algorithm: Optimal Stacking
  - Applying bagging after stacking: Bagged Optimal Stacking
3. Stacked Classification
- The problem
- Training set of n observations (x_i, y_i), i = 1, ..., n
- The aim is to predict the class label y for a new instance x
4. Stacked Classification
- Several classification models exist:
  - Logistic regression
  - Discriminant analysis
  - Classification trees
  - Neural nets
  - Support vector machines
- Which one is best?
- A solution is to combine:
  - Homogeneous classifiers, e.g. boosting (for a review, see Hastie, Tibshirani and Friedman 2001)
  - Heterogeneous classifiers, e.g. stacking
5. Stacked Classification
- Stacking consists in combining K level-zero classifiers
- [Diagram: level-zero classifiers feeding a level-one combining classifier]
6. Stacked Classification
- Stacking consists in combining K level-zero classifiers
- [Diagram: level-zero classifiers feeding a level-one combining classifier]
- The level-one inputs are the scores C_k^(-i)(x_i), k = 1, ..., K, where the superscript -i indicates that observation i was left out when fitting classifier k
- Estimated by cross-validation (a code sketch follows below)
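A minimal sketch of this step, assuming scikit-learn and an illustrative choice of data set, level-zero classifiers and fold count (not the authors' code): each observation is scored by models fitted without its fold, and the resulting score matrix is the level-one input.

    # Illustrative sketch: cross-validated level-zero scores as level-one input.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # K heterogeneous level-zero classifiers (this choice is an assumption).
    level_zero = [
        LogisticRegression(max_iter=1000),
        LinearDiscriminantAnalysis(),
        DecisionTreeClassifier(max_depth=5, random_state=0),
    ]

    # Cross-validated class-1 scores: observation i is scored by a model
    # fitted on the folds that do not contain observation i.
    Z = np.column_stack([
        cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]
        for clf in level_zero
    ])
    print(Z.shape)  # (n, K) matrix of level-one inputs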
7. How to combine classifiers?
- Optimal Weighted Average combination
- Find weights w_1, ..., w_K such that the combined classifier C(x) = w_1 C_1(x) + ... + w_K C_K(x)
- performs better than any level-zero classifier,
- with w_k >= 0 for every k,
- and w_1 + ... + w_K = 1.
8. How to combine classifiers?
- Optimal Weighted Average combination
- Find weights w_1, ..., w_K such that the combined classifier C(x) = w_1 C_1(x) + ... + w_K C_K(x)
- performs better than any level-zero classifier,
- with w_k >= 0 for every k,
- and w_1 + ... + w_K = 1.
- "Performs better" is measured by the cross-validated error rate on the training set (illustrated below)
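A small helper illustrating this criterion on the score matrix Z from the previous sketch: the error rate of a convex combination of the cross-validated level-zero scores. The 0.5 decision threshold is an assumption, not something stated in the talk.

    import numpy as np

    def combined_error_rate(Z, y, w, threshold=0.5):
        # Error rate of the classifier that thresholds the weighted score Z @ w,
        # with w_k >= 0 and sum_k w_k = 1 (the Optimal Weighted Average weights).
        w = np.asarray(w, dtype=float)
        assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
        y_pred = (Z @ w >= threshold).astype(int)
        return np.mean(y_pred != y)

    # Example: equal weights as a naive baseline; the greedy algorithm on the
    # next slide searches for better weights.
    # print(combined_error_rate(Z, y, np.ones(Z.shape[1]) / Z.shape[1]))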
9. How to combine classifiers?
- Finding the weights by a greedy algorithm:
- Compute, by 10-fold cross-validation, the scores of the K level-zero classifiers.
- Compute the cross-validated error rates to sort the classifiers from the smallest to the highest error rate. Set the combined classifier equal to the best single classifier.
- Update cycle: for each of the remaining classifiers, find the mixing weight such that the error rate of the combined classifier is minimized.
- Set the combined classifier to this new weighted average.
- Iterate over different update cycles (a sketch follows below).
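One possible reading of this greedy search, sketched under stated assumptions (the alpha grid, the 0.5 threshold and the fixed number of cycles are choices made here, not the authors' settings):

    import numpy as np

    def greedy_stacking_weights(Z, y, n_cycles=5, alpha_grid=None):
        # Z: (n, K) matrix of cross-validated level-zero scores, y: 0/1 labels.
        if alpha_grid is None:
            alpha_grid = np.linspace(0.0, 1.0, 101)
        K = Z.shape[1]

        def err(score):
            # Error rate of a score vector on the training set (0.5 cut-off assumed).
            return np.mean((score >= 0.5).astype(int) != y)

        # Sort classifiers from the smallest to the highest error rate.
        order = np.argsort([err(Z[:, k]) for k in range(K)])

        w = np.zeros(K)
        w[order[0]] = 1.0                      # start from the best single classifier
        for _ in range(n_cycles):              # iterate over update cycles
            for k in order[1:]:
                current = Z @ w
                # Mixing weight minimizing the error of (1 - a)*current + a*C_k.
                errors = [err((1 - a) * current + a * Z[:, k]) for a in alpha_grid]
                a_best = alpha_grid[int(np.argmin(errors))]
                w = (1 - a_best) * w           # weights stay nonnegative
                w[k] += a_best                 # and keep summing to one
        return w

By construction, every accepted update can only decrease (never increase) the cross-validated error rate of the combination, which is the property claimed on the next slide.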
10. Advantages of the Algorithm
- By construction, the stacked classifier outperforms each of its components in terms of cross-validated error rate on the training data.
- Other optimization criteria can be chosen (ROC, error rate, specificity, etc.).
- Easy to implement.
- Remark: other level-one classifiers exist.
11. Results (1): Cross-Validated Error Rates
12. Results (2): Test Error Rates
13. Bagging the Stacked Classifiers
- Why bagging (Breiman 1996)?
- Bagging reduces the variance of the stacked classifiers,
- and exploits their instability to improve predictive performance (see the sketch below).
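A rough sketch of the bagging step under stated assumptions: B bootstrap replicates of the training set, the full stacking procedure refitted on each, and the resulting scores averaged. The helper fit_stacked is hypothetical and stands for the stacking pipeline sketched above.

    import numpy as np

    def bagged_stacked_scores(X, y, X_new, fit_stacked, B=50, random_state=0):
        # fit_stacked(X, y) is assumed to refit the level-zero classifiers and the
        # optimal weights, and to return a function mapping new data to scores.
        rng = np.random.default_rng(random_state)
        n = len(y)
        scores = np.zeros((B, len(X_new)))
        for b in range(B):
            idx = rng.integers(0, n, size=n)      # bootstrap sample with replacement
            scores[b] = fit_stacked(X[idx], y[idx])(X_new)
        return scores.mean(axis=0)                # average over the B replicates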
14. Bagging Example 1: Test Error Rate
15. Bagging Example 2: Test Error Rate
16. Results (3): Test Error Rates
17. Conclusions
- New optimal weighted combination algorithm for stacking:
  - Easy to implement,
  - Optimizes any criterion of choice (ROC, error rate, specificity, etc.),
  - By construction, systematic improvement of the cross-validated error rate on the training data.
- Bagging a stacked classifier:
  - The bagged optimal weighted average outperforms the non-bagged version; this also holds for other level-one classifiers.
18. References
- Breiman, L. (1996), Bagging Predictors, Machine Learning, 24, 123-140.
- Hastie, T., Tibshirani, R. and Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York.
- LeBlanc, M. and Tibshirani, R. (1996), Combining estimates in regression and classification, Journal of the American Statistical Association, 91, 1641-1647.
- Wolpert, D.H. (1992), Stacked generalization, Neural Networks, 5, 241-259.
- Proceedings, COMPSTAT 2004.