Title: Diversity in Random Subspacing Ensembles
1 Diversity in Random Subspacing Ensembles
DaWaK 2004, Zaragoza, Spain, September 1-3, 2004
- Alexey Tsymbal, Padraig Cunningham, Department of Computer Science, Trinity College Dublin, Ireland
- Mykola Pechenizkiy, Department of Computer Science, University of Jyväskylä, Finland
2 Contents
- Introduction: the task of classification
- Introduction to ensembles
- Ensemble feature selection and random subspacing
- Integration methods for ensembles
- Measures of diversity in classification ensembles
- Experimental results: correlation between the diversities and the improvement due to ensembles
- Conclusions and future work
3 The task of classification
- J classes, n training observations, p features
[Diagram: the training set and a new instance to be classified enter CLASSIFICATION, which outputs the class membership of the new instance.]
- Examples: prognosis of recurrence of breast cancer, diagnosis of thyroid diseases, heart attack prediction, etc.
4 What is ensemble learning?
- Ensemble learning refers to a collection of methods that learn a target function by training a number of individual learners and combining their predictions
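As a minimal illustration of this definition, the sketch below combines three hypothetical rule-based learners by majority vote; the learners and the instance format are invented for illustration, not taken from the talk:

```python
from collections import Counter

# Three hypothetical rule-based learners (invented for illustration);
# each maps a 2-feature instance to a class label 0 or 1.
def learner_a(x): return 1 if x[0] > 0.5 else 0
def learner_b(x): return 1 if x[1] > 0.5 else 0
def learner_c(x): return 1 if x[0] + x[1] > 1.0 else 0

def ensemble_predict(x, learners):
    """Combine the learners' predictions by majority vote."""
    votes = [h(x) for h in learners]
    return Counter(votes).most_common(1)[0][0]

print(ensemble_predict((0.8, 0.2), [learner_a, learner_b, learner_c]))  # prints 0
```

In practice the individual learners would be trained classifiers rather than fixed rules; the combination step is the same.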
5 Ensemble learning
6 Why ensemble learning?
- Accuracy: a more reliable mapping can be obtained by combining the output of multiple experts.
- Efficiency: a complex problem can be decomposed into multiple sub-problems that are easier to understand and solve (divide-and-conquer approach); e.g. mixture of experts, ensemble feature selection.
- There is no single model that works for all pattern recognition problems! (no free lunch theorem)

"To solve really hard problems, we'll have to use several different representations. It is time to stop arguing over which type of pattern-classification technique is best. Instead we should work at a higher level of organization and discover how to build managerial systems to exploit the different virtues and evade the different limitations of each of these ways of comparing things." (Minsky, 1991)
7 When ensemble learning?
- When you can build base classifiers that are more accurate than chance (accuracy), and, more importantly,
- that are as independent from each other as possible (diversity)
8 Why do ensembles work? 1/2
- The desired target function may not be implementable with individual classifiers, but may be approximated by ensemble averaging.
- Assume you want to build a diagonal decision boundary with decision trees.
- The decision boundaries of decision trees are hyperplanes parallel to the coordinate axes, as in the figures.
- By averaging a large number of such staircases, the diagonal decision boundary can be approximated with arbitrarily good accuracy.
[Figures a and b: two two-class problems (Class 1 vs. Class 2) whose diagonal boundary is approximated by axis-parallel staircase decision boundaries.]
9 Why do ensembles work? 2/2
- Theoretical results by Hansen & Salamon (1990):
- If we can assume that the classifiers are independent in their predictions and their accuracy is > 50%, we can push accuracy arbitrarily high by combining more classifiers.
- Key assumption: classifiers are independent in their predictions
- this is not a very reasonable assumption
- more realistic: for the data points where classifiers predict with > 50% accuracy, we can push accuracy arbitrarily high (some data points are just too difficult)
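The Hansen & Salamon argument can be checked numerically: under the independence assumption, the probability that a majority vote of T classifiers is correct is a binomial tail. A small sketch, assuming an odd T and a shared per-classifier accuracy p:

```python
from math import comb

def majority_vote_accuracy(p, T):
    """P(more than T/2 of T independent classifiers are correct),
    i.e. the accuracy of a majority vote for odd T."""
    return sum(comb(T, k) * p**k * (1 - p)**(T - k)
               for k in range(T // 2 + 1, T + 1))

# Accuracy grows toward 1 as the ensemble size grows (for p > 0.5).
for T in (1, 5, 25, 101):
    print(T, round(majority_vote_accuracy(0.6, T), 4))
```

With p = 0.6, a single classifier is right 60% of the time, but a 5-member vote already exceeds 68%; this is the sense in which accuracy can be pushed "arbitrarily high" as T grows.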
10 How to make an effective ensemble?
- Two basic decisions when designing ensembles
- How to generate the base classifiers?
- How to integrate them?
11 Methods for generating the base classifiers
- Subsampling the training examples
- Manipulating the input features
- Manipulating the output targets
- Modifying the learning parameters of the classifier
- Using heterogeneous models (not often used)
12 Ensemble Feature Selection and RSM
- How to prepare inputs for the generation of the base classifiers?
- Sample the training set
- Manipulate the input features
- Manipulate the output targets (class values)
- Goal of traditional feature selection: find and remove features that are unhelpful or destructive to learning, making one feature subset for a single classifier.
- Goals of ensemble feature selection: find and remove features that are unhelpful or destructive to learning, making different feature subsets for a number of classifiers; and find feature subsets that will promote disagreement between the classifiers.
- Random Subspace Method (RSM): the accuracy of the ensemble members is compensated for by their diversity.
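A minimal sketch of the feature-sampling step behind RSM, assuming only that each base classifier is given a fixed-size random subset of the feature indices; the subset size and counts below are arbitrary illustration values:

```python
import random

def random_subspaces(n_features, n_classifiers, subspace_size, seed=0):
    """Draw one random feature subset (without replacement) per classifier."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_classifiers)]

# e.g. 4 base classifiers, each trained on 5 of 10 features
subsets = random_subspaces(n_features=10, n_classifiers=4, subspace_size=5)
for s in subsets:
    print(s)
```

Each base classifier would then be trained on its own column subset of the training data; because the subsets differ, the classifiers disagree on some instances, which is exactly the diversity RSM relies on.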
13 Integration of classifiers
[Diagram: Integration branches into Selection and Combination, each either Static or Dynamic. Selection: Static Selection (CVM) and Dynamic Selection (DS). Combination: Weighted Voting (WV, static), Dynamic Voting (DV), and Dynamic Voting with Selection (DVS).]
- Motivation for the Dynamic Integration: the main assumption is that each classifier is the best in some sub-areas of the whole data set, where its local error is comparatively lower than the corresponding errors of the other classifiers.
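This assumption can be sketched in code: dynamic selection estimates each classifier's local error from its recorded errors on the k nearest training instances and picks the locally best one. The data layout (`errors[i][j]` = 1 if classifier j misclassified training instance i) is an assumption chosen for illustration, not the talk's exact formulation:

```python
def dynamic_select(x, train_X, errors, k=3):
    """Return the index of the classifier with the lowest local error
    around x, estimated on the k nearest training instances."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbours = sorted(range(len(train_X)), key=lambda i: dist(x, train_X[i]))[:k]
    n_classifiers = len(errors[0])
    local_err = [sum(errors[i][j] for i in neighbours) / k
                 for j in range(n_classifiers)]
    return min(range(n_classifiers), key=local_err.__getitem__)

train_X = [(0, 0), (1, 0), (0, 1), (5, 5)]          # training instances
errors  = [[0, 1], [0, 1], [1, 0], [1, 0]]          # errors[i][j] as above
print(dynamic_select((0.1, 0.1), train_X, errors))  # classifier 0 is locally best
```

Dynamic voting (DV) would instead weight all classifiers by one minus their local error; dynamic voting with selection (DVS) first discards the locally worst classifiers and then applies DV to the rest.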
14 The space model: motivation for dynamic integration
- Information about classifiers' errors on training instances can be used for learning, just as the original instances are used for learning.
15 Dynamic integration of classifiers: an example
- 3 base classifiers
- 2 features, X1 and X2
[Figure: the (X1, X2) feature space with a new point P, its nearest neighbours NN1, NN2, NN3 at distances d1, d2, d3 (d_max the largest considered distance), and per-classifier error vectors such as (0 0 0), (0 1 0), (1 1 0), (0 0 1), and (0.3 0.6 0) attached to the training instances.]
16 Ensembles: the need for diversity
- The overall error depends on the average error of the ensemble members.
- Increasing ambiguity decreases the overall error, provided it does not result in an increase in average error (Krogh and Vedelsby, 1995).
17 Measuring ensemble diversity 1/4
- The ensemble ambiguity in regression is measured as the weighted average of the squared differences between the predictions of the base networks and the prediction of the ensemble.
- The case of classification: 1) plain disagreement, and 2) fail/non-fail disagreement
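For concreteness, these two measures can be sketched on toy prediction vectors (the vectors are invented; the definitions, pairwise fraction of differing predictions and pairwise fraction of instances on which exactly one classifier is correct, follow the usual formulations of these measures):

```python
def plain_disagreement(pred_i, pred_j):
    """Fraction of instances where the two classifiers predict different labels."""
    return sum(a != b for a, b in zip(pred_i, pred_j)) / len(pred_i)

def fail_nonfail_disagreement(pred_i, pred_j, y):
    """Fraction of instances where exactly one of the two classifiers is correct."""
    return sum((a == t) != (b == t)
               for a, b, t in zip(pred_i, pred_j, y)) / len(y)

y      = [0, 1, 1, 0, 1]   # true labels (toy data)
pred_i = [0, 1, 0, 0, 1]
pred_j = [0, 0, 1, 1, 1]
print(plain_disagreement(pred_i, pred_j))            # prints 0.6 (3 of 5 differ)
print(fail_nonfail_disagreement(pred_i, pred_j, y))  # prints 0.6
```

Plain disagreement ignores the true labels entirely, while fail/non-fail disagreement uses them; the two can coincide, as in this toy example, but need not.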
18 Measuring ensemble diversity 2/4
- The case of classification: 3) double fault, and 4) the Q statistic
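These two pairwise measures can be sketched the same way. The Q statistic below uses the standard contingency-table form Q = (N11·N00 − N10·N01)/(N11·N00 + N10·N01), where N_ab counts instances by whether each of the two classifiers is correct; the zero-denominator fallback is an implementation choice, not part of the definition:

```python
def pair_counts(pred_i, pred_j, y):
    """n[(a, b)]: a = classifier i correct?, b = classifier j correct?"""
    n = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for a, b, t in zip(pred_i, pred_j, y):
        n[(int(a == t), int(b == t))] += 1
    return n

def double_fault(pred_i, pred_j, y):
    """Fraction of instances misclassified by both classifiers."""
    return pair_counts(pred_i, pred_j, y)[(0, 0)] / len(y)

def q_statistic(pred_i, pred_j, y):
    n = pair_counts(pred_i, pred_j, y)
    num = n[(1, 1)] * n[(0, 0)] - n[(1, 0)] * n[(0, 1)]
    den = n[(1, 1)] * n[(0, 0)] + n[(1, 0)] * n[(0, 1)]
    return num / den if den else 0.0
```

Q ranges over [-1, 1]: negative values mean the classifiers tend to err on different instances (high diversity), positive values that they err together.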
19 Measuring ensemble diversity 3/4
- The case of classification: 5) the correlation coefficient, and 6) the kappa statistic
20 Measuring ensemble diversity 4/4
- The case of classification: 7) entropy, and 8) variance
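As a sketch of the entropy measure: one common formulation averages, over all instances, the Shannon entropy of the class-vote distribution produced by the ensemble on that instance; the exact normalization used in the paper may differ:

```python
from math import log

def vote_entropy(predictions, n_classes):
    """Average per-instance entropy of the class-vote distribution.
    predictions[j][i] is the label classifier j predicts for instance i."""
    L = len(predictions)      # number of classifiers
    N = len(predictions[0])   # number of instances
    total = 0.0
    for i in range(N):
        counts = [0] * n_classes
        for j in range(L):
            counts[predictions[j][i]] += 1
        total += -sum((c / L) * log(c / L) for c in counts if c)
    return total / N
```

The measure is 0 when all classifiers agree on every instance and grows as the votes spread across classes, so unlike the pairwise measures it summarizes the whole ensemble at once.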
21 Experimental investigations
- 21 datasets from the UCI Machine Learning Repository
- 70 test runs of Monte-Carlo cross-validation
- 70/30 train/test set division
- 5 different ensemble sizes: 5, 10, 25, 50, and 100
- 6 integration methods: Static Selection (SS), Simple Voting (V), Weighted Voting (WV), Dynamic Selection (DS), Dynamic Voting (DV), and Dynamic Voting with Selection (DVS)
- the test environment of the previously developed EFS_SBC algorithm (Ensemble Feature Selection with Simple Bayesian Classification) was used
- the objective was to measure the correlations between the diversity measures and the improvement due to ensembles
22 Experiments: results 1/2
Fig. 1. The correlations for the eight diversities and five ensemble sizes, averaged over the data sets and integration methods
23 Experiments: results 2/2
Fig. 2. The correlations for the eight diversities and six integration methods, averaged over the data sets and ensemble sizes
24 Conclusions
- we have considered 8 ensemble diversity metrics, 6 of which are pairwise measures
- to check the goodness of each measure of diversity, we calculated its correlation with the improvement in classification accuracy due to ensembles
- the best correlations were shown by div_plain, div_dis, div_ent, and div_amb
- surprisingly, div_DF and div_Q had the worst average correlation
- the correlations changed with the change of the integration method, showing the different use of diversity by the integration methods
- the best correlations were shown with dynamic integration, and DV in particular
- the correlations decreased almost linearly with the increase in the ensemble size
- different contexts, such as other ensemble generation strategies and integration methods, can be tried in the future
25 Contact info
- Alexey Tsymbal, Padraig Cunningham
- Department of Computer Science, Trinity College Dublin, Ireland
- Alexey.Tsymbal_at_cs.tcd.ie, Padraig.Cunningham_at_cs.tcd.ie
- Mykola Pechenizkiy, Department of Computer Science, University of Jyväskylä, Finland
- mpechen_at_cs.jyu.fi