Title: Diversity in Random Subspacing Ensembles
1 Diversity in Random Subspacing Ensembles
DaWaK 2004, Zaragoza, Spain, September 1-3, 2004
- Alexey Tsymbal, Padraig Cunningham, Department of Computer Science, Trinity College Dublin, Ireland
- Mykola Pechenizkiy, Department of Computer Science, University of Jyväskylä, Finland
2 Contents
- Introduction: the task of classification
- Introduction to ensembles
- Ensemble feature selection and random subspacing
- Integration methods for ensembles
- Measures of diversity in classification ensembles
- Experimental results: correlation between the diversities and the improvement due to ensembles
- Conclusions and future work
3 The task of classification
- J classes, n training observations, p features
[Diagram: the training set and a new instance to be classified enter CLASSIFICATION, which outputs the class membership of the new instance.]
- Examples: prognosis of recurrence of breast cancer, diagnosis of thyroid diseases, heart attack prediction, etc.
4 What is ensemble learning?
- Ensemble learning refers to a collection of methods that learn a target function by training a number of individual learners and combining their predictions
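As a minimal illustration of this definition, the sketch below combines three hypothetical rule-based learners by majority vote; the learners and the instance format are invented for illustration, not taken from the talk:

```python
from collections import Counter

# Three hypothetical rule-based learners (invented for illustration);
# each maps a 2-feature instance to a class label 0 or 1.
def learner_a(x): return 1 if x[0] > 0.5 else 0
def learner_b(x): return 1 if x[1] > 0.5 else 0
def learner_c(x): return 1 if x[0] + x[1] > 1.0 else 0

def ensemble_predict(x, learners):
    """Combine the learners' predictions by majority vote."""
    votes = [h(x) for h in learners]
    return Counter(votes).most_common(1)[0][0]

print(ensemble_predict((0.8, 0.2), [learner_a, learner_b, learner_c]))  # prints 0
```

In practice the individual learners would be trained classifiers rather than fixed rules; the combination step is the same.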
5 Ensemble learning
6 Why ensemble learning?
- Accuracy: a more reliable mapping can be obtained by combining the output of multiple experts.
- Efficiency: a complex problem can be decomposed into multiple sub-problems that are easier to understand and solve (divide-and-conquer approach); e.g. mixture of experts, ensemble feature selection.
- There is no single model that works for all pattern recognition problems! (no free lunch theorem)

"To solve really hard problems, we'll have to use several different representations. It is time to stop arguing over which type of pattern-classification technique is best. Instead we should work at a higher level of organization and discover how to build managerial systems to exploit the different virtues and evade the different limitations of each of these ways of comparing things." (Minsky, 1991)
7 When ensemble learning?
- When you can build base classifiers that are more accurate than chance (accuracy), and, more importantly,
- that are as independent from each other as possible (diversity)
8 Why do ensembles work? 1/2
- The desired target function may not be implementable with individual classifiers, but may be approximated by ensemble averaging.
- Assume you want to build a diagonal decision boundary with decision trees.
- The decision boundaries of decision trees are hyperplanes parallel to the coordinate axes, as in the figures.
- By averaging a large number of such staircases, the diagonal decision boundary can be approximated with arbitrarily good accuracy.
[Figures a and b: two two-class problems (Class 1 vs. Class 2) whose diagonal boundary is approximated by axis-parallel staircase decision boundaries.]
9 Why do ensembles work? 2/2
- Theoretical results by Hansen & Salamon (1990):
- If we can assume that the classifiers are independent in their predictions and their accuracy is > 50%, we can push accuracy arbitrarily high by combining more classifiers.
- Key assumption: classifiers are independent in their predictions
- this is not a very reasonable assumption
- more realistic: for the data points where classifiers predict with > 50% accuracy, we can push accuracy arbitrarily high (some data points are just too difficult)
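The Hansen & Salamon argument can be checked numerically: under the independence assumption, the probability that a majority vote of T classifiers is correct is a binomial tail. A small sketch, assuming an odd T and a shared per-classifier accuracy p:

```python
from math import comb

def majority_vote_accuracy(p, T):
    """P(more than T/2 of T independent classifiers are correct),
    i.e. the accuracy of a majority vote for odd T."""
    return sum(comb(T, k) * p**k * (1 - p)**(T - k)
               for k in range(T // 2 + 1, T + 1))

# Accuracy grows toward 1 as the ensemble size grows (for p > 0.5).
for T in (1, 5, 25, 101):
    print(T, round(majority_vote_accuracy(0.6, T), 4))
```

With p = 0.6, a single classifier is right 60% of the time, but a 5-member vote already exceeds 68%; this is the sense in which accuracy can be pushed "arbitrarily high" as T grows.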
10 How to make an effective ensemble?
- Two basic decisions when designing ensembles
- How to generate the base classifiers?
- How to integrate them?
11 Methods for generating the base classifiers
- Subsampling the training examples
- Manipulating the input features
- Manipulating the output targets
- Modifying the learning parameters of the classifier
- Using heterogeneous models (not often used)
12 Ensemble Feature Selection and RSM
- How to prepare inputs for the generation of the base classifiers?
- Sample the training set
- Manipulate the input features
- Manipulate the output targets (class values)
- Goal of traditional feature selection: find and remove features that are unhelpful or destructive to learning, making one feature subset for a single classifier.
- Goals of ensemble feature selection: find and remove features that are unhelpful or destructive to learning, making different feature subsets for a number of classifiers; and find feature subsets that will promote disagreement between the classifiers.
- Random Subspace Method (RSM): the accuracy of the ensemble members is compensated for by their diversity.
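A minimal sketch of the feature-sampling step behind RSM, assuming only that each base classifier is given a fixed-size random subset of the feature indices; the subset size and counts below are arbitrary illustration values:

```python
import random

def random_subspaces(n_features, n_classifiers, subspace_size, seed=0):
    """Draw one random feature subset (without replacement) per classifier."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_classifiers)]

# e.g. 4 base classifiers, each trained on 5 of 10 features
subsets = random_subspaces(n_features=10, n_classifiers=4, subspace_size=5)
for s in subsets:
    print(s)
```

Each base classifier would then be trained on its own column subset of the training data; because the subsets differ, the classifiers disagree on some instances, which is exactly the diversity RSM relies on.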
13 Integration of classifiers
[Diagram: Integration branches into Selection and Combination, each either Static or Dynamic. Selection: Static Selection (CVM) and Dynamic Selection (DS). Combination: Weighted Voting (WV, static), Dynamic Voting (DV), and Dynamic Voting with Selection (DVS).]
- Motivation for the Dynamic Integration: the main assumption is that each classifier is the best in some sub-areas of the whole data set, where its local error is comparatively lower than the corresponding errors of the other classifiers.
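This assumption can be sketched in code: dynamic selection estimates each classifier's local error from its recorded errors on the k nearest training instances and picks the locally best one. The data layout (`errors[i][j]` = 1 if classifier j misclassified training instance i) is an assumption chosen for illustration, not the talk's exact formulation:

```python
def dynamic_select(x, train_X, errors, k=3):
    """Return the index of the classifier with the lowest local error
    around x, estimated on the k nearest training instances."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbours = sorted(range(len(train_X)), key=lambda i: dist(x, train_X[i]))[:k]
    n_classifiers = len(errors[0])
    local_err = [sum(errors[i][j] for i in neighbours) / k
                 for j in range(n_classifiers)]
    return min(range(n_classifiers), key=local_err.__getitem__)

train_X = [(0, 0), (1, 0), (0, 1), (5, 5)]          # training instances
errors  = [[0, 1], [0, 1], [1, 0], [1, 0]]          # errors[i][j] as above
print(dynamic_select((0.1, 0.1), train_X, errors))  # classifier 0 is locally best
```

Dynamic voting (DV) would instead weight all classifiers by one minus their local error; dynamic voting with selection (DVS) first discards the locally worst classifiers and then applies DV to the rest.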
14 The space model: motivation for dynamic integration
- Information about classifiers' errors on training instances can be used for learning, just as the original instances are used for learning.
15 Dynamic integration of classifiers: an example
- 3 base classifiers
- 2 features, X1 and X2
[Figure: the (X1, X2) feature space with a new point P, its nearest neighbours NN1, NN2, NN3 at distances d1, d2, d3 (d_max the largest considered distance), and per-classifier error vectors such as (0 0 0), (0 1 0), (1 1 0), (0 0 1), and (0.3 0.6 0) attached to the training instances.]
16 Ensembles: the need for diversity
- The overall error depends on the average error of the ensemble members.
- Increasing ambiguity decreases the overall error, provided it does not result in an increase in average error (Krogh and Vedelsby, 1995).
17 Measuring ensemble diversity 1/4
- The ensemble ambiguity in regression is measured as the weighted average of the squared differences between the predictions of the base networks and the prediction of the ensemble.
- The case of classification: 1) plain disagreement, and 2) fail/non-fail disagreement
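For concreteness, these two measures can be sketched on toy prediction vectors (the vectors are invented; the definitions, pairwise fraction of differing predictions and pairwise fraction of instances on which exactly one classifier is correct, follow the usual formulations of these measures):

```python
def plain_disagreement(pred_i, pred_j):
    """Fraction of instances where the two classifiers predict different labels."""
    return sum(a != b for a, b in zip(pred_i, pred_j)) / len(pred_i)

def fail_nonfail_disagreement(pred_i, pred_j, y):
    """Fraction of instances where exactly one of the two classifiers is correct."""
    return sum((a == t) != (b == t)
               for a, b, t in zip(pred_i, pred_j, y)) / len(y)

y      = [0, 1, 1, 0, 1]   # true labels (toy data)
pred_i = [0, 1, 0, 0, 1]
pred_j = [0, 0, 1, 1, 1]
print(plain_disagreement(pred_i, pred_j))            # prints 0.6 (3 of 5 differ)
print(fail_nonfail_disagreement(pred_i, pred_j, y))  # prints 0.6
```

Plain disagreement ignores the true labels entirely, while fail/non-fail disagreement uses them; the two can coincide, as in this toy example, but need not.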
18 Measuring ensemble diversity 2/4
- The case of classification: 3) double fault, and 4) the Q statistic
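These two pairwise measures can be sketched the same way. The Q statistic below uses the standard contingency-table form Q = (N11·N00 − N10·N01)/(N11·N00 + N10·N01), where N_ab counts instances by whether each of the two classifiers is correct; the zero-denominator fallback is an implementation choice, not part of the definition:

```python
def pair_counts(pred_i, pred_j, y):
    """n[(a, b)]: a = classifier i correct?, b = classifier j correct?"""
    n = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for a, b, t in zip(pred_i, pred_j, y):
        n[(int(a == t), int(b == t))] += 1
    return n

def double_fault(pred_i, pred_j, y):
    """Fraction of instances misclassified by both classifiers."""
    return pair_counts(pred_i, pred_j, y)[(0, 0)] / len(y)

def q_statistic(pred_i, pred_j, y):
    n = pair_counts(pred_i, pred_j, y)
    num = n[(1, 1)] * n[(0, 0)] - n[(1, 0)] * n[(0, 1)]
    den = n[(1, 1)] * n[(0, 0)] + n[(1, 0)] * n[(0, 1)]
    return num / den if den else 0.0
```

Q ranges over [-1, 1]: negative values mean the classifiers tend to err on different instances (high diversity), positive values that they err together.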
19 Measuring ensemble diversity 3/4
- The case of classification: 5) the correlation coefficient, and 6) the kappa statistic
20 Measuring ensemble diversity 4/4
- The case of classification: 7) entropy, and 8) variance
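As a sketch of the entropy measure: one common formulation averages, over all instances, the Shannon entropy of the class-vote distribution produced by the ensemble on that instance; the exact normalization used in the paper may differ:

```python
from math import log

def vote_entropy(predictions, n_classes):
    """Average per-instance entropy of the class-vote distribution.
    predictions[j][i] is the label classifier j predicts for instance i."""
    L = len(predictions)      # number of classifiers
    N = len(predictions[0])   # number of instances
    total = 0.0
    for i in range(N):
        counts = [0] * n_classes
        for j in range(L):
            counts[predictions[j][i]] += 1
        total += -sum((c / L) * log(c / L) for c in counts if c)
    return total / N
```

The measure is 0 when all classifiers agree on every instance and grows as the votes spread across classes, so unlike the pairwise measures it summarizes the whole ensemble at once.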
21 Experimental investigations
- 21 datasets from the UCI Machine Learning Repository
- 70 test runs of Monte-Carlo cross-validation
- 70/30 train/test set division
- 5 different ensemble sizes: 5, 10, 25, 50, and 100
- 6 integration methods: Static Selection (SS), Simple Voting (V), Weighted Voting (WV), Dynamic Selection (DS), Dynamic Voting (DV), and Dynamic Voting with Selection (DVS)
- the test environment of the previously developed EFS_SBC algorithm (Ensemble Feature Selection with Simple Bayesian Classification) was used
- the objective was to measure the correlations between the diversity measures and the improvement due to ensembles
22 Experiments: results 1/2
Fig. 1. The correlations for the eight diversities and five ensemble sizes, averaged over the data sets and integration methods
23 Experiments: results 2/2
Fig. 2. The correlations for the eight diversities and six integration methods, averaged over the data sets and ensemble sizes
24 Conclusions
- we have considered 8 ensemble diversity metrics, 6 of which are pairwise measures
- to check the goodness of each measure of diversity, we calculated its correlation with the improvement in classification accuracy due to ensembles
- the best correlations were shown by div_plain, div_dis, div_ent, and div_amb
- surprisingly, div_DF and div_Q had the worst average correlation
- the correlations changed with the change of the integration method, showing the different use of diversity by the integration methods
- the best correlations were shown with dynamic integration, and DV in particular
- the correlations decreased almost linearly with the increase in the ensemble size
- different contexts, such as other ensemble generation strategies and integration methods, can be tried in the future
25 Contact info
- Alexey Tsymbal, Padraig Cunningham
- Department of Computer Science, Trinity College Dublin, Ireland
- Alexey.Tsymbal_at_cs.tcd.ie, Padraig.Cunningham_at_cs.tcd.ie
- Mykola Pechenizkiy, Department of Computer Science, University of Jyväskylä, Finland
- mpechen_at_cs.jyu.fi