Title: Sequential Genetic Search for Ensemble Feature Selection
1 Sequential Genetic Search for Ensemble Feature Selection
IJCAI-2005, Edinburgh, Scotland, August 1-5, 2005
- Alexey Tsymbal, Padraig Cunningham, Department of Computer Science, Trinity College Dublin, Ireland
- Mykola Pechenizkiy, Department of Computer Science, University of Jyväskylä, Finland
2 Contents
- Introduction
- Classification and Ensemble Classification
- Ensemble Feature Selection
  - strategies
  - sequential genetic search
- Our GAS-SEFS strategy (Genetic Algorithm-based Sequential Search for Ensemble Feature Selection)
- Experiment design
- Experimental results
- Conclusions and future work
3 The Task of Classification
J classes, n training observations, p features
[Diagram: a training set and a new instance to be classified enter CLASSIFICATION, which outputs the class membership of the new instance]
Examples:
- prognostics of recurrence of breast cancer
- diagnosis of thyroid diseases
- antibiotic resistance prediction
4 Ensemble classification
How to prepare inputs for generation of the base
classifiers?
5 Ensemble classification
How to combine the predictions of the base
classifiers?
6 Ensemble feature selection
- How to prepare inputs for generation of the base classifiers?
  - Sampling the training set
  - Manipulation of input features
  - Manipulation of output targets (class values)
- Goal of traditional feature selection
  - find and remove features that are unhelpful or misleading to learning (making one feature subset for a single classifier)
- Goal of ensemble feature selection
  - find and remove features that are unhelpful or destructive to learning, making different feature subsets for a number of classifiers
  - find feature subsets that will promote diversity (disagreement) between classifiers
7 Search in EFS
Search space size: 2^(#Features x #Classifiers)
Search strategies include:
- Ensemble Forward Sequential Selection (EFSS)
- Ensemble Backward Sequential Selection (EBSS)
- Hill-Climbing (HC)
- Random Subspace Method (RSM)
- Genetic Ensemble Feature Selection (GEFS)
Fitness function: combines base-classifier accuracy and diversity, weighted by a coefficient alpha (see slide 26)
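A minimal sketch of how such a fitness value might be computed, assuming the additive accuracy-plus-weighted-diversity form suggested by the alpha values discussed on slide 26; the exact formula is not reproduced on this slide, so this is an assumption:

```python
def fitness(accuracy: float, diversity: float, alpha: float) -> float:
    """Hypothetical fitness of one candidate feature subset: the base
    classifier's accuracy plus its diversity, weighted by alpha."""
    return accuracy + alpha * diversity
```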
8 Measuring Diversity
The fail/non-fail disagreement measure: the percentage of test instances for which the classifiers make different predictions but for which one of them is correct
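A small sketch of this pairwise measure for two classifiers' predictions on a test set (numpy is an assumption; averaging the measure over all pairs of base classifiers would give the ensemble's diversity):

```python
import numpy as np

def fail_nonfail_disagreement(pred_a, pred_b, y_true):
    """Fraction of test instances on which the two classifiers predict
    differently and exactly one of them is correct."""
    a_correct = np.asarray(pred_a) == np.asarray(y_true)
    b_correct = np.asarray(pred_b) == np.asarray(y_true)
    # exactly one correct implies the two predictions differ
    return float(np.mean(a_correct ^ b_correct))
```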
9 Random Subspace Method
- RSM itself is a simple but effective technique for EFS
  - the lack of accuracy in the ensemble members is compensated for by their diversity
  - it does not suffer from the curse of dimensionality
- RSM is used as a base in other EFS strategies, including Genetic Ensemble Feature Selection
  - generation of initial feature subsets using RSM
  - a number of refining passes on each feature set while there is improvement in fitness
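A minimal sketch of generating the initial feature subsets with RSM, respecting the constraint from slide 15 that full (and, by symmetry, empty) feature sets are not allowed; the function name and numpy usage are illustrative:

```python
import numpy as np

def random_subspaces(n_features, ensemble_size, seed=None):
    """One random 0/1 feature mask per base classifier (Random Subspace Method)."""
    rng = np.random.default_rng(seed)
    masks = []
    for _ in range(ensemble_size):
        mask = rng.integers(0, 2, size=n_features)
        while mask.sum() in (0, n_features):   # disallow empty and full subsets
            mask = rng.integers(0, 2, size=n_features)
        masks.append(mask)
    return np.array(masks)
```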
10 Genetic Ensemble Feature Selection
- Genetic search is an important direction in FS research
  - GA is an effective global optimization technique
- GA for EFS
  - [Kuncheva, 1993]: ensemble accuracy instead of accuracies of the base classifiers
    - the fitness function is biased towards a particular integration method
  - Preventive measures to avoid overfitting:
    - alternative use of individual accuracy and diversity
    - overfitting of an individual is more desirable than overfitting of the ensemble
  - [Opitz, 1999]: explicitly used diversity in the fitness function
    - RSM for the initial population
    - new candidates by crossover and mutation
    - roulette-wheel selection (p proportional to fitness)
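A sketch of the roulette-wheel selection step used to pick parents for crossover; note that slide 15 states GA and GAS-SEFS actually use probabilities proportional to log(1 + fitness), which would only change the `weights` line:

```python
import numpy as np

def roulette_wheel_select(population, fitnesses, n_parents, seed=None):
    """Select parents with probability proportional to their fitness."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(fitnesses, dtype=float)
    probs = weights / weights.sum()
    idx = rng.choice(len(population), size=n_parents, p=probs)
    return [population[i] for i in idx]
```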
11 Genetic Ensemble Feature Selection
12 Basic Idea behind GA for EFS
[Diagram: RSM initialises the current population of base classifiers BC_1 ... BC_EnsembleSize; the GA evolves a new population, with diversity evaluated over the current population and fitness over the new one; the ensemble corresponds to one generation]
13 Basic Idea behind GAS-SEFS
[Diagram: RSM initialises the population for each sequential genetic process GA_{i+1}; the current population is evaluated by accuracies, diversity is measured against the base classifiers BC_1 ... BC_i already selected, and the new base classifier BC_{i+1} is chosen by fitness from the new population]
14 GAS-SEFS (1 of 2)
- GAS-SEFS (Genetic Algorithm-based Sequential Search for Ensemble Feature Selection)
  - instead of maintaining a set of feature subsets in each generation as in GA, it consists in applying a series of genetic processes, one for each base classifier, sequentially
  - after each genetic process, one base classifier is selected into the ensemble
- GAS-SEFS uses the same fitness function, but
  - diversity is calculated with respect to the base classifiers already formed by the previous genetic processes
  - in the first genetic process, accuracy only is used
- GAS-SEFS uses the same genetic operators as GA (a minimal sketch of the sequential loop follows)
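A minimal sketch of the sequential outer loop described above; `run_genetic_search` (one full genetic process over feature subsets) and `evaluate` (accuracy plus weighted diversity against the ensemble built so far) are hypothetical placeholders, not names from the paper:

```python
from typing import Callable, List, Sequence

FeatureSubset = Sequence[int]

def gas_sefs(ensemble_size: int,
             run_genetic_search: Callable[[Callable[[FeatureSubset], float]], FeatureSubset],
             evaluate: Callable[[FeatureSubset, List[FeatureSubset], float], float],
             alpha: float) -> List[FeatureSubset]:
    """One genetic process per base classifier; each process adds the best
    feature subset it finds to the ensemble."""
    ensemble: List[FeatureSubset] = []
    for i in range(ensemble_size):
        # The first process is driven by accuracy only; later processes also
        # reward diversity w.r.t. the base classifiers already selected.
        a = 0.0 if i == 0 else alpha
        fitness = lambda subset, a=a: evaluate(subset, ensemble, a)
        ensemble.append(run_genetic_search(fitness))
    return ensemble
```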
15 GAS-SEFS (2 of 2)
- GA and GAS-SEFS peculiarities:
  - full feature sets are not allowed in RSM
  - the crossover operator may not produce a full feature subset
  - individuals for crossover are selected randomly with probability proportional to log(1 + fitness) instead of just fitness
  - the generation of children identical to their parents is prohibited
  - to provide better diversity in the length of feature subsets, two different mutation operators are used (sketched below):
    - Mutate1_0 deletes features randomly with a given probability
    - Mutate0_1 adds features randomly with a given probability
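A small sketch of the two mutation operators on a 0/1 feature mask; numpy and the exact function signatures are assumptions, the slide only specifies the operators' behaviour:

```python
import numpy as np

def mutate_1_0(mask, p, seed=None):
    """Mutate1_0: delete each selected feature with probability p."""
    rng = np.random.default_rng(seed)
    mask = np.asarray(mask).copy()
    mask[(mask == 1) & (rng.random(mask.shape) < p)] = 0
    return mask

def mutate_0_1(mask, p, seed=None):
    """Mutate0_1: add each unselected feature with probability p."""
    rng = np.random.default_rng(seed)
    mask = np.asarray(mask).copy()
    mask[(mask == 0) & (rng.random(mask.shape) < p)] = 1
    return mask
```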
16 Computational complexity
The complexity of GA-based search does not depend on the number of features:
- GA builds about S' x N_gen base classifiers, and GAS-SEFS about S x S' x N_gen, where S is the number of base classifiers (ensemble size), S' is the number of individuals (feature subsets) in one generation, and N_gen is the number of generations.
- EFSS and EBSS evaluate on the order of S x N' x N feature subsets, where S is the number of base classifiers, N is the total number of features, and N' is the number of features included or deleted on average in an FSS or BSS search.
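As a quick check, these counts match the experimental settings reported on slide 18 (40 individuals per generation, 10 generations, ensemble size 10):

```python
s_ind, n_gen, s = 40, 10, 10   # individuals per generation, generations, ensemble size
print(s_ind * n_gen)           # GA: 400 feature subsets evaluated
print(s * s_ind * n_gen)       # GAS-SEFS: 4000 feature subsets evaluated
```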
17 Integration of classifiers
- Weighted Voting (WV)
- Static Selection (CVM)
- Dynamic Selection (DS)
- Dynamic Voting (DV)
- Dynamic Voting with Selection (DVS)
Motivation for dynamic integration: each classifier is best in some sub-areas of the whole data set, where its local error is comparatively smaller than the corresponding errors of the other classifiers.
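A minimal sketch of the simplest of these methods, Weighted Voting; the dynamic methods (DS, DV, DVS) additionally estimate each classifier's local error in the neighbourhood of the instance being classified, which is not shown here:

```python
import numpy as np

def weighted_voting(predictions, weights, n_classes):
    """Weighted Voting (WV): each base classifier votes for its predicted
    class with a weight (e.g. its validation accuracy)."""
    votes = np.zeros(n_classes)
    for pred, w in zip(predictions, weights):
        votes[pred] += w
    return int(np.argmax(votes))
```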
18 Experimental Design
- Parameter settings for GA and GAS-SEFS:
  - a mutation rate of 50%
  - a population size of 10
  - a search length of 40 feature subsets/individuals per generation:
    - 20 are offspring of the current population of 10 classifiers generated by crossover,
    - 20 are mutated offspring (10 with each mutation operator)
  - 10 generations of individuals were produced
  - 400 (GA) and 4000 (GAS-SEFS) feature subsets in total
- To evaluate GA and GAS-SEFS:
  - 5 integration methods
  - Simple Bayes as the base classifier
  - stratified random sampling with 60%/20%/20% of instances in the training/validation/test sets (see the sketch below)
  - 70 test runs on each of 21 UCI data sets for each strategy and diversity measure
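A minimal sketch of the stratified 60/20/20 split; scikit-learn is an assumption used for illustration, and Iris merely stands in for any of the 21 UCI data sets:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# 60% training, then split the remaining 40% evenly into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)
```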
19 GA vs GAS-SEFS on two groups of data sets
[Figure: ensemble accuracy vs. ensemble size; DVS integration, F/N-F disagreement diversity]
Ensemble accuracies for GA and GAS-SEFS on two groups of data sets, (1) <9 and (2) >9 features, with four ensemble sizes
20 GA vs GAS-SEFS for Five Integration Methods
[Figure: ensemble size 10]
- Ensemble accuracies for five integration methods on Tic-Tac-Toe
21 Conclusions and Future Work
- Diversity in an ensemble of classifiers is very important
- We have considered two genetic search strategies for EFS
- The new strategy, GAS-SEFS, consists in employing a series of genetic search processes, one for each base classifier
- GAS-SEFS results in better ensembles with greater accuracy
  - especially for data sets with relatively larger numbers of features
  - one reason: each of the core GA processes leads to significant overfitting of the corresponding ensemble member (overfitting an individual is less harmful than overfitting the whole ensemble, cf. slide 10)
- GAS-SEFS is significantly more time-consuming than GA
  - time(GAS-SEFS) is roughly ensemble_size x time(GA)
- [Oliveira et al., 2003] report better results for single FSS based on Pareto-front dominating solutions
  - adaptation of this technique to EFS is an interesting topic for further research
22 Thank you!
- Alexey Tsymbal, Padraig Cunningham
  Dept of Computer Science, Trinity College Dublin, Ireland
  Alexey.Tsymbal@cs.tcd.ie, Padraig.Cunningham@cs.tcd.ie
- Mykola Pechenizkiy
  Department of Computer Science and Information Systems, University of Jyväskylä, Finland
  mpechen@cs.jyu.fi
23 Additional Slides
24 References
- [Kuncheva, 1993] Ludmila I. Kuncheva. Genetic algorithm for feature selection for parallel classifiers. Information Processing Letters 46: 163-168, 1993.
- [Kuncheva and Jain, 2000] Ludmila I. Kuncheva and Lakhmi C. Jain. Designing classifier fusion systems by genetic algorithms. IEEE Transactions on Evolutionary Computation 4(4): 327-336, 2000.
- [Oliveira et al., 2003] Luiz S. Oliveira, Robert Sabourin, Flavio Bortolozzi, and Ching Y. Suen. A methodology for feature selection using multi-objective genetic algorithms for handwritten digit string recognition. Pattern Recognition and Artificial Intelligence 17(6): 903-930, 2003.
- [Opitz, 1999] David Opitz. Feature selection for ensembles. In Proceedings of the 16th National Conference on Artificial Intelligence, pages 379-384, AAAI Press, 1999.
25 GAS-SEFS Algorithm
26 Other interesting findings
- alpha (the weight of diversity in the fitness function):
  - values were different for different data sets
  - for both GA and GAS-SEFS, alpha for the dynamic integration methods is bigger than for the static ones (2.2 vs 0.8 on average)
  - GAS-SEFS needs slightly higher values of alpha than GA (1.8 vs 1.5 on average)
    - GAS-SEFS always starts with a classifier based on accuracy only, and the subsequent classifiers need more diversity than accuracy
- the number of selected features falls as the ensemble size grows
  - this is especially clear for GAS-SEFS, as the base classifiers need more diversity
- integration methods (for both GA and GAS-SEFS):
  - the static ones, SS and WV, and the dynamic DS start to overfit the validation set already after 5 generations and show lower accuracies
  - accuracies of DV and DVS continue to grow up to 10 generations
27 Paper Summary
- A new strategy for genetic ensemble feature selection, GAS-SEFS, is introduced
- In contrast with the previously considered algorithm (GA), it is sequential: a series of genetic processes, one for each base classifier
- More time-consuming, but with better accuracy
- Each base classifier has a considerable level of overfitting with GAS-SEFS, but the ensemble accuracy grows
- Experimental comparisons demonstrate clear superiority on 21 UCI datasets, especially for data sets with many features (group 1 vs group 2)
28 Simple Bayes as Base Classifier
- Bayes theorem: P(C|X) = P(X|C) P(C) / P(X)
- Naïve assumption of attribute independence: P(x1, ..., xk | C) = P(x1|C) ... P(xk|C)
- If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
- If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function
- Computationally easy in both cases
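A compact sketch of the continuous-attribute case (Gaussian class-conditional densities); for categorical attributes the class-conditional probabilities would be relative frequencies instead. This is an illustrative implementation, not the one used in the experiments:

```python
import numpy as np

class SimpleBayesGaussian:
    """Naive Bayes with Gaussian class-conditional densities P(x_i | C)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_ = {c: float(np.mean(y == c)) for c in self.classes_}
        self.means_ = {c: X[y == c].mean(axis=0) for c in self.classes_}
        self.vars_ = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes_}
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # log P(C) + sum_i log P(x_i | C), using the independence assumption
            scores = {c: np.log(self.priors_[c])
                         - 0.5 * np.sum(np.log(2 * np.pi * self.vars_[c])
                                        + (x - self.means_[c]) ** 2 / self.vars_[c])
                      for c in self.classes_}
            preds.append(max(scores, key=scores.get))
        return np.array(preds)
```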
29 Datasets characteristics
Data set | Instances | Classes | Categ. features | Num. features
Balance | 625 | 3 | 0 | 4
Breast Cancer | 286 | 2 | 9 | 0
Car | 1728 | 4 | 6 | 0
Diabetes | 768 | 2 | 0 | 8
Glass Recognition | 214 | 6 | 0 | 9
Heart Disease | 270 | 2 | 0 | 13
Ionosphere | 351 | 2 | 0 | 34
Iris Plants | 150 | 3 | 0 | 4
LED | 300 | 10 | 7 | 0
LED17 | 300 | 10 | 24 | 0
Liver Disorders | 345 | 2 | 0 | 6
Lymphography | 148 | 4 | 15 | 3
MONK-1 | 432 | 2 | 6 | 0
MONK-2 | 432 | 2 | 6 | 0
MONK-3 | 432 | 2 | 6 | 0
Soybean | 47 | 4 | 0 | 35
Thyroid | 215 | 3 | 0 | 5
Tic-Tac-Toe | 958 | 2 | 9 | 0
Vehicle | 846 | 4 | 0 | 18
Voting | 435 | 2 | 16 | 0
Zoo | 101 | 7 | 16 | 0
30 GA vs GAS-SEFS for Five Integration Methods
[Figure: ensemble accuracy vs. ensemble size]
- Ensemble accuracies for GA (left) and GAS-SEFS (right) for five integration methods and four ensemble sizes on Tic-Tac-Toe