Title: Sequential Genetic Search for Ensemble Feature Selection
1 Sequential Genetic Search for Ensemble Feature Selection
IJCAI-2005, Edinburgh, Scotland, August 1-5, 2005
- Alexey Tsymbal, Padraig Cunningham, Department of Computer Science, Trinity College Dublin, Ireland
- Mykola Pechenizkiy, Department of Computer Science, University of Jyväskylä, Finland
2 Contents
- Introduction
- Classification and Ensemble Classification
- Ensemble Feature Selection
  - strategies
  - sequential genetic search
- Our GAS-SEFS strategy (Genetic Algorithm-based Sequential Search for Ensemble Feature Selection)
- Experiment design
- Experimental results
- Conclusions and future work
3 The Task of Classification
J classes, n training observations, p features
[Diagram: a training set and a new instance to be classified enter CLASSIFICATION, which outputs the class membership of the new instance]
Examples:
- prognostics of recurrence of breast cancer
- diagnosis of thyroid diseases
- antibiotic resistance prediction
4 Ensemble classification
How to prepare inputs for generation of the base
classifiers?
5 Ensemble classification
How to combine the predictions of the base
classifiers?
6 Ensemble feature selection
- How to prepare inputs for generation of the base classifiers?
  - Sampling the training set
  - Manipulation of input features
  - Manipulation of output targets (class values)
- Goal of traditional feature selection
  - find and remove features that are unhelpful or misleading to learning (making one feature subset for a single classifier)
- Goal of ensemble feature selection
  - find and remove features that are unhelpful or destructive to learning, making different feature subsets for a number of classifiers
  - find feature subsets that will promote diversity (disagreement) between classifiers
7 Search in EFS
Search space size: 2^(#Features x #Classifiers)
Search strategies include:
- Ensemble Forward Sequential Selection (EFSS)
- Ensemble Backward Sequential Selection (EBSS)
- Hill-Climbing (HC)
- Random Subspace Method (RSM)
- Genetic Ensemble Feature Selection (GEFS)
Fitness function: combines base-classifier accuracy and diversity, weighted by a coefficient alpha (see slide 26)
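A minimal sketch of how such a fitness value might be computed, assuming the additive accuracy-plus-weighted-diversity form suggested by the alpha values discussed on slide 26; the exact formula is not reproduced on this slide, so this is an assumption:

```python
def fitness(accuracy: float, diversity: float, alpha: float) -> float:
    """Hypothetical fitness of one candidate feature subset: the base
    classifier's accuracy plus its diversity, weighted by alpha."""
    return accuracy + alpha * diversity
```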
8 Measuring Diversity
The fail/non-fail disagreement measure: the percentage of test instances for which the classifiers make different predictions but for which one of them is correct
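A small sketch of this pairwise measure for two classifiers' predictions on a test set (numpy is an assumption; averaging the measure over all pairs of base classifiers would give the ensemble's diversity):

```python
import numpy as np

def fail_nonfail_disagreement(pred_a, pred_b, y_true):
    """Fraction of test instances on which the two classifiers predict
    differently and exactly one of them is correct."""
    a_correct = np.asarray(pred_a) == np.asarray(y_true)
    b_correct = np.asarray(pred_b) == np.asarray(y_true)
    # exactly one correct implies the two predictions differ
    return float(np.mean(a_correct ^ b_correct))
```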
9 Random Subspace Method
- RSM itself is a simple but effective technique for EFS
  - the lack of accuracy in the ensemble members is compensated for by their diversity
  - it does not suffer from the curse of dimensionality
- RSM is used as a base in other EFS strategies, including Genetic Ensemble Feature Selection
  - generation of initial feature subsets using RSM
  - a number of refining passes on each feature set while there is improvement in fitness
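A minimal sketch of generating the initial feature subsets with RSM, respecting the constraint from slide 15 that full (and, by symmetry, empty) feature sets are not allowed; the function name and numpy usage are illustrative:

```python
import numpy as np

def random_subspaces(n_features, ensemble_size, seed=None):
    """One random 0/1 feature mask per base classifier (Random Subspace Method)."""
    rng = np.random.default_rng(seed)
    masks = []
    for _ in range(ensemble_size):
        mask = rng.integers(0, 2, size=n_features)
        while mask.sum() in (0, n_features):   # disallow empty and full subsets
            mask = rng.integers(0, 2, size=n_features)
        masks.append(mask)
    return np.array(masks)
```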
10 Genetic Ensemble Feature Selection
- Genetic search is an important direction in FS research
  - GA is an effective global optimization technique
- GA for EFS
  - [Kuncheva, 1993]: ensemble accuracy instead of accuracies of the base classifiers
    - the fitness function is biased towards a particular integration method
  - Preventive measures to avoid overfitting:
    - alternative use of individual accuracy and diversity
    - overfitting of an individual is more desirable than overfitting of the ensemble
  - [Opitz, 1999]: explicitly used diversity in the fitness function
    - RSM for the initial population
    - new candidates by crossover and mutation
    - roulette-wheel selection (p proportional to fitness)
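A sketch of the roulette-wheel selection step used to pick parents for crossover; note that slide 15 states GA and GAS-SEFS actually use probabilities proportional to log(1 + fitness), which would only change the `weights` line:

```python
import numpy as np

def roulette_wheel_select(population, fitnesses, n_parents, seed=None):
    """Select parents with probability proportional to their fitness."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(fitnesses, dtype=float)
    probs = weights / weights.sum()
    idx = rng.choice(len(population), size=n_parents, p=probs)
    return [population[i] for i in idx]
```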
11 Genetic Ensemble Feature Selection
12 Basic Idea behind GA for EFS
[Diagram: RSM initialises the current population of base classifiers BC_1 ... BC_EnsembleSize; the GA evolves a new population, with diversity evaluated over the current population and fitness over the new one; the ensemble corresponds to one generation]
13 Basic Idea behind GAS-SEFS
[Diagram: RSM initialises the population for each sequential genetic process GA_{i+1}; the current population is evaluated by accuracies, diversity is measured against the base classifiers BC_1 ... BC_i already selected, and the new base classifier BC_{i+1} is chosen by fitness from the new population]
14 GAS-SEFS (1 of 2)
- GAS-SEFS (Genetic Algorithm-based Sequential Search for Ensemble Feature Selection)
  - instead of maintaining a set of feature subsets in each generation as in GA, it consists in applying a series of genetic processes, one for each base classifier, sequentially
  - after each genetic process, one base classifier is selected into the ensemble
- GAS-SEFS uses the same fitness function, but
  - diversity is calculated with respect to the base classifiers already formed by the previous genetic processes
  - in the first genetic process, accuracy only is used
- GAS-SEFS uses the same genetic operators as GA (a minimal sketch of the sequential loop follows)
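A minimal sketch of the sequential outer loop described above; `run_genetic_search` (one full genetic process over feature subsets) and `evaluate` (accuracy plus weighted diversity against the ensemble built so far) are hypothetical placeholders, not names from the paper:

```python
from typing import Callable, List, Sequence

FeatureSubset = Sequence[int]

def gas_sefs(ensemble_size: int,
             run_genetic_search: Callable[[Callable[[FeatureSubset], float]], FeatureSubset],
             evaluate: Callable[[FeatureSubset, List[FeatureSubset], float], float],
             alpha: float) -> List[FeatureSubset]:
    """One genetic process per base classifier; each process adds the best
    feature subset it finds to the ensemble."""
    ensemble: List[FeatureSubset] = []
    for i in range(ensemble_size):
        # The first process is driven by accuracy only; later processes also
        # reward diversity w.r.t. the base classifiers already selected.
        a = 0.0 if i == 0 else alpha
        fitness = lambda subset, a=a: evaluate(subset, ensemble, a)
        ensemble.append(run_genetic_search(fitness))
    return ensemble
```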
15 GAS-SEFS (2 of 2)
- GA and GAS-SEFS peculiarities:
  - full feature sets are not allowed in RSM
  - the crossover operator may not produce a full feature subset
  - individuals for crossover are selected randomly with probability proportional to log(1 + fitness) instead of just fitness
  - the generation of children identical to their parents is prohibited
  - to provide better diversity in the length of feature subsets, two different mutation operators are used (sketched below):
    - Mutate1_0 deletes features randomly with a given probability
    - Mutate0_1 adds features randomly with a given probability
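A small sketch of the two mutation operators on a 0/1 feature mask; numpy and the exact function signatures are assumptions, the slide only specifies the operators' behaviour:

```python
import numpy as np

def mutate_1_0(mask, p, seed=None):
    """Mutate1_0: delete each selected feature with probability p."""
    rng = np.random.default_rng(seed)
    mask = np.asarray(mask).copy()
    mask[(mask == 1) & (rng.random(mask.shape) < p)] = 0
    return mask

def mutate_0_1(mask, p, seed=None):
    """Mutate0_1: add each unselected feature with probability p."""
    rng = np.random.default_rng(seed)
    mask = np.asarray(mask).copy()
    mask[(mask == 0) & (rng.random(mask.shape) < p)] = 1
    return mask
```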
16 Computational complexity
The complexity of GA-based search does not depend on the number of features:
- GA builds about S' x N_gen base classifiers, and GAS-SEFS about S x S' x N_gen, where S is the number of base classifiers (ensemble size), S' is the number of individuals (feature subsets) in one generation, and N_gen is the number of generations.
- EFSS and EBSS evaluate on the order of S x N' x N feature subsets, where S is the number of base classifiers, N is the total number of features, and N' is the number of features included or deleted on average in an FSS or BSS search.
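As a quick check, these counts match the experimental settings reported on slide 18 (40 individuals per generation, 10 generations, ensemble size 10):

```python
s_ind, n_gen, s = 40, 10, 10   # individuals per generation, generations, ensemble size
print(s_ind * n_gen)           # GA: 400 feature subsets evaluated
print(s * s_ind * n_gen)       # GAS-SEFS: 4000 feature subsets evaluated
```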
17 Integration of classifiers
- Weighted Voting (WV)
- Static Selection (CVM)
- Dynamic Selection (DS)
- Dynamic Voting (DV)
- Dynamic Voting with Selection (DVS)
Motivation for dynamic integration: each classifier is best in some sub-areas of the whole data set, where its local error is comparatively smaller than the corresponding errors of the other classifiers.
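A minimal sketch of the simplest of these methods, Weighted Voting; the dynamic methods (DS, DV, DVS) additionally estimate each classifier's local error in the neighbourhood of the instance being classified, which is not shown here:

```python
import numpy as np

def weighted_voting(predictions, weights, n_classes):
    """Weighted Voting (WV): each base classifier votes for its predicted
    class with a weight (e.g. its validation accuracy)."""
    votes = np.zeros(n_classes)
    for pred, w in zip(predictions, weights):
        votes[pred] += w
    return int(np.argmax(votes))
```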
18 Experimental Design
- Parameter settings for GA and GAS-SEFS:
  - a mutation rate of 50%
  - a population size of 10
  - a search length of 40 feature subsets/individuals per generation:
    - 20 are offspring of the current population of 10 classifiers generated by crossover,
    - 20 are mutated offspring (10 with each mutation operator)
  - 10 generations of individuals were produced
  - 400 (GA) and 4000 (GAS-SEFS) feature subsets in total
- To evaluate GA and GAS-SEFS:
  - 5 integration methods
  - Simple Bayes as the base classifier
  - stratified random sampling with 60%/20%/20% of instances in the training/validation/test sets (see the sketch below)
  - 70 test runs on each of 21 UCI data sets for each strategy and diversity measure
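A minimal sketch of the stratified 60/20/20 split; scikit-learn is an assumption used for illustration, and Iris merely stands in for any of the 21 UCI data sets:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# 60% training, then split the remaining 40% evenly into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)
```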
19 GA vs GAS-SEFS on two groups of data sets
[Figure: ensemble accuracy vs. ensemble size; DVS integration, F/N-F disagreement diversity]
Ensemble accuracies for GA and GAS-SEFS on two groups of data sets, (1) <9 and (2) >9 features, with four ensemble sizes
20 GA vs GAS-SEFS for Five Integration Methods
[Figure: ensemble size 10]
- Ensemble accuracies for five integration methods on Tic-Tac-Toe
21 Conclusions and Future Work
- Diversity in an ensemble of classifiers is very important
- We have considered two genetic search strategies for EFS
- The new strategy, GAS-SEFS, consists in employing a series of genetic search processes, one for each base classifier
- GAS-SEFS results in better ensembles with greater accuracy
  - especially for data sets with relatively larger numbers of features
  - one reason: each of the core GA processes leads to significant overfitting of the corresponding ensemble member (overfitting an individual is less harmful than overfitting the whole ensemble, cf. slide 10)
- GAS-SEFS is significantly more time-consuming than GA
  - time(GAS-SEFS) is roughly ensemble_size x time(GA)
- [Oliveira et al., 2003] report better results for single FSS based on Pareto-front dominating solutions
  - adaptation of this technique to EFS is an interesting topic for further research
22 Thank you!
- Alexey Tsymbal, Padraig Cunningham
  Dept of Computer Science, Trinity College Dublin, Ireland
  Alexey.Tsymbal@cs.tcd.ie, Padraig.Cunningham@cs.tcd.ie
- Mykola Pechenizkiy
  Department of Computer Science and Information Systems, University of Jyväskylä, Finland
  mpechen@cs.jyu.fi
23 Additional Slides
24 References
- [Kuncheva, 1993] Ludmila I. Kuncheva. Genetic algorithm for feature selection for parallel classifiers. Information Processing Letters 46: 163-168, 1993.
- [Kuncheva and Jain, 2000] Ludmila I. Kuncheva and Lakhmi C. Jain. Designing classifier fusion systems by genetic algorithms. IEEE Transactions on Evolutionary Computation 4(4): 327-336, 2000.
- [Oliveira et al., 2003] Luiz S. Oliveira, Robert Sabourin, Flavio Bortolozzi, and Ching Y. Suen. A methodology for feature selection using multi-objective genetic algorithms for handwritten digit string recognition. Pattern Recognition and Artificial Intelligence 17(6): 903-930, 2003.
- [Opitz, 1999] David Opitz. Feature selection for ensembles. In Proceedings of the 16th National Conference on Artificial Intelligence, pages 379-384, AAAI Press, 1999.
25 GAS-SEFS Algorithm
26 Other interesting findings
- alpha (the weight of diversity in the fitness function):
  - values were different for different data sets
  - for both GA and GAS-SEFS, alpha for the dynamic integration methods is bigger than for the static ones (2.2 vs 0.8 on average)
  - GAS-SEFS needs slightly higher values of alpha than GA (1.8 vs 1.5 on average)
    - GAS-SEFS always starts with a classifier based on accuracy only, and the subsequent classifiers need more diversity than accuracy
- the number of selected features falls as the ensemble size grows
  - this is especially clear for GAS-SEFS, as the base classifiers need more diversity
- integration methods (for both GA and GAS-SEFS):
  - the static ones, SS and WV, and the dynamic DS start to overfit the validation set already after 5 generations and show lower accuracies
  - accuracies of DV and DVS continue to grow up to 10 generations
27 Paper Summary
- A new strategy for genetic ensemble feature selection, GAS-SEFS, is introduced
- In contrast with the previously considered algorithm (GA), it is sequential: a series of genetic processes, one for each base classifier
- More time-consuming, but with better accuracy
- Each base classifier has a considerable level of overfitting with GAS-SEFS, but the ensemble accuracy grows
- Experimental comparisons demonstrate clear superiority on 21 UCI datasets, especially for data sets with many features (group 1 vs group 2)
28 Simple Bayes as Base Classifier
- Bayes theorem: P(C|X) = P(X|C) P(C) / P(X)
- Naïve assumption of attribute independence: P(x1, ..., xk | C) = P(x1|C) ... P(xk|C)
- If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
- If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function
- Computationally easy in both cases
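A compact sketch of the continuous-attribute case (Gaussian class-conditional densities); for categorical attributes the class-conditional probabilities would be relative frequencies instead. This is an illustrative implementation, not the one used in the experiments:

```python
import numpy as np

class SimpleBayesGaussian:
    """Naive Bayes with Gaussian class-conditional densities P(x_i | C)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_ = {c: float(np.mean(y == c)) for c in self.classes_}
        self.means_ = {c: X[y == c].mean(axis=0) for c in self.classes_}
        self.vars_ = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes_}
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # log P(C) + sum_i log P(x_i | C), using the independence assumption
            scores = {c: np.log(self.priors_[c])
                         - 0.5 * np.sum(np.log(2 * np.pi * self.vars_[c])
                                        + (x - self.means_[c]) ** 2 / self.vars_[c])
                      for c in self.classes_}
            preds.append(max(scores, key=scores.get))
        return np.array(preds)
```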
29 Datasets characteristics
Data set | Instances | Classes | Categ. features | Num. features
Balance | 625 | 3 | 0 | 4
Breast Cancer | 286 | 2 | 9 | 0
Car | 1728 | 4 | 6 | 0
Diabetes | 768 | 2 | 0 | 8
Glass Recognition | 214 | 6 | 0 | 9
Heart Disease | 270 | 2 | 0 | 13
Ionosphere | 351 | 2 | 0 | 34
Iris Plants | 150 | 3 | 0 | 4
LED | 300 | 10 | 7 | 0
LED17 | 300 | 10 | 24 | 0
Liver Disorders | 345 | 2 | 0 | 6
Lymphography | 148 | 4 | 15 | 3
MONK-1 | 432 | 2 | 6 | 0
MONK-2 | 432 | 2 | 6 | 0
MONK-3 | 432 | 2 | 6 | 0
Soybean | 47 | 4 | 0 | 35
Thyroid | 215 | 3 | 0 | 5
Tic-Tac-Toe | 958 | 2 | 9 | 0
Vehicle | 846 | 4 | 0 | 18
Voting | 435 | 2 | 16 | 0
Zoo | 101 | 7 | 16 | 0
30 GA vs GAS-SEFS for Five Integration Methods
[Figure: ensemble accuracy vs. ensemble size]
- Ensemble accuracies for GA (left) and GAS-SEFS (right) for five integration methods and four ensemble sizes on Tic-Tac-Toe