Local Search-embedded Genetic Algorithms for Feature Selection - PowerPoint PPT Presentation

1
Local Search-embedded Genetic Algorithms for Feature Selection
Il-Seok Oh, Jin-Seon Lee, Byung-Ro Moon
Chonbuk National University, Korea (isoh@moak.chonbuk.ac.kr)
Woosuk University, Korea
Seoul National University, Korea
  • Abstract This paper proposes a novel hybrid
    genetic algorithm for feature selection.
    Local search operations that improve
    chromosomes are defined and embedded in
    hybrid GAs. The hybridization has two desirable
    effects: it improves the final performance
    significantly and gives control over the subset
    size. To let readers reproduce the
    implementation, we provide detailed information
    on the GA procedure and parameter settings.
    Experimental results show that the proposed
    hybrid GA is superior to a classical GA and to
    sequential search algorithms.

2
Backgrounds
  • Feature selection algorithms
  • Enumeration algorithms
  • Exhaustive search
  • Branch-and-bound
  • Sequential search algorithms
  • SFS (sequential forward search) and SBF
  • PTA (plus-l take-away-r)
  • SFFS (sequential floating forward search) and
    SFBS Pudil94
  • GA (Genetic Algorithm)
  • Many versions available

3
Backgrounds
  • Conventional GAs for feature selection
  • Inconsistent assessment
  • GA is superior vs. GA is inferior
  • Due to many variations in GA implementation
  • e.g.) Jain and Zonker, Feature selection
    evaluation, application, and small sample
    performance, IEEE TPAMI, 1997.
  • GA reaching a peak performance at 7th or 8th
    generation..
  • Premature convergence due to improper
    implementation
  • Insufficient GA implementation specification
  • No problem-specific heuristics used
  • Our approach
  • Detailed implementation specification
  • Hybrid GA using heuristic local search operations

4
Simple GA
  • Problem
    • Selecting d features out of the original D features
    • steady_state_GA
  • Chromosome encoding
    • Binary string with D digits
    • e.g., 00101000
      • Third and fifth features selected
      • X = {3, 5} and Y = {1, 2, 4, 6, 7, 8}

steady_state_GA()
  initialize population P
  repeat
    select two parents p1 and p2 from P
    offspring = crossover(p1, p2)
    mutation(offspring)
    replace(P, offspring)
  until (stopping_condition)
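As a sketch, the steady-state loop above can be written in Python with the operators passed in as functions; `init_population`, `fitness`, `select`, `crossover`, and `mutate` are caller-supplied placeholders, and the replace rule used here (evict the worst member if the offspring is fitter) is one common choice, assumed rather than taken from the slides.

```python
import random

def steady_state_ga(init_population, fitness, select, crossover, mutate,
                    generations=100):
    """Steady-state GA skeleton following the pseudocode above:
    each iteration breeds a single offspring and folds it back
    into the population."""
    population = init_population()
    for _ in range(generations):          # stopping condition: fixed budget
        p1, p2 = select(population), select(population)
        offspring = mutate(crossover(p1, p2))
        # replace(P, offspring): evict the worst member if the offspring is fitter
        worst = min(range(len(population)), key=lambda i: fitness(population[i]))
        if fitness(offspring) > fitness(population[worst]):
            population[worst] = offspring
    return max(population, key=fitness)
```

With toy operators (one-point crossover, single bit-flip mutation, fitness = number of 1-bits), the loop quickly drives the population toward the all-ones string, confirming the skeleton behaves as a GA should.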
5
Simple GA
  • Initial population
    • Controlling the number of selected features at
      the initialization stage
  • Fitness evaluation and selection
    • Penalizing a chromosome whose subset size is not d
    • Rank-based roulette-wheel selection scheme
    • Probability of selecting the i-th ranked
      chromosome: P(i) = q(1-q)^(i-1)

Initial population:
  for (i = 1 to P)
    for (each gene g in the i-th chromosome)
      if (random_normal() < d/D) g = 1 else g = 0

fitness(C) = J(X_C) - penalty(X_C)    // C: chromosome
penalty(X_C) = w * | |X_C| - d |      // X_C: feature set selected by C

Chromosome selection by roulette wheel:
  1. Calculate cumulative probabilities p_i = Σ_{j=1..i} P(j)
     for i = 1, ..., P, with p_0 = 0.
  2. Generate a random number r within [0, 1].
  3. Select the i-th chromosome such that p_{i-1} < r ≤ p_i.
6
Simple GA
  • Genetic operators
    • m-point crossover
    • Mutation
      • Controlling the numbers of 1→0 and 0→1 conversions

m-point crossover:
  1. Generate m random integers within [1, D-1] and sort
     them to get L = <l1, l2, ..., lm>.
  2. Exchange the segment between l_i and l_(i+1) for odd
     i's to get two offspring.

Controlled mutation:
  1. Let n0 and n1 be the numbers of 0-bits and 1-bits in
     the chromosome.
  2. p1 = pm;  p0 = pm * n1/n0
  3. for (each gene g in the chromosome)
  4.   Generate a random number r within [0, 1].
  5.   if (g = 1 and r < p1) convert g to 0
  6.   else if (g = 0 and r < p0) convert g to 1
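Both operators can be sketched directly from the pseudocode. Note the balancing in the mutation: with p1 = pm and p0 = pm·n1/n0, the expected numbers of 1→0 and 0→1 flips are equal (n1·pm each way), so the subset size is stable in expectation.

```python
import random

def m_point_crossover(p1, p2, m=2):
    """m-point crossover: cut at m sorted random points in [1, D-1]
    and exchange alternating segments between the two parents."""
    D = len(p1)
    cuts = sorted(random.sample(range(1, D), m))
    c1, c2 = p1[:], p2[:]
    swap, prev = False, 0
    for cut in cuts + [D]:
        if swap:                       # segments starting at l1, l3, ... are exchanged
            c1[prev:cut], c2[prev:cut] = c2[prev:cut], c1[prev:cut]
        swap, prev = not swap, cut
    return c1, c2

def controlled_mutation(chrom, pm=0.1):
    """Mutation with controlled 1->0 and 0->1 conversion rates:
    p1 = pm, p0 = pm * n1/n0."""
    n1 = sum(chrom)
    n0 = len(chrom) - n1
    p1 = pm
    p0 = pm * n1 / n0 if n0 else 0.0
    out = []
    for g in chrom:
        r = random.random()
        if g == 1 and r < p1:
            out.append(0)
        elif g == 0 and r < p0:
            out.append(1)
        else:
            out.append(g)
    return out
```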
7
Simple GA
  • Parameters
    • No systematic parameter optimization attempted
    • The following parameter set is suggested, since the
      same values work well over a wide variety of datasets

Parameter setting:
  population size = 20
  pc (crossover probability) = 1.0 (always applied)
  pm (mutation probability) = 0.1
  q (in rank-based selection) = 0.25
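For reference, the slide's settings collected in one place as a plain Python dictionary (the key names are illustrative, not from the paper):

```python
# Parameter set from the slide; the values are the authors' suggestion,
# reported to work well across a variety of datasets.
GA_PARAMS = {
    "population_size": 20,
    "pc": 1.0,   # crossover probability (always applied)
    "pm": 0.1,   # mutation probability
    "q": 0.25,   # rank-based selection parameter
}
```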
8
Hybrid GA
  • Simple GA
    • Weak in fine-tuning near local optimum points
  • Hybrid GA
    • Aims at improving the fine-tuning capability of
      a simple GA
    • Hybrid GAs have been developed for many applications
      • TSP
      • Image compression
      • Graph partitioning
      • etc.
  • Principle
    • Improving chromosomes using problem-specific
      local search operations

9
Hybrid GA
  • HGA for feature selection
    • Adding a local search operation after the genetic
      operations

HGA()
  initialize population P
  repeat
    select two parents p1 and p2 from P
    offspring = crossover(p1, p2)
    mutation(offspring)
    local_improvement(offspring)
    replace(P, offspring)
  until (stopping_condition)
10
Hybrid GA
  • HGA for feature selection
    • Three cases
      • Size requirement satisfied: X is perturbed by
        applying ripple_rem(r) and ripple_add(r).
      • Fewer features in X: X is grown by applying
        ripple_add(r) a number of times.
      • More features in X: X is shrunk by applying
        ripple_rem(r) a number of times.
    • In the last two cases, the chromosome is repaired.

local_improvement(C)   /* C: a chromosome */
  put features of 1-bits in C into X
  put features of 0-bits in C into Y
  switch
    case |X| = d: ripple_rem(r); ripple_add(r)
    case |X| < d: repeat ripple_add(r) d - |X| times
    case |X| > d: repeat ripple_rem(r) |X| - d times
  set bits in C for features in X to 1
  set bits in C for features in Y to 0
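The three-case repair logic can be sketched over the feature sets themselves; `ripple_add` and `ripple_rem` are passed in as caller-supplied functions (each assumed to change |X| by net +1 or -1, as in the operations defined on the next slide), and the bit-setting steps are implicit because X and Y are mutated in place.

```python
def local_improvement(X, Y, d, ripple_add, ripple_rem, r=2):
    """Repair/improve a chromosome's selected set X (and its
    complement Y) toward the target size d, following the
    three cases of the pseudocode above."""
    if len(X) == d:                    # size OK: perturb once
        ripple_rem(X, Y, r)
        ripple_add(X, Y, r)
    elif len(X) < d:                   # too few features: grow
        for _ in range(d - len(X)):    # each ripple_add is net +1
            ripple_add(X, Y, r)
    else:                              # too many features: shrink
        for _ in range(len(X) - d):    # each ripple_rem is net -1
            ripple_rem(X, Y, r)
    return X, Y
```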
11
Hybrid GA
  • Local search operations

ripple_rem(r) ≡ REM(r) followed by ADD(r-1), r ≥ 1
ripple_add(r) ≡ ADD(r) followed by REM(r-1), r ≥ 1

rem: Choose the least significant feature x from X, i.e. the x
     such that J(X - {x}) = max_{x_j ∈ X} J(X - {x_j}),
     and move x to Y.
add: Choose the most significant feature y in Y, i.e. the y
     such that J(X ∪ {y}) = max_{y_j ∈ Y} J(X ∪ {y_j}),
     and move y to X.
REM(k): Repeat rem k times successively.
ADD(k): Repeat add k times successively.
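These operations translate directly to Python over sets; J stands for any subset-evaluation criterion (in the paper this would be classifier performance — the additive toy J used below, which just sums per-feature weights, is only a stand-in for testing).

```python
def rem(X, Y, J):
    """Move the least significant feature from X to Y: the x
    whose removal leaves J(X - {x}) highest."""
    x = max(X, key=lambda xj: J(X - {xj}))
    X.remove(x); Y.add(x)

def add(X, Y, J):
    """Move the most significant feature from Y to X: the y
    maximizing J(X | {y})."""
    y = max(Y, key=lambda yj: J(X | {yj}))
    Y.remove(y); X.add(y)

def ripple_rem(X, Y, J, r):
    """ripple_rem(r) = REM(r) then ADD(r-1): net size change -1."""
    for _ in range(r):
        rem(X, Y, J)
    for _ in range(r - 1):
        add(X, Y, J)

def ripple_add(X, Y, J, r):
    """ripple_add(r) = ADD(r) then REM(r-1): net size change +1."""
    for _ in range(r):
        add(X, Y, J)
    for _ in range(r - 1):
        rem(X, Y, J)
```

For r = 1 these reduce to plain rem and add; larger r lets a feature added earlier be reconsidered and swapped out, which is the "ripple" that strengthens the local search.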
12
Hybrid GA
  • Ripple factors
    • ripple_rem(r) decreases the number of features by
      exactly 1, independent of r
    • The ripple factor r controls the strength of local
      improvement
      • It directly determines the actual number of rem
        and add operations executed
      • The larger the ripple factor r, the stronger the
        local improvement
  • Two desirable effects of incorporating local search
    operations
    • Final performance improvement through local
      improvement of chromosomes
    • Control of the subset size

13
Experimental results
  • Performance of eight algorithms on two databases
    • Sonar (in the UCI repository): 60-D, 2 classes
    • CENPARMI handwritten numerals: 100-D, 10 classes

Table 1. Performance of eight algorithms (unit: %)
14
Discussions
  • Conclusions
    • SFFS is the best among the sequential search
      algorithms.
    • GAs (including the SGA and HGAs) outperform SFFS.
      Tuning the genetic parameters to a particular
      dataset is expected to yield further improvement.
    • HGAs outperform the SGA by a significant amount.
      A ripple factor r ≥ 2 is recommended.