1
Wrappers for feature subset selection
  • Ron Kohavi, George H. John

2
Agenda
  • What is feature subset selection?
  • Why is it needed?
  • Intuitive Solution
  • Filter Method and Wrapper Method
  • Wrapper Method

3
What?
  • Consider a 3-year-old child who has just been taught how to read the
    letters of the English language. He is given the following picture:

(picture not included in the transcript)
4
What?
  • Machines also need some kind of sieve to differentiate between what is
    relevant and what is not, in order to achieve good performance.
  • Feature selection is used for dimensionality reduction.

5
Why?
  • Performance improvement
  • Experiments with a decision tree classifier have shown that adding a
    random binary feature to standard datasets can degrade classification
    performance by 5-10%. (Witten 1999)
  • (Source: http://www.llnl.gov/CASC/sapphire/dimred.html)
  • Lower error rate

6
Intuitive Solution
  • Select d features out of the given p variables.
  • The number of subsets of size d is N_d = p! / ((p - d)! d!), the
    binomial coefficient C(p, d); a quick numeric check follows below.
  • Search for the optimal set using Exhaustive (Complete) Search or
    Heuristic Search, based on some evaluation criterion.
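
As a quick numeric check of the formula, a few lines of Python (the sizes p = 20 and d = 5 are my own illustrative numbers, not from the slides):

    from math import comb

    p, d = 20, 5          # hypothetical: 20 candidate features, subsets of 5
    print(comb(p, d))     # N_d = p! / ((p - d)! d!) = 15504
    print(2 ** p)         # 1048576 subsets of any size: the exhaustive space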

7
Evaluation Criteria
  • Goodness metrics for subset evaluation: metrics are judged by their time
    complexity, the number of features selected, and the level of redundancy
    in the resulting feature set. Some of the metrics used for feature
    selection are:
  • Relevance: the degree to which the feature subset depends on the classes.
  • Distance: the ability of the feature subset to separate the classes.
  • Information: the entropy of the feature subset.
  • Accuracy: 1 - error_rate.
  • Consistency: given by 1 - inconsistencies / N (a small worked example
    follows this list).
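
To illustrate the consistency measure (the helper below is my own sketch, not from the paper): two instances are inconsistent when they agree on the selected features but carry different class labels.

    from collections import Counter, defaultdict

    def consistency(rows, labels, subset):
        """Consistency = 1 - inconsistencies / N for a candidate subset."""
        groups = defaultdict(list)
        for row, label in zip(rows, labels):
            groups[tuple(row[i] for i in subset)].append(label)
        # Each group's inconsistency count: group size minus majority count.
        inconsistencies = sum(len(ls) - Counter(ls).most_common(1)[0][1]
                              for ls in groups.values())
        return 1 - inconsistencies / len(rows)

    rows = [(0, 1), (0, 0), (1, 1), (1, 0)]
    labels = ['a', 'b', 'a', 'a']
    print(consistency(rows, labels, subset=[0]))     # 0.75
    print(consistency(rows, labels, subset=[0, 1]))  # 1.0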

8
Relevance
  • Definition of relevance?
  • Relevance does not imply optimality
  • Optimality does not imply relevance

9
Search Methods
  • Exhaustive or Complete search: also called enumeration search. All
    possible subsets are listed and evaluated before the best one is
    selected. This is only feasible when the initial feature set is small,
    because the search space grows exponentially with the size of the
    initial feature set.
  • Heuristic search: also called sequential search. In forward search, the
    algorithm first selects one feature and then iteratively adds features
    to this set using some heuristic. In backward search, the algorithm
    starts with the entire feature set and iteratively removes irrelevant
    features from it, also using some heuristic. (A sketch of both
    directions follows this list.)
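
A minimal sketch of greedy sequential search in both directions; the evaluate callable and the toy scoring function are illustrative stand-ins, not taken from the paper:

    def sequential_search(n_features, evaluate, forward=True):
        """Forward search adds one feature per round; backward search
        removes one. evaluate(subset) returns a goodness score."""
        current = set() if forward else set(range(n_features))
        best_score, improved = evaluate(current), True
        while improved:
            improved = False
            pool = set(range(n_features)) - current if forward else set(current)
            for f in pool:
                trial = current | {f} if forward else current - {f}
                score = evaluate(trial)
                if score > best_score:       # keep the single best change
                    best_score, best_trial, improved = score, trial, True
            if improved:
                current = best_trial
        return current, best_score

    # Toy run: features 1 and 3 are the relevant ones in this made-up score.
    print(sequential_search(5, lambda s: len(s & {1, 3}) - 0.1 * len(s)))
    # ({1, 3}, 1.8)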

10
Filter Model
(diagram: input features → feature subset selection → induction algorithm)

The search algorithm iteratively generates feature subsets, which are
evaluated by the evaluation module. This repeats until some stopping
criterion is reached, and the final feature subset is output. The final
subset is then tested with a classifier. The main disadvantage of the
filter approach is that it completely ignores the effect of the selected
feature subset on the performance of the induction algorithm.
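
A minimal sketch of a filter criterion in this spirit (my own illustration: features are scored by between-class mean separation, with no reference to the induction algorithm):

    def filter_select(rows, labels, k):
        """Rank features by the gap between class means (a classifier-
        independent score) and keep the top k."""
        classes = sorted(set(labels))
        def separation(j):
            means = [sum(r[j] for r, l in zip(rows, labels) if l == c)
                     / labels.count(c) for c in classes]
            return max(means) - min(means)   # larger gap = better separation
        ranked = sorted(range(len(rows[0])), key=separation, reverse=True)
        return ranked[:k]

    rows = [(5.0, 0.2), (4.8, 0.9), (1.1, 0.4), (0.9, 0.8)]
    labels = ['pos', 'pos', 'neg', 'neg']
    print(filter_select(rows, labels, k=1))  # [0]: feature 0 splits the classes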
11
Wrapper Model
(diagram: the training set feeds a feature selection search; each candidate
feature set is handed to a feature evaluation step, which runs the induction
algorithm and returns a performance estimate that guides the search; the
final feature set and the induced hypothesis are then checked against the
test set in a final evaluation)
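
A minimal sketch of the feature-evaluation box, assuming scikit-learn is available; the decision-tree estimator and the fold count are illustrative choices, not the paper's:

    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def wrapper_evaluate(X, y, subset, cv=5):
        """Score a subset by the cross-validated accuracy of the induction
        algorithm itself -- the defining trait of the wrapper approach.
        X is a NumPy array; subset is a list of column indices."""
        return cross_val_score(DecisionTreeClassifier(), X[:, subset], y,
                               cv=cv).mean()

    # A function like this plugs into the search engines on the following
    # slides as the evaluation function f.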
12
(No Transcript)
13
(No Transcript)
14
Search Engines
  • The wrapper approach conducts a search over a space of possible
    parameters (here, feature subsets)
  • A search requires a state space, an initial state, a termination
    condition, and a search engine
  • The two search engines in focus are:
  • Hill-climbing search
  • Best-first search

15
Hill Climbing Search Engine
  • 1) Let v be the initial state
  • 2) Expand v: apply all operators to v, giving v's children
  • 3) Apply the evaluation function f to each child w of v
  • 4) Let v' be the child w with the highest evaluation f(w)
  • 5) If f(v') > f(v), then v ← v' and goto step 2
  • 6) Return v
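
A direct transcription of these steps into Python, assuming (as in the paper's state space) that states are feature subsets and the operators add or delete a single feature:

    def hill_climbing(n_features, evaluate, initial=frozenset()):
        """Hill-climbing over feature subsets."""
        def children(state):                       # step 2: all operators
            for f in range(n_features):
                yield state | {f} if f not in state else state - {f}
        v = initial                                # step 1
        while True:
            best = max(children(v), key=evaluate)  # steps 3-4
            if evaluate(best) > evaluate(v):       # step 5
                v = best
            else:
                return v                           # step 6

    print(hill_climbing(5, lambda s: len(s & {1, 3}) - 0.1 * len(s)))
    # frozenset({1, 3})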

16
Best-first search engine
  • 1) Put the initial state on the OPEN list; CLOSED ← ∅; BEST ← initial
    state
  • 2) Let v = arg max over w in OPEN of f(w) (get the state from OPEN with
    maximal f(w))
  • 3) Remove v from OPEN, add v to CLOSED
  • 4) If f(v) − ε > f(BEST), then BEST ← v
  • 5) Expand v: apply all operators to v, giving v's children
  • 6) For each child not in the CLOSED or OPEN list, evaluate it and add it
    to the OPEN list
  • 7) If BEST changed in the last k expansions, goto step 2
  • 8) Return BEST
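
The same steps sketched in Python; ε and k are the algorithm's stopping parameters, and the values below are illustrative defaults:

    def best_first(n_features, evaluate, epsilon=0.01, k=5):
        """Best-first search over feature subsets with OPEN/CLOSED lists."""
        def children(state):
            for f in range(n_features):
                yield state | {f} if f not in state else state - {f}
        initial = frozenset()
        open_list = {initial: evaluate(initial)}                # step 1
        closed, best, stale = set(), initial, 0
        while open_list and stale < k:                          # step 7
            v = max(open_list, key=open_list.get)               # step 2
            f_v = open_list.pop(v)                              # step 3
            closed.add(v)
            if f_v - epsilon > evaluate(best):                  # step 4
                best, stale = v, 0
            else:
                stale += 1
            for child in children(v):                           # steps 5-6
                if child not in closed and child not in open_list:
                    open_list[child] = evaluate(child)
        return best                                             # step 8

    print(best_first(5, lambda s: len(s & {1, 3}) - 0.1 * len(s)))
    # frozenset({1, 3})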