Title: Feature Selection
1 Feature Selection
Jamshid Shanbehzadeh, Samaneh Yazdani
Department of Computer Engineering, Faculty of Engineering, Kharazmi University (Tarbiat Moallem University of Tehran)
2 Outline
- Part 1: Dimension Reduction
  - Dimension
  - Feature Space
  - Definition and Goals
  - Curse of dimensionality
  - Research and Application
  - Grouping of dimension reduction methods
- Part 2: Feature Selection
  - Parts of the feature set
  - Feature selection approaches
- Part 3: Applications of Feature Selection and Software
4 Part 1: Dimension Reduction
5- Dimension Reduction
- Dimension
- Dimension (Feature or Variable)
- A measurement of a certain aspect of an object
- Two features of a person
- weight
- height
6- Dimension Reduction
- Feature Space
- Feature Space
- An abstract space where each pattern sample is
represented as a point
7- Dimension Reduction
- Introduction
- Large and high-dimensional data
- Web documents, etc
- A large amount of resources is needed for
- Information retrieval
- Classification tasks
- Data preservation, etc.
- Dimension Reduction
8- Dimension Reduction
- Definition and Goals
- Dimensionality reduction
- The study of methods for reducing the number
of dimensions describing the object
- General objectives of dimensionality reduction
- Reduce the computational cost
- Improve the quality of data for efficient
data-intensive processing tasks
9- Dimension Reduction
- Definition and Goals
[Figure: samples of Class 1 (overweight) and Class 2 (underweight) plotted on Weight (kg) vs. Height (cm) axes.]
- Dimension reduction
- preserves information on the classification of overweight vs. underweight as much as possible
- makes classification easier
- reduces data size (2 features → 1 feature)
10- Dimension Reduction
- Curse of dimensionality
- As the number of dimensions increases, a fixed data sample becomes exponentially sparse
Example
Observe that the data become more and more sparse in higher dimensions (a numeric illustration follows below).
- An effective solution to the problem of the curse of dimensionality is
Dimensionality reduction
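A minimal numeric illustration of this sparsity (a sketch using NumPy; the sample size and dimensions are arbitrary choices): with a fixed number of points in the unit hypercube, the average distance to the nearest neighbour grows quickly as the dimension increases, i.e. the sample covers the space less and less densely.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 200  # fixed sample size

for d in (1, 2, 10, 100):
    X = rng.random((n_points, d))  # uniform points in the d-dimensional unit cube
    # pairwise distances; set the diagonal to infinity so a point is not its own neighbour
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    print(f"d={d:3d}  mean nearest-neighbour distance = {dists.min(axis=1).mean():.3f}")
```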
11- Dimension Reduction
- Research and Application
Why has dimension reduction been the subject of so much recent research?
- Massive data of large dimensionality in
- Knowledge discovery
- Text mining
- Web mining
- and . . .
12- Dimension Reduction
- Grouping of dimension reduction methods
- Dimensionality reduction approaches include
- Feature Selection
- Feature Extraction
13- Dimension Reduction
- Grouping of dimension reduction methods
Feature Selection
- Dimensionality reduction approaches include
- Feature Selection: the problem of choosing a
small subset of features that ideally are
necessary and sufficient to describe the target
concept.
Example
- Feature set: X, Y
- Two classes
- Goal: Classification
- Feature X or feature Y?
- Answer: feature X
14- Dimension Reduction
- Grouping of dimension reduction methods
Feature Selection
- Feature Selection (FS)
- Selects features
- ex.
- Preserves weight
15- Dimension Reduction
- Grouping of dimension reduction methods
- Dimensionality reduction approaches include
- Feature Extraction: create new features based on
transformations or combinations of the original
feature set.
New Feature
16- Dimension Reduction
- Grouping of dimension reduction methods
- Feature Extraction (FE)
- Generates features
- ex.
- Preserves weight / height
17- Dimension Reduction
- Grouping of dimension reduction methods
- Dimensionality reduction approaches include
- Feature Extraction: create new features based on
transformations or combinations of the original
feature set.
- N: number of original features
- M: number of extracted features
- M < N
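A minimal sketch of the difference, assuming scikit-learn is available: selection keeps M of the original N columns, while extraction builds M new columns as combinations of all N original ones (here via PCA; the dataset and the value of M are illustrative choices only).

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # N = 4 original features
M = 2

# Feature selection: keep M of the original columns (here ranked by an ANOVA F-score).
X_sel = SelectKBest(f_classif, k=M).fit_transform(X, y)

# Feature extraction: build M new columns as linear combinations of all original ones.
X_ext = PCA(n_components=M).fit_transform(X)

print(X.shape, X_sel.shape, X_ext.shape)  # (150, 4) (150, 2) (150, 2)
```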
18- Dimension Reduction
- Question: Feature Selection or Feature Extraction?
- It depends on the problem. Examples:
- In pattern recognition, the dimensionality reduction problem is to extract a small set of features that recovers most of the variability of the data.
- In text mining, the problem is defined as selecting a small subset of words or terms (not new features that are combinations of words or terms).
- In image compression, the problem is finding the best extracted features to describe the image.
19 Part 2: Feature Selection
20 Feature selection
- From thousands to millions of low-level features, select the most relevant ones to build better, faster, and easier-to-understand learning machines.
[Diagram: a data matrix X (m samples × N features) reduced to n selected feature columns.]
21- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Three disjoint categories of features
- Irrelevant
- Weakly Relevant
- Strongly Relevant
22- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Goal: Classification
- Two classes: Lion and Deer
- We use some features to classify a new instance
To which class does this animal belong?
23- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Goal: Classification
- Two classes: Lion and Deer
- We use some features to classify a new instance
Q: Number of legs? A: 4
So, the number of legs is an irrelevant feature
24- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Goal: Classification
- Two classes: Lion and Deer
- We use some features to classify a new instance
Q: What is its color? A:
So, color is an irrelevant feature
- Feature 1: Number of legs
- Feature 2: Color
25- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Goal: Classification
- Two classes: Lion and Deer
- We use some features to classify a new instance
Q: What does it eat? A: Grass
So, Feature 3 is a relevant feature
- Feature 1: Number of legs
- Feature 2: Color
- Feature 3: Type of food
26- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Goal: Classification
- Three classes: Lion, Deer, and Leopard
- We use some features to classify a new instance
To which class does this animal belong?
27- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Goal: Classification
- Three classes: Lion, Deer, and Leopard
- We use some features to classify a new instance
Q: Number of legs? A: 4
So, the number of legs is an irrelevant feature
28- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Goal: Classification
- Three classes: Lion, Deer, and Leopard
- We use some features to classify a new instance
Q: What is its color? A:
So, color is a relevant feature
- Feature 1: Number of legs
- Feature 2: Color
29- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Goal: Classification
- Three classes: Lion, Deer, and Leopard
- We use some features to classify a new instance
Q: What does it eat? A: Meat
So, Feature 3 is a relevant feature
- Feature 1: Number of legs
- Feature 2: Color
- Feature 3: Type of food
30- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Goal: Classification
- Three classes: Lion, Deer, and Leopard
- We use some features to classify a new instance
- Feature 1: Number of legs
- Feature 2: Color
- Feature 3: Type of food
- Add a new feature: Felidae (member of the cat family)
- It is a weakly relevant feature
- Optimal set: {Color, Type of food} or {Color, Felidae}
31- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Traditionally, feature selection research has
focused on searching for relevant features.
[Diagram: the feature set partitioned into relevant and irrelevant features.]
32- Feature selection
- Parts of feature set
- Irrelevant OR Relevant An Example for the
Problem
- Data set
- Five Boolean features
- C = F1 ∨ F2
- F3 = ¬F2, F5 = ¬F4
- Optimal subset
- {F1, F2} or {F1, F3}
F1 F2 F3 F4 F5 C
0 0 1 0 1 0
0 1 0 0 1 1
1 0 1 0 1 1
1 1 0 0 1 1
0 0 1 1 0 0
0 1 0 1 0 1
1 0 1 1 0 1
1 1 0 1 0 1
33- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Formal Definition 1 (Irrelevance)
- Irrelevance indicates that the feature is not
necessary at all.
- In the previous example
- F4 and F5 are irrelevant
[Diagram: F4 and F5 lie outside the relevant part of the feature set.]
34- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Definition 1 (Irrelevance): Let F be the full set of features, Fi a feature, and Si = F − {Fi}. Feature Fi is irrelevant iff
  ∀ S'i ⊆ Si : P(C | Fi, S'i) = P(C | S'i)
- Irrelevance indicates that the feature is not necessary at all
35- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Categories of relevant features
- Strongly Relevant
- Weakly Relevant
[Diagram: the feature set partitioned into irrelevant, weakly relevant, and strongly relevant features.]
36- Feature selection
- Parts of feature set
- Irrelevant OR Relevant An Example for the
Problem
F1 F2 F3 F4 F5 C
0 0 1 0 1 0
0 1 0 0 1 1
1 0 1 0 1 1
1 1 0 0 1 1
0 0 1 1 0 0
0 1 0 1 0 1
1 0 1 1 0 1
1 1 0 1 0 1
- Data set
- Five Boolean features
- C = F1 ∨ F2
- F3 = ¬F2, F5 = ¬F4
37- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Formal Definition2 (Strong relevance)
- Strong relevance of a feature indicates that the feature is always necessary for an optimal subset
- It cannot be removed without affecting the original conditional class distribution.
- In the previous example
- Feature F1 is strongly relevant
[Diagram: F1 is strongly relevant; F4 and F5 are irrelevant.]
38- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Definition 2 (Strong relevance): A feature Fi is strongly relevant iff
  P(C | Fi, Si) ≠ P(C | Si)
- A strongly relevant feature cannot be removed without affecting the original conditional class distribution
39- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Formal Definition 3 (Weak relevance)
- Weak relevance suggests that the feature is not always necessary but may become necessary for an optimal subset under certain conditions.
- In the previous example
- F2 and F3 are weakly relevant
[Diagram: F2 and F3 are weakly relevant; F1 is strongly relevant; F4 and F5 are irrelevant.]
40- Feature selection
- Parts of feature set
- Irrelevant OR Relevant
- Definition 3 (Weak relevance): A feature Fi is weakly relevant iff
  P(C | Fi, Si) = P(C | Si), and ∃ S'i ⊂ Si such that P(C | Fi, S'i) ≠ P(C | S'i)
- Weak relevance suggests that the feature is not always necessary but may become necessary for an optimal subset under certain conditions.
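To make Definitions 1-3 concrete, here is a small self-contained sketch (plain Python, no libraries assumed) that checks them empirically on the Boolean example from the slides by comparing the conditional class distributions P(C | Fi, S'i) and P(C | S'i) estimated from the table.

```python
import itertools
from collections import defaultdict

# The Boolean example from the slides: C = F1 OR F2, F3 = NOT F2, F5 = NOT F4.
rows = [  # (F1, F2, F3, F4, F5, C)
    (0, 0, 1, 0, 1, 0), (0, 1, 0, 0, 1, 1), (1, 0, 1, 0, 1, 1), (1, 1, 0, 0, 1, 1),
    (0, 0, 1, 1, 0, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 1, 0, 1), (1, 1, 0, 1, 0, 1),
]
FEATURES = (0, 1, 2, 3, 4)  # column indices of F1..F5; column 5 is the class C


def class_dist(cond_idx, cond_val):
    """Empirical P(C | features at cond_idx take the values cond_val)."""
    matching = [r for r in rows if tuple(r[i] for i in cond_idx) == cond_val]
    counts = defaultdict(int)
    for r in matching:
        counts[r[5]] += 1
    return {c: n / len(matching) for c, n in counts.items()}


def changes_distribution(i, subset):
    """True if conditioning on Fi (in addition to `subset`) changes P(C | ...) somewhere."""
    for r in rows:  # only assignments that actually occur in the data
        vals = tuple(r[j] for j in subset)
        if class_dist(subset + (i,), vals + (r[i],)) != class_dist(subset, vals):
            return True
    return False


for i in FEATURES:
    others = tuple(j for j in FEATURES if j != i)
    strong = changes_distribution(i, others)
    weak = not strong and any(
        changes_distribution(i, sub)
        for k in range(len(others))
        for sub in itertools.combinations(others, k)
    )
    label = "strongly relevant" if strong else "weakly relevant" if weak else "irrelevant"
    print(f"F{i + 1}: {label}")
```

This should print that F1 is strongly relevant, F2 and F3 are weakly relevant, and F4 and F5 are irrelevant, matching the partition described on these slides.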
41- Feature selection
- Parts of feature set
- Optimal Feature Subset
- Example
- In order to determine the target concept (C = g(F1, F2))
- F1 is indispensable
- One of F2 and F3 can be disposed of
- Both F4 and F5 can be discarded.
Optimal subset: either {F1, F2} or {F1, F3}
- The goal of feature selection is to find either of them.
42- Feature selection
- Parts of feature set
- Optimal Feature Subset
Optimal subset: either {F1, F2} or {F1, F3}
- An optimal subset should include all strongly
relevant features, none of irrelevant features,
and a subset of weakly relevant features.
Which of the weakly relevant features should be selected, and which of them removed?
43- Feature selection
- Parts of feature set
- Redundancy
- Solution
- Defining Feature Redundancy
44- Feature selection
- Parts of feature set
- Redundancy
- Redundancy
- It is widely accepted that two features are
redundant to each other if their values are
completely correlated
- In the previous example
- F2 and F3 are redundant to each other (F3 = ¬F2)
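A quick check of this on the Boolean example (a sketch using NumPy): the Pearson correlation between F2 and F3 is exactly −1, i.e. their values are completely (anti-)correlated.

```python
import numpy as np

F2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])
F3 = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # F3 = NOT F2 in the example data set

print(np.corrcoef(F2, F3)[0, 1])  # -1.0: completely correlated, hence redundant
```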
45- Feature selection
- Parts of feature set
- Redundancy
Markov blanket
- Definition (Markov blanket): given a feature Fi, let Mi ⊆ F with Fi ∉ Mi; Mi is a Markov blanket for Fi iff P(F − Mi − {Fi}, C | Fi, Mi) = P(F − Mi − {Fi}, C | Mi).
- The Markov blanket condition requires that Mi subsume not only the information that Fi has about C, but also about all of the other features.
46- Feature selection
- Parts of feature set
- Redundancy
- Redundancy definition further divides weakly
relevant features into redundant and
non-redundant ones.
[Diagram: the feature set partitioned into irrelevant features, weakly relevant and redundant features (II), weakly relevant but non-redundant features (III), and strongly relevant features.]
- II: Weakly relevant and redundant features
- III: Weakly relevant but non-redundant features
- Optimal subset = strongly relevant features + weakly relevant but non-redundant features
47- Feature selection
- Approaches
48- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
- Framework of feature selection via subset
evaluation
49- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Subset Generation
[Flowchart: (1) original feature set → (2) subset generation → subset evaluation → goodness of the subset → (3) stopping criterion: if not met, generate the next subset; if met → (4) validation.]
50- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Subset search method: Exhaustive Search
- Examine all combinations of feature subsets.
- Example
- {f1, f2, f3} → {f1}, {f2}, {f3}, {f1, f2}, {f1, f3}, {f2, f3}, {f1, f2, f3}
- Order of the search space: O(2^d), where d is the number of features.
- The optimal subset is achievable.
- Too expensive if the feature space is large (a small enumeration sketch follows below).
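A minimal sketch of exhaustive subset generation using Python's itertools; the goodness function here is a placeholder assumption that you would replace with a real evaluation criterion.

```python
from itertools import combinations

features = ["f1", "f2", "f3"]

def goodness(subset):
    """Placeholder evaluation: prefer smaller subsets (replace with a real criterion)."""
    return -len(subset)

# Enumerate all 2^d - 1 non-empty subsets and keep the best one.
all_subsets = [list(c) for r in range(1, len(features) + 1)
               for c in combinations(features, r)]
best = max(all_subsets, key=goodness)
print(len(all_subsets), "candidate subsets; best:", best)
```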
51- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Subset Evaluation
Measures the goodness of the subset and compares it with the previous best subset; if the new subset is better, it replaces the previous best subset.
52- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Subset Evaluation
- Each feature and feature subset needs to be evaluated for importance by a criterion.
- Based on the criterion functions used in searching for informative features, the existing feature selection algorithms can generally be categorized as
- Filter model
- Wrapper model
- Embedded methods
Note: Different criteria may select different features.
53- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Filter
- The filter approach utilizes the data alone to decide which features should be kept, without running the learning algorithm.
- The filter approach basically pre-selects the features, and then applies the selected feature subset to the clustering algorithm.
- Evaluation function ≠ classifier
- The effect of the selected subset on the performance of the classifier is ignored.
54- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Filter (1)- Independent Criterion
- Some popular independent criteria are
- Distance measures (Euclidean distance measure).
- Information measures (entropy, information gain, etc.)
- Dependency measures (correlation coefficient)
- Consistency measures
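As an illustration of an information measure, here is a small self-contained sketch that computes the information gain of each discrete feature with respect to the class, using the Boolean example from the earlier slides (a straightforward entropy calculation, not any particular library's implementation).

```python
import math
from collections import Counter

# Boolean example: columns F1..F5, last column is the class C.
data = [
    (0, 0, 1, 0, 1, 0), (0, 1, 0, 0, 1, 1), (1, 0, 1, 0, 1, 1), (1, 1, 0, 0, 1, 1),
    (0, 0, 1, 1, 0, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 1, 0, 1), (1, 1, 0, 1, 0, 1),
]

def entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

classes = [r[-1] for r in data]
for i in range(5):
    # Information gain = H(C) - H(C | Fi): expected reduction in class entropy given Fi.
    cond = 0.0
    for v in set(r[i] for r in data):
        subset = [r[-1] for r in data if r[i] == v]
        cond += len(subset) / len(data) * entropy(subset)
    print(f"F{i + 1}: information gain = {entropy(classes) - cond:.3f}")
```

Note that F1, F2, and F3 receive identical scores while F4 and F5 score zero: a per-feature criterion like this cannot see that F2 and F3 are redundant with each other.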
55- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Wrappers
- In wrapper methods, the performance of a
learning algorithm is used to evaluate the
goodness of selected feature subsets.
- Evaluation function = classifier
- Take classifier into account.
56- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Wrappers (2)
- Wrappers utilize a learning machine as a black
box to score subsets of features according to
their predictive power.
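A minimal sketch of a wrapper, assuming scikit-learn is available: greedy forward selection that scores each candidate subset by the cross-validated accuracy of a classifier. The classifier and dataset here are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier()

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Try adding each remaining feature and keep the one that helps most.
    scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best_score:  # stop when no single added feature improves the subset
        break
    selected.append(f)
    remaining.remove(f)
    best_score = score

print("selected feature indices:", selected, "CV accuracy:", round(best_score, 3))
```

Because every candidate subset triggers a full cross-validation of the classifier, this is exactly where the accuracy/cost trade-off discussed on the next slides comes from.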
57- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Filters vs. Wrappers
- Filters
- Advantages
- Fast execution: filters generally involve a non-iterative computation on the dataset, which can execute much faster than a classifier training session.
- Generality: since filters evaluate the intrinsic properties of the data, rather than their interactions with a particular classifier, their results exhibit more generality; the solution will be good for a larger family of classifiers.
- Disadvantages
- The main disadvantage of the filter approach is that it totally ignores the effects of the selected feature subset on the performance of the induction algorithm.
58- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Filters vs. Wrappers
- Wrappers
- Advantages
- Accuracy: wrappers generally achieve better recognition rates than filters since they are tuned to the specific interactions between the classifier and the dataset.
- Disadvantages
- Slow execution: since the wrapper must train a classifier for each feature subset (or several classifiers if cross-validation is used), the method can become infeasible for computationally intensive methods.
- Lack of generality: the solution lacks generality since it is tied to the bias of the classifier used in the evaluation function.
59- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Embedded methods
- Feature selection is built into the training of the learning machine itself.
- Example: Recursive Feature Elimination (RFE) with an SVM: start with all features, train, remove the least useful feature(s), and repeat until a stopping criterion says to stop.
(Guyon-Weston, 2000. US patent 7,117,188)
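A minimal sketch using scikit-learn's generic RFE with a linear SVM (the dataset and the number of features to keep are arbitrary illustrative choices; this is the library's RFE, not the patented implementation).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# A linear SVM provides per-feature weights; RFE repeatedly drops the lowest-weight features.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5, step=1)
selector.fit(X, y)

print("kept feature indices:", [i for i, kept in enumerate(selector.support_) if kept])
```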
60- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Stopping Criterion
- Based on the generation procedure
- Pre-defined number of features
- Pre-defined number of iterations
- Based on the evaluation function
- whether the addition or deletion of a feature no longer produces a better subset
- whether an optimal subset based on some evaluation function has been achieved
61- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Result Validation
- Basically not part of the feature selection process itself: compare results with already established results, or with results from competing feature selection methods.
62- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Subset Evaluation Advantage
- A feature subset selected by this approach
approximates the optimal subset
[Diagram: feature-set partition as on the earlier slide.]
- II: Weakly relevant and redundant features
- III: Weakly relevant but non-redundant features
- Optimal subset = strongly relevant features + weakly relevant but non-redundant features
63- Feature selection
- Approaches Subset Evaluation (Feature Subset
Selection )
Subset Evaluation Disadvantages
- High computational cost of the subset search
makes subset evaluation approach inefficient for
high dimensional data.
64- Feature selection
- Approaches
65- Feature selection
- Approaches Individual Evaluation (Feature
Weighting/Ranking)
Individual methods (Feature Ranking / Feature Weighting)
- Individual methods evaluate each feature individually according to a criterion.
- They then select the features which either satisfy a condition or are top-ranked.
- (In contrast, exhaustive, greedy and random searches are subset search methods, because they evaluate each candidate subset.)
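A minimal sketch of individual evaluation: score every feature independently, then keep the top-k. The scores below are the information-gain values from the earlier sketch for F1..F5 (any per-feature criterion could be substituted), and k is an illustrative choice.

```python
import numpy as np

# Per-feature scores, e.g. the information gain of F1..F5 from the earlier sketch.
scores = np.array([0.311, 0.311, 0.311, 0.0, 0.0])
k = 2

top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest-scoring features
print("selected:", [f"F{i + 1}" for i in top_k])
```

With tied scores the ranking cannot distinguish F1, F2, and F3, so it may keep the redundant pair (F2, F3) and miss the strongly relevant F1, which is exactly the disadvantage discussed on the next slides.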
66- Feature selection
- Approaches Individual Evaluation (Feature
Weighting/Ranking)
Individual Evaluation Advantage
- Linear time complexity in terms of the dimensionality N.
- The individual method is efficient for high-dimensional data.
67- Feature selection
- Approaches Individual Evaluation (Feature
Weighting/Ranking)
Individual Evaluation Disadvantages
- It is incapable of removing redundant features.
- For high-dimensional data, which may contain a large number of redundant features, this approach may produce results far from optimal.
68- Feature selection
- Approaches
69- Feature selection
- New Framework
New Framework
- A new framework of feature selection composed of two steps:
- First step (relevance analysis): determines the subset of relevant features by removing the irrelevant ones.
- Second step (redundancy analysis): determines and eliminates redundant features from the relevant ones, and thus produces the final subset.
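A simplified sketch of this two-step idea, assuming scikit-learn and NumPy (in the spirit of algorithms such as FCBF from the cited Yu & Liu work, but not their exact procedure; the thresholds and the use of mutual information / correlation as stand-ins for relevance and redundancy measures are assumptions).

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def two_step_selection(X, y, relevance_threshold=0.01, redundancy_threshold=0.9):
    # Step 1 (relevance analysis): keep features whose mutual information with y is high enough.
    relevance = mutual_info_classif(X, y)
    relevant = [i for i in np.argsort(relevance)[::-1] if relevance[i] > relevance_threshold]

    # Step 2 (redundancy analysis): walk down the relevance ranking and drop any feature
    # that is highly correlated with an already-selected (more relevant) feature.
    selected = []
    for i in relevant:
        if all(abs(np.corrcoef(X[:, i], X[:, j])[0, 1]) < redundancy_threshold
               for j in selected):
            selected.append(i)
    return selected

# Purely illustrative usage on random data with one deliberately redundant column.
rng = np.random.default_rng(0)
X = rng.random((100, 8)); X[:, 3] = X[:, 0]
y = (X[:, 0] + X[:, 1] > 1).astype(int)
print(two_step_selection(X, y))
```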
70 Part 3: Applications of Feature Selection and Software
71- Feature selection
- Applications of Feature Selection
72- Feature selection
- Applications of Feature Selection
- Text categorization: Importance
- Information explosion
- Roughly 80% of information is stored in text documents: journals, web pages, emails...
- Difficult to extract specific information
- Current technologies...
73- Feature selection
- Applications of Feature Selection
- Text categorization
- Assigning documents to a fixed set of
categories
[Diagram: a news article is fed to a categorizer, which assigns it to one of a fixed set of categories: sports, cultures, health, politics, economics, vacations.]
74- Feature selection
- Applications of Feature Selection
- Text categorization
- Documents are represented by a vector whose dimension is the size of the vocabulary, containing word frequency counts
- Vocabulary: about 15,000 words (i.e., each document is represented by a 15,000-dimensional vector)
- Typical tasks
- Automatic sorting of documents into web directories
- Detection of spam email
75- Feature selection
- Applications of Feature Selection
- Text categorization
- Major characteristic, or difficulty of text
categorization
High dimensionality of the feature space
- Goal: Reduce the original feature space without sacrificing categorization accuracy (a small sketch follows below)
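A minimal sketch, assuming scikit-learn: represent documents as word-count vectors and keep only the k terms with the highest chi-squared score with respect to the category labels. The documents, labels, and value of k are toy values.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["the striker scored a late goal",
        "parliament passed the new budget law",
        "the goalkeeper saved a penalty",
        "the senate debated the tax bill"]
labels = ["sports", "politics", "sports", "politics"]

vec = CountVectorizer()
X = vec.fit_transform(docs)                        # documents -> word-count vectors
selector = SelectKBest(chi2, k=4).fit(X, labels)   # keep the 4 most discriminative terms

vocab = vec.get_feature_names_out()
print([vocab[i] for i in selector.get_support(indices=True)])
```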
76- Feature selection
- Applications of Feature Selection
- Image retrieval
- Importance: rapid increase in the size and number of image collections from both civilian and military equipment.
- Problem: we cannot access or make use of the information unless it is organized.
- Content-based image retrieval: instead of being manually annotated by text-based keywords, images are indexed by their own visual contents (features), such as color, texture, shape, etc.
One of the biggest problems in making content-based image retrieval truly scalable to large image collections is still the curse of dimensionality.
77- Feature selection
- Applications of Feature Selection
- Image retrieval
- Paper: "ReliefF Based Feature Selection In Content-Based Image Retrieval", A. Sarrafzadeh, Habibollah Agh Atabay, Mir Mohsen Pedram, Jamshid Shanbehzadeh
- Image dataset: COIL-20, containing 1440 grayscale pictures from 20 classes of objects.
78- Feature selection
- Applications of Feature Selection
- Image retrieval
- In this paper they use
- Legendre moments to extract features
- the ReliefF algorithm to select the most relevant and non-redundant features
- a support vector machine to classify images.
[Figure: the effect of the number of selected features on classification accuracy.]
79- Feature selection
- Weka Software: What can we do with it?
- Weka is a piece of software, written in Java, that provides an array of machine learning tools, many of which can be used for data mining:
- Pre-processing data
- Feature selection
- Feature extraction
- Regression
- Classification
- Clustering
- Association rules
- More functions
- Create random data sets
- Connect data sets in other formats
- Visualize data
- ...
80 References
[1] M. Dash and H. Liu, "Dimensionality Reduction", in Encyclopedia of Computer Science and Engineering, John Wiley & Sons, Inc., 2, 958-966, 2009.
[2] H. Liu and L. Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering", IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491-502, 2005.
[3] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection", Journal of Machine Learning Research 3, 1157-1182, 2003.
[4] L. Yu and H. Liu, "Efficient Feature Selection via Analysis of Relevance and Redundancy", Journal of Machine Learning Research, vol. 5, pp. 1205-1224, 2004.
[5] H. Liu and H. Motoda, "Computational Methods of Feature Selection", Chapman and Hall/CRC Press, 2007.
[6] I. Guyon, Lecture 2: Introduction to Feature Selection.
[7] M. Dash and H. Liu, "Feature Selection for Classification".
81 References
[8] Makoto Miwa, "A Survey on Incremental Feature Extraction".
[9] Lei Yu, "Feature Selection and Its Application in Genomic Data Analysis".