Title: Feature Selection: Algorithms and Challenges
Feature Selection: Algorithms and Challenges
- Joint work with Yanglan Gang, Hao Wang, and Xuegang Hu
- Xindong Wu
- University of Vermont, USA
- Hefei University of Technology, China
My Research Background: Deduction and Induction
Outline
- Why feature selection
- What is feature selection
- Components of feature selection
- Some of my own research efforts
- Challenges in feature selection
1. Why Feature Selection?
- High-dimensional data often contain irrelevant or redundant features, which can:
  - reduce the accuracy of data mining algorithms
  - slow down the mining process
  - cause problems in storage and retrieval
  - make the results hard to interpret
2. What Is Feature Selection?
- Select the most relevant subset of attributes
according to some selection criteria.
Outline
- Why feature selection
- What is feature selection
- Components of feature selection
- Some of my own research efforts
- Challenges in feature selection
Traditional Taxonomy
- Wrapper approach
  - Features are selected as part of the mining algorithm
- Filter approach
  - Features are selected before a mining algorithm is run, using heuristics based on general characteristics of the data, rather than a learning algorithm, to evaluate the merit of feature subsets
- The wrapper approach is generally more accurate but also more computationally expensive.
Components of Feature Selection
- Feature selection is actually a search problem with four basic components (a generic sketch follows this list):
  - an initial subset
  - one or more selection criteria
  - a search strategy
  - some given stopping conditions
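A minimal sketch of how the four components fit together, as a generic greedy loop in Python; the function names (evaluate, next_candidates) and the loop structure are illustrative assumptions, not anything prescribed by the talk:

```python
# Generic skeleton of the four components. All names here (evaluate,
# next_candidates, max_rounds) are illustrative, not from the talk.

def feature_selection(features, evaluate, next_candidates, max_rounds=100):
    current = frozenset()              # 1. an initial subset (empty here)
    best_score = evaluate(current)     # 2. a selection criterion
    for _ in range(max_rounds):        # 4. a stopping condition (round budget)
        improved = False
        for candidate in next_candidates(current, features):  # 3. a search strategy
            score = evaluate(candidate)
            if score > best_score:
                current, best_score, improved = candidate, score, True
        if not improved:               # 4. another stopping condition: no improvement
            break
    return current, best_score
```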
Feature Selection Criteria
- Selection criteria generally use relevance to estimate, in one way or another, the goodness of a selected feature subset:
  - Distance measure
  - Information measure (an example sketch follows)
  - Inconsistency measure
  - Relevance estimation
  - Selection criteria tied to a learning algorithm (the wrapper approach)
- Some unified frameworks for relevance have been proposed recently.
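As a concrete example of an information measure, the sketch below scores a single discrete feature by its information gain with respect to the class labels, using the standard entropy definition (the code itself is illustrative, not from the talk):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Information gain of one discrete feature: H(Y) minus the expected
    entropy of Y after partitioning the data on the feature's values."""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

# Toy example: the feature perfectly predicts the class, so gain = H(Y) = 1 bit.
print(information_gain(['a', 'a', 'b', 'b'], [0, 0, 1, 1]))
```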
Search Strategy
- Exhaustive search
  - Every possible subset is evaluated and the best one is chosen (sketched below)
  - Guarantees the optimal solution
  - Low efficiency
  - A modified approach: branch and bound (BB)
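A minimal Python sketch of exhaustive search over all non-empty subsets, assuming an `evaluate` callable that scores a subset (branch and bound would prune this same search tree):

```python
from itertools import combinations

def exhaustive_search(features, evaluate):
    """Evaluate every possible non-empty subset and keep the best one.
    Guarantees the optimum but needs O(2^n) evaluations, hence the low
    efficiency noted above."""
    best_subset, best_score = None, float('-inf')
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```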
Search Strategy (2)
- Heuristic search
  - Sequential search, including SFS, SFFS, SBS, and SBFS (sequential forward/backward selection and their floating variants)
  - SFS: start with the empty attribute set (sketched below)
    - add the best attribute
    - add the best of the remaining attributes
    - repeat until maximum performance is reached
  - SBS: start with the entire attribute set
    - remove the worst attribute
    - repeat until maximum performance is reached
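A minimal SFS sketch under the same assumed `evaluate` callable; it stops when no single addition improves the score, one common stopping condition:

```python
def sequential_forward_selection(features, evaluate):
    """SFS: start with the empty attribute set and greedily add the best
    remaining attribute, stopping when no addition improves the criterion."""
    selected, best_score = [], float('-inf')
    remaining = list(features)
    while remaining:
        # Score every one-attribute extension of the current subset.
        score, best_f = max(
            ((evaluate(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if score <= best_score:    # stopping condition: no improvement
            break
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = score
    return selected, best_score
```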
Search Strategy (3)
- Random search
  - It proceeds in two different ways (the second is sketched below):
    - inject randomness into classical sequential approaches (simulated annealing, beam search, genetic algorithms, and random-start hill-climbing)
    - generate the next subset randomly
  - The use of randomness can help escape local optima in the search space, and the optimality of the selected subset depends on the available resources.
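A sketch of the second flavour, purely random subset generation under a fixed evaluation budget; the budget stands in for the "available resources" mentioned above:

```python
import random

def random_search(features, evaluate, budget=1000, seed=0):
    """Generate each candidate subset uniformly at random and keep the best
    one seen within the evaluation budget."""
    rng = random.Random(seed)
    best_subset, best_score = None, float('-inf')
    for _ in range(budget):
        subset = [f for f in features if rng.random() < 0.5]  # random subset
        if not subset:
            continue
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```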
Outline
- Why feature selection
- What is feature selection
- Components of feature selection
- Some of my own research efforts
- Challenges in feature selection
RITIO: Rule Induction Two In One
- Feature selection using information gain in reverse order
- Deletes the features that are least informative first (a sketch of the idea follows)
- Results are significant compared with forward selection
- Wu et al., 1999, TKDE.
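The sketch below captures only the reverse-ordering idea, repeatedly deleting the feature with the lowest information gain; it is not the published RITIO algorithm, which is a full rule-induction method. It assumes the `information_gain` function from the earlier sketch is passed in:

```python
def reverse_deletion(columns, labels, information_gain, n_keep):
    """columns: dict mapping feature name -> list of that feature's values.
    Repeatedly drops the feature with the LOWEST information gain until
    n_keep features remain (the reverse of forward selection by gain)."""
    remaining = dict(columns)
    while len(remaining) > n_keep:
        worst = min(remaining,
                    key=lambda f: information_gain(remaining[f], labels))
        del remaining[worst]      # delete the least informative feature
    return list(remaining)
```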
Induction as Pre-processing
- Use one induction algorithm to select attributes for another induction algorithm (a sketch with modern tools follows)
- Can be a decision-tree method selecting for rule induction, or vice versa
- Accuracy results are not as good as expected
- Reason: feature selection normally causes information loss
- Details: Wu, 1999, PAKDD.
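A sketch of the idea using scikit-learn as a modern stand-in (the talk predates this library): a decision tree ranks the attributes, and a second learner, naive Bayes here, trains on only the top-ranked ones. The top-5 cutoff is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# First induction algorithm (a decision tree) ranks the attributes.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
keep = np.argsort(tree.feature_importances_)[-5:]   # top-5 attributes

# Second induction algorithm, with and without the tree's selection.
print(cross_val_score(GaussianNB(), X, y, cv=5).mean())          # all attributes
print(cross_val_score(GaussianNB(), X[:, keep], y, cv=5).mean()) # tree-selected
```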
Subspacing with Asymmetric Bagging
- When the number of examples is less than the number of attributes
- When the number of positive examples is smaller than the number of negative examples
- An example: content-based information retrieval (a rough sketch follows)
- Details: Tao et al., 2006, TPAMI.
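A rough sketch of the two ingredients, not the method of Tao et al. itself: every base model sees all positive examples, a same-sized bootstrap of the more numerous negatives, and a random subspace of attributes. The `train` callback and all parameter values are hypothetical:

```python
import random

def asymmetric_bagging_subspace(pos, neg, n_features, train,
                                n_models=11, subspace_size=10, seed=0):
    """pos, neg: lists of feature vectors. `train(rows, labels)` is a
    hypothetical callback that fits one base learner and returns it."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        neg_sample = [rng.choice(neg) for _ in pos]           # asymmetric bagging
        feats = rng.sample(range(n_features), subspace_size)  # random subspace
        rows = [[x[j] for j in feats] for x in pos + neg_sample]
        labels = [1] * len(pos) + [0] * len(neg_sample)
        models.append((train(rows, labels), feats))
    return models  # predictions are typically aggregated, e.g. by majority vote
```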
Outline
- Why feature selection
- What is feature selection
- Components of feature selection
- Some of my own research efforts
- Challenges in feature selection
Challenges in Feature Selection (1)
- Dealing with ultra-high-dimensional data and feature interactions
- Traditional feature selection encounters two major problems when the dimensionality runs into tens or hundreds of thousands:
  - the curse of dimensionality
  - the relative shortage of instances
Challenges in Feature Selection (2)
- Dealing with active instances (Liu et al., 2005)
- When the dataset is huge, feature selection performed on the whole dataset is inefficient, so instance selection is necessary:
  - random sampling (pure random sampling without exploiting any data characteristics)
  - active feature selection (selective sampling using data characteristics achieves better or equally good results with a significantly smaller number of instances; a sketch follows)
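A simplistic stand-in for selective sampling: pick a small set of representative instances and run feature selection on those instead of the full dataset. Liu et al. (2005) exploit data characteristics such as a kd-tree style partitioning; the k-means choice below is purely an illustrative substitute:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

def select_instances(X, n_samples=100, seed=0):
    """Return one representative instance per cluster: the data point
    nearest each k-means cluster centre."""
    km = KMeans(n_clusters=n_samples, n_init=10, random_state=seed).fit(X)
    idx = pairwise_distances_argmin(km.cluster_centers_, X)
    return X[idx], idx

X = np.random.RandomState(0).rand(10000, 50)      # stand-in for a huge dataset
X_small, idx = select_instances(X, n_samples=200)
# ...feature selection then runs on X_small (200 rows) instead of all 10000.
```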
Challenges in Feature Selection (3)
- Dealing with new data types (Liu et al., 2005)
- Traditional data type: an N×M data matrix
- Due to the growth of computer and Internet/Web techniques, new data types are emerging:
  - text-based data (e.g., e-mails, online news, newsgroups)
  - semi-structured data (e.g., HTML, XML)
  - data streams
Challenges in Feature Selection (4)
- Unsupervised feature selection
- Feature selection vs. classification: relevant to almost every classification algorithm
- Subspace methods to cope with the curse of dimensionality in classification
- Subspace clustering.
Challenges in Feature Selection (5)
- Dealing with predictive-but-unpredictable attributes in noisy data
- Attribute noise is difficult to process, and removing noisy instances is dangerous
- Predictive attributes: essential to classification
- Unpredictable attributes: cannot be predicted by the class and the other attributes
- Noise identification, cleansing, and measurement need special attention (Yang et al., 2004)
Challenges in Feature Selection (6)
- Dealing with inconsistent and redundant features
- Redundancy can indicate reliability
- Inconsistency can also indicate a problem that needs handling
- Researchers in rough set theory: what is the purpose of feature selection?
- Can you really demonstrate the usefulness of reduction, in data mining accuracy or otherwise?
- Removing attributes can well result in information loss
- When the data is very noisy, removals can cause a very different data distribution
- Discretization can possibly bring new issues.
Concluding Remarks
- Feature selection is and will remain an important issue in data mining, machine learning, and related disciplines
- Feature selection gains efficiency at a price in accuracy
- Researchers need to keep the bigger picture in mind, not just do selection for the purpose of feature selection.