Feature Selection Focused within Error Clusters - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Feature Selection Focused within Error Clusters


1
Feature Selection Focused within Error Clusters
  • Sui-Yu Wang and Henry Baird
  • Presented by Sui-Yu Wang

2
Feature Selection
  • Given a set of n features, find a subset of k < n
    features that still performs well
  • The best k features chosen separately are usually
    not the best k when chosen together (Elashoff et
    al., 1967)
  • To select the optimal subset, one has to
    exhaustively search through all k-element
    subsets (Cover and Van Campenhout, 1977)
  • Given a limited number of training samples and
    features, finding the minimum subset of features
    without misclassifying any training sample is
    NP-complete (Van Horn and Martinez, 1994)

3
Feature Selection
  • Methods can be divided into three categories:
    wrappers, filters, and embedded methods (Guyon
    and Elisseeff, 2003)
  • Filters rank features according to various
    metrics
  • Wrappers evaluate subsets of features according
    to a given classifier
  • Embedded methods are similar to wrappers, but use
    non-exhaustive search methods

4
A Motivating Example
  • Task: Classify each pixel as handwriting or
    blank
  • We have to search within a diameter of 25 pixels
    to get any useful features: D = 450 pixel values
  • So the possible features are extremely numerous:
    any combination of the 450 pixel values

5
Popular Method: PCA
  • Principal Components Analysis
  • PCA finds a small number of linear combinations
    of the original features
  • PCA finds the dimensions that represent the data
    best in a least-squares sense, but does not
    guarantee good separation of the data (Pearson,
    1901)
  • Most algorithms employ PCA first, then run their
    respective feature selection algorithm on the
    reduced set
  • This could throw away potentially interesting
    information (see the sketch below)

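As a minimal illustration, not from the slides: a PCA reduction in scikit-learn on synthetic data. PCA picks its components for reconstruction error alone; the class labels never enter.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 450))  # hypothetical samples of 450 pixel values

    pca = PCA(n_components=20)        # keep 20 linear combinations
    X_reduced = pca.fit_transform(X)  # chosen for least-squares reconstruction
                                      # only; class separation is never considered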
6
Our Research Strategy
  • We want to find methods for guiding the search
    for a few strongly discriminating features.
  • We adopt a greedy heuristic, constructing one
    feature at a time.
  • We focus our search on cases where the current
    features fail.

7
Formalities
  • We assume a two-class problem
  • The original sample space is R^D, where D is huge
  • We are given d << D hand-crafted features; all
    samples are projected into this feature space by
    a feature extractor f : R^D → R^d. We may lose
    information during the process
  • If there is any discriminating information in the
    sample space R^D but not in the feature space
    R^d, it must be in the null space of f

8
Finding the Null Space
  • If f is linear, the null space can be computed
    by linear algebra methods
  • Given F ∈ R^(d×D), a singular value
    decomposition, or SVD, can be used to find the
    set of vectors spanning the null space of F
  • F can be factorized as F = U Σ V^T, where
    U ∈ R^(d×d) and V ∈ R^(D×D) are orthogonal
    matrices
  • The null space N(F) is then spanned by the
    columns of V whose singular values are zero
    (sketched in code below)

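A minimal sketch of this computation, assuming NumPy and a linear extractor given as a d × D matrix F:

    import numpy as np

    def null_space_basis(F, tol=1e-10):
        # F: d x D matrix of a linear feature extractor, with d << D.
        # Full SVD: F = U @ diag(s) @ Vt, where U is d x d and Vt is D x D.
        U, s, Vt = np.linalg.svd(F, full_matrices=True)
        # Rows of Vt beyond the rank of F correspond to zero (or absent)
        # singular values; they span the null space N(F).
        rank = int(np.sum(s > tol))
        return Vt[rank:].T  # columns form an orthonormal basis of N(F)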
9
Finding the Next Feature
  • Samples that fall at the same point in R^d are
    not discriminated by the current feature set
  • Samples that lie in tight clusters in R^d are
    only weakly discriminated by the current feature
    set
  • A tight cluster of errors of both classes
    indicates cases where the current feature set
    fails completely
  • Therefore, we use these tight clusters to guide
    the forward search for new features
  • Once we have projected samples from the tight
    error cluster into the null space, we find a
    hyperplane w·x + b = 0 that best separates the
    data, and calculate a given sample x's distance
    to this hyperplane, (w·x + b)/||w||, as the new
    feature (see the sketch after this list)

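The slides do not name a specific hyperplane-finding method; here is a minimal sketch using scikit-learn's LinearSVC as the separator:

    import numpy as np
    from sklearn.svm import LinearSVC

    def hyperplane_feature(Z, y):
        # Z: error-cluster samples already projected into the null space;
        # y: their class labels.
        clf = LinearSVC().fit(Z, y)
        w, b = clf.coef_[0], clf.intercept_[0]
        scale = np.linalg.norm(w)
        # New feature: the signed distance of a projected sample z
        # to the separating hyperplane w.z + b = 0.
        return lambda z: (w @ z + b) / scale

Any linear separator that exposes an explicit (w, b) would serve equally well here.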
10
Operate on Points in the Null Space
  • There are many ways to project points in the
    sample space into the null space of f, N(f)
  • The orthogonal projection onto a particular
    subspace is unique
  • Let P = V0 V0^T, where the columns of V0 form an
    orthonormal basis for the subspace N(F); then a
    sample x projects to P x (see the sketch below)

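Continuing the same sketch, with V0 as returned by null_space_basis above:

    import numpy as np

    def project_to_null_space(X, V0):
        # V0: D x m matrix whose columns are an orthonormal basis of N(F).
        # The unique orthogonal projector is P = V0 @ V0.T; grouping the
        # product as (X @ V0) @ V0.T applies it to the rows of X without
        # materializing the D x D matrix P.
        return (X @ V0) @ V0.T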
11
(No Transcript)
12
Outline of the Algorithm
  • Repeat
  •   Draw enough samples to train a classifier
  •   Draw enough samples to build a test set
  •   Find clusters of errors in R^d
  •   Repeat
  •     Choose a tight cluster with both types of
        errors
  •     Draw enough samples to populate this cluster
        (if necessary)
  •     Project the cluster into the null space
  •     Find a separating hyperplane in the null
        space, with normal vector w, that best
        separates the samples in this cluster
  •     Construct a new feature and examine its
        performance
  •   Until the feature lowers the error rate
      sufficiently
  • Until the error rate is satisfactory to the user
    (sketched in code below)

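One hypothetical rendering of the inner step, reusing the helpers sketched earlier (null_space_basis, project_to_null_space, hyperplane_feature); KMeans is a stand-in, since the slides do not specify the clustering method, and the tightness and balance checks of slide 17 are omitted for brevity:

    import numpy as np
    from sklearn.cluster import KMeans

    def discover_feature(F, X_err, y_err, n_clusters=5):
        # X_err, y_err: misclassified test samples (both classes), in R^D.
        # Cluster the errors in the current feature space f(x) = F @ x.
        feats = X_err @ F.T
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
        V0 = null_space_basis(F)
        for c in range(n_clusters):
            mask = labels == c
            # Require errors of both classes in the chosen cluster.
            if len(np.unique(y_err[mask])) == 2:
                Z = project_to_null_space(X_err[mask], V0)
                return hyperplane_feature(Z, y_err[mask])
        return None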
13
Experiments
  • Experiments were conducted on a document image
    content extraction problem
  • Each image pixel is treated as a sample
  • The task is to classify each sample into
    handwriting or machine print
  • Possible features are extracted from a 25×25
    pixel square, so D = 625

14
Experimental Results
(Figure: a sample document image with regions labeled PH, MP, BL, and HW.)
15
Experiments
  • We divide the data into three sets: a training
    set, a discovery set, and a test set.
  • The training set consists of 4,469,740 MP samples
    and 943,178 HW samples
  • The feature discovery set consists of 4,980,418
    MP and 1,496,949 HW samples
  • The test set consists of 816,673 MP samples and
    649,113 HW samples

16
Experimental Results
17
Which Cluster is Best?
  • Experiments suggest that tight, balanced clusters
    are best

Cluster      1     2     3     4     5
Error rate   14.4  15.5  15.6  13.5  14.8
Balance      90    60    51    52    65
Tightness    83    90    83    70    80
18
Future Work
  • Apply the method to other problems
  • Continue the experiments to see how low the error
    rate can drop
  • Analyze cluster statistics to establish rules for
    selecting better cluster candidates
  • Try other hyperplane-finding methods
  • Establish a theoretical framework for when this
    approach is guaranteed to work and when it fails