Title: Genetic Feature Subset Selection for Gender Classification: A Comparison Study
1. Genetic Feature Subset Selection for Gender Classification: A Comparison Study
- Zehang Sun, George Bebis, Xiaojing Yuan, and Sushil Louis
- Computer Vision Laboratory
- Department of Computer Science
- University of Nevada, Reno
- bebis_at_cs.unr.edu
- http://www.cs.unr.edu/CVL
2. Gender Classification
- Problem statement
- Determine the gender of a subject from facial images.
- Potential applications
- Face Recognition
- Human-Computer Interaction (HCI)
- Challenges
- Race, age, facial expression, hair style, etc.
3. Gender Classification by Humans
- Humans are able to make fast and accurate gender classifications.
- It takes 600 ms on average to classify faces according to their gender (Bruce et al., 1987).
- 96% accuracy has been reported using photos of non-familiar faces without hair information (Bruce et al., 1993).
- Empirical evidence indicates that gender decisions are made much faster than identity decisions.
- Computation of gender and identity might be two independent processes.
- There is evidence that gender classification is carried out by a separate population of cells in the inferior temporal cortex (Damasio et al., 1990).
4. Designing a Gender Classifier
- The majority of gender classification schemes are based on supervised learning.
- Definition
- Feature extraction determines an appropriate subspace of dimensionality m in the original feature space of dimensionality d (m << d).
5. Previous Approaches
- Geometry-based
- Use distances, angles, and areas among facial features.
- Point-to-point distances + discriminant analysis (Burton 93, Fellous 97)
- Feature-to-feature distances + HyperBF NNs (Brunelli 92)
- Wavelet features + elastic graph matching (Wiskott 95)
- Appearance-based
- Raw images + NNs (Cottrell 90, Golomb 91, Yen 94)
- PCA + NNs (Abdi 95)
- PCA + nearest neighbor (Valentin 97)
- Raw images + SVMs (Moghaddam 02)
6. What Information is Useful for Gender Classification?
- Geometry-based approaches
- Representing faces as a set of features assumes a-priori knowledge about what the features are and/or what the relationships between them are.
- There is no simple set of features that can predict the gender of faces accurately.
- There is no simple algorithm for extracting the features automatically from images.
- Appearance-based approaches
- Certain features are nearly characteristic of one sex or the other (e.g., facial hair for men, makeup or certain hairstyles for women).
- It is easier to represent this kind of information using appearance-based feature extraction methods.
- Appearance-based features, however, are more likely to suffer from redundant and irrelevant information.
7. Feature Extraction Using PCA
- Feature extraction is performed by projecting the data into a lower-dimensional space using PCA.
- PCA maps the data into a lower-dimensional space using a linear transformation.
- The columns of the projection matrix are the top eigenvectors (i.e., eigenfaces) of the covariance matrix of the data.
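The projection step above can be sketched as follows. This is a minimal illustration assuming NumPy; `pca_project` is a hypothetical helper, and the random toy data stand in for vectorized face images:

```python
import numpy as np

def pca_project(X, m):
    """Project d-dimensional rows of X onto the top-m principal components.

    X: (n_samples, d) data matrix (e.g., vectorized face images).
    Returns (coefficients, mean, components).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                      # center the data
    # The SVD of the centered data yields the eigenvectors of the data's
    # covariance matrix as the rows of Vt, ordered by decreasing eigenvalue.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:m]                # top-m eigenvectors ("eigenfaces")
    coeffs = Xc @ components.T         # m-dimensional feature vector per sample
    return coeffs, mean, components

# Toy example: 400 "images" of dimension 100, reduced to 30 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 100))
coeffs, mean, components = pca_project(X, 30)
print(coeffs.shape)   # (400, 30)
```

A reconstruction (as used on later slides) is then `coeffs @ components + mean`.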
8. Which Eigenvectors Encode Mostly Gender-Related Information?
Sometimes, it is possible to determine what features are encoded by specific eigenvectors.
9. Which Eigenvectors Encode Mostly Gender-Related Information? (cont'd)
- All eigenvectors contain information relative to the gender of faces; however, only the information conveyed by eigenvectors with large eigenvalues can be generalized to new faces (Abdi et al., 1995).
- Removing specific eigenvectors could in fact improve performance (Yambor et al., 2000).
10. Critique of Previous Approaches
- No explicit feature selection is performed.
- The same features used for face identification are also used for gender classification.
- Some features might be redundant or irrelevant.
- They rely heavily on the classifier.
- Classification accuracy can suffer.
- Training and classification are time consuming.
11. Project Goal
- Improve the performance of gender classification using feature subset selection.
12. Feature Selection
What constitutes a good set of features for classification?
- Definition
- Given a set of d features, select a subset of size m that leads to the smallest classification error.
- Filter Methods
- Preprocessing steps performed independently of the classification algorithm or its error criteria.
- Wrapper Methods
- Search through the space of feature subsets using the criterion of the classification algorithm to select the optimal feature subset.
- Provide more accurate solutions than filter methods, but are in general more computationally expensive.
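The filter/wrapper distinction above can be illustrated with a small sketch. Everything here is hypothetical scaffolding (NumPy assumed; the scoring functions and the nearest-centroid stand-in classifier are illustrative, not the talk's actual classifiers):

```python
import numpy as np

rng = np.random.default_rng(1)

def filter_score(X, y):
    """Filter method: rank each feature on its own, independently of any
    classifier (here: class-mean separation over the pooled std)."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    return np.abs(m0 - m1) / (X.std(axis=0) + 1e-12)

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    """A minimal stand-in classifier: assign each test point to the class
    with the nearest training centroid; return test accuracy."""
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float((pred == yte).mean())

def wrapper_score(X, y, subset):
    """Wrapper method: score a candidate subset by the classifier's own
    criterion (holdout accuracy on the selected columns)."""
    split = int(0.75 * len(y))
    Xs = X[:, subset]
    return nearest_centroid_acc(Xs[:split], y[:split], Xs[split:], y[split:])

# Toy data: only feature 0 carries class information.
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[:, 0] += 3.0 * y
perm = rng.permutation(200)
X, y = X[perm], y[perm]

print(int(np.argmax(filter_score(X, y))))          # feature 0 ranks first
print(wrapper_score(X, y, [0]))                    # near-perfect accuracy
```

The wrapper pays for its accuracy by retraining/re-evaluating the classifier for every candidate subset, which is why it is the more expensive option.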
13. What are the Benefits?
- Eliminate redundant and irrelevant features.
- Fewer training examples are required.
- Faster and more accurate classification.
14. Project Objectives
- Perform feature extraction by projecting the images into a lower-dimensional space using Principal Components Analysis (PCA).
- Perform feature selection in PCA space using Genetic Algorithms.
- Test four traditional classifiers (Bayesian, LDA, NNs, and SVMs).
- Compare with traditional feature subset selection approaches (e.g., Sequential Backward Floating Search (SBFS)).
15. Genetic Algorithms (GAs) Review
- What is a GA?
- An optimization technique for searching very large spaces.
- Inspired by the biological mechanisms of natural selection and reproduction.
- What are the main characteristics of a GA?
- Global optimization technique.
- Uses objective function information, not derivatives.
- Searches probabilistically using a population of structures (i.e., candidate solutions under some encoding).
- Structures are modified at each iteration using selection, crossover, and mutation.
16. Structure of GA
[Diagram: a current generation of binary chromosomes (e.g., 10010110, 01100010, 10100100, ...) passes through Evaluation and Selection, then Crossover, then Mutation to produce the next generation.]
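The generational loop sketched in this diagram might look as follows. This is a minimal one-max demo assuming NumPy; `run_ga` and the toy fitness are illustrative, not the talk's actual implementation (the default crossover/mutation rates echo the parameters quoted later in the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def run_ga(fitness, n_bits, pop_size=20, generations=50,
           p_cross=0.66, p_mut=0.04):
    """Generic GA loop over binary chromosomes: evaluate, select,
    crossover, mutate, repeat."""
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        # Selection: fitness-proportional (roulette-wheel) sampling.
        idx = rng.choice(pop_size, size=pop_size, p=scores / scores.sum())
        pop = pop[idx]
        # Crossover: single-point, applied to consecutive pairs.
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, n_bits)
                pop[i, cut:], pop[i + 1, cut:] = (pop[i + 1, cut:].copy(),
                                                  pop[i, cut:].copy())
        # Mutation: independent bit flips.
        flip = rng.random(pop.shape) < p_mut
        pop = np.where(flip, 1 - pop, pop)
    return pop[np.argmax([fitness(c) for c in pop])]

# Toy fitness: maximize the number of 1-bits ("one-max").
best = run_ga(lambda c: c.sum() + 1e-9, n_bits=16)
print(best.sum())
```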
17. Encoding and Fitness Evaluation
- Encoding scheme
- Transforms solutions in parameter space into finite-length strings (chromosomes) over some finite set of symbols.
- Fitness function
- Evaluates the goodness of a solution.
18. Selection Operator
- Probabilistically filters out solutions that perform poorly, choosing high-performance solutions to exploit.
- Chromosomes with high fitness are copied over to the next generation.
19. Crossover and Mutation Operators
- Generate new solutions for exploration.
- Crossover
- Allows information exchange between points.
- Mutation
- Its role is to restore lost genetic material.
[Figure: mutation flipping a single ("mutated") bit.]
20. Genetic Feature Subset Selection
- Binary encoding: one bit per eigenvector, EV1 through EV250 (the search uses the first 250 eigenvectors).
- Fitness evaluation:
fitness = 10^4 × accuracy + 0.4 × zeros
where accuracy is the classification accuracy on the validation set, and zeros is the number of zero bits (eigenvectors left out, i.e., a reward for fewer features).
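This fitness evaluation (a large weight on validation accuracy, plus 0.4 per excluded eigenvector to reward compact subsets) can be sketched as follows; `chromosome_fitness` is a hypothetical helper and NumPy is assumed:

```python
import numpy as np

def chromosome_fitness(chromosome, accuracy):
    """Fitness for genetic eigenvector selection: heavily weight the
    validation-set accuracy, plus a small bonus for every excluded
    eigenvector (zero bit) to favor compact feature subsets."""
    zeros = np.sum(chromosome == 0)
    return 1e4 * accuracy + 0.4 * zeros

# Example: a chromosome selecting 100 of 250 eigenvectors, with 90%
# validation accuracy.
c = np.zeros(250, dtype=int)
c[:100] = 1
print(chromosome_fitness(c, 0.90))   # 1e4*0.9 + 0.4*150 = 9060.0
```

The 10^4 weight ensures accuracy always dominates: the subset-size bonus (at most 0.4 × 250 = 100) can only break ties between chromosomes of nearly equal accuracy.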
21. Genetic Feature Subset Selection (cont'd)
- Cross-generational selection strategy
- Assuming a population of size N, the offspring double the size of the population, and we select the best N individuals from the combined parent-offspring population.
- GA parameters
- Population size: 350
- Number of generations: 400
- Crossover rate: 0.66
- Mutation rate: 0.04
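The cross-generational strategy above amounts to a (mu + lambda)-style survivor selection, which might be sketched as follows (NumPy assumed; the function name and the one-max toy check are hypothetical):

```python
import numpy as np

def cross_generational_select(parents, offspring, fitness):
    """Keep the best N individuals from the combined parent + offspring
    pool, where N is the parent population size."""
    pool = np.vstack([parents, offspring])
    scores = np.array([fitness(ind) for ind in pool])
    best = np.argsort(scores)[::-1][:len(parents)]   # top-N by fitness
    return pool[best]

# Toy check with one-max fitness on 8-bit chromosomes.
rng = np.random.default_rng(0)
parents = rng.integers(0, 2, size=(4, 8))
offspring = rng.integers(0, 2, size=(4, 8))
survivors = cross_generational_select(parents, offspring, lambda c: c.sum())
print(survivors.shape)   # (4, 8)
```

Because parents compete with their own offspring, the best solution found so far can never be lost between generations.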
22. Dataset
- 400 frontal images from 400 different people
- 200 male, 200 female
- Different races
- Different lighting conditions
- Different facial expressions
- Images were registered and normalized
- No hair information
- Normalization accounts for the different lighting conditions
23. Experiments
- Gender classifiers
- Linear Discriminant Analysis (LDA)
- Bayes classifier
- Neural Network (NN) classifier
- Support Vector Machine (SVM) classifier
- Three-fold cross-validation
- Training set: 75% of the data
- Validation set: 12.5% of the data
- Test set: 12.5% of the data
24. Classification Error Rates
[Bar chart: error rates (%) per classifier with manually selected vs. GA-selected feature subsets; plotted values include 22.4, 17.7, 14.2, 13.3, 11.3, 8.9, 9, 6.7, and 4.7.]
- ERM: error rate using manually selected feature subsets
- ERG: error rate using GA-selected feature subsets
25. Ratio of Features / Information Kept
[Bar chart: per classifier, the percentage of features retained and of information retained; plotted values include 69, 61.2, 42.8, 38, 36.4, 32.4, 31, 17.6, 13.3, and 8.4.]
- RN: percentage of the number of features in the feature subset
- RI: percentage of information contained in the feature subset
26. Distribution of Selected Eigenvectors
[Figure panels: (a) LDA, (b) Bayes, (c) NN, (d) SVMs]
27. Reconstructed Images
[Figure rows: original images; reconstructions using the top 30 EVs; using the EVs selected by LDA (PCA+GA); using the EVs selected by Bayes (PCA+GA).]
28. Reconstructed Images (cont'd)
[Figure rows: original images; reconstructions using the top 30 EVs; using the EVs selected by NN (PCA+GA); using the EVs selected by SVM (PCA+GA).]
- Reconstructed faces using GA-selected EVs have lost information about identity but do disclose strong gender information.
- Certain gender-irrelevant features do not appear in the images reconstructed from GA-selected EVs.
29. Comparison with SBFS
- Sequential Backward Floating Search (SBFS) is a combination of two heuristic search schemes:
- (1) Sequential Forward Selection (SFS): starts with an empty feature set and at each step adds the best single feature to the feature subset.
- (2) Sequential Backward Selection (SBS): starts with the entire feature set and at each step drops the feature whose absence least decreases the performance.
30. Comparison with SBFS (cont'd)
- SBFS is an advanced version of the plus-l, take-away-r method, which first enlarges the feature subset by l features using forward selection and then removes r features using backward selection.
- The number of forward and backward steps in SBFS is dynamically controlled and updated based on the classifier's performance.
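The two building blocks of SBFS can be sketched as follows. This is a hypothetical illustration (full SBFS additionally interleaves backtracking steps, dynamically deciding how many forward/backward moves to make); `score` is any subset-evaluation criterion, such as a classifier's validation accuracy:

```python
def sfs(features, score, m):
    """Sequential Forward Selection: start empty; at each step add the
    single feature that yields the best-scoring subset."""
    subset, remaining = [], list(features)
    while len(subset) < m:
        best = max(remaining, key=lambda f: score(subset + [f]))
        subset.append(best)
        remaining.remove(best)
    return subset

def sbs(features, score, m):
    """Sequential Backward Selection: start with all features; at each
    step drop the feature whose absence least decreases the score."""
    subset = list(features)
    while len(subset) > m:
        drop = max(subset, key=lambda f: score([g for g in subset if g != f]))
        subset.remove(drop)
    return subset

# Toy score: each feature contributes an independent weight.
weights = {0: 5.0, 1: 3.0, 2: 1.0, 3: 0.5}
score = lambda s: sum(weights[f] for f in s)
print(sorted(sfs(list(weights), score, 2)))   # [0, 1]
print(sorted(sbs(list(weights), score, 2)))   # [0, 1]
```

Both are greedy and can get trapped when features interact; SBFS's floating steps, and the GA's population-based global search, are two different answers to that weakness.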
31. Comparison with SBFS (cont'd)
[Figure: (a) SVMs + SBFS, (b) SVMs + GA.]
- ERM: error rate using the manually selected feature subsets
- ERG: error rate using GA-selected feature subsets
- ERSBFS: error rate using SBFS
32. Comparison with SBFS (cont'd)
[Figure rows: original images; reconstructions using the top 30 EVs; using the EVs selected by SVM (PCA+GA); using the EVs selected by SVM (PCA+SBFS).]
33. Conclusions
- We have considered the problem of gender classification from frontal facial images using genetic feature subset selection.
- GAs provide a simple, general, and powerful framework for feature subset selection.
- They are very useful, especially when the number of training examples is small.
- We have tested four well-known classifiers using PCA for feature extraction.
- Genetic feature subset selection has led to lower error rates in all cases.
34. Future Work
- Generalize the feature encoding scheme.
- Use weights instead of 0/1 encoding.
- Consider more powerful fitness functions.
- Use larger data sets (e.g., the FERET data set).
- Apply feature selection to different feature types (e.g., wavelet or Gabor features).
- Experiment with different data sets and tasks (e.g., vehicle detection).