Title: Greedy Feature Grouping for Optimal Discriminant Subspaces
1. Greedy Feature Grouping for Optimal Discriminant Subspaces
- Mahesan Niranjan
- Department of Computer Science
- The University of Sheffield
- European Bioinformatics Institute
2. Overview
- Motivation
- Feature Selection
- Feature Grouping Algorithm
- Simulations
  - Synthetic Data
  - Gene Expression Data
- Conclusions and Future
3. Motivation
- Many new high-dimensional problems
  - Language processing
  - Synthetic chemical molecules
  - High-throughput experiments in genomics
- Discriminant information may well lie in a small subspace
  - Better classifiers
  - Better interpretation of the classifier
4. Curse of dimensionality
Density estimation in high dimensions is difficult
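One way to see why, sketched under the assumption that the density is estimated with a simple histogram (the slide does not name an estimator). The function names and the choice of 10 bins per axis are illustrative only.

```python
def histogram_cells(bins_per_axis: int, dims: int) -> int:
    """Number of cells in a histogram with the same bin count on every axis."""
    return bins_per_axis ** dims


def points_per_cell(n_points: int, bins_per_axis: int, dims: int) -> float:
    """Average number of data points available to estimate each cell."""
    return n_points / histogram_cells(bins_per_axis, dims)


# The cell count grows exponentially with dimension, so a fixed-size
# data set leaves almost every cell empty once dims is large.
for dims in (1, 2, 5, 10, 50):
    print(dims, histogram_cells(10, dims), points_per_cell(1000, 10, dims))
```

With 10 bins per axis, 50 dimensions already give 10**50 cells, far more than any realistic number of examples.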
5. Support Vector Machines
Classification, not density estimation
6. Support Vector Machines: Nonlinear Kernel Functions
7. Classifier design
- Usually to minimize error rate
- Error rates can be misleading
  - Large imbalance in classes
  - Cost of misclassification can change
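A small illustration of the class-imbalance point, using made-up labels (5 positives, 95 negatives) and a hypothetical classifier that always predicts the majority class. The function name is an assumption for this sketch.

```python
def accuracy_and_tpr(labels, predictions):
    """Return (accuracy, true positive rate) for binary 0/1 labels."""
    pairs = list(zip(labels, predictions))
    correct = sum(1 for y, p in pairs if y == p)
    positives = sum(1 for y in labels if y == 1)
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    return correct / len(labels), true_pos / positives


# Invented, heavily imbalanced data: 5 positive cases, 95 negative cases.
labels = [1] * 5 + [0] * 95
always_negative = [0] * len(labels)

accuracy, tpr = accuracy_and_tpr(labels, always_negative)
print(accuracy, tpr)
```

The error rate looks low (5%) even though no positive case is ever detected, which is why the following slides turn to ROC analysis.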
8. [Figure: examples (x) labelled Adverse Outcome and Benign Outcome, with a Class Boundary and a Threshold]
9. Area under the ROC Curve
[Figure: ROC plot; axes: True Positive, False Positive]
Neat statistical interpretation
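The slide does not spell out the interpretation. A common one, assumed here, is the Wilcoxon-Mann-Whitney reading: the area under the ROC curve equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. A minimal sketch with invented scores:

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented classifier scores for three positives and three negatives.
example_auc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(example_auc)  # 8 of the 9 pairs are ranked correctly
```

Because it depends only on the ranking of scores, this quantity does not change with the decision threshold or with the class proportions.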
10. Convex Hull of ROC Curves
[Figure: ROC plot; axes: True Positive, False Positive]
Provost & Fawcett; Scott, Niranjan & Prager
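A minimal sketch of the convex hull construction, assuming each classifier contributes one or more operating points given as (false positive rate, true positive rate). The function names and the example points are illustrative, not taken from the cited papers.

```python
def _cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points, always including (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop earlier points that would lie on or below the new hull edge.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def area_under_hull(hull):
    """Trapezoidal area under a piecewise-linear hull."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(hull, hull[1:]))


# Invented operating points from three hypothetical classifiers.
example_points = [(0.2, 0.6), (0.5, 0.5), (0.6, 0.9)]
example_hull = roc_convex_hull(example_points)
print(example_hull, area_under_hull(example_hull))
```

Points that fall below the hull, such as (0.5, 0.5) here, are dominated: some combination of the hull classifiers does at least as well at every operating point.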
11. Feature selection in classification
- Filters
  - select subset that scores high
- Wrappers
  - Sequential Forward Selection / Backward deletion
- PARCEL
  - Scott, Niranjan & Prager; uses convex hulls of ROC curves
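A generic sketch of the wrapper idea, sequential forward selection. It assumes a caller-supplied `score` function that evaluates a feature subset (for example, cross-validated AUROC of some classifier); the toy score below is invented purely to make the example runnable.

```python
def sequential_forward_selection(n_features, score, max_features):
    """Greedily add the feature that most improves score(subset).

    Stops when no candidate improves the score or max_features is reached.
    """
    selected = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        top_score, top_feature = max((score(selected + [f]), f)
                                     for f in candidates)
        if top_score <= best_score:
            break
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score


# Toy score (an assumption): per-feature usefulness minus a size penalty.
usefulness = [0.1, 0.9, 0.5, 0.0]


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.3 * len(subset)


chosen, chosen_score = sequential_forward_selection(4, toy_score, 4)
print(chosen, chosen_score)
```

Backward deletion is the mirror image: start from all features and repeatedly remove the one whose removal hurts the score least.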
12. PARCEL: Feature subset selection
- Area under Convex Hull of multiple ROCs
- Different classifier architectures (including different features) at different operating points
- Has been put to good use in independent implementations
  - Oxford, UCL, Surrey
  - Sheffield Speech Group
13. Gene Expression Microarrays
14. Inference problems in Microarray Data
- Clustering
  - Similar expression patterns might imply
    - similar function
    - regulated in the same way
      - e.g. activated by the same transcription factor, concentration maintained by same mechanism, etc.
- Classification
  - diagnostics - e.g. disease / not
  - prediction - e.g. survival
Discrimination with features that do cluster
15. Subspaces of gene expressions
- Singular Value Decomposition (SVD)
- Robust SVD for missing values & outliers
- Combining different datasets
  - Pseudo-inverse Projection
  - Generalized SVD
Eigenarrays & Eigengenes
Alter, Brown & Botstein, PNAS, 2000; Alter & Golub, PNAS, 2004
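A minimal sketch of the plain SVD step, using NumPy on a randomly generated stand-in for an expression matrix (rows are genes, columns are arrays). The data, the matrix size and the choice of `k = 3` are assumptions; the robust and generalized variants listed above are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: 200 genes measured on 12 arrays (not real expression values).
expression = rng.normal(size=(200, 12))

# Thin SVD: expression = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(expression, full_matrices=False)

# In the terminology of Alter et al., columns of U are eigenarrays
# (patterns across genes) and rows of Vt are eigengenes (patterns across arrays).
eigenarrays = U
eigengenes = Vt

# Fraction of the overall expression captured by each component.
fraction = s ** 2 / np.sum(s ** 2)

# Coordinates of every gene in the subspace of the top-k eigengenes.
k = 3
gene_coords = expression @ eigengenes[:k].T
print(fraction[:k], gene_coords.shape)
```

Note that this subspace is chosen to capture variance, without reference to class labels, which is the contrast with the discriminant subspaces discussed later.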
16. Yeast Gene Classification
Switch to MATLAB here
2000 yeast genes, 79 experiments
Ribosome (125) / Not (1750)
First use of SVM: Brown et al., PNAS 1999
17. Discriminant Subspaces
18. Seemingly similar models
- Product of Experts (Hinton)
- Modular Mixture Model (Attias)
  - mixture model in subspaces
  - Combined by hidden nodes
Full feature set
None of these search for combinations of features
19. Algorithm
- Select M
- Initial assignment: one feature per group (at random / domain knowledge)
- Sequential search through remaining features
  - which feature, which group
  - maximize average AUROC / sum of Fisher ratios
- Stopping criterion
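The steps above can be sketched as follows for the sum-of-Fisher-ratios criterion. This is an illustration under stated assumptions, not the author's implementation: M is taken to be the number of groups, the stopping rule (`min_gain`) and the `ridge` term are guesses added to make the sketch runnable, and all function and parameter names are hypothetical.

```python
import numpy as np


def fisher_ratio(X, y, features, ridge=1e-3):
    """Two-class Fisher ratio d^T Sw^{-1} d on a subset of feature columns.

    The ridge term (an assumption) keeps Sw invertible for small samples.
    """
    idx = list(features)
    X0, X1 = X[y == 0][:, idx], X[y == 1][:, idx]
    d = X1.mean(axis=0) - X0.mean(axis=0)
    Sw = (np.atleast_2d(np.cov(X0, rowvar=False))
          + np.atleast_2d(np.cov(X1, rowvar=False)))
    Sw = Sw + ridge * np.eye(len(idx))
    return float(d @ np.linalg.solve(Sw, d))


def greedy_feature_grouping(X, y, n_groups, seeds=None, min_gain=1e-2, seed=0):
    """Greedy search over (feature, group) pairs, maximizing the sum of ratios."""
    n_features = X.shape[1]
    if seeds is None:
        # Initial assignment at random; domain knowledge could supply seeds.
        rng = np.random.default_rng(seed)
        seeds = rng.choice(n_features, size=n_groups, replace=False)
    groups = [[int(f)] for f in seeds]
    scores = [fisher_ratio(X, y, g) for g in groups]
    remaining = set(range(n_features)) - {int(f) for f in seeds}
    while remaining:
        best = None  # (gain, feature, group index, new group score)
        for f in sorted(remaining):
            for k, g in enumerate(groups):
                new_score = fisher_ratio(X, y, g + [f])
                gain = new_score - scores[k]
                if best is None or gain > best[0]:
                    best = (gain, f, k, new_score)
        # Assumed stopping criterion: best improvement is too small.
        if best[0] < min_gain:
            break
        _, f, k, new_score = best
        groups[k].append(f)
        scores[k] = new_score
        remaining.remove(f)
    return groups, scores
```

Each pass costs one Fisher-ratio evaluation per (remaining feature, group) pair. Swapping `fisher_ratio` for an average-AUROC estimate would give the other criterion named on the slide.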
20. Another view
[Figure labels: Within Class Scatter; Separation of Means]
21. Another view
22. Block diagonal scatter matrix
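One possible reading of this slide, assuming the grouping criterion is the sum of two-class Fisher ratios over disjoint feature groups. The symbols $d_k$, $S_k$ and $J_k$ are notation introduced here, not taken from the slides.

```latex
% d_k : difference of the two class means, restricted to the features in group k
% S_k : within-class scatter matrix, restricted to the features in group k
J_k = d_k^{\top} S_k^{-1} d_k , \qquad k = 1, \dots, M

\sum_{k=1}^{M} J_k
  = \begin{pmatrix} d_1 \\ \vdots \\ d_M \end{pmatrix}^{\top}
    \begin{pmatrix} S_1 & & \\ & \ddots & \\ & & S_M \end{pmatrix}^{-1}
    \begin{pmatrix} d_1 \\ \vdots \\ d_M \end{pmatrix}
```

Under this reading, maximizing the sum of per-group Fisher ratios is the same as maximizing a single Fisher ratio in which the within-class scatter between features of different groups is set to zero, since the inverse of a block diagonal matrix is the block diagonal matrix of the inverses.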
23. Simulations
- Synthetic Data
  - 50 dimensions
    - 3 groups of 4 dimensions; 38 irrelevant
    - (random, but valid, covariance matrices)
  - 40 examples (20 / 20)
  - 100 simulations
- Results
  - 70% of the time correct features identified
  - Often incorrect group assignment
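A sketch of how data with this layout could be generated. The slide gives only the dimensions and sample sizes; how the class means differ, the distribution of the irrelevant features, and the way the random covariance matrices are drawn are all assumptions made here, as are the function and parameter names.

```python
import numpy as np


def random_covariance(dim, rng):
    """Random symmetric positive-definite matrix (A A^T plus a small diagonal)."""
    A = rng.normal(size=(dim, dim))
    return A @ A.T + 0.1 * np.eye(dim)


def make_synthetic(n_per_class=20, n_dims=50, n_groups=3, group_size=4,
                   shift=1.0, seed=0):
    """Two-class data with a few informative feature groups and many irrelevant ones."""
    rng = np.random.default_rng(seed)
    relevant = rng.permutation(n_dims)[: n_groups * group_size]
    groups = [relevant[k * group_size:(k + 1) * group_size]
              for k in range(n_groups)]
    y = np.repeat([0, 1], n_per_class)
    # Irrelevant features: standard normal noise, identical in both classes.
    X = rng.normal(size=(2 * n_per_class, n_dims))
    for g in groups:
        cov = random_covariance(group_size, rng)
        mean_shift = shift * rng.normal(size=group_size)  # class-1 offset (assumed)
        noise = rng.multivariate_normal(np.zeros(group_size), cov,
                                        size=2 * n_per_class)
        X[:, g] = noise + np.outer(y, mean_shift)
    return X, y, [sorted(int(i) for i in g) for g in groups]


X_syn, y_syn, true_groups = make_synthetic()
print(X_syn.shape, true_groups)
```

Knowing `true_groups` makes it possible to score a grouping algorithm separately on "correct features identified" and "correct group assignment", the two outcomes reported in the results.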
24. Simulations
- Microarray Data (Rosenwald et al., 2002)
  - cancer patients, survival after chemotherapy
  - 240 data points; 7000 genes on array (filtered to 1000 genes)
  - two classes: survived to 4 years / not
- 10 to 20 subspaces; random initial seeds; 60 runs
  - classification accuracy comparable to reported results
  - 10% of runs failed to group the features (discrimination based on single subspace)
25. AML / ALL Leukaemia data
26. Conclusions
- Searching for feature groups
  - Desirable
  - Feasible
  - Achieves discriminant clustering
- Next Step
  - Biological interpretation of groups
    - comparison of genes in clusters with functional annotations, where known (gene ontology)
  - Careful initialization (known pathways)