Title: Correlation Aware Feature Selection
1. Correlation Aware Feature Selection
Annalisa Barla, Cesare Furlanello, Giuseppe Jurman, Stefano Merler, Silvano Paoli
http://mpa.itc.it
Berlin, 8/10/2005
2. Overview
- On Feature Selection
- Correlation Aware Ranking
- Synthetic Example
3. Feature Selection
- Step-wise variable selection: one feature vs. N features
- n < N effective variables modeling the classification function
(Diagram: starting from N features, the selection proceeds over N steps, Step 1 to Step N.)
4. Feature Selection
Step-wise selection of the features.
(Diagram: ranked features vs. discarded features across the selection steps.)
5. Ranking
- Classifier-independent filters: prefiltering is risky, since you might discard features (ignoring the labelling) that turn out to be important.
- Induced by a classifier
6. Support Vector Machines
Classification function
Optimal Separating Hyperplane
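The two labels above stood for formulas shown only as images in the original slides; a standard reconstruction in LaTeX (notation assumed, not taken from the source):

f(\mathbf{x}) = \mathrm{sign}\left(\mathbf{w}\cdot\mathbf{x} + b\right), \qquad \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i

\min_{\mathbf{w},b}\ \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\ i = 1,\dots,n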
7. The classification/ranking machine
- The RFE idea: given N features (genes)
  - Train an SVM
  - Compute a cost function J from the weight coefficients of the SVM
  - Rank features in terms of their contribution to J
  - Discard the feature contributing least to J
  - Reapply the procedure on the remaining N-1 features
- This is called Recursive Feature Elimination (RFE)
- Features are ranked according to their contribution to the classification, given the training data.
- Time and data consuming, and at risk of selection bias
Guyon et al. 2002
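A minimal R sketch of this recursion, assuming the e1071 package and a linear kernel; it illustrates the idea rather than reproducing the authors' implementation (function and variable names are made up here).

library(e1071)

# Recursive Feature Elimination with a linear SVM (illustrative sketch).
svm_rfe <- function(x, y) {
  surviving  <- seq_len(ncol(x))   # indices of the features still in play
  eliminated <- integer(0)         # eliminated features, most recently removed first
  while (length(surviving) > 1) {
    model <- svm(x[, surviving, drop = FALSE], y, kernel = "linear", scale = FALSE)
    w <- as.vector(t(model$coefs) %*% model$SV)  # weight vector of the trained SVM
    J <- w^2                                     # contribution of each feature to the cost
    worst <- which.min(J)                        # feature contributing least to J
    eliminated <- c(surviving[worst], eliminated)
    surviving  <- surviving[-worst]
  }
  c(surviving, eliminated)         # full ranking, best feature first
}

# Usage (assuming a numeric matrix x and a two-level factor y):
# rank_order <- svm_rfe(x, y)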
8. RFE-based Methods
- Considering chunks of features at a time
- Parametric
  - Sqrt(N)-RFE
  - Bisection-RFE
- Non-parametric
  - E-RFE (adapting to the weight distribution): thresholding weights at a value w
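A rough R sketch of one thresholded elimination step; using the mean squared weight as the threshold is an assumption standing in for E-RFE's entropy-based, distribution-adaptive choice.

# Discard, in a single step, every surviving feature whose squared weight
# falls below the threshold w_bar (here simply the mean squared weight).
erfe_step <- function(J, surviving) {
  w_bar <- mean(J)
  drop  <- which(J < w_bar)
  if (length(drop) == 0) drop <- which.min(J)   # always remove at least one feature
  list(discarded = surviving[drop], surviving = surviving[-drop])
}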
9. Variable Elimination
Correlated genes
Given a family F = {x1, x2, ..., xH} whose members are correlated above a given threshold T.
Each single weight is negligible:
w(x1) ≈ w(x2) ≈ ... ≈ w(xH) ≈ ε < w   (the elimination threshold w)
BUT
w(x1) + w(x2) + ... + w(xH) >> w
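A small R illustration of this effect (the setup is an assumption, e1071 package): duplicating one informative feature spreads its SVM weight over the copies, so each copy looks negligible even though the family as a whole is not.

library(e1071)
set.seed(1)
n <- 100; H <- 10
signal <- c(rnorm(n/2, mean = 1), rnorm(n/2, mean = -1))   # one informative variable
y <- factor(rep(c("A", "B"), each = n/2))
x <- cbind(matrix(rep(signal, H), ncol = H),               # H identical correlated copies
           matrix(runif(n * 50, -4, 4), ncol = 50))        # 50 noise features
m <- svm(x, y, kernel = "linear", scale = FALSE)
w <- as.vector(t(m$coefs) %*% m$SV)
w[1:H]        # each copy carries only a small individual weight
sum(w[1:H])   # but the family's summed weight is large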
10. Correlated Genes (1)
11. Correlated Genes (2)
12. Synthetic Data
- Binary problem
- 100 (50 + 50) samples of 1000 genes
- genes 1-50: randomly extracted from N(1,1) and N(-1,1) for the two classes respectively
- genes 51-100: randomly extracted from N(1,1) and N(-1,1) respectively (one gene repeated 50 times)
- genes 101-1000: extracted from UNIF(-4,4)
(Diagram: 100 x 1000 data matrix; class 1 = 50 samples, class 2 = 50 samples; 51 significant features; genes 51-100 form a 1 x 50 repeated block.)
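An R sketch reproducing the structure of this dataset (the seed and minor sampling details are assumptions):

set.seed(42)
n_per_class <- 50
y <- factor(rep(c(1, 2), each = n_per_class))

# genes 1-50: independent informative genes, N(1,1) in class 1 and N(-1,1) in class 2
informative <- rbind(matrix(rnorm(n_per_class * 50, mean =  1), nrow = n_per_class),
                     matrix(rnorm(n_per_class * 50, mean = -1), nrow = n_per_class))

# genes 51-100: one additional informative gene, repeated 50 times
repeated <- c(rnorm(n_per_class, mean = 1), rnorm(n_per_class, mean = -1))
block    <- matrix(rep(repeated, 50), ncol = 50)

# genes 101-1000: uninformative noise from UNIF(-4, 4)
noise <- matrix(runif(2 * n_per_class * 900, min = -4, max = 4), nrow = 2 * n_per_class)

x <- cbind(informative, block, noise)   # 100 x 1000, 51 distinct significant features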
13. Our algorithm
(Diagram: the correction applied at elimination step j.)
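Since the algorithm itself is shown only as a diagram, the following R sketch is one plausible reading of the correction based on the previous slides: among the features flagged for elimination at step j, detect highly correlated families and, when a family's summed weight exceeds the threshold, rescue a representative. The correlation cutoff and the choice of representative are assumptions.

correlation_correction <- function(x, w, to_drop, w_bar, cor_cut = 0.9) {
  rescued <- integer(0)
  if (length(to_drop) > 1) {
    cm <- abs(cor(x[, to_drop, drop = FALSE]))       # correlations among flagged features
    for (j in seq_along(to_drop)) {
      fam <- which(cm[j, ] > cor_cut)                # family of features correlated with feature j
      if (sum(abs(w[to_drop[fam]])) > w_bar) {       # the family weight is not negligible
        keep    <- to_drop[fam][which.max(abs(w[to_drop[fam]]))]
        rescued <- union(rescued, keep)              # rescue one representative per family
      }
    }
  }
  list(rescued = rescued, discarded = setdiff(to_drop, rescued))
}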
14. Methodology
- Implemented within the BioDCV system (50 replicates)
- Realized through R / C code interaction
15. Synthetic Data
(Plot: feature ranks across the elimination steps for genes 1-1000.)
Gene 100 is consistently ranked as 2nd.
16. Work in Progress
- Preservation of highly correlated genes with low initial weights on microarray datasets
- Robust correlation measures
- Different techniques to detect the families F (clustering, gene functions)
17. (Figure-only slide)
18. Synthetic Data
19. Synthetic Data
Features discarded at step 9 by the E-RFE procedure:
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100
227 559 864 470 363 735
The correlation correction saves feature 100.
20. Challenges
Challenges for predictive profiling
- INFRASTRUCTURE
  - MPA Cluster -> available for batch jobs
  - Connecting with IFOM -> 2005
  - Running at IFOM -> 2005/2006
  - Production on GRID resources (spring 2005)
- ALGORITHMS II
  - Gene list fusion: suite of algebraic/statistical methods
  - Prediction over multi-platform gene expression datasets (sarcoma, breast cancer): large scale semi-supervised analysis
  - New SVM kernels for prediction on spectrometry data within complete validation
21. Prefiltering is risky: you might discard features that turn out to be important.
Nevertheless, wrapper methods are quite costly. Moreover, with gene expression data you also have to deal with particular situations, like clones or highly correlated features, that may represent a pitfall for several selection methods.
A classic alternative is to map the data into linear combinations of features, and then select:
- Principal Component Analysis
- Metagenes (a simplified model for pathways, but biological suggestions require caution)
- eigen-craters for unexploded bomb risk maps
But then we are no longer working with the original features.
22. (Figure-only slide)
23. A few issues in feature selection, with a particular interest in the classification of genomic data
WHY?
- To enhance information
- To ease the computational burden
- Discard the (apparently) less significant features and train in a simplified space: alleviate the curse of dimensionality
- Highlight (and rank) the most important features and improve the knowledge of the underlying process
HOW?
- As a pre-processing step
- As a learning step
- Link the feature ranking to the classification task: wrapper methods
- Employ a statistical filter (t-test, S2N)
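As an example of the filter route, a minimal R sketch of signal-to-noise (S2N) ranking in the usual Golub-style form; the exact statistic and its use here are assumptions, not the slides' specification.

# Signal-to-noise ratio per feature: |mean1 - mean2| / (sd1 + sd2).
s2n_rank <- function(x, y) {
  cls <- levels(factor(y))
  s2n <- apply(x, 2, function(g) {
    m <- tapply(g, y, mean)
    s <- tapply(g, y, sd)
    abs(m[cls[1]] - m[cls[2]]) / (s[cls[1]] + s[cls[2]])
  })
  order(s2n, decreasing = TRUE)   # feature indices, most discriminant first
}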
25. Feature Selection within Complete Validation Experimental Setups
- Complete validation is needed to decouple model tuning from (ensemble) model accuracy estimation; otherwise selection bias effects arise (see the skeleton below).
- Accumulating relative importance from Random Forest models for the identification of sensory drivers (with P. Granitto, IASMA)
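A skeleton of the complete-validation loop in R; the split proportion and the fit_and_select interface are assumptions, since in BioDCV the replicates and resampling scheme are handled by the system.

complete_validation <- function(x, y, B = 50, fit_and_select) {
  acc <- numeric(B)
  for (b in seq_len(B)) {
    tr  <- sample(seq_len(nrow(x)), size = round(0.75 * nrow(x)))   # resampled training set
    fit <- fit_and_select(x[tr, , drop = FALSE], y[tr])             # selection + tuning on train only
    acc[b] <- mean(predict(fit, x[-tr, , drop = FALSE]) == y[-tr])  # honest accuracy on held-out data
  }
  acc   # distribution of accuracy estimates over the B replicates
}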