Title: Classification and Feature Selection Algorithms for Multi-class CGH data
1. Classification and Feature Selection Algorithms for Multi-class CGH data
- Jun Liu, Sanjay Ranka, Tamer Kahveci
- http://www.cise.ufl.edu
2. Gene copy number
- The number of copies of a gene can vary from person to person.
- About 0.4% of gene copy numbers differ between pairs of people.
- Variations in copy number can alter resistance to disease.
- EGFR copy number can be higher than normal in non-small cell lung cancer.
[Figure: lung images (ALA), cancer vs. healthy]
3. Comparative Genomic Hybridization (CGH)
4. Raw and smoothed CGH data
5. Example CGH dataset
- 862 genomic intervals in the Progenetix database
6. Problem description
- Given a new sample, which class does it belong to?
- Which features should we use to make this decision?
7. Outline
- Support Vector Machine (SVM)
- SVM for CGH data
- Maximum Influence Feature Selection algorithm
- Results
8. SVM in a nutshell
9. Classification with SVM
- Consider a two-class, linearly separable classification problem.
- Many decision boundaries are possible!
- The decision boundary should be as far away from the data of both classes as possible.
- We should maximize the margin, m.
[Figure: two classes separated by a hyperplane with margin m]
10. SVM Formulation
- Let $x_1, \ldots, x_n$ be our data set and let $y_i \in \{1, -1\}$ be the class label of $x_i$.
- Maximize $J$ over the $\alpha_i$:
  $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$
  subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$.
- The decision boundary can be constructed as
  $f(x) = \mathrm{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i \, x_i^T x + b \right)$.
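For concreteness, a minimal sketch of this formulation in practice, using scikit-learn's SVC solver on toy data (an illustrative assumption; the slides do not name an implementation):

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_)               # indices of support vectors (alpha_i > 0)
print(clf.dual_coef_)             # alpha_i * y_i for the support vectors
print(clf.coef_, clf.intercept_)  # w and b of the decision boundary
print(clf.predict([[2.0, 2.5]]))  # -> [1]
```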
11. SVM for CGH data
12. Pairwise similarity measures
- Raw measure: count the number of genomic intervals at which both samples have a gain (or both have a loss).
[Example figure: two samples with Raw(X, Y) = 3]
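A minimal sketch of the Raw measure, assuming each sample is a vector over {1, 0, -1} for gain/no-change/loss; the example vectors are borrowed from the proof slide later in the deck:

```python
def raw_similarity(x, y):
    """Count intervals where both samples show a gain (1) or both a loss (-1)."""
    return sum(1 for a, b in zip(x, y) if a == b and a != 0)

X = [0, 1, 1, 0, 1, -1]   # example vectors from the proof slide
Y = [0, 1, 0, -1, -1, -1]
print(raw_similarity(X, Y))  # -> 2 (one shared gain, one shared loss)
```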
13. SVM based on Raw kernel
- Using SVM with the Raw kernel amounts to solving the following quadratic program. Maximize $J$ over the $\alpha_i$:
  $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, \mathrm{Raw}(x_i, x_j)$
  (the Raw kernel replaces the inner product $x_i^T x_j$).
- The resulting decision function is
  $f(x) = \mathrm{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i \, \mathrm{Raw}(x_i, x) + b \right)$
  (again, the Raw kernel replaces $x_i^T x$).
- Is this cool?
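Because Raw enters only through the kernel, it can be plugged into an off-the-shelf SVM as a precomputed Gram matrix. A sketch using scikit-learn's SVC(kernel="precomputed") (an illustrative choice, not the authors' implementation):

```python
import numpy as np
from sklearn.svm import SVC

def raw_kernel(A, B):
    """Gram matrix K[i, j] = Raw(A[i], B[j]) for samples over {1, 0, -1}."""
    gains = (A == 1).astype(float) @ (B == 1).astype(float).T
    losses = (A == -1).astype(float) @ (B == -1).astype(float).T
    return gains + losses

X_train = np.array([[1, 1, 0, -1], [0, 1, 1, 0], [-1, -1, 0, 1], [-1, 0, -1, 1]])
y_train = np.array([0, 0, 1, 1])

clf = SVC(kernel="precomputed").fit(raw_kernel(X_train, X_train), y_train)

# At prediction time, the kernel is evaluated against the training samples.
X_test = np.array([[1, 1, 0, 0]])
print(clf.predict(raw_kernel(X_test, X_train)))  # -> [0]
```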
14. Is Raw kernel valid?
- Not every similarity function can serve as a kernel. A valid kernel requires the underlying kernel matrix M to be positive semi-definite.
- M is positive semi-definite if $v^T M v \ge 0$ for all vectors v.
15. Is Raw kernel valid?
- Proof: define a mapping $F: a \in \{1, 0, -1\}^m \to b \in \{0, 1\}^{2m}$ that encodes each interval's status with two bits:
  - F(gain) = F(1) = 01
  - F(no-change) = F(0) = 00
  - F(loss) = F(-1) = 10
- Then $\mathrm{Raw}(X, Y) = F(X)^T F(Y)$.
- Example: X = (0, 1, 1, 0, 1, -1) and Y = (0, 1, 0, -1, -1, -1) give
  Raw(X, Y) = 2 and $F(X)^T F(Y) = 2$.
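A quick sketch that checks the identity $\mathrm{Raw}(X, Y) = F(X)^T F(Y)$ on the example above:

```python
ENCODE = {1: (0, 1), 0: (0, 0), -1: (1, 0)}  # gain -> 01, no-change -> 00, loss -> 10

def F(x):
    """Expand a {1, 0, -1} vector of length m into a {0, 1} vector of length 2m."""
    return [bit for status in x for bit in ENCODE[status]]

X = [0, 1, 1, 0, 1, -1]
Y = [0, 1, 0, -1, -1, -1]
print(sum(a * b for a, b in zip(F(X), F(Y))))  # -> 2, matching Raw(X, Y)
```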
16. Raw kernel is valid!
- The Raw kernel can be written as $\mathrm{Raw}(X, Y) = F(X)^T F(Y)$.
- Define the 2m-by-n matrix $P = [F(x_1), \ldots, F(x_n)]$ whose columns are the mapped samples.
- Let M denote the kernel matrix of Raw. Therefore $M = P^T P$, and for any vector v, $v^T M v = v^T P^T P v = \|Pv\|^2 \ge 0$, so M is positive semi-definite.
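A small numerical sanity check of this argument: build P for a few random CGH vectors and confirm that $M = P^T P$ is positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(5, 8))  # 5 samples over 8 intervals, values in {-1, 0, 1}

# Columns of P are the F(x_i). Stacking all loss bits above all gain bits
# only reorders the coordinates of F, which leaves inner products unchanged.
P = np.vstack([(X == -1).T, (X == 1).T]).astype(float)  # shape (2m, n)
M = P.T @ P                                             # Raw kernel matrix

print(np.allclose(M, M.T))                   # True: M is symmetric
print(np.linalg.eigvalsh(M).min() >= -1e-9)  # True: all eigenvalues >= 0
```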
17. MIFS algorithm
18. MIFS for multi-class data
- One-versus-all SVM (see the sketch below).
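A minimal sketch of the one-versus-all decomposition that MIFS builds on (not the authors' MIFS implementation); MIFS then ranks features by their influence on each of these binary classifiers and combines the per-class rankings, per the selection rule given in the paper:

```python
import numpy as np
from sklearn.svm import SVC

def one_versus_all_svms(X, y):
    """Train one binary SVM per class, separating that class from the rest."""
    models = {}
    for c in np.unique(y):
        y_bin = np.where(y == c, 1, -1)  # current class vs. all others
        models[c] = SVC(kernel="linear").fit(X, y_bin)
    return models
```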
19. Results
20. Dataset details
- Data taken from the Progenetix database.
21. Datasets
Dataset size (number of samples) at each similarity level:

Cancers | Best | Good | Fair | Poor
2       | 478  | 466  | 351  | 373
4       | 1160 | 790  | 800  | 800
6       | 1100 | 850  | 880  | 810
8       | 1000 | 830  | 750  | 760
22. Experimental results
- Comparison of linear and Raw kernels: on average, the Raw kernel improves predictive accuracy by 6.4% over sixteen datasets compared to the linear kernel.
23. Experimental results
[Figure: classification accuracy vs. number of features, including comparisons to Fu and Fu-Liu (2005) and Ding and Peng (2005)]
- Using 80 features results in accuracy that is comparable to or better than using all features.
- Using 40 features results in accuracy that is comparable to using all features.
24. Using MIFS for feature selection
- Results testing the hypothesis that 40 features are enough and 80 features are better.
25. A Web Server for Mining CGH Data
- http://cghmine.cise.ufl.edu:8007/CGH/Default.html
26. Thank you
27. Appendix
28. Minimum Redundancy and Maximum Relevance (MRMR)
- Relevance V is defined as the average mutual information between features and class labels.
- Redundancy W is defined as the average mutual information between all pairs of features.
- Incrementally select features by maximizing (V / W) or (V - W), as formalized below.
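In symbols, following Ding and Peng (2005), for a selected feature set S and class variable c:

```latex
V = \frac{1}{|S|} \sum_{i \in S} I(x_i;\, c)
\qquad
W = \frac{1}{|S|^2} \sum_{i,\, j \in S} I(x_i;\, x_j)
```

Features are then added one at a time, choosing the candidate that maximizes V - W (difference form) or V / W (quotient form).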
29. Support Vector Machine Recursive Feature Elimination (SVM-RFE)
1. Train a linear SVM based on the current feature set.
2. Compute the weight vector w.
3. Compute the ranking coefficient $w_i^2$ for the i-th feature.
4. Remove the feature with the smallest ranking coefficient.
5. If the feature set is not empty, return to step 1.
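A compact sketch of this loop for a two-class problem, assuming scikit-learn's linear SVC as the underlying classifier (any linear SVM trainer would do):

```python
import numpy as np
from sklearn.svm import SVC

def svm_rfe(X, y):
    """Return feature indices in elimination order (least important first)."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        svm = SVC(kernel="linear").fit(X[:, remaining], y)
        w = svm.coef_[0]                # weight vector of the linear SVM
        worst = int(np.argmin(w ** 2))  # smallest ranking coefficient w_i^2
        eliminated.append(remaining.pop(worst))
    return eliminated
```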
30. Pairwise similarity measures
- Sim measure: a segment is a contiguous block of aberrations of the same type. Count the number of overlapping segment pairs (see the sketch below).
[Example figure: two samples with Sim(X, Y) = 2]
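A minimal sketch of the Sim measure under the same {1, 0, -1} encoding; on the example vectors from the proof slide it returns 2, matching the slide:

```python
def segments(x):
    """Return (start, end, type) for maximal runs of the same aberration type."""
    runs, start = [], None
    for i, v in enumerate(x + [0]):  # sentinel 0 closes the last run
        if start is not None and v != x[start]:
            runs.append((start, i - 1, x[start]))
            start = None
        if start is None and v != 0:
            start = i
    return runs

def sim(x, y):
    """Count pairs of segments (one per sample) of the same type that overlap."""
    return sum(1 for (a, b, t) in segments(x)
                 for (c, d, u) in segments(y)
                 if t == u and a <= d and c <= b)

print(sim([0, 1, 1, 0, 1, -1], [0, 1, 0, -1, -1, -1]))  # -> 2
```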
31. Non-linear decision boundary
- How do we generalize SVM when the two-class classification problem is not linearly separable?
- Key idea: transform $x_i$ to a higher-dimensional space to make life easier.
  - Input space: the space where the points $x_i$ are located.
  - Feature space: the space of $\phi(x_i)$ after transformation.
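A short illustration of why the transformation helps, using the classic XOR pattern (toy data chosen for illustration): a linear SVM cannot separate it in input space, while an RBF kernel, which implicitly maps to a higher-dimensional feature space, can:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])  # XOR labels: not linearly separable

print(SVC(kernel="linear").fit(X, y).score(X, y))          # < 1.0: no linear split exists
print(SVC(kernel="rbf", gamma=2.0).fit(X, y).score(X, y))  # 1.0: separable in feature space
```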