Classification and Feature Selection Algorithms for Multi-class CGH data


1
Classification and Feature Selection Algorithms
for Multi-class CGH data
  • Jun Liu, Sanjay Ranka, Tamer Kahveci
  • http://www.cise.ufl.edu

2
Gene copy number
  • The number of copies of genes can vary from
    person to person.
  • 0.4% of gene copy numbers differ between pairs
    of people.
  • Variations in copy numbers can alter resistance
    to disease.
  • EGFR copy number can be higher than normal in
    non-small cell lung cancer.

[Figure: lung images (ALA), cancer vs. healthy]
3
Comparative Genomic Hybridization (CGH)
4
Raw and smoothed CGH data
5
Example CGH dataset
862 genomic intervals in the Progenetix database
6
Problem description
  • Given a new sample, which class does this sample
    belong to?
  • Which features should we use to make this
    decision?

7
Outline
  • Support Vector Machine (SVM)
  • SVM for CGH data
  • Maximum Influence Feature Selection algorithm
  • Results

8
SVM in a nutshell
9
Classification with SVM
  • Consider a two-class, linearly separable
    classification problem
  • Many decision boundaries!
  • The decision boundary should be as far away from
    the data of both classes as possible
  • We should maximize the margin, m

[Figure: Class 1 and Class 2 separated by a maximum-margin boundary; margin m]
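The margin can be made concrete in a few lines of code. A minimal sketch using scikit-learn (an assumed tool; the slides name no library), reading the margin m = 2/||w|| off a trained linear SVM:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable two-class problem.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class 1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class 2
y = np.array([1, 1, 1, -1, -1, -1])

# A large C approximates the hard-margin SVM of the slide.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
print("margin m =", 2.0 / np.linalg.norm(w))  # the quantity being maximized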
10
SVM Formulation
  • Let x1, ..., xn be our data set and let yi ∈ {1, −1}
    be the class label of xi.
  • Maximize J over the αi (the standard SVM dual):
    J(α) = Σi αi − (1/2) Σi,j αi αj yi yj xiT xj,
    subject to αi ≥ 0 and Σi αi yi = 0.
  • The decision boundary can be constructed as
    f(x) = sign(Σi αi yi xiT x + b).

11
SVM for CGH data
Support Vector Machine (SVM) SVM for CGH
data Maximum Influence Feature Selection
algorithm Results
12
Pairwise similarity measures
  • Raw measure: count the number of genomic intervals
    at which both samples have a gain (or both have a
    loss); a code sketch follows.

[Figure: example pair of CGH samples with Raw = 3]
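As a sketch, the Raw measure is a one-liner in Python (raw_similarity is a hypothetical name; status vectors are assumed coded as gain = 1, no-change = 0, loss = −1):

def raw_similarity(x, y):
    """Count genomic intervals where both samples show the same
    aberration: both gain (1) or both loss (-1)."""
    return sum(1 for a, b in zip(x, y) if a == b and a != 0)

# The pair used on the proof slide further below gives Raw = 2.
X = [0, 1, 1, 0, 1, -1]
Y = [0, 1, 0, -1, -1, -1]
print(raw_similarity(X, Y))  # 2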
13
SVM based on Raw kernel
  • Using SVM with the Raw kernel amounts to solving
    the same quadratic program: maximize J over the αi,
    with Raw(xi, xj) replacing the inner product xiT xj.
  • The resulting decision function likewise uses
    Raw(xi, x) in place of xiT x (see the sketch below).
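A hedged sketch of how the Raw kernel could be plugged into an off-the-shelf SVM through scikit-learn's precomputed-kernel interface (an assumed implementation route; raw_kernel_matrix is a hypothetical helper):

import numpy as np
from sklearn.svm import SVC

def raw_kernel_matrix(A, B):
    """Gram matrix of the Raw kernel: entry (i, j) counts the intervals
    where samples A[i] and B[j] share the same nonzero status."""
    A, B = np.asarray(A), np.asarray(B)
    return ((A == 1).astype(float) @ (B == 1).T.astype(float) +
            (A == -1).astype(float) @ (B == -1).T.astype(float))

X_train = np.array([[0, 1, 1, 0, 1, -1],
                    [0, 1, 0, -1, -1, -1],
                    [1, -1, 0, 0, 1, 0],
                    [-1, 0, -1, -1, 0, 0]])
y_train = np.array([1, 1, -1, -1])

clf = SVC(kernel="precomputed")
clf.fit(raw_kernel_matrix(X_train, X_train), y_train)

X_test = np.array([[0, 1, 1, 0, 0, -1]])
print(clf.predict(raw_kernel_matrix(X_test, X_train)))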
14
Is Raw kernel valid?
  • Not every similarity function can serve as a kernel:
    the underlying kernel matrix M must be positive
    semi-definite.
  • M is positive semi-definite if vT M v ≥ 0 for all
    vectors v.

15
Is Raw kernel valid?
  • Proof: define a mapping F: {1, 0, −1}m → {0, 1}2m
    that encodes each interval's status with two bits:
  • F(gain) = F(1) = 01
  • F(no-change) = F(0) = 00
  • F(loss) = F(−1) = 10
  • Then Raw(X, Y) = F(X)T F(Y).

Example: X = (0, 1, 1, 0, 1, −1), Y = (0, 1, 0, −1, −1, −1)

Raw(X, Y) = 2
F(X)T F(Y) = 2
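The identity is easy to verify numerically; a small sketch mirroring the encoding above:

import numpy as np

# Two-bit encoding per interval: gain -> 01, no-change -> 00, loss -> 10.
BITS = {1: (0, 1), 0: (0, 0), -1: (1, 0)}

def F(x):
    return np.array([bit for status in x for bit in BITS[status]])

X = [0, 1, 1, 0, 1, -1]
Y = [0, 1, 0, -1, -1, -1]
raw = sum(1 for a, b in zip(X, Y) if a == b and a != 0)
print(raw, F(X) @ F(Y))  # 2 2 -- Raw(X, Y) = F(X)^T F(Y)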
16
Raw Kernel is valid!
  • The Raw kernel can be written as Raw(X, Y) = F(X)T F(Y).
  • Define the 2m by n matrix N = [F(x1), ..., F(xn)].
  • Let M denote the kernel matrix of Raw. Then M = NT N,
    so for any vector v, vT M v = vT NT N v = ||Nv||2 ≥ 0.
  • Therefore M is positive semi-definite and Raw is a
    valid kernel.
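The same argument can be spot-checked numerically: stack the F(xi) as columns of N, form M = NT N, and confirm the eigenvalues are nonnegative (a sketch using the encoding from the previous slide):

import numpy as np

BITS = {1: (0, 1), 0: (0, 0), -1: (1, 0)}
F = lambda x: np.array([bit for status in x for bit in BITS[status]])

samples = [[0, 1, 1, 0, 1, -1],
           [0, 1, 0, -1, -1, -1],
           [1, -1, 0, 0, 1, 0]]

N = np.column_stack([F(x) for x in samples])   # 2m x n
M = N.T @ N                                    # kernel matrix of Raw
print(np.all(np.linalg.eigvalsh(M) >= -1e-9))  # True: M is PSD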
17
MIFS algorithm
18
MIFS for multi-class data
One-versus-all SVM
19
Results
20
Dataset Details
Data taken from the Progenetix database.
21
Datasets
Dataset sizes, by similarity level:

  cancers | best | good | fair | poor
  --------+------+------+------+------
     2    |  478 |  466 |  351 |  373
     4    | 1160 |  790 |  800 |  800
     6    | 1100 |  850 |  880 |  810
     8    | 1000 |  830 |  750 |  760
22
Experimental results
  • Comparison of linear and Raw kernel

On average, the Raw kernel improves predictive
accuracy by 6.4% over sixteen datasets compared
to the linear kernel.
23
Experimental results
[Plot: accuracy vs. number of features, compared against
the methods of Fu and Fu-Liu (2005) and Ding and Peng (2005)]

  • Using 80 features gives accuracy comparable to or
    better than using all features.
  • Using 40 features gives accuracy comparable to
    using all features.
24
Using MIFS for feature selection
  • Results testing the hypothesis that 40 features are
    sufficient and 80 features perform better.

25
A Web Server for Mining CGH Data
http://cghmine.cise.ufl.edu:8007/CGH/Default.html
26
Thank you
27
Appendix
28
Minimum Redundancy and Maximum Relevance (MRMR)
  • Relevance V is defined as the average mutual
    information between features and class labels.
  • Redundancy W is defined as the average mutual
    information between all pairs of features.
  • Incrementally select features by maximizing (V / W)
    or (V − W), as sketched below.

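A compact sketch of this greedy selection for discrete CGH features, using the V − W criterion (mrmr_select is a hypothetical helper; mutual_info_score is scikit-learn's discrete mutual information):

import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, k):
    """Greedily pick k features, at each step maximizing
    relevance V minus average redundancy W."""
    n_features = X.shape[1]
    relevance = np.array([mutual_info_score(X[:, j], y)
                          for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # seed with the most relevant
    while len(selected) < k:
        best, best_score = -1, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s])
                                  for s in selected])
            score = relevance[j] - redundancy  # the V - W criterion
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Example on random ternary CGH-like data (hypothetical).
rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(100, 20))
y = (X[:, 3] > 0).astype(int)  # class driven by feature 3
print(mrmr_select(X, y, 5))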
29
Support Vector Machine Recursive Feature
Elimination (SVM-RFE)
  1. Train a linear SVM on the current feature set.
  2. Compute the weight vector w.
  3. Compute the ranking coefficient wi2 for the ith feature.
  4. Remove the feature with the smallest ranking coefficient.
  5. If the feature set is not empty, go back to step 1.
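The loop maps directly to a few lines of Python; a sketch for binary labels with scikit-learn's LinearSVC (an assumed choice of linear SVM):

import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe_ranking(X, y):
    """SVM-RFE for binary y: repeatedly train a linear SVM and drop the
    feature with the smallest w_i^2. Returns feature indices in
    elimination order (least important first)."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:                          # "Is the feature set empty?"
        w = LinearSVC().fit(X[:, remaining], y).coef_[0]
        worst = int(np.argmin(w ** 2))        # ranking coefficient w_i^2
        eliminated.append(remaining.pop(worst))
    return eliminated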
30
Pairwise similarity measures
  • Sim measure: a segment is a contiguous block of
    aberrations of the same type; count the number of
    overlapping segment pairs (see the sketch below).

[Figure: example pair of CGH samples with Sim = 2]
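A sketch of the Sim measure, assuming a pair of segments counts when the two segments have the same aberration type and their interval ranges overlap (function names are hypothetical):

def segments(x):
    """Maximal runs of identical nonzero status, returned as
    (start, end, type) triples with end exclusive."""
    segs, i = [], 0
    while i < len(x):
        if x[i] == 0:
            i += 1
            continue
        j = i
        while j < len(x) and x[j] == x[i]:
            j += 1
        segs.append((i, j, x[i]))
        i = j
    return segs

def sim(x, y):
    """Count overlapping same-type segment pairs between two samples."""
    return sum(1
               for (s1, e1, t1) in segments(x)
               for (s2, e2, t2) in segments(y)
               if t1 == t2 and s1 < e2 and s2 < e1)

X = [0, 1, 1, 0, 1, -1]
Y = [0, 1, 0, -1, -1, -1]
print(sim(X, Y))  # 2 for this pair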
31
Non-linear Decision Boundary
  • How can SVM be generalized when the two-class
    classification problem is not linearly separable?
  • Key idea: transform xi to a higher-dimensional space
    to make the problem easier.
  • Input space: the space where the points xi are located.
  • Feature space: the space of φ(xi) after the
    transformation.
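A brief sketch of the idea with scikit-learn: XOR-style data admits no separating hyperplane in the input space, but becomes separable in the feature space induced by a non-linear kernel (the RBF kernel here is a stand-in; the slide fixes no particular transformation):

import numpy as np
from sklearn.svm import SVC

# XOR-style data: not linearly separable in the input space.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([1, 1, -1, -1])

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # implicit map to a feature space

print(linear.score(X, y))  # < 1.0: no separating hyperplane exists
print(rbf.score(X, y))     # 1.0: separable after the transformation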