Knowledge-based Analysis of Microarray Gene Expression Data using Support Vector Machines (PPT Transcript)


1
Knowledge-based Analysis of Microarray Gene
Expression Data using Support Vector Machines
  • Michael P. S. Brown, William Noble Grundy, David
    Lin, Nello Cristianini, Charles Sugnet, Terrence
    S. Furey, Manuel Ares, Jr., and David Haussler

Proceedings of the National Academy of Sciences, 2000
2
Overview
  • Objective: Classify genes based on their function
  • Observation: Genes of similar function yield
    similar expression patterns in microarray
    hybridization experiments
  • Method: Use SVMs to build classifiers from
    microarray gene expression data

3
Previous Methods
  • Most methods at the time of publication employed
    unsupervised learning
  • Genes are grouped by clustering algorithms based
    on a distance measure, e.g.
  • Hierarchical clustering
  • Self-organizing maps

4
Advantages of Supervised Learning and SVMs
  • Supervised methods can take advantage of prior
    knowledge
  • SVMs are well suited to extremely
    high-dimensional feature spaces

5
DNA Microarray Data
  • Each data point is the ratio of the expression
    levels of a particular gene under an experimental
    condition and under a reference condition
  • n genes on a single chip
  • m experiments performed
  • The result is an n-by-m matrix of
    expression-level ratios

[Figure: the n-by-m data matrix; rows are the n genes, columns are the m experiments, and each row is the m-element expression vector of a single gene]
6
DNA Microarray Data
  • Normalized logarithmic ratio
  • For gene X in experiment i, define (see the
    sketch below)
  • Xi = log(Ei / Ri)
  • Ei is the expression level in the experimental
    condition
  • Ri is the expression level in the reference state
  • The expression vector X = (X1, X2, ..., Xm) is
    scaled to unit length
  • Xi is positive when the gene is induced (turned
    up)
  • Xi is negative when the gene is repressed (turned
    down)
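
A minimal sketch of this preprocessing in NumPy. The function name and the toy expression values below are illustrative, not from the paper; the unit-length scaling follows the normalization described above.

```python
import numpy as np

def normalized_log_ratios(E, R):
    """Return one unit-length vector of log expression ratios per gene."""
    X = np.log(E / R)   # positive entries: induced; negative: repressed
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Toy data: 3 genes (rows) measured in 4 experiments (columns).
E = np.array([[2.0, 4.0, 1.0, 8.0],
              [1.0, 0.5, 0.25, 1.0],
              [3.0, 3.0, 3.0, 3.0]])
R = np.ones_like(E)     # expression levels in the reference state
X = normalized_log_ratios(E, R)
print(X.shape)          # (3, 4): a small n-by-m matrix
```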

7
Support Vector Machines
  • An SVM searches for a hyperplane that
  • Maximizes the margin
  • Minimizes violations of the margin

Edda Leopold and Jörg Kindermann
8
Linear Inseparability
  • What if data points are not linearly separable?

Andrew W. Moore
9
Linear Inseparability
  • Map the data to higher-dimension space

Andrew W. Moore
10
Linear Inseparability
  • Problems with mapping data to a
    higher-dimensional space
  • Overfitting
  • SVMs choose the maximum-margin hyperplane, which
    resists overfitting
  • High computational cost
  • SVM kernels involve only dot products between
    points, so the mapping is never computed
    explicitly (cheap!); see the check below
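
A small numeric check of the last point, using the degree-2 polynomial kernel from the next slide: for 2-D points, (X · Y + 1)^2 equals an ordinary dot product in a 6-D mapped space, yet that 6-D map never has to be built when the kernel is used. The vectors are arbitrary examples.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector."""
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

kernel = (x @ y + 1) ** 2    # computed in the original 2-D space
explicit = phi(x) @ phi(y)   # computed in the 6-D mapped space
print(kernel, explicit)      # both print 4.0
```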

11
SVM Kernels
  • K(X, Y) is a function that computes a measure of
    similarity between X and Y (the three kernels
    used here are implemented below)
  • Dot product
  • K(X, Y) = X · Y
  • Simplest kernel; gives a linear hyperplane
  • Degree-d polynomial
  • K(X, Y) = (X · Y + 1)^d
  • Gaussian
  • K(X, Y) = exp(-||X - Y||^2 / (2σ^2))
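
The three kernels written out as plain NumPy functions; this is a sketch, and the default parameter values are arbitrary choices, not the ones tuned in the paper.

```python
import numpy as np

def linear_kernel(x, y):
    return x @ y

def polynomial_kernel(x, y, d=3):
    return (x @ y + 1) ** d

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

x, y = np.array([1.0, 0.0]), np.array([0.5, 0.5])
print(linear_kernel(x, y), polynomial_kernel(x, y, d=2), gaussian_kernel(x, y))
```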

12
Experimental Dataset
  • Expression data from budding yeast
    (Saccharomyces cerevisiae)
  • 2467 genes (n)
  • 79 experiments (m)
  • Dataset available on the Stanford web site
  • Six functional classes
  • From the Munich Information Centre for Protein
    Sequences (MIPS) Yeast Genome Database
  • Class definitions come from biochemical and
    genetic studies
  • Training data
  • Positive labels: the set of genes that share a
    common function
  • Negative labels: genes known not to be members
    of that functional class

13
Experimental Design
  • Compare the performance of the following methods
    (a reproduction sketch follows the list)
  • SVM (degree-1 polynomial kernel, i.e. linear)
  • SVM (degree-2 polynomial kernel)
  • SVM (degree-3 polynomial kernel)
  • SVM (Gaussian kernel)
  • Parzen windows
  • Fisher's linear discriminant
  • C4.5 decision trees
  • MOC1 decision trees
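
A hypothetical reproduction sketch of the four SVM arms of this comparison using scikit-learn, which is not the software used in the 2000 paper. The data here are random stand-ins; in the real study X would be the 2467-by-79 matrix of normalized log ratios and y the class-membership labels. Three-fold cross-validation matches the paper's setup.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 79))              # stand-in expression matrix
y = (X[:, :5].sum(axis=1) > 0).astype(int)  # toy functional-class labels

classifiers = {
    "SVM, linear kernel": SVC(kernel="linear"),
    "SVM, degree-2 polynomial": SVC(kernel="poly", degree=2),
    "SVM, degree-3 polynomial": SVC(kernel="poly", degree=3),
    "SVM, Gaussian kernel": SVC(kernel="rbf"),
}
for name, clf in classifiers.items():
    pred = cross_val_predict(clf, X, y, cv=3)   # 3-fold cross-validation
    fp = int(((pred == 1) & (y == 0)).sum())
    fn = int(((pred == 0) & (y == 1)).sum())
    print(f"{name}: fp={fp}, fn={fn}")
```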

14
Experimental Design
  • Define the cost of method M as
  • C(M) = fp(M) + 2·fn(M)
  • False negatives are weighted more heavily because
    the number of true negatives is much larger
  • The cost of each method is compared to
  • C(N), the cost of classifying every gene as
    negative
  • The cost saving of method M is
  • S(M) = C(N) - C(M), sketched in code below
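
The cost and savings measures as a short Python sketch. Classifying everything as negative produces no false positives and turns every class member into a false negative, so C(N) = 2 × (number of positives). The fp/fn counts in the example are made up.

```python
def cost(fp, fn):
    """C(M) = fp(M) + 2*fn(M): false negatives are weighted double."""
    return fp + 2 * fn

def savings(fp, fn, n_positives):
    """S(M) = C(N) - C(M), where C(N) = cost of calling everything negative."""
    return cost(0, n_positives) - cost(fp, fn)

# Hypothetical class with 30 member genes; a method that makes
# 8 false positives and 5 false negatives.
print(cost(8, 5))         # 18
print(savings(8, 5, 30))  # 60 - 18 = 42
```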

15
Experimental Results
  • SVMs outperform the other methods
  • All classifiers fail to recognize the
    helix-turn-helix (HTH) protein class
  • This is expected
  • Members of this class are not similarly
    regulated

16
Consistently Misclassified Genes
  • 20 genes are consistently misclassified by the
    four SVM kernels across different experiments
  • This reflects a mismatch between the expression
    data and class definitions based on protein
    structure
  • Many of the false positives are known to be
    important for the functional class (even though
    they are not annotated as members of the class)