1
Biology-Driven Clustering of Microarray Data

Applications to the NCI60 Data Set

K.R. Coombes, K.A. Baggerly, D.N. Stivers, J. Wang, D. Gold, H.G. Sung, and S.J. Lee
2
Introduction

Microarray data is more than a large, unstructured matrix.
We already know many genes that are important for studying cancer because of their involvement in specific biological processes.
We also know that reproducible chromosomal abnormalities play an important role in cancer.
We need analytical methods that use this biological information early in the analysis.

3
Methods

First, we updated the annotations of the genes on the microarray.
We then performed separate analyses:
  using genes on individual chromosomes
  using genes involved in different biological processes
We developed ways to assess how well each set of genes classified the samples.
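The per-gene-set strategy above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a genes-by-samples matrix of log ratios and a hypothetical `annotation` dictionary mapping each gene ID to its chromosome (or to its biological-process categories). Each resulting submatrix would then go through the same clustering and assessment steps.

```python
from collections import defaultdict

import numpy as np


def split_by_annotation(gene_ids, annotation):
    """Group row indices of the data matrix by annotation label.

    annotation: hypothetical dict mapping gene ID -> list of labels
    (chromosomes, or biological-process categories). A gene may fall in
    several sets; genes without an annotation are skipped.
    """
    groups = defaultdict(list)
    for row, gene in enumerate(gene_ids):
        for label in annotation.get(gene, []):
            groups[label].append(row)
    return dict(groups)


def gene_set_submatrices(data, gene_ids, annotation, min_genes=2):
    """Yield (label, submatrix) pairs, one per gene set with at least min_genes genes.

    data: genes x samples array of log ratios.
    """
    for label, rows in split_by_annotation(gene_ids, annotation).items():
        if len(rows) >= min_genes:
            yield label, data[rows, :]


# Toy example: 4 genes x 3 samples, made-up chromosome labels.
data = np.arange(12, dtype=float).reshape(4, 3)
gene_ids = ["g1", "g2", "g3", "g4"]
annotation = {"g1": ["chr1"], "g2": ["chr1"], "g3": ["chr2"], "g4": ["chr1"]}
for label, sub in gene_set_submatrices(data, gene_ids, annotation):
    print(label, sub.shape)  # prints: chr1 (3, 3); chr2 has one gene and is skipped
```

Letting a gene belong to several labels matters mainly for functional categories, where one gene is often annotated to more than one process.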

4
Quality of Annotations

Problem
I.M.A.G.E. clone IDs and GenBank accession numbers are archival and do not change.
UniGene clusters, gene names, descriptions, functions, etc., change between releases.
Solution
Download the latest UniGene (build 137) and LocusLink data to update the annotations.

5
How many genes on the array have good annotations?
We trust only the 7478 spots where the UniGene clusters match.
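The slide does not spell out which UniGene clusters are compared. One plausible reading, assumed here, is that each spot's I.M.A.G.E. clone ID and its GenBank accession are mapped independently to a UniGene cluster, and the spot is trusted only when both lookups succeed and agree. The two lookup dictionaries are hypothetical stand-ins for tables parsed from a UniGene build.

```python
def trusted_spots(spots, clone_to_cluster, accession_to_cluster):
    """Return IDs of spots whose clone ID and accession map to the same UniGene cluster.

    spots: iterable of (spot_id, clone_id, accession) tuples.
    clone_to_cluster, accession_to_cluster: hypothetical dicts built from a
    UniGene release, mapping identifiers to cluster IDs such as "Hs.12345".
    """
    trusted = []
    for spot_id, clone_id, accession in spots:
        via_clone = clone_to_cluster.get(clone_id)
        via_accession = accession_to_cluster.get(accession)
        # Require both mappings to exist and to name the same cluster.
        if via_clone is not None and via_clone == via_accession:
            trusted.append(spot_id)
    return trusted
```

A spot whose clone has been reassigned to a different cluster, or that is missing from either table, is dropped rather than guessed at.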
6
Where are the genes located?
7
How do we determine the functions of genes?

UniGene -> LocusLink -> Gene Ontology
Gene Ontology is a structured, hierarchical vocabulary that describes gene functions in three broad areas:
  biological process (why)
  molecular function (what)
  cellular component (where)
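The UniGene -> LocusLink -> Gene Ontology chain amounts to two dictionary lookups in sequence. The sketch below assumes hypothetical, already-parsed tables: one from UniGene cluster to LocusLink ID, and one from LocusLink ID to (GO term, aspect) pairs. It groups each cluster's GO terms by the three areas listed above.

```python
from collections import defaultdict


def go_terms_for_clusters(cluster_to_locus, locus_to_go):
    """Chain UniGene -> LocusLink -> GO lookups.

    cluster_to_locus: hypothetical dict, UniGene cluster ID -> LocusLink ID.
    locus_to_go: hypothetical dict, LocusLink ID -> iterable of (go_id, aspect)
        pairs, where aspect is "process", "function" or "component".
    Returns cluster ID -> {aspect: set of GO IDs}; clusters with no GO
    annotation are omitted.
    """
    result = {}
    for cluster, locus in cluster_to_locus.items():
        by_aspect = defaultdict(set)
        for go_id, aspect in locus_to_go.get(locus, ()):
            by_aspect[aspect].add(go_id)
        if by_aspect:
            result[cluster] = dict(by_aspect)
    return result
```

Because the ontology is hierarchical, a fuller version would also add each term's ancestors, so that a gene annotated to a specific term counts toward broader categories such as cell cycle.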

8
What kinds of genes are on the microarray?
9
Data Preprocessing

Remove spots with poor annotations and spots whose median intensity falls below the 97th percentile of the empty spots.
Normalize each array so that the median log ratio between channels is zero (that is, the median ratio is one).
Center each gene so that its mean log ratio across experiments is zero.
Use (1 - correlation)/2 as the distance metric.
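The preprocessing steps can be sketched with NumPy as below. This is an illustration under assumptions, not the authors' pipeline: a spot's "median intensity" is taken across arrays, the empty-spot threshold is pooled over all empty spots, array normalization shifts each array's median log ratio to zero (a median ratio of one), and distances are computed between samples (columns).

```python
import numpy as np


def filter_spots(intensity, empty_intensity, good_annotation, pct=97.0):
    """Boolean mask of spots to keep.

    intensity: spots x arrays matrix of raw intensities.
    empty_intensity: 1-D array of intensities measured on empty spots.
    good_annotation: boolean mask, one entry per spot.
    A spot is kept if it is well annotated and its median intensity across
    arrays is not below the pct-th percentile of the empty spots.
    """
    threshold = np.percentile(empty_intensity, pct)
    bright = np.median(intensity, axis=1) >= threshold
    return good_annotation & bright


def normalize_arrays(log_ratio):
    """Shift each array (column) so its median log ratio is zero."""
    return log_ratio - np.median(log_ratio, axis=0, keepdims=True)


def center_genes(log_ratio):
    """Shift each gene (row) so its mean log ratio across experiments is zero."""
    return log_ratio - log_ratio.mean(axis=1, keepdims=True)


def correlation_distance(log_ratio):
    """Pairwise (1 - r) / 2 distances between samples (columns).

    r is the Pearson correlation, so distances lie in [0, 1]:
    0 for perfectly correlated profiles, 1 for perfectly anti-correlated ones.
    """
    r = np.corrcoef(log_ratio, rowvar=False)
    return (1.0 - r) / 2.0
```

Dividing 1 - r by two simply rescales the usual correlation distance from [0, 2] to [0, 1]; it does not change the shape of any dendrogram built from it.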

10
How well does a set of genes distinguish types of cancer?

Three methods of assessment:
  Qualitative (PCA, MDS)
  Quantitative (PCA ANOVA)
  Semi-quantitative (grading dendrograms)
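The slides do not say which multidimensional scaling variant was used for the qualitative view. As an assumption, the sketch below implements classical (Torgerson) MDS, which embeds samples in k dimensions from a pairwise distance matrix such as the (1 - correlation)/2 matrix described under preprocessing.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) multidimensional scaling.

    dist: symmetric n x n matrix of pairwise distances.
    Returns an n x k matrix of coordinates whose Euclidean distances
    approximate `dist`.
    """
    n = dist.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # Take the top-k eigenpairs of the symmetric matrix B.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:k]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

Correlation-based distances are not guaranteed to be Euclidean, which is why negative eigenvalues are clipped; the 2-D coordinates are then plotted and colored by cancer type for visual inspection.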

11
Multidimensional Scaling
12
PCANOVA
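The slides name PCANOVA without defining it. Purely as an assumption, one way to turn principal components into a quantitative score is to project the samples onto their leading components and compute a one-way ANOVA F statistic for each component's scores across cancer types; a larger F means that component separates the types more strongly. The function `pcanova` below is a hypothetical sketch, not the authors' procedure.

```python
import numpy as np


def pca_scores(data, k):
    """Scores of the samples (columns of `data`) on the top-k principal components."""
    x = data.T - data.T.mean(axis=0)  # samples x genes, each gene centered
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :k] * s[:k]  # samples x k


def anova_f(values, groups):
    """One-way ANOVA F statistic of `values` grouped by the labels in `groups`."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand = values.mean()
    ss_between = sum(
        (groups == g).sum() * (values[groups == g].mean() - grand) ** 2
        for g in labels
    )
    ss_within = sum(
        ((values[groups == g] - values[groups == g].mean()) ** 2).sum()
        for g in labels
    )
    df_between = len(labels) - 1
    df_within = len(values) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)


def pcanova(data, cancer_types, k=3):
    """Hypothetical PCANOVA: one F statistic per leading principal component."""
    scores = pca_scores(data, k)
    return [anova_f(scores[:, i], cancer_types) for i in range(k)]
```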
13
How good is a dendrogram?

A  a cluster contains all samples of one kind of cancer, and only those
B  all, with extras
C  all except one
D  all except one, with extras
E  all except two
F  all except two, with extras
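The A to F scheme above can be written as a small lookup on two quantities: how many samples of the cancer type a cluster misses, and whether it contains samples of other types. In the sketch below, clusters are assumed to be sets of sample names (one per dendrogram node), and a whole tree is assumed to receive the best grade over all its clusters; both are illustrative assumptions, not details given in the slides.

```python
GRADES = {
    (0, False): "A", (0, True): "B",
    (1, False): "C", (1, True): "D",
    (2, False): "E", (2, True): "F",
}


def grade_cluster(cluster, labels, cancer):
    """Grade one cluster (a set of sample names) for one cancer type.

    labels: dict mapping sample name -> cancer type.
    Returns "A".."F", or None if the cluster holds none of that type or
    misses more than two of its samples.
    """
    members = {s for s, t in labels.items() if t == cancer}
    if not cluster & members:
        return None
    missing = len(members - cluster)
    extras = any(labels[s] != cancer for s in cluster)
    return GRADES.get((missing, extras))


def grade_dendrogram(clusters, labels, cancer):
    """Best (alphabetically lowest) grade over all clusters of a dendrogram."""
    grades = [grade_cluster(c, labels, cancer) for c in clusters]
    grades = [g for g in grades if g is not None]
    return min(grades) if grades else None
```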

14
Can cancers be distinguished by genes on one chromosome?
15
Heterogeneity of different types of cancer

Some cancers (colon, leukemia) are fairly easy to distinguish from the others.
Some (breast, lung) are so heterogeneous as to be almost impossible to distinguish.
Genes on some chromosomes (1, 2, 6, 7, 9, 12, 17) can distinguish many cancers.
Genes on others (16, 21) give essentially random results.

16
(No Transcript)
17
(No Transcript)
18
Can cancers be distinguished by genes of one function?

The table for functional categories looks much like the table for chromosomes.
Some biological process categories (signal transduction, cell proliferation, cell cycle, protein metabolism) can distinguish many types of cancer.
Others (apoptosis, energy pathways) cannot.

19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
Conclusions (I)

Multiple views into the data provide substantial insight into the differences among cancer types and among gene sets.
Cancer types differ greatly in their degree of heterogeneity, ranging from homogeneous (colon, leukemia) through moderately heterogeneous (renal, melanoma) to extremely heterogeneous (breast, lung).

24
Conclusions (II)

Homogeneous cancers exhibit strong identifying signals across most views of the data.
There are large differences in the ability of genes on different chromosomes, or genes involved in different biological processes, to distinguish cancer types.

25
Supplementary Material

Complete results of each analysis by chromosome and by function are available on our web site:
http://www.mdanderson.org/depts/cancergenomics
