1
Biology-Driven Clustering of Microarray Data
  • Applications to the NCI60 Data Set

K.R. Coombes, K.A. Baggerly, D.N. Stivers, J.
Wang, D. Gold, H.G. Sung, and S.J. Lee
2
Introduction
  • Microarray data is more than a large,
    unstructured matrix.
  • We already know many genes important for studying
    cancer through their involvement in specific
    biological processes
  • We also know that reproducible chromosomal
    abnormalities play an important role in cancer
  • Need analytical methods that use biological
    information early

3
Methods
  • First, updated the annotations of the genes on
    the microarray
  • Performed separate analyses
    • using genes on individual chromosomes
    • using genes involved in different biological
      processes
  • Developed ways to assess how well each set of
    genes classified samples
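
The per-chromosome and per-process analyses above can be sketched as a grouping step followed by the same analysis on each subset. This is a minimal illustration, not the authors' code: the function name `split_by_annotation`, the gene IDs, and the annotation values are all hypothetical, and a gene is allowed to belong to several categories because one gene can take part in more than one biological process.

```python
from collections import defaultdict


def split_by_annotation(gene_ids, annotation):
    """Group gene IDs by annotation category (hypothetical helper).

    `annotation` maps a gene ID to a list of categories, for example
    a one-element list holding its chromosome, or several biological
    processes. Genes without an annotation are skipped.
    """
    groups = defaultdict(list)
    for gene in gene_ids:
        for category in annotation.get(gene, []):
            groups[category].append(gene)
    return dict(groups)


# Made-up gene IDs and annotations, for illustration only.
genes = ["g1", "g2", "g3", "g4"]
chromosome = {"g1": ["1"], "g2": ["17"], "g3": ["1"]}
process = {"g1": ["cell cycle", "apoptosis"], "g2": ["cell cycle"]}

by_chromosome = split_by_annotation(genes, chromosome)
by_process = split_by_annotation(genes, process)
```

Each resulting gene list can then be passed to the same clustering and assessment steps, which gives one view of the samples per chromosome or per process.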

4
Quality of Annotations
  • Problem
    • I.M.A.G.E. clone IDs and GenBank accession
      numbers are archival
    • UniGene clusters, gene names, descriptions,
      functions, etc., are changeable
  • Solution
    • Download latest UniGene (build 137) and LocusLink
      to update annotations

5
How many genes on the array have good annotations?
Only trust the 7478 spots where the UniGene
clusters match.
6
Where are the genes located?
7
How do we determine the functions of genes?
  • UniGene -> LocusLink -> GeneOntology
  • GeneOntology is a structured, hierarchical
    vocabulary to describe gene functions in three
    broad areas
    • biological process (why)
    • molecular function (what)
    • cellular component (where)

8
What kinds of genes are on the microarray?
9
Data Preprocessing
  • Remove spots with poor annotations and spots with
    median intensity below the 97th percentile of
    empty spots.
  • Normalize each array so median log ratio between
    channels is zero (that is, the median ratio is one)
  • Center each gene so mean log ratio across
    experiments is zero
  • Use (1-correlation)/2 as distance metric
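
The normalization, centering, and distance steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: rows are taken to be genes and columns to be arrays, normalization is read as shifting each array so its median log ratio is zero, the distance is computed between samples using Pearson correlation, and the spot-filtering step is omitted. All function names and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def normalize_arrays(log_ratios):
    """Shift each array (column) so that its median log ratio is zero."""
    n_arrays = len(log_ratios[0])
    medians = [median(row[j] for row in log_ratios) for j in range(n_arrays)]
    return [[row[j] - medians[j] for j in range(n_arrays)] for row in log_ratios]


def center_genes(log_ratios):
    """Shift each gene (row) so that its mean log ratio across arrays is zero."""
    centered = []
    for row in log_ratios:
        row_mean = mean(row)
        centered.append([x - row_mean for x in row])
    return centered


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def distance(x, y):
    """(1 - correlation) / 2, which lies between 0 and 1."""
    return (1.0 - correlation(x, y)) / 2.0


def sample_distances(log_ratios):
    """Pairwise distances between arrays (columns of the matrix)."""
    columns = list(zip(*log_ratios))
    return [[distance(a, b) for b in columns] for a in columns]


# Toy log-ratio matrix: 4 genes (rows) by 3 arrays (columns), made-up values.
toy = [[0.5, 1.5, -0.5],
       [1.0, 0.0, 2.0],
       [-1.0, 2.5, 0.5],
       [2.0, 1.0, 1.5]]
normalized = normalize_arrays(toy)
prepared = center_genes(normalized)
dist = sample_distances(prepared)
```

Dividing by two maps perfectly correlated profiles to distance 0 and perfectly anti-correlated profiles to distance 1, so the matrix can be fed directly to hierarchical clustering or multidimensional scaling.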

10
How well does a set of genes distinguish types of
cancer?
  • Three methods for assessment
    • Qualitative (PCA, MDS)
    • Quantitative (PCA ANOVA)
    • Semi-quantitative (Grading Dendrograms)
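
The transcript does not spell out how PCA and ANOVA are combined. One plausible reading, shown here purely as an assumption, is to project the samples onto a principal component and then test whether the component scores differ between cancer types with a one-way ANOVA. The sketch below covers only the ANOVA step and takes the scores as given; the function name `one_way_f`, the scores, and the group labels are hypothetical.

```python
from collections import defaultdict


def one_way_f(scores, groups):
    """One-way ANOVA F statistic for `scores` split by `groups`.

    Assumes at least two groups and some within-group variation,
    otherwise the ratio is undefined.
    """
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)

    n = len(scores)
    k = len(by_group)
    grand_mean = sum(scores) / n

    ss_between = 0.0
    ss_within = 0.0
    for values in by_group.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    # Mean square between groups divided by mean square within groups.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up scores on one principal component for six samples of two types.
pc_scores = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
cancer_type = ["colon", "colon", "colon", "renal", "renal", "renal"]
f_stat = one_way_f(pc_scores, cancer_type)
```

A large F statistic means the component separates the cancer types well relative to the spread within each type; converting it to a p-value needs the F distribution, which is left out here.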

11
Multidimensional Scaling
12
PCANOVA
13
How good is a dendrogram?
  • A: cluster contains all and only one kind of
    cancer
  • B: all, with extras
  • C: all except one
  • D: all except one, with extras
  • E: all except two
  • F: all except two, with extras
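
The grading scale above can be written as a small lookup on two quantities: how many samples of the cancer type are missing from a cluster, and whether the cluster also contains samples of other types. This is a sketch of one reading of the scale, not the authors' procedure: the slide does not say how the cluster to be graded is picked from the dendrogram, so the function grades a single candidate cluster and returns `None` when more than two samples are missing. The function name and sample names are hypothetical.

```python
def grade_cluster(cluster, labels, cancer_type):
    """Grade one candidate cluster for one cancer type on the A to F scale.

    `cluster` is a collection of sample names; `labels` maps every
    sample name to its cancer type.
    """
    members = {s for s, t in labels.items() if t == cancer_type}
    cluster = set(cluster)
    missing = len(members - cluster)     # samples of this type left out
    has_extras = bool(cluster - members)  # samples of other types included
    grades = {
        (0, False): "A", (0, True): "B",
        (1, False): "C", (1, True): "D",
        (2, False): "E", (2, True): "F",
    }
    return grades.get((missing, has_extras))


# Hypothetical sample names, for illustration only.
labels = {"c1": "colon", "c2": "colon", "c3": "colon",
          "m1": "melanoma", "m2": "melanoma"}

grade_pure = grade_cluster({"c1", "c2", "c3"}, labels, "colon")
grade_mixed = grade_cluster({"c1", "c2", "m1"}, labels, "colon")
```

To grade a whole dendrogram, one option is to apply this function to every cluster obtained by cutting the tree and keep the best letter for each cancer type.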

14
Can cancers be distinguished by genes on one
chromosome?
15
Heterogeneity of different types of cancer
  • Some cancers (colon, leukemia) are fairly easy to
    distinguish from others
  • Some (breast, lung) are so heterogeneous as to be
    almost impossible to distinguish
  • Some chromosomes (1, 2, 6, 7, 9, 12, 17) can
    distinguish many cancers.
  • Some (16, 21) are essentially random

18
Can cancers be distinguished by genes of one
function?
  • Table for functional categories looks a lot like
    the table for chromosomes
  • Some biological process categories (signal
    transduction, cell proliferation, cell cycle,
    protein metabolism) can distinguish many types of
    cancer
  • Others (apoptosis, energy pathways) cannot

23
Conclusions (I)
  • Multiple views into the data provide substantial
    insight into differences in cancer types and gene
    sets.
  • Cancer types differ greatly in their degree of
    heterogeneity, ranging from homogeneous (colon,
    leukemia) through moderately heterogeneous
    (renal, melanoma) to extremely heterogeneous
    (breast and lung).

24
Conclusions (II)
  • Homogeneous cancers exhibit strong identifying
    signals across most views of the data.
  • There are large differences in the ability of
    genes on different chromosomes or involved in
    different biological processes to distinguish
    cancer types.

25
Supplementary Material
  • Complete results of each analysis by chromosome
    and by function are available on our web site
  • http://www.mdanderson.org/depts/cancergenomics