A Statistical Framework for Expression-Based Molecular Classification

Slides: 25

Provided by: ElizabethS164

Learn more at: http://people.musc.edu

1
A Statistical Framework for Expression-Based
Molecular Classification

Elizabeth Garrett
Sidney Kimmel Cancer Center
Johns Hopkins University

2
Molecular Classification of Cancer

Goals
Short term
To use gene expression array data to identify and
hypothesize subtypes of cancer
To discover new cancer classes that are
interpretable and amenable to further biological
analysis
To translate classes into clinical tools
Long term
To eventually refine individualized prognosis and
therapy

3
Outline of Talk

Molecular Classifications
the role of statistics in molecular
classification
defining a molecular profile
Modeling latent classes: POE (Probability of
Expression)
Bayesian mixture models
visualization tools
Mining using latent classes
Using POE to combine across platforms

4
Botstein-Brown style of visualizing gene
expression data
(Garber et al. PNAS 2001)
5
The fine print
6
Motivating Datasets

Unclassified cancer samples: Are the gene
expression patterns informative about
subclasses?
Ductal breast cancers
Adenocarcinomas of the lung
Diffuse large B-cell lymphoma
Related tissues: Are subtypes associated with
prognosis?
Normal tissues and cancer tissues
Outcome data (e.g. survival, recurrence,
response)
Genes: Are hypothesized genes associated with
cancer types?
Functional information
Custom array

7
General Approach of POE (Probability of
Expression)

Define a reference expression value
normal vs. over expressed vs. under expressed
unsupervised in nature
Use scale-independent measures of expression
allows combination of data across platforms
incorporates measurement errors
Choose molecular profile that predicts cancer
class based on a small number of genes
yields clinical implications
choose genes using combination of statistical and
biological evidence
Caveat: NOT intended for gene clustering and not
for manual clustering of genes

8
Molecular Profiles (based on 3 genes: A, B, and C)
27 = 3^3 possible profiles

              Gene A   Gene B   Gene C
Profile 1       -1       -1       -1
Profile 2       -1       -1        0
Profile 3       -1       -1        1
Profile 4       -1        0       -1
Profile 5       -1        0        0
Profile 6       -1        0        1
...
Profile 24       1        0        1
Profile 25       1        1       -1
Profile 26       1        1        0
Profile 27       1        1        1
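
The profile table can be reproduced by enumerating every assignment of the three expression classes (-1 = under expressed, 0 = normal, 1 = over expressed) to the three genes. The sketch below is illustrative only; the function name `enumerate_profiles` is hypothetical and not part of the presentation.

```python
from itertools import product

# Expression classes: -1 = under expressed, 0 = normal, 1 = over expressed
CLASSES = (-1, 0, 1)
GENES = ("A", "B", "C")


def enumerate_profiles(n_genes):
    """Return every assignment of an expression class to each gene."""
    return list(product(CLASSES, repeat=n_genes))


profiles = enumerate_profiles(len(GENES))

# Number the profiles from 1, in the same order as the slide's table.
for number, profile in enumerate(profiles, start=1):
    print(f"Profile {number}: {dict(zip(GENES, profile))}")
```

With G genes the number of profiles is 3^G, which is one reason the talk restricts profiles to a small number of genes.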
9
Mixture of Normal and Two Uniform Distributions
10
Empirical Density of Expression Levels in One
Gene Across 203 Lung Samples
Bhattacharjee, PNAS 2001
11
Latent Expression Classes

Notation
Modeling observed gene expression, a_gt
For gene g, the proportions of differentially
expressed tumors in the population of
unclassified tumors are

12
Probability Scale for Expression Data
Interpretation: The probability that gene g in
tumor t is over expressed given observed
expression and the model parameters
Interpretation: The probability that gene g in
tumor t is under expressed given observed
expression and the model parameters
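
As a rough illustration of these two probabilities, the sketch below applies Bayes' rule to a three-component mixture: a normal component for baseline expression and one uniform component on each side for under and over expression. The formulas on the slide were not captured in this transcript, so everything specific here is an assumption: the parameter names (`mu`, `sigma`, `kappa_minus`, `kappa_plus`, `pi_minus`, `pi_plus`), the choice to let each uniform component extend outward from the normal mean, and the use of fixed parameter values rather than posterior draws.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def uniform_pdf(x, low, high):
    """Density of a uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def poe_probabilities(a, mu, sigma, kappa_minus, kappa_plus, pi_minus, pi_plus):
    """Return (p_minus, p_plus) for one observed expression value `a`.

    Assumed (hypothetical) parameterisation:
      under expressed ~ Uniform(mu - kappa_minus, mu), weight pi_minus
      normal          ~ Normal(mu, sigma),             weight 1 - pi_minus - pi_plus
      over expressed  ~ Uniform(mu, mu + kappa_plus),  weight pi_plus
    """
    w_under = pi_minus * uniform_pdf(a, mu - kappa_minus, mu)
    w_over = pi_plus * uniform_pdf(a, mu, mu + kappa_plus)
    w_normal = (1.0 - pi_minus - pi_plus) * normal_pdf(a, mu, sigma)
    total = w_under + w_over + w_normal
    if total == 0.0:
        # Value lies outside every component's support (numerically).
        return 0.0, 0.0
    return w_under / total, w_over / total


def signed_poe(p_minus, p_plus):
    """Collapse the two probabilities into one value in [-1, 1]."""
    return p_plus - p_minus


p_minus, p_plus = poe_probabilities(
    a=3.0, mu=0.0, sigma=1.0,
    kappa_minus=10.0, kappa_plus=10.0,
    pi_minus=0.1, pi_plus=0.1,
)
print(p_minus, p_plus, signed_poe(p_minus, p_plus))
```

Because the output is a probability rather than a raw intensity, it does not depend on the measurement scale of the platform that produced `a`.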
13
Distributional Assumptions
Samples: Normal/Uniform mixture
Genes: Second-stage model
15
Original Scale
After Transformation
16
Harvard Lung Cancer Data (Bhattacharjee, PNAS,
2001)
17
MCMC Estimation Approach

Relatively straightforward
A couple of comments
Data augmentation using unknown expression
variables e_gt. Sampling of ?s unconditional on
the e's
Starting conditions are critical. K-means
clustering (k = 2 or 3) is useful for picking starting
centers and spread
Constrain min(?g, ?g-) > k ?g
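
The starting-value step could look roughly like the sketch below: a one-dimensional k-means with k = 3 on a single gene's expression values, taking the middle cluster as the baseline component. This is only one plausible reading of the slide. The function names, the quantile-based initialisation, and the way the starting widths are derived from the data range are all assumptions made for illustration.

```python
import statistics


def kmeans_1d(values, k=3, n_iter=50):
    """Lloyd's algorithm on a list of numbers; returns (centers, labels)."""
    ordered = sorted(values)
    # Initialise centers at evenly spaced quantiles of the data.
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(statistics.mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def starting_values(values):
    """Hypothetical starting center, spread, and widths for one gene."""
    centers, labels = kmeans_1d(values, k=3)
    # The cluster with the middle center is treated as baseline expression.
    middle = sorted(range(3), key=lambda j: centers[j])[1]
    members = [v for v, lab in zip(values, labels) if lab == middle]
    mu0 = centers[middle]
    # Fall back to the overall spread if the middle cluster is too small.
    sigma0 = statistics.stdev(members) if len(members) > 1 else statistics.stdev(values)
    return {
        "mu": mu0,
        "sigma": sigma0,
        "width_under": mu0 - min(values),
        "width_over": max(values) - mu0,
    }


example = [-5.0, -4.5, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 4.8, 5.2]
print(starting_values(example))
```

A check that both starting widths exceed a chosen multiple of `sigma` could be added before the sampler is started, in the spirit of the constraint above.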

18
Denoising Expression Data
Provides cleaner version of the original
expression level data.
19
Mining for Genes

Two quantities of interest in looking for and
grouping genes.
Probability that gene g follows a specified
pattern
Probability that all genes in set G0 have the
same pattern across samples
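
A minimal sketch of the two mining quantities follows, under a simplifying assumption that is not stated on the slide: given the model parameters, the latent classes are treated as independent across samples and across genes, so each quantity reduces to a product of per-sample class probabilities. The data layout (one dict of class probabilities per sample) and the function names are hypothetical.

```python
from math import prod

# Expression classes: -1 = under expressed, 0 = normal, 1 = over expressed
CLASSES = (-1, 0, 1)


def pattern_probability(class_probs, pattern):
    """Probability that one gene follows `pattern` across samples.

    class_probs: one dict {class: probability} per sample.
    pattern:     one class label per sample.
    """
    return prod(p[c] for p, c in zip(class_probs, pattern))


def same_pattern_probability(gene_set_probs):
    """Probability that every gene in the set shares one class in every sample.

    gene_set_probs: one `class_probs` list per gene, all of equal length.
    """
    n_samples = len(gene_set_probs[0])
    result = 1.0
    for t in range(n_samples):
        # Sum over the class that all genes could share in sample t.
        result *= sum(prod(gene[t][c] for gene in gene_set_probs) for c in CLASSES)
    return result


gene = [
    {-1: 0.10, 0: 0.10, 1: 0.80},  # sample 1: probably over expressed
    {-1: 0.05, 0: 0.90, 1: 0.05},  # sample 2: probably normal
]
print(pattern_probability(gene, [1, 0]))
print(same_pattern_probability([gene, gene]))
```

In a full Bayesian analysis these products would presumably be averaged over posterior draws of the parameters rather than evaluated at a single set of probabilities.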

20
Identifying Gene Groups

Preselect proportions of over- and under-expressed
genes (e.g. 20% under, 5% over)
Select genes consistent with proportions via
P(e_g1, ..., e_gT ?)
Choose genes that are similar in expression
pattern to add to the group via q(G0).
Look at the mining plot to identify genes that are
biologically sensible.

21
5% underexpressed, 15% overexpressed, 4 sets
22
Molecular Profiles
23
Combining Across Platforms

Example: Stanford, Harvard, Michigan lung cancer
datasets
Publicly available
Different platforms: Affymetrix, cDNA glass
slides
POE rescales to a probability metric
With some caveats, data can be combined
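
Once each dataset has been rescaled to the probability metric, pooling can be as simple as keeping the genes measured on every platform and concatenating the samples. The sketch below illustrates that idea only. The data layout, the gene identifiers, the dataset labels, and the numeric values are all made up for illustration and are not taken from the Stanford, Harvard, or Michigan datasets.

```python
def combine_platforms(datasets):
    """Pool POE-scale values across datasets, keeping only shared genes.

    datasets: dict mapping a dataset name to a dict {gene: [values per sample]}.
    Returns a dict {gene: [(dataset name, value), ...]}.
    """
    common_genes = set.intersection(*(set(d) for d in datasets.values()))
    combined = {}
    for gene in sorted(common_genes):
        combined[gene] = [
            (name, value)
            for name, data in datasets.items()
            for value in data[gene]
        ]
    return combined


# Hypothetical values on a signed probability scale in [-1, 1].
datasets = {
    "platform_1": {"GENE1": [0.9, -0.1], "GENE2": [0.0, 0.2]},
    "platform_2": {"GENE1": [0.8], "GENE3": [-0.7]},
}
print(combine_platforms(datasets))
```

Matching gene identifiers across platforms is itself one of the caveats and is assumed to have been done beforehand here.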

Statistics: G. Parmigiani, E. Garrett
Arrays, Biology: E. Gabrielson, R. Anbazhagan
http://astor.som.jhmi.edu/poe
G. Parmigiani, E. Garrett, R. Anbazhagan, E.
Gabrielson. A statistical framework for
expression-based molecular classification in
cancer. JRSS, in press.
E. Garrett, G. Parmigiani. POE: Statistical
Methods for Qualitative Analysis of Gene
Expression. In: The Analysis of Gene Expression
Data: Methods and Software (eds. G. Parmigiani,
E. Garrett, R. Irizarry, S. Zeger). To appear
2003.