A Statistical Framework for Expression-Based Molecular Classification

1
A Statistical Framework for Expression-Based
Molecular Classification
  • Elizabeth Garrett
  • Sidney Kimmel Cancer Center
  • Johns Hopkins University

2
Molecular Classification of Cancer
  • Goals
  • Short term
  • To use gene expression array data to identify and
    hypothesize subtypes of cancer
  • To discover new cancer classes that are
    interpretable and amenable to further biological
    analysis
  • To translate classes into clinical tools
  • Long term
  • To eventually refine individualized prognosis and
    therapy

3
Outline of Talk
  • Molecular Classifications
  • the role of statistics in molecular
    classification
  • defining a molecular profile
  • Modeling latent classes: POE (Probability of
    Expression)
  • Bayesian mixture models
  • visualization tools
  • Mining using latent classes
  • Using POE to combine across platforms

4
Botstein-Brown style of visualizing gene
expression data
(Garber et al. PNAS 2001)
5
The fine print
6
Motivating Datasets
  • Unclassified cancer samples: Are the gene
    expression patterns informative about
    subclasses?
  • Ductal breast cancers
  • Adenocarcinomas of the lung
  • Diffuse large B-cell lymphoma
  • Related tissues: Are subtypes associated with
    prognosis?
  • Normal tissues and cancer tissues
  • Outcome data (e.g. survival, recurrence,
    response)
  • Genes: Are hypothesized genes associated with
    cancer types?
  • Functional information
  • Custom array

7
General Approach of POE (Probability of
Expression)
  • Define a reference expression value
  • normal vs. overexpressed vs. underexpressed
  • unsupervised in nature
  • Use scale-independent measures of expression
  • allows combination of data across platforms
  • incorporates measurement errors
  • Choose molecular profile that predicts cancer
    class based on a small number of genes
  • yields clinical implications
  • choose genes using combination of statistical and
    biological evidence
  • Caveat: NOT intended for gene clustering, nor
    for manual clustering of genes

8
Molecular Profiles (based on 3 genes A, B, and C)
27 = 3^3 possible profiles
              Gene A   Gene B   Gene C
Profile 1       -1       -1       -1
Profile 2       -1       -1        0
Profile 3       -1       -1        1
Profile 4       -1        0       -1
Profile 5       -1        0        0
Profile 6       -1        0        1
...            ...      ...      ...
Profile 24       1        0        1
Profile 25       1        1       -1
Profile 26       1        1        0
Profile 27       1        1        1
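
The profile table can be enumerated mechanically. The following is a minimal Python sketch, assuming the coding -1 = underexpressed, 0 = normal, 1 = overexpressed; the function name `enumerate_profiles` is hypothetical and used only for illustration.

```python
from itertools import product

def enumerate_profiles(genes, states=(-1, 0, 1)):
    """Return every assignment of one expression state to each gene."""
    return [dict(zip(genes, combo)) for combo in product(states, repeat=len(genes))]

# Three genes with three states each give 3**3 = 27 profiles.
profiles = enumerate_profiles(["A", "B", "C"])
print(len(profiles))
print(profiles[0])    # Profile 1 in the table
print(profiles[23])   # Profile 24 in the table
```

The number of profiles grows as 3 to the power of the number of genes, which is one reason to base a profile on a small number of genes.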
9
Mixture of Normal and Two Uniform Distributions
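
One way to write a mixture of this kind for a single gene g is sketched below. The symbols are notation assumed here for illustration: $\mu_g$ and $\sigma_g$ for the center and spread of the normal component, $\kappa_g^{-}$ and $\kappa_g^{+}$ for the widths of the two uniform components, and $\pi_g^{-}$, $\pi_g^{+}$ for the mixing proportions. Any sample-specific adjustment is left out for simplicity.

```latex
f(a_{gt}) = \pi_g^{-}\, U\!\left(a_{gt};\ \mu_g - \kappa_g^{-},\ \mu_g\right)
          + \left(1 - \pi_g^{-} - \pi_g^{+}\right) N\!\left(a_{gt};\ \mu_g,\ \sigma_g^{2}\right)
          + \pi_g^{+}\, U\!\left(a_{gt};\ \mu_g,\ \mu_g + \kappa_g^{+}\right)
```

Here the normal component stands for baseline expression, and the two uniform components stand for under- and overexpression on either side of the baseline.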
10
Empirical Density of Expression Levels in One
Gene Across 203 Lung Samples
Bhattacharjee, PNAS 2001
11
Latent Expression Classes
  • Notation
  • Modeling observed gene expression, $a_{gt}$
  • For gene g, the proportions of differentially
    expressed tumors in the population of
    unclassified tumors are $\pi_g^{+}$ (overexpressed)
    and $\pi_g^{-}$ (underexpressed)

12
Probability Scale for Expression Data
Interpretation: The probability that gene g in
tumor t is overexpressed, given the observed
expression and the model parameters
Interpretation: The probability that gene g in
tumor t is underexpressed, given the observed
expression and the model parameters
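
These two probabilities follow from Bayes' rule once the component densities are fixed. The sketch below assumes a normal component for baseline expression and uniform components on either side of it. The function name `poe_probabilities` and all parameter names are hypothetical, and the parameters are treated as known rather than estimated.

```python
import math

def poe_probabilities(a, mu, sigma, kappa_minus, kappa_plus, pi_minus, pi_plus):
    """Posterior probabilities (under, normal, over) for one observed value a.

    Assumed component densities (an illustrative choice):
      under  : Uniform(mu - kappa_minus, mu)
      normal : Normal(mu, sigma**2)
      over   : Uniform(mu, mu + kappa_plus)
    """
    f_minus = 1.0 / kappa_minus if mu - kappa_minus <= a <= mu else 0.0
    f_plus = 1.0 / kappa_plus if mu <= a <= mu + kappa_plus else 0.0
    f_zero = math.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    # Weight each density by its mixing proportion, then normalize.
    w_minus = pi_minus * f_minus
    w_zero = (1.0 - pi_minus - pi_plus) * f_zero
    w_plus = pi_plus * f_plus
    total = w_minus + w_zero + w_plus
    return w_minus / total, w_zero / total, w_plus / total

# A value far above baseline versus a value close to baseline.
far = poe_probabilities(3.0, mu=0.0, sigma=1.0, kappa_minus=6.0, kappa_plus=6.0,
                        pi_minus=0.1, pi_plus=0.2)
near = poe_probabilities(0.1, mu=0.0, sigma=1.0, kappa_minus=6.0, kappa_plus=6.0,
                         pi_minus=0.1, pi_plus=0.2)
print(far, near)
```

Because the result is a probability rather than a raw intensity, it does not depend on the measurement scale of the platform that produced `a`.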
13
Distributional Assumptions
Samples: Normal/Uniform mixture
Genes: Second stage model
14
(No Transcript)
15
Original Scale
After Transformation
16
Harvard Lung Cancer Data (Bhattacharjee, PNAS,
2001)
17
MCMC Estimation Approach
  • Relatively straightforward
  • A couple of comments:
  • Data augmentation using unknown expression
    variables $e_{gt}$. Sampling of ?s unconditional on
    e's
  • Starting conditions are critical. K-means
    clustering (k = 2 or 3) is useful for picking
    starting centers and spread
  • Constrain $\min(\kappa_g^{+}, \kappa_g^{-}) > k\,\sigma_g$
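
The K-means starting point can be sketched as follows for the expression values of a single gene. This is a plain one-dimensional Lloyd's algorithm written for illustration; the function names `kmeans_1d` and `starting_centers_and_spread` are hypothetical, and the example values are made up.

```python
import statistics

def kmeans_1d(values, k=3, iterations=50):
    """Lloyd's algorithm on a list of numbers (requires k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted data.
    centers = [ordered[int(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Keep the previous center when a cluster is empty.
        new_centers = [statistics.fmean(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

def starting_centers_and_spread(values, k=3):
    """Return one (center, spread) pair per cluster as candidate starting values."""
    centers, clusters = kmeans_1d(values, k)
    spreads = [statistics.pstdev(c) if c else 0.0 for c in clusters]
    return list(zip(centers, spreads))

# Made-up values: two low, five near zero, two high.
values = [-4.0, -3.5, -0.2, -0.1, 0.0, 0.1, 0.2, 3.8, 4.1]
centers, clusters = kmeans_1d(values, k=3)
starts = starting_centers_and_spread(values, k=3)
print(starts)
```

With k = 3, the middle cluster is a natural candidate for the baseline component and the outer clusters for the under- and overexpressed groups.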

18
Denoising Expression Data
Provides cleaner version of the original
expression level data.
19
Mining for Genes
  • Two quantities of interest in looking for and
    grouping genes.
  • Probability that gene g follows a specified
    pattern
  • Probability that all genes in set G0 have the
    same pattern across samples
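
The first quantity can be illustrated with a small Python sketch. It assumes that, given the model parameters, the expression classes of different samples are independent, so the pattern probability is a product of per-sample class probabilities. The function name `pattern_probability` and the input layout are hypothetical; a full analysis would presumably average this quantity over posterior draws rather than plug in a single set of probabilities.

```python
import math

def pattern_probability(class_probs, pattern):
    """Probability that one gene matches `pattern` across samples.

    class_probs: one dict per sample, mapping state (-1, 0, 1) to its probability.
    pattern:     one target state per sample, in the same order.
    """
    if len(class_probs) != len(pattern):
        raise ValueError("need exactly one target state per sample")
    return math.prod(probs[state] for probs, state in zip(class_probs, pattern))

# Two samples: the target pattern is overexpressed in the first, underexpressed in the second.
class_probs = [
    {-1: 0.1, 0: 0.2, 1: 0.7},
    {-1: 0.8, 0: 0.1, 1: 0.1},
]
match = pattern_probability(class_probs, (1, -1))
print(match)
```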

20
Identifying Gene Groups
  • Preselect proportions of over- and underexpressed
    genes (e.g. 20% under, 5% over)
  • Select genes consistent with proportions via
    $P(e_{g1}, \ldots, e_{gT}\,?)$
  • Choose genes that are similar in expression
    pattern to add to the group via q(G0).
  • Look at the mining plot to identify genes that
    are biologically sensible.

21
5% underexpressed, 15% overexpressed, 4 sets
22
Molecular Profiles
23
Combining Across Platforms
  • Example: Stanford, Harvard, Michigan lung cancer
    datasets
  • Publicly available
  • Different platforms: Affymetrix, cDNA glass
    slides
  • POE rescales to probability metric
  • With some caveats, can combine data

24
  • Statistics: G. Parmigiani, E. Garrett
  • Arrays, Biology: E. Gabrielson, R. Anbazhagan
  • http://astor.som.jhmi.edu/poe
  • G. Parmigiani, E. Garrett, R. Anbazhagan, E.
    Gabrielson. A statistical framework for
    expression-based molecular classification in
    cancer. JRSS, in press.
  • E. Garrett, G. Parmigiani. POE: Statistical
    Methods for Qualitative Analysis of Gene
    Expression. In The Analysis of Gene Expression
    Data: Methods and Software (eds. G. Parmigiani,
    E. Garrett, R. Irizarry, S. Zeger). To appear
    2003.