Normalization in the Presence of Differential Expression in a Large Subset of Genes

Slides: 18
Provided by: Garr8
Learn more at: http://people.musc.edu

1
Normalization in the Presence of Differential
Expression in a Large Subset of Genes
  • Elizabeth Garrett
  • Giovanni Parmigiani

2
Motivation (again)
  • Class discovery: Find breast cancer subtypes
    among 81 previously unclassified breast cancer
    tumor samples.
  • Gene selection: Find a small subset of genes
    that allows us to cluster the tumor samples.
  • Gene clustering: Look for genes that are
    differentially expressed and genes that behave
    similarly.

3
Raw data: median log gene expression versus log
gene expression in sample i
4
Problem with raw data
  • V pattern in many of the slides
  • Curvature
  • Non-constant variance

5
V Patterns
  • Debate
  • We thought: "Oops, something went wrong in the
    lab." We should either
  • correct the Vs so that we see only one line, or
  • remove the genes that are causing the V.
  • They (i.e., the experts) thought: "It's REAL
    differential expression!"
  • Assuming it is real, how do we normalize to
    straighten the curves and stabilize the variance?

6
Crude Initial Approach
  • Approach:
  • Fit a regression to each plot and identify points
    with large negative (positive) residuals.
  • Remove the genes with negative (positive)
    residuals (and high abundance?) and normalize
    using the remaining points.
  • Problem: Points near the origin get truncated in
    an odd way, and there is no obvious way to decide
    which points near the origin to include or exclude.
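A minimal sketch of the crude approach, assuming an ordinary least-squares line per plot and a fixed residual cutoff. The function names, the cutoff of 1.0, and the toy values are illustrative assumptions, not taken from the slides.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def flag_low_arm(median_log, sample_log, cutoff):
    """Flag genes whose residual from the fitted line is below -cutoff."""
    a, b = fit_line(median_log, sample_log)
    residuals = [y - (a + b * x) for x, y in zip(median_log, sample_log)]
    return [r < -cutoff for r in residuals]


# Made-up data: five genes on the line y = x, two genes well below it.
median_log = [1.0, 2.0, 3.0, 4.0, 5.0, 3.5, 4.5]
sample_log = [1.0, 2.0, 3.0, 4.0, 5.0, 1.5, 2.0]

flags = flag_low_arm(median_log, sample_log, cutoff=1.0)
kept = [i for i, flagged in enumerate(flags) if not flagged]
print(flags)
print(kept)
```

Note that the flagged genes also pull the fitted line toward themselves, which is one reason a hard residual cutoff behaves oddly where the two arms of the V meet.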

7
High abundance: 3 or greater
8
A better (and not hard to implement) approach

1. Assume 2 classes of genes (class 0 and class 1).
2. Take a subset of samples where the V is obvious
(we picked four samples).
3. Fit a latent variable model using MCMC to predict
which genes are in class 1 and which are in class 0.
9
Latent Variable Model
  • Allow different slopes and intercepts for the two
    classes of genes
  • Details

10
Results
  • Goal is to estimate the gene classes, c_g
  • The other model parameters are nuisance parameters
  • Based on the chain, we estimate p_g = P(c_g = 1)
  • At each iteration, each gene is assigned to class
    0 or class 1
  • By averaging the class assignments over iterations,
    we get the posterior probability of class membership
  • To do the normalization, we restrict attention to
    genes with p_g < 0.95
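The averaging step can be sketched as follows, assuming the chain is stored as a list of per-iteration 0/1 class assignments. The name `posterior_class1` and the four-iteration chain are made up for illustration; a real chain would be far longer.

```python
def posterior_class1(draws):
    """Estimate P(c_g = 1) for each gene as the fraction of iterations
    in which the gene was assigned to class 1.

    draws[t][g] is the 0/1 class of gene g at MCMC iteration t.
    """
    n_iter = len(draws)
    n_genes = len(draws[0])
    return [sum(draw[g] for draw in draws) / n_iter for g in range(n_genes)]


# Made-up chain: 4 iterations (rows) for 3 genes (columns).
draws = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]

post = posterior_class1(draws)
# Genes kept for normalization: posterior probability below 0.95.
reference = [g for g, p in enumerate(post) if p < 0.95]
print(post)       # [0.0, 1.0, 0.5]
print(reference)  # [0, 2]
```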

11
Posterior Probabilities of Class Membership
13
Normalization
  • Use loess normalization where class 0 genes are
    the reference

Residuals: r_sg = y_sg - loess fit
(Figure: Sample 43)
14
Before and after loess normalization (R function
loess with weights 1 - c_g). Panels: Before, After.
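A rough sketch of the weighted fit, written as a plain local linear regression with tricube kernel weights multiplied by the prior weights 1 - c_g. This is a loess-style stand-in, not R's loess implementation; the function names, the span of 0.75, and the toy data are assumptions.

```python
def tricube(u):
    """Tricube kernel: positive for |u| < 1, zero elsewhere."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def weighted_loess(x, y, prior_w, span=0.75):
    """Local linear fit at every x[i]; each point's kernel weight is
    multiplied by its prior weight, so zero-weight genes are ignored."""
    n = len(x)
    k = max(2, int(round(span * n)))  # points per local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # window half-width, guarded against 0
        w = [tricube((xi - x0) / h) * pw for xi, pw in zip(x, prior_w)]
        sw = sum(w)
        if sw <= 0:
            raise ValueError("no reference genes inside the local window")
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up slide: eight class 0 genes on y = x + 0.5, two class 1 genes below.
gene_median = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 5.5, 6.5]
sample_log = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 3.0, 3.5]
c = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

weights = [1 - cg for cg in c]  # class 0 genes are the reference
fit = weighted_loess(gene_median, sample_log, weights)
resid = [y - f for y, f in zip(sample_log, fit)]
print([round(r, 6) for r in resid])
```

Because the class 1 genes get weight zero, the curve follows the class 0 arm only. The class 1 genes still receive residuals, measured against that reference curve.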
15
Variance Stabilization
  • Take the residuals from the previous loess fit.
  • Fit a loess curve to the squared residuals versus
    the gene median.
  • The square root of the fitted value approximates
    the standard deviation.
  • Rescale by dividing by the average slide variance,
    so that overall slide variability is not lost.
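The steps above might look like the sketch below. Two assumptions are worth flagging: a k-nearest-neighbour running mean is swapped in for the second loess fit to keep the example short, and the rescaling is read as dividing the fitted variance by the average squared residual on the slide. The function names and the toy residuals are illustrative only.

```python
import math


def knn_smooth(x, y, k):
    """Average y over the k points whose x is nearest to each x[i].
    A crude stand-in for a loess fit, used only to keep this short."""
    out = []
    for x0 in x:
        nearest = sorted(range(len(x)), key=lambda j: abs(x[j] - x0))[:k]
        out.append(sum(y[j] for j in nearest) / k)
    return out


def variance_stabilizer(gene_median, residuals, k=3):
    """Return one scale factor per gene for a single slide."""
    sq = [r * r for r in residuals]
    local_var = knn_smooth(gene_median, sq, k)  # fitted variance
    avg_var = sum(sq) / len(sq)                 # average slide variance
    # Assumed rescaling: the square root of (fitted variance / average
    # variance), so the slide keeps its overall variability after division.
    return [math.sqrt(v / avg_var) for v in local_var]


# Made-up residuals: small spread at low medians, large spread at high ones.
gene_median = [1, 2, 3, 4, 5, 6]
residuals = [0.1, -0.1, 0.1, -1.0, 1.0, -1.0]

stab = variance_stabilizer(gene_median, residuals)
stabilized = [r / s for r, s in zip(residuals, stab)]
print([round(s, 3) for s in stab])
print([round(r, 3) for r in stabilized])
```

In this toy example the low-abundance and high-abundance ends come out with the same residual magnitude after division, which is the intended effect of the stabilizer.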

16
Final Step
  • Calculate normalized data

[Equation not transcribed. Its labeled terms: slide
median, residual from first loess, gene median,
variance stabilizer from second loess.]