Quantitation of Gene Expression for High-Density Oligonucleotide Arrays: A SAFER Approach



1
Quantitation of Gene Expression for High-Density Oligonucleotide Arrays: A SAFER Approach
  • Daniel Holder, Bill Pikounis, Richard Raubertas,
    Vladimir Svetnik, and Keith Soper
  • Biometrics Research
  • Merck Research Laboratories

2
  • Scale Matters
  • Additive Fits (probes and chips)
  • Experimental-Unit Variability
  • Robustness and Resistance
3
Goals of Data Analysis
  • Which genes have we detected?
  • Which genes have changed?
  • Which genes change together?
  • Prerequisites:
      • Quantify transcript abundance (gene expression index)
      • Quantify precision
      • Assess quality

4
Our Data Analysis Method
  • Normalize chips for overall fluorescence (based
    on MM)
  • Transform data (linear-log hybrid scale)
  • Fit probe-specific model using all chips (highly
    resistant to outliers)
  • Normalize for chip bias (scatterplot smooth)
  • Assess differences (include between-EU variability, e.g., ANOVA)

Offers opportunities for QC.
5
Fig 1. Hybrid transformation (knot at c = 20): curves f(x) = x, f(x) = c ln(x/c) + c, and f(x) = hybrid(0, c), plotted against x
6
Linear-log Hybrid Scale
$$
f(x) = \begin{cases} a, & x < a \\ x, & a \le x < c \\ c\ln(x/c) + c, & x \ge c \end{cases}
$$
  • Typically choose a = 0
  • Value of c chosen for additivity
  • Improved homogeneity of variance
  • For low-expression genes, compare differences, not ratios
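The piecewise definition can be written directly as a function. This is a minimal Python sketch, not the authors' code; the name `hybrid` and the defaults a = 0 and c = 20 (the "typical" a and the knot shown in Fig 1) are our illustrative choices.

```python
import math


def hybrid(x, a=0.0, c=20.0):
    """Linear-log hybrid transform f(x) with floor a and knot c.

    f(x) = a for x < a, x for a <= x < c, and c*ln(x/c) + c for x >= c.
    At x = c both pieces equal c and both have slope 1, so the curve is smooth there.
    """
    if c <= 0 or a > c:
        raise ValueError("need c > 0 and a <= c")
    if x < a:
        return a                       # floor: keeps negative PM - MM values finite
    if x < c:
        return x                       # linear region: compare differences
    return c * math.log(x / c) + c     # log region: differences are c * log-ratios


if __name__ == "__main__":
    for v in (-15.0, 0.0, 10.0, 20.0, 200.0, 2000.0):
        print(v, round(hybrid(v), 2))
```

Above the knot, f(x1) - f(x2) = c ln(x1/x2), so differences on the hybrid scale act like ordinary differences for low values and like scaled log-ratios for high values.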

7
Probe Specific Effects
  • "Probe-specific biases are highly reproducible and predictable, and their adverse effect can be reduced by proper modeling and analysis methods." (Li and Wong, PNAS 2000)
  • Multiplicative model for PM - MM for each probeset: $PM_{ij} - MM_{ij} = \theta_i \phi_j + \varepsilon_{ij}$ (ith chip, jth probe)
  • Resistance achieved by iteratively omitting
    extreme points (or chips) and refitting using
    least squares
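For comparison, the multiplicative model can be fitted by alternating least squares: with φ fixed, each θ_i is a one-predictor regression through the origin, and vice versa. A minimal Python sketch under our own assumptions: the name `fit_multiplicative`, the starting values, and rescaling φ so that Σφ_j² equals the number of probes are illustrative choices, and the iterative omission of extreme points or chips described above is not included.

```python
import math


def fit_multiplicative(y, n_iter=100, tol=1e-10):
    """Least-squares fit of y[i][j] ~ theta[i] * phi[j] by alternating regressions.

    y[i][j] is PM - MM for chip i and probe j of a single probeset.
    phi is rescaled each pass so that sum(phi_j**2) equals the number of probes;
    without such a constraint only the products theta[i] * phi[j] are identifiable.
    No outlier omission is done here.
    """
    n_chips, n_probes = len(y), len(y[0])

    def update_theta(phi):
        ss = sum(p * p for p in phi)
        return [sum(y[i][j] * phi[j] for j in range(n_probes)) / ss for i in range(n_chips)]

    phi = [1.0] * n_probes                 # start from equal probe sensitivities
    for _ in range(n_iter):
        theta = update_theta(phi)          # regress each chip's row on phi
        ss = sum(t * t for t in theta)
        new_phi = [sum(y[i][j] * theta[i] for i in range(n_chips)) / ss
                   for j in range(n_probes)]
        scale = math.sqrt(sum(p * p for p in new_phi) / n_probes)
        new_phi = [p / scale for p in new_phi]
        change = max(abs(p - q) for p, q in zip(new_phi, phi))
        phi = new_phi
        if change < tol:
            break
    return update_theta(phi), phi
```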

8
Probe Specific Effects (Our Approach)
  • For each probeset, resistant, additive fit to PM
    - MM

  • Use a fitting procedure that is highly resistant
    to extreme values (median polish)

Since logs are undefined for non-positive values
and unstable for small values, we use a
linear-log hybrid scale
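Median polish fits the additive model (grand effect + chip effect + probe effect + residual) by repeatedly sweeping row medians and column medians out of the table. A minimal Python sketch; the function name, the iteration limit, and the stopping rule (stop when the sum of absolute residuals changes by less than 1%, similar to the defaults of R's `medpolish`) are our assumptions. The usage runs it on the 3 chips × 5 probes table from the "Example Median Polish" backup slide.

```python
from statistics import median


def median_polish(z, max_iter=10, eps=0.01):
    """Tukey's median polish of a chips x probes table.

    Fits z[i][j] = overall + chip[i] + probe[j] + resid[i][j] by alternately
    sweeping row (chip) and column (probe) medians out of the residuals.
    """
    resid = [[float(v) for v in row] for row in z]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, chip, probe = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep chip (row) medians into the chip effects.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [r - m for r in resid[i]]
            chip[i] += m
        m = median(probe)
        probe = [p - m for p in probe]
        overall += m
        # Sweep probe (column) medians into the probe effects.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= m
            probe[j] += m
        m = median(chip)
        chip = [c - m for c in chip]
        overall += m
        # Stop when the sum of absolute residuals stops changing.
        new_sum = sum(abs(r) for row in resid for r in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, chip, probe, resid


if __name__ == "__main__":
    table = [[0, 26, 29, 92, 111], [0, 0, 36, 93, 109], [31, 43, 51, 106, 121]]
    overall, chip, probe, resid = median_polish(table)
    print(overall, chip, probe)
    for row in resid:
        print(row)
```

On that table it returns a grand median of 36, chip effects -2, 0, 15, and probe effects -34, -8, 0, 57, 73, with every row and column of residuals having median 0.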
9
Adjusting for Chip Bias
  • Initial centering of chips
  • Chip bias may depend on gene expression level
  • Plot chip effects vs. overall expression level (grand median) for each probeset
  • Omit probesets that appear to change:
      • Compute between-group dev / within-group dev
      • Omit probesets in the top 25%
  • Fit a resistant scatterplot smoother (loess)
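The chip-bias adjustment can be sketched as follows. Several details are assumptions on our part: "dev" is read as a standard deviation (between-group SD of the group means over the pooled within-group SD), "top 25" as the top 25% of that ratio, and the resistant loess is replaced by a simpler tricube-weighted local linear smoother without loess's robustness iterations, so this version is not resistant to outliers. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def dev_ratio(values, groups):
    """Between-group SD of the group means over the pooled within-group SD (assumed reading)."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    between = stdev([mean(vs) for vs in by_group.values()])
    ss = sum((v - mean(vs)) ** 2 for vs in by_group.values() for v in vs)
    df = sum(len(vs) - 1 for vs in by_group.values())
    within = math.sqrt(ss / df)
    if between == 0:
        return 0.0
    return between / within if within > 0 else math.inf


def local_linear(xs, ys, x0, span=0.5):
    """Tricube-weighted local linear fit evaluated at x0 (loess without robustness steps)."""
    k = min(len(xs), max(3, math.ceil(span * len(xs))))
    dist = [abs(x - x0) for x in xs]
    h = sorted(dist)[k - 1] * 1.0001 or 1.0     # bandwidth from the k nearest neighbours
    w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    denom = sw * sxx - sx * sx
    if denom <= 1e-12 * sw * sxx:               # all weighted x equal: use a local mean
        return sy / sw
    slope = (sw * sxy - sx * sy) / denom
    return (sy - slope * sx) / sw + slope * x0


def normalize_chips(effects, grand_median, groups, drop_frac=0.25, span=0.5):
    """Subtract a smooth, level-dependent bias curve from each chip's effects.

    effects[p][c]: chip effect of chip c for probeset p (e.g. from median polish);
    grand_median[p]: overall level of probeset p; groups[c]: group label of chip c.
    """
    ratios = [dev_ratio(row, groups) for row in effects]
    n_keep = len(effects) - int(drop_frac * len(effects))
    keep = sorted(range(len(effects)), key=ratios.__getitem__)[:n_keep]  # drop likely changers
    xs = [grand_median[p] for p in keep]
    corrected = [list(row) for row in effects]
    for c in range(len(groups)):
        ys = [effects[p][c] for p in keep]
        for p in range(len(effects)):
            corrected[p][c] -= local_linear(xs, ys, grand_median[p], span)
    return corrected
```

The bias curve is fitted only on the retained probesets but subtracted from all of them, so a probeset that truly changes keeps its group difference after normalization.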

10
Fig 4. Typical chip normalization plot: chip effects (hybrid scale) vs. grand median; 5 groups × 2 chips/group, 7.1K probesets
11
Terry Speed questions
3. How do you tell that one approach to quantifying expression at the probe-set level (e.g., SAFER) is better than another (e.g., dChip)?
  • Compare on data for which we know the answer:
      • Spiking experiments (limited genes)
      • Validation (e.g., TaqMan)
      • Create POS and NEG groups as best we can
  • How to compare (depends on downstream usage):
      • Repeatability
      • E.g., signal to noise → t-statistic → p-value
      • Fold changes

12
Fibroblast/Adipocyte Mixing Expt
  • Mixtures (100/0, 75/25, 50/50, 25/75, 0/100)
  • 3 chips/mix (15 chips total, Mg74A)
  • 3 methods (SAFER, SAFER(log), dChip)
  • Create groups of probesets using 100/0 vs. 0/100:
      • POS (max p < 0.01, correct oligos, n = 1049)
      • NEG (incorrect oligos, n = 2611)
  • p-value from t-test (pooled variance, hybrid scale)
  • We will change the POS, NEG and p-value
    definitions on some of the later slides
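The p-values here come from a two-sample t-test with pooled variance. Python's standard library has no t distribution, so this sketch computes the two-sided p-value from the identity $p = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)$, with ν = n₁ + n₂ - 2 and I the regularized incomplete beta function, evaluated by the standard continued fraction (modified Lentz method). Function names are ours; with SciPy installed, `scipy.stats.ttest_ind(x, y)` (pooled variance by default) computes the same test.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def pooled_t_test(x, y):
    """Two-sample t-test with pooled variance; returns (t, df, two-sided p)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    p = betainc_reg(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p
```

With 3 chips per mixture, each comparison (e.g., 100/0 vs. 0/100 on the hybrid scale) has ν = 4.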

13
Fibroblast/Adipocyte Mixing Expt (2)
  • Performance based on 75/25 vs. 25/75:
      • p-values from t-test (pooled variance, hybrid scale)
      • For POS, require the same sign as in 100/0 vs. 0/100
      • Positive rate, false positive rate (FPR), positive rate vs. FPR
  • Linearity?
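Given a p-value and a mean difference for each probeset, the positive rate and FPR at a cutoff α are simple counts. A minimal sketch; the record layout, the function name, and the strict p < α comparison are assumptions. Sweeping α traces out positive-rate-vs-FPR curves like those in Figs 7 to 10.

```python
def positive_and_false_positive_rates(records, alphas):
    """Positive rate among POS and false positive rate among NEG at each cutoff.

    records: iterable of (label, p, diff, ref_diff) where label is "POS" or "NEG",
    p and diff come from the 75/25 vs 25/75 comparison, and ref_diff from
    100/0 vs 0/100. A POS probeset counts only if p < alpha and diff has the
    same sign as ref_diff.
    """
    records = list(records)
    pos = [(p, d, r) for lab, p, d, r in records if lab == "POS"]
    neg = [p for lab, p, _, _ in records if lab == "NEG"]
    curve = []
    for alpha in alphas:
        tp = sum(1 for p, d, r in pos if p < alpha and d * r > 0)
        fp = sum(1 for p in neg if p < alpha)
        curve.append((alpha, tp / len(pos), fp / len(neg)))
    return curve
```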

14
Fig 5. CDF for 0 vs 100 (all probesets, n = 12,654); curves: SAFER, SAFER log, dChip
15
Fig 6. CDFs for POS and NEG probesets. Panels: 0 vs 100 POS, 0 vs 100 NEG, 25 vs 75 POS, 25 vs 75 NEG; curves: SAFER, SAFER log, dChip, with a uniform distribution for reference. POS: max p < 0.01 (n = 1049); NEG: wrong sequence (n = 2611)
16
Fig 7. Positive rate vs. false positive rate, 25 vs 75; curves: SAFER, SAFER log, dChip. POS: max p < 0.01 (n = 1049); NEG: wrong sequence (n = 2611)
17
Fig 8. Positive rate vs. false positive rate (log scale), 25 vs 75; curves: SAFER, SAFER log, dChip. POS: max p < 0.01 (n = 1049); NEG: wrong sequence (n = 2611)
18
Fig 9. Positive rate vs. false positive rate (log scale), 25 vs 75, dChip p-values used for dChip; curves: SAFER, SAFER log, dChip. POS: max p < 0.01 (n = 1038); NEG: wrong sequence (n = 2611)
19
Fig 10. Positive rate vs. false positive rate (log scale), 25 vs 75; curves: SAFER, SAFER log, dChip. POS: rank(dChip p) < 1000; NEG: wrong sequence, rank(dChip p) > 2611 - 1000
20
Fig 11. Boxplots of R² values for POS probesets: SAFER, SAFER(log), dChip. POS: max p < 0.01 (n = 1049)
21
Fig 12. Boxplots of R² values for POS probesets, excluding the 100/0 and 0/100 groups: SAFER, SAFER(log), dChip. POS: max p < 0.01 (n = 1049)
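The slides do not spell out how R² was computed. One plausible reading, assumed here, is the R² of a straight-line regression of each probeset's expression values on the mixing proportion across chips, optionally leaving out the pure 100/0 and 0/100 chips as in Fig 12. Function names are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x (the squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def linearity_r2(fractions, values, drop_pure=False):
    """R^2 for one probeset across the mixing series.

    fractions: mixing proportion of one cell type on each chip (0, 0.25, ..., 1).
    drop_pure: if True, ignore chips from the pure 100/0 and 0/100 mixtures.
    """
    pairs = [(f, v) for f, v in zip(fractions, values)
             if not (drop_pure and f in (0.0, 1.0))]
    xs, ys = zip(*pairs)
    return r_squared(xs, ys)
```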
22
Terry Speed questions
1. Do you lose anything by not being able to down-weight non-performing probe pairs the way Li and Wong can with their φ's (i.e., probe effects)?
(Figure: Li and Wong vs. SAFER)
Response: We don't know.
  • Down-weighting non-performing probes seems like a good idea.
  • Is up-weighting bright probes good? (variability, saturation)
  • It is possible to incorporate weighting in the polishing step.

23
Terry Speed questions
2. Is SAFER QC as thorough as Li and Wong's (in detecting aberrant chips, probe sets, and probe pairs)?
Response: QC is not as thorough, but
  • The primary goal is to quantitate mRNA detection (and error). Explicit QC methods aimed at avoiding the effects of aberrant arrays, probes, and individual observations are less important when resistant methods are used.
  • SAFER provides the same raw materials (fitted values and residuals) for QC as Li and Wong. QC summaries can easily be made available.

24
Conclusions
  • For these data, the SAFER method appears to perform better than dChip:
      • Better sensitivity (ROC curve)
      • Slightly better linearity
  • Caveat: this is one analysis of one dataset.

25
Acknowledgments
  • Biometrics Research:
      • Bert Gunter
  • Other:
      • David Gerhold (Pharmacology)
      • John Thompson (Immunology)
      • Eric Muise (Immunology)
      • Karen Richards (Drug Metabolism)
      • Jian Xu (Pharmacology)
      • Yuhong Wang (Bioinformatics)

26
Backups
27
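Example Median Polish
Intensities (3 chips × 5 probes):
  • Chip 1: 0, 26, 29, 92, 111
  • Chip 2: 0, 0, 36, 93, 109
  • Chip 3: 31, 43, 51, 106, 121
Fitted effects:
  • Grand median: 36
  • Probe effects (probes 1 to 5): -34, -8, 0, 57, 73
  • Chip effects (chips 1 to 3): -2, 0, 15
Residuals:
  • Chip 1: 0, 0, -5, 1, 4
  • Chip 2: -2, -28, 0, 0, 0
  • Chip 3: 14, 0, 0, -2, -3
Each intensity = grand median + chip effect + probe effect + residual (e.g., chip 3, probe 1: 36 + 15 - 34 + 14 = 31).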
28
Fig 2. Choose c using p-values from Tukey's non-additivity test: p-values on the raw scale and on the Hybrid(0,1), Hybrid(0,20), and Hybrid(0,40) scales; 5 groups × 2 chips/group, 7.1K probesets
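Tukey's one-degree-of-freedom test asks whether the residuals from an additive (row + column) fit contain a component proportional to the product of the row and column effects; a scale that makes the data additive should give unremarkable p-values. This sketch computes the statistic for one two-way table using mean-based effects, as in Tukey's classical test (our assumption about the variant used). The standard library has no F distribution, so it stops at F; with SciPy, `scipy.stats.f.sf(F, 1, df_resid)` gives the p-value. To mimic Fig 2, one would transform each probeset's chips × probes table with hybrid(0, c) for several c and compare the resulting p-values.

```python
from statistics import mean


def tukey_nonadditivity(y):
    """Tukey's one-degree-of-freedom test for non-additivity.

    y: two-way table (e.g. chips x probes for one probeset), one value per cell.
    Returns (F, df_resid); under additivity F follows F(1, df_resid).
    """
    r, c = len(y), len(y[0])
    grand = mean(v for row in y for v in row)
    a = [mean(row) - grand for row in y]                              # row effects
    b = [mean(y[i][j] for i in range(r)) - grand for j in range(c)]   # column effects
    # Single-df sum of squares for an interaction proportional to a_i * b_j.
    num = sum(a[i] * b[j] * y[i][j] for i in range(r) for j in range(c))
    ss_nonadd = num ** 2 / (sum(v * v for v in a) * sum(v * v for v in b))
    # Residual SS of the additive fit, less the non-additivity component.
    ss_add_resid = sum((y[i][j] - grand - a[i] - b[j]) ** 2
                       for i in range(r) for j in range(c))
    df_resid = (r - 1) * (c - 1) - 1
    ss_rest = ss_add_resid - ss_nonadd
    if ss_rest <= 0:
        return float("inf"), df_resid
    return ss_nonadd / (ss_rest / df_resid), df_resid
```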
29
Fig 3. Within-group SD (hybrid scale) vs. grand effect; 5 groups × 2 chips/group, 7.1K probesets
30
Fig 9. Between-EU variability as a percentage of total variability, $100 \cdot \mathrm{Var}_{\mathrm{between}} / (\mathrm{Var}_{\mathrm{between}} + \mathrm{Var}_{\mathrm{within}})$, vs. grand median. Panels: all probesets; probesets with mean > 50 (hybrid scale). P: known expressed; line: loess smooth. 15 human livers × 2 chips/liver, 1.5K probesets
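The plotted percentage needs estimates of the two variance components for each probeset. A minimal sketch, assuming a balanced layout (e.g., 2 chips per liver) and the classical one-way ANOVA method-of-moments estimators, Var_within = MS_within and Var_between = (MS_between - MS_within)/n truncated at zero; the slides do not say which estimator was used, and the function name is hypothetical.

```python
from statistics import mean


def between_eu_percent(data):
    """100 * Var_between / (Var_between + Var_within) for a balanced one-way layout.

    data: one list of replicate chip values per experimental unit (e.g. per liver),
    every list of the same length n >= 2. Variance components use the ANOVA
    method of moments; a negative between-EU estimate is truncated to 0.
    """
    k, n = len(data), len(data[0])
    unit_means = [mean(unit) for unit in data]
    grand = mean(unit_means)
    ms_between = n * sum((m - grand) ** 2 for m in unit_means) / (k - 1)
    ms_within = sum((v - m) ** 2 for unit, m in zip(data, unit_means)
                    for v in unit) / (k * (n - 1))
    var_between = max(0.0, (ms_between - ms_within) / n)
    total = var_between + ms_within
    return 100.0 * var_between / total if total > 0 else 0.0
```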
31
dChip vs. SAFER differences. Panels: 0 vs 100 (all probesets), 0 vs 100 (POS probesets), 25 vs 75 (all probesets), 25 vs 75 (POS probesets). POS: max p < 0.01 (n = 1049)
32
Positive rate vs. false positive rate (log scale), 25 vs 75; curves: SAFER, SAFER log, dChip. POS: max p < 0.01 (n = 1049); NEG: wrong sequence, min p > 0.5 (n = 270)