Title: Designing M-estimators for expression analysis: PLIER
1. Designing M-estimators for expression analysis: PLIER
- Earl Hubbell
- Principal Statistician
- Affymetrix
2. Outline
- Drive our intuition (basic data)
- Formalize the intuition
- Check functionality
- Look at results
- Bonus tricks and stunts
3. Wafers, Chips, and Features
(Figure: chips per wafer)
4. Expression Probes
(Figure: sequence, probes, perfect match / mismatch, chip)
5. Components of Stray Signal
6. Components of Bound Target Signal and Noise
7. Hybridization is mostly linear, with some stray signal and saturation
8. One probe (pair): PM-MM reduces bias
9. Probes not very informative about concentration near background!
10. Probes have systematic differences
11. Affinity compensates for first-order probe differences
12. Likelihood summarizes knowledge of expression
13. A pause before jumping into equations
- "Statistics, whatever their mathematical sophistication and elegance, cannot make bad variables into good ones." (H.T. Reynolds, Analysis of Nominal Data)
14. Fun with Statistics
- Money: What should I estimate?
- M-estimators: Statistics by Optimization
- Model: Linking Intensity to Concentration
- Mismatches: Faking Subtraction
- Mayhem: Does it work?
- More: Tricks!
15. Estimator Goals
- Handle zero/near-zero concentrations
- Handle arithmetic noise at low end
- Minimum bias (avoid sample trouble)
- can always variance stabilize later
- Resist outliers
- Avoid lots of parameters!
16. How to estimate?
- (-5 + 373 + 473)/3 ≈ 280.3 (mean)
- 280.3 is the value minimizing (x+5)^2 + (x-373)^2 + (x-473)^2
- median(-5, 373, 473) = 373
- 373 is the value minimizing |x+5| + |x-373| + |x-473| (numerical check below)
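A minimal numeric check (my code, not from the slides) that the mean minimizes the sum of squared deviations and the median minimizes the sum of absolute deviations for this example:

```python
# Brute-force check over a grid for the data (-5, 373, 473):
# the mean minimizes squared deviations, the median minimizes absolute deviations.
data = [-5, 373, 473]

def sum_sq(x):
    return sum((x - d) ** 2 for d in data)

def sum_abs(x):
    return sum(abs(x - d) for d in data)

grid = [i / 10 for i in range(-100, 6001)]
print(min(grid, key=sum_sq))   # ~280.3, the mean
print(min(grid, key=sum_abs))  # 373.0, the median
```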
17. M-estimator
- Optimizes some function of the data, sum(f(y, xi)), over y (spelled out below)
- y is then an estimate of some interesting property of the data (we hope)
- Looks like Maximum Likelihood estimates (but can be tuned for utility)
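In symbols (my notation; the special cases restate the mean/median example and the maximum-likelihood remark above):

```latex
\hat{y} = \arg\min_{y} \sum_{i} f(y, x_i),
\qquad
f(y,x_i) = (y - x_i)^2 \Rightarrow \hat{y} = \text{mean},
\quad
f(y,x_i) = |y - x_i| \Rightarrow \hat{y} = \text{median},
\quad
f(y,x_i) = -\log L(x_i \mid y) \Rightarrow \hat{y} = \text{MLE}.
```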
18. Designing the M-estimator: PLIER
- M-estimator minimizes some function of the data and the estimator(s)
- Our case: sum(f(PM, MM, a, c, z))
- Choose f to model reasonable error
- Choose the tail of f to handle outliers
- PLIER = Probe Logarithmic Intensity ERror
19. Assumptions (approximations?)
- Concentration never negative: c >= 0
- Linear link between true signal and concentration: T = a*c
- Background (not constant) adds to signal: I = T + B
- Background same for PM and MM
- Multiplicative intensity error: log(I) ~ Normal(log(T+B), s^2)
20. Assumption: Multiplicative Error
- Widely agreed that replicate observations of probes (PM, MM) are approximately log-normal
- I.e., PM varies by ~10% of PM
- Does not imply that derived quantities (PM-MM or PM-B) are also log-normal!
- I.e., PM-MM varies by ~7% of (PM+MM), not by 10% of (PM-MM)!
21. No obvious need for arithmetic noise for raw intensities
22. Simplified model: PM-MM
- PM = a*c + MM
- MM = a2*c + B
- If B can vary wildly (experiment-to-experiment, probe-to-probe), we are left with:
- PM - MM = a*c
- Incorporating multiplicative error:
- e1*PM - e2*MM = a*c
23. Key concept: good fits have small multiplicative errors
- Trying to minimize log(e1)^2 + log(e2)^2
- The actual minimum is a complicated function, so we (I) don't want to solve for it
- And we don't have to: M-estimators can be chosen for computational convenience
- Therefore, let log(e1)^2 = log(e2)^2
24. How good is the fit? 2 possible choices
- log(e1) = log(e2) (log transform): no solution for MM > PM, always a worse fit than
- log(e1) = -log(e2) (PLIER)
- => e = (a*c + sqrt((a*c)^2 + 4*PM*MM)) / (2*PM)
- log(e) exists for any PM, MM > 0 and any a, c
- The effective error model changes from arithmetic near zero to multiplicative far from zero (a numeric check of the closed form follows below)
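A quick numeric check (my code and example numbers, not from the slides): with e1 = e and e2 = 1/e, the closed form above is the positive root of e^2*PM - e*(a*c) - MM = 0, so e*PM - MM/e recovers a*c.

```python
import math

def plier_error(pm, mm, a, c):
    # Positive root of e^2*PM - e*(a*c) - MM = 0.
    y = a * c
    return (y + math.sqrt(y * y + 4.0 * pm * mm)) / (2.0 * pm)

pm, mm, a, c = 900.0, 150.0, 2.5, 300.0
e = plier_error(pm, mm, a, c)
print(e * pm - mm / e)  # prints 750.0, which equals a*c
```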
25. PM-MM: Goodness of fit
26. Define the center of f
- Residual: r = log(e)
- Under the log-normal assumption, fit for least r^2
- But we should fix the tails (where outliers show up and the approximation breaks down)
27. Robustness
- Want to discount outliers compared to sum-of-squares
- Off-the-shelf: the Geman-McClure transformation f(r, z) = r^2 / (1 + r^2/z)
- Looks like least-squares for small r
- Bounds the influence of a residual to at most z (formulas below)
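The transformation and its derivative (the influence function), in my notation:

```latex
f(r, z) = \frac{r^2}{1 + r^2/z},
\qquad
f'(r) = \frac{2r}{\left(1 + r^2/z\right)^2},
\qquad
f(r,z) \approx r^2 \;\text{for}\; r^2 \ll z,
\qquad
f(r,z) \to z \;\text{as}\; |r| \to \infty .
```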
28. Transformation f(r) and its Influence Function
29. PLIER: Goodness of fit for 11 probe pairs
30. PLIER on a t-shirt
- y = a*c
- e = (y + sqrt(y^2 + 4*PM*MM)) / (2*PM)
- r = log(e)
- f(r, z) = r^2 / (1 + r^2/z)
- argmin(sum(f(r, z))) over all a, c >= 0
- yields the PLIER estimates of affinity and concentration (sketched in code below)
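A minimal sketch of this objective in code (my Python, not the Affymetrix SDK; PM and MM are probe-by-chip matrices, a and c are the affinity and concentration vectors):

```python
import math

def plier_objective(PM, MM, a, c, z=1.0):
    """Sum of robustified log-errors over all (probe pair, chip) cells."""
    total = 0.0
    for i in range(len(a)):                  # probe pairs
        for j in range(len(c)):              # chips / experiments
            y = a[i] * c[j]
            e = (y + math.sqrt(y * y + 4.0 * PM[i][j] * MM[i][j])) / (2.0 * PM[i][j])
            r = math.log(e)
            total += r * r / (1.0 + r * r / z)   # Geman-McClure f(r, z)
    return total
```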
31. Optimizing (finding minima)
- Many ways to find the best fit
- Easiest to explain is cyclic coordinate ascent, aka "polishing" the data
- Can start anywhere (but best to start with a good guess)
32. Finding affinity/concentration: Don't I need to know one to start?
33. Observed PM/MM values
34. Compare observed to predicted (find where to improve predictions)
35. Polishing the table
- Guess initial values (a = 1.0, c = 0)
- Find the best concentrations (with current affinities)
- Find the best affinities (for current concentrations)
- Repeat until minimized (or bored)
- Remember: values are non-negative! (see the sketch below)
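A rough sketch of the polishing loop (my code; a coarse grid search over non-negative values stands in for the real one-dimensional solver, and the z and grid settings are illustrative only). Note that without an additional scale constraint on the affinities, the product a*c is only determined up to a constant factor.

```python
import math

def cell_cost(pm, mm, a, c, z=1.0):
    # Robustified log-error for one (probe pair, chip) cell.
    y = a * c
    e = (y + math.sqrt(y * y + 4.0 * pm * mm)) / (2.0 * pm)
    r = math.log(e)
    return r * r / (1.0 + r * r / z)

def polish(PM, MM, n_iter=20, z=1.0):
    n_probe, n_chip = len(PM), len(PM[0])
    a = [1.0] * n_probe                       # initial affinity guess
    c = [0.0] * n_chip                        # initial concentration guess
    grid = [k * 0.05 for k in range(400)]     # candidate non-negative values
    for _ in range(n_iter):
        for j in range(n_chip):               # best concentration, affinities fixed
            c[j] = min(grid, key=lambda v: sum(
                cell_cost(PM[i][j], MM[i][j], a[i], v, z) for i in range(n_probe)))
        for i in range(n_probe):              # best affinity, concentrations fixed
            a[i] = min(grid, key=lambda v: sum(
                cell_cost(PM[i][j], MM[i][j], v, c[j], z) for j in range(n_chip)))
    return a, c
```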
36. How does it work on real data?
- Gold-standard data generated by spiking in known transcripts
- Example is one of the transcripts (the 6th)
- Look at residuals to find outliers
37. Latin Square Experimental Design
(Table: spike groups by experiment)
38. Model fit: A = 1.0, C = 0.0
(Figure: C-values and A-values)
39. Residuals: Fit Concentration
40. Residuals: Fit Probes
41. Fit concentration and affinities: data fits except for outliers, which are clearly revealed
42. What are the outliers?
43. Final results (value)
44. Know everything (approximately)
(Figure: likelihood vs. estimate; values 1.0, 2.0, 4.0 and 0, .25, .5)
45. Trick: P/A calls by fit
46. Trick: models are good for residuals!
47. Optimization (harder to illustrate)
- Current implementation uses descent optimization (Newton-like)
- Start with a good initial guess (median polish)
- Improve by descent
- Try jumps to escape local minima
48. Evaluating performance
- MvA plots (unbiased/biased)
- Receiver Operating Characteristic (ROC)
- Area Under Curve (AUC) (global/stratified)
- Benchmark results
49. MvA plots
- Scatterplot turned 45 degrees
- Plotting A vs. B becomes:
- M = log(A) - log(B)
- A = (log(A) + log(B))/2 (average)
- Allows easy visualization of changes (sketched in code below)
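A minimal sketch (my code; log base 2 is assumed here, the slides do not specify a base):

```python
import math

def mva(sig_a, sig_b):
    m = math.log2(sig_a) - math.log2(sig_b)          # M: log-ratio (change)
    a = (math.log2(sig_a) + math.log2(sig_b)) / 2.0  # A: average log intensity
    return m, a

print(mva(400.0, 100.0))   # M = 2.0 (a 4-fold change), A ~ 7.6
```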
50. MvA (bias added for stabilization)
51. Receiver Operating Characteristic
- ROC curves measure the separation of distributions for two states
- Changed or unchanged between pair(s) of experiments
- Depends on the variation of the signal within an experiment, and the separation between the two states
- Note that just measuring variation or just measuring separation can be misleading!
- One popular method of defining "changed" is a fold-change threshold
- ROC curves can be summarized by the area under the curve (see the sketch below)
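A minimal sketch of the area-under-curve summary (my code and example numbers): AUC equals the probability that a randomly chosen "changed" score exceeds a randomly chosen "unchanged" score (the rank / Mann-Whitney form).

```python
def auc(changed, unchanged):
    # Fraction of (changed, unchanged) pairs ranked correctly; ties count half.
    wins = 0.0
    for c in changed:
        for u in unchanged:
            if c > u:
                wins += 1.0
            elif c == u:
                wins += 0.5
    return wins / (len(changed) * len(unchanged))

# Hypothetical fold-change scores for spiked (changed) vs. unchanged probesets.
print(auc([2.1, 1.8, 0.9], [0.2, 1.0, 0.4]))   # ~0.89
```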
52. Overall performance good (ROC)
53. Specific performance regimes are of interest
- Low, medium, high concentrations
- Relatively small fold-changes (2-fold, 4-fold)
- Thresholds defined by fold-change
- Thresholds defined by change relative to
variation (t-like statistic)
54. PLIER (+16) has excellent discrimination at the low end for differential change
(Panels: FC and TSTAT thresholds)
55. Output characteristics of some standard methods
- MAS 5.0: not variance stabilized (*), some bias, runs on single chips
- PLIER: not variance stabilized (*), minimal bias, reduced variance, runs on multiple chips
- RMA: variance stable, noticeable bias, low variance, runs on multiple chips
- (*) Can always apply a stabilizing transformation
56. PLIER unbiased: can always add bias to stabilize variance
57. MvA plots show the distribution of variation in the data, but also show (some of) the compression of the dynamic range
58. Drawing fold-change lines leads to ROC curves which are heavily biased towards results with stable variance
59. Differential change can be detected using the variation in the data by intensity
60. Thresholds based on the variance (t-stat analog) yield a better estimate of the relative performance of methods
61. Works fine on U133 too
62. Bonus M-estimator tricks
- Handle PM-only (PM-MM, PM-B) just fine by replacing the error model in f
- Play Bayesian games (affinity penalties, concentration penalties)
63. M-estimator: PM only
- PM - B = a*c
- (background estimate assumed perfect)
- e*PM - B = a*c
- e = (a*c + B)/PM
- Proceed in the same framework using e (sketched below)
- Note that B can be zero for a*c > 0
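A minimal sketch of the swapped-in error model (my code; B is an externally supplied background estimate, and everything downstream of r = log(e) is unchanged):

```python
import math

def pm_only_cost(pm, b, a, c, z=1.0):
    e = (a * c + b) / pm               # from e*PM - B = a*c
    r = math.log(e)
    return r * r / (1.0 + r * r / z)   # same Geman-McClure f as before
```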
64. PM-only with a global background is biased
65. Can play Bayesian games
- Probe affinities are likely to be log-normally distributed
- Add a penalty term to avoid overweighting any single probe
- Good when there is insufficient data
- sum(log(e)^2) + (penalty)*sum(log(a)^2)
- Can do the same for concentration (in symbols below)
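The penalized objective in symbols (my notation; lambda stands in for the penalty weight on the affinity prior):

```latex
\min_{a,\,c \,\ge\, 0}\;
\sum_{i,j} f\!\bigl(\log e_{ij},\, z\bigr)
\;+\; \lambda \sum_i \bigl(\log a_i\bigr)^2
```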
66. Bayesian prior on probes
Avoids overweighting probes in problem cases
Doesn't affect real data
67. PLIER
- M-estimators form a very flexible framework for analysis
- Can handle PM-B, PM-MM, and PM-only approaches in the same framework
- Handles zero/near-zero concentrations and affinities directly in the model
- Seems to produce good results
68. PLIER: obtaining an implementation
- The PLIER algorithm SDK is now available under a GPL open-source license.
- The code is available as C without Windows dependencies. Documentation is included at the site.
- All of us at Affymetrix hope that releasing PLIER in this manner promotes all of the values that the Bioconductor community embraces.
- http://www.affymetrix.com/support/developer/index.affx
69. Thanks
- David Kulp
- Sejal Shah
- Simon Cawley
- David Finkelstein
- Mike Lelivelt
- Teresa Webster
- Rui Mei
- Suzanne Dee
- Stefan Bekiranov
- Xiaojun Di
- Alex Cheung
- Steve Lincoln
- Many, many others!