Title: Significance Testing of Microarray Data
1Significance Testing of Microarray Data
- BIOS 691 Fall 2008
- Mark Reimers
- Dept. Biostatistics
2Outline
- Multiple Testing
- Family wide error rates
- False discovery rates
- Application to microarray data
- Practical issues: correlated errors
- Computing FDR by permutation procedures
- Conditioning t-scores
3Reality Check
- Goals of Testing
- To identify genes most likely to be changed or affected
- To prioritize candidates for focused follow-up studies
- To characterize functional changes consequent on changes in gene expression
- So in practice we don't need to be exact
- but we do need to be principled!
4Multiple comparisons
- Suppose no genes really changed
- (as if random samples from same population)
- 10,000 genes on a chip
- Each gene has a 5% chance of exceeding the threshold at a p-value of .05
- Type I error
- The test statistics for about 500 genes should exceed the .05 threshold by chance
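The arithmetic on this slide can be checked with a small simulation. This is a minimal sketch, assuming null p-values are uniform on [0, 1]; the function name and seed are illustrative choices, not part of the lecture.

```python
import random

def count_false_positives(n_genes=10_000, alpha=0.05, seed=0):
    """Count null p-values that fall below alpha (assumes uniform null p-values)."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_genes))

expected = 10_000 * 0.05            # expected number of false positives
observed = count_false_positives()  # one simulated chip with no real changes
print(expected, observed)
```

The observed count varies from run to run around the expected value, which is the point of the slide: hundreds of genes pass the threshold even when nothing has changed.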
5Distributions of p-values
[Figure: histograms of p-values; panels: Real Microarray Data, Random Data]
6Characterizing False Positives
- Family-Wide Error Rate (FWE)
- probability of at least one false positive arising from the selection procedure
- Strong control of FWE
- Bound on FWE independent of number changed
- False Discovery Rate
- Proportion of false positives arising from selection procedure
- ESTIMATE ONLY!
7Corrected p-Values for FWE
- Sidak (exact correction for independent tests)
- $p_i' = 1 - (1 - p_i)^N$ if all $p_i$ are independent
- $p_i' \approx 1 - (1 - N p_i) = N p_i$ gives Bonferroni
- Bonferroni correction
- $p_i' = N p_i$ if $N p_i < 1$, otherwise 1
- Expectation argument
- Still conservative if genes are co-regulated (correlated)
- Both are too conservative for array use!
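The two corrections can be compared numerically. A minimal sketch, with function names chosen here for illustration:

```python
def sidak(p, n_tests):
    """Sidak-corrected p-value: 1 - (1 - p)^N, exact for independent tests."""
    return 1.0 - (1.0 - p) ** n_tests

def bonferroni(p, n_tests):
    """Bonferroni-corrected p-value: N * p, capped at 1."""
    return min(1.0, n_tests * p)

# With 10,000 genes, a raw p-value of 1e-6 is needed to reach roughly 0.01.
for raw in (1e-6, 1e-5, 1e-3):
    print(raw, sidak(raw, 10_000), bonferroni(raw, 10_000))
```

For small raw p-values the two corrections nearly coincide; Sidak is always slightly less conservative than Bonferroni.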
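8Holm's FWER Procedure
- Order p-values $p_{(1)}, \dots, p_{(N)}$
- If $p_{(1)} < \alpha/N$, reject $H_{(1)}$, then
- If $p_{(2)} < \alpha/(N-1)$, reject $H_{(2)}$, then
- Let $k$ be the largest index such that $p_{(n)} < \alpha/(N-n+1)$ for all $n \le k$
- Reject $H_{(1)}, \dots, H_{(k)}$
- Then $P(\text{at least one false positive}) \le \alpha$
- Step-up procedure in the sense used here (starting from the smallest p-value and working up); most of the literature calls this a step-down procedure
- Proof doesn't depend on distributions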
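Holm's procedure can be sketched as a loop over the sorted p-values that stops at the first failure. The function name and the example p-values are illustrative assumptions.

```python
def holm_reject(pvalues, alpha=0.05):
    """Return indices of hypotheses rejected by Holm's procedure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # smallest p-value first
    rejected = []
    for rank, i in enumerate(order):                    # rank 0 is the smallest p-value
        if pvalues[i] < alpha / (n - rank):             # thresholds a/N, a/(N-1), ...
            rejected.append(i)
        else:
            break                                       # stop at the first failure
    return rejected

print(holm_reject([0.001, 0.04, 0.012, 0.03]))
```

In this example the third-smallest p-value (0.03) misses its threshold of 0.025, so only the two smallest p-values are rejected.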
9Hochberg's FWER Procedure
- Find the largest $k$ such that $p_{(k)} < \alpha/(N-k+1)$
- Then select genes (1) to (k)
- Step-down procedure in the sense used here, starting from the largest p-values and working down (usually called step-up in the literature)
- More powerful than Holm's procedure
- But requires assumptions: independence or positive dependence
- When there is one Type I error, there could be many
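Hochberg's procedure uses the same thresholds as Holm's but searches from the largest p-value downward. A minimal sketch with an illustrative function name and example data:

```python
def hochberg_reject(pvalues, alpha=0.05):
    """Return indices of hypotheses rejected by Hochberg's procedure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # smallest p-value first
    for k in range(n, 0, -1):                           # k = N, N-1, ..., 1
        if pvalues[order[k - 1]] < alpha / (n - k + 1):
            return order[:k]                            # reject the k smallest
    return []

print(sorted(hochberg_reject([0.001, 0.04, 0.012, 0.03])))
```

Here the largest p-value (0.04) already passes its threshold of 0.05, so all four hypotheses are rejected. Holm's procedure applied to the same four p-values rejects only two, which illustrates the gain in power.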
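10Simes' Lemma
- Suppose we order the p-values from N independent tests using random data
- $p_{(1)}, p_{(2)}, \dots, p_{(N)}$
- Pick a target threshold $\alpha$
- $P(p_{(1)} < \alpha/N \text{ or } p_{(2)} < 2\alpha/N \text{ or } p_{(3)} < 3\alpha/N \text{ or } \dots) = \alpha$
[Figure: illustration for N = 2 with axes marked at $\alpha/2$; area $A = \alpha/2 + \alpha/2 - \alpha^2/4 + \alpha^2/4 = \alpha$]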
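The lemma can be checked by Monte Carlo. This is a rough sketch, assuming independent uniform p-values; the number of tests, trial count, and seed are arbitrary illustrative choices.

```python
import random

def simes_event(pvalues, alpha):
    """True if any ordered p-value p_(k) falls below k * alpha / N."""
    n = len(pvalues)
    return any(p < (k + 1) * alpha / n for k, p in enumerate(sorted(pvalues)))

def estimate_simes_probability(n_tests=10, alpha=0.05, n_trials=20_000, seed=1):
    """Estimate the probability of the Simes event under independent null tests."""
    rng = random.Random(seed)
    hits = sum(
        simes_event([rng.random() for _ in range(n_tests)], alpha)
        for _ in range(n_trials)
    )
    return hits / n_trials

print(estimate_simes_probability())
```

The estimate should land close to the chosen threshold of 0.05, up to Monte Carlo error.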
11Simes FWER Procedure
- Pick a target threshold $\alpha$
- Order the p-values $p_{(1)}, p_{(2)}, \dots, p_{(N)}$
- If $p_{(1)} < \alpha/N$ then
- If $p_{(2)} < 2\alpha/N$ then
- ...
- if $p_{(k)} < k\alpha/N$
- Select the corresponding genes (1) to (k)
- Step-up procedure
- starting with the smallest p-values and working up
12 Truth vs. Decision
[Table: Truth vs. Decision]
13False Discovery Rate
- In genomic problems a few false positives are often acceptable.
- Want to trade off power vs. false positives
- Could control
- Expected number of false positives
- Expected proportion of false positives
- What to do with $E(V/R)$ when $R = 0$? ($V$ = number of false positives, $R$ = number of genes selected)
- Actual proportion of false positives
14Catalog of Type I Error Rates
- Per-family Error Rate
- $\mathrm{PFER} = E(V)$
- Per-comparison Error Rate
- $\mathrm{PCER} = E(V)/m$
- Family-wise Error Rate
- $\mathrm{FWER} = P(V \ge 1)$
- False Discovery Rate
- i) $\mathrm{FDR} = E(Q)$, where
- $Q = V/R$ if $R > 0$; $Q = 0$ if $R = 0$ (B-H)
- ii) $\mathrm{FDR} = E(V/R \mid R > 0)$ (Storey)
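For a single data set where the truth is known (for example, a simulation), the quantities V, R, and Q can be computed directly. A minimal sketch; the function name and the toy truth/decision vectors are illustrative assumptions.

```python
def false_discovery_proportion(is_null, is_selected):
    """Return (V, R, Q): false positives, selections, and Q = V/R (0 if R = 0)."""
    v = sum(1 for null, sel in zip(is_null, is_selected) if null and sel)
    r = sum(1 for sel in is_selected if sel)
    q = v / r if r > 0 else 0.0   # the B-H convention for R = 0
    return v, r, q

truth_is_null = [True, True, False, False, True]
selected = [True, False, True, True, False]
print(false_discovery_proportion(truth_is_null, selected))
```

The error rates in the catalog are expectations or probabilities of these quantities over repeated experiments, so a single (V, R, Q) triple is one draw, not the rate itself.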
15Benjamini-Hochberg
- Can't know what FDR is for a particular sample
- B-H suggest procedure specifying average FDR
- Order the p-values $p_{(1)}, p_{(2)}, \dots, p_{(N)}$
- If any $p_{(k)} < k\alpha/N$, take the largest such $k$
- Then select genes (1) to (k)
- q-value: smallest FDR at which the gene becomes significant
- NB: acceptable FDR may be much larger than acceptable p-value (e.g. 0.10)
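The selection rule and the q-values can be sketched together. The q-values below are computed as B-H adjusted p-values (a running minimum of N p(k)/k), which is one common convention and an assumption here; Storey's q-value additionally estimates the proportion of true nulls. The function name and example p-values are illustrative.

```python
def benjamini_hochberg(pvalues, alpha=0.10):
    """Return (selected indices, q-values) for the B-H procedure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # smallest p-value first
    k_max = 0
    for k in range(1, n + 1):                           # find the largest passing k
        if pvalues[order[k - 1]] < k * alpha / n:
            k_max = k
    selected = order[:k_max]
    qvalues = [0.0] * n
    running_min = 1.0
    for k in range(n, 0, -1):                           # from largest p-value down
        i = order[k - 1]
        running_min = min(running_min, n * pvalues[i] / k)
        qvalues[i] = running_min
    return selected, qvalues

picked, q = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6], alpha=0.05)
print(picked, q)
```

A gene is selected at level alpha exactly when its q-value is below alpha, so the q-values summarize the procedure for every possible choice of alpha at once.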
16Argument for B-H Method
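- If no true changes (all null H's hold)
- $Q = 1$ whenever the condition of Simes' lemma holds (every selection is a false positive), otherwise $Q = 0$
- $P(Q = 1) \le \alpha$, so $E(Q) \le \alpha$
- If all true changes (no null H's hold)
- $Q = 0 < \alpha$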
- Build argument by induction
17Storey's pFDR
- Storey argues that $E(Q \mid R > 0)$ is the quantity of real interest
- Sometimes quite different from B-H
18A Bayesian Interpretation
- Suppose nature generates true nulls with probability $p_0$ and false nulls with probability $p_1$
- Then $\mathrm{pFDR} = P(H \text{ true} \mid \text{procedure rejects } H)$
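The posterior probability follows from Bayes' rule. A minimal sketch, assuming every test rejects a true null with probability `alpha` and a false null with probability `power`, and that p1 = 1 - p0; these parameter names and the numbers are illustrative, not from the lecture.

```python
def posterior_null_given_rejection(p0, alpha, power):
    """P(H true | rejected) by Bayes' rule, with p1 = 1 - p0."""
    p1 = 1.0 - p0
    rejected_and_null = p0 * alpha   # true null, rejected by chance
    rejected_and_alt = p1 * power    # real change, detected
    return rejected_and_null / (rejected_and_null + rejected_and_alt)

# 90% of genes unchanged, per-test threshold 0.05, 80% power for changed genes
print(posterior_null_given_rejection(p0=0.9, alpha=0.05, power=0.8))
```

With these numbers about 36% of rejected genes are true nulls, even though each individual test has only a 5% false positive rate.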
19Storey's Procedure
20Practical Issues
- Actual proportion of false positives varies from data set to data set
- Mean FDR could be low but could be high in your data set
21The Effect of Correlation
- If all genes are uncorrelated, Sidak is exact
- If all genes were perfectly correlated
- p-values for one are p-values for all
- No multiple-comparisons correction needed
- Typical gene data is highly correlated
- First component of SVD may account for more than half the variance
- Distribution of p-values may differ from uniform
- True FDR more variable
22Symptoms of Correlated Tests
[Figure: P-value histograms]
23Distributions of numbers of p-values below threshold
- 10,000 genes
- 10,000 random drawings
- Left: uncorrelated; right: highly correlated
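The contrast between the two panels can be reproduced in a small simulation. This is a rough sketch under stated assumptions: each gene's z-score is a mix of one shared factor and independent noise (an equicorrelation model chosen here for illustration), and the sizes are reduced from the slide's 10,000 x 10,000 to keep the run short.

```python
import random
import statistics

def counts_below_threshold(n_genes, n_draws, rho, z_cut=1.96, seed=0):
    """Per draw, count null genes with two-sided p < 0.05 (|z| > 1.96)."""
    rng = random.Random(seed)
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5   # keeps each z-score at unit variance
    counts = []
    for _ in range(n_draws):
        shared = rng.gauss(0.0, 1.0)        # factor common to all genes in this draw
        z = (a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n_genes))
        counts.append(sum(abs(v) > z_cut for v in z))
    return counts

uncorrelated = counts_below_threshold(1000, 200, rho=0.0)
correlated = counts_below_threshold(1000, 200, rho=0.5)
print(statistics.mean(uncorrelated), statistics.pstdev(uncorrelated))
print(statistics.mean(correlated), statistics.pstdev(correlated))
```

Both settings give about 5% of genes below the threshold on average, but the count swings far more widely from draw to draw when the genes are correlated.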
24Permutation Tests
- We don't know the true distribution of gene expression measures within groups
- We simulate the distribution of samples drawn from the same group by pooling the two groups and randomly selecting two groups of the same sizes as those we are testing.
- Need at least 5 in each group to do this!
25Permutation Tests How To
- Suppose samples 1,2,...,10 are in group 1 and samples 11-20 are from group 2
- Permute 1,2,...,20: say
- 13,4,7,20,9,11,17,3,8,19,2,5,16,14,6,18,12,15,10
- Construct t-scores for each gene based on these groups
- Repeat many times to obtain null distribution of t-scores
- This will be close to a t-distribution if the original distribution has no outliers
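For a single gene, the recipe above can be sketched as follows. The pooled-variance t-score, the number of permutations, the add-one p-value estimate, and the example data are all illustrative assumptions rather than the lecture's exact choices.

```python
import random
import statistics

def t_score(x, y):
    """Two-sample t-score with pooled variance (assumes the values are not all equal)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = (pooled_var * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the t-score of one gene."""
    rng = random.Random(seed)
    observed = abs(t_score(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # random relabelling of samples
        pseudo_x, pseudo_y = pooled[:len(x)], pooled[len(x):]
        if abs(t_score(pseudo_x, pseudo_y)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)              # add-one estimate avoids p = 0

group1 = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9, 5.2]
group2 = [v + 1.0 for v in group1]                # clearly shifted group
print(permutation_p_value(group1, group2))
```

Because the p-value is a fraction of permutations, its smallest possible value is limited by the number of distinct relabellings, which is why very small groups cannot give usable permutation p-values.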
26Multivariate Permutation Tests
- Want a null distribution with same correlation structure as given data but no real differences between groups
- Permute group labels among samples
- redo tests with pseudo-groups
- repeat ad infinitum (10,000 times)
27Critiques of Permutations
- Variances of permuted values for truly changed genes are inflated
- artificially low p-values
- Permuted t-scores for many genes may be lower than from random samples from the same population
28Permutations for FWER
- Typically tests are correlated
- Extreme case: all tests highly correlated
- One test is proxy for all
- Corrected p-values are the same as uncorrected
- Intermediate case: some correlation
- Usually probability of obtaining a p-value by chance is in between Sidak and uncorrected values
29Westfall-Young Approach
- How often is smallest p-value less than a given p-value if tests are correlated to the same extent and all nulls are true?
- Construct permuted samples $n = 1, \dots, N$
- Determine p-values $p_n$ for each sample
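The idea can be sketched with a single-step adjustment. Note the swap: this sketch compares each gene's observed |t| with the largest |t| across all genes in each permutation (the "maxT" variant), standing in for the smallest p-value described above; the two agree when all genes share the same null distribution. The data layout, function names, and simulated data are illustrative assumptions.

```python
import random
import statistics

def t_score(x, y):
    """Two-sample t-score with pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = (pooled_var * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se

def westfall_young_maxt(data, n_group1, n_perm=500, seed=0):
    """Single-step maxT adjusted p-values; data[j] holds gene j across samples."""
    rng = random.Random(seed)
    observed = [abs(t_score(g[:n_group1], g[n_group1:])) for g in data]
    exceed = [0] * len(data)
    columns = list(range(len(data[0])))
    for _ in range(n_perm):
        rng.shuffle(columns)                  # one relabelling shared by all genes,
        g1, g2 = columns[:n_group1], columns[n_group1:]  # so correlation is preserved
        max_t = max(abs(t_score([g[c] for c in g1], [g[c] for c in g2]))
                    for g in data)
        for j, t in enumerate(observed):
            if max_t >= t:
                exceed[j] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]

rng = random.Random(42)
genes = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(20)]
genes[0] = [v + 10.0 if c < 6 else v for c, v in enumerate(genes[0])]  # one real change
adjusted = westfall_young_maxt(genes, n_group1=6)
print(adjusted[0], min(adjusted[1:]))
```

Permuting whole samples rather than each gene separately is what lets the null distribution inherit the correlation structure of the data.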
30Permutations for FDR - B-H Style
- Estimate p-values in the spirit of W-Y (but without multiple testing correction)
- t.j is the permutation p-value for gene j
- N is the number of tests
- I is the number of permutations
- Apply B-H procedure to these p-values
31Permutations FDR Korn Style
- B-H procedure only guarantees long-term behavior of method
- can be quite badly wrong
- Korn addresses issue of correlations
32Moderated Tests
- Many false positives with t-test arise because of under-estimate of variance
- Most gene variances are comparable
- (but not equal)
- Can we use pooled information about all?
33Stein's Lemma
- Whenever you have multiple variables with comparable distributions, you can make a more efficient joint estimator by shrinking the individual estimates toward the common mean
- Can formalize this using Bayesian analysis
- Suppose true values come from prior distribution
- Mean of all parameter estimates is a good
estimate of prior mean
34SAM
- Significance Analysis of Microarrays
- Uses a fudge factor to shrink individual SD estimates toward a common value
- $d_i = (\bar{x}_{1,i} - \bar{x}_{2,i}) / (s_i + s_0)$
- Patented!
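The effect of the fudge factor can be seen in a small sketch of the d-statistic. Here $s_i$ is taken to be the pooled standard error of the mean difference and $s_0$ is passed in by hand; both are assumptions for illustration, and this is not the SAM software's own rule for choosing $s_0$.

```python
import statistics

def sam_d(x1, x2, s0):
    """SAM-style statistic: mean difference over (standard error + fudge factor)."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    s = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5   # standard error of the difference
    return (statistics.mean(x1) - statistics.mean(x2)) / (s + s0)

# A gene with a tiny difference but an even tinier variance estimate
quiet1, quiet2 = [5.00, 5.01, 5.00], [5.02, 5.03, 5.02]
print(sam_d(quiet1, quiet2, s0=0.0))   # ordinary t-score: large in magnitude
print(sam_d(quiet1, quiet2, s0=0.1))   # moderated score: much smaller
```

With `s0=0.0` the statistic reduces to the ordinary t-score, so the fudge factor only matters for genes whose variance estimate is small relative to `s0`.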
35limma
- Empirical Bayes formalism
- Depends on prior estimate of number of genes changed
- Bioconductor's approach: free!