Multiple Hypothesis Testing in Microarray Data Analysis

Provided by: Jia110
1
Multiple Hypothesis Testing in Microarray Data
Analysis
  • Caixia Xu

2
Outline
  • Introduction
  • Multiple hypothesis testing framework
  • Analyses of Dataset GSE5245

3
INTRODUCTION
  • Special problems that arise from the multiplicity
    aspect include defining an appropriate Type I
    error rate and devising powerful multiple testing
    procedures that control this error rate and
    account for the joint distribution of the test
    statistics
  • Develop resampling-based single-step and stepwise
    multiple testing procedures (MTPs) for controlling
    a broad class of Type I error rates

4
Multiple hypothesis testing framework
5
Provide data set
  • Let $X_1, \ldots, X_n$ be a random sample of n
    independent and identically distributed (i.i.d.)
    random variables, $X \sim P \in \mathcal{M}$, where
    $\mathcal{M}$ is a set of possibly non-parametric
    distributions and $P$ is the data generating
    distribution
  • Pairs $(X_i, Y_i)$, $i = 1, \ldots, n$, formed by the
    expression profiles $X_i$ ($X$ is a vector of gene
    expression measurements) and responses or
    covariates $Y_i$

6
Define parameters of interest
  • Parameters are defined as arbitrary functions of
    the unknown data generating distribution $P$:
    $\psi = \Psi(P) = (\psi(m) : m = 1, \ldots, M)$,
    where $M$ is the number of genes
  • Parameters of interest include (functions of )
    means, differences in means, correlations, and
    regression parameters

7
(No Transcript)
8
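Define null and alternative hypotheses, $H_0(m)$ and
$H_1(m)$
  • M null hypotheses: $H_0(m) = \mathrm{I}(P \in \mathcal{M}(m))$,
    where the $\mathcal{M}(m) \subseteq \mathcal{M}$ are submodels
  • Alternative hypotheses: $H_1(m) = \mathrm{I}(P \notin \mathcal{M}(m))$
  • E.g. 1. $H_{0j}$: gene j has equal mean expression
    levels in K different types of tumors
  • E.g. 2. $H_{0j}$: gene j is not associated with
    survival for a particular type of cancer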
  • E.g. 3. H0j H0(1) H1(1) H0(m) 0

9
Specify test statistics, Tn(m)

  • Depend on the experimental design and the type of
    response or covariate
  • Binary covariates: t-statistics
  • E.g., two-sample Welch t-statistics
  • Polytomous responses: F-statistics
  • $\chi^2$ statistics or likelihood ratio statistics
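
As an illustration of the binary-covariate case, here is a minimal sketch of a gene-wise two-sample Welch t-statistic in base R. The data are simulated, the helper `welch_t` is a hypothetical name introduced only for this example, and the sign convention (group 1 minus group 0) is an assumption.

```r
# Toy data (made up): 4 genes (rows) x 6 samples (columns).
set.seed(1)
X <- matrix(rnorm(24), nrow = 4)
Y <- c(0, 0, 1, 1, 1, 1)  # binary covariate: one group label per sample

# Welch t-statistic for one gene: difference in group means divided by
# the unpooled standard error (the two variances are not assumed equal).
welch_t <- function(x, y) {
  x0 <- x[y == 0]
  x1 <- x[y == 1]
  (mean(x1) - mean(x0)) / sqrt(var(x1) / length(x1) + var(x0) / length(x0))
}

# One statistic per gene.
tstat <- apply(X, 1, welch_t, y = Y)

# Cross-check against stats::t.test, which uses the Welch form by default.
ref <- apply(X, 1, function(x) unname(t.test(x[Y == 1], x[Y == 0])$statistic))
print(all.equal(tstat, ref))
```

The cross-check against `t.test` is only there to confirm that the hand-written formula is the Welch form; in a real analysis one would compute the statistics with a dedicated package function.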

10
Estimate test statistics null distribution, $Q_{0n}$
  • In practice, the true distribution $Q_n = Q_n(P)$
    for the test statistics $T_n$ is unknown and is
    estimated by a null distribution $Q_0$
  • Resampling procedures (e.g., bootstrap and
    permutation) are particularly useful to estimate
    the null distribution
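
The permutation idea can be sketched in a few lines of base R. This is a simplified illustration on simulated data, not the estimator implemented in any particular package; `welch_t`, `null_stats` and the choice of B = 200 permutations are assumptions made for the example.

```r
set.seed(2)
M <- 5    # number of genes (toy value)
n <- 10   # number of samples (toy value)
X <- matrix(rnorm(M * n), nrow = M)   # simulated expression matrix, genes x samples
Y <- rep(c(0, 1), times = c(4, 6))    # simulated group labels

# Gene-wise Welch t-statistic (same hypothetical helper form as a plain t-test).
welch_t <- function(x, y) {
  x0 <- x[y == 0]
  x1 <- x[y == 1]
  (mean(x1) - mean(x0)) / sqrt(var(x1) / length(x1) + var(x0) / length(x0))
}

obs <- apply(X, 1, welch_t, y = Y)    # observed statistics, one per gene

# Permutation null distribution: shuffle the labels B times and recompute
# all M statistics each time. The result is an M x B matrix, so the joint
# behaviour of the statistics across genes is kept within each column.
B <- 200
null_stats <- replicate(B, apply(X, 1, welch_t, y = sample(Y)))

# Unadjusted two-sided permutation p-values, one per gene.
rawp <- rowMeans(abs(null_stats) >= abs(obs))
print(rawp)
```

A bootstrap version would resample samples with replacement instead of shuffling labels, and would then need to centre (and possibly scale) the resampled statistics so that they reflect the null hypotheses.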

11
(No Transcript)
12
Select Type I error rate
13
Select Type I error rate
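  • Notation: $V$ is the number of false positives
    (Type I errors), $R$ is the number of rejected
    hypotheses
  • The family-wise error rate (FWER):
    $\mathrm{FWER} = \Pr(V \geq 1)$
  • Generalized family-wise error rate (gFWER):
    $\mathrm{gFWER}(k) = \Pr(V > k) = 1 - F_V(k)$
  • The false discovery rate (FDR):
    $\mathrm{FDR} = E(V/R)$, with $V/R \equiv 0$ if $R = 0$
  • Tail probabilities for the proportion of false
    positives (TPPFP):
    $\mathrm{TPPFP}(q) = \Pr(V/R > q) = 1 - F_{V/R}(q)$,
    $q \in (0, 1)$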
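
To make the four definitions concrete, the following sketch computes their empirical counterparts over three hypothetical repeated experiments. All values are made up, and the names `truth`, `reject` and `prop` are introduced only for this example.

```r
# truth[m] is TRUE when null hypothesis m is true (made-up ground truth).
truth <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)

# One row per hypothetical experiment, one column per hypothesis.
reject <- rbind(
  c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),   # no false positive
  c(TRUE,  FALSE, FALSE, FALSE, TRUE, TRUE),   # one false positive
  c(TRUE,  TRUE,  FALSE, FALSE, TRUE, FALSE)   # two false positives
)

R <- rowSums(reject)                          # rejections per experiment
V <- rowSums(reject[, truth, drop = FALSE])   # rejected true nulls per experiment
prop <- ifelse(R > 0, V / R, 0)               # V/R, defined as 0 when R = 0

fwer  <- mean(V >= 1)      # empirical FWER
gfwer <- mean(V > 1)       # empirical gFWER(k) with k = 1
fdr   <- mean(prop)        # empirical FDR
tppfp <- mean(prop > 0.5)  # empirical TPPFP(q) with q = 0.5

print(c(FWER = fwer, gFWER = gfwer, FDR = fdr, TPPFP = tppfp))
```

In real data the truth vector is unknown, so these rates cannot be computed directly; the point of an MTP is to guarantee a bound on one of them.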

14
Apply MTP
  • Single-step procedures: equivalent adjustments
    are made for all $H_{0j}$; each null hypothesis is
    assessed using a rejection region that is
    independent of the tests of the other hypotheses.
  • Stepwise procedures: adjustments depend on the
    observed data; rejection regions are constructed
    based on the acceptance/rejection of other
    hypotheses, and the tests are applied to smaller
    nested subsets of hypotheses.

15
Apply MTP
  • maxT: based on ordered test statistics
  • minP: based on ordered p-values
  • Step-down: start with the hypotheses that
    correspond to the most significant test statistics
  • Step-up: start with the hypotheses that
    correspond to the least significant test statistics
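
The single-step versus step-down distinction can be illustrated with two classical marginal procedures available in base R: Bonferroni (single-step) and Holm (step-down). These are stand-ins chosen for simplicity, not the resampling-based maxT and minP procedures discussed here, and the raw p-values are made up.

```r
# Made-up unadjusted p-values for five hypotheses.
rawp <- c(0.001, 0.012, 0.02, 0.04, 0.3)

# Single-step: every p-value receives the same adjustment (multiply by 5).
adj_ss <- p.adjust(rawp, method = "bonferroni")

# Step-down: the most significant hypothesis is adjusted as in the
# single-step case, and later ones use a smaller multiplier because they
# are compared within a smaller nested subset of hypotheses.
adj_sd <- p.adjust(rawp, method = "holm")

print(rbind(single_step = adj_ss, step_down = adj_sd))

# Number of rejections at level 0.05 under each procedure.
n_ss <- sum(adj_ss <= 0.05)
n_sd <- sum(adj_sd <= 0.05)
print(c(single_step = n_ss, step_down = n_sd))
```

The step-down adjusted p-values are never larger than the single-step ones, which is why a step-down procedure rejects at least as many hypotheses as its single-step analogue.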

16
Analyses of Dataset GSE5245
  • Parameters of interest: means of gene j
    expression level
  • $H_0$: gene j has equal mean expression levels in
    two different treatments
  • Test statistics: two-sample Welch t-statistics

17
Bootstrap vs. permutation
Possible reasons the two null distributions give
different results: the sample sizes are not equal
for the two groups, or the expression measures may
have different covariance structures in the two
populations.
18
# Windows-only session settings
memory.limit(size = 4000)
windows(record = T)
library(affy)
cels <- list.celfiles("E:\\GSE5245")
cels
data <- ReadAffy(celfile.path = "E:\\GSE5245")
abatch.raw <- data
rma.eset <- rma(abatch.raw)
small.eset <- exprs(rma.eset)
# 0 for None; 1 for transient or persistent
T.cell <- c(rep(0, 5), rep(1, 11))
# filter: keep genes with cv between .7 and 10,
# and where 20% of samples had exprs. > 100
library(genefilter)
e.mat <- 2^small.eset
ffun <- filterfun(pOverA(0.20, 100), cv(0.7, 10))
t.fil <- genefilter(e.mat, ffun)
small.eset <- log2(e.mat[t.fil, ])
dim(e.mat)
dim(small.eset)
# [1] 532 16
19
library(multtest)
# permutation resampling
m <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
         method = 'sd.maxT', nulldist = 'perm', seed = 99)
summary(m)
m.diff <- m@adjp <= 0.05
sum(m.diff)
# [1] 55
r <- m@reject
sum(r)
# [1] 55
# bootstrap resampling
m3 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.maxT', nulldist = 'boot', seed = 1)
summary(m3)
r3 <- m3@reject
sum(r3)
# 37
m3.diff <- m3@adjp <= 0.05
sum(m3.diff)
# 37
20
FWER vs. gFWER vs. TPPFP vs. FDR
The results illustrate that stepwise MTPs are
less conservative than their single-step
analogues: the step-down MTPs reject at least as
many genes as their single-step analogues (e.g.,
37 vs. 35 for FWER maxT), with equal counts for
the minP procedures.
21
m1 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'ss.maxT', seed = 1)
summary(m1)
r1 <- m1@reject
sum(r1)
# 35
m2 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'ss.minP', seed = 1)
summary(m2)
r2 <- m2@reject
sum(r2)
# 129
m3 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.maxT', seed = 1)
summary(m3)
r3 <- m3@reject
sum(r3)
# 37
m4 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.minP', seed = 1)
summary(m4)
r4 <- m4@reject
sum(r4)
# 129
22
m5 <- MTP(X = small.eset, Y = T.cell, typeone = 'gfwer', B = 100,
          method = 'ss.maxT', k = 5, seed = 1)
summary(m5)
r5 <- m5@reject
sum(r5)
# 40
m6 <- MTP(X = small.eset, Y = T.cell, typeone = 'gfwer', B = 100,
          method = 'ss.minP', k = 5, seed = 1)
summary(m6)
r6 <- m6@reject
sum(r6)
# 134
m7 <- MTP(X = small.eset, Y = T.cell, typeone = 'gfwer', B = 100,
          method = 'sd.maxT', k = 5, seed = 1)
summary(m7)
r7 <- m7@reject
sum(r7)
# 42
m8 <- MTP(X = small.eset, Y = T.cell, typeone = 'gfwer', B = 100,
          method = 'sd.minP', k = 5, seed = 1)
summary(m8)
r8 <- m8@reject
sum(r8)
# 134
23
m9 <- MTP(X = small.eset, Y = T.cell, typeone = 'tppfp', B = 100,
          method = 'ss.maxT', q = 0.1, seed = 1)
summary(m9)
r9 <- m9@reject
sum(r9)
# 38
m10 <- MTP(X = small.eset, Y = T.cell, typeone = 'tppfp', B = 100,
           method = 'ss.minP', q = 0.1, seed = 1)
summary(m10)
r10 <- m10@reject
sum(r10)
# 143
m11 <- MTP(X = small.eset, Y = T.cell, typeone = 'tppfp', B = 100,
           method = 'sd.maxT', q = 0.1, seed = 1)
summary(m11)
r11 <- m11@reject
sum(r11)
# 41
m12 <- MTP(X = small.eset, Y = T.cell, typeone = 'tppfp', B = 100,
           method = 'sd.minP', q = 0.1, seed = 1)
summary(m12)
r12 <- m12@reject
sum(r12)
# 143
24
m13 <- MTP(X = small.eset, Y = T.cell, typeone = 'fdr', B = 100,
           method = 'ss.maxT', seed = 1)
summary(m13)
r13 <- m13@reject
sum(r13)
# 25
m14 <- MTP(X = small.eset, Y = T.cell, typeone = 'fdr', B = 100,
           method = 'ss.minP', seed = 1)
summary(m14)
r14 <- m14@reject
sum(r14)
# 132
m15 <- MTP(X = small.eset, Y = T.cell, typeone = 'fdr', B = 100,
           method = 'sd.maxT', seed = 1)
summary(m15)
r15 <- m15@reject
sum(r15)
# 25
m16 <- MTP(X = small.eset, Y = T.cell, typeone = 'fdr', B = 100,
           method = 'sd.minP', seed = 1)
summary(m16)
r16 <- m16@reject
sum(r16)
# 132
25
How many genes are in common among these methods?
  • Ratio = (number of genes rejected in common by
    two methods) / (number of genes rejected by the
    second method)
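
A small sketch of this ratio on two made-up rejection vectors shows that it is not symmetric in the two methods. The function name `overlap_ratio` and the toy vectors are assumptions introduced for the example.

```r
# Made-up rejection indicators for six genes under two methods.
r_a <- c(TRUE, TRUE,  FALSE, TRUE, FALSE, FALSE)
r_b <- c(TRUE, FALSE, FALSE, TRUE, TRUE,  TRUE)

# Genes rejected by both methods, as a fraction of those rejected by the
# second method.
overlap_ratio <- function(first, second) sum(first & second) / sum(second)

ratio_ab <- overlap_ratio(r_a, r_b)  # 2 shared genes out of 4 rejected by r_b
ratio_ba <- overlap_ratio(r_b, r_a)  # 2 shared genes out of 3 rejected by r_a
print(c(ratio_ab, ratio_ba))
```

Because the denominator is the second method's count, a value of 1 means every gene rejected by the second method is also rejected by the first.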

26
(No Transcript)
27
(No Transcript)
28
mbt.mat <- cbind(r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11,
                 r12, r13, r14, r15, r16)
res <- matrix(ncol = 16, nrow = 16)
# res[i, j]: genes rejected by both methods i and j,
# as a fraction of the genes rejected by method j
for (i in 1:16) for (j in 1:16)
  res[i, j] <- sum(mbt.mat[, i] & mbt.mat[, j]) / sum(mbt.mat[, j])
res
library(cluster)
library(RColorBrewer)
hmcol <- colorRampPalette(brewer.pal(10, "RdBu"))(256)
colnames(res) <- rownames(res) <-
  c("fwer.ss.maxT", "fwer.ss.minP", "fwer.sd.maxT", "fwer.sd.minP",
    "gfwer.ss.maxT", "gfwer.ss.minP", "gfwer.sd.maxT", "gfwer.sd.minP",
    "tppfp.ss.maxT", "tppfp.ss.minP", "tppfp.sd.maxT", "tppfp.sd.minP",
    "fdr.ss.maxT", "fdr.ss.minP", "fdr.sd.maxT", "fdr.sd.minP")
heatmap(res, Rowv = NA, Colv = NA)
29
The influence of k in gFWER
The number of rejected hypotheses increases
linearly with the number k of allowed false
positives.
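
The linear relationship follows from the augmentation idea: starting from a FWER-controlling procedure, a gFWER(k) procedure additionally rejects the k next most significant hypotheses. The sketch below assumes this k-shift form of the augmentation; `augment_gfwer` and `gfwer_adjp` are hypothetical helpers written for illustration, not functions from multtest.

```r
# Number of rejections after augmentation: the FWER count plus k, capped
# at the total number of hypotheses M.
augment_gfwer <- function(n_fwer, k, M) pmin(n_fwer + k, M)

# Applied to the FWER counts reported on slide 21 (35, 37, 129) with k = 5
# and M = 532 genes, this reproduces the counts on slide 22.
counts <- augment_gfwer(c(35, 37, 129), k = 5, M = 532)
print(counts)

# Adjusted p-values under the same assumption: the k most significant
# hypotheses get 0, and the j-th ordered hypothesis (j > k) inherits the
# FWER adjusted p-value of the (j - k)-th ordered hypothesis.
gfwer_adjp <- function(adjp, k) {
  ord <- order(adjp)
  shifted <- c(rep(0, min(k, length(adjp))), sort(adjp))[seq_along(adjp)]
  out <- numeric(length(adjp))
  out[ord] <- shifted
  out
}

toy <- c(0.2, 0.01, 0.5, 0.03)      # made-up FWER adjusted p-values
toy_k1 <- gfwer_adjp(toy, k = 1)
print(toy_k1)
```

With the toy vector, two hypotheses are rejected at level 0.05 before the shift and three after it, which is the "plus k" effect in miniature.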
30
The influence of k in gFWER
k <- c(5, 10, 50, 100)
m3 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.minP', seed = 1)
summary(m3)
cyto.gfwer <- fwer2gfwer(adjp = m3@adjp, k = k)
comp.gfwer <- cbind(m3@adjp, cyto.gfwer)
mtps <- paste("gFWER(", c(0, k), ")", sep = "")
mt.plot(adjp = comp.gfwer, teststat = m3@statistic, proc = mtps,
        leg = c(0.5, 200), col = 1:5, lty = 1:5, lwd = 3)
title("Comparison of gFWER(k)-controlling AMTPs based on SD minP MTP")
31
The influence of q in TPPFP
Result: the number of rejections increases
with the allowed proportion q of false positives,
though not linearly.
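
The non-linearity can be seen from the augmentation rule for TPPFP: starting from R0 rejections of a FWER-controlling procedure, one may add the largest number a of extra rejections such that a / (R0 + a) <= q, that is a = floor(q * R0 / (1 - q)). The sketch below assumes this form; `augment_tppfp` is a hypothetical helper written for illustration, not a function from multtest.

```r
# Number of rejections after TPPFP augmentation, capped at the total
# number of hypotheses M.
augment_tppfp <- function(n_fwer, q, M) {
  extra <- floor(q * n_fwer / (1 - q))  # largest a with a / (n_fwer + a) <= q
  pmin(n_fwer + extra, M)
}

# Applied to the FWER counts reported on slide 21 (35, 37, 129) with
# q = 0.1 and M = 532 genes, this reproduces the counts on slide 23.
counts_q01 <- augment_tppfp(c(35, 37, 129), q = 0.1, M = 532)
print(counts_q01)

# Illustrative computation only (not a result from the slides): the same
# starting count of 129 under increasing q.
counts_by_q <- augment_tppfp(129, q = c(0.05, 0.1, 0.5), M = 532)
print(counts_by_q)
```

The number of extra rejections grows like q / (1 - q), so doubling q more than doubles the additions, in contrast with the gFWER case where each unit of k adds exactly one rejection.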
32
q <- c(0.05, 0.1, 0.5)
m3 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.minP', seed = 1)
summary(m3)
cyto.tppfp <- fwer2tppfp(adjp = m3@adjp, q = q)
comp.tppfp <- cbind(m3@adjp, cyto.tppfp)
mtps <- c("FWER", paste("TPPFP(", q, ")", sep = ""))
mt.plot(adjp = comp.tppfp, teststat = m3@statistic, proc = mtps,
        leg = c(0.5, 200), col = 1:4, lty = 1:4, lwd = 3)
title("Comparison of TPPFP(q)-controlling AMTPs based on SD minP MTP")
33
Summary
  • control of an appropriate and precisely defined
    Type I error rate
  • control of this error rate under any combination
    of true and false null hypotheses
  • accounting for the joint distribution of the test
    statistics
  • reporting the results in terms of adjusted
    p-values
  • availability of efficient resampling algorithms
    for nonparametric procedures

34
References
  • Chapters 15 of Bioconductor Monograph
  • Sandrine Dudoit et al., Multiple Hypothesis
    Testing in Microarray Data Analysis
  • Yongchao Ge et al., Resampling-based multiple
    testing for microarray data analysis
  • Katherine S. Pollard et al., Applications of
    Multiple Testing Procedures: ALL Data
  • Sandrine Dudoit et al., Statistical Science 18(1),
    71-103