Multiple Hypothesis Testing in Microarray Data Analysis - PowerPoint PPT Presentation


Provided by: Jia110


Transcript and Presenter's Notes

Title: Multiple Hypothesis Testing in Microarray Data Analysis

1
Multiple Hypothesis Testing in Microarray Data
Analysis

Caixia Xu

2
Outline

Introduction
Multiple hypothesis testing framework
Analyses of Dataset GSE5245

3
INTRODUCTION

Special problems that arise from the multiplicity
aspect include defining an appropriate Type I
error rate and devising powerful multiple testing
procedures that control this error rate and
account for the joint distribution of the test
statistics.
Approach: resampling-based single-step and stepwise
multiple testing procedures (MTPs) that control
a broad class of Type I error rates.

4
Multiple hypothesis testing framework
5
Provide data set

Let $X_1, \ldots, X_n$ be a random sample of n
independent and identically distributed (i.i.d.)
random variables, $X \sim P \in \mathcal{M}$, where
$\mathcal{M}$ is a set of possibly non-parametric
distributions and P is the data generating distribution
The data are pairs $(X_i, Y_i)$, $i = 1, \ldots, n$,
formed by the expression profiles $X_i$ (a vector of
gene expression measurements) and responses or
covariates $Y_i$

6
Define parameters of interest

Parameters are defined as arbitrary functions of
the unknown data generating distribution P:
$\Psi(P) = \psi = (\psi(m) : m = 1, \ldots, M)$,
where M is the number of genes
Parameters of interest include (functions of)
means, differences in means, correlations, and
regression parameters

7
(No Transcript)
8
Define null and alternative hypotheses, H0(m) and
H1(m)

M null hypotheses $H_0(m) = I(P \in \mathcal{M}(m))$,
for submodels $\mathcal{M}(m) \subseteq \mathcal{M}$
Alternative hypotheses $H_1(m) = I(P \notin \mathcal{M}(m))$
E.g. 1. H0j: gene j has equal mean expression
levels in K different types of tumors
E.g. 2. H0j: gene j is not associated with
survival for a particular type of cancer
E.g. 3. H0j H0(1) H1(1) H0(m) 0

9
Specify test statistics, Tn(m)

The choice depends on the experimental design and
the type of response or covariate:
binary covariates: t-statistics,
e.g. two-sample Welch t-statistics
polytomous responses: F-statistics
$\chi^2$ statistics or likelihood ratio statistics
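
As a rough illustration of the binary-covariate case, the two-sample Welch t-statistic can be computed for every gene at once in base R. This is a minimal sketch on simulated data; the function name `welch_t` and the toy matrix are assumptions made for the example, not part of the original analysis.

```r
# Row-wise two-sample Welch t-statistics (hypothetical helper).
# X: genes-by-samples numeric matrix; y: 0/1 group label per sample.
welch_t <- function(X, y) {
  g0 <- X[, y == 0, drop = FALSE]
  g1 <- X[, y == 1, drop = FALSE]
  n0 <- ncol(g0)
  n1 <- ncol(g1)
  m0 <- rowMeans(g0)
  m1 <- rowMeans(g1)
  # Unbiased per-gene variances within each group
  v0 <- rowSums((g0 - m0)^2) / (n0 - 1)
  v1 <- rowSums((g1 - m1)^2) / (n1 - 1)
  # Unequal-variance standard error in the denominator
  (m1 - m0) / sqrt(v0 / n0 + v1 / n1)
}

set.seed(1)
X <- matrix(rnorm(20 * 16), nrow = 20)  # toy data: 20 genes, 16 samples
y <- c(rep(0, 5), rep(1, 11))           # unequal group sizes, 5 vs. 11
tstat <- welch_t(X, y)
```

For a single gene the result should agree with `t.test(X[1, y == 1], X[1, y == 0])$statistic`, since `t.test` uses the Welch form by default.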

10
Estimate test statistics null distribution, Q0n

In practice, the true distribution $Q_n = Q_n(P)$
of the test statistics $T_n$ is unknown and is
replaced by a null distribution $Q_0$ (or an
estimate $Q_{0n}$ of it)
Resampling procedures (e.g., bootstrap and
permutation) are particularly useful for estimating
the null distribution
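
The permutation idea can be sketched in a few lines of base R: shuffle the group labels B times and recompute the statistic for every gene each time, so that each column of the result is one draw from the estimated joint null distribution. The helpers `mean_diff` and `perm_null` are hypothetical names, and the difference in means is only a stand-in statistic for the example.

```r
# Difference in group means per gene; a simple stand-in test statistic.
mean_diff <- function(X, y) {
  rowMeans(X[, y == 1, drop = FALSE]) - rowMeans(X[, y == 0, drop = FALSE])
}

# Permutation estimate of the joint null distribution.
# Returns a genes-by-B matrix, one column per permuted label vector.
perm_null <- function(X, y, B, stat = mean_diff) {
  replicate(B, stat(X, sample(y)))
}

set.seed(1)
X <- matrix(rnorm(20 * 16), nrow = 20)  # toy data: 20 genes, 16 samples
y <- c(rep(0, 5), rep(1, 11))
Q0 <- perm_null(X, y, B = 100)
dim(Q0)
```

Keeping whole columns together, rather than resampling each gene separately, is what preserves the dependence between genes in the estimated null distribution.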

11
(No Transcript)
12
Select Type I error rate
13
Select Type I error rate

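With V the number of false positives (Type I
errors) and R the number of rejected hypotheses:
The family-wise error rate (FWER):
$\mathrm{FWER} = \Pr(V \geq 1)$.
Generalized family-wise error rate (gFWER):
$\mathrm{gFWER}(k) = \Pr(V > k) = 1 - F_V(k)$.
The false discovery rate (FDR):
$\mathrm{FDR} = E(V/R)$, with $V/R = 0$ when $R = 0$.
Tail probabilities for the proportion of false
positives (TPPFP):
$\mathrm{TPPFP}(q) = \Pr(V/R > q) = 1 - F_{V/R}(q)$, $q \in (0, 1)$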
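
To make the four definitions concrete, the sketch below estimates each rate from a handful of made-up (V, R) pairs, one pair per hypothetical repetition of an experiment. The function name `error_rates` and the numbers are assumptions for illustration only.

```r
# Empirical versions of the four Type I error rates.
# V: false positives per experiment; R: rejections per experiment.
error_rates <- function(V, R, k = 0, q = 0.1) {
  prop <- ifelse(R > 0, V / R, 0)  # convention: V/R = 0 when R = 0
  c(FWER  = mean(V >= 1),
    gFWER = mean(V > k),
    FDR   = mean(prop),
    TPPFP = mean(prop > q))
}

V <- c(0, 1, 0, 2, 0)  # made-up counts of false positives
R <- c(3, 4, 0, 5, 1)  # made-up counts of rejections
rates <- error_rates(V, R, k = 1, q = 0.3)
rates
```

With these toy counts the proportions V/R are 0, 0.25, 0, 0.4 and 0, so the FDR is their mean while the TPPFP only counts the experiments whose proportion exceeds q.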

14
Apply MTP

Single-step procedures: equivalent adjustments are
made for all H0j; each null hypothesis is assessed
using a rejection region that is independent of the
tests of the other hypotheses.
Stepwise procedures: adjustments depend on the
observed data; rejection regions are constructed
based on the acceptance/rejection of other
hypotheses, and the tests are applied to smaller
nested subsets of hypotheses.

15
Apply MTP

maxT: based on ordered test statistics
minP: based on ordered p-values
step-down: start with the most significant
test statistics
step-up: start with the least significant
test statistics
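
The difference between single-step and step-down maxT can be sketched from a matrix of null statistics (genes in rows, resampling draws in columns). This is an illustrative base-R version under the assumption of two-sided tests; `maxT_adjp` is a hypothetical helper, not the routine used by the multtest package.

```r
# Single-step and step-down maxT adjusted p-values (sketch).
# tstat: observed statistics; Q0: genes-by-B matrix of null statistics.
maxT_adjp <- function(tstat, Q0) {
  a  <- abs(tstat)
  A0 <- abs(Q0)
  M  <- length(a)
  # Single-step: compare every statistic with the overall null maximum
  overall_max <- apply(A0, 2, max)
  ss_adj <- sapply(a, function(x) mean(overall_max >= x))
  # Step-down: order by decreasing significance and take the maximum
  # only over the hypotheses not yet passed
  o <- order(a, decreasing = TRUE)
  sd_ord <- numeric(M)
  for (i in seq_len(M)) {
    rest_max <- apply(A0[o[i:M], , drop = FALSE], 2, max)
    sd_ord[i] <- mean(rest_max >= a[o[i]])
  }
  sd_ord <- cummax(sd_ord)  # enforce monotone adjusted p-values
  sd_adj <- numeric(M)
  sd_adj[o] <- sd_ord
  list(ss = ss_adj, sd = sd_adj)
}

set.seed(1)
Q0 <- matrix(rnorm(10 * 200), nrow = 10)  # toy null draws
tstat <- c(4, 3, rnorm(8))                # two clearly large statistics
adj <- maxT_adjp(tstat, Q0)
```

Because each step-down maximum is taken over a subset of the hypotheses, the step-down adjusted p-values can never exceed the single-step ones, which is why step-down procedures reject at least as many hypotheses.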

16
Analyses of Dataset GSE5245

Parameters of interest: means of the expression
level of gene j
H0: gene j has equal mean expression levels in
the two treatments
Test statistics: two-sample Welch t-statistics

17
Bootstrap vs. permutation
Why permutation and bootstrap results can differ:
the sample sizes are not equal for the two groups,
or the expression measures may have different
covariance structures in the two populations
18
memory.limit(size = 4000)   # Windows-only setting
windows(record = TRUE)      # Windows-only graphics device
library(affy)
cels <- list.celfiles("E:\\GSE5245")
cels
data <- ReadAffy(celfile.path = "E:\\GSE5245")
abatch.raw <- data
rma.eset <- rma(abatch.raw)
small.eset <- exprs(rma.eset)
T.cell <- c(rep(0, 5), rep(1, 11))  # 0 for None; 1 for transient or persistent
# filter: keep genes with cv between .7 and 10,
# and where 20% of samples had exprs. > 100
library(genefilter)
e.mat <- 2^small.eset               # back to the raw intensity scale
ffun <- filterfun(pOverA(0.20, 100), cv(0.7, 10))
t.fil <- genefilter(e.mat, ffun)
small.eset <- log2(e.mat[t.fil, ])
dim(e.mat)
dim(small.eset)
# [1] 532  16
19
library(multtest)
# permutation resampling
m <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
         method = 'sd.maxT', nulldist = 'perm', seed = 99)
summary(m)
m.diff <- m@adjp < 0.05
sum(m.diff)  # [1] 55
r <- m@reject
sum(r)       # [1] 55
# bootstrap resampling
m3 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.maxT', nulldist = 'boot', seed = 1)
summary(m3)
r3 <- m3@reject
sum(r3)       # 37
m3.diff <- m3@adjp < 0.05
sum(m3.diff)  # 37
20
FWER vs. gFWER vs. TPPFP vs. FDR
The results illustrate that stepwise MTPs are
no more conservative than their single-step
analogues: the step-down procedures reject at
least as many genes (e.g. 37 vs. 35 for
FWER-controlling maxT), while the counts tie
for minP.
21
# FWER: single-step vs. step-down, maxT vs. minP
m1 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'ss.maxT', seed = 1)
summary(m1)
r1 <- m1@reject
sum(r1)  # 35
m2 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'ss.minP', seed = 1)
summary(m2)
r2 <- m2@reject
sum(r2)  # 129
m3 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.maxT', seed = 1)
summary(m3)
r3 <- m3@reject
sum(r3)  # 37
m4 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.minP', seed = 1)
summary(m4)
r4 <- m4@reject
sum(r4)  # 129
22
# gFWER with k = 5 allowed false positives
m5 <- MTP(X = small.eset, Y = T.cell, typeone = 'gfwer', B = 100,
          method = 'ss.maxT', k = 5, seed = 1)
summary(m5)
r5 <- m5@reject
sum(r5)  # 40
m6 <- MTP(X = small.eset, Y = T.cell, typeone = 'gfwer', B = 100,
          method = 'ss.minP', k = 5, seed = 1)
summary(m6)
r6 <- m6@reject
sum(r6)  # 134
m7 <- MTP(X = small.eset, Y = T.cell, typeone = 'gfwer', B = 100,
          method = 'sd.maxT', k = 5, seed = 1)
summary(m7)
r7 <- m7@reject
sum(r7)  # 42
m8 <- MTP(X = small.eset, Y = T.cell, typeone = 'gfwer', B = 100,
          method = 'sd.minP', k = 5, seed = 1)
summary(m8)
r8 <- m8@reject
sum(r8)  # 134
23
# TPPFP with allowed proportion q = 0.1 of false positives
m9 <- MTP(X = small.eset, Y = T.cell, typeone = 'tppfp', B = 100,
          method = 'ss.maxT', q = 0.1, seed = 1)
summary(m9)
r9 <- m9@reject
sum(r9)   # 38
m10 <- MTP(X = small.eset, Y = T.cell, typeone = 'tppfp', B = 100,
           method = 'ss.minP', q = 0.1, seed = 1)
summary(m10)
r10 <- m10@reject
sum(r10)  # 143
m11 <- MTP(X = small.eset, Y = T.cell, typeone = 'tppfp', B = 100,
           method = 'sd.maxT', q = 0.1, seed = 1)
summary(m11)
r11 <- m11@reject
sum(r11)  # 41
m12 <- MTP(X = small.eset, Y = T.cell, typeone = 'tppfp', B = 100,
           method = 'sd.minP', q = 0.1, seed = 1)
summary(m12)
r12 <- m12@reject
sum(r12)  # 143
24
# FDR
m13 <- MTP(X = small.eset, Y = T.cell, typeone = 'fdr', B = 100,
           method = 'ss.maxT', seed = 1)
summary(m13)
r13 <- m13@reject
sum(r13)  # 25
m14 <- MTP(X = small.eset, Y = T.cell, typeone = 'fdr', B = 100,
           method = 'ss.minP', seed = 1)
summary(m14)
r14 <- m14@reject
sum(r14)  # 132
m15 <- MTP(X = small.eset, Y = T.cell, typeone = 'fdr', B = 100,
           method = 'sd.maxT', seed = 1)
summary(m15)
r15 <- m15@reject
sum(r15)  # 25
m16 <- MTP(X = small.eset, Y = T.cell, typeone = 'fdr', B = 100,
           method = 'sd.minP', seed = 1)
summary(m16)
r16 <- m16@reject
sum(r16)  # 132
25
How many genes are in common among these methods?

Ratio = (the number of genes rejected in common
between two methods) / (the number of genes
rejected in the second method)
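
The ratio defined above can be computed for all pairs of methods at once. The following is a small sketch on a toy rejection matrix; `overlap_ratio` and the two made-up methods `a` and `b` are assumptions for the example.

```r
# Pairwise overlap ratios between methods (hypothetical helper).
# rej: logical genes-by-methods matrix of rejection indicators.
overlap_ratio <- function(rej) {
  common <- crossprod(rej * 1)         # genes rejected by both methods
  sweep(common, 2, colSums(rej), "/")  # divide column j by method j's total
}

rej <- cbind(a = c(TRUE, TRUE, FALSE, FALSE),
             b = c(TRUE, TRUE, TRUE, FALSE))
res <- overlap_ratio(rej)
res
```

Entry `res[i, j]` takes method j as the "second method", so the matrix is not symmetric: here every gene rejected by `a` is also rejected by `b`, but only two of the three genes rejected by `b` are rejected by `a`.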

26
(No Transcript)
27
(No Transcript)
28
mbt.mat <- cbind(r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11,
                 r12, r13, r14, r15, r16)
res <- matrix(ncol = 16, nrow = 16)
# res[i, j]: genes rejected by both methods i and j,
# divided by the genes rejected by method j
for (i in 1:16)
  for (j in 1:16)
    res[i, j] <- sum(mbt.mat[, i] & mbt.mat[, j]) / sum(mbt.mat[, j])
res
library(cluster)
library(RColorBrewer)
hmcol <- colorRampPalette(brewer.pal(10, "RdBu"))(256)
colnames(res) <- rownames(res) <-
  c("fwer.ss.maxT", "fwer.ss.minP", "fwer.sd.maxT", "fwer.sd.minP",
    "gfwer.ss.maxT", "gfwer.ss.minP", "gfwer.sd.maxT", "gfwer.sd.minP",
    "tppfp.ss.maxT", "tppfp.ss.minP", "tppfp.sd.maxT", "tppfp.sd.minP",
    "fdr.ss.maxT", "fdr.ss.minP", "fdr.sd.maxT", "fdr.sd.minP")
# the transcript reads "res1" here; res is the matrix computed above
heatmap(res, Rowv = NA, Colv = NA)
29
the influence of k in gFWER
The number of rejected hypotheses increases
linearly with the number k of allowed false
positives: the augmentation procedure adds the
next k most significant hypotheses to the FWER
rejections
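
The linear growth follows from how augmentation works: the adjusted p-values of the gFWER(k) procedure are the sorted FWER adjusted p-values shifted by k positions, with zeros for the k most significant hypotheses. A minimal base-R sketch of that shift is given below; `gfwer_augment` is a hypothetical name that mirrors the idea, not the implementation, of the package function `fwer2gfwer`, and the adjusted p-values are made up.

```r
# gFWER(k) augmentation of FWER adjusted p-values (sketch).
gfwer_augment <- function(adjp, k) {
  M <- length(adjp)
  o <- order(adjp)
  sorted <- adjp[o]
  # The k most significant hypotheses get adjusted p-value 0;
  # the rest inherit the FWER adjusted p-value k ranks earlier.
  shifted <- c(rep(0, min(k, M)), sorted[seq_len(max(M - k, 0))])
  out <- numeric(M)
  out[o] <- shifted
  out
}

adjp <- c(0.01, 0.20, 0.03, 0.60, 0.04, 0.90, 0.75, 0.50)  # made-up values
n_fwer  <- sum(adjp <= 0.05)
n_gfwer <- sum(gfwer_augment(adjp, k = 2) <= 0.05)
c(n_fwer, n_gfwer)
```

In this toy case three hypotheses are rejected under FWER control and five under gFWER(2), that is, exactly k more.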
30
the influence of k in gFWER
k <- c(5, 10, 50, 100)
m3 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.minP', seed = 1)
summary(m3)
# augment the FWER adjusted p-values for each k
cyto.gfwer <- fwer2gfwer(adjp = m3@adjp, k = k)
comp.gfwer <- cbind(m3@adjp, cyto.gfwer)
mtps <- paste("gFWER(", c(0, k), ")", sep = "")
# the transcript passes m24btt@statistic, an object never defined
# in these slides; m3@statistic matches the model fitted above
mt.plot(adjp = comp.gfwer, teststat = m3@statistic, proc = mtps,
        leg = c(0.5, 200), col = 1:5, lty = 1:5, lwd = 3)
title("Comparison of gFWER(k)-controlling AMTPs based on SD minP MTP")
31
the influence of q in TPPFP
Result: the number of rejections increases
with the allowed proportion q of false positives,
though not linearly.
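
The non-linear growth can be seen from the augmentation rule itself: starting from r0 FWER rejections, hypotheses are added for as long as the added fraction m / (m + r0) stays at or below q. The sketch below counts those additions in base R; `tppfp_extra` is a hypothetical helper, not a package function.

```r
# Number of extra rejections allowed by TPPFP(q) augmentation (sketch).
# r0: number of FWER rejections; M: total number of hypotheses.
tppfp_extra <- function(r0, q, M = Inf) {
  extra <- 0
  # Add one more hypothesis while the added fraction stays <= q
  while (r0 + extra < M && (extra + 1) / (r0 + extra + 1) <= q) {
    extra <- extra + 1
  }
  extra
}

r0 <- c(35, 37, 129)  # FWER rejection counts reported in this transcript
totals <- sapply(r0, function(r) r + tppfp_extra(r, q = 0.1))
totals
```

With q = 0.1 this gives 38, 41 and 143, which agrees with the TPPFP counts reported in the transcript for the methods that reject 35, 37 and 129 genes under FWER control.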
32
q <- c(0.05, 0.1, 0.5)
m3 <- MTP(X = small.eset, Y = T.cell, typeone = 'fwer', B = 100,
          method = 'sd.minP', seed = 1)
summary(m3)
# augment the FWER adjusted p-values for each q
cyto.tppfp <- fwer2tppfp(adjp = m3@adjp, q = q)
comp.tppfp <- cbind(m3@adjp, cyto.tppfp)
mtps <- c("FWER", paste("TPPFP(", q, ")", sep = ""))
mt.plot(adjp = comp.tppfp, teststat = m3@statistic, proc = mtps,
        leg = c(0.5, 200), col = 1:4, lty = 1:4, lwd = 3)
title("Comparison of TPPFP(q)-controlling AMTPs based on SD minP MTP")
33
Summary

control of an appropriate and precisely defined
Type I error rate
control of this error rate under any combination
of true and false null hypotheses
accounting for the joint distribution of the test
statistics
reporting the results in terms of adjusted
p-values
availability of efficient resampling algorithms
for nonparametric procedures

34
References

Chapters 15 of Bioconductor Monograph
Sandrine Dudoit et al., Multiple Hypothesis
Testing in Microarray Data Analysis
Yongchao Ge et al., Resampling-based multiple
testing for microarray data analysis
Katherine S. Pollard et al., Applications of
Multiple Testing Procedures: ALL Data
Sandrine Dudoit et al., Statistical Science 18(1),
71-103
