Title: Crash Course in Practical Microarray Analysis
1 Crash Course in Practical Microarray Analysis
Mainz, 22.6.2006
2 Differential Gene Expression
Setting: Two or more types of samples (e.g. treated cell lines, biopsies). For each sample we have thousands of gene expression levels.
Goal: Which genes are differentially expressed?
3 Trade-Off, or: Everything is a Compromise
Sensitivity: simply take all?
Specificity: simply take none?
Here: statistical testing framework
- ranking of genes (ordered list with respect to up/down regulation)
- cut-off, i.e. significant genes
4 T-test: difference of the means, but takes variance into account
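A minimal sketch of this idea, assuming the unequal-variance (Welch) form of the statistic; the slide does not say which variant is meant, and the expression values below are hypothetical. Both genes have the same mean difference, but the noisier one gets a much smaller score.

```python
import math
from statistics import mean, variance

def t_statistic(group_a, group_b):
    """Welch t-statistic: mean difference divided by its standard error."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical log-expression values: same mean difference (1.0),
# but very different within-group variability.
low_noise = t_statistic([5.0, 5.1, 4.9], [4.0, 4.1, 3.9])
high_noise = t_statistic([5.0, 7.0, 3.0], [4.0, 6.0, 2.0])
print(low_noise, high_noise)
```

Ranking genes by this score rather than by the raw mean difference penalises genes whose apparent change is small relative to their variability.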
6 Multiple Testing
A statistical test for each gene g yields a p-value for that gene under the null hypothesis of no differential expression.
Under this null hypothesis, the p-value is a uniformly distributed random number between 0 and 1.
7 Multiple Testing: The Problem
Multiplicity problem: thousands of hypotheses are tested simultaneously. Increased chance of false positives.
E.g. suppose you have 10,000 genes on a chip and not a single one is differentially expressed. You would expect 10,000 × 0.01 = 100 of them to have a p-value < 0.01.
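The arithmetic can be checked with a small simulation. This is only a sketch: it assumes independent genes and draws null p-values directly from a uniform distribution; the function name and the seed are illustrative choices.

```python
import random

def count_false_positives(n_genes=10_000, threshold=0.01, seed=1):
    """Draw one uniform p-value per gene (all nulls true) and count p < threshold."""
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < threshold for p in p_values)

expected = 10_000 * 0.01            # expected number of false positives
observed = count_false_positives()  # fluctuates around the expectation
print(expected, observed)
```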
8 Multiple Testing - Error Control
Andreas Buneß
Null hypothesis: no differential expression between the two types/groups
R: all genes which are called significant
V: Type I errors, false positives (FP)
T: Type II errors, misses, false negatives (FN)
power of the test
9 What kind of error exactly do we want to control?
error concepts, like FWER, FDR
How do we achieve this error control?
procedures or methods, like Bonferroni, Holm, Benjamini-Hochberg, ...
R packages: multtest, qvalue / limma, samr
10 Family-Wise Error Rate (FWER)
$\mathrm{FWER} = \Pr(V > 0)$
The probability of at least one Type I error (false positive) among the genes selected as significant.
11 False Discovery Rate (FDR)
$\mathrm{FDR} = E(Q)$, where $Q = V/R$ if $R > 0$ and $Q = 0$ if $R = 0$
The expected proportion of Type I errors among the rejected hypotheses.
The expected proportion of false positives among all genes called significant.
12 p-values refer to a single gene.
FDR and FWER error control refers to a list of genes (the FDR corresponding to the ordered list up to the particular gene is often referred to as its q-value).
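As a sketch of how a list-level FDR value can be attached to each gene, the following computes Benjamini-Hochberg adjusted p-values. This is one common way to obtain q-value-like numbers; Storey's q-value additionally estimates the proportion of true nulls, which is not done here. The p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p = [0.001, 0.008, 0.039, 0.041, 0.6]  # hypothetical raw p-values
q = bh_adjust(p)
print(q)
```

Calling all genes with an adjusted value below, say, 0.1 significant controls the FDR of that list at 10% under the assumptions of the procedure.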
13 FDR: Horizontal cutoff
14 FWER: The Bonferroni Correction
15 Example
Golub data, 27 ALL vs. 11 AML samples, 3,051 genes.
98 genes with Bonferroni-adjusted p < 0.05, i.e. raw p < 0.000016
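The raw cutoff in this example follows from dividing the significance level by the number of tests. A short sketch (function names are illustrative):

```python
def bonferroni_threshold(alpha, n_tests):
    """Raw p-value cutoff that corresponds to FWER level alpha."""
    return alpha / n_tests

def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

threshold = bonferroni_threshold(0.05, 3051)  # 3,051 genes as in the Golub example
print(f"{threshold:.6f}")
```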
16 FDR: The Optimal Discovery Procedure
John Storey, software: edge
17 FWER or FDR?
Choose FWER control if high confidence in all selected genes is desired.
Choose FDR control if a certain proportion of false positives is tolerable - frequently used in practice.
18 Statistical Tests
- Standard t-test: assumes normally distributed data in each class (almost always questionable, but may be a good approximation), equal variances within classes
- Welch t-test: as above, but allows for unequal variances
- Wilcoxon test: nonparametric, rank-based
- Permutation test: estimate the distribution of the test statistic (e.g., the t-statistic) under the null hypothesis by permutations of the sample labels. The p-value is given as the fraction of permutations yielding a test statistic that is at least as extreme as the observed one.
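The permutation test for a single gene can be sketched as follows. For brevity the sketch uses the absolute mean difference as test statistic instead of the t-statistic and enumerates all relabellings, which is only feasible for small groups; with more samples one would draw random permutations. The data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided permutation p-value based on the absolute mean difference."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    n_extreme = 0
    n_total = 0
    # Every way of assigning n_a of the pooled samples to class A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        perm_a = [pooled[i] for i in chosen]
        perm_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = abs(mean(perm_a) - mean(perm_b))
        n_extreme += stat >= observed - 1e-12  # tolerance for rounding
        n_total += 1
    return n_extreme / n_total

p = permutation_p_value([5.1, 5.3, 5.2, 5.4], [4.1, 4.0, 4.3, 4.2])
print(p)
```

With four samples per class there are only 70 relabellings, so the smallest achievable two-sided p-value is 2/70; this illustrates why permutation tests lose resolution with few replicates.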
19 SAM (Significance Analysis of Microarrays)
- permutation test
- regularized t-statistics
- multiple testing correction
R package: samr
20 T-test: difference of the means, but takes variance into account
21 Few replicates: regularized/moderated t-statistics
- With the t-test, we estimate the variance of each gene individually. This is fine if we have enough replicates, but with few replicates (say 2-5 per group), the variance estimates are unstable.
- In a moderated t-statistic, the estimated gene-specific variance $s_g^2$ is augmented with $s_0^2$, a global variance estimator obtained from pooling all genes. This gives an interpolation between the t-statistic and a fold-change criterion.
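One way to sketch the moderation is a weighted average of the gene-specific and the global variance. The weighting below follows the form used by limma's moderated t-statistic, but here the global variance `s0_sq` and its weight `d0` are hypothetical hand-picked inputs; limma estimates both from the data by empirical Bayes, and SAM uses a different regularization.

```python
import math
from statistics import mean, variance

def moderated_t(group_a, group_b, s0_sq, d0):
    """t-statistic whose variance is shrunk towards a global value s0_sq.

    d0 = 0 gives the ordinary equal-variance t-statistic; a very large d0
    makes the denominator the same for all genes (fold-change ranking).
    """
    n_a, n_b = len(group_a), len(group_b)
    d_g = n_a + n_b - 2  # degrees of freedom of the gene-specific variance
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / d_g
    shrunk_var = (d0 * s0_sq + d_g * pooled_var) / (d0 + d_g)
    se = math.sqrt(shrunk_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical gene with a tiny fold change and, by chance, a tiny variance.
a, b = [5.00, 5.01], [4.90, 4.91]
ordinary = moderated_t(a, b, s0_sq=0.04, d0=0)
moderated = moderated_t(a, b, s0_sq=0.04, d0=4)
print(ordinary, moderated)
```

The ordinary statistic is inflated by the accidentally small variance estimate, while the moderated one is pulled back towards what the global variance suggests.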
22 Permutation tests
[Figure: the test statistic for the true class labels (2.2) is compared with the null distribution of the test statistic obtained from (random) permutations of the class labels (1.5, -0.4, 2.3, 0.7, 0.2, -1.2).]
23 SAM: typical plot
Expected random scores vs. observed scores
Deviations from the main diagonal are evidence for differentially expressed genes
24 What you typically observe
No differential gene expression
A lot of differential gene expression
Global changes in gene expression
25 Statistical tests: Different settings
- comparison of two classes (e.g. tumor vs. normal)
- paired observations from two classes: e.g. the t-test for paired samples is based on the within-pair differences.
- more than two classes and/or more than one factor (categorical or continuous): tests may be based on linear models
[Figure: paired samples]
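For the paired setting, a brief sketch of a t-statistic computed from within-pair differences; the sample values and variable names are hypothetical (e.g. normal and tumor tissue from the same four patients).

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t-statistic: one-sample t-test on the within-pair differences."""
    diffs = [y - x for x, y in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

normal = [4.0, 6.0, 8.0, 10.0]   # large differences between patients ...
tumor = [4.5, 6.4, 8.6, 10.5]    # ... but a consistent shift within each pair
t_paired = paired_t(normal, tumor)
print(t_paired)
```

Because the patient-to-patient variation cancels in the differences, the paired statistic is large here even though the two groups overlap heavily.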
26 LIMMA (Linear models for microarray data)
- moderated t-statistic
- multiple testing correction
- analysis of complex designs and factorial experiments
27 Linear models
- Linear models are a flexible framework for assessing the associations of phenotypic variables with gene expression.
- The expression $y_i$ of a given gene in sample $i$ is modeled as linearly depending on one or several factors (e.g. cell type, treatment, encoded in $x_{ij}$) of the sample: $y_i = a_1 x_{i1} + \dots + a_m x_{im} + e_i$.
- Estimated coefficients $a_j$ and their standard errors are obtained using least squares, assuming normally distributed errors $e_i$ (R function lm), or with a robust method (R function rlm).
28 Linear models
- Contrasts, that is, differences/linear combinations of the coefficients, express the differences between phenotypes and can be tested for significance (t-test).
- Example: Consider a study of three different types of kidney cancer. For each gene set up a linear model $y_i = a_1 x_{i1} + a_2 x_{i2} + a_3 x_{i3} + e_i$, where $x_{ij} = 1$ if tumor sample $i$ is of type $j$, and 0 otherwise.
- The least squares estimates of the coefficients $a_j$ are the mean expression levels in the classes.
- The contrast $a_1 - a_2$ expresses the mean difference between class 1 and 2.
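The kidney cancer example can be sketched for one gene as follows. The solver is a bare-bones normal-equations routine written for illustration (no check for singular designs), and the expression values are hypothetical; in practice one would use lm in R or a linear algebra library.

```python
def least_squares(X, y):
    """Solve the normal equations (X^T X) a = X^T y by Gauss-Jordan elimination."""
    m = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[j] * row[k] for row in X) for k in range(m)]
         + [sum(row[j] * yi for row, yi in zip(X, y))] for j in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(m):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[j][m] for j in range(m)]

tumor_type = [1, 1, 2, 2, 3, 3]          # class of each sample
y = [2.0, 4.0, 5.0, 7.0, 9.0, 11.0]      # hypothetical expression of one gene
# Design matrix: x_ij = 1 if sample i is of type j, else 0.
X = [[1.0 if t == j else 0.0 for j in (1, 2, 3)] for t in tumor_type]

a = least_squares(X, y)    # coefficients equal the class means
contrast = a[0] - a[1]     # mean difference between class 1 and class 2
print(a, contrast)
```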
29 Linear model analysis with the Bioconductor package limma
- The phenotype information for the samples is to be entered as a design matrix ($x_{ij}$ from the above formula). The rows of the matrix correspond to the samples, and the columns to the coefficients of the linear model.
- Contrasts are extracted after fitting the linear model.
- The significance of contrasts is assessed with a moderated t-statistic.
30 Experimental Design: Complex Designs
- Aim of the experiment
- Robustness
- Extensibility
- Efficiency
31 Aim of the experiment
Major focus: differential expression
- treatment vs. control (multiple treatments)
- tumor vs. normal (tumor subtypes)
- time series of multiple treatments
Well-defined goal or competing goals?
- several comparisons (different subgroups)
- one factor per comparison (skipping others)
- statistical modelling (various factors)
- exact subdivision of all samples
32 Efficiency
- statistical efficiency (pooling, direct/indirect)
- cost efficiency (microarray, mRNA source)
33 Replicates: required for statistical inference
- independent biological replicates
- technical replicates: may occur on different levels in the experimental hierarchy
Pooling: limited number of mRNA sources or limited number of microarrays
- amplification?
- independent pools to estimate variance
- one mRNA source may spoil the whole pool
34 Blocking
Factors: technical factors (slide batches, hybridisation day, labelling day) should not be (completely) confounded with your comparison of interest
Randomisation: control of unknown covariates
Balancing: control of all covariates (all are known)
Statistical modelling
35 Single-channel/one-color microarrays (e.g. Affymetrix)
- experimental design: one independent biological sample per microarray/hybridisation
- pooling/blocking?
Two-color microarrays (e.g. spotted cDNA arrays)
- experimental design may become more complex
- pooling/blocking?
36 Dye effect (two-color microarrays): different (gene-specific) labelling efficiencies
37 Two-colour microarrays
Typical designs:
- reference design
- loop design
- dye-swaps
38 Graphical Representation: two-colour microarrays
- node: mRNA sample
- edge: hybridisation
- direction: dye assignment
39 Dye-Swap/(Mini-)Loop design and Reference design
[Figure: design graphs with sample nodes A and B and reference node R]
Two groups A and B; independent biological replicates of A and B
40 Reference Design
- one independent biological sample against the reference per hybridization/microarray (possible exception: pooling)
- extendable (e.g. ongoing study)
- any unknown/unpredictable comparisons
- same efficiency for any comparison
- analysis as for single-channel microarray experiments
- no dye effect
- often used with large sample size
- simple, i.e. minimizes experimental confusion
41 Dye Swap Design
- often refers to a technical replicate where two slides are used for two samples, each labelled twice (red/green)
- sometimes used to control the dye effect, i.e. the different labelling efficiencies for each gene, via averaging of the two slides
- prefer "biological" replicates
42 Dye Swap Design
43 Loop Design
- well-defined experimental setting/comparisons
- often relatively small sample size (due to manageability)
- requires statistical modelling in general
- dye effect is addressed with the statistical model
44 (Mini-)Loop Design
[Figure: loop graphs over the samples A1, B1, B2, A2 and A3, B3, B4, A4]
Design matrix (one row per hybridisation):
  A-B  dye
   1    1
  -1    1
   1    1
  -1    1
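A sketch of how the A-B difference and the dye effect could be estimated from such a design, reading the design matrix on this slide as the columns A-B = (1, -1, 1, -1) and dye = (1, 1, 1, 1), one row per hybridisation. The log-ratios are hypothetical, and the shortcut used here (separate projection onto each column) is only valid because the two columns are orthogonal.

```python
def estimate_effects(design, log_ratios):
    """Least squares estimates for a design with mutually orthogonal columns."""
    n_cols = len(design[0])
    estimates = []
    for j in range(n_cols):
        column = [row[j] for row in design]
        estimates.append(sum(c * m for c, m in zip(column, log_ratios))
                         / sum(c * c for c in column))
    return estimates

# Columns: A-B effect, dye effect; rows: the four hybridisations.
design = [[1, 1], [-1, 1], [1, 1], [-1, 1]]
log_ratios = [1.3, -0.7, 1.2, -0.8]  # hypothetical red/green log-ratios of one gene
a_minus_b, dye = estimate_effects(design, log_ratios)
print(a_minus_b, dye)
```

Because half of the arrays carry A in one channel and half in the other, the dye effect averages out of the A-B estimate instead of biasing it.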
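45 [Figure: reference design with sample nodes A, B, C and reference R; loop design with nodes A, B, C]
Reference design: hybridisations A-R, B-R, C-R; contrast A-B = (A-R) - (B-R)
Loop design: hybridisations C-A, A-B, B-C; contrast A-B (direct), A-B = -(B-C) - (C-A) (indirect)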
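The two indirect routes to the A-B contrast can be written out as a small sketch. The noise-free log-expression levels are hypothetical; with real data, and assuming independent arrays with equal variance, an estimate that combines two hybridisations has twice the variance of a single direct A-B hybridisation.

```python
def contrast_via_reference(a_vs_r, b_vs_r):
    """A-B from a reference design: (A-R) - (B-R)."""
    return a_vs_r - b_vs_r

def contrast_via_loop(b_vs_c, c_vs_a):
    """A-B from the other two edges of a loop: -(B-C) - (C-A)."""
    return -b_vs_c - c_vs_a

# Hypothetical true log-expression levels of one gene.
A, B, C, R = 3.0, 1.0, 2.0, 0.5
via_reference = contrast_via_reference(A - R, B - R)
via_loop = contrast_via_loop(B - C, C - A)
print(via_reference, via_loop, A - B)
```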
46 Recommendations
- Large patient sample collective: reference design
- Unknown/unpredictable comparisons: reference design
- Small-scale cell-line experiments: direct comparisons
Recall: statistical analysis is limited by the number of independent biological replicates!
48 Thanks to
- Anja von Heydebreck,
- Rainer Spang,
- Tim Beißbarth
for some of the slides.
49 Links
www.r-project.org/
www.bioconductor.org/
bioinf.wehi.edu.au/limma/
www-stat.stanford.edu/tibs/SAM/
www.biostat.washington.edu/software/jstorey/edge/
NGFN course material: http://compdiag.molgen.mpg.de/ngfn/
50 References
- Y. Benjamini and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, Vol. 57, 289-300.
- S. Dudoit, J.P. Shaffer, J.C. Boldrick (2003). Multiple hypothesis testing in microarray experiments. Statistical Science, Vol. 18, 71-103.
- J.D. Storey and R. Tibshirani (2003). SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In: The analysis of gene expression data: methods and software. Edited by G. Parmigiani, E.S. Garrett, R.A. Irizarry, S.L. Zeger. Springer, New York.
- V.G. Tusher et al. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, Vol. 98, 5116-5121.
- M. Pepe et al. (2003). Selecting differentially expressed genes from microarray experiments. Biometrics, Vol. 59, 133-142.
51 References
- T. P. Speed and Y. H. Yang (2002). Direct versus indirect designs for cDNA microarray experiments. Sankhya: The Indian Journal of Statistics, Vol. 64, Series A, Pt. 3, pp. 706-720.
- Y.H. Yang and T. P. Speed (2003). Design and analysis of comparative microarray experiments. In: T. P. Speed (ed.), Statistical analysis of gene expression microarray data, Chapman & Hall.
- R. Simon, M. D. Radmacher and K. Dobbin (2002). Design of studies using DNA microarrays. Genetic Epidemiology 23:21-36.
- F. Bretz, J. Landgrebe and E. Brunner (2003). Efficient design and analysis of two color factorial microarray experiments. Biostatistics.
- G. Churchill (2003). Fundamentals of experimental design for cDNA microarrays. Nature Genetics Review 32:490-495.
- G. Smyth, J. Michaud and H. Scott (2003). Use of within-array replicate spots for assessing differential expression in microarray experiments. Technical Report, WEHI.
- G. F. V. Glonek and P. J. Solomon (2002). Factorial and time course designs for cDNA microarray experiments. Technical Report, Department of Applied Mathematics, University of Adelaide. 10/2002.
53 Different designs: 4 conditions
[Figure: alternative design graphs over the four conditions A, B, A.B and C]
54 Graphical Representation
- The structure of the graph determines which effects can be estimated and how precise the estimates are.
- Two mRNA samples can only be compared if there is a path connecting the corresponding nodes.
- The precision of the estimated contrasts depends directly on the number of paths connecting the nodes and on the length of these paths.
- Direct comparisons on the same slide give more precise measurements than indirect comparisons.
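The path condition can be checked mechanically. The sketch below treats each hybridisation as an undirected edge between two mRNA samples (the dye assignment is ignored) and uses a breadth-first search; the design lists are hypothetical examples.

```python
from collections import deque

def comparable(hybridisations, sample_1, sample_2):
    """True if a path of hybridisations connects the two samples."""
    neighbours = {}
    for u, v in hybridisations:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    seen, queue = {sample_1}, deque([sample_1])
    while queue:
        node = queue.popleft()
        if node == sample_2:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

reference_design = [("A", "R"), ("B", "R"), ("C", "R")]
disconnected = [("A", "B"), ("C", "D")]
print(comparable(reference_design, "A", "B"))  # connected via R
print(comparable(disconnected, "A", "C"))      # no connecting path
```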
55 Strong control: any combination of true and false hypotheses
Weak control: complete null hypothesis, i.e. all null hypotheses in the family are true