Title: Design of microarray gene expression profiling experiments
1Design of microarray gene expression profiling
experiments
2Lay-out
- Practical considerations
- Pooling
- Randomization
- One-color vs Two-colors
- Two-color hybridization designs
- Ratio-based vs Intensity-based analysis
3Think before you start
- research question
- choice of technology
- controls and replicates
- Ref: Churchill (2002). Nature Genetics Supplement 32, 490-495
4Research question
- Limit your (initial) number of questions / conditions
- choose the best time point for mRNA regulation
- can be different from protein/activity
- pilots using RT-qPCR
- experimental follow-up
- what will you do with the data?
- verification of differential gene expression
- in vitro experiments to study mechanism
- "in vivo" verification in tissue sections
5Choice of technology
- What is affordable?
- Do a pilot to estimate the variance for your samples, experimental set-up and platform
- Calculate your power: what is the lower bound of the effect size that you can pick up?
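A minimal sketch of such a power calculation, assuming a two-group comparison of log2 expression values and a normal (z-test) approximation. The function names, the default significance level and the default power are illustrative assumptions, not values from the slides; a t-based calculation would ask for slightly more replicates at small sample sizes.

```python
from math import ceil, sqrt
from statistics import NormalDist


def minimal_detectable_effect(sd, n_per_group, alpha=0.001, power=0.8):
    """Smallest true difference in mean log2 expression between two groups
    that a two-sided two-sample z-test detects with the requested power.

    sd          : per-sample standard deviation of log2 expression (from the pilot)
    n_per_group : number of biological replicates in each group
    alpha       : per-gene significance level (assumed value; choose it with
                  the number of genes tested in mind)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return (z_alpha + z_power) * sd * sqrt(2 / n_per_group)


def replicates_needed(sd, effect, alpha=0.001, power=0.8):
    """Replicates per group needed to detect a given log2 difference."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)
```

With the pilot estimate of `sd` plugged in, `minimal_detectable_effect` answers the question on the slide directly, and `replicates_needed` turns it around for a target effect size.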
6Controls
- positive: genes whose regulation is known
- check on biological experiment and data analysis
- positive: spikes in mRNA and/or hybridization mix
- check labeling procedure and hybridization
- detection range (sensitivity) and dynamic range
- "landing lights" for gridding software
- negative controls: non-specific binding
- check cross-hybridization: buffer, non-homologous DNA
7Spikes
Spike genes: RCA, Cab, rbcL, LTP4, LTP6, XCP2, RPC1, NAC1, TIM, PRK

Spiked 2-fold change (copies/cell)    Spiked 3-fold change (copies/cell)
  2 : 1                                 3 : 1
 10 : 5                                15 : 5
 60 : 30                               60 : 20
100 : 50                              150 : 50
300 : 150                             300 : 100

Workflow: cDNA probe synthesis, then hybridize
8Spikes
Van de Peppel et al. EMBO Reports 4, 387 (2003)
10Replicates
- Include sufficient replicates, based on pilot experiment
- Biological replicates are preferred over technical replicates
- Control experimental variables with possible unintended effects
- genetic background
- gender
- age
11Randomization
- Randomize samples with respect to experimental influences
- experimenter
- day of hybridization
- batch of arrays
- dye
- etc
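The randomization above can be sketched as follows. This is one possible scheme, not a prescribed one: within each experimental group, every nuisance factor gets a balanced list of levels that is shuffled independently. The group names, sample names and factor levels are hypothetical.

```python
import random


def balanced_labels(levels, n, rng):
    """Return n labels that use every level equally often (up to rounding),
    in random order."""
    labels = [levels[i % len(levels)] for i in range(n)]
    rng.shuffle(labels)
    return labels


def randomize(groups, factors, seed=None):
    """Assign each sample a random level of every nuisance factor.

    groups  : dict mapping group name -> list of sample names
    factors : dict mapping factor name -> list of levels
    """
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    plan = {}
    for group, samples in groups.items():
        # One independently shuffled, balanced column per factor and group.
        columns = {name: balanced_labels(levels, len(samples), rng)
                   for name, levels in factors.items()}
        for i, sample in enumerate(samples):
            plan[sample] = {"group": group}
            for name, labels in columns.items():
                plan[sample][name] = labels[i]
    return plan


# Hypothetical example: two groups of four animals, three nuisance factors.
groups = {"wild-type": ["wt1", "wt2", "wt3", "wt4"],
          "mutant": ["mut1", "mut2", "mut3", "mut4"]}
factors = {"experimenter": ["E1", "E2"],
           "day": ["day1", "day2"],
           "dye": ["Cy3", "Cy5"]}
plan = randomize(groups, factors, seed=1)
for sample, row in plan.items():
    print(sample, row)
```

Each factor is balanced within each group, so no experimenter, day or dye is tied to one condition. Combinations of factors (for example day by dye) are not balanced jointly in this sketch.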
12Pooling
- Often done because of lack of sufficient amounts of RNA, but good amplification protocols are available
- Advantages
- dampening of individual variation, may increase statistical power
- Generally not recommended
- outliers in the population may result in large and significant effects
- information on the differences in the population is lost and is probably biologically relevant
- in fact, it is an artificial way to increase the significance of your findings
13Hybridization design
- One color: not many difficulties expected
- Two color: what to hybridize with what, in which color?
- Reference design
- Paired design
- Loop design
- Mixed design
- Read: Yang & Speed (2002). Design issues for cDNA microarray experiments. Nature Reviews Genetics 3, 579-588
14Hybridization design general issues
- Comparisons on the same array are more precise than comparisons on different arrays
- Identify most important comparisons
- Hybridize those on the same slide
- Dye swap
- A dye effect is always there
- Balance designs with respect to dye (exception: some common reference designs)
15Common reference vs direct hybridizations
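Direct hybridization (A and B on the same slide, two slides):
If the variance of $\log(A/B)$ for one slide is $\sigma^2$, then the variance of the average of the two measurements is $\sigma^2/2$.
Common reference (A and B each hybridized against reference R):
$\log(A/B) = \log(A/R) - \log(B/R)$, so the variance of $\log(A/B)$ is $\mathrm{Var}\,\log(A/R) + \mathrm{Var}\,\log(B/R) = \sigma^2 + \sigma^2 = 2\sigma^2$.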
16More samples
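Three samples A, B and C.
All pairs compared directly (A with B, A with C, B with C):
$\widehat{\log(A/B)} = \tfrac{2}{3}\log(A/B) + \tfrac{1}{3}\,[\log(A/C) - \log(B/C)]$
Assuming that all variances are equal (each direct comparison has variance $\sigma^2/2$):
$\mathrm{Var}\,\log(A/B) = \tfrac{4}{9}\,(\sigma^2/2) + \tfrac{1}{9}\,\sigma^2 = \tfrac{1}{3}\,\sigma^2$
Common reference (A, B and C each against R):
$\mathrm{Var}\,\log(A/B) = \mathrm{Var}\,\log(A/C) = \mathrm{Var}\,\log(B/C) = 0.5\,\sigma^2 + 0.5\,\sigma^2 = \sigma^2$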
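The variances quoted for direct, common reference and all-pairs designs can be checked for any two-color design with a small least-squares calculation. The sketch below assumes every slide measures the log ratio of its two samples with the same variance (taken as one unit, sigma squared) and that slides are independent; the function names and the design encoding are illustrative.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Raises StopIteration if the design is not connected (singular matrix).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def contrast_variance(arrays, a, b):
    """Variance of the least-squares estimate of log(a/b), in units of sigma^2.

    arrays : list of (red, green) sample pairs, one pair per slide
    """
    samples = sorted({s for pair in arrays for s in pair})
    baseline, free = samples[0], samples[1:]  # baseline effect is fixed at 0
    index = {s: i for i, s in enumerate(free)}
    p = len(free)

    def row(red, green):
        # +1 for the red sample, -1 for the green sample, baseline omitted.
        coeffs = [Fraction(0)] * p
        if red != baseline:
            coeffs[index[red]] += 1
        if green != baseline:
            coeffs[index[green]] -= 1
        return coeffs

    X = [row(red, green) for red, green in arrays]
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    inv = invert(xtx)
    c = row(a, b)
    return sum(c[i] * inv[i][j] * c[j] for i in range(p) for j in range(p))


direct = [("A", "B"), ("B", "A")]                      # dye-swapped pair
reference = [("A", "R"), ("B", "R")]                   # one slide per sample
all_pairs = [("A", "B"), ("B", "A"), ("A", "C"),
             ("C", "A"), ("B", "C"), ("C", "B")]       # six slides
reference6 = [("A", "R"), ("R", "A"), ("B", "R"),
              ("R", "B"), ("C", "R"), ("R", "C")]      # six slides
for name, design in [("direct", direct), ("reference", reference),
                     ("all pairs", all_pairs), ("reference, 6 slides", reference6)]:
    print(name, contrast_variance(design, "A", "B"))
```

Exact fractions are used so the results can be compared directly with the hand-derived values. Dye effects are not modelled here.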
17Common reference vs direct hybridizations
- Theoretical considerations
- A design is optimal when it minimizes the variance of the effect of interest
- Look for designs leading to small variance of log(A/B)
- Practical considerations
- Common reference may be desired when the experiment is extended in the future or when a lot of different conditions have to be compared
- Choose a biologically relevant common reference (say your control sample). In that case, the ratios themselves are of interest and easier to interpret
18Time-course designs
- Take 4 time points
- T1 T2 T3 T4
- The best choice of design depends on the
comparisons of interest and on the number of
slides available
19Time-course designs
- Using 3 slides
- T1 T2 T3 T4
- which is the best to estimate changes relative to the initial time point: T2 / T1, T3 / T1, T4 / T1?
20Time-course designs
- Using 3 slides
- T1 T2 T3 T4
- which is the best to estimate relative changes between successive time points: T2 / T1, T3 / T2, T4 / T3?
21Time course designs
- Using 4 slides
- T1 T2 T3 T4, each against R
- which is the reference design
- All comparisons have equal precision
22Time course design
- Using 4 slides
- T1 T2 T3 T4
- which is the loop design, balanced wrt dye
- Distant comparisons have lower precision
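The precision statements for the 4-slide reference and loop designs can be made concrete with a short calculation. The sketch assumes each slide gives a log ratio with variance sigma squared, slides are independent, and dye effects are ignored; the function names are illustrative. In a single loop, two samples are connected by two independent paths of slides, and combining the two path estimates with inverse-variance weights gives the variance below.

```python
from fractions import Fraction


def loop_variance(n_samples, i, j):
    """Variance (in units of sigma^2) of the estimated log ratio between the
    samples at positions i and j of a single loop of n_samples slides."""
    d = abs(i - j)  # number of slides on one path around the loop
    # The paths have variances d and n - d; the inverse-variance weighted
    # combination has variance 1 / (1/d + 1/(n - d)) = d * (n - d) / n.
    return Fraction(d * (n_samples - d), n_samples)


def reference_variance():
    """Any pair in a one-slide-per-sample reference design: sigma^2 + sigma^2."""
    return Fraction(2)


for i, j in [(0, 1), (0, 2), (0, 3)]:
    print(f"T{i + 1} vs T{j + 1}: loop {loop_variance(4, i, j)}, "
          f"reference {reference_variance()}")
```

Under these assumptions the loop gives 3/4 sigma squared for neighbouring time points and 1 sigma squared for opposite ones, against 2 sigma squared for every pair in the reference design.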
23Time course designs
- Using 4 slides
- T1 T2 T3 T4
- also uses exactly 2 hybridizations per treatment
- balanced wrt dye
- Most precise estimates: T1 vs T2, T1 vs T3, T2 vs T4, T3 vs T4
24Factorial designs
- Designs for studies which involve factors as explanatory variables
- Age group
- Gender
- Cell line
- Tumor types
25Factorial designs
- Glonek & Solomon (2004)
- Admissible design: using the same number of arrays, no other design yields smaller variances for all parameters
Glonek et al. Biostatistics 5, 89-111 (2004)
26Factorial design example
- Time
- 0h
- 24h
- Cell lines
- I (non-leukaemic)
- II (leukaemic)
- Find genes differentially expressed at 24h but not at 0h: interaction between time and cell line
27Factorial design possible samples
- All combinations of factor levels. In this case,
4 are possible
28Factorial design analysis model
- (log-)linear model is used
- experimental conditions correspond to combinations of model parameters
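One common way to write such a model for the 2 x 2 example (cell line I or II, time 0h or 24h) is sketched below. The symbols $\mu$, $\tau$ (time), $\lambda$ (cell line) and $(\tau\lambda)$ (interaction) are chosen here for illustration and are not taken from the slides.

```latex
\begin{aligned}
\log(\text{I},0)   &= \mu \\
\log(\text{I},24)  &= \mu + \tau \\
\log(\text{II},0)  &= \mu + \lambda \\
\log(\text{II},24) &= \mu + \tau + \lambda + (\tau\lambda) \\[4pt]
(\tau\lambda) &= \bigl[\log(\text{II},24) - \log(\text{II},0)\bigr]
               - \bigl[\log(\text{I},24) - \log(\text{I},0)\bigr]
\end{aligned}
```

Under this parameterization an array that compares II,24 with II,0 estimates $\tau + (\tau\lambda)$, while an array that compares I,24 with I,0 estimates $\tau$ alone, so the interaction is the difference between the two time effects.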
29Factorial design possible arrays
Figure: the four samples (I,0; I,24; II,0; II,24) and the six possible arrays between them, numbered (1) to (6)
30Optimal admissible design
- Designs that are not worse than others, and for which the variance of the parameter of interest is (one of) the smallest
- In the example: we wish to find admissible designs for which the interaction term has one of the smallest variances
31Glonek et al. Biostatistics 5, 89-111 (2004)
32Optimal admissible design
Glonek et al. Biostatistics 5, 89-111 (2004)
33Factorial designs conclusions
- Design with all pairwise comparisons is not the best in this case
- Best design can only be found with respect to a model
- if the model does not fit the data well, the design choice may not be the best
- make sure the model chosen is adequate
34How to compare efficiently many different
conditions?
- Common reference: not efficient
- Loop and mixed designs: not all comparisons have equal precision
GA Churchill, Nat Genet. 2002 Dec; 32 Suppl: 490-5
35Possible solution
- Randomized design
- Intensity-based rather than ratio-based
calculations
- Requires
- Hybridization of the two samples is independent: no competition for binding sites
- Absence of large spot and array effects
- To be tested for each platform
36Our favourite platform
- Spotted collection of 65-mer oligonucleotides (Sigma-Compugen collection)
- 22K
37Design used to demonstrate independent hybridization
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
38Distribution of signal intensities is similar
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
39Correlation of intensities is high
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
40Effect of addition of unlabelled target
Two targets on microarray
Single target on microarray
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
41Correlation of ratios calculated from different hybridization designs
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
42Intensity-based analysis
- Hybridizations of two targets on the array are independent
- No saturation and no competition
- Intensity readings show high inter-array correlation
- Comparisons on the same array have highest precision and all other comparisons have equal precision
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
43Example of randomized design
- Mouse models for muscular dystrophy
Turk et al. FASEB J 20, 127-129 (2006)
44Our design
- Randomly assign samples to the arrays, avoiding co-hybridization of samples from the same group
- 2 biological replicates
- 4 technical replicates (dye swap and replicate spotting)
Turk et al. FASEB J 20, 127-129 (2006)
45Intensity-based analysis can go wrong
Vinciotti et al. Bioinformatics 21, 492-501 (2005)
46Intensity-based analysis can go wrong
Vinciotti et al. Bioinformatics 21, 492-501 (2005)
47Some guidelines
- First determine the main question, pointing out the effect of interest
- log(A/B)
- Then choose the analysis model, so that the effect variance can be computed
- Var log(A/B)
- Practical constraints: amount of RNA available, number of hybridizations, number of slides
- A good design measures the effect of interest as accurately as possible
- small Var log(A/B)
48Some useful links
- http://dial.liacs.nl/Courses/CMSB20Courses.html
- http://www.brc.dcs.gla.ac.uk/rb106x/microarray_tips.htm
- http://exgen.ma.umist.ac.uk/course/notes/WitDesignLecture.pdf
- http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp
49Acknowledgements
Human and Clinical Genetics, LUMC: Judith Boer, Renée de Menezes, Rolf Turk, Ellen Sterrenburg, Johan den Dunnen, Gertjan van Ommen
Microarray facility, Leiden Genome Technology Center
50Case study
- Two genetically modified zebrafish strains and one wild-type
- Defects mainly in muscle development
- Apparent at 12-48 hours of development; early death
- Question: which biological pathways are affected and responsible for defective myogenesis?
51Possible platforms and budget
- Affymetrix (1-color): 500 euro per chip
- variance for ratio of two samples on two chips: $\sigma^2$
- Homespotted arrays (2-color): 100 euro per chip
- variance for ratio of two samples on one chip: $2\sigma^2$
- Budget: 12,000 euro
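As a starting point for the case study, the arithmetic behind the platform choice can be sketched as follows. The sketch deliberately simplifies: the whole budget goes to a single two-sample comparison, chips are the only cost, and replicate ratios are independent, so the variance of their mean is the single-ratio variance divided by the number of replicates. The dictionary layout and function names are illustrative.

```python
from fractions import Fraction

BUDGET_EURO = 12_000

# name: (cost per chip in euro, chips needed for one ratio,
#        variance of one ratio in units of sigma^2), as given on the slide
PLATFORMS = {
    "Affymetrix (1-color)": (500, 2, 1),
    "Homespotted (2-color)": (100, 1, 2),
}


def replicate_ratios(platform, budget=BUDGET_EURO):
    """Number of replicate ratio measurements the budget pays for."""
    cost, chips_per_ratio, _ = PLATFORMS[platform]
    return budget // (cost * chips_per_ratio)


def variance_of_mean_ratio(platform, budget=BUDGET_EURO):
    """Variance of the mean log ratio over all affordable replicates,
    in units of sigma^2."""
    _, _, var_single = PLATFORMS[platform]
    return Fraction(var_single, replicate_ratios(platform, budget))


for name in PLATFORMS:
    print(name, replicate_ratios(name), variance_of_mean_ratio(name))
```

In the actual case study the budget has to be divided over three strains and several time points, so these numbers are an upper limit on precision for any one comparison rather than a recommendation.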
52Questions
- Isolation of specific compartments / whole animal lysates?
- Pooling?
- How many replicates?
- Which hybridization design?
- What is the variance of the most important comparisons?