Title: Some views on microarray experimental design
1. Some views on microarray experimental design
- Rainer Breitling
- Molecular Plant Science Group
- Bioinformatics Research Centre
- University of Glasgow, Scotland, UK
2. Personal Background
- University of Glasgow, Scotland, UK
- Molecular Plant Sciences Group
- Bioinformatics Research Centre
- Functional Genomics Facility
3. Some common questions in microarray experimental design
- How many arrays will I need?
- Should I pool my samples?
- Which arrays should I choose?
- Which samples should I put together on one array?
4. Why are microarrays special?
- produce large amounts of data instantaneously
- can look for unexpected effects
- are still quite expensive
- → almost never repeated
- → careful design necessary before you start
5. How many replicates?
- as many as possible
- Statistics says: the more replicates, the better your estimate of expression (that's an asymptotic process, so adding even a few replicates has a really strong effect)
6. How many replicates?
- α: significance level (probability of detecting a false positive, FP)
- 1-β: power to detect differences (probability of detecting a true positive, TP)
- σ: standard deviation of the log-ratios
- δ: detectable difference between class mean log-ratios
- z: percentile of the standard normal distribution
- → n: required number of arrays (reference design)
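The symbols above fit the usual normal-approximation sample-size calculation for comparing two classes. As a rough sketch, assuming that formula is n = 4 (z_{α/2} + z_β)² / (δ/σ)² with arrays split evenly between the classes, the number of arrays could be estimated as follows. The function name `arrays_needed` and the example values are illustrative and do not come from the slides.

```python
from math import ceil
from statistics import NormalDist


def arrays_needed(alpha: float, power: float, sigma: float, delta: float) -> int:
    """Approximate total number of arrays for a two-class reference design.

    Assumes n = 4 * (z_{alpha/2} + z_beta)^2 / (delta / sigma)^2,
    with the arrays split evenly between the two classes.
    """
    z = NormalDist().inv_cdf      # percentile function of the standard normal
    z_alpha = z(1 - alpha / 2)    # two-sided significance threshold
    z_beta = z(power)             # power = 1 - beta
    return ceil(4 * (z_alpha + z_beta) ** 2 / (delta / sigma) ** 2)


if __name__ == "__main__":
    # Illustrative values: a 2-fold change (delta = 1 on the log2 scale),
    # sigma = 0.5, alpha = 0.001 and 95% power.
    print(arrays_needed(alpha=0.001, power=0.95, sigma=0.5, delta=1.0))
```

Halving the detectable difference δ roughly quadruples the number of arrays, which is why the expected magnitude of the effect matters so much for the design.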
7. How many replicates?
- Five
- Experience shows: for most common experiments you get a reasonable list of differentially expressed genes with 5 replicates
8. How many replicates?
- Three
- One to convince yourself, one to convince your
boss, one just in case...
9. How many replicates?
- It depends on
- the quality of the sample
- the magnitude of the expected effect
- the experimental design
- the method of analysis
10. The quality of the sample
- smaller samples (single cells) are noisier than large samples (tissue homogenates)
- cell cultures are less noisy than patient biopsies
- sample pooling can decrease noise if individual variation is not of interest
11. The magnitude of the effect
- Microarrays are very sensitive
- To keep effects small
- use early time points, gentle stimuli
- never compare dogs and donuts
- if you get a list of 2000 genes that are
significantly changed, your experiment failed!
12. The magnitude of the effect
- some problematic cases
- stably transfected cell lines (are they still the same cells?)
- knock-out organisms (even the same tissue can be different)
- local changes may be diluted, but cell isolation will increase noise
13. The experimental design
- Three major options
- reference design (flexible)
- balanced block design (efficient)
- loop design (elegant)
14. The experimental design
- loop designs can save samples...
- ...but they can cause interpretation nightmares
in less simple cases (use for large studies, if
you have a full-time statistician in the team)
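[Figure: design diagrams for samples A, B, C and D: a loop design, and a reference design with each sample paired with a reference R]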
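To make the difference between the two layouts concrete, here is a minimal sketch that lists which two samples share each two-colour array. The function names are hypothetical, and dye assignment is ignored for simplicity.

```python
def reference_design(samples, reference="R"):
    """One array per sample, each hybridized against the common reference."""
    return [(s, reference) for s in samples]


def loop_design(samples):
    """One array per neighbouring pair, closing the loop back to the first sample."""
    n = len(samples)
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


def times_measured(design, sample):
    """Number of arrays on which a given sample appears."""
    return sum(sample in pair for pair in design)


if __name__ == "__main__":
    samples = ["A", "B", "C", "D"]
    for name, design in [("reference", reference_design(samples)),
                         ("loop", loop_design(samples))]:
        counts = {s: times_measured(design, s) for s in samples}
        print(name, design, counts)
```

With four samples both layouts use four arrays, but in the loop every sample of interest appears on two arrays, while in the reference design half of all measurements go to the reference. The price is that samples in a loop are compared only indirectly unless they are neighbours.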
15. The method of analysis
- Golub et al. (1999) data set
- 38 leukemia patient bone marrow samples, hybridized individually to Affymetrix microarrays
- Differential expression between two leukemia types was examined, using random subsets of the complete dataset
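A random-subset experiment of this kind could be set up roughly as below. This is only a sketch: the slides do not say how the subsets were drawn, so the balanced sampling per class, the function name `random_subset` and the subset size are assumptions. The 27 ALL / 11 AML split is the class composition of the 38-sample Golub set.

```python
import random


def random_subset(labels, n_per_class, rng):
    """Pick n_per_class sample indices at random from each class (assumed scheme)."""
    chosen = []
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        chosen.extend(rng.sample(members, n_per_class))
    return sorted(chosen)


if __name__ == "__main__":
    labels = ["ALL"] * 27 + ["AML"] * 11   # class labels of the 38 samples
    rng = random.Random(0)                 # fixed seed so the draw is repeatable
    subset = random_subset(labels, n_per_class=5, rng=rng)
    print(subset, [labels[i] for i in subset])
```

Repeating the draw many times at each subset size, and running the same analysis on every subset, shows how stable the resulting gene list is for a given number of replicates.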
16. The method of analysis
iterative GroupAnalysis (iGA): functional classes (GO ID - name) listed for each time point
0h: none listed
9.5h: none listed
11.5h: 6144 - purine base metabolism; 9277 - cell wall (sensu Fungi); 297 - spermine transporter activity; 15846 - polyamine transport
13.5h: 6099 - tricarboxylic acid cycle; 3773 - heat shock protein activity; 6950 - response to stress; 297 - spermine transporter activity; 4373 - glycogen (starch) synthase activity; 15846 - polyamine transport; 5353 - fructose transporter activity; 15578 - mannose transporter activity; 7039 - vacuolar protein catabolism; 8645 - hexose transport
15.5h: 6099 - tricarboxylic acid cycle; 5749 - respiratory chain complex II (sensu Eukarya); 6121 - oxidative phosphorylation, succinate to ubiquinone; 8177 - succinate dehydrogenase (ubiquinone) activity; 3773 - heat shock protein activity; 4373 - glycogen (starch) synthase activity; 7039 - vacuolar protein catabolism; 6950 - response to stress; 4129 - cytochrome c oxidase activity; 5751 - respiratory chain complex IV (sensu Eukarya)
18.5h: 3773 - heat shock protein activity; 6099 - tricarboxylic acid cycle; 5977 - glycogen metabolism; 6950 - response to stress; 4373 - glycogen (starch) synthase activity; 4129 - cytochrome c oxidase activity; 5751 - respiratory chain complex IV (sensu Eukarya); 5749 - respiratory chain complex II (sensu Eukarya); 6121 - oxidative phosphorylation, succinate to ubiquinone; 8177 - succinate dehydrogenase (ubiquinone) activity
20.5h: 6099 - tricarboxylic acid cycle; 3773 - heat shock protein activity; 5749 - respiratory chain complex II (sensu Eukarya); 6121 - oxidative phosphorylation, succinate to ubiquinone; 8177 - succinate dehydrogenase (ubiquinone) activity; 6537 - glutamate biosynthesis; 6097 - glyoxylate cycle; 5750 - respiratory chain complex III (sensu Eukarya); 9060 - aerobic respiration; 4129 - cytochrome c oxidase activity
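As a rough illustration of the idea behind iGA, the sketch below scores one functional class against a list of genes ranked by differential expression. It assumes the class score is the smallest hypergeometric tail probability found while stepping down the ranked list to each class member in turn. The function names are hypothetical, and this is not the author's implementation.

```python
from math import comb


def hypergeom_tail(n_genes, n_class, n_top, n_hits):
    """P(at least n_hits class members among the top n_top of n_genes genes)."""
    upper = min(n_class, n_top)
    favourable = sum(comb(n_class, i) * comb(n_genes - n_class, n_top - i)
                     for i in range(n_hits, upper + 1))
    return favourable / comb(n_genes, n_top)


def iga_score(ranked_genes, group):
    """Minimum tail probability over all cut-offs that end at a class member."""
    ranks = [pos for pos, gene in enumerate(ranked_genes, start=1) if gene in group]
    best = 1.0
    for hits, cutoff in enumerate(ranks, start=1):
        p = hypergeom_tail(len(ranked_genes), len(ranks), cutoff, hits)
        best = min(best, p)
    return best


if __name__ == "__main__":
    ranked = [f"g{i}" for i in range(1, 11)]   # toy ranking, most changed first
    print(iga_score(ranked, {"g1", "g2"}))     # class concentrated at the top
    print(iga_score(ranked, {"g9", "g10"}))    # class at the bottom
```

Because the cut-off is chosen separately for each class, no fixed list of "significant genes" is needed; a class whose members all shift slightly can still stand out.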
17. Graph-based iterative GroupAnalysis (GiGA)
[Figure: network subgraphs labelled respiratory chain complex II, glyoxylate cycle, citrate (TCA) cycle, oxidative phosphorylation (complex V) and respiratory chain complex III]
18. What is a good replicate?
- The experiment your competitor at the other side of the globe would do to see if your results are reproducible
- Vary all parameters: challenge your results
- Prepare new samples, from new cultures, using new buffers and new graduate students
- Remember to produce matched controls
19. What is a bad replicate?
- technical replicates (i.e. hybridizing the same sample repeatedly)
- dye-swapping experiments (usually gene-specific dye bias is not a big issue, and dye balancing is more efficient anyway)
- pooled samples, hybridized repeatedly
- the same preparation, only labelled twice
20. Should samples be pooled?
- most samples are already pooled: they come from multiple cells
- pool to increase amount of mRNA, but only as much as necessary
- prepare independent pools to assess variation
- problems: bias, contamination, outliers, information loss...
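The trade-off between pool size and number of independent pools can be sketched with a simple variance model. It assumes that pooling averages expression at the mRNA level, that individuals and arrays are independent, and that biological and technical variance simply add; the function name and the numbers are illustrative only.

```python
def variance_of_group_mean(var_bio, var_tech, pool_size, n_arrays):
    """Variance of a group mean when each array carries an independent pool.

    Each pool averages `pool_size` individuals, so biological variance
    shrinks by that factor; technical variance is added once per array.
    """
    per_array = var_bio / pool_size + var_tech
    return per_array / n_arrays


if __name__ == "__main__":
    # Illustrative variances on the log-ratio scale.
    for pool_size in (1, 2, 4, 8):
        print(pool_size, variance_of_group_mean(1.0, 0.25, pool_size, n_arrays=4))
```

Under this model the gain from larger pools levels off once the technical variance dominates, and a single pool hybridized repeatedly gives no estimate of biological variation at all, which is why independent pools are needed.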
21. Which arrays are the best?
- Standard arrays
- compare and exchange data easily
- Whole-genome arrays
- detect unexpected effects, increase confidence
- Single-color arrays (Affymetrix GeneChip)
- for more complex comparisons
- Annotated arrays
22. Further reading
- Dobbin, Shih & Simon (2003) J. Natl. Cancer Inst. 95: 1362.
- Yang & Speed (2002) Nature Rev. Genet. 3: 579.
- Breitling (2004) http://www.brc.dcs.gla.ac.uk/rb106x/microarray_tips.htm
23. Contact
Rainer Breitling
Bioinformatics Research Centre
Davidson Building A416
R.Breitling_at_bio.gla.ac.uk
http://www.brc.dcs.gla.ac.uk/rb106x