Title: Design and Analysis of Microarray Experiments at CSIRO Livestock Industries
Toni Reverter
Bioinformatics Group, CSIRO Livestock Industries
Queensland Bioscience Precinct, 306 Carmody Rd., St. Lucia, QLD 4067, Australia
SSAI QLD Branch 6 Apr. 2004
CONTENTS                       Slides   Minutes
- Introduction                    4        6
- Technical Concerns              2        7
- Designs                        21       15
- Analysis                       14       16
- Coverage and Sensitivity        5        7
- Summary                         2        4
1. Introduction
1.a The Material
1. Introduction
1.b - The Method
1. Introduction
1.c - The Challenge
Data Dependent
Time Dependent
Human Dependent
Chronology
Skill Integration
Paradigm
1. Introduction
1.c Human-Dependent Challenge
JOKE
The Biologist and the Statistician are being executed. They are both granted one last request.
The Statistician asks that he/she be allowed to give one final lecture on his/her Grand Theory of Statistics.
The Biologist asks that he/she be executed first.
- Biologists don't care: 10%
- Statisticians are bad: 20%
- Unrealistic expectations: 70%
2. Technical Concerns
- Biochemist Level
  - Preparation (Printing) of the Chip
  - RNA Extraction, Amplification and Hybridisation
  - Optical Scanner (Reading)
- Quantitative Level
  - Design
  - Image (data) Quality
  - Data Analysis
  - Data Storage
Note: Randomisation intentionally neglected.
2. Technical Concerns
2.a Data Quality GP3xCLI
2.b Storage GEXEX
3. Experimental Designs
Key Issues
- Identify/Prioritise Questions
- N of Available Samples
- N of Available Arrays
- Consider Dye Bias
3. Experimental Designs
Glonek & Solomon: Factorial and Time Course Designs for cDNA Microarray Experiments
- Definition
  - A design with a total of n slides and design matrix $X$ is said to be admissible if there exists no other design with n slides and design matrix $X^*$ such that $c_i^* \le c_i$ for all $i$, with strict inequality for at least one $i$, where $c_i^*$ and $c_i$ are respectively the diagonal elements of $(X^{*\prime}X^*)^{-1}$ and $(X'X)^{-1}$.
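The admissibility definition compares the diagonal elements of the inverse of X'X for two designs with the same number of slides. A minimal sketch of that pairwise comparison follows. The three-variety example, the parameterisation (B minus A, C minus A) and all function names are illustrative assumptions, not taken from the slides. Showing that one design dominates another proves the second is inadmissible; proving admissibility would require checking every design with n slides.

```python
from fractions import Fraction

def xtx(X):
    """Return X'X for a design matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(Fraction(r[i]) * r[j] for r in X) for j in range(p)]
            for i in range(p)]

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("X'X is singular: some parameter is not estimable")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

def variances(X):
    """Diagonal elements c_i of (X'X)^-1."""
    inv = inverse(xtx(X))
    return [inv[i][i] for i in range(len(inv))]

def dominates(X_star, X):
    """True if X_star is no worse for every parameter and strictly better for one."""
    c_star, c = variances(X_star), variances(X)
    return (all(a <= b for a, b in zip(c_star, c))
            and any(a < b for a, b in zip(c_star, c)))

# Hypothetical rows: each slide measures a log-ratio between two varieties,
# expressed in the parameters (B - A, C - A).
AB, AC, BC = [1, 0], [0, 1], [-1, 1]
design_1 = [AB, AB, BC]          # three slides
design_2 = [AB, AB, AC]          # three slides
loop = [AB, BC, [0, -1]]         # A->B, B->C, C->A

print(variances(design_1), variances(design_2), variances(loop))
print(dominates(design_2, design_1))   # design_1 is therefore inadmissible
```

In this toy case neither `design_2` nor `loop` dominates the other, which shows why several designs with the same number of slides can all remain candidates.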
N of Configurations?
3. Experimental Designs
N of Configurations?
$S^{A-1}$
3. Experimental Designs
N of Configurations?
Pie-Bald black
Non-Pie-Bald black
Normal
White
Recessive
$S^{A-1} = 5^3 = 125$
3. Experimental Designs
[Figure: design diagram with 25 connections, each labelled x5]
3. Experimental Designs
N of Configurations?
0 hr
24 hr
$S^{A-1} = 10^9 = 1$ Billion!
3. Experimental Designs
Transitivity (Townsend, 2003); Extendability (Kerr, 2003)
Opt 1: 10 Slides
Opt 2: 10 Slides
Opt 3: 11 Slides
Opt 4: 9 Slides
Opt 5: 9 Slides
3. Experimental Designs
N of Configurations?
0 hr
24 hr
$S^{A-1} = 12^{10} \approx 62$ Billion!
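The configuration counts quoted on these slides (125, 1 billion, 62 billion) are consistent with the powers 5^3, 10^9 and 12^10. The short check below reads the slide expressions as base to the power of exponent; that reading is an interpretation of the extracted text, and the dictionary labels are only illustrative.

```python
# Counts quoted on the slides, read as base ** exponent (an interpretation
# of the extracted superscripts, not something the slides spell out).
cases = {"125": (5, 3), "1 Billion": (10, 9), "62 Billion": (12, 10)}

counts = {label: base ** exp for label, (base, exp) in cases.items()}

for label, n in counts.items():
    print(f"{label:>10}: {n:,}")
```

The jump from hundreds to tens of billions is the practical point: exhaustive enumeration of configurations stops being feasible after only a few extra samples or slides.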
3. Experimental Designs
N of Configurations?
[Figure: 0 hr and 24 hr design diagram with a dye label (R or G) on each sample]
3. Experimental Designs
Handling Constraints (Samples & Arrays)
Pooling & Replication
- Pavlidis et al. (2003) The effect of replication on gene expression microarray experiments. Bioinformatics 19:1620
  > 5 Replicates; 10-15 Replicates
- Peng et al. (2003) Statistical implications of pooling RNA samples for microarray experiments. BMC Bioinformatics 4:26
  Power: n9c9 ≈ 95%, n3c3 ≈ 50%, n9c3 ≈ 90%; n25c5 ≈ n20c20
3. Experimental Designs
Pooling & Replication
[Figure: design diagrams with dye labels (R, G) for samples labelled F HS, M HS, F TM and M TM]
N of Arrays? 24 × 23 = 552; with pooling, 14 × 13 = 182
3. Experimental Designs
Pooling & Replication
Reference Design
Sum(ABS): 26.8  26.8  39.1  23.1  17.3  7.1  7.1  14.3  14.3
3. Experimental Designs
Another (NEW?) Constraint
Amount of RNA
Group  Sample                    N   RNA amounts
A      M avium slope 18 days     3   3-3-3
B      M avium broth 18 days    10   1-2-2-1-2-1-2-1-2-1
C      M para broth 10 weeks     5   1-2-2-1-1
D      M para broth 12 weeks     6   1-1-4-5-2-1
E      M para in-vivo            3   1-1-1
3. Experimental Designs
Another (NEW?) Constraint
Amount of RNA
[Figure: grid of all pairwise comparisons among A to E (A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E), each marked with question marks]
Importance due to Transitivity of AB with BC and BD
Procedure: Five configurations will be proposed and the statistical optimality of each evaluated.
[Figure: diagram labelled with the RNA amounts per sample for groups A to E]
Configuration 1
[Figure: diagram of Configuration 1, labelled with the RNA amounts per sample]
Configuration 2
[Figure: diagram of Configuration 2, labelled with the RNA amounts per sample]
Configuration 3
[Figure: diagram of Configuration 3, labelled with the RNA amounts per sample]
Configuration 4
[Figure: diagram of Configuration 4, labelled with the RNA amounts per sample]
Configuration 5
[Figure: diagram of Configuration 5, labelled with the RNA amounts per sample]
Comparison   Imp   Weight              Squared Error
                   1   2   3   4   5    1   2   3   4   5
A-B           4    6   5   6   6   5    4   1   4   4   1
A-C           2    0   2   1   0   0    4   0   1   4   4
A-D           2    3   2   2   3   4    1   0   0   1   4
A-E           1    0   0   0   0   0    1   1   1   1   1
B-C           3    5   5   4   4   5    4   4   1   1   4
B-D           4    4   5   5   5   5    0   1   1   1   1
B-E           1    0   0   0   0   0    1   1   1   1   1
C-D           2    2   0   2   3   2    0   4   0   1   0
C-E           1    0   0   0   0   0    1   1   1   1   1
D-E           4    3   3   3   3   3    1   1   1   1   1
SSE                                    17  14  11  16  18
Noise (D-D)   0    1   2   1   0   0
MSE                                   .74 .64 .48 .66 .75
Conclusion: Configuration 3
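The squared errors on this slide are the squared differences between each comparison's importance score and the weight each configuration gives it. A small sketch of that calculation follows. The MSE divisor is an inference: dividing each SSE by the configuration's total weight over the ten comparisons (noise row excluded) matches the slide's MSE values to within 0.01, but the slides do not state the formula. Variable names are illustrative.

```python
# Importance score per pairwise comparison, as given on the slide.
importance = {"A-B": 4, "A-C": 2, "A-D": 2, "A-E": 1, "B-C": 3,
              "B-D": 4, "B-E": 1, "C-D": 2, "C-E": 1, "D-E": 4}

# Weight given to each comparison under Configurations 1 to 5.
weights = {
    "A-B": [6, 5, 6, 6, 5],
    "A-C": [0, 2, 1, 0, 0],
    "A-D": [3, 2, 2, 3, 4],
    "A-E": [0, 0, 0, 0, 0],
    "B-C": [5, 5, 4, 4, 5],
    "B-D": [4, 5, 5, 5, 5],
    "B-E": [0, 0, 0, 0, 0],
    "C-D": [2, 0, 2, 3, 2],
    "C-E": [0, 0, 0, 0, 0],
    "D-E": [3, 3, 3, 3, 3],
}

n_configs = 5

# Sum of squared (weight - importance) per configuration.
sse = [sum((weights[c][k] - importance[c]) ** 2 for c in importance)
       for k in range(n_configs)]

# Assumed divisor: total weight allocated by the configuration.
total_weight = [sum(weights[c][k] for c in importance) for k in range(n_configs)]
mse = [s / w for s, w in zip(sse, total_weight)]

best = min(range(n_configs), key=lambda k: mse[k]) + 1

print("SSE:", sse)
print("MSE:", [round(m, 2) for m in mse])
print("Best configuration:", best)
```

Under both criteria the smallest value belongs to the third configuration, in line with the slide's conclusion.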
4. Data Analysis
My (EDUCATED?) View
- Relaxed data acquisition criteria
  - Signal to Noise > 1.00 (more relaxed criteria exist)
  - Mean to Median > 0.85 (Tran et al. 2002)
- Moving away from
  - Ratios
  - Heavy-duty normalisation techniques
- Mixed-Model Equations
  - Check residuals
  - Check REML estimates of Variance Components
  - Proportion of Total V due to Gene x Variety
- Process results: Gene x Treatment
  - Mixtures of Distributions
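The two data acquisition criteria amount to a simple per-spot filter. The sketch below assumes each spot record already carries a signal-to-noise value and a mean-to-median value; the field names `s2n` and `m2m` and the example records are hypothetical, and how those two quantities are computed from the image is not covered here.

```python
S2N_MIN = 1.00   # Signal to Noise threshold from the slide
M2M_MIN = 0.85   # Mean to Median threshold from the slide

def is_valid(spot):
    """Keep a spot only if it exceeds both thresholds."""
    return spot["s2n"] > S2N_MIN and spot["m2m"] > M2M_MIN

# Made-up records, for illustration only.
spots = [
    {"id": "spot1", "s2n": 3.2, "m2m": 0.97},   # passes both criteria
    {"id": "spot2", "s2n": 0.8, "m2m": 0.99},   # fails Signal to Noise
    {"id": "spot3", "s2n": 1.5, "m2m": 0.80},   # fails Mean to Median
]

valid = [s["id"] for s in spots if is_valid(s)]
print(valid)
```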
4. Data Analysis
Mixed-Model Equations
Log2 Intensities =
- Comparison Group: ArrayBlockDye (FIXED)
- Main Gene Effect (RANDOM)
- Gene x Dye (RANDOM)
- Gene x ArrayBlock (RANDOM)
- Gene x Variety (RANDOM): DE Genes
- Residual (RANDOM)
Note
missing but (generally) unimportant.
4. Data Analysis
Mixed-Model Equations
Log2(Int.) = CG + Gene + G×Dye + G×Array + G×Variety + Error
Control of FDR
CLAIM: The proportion of the Total Variation accounted for by the G x Variety Interaction anticipates the proportion of DE Genes.
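The claim rests on expressing each random-effect variance component as a share of the total variance. A minimal sketch of that bookkeeping follows. The component values are made-up placeholders, not REML estimates from these experiments, and the function name is illustrative.

```python
def variance_proportions(components):
    """Express each variance component as a percentage of their sum."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}

# Hypothetical variance component estimates for the random terms of the model.
components = {
    "Gene": 8.0,
    "Gene x Dye": 0.2,
    "Gene x Array": 1.0,
    "Gene x Variety": 0.3,
    "Error": 0.5,
}

shares = variance_proportions(components)
for name, pct in shares.items():
    print(f"{name:<15} {pct:5.1f}%")
```

With these placeholder numbers the Gene x Variety term accounts for about 3% of the total, which under the claim would suggest a DE proportion of a similar order.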
4. Data Analysis
[Table: Observations (N, Mean, SD, Min, Max) and Comparison Groups (Levels; Observations per group: Mean, Min, Max); values not recoverable]
4. Data Analysis
- 54 Array Slides
- 959,498 Valid Intensity Records (S2N > 1, M2M > 0.85)
- 7,638 Elements (genes)
- 752,476 Equations
- 56 (Co)Variance Components (REML)
- BAYESMIX (Bayesian Mixtures of distributions)
4. Data Analysis
56 (Co)Variance Components
4. Data Analysis
Total Variance Due to
- Error             3.0   3.6   5.1   6.7   3.0   3.7
- Gene             83.6  90.4  78.3  81.9  47.5  83.9
- Gene x Array      3.5   9.8  10.4  12.6  10.6  43.5
- Gene x Variety    2.4   3.7   2.1   2.6   2.5   5.4
- Genetic Correlations: Moderate (EXP3) to Strong
- Gene × Variety Corr: Strong (EXP1) to Moderate (EXP2)
4. Data Analysis
Measures of (Possible) Differential Expression
i = 1, …, 7,638 genes; j = 1, …, 7 variables; t = 0, …, 5 time points (EXP3 only)
- Other measure definitions could also be valid
4. Data Analysis
Mixtures of Distributions
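To illustrate the general idea of separating a large "null" group of genes from a smaller differentially expressed group, here is a minimal sketch of a two-component normal mixture fitted by EM. This is a swapped-in, simplified technique: the analysis described in these slides used BAYESMIX (Bayesian mixtures), not this code. The simulated data, the starting values and all names are assumptions, and the sketch only looks for an up-regulated group.

```python
import math
import random

def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_two_normals(x, n_iter=100):
    """Fit a two-component normal mixture by EM; component 0 is the 'null' one."""
    xs = sorted(x)
    n = len(xs)
    mean = sum(xs) / n
    spread = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
    # Starting values: null at the median, second component at the 95th percentile.
    mu = [xs[n // 2], xs[int(0.95 * n)]]
    sd = [spread, spread]
    w = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to each component.
        resp = []
        for v in x:
            p = [w[k] * normal_pdf(v, mu[k], sd[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300   # guard against underflow
            resp.append((p[0] / total, p[1] / total))
        # M-step: update mixing weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
    return w, mu, sd, resp

# Simulated measures of differential expression: 900 null genes, 100 DE genes.
random.seed(2004)
values = ([random.gauss(0.0, 1.0) for _ in range(900)]
          + [random.gauss(4.0, 1.0) for _ in range(100)])

w, mu, sd, resp = fit_two_normals(values)
n_flagged = sum(1 for r in resp if r[1] > 0.5)
print("weights:", w, "means:", mu, "flagged as DE:", n_flagged)
```

Each gene receives a posterior probability of belonging to the second component, so genes are judged against the fitted distributions rather than against one fixed cut-off.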
4. Data Analysis
Differentially Expressed Genes
                      Exp1           Exp2           Exp3
                    Up    Down     Up    Down     Up    Down
High-Low   Up      409      0      26     13      36     11
           Down             41       3      0       5      0
HOL-JBL    Up                       68      0       0      8
           Down                           319      10      6
TSS-UTS    Up                                     252      0
           Down                                          109
10 DE Elements across the 3 Exp (2 UP/DOWN/UP, 8 UP/UP/DOWN)
4. Data Analysis
Residuals Plots
4. Data Analysis
Allocation of 238 DE Genes
- Homologs
- Orthologs
- Paralogs
[Figure: allocation of DE genes, Bovine vs Ovine and Up-Regulated vs Down-Regulated: 178 at Day 82, 114 at Day 105, 139 at Day 120, 171 at Inguinal; the remaining counts in the diagram cannot be matched to their labels]
4. Data Analysis
The Real Target Molecular Interaction Maps
Adapted from Aladjem et al. 2004, Science's STKE
5. Coverage and Sensitivity
MPSS Paper: PNAS 2003, 100:4702
> tpm (log10)      N Tags        %
     1 (0.0)       27,965   100.00
     5 (0.7)       15,145    54.16
    10 (1.0)       10,519    37.61
    50 (1.7)        3,261    11.66
   100 (2.0)        1,719     6.15
   500 (2.7)          298     1.07
 1,000 (3.0)          154     0.55
 5,000 (3.7)           26     0.09
10,000 (4.0)            7     0.02
5. Coverage and Sensitivity
Let N_T = N of Total Genes and N_D = N of Differentially Expressed Genes (N_D ≤ N_T)
- The relevance of f(x_i) is limited to the Concentration → Signal mapping.
- At equilibrium the probability of an error either way is equal.
5. Coverage and Sensitivity
5 tpm
100 tpm
5. Coverage and Sensitivity
? < ?
? ?
? > ?
Not many DE genes: High Confidence, Few False +ve
Lots of DE genes: High Power, Few False -ve
6. Summary
- General (i.e. not only CSIRO LI)
  - Still in its infancy (possibly even embryonic stage)
  - Many decisions have a heuristic rather than a theoretical foundation
  - Prone to misconceptions
    - Amount of Expression = Amount of Response
    - Same cut-off point to judge all genes
    - Over-emphasis on normalization (hence, despise Boutique Arrays)
    - Over-emphasis on variance stabilization
    - Over-emphasis on controlling false-positives
    - Over-emphasis on biological replicates (DANGER!)
  - No hope for a "One size fits all" software (even method)
  - Safer to aim towards "Tailor to individuals' needs"
  - Integration of interdisciplinary skills is a must
6. Summary
- Livestock Species
  - Tailing humans (at the moment)
    - Andersson & Georges (2004) Domestic-animal genomics: Deciphering the genetics of complex traits. Nature Reviews Genetics, March 2004, Vol 5:202-212
  - Several key advantages
    - More relaxed ethical issues (relative to R&D in humans)
    - Very strong similarities at the genome level with humans
    - The genome is (being) sequenced for several species
    - Strong background knowledge of genetics accumulated
      - Quantitative genetics
      - Mixed-Model equations
      - Computing expertise
  - Journals will soon be inundated
  - We have the opportunity to participate