Title: Design Issues in cDNA Microarray Analysis by Yang and Speed
- Jim Booth
- UF Genomics Discussion Group
- Feb. 5, 2003
cDNA Microarrays
- Measure the relative expression of up to 20,000 genes (probes) in two mRNA samples.
- Each sample (target) is labeled with its own fluorophore: Cy3 (green) or Cy5 (red).
- After competitive hybridization, the ratio of red/green intensities measures the relative abundance of each DNA probe's target sequence in the two samples.
- Membrane and Affymetrix arrays, by contrast, measure gene expression in each sample separately.
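As a minimal sketch of the quantity being measured, the snippet below computes the log expression ratio log2(red/green) for each spot from hypothetical red (Cy5) and green (Cy3) intensities. The base-2 log is an assumed (though common) convention, under which a two-fold difference gives a ratio of +1 or -1; the intensity values are made up for illustration.

```python
import math


def log_ratios(red, green):
    """Return log2(red / green) for each spot (hypothetical, already background-corrected intensities)."""
    return [math.log2(r / g) for r, g in zip(red, green)]


# Hypothetical intensities for three spots: equal, two-fold higher in red, two-fold lower in red.
red = [1000.0, 4000.0, 500.0]
green = [1000.0, 2000.0, 1000.0]
print(log_ratios(red, green))  # [0.0, 1.0, -1.0]
```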
General Design Issues
- Most efficient use of resources
- Minimize sample size
- Eliminate potential biases
- Answer primary question of interest
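Split Plot Experiment
[Figure: slides 1 to 4, each split into two halves that receive samples A and B.]
- Each cDNA array is a split plot: the two samples share one slide.
- Variability between slides (whole plots) is greater than variability within a slide.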
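To illustrate why the split-plot structure matters, here is a small simulation under an assumed additive model (log intensity = slide effect + sample effect + noise). All parameters are hypothetical: slide effects are given a large spread and spot noise a small one. Comparing A and B on the same slide cancels the slide effect; comparing them across slides does not.

```python
import random
import statistics

random.seed(1)

# Hypothetical model: log intensity = slide effect + sample effect + noise.
# Slide effects vary a lot (sd 2.0); spot-level noise is small (sd 0.3).
TRUE_DIFF = 1.0  # assumed true log-scale difference A - B
N_SLIDE_PAIRS = 2000

within, between = [], []
for _ in range(N_SLIDE_PAIRS):
    s1 = random.gauss(0, 2.0)  # effect of slide 1
    s2 = random.gauss(0, 2.0)  # effect of slide 2
    a_on_1 = s1 + TRUE_DIFF + random.gauss(0, 0.3)
    b_on_1 = s1 + random.gauss(0, 0.3)
    b_on_2 = s2 + random.gauss(0, 0.3)
    within.append(a_on_1 - b_on_1)   # A vs B on the same slide: slide effect cancels
    between.append(a_on_1 - b_on_2)  # A vs B on different slides: slide effects remain

print(statistics.variance(within))   # roughly 2 * 0.3**2 = 0.18
print(statistics.variance(between))  # roughly 2 * (2.0**2 + 0.3**2) = 8.18
```

Both comparisons are unbiased for TRUE_DIFF under this model; the within-slide comparison is simply far less variable.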
Design Issues
- Which mRNA samples (targets) should be labeled with which fluor?
- Which samples should be hybridized together on the same slide?
Practical Issues
- Type of mRNA samples: reference, control, treatment.
- Amount of mRNA available.
- Number of slides.
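Multi-digraphs
Each design is drawn as a multi-digraph: nodes are mRNA samples and each arrow is one hybridization.
Box 1a: [Figure: arrow from A to B labeled 5.] A and B samples hybridized together on 5 replicate slides. The green-labeled sample is at the tail of the arrow and the red-labeled sample at the head.
Box 1b: [Figure: samples A, B and C.] A direct estimate of the relative abundance of A vs. B is more accurate than an indirect one.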
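One minimal way to hold such a design in code is a plain list of (tail, head) pairs, assuming the Box 1a convention (green-labeled sample at the tail, red-labeled sample at the head). Counting the pairs recovers the replicate count and each sample's dye usage. The function name `summarize` is illustrative only.

```python
from collections import Counter

# Box 1a as a multi-digraph: one (tail, head) edge per hybridization,
# green (Cy3) sample at the tail, red (Cy5) sample at the head.
box_1a = [("A", "B")] * 5  # A and B hybridized together on 5 slides


def summarize(design):
    """Count hybridizations per ordered pair, and how often each sample is labeled green or red."""
    edges = Counter(design)
    green = Counter(tail for tail, _ in design)
    red = Counter(head for _, head in design)
    return edges, green, red


edges, green, red = summarize(box_1a)
print(edges)                 # Counter({('A', 'B'): 5})
print(green["A"], red["B"])  # 5 5
```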
One design choice: example 1
Samples: liver tissue from mice treated with cholesterol-modifying drugs and from untreated (control) mice.
Question: Which genes are differentially expressed between treated and untreated mice?
[Figure: design on treated samples T1, T2, T3 and control Ctr.]
One design choice: example 2
Samples: tissue from different tumors.
Question: What are the tumor subtypes?
[Figure: design using a common reference, Ref.]
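Direct v. Indirect Designs
Fig 1a: Direct estimation of the log expression ratio (LER). [Figure: T and C hybridized together; LER and Var(LER).]
Fig 1b: Indirect estimation. [Figure: T and C each hybridized against a reference R; LER and Var(LER).]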
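Assuming each log ratio measured on one slide has variance σ² and that different slides are independent (assumptions, not stated on the slide), the two estimates of the T vs. C log expression ratio compare as follows:

```latex
% Direct (Fig 1a): T and C on the same slide
\widehat{\mathrm{LER}}_{\text{direct}} = \log_2\frac{T}{C},
\qquad
\operatorname{Var}\big(\widehat{\mathrm{LER}}_{\text{direct}}\big) = \sigma^2

% Indirect (Fig 1b): T vs. R and C vs. R on two different slides
\widehat{\mathrm{LER}}_{\text{indirect}} = \log_2\frac{T}{R} - \log_2\frac{C}{R},
\qquad
\operatorname{Var}\big(\widehat{\mathrm{LER}}_{\text{indirect}}\big) = \sigma^2 + \sigma^2 = 2\sigma^2
```

The reference R cancels from the indirect estimate, but the noise from both slides adds, so the indirect design needs twice as many slides to match the direct one.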
Designs with Replication
Dye-swap experiments
- There are systematic differences between red and green intensities.
- Normalization is unlikely to remove this bias for all genes simultaneously.
- Two hybridizations for each target pair.
- Dye assignment is reversed in the second hybridization.
[Figure: T vs. C, estimating the LER without dye-swap and with dye-swap.]
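A sketch of why the dye-swap helps, under an assumed additive model on the log2 scale: a gene-specific dye bias d is added to every red/green log ratio, regardless of which sample carries which dye. Reversing the dyes flips the sign of the true ratio but not of d, so half the difference of the two slides removes d. The values are hypothetical and noise-free to make the cancellation exact.

```python
# Hypothetical additive model for one gene (log2 scale):
#   slide 1: T labeled red, C green  -> M1 = (t - c) + d + e1
#   slide 2: dyes swapped            -> M2 = (c - t) + d + e2
# d is a gene-specific dye bias assumed to be the same on both slides.


def dye_swap_estimate(m1, m2):
    """Combine a dye-swap pair: the dye bias d cancels in (M1 - M2) / 2."""
    return (m1 - m2) / 2


true_ler = 1.0   # assumed log2(T / C)
dye_bias = 0.25  # assumed dye bias for this gene
m1 = true_ler + dye_bias   # noise omitted to show the bias cancelling
m2 = -true_ler + dye_bias

print(m1)                         # 1.25: a single slide is biased by d
print(dye_swap_estimate(m1, m2))  # 1.0: the bias is removed
```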
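Single-factor Designs
Refer to Table 1
Design I: A, B and C each hybridized against a reference R
- 3 slides
- 3 samples (A, B, C × 1)
- average variance 2.00
Design II: A, B and C each hybridized against R on 2 slides
- 6 slides
- 6 samples (A, B, C × 2)
- average variance 1.00
Design III: A, B and C compared directly with each other
- 3 slides
- 6 samples (A, B, C × 2)
- average variance 0.67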
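To see where the average variances in Table 1 can come from, here is a sketch that computes them by ordinary least squares, assuming every log ratio has unit variance (σ² = 1) and slides are independent. Designs are lists of (green, red) edges; the layouts encoded below are an interpretation of the three designs, and the function names are illustrative. One sample is fixed as a baseline, which does not change the variance of any difference between samples.

```python
from fractions import Fraction
from itertools import combinations


def information_matrix(design, params):
    """X^T X for the model: log ratio = x[head] - x[tail] + noise with variance 1.
    `params` lists every sample except the baseline (its log expression is fixed at 0)."""
    idx = {p: i for i, p in enumerate(params)}
    k = len(params)
    xtx = [[Fraction(0)] * k for _ in range(k)]
    for tail, head in design:
        row = [Fraction(0)] * k  # one row of the design matrix X
        if head in idx:
            row[idx[head]] += 1
        if tail in idx:
            row[idx[tail]] -= 1
        for i in range(k):
            for j in range(k):
                xtx[i][j] += row[i] * row[j]
    return xtx


def invert(m):
    """Gauss-Jordan inversion with exact fractions (assumes a connected design)."""
    k = len(m)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def average_variance(design, samples, baseline):
    """Average variance (units of sigma^2) of the least-squares estimates of
    x[j] - x[k] over all pairs j, k drawn from `samples`."""
    params = sorted({s for edge in design for s in edge} - {baseline})
    idx = {p: i for i, p in enumerate(params)}
    inv = invert(information_matrix(design, params))

    def var(j, k):
        c = [Fraction(0)] * len(params)  # contrast vector for x[j] - x[k]
        if j in idx:
            c[idx[j]] += 1
        if k in idx:
            c[idx[k]] -= 1
        return sum(c[a] * inv[a][b] * c[b] for a in range(len(c)) for b in range(len(c)))

    pairs = list(combinations(samples, 2))
    return sum(var(j, k) for j, k in pairs) / len(pairs)


# Edges are (green sample, red sample); direction does not affect the variances.
design_1 = [("R", "A"), ("R", "B"), ("R", "C")]  # common reference, 3 slides
design_2 = design_1 * 2                          # common reference, each twice, 6 slides
design_3 = [("A", "B"), ("B", "C"), ("C", "A")]  # direct comparisons in a loop, 3 slides

for name, design, baseline in [("I", design_1, "R"), ("II", design_2, "R"), ("III", design_3, "A")]:
    v = average_variance(design, ["A", "B", "C"], baseline)
    print(name, f"{float(v):.2f}")  # I 2.00, II 1.00, III 0.67
```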
Loop designs
- Design III is not feasible for larger numbers of samples.
- e.g., if n = 6 there are 15 pairs, and each sample is co-hybridized on 5 arrays.
- Loop designs
- Direct: A vs. B comparison
- Indirect: A vs. C comparison
- Are some comparisons more important than others?
[Figure: loop design on samples A, B, C and D, with A adjacent to B and opposite C.]
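The arithmetic behind these points can be sketched as follows, assuming one slide per comparison, unit variance per log ratio and independent slides. For a single loop of n samples, the two paths between samples d steps apart are independent estimates with variances d and n - d; combining them with inverse-variance weights gives d(n - d)/n. Reading the figure as the loop A, B, C, D (A adjacent to B, opposite C) is an assumption.

```python
from fractions import Fraction
from math import comb


def all_pairs_slides(n):
    """Slides needed to co-hybridize every pair of n samples once (each sample is then on n - 1 slides)."""
    return comb(n, 2)


def loop_variance(n, d):
    """Variance (units of sigma^2) of the least-squares estimate of the difference between
    two samples d steps apart in a single loop of n samples, one slide per edge.
    Paths of variance d and n - d combine as 1 / (1/d + 1/(n - d)) = d * (n - d) / n."""
    return Fraction(d * (n - d), n)


print(all_pairs_slides(6))  # 15
print(loop_variance(4, 1))  # 3/4  (direct, e.g. A vs. B)
print(loop_variance(4, 2))  # 1    (indirect, e.g. A vs. C)
```

With n = 3 the loop is the same as the all-pairs design, and loop_variance(3, 1) gives 2/3, matching Design III.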
Time-course experiments
Refer to Table 2
- Design I: T1 as common reference (T2, T3 and T4 each compared with T1); average variance 1.5
- Design II: direct sequential (T1 vs. T2, T2 vs. T3, T3 vs. T4); average variance 1.67
- Design V: direct loop (T1 vs. T2, T2 vs. T3, T3 vs. T4, T4 vs. T1); average variance 0.83
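The average variances in Table 2 can be checked under the same assumptions (unit-variance log ratios, independent slides, one slide per comparison). Variances add along a chain of comparisons, and the two independent paths around a loop combine by inverse-variance weighting. The comparisons assumed for each design are given in the comments; the averages are taken over all six pairs of time points.

```python
from fractions import Fraction
from itertools import combinations

TIMES = [1, 2, 3, 4]  # time points T1..T4


def var_common_reference(i, j):
    # Design I (assumed): T2, T3, T4 each hybridized against T1.
    # T1 vs Tk is direct (1); Tj vs Tk goes through T1 (1 + 1).
    return 1 if 1 in (i, j) else 2


def var_sequential(i, j):
    # Design II (assumed): T1-T2, T2-T3, T3-T4; one path of |i - j| slides in series.
    return abs(i - j)


def var_loop(i, j, n=4):
    # Design V (assumed): T1-T2, T2-T3, T3-T4, T4-T1; parallel paths of d and n - d slides.
    d = abs(i - j)
    return Fraction(d * (n - d), n)


def average(var):
    """Average variance (units of sigma^2) over all pairs of time points."""
    pairs = list(combinations(TIMES, 2))
    return sum(Fraction(var(i, j)) for i, j in pairs) / len(pairs)


for name, var in [("I", var_common_reference), ("II", var_sequential), ("V", var_loop)]:
    print(name, f"{float(average(var)):.2f}")  # I 1.50, II 1.67, V 0.83
```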
Multi-factorial Experiments
- Study differential expression that results from the joint effect of two or more factors.
- Interaction: the joint effect is not the sum of the separate effects.
- 2x2 factorial experiment
- e.g., two ways of treating a cell line, A and B. Let C denote mRNA derived from untreated cells, A and B mRNA from cells given only treatment A or only treatment B, and AB mRNA from cells given both.

                       Factor 2: Untreated   Factor 2: Treated
Factor 1: Untreated    C                     B
Factor 1: Treated      A                     AB
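To make "interaction" concrete, here is a sketch using one common parameterization (an assumption here; other conventions average each main effect over the levels of the other factor): on the log scale, C = μ, A = μ + a, B = μ + b and AB = μ + a + b + ab. The expression levels are hypothetical.

```python
def factorial_effects(c, a, b, ab):
    """Effects for a 2x2 factorial from the log expression levels of C, A, B and AB,
    using the untreated sample C as the baseline."""
    main_a = a - c                # effect of treatment A alone
    main_b = b - c                # effect of treatment B alone
    interaction = ab - a - b + c  # (AB - C) minus the sum of the separate effects
    return main_a, main_b, interaction


# Hypothetical log2 expression levels for one gene.
print(factorial_effects(c=0.0, a=1.0, b=2.0, ab=3.0))  # (1.0, 2.0, 0.0): additive, no interaction
print(factorial_effects(c=0.0, a=1.0, b=2.0, ab=5.0))  # (1.0, 2.0, 2.0): joint effect exceeds the sum
```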
2 x 2 Factorial Experiment 1
Refer to Table 3
Design I [Figure: design on samples C, A, B and AB.]
Effect          Variance
main A          0.50
main B          0.50
interaction     1.50
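The Design I layout did not survive extraction, but one layout that reproduces these figures is sketched below as an assumption: C serves as a common reference, and A, B and AB are each hybridized against C on 2 slides (6 slides in total). With the baseline parameterization (main A = A - C, main B = B - C, interaction = AB - A - B + C), unit variance per log ratio and independent slides, each effect is a signed sum of independent averaged log ratios, so its variance is the sum of squared coefficients times σ²/2.

```python
from fractions import Fraction

# ASSUMED layout (not recoverable from the slide): C is a common reference, and
# A, B and AB are each hybridized against C on 2 slides. The averaged log ratios
# y_A, y_B, y_AB are then independent, each with variance sigma^2 / 2.
REPLICATES = 2
v = Fraction(1, REPLICATES)  # variance of each averaged log ratio, units of sigma^2


def contrast_variance(coefs):
    """Variance of sum(coef * y) for independent y's that share variance v."""
    return sum(Fraction(c) ** 2 for c in coefs) * v


# Coefficients refer to (y_A, y_B, y_AB).
main_a = contrast_variance([1, 0, 0])         # y_A estimates A - C
main_b = contrast_variance([0, 1, 0])         # y_B estimates B - C
interaction = contrast_variance([-1, -1, 1])  # y_AB - y_A - y_B = AB - A - B + C

print(float(main_a), float(main_b), float(interaction))  # 0.5 0.5 1.5
```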
2 x 2 Factorial Experiment 2
Refer to Table 3
Design III [Figure: design on samples C, A, B and AB.]
Effect          Variance
main A          0.67
main B          0.43
interaction     0.67
Variability and Replication 1
- Spot replicates: replicate cDNA probes on the array.
- Technical replicates: replicate hybridizations using target mRNA from the same pool.
- Biological replicates: replicate hybridizations using mRNA from different extractions.
- e.g., different samples of cells from the same tissue.
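Variability and Replication 2
- Averaging over replicates reduces variability.
- Let $M_i$ denote the log expression ratio from replicate $i$, $i = 1, \dots, n$. Then, for independent replicates,
$$\operatorname{Var}(\bar{M}) = \frac{\sigma^2}{n}, \qquad \bar{M} = \frac{1}{n}\sum_{i=1}^{n} M_i,$$
where $\sigma^2 = \operatorname{Var}(M_i)$ is the variance of a single log ratio.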
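A quick simulation can check the σ²/n rule, assuming independent, normally distributed replicate log ratios with a hypothetical σ = 1 and n = 4. If replicates are positively correlated (as technical replicates drawn from one pool may be), the variance of the average is larger than σ²/n, so the gain from averaging is smaller.

```python
import random
import statistics

random.seed(0)
SIGMA = 1.0       # assumed sd of a single log ratio
N_REPS = 4        # replicates averaged in each simulated experiment
N_TRIALS = 20000  # simulated experiments

means = [
    statistics.fmean(random.gauss(0.0, SIGMA) for _ in range(N_REPS))
    for _ in range(N_TRIALS)
]
print(statistics.variance(means))  # close to SIGMA**2 / N_REPS = 0.25
```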
Power Analysis
Power calculation (to determine sample size) requires
- Variance of individual log ratios, $\sigma^2$.
- Magnitude of effect to be detected.
- Acceptable false-positive rate.
- e.g., how many hybridizations are required to have a 90% chance of detecting a two-fold change?
- The variance $\sigma^2$ varies across genes.
- Use the median or upper quartile of the variances from previous experiments.
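A minimal sample-size sketch for the question above, using a normal approximation (an assumed simplification): with independent log2 ratios of standard deviation σ, a two-fold change is δ = log2(2) = 1, and a two-sided z-test on the mean of n ratios needs roughly n ≥ ((z₁₋α/₂ + z_power) σ / δ)². The σ and α values below are hypothetical. With thousands of genes tested at once, a much smaller per-gene α than 0.05 is usually appropriate.

```python
import math
from statistics import NormalDist


def hybridizations_needed(sigma, delta=1.0, alpha=0.05, power=0.9):
    """Smallest n for which a two-sided z-test on the mean of n independent log2 ratios
    (each with sd sigma) detects a true mean of delta with the requested power.
    Normal approximation; ignores the tiny chance of rejecting in the wrong direction."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical sigma = 0.5 for a gene (e.g. an upper-quartile value from earlier experiments).
print(hybridizations_needed(sigma=0.5))               # 3
print(hybridizations_needed(sigma=0.5, alpha=0.001))  # 6
```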