Title: Introduction to the design of cDNA microarray experiments
1 Introduction to the design of cDNA microarray experiments
- Statistics 246, Spring 2002
- Week 9, Lecture 1
- Yee Hwa Yang
2 Some aspects of design
- Layout of the array
- Which cDNA sequence to print?
- Library
- Controls
- Spatial position
- Allocation of samples to the slides
- Different design layout
- A vs B: Treatment vs control
- Multiple treatments
- Time series
- Factorial
- Replication
- number of hybridizations
- use of dye swap in replication
- Different types of replicates (e.g. pooled vs unpooled material (samples))
- Other considerations
- Physical limitations: the number of slides and the amount of material
- Extensibility: linking
3 Issues that affect the design of array experiments
- Scientific
- Aim of the experiment
- Specific questions and priorities between them.
- How will the experiments answer the questions posed?
- Practical (logistic)
- Types of mRNA samples
- reference, control, treatment 1, etc.
- Amount of material.
- Count the amount of mRNA involved in one channel of a hybridization as one unit.
- Number of slides available for the experiment.
- Other Information
- The experimental process prior to hybridization
- sample isolation, mRNA extraction, amplification, labelling.
- Controls planned
- positive, negative, ratio, etc.
- Verification method
- Northern, RT-PCR, in situ hybridization, etc.
4 Graphical representation
5 Natural design choice
- Case 1: Meaningful biological control (C)
- Samples: Liver tissue from four mice treated with cholesterol-modifying drugs.
- Question 1: Genes that respond differently between the treatment (T) and the control (C).
- Question 2: Genes that respond similarly across two or more treatments relative to control.
- Case 2: Use of a universal reference.
- Samples: Different tumor samples.
- Question: To discover tumor subtypes.
6 Direct vs Indirect
- Two samples
- e.g. KO vs. WT or mutant vs. WT
[Figure: direct design, T vs C; indirect design, T vs Ref and C vs Ref]
- Direct: average(log(T/C)), variance σ²/2
- Indirect: log(T/Ref) − log(C/Ref), variance 2σ²
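The two variances can be checked with a short calculation. This is a sketch under stated assumptions: each slide yields one log-ratio with independent error of variance σ², both designs use two slides, and y_1, y_2 (names introduced here) are the two replicate measurements of log(T/C) in the direct design.

```latex
\begin{align*}
\text{Direct:}\quad
  & \operatorname{Var}\!\left(\tfrac{1}{2}(y_1 + y_2)\right)
    = \tfrac{1}{4}\left(\sigma^2 + \sigma^2\right) = \frac{\sigma^2}{2} \\
\text{Indirect:}\quad
  & \operatorname{Var}\!\left(\log(T/\mathrm{Ref}) - \log(C/\mathrm{Ref})\right)
    = \sigma^2 + \sigma^2 = 2\sigma^2
\end{align*}
```

Under these assumptions the direct estimate is four times as precise as the indirect one for the same number of slides.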
7 One-way layout: one factor, k levels
Design              I) Common reference   II) Common reference   III) Direct comparison
Number of slides    N = 3                 N = 6                  N = 3
Units of material   A = B = C = 1         A = B = C = 2          A = B = C = 2
Ave. variance       2                     1                      0.67
All pair-wise comparisons are of equal importance.
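One way to see where the value 0.67 for the direct comparison can come from is sketched below. It assumes the three slides are A vs B, B vs C and C vs A (the design diagram is not reproduced here, so this pairing is an assumption) and that each slide gives a log-ratio with independent variance σ². The contrast log(A/B) then has a direct estimate from one slide and an independent indirect estimate through C, and the two are combined with inverse-variance weights.

```latex
\begin{align*}
\operatorname{Var}(\text{direct}) &= \sigma^2, \qquad
\operatorname{Var}(\text{indirect}) = \operatorname{Var}\!\left(\log(A/C) + \log(C/B)\right) = 2\sigma^2 \\
\operatorname{Var}(\text{combined}) &= \left(\frac{1}{\sigma^2} + \frac{1}{2\sigma^2}\right)^{-1}
  = \frac{2}{3}\,\sigma^2 \approx 0.67\,\sigma^2
\end{align*}
```

By symmetry the same value holds for every pair, which agrees with the tabulated average. For the common reference designs, each pairwise contrast is a difference of two reference log-ratios: 2σ² with one slide per sample, and σ² when each is averaged over two slides.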
8 Dye-swap
[Figure: Design B1 and Design B2, direct comparisons among A, B and C]
- Design B1 and B2 have the same average variance.
- The direction of the arrows potentially affects the bias of the estimate but not the variance.
- For k = 3, efficiency ratio (Design A1 / Design B) = 3.
- In general, efficiency ratio = (2k) / (k − 1).
9 Design: how we sliced up the bulb
[Figure: regions of the bulb labelled A, D, P, L, V and M]
10 Multiple direct comparisons between different samples (no common reference)
Different ways of estimating the same contrast, e.g. A compared to P:
- Direct: log(A/P)
- Indirect: log(A/M) + log(M/P), or log(A/D) + log(D/P), or log(A/L) − log(P/L)
[Figure: direct hybridizations among D, A, M, L, P and V]
How do we combine these?
11 Linear model analysis
Define a matrix X so that E(Y) = Xβ, where a = log(A), p = log(P), d = log(D), v = log(V), m = log(M), l = log(L).
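As an illustration of how the rows of X arise, consider three hypothetical hybridizations: A vs P, A vs M and M vs P. The actual slide pairings of the experiment are not listed here, so these three are assumptions chosen only for the example. With the parameters ordered as (a, p, d, v, m, l), each row has +1 in the column of the numerator sample and −1 in the column of the denominator sample:

```latex
\[
E\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} \log(A/P) \\ \log(A/M) \\ \log(M/P) \end{pmatrix}
= \begin{pmatrix}
 1 & -1 & 0 & 0 &  0 & 0 \\
 1 &  0 & 0 & 0 & -1 & 0 \\
 0 & -1 & 0 & 0 &  1 & 0
\end{pmatrix}
\begin{pmatrix} a \\ p \\ d \\ v \\ m \\ l \end{pmatrix}
\]
```

Every row sums to zero, so only differences such as a − p can be estimated. Least squares then combines the direct and indirect routes to each contrast automatically.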
12 Time Series
[Figure: time points T1 to T7 and a reference sample (Ref)]
- Possible designs
- All samples vs common pooled reference
- All samples vs time 0
- Direct hybridization between times.
[Figure panels: Pooled reference; Compare to T1; t vs t+1; t vs t+2; t vs t+3]
13 Design choices in time series
T1T2, T2T3 and T3T4 are t vs t+1 comparisons; T1T3 and T2T4 are t vs t+2; T1T4 is t vs t+3.
Design                                  T1T2   T2T3   T3T4   T1T3   T2T4   T1T4   Ave
N=3  A) T1 as common reference          1      2      2      1      2      1      1.5
N=3  B) Direct hybridization            1      1      1      2      2      3      1.67
N=4  C) Common reference                2      2      2      2      2      2      2
N=4  D) T1 as common ref + more         0.67   0.67   1.67   0.67   1.67   1      1.06
N=4  E) Direct hybridization choice 1   0.75   0.75   0.75   1      1      0.75   0.83
N=4  F) Direct hybridization choice 2   1      0.75   1      0.75   0.75   0.75   0.83
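The tabulated values can be reproduced with a small least-squares calculation. The sketch below is not taken from the lecture: the function name `contrast_variance` is hypothetical, and the slide pairings for each design are assumptions inferred from the design names (the design diagrams are not reproduced here). It assumes every slide gives one log-ratio with independent variance σ² and reports variances in units of σ². Dye orientation is arbitrary because it does not change the variance.

```python
from fractions import Fraction

def contrast_variance(slides, pair):
    """Variance (in units of sigma^2) of the least-squares estimate of
    log(pair[0] / pair[1]) from a connected set of two-colour slides."""
    samples = sorted({s for slide in slides for s in slide})
    # Only differences are estimable, so fix the last sample's effect at zero.
    col = {s: j for j, s in enumerate(samples[:-1])}
    n = len(col)

    def row(first, second):
        r = [Fraction(0)] * n
        if first in col:
            r[col[first]] += 1
        if second in col:
            r[col[second]] -= 1
        return r

    X = [row(a, b) for a, b in slides]
    c = row(*pair)
    # Augmented normal equations [X'X | c]; solve (X'X) z = c exactly.
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] + [c[i]]
         for i in range(n)]
    for i in range(n):  # Gauss-Jordan elimination
        p = next(r for r in range(i, n) if A[r][i] != 0)
        A[i], A[p] = A[p], A[i]
        A[i] = [v / A[i][i] for v in A[i]]
        for r in range(n):
            if r != i:
                A[r] = [vr - A[r][i] * vi for vr, vi in zip(A[r], A[i])]
    z = [A[i][n] for i in range(n)]
    return sum(ci * zi for ci, zi in zip(c, z))  # c' (X'X)^-1 c

# Assumed slide pairings for four of the designs.
designs = {
    "A) T1 as common reference": [("T2", "T1"), ("T3", "T1"), ("T4", "T1")],
    "B) Direct hybridization": [("T1", "T2"), ("T2", "T3"), ("T3", "T4")],
    "C) Common reference": [(t, "Ref") for t in ("T1", "T2", "T3", "T4")],
    "E) Direct hybridization, loop": [("T1", "T2"), ("T2", "T3"),
                                      ("T3", "T4"), ("T4", "T1")],
}
pairs = [("T1", "T2"), ("T2", "T3"), ("T3", "T4"),
         ("T1", "T3"), ("T2", "T4"), ("T1", "T4")]

for name, slides in designs.items():
    v = [contrast_variance(slides, p) for p in pairs]
    print(name, [round(float(x), 2) for x in v],
          round(float(sum(v) / len(v)), 2))
```

Exact fractions are used so that values such as 3/4 are not blurred by rounding. Other pairings can be tried by adding entries to `designs`.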
14 2 by 2 factorial: two factors, each with two levels
- Example 1: Suppose we wish to study the joint effect of two drugs, A and B.
- 4 possible treatment combinations
- C: No treatment
- A: drug A only.
- B: drug B only.
- A.B: both drug A and B.
- Example 2: Our interest is in comparing two strains of mice (mutant and wild-type) at two different times, postnatal and adult.
- 4 possible samples
- C: WT at postnatal
- A: WT at adult (effect of time only)
- B: MT at postnatal (effect of the mutation only)
- A.B: MT at adult (effect of both time and the mutation).
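15 Factorial design
[Figure: samples C = μ, A = μ + a, B = μ + b and AB = μ + a + b + ab, connected by hybridizations numbered 1 to 6]
Different ways of estimating parameters, e.g. the B effect:
- Slide 1: (μ + b) − (μ) = b
- Slides 2 − 5: ((μ + a) − (μ)) − ((μ + a) − (μ + b)) = (a) − (a − b) = b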
16 Factorial design
[Figure: samples μ, μ + a, μ + a + b + ab and μ + b]
17 2 x 2 factorial
Design I) is indirect; designs II), III) and IV) are a balance of direct and indirect.
Design            I)      II)     III)    IV)
Slides            N = 6   N = 6   N = 6   N = 6
Main effect A     0.5     0.67    0.5     NA
Main effect B     0.5     0.43    0.5     0.3
Interaction A.B   1.5     0.67    1       0.67
Table entries are variances.
18 Linear model analysis
Define a matrix X so that E(Y) = Xβ. Use least squares estimates for a, b, ab.
19
y1 = log(A/C) = a
y2 = log(B/C) = b
y3 = log(AB/C) = a + b + ab
- Common reference approach
- Estimate (ab) with y3 − y2 − y1
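The variance of these estimates follows from the rule for a linear combination of independent measurements. The sketch below assumes each of y1, y2 and y3 is measured on two replicate slides (six slides in total), that every slide has independent variance σ², and that variances are reported in units of σ². The function name `combination_variance` is hypothetical.

```python
from fractions import Fraction

def combination_variance(coefficients, replicates):
    """Variance (in units of sigma^2) of sum_i c_i * mean(y_i), where each
    y_i is averaged over `replicates` independent slides."""
    return sum(Fraction(c * c, replicates) for c in coefficients)

# Coefficients on (y1, y2, y3) = (log(A/C), log(B/C), log(AB/C)).
estimators = {
    "a": (1, 0, 0),      # a  = y1
    "b": (0, 1, 0),      # b  = y2
    "ab": (-1, -1, 1),   # ab = y3 - y2 - y1
}

for name, coeffs in estimators.items():
    print(name, float(combination_variance(coeffs, replicates=2)))
# prints: a 0.5, b 0.5, ab 1.5 (one per line)
```

Under these assumptions the interaction is three times as variable as either main effect, because it depends on all three log-ratios.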
21 More general n by m factorial experiment
- 2 factors, one with n levels and the other with m levels
- OE experiment (2 by 2)
- interested in the differences between zones and between ages, and also in the zone.age interaction.
- Further experiment (2 by 3)
- only interested in genes where the difference between treatment and control changes with time.
[Figure: treatment and control samples at times 0, 12 and 24]
22
[Figure: six samples connected by hybridizations numbered 1 to 7]
WT.P1 = μ
WT.P11 = μ + a1
WT.P21 = μ + a1 + a2
MT.P1 = μ + b
MT.P11 = μ + a1 + b + a1.b
MT.P21 = μ + (a1 + a2) + b + (a1 + a2).b
23 Replication
- Why replicate slides?
- Provides a better estimate of the log-ratios
- Essential to estimate the variance of log-ratios
- Different types of replicates
- Technical replicates
- Within slide vs between slides
- Biological replicates
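The two reasons for replication can be illustrated numerically. The values below are hypothetical log2 ratios for a single gene on four replicate slides, invented for this example only. Averaging gives the improved estimate, and the spread across slides gives the variance estimate that a single slide cannot provide.

```python
import math
import statistics

# Hypothetical log2 ratios for one gene on four replicate slides.
log_ratios = [1.0, 1.4, 0.8, 1.2]

mean_m = statistics.mean(log_ratios)         # better estimate of the log-ratio
var_m = statistics.variance(log_ratios)      # sample variance; needs >= 2 slides
se_mean = math.sqrt(var_m / len(log_ratios)) # standard error of the mean

print(f"mean={mean_m:.3f} variance={var_m:.4f} se={se_mean:.3f}")
```

With a single slide, `statistics.variance` raises an error, which mirrors the point that replication is essential for estimating the variance of log-ratios.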
24 Sample size
Apo A1 Data Set
25 Technical replication - labelling
- 3 sets of self-self hybridization (cerebellum vs cerebellum)
- Data 1 and Data 2 were labeled together and hybridized on two slides separately.
- Data 3 were labeled separately.
[Figure: plots comparing Data 1, Data 2 and Data 3]
27 Technical replication - amplification
- Olfactory bulb experiment
- 3 sets of Anterior vs Dorsal performed on different days
- 10 and 12 were from the same RNA isolation and amplification
- 12 and 18 were from different dissections and amplifications
- All 3 data sets were labeled separately before hybridization
28
[Figure: Replicate Design 1 and Replicate Design 2, each showing original samples T1 and T2, amplification steps, and amplified samples, with hybridizations numbered 1 to 4]
29
M1 (Lc.WT.P1) = μ
M2 (Lc.WT.P11) = μ + a1
M3 (Lc.WT.P21) = μ + (a1 + a2)
M4 (Lc.MT.P1) = μ + b
M5 (Lc.MT.P11) = μ + a1 + b + a1.b
M6 (Lc.MT.P21) = μ + (a1 + a2) + b + (a1 + a2).b
- Common reference approach
- Estimate (a1.b) with M5 − M4 − M2 + M1
- Estimate (a1 + a2).b with M6 − M4 − M3 + M1
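A quick way to check that these combinations isolate the interaction terms is to track the coefficient of every model term. The sketch below is only a bookkeeping aid. The term labels ("mu", "a1", "a2", "b", "a1.b", "(a1+a2).b") and the helper `expected` are names chosen here, and the expected values of M1 to M6 are taken from the model for the six samples.

```python
from collections import Counter

# Expected value of each log-ratio, written as coefficients on model terms.
E = {
    "M1": Counter({"mu": 1}),
    "M2": Counter({"mu": 1, "a1": 1}),
    "M3": Counter({"mu": 1, "a1": 1, "a2": 1}),
    "M4": Counter({"mu": 1, "b": 1}),
    "M5": Counter({"mu": 1, "a1": 1, "b": 1, "a1.b": 1}),
    "M6": Counter({"mu": 1, "a1": 1, "a2": 1, "b": 1, "(a1+a2).b": 1}),
}

def expected(combo):
    """Expected value of a weighted sum of log-ratios, as {term: coefficient}."""
    total = Counter()
    for name, weight in combo.items():
        for term, coef in E[name].items():
            total[term] += weight * coef
    return {term: coef for term, coef in total.items() if coef != 0}

print(expected({"M5": 1, "M4": -1, "M2": -1, "M1": 1}))  # {'a1.b': 1}
print(expected({"M6": 1, "M4": -1, "M3": -1, "M1": 1}))  # {'(a1+a2).b': 1}
```

All baseline and main-effect terms cancel, leaving only the interaction term in each case.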