Design of microarray gene expression profiling experiments - PowerPoint PPT Presentation

1
Design of microarray gene expression profiling
experiments
  • Peter-Bram 't Hoen

2
Lay-out
  • Practical considerations
  • Pooling
  • Randomization
  • One-color vs Two-colors
  • Two-color hybridization designs
  • Ratio-based vs Intensity-based analysis

3
Think before you start
  • research question
  • choice of technology
  • controls and replicates
  • Ref: Churchill (2002). Nature Genetics Supplement
    32, 490-495

4
Research question
  • Limit your (initial) number of questions /
    conditions
  • choose best timepoint for mRNA regulation
  • can be different from protein/activity
  • pilots using RT-qPCR
  • experimental follow-up
  • what will you do with the data?
  • verification of differential gene expression
  • in vitro experiments to study mechanism
  • "in vivo" verification in tissue sections

5
Choice of technology
  • What is affordable?
  • Do a pilot to estimate the variance for your
    samples, experimental set-up and platform
  • Calculate your power: what is the smallest
    effect size that you can detect?
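
The power question above can be sketched with a normal approximation for a two-group comparison of log2 expression values. The function names, the defaults (5% significance, 80% power) and the example numbers are illustrative assumptions, not values from the presentation; a real microarray analysis would need a stricter significance level to account for testing thousands of genes.

```python
import math
from statistics import NormalDist


def min_detectable_effect(sd, n_per_group, alpha=0.05, power=0.80):
    """Smallest difference in mean log2 expression detectable between two groups.

    Normal approximation to a two-sided two-sample test; `sd` is the
    per-sample standard deviation estimated from the pilot experiment.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum * sd * math.sqrt(2 / n_per_group)


def replicates_needed(sd, effect, alpha=0.05, power=0.80):
    """Replicates per group needed to detect `effect` (log2 scale), rounded up."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum * sd / effect) ** 2)


# Hypothetical pilot result: sd of 0.5 log2 units, 4 replicates per group.
print(round(min_detectable_effect(0.5, 4), 2))  # 0.99, i.e. roughly a 2-fold change
print(replicates_needed(0.5, 1.0))              # 4
```

With these assumed numbers, effects much smaller than a 2-fold change would go undetected unless more replicates are added or the variance is reduced.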

6
Controls
  • positive: genes whose regulation is known
    • check on biological experiment and data analysis
  • positive: spikes in mRNA and/or hyb mix
    • check labeling procedure and hybridization
    • detection range (sensitivity) and dynamic range
    • "landing lights" for gridding software
  • negative controls: non-specific binding
    • check cross-hybridization: buffer, non-homologous
      DNA

7
Spikes
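[Figure: spike-in controls (RCA, Cab, rbcL, LTP4, LTP6, XCP2, RPC1, NAC1, TIM, PRK) added at known copy numbers before cDNA probe synthesis and hybridization]
Spiked 2-fold change (copies/cell): 2 vs 1, 10 vs 5, 60 vs 30, 100 vs 50, 300 vs 150
Spiked 3-fold change (copies/cell): 3 vs 1, 15 vs 5, 60 vs 20, 150 vs 50, 300 vs 100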
8
Spikes
Van de Peppel et al. EMBO Reports 4, 387 (2003)
9
Controls
  • positive: genes whose regulation is known
    • check on biological experiment and data analysis
  • positive: spikes in mRNA and/or hyb mix
    • check labeling procedure and hybridization
    • detection range (sensitivity) and dynamic range
    • "landing lights" for gridding software
  • negative controls: non-specific binding
    • check cross-hybridization: buffer, non-homologous
      DNA

10
Replicates
  • Include sufficient replicates, based on pilot
    experiment
  • Biological replicates are preferred over
    technical replicates
  • Control experimental variables with possible
    unintended effects
    • genetic background
    • gender
    • age

11
Randomization
  • Randomize samples with respect to experimental
    influences
    • experimenter
    • day of hybridization
    • batch of arrays
    • dye
    • etc.
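
A minimal sketch of such a randomization: within each experimental group, the levels of every nuisance factor (experimenter, day, batch, dye) are spread as evenly as possible and then assigned at random, so that no factor coincides with a group. The function names, group names and factor levels are hypothetical.

```python
import random


def balanced_labels(levels, n, rng):
    """Return n labels spread as evenly as possible over `levels`, in random order."""
    full, rest = divmod(n, len(levels))
    pool = list(levels) * full + rng.sample(list(levels), rest)
    rng.shuffle(pool)
    return pool


def randomize(groups, factors, seed=None):
    """Assign nuisance-factor levels to samples, balanced within each group.

    groups:  {group name: [sample ids]}
    factors: {factor name: [levels]}
    """
    rng = random.Random(seed)
    plan = []
    for group, samples in groups.items():
        labels = {name: balanced_labels(levels, len(samples), rng)
                  for name, levels in factors.items()}
        for i, sample in enumerate(samples):
            row = {"sample": sample, "group": group}
            row.update({name: labels[name][i] for name in factors})
            plan.append(row)
    return plan


# Hypothetical experiment: two groups of four samples.
plan = randomize(
    {"control": ["c1", "c2", "c3", "c4"], "treated": ["t1", "t2", "t3", "t4"]},
    {"experimenter": ["X", "Y"], "day": [1, 2], "dye": ["Cy3", "Cy5"]},
    seed=1,
)
for row in plan:
    print(row)
```

Fixing the seed makes the plan reproducible, which helps when the assignment has to be documented alongside the data.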

12
Pooling
  • Often done because of lack of sufficient amounts
    of RNA, but good amplification protocols are
    available
  • Advantages:
    • dampening of individual variation, may increase
      statistical power
  • Generally not recommended:
    • outliers in the population may result in large
      and significant effects
    • information on the differences within the
      population is lost, although it is probably
      biologically relevant
    • in fact, it is an artificial way to increase the
      significance of your findings

13
Hybridization design
  • One color: not many difficulties expected
  • Two color: what to hybridize with what, in which
    color?
    • Reference design
    • Paired design
    • Loop design
    • Mixed design
  • Read: Yang & Speed (2002). Design issues for cDNA
    microarray experiments. Nature Reviews Genetics
    3, 579-588

14
Hybridization design: general issues
  • Comparisons on the same array are more precise
    than comparisons on different arrays
    • Identify most important comparisons
    • Hybridize those on the same slide
  • Dye swap
    • A dye effect is always there
    • Balance designs with respect to dye (exception:
      some common reference designs)

15
Common reference vs direct hybridizations
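  • Direct (A and B together on two slides): if
    $\operatorname{Var}[\log(A/B)] = s^2$ for one slide, then the
    variance of the average of the two measurements
    is $s^2/2$
  • Common reference (A vs R and B vs R, one slide each):
    $\log(A/B) = \log(A/R) - \log(B/R)$, so
    $\operatorname{Var}[\log(A/B)] = \operatorname{Var}[\log(A/R)] + \operatorname{Var}[\log(B/R)] = s^2 + s^2 = 2s^2$
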
16
More samples
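  • Loop, 6 arrays (A-B, B-C and C-A, each pair
    hybridized twice): the direct and the indirect
    estimate are combined,
    $\widehat{\log(A/B)} = \tfrac{2}{3}\log(A/B) + \tfrac{1}{3}\,[\log(A/C) - \log(B/C)]$
    Assuming that all variances are equal:
    $\operatorname{Var}[\widehat{\log(A/B)}] = \tfrac{4}{9}\cdot\tfrac{s^2}{2} + \tfrac{1}{9}\cdot s^2 = \tfrac{1}{3}s^2$
  • Reference, 6 arrays (A, B and C each hybridized
    twice against R):
    $\operatorname{Var}[\log(A/B)] = \operatorname{Var}[\log(A/C)] = \operatorname{Var}[\log(B/C)] = 0.5s^2 + 0.5s^2 = s^2$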
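
The variance bookkeeping for direct, common reference and loop designs can be checked with a few lines of exact arithmetic. This is a sketch under the same assumption as the slides (independent measurements with equal variance, dye effects ignored); the helper names are illustrative, and all variances are in units of s^2.

```python
from fractions import Fraction


def var_of_mean(var_single, n):
    """Variance of the average of n independent measurements."""
    return Fraction(var_single) / n


def var_of_chain(*variances):
    """Variance of a sum or difference of independent log-ratios (an indirect path)."""
    return sum(Fraction(v) for v in variances)


def var_combined(*variances):
    """Variance of the inverse-variance weighted average of independent estimates."""
    return 1 / sum(1 / Fraction(v) for v in variances)


S2 = Fraction(1)  # one slide measures a log-ratio with variance s^2

# Two samples, two slides
direct_2 = var_of_mean(S2, 2)        # A vs B hybridized twice
reference_2 = var_of_chain(S2, S2)   # A vs R and B vs R, once each

# Three samples, six slides
edge = var_of_mean(S2, 2)            # each pair hybridized twice
loop_6 = var_combined(edge, var_of_chain(edge, edge))  # direct plus path via C
reference_6 = var_of_chain(edge, edge)                 # A vs R and B vs R, twice each

print(direct_2, reference_2, loop_6, reference_6)  # 1/2 2 1/3 1
```

The inverse-variance weights for the loop come out as 2/3 for the direct and 1/3 for the indirect estimate, matching the weights used on the slide.
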
17
Common reference vs direct hybridizations
  • Theoretical Considerations
  • A design is optimal when it minimizes the
    variance of the effect of interest
  • Look for designs leading to small variance of
    log(A/B)
  • Practical considerations
  • Common reference may be desired when experiment
    is extended in the future or when a lot of
    different conditions have to be compared
  • Choose a biologically relevant common reference
    (say your control sample). In that case, your
    ratios are of interest and better interpretable

18
Time-course designs
  • Take 4 time points
  • T1 T2 T3 T4
  • The best choice of design depends on the
    comparisons of interest and on the number of
    slides available

19
Time-course designs
  • Using 3 slides
  • T1 T2 T3 T4
  • which is the best to estimate changes relative
    to the initial time point: T2/T1, T3/T1,
    T4/T1

20
Time-course designs
  • Using 3 slides
  • T1 T2 T3 T4
  • which is the best to estimate relative changes
    between successive time points: T2/T1, T3/T2,
    T4/T3

21
Time course designs
  • Using 4 slides
  • T1 T2 T3 T4
  • R
  • which is the reference design
  • All comparisons have equal precision

22
Time course design
  • Using 4 slides
  • T1 T2 T3 T4
  • which is the loop design, balanced wrt dye
  • Distant comparisons have lower precision

23
Time course designs
  • Using 4 slides
  • T1 T2 T3 T4
  • also uses exactly 2 hybridizations per treatment,
    balanced wrt dye
  • Most precise estimates: 1/2, 1/3, 2/4, 3/4
    (T1 vs T2, T1 vs T3, T2 vs T4, T3 vs T4)
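
The precision claims for the 4-slide designs can be reproduced with a small least-squares calculation. The sketch below assumes that each slide measures log(red/green) with variance s^2 and no dye effect; the exact slide layouts are reconstructed from the text (the original diagrams are not in this transcript), and the function names are illustrative.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def contrast_variance(arrays, samples, a, b):
    """Variance (in units of s^2) of the least-squares estimate of log(a/b).

    `arrays` lists one (red, green) sample pair per slide. The first entry of
    `samples` serves as baseline; the design must connect all samples.
    """
    params = samples[1:]

    def row(x, y):
        return [Fraction((p == x) - (p == y)) for p in params]

    X = [row(red, green) for red, green in arrays]
    k = len(params)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    inv = invert(xtx)
    c = row(a, b)
    return sum(c[i] * inv[i][j] * c[j] for i in range(k) for j in range(k))


designs = {
    "reference": (["R", "T1", "T2", "T3", "T4"],
                  [("T1", "R"), ("T2", "R"), ("T3", "R"), ("T4", "R")]),
    "loop": (["T1", "T2", "T3", "T4"],
             [("T1", "T2"), ("T2", "T3"), ("T3", "T4"), ("T4", "T1")]),
    "alternative": (["T1", "T2", "T3", "T4"],
                    [("T1", "T2"), ("T2", "T4"), ("T4", "T3"), ("T3", "T1")]),
}
for name, (samples, arrays) in designs.items():
    print(name, [str(contrast_variance(arrays, samples, t, "T1"))
                 for t in ("T2", "T3", "T4")])
```

Under these assumptions the reference design gives 2 s^2 for every comparison, while in both loop-type designs directly hybridized pairs reach 3/4 s^2 and the pairs two steps apart reach s^2.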

24
Factorial designs
  • Designs for studies which involve factors as
    explanatory variables
  • Age group
  • gender
  • Cell line
  • Tumor types

25
Factorial designs
  • Glonek & Solomon (2004)
  • Admissible design: using the same number of
    arrays, there is no other design yielding
    smaller variances for all parameters

Glonek et al. Biostatistics 5, 89-111 (2004)
26
Factorial design: example
  • Time
    • 0h
    • 24h
  • Cell lines
    • I (non-leukaemic)
    • II (leukaemic)
  • Find genes differentially expressed at 24h but
    not at 0h: interaction between time and cell line

27
Factorial design: possible samples
  • All combinations of factor levels. In this case,
    4 are possible

28
Factorial design: analysis model
  • A (log-)linear model is used
  • Experimental conditions correspond to parameter
    combinations
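
The slide's own figure is not part of this transcript. One common parameterization for a 2 x 2 design, given here as an assumption rather than as the presenter's notation, writes the mean log intensity of each sample with a baseline $\mu$, a cell-line effect $\alpha$, a time effect $\beta$ and an interaction $(\alpha\beta)$:

```latex
\begin{aligned}
\text{I, 0h}   &: \mu \\
\text{II, 0h}  &: \mu + \alpha \\
\text{I, 24h}  &: \mu + \beta \\
\text{II, 24h} &: \mu + \alpha + \beta + (\alpha\beta) \\[4pt]
(\alpha\beta)  &= \bigl[(\text{II, 24h}) - (\text{II, 0h})\bigr] - \bigl[(\text{I, 24h}) - (\text{I, 0h})\bigr]
\end{aligned}
```

Under this assumed parameterization, a slide comparing two samples estimates the difference of their parameter combinations, and the interaction is a difference of differences: the time effect in cell line II minus the time effect in cell line I.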

29
Factorial design: possible arrays
[Figure: the four samples (I,0), (I,24), (II,0) and (II,24) as nodes, with the six possible pairwise hybridizations numbered (1) to (6)]
30
Optimal admissible design
  • Designs that are not worse than others, and for
    which the variance of the parameter of interest
    is (one of the) smallest
  • In the example: we wish to find admissible designs
    for which the interaction term has one of the
    smallest variances

31
Glonek et al. Biostatistics 5, 89-111 (2004)
32
Optimal admissible design
Glonek et al. Biostatistics 5, 89-111 (2004)
33
Factorial designs: conclusions
  • Design with all pairwise comparisons is not the
    best in this case
  • Best design can only be found with respect to a
    model
    • if the model does not fit the data well, the
      design choice may not be the best
    • make sure the model chosen is adequate

34
How to compare efficiently many different
conditions?
  • Common reference: not efficient
  • Loop and mixed designs: not all comparisons
    have equal precision

GA Churchill, Nat Genet. 2002 Dec; 32 Suppl: 490-5
35
Possible solution
  • Randomized design
  • Intensity-based rather than ratio-based
    calculations
  • Requires:
    • Hybridization of the two samples is independent:
      no competition for binding sites
    • Absence of large spot and array effects
  • To be tested for each platform

36
Our favourite platform
  • Spotted collection of 65-mer oligonucleotides
    (Sigma-Compugen collection)
  • 22K

37
Design used to demonstrate independent hybridization
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
38
Distribution of signal intensities is similar
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
39
Correlation of intensities is high
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
40
Effect of addition of unlabelled target
Two targets on microarray
Single target on microarray
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
41
Correlation of ratios calculated from different
hybridization designs
't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
42
Intensity-based analysis
  • Hybridizations of two targets on the array are
    independent
  • No saturation and no competition
  • Intensity readings show high inter-array
    correlation
  • Comparisons on the same array have highest
    precision and all other comparisons have equal
    precision

't Hoen et al. Nucleic Acids Res. 32, e41 (2004)
43
Example of randomized design
  • Mouse models for muscular dystrophy

Turk et al. FASEB J 20, 127-129 (2006)
44
Our design
  • Randomly assign samples to the arrays, avoiding
    co-hybridization of samples from the same group
  • 2 biological replicates
  • 4 technical replicates (dye swap and replicate
    spotting)

Turk et al. FASEB J 20, 127-129 (2006)
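
A minimal sketch of such a constrained random assignment, using rejection sampling: shuffle the samples, pair them up, and retry until no slide carries two samples from the same group. The group and sample names below are hypothetical and are not the actual layout of the cited study.

```python
import random


def random_pairing(sample_groups, seed=None, max_tries=10_000):
    """Pair samples on two-color arrays at random, never within one group.

    `sample_groups` maps sample id -> group label. Returns a list of
    (cy3_sample, cy5_sample) tuples, one per array.
    """
    rng = random.Random(seed)
    samples = list(sample_groups)
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    for _ in range(max_tries):
        rng.shuffle(samples)
        pairs = list(zip(samples[0::2], samples[1::2]))
        if all(sample_groups[a] != sample_groups[b] for a, b in pairs):
            return pairs
    raise RuntimeError("no valid pairing found; is one group too large?")


# Hypothetical example: four groups with two biological replicates each.
sample_groups = {f"{g}{i}": g for g in ("wt", "m1", "m2", "m3") for i in (1, 2)}
pairs = random_pairing(sample_groups, seed=0)
for cy3, cy5 in pairs:
    print(f"Cy3: {cy3}   Cy5: {cy5}")
```

Because the position within each shuffled pair is random as well, the dye assignment is randomized along with the pairing; a dye-swap replicate is obtained by reversing every tuple.
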
45
Intensity-based analysis can go wrong
Vinciotti et al. Bioinformatics 21, 492-501 (2005)
46
Intensity-based analysis can go wrong
Vinciotti et al. Bioinformatics 21, 492-501 (2005)
47
Some guidelines
  • First determine the main question, pointing out
    the effect of interest
    • $\log(A/B)$
  • Then choose the analysis model, so that the effect
    variance can be computed
    • $\operatorname{Var}[\log(A/B)]$
  • Practical constraints: amount of RNA available,
    number of hybridizations, number of slides
  • A good design measures the effect of interest as
    accurately as possible
    • small $\operatorname{Var}[\log(A/B)]$

48
Some useful links
  • http://dial.liacs.nl/Courses/CMSB20Courses.html
  • http://www.brc.dcs.gla.ac.uk/rb106x/microarray_tips.htm
  • http://exgen.ma.umist.ac.uk/course/notes/WitDesignLecture.pdf
  • http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp

49
Acknowledgements
Human and Clinical Genetics, LUMC: Judith Boer, Renée de Menezes, Rolf Turk, Ellen Sterrenburg, Johan den Dunnen, Gertjan van Ommen
Microarray facility, Leiden Genome Technology Center
50
Case study
  • Two genetically-modified zebrafish strains and
    one wild-type
  • Defects mainly in muscle development
  • Apparent at 12-48 hours of development; early
    death
  • Question: which biological pathways are affected
    and responsible for defective myogenesis?

51
Possible platforms and budget
  • Affymetrix (1-color): 500 euro per chip
    • variance for the ratio of two samples on two
      chips: $s^2$
  • Homespotted arrays (2-color): 100 euro per chip
    • variance for the ratio of two samples on one
      chip: $2s^2$
  • Budget: 12,000 euro

52
Questions
  • Isolation of specific compartments / whole animal
    lysates?
  • Pooling?
  • How many replicates?
  • Which hybridization design?
  • What is the variance of the most important
    comparisons?
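
As a starting point for the last question, the budget arithmetic can be sketched as follows. This is one possible calculation, not the presenter's answer: it assumes the whole budget goes to chips, three groups of equal size, a single time point, the same s^2 on both platforms, and (for the two-color arrays) only direct comparisons split evenly over the three group pairs. Function and constant names are illustrative.

```python
from fractions import Fraction

BUDGET_EURO = 12_000
N_GROUPS = 3  # two genetically modified strains and one wild-type


def one_color_variance(price_per_chip, pair_variance=Fraction(1)):
    """Chips affordable and variance (units of s^2) of a group-vs-group log-ratio."""
    chips = BUDGET_EURO // price_per_chip
    per_group = chips // N_GROUPS
    # A ratio of two single chips has variance `pair_variance`;
    # averaging per_group independent pairs divides it by per_group.
    return chips, pair_variance / per_group


def two_color_variance(price_per_chip, array_variance=Fraction(2)):
    """Same for direct two-color hybridizations split over all group pairs."""
    chips = BUDGET_EURO // price_per_chip
    n_pairs = N_GROUPS * (N_GROUPS - 1) // 2
    per_pair = chips // n_pairs
    # Only the direct comparisons are counted; indirect paths are ignored.
    return chips, array_variance / per_pair


print(one_color_variance(500))  # (24, Fraction(1, 8))
print(two_color_variance(100))  # (120, Fraction(1, 20))
```

Under these assumptions the cheaper two-color arrays give the smaller variance per comparison despite the larger per-array variance; adding time points or accounting for biological variation would change the numbers.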