Statistical tests for differential expression in cDNA microarray experiments (2): ANOVA

1
Statistical tests for differential expression in
cDNA microarray experiments (2): ANOVA
  • Xiangqin Cui and Gary A. Churchill, Genome
    Biology 2003, 4:210

Presented by M. Carme Ruíz de Villa and Alex
Sánchez, Departament d'Estadística, U.B.
2
Introduction
3
Remember
  • We want to measure how gene expression changes
    under different conditions.
  • Only two conditions and an adequate number of
    replicates → t-tests and extensions
  • More than two conditions / more than one factor:
    several approaches
      • Analysis of Variance (ANOVA) (Churchill et al.)
      • Linear Models (Smyth, Speed, ...)

4
Sources of variation (1)
  • We want to determine when the variation due to
    gene expression is significant, but
  • there are multiple sources of variation in
    measurements besides just gene expression.
  • We want to know when the variation in
    measurements is caused by varying levels of gene
    expression versus other factors.

5
Sources of variation (2)
  • Some sources of variation in the measurements in
    microarray experiments are:
      • Array effects
      • Dye effects
      • Variety effects
      • Gene effects
      • Combinations

6
Relative expression values
  • With more than two conditions → we cannot simply
    compute ratios.
  • ANOVA modelling yields estimates of the relative
    expression for each gene in each sample.
  • The ANOVA model is not based on log ratios.
    Rather, it is applied directly to intensity data.
    However, the difference between two relative
    expression values can be interpreted as the mean
    log ratio for comparing two samples.

7
Technical and biological replicates
  • If inference is being made on the basis of
    biological replicates and there is also
    technical replication → technical replicates
    should be averaged to yield a single value for
    each independent biological unit.
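
The averaging step can be sketched in a few lines of Python. This is only an illustration: the function name, the (unit, value) input layout, and the example numbers are assumptions made here, not part of the presentation.

```python
from collections import defaultdict
from statistics import mean


def average_technical_replicates(measurements):
    """Collapse technical replicates to one value per biological unit.

    `measurements` is a list of (biological_unit, value) pairs; pairs that
    share a unit are treated as technical replicates of that unit.
    """
    by_unit = defaultdict(list)
    for unit, value in measurements:
        by_unit[unit].append(value)
    # One averaged value per independent biological unit.
    return {unit: mean(values) for unit, values in by_unit.items()}


# Hypothetical values for one gene: two animals, two technical replicates each.
example = [("mouse1", 8.0), ("mouse1", 9.0), ("mouse2", 10.0), ("mouse2", 11.0)]
averaged = average_technical_replicates(example)
print(averaged)
```

Any later test across conditions would then use one value per biological unit, so that technical replication does not inflate the apparent sample size.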

8
Derived data sets
  • The set of estimated relative expression values,
    one for each gene in each RNA sample, is a
    derived data set that may be subject to a second
    level of analysis.
  • The derived data can be analyzed on a
    gene-by-gene basis using standard ANOVA methods
    to test for differences among conditions
    (Oleksiak et al. [28]).

9
Review of ANOVA models
10
One-way ANOVA
  • Suppose you have a model for each measurement in
    your experiment:
    $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
  • $y_{ij}$ is the jth measurement for the ith group.
  • $\mu$: overall mean effect (constant)
  • $\alpha_i$: ith group effect (constant)
  • $\varepsilon_{ij}$: experimental error term,
    $N(0, \sigma^2)$
  • Therefore, observations from group i are
    distributed with mean $\mu + \alpha_i$ and
    variance $\sigma^2$.

11
Hypothesis Testing
Overall variability
Within group variability
Between group variability
Intuition: if between-group variability is large
compared to within-group variability, then the
differences between means are significant.
12
Sum of Squares
  • Total sum of squares
  • Within Sum of Squares
  • Between Sum of Squares
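
The slide lists the three sums of squares by name only. Their standard definitions are written out below, assuming k groups with $n_i$ observations in group i, group means $\bar{y}_{i\cdot}$ and grand mean $\bar{y}_{\cdot\cdot}$ (this notation is an assumption, chosen to match the k and n used on the next slide).

```latex
\begin{align*}
\text{Total SS}   &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_{\cdot\cdot}\right)^2 \\
\text{Within SS}  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_{i\cdot}\right)^2 \\
\text{Between SS} &= \sum_{i=1}^{k} n_i \left(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}\right)^2 \\
\text{Total SS}   &= \text{Within SS} + \text{Between SS}
\end{align*}
```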

13
Mean Sum of Squares
  • Between MS = Between SS / (k-1)
  • Within MS = Within SS / (n-k)
  • F = Between MS / Within MS
  • It is summarized in the ANOVA table
  • Example 1
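
As a rough illustration of these formulas, the following Python sketch computes the two mean squares and the F ratio for k groups with n observations in total. The function name and the toy data are assumptions for illustration; the code uses only the standard library and does not compute a p-value.

```python
def one_way_anova(groups):
    """Return (between MS, within MS, F) for a list of groups of measurements."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between SS: group means around the grand mean, weighted by group size.
    between_ss = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within SS: observations around their own group mean.
    within_ss = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    between_ms = between_ss / (k - 1)
    within_ms = within_ss / (n - k)
    return between_ms, within_ms, between_ms / within_ms


# Toy data (made up): three groups of three measurements.
between_ms, within_ms, f_stat = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(between_ms, within_ms, f_stat)
```

For the toy data, Between SS = 26 on 2 degrees of freedom and Within SS = 6 on 6 degrees of freedom, so F is approximately 13.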

14
Multiple Factor ANOVA
  • The model can be extended by adding more:
      • Factors (α, β, ...)
      • Interactions between them (αβ, ...)
      • Other
  • This is used to model the different sources of
    variation appearing in microarray experiments
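
For instance, a two-factor model with an interaction term can be written as follows. The symbols are an assumed notation: $\alpha_i$ and $\beta_j$ are the two factor effects, $(\alpha\beta)_{ij}$ their interaction, and k indexes replicates within each combination of levels.

```latex
\begin{equation*}
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)
\end{equation*}
```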

15
Experiment 1: Latin Square
16
Random effects models
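  • If the k factor levels can be considered a random
    sample from a population of factor levels, we
    have a random effect.
  • ANOVA model: $Y_{ij} = \mu + A_i + \varepsilon_{ij}$
      • $\mu$: overall mean,
      • $A_i$ is a random variable instead of a
        constant,
      • $\varepsilon_{ij}$: experimental error.
  • $E(A_i) = 0$, $E(\varepsilon_{ij}) = 0$,
    $\mathrm{var}(A_i) = \sigma_A^2$,
    $\mathrm{var}(\varepsilon_{ij}) = \sigma^2$,
    $A_i$ and $\varepsilon_{ij}$ independent →
    $\mathrm{var}(Y_{ij}) = \sigma_A^2 + \sigma^2$.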
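
One common way to estimate the two variance components is the method of moments, sketched below for a balanced design with m observations per level. It relies on E(Between MS) = $\sigma^2 + m\sigma_A^2$ and E(Within MS) = $\sigma^2$. The function name, the toy data, and the truncation of negative estimates at zero are choices made for this illustration, not something stated in the presentation.

```python
def variance_components(groups):
    """Method-of-moments estimates (sigma_A^2, sigma^2), balanced one-way design."""
    k = len(groups)
    m = len(groups[0])                    # observations per factor level
    if any(len(g) != m for g in groups):
        raise ValueError("this sketch assumes a balanced design")

    grand_mean = sum(sum(g) for g in groups) / (k * m)
    level_means = [sum(g) / m for g in groups]

    between_ms = m * sum((lm - grand_mean) ** 2 for lm in level_means) / (k - 1)
    within_ms = sum(
        (y - lm) ** 2 for g, lm in zip(groups, level_means) for y in g
    ) / (k * (m - 1))

    sigma2 = within_ms
    # E(between MS) = sigma^2 + m * sigma_A^2; negative estimates are set to 0.
    sigma2_a = max(0.0, (between_ms - within_ms) / m)
    return sigma2_a, sigma2


# Toy data (made up): three randomly sampled levels, three measurements each.
sigma2_a, sigma2 = variance_components([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(sigma2_a, sigma2)
```

With these toy values the mean squares are 13 and 1, giving estimates of about 4 for $\sigma_A^2$ and 1 for $\sigma^2$.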

17
Where to find more
  • Draghici, S. (2003). ANOVA chapter (7). Data
    analysis tools for microarrays. Wiley.
  • Pavlidis, P. (2003). Using ANOVA for gene
    selection from microarray studies of the nervous
    system.
    http://microarray.cpmc.columbia.edu/pavlidis/doc/reprints/anova-methods.pdf

18
ANOVA Models for Microarray Data
19
Kerr & Churchill's model
  • $y_{ijkg} = \mu + A_i + D_j + V_k + G_g + (AG)_{ig} + (VG)_{kg} + \varepsilon_{ijkg}$
  • $y_{ijkg}$: expression measurement from the ith
    array, jth dye, kth variety, and gth gene.
  • $\mu$: average expression over all spots.
  • $A_i$: effect of the ith array.
  • $D_j$: effect of the jth dye.
  • $V_k$: effect of the kth variety (treatment,
    sample, ...)
  • $G_g$: effect of the gth gene.
  • $(AG)_{ig}$: effect of the ith array and gth gene.
  • $(VG)_{kg}$: effect of the kth variety and gth
    gene.
  • $\varepsilon_{ijkg}$: independent and identically
    distributed error terms.

20
Interpreting main effects
  • A: differences in fluorescent signal from array
    to array (e.g. if arrays are probed under
    inconsistent conditions that increase or reduce
    hybridization of labeled cDNA).
  • D: differences between the two dye fluorescent
    labels (one dye may consistently be brighter than
    the other).
  • G: differences in fluorescence for equally
    expressed genes.
  • V: differences of expression level between
    different varieties (samples, tumour types, ...).

21
Interpreting interactions
  • DV: if, for a particular variety, labelling is
    produced in separate runs of the process →
    differences in the runs can produce pools of cDNA
    of varying concentrations or quality.
  • AG (spot effect): spots for a given gene on the
    different arrays vary in the amount of cDNA
    available for hybridization.
  • DG: if there are differences in the dyes that are
    gene-specific.
  • VG: reflects differences in expression for
    particular variety and gene combinations that are
    not explained by the average effects of these
    varieties and genes. THIS IS THE QUANTITY OF
    INTEREST!!!

22
Normalization
  • A,D,V terms effectively normalize the data, thus
    the normalization process is integrated with the
    data analysis.
  • This approach has several benefits (?)
  • The normalization is based on a clearly stated
    set of assumptions
  • It systematically estimates normalization
    parameters based on all the data
  • The model can be generalized to the situation
    where genes are spotted multiple times on each
    array rather

23
Statistically Significant Effects
  • Array, Dye, Variety, Gene effects
      • Goal: to estimate their value.
      • Need not assess their significance.
      • Sometimes they don't appear (gene-level model).
  • Array × Gene, Variety × Gene effects
      • May or may not be present.
      • Goal: to assess their significance.
      • Mean effect = 0 if fixed
      • Effect variance = 0 if random

24
Test statistics: The 3 Fs
  • Hypothesis testing involves the comparison of two
    models.
  • In this setting we consider:
      • a null model of no differential expression
        (all VG = 0), and
      • an alternative model with differential
        expression among the conditions (some VG are
        not equal to zero).
  • F statistics are computed on a gene-by-gene
    basis, based on the residual sums of squares from
    fitting each of these models.
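
The classical F statistic for comparing two nested models from their residual sums of squares (RSS) and residual degrees of freedom can be sketched as below. This is only the textbook form, F = [(RSS0 - RSS1) / (df0 - df1)] / (RSS1 / df1); the function and argument names are illustrative, and the three F variants referred to in the slide title are not spelled out in this transcript, so the sketch does not attempt to reproduce them.

```python
def nested_model_f(rss_null, df_null, rss_alt, df_alt):
    """Classical F statistic for a null model nested in an alternative model.

    rss_* are residual sums of squares and df_* the residual degrees of
    freedom of each fitted model (the null model has more residual df).
    """
    if df_alt <= 0 or df_null <= df_alt:
        raise ValueError("need df_null > df_alt > 0")
    # Drop in RSS per extra parameter of the alternative model ...
    numerator = (rss_null - rss_alt) / (df_null - df_alt)
    # ... relative to the residual variance estimate of the alternative model.
    denominator = rss_alt / df_alt
    return numerator / denominator


# Hypothetical single gene: null model RSS 32 on 8 df, alternative RSS 6 on 6 df.
f_gene = nested_model_f(32.0, 8, 6.0, 6)
print(f_gene)
```

In a microarray analysis this computation would be repeated for every gene, giving one F value per gene.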

25
Example 1
  • A gene which is believed to be related to
    ovarian cancer is investigated.
  • The cancer is sub-classified into 3 categories
    (stages): I, II, III-IV.
  • 15 samples, 3 per stage are available.
  • They are labelled with 3 colors and hybridized on
    a 4-channel cDNA array (1 channel empty).
    (A seemingly more reasonable procedure: a double
    dye-swap reference design.)

26
Example 1. Normalized Data
27
Example 1: ANOVA table (1)
If arrays are homogeneous, the appropriate model
is 1-factor ANOVA.
28
Example 1: Blocking
If arrays are not homogeneous → the appropriate
model is 2-factor ANOVA (1 new block factor for
arrays).
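
A minimal sketch of such a blocked analysis, assuming one measurement per treatment and array (a randomized complete block layout with no interaction term), is given below. The table layout, function name, and numbers are assumptions for illustration; they are not the data of Example 1.

```python
def blocked_anova_f(table):
    """F statistic for treatments when arrays are used as blocks.

    table[i][j] is the single measurement for treatment i on array (block) j.
    """
    a = len(table)        # number of treatments
    b = len(table[0])     # number of blocks (arrays)
    grand = sum(sum(row) for row in table) / (a * b)
    treat_means = [sum(row) / b for row in table]
    block_means = [sum(table[i][j] for i in range(a)) / a for j in range(b)]

    treat_ss = b * sum((m - grand) ** 2 for m in treat_means)
    # Residual after removing both the treatment and the block (array) means.
    error_ss = sum(
        (table[i][j] - treat_means[i] - block_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )

    treat_ms = treat_ss / (a - 1)
    error_ms = error_ss / ((a - 1) * (b - 1))
    return treat_ms / error_ms


# Made-up values: 2 treatments (rows) measured on 2 arrays (columns).
f_blocked = blocked_anova_f([[1.0, 3.0], [4.0, 8.0]])
print(f_blocked)
```

Because the array-to-array differences are removed from the error term, the treatment effect is compared with a smaller residual variance than in the one-factor analysis.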
29
Example 2: CAMDA kidney data
ftp://ftp.camda.duke.edu/CAMDA02_DATASETS/papers/README_normal.html
  • 6 mouse kidney samples
  • (suppose 6 different treatments)
  • Compared to a common reference in a double
    reference design
  • Dye swap
  • Replicate arrays

30
2.1. The ANOVA model
  • Work only at the gene level: no main effects (A,
    D, V, G) as defined
  • $Y_{ijk} = DG_i + AG_j + VG_k + \varepsilon_{ijk}$
  • i = 1, 2 (dyes)
  • j = 1, 2 (array)
  • k = 1, ..., 6 (sample)

31
Example 3: A 2-factor design, Diet × Strain
32
3.2. Design
33
3.3. The ANOVA model
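  • $Y_{ijklm} = DG_i + AG_j + \mathrm{Strain}_l + \mathrm{Diet}_m + (\mathrm{Strain} \times \mathrm{Diet})_{lm} + VG_k + \varepsilon_{ijklm}$
  • i = 1, 2 (dyes)
  • j = 1, 2 (array)
  • k = 1, ..., 12 (sample)
  • l = 1, ..., 3 (strain)
  • m = 1, 2 (diet)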

34
3.4. Sample R code (1)
  • data(paigen)
  • paigen <- createData(rawdata, 2)
  • model.full.fix <- makeModel(data = paigen,
    formula = ~DG + AG + SG + Strain + Diet + Strain:Diet)
  • anova.full.fix <- fitmaanova(paigen,
    model.full.fix)
  • model.noint.fix <- makeModel(data = paigen,
    formula = ~DG + AG + SG + Strain + Diet)
  • anova.noint.fix <- fitmaanova(paigen,
    model.noint.fix)

35
3.4. Sample R code (2)
  • Permutation tests
  • Test for interaction effect:
  • test.int.fix <- ftest(paigen, model.full.fix,
    model.noint.fix, n.perm = 500)
  • idx.int.fix <- volcano(anova.full.fix,
    test.int.fix, title = "Int. test")