Supervised microarray data analysis

1
Supervised microarray data analysis
  • Mark van de Wiel

2
Quality control
  • Protocols
  • Perform a small-scale, well-controlled experiment
    to assess the influence of experimental factors
    (microarrays from different batches, printing
    tips, dyes, linearity of the scanner, etc.)
  • Continuous factors (temperature, humidity,
    spot size over time, intensity of control spot
    over time) can be monitored with standard control
    chart techniques.
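
As a minimal sketch of the control-chart idea above: limits are estimated from runs known to be in control, and later runs outside those limits are flagged. The intensities, the 3-sigma width and the use of the plain sample standard deviation are assumptions for illustration; real charts often estimate spread from moving ranges.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from in-control baseline values."""
    center = mean(baseline)
    spread = stdev(baseline)  # simplification: sample standard deviation
    return center - k * spread, center, center + k * spread

def out_of_control(values, limits):
    """Indices of values that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical control-spot intensities from earlier, well-behaved runs
baseline = [1000, 1010, 990, 1005, 995, 1002, 998, 1000]
limits = control_limits(baseline)

# Hypothetical new runs; the third one drifts far from the baseline
new_runs = [1001, 997, 1100, 1003]
flagged = out_of_control(new_runs, limits)
print(flagged)
```

The same two functions could be applied to temperature or humidity logs, one chart per factor.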

3
(No Transcript)
4
Design of the experiment
  • Think very carefully about what the biological
    goals are.
  • What software do you have at your disposal to
    analyse the data?
  • Do we need a reference sample or not?
  • Biological design: which tissues to combine on
    an array (cDNA)? More than one biological factor:
    factorial design.
  • Dye bias: dye-swap.
  • Design on the array: negative/positive controls,
    repeats, how many genes? Pilot study first;
    distribute the repeats over experimental
    factors (spatial, printing tips, etc.)
  • Save some space on the (cDNA) microarray for
    assessing variability due to experimental factors
    (e.g. print the same control gene with several
    printing tips).
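
The dye-swap point can be illustrated with a small sketch. It assumes that dye bias is additive on the log2 scale and equal on both arrays of the pair; the function name and intensities are hypothetical.

```python
import math

def dye_swap_estimate(red1, green1, red2, green2):
    """Combine a dye-swap pair for one gene.

    Array 1: sample A in red, sample B in green.
    Array 2: dyes swapped (B in red, A in green).
    Returns (log2 ratio A/B with dye bias cancelled, estimated dye bias).
    """
    m1 = math.log2(red1 / green1)   # log2(A/B) + bias
    m2 = math.log2(red2 / green2)   # log2(B/A) + bias
    effect = (m1 - m2) / 2          # bias cancels in the difference
    bias = (m1 + m2) / 2            # biological effect cancels in the sum
    return effect, bias

# Hypothetical gene: A is 4 times B, and the red channel reads 2 times too high
effect, bias = dye_swap_estimate(red1=800, green1=100, red2=200, green2=400)
print(effect, bias)
```

Without the swapped array, the two contributions in `m1` could not be separated.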

5
Analysis: Multiple testing (after normalization)
  • Objective: control the number of falsely selected
    genes
  • FWE: family-wise error rate
  • Weak FWE control:

    P(falsely select any gene i, i = 1, ..., 20,000 |
    no gene truly expressed) ≤ α
  • Strong FWE control:

    P(falsely select any gene i, i = 1, ..., 20,000 |
    some genes expressed, some genes not expressed) ≤ α
  • FDR: false discovery rate
  • F: expected number of false rejections when no
    genes are expressed; T: total number of
    rejections
  • FDR control: F/T ≤ α
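
To make the two criteria concrete, here is a hedged sketch of two textbook procedures that are not named on the slide: Holm's step-down method (strong FWE control) and the Benjamini-Hochberg step-up method (FDR control in its usual expected-proportion sense, under independence or positive dependence). The p-values are hypothetical.

```python
def holm(pvalues, alpha=0.05):
    """Indices selected by Holm's step-down procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    selected = []
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank)
        if pvalues[i] <= alpha / (m - rank):
            selected.append(i)
        else:
            break  # step-down: stop at the first failure
    return sorted(selected)

def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices selected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below alpha * rank / m
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical p-values for eight genes
pvals = [0.001, 0.008, 0.012, 0.041, 0.2, 0.5, 0.7, 0.9]
fwe_genes = holm(pvals)
fdr_genes = benjamini_hochberg(pvals)
print(fwe_genes, fdr_genes)
```

On these numbers the FWE procedure selects one gene and the FDR procedure three, which is the usual direction of the difference.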

6
Multiple testing: FWE vs FDR
  • Control of FDR implies weak control of FWE
  • Advantage of strong FWE control: the
    significance level α is controlled under all
    situations
  • Disadvantage: less power than FDR control
  • FWE-based procedures tend to select fewer genes
    than FDR-based procedures
  • Software:
  • Bioconductor: step-down Westfall-Young (Dudoit
    et al.), control of FDR and FWE.
  • SAM (permutation-based control of FDR)
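
The permutation idea behind Westfall-Young can be sketched as follows. This is a simplified single-step maxT variant, not the step-down version and not the Bioconductor implementation; the data, the Welch t-statistic and the function names are assumptions for illustration. All relabelings of the arrays are enumerated, which is only feasible for very small designs.

```python
from itertools import combinations
from statistics import mean, variance
import math

def welch_t(x, y):
    """Two-sample t-statistic with unequal variances."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

def abs_t_per_gene(data, group1):
    """|t| for every gene, given the array indices assigned to group 1."""
    n_arrays = len(data[0])
    group2 = [j for j in range(n_arrays) if j not in group1]
    return [abs(welch_t([row[j] for j in group1], [row[j] for j in group2]))
            for row in data]

def maxt_adjusted_pvalues(data, n_group1):
    """Single-step maxT adjusted p-values; first n_group1 arrays form group 1."""
    n_arrays = len(data[0])
    observed = abs_t_per_gene(data, tuple(range(n_group1)))
    exceed = [0] * len(data)
    n_relabelings = 0
    for group1 in combinations(range(n_arrays), n_group1):
        max_t = max(abs_t_per_gene(data, group1))  # max over all genes
        n_relabelings += 1
        for g, t_obs in enumerate(observed):
            if max_t >= t_obs - 1e-12:  # tolerance for floating-point ties
                exceed[g] += 1
    return [count / n_relabelings for count in exceed]

# Hypothetical log-intensities: 3 genes, arrays 0-2 are group 1, 3-5 group 2
data = [
    [10.1, 10.3, 10.2, 12.0, 12.2, 12.1],  # clear group difference
    [8.0, 8.5, 7.9, 8.2, 8.4, 8.1],        # no clear difference
    [5.0, 5.6, 5.3, 5.5, 5.1, 5.4],        # no clear difference
]
adj_p = maxt_adjusted_pvalues(data, n_group1=3)
print(adj_p)
```

With three arrays per group there are only 20 relabelings, so the smallest attainable adjusted p-value is 2/20; this is one reason why small designs have little power under FWE control.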

7
(No Transcript)
8
SAM
  • Developed at Stanford, Tibshirani et al. (Paper:
    Tusher et al., PNAS 98, 5116-5121). Claim:
    FDR control
  • Plus:
  • Ease of use, add-in to Excel
  • Allows asymmetric cut-offs
  • Minus:
  • Distribution under the null hypothesis (no
    expression) needs to be the same for all genes
    to guarantee FDR control
  • Combination with a k-fold rule: no control of FDR
    anymore
  • Solutions: use (normal) rank scores and a simple
    rank statistic
  • Explicitly test for k-fold expression and
    combine with an FDR criterion
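
One possible reading of the rank-score solution above, as a sketch: replace each gene's values by normal scores of their ranks and sum the scores in one group. Because the statistic depends only on ranks, its null distribution is the same for every gene (assuming no ties). The scoring rule rank / (n + 1) and the function names are assumptions, not necessarily the presenter's exact choice.

```python
from statistics import NormalDist

def normal_scores(values):
    """Replace each value by the normal quantile of rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda j: values[j])
    scores = [0.0] * n
    for rank, j in enumerate(order, start=1):
        scores[j] = NormalDist().inv_cdf(rank / (n + 1))
    return scores

def rank_score_statistic(values, group1_idx):
    """Sum of normal scores over the arrays in group 1."""
    scores = normal_scores(values)
    return sum(scores[j] for j in group1_idx)

# Two hypothetical genes on very different scales but with the same ordering
gene_a = [10.1, 10.3, 10.2, 12.0, 12.2, 12.1]
gene_b = [101, 103, 102, 120, 122, 121]
stat_a = rank_score_statistic(gene_a, [0, 1, 2])
stat_b = rank_score_statistic(gene_b, [0, 1, 2])
print(stat_a, stat_b)
```

Both genes give the same statistic, which shows that scale and variance differences between genes no longer affect the null distribution.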

9
Modelling vs Normalisation + Testing
  • Modelling forces you to state what the
    assumptions are (linearity, normality,
    independence, etc.)
  • Normalisation steps may not be commutative
  • Non-linearities can be dealt with by
    normalisation methods
  • Advanced modelling requires help of
    statistician/bio-informatician
  • Standard approach to modelling: ANOVA. The model
    has two levels:
  • Normalisation level, which includes linear
    corrections for dye and microarray effects
  • Gene expression level, which includes effects on
    gene level, including interactions (the interaction
    of interest is usually gene × variety)
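
One common way to write such a two-level ANOVA model is sketched below. The notation is an assumption for illustration and is not taken from the slide: y is the measured intensity for array i, dye j, variety k and gene g; A, D, V and G stand for array, dye, variety and gene effects.

```latex
% Normalisation level: global array and dye corrections
\log y_{ijkg} = \mu + A_i + D_j + (AD)_{ij} + r_{ijkg}

% Gene expression level: fitted to the residuals r of the first level
r_{ijkg} = G_g + (AG)_{ig} + (DG)_{jg} + (VG)_{kg} + \varepsilon_{ijkg}
```

The gene × variety terms (VG) carry the differential expression of interest; the other terms absorb array and dye effects, either globally or per gene.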

10
Software
  • Freeware: SAM, Bioconductor
  • Specialized commercial software: Spotfire,
    Genespring, Genesight, Rosetta
  • Most contain normalisation, variance-stabilizing
    transformations, ANOVA, testing (most do not yet
    include the advanced multiple testing criteria)
  • Statistical software: SAS, S-Plus, SPSS
  • Much more debugged, long history, better
    documentation (it is often very unclear what the
    specialized packages really do)
  • Advantages of specialized software: user-friendly,
    visualisation (nice pictures), link with
    databases, annotation
  • Try several!

11
Bayesian models
  • Natural translation to networks (pathways)
  • Complex models (linearity is not necessary,
    interactions)
  • Prior biological knowledge can be included
  • Nesting of the models (image analysis →
    normalisation → gene expression)
  • Inference for complex functions of gene
    expression data is relatively easy
  • No easy software
  • Computational methods may take time to find
    reliable estimates
  • Example: network

12
Validation
  • Cross-validation: leave some data out and see how
    well the data values are predicted by the model
    (Note that for normalisation procedures it may be
    harder to predict the data from the normalized
    data)
  • Biological validation (spikes: known
    concentrations)
  • Very useful for validating the normalisation
    procedure or the model
  • Pretend that spikes with equal concentrations
    that are used under different conditions
    (different dyes, microarray batch) are different
    quantities.
  • Estimate the ratio of the two estimates after
    normalisation or modelling
  • The ratio should be approximately equal to 1.
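
The spike check above can be sketched in a few lines. The intensities, the use of a simple mean as the estimate and the 10% tolerance are assumptions for illustration.

```python
from statistics import mean

def spike_ratio(estimates_a, estimates_b):
    """Ratio of the estimated spike level under condition A versus B."""
    return mean(estimates_a) / mean(estimates_b)

def ratio_is_close_to_one(ratio, tolerance=0.1):
    """True if the ratio deviates from 1 by at most the tolerance."""
    return abs(ratio - 1.0) <= tolerance

# Hypothetical intensities of one spike, equal concentration in both dyes
raw_cy5 = [1500, 1550, 1520]
raw_cy3 = [1000, 995, 1015]
normalised_cy5 = [980, 1010, 1005]
normalised_cy3 = [1000, 995, 1015]

ratio_before = spike_ratio(raw_cy5, raw_cy3)
ratio_after = spike_ratio(normalised_cy5, normalised_cy3)
print(ratio_before, ratio_after)
```

A ratio that stays far from 1 after normalisation suggests that the procedure has not removed the dye or batch effect for that spike.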

13
Comparison and meta analysis
  • Objective comparisons between methods are very
    much needed!
  • Simulations may help (because we know the truth
    then). Setting up realistic simulations may be
    hard!
  • Competition between several methods (CAMDA '03:
    lung cancer)
  • Future goals:
  • Methods that allow for combining data from
    several experiments.
  • From relative quantities to absolute quantities.
  • Absolute quantities allow for direct comparison
    between labs. (otherwise, only if labs have used
    same reference material etc.)
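
The remark on simulations can be illustrated with a deliberately unrealistic sketch: because the truly expressed genes are known, the realised proportion of false selections can be counted directly. The independent normal log-ratios with known unit variance, the effect size and the threshold are all assumptions for illustration.

```python
import random
from statistics import NormalDist, mean

def simulate_selection(n_genes=1000, n_true=100, effect=2.0,
                       n_rep=4, threshold=0.01, seed=0):
    """Simulate log-ratios with known truth and count selections.

    The first n_true genes are truly expressed (mean log-ratio = effect).
    Returns (false selections, true selections, false discovery proportion).
    """
    rng = random.Random(seed)
    norm = NormalDist()
    false_sel = true_sel = 0
    for g in range(n_genes):
        shift = effect if g < n_true else 0.0
        obs = [rng.gauss(shift, 1.0) for _ in range(n_rep)]
        z = mean(obs) * n_rep ** 0.5          # z-test, variance assumed known
        p = 2 * (1 - norm.cdf(abs(z)))        # two-sided p-value
        if p <= threshold:
            if g < n_true:
                true_sel += 1
            else:
                false_sel += 1
    total = false_sel + true_sel
    fdp = false_sel / total if total else 0.0
    return false_sel, true_sel, fdp

false_sel, true_sel, fdp = simulate_selection()
print(false_sel, true_sel, fdp)
```

Running competing selection methods on the same simulated data gives a like-for-like comparison, with the caveat from the slide that realistic simulations are hard to set up.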

14
Useful overview papers, books
  • Design: Churchill, G.A. (2002) Fundamentals of
    experimental design for cDNA microarrays. Nature
    Genet. 32, 490-495.
  • Analysis: Slonim, D.K. (2002) From patterns to
    pathways: gene expression data analysis comes of
    age. Nature Genet. 32, 502-508.
  • Normalisation: Quackenbush, J. (2002) Microarray
    data normalization and transformation. Nature
    Genet. 32, 496-501.
  • Pitfalls: Simon, R. et al. (2003) Pitfalls in
    the use of DNA microarray data for diagnostic and
    prognostic classification. J Natl Cancer Inst 95,
    14-18.
  • Books: Baldi & Hatfield (2002), DNA Microarrays
    and Gene Expression, Cambridge University Press
  • Speed, T. (2003) Statistical Analysis of Gene
    Expression Microarray Data, Chapman & Hall
  • Acknowledgement: Nicola Armstrong (EURANDOM)