An Adaptive Empirical Bayesian Thresholding Procedure for Analysing Microarray Experiments



1
An Adaptive Empirical Bayesian Thresholding
Procedure for Analysing Microarray Experiments
  • Rebecca E. Walls, Stuart Barber, Mark S. Gilthorpe and John T. Kent
  • rebecca_at_maths.leeds.ac.uk
  • University of Leeds

XXIII International Biometric Conference, Montréal, July 2006
2
Microarray experiments
  • Suppose we observe intensity measurements $T_{ijk}$ (treatment) and $C_{ijk}$ (control), where
  • $i = 1, \dots, n$ indexes genes
  • $j = 1, \dots, m$ indexes chips
  • $k = 1, \dots, r$ indexes replicate spots
  • For which genes $i$ are $T_{ijk}$ and $C_{ijk}$ significantly different in intensity level?
  • For those significant genes, can we estimate the level of differential expression?

3
Contrast variables
  • We assume that $T_{ijk}$ and $C_{ijk}$ have been suitably transformed and normalised and have constant variance (Huber et al.); from here on $T_{ijk}$ and $C_{ijk}$ denote the adjusted intensities
  • Define the contrast variable
  • $X_{ijk} = T_{ijk} - C_{ijk}$
  • We analyse the sequence of per-gene values derived from the $X_{ijk}$, one for each gene $i = 1, \dots, n$
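As a minimal sketch of the contrast step, the Python below forms $X_{ijk} = T_{ijk} - C_{ijk}$ for a toy data set and then averages over chips and replicate spots to get one value per gene. The averaging rule is an assumption (the transcript does not preserve how the per-gene sequence is defined), and the nested-list layout, the toy numbers and the names `contrasts` and `gene_means` are purely illustrative.

```python
from statistics import mean

def contrasts(T, C):
    """Element-wise contrasts X[i][j][k] = T[i][j][k] - C[i][j][k]."""
    return [[[t - c for t, c in zip(t_chip, c_chip)]
             for t_chip, c_chip in zip(t_gene, c_gene)]
            for t_gene, c_gene in zip(T, C)]

def gene_means(X):
    """Average each gene's contrasts over all chips and replicate spots.

    Assumption: the per-gene summary is the plain mean of the contrasts.
    """
    return [mean(x for chip in gene for x in chip) for gene in X]

# Toy data: n = 2 genes, m = 2 chips, r = 2 replicate spots (made-up values).
T = [[[8.0, 8.5], [8.25, 8.25]],
     [[6.0, 6.5], [9.0, 8.5]]]
C = [[[8.0, 8.0], [8.5, 8.5]],
     [[5.0, 5.5], [6.0, 6.0]]]

X = contrasts(T, C)
xbar = gene_means(X)
print(xbar)  # one contrast summary per gene
```

Indexing the data as gene, then chip, then spot mirrors the subscripts $i$, $j$, $k$ on the slide.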

4
Sparse sequences
  • Assume most genes are not differentially expressed
  • The sequence of true gene effects $\mu_i$ will therefore be sparse, for example
  • (0, 0, 0, -3.1, 0, 0, 0, 3.7, -2.6, 0, 0, 2.1)
  • Adding normally distributed noise with mean 0 and some variance $\sigma^2$ gives the kind of sequence we actually observe, for example
  • (2.2, -0.1, -1.4, -5.4, -2.6, 2.2, -1.1, 1.0, -1.8, 1.2, 1.0, -0.1)
  • We adapt the EBayesThresh methodology of Johnstone and Silverman (2004), originally designed for thresholding wavelet coefficients
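The step from the sparse sequence to the noisy one can be imitated in a few lines of Python. This is only an illustrative sketch: the noise standard deviation of 1.5, the seed and the helper name `noisy_copy` are assumptions, so the simulated values will not reproduce the numbers on the slide.

```python
import random

def noisy_copy(mu, sigma, rng):
    """Return mu with independent N(0, sigma^2) noise added to every entry."""
    return [m + rng.gauss(0.0, sigma) for m in mu]

# Sparse sequence of true effects, copied from the slide.
mu = [0, 0, 0, -3.1, 0, 0, 0, 3.7, -2.6, 0, 0, 2.1]

rng = random.Random(2006)            # fixed seed so the sketch is repeatable
z = noisy_copy(mu, sigma=1.5, rng=rng)

print([round(v, 1) for v in z])      # noisy observations
print(sum(m == 0 for m in mu), "of", len(mu), "true effects are exactly zero")
```

After the noise is added none of the observations is exactly zero, which is why a thresholding rule is needed to recover the sparsity.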
5
Empirical Bayesian methodology
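  • Suppose we have an observation $Z$ which can be written in the form $Z = \mu + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$
  • The prior on $\mu$ is a mixture of $\delta_0(\mu)$, a point mass at zero, and $\gamma_a(\mu)$, a heavy-tailed Laplace distribution with scaling parameter $a$,
  • $f_{\text{prior}}(\mu) = (1 - w)\,\delta_0(\mu) + w\,\gamma_a(\mu)$,
  • in proportions according to the mixing weight $w$, $0 \le w \le 1$.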
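For reference, the ingredients of this prior can be written out explicitly. The notation below follows Johnstone and Silverman (2004) rather than anything recoverable from the slide: $w$ is the mixing weight, $\gamma_a$ the Laplace density with scaling parameter $a$, $\varphi$ and $\Phi$ the standard normal density and distribution function, and $g$ the convolution of $\gamma_a$ with $\varphi$. Unit noise variance is assumed to keep the expressions short.

```latex
\gamma_a(\mu) = \tfrac{a}{2}\, e^{-a|\mu|}, \qquad
g(z) = \int_{-\infty}^{\infty} \gamma_a(\mu)\,\varphi(z-\mu)\,d\mu
     = \tfrac{a}{2}\, e^{a^2/2}\left[ e^{-az}\,\Phi(z-a) + e^{az}\,\Phi(-z-a) \right],
\qquad
f_Z(z) = (1-w)\,\varphi(z) + w\,g(z).
```

The last expression is the marginal density of an observation $Z$ under the mixture prior, which is the quantity a likelihood-based estimate of $w$ and $a$ would work with.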
7
Posterior distribution for µ
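  • The posterior distribution for $\mu$ given $Z = z$ is a mixture distribution with a point mass at zero and is given by
  • $f_{\text{post}}(\mu \mid Z = z) = (1 - w_{\text{post}})\,\delta_0(\mu) + w_{\text{post}}\,f_1(\mu \mid z)$
  • where the posterior weight $w_{\text{post}}$ and the density $f_1$ of the non-zero part can be calculated explicitly.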
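To make the explicit calculation concrete, here is a hedged Python sketch of the posterior probability that $\mu$ is non-zero. It assumes unit noise variance and a Laplace prior with scaling parameter `a`, and it uses the standard closed form for the Laplace-normal convolution; the function names and the values `w=0.1`, `a=0.5` are illustrative choices, not taken from the presentation.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def laplace_marginal(z, a):
    """g(z): Laplace(a) prior convolved with N(0, 1) noise (closed form)."""
    return 0.5 * a * math.exp(0.5 * a * a) * (
        math.exp(-a * z) * Phi(z - a) + math.exp(a * z) * Phi(-z - a))

def posterior_weight(z, w, a):
    """Posterior probability that mu is non-zero given Z = z."""
    signal = w * laplace_marginal(z, a)   # weight times marginal under the Laplace part
    null = (1.0 - w) * phi(z)             # weight times marginal under the point mass
    return signal / (signal + null)

for z in (0.0, 2.0, 4.0):
    print(z, round(posterior_weight(z, w=0.1, a=0.5), 3))
```

The posterior weight grows with $|z|$: observations far from zero are much better explained by the heavy-tailed component than by pure noise.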

8
Estimating µ
  • Estimate $\mu$ by the posterior median $\hat{\mu}(z; w)$
  • For a fixed $w$, $\hat{\mu}(z; w)$ is a monotonic function of $z$ with a thresholding property

An observation $Z$ will yield a non-zero estimate of $\mu$ if $|Z|$ exceeds some threshold $t(w)$
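The thresholding behaviour can be checked numerically. The sketch below computes the posterior median under a point-mass plus Laplace prior, assuming unit noise variance, and then locates the threshold by bisection. It is derived directly from the mixture posterior rather than copied from the EBayesThresh software, and the names `posterior_median` and `threshold` as well as the values `w=0.1`, `a=0.5` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf and inv_cdf

def posterior_median(z, w, a):
    """Posterior median of mu given Z = z (unit noise variance assumed)."""
    if z < 0:
        return -posterior_median(-z, w, a)   # the rule is antisymmetric in z
    scale = 0.5 * a * math.exp(0.5 * a * a)
    h = scale * math.exp(-a * z)             # factor of the mu > 0 part
    slab = h * STD.cdf(z - a) + scale * math.exp(a * z) * STD.cdf(-z - a)
    total = w * slab + (1.0 - w) * STD.pdf(z)   # marginal density of Z at z
    # For m >= 0: P(mu > m | z) = w * h * Phi(z - a - m) / total.
    target = 0.5 * total / (w * h)
    if target >= STD.cdf(z - a):
        return 0.0                           # P(mu > 0 | z) <= 1/2, so the median is 0
    return z - a - STD.inv_cdf(target)       # solve P(mu > m | z) = 1/2 for m

def threshold(w, a, hi=20.0, tol=1e-8):
    """Largest z >= 0 whose posterior median is still zero, by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_median(mid, w, a) == 0.0:
            lo = mid
        else:
            hi = mid
    return lo

t = threshold(w=0.1, a=0.5)
print("threshold:", round(t, 3))
for z in (1.0, 3.0, 5.0, 10.0):
    print(z, round(posterior_median(z, w=0.1, a=0.5), 3))
```

Observations below the threshold are set exactly to zero, while large observations are shrunk by roughly `a`, so strong signals are barely altered.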
9
Parameter estimation
  • Suppose now we have a sequence of observations of the form
  • $Z_i = \mu_i + \varepsilon_i$, for $i = 1, \dots, n$
  • We need to estimate the mixing weight $w$, the scaling parameter $a$, and the variance $\sigma^2$.
  • We use a maximum likelihood approach to find estimates for $w$ and $a$.
  • To estimate $\sigma^2$, we employ a sum-of-squares approach from fitting a linear additive model which accounts for both variation between chip replicates and spot replicates nested within the chips
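A rough picture of the maximum likelihood step is given by the Python sketch below, which maximises the marginal log-likelihood over a grid of candidate values for the mixing weight and the scaling parameter. The grid search is a crude stand-in for a proper numerical optimiser, unit noise variance is assumed (that is, the observations are taken to be already scaled), and the function names, grids and toy data are all assumptions made for illustration.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def laplace_marginal(z, a):
    """Laplace(a) prior convolved with N(0, 1) noise (closed form)."""
    return 0.5 * a * math.exp(0.5 * a * a) * (
        math.exp(-a * z) * Phi(z - a) + math.exp(a * z) * Phi(-z - a))

def fit_mixture(zs, w_grid=None, a_grid=None):
    """Grid-search maximiser of sum_i log((1 - w) phi(z_i) + w g_a(z_i))."""
    w_grid = w_grid or [k / 100 for k in range(1, 100)]   # 0.01, ..., 0.99
    a_grid = a_grid or [k / 10 for k in range(1, 31)]     # 0.1, ..., 3.0
    null = [phi(z) for z in zs]
    best = None
    for a in a_grid:
        slab = [laplace_marginal(z, a) for z in zs]
        for w in w_grid:
            loglik = sum(math.log((1.0 - w) * n0 + w * s)
                         for n0, s in zip(null, slab))
            if best is None or loglik > best[0]:
                best = (loglik, w, a)
    return best[1], best[2]

# Made-up examples: one mostly-null sequence and one where every value is large.
sparse = [0.3, -0.5, 1.1, -0.2, 0.8, -1.0, 0.1, -0.7,
          0.4, 0.0, -0.3, 0.6, 5.2, -0.9, 0.2, -0.1]
dense = [4.1, -5.0, 3.8, 6.2, -4.4, 5.5, -3.9, 4.8]

w_sparse, a_sparse = fit_mixture(sparse)
w_dense, a_dense = fit_mixture(dense)
print("sparse:", w_sparse, a_sparse)
print("dense: ", w_dense, a_dense)
```

The fitted mixing weight adapts to the data: it is smaller for the mostly-null sequence than for the sequence in which every observation is large, which is what makes the resulting threshold adaptive.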

10
Linear additive model
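  • Model each observed intensity by
  • $X_{ijk} = \mu_i + \beta_{ij} + \varepsilon_{ijk}$
  • with $\mu_i$ the gene-specific mean,
  • $\beta_{ij}$ the between-chip variation, with variance $\sigma_B^2$,
  • $\varepsilon_{ijk}$ the within-chip variation, with variance $\sigma_W^2$
  • Given estimates for $\sigma_B^2$ and $\sigma_W^2$, the variance of each gene-level observation follows from these two components
  • Simulations show the sum-of-squares approach to be more reliable as $w$ grows large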
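One way to read the sum-of-squares approach is as a balanced nested analysis of variance, sketched below in Python. Several things here are assumptions rather than statements from the slides: the model is taken to be $X_{ijk} = \mu_i + \beta_{ij} + \varepsilon_{ijk}$ with variance components shared by all genes, the gene-level observation is taken to be the mean over chips and spots, so that its variance is $\sigma_B^2/m + \sigma_W^2/(mr)$, and the function names and toy data are illustrative.

```python
def variance_components(X):
    """Pooled method-of-moments estimates (sigma_B^2, sigma_W^2).

    X[i][j][k] is the contrast for gene i, chip j, replicate spot k;
    the design must be balanced with m >= 2 chips and r >= 2 spots.
    """
    n, m, r = len(X), len(X[0]), len(X[0][0])
    ss_within = 0.0
    ss_between = 0.0
    for gene in X:
        chip_means = [sum(chip) / r for chip in gene]
        gene_mean = sum(chip_means) / m
        ss_within += sum((x - cm) ** 2
                         for chip, cm in zip(gene, chip_means) for x in chip)
        ss_between += r * sum((cm - gene_mean) ** 2 for cm in chip_means)
    ms_within = ss_within / (n * m * (r - 1))    # estimates sigma_W^2
    ms_between = ss_between / (n * (m - 1))      # estimates sigma_W^2 + r * sigma_B^2
    sigma_w2 = ms_within
    sigma_b2 = max(0.0, (ms_between - ms_within) / r)
    return sigma_b2, sigma_w2

def variance_of_gene_mean(sigma_b2, sigma_w2, m, r):
    """Variance of a gene's mean contrast over m chips and r spots per chip."""
    return sigma_b2 / m + sigma_w2 / (m * r)

# Toy data: n = 2 genes, m = 2 chips, r = 2 spots (made-up values).
X = [[[1.0, 3.0], [5.0, 7.0]],
     [[0.0, 0.0], [0.0, 0.0]]]

sigma_b2, sigma_w2 = variance_components(X)
var_mean = variance_of_gene_mean(sigma_b2, sigma_w2, m=2, r=2)
print(sigma_b2, sigma_w2, var_mean)
```

Pooling the sums of squares across genes leans on the constant-variance assumption made after normalisation; with gene-specific variances the estimates would have to be formed gene by gene.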

11
Data from homemade spotted array
12
Leukemia dataset (Golub et al., 1999)
13
Results
Table 1: Percentages of differentially expressed genes identified by different common methods
Simulation studies show that FDR is too conservative, rather than Bayesian thresholding being not conservative enough!
14
Conclusions
  • The empirical Bayesian approach is a natural way
    to incorporate our prior belief that not many
    genes are differentially expressed.
  • This method is not one-size-fits-all!
  • Future work includes more reliable variance estimation through latent class modelling and a Bayesian model that can be fitted to raw data, eliminating the need to normalise.

15
References
  • Golub, T. et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
  • Huber, W. et al. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18, S96-S104.
  • Johnstone, I. and Silverman, B. (2004). Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. The Annals of Statistics, 32, 1594-1649.