Title: An Adaptive Empirical Bayesian Thresholding Procedure for Analysing Microarray Experiments
1 An Adaptive Empirical Bayesian Thresholding Procedure for Analysing Microarray Experiments
- Rebecca E. Walls, Stuart Barber, Mark S. Gilthorpe, John T. Kent - rebecca_at_maths.leeds.ac.uk
- University of Leeds
XXIII International Biometric Conference, Montréal, July 2006
2 Microarray experiments
- Suppose we observe intensity measurements Tijk (treatment) and Cijk (control), where
  - i = 1, ..., n genes
  - j = 1, ..., m chips
  - k = 1, ..., r replicate spots
- For which of the n genes are Tijk and Cijk significantly different in intensity level?
- For those significant genes, can we estimate the level of differential expression?
3 Contrast variables
- We assume that Tijk and Cijk have been suitably transformed and normalised so that they have constant variance (Huber et al.); denote the adjusted intensities again by Tijk and Cijk.
- Define the contrast variable Xijk = Tijk - Cijk.
- We analyse the sequence of gene-wise means X̄i, where X̄i is the average of the Xijk over the m chips and r replicate spots (a sketch of this step follows below).
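A minimal numpy sketch of this step, assuming the normalised intensities are held in arrays of shape (n, m, r); the array and function names are illustrative:

```python
import numpy as np

def gene_contrasts(T, C):
    """Contrasts X[i, j, k] = T[i, j, k] - C[i, j, k] and gene-wise means
    X_bar[i], averaged over the m chips and r replicate spots.

    T, C : arrays of shape (n_genes, m_chips, r_spots) holding the
           normalised treatment and control intensities."""
    X = T - C                      # contrast for every gene / chip / spot
    X_bar = X.mean(axis=(1, 2))    # one summary value per gene
    return X, X_bar
```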
4 Sparse sequences
- Assume most genes are not differentially expressed, so the sequence of gene effects will be sparse, e.g.
  - (0, 0, 0, -3.1, 0, 0, 0, 3.7, -2.6, 0, 0, 2.1)
- Add normally distributed noise with mean 0 and some variance σ², giving, e.g.
  - (2.2, -0.1, -1.4, -5.4, -2.6, 2.2, -1.1, 1.0, -1.8, 1.2, 1.0, -0.1)
- We adapt the EBayesThresh methodology of Johnstone and Silverman (2004), originally designed for thresholding wavelet coefficients (a simulation sketch follows below).
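A small simulation of this setup, with illustrative (assumed) values for the sparsity level, Laplace scale and noise standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 12           # number of genes (illustrative)
p_zero = 0.7     # assumed proportion of non-differentially-expressed genes
sigma = 1.0      # assumed noise standard deviation

# Sparse sequence of true effects: most entries exactly zero.
is_null = rng.random(n) < p_zero
mu = np.where(is_null, 0.0, rng.laplace(scale=2.0, size=n))

# Observed sequence: true effects plus N(0, sigma^2) noise.
z = mu + rng.normal(scale=sigma, size=n)
print(np.round(mu, 1))
print(np.round(z, 1))
```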
5 Empirical Bayesian methodology
- Suppose we have an observation Z which can be written in the form Z = µ + ε, where ε ~ N(0, σ²).
- The prior on µ is a mixture of δ0(µ), a point mass at zero, and γ(µ; a), a heavy-tailed Laplace distribution with density γ(µ; a) = (a/2) exp(-a|µ|),
- in proportions (1 - w) and w according to the mixing weight w, 0 ≤ w ≤ 1.
6 Empirical Bayesian methodology
- For small w, most of the prior mass sits in the point mass at zero, reflecting the assumption that only a few genes are differentially expressed (a sampling sketch follows below).
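A minimal sketch of drawing effects from this mixture prior, taking w as the weight on the Laplace component (the function and parameter names are illustrative):

```python
import numpy as np

def sample_prior(n, w, a, rng=None):
    """Draw n effects from the mixture prior: with probability (1 - w) the
    effect is exactly zero, with probability w it is Laplace-distributed
    with density (a / 2) * exp(-a * |mu|)."""
    rng = np.random.default_rng() if rng is None else rng
    nonzero = rng.random(n) < w
    # numpy's Laplace generator is parameterised by its scale b = 1 / a
    return np.where(nonzero, rng.laplace(scale=1.0 / a, size=n), 0.0)

# For small w the draws are mostly zero, i.e. a sparse sequence.
print(sample_prior(12, w=0.25, a=0.5, rng=np.random.default_rng(1)))
```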
7 Posterior distribution for µ
- The posterior distribution for µ given Z = z is a mixture distribution with a point mass at zero, of the form
- P(µ = 0 | z) δ0(µ) + P(µ ≠ 0 | z) f1(µ | z),
- where P(µ ≠ 0 | z) = w g(z) / {w g(z) + (1 - w) φσ(z)},
- with g the marginal density of Z under the Laplace component, φσ the N(0, σ²) density, and f1 the posterior density of µ given that it is non-zero; all of these terms can be calculated explicitly.
8 Estimating µ
- Estimate µ by the posterior median µ̂(z) = median(µ | Z = z).
- For a fixed w, µ̂(z) is a monotonic function of z with a thresholding property:
- an observation Z will yield a non-zero estimate µ̂ only if |Z| exceeds some threshold t(w) (see the numerical sketch below).
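A brute-force numerical illustration of the posterior median and its thresholding behaviour, using a grid over µ rather than the explicit formulae; the parameter values are assumed for illustration:

```python
import numpy as np
from scipy.stats import norm

def posterior_median(z, w, a, sigma, half_width=20.0, n_grid=20001):
    """Posterior median of mu given Z = z under the prior
    (1 - w) * delta_0 + w * Laplace(a), with Z | mu ~ N(mu, sigma^2)."""
    mu = np.linspace(-half_width, half_width, n_grid)
    dmu = mu[1] - mu[0]
    # Unnormalised continuous part of the posterior on the grid.
    cont = w * (a / 2.0) * np.exp(-a * np.abs(mu)) * norm.pdf(z, loc=mu, scale=sigma)
    # Unnormalised weight of the point mass at zero.
    atom = (1.0 - w) * norm.pdf(z, loc=0.0, scale=sigma)
    total = atom + cont.sum() * dmu
    # Posterior CDF on the grid; the atom enters at mu = 0.
    cdf = (np.cumsum(cont) * dmu + atom * (mu >= 0.0)) / total
    return mu[np.searchsorted(cdf, 0.5)]

# Small observations are shrunk exactly to zero; large ones are not.
for z in (0.5, 1.5, 2.5, 4.0):
    print(z, round(posterior_median(z, w=0.3, a=0.5, sigma=1.0), 2))
```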
9 Parameter estimation
- Suppose now we have a sequence of observations of the form Zi = µi + εi, with εi ~ N(0, σ²), for i = 1, ..., n.
- We need to estimate the mixing weight w, the scaling parameter a, and the variance σ².
- We use a maximum likelihood approach to find estimates for w and a (see the sketch below).
- To estimate σ², we employ a sum-of-squares approach from fitting a linear additive model which accounts for both variation between chip replicates and spot replicates nested within the chips.
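One way to carry out the maximum likelihood step is to maximise the marginal likelihood of the standardised observations; a sketch under the assumption that σ has already been estimated and that the prior has the Laplace form given earlier (not necessarily the authors' exact implementation):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import norm

def log_marginal_laplace(z, a):
    """log marginal density of Z = mu + N(0, 1) noise when mu ~ Laplace(a),
    using the closed form from Johnstone and Silverman (2004)."""
    return (np.log(a / 2.0) + a * a / 2.0
            + logsumexp([-a * z + norm.logcdf(z - a),
                          a * z + norm.logcdf(-z - a)], axis=0))

def neg_log_lik(params, z):
    w, a = params
    log_null = norm.logpdf(z)               # mu = 0 component
    log_alt = log_marginal_laplace(z, a)    # mu ~ Laplace(a) component
    return -np.sum(logsumexp([np.log1p(-w) + log_null,
                              np.log(w) + log_alt], axis=0))

def fit_w_a(z_std):
    """Maximum likelihood estimates of (w, a) from observations standardised
    to unit noise variance, z_std = z / sigma_hat."""
    res = minimize(neg_log_lik, x0=(0.2, 0.5), args=(z_std,),
                   bounds=[(1e-4, 1 - 1e-4), (0.05, 5.0)], method="L-BFGS-B")
    return res.x
```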
10 Linear additive model
- Model each observed contrast by Xijk = µi + bij + eijk, with
- µi the gene-specific mean,
- bij ~ N(0, σB²) the between-chip variation,
- eijk ~ N(0, σW²) the within-chip variation.
- Given estimates for σB² and σW², the variance of the gene-wise mean X̄i is given by Var(X̄i) = σB²/m + σW²/(mr).
- Simulations show the sum-of-squares approach is more reliable as the mixing weight w grows large (a sketch of the sum-of-squares step follows below).
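A minimal nested-ANOVA sketch of the sum-of-squares step, assuming a balanced design and working on the contrast array from earlier; the exact decomposition the authors use is not spelled out, so this is illustrative:

```python
import numpy as np

def variance_components(X):
    """Pooled nested-ANOVA estimates of the between-chip (sigma_B^2) and
    within-chip (sigma_W^2) variance components from contrasts X of shape
    (n_genes, m_chips, r_spots)."""
    n, m, r = X.shape
    chip_means = X.mean(axis=2)               # (n, m)
    gene_means = chip_means.mean(axis=1)      # (n,)
    # Within-chip mean square: spot-to-spot variation about each chip mean.
    ms_within = np.sum((X - chip_means[:, :, None]) ** 2) / (n * m * (r - 1))
    # Between-chip mean square: chip means about each gene mean.
    ms_between = r * np.sum((chip_means - gene_means[:, None]) ** 2) / (n * (m - 1))
    sigma_w2 = ms_within
    sigma_b2 = max((ms_between - ms_within) / r, 0.0)
    return sigma_b2, sigma_w2

def var_gene_mean(sigma_b2, sigma_w2, m, r):
    """Variance of the gene-wise mean under the additive model."""
    return sigma_b2 / m + sigma_w2 / (m * r)
```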
11 Data from a home-made spotted array
12 Leukemia dataset (Golub et al., 1999)
13 Results
- Table 1: Percentages of differentially expressed genes identified by different common methods.
- Simulation studies suggest that the FDR approach is too conservative, rather than the Bayesian thresholding being insufficiently conservative.
14 Conclusions
- The empirical Bayesian approach is a natural way to incorporate our prior belief that not many genes are differentially expressed.
- This method is not one-size-fits-all.
- Future work includes more reliable variance estimation through latent class modelling, and a Bayesian model that can be fitted to raw data, eliminating the need to normalise.
15 References
- Golub, T. et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
- Huber, W. et al. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18 (Suppl. 1), S96-S104.
- Johnstone, I. and Silverman, B. (2004). Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. The Annals of Statistics, 32, 1594-1649.