An Adaptive Empirical Bayesian Thresholding Procedure for Analysing Microarray Experiments

1
An Adaptive Empirical Bayesian Thresholding
Procedure for Analysing Microarray Experiments

Rebecca E. Walls, Stuart Barber, Mark S.
Gilthorpe and John T. Kent
rebecca@maths.leeds.ac.uk
University of Leeds

PG seminar series, 6th December 2006
2
Outline

An introduction to gene expression and
microarrays
(this will be very brief!)
Empirical Bayesian methodology
Results

3
Gene expression

Gene expression: the process by which a cell
transfers the coded information stored in its DNA
into proteins
Gene expression is regulated: genes will only
express at the right time and in the right cell,
and respond to environmental stimuli
Interesting questions to try and answer...
Which genes are expressed in which tissues?
How is the expression of a gene influenced by
external stimuli?
What patterns of gene expression cause a disease
or lead to disease progression?
What patterns of gene expression influence
response to treatment?

4
Philosophy behind microarrays

Microarrays allow the comparison of gene
expression between multiple samples for many
thousands of genes simultaneously
Previously we described gene expression as a
process. How can we measure a process?
Proteins are notoriously hard to measure
accurately
Instead, we measure the abundance of
the intermediary molecule, mRNA

5
Notation

Suppose we observe intensity measurements $T_{ijk}$
(treatment) and $C_{ijk}$ (control), where
$i = 1, \ldots, n$ indexes genes
$j = 1, \ldots, m$ indexes chips
$k = 1, \ldots, r$ indexes replicate spots
For which of the $n$ genes are $T_{ijk}$ and $C_{ijk}$
significantly different in intensity level?
For those significant genes, can we estimate
the level of differential expression?

6
Statistical challenges

Data generation
Noise
Background
Experimental variability (chip, samples, lab)
Intensities not well distributed
T-test becomes invalid!
Data structure
Many observations (from 100 to 20k genes per chip,
replicated)
Multiple testing (suppose that 10,000 genes are
tested: could incur as many as 500 false
positives at the 5% level!)
Lack of independence (20k genes -> 1M proteins!)
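
The multiple-testing figure above can be checked with a short sketch. It assumes every one of the 10,000 genes is truly null, so that each p-value is Uniform(0, 1); the function names are illustrative and not from the talk.

```python
import random


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def simulate_false_positives(n_tests: int, alpha: float, seed: int = 0) -> int:
    """Count p-values below alpha when all p-values are Uniform(0, 1)."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))


if __name__ == "__main__":
    print(expected_false_positives(10_000, 0.05))   # the slide's 500
    print(simulate_false_positives(10_000, 0.05))   # one simulated experiment
```

The simulated count varies around 500 from seed to seed, which is the point of the slide: at the 5% level, hundreds of apparent discoveries can arise by chance alone.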

7
Contrast variables

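We assume that $T_{ijk}$ and $C_{ijk}$ have been suitably
transformed and normalised and have constant
variance (Huber et al.); denote the adjusted
intensities $\tilde{T}_{ijk}$ and $\tilde{C}_{ijk}$ (on a logarithmic
scale)
Define the contrast variable
$X_{ijk} = \tilde{T}_{ijk} - \tilde{C}_{ijk}$
We analyse the sequence of gene-wise means $\bar{X}_i$, where
$\bar{X}_i = \frac{1}{mr} \sum_{j=1}^{m} \sum_{k=1}^{r} X_{ijk}$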
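
A minimal sketch of the contrast step, assuming the intensities are already transformed and normalised and are stored as nested lists indexed [gene][chip][spot]. Averaging each gene's contrasts over chips and spots is an assumption about the summary being analysed, and the toy numbers are made up for illustration.

```python
from statistics import mean


def contrasts(T, C):
    """X[i][j][k] = T[i][j][k] - C[i][j][k] for gene i, chip j, spot k."""
    return [
        [[t - c for t, c in zip(t_chip, c_chip)]
         for t_chip, c_chip in zip(t_gene, c_gene)]
        for t_gene, c_gene in zip(T, C)
    ]


def gene_means(X):
    """Average each gene's contrasts over all chips and replicate spots."""
    return [mean(x for chip in gene for x in chip) for gene in X]


if __name__ == "__main__":
    # Hypothetical log-scale intensities: 2 genes, 2 chips, 2 spots per chip.
    T = [[[8.0, 8.2], [8.1, 8.3]], [[10.0, 10.4], [9.8, 10.2]]]
    C = [[[8.0, 8.2], [8.1, 8.3]], [[8.0, 8.0], [8.0, 8.0]]]
    print(gene_means(contrasts(T, C)))
```

In this toy input the first gene is unchanged between treatment and control, while the second has a mean contrast of about 2.1 on the log scale.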

8
Sparse sequences

Assume most genes are not differentially
expressed,
so the sequence of true mean contrasts $\mu_i$ will be sparse
(0, 0, 0, -3.1, 0, 0, 0, 3.7, -2.6, 0, 0, 2.1)
Add normally distributed noise with mean 0 and
some variance $\sigma^2$
(2.2, -0.1, -1.4, -5.4, -2.6, 2.2, -1.1, 1.0,
-1.8, 1.2, 1.0, -0.1)
We adapt the EBayesThresh methodology of
Johnstone and Silverman (2002), originally
designed for thresholding wavelet coefficients
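
The sparse-plus-noise picture on this slide can be reproduced with a small sketch. The sparse sequence is the one shown on the slide; the noise standard deviation of 1 and the seed are assumptions, so the noisy values will not match the slide's second sequence.

```python
import random


def add_noise(mu, sigma, seed=0):
    """Return mu plus independent N(0, sigma^2) noise on every entry."""
    rng = random.Random(seed)
    return [m + rng.gauss(0.0, sigma) for m in mu]


if __name__ == "__main__":
    mu = [0, 0, 0, -3.1, 0, 0, 0, 3.7, -2.6, 0, 0, 2.1]
    z = add_noise(mu, sigma=1.0)
    print([round(v, 1) for v in z])
```

After noise is added, no entry is exactly zero any more, so the task is to decide which observations are large enough to reflect a genuinely non-zero mean.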
9
Empirical Bayesian methodology

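Suppose we have an observation Z which can be
written in the form
$Z = \mu + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$
The prior on µ is a mixture of $\delta_0(\mu)$, a point
mass at zero, and $\gamma_a(\mu) = \frac{a}{2} e^{-a|\mu|}$, a heavy-tailed Laplace
distribution,
$f_{prior}(\mu) = (1 - \omega) \delta_0(\mu) + \omega \gamma_a(\mu)$
in proportions according to the
mixing weight $\omega$, $0 \le \omega \le 1$.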
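
To make the mixture prior concrete, here is a sketch that draws from it. It assumes the Laplace component has density (a/2) exp(-a|mu|), writes the mixing weight as `w`, and uses a hypothetical function name.

```python
import random


def sample_prior(n, w, a, seed=0):
    """Draw n values from (1 - w) * point mass at 0 + w * Laplace(rate a)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < w:
            magnitude = rng.expovariate(a)  # |mu| is Exponential with rate a
            draws.append(magnitude if rng.random() < 0.5 else -magnitude)
        else:
            draws.append(0.0)               # the point mass at zero
    return draws


if __name__ == "__main__":
    mus = sample_prior(10_000, w=0.1, a=0.5)
    print(sum(1 for m in mus if m != 0.0), "non-zero draws out of", len(mus))
```

With a small mixing weight almost all draws are exactly zero, which is how the prior encodes the belief that few genes are differentially expressed.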
10
Empirical Bayesian methodology

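Suppose we have an observation Z which can be
written in the form
$Z = \mu + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$
The prior on µ is a mixture of $\delta_0(\mu)$, a point
mass at zero, and $\gamma_a(\mu) = \frac{a}{2} e^{-a|\mu|}$, a heavy-tailed Laplace
distribution,
$f_{prior}(\mu) = (1 - \omega) \delta_0(\mu) + \omega \gamma_a(\mu)$
in proportions according to the
mixing weight $\omega$, $0 \le \omega \le 1$.
For small $\omega$, most of the prior mass sits at zero.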
11
Posterior distribution for µ

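The posterior distribution for µ given Z = z is a
mixture distribution with a point mass at zero
and is given by
$f_{post}(\mu \mid z) = (1 - \omega_{post}(z)) \delta_0(\mu) + \omega_{post}(z) f_1(\mu \mid z)$
where, with $\omega$ the prior mixing weight,
$\omega_{post}(z) = \frac{\omega g(z)}{\omega g(z) + (1 - \omega) \phi_\sigma(z)}$,
$g = \gamma_a * \phi_\sigma$ is the marginal density of Z under the Laplace component, $\phi_\sigma$ is the $N(0, \sigma^2)$ density,
and $f_1(\mu \mid z) \propto \gamma_a(\mu) \phi_\sigma(z - \mu)$; both can be calculated explicitly.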
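
As an illustration of the explicit calculation, the sketch below computes the posterior probability that µ is non-zero. It assumes unit noise variance and a Laplace density (a/2) exp(-a|mu|), and uses the closed form of the Laplace-normal convolution obtained by completing the square. The function names are hypothetical, `w` stands for the mixing weight, and the formula is only intended for moderate |z|, since the exponentials overflow for extreme values.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, sigma assumed equal to 1


def laplace_normal_marginal(z, a):
    """Density of Z = mu + eps with mu ~ Laplace(rate a) and eps ~ N(0, 1)."""
    return 0.5 * a * math.exp(0.5 * a * a) * (
        math.exp(-a * z) * STD_NORMAL.cdf(z - a)
        + math.exp(a * z) * STD_NORMAL.cdf(-(z + a))
    )


def posterior_nonzero_prob(z, w, a):
    """Posterior probability that mu != 0 given Z = z, by Bayes' theorem."""
    g = laplace_normal_marginal(z, a)
    phi = STD_NORMAL.pdf(z)
    return w * g / (w * g + (1.0 - w) * phi)


if __name__ == "__main__":
    for z in (0.0, 2.0, 4.0, 6.0):
        print(z, posterior_nonzero_prob(z, w=0.1, a=0.5))
```

An observation near zero lowers the probability of a non-zero mean below its prior value, while a large observation pushes it towards one.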

12
Estimating µ

Estimate µ by the posterior median $\hat{\mu}(z; \omega)$
For a fixed $\omega$, $\hat{\mu}(z; \omega)$ is a
monotonic function of z with a thresholding property

An observation Z will yield a non-zero estimate $\hat{\mu}$ only if $|Z|$
exceeds some threshold $t(\omega)$
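
The thresholding property can be seen in a sketch of the posterior median. It assumes unit noise variance, a Laplace density (a/2) exp(-a|mu|) and a mixing weight `w` with 0 < w <= 1. The closed form follows from solving P(mu > m | z) = 1/2 for m, and the threshold is located numerically by bisection rather than by any formula from the talk. Function names are hypothetical, and this is not the EBayesThresh implementation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # sigma assumed equal to 1


def posterior_median(z, w, a):
    """Posterior median of mu given Z = z under the point-mass/Laplace prior."""
    x = abs(z)  # work with |z|; the estimate is antisymmetric in z
    c = 0.5 * a * math.exp(0.5 * a * a - a * x)
    # P(mu > m | x) = 1/2 is equivalent to Phi(x - a - m) = q.
    q = 0.5 * (
        STD_NORMAL.cdf(x - a)
        + math.exp(2.0 * a * x) * STD_NORMAL.cdf(-(x + a))
        + (1.0 - w) * STD_NORMAL.pdf(x) / (w * c)
    )
    if q >= STD_NORMAL.cdf(x - a):
        return 0.0  # P(mu > 0 | x) <= 1/2, so the median is exactly zero
    return math.copysign(x - a - STD_NORMAL.inv_cdf(q), z)


def threshold(w, a, hi=20.0, tol=1e-8):
    """Smallest |z| with a non-zero posterior median, found by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_median(mid, w, a) == 0.0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(threshold(0.1, 0.5))
    print([posterior_median(z, 0.1, 0.5) for z in (-5.0, 1.0, 5.0)])
```

Observations below the threshold in absolute value are set exactly to zero, and larger ones are shrunk towards zero rather than kept unchanged.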
13
Parameter estimation

Suppose now we have a sequence of observations of
the form
$Z_i = \mu_i + \varepsilon_i$, for $i = 1, \ldots, n$
We need to estimate the mixing weight $\omega$, the
scaling parameter a, and the variance $\sigma^2$.
We use a maximum likelihood approach to find
estimates for $\omega$ and a.
To estimate $\sigma^2$, we employ a sum-of-squares
approach from fitting a linear additive model
which accounts for both variation between chip
replicates and variation between spot replicates nested within the
chips
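
One way to picture the maximum likelihood step is a grid search over the marginal log-likelihood of the observations. The sketch assumes unit noise variance and a Laplace density (a/2) exp(-a|mu|); the grid search is a stand-in chosen for simplicity, not the optimiser used in the talk or in EBayesThresh, and all names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # sigma assumed equal to 1


def laplace_normal_marginal(z, a):
    """Density of Z = mu + eps with mu ~ Laplace(rate a) and eps ~ N(0, 1)."""
    return 0.5 * a * math.exp(0.5 * a * a) * (
        math.exp(-a * z) * STD_NORMAL.cdf(z - a)
        + math.exp(a * z) * STD_NORMAL.cdf(-(z + a))
    )


def marginal_loglik(zs, w, a):
    """Log-likelihood of the data under the two-component marginal mixture."""
    return sum(
        math.log((1.0 - w) * STD_NORMAL.pdf(z) + w * laplace_normal_marginal(z, a))
        for z in zs
    )


def fit_weight_and_scale(zs, w_grid, a_grid):
    """Return the (w, a) pair on the grid with the largest log-likelihood."""
    best = max((marginal_loglik(zs, w, a), w, a) for w in w_grid for a in a_grid)
    return best[1], best[2]


if __name__ == "__main__":
    # Artificial data: 900 observations at 0 and 100 clear signals at 6.
    zs = [0.0] * 900 + [6.0] * 100
    w_grid = [i / 100 for i in range(1, 100)]
    print(fit_weight_and_scale(zs, w_grid, a_grid=[0.25, 0.5, 1.0]))
```

The estimated weight reflects how much of the data looks like signal, so a sparser data set leads to a smaller weight and hence a larger threshold.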

14
Linear additive model

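Model each observed contrast by
$X_{ijk} = \mu_i + b_{ij} + e_{ijk}$
with $\mu_i$ the gene-specific mean,
$b_{ij}$ the between-chip variation, with variance $\sigma_B^2$,
$e_{ijk}$ the within-chip variation, with variance $\sigma_W^2$
Given estimates for $\sigma_B^2$ and $\sigma_W^2$, the variance of
the gene-wise mean $\bar{X}_i$ (the average of $X_{ijk}$ over chips and spots) is given by
$\mathrm{Var}(\bar{X}_i) = \frac{\sigma_B^2}{m} + \frac{\sigma_W^2}{mr}$
Simulations show the sum-of-squares approach is more
reliable as $\omega$ grows large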
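
A sketch of one standard sum-of-squares (method-of-moments) estimator for this nested model follows. It assumes a balanced design with at least two chips and two spots per chip, and variance components shared by all genes so that sums of squares can be pooled. The talk does not show its exact estimator, so this is an illustration of the idea rather than the authors' procedure.

```python
def variance_components(X):
    """Estimate (sigma_B^2, sigma_W^2, Var of gene mean) from X[i][j][k].

    X[i][j][k] is the contrast for gene i, chip j, replicate spot k.
    """
    n, m, r = len(X), len(X[0]), len(X[0][0])
    ss_within = 0.0
    ss_between = 0.0
    for gene in X:
        chip_means = [sum(chip) / r for chip in gene]
        gene_mean = sum(chip_means) / m
        # Spots around their chip mean: within-chip variation.
        ss_within += sum(
            (x - cm) ** 2 for chip, cm in zip(gene, chip_means) for x in chip
        )
        # Chip means around the gene mean: between-chip variation.
        ss_between += r * sum((cm - gene_mean) ** 2 for cm in chip_means)
    ms_within = ss_within / (n * m * (r - 1))    # estimates sigma_W^2
    ms_between = ss_between / (n * (m - 1))      # estimates sigma_W^2 + r * sigma_B^2
    sigma2_w = ms_within
    sigma2_b = max((ms_between - ms_within) / r, 0.0)  # truncate at zero
    var_mean = sigma2_b / m + sigma2_w / (m * r)
    return sigma2_b, sigma2_w, var_mean


if __name__ == "__main__":
    # One gene, two chips, two spots per chip (toy numbers).
    print(variance_components([[[1.0, 3.0], [5.0, 7.0]]]))
```

The third returned value is the plug-in variance of a gene-wise mean, which plays the role of the noise variance in the thresholding step.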

15
Error distribution for spike-in experiment: HIV data
16
Results for HIV spike-in experiment
17
Data from homemade spotted array: E. coli
18
Results
Table 1: Numbers of differentially expressed
genes identified by different common methods
19
Conclusions

The empirical Bayesian approach is a natural way
to incorporate our prior belief that not many
genes are differentially expressed.
Making an adjustment to the variance is not
sufficient compensation for using the incorrect
prior distribution! Future work includes using a
Laplace distribution for the errors.

20
References

Hedenfalk, I. et al. (2001). Gene expression
profiles in hereditary breast cancer. The New
England Journal of Medicine, 344 (8), 539-548.
Huber, W. et al. (2002). Variance stabilization
applied to microarray data calibration and to the
quantification of differential expression.
Bioinformatics, 18, S96-S104.
Johnstone, I. and Silverman, B. (2004). Needles
and straw in haystacks: Empirical Bayes estimates
of possibly sparse sequences. The Annals of
Statistics, 32, 1594-1649.
McLachlan, G., Bean, R. and Ben-Tovim Jones, L.
(2006). A simple implementation of a normal
mixture approach to differential expression in
multiclass microarrays. Bioinformatics, 22 (13),
1608-1615.
Smyth, G., Michaud, J. and Scott, H. (2005). Use
of within-array replicate spots for assessing
differential expression in microarray
experiments. Bioinformatics, 21 (9), 2067-2075.
Tusher, V., Tibshirani, R. and Chu, G. (2001).
Significance analysis of microarrays applied to
transcriptional responses to ionizing radiation.
Proc. Natn. Acad. Sci. USA, 98, 5116-5121.