An Adaptive Empirical Bayesian Thresholding Procedure for Analysing Microarray Experiments

1
An Adaptive Empirical Bayesian Thresholding
Procedure for Analysing Microarray Experiments
  • Rebecca E. Walls, Stuart Barber, Mark S.
    Gilthorpe and John T. Kent
  • rebecca@maths.leeds.ac.uk
  • University of Leeds

PG seminar series 6th December 2006
2
Outline
  • An introduction to gene expression and
    microarrays
  • (This will be very brief!)
  • Empirical Bayesian methodology
  • Results

3
Gene expression
  • Gene expression: the process by which a cell
    transfers the coded information stored in its DNA
    into proteins
  • Gene expression is regulated: genes are only
    expressed at the right time and in the right cell,
    and expression responds to environmental stimuli
  • Interesting questions to try and answer...
  • Which genes are expressed in which tissues?
  • How is the expression of a gene influenced by
    external stimuli?
  • What patterns of gene expression cause a disease
    or lead to disease progression?
  • What patterns of gene expression influence
    response to treatment?

4
Philosophy behind microarrays
  • Microarrays allow the comparison of gene
    expression between multiple samples for many
    thousands of genes simultaneously
  • Previously we described gene expression as a
    process: how can we measure a process?
  • Proteins are notoriously hard to measure
    accurately
  • Instead, we measure the abundance of
    the intermediary molecule mRNA

5
Notation
  • Suppose we observe intensity measurements $T_{ijk}$
    (treatment) and $C_{ijk}$ (control), where
  • $i = 1, \ldots, n$ genes
  • $j = 1, \ldots, m$ chips
  • $k = 1, \ldots, r$ replicate spots
  • For which of the $n$ genes are $T_{ijk}$ and $C_{ijk}$
    significantly different in intensity level?
  • For those significant genes, can we estimate
    the level of differential expression?

6
Statistical challenges
  • Data generation
  • Noise
  • Background
  • Experimental variability (chip, samples, lab)
  • Intensities are not normally distributed
  • The t-test becomes invalid!
  • Data structure
  • Many observations (from 100 to 20k genes per chip,
    replicated)
  • Multiple testing (if 10,000 genes are tested and
    none is truly differentially expressed, we expect
    about 500 false positives at the 5% level!)
  • Lack of independence (20k genes → 1M proteins!)
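The false-positive figure on this slide is simple arithmetic: if every one of the $n$ genes is truly null and each is tested at level $\alpha$, the expected number of false positives is $n\alpha = 10{,}000 \times 0.05 = 500$. A minimal Python sketch of that calculation; the Bonferroni level is included only as a familiar point of comparison and is not part of the slides.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_level(n_tests: int, family_alpha: float) -> float:
    """Per-test level that bounds the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


n_genes = 10_000
alpha = 0.05
print(expected_false_positives(n_genes, alpha))
print(bonferroni_level(n_genes, alpha))
```

Testing each gene at the much smaller Bonferroni level removes almost all false positives, at the cost of power; this tension is one motivation for modelling sparsity directly.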

7
Contrast variables
  • We assume that $T_{ijk}$ and $C_{ijk}$ have been suitably
    transformed and normalised so that they have constant
    variance (Huber et al.); we denote the adjusted
    intensities, on a logarithmic scale, again by
    $T_{ijk}$ and $C_{ijk}$
  • Define the contrast variable
  • $X_{ijk} = T_{ijk} - C_{ijk}$
  • We analyse the sequence of gene-wise mean contrasts
    $\bar{X}_i$, where $\bar{X}_i = \frac{1}{mr}\sum_{j=1}^{m}\sum_{k=1}^{r} X_{ijk}$
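To make the notation concrete, the sketch below stores the adjusted log-scale intensities as NumPy arrays of shape $(n, m, r)$ (genes × chips × replicate spots), forms the contrasts $X_{ijk} = T_{ijk} - C_{ijk}$, and averages them over chips and spots. The averaging step assumes the analysed sequence is the gene-wise mean contrast $\bar{X}_i$; the array layout and the function name `gene_mean_contrasts` are hypothetical.

```python
import numpy as np


def gene_mean_contrasts(T: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Return the gene-wise mean contrast for arrays shaped (n genes, m chips, r spots).

    Assumes T and C are already transformed, normalised and on a log scale.
    """
    if T.shape != C.shape or T.ndim != 3:
        raise ValueError("T and C must both have shape (n, m, r)")
    X = T - C                    # contrast variable X_ijk = T_ijk - C_ijk
    return X.mean(axis=(1, 2))   # average over chips (j) and spots (k)


# Tiny example: 2 genes, 2 chips, 3 replicate spots per chip.
T = np.array([[[5.0, 5.2, 4.8], [5.1, 4.9, 5.0]],
              [[7.0, 7.1, 6.9], [7.2, 6.8, 7.0]]])
C = np.full_like(T, 5.0)
print(gene_mean_contrasts(T, C))
```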

8
Sparse sequences
  • Assume most genes are not differentially
    expressed
  • The sequence of underlying differences $\mu_i$ will
    then be sparse, e.g.
  • (0, 0, 0, -3.1, 0, 0, 0, 3.7, -2.6, 0, 0, 2.1)
  • Adding normally distributed noise with mean 0 and
    some variance $\sigma^2$ gives observations such as
  • (2.2, -0.1, -1.4, -5.4, -2.6, 2.2, -1.1, 1.0,
    -1.8, 1.2, 1.0, -0.1)
  • We adapt the EBayesThresh methodology of
    Johnstone and Silverman (2002), originally
    designed for thresholding wavelet coefficients
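The two sequences on this slide can be imitated by taking the sparse sequence and adding independent $N(0, \sigma^2)$ noise. A minimal sketch using only Python's standard library; the value $\sigma = 1.5$ and the seed are arbitrary illustrative choices, so the noisy values will not reproduce the slide's second sequence.

```python
import random

# Sparse sequence of true differences from the slide: most entries are exactly zero.
mu = [0, 0, 0, -3.1, 0, 0, 0, 3.7, -2.6, 0, 0, 2.1]


def add_gaussian_noise(values, sigma, seed=None):
    """Return values plus independent N(0, sigma^2) noise, one draw per entry."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


z = add_gaussian_noise(mu, sigma=1.5, seed=1)
print([round(v, 1) for v in z])
print(sum(1 for v in mu if v == 0), "of", len(mu), "true differences are zero")
```

After noise is added, none of the observations is exactly zero, which is why a thresholding rule is needed to recover the sparse structure.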
9
Empirical Bayesian methodology
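  • Suppose we have an observation $Z$ which can be
    written in the form $Z = \mu + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$
  • The prior on $\mu$ is a mixture of $\delta_0(\mu)$, a point
    mass at zero, and $\gamma(\mu; a) = \frac{a}{2}e^{-a|\mu|}$, a heavy-tailed
    Laplace distribution,
  • in proportions according to the mixing weight
    $\omega$, $0 \le \omega \le 1$:
    $f_{\text{prior}}(\mu) = (1 - \omega)\,\delta_0(\mu) + \omega\,\gamma(\mu; a)$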
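To see why this prior encodes sparsity, the sketch below draws $\mu$ from the mixture $(1-\omega)\,\delta_0 + \omega\,\gamma(\cdot\,; a)$, with the mixing weight written $\omega$. It uses the fact that a Laplace variable with density $\frac{a}{2}e^{-a|\mu|}$ is an exponential variable with rate $a$ given a random sign. The function name `sample_prior` and the values $\omega = 0.1$, $a = 0.5$ are illustrative assumptions.

```python
import random


def sample_prior(n, omega, a, seed=None):
    """Draw n values from (1 - omega) * point mass at 0 + omega * Laplace(rate a)."""
    if not 0.0 <= omega <= 1.0 or a <= 0:
        raise ValueError("need 0 <= omega <= 1 and a > 0")
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < omega:
            # Laplace draw: exponential magnitude with rate a and a random sign.
            magnitude = rng.expovariate(a)
            draws.append(magnitude if rng.random() < 0.5 else -magnitude)
        else:
            draws.append(0.0)  # spike at zero
    return draws


mu = sample_prior(10_000, omega=0.1, a=0.5, seed=42)
print("fraction non-zero:", sum(1 for m in mu if m != 0.0) / len(mu))
```

For small $\omega$ roughly a fraction $\omega$ of the draws are non-zero, while the heavy Laplace tails allow those few to be large.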
10
Empirical Bayesian methodology
  • For small $\omega$, the prior places most of its mass
    at $\mu = 0$, so it favours sparse sequences
11
Posterior distribution for µ
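  • The posterior distribution for $\mu$ given $Z = z$ is a
    mixture distribution with a point mass at zero
    and is given by
    $f(\mu \mid z) = \{1 - \tilde{\omega}(z)\}\,\delta_0(\mu) + \tilde{\omega}(z)\, f_1(\mu \mid z)$
  • where $\tilde{\omega}(z) = \dfrac{\omega\, g(z)}{\omega\, g(z) + (1 - \omega)\,\phi_\sigma(z)}$,
    with $\phi_\sigma$ the $N(0, \sigma^2)$ density and $g = \gamma \star \phi_\sigma$
    the marginal density of $Z$ when $\mu \neq 0$,
  • and $f_1(\mu \mid z)$, the posterior density of $\mu$ under the
    Laplace component, can be calculated explicitly.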
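For the Laplace prior the marginal density $g$ (the convolution of $\gamma$ with the normal density) has a closed form. Assuming $\sigma = 1$ (observations divided by $\sigma$), completing the square separately for $\mu > 0$ and $\mu < 0$ gives $g(z) = \frac{a}{2} e^{a^2/2}\left[e^{-az}\Phi(z - a) + e^{az}\,\Phi(-(z + a))\right]$. The sketch below evaluates $g$ and the posterior probability $\tilde\omega(z)$ that $\mu \neq 0$, with the mixing weight written $\omega$; the function names are hypothetical and this is an illustration under these assumptions, not the authors' code.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def laplace_marginal(z, a):
    """g(z): density of Z = mu + N(0, 1) when mu has density (a/2) exp(-a|mu|)."""
    upper = math.exp(-a * z) * Phi(z - a)      # contribution from mu > 0
    lower = math.exp(a * z) * Phi(-(z + a))    # contribution from mu < 0
    return 0.5 * a * math.exp(0.5 * a * a) * (upper + lower)


def posterior_weight(z, omega, a):
    """Posterior probability that mu != 0 given Z = z (noise variance 1)."""
    num = omega * laplace_marginal(z, a)
    return num / (num + (1.0 - omega) * phi(z))


for z in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(z, round(posterior_weight(z, omega=0.1, a=0.5), 3))
```

Because $g$ has heavier tails than $\phi$, the posterior weight on a non-zero $\mu$ grows with $|z|$.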

12
Estimating µ
  • Estimate $\mu$ by the posterior median $\hat{\mu}(z; \omega)$
  • For a fixed $\omega$, $\hat{\mu}(z; \omega)$ is a
    monotonic function of $z$ with a thresholding property

An observation $z$ yields a non-zero estimate $\hat{\mu}$ only if $|z|$
exceeds some threshold $t(\omega)$
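The threshold can be found numerically. Assuming $\sigma = 1$, Laplace scale $a$ and mixing weight written $\omega$: for $z > 0$ the posterior median is non-zero exactly when $P(\mu > 0 \mid z) > 1/2$, where $P(\mu > 0 \mid z) = \omega\, g_+(z) / \{\omega\, g(z) + (1-\omega)\phi(z)\}$ and $g_+(z) = \frac{a}{2}e^{a^2/2}e^{-az}\Phi(z-a)$ is the $\mu > 0$ part of $g$. The sketch solves $P(\mu > 0 \mid t) = 1/2$ by bisection; the function names and the search bracket $[0, 20]$ are illustrative assumptions.

```python
import math


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def prob_positive(z, omega, a):
    """P(mu > 0 | Z = z) under the point-mass + Laplace(a) prior with N(0, 1) noise."""
    scale = 0.5 * a * math.exp(0.5 * a * a)
    g_plus = scale * math.exp(-a * z) * Phi(z - a)      # mu > 0 part of g(z)
    g_minus = scale * math.exp(a * z) * Phi(-(z + a))   # mu < 0 part of g(z)
    marginal = omega * (g_plus + g_minus) + (1.0 - omega) * phi(z)
    return omega * g_plus / marginal


def threshold(omega, a, lo=0.0, hi=20.0, tol=1e-10):
    """Value t at which the posterior median switches from zero to non-zero."""
    if not 0.0 < omega <= 1.0:
        raise ValueError("need 0 < omega <= 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_positive(mid, omega, a) > 0.5:
            hi = mid   # median already non-zero, so the threshold is below mid
        else:
            lo = mid   # median still zero, so the threshold is above mid
    return 0.5 * (lo + hi)


for omega in (0.5, 0.1, 0.01):
    print(omega, round(threshold(omega, a=0.5), 3))
```

Smaller mixing weights (sparser prior beliefs) give larger thresholds, so fewer genes are declared differentially expressed.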
13
Parameter estimation
  • Suppose now we have a sequence of observations of
    the form $Z_i = \mu_i + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$,
  • for $i = 1, \ldots, n$
  • We need to estimate the mixing weight $\omega$, the
    scaling parameter $a$, and the variance $\sigma^2$.
  • We use a maximum likelihood approach to find
    estimates for $\omega$ and $a$.
  • To estimate $\sigma^2$, we employ a sum-of-squares
    approach from fitting a linear additive model
    that accounts for both variation between chip
    replicates and spot replicates nested within the
    chips
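A minimal sketch of marginal maximum likelihood for the mixing weight (written $\omega$) and $a$, assuming $\sigma$ is known and the data have been divided by it, so each $Z_i$ has marginal density $(1-\omega)\phi(z) + \omega g(z)$. A coarse grid search is used for transparency; this is an illustrative choice, not necessarily the optimiser used by the authors or by EbayesThresh, and the grids and simulated data are hypothetical.

```python
import math
import random


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def laplace_marginal(z, a):
    """Closed-form convolution of the Laplace(a) density with N(0, 1)."""
    return 0.5 * a * math.exp(0.5 * a * a) * (
        math.exp(-a * z) * Phi(z - a) + math.exp(a * z) * Phi(-(z + a)))


def log_likelihood(zs, omega, a):
    """Marginal log-likelihood of the point-mass + Laplace mixture model."""
    return sum(math.log((1.0 - omega) * phi(z) + omega * laplace_marginal(z, a))
               for z in zs)


def fit_grid(zs, omegas, scales):
    """Return the (omega, a) pair maximising the marginal log-likelihood on a grid."""
    return max(((w, a) for w in omegas for a in scales),
               key=lambda pair: log_likelihood(zs, *pair))


# Simulated example: about 10% of genes have Laplace(a = 0.5) effects, noise sd 1.
rng = random.Random(0)
zs = []
for _ in range(300):
    mu = 0.0
    if rng.random() < 0.1:
        mu = rng.choice((-1.0, 1.0)) * rng.expovariate(0.5)
    zs.append(mu + rng.gauss(0.0, 1.0))

omegas = [k / 50 for k in range(1, 50)]   # 0.02, 0.04, ..., 0.98
scales = [k / 10 for k in range(1, 31)]   # 0.1, 0.2, ..., 3.0
omega_hat, a_hat = fit_grid(zs, omegas, scales)
print("omega_hat =", omega_hat, "a_hat =", a_hat)
```

In practice a continuous optimiser would replace the grid; the grid simply makes the likelihood surface easy to inspect.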

14
Linear additive model
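  • Model each observed contrast by
    $X_{ijk} = \mu_i + b_{ij} + e_{ijk}$,
  • with $\mu_i$ the gene-specific mean,
  • $b_{ij} \sim N(0, \sigma_B^2)$ the between-chip variation,
  • $e_{ijk} \sim N(0, \sigma_W^2)$ the within-chip variation
  • Given estimates for $\sigma_B^2$ and $\sigma_W^2$, the variance of
    $\bar{X}_i$ is given by $\operatorname{Var}(\bar{X}_i) = \sigma_B^2/m + \sigma_W^2/(mr)$
  • Simulations show that the sum-of-squares approach is more
    reliable as $\omega$ grows large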
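A sketch of one standard sum-of-squares (method-of-moments) estimator for the nested model $X_{ijk} = \mu_i + b_{ij} + e_{ijk}$, pooling over genes. It assumes balanced data stored as an array of shape $(n, m, r)$ with $m, r \ge 2$, and uses $E[\mathrm{MS}_W] = \sigma_W^2$ and $E[\mathrm{MS}_B] = \sigma_W^2 + r\sigma_B^2$; the exact estimator used in the talk may differ, and the function name is hypothetical.

```python
import numpy as np


def nested_variance_components(X: np.ndarray):
    """Estimate (sigma_B^2, sigma_W^2, Var(Xbar_i)) from contrasts shaped (n, m, r).

    Model: X_ijk = mu_i + b_ij + e_ijk with b_ij ~ N(0, sigma_B^2) and
    e_ijk ~ N(0, sigma_W^2); mean squares are pooled over all n genes.
    """
    n, m, r = X.shape
    if m < 2 or r < 2:
        raise ValueError("need at least 2 chips and 2 replicate spots")
    chip_means = X.mean(axis=2)             # Xbar_ij.
    gene_means = chip_means.mean(axis=1)    # Xbar_i..

    # Within-chip mean square: spot deviations from their chip mean.
    ss_within = ((X - chip_means[:, :, None]) ** 2).sum()
    ms_within = ss_within / (n * m * (r - 1))

    # Between-chip mean square: chip means around the gene mean, scaled by r.
    ss_between = r * ((chip_means - gene_means[:, None]) ** 2).sum()
    ms_between = ss_between / (n * (m - 1))

    sigma_w2 = ms_within
    sigma_b2 = max((ms_between - ms_within) / r, 0.0)  # truncate negative estimates
    var_gene_mean = sigma_b2 / m + sigma_w2 / (m * r)
    return sigma_b2, sigma_w2, var_gene_mean


# Simulated check with sigma_B^2 = 0.5 and sigma_W^2 = 1.0; gene means cancel out.
rng = np.random.default_rng(0)
n, m, r = 2000, 4, 3
mu = rng.normal(0.0, 2.0, size=n)
X = (mu[:, None, None]
     + rng.normal(0.0, np.sqrt(0.5), size=(n, m, 1))
     + rng.normal(0.0, 1.0, size=(n, m, r)))
print(nested_variance_components(X))
```

Unlike a robust scale estimate computed from the $\bar{X}_i$ themselves, these mean squares use only replicate variation, so they are not inflated when many genes are differentially expressed.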

15
Error distribution for the HIV spike-in experiment data
16
Results for HIV spike-in experiment
17
Data from a homemade spotted array: E. coli
18
Results
Table 1: Numbers of differentially expressed
genes identified by several common methods
19
Conclusions
  • The empirical Bayesian approach is a natural way
    to incorporate our prior belief that not many
    genes are differentially expressed.
  • Making an adjustment to the variance is not
    sufficient compensation for using the incorrect
    prior distribution! Future work includes using a
    Laplace distribution for the errors.

20
References
  • Hedenfalk, I. et al. (2001). Gene expression
    profiles in hereditary breast cancer. The New
    England Journal of Medicine, 344 (8), 539-548.
  • Huber, W. et al. (2002). Variance stabilization
    applied to microarray data calibration and to the
    quantification of differential expression.
    Bioinformatics, 18, S96-S104.
  • Johnstone, I. and Silverman, B. (2004). Needles
    and straw in haystacks: Empirical Bayes estimates
    of possibly sparse sequences. The Annals of
    Statistics, 32, 1594-1649.
  • McLachlan, G., Bean, R. and Ben-Tovim Jones, L.
    (2006). A simple implementation of a normal
    mixture approach to differential expression in
    multiclass microarrays. Bioinformatics, 22 (13),
    1608-1615.
  • Smyth, G., Michaud, J. and Scott, H. (2005). Use
    of within-array replicate spots for assessing
    differential expression in microarray
    experiments. Bioinformatics, 21 (9), 2067-2075.
  • Tusher, V., Tibshirani, R. and Chu, G. (2001).
    Significance analysis of microarrays applied to
    the ionizing radiation response.
    Proc. Natl. Acad. Sci. USA, 98, 5116-5121.