1
Bayesian inference in differential expression experiments
Sylvia Richardson, Natalia Bochkina, Alex Lewin
Centre for Biostatistics, Imperial College, London
Biological Atlas of Insulin Resistance
www.bgx.org.uk
BBSRC
2
Background
  • Investigating changes of gene expression under
    different conditions is one of the key questions
    in many biological experiments
  • Specificity of the context is
  • High-dimensional data (tens of thousands of genes)
    and few samples
  • Need to borrow information
  • Many sources of variability
  • Important to adopt a flexible modelling framework

Bayesian Hierarchical Modelling allows us to capture
important features of the data while maintaining
the generalisability of the tools/techniques
developed
3
Modelling differential expression
Condition 1 and Condition 2 (diagram):
  • Start with given point estimates of expression
  • Hierarchical model of replicate variability and
    array effect (one for each condition)
  • Differential expression parameter: posterior
    distribution (flat prior), or mixture modelling
    for classification
4
Outline
  • Background
  • Bayesian hierarchical models for differential
    expression experiments
  • Decision rules based on tail posterior
    probabilities
  • Comparison with existing approaches
  • FDR estimation for tail posterior probabilities
  • Extension of tail posterior probabilities to
    analysing multiclass experiments
  • Illustration
  • Discussion and further work

5
I -- Bayesian hierarchical model for differential
expression (Lewin et al., Biometrics, 2006)
  • Data $y_{gcr}$: log gene expression for gene g,
    replicate r, condition c
  • $\alpha_g$: gene effect
  • $\delta_g$: differential effect for gene g between
    2 conditions
  • $\beta_{r(g)c}$: array effect, modelled as a smooth
    (spline) function of $\alpha_g$
  • $\sigma^2_{gc}$: gene-specific variance
  • 1st level: $y_{g1r} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma^2_{g1})$
  • $y_{g2r} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma^2_{g2})$
  • $\sum_r \beta_{r(g)c} = 0$; $\beta_{r(g)c}$ is a function of $\alpha_g$
    with parameters c, d
  • 2nd level: flat priors for $\alpha_g$, $\delta_g$, c, d
  • $\sigma^2_{gc} \sim g(a_c, b_c)$
  • (lognormal or inverse-gamma)

Exchangeable variances
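To make the first level concrete, here is a minimal simulation sketch in Python (numpy). It is not the authors' code: the numerical settings, the 10% share of changed genes and the linear stand-in for the spline array effect are hypothetical choices. Only the structure follows the model: condition means at $\alpha_g \mp \tfrac12\delta_g$, array effects that sum to zero over arrays, and gene- and condition-specific variances.

```python
import numpy as np


def simulate_first_level(n_genes=1000, n_reps=3, seed=0):
    """Simulate log expression y[g, c, r] for 2 conditions and n_reps arrays each.

    Hypothetical settings (not from the paper): alpha_g ~ N(7, 1); 10% of genes
    have delta_g ~ N(0, 1), the rest delta_g = 0; sigma^2_gc lognormal; the
    spline array effect is replaced by a linear function of alpha_g.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(7.0, 1.0, size=n_genes)  # gene effect alpha_g
    is_de = rng.random(n_genes) < 0.1
    delta = np.where(is_de, rng.normal(0.0, 1.0, size=n_genes), 0.0)
    sigma2 = rng.lognormal(mean=-3.0, sigma=0.8, size=(n_genes, 2))  # sigma^2_gc

    # Array effect beta_{r(g)c}: one slope per (condition, array), centred over
    # arrays so that sum_r beta_{r(g)c} = 0 for every gene and condition.
    slopes = rng.normal(0.0, 0.02, size=(2, n_reps))
    slopes -= slopes.mean(axis=1, keepdims=True)
    beta = (alpha - alpha.mean())[:, None, None] * slopes[None, :, :]  # (G, 2, R)

    # Condition 1 is centred at alpha_g - delta_g/2, condition 2 at alpha_g + delta_g/2.
    half_shift = np.array([-0.5, 0.5])[None, :, None] * delta[:, None, None]
    mean = alpha[:, None, None] + half_shift + beta
    y = rng.normal(mean, np.sqrt(sigma2)[:, :, None])
    return y, alpha, delta, beta


y, alpha, delta, beta = simulate_first_level()
d_hat = y[:, 1, :].mean(axis=1) - y[:, 0, :].mean(axis=1)  # per-gene estimate of delta_g
```

Because the array effects sum to zero over replicates, the difference of condition means is an unbiased estimate of $\delta_g$ in this simulated setting.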
6
Joint modelling of array effects and differential
expression
  • Performs normalisation simultaneously with
    estimation
  • Gives fewer false positives than a plug-in approach
  • The BHM set-up allows us to check some of the modelling
    assumptions using mixed posterior predictive
    checks:
  • the need for gene-specific variances
  • their 2nd-level distribution

Found that lognormal and 2-parameter inverse-gamma
distributions for the variances gave similar
model checks
7
Selecting genes that are differentially expressed
  • Interested in testing the null hypothesis H0: $\delta_g = 0$
  • Two broad approaches have been used

P-value type: under H0, p ~ U(0,1); under H1, p close to 0
Mixture, $P(H_0 \mid y_{gcr})$: under H0, close to 1; under H1, close to 0
References: Baldi and Long; Smyth 2004 (moderated t statistic); Lonnstedt and Speed 02; Newton, Kendziorski 01, 03; Lonnstedt and Britton 05; Gottardo 06, ...
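The contrast between the two approaches can be seen in a deliberately simple, hypothetical setting: one observed difference $d$ per gene with known standard error $s$, a two-sided p-value from the normal null, and a two-component normal mixture (null $N(0, s^2)$, alternative $N(0, s^2 + \tau^2)$, prior null weight $p_0$). This mixture and its parameter values are illustrative assumptions, not any of the cited models.

```python
from statistics import NormalDist


def p_value(d, s):
    """Two-sided p-value for an observed difference d with known standard error s."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(d) / s))


def posterior_null_prob(d, s, p0=0.9, tau=1.0):
    """P(H0 | d) under the hypothetical mixture
    p0 * N(0, s^2) + (1 - p0) * N(0, s^2 + tau^2)."""
    f0 = NormalDist(0.0, s).pdf(d)
    f1 = NormalDist(0.0, (s ** 2 + tau ** 2) ** 0.5).pdf(d)
    return p0 * f0 / (p0 * f0 + (1.0 - p0) * f1)


for d in (0.0, 0.3, 1.0):
    print(f"d = {d}: p-value = {p_value(d, 0.2):.3g}, P(H0 | d) = {posterior_null_prob(d, 0.2):.3g}")
```

A gene with no change gets a p-value spread over (0, 1) but a posterior null probability close to 1; a strongly changed gene gets values close to 0 on both scales.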

8
Bayesian mixtures
  • Relies on specification of a prior model for $\delta_g$
  • Choice of model for the alternative (see the
    poster by Alex Lewin)
  • Could influence the performance of the
    classification
  • Checking how well the alternative fits the data
    is non-standard

Investigate properties of Bayesian selection
rules based on a non-informative prior for $\delta_g$
9
II -- Bayesian selection rules for pairwise
comparisons
  • 1st level (no array effect)
  • Hierarchical model
  • Extend the p-value approach by considering tail
    probabilities of an appropriate function of the
    parameters

10
Posterior distributions
  • Define the Bayesian T statistic
  • The following conditional distributions hold
  • Posterior distributions
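The formulas on this slide did not survive extraction. As a hedged illustration, assume the conjugate first-level model without array effect: $y_{g1r} \sim N(\alpha_g - \delta_g/2, \sigma_g^2)$, $y_{g2r} \sim N(\alpha_g + \delta_g/2, \sigma_g^2)$, flat priors on $\alpha_g$ and $\delta_g$, and $\sigma_g^{-2} \sim \mathrm{Gamma}(a, b)$ (shape a, rate b). Then $\sigma_g^{-2} \mid y \sim \mathrm{Gamma}(a + (R_1 + R_2 - 2)/2,\ b + S_g/2)$, with $S_g$ the within-condition sum of squares, and $\delta_g \mid \sigma_g^2, y \sim N(\bar y_{g2} - \bar y_{g1},\ \sigma_g^2 (1/R_1 + 1/R_2))$. Taking the Bayesian T statistic to be $T_g = \delta_g / (\sigma_g \sqrt{1/R_1 + 1/R_2})$ (our assumption about its exact form), $T_g \mid \sigma_g, y$ is normal with unit variance. The sketch draws from this posterior; the values of a and b are hypothetical.

```python
import numpy as np


def sample_conjugate_posterior(y1, y2, a=2.0, b=0.1, n_draws=20_000, seed=0):
    """Draw (delta_g, sigma_g^2, T_g) for one gene from the assumed conjugate posterior.

    Assumed model: y1_r ~ N(alpha - delta/2, sigma^2), y2_r ~ N(alpha + delta/2, sigma^2),
    flat priors on alpha and delta, 1/sigma^2 ~ Gamma(shape=a, rate=b).
    The default a, b are hypothetical.
    """
    rng = np.random.default_rng(seed)
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    r1, r2 = len(y1), len(y2)
    d_hat = y2.mean() - y1.mean()
    ss = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()  # S_g
    k = 1.0 / r1 + 1.0 / r2

    # 1/sigma^2 | y ~ Gamma(a + (r1 + r2 - 2)/2, rate = b + S_g/2); numpy takes scale = 1/rate.
    precision = rng.gamma(a + (r1 + r2 - 2) / 2, 1.0 / (b + ss / 2), size=n_draws)
    sigma2 = 1.0 / precision
    # delta | sigma^2, y ~ N(d_hat, sigma^2 * k)
    delta = rng.normal(d_hat, np.sqrt(sigma2 * k))
    # Bayesian T statistic in the assumed form delta / (sigma * sqrt(k)).
    t_stat = delta / np.sqrt(sigma2 * k)
    return delta, sigma2, t_stat


delta, sigma2, t_stat = sample_conjugate_posterior([7.0, 7.1, 6.9], [8.0, 8.1, 7.9])
```

Conditionally on $\sigma_g$, subtracting $\hat d_g / (\sigma_g\sqrt{k})$ from each draw of $T_g$ leaves a standard normal variable, which is what makes the percentiles on the following slides easy to handle.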

11
Tail posterior probabilities 1 (N. Bochkina and
SR, 2006)
  • Use selection rules of the form
  • What statistic to choose?
  • How to define its percentiles?
  • We suppose that we could have observed data
  • with (its expected value of
    under the null)
  • work out the percentiles using posterior
    distributions conditional on

Summarise the distribution of $T_g$ by a tail
area
12
Tail posterior probabilities 2
Recall
  • The corresponding distribution function involves
    numerical integration, so it is computationally expensive
  • But
  • Distribution function of
  • does not involve gene specific parameters

→ The percentile is easy to calculate
→ Consider the tail probability
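A sketch of the tail posterior probability $p(T_g, t_{(\alpha)}) = P(|T_g| > t_{(\alpha)} \mid y)$ estimated from posterior draws of $T_g$ (for instance MCMC output). It assumes that $T_g = \delta_g / (\sigma_g \sqrt{1/R_1 + 1/R_2})$ and that, for data at their null expected value ($\hat d_g = 0$), the conjugate posterior of $T_g$ is N(0, 1) whatever the gene's variance, so $t_{(\alpha)}$ is simply the standard normal $1 - \alpha/2$ quantile. This reading of the slide is ours, and the normal draws below are a hypothetical stand-in for real MCMC output.

```python
from statistics import NormalDist

import numpy as np


def t_alpha(alpha=0.05):
    """Percentile t_(alpha), assuming T_g | null-expected data ~ N(0, 1) for every gene."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def tail_posterior_probability(t_draws, alpha=0.05):
    """p(T_g, t_(alpha)) = P(|T_g| > t_(alpha) | y), estimated from posterior draws of T_g."""
    return float(np.mean(np.abs(np.asarray(t_draws, float)) > t_alpha(alpha)))


# Stand-in for MCMC output: hypothetical posterior draws of T_g for one gene.
rng = np.random.default_rng(1)
t_draws = rng.normal(3.0, 1.0, size=50_000)
tpp = tail_posterior_probability(t_draws)  # close to P(|N(3, 1)| > 1.96), about 0.85
```

Since the threshold does not depend on the gene, it is computed once and reused for all genes; only the posterior draws change.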
13
Key point: $F_0$ is gene-independent (conjugate case)
14
Another Bayesian rule
  • A natural idea is to compare the parameter $\delta_g$
    to 0,
  • i.e. to consider
  • or its complement, or the 2-sided alternative
  • It turns out that this Bayesian selection rule
    behaves like a p-value
  • The distribution of $p(\delta_g, 0)$ is uniform under H0
  • There is an equivalence with frequentist testing
    based on the marginal distribution of under
    the null, in the spirit of the moderated t
    statistic introduced by Smyth 2004

15
Link between $p(\delta_g, 0)$ and the moderated t statistic
Moderated t statistic
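Under conjugate assumptions (flat priors on $\alpha_g$ and $\delta_g$, $\sigma_g^{-2} \sim \mathrm{Gamma}(a, b)$ with rate b), the marginal posterior of $\delta_g$ is a scaled Student t, and $P(\delta_g > 0 \mid y) = F_\nu(\tilde t_g)$, where $\tilde t_g$ is a moderated t statistic with prior degrees of freedom $d_0 = 2a$, prior variance $s_0^2 = b/a$ and $\nu = 2a + R_1 + R_2 - 2$. The sketch below computes this. Reading $p(\delta_g, 0)$ as $P(\delta_g > 0 \mid y)$ is our assumption, and the Simpson-rule t CDF only keeps the example free of extra dependencies. The last function is the symmetric measure $2\max\{p, 1-p\} - 1$ used on the volcano plots.

```python
import math

import numpy as np


def student_t_cdf(x, nu, n=2000):
    """CDF of a standard Student t with nu d.f., by Simpson's rule on [0, |x|] (n even)."""
    const = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

    def density(u):
        return const * (1 + u * u / nu) ** (-(nu + 1) / 2)

    h = abs(x) / n
    total = density(0.0) + density(abs(x))
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, n))
    half = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def moderated_t(y1, y2, a, b):
    """Moderated t with d0 = 2a and s0^2 = b/a, i.e. the values implied by a
    Gamma(shape=a, rate=b) prior on 1/sigma_g^2. Returns (t, degrees of freedom)."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    r1, r2 = len(y1), len(y2)
    ss = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    nu = 2 * a + r1 + r2 - 2
    s2_tilde = (2 * b + ss) / nu  # = (d0 s0^2 + d_g s_g^2) / (d0 + d_g)
    t = (y2.mean() - y1.mean()) / math.sqrt(s2_tilde * (1 / r1 + 1 / r2))
    return float(t), nu


def p_delta_0(y1, y2, a, b):
    """P(delta_g > 0 | y) in the conjugate model, equal to F_nu(moderated t)."""
    t, nu = moderated_t(y1, y2, a, b)
    return student_t_cdf(t, nu)


def two_sided_measure(p):
    """Symmetric measure 2 max{p, 1 - p} - 1 built from p = p(delta_g, 0)."""
    return 2 * max(p, 1 - p) - 1
```

Under H0 the moderated t follows a $t_\nu$ distribution, so $F_\nu(\tilde t_g)$ is uniform, which is the p-value-like behaviour described on the previous slide.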
16
Histograms of measures of differential
expression (simulated data)
17
Tail posterior probabilities 3
  • Investigate the performance of selection rules
    based on
  • In particular
  • What is the FDR associated with each value of
    the cut-off?
  • In the conjugate case
  • How does this rule compare to rules based on

Use F0
Use Storey
Use observed proportion
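The three ingredients listed above suggest an FDR estimate of the form $\widehat{\mathrm{FDR}}(p_{cut}) = \hat p_0\,(1 - F_0(p_{cut})) / \hat\pi(p_{cut})$, where $\hat\pi(p_{cut})$ is the observed proportion of genes with tail posterior probability above $p_{cut}$ and $\hat p_0$ is Storey's estimate of the null proportion. This exact combination is our reading of the slide, not a quotation of the paper. In the demo, the uniform null distribution passed as $F_0$ and all simulated inputs are hypothetical; the true $F_0$ of the tail posterior probability is not uniform.

```python
import numpy as np


def storey_pi0(pvalues, lam=0.5):
    """Storey's estimate of the null proportion p0: #{p > lam} / (G (1 - lam))."""
    p = np.asarray(pvalues, float)
    return min(1.0, float(np.mean(p > lam)) / (1.0 - lam))


def estimated_fdr(tpp, p_cut, null_cdf, pi0):
    """Estimated FDR when selecting genes with tail posterior probability > p_cut.

    numerator:   pi0 * P(Tpp > p_cut | H0) = pi0 * (1 - F0(p_cut))
    denominator: observed proportion of genes with Tpp > p_cut
    null_cdf plays the role of F0 (gene-independent in the conjugate case).
    """
    tpp = np.asarray(tpp, float)
    selected = float(np.mean(tpp > p_cut))
    if selected == 0.0:
        return 0.0
    return min(1.0, pi0 * (1.0 - null_cdf(p_cut)) / selected)


# Hypothetical demo: 90% null genes with uniform p-values and (assumed) uniform Tpp.
rng = np.random.default_rng(0)
G, pi0_true = 20_000, 0.9
is_null = rng.random(G) < pi0_true
pvals = np.where(is_null, rng.random(G), rng.beta(0.1, 5.0, size=G))
tpp = np.where(is_null, rng.random(G), rng.beta(8.0, 1.0, size=G))
pi0_hat = storey_pi0(pvals)
fdr_hat = estimated_fdr(tpp, 0.9, null_cdf=lambda x: x, pi0=pi0_hat)
true_fdp = float(np.mean(is_null[tpp > 0.9]))  # known only because the data are simulated
```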
18
Comparison of estimated (solid line) and true
FDR (dashed line) on simulated data
(panels: $p_0$ = 0.90, $p_0$ = 0.70, $p_0$ = 0.95)
19
III-- Data Sets and Biological questions
  • Biological Questions
  • Understand the mechanisms of insulin resistance
  • Cell line experiments in which the response of a mouse
    muscle cell line to treatment with insulin or
    metformin (an anti-diabetic drug) is
    observed after 2 and 12 hours
  • Questions of interest relate to simple and
    compound comparisons
  • 3 replicates for each condition, Affymetrix
    MOE430A chip, 22690 genes per chip
  • Data pre-processed by RMA and normalised using
    intensity dependent LOESS normalisation

20
Volcano plots for muscle cell data: change
between insulin and control at 2 hours
  • $p(T_g, t_{(\alpha)})$, $\alpha = 0.05$: less peaked
    around zero, allows better separation
  • $2\max\{p(\delta_g, 0),\ 1 - p(\delta_g, 0)\} - 1$: peaked
    around zero, varies steeply
  • Cut-off 0.925
21
Insulin versus control
(panels: $p_0$ = 0.61, $p_0$ = 0.98)
22
Metformin versus control: tail posterior probabilities
and estimated FDR at 2 hours and 12 hours
(panel annotations: $p_0$ = 0.56, $p_0$ = 0.79;
72 selected (FDR 0.5); 1854 selected (FDR 0.5))
23
IV -- Extension to the analysis of multiclass data
  • In our case study, 3 groups (control c0, insulin
    c1, metformin c2) and 3 time points t0, t1
    (2 hours), t2 (12 hours), each replicated 3
    times
  • ANOVA-like model formulation suited to the
    analysis of such multifactorial experiments

Global variance parametrisation (borrowing
information)
24
Joint tail posterior probabilities
  • Interest is in testing a compound null
    hypothesis, i.e. involving several differential
    parameters
  • e.g. testing jointly for the effect of insulin
    and metformin at 2 hours
  • In this case, we are interested in a specific
    alternative
  • Note: rejecting the null hypothesis in an ANOVA
    setting corresponds to a different alternative
  • Define joint tail posterior probabilities
  • where is the Bayesian T statistic for each
    treatment
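A minimal sketch of a joint tail posterior probability for K comparisons, $P(|T_g^{(k)}| > t_{(\alpha)} \text{ for all } k \mid y)$, computed from joint posterior draws so that the correlation induced by a shared variance is kept. The standard normal choice of $t_{(\alpha)}$, the form of $T_g^{(k)}$ and all demo values (shared $\sigma_g$ draws, observed differences) are hypothetical assumptions for illustration.

```python
from statistics import NormalDist

import numpy as np


def joint_tail_posterior_probability(t_draws, alpha=0.05):
    """P(|T_g^(k)| > t_(alpha) for all k | y) from joint posterior draws.

    t_draws has shape (n_draws, K): each row is one joint draw of the Bayesian T
    statistics for K comparisons, so any posterior correlation between them
    (e.g. through a shared variance) is kept. t_(alpha) is taken as the standard
    normal 1 - alpha/2 quantile (an assumption about the conjugate case).
    """
    t_cut = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    exceed = np.abs(np.asarray(t_draws, float)) > t_cut
    return float(np.mean(exceed.all(axis=1)))


# Hypothetical draws for one gene and two treatments that share sigma_g.
rng = np.random.default_rng(0)
n_draws, k = 50_000, 2.0 / 3.0
sigma = np.sqrt(1.0 / rng.gamma(4.0, 1.0 / 0.12, size=n_draws))  # shared sigma_g
d_hat = np.array([0.5, 0.45])  # hypothetical observed differences vs control
t_draws = d_hat / (sigma[:, None] * np.sqrt(k)) + rng.standard_normal((n_draws, 2))

joint = joint_tail_posterior_probability(t_draws)
marginals = [joint_tail_posterior_probability(t_draws[:, [j]]) for j in range(2)]
```

Because the joint event is contained in each single-comparison event, the joint probability can never exceed either marginal tail posterior probability. Thresholding it at $p_{cut}$ therefore selects a subset of the genes obtained by intersecting pairwise lists built with the same cut-off.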

25
Benefits of joint posterior probabilities
  • Takes into account correlation of the
    differential expression measures between the
    conditions induced by sharing the same variance
    parameter
  • Usual practice is to
  • Carry out pairwise comparisons
  • Select genes for each comparison using the same
    cut-off on the pp
  • Intersect the lists and find genes common to both
    lists
  • Joint pp were shown (simulation study) to lead to
    fewer false positives in this case of positive
    correlation

26
Correlation of DE parameters and Bayesian T
statistic for insulin and metformin (2 hours)
  • With joint tail posterior probabilities and a
    cut-off of $p_{cut}$ = 0.92, 280 genes are selected as
    jointly perturbed at 2 hours
  • Applying pairwise comparisons and combining the
    lists adds another 47 genes to the list

27
Discussion 1
  • Tail posterior probabilities (Tpp) are a generic
    tool that can be used in any situation where a
    large number of hypotheses, related in a
    hierarchical fashion, are to be tested
  • We have derived the distribution of the Tpp under
    the null and proposed a corresponding estimate of
    FDR
  • This distribution requires numerical integration
    but is gene independent (conjugate case), so only
    needs to be evaluated once
  • Tpp is a smooth function of the amount of DE, with
    a gradient that spreads the genes, thus
    allowing one to choose genes with the desired level of
    uncertainty about their DE
  • Interesting connection between Bayesian and
    frequentist inference for the differential
    expression parameter

28
Discussion 2
  • Interesting to compare performance of Tpp with
    that of mixture models
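  • E.g. Gamma mixtures (see poster by Alex Lewin):
    $\delta_g \sim p_0\,\delta_0 + p_1\,G(-x \mid 1.5, \lambda_1) + p_2\,G(x \mid 1.5, \lambda_2)$,
    where $\delta_0$ is a point mass at 0; $p_0\,\delta_0$ represents H0 and the Gamma components H1
  • Dirichlet distribution for $(p_0, p_1, p_2)$
  • Exp(1) hyperprior for $\lambda_1$ and $\lambda_2$
  • Normal and t mixtures have also been considered:
  • $\delta_g \sim p_0\,\delta_0 + (1-p_0)\,T(\nu, \mu, \tau)$, with $\mu \propto 1$; $\tau, \nu^{-1} \sim \mathrm{Exp}(1)$
  • $\delta_g \sim p_0\,\delta_0 + (1-p_0)\,N(\mu, \tau)$, with $\mu \propto 1$; $\tau \sim \mathrm{Exp}(1)$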

29
Simulated data
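  • 3000 variables, 6 replicates, 2 conditions
  • $y_{g1r} \sim N(\alpha_g, \sigma_g^2)$
  • $y_{g2r} \sim N(\alpha_g + \delta_g, \sigma_g^2)$
  • $\sigma_g^2 \sim 0.03 + \mathrm{LogNorm}(-3.85, 0.82)$
  • $\alpha_g \sim N(7, 25)$
  • $\delta_g$ slightly asymmetric:
  • 5%: $\delta_g > 0$, $\delta_g \sim h(\delta_g)$
  • 10%: $\delta_g < 0$, $\delta_g \sim h(-\delta_g)$
  • 85%: $\delta_g \sim N(0, 0.01)$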

30
Comparison of mixture and tail pp
  • Fit 3 mixture models (Gamma, Normal, t
    alternative) and the flat model.
  • Classification: $P(H_1 \mid \text{data})$ for the mixtures,
    tail posterior probability for the flat model.

Comparable performance, with a slight edge for
the Gamma and Normal mixtures
31
Thanks
BBSRC Exploiting Genomics grant, Wellcome Trust
BAIR consortium
Colleagues in the Biostatistics group: Marta Blangiardo, Anne Mette Hein, Maria de Iorio
Colleagues in the Biology group at Imperial: Tim Aitman, Ulrika Andersson, Dave Carling
Papers and technical reports: www.bgx.org.uk/
For the tail probability paper: www.bgx.org.uk/Natalia/Bochkina.ps