Model checks for complex hierarchical models

1
Model checks for complex hierarchical models
  • Alex Lewin and Sylvia Richardson
  • Imperial College
  • Centre for Biostatistics

2
Background and Aims
  • Many complex models used in bioinformatics
  • Classification/clustering can be greatly affected
    by choice of distributions
  • Our approach: exploit the structure of the model
    to perform predictive checks
  • hierarchical models generally involve
    exchangeability assumptions
  • mixture models are partially exchangeable

3
Outline of Talk
  • Mixture model for gene expression data
  • Model checks for mixture model
    • distribution for gene-specific variances
    • different mixture priors
  • Future work: model checks for a clustering and
    variable selection model (Tadesse et al. 2005)

4
Hierarchical mixture model for gene expression
data
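$w \sim \mathrm{Dirichlet}(1, \ldots, 1)$, various priors for $\delta_g$, $\sigma_g$
$\delta_g \sim \sum_j w_j h_j(\theta_j)$, $\sigma_g^2 \mid \mu, \tau \sim f(\mu, \tau)$
$y_{gr} \mid \delta_g, \sigma_g \sim N(\delta_g, \sigma_g^2)$
$g$ = gene, $r$ = replicate, $j$ = mixture component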
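To make the hierarchy concrete, here is a minimal simulation sketch. It is not the authors' code. It assumes one particular choice of the $h_j$ (a reflected Gamma, a point mass at zero, and a Gamma, with hypothetical rates `eta_minus` and `eta_plus`), a Gamma distribution for $f$, and fixed weights rather than weights drawn from the Dirichlet prior. All function and argument names are illustrative.

```python
import random

def simulate_expression(n_genes, n_reps, weights, eta_minus, eta_plus,
                        var_shape, var_rate, seed=0):
    """Draw z_g, delta_g, sigma2_g and replicates y_gr for each gene."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        # Allocate the gene: 0 = under-expressed, 1 = null, 2 = over-expressed.
        z = rng.choices([0, 1, 2], weights=weights)[0]
        if z == 0:
            # Assumed h_1: reflected Gamma(1.5, rate eta_minus).
            # random.gammavariate takes (shape, scale), so scale = 1 / rate.
            delta = -rng.gammavariate(1.5, 1.0 / eta_minus)
        elif z == 1:
            delta = 0.0  # assumed h_2: point mass at zero
        else:
            delta = rng.gammavariate(1.5, 1.0 / eta_plus)  # assumed h_3
        # Assumed f: Gamma(var_shape, rate var_rate) for the gene variance.
        sigma2 = rng.gammavariate(var_shape, 1.0 / var_rate)
        y = [rng.gauss(delta, sigma2 ** 0.5) for _ in range(n_reps)]
        genes.append({"z": z, "delta": delta, "sigma2": sigma2, "y": y})
    return genes

genes = simulate_expression(1000, 5, [0.1, 0.8, 0.1], 1.0, 1.0, 2.0, 4.0)
```

Data simulated this way can be used to check that predictive p-values behave as expected when the fitted model is the true one.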
5
Mixture model for gene expression data
  • Many mixture models have been proposed for gene
    expression data
  • Set-up is similar to variable selection prior:
    point mass + alternative distribution
  • Particular choices for alternative:
    • Normal (Lönnstedt and Speed)
    • Uniform (Parmigiani et al.)
    • many others

6
Mixture model for gene expression data
Allow for asymmetry in over- and under-expressed
genes ⇒ 3-component mixture model:
$\delta_g \sim w_1 h_1(\theta_1) + w_2 h_2(\theta_2) + w_3 h_3(\theta_3)$
Data: 6 knock-out and 5 wildtype mice, MAS5.0 processed
7
Mixture model for gene expression data
Classify each gene into mixture components using
posterior probabilities
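A minimal sketch of this classification step, assuming the MCMC output stores the allocation $z_g$ of every gene at every iteration. The slide does not state the decision rule, so assigning each gene to its most probable component is an assumption here. The names `classify` and `z_draws` are illustrative.

```python
from collections import Counter

def classify(z_draws, n_components=3):
    """z_draws[t][g] is the component (0..n_components-1) of gene g at iteration t."""
    n_iter = len(z_draws)
    n_genes = len(z_draws[0])
    probs, labels = [], []
    for g in range(n_genes):
        counts = Counter(draw[g] for draw in z_draws)
        # Posterior probability P(z_g = j | data), estimated by the visit frequency.
        p = [counts[j] / n_iter for j in range(n_components)]
        probs.append(p)
        # Assumed rule: pick the component with the highest posterior probability.
        labels.append(max(range(n_components), key=lambda j: p[j]))
    return probs, labels

# Toy allocations for 3 genes over 4 iterations.
z_draws = [[1, 0, 2], [1, 0, 1], [1, 2, 2], [1, 0, 2]]
probs, labels = classify(z_draws)
```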
8
Choice of mixture prior affects classification
results
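Mixture prior for $\delta_g$ | Est. $w_2$ (in null)
$w_1 \mathrm{Unif}(-\eta_-, 0) + w_2 \delta(0) + w_3 \mathrm{Unif}(0, \eta_+)$ | 0.96
$w_1 \mathrm{Gam}^-(1.5, \eta_-) + w_2 \delta(0) + w_3 \mathrm{Gam}(1.5, \eta_+)$ | 0.68
$w_1 \mathrm{Gam}^-(1.5, \eta_-) + w_2 N(0, \epsilon) + w_3 \mathrm{Gam}(1.5, \eta_+)$ | 0.99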
9
Outline of Talk
  • Mixture model for gene expression data
  • Model checks for mixture model
    • distribution for gene-specific variances
    • different mixture priors
  • Future work: model checks for a clustering and
    variable selection model (Tadesse et al. 2005)

10
Predictive model checks
  • Predict new data from the model
  • Use posterior predictive distribution
  • Condition on hyperparameters (mixed predictive
    ⇒ not very conservative)
  • Get Bayesian p-value for each gene/marker/sample
  • Use all p-values together (100s or 1000s) to
    assess model fit
  • Gelman, Meng and Stern 1995; Marshall and
    Spiegelhalter 2003

11
Checking distribution for gene variances
Bayesian p-value for gene $g$: $p_g = \mathrm{Prob}( S^{\mathrm{mpred}} > S_g^{\mathrm{obs}} \mid \mathrm{data} )$
All genes are exchangeable ⇒ histogram of
p-values for all genes together
12
Mixed v. posterior predictive
  • Predictive p-values for data simulated from the
    model
  • Histograms should be Uniform
  • Mixed predictive distribution much less
    conservative than posterior predictive

[Figure panels: using global distribution; using gene-specific distributions]
13
Checking different variance models
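Model differential expression between 3
transgenic and 3 wildtype mice
  • $\sigma_g^2 \mid \mu, \tau \sim \mathrm{Gam}(\mu, \tau)$, $\mu$ fixed
  • $\sigma_g^2 = \sigma^2$ for all genes
  • $\sigma_g^2 \mid \mu, \tau \sim \mathrm{Gam}(\mu, \tau)$
  • $\sigma_g^2 \mid \mu, \tau \sim \mathrm{logNorm}(\mu, \tau)$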
14
Implementation (MCMC)
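  • $p_g = 0$
  • for $t = 1, \ldots, n_{\mathrm{iter}}$:
    • $(\sigma_t^{\mathrm{pred}})^2 \sim f(\mu_t, \tau_t)$
    • $S_t^{\mathrm{mpred}} \sim \mathrm{Gam}( m, m (\sigma_t^{\mathrm{pred}})^{-2} )$
    • $p_g \leftarrow p_g + I[ S_t^{\mathrm{mpred}} > S_g^{\mathrm{obs}} ]$
  • $p_g \leftarrow p_g / n_{\mathrm{iter}}$

$n_{\mathrm{iter}}$ = no. MCMC iterations, $m$ = (no. replicates $-$ 1)/2
Just two extra parameters predicted at each
iteration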
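The loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: $f$ is taken to be Gam($\mu$, $\tau$) in shape-rate form, and the hyperparameter draws would normally come from a fitted MCMC run (here a constant list stands in for them). The name `variance_pvalues` is hypothetical.

```python
import random

def variance_pvalues(s2_obs, hyper_draws, n_reps, seed=0):
    """Mixed predictive p-values for the gene variances.

    s2_obs[g]      observed sample variance of gene g
    hyper_draws[t] (mu_t, tau_t) at MCMC iteration t
    """
    rng = random.Random(seed)
    m = (n_reps - 1) / 2
    counts = [0] * len(s2_obs)
    for mu_t, tau_t in hyper_draws:
        # Assumed f: Gamma(shape mu_t, rate tau_t); gammavariate takes (shape, scale).
        sigma2_pred = rng.gammavariate(mu_t, 1.0 / tau_t)
        # Sample variance given sigma2_pred: Gamma(shape m, rate m / sigma2_pred).
        s2_pred = rng.gammavariate(m, sigma2_pred / m)
        # The same two predicted quantities are compared against every gene.
        for g, s2 in enumerate(s2_obs):
            counts[g] += s2_pred > s2
    return [c / len(hyper_draws) for c in counts]

hyper_draws = [(2.0, 4.0)] * 500  # stand-in for real posterior draws
p = variance_pvalues([0.1, 0.5, 2.0], hyper_draws, n_reps=5)
```

Because the predicted pair is shared by all genes, the extra cost per iteration does not grow with the number of genes.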
15
Outline of Talk
  • Mixture model for gene expression data
  • Model checks for mixture model
    • distribution for gene-specific variances
    • different mixture priors
  • Future work: model checks for a clustering and
    variable selection model (Tadesse et al. 2005)

16
Checking mixture prior
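$\delta_g \sim w_1 h_1(\theta_1) + w_2 h_2(\theta_2) + w_3 h_3(\theta_3)$
OR
$\delta_g \mid \theta, z_g = j \sim h_j(\theta_j)$, $j = 1, \ldots, 3$, with $P(z_g = j) = w_j$
Model checking: focus on separate mixture
components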
17
Issues for mixture model checking
  • $\delta_g \mid \theta, z_g = j \sim h_j(\theta_j)$, $j = 1, \ldots, 3$
  • Think about MCMC iterations
  • Mixture component is estimated from genes
    currently assigned to that component
  • Can only define p-value for given gene and mix.
    component when the gene is assigned to that
    component (i.e. condition on zg in p-value)
  • So check each component using only the genes
    currently assigned (i.e. condition on zg in
    histogram)

18
Predictive checks for mixture model
Bayesian p-value for gene $g$ and mix. component
$j$: $p_{gj} = \mathrm{Prob}( \bar{y}_{gj}^{\mathrm{mpred}} > \bar{y}_g^{\mathrm{obs}} \mid \mathrm{data}, z_g = j )$
  • Genes assigned to the same mix. component are
    exchangeable
  • histogram of p-values for each mix. component
    separately
  • histogram for component j made only from genes
    with large $P(z_g = j)$

19
Condition on classification to check separate
components
Predictive p-values for data simulated from the
model
[Figure panels: all genes with $P(z_g = j) > 0$; only genes with $P(z_g = j) > 0.5$]
Effectively we condition on a best classification
20
Checking different mixture distributions
$w_1 \mathrm{Unif}(-\eta_-, 0) + w_2 \delta(0) + w_3 \mathrm{Unif}(0, \eta_+)$
  • Outer mix. components skewed too much away from
    zero
  • Null component too narrow

21
Checking different mixture distributions
$w_1 \mathrm{Gam}^-(1.5, \eta_-) + w_2 \delta(0) + w_3 \mathrm{Gam}(1.5, \eta_+)$
  • Outer components skewed the opposite way
  • Null still too narrow?

22
Checking different mixture distributions
$w_1 \mathrm{Gam}^-(1.5, \eta_-) + w_2 N(0, \epsilon) + w_3 \mathrm{Gam}(1.5, \eta_+)$
  • Better fit for all components

23
Implementation
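  • $p_{gj} = 0$
  • for $t = 1, \ldots, n_{\mathrm{iter}}$:
    • $\delta_{jt}^{\mathrm{pred}} \sim h_{jt}(\theta_{jt})$, $j = 1, \ldots, 3$
    • $\bar{y}_{gt}^{\mathrm{mpred}} \sim N( \delta_{jt}^{\mathrm{pred}}, \sigma_g^2 / n_{\mathrm{rep}} )$ for $j = z_{gt}$
    • $p_{gj} \leftarrow p_{gj} + I[ \bar{y}_{gt}^{\mathrm{mpred}} > \bar{y}_g^{\mathrm{obs}} ]$ for $j = z_{gt}$
  • $p_{gj} \leftarrow p_{gj} / n_{\mathrm{iter}}(z_g = j)$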

Need ngenes extra parameters at each iteration
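A hedged Python sketch of this loop follows. It assumes the Gamma / Normal / Gamma mixture prior, treats `eps` as the variance of the null component, and takes the per-iteration allocations, variances and component parameters as given lists (in practice they would be read from the MCMC run). None of the names below come from the authors' software.

```python
import random

def component_pvalues(ybar_obs, z_draws, sigma2_draws, eta_draws, eps, n_reps, seed=0):
    """Mixed predictive p-values p_gj, conditioning on z_g = j at each iteration.

    z_draws[t][g]      component (0, 1, 2) of gene g at iteration t
    sigma2_draws[t][g] variance of gene g at iteration t
    eta_draws[t]       (eta_minus, eta_plus) at iteration t (assumed Gamma rates)
    """
    rng = random.Random(seed)
    n_genes = len(ybar_obs)
    hits = [[0] * 3 for _ in range(n_genes)]
    visits = [[0] * 3 for _ in range(n_genes)]
    for z_t, sigma2_t, (eta_minus, eta_plus) in zip(z_draws, sigma2_draws, eta_draws):
        # One predicted delta per component, drawn from the current mixture prior.
        delta_pred = [
            -rng.gammavariate(1.5, 1.0 / eta_minus),
            rng.gauss(0.0, eps ** 0.5),  # assumption: eps is a variance
            rng.gammavariate(1.5, 1.0 / eta_plus),
        ]
        for g in range(n_genes):
            j = z_t[g]  # only the component the gene currently belongs to
            ybar_pred = rng.gauss(delta_pred[j], (sigma2_t[g] / n_reps) ** 0.5)
            hits[g][j] += ybar_pred > ybar_obs[g]
            visits[g][j] += 1
    # Divide by the number of iterations in which z_g = j; None if never visited.
    pvals = [[hits[g][j] / visits[g][j] if visits[g][j] else None for j in range(3)]
             for g in range(n_genes)]
    post_prob = [[visits[g][j] / len(z_draws) for j in range(3)] for g in range(n_genes)]
    return pvals, post_prob

n_iter = 200
pvals, post_prob = component_pvalues(
    ybar_obs=[-50.0, 0.0, 50.0],
    z_draws=[[0, 1, 2]] * n_iter,
    sigma2_draws=[[1.0, 1.0, 1.0]] * n_iter,
    eta_draws=[(1.0, 1.0)] * n_iter,
    eps=0.01, n_reps=5)
# The histogram for component j uses only genes with P(z_g = j) > 0.5.
in_hist = [[g for g in range(3) if post_prob[g][j] > 0.5] for j in range(3)]
```

In the toy call, the first and third genes sit far outside what their components predict, so their p-values are at the extremes, which is the pattern a histogram would flag as misfit.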
24
Summary of model checking procedure
  1. Find part of model where individuals are assumed
    to be exchangeable (so information is shared)
  2. Choose test statistic T (eg. sample mean or
    variance)
  3. Predict Tpred from distribution for exchangeable
    individuals (whole posterior for Tpred)
  4. Compare observed Ti for each individual i to
    distribution of Tpred
  5. For checking mixture components, condition on the
    best classification
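Steps 2 to 4 can be written generically. The sketch below is only an illustration: `draw_t_pred` is a hypothetical callable that returns one predicted test statistic per call (in practice, one per MCMC iteration), and the p-values are summarised in a ten-bin histogram. Two toy predictive distributions are compared, one matching the data and one that is too narrow.

```python
import random

def predictive_pvalues(t_obs, draw_t_pred, n_iter):
    """p_i = Prob(T_pred > T_i_obs), estimated from n_iter predicted statistics."""
    counts = [0] * len(t_obs)
    for _ in range(n_iter):
        t_pred = draw_t_pred()  # shared by all exchangeable individuals
        for i, t in enumerate(t_obs):
            counts[i] += t_pred > t
    return [c / n_iter for c in counts]

def histogram(pvalues, n_bins=10):
    """Bin counts on [0, 1]; roughly flat counts suggest an adequate fit."""
    bins = [0] * n_bins
    for p in pvalues:
        bins[min(int(p * n_bins), n_bins - 1)] += 1
    return bins

rng = random.Random(1)
t_obs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Predictive distribution equal to the data-generating one: near-uniform p-values.
good = histogram(predictive_pvalues(t_obs, lambda: rng.gauss(0.0, 1.0), 500))
# Predictive distribution that is too narrow: p-values pile up near 0 and 1.
narrow = histogram(predictive_pvalues(t_obs, lambda: rng.gauss(0.0, 0.5), 500))
```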

25
Outline of Talk
  • Mixture model for gene expression data
  • Model checks for mixture model
    • distribution for gene-specific variances
    • different mixture priors
  • Future work: model checks for a clustering and
    variable selection model (Tadesse et al. 2005)

26
Clustering and variable selection (Tadesse et al.
2005)
  • $y_i$ = vector of gene expression for each sample $i =
    1, \ldots, n$
  • Multi-variate mixture model for clustering
    samples
  • $y_i \mid z_i = j \sim \mathrm{MVN}(\mu_j, \Sigma_j)$, $j = 1, \ldots, J$
  • $P(z_i = j) = w_j$
  • No. of mix. components (J) is estimated in the
    model
  • Aim to select genes which are informative for
    clustering the samples

27
Clustering and variable selection (Tadesse et al.
2005)
$(\gamma^c)$ = vector of indices of variables not used to
cluster samples
Likelihood conditional on allocation to mixture
$(\gamma)$ = vector of indices of selected variables
Conjugate priors on multivariate means and
covariance matrices; $P(\gamma_g = 1) = \varphi$
$i$ = sample, $g$ = gene, $j$ = mix. component
28
Clustering and variable selection (Tadesse et al.
2005)
Model checking: want to check the distribution
for each mixture component separately
(conditional on $J$). In addition, need to condition
on a given variable selection. Clearly impossible
computationally.
$i$ = sample, $g$ = gene, $j$ = mix. component
29
Computing predictive p-values
  • Run model with no prediction
  • Find the best configuration:
    • set of selected variables $(\gamma)$
    • no. mixture components $J$
    • allocation of samples to mixture components $z_i$
  • Re-run model, with $(\gamma)$, $J$ and $z_i$ fixed, and
    calculate predictive p-values

$p_{ij} = \mathrm{Prob}( T_j^{\mathrm{pred}} > T_i^{\mathrm{obs}} \mid \mathrm{data}, z_i = j, J, (\gamma) )$,
where $T = \lVert y \rVert^2$ (for example)
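A rough sketch of the second pass, with the configuration held fixed. Several simplifications are assumptions made only to keep the example short: the component covariances are taken to be diagonal, the test statistic is the squared norm over the selected genes, and the per-iteration component parameters are supplied as a list rather than sampled by a real model run. The function name `cluster_pvalues` is hypothetical.

```python
import random

def cluster_pvalues(y, z, selected, param_draws, seed=0):
    """Predictive p-values p_ij for samples, given a fixed configuration.

    y[i]              expression vector of sample i
    z[i]              fixed allocation of sample i to a component
    selected          fixed list of selected gene indices
    param_draws[t][j] (means, variances) over the selected genes for component j
    """
    rng = random.Random(seed)
    # Observed statistic: squared norm of each sample over the selected genes.
    t_obs = [sum(y_i[g] ** 2 for g in selected) for y_i in y]
    counts = [0] * len(y)
    for draw in param_draws:
        t_pred = []
        for means, variances in draw:
            # Assumption: diagonal covariance, so genes are predicted independently.
            y_pred = [rng.gauss(m, v ** 0.5) for m, v in zip(means, variances)]
            t_pred.append(sum(v * v for v in y_pred))
        for i, j in enumerate(z):
            counts[i] += t_pred[j] > t_obs[i]
    return [c / len(param_draws) for c in counts]

# Toy input: two samples in one component, two selected genes out of three.
y = [[0.0, 0.0, 0.0], [100.0, 100.0, 100.0]]
param_draws = [[([0.0, 0.0], [1.0, 1.0])]] * 50
p_ij = cluster_pvalues(y, z=[0, 0], selected=[0, 2], param_draws=param_draws)
```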
30
Conclusions
  • Choice of model distributions can greatly
    influence results of clustering and
    classification
  • For models where information is shared across
    individuals, predictive checks can be used as an
    alternative to cross-validation
  • Should be possible to do this even for quite
    complex models (if you can fit the model, you can
    check it)

31
Acknowledgements
  • Collaborators on BBSRC Exploiting Genomics
    Grant: Natalia Bochkina, Clare Marshall, Peter
    Green
  • Meeting on model checking in
    Cambridge: David Spiegelhalter, Shaun Seaman
  • BBSRC Exploiting Genomics Grant
  • Paper and software at
    http://www.bgx.org.uk/