Title: Microarray Statistics


1
DETECTING RECOMBINATION IN THE FELSENSTEIN ZONE
Alexander V. Mantzaris, Wolfgang Lehrach, Dirk Husmeier
2
Inference of a phylogenetic tree
3
Nucleotide substitution model
4
Maximum likelihood tree
5
Probabilistic phylogenetic model
6
Steps of recombination
7
Steps of recombination
8
Steps of recombination
9
Steps of recombination
10
Steps of recombination
11
Steps of recombination
12
Sequence alignment changes from recombination
13
Sequence alignment changes from recombination
14
Multiple change point model (MCP)
  • In Suchard 2003, the MCP model uses Reversible Jump
    MCMC (RJMCMC) to place breakpoints that model
    changes in the topology, the rate of evolution and
    the substitution parameters

Legend: s = topology, r = rate, t = substitution parameters
15
Dual multiple change point model (dual MCP)
  • Decoupling the topology from the substitution
    parameters removes the prior assumption that
    changes in both must coincide (Minin 2005)

Legend: s = topology, r = rate, t = substitution parameters
16
Analytic integration of branch lengths
  • Following Suchard 2003, both the MCP and dual MCP
    models integrate the branch lengths out of the
    model analytically
  • This approximation assumes that the branch lengths
    are independent, which makes the model tractable
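To make "integrating the branch lengths out" concrete, here is a minimal sketch for the simplest possible case: two sequences joined by a single branch of length t under the Jukes-Cantor (JC69) model, with an exponential prior on t. The JC69 model, the exponential prior and the name `jc69_marginal_likelihood` are assumptions for illustration, not the scheme used by Suchard 2003. With a single branch the integral can be done exactly by expanding the likelihood as a polynomial in exp(-4t/3).

```python
import math


def jc69_marginal_likelihood(n_sites, n_diff, rate):
    """P(alignment) for two sequences joined by one branch of length t,
    with t ~ Exponential(rate) integrated out analytically (JC69 model).

    n_sites: alignment length; n_diff: sites where the sequences differ.
    """
    n_same = n_sites - n_diff
    total = 0.0
    # Per site: P(same | t) = 1/4 + 3/4 e and P(one specific other base | t)
    # = 1/4 - 1/4 e, with e = exp(-4t/3). Expanding both powers with the
    # binomial theorem gives a sum of terms coeff * exp(-4 * p * t / 3).
    for i in range(n_same + 1):
        a = math.comb(n_same, i) * 0.75 ** i * 0.25 ** (n_same - i)
        for j in range(n_diff + 1):
            b = math.comb(n_diff, j) * (-0.25) ** j * 0.25 ** (n_diff - j)
            p = i + j
            # E[exp(-c t)] for t ~ Exponential(rate) equals rate / (rate + c)
            total += a * b * rate / (rate + 4.0 * p / 3.0)
    # Uniform stationary frequency (1/4) for the base at one end of the branch
    return total * 0.25 ** n_sites


# Usage: 10 sites, 3 differences, prior mean branch length 1 / rate = 0.5
print(jc69_marginal_likelihood(10, 3, rate=2.0))
```

The alternating signs in the expansion lose floating-point precision for long alignments, so this closed form is only meant to illustrate the idea on short toy data.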

17
Introduction of the prior into the model
18
Felsenstein parameters
  • Branch lengths fall into two groups, d2 and d3;
    all branches within a group share the same length

19
Data
  • Sequence alignment data produced synthetically
  • Generated with seq-gen (Rambaut, A. and Grassly, N. C., 1996)
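The alignments in the talk come from seq-gen. The sketch below is not seq-gen: it is a hypothetical pure-Python stand-in that simulates JC69 evolution on a four-taxon tree whose branches take one of the two lengths d2 and d3. The tree layout (A and C on short d2 branches, B and D on long d3 branches, and an internal d2 branch separating AB from CD) and the function names are assumptions chosen to mimic a Felsenstein-zone tree.

```python
import math
import random

BASES = "ACGT"


def evolve(base, t, rng):
    """Evolve one nucleotide along a branch of length t under JC69."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    if rng.random() < p_same:
        return base
    # Otherwise pick one of the three other bases uniformly
    return rng.choice([b for b in BASES if b != base])


def simulate_felsenstein(d2, d3, n_sites, seed=0):
    """Simulate an alignment on the unrooted tree AB|CD (hypothetical layout):
    A and C on short branches (d2), B and D on long branches (d3),
    internal branch of length d2."""
    rng = random.Random(seed)
    cols = {"A": [], "B": [], "C": [], "D": []}
    for _ in range(n_sites):
        u = rng.choice(BASES)  # node joining A and B, used as the root
        v = evolve(u, d2, rng)  # node joining C and D
        cols["A"].append(evolve(u, d2, rng))
        cols["B"].append(evolve(u, d3, rng))
        cols["C"].append(evolve(v, d2, rng))
        cols["D"].append(evolve(v, d3, rng))
    return {taxon: "".join(chars) for taxon, chars in cols.items()}


aln = simulate_felsenstein(d2=0.15, d3=0.85, n_sites=20, seed=1)
for taxon, seq in aln.items():
    print(taxon, seq)
```

Drawing the root base uniformly matches the JC69 stationary distribution, and because JC69 is reversible the choice of root node does not change the distribution of the leaf sequences.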

20
Dualbrothers
d2 = 0.25, d3 = 0.25
Posterior probabilities per topology vs. position
21
Dualbrothers
d2 = 0.45, d3 = 0.65
Posterior probabilities per topology vs. position
22
Dualbrothers long branch attraction
  • d2 = 0.05, d3 = 0.85

Posterior probabilities per topology vs. position
23
Dualbrothers long branch attraction
  • d2 = 0.15, d3 = 0.85

Posterior probabilities per topology vs. position
24
Phylo-Factorial Hidden Markov model
  • Two independent hidden Markov chains, one for the
    rate and one for the topology
  • Husmeier, Bioinformatics 2005
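The key structural point of the phylo-FHMM is that the topology chain and the rate chain evolve independently along the alignment, so their joint transition probability factorises. A minimal sketch of that prior over hidden states, assuming a simple "stay with probability stay_prob, otherwise jump uniformly" transition model; the transition form, the number of rate states and the stay probabilities are hypothetical, and emissions (the likelihood of each column) are left out. Four taxa admit three unrooted topologies, hence `n_top=3`.

```python
import random


def chain_transition(prev, cur, n_states, stay_prob):
    """Transition probability of one hidden chain: stay with probability
    stay_prob, otherwise move uniformly to one of the other states."""
    if n_states == 1:
        return 1.0
    return stay_prob if prev == cur else (1.0 - stay_prob) / (n_states - 1)


def joint_transition(prev, cur, n_top, n_rate, stay_top, stay_rate):
    """prev and cur are (topology, rate) pairs. Because the two chains are
    independent, the joint transition probability is a product."""
    return (chain_transition(prev[0], cur[0], n_top, stay_top)
            * chain_transition(prev[1], cur[1], n_rate, stay_rate))


def sample_chain(n_sites, n_states, stay_prob, rng):
    """Sample a state path along the alignment from one hidden chain."""
    path = [rng.randrange(n_states)]
    for _ in range(n_sites - 1):
        prev = path[-1]
        if n_states == 1 or rng.random() < stay_prob:
            path.append(prev)
        else:
            path.append(rng.choice([s for s in range(n_states) if s != prev]))
    return path


def sample_fhmm_states(n_sites, n_top=3, n_rate=2, stay_top=0.99, stay_rate=0.99, seed=0):
    """Sample the two chains independently and pair their states per site."""
    rng = random.Random(seed)
    topologies = sample_chain(n_sites, n_top, stay_top, rng)
    rates = sample_chain(n_sites, n_rate, stay_rate, rng)
    return list(zip(topologies, rates))


states = sample_fhmm_states(200, seed=4)
print(states[:10])
```

Keeping the chains separate means a change in rate does not force a change in topology (and vice versa), which mirrors the decoupling motivation of the dual MCP model.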

25
Sampling in phylo-FHMM
26
FHMM
  • d2 = 0.25, d3 = 0.25

Posterior probabilities per topology vs. position
27
FHMM
d2 = 0.45, d3 = 0.65
Posterior probabilities per topology vs. position
28
FHMM long branch attraction
  • d2 = 0.05, d3 = 0.85

Posterior probabilities per topology vs. position
29
FHMM long branch attraction
  • d2 = 0.15, d3 = 0.85

Posterior probabilities per topology vs. position
30
Long branch attraction
  • Long branches can make an inference method
    converge consistently, as data accumulate, to the
    wrong tree: the most parsimonious one, which
    groups the long branches together
  • The result is parsimony-like behaviour in a
    likelihood method

31
Felsenstein zone
  • Joseph Felsenstein, Systematic Zoology 1978

32
Recpars on synthetic data set
  • Most parsimonious tree, assuming no
    recombination

High score of correct topology: white; low score of
correct topology: black
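Recpars itself is not reproduced here. As an illustration of the kind of score involved, this sketch applies Fitch's small-parsimony algorithm to a four-taxon alignment and scores each of the three unrooted quartet topologies; the function names and the toy alignment are hypothetical. In the Felsenstein zone, a score of this kind tends to favour the topology that pairs the two long branches.

```python
QUARTETS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch_site(x1, x2, y1, y2):
    """Fitch parsimony cost of one site on the quartet ((x1,x2),(y1,y2))."""
    cost = 0

    def join(left, right):
        nonlocal cost
        common = left & right
        if common:
            return common
        cost += 1  # one substitution needed below this node
        return left | right

    join(join({x1}, {x2}), join({y1}, {y2}))
    return cost


def parsimony_scores(aln):
    """Total parsimony score of each of the three unrooted quartet topologies."""
    n = len(aln["A"])
    return {
        name: sum(fitch_site(aln[a][i], aln[b][i], aln[c][i], aln[d][i]) for i in range(n))
        for name, ((a, b), (c, d)) in QUARTETS.items()
    }


aln = {"A": "ACGTA", "B": "ACGTT", "C": "ACCTA", "D": "ACCTT"}
scores = parsimony_scores(aln)
print(scores, "most parsimonious:", min(scores, key=scores.get))
```

Rooting each quartet on its internal branch is harmless: Fitch's score is the same for every rooting of an unrooted tree.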
33
Model integrating out the branch lengths
  • The data depend on the topology and the rate;
    the rate distribution has its own hyperparameter

Model 1
34
Removing the branch length independence assumption
  • The model samples the branch lengths instead of
    integrating them out analytically

Model 2
35
Contrast to model without analytic integration
  • This new model includes the branch lengths
    explicitly

Model 1
Model 2
36
Intra-model exploration: Annealed Importance
Sampling (AIS)
Objective: P(D|M)
Inter-model exploration: simultaneous MCMC
sampling of model topologies and parameters
37
Annealed importance sampling
  • Radford Neal 1998, based on importance sampling
    with annealing transitions

The first sample is drawn from the prior
[Figure panels: prior, posterior]
38
Annealed importance sampling
Each annealing transition updates the importance
weights.
The weights allow the marginal likelihood to be
computed relative to the sampling distribution
(here, the prior).
The estimate is the ratio of the posterior
normalisation factor, P(D|M), to the prior
normalisation factor.
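A minimal sketch of annealed importance sampling (Neal 1998) on a toy problem where the answer is known in closed form: one parameter theta with a standard normal prior and a single Gaussian observation y with known noise sigma. The tempered distributions are prior * likelihood^beta along a linear schedule from beta = 0 (prior) to beta = 1 (posterior); each transition multiplies the weight by likelihood^(beta_k - beta_{k-1}) and then applies Metropolis steps that leave the new tempered distribution invariant. The toy model, the linear schedule, the random-walk Metropolis kernel and all names are assumptions; the defaults (200 samples, 5 transitions, 5 MCMC steps) only echo the settings quoted in the talk.

```python
import math
import random


def log_prior(theta):
    """Standard normal prior N(0, 1)."""
    return -0.5 * theta ** 2 - 0.5 * math.log(2.0 * math.pi)


def log_lik(theta, y, sigma):
    """Gaussian likelihood of a single observation y ~ N(theta, sigma^2)."""
    return -0.5 * ((y - theta) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def ais_log_marginal_likelihood(y, sigma, n_samples=200, n_temps=5, n_mcmc=5, step=0.5, seed=0):
    """Estimate log P(y) by annealing from the prior (beta=0) to the posterior (beta=1)."""
    rng = random.Random(seed)
    betas = [k / n_temps for k in range(n_temps + 1)]
    log_weights = []
    for _ in range(n_samples):
        theta = rng.gauss(0.0, 1.0)  # first sample is drawn from the prior
        log_w = 0.0
        for k in range(1, n_temps + 1):
            # Weight update: ratio of tempered densities at the current point
            log_w += (betas[k] - betas[k - 1]) * log_lik(theta, y, sigma)
            # Random-walk Metropolis steps that leave prior * lik^beta_k invariant
            for _ in range(n_mcmc):
                prop = theta + rng.gauss(0.0, step)
                log_a = (log_prior(prop) + betas[k] * log_lik(prop, y, sigma)
                         - log_prior(theta) - betas[k] * log_lik(theta, y, sigma))
                if rng.random() < math.exp(min(0.0, log_a)):
                    theta = prop
        log_weights.append(log_w)
    # Log of the average weight, computed stably (log-sum-exp)
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights) / n_samples)


y, sigma = 1.0, 0.5
# Exact answer for this conjugate toy: y ~ N(0, 1 + sigma^2)
exact = -0.5 * y ** 2 / (1 + sigma ** 2) - 0.5 * math.log(2 * math.pi * (1 + sigma ** 2))
print(ais_log_marginal_likelihood(y, sigma), exact)
```

Because the prior is normalised, the average weight estimates P(y) directly; with an unnormalised starting distribution it would estimate a ratio of normalising constants instead.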
39
AIS analytic integration
  • 200 samples, 5 MCMC steps, 5 transitions

Model 1
High posterior probability of correct topology: white;
low posterior probability of correct topology: black
40
Annealed Importance Sampler
  • Posterior probability of the correct topology,
    averaged over 8 independent simulations

Model 2
High posterior probability of correct topology: white;
low posterior probability of correct topology: black
41
Model 1/Model 2 comparison with AIS
Model 2
Model 1
Model 2
42
Remarks on Annealed importance sampling
  • Sporadic wrong inferences deep in the Felsenstein
    zone are assumed to be due to the sampler
    occasionally missing the narrow high-density
    regions, as noted in Neal 1998
  • Longer simulations alleviate this weakness, but
    they make the method unsuitable for large batch
    jobs
  • The method works particularly well where the
    high-density region is broader

43
Intra-model exploration: Annealed Importance
Sampling (AIS)
Objective: P(D|M)
Inter-model exploration: simultaneous MCMC
sampling of model topologies and parameters
44
MCMC procedure without branch lengths
  • The peeling algorithm is used to calculate the
    probability of the data
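The peeling (pruning) algorithm computes, for each node, the probability of the data below it conditional on each possible nucleotide at that node, working from the leaves to the root. A minimal sketch under JC69 with uniform base frequencies; the nested-tuple tree encoding, the example branch lengths and the function names are assumptions for illustration.

```python
import math

BASES = "ACGT"


def jc69_matrix(t):
    """4x4 JC69 transition probabilities for a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial_likelihood(node, site):
    """Conditional likelihood vector L[x] = P(data below node | node state x).

    A leaf is a taxon name; an internal node is a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        P = jc69_matrix(t)
        child_vec = partial_likelihood(child, site)
        for x in range(4):
            # Sum over the unknown state of the child node
            result[x] *= sum(P[x][y] * child_vec[y] for y in range(4))
    return result


def log_likelihood(aln, tree):
    """Sum of per-site log-likelihoods, with uniform base frequencies at the root."""
    n_sites = len(next(iter(aln.values())))
    total = 0.0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in aln.items()}
        root = partial_likelihood(tree, site)
        total += math.log(sum(0.25 * v for v in root))
    return total


# Unrooted quartet AB|CD, rooted at the node joining A and B (hypothetical lengths)
quartet = (("A", 0.15), ("B", 0.85), ((("C", 0.15), ("D", 0.85)), 0.15))
aln = {"A": "ACGT", "B": "ACGA", "C": "ACTT", "D": "ACTA"}
print(log_likelihood(aln, quartet))
```

Each site costs time linear in the number of nodes, instead of summing explicitly over all joint assignments of nucleotides to the internal nodes.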

45
Region of long branch attraction
Model 1
  • Long-branch attraction occurs where the theory
    predicts it

High posterior probability of correct topology: white;
low posterior probability of correct topology: black
46
MCMC procedure including branch lengths
  • The peeling algorithm is used to calculate the
    probability of the data

47
MCMC procedure including branch lengths
  • Symmetric proposal distributions centred on the
    current value; a Cauchy distribution is used
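The slide does not state the target or the tuning, so as an assumption this sketch samples a single branch length t for a two-taxon JC69 alignment (n_sites sites, n_diff of them differing) with an Exponential(prior_rate) prior. Cauchy draws use the inverse CDF, scale * tan(pi * (u - 0.5)). Because the proposal is symmetric and centred on the current value, the Hastings correction cancels and the acceptance test uses only the posterior ratio; proposals with t <= 0 have zero posterior density and are rejected. The heavy Cauchy tails occasionally produce large jumps, which is one plausible reason to prefer it over a Gaussian random walk. All names and defaults are hypothetical.

```python
import math
import random


def log_post(t, n_sites, n_diff, prior_rate):
    """Unnormalised log posterior of branch length t (two-taxon JC69, Exp prior)."""
    if t <= 0.0:
        return -math.inf  # branch lengths must be positive
    e = math.exp(-4.0 * t / 3.0)
    p_diff = 0.25 - 0.25 * e  # probability of one specific different base
    if n_diff > 0 and p_diff <= 0.0:
        return -math.inf
    lp = (n_sites - n_diff) * math.log(0.25 + 0.75 * e) - prior_rate * t
    if n_diff > 0:
        lp += n_diff * math.log(p_diff)
    return lp


def cauchy_mh(n_sites, n_diff, prior_rate=1.0, scale=0.1, n_iter=20000, t0=0.5, seed=1):
    """Random-walk Metropolis-Hastings with Cauchy proposals; returns all samples."""
    rng = random.Random(seed)
    t, lp = t0, log_post(t0, n_sites, n_diff, prior_rate)
    samples = []
    for _ in range(n_iter):
        # Symmetric Cauchy proposal centred on the current value
        prop = t + scale * math.tan(math.pi * (rng.random() - 0.5))
        lp_prop = log_post(prop, n_sites, n_diff, prior_rate)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            t, lp = prop, lp_prop
        samples.append(t)
    return samples


draws = cauchy_mh(100, 20)[2000:]  # discard burn-in
print(sum(draws) / len(draws))
```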

48
Convergence Diagnostics
  • Gelman-Rubin tests used as a heuristic check
    for convergence
  • Reaching a potential scale reduction (PSR) factor
    below 1.3 required 15K burn-in iterations and 5K
    sampling iterations
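The Gelman-Rubin check compares the variance between several independent chains with the variance within them; values near 1 suggest the chains are sampling the same distribution. The talk does not say which variant was used, so this sketch implements the basic potential scale reduction factor without the degrees-of-freedom correction or chain splitting; the name `psrf` and the demo chains are assumptions.

```python
import math
import random


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor (basic form, no
    degrees-of-freedom correction) for m >= 2 chains of equal length n."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W
    B = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * W + B / n
    return math.sqrt(var_plus / W)


rng = random.Random(0)
chains = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(4)]
print(psrf(chains) < 1.3)
```

In practice the statistic is computed per monitored quantity (for example a rate or a branch length), and burn-in is extended until every one falls below the chosen threshold.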

49
Avoiding the Felsenstein zone
Model 2
  • Metropolis-Hastings sampling with Cauchy proposals
    escapes long-branch attraction

High posterior probability of correct topology: white;
low posterior probability of correct topology: black
50
Integration of branch length sampling into phylo-FHMM
51
Contrast to model without analytic integration
  • This new model includes the branch lengths
    explicitly

Model 1
Model 2
52
Felsenstein parameters
  • Branch lengths fall into two groups, d2 and d3;
    all branches within a group share the same length

53
FHMM long branch attraction
  • d2 = 0.15, d3 = 0.85

Posterior probabilities per topology vs. position
54
Integration of branch length sampling into phylo-FHMM
  • The previously incorrect inference for d2 = 0.15,
    d3 = 0.85 is corrected

Posterior probabilities per topology vs. position
55
Mosaic sequence alignment, Model 2
  • Flanking 500 bp: d2 = 0.25, d3 = 0.25
  • Centre: d2 = 0.15, d3 = 0.85

Model 2
Model 1
56
Conclusion
  • Long-branch attraction is evident in the
    Felsenstein zone for models that assume
    independent branch lengths, as in Suchard 2003
  • The more complex model, which samples the branch
    lengths, avoids these false positives
  • Future work: application to real biological DNA
    sequence alignments

57
Appendix
58
Reliability of sampling scheme
  • For problems outside the Felsenstein zone, the
    scheme is reliable
  • Within the Felsenstein zone, where the region of
    high density is narrow, lack of convergence can
    occasionally lead to wrong inferences
  • A sufficient number of iterations is needed in
    all cases