Title: Parallel
1Parallel and Distributed Systems and Algorithms for Inference of Large Phylogenetic Trees with Maximum Likelihood
- Alexandros Stamatakis
- LRR TU München
- Contact: stamatak_at_cs.tum.edu
2Outline
- Motivation
- Introduction to phylogenetic tree inference
- Statistical inference methods
- Maximum Likelihood associated problems
- Solutions
- 2 simple heuristics
- parallel and distributed implementation
- Results
- Conclusion
- Availability and Future Work
3Motivation: Towards a Tree of Life
- 30.000 organisms available, current trees < 1000
Where we are
4Motivation: Towards a Tree of Life
- 30.000 organisms available, current trees < 1000
Where we want to get
5Phylogenetic Tree Inference
- Input: a good multiple alignment of a distinguished, highly conserved part of the DNA sequences
- Output: an unrooted binary tree with the sequences at its leaves (all nodes have degree 1 or 3)
- Various methods for phylogenetic tree inference
- They differ in computational complexity and in the quality of the trees
- Most accurate methods: Maximum Likelihood (ML) and Bayesian phylogenetic inference
- most sound and flexible methods
- other methods not suited for large/complex trees
- -- most computationally intensive methods
6ML and Bayesian methods
- T. Williams et al. (March 2003): comparative analysis with simulated data shows MrBayes is the best program
- Guindon et al. (May 2003): PHYML is a very fast and accurate ML program for real and simulated data, faster than MrBayes
- ML (PHYML, RAxML2)
- Significantly faster than MrBayes
- Reference/starting trees for Bayesian methods
- -- Less powerful statistical model
- Bayesian inference (MrBayes)
- Powerful statistical model
- -- MCMC convergence problem
- Memory requirements for 1000/10000-taxon alignments
- RAxML: 200MB/750MB
- PHYML: 900MB/8.8GB
- MrBayes: 1150MB/unknown
7MCMC Convergence Problem
8What does ML compute?
- Maximum Likelihood calculates
- Topologies
- Branch lengths vi
- Likelihood of the tree
[Figure: unrooted tree with the sequences S1 to S5 at its leaves and branch lengths v1 to v7]
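To make "branch lengths" and "likelihood of the tree" concrete, here is a minimal sketch for the smallest possible case: two aligned sequences joined by a single branch. It assumes the Jukes-Cantor substitution model, which the slides do not name, and the function names and example sequences are illustrative only.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, v):
    """Log likelihood of two aligned sequences joined by one branch of length v.

    Assumption: Jukes-Cantor model, i.e. equal base frequencies (1/4 each),
    with v measured in expected substitutions per site.
    """
    decay = math.exp(-4.0 * v / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability that a base is unchanged
    p_diff = 0.25 - 0.25 * decay   # probability of one specific other base
    log_l = 0.0
    for a, b in zip(seq_a, seq_b):
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l

def jc69_ml_branch_length(seq_a, seq_b):
    """Closed-form branch length that maximizes jc69_log_likelihood.

    Only defined when fewer than 3/4 of the sites differ.
    """
    differing = sum(a != b for a, b in zip(seq_a, seq_b))
    p = differing / len(seq_a)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

s1 = "ACGTACGTAC"   # made-up example data
s2 = "ACGTACGTTT"
v_hat = jc69_ml_branch_length(s1, s2)
print(v_hat, jc69_log_likelihood(s1, s2, v_hat))
```

On a full tree the same per-site computation is carried out recursively over all inner nodes, and every branch length v1 to v7 has to be optimized, which is where the cost comes from.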
Goal: Find the tree topology which maximizes the likelihood
Problem I: Number of possible topologies is exponential in n
Problem II: Computation of the likelihood value and branch length optimization is expensive
Solution: Algorithmic optimizations (previous work), new heuristics, and HPC
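As a worked illustration of Problem I: the number of distinct unrooted binary topologies for n labeled leaves is the standard double factorial (2n-5)!! = 3 * 5 * ... * (2n-5). The short sketch below evaluates it; the function name is illustrative.

```python
def num_unrooted_topologies(n):
    """Number of unrooted binary tree topologies for n >= 3 labeled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 50):
    print(n, num_unrooted_topologies(n))
```

Already for 10 leaves there are 2,027,025 topologies, so exhaustive evaluation is out of reach for trees with hundreds or thousands of leaves.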
9New Heuristics for RAxML
- Two common methods to build a tree
- Progressive addition of organisms, e.g. the stepwise addition algorithm
- Use a (random, simple) starting tree containing all organisms and optimize its likelihood by applying topological changes
- RAxML (Randomized Axelerated Maximum Likelihood) computes a parsimony starting tree with dnapars
- -> fast and relatively good initial likelihood
- dnapars uses stepwise addition -> randomized sequence input order to obtain distinct starting trees
- Optimize the starting tree by applying rearrangements
- Accelerate rearrangements by two simple ideas
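The effect of a randomized input order on stepwise addition can be sketched as follows. This is not dnapars: where dnapars would insert each sequence at the branch with the best parsimony score, this toy version picks a random branch as a stand-in, and the function and node names are hypothetical.

```python
import random

def stepwise_addition(taxa, seed):
    """Build an unrooted binary tree (as an edge list) by stepwise addition.

    The input order is shuffled with the given seed, so different seeds
    yield distinct starting trees. Assumption: the insertion branch is
    chosen at random instead of by a parsimony criterion.
    """
    if len(taxa) < 3:
        raise ValueError("need at least 3 taxa")
    rng = random.Random(seed)
    order = list(taxa)
    rng.shuffle(order)                      # randomized sequence input order
    # Start with the first three taxa joined at one inner node.
    edges = [("n0", order[0]), ("n0", order[1]), ("n0", order[2])]
    for i, taxon in enumerate(order[3:], start=1):
        # Split one existing branch and attach the new taxon to it.
        u, v = edges.pop(rng.randrange(len(edges)))
        inner = f"n{i}"
        edges += [(u, inner), (inner, v), (inner, taxon)]
    return order, edges

taxa = [f"S{i}" for i in range(1, 7)]
for seed in (1, 2):
    order, edges = stepwise_addition(taxa, seed)
    print(seed, order, len(edges))
```

Each resulting tree has 2n-3 branches, and running the sketch with several seeds gives several starting trees for the subsequent rearrangement phase.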
10Subtree Rearrangements
[Figure sequence, slides 11-17: tree with subtrees ST1 to ST6; the frames on slides 12-15 are labeled 1, the frames on slides 16-17 are labeled 2]
18Subtree Rearrangements
[Figure: tree with subtrees ST1 to ST6] Optimize all branches
19Subtree Rearrangements
[Figure: tree with subtrees ST1 to ST6] Need to optimize all branches?
20Idea 1: Local Optimization of Branch Length
[Figure, repeated on slide 21: tree with subtrees ST1 to ST6]
22Why is Idea 1 useful?
- Local optimization of branch lengths
- Update fewer likelihood vectors -> significantly faster
- Allows higher rearrangement settings -> better trees
- Likelihood depends strongly on topology
- Fast exploration of a large number of topologies
- Straightforward parallelization
- Store best 20 trees from each rearrangement step
- Branch length optimization of best 20 trees only
- Experimental results justify this mechanism
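The two-stage scheme in the last bullets can be sketched as: score every candidate topology cheaply, keep the best 20, and spend the expensive branch length optimization only on those. The scoring functions below are toy stand-ins (integers play the role of topologies), and all names are hypothetical.

```python
import heapq

def rearrangement_step(candidates, quick_score, full_score, keep=20):
    """Return the best candidate after a cheap pre-selection.

    quick_score: fast score (stand-in for local branch length optimization)
    full_score:  expensive score (stand-in for optimizing all branches)
    """
    shortlist = heapq.nlargest(keep, candidates, key=quick_score)
    return max(shortlist, key=full_score)

full_calls = []          # records how often the expensive score is used

def quick(topology):
    return -abs(topology - 40)      # cheap and slightly inaccurate

def full(topology):
    full_calls.append(topology)
    return -abs(topology - 42)      # expensive and accurate

best = rearrangement_step(range(100), quick, full)
print(best, len(full_calls))
```

In this toy run the expensive score is evaluated for 20 of the 100 candidates only, and the best candidate is still found because the cheap score ranks it inside the shortlist.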
23Idea 2: Subsequent Application of Topological Changes
[Figure sequence, slides 23-26: trees with subtrees ST1 to ST6; subtree ST3 is labeled in every frame]
27Why is Idea 2 useful?
- During the initial 5-10 rearrangement steps many improved topologies are encountered
- Acceleration of likelihood improvement in the initial optimization phase
- Enables fast optimization of random starting trees
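Why applying improvements immediately speeds up the initial phase can be seen in a toy search problem. In the sketch below a "topology" is a permutation, a "rearrangement" swaps two neighbors, and the score (minus the number of inversions) stands in for the log likelihood; none of this is RAxML code, and all names are hypothetical.

```python
def score(perm):
    """Toy stand-in for the log likelihood: minus the number of inversions."""
    return -sum(a > b for i, a in enumerate(perm) for b in perm[i + 1:])

def swapped(perm, i):
    """Return a copy of perm with positions i and i+1 exchanged."""
    new = list(perm)
    new[i], new[i + 1] = new[i + 1], new[i]
    return new

def step_batch(perm):
    """Evaluate all moves on the unchanged start, apply only the best one."""
    best = max((swapped(perm, i) for i in range(len(perm) - 1)), key=score)
    return best if score(best) > score(perm) else list(perm)

def step_subsequent(perm):
    """Apply every improving move at once and continue from the new state."""
    current = list(perm)
    for i in range(len(current) - 1):
        candidate = swapped(current, i)
        if score(candidate) > score(current):
            current = candidate
    return current

start = [5, 4, 3, 2, 1, 0]
print(score(start), score(step_batch(start)), score(step_subsequent(start)))
```

One pass of the batch variant improves the score from -15 to -14, while one pass with subsequent application reaches -10, which mirrors the faster early improvement claimed on the slide.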
28Remainder of this Talk
- Motivation
- Introduction to phylogenetic tree inference
- Statistical inference methods
- Maximum Likelihood associated problems
- Solutions
- 2 simple heuristics
- parallel and distributed implementation
- Results
- Conclusion
- Availability and Future Work
29Basic Parallel and Distributed Algorithm
- Basic idea: distribute work by subtrees instead of topologies (e.g. parallel fastDNAml)
- Simple master-worker architecture
- Subsequent application of topological changes introduces non-determinism
[Figure: tree with subtrees ST1 to ST6]
30Basic Parallel and Distributed Algorithm
[Figure: same tree, annotated with MPI_Send(ST3_ID, tree)]
31Basic Parallel and Distributed Algorithm
[Figure: same tree, annotated with MPI_Send(ST2_ID, tree) and MPI_Send(ST3_ID, tree)]
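The master-worker idea of handing out one subtree per job can be sketched without MPI. The example below uses a thread pool from the Python standard library as a stand-in for the MPI workers, and a toy scoring function instead of a likelihood; every name in it is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def toy_score(move):
    """Hypothetical stand-in for the likelihood of a rearranged tree."""
    subtree_id, branch = move
    return -((subtree_id * 7 + branch * 3) % 11)

def worker(subtree_id, branches):
    """Worker job: try every insertion branch for one subtree, report the best."""
    return max((toy_score((subtree_id, b)), (subtree_id, b)) for b in branches)

def master(subtree_ids, branches, workers=4):
    """Master: submit one job per subtree, then merge the workers' answers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(worker, st, branches) for st in subtree_ids]
        results = [job.result() for job in jobs]
    return max(results)

best_score, best_move = master(subtree_ids=range(1, 7), branches=range(9))
print(best_score, best_move)
```

This sketch waits for all jobs before merging, so its result is deterministic. If the master instead applied each improved topology as soon as a worker reported it, the outcome would depend on the arrival order of the results, which is the non-determinism mentioned on the slide.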
32Differences between the Parallel and Distributed Algorithms
- Parallel: a best-tree list of max(20, workers) trees is maintained and merged at the master
- Parallel: the master distributes the max(20, workers) trees as topology strings to the workers for branch length optimization
- Distributed: each worker maintains a local best list of 20 trees
- Distributed: the worker performs fast branch length optimizations locally on all 20 trees -> returns only the best topology to the master
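The bookkeeping difference between the two variants can be sketched in a few lines. Trees are represented as (score, topology string) pairs with made-up values; the function names are hypothetical and no actual branch length optimization is performed.

```python
import heapq

def best_list_size(workers, minimum=20):
    """Length of the master's best-tree list in the parallel version."""
    return max(minimum, workers)

def merge_at_master(worker_results, workers):
    """Parallel version: merge the (score, topology) pairs of all workers."""
    everything = [pair for result in worker_results for pair in result]
    return heapq.nlargest(best_list_size(workers), everything)

def report_from_worker(local_best_list):
    """Distributed version: a worker returns only its single best pair."""
    return max(local_best_list)

# Made-up results: 4 workers with 10 scored topology strings each.
worker_results = [
    [(-100.0 - w - 0.1 * i, f"tree_w{w}_{i}") for i in range(10)]
    for w in range(4)
]
merged = merge_at_master(worker_results, workers=4)
print(len(merged), merged[0], report_from_worker(worker_results[2]))
```

Sending back one topology instead of a whole list keeps the message volume per worker small, which presumably matters more in the loosely coupled distributed setting.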
33Sequential Results
- 50 distinct simulated 100-taxon alignments
- Measured average execution times and topological distance (RF-rate) from the true tree
- PHYML: 35.21 seconds, RF-rate 0.0796
- MrBayes: 945.32 seconds, RF-rate 0.0741
- RAxML: 29.27 seconds, RF-rate 0.0818
- 9 distinct real alignments containing 101-1000 taxa
- Measured execution times and final likelihood values
- RAxML yields the best-known likelihood for all data sets
- RAxML faster than PHYML and MrBayes
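A topological distance of this kind can be computed as sketched below, assuming that RF-rate denotes the Robinson-Foulds distance divided by its maximum 2(n-3) for two unrooted binary trees with n leaves (the slides do not spell out the normalization). Trees are written as nested tuples of leaf names; all names are illustrative.

```python
def splits(tree):
    """Non-trivial bipartitions of a tree given as nested tuples of leaf names."""
    clades = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        clades.add(below)
        return below

    everything = leaves_below(tree)
    reference = min(everything)
    result = set()
    for clade in clades:
        # Represent each bipartition by the side without the reference leaf.
        side = everything - clade if reference in clade else clade
        if 2 <= len(side) <= len(everything) - 2:
            result.add(side)
    return result, len(everything)

def rf_rate(tree_a, tree_b):
    """Robinson-Foulds distance normalized by 2(n-3); needs n >= 4 leaves."""
    splits_a, n = splits(tree_a)
    splits_b, _ = splits(tree_b)
    return len(splits_a ^ splits_b) / (2 * (n - 3))

true_tree = (("A", "B"), ("C", ("D", "E")))
inferred = (("A", "C"), ("B", ("D", "E")))
print(rf_rate(true_tree, true_tree), rf_rate(true_tree, inferred))
```

Under this assumed definition, a rate of 0.08 for a 100-taxon tree means that about 8% of the inner branches disagree with the true tree.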
34Sequential Results Real Data
data       PHYML      secs   MrBayes    secs    RAxML      secs   R > PHY secs  PAXML      hrs
101_SC     -74097.6   153    -77191.5   40527   -73919.3   617    31            -73975.9   47
150_SC     -44298.1   158    -52028.4   49427   -44142.6   390    33            -44146.9   164
150_ARB    -77219.7   313    -77196.7   29383   -77189.7   178    67            -77189.8   300
200_ARB    -104826.5  477    -104856.4  156419  -104742.6  272    99            -104743.3  775
250_ARB    -131560.3  787    -133238.3  158418  -131468.0  1067   249           -131469.0  1947
500_ARB    -253354.2  2235   -263217.8  366496  -252499.4  26124  493           -252588.1  7372
1000_ARB   -402215.0  16594  -459392.4  509148  -400925.3  50729  1893          -402282.1  9898
218_RDPII  -157923.1  403    -158911.6  138453  -157526.0  6774   244           n/a        n/a
500_ZILLA  -22186.8   2400   -22259.0   96557   -21033.9   29916  67            n/a        n/a
35Sequential Results Real Data
36Sequential Results Real Data
37Sequential Results Real Data
38Sequential Results Real Data
39Parallel Results Speedup 1000_ARB
40Distributed Results: First Tests
- Platforms
- Infiniband cluster: 10 Intel Xeon 2.4 GHz
- Sunhalle: 50 Sun workstations for CS students
- Alignments
- 1000_ARB
- 2025_ARB
- Larger trees to come
- Results
- Program executed correctly and terminated
- RAxML@home yielded the best-known tree for 2025_ARB
41Biological Results: 1st ML 10.000-taxon tree
- Calculated 5 parsimony starting trees and 3-4 initial rearrangement steps sequentially on a Xeon 2.4GHz
- Further rearrangements of those 5 trees in parallel on 32 or 64 Xeon 2.66GHz at RRZE
- Accumulated CPU hours/tree: 3200 hours
- Best ln likelihood: -949539, worst: -950026
- Problems
- Quality assessment? Bootstrap not feasible
- Consense crashes for > 5 trees
- MrBayes/PHYML crash on 32-bit/4GB
- MrBayes crashed on Itanium
- Visualization?
43Conclusion
- RAxML not able to handle protein data
- RAxML not able to perform model parameter optimization
- BUT
- RAxML easy to parallelize/distribute
- Accurate and fast for large trees
- Significantly lower memory requirements than MrBayes/PHYML
- Conclusion: implement model parameter optimization and protein data in RAxML
44Availability and Future Work
- Further development and distribution of RAxML@home
- Big production runs with RAxML@home
- Survey: ML supertrees vs. integral trees
- Alignment split-up methods for ML supertrees
- RAxML implementation on GPUs
- RAxML2 download, benchmark, code: wwwbode.in.tum.de/stamatak
- RAxML@home development: www.sourceforge.com/projects/axml