1
Parallel & Distributed Systems and Algorithms for Inference of Large Phylogenetic Trees with Maximum Likelihood

Alexandros Stamatakis
LRR TU München
Contact: stamatak_at_cs.tum.edu

2
Outline

Motivation
Introduction to phylogenetic tree inference
Statistical inference methods
Maximum Likelihood & associated problems
Solutions
2 simple heuristics
parallel & distributed implementation
Results
Conclusion
Availability & Future Work

3
Motivation: Towards a Tree of Life

30,000 organisms available, current trees < 1000

Where we are
4
Motivation: Towards a Tree of Life

30,000 organisms available, current trees < 1000

Where we want to get
5
Phylogenetic Tree Inference

Input: good multiple alignment of a distinguished, highly conserved part of the DNA sequences
Output: unrooted binary tree with the sequences at its leaves (all nodes have degree 1 or 3)
Various methods exist for phylogenetic tree inference
They differ in computational complexity and in the quality of the resulting trees
Most accurate methods: Maximum Likelihood (ML) and Bayesian phylogenetic inference
most sound and flexible methods
other methods not suited for large/complex trees
-- most computationally intensive methods

6
ML and Bayesian methods

T. Williams et al. (March 2003): comparative analysis with simulated data shows MrBayes is the best program
Guindon et al. (May 2003): PHYML, a very fast & accurate ML program for real & simulated data, faster than MrBayes
ML (PHYML, RAxML2)
Significantly faster than MrBayes
Reference/starting trees for Bayesian methods
-- Less powerful statistical model
Bayesian inference (MrBayes)
Powerful statistical model
-- MCMC convergence problem
Memory requirements for 1000/10000-taxon alignments
RAxML: 200MB/750MB
PHYML: 900MB/8.8GB
MrBayes: 1150MB/unknown

7
MCMC Convergence Problem
8
What does ML compute?

Maximum Likelihood calculates
Topologies
Branch lengths v_i
Likelihood of the tree

[Figure: unrooted tree with sequences S1-S5 at the leaves and branch lengths v1-v7]

Goal: Find the tree topology which maximizes the likelihood
Problem I: Number of possible topologies is exponential in n
Problem II: Computation of the likelihood value & branch length optimization is expensive
Solution: Algorithmic optimizations (previous work), new heuristics, HPC
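
To make Problem I concrete, the short sketch below counts the distinct unrooted binary topologies for n labeled sequences using the standard closed form (2n-5)!!. The function name is illustrative only and is not part of RAxML. The count grows even faster than a plain exponential, which is why exhaustive search is ruled out for large n.

```python
def num_unrooted_topologies(n):
    """Number of distinct unrooted binary trees with n labeled leaves: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (the double factorial).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, num_unrooted_topologies(n))
```

Adding the k-th sequence offers 2k-5 branches to attach it to, which is where the product of odd numbers comes from.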
9
New Heuristics for RAxML

Two common methods to build a tree
Progressive addition of organisms, e.g. the stepwise addition algorithm
Use a (random, simple) starting tree containing all organisms and optimize its likelihood by applying topological changes
RAxML (Randomized Axelerated Maximum Likelihood) computes a parsimony starting tree with dnapars -> fast and relatively good initial likelihood
dnapars uses stepwise addition -> randomized sequence input order to obtain distinct starting trees
Optimize the starting tree by applying rearrangements
Accelerate rearrangements with two simple ideas
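
The following is a toy sketch of stepwise addition with a randomized input order, assuming Fitch parsimony as the scoring criterion and nested tuples as the tree representation. It is not dnapars and not RAxML code; `fitch`, `insertions` and `stepwise_addition` are hypothetical helper names chosen for illustration.

```python
import random


def fitch(tree, seqs):
    """Return (per-site state sets, parsimony score) for a nested-tuple tree."""
    if isinstance(tree, str):
        return [{c} for c in seqs[tree]], 0
    (left_sets, left_cost), (right_sets, right_cost) = (
        fitch(tree[0], seqs), fitch(tree[1], seqs))
    sets, cost = [], left_cost + right_cost
    for a, b in zip(left_sets, right_sets):
        common = a & b
        if common:
            sets.append(common)
        else:
            sets.append(a | b)   # no shared state: one substitution needed
            cost += 1
    return sets, cost


def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)          # attach on the branch above this (sub)tree
    if not isinstance(tree, str):
        left, right = tree
        for t in insertions(left, taxon):
            yield (t, right)
        for t in insertions(right, taxon):
            yield (left, t)


def stepwise_addition(seqs, rng):
    """Add taxa one by one in random order, keeping the most parsimonious tree."""
    order = list(seqs)
    rng.shuffle(order)           # randomized input order -> distinct starting trees
    tree = (order[0], order[1])
    for taxon in order[2:]:
        tree = min(insertions(tree, taxon), key=lambda t: fitch(t, seqs)[1])
    return tree
```

Calling `stepwise_addition(seqs, random.Random(seed))` with different seeds changes the insertion order, so larger alignments can yield different starting trees from the same data.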

10
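Subtree Rearrangements
11
Subtree Rearrangements
[Figure: tree with subtrees ST2, ST1, ST3, ST6, ST4, ST5]
12
Subtree Rearrangements
[Figure: tree with subtrees ST2, ST1, ST3, ST6, ST4, ST5; label "1"]
13
Subtree Rearrangements
[Figure: tree with subtrees ST2, ST1, ST3, ST6, ST4, ST5; label "1"]
14
Subtree Rearrangements
[Figure: tree with subtrees ST6, ST2, ST1, ST3, ST4, ST5; label "1"]
15
Subtree Rearrangements
[Figure: tree with subtrees ST6, ST2, ST1, ST3, ST4, ST5; label "1"]
16
Subtree Rearrangements
[Figure: tree with subtrees ST2, ST1, ST3, ST4, ST5, ST6; label "2"]
17
Subtree Rearrangements
[Figure: tree with subtrees ST2, ST1, ST3, ST4, ST5, ST6; label "2"]
18
Subtree Rearrangements
Optimize all branches
[Figure: tree with subtrees ST2, ST1, ST3, ST4, ST5, ST6]
19
Subtree Rearrangements
Need to optimize all branches?
[Figure: tree with subtrees ST2, ST1, ST3, ST4, ST5, ST6]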
20
Idea 1: Local Optimization of Branch Length
[Figure: tree with subtrees ST2, ST1, ST3, ST6, ST4, ST5]
21
Idea 1: Local Optimization of Branch Length
[Figure: tree with subtrees ST2, ST1, ST3, ST6, ST4, ST5]
22
Why is Idea 1 useful?

Local optimization of branch lengths
Update fewer likelihood vectors -> significantly faster
Allows higher rearrangement settings -> better trees
Likelihood depends strongly on topology
Fast exploration of a large number of topologies
Straightforward parallelization
Store the best 20 trees from each rearrangement step
Branch length optimization of the best 20 trees only
Experimental results justify this mechanism
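
The two-stage filtering described above can be sketched as follows: score every candidate topology cheaply, keep the best 20, and run the expensive optimization only on that shortlist. This is a minimal illustration under stated assumptions; `local_score` (fast, locally optimized likelihood) and `global_score` (likelihood after full branch length optimization) are hypothetical callables, not RAxML functions.

```python
import heapq


def rearrangement_step(candidates, local_score, global_score, keep=20):
    """Pick the best topology using a cheap filter followed by an expensive one."""
    # Stage 1: cheap local score for every candidate, keep only the top `keep`.
    shortlist = heapq.nlargest(keep, candidates, key=local_score)
    # Stage 2: expensive full branch length optimization on the shortlist only.
    return max(shortlist, key=global_score)
```

The expensive function is called at most `keep` times per step, regardless of how many topologies the rearrangements generate.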

23
Idea 2: Subsequent Application of Topological Changes
24
Idea 2: Subsequent Application of Topological Changes
[Figure: subtree ST3]
25
Idea 2: Subsequent Application of Topological Changes
[Figure: subtree ST3, shown twice]
26
Idea 2: Subsequent Application of Topological Changes
[Figure: two copies of the tree with subtrees ST2, ST1, ST3, ST3, ST6, ST4, ST5]
27
Why is Idea 2 useful?

During the initial 5-10 rearrangement steps many improved topologies are encountered
Acceleration of likelihood improvement in the initial optimization phase
Enables fast optimization of random starting trees
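
One way to read Idea 2 is that an improved topology is adopted as soon as it is found and the remaining subtrees of the same step are rearranged on the improved tree. The sketch below contrasts that with rearranging every subtree on the tree from the start of the step. It is a generic hill-climbing illustration, assuming hypothetical `moves(tree, subtree)` and `score(tree)` callables, and is not taken from the RAxML source.

```python
def rearrangement_step(tree, subtrees, moves, score, immediate=True):
    """Run one rearrangement step over all subtrees and return (tree, score)."""
    start = tree
    best, best_score = tree, score(tree)
    for subtree in subtrees:
        # Idea 2: continue from the best tree found so far in this step.
        base = best if immediate else start
        for candidate in moves(base, subtree):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
    return best, best_score
```

With `immediate=True` several improvements can accumulate within a single step, which matches the observation that many improved topologies appear during the first few steps.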

28
Remainder of this Talk

Motivation
Introduction to phylogenetic tree inference
Statistical inference methods
Maximum Likelihood & associated problems
Solutions
2 simple heuristics
parallel & distributed implementation
Results
Conclusion
Availability & Future Work

29
Basic Parallel & Distributed Algorithm

Basic idea: Distribute work by subtrees instead of by topologies (as in e.g. parallel fastDNAml)
Simple master-worker architecture
Subsequent application of topological changes introduces non-determinism

[Figure: tree with subtrees ST2, ST1, ST3, ST6, ST4, ST5]
30
Basic Parallel & Distributed Algorithm

[Figure: same tree; the master issues MPI_Send(ST3_ID, tree)]
31
Basic Parallel & Distributed Algorithm

[Figure: same tree; the master issues MPI_Send(ST2_ID, tree) and MPI_Send(ST3_ID, tree)]
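
The master-worker scheme can be sketched as below, assuming Python threads as a stand-in for MPI processes: each `pool.submit` call plays the role of one `MPI_Send(ST_ID, tree)` from the figures. `master` and `rearrange_subtree` are hypothetical names, and the worker function is assumed to return a `(tree, score)` pair.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def master(tree, subtree_ids, rearrange_subtree, n_workers=4):
    """Hand one subtree ID plus the current tree to each worker, keep the best result."""
    best_tree, best_score = tree, None
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # One job per subtree: the analogue of MPI_Send(ST_ID, tree).
        jobs = [pool.submit(rearrange_subtree, st_id, tree) for st_id in subtree_ids]
        # Results arrive in whatever order the workers finish.
        for job in as_completed(jobs):
            cand_tree, cand_score = job.result()
            if best_score is None or cand_score > best_score:
                best_tree, best_score = cand_tree, cand_score
    return best_tree, best_score
```

In this sketch every worker receives the same tree, so the result depends on arrival order only when scores tie. If the master instead adopted each improvement as it arrived and sent the updated tree with the next job, the outcome would depend on worker timing, which is the non-determinism the slide mentions.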
32
Differences between Parallel & Distributed Algorithm

Parallel: best tree list of max(20, workers) maintained and merged at the master
Parallel: Master distributes max(20, workers) trees as topology strings to workers for branch length optimization
Distributed: Each worker maintains a local best list of 20 trees
Distributed: Worker performs fast branch length optimizations locally on all 20 trees -> returns only the best topology to the master

33
Sequential Results

50 distinct simulated 100-taxon alignments
Measured average execution times & topological distance (RF-rate) from the true tree
PHYML: 35.21 seconds, RF-rate 0.0796
MrBayes: 945.32 seconds, RF-rate 0.0741
RAxML: 29.27 seconds, RF-rate 0.0818
9 distinct real alignments containing 101-1000 taxa
Measured execution times & final likelihood values
RAxML yields best-known likelihood for all data sets
RAxML faster than PHYML & MrBayes
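
For readers unfamiliar with the RF-rate, the sketch below computes a Robinson-Foulds distance between two trees and normalizes it. It assumes the common normalization by 2(n-3), the maximum distance for binary trees on n taxa, and a nested-tuple tree representation; the slides do not state which normalization was used, and the function names are illustrative.

```python
def bipartitions(tree, all_taxa):
    """Collect the non-trivial bipartitions (splits) induced by a nested-tuple tree."""
    ref = min(all_taxa)          # fixed reference taxon used to normalize each split
    splits = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        if 1 < len(below) < len(all_taxa) - 1:
            # Store the side of the split that does not contain the reference taxon.
            splits.add(below if ref not in below else all_taxa - below)
        return below

    leaves(tree)
    return splits


def rf_rate(tree_a, tree_b, taxa):
    """Robinson-Foulds distance divided by its maximum 2(n-3) for binary trees."""
    taxa = frozenset(taxa)
    splits_a = bipartitions(tree_a, taxa)
    splits_b = bipartitions(tree_b, taxa)
    return len(splits_a ^ splits_b) / (2 * (len(taxa) - 3))
```

A rate of 0 means the two trees have identical topologies, while 1 means they share no non-trivial split.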

34
Sequential Results Real Data
data       PHYML      secs   MrBayes    secs    RAxML      secs   R > PHY secs  PAXML      hrs
101_SC     -74097.6   153    -77191.5   40527   -73919.3   617    31            -73975.9   47
150_SC     -44298.1   158    -52028.4   49427   -44142.6   390    33            -44146.9   164
150_ARB    -77219.7   313    -77196.7   29383   -77189.7   178    67            -77189.8   300
200_ARB    -104826.5  477    -104856.4  156419  -104742.6  272    99            -104743.3  775
250_ARB    -131560.3  787    -133238.3  158418  -131468.0  1067   249           -131469.0  1947
500_ARB    -253354.2  2235   -263217.8  366496  -252499.4  26124  493           -252588.1  7372
1000_ARB   -402215.0  16594  -459392.4  509148  -400925.3  50729  1893          -402282.1  9898
218_RDPII  -157923.1  403    -158911.6  138453  -157526.0  6774   244           n/a        n/a
500_ZILLA  -22186.8   2400   -22259.0   96557   -21033.9   29916  67            n/a        n/a
35
Sequential Results Real Data
36
Sequential Results Real Data
37
Sequential Results Real Data
38
Sequential Results Real Data
39
Parallel Results Speedup 1000_ARB
40
Distributed Results: First Tests

Platforms
Infiniband cluster: 10 Intel Xeon 2.4 GHz
Sunhalle: 50 Sun workstations for CS students
Alignments
1000_ARB
2025_ARB
Larger trees to come ...
Results
Program executed correctly & terminated
RAxML@home yielded best-known tree for 2025_ARB

41
Biological Results: 1st ML 10,000-taxon tree

Calculated 5 parsimony starting trees & 3-4 initial rearrangement steps sequentially on a Xeon 2.4 GHz
Further rearrangements of those 5 trees in parallel on 32 or 64 Xeon 2.66 GHz at RRZE
Accumulated CPU hours/tree: 3200 hours
Best ln likelihood: -949539, worst: -950026
Problems
Quality assessment? Bootstrap not feasible
Consense crashes for > 5 trees
MrBayes/PHYML crash on 32-bit/4GB
MrBayes crashed on Itanium
Visualization?