Title: Untangling Molecular Evolution
 1Untangling Molecular Evolution
- Andrew Meade 
- A.Meade_at_Reading.ac.uk
2Molecular Data
- Human Genome 
- Finished 2003 
- 13 years, finished 2 years ahead of schedule. 
- Budget $3 billion, cost $2.7 billion 
- 483 completely sequenced genomes (2006) 
- X Prize: 100 human genomes in 10 days for $10 million. 
3Molecular Data  Pancreatic Ribonuclease 
 4What is a phylogeny?
- Representation of evolution 
- Inferred from data via a model. 
- Data is normally a genetic element (such as a 
 gene), taken from a number of species.
- Allows us to infer the past processes of 
 evolution without observing them.
 6Uses of phylogeny
- Spread of diseases, H5N1, HIV. 
- Protein-Protein Interaction 
- Predicting changes in Protein structure. 
- Information about molecular evolution 
7Human Influenza (Flu) Virus
1997
10
1984 
 9The Chicken And The Egg
80 million years
Amniotic Egg 330 million years 
 10The true tree is unknown
-  Data is only available for living species 
-  Evolution has been going on for a long time (4 
 billion years)
-  Evolution is very complex
11There are lots of trees
Number of Possible Phylogenetic Trees
Species   Number of Trees
50        27529213532835651545259729751524430639300973035816196098326553772152587890625 (about 2.75 x 10^76)
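The 77-digit figure for 50 species is consistent with the standard count of rooted, bifurcating, labelled trees, (2n - 3)!!. A minimal sketch of that calculation follows; the function name `rooted_tree_count` is illustrative and not taken from the talk.

```python
def rooted_tree_count(n_species: int) -> int:
    """Count rooted, bifurcating, labelled trees: (2n - 3)!! = 3 * 5 * ... * (2n - 3)."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n < 3).
    for k in range(3, 2 * n_species - 2, 2):
        count *= k
    return count


for n in (3, 4, 5, 10, 50):
    print(n, rooted_tree_count(n))
```

Python integers have arbitrary precision, so the 50-species count is computed exactly rather than as a floating-point approximation.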
 12MCMC
- A sample of trees is used. 
- Trees are sampled in proportion to their 
 probability.
- Not looking for the single best / most probable tree. 
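The idea of sampling in proportion to probability can be illustrated with a toy Metropolis sampler. This is a sketch under strong assumptions: four abstract states with made-up weights stand in for trees, and the proposal picks any state uniformly. Real phylogenetic MCMC proposes local changes to topology, branch lengths and model parameters; none of the names below come from the talk.

```python
import random
from collections import Counter


def metropolis_sample(weights, n_steps, seed=0):
    """Visit state indices in proportion to `weights` (unnormalised probabilities)."""
    rng = random.Random(seed)
    current = rng.randrange(len(weights))
    visits = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: pick any state uniformly at random.
        proposal = rng.randrange(len(weights))
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < weights[proposal] / weights[current]:
            current = proposal
        visits[current] += 1
    return visits


# Hypothetical unnormalised probabilities for four "trees".
weights = [1.0, 2.0, 3.0, 4.0]
visits = metropolis_sample(weights, n_steps=200_000)
total = sum(visits.values())
freqs = [visits[i] / total for i in range(len(weights))]
print([round(f, 3) for f in freqs])
```

The visit frequencies approach 0.1, 0.2, 0.3 and 0.4. The chain spends time on every state according to its weight instead of settling on the single most probable one.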
13Where
- The likelihood is the probability of the sequence given Tree i
- t is a vector of branch lengths
- m is a vector of model parameters
- p(t) is the prior probability of t
- p(m) is the prior probability of m 
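The terms listed here match the usual form of the posterior probability of a tree in Bayesian phylogenetics. The expression below is that standard form, written out as an assumption and not recovered from the slide. The symbols D (the sequence data) and p(T_i) (the prior probability of tree i) are introduced here for illustration.

```latex
p(T_i \mid D) =
  \frac{\displaystyle \iint P(D \mid T_i, t, m)\, p(T_i)\, p(t)\, p(m)\, dt\, dm}
       {\displaystyle \sum_{j} \iint P(D \mid T_j, t, m)\, p(T_j)\, p(t)\, p(m)\, dt\, dm}
```

The sum in the denominator runs over all possible trees, which is why the posterior is sampled with MCMC and not evaluated directly.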
 14MCMC properties 
- Guaranteed to sample all trees in the search 
 space, but only as time goes to infinity.
- Guaranteed to sample trees in proportion to their 
 probability, but only at convergence. 
 15MCMC Sampling 
 16Plot of log likelihood against iteration: an initial burn-in phase, 
followed by convergence (sampling from the stationary distribution) 
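In practice the burn-in portion of the chain is thrown away before the sample is summarised. A minimal sketch, assuming the chain is held as a Python list and the burn-in is given as a fraction of its length. The default of 10% is an arbitrary illustration, not a value from the talk.

```python
def discard_burn_in(samples, burn_in_fraction=0.1):
    """Return the chain with its first `burn_in_fraction` of samples removed."""
    if not 0.0 <= burn_in_fraction < 1.0:
        raise ValueError("burn_in_fraction must be in [0, 1)")
    start = int(len(samples) * burn_in_fraction)
    return samples[start:]


# Hypothetical log-likelihood trace: a steep climb, then a plateau.
trace = [-900.0, -400.0, -150.0, -101.0, -100.5, -100.2, -100.4, -100.3]
kept = discard_burn_in(trace, burn_in_fraction=0.25)
print(kept)
```

The right fraction depends on the run and is usually chosen by inspecting the trace.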
 17Posterior distribution of likelihoods  
 19Computational Time 
 20Parallel algorithm
Node 1
Node 3
Node 2 
 21Algorithm Scaling
1 Processor  130 Days 60 Processors  4 Days 
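These two timings imply a speedup and a parallel efficiency, worked through below. The helper name is illustrative. Efficiency here means speedup divided by processor count, where 100% would be perfect linear scaling.

```python
def speedup_and_efficiency(serial_time, parallel_time, n_processors):
    """Return (speedup, efficiency) for a run moved from 1 to n processors."""
    speedup = serial_time / parallel_time
    efficiency = speedup / n_processors
    return speedup, efficiency


# Figures from the slide: 130 days on 1 processor, 4 days on 60.
speedup, efficiency = speedup_and_efficiency(130, 4, 60)
print(f"speedup = {speedup:.1f}x, efficiency = {efficiency:.0%}")
# prints "speedup = 32.5x, efficiency = 54%"
```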
 22Estimating dinosaur genome properties
ln Genome size (pg)
ln Osteocyte cell size (µm³) 
 24The effect of speciation on molecular evolution
each speciation event makes some contribution to 
path length
path length accumulates as a function of time 
 25How many data sets show evidence of a 
punctuational effect?
35 of the 100 data sets showed significant 
punctuational effects
significantly more common in plants and fungi 
than animals
10,000 molecules studied 
 27Protein Networks 
Genes in the human genome
Year   Estimated number of genes
1999   100,000
2002   65,000-75,000
2007   20,000-25,000 (19,599 protein-coding genes confirmed) 
 28Eukaryote protein-interaction network
animals
yeast protein-interaction network (MIPS)
fungal pathogens
yeast 
 29Changes in Gene networks
yeast
fungal pathogens
animals
retained link
acquired link 
 30Areas of computer science interest
-  Search / Optimisation 
-  Distributed computation / parallelisation 
-  Visualisation / user interfaces 
-  Data mining
31Acknowledgments
- Mark Pagel, Chris Venditti and Daniel Barker - 
 Computational Biology
- Vassil Alexandrov, Christian Weihrauch and Ashish 
 Thandavan - ACET
- Chris Organ, Andrew Shedlock, Scott Edwards - 
 Harvard University
32Convergence of a Markov chain sampling 
phylogenetic trees of n = 500 tips using 
an alignment of n = 4400 nucleotides
Plot of log-likelihood against iteration number
NB: 99% of the increase in likelihood occurs in the first 2.8% 
of the run; 0.07% change in the final 2 million 
iterations
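A statistic of this kind can be read off a log-likelihood trace with a few lines of code. The sketch below assumes the trace is a list of values that ends higher than it starts. The function name and the example trace are made up for illustration and are not the data behind the slide.

```python
def fraction_of_run_to_reach(log_likelihoods, share=0.99):
    """Fraction of the run elapsed when `share` of the total likelihood gain is first reached."""
    start, end = log_likelihoods[0], log_likelihoods[-1]
    target = start + share * (end - start)
    for i, value in enumerate(log_likelihoods):
        if value >= target:
            return (i + 1) / len(log_likelihoods)
    return 1.0


# Hypothetical trace: most of the gain arrives in the first few samples.
trace = [-1000.0, -500.0, -100.0, -5.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(fraction_of_run_to_reach(trace, share=0.99))
```

A small fraction indicates that the chain climbs quickly and then spends most of the run sampling near the plateau.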