Title: Molecular evidence for endosymbiosis
1. Molecular evidence for endosymbiosis
- Used blastp to investigate sequence similarity among the domains of life
- Found that yeast nuclear genes show greater sequence similarity (i.e., are closer in evolutionary time) to archaeal genes
- Found that yeast mitochondrial genes show greater sequence similarity to eubacterial genes
2. t-test and significance
- A t-test determines whether two data sets come from the same population or differ significantly
- Calculate the mean and standard deviation of each data set, then derive a weighted (pooled) standard deviation to use in the t-test
- Compare the resulting t statistic with the critical t value obtained from a t-table or software
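A minimal sketch of the two-sample (Student's) t-test described above, assuming that the "weighted standard deviation" is the pooled standard deviation, in which each sample's variance is weighted by its degrees of freedom. The function name `pooled_t_test` and the example data are hypothetical.

```python
import math
from statistics import mean, stdev


def pooled_t_test(x, y):
    """Student's two-sample t statistic using a pooled (weighted) standard deviation.

    Assumes independent samples with roughly equal variances.
    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(x), len(y)
    s1, s2 = stdev(x), stdev(y)  # sample standard deviations (n - 1 denominator)
    # Weight each sample variance by its degrees of freedom to get the pooled SD.
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = (mean(x) - mean(y)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

For the samples [1, 2, 3, 4, 5] and [2, 3, 4, 5, 6] this gives t = -1.0 with 8 degrees of freedom. Since |t| is below the two-sided 5% critical value of about 2.306 for 8 degrees of freedom, the difference would not be called significant. SciPy's `scipy.stats.ttest_ind(x, y)` computes the same statistic (with its default `equal_var=True`) and also returns a p-value.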
3. Origins of eukaryotic cells
4. Martin-Müller hypothesis
5. Evidence from phylogenetic relationships
6. Leprae vs. tuberculosis
- The M. leprae genome (3.2 Mb) is about 50% coding, compared with 4.4 Mb and about 91% coding for M. tuberculosis
- Genomes compared using MUMmer
- http://www.tigr.org/tigr-scripts/CMR2/webmum/mumplot
7. How MUMmer works
- Uses a suffix tree to build an internal representation of a genome sequence
- Identifies maximal unique matches (MUMs): version 2.0 streams sequence 2 against the suffix tree of sequence 1, whereas version 1.0 adds sequence 2 to the suffix tree built for sequence 1
- Aligns the regions between MUMs with Smith-Waterman
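MUMmer finds MUMs with a suffix tree, which is too long to reproduce here. The sketch below is a deliberately naive, brute-force stand-in (not MUMmer's algorithm) that only illustrates what a maximal unique match is: an exact match that cannot be extended to the left or right and whose string occurs exactly once in each sequence. The function names and the `min_len` cutoff are hypothetical.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a: str, b: str, min_len: int = 3):
    """Naive search for maximal unique matches between sequences a and b.

    Returns (start_in_a, start_in_b, length) tuples with 0-based starts.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Left-maximal: skip if the match could be extended one base to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Right-maximal: extend while the two sequences keep agreeing.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            match = a[i:i + length]
            # Unique: the matched string must occur exactly once in each sequence.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums
```

For example, `find_mums("ACGTTGCA", "ACGTAGCA")` returns `[(0, 0, 4), (5, 5, 3)]`. Its running time grows at least with the product of the two sequence lengths, which is why a suffix tree is used for whole genomes.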
8. Origin of species
- Mitochondrial DNA and human evolution
- Evolution of pathogens
9. Phylogeny: data mining by biologists
- Molecular phylogenetics uses clustering techniques to discern relationships among biological sequences
10. Why phylogenetics?
- Understand evolutionary history
- Map pathogen strain diversity for vaccines
- Assist in epidemiology (e.g., the Florida dentist HIV transmission case)
- Aid in predicting the function of novel genes
- Biodiversity
- Microbial ecology
11. Changes can occur
12. Observing differences in nucleotides
- The simplest measure of distance between two sequences is to count the number of sites at which they differ
- If not all sites are equally likely to change, the same site may undergo repeated substitutions
- As time passes, the number of observed differences between two sequences becomes an increasingly poor estimator of the actual number of substitutions that have occurred
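The simplest distance above, the proportion of differing sites (often called the p-distance), can be sketched as follows, assuming two already aligned, gap-free sequences of equal length; the function name is hypothetical.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)
```

`p_distance("ACGTACGTAC", "ACGTTCGTAA")` returns 0.2. Because several substitutions at one site still leave at most one visible difference, this value increasingly underestimates the true number of substitutions as sequences diverge.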
13. The relationship between time and substitutions is non-linear
14. Various models have been developed to estimate distance and evolution more accurately
- All use the following framework:
  - Probability matrix: $p_{AC}$ is the probability that a site starting with an A has a C at the end of time interval $t$, and so on
  - Base composition of the sequence: $f_A$ is the frequency of A, and so on
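As one concrete instance of this framework, the sketch below fills in the probability matrix for the Jukes-Cantor model, assuming each specific substitution occurs at rate $\alpha$ and the base composition is uniform ($f_A = f_C = f_G = f_T = 1/4$). It uses the standard closed forms $p_{ii}(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$ and $p_{ij}(t) = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$ for $i \neq j$; the function name and example rate are hypothetical.

```python
import math

BASES = "ACGT"


def jc_probability_matrix(alpha: float, t: float) -> dict:
    """p[i][j]: probability that a site starting as base i is base j after time t.

    alpha is the rate of each specific substitution (e.g. A->C) per unit time.
    """
    decay = math.exp(-4 * alpha * t)
    same = 0.25 + 0.75 * decay       # p_ii(t): the site ends with its starting base
    different = 0.25 - 0.25 * decay  # p_ij(t): the site ends with one specific other base
    return {i: {j: same if i == j else different for j in BASES} for i in BASES}
```

At $t = 0$ each row is (1, 0, 0, 0); as $t$ grows, every entry tends to 1/4, so the expected fraction of differing sites levels off at 3/4 instead of growing linearly with time.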
15. Jukes-Cantor model
- The distance between any two sequences is given by $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$
- $p$ is the proportion of nucleotides that differ between the two sequences
- All substitutions are equally probable
- Each off-diagonal entry of the substitution matrix is $\alpha$; each diagonal entry is $1 - 3\alpha$
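The Jukes-Cantor correction can be computed directly from $p$. This sketch assumes $p$ has already been measured from an alignment; the function name is hypothetical.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p).

    p is the proportion of differing sites; the formula is undefined for p >= 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)
```

With $p = 0.2$ the corrected distance is about 0.233 substitutions per site, slightly more than the 0.2 observed; the correction grows without bound as $p$ approaches 3/4.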
16. Kimura's two-parameter model
- $d = \frac{1}{2}\ln\frac{1}{1 - 2P - Q} + \frac{1}{4}\ln\frac{1}{1 - 2Q}$
- $P$ and $Q$ are the proportions of sites that differ between the two sequences by transitions and transversions, respectively
- Accounts for transition bias in sequences (transversions are rarer)
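A sketch of the Kimura two-parameter distance that first classifies each difference as a transition (A<->G or C<->T) or a transversion and then applies the formula. It assumes aligned, gap-free sequences containing only A, C, G and T; the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    P, Q = transitions / n, transversions / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))
```

For "ACGTACGTAC" versus "GCGTACGTAA" (one transition and one transversion in 10 sites, so P = Q = 0.1) the distance is about 0.234.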
17. Evolutionary models
18. Implementing models and building trees
19. Rooted vs. unrooted
- Root: the common ancestor of all taxa considered
- Unrooted: shows relationships without consideration of ancestry
- The root is often specified with an outgroup
- Outgroup: a distantly related species (e.g., an archaeal species in a tree of mammals)
20. Tree building
- Get protein/RNA/DNA sequences
- Construct multiple sequence alignment
- Compute pairwise distances (if necessary)
- Build tree topology and distances
- Estimate reliability
- Visualize
21. Distance methods
22. Unweighted pair-group method using arithmetic averages (UPGMA)
- Assumes a constant rate of substitution across lineages (a molecular clock)
- Clustering algorithm: computes the distances between all sequences, merges the closest pair, recalculates the distances from that new node as averages, then merges the next closest pair, and repeats until all sequences are joined
- Gives a rooted tree
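A compact sketch of the UPGMA loop above, assuming a complete set of pairwise distances keyed by `frozenset` pairs of taxon names. Cluster distances are averaged with weights equal to cluster sizes, and each new node is placed at half the merging distance, which is where the constant-rate assumption enters. The function name, data layout and Newick output are illustrative choices, not a specific package's API.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted UPGMA tree and return it as a Newick string.

    names: list of taxon labels.
    dist: dict mapping frozenset({a, b}) -> distance for every pair of labels.
    """
    # Active clusters: label -> (Newick subtree, number of taxa, height of its node).
    clusters = {n: (n, 1, 0.0) for n in names}
    d = dict(dist)
    while len(clusters) > 1:
        # 1. Find the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        # 2. Place the new node at half their distance (molecular-clock assumption).
        height = d[frozenset((a, b))] / 2
        new_label = f"({a},{b})"
        new_tree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        # 3. Distance from the new cluster to each remaining one is the size-weighted average.
        for c in clusters:
            d[frozenset((new_label, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new_label] = (new_tree, size_a + size_b, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"
```

With distances A-B = 2, A-C = B-C = 4 and A-D = B-D = C-D = 6, the sketch returns `(D:3,(C:2,(A:1,B:1):1):1);`. Every tip ends up the same distance from the root, as expected from a clock-like method.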
23. Testing the reliability of trees
- Interior-branch test or bootstrap analysis
- Bootstrap analysis: resample alignment positions with replacement (or delete or replace parts of the sequences), redraw the trees, and count how many times the same branching appears
- Bootstrap values of 70% (or, more stringently, 95%) or greater are normally considered reliable
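A sketch of nonparametric bootstrap support, assuming the alignment is a dict of equal-length aligned sequences. `infer_clades` is a hypothetical placeholder for whatever tree-building method is used (for example UPGMA on distances from the replicate alignment); it must return the set of clades in the inferred tree as frozensets of taxon names. The returned percentage is the value compared against the 70% or 95% thresholds.

```python
import random
from typing import Callable, Dict, FrozenSet, Set

Alignment = Dict[str, str]


def bootstrap_alignment(alignment: Alignment, rng: random.Random) -> Alignment:
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(
    alignment: Alignment,
    infer_clades: Callable[[Alignment], Set[FrozenSet[str]]],
    clade: FrozenSet[str],
    replicates: int = 100,
    seed: int = 0,
) -> float:
    """Percentage of bootstrap replicate trees that contain `clade`."""
    rng = random.Random(seed)  # fixed seed so the support value is reproducible
    hits = sum(
        clade in infer_clades(bootstrap_alignment(alignment, rng))
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates
```

Passing the tree builder in as a function keeps the resampling logic independent of the tree method, so the same loop can score distance-based or character-based trees.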
24. Homework due on 10/6
- Discovery questions in Chapter 2
- 4, 25-27