Phylogenetic Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Phylogenetic Analysis

Description:

Phylogenetics Get related sequences of interest Perform multiple sequence alignments Edit alignment Estimate phylogenetic relationships Interpret results correctly ... – PowerPoint PPT presentation

Slides: 86
Provided by: user230

Transcript and Presenter's Notes

1
Phylogenetic Analysis
YTSLLLSRQ-
YASLLW-RQA
PASIILSRQA
GRSIVLTRQM
2
Phylogenetics
What do I need to do?
Get related sequences of interest
Perform multiple sequence alignments
Edit alignment
Estimate phylogenetic relationships
Interpret results correctly
3
Phylogenetics
Get related sequences of interest
Perform multiple sequence alignments
Edit alignment
Estimate phylogenetic relationships
Interpret results correctly
4
So you have a sequence... now what?
MKILLLCIIFLYYVNAFKNTQKDGVSLQILKKKRSNQVNFLNRKNDYNLI
KNKNPSSSLKSTFDDIKKIISKQLSVEEDKIQMNSNFTKDLGADSLDLVE
LIMALEEKFNVTISDQDALKINTVQDAIDYIEKNNKQ
5
1. What is it?
Does the source organism have its own genome database?
Yes: BLAST at the genome database (GeneDB, PlasmoDB, etc.)
Unknown/No: BLAST at PubMed
6
Why start with genome-specific database?
Genome location/structure
Strain variability
BLAST
Expression data
Pathway data
7
PubMed BLAST
8
Blastp
PubMed BLAST
9
Protein families Conserved Domains
10
(No Transcript)
11
BLAST Hits
12
Downloading sequences FASTA format
13
Getting sequences FASTA format
14
Saving and editing FASTA files
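As a rough sketch of what saving and editing a FASTA file involves, the Python below parses FASTA text into (header, sequence) pairs and writes it back with wrapped lines. The function names `parse_fasta` and `format_fasta` and the header text are assumptions made for illustration; the sequence is the example from slide 4.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical header; the sequence is the one shown on slide 4.
example = """>query_protein hypothetical header
MKILLLCIIFLYYVNAFKNTQKDGVSLQILKKKRSNQVNFLNRKNDYNLI
KNKNPSSSLKSTFDDIKKIISKQLSVEEDKIQMNSNFTKDLGADSLDLVE
LIMALEEKFNVTISDQDALKINTVQDAIDYIEKNNKQ
"""
records = parse_fasta(example)
print(records[0][0], len(records[0][1]))
```

Round-tripping through `format_fasta` and `parse_fasta` should give back the same records, which is a quick check after hand-editing a file.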
15
Phylogenetics
Get related sequences of interest
Perform multiple sequence alignments
Edit alignment
Estimate phylogenetic relationships
Interpret results correctly
16
Pair-wise sequence alignment
Smith-Waterman
17
Aligning 2 sequences globally
[Figure: dynamic programming score matrix for a global alignment, with the first row and column filled with cumulative gap penalties of -4 per position (-4, -8, ..., -36)]
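The matrix fill for a global alignment can be sketched in Python as follows. This is a minimal Needleman-Wunsch illustration, not the presenter's own code: the linear gap penalty of -4 matches the steps along the border of the slide's matrix, but the match and mismatch scores (+4 and -2) are assumptions standing in for whatever substitution matrix the slide used. The two input sequences are the first two rows of the slide 1 alignment with gaps removed.

```python
def needleman_wunsch(a, b, match=4, mismatch=-2, gap=-4):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1] (assumed toy scoring).
        return match if a[i - 1] == b[j - 1] else mismatch

    # Border cells hold cumulative gap penalties: 0, -4, -8, ...
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Each inner cell is the best of: diagonal (align), up or left (gap).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row_a, row_b = needleman_wunsch("YTSLLLSRQ", "YASLLWRQA")
print(best)
print(row_a)
print(row_b)
```

When several alignments tie for the best score, the traceback returns only one of them (it prefers the diagonal move), so other equally good alignments may exist.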
18
Multiple sequence alignment
Progressive
Align 2 closest sequences
Add in next closest sequence
Continue adding.
Highly dependent on the initial matches.
19
Multiple sequence alignment
Iterative
Initial MSA Score (low)
Optimize MSA score
Probabilistic methods don't always generate the same answer
20
Multiple sequence alignment programs
(MSA alignment type, pair-wise alignment type: programs)
progressive, global: ClustalX, T-Coffee
progressive, local: POA
iterative, global: HMMs, GAs
iterative, local: Dialign
21
Multiple Sequence Alignments
POAVIZ progressive local
CLUSTAL progressive global
22
Multiple Sequence Alignments
POAVIZ progressive local
CLUSTAL progressive global
23
POAVIZ
24
POAVIZ
25
POAVIZ
26
Multiple Sequence Alignments
POAVIZ progressive local
CLUSTAL progressive global
27
CLUSTALX
Parameters
28
CLUSTALX
29
CLUSTALX: Protein Weight Matrices
  • 1) BLOSUM (Henikoff). These matrices appear to be
    the best available for carrying out database
    similarity (homology) searches.
  • 2) PAM (Dayhoff). These have been extremely
    widely used since the late '70s.
  • 3) GONNET. These matrices were derived using
    almost the same procedure as the Dayhoff one
    (above) but are much more up to date and are
    based on a far larger dataset.

30
BLOSUM (BLOck SUbstitution Matrix)
BLOSUM62: Within conserved blocks, cluster sequences
that are at least 62% identical, then count the
substitutions actually observed between clusters
Pros: Best bet for distantly diverged sequences
31
PAM (point accepted mutation)
Gather the substitution rates for PAM1 (sequences
99% identical, i.e. 1 point mutation per 100 amino acids)
Higher PAM matrices are extrapolated by assuming that
those substitution rates are consistent over time
Pros: Very good for closely related sequences
Cons: Rare mutations under-represented; substitution
rates are not constant over time (both are problems
for phylogenetic estimation)
32
CLUSTALX
33
CLUSTALX - Aligning
34
CLUSTALX - Aligning
35
CLUSTALX Alignment view
36
CLUSTAL vs POAVIZ
(global vs local)
POAVIZ
CLUSTAL
37
Phylogenetics
Get related sequences of interest
Perform multiple sequence alignments
Edit alignment
Estimate phylogenetic relationships
Interpret results correctly
38
BioEdit Alignment manipulation
Open the .aln file
39
BioEdit Alignment manipulation
Back colored view gives more contrast
Select Edit from the mode dropdown
40
BioEdit Alignment manipulation
Select Insert so that you don't accidentally
lose part of your sequence.
Then select the unaligned beginning (or end)
sequence and delete it.
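The same kind of end trimming can be sketched in Python. This is only an illustration of the idea, not what BioEdit does internally: the function name `trim_alignment_ends` is hypothetical, and the rule used here (drop leading and trailing columns until reaching a column with no gaps in any sequence) is an assumption, since the slide trims by eye. The example rows are the alignment from slide 1.

```python
def trim_alignment_ends(aligned, gap="-"):
    """Trim ragged ends of an alignment given as {name: aligned_sequence}.

    Columns are dropped from both ends until a column with no gap
    character in any sequence is reached. Internal gaps are kept.
    """
    if not aligned:
        return {}
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # full[i] is True when column i has a residue in every sequence.
    full = [all(s[i] != gap for s in seqs) for i in range(length)]
    if True not in full:
        return {name: "" for name in aligned}

    start = full.index(True)
    end = length - full[::-1].index(True)
    return {name: s[start:end] for name, s in aligned.items()}


# Alignment rows from slide 1; the row names are made up.
alignment = {
    "seq1": "YTSLLLSRQ-",
    "seq2": "YASLLW-RQA",
    "seq3": "PASIILSRQA",
    "seq4": "GRSIVLTRQM",
}
for name, seq in trim_alignment_ends(alignment).items():
    print(name, seq)
```

Only the ragged last column is removed in this example; the internal gap in the second row is part of the alignment and stays.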
41
BioEdit Alignment manipulation
Now save as a different file .fasta
42
Phylogenetics
Get related sequences of interest
Perform multiple sequence alignments
Edit alignment
Estimate phylogenetic relationships
Interpret results correctly
43
Tree terminology
root
outgroup
common ancestor (node, branch point)
lineage (branch, edge)
branch length
Operational taxonomic units (OTUs, leaves): A, B, C, D, E, F, G
44
monophyletic
paraphyletic
polyphyletic
45
Sequence homology: orthologues and paralogues
Ancestral gene
duplication: genes A and B
Last common ancestor
speciation: Human A, Human B, Rat A, Rat B
orthologues: Human A and Rat A; Human B and Rat B
paralogues: Human A and Human B; Rat A and Rat B
46
Methods of estimating phylogenetic relationships
Character-based: Maximum Parsimony (MP)
Distance-based: Neighbor-Joining (NJ), Minimum Evolution (ME)
Probabilistic: Maximum likelihood (ML), Bayesian inference
47
Methods of estimating phylogenetic relationships
Maximum Parsimony (MP)
48
Methods of estimating phylogenetic relationships
Distance-based
Neighbor-Joining (NJ) Method: The NJ method
involves clustering of neighbor species that are
joined by one node. It does not evaluate all the
possible tree topologies. Not guaranteed to
obtain the optimal tree.
Minimum Evolution (ME) Method: Estimates the total
branch length of each topology exhaustively, then
chooses the topology with the least total branch
length. Time intensive for large numbers of
taxa.
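The NJ clustering loop can be sketched in Python. This is a minimal illustration assuming the standard Saitou-Nei procedure: at each step it joins the pair of nodes with the smallest Q value, replaces them with a new internal node, and repeats until three nodes remain. The function name `neighbor_joining`, the taxon labels, and the distance matrix are made up for the example, and labels are assumed to be unique.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Build an unrooted NJ tree from a symmetric distance matrix.

    Returns the tree as a Newick string. Requires at least 3 taxa.
    """
    if len(labels) < 3:
        raise ValueError("need at least 3 taxa")
    nodes = list(labels)
    # d maps an unordered pair of active nodes to their distance.
    d = {frozenset((labels[i], labels[j])): matrix[i][j]
         for i, j in combinations(range(len(labels)), 2)}

    while len(nodes) > 3:
        n = len(nodes)
        # r[a] is the sum of distances from a to every other active node.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
             for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dab = d[frozenset((a, b))]
        len_a = dab / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dab - len_a
        new = f"({a}:{len_a:.3f},{b}:{len_b:.3f})"
        rest = [c for c in nodes if c not in (a, b)]
        # Distance from the new internal node to each remaining node.
        for c in rest:
            d[frozenset((new, c))] = (d[frozenset((a, c))]
                                      + d[frozenset((b, c))] - dab) / 2
        nodes = rest + [new]

    # Connect the last three nodes to a single central node.
    a, b, c = nodes
    dab, dac, dbc = (d[frozenset((a, b))], d[frozenset((a, c))],
                     d[frozenset((b, c))])
    return (f"({a}:{(dab + dac - dbc) / 2:.3f},"
            f"{b}:{(dab + dbc - dac) / 2:.3f},"
            f"{c}:{(dac + dbc - dab) / 2:.3f});")


# Made-up distances between five hypothetical taxa.
taxa = ["a", "b", "c", "d", "e"]
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(taxa, distances)
print(tree)
```

Each step only compares pairs of currently active nodes, which is why the method is fast but, as the slide notes, never evaluates all possible topologies.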
49
Methods of estimating phylogenetic relationships
Probabilistic methods: Maximum likelihood (ML)
Prob(data | model, tree)
More likely topology found
Search all possible topologies to optimize
probability
50
Bayesian inference
Prior information
Model for selection
need both for everyone in the class
51
Methods of estimating phylogenetic relationships
Character: Maximum Parsimony (MP)
Distance: Neighbor-Joining (NJ), Minimum Evolution (ME)
Probabilistic: Maximum likelihood (ML), Bayesian inference
52
Estimating Phylogenetic Relationships
MEGA
MrBayes
53
Estimating Phylogenetic Relationships
MEGA
MrBayes
54
MEGA: Molecular Evolutionary Genetics Analysis
First we have to get a MEGA formatted file made.
Select All Files from the Files of Type dropdown menu.
Then choose the .aln file you just made.
55
MEGA: making a MEGA formatted file
MEGA recognizes that you didn't enter a MEGA
formatted file. Click OK.
Now click on the Convert to MEGA format button
at the top left hand side of the screen.
56
MEGA making a MEGA formatted file
Make sure that the file is the right one and that
the formatting is correct. Click OK.
Now we have to make sure that the file looks good
before starting any analysis
57
MEGA: making a MEGA formatted file
  • Make sure all sequences are the same length
  • Remove all traces of the consensus marks

When the file looks good, save it and close both
text formatter windows. Now try activating the
data file again, this time with the .meg file
you just made.
58
MEGA: input a MEGA formatted file
Make sure that the correct sequence type is
selected. Make sure that the correct characters
are selected for missing data and gaps.
59
MEGA: input a MEGA formatted file
You should now see the sequence data
explorer. Minimize this window and you can begin
analyzing your data.
60
MEGA: choose an algorithm
From the phylogeny window you can choose an
appropriate algorithm. In this case we'll use
Minimum Evolution.
61
MEGA: set parameters
There are two major things to think about first:
Model and Rates among Sites. In this example,
I'll use the Poisson model with gamma (γ = 2.0)
rate variation.
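To make the model choice concrete, here is a small Python sketch of the amino acid distances involved. It assumes the standard textbook formulas: the p-distance is the proportion of differing sites, the Poisson correction is d = -ln(1 - p), and the gamma distance with shape parameter a is d = a[(1 - p)^(-1/a) - 1]. The function names are hypothetical, and gapped sites are skipped for each pair, which is one of several possible gap treatments. The two sequences are the first two rows of the slide 1 alignment.

```python
import math


def p_distance(s1, s2, gap="-"):
    """Proportion of differing sites, ignoring sites gapped in either sequence."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(s1, s2) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected distance: all sites evolve at the same rate."""
    return -math.log(1.0 - p)


def gamma_distance(p, a=2.0):
    """Gamma distance: rates vary among sites with gamma shape parameter a."""
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)


p = p_distance("YTSLLLSRQ-", "YASLLW-RQA")
print(p, poisson_distance(p), gamma_distance(p, a=2.0))
```

The gamma distance is never smaller than the Poisson distance for the same p, because allowing rate variation among sites implies more hidden substitutions at the fast sites.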
62
Substitution models (nucleic acid)
Identity
Substitution rates: Equal
Base frequencies (B): Equal (E) or Variable (V)
Transition (si) and/or transversion (sv) frequencies: Equal (E) or Variable (V)
Symmetrical substitution (Sym): G->A = A->G
Kimura 2-parameter: B(E), si(V), sv(V)
Tamura-Nei: B(V), si(V), sv(E)
Kimura 3-parameter: B(V), si(E), sv(V)
General Time Reversible: B(V), Sym
Rate variation across sites
Gamma (Γ) distribution of rate variation among sites
Proportion of Invariable Sites (I)
Γ + I + GTR
63
Substitution models (amino acid)
Sophistication
Mixture models: each site can choose its own substitution model, coupled with maximum likelihood probability estimations or MCMC/Bayesian methods
Site specific residue frequencies: high dimensional model, but requires a large dataset
Poisson: probabilistic substitution rates
mtREV, JTT, PAM: extrapolation of observed substitution rates
Identity: no model
64
MEGA: set parameters
There are two major things to think about first:
Model and Rates among Sites. In this example,
I'll use the Poisson model with gamma (γ = 2.0)
rate variation.
65
MEGA: choose tree test options
Now switch over to the Test of Phylogeny
tab. In order to determine the validity of your
tree you'll need to bootstrap it. Since our
sequence isn't very long, only a couple hundred
replications are needed. Now click the check
button, then click Compute in the main window.
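Bootstrapping resamples alignment columns with replacement, and that resampling step can be sketched in Python. This is an illustration of the general idea rather than MEGA's implementation: the function names are hypothetical, the default of 200 replicates simply mirrors the "couple hundred" mentioned on the slide, and the fixed seed is only there to make the example repeatable. The example rows are the alignment from slide 1.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Return one pseudo-replicate: columns drawn with replacement."""
    names = list(aligned)
    length = len(aligned[names[0]])
    # The same column indices are used for every sequence, so each
    # resampled column is an intact column of the original alignment.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(aligned[name][c] for c in cols) for name in names}


def bootstrap_replicates(aligned, n_replicates=200, seed=0):
    """Generate n_replicates resampled alignments of the original length."""
    rng = random.Random(seed)
    return [bootstrap_alignment(aligned, rng) for _ in range(n_replicates)]


# Alignment rows from slide 1; the row names are made up.
alignment = {
    "seq1": "YTSLLLSRQ-",
    "seq2": "YASLLW-RQA",
    "seq3": "PASIILSRQA",
    "seq4": "GRSIVLTRQM",
}
replicates = bootstrap_replicates(alignment, n_replicates=200)
print(len(replicates), replicates[0]["seq1"])
```

Each replicate would then be run through the same tree-building method, and the bootstrap support for a grouping is the percentage of replicate trees in which that grouping appears.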
66
MEGA: edit your tree
Your tree should appear. Not a very good one in
this case. Why? Because the sequences were too
similar. The icons on the left allow you to
reroot, flip branches, etc. You can also change
the format of the tree. But let's also compute a
condensed tree (select that from the Compute
menu) using a cutoff of 50%.
67
MEGA interpret the tree
Four of the sequences cluster indistinguishably
together, while a single other sequence stands
out. If we look back at our alignments we could
predict this
68
Estimating Phylogenetic Relationships
MEGA
MrBayes
69
MrBayes Making a NEXUS (.nex) file
70
MrBayes Making a NEXUS (.nex) file
71
MrBayes Running MrBayes
72
MrBayes Running MrBayes
73
MrBayes Running MrBayes
74
MrBayes Running MrBayes
75
MrBayes Running MrBayes
76
MrBayes Running MrBayes
77
Phylogenetics
Get related sequences of interest
Perform multiple sequence alignments
Edit alignment
Estimate phylogenetic relationships
Interpret results correctly
78
Phylogenetics
Interpret results correctly
Quality of aligned sequences
One bad egg
Sequence similarity (think goldilocks)
Use an appropriate model
Use an appropriate estimation method
Use appropriate parameters
Try different things and compare results wisely
Determine the validity of each part of your tree
Develop a model to explain your tree
How does it square with known information?
What can you learn from your sequences?
What can't you learn from your analysis?
79
The Intelligent Consumer
(You don't have to completely understand everything in order to use
it properly, but it helps to have a rough idea)
BLAST - stochastic processes - random walks
Sequence alignments - Markov processes - dynamic programming - Viterbi, Forward, and Backward algorithms
Bayesian phylogenetic inference - Bayes' theorem - Bayesian inference - Metropolis algorithm
80
Many uses for multiple sequence analysis
81
Protein family analysis
multiple sequence alignment
profile
profileHMM (hidden Markov model)
Find new proteins with same domains
82
RNA secondary structure prediction
83
Protein secondary structure prediction
84
Protein structure prediction: homology modeling
Protein sequence with known structure
Aligned sequences with unknown structure
85
Comparative genomics