Title: Multiple Sequence Alignment and Molecular Evolution
1Multiple Sequence Alignment and Molecular
Evolution
2Quiz
- What do BLAST, FASTA, protein motif searching, and multiple sequence alignment have in common?
3Key Concepts
- Appreciate the foundation of sequence alignment: evolutionary theory
- Appreciate the significance of unique and non-unique sequence characters
- Realize potential uses of multiple sequence alignments
- Understand the basics of the most popular multiple sequence alignment method
- Appreciate the importance of accurate multiple alignments and the need for manual editing in some cases
4 18th and 19th centuries: The evolution of a theory
- Earth: erosion, sediment deposition, strata
- Present earth conditions provide keys to the past
5 18th and 19th centuries: The evolution of a theory
- Discoveries of fossils accumulated
- Remains of unknown but still living species that are elsewhere on the planet?
- Cuvier (circa 1800): the deeper the strata, the less similar fossils were to existing species
8Part of Darwin's Theory
- The world is not constant, but changing
- All organisms are derived from common ancestors
by a process of branching.
9Part of Darwin's Theory
- This explained:
- Fossil record
- Similarities of organisms classified together (shared traits inherited from common ancestor)
- Similar species in the same geographic region
10- What is evolution?
- Dynamic changes with selected pressure
- Punctuated equilibrium
- Progressive generational adaptation
- Environmentally imposed mutational success
- E = (mutation + selective pressure) / time
- Staying fit
11Characters
- Heritable changes in features (morphology, DNA sequence, etc.)
- The more similar characters you have, the more related you are
- However, characters can be unique and non-unique
12Evolution and characters
time
13A Unique Character: Hair for Mammals
- Hair evolved only once and is unreversed
- Presence of hair → strong indication that organism is a mammal
14Homoplasy: The formation of tails
- Tails evolved independently in the ancestors of frogs and humans
- Presence of a tail → no useful conclusions
15Unique and non-unique characters
(Figure: character changes labeled Non-unique and Unique; axis: time)
bioinformatics, bioinfortatics, bioinfortatios, oinformatios, informatios, infortation, information
16Unique and non-unique characters
- Example: Sequence analysis of functionally similar transporters
- All share the same deleted sequence region, which is not found in any other transporter examined to date
- Unique character?
- Further investigate for possible functional significance, or use for classification
17Unique and non-unique characters
- Example: Sequence analysis of functionally similar transporters
- All have isoleucine at the third position in the sequence; however, some other transporters have isoleucine there too, while some other transporters have valine at that position
- Non-unique.
- Changes from I → V → I are common (see BLOSUM or PAM matrices). Not a high priority for further analysis of significance and not useful for classification.
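The two transporter examples can be restated as a small check on one alignment column. The sketch below is illustrative only: the sequence names T1 to T6 and the character states are hypothetical, and the function simply asks whether the state shared by a group also occurs outside it.

```python
def classify_character(column, in_group):
    """Classify one alignment column relative to a group of sequences.

    column:   dict mapping sequence name -> character state at this position
    in_group: set of sequence names forming the group of interest
    """
    group_states = {column[name] for name in in_group}
    other_states = {state for name, state in column.items() if name not in in_group}
    if len(group_states) != 1:
        return "variable within group"
    if group_states & other_states:
        return "non-unique"  # the same state also occurs outside the group
    return "unique"          # shared by the whole group and found nowhere else


# Hypothetical data: T1-T3 are the functionally similar transporters.
group = {"T1", "T2", "T3"}
deleted_region = {"T1": "-", "T2": "-", "T3": "-", "T4": "L", "T5": "L", "T6": "M"}
third_position = {"T1": "I", "T2": "I", "T3": "I", "T4": "I", "T5": "V", "T6": "V"}

print(classify_character(deleted_region, group))  # unique
print(classify_character(third_position, group))  # non-unique
```

A "unique" result is only as strong as the sampling: it holds for the transporters examined to date.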
18Classification according to characters: more characters can be good
Chicken most similar to Tofu?
19Classification according to characters
20Classification according to characters: increasing the number of characters
Chicken most similar to Duck?
21Multiple Sequence Alignment: The power of many
Many characters
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--
22Evolution and characters: the importance of comparing characters with common origins (homologous)
bioinformatics, bioinformatics, bioinformatios, oinformatios, informatios, information, information
(Figure axis: time)
23Evolution and characters
- Gaps represent non-homologous positions in the sequence.
- They reflect the occurrence of insertions/deletions or other rearrangements during the evolutionary process.
bioinformatics
bioinformatics
bioinformatios
--oinformatios
---informatios
---information
---information
(Figure axis: time)
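As a rough illustration of how gaps keep homologous positions in register, the sketch below takes the word alignment from this slide and lists the columns that contain a gap. The function names are made up for this example.

```python
word_alignment = [
    "bioinformatics",
    "bioinformatics",
    "bioinformatios",
    "--oinformatios",
    "---informatios",
    "---information",
    "---information",
]


def gapped_columns(rows, gap="-"):
    """Return the 0-based indices of columns that contain at least one gap."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return [i for i, column in enumerate(zip(*rows)) if gap in column]


def ungapped(row, gap="-"):
    """Recover the original (unaligned) sequence from an aligned row."""
    return row.replace(gap, "")


print(gapped_columns(word_alignment))  # [0, 1, 2]
print(ungapped(word_alignment[-1]))    # information
```

Without the leading gaps, "information" would be compared letter by letter against "bioinformat", and no column would hold characters with a common origin.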
24Multiple Sequence Alignment
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--
The sole purpose of multiple sequence alignments
is to place homologous positions of homologous
sequences into the same column.
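Once homologous positions share a column, column-wise questions become simple. The sketch below, written for illustration only, reads the alignment above column by column and reports the positions where every sequence has the same residue.

```python
from collections import Counter

alignment = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWESNG--",
]


def column_profile(rows):
    """Count the residues (and gaps) observed in each alignment column."""
    return [Counter(column) for column in zip(*rows)]


def conserved_columns(rows, gap="-"):
    """Return 0-based indices of gap-free columns holding a single residue."""
    return [
        i for i, counts in enumerate(column_profile(rows))
        if len(counts) == 1 and gap not in counts
    ]


print(conserved_columns(alignment))  # [4, 20]
```

The two fully conserved columns are the cysteine (index 4) and the tryptophan (index 20).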
25On further with Multiple Sequence Alignment
Questions?
26Multiple Sequence Alignment - uses
- Powerful tool
- Detect trends/patterns in homologous sequences (motifs, domains, indels)
- Indels (insertions and deletions) of evolutionary interest, yet not incorporated into some phylogenetic tree algorithms
- ATTYNETCITRTQ -
- SITYNETCVTITQ -
- SVTY-----CIVR -
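Indels such as the one in the third sequence can be located mechanically as runs of gap characters. This is a minimal sketch; the names seq1 to seq3 are placeholders, since the slide does not name the sequences.

```python
import re


def gap_runs(row, gap="-"):
    """Return (start, length) for each run of gap characters in an aligned row."""
    pattern = re.escape(gap) + "+"
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, row)]


aligned = {
    "seq1": "ATTYNETCITRTQ",
    "seq2": "SITYNETCVTITQ",
    "seq3": "SVTY-----CIVR",
}

for name, row in aligned.items():
    print(name, gap_runs(row))
# seq1 []
# seq2 []
# seq3 [(4, 5)]
```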
27- Multiple sequence alignments and phylogenetic analysis
- First step in any phylogenetic analysis
- Phylogenetic analysis is only as good as the alignment: garbage in → garbage out!
28- Multiple alignments: not just sequence
- Insertions and deletions in sequences
29- Automated Analysis or Manual Intervention?
- Automated: more explicit or objective than manual
- Leads to false sense of security
- Aligns residues that are likely similar only by chance
ILPITSPSKEGYESGKAPDEFSSGG
ILPEH--IKDDGELGAAPHSFSTAG
VLPLD-----S--AGRPADSFSAAG
VLPVDR-------DGQARDEYTKVG
VLPVDN-------KGEARDEYTKVG
LLPYDD-------QGRPQDDYSRAG
GIVSRSG---SNFDGEPKDSYGKVG
30- Clustal
- Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680.
31- Clustal: Incorporation of phylogenetic criterion into multiple sequence alignment algorithms
- 1. Pairwise alignments: calculate a distance matrix
- 2. Guide tree constructed
- 3. Sequences progressively aligned according to guide tree hierarchy
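Steps 1 and 2 can be sketched in a few lines. This is a simplified illustration, not Clustal's implementation: it assumes a plain match/mismatch score with a linear gap cost instead of substitution matrices, defines distance as 1 minus the fraction of identical columns, and builds the guide tree by greedy average-linkage clustering (CLUSTAL W itself uses neighbor joining). Step 3, the progressive alignment, is not shown.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap cost (illustrative scores)."""
    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        score[i][0] = i * gap
    for j in range(1, len(b) + 1):
        score[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """Step 1: distance = 1 - fraction of identical columns in the pairwise alignment."""
    x, y = global_align(a, b)
    identical = sum(1 for p, q in zip(x, y) if p == q)
    return 1.0 - identical / len(x)


def guide_tree(seqs):
    """Step 2: greedy average-linkage clustering, returned as nested tuples of names."""
    names = list(seqs)
    dist = {frozenset(pair): distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [(name, [name]) for name in names]  # (subtree, leaf names)

    def average(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical labels A-D for four of the word "sequences" used earlier.
words = {"A": "bioinformatics", "B": "bioinformatios",
         "C": "information", "D": "informatios"}
print(guide_tree(words))  # (('A', 'B'), ('C', 'D'))
```

In a progressive aligner, the nested tuples give the order of step 3: the closest pairs are aligned first, and the resulting sub-alignments are then aligned to each other.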
35Clustal: Incorporating Biology into Sequence Alignment Algorithms
- Matrices varied at different alignment stages according to the divergence of the sequences
- For proteins, gap penalties differ for hydrophilic (water-loving) sequence regions to encourage new gaps in potential loop regions on the protein surface (which is usually exposed to water)
- Gapped positions in early alignments have reduced gap penalties to encourage the opening up of new gaps at these positions
- (gaps not penalized as much at the end of proteins)
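The hydrophilic-region rule can be pictured as a per-position gap opening penalty. The sketch below is an assumption-laden illustration, not Clustal's code: the residue set, the base penalty, the minimum run length, and the reduction factor are all placeholder values chosen for the example.

```python
# Assumed set of hydrophilic residues, for illustration only.
HYDROPHILIC = set("DEGKNQPRS")


def gap_open_penalties(seq, base=10.0, min_run=5, factor=0.5):
    """Return one gap opening penalty per position.

    Positions inside a run of at least `min_run` hydrophilic residues get
    `base * factor`; all other positions keep `base`.
    """
    penalties = [base] * len(seq)
    run_start = None
    for i, residue in enumerate(seq + "#"):  # "#" sentinel closes a trailing run
        if residue in HYDROPHILIC:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                for k in range(run_start, i):
                    penalties[k] = base * factor
            run_start = None
    return penalties


# Made-up sequence: the run SKEGND (positions 3-8) is treated as a likely loop.
print(gap_open_penalties("LLVSKEGNDLLAW"))
# [10.0, 10.0, 10.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 10.0, 10.0, 10.0, 10.0]
```

Lower penalties in such stretches make the aligner more willing to open a gap there than in the hydrophobic core.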
36ClustalX
- Subset of sequences in alignment can be selected and realigned. Useful when trying to align very divergent sequences.
- A range of the sequence alignment can be selected for realignment. Guide tree built based only on the residue range selected.
37Differences between Clustal and BLAST?
- Clustal has full-length (global) alignment
- Gap penalty differences
- Input differences: selection of sequences
- Clustal: vary gap penalties and matrices
- Many to many
- Speed: Clustal slower
- Align pro-pro or nuc-nuc
Similarities?
- Identifies conserved domains
- Use the same matrices
38Algorithms in Molecular Biology http://www.math.tau.ac.il/rshamir/algmb/00/algmb00.html
39ClustalX features
- 'Alignment Quality Score' below the alignment.
40MACAW - a program for semi-manual local multiple
alignment of DNA and protein sequences.
- User delimits the sequences and regions in which to search for blocks, or specifies blocks
- User decides which blocks to keep; the significance of each block is given a statistical value
41Genedoc - for editing and flexible display of
alignments
- view your alignment with different forms of shading that you customize
- edit your alignment (add or remove gaps) or the sequence order or the sequences themselves
- print directly, or export a graphic of your alignment
42Genedoc - for editing and flexible display of
alignments
43- Statistics Report
- 1: residues identical
- 2: residues > zero score (similar residues)
- 3: residues lined up with a gap
           human   rat  rabbit  turtle
human  1    1870    97      96      22
       2       0    98      96      28
       3       0     0       2      61
rat    1    1830  1874      94      22
       2    1846     0      95      28
       3      18     0       2      61
rabbit 1    1818  1793    1863      22
       2    1828  1815       0      28
       3      45    53       0      61
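Two of the three statistics (identical residues and residues lined up with a gap) can be recomputed directly from any pair of aligned rows. The sketch below is not Genedoc's algorithm; it omits the "similar residues" count, which needs a scoring matrix, and it uses two rows from the transporter-style alignment shown earlier as sample input.

```python
def pairwise_stats(row_a, row_b, gap="-"):
    """Count identical and gapped columns for two rows of one alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must come from the same alignment")
    identical = gapped = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            gapped += 1      # residue lined up with a gap
        elif a == b:
            identical += 1   # same residue in both sequences
    return {
        "identical": identical,
        "gapped": gapped,
        "percent_identity": 100 * identical // len(row_a),
    }


print(pairwise_stats("ILPITSPSKEGYESGKAPDEFSSGG",
                     "ILPEH--IKDDGELGAAPHSFSTAG"))
# {'identical': 11, 'gapped': 2, 'percent_identity': 44}
```

Here the percentage is taken over the full alignment length; other tools may use a different denominator.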
44Standard multiple sequence alignment approach
- Be as sure as possible that the sequences included are homologous
- Know as much as possible about the gene/protein in question before trying to create an alignment (secondary structure, domains, etc.)
- Start with an automated alignment, preferably one that utilizes some evolutionary theory, such as Clustal
45- Examine alignment
- Are you confident that aligned residues/bases evolved from a common ancestor?
- Are domains of the proteins, predicted secondary structures, etc. aligning correctly?
- → No? May need to edit sequences and redo
- → Yes? Move on!
- Note indels (insertions and deletions)
- Possible insights into functionally important regions
46- Use in subsequent analyses (identify consensus or other pattern recognition, for HMM construction, phylogenetic analysis, etc.)
- For phylogenetic analysis: remove unreliably aligned regions
ILPITSPSKEGYESGKAPDEFSSGG
ILPEH--IKDDGELGAAPHSFSTAG
VLPLD-----S--AGRPADSFSAAG
VLPVDR-------DGQARDEYT-VG
VLPVDN-------KGEARDEYT-VG
LLPYDD-------QGRPQDDYSRAG
GIVSRSG---SNFDGEPKDSYGKVG
Delete?
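One simple, mechanical stand-in for "remove unreliably aligned regions" is to drop every column whose gap fraction exceeds a threshold. This is only a sketch under that assumption; gap content is a crude proxy for reliability, and the threshold name `max_gap_fraction` is invented here. Manual inspection may still be needed.

```python
def trim_gappy_columns(rows, max_gap_fraction=0.0, gap="-"):
    """Keep only columns whose fraction of gaps is <= max_gap_fraction."""
    keep = [
        i for i, column in enumerate(zip(*rows))
        if column.count(gap) / len(column) <= max_gap_fraction
    ]
    return ["".join(row[i] for i in keep) for row in rows]


alignment = [
    "ILPITSPSKEGYESGKAPDEFSSGG",
    "ILPEH--IKDDGELGAAPHSFSTAG",
    "VLPLD-----S--AGRPADSFSAAG",
    "VLPVDR-------DGQARDEYT-VG",
    "VLPVDN-------KGEARDEYT-VG",
    "LLPYDD-------QGRPQDDYSRAG",
    "GIVSRSG---SNFDGEPKDSYGKVG",
]

# With the default threshold, any column containing a gap is removed.
for row in trim_gappy_columns(alignment):
    print(row)
```

With the default threshold the first row becomes ILPITSGKAPDEFSGG: the block of columns around the large indel and the single gapped column near the end are removed.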
47- If aligning DNA sequences for phylogenetic analysis, may remove every third codon position
MET GLY SER GLY    ATG GGA AGT GGA
MET GLY SER GLY    ATG GGG AGC GGG
MET ARG CYS ARG    ATG AGG TGC AGG
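Removing third codon positions is a one-line filter once the sequence is known to be in frame. The sketch below applies it to the three DNA sequences on this slide; it assumes each sequence starts at the first base of a codon and contains no gaps.

```python
def drop_third_positions(dna):
    """Remove every third base (codon position 3) from an in-frame sequence."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(base for i, base in enumerate(dna) if i % 3 != 2)


sequences = ["ATGGGAAGTGGA", "ATGGGGAGCGGG", "ATGAGGTGCAGG"]
for seq in sequences:
    print(seq, "->", drop_third_positions(seq))
# ATGGGAAGTGGA -> ATGGAGGG
# ATGGGGAGCGGG -> ATGGAGGG
# ATGAGGTGCAGG -> ATAGTGAG
```

The first two sequences encode the same peptide (Met-Gly-Ser-Gly) and differ only at third positions, so they become identical after filtering, while the third sequence still differs.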
48Key Concepts
- Appreciate the foundation of sequence alignment: evolutionary theory
- Appreciate the significance of unique and non-unique sequence characters
- Realize potential uses of multiple sequence alignments
- Understand the basics of the most popular multiple sequence alignment method
- Appreciate the importance of accurate multiple alignments and the need for manual editing in some cases