Title: An Introduction to Multiple Sequence Alignments
1An Introduction to Multiple Sequence Alignments
Cédric Notredame
3Mangel M., Samaniego F.J., Abraham Wald's Work
on Aircraft Survivability, J. American
Statistical Association, 79, 259-270 (1984)
4Our Scope
How Can I Use My Alignment?
How Does The Computer Align The Sequences?
How Can I Assemble a Multiple Alignment?
What are the Difficulties?
5Outline
-Why Do We Need Multiple Sequence Alignment ?
-The progressive Alignment Algorithm
-A possible Strategy
-Potential Difficulties
6Pre-requisite
-How Do Sequences Evolve?
-How can We COMPARE Sequences ?
-How can We ALIGN Sequences ?
7Why Do We Need Multiple Sequence Alignment ?
8Sometimes Two Sequences Are Not Enough
9What is A Multiple Sequence Alignment?
12How Can I Use A Multiple Sequence Alignment?
BUT Conserved where it MATTERS
14How Can I Use A Multiple Sequence Alignment?
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
Extrapolation
Prosite Patterns
15How Can I Use A Multiple Sequence Alignment?
Extrapolation
P-K-R-[PA]-x(1)-[ST]
Prosite Patterns
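As a rough illustration, a PROSITE-style pattern such as P-K-R-[PA]-x(1)-[ST] can be translated into a regular expression and scanned against the ungapped sequences. The helper name prosite_to_regex is hypothetical, and the translation only covers the small subset of the pattern syntax used here.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a small subset of PROSITE pattern syntax into a Python regex.

    Supported: single residues, [ABC] alternatives, {ABC} exclusions,
    the x wildcard, and (n) or (n,m) repeat counts. Anchors are not handled.
    """
    element_re = re.compile(
        r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?"
    )
    parts = []
    for element in pattern.split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

# Ungapped versions of the four example sequences (gaps removed by hand).
sequences = {
    "chite": "ADKPKRPLSAYMLWLNSARESIKRENPDFKVTEVAKKGGELWRGLKD",
    "wheat": "DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",
    "trybr": "KKDSNAPKRAMTSFMFFSSDFRSKHSDLSIVEMSKAAGAAWKELGP",
    "mouse": "KPKRPRSAYNIYVSESFQEAKDDSAQGKLKLVNEAWKNLSP",
}

regex = re.compile(prosite_to_regex("P-K-R-[PA]-x(1)-[ST]"))
hits = {name: regex.search(seq) for name, seq in sequences.items()}
for name, hit in hits.items():
    print(name, hit.start() if hit else None, hit.group() if hit else None)
```

A pattern is all-or-nothing: a single unexpected residue in a homologue makes the match fail, which is one motivation for the profiles discussed next.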
16How Can I Use A Multiple Sequence Alignment?
Extrapolation
Prosite Patterns
SwissProt
Uncharacterised Signature
Match?
17How Can I Use A Multiple Sequence Alignment?
Extrapolation
Prosite Patterns
Profiles And HMMs
-More Sensitive
-More Specific
18A PROSITE PROFILE
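A profile can be thought of as a per-column table of residue scores derived from the alignment. The sketch below builds a simple log-odds table from a short ungapped block of the example alignment (columns 6 to 14), assuming a uniform background frequency of 1/20 and a pseudocount of 1. These choices are illustrative assumptions; real PROSITE profiles use more elaborate weighting and gap handling.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(rows, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all rows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)          # assumed uniform background
    total = len(rows) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in rows)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, segment):
    """Sum the column scores of a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))

def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (best score, position)."""
    width = len(profile)
    return max(
        (score(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Columns 6-14 of the chite / wheat / trybr / mouse alignment (no gaps).
block = ["KPKRPLSAY", "KPKRAPSAF", "APKRAMTSF", "KPKRPRSAY"]
profile = build_profile(block)
print(best_hit(profile, "MMMKPKRPLSAYWWW"))
```

Unlike a pattern, the profile still gives a graded score to a segment that deviates at one or two positions.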
19How Can I Use A Multiple Sequence Alignment?
Extrapolation
Motifs/Patterns
Profiles
Phylogeny (tree of chite, wheat, trybr, mouse)
-Evolution
-Paralogy/Orthology
20How Can I Use A Multiple Sequence Alignment?
Extrapolation
Motifs/Patterns
Profiles
Phylogeny
Struc. Prediction
21How Can I Use A Multiple Sequence Alignment?
Extrapolation
Motifs/Patterns
Profiles
Phylogeny
Struc. Prediction
-PsiPred or PHD for secondary structure prediction: about 75% accurate.
-Threading is improving but is not yet as good.
22How Can I Use A Multiple Sequence Alignment?
24Why Is It Difficult To Compute A multiple
Sequence Alignment?
A CROSSROAD PROBLEM
25Why Is It Difficult To Compute A multiple
Sequence Alignment ?
BIOLOGY
COMPUTATION
CIRCULAR PROBLEM: Good Sequences <-> Good Alignment
26The Biological Problem.
Same as PairWise Alignment Problem
We do NOT know how Sequences Evolve.
We do NOT understand the Relation Between
Structures and Sequences.
We would NOT recognize the Correct Alignment if
we had it IN FRONT of our eyes
27The Biological Problem. The Charlie Chaplin
Paradox
28The Biological Problem. How to Evaluate an
Alignment
-Substitution Matrix (Blosum)
-An Evaluation Function
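One common form of evaluation function is the sum-of-pairs score: every pair of rows is scored column by column and the results are added up. The sketch below uses a toy scoring scheme (match +2, mismatch -1, gap -2) as a stand-in for a Blosum matrix; those values are assumptions chosen only to keep the example short.

```python
from itertools import combinations

def pair_score(a, b, match=2, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols with a toy scheme (not Blosum)."""
    if a == "-" and b == "-":
        return 0                      # gap against gap carries no information
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs(alignment):
    """Sum pair_score over every pair of rows in every column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    total = 0
    for col in range(length):
        column = [row[col] for row in alignment]
        total += sum(pair_score(a, b) for a, b in combinations(column, 2))
    return total

# Second block of the chite / wheat / trybr / mouse alignment.
example = [
    "AATAKQNYIRALQEYERNGG-",
    "ANKLKGEYNKAIAAYNKGESA",
    "AEKDKERYKREM---------",
    "AKDDRIRYDNEMKSWEEQMAE",
]
print(sum_of_pairs(example))
```

Swapping pair_score for a lookup in a real substitution matrix changes the numbers but not the structure of the function.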
29The COMPUTATIONAL Problem. Producing the Alignment
-Substitution Matrix (Blosum)
-An Evaluation Function
-An Alignment Algorithm
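For two sequences, the alignment algorithm is classically dynamic programming. The sketch below is a minimal global (Needleman-Wunsch style) aligner with a linear gap penalty and a toy match/mismatch score; the parameter values are assumptions, and production tools use substitution matrices and affine gap penalties (Gop, Gep) instead.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for s[i-1] against t[j-1].
        return match if s[i - 1] == t[j - 1] else mismatch

    # Fill the matrix: best of diagonal (align), up (gap in t), left (gap in s).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))

print(needleman_wunsch("ACGT", "AGT"))
```

The matrix has (n+1) x (m+1) cells for two sequences; extending the same idea to N sequences needs an N-dimensional matrix, which is what makes the exact multiple alignment so expensive.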
30HOW CAN I ALIGN MANY SEQUENCES?
2 Globins >1 Min
31HOW CAN I ALIGN MANY SEQUENCES?
3 Globins >2 hours
32HOW CAN I ALIGN MANY SEQUENCES?
4 Globins >10 days
33HOW CAN I ALIGN MANY SEQUENCES?
5 Globins >3 years
34HOW CAN I ALIGN MANY SEQUENCES?
6 Globins >300 years
35HOW CAN I ALIGN MANY SEQUENCES?
7 Globins >30,000 years
Solidified Fossil, Old stuff
36HOW CAN I ALIGN MANY SEQUENCES?
8 Globins >3 Million years
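The explosion in running time follows from the size of the dynamic-programming lattice: an exact alignment of N sequences of length L visits on the order of L^N cells, and each cell has 2^N - 1 possible predecessors. The sketch below tabulates that growth for an assumed globin length of 150 residues; it makes no attempt to reproduce the timings quoted on the slides.

```python
def exact_msa_operations(n_sequences, length=150):
    """Rough operation count for exact N-dimensional dynamic programming.

    cells:        (length + 1) ** N lattice points
    predecessors: 2 ** N - 1 ways to step into each cell
    """
    cells = (length + 1) ** n_sequences
    predecessors = 2 ** n_sequences - 1
    return cells * predecessors

for n in range(2, 9):
    print(f"{n} sequences: ~{exact_msa_operations(n):.2e} operations")
```

Each additional sequence multiplies the work by a factor of a few hundred, which is why exact methods stop being practical after a handful of sequences.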
37The Progressive Multiple Alignment
Algorithm (Clustal W)
39Making An Alignment
Any Exact Method would be TOO SLOW
We will use a Heuristic Algorithm.
Progressive Alignment Algorithm is the most
Popular
-ClustalW
40Progressive Alignment
Feng and Doolittle, 1988; Taylor, 1989
Clustering
41Progressive Alignment
42Progressive Alignment
-Depends on the CHOICE of the sequences.
-Depends on the ORDER of the sequences (Tree).
-Depends on the PARAMETERS:
  -Substitution Matrix.
  -Penalties (Gop, Gep).
  -Sequence Weight.
  -Tree making Algorithm.
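The dependence on sequence order can be made concrete with a small sketch of how a guide tree fixes the order of merges. The code below is an illustration under stated assumptions, not ClustalW's actual procedure: distances are Jaccard distances between sets of 2-residue words (a crude stand-in for pairwise alignment distances), and clusters are joined by average linkage, nearest pair first.

```python
from itertools import combinations

def kmer_set(seq, k=2):
    """All overlapping words of length k in the sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_distance(a, b, k=2):
    """Jaccard distance between the word sets of two sequences."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0                     # both sequences shorter than k
    return 1.0 - len(ka & kb) / len(union)

def merge_order(seqs, k=2):
    """Return the list of cluster merges, closest clusters first."""
    dist = {
        frozenset((a, b)): kmer_distance(seqs[a], seqs[b], k)
        for a, b in combinations(seqs, 2)
    }

    def average(c1, c2):
        # Average-linkage distance between two clusters of sequence names.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (
            len(c1) * len(c2)
        )

    clusters = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

toy = {"a": "AAAAAA", "b": "AAAAAC", "c": "GGGGGG"}
for step, (left, right) in enumerate(merge_order(toy), start=1):
    print(step, left, "+", right)
```

In a progressive aligner each merge corresponds to one pairwise alignment of sequences or sub-alignments, so an error made at an early merge is carried through every later step.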
43Progressive Alignment: When Does It Work?
Works Well When Phylogeny is Dense
No Outlier Sequence.
(Image: River Crossing)
44Progressive Alignment: When Doesn't It Work?
46Building the Right Multiple Sequence Alignment.
47Recognizing The Right Sequences When You Meet Them
48Gathering Sequences: BLAST
49Common Mistake: Sequences Too Closely Related
PRVA_MACFU SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_HUMAN SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE
PRVA_GERSP SMTDLLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKTPDDVKKVFHILDKDKSGFIEE
PRVA_MOUSE SMTDVLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKNPDEVKKVFHILDKDKSGFIEE
PRVA_RAT   SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_RABIT AMTELLNAEDIKKAIGAFAAAESFDHKKFFQMVGLKKKSTEDVKKVFHILDKDKSGFIEE

PRVA_MACFU DELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_HUMAN DELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_GERSP DELGFILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES
PRVA_MOUSE DELGSILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES
PRVA_RAT   DELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES
PRVA_RABIT EELGFILKGFSPDARDLSVKETKTLMAAGDKDGDGKIGADEFSTLVSES
-IDENTICAL SEQUENCES BRING NO INFORMATION FOR THE
MULTIPLE SEQUENCE ALIGNMENT
-MULTIPLE SEQUENCE ALIGNMENTS THRIVE ON DIVERSITY
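One practical way to act on this is to measure pairwise identity and discard sequences that are nearly identical to one already kept. The sketch below works on rows of an existing alignment; the function names and the 90% threshold are assumptions made for the example, not a recommendation from the slides.

```python
def percent_identity(a, b):
    """Percent identity over columns where both rows carry a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def drop_redundant(aligned, max_identity=90.0):
    """Greedy filter: keep a row only if it is not too close to any kept row."""
    kept = {}
    for name, row in aligned.items():
        if all(
            percent_identity(row, other) <= max_identity
            for other in kept.values()
        ):
            kept[name] = row
    return kept

# First block of three of the parvalbumin rows shown above.
rows = {
    "PRVA_MACFU": "SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE",
    "PRVA_HUMAN": "SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE",
    "PRVA_RABIT": "AMTELLNAEDIKKAIGAFAAAESFDHKKFFQMVGLKKKSTEDVKKVFHILDKDKSGFIEE",
}
print(list(drop_redundant(rows)))
```

The filter is order dependent (the first representative of a group is the one kept), which is acceptable for a quick pruning pass before alignment.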
51Sequence Weighting Within ClustalW
52Selecting Diverse Sequences (Opus II)
53Respect Information!
PRVA_MACFU ------------------------------------------SMTDLLN----AEDIKKA
PRVA_HUMAN ------------------------------------------SMTDLLN----AEDIKKA
PRVA_GERSP ------------------------------------------SMTDLLS----AEDIKKA
PRVA_MOUSE ------------------------------------------SMTDVLS----AEDIKKA
PRVA_RAT   ------------------------------------------SMTDLLS----AEDIKKA
PRVA_RABIT ------------------------------------------AMTELLN----AEDIKKA
TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM

PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_HUMAN VGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFI
PRVA_GERSP IGAFAAADS--FDHKKFFQMVG------LKKKTPDDVKKVFHILDKDKSGFIEEDELGFI
PRVA_MOUSE IGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSI
PRVA_RAT   IGAFTAADS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGSI
PRVA_RABIT IGAFAAAES--FDHKKFFQMVG------LKKKSTEDVKKVFHILDKDKSGFIEEEELGFI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
54Selecting Diverse Sequences (Opus II)
55Selecting Diverse Sequences (Opus II)
PRVB_CYPCA -AFAGVLNDADIAAALEACKAADSFNHKAFFAKVGLTSKSADDVKKAFAIIDQDKSGFIE
PRVB_BOACO -AFAGILSDADIAAGLQSCQAADSFSCKTFFAKSGLHSKSKDQLTKVFGVIDRDKSGYIE
PRV1_SALSA MACAHLCKEADIKTALEACKAADTFSFKTFFHTIGFASKSADDVKKAFKVIDQDASGFIE
PRVB_LATCH -AVAKLLAAADVTAALEGCKADDSFNHKVFFQKTGLAKKSNEELEAIFKILDQDKSGFIE
PRVB_RANES -SITDIVSEKDIDAALESVKAAGSFNYKIFFQKVGLAGKSAADAKKVFEILDRDKSGFIE
PRVA_MACFU -SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIE
PRVA_ESOLU --AKDLLKADDIKKALDAVKAEGSFNHKKFFALVGLKAMSANDVKKVFKAIDADASGFIE

PRVB_CYPCA EDELKLFLQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA-
PRVB_BOACO EDELKKFLQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG
PRV1_SALSA VEELKLFLQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ-
PRVB_LATCH DEELELFLQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA-
PRVB_RANES QDELGLFLQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA-
PRVA_MACFU EDELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_ESOLU EEELKFVLKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA

-A REASONABLE Model Now Exists.
-Going Further: Remote Homologues.
56Aligning Remote Homologues
PRVA_MACFU ------------------------------------------SMTDLLNA----EDIKKA
PRVA_ESOLU -------------------------------------------AKDLLKA----DDIKKA
PRVB_CYPCA ------------------------------------------AFAGVLND----ADIAAA
PRVB_BOACO ------------------------------------------AFAGILSD----ADIAAG
PRV1_SALSA -----------------------------------------MACAHLCKE----ADIKTA
PRVB_LATCH ------------------------------------------AVAKLLAA----ADVTAA
PRVB_RANES ------------------------------------------SITDIVSE----KDIDAA
TPCS_RABIT -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCS_PIG   -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM

PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_ESOLU LDAVKAEGS--FNHKKFFALVG------LKAMSANDVKKVFKAIDADASGFIEEEELKFV
PRVB_CYPCA LEACKAADS--FNHKAFFAKVG------LTSKSADDVKKAFAIIDQDKSGFIEEDELKLF
PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
PRVB_LATCH LEGCKADDS--FNHKVFFQKTG------LAKKSNEELEAIFKILDQDKSGFIEDEELELF
PRVB_RANES LESVKAAGS--FNYKIFFQKVG------LAGKSAADAKKVFEILDRDKSGFIEQDELGLF
TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG   IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM

PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_ESOLU LKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA-
PRVB_CYPCA LQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA--
PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG-
PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ--
PRVB_LATCH LQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA--
PRVB_RANES LQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA--
TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCS_PIG   FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE
57Some Guidelines
58Do Not Use Too Many Sequences
59Reading Your Alignment
61Going Further
PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG   IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
TPC_PATYE  SDEMDEEATGRLNCDAWIQLFER---KLKEDLDERELKEAFRVLDKEKKGVIKVDVLRWI

PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES--
PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG--
PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ---
TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCS_PIG   FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE-
TPC_PATYE  LS---SLGDELTEEEIENMIAETDTDGSGTVDYEEFKCLMMSSDA
62WHAT MAKES A GOOD ALIGNMENT
-THE MORE DIVERGENT THE SEQUENCES, THE BETTER
-THE FEWER INDELS, THE BETTER
-NICE UNGAPPED BLOCKS SEPARATED WITH INDELS
-DIFFERENT CLASSES OF RESIDUES WITHIN A BLOCK:
  -Completely Conserved
  -Conserved For Size and Hydropathy
  -Conserved For Size or Hydropathy
-THE ULTIMATE EVALUATION IS A MATTER OF PERSONAL
JUDGEMENT AND KNOWLEDGE.
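Reading a block column by column can be partly automated. The sketch below marks fully conserved columns with "*" and columns whose residues all fall within one group of similar amino acids with ":". The groups are assumed here to be the "strong" groups commonly listed for the Clustal conservation line; treat that list as an assumption to verify, and note that the weaker third class is left out to keep the example short.

```python
# Assumed "strong" similarity groups (to be checked against the Clustal docs).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]

def classify_column(column):
    """Return '*', ':' or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "                     # gapped columns are not classified
    if len(residues) == 1:
        return "*"                     # completely conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"                     # all residues within one similar group
    return " "

def conservation_line(alignment):
    """Build the conservation line for a list of equal-length rows."""
    return "".join(classify_column(col) for col in zip(*alignment))

block = [
    "AATAKQNYIRALQEYERNGG-",
    "ANKLKGEYNKAIAAYNKGESA",
    "AEKDKERYKREM---------",
    "AKDDRIRYDNEMKSWEEQMAE",
]
print(conservation_line(block))
```

Such a line is only a reading aid; deciding whether a conserved column is biologically meaningful remains a matter of judgement.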
64Potential Difficulties
65DO NOT OVERTUNE!!!
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE

DO NOT PLAY WITH PARAMETERS: IF YOU KNOW THE
ALIGNMENT YOU WANT, MAKE IT YOURSELF!

chite ---ADKPKRPL-SAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAP-SAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPR-SAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
66TUNING or NOT TUNING!!!
-PARAMETERS TO TUNE USUALLY INCLUDE:
  -GOP / GEP
  -MATRIX
  -SENSITIVITY Vs SPEED
Substitution Matrices (Etzold et al., 1993):
  Gonnet    61.7
  Blosum50  59.7
  Pam250    59.2
-MOST METHODS ARE TUNED TO WORK WELL ON AVERAGE
-PARAMETER BEHAVIOUR DOES NOT NECESSARILY FOLLOW
THE THEORY (e.g. Substitution Matrices).
-A GOOD ALIGNMENT IS USUALLY ROBUST (i.e. Changes
little).
-TUNE IF YOU WANT TO CONVINCE YOURSELF.
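Robustness can be checked by aligning the same sequences with two parameter sets and measuring how many aligned residue pairs the two results share. The sketch below is one simple way to do that; the function names are hypothetical, and the measure is a plain overlap fraction rather than any published benchmark score.

```python
from itertools import combinations

def aligned_pairs(alignment):
    """Set of (row_i, pos_i, row_j, pos_j) residue pairs sharing a column.

    Positions are residue indices within each ungapped sequence.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for row_index, symbol in enumerate(column):
            if symbol != "-":
                present.append((row_index, counters[row_index]))
                counters[row_index] += 1
        for (i, a), (j, b) in combinations(present, 2):
            pairs.add((i, a, j, b))
    return pairs

def shared_fraction(reference, other):
    """Fraction of the reference's aligned pairs also found in the other."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 1.0
    return len(ref_pairs & aligned_pairs(other)) / len(ref_pairs)

# Same two toy sequences aligned with and without a shift.
default_run = ["ACDEF", "ACDEF"]
tuned_run = ["ACDEF-", "-ACDEF"]
print(shared_fraction(default_run, tuned_run))
```

A value close to 1 across several parameter settings suggests the alignment changes little; a low value points at regions that deserve a closer look.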
68KEEP A BIOLOGICAL PERSPECTIVE
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

DIFFERENT PARAMETERS

chite AD--K----PKR-PLYMLWLNS-ARESIKRENPDFK-VT-EVAKKGGELWRGL-
wheat -DPNK----PKRAP-FFVFMGE-FREEFKQKNPKNKSVA-AVGKAAGERWKSLS
trybr -K--KDSNAPKR-AMT-MFFSSDFR-S-KH-S-DLS-IV-EMSKAAGAAWKELG
mouse ----K----PKR-PRYNIYVSESFQEA-K--D-D-S-AQGKL-KLVNEAWKNLS

WRONG ALIGNMENT !!!
69REPEATS
THERE IS A PROBLEM WHEN TWO SEQUENCES DO NOT
CONTAIN THE SAME NUMBER OF REPEATS
IT IS THEN BETTER TO MANUALLY EXTRACT THE REPEATS
AND TO ALIGN THEM. INDIVIDUAL REPEATS CAN BE
RECOGNIZED USING DOTTER
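The idea behind a dot plot can be sketched in a few lines: compare a sequence with itself and look for off-diagonal runs of matches, whose offset gives the repeat spacing. The code below uses exact word matching with an assumed window of 3 residues, which is much cruder than the scored sliding windows that Dotter uses.

```python
from collections import Counter

def dot_plot(seq_a, seq_b, window=3):
    """Set of (i, j) positions where words of length `window` match exactly."""
    index = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in index.get(seq_a[i:i + window], []):
            hits.add((i, j))
    return hits

def repeat_offsets(seq, window=3):
    """Count off-diagonal hits of a self dot plot by their offset j - i."""
    return Counter(j - i for i, j in dot_plot(seq, seq, window) if j > i)

# Toy sequence: the same 9-residue unit repeated twice with a short linker.
unit = "MKTAYGDEL"
toy = unit + "PP" + unit
print(repeat_offsets(toy).most_common(1))
```

An offset that collects many hits marks the repeat period; the corresponding segments can then be cut out and aligned on their own.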
71Naming Your Sequences The Right Way
72What Are The Available Methods ???
73Simultaneous Alignments: MSA
1) Set Bounds on each pair of sequences (Carrillo
and Lipman)
2) Compute the Multiple Alignment within the Hyperspace
-Few Small Closely Related Sequences.
-Memory and CPU hungry
-Do Well When They Can Run.
74Simultaneous Alignments: DCA
75Dialign
76 3) Assemble the alignment according to the
segment pairs.
77 -May Align Too Few Residues
-No Gap Penalty
-Does well with ESTs
78bibiserv.techfak.uni-bielefeld.de/dialign/submission.html
79Muscle
80Iterative Methods
7.16.1 Progressive
-HMMs, HMMER, SAM, MUSCLE
-Slow, Sometimes Inaccurate
-Good Profile Generators
81MUSCLE
82MUSCLE
phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py
83MUSCLE
phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py
84T-Coffee
85Mixing Local and Global Alignments
Local Alignment
Global Alignment
Extension
Multiple Sequence Alignment
86Mixing Heterogenous Data With T-Coffee
Local Alignment
Global Alignment
Multiple Alignment
Structural
Specialist
Multiple Sequence Alignment
87Mixing Sequences and Structures with T-Coffee
Seq Vs Seq: Local / Global
Seq Vs Struct: Thread
Struct Vs Struct: Superpose
Evaluation on Homstrad
88What is the Local Quality of my Alignment?
89T-Coffee
igs-server.cnrs-mrs.fr/Tcoffee/
90DBClustal
91DBClustal
BlastP
92DBClustal
93DBClustal
94Expasy Blast
95Expasy BLAST
www.expasy.org/tools/blast/
96Expasy BLAST
97Choosing the right method
98Situation => Solution
99Priority => Solution
Method Priority Trees Profile 2D Pred 3D-Pred Func-Pred
Accuracy
Speed
100Purpose => Solution
101Conclusion
102Multiple Alignment
103Multiple Alignment
Know Your Problem: What do you want to do with
your MSA?
104Addresses
MAFFT   Progressive/Iterative     www.biophys.kyoto-u.jp/katoh
POA     Progressive/Simultaneous  www.bioinformatics.ucla.edu/poa
MUSCLE  Progressive/Iterative     www.drive5.com/muscle
105BaliBase
What Is BaliBase?
Source: BaliBase, Thompson et al., NAR, 1999
Description
PROBLEM
106Which Method ?
What Is BaliBase?
Source: BaliBase, Thompson et al., NAR, 1999
Strategy
PROBLEM
107Methods / Situations
1-Carrillo and Lipman
-MSA, DCA.
-Few Small Closely Related Sequences.
-Do Well When They Can Run.
2-Segment Based
-DIALIGN, MACAW.
-May Align Too Few Residues
-Good For Long Indels
3-Iterative
-HMMs, HMMER, SAM.
-Slow, Sometimes Inaccurate
-Good Profile Generators
4-Progressive
-ClustalW, Pileup, Multalign
-Fast and Sensitive