Title: Aligning Sequences With Genetic Algorithms
Cédric Notredame
How Can I Use A Multiple Sequence Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE
Why Is It Difficult To Compute A Multiple Sequence Alignment?
BIOLOGY: What Is A GOOD Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
COMPUTATION: What Is THE Good Alignment?
Why Is It Difficult To Compute A Multiple Sequence Alignment?
BIOLOGY
COMPUTATION
CIRCULAR PROBLEM...
(Diagram: Good Alignment ↔ Good Sequences)
The Computational Problem
2 Globins: > 1 sec
3 Globins: > 2 min
4 Globins: > 5 hours
5 Globins: > 3 weeks
6 Globins: > 9 years
7 Globins: > 1000 years
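Why the times explode can be seen from the size of the exact dynamic programming problem. A minimal sketch, assuming a straightforward N-dimensional DP: the table has (L + 1)^N cells and each cell considers 2^N - 1 predecessor moves (every non-empty subset of sequences advances by one residue). The function name `msa_dp_cost` and the length of 150 residues are illustrative assumptions, not values from the slide.

```python
def msa_dp_cost(n_seqs: int, length: int) -> int:
    """Number of cell-move evaluations for exact N-dimensional DP.

    The table has (length + 1) ** n_seqs cells; each cell looks at
    2 ** n_seqs - 1 predecessor moves.
    """
    cells = (length + 1) ** n_seqs
    moves_per_cell = 2 ** n_seqs - 1
    return cells * moves_per_cell


if __name__ == "__main__":
    # Hypothetical sequence length of 150 residues, roughly globin-sized.
    for n in range(2, 8):
        print(n, f"{msa_dp_cost(n, 150):.2e}")
```

The absolute timings in the table depend on implementation and hardware; the sketch only shows the shape of the growth, roughly a factor of 2(L + 1) for every added sequence.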
Existing Solutions
1-Carrillo and Lipman
-MSA, DCA.
-Few, Small, Closely Related Sequences.
-Do Well When They Can Run.
2-Segment Based
-DIALIGN, MACAW.
-May Align Too Few Residues
3-Iterative
-HMMs, HMMER, SAM.
-Slow, Sometimes Inaccurate
-Good Profile Generators
4-Progressive
-ClustalW, Pileup, Multalign
-Fast and Sensitive
Progressive Alignment
Feng and Doolittle, 1980; Taylor, 1981
Dynamic Programming Using A Substitution Matrix
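Each step of a progressive alignment relies on pairwise dynamic programming scored with a substitution matrix. A minimal Needleman-Wunsch sketch, assuming a linear gap cost; `toy_sub` is a hypothetical stand-in for a real matrix such as BLOSUM62, and all names here are illustrative.

```python
from typing import Callable, Tuple


def needleman_wunsch(a: str, b: str,
                     sub: Callable[[str, str], int],
                     gap: int) -> Tuple[int, str, str]:
    """Global pairwise alignment with a substitution function and a linear gap cost."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # residue pair
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def toy_sub(x: str, y: str) -> int:
    # Hypothetical scores for illustration only; not BLOSUM62 or any real matrix.
    return 2 if x == y else -1


if __name__ == "__main__":
    print(needleman_wunsch("KPRP", "KPKRP", toy_sub, gap=-2))  # (6, 'KP-RP', 'KPKRP')
```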
SAGA (Sequence Alignment by Genetic Algorithm)
(Diagram: Biological Objective Function → SAGA → Alignment → Biological Quality?)
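The idea of optimising an objective function over a population of alignments can be illustrated with a toy genetic algorithm. This is a simplified sketch, not SAGA itself: it assumes fixed-width alignments, an unweighted sum-of-pairs objective with hypothetical costs, a gap-shifting mutation and a row-wise crossover. All function names and parameters are illustrative.

```python
import random
from itertools import combinations
from typing import List

Alignment = List[str]  # one gapped row per sequence; all rows have the same width


def sp_score(aln: Alignment, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Unweighted sum-of-pairs score with toy (hypothetical) costs."""
    total = 0
    for col in zip(*aln):
        for x, y in combinations(col, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs are ignored
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total


def random_alignment(seqs: List[str], width: int, rng: random.Random) -> Alignment:
    """Scatter each sequence's residues, in order, over `width` columns."""
    rows = []
    for s in seqs:
        cols = sorted(rng.sample(range(width), len(s)))
        row = ["-"] * width
        for c, ch in zip(cols, s):
            row[c] = ch
        rows.append("".join(row))
    return rows


def mutate(aln: Alignment, rng: random.Random) -> Alignment:
    """Swap one gap with a neighbouring character in one row (keeps residue order)."""
    rows = list(aln)
    k = rng.randrange(len(rows))
    row = list(rows[k])
    gaps = [i for i, c in enumerate(row) if c == "-"]
    if gaps:
        i = rng.choice(gaps)
        j = i + rng.choice((-1, 1))
        if 0 <= j < len(row):
            row[i], row[j] = row[j], row[i]
    rows[k] = "".join(row)
    return rows


def crossover(p1: Alignment, p2: Alignment, rng: random.Random) -> Alignment:
    """Uniform row-wise crossover: each row comes from one parent or the other."""
    return [r1 if rng.random() < 0.5 else r2 for r1, r2 in zip(p1, p2)]


def evolve(seqs: List[str], width: int, pop_size: int = 30,
           generations: int = 200, seed: int = 0) -> Alignment:
    rng = random.Random(seed)
    pop = [random_alignment(seqs, width, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sp_score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = parents + children
    return max(pop, key=sp_score)
```

Because the best half survives each generation, the best score never decreases. The objective is plugged in through `sp_score`, so a more biological objective function could replace it without touching the search.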
The Biological Problem: The Charlie Chaplin Paradox
An Alignment Is a STORY
Comparing Sequences, Reconstructing Evolution
Most Common Objective Function: Sum of Pairs
Model: Every sequence is the ancestor of every sequence.
PROBLEM:
-Over-estimation of the mutation costs
-Requires a weighting scheme
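A weighted sum-of-pairs score makes the weighting problem concrete: every pair of sequences contributes, so a group of near-identical sequences is counted many times unless it is down-weighted. A minimal sketch, assuming toy match, mismatch and gap costs and a hypothetical pair weight equal to the product of two per-sequence weights; real weighting schemes differ.

```python
from itertools import combinations
from typing import List, Sequence


def pair_score(r1: str, r2: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Score one pair of gapped rows with toy (hypothetical) costs."""
    total = 0
    for x, y in zip(r1, r2):
        if x == "-" and y == "-":
            continue  # gap-gap columns do not count
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def weighted_sp(aln: List[str], weights: Sequence[float]) -> float:
    """Sum over all sequence pairs of w_i * w_j * pair_score(row_i, row_j)."""
    return sum(weights[i] * weights[j] * pair_score(aln[i], aln[j])
               for i, j in combinations(range(len(aln)), 2))


if __name__ == "__main__":
    aln = ["AC", "AC", "AD"]
    print(weighted_sp(aln, [1, 1, 1]))        # 6
    print(weighted_sp(aln, [0.5, 0.5, 1.0]))  # 2.0
```

In the usage example the two identical rows share their weight, so their mutual agreement no longer dominates the score.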
T-Coffee: A Fast Heuristic for SAGA-COFFEE
Progressive Alignment: Principle and Its Limitations
The Extended Library Principle
The Triplet Assumption
(Diagram: SEQ A, SEQ B)
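Under the triplet assumption, if residue x of SEQ A is aligned with residue z of a third sequence, and z is aligned with residue y of SEQ B, that is indirect support for aligning x with y. The sketch below is in the spirit of T-Coffee's library extension, assuming each triplet contributes the smaller of its two weights and that a library maps residue pairs to weights. Residues are written as hypothetical `(sequence name, index)` tuples.

```python
from collections import defaultdict
from typing import Dict, Tuple

Residue = Tuple[str, int]  # (sequence name, residue index)
Library = Dict[Tuple[Residue, Residue], float]


def make_library(entries) -> Library:
    """Build a symmetric library from (residue, residue, weight) triples."""
    lib: Dict[Tuple[Residue, Residue], float] = defaultdict(float)
    for r1, r2, w in entries:
        lib[(r1, r2)] += w
        lib[(r2, r1)] += w
    return dict(lib)


def neighbours(lib: Library) -> Dict[Residue, Dict[Residue, float]]:
    """Index the library by residue: nb[x][z] is the weight of the pair (x, z)."""
    nb: Dict[Residue, Dict[Residue, float]] = defaultdict(dict)
    for (r1, r2), w in lib.items():
        nb[r1][r2] = w
    return dict(nb)


def extended_weight(lib: Library, nb: Dict[Residue, Dict[Residue, float]],
                    x: Residue, y: Residue) -> float:
    """Direct weight of (x, y) plus min(W(x, z), W(z, y)) over residues z of third sequences."""
    total = lib.get((x, y), 0.0)
    for z, w_xz in nb.get(x, {}).items():
        if z[0] in (x[0], y[0]):
            continue  # the triplet must pass through a third sequence
        w_zy = lib.get((z, y), 0.0)
        if w_zy > 0:
            total += min(w_xz, w_zy)
    return total


if __name__ == "__main__":
    lib = make_library([
        (("A", 0), ("B", 0), 60.0),  # direct A-B evidence
        (("A", 0), ("C", 1), 80.0),  # A-C evidence
        (("C", 1), ("B", 0), 70.0),  # C-B evidence
    ])
    print(extended_weight(lib, neighbours(lib), ("A", 0), ("B", 0)))  # 130.0
```

A pair with no direct entry can still receive weight through a third sequence, which is how information from all sequences reaches each pairwise decision.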
T-Coffee Progressive Alignment
Notredame, Higgins, Heringa, 2000
Dynamic Programming Using the Extended Library
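Here the extended library weights take the place of the substitution matrix in the dynamic programming. A minimal pairwise sketch, assuming free gaps, non-negative weights, and a caller-supplied `weight(i, j)` function returning the extended weight of residue i in one sequence and residue j in the other (a hypothetical interface). It returns the aligned residue index pairs.

```python
from typing import Callable, List, Tuple


def library_dp(len_a: int, len_b: int,
               weight: Callable[[int, int], float]) -> List[Tuple[int, int]]:
    """Align two sequences by maximising the summed library weight of aligned pairs.

    Gaps cost nothing in this sketch (an assumption), so only the library
    decides which residues are matched.
    """
    S = [[0.0] * (len_b + 1) for _ in range(len_a + 1)]
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            S[i][j] = max(S[i - 1][j - 1] + weight(i - 1, j - 1),  # align i-1 with j-1
                          S[i - 1][j],                             # gap, free
                          S[i][j - 1])                             # gap, free
    # Trace back; only pairs with positive library support are reported as aligned.
    pairs = []
    i, j = len_a, len_b
    while i > 0 and j > 0:
        w = weight(i - 1, j - 1)
        if w > 0 and S[i][j] == S[i - 1][j - 1] + w:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


if __name__ == "__main__":
    W = {(0, 0): 5.0, (1, 2): 3.0, (2, 1): 4.0}  # hypothetical extended weights
    print(library_dp(3, 3, lambda i, j: W.get((i, j), 0.0)))  # [(0, 0), (2, 1)]
```

The crossing pairs (1, 2) and (2, 1) cannot both be aligned, so the DP keeps the better-supported one.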
Mixing Local and Global Alignments
(Diagram: Local Alignment + Global Alignment → Extension → Multiple Sequence Alignment)
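Before extension, the local and global alignments each yield a primary library of weighted residue pairs, and the two must be combined. A minimal sketch, assuming that a residue pair found in both libraries is merged into one entry whose weight is the sum of its weights; the dictionary layout is a hypothetical choice.

```python
from collections import Counter
from typing import Dict, Hashable


def merge_libraries(*libraries: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Merge primary libraries; a pair found in several libraries gets the summed weight."""
    merged: Counter = Counter()
    for lib in libraries:
        for pair, weight in lib.items():
            merged[pair] += weight
    return dict(merged)


if __name__ == "__main__":
    pair = (("A", 0), ("B", 0))
    global_lib = {pair: 60.0}                               # hypothetical global evidence
    local_lib = {pair: 30.0, (("A", 5), ("B", 7)): 40.0}    # hypothetical local evidence
    print(merge_libraries(global_lib, local_lib))
```

Pairs supported by both sources end up with more weight than pairs seen by only one, so agreement between local and global evidence is rewarded.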
Validation Using BAliBASE
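BAliBASE provides reference alignments, and one way to validate an aligner against them is to count how many residue pairs aligned in the reference are also aligned in the test alignment. This is a minimal re-implementation of that sum-of-pairs style measure under that assumption, not BAliBASE's own scoring program; rows are assumed to be in the same sequence order in both alignments.

```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[Tuple[int, int], Tuple[int, int]]  # ((row, residue index), (row, residue index))


def aligned_pairs(aln: List[str]) -> Set[Pair]:
    """All residue pairs that share a column, identified by (row, residue index)."""
    next_index = [0] * len(aln)
    pairs: Set[Pair] = set()
    for col in zip(*aln):
        residues = []
        for row, ch in enumerate(col):
            if ch != "-":
                residues.append((row, next_index[row]))
                next_index[row] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def sp_accuracy(test: List[str], reference: List[str]) -> float:
    """Fraction of the reference's aligned residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)


if __name__ == "__main__":
    print(sp_accuracy(["AC-D", "ACD-"], ["ACD", "ACD"]))  # 2 of 3 reference pairs recovered
```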
Mixing Heterogeneous Information With T-Coffee
(Diagram: Local Alignment + Global Alignment + Multiple Alignment + Structural + Specialist → Extension → Multiple Sequence Alignment)
Why Use GAs?
(Chart comparing SAGA, SAGA-COFFEE and T-COFFEE; axis: Time)
Des Higgins, UCC, Ireland
Jaap Heringa, MRC, UK
Liisa Holm, EMBL-EBI, UK
Orla O'Sullivan, UCC
Chantal Abergel, IGS, France