Introduction to bioinformatics: lecture 9

Slides: 36
Provided by: pir80

1
Introduction to bioinformatics: lecture 9
  • Multiple sequence alignment (II)

2
Scoring a profile position
Profile 1: A C D . . Y
Profile 2: A C D . . Y
  • At each position (column) we have different
    residue frequencies for each amino acid (rows).
  • So, instead of scoring SM(aa1, aa2) (one residue
    pair), for every frequency f > 0 (amino acid is
    actually there) we take a frequency-weighted
    score over the residue pairs present in the two
    columns.

3
Progressive alignment
  1. Perform pair-wise alignments of all of the
    sequences.
  2. Use the alignment scores to produce a dendrogram
    (guide tree) using neighbour-joining methods.
  3. Align the sequences sequentially, guided by the
    relationships indicated by the tree.
  • Biopat (first method ever)
  • MULTAL (Taylor 1987)
  • DIALIGN (1 & 2, Morgenstern 1996)
  • PRRP (Gotoh 1996)
  • ClustalW (Thompson et al. 1994)
  • PRALINE (Heringa 1999)
  • T-Coffee (Notredame 2000)
  • POA (Lee 2002)
  • MUSCLE (Edgar 2004)
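
The three steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure of any of the listed programs: pairwise scores come from a plain Needleman-Wunsch score with made-up match, mismatch and gap values, and the guide tree is built by simple average-linkage clustering instead of neighbour-joining. The function names `nw_score`, `guide_tree` and `join_order` are hypothetical, and step 3 only lists the order of the alignments without performing them.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Step 1: global (Needleman-Wunsch) score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def guide_tree(seqs):
    """Step 2: nested-tuple guide tree from average-linkage clustering.

    Simplification: neighbour-joining is NOT implemented here.
    """
    names = list(seqs)
    score = {frozenset(p): nw_score(seqs[p[0]], seqs[p[1]])
             for p in combinations(names, 2)}
    top = max(score.values())
    dist = {pair: top - s for pair, s in score.items()}  # scores to distances
    clusters = {name: [name] for name in names}          # tree -> member names

    def avg_dist(c1, c2):
        pairs = [dist[frozenset((x, y))] for x in clusters[c1] for y in clusters[c2]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))

def join_order(tree, out=None):
    """Step 3 (order only): list the joins from the leaves up to the root."""
    out = [] if out is None else out
    if isinstance(tree, tuple):
        for child in tree:
            join_order(child, out)
        out.append(tree)
    return out

seqs = {"s1": "ACDEFGH", "s2": "ACDEFGH", "s3": "ACDQFGK", "s4": "WWYYWWY"}
tree = guide_tree(seqs)
for left, right in join_order(tree):
    print("align", left, "with", right)
```

In a real progressive aligner each join would be a sequence-sequence, sequence-profile or profile-profile alignment; here the joins are only enumerated to show how the tree dictates the order.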

4
Progressive multiple alignment
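[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) fill a 5×5 similarity matrix; the scores are converted to distances, the distances give the guide tree, and the guide tree gives the multiple alignment, with iteration possibilities.]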
5
General progressive multiple alignment technique
(follow generated tree)
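[Figure: guide tree (distance axis d) over sequences 1, 3, 2 and 5; the alignment is built up join by join along the tree until the root is reached.]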
6
PRALINE progressive strategy
[Figure: successive alignment steps (distance axis d), growing from sequences 1 and 3, to 1, 3 and 2, to 1, 3, 2, 5 and 4.]
7
There are problems
  • Accuracy is very important!
  • Alignment errors made during the construction of
    the MSA cannot be repaired later; they are
    propagated into the subsequent progressive steps.
  • The comparisons of sequences at early steps
    during progressive alignments cannot make use of
    information from other sequences.
  • It is only later during the alignment progression
    that more information from other sequences (e.g.
    through profile representation) becomes employed
    in the alignment steps.

"Once a gap, always a gap" (Feng & Doolittle, 1987)
8
Additional strategies for multiple sequence
alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objective: try to avoid (early) errors

9
Profile pre-processing
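[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) are used to build, for a key sequence (here sequence 1), a pre-alignment with sequences 2, 3, 4 and 5 by master-slave (N-to-1) alignment; the pre-alignment is converted into a pre-profile (columns A C D . . Y) for sequence 1. Labels: Pi, Px.]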
10
Pre-profile generation
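[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) and a cut-off determine one pre-alignment per sequence (key sequence 1 with 2, 3, 4, 5; key sequence 2 with 1, 3, 4, 5; ...; key sequence 5 with 1, 2, 3, 4); each pre-alignment is converted into a pre-profile (columns A C D . . Y).]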
11
Pre-profile alignment
[Figure: the pre-profiles of sequences 1 to 5 (each with columns A C D . . Y) are aligned to give the final alignment of sequences 1, 2, 3, 4 and 5.]
12
Pre-profile alignment
[Figure: the five pre-alignments, each headed by its key sequence (1 to 5) and containing the other four sequences, shown next to the final alignment of sequences 1 to 5.]
13
Pre-profile alignment: alignment consistency
[Figure: the five pre-alignments (key sequences 1 to 5) and the final alignment, illustrated for residue Ala131 of sequence 1; labels shown: A131, A131, L133, C126, A131.]
14
PRALINE pre-profile generation
  • Idea: use the information from all query
    sequences to make a pre-profile for each query
    sequence that contains information from the other
    sequences.
  • You can use all sequences in each pre-profile, or
    use only those sequences that will probably align
    correctly. Incorrectly aligned sequences in the
    pre-profiles will increase the noise level.
  • Select using alignment score: only allow
    sequences in a pre-profile if their alignment with
    the key sequence scores higher than a given
    threshold value. In PRALINE, this threshold is
    given as prepro=1500 (alignment score threshold
    value is 1500; see next two slides).
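
The selection rule in the last bullet can be sketched as a simple filter. This is an illustration only, not PRALINE code: the function name `select_preprofile_members`, the data layout and the scores are hypothetical, and the only thing taken from the slide is the rule "score higher than the threshold".

```python
def select_preprofile_members(key, scores, threshold):
    """Return the sequences allowed into the pre-profile of `key`.

    scores maps frozenset({a, b}) -> pairwise alignment score.
    With threshold=0 every sequence with a positive score is admitted.
    """
    members = []
    for pair, s in scores.items():
        if key in pair and s > threshold:
            (other,) = pair - {key}
            members.append(other)
    return sorted(members)

# Made-up pairwise scores for four sequences.
scores = {
    frozenset(("1", "2")): 2100,
    frozenset(("1", "3")): 1800,
    frozenset(("1", "4")): 900,
    frozenset(("2", "3")): 1600,
    frozenset(("2", "4")): 1200,
    frozenset(("3", "4")): 700,
}

print(select_preprofile_members("1", scores, threshold=1500))  # ['2', '3']
print(select_preprofile_members("1", scores, threshold=0))     # ['2', '3', '4']
```

A higher threshold keeps distant (and possibly misaligned) sequences out of the pre-profile at the cost of using less information.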

15
Flavodoxin-cheY consistency scores (PRALINE
prepro=0)
1fx1          --7899999999999TEYTAETIARQL8776-6657777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF
FLAV_DESVH    -46788999999999TEYTAETIAREL7777-7757777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF
FLAV_DESDE    -47899999999999999999999988776695658888777777778763YDAVL999SAW9877789877753556666669777776789GRKVAAF
FLAV_DESGI    -46788999999999TEGVAEAIAKTL9997-76678888777777887539DVVL999ST987776--9889546667776697776557777888888
FLAV_DESSA    93677799999999999999999999988759765777888888888876399999999STW77765--9999536666677797998779999999999
4fxn          -878779999999999999999999776666967567788888888888777999999988777776--9889577788888897773237888888888
FLAV_MEGEL    9776779999999999999999997777766-665666677788899976799999999987777669--887362334466695555455778888888
2fcr          --87899999999999TEVADFIGK996541900300000112233355679DLLF99999855312888111224555555407777777888888888
FLAV_ANASP    -47899LFYGTQTGKTESVAEIIR9777653922356677777777897779999999999988843--9998555778777899998879999999999
FLAV_ECOLI    997789999GSDTGNTENIAKMIQ8774222922456678889999995569999999999755553----99262225555495777767778999999
FLAV_AZOVI    --79IGLFFGSNTGKTRKVAKSIK99887759657577888888999777899999999999877761112222222244555-5555555778999999
FLAV_ENTAG    94789999999999999999999998755229223234555555555555688899999998875521111111133477777-7777777999999999
FLAV_CLOAB    -86999ILYSSKTGKTERVAK9997555555057678887888887777765778899998522223--9888342234455597777777777777777
3chy          0122222223333335666665555555222922222222222221112163335555755553222888877674533344493332222222222222
Avrg Consist  8667778888888889999999998776554844455566666666665557888888888766544887666334445566586666556778888888
Conservation  0125538675848969746963946463343045244355446543473516658868567554455000000314365446505575435547747759

1fx1          G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------
FLAV_DESVH    G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------
FLAV_DESDE    A88878685555555999988888889998879--8777788-98777777--8555555554433245667777777777599------
FLAV_DESGI    87775977755555677777777777777778---88888887667778777775555555555542424667888887777--------
FLAV_DESSA    977768777555556777777777777777767887777777778888-978985555555556536556888888888877--------
4fxn          867777555555552666666666555555577887767999877777977777665555555555444466666666555798------
FLAV_MEGEL    8577775666666525556777778888888689977888988776558677885544333222222212233223355557--------
2fcr          877773573333333777766667777765533333333333333322833333333332244444567777777888777633------
FLAV_ANASP    977773775333344777888888777777733334444444444433833333344444444444455577777788777734------
FLAV_ECOLI    977743786444444777788888888888833334444444444444244444555554555775667788888888877734110000
FLAV_AZOVI    97776355333333466666667777777773333444444444444482333355555555555545558888888877772311----
FLAV_ENTAG    977773886555555866666666677666633333333333333322123333344444444455555665566666555582------
FLAV_CLOAB    766627222222212444444444455555587882222222222222111111122222222222344443333333233399------
3chy          222227222222224111355431113324578-87778997666556877776322222222222322222323344444422------
Avrg Consist  866656564444444666666666666666656665555565555555655565444443444443344455666666666666889999
Conservation  736630574333341634645344447467100000110100110000000104347446454432254744544484343010000000

Iteration 0   SP 135136.00   AvSP 10.473   SId 3838   AvSId 0.297
Consistency values are scored from 0 to 10; the
value 10 is represented by the corresponding
amino acid (red).
16
Flavodoxin-cheY consistency scores (PRALINE
prepro=1500)
1fx1          -42444IVYGSTTGNTEYTAETIARQL886666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
FLAV_DESVH    -34444IVYGSTTGNTEYTAETIAREL776666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
FLAV_DESSA    -33444IVYGSTTGNTET99999888777655777668888899666686YDIVLFGCSTW77777----996466666779-88SL98ADLKGKKVSVF
FLAV_DESGI    -34444IVYGSTTGNTEGVA9999999999765555677777886666678DVVLLGCSTW77777----995466666779-88887688888KKVGVF
FLAV_DESDE    -44777IVFGSSTGNTE988777666655566777778899999777777YDAVLFGCSAW88877----997587777779-8887766777GRKVAAF
4fxn          -32222IVYWSGTGNTE8888888876666778888888888NI8888586DILILGCSA888888------8-8888886--66665378ISGKKVALF
FLAV_MEGEL    -12222IVYWSGTGNTEAMA8888888888888888555555555555485DVILLGCPAMGSE77------572222288--8888755588GKKVGLF
2fcr          -41456IFFSTSTGNTTEVA999998865432222765554443244779YDLLFLGAPT944411999-111112454441-8DKLPEVDMKDLPVAIF
FLAV_ANASP    -00456LFYGTQTGKTESVAEII987755323322427776666623589YQYLIIGCPTW55532--999843678W988899998888888GKLVAYF
FLAV_AZOVI    -42445LFFGSNTGKTRKVAKSIK87777434333536666665467777YQFLILGTPTLGEG862222222222355558-45666666888KTVALF
FLAV_ENTAG    -266IGIFFGSDTGQTRKVAKLIHQKL6664664424DVRRATR88888SYPVLLLGTPT88888644444444446WQEF8-8NTLSEADLTGKTVALF
FLAV_ECOLI    -51114IFFGSDTGNTENIAKMI987743311111555555588355599YDILLLGIPT954431----88355225544--44666666779KLVALF
FLAV_CLOAB    -63666ILYSSKTGKTERVAKLIE63333333333333333333366LQESEGIIFGTPTY63--6--------66SWE33333333333333GKLGAAF
3chy          ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM
Avrg Consist  9334459999999999999999988776655555555666667756667889999999999767658888775555566668967777677889999999
Conservation  0236428675848969746963946463344354312564565414344366588685675544550000003144654460055575345547747759

1fx1          G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
FLAV_DESVH    G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
FLAV_DESSA    G98878-688688888-88--88999999999999979988888887788889-89-9787777666756645577776666654466899899
FLAV_DESGI    G98879-898688888987--788888999GATLV7698899-9998789888-8899787878776663122477788888333276899899
FLAV_DESDE    AS8888-68-888888899--9999999999988888-99988888988778897888776668854222212255555555333277999999
4fxn          GS2228-228222222222--2388888888888888888888888888888888888887778866765535577555533221288888888
FLAV_MEGEL    G4888--28-8888882MD--AWKQRTEDTGATVI77-----------------------77222--224444222222244222112------
2fcr          GLGDA5-8Y5DNFC88-88--8877777777777765444555555555544385555777774465333357799999987555333899899
FLAV_ANASP    GTGDQ5-GY5899999-99--99EEKISQRGG99975555544444444433284444466665555555556666676666433333899899
FLAV_AZOVI    GLGDQ5-885777555-55--55555788888888555555555555555554855555555555666555555888855555544442--288
FLAV_ENTAG    GLGDQL-NYSKNFVSA-MR--ILYDLVIARGACVVG8888EGYKFSFSAA6664NEFVGLPLDQEN88888EERIDSWLE88842242688688
FLAV_ECOLI    GC99549784688888987997777777778888855444444444444444114444777774455775567788888887433322100100
FLAV_CLOAB    STANS6366663333333333336666666666666666663333363366336663333336EDENARIFGERIANKVKQI333333666666
3chy          VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------
Avrg Consist  9988779787777777777997788888888888866777777777767766677777676667766655455577776666433355788788
Conservation  7466400371545457063003545344447457530000010100100000000106837601444423355744544484343010000000

Iteration 0   SP 136702.00   AvSP 10.654   SId 3955   AvSId 0.308
Consistency values are scored from 0 to 10; the
value 10 is represented by the corresponding
amino acid (red).
17
Strategies for multiple sequence alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objective: integrate secondary structure
    information to anchor alignments and avoid errors

18
Protein structure hierarchical levels
TERTIARY STRUCTURE (fold)
19
Why use (predicted) structural information?
  • Structure is more conserved than sequence.
  • Many structural protein families (e.g. globins)
    have family members with very low sequence
    similarities. For example, globin sequence
    identities can be as low as 10% while still
    having an identical fold.
  • This means that you can still observe equivalent
    secondary structures in homologous proteins even
    if sequence similarities are extremely low.
  • But you are dependent on the quality of
    prediction methods. For example, secondary
    structure prediction is currently at 76%
    correctness. So, about 1 out of 4 predicted amino
    acids is still incorrect.

20
Two superposed protein structures with two
well-superposed helices
Red: well superposed. Blue: low match quality.
The human (PDB code 1kjs) and pig (1c5a) C5
anaphylatoxin proteins are superposed.
21
How to combine ss and aa info
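Sequence 1:          MDAGSTVILCFV
Secondary structure: HHHCCCEEEEEE
Sequence 2:          MDAASTILCGS
Secondary structure: HHHHCCEEECC
[Figure: dynamic programming search matrix for the two sequences. The amino acid substitution matrix used for a cell depends on the secondary structure of the residue pair: H with H, C with C, E with E, and a default matrix otherwise.]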
22
In terms of scoring
  • So how would you score a profile using this extra
    information?
  • Same formula as in lecture 6, but you can use
    secondary-structure-specific substitution scores
    in various combinations.
  • Where does it fit in?
  • Very important: structure is always more
    conserved than sequence, so it can help with the
    insertion (or not) of gaps.
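
A minimal sketch of how secondary-structure-specific scores might be plugged into the search matrix. The selection rule used here (a state-specific matrix when both residues share the same state H, E or C, a default matrix otherwise) is an assumption for illustration, and the toy match/mismatch "matrices" stand in for real 20×20 substitution tables. The names `make_matrix`, `MATRICES`, `DEFAULT` and `ss_score` are hypothetical.

```python
def make_matrix(match, mismatch):
    """Build a toy scoring function (placeholder for a real 20x20 matrix)."""
    return lambda a, b: match if a == b else mismatch

# Hypothetical state-specific scores: helix (H), strand (E), coil (C).
MATRICES = {
    "H": make_matrix(5, -1),
    "E": make_matrix(6, -2),
    "C": make_matrix(3, -1),
}
DEFAULT = make_matrix(4, -1)

def ss_score(aa1, ss1, aa2, ss2):
    """Score a residue pair, choosing the matrix from the two structure states."""
    matrix = MATRICES[ss1] if ss1 == ss2 and ss1 in MATRICES else DEFAULT
    return matrix(aa1, aa2)

# The two example sequences shown on the slide, with their secondary structure.
seq1, ss_1 = "MDAGSTVILCFV", "HHHCCCEEEEEE"
seq2, ss_2 = "MDAASTILCGS", "HHHHCCEEECC"

# Fill the dynamic programming search matrix cell by cell.
search_matrix = [
    [ss_score(a, s, b, t) for b, t in zip(seq2, ss_2)]
    for a, s in zip(seq1, ss_1)
]
print(search_matrix[0][0])  # M/H against M/H uses the helix scores: 5
```

The rest of the alignment (gap penalties, traceback) is unchanged; only the cell scores depend on the structure states.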

23
  1. Sequences to be aligned
  2. Predict secondary structure:
       HHHHCCEEECCCEEECCHH
       HHHCCCCEECCCEEHHH
       HHHHHHHHHHHHHCCCEEEE
       CCCCCCEECCCEEEECCHH
       HHHHHCCEEEECCCEECCC
  3. Align sequences using secondary structure
  4. Multiple alignment
24
Using predicted secondary structure
1fx1          -PK-ALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACF
  ss          e eeee b ssshhhhhhhhhhhhhhttt eeeee stt tttttt seeee b ee sss ee ttthhhhtt ttss tt eeeee
FLAV_DESVH    MPK-ALIVYGSTTGNTEYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACf
  ss          e eeeeee hhhhhhhhhhhhhhh eeeeee eeeeee hhhhhh eeeee
FLAV_DESGI    MPK-ALIVYGSTTGNTEGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLYED-LDRAGLKDKKVGVf
  ss          e eeeeee hhhhhhhhhhhhhh eeeeee hhhhhh eeeeeee hhhhhh eeeeee
FLAV_DESSA    MSK-SLIVYGSTTGNTETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLYDS-LENADLKGKKVSVf
  ss          eeeeee hhhhhhhhhhhhhh eeeee eeeee hhhhhhh h eeeee
FLAV_DESDE    MSK-VLIVFGSSTGNTESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLFEE-FNRFGLAGRKVAAf
  ss          eeee hhhhhhhhhhhhhh eeeee hhhhhhhhhhheeeee hhhhhhh hh eeeee
2fcr          --K-IGIFFSTSTGNTTEVADFIGKTLGAK---ADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFLYDKLPEVDMKDLPVAIF
  ss          eeeee ssshhhhhhhhhhhhhggg b eeggg s gggggg seeeeeee stt s s s sthhhhhhhtggg tt eeeee
FLAV_ANASP    SKK-IGLFYGTQTGKTESVaEIIRDEFGND--VVTL-HDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLYSE-LDDVDFNGKLVAYf
  ss          eeeee hhhhhhhhhhhh eee hhh hhhhhhheeeeee hhhhhhhhh eeeeee
FLAV_ECOLI    -AI-TGIFFGSDTGNTENIaKMIQKQLGKD--VADV-HDIAKSS-KEDLEAYDILLLgIPTWYYGEA--------QCDWDDFFPT-LEEIDFNGKLVALf
  ss          eee hhhhhhhhhhhh eee hhh hhhhhhheeeee hhhhh eeeeee
FLAV_AZOVI    -AK-IGLFFGSNTGKTRKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFLPK-IEGLDFSGKTVALf
  ss          eee hhhhhhhhhhhhh hhh hhhhhhheeeee hhhhhhhhh eeeeee
FLAV_ENTAG    MAT-IGIFFGSDTGQTRKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFTNT-LSEADLTGKTVALf
  ss          eeee hhhhhhhhhhhh hhh hhhhhhheeeee hhhhh eeeee
4fxn          ----MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVNIDELLNE-DILILGCSAMGDEVL------E-ESEFEPFIEE-IST-KISGKKVALF
  ss          eeeee ssshhhhhhhhhhhhhhhtt eeeettt sttttt seeeeee btttb ttthhhhhhh hst t tt eeeee
FLAV_MEGEL    M---VEIVYWSGTGNTEAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVASK-DVILLgCPAMGSEEL------E-DSVVEPFFTD-LAP-KLKGKKVGLf
  ss          hhhhhhhhhhhhhh eeeee hhhhhhhh eeeee eeeee
FLAV_CLOAB    M-K-ISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNL-DAVDKKFLQESEGIIFgTPTY-YANI--------SWEMKKWIDE-SSEFNLEGKLGAAf
  ss          eee hhhhhhhhhhhhhh eeeeee hhhhhhhhhh eeee hhhhhhhhh eeeee
3chy          ADKELKFLVVDDFSTMRRIVRNLLKELGFNN-VEEAEDGV-DALNKLQAGGYGFVISD---WNMPNM----------DGLELLKTIRADGAMSALPVLMV
  ss          tt eeee s hhhhhhhhhhhhhht eeeesshh hhhhhhhh eeeee s sss hhhhhhhhhh ttttt eeee

1fx1          GCGDS-SY-EYFCGAVDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI--------
  ss          eee s ss sstthhhhhhhhhhhttt ee s eeees gggghhhhhhhhhhhhhh
FLAV_DESVH    GCGDS-SY-EYFCGAVDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI--------
  ss          eee hhhhhhhhhhhh eeeee eeeee hhhhhhhhhhhhhh
FLAV_DESGI    GCGDS-SY-TYFCGAVDVIEKKAEELgATLVAS---------------------SLKIDGE--P--DSAEVLDwAREVLARV--------
  ss          eee hhhhhhhhhhhh eeeee hhhhhhhhhhh
FLAV_DESSA    GCGDS-DY-TYFCGAVDAIEEKLEKMgAVVIGD---------------------SLKIDGD--P--ERDEIVSwGSGIADKI--------
  ss          hhhhhhhhhhhh eeeee e eee
FLAV_DESDE    ASGDQ-EY-EHFCGAVPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL--------
  ss          e hhhhhhhhhhhhhh eeeee ee hhhhhhhhhhh
2fcr          GLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
  ss          eee ttt ttsttthhhhhhhhhhhtt eee b gggs s tteet teesseeeettt ss hhhhhhhhhhhhhhhht
FLAV_ANASP    GTGDQIGYADNFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------
  ss          hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhh
FLAV_ECOLI    GCGDQEDYAEYFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
  ss          hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhhhh
FLAV_AZOVI    GLGDQVGYPENYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--
  ss          e hhhhhhhhhhhhhh eeeee hhhhhhhhhhh
FLAV_ENTAG    GLGDQLNYSKNFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------
  ss          hhhhhhhhhhhhhhh eeee hhhhhhh hhhhhhhhhhhh
4fxn          G-----SYGWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI---------
  ss          e eesss shhhhhhhhhhhhtt ee s eeees ggghhhhhhhhhhhht
FLAV_MEGEL    G-----SYGWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNAPE-CKElGEAAAKA---------
  ss          hhhhhhhhhhh eeeee eeee h hhhhhhhh
FLAV_CLOAB    STANSIA-GGSDIALLTILNHLMVK-gMLVYSG----GVAFGKPKTHLG-----YVHINEI--QENEDENARIfGERiANkV--KQIF--
  ss          hhhhhhhhhhhhhh eeeee hhhh hhh hhhhhhhhhhhh h
3chy          -----------TAEAKKENIIAAAQAGASGY-------------------------VVK----P-FTAATLEEKLNKIFEKLGM------
  ss          ess hhhhhhhhhtt see ees s hhhhhhhhhhhhhhht

G
25
Strategies for multiple sequence alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objectives:
  • Instead of single amino acid positions, focus on
    local alignments
  • Consider best local alignment through each cell
    in DP matrix
  • Try to avoid (early) errors

26
Globalised local alignment
  1. Local (SW) alignment (M + Po,e)
  2. Global (NW) alignment (no M or Po,e)
Double dynamic programming
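
The two passes can be sketched as follows, under stated assumptions: pass 1 computes, for every cell, the score of the best local (Smith-Waterman) alignment that goes through that cell, here with made-up match/mismatch values standing in for the matrix M and a single linear gap penalty standing in for Po,e; pass 2 runs a global (Needleman-Wunsch) alignment over those cell scores with no substitution matrix and no gap penalties. The function names are hypothetical and this is not taken from any particular program.

```python
def sw_matrix(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman matrix: H[i][j] = best local score ending at prefixes i, j."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H

def through_scores(a, b, match=2, mismatch=-1, gap=-2):
    """Pass 1: best local alignment score through each cell (i, j)."""
    n, m = len(a), len(b)
    fwd = sw_matrix(a, b, match, mismatch, gap)              # part before the cell
    bwd = sw_matrix(a[::-1], b[::-1], match, mismatch, gap)  # part after the cell
    T = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = match if a[i] == b[j] else mismatch
            T[i][j] = fwd[i][j] + s + bwd[n - 1 - i][m - 1 - j]
    return T

def global_over(T):
    """Pass 2: Needleman-Wunsch over T, without matrix or gap penalties."""
    n, m = len(T), len(T[0])
    G = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = max(G[i - 1][j - 1] + T[i - 1][j - 1], G[i - 1][j], G[i][j - 1])
    # Traceback: collect the matched residue pairs (0-based positions).
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if G[i][j] == G[i - 1][j - 1] + T[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif G[i][j] == G[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return G[n][m], pairs[::-1]

T = through_scores("ACD", "ACD")
score, pairs = global_over(T)
print(score, pairs)  # 18 [(0, 0), (1, 1), (2, 2)]
```

Because every cell already carries the score of a whole local alignment, a cell in a well-conserved region scores high even if its own residue pair is a mismatch, which is the point of focusing on local alignments instead of single positions.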
27
Strategies for multiple sequence alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objective try to avoid (early) errors

28
Integrating alignment methods and alignment
information with T-Coffee
  • Integrating different pair-wise alignment
    techniques (NW, SW, ...)
  • Combining different multiple alignment methods
    (consensus multiple alignment)
  • Combining sequence alignment methods with
    structural alignment techniques
  • Plug in user knowledge

29
Matrix extension
  • T-Coffee: Tree-based Consistency Objective
    Function For alignmEnt Evaluation
  • Cedric Notredame, Des Higgins, Jaap Heringa
  • J. Mol. Biol., 302, 205-217 (2000)

30
Using different sources of alignment information

[Figure: alignment information from several sources (structure alignments, Clustal, Dialign, Lalign, manual alignments) is fed into T-Coffee.]
31
Search matrix extension: alignment transitivity
32
T-Coffee
[Figure: the direct alignment of two sequences shown together with their alignments via other sequences.]
33
Search matrix extension
34
But...
  • T-COFFEE (V1.23) multiple sequence alignment
  • Flavodoxin-cheY
1fx1          ----PKALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESVH    ---MPKALIVYGSTTGNTEYTAETIARELADAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESGI    ---MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-METTVVNVADVT-APGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPL-YEDLDRAGLKDKK-----
FLAV_DESSA    ---MSKSLIVYGSTTGNTETAAEYVAEAFENKE-IDVELKNVTDVS-VADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPL-YDSLENADLKGKK-----
FLAV_DESDE    ---MSKVLIVFGSSTGNTESIAQKLEELIAAGG-HEVTLLNAADAS-AENLADGYDAVLFGCSAWGMEDLE------MQDDFLSL-FEEFNRFGLAGRK-----
4fxn          ------MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVN-IDELL-NEDILILGCSAMGDEVLE-------ESEFEPF-IEEIS-TKISGKK-----
FLAV_MEGEL    -----MVEIVYWSGTGNTEAMANEIEAAVKAAG-ADVESVRFEDTN-VDDVA-SKDVILLGCPAMGSEELE-------DSVVEPF-FTDLA-PKLKGKK-----
FLAV_CLOAB    ----MKISILYSSKTGKTERVAKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQ-ESEGIIFGTPTYYAN---------ISWEMKKW-IDESSEFNLEGKL-----
2fcr          -----KIGIFFSTSTGNTTEVADFIGKTLGAKA---DAPIDVDDVTDPQAL-KDYDLLFLGAPTWNTGA----DTERSGTSWDEFLYDKLPEVDMKDLP-----
FLAV_ENTAG    ---MATIGIFFGSDTGQTRKVAKLIHQKLDGIA---DAPLDVRRAT-REQF-LSYPVLLLGTPTLGDGELPGVEAGSQYDSWQEF-TNTLSEADLTGKT-----
FLAV_ANASP    ---SKKIGLFYGTQTGKTESVAEIIRDEFGNDV---VTLHDVSQAE-VTDL-NDYQYLIIGCPTWNIGEL--------QSDWEGL-YSELDDVDFNGKL-----
FLAV_AZOVI    ----AKIGLFFGSNTGKTRKVAKSIKKRFDDET-M-SDALNVNRVS-AEDF-AQYQFLILGTPTLGEGELPGLSSDCENESWEEF-LPKIEGLDFSGKT-----
FLAV_ECOLI    ----AITGIFFGSDTGNTENIAKMIQKQLGKDV---ADVHDIAKSS-KEDL-EAYDILLLGIPTWYYGEA--------QCDWDDF-FPTLEEIDFNGKL-----
3chy          ADKELKFLVVD--DFSTMRRIVRNLLKELGFN-NVE-EAEDGVDALNKLQ-AGGYGFVISDWNMPNMDGLE--------------LLKTIRADGAMSALPVLMV
              . . . .

35
Multiple alignment methods
  • Multi-dimensional dynamic programming => extension
    of pairwise sequence alignment.
  • Progressive alignment => incorporates phylogenetic
    information to guide the alignment process.
  • Iterative alignment => corrects for problems with
    progressive alignment by repeatedly realigning
    subgroups of sequences.