Introduction to bioinformatics - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to bioinformatics

Description:

Title: Introduction to bioinformatics Author: pirovano Last modified by: heringa Created Date: 3/14/2006 9:06:45 AM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 91
Provided by: pir80

Transcript and Presenter's Notes



1
Introduction to bioinformatics 2007, Lecture 10
Multiple Sequence Alignment (II)
2
Progressive multiple alignment
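(Figure: all pairs of sequences 1 to 5 are aligned, giving pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5). The scores fill a 5×5 similarity matrix, the scores are converted to distances, the distances give a guide tree, and the guide tree directs the multiple alignment. Iteration possibilities are indicated between these steps.)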
3
Progressive alignment strategy
  1. Perform pair-wise alignments of all of the
    sequences (all against all, i.e. make N(N-1)/2
    alignments)
  2. Use the alignment scores to make a similarity (or
    distance) matrix
  3. Use that matrix to produce a guide tree
  4. Align the sequences successively, guided by the
    order and relationships indicated by the tree
    (N-1 alignment steps).
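Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the method of any particular program: the `identity_score` function is a hypothetical stand-in for a real pairwise alignment score, and the example sequences are made up.

```python
from itertools import combinations


def identity_score(a: str, b: str) -> float:
    """Toy similarity (assumption): fraction of identical positions,
    compared position by position without any real alignment."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def distance_matrix(seqs):
    """Score all N(N-1)/2 pairs and convert similarities to distances."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    pairs = list(combinations(range(n), 2))  # all-against-all pairs
    for i, j in pairs:
        d = 1.0 - identity_score(seqs[i], seqs[j])
        dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return pairs, dist


seqs = ["MKIVYW", "MKIVFW", "AKIGLF", "AKIGIF", "DKELKF"]
pairs, dist = distance_matrix(seqs)
print(len(pairs))  # number of pairwise comparisons for N = 5
```

The resulting distance matrix is what a tree-building method would take as input in step 3.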

4
Progressive alignment strategy
  • Methods
  • Biopat (Hogeweg and Hesper 1984 -- first
    integrated method ever)
  • MULTAL (Taylor 1987)
  • DIALIGN (1 & 2, Morgenstern 1996)
  • PRRP (Gotoh 1996)
  • ClustalW (Thompson et al 1994)
  • PRALINE (Heringa 1999)
  • T-Coffee (Notredame 2000)
  • POA (Lee 2002)
  • MUSCLE (Edgar 2004)
  • PROBCONS (Do et al., 2005)

5
Pair-wise alignment quality versus sequence
identity (Vogt et al., JMB 249, 816-831, 1995)
6
Flavodoxin fold: aligning 13 flavodoxins and cheY
5(βα) fold
7
Flavodoxin-cheY NJ tree
8
Flavodoxin fold helix-beta-helix
9
Flavodoxin family - TOPS diagrams
The basic topology of the flavodoxin fold is
given below; the other four TOPS diagrams show
flavodoxin folds with local insertions of
secondary structure elements.
(Figure: TOPS diagrams with numbered secondary structure elements. Legend: α-helix, β-strand.)
10
Flavodoxin-cheY NJ tree
11
Flavodoxin-cheY Pre-processing (prepro?1500)
12
Protein structure hierarchical levels
TERTIARY STRUCTURE (fold)
13
Clustal, ClustalW, ClustalX
  • CLUSTAL W/X (Thompson et al., 1994) uses the
    Neighbour Joining (NJ) algorithm (Saitou and Nei,
    1987), widely used in phylogenetic analysis, to
    construct a guide tree (see lecture on
    phylogenetic methods).
  • Sequence blocks are represented by a profile, in
    which the individual sequences are additionally
    weighted according to the branch lengths in the
    NJ tree.
  • Further carefully crafted heuristics include:
  • (i) local gap penalties
  • (ii) automatic selection of the amino acid
    substitution matrix
  • (iii) automatic gap penalty adjustment
  • (iv) a mechanism to delay alignment of sequences
    that appear to be distant at the time they are
    considered.
  • CLUSTAL (W/X) does not allow iteration (unlike the
    methods of Hogeweg and Hesper, 1984; Corpet, 1988;
    Gotoh, 1996; Heringa, 1999, 2002).
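The idea of a sequence-weighted profile can be illustrated with a small sketch. It assumes that one weight per sequence is already available (for example derived from branch lengths in the guide tree); the weights and the three-residue block below are invented, and this is not ClustalW's actual weighting code.

```python
from collections import defaultdict


def weighted_profile(block, weights):
    """Per-column residue frequencies in which each sequence
    contributes its (normalised) weight instead of a count of 1."""
    total = sum(weights)
    profile = []
    for column in zip(*block):  # walk through the alignment columns
        freqs = defaultdict(float)
        for residue, w in zip(column, weights):
            freqs[residue] += w / total
        profile.append(dict(freqs))
    return profile


block = ["MKV", "MRV", "LKV"]   # hypothetical aligned block
weights = [0.5, 0.25, 0.25]     # hypothetical sequence weights
profile = weighted_profile(block, weights)
print(profile[0])  # frequencies of M and L in the first column
```

Down-weighting closely related sequences in this way keeps a large group of near-identical sequences from dominating the profile.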

14
ClustalW web-interface
15
CLUSTAL X (1.64b) multiple sequence alignment, Flavodoxin-cheY

1fx1        -PKALIVYGSTTGNTEYTAETIARQLANAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESVH  MPKALIVYGSTTGNTEYTAETIARELADAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESGI  MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-M-ETTVVNVADVTAPGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPLYE-DLDRAGLKDKK
FLAV_DESSA  MSKSLIVYGSTTGNTETAAEYVAEAFENKE-I-DVELKNVTDVSVADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPLYD-SLENADLKGKK
FLAV_DESDE  MSKVLIVFGSSTGNTESIAQKLEELIAAGG-H-EVTLLNAADASAENLADGYDAVLFGCSAWGMEDLE------MQDDFLSLFE-EFNRFGLAGRK
FLAV_CLOAB  -MKISILYSSKTGKTERVAKLIEEGVKRSGNI-EVKTMNLDAVDKKFLQE-SEGIIFGTPTYYAN---------ISWEMKKWID-ESSEFNLEGKL
FLAV_MEGEL  --MVEIVYWSGTGNTEAMANEIEAAVKAAG-A-DVESVRFEDTNVDDVAS-KDVILLGCPAMGSE--E------LEDSVVEPFF-TDLAPKLKGKK
4fxn        ---MKIVYWSGTGNTEKMAELIAKGIIESG-K-DVNTINVSDVNIDELLN-EDILILGCSAMGDE--V------LEESEFEPFI-EEISTKISGKK
FLAV_ANASP  SKKIGLFYGTQTGKTESVAEIIRDEFGNDVVT----LHDVSQAEVTDLND-YQYLIIGCPTWNIGELQ---SD-----WEGLYS-ELDDVDFNGKL
FLAV_AZOVI  -AKIGLFFGSNTGKTRKVAKSIKKRFDDETMSD---ALNVNRVSAEDFAQ-YQFLILGTPTLGEGELPGLSSDCENESWEEFLP-KIEGLDFSGKT
2fcr        --KIGIFFSTSTGNTTEVADFIGKTLGAKADAP---IDVDDVTDPQALKD-YDLLFLGAPTWNTGADTERSGT----SWDEFLYDKLPEVDMKDLP
FLAV_ENTAG  MATIGIFFGSDTGQTRKVAKLIHQKLDGIADAP---LDVRRATREQFLS--YPVLLLGTPTLGDGELPGVEAGSQYDSWQEFTN-TLSEADLTGKT
FLAV_ECOLI  -AITGIFFGSDTGNTENIAKMIQKQLGKDVAD----VHDIAKSSKEDLEA-YDILLLGIPTWYYGEAQ-CD-------WDDFFP-TLEEIDFNGKL
3chy        --ADKELKFLVVDDFSTMRRIVRNLLKELG----FNNVEEAEDGVDALN------KLQAGGYGFV--I------SDWNMPNMDG-LELLKTIR---

The secondary structures of 4 sequences are known
and can be used to assess the alignment (red is
β-strand, blue is α-helix)
16
There are problems
  • Accuracy is very important !!!!
  • Progressive multiple alignment is a greedy
    strategy: alignment errors made during the
    construction of the MSA cannot be repaired
    afterwards, and these errors are propagated into
    later progressive steps.
  • Comparisons of sequences at early steps during
    progressive alignment cannot make use of
    information from other sequences.
  • It is only later during the alignment progression
    that more information from other sequences (e.g.
    through profile representation) becomes employed
    in the alignment steps.

17
Progressive multiple alignment
"Once a gap, always a gap" (Feng & Doolittle, 1987)
18
Additional strategies for multiple sequence
alignment
  • Profile pre-processing (Praline)
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objective: try to avoid (early) errors

19
PRALINE web-interface
20
Profile pre-processing
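(Figure: the pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) are used to build, for key sequence 1, a pre-alignment in which sequences 2 to 5 are aligned to it (master-slave, N-to-1 alignment). The pre-alignment is then converted into a pre-profile for sequence 1, with one column per amino acid type (A, C, D, ..., Y) plus the columns labelled Pi and Px.)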
21
Pre-profile generation
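(Figure: the pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) are compared with a cut-off to decide which sequences enter each pre-alignment. Each of the sequences 1 to 5 acts in turn as the key sequence of its own pre-alignment, shown for sequences 1, 2 and 5, and every pre-alignment is converted into a pre-profile with columns A, C, D, ..., Y.)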
22
Pre-profile alignment
(Figure: the pre-profiles of sequences 1 to 5, each with columns A, C, D, ..., Y, are aligned progressively to give the final alignment of sequences 1 to 5.)
23
Pre-profile alignment
(Figure: the five pre-alignments, each headed by its key sequence (1 to 5) with the other sequences aligned to it, are combined into the final alignment of sequences 1 to 5.)
24
Pre-profile alignment: alignment consistency
(Figure: the five pre-alignments, each headed by its key sequence, with residue Ala131 of sequence 1 traced through them. Residue labels shown in the figure: A131, A131, L133, C126, A131.)
25
PRALINE pre-profile generation
  • Idea: use the information from all query
    sequences to make a pre-profile for each query
    sequence that contains information from other
    sequences
  • You can use all sequences in each pre-profile, or
    use only those sequences that will probably align
    correctly. Incorrectly aligned sequences in the
    pre-profiles will increase the noise level.
  • Select using the alignment score: only allow
    sequences into a pre-profile if their alignment with
    the key sequence scores higher than a given
    threshold value. In PRALINE, this threshold is given
    as prepro1500 (alignment score threshold value is
    1500; see next two slides)
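The selection rule can be sketched in a few lines. This is an illustration under stated assumptions, not PRALINE's code: the score table is invented, and `select_preprofile_members` is a hypothetical helper name.

```python
def select_preprofile_members(scores, key, threshold):
    """Return the sequences allowed into the pre-profile of `key`:
    those whose pairwise alignment score with `key` exceeds `threshold`.
    `scores` maps a pair (i, j) with i < j to its alignment score."""
    members = []
    for (i, j), score in scores.items():
        if key in (i, j) and score > threshold:
            members.append(j if i == key else i)
    return sorted(members)


# Hypothetical pairwise alignment scores for sequences 1 to 4
scores = {
    (1, 2): 2100, (1, 3): 1620, (1, 4): 900,
    (2, 3): 1750, (2, 4): 1400, (3, 4): 1550,
}

strict = select_preprofile_members(scores, key=1, threshold=1500)
lenient = select_preprofile_members(scores, key=1, threshold=0)
print(strict, lenient)
```

With a threshold of 0 every other sequence enters the pre-profile; a higher threshold keeps out the low-scoring sequences that are more likely to be misaligned.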

26
Reliable sequences for pre-profiles
The curve each time gives the number of pairwise
alignments (y) scoring less than x. The range
1500 < x < 1800 shows a flat section of the curve
that can serve as a natural cut-off point for
admitting sequences into the pre-alignment blocks
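One way to locate such a flat section automatically is to look for the widest gap between consecutive sorted scores, since the cumulative curve cannot rise where no scores occur. The sketch below assumes this heuristic and uses invented scores; the slides do not say how the cut-off was actually picked.

```python
def cumulative_counts(scores, xs):
    """For each x, the number of pairwise alignments scoring less than x."""
    return [sum(s < x for s in scores) for x in xs]


def widest_flat_section(scores):
    """Return (low, high): the widest score interval containing no
    alignment score, i.e. the longest flat part of the cumulative curve."""
    ordered = sorted(scores)
    gaps = [(b - a, a, b) for a, b in zip(ordered, ordered[1:])]
    _, low, high = max(gaps)
    return low, high


# Hypothetical pairwise alignment scores
scores = [400, 650, 900, 1100, 1300, 1450, 1850, 2000, 2100]
low, high = widest_flat_section(scores)
print(low, high, cumulative_counts(scores, [1500, 1800]))
```

Any threshold inside the returned interval admits the same set of sequences, which is what makes it a natural cut-off.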
27
Global pre-processing (prepro?0)
  • Preprocessed profile for sequence 2
  • 2fcr KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDV
    DDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKD
    LPVAIFGLGDAEGYPD
  • 1fx1 KALIVYGSTTGNTEYTAETIARQL-ANAGYEVDS
    RDAASVEAFEGFDLVLLGCSTW--GDD---SIELQDDFLFDSLEETGAQG
    RKVACFGCGDS-SY-E
  • 4fxn -MKIVYWSGTGNTEKMAELIAKGISGKDVNTINV
    SDVNIDELLNE-DILILGC---SAMGDEVLEESEFEPFIEEISTKISGKK
    VALGSYGWGDGKWMRD
  • FLAV_ANASP KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDV
    SEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNG
    KLVAYfGTGDQIGYAD
  • FLAV_AZOVI KIGLFFGSNTGKTRKVaKSIKKRFDTMSDA-LNV
    NRVS-AEDFAQYQFLILgTPTLGPGLSSDCENESWEEFL-PKIEGLDFSG
    KTVALfGLGDQVGYPE
  • FLAV_CLOAB KISILYSSKTGKTERVaKLIEE--GVKRSGNIEV
    KDAVDKKFLQESEGIIFgTPTYYANISWEMK--KW----IDESSEFNLEG
    KLGAAfSTANAGGSDI
  • FLAV_DESDE KVLIVFGSSTGNTESIaQKLEELIAA-GGHEVTL
    LNAADASALADYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAG
    RKVAAfASGDQE-Y-E
  • FLAV_DESGI KALIVYGSTTGNTEGVaEAIAKTLNSEGTTVVNV
    ADVTAPGLAEGYDVVLLgCSTW--GDDEIELQEDFVP-LYEDLDRAGLKD
    KKVGVfGCGDS-SY-T
  • FLAV_DESSA KSLIVYGSTTGNTETAaEYVAEAFENK-EIDVEL
    KNVTDVSVANGYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKG
    KKVSVfGCGDSD-Y-T
  • FLAV_DESVH KALIVYGSTTGNTEYTaETIAREL-ADAGYEVDS
    RDAASVEAFEGFDLVLLgCSTW--GDD---SIELQDDFLFDSLEETGAQG
    RKVACfGCGDS-SY-E
  • FLAV_ECOLI AIGIFFGSDTGNTENIaKMIQKQLG--KDV-ADV
    HDISSKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNG
    KLVALfGCGDQEDYAE
  • FLAV_ENTAG TIGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDV
    RRATREQFL-SYPVLLLgTPTLGDGLPGVEAGSSWQEFT-NTLSEADLTG
    KTVALfGLGDQLNYSK
  • FLAV_MEGEL MVEIVYWSGTGNTEAMaNEIEAAVAAGADVSVRF
    ED-TNVDDVASKDVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKG
    KKVGLfGYGWGSG---
  • 3chy KELKFLVVDDFSTRRIVRNLLKELGFNEEAEDGV
    DALNKLQA-GGYGFVI---SDWNM---PNMDGL---ELLKTIRADGAMSA
    LPVLMV---TAEAKKE
  • 2fcr NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEES
    KSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV
  • 1fx1 YFCGAVDAIEEKLKNLGA----------------
    EIVQD----GLRID--GDPRAARDDIVGWAHDVRGAI--

28
Global pre-processing (prepro?0)
  • Preprocessed profile for sequence 3
  • 4fxn MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTIN
    VSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVAL
    FGSYGWGDGKWMRDFE
  • 1fx1 ALIVYGSTTGNTEYTAETIARQLANAGYEVDSRD
    AASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVAC
    FGSYEYFCGA-VDAIE
  • 2fcr IGIFFSTSTGNTTEVADFIGKTL--GAKADAPID
    VDDVTDPQALKDDLLFLGANTGADTERSGTSWDEFLYDKLPEVDMKDLPV
    -AIFGLGDAEGYPDFC
  • FLAV_ANASP IGLFYGTQTGKTESVaEIIRD---EFGNDVVTLD
    VSQAEVTDLNDYQYLIIgCPTWNIGEL-QSDWEGLYSELDVDFNGKLVAY
    fGTIGYADNDAIGILE
  • FLAV_AZOVI IGLFFGSNTGKTRKVaKSIKKRFDDETMS-DALN
    VNRVSAEDFAQYQFLILgTPTLGEGELENESWEEFLPKIGLDFSGKTVAL
    fGQVGYPEGELYSFFK
  • FLAV_CLOAB MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMN
    LDAVDKKFLQESEGIIFgTPTYYANI--SWEMKKWIDESSENLEGKLGAA
    fSTAGGSDIALLTILN
  • FLAV_DESDE VLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLN
    AADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAA
    fAS---GDQEYVPAIE
  • FLAV_DESGI ALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVN
    VADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGV
    fGSYTYFCGA-VDVIE
  • FLAV_DESSA MSIVYGSTTGNTETAaEYVAEAFENKEIDVELKN
    VTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSV
    fGDYTYFCGA-VDAIE
  • FLAV_DESVH ALIVYGSTTGNTEYTaETIARELADAGYEVDSRD
    AASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVAC
    fGSYEYFCGA-VDAIE
  • FLAV_ECOLI TGIFFGSDTGNTENIaKMIQK---QLGKDVADVD
    IAKSSKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVAL
    fGDYAFCDAGTIRDIE
  • FLAV_ENTAG IGIFFGSDTGQTRKVaKLIHQK-LDGIADA-PLD
    VRRATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVAL
    fGNYSKNFVSAMRILY
  • FLAV_MEGEL VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVR
    FEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGL
    fGSYGWGSGEWMDAWK
  • 3chy DKELKFLVVDDFSTMRRIVRNLLKELG--FNNVE
    EAEDGVD-ALNK-LQAGGYGVISDWNMPNMDGLELLKTI--RADGAMSAL
    PVLMVTAEAKKENIIA
  • 4fxn ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKK
    IANI
  • 1fx1 EKLKNLGAEIVQDGLRIDGDPRAARDDIVGWAHD
    VRGA

29
Reliable sequences for pre-profiles
30
Pre-profiles (prepro?1500)
1
2
31
Pre-profiles (prepro?1500)
13
14
32

Local pre-processing
Local alignments are calculated from high to low
scoring. Each time, the sequence parts
corresponding to a selected local alignment are
blocked, such that a next local alignment has to
emerge before or after the earlier selected one.
This preserves co-linearity of the local
alignments and associated sequence fragments in
the pre-alignments.
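The blocking rule can be sketched as a greedy selection. The sketch assumes that candidate local alignments are already available as tuples `(score, key_start, key_end, other_start, other_end)` with half-open intervals; the tuple layout and the candidate values are invented for illustration.

```python
def select_colinear(local_alignments):
    """Pick local alignments from high to low score, accepting a
    candidate only if it lies entirely before or entirely after every
    previously accepted one in BOTH sequences (co-linearity)."""
    chosen = []
    ranked = sorted(local_alignments, key=lambda a: a[0], reverse=True)
    for cand in ranked:
        _, ks, ke, os_, oe = cand
        compatible = True
        for _, cks, cke, cos, coe in chosen:
            before = ke <= cks and oe <= cos
            after = ks >= cke and os_ >= coe
            if not (before or after):
                compatible = False  # overlaps or crosses a blocked region
                break
        if compatible:
            chosen.append(cand)
    return sorted(chosen, key=lambda a: a[1])  # order along the key sequence


candidates = [
    (50, 10, 30, 12, 32),  # best local alignment, selected first
    (40, 0, 8, 0, 8),      # lies before it in both sequences
    (35, 25, 40, 40, 55),  # overlaps the blocked key region
    (30, 35, 50, 5, 11),   # after in the key, before in the other: crossing
    (20, 32, 45, 34, 47),  # lies after it in both sequences
]
selected = select_colinear(candidates)
print(selected)
```

The overlapping and the crossing candidates are rejected, so the fragments that remain can be laid out in a single left-to-right pre-alignment.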
33
Local pre-processing (locprepro?0)
  • Preprocessed profile for sequence 2 2fcr
  • 2fcr KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDV
    DDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKD
    LPVAIFGLGDAEGYPD
  • 1fx1 ...IVYGSTTGNTEYTAETIARQL---ANAGYEV
    DDAASVEAFEGFDLVLLGCSTW--GDDSELQ----DDFLFDSLEETGAQG
    RKVACFGCGDS-SY-E
  • 4fxn KI-VYWS-GTGNTEKMAELIAKGIGKDVNT-INV
    SDVNIDELLNE-DILILGCSA--MGDEVEES--EFEPF----IEEISTKG
    KKVALFGWGDGKGYG-
  • FLAV_ANASP KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDV
    SEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNG
    KLVAYfGTGDQIGYAD
  • FLAV_AZOVI KIGLFFGSNTGKTRKVaKSIKKTM---SDA-LNV
    NRVS-AEDFAQYQFLILgTPTLGEGSDCENE--SWEEFL-PKIEGLDFSG
    KTVALfGLGDQVGYPE
  • FLAV_CLOAB KISILYSSKTGKTERVaKLIEE--GVKRSGNIEV
    KDAVDKKFLQESEGIIFgTPTY-------YANISWEKWI-DESSEFNLEG
    KLGAAfSTANSAGGSD
  • FLAV_DESDE KVLIVFGSSTGNTESIaQKLEELIAAAADA--SA
    ENLAD-----GYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAG
    RKVAAfASGDQE-Y-E
  • FLAV_DESGI ...IVYGSTTGNTEGVaEAIAKTLNSEGTTVVNV
    ADVTAPGLAEGYDVVLLgCSTW--GDDIELQ----EDFLYEDLDRAGLKD
    KKVGVfGCGDS-SY-T
  • FLAV_DESSA ...IVYGSTTGNTETAaEYVAEAFENK---EIDV
    ENVTD-VSVADYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKG
    KKVSVfGCGDSD-Y-T
  • FLAV_DESVH ...IVYGSTTGNTEYTaETIAREL---ADAGYEV
    DDAASVEAFEGFDLVLLgCSTW--GDDSELQ----DDFLFDSLEETGAQG
    RKVACfGCGDS-SY-E
  • FLAV_ECOLI ..GIFFGSDTGNTENIaKMIQKQLG-K-----DV
    ADVHDKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNG
    KLVALfGCGDQEDYAE
  • FLAV_ENTAG .IGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDV
    RRATREQFL-SYPVLLLgTPT--LG-DGELPGVSWQEFT-NTLSEADLTG
    KTVALfGLGDQLNYSK
  • FLAV_MEGEL .VEIVYWSGTGNTEAMaNEIEKAAGADVESDTNV
    DDV----ASK--DVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKG
    KKVGLfGYGWGSG---
  • 3chy ..................................
    .........................ADKELKFLVVDDFIVRNL----LKE
    L-----GFNNVEEAED
  • 2fcr NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEES
    KSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV

34
Local pre-processing (locprepro?0)
  • Preprocessed profile for sequence 3 4fxn
  • 4fxn MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTIN
    VSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVAL
    FGSYGWGDGKWMRDFE
  • 1fx1 ..IVYGSTTGNTEYTAETIARQLANAGYEVDSRD
    AASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVAC
    FGC---GDSSYVDAIE
  • 2fcr .KIIFFSSTGNTTEVADFIGKTL---GAKADAID
    VDDVTDPQALKDDLLFLGAPTTGADT-ERSSWDEFLPEVDMK--DLPVAI
    F---GLGDAE------
  • FLAV_ANASP ..LFYGTQTGKTESVaEIIRD---EFGNDVVTLD
    VSQAEVTDLNDYQYLIIgCPTIGE--L-QSDWEGLYSELDVDFNGKLVAY
    fGTIGYADGKWSTDFN
  • FLAV_AZOVI ..LFFGSNTGKTRKVaKSIKKRFDETMSD--ALN
    VNRVSAEDFAQYQFLILgTPTLGEGELNESEFLPKIEGLD--FSGKTVAL
    fGQVGYGEGSWSTD--
  • FLAV_CLOAB MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMN
    LDAVD-KKFLQEEGIIFgTPTMKKWIDESSEFN--LEAfSTANSGSDIAL
    LGGVAFGKPK------
  • FLAV_DESDE ..IVFGSSTGNTEKLEELIAAG----GHEVTLLN
    AADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAA
    fAS---GDQEY-EHFE
  • FLAV_DESGI ..IVYGSTTGNTEGVaEAIAKTLNSEGMETTVVN
    VADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGV
    fGC---GDSSYTYDIE
  • FLAV_DESSA ..IVYGSTTGNTETAaEYVAEAFENKEIDVELKN
    VTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSV
    fGC---GDS----DYE
  • FLAV_DESVH ..IVYGSTTGNTEYTaETIARELADAGYEVDSRD
    AASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVAC
    fGC---GDSSYVDAIE
  • FLAV_ECOLI ..IFFGSDTGNTENIaKMIQK---QLGKDV--AD
    VHDISKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVAL
    fGC---GD---QEDYA
  • FLAV_ENTAG ..IFFGSDTGQTRKVaKLIHQGIADAPLDVRR--
    ---ATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVAL
    f---GLGDQNYSKNFV
  • FLAV_MEGEL VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVR
    FEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGL
    fGSYGWGSGEWMDAWK
  • 3chy .RIV......N...LKEL---GFVEEAEDVDALN
    ISDPNMDELLRADVLMVTAEAKKENIIAAAQVKPFLEEKLNKIFEK....
    ................
  • 4fxn ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKK
    IANI

35
  • CLUSTAL X (1.64b) multiple sequence alignment
    Flavodoxin-cheY (the same alignment as shown on slide 15)

36
Flavodoxin-cheY Pre-processing (prepro?1500)
  • 1fx1 -PKALIVYGSTTGNT-EYTAETIARQLANAG-YE
    VDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLF-
    DSLEETGAQGRKVACF
  • FLAV_DESDE MSKVLIVFGSSTGNT-ESIaQKLEELIAAGG-HE
    VTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLF-
    EEFNRFGLAGRKVAAf
  • FLAV_DESVH MPKALIVYGSTTGNT-EYTaETIARELADAG-YE
    VDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLF-
    DSLEETGAQGRKVACf
  • FLAV_DESSA MSKSLIVYGSTTGNT-ETAaEYVAEAFENKE-ID
    VELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLY-
    DSLENADLKGKKVSVf
  • FLAV_DESGI MPKALIVYGSTTGNT-EGVaEAIAKTLNSEG-ME
    TTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLY-
    EDLDRAGLKDKKVGVf
  • 2fcr --KIGIFFSTSTGNT-TEVADFIGKTLGA---KA
    DAPIDVDDVTDPQALKDYDLLFLGAPTWNTG----ADTERSGTSWDEFLY
    DKLPEVDMKDLPVAIF
  • FLAV_AZOVI -AKIGLFFGSNTGKT-RKVaKSIKKRFDDET-MS
    DA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFL-
    PKIEGLDFSGKTVALf
  • FLAV_ENTAG MATIGIFFGSDTGQT-RKVaKLIHQKLDG---IA
    DAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFT-
    NTLSEADLTGKTVALf
  • FLAV_ANASP SKKIGLFYGTQTGKT-ESVaEIIRDEFGN---DV
    VTLHDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLY-
    SELDDVDFNGKLVAYf
  • FLAV_ECOLI -AITGIFFGSDTGNT-ENIaKMIQKQLGK---DV
    ADVHDIAKSS-KEDLEAYDILLLgIPTWYYGE--------AQCDWDDFF-
    PTLEEIDFNGKLVALf
  • 4fxn -MK--IVYWSGTGNT-EKMAELIAKGIIESG-KD
    VNTINVSDVNIDELL-NEDILILGCSAMGDEVL-------EESEFEPFI-
    EEIS-TKISGKKVALF
  • FLAV_MEGEL MVE--IVYWSGTGNT-EAMaNEIEAAVKAAG-AD
    VESVRFEDTNVDDVA-SKDVILLgCPAMGSEEL-------EDSVVEPFF-
    TDLA-PKLKGKKVGLf
  • FLAV_CLOAB -MKISILYSSKTGKT-ERVaKLIEEGVKRSGNIE
    VKTMNLDAVD-KKFLQESEGIIFgTPTYYAN---------ISWEMKKWI-
    DESSEFNLEGKLGAAf
  • 3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFN--NV
    EEAEDGVDALNKLQAGGYGFVI---SDWNMPNM----------DGLELL-
    KTIRADGAMSALPVLM
  • T
  • 1fx1 GCGDS-SY-EYFCGA-VDAIEEKLKNLGAEIVQD
    ---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI-
    -------
  • FLAV_DESDE ASGDQ-EY-EHFCGA-VPAIEERAKELgATIIAE
    ---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL-
    -------
  • FLAV_DESVH GCGDS-SY-EYFCGA-VDAIEEKLKNLgAEIVQD
    ---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI-
    -------

37
Flavodoxin-cheY Local Pre-processing (locprepro?300)
  • 1fx1 --PKALIVYGSTTGNTEYTAETIARQLANAGYEV
    DSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPL--F
    DSLEETGAQGRKVACF
  • FLAV_DESVH -MPKALIVYGSTTGNTEYTaETIARELADAGYEV
    DSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPL--F
    DSLEETGAQGRKVACf
  • FLAV_DESSA -MSKSLIVYGSTTGNTETAaEYVAEAFENKEIDV
    ELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPL--Y
    DSLENADLKGKKVSVf
  • FLAV_DESGI -MPKALIVYGSTTGNTEGVaEAIAKTLNSEGMET
    TVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPL--Y
    EDLDRAGLKDKKVGVf
  • FLAV_DESDE -MSKVLIVFGSSTGNTESIaQKLEELIAAGGHEV
    TLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSL--F
    EEFNRFGLAGRKVAAf
  • 4fxn --MK--IVYWSGTGNTEKMAELIAKGIIESGKDV
    NTINVSDVNIDELLN-EDILILGCSAMGDEVL------E-ESEFEPF--I
    EEIS-TKISGKKVALF
  • FLAV_MEGEL -MVE--IVYWSGTGNTEAMaNEIEAAVKAAGADV
    ESVRFEDTNVDDVAS-KDVILLgCPAMGSEEL------E-DSVVEPF--F
    TDLA-PKLKGKKVGLf
  • 2fcr ---KIGIFFSTSTGNTTEVADFIGKTLGAKADAP
    I--DVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFL-Y
    DKLPEVDMKDLPVAIF
  • FLAV_ANASP -SKKIGLFYGTQTGKTESVaEIIRDEFGNDVVTL
    H--DVSQAEV-TDLNDYQYLIIgCPTWNIGEL--------QSDWEGL--Y
    SELDDVDFNGKLVAYf
  • FLAV_AZOVI --AKIGLFFGSNTGKTRKVaKSIKKRFDDETMSD
    A-LNVNRVSA-EDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEF--L
    PKIEGLDFSGKTVALf
  • FLAV_ENTAG -MATIGIFFGSDTGQTRKVaKLIHQKLDG--IAD
    APLDVRRATR-EQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEF--T
    NTLSEADLTGKTVALf
  • FLAV_ECOLI --AITGIFFGSDTGNTENIaKMIQKQLGKDVADV
    H--DIAKSSK-EDLEAYDILLLgIPTWYYGEA--------QCDWDDF--F
    PTLEEIDFNGKLVALf
  • FLAV_CLOAB --MKISILYSSKTGKTERVaKLIEEGVKRSGNIE
    VKTMNLDAVDKKFLQESEGIIFgTPTYYA-----------NISWEMKKWI
    DESSEFNLEGKLGAAf
  • 3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEE
    AEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--L
    KTIRADGAMSALPVLM
  • 1fx1 GCGDS--SY-EYFCGA-VD--AIEEKLKNLGAEI
    VQD---------------------GLRID--GDPRAARDDIVGWAHDVRG
    AI--------
  • FLAV_DESVH GCGDS--SY-EYFCGA-VD--AIEEKLKNLgAEI
    VQD---------------------GLRID--GDPRAARDDIVGwAHDVRG
    AI--------
  • FLAV_DESSA GCGDS--DY-TYFCGA-VD--AIEEKLEKMgAVV
    IGD---------------------SLKID--GDPE--RDEIVSwGSGIAD
    KI--------
  • FLAV_DESGI GCGDS--SY-TYFCGA-VD--VIEKKAEELgATL
    VAS---------------------SLKID--GEPD--SAEVLDwAREVLA
    RV--------

38
Strategies for multiple sequence alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
    (Praline-SS)
  • Globalised local alignment
  • Matrix extension
  • Objective: integrate secondary structure
    information to anchor alignments and avoid errors

39
Protein structure hierarchical levels
TERTIARY STRUCTURE (fold)
40
Why use (predicted) structural information
  • Structure is more conserved than sequence
  • Many structural protein families (e.g. globins)
    have family members with very low sequence
    similarities. For example, globin sequence
    identities can be as low as 10% while the proteins
    still have an identical fold.
  • This means that you can still observe equivalent
    secondary structures in homologous proteins even
    if sequence similarities are extremely low.
  • But you are dependent on the quality of
    prediction methods. For example, secondary
    structure prediction is currently at 76%
    correctness. So, about 1 out of 4 predicted amino
    acids is still incorrect.

41
Two superposed protein structures with two
well-superposed helices
The superposed structures lead to close pairs of
Cα atoms that are taken as equivalent. This
leads to a structural alignment in which the
amino acids corresponding to equivalent Cα atom
pairs are matched.
Red: well superposed. Blue: low match quality.
C5 anaphylatoxin: the human (PDB code 1kjs) and pig
(1c5a) proteins are superposed.
42
How to combine secondary structure and amino acid
information
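(Figure: dynamic programming search matrix for two sequences annotated with their secondary structure:
MDAGSTVILCFV
HHHCCCEEEEEE
and
MDAASTILCGS
HHHHCCEEECC
Amino acid substitution matrices specific to H/H, C/C and E/E pairs are used in cells where the secondary structure states of both residues agree; the default matrix is used elsewhere.)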
43
In terms of scoring
  • So how would you score a profile using this extra
    information?
  • Same way of scoring as before, but you can use
    sec. struct. specific substitution scores in
    various combinations.
  • Where does it fit in?
  • Very important: structure is always more
    conserved than sequence, so secondary structure
    elements can help to anchor the alignments
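A minimal sketch of secondary-structure-specific scoring is given below. It assumes one substitution table per state (H, E, C) plus a default table that is used when the two states differ; all score values are invented, and the way real programs combine the matrices may differ.

```python
# Hypothetical toy substitution tables; real matrices cover all 20 amino acids.
MATRICES = {
    "H": {("A", "A"): 5, ("A", "L"): 1},
    "E": {("V", "V"): 6, ("V", "I"): 4},
    "C": {("G", "G"): 7},
    "default": {("A", "A"): 4, ("A", "L"): -1, ("V", "V"): 4,
                ("V", "I"): 3, ("G", "G"): 6},
}


def pair_score(aa1, ss1, aa2, ss2, matrices=MATRICES):
    """Score one cell of the search matrix: use the state-specific table
    when both residues share a secondary structure state, else the default."""
    state = ss1 if ss1 == ss2 and ss1 in matrices else "default"
    table = matrices[state]
    if (aa1, aa2) in table:
        return table[(aa1, aa2)]
    return table.get((aa2, aa1), 0)  # tables are treated as symmetric


print(pair_score("A", "H", "L", "H"))  # both helical: helix table
print(pair_score("A", "H", "L", "E"))  # states differ: default table
```

The dynamic programming itself is unchanged; only the lookup of the score for each pair of positions depends on the secondary structure annotation.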

44
(Figure, flow chart: sequences to be aligned, then predict secondary structure, then align sequences using secondary structure, giving the multiple alignment. Predicted secondary structure strings shown in the figure:)
HHHHCCEEECCCEEECCHH HHHCCCCEECCCEEHHH HHHHHHHHHHHH
HCCCEEEE
CCCCCCEECCCEEEECCHH HHHHHCCEEEECCCEECCC
45
Using predicted secondary structure
1fx1 -PK-ALIVYGSTTGNTEYTAETIARQLANAG-YE
VDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLFD
S-LEETGAQGRKVACF e eeee b
ssshhhhhhhhhhhhhhttt eeeee stt tttttt seeee b
ee sss ee ttthhhhtt ttss tt
eeeee FLAV_DESVH MPK-ALIVYGSTTGNTEYTaETIARELA
DAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDD
FIPLFDS-LEETGAQGRKVACf e eeeeee
hhhhhhhhhhhhhhh eeeeee eeeeee
hhhhhh
eeeee FLAV_DESGI MPK-ALIVYGSTTGNTEGVaEAIAKTLN
SEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQED
FVPLYED-LDRAGLKDKKVGVf e eeeeee
hhhhhhhhhhhhhh eeeeee hhhhhh eeeeeee
hhhhhh
eeeeee FLAV_DESSA MSK-SLIVYGSTTGNTETAaEYVAEAF
ENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQD
DFIPLYDS-LENADLKGKKVSVf
eeeeee hhhhhhhhhhhhhh eeeee
eeeee hhhhhhh h
eeeee FLAV_DESDE MSK-VLIVFGSSTGNTESIaQKLEELIA
AGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDD
FLSLFEE-FNRFGLAGRKVAAf eeee
hhhhhhhhhhhhhh eeeee hhhhhhhhhhheeeee
hhhhhhh hh eeeee 2fcr
--K-IGIFFSTSTGNTTEVADFIGKTLGAK---ADAPIDVDDVT
DPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFLYDKLPEVDMKD
LPVAIF eeeee
ssshhhhhhhhhhhhhggg b eeggg s gggggg seeeeeee
stt s s s sthhhhhhhtggg tt
eeeee FLAV_ANASP SKK-IGLFYGTQTGKTESVaEIIRDEFG
ND--VVTL-HDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSD
WEGLYSE-LDDVDFNGKLVAYf eeeee
hhhhhhhhhhhh eee hhh hhhhhhheeeeee
hhhhhhhhh
eeeeee FLAV_ECOLI -AI-TGIFFGSDTGNTENIaKMIQKQL
GKD--VADV-HDIAKSS-KEDLEAYDILLLgIPTWYYGEA--------QC
DWDDFFPT-LEEIDFNGKLVALf eee
hhhhhhhhhhhh eee hhh hhhhhhheeeee
hhhhh
eeeeee FLAV_AZOVI -AK-IGLFFGSNTGKTRKVaKSIKKRF
DDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENE
SWEEFLPK-IEGLDFSGKTVALf eee
hhhhhhhhhhhhh hhh hhhhhhheeeee
hhhhhhhhh
eeeeee FLAV_ENTAG MAT-IGIFFGSDTGQTRKVaKLIHQKL
DG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYD
SWQEFTNT-LSEADLTGKTVALf eeee
hhhhhhhhhhhh hhh hhhhhhheeeee
hhhhh eeeee 4fxn
----MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDV
NIDELLNE-DILILGCSAMGDEVL------E-ESEFEPFIEE-IST-KIS
GKKVALF eeeee
ssshhhhhhhhhhhhhhhtt eeeettt sttttt seeeeee
btttb ttthhhhhhh hst t tt
eeeee FLAV_MEGEL M---VEIVYWSGTGNTEAMaNEIEAAVK
AAG-ADVESVRFEDTNVDDVASK-DVILLgCPAMGSEEL------E-DSV
VEPFFTD-LAP-KLKGKKVGLf
hhhhhhhhhhhhhh eeeee hhhhhhhh eeeee

eeeee FLAV_CLOAB M-K-ISILYSSKTGKTERVaKLIEEGVK
RSGNIEVKTMNL-DAVDKKFLQESEGIIFgTPTY-YANI--------SWE
MKKWIDE-SSEFNLEGKLGAAf eee
hhhhhhhhhhhhhh eeeeee hhhhhhhhhh eeee
hhhhhhhhh eeeee 3chy
ADKELKFLVVDDFSTMRRIVRNLLKELGFNN-VEEAEDGV-DAL
NKLQAGGYGFVISD---WNMPNM----------DGLELLKTIRADGAMSA
LPVLMV tt eeee s
hhhhhhhhhhhhhht eeeesshh hhhhhhhh eeeee s
sss hhhhhhhhhh ttttt eeee 1fx1
GCGDS-SY-EYFCGAVDAIEEKLKNLGAEIVQD-----------
----------GLRIDGD--PRAARDDIVGWAHDVRGAI--------
eee s ss sstthhhhhhhhhhhttt ee s
eeees gggghhhhhhhhhhhhhh FLAV_
DESVH GCGDS-SY-EYFCGAVDAIEEKLKNLgAEIVQD------
---------------GLRIDGD--PRAARDDIVGwAHDVRGAI-------
- eee hhhhhhhhhhhh
eeeee eeeee
hhhhhhhhhhhhhh FLAV_DESGI GCGDS-SY-TYFCGAVDVI
EKKAEELgATLVAS---------------------SLKIDGE--P--DSA
EVLDwAREVLARV-------- eee
hhhhhhhhhhhh eeeee
hhhhhhhhhhh FLAV_DESSA
GCGDS-DY-TYFCGAVDAIEEKLEKMgAVVIGD-----------------
----SLKIDGD--P--ERDEIVSwGSGIADKI--------
hhhhhhhhhhhh eeeee
e eee FLAV_DESDE
ASGDQ-EY-EHFCGAVPAIEERAKELgATIIAE-----------------
----GLKMEGD--ASNDPEAVASfAEDVLKQL--------
e hhhhhhhhhhhhhh eeeee
ee hhhhhhhhhhh 2fcr
GLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSV
RD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
eee ttt ttsttthhhhhhhhhhhtt eee b gggs
s tteet teesseeeettt ss hhhhhhhhhhhhhhhht FLAV_A
NASP GTGDQIGYADNFQDAIGILEEKISQRgGKTVGYWSTDGYD
FNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------
hhhhhhhhhhhhhh
eeee
hhhhhhhhhhhhhhhh FLAV_ECOLI
GCGDQEDYAEYFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADD
DHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
hhhhhhhhhhhhhh eeee
hhhhhhhhhhhhhhhhhh FLAV_AZOVI
GLGDQVGYPENYLDALGELYSFFKDRgAKIVGSWSTDGYEFESS
EAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--
e hhhhhhhhhhhhhh eeeee
hhhhhhhhhhh FLAV_ENTA
G GLGDQLNYSKNFVSAMRILYDLVIARgACVVGNWPREGYKFSF
SAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------
hhhhhhhhhhhhhhh eeee
hhhhhhh hhhhhhhhhhhh 4fxn
G-----SYGWGDGKWMRDFEERMNGYGCVVVET---------
------------PLIVQNE--PDEAEQDCIEFGKKIANI---------
e eesss shhhhhhhhhhhhtt ee s
eeees ggghhhhhhhhhhhht FLAV
_MEGEL G-----SYGWGSGEWMDAWKQRTEDTgATVIGT-----
-----------------AIVNEM--PDNAPE-CKElGEAAAKA-------
-- hhhhhhhhhhh
eeeee eeee h
hhhhhhhh FLAV_CLOAB STANSIA-GGSDIALLTILNHLMVK
-gMLVYSG----GVAFGKPKTHLG-----YVHINEI--QENEDENARIfG
ERiANkV--KQIF--
hhhhhhhhhhhhhh eeeee
hhhh hhh hhhhhhhhhhhh h 3chy
-----------TAEAKKENIIAAAQAGASGY-------------------
------VVK----P-FTAATLEEKLNKIFEKLGM------
ess hhhhhhhhhtt see
ees s hhhhhhhhhhhhhhht

G
46
Strategies for multiple sequence alignment
not for exam
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objectives
  • Instead of single amino acid positions, focus on
    local alignments
  • Consider best local alignment through each cell
    in DP matrix
  • Try to avoid (early) errors

47
Globalised local alignment
not for exam
1. Local (SW) alignment (M + Po,e)


2. Global (NW) alignment (no M or Po,e)
Double dynamic programming
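The two steps can be sketched as follows. This is a simplified reading of the slide, not the published method: it uses a single linear gap penalty instead of separate opening and extension penalties (Po, Pe), a toy match/mismatch score instead of a substitution matrix M, and all function names are hypothetical. Step 1 computes, for every cell, the score of the best local alignment passing through it; step 2 runs a global alignment over those values without any matrix or gap penalties.

```python
def sw_matrix(a, b, score, gap):
    """Smith-Waterman matrix H (1-based indices, linear gap penalty)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
    return H


def through_cell_scores(a, b, score, gap):
    """T[i][j]: score of the best local alignment that matches a[i] with b[j]."""
    fwd = sw_matrix(a, b, score, gap)
    bwd = sw_matrix(a[::-1], b[::-1], score, gap)
    n, m = len(a), len(b)
    T = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = score(a[i], b[j])
            ending = s + fwd[i][j]                    # best part ending at (i, j)
            starting = s + bwd[n - 1 - i][m - 1 - j]  # best part starting at (i, j)
            T[i][j] = ending + starting - s           # cell counted once
    return T


def globalised_local_score(a, b, score, gap):
    """Step 2: global alignment over T, with no gap penalties."""
    T = through_cell_scores(a, b, score, gap)
    G = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            G[i][j] = max(G[i - 1][j - 1] + T[i - 1][j - 1],
                          G[i - 1][j], G[i][j - 1])
    return G[len(a)][len(b)]


def toy_score(x, y):
    return 2 if x == y else -1  # hypothetical match/mismatch scores


T = through_cell_scores("ACD", "ACD", toy_score, gap=2)
total = globalised_local_score("ACD", "ACD", toy_score, gap=2)
print(T[1][1], total)
```

Because each cell carries the score of a whole local alignment, the global pass is steered by local alignment context rather than by single residue pairs.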
48
Globalised local alignment
not for exam
1.
2.
49
M = BLOSUM62, Po = 0, Pe = 0
not for exam
50
M = BLOSUM62, Po = 12, Pe = 1
not for exam
51
M = BLOSUM62, Po = 60, Pe = 5
not for exam
52
Strategies for multiple sequence alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objective: try to avoid (early) errors

53
Integrating alignment methods and alignment
information with T-Coffee
  • Integrating different pair-wise alignment
    techniques (NW, SW, ..)
  • Combining different multiple alignment methods
    (consensus multiple alignment)
  • Combining sequence alignment methods with
    structural alignment techniques
  • Plug in user knowledge

54
Matrix extension
  • T-Coffee
  • Tree-based Consistency Objective Function For
    alignmEnt Evaluation
  • Cedric Notredame (Bioinformatics for dummies)
  • Des Higgins
  • Jaap Heringa
  • J. Mol. Biol., 302, 205-217, 2000

55
Using different sources of alignment information

(Figure: sources of alignment information fed into T-Coffee: structure alignments, Clustal, Dialign, Lalign and manual alignments.)
56
T-Coffee library system
Seq1  AA1  Seq2  AA2  Weight
3     V31  5     L33  10
3     V31  6     L34  14
5     L33  6     R35  21
5     L33  6     I36  35
57
Matrix extension
(Figure: the six pairwise alignments between four sequences: 1-2, 1-3, 1-4, 2-3, 2-4 and 3-4.)
58
Search matrix extension: alignment transitivity
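Transitivity can be sketched with a small library of weighted residue pairs. The sketch assumes the usual triplet rule: a pair gains, for every residue of a third sequence aligned to both of its members, the smaller of the two connecting weights. Residues are written as `(sequence, position)` tuples, the weights are invented, and only pairs already present in the library are re-weighted, which is a simplification.

```python
from collections import defaultdict


def extend_library(library):
    """Re-weight each residue pair using support from third sequences."""
    neighbours = defaultdict(dict)
    for (r1, r2), w in library.items():
        neighbours[r1][r2] = w
        neighbours[r2][r1] = w

    extended = {}
    for (r1, r2), w in library.items():
        total = w  # the direct alignment always counts
        for r3, w13 in neighbours[r1].items():
            if r3[0] in (r1[0], r2[0]):
                continue  # only residues from other sequences give support
            w32 = neighbours[r3].get(r2)
            if w32 is not None:
                total += min(w13, w32)  # weakest link of the triplet
        extended[(r1, r2)] = total
    return extended


# Hypothetical primary library: ((seq, pos), (seq, pos)) -> weight
library = {
    ((1, 10), (2, 12)): 77,
    ((1, 10), (3, 11)): 88,
    ((3, 11), (2, 12)): 100,
    ((1, 10), (4, 9)): 60,
}
extended = extend_library(library)
print(extended[((1, 10), (2, 12))])
```

The pair supported by a consistent third sequence ends up with a higher weight than the pair that is only seen in its own direct alignment.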
59
T-Coffee
Other sequences
Direct alignment
60
Search matrix extension
61
T-COFFEE web-interface
62
3D-COFFEE
  • Computes structure-based alignments
  • Structures associated with the sequences are
    retrieved and the information is used to optimise
    the MSA
  • More accurate, but for many (many) proteins we
    do not have the structure!

63
but.....
  • T-COFFEE (V1.23) multiple sequence alignment
  • Flavodoxin-cheY

1fx1        ----PKALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESVH  ---MPKALIVYGSTTGNTEYTAETIARELADAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESGI  ---MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-METTVVNVADVT-APGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPL-YEDLDRAGLKDKK-----
FLAV_DESSA  ---MSKSLIVYGSTTGNTETAAEYVAEAFENKE-IDVELKNVTDVS-VADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPL-YDSLENADLKGKK-----
FLAV_DESDE  ---MSKVLIVFGSSTGNTESIAQKLEELIAAGG-HEVTLLNAADAS-AENLADGYDAVLFGCSAWGMEDLE------MQDDFLSL-FEEFNRFGLAGRK-----
4fxn        ------MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVN-IDELL-NEDILILGCSAMGDEVLE-------ESEFEPF-IEEIS-TKISGKK-----
FLAV_MEGEL  -----MVEIVYWSGTGNTEAMANEIEAAVKAAG-ADVESVRFEDTN-VDDVA-SKDVILLGCPAMGSEELE-------DSVVEPF-FTDLA-PKLKGKK-----
FLAV_CLOAB  ----MKISILYSSKTGKTERVAKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQ-ESEGIIFGTPTYYAN---------ISWEMKKW-IDESSEFNLEGKL-----
2fcr        -----KIGIFFSTSTGNTTEVADFIGKTLGAKA---DAPIDVDDVTDPQAL-KDYDLLFLGAPTWNTGA----DTERSGTSWDEFLYDKLPEVDMKDLP-----
FLAV_ENTAG  ---MATIGIFFGSDTGQTRKVAKLIHQKLDGIA---DAPLDVRRAT-REQF-LSYPVLLLGTPTLGDGELPGVEAGSQYDSWQEF-TNTLSEADLTGKT-----
FLAV_ANASP  ---SKKIGLFYGTQTGKTESVAEIIRDEFGNDV---VTLHDVSQAE-VTDL-NDYQYLIIGCPTWNIGEL--------QSDWEGL-YSELDDVDFNGKL-----
FLAV_AZOVI  ----AKIGLFFGSNTGKTRKVAKSIKKRFDDET-M-SDALNVNRVS-AEDF-AQYQFLILGTPTLGEGELPGLSSDCENESWEEF-LPKIEGLDFSGKT-----
FLAV_ECOLI  ----AITGIFFGSDTGNTENIAKMIQKQLGKDV---ADVHDIAKSS-KEDL-EAYDILLLGIPTWYYGEA--------QCDWDDF-FPTLEEIDFNGKL-----
3chy        ADKELKFLVVD--DFSTMRRIVRNLLKELGFN-NVE-EAEDGVDALNKLQ-AGGYGFVISDWNMPNMDGLE--------------LLKTIRADGAMSALPVLMV

64
Multiple alignment methods
  • Multi-dimensional dynamic programming: extension
    of pairwise sequence alignment.
  • Progressive alignment: incorporates phylogenetic
    information to guide the alignment process.
  • Iterative alignment: corrects for problems with
    progressive alignment by repeatedly realigning
    subgroups of sequences.

65
Iteration
Iteration can help in cases where one can learn
from the data produced in a preceding step, so
that the next step can be taken in a more
informed way.
Convergence
Limit cycle
Divergence
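The three outcomes named on this slide can be told apart with a small driver loop. This is a sketch under simplifying assumptions: the state (for example, an alignment) is hashable, `step` is a hypothetical function producing the next state, and "divergence" is reported simply when no state repeats within `max_iter` steps.

```python
def iterate(step, start, max_iter=100):
    """Apply `step` repeatedly and classify the outcome.

    Returns ("convergence", 1) when a state reproduces itself,
    ("limit cycle", period) when an earlier state recurs after
    `period` > 1 steps, and ("divergence", None) when nothing
    repeats within `max_iter` steps.
    """
    seen = {start: 0}          # state -> iteration at which it first appeared
    state = start
    for i in range(1, max_iter + 1):
        state = step(state)
        if state in seen:
            period = i - seen[state]
            label = "convergence" if period == 1 else "limit cycle"
            return (label, period)
        seen[state] = i
    return ("divergence", None)


# Toy stand-ins for "realign and compare with earlier results".
converging = iterate(lambda x: x // 2, 40)
cycling = iterate(lambda x: (x + 1) % 3, 0)
diverging = iterate(lambda x: x + 1, 0, max_iter=50)
```

Remembering every earlier state, rather than only the previous one, is what allows a limit cycle to be recognised instead of looping until the iteration budget runs out.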
66
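Pre-profile alignment: alignment consistency
[Figure: five pre-profiles, one for each of sequences 1 to 5, each with its own sequence on top and the other four sequences aligned beneath it. Labels: Ala131 (sequence 1); A131 A131 L133 C126 A131.]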
67
Flavodoxin-cheY consistency scores (PRALINE prepro0)
Completely consistently aligned amino acids

1fx1          --7899999999999TEYTAETIARQL8776-6657777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF
FLAV_DESVH    -46788999999999TEYTAETIAREL7777-7757777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF
FLAV_DESDE    -47899999999999999999999988776695658888777777778763YDAVL999SAW9877789877753556666669777776789GRKVAAF
FLAV_DESGI    -46788999999999TEGVAEAIAKTL9997-76678888777777887539DVVL999ST987776--9889546667776697776557777888888
FLAV_DESSA    93677799999999999999999999988759765777888888888876399999999STW77765--9999536666677797998779999999999
4fxn          -878779999999999999999999776666967567788888888888777999999988777776--9889577788888897773237888888888
FLAV_MEGEL    9776779999999999999999997777766-665666677788899976799999999987777669--887362334466695555455778888888
2fcr          --87899999999999TEVADFIGK996541900300000112233355679DLLF99999855312888111224555555407777777888888888
FLAV_ANASP    -47899LFYGTQTGKTESVAEIIR9777653922356677777777897779999999999988843--9998555778777899998879999999999
FLAV_ECOLI    997789999GSDTGNTENIAKMIQ8774222922456678889999995569999999999755553----99262225555495777767778999999
FLAV_AZOVI    --79IGLFFGSNTGKTRKVAKSIK99887759657577888888999777899999999999877761112222222244555-5555555778999999
FLAV_ENTAG    94789999999999999999999998755229223234555555555555688899999998875521111111133477777-7777777999999999
FLAV_CLOAB    -86999ILYSSKTGKTERVAK9997555555057678887888887777765778899998522223--9888342234455597777777777777777
3chy          0122222223333335666665555555222922222222222221112163335555755553222888877674533344493332222222222222
Avrg Consist  8667778888888889999999998776554844455566666666665557888888888766544887666334445566586666556778888888
Conservation  0125538675848969746963946463343045244355446543473516658868567554455000000314365446505575435547747759

1fx1          G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------
FLAV_DESVH    G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------
FLAV_DESDE    A88878685555555999988888889998879--8777788-98777777--8555555554433245667777777777599------
FLAV_DESGI    87775977755555677777777777777778---88888887667778777775555555555542424667888887777--------
FLAV_DESSA    977768777555556777777777777777767887777777778888-978985555555556536556888888888877--------
4fxn          867777555555552666666666555555577887767999877777977777665555555555444466666666555798------
FLAV_MEGEL    8577775666666525556777778888888689977888988776558677885544333222222212233223355557--------
2fcr          877773573333333777766667777765533333333333333322833333333332244444567777777888777633------
FLAV_ANASP    977773775333344777888888777777733334444444444433833333344444444444455577777788777734------
FLAV_ECOLI    977743786444444777788888888888833334444444444444244444555554555775667788888888877734110000
FLAV_AZOVI    977763553333334666666677777777733334444444444444482333355555555555545558888888877772311----
FLAV_ENTAG    977773886555555866666666677666633333333333333322123333344444444455555665566666555582------
FLAV_CLOAB    766627222222212444444444455555587882222222222222111111122222222222344443333333233399------
3chy          222227222222224111355431113324578-87778997666556877776322222222222322222323344444422------
Avrg Consist  866656564444444666666666666666656665555565555555655565444443444443344455666666666666889999
Conservation  736630574333341634645344447467100000110100110000000104347446454432254744544484343010000000

Iteration 0   SP 135136.00   AvSP 10.473   SId 3838   AvSId 0.297
Consistency values are scored from 0 to 10; the
value 10 is represented by the corresponding
amino acid (red).
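The display convention in the legend can be sketched in a few lines. This is only an illustration of the encoding, not PRALINE code: `render_consistency` is a hypothetical helper that takes one aligned sequence row and a per-column score (0 to 10, or `None` at gap positions) and prints digits for scores 0 to 9 and the amino acid itself for a score of 10.

```python
def render_consistency(aligned_seq, scores):
    """Encode per-residue consistency scores as a single display row."""
    if len(aligned_seq) != len(scores):
        raise ValueError("sequence row and score list must have equal length")
    out = []
    for residue, score in zip(aligned_seq, scores):
        if residue == "-":
            out.append("-")              # gaps carry no score
        elif score == 10:
            out.append(residue)          # fully consistent: show the residue
        elif 0 <= score <= 9:
            out.append(str(score))       # otherwise show the score digit
        else:
            raise ValueError(f"score out of range: {score!r}")
    return "".join(out)


row = render_consistency("-MKAL", [None, 7, 10, 9, 10])
```

Using the residue letter for the top score keeps the row one character per column while making the most reliable positions readable at a glance.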
68
Flavodoxin-cheY consistency scores (PRALINE prepro1500)

1fx1          -42444IVYGSTTGNTEYTAETIARQL886666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
FLAV_DESVH    -34444IVYGSTTGNTEYTAETIAREL776666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
FLAV_DESSA    -33444IVYGSTTGNTET99999888777655777668888899666686YDIVLFGCSTW77777----996466666779-88SL98ADLKGKKVSVF
FLAV_DESGI    -34444IVYGSTTGNTEGVA9999999999765555677777886666678DVVLLGCSTW77777----995466666779-88887688888KKVGVF
FLAV_DESDE    -44777IVFGSSTGNTE988777666655566777778899999777777YDAVLFGCSAW88877----997587777779-8887766777GRKVAAF
4fxn          -32222IVYWSGTGNTE8888888876666778888888888NI8888586DILILGCSA888888------8-8888886--66665378ISGKKVALF
FLAV_MEGEL    -12222IVYWSGTGNTEAMA8888888888888888555555555555485DVILLGCPAMGSE77------572222288--8888755588GKKVGLF
2fcr          -41456IFFSTSTGNTTEVA999998865432222765554443244779YDLLFLGAPT944411999-111112454441-8DKLPEVDMKDLPVAIF
FLAV_ANASP    -00456LFYGTQTGKTESVAEII987755323322427776666623589YQYLIIGCPTW55532--999843678W988899998888888GKLVAYF
FLAV_AZOVI    -42445LFFGSNTGKTRKVAKSIK87777434333536666665467777YQFLILGTPTLGEG862222222222355558-45666666888KTVALF
FLAV_ENTAG    -266IGIFFGSDTGQTRKVAKLIHQKL6664664424DVRRATR88888SYPVLLLGTPT88888644444444446WQEF8-8NTLSEADLTGKTVALF
FLAV_ECOLI    -51114IFFGSDTGNTENIAKMI987743311111555555588355599YDILLLGIPT954431----88355225544--44666666779KLVALF
FLAV_CLOAB    -63666ILYSSKTGKTERVAKLIE63333333333333333333366LQESEGIIFGTPTY63--6--------66SWE33333333333333GKLGAAF
3chy          ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM
Avrg Consist  9334459999999999999999988776655555555666667756667889999999999767658888775555566668967777677889999999
Conservation  0236428675848969746963946463344354312564565414344366588685675544550000003144654460055575345547747759

1fx1          G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
FLAV_DESVH    G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899
FLAV_DESSA    G98878-688688888-88--88999999999999979988888887788889-89-9787777666756645577776666654466899899
FLAV_DESGI    G98879-898688888987--788888999GATLV7698899-9998789888-8899787878776663122477788888333276899899
FLAV_DESDE    AS8888-68-888888899--9999999999988888-99988888988778897888776668854222212255555555333277999999
4fxn          GS2228-228222222222--2388888888888888888888888888888888888887778866765535577555533221288888888
FLAV_MEGEL    G4888--28-8888882MD--AWKQRTEDTGATVI77---------------------77222--224444222222244222112--------
2fcr          GLGDA5-8Y5DNFC88-88--8877777777777765444555555555544385555777774465333357799999987555333899899
FLAV_ANASP    GTGDQ5-GY5899999-99--99EEKISQRGG99975555544444444433284444466665555555556666676666433333899899
FLAV_AZOVI    GLGDQ5-885777555-55--55555788888888555555555555555554855555555555666555555888855555544442--288
FLAV_ENTAG    GLGDQL-NYSKNFVSA-MR--ILYDLVIARGACVVG8888EGYKFSFSAA6664NEFVGLPLDQEN88888EERIDSWLE88842242688688
FLAV_ECOLI    GC995497846888889879977777777788888554444444444444444114444777774455775567788888887433322100100
FLAV_CLOAB    STANS6366663333333333336666666666666666663333363366336663333336EDENARIFGERIANKVKQI333333666666
3chy          VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------
Avrg Consist  9988779787777777777997788888888888866777777777767766677777676667766655455577776666433355788788
Conservation  746640037154545706300354534444745753000001010010000000010683760144442335574454448434301000000

Iteration 0   SP 136702.00   AvSP 10.654   SId 3955   AvSId 0.308
Consistency values are scored from 0 to 10; the
value 10 is represented by the corresponding
amino acid (red).
69
Consistency iteration
Pre-profiles
Multiple alignment positional consistency scores
70
Pre-profile update iteration
Pre-profiles
Multiple alignment
71
Iterate similarity matrix, guide tree and MSA
[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) fill a 5 x 5 similarity matrix, from which the guide tree and then the multiple alignment are derived.]
This way of iterating was already implemented in
1984 by Hogeweg and Hesper.
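One way to close this loop is to recompute the similarity matrix from the multiple alignment itself, so that the next guide tree reflects the current MSA. The sketch below assumes fractional sequence identity over columns where both sequences have a residue as the similarity measure; the slides do not say which score is recomputed, and `pairwise_identity` is a hypothetical helper.

```python
def pairwise_identity(msa):
    """Similarity matrix (fractional identity) derived from an MSA.

    `msa` is a list of equal-length aligned strings with '-' for gaps.
    Columns in which either sequence has a gap are ignored.
    """
    n = len(msa)
    if any(len(row) != len(msa[0]) for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            aligned = matches = 0
            for a, b in zip(msa[i], msa[j]):
                if a != "-" and b != "-":
                    aligned += 1
                    matches += (a == b)
            sim[i][j] = sim[j][i] = matches / aligned if aligned else 0.0
    return sim


SIM = pairwise_identity(["MKAL-", "MRALV", "-KGLV"])
```

The resulting matrix could then be passed to whatever tree-building step produced the original guide tree, and the alignment repeated until the tree or the alignment stops changing.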
72
Secondary structure-induced alignment
73
PRALINE: Using secondary structure for alignment
Dynamic programming search matrix
Amino acid exchange weights matrices
MDAGSTVILCFV
HHHCCCEEEEEE
MDAASTILCGS
HHHHCCEEECC
H
H
C
C
E
E
Default
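The slide suggests that each cell of the dynamic programming search matrix is scored with an exchange matrix chosen by secondary structure. A minimal sketch, assuming a state-specific matrix (H, E or C) is used when both positions share that state and the default matrix otherwise; the function names are illustrative, and the two sequences with their secondary structure strings are the ones shown on the slide.

```python
def choose_matrix(ss_a, ss_b):
    """Name of the exchange matrix for one cell of the search matrix."""
    if ss_a == ss_b and ss_a in ("H", "E", "C"):
        return ss_a          # helix, strand or coil specific matrix
    return "default"         # states disagree: fall back to the default matrix


def matrix_grid(ss1, ss2):
    """Matrix choice for every cell (row = position in seq 1, column = seq 2)."""
    return [[choose_matrix(a, b) for b in ss2] for a in ss1]


SEQ1, SS1 = "MDAGSTVILCFV", "HHHCCCEEEEEE"
SEQ2, SS2 = "MDAASTILCGS", "HHHHCCEEECC"
GRID = matrix_grid(SS1, SS2)
```

With these strings, cell (0, 0) pairs two helical positions and uses the helix matrix, while cell (0, 4) pairs a helix with a coil position and falls back to the default.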
74
Flavodoxin-cheY multiple alignment / secondary
structure iteration: cheY SSEs

3chy-AA SEQUENCE  AA   ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP
3chy-ITERATION-0  PHD  EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE
3chy-ITERATION-1  PHD  EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-2  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE
3chy-ITERATION-3  PHD  EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-4  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE
3chy-ITERATION-5  PHD  EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-6  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-7  PHD  EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-8  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEEE
3chy-ITERATION-9  PHD  EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHHH EEEEE

3chy-AA SEQUENCE  AA   NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM
3chy-ITERATION-0  PHD  HHHHHHEEEEEE HHHHHHHHHHHHHHHHH HHHHHHHHHHHHHH
3chy-ITERATION-1  PHD  HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-2  PHD  HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-3  PHD  HHHHHHHHHHHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-4  PHD  HHHHH EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-5  PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-6  PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH
3chy-ITERATION-7  PHD  HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-8  PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-9  PHD  HHHHHHHH EEEEE HHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH