Title: Multiple sequence alignment


1
Multiple sequence alignment: why?
  • It is the most important means of assessing the
    relatedness of a set of sequences
  • Gain information about the structure/function of
    a query sequence (conservation patterns)
  • Construct a phylogenetic tree
  • Put together a set of sequenced fragments
    (fragment assembly)
  • Recognise alternative splice sites
  • Many bioinformatics methods depend on it
    (e.g. secondary/tertiary structure prediction)

2
Multiple sequence alignment (MSA) of 12
flavodoxins and cheY
3
Pairwise alignment
  • Now we know how to do it
  • How do we get a multiple alignment (three or more
    sequences)?
  • Multiple alignment suffers a much greater
    combinatorial explosion than pairwise alignment
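
To make the combinatorial explosion concrete, the sketch below counts the cells of the full dynamic programming hypercube for N sequences of equal length, and the number of predecessor cells each cell can be reached from. The function names are illustrative, and the count assumes an unpruned search space.

```python
def dp_cells(n_seqs: int, length: int) -> int:
    """Cells in the full DP hypercube for n_seqs sequences of the given length."""
    return (length + 1) ** n_seqs


def predecessors_per_cell(n_seqs: int) -> int:
    """Moves into a cell: every non-empty subset of sequences may advance."""
    return 2 ** n_seqs - 1


for n in (2, 3, 5, 8):
    # Work grows roughly as cells * predecessors.
    print(n, dp_cells(n, 250), predecessors_per_cell(n))
```

Going from two to three sequences of 250 residues already multiplies the number of cells by 251.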

4
Multi-dimensional dynamic programming (Murata et
al. 1985)
5
Simultaneous multiple alignment: multi-dimensional
dynamic programming
  • MSA (Lipman et al., 1989, PNAS 86, 4412):
    extremely slow and memory intensive; handles up
    to 8-9 sequences of 250 residues
  • DCA (Stoye et al., 1997, CABIOS 13, 625): still
    very slow

6
Alternative multiple alignment methods
  • Biopat (Hogeweg & Hesper 1984, first method ever)
  • MULTAL (Taylor 1987)
  • DIALIGN (Morgenstern 1996)
  • PRRP (Gotoh 1996)
  • Clustal (Thompson, Higgins & Gibson 1994)
  • Praline (Heringa 1999)
  • T-Coffee (Notredame, Higgins & Heringa 2000)
  • HMMER (Eddy 1998) [Hidden Markov Model]
  • SAGA (Notredame & Higgins 1996) [Genetic algorithm]

7
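Progressive multiple alignment: general principles
[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ...,
Score 4-5) fill a 5×5 similarity matrix; scores are converted to
distances, the distances give a guide tree, and the guide tree
drives the multiple alignment, with iteration possibilities]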
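
The pipeline from pairwise scores to a guide tree can be sketched as follows. This is only an illustration: the conversion d = 1 - s, the average-linkage clustering (chosen here for brevity, not necessarily the tree method of any particular program) and the toy scores are all assumptions.

```python
from itertools import combinations


def scores_to_distances(scores):
    """Turn similarities in [0, 1] into distances (assumed rule: d = 1 - s)."""
    return {frozenset(pair): 1.0 - s for pair, s in scores.items()}


def guide_tree(names, dist):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = [(name, [name]) for name in names]  # (subtree, member leaves)

    def average_distance(members_a, members_b):
        total = sum(dist[frozenset((a, b))] for a in members_a for b in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Join the two closest clusters first.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((tree_i, tree_j), members_i + members_j)]
    return clusters[0][0]


# Toy similarity scores for four sequences (made up for illustration).
scores = {("1", "2"): 0.9, ("1", "3"): 0.5, ("1", "4"): 0.2,
          ("2", "3"): 0.6, ("2", "4"): 0.3, ("3", "4"): 0.4}
tree = guide_tree(["1", "2", "3", "4"], scores_to_distances(scores))
print(tree)
```

The progressive alignment would then follow this tree from the leaves to the root, aligning the most similar sequences first.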
8
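General progressive multiple alignment technique
(follow generated tree)
[Figure: guide tree over sequences 1, 3, 2, 5 and 4, drawn
against a distance axis d and ending in a root; the alignment
grows step by step along the tree]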
9
Progressive multiple alignment
  • Problem: accuracy is very important, because
    errors are propagated into the later progressive
    steps
  • "Once a gap, always a gap" (Feng & Doolittle,
    1987)

10
Pair-wise alignment quality versus sequence
identity (Vogt et al., JMB 249, 816-831, 1995)
11
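Multiple alignment profiles (Gribskov et al. 1987)
[Figure: at alignment position i the profile stores one value
per amino acid type:
  A    C    D    ...  W    Y
  0.3  0.1  0    ...  0.3  0.3
together with position-dependent gap penalties (values shown:
0.5 and 1.0)]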
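
A minimal sketch of how per-position values like those on this slide could be derived from an existing alignment. It computes plain residue frequencies per column and ignores gaps; the function name and the toy alignment are illustrative, and a real profile method may weight sequences or use exchange scores instead.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment):
    """Return one {residue: frequency} dict per alignment column (gaps ignored)."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


# Toy alignment of four sequences (made up for illustration).
profile = build_profile(["ACD", "A-D", "WCD", "YCE"])
print(profile[0]["A"], profile[0]["W"], profile[0]["Y"])
```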
12
Profile-sequence alignment
[Figure: a profile aligned against a sequence (ACDVWY)]
13
Profile-profile alignment
[Figure: a profile (columns A C D . . Y) aligned against a
second profile (ACDVWY)]
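
One common way to score a cell when a profile is aligned to a sequence or to another profile is to average an exchange weight over the column frequencies. The sketch below assumes that approach and uses a toy exchange function (+2 for identity, -1 otherwise) in place of a real matrix; all names are illustrative.

```python
def exchange(a, b):
    """Toy exchange weight (assumption): +2 for identity, -1 otherwise."""
    return 2.0 if a == b else -1.0


def score_profile_residue(column, residue):
    """Frequency-weighted score of one profile column against one residue."""
    return sum(freq * exchange(aa, residue) for aa, freq in column.items())


def score_profile_profile(column_1, column_2):
    """Frequency-weighted score of two profile columns against each other."""
    return sum(
        f1 * f2 * exchange(a, b)
        for a, f1 in column_1.items()
        for b, f2 in column_2.items()
    )


column = {"A": 0.5, "W": 0.25, "Y": 0.25}
print(score_profile_residue(column, "A"))
print(score_profile_profile(column, {"A": 1.0}))
```

A single sequence behaves like a profile whose columns each hold one residue with frequency 1, so the two functions agree in that case.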
14
Clustal, ClustalW, ClustalX
  • CLUSTAL W/X (Thompson et al., 1994) uses the
    Neighbour Joining (NJ) algorithm (Saitou and Nei,
    1987), widely used in phylogenetic analysis, to
    construct the guide tree.
  • Sequence blocks are represented by profiles, in
    which the individual sequences are additionally
    weighted according to the branch lengths in the
    NJ tree.
  • Further carefully crafted heuristics include:
  • (i) local gap penalties
  • (ii) automatic selection of the amino acid
    substitution matrix
  • (iii) automatic gap penalty adjustment
  • (iv) a mechanism to delay alignment of sequences
    that appear to be distant at the time they are
    considered.
  • CLUSTAL (W/X) does not allow iteration (cf.
    Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh,
    1996; Heringa, 1999, 2002)
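
The branch-length weighting of sequences can be illustrated with a small sketch. It assumes one plausible rule: each branch length is shared equally among the leaves below it, and a sequence's weight is the sum of its shares from root to leaf. The tree encoding and the toy branch lengths are made up for illustration and are not taken from the slide.

```python
def leaves(node):
    """List the leaf names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child, _ in node for leaf in leaves(child)]


def sequence_weights(node, inherited=0.0, weights=None):
    """Sum, for every leaf, its share of each branch on the path from the root."""
    if weights is None:
        weights = {}
    if isinstance(node, str):
        weights[node] = inherited
        return weights
    for child, branch_length in node:
        share = branch_length / len(leaves(child))  # split among leaves below
        sequence_weights(child, inherited + share, weights)
    return weights


# Internal node = list of (child, branch_length); toy tree ((A, B), C).
toy_tree = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.4)]
weights = sequence_weights(toy_tree)
print(weights)
```

Under this rule the two closely related sequences A and B each receive a lower weight than the more isolated sequence C.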

15
Strategies for multiple sequence alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objective: try to avoid (early) errors

16
Pre-profile generation
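[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ...,
Score 4-5) are filtered by a cut-off; for each sequence (1, 2,
..., 5) the sequences passing the cut-off are stacked under it
in a pre-alignment, which is converted into a pre-profile
(columns A C D . . Y)]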
17
Pre-profile alignment
[Figure: the pre-profiles of sequences 1 to 5 (columns
A C D . . Y) are aligned to give the final alignment of
sequences 1 to 5]
18
Pre-profile alignment
[Figure: the pre-alignments built around each of sequences 1 to
5 are combined into the final alignment]
19
Strategies for multiple sequence alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objective: try to avoid (early) errors

20
Protein structure hierarchical levels
TERTIARY STRUCTURE (fold)
21
One of the Molecular Biology Dogmas
  • Structure more conserved than sequence

22
Secondary structure-induced alignment
23
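Using secondary structure for alignment
[Figure: dynamic programming search matrix for two sequences
with their secondary structure strings:
  MDAGSTVILCFV
  HHHCCCEEEEEE
and
  MDAASTILCGS
  HHHHCCEEECC
Amino acid exchange weights matrices: H-H, C-C and E-E
matrices for cells where both residues are in the same state,
Default otherwise]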
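
The matrix selection shown on this slide can be sketched as a lookup per search-matrix cell: a state-specific exchange matrix when both residues share the same secondary structure state, the default matrix otherwise. The function name is illustrative and the sketch only returns matrix labels, not actual exchange weights.

```python
def matrix_labels(ss_1, ss_2):
    """For every cell (i, j), name the exchange matrix that would be used."""
    return [
        [state_1 if state_1 == state_2 else "Default" for state_2 in ss_2]
        for state_1 in ss_1
    ]


# Secondary structure strings of the two sequences on the slide.
ss_1 = "HHHCCCEEEEEE"  # MDAGSTVILCFV
ss_2 = "HHHHCCEEECC"   # MDAASTILCGS
labels = matrix_labels(ss_1, ss_2)
for row in labels[:4]:
    print(row)
```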
24
Flavodoxin-cheY: using predicted secondary
structure
[Alignment in two blocks. On the slide each row is followed by
its secondary structure string (codes h, e, t, s, g, b); the
column positions of those strings are not preserved in this
transcript.]

1fx1        -PK-ALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACF
FLAV_DESVH  MPK-ALIVYGSTTGNTEYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACf
FLAV_DESGI  MPK-ALIVYGSTTGNTEGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLYED-LDRAGLKDKKVGVf
FLAV_DESSA  MSK-SLIVYGSTTGNTETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLYDS-LENADLKGKKVSVf
FLAV_DESDE  MSK-VLIVFGSSTGNTESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLFEE-FNRFGLAGRKVAAf
2fcr        --K-IGIFFSTSTGNTTEVADFIGKTLGAK---ADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFLYDKLPEVDMKDLPVAIF
FLAV_ANASP  SKK-IGLFYGTQTGKTESVaEIIRDEFGND--VVTL-HDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLYSE-LDDVDFNGKLVAYf
FLAV_ECOLI  -AI-TGIFFGSDTGNTENIaKMIQKQLGKD--VADV-HDIAKSS-KEDLEAYDILLLgIPTWYYGEA--------QCDWDDFFPT-LEEIDFNGKLVALf
FLAV_AZOVI  -AK-IGLFFGSNTGKTRKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFLPK-IEGLDFSGKTVALf
FLAV_ENTAG  MAT-IGIFFGSDTGQTRKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFTNT-LSEADLTGKTVALf
4fxn        ----MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVNIDELLNE-DILILGCSAMGDEVL------E-ESEFEPFIEE-IST-KISGKKVALF
FLAV_MEGEL  M---VEIVYWSGTGNTEAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVASK-DVILLgCPAMGSEEL------E-DSVVEPFFTD-LAP-KLKGKKVGLf
FLAV_CLOAB  M-K-ISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNL-DAVDKKFLQESEGIIFgTPTY-YANI--------SWEMKKWIDE-SSEFNLEGKLGAAf
3chy        ADKELKFLVVDDFSTMRRIVRNLLKELGFNN-VEEAEDGV-DALNKLQAGGYGFVISD---WNMPNM----------DGLELLKTIRADGAMSALPVLMV

1fx1        GCGDS-SY-EYFCGAVDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI--------
FLAV_DESVH  GCGDS-SY-EYFCGAVDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI--------
FLAV_DESGI  GCGDS-SY-TYFCGAVDVIEKKAEELgATLVAS---------------------SLKIDGE--P--DSAEVLDwAREVLARV--------
FLAV_DESSA  GCGDS-DY-TYFCGAVDAIEEKLEKMgAVVIGD---------------------SLKIDGD--P--ERDEIVSwGSGIADKI--------
FLAV_DESDE  ASGDQ-EY-EHFCGAVPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL--------
2fcr        GLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
FLAV_ANASP  GTGDQIGYADNFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------
FLAV_ECOLI  GCGDQEDYAEYFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
FLAV_AZOVI  GLGDQVGYPENYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--
FLAV_ENTAG  GLGDQLNYSKNFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------
4fxn        G-----SYGWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI---------
FLAV_MEGEL  G-----SYGWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNAPE-CKElGEAAAKA---------
FLAV_CLOAB  STANSIA-GGSDIALLTILNHLMVK-gMLVYSG----GVAFGKPKTHLG-----YVHINEI--QENEDENARIfGERiANkV--KQIF--
3chy        -----------TAEAKKENIIAAAQAGASGY-------------------------VVK----P-FTAATLEEKLNKIFEKLGM------
25
Strategies for multiple sequence alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objective: try to avoid (early) errors

26
Globalised local alignment
1. Local (SW) alignment (with M and Po,e)
2. Global (NW) alignment (no M or Po,e)
Double dynamic programming
(M: exchange matrix; Po, Pe: gap-opening and gap-extension
penalties)
27
M = BLOSUM62, Po = 0, Pe = 0
28
M = BLOSUM62, Po = 12, Pe = 1
29
M = BLOSUM62, Po = 60, Pe = 5
30
Strategies for multiple sequence alignment
  • Profile pre-processing
  • Secondary structure-induced alignment
  • Globalised local alignment
  • Matrix extension
  • Objective: try to avoid (early) errors

31
Matrix extension
  • T-Coffee: Tree-based Consistency Objective
    Function For alignmEnt Evaluation
  • Cedric Notredame, Des Higgins and Jaap Heringa,
    J. Mol. Biol., 302, 205-217 (2000)

32
Matrix extension: T-Coffee
[Figure: all pairwise alignments of four sequences: 1-2, 1-3,
1-4, 2-3, 2-4 and 3-4]
33
Integrating alignment methods and alignment
information with T-Coffee
  • Integrating different pair-wise alignment
    techniques (NW, SW, ..)
  • Combining different multiple alignment methods
    (consensus multiple alignment)
  • Combining sequence alignment methods with
    structural alignment techniques
  • Plug in user knowledge

34
Using different sources of alignment information
[Figure: alignments from several sources (Clustal, Dialign,
Lalign, structure alignments, manual alignments) are fed into
T-Coffee]
35
Search matrix extension
36
T-Coffee
  • Combine different alignment techniques by adding
    scores:
  • $W(A(x), B(y)) = \sum S(A(x), B(y))$
  • A(x) is residue x in sequence A
  • The summation is over the scores S of the global
    and local alignments containing the residue pair
    (A(x), B(y))
  • S is the sequence identity percentage of the
    associated alignment
  • Combine the direct alignment seqA-seqB with each
    seqA-seqI-seqB:
  • $W'(A(x), B(y)) = W(A(x), B(y)) + \sum_{I \neq A,B} \min(W(A(x), I(z)), W(I(z), B(y)))$
  • The summation is over all third sequences I other
    than A or B
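
The extension step in the last formula can be sketched as below. The data layout is an assumption made for illustration: residues are (sequence name, position) tuples, the primary weights W live in a dict keyed by unordered residue pairs, and pairs missing from the dict are treated as weight 0. The toy weights are made up.

```python
from collections import defaultdict


def extend_library(primary):
    """Add, for each residue pair, the min-weight support via third sequences."""
    partners = defaultdict(dict)  # residue -> {aligned residue: weight}
    for pair, weight in primary.items():
        a, b = tuple(pair)
        partners[a][b] = weight
        partners[b][a] = weight

    extended = dict(primary)
    for a in partners:
        for b in partners:
            if a[0] >= b[0]:
                continue  # skip same-sequence pairs; visit each pair once
            support = sum(
                min(w_ai, partners[b][i])
                for i, w_ai in partners[a].items()
                if i[0] not in (a[0], b[0]) and i in partners[b]
            )
            if support:
                key = frozenset((a, b))
                extended[key] = extended.get(key, 0.0) + support
    return extended


# Toy primary weights for one residue in each of sequences A, B and C.
a0, b0, c0 = ("A", 0), ("B", 0), ("C", 0)
primary = {frozenset((a0, b0)): 88.0,
           frozenset((a0, c0)): 77.0,
           frozenset((c0, b0)): 100.0}
extended = extend_library(primary)
print(extended[frozenset((a0, b0))])
```

Here the pair A(0)-B(0) gains min(77, 100) from the path through sequence C on top of its direct weight.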

37
T-Coffee
Other sequences
Direct alignment
38
Search matrix extension
39
Evaluating multiple alignments
  • Conflicting standards of truth: evolution,
    structure, function
  • With orphan sequences, no additional information
    is available
  • Benchmarks depend on reference alignments
  • Quality issues with the available reference
    alignment databases
  • Different ways to quantify agreement with a
    reference alignment (sum-of-pairs, column score)
  • "Charlie Chaplin" problem

40
Evaluating multiple alignments
  • As a standard of truth, often a reference
    alignment based on structural superpositioning is
    taken

41
Evaluation measures
Query
Reference
Column score
Sum-of-Pairs score
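
A minimal sketch of the two measures, assuming the usual definitions: the sum-of-pairs score is the fraction of residue pairs aligned in the reference that are also aligned in the query, and the column score is the fraction of reference columns reproduced exactly in the query. Function names and the toy alignments are illustrative.

```python
def columns(alignment):
    """Describe each column by the residue index of every sequence (None = gap)."""
    counters = [0] * len(alignment)
    result = []
    for column in zip(*alignment):
        entry = []
        for k, char in enumerate(column):
            if char == "-":
                entry.append(None)
            else:
                entry.append(counters[k])
                counters[k] += 1
        result.append(tuple(entry))
    return result


def aligned_pairs(alignment):
    """Set of (seq_i, residue_i, seq_j, residue_j) placed in the same column."""
    pairs = set()
    for col in columns(alignment):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                if col[i] is not None and col[j] is not None:
                    pairs.add((i, col[i], j, col[j]))
    return pairs


def sum_of_pairs_score(query, reference):
    ref_pairs = aligned_pairs(reference)
    return len(ref_pairs & aligned_pairs(query)) / len(ref_pairs)


def column_score(query, reference):
    query_columns = set(columns(query))
    ref_columns = columns(reference)
    return sum(col in query_columns for col in ref_columns) / len(ref_columns)


reference = ["ACDE", "AC-E", "A-DE"]
query = ["ACDE", "A-CE", "A-DE"]
sp = sum_of_pairs_score(query, reference)
cs = column_score(query, reference)
print(sp, cs)
```

The column score is the stricter of the two: one misplaced residue spoils a whole column, while most of the residue pairs in that column may still be correct.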
42
Evaluating multiple alignments
[Figure: sum-of-pairs (SP) results on BAliBASE alignments; axes
labelled with alignment, nseq and len]
43
Summary
  • Weighting schemes simulating simultaneous
    multiple alignment: profile pre-processing
    (global/local), matrix extension (well-balanced
    scheme)
  • Smoothing alignment signals: globalised local
    alignment
  • Using additional information: secondary
    structure-driven alignment
  • Schemes strike a balance between speed and
    sensitivity

44
References
  • Heringa, J. (1999) Two strategies for sequence
    comparison: profile-preprocessed and secondary
    structure-induced multiple alignment. Comput.
    Chem., 23, 341-364.
  • Notredame, C., Higgins, D.G., Heringa, J. (2000)
    T-Coffee: a novel method for fast and accurate
    multiple sequence alignment. J. Mol. Biol., 302,
    205-217.
  • Heringa, J. (2002) Local weighting schemes for
    protein multiple sequence alignment. Comput.
    Chem., 26(5), 459-477.

45
Where to find this: http://www.ibivu.cs.vu.nl/teaching