Title: Multiple Sequence Alignment by Iterative Tree-Neighbor Alignments
1 Multiple Sequence Alignment by Iterative Tree-Neighbor Alignments
- Susan Bibeault
- June 9, 2000
2 Outline
- Problem Statement and Importance
- Terminology
- Current Approaches
- Our Alignment Heuristic
- Performance Results
- Conclusions
- Future Work
4 Multiple Sequence Alignment
- Problem
  - Given Sequence Set
  - Insert gaps into sequences so that evolutionarily conserved regions are aligned
- Important tool
  - Relate Homologous Proteins
  - Discover Conserved Regions
VLSPADNVKAAWGKVGAHAGEYGAEALERMF
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVY
GLSDGEWQLVLNVWGKVEADIPGHVLIRLFK
6 Scoring Multiple Alignments
Σ cost(i,j) = 6
Σ cost(edge) = 1 m
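The two sums on this slide can be read as a sum-of-pairs score (every pair of rows is compared) and a tree score (only rows joined by a tree edge are compared). The following is a minimal sketch under that reading; the unit cost function, the function names, and the toy rows are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations

def column_cost(a, b, mismatch=1, gap=1):
    """Hypothetical unit cost: 0 for equal symbols, else a mismatch or gap cost."""
    if a == b:
        return 0
    if a == "-" or b == "-":
        return gap
    return mismatch

def pair_cost(row_a, row_b):
    """Cost of the pairwise alignment implied by two rows of equal length."""
    return sum(column_cost(a, b) for a, b in zip(row_a, row_b))

def sum_of_pairs(rows):
    """Sum of cost(i, j) over all unordered pairs of rows."""
    return sum(pair_cost(rows[i], rows[j])
               for i, j in combinations(range(len(rows)), 2))

def tree_cost(rows, edges):
    """Sum of cost(edge) over the given tree edges (pairs of row indices)."""
    return sum(pair_cost(rows[u], rows[v]) for u, v in edges)

rows = ["AC-T", "ACGT", "AGGT"]
print(sum_of_pairs(rows))                 # 4
print(tree_cost(rows, [(0, 1), (1, 2)]))  # 2
```

In this toy case the tree score counts each evolutionary change once along a path, while the sum-of-pairs score counts the same difference in every pair that contains it.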
7 Alignments
- Scoring
  - Cost Matrix
    - C(aa1, aa2)
  - Gap Penalties
    - Simple
      - C(aa, -)
    - Affine
      - C(-) + Len × C(aa, -)
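As a rough illustration of the two gap models listed above, the sketch below compares a simple per-residue penalty with an affine penalty of the form C(-) + Len × C(aa, -). The parameter names and default values are assumptions chosen for the example.

```python
def simple_gap_cost(length, per_residue=2):
    """Simple model: every gapped residue costs the same, C(aa, -)."""
    return length * per_residue

def affine_gap_cost(length, gap_open=5, gap_extend=1):
    """Affine model: a one-time opening cost C(-) plus Len * C(aa, -)."""
    if length == 0:
        return 0
    return gap_open + length * gap_extend

# One gap of length 4 versus four separate gaps of length 1.
print(affine_gap_cost(4))      # 9
print(4 * affine_gap_cost(1))  # 24
```

With an opening cost, one long gap is cheaper than several short ones, which is the usual motivation for the affine model.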
[Figure: dynamic-programming matrix with VLSPADNVKA and GLSDGEWQLVL along its axes]
\[
\mathrm{Cost}(s_{1..i}, t_{1..j}) = \min\bigl(
\mathrm{Cost}(s_{1..i}, t_{1..j-1}) + g,\;
\mathrm{Cost}(s_{1..i-1}, t_{1..j-1}) + C(s_i, t_j),\;
\mathrm{Cost}(s_{1..i-1}, t_{1..j}) + g \bigr)
\]
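The recurrence above can be sketched as a small dynamic program with a constant per-residue gap cost g. The function names and the unit substitution cost are assumptions for illustration; a real run would plug in a PAM-style cost matrix for C.

```python
def unit_cost(a, b):
    """Hypothetical substitution cost C: 0 for a match, 1 for a mismatch."""
    return 0 if a == b else 1

def global_alignment_cost(s, t, sub_cost=unit_cost, g=1):
    """Minimum cost of globally aligning s and t with linear gap cost g."""
    rows, cols = len(s) + 1, len(t) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):   # s[:i] aligned against an empty prefix of t
        cost[i][0] = i * g
    for j in range(1, cols):   # an empty prefix of s aligned against t[:j]
        cost[0][j] = j * g
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(
                cost[i][j - 1] + g,                             # gap in s
                cost[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),
                cost[i - 1][j] + g,                             # gap in t
            )
    return cost[-1][-1]

print(global_alignment_cost("ABCDEFGHI", "ABCDFGHI"))  # 1 (one gap column)
```

The table has (len(s)+1) × (len(t)+1) cells, so time and memory both grow with the product of the two sequence lengths.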
9 Current Approaches
- Global Methods
  - Optimal Algorithms (MSA, MWT, MUSEQAL)
  - Progressive (MULTALIGN, PILEUP, CLUSTAL, MULTAL, AMULT, DFALIGN, MAP, PRRP, AMPS)
- Local Methods
  - PIMA, DIALIGN, PRALIGN, MACAW, BlockMaker, Iteralign
- Combined (GENALIGN, ASSEMBLE, DCA)
- Statistical (HMMT, SAGA, SAM, Match Box)
- Parsimony (MALIGN, TreeAlign)
Global Alignment
ABCDEFGHI
ABCD-FGHI
Local Alignment
XXXABCDYYY
ZZZABCDEEEE
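To make the global/local distinction concrete, here is a minimal local-alignment sketch in the Smith-Waterman style, applied to the slide's toy sequences. It maximizes a similarity score rather than minimizing a cost, and the match, mismatch, and gap values are assumptions chosen for the example.

```python
def local_alignment_score(s, t, match=2, mismatch=-1, gap=-2):
    """Best score over all pairs of substrings of s and t (Smith-Waterman style)."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Resetting to 0 lets the alignment start anywhere in either sequence.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

print(local_alignment_score("XXXABCDYYY", "ZZZABCDEEEE"))  # 8: the shared ABCD block
```

The unrelated flanking residues contribute nothing to the score, whereas a global alignment would be forced to pay for them.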
11 Our Heuristic
- Distance Estimation
- Tree Construction
- Node Initialization
- Tree Partitioning
- Iteration
12 Estimation of Protein Distance
- Aligned Sequences → Estimated Pair Distances
- Issue: Implied vs. Optimal Pair Alignments
PEAAALYGRFT---IKSDVW
PESAALYGRFT---IKSDVW
PESLALYNKF---SIKSDVW
PEALNYGRY----SSESDVW
PEALNYGWY----SSESDVW
PEVIRMQDDNPFSFSQSDVY

PEALNYGWY----SSESDVW
PEVIRMQDDNPFSFSQSDVY
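A pair distance can be read off the pairwise alignment that two rows of the multiple alignment imply. The sketch below extracts that implied pair and uses the fraction of mismatched aligned residues as a stand-in distance; the talk does not state its distance formula, so both the measure and the function names are assumptions.

```python
def implied_pair(row_a, row_b):
    """Project two rows of a multiple alignment, dropping gap-gap columns."""
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)

def mismatch_fraction(row_a, row_b):
    """Assumed distance: mismatches / residue-residue columns of the implied pair."""
    a, b = implied_pair(row_a, row_b)
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 1.0  # no aligned residues: treat as maximally distant
    return sum(1 for x, y in compared if x != y) / len(compared)

# The last two rows of the slide's alignment.
print(mismatch_fraction("PEALNYGWY----SSESDVW", "PEVIRMQDDNPFSFSQSDVY"))  # 0.625
```

An optimal pairwise alignment of the same two sequences may place the gaps differently and give a smaller distance, which is the implied-versus-optimal issue the slide raises.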
13 Optimal Pair vs. Implied Pair
14 Interior Node Classification
- Interior Nodes Classified by Percent Identity
  - PID = (matched residues) / (total residues)
- User Specified Tiers
- User Specified Cost Criterion
- Example
  - PID > 60: PAM 40, High Gap Penalties
  - PID > 40: PAM 120, Medium Gap Penalties
  - PID < 40: PAM 200, Low Gap Penalties
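The tier example above can be sketched as a lookup from percent identity to a cost criterion. Two points are assumptions here: "total residues" is taken to mean columns where both sequences have a residue, and a PID of exactly 40 (which the slide leaves unspecified) falls into the lowest tier. The names `TIERS` and `cost_criterion` are hypothetical.

```python
def percent_identity(row_a, row_b):
    """PID in percent over columns where both rows hold a residue (assumed denominator)."""
    matched = total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        total += 1
        matched += (a == b)
    return 100.0 * matched / total if total else 0.0

# (lower bound on PID, cost matrix, gap penalty level), mirroring the slide's example.
TIERS = [(60.0, "PAM40", "high"), (40.0, "PAM120", "medium"), (0.0, "PAM200", "low")]

def cost_criterion(pid, tiers=TIERS):
    """Return (matrix, gap level) for the first tier whose bound pid exceeds."""
    for threshold, matrix, gap_level in tiers:
        if pid > threshold:
            return matrix, gap_level
    return tiers[-1][1:]  # pid of 0 still maps to the lowest tier

print(cost_criterion(percent_identity("ACGT", "ACGA")))  # ('PAM40', 'high')
```

Because the tiers are plain data, user-specified tiers and cost criteria amount to passing a different `tiers` list.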
15 Ordering Alignments
- Isolate Sub Trees
  - Threshold PID
- Order Alignments
  - Sub Tree
  - Border Nodes
  - Integrate All
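One possible reading of the ordering above is sketched here: interior nodes whose PID meets the threshold root isolated subtrees that are aligned first, and the remaining border nodes are aligned bottom-up until the root integrates everything. This is an interpretation of the slide, not the author's stated procedure; the `Node` class, `alignment_order`, and the example tree are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    pid: float = 100.0                      # PID at an interior node; unused for leaves
    children: list = field(default_factory=list)

def alignment_order(root, threshold):
    """Return interior node names: isolated subtree roots first, then border nodes."""
    subtrees, border = [], []

    def visit(node):
        if not node.children:               # leaf: nothing to align here
            return
        if node.pid >= threshold:           # similar enough: isolate as one subtree
            subtrees.append(node.name)
            return
        for child in node.children:
            visit(child)
        border.append(node.name)            # post-order: children before parent

    visit(root)
    return subtrees + border

tree = Node("root", 30, [
    Node("n1", 70, [Node("a"), Node("b")]),
    Node("n2", 35, [Node("n3", 65, [Node("c"), Node("d")]), Node("e")]),
])
print(alignment_order(tree, threshold=50))  # ['n1', 'n3', 'n2', 'root']
```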
16 Interior Alignments
- Sum of Pairs
- Bounded Search
- Implementation
  - Modular
  - Reentrant
  - Flexible Cost Criterion
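17 Generating Consensus
- Alignment (A1, A2, A3)
- Consensus X
  - $\min_X \sum_{k} D_k(A_k, X)$
- For Each Position i
  - $X_i = \arg\min_{\alpha} \bigl[ \mathrm{cost}(\alpha, A_{1,i}) + \mathrm{cost}(\alpha, A_{2,i}) + \mathrm{cost}(\alpha, A_{3,i}) \bigr]$
[Figure: consensus X connected to A1, A2, and A3 by edges labeled D1, D2, and D3]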
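The per-position rule above can be sketched as follows: for each column, pick the symbol whose summed cost against the aligned residues is smallest. The unit cost, the alphabet (20 amino acids plus the gap symbol), and the tie-breaking by alphabet order are assumptions made for this example.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus the gap symbol

def unit_cost(a, b):
    """Hypothetical cost: 0 for identical symbols, 1 otherwise."""
    return 0 if a == b else 1

def consensus(rows, cost=unit_cost, alphabet=ALPHABET):
    """Column-wise consensus minimizing the summed cost to every aligned row."""
    out = []
    for column in zip(*rows):
        # Ties are broken by position in `alphabet` (min returns the first minimum).
        best = min(alphabet, key=lambda x: sum(cost(x, c) for c in column))
        out.append(best)
    return "".join(out)

print(consensus(["ACD", "ACE", "GCE"]))  # ACE
```

Since each column is optimized independently, this only minimizes the total distance when the distance is a sum of per-column costs; affine gap penalties would break that independence.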
19 Testing the Method
- BAliBASE benchmark
  - Correct Alignments
  - Core Blocks of Conserved Motifs
  - Typical Hard Problem Sets
- Protein Parsimony
  - Measures Evolutionary Steps of Alignment
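BAliBASE comparisons are usually reported as an SP score (fraction of reference residue pairs reproduced) and a TC score (fraction of reference columns reproduced). The sketch below is one common formulation of both, scored over all columns rather than only the core blocks; it is not the benchmark's own scoring program, and the function names are assumptions.

```python
from itertools import combinations

def columns(rows):
    """Each column as a tuple of residue indices per sequence (None for a gap)."""
    counters = [0] * len(rows)
    cols = []
    for column in zip(*rows):
        entry = []
        for k, symbol in enumerate(column):
            if symbol == "-":
                entry.append(None)
            else:
                entry.append(counters[k])
                counters[k] += 1
        cols.append(tuple(entry))
    return cols

def residue_pairs(rows):
    """All aligned residue pairs as (seq a, index in a, seq b, index in b)."""
    pairs = set()
    for col in columns(rows):
        for a, b in combinations(range(len(col)), 2):
            if col[a] is not None and col[b] is not None:
                pairs.add((a, col[a], b, col[b]))
    return pairs

def sp_score(test, reference):
    """Fraction of reference residue pairs also aligned in the test alignment."""
    ref = residue_pairs(reference)
    return len(ref & residue_pairs(test)) / len(ref)

def tc_score(test, reference):
    """Fraction of reference columns (with two or more residues) reproduced exactly."""
    ref_cols = [c for c in columns(reference) if sum(x is not None for x in c) >= 2]
    test_cols = set(columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)

reference = ["AC-T", "ACGT"]
test = ["ACT-", "ACGT"]
print(sp_score(test, reference), tc_score(test, reference))
```

Both functions assume the reference contains at least one aligned residue pair; an empty reference would need separate handling.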
20 Baseline BAliBASE SP
21 Baseline BAliBASE TC
22 Baseline ProtPars
23 Orphans/Families BAliBASE SP
24 Orphans/Families ProtPars
25 Larger Families
[Figures: one results chart per slide, each annotated with the direction that is "better"]
27 Conclusions
- Solution Quality
- Captures Evolutionary Information
- Iterations Converge Quickly
- Useful Tool
29 Future Work
- Improved Alignment Consensus
- Multiple Partitioning Thresholds
- Multiple Solutions
- Integrated Phylogeny Modifications
- Parallel Implementation