Multiple Sequence Alignment by Iterative Tree-Neighbor Alignments

1
Multiple Sequence Alignment by Iterative
Tree-Neighbor Alignments
  • Susan Bibeault
  • June 9, 2000

2
Outline
  • Problem Statement and Importance
  • Terminology
  • Current Approaches
  • Our Alignment Heuristic
  • Performance Results
  • Conclusions
  • Future Work

4
Multiple Sequence Alignment
  • Problem
    • Given Sequence Set
    • Insert gaps into sequences so that evolutionarily conserved
      regions are aligned
  • Important tool
    • Relate Homologous Proteins
    • Discover Conserved Regions

VLSPADNVKAAWGKVGAHAGEYGAEALERMF
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVY
GLSDGEWQLVLNVWGKVEADIPGHVLIRLFK
6
Scoring Multiple Alignments
  • Sum of pairs: $\sum_{i<j} \mathrm{cost}(i, j)$
  • Tree: $\sum_{\mathrm{edge}} \mathrm{cost}(\mathrm{edge})$
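
As a rough illustration of sum-of-pairs scoring, the sketch below adds up a per-column cost over every pair of rows in an alignment. The flat mismatch and gap costs, and the names `pair_cost` and `sum_of_pairs`, are assumptions made for this example; they are not values or names from the talk.

```python
from itertools import combinations

def pair_cost(a, b, mismatch=1, gap=1):
    """Toy column cost (assumed): 0 for identical symbols, else a flat penalty."""
    if a == b:
        return 0  # covers residue matches and gap-gap columns
    if a == "-" or b == "-":
        return gap
    return mismatch

def sum_of_pairs(alignment):
    """Sum pair_cost over every pair of rows and every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    return sum(
        pair_cost(x, y)
        for s, t in combinations(alignment, 2)
        for x, y in zip(s, t)
    )

rows = ["ABCDEFGHI", "ABCD-FGHI", "ABXD-FGHI"]
print(sum_of_pairs(rows))
```

A real scorer would replace `pair_cost` with a substitution matrix lookup and a proper gap model.
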
7
Alignments
  • Scoring
    • Cost Matrix: C(aa1, aa2)
    • Gap Penalties
      • Simple: C(aa, -)
      • Affine: C(-) + Len × C(aa, -)

[Figure: alignment matrix for the sequences VLSPADNVKA and GLSDGEWQLVL]
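$$
\mathrm{Cost}(s_{1..i}, t_{1..j}) = \min
\begin{cases}
\mathrm{Cost}(s_{1..i}, t_{1..j-1}) + g \\
\mathrm{Cost}(s_{1..i-1}, t_{1..j-1}) + C(s_i, t_j) \\
\mathrm{Cost}(s_{1..i-1}, t_{1..j}) + g
\end{cases}
$$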
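
The recurrence above can be sketched as a small dynamic program. This is a minimal version assuming a linear gap cost `g` and a caller-supplied substitution cost; `align_cost` and `unit_cost` are names invented for the example.

```python
def align_cost(s, t, sub_cost, g):
    """Minimum global alignment cost of s and t with a linear gap cost g."""
    rows, cols = len(s) + 1, len(t) + 1
    cost = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        cost[i][0] = i * g
    for j in range(1, cols):
        cost[0][j] = j * g
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(
                cost[i][j - 1] + g,                                  # gap in s
                cost[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),   # pair s_i with t_j
                cost[i - 1][j] + g,                                  # gap in t
            )
    return cost[-1][-1]

def unit_cost(a, b):
    """Assumed toy substitution cost: 0 for a match, 1 otherwise."""
    return 0 if a == b else 1

print(align_cost("VLSPADNVKA", "GLSDGEWQLVL", unit_cost, g=1))
```

With unit costs this reduces to edit distance. Substituting a PAM-derived cost function and an affine gap model would bring it closer to the scoring described on the slide.
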
9
Current Approaches
  • Global Methods
    • Optimal Algorithms (MSA, MWT, MUSEQAL)
    • Progressive (MULTALIGN, PILEUP, CLUSTAL, MULTAL,
      AMULT, DFALIGN, MAP, PRRP, AMPS)
  • Local Methods
    • PIMA, DIALIGN, PRALIGN, MACAW, BlockMaker,
      Iteralign
  • Combined (GENALIGN, ASSEMBLE, DCA)
  • Statistical (HMMT, SAGA, SAM, Match Box)
  • Parsimony (MALIGN, TreeAlign)

Global Alignment
ABCDEFGHI
ABCD-FGHI

Local Alignment
XXXABCDYYY
ZZZABCDEEEE

11
Our Heuristic
  • Distance Estimation
  • Tree Construction
  • Node Initialization
  • Tree Partitioning
  • Iteration

12
Estimation of Protein Distance
  • Aligned Sequences → Estimated Pair Distances
  • Issue: Implied vs. Optimal Pair Alignments

Multiple alignment:
PEAAALYGRFT---IKSDVW
PESAALYGRFT---IKSDVW
PESLALYNKF---SIKSDVW
PEALNYGRY----SSESDVW
PEALNYGWY----SSESDVW
PEVIRMQDDNPFSFSQSDVY

Implied pair alignment:
PEALNYGWY----SSESDVW
PEVIRMQDDNPFSFSQSDVY
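
To make the idea of an implied pair alignment concrete, the sketch below projects two rows out of the multiple alignment shown on this slide, drops columns where both rows hold a gap, and estimates a distance from what remains. The mismatch-fraction distance and the function names are assumptions for illustration; the talk does not state which distance estimate it uses.

```python
def implied_pair(msa, i, j):
    """Project rows i and j out of an MSA, dropping columns where both are gaps."""
    cols = [(a, b) for a, b in zip(msa[i], msa[j]) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in cols), "".join(b for _, b in cols)

def p_distance(s, t):
    """Assumed distance: fraction of mismatches among gap-free columns."""
    pairs = [(a, b) for a, b in zip(s, t) if a != "-" and b != "-"]
    if not pairs:
        return 1.0
    return sum(a != b for a, b in pairs) / len(pairs)

msa = [
    "PEAAALYGRFT---IKSDVW",
    "PESAALYGRFT---IKSDVW",
    "PESLALYNKF---SIKSDVW",
    "PEALNYGRY----SSESDVW",
    "PEALNYGWY----SSESDVW",
    "PEVIRMQDDNPFSFSQSDVY",
]
a, b = implied_pair(msa, 0, 1)
print(a, b, p_distance(a, b))
```

An implied pair keeps whatever gaps the multiple alignment forced on the two rows, so its distance can differ from the one obtained by realigning the pair optimally. That difference is the issue the slide raises.
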
13
Optimal Pair vs. Implied Pair
14
Interior Node Classification
  • Interior Nodes Classified by Percent Identity
    • PID = (matched residues) / (total residues)
  • User Specified Tiers
  • User Specified Cost Criterion
  • Example
    • PID > 60 → PAM 40, High Gap Penalties
    • PID > 40 → PAM 120, Medium Gap Penalties
    • PID < 40 → PAM 200, Low Gap Penalties
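
The tiering above might be expressed as follows. This is a sketch under stated assumptions: PID is computed over columns where at least one row has a residue and is expressed as a percentage, and a PID of exactly 40 (which the slide leaves unspecified) falls into the lowest tier. The names `percent_identity`, `TIERS`, and `classify` are hypothetical.

```python
def percent_identity(s, t):
    """PID of two aligned rows: identical columns / columns holding any residue."""
    cols = [(a, b) for a, b in zip(s, t) if not (a == "-" and b == "-")]
    if not cols:
        return 0.0
    matched = sum(a == b for a, b in cols)  # gap-gap columns were already dropped
    return 100.0 * matched / len(cols)

# (PID threshold, cost matrix, gap-penalty level), following the slide's example tiers.
TIERS = [(60.0, "PAM 40", "high"), (40.0, "PAM 120", "medium")]
DEFAULT_TIER = ("PAM 200", "low")

def classify(pid, tiers=TIERS, default=DEFAULT_TIER):
    """Return the (matrix, gap level) of the first tier whose threshold pid exceeds."""
    for threshold, matrix, gaps in tiers:
        if pid > threshold:
            return matrix, gaps
    return default

pid = percent_identity("PEAAALYGRFT---IKSDVW", "PESAALYGRFT---IKSDVW")
print(round(pid, 1), classify(pid))
```

Keeping the tiers in a list mirrors the "user specified" nature of the thresholds: changing the cost criterion only means editing that list.
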

15
Ordering Alignments
  • Isolate Sub Trees
  • Threshold PID
  • Order Alignments
  • Sub Tree
  • Border Nodes
  • Integrate All
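
One possible reading of "Isolate Sub Trees" by a threshold PID is sketched below: edges whose PID falls below the threshold are cut, the remaining connected groups become sub trees, and the cut edges mark the borders between them. Attaching a PID to each edge, and every name in the code, are assumptions made for this illustration; the slides do not spell out the partitioning procedure.

```python
def isolate_subtrees(edges, threshold):
    """Split a tree by cutting edges whose PID is below threshold.

    edges: list of (node_a, node_b, pid). Returns (subtrees, cut_edges).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    cut = []
    for a, b, pid in edges:
        ra, rb = find(a), find(b)
        if pid >= threshold:
            parent[ra] = rb          # keep the edge: merge the two groups
        else:
            cut.append((a, b))       # border between two sub trees

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted), cut

edges = [
    ("s1", "n1", 80), ("s2", "n1", 75),
    ("n1", "n2", 35),
    ("s3", "n2", 65), ("s4", "n2", 70),
]
subtrees, borders = isolate_subtrees(edges, threshold=40)
print(subtrees, borders)
```

Under this reading, alignments inside each sub tree would be refined first, then the border nodes, and finally the whole tree.
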

16
Interior Alignments
  • Sum of Pairs
  • Bounded Search
  • Implementation
  • Modular
  • Reentrant
  • Flexible Cost Criterion

17
Generating Consensus
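  • Alignment (A1, A2, A3)
  • Consensus X
  • Minimize $\sum_{k} D_k(A_k, X)$
  • For Each Position $i$:
    $X_i = \arg\min_{\alpha} \left[ \mathrm{cost}(\alpha, A1_i) + \mathrm{cost}(\alpha, A2_i) + \mathrm{cost}(\alpha, A3_i) \right]$

[Figure: consensus X connected to A1, A2, and A3 by distances D1, D2, and D3]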
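
A column-wise consensus of this kind can be sketched as below. For simplicity each neighbor is treated as a single aligned row rather than a full alignment, and the unit cost and alphabet are assumptions; ties are broken by alphabet order, which is a choice made for the example only.

```python
def unit_cost(a, b):
    """Assumed toy cost: 0 for identical symbols, 1 otherwise."""
    return 0 if a == b else 1

def consensus(rows, cost, alphabet):
    """Pick, for each column, the symbol with the lowest summed cost to all rows."""
    return "".join(
        min(alphabet, key=lambda x: sum(cost(x, row[i]) for row in rows))
        for i in range(len(rows[0]))
    )

neighbors = ["ACD-", "ACE-", "AGE-"]
print(consensus(neighbors, unit_cost, "ACDEG-"))
```

Because each position is minimized independently, the summed distance to the neighbors is minimized whenever the distance is a sum of per-column costs; affine gap costs would break that independence.
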
19
Testing the Method
  • BAliBASE benchmark
  • Correct Alignments
  • Core Blocks of Conserved Motifs
  • Typical Hard Problem Sets
  • Protein Parsimony
  • Measures Evolutionary Steps of Alignment
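
For orientation, the two BAliBASE-style measures reported in the results, sum-of-pairs (SP) and total-column (TC) scores, can be sketched as below. SP is the fraction of residue pairs aligned in the reference that the test alignment also aligns; TC is the fraction of reference columns reproduced exactly. This simplified version scores whole alignments rather than only core blocks, and the function names are invented for the example.

```python
from itertools import combinations

def columns(alignment):
    """Return each column as a tuple of per-row residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for col in zip(*alignment):
        entry = []
        for r, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[r])
                counters[r] += 1
        cols.append(tuple(entry))
    return cols

def aligned_pairs(alignment):
    """Set of (row_a, idx_a, row_b, idx_b) residue pairs that share a column."""
    pairs = set()
    for col in columns(alignment):
        for (ra, ia), (rb, ib) in combinations(enumerate(col), 2):
            if ia is not None and ib is not None:
                pairs.add((ra, ia, rb, ib))
    return pairs

def sp_score(test, reference):
    """Fraction of reference residue pairs also aligned in the test alignment."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref)

def tc_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = columns(reference)
    test_cols = set(columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)

reference = ["AC-D", "ACED"]
test = ["ACD-", "ACED"]
print(sp_score(test, reference), tc_score(test, reference))
```

Both functions assume the test and reference alignments contain the same sequences in the same row order.
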

20
Baseline BAliBASE SP
21
Baseline BAliBASE TC
22
Baseline - ProtPars
23
Orphans/Families BAliBASE SP
24
Orphans/Families ProtPars
25
Larger Families
27
Conclusions
  • Solution Quality
  • Captures Evolutionary Information
  • Iterations Converge Quickly
  • Useful Tool

29
Future Work
  • Improved Alignment Consensus
  • Multiple Partitioning Thresholds
  • Multiple Solutions
  • Integrated Phylogeny Modifications
  • Parallel Implementation