Title: Using Dynamic Programming To Align Sequences
1Using Dynamic Programming To Align Sequences
Cédric Notredame
2Our Scope
Understanding the DP concept
Coding a Global and a Local Algorithm
Aligning with Affine gap penalties
Saving memory
Sophisticated variants
3Outline
-Coding Dynamic Programming with Non-affine Penalties
-Turning a global algorithm into a local Algorithm
-Adding affine penalties
-Using A Divide and conquer Strategy
-Tailoring DP to your needs
-The repeated Matches Algorithm
-Double Dynamic Programming
4Global Alignments Without Affine Gap penalties
Dynamic Programming
5How To align Two Sequences With a Gap Penalty, A
Substitution matrix and Not too Much Time
Dynamic Programming
6A bit of History
-DP was invented in the 1950s by Bellman
-Programming = Tabulation
-Re-invented in 1970 by Needleman and Wunsch
-It took 10 years to find out
7The Foolish Assumption
The score of each column of the alignment is independent from the rest of the alignment
It is possible to model the relationship between two sequences with
-A substitution matrix
-A simple gap penalty
8The Principle of DP
If you extend optimally an optimal alignment of
two sub-sequences, the result remains an optimal
alignment
9Finding the score of i,j
-Sequence 1: residues 1..i
-Sequence 2: residues 1..j
-The optimal alignment of 1..i vs 1..j can finish in three different manners (top: sequence 1, bottom: sequence 2)
X    X    -
-    X    X
10Finding the score of i,j
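Three ways to build the alignment of 1..i vs 1..j
-alignment of 1..i vs 1..j-1, extended with the column (-, j)
-alignment of 1..i-1 vs 1..j-1, extended with the column (i, j)
-alignment of 1..i-1 vs 1..j, extended with the column (i, -)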
11Finding the score of i,j
In order to compute the score of 1..i vs 1..j, all we need are the scores of
-1..i-1 vs 1..j-1
-1..i vs 1..j-1
-1..i-1 vs 1..j
12Formalizing the algorithm
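$$
F(i,j) = \mathrm{best}
\begin{cases}
F(i-1,j-1) + \mathrm{Mat}_{i,j} & \text{X aligned with X, extending 1..i-1 vs 1..j-1} \\
F(i,j-1) + \mathrm{Gep} & \text{gap aligned with X, extending 1..i vs 1..j-1} \\
F(i-1,j) + \mathrm{Gep} & \text{X aligned with gap, extending 1..i-1 vs 1..j}
\end{cases}
$$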
13Arranging Everything in a Table
1..I-1 vs 1..J-1
1..I vs 1..J-1
1..I-1 vs 1..J
1..I vs 1..J
14Taking Care of the Limits
The DP strategy relies on the idea that ALL the cells in your table have the same environment
This is NOT true of ALL the cells!
In a Dynamic Programming strategy, the most delicate part is to take care of the limits
-what happens when you start
-what happens when you finish
15Taking Care of the Limits
Match = 2, Mismatch = -1, Gap = -1
        -    F    A    T
   -    0
   F
   A
   S
   T   -4
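A minimal Python sketch of the table fill, limits included. The function name `nw_fill` and its parameter names are illustrative assumptions; the scores (Match = 2, Mismatch = -1, Gap = -1) and the sequences FAST and FAT are those of the slide. The first row and first column are the limits: aligning a prefix against nothing costs one gap per residue, which gives the -4 on the T row.

```python
def nw_fill(seq_i, seq_j, match=2, mismatch=-1, gep=-1):
    """Fill the global alignment score table F with a non-affine gap penalty."""
    n, m = len(seq_i), len(seq_j)
    F = [[0] * (m + 1) for _ in range(n + 1)]

    # Limits: these cells have no diagonal neighbour, so they are set by hand.
    for i in range(1, n + 1):
        F[i][0] = i * gep
    for j in range(1, m + 1):
        F[0][j] = j * gep

    # Every other cell has the same environment: diagonal, left and upper cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_i[i - 1] == seq_j[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # residue i aligned with residue j
                F[i][j - 1] + gep,    # residue j aligned with a gap
                F[i - 1][j] + gep,    # residue i aligned with a gap
            )
    return F


F = nw_fill("FAST", "FAT")
for row in F:
    print(row)
print("optimal score:", F[-1][-1])  # 5
```

The bottom-right cell holds the score of the optimal alignment of the two complete sequences.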
16Filling Up The Matrix
170
18Delivering the alignment Trace-back
Score of 1..3 vs 1..4 => Optimal Alignment Score
19Trace-back possible implementation
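    while (!($i == 0 && $j == 0)) {
        if ($tb[$i][$j] == $sub) {          # SUBSTITUTION
            $alnI[$aln_len] = $seqI[--$i];
            $alnJ[$aln_len] = $seqJ[--$j];
        }
        elsif ($tb[$i][$j] == $del) {       # DELETION
            $alnI[$aln_len] = '-';
            $alnJ[$aln_len] = $seqJ[--$j];
        }
        elsif ($tb[$i][$j] == $ins) {       # INSERTION
            $alnI[$aln_len] = $seqI[--$i];
            $alnJ[$aln_len] = '-';
        }
        $aln_len++;                         # the alignment is built backward
    }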
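The trace-back can also be sketched as a self-contained Python function that fills the table, records in a second table `tb` which of the three moves produced each cell, and then walks back from the last cell to the origin. The names `nw_align`, `SUB`, `DEL` and `INS` are assumptions made for this sketch, and ties are resolved in favour of the substitution, which is only one possible choice.

```python
SUB, DEL, INS = "sub", "del", "ins"


def nw_align(seq_i, seq_j, match=2, mismatch=-1, gep=-1):
    """Global alignment: fill the score table, then trace back from cell (n, m)."""
    n, m = len(seq_i), len(seq_j)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    tb = [[None] * (m + 1) for _ in range(n + 1)]

    # Limits: the borders can only be reached through gaps.
    for i in range(1, n + 1):
        F[i][0], tb[i][0] = i * gep, INS
    for j in range(1, m + 1):
        F[0][j], tb[0][j] = j * gep, DEL

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_i[i - 1] == seq_j[j - 1] else mismatch
            choices = [
                (F[i - 1][j - 1] + s, SUB),
                (F[i][j - 1] + gep, DEL),
                (F[i - 1][j] + gep, INS),
            ]
            # max() keeps the first of equally good choices, here SUB.
            F[i][j], tb[i][j] = max(choices, key=lambda c: c[0])

    # Trace-back: the alignment is collected backward, then reversed.
    aln_i, aln_j = [], []
    i, j = n, m
    while not (i == 0 and j == 0):
        if tb[i][j] == SUB:
            i, j = i - 1, j - 1
            aln_i.append(seq_i[i])
            aln_j.append(seq_j[j])
        elif tb[i][j] == DEL:
            j -= 1
            aln_i.append("-")
            aln_j.append(seq_j[j])
        else:  # INS
            i -= 1
            aln_i.append(seq_i[i])
            aln_j.append("-")
    return F[n][m], "".join(reversed(aln_i)), "".join(reversed(aln_j))


print(nw_align("FAT", "FAST"))  # (5, 'FA-T', 'FAST')
```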
20Local Alignments Without Affine Gap penalties
Smith and Waterman
21Getting rid of the pieces of Junk between the
interesting bits
Smith and Waterman
23The Smith and Waterman Algorithm
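$$
F(i,j) = \mathrm{best}
\begin{cases}
F(i-1,j-1) + \mathrm{Mat}_{i,j} \\
F(i,j-1) + \mathrm{Gep} \\
F(i-1,j) + \mathrm{Gep} \\
0
\end{cases}
$$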
24The Smith and Waterman Algorithm
0 => Ignore the rest of the matrix => Terminate a local alignment
25Filling Up a SW Matrix
0
26Filling up a SW matrix borders
     -  A  N  I  C  E  C  A  T
  -  0  0  0  0  0  0  0  0  0
  C  0
  A  0
  T  0
  A  0
  N  0
  D  0
  O  0
  G  0
27Filling up a SW matrix
     -  A  N  I  C  E  C  A  T
  -  0  0  0  0  0  0  0  0  0
  C  0  0  0  0  2  0  2  0  0
  A  0  2  0  0  0  0  0  4  0
  T  0  0  0  0  0  0  0  2  6
  A  0  2  0  0  0  0  0  0  4
  N  0  0  4  2  0  0  0  0  2
  D  0  0  2  2  0  0  0  0  0
  O  0  0  0  0  0  0  0  0  0
  G  0  0  0  0  0  0  0  0  0
Best local score (6) => Beginning of the trace-back
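28Turning NW into SW
    for ($i = 1; $i <= $len0; $i++) {
        for ($j = 1; $j <= $len1; $j++) {
            if ($res[0][0][$i-1] eq $res[1][0][$j-1]) { $s = 2; }
            else                                      { $s = -1; }

            $sub = $smat[$i-1][$j-1] + $s;
            $del = $smat[$i][$j-1]   + $gep;
            $ins = $smat[$i-1][$j]   + $gep;

            if ($sub > $del && $sub > $ins && $sub > 0) {
                $smat[$i][$j] = $sub;  $tb[$i][$j] = $subcode;
            }
            elsif ($del > $ins && $del > 0) {
                $smat[$i][$j] = $del;  $tb[$i][$j] = $delcode;
            }
            elsif ($ins > 0) {
                $smat[$i][$j] = $ins;  $tb[$i][$j] = $inscode;
            }
            else {                              # Turning NW into SW
                $smat[$i][$j] = $zero; $tb[$i][$j] = $stopcode;
            }

            if ($smat[$i][$j] > $best_score) {  # Prepare trace-back
                $best_score = $smat[$i][$j];
                $best_i = $i;
                $best_j = $j;
            }
        }
    }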
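A Python sketch of the complete local alignment, fill plus trace-back. The trace-back starts at the best cell and ends at the first stop code. The function name `smith_waterman` and the value `gep = -1` are assumptions of this sketch (the slides do not state the gap penalty); the match and mismatch scores and the sequences ANICECAT and CATANDOG are taken from the slides.

```python
def smith_waterman(seq_i, seq_j, match=2, mismatch=-1, gep=-1):
    """Best local alignment of two sequences, non-affine gap penalty."""
    n, m = len(seq_i), len(seq_j)
    # Borders stay at 0 with a stop code: a local alignment may start anywhere.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    tb = [["stop"] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_i[i - 1] == seq_j[j - 1] else mismatch
            choices = [
                (0, "stop"),                    # the extra option of SW
                (F[i - 1][j - 1] + s, "sub"),
                (F[i][j - 1] + gep, "del"),
                (F[i - 1][j] + gep, "ins"),
            ]
            F[i][j], tb[i][j] = max(choices, key=lambda c: c[0])
            if F[i][j] > best:                  # remember where to start the trace-back
                best, best_i, best_j = F[i][j], i, j

    aln_i, aln_j = [], []
    i, j = best_i, best_j
    while tb[i][j] != "stop":
        step = tb[i][j]
        if step == "sub":
            i, j = i - 1, j - 1
            aln_i.append(seq_i[i])
            aln_j.append(seq_j[j])
        elif step == "del":
            j -= 1
            aln_i.append("-")
            aln_j.append(seq_j[j])
        else:  # "ins"
            i -= 1
            aln_i.append(seq_i[i])
            aln_j.append("-")
    return best, "".join(reversed(aln_i)), "".join(reversed(aln_j))


print(smith_waterman("ANICECAT", "CATANDOG"))  # (6, 'CAT', 'CAT')
```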
29A few things to remember
SW only works if the substitution matrix has been
normalized to give a Negative score to a random
alignment.
30More than One match
-SW delivers only the best scoring match
-If you need more than one match:
- SIM (Huang and Miller)
- or Waterman and Eggert (Durbin, p91)
31Waterman and Eggert
- Iterative algorithm
- 1-identify the best match
- 2-redo SW with used pairs forbidden
- 3-finish when the last interesting local alignment has been extracted
- Delivers a collection of non-overlapping local alignments
- Avoids trivial variations of the optimal alignment
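The iteration above can be sketched in Python as follows. This is a simplified variant, not the published Waterman and Eggert procedure: the whole matrix is recomputed at every round, and every cell crossed by a reported alignment is forced to 0 in later rounds. The name `local_alignments`, the `min_score` threshold that defines an "interesting" match, and `gep = -1` are assumptions of this sketch.

```python
def local_alignments(seq_i, seq_j, min_score=3, match=2, mismatch=-1, gep=-1):
    """Repeat SW, forbidding the cells used by the alignments already reported."""
    n, m = len(seq_i), len(seq_j)
    used = set()   # cells (i, j) crossed by earlier alignments
    found = []
    while True:
        F = [[0] * (m + 1) for _ in range(n + 1)]
        tb = [["stop"] * (m + 1) for _ in range(n + 1)]
        best, best_i, best_j = 0, 0, 0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if (i, j) in used:
                    continue  # forbidden cell: score stays 0, code stays "stop"
                s = match if seq_i[i - 1] == seq_j[j - 1] else mismatch
                choices = [
                    (0, "stop"),
                    (F[i - 1][j - 1] + s, "sub"),
                    (F[i][j - 1] + gep, "del"),
                    (F[i - 1][j] + gep, "ins"),
                ]
                F[i][j], tb[i][j] = max(choices, key=lambda c: c[0])
                if F[i][j] > best:
                    best, best_i, best_j = F[i][j], i, j

        # Finish when nothing interesting is left.
        if best <= 0 or best < min_score:
            return found

        aln_i, aln_j = [], []
        i, j = best_i, best_j
        while tb[i][j] != "stop":
            used.add((i, j))  # forbid this cell in the next rounds
            step = tb[i][j]
            if step == "sub":
                i, j = i - 1, j - 1
                aln_i.append(seq_i[i])
                aln_j.append(seq_j[j])
            elif step == "del":
                j -= 1
                aln_i.append("-")
                aln_j.append(seq_j[j])
            else:  # "ins"
                i -= 1
                aln_i.append(seq_i[i])
                aln_j.append("-")
        found.append((best, "".join(reversed(aln_i)), "".join(reversed(aln_j))))


print(local_alignments("ANICECAT", "CATANDOG"))
# [(6, 'CAT', 'CAT'), (4, 'AN', 'AN')]
```

Recomputing the full matrix at each round keeps the sketch short; it costs one complete SW pass per reported alignment.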
32Adding Affine Gap Penalties
The Gotoh Algorithm
33Forcing a bit of Biology into your alignment
The Gotoh Formulation
34Why Affine gap Penalties are Biologically better
35But Harder To compute
More than 3 ways to extend an alignment
-Deletion: X over -
-Alignment: X over X
-Insertion: - over X
Example: X-XX aligned with XXXX
36More Questions Need to be asked
For instance, what is the cost of an insertion?
-GOP if the previous alignment ends with an aligned column (...X over ...X)
-GEP if the previous alignment already ends with a gap (...- over ...X)
37Solution: Maintain 3 Tables
Ix: Table that contains the score of every optimal alignment 1..i vs 1..j that finishes with an insertion in sequence X.
Iy: Table that contains the score of every optimal alignment 1..i vs 1..j that finishes with an insertion in sequence Y.
M: Table that contains the score of every optimal alignment 1..i vs 1..j that finishes with an alignment between sequence X and Y.
38The Algorithm
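A Python sketch of a three-table fill for the global alignment score with affine gap penalties. Several conventions here are assumptions, not taken from the slides: a gap of length k costs gop + k * gep, Ix holds alignments ending with a residue of X facing a gap, Iy holds the symmetric case, and a gap state is only entered from M or extended from itself (the formulation found in Durbin). The function name `gotoh_score` and the default values are illustrative.

```python
NEG_INF = float("-inf")


def gotoh_score(seq_x, seq_y, match=2, mismatch=-1, gop=-2, gep=-1):
    """Score of the best global alignment with affine gap penalties."""
    n, m = len(seq_x), len(seq_y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    # Limits: the borders can only be reached by one long gap.
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = gop + i * gep
    for j in range(1, m + 1):
        Iy[0][j] = gop + j * gep

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_x[i - 1] == seq_y[j - 1] else mismatch
            # Aligned column: any state may precede it.
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            # Gap facing x_i: open it from M (gop + gep) or extend it (gep).
            Ix[i][j] = max(M[i - 1][j] + gop + gep, Ix[i - 1][j] + gep)
            # Gap facing y_j: same two options.
            Iy[i][j] = max(M[i][j - 1] + gop + gep, Iy[i][j - 1] + gep)

    return max(M[n][m], Ix[n][m], Iy[n][m])


print(gotoh_score("FAT", "FAST"))  # 3: three matches (6) and one gap of length 1 (-3)
print(gotoh_score("ACGT", "AT"))   # 0: two matches (4) and one gap of length 2 (-4)
```

The second call shows the effect of the affine penalty: one gap of length 2 costs -4, whereas two separate gaps of length 1 would cost -6.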
39Trace-back?
Start from BEST of M(i,j), Ix(i,j) and Iy(i,j)
40Trace-back?
Navigate from one table to the next, knowing that
a gap always finishes with an aligned column
41Going Further ?
With the affine gap penalties, we have increased the number of possibilities when building our alignment. Computer scientists talk of states and represent this as a Finite State Automaton (FSAs are cousins of HMMs).
42Going Further ?
43Going Further ?
In Theory, there is no Limit on the number of
states one may consider when doing such a
computation.
45Going Further ?
Imagine a pairwise alignment algorithm where the
gap penalty depends on the length of the gap.
Can you simplify it realistically so that it
can be efficiently implemented?
47A divide and Conquer Strategy
The Myers and Miller Strategy
48Remember Not To Run Out of Memory
The Myers and Miller Strategy
49A Score in Linear Space
You never Need More Than The Previous Row To
Compute the optimal score
50A Score in Linear Space
For i
    For j
        R2[j] = best(R2[j-1] + gep, R1[j-1] + mat, R1[j] + gep)
    For j
        R1[j] = R2[j]
R1: previous row
R2: current row
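The two-row scheme translates directly into Python. This sketch keeps only the previous row `r1` and the current row `r2`; the function name `nw_score_linear` and the default scores are assumptions. It returns the optimal global score only, since the trace-back table is never stored.

```python
def nw_score_linear(seq_i, seq_j, match=2, mismatch=-1, gep=-1):
    """Optimal global alignment score using two rows of memory."""
    m = len(seq_j)
    r1 = [j * gep for j in range(m + 1)]        # previous row (row 0: the limit)
    for i in range(1, len(seq_i) + 1):
        r2 = [i * gep] + [0] * m                # current row, first cell is a limit
        for j in range(1, m + 1):
            s = match if seq_i[i - 1] == seq_j[j - 1] else mismatch
            r2[j] = max(r2[j - 1] + gep,        # left cell, same row
                        r1[j - 1] + s,          # diagonal cell, previous row
                        r1[j] + gep)            # upper cell, previous row
        r1 = r2                                 # the current row becomes the previous one
    return r1[m]


print(nw_score_linear("FAT", "FAST"))  # 5
```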
51A Score in Linear Space
52A Score in Linear Space
You never need more than the previous row to compute the optimal score.
You only need the matrix for the trace-back.
Or do you?
53An Alignment in Linear Space
B(i,j) + F(i,j) = Optimal score of the alignment that passes through pair i,j
54An Alignment in Linear Space
Forward algorithm: F(i,j)
Backward algorithm: B(i,j)
Optimal: B(i,j) + F(i,j)
56An Alignment in Linear Space
Forward Algorithm
Backward algorithm
Recursive divide and conquer strategy: Myers and Miller (Durbin p35)
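A Python sketch of the recursive divide and conquer strategy. To stay short it uses a non-affine gap penalty, which corresponds to Hirschberg's scheme; the Myers and Miller strategy named on the slide applies the same idea with affine penalties. The names `last_row` and `align` and the default scores are assumptions of this sketch.

```python
def last_row(a, b, match, mismatch, gep):
    """Scores of all of `a` against every prefix of `b`, using two rows only."""
    prev = [j * gep for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gep] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, cur[j - 1] + gep, prev[j] + gep)
        prev = cur
    return prev


def align(a, b, match=2, mismatch=-1, gep=-1):
    """Global alignment of a and b in linear space (non-affine gap penalty)."""
    if len(a) == 0:
        return "-" * len(b), b
    if len(b) == 0:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: put the single residue on its best partner, or on a gap.
        k = max(range(len(b)), key=lambda j: match if a == b[j] else mismatch)
        s = match if a == b[k] else mismatch
        if s + (len(b) - 1) * gep >= (len(b) + 1) * gep:
            return "-" * k + a + "-" * (len(b) - k - 1), b
        return a + "-" * len(b), "-" + b

    mid = len(a) // 2
    # Forward pass on the first half, backward pass on the reversed second half.
    fwd = last_row(a[:mid], b, match, mismatch, gep)
    bwd = last_row(a[mid:][::-1], b[::-1], match, mismatch, gep)
    # fwd[j] + bwd[len(b) - j] is the best score of an alignment cut at (mid, j).
    cut = max(range(len(b) + 1), key=lambda j: fwd[j] + bwd[len(b) - j])

    left = align(a[:mid], b[:cut], match, mismatch, gep)
    right = align(a[mid:], b[cut:], match, mismatch, gep)
    return left[0] + right[0], left[1] + right[1]


print(align("FAT", "FAST"))  # ('FA-T', 'FAST')
```

Only score rows are ever stored, so memory grows linearly with the sequence lengths, at the price of roughly doubling the computation.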
57An Alignment in Linear Space
58A Forward-only Strategy (Durbin, p35)
Forward Algorithm
-Keep Row M in memory
-Keep track of which cell in Row M leads to the optimal score
-Divide on this cell
59M
M
60An interesting application: finding sub-optimal alignments
Forward algorithm
Backward algorithm
Sum the Forward and Backward scores to identify the score of the best alignment going through cell i,j
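A Python sketch of this sum. Here F(i,j) is taken as the best score of the prefixes 1..i and 1..j, and B(i,j) as the best score of the remaining suffixes, obtained by running the same fill on the reversed sequences; these definitions, the function names and the default scores are assumptions of the sketch. Cells whose sum equals the optimal score lie on an optimal alignment, and cells with a slightly lower sum point to sub-optimal alignments.

```python
def forward(a, b, match=2, mismatch=-1, gep=-1):
    """Full global score table for a vs b (non-affine gap penalty)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gep
    for j in range(1, m + 1):
        F[0][j] = j * gep
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i][j - 1] + gep, F[i - 1][j] + gep)
    return F


def through_scores(a, b, match=2, mismatch=-1, gep=-1):
    """T[i][j] = F(i,j) + B(i,j): best score of an alignment going through cell i,j."""
    n, m = len(a), len(b)
    F = forward(a, b, match, mismatch, gep)
    # Backward table: the forward fill on the reversed sequences, read backward.
    R = forward(a[::-1], b[::-1], match, mismatch, gep)
    return [[F[i][j] + R[n - i][m - j] for j in range(m + 1)] for i in range(n + 1)]


T = through_scores("FAT", "FAST")
for row in T:
    print(row)
```

For FAT vs FAST the corner cells and every cell on the optimal path show 5, the optimal score, and all other cells show lower values.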
61Application: Non-local models
Double Dynamic Programming
62Outline
The main limitation of DP: a context-independent measure
63Double Dynamic Programming
High Level Smith and Waterman Dynamic Programming
$$
\mathrm{Score} = \max
\begin{cases}
S(i-1,j-1) + \text{RMSd score} \\
S(i,j-1) + gp \\
S(i-1,j) + gp
\end{cases}
$$
RMSd score: Rigid Body Superposition where i and j are forced together
64Double Dynamic Programming
65Application: Repeats
The Durbin Algorithm
67In The End: Wrapping it Up
68Dynamic Programming