Using Dynamic Programming To Align Sequences - PowerPoint PPT Presentation

Slides: 70
Provided by: tcof
Learn more at: https://tcoffee.org

Transcript and Presenter's Notes



1
Using Dynamic Programming To Align Sequences
Cédric Notredame
2
Our Scope
Understanding the DP concept
Coding a Global and a Local Algorithm
Aligning with Affine gap penalties
Saving memory
Sophisticated variants
3
Outline
-Coding Dynamic Programming with Non-affine Penalties
-Turning a global algorithm into a local Algorithm
-Adding affine penalties
-Using A Divide and conquer Strategy
-Tailoring DP to your needs
-The repeated Matches Algorithm
-Double Dynamic Programming
4
Global Alignments Without Affine Gap penalties
Dynamic Programming
5
How To Align Two Sequences With a Gap Penalty, a Substitution Matrix and Not Too Much Time
Dynamic Programming
6
A bit of History
-DP invented in the 1950s by Bellman
-"Programming" → Tabulation
-Re-invented in 1970 by Needleman and Wunsch
-It took 10 years to find out
7
The Foolish Assumption
The score of each column of the alignment is independent of the rest of the alignment
It is possible to model the relationship between two sequences with:
-A substitution matrix
-A simple gap penalty
8
The Principle of DP
If you optimally extend an optimal alignment of two sub-sequences, the result remains an optimal alignment
9
Finding the score of i,j
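-Sequence 1: residues 1..i
-Sequence 2: residues 1..j
-The optimal alignment of 1..i vs 1..j can finish in three different ways:
X over - (residue i against a gap)
X over X (residue i aligned with residue j)
- over X (a gap against residue j)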
10
Finding the score of i,j
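Three ways to build the alignment of 1..i vs 1..j:
1..i vs 1..j-1, followed by the column - over j
1..i-1 vs 1..j-1, followed by the column i over j
1..i-1 vs 1..j, followed by the column i over -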

11
Finding the score of i,j
In order to compute the score of 1..i vs 1..j, all we need are the scores of:
1..i-1 vs 1..j-1
1..i vs 1..j-1
1..i-1 vs 1..j
12
Formalizing the algorithm
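$$F(i,j) = \max\begin{cases} F(i-1,j-1) + Mat(i,j) & \text{X over X: extends } 1..i-1 \text{ vs } 1..j-1 \\ F(i-1,j) + Gep & \text{X over -: extends } 1..i-1 \text{ vs } 1..j \\ F(i,j-1) + Gep & \text{- over X: extends } 1..i \text{ vs } 1..j-1 \end{cases}$$
Mat(i,j) is the substitution score of residue i against residue j; Gep is the gap penalty, a negative number added to the score, since the best (maximum) score is kept.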

13
Arranging Everything in a Table
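Cell (I,J), the score of 1..I vs 1..J, is computed from three cells already filled:
(I-1,J-1): 1..I-1 vs 1..J-1
(I,J-1): 1..I vs 1..J-1
(I-1,J): 1..I-1 vs 1..J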
14
Taking Care of the Limits
The DP strategy relies on the idea that ALL the cells in your table have the same environment
This is NOT true of ALL the cells: the first row and the first column have no neighbours above them or to their left!
In a Dynamic Programming strategy, the most delicate part is to take care of the limits:
-what happens when you start
-what happens when you finish
15
Taking Care of the Limits
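Example: FAT vs FAST, Match 2, MisMatch -1, Gap -1
     -   F   A   T
 -   0  -1  -2  -3
 F  -1
 A  -2
 S  -3
 T  -4
The first row and the first column align a prefix against gaps only, so they hold multiples of the gap penalty.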
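The recurrence and the border rule can be checked with a small Python sketch. It assumes the example's scores (Match 2, MisMatch -1, Gap -1), a linear gap penalty and a maximised score; the name `nw_matrix` is illustrative, not the author's.

```python
def nw_matrix(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the global DP table F, where F[i][j] is the best score of a[:i] vs b[:j]."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # The limits: the first column and first row align a prefix against gaps only.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Every interior cell has the same environment: diagonal, upper and left neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # residue i aligned with residue j
                          F[i - 1][j] + gap,    # residue i against a gap
                          F[i][j - 1] + gap)    # residue j against a gap
    return F


F = nw_matrix("FAST", "FAT")  # FAST down the side, FAT across the top
for row in F:
    print(row)
print("optimal score:", F[-1][-1])
```

The bottom-right cell holds the score of the whole of FAST against the whole of FAT, which is where the trace-back starts.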
16
Filling Up The Matrix
17
18
Delivering the alignment: Trace-back
Score of 1..3 vs 1..4 → Optimal Aln Score
19
Trace-back possible implementation
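# assumes $tb[0][$j] is $del and $tb[$i][0] is $ins, so the loop always reaches (0,0)
while (!($i==0 && $j==0)) {
    if ($tb[$i][$j]==$sub) {          # SUBSTITUTION
        $alnI[$aln_len]=$seqI[--$i];
        $alnJ[$aln_len]=$seqJ[--$j];
    } elsif ($tb[$i][$j]==$del) {     # DELETION
        $alnI[$aln_len]='-';
        $alnJ[$aln_len]=$seqJ[--$j];
    } elsif ($tb[$i][$j]==$ins) {     # INSERTION
        $alnI[$aln_len]=$seqI[--$i];
        $alnJ[$aln_len]='-';
    }
    $aln_len++;
}
# the columns were collected from the end: reverse @alnI and @alnJ before printing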
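A self-contained Python sketch of the same trace-back, paired with a fill step that records one code per cell. The codes `SUB`, `DEL`, `INS` are hypothetical stand-ins for `$sub`, `$del`, `$ins` in the Perl fragment; the scores are the FAT/FAST example's, and ties are broken in the order substitution, deletion, insertion (an arbitrary choice).

```python
SUB, DEL, INS = "sub", "del", "ins"  # hypothetical trace-back codes


def nw_align(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment of a (seqI, rows) and b (seqJ, columns) with trace-back."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    tb = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0], tb[i][0] = i * gap, INS  # first column: only residues of a remain
    for j in range(1, m + 1):
        F[0][j], tb[0][j] = j * gap, DEL  # first row: only residues of b remain
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # max() returns the first of several equal maxima, so ties prefer SUB, then DEL.
            F[i][j], tb[i][j] = max((F[i - 1][j - 1] + s, SUB),
                                    (F[i][j - 1] + gap, DEL),
                                    (F[i - 1][j] + gap, INS),
                                    key=lambda t: t[0])
    # Trace-back: walk from (n, m) to (0, 0), collecting the columns in reverse.
    i, j, top, bottom = n, m, [], []
    while not (i == 0 and j == 0):
        if tb[i][j] == SUB:
            i, j = i - 1, j - 1
            top.append(a[i])
            bottom.append(b[j])
        elif tb[i][j] == DEL:
            j -= 1
            top.append("-")
            bottom.append(b[j])
        else:  # INS
            i -= 1
            top.append(a[i])
            bottom.append("-")
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(nw_align("FAT", "FAST"))  # (5, 'FA-T', 'FAST')
```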
20
Local Alignments Without Affine Gap penalties
Smith and Waterman
21
Getting rid of the pieces of junk between the interesting bits
Smith and Waterman
22
(No Transcript)
23
The Smith and Waterman Algorithm
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + Mat(i,j) \\ F(i-1,j) + Gep \\ F(i,j-1) + Gep \\ 0 \end{cases}$$
24
The Smith and Waterman Algorithm
0 → Ignore the rest of the matrix → Terminate a local Aln
25
Filling Up a SW Matrix
26
Filling up a SW matrix: borders
[Figure: ANICECAT across the top, CATANDOG down the side; every cell of the first row and of the first column is set to 0]
27
Filling up a SW matrix
[Figure: the filled SW matrix for ANICECAT (across) vs CATANDOG (down); the highest cell, 6, sits at the end of the CAT/CAT match]
Best local score → Beginning of the trace-back
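A minimal Python sketch of Smith and Waterman with a trace-back from the best cell, assuming Match 2, MisMatch -1 and a linear Gap of -1 (the scores of the global example). On the slide's sequences the best local match is CAT/CAT with a score of 6.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of a (rows) and b (columns): (score, top, bottom)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # borders stay 0: a local aln may start anywhere
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i][j - 1] + gap, H[i - 1][j] + gap)
            if H[i][j] > best:  # remember the best cell for the trace-back
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a 0 cell terminates the local alignment.
    i, j, top, bottom = best_i, best_j, [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
            top.append(a[i])
            bottom.append(b[j])
        elif H[i][j] == H[i][j - 1] + gap:
            j -= 1
            top.append("-")
            bottom.append(b[j])
        else:
            i -= 1
            top.append(a[i])
            bottom.append("-")
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("CATANDOG", "ANICECAT"))  # (6, 'CAT', 'CAT')
```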
28
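# $gep is the (negative) gap penalty; $best_score starts at 0
for ($i=1; $i<=$len0; $i++) {
    for ($j=1; $j<=$len1; $j++) {
        if ($res0[$i-1] eq $res1[$j-1]) {$s=2;} else {$s=-1;}
        $sub=$smat[$i-1][$j-1]+$s;
        $del=$smat[$i][$j-1]+$gep;
        $ins=$smat[$i-1][$j]+$gep;
        # Turning NW into SW: a cell never goes below 0 (stop code)
        if    ($sub>$del && $sub>$ins && $sub>0) {$smat[$i][$j]=$sub; $tb[$i][$j]=$subcode;}
        elsif ($del>$ins && $del>0)             {$smat[$i][$j]=$del; $tb[$i][$j]=$delcode;}
        elsif ($ins>0)                          {$smat[$i][$j]=$ins; $tb[$i][$j]=$inscode;}
        else                                    {$smat[$i][$j]=0;    $tb[$i][$j]=$stopcode;}
        # Prepare trace-back: remember the best cell seen so far
        if ($smat[$i][$j]>$best_score) {$best_score=$smat[$i][$j]; $best_i=$i; $best_j=$j;}
    }
}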
29
A few things to remember
SW only works if the substitution matrix has been normalized so that a random alignment gets a negative expected score.
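To see what the normalisation means, a short sketch computes the expected score of one column aligning two independent random residues, the sum over a,b of p_a · p_b · s(a,b). Uniform background frequencies and the Match 2 / MisMatch -1 scheme are assumptions made for illustration; a real substitution matrix would supply its own s(a,b) and frequencies.

```python
def expected_score(freqs, score):
    """Expected score of one column aligning two independent random residues."""
    return sum(pa * pb * score(x, y)
               for x, pa in freqs.items()
               for y, pb in freqs.items())


def match_mismatch(x, y):
    # Match 2 / MisMatch -1, as in the earlier examples
    return 2 if x == y else -1


dna = {c: 0.25 for c in "ACGT"}
protein = {c: 0.05 for c in "ACDEFGHIKLMNPQRSTVWY"}
print(expected_score(dna, match_mismatch))      # -0.25
print(expected_score(protein, match_mismatch))  # about -0.85
```

Both values are negative, so under these assumptions a random stretch tends to drag a local score back towards 0 instead of growing without limit.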
30
More than One match
-SW delivers only the best-scoring match
-If you need more than one match, use:
  • SIM (Huang and Miller), or
  • Waterman and Eggert (Durbin, p91)

31
Waterman and Eggert
  • Iterative algorithm:
  • 1-identify the best match
  • 2-redo SW with the pairs already used forbidden
  • 3-finish when the last interesting local alignment has been extracted
  • Delivers a collection of non-overlapping local alignments
  • Avoids trivial variations of the optimal alignment
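The iteration can be sketched in Python as below, assuming Match 2, MisMatch -1 and Gap -1. This is a simplified version: it recomputes the whole SW matrix at each round and forces forbidden pairs to 0, whereas a more efficient implementation would only recompute the cells affected by the newly forbidden pairs. The `threshold` parameter is our stand-in for "the last interesting local alignment" and must be positive.

```python
def sw_fill(a, b, forbidden, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman table in which the (i, j) pairs in `forbidden` are forced to 0."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if (i, j) in forbidden:
                continue  # pair used by an earlier match: cell stays 0
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H


def trace_pairs(H, a, b, i, j, match=2, mismatch=-1, gap=-1):
    """Trace back from (i, j) until a 0 cell; return the aligned (i, j) pairs."""
    pairs = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def waterman_eggert(a, b, threshold, match=2, mismatch=-1, gap=-1):
    """Repeatedly take the best local match, then forbid its aligned pairs."""
    forbidden, hits = set(), []
    while True:
        H = sw_fill(a, b, forbidden, match, mismatch, gap)
        best, bi, bj = max((H[i][j], i, j)
                           for i in range(len(a) + 1) for j in range(len(b) + 1))
        if best < threshold:
            return hits
        pairs = trace_pairs(H, a, b, bi, bj, match, mismatch, gap)
        hits.append((best, pairs))
        forbidden.update(pairs)


for score, pairs in waterman_eggert("CATANDOG", "ANICECAT", threshold=4):
    print(score, pairs)
```

On the SW example this first returns the CAT/CAT match (score 6) and then the AN/AN match (score 4).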

32
Adding Affine Gap Penalties
The Gotoh Algorithm
33
Forcing a bit of Biology into your alignment
The Gotoh Formulation
34
Why Affine gap Penalties are Biologically better
35
But Harder To compute
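More Than 3 Ways to Extend an Alignment
X over - : Deletion
X over X : Alignment
- over X : Insertion
X-XX / XXXX
The cost of a gap column now depends on whether the previous column was already a gap, so the three moves alone are no longer enough.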
36
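More Questions Need to be Asked
For instance, what is the cost of an insertion?
-If 1..I vs 1..J-1 ends with an aligned pair (X over X), adding - over X opens a new gap: GOP
-If 1..I vs 1..J-1 already ends with - over X, adding - over X extends that gap: GEP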
37
Solution: Maintain 3 Tables
Ix: table that contains the score of every optimal alignment 1..i vs 1..j that finishes with an insertion in sequence X.
Iy: table that contains the score of every optimal alignment 1..i vs 1..j that finishes with an insertion in sequence Y.
M: table that contains the score of every optimal alignment 1..i vs 1..j that finishes with an aligned pair between sequences X and Y.
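A score-only sketch of the three-table recurrence (Gotoh) in Python. Assumptions: a gap of length L scores GOP + (L-1)·GEP, Ix means that residue i of X faces a gap and Iy that residue j of Y faces a gap, and no transition is allowed directly between Ix and Iy. The function name and default values are illustrative; the trace-back is left out.

```python
NEG = float("-inf")


def gotoh(a, b, match=2, mismatch=-1, gop=-3, gep=-1):
    """Global score with affine gaps, using the three tables M, Ix and Iy."""
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = gop + (i - 1) * gep  # a[:i] against a single gap
    for j in range(1, m + 1):
        Iy[0][j] = gop + (j - 1) * gep  # b[:j] against a single gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # aligned pair: may follow any of the three states
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # a[i-1] against a gap: open from M (GOP) or extend Ix (GEP)
            Ix[i][j] = max(M[i - 1][j] + gop, Ix[i - 1][j] + gep)
            # b[j-1] against a gap: open from M (GOP) or extend Iy (GEP)
            Iy[i][j] = max(M[i][j - 1] + gop, Iy[i][j - 1] + gep)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(gotoh("ACGT", "AT"))  # 0: A--T, one gap of length 2
```

With GOP equal to GEP the tables reproduce the linear-gap score, which is a convenient sanity check.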
38
The Algorithm
39
Trace-back?
[Figure: the three tables Ix, Iy and M]
Start from the BEST of M(i,j), Ix(i,j) and Iy(i,j)
40
Trace-back?
Navigate from one table to the next, knowing that
a gap always finishes with an aligned column
41
Going Further ?
With affine gap penalties, we have increased the number of possibilities when building our alignment. Computer scientists talk of states and represent this as a Finite State Automaton (FSAs are cousins of HMMs)
42
Going Further ?
43
Going Further ?
In Theory, there is no Limit on the number of
states one may consider when doing such a
computation.
44
(No Transcript)
45
Going Further ?
Imagine a pairwise alignment algorithm where the
gap penalty depends on the length of the gap.
Can you simplify it realistically so that it
can be efficiently implemented?
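For comparison, the unsimplified version of this exercise can be sketched: with an arbitrary gap function gamma(L), every cell must look back along its whole row and column, so the fill costs O(n·m·(n+m)) instead of O(n·m). The function name, the `gap` callable and the default scores are illustrative assumptions.

```python
def nw_general_gap(a, b, gap, match=2, mismatch=-1):
    """Global score where a gap of length L scores gap(L); cubic time."""
    n, m = len(a), len(b)
    F = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = float("-inf")
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = F[i - 1][j - 1] + s
            for k in range(1, i + 1):  # a[i-k:i] against one gap of length k
                best = max(best, F[i - k][j] + gap(k))
            for k in range(1, j + 1):  # b[j-k:j] against one gap of length k
                best = max(best, F[i][j - k] + gap(k))
            F[i][j] = best
    return F[n][m]


print(nw_general_gap("FAT", "FAST", lambda L: -L))              # linear gaps
print(nw_general_gap("ACGT", "AT", lambda L: -3 - (L - 1)))     # affine-shaped gaps
```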
46
(No Transcript)
47
A divide and Conquer Strategy
The Myers and Miller Strategy
48
Remember Not To Run Out of Memory
The Myers and Miller Strategy
49
A Score in Linear Space
You never Need More Than The Previous Row To
Compute the optimal score
50
A Score in Linear Space
For I
  For J: R2[J] = best( R2[J-1] + gep, R1[J-1] + mat[I][J], R1[J] + gep )
  For J: R1[J] = R2[J]
(R1 holds the previous row, R2 the row being filled)
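The two-row scheme above in Python, as a sketch under the same example scores (Match 2, MisMatch -1, Gap -1); `R1` and `R2` follow the slide's names, and only the last row is returned.

```python
def nw_last_row(a, b, match=2, mismatch=-1, gap=-1):
    """Global DP keeping only two rows: R1 (previous) and R2 (current)."""
    R1 = [j * gap for j in range(len(b) + 1)]  # row 0: b[:j] against gaps
    for i in range(1, len(a) + 1):
        R2 = [i * gap] + [0] * len(b)          # column 0 of row i
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            R2[j] = max(R2[j - 1] + gap, R1[j - 1] + s, R1[j] + gap)
        R1 = R2                                # "For J, R1[J] = R2[J]"
    return R1


row = nw_last_row("FAT", "FAST")
print(row[-1])  # optimal global score, computed in O(len(b)) memory
```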
51
A Score in Linear Space
52
A Score in Linear Space
You never need more than the previous row to compute the optimal score. You only need the matrix for the trace-back,
Or do you?
53
An Alignment in Linear Space
B(i,j) + F(i,j) = optimal score of the alignment that passes through pair (i,j)
54
An Alignment in Linear Space
[Figure: the forward algorithm runs from the beginning of both sequences, the backward algorithm from their ends]
Optimal score = B(i,j) + F(i,j) for any cell (i,j) on the optimal path
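A sketch of the forward/backward sum, assuming a linear gap penalty and the example scores. Here B(i,j) is taken to be the score of a[i:] against b[j:]; it is obtained by running the same forward pass on the reversed suffixes, which gives the same scores because the scoring does not depend on direction. For any row i, the maximum over j of F(i,j) + B(i,j) equals the optimal score, since every global path crosses row i. The function names are illustrative.

```python
def last_row(a, b, match=2, mismatch=-1, gap=-1):
    """Linear-space forward pass: scores of all of a against every prefix of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def forward_plus_backward(a, b, i):
    """F(i, j) + B(i, j) for every j, where B(i, j) scores a[i:] against b[j:]."""
    m = len(b)
    F = last_row(a[:i], b)               # forward: prefix a[:i] vs every b[:j]
    R = last_row(a[i:][::-1], b[::-1])   # backward: forward pass on reversed suffixes
    B = [R[m - j] for j in range(m + 1)]  # R[k] scores a[i:] vs the last k residues of b
    return [f + g for f, g in zip(F, B)]


print(forward_plus_backward("FAT", "FAST", 1))
```

The same sums give the score of the best alignment forced through any given cell, which is what the sub-optimal alignment application relies on.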
55
(No Transcript)
56
An Alignment in Linear Space
Forward algorithm + backward algorithm
Recursive divide and conquer strategy: Myers and Miller (Durbin p35)
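A compact Python sketch of this divide-and-conquer idea for a linear gap penalty. It follows the Hirschberg form of the strategy (Myers and Miller extend it to affine gaps; only the linear case is shown). Scores are the earlier example's; `last_row`, `hirschberg` and `alignment_score` are illustrative names.

```python
def last_row(a, b, match=2, mismatch=-1, gap=-1):
    """Linear-space forward pass: scores of all of a against every prefix of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def hirschberg(a, b, match=2, mismatch=-1, gap=-1):
    """Return one optimal global alignment (top, bottom) in linear space."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: a[0] faces the best residue of b, or a gap.
        best_k, best = None, (len(b) + 1) * gap
        for k, c in enumerate(b):
            sc = (match if c == a[0] else mismatch) + (len(b) - 1) * gap
            if sc > best:
                best_k, best = k, sc
        if best_k is None:
            return a + "-" * len(b), "-" + b
        return "-" * best_k + a + "-" * (len(b) - best_k - 1), b
    mid = len(a) // 2
    fwd = last_row(a[:mid], b, match, mismatch, gap)              # F(mid, j)
    bwd = last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)  # B(mid, j), read backwards
    m = len(b)
    k = max(range(m + 1), key=lambda j: fwd[j] + bwd[m - j])      # where the path crosses row mid
    top1, bot1 = hirschberg(a[:mid], b[:k], match, mismatch, gap)
    top2, bot2 = hirschberg(a[mid:], b[k:], match, mismatch, gap)
    return top1 + top2, bot1 + bot2


def alignment_score(top, bottom, match=2, mismatch=-1, gap=-1):
    """Score an alignment column by column."""
    return sum(gap if "-" in (x, y) else (match if x == y else mismatch)
               for x, y in zip(top, bottom))


print(hirschberg("FAT", "FAST"))  # ('FA-T', 'FAST')
```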
57
An Alignment in Linear Space
58
A Forward-only Strategy (Durbin, p35)
Forward Algorithm
-Keep row M in memory
-Keep track of which cell in row M led to the optimal score
-Divide on this cell
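The forward-only variant can be sketched as a single pass that carries, for every cell below row M (called `mid` here), the column at which its best path left row M. Assumptions: linear gaps, the example scores, and ties resolved in the order diagonal, up, left; `nw_score` is only included to check the split.

```python
def nw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Optimal global score in two rows (used only to check the split)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def forward_only_split(a, b, match=2, mismatch=-1, gap=-1):
    """One forward pass: (optimal score, mid, k), where an optimal path leaves row mid at column k."""
    n, m = len(a), len(b)
    mid = n // 2
    prev = [j * gap for j in range(m + 1)]
    ptr = list(range(m + 1))  # only meaningful once row `mid` has been reached
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        # In row mid every cell points to itself; below it, pointers are inherited.
        cur_ptr = list(range(m + 1)) if i == mid else [ptr[0]] + [0] * m
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j], src = max((prev[j - 1] + s, ptr[j - 1]),
                              (prev[j] + gap, ptr[j]),
                              (cur[j - 1] + gap, cur_ptr[j - 1]),
                              key=lambda t: t[0])
            if i != mid:
                cur_ptr[j] = src
        prev, ptr = cur, cur_ptr
    return prev[m], mid, ptr[m]


print(forward_only_split("FAT", "FAST"))  # (5, 1, 1)
```

Dividing on the returned cell gives two independent sub-problems, a[:mid] vs b[:k] and a[mid:] vs b[k:], whose optimal scores add up to the global optimum.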
59
60
An interesting application: finding sub-optimal alignments
Forward algorithm + backward algorithm
Sum the forward and backward scores to identify the score of the best alignment going through cell (i,j)
61
Application Non-local models
Double Dynamic Programming
62
Outline
The main limitation of DP: a context-independent measure
63
Double Dynamic Programming
High Level Smith and Waterman Dynamic Programming
$$S(i,j) = \max\begin{cases} S(i-1,j-1) + \text{RMSd score} \\ S(i,j-1) + gp \\ S(i-1,j) + gp \end{cases}$$
RMSd score: rigid-body superposition where i and j are forced together
64
Double Dynamic Programming
65
Application Repeats
The Durbin Algorithm
66
(No Transcript)
67
In the End: Wrapping It Up
68
Dynamic Programming
69
(No Transcript)