Pairwise Sequence Comparison

About This Presentation

Title:

Pairwise Sequence Comparison

Description:

Pairwise Sequence Comparison Stat 246, Spring 2002, Week 5, – PowerPoint PPT presentation

Slides: 83

Provided by: Comput517

Learn more at: https://www.stat.berkeley.edu

Transcript and Presenter's Notes

Title: Pairwise Sequence Comparison

1
Pairwise Sequence Comparison
Stat 246, Spring 2002, Week 5,
2
Sequence comparison topics
General concepts
Dot plots
Global alignments
Scoring matrices
Gap penalties
Dynamic programming
Chance or common ancestry?
3
Dot Plot

This is the earliest, simplest and most complete
method for comparing two sequences
It is possible to filter the plot to minimise
noise whilst preserving the obvious
relationship
This plot can identify
regions of similarity
internal repeats
rearrangement events

4
[Figure: dot plot grid. Sequence 2 (b = ACACACTA) runs along the top; sequence 1 (a = AGCACACA) runs down the side.]
(Add a guard row and column.)
A dot goes where the two sequences match.
Connect the dots along diagonals.
5
Extensions to dot plots

Modern dot plots are more sophisticated, using the notions of
window: the size of the diagonal strip centered on an entry, over which matching is accumulated, and
stringency: the extent of agreement required over the window before a dot is placed at the central entry.
E.g. for a window of size 5, we might require at least 3 matches, and then we put a dot in the central spot. More complex scoring rules can be used.
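The window and stringency idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dot_plot` and `render` are hypothetical, the window is taken to be odd, and positions that fall off the end of either sequence are counted as mismatches (a choice the slides do not specify). With `window=1, stringency=1` it reduces to the plain dot plot.

```python
def dot_plot(a, b, window=1, stringency=1):
    """Return the set of (i, j) cells that receive a dot.

    A dot is placed at (i, j) when at least `stringency` of the `window`
    diagonal positions centred on (i, j) match. `window` is assumed odd;
    positions outside either sequence count as mismatches (an assumption).
    """
    half = window // 2
    dots = set()
    for i in range(len(a)):
        for j in range(len(b)):
            matches = 0
            for k in range(-half, half + 1):
                ii, jj = i + k, j + k
                if 0 <= ii < len(a) and 0 <= jj < len(b) and a[ii] == b[jj]:
                    matches += 1
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(a, b, dots):
    """Draw the plot as text: sequence b along the top, a down the side."""
    lines = ["  " + " ".join(b)]
    for i, ch in enumerate(a):
        row = " ".join("*" if (i, j) in dots else "." for j in range(len(b)))
        lines.append(ch + " " + row)
    return "\n".join(lines)


a, b = "AGCACACA", "ACACACTA"
plain = dot_plot(a, b)                             # every single match
filtered = dot_plot(a, b, window=5, stringency=3)  # 3 of 5 must match
print(render(a, b, plain))
print()
print(render(a, b, filtered))
```

Raising the stringency removes isolated dots, such as the lone corner match between the first A of a and the last A of b, while dots that lie on a run of diagonal matches survive.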

6
Human ? globin vs. human myoglobin
7
Human LDL receptor vs. itself (w = 30, s = 9)
8
Human LDL receptor vs. itself (40, 15)
9
Human LDL receptor vs. itself (40, 17.5)
10
Human LDL receptor vs. itself (40, 20)
11
Plasmodium falciparum MSP3 vs. itself (30,9)
12
Plasmodium falciparum MSP3 vs. itself (20,9)
13
Plasmodium falciparum MSP3 vs. itself (10,9)
14
Global alignment
An alignment of two sequences a and b is an arrangement of a and b by position, where a and b can be padded with gap symbols to achieve the same length:

a: AGCACAC-A    or    a: AG-CACACA
b: A-CACACTA          b: ACACACT-A

If we read the alignment column-wise, we have a protocol of edit operations that lead from a to b.

Left             Right
Match (A,A)      Match (A,A)
Delete (G,-)     Replace (G,C)
Match (C,C)      Insert (-,A)
Match (A,A)      Match (C,C)
Match (C,C)      Match (A,A)
Match (A,A)      Match (C,C)
Match (C,C)      Replace (A,T)
Insert (-,T)     Delete (C,-)
Match (A,A)      Match (A,A)

The left-hand alignment shows one Delete, one Insert, and the other edit operations are Matches. The right-hand alignment shows one Insert, one Delete, two Replaces, and the rest are Matches.
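Reading an alignment column by column can be written down directly. The sketch below is illustrative only; the function name `edit_protocol` and the use of "-" as the gap symbol are assumptions made for the example.

```python
def edit_protocol(row_a, row_b, gap="-"):
    """Classify each alignment column as Match, Replace, Insert or Delete."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    ops = []
    for u, v in zip(row_a, row_b):
        if u == gap and v == gap:
            raise ValueError("a column cannot pair two gaps")
        if v == gap:
            ops.append(("Delete", u, v))    # character of a has no partner in b
        elif u == gap:
            ops.append(("Insert", u, v))    # character of b has no partner in a
        elif u == v:
            ops.append(("Match", u, v))
        else:
            ops.append(("Replace", u, v))
    return ops


left = edit_protocol("AGCACAC-A", "A-CACACTA")
right = edit_protocol("AG-CACACA", "ACACACT-A")
for name, u, v in left:
    print(f"{name} ({u},{v})")
```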
15
Cost (scoring) of global alignments; optimal global alignments
Next we turn the edit protocol into a measure of distance by assigning a cost or weight S to each operation. For example, for arbitrary characters u, v from the alphabet A we may define
S(u,u) = 0, S(u,v) = 1 for u ≠ v, S(u,-) = S(-,v) = 1. (Unit cost)
This scheme is known as the Levenshtein distance, also called the unit cost model. Its predominant virtue is its simplicity. In general, more sophisticated cost models must be used. For example, replacing an amino acid by a biochemically similar one should weigh less than a replacement by an amino acid with totally different properties. Details shortly.
Now we are ready to define the most important notion for sequence analysis:
The cost of an alignment of two sequences a and b is the sum of the costs of all the edit operations that lead from a to b.
An optimal alignment of a and b is an alignment which has minimal cost among all possible alignments.
The edit distance of a and b is the cost of an optimal alignment of a and b under a cost function S. We denote it by d(a,b).
Using the unit cost model for S in our previous example, we obtain the following costs:

a: AGCACAC-A    or    a: AG-CACACA
b: A-CACACTA          b: ACACACT-A
cost: 2               cost: 4

Here it is easily seen that the left-hand alignment is optimal under the unit cost model, and hence the edit distance d(a,b) = 2.
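As a quick check of the two costs quoted above, the unit cost model can be applied column by column. This is a minimal sketch; `unit_cost` and `alignment_cost` are hypothetical names, and the gap symbol "-" is treated as just another character, so that S(u,-) = S(-,v) = 1 falls out of the mismatch rule.

```python
def unit_cost(u, v):
    """Unit cost model: 0 for a match, 1 for a replace, insert or delete."""
    return 0 if u == v else 1


def alignment_cost(row_a, row_b, cost=unit_cost):
    """Sum the cost of every column of an alignment (rows of equal length)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(cost(u, v) for u, v in zip(row_a, row_b))


left_cost = alignment_cost("AGCACAC-A", "A-CACACTA")
right_cost = alignment_cost("AG-CACACA", "ACACACT-A")
print(left_cost, right_cost)
```

Passing a different `cost` function, for example one derived from a substitution matrix, gives the more general cost models mentioned in the text.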
16
More general scores; costs: see later.
134 LQQGELDLVMTSDILPRSELHYSPMFDFEVRLVLAPDHPLASKTQITPEDLASETLLI
137 LDSNSVDLVLMGVPPRNVEVEAEAFMDNPLVVIAPPDHPLAGERAISLARLAEETFVM
Example pair scores: DD = 6, DR = -2
From Henikoff 1996
17
Scoring Matrices
Physical/chemical similarities: comparing two sequences according to the properties of their residues may highlight regions of structural similarity.
Identity matrices: by stressing only identities in the alignment, stretches of sequence that may have diverged will not penalise any remaining common features.
18
Scoring Matrices (ctd)
As the direct source of residue-by-residue comparison scores, the scoring matrix you choose will have a major impact on the alignment calculated.
The most commonly used will be one of the mutation matrices: PAM or BLOSUM. Von Bing will explain the derivation of these and other mutation matrices next Tuesday.
The matrix that performs best will be the matrix that best reflects the evolutionary separation of the sequences being aligned.
19
Statistical motivation for alignment scores
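$\mathrm{pr}(\mathrm{data} \mid H) = \mathrm{pr}(AA \mid H)\,\mathrm{pr}(GA \mid H) \times \dots = (1-p)^a p^d$,
where d = number of disagreements, a = number of agreements, and $p = \tfrac{3}{4}(1 - e^{-8\alpha t})$.
$\mathrm{pr}(\mathrm{data} \mid R) = \mathrm{pr}(AA \mid R)\,\mathrm{pr}(GA \mid R) \times \dots = (\tfrac{1}{4})^a (\tfrac{3}{4})^d$.
$\log \dfrac{\mathrm{pr}(\mathrm{data} \mid H)}{\mathrm{pr}(\mathrm{data} \mid R)} = a \times \log\dfrac{1-p}{1/4} + d \times \log\dfrac{p}{3/4}$.
Since $p < \tfrac{3}{4}$, $\log\frac{p}{3/4} < 0$ and $\log\frac{1-p}{1/4} > 0$, so
score $= a \times s + d \times (-m)$, with $s > 0$ the match score and $-m < 0$ the mismatch penalty.
Note that if $\alpha t \approx 0$, then $p \approx 6\alpha t$ and $1-p \approx 1$, so $s \approx \log 4$, while $-m \approx \log(8\alpha t)$ is large and negative: a big difference in the two scores.
Conversely, if $\alpha t$ is large, write $\varepsilon = e^{-8\alpha t}$. Then $p = \tfrac{3}{4}(1-\varepsilon)$, $\frac{p}{3/4} = 1-\varepsilon$, and $-m = \log(1-\varepsilon) \approx -\varepsilon$, while $1-p = \tfrac{1}{4}(1+3\varepsilon)$, $\frac{1-p}{1/4} = 1+3\varepsilon$, and so $s = \log(1+3\varepsilon) \approx 3\varepsilon$. Thus the scores are about 3:1.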
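The two limiting cases can be checked numerically. The sketch below assumes the mismatch probability p = (3/4)(1 - e^{-8αt}), which is the form the stated limits (p ≈ 6αt for small αt, scores about 3:1 for large αt) point to; the function name `match_mismatch_scores` and the chosen values of αt are illustrative only.

```python
import math


def match_mismatch_scores(alpha_t):
    """Return (s, m): the match score s and mismatch penalty m (both > 0).

    Assumes p = (3/4) * (1 - exp(-8 * alpha_t)) for the probability that
    two homologous sites disagree, and 3/4 for unrelated sites.
    """
    p = 0.75 * (1.0 - math.exp(-8.0 * alpha_t))
    s = math.log((1.0 - p) / 0.25)   # log-odds contribution of an agreement
    m = -math.log(p / 0.75)          # minus the log-odds of a disagreement
    return s, m


s_small, m_small = match_mismatch_scores(1e-4)  # very little divergence
s_large, m_large = match_mismatch_scores(1.0)   # heavy divergence
print(s_small, math.log(4), m_small)
print(s_large / m_large)
```

For the small value the match score sits close to log 4 and the mismatch penalty is several times larger; for the large value both scores are tiny and their ratio is close to 3.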
20
We can do the same with any other Markov substitution matrix for molecular evolution. E.g. with a PAM or BLOSUM matrix of probabilities,
$\mathrm{pr}(\mathrm{data} \mid H) = \prod_{i=1}^{m} p_{a_i}\, p_{a_i b_i}(2t)$, $\quad \mathrm{pr}(\mathrm{data} \mid R) = \prod_{i} p_{a_i}\, p_{b_i}$,
$\log \dfrac{\mathrm{pr}(\mathrm{data} \mid H)}{\mathrm{pr}(\mathrm{data} \mid R)} = \sum_{i} \log \dfrac{p_{a_i b_i}(2t)}{p_{b_i}}$.
The elements of a log-odds score matrix are typically > 0 on the diagonal and < 0 off the diagonal, but not always. Also, the relative sizes of match and mismatch penalties increase as PAMs (∝ t) decreases. Thus PAM(120) is more stringent than PAM(250), while PAM(360) is less stringent than it. PAM(0), the identity matrix, is the toughest.
There are plenty of score matrices based on other principles.
21
Below diagonal: BLOSUM62 substitution matrix.
Above diagonal: Difference matrix obtained by subtracting the PAM 160 matrix entrywise.
From Henikoff & Henikoff 1992
22
Above diagonal: SG scoring system (Feng et al., 1985)
Below diagonal: Log-odds matrix for 250 PAMs (Dayhoff et al., 1978)
23
Gap penalties
Gap penalties are usually composed of two parts.
Gap opening penalty: This reduces the alignment score, and a gap must therefore create a more significant alignment downstream than would be present if no gap were created.
The size of the penalty is usually of the order of one to three times the size of values in the scoring matrix.
24
Gap penalties (ctd)
Gap extension penalty: If a gap has been created, then extending it should not be as hard to do. On the other hand, we want to limit the size of the gap to practical lengths.
A smaller gap extension penalty may allow an alignment to resolve situations where complete loops may be missing between one structure and another.
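The two-part penalty can be sketched as a small function. Everything here is an assumption made for illustration: the names `gap_cost`, `gap_lengths` and `total_gap_penalty`, the default values 10.0 and 0.5, and the convention that the opening penalty covers the first gap position (some programs instead charge the extension penalty on every position, including the first).

```python
from itertools import groupby


def gap_cost(length, gap_open=10.0, gap_extend=0.5):
    """Penalty for one gap of `length` positions (hypothetical defaults).

    Convention assumed here: the opening penalty pays for the first
    position and each further position pays the extension penalty.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


def gap_lengths(row, gap="-"):
    """Lengths of the maximal runs of gap symbols in one alignment row."""
    return [len(list(run)) for ch, run in groupby(row) if ch == gap]


def total_gap_penalty(row, gap_open=10.0, gap_extend=0.5):
    """Sum of the penalties of all gaps in one alignment row."""
    return sum(gap_cost(n, gap_open, gap_extend) for n in gap_lengths(row))


row = "AG--CA-T"
print(gap_lengths(row))        # one gap of length 2, one of length 1
print(total_gap_penalty(row))
```

With these numbers one long gap is much cheaper than several short ones, which is the behaviour the extension penalty is meant to produce.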
25
Low gap penalty
eclustalw May 24, 1999 18:44
lgb1_pea.pep   ck 2970   from 1 to 147   Length 147
hbhu.pep       ck 3588   from 1 to 147   Length 147

Pairwise similarity parameter:
K-Tuple length: 1
Gap Penalty: 3
Number of diagonals: 5
Diagonal window size: 5
Scoring Method: Percentage

Multiple alignment parameter:
Gap Penalty (fixed): 1.00
Gap Penalty (varying): 0.05
Gap separation penalty range: 8
Percent. identity for delay: 40
List of hydrophilic residue: GPSNDQEKR
Protein Weight Matrix: blosum

LGB1_PEA.pep  --GFTDKQE-ALVNSSSEFKQNLPGYSILFYTIVLEKAPAAKGLF-SF--LKDTAGVEDS
HBHU.pep      MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVY--PWTQRFFESFGDLSTPDAVMGN

LGB1_PEA.pep  PKLQAHAEQVFGLVRDSAAQLR-TKGEVVLGNATLGAIHVQKGVTNP-HFVVVKEALLQT
HBHU.pep      PKVKAHGKKVLGAFSDGLAHLDNLKGTF----ATLSELHCDKLHVDPENFRLLGNVLVCV

LGB1_PEA.pep  IKKASGNNWSEELNTAWEVAYDGLATAIKKAMKTA
HBHU.pep      LAHHFGKEFTPPVQAAYQKVVAGVANAL--AHKYH
26
Middling gap penalty
eclustalw May 24, 1999 18:50
lgb1_pea.pep   ck 2970   from 1 to 147   Length 147
hbhu.pep       ck 3588   from 1 to 147   Length 147

Pairwise similarity parameter:
K-Tuple length: 1
Gap Penalty: 3
Number of diagonals: 5
Diagonal window size: 5
Scoring Method: Percentage

Multiple alignment parameter:
Gap Penalty (fixed): 25.00
Gap Penalty (varying): 0.05
Gap separation penalty range: 8
Percent. identity for delay: 40
List of hydrophilic residue: GPSNDQEKR
Protein Weight Matrix: blosum

LGB1_PEA.pep  ----GFTDKQEALVNSSSEFKQNLPGYSILFYTIVLEKAPAAKGLFSFLKDTAGVEDSPK
HBHU.pep      MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK

LGB1_PEA.pep  LQAHAEQVFGLVRDSAAQLRTKGEVVLGNATLGAIHVQKGVTNP-HFVVVKEALLQTIKK
HBHU.pep      VKAHGKKVLGAFSDGLAHLDN---LKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAH

LGB1_PEA.pep  ASGNNWSEELNTAWEVAYDGLATAIKKAMKTA
HBHU.pep      HFGKEFTPPVQAAYQKVVAGVANALAHKYH--
27
Very high gap penalty
eclustalw May 24, 1999 18:52
lgb1_pea.pep   ck 2970   from 1 to 147   Length 147
hbhu.pep       ck 3588   from 1 to 147   Length 147

Pairwise similarity parameter:
K-Tuple length: 1
Gap Penalty: 3
Number of diagonals: 5
Diagonal window size: 5
Scoring Method: Percentage

Multiple alignment parameter:
Gap Penalty (fixed): 50.00
Gap Penalty (varying): 0.05
Gap separation penalty range: 8
Percent. identity for delay: 40
List of hydrophilic residue: GPSNDQEKR
Protein Weight Matrix: blosum

LGB1_PEA.pep  ----GFTDKQEALVNSSSEFKQNLPGYSILFYTIVLEKAPAAKGLFSFLKDTAGVEDSPK
HBHU.pep      MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK

LGB1_PEA.pep  LQAHAEQVFGLVRDSAAQLRTKGEVVLGNATLGAIHVQKGVTNPHFVVVKEALLQTIKKA
HBHU.pep      VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPEN--FRLLGNVLVCVLAHH

LGB1_PEA.pep  SGNNWSEELNTAWEVAYDGLATAIKKAMKTA
HBHU.pep      FGKEFTPPVQAAYQKVVAGVANALAHKYH--
28
Dynamic Programming
For obtaining optimal alignments.
This is a mathematical implementation that can be seen as an extension of the dot plot method.
Rather than dots, the comparison matrix positions are assigned values that reflect the scores in the scoring matrix.
29
Dynamic Programming
The optimum alignment is obtained by tracing the highest-scoring path from the top left-hand corner to the bottom right-hand corner of the matrix.
When the alignment steps away from the diagonal, this implies an insertion or deletion event, the impact of which can be assessed by the application of a gap penalty.
30
b:      A   C   A   C   A   C   T   A
a: A    0   1   0   1   0   1   1   0
   G    1   1   1   1   1   1   1   1
   C    1   0   1   0   1   0   1   1
   A    0   1   0   1   0   1   1   0
   C    1   0   1   0   1   0   1   1
   A    0   1   0   1   0   1   1   0
   C    1   0   1   0   1   0   1   1
   A    0   1   0   1   0   1   1   0
31
Dynamic programming: the formula
Suppose that our two sequences are $a = (a_1, \dots, a_m)$ and $b = (b_1, \dots, b_n)$, and that we denote by $d_{ij}$ the edit distance between the initial segments $a^i = (a_1, \dots, a_i)$ and $b^j = (b_1, \dots, b_j)$ of a and b. Extend this to $i = j = 0$ by writing $d_{00} = 0$. Supposing that a deletion or an insertion incurs a penalty of 1, the following formula summarizes our verbal argument:
$d_{ij} = \min\{\, d_{i-1,j-1} + s(a_i, b_j),\ d_{i,j-1} + 1,\ d_{i-1,j} + 1 \,\}$.
(More is needed to give a complete algorithm: what is it?)
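One way to complete the algorithm is to add boundary values for the guard row and column (i deletions to reach an empty prefix of b, j insertions from an empty prefix of a) and to fill the table row by row. The sketch below does that; the function name `edit_distance_table` and the default unit-cost `s` are choices made for the example, and recovering the alignment itself would additionally need a traceback through the table.

```python
def edit_distance_table(a, b, s=lambda u, v: 0 if u == v else 1, gap=1):
    """Fill the (m+1) x (n+1) table d, where d[i][j] is the edit distance
    between the first i characters of a and the first j characters of b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    # Boundary conditions: the guard row and column.
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + gap
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + gap
    # Interior cells: the three-way minimum from the formula.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # match or replace
                d[i][j - 1] + gap,                        # insert
                d[i - 1][j] + gap,                        # delete
            )
    return d


table = edit_distance_table("AGCACACA", "ACACACTA")
for row in table:
    print(" ".join(f"{x:2d}" for x in row))
print("edit distance:", table[-1][-1])
```

The bottom right-hand entry is the edit distance of the two full sequences; the time and memory used are both proportional to m × n.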
32
b:          A   C   A   C   A   C   T   A
        0   1   2   3   4   5   6   7   8
a: A    1   0   1   2   3   4   5   6   7
   G    2   1   1   2   3   4   5   6   7
   C    3   2   1   2   2   3   4   5   6
   A    4   3   2   1   2   2   3   4   5
   C    5   4   3   2   1   2   2   3   4
   A    6   5   4   3   2   1   2   3   3
   C    7   6   5   4   3   2   1   2   3
   A    8   7   6   5   4   3   2   2   2
33
Chance or common ancestry?
Idea: calculate optimal alignment scores for pairs of sequences where one is a randomized (shuffled) version of the original. This will give a distribution of random scores, representing chance similarity rather than homology. The score from our original pair of sequences can be referred to this distribution and assigned a Z-score (subtract mean of randoms and divide by SD of randoms), or (better) a p-value.
Criticism: Such random a.a. sequences might have plausible a.a. compositions but are quite unlike real protein sequences.
Partial reply: a) restrict the randomization to blocks, or b) create a distribution of chance similarity scores using real a.a. sequences known or assumed not to be homologous to our query sequence.
Other approaches use theory, but this is still subject to the criticism above.
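The shuffling idea can be sketched as follows. All specifics are assumptions for illustration: the scoring scheme (+1 match, -1 mismatch, -1 per gap position), the function names `global_score` and `shuffle_test`, the number of shuffles, the fixed random seed, and the toy sequences. The empirical p-value uses the common (1 + count) / (1 + shuffles) convention.

```python
import random
import statistics


def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score by dynamic programming (linear gaps)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, u in enumerate(a, start=1):
        cur = [i * gap]
        for j, v in enumerate(b, start=1):
            diag = prev[j - 1] + (match if u == v else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffle_test(a, b, n_shuffles=200, seed=0):
    """Compare the observed score with scores against shuffled copies of b."""
    rng = random.Random(seed)
    observed = global_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)                      # same composition, new order
        null_scores.append(global_score(a, "".join(letters)))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    z = (observed - mean) / sd if sd > 0 else None
    p = (1 + sum(sc >= observed for sc in null_scores)) / (n_shuffles + 1)
    return observed, z, p, null_scores


seq = "ACGTTGCAACGTAGCT"
observed, z, p, null_scores = shuffle_test(seq, seq)
print("observed score:", observed)
print("Z-score:", z)
print("empirical p-value:", p)
```

Shuffling whole sequences preserves composition only, which is exactly the point of the criticism in the text; block-wise shuffling or a database of unrelated real sequences would replace the inner loop.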
34
Dynamic Programming
Based on notes by George Rudy, formerly WEHI.
35
Life must be lived forwards and understood
backwards. Søren Kierkegaard
36
What is DP?
Operations research: A mathematical formalism applicable to problems involving optimization of decisions over time. (after R. Bellman and S. Dreyfus)
Bioinformatics: An algorithm for finding optimal sequence alignments given an additive alignment score. (after R. Durbin et al.)
Computer programming: An approach to algorithm design whereby the target problem is decomposed into smaller problems that are then solved independently. (after R. Sedgewick)
37
Where did DP come from?
- Richard Bellman
- The RAND Corporation
- "Dynamic" and "Programming"
38
Where can DP be applied?
- Both discrete and continuous problems concerning deterministic, stochastic, or adaptive processes
- Multiple fields: research, industry, finance, ...
- Examples:
  allocation processes
  smoothing and scheduling processes
  optimal search and stopping techniques
  optimal trajectories
  multistage production processes
  feedback control processes
  Markovian decision processes
39
DP in biomedical literature (1)
40
DP in biomedical literature (2)
- A symmetric-iterated multiple alignment of protein sequences. Brocchieri, L. and Karlin, S., J. Mol. Biol. 276(1):249-64, 1998.
- Sequence assembly validation by multiple restriction digest fragment coverage analysis. Rouchka, E.C. and States, D.J., ISMB 6:140-7, 1998.
- Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment. Gracy, J. and Argos, P., Bioinformatics 14(2):164-73, 1998.
- A segment-based dynamic programming algorithm for predicting gene structure. Wu, T.D., J. Comput. Biol. 3(3):375-94, 1996.
- Automatic detection of cardiac contours on MR images using fuzzy logic and dynamic programming. Lalande, A. et al., Proc. AMIA Annu. Fall Symp. 474-8, 1997.
- Process models for production of beta-lactam antibiotics. Bellgardt, K.H., Adv. Biochem. Eng. Biotechnol. 60:153-94, 1998.
- Dynamic programming approach for newborn's incubator humidity control. Bouattoura, D. et al., IEEE Trans. Biomed. Eng. 45(1):48-55, 1998.
- Minimum energy trajectories of the swing ankle when stepping over obstacles of different heights. Chou, L.S. et al., J. Biomech. 30(2):115-20, 1997.
- A theoretical study of the socioecology of ungulates. II. A dynamic programming study of the stochastic formulation. Paveri-Fontana, S.L. and Focardi, S., Theor. Popul. Biol. 46(3):279-99, 1994.
41
What problems are suitable for DP?
- Essential components (common to all OR problems):
  a decision-maker
  access to results of decisions
- Additionally:
  decisions are sequential
  later decisions are affected by earlier ones
  effect of a decision can be calculated independently of other decisions
42
The Stagecoach Problem (1)
[Figure: the stagecoach network, a diamond-shaped graph with 16 vertices labelled A to P and a numerical cost on each of its edges. After S. E. Dreyfus.]
43
Some terminology
- Vertex
- Edge
- Path
- Monotonic-to-the-right
- (Admissible) path
- Stage
- State
44
The Stagecoach Problem (2)
[Figure: the stagecoach network with edge costs; the value 0 has been entered beside vertex B.]
45
The Stagecoach Problem (2)
[Figure: the stagecoach network with edge costs; two further vertex values, 2 and 1, have been entered.]
46
The Stagecoach Problem (2)
[Figure: the stagecoach network with edge costs; one further vertex value, 4, has been entered.]
47
The Stagecoach Problem (2)
[Figure: the stagecoach network with edge costs; six further vertex values have been entered: 10, 7, 8, 6, 5 and 7.]
48
The Stagecoach Problem (2)
[Figure: the stagecoach network with edge costs; the last six vertex values have been entered: 9, 12, 13, 8, 14 and 11. Every vertex now carries a value.]
49
Some more terminology
- Optimal value function
- Policy
- Optimal policy function
50
The Stagecoach Problem (3)
[Figure: the stagecoach network with edge costs and a value at every vertex.]
51
The Stagecoach Problem (3)
[Figure: the stagecoach network with edge costs and a value at every vertex.]
52
The Stagecoach Problem (3)
[Figure: the stagecoach network with edge costs and a value at every vertex.]
53
The Stagecoach Problem (4)
[Figure: the stagecoach network with edge costs and a value at every vertex.]
54
Efficiency of the DP approach
- At each of 9 vertices where a real choice existed: 2 additions, 1 binary comparison
- At the other 6 vertices: 1 addition
  Total: 24 additions, 9 comparisons
- Compare this with direct evaluation of the original problem by enumeration of all 20 admissible paths: 5 additions/path = 100 additions, 20 comparisons
55
Efficiency (2), and the Curse of Dimensionality
In general, for the n-stage problem treated here, DP involves $(n^2/2) + n$ additions. Direct enumeration generates $\binom{n}{n/2}$ paths, or $(n-1)\binom{n}{n/2}$ additions. Thus, for n = 20, DP requires 220 additions while direct enumeration would demand 3,510,364 additions.
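The counts quoted on these two slides (24 versus 100 additions for the 6-stage network, 220 versus 3,510,364 for n = 20) are consistent with n²/2 + n additions for DP and C(n, n/2) paths of n - 1 additions each for enumeration. The sketch below simply evaluates those expressions; the function names are hypothetical and n is assumed to be even.

```python
from math import comb


def dp_additions(n):
    """Additions needed by DP on the n-stage diamond network (n even)."""
    return n * n // 2 + n


def enumeration_paths(n):
    """Number of admissible paths: choose which n/2 of the n steps go up."""
    return comb(n, n // 2)


def enumeration_additions(n):
    """Each path sums n edge costs, which takes n - 1 additions."""
    return (n - 1) * enumeration_paths(n)


for n in (6, 20):
    print(n, dp_additions(n), enumeration_paths(n), enumeration_additions(n))
```

The DP count grows quadratically while the enumeration count grows roughly like 2^n, which is why the gap is already six orders of magnitude at n = 20.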
56
The Stagecoach Problem (5)
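[Figure: the stagecoach network drawn on (x, y) axes, with the x axis marked 1 to 6 and the y axis marked -3 to 3. Vertices by height:
y = 3: H
y = 2: E, L
y = 1: C, I, O
y = 0: A, F, M, B
y = -1: D, J, P
y = -2: G, N
y = -3: K]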
57
The Principle of Optimality, or Bellman's Principle
"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." (Bellman)
or,
"An optimal sequence of decisions in a multistage decision process problem has the property that whatever the initial stage, state, and decision are, the remaining decisions must constitute an optimal sequence of decisions for the remaining problem, with the stage and state resulting from the first decision considered as initial conditions." (Dreyfus)
or,
An optimal policy must have the property that no matter what path is taken to enter a particular state, the remaining stages (decisions) taken must constitute an optimal policy for departure from that state.
or,
An optimal policy is comprised of optimal subpolicies.
or,
An optimal policy from any state is independent of the path taken to that state, and is made up entirely of optimal subpolicies.
or, ...
58
The optimal value function
$S(x,y)$ = the value of the minimum-value admissible path connecting the vertex $(x,y)$ and the terminal vertex $(6,0)$
$e_u(x,y)$ = the value of the edge connecting the vertices $(x,y)$ and $(x+1, y+1)$
$e_d(x,y)$ = the value of the edge connecting the vertices $(x,y)$ and $(x+1, y-1)$
$S(x,y) = \min\{\, e_u(x,y) + S(x+1, y+1),\ e_d(x,y) + S(x+1, y-1) \,\}$, with $S(6,0) = 0$.
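The backward recursion for S(x, y) can be sketched on a diamond-shaped network of the same shape (16 vertices, 24 edges, terminal vertex (6, 0)). The edge costs used here are hypothetical random integers, not the costs from the slides, whose assignment to edges is not recoverable from the transcript; the names `is_vertex`, `solve`, `best_path` and `brute_force` are likewise illustrative. A brute-force enumeration of all paths is included only to check the DP answer.

```python
import random
from itertools import product

N = 6  # number of stages; the terminal vertex is (N, 0)


def is_vertex(x, y):
    """Vertices of the diamond: |y| <= min(x, N - x), with x + y even."""
    return 0 <= x <= N and abs(y) <= min(x, N - x) and (x + y) % 2 == 0


# Hypothetical edge costs, keyed by (x, y, dy) with dy = +1 (up) or -1 (down).
rng = random.Random(1)
cost = {}
for x in range(N):
    for y in range(-N, N + 1):
        if is_vertex(x, y):
            for dy in (1, -1):
                if is_vertex(x + 1, y + dy):
                    cost[(x, y, dy)] = rng.randint(1, 9)


def solve():
    """Backward recursion: optimal value S and optimal policy per vertex."""
    S = {(N, 0): 0}
    policy = {}
    for x in range(N - 1, -1, -1):
        for y in range(-N, N + 1):
            if not is_vertex(x, y):
                continue
            options = [(cost[(x, y, dy)] + S[(x + 1, y + dy)], dy)
                       for dy in (1, -1) if (x, y, dy) in cost]
            S[(x, y)], policy[(x, y)] = min(options)
    return S, policy


def best_path(policy):
    """Follow the optimal policy forwards from the start vertex (0, 0)."""
    x, y, path = 0, 0, [(0, 0)]
    while (x, y) != (N, 0):
        y += policy[(x, y)]
        x += 1
        path.append((x, y))
    return path


def brute_force():
    """Enumerate every admissible path; return (best cost, number of paths)."""
    best, count = None, 0
    for steps in product((1, -1), repeat=N):
        x, y, total = 0, 0, 0
        for dy in steps:
            if (x, y, dy) not in cost:
                break
            total += cost[(x, y, dy)]
            x, y = x + 1, y + dy
        else:
            count += 1
            best = total if best is None else min(best, total)
    return best, count


S, policy = solve()
print("S(0,0) =", S[(0, 0)])
print("optimal path:", best_path(policy))
print("brute force (cost, paths):", brute_force())
```

The table S is filled backwards from the terminal vertex, while the optimal path is read off forwards, which mirrors the Kierkegaard quotation used earlier in the deck.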
59
A more formal restatement of common features of
DP problems
A physical system characterized at any stage by a small set of parameters, the state variables.
At each stage of the process there is a choice of a number of decisions.
The effect of a decision is a transformation of the state variables.
The past history of the system is of no importance in determining future actions.
The purpose of the process is to maximize some function of the state variables.
60
The practice of DP
Imbed the specific given problem in a more general family of problems.
Define the optimal value function, which associates a value with each of the various possible initial conditions of problems in that family.
Invoke the principle of optimality in order to deduce a recurrence relation characterizing that function.
Seek the solution of the recurrence relation in order to obtain the optimal policy function, which furnishes the solution to the specific given problem and all other problems in the more general family as well.
61
More practically speaking,
Determine the decision-maker and the decisions to be made.
Determine the stages.
Determine the possible states.
Formulate the optimal value function in the form of a recurrence relation.
Calculate and tabulate the optimal value function for each stage and state.
Find the optimal policy(ies) for the problem.
62
New problem, new terminology
Edit operations: M(atch), R(eplacement), I(nsert), D(elete).
Edit transcript: A string over the alphabet {M, R, I, D} that describes a transformation of one string into another.
Example:
R D I M D M
M A - T H S
A - R T - S
Edit (Levens(h)tein) distance: The minimum number of edit operations necessary to transform one string into another. (Note: matches are not counted.)
Example (cost of the transcript above):
R D I M D M
1 + 1 + 1 + 0 + 1 + 0 = 4
63
Once again,
Imbed the problem in the more general family.
Define the optimal value function.
Deduce the recurrence relation.
Solve the recurrence relation to obtain the optimal policy function.
64
The recurrence
Stage: position in the edit transcript
State: I, D, M, or R
Optimal value function: D(i, j), where D(i, j) = edit distance of Seq1[1..i] and Seq2[1..j]
Recurrence relation:
$D(i, j) = \min\{\, 1 + D(i-1, j),\ 1 + D(i, j-1),\ t(i, j) + D(i-1, j-1) \,\}$,
where t(i, j) = 0 if Seq1(i) = Seq2(j), and 1 otherwise.
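The recurrence can be transcribed almost literally as a memoized recursive function. This is a sketch rather than the presenter's implementation: the name `make_D` is hypothetical, and the base cases D(i, 0) = i and D(0, j) = j are the usual boundary values, which the recurrence itself does not state.

```python
from functools import lru_cache


def make_D(seq1, seq2):
    """Return the optimal value function D(i, j) for the two sequences."""

    @lru_cache(maxsize=None)
    def D(i, j):
        # Boundary values (assumed): an empty prefix costs one edit per character.
        if i == 0:
            return j
        if j == 0:
            return i
        t = 0 if seq1[i - 1] == seq2[j - 1] else 1
        return min(1 + D(i - 1, j),      # delete Seq1(i)
                   1 + D(i, j - 1),      # insert Seq2(j)
                   t + D(i - 1, j - 1))  # match or replace
    return D


D = make_D("MATHS", "ARTS")
print(D(5, 4))  # edit distance of the full strings
```

The cache plays the role of the tabulation: each D(i, j) is computed once, so the work is proportional to the number of cells rather than to the number of paths.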
65
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0
   M   i = 1
   A   i = 2
   T   i = 3
   H   i = 4
   S   i = 5
66
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0
   M   i = 1
   A   i = 2
   T   i = 3
   H   i = 4
   S   i = 5
67
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1
   M   i = 1
   A   i = 2
   T   i = 3
   H   i = 4
   S   i = 5
68
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2
   M   i = 1
   A   i = 2
   T   i = 3
   H   i = 4
   S   i = 5
69
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2    3    4
   M   i = 1    1
   A   i = 2    2
   T   i = 3    3
   H   i = 4    4
   S   i = 5    5
70
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2    3    4
   M   i = 1    1    1
   A   i = 2    2
   T   i = 3    3
   H   i = 4    4
   S   i = 5    5
71
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2    3    4
   M   i = 1    1    1    2
   A   i = 2    2
   T   i = 3    3
   H   i = 4    4
   S   i = 5    5
72
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2    3    4
   M   i = 1    1    1    2    3    4
   A   i = 2    2    1    2    3    4
   T   i = 3    3
   H   i = 4    4
   S   i = 5    5
73
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2    3    4
   M   i = 1    1    1    2    3    4
   A   i = 2    2    1    2    3    4
   T   i = 3    3    2    2    2    3
   H   i = 4    4
   S   i = 5    5
74
The tabulation, D(i, j)
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2    3    4
   M   i = 1    1    1    2    3    4
   A   i = 2    2    1    2    3    4
   T   i = 3    3    2    2    2    3
   H   i = 4    4    3    3    3    3
   S   i = 5    5    4    4    4    3
75
The traceback
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2    3    4
   M   i = 1    1    1    2    3    4
   A   i = 2    2    1    2    3    4
   T   i = 3    3    2    2    2    3
   H   i = 4    4    3    3    3    3
   S   i = 5    5    4    4    4    3
76
The solutions - 1
D M R R M
M A T H S
- A R T S
1 + 0 + 1 + 1 + 0 = 3
77
The traceback
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2    3    4
   M   i = 1    1    1    2    3    4
   A   i = 2    2    1    2    3    4
   T   i = 3    3    2    2    2    3
   H   i = 4    4    3    3    3    3
   S   i = 5    5    4    4    4    3
78
The solutions - 2
D M I M D M
M A - T H S
- A R T - S
1 + 0 + 1 + 0 + 1 + 0 = 3
79
The traceback
Seq2(j)              A    R    T    S
Seq1(i)  j =    0    1    2    3    4
       i = 0    0    1    2    3    4
   M   i = 1    1    1    2    3    4
   A   i = 2    2    1    2    3    4
   T   i = 3    3    2    2    2    3
   H   i = 4    4    3    3    3    3
   S   i = 5    5    4    4    4    3
80
The solutions - 3
R R M D M
M A T H S
A R T - S
1 + 1 + 0 + 1 + 0 = 3
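The traceback step can be sketched as a search that starts at the bottom right-hand cell and follows every predecessor that achieves the minimum. The function name `optimal_transcripts` is hypothetical, and enumerating all optimal transcripts in this way can take exponential time for long, repetitive sequences, so it is meant for small examples such as MATHS and ARTS.

```python
def optimal_transcripts(seq1, seq2):
    """Return (edit distance, sorted list of all optimal edit transcripts)."""
    m, n = len(seq1), len(seq2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            t = 0 if seq1[i - 1] == seq2[j - 1] else 1
            D[i][j] = min(1 + D[i - 1][j], 1 + D[i][j - 1], t + D[i - 1][j - 1])

    def back(i, j):
        """All optimal transcripts that transform seq1[:i] into seq2[:j]."""
        if i == 0 and j == 0:
            return [""]
        found = []
        if i > 0 and j > 0:
            t = 0 if seq1[i - 1] == seq2[j - 1] else 1
            if D[i][j] == D[i - 1][j - 1] + t:
                op = "M" if t == 0 else "R"
                found += [p + op for p in back(i - 1, j - 1)]
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            found += [p + "D" for p in back(i - 1, j)]   # delete from Seq1
        if j > 0 and D[i][j] == D[i][j - 1] + 1:
            found += [p + "I" for p in back(i, j - 1)]   # insert from Seq2
        return found

    return D[m][n], sorted(back(m, n))


distance, transcripts = optimal_transcripts("MATHS", "ARTS")
print(distance)
for tr in transcripts:
    print(" ".join(tr))
```

For this pair the search finds three transcripts, the same three that appear on the solution slides.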
81
DP, in general (well, for a discrete, deterministic, additive process, anyway)
$F(t, s) = \mathrm{Opt}_{x \in X(t, s)} \{\, r(t, s, x) + a\,F(t', s') \,\}$, where $s' = T(t, s, x)$ is the state reached at the following stage $t'$.
Need not be additive.
When a stochastic process, r and F are expected values; the state transform is random with a probability distribution $P[T(t, s, x) = s' \mid s, x]$, and $F(t', s')$ is replaced by $\sum_{s'} F(t', s')\, P[T(t, s, x) = s' \mid s, x]$.
82
Life must be lived forwards and understood
backwards. Søren Kierkegaard
