Title: Dynamic programming algorithms for all-pairs shortest path and longest common subsequences
1 Dynamic programming algorithms for all-pairs shortest path and longest common subsequences
- We will study a new technique: dynamic programming algorithms (typically for optimization problems).
- Ideas:
  - Characterize the structure of an optimal solution.
  - Recursively define the value of an optimal solution.
  - Compute the value of an optimal solution in a bottom-up fashion (using a matrix to store the computed values).
  - Backtrack to construct an optimal solution from the computed information.
2 Floyd-Warshall algorithm for shortest path
- Use a different dynamic-programming formulation to solve the all-pairs shortest-paths problem on a directed graph $G = (V, E)$.
- The resulting algorithm, known as the Floyd-Warshall algorithm, runs in $O(V^3)$ time.
- Negative-weight edges may be present, but we shall assume that there are no negative-weight cycles.
3 The structure of a shortest path
- We use a different characterization of the structure of a shortest path than we used in the matrix-multiplication-based all-pairs algorithms.
- The algorithm considers the intermediate vertices of a shortest path, where an intermediate vertex of a simple path $p = \langle v_1, v_2, \ldots, v_l \rangle$ is any vertex of $p$ other than $v_1$ or $v_l$, that is, any vertex in the set $\{v_2, v_3, \ldots, v_{l-1}\}$.
4 Continue
- Let the vertices of $G$ be $V = \{1, 2, \ldots, n\}$, and consider a subset $\{1, 2, \ldots, k\}$ of vertices for some $k$.
- For any pair of vertices $i, j \in V$, consider all paths from $i$ to $j$ whose intermediate vertices are all drawn from $\{1, 2, \ldots, k\}$, and let $p$ be a minimum-weight path from among them.
- The Floyd-Warshall algorithm exploits a relationship between path $p$ and shortest paths from $i$ to $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k-1\}$.
5 Relationship
- The relationship depends on whether or not $k$ is an intermediate vertex of path $p$.
- If $k$ is not an intermediate vertex of path $p$, then all intermediate vertices of path $p$ are in the set $\{1, 2, \ldots, k-1\}$. Thus, a shortest path from vertex $i$ to vertex $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k-1\}$ is also a shortest path from $i$ to $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k\}$.
- If $k$ is an intermediate vertex of path $p$, then we break $p$ down into $i \overset{p_1}{\leadsto} k \overset{p_2}{\leadsto} j$, as shown in Figure 2. Path $p_1$ is a shortest path from $i$ to $k$ with all intermediate vertices in the set $\{1, 2, \ldots, k-1\}$, and likewise $p_2$ is a shortest path from $k$ to $j$ with all intermediate vertices in that set.
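6
[Figure 2: path $p$ from $i$ to $j$, split at $k$ into $p_1$ (from $i$ to $k$) and $p_2$ (from $k$ to $j$). $p_1$ and $p_2$: all intermediate vertices in $\{1, 2, \ldots, k-1\}$. $p$: all intermediate vertices in $\{1, 2, \ldots, k\}$.]
Figure 2. Path $p$ is a shortest path from vertex $i$ to vertex $j$, and $k$ is the highest-numbered intermediate vertex of $p$. Path $p_1$, the portion of path $p$ from vertex $i$ to vertex $k$, has all intermediate vertices in the set $\{1, 2, \ldots, k-1\}$. The same holds for path $p_2$ from vertex $k$ to vertex $j$.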
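7 A recursive solution to the all-pairs shortest paths problem
- Let $d_{ij}^{(k)}$ be the weight of a shortest path from vertex $i$ to vertex $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k\}$. A recursive definition is given by

  $$d_{ij}^{(k)} = \begin{cases} w_{ij} & \text{if } k = 0, \\ \min\left(d_{ij}^{(k-1)},\; d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\right) & \text{if } k \ge 1. \end{cases}$$

- The matrix $D^{(n)} = (d_{ij}^{(n)})$ gives the final answer: $d_{ij}^{(n)}$ is the shortest-path weight from $i$ to $j$ for all $i, j \in V$, because all intermediate vertices are in the set $\{1, 2, \ldots, n\}$.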
8 Computing the shortest-path weights bottom up
FLOYD-WARSHALL(W)
    n ← rows[W]
    D(0) ← W
    for k ← 1 to n
        do for i ← 1 to n
            do for j ← 1 to n
                d_ij(k) ← min(d_ij(k-1), d_ik(k-1) + d_kj(k-1))
    return D(n)
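The triple loop above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `floyd_warshall`, the 0-based vertex numbering, and the 4-vertex weight matrix are assumptions made for the example and are not from the slides. It also updates a single matrix in place instead of keeping every D(k), which gives the same result when there are no negative-weight cycles.

```python
import math

def floyd_warshall(w):
    """Return all-pairs shortest-path weights for the weight matrix w.

    w[i][j] is the weight of edge (i, j), math.inf if the edge is absent,
    and 0 on the diagonal. Vertices are numbered 0..n-1 here.
    """
    n = len(w)
    d = [row[:] for row in w]            # D(0) = W (copied, w is not modified)
    for k in range(n):                   # now allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d

INF = math.inf
# Hypothetical 4-vertex directed graph, used only for illustration.
W = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
D = floyd_warshall(W)
for row in D:
    print(row)
```

For instance, the entry for the pair (1, 0) drops from 8 to 5 because the path through vertices 2 and 3 (weights 2 + 1 + 2) is shorter than the direct edge.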
9 Example
[Figure: a weighted directed graph on vertices 1 to 5 with edge weights 3, 8, 1, -5, -4, 2, 7, 4, and 6.]
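11
[Matrices D(2) and D(3) for the example graph; entries not transcribed.]
12
[Matrices D(4) and D(5) for the example graph; entries not transcribed.]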
13 Comparison of two strings
- Longest common subsequence
- Shortest common supersequence
- Edit distance between two sequences
14 1. Longest common subsequence
- Definition 1: Given a sequence $X = x_1 x_2 \ldots x_m$, another sequence $Z = z_1 z_2 \ldots z_k$ is a subsequence of $X$ if there exists a strictly increasing sequence $i_1 < i_2 < \cdots < i_k$ of indices of $X$ such that for all $j = 1, 2, \ldots, k$, we have $x_{i_j} = z_j$.
- Example 1: If X = abcdefg, then Z = abdg is a subsequence of X.
    X = abcdefg
    Z = ab d  g
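Definition 1 can be checked mechanically by scanning X once from left to right. The sketch below is only an illustration; the function name `is_subsequence` is an assumption, not something defined in the slides.

```python
def is_subsequence(z, x):
    """Return True if z can be obtained from x by deleting zero or more letters."""
    pos = 0                               # next index of x that may still be used
    for letter in z:
        # Skip letters of x until the current letter of z is found.
        while pos < len(x) and x[pos] != letter:
            pos += 1
        if pos == len(x):                 # ran out of x before matching letter
            return False
        pos += 1                          # indices must be strictly increasing
    return True

print(is_subsequence("abdg", "abcdefg"))
print(is_subsequence("ba", "abcdefg"))
```

The first call corresponds to Example 1; the second fails because the letters appear in the wrong order.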
15
- Definition 2: Given two sequences X and Y, a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.
- Example 2: X = abcdefg and Y = aaadgfd. Z = adf is a common subsequence of X and Y.
    X = abcdefg
    Y = aaadgfd
    Z = a  d f
16
- Definition 3: A longest common subsequence (LCS) of X and Y is a common subsequence of X and Y with the longest length. (The length of a sequence is the number of letters in the sequence.)
- A longest common subsequence may not be unique.
17 Longest common subsequence problem
- Input: Two sequences $X = x_1 x_2 \ldots x_m$ and $Y = y_1 y_2 \ldots y_n$.
- Output: a longest common subsequence of X and Y.
- A brute-force approach
  - Suppose that $m \le n$. Try all subsequences of X (there are $2^m$ subsequences of X), test whether each one is also a subsequence of Y, and select the longest one that is.
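The brute-force approach can be sketched as below, mainly to make its exponential cost concrete. The names `lcs_brute_force` and `is_subsequence` are assumptions for this example. Candidates are tried from longest to shortest, so the first hit is a longest common subsequence.

```python
from itertools import combinations

def is_subsequence(z, y):
    """Return True if z is a subsequence of y (single left-to-right scan)."""
    it = iter(y)
    return all(letter in it for letter in z)

def lcs_brute_force(x, y):
    """Try every subsequence of x, longest first; exponential in len(x)."""
    for length in range(len(x), -1, -1):
        # Each index tuple picks one of the 2^m subsequences of x.
        for idx in combinations(range(len(x)), length):
            candidate = "".join(x[i] for i in idx)
            if is_subsequence(candidate, y):
                return candidate          # length 0 (the empty string) always matches

result = lcs_brute_force("BDCABA", "ABCBDAB")
print(len(result))
```

This is only usable for very short X; the dynamic-programming formulation avoids enumerating the subsequences.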
18 Characterizing a longest common subsequence
- Theorem (Optimal substructure of an LCS)
- Let $X = x_1 x_2 \ldots x_m$ and $Y = y_1 y_2 \ldots y_n$ be two sequences, and let $Z = z_1 z_2 \ldots z_k$ be any LCS of X and Y.
- 1. If $x_m = y_n$, then $z_k = x_m = y_n$ and $Z[1..k-1]$ is an LCS of $X[1..m-1]$ and $Y[1..n-1]$.
- 2. If $x_m \ne y_n$, then $z_k \ne x_m$ implies that Z is an LCS of $X[1..m-1]$ and Y.
- 3. If $x_m \ne y_n$, then $z_k \ne y_n$ implies that Z is an LCS of X and $Y[1..n-1]$.
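19 The recursive equation
- Let $c[i,j]$ be the length of an LCS of $X[1..i]$ and $Y[1..j]$.
- $c[i,j]$ can be computed as follows:

  $$c[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0, \\ c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\ \max(c[i,j-1],\; c[i-1,j]) & \text{if } i, j > 0 \text{ and } x_i \ne y_j. \end{cases}$$

- Computing the length of an LCS
  - There are $n \times m$ entries $c[i,j]$, so we can compute them in a specific order (row by row), so that the entries each $c[i,j]$ depends on are already available.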
20 The algorithm to compute an LCS
    for i = 1 to m do
        c[i,0] = 0
    for j = 0 to n do
        c[0,j] = 0
    for i = 1 to m do
        for j = 1 to n do
            if x_i = y_j then
                c[i,j] = c[i-1,j-1] + 1
                b[i,j] = 1
            else if c[i-1,j] > c[i,j-1] then
                c[i,j] = c[i-1,j]
                b[i,j] = 2
            else
                c[i,j] = c[i,j-1]
                b[i,j] = 3
21
- Example 3: X = BDCABA and Y = ABCBDAB.
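For Example 3, the tables c and b can be filled as in the following sketch. It follows the table-filling algorithm of the slides, with the assumptions that strings are 0-indexed (so x_i is `x[i - 1]`) and that the function is called `lcs_tables`, a name chosen here for illustration.

```python
def lcs_tables(x, y):
    """Fill c (LCS lengths) and b (back-pointers: 1 = diagonal, 2 = up, 3 = left)."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:            # x_i = y_j
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 1
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 2
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 3
    return c, b

c, b = lcs_tables("BDCABA", "ABCBDAB")
print(c[6][7])   # length of an LCS of X and Y
```

The bottom-right entry holds the LCS length, which is 4 for these two sequences.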
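22 Constructing an LCS (back-tracking)
- We can find an LCS using the b[i,j]'s.
- We start with b[m,n] and track back to some cell b[0,j] or b[i,0].
- The algorithm to construct an LCS
  - 1. i = m
  - 2. j = n
  - 3. if i = 0 or j = 0 then exit
  - 4. if b[i,j] = 1 then
       {
           print x_i
           i = i-1
           j = j-1
       }
  - 5. else if b[i,j] = 2 then i = i-1
  - 6. else (b[i,j] = 3) j = j-1
  - 7. Goto Step 3.
- The letters are printed from the last letter of the LCS to the first, so the output must be reversed.
- Time complexity: O(nm) to fill the tables; the back-tracking itself takes at most n + m steps.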
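A minimal sketch of the whole procedure, table filling followed by back-tracking, is given below. The function name `lcs` and the 0-based string indexing are assumptions of this example. The letter is recorded before the indices are decremented, and the collected letters are reversed at the end because back-tracking visits them from last to first.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, 1
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], 2
            else:
                c[i][j], b[i][j] = c[i][j - 1], 3

    # Back-track from b[m][n] until row 0 or column 0 is reached.
    letters = []
    i, j = m, n
    while i > 0 and j > 0:
        if b[i][j] == 1:                  # x_i = y_j is part of the LCS
            letters.append(x[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == 2:                # move up
            i -= 1
        else:                             # move left
            j -= 1
    return "".join(reversed(letters))

print(lcs("BDCABA", "ABCBDAB"))
```

Since an LCS need not be unique, which one is returned depends on how ties between the "up" and "left" cases are broken.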
23 2. Shortest common supersequence
- Definition: Let X and Y be two sequences. A sequence Z is a common supersequence of X and Y if both X and Y are subsequences of Z.
- Shortest common supersequence problem
  - Input: Two sequences X and Y.
  - Output: a shortest common supersequence of X and Y.
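24
- Recursive equation
- Let $c[i,j]$ be the length of a shortest common supersequence of $X[1..i]$ and $Y[1..j]$.
- $c[i,j]$ can be computed as follows:

  $$c[i,j] = \begin{cases} j & \text{if } i = 0, \\ i & \text{if } j = 0, \\ c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\ \min(c[i,j-1] + 1,\; c[i-1,j] + 1) & \text{if } i, j > 0 \text{ and } x_i \ne y_j. \end{cases}$$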
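The supersequence recurrence can be sketched in Python as follows. The function name `scs_length` is an assumption for this example, and strings are 0-indexed. As a sanity check, the result should equal m + n minus the LCS length, since the letters of a longest common subsequence are the ones that can be shared.

```python
def scs_length(x, y):
    """Return the length of a shortest common supersequence of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        c[0][j] = j                       # supersequence of "" and Y[1..j] is Y[1..j]
    for i in range(m + 1):
        c[i][0] = i                       # supersequence of X[1..i] and "" is X[1..i]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1        # one shared letter
            else:
                c[i][j] = min(c[i][j - 1], c[i - 1][j]) + 1
    return c[m][n]

print(scs_length("BDCABA", "ABCBDAB"))
```

For these two sequences the lengths are 6 and 7 and the LCS length is 4, so the shortest common supersequence has length 9.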
26 3. Edit distance between two sequences
- Three operations
  - Insertion: inserting an x into abc (between a and b), we get axbc.
  - Deletion: deleting b from abc, we get ac.
  - Replacement: given a sequence abc, replacing a with x, we get xbc.
27
- Definition: Suppose that we can use three edit operations (insertion, deletion, and replacement) to edit one sequence into another. The edit distance between two sequences is the minimum number of operations required to edit one sequence into the other.
- Note: each operation is counted as 1.
- Weighted edit distance
  - There is a weight on each operation.
  - For example, s(a,b) = 1, s(a,_) = 1.5, s(b,a) = 1, s(b,_) = 1.5.
- Where the weights come from
  - For DNA and protein sequences, they come from statistics.
28 Alignment of sequences: an alternative
- An alignment of two sequences is obtained by inserting spaces into or at either end of X and Y such that the two resulting sequences X' and Y' are of the same length. That is, every letter in X' is opposite to a unique letter in Y'.
- The alignment value is defined as

  $$\sum_{i=1}^{l} s(X'_i, Y'_i),$$

  where $l$ is the common length of X' and Y', $X'_i$ and $Y'_i$ denote the two letters in column $i$ of the alignment, and $s(X'_i, Y'_i)$ is the score (weight) of these opposing letters.
- There are several popular score schemes for DNA and protein sequences.
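The alignment value is a column-by-column sum of scores, which the following sketch makes concrete. The names `alignment_value` and `unit_score`, and the use of `_` for a space, are assumptions for this example; the unit score scheme (0 for equal letters, 1 otherwise) is just one possible scheme.

```python
def unit_score(a, b):
    """Hypothetical score scheme: 0 for a match, 1 for a mismatch or a space."""
    return 0 if a == b else 1

def alignment_value(x_aligned, y_aligned, s):
    """Sum the scores of the opposing letters, column by column."""
    if len(x_aligned) != len(y_aligned):
        raise ValueError("aligned sequences must have the same length")
    return sum(s(a, b) for a, b in zip(x_aligned, y_aligned))

# "_" marks an inserted space: abc aligned against axbc.
print(alignment_value("a_bc", "axbc", unit_score))
```

Here only the column with the space contributes, so the value is 1, matching the single insertion that turns abc into axbc.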
29
- Fact: The edit distance between two sequences is the same as the minimum alignment value of the two sequences if we use the same score scheme.
- Recursive equation

  $$c[i,j] = \min\{\, c[i-1,j-1] + s(X_i, Y_j),\;\; c[i,j-1] + s(\_, Y_j),\;\; c[i-1,j] + s(X_i, \_) \,\}$$

- Time and space complexity
  - Both are $O(nm)$, or $O(n^2)$ if both sequences have equal length $n$.
- Why?
  - We have to compute c[i,j] (the cost) and b[i,j] (for back-tracking). Each table takes $O(n^2)$.
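The recurrence can be sketched as follows. The slides do not state the boundary conditions, so the ones used here (row 0 and column 0 accumulate the cost of aligning letters against spaces) are an assumption, as are the names `alignment_cost` and `unit_score` and the use of `_` for a space.

```python
def unit_score(a, b):
    """Hypothetical score scheme: every insertion, deletion, or replacement costs 1."""
    return 0 if a == b else 1

def alignment_cost(x, y, s):
    """Return c[m][n], the minimum alignment value of x and y under score s."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):             # assumed boundary: x letters against spaces
        c[i][0] = c[i - 1][0] + s(x[i - 1], "_")
    for j in range(1, n + 1):             # assumed boundary: spaces against y letters
        c[0][j] = c[0][j - 1] + s("_", y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = min(
                c[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # X_i opposite Y_j
                c[i][j - 1] + s("_", y[j - 1]),           # space opposite Y_j
                c[i - 1][j] + s(x[i - 1], "_"),           # X_i opposite space
            )
    return c[m][n]

print(alignment_cost("kitten", "sitting", unit_score))
```

With the unit score scheme this is the ordinary edit distance; a weighted scheme such as the s(a,_) = 1.5 example only requires passing a different function `s`.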
30 Linear space algorithm
- Hint: Computing c[i,j] needs only linear space, whereas back-tracking needs O(nm) space.
31
- To compute c[i,j], we need c[i-1,j-1], c[i,j-1], and c[i-1,j].
- So, to get c[n,m], we only have to keep the dark cells (the entries of the previous row and the current row that are still needed).
- However, if we do not have all the b[i,j]'s, we cannot get the alignment (nor the edit process, the subsequence, or the supersequence).
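Keeping only the previous row and the current row can be sketched as below. The function name `last_row`, the boundary conditions, the `_` space symbol, and the unit score scheme are assumptions for this example. The function returns the whole last row of c, whose final entry is the cost of aligning all of x with all of y.

```python
def unit_score(a, b):
    """Hypothetical score scheme: 0 for a match, 1 otherwise."""
    return 0 if a == b else 1

def last_row(x, y, s):
    """Return row len(x) of the cost table c, storing two rows at a time."""
    n = len(y)
    prev = [0] * (n + 1)
    for j in range(1, n + 1):             # row 0: spaces against y[0..j-1]
        prev[j] = prev[j - 1] + s("_", y[j - 1])
    for i in range(1, len(x) + 1):
        cur = [prev[0] + s(x[i - 1], "_")] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j - 1] + s(x[i - 1], y[j - 1]),  # c[i-1, j-1]
                cur[j - 1] + s("_", y[j - 1]),        # c[i, j-1]
                prev[j] + s(x[i - 1], "_"),           # c[i-1, j]
            )
        prev = cur                        # the older row is no longer needed
    return prev

print(last_row("abcdefgh", "aacdefhh", unit_score)[-1])
```

The space used is proportional to the length of y, but no back-pointers survive, which is why the alignment itself cannot be recovered from this computation alone.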
32
- Discussion: Each time we only keep a few b[i,j]'s, and we can recompute the b[i,j]'s again when they are needed. In this way, we can get a linear-space algorithm. However, the time complexity is increased to $O(n^3)$.
33
- A better idea: find a cutting point that divides the problem into two problems of smaller size.
- For the problems of smaller size, we do the same thing until one of the segments contains 1 letter.
- Key: each time, we fix the middle point (n/2) of X.
34
- Example: X = abcdefgh and Y = aacdefhh.
- Score scheme: match = 0 and mismatch = 1.
- The alignment:
    abcdefgh        abcd efgh
    aacdefhh        aacd efhh
                        /\
                cutting point (4,4)
35
- Finding the cutting point
- Let $X = x_1 x_2 x_3 \ldots x_n$ and $Y = y_1 y_2 y_3 \ldots y_m$.
- Define $X^T = x_n x_{n-1} \ldots x_1$ and $Y^T = y_m y_{m-1} \ldots y_1$.
- Let c[i,j] be the cost of an optimal alignment for X[1..i] and Y[1..j], and let cc[k,l] be the cost of an optimal alignment for $X^T$[1..k] and $Y^T$[1..l].
    for (i = 0; i <= m; i++)
        if (c[n/2, i] + cc[n - n/2, m - i] == c[n, m])
            point = i;
- We need to check two rows: c[n/2,1], c[n/2,2], ..., c[n/2,m] and cc[n-n/2,1], cc[n-n/2,2], ..., cc[n-n/2,m]. This takes O(m) space.
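The search for the cutting point can be sketched as follows. The names `find_cut`, `last_row`, and `unit_score`, the `_` space symbol, and a cost of 1 for a space are assumptions for this example. Instead of comparing each sum against c[n,m], the sketch takes the column with the smallest sum, which is equivalent because the smallest sum equals c[n,m]; it returns the first such column, whereas the loop in the slides keeps the last.

```python
def unit_score(a, b):
    """Hypothetical score scheme: match 0, mismatch or space 1."""
    return 0 if a == b else 1

def last_row(x, y, s):
    """Return the last row of the alignment cost table in O(len(y)) space."""
    prev = [0] * (len(y) + 1)
    for j in range(1, len(y) + 1):
        prev[j] = prev[j - 1] + s("_", y[j - 1])
    for i in range(1, len(x) + 1):
        cur = [prev[0] + s(x[i - 1], "_")] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(prev[j - 1] + s(x[i - 1], y[j - 1]),
                         cur[j - 1] + s("_", y[j - 1]),
                         prev[j] + s(x[i - 1], "_"))
        prev = cur
    return prev

def find_cut(x, y, s):
    """Return (n/2, i): the middle row of x and a column i where an optimal alignment crosses it."""
    half = len(x) // 2
    m = len(y)
    fwd = last_row(x[:half], y, s)                 # c[n/2, i] for i = 0..m
    bwd = last_row(x[half:][::-1], y[::-1], s)     # cc[n - n/2, l] for l = 0..m
    totals = [fwd[i] + bwd[m - i] for i in range(m + 1)]
    return half, totals.index(min(totals))

print(find_cut("abcdefgh", "aacdefhh", unit_score))
```

For the example X = abcdefgh and Y = aacdefhh this yields the cutting point (4, 4).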
36 The algorithm
- 1. Compute c[n,n], the n/2-th row and the (n/2+1)-th row of c.
- 2. Find the cutting point (n/2, i) as shown above.
- 3. If i - n/2 = 1, then compute the alignment of X[1..n/2] and Y[1..i].
- 4. If n - n/2 + 1 = 1, then compute the alignment of X[n/2+1..n] and Y[i+1..n].
- 5. If i - n/2 ≠ 1 and n - n/2 + 1 ≠ 1, then recurse on steps 1-4 for the two pairs of sequences X[1..n/2] and Y[1..i], and X[n/2+1..n] and Y[i+1..n]; finally, combine the two alignments for the two pairs of sequences.
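One way to realize the divide-and-conquer scheme is sketched below. It is not a transcription of the steps above: the stopping rule (solve directly once either piece has at most one letter), the helper names `last_row`, `align_small`, and `linear_space_align`, the `_` space symbol, and the unit score scheme are all assumptions of this example.

```python
def unit_score(a, b):
    """Hypothetical score scheme: match 0, mismatch or space 1."""
    return 0 if a == b else 1

def last_row(x, y, s):
    """Last row of the cost table, computed with two rows of storage."""
    prev = [0] * (len(y) + 1)
    for j in range(1, len(y) + 1):
        prev[j] = prev[j - 1] + s("_", y[j - 1])
    for i in range(1, len(x) + 1):
        cur = [prev[0] + s(x[i - 1], "_")] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(prev[j - 1] + s(x[i - 1], y[j - 1]),
                         cur[j - 1] + s("_", y[j - 1]),
                         prev[j] + s(x[i - 1], "_"))
        prev = cur
    return prev

def align_small(x, y, s):
    """Full-table alignment with back-tracking; only called on tiny pieces."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        c[i][0] = c[i - 1][0] + s(x[i - 1], "_")
    for j in range(1, n + 1):
        c[0][j] = c[0][j - 1] + s("_", y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = min(c[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          c[i][j - 1] + s("_", y[j - 1]),
                          c[i - 1][j] + s(x[i - 1], "_"))
    xa, ya, i, j = [], [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and c[i][j] == c[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            xa.append(x[i - 1]); ya.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and c[i][j] == c[i - 1][j] + s(x[i - 1], "_"):
            xa.append(x[i - 1]); ya.append("_"); i -= 1
        else:
            xa.append("_"); ya.append(y[j - 1]); j -= 1
    return "".join(reversed(xa)), "".join(reversed(ya))

def linear_space_align(x, y, s):
    """Return an optimal alignment (x', y') by cutting x in the middle and recursing."""
    if len(x) <= 1 or len(y) <= 1:
        return align_small(x, y, s)       # table has only O(len(x) + len(y)) cells
    half, m = len(x) // 2, len(y)
    fwd = last_row(x[:half], y, s)
    bwd = last_row(x[half:][::-1], y[::-1], s)
    cut = min(range(m + 1), key=lambda i: fwd[i] + bwd[m - i])
    left = linear_space_align(x[:half], y[:cut], s)
    right = linear_space_align(x[half:], y[cut:], s)
    return left[0] + right[0], left[1] + right[1]   # combine the two alignments

xa, ya = linear_space_align("abcdefgh", "aacdefhh", unit_score)
print(xa)
print(ya)
```

Each recursive call only ever holds two rows of costs plus a small table for the base case, so the working space stays linear in the sequence lengths (string slicing aside).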
37
- Time complexity analysis
- The first round needs $T$ time, where $T$ is the time for the normal algorithm, $O(n^2)$.
- The 2nd round needs $\frac{1}{2}T$: $0.5n \times i + 0.5n \times (n - i) = 0.5n^2$.
- The 3rd round needs $\frac{1}{4}T$.
- The $i$-th round needs $\frac{1}{2^{i-1}}T$.
- Total time: $T + (\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots)T = 2T = O(n^2)$.
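The round-by-round costs can be written out as a geometric series. The derivation below is a sketch under the assumptions that both sequences have length $n$, that $T = n^2$ counts table cells with constant factors dropped, and that $r$ (a symbol introduced here) numbers the rounds.

```latex
\begin{align*}
\text{round 2:}\quad & \tfrac{n}{2}\cdot i + \tfrac{n}{2}\cdot (n - i) = \tfrac{n^2}{2} = \tfrac{1}{2}T, \\
\text{round } r:\quad & \frac{T}{2^{\,r-1}}, \\
\text{total:}\quad & \sum_{r \ge 1} \frac{T}{2^{\,r-1}} = T \cdot \frac{1}{1 - \tfrac{1}{2}} = 2T = O(n^2).
\end{align*}
```

The second-round cost does not depend on where the cutting column $i$ falls, which is why the halving holds in every round.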