Dynamic programming algorithms for all-pairs shortest path and longest common subsequences

Slides: 38

Provided by: xuying

1
Dynamic programming algorithms for all-pairs
shortest path and longest common subsequences

We will study a new technique: dynamic programming
algorithms (typically used for optimization problems).
Ideas:
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution in a
bottom-up fashion (using a matrix to store the values).
4. Backtrack to construct an optimal solution
from the computed information.

2
Floyd-Warshall algorithm for shortest path

We use a different dynamic-programming formulation
to solve the all-pairs shortest-paths problem on
a directed graph $G = (V, E)$.
The resulting algorithm, known as the
Floyd-Warshall algorithm, runs in $O(V^3)$ time.
Negative-weight edges may be present,
but we assume that there are no
negative-weight cycles.

3
The structure of a shortest path

We use a different characterization of the
structure of a shortest path than we used in the
matrix-multiplication-based all-pairs algorithms.
The algorithm considers the intermediate
vertices of a shortest path, where an
intermediate vertex of a simple path
$p = \langle v_1, v_2, \dots, v_l \rangle$ is any vertex of p other than $v_1$
or $v_l$, that is, any vertex in the set
$\{v_2, v_3, \dots, v_{l-1}\}$.

4
Continue

Let the vertices of G be $V = \{1, 2, \dots, n\}$, and
consider a subset $\{1, 2, \dots, k\}$ of vertices for some
k.
For any pair of vertices $i, j \in V$, consider all
paths from i to j whose intermediate vertices are
all drawn from $\{1, 2, \dots, k\}$, and let p be a
minimum-weight path from among them.
The Floyd-Warshall algorithm exploits a
relationship between path p and shortest paths
from i to j with all intermediate vertices in the
set $\{1, 2, \dots, k-1\}$.

5
Relationship

The relationship depends on whether or not k is
an intermediate vertex of path p.
If k is not an intermediate vertex of path p,
then all intermediate vertices of path p are in
the set $\{1, 2, \dots, k-1\}$. Thus, a shortest path from
vertex i to vertex j with all intermediate
vertices in the set $\{1, 2, \dots, k-1\}$ is also a
shortest path from i to j with all intermediate
vertices in the set $\{1, 2, \dots, k\}$.
If k is an intermediate vertex of path p, then we
break p down into $i \overset{p_1}{\leadsto} k \overset{p_2}{\leadsto} j$,
as shown in Figure 2. Path $p_1$ is a shortest path from i
to k with all intermediate vertices in the set
$\{1, 2, \dots, k-1\}$, and likewise $p_2$ is a shortest path
from k to j with all intermediate vertices in that set.

6
Figure 2. Path p is a shortest path from vertex
i to vertex j, and k is the highest-numbered
intermediate vertex of p. Path p1, the portion
of path p from vertex i to vertex k, has all
intermediate vertices in the set {1, 2, ..., k-1}. The
same holds for path p2 from vertex k to vertex j.
All intermediate vertices of p are in {1, 2, ..., k}.

7
A recursive solution to the all-pairs shortest
paths problem

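Let $d_{ij}^{(k)}$ be the weight of a shortest path from
vertex i to vertex j with all intermediate
vertices in the set $\{1, 2, \dots, k\}$. A recursive
definition is given by
$$
d_{ij}^{(k)} =
\begin{cases}
w_{ij} & \text{if } k = 0,\\
\min\bigl(d_{ij}^{(k-1)},\; d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\bigr) & \text{if } k \ge 1.
\end{cases}
$$
The matrix $D^{(n)} = (d_{ij}^{(n)})$ gives the final
answer, the shortest-path weight $d_{ij}^{(n)}$ for all $i, j \in V$,
because all intermediate vertices are in the
set $\{1, 2, \dots, n\}$.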

8
Computing the shortest-path weights bottom up

FLOYD-WARSHALL(W)
    n = rows[W]
    D(0) = W
    for k = 1 to n
        for i = 1 to n
            for j = 1 to n
                d_ij(k) = min(d_ij(k-1), d_ik(k-1) + d_kj(k-1))
    return D(n)
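
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the presenter's code: the function name `floyd_warshall`, the 0-based vertex numbering, and the 4-vertex weight matrix `W` are assumptions made for the example (the graph is hypothetical and is not the one in Figure 3). The sketch also updates a single matrix in place instead of keeping every D(k), which gives the same result when there are no negative-weight cycles because row k and column k do not change during round k.

```python
import math


def floyd_warshall(w):
    """Return the matrix of shortest-path weights for the weight matrix w.

    w[i][j] is the weight of edge (i, j), math.inf if the edge is absent,
    and 0 on the diagonal. Assumes there are no negative-weight cycles.
    """
    n = len(w)
    d = [row[:] for row in w]      # D(0) = W
    for k in range(n):             # now allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


INF = math.inf
# Hypothetical 4-vertex directed graph (illustration only).
W = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
D = floyd_warshall(W)
for row in D:
    print(row)
```

The three nested loops over n vertices make the O(V^3) running time visible directly.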

9
Example

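[Figure 3: an example weighted directed graph on vertices 1 to 5, including negative edge weights (image not transcribed). Slides 10 to 12 showed the matrices D(2) through D(5) with their companion matrices (2) through (5); these were not transcribed.]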
13
Comparison of two strings

Longest common subsequence
Shortest common supersequence
Edit distance between two sequences

14
1. Longest common subsequence

Definition 1: Given a sequence $X = x_1x_2\dots x_m$,
another sequence $Z = z_1z_2\dots z_k$ is a subsequence of
X if there exists a strictly increasing sequence
$i_1 < i_2 < \dots < i_k$ of indices of X such that for all
$j = 1, 2, \dots, k$ we have $x_{i_j} = z_j$.
Example 1: If X = abcdefg, then Z = abdg is a subsequence
of X.
X = abcdefg
Z = ab d  g

Definition 2: Let X and Y be two sequences. A
sequence Z is a common subsequence of X and Y if
Z is a subsequence of both X and Y.
Example 2: X = abcdefg and Y = aaadgfd. Z = adf is a
common subsequence of X and Y.
X = abcdefg
Y = aaadgfd
Z = a  d f

Definition 3: A longest common subsequence of X
and Y is a common subsequence of X and Y with the
longest length. (The length of a sequence is the
number of letters in the sequence.)
A longest common subsequence may not be unique.

17
Longest common subsequence problem

Input: two sequences $X = x_1x_2\dots x_m$ and
$Y = y_1y_2\dots y_n$.
Output: a longest common subsequence of X and Y.
A brute-force approach:
Suppose that $m \le n$. Try all subsequences of X
(there are $2^m$ subsequences of X), test whether each
one is also a subsequence of Y, and
select the longest one that is.
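
As an illustration of the brute-force approach, here is a small Python sketch. The function names `is_subsequence` and `lcs_brute_force` are made up for this example. It enumerates index subsets of the shorter sequence from longest to shortest, so the first hit is a longest common subsequence; the exponential number of subsets is what makes the approach impractical beyond short inputs.

```python
from itertools import combinations


def is_subsequence(z, y):
    """Return True if z can be obtained from y by deleting letters."""
    it = iter(y)
    # `ch in it` advances the iterator, so matches must occur in order.
    return all(ch in it for ch in z)


def lcs_brute_force(x, y):
    """Try the subsequences of the shorter string, longest first."""
    if len(x) > len(y):
        x, y = y, x
    for length in range(len(x), -1, -1):
        for idx in combinations(range(len(x)), length):
            z = "".join(x[i] for i in idx)
            if is_subsequence(z, y):
                return z
    return ""


print(is_subsequence("abdg", "abcdefg"))
print(lcs_brute_force("abcdefg", "aaadgfd"))
```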

18
Characterizing a longest common subsequence

Theorem (Optimal substructure of an LCS)
Let $X = x_1x_2\dots x_m$ and $Y = y_1y_2\dots y_n$ be two
sequences, and let
$Z = z_1z_2\dots z_k$ be any LCS of X and Y.
1. If $x_m = y_n$, then $z_k = x_m = y_n$ and Z[1..k-1] is an
LCS of X[1..m-1] and Y[1..n-1].
2. If $x_m \ne y_n$, then $z_k \ne x_m$ implies that Z is an LCS
of X[1..m-1] and Y.
3. If $x_m \ne y_n$, then $z_k \ne y_n$ implies that Z is an LCS
of X and Y[1..n-1].

19
The recursive equation

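Let c[i,j] be the length of an LCS of X[1..i]
and Y[1..j].
c[i,j] can be computed as follows:
$$
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\
\max\bigl(c[i,j-1],\; c[i-1,j]\bigr) & \text{if } i, j > 0 \text{ and } x_i \ne y_j.
\end{cases}
$$
Computing the length of an LCS
There are $n \times m$ values c[i,j], so we can compute them in
a specific order (row by row, so that the entries each value
depends on are already available).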
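
The recurrence for the LCS length can be transcribed almost literally as a memoized recursive function. The sketch below is only an illustration under that reading; the name `lcs_length` is an assumption, and because it recurses it is suitable for short inputs only (the bottom-up table avoids deep recursion).

```python
from functools import lru_cache


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y."""

    @lru_cache(maxsize=None)
    def c(i, j):
        # c(i, j) is the LCS length of x[:i] and y[:j] (prefixes of length i, j).
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


print(lcs_length("BDCABA", "ABCBDAB"))
```

Memoization ensures each pair (i, j) is evaluated once, which matches the count of table entries given above.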

20
The algorithm to compute an LCS

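for i = 1 to m do
    c[i,0] = 0
for j = 0 to n do
    c[0,j] = 0
for i = 1 to m do
    for j = 1 to n do
        if x_i = y_j then
            c[i,j] = c[i-1,j-1] + 1
            b[i,j] = 1
        else if c[i-1,j] > c[i,j-1] then
            c[i,j] = c[i-1,j]
            b[i,j] = 2
        else
            c[i,j] = c[i,j-1]
            b[i,j] = 3

Here b[i,j] records which case produced c[i,j]: 1 for the diagonal cell, 2 for the cell above, 3 for the cell to the left.

Example 3: X = BDCABA and Y = ABCBDAB.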

22
Constructing an LCS (back-tracking)

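We can find an LCS using the b[i,j]s.
We start with b[m,n] and track back to some cell
b[0,j] or b[i,0].
The algorithm to construct an LCS
1. i = m
2. j = n
3. if i = 0 or j = 0 then exit
4. if b[i,j] = 1 then
print x_i
i = i-1
j = j-1
5. else if b[i,j] = 2 then i = i-1
6. else if b[i,j] = 3 then j = j-1
7. Goto Step 3.
The letters are printed in reverse order, from the last letter of the LCS to the first.
The overall time complexity is O(nm): filling the tables dominates, and the back-tracking itself takes O(m + n) steps.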
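
Table filling and back-tracking can be combined into one Python sketch. This is an illustrative version, not the presenter's code: the function name `lcs` is an assumption, strings are 0-indexed (so x_i is `x[i - 1]`), and the letters collected while walking back are reversed at the end so the result reads left to right.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # LCS lengths
    b = [[0] * (n + 1) for _ in range(m + 1)]  # 1 = diagonal, 2 = up, 3 = left
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 1
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 2
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 3

    # Back-track from b[m][n] until a border cell is reached.
    letters = []
    i, j = m, n
    while i > 0 and j > 0:
        if b[i][j] == 1:
            letters.append(x[i - 1])
            i -= 1
            j -= 1
        elif b[i][j] == 2:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(letters))


print(lcs("BDCABA", "ABCBDAB"))
```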

23
2. Shortest common supersequence

Definition: Let X and Y be two sequences. A
sequence Z is a common supersequence of X and Y if both
X and Y are subsequences of Z.
Shortest common supersequence problem
Input: two sequences X and Y.
Output: a shortest common supersequence of X and
Y.

Recursive Equation
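Let c[i,j] be the length of a shortest common supersequence of X[1..i]
and Y[1..j].
c[i,j] can be computed as follows:
$$
c[i,j] =
\begin{cases}
j & \text{if } i = 0,\\
i & \text{if } j = 0,\\
c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\
\min\bigl(c[i,j-1] + 1,\; c[i-1,j] + 1\bigr) & \text{if } i, j > 0 \text{ and } x_i \ne y_j.
\end{cases}
$$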
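
A bottom-up evaluation of this recurrence for the length of a shortest common supersequence might look like the sketch below. The function name `scs_length` is an assumption for the example, and strings are 0-indexed, so x_i corresponds to `x[i - 1]`.

```python
def scs_length(x, y):
    """Length of a shortest common supersequence of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        c[0][j] = j              # empty prefix of x: all of y[:j] is needed
    for i in range(m + 1):
        c[i][0] = i              # empty prefix of y: all of x[:i] is needed
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = min(c[i][j - 1], c[i - 1][j]) + 1
    return c[m][n]


print(scs_length("BDCABA", "ABCBDAB"))
```

A useful sanity check is the identity |SCS| = m + n - |LCS|: the letters of a longest common subsequence are the only ones the two sequences can share in the supersequence.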

26
3. Edit distance between two sequences

Three operations:
insertion: inserting an x into abc (between a
and b), we get axbc.
deletion: deleting b from abc, we get ac.
replacement: given a sequence abc, replacing a
with x, we get xbc.

Definition: Suppose that we can use three edit
operations (insertion, deletion, and replacement)
to edit one sequence into another. The edit
distance between two sequences is the minimum
number of operations required to edit one
sequence into the other.
Note: each operation is counted as 1.
Weighted edit distance
There is a weight on each operation.
For example, s(a,b) = 1, s(a,_) = 1.5, s(b,a) = 1,
s(b,_) = 1.5.
Where the weights come from:
For DNA and protein sequences, they come from
statistics.

28
Alignment of sequences -- an alternative

An alignment of two sequences X and Y is obtained by
inserting spaces into or at either end of X and
Y such that the two resulting sequences X' and Y'
are of the same length. That is, every letter in
X' is opposite to a unique letter in Y'.
The alignment value is defined as
$$
\sum_{i=1}^{l} s(X'_i, Y'_i),
$$
where l is the common length of X' and Y', $X'_i$ and $Y'_i$
denote the two letters in
column i of the alignment, and $s(X'_i, Y'_i)$
is the score (weight) of these opposing letters.
There are several popular score schemes for DNA
and protein sequences.

Fact: The edit distance between two sequences is
the same as the alignment value of the two sequences
if we use the same score scheme.
Recursive equation
$$
c[i,j] = \min\bigl\{\, c[i-1,j-1] + s(X_i, Y_j),\;\; c[i,j-1] + s(\_, Y_j),\;\; c[i-1,j] + s(X_i, \_) \,\bigr\}.
$$
Time and space complexity
Both are O(nm), or $O(n^2)$ if both sequences have
equal length n.
Why?
We have to compute c[i,j] (the cost) and b[i,j]
(for back-tracking). Each takes $O(n^2)$.
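
The recursive equation for the alignment cost can be evaluated with a full table as in the sketch below. The names `alignment_cost`, `unit_score` and `GAP` are assumptions for illustration, as is the default score scheme (match 0, anything else 1, with "_" standing for a space); the border cells, which the slides do not spell out, are taken here to be the cost of aligning a prefix against spaces only. Any other score function with the same signature can be passed in.

```python
GAP = "_"


def unit_score(a, b):
    """Hypothetical default scheme: 0 for a match, 1 for a mismatch or a space."""
    return 0 if a == b else 1


def alignment_cost(x, y, s=unit_score):
    """Minimum alignment value of x and y under the score function s."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):                 # x[:i] against spaces only
        c[i][0] = c[i - 1][0] + s(x[i - 1], GAP)
    for j in range(1, n + 1):                 # spaces only against y[:j]
        c[0][j] = c[0][j - 1] + s(GAP, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = min(
                c[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # opposite letters
                c[i][j - 1] + s(GAP, y[j - 1]),           # space in x
                c[i - 1][j] + s(x[i - 1], GAP),           # space in y
            )
    return c[m][n]


print(alignment_cost("abcdefgh", "aacdefhh"))
```

With the unit scheme the result is the ordinary edit distance; the table has (m+1)(n+1) cells, which is where the O(nm) time and space come from.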

30
Linear space algorithm

Hint: Computing c[i,j] needs only linear space,
whereas back-tracking needs O(nm) space.

To compute c[i,j], we need only c[i-1,j-1], c[i,j-1] and
c[i-1,j].
So, to get c[n,m], we only have to keep the dark
cells (the previous row and the current row).
However, if we do not have all the b[i,j]s, we
cannot get the alignment (nor the edit process,
the subsequence, the supersequence).

Discussion: Each time we keep only a few b[i,j]s,
and we re-compute the other b[i,j]s when they are needed. In this
way, we get a linear-space algorithm.
However, the time complexity increases to
$O(n^3)$.
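
The observation that only two rows are needed for the cost can be sketched as follows. The function name `edit_distance_two_rows` is an assumption, and unit costs are assumed (match 0, mismatch or space 1). Note that this returns only the cost: no back-tracking information is kept.

```python
def edit_distance_two_rows(x, y):
    """Unit-cost edit distance using O(len(y)) space."""
    prev = list(range(len(y) + 1))            # row 0: cost of inserting y[:j]
    for i in range(1, len(x) + 1):
        cur = [i] + [0] * len(y)              # column 0: cost of deleting x[:i]
        for j in range(1, len(y) + 1):
            cur[j] = min(
                prev[j - 1] + (x[i - 1] != y[j - 1]),  # match or replacement
                cur[j - 1] + 1,                        # insertion
                prev[j] + 1,                           # deletion
            )
        prev = cur                            # the older row is no longer needed
    return prev[len(y)]


print(edit_distance_two_rows("abcdefgh", "aacdefhh"))
```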

A better idea: find a cutting point.
For the resulting problems of smaller size, we do the same
thing until one of the segments contains 1 letter.
Key: each time, we fix the middle point (n/2) of
X.

Example: X = abcdefgh and Y = aacdefhh.
Score scheme: match = 0 and mismatch = 1.
The alignment:
abcdefgh        abcd | efgh
aacdefhh        aacd | efhh
The cutting point is (4,4).

Finding the cutting point
Let $X = x_1x_2x_3\dots x_n$ and $Y = y_1y_2y_3\dots y_m$.
Define the reversed sequences $X^T = x_nx_{n-1}\dots x_1$ and $Y^T = y_my_{m-1}\dots y_1$.
Let c[i,j] be the cost of an optimal alignment of
X[1..i] and Y[1..j], and let cc[k,l] be the cost of an
optimal alignment of X^T[1..k] and Y^T[1..l].
    for (i = 1; i <= m; i++)
        if (c[n/2, i] + cc[n - n/2, m - i] == c[n, m])
            point = i;
We need to check only two rows: c[n/2,1],
c[n/2,2], ..., c[n/2,m] and cc[n-n/2,1],
cc[n-n/2,2], ..., cc[n-n/2,m]. This takes O(m) space.

36
The algorithm

1. Compute c[n,m], the (n/2)-th row and the
(n/2+1)-th row of c.
2. Find the cutting point (n/2, i) as shown
above.
3. If X[1..n/2] or Y[1..i] contains only 1 letter, then compute the alignment
of X[1..n/2] and Y[1..i] directly.
4. If X[n/2+1..n] or Y[i+1..m] contains only 1 letter, then compute the alignment
of X[n/2+1..n] and Y[i+1..m] directly.
5. Otherwise, recurse on steps 1-4 for the two pairs of
sequences X[1..n/2] and Y[1..i], and
X[n/2+1..n] and Y[i+1..m]; finally, combine
the two alignments for the two pairs of
sequences.
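
The divide-and-conquer scheme can be sketched in Python as below. This is an illustrative variant under stated assumptions rather than the presenter's exact procedure: unit costs are assumed (match 0, mismatch or space 1), "_" marks a space, the names `last_row` and `align` are made up, and the cutting point is chosen as the index minimizing the sum of the forward and backward rows, which is equivalent to testing equality with the optimal total cost. The recursion stops when the X segment has at most one letter.

```python
def last_row(x, y):
    """Last row of the unit-cost table for x against y, in O(len(y)) space."""
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        cur = [i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(prev[j - 1] + (x[i - 1] != y[j - 1]),
                         cur[j - 1] + 1,
                         prev[j] + 1)
        prev = cur
    return prev


def align(x, y):
    """Return an optimal alignment (x', y') of x and y."""
    if len(x) == 0:
        return "_" * len(y), y
    if len(y) == 0:
        return x, "_" * len(x)
    if len(x) == 1:
        # Base case: put the single letter opposite a matching letter of y
        # if there is one, otherwise opposite the first letter of y.
        k = max(y.find(x), 0)
        return "_" * k + x + "_" * (len(y) - k - 1), y
    mid = len(x) // 2
    forward = last_row(x[:mid], y)                  # c[mid, 0..m]
    backward = last_row(x[mid:][::-1], y[::-1])     # cc[n - mid, 0..m]
    m = len(y)
    cut = min(range(m + 1), key=lambda i: forward[i] + backward[m - i])
    left = align(x[:mid], y[:cut])
    right = align(x[mid:], y[cut:])
    return left[0] + right[0], left[1] + right[1]   # combine the two alignments


top, bottom = align("abcdefgh", "aacdefhh")
print(top)
print(bottom)
```

Only two rows of length m + 1 are alive at any time in `last_row`, so the working space stays linear, while the alignment itself is assembled from the recursive calls.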

Time complexity analysis
The first round needs T time, where T is the
time for the normal algorithm, $O(n^2)$.
The 2nd round needs T/2, because the two subproblems cost
$0.5n \times i + 0.5n \times (n-i) = 0.5n^2$.
The 3rd round needs T/4.
The i-th round needs $T/2^{i-1}$.
Total time: $T + (1/2 + 1/4 + 1/8 + \dots)T \le 2T = O(n^2)$.
