Dynamic programming algorithms for all-pairs shortest path and longest common subsequences

Slides: 38
Provided by: xuying
Transcript and Presenter's Notes

1
Dynamic programming algorithms for all-pairs
shortest path and longest common subsequences
  • We will study a new technique: dynamic programming
    algorithms (typically for optimization problems).
  • Ideas:
  • Characterize the structure of an optimal solution.
  • Recursively define the value of an optimal
    solution.
  • Compute the value of an optimal solution in a
    bottom-up fashion (using a matrix to store the values).
  • Backtrack to construct an optimal solution
    from the computed information.

2
Floyd-Warshall algorithm for shortest path
  • Use a different dynamic-programming formulation
    to solve the all-pairs shortest-paths problem on
    a directed graph $G = (V, E)$.
  • The resulting algorithm, known as the
    Floyd-Warshall algorithm, runs in $O(V^3)$ time.
  • Negative-weight edges may be present,
    but we shall assume that there are no
    negative-weight cycles.

3
The structure of a shortest path
  • We use a different characterization of the
    structure of a shortest path than we used in the
    matrix-multiplication-based all-pairs algorithms.
  • The algorithm considers the intermediate
    vertices of a shortest path, where an
    intermediate vertex of a simple path
    $p = \langle v_1, v_2, \ldots, v_l \rangle$ is any vertex of $p$ other than $v_1$
    or $v_l$, that is, any vertex in the set
    $\{v_2, v_3, \ldots, v_{l-1}\}$.

4
Continue
  • Let the vertices of G be $V = \{1, 2, \ldots, n\}$, and
    consider a subset $\{1, 2, \ldots, k\}$ of vertices for some
    k.
  • For any pair of vertices $i, j \in V$, consider all
    paths from i to j whose intermediate vertices are
    all drawn from $\{1, 2, \ldots, k\}$, and let p be a
    minimum-weight path from among them.
  • The Floyd-Warshall algorithm exploits a
    relationship between path p and shortest paths
    from i to j with all intermediate vertices in the
    set $\{1, 2, \ldots, k-1\}$.

5
Relationship
  • The relationship depends on whether or not k is
    an intermediate vertex of path p.
  • If k is not an intermediate vertex of path p,
    then all intermediate vertices of path p are in
    the set $\{1, 2, \ldots, k-1\}$. Thus, a shortest path from
    vertex i to vertex j with all intermediate
    vertices in the set $\{1, 2, \ldots, k-1\}$ is also a
    shortest path from i to j with all intermediate
    vertices in the set $\{1, 2, \ldots, k\}$.
  • If k is an intermediate vertex of path p, then we
    break p down into $i \xrightarrow{p_1} k \xrightarrow{p_2} j$,
    as shown in Figure 2. $p_1$ is a shortest path from i
    to k with all intermediate vertices in the set
    $\{1, 2, \ldots, k-1\}$, and likewise $p_2$ is a shortest path
    from k to j with all intermediate vertices in that set.

6
[Figure 2: path p runs from i to j through k; p1 is the part from i to k and p2 the part from k to j. All intermediate vertices of p1 and p2 are in {1, 2, ..., k-1}; all intermediate vertices of p are in {1, 2, ..., k}.]
Figure 2. Path p is a shortest path from vertex
i to vertex j, and k is the highest-numbered
intermediate vertex of p. Path p1, the portion
of path p from vertex i to vertex k, has all
intermediate vertices in the set {1, 2, ..., k-1}. The
same holds for path p2 from vertex k to vertex j.

7
A recursive solution to the all-pairs shortest
paths problem
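  • Let $d_{ij}^{(k)}$ be the weight of a shortest path from
    vertex i to vertex j with all intermediate
    vertices in the set $\{1, 2, \ldots, k\}$. A recursive
    definition is given by
  • $d_{ij}^{(k)} = w_{ij}$
    if $k = 0$,
  • $d_{ij}^{(k)} = \min\left(d_{ij}^{(k-1)},\ d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\right)$
    if $k \ge 1$.
  • The matrix $D^{(n)} = (d_{ij}^{(n)})$ gives the final
    answer: $d_{ij}^{(n)}$ is the shortest-path weight from i to j
    for all $i, j \in V$, because all intermediate vertices are in the
    set $\{1, 2, \ldots, n\}$.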

8
Computing the shortest-path weights bottom up
  • FLOYD-WARSHALL(W)
  • n = rows[W]
  • $D^{(0)} = W$
  • for k = 1 to n
  •     do for i = 1 to n
  •         do for j = 1 to n
  •             $d_{ij}^{(k)} = \min\left(d_{ij}^{(k-1)},\ d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\right)$
  • return $D^{(n)}$
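
The triple loop above can be sketched in Python as follows. This is a minimal sketch, not the presenter's code: it assumes vertices are numbered from 0, that `w` is an n-by-n list of lists with 0 on the diagonal and `float("inf")` for missing edges, and it updates a single matrix in place instead of keeping a separate matrix for every k (a common space-saving variant that gives the same result when there are no negative-weight cycles). The function name and the sample graph are illustrative only.

```python
INF = float("inf")


def floyd_warshall(w):
    """Return the matrix of shortest-path weights for weight matrix w."""
    n = len(w)
    d = [row[:] for row in w]  # D(0) = W, copied so w is left untouched
    for k in range(n):          # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Hypothetical 4-vertex graph (not the graph of Figure 3).
W = [
    [0,   3,   8,   INF],
    [INF, 0,   -2,  INF],
    [INF, INF, 0,   1],
    [2,   INF, INF, 0],
]
D = floyd_warshall(W)
```

Here `D[0][2]` is the weight of the path 0, 1, 2 (3 + (-2)), which beats the direct edge of weight 8.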

9
Example
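  • Figure 3: a weighted directed graph on vertices 1 to 5 (drawing not transcribed; the surviving edge weights are 3, 8, 1, -5, -4, 2, 7, 4, 6).
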
10
(No Transcript)
11
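[Matrices $D^{(2)}$ and $D^{(3)}$ for the example, each shown with a companion matrix whose label was lost; entries not transcribed.]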
12
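[Matrices $D^{(4)}$ and $D^{(5)}$ for the example, each shown with a companion matrix whose label was lost; entries not transcribed.]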
13
Comparison of two strings
  • Longest common subsequence
  • Shortest common supersequence
  • Edit distance between two sequences

14
1. Longest common subsequence
  • Definition 1: Given a sequence $X = x_1 x_2 \ldots x_m$,
    another sequence $Z = z_1 z_2 \ldots z_k$ is a subsequence of
    X if there exists a strictly increasing sequence
    $i_1, i_2, \ldots, i_k$ of indices of X such that for all
    $j = 1, 2, \ldots, k$, we have $x_{i_j} = z_j$.
  • Example 1: If X = abcdefg, then Z = abdg is a subsequence
    of X:
    X = abcdefg
    Z = ab d  g
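
Definition 1 can be checked mechanically. The sketch below is an illustration, not part of the slides: it scans X once and greedily matches the letters of Z in order, which finds a strictly increasing index sequence whenever one exists. The function name `is_subsequence` is an assumption made for this example.

```python
def is_subsequence(z, x):
    """Return True if z can be obtained from x by deleting letters."""
    matched = 0  # number of letters of z matched so far
    for letter in x:
        if matched < len(z) and letter == z[matched]:
            matched += 1
    return matched == len(z)
```

For instance, `is_subsequence("abdg", "abcdefg")` holds, while `is_subsequence("ba", "abcdefg")` does not, because the indices would have to decrease.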

15
  • Definition 2: Given two sequences X and Y, a
    sequence Z is a common subsequence of X and Y if
    Z is a subsequence of both X and Y.
  • Example 2: X = abcdefg and Y = aaadgfd. Z = adf is a
    common subsequence of X and Y.
  • X = abc defg
  • Y = aaaadgfd
  • Z = a   d f

16
  • Definition 3: A longest common subsequence (LCS) of X
    and Y is a common subsequence of X and Y of
    maximum length. (The length of a sequence is the
    number of letters in the sequence.)
  • A longest common subsequence may not be unique.

17
Longest common subsequence problem
  • Input: Two sequences $X = x_1 x_2 \ldots x_m$ and
    $Y = y_1 y_2 \ldots y_n$.
  • Output: a longest common subsequence of X and Y.
  • A brute-force approach:
  • Suppose that $m \le n$. Try all subsequences of X
    (there are $2^m$ subsequences of X), test whether each
    one is also a subsequence of Y, and
    select the longest one that is.

18
Characterizing a longest common subsequence
  • Theorem (Optimal substructure of an LCS)
  • Let $X = x_1 x_2 \ldots x_m$ and $Y = y_1 y_2 \ldots y_n$ be two
    sequences, and
  • let $Z = z_1 z_2 \ldots z_k$ be any LCS of X and Y.
  • 1. If $x_m = y_n$, then $z_k = x_m = y_n$ and $Z[1..k-1]$ is an
    LCS of $X[1..m-1]$ and $Y[1..n-1]$.
  • 2. If $x_m \ne y_n$, then $z_k \ne x_m$ implies that Z is an LCS
    of $X[1..m-1]$ and Y.
  • 3. If $x_m \ne y_n$, then $z_k \ne y_n$ implies that Z is an LCS
    of X and $Y[1..n-1]$.

19
The recursive equation
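  • Let $c[i,j]$ be the length of an LCS of $X[1..i]$
    and $Y[1..j]$.
  • $c[i,j]$ can be computed as follows:
  • $c[i,j] = 0$
    if $i = 0$ or $j = 0$,
  • $c[i,j] = c[i-1,j-1] + 1$ if
    $i, j > 0$ and $x_i = y_j$,
  • $c[i,j] = \max(c[i,j-1],\ c[i-1,j])$ if $i, j > 0$
    and $x_i \ne y_j$.
  • Computing the length of an LCS:
  • There are $n \times m$ values $c[i,j]$, so we can compute them in
    a specific order (row by row), so that each value is ready
    before it is needed.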

20
The algorithm to compute an LCS
  • for i = 1 to m do
  •     c[i,0] = 0
  • for j = 0 to n do
  •     c[0,j] = 0
  • for i = 1 to m do
  •     for j = 1 to n do
  •         if x_i = y_j then
  •             c[i,j] = c[i-1,j-1] + 1
  •             b[i,j] = 1
  •         else if c[i-1,j] > c[i,j-1] then
  •             c[i,j] = c[i-1,j]
  •             b[i,j] = 2
  •         else
  •             c[i,j] = c[i,j-1]
  •             b[i,j] = 3

21
  • Example 3: X = BDCABA and Y = ABCBDAB.

22
Constructing an LCS (back-tracking)
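  • We can find an LCS using the b[i,j]s.
  • We start with b[m,n] and track back to some cell
    b[0,j] or b[i,0].
  • The algorithm to construct an LCS:
  • 1. i = m
  • 2. j = n
  • 3. if i = 0 or j = 0 then exit
  • 4. if b[i,j] = 1 then
  •        print x_i
  •        i = i-1
  •        j = j-1
  • 5. else if b[i,j] = 2 then i = i-1
  • 6. else j = j-1
  • 7. Goto Step 3.
  • The letters are printed in reverse order, from the last
    letter of the LCS to the first.
  • Time complexity: O(nm) to fill in the tables; the
    back-tracking itself takes at most n + m steps.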
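
The table-filling and back-tracking steps for the LCS can be put together in a short Python sketch. It is an illustration under stated assumptions rather than the presenter's code: sequences are Python strings indexed from 0 (so `x[i - 1]` plays the role of x_i), the codes 1, 2, 3 in `b` mean diagonal, up, and left as on the slides, and the function name `lcs` is chosen for this example.

```python
def lcs(x, y):
    """Return one longest common subsequence of strings x and y."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 1          # diagonal: letter belongs to the LCS
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 2          # up: drop x_i
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 3          # left: drop y_j
    # Back-track from b[m][n]; letters come out last to first.
    letters = []
    i, j = m, n
    while i > 0 and j > 0:
        if b[i][j] == 1:
            letters.append(x[i - 1])
            i -= 1
            j -= 1
        elif b[i][j] == 2:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(letters))
```

Because an LCS need not be unique, which one is returned depends on how ties between "up" and "left" are broken; only the length is determined by the inputs.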

23
2. Shortest common supersequence
  • Definition: Let X and Y be two sequences. A
    sequence Z is a common supersequence of X and Y if both
    X and Y are subsequences of Z.
  • Shortest common supersequence problem:
  • Input: Two sequences X and Y.
  • Output: a shortest common supersequence of X and
    Y.

24
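  • Recursive equation:
  • Let $c[i,j]$ be the length of a shortest common
    supersequence of $X[1..i]$ and $Y[1..j]$.
  • $c[i,j]$ can be computed as follows:
  • $c[i,j] = j$
    if $i = 0$,
  • $c[i,j] = i$
    if $j = 0$,
  • $c[i,j] = c[i-1,j-1] + 1$ if
    $i, j > 0$ and $x_i = y_j$,
  • $c[i,j] = \min(c[i,j-1] + 1,\ c[i-1,j] + 1)$ if
    $i, j > 0$ and $x_i \ne y_j$.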
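
The recurrence for the shortest common supersequence can be evaluated bottom-up in the same way as the LCS table. The following is a minimal sketch that returns only the length; it assumes Python strings indexed from 0, and the name `scs_length` is made up for this example.

```python
def scs_length(x, y):
    """Length of a shortest common supersequence of strings x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        c[0][j] = j              # empty prefix of x: copy y[0..j)
    for i in range(m + 1):
        c[i][0] = i              # empty prefix of y: copy x[0..i)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1   # shared last letter
            else:
                c[i][j] = min(c[i][j - 1], c[i - 1][j]) + 1
    return c[m][n]
```

The result equals m + n minus the LCS length, since every letter of an LCS can be shared between the two sequences.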

25
(No Transcript)
26
3. Edit distance between two sequences
  • Three operations:
  • Insertion: inserting an x into abc (between a
    and b), we get axbc.
  • Deletion: deleting b from abc, we get ac.
  • Replacement: given a sequence abc, replacing a
    with x, we get xbc.

27
  • Definition: Suppose that we can use three edit
    operations (insertion, deletion, and replacement)
    to edit one sequence into another. The edit
    distance between two sequences is the minimum
    number of operations required to edit one
    sequence into the other.
  • Note: each operation is counted as 1.
  • Weighted edit distance:
  • There is a weight on each operation.
  • For example, s(a,b) = 1, s(a,_) = 1.5, s(b,a) = 1,
    s(b,_) = 1.5.
  • Where the weights come from:
  • For DNA and protein sequences, they are derived from
    statistics.

28
Alignment of sequences -- an alternative
  • An alignment of two sequences X and Y is obtained by
    inserting spaces into or at either end of X and
    Y such that the two resulting sequences X' and Y'
    are of the same length. That is, every letter in
    X' is opposite to a unique letter in Y'.
  • The alignment value is defined as
    $\sum_{i=1}^{l} s(X'_i, Y'_i)$,
  • where $l$ is the common length of X' and Y', $X'_i$ and $Y'_i$
    denote the two letters in column i of the alignment, and
    $s(X'_i, Y'_i)$ is the score (weight) of these opposing letters.
  • There are several popular scoring schemes for DNA
    and protein sequences.

29
  • Fact: The edit distance between two sequences is
    the same as the value of an optimal (minimum-value)
    alignment of the two sequences if we use the same score scheme.
  • Recursive equation:
  • $c[i,j] = \min\{\, c[i-1,j-1] + s(X_i, Y_j),\ c[i,j-1] + s(\_, Y_j),\ c[i-1,j] + s(X_i, \_) \,\}$.
  • Time and space complexity:
  • Both are O(nm), or $O(n^2)$ if both sequences have
    equal length n.
  • Why?
  • We have to compute c[i,j] (the cost) and b[i,j]
    (for back-tracking) for every cell. Each table takes $O(n^2)$.
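
The recursive equation for the edit distance can be sketched in Python as below. This is an illustration, not the presenter's code: the gap symbol `"_"`, the default `unit_score` function (0 for a match, 1 for anything else), and the function names are assumptions for this example, and the sequences themselves are assumed not to contain `"_"`.

```python
GAP = "_"


def unit_score(a, b):
    """Unit costs: 0 for a match, 1 for a replacement or a gap."""
    return 0 if a == b else 1


def edit_distance(x, y, s=unit_score):
    """Minimum total score to edit x into y under score function s."""
    n, m = len(x), len(y)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):            # delete the first i letters of x
        c[i][0] = c[i - 1][0] + s(x[i - 1], GAP)
    for j in range(1, m + 1):            # insert the first j letters of y
        c[0][j] = c[0][j - 1] + s(GAP, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c[i][j] = min(
                c[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # match/replace
                c[i][j - 1] + s(GAP, y[j - 1]),           # insertion
                c[i - 1][j] + s(x[i - 1], GAP),           # deletion
            )
    return c[n][m]
```

Passing a different `s`, for example one that charges 1.5 for a gap and 1 for a replacement, gives the weighted edit distance with the same table.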

30
Linear space algorithm
  • Hint: Computing c[n,m] needs only linear space,
    whereas back-tracking needs the O(nm) table of b[i,j]s.

31
  • To compute c[i,j], we need c[i-1,j-1], c[i,j-1], and
    c[i-1,j].
  • So, to get c[n,m], we only have to keep the dark
    cells of the table.
  • However, if we do not have all the b[i,j]s, we
    cannot get the alignment (nor the edit process,
    the subsequence, or the supersequence).
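
The observation that each cell depends only on its left, upper, and upper-left neighbours can be sketched as follows. This is a minimal example assuming unit costs (match 0, everything else 1); it keeps just the previous and the current row, so it returns the cost c[n,m] but cannot recover the alignment. The function name is hypothetical.

```python
def edit_cost_two_rows(x, y):
    """Unit-cost edit distance of x and y using O(len(y)) space."""
    prev = list(range(len(y) + 1))       # row 0: j insertions
    for i in range(1, len(x) + 1):
        cur = [i] + [0] * len(y)         # column 0: i deletions
        for j in range(1, len(y) + 1):
            cur[j] = min(
                prev[j - 1] + (x[i - 1] != y[j - 1]),  # match/replace
                cur[j - 1] + 1,                        # insertion
                prev[j] + 1,                           # deletion
            )
        prev = cur                       # the older row is discarded
    return prev[len(y)]
```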

32
  • Discussion: Each time we keep only a few b[i,j]s,
    and we re-compute the others when they are needed. In this
    way, we can get a linear-space algorithm.
    However, the time complexity increases to
    $O(n^3)$.

33
  • A better idea: find a cutting point.
  • For the problems of smaller size, we do the same
    thing until one of the segments contains one letter.
  • Key: each time, we fix the middle point (n/2) of
    X.

34
  • Example: X = abcdefgh and Y = aacdefhh.
  • Score scheme: match = 0 and mismatch = 1.
  • The alignment (left) and the same alignment split at the cutting point (right):
  • abcdefgh    abcd | efgh
  • aacdefhh    aacd | efhh
  • Cutting point: (4,4).

35
  • Finding the cutting point:
  • Let $X = x_1 x_2 x_3 \ldots x_n$ and $Y = y_1 y_2 y_3 \ldots y_m$.
  • Define $X^T = x_n x_{n-1} \ldots x_1$ and $Y^T = y_m y_{m-1} \ldots y_1$.
  • Let c[i,j] be the cost of an optimal alignment for
    X[1..i] and Y[1..j], and cc[k,l] be the cost of an
    optimal alignment for $X^T[1..k]$ and $Y^T[1..l]$.
  • for (i = 0; i <= m; i++)
  •     if (c[n/2, i] + cc[n-n/2, m-i] == c[n,m])
  •         point = i
  • We need to check two rows, c[n/2,1],
    c[n/2,2], ..., c[n/2,m] and cc[n-n/2,1],
    cc[n-n/2,2], ..., cc[n-n/2,m]. This takes O(m) space.
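
The search for the cutting point can be sketched in Python as below. This is an illustration under stated assumptions: unit costs (match 0, everything else 1), 0-based strings, and hypothetical helper names `last_row` and `cutting_point`. The row of c is computed on the first half of X, the row of cc on the reversed second half, and the first column where the two costs add up to the overall cost is returned.

```python
def last_row(x, y):
    """Last row of the unit-cost alignment table of x against y."""
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        cur = [i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(prev[j - 1] + (x[i - 1] != y[j - 1]),
                         cur[j - 1] + 1,
                         prev[j] + 1)
        prev = cur
    return prev


def cutting_point(x, y):
    """Return (n // 2, i): an optimal alignment splits y after i letters."""
    n, m = len(x), len(y)
    half = n // 2
    c_row = last_row(x[:half], y)                  # c[n/2, 0..m]
    cc_row = last_row(x[half:][::-1], y[::-1])     # cc[n - n/2, 0..m]
    total = last_row(x, y)[m]                      # c[n, m]
    for i in range(m + 1):
        if c_row[i] + cc_row[m - i] == total:
            return half, i
```

With the example sequences, `cutting_point("abcdefgh", "aacdefhh")` gives `(4, 4)`, matching the cutting point shown on the previous slide.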

36
The algorithm
  • 1. Compute c[n,m], the (n/2)-th row and the
    (n/2+1)-th row of c.
  • 2. Find the cutting point (n/2, i) as shown
    above.
  • 3. If i - n/2 = 1, then compute the alignment
    of X[1..n/2] and Y[1..i].
  • 4. If n - n/2 + 1 = 1, then compute the alignment
    of X[n/2+1..n] and Y[i+1..m].
  • 5. If i - n/2 != 1 and n - n/2 + 1 != 1, then
    recurse on steps 1-4 for the two pairs of
    sequences X[1..n/2] and Y[1..i], and
    X[n/2+1..n] and Y[i+1..m]; finally combine
    the two alignments for the two pairs of
    sequences.
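
The whole divide-and-conquer scheme can be sketched as a short recursive Python function. It is a simplified illustration, not the presenter's algorithm verbatim: it assumes unit costs, uses `"_"` for a space, stops recursing when the piece of X has at most one letter, and picks the cutting column as the one minimizing the sum of the two half costs (which is the same column set as the equality test on the slides). All names are hypothetical.

```python
def last_row(x, y):
    """Last row of the unit-cost alignment table of x against y."""
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        cur = [i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(prev[j - 1] + (x[i - 1] != y[j - 1]),
                         cur[j - 1] + 1,
                         prev[j] + 1)
        prev = cur
    return prev


def align_one(ch, y):
    """Base case: optimally align the single letter ch with y."""
    if not y:
        return ch, "_"
    k = y.find(ch)       # put ch opposite a matching letter if any,
    if k < 0:
        k = 0            # otherwise opposite the first letter of y
    return "_" * k + ch + "_" * (len(y) - k - 1), y


def align(x, y):
    """Return an optimal alignment (x', y') using linear working space."""
    if not x:
        return "_" * len(y), y
    if len(x) == 1:
        return align_one(x, y)
    half, m = len(x) // 2, len(y)
    c_row = last_row(x[:half], y)
    cc_row = last_row(x[half:][::-1], y[::-1])
    cut = min(range(m + 1), key=lambda i: c_row[i] + cc_row[m - i])
    left = align(x[:half], y[:cut])
    right = align(x[half:], y[cut:])
    return left[0] + right[0], left[1] + right[1]
```

Only two rows of length m + 1 are alive at any time, so apart from the output strings the working space stays linear.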

37
  • Time complexity analysis:
  • The first round needs T time, where T is the
    time for the normal algorithm ($O(n^2)$).
  • The 2nd round needs T/2
    ($0.5n \times i + 0.5n \times (n-i) = 0.5n^2$).
  • The 3rd round needs T/4.
  • The i-th round needs $T/2^{i-1}$.
  • Total time: $T + T(1/2 + 1/4 + 1/8 + \ldots) \le 2T = O(n^2)$.