Sequence Alignment - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Sequence Alignment


1
Sequence Alignment
  • Lecture 15 October 20, 2005
  • Algorithms in Biosequence Analysis
  • Nathan Edwards - Fall, 2005

2
Sequence Alignment
  • Global alignment of S, length n, and T, length m
  • each char of S (T) aligned with char of T (S) or
    space '-'
  • in O(nm) time, using (n+1)(m+1) space
  • (min) edit-distance and (max) similarity
    formulations
  • Dynamic Programming
  • Base conditions + recurrence relation
  • Dynamic programming table (bottom up)
  • Traceback from (n,m) to (0,0) to obtain sequence
    alignment

3
Recurrence Relation
  • Base conditions
  • V(i,0) = Σ_{k≤i} s(S(k),-), for all i = 0,...,n
  • V(0,j) = Σ_{k≤j} s(-,T(k)), for all j = 0,...,m
  • Recurrence
  • V(i,j) = max { V(i-1,j) + s(S(i),-),
    V(i,j-1) + s(-,T(j)),
    V(i-1,j-1) + s(S(i),T(j)) }
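
The base conditions and recurrence above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `global_align` and the default scores (match 2, mismatch -2, space -1, borrowed from the local alignment example later in the deck) are assumptions.

```python
def global_align(S, T, match=2, mismatch=-2, space=-1):
    """Global alignment by dynamic programming (hypothetical helper).

    Returns (score, aligned_S, aligned_T) using the similarity (max) formulation.
    """
    n, m = len(S), len(T)

    def s(a, b):
        # Substitution score for aligning two characters.
        return match if a == b else mismatch

    # V[i][j] = similarity of S[1..i] and T[1..j]
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # base condition V(i,0)
        V[i][0] = V[i - 1][0] + space
    for j in range(1, m + 1):          # base condition V(0,j)
        V[0][j] = V[0][j - 1] + space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j] + space,
                          V[i][j - 1] + space,
                          V[i - 1][j - 1] + s(S[i - 1], T[j - 1]))

    # Traceback from (n,m) to (0,0).
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + s(S[i - 1], T[j - 1]):
            a.append(S[i - 1]); b.append(T[j - 1]); i -= 1; j -= 1
        elif i > 0 and V[i][j] == V[i - 1][j] + space:
            a.append(S[i - 1]); b.append('-'); i -= 1
        else:
            a.append('-'); b.append(T[j - 1]); j -= 1
    return V[n][m], ''.join(reversed(a)), ''.join(reversed(b))
```

Calling `global_align("axabcs", "axbacs")` returns the score together with the two gapped strings; ties in the traceback are broken in favor of the diagonal move, which is an arbitrary choice.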

4
Dynamic Programming Table
5
Similar Protein Sequences (Human v. Worm)
  • 8 FAKDFLAGGVAAAISKTAVAPIERVKLLLQVQHASKQITADKQYK
    GIIDCVVRIPKEQGV 67
  • F D GG AAASKTAVAPIERVKLLLQVQ ASK I
    DKYKGID RPKEQGV
  • 12 FLIDLASGGTAAAVSKTAVAPIERVKLLLQVQDASKAIAVDKRYK
    GIMDVLIRVPKEQGV 71
  • 68 LSFWRGNLANVIRYFPTQALNFAFKDKYKQIFLGGVDKRTQFWRY
    FAGNLASGGAAGATS 127
  • WRGNLANVIRYFPTQANFAFKD YK IFL GDK
    FWFAGNLASGGAAGATS
  • 72 AALWRGNLANVIRYFPTQAMNFAFKDTYKAIFLEGLDKKKDFWKF
    FAGNLASGGAAGATS 131
  • 128 LCFVYPLDFARTRLAADVGKAGAEREFRGLGDCLVKIYKSDGIKG
    LYQGFNVSVQGIIIY 187
  • LCFVYPLDFARTRLAADGKA REFGL DCLKI KSDG
    GLYGF VSVQGIIIY
  • 132 LCFVYPLDFARTRLAADIGKAN-DREFKGLADCLIKIVKSDGPIG
    LYRGFFVSVQGIIIY 190
  • 188 RAAYFGIYDTAKGML-PDPKNTHIVISWMIAQTVTAVAGLTSYPF
    DTVRRRMMMQSGRKG 246
  • RAAYFGDTAK D W IAQ VT G
    SYPDTVRRRMMMQSGRK
  • 191 RAAYFGMFDTAKMVFASDGQKLNFFAAWGIAQVVTVGSGILSYPW
    DTVRRRMMMQSGRK- 249
  • 247 TDIMYTGTLDCWRKIARDEGGKAFFKGAWSNVLRGMGGAFVLVLY
    DEIKKY 297
  • DIY TLDC KI EG A FKGA SNV RG GGA VL
    YDEIK
  • 250 -DILYKNTLDCAKKIIQNEGMSAMFKGALSNVFRGTGGALVLAIY
    DEIQKF 299

6
Global Alignment Schematic
[Figure: DP table, rows indexed by S, columns by T, corners (0,0) and (n,m)]

7
End-space free variant
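[Figure: DP table, rows indexed by S, columns by T, corners (0,0) and (n,m)]

8
End-space free variant
[Figure: DP table, rows indexed by S, columns by T, corners (0,0) and (n,m)]

9
End-space free variant
[Figure: DP table, rows indexed by S, columns by T, corners (0,0) and (n,m)]
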
10
End-space free variant
  • Don't charge for optimal alignment starting in
    cells (i,0) or (0,j)
  • Base conditions: V(i,0) = V(0,j) = 0
  • Don't charge for adding spaces at end of
    alignment
  • Find cell (n,j) or (i,m) with maximum similarity
    value, begin traceback from there
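
A short sketch of the end-space free variant, assuming the same illustrative scores as before (match 2, mismatch -2, space -1). The function name `end_space_free_score` is hypothetical, and only the score is returned, not the traceback.

```python
def end_space_free_score(S, T, match=2, mismatch=-2, space=-1):
    """Best similarity when leading and trailing spaces are free (sketch)."""
    n, m = len(S), len(T)
    # Row 0 and column 0 stay at 0: starting in (i,0) or (0,j) is free.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if S[i - 1] == T[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j] + space,
                          V[i][j - 1] + space,
                          V[i - 1][j - 1] + sub)
    # Ending anywhere in the last row or last column is free.
    best_last_row = max(V[n])
    best_last_col = max(V[i][m] for i in range(n + 1))
    return max(best_last_row, best_last_col)
```

For example, a suffix of `"xxxabcd"` overlaps a prefix of `"abcdyyy"`, so the unmatched ends cost nothing and only the four matches are scored.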

11
Approximate Search
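[Figure: DP table, rows indexed by pattern P, columns by text T, corners (0,0) and (n,m); a substring T' of T is sought with Similarity(P, T') ≥ d]
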
12
Approximate Search
  • Don't charge for optimal alignment starting in
    cells (0,j)
  • Base conditions: V(0,j) = 0, V(i,0) = Σ_{k≤i} s(P(k),-)
  • Don't charge for ending alignment at end of P
    (but not necessarily T)
  • Find cell (n,j) with similarity value ≥ d
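
The approximate search conditions can be sketched as below. The name `approximate_match_ends` and the default scores are assumptions; the sketch keeps only two rows of the table and reports the 1-based end positions j in T whose cell (n,j) reaches the threshold d.

```python
def approximate_match_ends(P, T, d, match=2, mismatch=-2, space=-1):
    """End positions j in T where some substring ending at j has
    similarity >= d with the whole pattern P (illustrative sketch)."""
    n, m = len(P), len(T)
    prev = [0] * (m + 1)                 # V(0,j) = 0: free start anywhere in T
    for i in range(1, n + 1):
        cur = [prev[0] + space] + [0] * m    # V(i,0): P[1..i] against spaces
        for j in range(1, m + 1):
            sub = match if P[i - 1] == T[j - 1] else mismatch
            cur[j] = max(prev[j] + space,
                         cur[j - 1] + space,
                         prev[j - 1] + sub)
        prev = cur
    # prev now holds row n; every cell (n,j) with value >= d is a hit.
    return [j for j in range(m + 1) if prev[j] >= d]
```

With d equal to the maximum possible score for P, only exact occurrences are reported; lowering d admits occurrences with mismatches or spaces.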

13
Local alignment
  • In many biological contexts, two strings may only
    have regions of similarity.
  • S = pqraxabcstvq, T = xyaxbacsll
  • poor global alignment, but for α = axabcs and β =
    axbacs, there is strong similarity.

14
Local alignment problem
  • Given two sequences S, length n, and T, length m,
    find substrings α from S and β from T whose
    similarity is maximum over all pairs of
    substrings from S and T
  • For S = pqraxabcstvq, T = xyaxbacsll, the alignment
      a x a b - c s
      a x - b a c s
    has similarity 8 for match score 2, mismatch -2,
    and space -1.

15
Local alignment
  • Surprisingly, the optimal local alignment can be
    computed in O(nm) time and O(nm) space.
  • Base conditions: v(i,0) = v(0,j) = 0 for all i,j
  • Recurrence: v(i,j) = max { 0,
    v(i-1,j) + s(S(i),-),
    v(i,j-1) + s(-,T(j)),
    v(i-1,j-1) + s(S(i),T(j)) }
  • Check each cell to find max v(i,j) for all i,j.
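
The local alignment recurrence can be sketched as follows, using the scores from the example on the earlier slide (match 2, mismatch -2, space -1). The function name `local_align` is hypothetical; it returns the best score together with the two substrings recovered by tracing back until a zero cell is reached.

```python
def local_align(S, T, match=2, mismatch=-2, space=-1):
    """Local alignment sketch: returns (score, substring_of_S, substring_of_T)."""
    n, m = len(S), len(T)

    def s(a, b):
        return match if a == b else mismatch

    V = [[0] * (m + 1) for _ in range(n + 1)]   # v(i,0) = v(0,j) = 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(0,
                          V[i - 1][j] + space,
                          V[i][j - 1] + space,
                          V[i - 1][j - 1] + s(S[i - 1], T[j - 1]))
            if V[i][j] > best:                  # remember the maximum cell
                best, bi, bj = V[i][j], i, j

    # Traceback from the maximum cell until the score drops to 0.
    i, j = bi, bj
    while V[i][j] > 0:
        if V[i][j] == V[i - 1][j - 1] + s(S[i - 1], T[j - 1]):
            i, j = i - 1, j - 1
        elif V[i][j] == V[i - 1][j] + space:
            i -= 1
        else:
            j -= 1
    return best, S[i:bi], T[j:bj]
```

On the slide's strings, `local_align("pqraxabcstvq", "xyaxbacsll")` recovers the substrings axabcs and axbacs with similarity 8.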

16
Local Alignment Schematic
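[Figure: DP table, rows indexed by S, columns by T, corners (0,0) and (n,m)]

17
Local Alignment Schematic
[Figure: DP table, rows indexed by S, columns by T, corners (0,0) and (n,m)]

18
Local Alignment Schematic
[Figure: DP table, rows indexed by S, columns by T, corners (0,0) and (n,m)]
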
19
Local alignment
  • Don't charge for optimal alignment starting in
    any cell (i,j)
  • Base conditions: V(i,0) = V(0,j) = 0
  • Can re-start alignment in any cell.
  • Don't charge for ending alignment in any cell
  • Find cell (i,j) with maximum similarity value
  • Traceback from end of alignment.

20
Terminology
  • Global alignment is often called Needleman-Wunsch
    alignment
  • Local alignment is often called Smith-Waterman
    alignment

21
Gap alignment models
  • Gap: a consecutive run of spaces in a sequence
    alignment
  • Need to model block insertions and deletions
    better than linear gap model does.
  • No encouragement for long gaps to form
  • Arbitrary gap model
  • cost of gap of length g is w(g)
  • Affine gap model (open + extension cost)
  • cost of gap of length g is o + e·g

22
Gap alignment models
  • Have to keep track of whether we are opening or
    extending a gap
  • Current DP formulation doesn't cut it!
  • Consider any alignment of S[1...i] and T[1...j].
    Either
  • 1.) S(i) and T(j) are aligned with each other,
  • 2.) S(i) is aligned to T(j'), with j' < j, or
  • 3.) T(j) is aligned to S(i'), with i' < i

23
Gap alignment models
  • Let G(i,j) be maximum value of any alignment with
    S(i) aligned with T(j) (type 1)
  • Let E(i,j) be maximum value of any alignment with
    T(j) aligned with a gap (type 2)
  • Let F(i,j) be maximum value of any alignment with
    S(i) aligned with a gap (type 3)
  • Let V(i,j) = max { E(i,j), F(i,j), G(i,j) }

24
Arbitrary gap cost recurrence
  • Alignment type 1
  • G(i,j) = V(i-1,j-1) + s(S(i),T(j))
  • Alignment type 2
  • E(i,j) = max_{0≤k≤j-1} [ V(i,k) - w(j-k) ]
  • Alignment type 3
  • F(i,j) = max_{0≤k≤i-1} [ V(k,j) - w(i-k) ]
  • V(i,j) = max { E(i,j), F(i,j), G(i,j) }

25
Arbitrary gap cost recurrence
  • Base conditions
  • V(i,0) = -w(i), E(i,0) = -w(i)
  • V(0,j) = -w(j), F(0,j) = -w(j)
  • V(0,0) = G(0,0) = 0
  • Optimal value of alignment is found in cell (n,m)
  • Traceback may jump multiple cells horizontally or
    vertically
  • Running time is O(nm(n+m)), space is O(nm) as
    before.
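
A minimal sketch of the arbitrary gap cost recurrence, assuming w is passed in as a Python function giving the (non-negative) cost of a gap of length g. The name `align_arbitrary_gap` and the default match/mismatch scores are illustrative assumptions. The inner maximizations over k are what give the O(nm(n+m)) running time.

```python
def align_arbitrary_gap(S, T, w, match=2, mismatch=-2):
    """Optimal global similarity with gap cost w(g) (score only, sketch)."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = -w(i)                 # S[1..i] against one gap of length i
    for j in range(1, m + 1):
        V[0][j] = -w(j)                 # T[1..j] against one gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if S[i - 1] == T[j - 1] else mismatch
            G = V[i - 1][j - 1] + sub                          # type 1
            E = max(V[i][k] - w(j - k) for k in range(j))      # type 2
            F = max(V[k][j] - w(i - k) for k in range(i))      # type 3
            V[i][j] = max(E, F, G)
    return V[n][m]
```

With `w = lambda g: g` this reduces to the linear model with space score -1; with `w = lambda g: 3 + g` a single long gap becomes cheaper than several short ones.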

26
Affine gap model recurrence
  • Base conditions
  • V(i,0) = E(i,0) = -(o + e·i)
  • V(0,j) = F(0,j) = -(o + e·j)
  • V(0,0) = G(0,0) = 0
  • Recurrences
  • V(i,j) = max { E(i,j), F(i,j), G(i,j) }
  • G(i,j) = V(i-1,j-1) + s(S(i),T(j))
  • E(i,j) = max { E(i,j-1) - e, V(i,j-1) - o - e }
  • F(i,j) = max { F(i-1,j) - e, V(i-1,j) - o - e }
  • Running time O(nm), space O(nm)
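
A sketch of the affine gap recurrence in Python, treating o and e as non-negative costs so that a gap of length g costs o + e·g. The function name `align_affine` and the default values are assumptions. One deliberate difference from the base conditions listed above: boundary states that cannot occur (a horizontal gap ending in column 0, a vertical gap ending in row 0) are set to -infinity here, a common convention that prevents a gap from being extended without ever paying its opening cost.

```python
def align_affine(S, T, o=3, e=1, match=2, mismatch=-2):
    """Optimal global similarity under affine gap cost o + e*g (score only)."""
    NEG = float("-inf")
    n, m = len(S), len(T)
    V = [[NEG] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]   # T(j) aligned with a gap
    F = [[NEG] * (m + 1) for _ in range(n + 1)]   # S(i) aligned with a gap
    V[0][0] = 0
    for i in range(1, n + 1):
        V[i][0] = F[i][0] = -(o + e * i)
    for j in range(1, m + 1):
        V[0][j] = E[0][j] = -(o + e * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if S[i - 1] == T[j - 1] else mismatch
            G = V[i - 1][j - 1] + sub
            # Either extend an existing gap (cost e) or open a new one (o + e).
            E[i][j] = max(E[i][j - 1] - e, V[i][j - 1] - o - e)
            F[i][j] = max(F[i - 1][j] - e, V[i - 1][j] - o - e)
            V[i][j] = max(E[i][j], F[i][j], G)
    return V[n][m]
```

Each cell now takes constant time, which is where the O(nm) bound comes from. Setting `o=0` recovers the linear gap model.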

27
Linear space global alignment algorithm
  • Notice that if we only wanted the value of the
    optimal alignment, then O(m) space is sufficient
  • Only use previous row of table when computing
    current row
  • So V(n,m) in O(m) space and O(nm) time.
  • How can we recover the optimal alignment without
    giving up O(m) space?
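
The observation that only the previous row is needed can be sketched as below. The function name `global_score_linear_space` and the default scores are illustrative assumptions; the sketch returns V(n,m) only and cannot recover the alignment.

```python
def global_score_linear_space(S, T, match=2, mismatch=-2, space=-1):
    """Value of the optimal global alignment using O(m) space (sketch)."""
    m = len(T)
    prev = [j * space for j in range(m + 1)]      # row 0: V(0,j)
    for i in range(1, len(S) + 1):
        cur = [i * space] + [0] * m               # V(i,0)
        for j in range(1, m + 1):
            sub = match if S[i - 1] == T[j - 1] else mismatch
            cur[j] = max(prev[j] + space,
                         cur[j - 1] + space,
                         prev[j - 1] + sub)
        prev = cur                                # discard the older row
    return prev[m]
```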

28
Optimal global alignment in linear space
  • Define VR(i,j) to be the similarity of SR[1...i]
    and TR[1...j], where SR and TR denote S and T
    reversed
  • Run DP from bottom right corner up and to the left
  • V(n,m) = max_{0≤k≤m} [ V(n/2,k) + VR(n/2,m-k) ]
  • The optimal alignment can be broken into the
    piece for S[1...n/2] and the piece for
    S[n/2+1...n]
  • T[1...k*] aligns with the first half of S,
    while T[k*+1...m] aligns with the second half of
    S, where k* is the maximizing k.

29
Optimal global alignment in linear space
  • Compute the values in row n/2 of V in O(nm) time
    and O(m) space.
  • Compute the values in row n/2 of VR in O(nm) time
    and O(m) space.
  • Check each possible k to find the maximizing k*
    in O(m) time.
  • We know there is an optimal alignment passing
    through cell (n/2,k*).

30
Optimal global alignment in linear space
  • When computing row n/2 of V and VR, retain DP
    back-pointers.
  • Use back-pointers of V to find an optimal path
    from (n/2,k*) to (n/2-1,k1)
  • Use back-pointers of VR to find an optimal path
    from (n/2,k*) to (n/2+1,k2)
  • Recursively solve global alignment of
    S[1...n/2-1] with T[1...k1] and S[n/2+1...n] with
    T[k2...m]
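
A sketch of the divide-and-conquer idea in Python. Note that this is the simpler, commonly taught variant (usually attributed to Hirschberg) that splits directly at cell (n/2, k*) and recurses on S[1...n/2] with T[1...k*] and S[n/2+1...n] with T[k*+1...m]; it does not use the back-pointer refinement with k1 and k2 described above. The names `last_row` and `hirschberg` and the default scores are assumptions.

```python
def last_row(S, T, match, mismatch, space):
    """Last row of the global alignment table for S vs T, in O(len(T)) space."""
    prev = [j * space for j in range(len(T) + 1)]
    for a in S:
        cur = [prev[0] + space]
        for j, b in enumerate(T, 1):
            sub = match if a == b else mismatch
            cur.append(max(prev[j] + space, cur[j - 1] + space, prev[j - 1] + sub))
        prev = cur
    return prev


def hirschberg(S, T, match=2, mismatch=-2, space=-1):
    """Optimal global alignment in linear space; returns (aligned_S, aligned_T)."""
    n, m = len(S), len(T)
    if n == 0:
        return '-' * m, T
    if m == 0:
        return S, '-' * n
    if n == 1:
        # Base case: pair S[0] with its best partner in T, or with a space.
        j = max(range(m), key=lambda c: match if T[c] == S else mismatch)
        pair = match if T[j] == S else mismatch
        if pair >= 2 * space:
            return '-' * j + S + '-' * (m - j - 1), T
        return S + '-' * m, '-' + T
    mid = n // 2
    # Row n/2 of V, and row n/2 of VR computed on the reversed strings.
    fwd = last_row(S[:mid], T, match, mismatch, space)
    rev = last_row(S[mid:][::-1], T[::-1], match, mismatch, space)
    # k* maximizes V(n/2,k) + VR(n/2,m-k).
    k = max(range(m + 1), key=lambda c: fwd[c] + rev[m - c])
    a1, b1 = hirschberg(S[:mid], T[:k], match, mismatch, space)
    a2, b2 = hirschberg(S[mid:], T[k:], match, mismatch, space)
    return a1 + a2, b1 + b2
```

The string slices make copies, so the sketch favors clarity over constant factors; the working space per call is still linear in the lengths of its arguments.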

31
Optimal global alignment in linear space
[Figure: DP table showing rows n/2-1, n/2, n/2+1 and columns k1, k, k2, with the two recursive subproblems labeled A and B]

32
Optimal global alignment in linear space
  • Running time analysis
  • T(n,m) = T(n/2,k*) + T(n/2,m-k*) + O(nm)
  • Final term is time to find k*.
  • In second phase, time to find each k*
  • first subproblem O(n/2 · k*),
  • second subproblem O(n/2 · (m-k*)).
  • Total O(nm/2)
  • T(n,m) = O(nm + nm/2 + nm/4 + ....) = O(2nm)
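
The final bound can be written out as a geometric series. The constant c below is an assumed name for the hidden constant in the O(nm) term; the key step is that the two subproblems at each level together cover half the table area of their parent.

```latex
\frac{n}{2}\,k^{*} + \frac{n}{2}\,(m - k^{*}) = \frac{nm}{2}
\quad\Longrightarrow\quad
T(n,m) \;\le\; c\,nm + c\,\frac{nm}{2} + c\,\frac{nm}{4} + \cdots
\;=\; c\,nm \sum_{i \ge 0} 2^{-i} \;=\; 2c\,nm \;=\; O(nm).
```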