Sequence Alignment - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Sequence Alignment

Description:

Sequence Alignment Slides courtesy of Serafim Batzoglou, Stanford Univ. – PowerPoint PPT presentation

Number of Views:60
Avg rating:3.0/5.0
Slides: 25
Provided by: root1153
Learn more at: http://www.cbcb.umd.edu
Category:

less

Transcript and Presenter's Notes

Title: Sequence Alignment


1
Sequence Alignment
Slides courtesy of Serafim Batzoglou, Stanford
Univ.
2
Evolution at the DNA level
C
ACGGTGCAGTCACCA
ACGTTGCAGTCCACCA
SEQUENCE EDITS
REARRANGEMENTS
3
Sequence conservation implies function
100
40
  • Interleukin region in human and mouse

4
Sequence Alignment
AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGG
TCGATTTGCCCGAC
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC-
-GACCGC--GGTCGATTTGCCCGAC
Definition Given two strings x x1x2...xM, y
y1y2yN, an alignment is an assignment of
gaps to positions 0,, M in x, and 0,, N in y,
so as to line up each letter in one sequence
with either a letter, or a gap in the other
sequence
5
Scoring Function
  • Sequence edits
  • AGGCCTC
  • Mutations
  • AGGACTC
  • Insertions
  • AGGGCCTC
  • Deletions
  • AGG.CTC
  • Scoring Function
  • Match m
  • Mismatch -s
  • Gap -d
  • Score F ( matches) ? m - ( mismatches) ? s
    (gaps) ? d

6
How do we compute the best alignment?
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
Too many possible alignments O( 2MN)
AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC
7
Alignment is additive
  • Observation
  • The score of aligning x1xM
  • y1yN
  • is additive
  • Say that x1xi xi1xM
  • aligns to y1yj yj1yN
  • The two scores add up
  • F(x1M, y1N) F(x1i, y1j)
    F(xi1M, yj1N)

8
Dynamic Programming
  • We will now describe a dynamic programming
    algorithm
  • Suppose we wish to align
  • x1xM
  • y1yN
  • Let
  • F(i,j) optimal score of aligning
  • x1xi
  • y1yj

9
Dynamic Programming (contd)
  • Notice three possible cases
  • xi aligns to yj
  • x1xi-1 xi
  • y1yj-1 yj
  • 2. xi aligns to a gap
  • x1xi-1 xi
  • y1yj -
  • yj aligns to a gap
  • x1xi -
  • y1yj-1 yj

m, if xi yj F(i,j) F(i-1, j-1)
-s, if not
F(i,j) F(i-1, j) - d
F(i,j) F(i, j-1) - d
10
Dynamic Programming (contd)
  • How do we know which case is correct?
  • Inductive assumption
  • F(i, j-1), F(i-1, j), F(i-1, j-1) are optimal
  • Then,
  • F(i-1, j-1) s(xi, yj)
  • F(i, j) max F(i-1, j) d
  • F( i, j-1) d
  • Where s(xi, yj) m, if xi yj -s, if not

11
Example
  • x AGTA m 1
  • y ATA s -1
  • d -1

F(i,j) i 0 1 2 3 4
A G T A
0 -1 -2 -3 -4
A -1 1 0 -1 -2
T -2 0 0 1 0
A -3 -1 -1 0 2
Optimal Alignment F(4,3) 2 AGTA A - TA
j 0
1
2
3
12
The Needleman-Wunsch Matrix
x1 xM
Every nondecreasing path from (0,0) to (M, N)
corresponds to an alignment of the two
sequences
y1 yN
Can think of it as a divide-and-conquer algorithm
13
The Needleman-Wunsch Algorithm
  • Initialization.
  • F(0, 0) 0
  • F(0, j) - j ? d
  • F(i, 0) - i ? d
  • Main Iteration. Filling in partial alignments
  • For each i 1M
  • For each j 1N
  • F(i-1,j-1) s(xi, yj) case 1
  • F(i, j) max F(i-1, j) d case
    2
  • F(i, j-1) d case 3
  • DIAG, if case 1
  • Ptr(i,j) LEFT, if case 2
  • UP, if case 3
  • Termination. F(M, N) is the optimal score, and
  • from Ptr(M, N) can trace back optimal alignment

14
Performance
  • Time
  • O(NM)
  • Space
  • O(NM)
  • More efficient methods are available

15
A variant of the basic algorithm
  • Maybe it is OK to have an unlimited of gaps in
    the beginning and end

----------CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC GCG
AGTTCATCTATCAC--GACCGC--GGTCG--------------
  • Then, we dont want to penalize gaps in the ends

16
Different types of overlaps
17
The Overlap Detection variant
  • Changes
  • Initialization
  • For all i, j,
  • F(i, 0) 0
  • F(0, j) 0
  • Termination
  • maxi F(i, N)
  • FOPT max maxj F(M, j)

x1 xM
y1 yN
18
The local alignment problem
  • Given two strings x x1xM,
  • y y1yN
  • Find substrings x, y whose similarity is
    maximal
  • x aaaacccccgggg
  • y cccgggaaccaacc

19
Why local alignment
  • Genes are shuffled between genomes
  • Portions of proteins (domains) are often conserved

20
Cross-species genome similarity
  • 98 of genes are conserved between any two
    mammals
  • gt70 average similarity in protein sequence

hum_a GTTGACAATAGAGGGTCTGGCAGAGGCTC------------
--------- _at_ 57331/400001 mus_a
GCTGACAATAGAGGGGCTGGCAGAGGCTC---------------------
_at_ 78560/400001 rat_a GCTGACAATAGAGGGGCTGGCAGAGA
CTC--------------------- _at_ 112658/369938 fug_a
TTTGTTGATGGGGAGCGTGCATTAATTTCAGGCTATTGTTAACAGGCTCG
_at_ 36008/68174 hum_a CTGGCCGCGGTGCGGAGCGTCTGGA
GCGGAGCACGCGCTGTCAGCTGGTG _at_ 57381/400001 mus_a
CTGGCCCCGGTGCGGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTG
_at_ 78610/400001 rat_a CTGGCCCCGGTGCGGAGCGTCTGGAG
CGGAGCACGCGCTGTCAGCTGGTG _at_ 112708/369938 fug_a
TGGGCCGAGGTGTTGGATGGCCTGAGTGAAGCACGCGCTGTCAGCTGGCG
_at_ 36058/68174 hum_a AGCGCACTCTCCTTTCAGGCAGCT
CCCCGGGGAGCTGTGCGGCCACATTT _at_ 57431/400001 mus_a
AGCGCACTCG-CTTTCAGGCCGCTCCCCGGGGAGCTGAGCGGCCACATTT
_at_ 78659/400001 rat_a AGCGCACTCG-CTTTCAGGCCGCTCC
CCGGGGAGCTGCGCGGCCACATTT _at_ 112757/369938 fug_a
AGCGCTCGCG------------------------AGTCCCTGCCGTGTCC
_at_ 36084/68174 hum_a AACACCATCATCACCCCTCCCCGGC
CTCCTCAACCTCGGCCTCCTCCTCG _at_ 57481/400001 mus_a
AACACCGTCGTCA-CCCTCCCCGGCCTCCTCAACCTCGGCCTCCTCCTCG
_at_ 78708/400001 rat_a AACACCGTCGTCA-CCCTCCCCGGCC
TCCTCAACCTCGGCCTCCTCCTCG _at_ 112806/369938 fug_a
CCGAGGACCCTGA-------------------------------------
_at_ 36097/68174
atoh enhancer in human, mouse, rat, fugu fish
21
The Smith-Waterman algorithm
  • Idea Ignore badly aligning regions
  • Modifications to Needleman-Wunsch
  • Initialization F(0, j) F(i, 0) 0
  • 0
  • Iteration F(i, j) max F(i 1, j) d
  • F(i, j 1) d
  • F(i 1, j 1) s(xi, yj)

22
The Smith-Waterman algorithm
  • Termination
  • If we want the best local alignment
  • FOPT maxi,j F(i, j)
  • If we want all local alignments scoring gt t
  • For all i, j find F(i, j) gt t, and trace back

23
Affine gaps
?(n)
  • ?(n) d (n 1)?e
  • gap gap
  • open extend
  • To compute optimal alignment,
  • At position i,j, need to remember best score if
    gap is open
  • best score if gap is not open
  • F(i, j) score of alignment x1xi to y1yj
  • if xi aligns to yj
  • G(i, j) score if xi, or yj, aligns to a gap

e
d
24
Needleman-Wunsch with affine gaps
  • Initialization F(i, 0) d (i 1)?e
  • F(0, j) d (j 1)?e
  • Iteration
  • F(i 1, j 1) s(xi, yj)
  • F(i, j) max
  • G(i 1, j 1) s(xi, yj)
  • F(i 1, j) d
  • F(i, j 1) d
  • G(i, j) max
  • G(i, j 1) e
  • G(i 1, j) e
  • Termination same
Write a Comment
User Comments (0)
About PowerShow.com