Title: Introduction to Sequence Alignment
1Introduction to Sequence Alignment
3Why Align Sequences?
- Find homology within the same species
- Find clues to gene function
- Practical issues in experiments
- Find homology in other species
- Gather info for an evolutionary model
- Gene families
4The Most Visual Way of Aligning Two Sequences
5Dot Matrix Alignment
Sequences compared: CACTAGGC and AGCTAGGA
Gibbs and McIntyre (1970)
6Dot Matrix Alignment
7
- Has many variations
- Can be used to find sequence repeats
- Can find self-complementary subsequences of RNA to predict secondary structure
- Still used today
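A dot matrix is simple enough to sketch in a few lines. The block below is a minimal illustration, not the original authors' method: it marks every cell where the two bases are identical, with no window or threshold filtering. The function name `dot_matrix` and the `*` / `.` symbols are choices made here for illustration.

```python
def dot_matrix(s: str, t: str) -> list[str]:
    """Return one text row per base of s; '*' marks cells where s[i] == t[j]."""
    return ["".join("*" if a == b else "." for b in t) for a in s]


# The two sequences from the dot matrix slide.
s, t = "CACTAGGC", "AGCTAGGA"
rows = dot_matrix(s, t)

print("  " + t)                 # column labels
for base, row in zip(s, rows):  # one labelled row per base of s
    print(base + " " + row)
```

Runs of `*` along a diagonal correspond to stretches that match without gaps; here the shared stretch CTAGG appears as a diagonal of five marks.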
8Alignment using Dynamic Programming
9An Example
- GCGCATGGATTGAGCGA
- TGCGCCATTGATGACCA
- A possible alignment
- -GCGC-ATGGATTGAGCGA
- TGCGCCATTGAT-GACC-A
10Alignments
- -GCGC-ATGGATTGAGCGA
- TGCGCCATTGAT-GACC-A
- Three elements
- Perfect matches
- Mismatches
- Gaps
11Choosing Alignments
- There are many possible alignments
- For example, compare
- -GCGC-ATGGATTGAGCGA
- TGCGCCATTGAT-GACC-A
- to
- ------GCGCATGGATTGAGCGA
- TGCGCC----ATTGATGACCA--
- Which one is better?
12Scoring Rule
- Example score
- Score = (# matches) - (# mismatches) - (# gaps) x 2
13Example
- -GCGC-ATGGATTGAGCGA
- TGCGCCATTGAT-GACC-A
- Score = (1x13) + (-1x2) + (-2x4) = 3
- ------GCGCATGGATTGAGCGA
- TGCGCC----ATTGATGACCA--
- Score = (1x5) + (-1x6) + (-2x12) = -25
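The scoring rule can be applied mechanically by walking the two aligned rows column by column. This is a minimal sketch, assuming +1 per match, -1 per mismatch and -2 per gap column as in the example score; the function name `alignment_score` and its keyword parameters are illustrative.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score two aligned rows of equal length, one column at a time."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":   # a gap in either row
            score += gap
        elif x == y:               # identical bases
            score += match
        else:                      # two different bases
            score += mismatch
    return score


first = alignment_score("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A")
second = alignment_score("------GCGCATGGATTGAGCGA", "TGCGCC----ATTGATGACCA--")
print(first, second)  # the first alignment scores 3 and beats the second
```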
14Optimal Alignment
- The optimal alignment is the one with the best similarity score d, so which alignment is optimal depends on the scoring rule
15Finding the Best Alignment Score
- The additive form of the score makes it possible to use dynamic programming to find the best score efficiently
- Guaranteed to find the best alignment
16Assume that an Optimal Score Exists
- d(s,t) = optimal score for globally aligning s and t
17The Idea
- The best alignment that ends at a given pair of bases is the best among the best alignments of the sequences up to that point, plus the score for aligning the two additional bases.
18Dynamic Programming
- Consider the best alignment score of two sequences s, t up to base/residue i+1, j+1, respectively
19Dynamic Programming
- The best alignment must be in one of three cases
- 1. Last position is $(s_{i+1}, t_{j+1})$
- 2. Last position is $(-, t_{j+1})$
- 3. Last position is $(s_{i+1}, -)$
23Dynamic Programming
- Of course, we first need to handle the base cases
in the recursion
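The three cases and the base cases can be written as one recurrence. The notation is introduced here for illustration and is not taken from the slides: $d_{i,j}$ is the best score for aligning the first $i$ bases of s with the first $j$ bases of t, $\sigma(a,b)$ is the score for pairing two bases (for instance +1 for a match, -1 for a mismatch), and $g$ is the score of a single gap (for instance -2).

```latex
\begin{aligned}
d_{0,0} &= 0, \qquad d_{i,0} = i\,g, \qquad d_{0,j} = j\,g, \\[4pt]
d_{i+1,\,j+1} &= \max
\begin{cases}
d_{i,\,j} + \sigma(s_{i+1}, t_{j+1}) & \text{last position is } (s_{i+1}, t_{j+1}) \\
d_{i+1,\,j} + g & \text{last position is } (-, t_{j+1}) \\
d_{i,\,j+1} + g & \text{last position is } (s_{i+1}, -)
\end{cases}
\end{aligned}
```

The base cases say that aligning a prefix against nothing costs one gap per base; the optimal global score is $d(s,t) = d_{m,n}$ for sequences of lengths $m$ and $n$.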
24Dynamic Programming
Matrix with the bases of AGC along one axis and the bases of AAAC along the other
We fill the matrix using the recurrence rule
26Dynamic Programming
Conclusion: d(AAAC, AGC) = -1
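The matrix fill can be reproduced with a short sketch. It assumes the example scoring used earlier (+1 match, -1 mismatch, -2 per gap) and puts AAAC on the rows and AGC on the columns, which is an arbitrary orientation chosen here; the function name `global_alignment_matrix` is illustrative.

```python
def global_alignment_matrix(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Fill the (len(s)+1) x (len(t)+1) table of best global alignment scores."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    # Base cases: a prefix aligned against nothing costs one gap per base.
    for i in range(1, m + 1):
        d[i][0] = i * gap
    for j in range(1, n + 1):
        d[0][j] = j * gap
    # Recurrence: best of pairing the two bases, or a gap in either sequence.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            d[i][j] = max(d[i - 1][j - 1] + sub,  # s[i-1] paired with t[j-1]
                          d[i - 1][j] + gap,      # s[i-1] paired with a gap
                          d[i][j - 1] + gap)      # t[j-1] paired with a gap
    return d


d = global_alignment_matrix("AAAC", "AGC")
for row in d:
    print(row)
print("d(AAAC, AGC) =", d[-1][-1])  # bottom-right cell: -1
```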
27Reconstructing the Best Alignment
- AAAC
- AG-C
28More than one best alignment
- AAAC
- A-GC
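Ties in the recurrence are what produce more than one best alignment: whenever two of the three cases give the same score, the backtrace can follow either. The sketch below follows every tied branch, again assuming +1 match, -1 mismatch and -2 per gap; the name `best_alignments` is illustrative, and enumerating all ties can grow exponentially on long sequences, so this is meant for small examples only.

```python
def best_alignments(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (best score, list of all optimal global alignments as (row_s, row_t))."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * gap
    for j in range(1, n + 1):
        d[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            d[i][j] = max(d[i - 1][j - 1] + sub, d[i - 1][j] + gap, d[i][j - 1] + gap)

    results = []

    def walk(i: int, j: int, top: str, bottom: str) -> None:
        """Walk back from cell (i, j); rows are built in reverse order."""
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            sub = match if s[i - 1] == t[j - 1] else mismatch
            if d[i][j] == d[i - 1][j - 1] + sub:      # case 1: two bases paired
                walk(i - 1, j - 1, top + s[i - 1], bottom + t[j - 1])
        if i > 0 and d[i][j] == d[i - 1][j] + gap:    # case 3: base of s against a gap
            walk(i - 1, j, top + s[i - 1], bottom + "-")
        if j > 0 and d[i][j] == d[i][j - 1] + gap:    # case 2: gap against base of t
            walk(i, j - 1, top + "-", bottom + t[j - 1])

    walk(m, n, "", "")
    return d[m][n], results


score, alignments = best_alignments("AAAC", "AGC")
print(score)
for top, bottom in alignments:
    print(top, bottom)
```

Under these assumed scores the enumeration returns AAAC / AG-C and AAAC / A-GC, and also AAAC / -AGC, all with score -1.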
29Complexity
- Space: O(mn)
- Time: O(mn)
- Filling the matrix: O(mn)
- Backtrace: O(m+n), since each step moves at least one row or one column toward the origin
30Needleman and Wunsch (1970)
- A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins
- J. Mol. Biol. 48: 443-453
31Local Alignment
- We just introduced global alignment
- Now we introduce local alignment
- A local alignment between sequence s and sequence t is an alignment with maximum similarity between a substring of s and a substring of t.
32Smith and Waterman (1981)
- Identification of Common Molecular Subsequences
- J. Mol. Biol. 147: 195-197
33Best-aligned Subsequences
- Each cell takes the best score among the three cases, or starts over from 0
34
- Note the different scoring rule
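"The best score or start over" amounts to adding 0 as a fourth option in the recurrence, so a cell never carries a negative score forward. The sketch below is a minimal local alignment in that spirit; the scoring values (+1 match, -1 mismatch, -2 per gap) are assumptions chosen for illustration and are not the scoring rule used on the slides, and the name `local_alignment` is hypothetical. It reports one best-scoring pair of subsequences, not all of them.

```python
def local_alignment(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (best local score, aligned piece of s, aligned piece of t)."""
    m, n = len(s), len(t)
    h = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            h[i][j] = max(0,                       # start over
                          h[i - 1][j - 1] + sub,   # pair the two bases
                          h[i - 1][j] + gap,       # base of s against a gap
                          h[i][j - 1] + gap)       # gap against base of t
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a cell with score 0 is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# The two sequences from the dot matrix slide share the stretch CTAGG.
score, top, bottom = local_alignment("CACTAGGC", "AGCTAGGA")
print(score, top, bottom)
```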
41Best aligned subsequences