Introduction to Sequence Alignment - PowerPoint PPT Presentation

About This Presentation

Title:

Introduction to Sequence Alignment

Description:

Gibbs & McIntyre (1970) Dot Matrix Alignment. Has many variations. Can be used to find sequence repeats ... Find self-complementary subsequences of RNA to ...

Slides: 42

Provided by: sch17

Transcript and Presenter's Notes

Title: Introduction to Sequence Alignment

1
Introduction to Sequence Alignment
2
(No Transcript)
3
Why Align Sequences?

Find homology within the same species
Find clues to gene function
Practical issues in experiments
Find homology in other species
Gather info for an evolutionary model
Gene families

4
The Most Visual Way of Aligning Two Sequences
5
Dot Matrix Alignment
Sequences: CACTAGGC and AGCTAGGA
Gibbs & McIntyre (1970)
6
Dot Matrix Alignment
7

Has many variations
Can be used to find sequence repeats
Find self-complementary subsequences of RNA to
predict secondary structure
Still used today
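
A dot matrix can be sketched in a few lines. This is a minimal illustration of the basic idea only, not the published Gibbs & McIntyre procedure; the function name dot_matrix and the '*' / '.' rendering are assumptions made for this example. It marks every cell where the two characters agree, so shared runs show up as diagonals.

```python
def dot_matrix(s, t):
    """Return one text row per character of s; '*' marks s[i] == t[j]."""
    return ["".join("*" if a == b else "." for b in t) for a in s]


# The two sequences from the dot matrix slide.
s, t = "CACTAGGC", "AGCTAGGA"
rows = dot_matrix(s, t)

print("  " + t)                 # column labels
for ch, row in zip(s, rows):    # one labelled row per character of s
    print(ch + " " + row)
```

The shared run CTAGG appears as an unbroken diagonal of marks. Plotting a sequence against itself in the same way exposes repeats as off-diagonal lines.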

8
Alignment using Dynamic Programming
9
An Example

GCGCATGGATTGAGCGA
TGCGCCATTGATGACCA
A possible alignment
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A

10

Alignments

-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
Three elements
Perfect matches
Mismatches
Gaps

11
Choosing Alignments

There are many possible alignments
For example, compare
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
to
------GCGCATGGATTGAGCGA
TGCGCC----ATTGATGACCA--
Which one is better?

12
Scoring Rule

Example Score
Score = (# matches) - (# mismatches) - (# gaps) x 2

13
Example

-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
Score = (1x13) + (-1x2) + (-2x4) = 3
------GCGCATGGATTGAGCGA
TGCGCC----ATTGATGACCA--
Score = (1x5) + (-1x6) + (-2x11) = -23
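
The scoring rule can be applied mechanically to any gapped alignment. The sketch below assumes the values used on the slides (match +1, mismatch -1, gap -2) and that a gap is written as '-'; the function name alignment_score is hypothetical.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score a gapped pairwise alignment column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # a gap in either row
        elif a == b:
            score += match      # perfect match
        else:
            score += mismatch   # two different bases
    return score


first = alignment_score("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A")
print(first)  # 3
```

Because each column contributes independently, the total is simply a sum over columns.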

14
Optimal Alignment

The optimal alignment is the one with the best similarity
score d, so it is determined by the scoring rule

15
Finding the Best Alignment Score

Because the score is additive over alignment columns,
dynamic programming can find the best score
efficiently
It is guaranteed to find the best alignment

16
Assume that an Optimal Score Exists

d(s,t) = optimal score for globally aligning s
and t

17
The Idea

The best alignment that ends at a given pair of
bases is the best among the best alignments of the
sequences up to that point, plus the score for
aligning the two additional bases.

18
Dynamic Programming

Consider the best alignment score of two
sequences s, t at base/residue i+1, j+1,
respectively

19
Dynamic Programming

The best alignment must fall into one of three cases:
1. Last position is (s_{i+1}, t_{j+1})
2. Last position is (-, t_{j+1})
3. Last position is (s_{i+1}, -)
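
The three cases translate into a recurrence. The transcript does not preserve the slide formulas, so the notation here is an assumption: D(i, j) stands for the best score of aligning the first i bases of s with the first j bases of t, and sigma is the per-column match/mismatch score. The values +1, -1 and -2 come from the scoring rule given earlier.

```latex
D(i+1,\, j+1) = \max
\begin{cases}
D(i,\, j) + \sigma(s_{i+1}, t_{j+1}) & \text{case 1: last position is } (s_{i+1}, t_{j+1}) \\
D(i+1,\, j) - 2 & \text{case 2: last position is } (-, t_{j+1}) \\
D(i,\, j+1) - 2 & \text{case 3: last position is } (s_{i+1}, -)
\end{cases}
\qquad
\sigma(a, b) =
\begin{cases}
+1 & a = b \\
-1 & a \neq b
\end{cases}
```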

22
Dynamic Programming
23
Dynamic Programming

Of course, we first need to handle the base cases
in the recursion

24
Dynamic Programming
Sequences: AGC and AAAC (matrix axis labels)
We fill the matrix using the recurrence rule
25
Dynamic Programming
26
Dynamic Programming
Conclusion: d(AAAC, AGC) = -1
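
The matrix fill for this example can be reproduced with a short sketch. It assumes the scoring values from the earlier slides (match +1, mismatch -1, gap -2) and the usual base cases in which a prefix aligned against nothing costs one gap per base; the function name global_alignment_matrix is hypothetical.

```python
def global_alignment_matrix(s, t, match=1, mismatch=-1, gap=-2):
    """Fill the global alignment score matrix for s (rows) and t (columns)."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    # Base cases: a prefix aligned against the empty sequence is all gaps.
    for i in range(1, m + 1):
        d[i][0] = i * gap
    for j in range(1, n + 1):
        d[0][j] = j * gap
    # Recurrence: best of the three possible last columns.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            d[i][j] = max(
                d[i - 1][j - 1] + pair,  # s[i-1] aligned with t[j-1]
                d[i - 1][j] + gap,       # s[i-1] aligned with a gap
                d[i][j - 1] + gap,       # t[j-1] aligned with a gap
            )
    return d


d = global_alignment_matrix("AAAC", "AGC")
for row in d:
    print(row)
print(d[-1][-1])  # -1
```

The bottom-right cell holds the optimal global score, matching the conclusion on the slide.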
27
Reconstructing the Best Alignment
AAAC
AG-C
28
More than one best alignment
AAAC
A-GC
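
Ties in the recurrence are what produce several best alignments. The sketch below refills the matrix and then walks back from the bottom-right cell, following every predecessor that explains the cell's value. It assumes the same scoring values as before (match +1, mismatch -1, gap -2); the function name all_best_alignments is hypothetical, and the enumeration is only practical for short sequences because the number of co-optimal alignments can grow quickly.

```python
def all_best_alignments(s, t, match=1, mismatch=-1, gap=-2):
    """Return the optimal global score and every alignment that reaches it."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * gap
    for j in range(1, n + 1):
        d[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            d[i][j] = max(d[i - 1][j - 1] + pair,
                          d[i - 1][j] + gap,
                          d[i][j - 1] + gap)

    def walk(i, j):
        """Yield (row_s, row_t) for every optimal path from (0, 0) to (i, j)."""
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0:
            pair = match if s[i - 1] == t[j - 1] else mismatch
            if d[i][j] == d[i - 1][j - 1] + pair:
                for a, b in walk(i - 1, j - 1):
                    yield a + s[i - 1], b + t[j - 1]
        if i > 0 and d[i][j] == d[i - 1][j] + gap:
            for a, b in walk(i - 1, j):
                yield a + s[i - 1], b + "-"
        if j > 0 and d[i][j] == d[i][j - 1] + gap:
            for a, b in walk(i, j - 1):
                yield a + "-", b + t[j - 1]

    return d[m][n], sorted(walk(m, n))


score, alignments = all_best_alignments("AAAC", "AGC")
print(score)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

For AAAC and AGC this reports both alignments shown on the slides. Under these scoring values it also reports a third with the same score, AAAC over -AGC.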
29
Complexity

For sequences of lengths m and n:
Space: O(mn)
Time: O(mn)
Filling the matrix: O(mn)
Backtrace: O(m + n)

30
Needleman & Wunsch (1970)

A General Method Applicable to the Search for
Similarities in the Amino Acid Sequence of Two
Proteins
J. Mol. Biol. 48: 443-453

31
Local Alignment

We have just introduced global alignment
We now introduce local alignment
A local alignment between sequence s and sequence
t is an alignment with maximum similarity between
a substring of s and a substring of t.
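
The transcript stops before showing how a local alignment is computed. As a hedged sketch, the standard Smith-Waterman recurrence differs from the global one in two ways: every cell is floored at 0, so an alignment may start anywhere, and the answer is the largest cell rather than the bottom-right one. The scoring values (match +1, mismatch -1, gap -2) are carried over from the global example as an assumption, and the function name local_alignment_score is hypothetical.

```python
def local_alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score and the cell where it ends."""
    m, n = len(s), len(t)
    h = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay 0
    best, best_end = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            h[i][j] = max(
                0,                       # start a fresh local alignment here
                h[i - 1][j - 1] + pair,  # extend with an aligned pair
                h[i - 1][j] + gap,       # extend with a gap in t
                h[i][j - 1] + gap,       # extend with a gap in s
            )
            if h[i][j] > best:
                best, best_end = h[i][j], (i, j)
    return best, best_end


best, end = local_alignment_score("CACTAGGC", "AGCTAGGA")
print(best, end)  # 5 (7, 7)
```

On the dot matrix example sequences, the best local alignment is the shared substring CTAGG, which ends at position 7 in both sequences.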

32
Smith and Waterman (1981)