Sequence analysis of nucleic acids and proteins: part 1

1
Sequence analysis of nucleic acids and proteins: part 1
Similarity search
Lecture by Terry Speed
  • Based on Chapter 3 of Post-genome Bioinformatics by Minoru Kanehisa,
    Oxford University Press, 2000

2
Pairwise sequence alignment by the dynamic
programming algorithm. The algorithm involves
finding the optimal path in the path matrix (a),
which is equivalent to searching for the optimal
solution in the search tree (b).
  • (a) Path Matrix (b) Search Tree

[Figure: (a) path matrix with the sequences AIMS and AMOS along its axes; (b) search tree, with pruning by an optimization function.]
Alignment:
  AIM-S
  A-MOS
3
Methods for computing the optimal score in the
dynamic programming algorithm: (a) the gap
penalty is a constant; (b) the gap
penalty is a linear function of the gap length.
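[Figure, panel (a): cell D_{i,j} is reached from D_{i-1,j-1} along an edge labelled w_{s(i),t(j)} and from its two other neighbours along edges labelled d. Panel (b): cell D_{i,j} is drawn with three auxiliary values D_{i,j}^{(1)}, D_{i,j}^{(2)}, D_{i,j}^{(3)} and edges labelled w_{s(i),t(j)} and b.]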
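Case (a) can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: it assumes scores are maximized, that the constant gap penalty d is passed as a negative number (`gap`), and that the substitution weight w is a simple match/mismatch score. The function name and the default values are hypothetical.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of s and t with a constant gap penalty.

    Assumed scoring (illustrative only): `match`/`mismatch` play the role
    of w_{s(i),t(j)} and `gap` plays the role of d.
    """
    rows, cols = len(s) + 1, len(t) + 1
    D = [[0] * cols for _ in range(rows)]
    # Boundary cells: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        D[i][0] = D[i - 1][0] + gap
    for j in range(1, cols):
        D[0][j] = D[0][j - 1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            w = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(
                D[i - 1][j - 1] + w,  # align s[i-1] with t[j-1]
                D[i - 1][j] + gap,    # gap in t
                D[i][j - 1] + gap,    # gap in s
            )
    return D[-1][-1]


score = global_alignment_score("AIMS", "AMOS")
```

With these assumed values, the alignment AIM-S / A-MOS (three matches, two gaps) is one that attains the best score for AIMS and AMOS.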
4
Concepts of global and local optimality in
pairwise sequence alignment. The distinction
lies in how the initial values are assigned to
the path matrix.
(a) Global vs. Global
(b) Local vs. Global
(c) Local vs. Local
[Figure: path matrices for the three cases. Panels (b) and (c) show initial values of 0 along the matrix boundary.]
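For the local vs. local case, a minimal sketch is given below. It uses the standard Smith-Waterman formulation, which is an assumption here rather than something the slide spells out: the boundary is initialized to 0, every cell is additionally floored at 0, and the best score is taken anywhere in the matrix. The function name and scoring values are hypothetical.

```python
def local_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman style, constant gap penalty)."""
    rows, cols = len(s) + 1, len(t) + 1
    # Zero initial values: a local alignment may start anywhere at no cost.
    D = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            w = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(
                0,                    # restart the alignment at this cell
                D[i - 1][j - 1] + w,
                D[i - 1][j] + gap,
                D[i][j - 1] + gap,
            )
            best = max(best, D[i][j])  # the alignment may also end anywhere
    return best
```

A global vs. global variant differs in the boundary (cumulative gap penalties instead of zeros), drops the floor at 0, and reads the score from the last cell.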
5
Dynamic programming to find edit distances
  • Edit operations: M, R, I, D (match, replace, insert, delete)
  • Edit transcript: a string over the alphabet {M, R, I, D} that
    describes a transformation of one string into another.
    Example:
      R D I M D M
      M A - T H S
      A - R T - S
  • Edit (Levens(h)tein) distance: the minimum number of
    edit operations necessary to transform one
    string into another. (Note: matches are not counted.)
    Example:
      R D I M D M
      1 1 1 0 1 0   (total 4)
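A small sketch of how an edit transcript can be checked and costed. The helper names are hypothetical; the convention assumed is the one on the slide: D consumes a character of the first string, I consumes a character of the second, M and R consume one of each.

```python
def transcript_cost(transcript):
    """Number of edit operations, not counting matches (M)."""
    return sum(1 for op in transcript if op != "M")


def apply_transcript(transcript, source, target):
    """Return the two gapped rows described by the transcript.

    Raises ValueError if the transcript is inconsistent with the strings.
    """
    i = j = 0
    top, bottom = [], []
    for op in transcript:
        if op in ("M", "R"):
            a, b = source[i], target[j]
            if (op == "M") != (a == b):
                raise ValueError(f"operation {op} does not fit {a}/{b}")
            top.append(a)
            bottom.append(b)
            i += 1
            j += 1
        elif op == "D":
            top.append(source[i])
            bottom.append("-")
            i += 1
        elif op == "I":
            top.append("-")
            bottom.append(target[j])
            j += 1
        else:
            raise ValueError(f"unknown operation {op}")
    if i != len(source) or j != len(target):
        raise ValueError("transcript does not consume both strings")
    return "".join(top), "".join(bottom)


rows = apply_transcript("RDIMDM", "MATHS", "ARTS")
cost = transcript_cost("RDIMDM")
```

The example transcript R D I M D M costs 4, which is valid but not minimal: the tabulation slides arrive at an edit distance of 3 for the same pair of strings.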
6
The recurrence
  • Stage: position in the edit transcript
  • State: I, D, M, or R
  • Optimal value function: D(i, j), where D(i, j) is the edit distance of
    Seq1[1...i] and Seq2[1...j]
  • Recurrence relation:

    \[
    D(i, j) = \min
    \begin{cases}
      1 + D(i-1, j) \\
      1 + D(i, j-1) \\
      t(i, j) + D(i-1, j-1)
    \end{cases}
    \]

    where t(i, j) = 0 if Seq1(i) = Seq2(j), and 1 otherwise.
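The recurrence can be turned into a table-filling routine along the following lines. This is a sketch under the usual boundary assumption D(i, 0) = i and D(0, j) = j, which matches the first row and column of the tabulation slides; the function name is hypothetical.

```python
def edit_distance_table(seq1, seq2):
    """Fill the table D(i, j) of edit distances between prefixes of seq1 and seq2."""
    n, m = len(seq1), len(seq2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary: turning a prefix into the empty string takes one operation per character.
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t = 0 if seq1[i - 1] == seq2[j - 1] else 1
            D[i][j] = min(
                1 + D[i - 1][j],      # delete seq1[i-1]
                1 + D[i][j - 1],      # insert seq2[j-1]
                t + D[i - 1][j - 1],  # match (t = 0) or replace (t = 1)
            )
    return D


table = edit_distance_table("MATHS", "ARTS")
distance = table[-1][-1]
```

For Seq1 = MATHS and Seq2 = ARTS the last cell holds the edit distance, and the rows of `table` correspond to the rows of the tabulation.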
7
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0
M        1
A        2
T        3
H        4
S        5
8
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0
M        1
A        2
T        3
H        4
S        5
9
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1
M        1
A        2
T        3
H        4
S        5
10
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2
M        1
A        2
T        3
H        4
S        5
11
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2  3  4
M        1      1
A        2      2
T        3      3
H        4      4
S        5      5
12
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2  3  4
M        1      1  1
A        2      2
T        3      3
H        4      4
S        5      5
13
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2  3  4
M        1      1  1  2
A        2      2
T        3      3
H        4      4
S        5      5
14
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2  3  4
M        1      1  1  2  3  4
A        2      2  1  2  3  4
T        3      3
H        4      4
S        5      5
15
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2  3  4
M        1      1  1  2  3  4
A        2      2  1  2  3  4
T        3      3  2  2  2  3
H        4      4
S        5      5
16
The tabulation, D(i, j)
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2  3  4
M        1      1  1  2  3  4
A        2      2  1  2  3  4
T        3      3  2  2  2  3
H        4      4  3  3  3  3
S        5      5  4  4  4  3
17
The traceback
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2  3  4
M        1      1  1  2  3  4
A        2      2  1  2  3  4
T        3      3  2  2  2  3
H        4      4  3  3  3  3
S        5      5  4  4  4  3
18
The solutions - 1
Cost:        1  0  1  1  0   (total 3)
Transcript:  D  M  R  R  M
Seq1:        M  A  T  H  S
Seq2:        -  A  R  T  S
19
The traceback
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2  3  4
M        1      1  1  2  3  4
A        2      2  1  2  3  4
T        3      3  2  2  2  3
H        4      4  3  3  3  3
S        5      5  4  4  4  3
20
The solutions - 2
Cost:        1  0  1  0  1  0   (total 3)
Transcript:  D  M  I  M  D  M
Seq1:        M  A  -  T  H  S
Seq2:        -  A  R  T  -  S
21
The traceback
Seq2(j)            A  R  T  S
Seq1(i)  i \ j  0  1  2  3  4
         0      0  1  2  3  4
M        1      1  1  2  3  4
A        2      2  1  2  3  4
T        3      3  2  2  2  3
H        4      4  3  3  3  3
S        5      5  4  4  4  3
22
The solutions - 3
Cost:        1  1  0  1  0   (total 3)
Transcript:  R  R  M  D  M
Seq1:        M  A  T  H  S
Seq2:        A  R  T  -  S
Life must be lived forwards and understood
backwards. - Søren Kierkegaard
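The traceback can be sketched as a recursive walk from the last cell back to D(0, 0), following every predecessor that explains the value of the current cell. The code below is one possible way to enumerate all optimal edit transcripts; the function name is hypothetical, and the table is rebuilt inside the function so the example is self-contained.

```python
def optimal_transcripts(seq1, seq2):
    """Enumerate every minimum-cost edit transcript turning seq1 into seq2."""
    n, m = len(seq1), len(seq2)
    # Forward pass: fill D(i, j) with the edit-distance recurrence.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t = 0 if seq1[i - 1] == seq2[j - 1] else 1
            D[i][j] = min(1 + D[i - 1][j], 1 + D[i][j - 1], t + D[i - 1][j - 1])

    results = []

    def walk(i, j, suffix):
        # Backward pass: follow each neighbour that could have produced D[i][j].
        if i == 0 and j == 0:
            results.append(suffix)
            return
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            walk(i - 1, j, "D" + suffix)
        if j > 0 and D[i][j] == D[i][j - 1] + 1:
            walk(i, j - 1, "I" + suffix)
        if i > 0 and j > 0:
            t = 0 if seq1[i - 1] == seq2[j - 1] else 1
            if D[i][j] == D[i - 1][j - 1] + t:
                walk(i - 1, j - 1, ("M" if t == 0 else "R") + suffix)

    walk(n, m, "")
    return results


transcripts = sorted(optimal_transcripts("MATHS", "ARTS"))
```

For MATHS and ARTS this walk finds the same three transcripts as the solution slides: DMRRM, DMIMDM and RRMDM. The number of optimal transcripts can grow quickly for longer strings, so full enumeration is mainly useful for small examples like this one.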
23
BLOSUM62 SCORING MATRIX
134 LQQGELDLVMTSDILPRSELHYSPMFDFEVRLVLAPDHPLASKTQITPEDLASETLLI
137 LDSNSVDLVLMGVPPRNVEVEAEAFMDNPLVVIAPPDHPLAGERAISLARLAEETFVM
DD: 6
DR: -2
From Henikoff 1996
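A scoring-matrix lookup can be sketched as below. The dictionary holds only the two BLOSUM62 entries quoted on the slide (D/D = 6, D/R = -2); a real matrix covers all 20 x 20 residue pairs and would be loaded from a data file or a library. The names `SLIDE_SCORES`, `pair_score` and `ungapped_score` are hypothetical.

```python
# Only the two entries shown on the slide; NOT a complete BLOSUM62 matrix.
SLIDE_SCORES = {("D", "D"): 6, ("D", "R"): -2}


def pair_score(a, b, scores=SLIDE_SCORES):
    """Look up a symmetric substitution score for residues a and b."""
    if (a, b) in scores:
        return scores[(a, b)]
    if (b, a) in scores:
        return scores[(b, a)]
    raise KeyError(f"no score available for pair {a}/{b}")


def ungapped_score(row1, row2, scores=SLIDE_SCORES):
    """Sum the substitution scores column by column for two equal-length rows."""
    if len(row1) != len(row2):
        raise ValueError("rows must have the same length")
    return sum(pair_score(a, b, scores) for a, b in zip(row1, row2))


total = ungapped_score("DD", "DR")
```

Scoring the two 58-residue fragments on the slide in this way would need the full matrix, which is deliberately not reproduced here.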
24
Scoring Matrices
  • Physical/Chemical similarities
      • comparing two sequences according to the
        properties of their residues may highlight
        regions of structural similarity
  • Identity matrices
      • by stressing only identities in the alignment,
        stretches of sequence that may have diverged will
        not penalise any remaining common features
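An identity matrix is the simplest scoring matrix to write down. The sketch below assumes a score of 1 for identical residues and 0 otherwise, and it skips gap columns rather than penalizing them; both choices, and the function names, are illustrative assumptions.

```python
def identity_matrix(alphabet):
    """Score 1 for identical residues and 0 for any other pair."""
    return {(a, b): (1 if a == b else 0) for a in alphabet for b in alphabet}


def score_columns(row1, row2, matrix, gap_char="-"):
    """Sum matrix scores over aligned columns, ignoring columns with a gap."""
    total = 0
    for a, b in zip(row1, row2):
        if a == gap_char or b == gap_char:
            continue  # gap penalties are outside the scope of this sketch
        total += matrix[(a, b)]
    return total


matrix = identity_matrix("AIMOS")
gapped = score_columns("AIM-S", "A-MOS", matrix)
ungapped = score_columns("AIMS", "AMOS", matrix)
```

Because mismatches score 0 rather than a negative value, diverged stretches do not reduce the score contributed by the identical positions.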

25
Scoring Matrices (ctd)
  • As the direct source of residue-by-residue
    comparison scores, the scoring matrix you choose
    will have a major impact on the alignment
    calculated
  • The most commonly used will be one of the
    mutation matrices
      • PAM, BLOSUM
  • The matrix that performs best will be the matrix
    that reflects the evolutionary separation of the
    sequences being aligned