Alignment methods - PowerPoint PPT Presentation

Title: Biology and computers Author: jmomand Last modified by: Cal State L.A. Created Date: 3/26/2001 11:44:52 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 23
Provided by: jmomand

1
Alignment methods
  • April 21, 2009
  • Quiz 1-April 23 (JAM lectures through today)
  • Writing assignment topic due Tues, April 23
  • Hand in homework 3
  • Why has HbS stayed in the population?
  • Learning objectives: Understand the difference
    between global alignment and local alignment.
    Understand the Needleman-Wunsch algorithm.
    Understand the Smith-Waterman algorithm in global
    alignment mode.
  • Workshop: Perform alignment of two nucleotide
    sequences
  • Homework 4 due Tues, April 23

2
Evolutionary Basis of Sequence Alignment
Why are there regions of identity when comparing
protein sequences?
1) Conserved function: amino acid residues
participate in the reaction.
2) Structural: for example, conserved cysteine
residues that form a disulfide linkage.
3) Historical: residues that are conserved solely
due to a common ancestor gene.
3
Identity Matrix
         A  C  I  L
    A    1
    C    0  1
    I    0  0  1
    L    0  0  0  1
Simplest type of scoring matrix
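A minimal sketch of how such an identity matrix could be built in code. The function name identity_matrix and the dictionary layout are illustrative assumptions, not something taken from the slides.

```python
def identity_matrix(alphabet):
    """Map every ordered residue pair to 1 if identical, otherwise 0."""
    return {(x, y): int(x == y) for x in alphabet for y in alphabet}


# The four residues shown on the slide: Ala, Cys, Ile, Leu.
scores = identity_matrix("ACIL")
print(scores[("I", "I")])  # identical residues
print(scores[("I", "L")])  # similar but non-identical residues
```

Under this scheme isoleucine versus leucine scores the same as any other mismatch.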
4
Similarity
It is easy to score if an amino acid is identical
to another (the score is 1 if identical and 0 if
not). However, it is not easy to give a score
for amino acids that are somewhat similar.
[Figure: structures of isoleucine and leucine,
each labeled with CO2- and NH3 groups]
Should they get a 0 (non-identical), a 1
(identical), or something in between?
5
One is mouse trypsin and the other is crayfish
trypsin. They are homologous proteins. The
sequences share 41% identity.
6
(No Transcript)
7
Evolutionary Basis of Sequence Alignment (Cont. 2)
Note: it is possible for two proteins to share a
high degree of similarity but have two different
functions. For example, human gamma-crystallin
is a lens protein that has no known enzymatic
activity. It shares a high percentage of
identity with E. coli quinone oxidoreductase.
These proteins likely had a common ancestor, but
their functions diverged.
This is analogous to a railroad car and a diner:
both have the same form but different functions.
8
Global Alignment Method
For example, the two hypothetical sequences
    abcdefghajklm
    abbdhijk
could be aligned like this:
    abcdefghajklm
    abbd...hijk
As shown, there are 6 matches, 2 mismatches, and
one gap of length 3.
9
Global Alignment Method Scored
The alignment is scored according to a payoff
matrix:
    payoff = { match      => match,
               mismatch   => mismatch,
               gap_open   => gap_open,
               gap_extend => gap_extend }
For the algorithm to operate correctly, match
must be positive and the other payoff entries
must be negative.
10
Global Alignment Method (cont. 3)
Example: given the payoff matrix
    payoff = { match      => 4,
               mismatch   => -3,
               gap_open   => -2,
               gap_extend => -1 }
11
Global Alignment Method (cont. 4)
The sequences abcdefghajklm and abbdhijk are
aligned and scored like this:

                   a  b  c  d  e  f  g  h  a  j  k  l  m
                   a  b  b  d  .  .  .  h  i  j  k
    match          4  4     4           4     4  4
    mismatch             -3                -3
    gap_open                   -2
    gap_extend                 -1 -1 -1

for a total score of 24 - 6 - 2 - 3 = 13.
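The arithmetic above can be checked with a short sketch. The function name score_alignment is hypothetical, and two assumptions are made to match the slide: every gap position costs gap_extend with gap_open added once per gap, and the unaligned overhang at the end of the longer sequence (l, m) is not scored, so it is trimmed before scoring.

```python
def score_alignment(row1, row2, payoff, gap_char="."):
    """Score two aligned rows of equal length under a payoff dictionary."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    in_gap = False
    for x, y in zip(row1, row2):
        if x == gap_char or y == gap_char:
            if not in_gap:
                total += payoff["gap_open"]   # charged once per gap
                in_gap = True
            total += payoff["gap_extend"]     # charged for every gap position
        else:
            in_gap = False
            total += payoff["match"] if x == y else payoff["mismatch"]
    return total


payoff = {"match": 4, "mismatch": -3, "gap_open": -2, "gap_extend": -1}
# Trailing "lm" of the first sequence is left out (assumed unscored overhang).
print(score_alignment("abcdefghajk", "abbd...hijk", payoff))  # 13
```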
12
Global Alignment Method (cont. 5)
The algorithm should guarantee that no
other alignment of these two sequences has
a higher score under this payoff matrix.
13
Let's align the following with a simple payoff
matrix:
    ABCNJRQCLCRPM and AJCJNRCKCRBP
where match = 1, mismatch = 0, gap = 0, and
gap extension = 0.

Alignment A
    Sequence 1   ABCNJ-RQCLCR-PM
    Sequence 2   AJC-JNR-CKCRBP-
    Score        101010101011010
    Total score  8

Alignment B
    Sequence 1   ABC-NJRQCLCR-PM
    Sequence 2   AJCJN-R-CKCRBP-
    Score        101010101011010
    Total score  8
14
Three steps in Dynamic Programming
1. Initialization 2. Matrix fill or scoring 3.
Traceback and alignment
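The three steps can be sketched as follows. This is an illustrative global-mode implementation, not the lecture's own code: the name global_align is hypothetical, gaps use a single linear score, and the matrix is filled from the top-left corner, whereas the matrix-fill slides work upward from the bottom rows. The matrix is stored row-major, so M[j][i] in the code is row j (sequence 2) and column i (sequence 1).

```python
def global_align(seq1, seq2, match=1, mismatch=0, gap=0):
    """Return (score, aligned_seq1, aligned_seq2) for a global alignment."""
    cols, rows = len(seq1) + 1, len(seq2) + 1

    # 1. Initialization: first row and column hold cumulative gap scores
    #    (all zeros when gap == 0, as in the simple payoff matrix).
    M = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        M[0][i] = i * gap
    for j in range(1, rows):
        M[j][0] = j * gap

    # 2. Matrix fill: each cell takes the best of diagonal, up, and left.
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[j][i] = max(M[j - 1][i - 1] + s,  # match or mismatch
                          M[j - 1][i] + gap,    # gap in sequence 1
                          M[j][i - 1] + gap)    # gap in sequence 2

    # 3. Traceback: walk from the bottom-right corner back to the origin.
    out1, out2 = [], []
    i, j = cols - 1, rows - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if M[j][i] == M[j - 1][i - 1] + s:
                out1.append(seq1[i - 1])
                out2.append(seq2[j - 1])
                i, j = i - 1, j - 1
                continue
        if j > 0 and M[j][i] == M[j - 1][i] + gap:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
        else:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
    return M[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))


score, row1, row2 = global_align("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(score)
print(row1)
print(row2)
```

Several alignments tie at the best score, so the rows printed here depend on the order in which the traceback tries diagonal, up, and left moves.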
15
Initialization step
16
Matrix Fill (bottom two rows)
17
Matrix Fill (bottom three rows)
18
Matrix Fill (entire matrix)
    Sequence 1   ABC-NJRQCLCR-PM
    Sequence 2   AJCJN-R-CKCRBP-
    Score        101010101011010
    Total score  8

    Sequence 1   ABCNJ-RQCLCR-PM
    Sequence 2   AJC-JNR-CKCRBP-
    Score        101010101011010
    Total score  8
19
Smith-Waterman algorithm
$$
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + s_{i,j} & \text{(match or mismatch in the diagonal)} \\
M_{i,j-1} + w & \text{(gap in sequence 1)} \\
M_{i-1,j} + w & \text{(gap in sequence 2)} \\
0
\end{cases}
$$
Where M_{i-1,j-1} is the value in the cell
diagonally juxtaposed to M_{i,j}. (The i-1, j-1
cell is up and to the left of M_{i,j}.)
Where s_{i,j} is the value for the match or
mismatch in the i,j cell.
Where M_{i,j-1} is the value in the cell above
M_{i,j}.
Where w is the value for the gap penalty.
Where M_{i-1,j} is the value in the cell to the
left of M_{i,j}.
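A minimal sketch of this recurrence with the 0 term included, returning only the best local score. The function name and the default values for match, mismatch, and w are assumptions chosen for illustration. The matrix is stored row-major, so M[j][i] in the code corresponds to M_{i,j} above.

```python
def smith_waterman_score(seq1, seq2, match=1, mismatch=-1, w=-1):
    """Best local alignment score; w is the (negative) gap penalty."""
    cols, rows = len(seq1) + 1, len(seq2) + 1
    M = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best = 0
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[j][i] = max(M[j - 1][i - 1] + s,  # diagonal: match or mismatch
                          M[j - 1][i] + w,      # cell above: gap in sequence 1
                          M[j][i - 1] + w,      # cell to the left: gap in sequence 2
                          0)                    # floor that makes the alignment local
            best = max(best, M[j][i])
    return best


# Only the shared core "ABC" contributes; the flanking letters never match.
print(smith_waterman_score("XXABCXX", "YYABCYY"))  # 3
```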
20
Initialization step
Create a matrix with M + 1 columns and N + 1 rows,
where M = number of letters in sequence 1 and
N = number of letters in sequence 2.
First column (M-1) and first row (N-1) will
be filled with 0s.
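As a small illustration of the dimensions involved, the sketch below builds the zero-filled matrix for the two example sequences used earlier. The name init_matrix is hypothetical, and a list of rows is only one possible layout.

```python
def init_matrix(seq1, seq2):
    """Zero-filled matrix with len(seq1) + 1 columns and len(seq2) + 1 rows."""
    m, n = len(seq1), len(seq2)
    return [[0] * (m + 1) for _ in range(n + 1)]


matrix = init_matrix("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(len(matrix[0]), "columns")  # M + 1 = 14
print(len(matrix), "rows")        # N + 1 = 13
```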
21
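Matrix fill step
Each position M_{i,j} is defined to be the
MAXIMUM score at position i,j:
$$
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + s_{i,j} & \text{(match or mismatch in the diagonal)} \\
M_{i,j-1} + w & \text{(gap in sequence 1)} \\
M_{i-1,j} + w & \text{(gap in sequence 2)}
\end{cases}
$$
[Figure: scoring matrix with row and column labels]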
22
    Sequence 1   ABCNJ-RQCLCR-PM
    Sequence 2   AJC-JNR-CKCRBP-
    Score        8