Title: A Perl Program for Sequence Alignment
2 Sequence Alignment
- The different steps of dynamic programming
  - Build the DP matrix
  - Trace back
  - Output the alignment
- Parameters
  - The scoring matrix
  - The gap penalty
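As a rough illustration of the two parameters listed above, the sketch below sets up a scoring matrix and an affine gap penalty in Perl. The 4-letter alphabet, the score values, the penalty values and the helper gap_cost are all assumptions made for this example; only the names %position, @Score, $Gop and $Gext follow the slides.

```perl
use strict;
use warnings;

# Hypothetical 4-letter alphabet; a real run would list all 20 amino acids.
my @alphabet = qw(A C D E);

# %position maps each residue to its row/column index in @Score.
my %position;
$position{ $alphabet[$_] } = $_ for 0 .. $#alphabet;

# Toy symmetric scoring matrix (made-up values): +2 match, -1 mismatch.
my @Score;
for my $r (0 .. $#alphabet) {
    for my $c (0 .. $#alphabet) {
        $Score[$r][$c] = ($r == $c) ? 2 : -1;
    }
}

# Affine gap penalty: $Gop to open a gap, $Gext for each gapped position.
my ($Gop, $Gext) = (3, 1);

# Hypothetical helper: cost of a gap spanning $len positions.
sub gap_cost {
    my ($len) = @_;
    return $Gop + $len * $Gext;
}

print "gap of length 2 costs ", gap_cost(2), "\n";
```

In practice the scoring matrix would probably be read from a file (for example a BLOSUM or PAM table) rather than filled in by hand.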
3 Sequence Alignment
The different steps of a Perl program for sequence alignment
5 Sequence Alignment
Step 2: Initialisation
What we need:
- a 2D array to store the DP matrix: @MatAlign
- a 2D array to store the row pointer: @PointerI
- a 2D array to store the column pointer: @PointerJ
Size of the arrays: $length2 rows and $length1 columns
[Figure: grid of the arrays; columns are indexed 0, 1, 2, 3, ..., length1-1 and rows 0, 1, 2, ..., length2-1]
6 Sequence Alignment: initialization
Initialize the three matrices we need:
- the alignment matrix @MatAlign
- the pointer along i, @PointerI
- the pointer along j, @PointerJ

    for ($i = 0; $i < $length2; $i++) {
        for ($j = 0; $j < $length1; $j++) {
            $MatAlign[$i][$j] = 0;
            $PointerI[$i][$j] = 0;
            $PointerJ[$i][$j] = 0;
        }
    }
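The same initialization can also be written without explicit nested loops. The following is only a sketch: zero_matrix is a hypothetical helper that does not appear in the slides, and the lengths 5 and 3 are arbitrary example values.

```perl
use strict;
use warnings;

# Hypothetical helper: build a $rows x $cols matrix filled with zeros.
# Each row is a fresh anonymous array, so rows do not share storage.
sub zero_matrix {
    my ($rows, $cols) = @_;
    return map { [ (0) x $cols ] } 1 .. $rows;
}

my ($length1, $length2) = (5, 3);   # example lengths of sequence 1 and 2

my @MatAlign = zero_matrix($length2, $length1);
my @PointerI = zero_matrix($length2, $length1);
my @PointerJ = zero_matrix($length2, $length1);

printf "%d rows x %d columns\n", scalar @MatAlign, scalar @{ $MatAlign[0] };
# prints "3 rows x 5 columns"
```

Perl arrays grow on demand, so the explicit zero fill is not strictly required, but it makes the trace-back pointers well defined for every cell.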
7 Sequence Alignment: Compute DP matrix
1. Initialize first row
First row: take the first amino acid in sequence 2 and its position in the scoring matrix.

    $AAi   = $seq2[0];
    $Pos_i = $position{$AAi};    # %position: hash that gives the position of an amino acid in @Score

Loop over all amino acids of sequence 1:

    for ($j = 0; $j < $length1; $j++) {
        $AAj   = $seq1[$j];
        $Pos_j = $position{$AAj};
        $MatAlign[0][$j] = $Score[$Pos_i][$Pos_j];
        if ($j > 0) { $MatAlign[0][$j] -= $Gop; }    # would allow for gaps in the first row
    }
8 Sequence Alignment: Compute DP matrix
2. Initialize first column
First column: take the first amino acid in sequence 1 and its position in the scoring matrix.

    $AAj   = $seq1[0];
    $Pos_j = $position{$AAj};

Loop over all amino acids of sequence 2:

    for ($i = 0; $i < $length2; $i++) {
        $AAi   = $seq2[$i];
        $Pos_i = $position{$AAi};
        $MatAlign[$i][0] = $Score[$Pos_i][$Pos_j];
        if ($i > 0) { $MatAlign[$i][0] -= $Gop; }
    }
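The two border initializations can be tried on a small example. In this sketch the 3-letter alphabet, the scores, the value of $Gop and the sequences ACD and AD are invented for illustration; sequences are stored as arrays of single characters, which is how the slides appear to index them.

```perl
use strict;
use warnings;

# Toy parameters (assumptions for illustration only).
my %position = (A => 0, C => 1, D => 2);
my @Score    = ([2, -1, -1], [-1, 2, -1], [-1, -1, 2]);
my $Gop      = 3;

my @seq1 = split //, 'ACD';   # sequence 1 runs along the columns (index j)
my @seq2 = split //, 'AD';    # sequence 2 runs along the rows (index i)

my @MatAlign;

# First row: residue 0 of sequence 2 against every residue of sequence 1.
for my $j (0 .. $#seq1) {
    $MatAlign[0][$j] = $Score[ $position{ $seq2[0] } ][ $position{ $seq1[$j] } ];
    $MatAlign[0][$j] -= $Gop if $j > 0;
}

# First column: residue 0 of sequence 1 against every residue of sequence 2.
for my $i (0 .. $#seq2) {
    $MatAlign[$i][0] = $Score[ $position{ $seq2[$i] } ][ $position{ $seq1[0] } ];
    $MatAlign[$i][0] -= $Gop if $i > 0;
}

print join(' ', @{ $MatAlign[0] }), "\n";   # prints "2 -4 -4"
print $MatAlign[1][0], "\n";                # prints "-4"
```

Cell (0,0) is written by both loops with the same value, so the order of the two loops does not matter.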
9 Sequence Alignment: Compute DP matrix
3. Propagate
3.1 Find the score of aligning i with j

    $AAi      = $seq2[$i];
    $AAj      = $seq1[$j];
    $Pos_i    = $position{$AAi};
    $Pos_j    = $position{$AAj};
    $score_ij = $Score[$Pos_i][$Pos_j];
[Figure: cell (i,j) can be reached from (i-1,j-1), from (i-1,k) with 0 <= k <= j-2, or from (k,j-1) with 0 <= k <= i-2]
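The lookup in step 3.1 can be wrapped in a small function. This is a sketch only: pair_score is a hypothetical helper, and the alphabet and scores are toy values. The check for unknown residues is an addition that the slides do not mention.

```perl
use strict;
use warnings;

my %position = (A => 0, C => 1, D => 2);                   # toy alphabet (assumption)
my @Score    = ([2, -1, -1], [-1, 2, -1], [-1, -1, 2]);    # made-up scores

# Hypothetical helper: substitution score for two residues.
sub pair_score {
    my ($AAi, $AAj) = @_;
    for my $aa ($AAi, $AAj) {
        die "Unknown residue '$aa'\n" unless exists $position{$aa};
    }
    return $Score[ $position{$AAi} ][ $position{$AAj} ];
}

print pair_score('C', 'C'), "\n";   # match: prints 2
print pair_score('A', 'D'), "\n";   # mismatch: prints -1
```

Without such a check, a residue missing from %position would silently index row or column 0 under non-strict code, which is easy to overlook.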
10 Sequence Alignment: Compute DP matrix
3. Propagate
3.2 Score coming from (i-1,j-1): i-1 aligned with j-1

    $score1 = $MatAlign[$i-1][$j-1];
    $ipos1  = $i - 1;
    $jpos1  = $j - 1;
11 Sequence Alignment: Compute DP matrix
3. Propagate
3.3 Gap in sequence 1 (only for j > 1)
For k = 0:

    $gap    = $Gop + ($j - 1) * $Gext;
    $score2 = $MatAlign[$i-1][0] - $gap;
    $ipos2  = $i - 1;
    $jpos2  = 0;

For the remaining k values:

    for ($k = 1; $k < $j - 1; $k++) {
        $gap        = $Gop + ($j - 1 - $k) * $Gext;
        $score_test = $MatAlign[$i-1][$k] - $gap;
        if ($score_test > $score2) {
            $score2 = $score_test;
            $jpos2  = $k;
        }
    }
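To see the scan over row i-1 at work, here is a small self-contained sketch. The helper best_row_gap and the numbers in @row are invented for the example; the gap formula Gop + (j-1-k)*Gext is the one used on the slide.

```perl
use strict;
use warnings;

# Hypothetical helper: best score for reaching column $j from row i-1
# through a gap, scanning k = 0 .. j-2. $prev_row is row i-1 of MatAlign.
# Returns (undef, undef) when j < 2, i.e. when no such move exists.
sub best_row_gap {
    my ($prev_row, $j, $Gop, $Gext) = @_;
    my ($best, $best_k);
    for my $k (0 .. $j - 2) {
        my $gap  = $Gop + ($j - 1 - $k) * $Gext;
        my $test = $prev_row->[$k] - $gap;
        # strict ">" keeps the smallest k on ties, as in the slide's loop
        ($best, $best_k) = ($test, $k) if !defined $best || $test > $best;
    }
    return ($best, $best_k);
}

my @row = (2, -4, 5, 1);    # made-up values for row i-1
my ($score2, $jpos2) = best_row_gap(\@row, 4, 3, 1);
print "score2 = $score2, jpos2 = $jpos2\n";   # prints "score2 = 1, jpos2 = 2"
```

For j = 4 the candidates are 2-6 = -4 (k = 0), -4-5 = -9 (k = 1) and 5-4 = 1 (k = 2), so the shortest gap wins here.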
12 Sequence Alignment: Compute DP matrix
3. Propagate
3.3 (continued) Gap in sequence 2 (only for i > 1)
For k = 0:

    $gap    = $Gop + ($i - 1) * $Gext;
    $score3 = $MatAlign[0][$j-1] - $gap;
    $ipos3  = 0;
    $jpos3  = $j - 1;

For the remaining k values:

    for ($k = 1; $k < $i - 1; $k++) {
        $gap        = $Gop + ($i - 1 - $k) * $Gext;
        $score_test = $MatAlign[$k][$j-1] - $gap;
        if ($score_test > $score3) {
            $score3 = $score_test;
            $ipos3  = $k;
        }
    }
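The two gap cases of step 3.3 can be summarized in formula form. The notation is an assumption made for this summary: M stands for MatAlign, and G_op and G_ext stand for Gop and Gext.

```latex
\begin{align*}
\text{score2} &= \max_{0 \le k \le j-2} \Bigl[ M_{i-1,k} - \bigl( G_{op} + (j-1-k)\,G_{ext} \bigr) \Bigr] \\
\text{score3} &= \max_{0 \le k \le i-2} \Bigl[ M_{k,j-1} - \bigl( G_{op} + (i-1-k)\,G_{ext} \bigr) \Bigr]
\end{align*}
```

In both cases the factor multiplying G_ext is the number of residues skipped between the source cell and (i,j).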
13 Sequence Alignment: Compute DP matrix
3. Propagate
3.4 Combine the three scores and find the optimum

    $bestscore = $score1;
    $ipos      = $ipos1;
    $jpos      = $jpos1;
    if ($j > 1) {
        if ($score2 > $bestscore) {
            $bestscore = $score2;
            $ipos      = $ipos2;
            $jpos      = $jpos2;
        }
    }
    if ($i > 1) {
        if ($score3 > $bestscore) {
            $bestscore = $score3;
            $ipos      = $ipos3;
            $jpos      = $jpos3;
        }
    }
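The selection in step 3.4 amounts to taking the maximum of up to three candidates, with the diagonal move as the default. A compact way to express this is sketched below; pick_best and the candidate values are hypothetical and not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: each candidate is [score, ipos, jpos], or undef when
# the move is not applicable (j <= 1 or i <= 1). The first candidate is the
# diagonal move, which is always available in the propagation step.
sub pick_best {
    my @candidates = @_;
    my $best = shift @candidates;
    for my $c (@candidates) {
        next unless defined $c;
        $best = $c if $c->[0] > $best->[0];   # strict ">": earlier candidate wins ties
    }
    return @$best;
}

# Made-up example: diagonal scores 4, the row gap scores 6, no column gap.
my ($bestscore, $ipos, $jpos) = pick_best([4, 2, 3], [6, 2, 1], undef);
print "bestscore = $bestscore, pointer = ($ipos, $jpos)\n";
# prints "bestscore = 6, pointer = (2, 1)"
```

Keeping the strict comparison preserves the slide's preference order on ties: diagonal first, then the gap from row i-1, then the gap from column j-1.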
14 Sequence Alignment: Compute DP matrix
3. Propagate
3.5 Finally, update the score and the pointers

    $MatAlign[$i][$j] = $score_ij + $bestscore;
    $PointerI[$i][$j] = $ipos;
    $PointerJ[$i][$j] = $jpos;
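Putting steps 1 to 3.5 together, the matrix fill might look like the sketch below. It is one possible arrangement rather than the original program: fill_matrix is a hypothetical function, the alphabet, scores and gap parameters are toy values, and the border handling is folded into the main loop instead of being done in separate passes.

```perl
use strict;
use warnings;

# Toy parameters (assumptions): 3-letter alphabet, +2/-1 scores, affine gap.
my %position = (A => 0, C => 1, D => 2);
my @Score    = ([2, -1, -1], [-1, 2, -1], [-1, -1, 2]);
my ($Gop, $Gext) = (3, 1);

sub fill_matrix {
    my ($s1, $s2) = @_;
    my @seq1 = split //, $s1;    # columns, index j
    my @seq2 = split //, $s2;    # rows, index i
    my (@MatAlign, @PointerI, @PointerJ);

    for my $i (0 .. $#seq2) {
        for my $j (0 .. $#seq1) {
            my $score_ij = $Score[ $position{ $seq2[$i] } ][ $position{ $seq1[$j] } ];
            ($PointerI[$i][$j], $PointerJ[$i][$j]) = (0, 0);

            if ($i == 0 || $j == 0) {
                # First row and first column: subtract Gop except at (0,0).
                my $edge = ($i > 0 || $j > 0) ? $Gop : 0;
                $MatAlign[$i][$j] = $score_ij - $edge;
                next;
            }

            # 3.2 Diagonal move from (i-1, j-1).
            my ($best, $ipos, $jpos) = ($MatAlign[$i-1][$j-1], $i - 1, $j - 1);

            # 3.3 Gap moves from row i-1, k = 0 .. j-2.
            for my $k (0 .. $j - 2) {
                my $test = $MatAlign[$i-1][$k] - ($Gop + ($j - 1 - $k) * $Gext);
                ($best, $ipos, $jpos) = ($test, $i - 1, $k) if $test > $best;
            }

            # 3.3 (continued) Gap moves from column j-1, k = 0 .. i-2.
            for my $k (0 .. $i - 2) {
                my $test = $MatAlign[$k][$j-1] - ($Gop + ($i - 1 - $k) * $Gext);
                ($best, $ipos, $jpos) = ($test, $k, $j - 1) if $test > $best;
            }

            # 3.5 Store the score and the trace-back pointers.
            $MatAlign[$i][$j] = $score_ij + $best;
            ($PointerI[$i][$j], $PointerJ[$i][$j]) = ($ipos, $jpos);
        }
    }
    return (\@MatAlign, \@PointerI, \@PointerJ);
}

my ($M, $PI, $PJ) = fill_matrix('ACD', 'AD');
print join(' ', @$_), "\n" for @$M;
# prints:
# 2 -4 -4
# -4 1 0
```

In this example cell (1,2) is reached from (0,0) through a one-residue gap (2 - 4 = -2, which beats the diagonal value -4), so its pointers are (0,0). The inner scans over k make the fill cubic in the sequence lengths, which is the price of evaluating every gap length explicitly.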