Sequence Alignment - PowerPoint PPT Presentation

About This Presentation

Title:

Sequence Alignment

Slides: 22

Provided by: LHA63

Learn more at: https://courses.missouristate.edu

Transcript and Presenter's Notes

Title: Sequence Alignment

1
Sequence Alignment

Csc 487/687 Computing for bioinformatics

2
Refining the Scoring Scheme: Scoring Matrix

Goal: measure the relative probability of any particular substitution.
Use the relative frequencies of such changes to form a scoring matrix for substitutions.
A likely change should score higher than a rare one.

3
Scoring matrix for nucleic acid sequences

A simple scheme for substitutions:
+1 for a match, -1 for a mismatch.
A more complicated scheme is based on the higher frequency of transition mutations than transversion mutations:
Transitions: a ↔ g and t ↔ c
Transversions: (a or g) ↔ (t or c)
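As a minimal sketch of the more complicated scheme, the function below scores a nucleotide pair as a match, a transition, or a transversion. The three score values are hypothetical and not taken from the slides; only their ordering (match > transition > transversion) reflects the idea that a likelier change scores higher.

```python
# Hypothetical score values (assumption): only the ordering matters here.
MATCH, TRANSITION, TRANSVERSION = 1, -1, -2

PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def nucleotide_score(x: str, y: str) -> int:
    """Score one aligned pair of nucleotides."""
    x, y = x.lower(), y.lower()
    if x == y:
        return MATCH
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return TRANSITION if same_class else TRANSVERSION


def alignment_score(s: str, t: str) -> int:
    """Sum pair scores over two gap-free aligned sequences of equal length."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(nucleotide_score(x, y) for x, y in zip(s, t))
```

For example, `alignment_score("acgt", "gcga")` adds one transition (a/g), two matches, and one transversion (t/a).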

4
Refining the Scoring Scheme: Scoring Matrix

The scheme should return high values for alignments of homologous proteins.
It should give higher scores to alignments of amino acids that are often seen in corresponding positions in homologous proteins.

5
Scoring Matrices

Importance of scoring matrices
Scoring matrices appear in all analyses involving
sequence comparisons.
The choice of matrix can strongly influence the
outcome of the analysis.
Scoring matrices implicitly represent a
particular theory of relationships.
Understanding the theory underlying a given scoring matrix can aid in making a proper choice.

6
Identity Matrix

Simplest type of scoring matrix

     L  I  C  A
A             1
C          1  0
I       1  0  0
L    1  0  0  0
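A minimal sketch of identity scoring, assuming the four-residue alphabet shown on the slide (A, C, I, L). The names `ALPHABET`, `identity`, and `identity_score` are illustrative and not from the presentation.

```python
ALPHABET = "ACIL"  # the four residues shown on the slide

# identity[a][b] is 1 when a == b and 0 otherwise.
identity = {a: {b: int(a == b) for b in ALPHABET} for a in ALPHABET}


def identity_score(s: str, t: str) -> int:
    """Count identical positions in two gap-free aligned sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(identity[x][y] for x, y in zip(s, t))
```

Under this matrix an isoleucine/leucine pair scores 0, exactly like any other mismatch, however similar the two residues are.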

7
Similarity

It is easy to score a pair of amino acids by identity alone (1 if identical, 0 if not). However, it is not easy to give a score to amino acids that are somewhat similar.
[Figure: structures of isoleucine and leucine]
Should they get a 0 (non-identical), a 1 (identical), or something in between?

8
Scoring matrices

Give a score for each pair of amino acids
Should reflect
  the degree of biological relatedness
  the probability that two amino acids occurring in different sequences have a common ancestor
Should be symmetric
Substitution matrices
  Give the probability that an amino acid a is changed to amino acid b (in a certain evolutionary time)
  Are generally not symmetric

9
Scoring matrices

Identity matrix (scoring 0/1)
Distances in the genetic code
Amino acid similarities based on physico-chemical properties
Scoring matrices based on experimental data (PAM, BLOSUM)

10
Dayhoff's PAM matrices

Based on experimental data
t = evolutionary time interval
Sequences from 34 superfamilies were used
Divide the sequences into groups (71) of homologous sequences, and make a multiple alignment for each of them
Construct evolutionary trees for each group, and estimate the mutations that have occurred
Define an evolutionary model to explain the evolution
Construct substitution matrices: for each amino acid pair (a, b), an estimate of the probability that amino acid a has mutated to amino acid b in time interval t
Construct scoring matrices from the substitution matrices
Note that a and b are variables that stand for any amino acid.

11
Example

12
The model of the evolution

The probability of a mutation in a position is independent of
  position and neighbouring residues
  previous mutations in the position
A biological (evolutionary) clock is assumed (meaning a constant rate of mutations)
This means that evolutionary time can be measured in number of mutations (here substitutions)
The measure is PAM (Point Accepted Mutation)
1 PAM is one accepted mutation per 100 residues
13
The Point-Accepted-Mutation (PAM) model of evolution and the PAM scoring matrix

A 1-PAM unit is equivalent to 1 mutation found in a stretch of 2 aligned sequences, each containing 100 amino acids.
Example 1:
..CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV..
..CNGTTDQVDKIVKIRNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV..
Length 100, 1 mismatch, PAM distance 1.
A k-PAM unit is equivalent to k 1-PAM units (or M^k).
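The example can be read as "mismatches per 100 aligned residues". The sketch below computes that quantity for two gap-free aligned sequences; the function name `differences_per_100` is hypothetical. Note the assumption: the observed difference only approximates the PAM distance when the distance is small, because at larger distances repeated substitutions at the same position are hidden from a simple mismatch count.

```python
def differences_per_100(s: str, t: str) -> float:
    """Observed mismatches per 100 residues of a gap-free alignment."""
    if len(s) != len(t) or not s:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(x != y for x, y in zip(s, t))
    return 100.0 * mismatches / len(s)


# Hypothetical 100-residue pair differing at one position (L -> R),
# mirroring the single mismatch in the slide's example.
seq1 = "L" * 100
seq2 = seq1[:15] + "R" + seq1[16:]
observed = differences_per_100(seq1, seq2)
```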

14
Substitution matrix M1

15
Calculate M^z by matrix multiplication, shown for z = 2

z = 2 means two mutations per 100 residues
A residue a can be changed to residue b after 2 PAM for the following reasons:
a is mutated to b in the first PAM and unchanged in the next, with probability M_ab M_bb
a is unchanged in the first PAM and changed in the next, with probability M_aa M_ab
a is mutated to an amino acid x in the first PAM, and then to b in the next, with probability M_ax M_xb, x being any amino acid other than a and b
These three cases are mutually exclusive, hence
M^2_ab = M_ab M_bb + M_aa M_ab + Σ_{x ≠ a,b} M_ax M_xb = Σ_x M_ax M_xb
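Summing the three cases gives the (a, b) entry of the matrix product of M1 with itself. The sketch below checks this on a toy example. The three-letter alphabet and the values in `M1` are hypothetical stand-ins for the 20 amino acids and the real 1-PAM matrix, which the transcript does not reproduce.

```python
STATES = "abc"  # hypothetical alphabet standing in for the 20 amino acids

# Hypothetical one-step matrix: M1[a][b] = P(a is changed to b in one PAM).
# Each row sums to 1.
M1 = {
    "a": {"a": 0.98, "b": 0.01, "c": 0.01},
    "b": {"a": 0.02, "b": 0.97, "c": 0.01},
    "c": {"a": 0.01, "b": 0.02, "c": 0.97},
}


def multiply(P, Q):
    """Matrix product: (PQ)[a][b] = sum over x of P[a][x] * Q[x][b]."""
    return {
        a: {b: sum(P[a][x] * Q[x][b] for x in STATES) for b in STATES}
        for a in STATES
    }


def two_step_by_cases(M, a, b):
    """Probability of a -> b in two steps, summed over the three cases."""
    changed_then_kept = M[a][b] * M[b][b]
    kept_then_changed = M[a][a] * M[a][b]
    via_other = sum(M[a][x] * M[x][b] for x in STATES if x not in (a, b))
    return changed_then_kept + kept_then_changed + via_other


M2 = multiply(M1, M1)
```

Repeating the multiplication z times would give M^z in the same way, under the model's assumption that each step is independent of previous mutations.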

16
Final Scoring Matrix is the Log-Odds Scoring Matrix

S(a,b) = 10 log10(M_ab / P_b)

a: original amino acid
b: replacement amino acid
P_b: frequency of amino acid b
M_ab: entry of the mutational probability matrix
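A minimal sketch of the log-odds formula, S(a,b) = 10 log10(M_ab / P_b). The function name and the input values used in the note are hypothetical; real values would come from a mutation probability matrix such as M250 and from observed amino acid frequencies.

```python
import math


def log_odds_score(m_ab: float, p_b: float) -> float:
    """S(a,b) = 10 * log10(M_ab / P_b).

    m_ab: probability that a is replaced by b (matrix entry)
    p_b:  background frequency of amino acid b
    """
    if m_ab <= 0 or p_b <= 0:
        raise ValueError("probabilities must be positive")
    return 10.0 * math.log10(m_ab / p_b)
```

The score is positive when the replacement is seen more often than the frequency of b alone would predict, zero when the two are equal, and negative when it is rarer. For instance, `log_odds_score(0.01, 0.1)` is -10 because the ratio is 1/10.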

17
M250

18
PAM-250 scoring matrix

19
BLOSUM (Henikoff & Henikoff)

Perform best in identifying distant relationships
Make use of the much larger amount of data that has become available since Dayhoff's work
Based on the BLOCKS database of aligned protein sequences

20
BLOSUM (Henikoff & Henikoff)

Make multiple alignments and discover blocks not containing gaps (used over 2,000 blocks)
...KIFIMK.......GDEVK...
...NLFKTR.......GDSKK...
...KIFKTK.......GDPKA...
...KLFESR.......GDAER...
...KIFKGR.......GDAAK...
For each column in each block they counted the number of occurrences of each pair of amino acids (210 different pairs: 20 × 21 / 2)
A block of length w from an alignment of n sequences has w · n(n-1)/2 occurrences of amino acid pairs
Let h_ab be the number of occurrences of the pair (a, b) in all blocks (h_ab = h_ba)
T = total number of pairs
f_ab = h_ab / T
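The counting step can be sketched as follows, using the first block from the slide (5 sequences, width 6). The names `count_pairs`, `h`, `T`, and `f` mirror the slide's symbols but the code itself is only an illustration and covers one block rather than the whole BLOCKS database.

```python
from collections import Counter
from itertools import combinations


def count_pairs(block):
    """Count unordered amino acid pairs, column by column, in a gap-free block."""
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("all rows of a block must have the same length")
    h = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            # Sort so that (a, b) and (b, a) share one key: h_ab = h_ba.
            h[tuple(sorted((x, y)))] += 1
    return h


block = ["KIFIMK", "NLFKTR", "KIFKTK", "KLFESR", "KIFKGR"]
h = count_pairs(block)
T = sum(h.values())                          # total number of pairs
f = {pair: n / T for pair, n in h.items()}   # f_ab = h_ab / T
```

With w = 6 and n = 5 the block contributes 6 · 5 · 4 / 2 = 60 pairs, which matches `T`.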