Sequence Alignment - PowerPoint PPT Presentation

About This Presentation
Title:

Sequence Alignment

Slides: 22
Provided by: LHA63

Transcript and Presenter's Notes


1
Sequence Alignment
  • Csc 487/687 Computing for bioinformatics

2
Refining the Scoring Scheme: Scoring Matrix
  • A scoring matrix measures the relative probability
    of any particular substitution.
  • The relative frequencies of such changes are used
    to form a scoring matrix for substitutions.
  • A likely change scores higher than a rare one.

3
Scoring matrix for nucleic acid sequences
  • A simple scheme for substitutions:
    +1 for a match, -1 for a mismatch.
  • A more complicated scheme is based on the higher
    frequency of transition mutations than
    transversion mutations
  • Transitions: a ↔ g and t ↔ c
  • Transversions: (a or g) ↔ (t or c)
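
As a minimal sketch of the two schemes above, the following Python function scores one aligned pair of nucleotides. The slide gives only +1/-1 for the simple scheme; the separate transition and transversion penalties (-1 and -2) are assumed values chosen for illustration, not figures from the course.

```python
PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}

def nucleotide_score(x, y, match=1, transition=-1, transversion=-2):
    """Score one aligned nucleotide pair.

    The default penalties are illustrative assumptions: a transition
    (purine <-> purine or pyrimidine <-> pyrimidine) is penalised less
    than a transversion (purine <-> pyrimidine).
    """
    x, y = x.lower(), y.lower()
    if x == y:
        return match
    pair = {x, y}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return transition
    return transversion

def score_ungapped(seq1, seq2, **weights):
    """Sum the pair scores over two equal-length sequences (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(nucleotide_score(x, y, **weights) for x, y in zip(seq1, seq2))
```

Setting `transition` and `transversion` to the same value recovers the simple match/mismatch scheme.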

4
Refining the Scoring Scheme: Scoring Matrix
  • The scheme should return high values for
    alignments of homologous proteins
  • It should give higher scores to alignments of
    amino acids often seen in corresponding positions
    in homologous proteins

5
Scoring Matrices
  • Importance of scoring matrices
  • Scoring matrices appear in all analyses involving
    sequence comparisons.
  • The choice of matrix can strongly influence the
    outcome of the analysis.
  • Scoring matrices implicitly represent a
    particular theory of relationships.
  • Understanding the theory underlying a given
    scoring matrix can aid in making the proper choice.

6
Identity Matrix
Simplest type of scoring matrix

       L  I  C  A
  A             1
  C          1  0
  I       1  0  0
  L    1  0  0  0

7
Similarity
It is easy to score whether an amino acid is identical
to another (the score is 1 if identical and 0 if
not). However, it is not easy to give a score
for amino acids that are somewhat similar.
[Figure: chemical structures of isoleucine and leucine]
Should they get a 0 (non-identical), a 1
(identical), or something in between?

8
Scoring matrices
  • Give a score for each pair of amino acids
  • Should reflect:
  • The degree of biological relatedness
  • The probability that two amino acids occurring
    in different sequences have a common ancestor
  • Should be symmetric
  • Substitution matrices, by contrast:
  • Give the probability that an amino acid a is
    changed to amino acid b (in a certain
    evolutionary time)
  • Are generally not symmetric

9
Scoring matrices
  • Identity matrix (scoring 0/1)
  • Use of the distances in the genetic code
  • Use of the amino acid similarities based on
    physico-chemical properties
  • Scoring matrices based on experimental data (PAM,
    BLOSUM)

10
DAYHOFF's PAM MATRICES
  • Based on experimental data
  • t = evolutionary time interval
  • Sequences from 34 superfamilies were used
  • Divide the sequences into groups (71) of
    homologous sequences, and make a multiple
    alignment for each group
  • Construct an evolutionary tree for each group, and
    estimate the mutations that have occurred
  • Define an evolutionary model to explain the
    evolution
  • Construct substitution matrices: for each amino
    acid pair (a,b), an estimate of the probability
    that amino acid a has mutated to amino acid
    b in time interval t
  • Construct scoring matrices from the substitution
    matrices.
  • Note that a and b are variables that stand for any
    amino acid.

11
Example
12
The model of evolution
  • The probability of a mutation in a position is
    independent of:
  • Position and neighbouring residues
  • Previous mutations in the position
  • A biological (evolutionary) clock is assumed
    (meaning a constant rate of mutations)
  • This means that evolutionary time can be measured
    in number of mutations (here substitutions)
  • The measure is PAM (Point Accepted Mutations)
  • 1 PAM is one accepted mutation per 100 residues

13
The Point-Accepted-Mutation (PAM) model of
evolution and the PAM scoring matrix
A 1-PAM unit is equivalent to 1 mutation found in
a stretch of 2 aligned sequences, each containing
100 amino acids.
Example 1:
..CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV..
..CNGTTDQVDKIVKIRNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV..
Length 100, 1 mismatch, PAM distance 1.
A k-PAM unit is equivalent to k 1-PAM units (or $M^k$).
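
A rough way to see the "mutations per 100 residues" unit is to count mismatches between two aligned, ungapped sequences and scale to 100 positions. The sketch below does only that; the function name is hypothetical, and the observed difference is a reasonable stand-in for PAM distance only when the sequences are very similar, since repeated changes at one position are not visible in the alignment.

```python
def differences_per_100(seq_a, seq_b):
    """Observed mismatches per 100 aligned residues (no gaps allowed)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return 100.0 * mismatches / len(seq_a)

# The slide shows only 50 residues of a 100-residue stretch; repeating the
# visible fragment twice is an assumption made here to reach length 100.
fragment = "CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV"
original = fragment * 2
mutated = original[:14] + "R" + original[15:]   # the single L -> R change

distance = differences_per_100(original, mutated)
```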
14
Substitution matrix M1
15
Calculate $M^z$ by matrix multiplication, shown for
z = 2
  • z = 2 means two mutations per 100 residues
  • A residue a can be changed to residue b after 2
    PAM in one of the following ways:
  • a is mutated to b in the first PAM and unchanged
    in the next, with probability $M_{ab} M_{bb}$
  • a is unchanged in the first PAM and changed to b
    in the next, with probability $M_{aa} M_{ab}$
  • a is mutated to an amino acid x in the first PAM,
    and then to b in the next, with probability
    $M_{ax} M_{xb}$, x being any amino acid other than a and b
  • These three cases are mutually exclusive, hence
    $(M^2)_{ab} = M_{ab} M_{bb} + M_{aa} M_{ab} + \sum_{x \neq a,b} M_{ax} M_{xb} = \sum_{x} M_{ax} M_{xb}$
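
The three cases can be checked numerically: summing them for a pair (a, b) with a different from b should give the same number as entry (a, b) of the matrix product of M with itself. The sketch below uses a made-up 3-letter alphabet and made-up probabilities, with `M1[a][b]` taken as the probability that residue a becomes b in one step; all names and values are assumptions for illustration.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, z):
    """Return M^z for an integer z >= 1 by repeated multiplication."""
    result = M
    for _ in range(z - 1):
        result = mat_mul(result, M)
    return result

def two_step_by_cases(M, a, b):
    """Probability of a -> b after two steps (a != b), summed case by case."""
    changed_then_kept = M[a][b] * M[b][b]
    kept_then_changed = M[a][a] * M[a][b]
    via_third = sum(M[a][x] * M[x][b]
                    for x in range(len(M)) if x not in (a, b))
    return changed_then_kept + kept_then_changed + via_third

# Hypothetical one-step matrix; each row sums to 1.
M1 = [[0.98, 0.01, 0.01],
      [0.02, 0.97, 0.01],
      [0.01, 0.03, 0.96]]

M2 = mat_pow(M1, 2)
```

The same `mat_pow` call with a larger exponent gives matrices for longer evolutionary distances, for example z = 250.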

16
Final Scoring Matrix is the Log-Odds Scoring
Matrix
  • $S(a,b) = 10 \log_{10}(M_{ab} / P_b)$

where a is the original amino acid, b is the
replacement amino acid, $P_b$ is the frequency of
amino acid b, and M is the mutational probability
matrix for the chosen PAM number.
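
A minimal sketch of the log-odds formula follows. The 2-letter alphabet, the probabilities, and the rounding of each score to the nearest integer are assumptions made for illustration; only the formula itself comes from the slide.

```python
import math

def log_odds_score(m_ab, p_b):
    """S(a,b) = 10 * log10(M_ab / P_b) for one pair of residues."""
    return 10 * math.log10(m_ab / p_b)

def scoring_matrix(M, P):
    """Build an integer scoring matrix from mutation probabilities M
    (M[a][b] = probability that a is replaced by b) and background
    frequencies P. Rounding to integers is an assumption here."""
    n = len(M)
    return [[round(log_odds_score(M[a][b], P[b])) for b in range(n)]
            for a in range(n)]

# Hypothetical values, chosen so that P[a] * M[a][b] == P[b] * M[b][a].
M = [[0.6, 0.4],
     [0.2, 0.8]]
P = [1 / 3, 2 / 3]

S = scoring_matrix(M, P)
```

A positive score means the replacement is seen more often than expected from the background frequency of b alone, and a negative score means it is seen less often.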
17
M250
18
PAM-250 scoring matrix
19
BLOSUM (Henikoff & Henikoff)
  • Performs best in identifying distant relationships
  • Makes use of the much larger amount of data that
    has become available since Dayhoff's work
  • Based on the BLOCKS database of aligned protein
    sequences

20
BLOSUM (Henikoff & Henikoff)
  • Make multiple alignments and discover blocks not
    containing gaps (over 2,000 blocks were used)
    ...KIFIMK.......GDEVK...
    ...NLFKTR       GDSKK...
       KIFKTK       GDPKA
       KLFESR       GDAER
       KIFKGR       GDAAK
  • For each column in each block they counted the
    number of occurrences of each pair of amino acids
    (210 different pairs: 20·21/2)
  • A block of length w from an alignment of n
    sequences has $w n (n-1)/2$ occurrences of amino
    acid pairs
  • Let $h_{ab}$ be the number of occurrences of the
    pair (a,b) in all blocks ($h_{ab} = h_{ba}$)
  • T = total number of pairs
  • $f_{ab} = h_{ab} / T$
21
Gap weighting
  • CLUSTAL-W
  • For aligning DNA sequences:
  • Use of identity matrix for substitution
  • Gap penalties: 10 for gap initiation and 0.1 for
    gap extension by one residue
  • For aligning protein sequences:
  • BLOSUM62 matrix
  • Gap penalties: 11 for gap initiation and 1 for gap
    extension by one residue
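
The two-part penalty above is usually called an affine gap penalty. The sketch below assumes that the initiation penalty covers the first gapped residue and that the extension penalty is charged for each further residue; programs differ on this convention, so treat the formula as one possible reading rather than the exact CLUSTAL-W computation.

```python
def affine_gap_penalty(length, gap_open, gap_extend):
    """Penalty for one gap of `length` residues.

    Assumed convention: gap_open is charged once for the first residue,
    gap_extend for every additional residue.
    """
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0
    return gap_open + gap_extend * (length - 1)

# Values from the slide: DNA (10, 0.1) and protein (11, 1).
dna_gap = affine_gap_penalty(5, gap_open=10, gap_extend=0.1)
protein_gap = affine_gap_penalty(3, gap_open=11, gap_extend=1)
```

With these settings one long gap costs much less than several short ones of the same total length, which is the point of separating initiation from extension.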