Scoring Matrices

Provided by: sch17
1
Scoring Matrices
2
Different Scoring Rules Lead to Different Alignments
  • Example score 1
  • 5 x (# matches) + (-4) x (# mismatches)
    + (-7) x (total length of all gaps)
  • Example score 2
  • 5 x (# matches) + (-4) x (# mismatches)
    + (-5) x (# gap openings) + (-2) x (total
    length of all gaps)
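The two example scoring rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample aligned strings are made up here, and gaps are assumed to be written as '-'.

```python
def alignment_counts(a, b):
    """Count matches, mismatches, gap openings and total gap length
    for two aligned strings of equal length ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gap_openings = gap_length = 0
    prev_gap_row = None  # row that held the gap in the previous column
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            gap_length += 1
            if row != prev_gap_row:  # a new gap starts here
                gap_openings += 1
            prev_gap_row = row
        else:
            prev_gap_row = None
            if x == y:
                matches += 1
            else:
                mismatches += 1
    return matches, mismatches, gap_openings, gap_length


def score_rule_1(a, b):
    """First example rule: every gap position costs 7."""
    m, x, _, g = alignment_counts(a, b)
    return 5 * m + (-4) * x + (-7) * g


def score_rule_2(a, b):
    """Second example rule: 5 per gap opening plus 2 per gap position."""
    m, x, o, g = alignment_counts(a, b)
    return 5 * m + (-4) * x + (-5) * o + (-2) * g


# Hypothetical alignment: 5 matches, 1 mismatch, one gap of length 2
aligned_a = "ACGT--AC"
aligned_b = "ACCTGGAC"
print(score_rule_1(aligned_a, aligned_b))  # 7
print(score_rule_2(aligned_a, aligned_b))  # 12
```

The same alignment gets a different score under each rule, which is why the best-scoring alignment can change when the rule changes.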

5
Scoring Rules/Matrices
  • Why are they important?
  • The choice of a scoring rule can strongly
    influence the outcome of sequence analysis
  • What do they mean?
  • Scoring matrices implicitly represent a
    particular theory of evolution
  • Elements of the matrices specify the similarity
    of one residue to another

6
The S_ij in a Scoring Matrix (as a Log-Likelihood Ratio)
7
  • The score of an alignment of two sequences is
    the log-likelihood ratio of the alignment under
    two models:
  • Common ancestry
  • By chance

8
Likelihood Ratio for Aligning a Single Pair of Residues
  • Numerator: the probability that two residues are
    aligned by evolutionary descent
  • Denominator: the probability that they are aligned
    by chance
  • P_i and P_j are the frequencies of residues i and j
    in all sequences (abundance)
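A minimal sketch of the ratio for one residue pair, assuming the chance probability is the product of the two abundances P_i and P_j. The name q_ij for the probability of alignment by descent, the base-2 logarithm, and all numbers are assumptions chosen for illustration.

```python
import math


def likelihood_ratio(q_ij, p_i, p_j):
    """Probability of the pair under common descent (q_ij) divided by
    the probability of seeing the pair by chance (p_i * p_j)."""
    return q_ij / (p_i * p_j)


def log_likelihood_ratio(q_ij, p_i, p_j):
    """Base-2 log of the ratio: positive when the pair is seen more often
    than chance predicts, negative when it is seen less often."""
    return math.log2(likelihood_ratio(q_ij, p_i, p_j))


# Made-up numbers: the pair occurs in 2% of aligned positions,
# while chance alone predicts 0.1 * 0.05 = 0.5%.
ratio = likelihood_ratio(q_ij=0.02, p_i=0.1, p_j=0.05)
log_ratio = log_likelihood_ratio(q_ij=0.02, p_i=0.1, p_j=0.05)
print(ratio, log_ratio)
```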

9
Likelihood Ratio of Aligning Two Sequences
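If the aligned positions are treated as independent (an assumption made for this sketch), the likelihood ratio of a whole alignment is the product of the per-position ratios, so its logarithm is the sum of the per-position log ratios. The per-position values below are made up.

```python
import math


def alignment_likelihood_ratio(position_ratios):
    """Product of the per-position likelihood ratios."""
    return math.prod(position_ratios)


def alignment_log_score(position_ratios):
    """Sum of per-position log2 ratios; equals log2 of the product."""
    return sum(math.log2(r) for r in position_ratios)


# Hypothetical ratios for a three-position alignment
position_ratios = [4.0, 0.5, 2.0]
total_ratio = alignment_likelihood_ratio(position_ratios)
total_score = alignment_log_score(position_ratios)
print(total_ratio, total_score)
```

Adding scores column by column is therefore the same as multiplying likelihood ratios, which is what makes a scoring matrix usable as a simple lookup table.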
10
Two Classes of Widely Used Protein Scoring Matrices
  • PAM (Point Accepted Mutations): 1500 changes in 71
    groups with > 85% similarity
  • BLOSUM (Blocks Substitution Matrix): 2000 blocks
    from 500 families
11
  • PAM and BLOSUM matrices are all log-likelihood
    matrices
  • More specifically, scores are in half-bit units:
    score = 2 x log2(likelihood ratio)
  • An alignment that scores 6 means that the
    alignment by common ancestry is 2^(6/2) = 8 times
    as likely as expected by chance.
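The conversion in the example above can be checked directly. This sketch assumes the half-bit convention implied by the slide (score = 2 x log2 of the likelihood ratio); the function names are illustrative.

```python
import math


def odds_from_score(score):
    """Likelihood ratio implied by a score in half-bit units."""
    return 2 ** (score / 2)


def score_from_odds(odds):
    """Inverse conversion: half-bit score for a given likelihood ratio."""
    return 2 * math.log2(odds)


print(odds_from_score(6))    # 8.0
print(score_from_odds(8.0))  # 6.0
```

Under this convention a score of 0 means the alignment is exactly as likely as chance, and each additional 2 points doubles the odds in favour of common ancestry.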

12
Constructing BLOSUM Matrices
  • Blocks Substitution Matrices

13
BLOSUM Matrices of Specific Similarities
  • Sequences whose similarity exceeds a threshold are
    clustered.
  • If the clustering threshold is 62%, the final
    matrix is BLOSUM62
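A rough sketch of the clustering step, under assumptions not stated on the slide: similarity is taken to be percent identity over ungapped segments of equal length, sequences at or above the threshold are linked, and links are transitive (single linkage). The segments are invented.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length segments agree."""
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def cluster_segments(segments, threshold):
    """Group segment indices so that any two segments with identity
    >= threshold end up in the same cluster (single linkage)."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if percent_identity(segments[i], segments[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical block of four ungapped segments
segments = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHLLV", "WYWYWYWYWY"]
print(cluster_segments(segments, 80))  # [[0, 1, 2], [3]]
print(cluster_segments(segments, 85))  # [[0, 1], [2], [3]]
```

A lower threshold merges more sequences into each cluster, so a lower-numbered matrix is built from more divergent comparisons.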

14
  • A toy example of constructing a BLOSUM matrix
    from 4 training sequences

15
Constructing a BLOSUM Matrix: 1. Counting mutations
16
2. Tallying mutation frequencies
17
3. Matrix of mutation probabilities
18
4. Calculate abundance of each residue (marginal probability)
19
5. Obtaining a BLOSUM matrix
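The five steps can be sketched end to end on a tiny block. The training sequences from the slides were not transcribed, so the four sequences below are hypothetical, as are the function and variable names. The sketch also assumes no clustering (each sequence counts once), unordered residue pairs, and half-bit rounding; pairs that are never observed get no score here.

```python
import math
from collections import Counter
from itertools import combinations


def toy_blosum(seqs):
    # 1. Count residue pairs in every column of the block
    pair_counts = Counter()
    for column in zip(*seqs):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1

    # 2. and 3. Turn the tallies into pair probabilities q_ij
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # 4. Abundance (marginal probability) of each residue
    p = Counter()
    for (x, y), prob in q.items():
        if x == y:
            p[x] += prob
        else:
            p[x] += prob / 2
            p[y] += prob / 2

    # 5. Log-likelihood ratio in half-bit units, rounded to an integer.
    #    An unordered pair i != j is expected by chance with 2 * p_i * p_j.
    scores = {}
    for (x, y), prob in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(prob / expected))
    return dict(pair_counts), q, dict(p), scores


# Hypothetical block: 4 sequences, 3 columns, two residue types
block = ["AAC", "AAC", "ACC", "CCC"]
pair_counts, q, p, scores = toy_blosum(block)
print(pair_counts)  # {('A', 'A'): 4, ('A', 'C'): 7, ('C', 'C'): 7}
print(scores)       # {('A', 'A'): 1, ('A', 'C'): -1, ('C', 'C'): 0}
```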
20
  • Constructing the real BLOSUM62 Matrix

21
1-3. Mutation Frequency Table
22
4. Calculate Amino Acid Abundance
23
5. Obtaining BLOSUM62 Matrix
25
BLOSUM matrices reference
  • S. Henikoff and J. Henikoff (1992). Amino acid
    substitution matrices from protein blocks. PNAS
    89: 10915-10919
  • Training data: 2000 conserved blocks from the BLOCKS
    database. Blocks are ungapped, aligned protein
    segments. Each block represents a conserved region
    of a protein family

26
Break
  • Homework

27
PAM Matrices (Point Accepted Mutations)
  • Mutations accepted by natural selection

28
Constructing PAM Matrix: Training Data
29
PAM Phylogenetic Tree
30
PAM Accepted Point Mutation
31
Mutability of Residue j
32
Total Mutation Rate
The sum of the mutabilities of all residues, each weighted by its abundance, is the total mutation rate of all amino acids
33
Normalize Total Mutation Rate to 1%
This defines an evolutionary period: the period during which 1% of all residues in the sequences are mutated (accepted mutations, of course)
34
Mutation Probability Matrix Normalized Such That the Total Mutation Rate Is 1%
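A sketch of this normalization on a three-residue toy alphabet. The slide formulas were not transcribed, so the details here are assumptions: the accepted-mutation counts and frequencies are invented, mutability is approximated as (accepted changes involving a residue) / (its frequency), and matrix[i][j] is read as the probability that residue j is replaced by residue i.

```python
def pam1_mutation_matrix(accepted, freq):
    """Build a mutation probability matrix whose total mutation rate is 1%."""
    residues = sorted(freq)
    # Accepted changes involving each residue j
    changes = {j: sum(accepted[j][i] for i in residues if i != j)
               for j in residues}
    # Relative mutability of each residue (assumed definition, see lead-in)
    mutability = {j: changes[j] / freq[j] for j in residues}
    # Total mutation rate before scaling, then the factor that makes it 0.01
    total_rate = sum(freq[j] * mutability[j] for j in residues)
    scale = 0.01 / total_rate

    matrix = {i: {} for i in residues}
    for j in residues:
        for i in residues:
            if i == j:
                matrix[i][j] = 1.0 - scale * mutability[j]
            else:
                matrix[i][j] = (scale * mutability[j]
                                * accepted[j][i] / changes[j])
    return matrix


# Hypothetical symmetric accepted-mutation counts and residue frequencies
accepted = {
    "A": {"R": 30, "N": 10},
    "R": {"A": 30, "N": 20},
    "N": {"A": 10, "R": 20},
}
freq = {"A": 0.5, "R": 0.3, "N": 0.2}
pam1 = pam1_mutation_matrix(accepted, freq)

# Fraction of residues expected to change in one period
changed = sum(freq[j] * (1.0 - pam1[j][j]) for j in freq)
print(changed)
```

The scaling factor does not change which substitutions are favoured; it only fixes the length of the evolutionary period that one application of the matrix represents.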
35
Mutation Probability Matrix (transposed), M x 10000
36
PAM1 Mutation Probability Matrix
PAM2 mutation probability matrix? Mutations that happen in twice the evolutionary period of a PAM1 matrix
37
PAM Matrix Assumptions
38
In two PAM1 periods
  • A→R: A→A and A→R, or
  • A→N and N→R, or
  • A→D and D→R, or
  • ..., or
  • A→V and V→R
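The enumeration above is a sum over the intermediate residue. A small sketch, assuming a made-up three-residue one-period matrix in which step[x][y] is the probability that x becomes y in one PAM1 period; the sum here runs over every intermediate residue, including the start and end residues themselves.

```python
import math

# Hypothetical one-period probabilities; each row sums to 1
step = {
    "A": {"A": 0.98, "R": 0.01, "N": 0.01},
    "R": {"A": 0.02, "R": 0.97, "N": 0.01},
    "N": {"A": 0.01, "R": 0.02, "N": 0.97},
}


def two_period_probability(step, start, end):
    """P(start -> end in two periods): sum over the intermediate residue
    of P(start -> mid) * P(mid -> end)."""
    return sum(step[start][mid] * step[mid][end] for mid in step)


# A -> R via A, via R, or via N:
# 0.98 * 0.01 + 0.01 * 0.97 + 0.01 * 0.02
a_to_r = two_period_probability(step, "A", "R")
print(a_to_r)
```

Computing this for every start and end residue is the same as multiplying the one-period matrix by itself.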

39
Entries in a PAM-2 Mutation Probability Matrix
40
PAM-k Mutation Probability Matrix
41
PAM-k log-likelihood matrix
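A sketch of both steps on a two-residue toy model: the PAM-k probabilities are obtained by applying the one-period matrix k times, and each score is the log of the k-period probability divided by the abundance of the target residue. The numbers are invented, step[x][y] is read as the probability that x becomes y, and the half-bit scaling (2 x log2) is an assumption; published PAM tables may use a different scaling such as 10 x log10.

```python
import math


def multiply(left, right):
    """Matrix product for dict-of-dict matrices with identical keys."""
    keys = list(left)
    return {x: {y: sum(left[x][z] * right[z][y] for z in keys) for y in keys}
            for x in keys}


def pam_k(step, k):
    """Apply the one-period matrix k times (k >= 1)."""
    result = {x: {y: 1.0 if x == y else 0.0 for y in step} for x in step}
    for _ in range(k):
        result = multiply(result, step)
    return result


def log_likelihood_scores(prob_k, freq, factor=2.0):
    """Rounded factor * log2(P_k(x -> y) / freq[y]) for every pair."""
    return {x: {y: round(factor * math.log2(prob_k[x][y] / freq[y]))
                for y in prob_k}
            for x in prob_k}


# Hypothetical two-residue model with a 1% total mutation rate per period
freq = {"A": 0.6, "R": 0.4}
step = {
    "A": {"A": 1 - 1 / 120, "R": 1 / 120},
    "R": {"A": 0.0125, "R": 0.9875},
}
pam2 = pam_k(step, 2)
pam250 = pam_k(step, 250)
scores2 = log_likelihood_scores(pam2, freq)
print(scores2)
```

As k grows, each row of the PAM-k matrix drifts toward the background frequencies, so the scores move toward zero and carry less information about ancestry.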
42
PAM-250
43
  • PAM60: 60%, PAM80: 50%,
  • PAM120: 40% similarity
  • The PAM-250 matrix provides better-scoring
    alignments than lower-numbered PAM matrices for
    proteins of 14-27% similarity

44
PAM Matrices Reference
  • M.O. Dayhoff, ed. (1978). Atlas of Protein Sequence
    and Structure, Suppl. 3. National Biomedical
    Research Foundation, 1

45
Choice of Scoring Matrix
46
Comparing Scoring Matrices
  • PAM
  • Based on extrapolation from a short evolutionary period
  • Tracks evolutionary origins
  • Homologous sequences during evolution
  • BLOSUM
  • Based on a range of evolutionary periods
  • Conserved blocks
  • Finds conserved domains

47
Sources of Error in PAM