Why is pairwise sequence alignment different

Slides: 21
Provided by: bch7
Transcript and Presenter's Notes

1
Lecture 5
  • Why is pairwise sequence alignment different for proteins and for nucleic acids?
  • General protein introduction.
  • Scoring systems and matrices for protein data.
  • 3. Wet experience for pairwise sequence alignment (for proteins, more options).
  • 4. Special BLAST pages.
  • 5. Why is multiple alignment better?
  • 6. Wet experience for MSA (for proteins).

2
Scoring Systems
  • Identity: count the number of identical matches and divide by the length of the aligned region (in %).
  • Similarity: a less well-defined measure of how close two sequences are.
  • Chemical similarities among amino acids

http://www.imb-jena.de/IMAGE_AA.html
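The identity measure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `percent_identity` and the example sequences are made up, and it assumes the aligned region includes gap columns in the denominator (other conventions exist).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned region of two equal-length rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column is an identical match when both rows hold the same residue;
    # a gap facing a gap is not counted as a match.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Assumption: divide by the full aligned length, gap columns included.
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned pair: 6 identical columns out of 9.
print(round(percent_identity("MKT-AYIAK", "MKTQAYLAR"), 1))
```

Similarity has no equally simple definition, because it depends on which substitutions are scored as "close", which is what the scoring matrices on the following slides provide.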
3
Related Amino Acids
http://www.imb-jena.de/IMAGE_AA.html
4
Protein Scoring Matrices
  • A family of matrices listing the likelihood of one amino acid changing to another during evolution.
  • The two most popular matrices are the PAM and the
    BLOSUM matrices.

5
PAM Matrix - Point Accepted Mutations
  • PAM matrices are based on related sequences.
  • In these related proteins, the function was not significantly changed.

The changes are accepted by natural selection (mutations survived during evolution).
6
PAM Scoring Matrices
PAM units measure evolutionary distance.
PAM 1 matrix: substitution scores arising from sequences in which one percent of the amino acid pairs are different.
Note: PAM 1 is a small change, so the sequences will be almost identical.
7
PAM Scoring Matrices
  • In general:
  • Low PAM numbers are used for aligning short sequences with strong local similarities.
  • High PAM numbers are used for aligning long sequences with weak similarities.
  • When there is no information about evolutionary distance, 3 matrices are recommended for sequence comparison: PAM 40, PAM 120 and PAM 250.

8
PAM Family of Matrices (Dayhoff, 78)
(log odds)
Values > 0 in the log-odds PAM matrix indicate likely mutations; values of 0 are neutral; values < 0 indicate unlikely mutations.
Note: numbers along the diagonal are not all equal. The diagonal indicates how conserved a residue tends to be (W is VERY conserved).
Calculate PAM Matrix: enter the desired PAM value in the box below (value must be greater than 1, and less than 512).
http://www.cmbi.kun.nl/bioinf/tools/pam.shtml
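To make the sign convention of a log-odds matrix concrete, here is a small Python sketch. It assumes the common form score = 10 x log10(observed pair frequency / frequency expected by chance); the function name `log_odds_score` and all frequencies below are hypothetical and are not taken from the Dayhoff data.

```python
import math


def log_odds_score(pair_freq: float, freq_a: float, freq_b: float,
                   scale: float = 10.0) -> int:
    """Rounded log-odds score for one pair of residues.

    pair_freq: how often the two residues are seen aligned in related
               sequences (hypothetical value supplied by the caller).
    freq_a, freq_b: background frequencies of each residue.
    """
    expected = freq_a * freq_b  # chance of pairing if residues were independent
    return round(scale * math.log10(pair_freq / expected))


# Hypothetical numbers, background frequency 0.1 for both residues:
print(log_odds_score(0.02, 0.1, 0.1))    # seen twice as often as chance: > 0
print(log_odds_score(0.01, 0.1, 0.1))    # seen as often as chance: 0
print(log_odds_score(0.0025, 0.1, 0.1))  # seen four times less often: < 0
```

A pair observed more often than chance gets a positive score, a pair observed exactly as often as chance gets 0, and a rarely observed pair gets a negative score, which matches the reading of the matrix given above.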
9
THE BLOSUM Family of Matrices
Blocks Substitution Matrices (BLOSUM matrices are based on a much larger dataset than PAM).
  • Blocks are short conserved patterns, 3-60 aa long.
  • Proteins can be divided into families by common blocks.
  • Different BLOSUM matrices emerge by looking at sequences with different identity percentages.
  • Example: BLOSUM62 is derived from block alignments in which sequences sharing at least 62% identity are clustered together and counted as a single sequence.

Block A B C D
10
THE BLOSUM Family of Matrices
Blocks Substitution Matrices
(log odds)
11
PAM vs. BLOSUM Matrices
Widely used
  • Tips for protein similarity search:
  • Start with BLOSUM 62 or PAM 120, default gap penalties.
  • If no significant results are found, use BLOSUM 45 or PAM 250 and lower gap penalties, to find more divergent results.
  • Examine results above E-value 0.05 for divergent sequences.
  • Use PSI-BLAST to discover weak but biologically significant sequence similarities.

http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Scoring2.html
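The first two tips describe a fallback strategy, which can be sketched as a loop. This is only an illustration under stated assumptions: `run_search` is a hypothetical callable standing in for whatever search tool is used, the gap penalty labels are placeholders rather than real option values, and `fake_search` returns made-up E-values.

```python
from typing import Callable, List, Optional, Tuple

# Settings to try in order, following the tips: default matrix first,
# then a matrix suited to more divergent sequences.
# PAM 120 and PAM 250 could be used in place of the BLOSUM entries.
SEARCH_LADDER: List[Tuple[str, str]] = [
    ("BLOSUM62", "default gap penalties"),
    ("BLOSUM45", "lower gap penalties"),
]


def search_with_fallback(
    run_search: Callable[[str, str], List[float]],
    e_cutoff: float = 0.05,
) -> Tuple[Optional[str], List[float]]:
    """Try each setting in turn and stop at the first with significant hits.

    run_search(matrix, gaps) is hypothetical and returns the hit E-values.
    """
    for matrix, gaps in SEARCH_LADDER:
        e_values = run_search(matrix, gaps)
        significant = [e for e in e_values if e <= e_cutoff]
        if significant:
            return matrix, significant
    return None, []


def fake_search(matrix: str, gaps: str) -> List[float]:
    # Made-up E-values, for demonstration only.
    return {"BLOSUM62": [0.8, 2.5], "BLOSUM45": [0.01, 0.3]}[matrix]


print(search_with_fallback(fake_search))
```

With the made-up values above, the first setting yields nothing below the cutoff, so the sketch moves on to BLOSUM45. Hits above the cutoff are simply dropped here; for divergent sequences they may still deserve a manual look, as the third tip says.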
12
Lecture 5
  • Why is pairwise sequence alignment different for proteins and for nucleic acids?
  • General protein introduction.
  • Scoring systems and matrices for protein data.
  • 3. Wet experience for pairwise sequence alignment (for proteins, more options).
  • 4. Special BLAST pages.
  • 5. Why is multiple alignment better?
  • 6. Wet experience for MSA (for proteins).

13
http://www.ncbi.nlm.nih.gov/BLAST/
14
http://www.ncbi.nlm.nih.gov/BLAST/
15
http://www.ebi.ac.uk/swissprot/
16
Protein Query
17
Options for Advanced
19
Examples of Alignment Formats
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/multi_formats.html
20
Pairwise Alignment in BLAST Output
Low-complexity sequence filtered
Positives
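In BLAST protein output, "Positives" counts the aligned positions whose substitution score is greater than zero, so it includes identical residues as well as conservative substitutions. The sketch below illustrates the idea for an ungapped alignment. The score table is a small hand-typed excerpt intended to match BLOSUM62 and should be checked against a reference copy; the function names and the example sequences are made up.

```python
# Hand-typed excerpt of substitution scores (intended to match BLOSUM62);
# verify against a reference copy of the matrix before relying on it.
SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2,
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
    ("W", "W"): 11, ("A", "A"): 4, ("A", "W"): -3,
}


def pair_score(x: str, y: str) -> int:
    """Look up a score; the table is symmetric, so try both orders."""
    if (x, y) in SCORES:
        return SCORES[(x, y)]
    return SCORES[(y, x)]


def identities_and_positives(row_a: str, row_b: str) -> tuple:
    """Count identical columns and positive-scoring columns (no gaps)."""
    identities = 0
    positives = 0
    for x, y in zip(row_a, row_b):
        if x == y:
            identities += 1
        if pair_score(x, y) > 0:
            positives += 1
    return identities, positives


# Hypothetical ungapped pair of length 5.
print(identities_and_positives("LKDWA", "IRDWW"))
```

For this pair, two columns are identical (D/D, W/W) and two more score positively (L/I, K/R), so a BLAST-style summary would read Identities = 2/5 and Positives = 4/5.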