Biology 301 Computational Biology: Homology/Similarity, The Matrices, BLAST (Bioinformatics Ch. 8 section)

Slides: 21
Provided by: sarahb73
Transcript and Presenter's Notes

1
Biology 301 - Computational Biology: Homology/Similarity, The Matrices, BLAST
Bioinformatics Ch. 8 section, MEP Ch. 2-3, BLAST Tutorial
2
General Introduction
  • Introduction to terms
  • Similarity = same monomers
  • Homology = inference, predicts phylogeny
  • BFD, like many molecular biologists, improperly
    bandies these terms around. Which are "results"
    and which are "discussion"? What did you think
    of the genome papers' use of these data and terms?

3
  • Compared molecules SHOULD:
  • Have a common ancestor
  • Have a similar function
  • Be homologous at ALL levels
  • - enough similarity to be aligned
  • - enough variability to be informative
  • EACH monomer and domain = one trait
  • THEREFORE

4
(No Transcript)
5
  • Alignment - what we are building up to
  • Lining up homologous monomers/domains
  • Every alignment decision = assumption
  • Known functional or structural domains
  • Computer programs can only deal with simple,
    predictable patterns - similarity. Ultimately,
    the researcher has to examine computer
    alignments, integrating biological information
    determined from the literature or bench.

6
  • Substitute
  • Hemoglobin from each of taxa
  • Thermal stability
  • Enhanced O2-binding
  • Decreased CO-binding
  • Binds 4 peptides
  • Binds 2 peptides

7
Thermal, O2-binding, CO-binding, 4 peptides, 2
peptides
8
(No Transcript)
9
  • Alignments compute:
  • S = substitutions/mutations
  • G = length of gaps (insertions/deletions)
  • W = gap penalty (VERY subjective)
  • Computer programs use different matrices for
    weighting (or not weighting) substitutions.
    Sometimes you can control them; sometimes they
    are fixed at defaults.
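
A toy scoring function shows how S, G, and W can combine. This is only a sketch under stated assumptions: the match/mismatch values, the linear per-position gap penalty, and the function name are illustrative and are not taken from the slides or from any particular alignment program.

```python
def alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap_penalty=2):
    """Score two ALREADY-ALIGNED sequences of equal length ('-' marks a gap).

    Assumed scheme (illustrative only):
      S = summed match/mismatch scores over ungapped columns
      G = number of gapped columns
      W = gap_penalty, charged once per gapped column
      score = S - W * G
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    substitution_total = 0  # S
    gap_length = 0          # G
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            gap_length += 1
        elif a == b:
            substitution_total += match
        else:
            substitution_total += mismatch
    return substitution_total - gap_penalty * gap_length


print(alignment_score("GATTACA", "GAT-ACA"))  # 6 matches, 1 gap      -> 4
print(alignment_score("GATTACA", "GACTACA"))  # 6 matches, 1 mismatch -> 5
```

Raising `gap_penalty` makes the gapped alignment lose to the ungapped one, which is why W is such a subjective choice.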

10
e.g. Dayhoff's Amino Acid Matrix - PAM: substitutions
occur more often between amino acids that are
biochemically similar
  • General Characteristics of PAM
  • PAM = accepted point mutations
  • Based on changes/100 aa (eukaryotic)
  • Table units = likelihood of change
  • Some aa's seldom change: C, G, W
  • High values = similar (A/G = 21, A/K = 4)
  • Studying microbial and mitochondrial proteins
    has led to other matrices.

11
e.g. Contrast Two Nucleotide Matrices
  • Jukes/Cantor Model
  • All nucleotide changes equal, typically all = 1
  • Kimura's Two-Parameter Model
  • Transitions more frequent than transversions
  • Scores can vary but typically T = 2, t = 1
  • Many WAY more complicated matrices have been
    developed for nucleotides - more later.
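
The contrast between the two models can be sketched as cost functions. Assumptions, labeled loudly: the slide's "T" is read here as transversions (cost 2) and "t" as transitions (cost 1), the values are treated as costs (rarer change costs more), and all function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def change_type(x, y):
    """Classify a nucleotide pair as 'match', 'transition', or 'transversion'."""
    if x == y:
        return "match"
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def jukes_cantor_cost(x, y):
    """Jukes/Cantor-style weighting: every change costs the same (1)."""
    return 0 if x == y else 1


def kimura_cost(x, y, transition=1, transversion=2):
    """Two-parameter weighting; the 1 and 2 are ASSUMED from 'T 2 t 1'."""
    kind = change_type(x, y)
    if kind == "match":
        return 0
    return transition if kind == "transition" else transversion


def total_cost(seq_a, seq_b, cost):
    """Sum a per-site cost over two aligned, equal-length sequences."""
    return sum(cost(a, b) for a, b in zip(seq_a, seq_b))


# A->G is a transition, G->T is a transversion.
print(total_cost("ACGT", "GCTT", jukes_cantor_cost))  # 2
print(total_cost("ACGT", "GCTT", kimura_cost))        # 3
```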

12
Basic Local Alignment Search Tool (BLAST): rapid
alignment and similarity program.
  • In a nutshell
  • Allows you to input any "unknown" sequence
  • Compares unknown to ENTIRE database
  • Outputs ranked hitlist with scores and links

13
  • Things BLAST is useful for
  • Finding CDS in a large genome segment
  • Predicting protein function - Hmmm
  • Predicting protein structure - Hmmm
  • Finding gene or protein family members
  • Checking for sequencing errors
  • BUT - "BLAST hits are not transitive, unless the
    alignments are overlapping"
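
The non-transitivity warning can be made concrete: if A hits B and B hits C, the two alignments say something about A and C only when they cover the same stretch of B. A minimal sketch, assuming each hit is reduced to an inclusive (start, end) interval on sequence B; the function names and coordinates are hypothetical.

```python
def overlap(region_1, region_2):
    """Number of positions shared by two inclusive (start, end) intervals."""
    start = max(region_1[0], region_2[0])
    end = min(region_1[1], region_2[1])
    return max(0, end - start + 1)


def hits_overlap_on_b(hit_ab_on_b, hit_bc_on_b, min_overlap=1):
    """True if the A-B and B-C alignments cover a shared region of B.

    Overlap is only the precondition named in the quote; it is not
    proof that A and C are related.
    """
    return overlap(hit_ab_on_b, hit_bc_on_b) >= min_overlap


# A matches one domain of B, C matches a different domain: nothing to infer.
print(hits_overlap_on_b((10, 120), (300, 450)))  # False
# Both alignments cover positions 80-120 of B.
print(hits_overlap_on_b((10, 120), (80, 200)))   # True
```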

14
  • SIMPLE explanation of how BLAST aligns
  • Assumes all sequences, lengths AVERAGE
  • EXTREME gap penalty (called gap cost)
  • Often results in SHORT regions aligned
  • PAM or Equal Weighting
  • FAST and POWERFUL but not intricate
  • No "biology" used for nucleotides
  • Some versions back- or forward-translate every
    possible frame for comparison.
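
"Every possible frame" means three reading frames on each strand, six in total. The sketch below shows forward translation only, and assumes the standard genetic code and an unambiguous A/C/G/T input; it illustrates the idea and is not how BLAST itself is implemented.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from position 0; a trailing partial codon is dropped.

    Codons with unexpected characters become 'X'; '*' marks a stop.
    """
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


for name, peptide in six_frame_translation("ATGGCCTAA").items():
    print(name, peptide)
```

Only frame +1 of the example reads as a start codon followed by a stop (`MA*`); the other five frames are what a translated search also has to compare.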

15
  • Overview of BLAST output - 35 pages
  • Graphic - clickable color-coded similarity
  • Hitlist - ranked names/accessions, scores
  • Alignments - EVERY used query/hit
  • Parameters - list of parameters used/selected

16
(No Transcript)
17
  • The Hitlist Scores
  • Bit Scores = measure of statistical significance
    (higher is better)
  • Less than 50 is considered bad
  • E-value = expected number of alignments this
    good due to chance (lower is better)
  • Keep in mind - these scores apply ONLY to the
    OFTEN-SHORT alignments used! Read Box p. 229 and
    make sure you actually LOOK at your alignments!
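
The two hitlist numbers are linked by the Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score and m and n are the query and database lengths. A minimal sketch: real BLAST uses "effective" lengths corrected for edge effects, so the lengths below are hypothetical and the results are only rough.

```python
import math


def e_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S') with raw (not effective) lengths,
    which is a simplifying assumption.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


def p_value(e):
    """Probability of at least one chance hit: P = 1 - exp(-E)."""
    return -math.expm1(-e)


# Hypothetical search: 300-residue query against 10**9 database residues.
for score in (30, 50, 100):
    e = e_value(score, 300, 10**9)
    print(f"bit score {score}: E = {e:.3g}, P = {p_value(e):.3g}")
```

Each extra bit halves E, and P approaches E only when E is small, so large E-values should not be read as probabilities.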

18
  • BLAST Alignments
  • Identity = percent of positions within the
    aligned region that are the same
  • 25% or higher is considered good
  • Length = the actual length of the alignment
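
Identity and length can be recomputed by hand from any alignment in the output. A sketch, assuming the alignment is given as two equal-length strings with '-' for gaps and that identity is counted over all aligned columns, gaps included; the sequences are made up.

```python
def percent_identity(aligned_query, aligned_hit):
    """Return (percent identity, alignment length) for two aligned strings."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    length = len(aligned_query)
    if length == 0:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both residues match and neither is a gap.
    identical = sum(
        1 for q, h in zip(aligned_query, aligned_hit) if q == h and q != "-"
    )
    return 100.0 * identical / length, length


identity, length = percent_identity("MKT-AYIAKQR", "MKTWAYLAKQR")
print(f"{identity:.1f}% identity over {length} columns")  # 81.8% over 11
```

A high percentage over a very short length is exactly the case the hitlist warning is about, so the two numbers should be read together.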

19
  • BLASTing DNA
  • BLASTn - DNA:DNA db (70% similar)
  • TBLASTx - translated DNA:translated DNA db (families)
  • BLASTx - translated DNA:protein db (CDS)
  • BLASTing Proteins
  • BLASTp - aa sequence:protein db (domains)
  • TBLASTn - aa sequence:translated DNA db (families)
  • Families - goal is research and discovery of new
    relationships.
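
The program list amounts to a lookup on query type and database type. A small sketch of that lookup; the dictionary and function name are hypothetical conveniences, and only the program names come from the slide.

```python
# (query type, database type) -> programs that accept that combination.
BLAST_PROGRAMS = {
    ("dna", "dna"): ["blastn", "tblastx"],   # tblastx translates both sides
    ("dna", "protein"): ["blastx"],          # query is translated
    ("protein", "protein"): ["blastp"],
    ("protein", "dna"): ["tblastn"],         # database is translated
}


def candidate_programs(query_type, database_type):
    """Return the BLAST programs that fit a query/database combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unknown combination: {key}")
    return BLAST_PROGRAMS[key]


print(candidate_programs("DNA", "protein"))  # ['blastx']
print(candidate_programs("dna", "dna"))      # ['blastn', 'tblastx']
```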

20
  • Changing BLAST parameters
  • Restrict searching to some db or db members
  • MASK some regions (conserved, repetitive)
  • Change gap penalties
  • Change substitution matrix
  • All of these changes alter output significantly
    and it is IMPORTANT to record changes in
    procedures. In general, you should NOT change
    BLAST parameters in this course!