Title: Biology 301 - Computational Biology: Homology/Similarity, The Matrices, BLAST
1. Biology 301 - Computational Biology: Homology/Similarity, The Matrices, BLAST
Bioinformatics Ch. 8 section; MEP Ch. 2-3; BLAST Tutorial
2. General Introduction
- Introduction to terms
- Similarity = same monomers
- Homology = inference, predicts phylogeny
- BFD, like many molecular biologists, improperly bandies these terms around. Which are "results" and which are "discussion"? What did you think of the genome papers' use of these data and terms?
3. Compared molecules SHOULD:
- Have a common ancestor
- Have a similar function
- Be homologous at ALL levels
  - enough similarity to be aligned
  - enough variability to be informative
- EACH monomer and domain = one trait
- THEREFORE
5. Alignment: what we are building up to
- Lining up homologous monomers/domains
- Every alignment decision = assumption
- Known functional or structural domains
- Computer programs can only deal with simple, predictable patterns, i.e. similarity. Ultimately, the researcher has to examine computer alignments, integrating biological information determined from the literature or the bench.
6. Substitute
- Hemoglobin from each of taxa
- Thermal stability
- Enhanced O2-binding
- Decreased CO-binding
- Binds 4 peptides
- Binds 2 peptides
7. Thermal, O2-binding, CO-binding, 4 peptides, 2 peptides
9. Alignments compute:
- S = substitutions/mutations
- G = length of gaps (insertions/deletions)
- W = gap penalty (VERY subjective)
- Computer programs use different matrices for weighting (or not weighting) substitutions. Sometimes you can control them; sometimes they are fixed defaults.
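As a rough illustration of how S, G, and W fit together, the sketch below scores two sequences that are already aligned. The slide does not give a formula; combining the terms as D = S + W*G is an assumption here, as are the function name `alignment_cost` and the example sequences.

```python
def alignment_cost(seq_a, seq_b, gap_penalty=2.0, gap_char="-"):
    """Return (S, G, D) for two pre-aligned sequences of equal length.

    S = number of substituted columns, G = number of gapped columns,
    D = S + gap_penalty * G (an assumed way of combining the terms).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    substitutions = 0
    gap_length = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap_char or b == gap_char:
            gap_length += 1        # insertion/deletion column
        elif a != b:
            substitutions += 1     # mismatch column
    cost = substitutions + gap_penalty * gap_length
    return substitutions, gap_length, cost


# Hypothetical aligned sequences, W = 2.
s, g, d = alignment_cost("ACGT-ACGT", "ACCTTAC-T", gap_penalty=2.0)
print(s, g, d)  # 1 2 5.0
```

Re-running with a different `gap_penalty` shows why W is called subjective: the same alignment gets a different cost, so a different alignment may come out as "best".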
10. e.g. Dayhoff's Amino Acid Matrix (PAM): substitutions occur more often between amino acids that are similar in terms of biochemistry
- General characteristics of PAM
- PAM = accepted point mutations
- Based on changes/100 aa (eukaryotic)
- Table units = likelihood of change
- Some aa's seldom change: C, G, W
- High values = similar (A/G = 21; A/K = 4)
- Studying microbial and mitochondrial proteins has led to other matrices.
11. e.g. Contrast Two Nucleotide Matrices
- Jukes/Cantor Model
- All nucleotides equal, typically all = 1
- Kimura's Two-Parameter Model
- Transitions more frequent than transversions
- Scores can vary, but typically T = 2, t = 1
- Many WAY more complicated matrices have been developed for nucleotides; more later.
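The contrast between the two models can be sketched as two small cost tables. The slide's "T 2 t 1" does not say which class of change gets which value, so the defaults below (transversions cost 2, transitions cost 1, because transitions are the more frequent change) are an assumption; swap them if the course intends otherwise. All function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ACGT"


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)


def equal_weight_matrix(cost=1):
    """Jukes/Cantor-style weighting: every substitution costs the same."""
    return {(x, y): (0 if x == y else cost) for x in BASES for y in BASES}


def two_parameter_matrix(transition_cost=1, transversion_cost=2):
    """Kimura-style weighting: transitions and transversions cost differently."""
    matrix = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                matrix[(x, y)] = 0
            elif is_transition(x, y):
                matrix[(x, y)] = transition_cost
            else:
                matrix[(x, y)] = transversion_cost
    return matrix


jc = equal_weight_matrix()
k2 = two_parameter_matrix()
print(jc[("A", "G")], jc[("A", "C")])  # 1 1
print(k2[("A", "G")], k2[("A", "C")])  # 1 2
```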
12. Basic Local Alignment Search Tool (BLAST): rapid alignment and similarity program.
- In a nutshell
- Allows you to input any "unknown" sequence
- Compares the unknown to the ENTIRE database
- Outputs a ranked hitlist with scores and links
13. Things BLAST is useful for
- Finding CDS in a large genome segment
- Predicting protein function - Hmmm
- Predicting protein structure - Hmmm
- Finding gene or protein family members
- Checking for sequencing errors
- BUT: "BLAST hits are not transitive, unless the alignments are overlapping"
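The transitivity warning can be made concrete with coordinates. If A hits B and C hits B, the two hits only say something about A and C when they cover the same stretch of B. The sketch below checks just that overlap; the coordinates and the name `ranges_overlap` are hypothetical, and overlap is a minimum requirement, not proof of a relationship.

```python
def ranges_overlap(range_1, range_2):
    """True if two inclusive (start, end) coordinate ranges share a position."""
    start_1, end_1 = range_1
    start_2, end_2 = range_2
    return start_1 <= end_2 and start_2 <= end_1


# Hypothetical alignment coordinates on protein B.
a_hits_b = (1, 120)     # A aligns to residues 1-120 of B
c_hits_b = (300, 450)   # C aligns to a different domain of B
d_hits_b = (80, 200)    # D aligns to a region shared with A's hit

print(ranges_overlap(a_hits_b, c_hits_b))  # False: no basis to link A and C
print(ranges_overlap(a_hits_b, d_hits_b))  # True: A and D are worth comparing directly
```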
14. SIMPLE explanation of how BLAST aligns
- Assumes all sequences and lengths are AVERAGE
- EXTREME gap penalty (called gap cost)
- Often results in SHORT regions being aligned
- PAM or equal weighting
- FAST and POWERFUL but not intricate
- No "biology" used for nucleotides
- Some versions back- or forward-translate every possible frame for comparison.
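"Every possible frame" means three reading frames on each strand, six in all. The sketch below shows forward translation in all six frames. It assumes the standard genetic code and uppercase A/C/G/T input; the function names are hypothetical and this is not how BLAST itself is implemented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


print(six_frame_translations("ATGGCC"))
```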
15. Overview of BLAST output - 35 pages
- Graphic - clickable color-coded similarity
- Hitlist - ranked names/accessions, scores
- Alignments - EVERY used query/hit
- Parameters - list of parameters used/selected
17. The Hitlist Scores
- Bit score = statistical significance
- Less than 50 is considered bad
- E-value = number of alignments this good expected by chance (lower is better)
- Keep in mind: these scores apply ONLY to the OFTEN-SHORT alignments used! Read Box p. 229 and make sure you actually LOOK at your alignments!
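Bit score and E-value are linked: for a bit score S', the expected number of chance hits is roughly E = m * n * 2^(-S'), where m is the query length and n is the database length. The sketch below uses raw lengths, which is a simplification (BLAST uses adjusted "effective" lengths); the function name and the example sizes are hypothetical.

```python
def expected_chance_hits(bit_score, query_length, database_length):
    """Approximate E-value: E = m * n * 2 ** (-bit_score).

    Uses raw lengths; real BLAST reports are based on effective lengths,
    so reported E-values will differ somewhat.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: 300-residue query against a 1e9-residue database.
for score in (30, 50, 100):
    e_value = expected_chance_hits(score, 300, 1_000_000_000)
    print(f"bit score {score}: E ~ {e_value:.2e}")
```

Each extra bit halves E, and the same bit score means less in a larger database, which is one reason to look at the alignment itself and not only the number.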
18. BLAST Alignments
- Identity = within the aligned region, % the same
- 25% or higher is considered good
- Length = the actual length of the alignment
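Identity and length are both defined over the aligned region only, not the whole sequence. A minimal sketch, assuming two aligned strings of equal length with "-" for gaps (the function name and sequences are hypothetical):

```python
def percent_identity(aligned_query, aligned_hit, gap_char="-"):
    """Percent of alignment columns where both sequences have the same residue."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for q, h in zip(aligned_query, aligned_hit)
        if q == h and q != gap_char
    )
    # The denominator is the alignment length, gaps included.
    return 100.0 * identical / len(aligned_query)


print(round(percent_identity("ACGT-A", "ACCTTA"), 1))  # 66.7
```

A high identity over a very short length can still be uninformative, so read the two numbers together.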
19. BLASTing DNA
- BLASTn: DNA vs. DNA db (70% similar)
- TBLASTx: translated DNA vs. translated DNA db (families)
- BLASTx: translated DNA vs. protein db (CDS)
BLASTing Proteins
- BLASTp: aa sequence vs. protein db (domains)
- TBLASTn: aa sequence vs. translated DNA db (families)
- Families: the goal is research and discovery of new relationships.
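The choice of program follows from three things: the query type, the database type, and whether nucleotides are compared as translated protein. The lookup below restates the list above as a table; the function name and the `"dna"`/`"protein"` labels are hypothetical conveniences, not part of any BLAST interface.

```python
# (query type, database type, compare as translated protein?) -> program
BLAST_PROGRAMS = {
    ("dna", "dna", False): "blastn",
    ("dna", "dna", True): "tblastx",
    ("dna", "protein", True): "blastx",
    ("protein", "protein", False): "blastp",
    ("protein", "dna", True): "tblastn",
}


def choose_blast_program(query_type, db_type, translated=False):
    """Return the BLAST program name for a query/database combination."""
    key = (query_type, db_type, translated)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST program for combination {key}")
    return BLAST_PROGRAMS[key]


print(choose_blast_program("dna", "protein", translated=True))  # blastx
print(choose_blast_program("protein", "dna", translated=True))  # tblastn
```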
20. Changing BLAST parameters
- Direct searching to some db or db members
- MASK some regions (conserved, repetitive)
- Change gap penalties
- Change substitution matrix
- All of these changes alter the output significantly, and it is IMPORTANT to record changes in procedures. In general, you should NOT change BLAST parameters in this course!