BCB 444/544 - PowerPoint PPT Presentation

BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST


1
BCB 444/544
  • Lab 3
  • BLAST
  • Scoring Matrices & Alignment Statistics
  • Sept 6

2
Exhaustive vs Heuristic Methods
  • Exhaustive - considers every possible solution
      • guaranteed to give the best answer
      • (identifies the optimal solution)
      • can be very time/space intensive!
      • e.g., Dynamic Programming
      • as in the Smith-Waterman algorithm
  • Heuristic - does NOT test every possibility
      • no guarantee that the answer is the best
      • (but often can identify the optimal solution)
      • sacrifices accuracy (potentially) for speed
      • uses "rules of thumb" or "shortcuts"
      • e.g., BLAST & FASTA
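A minimal sketch of the exhaustive approach: the Smith-Waterman local alignment score computed by dynamic programming. It assumes a simple scoring scheme (match +2, mismatch -1, linear gap -2) chosen for illustration instead of a substitution matrix. Every cell of the (m+1) x (n+1) table is filled, which is why the result is guaranteed optimal but costs O(mn) time and space.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman).

    The scoring values are illustrative assumptions, not BLOSUM62,
    and the gap penalty is linear (a fixed cost per gap position).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    # Every cell is filled: O(len(a) * len(b)) time and space
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap     # a[i-1] aligned to a gap
            left = H[i][j - 1] + gap   # b[j-1] aligned to a gap
            # The 0 lets a local alignment start fresh at any cell
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("TTACGTAA", "GGACGTCC"))  # prints 8 (shared "ACGT")
```

A heuristic such as BLAST avoids filling this whole table by only looking near short exact word hits.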

3
Today's Lab: focus on BLAST (Basic Local Alignment Search Tool)
  • STEPS:
      • Create a list of every possible "word" (e.g., 3-11
        letters) from the query sequence
      • Search the database to identify sequences that
        contain matching words
      • Score the match of each word with the sequence, using a
        substitution matrix
      • Extend the match (seed) in both directions, while
        calculating the alignment score at each step
      • Continue extension until the score drops below a
        threshold (due to mismatches)
  • A contiguous aligned segment pair (no gaps) is
    called a High Scoring Segment Pair (HSP)
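The steps above can be sketched as a toy Python program. It makes several simplifying assumptions: it uses exact word matches only (real BLAST also scores similar "neighborhood" words with the substitution matrix), a +1/-1 identity score instead of BLOSUM62, and it stops extending in each direction once the running score falls more than a fixed amount below the best score seen so far. The function names and the `x_drop` parameter are hypothetical, not NCBI's.

```python
def words(seq: str, w: int) -> dict[str, list[int]]:
    """Step 1: map every length-w word of seq to its start positions."""
    index: dict[str, list[int]] = {}
    for i in range(len(seq) - w + 1):
        index.setdefault(seq[i:i + w], []).append(i)
    return index


def score(x: str, y: str) -> int:
    # Assumed toy scoring (+1 identity, -1 otherwise) instead of BLOSUM62
    return 1 if x == y else -1


def extend(query: str, subject: str, qi: int, si: int, w: int,
           x_drop: int = 2) -> tuple[int, int, int, int]:
    """Extend a seed without gaps; return (score, q_start, q_end, s_start).

    Extension in each direction stops once the running score falls more
    than x_drop below the best score seen so far (q_end is exclusive).
    """
    best = sum(score(query[qi + k], subject[si + k]) for k in range(w))
    # Extend to the right of the seed
    run, best_r, right = best, 0, 0
    while qi + w + right < len(query) and si + w + right < len(subject):
        run += score(query[qi + w + right], subject[si + w + right])
        right += 1
        if run > best:
            best, best_r = run, right
        elif best - run > x_drop:
            break
    # Extend to the left, starting from the best right-hand end
    run, best_l, left = best, 0, 0
    while qi - left - 1 >= 0 and si - left - 1 >= 0:
        run += score(query[qi - left - 1], subject[si - left - 1])
        left += 1
        if run > best:
            best, best_l = run, left
        elif best - run > x_drop:
            break
    return best, qi - best_l, qi + w + best_r, si - best_l


def blast_like(query: str, database: dict[str, str], w: int = 3):
    """Find exact word hits in each subject and extend them into HSPs."""
    q_words = words(query, w)
    hsps = set()  # seeds on the same diagonal can yield the same HSP
    for name, subject in database.items():
        for sj in range(len(subject) - w + 1):
            for qi in q_words.get(subject[sj:sj + w], []):
                hsps.add((name,) + extend(query, subject, qi, sj, w))
    # Highest-scoring HSPs first
    return sorted(hsps, key=lambda h: -h[1])


print(blast_like("MKTAYIAK", {"s1": "GGMKTAYIAKGG", "s2": "PPPPPP"}))
# [('s1', 8, 0, 8, 2)]
```

Because all six word hits lie on the same diagonal, they extend to the same ungapped HSP, which is why the sketch collects results in a set.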

4
Today's Lab: focus on BLAST (Basic Local Alignment Search Tool)
  • Results?
  • Original version of BLAST?
      • List of HSPs (Maximum Scoring Pairs)
  • More recent, improved version of BLAST?
      • Allows gaps: Gapped Alignment
      • How? Allows the score to drop below the threshold
        (but only temporarily)

5
BLAST - a few details
  • Developed by Stephen Altschul and colleagues at NCBI in 1990
  • Word length?
      • Typically 3 aa for protein sequences
      • 11 nt for DNA sequences
  • Substitution matrix?
      • Default is BLOSUM62
      • Can be changed under Algorithm Parameters
      • Choose other BLOSUM or PAM matrices
  • Stop Extension Threshold?
      • Typically 22 for proteins
      • 20 for DNA
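A quick calculation helps explain why the default word is shorter for proteins than for DNA. The sketch below assumes uniform, independent residue frequencies, which real sequences do not have, so the numbers are only rough guides.

```python
def possible_words(alphabet_size: int, w: int) -> int:
    """Number of distinct words of length w over an alphabet."""
    return alphabet_size ** w


def chance_match(alphabet_size: int, w: int) -> float:
    """Probability that two random words of length w are identical,
    assuming uniform, independent residues (a simplification)."""
    return 1 / possible_words(alphabet_size, w)


for label, size, w in [("protein", 20, 3), ("DNA", 4, 3), ("DNA", 4, 11)]:
    print(f"{label:7s} w={w:2d}: {possible_words(size, w):>9,d} possible words, "
          f"P(chance match) = {chance_match(size, w):.1e}")
```

Under this simplification a 3-nt word would match by chance about once in 64 comparisons, versus about once in 8,000 for a 3-aa word, so nucleotide searches need longer words to keep random seed hits rare.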

6
BLAST - a few more details
  • BLAST is a family of programs with several
    "variants"
      • BLASTN -
      • BLASTP -
      • BLASTX -
      • TBLASTN -
      • TBLASTX -
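  • Statistical Significance?
      • E-value: $E = m \times n \times P$
      • m = total number of residues in the database
      • n = number of residues in the query sequence
      • P = probability that an HSP is the result of random
        chance
      • The lower the E-value, the less likely the HSP is to result
        from random chance, and thus the higher its significance
      • Bit Score (S'): the raw score normalized using the
        statistical parameters of the scoring system (λ and K), so
        it does not depend on database size and can be compared
        across searches that use different matrices
  • Low Complexity Masking - mask low-complexity regions (e.g.,
    simple repeats) that confound scoring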
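A small sketch of the E-value arithmetic. The first function follows the slide's form E = m × n × P. The others use the standard bit-score relations S' = (λS - ln K) / ln 2 and E = m × n × 2^(-S'), which are equivalent to E = K × m × n × e^(-λS). The λ and K values in the usage line (0.267 and 0.041, commonly quoted for BLOSUM62 with gap costs 11/1) are assumptions for illustration; BLAST lists the Lambda and K it actually used in its output, and it also uses length-corrected (effective) m and n, which this sketch ignores.

```python
import math


def evalue_from_p(m: int, n: int, p: float) -> float:
    """E = m * n * P, as written on the slide."""
    return m * n * p


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score S into bits: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(m: int, n: int, bits: float) -> float:
    """E = m * n * 2^(-S'); m = database residues, n = query residues."""
    return m * n * 2.0 ** (-bits)


# Illustrative (assumed) parameters: BLOSUM62, gap costs 11/1
LAMBDA, K = 0.267, 0.041
s_bits = bit_score(100, LAMBDA, K)
print(round(s_bits, 1), evalue_from_bits(m=10_000_000, n=250, bits=s_bits))
```

Note that the bit score depends only on the raw score and the scoring system, while the database size m enters only at the E-value step.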

7
"Scoring" or "Substitution" Matrices
  • 2 major types for amino acids: PAM & BLOSUM
  • PAM: Point Accepted Mutation
      • relies on an "evolutionary model" based on
        observed differences in alignments of closely
        related proteins
  • BLOSUM: BLOcks SUbstitution Matrix
      • based on aa substitutions observed in blocks
        of conserved sequences within evolutionarily
        divergent proteins

8
PAM Matrix
  • PAM: Point Accepted Mutation
      • relies on an "evolutionary model" based on observed
        differences in closely related proteins
      • Model includes a defined rate for each type of
        sequence change
      • Suffix number (n) reflects the amount of "time"
        passed: the expected changes if n accepted point
        mutations had occurred per 100 amino acids
      • PAM1 - for less divergent sequences (shorter
        time)
      • PAM250 - for more divergent sequences (longer
        time)
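In the classic PAM construction, the "time" in the suffix comes from extrapolation: a PAM1 mutation probability matrix (about 1 accepted change per 100 residues) is multiplied by itself n times to model PAMn. The sketch below illustrates this with a made-up 3-letter alphabet; the probabilities are hypothetical, not Dayhoff's values, and real PAM scoring matrices are then converted to log-odds scores.

```python
Matrix = list[list[float]]


def mat_mult(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product (no external libraries)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def pam_n(pam1: Matrix, n: int) -> Matrix:
    """Extrapolate PAM1 to PAMn by raising it to the n-th power."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Hypothetical 3-letter PAM1: row i gives the probabilities that residue i
# becomes each residue; 99% stay unchanged (1 change per 100 residues)
PAM1 = [[0.990, 0.006, 0.004],
        [0.005, 0.990, 0.005],
        [0.003, 0.007, 0.990]]

for n in (1, 100, 250):
    m = pam_n(PAM1, n)
    # Diagonal = probability that a residue is unchanged after n PAM units
    print(n, [round(m[i][i], 3) for i in range(3)])
```

The shrinking diagonal shows why higher PAM numbers suit more divergent sequences: more of the probability mass has moved to substitutions.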

9
BLOSUM Matrix
  • BLOSUM: BLOcks SUbstitution Matrix
      • based on aa substitutions observed in blocks
        of conserved sequences within evolutionarily
        divergent proteins
      • Doesn't rely on a specific evolutionary model
      • Suffix number (n) reflects expected similarity:
        sequences sharing at least n% identity were clustered
        in the blocks from which the matrix was generated
      • BLOSUM45 - for more divergent sequences
      • BLOSUM62 - for less divergent sequences

10
BLOSUM62 Substitution Matrix
  • s(a,b) corresponds to the score of aligning character
    a with character b
  • Match scores are often calculated
    based on the frequency of substitutions observed in
    alignments of related sequences
      • (more details later)
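As a preview of those details, BLOSUM-style scores are log-odds ratios: s(a,b) compares how often a and b are observed aligned (q_ab) with how often they would pair by chance given their background frequencies (p_a, p_b). BLOSUM62 reports scores in half-bit units, rounded to integers. The function name and the frequencies in the tests are invented for illustration.

```python
import math


def blosum_style_score(q_ab: float, p_a: float, p_b: float,
                       same: bool, scale: float = 2.0) -> int:
    """Log-odds score in 1/scale-bit units (scale=2 gives half bits).

    q_ab: observed frequency of the unordered pair a-b in aligned blocks
    p_a, p_b: background frequencies of residues a and b
    same: True when a == b
    """
    # Expected pair frequency by chance; a-b and b-a are counted together
    expected = p_a * p_a if same else 2 * p_a * p_b
    return round(scale * math.log2(q_ab / expected))
```

A positive score means the pair is seen more often than chance predicts (typical for identities and conservative substitutions), a negative score less often, and zero means exactly as often as expected.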

11
(No Transcript)
12
Affine Gap Penalty Functions
  • Affine Gap Penalties (Differential Gap Penalties):
    used to reflect cost differences between opening a gap
    and extending an existing gap
  • Total Gap Penalty is a linear function of gap
    length
      • $W = \gamma + \delta \times (k - 1)$
      • where $\gamma$ = gap opening penalty
      • $\delta$ = gap extension penalty
      • $k$ = length of gap
  • Sometimes a Constant Gap Penalty is used, but it
    is usually less realistic than the Affine Gap
    Penalty

Alignment with affine gap penalties can also be solved in O(nm) time using DP
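A minimal sketch of how affine gaps fit into O(nm) dynamic programming, following Gotoh's three-table idea: besides the main table, two extra tables track alignments that currently end in a gap, so opening a gap costs γ and each further position costs δ, giving a total W = γ + δ(k - 1) for a gap of length k. The symbol names, the global (end-to-end) alignment mode, and the default scores are assumptions for illustration.

```python
NEG_INF = float("-inf")


def affine_gap_penalty(k: int, gamma: float, delta: float) -> float:
    """Total penalty W for a gap of length k >= 1: W = gamma + delta*(k-1)."""
    return gamma + delta * (k - 1)


def global_affine_score(a: str, b: str, match: float = 2, mismatch: float = -1,
                        gamma: float = 4, delta: float = 1) -> float:
    """Best global alignment score with affine gaps, in O(len(a)*len(b)) time.

    M[i][j]: best score ending with a[i-1] aligned to b[j-1]
    X[i][j]: best score ending with a[i-1] aligned to a gap
    Y[i][j]: best score ending with b[j-1] aligned to a gap
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    # A leading gap of length i (or j) costs W = gamma + delta*(len - 1)
    for i in range(1, n + 1):
        X[i][0] = -affine_gap_penalty(i, gamma, delta)
    for j in range(1, m + 1):
        Y[0][j] = -affine_gap_penalty(j, gamma, delta)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Open a new gap (cost gamma) or extend an existing one (cost delta)
            X[i][j] = max(M[i - 1][j] - gamma, Y[i - 1][j] - gamma,
                          X[i - 1][j] - delta)
            Y[i][j] = max(M[i][j - 1] - gamma, X[i][j - 1] - gamma,
                          Y[i][j - 1] - delta)
    return max(M[n][m], X[n][m], Y[n][m])


print(global_affine_score("ACGTACGT", "ACGT"))
```

With these defaults, one gap of length 4 costs 4 + 3 = 7 while two gaps of length 2 cost 10, so the affine scheme prefers a single longer gap. A linear penalty is the special case γ = δ.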