Gapped BLAST and PSI-BLAST

Provided by: changk
Transcript and Presenter's Notes

Title: Gapped BLAST and PSI-BLAST


1
Gapped BLAST and PSI-BLAST
  • Altschul et al.
  • Presenter ??? ???

2
Outline
  • BLAST 1.0 background (from lecture slides)
  • BLAST 2.0
  • Gapped BLAST
  • PSI-BLAST
  • Demonstration

3
Statistical preliminaries
  • Pi: background probability with which amino acid i
    occurs at random at any position
  • E: expected number of distinct HSPs with normalized
    score at least S
  • sij: score for aligning the pair of letters (i, j)
  • qij: target frequency of the aligned pair of letters
    (i, j) within HSPs (high-scoring segment pairs)
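
As a rough illustration of how these quantities relate, the sketch below assumes the standard local-alignment statistics: a raw score S is normalized with two scale parameters (called lam and K here), E follows from the normalized score and the search-space size N, and sij is the log-odds of qij against Pi*Pj. The numeric values are hypothetical and chosen only for the demonstration.

```python
import math

def normalized_score(raw_score, lam, K):
    """Convert a raw HSP score S into a normalized (bit) score S'."""
    return (lam * raw_score - math.log(K)) / math.log(2)

def expected_hsps(norm_score, search_space):
    """E: expected number of distinct HSPs with normalized score >= S'."""
    return search_space / (2.0 ** norm_score)

def substitution_score(q_ij, p_i, p_j, lam):
    """s_ij implied by target frequency q_ij and backgrounds P_i, P_j."""
    return math.log(q_ij / (p_i * p_j)) / lam

# Hypothetical parameters, for illustration only.
lam, K = 0.3, 0.1
N = 250 * 1_000_000  # query length times database length
s_norm = normalized_score(100, lam, K)
print(s_norm, expected_hsps(s_norm, N))
```

A pair (i, j) that occurs in HSPs exactly as often as chance predicts (qij = Pi*Pj) gets a score of zero; pairs seen more often than chance score positive.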

4
Outline
  • BLAST 1.0 background (from lecture slides)
  • BLAST 2.0
  • Gapped BLAST
  • PSI-BLAST

5
BLAST
  • Basic Local Alignment Search Tool (by Altschul,
    Gish, Miller, Myers and Lipman)
  • The central idea of the BLAST algorithm is that a
    statistically significant alignment is likely to
    contain a high-scoring pair of aligned words.

6
The maximal segment pair measure
  • A maximal segment pair (MSP) is defined to be the
    highest scoring pair of identical-length segments
    chosen from 2 sequences. (For DNA: identities
    +5, mismatches -4.)
  • The MSP score may be computed in time
    proportional to the product of the two sequence
    lengths. (How?) For database searches, an exact
    procedure is too time consuming.
  • BLAST heuristically attempts to calculate the MSP
    score.
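
One possible answer to the "(How?)" above is a dynamic program that walks every diagonal and keeps the best running segment score. The sketch below assumes the DNA scores quoted on this slide (+5 for an identity, -4 for a mismatch); the function name is made up for this example.

```python
def msp_score(a, b, match=5, mismatch=-4):
    """Score of the best ungapped segment pair, in O(len(a) * len(b)) time."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Either extend the segment ending on the previous diagonal
            # cell, or start a new segment (score floor of 0).
            cur[j] = max(0, prev[j - 1] + s)
            best = max(best, cur[j])
        prev = cur
    return best

print(msp_score("AGATCGAT", "GATC"))
```

Every cell is visited once, which is where the product of the two lengths comes from; against a whole database that cost is what BLAST's heuristic avoids.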

7
BLAST
  • Build the hash table for Sequence A.
  • Scan Sequence B for hits.
  • Extend hits.

8
BLAST
Step 1: Build the hash table for Sequence A.
(3-tuple example)
For protein sequences: Seq. A = ELVIS
  Add xyz to the hash table if Score(xyz, ELV) >= T
  Add xyz to the hash table if Score(xyz, LVI) >= T
  Add xyz to the hash table if Score(xyz, VIS) >= T
For DNA sequences: Seq. A = AGATCGAT
                   (pos.)   12345678
  AAA
  AAC
  ..
  AGA  1
  ..
  ATC  3
  ..
  CGA  5
  ..
  GAT  2 6
  ..
  TCG  4
  ..
  TTT

The higher T, the lower the sensitivity, but the faster the search.
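
The DNA case of Step 1, together with the scan of Step 2, can be sketched as follows. This is a minimal illustration using exact 3-letter words only; the protein case, where neighboring words scoring at least T are also added, is not shown. Function names are invented for this example.

```python
from collections import defaultdict

def build_word_table(seq_a, k=3):
    """Map every k-letter word of seq_a to its 1-based start positions."""
    table = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        table[seq_a[i:i + k]].append(i + 1)
    return table

def scan_for_hits(table, seq_b, k=3):
    """Return (position in A, position in B) for every exact word hit."""
    hits = []
    for j in range(len(seq_b) - k + 1):
        for i in table.get(seq_b[j:j + k], []):
            hits.append((i, j + 1))
    return hits

table = build_word_table("AGATCGAT")
print(dict(table))
print(scan_for_hits(table, "CGATT"))
```

For AGATCGAT the table holds the same positions as the slide: GAT at 2 and 6, AGA at 1, ATC at 3, TCG at 4, CGA at 5.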
9
BLAST
Step 2: Scan Sequence B for hits.
10
BLAST
Step 2: Scan Sequence B for hits.
Step 3: Extend hits.
BLAST 2.0 reduces the time spent in extension, and
considers gapped alignments.
Terminate the extension when its score fades
away, that is, when we reach a segment pair
whose score falls a certain distance below the
best score found for shorter extensions.
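
The termination rule above can be sketched as an ungapped extension with a drop-off parameter. The sketch extends to the right only and assumes the +5/-4 DNA scores from the MSP slide; the name x_drop for the "certain distance" is an assumption of this example.

```python
def extend_right(a, b, i, j, x_drop, match=5, mismatch=-4):
    """Extend rightward from a[i], b[j] (0-based).

    Returns (best score, length of the best extension).
    """
    score = best = 0
    best_len = 0
    n = 0
    while i + n < len(a) and j + n < len(b):
        score += match if a[i + n] == b[j + n] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > x_drop:
            break  # score has faded too far below the best seen so far
    return best, best_len

print(extend_right("GATCAAAA", "GATCTTTT", 0, 0, x_drop=10))
```

A full extension would run the same loop leftward from the hit and add the two best scores.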
11
Outline
  • BLAST 1.0 background (from lecture slides)
  • BLAST 2.0
  • Gapped BLAST
  • PSI-BLAST

12
Two-Hit Method
  • BLAST 1.0
  • The extension step accounts for about 90% of the
    total time
  • Observations
  • An HSP of interest is much longer than a single word
    pair
  • It may therefore entail multiple hits on the same
    diagonal, within a short distance of one another
  • Invoke an extension only when two non-overlapping
    hits are found within distance A of each other on
    the same diagonal

13
Demonstration
  • Recent_i: the most recent hit found on the i-th
    diagonal (its coordinate is always increasing)
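
A minimal sketch of the two-hit rule with a per-diagonal record of the most recent hit. It assumes hits arrive sorted by their position in Sequence B, uses a dictionary where a real implementation would likely use an array indexed by diagonal, and treats "within distance A" as "at most A"; those choices, the default values, and the function name are assumptions of this example.

```python
def two_hit_triggers(hits, w=3, A=40):
    """Return the hits that would trigger an extension.

    hits: (i, j) word hits, 0-based starts in A and B, sorted by j.
    w: word length, A: maximum distance between the two hits.
    """
    recent = {}  # diagonal -> B coordinate of the most recent hit
    triggers = []
    for i, j in hits:
        d = j - i
        last = recent.get(d)
        if last is not None:
            dist = j - last
            if dist < w:
                continue  # overlaps the most recent hit: ignore it
            if dist <= A:
                triggers.append((i, j))
        recent[d] = j
    return triggers

hits = [(0, 0), (1, 1), (10, 2), (3, 3), (60, 60)]
print(two_hit_triggers(hits))
```

Here only (3, 3) triggers: (1, 1) overlaps the first hit, (10, 2) is alone on its diagonal, and (60, 60) is too far from the previous hit on diagonal 0.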

14
Discussion
  • T must be lowered to retain sensitivity
  • This yields more single hits, but the majority are
    dismissed without extension
  • Speed
  • About twice as fast as the one-hit method
  • Sensitivity
  • Almost the same

15
Outline
  • BLAST 1.0 background (from lecture slides)
  • BLAST 2.0
  • Gapped BLAST
  • PSI-BLAST

16
Gapped BLAST
  • Original BLAST finds several distinct HSPs
  • All HSPs related to one alignment had to be found
  • Now
  • Find only one HSP, to serve as the seed, then use
    2-hit
  • T can be raised, so the search is faster
  • Finding all HSPs vs. finding one HSP for one optimal
    alignment
  • For example, the result should be found with
    probability > 0.95; p = probability of missing an HSP
  • Original, with 2 HSPs: (1-p)(1-p) > 0.95, so p < 0.025
  • Now: p^2 < 0.05, so p < 0.22
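
The arithmetic behind the two bounds can be checked directly. The sketch assumes, as the slide does, an alignment containing two HSPs that are each missed independently with probability p.

```python
import math

target = 0.95  # required chance of finding the alignment

# Original BLAST: both HSPs must be found, (1 - p)^2 >= target.
p_both = 1 - math.sqrt(target)

# Gapped BLAST: finding either HSP suffices, 1 - p^2 >= target.
p_either = math.sqrt(1 - target)

print(round(p_both, 3), round(p_either, 3))
```

The tolerable miss probability per HSP rises roughly ninefold, which is what allows T to be raised.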

17
Gapped BLAST (contd)
  • A gapped extension takes much longer to execute
    than an ungapped extension, but by performing
    very few of them the fraction of the total time
    could be kept low.
  • Trigger a gapped extension for any HSP exceeding
    score Sg
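
To show what a gapped extension buys, the sketch below scores a gapped local alignment with a plain Smith-Waterman recurrence and a linear gap penalty. This is a stand-in, not the seeded, drop-off-limited extension that gapped BLAST performs; the scores (+5, -4, gap -6) and the function name are assumptions of this example.

```python
def gapped_local_score(a, b, match=5, mismatch=-4, gap=-6):
    """Best gapped local alignment score (Smith-Waterman, linear gaps)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0,
                         prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] + gap,     # gap in b
                         cur[j - 1] + gap)  # gap in a
            best = max(best, cur[j])
        prev = cur
    return best

print(gapped_local_score("ACGTACGT", "ACGTTACGT"))
```

For this pair the best ungapped segment scores 25, while bridging the single inserted T with one gap joins both ACGT blocks into one alignment with a higher score.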

18
Example
  • Original BLAST locates only the first and the
    last ungapped alignments, with a combined E-value
    more than 50 times greater than that of the gapped
    alignment

19
Outline
  • BLAST 1.0 background (from lecture slides)
  • BLAST 2.0
  • Gapped BLAST
  • PSI-BLAST

20
PSI-BLAST
  • Position-specific score matrices
  • Versus substitution matrices
  • Used in a search in the ordinary way
  • Iterated, using position-specific score matrices
  • For each BLAST run
  • The matrix is constructed automatically from the
    output
  • This matrix is used in place of the query for the
    next run
  • For a protein query of length L
  • The position-specific matrix is L × 20
  • Benefits
  • Better at detecting weak relationships

21
Construct Position-specific matrix
  • Construct multiple alignment M from the output
  • For every column of M
  • Find the reduced alignment Mc for column C
  • Calculate scores in column C of the
    position-specific matrix

22
Construct multiple alignment M
  • Collect the sequence segments from the output
  • Keep only those with an E-value below a threshold
    (why?)
  • Identical sequences are dropped
  • Columns of a pairwise alignment in which a gap is
    inserted into the query are ignored
  • The multiple alignment M has the same length
    (number of columns) as the query
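
The projection of pairwise alignments onto query coordinates can be sketched as below. The input layout (a 0-based query start plus two gapped strings) and the use of '.' for uncovered positions are assumptions of this example; E-value filtering and removal of identical sequences are left out.

```python
def build_query_anchored_rows(query_len, alignments):
    """Project pairwise alignments onto query coordinates.

    alignments: (query_start, aligned_query, aligned_subject), with a
    0-based query_start and '-' for gaps. Returns one row of length
    query_len per alignment; '.' marks positions it does not cover.
    """
    rows = []
    for start, q_aln, s_aln in alignments:
        row = ["."] * query_len
        pos = start
        for q_ch, s_ch in zip(q_aln, s_aln):
            if q_ch == "-":
                continue     # gap inserted into the query: column ignored
            row[pos] = s_ch  # may be '-' if the subject has a gap here
            pos += 1
        rows.append("".join(row))
    return rows

# Query ELVIS (length 5) with two hypothetical pairwise alignments.
print(build_query_anchored_rows(5, [(1, "LV-IS", "LIKIT"),
                                    (0, "ELVI", "E-VI")]))
```

Because columns that would lengthen the query are skipped, every row has exactly as many columns as the query.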

23
Construct multiple alignment M
24
Calculate position-specific matrix score
  • The scores of a given alignment column should
    depend not only on the residues appearing in that
    column
  • But on those in other columns as well

25
Find reduced Mc of column C
  • R: the set of sequences that contribute a residue
    in column C
  • Mc: R restricted to those columns of M in which all
    the sequences of R are represented

26
Calculate scores in column C of the
position-specific matrix
  • Depends on the observed residue frequencies fi
    and on the number of independent observations in
    column C (Nc)
  • Score: log(Qi/Pi)
  • Qi: estimated probability for residue i to be
    found in column C
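
The slides do not give the formula for Qi. The sketch below assumes a data-dependent pseudocount scheme: pseudocount frequencies gi = sum over j of fj * qij / Pj, blended with the observed fi using weights alpha = Nc - 1 and beta = 10. The three-letter alphabet and the q table are toy values invented for this example, and the fi are plain unweighted counts.

```python
import math

ALPHABET = "ABC"                    # toy alphabet, not the 20 amino acids
Q_PAIR = [[0.30, 0.05, 0.05],       # hypothetical target frequencies q_ij
          [0.05, 0.20, 0.05],
          [0.05, 0.05, 0.20]]
P = [sum(row) for row in Q_PAIR]    # background P_i = sum over j of q_ij

def column_scores(column, n_c, beta=10.0):
    """Scores log(Q_i / P_i) for one alignment column (a residue string)."""
    k = len(ALPHABET)
    f = [column.count(ch) / len(column) for ch in ALPHABET]  # observed f_i
    # Pseudocount frequencies g_i derived from the observed residues.
    g = [sum(f[j] * Q_PAIR[i][j] / P[j] for j in range(k)) for i in range(k)]
    alpha = n_c - 1
    Q = [(alpha * f[i] + beta * g[i]) / (alpha + beta) for i in range(k)]
    return [math.log(Q[i] / P[i]) for i in range(k)]

print(column_scores("AAAB", n_c=2))
```

With few independent observations (small Nc) the pseudocounts dominate and the column behaves much like an ordinary substitution-matrix row; as Nc grows, the observed frequencies take over.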

27
BLAST applied to position-specific matrices
  • Scale the matrix scores to match the scale of sij

28
  • Thank you
  • Any questions?

29
Outline
  • BLAST 1.0 background (from lecture slides)
  • BLAST 2.0
  • Gapped BLAST
  • PSI-BLAST
  • Demonstration