Homology Search Tools - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Homology Search Tools


1
Homology Search Tools
  • Kun-Mao Chao
  • Department of Computer Science and Information
    Engineering
  • National Taiwan University, Taiwan
  • WWW: http://www.csie.ntu.edu.tw/kmchao

2
Homology Search Tools
  • Smith-Waterman (Smith and Waterman, 1981;
    Waterman and Eggert, 1987)
  • FASTA (Wilbur and Lipman, 1983; Lipman and
    Pearson, 1985)
  • BLAST (Altschul et al., 1990; Altschul et al.,
    1997)
  • BLAT (Kent, 2002)
  • PatternHunter (Li et al., 2004)

3
Finding Exact Word Matches
  • Hash Tables
  • Suffix Trees
  • Suffix Arrays

4
Hash Tables
5
Suffix Trees (I)
6
Suffix Trees (II)
7
Suffix Arrays
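As a rough illustration of exact word matching with a suffix array, the sketch below sorts all suffixes of a sequence and binary-searches for a word. The function names are hypothetical, positions are 0-based, and the naive sort is chosen for brevity rather than speed.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order."""
    # Naive construction: sorting the suffixes directly is fine for a demo,
    # but real tools use faster dedicated algorithms.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, word):
    """Return the sorted 0-based start positions where word occurs in text."""
    k = len(word)
    # Lower bound: first suffix whose k-letter prefix is >= word.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < word:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose k-letter prefix is > word.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= word:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "AGATCGAT"
sa = build_suffix_array(text)
print(sa)                                 # [0, 6, 2, 4, 5, 1, 7, 3]
print(find_occurrences(text, sa, "GAT"))  # [1, 5]
```

All suffixes that begin with the same word sit next to each other in the sorted order, so two binary searches bound the whole block of matches.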
8
FASTA
  1. Find runs of identities, and identify regions
    with the highest density of identities.
  2. Re-score using PAM matrix, and keep top scoring
    segments.
  3. Eliminate segments that are unlikely to be part
    of the alignment.
  4. Optimize the alignment in a band.

9
FASTA
Step 1: Find runs of identities, and identify
regions with the highest density of identities.
(Figure axes: Sequence A, Sequence B)
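A minimal sketch of Step 1, assuming identities are found as shared k-letter words and grouped by diagonal (d = j - i). The word length k, the function names, and the ranking rule are illustrative assumptions, not FASTA's actual code.

```python
from collections import defaultdict


def diagonal_hit_counts(seq_a, seq_b, k=2):
    """Count shared k-letter words on each diagonal d = j - i."""
    # Index every k-letter word of seq_a by its start position i.
    positions = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        positions[seq_a[i:i + k]].append(i)
    # Scan seq_b; each shared word at (i, j) votes for diagonal j - i.
    counts = defaultdict(int)
    for j in range(len(seq_b) - k + 1):
        for i in positions.get(seq_b[j:j + k], ()):
            counts[j - i] += 1
    return dict(counts)


def best_diagonals(counts, top=10):
    """Return the diagonals with the most word matches, densest first."""
    return sorted(counts, key=lambda d: (-counts[d], d))[:top]


counts = diagonal_hit_counts("GATTACA", "TTGATTACA", k=2)
print(counts)                         # {-2: 1, 2: 6}
print(best_diagonals(counts, top=1))  # [2]
```

Diagonals that collect many word matches are the candidate regions passed to the re-scoring step.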
10
FASTA
Step 2: Re-score using PAM matrix, and keep top
scoring segments.
11
FASTA
Step 3: Eliminate segments that are unlikely to
be part of the alignment.
12
FASTA
Step 4: Optimize the alignment in a band.
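A sketch of alignment in a band, assuming a local (Smith-Waterman style) recurrence restricted to cells within a fixed distance of a chosen diagonal. The +5/-4 scores, the linear gap penalty of -6, and the parameter names are assumptions for illustration. FASTA itself scores proteins with a PAM matrix.

```python
NEG_INF = float("-inf")


def banded_local_score(a, b, center=0, width=1, match=5, mismatch=-4, gap=-6):
    """Best local alignment score using only cells with |(j - i) - center| <= width."""
    best = 0
    prev = {}  # scores of row i - 1, keyed by column j (in-band cells only)
    for i in range(1, len(a) + 1):
        cur = {}
        lo = max(1, i + center - width)
        hi = min(len(b), i + center + width)
        for j in range(lo, hi + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                               # start a new local alignment
                prev.get(j - 1, 0) + s,          # diagonal move (0 on the border)
                prev.get(j, NEG_INF) + gap,      # gap: skip a letter of a
                cur.get(j - 1, NEG_INF) + gap,   # gap: skip a letter of b
            )
            best = max(best, cur[j])
        prev = cur
    return best


a, b = "ACGTACGT", "ACGTTACGT"
print(banded_local_score(a, b, center=0, width=0))  # 20
print(banded_local_score(a, b, center=0, width=1))  # 34
```

With width 0 only the main diagonal is scored, so the inserted T cannot be bridged. Widening the band by one diagonal lets the gapped alignment through while still filling far fewer cells than the full matrix.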
13
BLAST
  • Basic Local Alignment Search Tool (by Altschul,
    Gish, Miller, Myers and Lipman)
  • The central idea of the BLAST algorithm is that a
    statistically significant alignment is likely to
    contain a high-scoring pair of aligned words.

14
The maximal segment pair measure
  • A maximal segment pair (MSP) is defined to be the
    highest-scoring pair of identical-length segments
    chosen from 2 sequences. (For DNA: identities
    +5, mismatches -4.)
  • The MSP score may be computed in time
    proportional to the product of the two sequence
    lengths. (How?) An exact procedure is too time
    consuming.
  • BLAST heuristically attempts to calculate the MSP
    score.
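
One possible answer to the "(How?)" above, sketched under the DNA scores from the slide (+5 / -4): since an MSP has no gaps, the best segment pair ending at (i, j) either extends the best one ending at (i-1, j-1) or starts afresh. Filling that recurrence once per cell takes time proportional to the product of the lengths. The function name is hypothetical.

```python
def msp_score(a, b, match=5, mismatch=-4):
    """Score of the best ungapped segment pair between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # best scores of segments ending in row i - 1
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Extend along the diagonal, or drop back to the empty segment (0).
            cur[j] = max(0, prev[j - 1] + s)
            best = max(best, cur[j])
        prev = cur
    return best


print(msp_score("GATTACA", "GATCACA"))  # 26: six identities, one mismatch
```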

15
A matrix of similarity scores
16
A maximum-scoring segment
17
BLAST
  • Build the hash table for Sequence A.
  • Scan Sequence B for hits.
  • Extend hits.

18
BLAST
Step 1: Build the hash table for Sequence A.
(3-tuple example)
For protein sequences: Seq. A = ELVIS
  • Add xyz to the hash table if Score(xyz, ELV) ≥ T
  • Add xyz to the hash table if Score(xyz, LVI) ≥ T
  • Add xyz to the hash table if Score(xyz, VIS) ≥ T
For DNA sequences: Seq. A = AGATCGAT (positions 12345678)
  AAA
  AAC
  ..
  AGA  1
  ..
  ATC  3
  ..
  CGA  5
  ..
  GAT  2 6
  ..
  TCG  4
  ..
  TTT
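The 3-tuple tables can be sketched as follows. `build_word_index` reproduces the DNA table (1-based positions). `neighborhood_index` follows the protein rule, but `toy_score` (+5 identity, -1 otherwise) is a hypothetical stand-in for a real PAM or BLOSUM word score, and the threshold of 9 is likewise an assumption.

```python
from collections import defaultdict
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_word_index(seq, w=3):
    """Map each w-letter word of seq to its 1-based start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i + 1)
    return dict(index)


def toy_score(x, y, match=5, mismatch=-1):
    """Hypothetical word score; a real search would sum substitution-matrix entries."""
    return sum(match if p == q else mismatch for p, q in zip(x, y))


def neighborhood_index(seq, threshold, w=3, alphabet=AMINO_ACIDS, score=toy_score):
    """Add every word xyz with score(xyz, query word) >= threshold."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        query_word = seq[i:i + w]
        for letters in product(alphabet, repeat=w):
            word = "".join(letters)
            if score(word, query_word) >= threshold:
                index[word].append(i + 1)
    return dict(index)


dna_index = build_word_index("AGATCGAT")
print(dna_index["GAT"])      # [2, 6]
protein_index = neighborhood_index("ELVIS", threshold=9)
print(protein_index["ELV"])  # [1]
```

Under the toy score, a threshold of 9 admits each query word plus every word that differs from it in exactly one position.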

19
BLAST
Step 2: Scan Sequence B for hits.
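A minimal sketch of the scan, assuming exact 3-letter word hits and 0-based positions. The example Sequence B and the function name are made up for illustration.

```python
def find_hits(seq_a, seq_b, w=3):
    """Return (i, j) pairs where seq_a[i:i+w] == seq_b[j:j+w], in scan order."""
    # Hash table for Sequence A: word -> list of 0-based start positions.
    index = {}
    for i in range(len(seq_a) - w + 1):
        index.setdefault(seq_a[i:i + w], []).append(i)
    # Slide a window over Sequence B and look each word up.
    hits = []
    for j in range(len(seq_b) - w + 1):
        for i in index.get(seq_b[j:j + w], ()):
            hits.append((i, j))
    return hits


print(find_hits("AGATCGAT", "TTGATCCA"))  # [(1, 2), (5, 2), (2, 3)]
```

Each lookup takes expected constant time, so the scan is linear in the length of Sequence B plus the number of hits reported.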
20
BLAST
Step 2: Scan Sequence B for hits.
Step 3: Extend hits.
BLAST 2.0 saves the time spent in extension, and
considers gapped alignments.
Terminate if the score of the extension fades
away. (That is, when we reach a segment pair
whose score falls a certain distance below the
best score found for shorter extensions.)
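The termination rule can be sketched as an ungapped extension with a drop-off threshold. The names `extend_hit` and `x_drop`, the threshold value, and the +5/-4 DNA scores are assumptions for illustration.

```python
def _extend_one_way(a, b, p, q, step, x_drop, match, mismatch):
    """Walk along the diagonal from (p, q); return (best gain, letters kept)."""
    best = run = 0
    best_len = length = 0
    while 0 <= p < len(a) and 0 <= q < len(b):
        run += match if a[p] == b[q] else mismatch
        length += 1
        if run > best:
            best, best_len = run, length
        elif best - run > x_drop:
            break  # score has faded too far below the best seen so far
        p += step
        q += step
    return best, best_len


def extend_hit(a, b, i, j, w, x_drop=10, match=5, mismatch=-4):
    """Extend the word hit a[i:i+w] / b[j:j+w]; return (score, a_start, a_end)."""
    seed = sum(match if a[i + k] == b[j + k] else mismatch for k in range(w))
    right, r_len = _extend_one_way(a, b, i + w, j + w, 1, x_drop, match, mismatch)
    left, l_len = _extend_one_way(a, b, i - 1, j - 1, -1, x_drop, match, mismatch)
    return seed + left + right, i - l_len, i + w + r_len


print(extend_hit("AGATCGAT", "TTGATCCA", 1, 2, 3))            # (21, 1, 7)
print(extend_hit("AGATCGAT", "TTGATCCA", 1, 2, 3, x_drop=3))  # (20, 1, 5)
```

A larger drop-off lets the extension cross the G/C mismatch and pick up the trailing A. A small one stops early and settles for the shorter segment.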
21
Gapped BLAST (I)
The two-hit method
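A rough sketch of the two-hit idea, assuming the usual formulation: an extension is triggered only when two non-overlapping word hits lie on the same diagonal within a window of each other. The function name, the window size, and the sample hits are illustrative assumptions.

```python
def two_hit_triggers(hits, w, window):
    """Return the hits (i, j) that complete a qualifying pair on their diagonal."""
    last = {}  # diagonal j - i -> start position i of the most recent hit
    triggers = []
    for i, j in sorted(hits):
        d = j - i
        if d in last:
            distance = i - last[d]
            if distance < w:
                continue  # overlaps the most recent hit: ignore it
            if distance <= window:
                triggers.append((i, j))
        last[d] = i
    return triggers


hits = [(0, 0), (1, 1), (5, 5), (2, 9), (30, 30)]
print(two_hit_triggers(hits, w=3, window=10))  # [(5, 5)]
```

Here (1, 1) overlaps (0, 0) and is skipped, (5, 5) falls within the window of (0, 0) and triggers, and (30, 30) is too far from (5, 5). The lone hit on diagonal 7 never triggers an extension.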
22
Gapped BLAST (II)
Confining the dynamic-programming
23
BLAT
24
PatternHunter (I)
25
PatternHunter (II)
26
Remarks
  • Filtering is based on the observation that a good
    alignment usually includes short identical or
    very similar fragments.
  • The idea of filtration was used in FASTA, BLAST,
    BLAT, and PatternHunter.