Title: PSI-BLAST, Prosite, UCSC Genome Browser - Lecture 3
1 PSI-BLAST, Prosite, UCSC Genome Browser - Lecture 3
2 Searching for remote homologs
- Sometimes BLAST isn't enough
- In a large protein family, BLAST finds only the close members. We want the more distant members too
3 PSI-BLAST
- Position-Specific Iterative BLAST
- Regular BLAST -> construct profile from BLAST results -> BLAST profile search -> final results
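The loop on this slide can be sketched with a toy model. This is only an illustration under strong assumptions: sequences are ungapped and of equal length, a "hit" is any database sequence whose mean per-position profile frequency reaches a hypothetical `threshold`, and none of PSI-BLAST's real statistics (E-values, pseudocounts, log-odds scores) are modeled. All names and sequences are invented for the example.

```python
from collections import Counter

def build_profile(seqs):
    """Per-column residue frequencies for equal-length, ungapped sequences."""
    n = len(seqs)
    return [{res: c / n for res, c in Counter(col).items()} for col in zip(*seqs)]

def profile_score(profile, seq):
    """Mean frequency of the sequence's residues under the profile (0 to 1)."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, seq)) / len(seq)

def iterative_search(query, database, threshold=0.6, max_iter=5):
    """Toy iterative search: search, rebuild the profile from the hits, repeat."""
    hits = {query}  # a profile built from the query alone scores by plain identity
    for _ in range(max_iter):
        profile = build_profile(sorted(hits))
        new_hits = {query} | {s for s in database
                              if profile_score(profile, s) >= threshold}
        if new_hits == hits:  # converged: no change in the hit set
            break
        hits = new_hits
    return hits

# Hypothetical data: ACDKLA is only 50% identical to the query,
# so the first (identity-like) round misses it.
query = "ACDEFG"
database = ["ACDEFA", "ACDKFG", "ACDKLA", "WWWWWW"]
print(sorted(iterative_search(query, database, max_iter=1)))
# ['ACDEFA', 'ACDEFG', 'ACDKFG']
print(sorted(iterative_search(query, database)))
# ['ACDEFA', 'ACDEFG', 'ACDKFG', 'ACDKLA']
```

The distant member is picked up in the second round because the profile has learned from the close hits that K is acceptable at position 4 and A at position 6.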
4 Consensus, Pattern, PSSM
- Consensus: the most frequent character in each column is chosen
- Pattern: represents the alignment as a regular expression
- Profile (PSSM): Position-Specific Score Matrix (table indexed by Pos and Nuc)
Consensus: A A C T T G
Pattern: A-[TA]-C-T-T-[GC]
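The first two representations can be derived mechanically from alignment columns. The sketch below assumes a hypothetical three-sequence ungapped alignment, chosen so that its consensus is the A A C T T G shown on the slide; the function names are illustrative only.

```python
import re
from collections import Counter

def consensus(alignment):
    """Most frequent character in each column (ties go to the first one seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))

def pattern(alignment):
    """One element per column: a single letter, or [..] listing every observed letter."""
    parts = []
    for col in zip(*alignment):
        letters = sorted(set(col))
        parts.append(letters[0] if len(letters) == 1 else "[" + "".join(letters) + "]")
    return "-".join(parts)

# Hypothetical alignment, not taken from the lecture.
alignment = ["AACTTG", "ATCTTC", "AACTTG"]
print(consensus(alignment))  # AACTTG
print(pattern(alignment))    # A-[AT]-C-T-T-[CG]

# Dropping the dashes gives an ordinary regular expression.
regex = pattern(alignment).replace("-", "")
print(bool(re.fullmatch(regex, "ATCTTG")))  # True
```

Note that the pattern accepts ATCTTG even though that exact sequence is not in the alignment: a pattern records which letters occur in each column, not how often or in which combinations.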
5 PSSM (table indexed by Pos and Nuc)
S(AACCAA) 10.6711.25.33
S(GACCAA) = 0
- Sequences with higher scores -> higher chance of being related to the PSSM
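A minimal sketch of scoring against a PSSM, assuming the score is the product of the per-position frequencies of the sequence's characters (an assumption that is consistent with a sequence scoring 0 as soon as one character was never observed at its position). The alignment is hypothetical and does not reproduce the matrix from the slide; real PSI-BLAST profiles use log-odds scores with pseudocounts rather than raw frequencies.

```python
from collections import Counter
from math import prod

def build_pssm(alignment):
    """Per-position nucleotide frequencies from an ungapped alignment."""
    n = len(alignment)
    return [{nuc: count / n for nuc, count in Counter(col).items()}
            for col in zip(*alignment)]

def score(pssm, seq):
    """Product of the frequency of each character at its position (assumed scheme)."""
    return prod(col.get(nuc, 0.0) for col, nuc in zip(pssm, seq))

# Hypothetical alignment of four sequences.
pssm = build_pssm(["AACTTG", "ATCTTC", "AACTTG", "AACTTG"])
print(score(pssm, "AACTTG"))  # 0.5625  (0.75 at position 2 times 0.75 at position 6)
print(score(pssm, "ATCTTC"))  # 0.0625  (0.25 times 0.25)
print(score(pssm, "GACTTG"))  # 0.0     (G never occurs at position 1)
```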
6 PSI-BLAST
- Position-Specific Iterative BLAST
- Regular BLAST -> construct profile from BLAST results -> BLAST profile search -> final results
7 BLAST PSI-BLAST
8 PSI-BLAST - results
9 PSI-BLAST
- Advantage: PSI-BLAST looks for sequences that are close to the query, and learns from them to extend the circle of friends
- Disadvantage: if we obtain a WRONG hit, the profile will lead us to unrelated sequences (contamination). This gets worse with each iteration
10 PSI-BLAST
- Which of the following is/are correct?
- PSI-BLAST is expected to give more hits than BLAST
- PSI-BLAST is an iterative search method
- PSI-BLAST is faster than BLAST
- Each iteration of PSI-BLAST can only improve the results of the previous iteration
11 Turning information into knowledge
- The outcome of a sequencing project is masses of raw data
- The challenge is to turn these raw data into biological knowledge
- A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be streamlined
12 From sequence to function
- Nature tends to innovate rather than invent
- Proteins are composed of functional elements: domains and motifs
- Domains are structural units that carry out a certain function. They are shared between different proteins
- Motifs are shorter and are usually critical for the biological activity
13 http://www.expasy.ch/prosite
14 Prosite
- From analyzing conserved regions in protein sequences it is possible to derive signatures of motifs and domains
- Prosite consists of annotated sites/motifs/signatures/fingerprints
- Given an uncharacterized translated protein sequence, Prosite tries to predict which motifs and domains make up the protein and thus identify the family to which it belongs
15 Prosite
- Prosite represents entries with patterns or profiles
Pattern: A-[TA]-C-T-T-[GC]
Profile: PSSM (figure)
- Profiles are used in Prosite when the motif is relatively divergent and is difficult to represent as a pattern
- Profiles also characterize domains over their entire length, not just the motif
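To make the pattern representation concrete, here is a simplified sketch that translates PROSITE-style pattern syntax into a Python regular expression. It covers only the common elements (`-` separators, `x` for any residue, `[..]` allowed sets, `{..}` forbidden sets, `(n)` or `(n,m)` repeats, and `<` / `>` terminus anchors) and is not a complete parser. The function name and the test sequence are hypothetical; the N-glycosylation pattern is PROSITE entry PS00001.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex (simplified)."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")  # anchored to the N-terminus
        at_end = element.endswith(">")      # anchored to the C-terminus
        element = element.strip("<>")
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        # [..] sets and single letters are already valid regex.
        if repeat:
            core += "{" + repeat + "}"
        regex.append(("^" if at_start else "") + core + ("$" if at_end else ""))
    return "".join(regex)

print(prosite_to_regex("A-[TA]-C-T-T-[GC]"))  # A[TA]CTT[GC]

# N-glycosylation site: N-{P}-[ST]-{P}
glyco = prosite_to_regex("N-{P}-[ST]-{P}")
print(glyco)                                  # N[^P][ST][^P]

sequence = "MKNGSAANPTQ"  # hypothetical peptide
print([(m.start(), m.group()) for m in re.finditer(glyco, sequence)])
# [(2, 'NGSA')]  -- the second N is followed by P, so NPTQ does not match
```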
16 Prosite sequence query
18 Patterns with a high probability of occurrence
- Entries describing commonly found post-translational modifications or compositionally biased regions
- Found in the majority of known protein sequences
- High probability of occurrence
- Prosite filters them by default
19 Scanning Prosite
- Query: sequence. Result: all patterns found in the sequence
- Query: pattern. Result: all sequences which adhere to this pattern
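The two query directions can be sketched with ordinary regular expressions. The pattern collection, sequence collection, and all names below are hypothetical stand-ins for the Prosite pattern set and a protein database; the patterns are assumed to be already written as regexes.

```python
import re

def patterns_in_sequence(sequence, patterns):
    """Query = sequence. Returns {pattern name: [start positions of matches]}."""
    found = {}
    for name, regex in patterns.items():
        starts = [m.start() for m in re.finditer(regex, sequence)]  # non-overlapping
        if starts:
            found[name] = starts
    return found

def sequences_with_pattern(regex, sequences):
    """Query = pattern. Returns the names of all sequences containing a match."""
    return [name for name, seq in sequences.items() if re.search(regex, seq)]

# Hypothetical toy data.
patterns = {"toy_motif": "A[TA]CTT[GC]", "poly_a": "AAAA"}
sequences = {"seq1": "GGAACTTGCC", "seq2": "TTTTCCCC", "seq3": "ATCTTC"}

print(patterns_in_sequence(sequences["seq1"], patterns))       # {'toy_motif': [2]}
print(sequences_with_pattern(patterns["toy_motif"], sequences))  # ['seq1', 'seq3']
```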
20 Prosite pattern query
23 UCSC Genome Browser
24 UCSC Genome Browser - Gateway
25 UCSC Genome Browser - Gateway
26 Results
27 Annotation tracks
Base position
UTR
UCSC Genes
RefSeq Genes
mRNAs (GenBank)
Intron
Coding
Gene direction
SNPs
Repeats
28 UCSC Gene
29 UCSC Genome Browser - movement
Zoom x3 Center
30 Controlling annotation tracks
32 BLAT
- BLAT: BLAST-Like Alignment Tool
- BLAT is designed to find similarity of >95% on DNA, >80% for protein
- Rapid search by indexing the entire genome
- Good for:
  - Finding genomic coordinates of cDNA
  - Determining exons/introns
  - Finding human (or chimp, dog, cow) homologs of another vertebrate sequence
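The "rapid search by indexing" idea can be illustrated with a toy k-mer index: the genome is cut into non-overlapping k-mers that are stored once in a lookup table, and every overlapping k-mer of the query is then looked up in that table. This is a simplified sketch of the principle only. The sequences, `k = 4`, and the function names are assumptions for the example, and the later stages of a real search (extending and joining seed hits into alignments) are omitted.

```python
from collections import defaultdict

def build_index(genome, k):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index

def seed_hits(query, index, k):
    """Look up every overlapping k-mer of the query: (query_pos, genome_pos) pairs."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits

# Hypothetical toy genome and query (the query equals genome[7:16]).
genome = "ACGTACGGTTCAGGCTAAGC"
query = "GTTCAGGCT"
index = build_index(genome, 4)
print(seed_hits(query, index, 4))  # [(1, 8), (5, 12)]
```

Both seeds lie on the same diagonal (genome position minus query position is 7), which is the signal that the query aligns to the genome starting at position 7.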
33 BLAT on UCSC Genome Browser
34 BLAT search
35 BLAT Results
36 BLAT Results
Query
Match
Non-match (mismatch/indel)
Hit
Indel boundaries
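When a cDNA is aligned to the genome, the aligned blocks correspond to exons and the long genomic gaps between consecutive blocks are candidate introns. A minimal sketch, assuming the alignment has already been reduced to a sorted list of half-open genomic intervals; the coordinates and the function name are hypothetical, and no actual BLAT output format is parsed here.

```python
def introns_from_blocks(blocks):
    """Given sorted (genome_start, genome_end) aligned blocks, return the gaps between them."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])
            if next_start > prev_end]

# Hypothetical aligned blocks (candidate exons) on the genome.
exons = [(100, 250), (400, 520), (900, 1000)]
print(introns_from_blocks(exons))  # [(250, 400), (520, 900)]
```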