PsiBLAST, Prosite, UCSC Genome Browser Lecture 3 - PowerPoint PPT Presentation

Slides: 37
Provided by: ibisT

Transcript and Presenter's Notes

1
PSI-BLAST, Prosite, UCSC Genome Browser
Lecture 3
2
Searching for remote homologs
  • Sometimes BLAST isn't enough
  • In a large protein family, BLAST finds only the
    close members. We want the more distant members too

3
PSI-BLAST
  • Position-Specific Iterated BLAST

  • Run a regular BLAST search
  • Construct a profile from the BLAST results
  • Search again with the profile (repeat from the previous step)
  • Final results
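The loop above can be sketched with a toy model. This is only an illustration, not the real PSI-BLAST algorithm: it assumes ungapped sequences of equal length, a plain frequency profile instead of a log-odds PSSM, and a made-up score threshold. The names `build_profile`, `profile_score` and `iterative_search` are hypothetical.

```python
from collections import Counter


def build_profile(seqs):
    """Column-wise residue frequencies for equal-length, ungapped sequences."""
    profile = []
    for column in zip(*seqs):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile


def profile_score(profile, seq):
    """Mean frequency of the sequence's residues under the profile (0 to 1)."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, seq)) / len(seq)


def iterative_search(query, database, threshold=0.45, max_iterations=5):
    """Toy iterated search: the hits of each round rebuild the profile."""
    hits = [query]  # round 1 uses a profile of the query alone (plain identity)
    for _ in range(max_iterations):
        profile = build_profile(hits)
        new_hits = [query] + [
            s for s in database
            if s != query and profile_score(profile, s) >= threshold
        ]
        if set(new_hits) == set(hits):  # converged: no change in the hit set
            break
        hits = new_hits
    return hits


# Hypothetical toy database (assumption, for illustration only).
database = ["AAAATT", "AATTTT", "CCCCCC"]
one_round = iterative_search("AAAAAA", database, max_iterations=1)
converged = iterative_search("AAAAAA", database)
print(one_round)
print(converged)
```

In this toy data the more distant sequence is found only after the profile has learned from the closer hit, which is the point of iterating.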
4
Consensus, Pattern, PSSM
  • Consensus: the most frequent character in each
    column is chosen, e.g. A A C T T G
  • Pattern: represents the alignment as a regular
    expression, e.g. A-[TA]-C-T-T-[GC]
  • Profile (PSSM): Position-Specific Score Matrix, a
    table of scores per position (Pos) and nucleotide (Nuc)
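As a minimal sketch of the first two representations, the code below derives a consensus and a Prosite-style pattern from a small alignment. The three aligned sequences are invented for illustration (the slide's underlying alignment is not in the transcript), and the function names are hypothetical.

```python
from collections import Counter


def consensus(alignment):
    """Most frequent character in each column of an ungapped alignment."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*alignment)
    )


def pattern(alignment):
    """Prosite-style pattern: one element per column, alternatives in brackets."""
    elements = []
    for column in zip(*alignment):
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])
        else:
            elements.append("[" + "".join(residues) + "]")
    return "-".join(elements)


# Hypothetical alignment (assumption, not taken from the slides).
alignment = ["AACTTG", "ATCTTC", "AACTTG"]
print(consensus(alignment))
print(pattern(alignment))
```

The consensus discards the variation in columns 2 and 6, while the pattern keeps which characters occur but not how often; a PSSM keeps the frequencies as well.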
5
PSSM table (indexed by Pos and Nuc)
S(AACCAA) = 1 × 0.67 × 1 × 1 × 0.25 × 0.33
S(GACCAA) = 0
Sequences with higher scores have a higher chance of
being related to the PSSM
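A sketch of this kind of scoring, assuming the score of a sequence is the product of the per-position frequencies of its characters. The alignment below is invented for illustration and does not reproduce the slide's matrix; real PSSMs usually store log-odds scores with pseudocounts rather than raw frequencies. The function names are hypothetical.

```python
from collections import Counter


def build_pssm(alignment):
    """Frequency matrix: one dict of character -> frequency per column."""
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        pssm.append({char: n / len(column) for char, n in counts.items()})
    return pssm


def score(pssm, seq):
    """Product of the frequencies of seq's characters, position by position."""
    result = 1.0
    for column, char in zip(pssm, seq):
        result *= column.get(char, 0.0)  # unseen character -> score becomes 0
    return result


# Hypothetical alignment (assumption, for illustration only).
alignment = ["AACCAA", "ATCCTC", "AACCGG"]
pssm = build_pssm(alignment)
print(score(pssm, "AACCAA"))  # 1 * 2/3 * 1 * 1 * 1/3 * 1/3
print(score(pssm, "GACCAA"))  # G never occurs in column 1
```

A single unseen character drives the whole product to zero, which is one reason practical profiles add pseudocounts.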
6
PSI-BLAST
  • Position-Specific Iterated BLAST

  • Run a regular BLAST search
  • Construct a profile from the BLAST results
  • Search again with the profile (repeat from the previous step)
  • Final results
7
BLAST vs. PSI-BLAST
8
PSI-BLAST - results
9
PSI-BLAST
  • Advantage: PSI-BLAST looks for sequences that are
    close to the query and learns from them to
    extend the circle of friends
  • Disadvantage: if a WRONG hit is included in the
    profile, the search drifts toward unrelated
    sequences (contamination), and this gets worse
    with each iteration
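One common way to limit contamination is to cap the number of iterations and use a strict E-value threshold for including hits in the profile. The sketch below only assembles a command line for the NCBI BLAST+ `psiblast` program and does not run it; the file name `query.fasta`, the database name `my_protein_db`, and the chosen values are placeholders, not recommendations from the lecture.

```python
def psiblast_command(query_fasta, database, iterations=3, inclusion_evalue=0.001):
    """Build an argument list for NCBI BLAST+ psiblast (not executed here)."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", str(iterations),          # cap the number of rounds
        "-inclusion_ethresh", str(inclusion_evalue),  # E-value cutoff for the profile
    ]


# Placeholder inputs (assumptions, for illustration only).
command = psiblast_command("query.fasta", "my_protein_db")
print(" ".join(command))
```

With BLAST+ installed and a formatted database available, the list could be passed to `subprocess.run`. Inspecting the hits of each round before continuing is still the safest guard against a wrong hit entering the profile.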

10
PSI-BLAST
  • Which of the following is/are correct?
  • PSI-BLAST is expected to give more hits than
    BLAST
  • PSI-BLAST is an iterative search method
  • PSI-BLAST is faster than BLAST
  • Each iteration of PSI-BLAST can only improve the
    results of the previous iteration

11
Turning information into knowledge
  • The outcome of a sequencing project is masses of
    raw data
  • The challenge is to turn these raw data into
    biological knowledge
  • A valuable tool for this challenge is an
    automated diagnostic pipeline through which newly
    determined sequences can be run

12
From sequence to function
  • Nature tends to innovate rather than invent
  • Proteins are composed of functional elements:
    domains and motifs
  • Domains are structural units that carry out a
    certain function. They are shared between
    different proteins
  • Motifs are shorter and are usually critical
    for the biological activity

13
http://www.expasy.ch/prosite
14
Prosite
  • By analyzing conserved regions in protein
    sequences, it is possible to derive signatures of
    motifs and domains
  • Prosite consists of annotated
    sites/motifs/signatures/fingerprints
  • Given an uncharacterized translated protein
    sequence, Prosite tries to predict which motifs
    and domains make up the protein and thus identify
    the family to which it belongs

15
Prosite
  • Prosite represents entries with patterns or
    profiles
  • Pattern: a regular expression, e.g.
    A-[TA]-C-T-T-[GC]
  • Profile: a position-specific score matrix
  • Profiles are used in Prosite when the motif is
    relatively divergent and difficult to
    represent as a pattern
  • Profiles also characterize domains over their
    entire length, not just the motif
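Because a Prosite pattern is essentially a regular expression, it can be translated into one. The converter below is a minimal sketch that handles only a subset of the Prosite pattern syntax (single residues, `x`, `[...]`, `{...}`, `(n)` and `(n,m)` repeats, and `<`/`>` anchors at the ends of the pattern); the function name is hypothetical. The N-glycosylation pattern `N-{P}-[ST]-{P}` is used as a real example, and the test sequence is invented.

```python
import re


def prosite_to_regex(pattern):
    """Translate a (subset of) Prosite pattern syntax into a Python regex."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        start_anchor = element.startswith("<")  # pattern must start at N-terminus
        end_anchor = element.endswith(">")      # pattern must end at C-terminus
        element = element.strip("<>")
        # Split "core(n)" or "core(n,m)" into the core and its repeat counts.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            piece = "."                          # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            piece = core                         # single residue or [alternatives]
        if low:
            piece += "{" + low + ("," + high if high else "") + "}"
        regex += ("^" if start_anchor else "") + piece + ("$" if end_anchor else "")
    return regex


glyco = prosite_to_regex("N-{P}-[ST]-{P}")
toy = prosite_to_regex("A-[TA]-C-T-T-[GC]")
hit = re.search(glyco, "MKNGSA")  # invented sequence, for illustration
print(glyco, toy, hit.start() if hit else None)
```

Note that `re.search` and `re.finditer` report non-overlapping matches only, so a production scanner would need extra care where motif occurrences overlap.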

16
Prosite sequence query
18
Patterns with a high probability of occurrence
  • Entries describing commonly found
    post-translational modifications or
    compositionally biased regions
  • Found in the majority of known protein sequences
  • High probability of occurrence
  • Prosite filters them by default

19
Scanning Prosite
  • Query: a sequence. Result: all patterns found in
    the sequence
  • Query: a pattern. Result: all sequences which
    adhere to this pattern
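The two scanning directions can be sketched with ordinary regular expressions. The pattern library and the sequences below are small invented stand-ins (not the Prosite database), with the patterns already written as Python regexes; the function names are hypothetical.

```python
import re

# Hypothetical mini "pattern database" and "sequence database" (assumptions).
PATTERNS = {
    "N-glycosylation site": "N[^P][ST][^P]",
    "toy motif": "A[TA]CTT[GC]",
}
SEQUENCES = {
    "seq1": "MKNGSAAC",
    "seq2": "AACTTG",
    "seq3": "MPPP",
}


def scan_sequence(sequence, patterns):
    """Query = sequence: return the names of all patterns found in it."""
    return [name for name, rx in patterns.items() if re.search(rx, sequence)]


def scan_pattern(regex, sequences):
    """Query = pattern: return the names of all sequences that match it."""
    return [name for name, seq in sequences.items() if re.search(regex, seq)]


print(scan_sequence(SEQUENCES["seq1"], PATTERNS))
print(scan_pattern(PATTERNS["toy motif"], SEQUENCES))
```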
20
Prosite pattern query
23
UCSC Genome Browser
24
UCSC Genome Browser - Gateway
25
UCSC Genome Browser - Gateway
26
Results
27
Annotation tracks
Base position
UTR
UCSC Genes
RefSeq Genes
mRNAs (GenBank)
Intron
Coding
Gene direction
SNPs
Repeats
28
UCSC Gene
29
UCSC Genome Browser - movement
Zoom x3 Center
30
Controlling annotation tracks
32
BLAT
  • BLAT: BLAST-Like Alignment Tool
  • BLAT is designed to find similarity of >95% on
    DNA, >80% for protein
  • Rapid search by indexing the entire genome
  • Good for:
      • Finding genomic coordinates of cDNA
      • Determining exons/introns
      • Finding human (or chimp, dog, cow) homologs of
        another vertebrate sequence
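The idea of searching rapidly by indexing the genome can be sketched as a k-mer lookup table. This is a toy illustration under stated assumptions, not BLAT's implementation: it uses exact matches only, a tiny invented genome string, and k = 4, and the function names are hypothetical. As in BLAT, the genome is indexed by non-overlapping k-mers while every overlapping k-mer of the query is looked up.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index


def find_seeds(query, index, k):
    """Look up every overlapping k-mer of the query; return (query, genome) hits."""
    seeds = []
    for q_pos in range(len(query) - k + 1):
        for g_pos in index.get(query[q_pos:q_pos + k], []):
            seeds.append((q_pos, g_pos))
    return seeds


# Invented toy genome and query (assumptions, for illustration only).
genome = "ACGTACGTTTGACCAGGATC"
query = "GTTTGACCAGG"
index = build_index(genome, k=4)
seeds = find_seeds(query, index, k=4)
diagonals = {g_pos - q_pos for q_pos, g_pos in seeds}
print(seeds, diagonals)
```

Seeds that share a diagonal (genome position minus query position) point to the same candidate region, which a real tool would then extend into a full alignment. The index is built once per genome, so each query costs only a handful of lookups.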

33
BLAT on UCSC Genome Browser
34
BLAT search
35
BLAT Results
36
BLAT Results
query
Match
Non-match (mismatch/indel)
hit
Indel boundaries