Similarity Searches on Sequence Databases - PowerPoint PPT Presentation

Provided by: Fat1

1
Similarity Searches on Sequence Databases
  • Chapter 7, page 215

2
A story
  • H. pylori was discovered in 1984.
  • Its genome was first sequenced in the 1990s.
  • This was published in Nature.
  • In this publication, all proteins encoded by
    the genome were also published.
  • How did they do this in such a short time?

3
HOW?
  • They compared the genome sequence of H.
    pylori with those of other bacteria.
  • Then they predicted the proteins of H. pylori and
    its metabolism.

4
What does this similarity mean?
  • If two protein or gene sequences are sufficiently
    similar, they are likely homologues.
  • So:
  • They often come from related organisms
  • Similar proteins mean
  • similar functions
  • similar structures
  • that is, similar characteristics

5
How similar is very similar?
  • For proteins
  • if there is >25% identity between 2 proteins, they are
    similar

The range of identity <25% is called the TWILIGHT
ZONE. Nothing is sure about similarity there. For
nucleotides, the limit is 70%.
Above these limits, similarity suggests homology.
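
The identity cutoffs above can be applied to an alignment with a few lines of Python. This is only a sketch: the function names are hypothetical, and identity is computed over aligned columns where neither sequence has a gap, which is one of several conventions in use (an assumption, since the slide does not say which denominator it means).

```python
# Rule-of-thumb cutoffs taken from the slide (percent identity).
THRESHOLDS = {"protein": 25.0, "nucleotide": 70.0}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns without a gap character on either side.
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def classify(identity: float, kind: str = "protein") -> str:
    """Apply the slide's cutoff: above it 'similar', otherwise 'twilight zone'."""
    return "similar" if identity > THRESHOLDS[kind] else "twilight zone"


if __name__ == "__main__":
    identity = percent_identity("ACDE-F", "ACDQGF")
    print(identity, classify(identity, "protein"))
```

As the next slide points out, identity alone is not enough to claim homology, so a label from `classify` should be read as a first filter only.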
6
Homology
  • In addition to percent identity, some other information is
    essential to say that there is homology between
    2 sequences
  • Expectation value (E-value): the lower the value,
    the stronger the evidence for homology
  • Length of the similar segments
  • Patterns of amino acid conservation
  • Number of insertions/deletions

7
BLAST (Basic Local Alignment Search Tool)
  • 30 years ago, scanning the similarity between our
    query and hundreds of other sequences would take
    several hours (print, put on the wall, compare
    one by one manually).
  • NOW, with fast computers, we compare ours with
    millions of sequences in several minutes at most.

8
BLASTing Protein Sequence
  • 2 strategies
  • Compare
  • a protein with a protein database: BLASTP
  • a protein with a nucleotide database: TBLASTN
  • (the machine translates each nucleotide sequence
    in the database into its 6 possible protein
    sequences)
  • Important BLAST servers
  • BLAST server from NCBI (USA)
  • BLAST server from Swiss EMBnet
  • If you learn one, you can use the other(s)
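
The six possible protein sequences mentioned for TBLASTN are the three reading frames of each strand. A minimal sketch in Python, assuming the standard genetic code and plain A/C/G/T input (the function names are hypothetical, and real BLAST implementations handle ambiguity codes and alternative genetic codes that are ignored here):

```python
# Standard genetic code, written in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse strand)."""
    forward = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCC").items():
        print(name, protein)
```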

9
(No Transcript)
10
(No Transcript)
11
Which one should we choose?
  • It depends on
  • Database: choose the server that offers the database you
    want
  • Speed: choose the one which is not crowded (in
    Turkey, no problem during the day until 5 p.m. because
    it is night in the US and Japan)
  • Different BLAST servers return different results
    for the same query because of differences
    between their databases

12
BLAST output contains
  • A graphic display
  • A hit list
  • The alignments
  • The parameters

13
A graphic display
  • Shows which part of the other sequences is similar to yours
  • This part can be different or absent on some
    servers.
  • What the colors say: best, good, moderate, worse,
    worst
  • What the length says: the same length suggests a
    homologous sequence; a shorter match corresponds to a domain

14
(No Transcript)
15
A hit list
  • Accession number (sp = SWISS-PROT) and name
  • Description: you judge whether the hit is
    interesting or not
  • Score: if <50, unreliable
  • E-value: the lower the E-value, the more significant
    the similarity. E > 0.001: twilight zone.
  • An E-value approaching 0 is the best
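
The score and E-value cutoffs from this slide can be turned into a simple filter over a hit list. The sketch below is illustrative only: the `Hit` class, the `assess` function, and the example accessions and values are all hypothetical, and the cutoffs are the rules of thumb quoted above rather than fixed BLAST settings.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str      # hypothetical accession, for illustration only
    description: str
    score: float
    e_value: float


def assess(hit: Hit, min_score: float = 50.0, max_e_value: float = 0.001) -> str:
    """Label a hit using the slide's rules of thumb."""
    if hit.score < min_score:
        return "unreliable"
    if hit.e_value > max_e_value:
        return "twilight zone"
    return "likely homologue"


if __name__ == "__main__":
    # Made-up hits, not real BLAST output.
    hits = [
        Hit("EXAMPLE_1", "hypothetical protein A", 320.0, 1e-90),
        Hit("EXAMPLE_2", "hypothetical protein B", 62.0, 0.5),
        Hit("EXAMPLE_3", "hypothetical protein C", 31.0, 4.0),
    ]
    for hit in hits:
        print(hit.accession, assess(hit))
```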

16
Alignments
  • Alignments say something about the similarities between sequences
  • Identity >25% is good
  • Length = length of the alignment. Short alignments
    generally give high E-values
  • The top line is ours, the bottom line is the hit;
    (+) shows similar amino acids
  • XXXXXX: low-complexity region
  • Numbers show the coordinates
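
Why short alignments tend to get high E-values can be seen from the usual relation between a normalized bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n the database length. A short alignment can only accumulate a small score, so E stays large. The sketch below is a simplification: the function name is hypothetical, the lengths are made up, and real BLAST uses effective (edge-corrected) lengths.

```python
def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """Expected number of chance alignments with at least this bit score.

    Simplified form E = m * n * 2**(-S'), ignoring edge-effect corrections.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Illustrative numbers only: a 300-residue query against 10 million residues.
    for score in (20, 40, 80):
        print(score, expected_hits(score, 300, 10_000_000))
```

The same relation also shows why one query can get different E-values on different servers: a larger database (larger n) raises E for the same score.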

17
(No Transcript)
18
BLASTing DNA sequences
  • If it is a reading frame, translate it to protein,
    then BLAST.
  • If not, choose one of the options below
  • a DNA query against a DNA database: BLASTN
  • a translated DNA query against a translated DNA
    database: TBLASTX
  • a translated DNA query against a protein database: BLASTX
  • T = translated: BLAST translates our
    sequence into the 6 possible protein sequences

19
Strategies for choosing the right BLAST type for DNA
20
(No Transcript)
21
Controlling BLAST: choosing the right parameters
22
Control sequence masking
  • Protein: remove low-complexity regions
  • DNA: many repeats; filter human repeats
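
To give a feel for what masking does, here is a toy low-complexity filter that replaces windows with low Shannon entropy by X, similar in spirit to the XXXXXX stretches seen in BLAST alignments. It is not the filter BLAST servers actually run; the function names, window size, and entropy cutoff are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition."""
    total = len(segment)
    counts = Counter(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(sequence: str, window: int = 12,
                        min_entropy: float = 2.2) -> str:
    """Replace every window whose entropy is below min_entropy with 'X'."""
    sequence = sequence.upper()
    masked = list(sequence)
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


if __name__ == "__main__":
    # A varied stretch followed by a serine run (made-up sequence).
    print(mask_low_complexity("ACDEFGHIKLMN" + "S" * 12))
```

Sequences shorter than the window are returned unchanged, and windows overlapping a repetitive run may be partly masked as well.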

23
BLAST output
  • A less homologous sequence can still be important
  • What to do? Adjust the parameters
  • Choose a suitable database to reduce the number of
    results, e.g. Swiss-Prot
  • Use the magic tags of an Entrez query
  • Adjust the E-value

24
PSI-BLAST (Position-Specific Iterated BLAST)
  • BLAST finds close relatives.
  • To find distant relatives, use PSI-BLAST.
  • It uses a more complex scoring procedure.
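
The "position-specific" part of the name refers to scoring each alignment column separately, using a profile built from the hits of the previous round. The sketch below shows the idea with a toy position-specific scoring matrix. It is heavily simplified: the function names are hypothetical, the background is assumed uniform, and the pseudocount scheme is a stand-in for the sequence weighting and substitution-matrix-based pseudocounts PSI-BLAST really uses.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sequences, pseudocount: float = 1.0):
    """Build per-column log-odds scores from equal-length aligned sequences."""
    length = len(aligned_sequences[0])
    if any(len(s) != length for s in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for column in zip(*aligned_sequences):
        # Gaps and unknown characters are simply skipped.
        counts = Counter(r for r in column if r in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, sequence: str) -> float:
    """Score an ungapped sequence of the same length against the profile."""
    return sum(col.get(res, 0.0) for col, res in zip(pssm, sequence))


if __name__ == "__main__":
    profile = build_pssm(["ACD", "ACE", "ACD"])  # made-up alignment
    print(score(profile, "ACD"), score(profile, "WWW"))
```

In each iteration, new hits found with the profile are added to the alignment and the profile is rebuilt, which is how distant relatives gradually come into reach.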