NCBI Molecular Biology Resources - PowerPoint PPT Presentation

About This Presentation
Slides: 53
Provided by: peter942

Transcript and Presenter's Notes



1
NCBI Molecular Biology Resources
  • Using NCBI BLAST

March 2008
2
Basic Local Alignment Search Tool
  • Widely used similarity search tool
  • Heuristic approach based on the Smith-Waterman algorithm
  • Finds best local alignments
  • Provides statistical significance
  • All query/database combinations (DNA and protein):
  • DNA vs DNA
  • DNA translation vs Protein
  • Protein vs Protein
  • Protein vs DNA translation
  • DNA translation vs DNA translation
  • Available on the Web, as a standalone program, and as a network client
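BLAST is a heuristic approximation of Smith-Waterman local alignment. For comparison, here is a minimal Python sketch of the exact Smith-Waterman local alignment score. It assumes a simple match/mismatch scheme and a linear gap penalty, whereas BLAST uses substitution matrices and affine gap costs. The function name and default scores are illustrative.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a local alignment restart anywhere instead of going negative.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these scores, `smith_waterman("XXACGTXX", "YACGTY")` finds only the shared ACGT core. Extending it into the flanking mismatches would lower the score, so the local alignment stops there.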

3
What BLAST tells you
  • BLAST reports surprising alignments
    • Different from what chance would produce
  • Assumptions
    • Random sequences
    • Constant composition
  • Conclusions
    • Surprising similarities imply evolutionary homology

Evolutionary homology: descent from a common ancestor. Homology does not always imply similar function.
4
BLAST and BLAST-like programs
  • Traditional BLAST (blastall): nucleotide, protein, translations
    • blastn: nucleotide query vs. nucleotide database
    • blastp: protein query vs. protein database
    • blastx: nucleotide query vs. protein database
    • tblastn: protein query vs. translated nucleotide database
    • tblastx: translated nucleotide query vs. translated nucleotide database
  • Megablast: nucleotide only
    • Contiguous megablast
      • Nearly identical sequences
    • Discontiguous megablast
      • Cross-species comparison
  • Position-specific BLAST programs: protein only
    • Position-Specific Iterative BLAST (PSI-BLAST)
      • Automatically generates a position-specific score matrix (PSSM)
    • Reverse Position-Specific BLAST (RPS-BLAST)
      • Searches a database of PSI-BLAST PSSMs
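The five traditional programs differ only in which side, if any, is translated before comparison. The following minimal Python lookup captures the list above. The dictionary and function names are hypothetical helpers, not part of any BLAST distribution.

```python
# (query type, database type, translate both sides?) -> traditional BLAST program
PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", False): "blastx",
    ("protein", "nucleotide", False): "tblastn",
    ("nucleotide", "nucleotide", True): "tblastx",
}


def choose_program(query_type, db_type, translate_both=False):
    """Return the traditional BLAST program for a query/database combination."""
    try:
        return PROGRAMS[(query_type, db_type, translate_both)]
    except KeyError:
        raise ValueError(
            f"no traditional BLAST program for {query_type} vs. {db_type}"
            f" (translate_both={translate_both})"
        ) from None
```

For example, `choose_program("nucleotide", "protein")` gives `"blastx"`. Any combination not listed above raises `ValueError`.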

5
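Local Alignment Statistics
High scores of local alignments between two random sequences follow the Extreme Value Distribution.
Expect value (E): the number of database hits you expect to find by chance with a score at least as high as yours
$E = K m n e^{-\lambda S}$
where E is the expected number of random hits, S is your score, n is the size of the database, m is the query length, and K and λ are constants set by the scoring system and residue composition.
[Figure: distribution of random alignment scores; axes: Score (x), Alignments (y)]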
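Below is a minimal Python sketch of this relationship, together with the equivalent bit-score form. The values λ = 0.267 and K = 0.041 are assumptions, chosen only to resemble gapped BLOSUM62 parameters. Real BLAST reports its own λ and K for each search and also corrects m and n for edge effects, which this sketch ignores. The query length, database size, and score are hypothetical.

```python
import math


def expect_value(raw_score, m, n, lam, k):
    """E = K * m * n * exp(-lambda * S): expected chance hits scoring >= S."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits, m, n):
    """Equivalent form of the same E-value: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


def p_value(e):
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return 1.0 - math.exp(-e)


# Assumed, illustrative parameters (not taken from a real search)
LAM, K = 0.267, 0.041
m, n = 756, 2_000_000  # hypothetical query length and database size (residues)
S = 80
e = expect_value(S, m, n, LAM, K)
print(f"bits={bit_score(S, LAM, K):.1f}  E={e:.3g}  P={p_value(e):.3g}")
```

Because the bit score absorbs λ and K, two searches run with different scoring systems can be compared on the same scale.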
6
Scoring Systems
  • Position-independent matrices
    • Nucleic acids: identity matrix
    • Proteins
      • PAM matrices (Percent Accepted Mutation)
        • Implicit model of evolution
        • Higher-numbered PAM matrices are all calculated from PAM1
        • PAM250 widely used
      • BLOSUM matrices (BLOcks SUbstitution Matrix)
        • Empirically determined from alignments of conserved blocks
        • Each includes information up to a certain level of identity
        • BLOSUM62 widely used
  • Position-Specific Score Matrices (PSSMs)
    • Used by PSI-BLAST and RPS-BLAST
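"Calculated from PAM1" means raising the PAM1 mutation-probability matrix to a power (PAM250 is PAM1 multiplied by itself 250 times) and then converting the result to log-odds scores. The following minimal Python sketch makes several simplifying assumptions: a hypothetical 4-letter alphabet, uniform background frequencies, and an invented 1% change rate. Real PAM matrices are built from Dayhoff's 20 × 20 amino-acid data.

```python
import math


def mat_mult(a, b):
    """Multiply two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_power(m, n):
    """Return m ** n by repeated squaring."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    while n > 0:
        if n & 1:
            result = mat_mult(result, m)
        m = mat_mult(m, m)
        n >>= 1
    return result


def log_odds(mutation, background, scale=2.0):
    """Score s_ij = round(scale * log2(M_ij / f_j)); scale=2 gives half-bit units."""
    size = len(mutation)
    return [[round(scale * math.log2(mutation[i][j] / background[j]))
             for j in range(size)] for i in range(size)]


# Hypothetical PAM1-like matrix: a 1% chance that a letter changes,
# spread evenly over the other three letters.
STAY, CHANGE = 0.99, 0.01 / 3
PAM1 = [[STAY if i == j else CHANGE for j in range(4)] for i in range(4)]
BACKGROUND = [0.25] * 4

pam50 = mat_power(PAM1, 50)
pam250 = mat_power(PAM1, 250)
scores_50 = log_odds(pam50, BACKGROUND)
```

As the PAM number grows, the diagonal probabilities fall toward the background frequency. This is why higher PAM matrices suit more distantly related sequences.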

7
WWW BLAST Interface
8
The BLAST homepage
www.ncbi.nlm.nih.gov/blast
9
Basic BLAST Databases
10
BLAST Databases: Non-redundant Protein
Services: blastp, blastx
  • nr (non-redundant protein sequences)
    • GenBank CDS translations
    • NP_ and XP_ RefSeqs
    • Outside protein: PIR, Swiss-Prot, PRF
    • PDB (sequences from structures)
  • pat: protein patents
  • env_nr: environmental samples

11
Nucleotide Databases: Human and Mouse
Services: megablast, blastn
  • Human and mouse genomic and transcript databases are now the default
  • Separate sections in the output for mRNA and genomic hits
  • Direct links to Map Viewer for genomic sequences

12
Nucleotide Databases: Traditional
Services: blastn, tblastn, tblastx
13
Nucleotide Databases: Traditional
Databases are mostly non-overlapping
  • htgs: HTG division
  • gss: GSS division
  • wgs: whole genome shotgun
  • env_nt: environmental samples
  • nr (nt)
    • Traditional GenBank
    • NM_ and XM_ RefSeqs
  • refseq_rna
  • refseq_genomic: NC_ RefSeqs
  • dbest: EST division
    • est_human, est_mouse, est_others

14
Basic BLAST Protein Searches
15
Universal Form Protein
16
BLAST and Molecular Evolution
[Figure: divergence times of 3000, 1000, and 540 Myr; diseases labeled: Alzheimer's disease, ataxia telangiectasia, colon cancer, pancreatic carcinoma]
17
Protein BLAST Page
18
Limiting the Database: Organism
Organism autocomplete
19
Limiting the Database: Entrez Query
all[filter] NOT mammals[organism]
gene_in_mitochondrion[Properties]
2006:2007[Modification Date]
Nucleotide: biomol_mrna[Properties], biomol_genomic[Properties]
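The same Entrez limits can also be supplied from a script. The minimal Python sketch below only builds a submission URL for NCBI's BLAST URL API (Blast.cgi with CMD=Put), which postdates these slides. The parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY) follow NCBI's URL API documentation as we understand it and should be checked against the current docs. The helper name and the query sequence are hypothetical, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_url(program, database, query, entrez_query=None):
    """Build (but do not send) a BLAST URL API submission URL."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    if entrez_query:
        # Entrez syntax as on the slide, e.g. "all[filter] NOT mammals[organism]"
        params["ENTREZ_QUERY"] = entrez_query
    return BLAST_URL + "?" + urlencode(params)


url = build_put_url("blastn", "nt", "ACGTACGTACGTACGTACGT",
                    entrez_query="all[filter] NOT mammals[organism]")
fields = parse_qs(urlsplit(url).query)  # decode it again to inspect the parameters
```

If submitted, the service returns a request ID (RID), which is then polled with CMD=Get.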
20
Run Search
21
BLAST Formatting Page
Conserved Domain Results
22
BLAST Output Graphical Overview
Sort by taxonomy
mouse over
23
BLAST Output Descriptions
24
TaxBLAST Taxonomy Reports
25
BLAST Output: Alignments
Identical match
Positive score (conservative substitution)
Gap
Negative or zero score
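These categories correspond to the middle line BLAST prints between query and subject:
  • the residue letter for an identity
  • "+" for a positive (conservative) substitution
  • a blank for zero, negative, or gapped positions

The minimal Python sketch below does that bookkeeping. As a simplification, it includes only a few positive BLOSUM62 off-diagonal scores and treats every other mismatch as non-positive. The function name and example rows are hypothetical.

```python
# A few positive off-diagonal BLOSUM62 scores (symmetric). In this sketch,
# every mismatch not listed here is treated as scoring <= 0.
POSITIVE_PAIRS = {
    frozenset("IV"): 3, frozenset("IL"): 2, frozenset("LV"): 1,
    frozenset("KR"): 2, frozenset("DE"): 2, frozenset("EQ"): 2,
    frozenset("ND"): 1, frozenset("FY"): 3, frozenset("ST"): 1,
}


def midline(query, subject):
    """Return (midline, identities, positives, gaps) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    line, identities, positives, gaps = [], 0, 0, 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1
            line.append(" ")
        elif q == s:
            identities += 1
            positives += 1  # BLAST counts identities as positives too
            line.append(q)
        elif POSITIVE_PAIRS.get(frozenset((q, s)), 0) > 0:
            positives += 1
            line.append("+")
        else:
            line.append(" ")
    return "".join(line), identities, positives, gaps
```

For the rows `MKIV-D` and `MRLVAE`, the midline is `M++V +`: 2 identities, 5 positives, and 1 gap.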
26
Position Specific Iterative BLAST
27
MLH1 and ETR1
>gi|4557757|ref|NP_000240.1| MutL protein homolog 1 [Homo sapiens]
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAI
KEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNGTGIRK
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKT
ADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALK
NPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNA
STVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFL
LFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNV
HPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLP
GLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQP
LSKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSL
EGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEM
TAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWA
LAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAM
LALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEI
DEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKE
CAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHI
LPPKHFTEDGNILQLANLPDLYKVFERC
Human Mismatch Repair Protein
>gi|22095656|sp|O81122.1|ETR1_MALDO Ethylene receptor
MLACNCIEPQWPADELLMKYQYISDFFIALAYFSIPLELIY
FVKKSAVFPYRWVLVQFGAFIVLCGATHLINLWTFSIHSRTVAMVMTTA
KVLTAVVSCATALMLVHIIPDLLSVKTRELFLKNKAAELDREMGLIRTQE
ETGRHVRMLTHEIRSTLDRHTILKTTLVELGRTLALEECALWMPTRTGL
ELQLSYTLRQQNPVGYTVPIHLPVINQVFSSNRAVKISANSPVAKLRQL
AGRHIPGEVVAVRVPLLHLSNFQINDWPELSTKRYALMVLMLPSDSARQ
WHVHELELVEVVADQVAVALSHAAILEESMRARDLLMEQNIALDLARREA
ETAIRARNDFLAVMNHEMRTPMHAIIALSSLLQETELTAEQRLMVETIL
RSSNLLATLINDVLDLSRLEDGSLQLEIATFNLHSVFREVHNMIKPVAS
IKRLSVTLNIAADLPMYAIGDEKRLMQTILNVVGNAVKFSKEGSISITAF
VAKSESLRDFRAPDFFPVQSDNHFYLRVQVKDSGSGINPQDIPKLFTKF
AQTQALATRNSGGSGLGLAICKRFVNLMEGHIWIESEGLGKGCTATFIV
KLGFPERSNESKLPFAPKLQANHVQTNFPGLKVLVMDDNGVSRSVTKGL
LAHLGCDVTAVSLIDELLHVISQEHKVVFMDVSMPGIDGYELAVRIHEKF
TKRHERPVLVALTGSIDKITKENCMRVGVDGVILKPVSVDKMRSVLSEL
LEHRVLFEAM
Apple ethylene receptor
28
PSI-BLAST Iteration 1
29
PSI-BLAST Iteration 4
Plant ethylene receptors, bacterial two-component
regulatory system kinases
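PSI-BLAST rebuilds its PSSM from the hits collected in each iteration. The heavily simplified Python sketch below shows the idea: column-wise residue counts with a flat pseudocount, converted to log-odds against a uniform background. Real PSI-BLAST also weights sequences, derives pseudocounts from a substitution matrix, and uses measured background frequencies; none of that is modeled here. The input alignment is hypothetical and assumed gap-free.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for col in range(length):
        counts = dict.fromkeys(AMINO_ACIDS, 0)
        for seq in alignment:
            if seq[col] in counts:  # gaps and ambiguous letters are ignored
                counts[seq[col]] += 1
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_segment(pssm, segment):
    """Sum position-specific scores for a segment aligned to the start of the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))
```

Residues seen in a column score positively and unseen ones negatively. This is how a PSSM rewards the conservation pattern of a family rather than generic substitution rates.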
30
RPS-BLAST Conserved Domains
31
Algorithm Parameters: Protein
Expand
Adjust to set stringency
Default: statistics adjustment for compositional bias
Low-complexity filter: now off by default because it conflicts with composition-based statistics
32
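Automatic Short Sequence Adjustment
  • Expect value: 20000
  • Word size: 2
  • Matrix: PAM30
  • Composition-based statistics: off
  • Low-complexity filter: off
Available for nucleotide and protein searches (settings shown are for protein)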
33
Basic BLAST Nucleotide
34
Universal Form Nucleotide
35
Nucleotide Results ALB mRNA
megablast
discontiguous megablast
blastn
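Megablast and blastn seed alignments with exact contiguous word matches. Discontiguous megablast instead uses a spaced template that ignores some positions, typically codon third positions, which makes it more sensitive for cross-species coding comparisons. Below is a minimal Python sketch of template-based seeding. The 12-position templates are illustrative and are not the actual templates or word sizes NCBI uses.

```python
def seed_hits(query, subject, template):
    """Return (query_pos, subject_pos) pairs that agree at every '1' in template."""
    care = [i for i, ch in enumerate(template) if ch == "1"]
    span = len(template)

    def key(seq, start):
        # Only '1' positions contribute; '0' positions may mismatch freely.
        return "".join(seq[start + i] for i in care)

    index = {}
    for q in range(len(query) - span + 1):
        index.setdefault(key(query, q), []).append(q)
    return [(q, s)
            for s in range(len(subject) - span + 1)
            for q in index.get(key(subject, s), [])]


CONTIGUOUS = "1" * 12      # hypothetical contiguous word of 12
CODING = "110110110110"    # hypothetical template skipping every third position

q = "ATGGCCAAGCTG"
s = "ATAGCGAAACTA"  # differs from q only at codon third positions
```

Here `seed_hits(q, s, CONTIGUOUS)` finds nothing. `seed_hits(q, s, CODING)` finds the seed at (0, 0) because every differing base falls on a "0" position.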
36
Nucleotide BLAST Human Genome
37
Sortable Results
Separate Sections for Transcript and Genome
Direct links to Entrez Databases
38
Total Score: All Segments
39
Alignments Sorted in Exon Order
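In these results:
  • Max score is the best single HSP (high-scoring segment pair) against a subject
  • Total score sums the scores of all of that subject's segments
  • sorting a subject's HSPs by query start lists exon hits in order

Below is a minimal Python sketch of this grouping. The HSP records (subject, query start, query end, bit score) are hypothetical.

```python
from collections import defaultdict


def summarize(hsps):
    """Group HSPs by subject: {subject: (max_score, total_score, hsps_in_query_order)}."""
    by_subject = defaultdict(list)
    for hsp in hsps:
        by_subject[hsp["subject"]].append(hsp)
    summary = {}
    for subject, group in by_subject.items():
        scores = [h["bits"] for h in group]
        ordered = sorted(group, key=lambda h: h["q_start"])  # exon order along the query
        summary[subject] = (max(scores), sum(scores), ordered)
    return summary


# Hypothetical hits of an mRNA query against one genomic contig
hsps = [
    {"subject": "contig_A", "q_start": 301, "q_end": 420, "bits": 210.0},
    {"subject": "contig_A", "q_start": 1, "q_end": 150, "bits": 260.0},
    {"subject": "contig_A", "q_start": 151, "q_end": 300, "bits": 240.0},
]
```

A gene split into several exons can have a modest max score but a high total score. This is why the total is useful for genomic hits.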
40
Links to Map Viewer
Chromosome 9
Chromosome 1
41
Algorithm Parameters: Nucleotide
blastn
  • Mask for lookup table only
    • Prevents starting an alignment in a masked region
    • Allows extensions through masked regions

Low-complexity filter (DUST): masks low-complexity sequence (simple repeats)
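Lookup-table-only masking can be sketched in two steps. First, word hits are collected only from unmasked query positions. Second, the ungapped X-drop extension ignores the mask. Below is a minimal Python sketch. The +1/-3 scores, the X-drop of 5, and the word size in the example are illustrative values, not necessarily NCBI's defaults.

```python
def word_hits(query, subject, word, masked):
    """Exact word matches; query words touching a masked position are never indexed."""
    index = {}
    for q in range(len(query) - word + 1):
        if any(masked[q:q + word]):
            continue  # masked region: no seed may start here
        index.setdefault(query[q:q + word], []).append(q)
    return [(q, s)
            for s in range(len(subject) - word + 1)
            for q in index.get(subject[s:s + word], [])]


def extend_ungapped(query, subject, q, s, word, match=1, mismatch=-3, xdrop=5):
    """X-drop ungapped extension of a seed at (q, s); the mask is deliberately ignored.

    Returns (query_start, query_end_exclusive, best_score).
    """
    best = word * match
    score, right, k = best, 0, 0
    while q + word + k < len(query) and s + word + k < len(subject):  # extend right
        score += match if query[q + word + k] == subject[s + word + k] else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score > xdrop:
            break
    score, left, k = best, 0, 0
    while q - k - 1 >= 0 and s - k - 1 >= 0:  # extend left from the best right score
        score += match if query[q - k - 1] == subject[s - k - 1] else mismatch
        k += 1
        if score > best:
            best, left = score, k
        elif best - score > xdrop:
            break
    return q - left, q + word + right, best


# Hypothetical query with a masked poly-A stretch at positions 7-14
query = "GATTACA" + "A" * 8 + "CCGGT"
masked = [False] * 7 + [True] * 8 + [False] * 5
```

No seed starts inside the poly-A run. However, a seed found at position 0 still extends straight through it to the end of the sequence.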
42
BLAST Formatting Options
43
Protein Formatting Page
Format as: HTML, Plain Text, ASN.1, XML
Show: Alignment, PSSM, PssmWithParameters, Bioseq
Alignment View:
  • Pairwise
  • Pairwise with dots for identities
  • Query-anchored with dots for identities
  • Query-anchored with letters for identities
  • Flat query-anchored with dots for identities
  • Flat query-anchored with letters for identities
  • Hit Table
44
Structured Formats: XML and ASN.1
<Iteration_hits>
  <Hit>
    <Hit_num>1</Hit_num>
    <Hit_id>gi|730028|sp|P40692|MLH1_HUMAN</Hit_id>
    <Hit_def>DNA mismatch repair protein Mlh1 (MutL protein homolog 1)</Hit_def>
    <Hit_accession>P40692</Hit_accession>
    <Hit_len>756</Hit_len>
    <Hit_hsps>
      <Hsp>
        <Hsp_num>1</Hsp_num>
        <Hsp_bit-score>1568.9</Hsp_bit-score>
        <Hsp_score>4061</Hsp_score>
        <Hsp_evalue>0</Hsp_evalue>
        <Hsp_query-from>1</Hsp_query-from>
        <Hsp_query-to>756</Hsp_query-to>
        <Hsp_hit-from>1</Hsp_hit-from>
        <Hsp_hit-to>756</Hsp_hit-to>
        <Hsp_query-frame>0</Hsp_query-frame>
        <Hsp_hit-frame>0</Hsp_hit-frame>
        <Hsp_identity>0</Hsp_identity>
        <Hsp_positive>0</Hsp_positive>
        <Hsp_gaps>0</Hsp_gaps>
        <Hsp_align-len>756</Hsp_align-len>
        ...
XML
Seq-annot ::= {
  desc {
    user {
      type str "Hist Seqalign" ,
      data {
        {
          label str "Hist Seqalign" ,
          data bool TRUE
        }
      }
    } ,
    user {
      type str "Blast Type" ,
      data {
        {
          label id 0 ,
          data int 0
        }
      }
    } ,
    user {
      type str "BLAST database title" ,
      data {
        {
          label str "Non-redundant SwissProt ...
ASN.1
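Because the XML report is structured, it can be read with a standard parser instead of by scraping text. The minimal sketch below uses Python's `xml.etree.ElementTree` on a trimmed sample built from the values above. Two assumptions apply: closing tags were added so the fragment parses, and a real report wraps `Iteration_hits` in further elements (such as `BlastOutput` and `Iteration`) not shown here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<Iteration_hits>
  <Hit>
    <Hit_num>1</Hit_num>
    <Hit_id>gi|730028|sp|P40692|MLH1_HUMAN</Hit_id>
    <Hit_def>DNA mismatch repair protein Mlh1 (MutL protein homolog 1)</Hit_def>
    <Hit_accession>P40692</Hit_accession>
    <Hit_len>756</Hit_len>
    <Hit_hsps>
      <Hsp>
        <Hsp_num>1</Hsp_num>
        <Hsp_bit-score>1568.9</Hsp_bit-score>
        <Hsp_score>4061</Hsp_score>
        <Hsp_evalue>0</Hsp_evalue>
        <Hsp_align-len>756</Hsp_align-len>
      </Hsp>
    </Hit_hsps>
  </Hit>
</Iteration_hits>
"""


def read_hits(xml_text):
    """Yield (accession, definition, bit score, e-value) for every HSP."""
    root = ET.fromstring(xml_text.strip())
    for hit in root.iter("Hit"):
        accession = hit.findtext("Hit_accession")
        definition = hit.findtext("Hit_def")
        for hsp in hit.iter("Hsp"):
            yield (accession, definition,
                   float(hsp.findtext("Hsp_bit-score")),
                   float(hsp.findtext("Hsp_evalue")))
```

Reading named elements this way does not depend on column positions or line wrapping in the text report.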
45
The Hit Table
BLASTP 2.2.17 (Aug-26-2007)
Query: gi|4557757|ref|NP_000240.1| MutL protein homolog 1 [Homo sapiens]
Database: swissprot
Fields: query id, subject ids, identity, positives, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
80 hits found
ref|NP_000240.1|gi|4557757  gi|1709056|sp|P38920|MLH1_YEAST   36.68  56.91  796  426  18  8  756  5  769  7e-138  491
ref|NP_000240.1|gi|4557757  gi|48474996|sp|Q9P7W6|MLH1_SCHPO  37.24  54.04  768  371  16  8  756  9  684  8e-122  437
ref|NP_000240.1|gi|4557757  gi|25090753|sp|Q8RA70|MUTL_THETN  37.44  54.62  390  231   7  8  394  4  383   5e-59  229
ref|NP_000240.1|gi|4557757  gi|25090732|sp|Q8KAX3|MUTL_CHLTE  35.95  54.05  370  229   5  8  375  4  367   5e-55  215
ref|NP_000240.1|gi|4557757  gi|127552|sp|P23367.2|MUTL_ECOLI  35.99  58.11  339  202   7  8  334  3  338   8e-55  214
ref|NP_000240.1|gi|4557757  gi|29427778|sp|Q8FAK9|MUTL_ECOL6  35.99  58.11  339  202   7  8  334  3  338   1e-54  214
ref|NP_000240.1|gi|4557757  gi|20455084|sp|Q8XDN4|MUTL_ECO57  35.99  58.11  339  202   7  8  334  3  338   1e-54  214
ref|NP_000240.1|gi|4557757  gi|59798328|sp|Q72PF7|MUTL_LEPIC  36.27  55.20  375  221   8  6  375  2  363   3e-54  213
ref|NP_000240.1|gi|4557757  gi|13431695|sp|P57886|MUTL_PASMU  35.48  58.94  341  213   6  8  345  3  339   4e-54  212
ref|NP_000240.1|gi|4557757  gi|1171080|sp|P44494|MUTL_HAEIN   35.74  59.87  319  198   6  8  323  3  317   5e-54  212
ref|NP_000240.1|gi|4557757  gi|20455102|sp|Q8ZIW4|MUTL_YERPE  36.01  58.63  336  207   6  8  339  3  334   6e-54  212
ref|NP_000240.1|gi|4557757  gi|20455152|sp|Q9JYT2|MUTL_NEIMB  33.96  55.35  374  224   8  8  376  4  359   2e-53  210
ref|NP_000240.1|gi|4557757  gi|20139217|sp|Q9KAC1|MUTL_BACHD  35.39  55.90  356  214   6  8  362  4  344   2e-53  209
ref|NP_000240.1|gi|4557757  gi|31076794|sp|Q87L05|MUTL_VIBPA  35.33  58.38  334  210   5  8  338  3  333   3e-53  209
ref|NP_000240.1|gi|4557757  gi|20455150|sp|Q9JTS2|MUTL_NEIMA  36.94  58.28  314  183   5  8  316  4  307   5e-53  209
ref|NP_000240.1|gi|4557757  gi|56749233|sp|Q6GHD9|MUTL_STAAR  38.28  58.46  337  193   7  6  335  2  330   1e-52  207
ref|NP_000240.1|gi|4557757  gi|25090739|sp|Q8NWX9|MUTL_STAAW  38.28  58.46  337  193   7  6  335  2  330   1e-52  207
ref|NP_000240.1|gi|4557757  gi|71151979|sp|Q5HGD5|MUTL_STAAC  38.28  58.46  337  193   7  6  335  2  330   1e-52  207
ref|NP_000240.1|gi|4557757  gi|54037875|sp|P65492|MUTL_STAAN  38.28  58.46  337  193   7  6  335  2  330   2e-52  207
ref|NP_000240.1|gi|4557757  gi|20043258|sp|Q9KV13|MUTL_VIBCH  35.74  58.56  333  204   6  8  335  3  330   2e-52  207
ref|NP_000240.1|gi|4557757  gi|127553|sp|P14161|MUTL_SALTY    35.10  56.93  339  205   7  8  334  3  338   3e-52  206
ref|NP_000240.1|gi|4557757  gi|20455140|sp|Q9CDL1|MUTL_LACLA  36.31  56.55  336  196   5  6  334  2  326   4e-52  206
ref|NP_000240.1|gi|4557757  gi|61214242|sp|Q7MH01|MUTL_VIBVY  34.63  58.51  335  213   5  8  339  3  334   4e-52  206
ref|NP_000240.1|gi|4557757  gi|20455099|sp|Q8Z187|MUTL_SALTI  35.10  56.93  339  205   7  8  334  3  338   4e-52  206
ref|NP_000240.1|gi|4557757  gi|31076809|sp|Q8DCV0|MUTL_VIBVU  34.63  58.51  335  213   5  8  339  3  334   6e-52  205
ref|NP_000240.1|gi|4557757  gi|71648717|sp|Q5E2C6|MUTL_VIBF1  36.71  59.81  316  186   6  8  316  3  311   1e-51  204
ref|NP_000240.1|gi|4557757  gi|37999611|sp|Q88DD1|MUTL_PSEPK  30.34  48.97  435  278   7  8  419  7  439   2e-51  203
Importable into spreadsheets
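Each hit-table row is a fixed list of whitespace-separated fields, so the table is also easy to load outside a spreadsheet. Below is a minimal Python sketch using two rows copied from the table above. The field names and helper functions are our own labels for the columns listed in the header, not names defined by BLAST.

```python
FIELDS = ["query_id", "subject_ids", "identity", "positives", "align_len",
          "mismatches", "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score"]
INT_FIELDS = {"align_len", "mismatches", "gap_opens", "q_start", "q_end",
              "s_start", "s_end"}
FLOAT_FIELDS = {"identity", "positives", "evalue", "bit_score"}


def parse_row(line):
    """Split one hit-table row into a dict with typed values."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    row = {}
    for name, value in zip(FIELDS, values):
        if name in INT_FIELDS:
            row[name] = int(value)
        elif name in FLOAT_FIELDS:
            row[name] = float(value)
        else:
            row[name] = value
    return row


def organism_code(subject_id):
    """'gi|1709056|sp|P38920|MLH1_YEAST' -> 'YEAST' (Swiss-Prot entry-name suffix)."""
    return subject_id.split("|")[-1].split("_")[-1]


ROWS = """\
ref|NP_000240.1|gi|4557757 gi|1709056|sp|P38920|MLH1_YEAST 36.68 56.91 796 426 18 8 756 5 769 7e-138 491
ref|NP_000240.1|gi|4557757 gi|127552|sp|P23367.2|MUTL_ECOLI 35.99 58.11 339 202 7 8 334 3 338 8e-55 214
"""
hits = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
```

Once parsed, rows can be filtered, for example by keeping only hits whose `evalue` falls below a chosen cutoff.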
46
PSSMs: Restart PSI-BLAST
  • ASN.1 (ScoreMat): portable
  • ASCII-encoded: Web only
47
BLAST TreeView
Black bear mitochondrial genome vs. RefSeq genomic
48
Distance Tree Carnivore Mitochondrial Genome
[Figure: distance tree; labeled groups: raccoon, weasels, red panda, cats, mongooses, dogs, true seals, sea lions, fur seal, walrus, bears]
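Distance trees like this one are built from pairwise distances between the aligned sequences. Below is a minimal Python sketch of one classic nucleotide distance, the Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) p), where p is the fraction of differing aligned sites. The slides do not say which distance model TreeView used for this tree, and the sequences in the example are hypothetical.

```python
import math


def p_distance(a, b):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a, b):
    """Jukes-Cantor corrected substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # too divergent: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """All pairwise distances for a {name: aligned sequence} dict."""
    names = list(seqs)
    return {(i, j): jukes_cantor(seqs[i], seqs[j]) for i in names for j in names}
```

The correction grows faster than p itself because it accounts for multiple substitutions at the same site. A tree method such as neighbor joining would then take this matrix as input.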
49
Managing Searches
  • Recent Results
  • Saved Strategies

50
Recent Results
Log in to My NCBI to save search strategies
Results available for 36 hours
51
Saved Strategies
Re-run searches to keep up to date
52
Genome and Specialized BLAST
53
Genome BLAST pages
54
Map Viewer Homepage
55
Poplar Genome BLAST
56
tblastn Genome BLAST Results
Protein-nucleotide alignments
Exons and genes mixed
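tblastn compares a protein query with database nucleotide sequences translated in all six reading frames, which is why these results are protein-nucleotide alignments. Below is a minimal Python sketch of six-frame translation using the standard genetic code (NCBI translation table 1). Real tblastn translates on the fly and can use other genetic codes. The helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons; codons with letters other than A/C/G/T give 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return {'+1'..'+3', '-1'..'-3': protein} for a nucleotide sequence."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames
```

For `ATGAAATAG`, frame +1 reads `MK*` (ending at the TAG stop codon). Frame -1, read from the reverse complement `CTATTTCAT`, gives `LFH`.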
57
Genomic Context of BLAST Hits
58
Hits in Map Viewer
59
Specialized BLAST Pages
60
Service Addresses
  • General help: info@ncbi.nlm.nih.gov
  • BLAST: blast-help@ncbi.nlm.nih.gov

Telephone support: 301-496-2475