Title: NCBI Molecular Biology Resources
1NCBI Molecular Biology Resources
March 2008
2Basic Local Alignment Search Tool
- Widely used similarity search tool
- Heuristic approach based on the Smith-Waterman algorithm
- Finds best local alignments
- Provides statistical significance
- All combinations (DNA/Protein) of query and database
- DNA vs DNA
- DNA translation vs Protein
- Protein vs Protein
- Protein vs DNA translation
- DNA translation vs DNA translation
- www, standalone, and network client
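The "DNA translation" searches listed above compare conceptual translations of a nucleotide sequence in all six reading frames. The following is a minimal sketch of six-frame translation, assuming the standard genetic code; the function names are illustrative and are not part of BLAST.

```python
# Standard genetic code, indexed with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3, keyed by frame name."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

A translated search in the style of blastx or tblastn would then compare each of the six frames against protein sequences.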
3What BLAST tells you
- BLAST reports surprising alignments
- Different than chance
- Assumptions
- Random sequences
- Constant composition
- Conclusions
- Surprising similarities imply evolutionary homology
Evolutionary homology: descent from a common ancestor. Does not always imply similar function.
4BLAST and BLAST-like programs
- Traditional BLAST (blastall): nucleotide, protein, translations
- blastn: nucleotide query vs. nucleotide database
- blastp: protein query vs. protein database
- blastx: nucleotide query vs. protein database
- tblastn: protein query vs. translated nucleotide database
- tblastx: translated query vs. translated database
- Megablast: nucleotide only
- Contiguous megablast
- Nearly identical sequences
- Discontiguous megablast
- Cross-species comparison
- Position Specific BLAST Programs: protein only
- Position Specific Iterative BLAST (PSI-BLAST)
- Automatically generates a position specific score matrix (PSSM)
- Reverse PSI-BLAST (RPS-BLAST)
- Searches a database of PSI-BLAST PSSMs
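The choice among the five traditional programs follows from the query type, the database type, and whether nucleotide sequences are translated. The lookup below is a small sketch of that decision; `choose_program` and its arguments are hypothetical names used only for illustration.

```python
def choose_program(query, database, translated=False):
    """Pick a traditional BLAST program name.

    query, database: "nucleotide" or "protein".
    translated: for nucleotide vs. nucleotide only, compare translations
    of both sequences (tblastx) instead of the nucleotides (blastn).
    """
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if translated else "blastn"
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "nucleotide" and database == "protein":
        return "blastx"   # the query is translated
    if query == "protein" and database == "nucleotide":
        return "tblastn"  # the database is translated
    raise ValueError(f"unsupported combination: {query} vs. {database}")
```

For example, `choose_program("protein", "nucleotide")` returns `"tblastn"`.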
5Local Alignment Statistics
High scores of local alignments between two
random sequences follow the Extreme Value
Distribution
Expect Value (E): number of database hits you expect to find by chance
E depends on the size of the database and your score
[Figure: plot of Alignments against Score, with the expected number of random hits marked]
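The dependence of E on database size and score can be made concrete with the standard Karlin-Altschul expression, E = K·m·n·e^(−λS). The slide names only the inputs, so the formula, the simplification of using raw rather than effective lengths, and the parameter values below (λ = 0.267, K = 0.041, values commonly reported for BLOSUM62 with gap costs 11/1) are assumptions of this sketch.

```python
import math

# Assumed statistical parameters (gapped BLOSUM62, gap open 11, extend 1).
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score to a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits with at least this raw score.

    Uses E = K * m * n * exp(-lambda * S); real BLAST also applies
    edge-effect corrections to m and n, which are omitted here.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)
```

Doubling the database length doubles E, while each additional bit of score halves it. With these assumed parameters, a raw score of 4061 converts to a bit score close to the 1568.9 shown in the XML example later in this deck.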
6Scoring Systems
- Position Independent Matrices
- Nucleic Acids: identity matrix
- Proteins
- PAM Matrices (Percent Accepted Mutation)
- Implicit model of evolution
- Higher PAM numbers are all calculated from PAM1
- PAM250 widely used
- BLOSUM Matrices (BLOck SUbstitution Matrices)
- Empirically determined from alignments of conserved blocks
- Each includes information up to a certain level of identity
- BLOSUM62 widely used
- Position Specific Score Matrices (PSSMs)
- PSI and RPS BLAST
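To contrast position-independent matrices with a PSSM, the sketch below derives per-column log-odds scores from a small ungapped alignment. It is a simplified illustration: the uniform background frequency and the flat pseudocount are assumptions, and PSI-BLAST's actual sequence weighting and pseudocount scheme are more elaborate.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for pos in range(len(aligned_seqs[0])):
        counts = Counter(
            seq[pos] for seq in aligned_seqs if seq[pos] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_with_pssm(pssm, seq):
    """Sum the column-specific scores of seq (unknown residues score 0)."""
    return sum(col.get(res, 0.0) for col, res in zip(pssm, seq))
```

Unlike PAM or BLOSUM, the same residue pair can score differently at each position: a residue conserved in one column scores highly there and poorly elsewhere.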
7WWW BLAST Interface
8The BLAST homepage
www.ncbi.nlm.nih.gov/blast
9Basic BLAST Databases
10BLAST Databases Non-redundant protein
Services: blastp, blastx
- nr (non-redundant protein sequences)
- GenBank CDS translations
- NP_, XP_ RefSeqs
- Outside Protein
- PIR, Swiss-Prot, PRF
- PDB (sequences from structures)
- pat: protein patents
- env_nr: environmental samples
11Nucleotide Databases Human and Mouse
Megablast, blastn service
- Human and mouse genomic and transcript now default
- Separate sections in output for mRNA and genomic
- Direct links to Map Viewer for genomic sequences
12Nucleotide Databases Traditional
Services: blastn, tblastn, tblastx
13Nucleotide Databases Traditional
Databases are mostly non-overlapping
- htgs
- HTG division
- gss
- GSS division
- wgs
- whole genome shotgun
- env_nt
- environmental samples
- nr (nt)
- Traditional GenBank
- NM_ and XM_ RefSeqs
- refseq_rna
- refseq_genomic
- NC_ RefSeqs
- dbest
- EST Division
- est_human, mouse, others
14Basic BLAST Protein Searches
15Universal Form Protein
16BLAST and Molecular Evolution
[Figure: evolutionary timeline marked at 3000 Myr, 1000 Myr, and 540 Myr, labeled with human disease genes: Alzheimer's disease, ataxia telangiectasia, colon cancer, pancreatic carcinoma]
17Protein BLAST Page
18Limiting Database Organism
Organism autocomplete
19Limiting Database Entrez Query
all[filter] NOT mammals[organism]
gene_in_mitochondrion[Properties]
2006:2007[Modification Date]
Nucleotide: biomol_mrna[Properties], biomol_genomic[Properties]
20Run Search
21BLAST Formatting Page
Conserved Domain Results
22BLAST Output Graphical Overview
Sort by taxonomy
mouse over
23BLAST Output Descriptions
24TaxBLAST Taxonomy Reports
25BLAST Output Alignments
Identical match
positive score (conservative)
gap
Negative or zero
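The legend above describes the middle line of a protein alignment: the residue letter for an identical match, "+" for a non-identical pair with a positive score, and a blank for gaps and for pairs scoring zero or less. A minimal sketch of that rule follows; the tiny score table is a toy stand-in for a real substitution matrix and is an assumption of this example.

```python
# Toy scores for a few conservative pairs; every other mismatch scores -1.
TOY_SCORES = {
    frozenset("KR"): 2,
    frozenset("IL"): 2,
    frozenset("DE"): 2,
}


def toy_score(a, b):
    """Score a mismatched residue pair with the toy table."""
    return TOY_SCORES.get(frozenset((a, b)), -1)


def midline(query, subject, score=toy_score):
    """Build the line shown between two aligned sequences."""
    chars = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            chars.append(" ")   # gap
        elif q == s:
            chars.append(q)     # identical match
        elif score(q, s) > 0:
            chars.append("+")   # positive score (conservative)
        else:
            chars.append(" ")   # negative or zero
    return "".join(chars)
```

With these toy scores, `midline("MKIL-D", "MRIVAE")` gives `"M+I  +"`.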
26Position Specific Iterative BLAST
27MLH1 and ETR1
>gi|4557757|ref|NP_000240.1| MutL protein homolog 1 [Homo sapiens]
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAI
KEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNGTGIRK
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKT
ADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALK
NPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNA
STVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFL
LFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNV
HPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLP
GLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQP
LSKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSL
EGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEM
TAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWA
LAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAM
LALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEI
DEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKE
CAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHI
LPPKHFTEDGNILQLANLPDLYKVFERC
Human Mismatch Repair Protein
>gi|22095656|sp|O81122.1|ETR1_MALDO Ethylene receptor
MLACNCIEPQWPADELLMKYQYISDFFIALAYFSIPLELIY
FVKKSAVFPYRWVLVQFGAFIVLCGATHLINLWTFSIHSRTVAMVMTTA
KVLTAVVSCATALMLVHIIPDLLSVKTRELFLKNKAAELDREMGLIRTQE
ETGRHVRMLTHEIRSTLDRHTILKTTLVELGRTLALEECALWMPTRTGL
ELQLSYTLRQQNPVGYTVPIHLPVINQVFSSNRAVKISANSPVAKLRQL
AGRHIPGEVVAVRVPLLHLSNFQINDWPELSTKRYALMVLMLPSDSARQ
WHVHELELVEVVADQVAVALSHAAILEESMRARDLLMEQNIALDLARREA
ETAIRARNDFLAVMNHEMRTPMHAIIALSSLLQETELTAEQRLMVETIL
RSSNLLATLINDVLDLSRLEDGSLQLEIATFNLHSVFREVHNMIKPVAS
IKRLSVTLNIAADLPMYAIGDEKRLMQTILNVVGNAVKFSKEGSISITAF
VAKSESLRDFRAPDFFPVQSDNHFYLRVQVKDSGSGINPQDIPKLFTKF
AQTQALATRNSGGSGLGLAICKRFVNLMEGHIWIESEGLGKGCTATFIV
KLGFPERSNESKLPFAPKLQANHVQTNFPGLKVLVMDDNGVSRSVTKGL
LAHLGCDVTAVSLIDELLHVISQEHKVVFMDVSMPGIDGYELAVRIHEKF
TKRHERPVLVALTGSIDKITKENCMRVGVDGVILKPVSVDKMRSVLSEL
LEHRVLFEAM
Apple ethylene receptor
28PSI-BLAST Iteration 1
29PSI-BLAST Iteration 4
Plant ethylene receptors, bacterial two-component
regulatory system kinases
30RPS-BLAST Conserved Domains
31Algorithm parameters Protein
Expand
Adjust to set stringency
Default statistics adjustment for compositional bias
Off now by default. Conflicts with comp-based stats
32Automatic Short Sequence Adjustment
- E-value: 20000
- Word Size: 2
- Matrix: PAM30
- Comp Stats: Off
- Low Comp Filter: Off
Nucleotide and Protein
33Basic BLAST Nucleotide
34Universal Form Nucleotide
35Nucleotide Results ALB mRNA
megablast
disco. megablast
blastn
36Nucleotide BLAST Human Genome
37Sortable Results
Separate Sections for Transcript and Genome
Direct links to Entrez Databases
38Total Score All Segments
39Alignments Sorting in Exon Order
40Links to Map Viewer
Chromosome 9
Chromosome 1
41Algorithm parameters Nucleotide
blastn
- Prevents starting alignment in masked region
- Allows extensions through masked regions
Masks LC sequence (simple repeats)
42BLAST Formatting Options
43Protein Formatting Page
as: HTML, Plain Text, ASN.1, XML
Show: Alignment, PSSM, PssmWithParameters, Bioseq
Alignment View:
- Pairwise
- Pairwise with dots for identities
- Query-anchored with dots for identities
- Query-anchored with letters for identities
- Flat query-anchored with dots for identities
- Flat query-anchored with letters for identities
- Hit table
44Structured formats XML and ASN.1
<Iteration_hits>
  <Hit>
    <Hit_num>1</Hit_num>
    <Hit_id>gi|730028|sp|P40692|MLH1_HUMAN</Hit_id>
    <Hit_def>DNA mismatch repair protein Mlh1 (MutL protein homolog 1)</Hit_def>
    <Hit_accession>P40692</Hit_accession>
    <Hit_len>756</Hit_len>
    <Hit_hsps>
      <Hsp>
        <Hsp_num>1</Hsp_num>
        <Hsp_bit-score>1568.9</Hsp_bit-score>
        <Hsp_score>4061</Hsp_score>
        <Hsp_evalue>0</Hsp_evalue>
        <Hsp_query-from>1</Hsp_query-from>
        <Hsp_query-to>756</Hsp_query-to>
        <Hsp_hit-from>1</Hsp_hit-from>
        <Hsp_hit-to>756</Hsp_hit-to>
        <Hsp_query-frame>0</Hsp_query-frame>
        <Hsp_hit-frame>0</Hsp_hit-frame>
        <Hsp_identity>0</Hsp_identity>
        <Hsp_positive>0</Hsp_positive>
        <Hsp_gaps>0</Hsp_gaps>
        <Hsp_align-len>756</Hsp_align-len>
XML
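Structured output such as the XML above is intended for programmatic parsing. The sketch below reads hits with Python's standard `xml.etree.ElementTree`. The embedded fragment is abbreviated from the slide, and its closing tags and the `|` separators in the identifier are assumptions added so that it is well-formed; `parse_hits` is an illustrative helper, not an NCBI API.

```python
import xml.etree.ElementTree as ET

# Abbreviated fragment; closing tags added so that it parses (assumption).
XML_FRAGMENT = """
<Iteration_hits>
  <Hit>
    <Hit_num>1</Hit_num>
    <Hit_id>gi|730028|sp|P40692|MLH1_HUMAN</Hit_id>
    <Hit_accession>P40692</Hit_accession>
    <Hit_len>756</Hit_len>
    <Hit_hsps>
      <Hsp>
        <Hsp_num>1</Hsp_num>
        <Hsp_bit-score>1568.9</Hsp_bit-score>
        <Hsp_score>4061</Hsp_score>
        <Hsp_evalue>0</Hsp_evalue>
        <Hsp_align-len>756</Hsp_align-len>
      </Hsp>
    </Hit_hsps>
  </Hit>
</Iteration_hits>
"""


def parse_hits(xml_text):
    """Return one dict per HSP with a few fields of interest."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        for hsp in hit.iter("Hsp"):
            hits.append({
                "accession": hit.findtext("Hit_accession"),
                "length": int(hit.findtext("Hit_len")),
                "bit_score": float(hsp.findtext("Hsp_bit-score")),
                "evalue": float(hsp.findtext("Hsp_evalue")),
            })
    return hits
```

Calling `parse_hits(XML_FRAGMENT)` yields a single record for accession P40692.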
Seq-annot ::= {
  desc {
    user {
      type str "Hist Seqalign",
      data {
        {
          label str "Hist Seqalign",
          data bool TRUE
        }
      }
    },
    user {
      type str "Blast Type",
      data {
        {
          label id 0,
          data int 0
        }
      }
    },
    user {
      type str "BLAST database title",
      data {
        {
          label str "Non-redundant SwissProt
ASN.1
45The Hit Table
# BLASTP 2.2.17 (Aug-26-2007)
# Query: gi|4557757|ref|NP_000240.1| MutL protein homolog 1 [Homo sapiens]
# Database: swissprot
# Fields: query id, subject ids, % identity, % positives, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 80 hits found
ref|NP_000240.1|gi|4557757	gi|1709056|sp|P38920|MLH1_YEAST	36.68	56.91	796	426	18	8	756	5	769	7e-138	491
ref|NP_000240.1|gi|4557757	gi|48474996|sp|Q9P7W6|MLH1_SCHPO	37.24	54.04	768	371	16	8	756	9	684	8e-122	437
ref|NP_000240.1|gi|4557757	gi|25090753|sp|Q8RA70|MUTL_THETN	37.44	54.62	390	231	7	8	394	4	383	5e-59	229
ref|NP_000240.1|gi|4557757	gi|25090732|sp|Q8KAX3|MUTL_CHLTE	35.95	54.05	370	229	5	8	375	4	367	5e-55	215
ref|NP_000240.1|gi|4557757	gi|127552|sp|P23367.2|MUTL_ECOLI	35.99	58.11	339	202	7	8	334	3	338	8e-55	214
ref|NP_000240.1|gi|4557757	gi|29427778|sp|Q8FAK9|MUTL_ECOL6	35.99	58.11	339	202	7	8	334	3	338	1e-54	214
ref|NP_000240.1|gi|4557757	gi|20455084|sp|Q8XDN4|MUTL_ECO57	35.99	58.11	339	202	7	8	334	3	338	1e-54	214
ref|NP_000240.1|gi|4557757	gi|59798328|sp|Q72PF7|MUTL_LEPIC	36.27	55.20	375	221	8	6	375	2	363	3e-54	213
ref|NP_000240.1|gi|4557757	gi|13431695|sp|P57886|MUTL_PASMU	35.48	58.94	341	213	6	8	345	3	339	4e-54	212
ref|NP_000240.1|gi|4557757	gi|1171080|sp|P44494|MUTL_HAEIN	35.74	59.87	319	198	6	8	323	3	317	5e-54	212
ref|NP_000240.1|gi|4557757	gi|20455102|sp|Q8ZIW4|MUTL_YERPE	36.01	58.63	336	207	6	8	339	3	334	6e-54	212
ref|NP_000240.1|gi|4557757	gi|20455152|sp|Q9JYT2|MUTL_NEIMB	33.96	55.35	374	224	8	8	376	4	359	2e-53	210
ref|NP_000240.1|gi|4557757	gi|20139217|sp|Q9KAC1|MUTL_BACHD	35.39	55.90	356	214	6	8	362	4	344	2e-53	209
ref|NP_000240.1|gi|4557757	gi|31076794|sp|Q87L05|MUTL_VIBPA	35.33	58.38	334	210	5	8	338	3	333	3e-53	209
ref|NP_000240.1|gi|4557757	gi|20455150|sp|Q9JTS2|MUTL_NEIMA	36.94	58.28	314	183	5	8	316	4	307	5e-53	209
ref|NP_000240.1|gi|4557757	gi|56749233|sp|Q6GHD9|MUTL_STAAR	38.28	58.46	337	193	7	6	335	2	330	1e-52	207
ref|NP_000240.1|gi|4557757	gi|25090739|sp|Q8NWX9|MUTL_STAAW	38.28	58.46	337	193	7	6	335	2	330	1e-52	207
ref|NP_000240.1|gi|4557757	gi|71151979|sp|Q5HGD5|MUTL_STAAC	38.28	58.46	337	193	7	6	335	2	330	1e-52	207
ref|NP_000240.1|gi|4557757	gi|54037875|sp|P65492|MUTL_STAAN	38.28	58.46	337	193	7	6	335	2	330	2e-52	207
ref|NP_000240.1|gi|4557757	gi|20043258|sp|Q9KV13|MUTL_VIBCH	35.74	58.56	333	204	6	8	335	3	330	2e-52	207
ref|NP_000240.1|gi|4557757	gi|127553|sp|P14161|MUTL_SALTY	35.10	56.93	339	205	7	8	334	3	338	3e-52	206
ref|NP_000240.1|gi|4557757	gi|20455140|sp|Q9CDL1|MUTL_LACLA	36.31	56.55	336	196	5	6	334	2	326	4e-52	206
ref|NP_000240.1|gi|4557757	gi|61214242|sp|Q7MH01|MUTL_VIBVY	34.63	58.51	335	213	5	8	339	3	334	4e-52	206
ref|NP_000240.1|gi|4557757	gi|20455099|sp|Q8Z187|MUTL_SALTI	35.10	56.93	339	205	7	8	334	3	338	4e-52	206
ref|NP_000240.1|gi|4557757	gi|31076809|sp|Q8DCV0|MUTL_VIBVU	34.63	58.51	335	213	5	8	339	3	334	6e-52	205
ref|NP_000240.1|gi|4557757	gi|71648717|sp|Q5E2C6|MUTL_VIBF1	36.71	59.81	316	186	6	8	316	3	311	1e-51	204
ref|NP_000240.1|gi|4557757	gi|37999611|sp|Q88DD1|MUTL_PSEPK	30.34	48.97	435	278	7	8	419	7	439	2e-51	203
Importable into spreadsheets
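Because the hit table is plain delimited text, it can also be read by a short script as well as a spreadsheet. The sketch below assumes tab-delimited rows and `#`-prefixed comment lines; the field names in `FIELDS`, the `|` separators in the sample identifiers, and `parse_hit_table` itself are illustrative choices rather than part of BLAST.

```python
import csv
import io

# Column names follow the "Fields" line of the hit table (assumed spelling).
FIELDS = [
    "query_id", "subject_ids", "pct_identity", "pct_positives",
    "alignment_length", "mismatches", "gap_opens",
    "q_start", "q_end", "s_start", "s_end", "evalue", "bit_score",
]

# Two rows from the table above, rebuilt with tab separators.
SAMPLE = "\n".join([
    "# Fields: query id, subject ids, ...",
    "\t".join(["ref|NP_000240.1|gi|4557757", "gi|1709056|sp|P38920|MLH1_YEAST",
               "36.68", "56.91", "796", "426", "18", "8", "756", "5", "769",
               "7e-138", "491"]),
    "\t".join(["ref|NP_000240.1|gi|4557757", "gi|48474996|sp|Q9P7W6|MLH1_SCHPO",
               "37.24", "54.04", "768", "371", "16", "8", "756", "9", "684",
               "8e-122", "437"]),
])


def parse_hit_table(text):
    """Parse tabular BLAST output into a list of typed dicts."""
    lines = (
        line for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    )
    rows = []
    for record in csv.reader(lines, delimiter="\t"):
        row = dict(zip(FIELDS, record))
        for name in FIELDS[2:4] + FIELDS[11:]:
            row[name] = float(row[name])   # percentages, E-value, bit score
        for name in FIELDS[4:11]:
            row[name] = int(row[name])     # lengths, counts, coordinates
        rows.append(row)
    return rows
```

A typical use is filtering, for example keeping only rows whose `evalue` falls below a chosen threshold.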
46PSSMs Restart PSI-BLAST
ASN.1 ScoreMat, Portable
ASCII encoded, Web only
47BLAST TreeView
Black bear mt genome vs. RefSeq Genomic
48Distance Tree Carnivore Mitochondrial Genome
raccoon
weasels
red panda
cats
mongooses
dogs
true seals
sea lions
fur seal
walrus
bears
49Managing Searches
- Recent Results
- Saved Strategies
50Recent Results
Login to My NCBI to save search strategies
Results available for 36 hours
51Saved Strategies
Re-run searches to keep up to date
52Genome and Specialized BLAST
53Genome BLAST pages
54Map Viewer Homepage
55Poplar Genome BLAST
56tblastn Genome BLAST Results
Protein-nucleotide alignments
Exons and genes mixed
57Genomic Context of BLAST Hits
58Hits in Map Viewer
59Specialized BLAST Pages
60Service Addresses
- General Help: info@ncbi.nlm.nih.gov
- BLAST: blast-help@ncbi.nlm.nih.gov
Telephone support: 301-496-2475