Title: Searching GenBank
1. Searching GenBank
- Ansuman Chattopadhyay, PhD
- Ansuman_at_pitt.edu
- 412-648-1297
2. What you will learn
- Entrez
- GenBank
- Entrez Links
3. Resource Centers
- The National Center for Biotechnology Information (NCBI)
- European Bioinformatics Institute (EBI)
4. NCBI
- Created as part of the National Library of Medicine in 1988
- Tools: BLAST (1990), Entrez (1992)
- GenBank (at NCBI since 1992)
- Free MEDLINE (PubMed, 1997)
- Other databases: dbEST, dbGSS, dbSTS, MMDB, OMIM, UniGene, GeneMap, Taxonomy, CGAP, SAGE, LocusLink, RefSeq
6. EBI
The EBI serves researchers in molecular biology,
genetics, medicine and agriculture from
academia, and the agricultural, biotechnology,
chemical and pharmaceutical industries. The EBI
does this by building, maintaining and making
available databases and information services
relevant to molecular biology, as well as
carrying out research in bioinformatics and
computational molecular biology.
7. Molecular Databases
- Primary databases
  - Original submissions by experimentalists
  - Database staff organize the data but don't add information
  - Example: GenBank
- Derivative databases
  - Human curated: compilation and correction of data
    - Examples: SWISS-PROT, NCBI RefSeq mRNA
  - Computationally derived
    - Example: UniGene
8. Find the nucleotide sequence for human epidermal growth factor receptor
9. GenBank
- Nucleotide-only sequence database
- Archival in nature
- Data shared nightly among three collaborating databases:
  - GenBank at NCBI
  - DNA Data Bank of Japan (DDBJ)
  - EMBL at EBI
10. The International Nucleotide Sequence Database Collaboration (Source: NCBI)
11. GenBank Release 131.0 (December 15, 2003)
- 30,968,418 sequences
- 36,553,368,485 bases
- Full release every two months
- Incremental and cumulative updates daily
- Available only through the Internet: ftp://ftp.ncbi.nih.gov/genbank/
12. Growth of GenBank (Source: NCBI)
13. GenBank Traditional Divisions
- BCT: Bacterial and Archaeal
- INV: Invertebrate
- MAM: Other mammalian (excluding ROD and PRI)
- PHG: Phage
- PLN: Plant and Fungal
- PRI: Primate
- ROD: Rodent
- SYN: Synthetic (cloning vectors)
- VRL: Viral
- VRT: Other vertebrate
14. GenBank Special Sequence Divisions
- PAT: Patent
- EST: Expressed Sequence Tags
- STS: Sequence Tagged Sites
- GSS: Genome Survey Sequences
- HTG: High-Throughput Genomic
- CON: Contig
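The division code appears on the first line of every GenBank record, so it helps to be able to decode it at a glance. The sketch below simply turns the two code lists from slides 13 and 14 into a lookup table; it covers only those codes, and the function name `describe_division` is a hypothetical helper, not part of any NCBI tool.

```python
# Division codes exactly as listed on slides 13 and 14 (not an exhaustive list).
TRADITIONAL = {
    "BCT": "Bacterial and Archaeal",
    "INV": "Invertebrate",
    "MAM": "Other mammalian (excluding ROD and PRI)",
    "PHG": "Phage",
    "PLN": "Plant and Fungal",
    "PRI": "Primate",
    "ROD": "Rodent",
    "SYN": "Synthetic (cloning vectors)",
    "VRL": "Viral",
    "VRT": "Other vertebrate",
}
SPECIAL = {
    "PAT": "Patent",
    "EST": "Expressed Sequence Tags",
    "STS": "Sequence Tagged Sites",
    "GSS": "Genome Survey Sequences",
    "HTG": "High-Throughput Genomic",
    "CON": "Contig",
}


def describe_division(code: str) -> str:
    """Return a readable label for a three-letter division code (hypothetical helper)."""
    code = code.upper()
    if code in TRADITIONAL:
        return f"{code}: {TRADITIONAL[code]} (traditional division)"
    if code in SPECIAL:
        return f"{code}: {SPECIAL[code]} (special division)"
    raise KeyError(f"division code not in these tables: {code}")


print(describe_division("pri"))
```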
15. GenBank Record
- Header: information that applies to the whole record
- Features: annotations on the record
- Sequence
16. GenBank Record: Header
Labeled header fields: Locus Name, Sequence Length, Molecule Type, GenBank Division, Modification Date, Accession Number, Version Number
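Most of the header fields labeled on slide 16 sit on the record's first line, the LOCUS line. A minimal parsing sketch follows, assuming the whitespace-separated token order name, length, "bp", molecule type, optional topology, division, date; the example line is made up, not a real record, and Accession and Version are on their own lines and are not handled here.

```python
from dataclasses import dataclass


@dataclass
class LocusLine:
    """Header fields carried on the LOCUS line of a GenBank record."""
    name: str
    length: int
    molecule_type: str
    division: str
    modification_date: str


def parse_locus(line: str) -> LocusLine:
    """Split a LOCUS line on whitespace (assumed token order, see lead-in)."""
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS":
        raise ValueError(f"not a LOCUS line: {line!r}")
    return LocusLine(
        name=tokens[1],
        length=int(tokens[2]),
        molecule_type=tokens[4],
        # division and date are the last two tokens whether or not
        # a topology token (linear/circular) precedes them
        division=tokens[-2],
        modification_date=tokens[-1],
    )


# Made-up example line, for illustration only
example = "LOCUS       EXAMPLE01               1234 bp    mRNA    linear   PRI 15-DEC-2003"
print(parse_locus(example))
```

Taking the division and date from the end of the line keeps the sketch working whether or not the optional topology token is present.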
17. GenBank Record: Features
Labeled: Link to Seq
18. GenBank Record: Sequence
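In the flat file, the sequence section starts at the ORIGIN line and ends at `//`; each line begins with the position of its first base, followed by blocks of ten bases. A small sketch of recovering the continuous sequence from that layout (the sample lines are invented and shorter than real 60-base lines):

```python
def parse_origin(lines):
    """Join the bases between ORIGIN and // into one uppercase string."""
    parts = []
    in_origin = False
    for line in lines:
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break
        if in_origin:
            # drop the leading position number, keep the 10-base blocks
            parts.extend(line.split()[1:])
    return "".join(parts).upper()


# Invented sample, for illustration only
sample = [
    "ORIGIN",
    "        1 acgtacgtac ggttaacctt",
    "       21 gattaca",
    "//",
]
print(parse_origin(sample))
```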
19. Entrez
- An integrated database search and retrieval system
20. Entrez
21. Entrez
http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
Select GenBank
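The slides use the Entrez web page, but the same searches can also be sent to NCBI's E-utilities programmatic interface. The sketch below only builds an ESearch URL and does not contact NCBI; using `db="nuccore"` for the nucleotide database and `retmax` to cap the number of returned IDs are assumptions to check against the current E-utilities documentation.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request for an Entrez query string."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(esearch_url("human epidermal growth factor receptor"))
```

Passing the URL to `urllib.request.urlopen` returns an XML list of matching record IDs; check NCBI's usage guidelines before sending many requests.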
22. Find the mRNA sequence for human epidermal growth factor receptor
23. Too many results
24. Too many results, even after specifying human epidermal growth factor receptor
Why are there non-human records?
25. Click Details to see how Entrez translated the query
GenBank records with the text word "human" present in any field appear in the result set.
26. Specify human as an organism
Click Preview/Index.
Specify human by selecting Organism from the All Fields drop-down menu.
28. Still too many records, but all non-human records are gone
29. Limit your search
- Exclude all technology-generated records
- Select mRNA in the Molecule list
- Select RefSeq in the database list
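Each Limits checkbox on slide 29 corresponds to extra terms that Entrez ANDs or NOTs onto the query. A sketch of that mapping follows, assuming the property names `biomol_mrna[PROP]` (molecule type mRNA) and `srcdb_refseq[PROP]` (source database RefSeq); the `gbdiv_*[PROP]` names used to exclude EST, GSS, STS and HTG records are a further assumption, and `apply_limits` is a hypothetical helper.

```python
def apply_limits(term, mrna_only=True, refseq_only=True, exclude_technology=True):
    """Append Limits-style restrictions to an Entrez query (assumed property names)."""
    parts = [f"({term})"]
    if mrna_only:
        parts.append("biomol_mrna[PROP]")
    if refseq_only:
        parts.append("srcdb_refseq[PROP]")
    query = " AND ".join(parts)
    if exclude_technology:
        # assumed division properties for technology-generated records
        excluded = " OR ".join(f"gbdiv_{d}[PROP]" for d in ("est", "gss", "sts", "htg"))
        query += f" NOT ({excluded})"
    return query


print(apply_limits("epidermal growth factor receptor AND human[Organism]"))
```

Entrez evaluates Boolean operators left to right, so the NOT clause is applied after all the AND restrictions.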
30. RefSeq
- Database of reference sequences
- Curated
- Non-redundant: one record for each gene, or each splice variant, from each organism represented
- "Each record is intended to present an encapsulation of the current understanding of a gene or protein, similar to a review article" (RefSeq FAQ)
31. RefSeq
32. Molecular Databases
33. RefSeq Status Codes
- Provisional
- Reviewed
- Predicted
- Genome Annotation
34. All results are RefSeq mRNA sequences with NM_XXXXXX accession numbers, but there are still too many results
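The NM_ prefix is what marks these hits as RefSeq mRNA records; other two-letter prefixes identify other RefSeq record types. The sketch below decodes a subset of those prefixes and the optional version suffix; the table is deliberately partial, and the accessions in the example are placeholders rather than real record lookups.

```python
import re
from typing import Optional, Tuple

# A subset of RefSeq accession prefixes (not exhaustive)
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "XM": "predicted mRNA (model)",
    "XR": "predicted non-coding RNA (model)",
    "NP": "protein",
    "XP": "predicted protein (model)",
    "NC": "complete genomic molecule",
    "NG": "genomic region",
}

# two letters, underscore, digits, optional ".version"
ACCESSION = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession: str) -> Tuple[str, Optional[int]]:
    """Return (record type, version or None) for a RefSeq-style accession."""
    match = ACCESSION.match(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, _number, version = match.groups()
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix {prefix}_ is not in this partial table")
    return REFSEQ_PREFIXES[prefix], (int(version) if version else None)


print(classify_refseq("NM_123456.2"))  # placeholder accession
```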
35. Records with the text "epidermal growth factor receptor" present in any field appear in the result set
36. Records with the text "epidermal growth factor receptor" present in any field appear in the result set, so the results also include records such as epidermal growth factor receptor activator, binding protein, inhibitor, etc.
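The problem on slides 35 and 36 is that a phrase search matches the phrase anywhere, including inside longer names, while a gene-name search matches a single symbol. The toy comparison below illustrates that difference on invented records; the titles and the symbols HYP1 to HYP3 are hypothetical and do not come from GenBank.

```python
# Invented records for illustration only; HYP1..HYP3 are hypothetical symbols.
records = [
    {"title": "epidermal growth factor receptor, mRNA", "gene": "EGFR"},
    {"title": "epidermal growth factor receptor activator, mRNA", "gene": "HYP1"},
    {"title": "epidermal growth factor receptor binding protein, mRNA", "gene": "HYP2"},
    {"title": "epidermal growth factor receptor inhibitor, mRNA", "gene": "HYP3"},
]


def text_word_hits(records, phrase):
    """Like an All Fields text search: the phrase may occur anywhere in the title."""
    phrase = phrase.lower()
    return [r["gene"] for r in records if phrase in r["title"].lower()]


def gene_name_hits(records, symbol):
    """Like a [Gene Name] search: the gene symbol must match exactly."""
    return [r["gene"] for r in records if r["gene"].upper() == symbol.upper()]


print(text_word_hits(records, "epidermal growth factor receptor"))  # all four records
print(gene_name_hits(records, "egfr"))                              # only EGFR
```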
37. Search GenBank Using the Gene Name
Find the gene name from gene directories
- Gene directories
  - LocusLink (human, mouse, rat, cow, nematode, zebrafish, fruit fly): http://www.ncbi.nlm.nih.gov/LocusLink/
  - SGD (yeast): http://www.yeastgenome.org/
38. Find the gene name by searching LocusLink
http://www.ncbi.nlm.nih.gov/LocusLink/
Select organism
39. LocusLink
40. Find the mRNA sequence for epidermal growth factor receptor (EGFR)
Start with the gene name EGFR
- Limit search to:
  - Gene Name
  - Exclude all technology-generated records
  - Select mRNA as Molecule
  - Select RefSeq as source database
41. Specify human as an organism
42. Find the mRNA sequence for epidermal growth factor receptor (EGFR)
43. Searching GenBank: Steps
- Find the gene name for your gene of interest
- Start with the gene name and, using the Limits option, limit your search to:
  - Gene Name (All Fields)
  - mRNA (Molecule)
  - RefSeq (Only From)
- Use Preview/Index to specify the organism
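The steps on slide 43 amount to one combined query: a field-restricted gene name, an organism, and the two Limits. The sketch folds them into a single Entrez query string instead of using the Limits and Preview/Index pages; the `[Gene Name]` and `[Organism]` field tags are standard Entrez fields, while `biomol_mrna[PROP]` and `srcdb_refseq[PROP]` are assumed property names and `gene_query` is a hypothetical helper.

```python
def gene_query(gene: str, organism: str, mrna_only: bool = True, refseq_only: bool = True) -> str:
    """Combine gene name, organism, and Limits into one Entrez query string."""
    # quote multi-word organism names so Entrez treats them as one phrase
    organism_term = f'"{organism}"' if " " in organism else organism
    parts = [f"{gene}[Gene Name]", f"{organism_term}[Organism]"]
    if mrna_only:
        parts.append("biomol_mrna[PROP]")   # assumed: Molecule = mRNA
    if refseq_only:
        parts.append("srcdb_refseq[PROP]")  # assumed: Only From = RefSeq
    return " AND ".join(parts)


print(gene_query("EGFR", "Homo sapiens"))
# Yeast strategy from slide 48: keep Gene Name and RefSeq, drop the molecule type
print(gene_query("SNF1", "Saccharomyces cerevisiae", mrna_only=False))
```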
44. Find the mRNA sequence for:
- Human p53
- Mouse BRCA1
- Yeast SNF1
45. Find the mRNA sequence for human p53
Without using the Limits option
With Limits
46. Find the mRNA sequence for mouse BRCA1
Using Limits
Without Limits
47. Find the mRNA sequence for yeast SNF1
48. Find the mRNA sequence for yeast SNF1
RefSeq does not have any mRNA records for yeast.
Modify your search strategy for yeast: limit your search to (1) Gene Name and (2) RefSeq, without specifying any molecule type.
49. Find the mRNA sequence for yeast SNF1
50. Find the mRNA sequence for yeast SNF1
51. Find the mRNA sequence for yeast SNF1
Use Ctrl+F to find SNF1 in the record.
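Ctrl+F works in a browser; with a saved flat-file record, the same lookup can be scripted. The sketch below assumes the usual feature-table layout (feature keys indented 5 spaces, qualifiers indented further) and reports every feature whose `/gene` qualifier equals the target; the sample lines, coordinates, and the symbol GENEA are invented.

```python
def find_gene_features(record_lines, gene):
    """Return (feature key, location) for features with /gene="<gene>"."""
    target = f'/gene="{gene}"'
    hits = []
    key = location = None
    for line in record_lines:
        if line.startswith("ORIGIN"):
            break  # sequence section reached; no more features
        indent = len(line) - len(line.lstrip(" "))
        if indent == 5 and line.strip():
            # feature line: key, then location
            parts = line.split(None, 1)
            key = parts[0]
            location = parts[1].strip() if len(parts) > 1 else ""
        elif line.strip() == target and key is not None:
            hits.append((key, location))
    return hits


# Invented feature table, for illustration only
sample = [
    "FEATURES             Location/Qualifiers",
    "     gene            complement(100..900)",
    '                     /gene="GENEA"',
    "     gene            1000..2500",
    '                     /gene="SNF1"',
    "     CDS             1000..2500",
    '                     /gene="SNF1"',
    "ORIGIN",
]
print(find_gene_features(sample, "SNF1"))
```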
52. Find the mRNA sequence for yeast SNF1
53. Find the mRNA sequence for yeast SNF1
54. RefSeq Record
55. RefSeq Record: Header
56. RefSeq Record
57. RefSeq Record: Features
58. RefSeq Record: Sequence (GenBank format)
59. Sequence Format
60. Sequence Format
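Besides GenBank format, Entrez can display a sequence in FASTA format: a single `>` header line followed by the bare sequence wrapped at a fixed width. A minimal formatter sketch follows; the default width of 70 is an assumption based on what NCBI commonly emits, and the header text is whatever the caller passes in.

```python
def to_fasta(header: str, sequence: str, width: int = 70) -> str:
    """Format a sequence as FASTA, wrapping it at `width` characters per line."""
    seq = "".join(sequence.split()).upper()  # drop any whitespace, normalize case
    lines = [f">{header}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


print(to_fasta("example sequence", "acgt" * 20, width=60), end="")
```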
61. Entrez Neighbors and Hard Links (Source: NCBI)
Diagram labels: word weight, 3-D structure, VAST, phylogeny, protein sequences, BLAST
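The links between Entrez databases in this diagram can also be followed programmatically with the E-utilities ELink service. The sketch only builds the request URL; the record ID is a placeholder, and the idea that setting `dbfrom` and `db` to the same database (for example protein to protein) returns computed neighbors such as BLAST-similar sequences should be checked against the ELink documentation.

```python
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(uid: str, dbfrom: str = "nuccore", db: str = "protein") -> str:
    """Build (but do not send) an ELink request for records in `db` linked to `uid`."""
    return f"{ELINK}?{urlencode({'dbfrom': dbfrom, 'db': db, 'id': uid})}"


print(elink_url("123456"))  # placeholder ID
```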
62. Use Links to retrieve genetic information
63. Entrez Gene
64. Entrez Gene
65. Entrez Gene
68. BLINK: Pre-computed BLAST Links
Homologous sequences from different organisms
69. BLINK: Pre-computed BLAST Links
Results from structure database searches
70. p53 Core Domain in Complex with DNA
71. p53 Core Domain in Complex with DNA
72. NCBI site map: a good place to find resources
http://www.ncbi.nlm.nih.gov/Sitemap/index.html