Searching GenBank - PowerPoint PPT Presentation

Provided by: tech1

Transcript and Presenter's Notes

Title: Searching GenBank


1
Searching GenBank
  • Ansuman Chattopadhyay, PhD
  • Ansuman_at_pitt.edu
  • 412-648-1297

2
What you will learn..
  • Entrez
  • GenBank
  • Entrez Links

3
Resource Centers
  • The National Center for
  • Biotechnology Information (NCBI)
  • European Bioinformatics
  • Institute (EBI)

4
NCBI
  • Created as a part of the National Library of
    Medicine in 1988
  • Tools: BLAST (1990), Entrez (1992)
  • GenBank (1992)
  • Free MEDLINE (PubMed, 1997)
  • Other databases: dbEST, dbGSS, dbSTS, MMDB, OMIM,
    UniGene, GeneMap, Taxonomy, CGAP, SAGE, LocusLink,
    RefSeq

5
(No Transcript)
6
EBI
The EBI serves researchers in molecular biology,
genetics, medicine and agriculture from
academia, and the agricultural, biotechnology,
chemical and pharmaceutical industries. The EBI
does this by building, maintaining and making
available databases and information services
relevant to molecular biology, as well as
carrying out research in bioinformatics and
computational molecular biology.
7
Molecular Databases
  • Primary databases
  • Original submissions by experimentalists
  • Database staff organize but do not add additional
    information
  • Example: GenBank
  • Derivative databases
  • Human curated: compilation and correction of data
  • Example: SWISS-PROT, NCBI RefSeq mRNA
  • Computationally derived
  • Example: UniGene

8
Find nucleotide sequence for human Epidermal
growth factor receptor
9
GenBank
  • Nucleotide-only sequence database
  • Archival in nature
  • Data shared nightly among three collaborating
    databases
  • GenBank at NCBI
  • DNA Database of Japan (DDBJ)
  • EMBL at EBI

10
The International Sequence Database Collaboration
Source: NCBI
11
GenBank Release 131.0, December 15, 2003
  • 30,968,418 sequences
  • 36,553,368,485 bases
  • Full release every two months
  • Incremental and cumulative updates daily
  • Available only through the Internet

ftp://ftp.ncbi.nih.gov/genbank/
12
Growth of GenBank
Source: NCBI
13
GenBank Traditional Division
  • BCT: Bacterial and Archaeal
  • INV: Invertebrate
  • MAM: Mammalian (excluding ROD and PRI)
  • PHG: Phage
  • PLN: Plant and Fungal
  • PRI: Primate
  • ROD: Rodent
  • SYN: Synthetic (cloning vectors)
  • VRL: Viral
  • VRT: Other Vertebrate
14
GenBank Special Sequence Division
  • PAT: Patent
  • EST: Expressed Sequence Tags
  • STS: Sequence Tagged Site
  • GSS: Genome Survey Sequence
  • HTG: High Throughput Genome
  • CON: Contig
15
GenBank Record
  • Header: information that applies to
    the whole record
  • Features: annotations on the record
  • Sequence

16
GenBank Record
Header
  • Locus Name
  • Sequence Length
  • Molecule Type
  • GenBank Division
  • Modification Date
  • Accession Number
  • Version Number
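
The header fields named on this slide mostly sit on the first (LOCUS) line of a GenBank flat file. As a rough sketch of how that line could be pulled apart in Python, assuming whitespace-separated fields and an optional topology word: the sample line below is made up for illustration and is not a real GenBank record, and the function name is hypothetical.

```python
# Division codes as listed on the two "GenBank ... Division" slides.
DIVISIONS = {
    "BCT": "Bacterial and Archaeal", "INV": "Invertebrate",
    "MAM": "Mammalian", "PHG": "Phage", "PLN": "Plant and Fungal",
    "PRI": "Primate", "ROD": "Rodent", "SYN": "Synthetic",
    "VRL": "Viral", "VRT": "Other Vertebrate", "PAT": "Patent",
    "EST": "Expressed Sequence Tags", "STS": "Sequence Tagged Site",
    "GSS": "Genome Survey Sequence", "HTG": "High Throughput Genome",
    "CON": "Contig",
}


def parse_locus_line(line):
    """Split a LOCUS line into the header fields named on the slide."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("not a recognizable LOCUS line: %r" % line)
    # The topology word (linear/circular) may be absent, so the division
    # and the modification date are read from the end of the line.
    division = fields[-2]
    return {
        "locus_name": fields[1],
        "sequence_length": int(fields[2]),
        "molecule_type": fields[4],
        "division": division,
        "division_name": DIVISIONS.get(division, "unknown"),
        "modification_date": fields[-1],
    }


# Made-up example line (assumption, for illustration only).
example = "LOCUS       EXAMPLE01               1234 bp    mRNA    linear   PRI 15-DEC-2003"
record = parse_locus_line(example)
print(record)
```

The accession and version numbers live on their own ACCESSION and VERSION lines, so they are not covered by this sketch.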
17
GenBank Record
FEATURE
Link to Seq
18
GenBank Record
Sequence
19
Entrez
  • An integrated database search and
  • retrieval system

20
Entrez
21
Entrez
http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
Select GenBank
22
Find mRNA sequence for human epidermal growth
factor receptor
23
Too many results
24
Too many results even after specifying human
epidermal growth factor receptor
Why are there non-human records?
25
Click Details to see how Entrez translated the
search
GenBank records with the text word "human"
present in any field will appear in the result set
26
Specify human as an organism
Click Preview/Index
Specify human by selecting Organisms from
All Fields drop-down menu
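
The difference between the two searches comes down to the field tag attached to "human". A minimal sketch of the two query strings, written in Entrez's bracketed field-tag syntax; the helper names are made up for illustration, and the exact translation Entrez shows under Details may differ.

```python
def tag(term, field):
    """Attach an Entrez field tag: tag('human', 'Organism') gives human[Organism]."""
    return "%s[%s]" % (term, field)


def and_terms(*terms):
    """Join search terms with the Boolean AND operator."""
    return " AND ".join(terms)


gene_text = "epidermal growth factor receptor"

# Matches records with the word "human" anywhere, including non-human
# records that merely mention it.
broad = and_terms(gene_text, tag("human", "All Fields"))

# Matches only records whose source organism is human.
narrow = and_terms(gene_text, tag("human", "Organism"))

print(broad)
print(narrow)
```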
27
28
Still too many records but all non-human records
are gone
29
Limit your search
Exclude all technology generated records
Select mRNA in the Molecule list
Select RefSeq in the database list
30
RefSeq
  • Database of reference sequences
  • Curated
  • Non-redundant: one record for each gene, or each
    splice variant, from each organism represented
  • Each record is intended to present an
    encapsulation of the current understanding of a
    gene or protein, similar to a review article

RefSeq FAQ
31
RefSeq
32
Molecular databases
33
RefSeq Status Codes
  • Provisional
  • Reviewed
  • Predicted
  • Genome Annotation

34
All results are RefSeq mRNA sequences with
NM_XXXXXX accession numbers, but there are still
too many results
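
The accession prefix is what identifies these hits as RefSeq mRNA records. A small sketch that classifies an accession by its two-letter RefSeq prefix; the table covers only a few common prefixes, and the accessions used are placeholders rather than real records.

```python
import re

# A few common RefSeq accession prefixes (not an exhaustive list).
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA (model)",
    "XP": "predicted protein (model)",
    "NC": "complete genomic molecule",
}

# Two letters, an underscore, digits, and an optional ".version" suffix.
ACCESSION = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def describe_refseq(accession):
    """Return the record type for a RefSeq-style accession, else None."""
    match = ACCESSION.match(accession)
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))


# Placeholder accessions, used only to exercise the pattern.
print(describe_refseq("NM_000000"))    # RefSeq-style mRNA accession
print(describe_refseq("NP_000000.1"))  # RefSeq-style protein accession
print(describe_refseq("U00000"))       # no RefSeq prefix
```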
35
Records with the text epidermal growth factor
receptor present in any field will appear in the
result set
36
Records with the text epidermal growth factor
receptor present in any field will appear in the
result set, so we will also get epidermal growth
factor receptor activator, binding protein,
inhibitor, etc.
37
Search GenBank using Gene Name
Find Gene Name from Gene Directories
  • Gene directories
  • LocusLink (human, mouse, rat, cow, nematode,
    zebrafish, fruit fly)
  • http://www.ncbi.nlm.nih.gov/LocusLink/
  • SGD (Yeast)
  • http://www.yeastgenome.org/

38
Find Gene Name by searching LocusLink
http://www.ncbi.nlm.nih.gov/LocusLink/
Select organism
39
LocusLink
40
Find mRNA sequence for epidermal growth factor
receptor (EGFR)
Start with the gene name EGFR
  • Limit search to Gene Name
  • Exclude all technology-generated records
  • Select mRNA as Molecule
  • Select RefSeq as source database

41
Specify human as an organism
42
Find mRNA sequence for epidermal growth factor
receptor (EGFR)
43
  • Searching GenBank: steps
  • Find the Gene Name for your gene of interest
  • Start with the Gene Name and, using the Limits
    option, limit your search to
  • --- Gene Name (All Fields)
  • --- mRNA (Molecule)
  • --- RefSeq (Only From)
  • Use Preview/Index to specify organism
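
The same steps can also be expressed as a single query string. The sketch below assembles one and wraps it in a URL for NCBI's E-utilities ESearch service, a programmatic route the slides do not cover. The property terms `biomol_mrna[PROP]` and `srcdb_refseq[PROP]` are assumed here to correspond to the mRNA and RefSeq limits, and the function names are hypothetical. The code only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(gene, organism, mrna_only=True, refseq_only=True):
    """Combine gene name, organism, and optional limits into one query."""
    parts = ["%s[Gene Name]" % gene, "%s[Organism]" % organism]
    if mrna_only:
        parts.append("biomol_mrna[PROP]")    # assumed equivalent of Molecule: mRNA
    if refseq_only:
        parts.append("srcdb_refseq[PROP]")   # assumed equivalent of Only From: RefSeq
    return " AND ".join(parts)


def esearch_url(term, db="nucleotide", retmax=20):
    """Build (but do not fetch) an ESearch URL for the given query."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return ESEARCH + "?" + query


term = build_term("EGFR", "human")
url = esearch_url(term)
print(term)
print(url)
```

Passing `mrna_only=False` drops the molecule limit, which is useful for organisms where no RefSeq mRNA records exist.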

44
  • Find mRNA sequence for
  • Human p53
  • Mouse BRCA1
  • Yeast SNF1

45
Find mRNA sequence for human p53
Without using Limits option
With Limits
46
Find mRNA sequence for mouse BRCA1
Using Limits
Without Limits
47
Find mRNA sequence for yeast SNF1
48
Find mRNA sequence for yeast SNF1
RefSeq does not have any mRNA records for yeast.
Modify your search strategy for yeast:
limit your search to 1. Gene Name and 2. RefSeq.
Do not specify any Molecule type.
49
Find mRNA sequence for yeast SNF1
50
Find mRNA sequence for yeast SNF1
51
Find mRNA sequence for yeast SNF1
Use Ctrl+F to find SNF1 in the record
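
For a long genomic record, the same lookup can be scripted. The sketch below scans the feature table of a GenBank flat file for features carrying a matching /gene qualifier. It assumes the usual layout, with feature keys indented five spaces and qualifiers indented further. The sample text, its coordinates, and the second gene name are made up for illustration.

```python
import re

# A feature line: five spaces, a feature key, then a location.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S+)")


def find_gene_features(flatfile_text, gene):
    """Return (feature_key, location) pairs whose /gene qualifier matches."""
    hits = []
    current = None
    wanted = '/gene="%s"' % gene
    for line in flatfile_text.splitlines():
        match = FEATURE_LINE.match(line)
        if match:
            # Remember the feature that the following qualifiers belong to.
            current = (match.group(1), match.group(2))
        elif current is not None and line.strip() == wanted:
            hits.append(current)
    return hits


QUALIFIER_INDENT = " " * 21

# Made-up feature table (assumption, for illustration only).
sample = "\n".join([
    "FEATURES             Location/Qualifiers",
    "     source          1..5000",
    QUALIFIER_INDENT + '/organism="Saccharomyces cerevisiae"',
    "     gene            1200..3100",
    QUALIFIER_INDENT + '/gene="SNF1"',
    "     CDS             1200..3100",
    QUALIFIER_INDENT + '/gene="SNF1"',
    "     gene            complement(3500..4200)",
    QUALIFIER_INDENT + '/gene="EXAMPLE2"',
])

hits = find_gene_features(sample, "SNF1")
print(hits)
```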
52
Find mRNA sequence for yeast SNF1
53
Find mRNA sequence for yeast SNF1
54
RefSeq Record
55
RefSeq Record
Header
56
RefSeq Record
57
RefSeq Record
Features
58
RefSeq Record
Sequence
GenBank format
59
Sequence Format
60
Sequence Format
61
Entrez Neighbors and Hard Links
[Figure labels: Word weight, 3-D Structure, VAST, Phylogeny, Protein sequences, BLAST]
Source: NCBI
62
Use Links to retrieve genetic information
63
Entrez Gene
64
Entrez Gene
65
Entrez Gene
66
(No Transcript)
67
(No Transcript)
68
BLINK Pre-computed BLAST link
Homologous sequences from different organisms
69
BLINK Pre-computed BLAST link
Results from structure database searches
70
p53 Core Domain in Complex with DNA
71
p53 Core Domain in Complex with DNA
72
NCBI site map: a good place to find resources
http://www.ncbi.nlm.nih.gov/Sitemap/index.html