Title: Introduction to Entrez Genome Projects
Data scope of genome resources at NCBI
Environmental samples?
Organisms
- Nematoda: C. elegans, C. briggsae
- Microbes
- Viruses
- Fungi/small eukaryotes
- Plants: A. thaliana, barley, corn, oat, rice, soybean, tomato, wheat
- Fishes
- Insects: D. melanogaster, A. gambiae, D. pseudoobscura, honey bee
- Chicken
- Dog
- Mouse/rat
- Pig, cow
- Human
- Chimpanzee
Data scope of genome resources at NCBI
Sequences
- Nucleotide
  - EST, cDNA, mRNA, STS
  - Patents
  - GSS
- Traces
- Genomic
  - Complete genomes
  - Whole genome shotgun assembly (different assembly methods)
  - BAC clone-based sequencing
- Resequencing
- Annotation
Entrez Genome Project
- Why not Entrez Genomes?
  - Entrez Genomes is a collection of COMPLETE chromosomes, plasmids, organelles, and viruses.
  - Created in 1995.
  - Doesn't have a way of linking all the data for a given organism, other than by taxid.
  - Problems
    - How to define a COMPLETE genome
    - Same organism sequenced by different groups
      - Agrobacterium tumefaciens str. C58 (Cereon and U. Washington)
      - Corynebacterium glutamicum ATCC 13032 (Japan and Germany)
      - Bacillus licheniformis DSM 13 (USA and Germany)
    - A genome project is more than chromosomes and proteins
- Why not Entrez Taxonomy?
  - Designed as a taxonomic hierarchy, not organized by genomes
  - Collects all Entrez links associated with the organism
  - Problems
    - Same organism sequenced by different groups
    - Sequence links are lumped together, for example, Oryza sativa
Cultivar Chinsurah Boro II
Entrez Genome Project
Complete and incomplete large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms
- A project is defined by:
  - Organism
  - Project type (and/or sequencing method)
  - Sequencing center
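A minimal Python sketch of the project definition above, assuming a project is keyed on exactly three fields: organism, project type, and sequencing center. The class `ProjectKey`, the functions `group_into_projects` and `group_by_organism`, the project-type string, and the accessions `ACC_1`/`ACC_2` are hypothetical placeholders, not part of the Entrez Genome Project database. It shows why the Agrobacterium tumefaciens str. C58 case yields two projects, while grouping by organism alone (as a taxid-only link does) lumps the data together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectKey:
    """Hypothetical key: organism + project type + sequencing center."""
    organism: str
    project_type: str
    center: str


def group_into_projects(records):
    """Group (organism, project_type, center, accession) tuples into projects.

    Records that share all three key fields belong to the same project.
    """
    projects = defaultdict(list)
    for organism, project_type, center, accession in records:
        projects[ProjectKey(organism, project_type, center)].append(accession)
    return dict(projects)


def group_by_organism(records):
    """Group the same records by organism alone, as a taxid-only link would."""
    groups = defaultdict(list)
    for organism, _project_type, _center, accession in records:
        groups[organism].append(accession)
    return dict(groups)


# Placeholder records: one organism sequenced by two different centers.
records = [
    ("Agrobacterium tumefaciens str. C58", "genomic, complete", "Cereon", "ACC_1"),
    ("Agrobacterium tumefaciens str. C58", "genomic, complete", "U.Washington", "ACC_2"),
]

print(len(group_into_projects(records)))  # 2 projects
print(len(group_by_organism(records)))    # 1 organism group
```

Making the key a frozen dataclass keeps it hashable, so it can serve directly as a dictionary key.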
Schematic diagram of a generic eukaryotic genome project
[Figure: an organism-specific project overview with links to third-party sites, nucleotide data at NCBI (GenBank), genomic data at NCBI (RefSeq), and the component projects below]
- 1 Genomic sequencing (WGS), assembly, and annotation (complete), Center B
- 2 Genomic sequencing (WGS) (complete), Center A
- 4 BAC-end sequencing (incomplete), Center F
- 6 Large-scale cDNA sequencing (incomplete), Center B
Entrez Genome Project
How is it implemented?
- Hierarchical structure
- Flexible project types
- Related projects
- Entrez links
- Relational database
- Manually curated organism descriptions
- Related resources/links
- Sequencing centers
- Submission form
Entrez Genome Project
How is it presented?
Genome Project > Overview > Project
- Brief description (Docsum defline)
- Project data
- Lineage
- Image
- Chromosome info
- Map Viewer search
- Related projects
- Publications
- Organism description
- Resource links
- NCBI resources (tools)
- Organism data in GenBank
- Sequencing centers
- Sequencing projects
- Related resources
- Organism groups: Eukaryotes (animals, plants, fungi, protists); Prokaryotes (archaea, bacteria)
- Entrez search
- Reports
- Statistics
- Sequencing centers
- Eukaryotic projects
- Prokaryotic projects
- Sequence links
Eukaryotic Projects List
- Organism name
- Short summary
- Taxonomic groups
- Sequencing status
- Estimated size
Prokaryotic Genomic Data
Amount of data
- Genomes (nucleotides, proteins, RNAs)
- Expression analysis (microarrays, etc.)
- Microbial community sequencing (Sargasso Sea, etc.)
Organization of data
- Currently by type of data
- Taxonomically
Growth of complete microbial genomes in the last ten years
[Figure: 254 complete genomes as of September 1, 2005]
Deluge of data
Anatomy of a Prokaryotic Project
- External data and sites
- Genome information
- Organism and strain description
- Prokaryotic genome attributes
Prokaryotic Projects List
Microbial Projects List
Complete genomes: Organism, Kingdom, Genome size, GC content, Accessions, Release date, Center, NCBI links
Microbial Projects List
Genomes in progress: Organism, Kingdom, Contigs, Genome size, GC content, Accessions, BLAST, Center
Microbial Projects List
Organism info: Organism, Kingdom, Genome, GC, Gram, Shape, Arrangement, Spores, Motility, Salinity, Oxygen, Habitat, Temperature, Host, Disease
Organism/Genome Attributes
Project Types
Environmental Samples
Comparative Genomics
Future Directions
- Linking other data (microarrays)
- Comparative genomics projects (e.g., Bacillus)
- Environmental microbial community sequencing projects
- Links to granting agencies
- International Nucleotide Sequence Databases: metagenomic data provided by scientific communities
Submission of Projects
- Create a project from existing data
- Create a project from announced sequencing projects
- Direct submission from outside users
Submission of Projects
http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi
Entrez Genome Project
- Curators
  - Prokaryotes: William Klimke, Stacy Ciufo, Leigh Riley, Gert Roosen, Rich McVeigh, Nikolai Daraselia, Emir Khatipov
  - Eukaryotes: Ethan Carver, Melissa Landrum, Anjana Raina, Barbara Ruef, Patti Sherman, Janet Weber, Lynn Schriml
- Software developers: Andrei Kochergin, Sergei Resenchuk
- Graphics: Svetlana Iazvovskaia
- Usability: Mark Johnson
- Project coordinators: Tatiana Tatusova, Kim Pruitt
Entrez Genome Project
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=genomeprj
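The query URL for the Entrez Genome Project database can be assembled from its parameters. This Python sketch only builds the string and makes no network request; the helper `build_query_url` and the optional `term` parameter are assumptions for illustration, and it does not assume this legacy endpoint still serves the `genomeprj` database.

```python
from urllib.parse import urlencode

# Base URL and the CMD/DB parameters come from the slide's URL.
BASE_URL = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"


def build_query_url(term=None):
    """Build an Entrez query URL for the genomeprj database (no request is sent)."""
    params = {"CMD": "search", "DB": "genomeprj"}
    if term is not None:
        params["term"] = term  # hypothetical search-term parameter
    # urlencode joins pairs with "&" and encodes spaces as "+".
    return BASE_URL + "?" + urlencode(params)


print(build_query_url())
# http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=genomeprj
print(build_query_url("Oryza sativa"))
# http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=genomeprj&term=Oryza+sativa
```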
- 1391 projects indexed and searchable in Entrez
- 1706 in the works
- 1040 organism-specific overview projects with manual descriptions
Genome sequencing projects
Organism      Complete   In progress   Total
Prokaryotes   254        421           675
Eukaryotes    19         185           204
Total         273        606           879
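The totals in the genome sequencing projects table follow from the per-group counts. A short Python check of the row and column sums, using only the numbers in the table:

```python
# Per-group counts from the table: (complete, in progress).
counts = {
    "Prokaryotes": (254, 421),
    "Eukaryotes": (19, 185),
}

# Column totals across organism groups.
complete_total = sum(c for c, _ in counts.values())
in_progress_total = sum(p for _, p in counts.values())

# Row totals per organism group.
row_totals = {group: c + p for group, (c, p) in counts.items()}

print(row_totals)  # {'Prokaryotes': 675, 'Eukaryotes': 204}
print(complete_total, in_progress_total, complete_total + in_progress_total)  # 273 606 879
```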
Comments and suggestions are welcome. Mail to:
genomeprj@ncbi.nlm.nih.gov
genomes@ncbi.nlm.nih.gov