Title: National Center for Biotechnology Information
1. National Center for Biotechnology Information
- Created by Public Law 100-607 in 1988 as part of the National Library of Medicine at NIH to:
  - Create automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics.
  - Perform research into advanced methods of analyzing and interpreting molecular biology data.
  - Enable biotechnology researchers and medical care personnel to use the systems and methods developed.
- Builders and providers of GenBank, Entrez, BLAST, and PubMed. Online systems host about 1.8 million users per day, at peak rates of 3,200 web hits a second.
- Center for basic research and training in computational biology.
2. NCBI is the most heavily used site in biomedicine. Why?
3. Data, the Next Intel Inside
4. Comparative Analysis of Genes Enables Innovation in Assembly
Human   638  RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC  697
Yeast   657  RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC  716
E.coli  584  RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA  642
Colon cancer gene sequence
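As a rough illustration of what comparing these segments involves, the Python sketch below copies the three aligned rows from the slide, checks that each row's residue count matches its start and end coordinates, computes pairwise percent identity over ungapped columns, and marks the columns that are identical in all three organisms. The function names are hypothetical, and percent identity is only one simple way to summarize an alignment; this is not a description of how NCBI tools score alignments.

```python
# Aligned segments copied from the slide: (start coordinate, aligned row, end coordinate).
# "-" marks the single alignment gap in the E. coli row.
ALIGNMENT = {
    "Human": (638, "RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC", 697),
    "Yeast": (657, "RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC", 716),
    "E.coli": (584, "RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA", 642),
}


def residue_count(row: str) -> int:
    """Number of residues in an aligned row, not counting gap characters."""
    return sum(1 for c in row if c != "-")


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def conserved_columns(rows: list) -> str:
    """Residue where every row agrees (and none has a gap), '.' elsewhere."""
    return "".join(
        col[0] if col[0] != "-" and len(set(col)) == 1 else "."
        for col in zip(*rows)
    )


# Each row's coordinates should span exactly its residue count.
for name, (start, row, end) in ALIGNMENT.items():
    assert start + residue_count(row) - 1 == end, name

rows = [row for _, row, _ in ALIGNMENT.values()]
print(f"{percent_identity(rows[0], rows[1]):.1f}")  # Human vs. yeast: prints 73.3
print(conserved_columns(rows))
```

The conserved-column line makes the shared motif around GPNMGGKS stand out across all three organisms.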
5. Ignoring the Central Dogma in Bioinformatics Is Evidence of Stupid Design
6. It Guides Innovative Assembly of Separate Resources
GenBank, RefSeq, Human Genome, Bacterial Genome, Virus Genome, MMDB, PubMed, UniGene(s), LocusLink, OMIM, Taxonomy, GEO, PopSet, BLAST, Entrez, ePCR, Sequin
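7. Entrez Pathway to Discovery
[Diagram: Entrez record types and the links between them]
- MEDLINE abstracts to MEDLINE abstracts: term frequency statistics
- MEDLINE abstracts to protein sequences: literature citations in sequence databases
- MEDLINE abstracts to nucleotide sequences: literature citations in sequence databases
- Protein sequences to protein sequences: amino acid sequence similarity
- Nucleotide sequences to nucleotide sequences: nucleotide sequence similarity
- Nucleotide sequences to protein sequences: coding region features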
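The link types listed above let a search wander from an abstract to the proteins it cites and on to related sequences. The Python sketch below models that idea as a small graph of typed links and uses breadth-first search to find the shortest chain of links between two records. The record identifiers and the graph are hypothetical, links are treated as one-directional for simplicity, and this is not how Entrez actually computes or stores its neighbors.

```python
from collections import deque

# Hypothetical records keyed by (database, id); each maps to (target record, link type).
LINKS = {
    ("pubmed", "A1"): [(("pubmed", "A2"), "term frequency statistics"),
                       (("protein", "P1"), "literature citation in sequence database")],
    ("protein", "P1"): [(("protein", "P2"), "amino acid sequence similarity"),
                        (("nucleotide", "N1"), "coding region feature")],
    ("nucleotide", "N1"): [(("nucleotide", "N2"), "nucleotide sequence similarity")],
}


def discovery_path(start, goal):
    """Breadth-first search: shortest list of (record, link type used to reach it), or None."""
    previous = {start: (None, None)}  # record -> (predecessor, link type)
    queue = deque([start])
    while queue:
        record = queue.popleft()
        if record == goal:
            path = []
            while record is not None:
                parent, kind = previous[record]
                path.append((record, kind))
                record = parent
            return path[::-1]
        for target, kind in LINKS.get(record, []):
            if target not in previous:  # visit each record once
                previous[target] = (record, kind)
                queue.append(target)
    return None


for record, kind in discovery_path(("pubmed", "A1"), ("nucleotide", "N2")):
    print(record, "via", kind)
```

Breadth-first order is used because it returns the chain with the fewest hops, which is usually the most interpretable "pathway" between two records.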
8. Entrez Increases Discovery Space
9. Entrez Is Intrinsically Component-Based
- The NCBI C Toolkit enforces common modules across internal pipelines, external applications, and web components.
- Entrez has a common model for Booleans and Summaries, and unique models for deep data.
- New projects can be easily added or extended.
- Long-standing use of the productotype keeps NCBI agile, but (fairly) robust.
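One way to picture "a common model for Booleans and Summaries" is a shared interface that every database implements: a term index that supports Boolean combination, plus a one-line summary per record. The Python sketch below is a hypothetical illustration of that split, not NCBI's C Toolkit code; the class and method names are invented, and the database-specific "deep data" model is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class BooleanSummaryDatabase:
    """Hypothetical database exposing only the shared model: a term index for Boolean
    queries and a summary per record. Deep, database-specific records would live elsewhere."""
    name: str
    index: dict = field(default_factory=dict)      # term -> set of record UIDs
    summaries: dict = field(default_factory=dict)  # UID -> summary text

    def add(self, uid: int, terms: list, summary: str) -> None:
        for term in terms:
            self.index.setdefault(term.lower(), set()).add(uid)
        self.summaries[uid] = summary

    def search(self, *terms: str, op: str = "AND") -> set:
        """Combine the UID sets for each term with AND (intersection) or OR (union)."""
        hits = [self.index.get(t.lower(), set()) for t in terms]
        if not hits:
            return set()
        if op == "AND":
            return set.intersection(*hits)  # returns a new set, never the index's own
        if op == "OR":
            return set.union(*hits)
        raise ValueError(f"unsupported operator: {op}")

    def summarize(self, uids) -> list:
        """Summaries for the given UIDs, in UID order."""
        return [self.summaries[u] for u in sorted(uids)]


genes = BooleanSummaryDatabase("demo")
genes.add(1, ["mismatch", "repair", "human"], "Record 1 (hypothetical)")
genes.add(2, ["mismatch", "yeast"], "Record 2 (hypothetical)")
print(genes.search("mismatch", "repair"))       # {1}
print(genes.search("human", "yeast", op="OR"))  # {1, 2}
```

Because every database answers the same `search` and `summarize` calls, a new database only has to supply its index and summaries to plug into the common query layer.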
10. Web Services Provide Access to Entrez
- Eutils supports about 5 million service requests a day.
- SOAP versions support about 38,000 service requests a day (about 0.8% of the Eutils volume), similar to Amazon's experience with REST versus SOAP.
- Eutils allows outside sites to recreate Entrez, and NCBI does not know who is doing so or why.
- The current NCBI Sequence Viewer itself uses Eutils.
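For a sense of what a REST-style Eutils request looks like, the Python sketch below builds an ESearch URL and, optionally, fetches it and reads the record UIDs from the `IdList` element of the XML response. It assumes the public base URL `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/` and uses only the `db`, `term`, and `retmax` parameters; the helper names are hypothetical. NCBI's usage guidelines also ask clients to identify themselves (for example with `tool` and `email` parameters) and to limit their request rate, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request URL; urlencode escapes spaces and special characters."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


def esearch_ids(db: str, term: str, retmax: int = 20) -> list:
    """Run the search (requires network access) and return the UIDs listed in IdList."""
    with urlopen(esearch_url(db, term, retmax)) as response:
        root = ET.fromstring(response.read())
    return [el.text for el in root.findall("IdList/Id")]


print(esearch_url("pubmed", "colon cancer", retmax=5))
```

A plain HTTP GET like this is all an outside site needs, which is why Eutils traffic can rebuild Entrez-like services without NCBI knowing who is behind them.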
11. Harnessing Collective Intelligence in BioMedicine
12. Bibliographic Resources
- PubMed: citations and abstracts from publishers, plus MEDLINE indexing
- PMC: PubMed Central, full-text journal articles from publishers (and NIHMS)
- pPMC: portable mirror of PMC content
- NIHMS: NIH Manuscript Submission System for the Public Access policy
- NLM DTD: modular DTD for bibliographic material
- pNIHMS: portable NIHMS
- XML Authoring System: MS Word/XML authoring
- Bookshelf: books and monographs in XML from publishers and authors
13. PubMed Central XML
- Why XML?
  - Preserves the structure of an article
  - Lends itself to intelligent processing
  - Human-readable and not dependent on a particular technology
  - Based on SGML, a publishing industry standard
  - Portable and migratable
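To make "preserves structure" and "lends itself to intelligent processing" concrete, the Python sketch below parses a tiny article and pulls out its title and section headings using only the standard library. The sample article is made up for illustration; it uses element names from the NLM journal article DTDs (`article`, `front`, `article-meta`, `title-group`, `article-title`, `body`, `sec`, `title`, `p`), but a real PMC article carries far more metadata and markup.

```python
import xml.etree.ElementTree as ET

# A minimal, made-up article tagged with element names from the NLM journal article DTDs.
SAMPLE_ARTICLE = """<article article-type="research-article">
  <front>
    <article-meta>
      <title-group><article-title>An example article</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
    <sec><title>Methods</title><p>Second paragraph.</p></sec>
  </body>
</article>"""


def outline(xml_text: str):
    """Return the article title and the titles of all sections, in document order."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-meta/title-group/article-title")
    sections = [sec.findtext("title") for sec in root.iter("sec")]
    return title, sections


print(outline(SAMPLE_ARTICLE))
```

Because the structure is explicit in the tags, the same few lines work for any article tagged this way, independent of how it was originally typeset.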
14. PMC2
- Content is converted to a standard XML format on ingest, then stored and rendered from that one format.
- But which format?
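The convert-once, render-from-one-format idea can be sketched in a few lines of Python: rename a publisher's tags into a single stored vocabulary at ingest, then generate HTML only from the stored tree. The publisher tag names and the mapping are hypothetical, and real ingest involves far more than renaming elements (restructuring, metadata, validation); this only shows where the single stored format sits in the pipeline.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mapping from one publisher's tag names to the single stored vocabulary.
TAG_MAP = {"headline": "article-title", "section": "sec", "heading": "title", "para": "p"}


def ingest(publisher_xml: str) -> ET.Element:
    """Convert on ingest: rename publisher-specific tags so every article is stored in one format."""
    root = ET.fromstring(publisher_xml)
    for el in root.iter():
        el.tag = TAG_MAP.get(el.tag, el.tag)  # unknown tags are kept unchanged
    return root


def render_html(stored: ET.Element) -> str:
    """Render only from the stored format, never from the publisher's original markup."""
    html_tags = {"article-title": "h1", "title": "h2", "p": "p"}
    parts = []
    for el in stored.iter():
        if el.tag in html_tags:
            tag = html_tags[el.tag]
            parts.append(f"<{tag}>{html.escape(el.text or '')}</{tag}>")
    return "\n".join(parts)


stored = ingest("<article><headline>Example</headline>"
                "<section><heading>Intro</heading><para>Text</para></section></article>")
print(render_html(stored))
```

Adding a new publisher then means writing one more ingest mapping, while every renderer keeps working against the same stored format.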
15. Harvard E-Journal Archiving Project
- The Mellon Foundation funded the Harvard Library to study the feasibility of using one DTD for archiving journal articles.
- Harvard commissioned Inera, Inc. for the E-Journal Archive DTD Feasibility Study.
- Conclusion: yes, it is feasible, but the right DTD does not exist.
- Recommendations from the study were used in a modified PMC DTD. NCBI collaborated with Harvard to broaden the scope of the new PMC DTD to accommodate journals from all disciplines (not just the life sciences).
16. NLM Journal Article DTDs: Establishing Standards from Practice
- Archiving and Interchange DTD
  - Purpose is to preserve journals' intellectual content
  - Written for:
    - ease of conversion (from other DTDs)
    - completeness (union of current journal DTDs)
- Journal Publishing DTD
  - A subset of the Archiving DTD
  - Written for:
    - authoring article content
    - initial tagging of non-XML content
    - creating consistent structures
17. Adoption
- HighWire Press
- JSTOR's Electronic Archiving Initiative
- Australia's Commonwealth Scientific and Industrial Research Organization
- PLoS and other PMC contributors
- Atypon Systems (over 150 titles) and other conversion vendors and journal service providers
- Wiley, Nature, Blackwell common format (PXI)
18. Support
- Complete documentation for both DTDs available online
- Established public discussion lists for user questions
- Generic transformations to HTML and PDF forms of articles
- Public XML validation tool
- A working group of leaders in the printing and markup industries advises on changes to the Tag Set
19. Portable PubMed Central (pPMC)
- Provides a local mirror of PMC content
- Updated daily from NCBI
- Multiple-site archiving
- Provides rendering of PMC XML into HTML
- Provides searching through NCBI EUtils
- Allows controlled local content in the presentation
- Provides a first step toward collaborative archiving
- Collaboration with Microsoft on support
20. What's on the Bookshelf?
21. Diabetes and Obesity
- Health information with links to molecular data
- NIDDK advisors on content
- 10,000 users per month
- "A truly valuable resource." (Gene Barrett, President, American Diabetes Association)
22. Books
- Authoring in MS Word
- Simple mark-up based on Word styles
- WordML to XML conversion
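The style-driven conversion described above can be sketched as: read each paragraph's Word style from WordML and map it to an XML element. The Python sketch below assumes Word 2003 XML (WordML) input, with its standard namespace and `w:p`, `w:pPr`, `w:pStyle`, and `w:t` elements; the style names and target element names in `STYLE_TO_TAG` are hypothetical. It produces a flat element list and ignores inline formatting, lists, tables, and section nesting, which a real conversion must handle.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML (WordML) documents.
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

# Hypothetical house styles mapped to hypothetical target element names.
STYLE_TO_TAG = {"Title": "book-title", "Heading1": "title", "Normal": "p"}


def styled_paragraphs(wordml: str):
    """Yield (style name, plain text) for each paragraph in a WordML document."""
    root = ET.fromstring(wordml)
    for para in root.iter(f"{W}p"):
        style_el = para.find(f"{W}pPr/{W}pStyle")
        # Paragraphs without an explicit style use Word's default "Normal" style.
        style = style_el.get(f"{W}val") if style_el is not None else "Normal"
        text = "".join(t.text or "" for t in para.iter(f"{W}t"))
        yield style, text


def wordml_to_xml(wordml: str) -> str:
    """Map each styled paragraph to one element; unknown styles fall back to <p>."""
    book = ET.Element("book")
    for style, text in styled_paragraphs(wordml):
        ET.SubElement(book, STYLE_TO_TAG.get(style, "p")).text = text
    return ET.tostring(book, encoding="unicode")
```

Keeping authors in plain Word with a small, fixed set of styles is what makes a mapping this simple workable: the styles carry the structure that the converter turns into tags.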
23. (No transcript)
24. BioMedicine Moves to the Web
- Electronic authoring and distribution of articles
- Linking and annotating factual data as a side effect
- Ability to mine data and text together
- Richer data between supported databases
- High-throughput biology generates large datasets stored in public repositories
- Common factual data roadmap
- Greater transparency
- Greater incidental collaboration for discovery
- New private sites for discussion on this armature
- New products arise from a public infrastructure
25. Influenza Anti-viral Compounds
26. Influenza Anti-viral Compounds
27. Influenza Anti-viral/Protein Binding
28. Influenza Neuraminidase Gene
29. Influenza Genome Project
30. Influenza Assembly Archive