Title: Scientific Data as Research Infrastructure: The Biomedical Sciences
1 Scientific Data as Research Infrastructure: The Biomedical Sciences
Image created by Rachel Jones
- Alexa T. McCray
- Center for Biomedical Informatics
- Harvard Medical School
- Strategies for Economic Sustainability of Publicly Funded Data Repositories
- Board on Research Data and Information, March 12, 2014
2 Data Sharing Policies in the Life Sciences
- "There are many reasons to share data from NIH-supported studies ... Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data." (NIH Data Sharing Policy and Implementation Guidance)
- "The ICMJE member journals will require, as a condition of consideration for publication, registration in a public trials registry." (Clinical Trial Registration: A Statement from the International Committee of Medical Journal Editors)
- "A condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to others without undue qualifications. The preferred way to share large data sets is via public repositories." (Instructions to authors, Nature Journals)
- "We continue to request that the authors provide the data underlying the findings described in their manuscript ... authors need to indicate where the data are housed, at the time of submission." (PLOS Data Policy)
3 Where Biomedical Data Are Housed
- National Institutes of Health
- NLM/NCBI
- Dozens of databases
- Institute-specific databases
- e.g., National Database for Autism Research (NDAR)
- Nucleic Acids Research 2014 database issue
- 58 new molecular biology databases
- Updates to 123 databases
- Community-driven domain-specific repositories
- BioDB catalogue lists 622 databases
- e.g., AgingGenesDB, Mouse Phenome Database, FlyBase
4 Data Stewardship in Transition
- Plan for an NIH Data Discovery Index (DDI)
- Index of publicly available biomedical datasets
- Motivation
- Catalyze scientific progress
- Reduce duplication of experimental data collection
- Reward the data provider
- Long-term sustainability of the underlying data sources not clear
- Long-term value of the data
- Long-term costs dependent on
- Selection, curation, and access