Enabling the reusability of scientific data: Experiences with designing an open access infrastructure for sharing datasets

1
  • Enabling the reusability of scientific data:
    Experiences with designing an open access
    infrastructure for sharing datasets

Simon J. Coles
EPSRC National Crystallography Service
School of Chemistry, University of Southampton
2
Data & the Publication Problem
[Figure: values 2,000,000; 25,000,000; 450,000 (labels not recoverable)]
3
A Different Approach to Data Publication?
Underlying data
Intellect & Interpretation
4
Requirements
  • Capture of all digital data and information
    generated during the course of an experiment
  • Data validation
  • Adding value
  • Archival system for data with attached
    bibliographic and chemical metadata
  • Automatic report generation
  • Schema and protocols for publication and
    dissemination of a dataset

5
Open Access Crystal Structure Archive
ecrystals.chem.soton.ac.uk
6
Access to the Underlying Data
7
Publicising Content
8
Harvesting, Linking and Aggregating
9
Usability: Quality & Uniformity of data
  • Different laboratories, practices and instruments
    produce a heterogeneous body of data
  • Publish according to an IUCr-ratified schema
  • To support publication according to this schema, a
    toolbox add-on to the archive has been developed
  • The toolbox requires only 2 mandatory files and is
    capable of performing file format conversions and
    generating value-added files
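The slide does not name the toolbox's two mandatory files, so the sketch below shows only the "refuse a deposition until the mandatory files are present" step under a loud assumption: a CIF (`.cif`) and a reflection file (`.hkl`) stand in for the real pair. The helper names are hypothetical, and format conversion is not shown.

```python
from pathlib import PurePath

# HYPOTHETICAL: the real toolbox's two mandatory files are not named on the
# slide; a CIF and a reflection (.hkl) file are assumed for illustration.
MANDATORY_SUFFIXES = (".cif", ".hkl")


def missing_mandatory_files(filenames):
    """Return the mandatory suffixes that no uploaded file matches."""
    present = {PurePath(name).suffix.lower() for name in filenames}
    return [suffix for suffix in MANDATORY_SUFFIXES if suffix not in present]


def can_deposit(filenames):
    """A deposition may proceed only when every mandatory file is present."""
    return not missing_mandatory_files(filenames)


print(missing_mandatory_files(["sample01.cif", "notes.txt"]))  # ['.hkl']
print(can_deposit(["sample01.CIF", "sample01.hkl"]))           # True
```

Matching on lower-cased suffixes keeps the check tolerant of upper-case extensions produced by different instruments and laboratories.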

10
Usability: Ease of Deposition & Metadata Quality
  • Minimal number of manual metadata entries; many
    can be hardwired into the system
  • Deposition guidelines initially prepared by
    students to provide impartial feedback
  • Full documentation and in-line help/examples
  • Restricted lists, e.g. keywords
  • Data deposited automatically by the toolbox
  • Automated generation of metadata for the report and
    the OAI interface
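Restricting keywords to a fixed list is one way to keep manual metadata entries consistent across depositors. The sketch below splits a depositor's comma-separated entry and separates accepted from rejected terms. The vocabulary `ALLOWED_KEYWORDS` is a hypothetical placeholder, not the archive's actual list.

```python
# HYPOTHETICAL controlled vocabulary; the archive's real keyword list
# is not given in the presentation.
ALLOWED_KEYWORDS = frozenset(
    {"organic", "inorganic", "organometallic", "polymorph", "coordination compound"}
)


def split_keywords(raw):
    """Split a comma-separated entry into trimmed, lower-cased terms."""
    return [term.strip().lower() for term in raw.split(",") if term.strip()]


def validate_keywords(raw):
    """Return (accepted, rejected) terms checked against the restricted list."""
    accepted, rejected = [], []
    for term in split_keywords(raw):
        if term in ALLOWED_KEYWORDS:
            accepted.append(term)
        else:
            rejected.append(term)
    return accepted, rejected


print(validate_keywords("Organic, polymorph, shiny"))
# (['organic', 'polymorph'], ['shiny'])
```

Returning the rejected terms, rather than silently dropping them, lets a deposition form point the depositor at the in-line help for the permitted list.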

11
Usability: Data Validation
  • Peer review is absent from self-deposit publication
  • Simple consistency checks made by the toolbox
  • Checks for crystallographic integrity made
    through a web service (IUCr checkCIF)
  • Introduction of a data editor for the archive: a
    deposition must be signed off by a recognised
    professional before going live
  • Quality indicators automatically extracted from the
    dataset and presented in an HTML jump-off page
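The last bullet can be sketched as: read a few quality indicators from the deposited CIF and render them as HTML. The presentation does not say which indicators the archive shows, so the three CIF core data names below (`_refine_ls_R_factor_gt`, `_refine_ls_wR_factor_ref`, `_refine_ls_goodness_of_fit_ref`) are an assumption. The parser is deliberately minimal: it only handles single-line `tag value` pairs and ignores loops and multi-line text fields, so it is not a general CIF reader.

```python
import html

# ASSUMPTION: which indicators the jump-off page displays is not stated;
# these CIF core data names are chosen for illustration.
INDICATORS = {
    "_refine_ls_R_factor_gt": "R factor (gt)",
    "_refine_ls_wR_factor_ref": "wR factor (ref)",
    "_refine_ls_goodness_of_fit_ref": "Goodness of fit",
}


def read_single_items(cif_text):
    """Collect one-line 'tag value' pairs (loops and text fields are ignored)."""
    items = {}
    for line in cif_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0]] = parts[1].strip().strip("'\"")
    return items


def quality_table(cif_text):
    """Render whichever indicators are present as a small HTML table."""
    items = read_single_items(cif_text)
    rows = [
        f"<tr><th>{html.escape(label)}</th><td>{html.escape(items[tag])}</td></tr>"
        for tag, label in INDICATORS.items()
        if tag in items
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Hypothetical CIF fragment; the values are placeholders, not real results.
example = """data_sample01
_refine_ls_R_factor_gt 0.0345
_refine_ls_wR_factor_ref 0.0912
_refine_ls_goodness_of_fit_ref 1.045
"""
print(quality_table(example))
```

Escaping every value keeps a malformed deposition from injecting markup into the public page; a production archive would use a proper CIF parser instead of the line-based reader.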

12
Usability: Identifiers
  • The URL of a deposited dataset provides an identifier
  • Persistent only if the institutional support
    model is accepted / adopted
  • Signed up with an agency to register metadata
    relating to datasets with a DOI
  • Pay the registry to ensure that the DOI always resolves
    to the associated dataset (10 cents to register, 1 cent
    per annum to maintain)
  • InChI chemical identifier: a unique text
    descriptor for a molecule
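The DOI fees quoted above (10 cents to register, 1 cent per annum to maintain) make the running cost easy to estimate. The sketch below assumes the annual fee is charged for every year counted, including the first; the slide does not say how the billing year is defined. It works in integer cents to avoid floating-point rounding.

```python
REGISTRATION_CENTS = 10  # one-off fee per DOI, as quoted on the slide
MAINTENANCE_CENTS = 1    # fee per DOI per annum, as quoted on the slide


def doi_cost_cents(datasets, years):
    """Total cost in cents of registering `datasets` DOIs and keeping them
    resolvable for `years` years (assumes every counted year is billed)."""
    if datasets < 0 or years < 0:
        raise ValueError("datasets and years must be non-negative")
    return datasets * (REGISTRATION_CENTS + MAINTENANCE_CENTS * years)


print(doi_cost_cents(1000, 10), "cents")  # 20000 cents
```

For example, 1,000 datasets kept for 10 years cost 1,000 × (10 + 10) = 20,000 cents, so maintenance equals the registration outlay after ten years.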

13
Usability: Dissemination & Aggregation
  • OAI metadata schema ratified by the IUCr and chemical
    community
  • OAI covers bibliographic terms; chemical terms
    must be introduced
  • Both library and subject-specific aggregators
    are satisfied
  • Chemical linking: InChI, chemical classifications
    and a restricted keywords list
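An aggregator harvests archives through OAI-PMH, typically by requesting `verb=ListRecords&metadataPrefix=oai_dc` and following `resumptionToken` until the list is exhausted. The presentation does not give the extended schema that adds chemical terms, so this offline sketch parses plain Dublin Core (`oai_dc`) and assumes, hypothetically, that an InChI travels as a `dc:identifier` starting with `InChI=`. The sample response, its URLs and its OAI identifier are placeholders; the namespaces are the standard OAI-PMH and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_records(xml_text):
    """Extract title, identifiers and InChIs from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        ids = [(el.text or "").strip() for el in dc.findall("dc:identifier", NS)]
        results.append({
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            # ASSUMPTION: InChIs are exposed as dc:identifier values.
            "inchi": [i for i in ids if i.startswith("InChI=")],
            "identifiers": [i for i in ids if not i.startswith("InChI=")],
        })
    return results


# Hypothetical harvested page; identifiers and URLs are placeholders.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Crystal structure of benzene</dc:title>
          <dc:identifier>http://example.org/dataset/1</dc:identifier>
          <dc:identifier>InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

for rec in parse_records(SAMPLE):
    print(rec["title"], rec["inchi"])
```

Keeping the bibliographic fields in plain `oai_dc` serves library aggregators, while the chemical identifiers give subject-specific aggregators a key for linking records across archives.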

14
Usability: Endorsement
  • Feedback during development from the technical
    publishing arm of the IUCr
  • Designed for automatic incorporation into the CSD
    (the global database operated by the CCDC)
  • Accepted by the Executive Committee of the IUCr
  • Reuse of data achieved in collaboration with the
    Leverhulme Centre for Molecular Informatics

15
Usability: Community Uptake
  • Southampton: about to publish routinely
    via the archive
  • Five crystallography laboratories in the UK have agreed
    to adopt the philosophy and to install and populate archives
  • The CCDC will harvest required data from all archives
  • The IUCr will harvest and curate all data
  • Develop aggregator services in collaboration with
    the IUCr

16
Usability: The Next Challenges
  • Full acceptance by the chemical community
  • Validation worries
  • Curation worries
  • The requirement for as many peer-reviewed
    publications as possible (regardless of quality)
  • Full acceptance by the wider chemistry publishing
    community
  • Loss of control over underlying data
  • Faith in Open Archives replacing experimental
    descriptions in articles
  • Development of fully functional aggregator
    services