Title: Curating science data
1 Changing Roles, Responsibilities and Relationships
Dr Liz Lyon, Director, UKOLN; Associate Director, UK Digital Curation Centre
Opening the research data lifecycle, JISC Conference 2007
UKOLN is supported by
This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0
2 Preliminary findings from a JISC study
- Terms of Reference for UKOLN
- To define how institutions (collectively and individually) and scientific data centres can together effectively achieve:
- Preservation
- Access: managed and open
- Re-use: data citation, data mining and re-interpretation
- October 2006 to March 2007
- N.B. Work in progress!
3 Some of the data stakeholders?
4 Funders
- Interviews: 4 Research Councils, 1 charity
- Support for data curation is (still) patchy
- Mixed approaches: proactive to passive
- Gaps in infrastructure support for data outputs
- Limited formal links between programme planning and support infrastructure
- Some data management and sharing policies
- Some use of Data Management Plans
- Wellcome Trust Policy Q&A, January 2007
5January 2007
Data Management and Sharing Plan required if
creating or developing a resource for the
research community as the primary goal or
involve the generation of a significant quantity
of data that could potentially be shared for
added benefit
6 Funders 2
- Limited advocacy work
- Funding models for infrastructure support vary
- Funding models for research programmes vary
- Some productive partnerships, e.g. MRC and Wellcome Trust, CCLRC and Wellcome
- Some examples of good practice
7 Hierarchy of drivers (for data sharing)
Acknowledgement: Mark Thorley, NERC
- Level 0: deliver project.
- Level 1: meet good scientific practice.
- Level 2: support own science.
- Level 3: employer's requirements.
- Level 4: funder's requirements.
- Level 5: public policy requirements.
NERC has 7 designated data centres, Data Management Co-ordinator, DataGrid
NATURAL ENVIRONMENT RESEARCH COUNCIL
8 MRC developing a data support plan
Acknowledgement: Alan Sudlow
9 Data centres and Data services
- Interviews with 5 data services
- Deep levels of expertise and subject knowledge
- Exemplars of good practice: standards, policies, manuals, robust curation / preservation practice
- Limited sharing of expertise between centres
- Some effective partnerships
- AHDS Stormont Papers with Queen's Belfast
- BADC with CLADDIER Project
- Wide range of community awareness
- Use of licences but IPR issues performing arts,
- Technical issues: complexity of data sets, version control, identifiers, application profiles
10 Data centres and Data services 2
- Exemplar of good practice
- European Bioinformatics Institute
- Microarray data to inform gene expression
- Consensus on community standards: MIAME
- Data pipelines at source via Laboratory Information Management Systems (LIMS)
- User tools: MIAMExpress, value-added services
- Annotation of data using the Gene Ontology
- Submission / deposit is embedded in community culture: requirement for publication
- Training programme, eLearning materials coming
- This level of data curation is expensive!!
11 Reactome
EMBL-Bank: DNA sequences
EnsEMBL: Genome Annotation
UniProt: Protein Sequences
Array-Express: Microarray Expression Data
EMSD: Macromolecular Structure Data
IntAct: Protein Interactions
Source: Graham Cameron, EBI
12 Specialist biomolecular data resource examples
Medical data resources
Core biomolecular resources
Biodiversity data resources
SGD
Flybase
Chemical data resources
MGD
Eumorphia/ Phenotypes
Mutants
Mouse Atlas
Source: Graham Cameron, EBI
13 General Data Selection Criteria
- Usability
- Quality of data
- Usable data format
- Conditions of Use
- Reputable Author
- Documentation
- Usefulness
- Data quality
- Uniqueness of data
- Potential Strategic Use
- Usefulness of parameters
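The selection criteria above can be read as a two-part checklist. The following is a minimal sketch only, assuming that the five items after "Usability" and the four items after "Usefulness" belong to those two headings; the function name `unmet_criteria` and the example assessment are hypothetical and are not taken from any data centre's actual procedure.

```python
# Criteria as listed on the slide. The grouping under two headings is an
# assumption based on the order in which the items appear.
SELECTION_CRITERIA = {
    "Usability": [
        "Quality of data",
        "Usable data format",
        "Conditions of use",
        "Reputable author",
        "Documentation",
    ],
    "Usefulness": [
        "Data quality",
        "Uniqueness of data",
        "Potential strategic use",
        "Usefulness of parameters",
    ],
}


def unmet_criteria(assessment):
    """Return, per heading, the criteria not marked True in the assessment."""
    return {
        heading: [c for c in criteria if not assessment.get(c, False)]
        for heading, criteria in SELECTION_CRITERIA.items()
    }


# Hypothetical assessment of one data set: everything met except documentation.
example = {c: True for cs in SELECTION_CRITERIA.values() for c in cs}
example["Documentation"] = False
gaps = unmet_criteria(example)
```

Here `gaps` lists "Documentation" under Usability and nothing under Usefulness, which is the kind of summary a curator might use when deciding whether to accept a data set.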
14 Institutions: Data Repositories
- Not much data, or duplication (yet?)
- Departmental audits of research data practice at University of Southampton to inform developing institutional data curation policy
- Barriers to data sharing
- IPR and geospatial data
- Lack of awareness amongst researchers
- Cultural roots and resistance to change
- Exemplars of good practice: eBank Project
15 eCrystals Global Federation Model
[Diagram. Components: Smart lab (data creation and capture), laboratory repository, institutional data repositories, subject repository, aggregator services, presentation services / portals, Institution Library and Information Services. Labelled activities: deposit, validation, search and harvest, publication, data analysis, data discovery, linking and citation, curation and preservation.]
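The deposit, validation and harvest activities named in the federation model can be illustrated with a small in-memory sketch. This is not the eCrystals or eBank implementation: the class names, the metadata fields and the validation rule (all fields non-empty) are assumptions made for illustration, and no real harvesting protocol is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    identifier: str
    title: str
    creator: str


@dataclass
class Repository:
    """A laboratory, institutional or subject repository (hypothetical)."""
    name: str
    records: dict = field(default_factory=dict)

    def validate(self, dataset):
        # Assumed rule: every metadata field must be non-empty.
        return all([dataset.identifier, dataset.title, dataset.creator])

    def deposit(self, dataset):
        # Deposit only succeeds if validation passes.
        if not self.validate(dataset):
            return False
        self.records[dataset.identifier] = dataset
        return True


@dataclass
class Aggregator:
    """An aggregator service that harvests records and supports search."""
    index: dict = field(default_factory=dict)

    def harvest(self, repository):
        # Copy each record into the index, remembering where it came from.
        for identifier, dataset in repository.records.items():
            self.index[identifier] = (repository.name, dataset)

    def search(self, term):
        # Case-insensitive match on the title; returns sorted identifiers.
        term = term.lower()
        return sorted(
            identifier
            for identifier, (_, dataset) in self.index.items()
            if term in dataset.title.lower()
        )


# Hypothetical usage: one valid deposit, one rejected for a missing title.
lab = Repository("laboratory")
institution = Repository("institutional")
accepted = lab.deposit(Dataset("ds-1", "Example crystal structure", "A. Researcher"))
rejected = institution.deposit(Dataset("ds-2", "", "A. Researcher"))

aggregator = Aggregator()
aggregator.harvest(lab)
aggregator.harvest(institution)
hits = aggregator.search("crystal")
```

Keeping validation inside the repository and search inside the aggregator mirrors the separation of roles shown in the diagram, where repositories hold the data and aggregator services support discovery across them.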
16 Roles, Rights and Responsibilities
- Scientist: Creation and use of data.
- Data centre: Curation of and access to data.
- User: Use of 3rd party data.
- Funder: Set / react to public policy drivers.
- Publisher: Maintain integrity of the scientific record.
Acknowledgement: Mark Thorley, NERC
17 Closing thoughts
- Co-ordination and join up
- High level and strategic: Funders
- Operational level and practical: JISC data services, research council data centres
- Funding
- Are current economic models for preservation and data sharing infrastructure a) appropriate? b) adequate? c) sustainable?
- Should inform prioritisation and investment
18 Closing thoughts 2
- Good Practice requirements
- Data management and sharing policies
- Data Management Plans (peer-reviewed)
- Institutional data curation policies and planning
- Technical interoperability and integration
- Data are diverse and complex
- JISC IIE vision of discovery across repositories
- Contextual linking offers opportunity for data
centres and institutional repositories to realise
synergies and work more closely together
19 Closing thoughts 3
- Advocacy
- Programmes to reach across sectors
- Harmonisation and consistent messages
- Tailored and targeted to disciplines
- Researcher has some curatorial responsibility
- Training
- Lack of skills
- eLearning opportunity
- Data scientists? Recognition and career development
- Native data scientists are coming.
20- Dealing with the Data Deluge
- JISC Repositories Programme
- Supporting Institutions in the Digital Age
- Digital Repositories Conference
- 5-6 June 2007
- University of Manchester
- Research Data Strand