Curating science data

Slides: 21
Provided by: lizl7
Tags: curating | data | science

Transcript and Presenter's Notes

Title: Curating science data


1
Changing Roles, Responsibilities and Relationships
Dr Liz Lyon, Director, UKOLN; Associate Director, UK Digital Curation Centre
Opening the research data lifecycle, JISC Conference 2007
UKOLN is supported by
This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0
2
Preliminary findings from a JISC study
  • Terms of Reference for UKOLN
  • To define how institutions (collectively and
    individually) and scientific data centres can
    together effectively achieve:
  • Preservation
  • Access: managed and open
  • Re-use: data citation, data mining and
    re-interpretation
  • October 2006 to March 2007
  • N.B. Work in progress!

3
Some of the data stakeholders?
4
Funders
  • Interviews: 4 Research Councils, 1 charity
  • Support for data curation is (still) patchy
  • Mixed approaches: proactive to passive
  • Gaps in infrastructure support for data outputs
  • Limited formal links between programme planning
    and support infrastructure
  • Some data management and sharing policies
  • Some use of Data Management Plans
  • Wellcome Trust Policy Q&A, January 2007

5
January 2007
Data Management and Sharing Plan required if
creating or developing a resource for the
research community as the primary goal, or if
generating a significant quantity of data that
could potentially be shared for added benefit
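The condition on this slide reads as an either/or rule. A minimal sketch in Python, assuming two hypothetical field names that only mirror the slide wording and are not taken from any funder's actual application form:

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    # Both field names are hypothetical, chosen to mirror the slide text.
    primary_goal_is_community_resource: bool
    generates_significant_shareable_data: bool


def plan_required(proposal: Proposal) -> bool:
    """Return True when either condition quoted on the slide holds."""
    return (proposal.primary_goal_is_community_resource
            or proposal.generates_significant_shareable_data)
```

What counts as a "significant quantity" of data is a judgement the slide leaves open, so the sketch takes it as an input rather than trying to compute it.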
6
Funders (2)
  • Limited advocacy work
  • Funding models for infrastructure support vary
  • Funding models for research programmes vary
  • Some productive partnerships, e.g. MRC and
    Wellcome Trust, CCLRC and Wellcome
  • Some examples of good practice

7
Hierarchy of drivers (for data sharing)
Acknowledgement: Mark Thorley, NERC
  • Level 0: deliver project.
  • Level 1: meet good scientific practice.
  • Level 2: support own science.
  • Level 3: employer's requirements.
  • Level 4: funder's requirements.
  • Level 5: public policy requirements.

NERC has 7 designated data centres
Data Management Co-ordinator
DataGrid
NATURAL ENVIRONMENT RESEARCH COUNCIL
8
MRC developing a data support plan
Acknowledgement: Alan Sudlow
9
Data centres and data services
  • Interviews with 5 data services
  • Deep levels of expertise and subject knowledge
  • Exemplars of good practice: standards, policies,
    manuals, robust curation / preservation practice
  • Limited sharing of expertise between centres
  • Some effective partnerships:
  • AHDS Stormont Papers with Queen's University Belfast
  • BADC with CLADDIER Project
  • Wide range of community awareness
  • Use of licences, but IPR issues: performing arts,
  • Technical issues: complexity of data sets,
    version control, identifiers, application profiles

10
Data centres and data services (2)
  • Exemplar of good practice:
  • European Bioinformatics Institute
  • Microarray data to inform gene expression
  • Consensus on community standards: MIAME
  • Data pipelines at source via Laboratory
    Information Management Systems (LIMS)
  • User tools: MIAMExpress, value-added services
  • Annotation of data using the Gene Ontology
  • Submission / deposit is embedded in community
    culture: requirement for publication
  • Training programme, eLearning materials coming
  • This level of data curation is expensive!!

11
Reactome
EMBL-Bank: DNA sequences
EnsEMBL: Genome Annotation
UniProt: Protein Sequences
ArrayExpress: Microarray Expression Data
EMSD: Macromolecular Structure Data
IntAct: Protein Interactions
Source: Graham Cameron, EBI
12
Specialist biomolecular data resource examples
Medical data resources
Core biomolecular resources
Biodiversity data resources
SGD
FlyBase
Chemical data resources
MGD
Eumorphia / Phenotypes
Mutants
Mouse Atlas
Source: Graham Cameron, EBI
13
General Data Selection Criteria
  • Usability
  • Quality of data
  • Usable data format
  • Conditions of Use
  • Reputable Author
  • Documentation
  • Usefulness
  • Data quality
  • Uniqueness of data
  • Potential Strategic Use
  • Usefulness of parameters

14
Institutions and data repositories
  • Not much data, or duplication (yet?)
  • Departmental audits of research data practice at
    University of Southampton to inform developing
    institutional data curation policy
  • Barriers to data sharing:
  • IPR and geospatial data
  • Lack of awareness amongst researchers
  • Cultural roots and resistance to change
  • Exemplars of good practice: eBank Project

15
eCrystals Global Federation Model
[Diagram labels: Data creation and capture in Smart lab; Laboratory repository; Institutional data repositories; Subject repository; Institution Library and Information Services; Curation and Preservation; Aggregator services; Presentation services / portals; Data analysis; Publication. Flow labels: Deposit; Validation; Search, harvest; Data discovery, linking, citation.]
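The labels on this slide (deposit, validation, search, harvest) suggest a flow in which repositories accept deposits, validate them, and expose them to an aggregator that builds a searchable index. The sketch below is one reading of that flow, not the eCrystals implementation: every class and method name is hypothetical, and the rule that only validated records are exposed for harvesting is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str          # hypothetical identifier scheme
    title: str
    validated: bool = False


@dataclass
class Repository:
    name: str
    records: dict = field(default_factory=dict)

    def deposit(self, record: Record) -> None:
        """Deposit step: store the record under its identifier."""
        self.records[record.identifier] = record

    def validate(self, identifier: str) -> None:
        """Validation step: mark a deposited record as checked."""
        self.records[identifier].validated = True

    def harvest(self) -> list:
        """Expose only validated records (an assumption of this sketch)."""
        return [r for r in self.records.values() if r.validated]


@dataclass
class Aggregator:
    index: dict = field(default_factory=dict)

    def harvest_from(self, repo: Repository) -> None:
        """Harvest step: copy exposed records, remembering their source."""
        for record in repo.harvest():
            self.index[record.identifier] = (repo.name, record)

    def search(self, term: str) -> list:
        """Search step: case-insensitive match on record titles."""
        term = term.lower()
        return sorted(ident for ident, (_, rec) in self.index.items()
                      if term in rec.title.lower())


# Usage: two deposits, one validated, then harvest and search.
lab = Repository("laboratory")
lab.deposit(Record("id-1", "Crystal structure A"))
lab.deposit(Record("id-2", "Crystal structure B"))
lab.validate("id-1")

aggregator = Aggregator()
aggregator.harvest_from(lab)
print(aggregator.search("crystal"))  # ['id-1']
```

Keeping the source repository name alongside each harvested record is one way a portal could link a search result back to where the data was deposited, which is the "discovery, linking, citation" role the slide gives to presentation services.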
16
Roles, Rights and Responsibilities
  • Scientist: creation and use of data.
  • Data centre: curation of and access to data.
  • User: use of 3rd party data.
  • Funder: set / react to public policy drivers.
  • Publisher: maintain integrity of the scientific
    record.

Acknowledgement: Mark Thorley, NERC
17
Closing thoughts
  • Co-ordination and join-up
  • High level and strategic: funders
  • Operational level and practical: JISC data
    services, research council data centres
  • Funding
  • Are current economic models for preservation and
    data sharing infrastructure a) appropriate? b)
    adequate? c) sustainable?
  • Should inform prioritisation and investment

18
Closing thoughts (2)
  • Good practice requirements
  • Data management and sharing policies
  • Data Management Plans (peer-reviewed)
  • Institutional data curation policies and planning
  • Technical interoperability and integration
  • Data are diverse and complex
  • JISC IIE vision of discovery across repositories
  • Contextual linking offers opportunity for data
    centres and institutional repositories to realise
    synergies and work more closely together

19
Closing thoughts (3)
  • Advocacy
  • Programmes to reach across sectors
  • Harmonisation and consistent messages
  • Tailored and targeted to disciplines
  • Researcher has some curatorial responsibility
  • Training
  • Lack of skills
  • eLearning opportunity
  • Data scientists? Recognition and career
    development
  • Native data scientists are coming.

20
  • Dealing with the Data Deluge
  • JISC Repositories Programme
  • Supporting Institutions in the Digital Age
  • Digital Repositories Conference
  • 5-6 June 2007
  • University of Manchester
  • Research Data Strand