What is Data Anyway? Findings from the StORe Project

eScience Institute, Edinburgh - 14-06-07

1
What is Data Anyway? Findings from the StORe
Project
  • John MacColl, Edinburgh University Library

2
Structure
  1. Thoughts about data and business models
  2. StORe and its findings
  3. Curators, publishing and libraries: emerging
    roles and models

3
The bioscientist's question
  • Need we polarise (source and outputs)?
  • Prose is problematic
  • Structured data publication is more accurate
  • Relying on the literature Google-ises access to
    results
  • It is not scientific

4
Instead ...
  • Mark-up statistical results
  • (i.e. enrich the data)
  • Publish directly on the web in XML databases
  • Link to commonly agreed domain ontologies
  • (Publish the papers as well)
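
A minimal sketch of what "mark-up statistical results" and "link to commonly agreed domain ontologies" could look like in practice, using Python's standard xml.etree.ElementTree module. The slides do not specify a schema, so every element name, attribute name, value and the ontology URI below is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET


def mark_up_result(test_name, statistic, p_value, ontology_uri):
    """Wrap one statistical result in XML so it can be published as data.

    Element and attribute names are illustrative assumptions only,
    not a schema taken from the StORe project.
    """
    result = ET.Element("result", {"ontologyTerm": ontology_uri})
    ET.SubElement(result, "test").text = test_name
    ET.SubElement(result, "statistic").text = repr(statistic)
    ET.SubElement(result, "pValue").text = repr(p_value)
    return result


# Hypothetical result and a placeholder ontology URI.
element = mark_up_result(
    "t-test", 2.31, 0.024, "http://example.org/ontology#GeneExpression"
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

A reader (or a program) can then retrieve the p-value by name rather than parsing it out of a sentence, which is the sense in which structured publication may be more accurate than prose.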

5
But
  • Where is the business model?
  • Who pays?
  • Who does the curation?
  • Libraries?

6
(No Transcript)

7
The Survey

8
Two-Way Links?
  • 85% support project aims as potentially
    advantageous to conduct of research
  • Would save so much time, making research more
    productive
  • Would allow reanalysis as new methods emerged
  • Integration of multiple data sets from different
    publications

9
Open Access?
  • It should be a requirement that data from
    publicly funded research is freely available
  • Restrict access until results are published to
    prevent data scavenging
  • A creditable aspiration but without a data
    administrator this represents a large burden from
    editing, compiling and sanctioning release

10
Data management?
  • 75% generate and use complex data sets
  • Storage of unique and original research on PCs
    and laptops is commonplace
  • Access influenced by perceived absence of
    adequate protection and need for interpretation
  • Data is held on secured CDs in encrypted format
    with only an identifying code. The codebook is
    kept physically separate.
  • Data volume and lack of time/experience compound
    issues of data ownership
  • "I'm not encouraged or discouraged from providing
    data. It just does not justify the effort."

11
Repositories?
  • Development and use: culture of self-sufficiency,
    broad range of effectiveness
  • Some sophistication but limited understanding and
    familiarity also found across the sector
  • "Very happy with what we have in astronomy - please
    don't mess with them for the sake of some
    aesthetic ..."
  • "... my understanding of this topic is so limited
    that giving any answer would be trivial"

12
Metadata?
  • Appropriate assignment acknowledged to be
    • Critical
    • Demanding (intellectually and in the time it
      requires)
  • Consensus on the need for good metadata does not
    necessarily translate into its provision
  • High level of self-assignment

13
Scientists and Metadata: the Facts
  • I decide which terms to use and I assign
    them: 212
  • Research colleagues assign metadata on the team's
    behalf: 55
  • Research support staff assign metadata on the
    team's behalf: 22
  • Metadata are assigned by library/information
    services staff: 4
  • Metadata are assigned by the repository
    administrators: 37
  • Metadata are generated automatically: 63
  • It is not known who assigns metadata: 68
  • Other (please specify): 37
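
To put the counts on this slide in proportion, the short sketch below totals them and prints each option's share. The slide does not say whether respondents could tick more than one option, so the percentages are shares of recorded responses, not of respondents; the shortened labels are mine.

```python
# Response counts transcribed from the slide; labels shortened for display.
counts = {
    "Self-assigned": 212,
    "Research colleagues": 55,
    "Research support staff": 22,
    "Library/information services staff": 4,
    "Repository administrators": 37,
    "Generated automatically": 63,
    "Not known": 68,
    "Other": 37,
}

total = sum(counts.values())

# Share of all recorded responses (assumption: not a share of respondents).
shares = {label: 100 * n / total for label, n in counts.items()}

for label, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<36}{counts[label]:>5}{share:>7.1f}%")
```

On these figures self-assignment accounts for roughly 43% of the 498 responses, while assignment by library/information services staff is under 1%.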

14
Support?
  • Lack of awareness of available support
  • Disinclination to seek support
  • Self-reliance with IT matters
  • Prefer online or documentary support
  • Discipline knowledge essential
  • With few exceptions, management of research data
    not usually associated with librarians
  • Yet

15
  • Qualified demonstration of expertise in metadata,
    data preservation or curation
  • Examples of uncritical use of sources
  • Both access and sharing restricted by lack of
    confidence in processes
  • Browsing, online help and other features need
    expert support and maintenance
  • Need to boost awareness of opportunities from
    electronic data management

16
Corroboration
  • Researchers and discovery services: behaviour,
    perceptions and needs (RIN report, November 2006)
  • "contact with librarians and information
    professionals is rare"
  • "researchers are generally confident in their
    self-taught abilities ... librarians see them
    as ... relatively unsophisticated"
  • "librarians see it as a problem that they are
    not reaching all researchers with formal
    training, whereas most researchers don't think
    they need it"

17
Improving the Research Management Lifecycle
  • Hypothesis
  • Undertake research/experiment
  • Produce data/commence data curation
  • Publish data? And/or paper? - bioscientist
  • Select and organise evidence
  • Write/assemble paper
  • Submit to peer review
  • Apply revisions and produce final draft
  • Publish paper?
  • Activate data-publication links
  • Peer review of data?

18
Two-way links - benefits and risks
  • Opportunities to
    • Explore a deeper level of detail
    • Validate experiments
    • Track the use and improvement of research output
    • Identify collaborators
    • Confirm completeness of information searches
    • Supplement published papers
  • Potential risks from
    • Uncertainty of peer review
    • Premature dissemination
    • Subversion of scholarly paper
    • Scavenging
    • Lack of interpretive data

19
The StORe Pilot Demonstrator
  • Allows publication only if data has been
    deposited or identified
  • Groups items (data and publications) as projects
  • Accepts projects based on either primary or
    secondary analysis
  • Primary analysis creates new data
  • Secondary analysis is based on existing data and
    may or may not create new data
  • Source repository - the UK Data Archive
  • Output repository - the LSE's Research Articles
    Online
  • Pilot federation - includes a test institutional
    repository at the University of Essex
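
The deposit rule on this slide (publication allowed only if data has been deposited or identified, with items grouped as projects) can be sketched as a small data model. This is an illustration under my own assumptions, not the demonstrator's actual code: the class, field and function names and all identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Groups data items and publications, as the pilot demonstrator does."""
    title: str
    analysis: str  # "primary" (creates new data) or "secondary" (reuses data)
    deposited_data: list = field(default_factory=list)   # newly deposited data
    identified_data: list = field(default_factory=list)  # existing data pointed to
    publications: list = field(default_factory=list)


def add_publication(project, publication_id):
    """Attach a publication only if data has been deposited or identified."""
    if project.analysis not in ("primary", "secondary"):
        raise ValueError("analysis must be 'primary' or 'secondary'")
    if not (project.deposited_data or project.identified_data):
        raise PermissionError("deposit or identify data before publishing")
    project.publications.append(publication_id)


# Secondary analysis that identifies existing data: publication is accepted.
reanalysis = Project("Reanalysis", "secondary", identified_data=["dataset-1"])
add_publication(reanalysis, "article-1")

# Primary analysis with no data yet: publication is refused.
no_data = Project("New study", "primary")
try:
    add_publication(no_data, "article-2")
    refused = False
except PermissionError:
    refused = True
```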

20
(No Transcript)

21
How StORe works
  • At present, search is across metadata in Essex
  • Find an article and you can
    • find the associated data
    • move to the official versions at LSE or UKDA
    • list all articles and related items of the author
  • Find data and you can
    • find all articles associated with them
    • e-mail data owner to request access to approved
      but embargoed/private data
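
The two-way navigation described here (article to data, data to articles) amounts to keeping a link index in both directions. A minimal sketch follows; the class name, method names and identifiers are hypothetical and do not describe how the StORe middleware is implemented.

```python
from collections import defaultdict


class LinkIndex:
    """Bidirectional index between output (article) and source (data) items."""

    def __init__(self):
        self._data_for_article = defaultdict(set)
        self._articles_for_data = defaultdict(set)

    def link(self, article_id, dataset_id):
        # Record the link in both directions so either side can be the start.
        self._data_for_article[article_id].add(dataset_id)
        self._articles_for_data[dataset_id].add(article_id)

    def data_for(self, article_id):
        """Find an article, then find the associated data."""
        return sorted(self._data_for_article.get(article_id, ()))

    def articles_for(self, dataset_id):
        """Find data, then find all articles associated with them."""
        return sorted(self._articles_for_data.get(dataset_id, ()))


index = LinkIndex()
index.link("article-1", "dataset-A")
index.link("article-2", "dataset-A")
index.link("article-2", "dataset-B")

print(index.data_for("article-2"))      # data behind one article
print(index.articles_for("dataset-A"))  # articles that draw on one dataset
```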

22
How StORe works
  • A user can search across all or specific
    collections without being logged in, using a
    simple Google-type search

23
  • ... or by employing more advanced options

24
  • Registered users can also view data within
    • Solely owned private or public collections
    • Collaborative collections within a federated
      source or output repository
    • Collections to which they are a contributor

25
StORe: the future?
  • Essex as multi-disciplined institutional
    repository in large federation of
    • source and output repositories, plus
    • StORe-enabled institutional repositories at other
      HE/FE institutions
  • If all universities had a StORe system,
    collaborations could be established with the
    institutional repository/repositories, and all
    relevant data repositories. Researchers would use
    StORe as their single route to deposit and
    associate
  • They would need curators to assist

26
StORe: one approach to data publishing
  • Data deposit in institutional repositories until
    accepted by source repository
  • Source repository verifies data authenticity
  • Publication dependent on data deposit and
    subject to output repository controls
  • Access to non-public objects can be restricted
    and requires authentication
  • Release of data can be embargoed
  • Curators should know about it and its alternatives
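
The access rules on this slide (non-public objects need authentication, release can be embargoed) can be expressed as a single check. The sketch below is one possible reading, not the StORe implementation: the names are hypothetical, and it assumes that an owner may grant named users access to private or embargoed data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class DataObject:
    owner: str
    public: bool = True
    embargo_until: Optional[date] = None  # None means no embargo
    permitted_users: Set[str] = field(default_factory=set)


def can_access(obj, user, today):
    """Return True if `user` (None for an anonymous visitor) may view `obj`."""
    # Assumption: the owner and users the owner has approved always have access.
    if user is not None and (user == obj.owner or user in obj.permitted_users):
        return True
    # Non-public objects are closed to everyone else.
    if not obj.public:
        return False
    # Public objects open to all once any embargo date has been reached.
    return obj.embargo_until is None or today >= obj.embargo_until


embargoed = DataObject(owner="alice", embargo_until=date(2008, 1, 1))
private = DataObject(owner="alice", public=False, permitted_users={"bob"})

print(can_access(embargoed, None, date(2007, 6, 14)))   # anonymous, under embargo
print(can_access(embargoed, None, date(2008, 1, 1)))    # anonymous, embargo over
print(can_access(private, "bob", date(2007, 6, 14)))    # approved user
```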

27
Curators link across paradigms
  • Curation embraces appraisal, description,
    preservation, rights management ...
  • It requires understanding of science at its
    domain levels
  • Good curation is key to
    • achieving more rapid discovery
    • achieving more efficient research
    • preventing bad practice
    • preventing lazy supervision
    • preventing sloppy analysis
    • preventing fraudulent claims

28
Scientific Publication
  • As prose
    • Journals provided good practice for a long time
    • Now no longer true, hence reinvention (Open
      Access, Creative Commons, SPARC, etc)
    • There was a business model which is now broken
  • As data
    • Has never existed
    • Is beginning to be invented
    • There is no business model

29
Libraries?
  • Exist to permit sharing
  • Protect fragile business models
  • Sustain otherwise uneconomic enterprises
  • Financial role will be to pay for publication
    embracing prose and data, and subsuming curation
  • Must help researchers to publish their data
  • Because the prose-data hierarchy is being
    gradually subverted (and peer review will
    ultimately require it)

30
Thank You