
Title: PPARC Data Curation Policy - A Solar System Science Perspective


1
PPARC Data Curation Policy - A Solar System
Science Perspective
  • Ian McCrea
  • Space Science and Technology Department
  • Rutherford Appleton Laboratory

2
Does PPARC need a data policy?
  • There is no doubt that it does, in order to:
    • Guarantee long-term data security
    • Publicise PPARC data holdings (catalogues etc.)
    • Establish clear responsibilities for data
    • Guarantee continuity of funding
    • Ensure minimum (high) standards for curation
    • Ensure uniform ease of access, and control of
      access (for restricted data)
    • Facilitate cross-council, cross-community and
      international collaboration
    • Ensure that projects use current best practice.

3
More than just a data policy!
  • The policy needs to apply to more than just data!
    It needs to encompass:
    • Metadata (data about data)
    • Analysis and data handling software
    • Visualisation programs
    • Documentation
  • In short, whatever is needed to make data usable
    to a knowledgeable community.
  • ...and we need to think about what is archived,
    e.g. not just prime parameters but full data
    sets.

4
Importance of historical data
  • Solar system science involves observation of a
    highly dynamic system
  • Many key periodicities, from sub-second plasma
    waves to 11-year and 22-year solar cycles
  • Many phenomena depend in a complex way on
    multivariate conditions
  • Much current interest in long-term change
    (solar, solar wind, atmospheric)
  • Historical data are thus potentially unique and
    cannot be re-created
  • Data are expensive to collect!

5
The Present Situation
  • Different standards of provision apply across the
    solar system science area:
    • Solar: high standard thanks to ESA/NASA, and also
      EGSO, SolarSoft etc.
    • Planetary: high standard thanks to NASA/ESA
      archive systems
    • Solar Wind/Magnetosphere: variable. More
      missions, more dispersed and complex data, fewer
      common standards
    • Upper Atmosphere: variable. Highly diverse,
      complex and dispersed multi-instrument data set
      with few common standards
  • Concern that some data sets (e.g. CRRES, GOES,
    SABRE) might already be becoming unusable for
    lack of a good archive

6
Data Policies in other councils
  • NERC and ESRC already have well thought-out data
    policies:
    • NERC: www.nerc.ac.uk/data
    • ESRC: www.data-archive.ac.uk
  • ...and years of experience in implementing them
  • Institutes like BGS are considered to be paragons
    of best practice
  • ...PPARC has no data policy.

7
Overseas Expertise
  • A number of overseas organisations and institutes
    also have data policies that might provide a
    model:
    • ESA, e.g. www.knmi.nl/meulenvd/esa/Envisat/data-policy.html
    • NASA/NSSDC: http://nssdc.gsfc.nasa.gov/nssdc/data_retention.html
    • NCAR/CEDAR: www.atd.ucar.edu/dir_off/datapolicy.html
  • ...and more in other areas of PPARC activity

8
The NERC Model: Philosophy
  • Access to data is a key requirement for
    researchers
  • Data belong to those who have paid for them, not
    to individual groups and PIs
  • Quality control is of key importance
  • IPR must be protected
  • Councils should commit themselves to data
    provision which facilitates publication
  • NERC makes a clear commitment to adequate funding

9
The NERC Model: Implementation I
  • Seven designated data centres
  • (AECD, BADC, BODC, NGDC, NWA, EIC, NEODC)
  • All data come under the aegis of one of the above
    (decided by mutual agreement)
  • Mixture of custody and awareness
  • Data Management Advisory Group and Data
    Management Co-Ordinator
  • Definition of a cradle-to-grave data management
    policy before the start of each project

10
The NERC Model: Implementation II
  • Data centres ensure physical custody, validation,
    dissemination, review/purging
  • Seek out data that merit stewardship
  • Promulgate catalogues, directories etc.
  • Formalise data exchange arrangements with
    potential users
  • Provide audit information for performance
    monitoring
  • Share best practice in IT, storage technology etc.

11
What might PPARC do differently?
  • Emphasise the importance of the project team:
    • Closest to the science issues
    • Detailed knowledge of data and metadata
    • Producers of applications software
    • Best people to provide documentation
  • Skills of project teams must be exploited,
    and their role should be integrated with that of
    data centres
  • We should still oblige projects to entrust their
    data to the system (with penalties for
    non-compliance!)

12
Four Phases of a Project
  • Phase 1: Specification Phase (before start of
    project) - Definition of responsibilities during
    the project lifetime
  • Phase 2: Active Phase (during project) - Primary
    responsibility with project team, in association
    with relevant data centre
  • Phase 3: Handover Phase (end of mission) -
    Transfer of responsibility from project team to
    data centre
  • Phase 4: Archive Phase (post-mission) - Relevant
    PPARC data centre provides all data and
    associated data products

13
PPARC Data Centres
  • How many do we need?
  • One super-centre is a non-starter: too much
    concentration of resources, too broad a remit, and
    a danger of single-point failure
  • Too many small data centres would also be
    inefficient (not sufficiently well-resourced, too
    few people)
  • The optimum number across the PPARC area might be
    something like 5-6
  • Centres could correspond to subject areas, e.g.
    particle physics, solar system, optical astronomy,
    radio astronomy etc.

14
The Role of the Grid
  • Much discussion of distributed data holdings and
    virtual observatories
  • PPARC involvement in AstroGrid, GridPP, IVOA etc.
  • This is clearly the shape of things to come, but
    experience of data centre teams is still key
  • Even in the distributed Grid world, critical
    mass problems still apply if resource is
    dispersed too widely.
  • Grid technology liberates the potential of data
    centres, but does not replace them.
  • Data centres then become portals to
    Grid-enabled data holdings

15
Cross-Council Collaboration and Peer Review
  • Since other councils have well-established
    policies, there is clear potential for
    cross-council collaboration
  • Example: a Data Centre Alliance across research
    councils could deal with a whole range of common
    issues
  • There is also clear scope for international
    collaboration (to some degree already in place)
  • Peer review mechanisms should also be put into
    place to monitor the work of the PPARC Data
    Centres.

16
Summary
  • PPARC needs a formal data policy
  • Data curation needs to be resourced in a
    systematic manner
  • Other councils have already put into place good
    frameworks on which we can build
  • Need a system which maintains the role of
    instrument groups/PI teams, and recognises Grid
    developments
  • Need a manageable number of data centres (not too
    large or too small)
  • Marginal costs of doing this are probably not
    unreasonably large (<1% of overall project cost
    is too little, but >10% is unacceptable)

17
Déjà vu?