Digital Data Preservation in Astronomy: A Collaboration Among Libraries, Publishers, and the Virtual Observatory

1
Digital Data Preservation in Astronomy: A
Collaboration Among Libraries, Publishers, and
the Virtual Observatory
A pilot project aimed at preserving, curating,
and enabling access to digital data and
associated electronic journals content.
  • Robert Hanisch, Space Telescope Science Institute
  • Sayeed Choudhury, Tim DiLauro, Alex Szalay, and
    Ethan Vishniac, The Johns Hopkins University
  • Julie Steffen, University of Chicago Press
  • Teresa Ehling, Cornell University
  • Robert Milkey, American Astronomical Society
  • Ray Plante, National Center for Supercomputing
    Applications

2
Outline for Presentation
  • The Virtual Observatory
  • Data in Astronomy
  • The data preservation problem
  • A scenario
  • Past experience and research
  • Approach
  • A prototype project

3
The Virtual Observatory
  • The Virtual Observatory enables new science by
    greatly enhancing access to data and computing
    resources. The VO makes it easy to locate,
    retrieve, and analyze data from archives and
    catalogs worldwide.
  • The VO is about data discovery, access, and
    integration.
  • The VO is NOT a huge centralized data repository.
  • The VO provides standard protocols for obtaining
    data from distributed collections.
  • The VO is national (US NVO) and international
    (IVOA).
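
As an illustration of what a standard protocol buys, the sketch below builds the same positional query for several distributed services. It assumes the services speak the IVOA Simple Cone Search convention (RA, DEC, and SR parameters in decimal degrees), which the slides do not name; the endpoint URLs and the function names are hypothetical placeholders, not real services.

```python
from urllib.parse import urlencode

# Hypothetical endpoints; in practice these would be discovered via a VO registry.
SERVICES = {
    "archive1": "https://archive1.example.org/scs",
    "survey2": "https://survey2.example.org/cone?catalog=main",
}


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone-search request URL (all angles in decimal degrees)."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must be within [0, 360] degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be within [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Append to an existing query string if the base URL already has one.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


def build_queries(ra_deg, dec_deg, radius_deg):
    """One interface, n services: the same call produces a request for each."""
    return {
        name: cone_search_url(url, ra_deg, dec_deg, radius_deg)
        for name, url in SERVICES.items()
    }
```

Because every service accepts the same parameters, the client code does not change when a new archive is added; only the list of endpoints grows.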

4
Without VO
n services, n interfaces
[Diagram: an astronomer, with archives 1-3, services 1-3, and surveys 1-3]

5
With VO
n services, 1 interface
[Diagram: an astronomer, the VO, archives 1-3, services 1-3, and surveys 1-3]

6
Why is Astronomy Data Special?
  • It has no commercial value
  • No privacy concerns
  • Can freely share results with others
  • Great for experimenting with algorithms
  • It is real and well documented
  • High-dimensional (with confidence intervals)
  • Spatial
  • Temporal
  • Diverse and distributed
  • Many different instruments from many different
    places and many different times
  • The questions are interesting
  • There is a lot of it (soon petabytes)

7
Data Flow (Levels of Data)
8
The data preservation problem
  • Research communities publish peer-reviewed
    journal papers that describe highly processed
    data.
  • Long-term preservation and curation systems for
    digital journal content, including the digital
    data presented only graphically, are not
    currently in place.
  • The research cannot be verified and the results
    cannot be easily compared to other data in order
    to broaden impact.
  • Public funds invested in scientific research do
    not have maximum return on investment. Essential
    legacy datasets may be lost.

9
Storyboard
10
Storyboard
11
Storyboard
[Screenshot actions: Save as FITS; Copy to my VOSpace; Display in Aladin]
15
Astronomy Digital Image Library
16
ADIL query
17
ADIL query
  • ADIL is great, but
  • Data capture and curation are separate from
    manuscript processing
  • Data access is not integrated into the journals
  • Data management is centralized

18
Repository-related Research
  • Digital Library framework comprises
    service-oriented architecture with repositories
    as foundation, especially for digital
    preservation
  • Archive Ingest and Handling Test (AIHT) through
    Library of Congress NDIIPP
  • A Technology Analysis of Repositories and Service
    Integration (funded by Mellon Foundation)
  • Project STORE (Source to Output Repositories)

19
Approach
  • Integrate digital data management into the
    publication process (data capture, review,
    metadata tagging and validation, storage).
  • Exploit emerging information technology standards
    for managing distributed data collections,
    including digital journals.
  • Provide multiple access methods to digital data
    to maximize visibility and re-use.
  • Exploit information management and curation
    experience in the university libraries and build
    on long-term institutional commitments to
    preservation.
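
A minimal sketch of the metadata tagging and validation step in the first bullet might look as follows. The required field names and the `DataSubmission` type are assumptions made for illustration; the slides do not specify a metadata schema.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the actual schema is not given in the slides.
REQUIRED_FIELDS = ("title", "creator", "article_id", "format")


@dataclass
class DataSubmission:
    """A dataset captured alongside a manuscript during publication."""
    payload: bytes
    metadata: dict = field(default_factory=dict)


def validate(submission):
    """Return a list of problems; an empty list means the submission passes."""
    problems = [
        f"missing metadata field: {name}"
        for name in REQUIRED_FIELDS
        if not submission.metadata.get(name)
    ]
    if not submission.payload:
        problems.append("empty data payload")
    return problems
```

Returning a list of problems rather than raising on the first one lets an editorial tool report everything an author needs to fix in a single pass.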

20
Components
  • Publication
      • Editorial Process
      • Data capture
      • Metadata capture and validation
      • Links
      • Identifiers
  • Library
      • Curation
      • Preservation
  • Data Storage Appliance (shown three times, linked by
    replication services and VOSpace)
      • Metadata database
      • Digital data objects
      • Ancillary information
  • Data Access
      • VO portals
      • Journal portals
      • Other after-market distributors
      • Registry
      • Logging
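
The replication services that link the storage appliances can be pictured with a small sketch: copy each digital object to every appliance, then audit the copies against a checksum. The `StorageAppliance` class, the in-memory storage, and the choice of SHA-256 are assumptions for illustration and do not describe the project's actual design.

```python
import hashlib


class StorageAppliance:
    """Toy in-memory stand-in for one data storage appliance."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # identifier -> raw bytes

    def put(self, identifier, data):
        self.objects[identifier] = data

    def checksum(self, identifier):
        data = self.objects.get(identifier)
        return None if data is None else hashlib.sha256(data).hexdigest()


def replicate(identifier, data, appliances):
    """Store the object on every appliance and return its expected checksum."""
    expected = hashlib.sha256(data).hexdigest()
    for appliance in appliances:
        appliance.put(identifier, data)
    return expected


def audit(identifier, expected, appliances):
    """Return the names of appliances whose copy is missing or corrupted."""
    return [a.name for a in appliances if a.checksum(identifier) != expected]
```

A periodic audit of this kind is what lets a damaged or missing copy on one appliance be detected and restored from the others.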

21
A prototype project
  • Implement end-to-end prototype using astronomy
    scholarly publications as a test-bed.
  • Understand operational costs and develop
    long-term business plan for preservation of
    peer-reviewed journal content and associated
    supporting data.
  • Develop associated policies affecting data
    accessibility (e.g., move toward requiring
    digital data availability as a condition of
    publication).
  • Utilize commodity open-source technologies and
    partner with Virtual Observatory to maximize
    return on investment, flexibility, adaptability.
  • Evaluate the long-term impact on citations and
    productivity of having ready access to digital
    data.

22
A prototype project
  • Tasks
      • metadata definition
      • content management tool evaluation/selection
        (Fedora)
      • physical storage and replication
      • publication process revisions and testing
      • policy development
      • business model development
  • Shared technology development/deployment
    with National Virtual Observatory

23
Current collaborators
  • The Johns Hopkins University-Sheridan Libraries,
    Edinburgh University Library, University of
    Washington Library and Cornell University Library
    (information management and curation)
  • The National Virtual Observatory project
    (representatives from JHU, Space Telescope
    Science Institute, and the National Center for
    Supercomputing Applications)
  • American Astronomical Society (journals, editors)
  • The University of Chicago Press (publisher for
    the AAS journals)

24
Status
  • Support from
      • UK JISC (Joint Information Systems Committee) and
        CURL (Consortium of Research Libraries in the
        British Isles)
      • US Institute of Museum and Library Services
  • Support committed from
      • Microsoft
      • SPARC (Scholarly Publishing and Academic
        Resources Coalition)
      • TeraGrid
      • NVO
  • Development has started