Addressing the Challenge of Global Infrastructure Development for Data Curation



1
Addressing the Challenge of Global Infrastructure
Development for Data Curation
  • David Giaretta

2
Infrastructures for curation
  • Social / Legal / Financial / Organisational
  • Agreements / Trust / Standards
  • Costs / Benefits / Rewards
  • Technical components

3
What is needed?
  • MONEY

4
Disincentives for curation
  • Future generations do NOT
    - Vote
    - Pay taxes
[Chart: Money against Time, comparing the budget available with the cost of curation; annotation: "If cost of preserving old information increases"]
Need to show that costs will be contained
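The money-against-time argument on this slide can be made concrete with a toy calculation. The sketch below is a hypothetical model, not one taken from the presentation: it assumes a flat annual budget, holdings that grow by a fixed amount each year with nothing discarded, and a per-unit preservation cost that changes at a constant rate. The function name and every parameter are illustrative assumptions.

```python
def first_shortfall_year(budget, initial_holdings, yearly_ingest,
                         unit_cost, cost_growth, horizon):
    """Return the first year in which the annual preservation cost exceeds a
    flat annual budget, or None if that never happens within `horizon` years.

    Hypothetical toy model, not taken from the presentation:
      budget           - money available each year (held constant)
      initial_holdings - units of information held at year 0
      yearly_ingest    - units added every year (old material is kept)
      unit_cost        - cost of preserving one unit for one year at year 0
      cost_growth      - fractional change in unit cost per year
                         (0.1 = rising 10 %/yr, -0.5 = halving every year)
    """
    holdings = initial_holdings
    cost_per_unit = unit_cost
    for year in range(horizon + 1):
        annual_cost = holdings * cost_per_unit
        if annual_cost > budget:
            return year
        holdings += yearly_ingest           # nothing is discarded
        cost_per_unit *= 1 + cost_growth    # unit cost drifts each year
    return None


# Flat unit cost: growing holdings alone exhaust the budget in year 10.
print(first_shortfall_year(100, 10, 10, 1.0, 0.0, 50))
# Unit cost halving every year: costs are contained, so this prints None.
print(first_shortfall_year(100, 10, 10, 1.0, -0.5, 50))
```

A `None` result corresponds to the case the slide says must be demonstrated: costs that stay within the available budget even though old information is kept.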
5
Digital Preservation
  • Need to preserve information and knowledge, not
    just the bits
  • Documents and videos are rendered: simple?
  • Data must be processed in new ways: harder
  • Need to manage knowledge to keep archives alive
    through time
  • Preservation is a process, not a one-time event
  • Preservation is expensive; costs need to be
    shared
  • The alternative is money: endless supplies of
    money
  • The Open Archival Information System (OAIS) Reference
    Model (ISO 14721) provides a general conceptual
    framework and terminology
    (http://public.ccsds.org/publications/archive/650x0b1.pdf)
  • OPEN process, not just Open Archives

Need more than just formats
Preservation as the cornerstone of curation
6
Things change/disappear
How can we ensure that the information trapped in
the bits remains understandable despite all
these changes?
  • Software
  • Hardware
  • Environment
    - e.g. network links to related information
  • People
    - what is common knowledge

How can a digital curator even be aware of these
changes?
7
Validation
Live a long time
  • How can we judge any proposed solution?
  • Proposed validation metrics
  • Theoretical underpinning
  • Testbed scenarios addressing real issues
  • Accelerated lifetime tests
    - Hardware and Software
    - Environment
    - People
  • Improved trustability/certifiability

Evidence - not proof
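One way to read "accelerated lifetime tests" is as a simulation that compresses decades of change into a few steps. The sketch below is a hypothetical illustration of that idea, not a method described in the presentation: it replays change events (software, hardware, environment or people disappearing) against the dependencies of each holding and reports when each one stops being usable. The function name, the dependency sets and the notion of a registered replacement are all assumptions. A passing run is evidence, not proof.

```python
def accelerated_lifetime_test(holdings, available, events, replacements=None):
    """Replay a compressed sequence of change events against holdings and
    report the step at which each holding first becomes unusable.

    holdings     - hypothetical mapping: holding -> set of dependencies
                   (software, hardware, environment, community knowledge)
    available    - dependencies available before the first event
    events       - list of sets; events[i] is what disappears at step i
    replacements - optional mapping: dependency -> substitute that keeps
                   holdings usable once the original is gone (for example
                   an emulator or a migrated format)

    Returns {holding: first failing step, or None if it survives all events}.
    """
    replacements = replacements or {}
    current = set(available)
    failed_at = {h: None for h in holdings}
    for step, lost in enumerate(events):
        current -= lost
        for holding, deps in holdings.items():
            if failed_at[holding] is not None:
                continue                  # already failed at an earlier step
            usable = all(d in current or replacements.get(d) in current
                         for d in deps)
            if not usable:
                failed_at[holding] = step
    return failed_at


holdings = {
    "report.pdf": {"PDF viewer"},
    "instrument.dat": {"legacy reader", "instrument expertise"},
}
available = {"PDF viewer", "legacy reader", "instrument expertise", "emulator"}
events = [{"legacy reader"}, {"instrument expertise"}]   # software, then people
print(accelerated_lifetime_test(holdings, available, events))
print(accelerated_lifetime_test(holdings, available, events,
                                replacements={"legacy reader": "emulator"}))
```

Without a replacement the data file fails at step 0, when its reader disappears; with an emulator registered it survives the software change and fails only at step 1, when the expertise needed to understand it is lost.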
8
Infrastructure
  • No organisation can do everything that is
    required for digital preservation forever
  • Need to share the cost/effort
  • Need to identify commonalities
  • None will be a perfect fit for all purposes

9
Obstacles to Sharing the Effort
  • Not invented here
  • Differences in terminology/language
  • The Rendered Object vs Data divide
  • Lack of interoperability
  • Cannot force a master plan on everyone
  • Different timescales
  • Reluctance about long-term funding commitments
  • Fragmentation of effort
    - Made worse by incentives for branding

10
OAIS Functional Entities
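As a hedged illustration of how the OAIS functional entities pass information between them, the sketch below models a Producer's Submission Information Package (SIP) being ingested as an Archival Information Package (AIP) and later disseminated to a Consumer as a Dissemination Information Package (DIP). ISO 14721 defines these entities conceptually rather than as an API, so the class and method names here are hypothetical, and Administration and Preservation Planning are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what a Producer hands to Ingest."""
    content: bytes
    representation_info: str   # how to interpret `content`
    description: str           # descriptive information for finding aids


@dataclass
class AIP:
    """Archival Information Package: what Archival Storage preserves."""
    aip_id: str
    content: bytes
    representation_info: str
    provenance: list = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination Information Package: what Access returns to a Consumer."""
    content: bytes
    representation_info: str


class ToyArchive:
    """Hypothetical model of four OAIS functional entities:
    Ingest, Archival Storage, Data Management and Access.
    Administration and Preservation Planning are omitted."""

    def __init__(self):
        self.archival_storage = {}   # aip_id -> AIP
        self.data_management = {}    # aip_id -> descriptive information

    def ingest(self, sip):
        """Ingest: turn a SIP into an AIP and record its description."""
        aip_id = f"aip-{len(self.archival_storage) + 1}"
        self.archival_storage[aip_id] = AIP(
            aip_id, sip.content, sip.representation_info,
            provenance=["ingested from SIP"],
        )
        self.data_management[aip_id] = sip.description
        return aip_id

    def access(self, query):
        """Access: search descriptive information, then build a DIP per hit."""
        hits = [i for i, d in self.data_management.items() if query in d]
        return [
            DIP(self.archival_storage[i].content,
                self.archival_storage[i].representation_info)
            for i in hits
        ]


archive = ToyArchive()
archive.ingest(SIP(b"\x00\x2a", "16-bit big-endian unsigned integer",
                   "example instrument reading"))
for dip in archive.access("instrument"):
    print(dip.representation_info, int.from_bytes(dip.content, "big"))
```

The Consumer can only turn the two bytes back into the number 42 because the Representation Information travels with the content from SIP to AIP to DIP.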
11
Information Model: Representation Information
The Information Model is key
Recursion ends at the KNOWLEDGE BASE of the DESIGNATED
COMMUNITY (this knowledge will change over time
and region)
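The recursion described on this slide can be sketched as a graph walk: Representation Information is itself information, so it may need Representation Information of its own, and the walk stops at whatever the Designated Community's Knowledge Base already covers. This is a minimal sketch under assumptions; the mapping format, the function name and the example network are hypothetical, not the structure used by any particular registry.

```python
def required_rep_info(obj, rep_info_of, knowledge_base):
    """Collect the Representation Information an archive must hold so that
    `obj` stays understandable to its Designated Community.

    rep_info_of    - hypothetical mapping: item -> Rep Info items needed to
                     understand it; Rep Info is itself information, so it
                     can have Rep Info of its own (the recursion)
    knowledge_base - items the Designated Community already understands;
                     the recursion stops at any of these

    Raises ValueError when an item is neither in the knowledge base nor
    described by further Rep Info, i.e. a gap in the network.
    """
    needed = set()
    stack = list(rep_info_of.get(obj, []))
    while stack:
        item = stack.pop()
        if item in knowledge_base or item in needed:
            continue                      # understood already, or seen before
        needed.add(item)
        children = rep_info_of.get(item)
        if not children:
            raise ValueError(f"gap: {item!r} has no Rep Info "
                             "and is not in the knowledge base")
        stack.extend(children)            # follow the Rep Info's own Rep Info
    return needed


# Made-up Rep Info network for a hypothetical data file.
network = {
    "data.bin": ["format spec", "parameter semantics"],
    "format spec": ["XML", "binary integer encoding"],
    "parameter semantics": ["atmospheric physics"],
}
community_today = {"XML", "binary integer encoding", "atmospheric physics"}
print(sorted(required_rep_info("data.bin", network, community_today)))
```

If the Knowledge Base later loses "XML", the same call raises `ValueError` until Rep Info describing XML is added to the network; this is one concrete sense in which the knowledge "will change over time and region".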
12
Data
  • Level 2 GOME Satellite instrument data

13
Unfamiliar information
  • Preservation
    - Digitally encoded information which must be
      usable and understandable
    - Unfamiliar because of separation in time
  • E-Science/GRID/CyberInfrastructure for data
    - Digitally encoded information which must be
      usable and understandable
    - Unfamiliar because of separation in discipline or
      location, even if created yesterday
  • Support automated usage where possible

14
E-Infrastructures
  • Doing preservation right produces gains in
    usage
  • Preservation useful for E-science/GRID
  • Underpins publication and re-use aspects of
    curation
  • Enables multi-disciplinary usage
  • Virtualisation allows one to deal with unfamiliar
    objects more easily
  • E-Infrastructures need components to support
    preservation and gain interoperability
  • Additional types of Finding Aids also useful
  • Double payback
    - Applicable in GRID context
    - Usability now as well as later

15
Commonalities: bit level
  • Storage
  • Networks

16
Non-common aspects
  • Budgets
  • Decisions of what to preserve
  • Cost benefit analyses
  • Fancier access methods
  • Specific domain software
  • National legal aspects

But can share best practice
17
Commonalities
  • Persistent Identifier system
  • Registries of Representation Information to
    allow understandability
    - More than formats
  • Tools and services for creating and preserving the
    preservation artefacts
    - Provenance, DRM, Access Control, Rep Info
      (Structure, Semantics, software, etc.)
  • Trustability: Audit and Certification
  • Cost modelling
    - With appropriate domain parameterisation
  • Virtualisation to assist use in disciplinary
    software
  • Some aspects of access tools
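A shared Registry of Representation Information is one of the commonalities that avoids each archive describing the same format or semantics separately. The sketch below is a hypothetical illustration, not an existing registry's interface: identical Rep Info is stored once and every archive that registers it receives the same identifier. Deriving the identifier from a SHA-256 content hash is an assumption made only to keep the example self-contained; a production registry would more likely issue persistent identifiers.

```python
import hashlib


class RepInfoRegistry:
    """Hypothetical shared registry of Representation Information.

    Each Rep Info document is stored once; archives that depend on the same
    description receive the same identifier and reference it instead of
    keeping private copies. The identifier is derived from a content hash
    purely for illustration.
    """

    def __init__(self):
        self._entries = {}   # rep_id -> Rep Info text
        self._users = {}     # rep_id -> set of archives that reference it

    def register(self, archive, rep_info):
        digest = hashlib.sha256(rep_info.encode("utf-8")).hexdigest()
        rep_id = "repinfo:" + digest[:16]
        self._entries.setdefault(rep_id, rep_info)      # store only once
        self._users.setdefault(rep_id, set()).add(archive)
        return rep_id

    def resolve(self, rep_id):
        return self._entries[rep_id]

    def users(self, rep_id):
        return set(self._users.get(rep_id, ()))


registry = RepInfoRegistry()
a = registry.register("Archive A", "Description of a shared data format")
b = registry.register("Archive B", "Description of a shared data format")
print(a == b, sorted(registry.users(a)))
```

Both archives end up pointing at a single entry, which is the cost-sharing effect the slide argues for.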

18
CASPAR information flow architecture
[Diagram labels: Rep Info; Virtualisation]
19
Alliance for Permanent Access
  • Part of strategy from Task Force for Permanent
    Access to the Records of Science
    - With Research programme outline
  • Aim to align digital curation activities of the
    members
  • Members of Alliance
    - The European Science Foundation
    - European Space Agency
    - CERN
    - Max Planck Gesellschaft
    - Centre National d'Etudes Spatiales
    - Science and Technology Facilities Council
    - The British Library
    - Koninklijke Bibliotheek
    - Deutsche Nationalbibliothek
    - Joint Information Systems Committee
    - International Association of Scientific,
      Technical and Medical Publishers
    - National Archives of Sweden
    - Centre Informatique National de l'Enseignement
      Superieur
    - Digital Preservation Coalition
    - NESTOR

20
Common Infrastructure elements
  • Preservability and persistence of the
    infrastructure in some sense
    - Preservation: understandability and usability
    - Persistence of preservation artefacts
    - Ensure Persistent Identifier system is persistent
      (just one system if possible)
  • Virtualisation techniques for interoperability
    and persistence
  • Infrastructure should share what can be common,
    e.g. Representation Information
  • Work with industry, e.g. more intelligent
    storage options
  • International Accreditation and Certification
    system
  • Brokerage systems to recognise finite lifetimes
    of organisations/projects
    - Prepare to hand over holdings in a way which is
      preservable
  • Common discussion forum
    - international, non-badged, persistent
  • Fight fragmentation
  • Align existing local infrastructures

Stimulate the Market
PARSE.Insight will refine these ideas
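Several elements on this slide fit together: a Persistent Identifier system that is itself persistent, brokerage that recognises the finite lifetimes of organisations/projects, and preparing to hand over holdings. The sketch below is a hypothetical resolver, not the API of Handle, DOI or any other real PID system, and the "pid:" naming is made up: identifiers stay fixed while the record of which organisation holds an object, and where, is updated when a project closes and a successor takes over.

```python
class PersistentIdResolver:
    """Hypothetical resolver illustrating identifiers that outlive custodians.

    A persistent identifier (PID) never changes; only the record of which
    organisation currently holds the object, and where, is updated when
    holdings are handed over.
    """

    def __init__(self):
        self._records = {}   # pid -> (custodian, location)

    def mint(self, pid, custodian, location):
        if pid in self._records:
            raise ValueError(f"PID {pid!r} is already assigned")
        self._records[pid] = (custodian, location)

    def resolve(self, pid):
        return self._records[pid]

    def hand_over(self, old, new, relocate):
        """Move every holding of custodian `old` to successor `new`.

        relocate maps an old location to the successor's location.
        Returns the PIDs transferred; the PIDs themselves are unchanged.
        """
        moved = []
        for pid, (custodian, location) in list(self._records.items()):
            if custodian == old:
                self._records[pid] = (new, relocate(location))
                moved.append(pid)
        return moved


resolver = PersistentIdResolver()
resolver.mint("pid:0001", "Project X", "projectx-store/obj1")
# Project X reaches the end of its funded lifetime; Agency Y takes over.
resolver.hand_over("Project X", "Agency Y",
                   lambda loc: "agencyy-store/" + loc.split("/")[-1])
print(resolver.resolve("pid:0001"))
```

Anyone who cited "pid:0001" before the handover can still resolve it afterwards; only the custodian and location behind it have changed.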
21
END