Title: Addressing the Challenge of Global Infrastructure Development for Data Curation
1 Addressing the Challenge of Global Infrastructure
Development for Data Curation
2 Infrastructures for curation
- Social / Legal / Financial / Organisational
- Agreements / Trust / Standards
- Costs / Benefits / Rewards
- Technical components
3 What is needed?
4 Disincentives for curation cost
Budget available
- Future generations do NOT
- - Vote
- - Pay taxes
Money
If cost of preserving old information increases
Time
Need to show that costs will be contained
5 Digital Preservation
- Need to preserve information and knowledge, not just the bits
- Documents, videos are rendered - simple?
- Data must be processed in new ways - harder
- Need to manage knowledge to keep archives alive through time
- Preservation is a process, not a one-time event
- Preservation is expensive - costs need to be shared
- The alternative is money - endless supplies of money
- Open Archival Information Systems Reference Model (ISO 14721) provides a general conceptual framework and terminology
- (http://public.ccsds.org/publications/archive/650x0b1.pdf)
- OPEN process, not just Open Archives
Need more than just formats
Preservation as the cornerstone of curation
6 Things change/disappear
How can we ensure that the information trapped in
the bits remains understandable despite all
these changes?
- Software
- Hardware
- Environment
- E.g. Network links to related information
- People
- What is common knowledge
How can a digital curator even be aware of these
changes?
7 Validation
Live a long time
- How can we judge any proposed solution?
- Proposed validation metrics
- Theoretic underpinning
- Testbed scenarios addressing real issues
- Accelerated lifetime tests
- Hardware and Software
- Environment
- People
- Improved trustability/certifiability
Evidence - not proof
8 Infrastructure
- No organisation can do everything that is required for digital preservation forever
- Need to share the cost/effort
- Need to identify commonalities
- None will be a perfect fit for all purposes
9 Obstacles to Sharing the Effort
- Not invented here
- Differences in terminology/language
- The Rendered Object vs Data divide
- Lack of interoperability
- Cannot force a master plan on everyone
- Different timescales
- Reluctance about long-term funding commitments
- Fragmentation of effort
- Made worse by incentives for branding
10 OAIS Functional Entities
11 Information Model Representation Information
The Information Model is key
Recursion ends at KNOWLEDGEBASE of the DESIGNATED
COMMUNITY (this knowledge will change over time
and region)
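The recursion described above can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical in-memory registry (`REGISTRY`) that maps each item to the Representation Information needed to understand it, and a hypothetical `knowledge_base` set standing for what a given Designated Community already knows. None of these names or entries come from OAIS itself.

```python
# Hypothetical registry: each item maps to the Representation Information
# (RepInfo) needed to understand it. All names are illustrative only.
REGISTRY = {
    "data_file": ["format_spec", "data_dictionary"],
    "format_spec": ["binary_encoding_rules", "english"],
    "data_dictionary": ["si_units", "english"],
    "binary_encoding_rules": ["english"],
}


def missing_rep_info(item, knowledge_base, registry=REGISTRY):
    """Return the RepInfo a community still needs in order to understand `item`.

    The walk stops at anything already in `knowledge_base`, which is where
    the recursion ends for that Designated Community.
    """
    needed = []   # RepInfo still required, in discovery order
    seen = set()  # guards against revisiting shared or circular entries

    def visit(name):
        for dep in registry.get(name, []):
            if dep in knowledge_base or dep in seen:
                continue
            seen.add(dep)
            needed.append(dep)
            visit(dep)

    visit(item)
    return needed
```

With this toy registry, a community that already knows `english`, `si_units` and `binary_encoding_rules` needs only `format_spec` and `data_dictionary`, whereas a community that knows only `english` needs all four entries. The same object therefore requires more Representation Information as the knowledge base shrinks or shifts over time and region.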
12 Data
- Level 2 GOME Satellite instrument data
13 Unfamiliar information
- Preservation
- Digitally encoded information which must be usable and understandable
- Unfamiliar because of separation in time
- E-Science/GRID/CyberInfrastructure for data
- Digitally encoded information which must be usable and understandable
- Unfamiliar because of separation in discipline or location, even if created yesterday
- Support automated usage where possible
14 E-Infrastructures
- Doing preservation right produces gains in usage
- Preservation useful for E-science/GRID
- Underpins publication and re-use aspects of curation
- Enables multi-disciplinary usage
- Virtualisation allows one to deal with unfamiliar objects more easily
- E-Infrastructures need components to support preservation and gain interoperability
- Additional types of Finding Aids also useful
- Double payback
- Applicable in GRID context
- Usability now as well as later
15 Commonalities bit level
16 Non-common aspects
- Budgets
- Decisions of what to preserve
- Cost benefit analyses
- Fancier access methods
- Specific domain software
- National legal aspects
But can share best practice
17 Commonalities
- Persistent Identifier system
- Registries of Representation Information to allow understandability
- More than formats
- Tools and services for creating and preserving the preservation artefacts
- Provenance, DRM, Access Control, Rep Info (Structure, Semantics, software etc.)
- Trustability - Audit and Certification
- Cost modelling
- With appropriate domain parameterisation
- Virtualisation to assist use in disciplinary software
- Some aspects of access tools
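As one illustration of the first item in the list above, the sketch below shows the indirection that a persistent identifier system relies on: the identifier stays fixed while the location it resolves to may change. It is a toy under stated assumptions. The class name `PersistentIdentifierRegistry`, the identifier string and the `example.org` URLs are all hypothetical and do not describe any existing identifier service.

```python
class PersistentIdentifierRegistry:
    """Toy resolver: identifiers stay fixed while locations may change."""

    def __init__(self):
        # pid -> list of locations, oldest first; the last one is current
        self._locations = {}

    def register(self, pid, location):
        """Bind a new identifier to its first location."""
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = [location]

    def relocate(self, pid, new_location):
        """Record that the object moved; earlier locations are kept as history."""
        self._locations[pid].append(new_location)

    def resolve(self, pid):
        """Return the current location for an identifier."""
        return self._locations[pid][-1]

    def history(self, pid):
        """Return every location the object has had, oldest first."""
        return list(self._locations[pid])


# Usage: the holding moves from one archive to another, the identifier does not.
registry = PersistentIdentifierRegistry()
registry.register("pid:example/0001", "https://archive-a.example.org/objects/1")
registry.relocate("pid:example/0001", "https://archive-b.example.org/holdings/1")
current = registry.resolve("pid:example/0001")
```

Keeping the location history rather than overwriting it is a design choice made here so that a record of each move survives. A real system would also need the registry itself to be persistent.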
18 CASPAR information flow architecture
Virtualisation
19 Alliance for Permanent Access
- Part of strategy from Task Force for Permanent Access to the Records of Science
- With Research programme outline
- Aim to align digital curation activities in the members
- Members of Alliance
- The European Science Foundation
- European Space Agency
- CERN
- Max Planck Gesellschaft
- Centre National d'Etudes Spatiales
- Science and Technology Facilities Council
- The British Library
- Koninklijke Bibliotheek
- Deutsche Nationalbibliothek
- Joint Information Systems Committee
- International Association of Scientific, Technical and Medical Publishers
- National Archives of Sweden
- Centre Informatique National de l'Enseignement Superieur
- Digital Preservation Coalition
- NESTOR
20 Common Infrastructure elements
- Preservability and persistence of the infrastructure in some sense
- Preservation - understandability and usability
- Persistence of preservation artefacts
- Ensure Persistent Identifier system is persistent
- Just one system if possible
- Virtualisation techniques for interoperability and persistence
- Infrastructure should share what can be common, e.g. Representation Information
- Work with industry, e.g. more intelligent storage options
- International Accreditation and Certification system
- Brokerage systems to recognise finite lifetimes of organisations/projects
- Prepare to hand over holdings in a way which is preservable
- Common discussion forum
- International, non-badged, persistent
- Fight fragmentation
- Align existing local infrastructures
Stimulate the Market
PARSE.Insight will refine these ideas
21 END