Title: Preservation and Long Term Access to Data and Records in a Knowledge-based Society
1Preservation and Long Term Access to Data and Records in a Knowledge-based Society
Reagan W. Moore
San Diego Supercomputer Center
moore_at_sdsc.edu
http://www.npaci.edu/DICE/
2Data and Knowledge Systems Group
- Staff
- Reagan Moore
- Ilkai Altintas
- Chaitan Baru
- Sheau Yen Chen
- Charles Cowart
- Amarnath Gupta
- George Kremenek
- M. Kulrul
- Bertram Ludäscher
- Richard Marciano
- A. Memon
- XuFei Qian
- Roman Olshanowsky
- Arcot Rajasekar
- Abe Singer
- Michael Wan
- Ilya Zaslavsky
- Bing Zhu
- Graduate Students
- A. Bagchi
- S. Bansal
- A. Behere
- R. Bharath
- S. Bharath
- L. Sui
- Undergraduate Interns
- N. Cotofana
- D. Le
- J. Trang
- L. Yin
3Topics
- Building persistent archives
- Data grids
- Authenticity mechanisms
- Managing technology evolution
- Knowledge-based access
4Archival Processes
- Appraisal - determine the archivable content
- Accession - determine the initial physical location for the data, and the relationship of the new collection to existing collections
- Arrangement - add administration control, describe the information content (provenance, authenticity, structure, administrative), and decompose digital objects into their components as needed
- Description - complete the definition of collection attributes by iterating between arrangement, reformatting, and representation
- Preservation - build an archivable form of the digital entities, characterize the collection context, and manage their storage
- Access - provide query mechanisms for discovering, retrieving, and presenting the digital entities
5ERA Concept model
6Common Approach (digital library, persistent archive, data grid)
- Logical name space used to organize digital entities, and associate attributes
- Separation of information management from data storage management
- Definition of abstraction mechanisms for dealing with repositories
- Emergence of need for knowledge management
7SDSC Storage Resource Broker Meta-data Catalog - Levels of Abstraction
[Diagram: layered architecture]
- Application
- Clients: Linux I/O, Web WSDL, DLL / Python, Java, NT Browsers, Prolog Predicate
- Consistency Management / Authorization-Authentication
- Prime Server
- Logical Name Space, Latency Management, Data Transport, Metadata Transport
- Storage Abstraction
- Catalog Abstraction
- Databases: DB2, Oracle, Sybase
- Servers
- HRM
8Authenticity
- Guarantee that the data has not been changed
- Collection-owned data, only accessible through the data handling system
- Support roles defining access (curation, owner, annotation, read)
- Support access controls mapping users to roles
- Audit trails that record all operations on files
- Digital signatures - cryptographic checksums
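The mechanisms on this slide can be sketched together. The following is a minimal illustration, not the SRB implementation: the class `ManagedCollection`, its methods, and the permissions attached to each role are all hypothetical. It uses a plain SHA-256 checksum to detect change; a true digital signature would additionally need a signing key.

```python
import hashlib
from datetime import datetime, timezone

# Role names come from the slide; the permissions attached to each are assumptions.
ROLE_PERMISSIONS = {
    "curation": {"read", "annotate", "write"},
    "owner": {"read", "annotate", "write"},
    "annotation": {"read", "annotate"},
    "read": {"read"},
}


class ManagedCollection:
    """Hypothetical collection that owns its data and mediates every access."""

    def __init__(self):
        self._objects = {}     # logical name -> (content bytes, SHA-256 hex digest)
        self._roles = {}       # user -> role name
        self.audit_trail = []  # one record per attempted operation

    def grant(self, user, role):
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self._roles[user] = role

    def _authorize(self, user, operation, name):
        role = self._roles.get(user)
        allowed = role is not None and operation in ROLE_PERMISSIONS[role]
        # Record the attempt whether or not it is allowed.
        self.audit_trail.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "operation": operation,
            "object": name,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not {operation} {name}")

    def put(self, user, name, content):
        self._authorize(user, "write", name)
        self._objects[name] = (content, hashlib.sha256(content).hexdigest())

    def get(self, user, name):
        self._authorize(user, "read", name)
        content, recorded = self._objects[name]
        # Recompute the checksum to detect changes made outside the system.
        if hashlib.sha256(content).hexdigest() != recorded:
            raise ValueError(f"checksum mismatch for {name}")
        return content
```

Because every operation passes through `_authorize`, denied attempts also appear in the audit trail, which is usually what an archivist wants to see.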
9Managing Technology Evolution
- Data grids provide interoperability mechanisms to access data in multiple administration domains and multiple types of storage systems
- Persistent archives migrate collections from old technology to new technology to support presentation on new systems
- Both require the ability to access heterogeneous systems
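One way to picture access to heterogeneous systems is a common driver interface that each storage system is wrapped in, so that migration becomes a copy between two drivers. The sketch below is an assumption for illustration only: `StorageDriver`, `MemoryStore`, `DirectoryStore`, and `migrate` are hypothetical names, and the two stores merely stand in for old and new technology.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical uniform interface that every storage system is wrapped in."""

    @abstractmethod
    def put(self, name, content): ...

    @abstractmethod
    def get(self, name): ...

    @abstractmethod
    def names(self): ...


class MemoryStore(StorageDriver):
    """Stands in for the old storage technology."""

    def __init__(self):
        self._data = {}

    def put(self, name, content):
        self._data[name] = bytes(content)

    def get(self, name):
        return self._data[name]

    def names(self):
        return sorted(self._data)


class DirectoryStore(StorageDriver):
    """Stands in for the new storage technology: one file per object."""

    def __init__(self, root):
        self._root = root
        self._index = {}  # logical name -> file path (kept in memory in this sketch)

    def put(self, name, content):
        # Hash the logical name so that any name maps to a safe file name.
        file_name = hashlib.sha256(name.encode("utf-8")).hexdigest()
        path = os.path.join(self._root, file_name)
        with open(path, "wb") as handle:
            handle.write(content)
        self._index[name] = path

    def get(self, name):
        with open(self._index[name], "rb") as handle:
            return handle.read()

    def names(self):
        return sorted(self._index)


def migrate(source, target):
    """Copy every object and verify each copy by checksum; return the count."""
    count = 0
    for name in source.names():
        content = source.get(name)
        target.put(name, content)
        if hashlib.sha256(target.get(name)).digest() != hashlib.sha256(content).digest():
            raise IOError(f"verification failed for {name}")
        count += 1
    return count


# Usage: move a two-object collection from the old store to the new one.
old_store = MemoryStore()
old_store.put("survey/image001", b"pixels")
old_store.put("survey/notes.txt", b"field notes")
with tempfile.TemporaryDirectory() as root:
    new_store = DirectoryStore(root)
    migrated = migrate(old_store, new_store)
    recovered = new_store.get("survey/notes.txt")
```

The logical names stay the same across the move; only the driver behind them changes.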
10Presentation of Digital Objects
Application
Operating System
Storage System
Display System
Digital Object
11Technology Management - Emulation
Old Application
Wrap Application
New Operating System
New Storage System
New Display System
Digital Object
12Technology Management
Old Application
Add Operating System Call
New Operating System
New Storage System
New Display System
Digital Object
13Technology Management
Old Application
Add Operating System Call
New Operating System
Add Operating System Call
Old Storage System
Old Display System
Digital Object
14Technology Management - Migration
New Application
New Operating System
New Storage System
New Display System
Migrate Encoding Format
Digital Object
15Technology Management - SDSC
New Application
New Operating System
Wrap Storage System
Wrap Display System
Old Storage System
Old Display System
Migrate Encoding Format
Digital Object
16Accessing Archived Data
- Name transparency
- Access data without knowing the file name
- Map from attributes to a local file name
- Location transparency
- Access data without knowing where it is stored
- Map from global file name to local file name
- Collection transparency
- Access data without knowing the collection attributes
- Map from concept space to collection attributes
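Name and location transparency can be illustrated with two lookups against a catalog. This is a minimal sketch under stated assumptions: the catalog layout, the resource names, the paths, and the functions `find_by_attributes` and `locate` are all hypothetical.

```python
# Hypothetical catalog: global (logical) name -> attributes and physical replicas.
CATALOG = {
    "/archive/survey/image001": {
        "attributes": {"instrument": "camera-a", "year": 2001},
        "replicas": [
            {"resource": "tape-archive", "local_name": "/tape/vol7/f0931"},
            {"resource": "disk-cache", "local_name": "/cache/3e/image001.dat"},
        ],
    },
    "/archive/survey/image002": {
        "attributes": {"instrument": "camera-b", "year": 2001},
        "replicas": [
            {"resource": "tape-archive", "local_name": "/tape/vol7/f0932"},
        ],
    },
}


def find_by_attributes(catalog, **wanted):
    """Name transparency: return global names whose attributes all match."""
    return sorted(
        name
        for name, entry in catalog.items()
        if all(entry["attributes"].get(key) == value for key, value in wanted.items())
    )


def locate(catalog, global_name, online=frozenset()):
    """Location transparency: pick a replica, preferring online resources."""
    replicas = catalog[global_name]["replicas"]
    for replica in replicas:
        if replica["resource"] in online:
            return replica["resource"], replica["local_name"]
    # No online copy: fall back to the first registered replica.
    first = replicas[0]
    return first["resource"], first["local_name"]
```

A caller would first ask `find_by_attributes(CATALOG, instrument="camera-a")` and then pass each returned global name to `locate`, never handling a local file name directly.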
17Information Management - Logical Name Space
- Set of attributes to describe digital entities that are registered into the logical name space
- SRB metadata - Unix file system semantics
- Provenance metadata - Dublin Core
- Resource metadata - User access control lists
- Discipline metadata - User defined attributes
- Each digital entity may have unique attributes
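The four attribute groups listed above could be held in a record like the following. It is a sketch only: the class `DigitalEntity`, its field names, and the sample values are hypothetical, although `title` and `creator` are genuine Dublin Core elements.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalEntity:
    """Hypothetical record for one entry in the logical name space."""

    logical_name: str
    srb: dict = field(default_factory=dict)         # Unix file system semantics
    provenance: dict = field(default_factory=dict)  # Dublin Core elements
    resource: dict = field(default_factory=dict)    # user access control lists
    discipline: dict = field(default_factory=dict)  # user-defined attributes

    def attributes(self):
        """Flatten all four groups into 'group.key' -> value for querying."""
        flat = {}
        for group in ("srb", "provenance", "resource", "discipline"):
            for key, value in getattr(self, group).items():
                flat[f"{group}.{key}"] = value
        return flat


entity = DigitalEntity(
    logical_name="/archive/survey/image001",
    srb={"size": 2048, "owner": "alice"},
    provenance={"title": "Survey image 1", "creator": "Survey Team"},
    resource={"acl": {"alice": "owner", "bob": "read"}},
    discipline={"instrument": "camera-a"},
)
```

Since the discipline group is an open dictionary, each entity is free to carry attributes that no other entity has.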
18Knowledge Management - Discovery across Collections
- Mapping from collection attributes to discipline concepts
- Make queries based on discipline concepts
- Characterization of relationships between attributes
- Semantic / logical - cross-walks
- Procedural / temporal - records management
- Structural / spatial - GIS
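A semantic cross-walk of the kind listed above can be shown in a few lines. The sketch assumes two invented collections that record the same concept under different attribute names; `CONCEPT_MAP`, `COLLECTIONS`, and `query_by_concept` are hypothetical, and the procedural and structural relationship types are not covered.

```python
# Hypothetical cross-walk: discipline concept -> attribute name in each collection.
CONCEPT_MAP = {
    "sea_surface_temperature": {"buoys": "sst_c", "satellite": "surface_temp"},
}

# Hypothetical collections, each using its own attribute name for the concept.
COLLECTIONS = {
    "buoys": [{"id": "b1", "sst_c": 18.2}, {"id": "b2", "sst_c": 21.5}],
    "satellite": [{"id": "s1", "surface_temp": 22.0}],
}


def query_by_concept(concept, predicate):
    """Return (collection, record id, value) for every record whose mapped
    attribute satisfies the predicate, across all collections."""
    hits = []
    for collection, attribute in sorted(CONCEPT_MAP[concept].items()):
        for record in COLLECTIONS[collection]:
            value = record.get(attribute)
            if value is not None and predicate(value):
                hits.append((collection, record["id"], value))
    return hits


warm = query_by_concept("sea_surface_temperature", lambda celsius: celsius > 20)
```

The query names only the concept; the mapping supplies the attribute each collection actually uses.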
19Knowledge Based Data Grids
[Diagram: grid with columns Ingest Services, Management, Access Services and rows Knowledge, Information, Data]
- Knowledge: Relationships Between Concepts; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse
- Information: Attributes, Semantics; Information Repository; Attribute-based Query
- Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); Feature-based Query
- Other labels: XTM DTD, Rules - KQL, (Model-based Access), XML DTD, SDLIP, (Data Handling System - SRB), Grids, MCAT/HDF
20Further Information
http://www.npaci.edu/DICE