Preservation and Long Term Access to Data and Records in a Knowledge-based Society

Provided by: reag1
Learn more at: http://www.sdsc.edu

1
Preservation and Long Term Access to Data and Records in a Knowledge-based Society
Reagan W. Moore
San Diego Supercomputer Center
moore@sdsc.edu
http://www.npaci.edu/DICE/
2
Data and Knowledge Systems Group
  • Staff
  • Reagan Moore
  • Ilkai Altintas
  • Chaitan Baru
  • Sheau Yen Chen
  • Charles Cowart
  • Amarnath Gupta
  • George Kremenek
  • M. Kulrul
  • Bertram Ludäscher
  • Richard Marciano
  • A. Memon
  • XuFei Qian
  • Roman Olshanowsky
  • Arcot Rajasekar
  • Abe Singer
  • Michael Wan
  • Ilya Zaslavsky
  • Bing Zhu
  • Graduate Students
  • A. Bagchi
  • S. Bansal
  • A. Behere
  • R. Bharath
  • S. Bharath
  • L. Sui
  • Undergraduate Interns
  • N. Cotofana
  • D. Le
  • J. Trang
  • L. Yin
  • /- NN

3
Topics
  • Building persistent archives
  • Data grids
  • Authenticity mechanisms
  • Managing technology evolution
  • Knowledge-based access

4
Archival Processes
  • Appraisal - determine the archivable content
  • Accession - determine the initial physical
    location for the data, and the relationship of
    the new collection to existing collections
  • Arrangement - add administrative control,
    describe the information content (provenance,
    authenticity, structure, administrative), and
    decompose digital objects into their components
    as needed
  • Description - complete the definition of
    collection attributes by iterating between
    arrangement, reformatting, and representation
  • Preservation - build an archivable form of the
    digital entities, characterize the collection
    context, and manage their storage
  • Access - provide query mechanisms for
    discovering, retrieving, and presenting the
    digital entities

5
ERA Concept model
6
Common Approach (digital library, persistent
archive, data grid)
  • Logical name space used to organize digital
    entities, and associate attributes
  • Separation of information management from data
    storage management
  • Definition of abstraction mechanisms for dealing
    with repositories
  • Emergence of need for knowledge management

7
SDSC Storage Resource Broker Meta-data
Catalog Levels of Abstraction
Application
Linux I/O
Web WSDL
DLL / Python
Java, NT Browsers
Prolog Predicate
Clients

Consistency Management / Authorization-Authentication
Prime Server
Logical Name Space
Latency Management
Data Transport
Metadata Transport
Storage Abstraction
Catalog Abstraction
Databases: DB2, Oracle, Sybase
Servers
HRM
8
Authenticity
  • Guarantee that the data has not been changed
  • Collection owned data, only accessible through
    the data handling system
  • Support roles defining access (curation, owner,
    annotation, read)
  • Support access controls mapping users to roles
  • Audit trails that record all operations on files
  • Digital signatures - cryptographic checksums
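
The mechanisms listed above can be illustrated with a minimal sketch. It assumes a toy in-memory collection; the `Collection` class, the permission sets attached to each role, and the audit record layout are hypothetical and are not the SRB interface. The sketch records a plain SHA-256 digest at ingest and recomputes it later, which detects changes but is not a full digital signature (no signing key is involved).

```python
import hashlib
from datetime import datetime, timezone

# Role names come from the slide; the permission sets are assumptions.
ROLE_PERMISSIONS = {
    "curation": {"read", "annotate", "write"},
    "owner": {"read", "annotate", "write", "grant"},
    "annotation": {"read", "annotate"},
    "read": {"read"},
}


class Collection:
    """Toy collection that owns its data and mediates every access."""

    def __init__(self):
        self._objects = {}     # name -> bytes
        self._checksums = {}   # name -> SHA-256 hex digest recorded at ingest
        self._user_roles = {}  # user -> role
        self.audit_trail = []  # (timestamp, user, operation, name, allowed)

    def grant(self, user, role):
        """Map a user to one of the known roles."""
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self._user_roles[user] = role

    def _check(self, user, operation, name):
        """Log the attempt, then refuse it if the user's role does not allow it."""
        role = self._user_roles.get(user)
        allowed = role is not None and operation in ROLE_PERMISSIONS[role]
        self.audit_trail.append(
            (datetime.now(timezone.utc), user, operation, name, allowed)
        )
        if not allowed:
            raise PermissionError(f"{user} may not {operation} {name}")

    def ingest(self, user, name, data):
        """Store the bytes and record their checksum."""
        self._check(user, "write", name)
        self._objects[name] = data
        self._checksums[name] = hashlib.sha256(data).hexdigest()

    def read(self, user, name):
        self._check(user, "read", name)
        return self._objects[name]

    def verify(self, name):
        """Return True if the stored bytes still match the ingest checksum."""
        current = hashlib.sha256(self._objects[name]).hexdigest()
        return current == self._checksums[name]
```

Because every operation passes through `_check`, denied attempts appear in the audit trail alongside successful ones.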

9
Managing Technology Evolution
  • Data grids provide interoperability mechanisms to
    access data in multiple administration domains
    and multiple types of storage systems.
  • Persistent archives migrate collections from old
    technology to new technology to support
    presentation on new systems
  • Both require the ability to access heterogeneous
    systems
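
One way to picture access to heterogeneous systems is a common driver interface that each storage system implements, so that migration becomes a copy between two drivers. The sketch below is illustrative only: `StorageDriver`, `MemoryStore`, `DirectoryStore`, and `migrate` are hypothetical names, object names are assumed to be flat (no path separators), and real storage systems need far richer interfaces.

```python
import os
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Minimal common interface over different storage systems."""

    @abstractmethod
    def put(self, name, data):
        """Store bytes under a flat object name."""

    @abstractmethod
    def get(self, name):
        """Return the bytes stored under a name."""

    @abstractmethod
    def names(self):
        """Return the sorted list of stored object names."""


class MemoryStore(StorageDriver):
    """Stands in for an old storage system."""

    def __init__(self):
        self._data = {}

    def put(self, name, data):
        self._data[name] = data

    def get(self, name):
        return self._data[name]

    def names(self):
        return sorted(self._data)


class DirectoryStore(StorageDriver):
    """Stands in for a new storage system backed by a directory."""

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def put(self, name, data):
        with open(os.path.join(self._root, name), "wb") as f:
            f.write(data)

    def get(self, name):
        with open(os.path.join(self._root, name), "rb") as f:
            return f.read()

    def names(self):
        return sorted(os.listdir(self._root))


def migrate(old, new):
    """Copy every object from old to new, verify each copy, return the count."""
    count = 0
    for name in old.names():
        data = old.get(name)
        new.put(name, data)
        if new.get(name) != data:
            raise IOError(f"verification failed for {name}")
        count += 1
    return count
```

The `migrate` function depends only on the shared interface, so the same code would serve a data grid reading from several systems or an archive moving a collection to new technology.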

10
Presentation of Digital Objects
Application
Operating System
Storage System
Display System
Digital Object
11
Technology Management - Emulation
Old Application
Wrap Application
New Operating System
New Storage System
New Display System
Digital Object
12
Technology Management
Old Application
Add Operating System Call
New Operating System
New Storage System
New Display System
Digital Object
13
Technology Management
Old Application
Add Operating System Call
New Operating System
Add Operating System Call
Old Storage System
Old Display System
Digital Object
14
Technology Management - Migration
New Application
New Operating System
New Storage System
New Display System
Migrate Encoding Format
Digital Object
15
Technology Management - SDSC
New Application
New Operating System
Wrap Storage System
Wrap Display System
Old Storage System
Old Display System
Migrate Encoding Format
Digital Object
16
Accessing Archived Data
  • Name transparency
    • Access data without knowing the file name
    • Map from attributes to a local file name
  • Location transparency
    • Access data without knowing where it is stored
    • Map from global file name to local file name
  • Collection transparency
    • Access data without knowing the collection
      attributes
    • Map from concept space to collection attributes
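
The three mappings above can be sketched as chained lookups. Every name, path, and attribute value below is invented for illustration. The sketch also assumes that an attribute query first yields a global (logical) name, which is then resolved to local file names; the slide does not say how the mappings compose.

```python
# Collection transparency: concept -> collection attribute constraints.
CONCEPT_TO_ATTRIBUTES = {
    "rainfall": {"variable": "precipitation"},
}

# Name transparency: attributes -> global (logical) file names.
CATALOG = [
    {"logical_name": "/archive/climate/obs-1999",
     "variable": "precipitation", "year": 1999},
    {"logical_name": "/archive/climate/obs-2000",
     "variable": "precipitation", "year": 2000},
    {"logical_name": "/archive/climate/temp-2000",
     "variable": "temperature", "year": 2000},
]

# Location transparency: global file name -> local file names (replicas).
REPLICAS = {
    "/archive/climate/obs-1999": ["tape01:/vol3/f0917"],
    "/archive/climate/obs-2000": ["disk02:/data/f1203", "tape01:/vol3/f1204"],
    "/archive/climate/temp-2000": ["disk02:/data/f1300"],
}


def find_by_attributes(**constraints):
    """Return logical names of entries matching every attribute constraint."""
    return [entry["logical_name"] for entry in CATALOG
            if all(entry.get(k) == v for k, v in constraints.items())]


def locate(logical_name):
    """Return the local file names that hold copies of a logical name."""
    return REPLICAS[logical_name]


def find_by_concept(concept, **extra):
    """Translate a concept into attribute constraints, then query by attribute."""
    constraints = {**CONCEPT_TO_ATTRIBUTES[concept], **extra}
    return find_by_attributes(**constraints)
```

A caller can ask for `find_by_concept("rainfall", year=2000)` and pass each result to `locate` without ever handling a file name, a storage location, or the collection's own attribute vocabulary.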

17
Information Management - Logical Name Space
  • Set of attributes to describe digital entities
    that are registered into the logical name space
    • SRB metadata - Unix file system semantics
    • Provenance metadata - Dublin Core
    • Resource metadata - User access control lists
    • Discipline metadata - User defined attributes
  • Each digital entity may have unique attributes
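
As a rough illustration of the attribute groups listed above, the following sketch registers entities that each carry four metadata groups. The `DigitalEntity` and `LogicalNameSpace` classes and their field names are hypothetical; they do not reproduce the actual catalog schema.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalEntity:
    logical_name: str
    # System metadata with Unix file system semantics (field choice is assumed).
    system: dict = field(default_factory=dict)      # e.g. size, owner, mtime
    # Provenance metadata keyed by Dublin Core element names.
    provenance: dict = field(default_factory=dict)  # e.g. title, creator, date
    # Access control list, user -> permission (grouped under resource
    # metadata on the slide).
    acl: dict = field(default_factory=dict)
    # Discipline metadata: user-defined, may differ for every entity.
    discipline: dict = field(default_factory=dict)


class LogicalNameSpace:
    """Registry of digital entities keyed by logical name."""

    def __init__(self):
        self._entries = {}

    def register(self, entity):
        if entity.logical_name in self._entries:
            raise KeyError(f"already registered: {entity.logical_name}")
        self._entries[entity.logical_name] = entity

    def query(self, group, key, value):
        """Return sorted logical names whose metadata group has key == value."""
        return sorted(name for name, entity in self._entries.items()
                      if getattr(entity, group).get(key) == value)
```

Keeping the discipline attributes in a free-form dictionary reflects the last bullet: two entities in the same name space need not share any user-defined attributes.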

18
Knowledge Management - Discovery across
Collections
  • Mapping from collection attributes to discipline
    concepts
  • Make queries based on discipline concepts
  • Characterization of relationships between
    attributes
    • Semantic / logical - cross-walks
    • Procedural / temporal - records management
    • Structural / spatial - GIS
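
A semantic cross-walk of the kind mentioned above can be sketched as a table that maps one discipline concept to the attribute name each collection uses for it. The collections, attribute names, and records below are invented for illustration, and the sketch covers only the semantic case, not procedural or spatial relationships.

```python
# Hypothetical cross-walk: concept -> attribute name used by each collection.
CROSSWALK = {
    "author": {"library": "creator", "records": "originator"},
    "created": {"library": "date", "records": "filing_date"},
}

# Two toy collections that describe the same ideas with different attributes.
COLLECTIONS = {
    "library": [
        {"id": "lib-1", "creator": "Smith", "date": "1999"},
        {"id": "lib-2", "creator": "Jones", "date": "2000"},
    ],
    "records": [
        {"id": "rec-7", "originator": "Smith", "filing_date": "2001"},
    ],
}


def query_by_concept(concept, value):
    """Search every collection using its own attribute name for the concept."""
    hits = []
    for collection, attribute in CROSSWALK[concept].items():
        for item in COLLECTIONS[collection]:
            if item.get(attribute) == value:
                hits.append((collection, item["id"]))
    return hits
```

A single query such as `query_by_concept("author", "Smith")` then reaches both collections even though neither stores an attribute called `author`.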

19
Knowledge Based Data Grids
Ingest Services
Management
Access Services
Knowledge or Topic-Based Query / Browse
Knowledge Repository for Rules
Relationships Between Concepts
Knowledge
XTM DTD
Rules - KQL
(Model-based Access)
XML DTD
Information Repository
Attribute-based Query
Attributes Semantics
SDLIP
Information
(Data Handling System - SRB)
Data
Fields Containers Folders
Storage (Replicas, Persistent IDs)
Grids
Feature-based Query
MCAT/HDF
20
Further Information
http://www.npaci.edu/DICE