Preservation Environment Working Group

Description:

Data Grid interoperability demonstration, applicable to preservation environments ... Formally define infrastructure independence for preservation ...

Slides: 16
Provided by: charl385
Learn more at: http://www.ggf.org

1
Preservation Environment Working Group
  • Officers: Bruce Barkstrom (NASA Langley) and
    Reagan Moore (SDSC)
  • Goals
  • Demonstrate interoperability between multiple
    preservation environments that are based on data
    grid technology
  • Significant Accomplishments at this GGF
  • Documents published in American Archivist, JCDL,
    e-Science
  • Data Grid interoperability demonstration,
    applicable to preservation environments
  • Plans
  • Formally define infrastructure independence for
    preservation
  • Demonstrate migration of preservation
    environments between three projects
  • Taiwan preservation environment
  • SDSC preservation environment
  • University of Maryland preservation environment
  • Concerns/Issues
  • Building on data grid technology. Need
    additional demonstrations of interoperability
    between data grid implementations
  • Extend environment to additional preservation
    projects

2
Intellectual Property Policy
  • I acknowledge that participation in GGF8 is
    subject to the GGF Intellectual Property Policy.
  • Intellectual Property Notices Note Well: All
    statements related to the activities of the GGF
    and addressed to the GGF are subject to all
    provisions of Section 17 of GFD-C.1, which
    grants to the GGF and its participants certain
    licenses and rights in such statements. Such
    statements include verbal statements in GGF
    meetings, as well as written and electronic
    communications made at any time or place, which
    are addressed to
  • the GGF plenary session,
  • any GGF working group or portion thereof,
  • the GFSG, or any member thereof on behalf of the
    GFSG,
  • the GFAC, or any member thereof on behalf of the
    GFAC,
  • any GGF mailing list, including any working group
    or research group list, or any other list
    functioning under GGF auspices,
  • the GFD Editor or the GWD process
  • Statements made outside of a GGF meeting, mailing
    list or other function, that are clearly not
    intended to be input to a GGF activity, group or
    function, are not subject to these provisions.
  • Excerpt from Section 17 of GFD-C.1: Where the GFSG
    knows of rights, or claimed rights, the GGF
    secretariat shall attempt to obtain from the
    claimant of such rights, a written assurance that
    upon approval by the GFSG of the relevant GGF
    document(s), any party will be able to obtain the
    right to implement, use and distribute the
    technology or works when implementing, using or
    distributing technology based upon the specific
    specification(s) under openly specified,
    reasonable, non-discriminatory terms. The working
    group or research group proposing the use of the
    technology with respect to which the proprietary
    rights are claimed may assist the GGF secretariat
    in this effort. The results of this procedure
    shall not affect advancement of document, except
    that the GFSG may defer approval where a delay
    may facilitate the obtaining of such assurances.
    The results will, however, be recorded by the GGF
    Secretariat, and made available. The GFSG may
    also direct that a summary of the results be
    included in any GFD published containing the
    specification. GGF Intellectual Property
    Policies are adapted from the IETF Intellectual
    Property Policies that support the Internet
    Standards Process.

3
Preservation Components
  • Authenticity - manage links to preservation
    metadata
  • OGSA Naming / OGSA DAIS / Information
    Dissemination / DFDL
  • Integrity - assure data and metadata are not
    corrupted, track chain of custody, manage access
    controls, update state information
  • OGSA Naming / OGSA DAIS / Grid File Systems /
    OGSA Data / Grid Information Retrieval / OGSA
    Authorization
  • Infrastructure independence - assure that no
    dependencies are introduced on a particular
    vendor product
  • Grid File Systems / DFDL / OGSA Data Replication
    / Grid Storage Management / GridFTP / Transaction
    Management / OGSA Data / Grid Remote Procedure
    Call

4
Two Approaches
  • First
  • Define services on which preservation processes
    are based
  • Integrate services under a controlling
    preservation environment interface (portal)
  • Second
  • Define collection properties needed to affirm
    preservation integrity and authenticity
  • Use data grid technology to manage infrastructure
    independence. This is the ability to migrate the
    archives - managed records - to another choice
    of technology infrastructure
  • Data grid interoperability can be used to
    demonstrate authenticity, integrity, and
    infrastructure independence

5
Preservation Services
  • Appraisal
  • DAIS / Grid File Systems
  • Accession
  • GridFTP / Grid File Systems / DAIS / Transaction
    Management / OGSA Data / OGSA Naming
  • Description
  • DAIS / OGSA Naming / DFDL / Transaction
    Management
  • Arrangement
  • Grid File Systems / DAIS
  • Preservation
  • Grid File Systems / Grid Storage Management /
    OGSA Data Replication / GridFTP / Transaction
    Management / OGSA Naming
  • Access
  • DAIS / DFDL / Grid File Systems / GridFTP /
    Transaction Management

6
GGF Services
  • Infrastructure Standards Groups
  • IPv6
  • Network Measurement
  • Data Transport
  • Grid High-Performance Networking
  • Network Measurement for Applications
  • Data Standards Groups
  • Data Access and Integration Services
  • Grid File Systems
  • Data Format Description Language
  • GridFTP
  • Grid Storage Management
  • Information Dissemination
  • OGSA Data Replication Services
  • Transaction Management
  • OGSA Data
  • Byte IO
  • Compute Standards Groups
  • Grid Resource Allocation Agreement Protocol

7
GGF Services
  • Architecture Standards Groups
  • Open Grid Services Architecture
  • Grid Protocol Architecture
  • OGSA Naming
  • Applications Standards Groups
  • Grid Remote Procedure Call
  • Grid Information Retrieval
  • Distributed Resource Management Application API
  • Simple API for Grid Applications
  • Grid Checkpoint Recovery
  • Management Standards Groups
  • Application Contents Service
  • Configuration Description, Deployment, and
    Lifecycle Management
  • Grid Economic Services Architecture
  • OGSA Resource Usage Service
  • Usage Record
  • Security Standards Groups
  • Open Grid Services Architecture Authorization
  • OGSA-P2P-Security

8
Implementations
  • NARA
  • Research prototype persistent archive
  • Electronic Records Archive
  • Persistent Archive Testbed
  • SDSC
  • NSDL persistent archive
  • CDL Digital Preservation Repository
  • NASA Langley
  • Archive Next Generation - ANGe
  • Your preservation environment

9
Collection-based Approach
  • Authenticity - assertions made by creator of
    records
  • Provenance metadata
  • Descriptive metadata
  • Encapsulation of metadata with data in an
    Archival Information Package
  • Validation of consistency between authenticity
    metadata and stored data
  • Verify data file exists for each metadata record
  • Verify for each stored data file, a metadata
    record exists
  • Validation of provenance metadata
  • Verify consistency of defined metadata attributes
    across all records
  • Verify preservation consistency constraints (a
    record appears only once)
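
The consistency checks listed above can be illustrated with a minimal Python sketch. It assumes the metadata catalog is available as a list of dictionaries and storage as a list of file paths; the function name check_consistency, the "path" key, and the sample records are hypothetical and are not part of any data grid API.

```python
from collections import Counter


def check_consistency(metadata_records, stored_files):
    """Compare a metadata catalog against the files actually in storage.

    metadata_records: list of dicts, each with a (hypothetical) "path" key.
    stored_files: iterable of paths found in storage.
    """
    catalog_paths = [rec["path"] for rec in metadata_records]
    catalog = set(catalog_paths)
    stored = set(stored_files)
    counts = Counter(catalog_paths)
    return {
        # Metadata records whose data file is missing from storage.
        "missing_files": sorted(catalog - stored),
        # Stored files that have no metadata record.
        "orphan_files": sorted(stored - catalog),
        # Records that appear more than once in the catalog.
        "duplicates": sorted(p for p, n in counts.items() if n > 1),
    }


# Illustrative data only.
records = [
    {"path": "a/1.txt", "creator": "archivist1"},
    {"path": "a/2.txt", "creator": "archivist1"},
    {"path": "a/2.txt", "creator": "archivist2"},
    {"path": "a/3.txt", "creator": "archivist1"},
]
files = ["a/1.txt", "a/2.txt", "a/4.txt"]

report = check_consistency(records, files)
print(report)
```

An empty report in all three categories would correspond to a collection that passes these particular checks; validating attribute consistency across records would need an additional pass over the metadata fields.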

10
Collection-Based Approach
  • Authenticity
  • Validation of assertions about the collection
  • Characterization of assertions as management
    policies
  • Mapping of management policies to executable
    rules
  • Specification of state information on which the
    rules operate
  • Specification of state information to manage rule
    outcomes
  • Implementation
  • Granularity of application: type of rule
  • Enterprise: setting of rule parameters
  • Archives: aperiodic rules
  • Collection: periodic rules
  • Record: atomic rules
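
One way to picture the mapping from management policies to executable rules is the Python sketch below. It is an illustration under stated assumptions, not the mechanism used by any particular data grid: the Rule and RuleEngine classes, the min_replicas parameter, and the state dictionary are all hypothetical names. Rule parameters are set once at the enterprise level, rules are applied at a chosen granularity, and outcomes are kept as state information.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    name: str
    granularity: str  # "enterprise", "archives", "collection" or "record"
    check: Callable[[dict], bool]


@dataclass
class RuleEngine:
    rules: list = field(default_factory=list)
    outcomes: list = field(default_factory=list)  # state recording rule results

    def apply(self, granularity, state):
        """Run every rule registered for this granularity against the state."""
        for rule in self.rules:
            if rule.granularity == granularity:
                self.outcomes.append((rule.name, rule.check(state)))
        return self.outcomes


# Enterprise level: parameters shared by all rules (hypothetical values).
params = {"min_replicas": 2}

engine = RuleEngine()
# Policy "keep enough copies" expressed as an atomic, record-level rule.
engine.rules.append(
    Rule("enough_replicas", "record",
         lambda s: s["replicas"] >= params["min_replicas"]))
# Policy "every record carries a checksum" expressed the same way.
engine.rules.append(
    Rule("has_checksum", "record", lambda s: bool(s.get("md5"))))

# State information for one record (illustrative only).
state = {"replicas": 3, "md5": ""}
results = engine.apply("record", state)
print(results)
```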

11
Collection-based Approach
  • Integrity - assertions made by archivists that
    both the data and metadata are uncorrupted, the
    chain of custody can be tracked, all actions
    are performed by identified persons, and the
    risk of data loss has been minimized
  • Requires mechanisms for
  • Checksums - checks based on file size, System V
    checksum, MD5 checksum
  • Replicas, backups, versions
  • Synchronization - between replicas, between
    system buffers and storage, between archives and
    local storage
  • Federation - replication of both metadata and
    data, while coordinating name spaces
  • Authentication - unique identity for archivists
    independently of storage system
  • Authorization - access controls managed
    independently of storage system
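
The checksum and replica mechanisms above can be sketched with the Python standard library. This is a minimal illustration assuming local files stand in for replicas; the helper names fingerprint and replicas_match are hypothetical. It combines the file-size check with an MD5 digest computed by hashlib.

```python
import hashlib
import os
import tempfile


def fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex) for a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return os.path.getsize(path), md5.hexdigest()


def replicas_match(paths):
    """True if every replica has the same size and MD5 digest."""
    return len({fingerprint(p) for p in paths}) == 1


with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "record.dat")
    replica = os.path.join(d, "replica.dat")
    for p in (original, replica):
        with open(p, "wb") as f:
            f.write(b"archival record")
    intact = replicas_match([original, replica])

    # Simulate corruption of one replica.
    with open(replica, "ab") as f:
        f.write(b"!")
    corrupted = replicas_match([original, replica])

print(intact, corrupted)
```

MD5 detects accidental corruption but is not collision resistant, so a stronger digest such as SHA-256 (also available in hashlib) may be preferable where deliberate tampering is a concern.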

12
Data Grid Interoperability Demonstration
  • Provides the mechanisms required to demonstrate
    infrastructure independence while asserting
    authenticity and integrity
  • Federated 13 data grids, including data grids
    that are supporting preservation environments
  • TWGrid (ASGC - Taiwan archives)
  • umiacs (University of Maryland - NARA
    prototype)
  • SDSC-GGF (SDSC - NARA, NHPRC, CDL, NSDL)
  • Can extend demonstration to include
  • Export of archives into an independent data
    management system
  • Import of archives back into original data
    management system without loss of authenticity
  • Must track chain of custody, access permissions,
    identity of archivists, audit trail of operations
    performed, persistence of name spaces
  • Validate integrity of archives
  • Maintenance of links between metadata and data
  • Bit preservation

13
Proposed Preservation Demonstration
  • Formal validation of existing archives
  • Consistency between metadata and stored data
  • Verification of name space integrity
  • Formal extraction of records
  • Bulk operations to extract metadata
  • Formal deposition of records into a federated
    data grid
  • Federation with a second data grid
  • Bulk operations to load metadata and data into
    remote data grid
  • Formal validation of new archives
  • Consistency between metadata and stored data
  • Verification of name space integrity
  • Formal export of records from the new archive and
    import back into the original archives, without
    loss of authenticity or integrity
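
The proposed sequence (validate, extract, deposit, validate again, export and re-import) can be walked through with a toy Python model. Everything here is an assumption made for illustration: archives are plain dictionaries keyed by logical name, and digest, validate, export_records, and import_records are hypothetical helpers rather than data grid operations. SHA-256 is used as the checksum for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate(archive):
    """Every record's stored checksum must match its data."""
    return all(rec["checksum"] == digest(rec["data"])
               for rec in archive.values())


def export_records(archive):
    """Bulk-extract metadata and data, keeping each logical name."""
    return [{"name": name, **rec} for name, rec in sorted(archive.items())]


def import_records(records):
    """Load exported records into a new archive under the same names."""
    return {r["name"]: {k: v for k, v in r.items() if k != "name"}
            for r in records}


# Original archive: logical name -> data plus metadata (illustrative only).
original = {
    "/zone/coll/r1": {"data": b"minutes 1998", "creator": "archivist1",
                      "checksum": digest(b"minutes 1998")},
    "/zone/coll/r2": {"data": b"minutes 1999", "creator": "archivist1",
                      "checksum": digest(b"minutes 1999")},
}

ok_before = validate(original)                      # validate existing archive
remote = import_records(export_records(original))   # deposit into second grid
ok_remote = validate(remote)                        # validate the new archive
returned = import_records(export_records(remote))   # export and import back
round_trip_ok = returned == original and validate(returned)

print(ok_before, ok_remote, round_trip_ok)
```

In this model the round trip succeeds only if names, metadata, and data all come back unchanged, which corresponds to the name space and consistency checks listed on the slide.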

14
Preservation Demonstration
  • Require specification of
  • Test archives
  • Metadata
  • Records
  • Name spaces that will be used
  • Archivists
  • Metadata
  • Records hierarchy
  • Storage resources for audit trails
  • Assessment criteria
  • Number of replicas
  • Strength of checksum
  • Audit trail
  • Invariance of name spaces
  • Validation of authenticity metadata

15
Papers
  • Moore, R., J. JaJa, A. Rajasekar, Storage
    Resource Broker Data Grid Preservation
    Assessment, SDSC Technical Report TR-2006.3, Feb
    2006.
  • Moore, R., M. Smith, Assessment of RLG Trusted
    Digital Repository Requirements, JCDL on
    "Digital Curation Trusted Repositories Seeking
    Success", June 2006, Chapel Hill, North Carolina.
  • Moore, R., Building Preservation Environments
    with Data Grid Technology, American Archivist,
    vol. 69, no. 1, pp. 139-158, July 2006.
  • Moore, R., A. Rajasekar, M. Wan, W. Schroeder, R.
    Marciano, On Building Trusted Digital
    Preservation Repositories, submitted to 5th
    e-Science All Hands Meeting, Sept. 2006,
    Nottingham, UK.