Digital Archive Policies and Trusted Digital Repositories

1
Digital Archive Policies and Trusted Digital
Repositories
  • MacKenzie Smith, MIT Libraries
  • Reagan Moore, San Diego Supercomputer Center

2
What is the Problem?
  • Need to extract local collection management
    policies from software so that they are more
    discoverable and configurable
  • Need to standardize information lifecycle
    management (ILM) policies for sharing across
    systems within a preservation environment
  • Need to define metadata to audit ILM operations
    and achieve trust in a scalable, automated way

3
(No Transcript)
4
Local Repository Policy/Rule Types
  • Enterprise: specification of assertions
  • Archive: aperiodic, deferred consistency rules
  • Collection: periodic rules
  • Item: periodic or atomic rules

5
Policy Framework
  • Based on the NARA/RLG TDR checklist categories
  • Organization, environment and legal policies
  • Community and usability policies
  • Process and Procedure policies
  • Technology and Infrastructure policies

6
Policy Framework
  • Abstract policy (high-level)
  • Example
  • The repository stipulates the number and
    location of copies of all digital objects: the
    number of copies to be made, the specific
    location(s), the business rules, and the order
    of preference for using replicas. The repository
    has mechanisms in place to ensure that multiple
    copies of digital objects are synchronized.

7
Policy Framework
  • Concrete policy (local policy and metadata)
  • Example
  • Specific number of copies of digital objects
  • Locations of copies of digital objects
  • Order of preference for digital object copies
  • Location of business rules for copies (e.g.
    contract with 3rd party archives for remote
    copies)
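
A concrete policy of this kind could be held as a small structured record. The sketch below is only an illustration: the class name, field names, location names, and file path are all hypothetical and do not come from DSpace or iRODS.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReplicationPolicy:
    """Hypothetical local record of one concrete replication policy."""
    copies: int                    # specific number of copies
    locations: list[str]           # candidate locations, most preferred first
    business_rules_uri: str = ""   # where the governing contract or rules live

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the policy is usable."""
        problems = []
        if self.copies < 1:
            problems.append("at least one copy is required")
        if len(self.locations) < self.copies:
            problems.append("fewer locations than required copies")
        if len(set(self.locations)) != len(self.locations):
            problems.append("duplicate location listed")
        return problems

    def targets(self) -> list[str]:
        """Locations that should hold a copy, in order of preference."""
        return self.locations[: self.copies]


policy = ReplicationPolicy(
    copies=2,
    locations=["local-store", "partner-archive", "tape-vault"],  # made-up names
    business_rules_uri="contracts/partner-archive.txt",          # made-up path
)
print(policy.validate())
print(policy.targets())
```

Keeping the preference order inside the list of locations means the number of copies can change without restating where the copies go.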

8
Policy Encoding
  • Looked at lots of schemas and approaches
  • XACML, RuleML, and BPEL: too limited
      • Single purpose (access control, rights
        management, workflow, etc.)
  • Ponder and KAoS: too risky
      • Research projects that are no longer active
  • Using Rei (N3) RDF ontology

9
Policy Exchange
  • DSpace DIPs
      • based on METS (also looked at XFDU, IMS CP,
        others)
      • encapsulate content files, metadata,
        provenance, and policies
  • iRODS
      • enforces policies based on local rules
      • produces state information (metadata) that
        can be audited by the DSpace repository over
        time

10
Example Functional Requirements
  • The ERA list defines 854 key capabilities
    (functional requirements) needed for
    preservation. These can be loosely organized
    into categories related to
  • Management of disposition agreements describing
    record retention and disposition actions
  • Accession, the formal acceptance of records into
    the data management system
  • Arrangement, the organization of the records to
    preserve a required structure (implemented as a
    collection/sub-collection hierarchy)
  • Description, the management of descriptive
    metadata as well as text indexing
  • Preservation, the generation of Archival
    Information Packages
  • Access, the generation of Dissemination
    Information Packages
  • Subscription, the specification of services that
    a user picks for execution
  • Notification, the delivery of notices on service
    execution results
  • Queuing of large scale tasks through interaction
    with workflow systems
  • System performance and failure reports. Of
    particular interest is the identification of all
    failures within the data management system and
    the recovery procedures that were invoked.
  • Transformative migration, the ability to convert
    specified data formats to new standards. In this
    case, each new encoding format is managed as a
    version of the original record.
  • Display transformation, the ability to reformat a
    file for presentation.
  • Automated client specification, the ability to
    pick the appropriate client for each user.

11
Rule Definition
  • Based on assessment criteria / preservation
    policies / preservation functional capabilities
  • Implemented as
  • Rules controlling micro-services with associated
    persistent state information

12
Case Study
  • SRB/iRODS virtualized storage environment
      • Provides 3rd party preservation services
      • Rules derived from local policy, preservation
        requirements
      • Provides metadata to allow monitoring for
        trust
  • DSpace@MIT institutional repository
      • Defines local collection management policies
      • Consumes 3rd party preservation services
        (e.g. iRODS)
      • Provides provenance/audit (History) to
        monitor trust

13
DSpace Event System
  • Archivist defines TDR-level abstract policies
  • System curator defines ILM events of interest,
    based on policies
      • e.g. ingest, modification, preservation
        migration, new edition, change in access
        rules, etc.
  • System detects and acts on events, and records
    them in the local History (provenance audit)
      • e.g. iRODS deposit
  • History/provenance uses the ABC Harmony ontology
    for ILM (RDF)
  • System curator monitors
      • iRODS state information
      • DSpace History subsystem (via standard RDF
        browsing tools)

14
iRODS Rule-based System
  • Quantify the management policies
  • Automate the application of the policies
  • Track the outcomes from application of the
    policies
  • First release of the software is this month

15
iRODS - Infrastructure Independence
  • Six logical name spaces required to manage
    preservation properties
  • Records
  • Persons
  • Storage resources
  • Rules
  • Micro-services
  • Persistent state information

16
Example Archivist Policies
  • Authenticity
      • Are the required provenance metadata provided
        with the record? - Submission requirement
      • Is the chain of custody properly documented?
        - Management requirement
  • Integrity
      • Are the bits protected against natural
        disasters? - Management requirement for
        replication and distribution
      • Are the bits preserved without corruption? -
        Future assertion

17
Example Archivist Policies
  • Infrastructure independence
  • Management of preservation properties
    independently of choice of hardware and software
    infrastructure
  • Management policies are needed for assertions
    about the properties of the records (authenticity
    and integrity) and the properties of the
    preservation environment (infrastructure
    independence)

18
Example of Complete Process of Rule Derivation
from Preservation Criteria
  • Assessment criteria
      • Integrity of records is preserved
  • Management policy
      • Integrity will be verified every 6 months
  • Preservation capabilities
      • Replication of records
      • Checksum on each record
      • Synchronization between replicas
      • Federation between archives
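
The management policy on this slide (verify integrity every 6 months) can be turned into a simple scheduling check over persistent state. The following is a minimal sketch: the function name, record ids, and dates are invented for illustration, and "6 months" is approximated as 182 days, which is an assumption rather than anything stated in the slides.

```python
from datetime import date, timedelta

# Assumption: "every 6 months" approximated as a fixed 182-day interval.
VERIFY_INTERVAL = timedelta(days=182)


def records_due(last_verified, today):
    """Return ids of records never verified, or verified longer ago than the interval."""
    due = []
    for record_id, verified_on in last_verified.items():
        if verified_on is None or today - verified_on > VERIFY_INTERVAL:
            due.append(record_id)
    return sorted(due)


# Hypothetical persistent state: record id -> date of last checksum verification.
state = {
    "rec-001": date(2024, 1, 10),
    "rec-002": date(2024, 6, 1),
    "rec-003": None,  # never verified
}
print(records_due(state, today=date(2024, 9, 1)))  # ['rec-001', 'rec-003']
```

The same list answers the later assessment question of whether all records have been validated within the required period: an empty result means they have.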

19
Rule-based Preservation Policies
  • Generated Rules
  • Event-condition-action, where the action is a set
    of micro-services or other rules
  • Each micro-service corresponds to operations on a
    record at a remote storage location
  • Each micro-service has a recovery procedure to
    handle remote system failure or unavailability
  • Persistent state information is saved to track
    the outcome from applying the rule
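
The rule structure described above can be sketched in a few lines. This is plain Python written for illustration, not iRODS rule syntax; every name in it (MicroService, Rule, apply_rule, the sample services) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MicroService:
    name: str
    run: Callable[[dict], None]      # operation on a record
    recover: Callable[[dict], None]  # recovery procedure used if run() fails


@dataclass
class Rule:
    event: str
    condition: Callable[[dict], bool]
    actions: List[MicroService]


def apply_rule(rule, event, record, state_log):
    """Fire the rule if event and condition match; log each micro-service outcome."""
    if event != rule.event or not rule.condition(record):
        return False
    for service in rule.actions:
        try:
            service.run(record)
            state_log.append((record["id"], service.name, "ok"))
        except Exception as exc:
            service.recover(record)
            state_log.append((record["id"], service.name, f"recovered: {exc}"))
    return True


def replicate(record):
    record["replicas"] = record.get("replicas", 0) + 1


def failing_checksum(record):
    # Simulates the remote storage location being unreachable.
    raise ConnectionError("remote site unavailable")


def mark_retry(record):
    record["retry"] = True


rule = Rule(
    event="ingest",
    condition=lambda r: r.get("collection") == "theses",
    actions=[
        MicroService("replicate", replicate, recover=mark_retry),
        MicroService("checksum", failing_checksum, recover=mark_retry),
    ],
)
log = []
record = {"id": "rec-001", "collection": "theses"}
apply_rule(rule, "ingest", record, log)
print(log)
```

The log list stands in for the persistent state information: it records one outcome per micro-service, including the case where the recovery procedure had to run.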

20
Rule Example: Validate Record Integrity
  • Check permissions (requires archivist or proxy)
  • Operations on specified record
      • Access remote site
      • Compute the checksum and compare with the
        archived value
      • If the checksum is not correct
          • Access a replica, compute its checksum,
            and verify that it is correct
          • Replace bad replica with a good replica
          • Update audit list to track the
            replacement
      • Update persistent state to record date of
        checksum verification
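
The steps of this rule can be followed in a short sketch. It is an illustration under stated assumptions: replicas are held in memory as bytes rather than at remote sites, SHA-256 is chosen arbitrarily (the slides do not name a checksum algorithm), and all names and sample data are hypothetical.

```python
import hashlib
from datetime import date


def checksum(data):
    # Assumption: SHA-256; the source does not say which algorithm is used.
    return hashlib.sha256(data).hexdigest()


def validate_record(record, site, user_role, audit_log, today):
    """Check one replica against the archived checksum; repair it if needed.

    Returns True if the replica at `site` had to be replaced.
    """
    # Check permissions (requires archivist or proxy).
    if user_role not in ("archivist", "proxy"):
        raise PermissionError("archivist or proxy required")

    replicas = record["replicas"]  # site name -> stored bytes
    expected = record["checksum"]  # archived value
    repaired = False

    if checksum(replicas[site]) != expected:
        # Find another replica whose checksum is correct.
        for other, data in replicas.items():
            if other != site and checksum(data) == expected:
                replicas[site] = data                           # replace bad replica
                audit_log.append((record["id"], site, other))  # track replacement
                repaired = True
                break
        else:
            raise RuntimeError("no valid replica found")

    # Persistent state: date of checksum verification.
    record["last_verified"] = today
    return repaired


good = b"thesis text"
record = {
    "id": "rec-001",
    "checksum": checksum(good),
    "replicas": {"site-a": b"thesis texT", "site-b": good},  # site-a is corrupted
}
audit = []
print(validate_record(record, "site-a", "archivist", audit, date(2024, 9, 1)))
print(audit)
```

Raising an error when no good replica exists keeps the failure visible instead of recording a verification date for a record that could not be repaired.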

21
Additional Implied Assessment Criteria
  • Are there any orphaned records present in the
    archive with no preservation metadata?
  • Are the replicas distributed across independent
    administrative domains on different types of
    storage systems?
  • Is the observed error rate a factor of four lower
    than the validation rate?
  • Have all records been validated within the
    required time period?
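
Two of these criteria lend themselves to simple automated checks. In the sketch below the record layout, field names, and domain labels are invented, and "distributed" is interpreted as at least two distinct administrative domains and at least two storage types, which is an assumption about the intent of the slide.

```python
def orphaned(records):
    """Ids of records that carry no preservation metadata."""
    return [r["id"] for r in records if not r.get("metadata")]


def poorly_distributed(records):
    """Ids of records whose replicas share one domain or one storage type."""
    flagged = []
    for r in records:
        domains = {rep["domain"] for rep in r["replicas"]}
        storage = {rep["storage_type"] for rep in r["replicas"]}
        # Assumption: two or more distinct values count as "distributed".
        if len(domains) < 2 or len(storage) < 2:
            flagged.append(r["id"])
    return flagged


# Hypothetical sample data.
records = [
    {
        "id": "rec-001",
        "metadata": {"creator": "example"},
        "replicas": [
            {"domain": "domain-a", "storage_type": "disk"},
            {"domain": "domain-b", "storage_type": "tape"},
        ],
    },
    {
        "id": "rec-002",
        "metadata": {},
        "replicas": [
            {"domain": "domain-a", "storage_type": "disk"},
            {"domain": "domain-a", "storage_type": "disk"},
        ],
    },
]
print(orphaned(records))
print(poorly_distributed(records))
```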

22
Self-consistency and Closure
  • For every required preservation attribute
    (authenticity and integrity), are there
    assessment criteria?
  • For every assessment criterion, does there exist
    preservation metadata?
  • Are the properties of the preservation
    environment also preserved?
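
The first two closure questions amount to checking that two mappings have no gaps. A minimal sketch follows, with hypothetical attribute, criterion, and metadata names chosen only for illustration.

```python
def closure_gaps(attributes, criteria_for, metadata_for):
    """List attributes without criteria and criteria without metadata."""
    gaps = []
    for attr in attributes:
        criteria = criteria_for.get(attr, [])
        if not criteria:
            gaps.append(f"no assessment criterion for attribute '{attr}'")
        for criterion in criteria:
            if not metadata_for.get(criterion):
                gaps.append(f"no preservation metadata for criterion '{criterion}'")
    return gaps


attributes = ["authenticity", "integrity"]
# Hypothetical mappings: attribute -> criteria, criterion -> metadata fields.
criteria_for = {"integrity": ["checksum verified within period"]}
metadata_for = {"checksum verified within period": ["last_verified"]}

print(closure_gaps(attributes, criteria_for, metadata_for))
```

An empty result would indicate that every attribute has at least one criterion and every criterion has supporting metadata; here the check reports the missing criterion for authenticity.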