From SRB to IRODS: Policy Virtualization using Rule-Based Data Grids - PowerPoint PPT Presentation

Description:

Title: Managing Distributed Data through Shared Collections Last modified by: Reagan Moore Document presentation format: On-screen Show Company: SDSC – PowerPoint PPT presentation

1
From SRB to IRODS: Policy Virtualization using
Rule-Based Data Grids
Reagan W. Moore, Wayne Schroeder, Arcot Rajasekar, Mike Wan
San Diego Supercomputer Center
moore_at_sdsc.edu
http://irods.sdsc.edu
http://www.sdsc.edu/srb/
2
Data Grid Evolution
  • Data grids
  • Infrastructure independence
  • Data sharing through data and trust
    virtualization
  • SRB - Storage Resource Broker
  • Rule-based data grids
  • Automation of management policies - management
    virtualization
  • Open source software
  • iRODS - integrated Rule-Oriented Data System

3
Data Management Applications
  • Data grids
  • Share data - organize distributed data as a
    collection
  • Digital libraries
  • Publish data - support browsing and discovery
  • Persistent archives
  • Preserve data - manage technology evolution
  • Real-time sensor systems
  • Federate sensor data - integrate across sensor
    streams
  • Workflow systems
  • Analyze data - integrate client- and server-side
    workflows

4
Generic Infrastructure
  • Data grids organize distributed data into shared
    collections
  • Persistent name spaces for files, users, storage
  • Collection attributes
  • Provenance, descriptive, system metadata
  • Data grids manage heterogeneous storage systems
  • Standard operations across file systems, tape
    archives, object ring buffers
  • Enable technology evolution
  • At the point in time when new technology is
    available, both the old and new systems can be
    integrated

5
Using a Data Grid in Abstract
Data Grid
  • User asks for data from the data grid

6
Using a Data Grid - Details
  • User asks for data
  • Data request goes to iRODS Server
  • Server looks up information in catalog
  • Catalog tells which iRODS server has data
  • 1st server asks 2nd for data
  • The 2nd iRODS server applies rules
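The request path above can be sketched in a few lines of Python. This is a minimal illustration, not the iRODS implementation: the `Catalog` and `Server` classes, the logical path, and the `apply_rules` placeholder are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical catalog: maps a logical file name to the server holding it."""
    locations: dict = field(default_factory=dict)

    def locate(self, logical_name):
        return self.locations[logical_name]


@dataclass
class Server:
    name: str
    catalog: Catalog
    peers: dict = field(default_factory=dict)    # server name -> Server
    storage: dict = field(default_factory=dict)  # logical name -> bytes
    log: list = field(default_factory=list)      # record of rule applications

    def apply_rules(self, logical_name, user):
        # Placeholder for a rule engine; here it only records the access.
        self.log.append((user, logical_name))

    def get(self, logical_name, user):
        # The server looks up the data location in the catalog.
        holder = self.catalog.locate(logical_name)
        if holder != self.name:
            # The first server asks the second server for the data.
            return self.peers[holder].get(logical_name, user)
        # The server that holds the data applies its rules before serving it.
        self.apply_rules(logical_name, user)
        return self.storage[logical_name]


path = "/zone/home/alice/survey.dat"
catalog = Catalog({path: "server2"})
server2 = Server("server2", catalog, storage={path: b"payload"})
server1 = Server("server1", catalog, peers={"server2": server2})

data = server1.get(path, user="alice")
```

In this sketch the user only ever talks to `server1`; the rules run on `server2`, the server that actually stores the file.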

7
Extremely Successful
  • Storage Resource Broker (SRB) manages 2 PB of
    data in internationally shared collections
  • Data collections for NSF, NARA, NASA, DOE, DOD,
    NIH, LC, NHPRC, IMLS, APAC, UK e-Science, IN2P3,
    KEK
  • Astronomy: Data grid
  • Bio-informatics: Digital library
  • Earth Sciences: Data grid
  • Ecology: Collection
  • Education: Persistent archive
  • Engineering: Digital library
  • Environmental science: Data grid
  • High energy physics: Data grid
  • Humanities: Data grid
  • Medical community: Digital library
  • Oceanography: Real-time sensor data, persistent
    archive
  • Seismology: Digital library, real-time sensor
    data
  • Goal has been generic infrastructure for
    distributed data

9
BaBar High-Energy Physics
  • Stanford Linear Accelerator
  • IN2P3
  • Lyon, France
  • Rome, Italy
  • San Diego
  • RAL, UK
  • A functioning international Data Grid for
    high-energy physics

Manchester-SDSC mirror
Moved over 300 TB of data, increasing to 5 TB
per day
10
Requirements Driving Evolution
  • Observe that as the size of the shared
    collections grows, the administrative tasks can
    become onerous
  • Data grids provide mechanisms to manage recovery
    from all errors that occur in the distributed
    environment
  • Need to minimize labor support through automation
    of administrative functions
  • File ingestion tasks
  • Verification of desired collection properties
  • Integrity checks and replica management
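As one example of an administrative function that can be automated, an integrity check compares each replica against a checksum recorded at ingestion. The sketch below assumes SHA-256 and an in-memory dictionary of replicas; the resource names and the helper `find_bad_replicas` are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum used for the comparison (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_bad_replicas(registered_checksum, replicas):
    """Return the names of replicas whose content no longer matches."""
    return sorted(
        name
        for name, data in replicas.items()
        if checksum(data) != registered_checksum
    )


original = b"observation record 42"
registered = checksum(original)  # recorded when the file was ingested

replicas = {
    "disk-sdsc": original,
    "tape-archive": original,
    "disk-remote": b"observation record 4?",  # simulated corruption
}

bad = find_bad_replicas(registered, replicas)
```

A replica management step could then replace each entry in `bad` with a copy taken from a replica that still verifies.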

11
Requirements Driving Evolution
  • Observe that each community has unique management
    policies
  • User administration
  • File retention and deletion
  • Time-dependent access controls
  • Data distribution and replication
  • File update (versions, backups)
  • Descriptive metadata
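Because these policies differ per community, it can help to treat them as data rather than code. The sketch below is an assumption-laden illustration: the `CollectionPolicy` fields, the dates, and the two check functions are invented for this example and are not taken from SRB or iRODS.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CollectionPolicy:
    """Hypothetical per-community policy settings."""
    retention_days: int   # file retention before deletion is allowed
    public_after: date    # time-dependent access control
    replica_count: int    # data distribution and replication


def may_read(policy, user_is_owner, today):
    # Owners can always read; everyone else waits for the release date.
    return user_is_owner or today >= policy.public_after


def may_delete(policy, ingested_on, today):
    # Deletion is allowed only once the retention period has passed.
    return today >= ingested_on + timedelta(days=policy.retention_days)


archive_policy = CollectionPolicy(
    retention_days=3650,
    public_after=date(2008, 1, 1),
    replica_count=3,
)

early_read = may_read(archive_policy, False, date(2007, 6, 1))
late_read = may_read(archive_policy, False, date(2008, 1, 1))
early_delete = may_delete(archive_policy, date(2007, 1, 1), date(2008, 1, 1))
```

A second community would supply its own `CollectionPolicy` values without changing the check functions.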

12
Requirements Driving Evolution
  • Socialization of collections
  • The creators of the collection have specific
    properties that they assert the collection will
    possess
  • Completeness
  • Authoritative sources
  • Authenticity
  • The users of the collection have their own
    criteria for the properties they expect
  • Socialization is the mapping from creator
    assertions to user expectations
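One way to picture this mapping is as a comparison between two sets of properties. The following sketch is purely illustrative; the property names and the function `unmet_expectations` are assumptions made for this example.

```python
def unmet_expectations(assertions, expectations):
    """Return the expected properties that the creators do not assert."""
    return {
        prop: wanted
        for prop, wanted in expectations.items()
        if assertions.get(prop) != wanted
    }


# Properties the collection creators assert (hypothetical values).
creator_assertions = {
    "completeness": True,
    "authoritative_source": True,
    "authenticity": True,
}

# Properties a user community expects (hypothetical values).
user_expectations = {
    "completeness": True,
    "authenticity": True,
    "peer_reviewed": True,
}

gaps = unmet_expectations(creator_assertions, user_expectations)
```

An empty result would mean every user expectation is covered by a creator assertion.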

13
Data Grid Mechanisms
  • Essential components needed for synergism
    implemented in SRB
  • Infrastructure independence
  • Data and trust virtualization
  • Components needed for specific management
    policies and processes implemented in iRODS
  • Map policies to rules that control all processes
  • Map processes to standard micro-services

14
Data Management
iRODS - integrated Rule-Oriented Data System
15
Rules
  • Rule classes
  • System enforced rules
  • Administrator controlled rules
  • User defined rules
  • Rule execution
  • Atomic rules - executed on each operation invoked
    by a client
  • Deferred rules - executed at a future time
  • Periodic rules - executed to validate assessment
    criteria and enforce desired properties
    (integrity)
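The three execution modes can be contrasted with a toy scheduler. This is a sketch under stated assumptions, not the iRODS rule engine: the class `RuleScheduler`, its method names, and the use of plain numbers for time are all hypothetical.

```python
import heapq
import itertools


class RuleScheduler:
    """Toy scheduler; time is a plain number advanced by the caller."""

    def __init__(self):
        self._queue = []               # entries: (run_at, sequence, rule, period)
        self._seq = itertools.count()  # tie-breaker so rules are never compared

    def run_atomic(self, rule):
        # Atomic rules run immediately, as part of the client operation.
        return rule()

    def defer(self, rule, run_at):
        # Deferred rules are queued to run once at a future time.
        heapq.heappush(self._queue, (run_at, next(self._seq), rule, None))

    def every(self, rule, period, start=0):
        # Periodic rules are queued and re-queued after each run.
        heapq.heappush(self._queue, (start, next(self._seq), rule, period))

    def advance_to(self, now):
        """Run every queued rule that is due at or before `now`."""
        while self._queue and self._queue[0][0] <= now:
            run_at, _, rule, period = heapq.heappop(self._queue)
            rule()
            if period is not None:
                heapq.heappush(
                    self._queue,
                    (run_at + period, next(self._seq), rule, period),
                )


events = []
scheduler = RuleScheduler()
scheduler.run_atomic(lambda: events.append("atomic"))
scheduler.defer(lambda: events.append("deferred"), run_at=5)
scheduler.every(lambda: events.append("periodic"), period=4, start=4)
scheduler.advance_to(10)
```

After `advance_to(10)` the periodic rule has run at times 4 and 8 and remains queued for time 12, while the deferred rule has run once and is gone.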

16
iRODS Rule Syntax
  • Event | Condition | Action-set | Recovery-set
  • Event - triggered by operation or queued rule
  • Condition - composed of tests on any attributes
    in the persistent state information
  • Action-set - composed from both micro-services
    and rules
  • Recovery-set - used to ensure transaction
    semantics and consistent state information
  • Executed by a rule engine installed at each
    storage location - server side workflows
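The four parts of a rule can be modeled directly. The sketch below is written in Python rather than the iRODS rule language, and the `Rule` class, the `fire` function, and the example actions are hypothetical. It assumes one recovery step per action and undoes completed actions in reverse order when a later action fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    event: str
    condition: Callable[[dict], bool]
    actions: List[Callable[[dict], None]]
    # One recovery step per action; None means there is nothing to undo.
    recoveries: List[Optional[Callable[[dict], None]]]


def fire(rule, event, state):
    """Run the rule's actions; on failure undo the completed ones in reverse."""
    if event != rule.event or not rule.condition(state):
        return False
    done = []
    try:
        for action, recovery in zip(rule.actions, rule.recoveries):
            action(state)
            done.append(recovery)
    except Exception:
        for recovery in reversed(done):
            if recovery is not None:
                recovery(state)
        return False
    return True


def register(state):
    state["registered"] = True


def unregister(state):
    state["registered"] = False


def replicate(state):
    if not state["remote_online"]:
        raise IOError("remote resource offline")
    state["replicas"] += 1


rule = Rule(
    event="put",
    condition=lambda s: s["size"] > 0,
    actions=[register, replicate],
    recoveries=[unregister, None],
)

ok_state = {"size": 10, "remote_online": True, "registered": False, "replicas": 1}
bad_state = {"size": 10, "remote_online": False, "registered": False, "replicas": 1}

succeeded = fire(rule, "put", ok_state)
failed = fire(rule, "put", bad_state)
```

In the failing case the registration is rolled back, so the state stays consistent even though the action-set only partly ran.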

17
Micro-Services
  • Challenge is that storage systems do not provide
    desired processes
  • Have minimal set of standard operations that
    are performed at the storage system
  • Have actions required by clients such as
    replication, metadata extraction
  • Create standard micro-services that aggregate
    storage operations into modules that can be used
    to implement desired processes.
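A short sketch of the idea: the storage system offers only read and write, and micro-services combine those operations into the actions clients actually want. The class `MemoryStorage` and the three `*_service` functions are hypothetical names and do not correspond to real iRODS micro-services.

```python
import hashlib


class MemoryStorage:
    """Stand-in storage system exposing only a minimal set of operations."""

    def __init__(self):
        self._files = {}

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = data


def checksum_service(storage, path):
    # Built from a single storage operation (read).
    return hashlib.sha256(storage.read(path)).hexdigest()


def replicate_service(source, target, path):
    # Aggregates read and write, then verifies the copy.
    target.write(path, source.read(path))
    return checksum_service(source, path) == checksum_service(target, path)


def extract_metadata_service(storage, path):
    # Derives simple system metadata from the stored bytes.
    data = storage.read(path)
    return {"size": len(data), "checksum": hashlib.sha256(data).hexdigest()}


primary = MemoryStorage()
mirror = MemoryStorage()
primary.write("/zone/a.dat", b"abc")

replicated = replicate_service(primary, mirror, "/zone/a.dat")
metadata = extract_metadata_service(mirror, "/zone/a.dat")
```

Because the services only call `read` and `write`, any storage system that provides those two operations could be substituted for `MemoryStorage`.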

18
Data Virtualization
  • Map from the actions requested by the access
    method to a standard set of micro-services.
  • The standard micro-services are mapped to the
    operations supported by the storage system
[Diagram: layered stack labeled Data Grid, with layers Access Interface, Standard Micro-services, Standard Operations, Storage Protocol, Storage System]
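The layering can be sketched as two mappings: access-method actions map to micro-services, and micro-services call standard operations that each storage driver implements in its own way. Everything named here (`StorageDriver`, the two drivers, `ACCESS_METHOD_ACTIONS`, `handle`) is a hypothetical illustration, with a local directory standing in for a second kind of storage system.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations every storage system must support (assumed set)."""

    @abstractmethod
    def read(self, path): ...

    @abstractmethod
    def write(self, path, data): ...


class MemoryDriver(StorageDriver):
    def __init__(self):
        self._objects = {}

    def read(self, path):
        return self._objects[path]

    def write(self, path, data):
        self._objects[path] = data


class FileSystemDriver(StorageDriver):
    def __init__(self, root):
        self._root = root

    def _local(self, path):
        # Flatten the logical path into a single local file name.
        return os.path.join(self._root, path.strip("/").replace("/", "_"))

    def read(self, path):
        with open(self._local(path), "rb") as handle:
            return handle.read()

    def write(self, path, data):
        with open(self._local(path), "wb") as handle:
            handle.write(data)


def put_service(driver, path, data):
    driver.write(path, data)


def get_service(driver, path):
    return driver.read(path)


# Actions requested by an access method, mapped to micro-services.
ACCESS_METHOD_ACTIONS = {"upload": put_service, "download": get_service}


def handle(action, driver, *args):
    return ACCESS_METHOD_ACTIONS[action](driver, *args)


results = []
with tempfile.TemporaryDirectory() as root:
    for driver in (MemoryDriver(), FileSystemDriver(root)):
        handle("upload", driver, "/zone/data/a.txt", b"same bytes")
        results.append(handle("download", driver, "/zone/data/a.txt"))
```

The client-facing calls are identical for both drivers; only the lowest layer knows how the bytes are stored.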
19
integrated Rule-Oriented Data System
[Architecture diagram. Components: Client Interface, Admin Interface,
Rule Invoker, Rule Modifier Module, Config Modifier Module,
Metadata Modifier Module, Rule Base, Current State,
Consistency Check Module (shown twice), Confs, Resources,
Metadata-based Services, Resource-based Services,
Metadata Persistent Repository, Micro Service Modules (shown twice)]
20
Distributed Management System
[Diagram. Components: Data Transport, Metadata Catalog, Rule Engine,
Persistent State information, Virtualization, Policy Management,
Execution Engine, Execution Control, Server Side Workflow,
Messaging System, Scheduling]
21
Micro-service Classes
  • Test
  • System
  • Workflow control
  • Client
  • iCAT catalog
  • User level invoked by irule
  • Image manipulation

22
Digital Preservation
  • Preservation community is defining the rules needed
    to assert trustworthiness of a digital repository
  • RLG/NARA - Trustworthy Repositories Audit &
    Certification: Criteria and Checklist
  • http://wiki.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf
  • Defined 105 rules that are being implemented in
    iRODS

23
RLG/NARA Assessment
  • Example TRAC assessment criteria

90 Verify descriptive metadata and source against SIP template and set SIP compliance flag
91 Verify descriptive metadata against semantic term list
92 Verify status of metadata catalog backup (create a snapshot of metadata catalog)
93 Verify consistency of preservation metadata after hardware change or error
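To illustrate how a criterion such as number 91 might become an executable check, the sketch below verifies metadata attribute names against an approved term list. The term list, the sample record, and the function name are hypothetical and are not taken from the TRAC document or from iRODS.

```python
def verify_against_term_list(metadata, term_list):
    """Return the metadata attribute names that are not approved terms."""
    return sorted(name for name in metadata if name not in term_list)


# Hypothetical semantic term list for a collection.
SEMANTIC_TERMS = {"title", "creator", "date", "format"}

# Hypothetical descriptive metadata for one file.
record = {"title": "Survey image", "creator": "Example Lab", "colour": "blue"}

violations = verify_against_term_list(record, SEMANTIC_TERMS)
compliant = not violations
```

A periodic rule could run such a check over every record and store the `compliant` flag as part of the persistent state information.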
24
Classes of Assessment Criteria
  • Collection properties
  • List properties of associated name spaces
  • Verify properties
  • Compare properties with assertions
  • Collection operations
  • Transform file formats
  • Migrate data
  • Generate audit trails
  • Structured information
  • Parse audit trails to generate compliance reports
  • Apply templates to extract information
  • Apply templates to format state information
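Parsing an audit trail into a compliance report can be sketched as follows. The comma-separated record layout (timestamp, user, operation, path, outcome) and the sample entries are assumptions made for this example, not an iRODS audit format.

```python
import csv
from collections import Counter

# Hypothetical audit trail, one comma-separated record per line.
AUDIT_TRAIL = [
    "2007-06-01T10:00:00,alice,put,/zone/a.dat,ok",
    "2007-06-01T10:05:00,bob,get,/zone/a.dat,denied",
    "2007-06-02T09:00:00,system,checksum,/zone/a.dat,ok",
    "2007-06-02T09:01:00,system,checksum,/zone/b.dat,failed",
]


def compliance_report(trail_lines):
    """Count outcomes and list files whose integrity check did not pass."""
    outcomes = Counter()
    integrity_failures = []
    for timestamp, user, operation, path, outcome in csv.reader(trail_lines):
        outcomes[outcome] += 1
        if operation == "checksum" and outcome != "ok":
            integrity_failures.append(path)
    return {
        "outcomes": dict(outcomes),
        "integrity_failures": integrity_failures,
    }


report = compliance_report(AUDIT_TRAIL)
```

The resulting dictionary could then be passed to a formatting template to produce the report in whatever layout an auditor requires.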

25
iRODS Development
  • NSF - SDCI grant: Adaptive Middleware for
    Community Shared Collections
  • iRODS development, SRB maintenance
  • NARA - Transcontinental Persistent Archive
    Prototype
  • Trusted repository assessment criteria
  • NSF - Ocean Research Interactive Observatory
    Network (ORION)
  • Real-time sensor data stream management
  • NSF - Temporal Dynamics of Learning Center data
    grid
  • Management of Institutional Review Board approval

26
iRODS Development Status
  • Current release is version 0.9.2
  • June 2007
  • Production release will be version 1.0
  • Fall quarter 2007
  • International collaborations
  • SHAMAN - University of Liverpool
  • Sustaining Heritage Access through Multivalent
    ArchiviNg
  • UK e-Science data grid
  • IN2P3 in Lyon, France
  • DSpace policy management

27
Planned Development
  • GSI support
  • Time-limited sessions via a one-way hash
    authentication
  • Python Client library
  • GUI Browser (AJAX in development)
  • Driver for HPSS (in development)
  • Driver for SAM-QFS
  • Porting to additional versions of Unix/Linux
  • Porting to Windows
  • Support for MySQL as the metadata catalog
  • API support packages based on existing mounted
    collection driver
  • MCAT to ICAT migration tools
  • Extensible Metadata including Databases Access
    Interface
  • Zones/Federation
  • Auditing - mechanisms to record and track iRODS
    persistent state changes
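As an illustration of the time-limited session item in the list above, the sketch below issues a token that expires and is protected by a keyed one-way hash. HMAC-SHA256 is swapped in here as an assumption; the slides do not say which hash or token format is planned, and the function names and token layout are hypothetical.

```python
import hashlib
import hmac


def issue_token(secret: bytes, user: str, expires_at: int) -> str:
    """Create a token of the form user:expiry:digest."""
    message = f"{user}:{expires_at}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{user}:{expires_at}:{digest}"


def verify_token(secret: bytes, token: str, now: int) -> bool:
    """Accept the token only if the digest matches and it has not expired."""
    user, expires_at, digest = token.rsplit(":", 2)
    message = f"{user}:{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # The digest is checked first, so the expiry is parsed only when trusted.
    return hmac.compare_digest(digest, expected) and now < int(expires_at)


secret = b"server-side secret"
token = issue_token(secret, "alice", expires_at=1000)

valid_now = verify_token(secret, token, now=999)
valid_later = verify_token(secret, token, now=1000)

# Changing the expiry without knowing the secret invalidates the digest.
user, _, digest = token.split(":")
tampered = ":".join([user, "9999", digest])
valid_tampered = verify_token(secret, tampered, now=999)
```

The server never stores the session; it only needs the secret to recompute the digest when the token is presented.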

28
For More Information (iRODS Tutorial on Thursday)
  • Reagan W. Moore
  • San Diego Supercomputer Center
  • moore_at_sdsc.edu
  • http://www.sdsc.edu/srb/
  • http://irods.sdsc.edu/