An Overview of iRODS: Integrated Rule-Oriented Data System

Slides: 23
Provided by: osgdocdbO
1
An Overview of iRODS: Integrated Rule-Oriented Data System
Michael Wan mwan_at_sdsc.edu http://irods.sdsc.edu/

2
Motivation for iRods
  • Based on experience with SRB - Distributed Data
    Management System
  • Global Logical Name space - UNIX like directories
    and files
  • Single Global User Name Space - Single sign-on
  • Federated middleware system
  • Client/server model
  • Federation of resource servers with uniform
    interfaces
  • Robust access control
  • MCAT Metadata catalog
  • SRB is used by many projects with many different
    requirements and configurations
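
The global logical name space and metadata catalog listed above can be illustrated with a small sketch. This is not SRB or iRODS code: the Catalog class, the zone name demoZone, and the resource names are all hypothetical. It only shows the idea that a UNIX-like logical path is looked up in a catalog to find the physical copies on different resources.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # name of the storage resource holding this copy
    physical_path: str  # where the bytes actually live on that resource


@dataclass
class Catalog:
    """Toy stand-in for a metadata catalog: logical path -> replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_path, resource, physical_path):
        replica = Replica(resource, physical_path)
        self.entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path):
        # Clients only ever name the logical path; the catalog supplies
        # the list of physical copies behind it.
        return list(self.entries.get(logical_path, []))


catalog = Catalog()
catalog.register("/demoZone/home/alice/scan.dat", "diskA", "/vault1/alice/scan.dat")
catalog.register("/demoZone/home/alice/scan.dat", "tapeB", "/archive/0001/scan.dat")
```

A client would call catalog.resolve("/demoZone/home/alice/scan.dat") and receive both replicas without knowing which servers hold them.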

3
BIRN Biomedical Information Research Network
4
NOAO Data Flow
5
Archive Service Architecture
[Figure: UK eScience archive process, data path. Diagram labels: Remote
Institute Site, Local machines, Local Storage, Filer, Local SRB Server,
Local Vault, Site WAN, Firewall, JANET WAN, Central Cache Site, RAL,
ads0sb01.cc.rl.ac.uk, Central SRB Server, Central cache Vault, SRB-ADS
Server, ADS SRB Disk Cache Resource, ADS Tape Resource, Tape Traffic,
Sphymove in to container, Sreplcont; numbered paths 1 to 4.]
  • Step 1: Archive Submission Interface
  • Data ingestion of collection hierarchy into SRB
  • Uses the Java Jargon API (equivalent of Sput -b)
  • Ingested to
    /bbsrc/institute/scratch/project/year/user/dateandtime
  • At end of ingestion, data is logically moved using Smv to
    /bbsrc/institute/local-archive/project/year/user/dateandtime

  • Step 2: Scheduled transfer to Central SRB Server (driven
    from Central SRB Server)
  • Smkcont command used to create container on
    central SRB Server
  • Data moved from Site SRB to container on central
    SRB Server using Sphymove
  • Upon data transfer completion, archived data is
    logically moved with Smv to
    /bbsrc/institute/remote-archive/project/year/user/dateandtime

  • Step 3: Scheduled transfer to ADS resource
  • Implemented via cron job using the Sreplcont command,
    which is driven by the central SRB Server
  • Entire container replicated using the Sreplcont
    command
  • Logical structure preserved as
    /bbsrc/institute/remote-archive/project/year/user/dateandtime

  • Step 4: Synchronization of container to tape resource and
    removal of original container from Central SRB
    Server
  • Ssyncont -d -a command used, allowing for a
    family of containers

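
The logical paths used in the archive steps share one layout, which can be sketched as a small helper. This is only an illustration under assumptions: the function name archive_path and the timestamp format are hypothetical, and only the /bbsrc/institute/stage/project/year/user/dateandtime pattern comes from the slide.

```python
from datetime import datetime

# The three collection stages named on the slide.
STAGES = ("scratch", "local-archive", "remote-archive")


def archive_path(stage, institute, project, user, when):
    """Build /bbsrc/<institute>/<stage>/<project>/<year>/<user>/<dateandtime>."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    # The exact dateandtime format is not given in the slides; this one is assumed.
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return f"/bbsrc/{institute}/{stage}/{project}/{when.year}/{user}/{stamp}"


ingest_time = datetime(2007, 5, 30, 12, 0, 0)
scratch = archive_path("scratch", "inst", "proj", "alice", ingest_time)
archived = archive_path("local-archive", "inst", "proj", "alice", ingest_time)
```

Because only the stage component changes, the move from scratch to local-archive is a purely logical rename and no data needs to be copied.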
6
SRB BaBar architecture: 2 zones, SLAC (Stanford, CA) and
CC-IN2P3 (Lyon)
7
Motivation for iRods (cont)
  • Implementing all these features becomes
    unmanageable
  • Sput [-fprabvsmMkKV] [-c container] [-D dataType]
    [-n replNum] [-N numThreads] [-S resourceName]
    [-P pathName] [-R retry_count]
    localFileName|localDirectory ... TargetName
  • Need a more flexible way to configure the system
  • More power and flexibility for users
  • User defined workflow to be executed on the
    server
  • SRB code requires major refactoring
  • Rewrite and make it open source

8
iRods features
  • Most SRB data grid features
  • Global logical name space, global user name space
  • Fine-grained access control
  • Federated, heterogeneous resources
  • Zone federation across organizations and admin
    domains
  • Data replication and synchronization
  • Parallel I/O
  • User defined metadata

9
iRods Features (cont)
  • Improvements
  • Total rewrite from scratch
  • New, more flexible and efficient protocol
      • Client/server, server/server
      • 2 modes
      • Native (binary): more efficient
      • XML: easier for developers in other languages
        (PHP and Java)
  • Reduce the number of message exchanges
      • Put/get/replicate/copy of small files
      • SRB: 3 messages (create, write, close)
      • iRods: one message (data included in the request)
  • Reduce the number of tables in the metadata catalog
      • Reduce the number of joins
      • SRB: over 100 tables
      • iRods: a few large tables
  • Small-file upload/download: a factor of 3-4
    improvement
  • Restart capability: restart file
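
The message-count reduction for small files can be made concrete with a toy model. This is a sketch, not the actual wire protocol: the function names, message tuples, and the fixed per-message latency are all assumptions. It only shows why sending the data inside the request pays off when round trips dominate the transfer time.

```python
def three_message_put(data):
    """Small-file upload as three separate requests (create, write, close)."""
    return [("create", b""), ("write", data), ("close", b"")]


def single_message_put(data):
    """Small-file upload with the data carried inside the one request."""
    return [("put", data)]


def round_trip_cost_ms(messages, latency_ms):
    # For small payloads, network latency per request dominates the cost,
    # so the total is roughly proportional to the number of messages.
    return len(messages) * latency_ms


payload = b"x" * 512
old_cost = round_trip_cost_ms(three_message_put(payload), latency_ms=50)
new_cost = round_trip_cost_ms(single_message_put(payload), latency_ms=50)
```

Under this simplified model the single-message form is three times cheaper, which is in line with the factor of 3-4 quoted for small files.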

10
iRods Rules and Workflow System: Target
Applications
  • Target applications include
  • Data grids for sharing data
  • Distributed workflow.
  • Persistent archival, data preservation
  • Real-time sensor data collections
  • Large scale data analysis

11
iRods rule and workflow system
  • Two basic levels
  • System level: used by system administrators
  • Automatic execution of data management policies
  • Data integrity
      • Validation of checksums
      • Replication and synchronization of replicas
  • Data distribution and archival
      • Automatic caching (staging)
      • Replication of data to remote sites
      • Migration to archival resource
      • Purging replicas (cached copies)
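
As an illustration of the data integrity policies above, the following sketch validates replicas against a registered checksum and resynchronizes the bad ones. It is a toy model only: the function names are hypothetical, replicas are held in a dictionary rather than on storage resources, and SHA-256 is an assumed choice since the slides do not name a checksum algorithm.

```python
import hashlib


def checksum(data):
    # SHA-256 is assumed here; the slides do not say which algorithm is used.
    return hashlib.sha256(data).hexdigest()


def find_bad_replicas(expected, replicas):
    """Return the resource names whose copy does not match the checksum."""
    return sorted(name for name, data in replicas.items()
                  if checksum(data) != expected)


def resynchronize(expected, replicas):
    """Overwrite every bad replica with the content of an intact one."""
    good = next((data for data in replicas.values()
                 if checksum(data) == expected), None)
    if good is None:
        raise ValueError("no intact replica left to copy from")
    for name in find_bad_replicas(expected, replicas):
        replicas[name] = good
    return replicas


registered = checksum(b"payload")
copies = {"diskA": b"payload", "tapeB": b"payl0ad"}  # tapeB is corrupted
damaged = find_bad_replicas(registered, copies)
resynchronize(registered, copies)
```

A system-level rule would run this kind of check automatically, for example on a schedule, rather than waiting for a user to notice a damaged copy.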

12
iRods rule and workflow system (cont)
  • System level
  • Other data management policies
      • Data ingestion: pre-processing, post-processing
      • Resource selection for upload
      • Copy selection for download
      • Data retention and deletion policy
      • Access controls: foreign zone user, public user
      • Generation of archival packages: metadata,
        data bundle (zip, tar)
  • Other administrative management policies
      • Data transport tuning: parallel I/O, number of
        streams
      • Audit trails

13
iRods rule and workflow system (cont)
  • User-level workflow system
  • Execution of user-designed workflows
  • Request the server to perform a series of
    micro-services with a single call
  • Micro-services are predefined functions which can
    be called by the workflow scripting language
  • Most iRods APIs have been converted to
    micro-services
  • Depends on the user community for contribution to
    the micro-service library.
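
The idea of asking the server to run a series of micro-services with a single call can be sketched as a registry of named functions plus a runner. Everything here is hypothetical (the decorator, the two example micro-services, and the shared state dictionary); it is a minimal model of the concept, not the iRODS micro-service interface.

```python
import hashlib

# Library of predefined functions that a workflow may name.
MICRO_SERVICES = {}


def micro_service(func):
    """Register a function under its own name so workflows can call it."""
    MICRO_SERVICES[func.__name__] = func
    return func


@micro_service
def to_upper(state):
    state["data"] = state["data"].upper()


@micro_service
def add_checksum(state):
    state["checksum"] = hashlib.sha256(state["data"]).hexdigest()


def run_workflow(step_names, state):
    """Server side: run every named micro-service in order on shared state."""
    for name in step_names:
        MICRO_SERVICES[name](state)
    return state


# One request from the client carries the whole list of steps.
result = run_workflow(["to_upper", "add_checksum"], {"data": b"hello"})
```

Contributing a new micro-service then amounts to registering one more function, which is why the library can grow with contributions from the user community.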

14
iRODS - integrated Rule-Oriented Data System
[Architecture diagram. Components shown: Client Interface, Admin
Interface, Rule Invoker, Service Manager, Rule Engine, Rule Base, Rule
Modifier Module, Config Modifier Module, Metadata Modifier Module,
Consistency Check Module (three instances), Micro Service Modules,
Resource-based Services, Metadata-based Services, Resources, Current
State, Confs, Metadata Persistent Repository.]
15
Rules and WorkFlow implementation
  • Two interfaces to the rule engine
  • Logic programming interface
      • Cryptic
      • Used mostly for system-level rules
  • Scripting language interface
      • Programming-language-like
      • Supports conditions (if/else) and loops (while)
      • Internally translated to logic programming rules

16
Rule - Logic programming interface
  • Rule composed of four parts: name, condition,
    micro-service set, recovery set
  • Postprocessing rule example (file replication):
    acPostProcForPut objPath like
    /tempZone/home/rods/nvo/
    msiSysReplDataObj(nvoReplResc,null) nop
  • Preprocessing rule example (file staging):
    acPreprocForDataObjOpen objPath like
    /tempZone/home/rods/birn/
    msiStageDataObj(demoResc8)nopnop
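
The four-part rule structure can be sketched as a small parser and condition check. This is an illustration under loud assumptions: the "|" separator between parts, the "##" separator between steps, the "*" wildcard, and the demoZone path are not taken from the slide text and are chosen only for the example. The rule and micro-service names are the ones shown above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    name: str
    condition: str
    actions: list   # micro-service set, run in order
    recovery: list  # recovery set, one entry per action


def parse_rule(line, part_sep="|", step_sep="##"):
    """Split a rule into name, condition, micro-service set, recovery set.

    Both separators are assumptions made for this sketch.
    """
    parts = [part.strip() for part in line.split(part_sep)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}")
    name, condition, actions, recovery = parts
    return Rule(name, condition, actions.split(step_sep), recovery.split(step_sep))


def condition_holds(condition, obj_path):
    """Evaluate a condition of the form 'objPath like <pattern>'."""
    if not condition:
        return True  # an empty condition always applies
    field_name, operator, pattern = condition.split(None, 2)
    if field_name != "objPath" or operator != "like":
        raise ValueError(f"unsupported condition: {condition}")
    return fnmatchcase(obj_path, pattern)


rule = parse_rule(
    "acPostProcForPut|objPath like /demoZone/home/alice/nvo/*"
    "|msiSysReplDataObj(demoResc,null)|nop"
)
```

With this rule loaded, condition_holds(rule.condition, path) decides whether the replication micro-service fires for a newly uploaded object.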

17
Rule Scripting Language interface
  • Easier to use. Mostly for user level workflow
  • Work in progress
  • Example 1:
      replFileSet(condition,resourceName)
        acGetIcatResults("replicate", condition, result)
            /* queries iCAT for dataNames that met the condition */
        foreach (result)   /* for each tuple in the result */
          acGetDataName(result,dataName)
              /* get the dataName from the result tuple */
          msiDataObjRepl(dataName, resourceName, stat1)
          writeLine(stdout,"Replication failed for dataName with stat1")
              /* denotes recovery operation. In this case, an
                 error message is written */
          writeLine(stdout,"Replicated dataName to resource resourceName with status stat2")
        writeLine(stdout,"Replication Finished Successfully for condition")
  • Condition COLL_NAME '/tempZone/home/rods
  • or
  • Condition DATA_TYPE 'DICOM'
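
The control flow of Example 1 (loop over query results, replicate each item, run a recovery operation when a step fails) can be mirrored in a short sketch. It is not the rule scripting language: repl_file_set, fake_replicate, and the use of an exception to signal failure are hypothetical stand-ins, and the file and resource names are made up.

```python
def repl_file_set(data_names, resource_name, replicate, log):
    """Replicate each data object; on failure run the recovery step (a log line)."""
    failures = 0
    for data_name in data_names:
        try:
            status = replicate(data_name, resource_name)
        except OSError as err:
            # Recovery operation: here it only reports the error.
            log(f"Replication failed for {data_name} with {err}")
            failures += 1
            continue
        log(f"Replicated {data_name} to resource {resource_name} with status {status}")
    if failures == 0:
        log("Replication finished successfully")
    return failures


def fake_replicate(data_name, resource_name):
    """Stand-in for a replication micro-service; fails for names ending in .bad."""
    if data_name.endswith(".bad"):
        raise OSError("simulated transfer error")
    return 0


messages = []
failed = repl_file_set(["a.dcm", "b.bad"], "demoResc", fake_replicate, messages.append)
```

Keeping the recovery step next to the action it protects is what lets one failed replication be reported without aborting the rest of the loop.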

18
Rule Scripting Language interface: Example 2
apiTestWorkflow (InFile, OutFile1, OutFile2)
    msiDataObjOpen(InFile,S_FD)
    msiDataObjCreate(OutFile1,"null",D_FD)
    msiDataObjLseek(S_FD,10,SEEK_SET,Stat1)
    msiDataObjRead(S_FD,10000,R_BUF)
    msiDataObjWrite(D_FD,R_BUF,W_LEN)
    msiDataObjClose(S_FD,Stat2)
    msiDataObjClose(D_FD,Stat3)
    msiDataObjCopy(OutFile1,OutFile2,null,Stat4)
    delay ("<PLUSET>1m</PLUSET>")
    msiDataObjRepl(OutFile2,demoResc8,Stat5)
    msiDataObjUnlink(OutFile1,Stat6)
    writeParams(stdout,"R_BUF,W_LEN")
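
The open, seek, read, write, close sequence at the start of Example 2 has a direct analogue on ordinary local files, sketched below. This runs against the local file system, not against a data grid, and copy_slice is a hypothetical helper; the offset of 10 and read length of 10000 are the values used in the workflow above.

```python
import os
import tempfile


def copy_slice(src_path, dst_path, offset=10, length=10000):
    """Seek to offset in the source, read length bytes, write them to dst."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset, os.SEEK_SET)
        buffer = src.read(length)
        return dst.write(buffer)  # number of bytes written


with tempfile.TemporaryDirectory() as workdir:
    in_file = os.path.join(workdir, "in.dat")
    out_file = os.path.join(workdir, "out.dat")
    with open(in_file, "wb") as handle:
        handle.write(bytes(range(256)) * 100)  # 25600 bytes of test data
    written = copy_slice(in_file, out_file)
    with open(out_file, "rb") as handle:
        copied = handle.read()
```

In the workflow the same steps run on the server side in one request, so the intermediate buffer never has to travel to the client.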
19
Rules and micro-services implemented
  • Over 20 system-level rules
      • Administrative
      • Storage resource selection
      • Data pre-processing
      • Data post-processing
      • Data deletion
      • Parallel I/O
  • Over 20 user-level micro-services
      • Operations on data: checksum, replicate, open,
        read, write
      • Metadata extraction

20
Rule and Workflow system
  • Rule exec daemon
      • Executes rules, workflows and micro-services in
        the background
  • The delay function
      • Causes the rule execution environment to be
        checkpointed and saved
      • Job submission by making an entry in the Job
        table in the DB
      • Rule exec daemon checks the Job table for jobs to
        execute
  • Time of execution
      • Delayed by a certain time
      • At a certain time
      • Frequency
  • iqstat command: check status
  • iqdel command: delete a job from the queue
  • Job scheduling and remote execution: future
    work
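
The delayed-execution mechanism described above (a job table that a background daemon polls) can be sketched with an in-memory queue. This is a toy model under assumptions: JobTable and its methods are hypothetical, times are plain numbers of seconds, and a real system would persist the table in the database instead of a heap. The status and delete methods only play the roles that iqstat and iqdel have on the command line.

```python
import heapq
import itertools


class JobTable:
    """In-memory stand-in for the job table polled by the rule exec daemon."""

    def __init__(self):
        self._heap = []                  # entries: (run_at, job_id, action, repeat_every)
        self._ids = itertools.count(1)
        self._deleted = set()

    def submit(self, run_at, action, repeat_every=None):
        """Queue an action for time run_at, optionally repeating at a fixed period."""
        if repeat_every is not None and repeat_every <= 0:
            raise ValueError("repeat_every must be positive")
        job_id = next(self._ids)
        heapq.heappush(self._heap, (run_at, job_id, action, repeat_every))
        return job_id

    def status(self):
        """List pending jobs as (run_at, job_id), earliest first."""
        return sorted((run_at, job_id) for run_at, job_id, _, _ in self._heap
                      if job_id not in self._deleted)

    def delete(self, job_id):
        """Mark a job as removed; it is skipped when its turn comes."""
        self._deleted.add(job_id)

    def run_due(self, now):
        """One polling pass of the daemon: run every job whose time has come."""
        ran = []
        while self._heap and self._heap[0][0] <= now:
            run_at, job_id, action, repeat_every = heapq.heappop(self._heap)
            if job_id in self._deleted:
                continue
            action()
            ran.append(job_id)
            if repeat_every is not None:
                # Periodic job: schedule the next occurrence.
                heapq.heappush(self._heap,
                               (run_at + repeat_every, job_id, action, repeat_every))
        return ran
```

A daemon loop would call run_due with the current time at regular intervals; the three timing options on the slide map to an absolute run_at, a run_at computed as now plus a delay, and a repeat_every period.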

21
iRods Status
  • Version 0.5 released Dec 21, 2006
  • Version 0.9 released May 30, 2007
      • Contains sufficient features to be deployed as a
        data grid
      • 90,000 lines of C code
      • Server, client C library and iCommands
  • Version 1.0 scheduled for fall 2007
      • Web interface: PHP/JavaScript
      • Java classes: Jargon
      • Oracle iCAT
      • Zone federation
  • Open source: BSD license

22
More Information
  • Michael Wan
  • mwan_at_sdsc.edu
  • http://irods.sdsc.edu