Title: iRODS Prototype Update NCCS Advanced Technology Team
1. iRODS Prototype Update
NCCS Advanced Technology Team
2. Change Log
Version | Date | Author | Change
1.0 | 2 March 2009 | Hoot Thompson |
1.1 | 2 March 2009 | Daniel Duffy | Changed background NCCS architecture tie-in and concept of operations.
1.2 | 3 March 2009 | Hoot Thompson | General clean-up
1.3 | 10 March 2009 | Hoot Thompson | Added security-related information
1.4 | | |
3. Outline
- What is iRODS?
- iRODS Commands
- Rules and Micro-services
- NCCS Prototype
- Prototype Tests
- Web Browser and HDF5 Viewer
- NCCS Architecture and Data Management
- What Next
- Backup Slides
- Additional iRODS Information
- Performance Testing
4. What Is iRODS
- Integrated Rule-Oriented Data System
- Data grid software system developed by the Data Intensive Cyber Environments (DICE) group (developers of the SRB, the Storage Resource Broker), and collaborators.
- Or it is everything and/or nothing
5. Basic iRODS Components
[Diagram labels: iRODS Installation(s), Federation, icommands, guis/apis, admin(s), user(s), icat (Metadata), Collection(s), resource(s)]
6. icommands - Unix Like
- iinit Initialize: store your password in a scrambled form for automatic use by other icommands.
- iput Store a file
- iget Get a file
- imkdir Like mkdir, make an iRODS collection (similar to a directory or Windows folder)
- ichmod Like chmod, allow (or later restrict) access to your data objects by other users.
- icp Like cp or rcp, copy an iRODS data object
- irm Like rm, remove an iRODS data object
- ils Like ls, list iRODS data objects (files) and collections (directories)
- ipwd Like pwd, print the iRODS current working directory
- icd Like cd, change the iRODS current working directory
- irepl Replicate data objects.
- iexit Logout (use 'iexit full' to remove your scrambled password from the disk)
- ipasswd Change your iRODS password.
- ichksum Checksum one or more data objects or collections from iRODS space.
- imv Move/rename an iRODS data object or collection.
- iphymv Physically move files in iRODS to another storage resource.
- ireg Register a file or a directory of files and subdirectories into iRODS.
- irmtrash Remove one or more data objects or collections from an iRODS trash bin.
- irsync Synchronize the data between a local copy and the copy stored in iRODS, or between two iRODS copies.
7. icommands - Metadata
- imeta Add, remove, list, or query user-defined Attribute-Value-Unit (AVU) triplet metadata
- isysmeta Show or modify system metadata
- iquest Query (pose a question to) the iCAT, via an SQL-like interface
8. icommands - Informational
- ienv Show current iRODS environment
- ilsresc List resources
- iuserinfo List users
- imiscsvrinfo Get basic server information; test communication
- irule Submit a user-defined rule to be executed by an iRODS server.
- iqstat Show pending iRODS rule executions.
- iqdel Remove delayed rules from the queue.
- iqmod Modify delayed rules in the queue.
9. Rules
- The Rule Engine is a critical and fundamental component of the iRODS system, and is involved in many iRODS operations.
- The core set of rules is defined in the "core.irb" text file in the release.
- The names that begin with "msi" in the rules are Micro-Service Interface routines. These are C functions that the Rules call and that may then call other iRODS functions.
- Rules format
  - actionDef|condition|workflow-chain|recovery-chain
- Example
  - acCreateUser||msiCreateUser##acCreateDefaultCollections##msiCommit|msiRollback##msiRollback##nop
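A minimal Python sketch of how a rule line in this format could be split into its four parts. It assumes the core.irb convention that `|` separates the four fields and `##` separates the steps within a chain. `Rule` and `parse_rule` are hypothetical names used only for illustration and are not part of iRODS, and micro-service arguments that themselves contain `|` or `##` are not handled.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    action: str          # actionDef
    condition: str       # empty string means "always applies"
    workflow: List[str]  # micro-services/actions run in order
    recovery: List[str]  # matching undo step for each workflow step


def parse_rule(line: str) -> Rule:
    """Split 'actionDef|condition|workflow-chain|recovery-chain' into parts."""
    fields = line.strip().split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}")
    action, condition, workflow, recovery = fields
    return Rule(action, condition, workflow.split("##"), recovery.split("##"))


rule = parse_rule(
    "acCreateUser||msiCreateUser##acCreateDefaultCollections##msiCommit"
    "|msiRollback##msiRollback##nop"
)
# Pair each workflow step with the recovery step at the same position.
for step, undo in zip(rule.workflow, rule.recovery):
    print(f"{step} (on failure: {undo})")
```

Pairing the two chains by position makes it easy to see which recovery step belongs to which workflow step.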
10. Micro-service
- Small, well-defined procedures/functions that perform a certain task.
- Developed and made available by system programmers and application programmers and compiled into the iRODS server code.
- Users and administrators can chain these micro-services to implement a larger macro-level functionality (actions) that they want to use or provide for others.
11. Adding a Micro-service
- Develop a module: a collection of specialized micro-services
  - Conform to directory structure
  - Write micro-services C code (hdf5 example printout)
- Enable module
- Make module
- Rebuild action tables
12. msiDataObjRepl Micro-service Example
/**
 * \fn msiDataObjRepl
 * \module core
 * \author Mike Wan
 * \date 2007
 * \brief replicate an existing data object
 * \param[in] STR_MS_T or DataObjInp_MS_T dataObjName - Path name of data object
 * \param[in] STR_MS_T rsrcName - optional
 * \param[out] INT_MS_T status - status of the operation
 * \DolVarDependence none
 * \DolVarModified none
 * \iCatAttrDependence none
 * \iCatAttrModified none
 * \sideeffect none
 * \return integer
 * \retval 0 on success
 * \bug no known bugs
 */
13. iRODS Prototype
14. iput
[Diagram labels: Client, iput, data, resource (Data, /<filesystem>), metadata, icat (Metadata)]
iput -R <resource> </path/filename>
15. iput With Replicate
[Diagram labels: Client, iput, data, Resource 1 (Data, /<filesystem>), Resource 2 (Data), metadata, icat (Metadata); note: Rule added to core.irb]
16. ils Showing Multiple Copies
kirk@client1nccs> ils -L
/archivenccsZone/home/kirk:
  kirk  0 client1nccsResc  0 2009-02-27.13:11 file_1
        /tms/home/kirk/file_1
  kirk  1 archivenccsResc  0 2009-02-27.13:12 file_1
        /home/archivenccs/iRODS/Vault/home/kirk/file_1
  kirk  0 client1nccsResc  0 2009-02-27.13:11 file_2
        /tms/home/kirk/file_2
  kirk  1 archivenccsResc  0 2009-02-27.13:13 file_2
        /home/archivenccs/iRODS/Vault/home/kirk/file_2
  kirk  0 archivenccsResc  0 2009-02-27.13:11 file_3
        /home/archivenccs/iRODS/Vault/home/kirk/file_3
  kirk  1 client1nccsResc  0 2009-02-27.13:13 file_3
        /tms/home/kirk/file_3
  kirk  0 archivenccsResc  0 2009-02-27.13:11 file_4
        /home/archivenccs/iRODS/Vault/home/kirk/file_4
  kirk  1 client1nccsResc  0 2009-02-27.13:13 file_4
        /tms/home/kirk/file_4
17. ireg
[Diagram labels: client, resource (Data, /<filesystem>), metadata, icat (Metadata)]
ireg -R <resource> </path/filename> </irods/full/path>
18. ireg With Replicate
[Diagram labels: client, Resource 1 (Data, /<filesystem>), Resource 2 (Data), data, metadata, icat (Metadata); note: Rule added to core.irb]
19. ireg With Replicate Shared File System
[Diagram labels: Client, Client N/Resource 1 (Data, /<filesystem>), Resource 2 (Data), data, metadata, icat (Metadata)]
20. iget
[Diagram labels: client, iget, data, resource (Data, /<filesystem>), metadata, icat (Metadata)]
iget -R <resource> </path/filename>
21. iget Replication Number
[Diagram labels: client, Resource 1 (Data, /<filesystem>), Resource 2 (Data), data, metadata, icat (Metadata)]
iget -n
22. isysmeta
hoot@leftknee src$ isysmeta -l ls hdf5_test.h5
doing ls of /leftkneeZone/home/leftknee/hdf5_test.h5
data_name: hdf5_test.h5
data_id: 10012
coll_id: 10008
data_repl_num: 0
data_version:
data_type_name: generic
data_size: 1782027
resc_group_name:
resc_name: leftkneeResc
data_path: /home/hoot/irods/iRODS/Vault/home/leftknee/hdf5_test.h5
data_owner_name: leftknee
data_owner_zone: leftkneeZone
data_repl_status: 1
data_status:
data_checksum:
data_expiry_ts (expire time): None
data_map_id: 0
r_comment:
create_ts: 01235592554: 2009-02-25.15:09:14
modify_ts: 01235592554: 2009-02-25.15:09:14
23. imeta Attribute Value Units
hoot@leftknee src$ imeta ls -d hdf5_test.h5
AVUs defined for dataObj hdf5_test.h5:
None
hoot@leftknee src$ imeta add -d hdf5_test.h5 length 10 meters
hoot@leftknee src$ imeta ls -d hdf5_test.h5
AVUs defined for dataObj hdf5_test.h5:
attribute: length
value: 10
units: meters
hoot@leftknee src$ imeta add -d hdf5_test.h5 weight 213 kilograms
hoot@leftknee src$ imeta ls -d hdf5_test.h5
AVUs defined for dataObj hdf5_test.h5:
attribute: length
value: 10
units: meters
----
attribute: weight
value: 213
units: kilograms
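The imeta session above can be pictured as a list of (attribute, value, units) triples attached to each data object. The following is an in-memory Python sketch of that idea only. `AVU` and `AvuStore` are hypothetical names, and nothing here talks to the iCAT or mirrors its schema.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class AVU(NamedTuple):
    attribute: str
    value: str
    units: str = ""


class AvuStore:
    """Toy stand-in for a metadata catalog: data object name -> list of AVUs."""

    def __init__(self) -> None:
        self._avus: Dict[str, List[AVU]] = defaultdict(list)

    def add(self, data_obj: str, attribute: str, value: str, units: str = "") -> None:
        avu = AVU(attribute, value, units)
        if avu not in self._avus[data_obj]:  # ignore exact duplicates
            self._avus[data_obj].append(avu)

    def ls(self, data_obj: str) -> List[AVU]:
        return list(self._avus.get(data_obj, []))

    def query(self, attribute: str, value: str) -> List[str]:
        """Names of data objects carrying the given attribute/value pair."""
        return sorted(
            name
            for name, avus in self._avus.items()
            if any(a.attribute == attribute and a.value == value for a in avus)
        )


store = AvuStore()
store.add("hdf5_test.h5", "length", "10", "meters")
store.add("hdf5_test.h5", "weight", "213", "kilograms")
for avu in store.ls("hdf5_test.h5"):
    print(f"attribute: {avu.attribute}  value: {avu.value}  units: {avu.units}")
```

Keeping the triples as plain tuples makes attribute/value lookups a simple scan, which is enough to show why user-defined metadata becomes searchable once it is attached.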
24. iRODS Web Browser
25. HDFview iRODS
26. iRODS Explorer For Windows
27. Other iRODS Access Methods
- FUSE
  - File system like interface
  - Tested; caching and performance concerns
- PRODS
  - PHP client API
  - Does not depend on any external library
  - Talks to the iRODS server directly via sockets with the native iRODS XML protocol
- Jargon
  - Pure Java API for developing programs with a data grid interface
  - Currently handles file I/O for local and SRB/iRODS file systems, as well as querying and modifying SRB/iRODS metadata
  - Easily extensible to other file systems.
- WebDAV
  - Access from an iPhone
28. Security
- Default is single authentication: user/password
- Grid Security Infrastructure (GSI) option
  - Globus a prerequisite
  - Based on public key cryptography
29. Passwords
- Challenge/response protocol using an MD5 hash confirms the user has the correct password
  - Routines are derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm
  - Password not sent on the network
- iRODS user passwords are stored in the iCAT database in a scrambled form
- iinit stores the password on disk in a scrambled form
  - Avoids storing plain-text passwords in files
  - Warning: with the source code, the passwords can be descrambled
  - Scrambling algorithm is iRODS-specific and is not high-grade encryption
- Database system (PostgreSQL) passwords are used to control access to the iCAT database
  - Stored in a server configuration file (by the install script), also in a scrambled form
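The challenge/response idea can be sketched as follows. This is a generic illustration using Python's `hashlib`, `hmac`, and `secrets` modules, not the actual iRODS wire protocol. The challenge length, the way the password is combined with the challenge, and all function names are assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def make_challenge(n_bytes: int = 64) -> bytes:
    """Server side: generate a random one-time challenge."""
    return secrets.token_bytes(n_bytes)


def make_response(challenge: bytes, password: str) -> bytes:
    """Client side: hash the challenge together with the password."""
    return hashlib.md5(challenge + password.encode("utf-8")).digest()


def verify(challenge: bytes, response: bytes, stored_password: str) -> bool:
    """Server side: recompute the hash from its own copy of the password."""
    expected = make_response(challenge, stored_password)
    return hmac.compare_digest(expected, response)


challenge = make_challenge()
response = make_response(challenge, "s3cret")  # only this hash crosses the network
print(verify(challenge, response, "s3cret"))   # True
print(verify(challenge, response, "wrong"))    # False
```

Because the challenge is random per login, a captured response cannot simply be replayed later. MD5 appears here only because the slide names it.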
30. Access Permissions - ichmod
- Default: file owner has full control (read, write, or delete)
- As owner, give access to other users or groups: either just read access, or read and write, or full ownership
- If 'own' is given to someone else, they can also give (and remove) access to others.
- Remove access by changing the access to 'null'.
- Multiple paths can be entered on the command line.
- If the entered path is a collection, then the access permissions to that collection will be modified
- Give write access to a user or group so they can store files into one of your collections. Access permissions on collections are not currently displayed via ils
- As normally configured, all users can read all collections
- The inherit/noinherit form sets or clears the inheritance attribute of one or more collections. When collections have this attribute set, new dataObjects and collections added to the collection inherit the access permissions (ACLs) of the collection. 'ils -A' displays ACLs and the inheritance status.
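The access levels and the inheritance attribute described above can be modeled in a few lines. This is a simplified Python sketch, not the iRODS implementation. `Collection`, `add_object`, and `allowed` are hypothetical names, and the strict ordering null < read < write < own is an assumption drawn from the bullet list.

```python
from dataclasses import dataclass, field
from typing import Dict

# Access levels in increasing order of privilege, as named on the slide.
LEVELS = {"null": 0, "read": 1, "write": 2, "own": 3}


@dataclass
class Collection:
    owner: str
    inherit: bool = False
    acl: Dict[str, str] = field(default_factory=dict)
    objects: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.acl.setdefault(self.owner, "own")

    def chmod(self, level: str, user: str) -> None:
        """Grant a level to a user; 'null' removes the entry."""
        if level not in LEVELS:
            raise ValueError(f"unknown access level: {level}")
        if level == "null":
            self.acl.pop(user, None)
        else:
            self.acl[user] = level

    def add_object(self, name: str, creator: str) -> None:
        """New objects copy the collection ACL only when inheritance is set."""
        acl = dict(self.acl) if self.inherit else {}
        acl[creator] = "own"
        self.objects[name] = acl


def allowed(acl: Dict[str, str], user: str, needed: str) -> bool:
    return LEVELS[acl.get(user, "null")] >= LEVELS[needed]


home = Collection(owner="hoot")
home.chmod("write", "red")
home.add_object("file1", "hoot")  # inheritance off: only the creator has access
home.inherit = True
home.add_object("file2", "hoot")  # inheritance on: collection ACL is copied
print(allowed(home.objects["file1"], "red", "read"))   # False
print(allowed(home.objects["file2"], "red", "write"))  # True
```

In this model, setting the inheritance attribute affects only objects added afterwards, which matches the wording "new dataObjects and collections added to the collection".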
31. Group ichmod Example
archivenccs@archivenccs/test> ils -A
/archivenccsZone/home/hoot:
        ACL - hoot#archivenccsZone:own
        Inheritance - Disabled
  file1
        ACL - hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own
  file3
        ACL - hoot#archivenccsZone:own
ichmod read blue file1
ichmod write red file2
ichmod own rodsadmin file3
archivenccs@archivenccs/test> ils -A
/archivenccsZone/home/hoot:
        ACL - hoot#archivenccsZone:own
        Inheritance - Disabled
  file1
        ACL - blue#archivenccsZone:read object  hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own  red#archivenccsZone:modify object
  file3
        ACL - hoot#archivenccsZone:own  rodsadmin#archivenccsZone:own
32. Collection ichmod Example
ichmod own rodsadmin /archivenccsZone/home/hoot
archivenccs@archivenccs/test> ils -A
/archivenccsZone/home/hoot:
        ACL - george#archivenccsZone:own  hoot#archivenccsZone:own  rodsBoot#archivenccsZone:own
        Inheritance - Disabled
  file1
        ACL - blue#archivenccsZone:read object  hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own  red#archivenccsZone:modify object
  file3
        ACL - hoot#archivenccsZone:own  rodsadmin#archivenccsZone:own
33. Inheritance ichmod Example
ichmod inherit /archivenccsZone/home/hoot
archivenccs@archivenccs/test> ils -A
/archivenccsZone/home/hoot:
        ACL - george#archivenccsZone:own  hoot#archivenccsZone:own  rodsBoot#archivenccsZone:own
        Inheritance - Enabled
  file1
        ACL - blue#archivenccsZone:read object  hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own  red#archivenccsZone:modify object
  file3
        ACL - hoot#archivenccsZone:own  rodsadmin#archivenccsZone:own
34. NCCS Representative Architecture
[Diagram. Legend: Existing, Planned for FY09, Future Plans.
Labels: NCCS LAN (1 GbE and 10 GbE); Login; Data Portal; Analysis; Data Gateways; Data Management; Viz; Existing Discover 65 TF; FY09 Upgrade 40 TF; Future Upgrades TBD; Direct Connect GPFS Nodes; GPFS I/O Nodes; GPFS Disk Subsystems 1.3 PB; ARCHIVE; Disk 300 TB; Tape 8 PB; Internal Services: Management Servers, License Servers, GPFS Management, PBS Servers, Other Services]
35. Representative Architecture
The analysis users also require very fast read access to this data from the NCCS analysis platform.
The modelers require very fast I/O when generating data on the NCCS computational systems.
The generators of the data also want an easy method for sharing data.
The generators of the data also want to store the files into the archive for long-term stewardship and retrieval (if necessary).
[Diagram labels: Analysis Service, Compute Service, Data Portal, ARCHIVE, GPFS Storage Cluster; links marked FAST or SLOW]
36. Competing Requirements
- Capacity and Throughput
  - IPCC, as an example, requires a large amount of data to be kept on disk.
  - The modelers generating the data also need a fast file system to write and subsequently read that data.
  - The analysis users need a fast file system from which to access the large amount of data.
  - All of this lends itself nicely to a global parallel file system (GPFS).
- How do we include data management in this model?
37. Data Management Concept of Operations: Archive Access
[Diagram labels: iRODS iCAT, iRODS Resource, iRODS Clients, ARCHIVE, DISCOVER, GPFS Storage Cluster (FAST). Transfer paths: SLOW (10 MB/sec) NFS, cp, scp; NOT AS SLOW bbftp; A BIT FASTER iput, iget]
- Pros
  - Simple, parallel transfers
  - High throughput for large files (100 MB/sec)
  - Metadata captured
- Cons
  - No file system level interface (Is this a con?)
  - Cannot open a file from the archive (Again, con?)
38. Data Management Concept of Operations: Data Security and Access
- Assume we have a well-defined set of data security and access levels (examples for pedagogical purposes only)
  - Level 0: User only
  - Level 1: User and Project
  - Level 2: User, Project, and Service
  - Level 3: Publicly Accessible
- Users define their data security and access levels using the appropriate process
- When data is put into iRODS by the user under a specific project, it is labeled with the appropriate access level
- All NCCS iRODS-enabled services must then check the access level to see if the service can access the data
- In addition, the user must grant access to the data to the service
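The two checks described above (a sufficient access level, plus an explicit grant from the user) can be sketched in Python. The level names follow the pedagogical example on the slide. `DataObject`, `granted_services`, and `service_may_access` are hypothetical names, and treating Level 2 or higher as "service-accessible" is an assumption.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Set


class AccessLevel(IntEnum):
    USER_ONLY = 0
    USER_AND_PROJECT = 1
    USER_PROJECT_SERVICE = 2
    PUBLIC = 3


@dataclass
class DataObject:
    owner: str
    project: str
    level: AccessLevel
    granted_services: Set[str] = field(default_factory=set)


def service_may_access(obj: DataObject, service: str) -> bool:
    """A service needs a sufficient level AND an explicit grant from the user."""
    level_ok = obj.level >= AccessLevel.USER_PROJECT_SERVICE
    return level_ok and service in obj.granted_services


private = DataObject("hoot", "ipcc", AccessLevel.USER_AND_PROJECT, {"portal"})
shared = DataObject("hoot", "ipcc", AccessLevel.USER_PROJECT_SERVICE, {"portal"})
print(service_may_access(private, "portal"))  # False: level too restrictive
print(service_may_access(shared, "portal"))   # True: level and grant both present
print(service_may_access(shared, "viz"))      # False: no grant for this service
```

Keeping the level check and the grant check separate mirrors the slide's two distinct requirements, so either one can be tightened without touching the other.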
39. Data Management Concept of Operations for IPCC Data
Step 1: Modelers generate large amounts of data and store it into GPFS (very fast).
Step 2: Modelers register the data sets into iRODS.
Step 3: Automatic rules kick in to do the following:
  A. Automatically extract and publish metadata into a database.
  B. Make a copy of the file into the NCCS archive.
Analysis users still have very fast (GPFS) file system access to the data.
IPCC data is presented to the data portal either by NFS or by the iRODS interface.
[Diagram labels: Analysis Service, Compute Service, Data Portal, iRODS iCAT, ARCHIVE, GPFS Storage Cluster; links marked FAST or SLOW]
40. Data Management Concept of Operations: More Implementation Details
Services on the data portal would have interfaces into iRODS. Could even have a local iRODS resource for caching data.
Archive accessible via iRODS; still use DMF.
Dedicated nodes would be a combination of GPFS clients and iRODS resources.
[Diagram labels: iRODS iCAT, Data Portal, iRODS Clients, iRODS Resource, iRODS Resource Nodes, ARCHIVE, DISCOVER, GPFS Clients, GPFS Storage Cluster (FAST)]
41. Pros and Cons
- Pros
  - Very easy for users; they can register whatever they want.
  - NCCS-specific micro-services can be set up to automatically copy files to the archive
  - Maintains the fast access to the data for both modelers and analysis users
  - Multi-stream throughput seems to work very well.
- Cons
  - No file system level access to iRODS (could be a pro)
  - No link between data in GPFS and iRODS
    - Data changed with iRODS or GPFS will not be reflected in the other
    - Required to resynchronize the data every so often
  - Data within iRODS not accessible via a file system interface.
42. Data Portal Services Architecture
Connectivity to the Goddard DISC and DISC SW.
Interfaces to ESG and PCMDI for model data (IPCC AR5).
Sufficient compute capability for some amount of analysis.
Local disk will allow for a relatively small amount of data to be cached in the portal.
Reach-back capability into the much larger disk environment within the NCCS GPFS and Archive.
Users will not have to move or copy data in order to make it available to the portal services.
[Diagram labels: NASA, ESG, PCMDI, Other, Data Portal, Local Disk, NFS, iRODS, GPFS MC]
43. Concerns
- Integration with ESG
- Database design, implementation and number
- iRODS security model versus NASA/NCCS policies
  - Simple single authentication
  - GSI: Grid Security Infrastructure
- Difficulty of developing module/micro-service
  - Try "get best copy" as an example
- iput and iget bandwidth discrepancy with delay injected remains unresolved
  - Continuing to explore this in the prototype
- Little to no services built on top of metadata
  - Expansive, detailed metadata will have to be scripted
44. Back-up Slides
45. Installation
- Automated install script
  - Set of preinstall queries
  - Downloads and installs all components
    - postgres (can use Oracle, etc.)
    - unixodbc
46. icommands - Administration
- iadmin Administration commands: add/remove/modify users, resources, etc. Commands are:
  - lu [name[#Zone]] (list user info; details if name entered)
  - lt [name] [subname] (list token info)
  - lr [name] (list resource info)
  - ls [name] (list directory: subdirs and files)
  - lz [name] (list zone info)
  - lg [name] (list group info (user member list))
  - lgd name (list group details)
  - lrg [name] (list resource group info)
  - lf DataId (list file details; DataId is the number (from ls))
  - mkuser Name[#Zone] Type [DN] (make user)
  - moduser Name[#Zone] [type | zone | DN | comment | info | password] newValue
  - rmuser Name[#Zone] (remove user, where userName: name[@department][#zone])
  - mkdir Name [username] (make directory (collection))
  - rmdir Name (remove directory)
  - mkresc Name Type Class Host [Path] (make Resource)
  - modresc Name [type, class, host, path, comment, info, freespace] Value (mod Resc)
  - rmresc Name (remove resource)
  - mkzone Name Type(remote) [Connection-info] [Comment] (make zone)
47. Example icommands
kirk@client1nccs> ienv
NOTICE: Release Version = rods2.0.1, API Version = d
NOTICE: irodsHost=archivenccs
NOTICE: irodsPort=1247
NOTICE: irodsDefResource=archivenccsResc
NOTICE: irodsHome=/archivenccsZone/home/kirk
NOTICE: irodsCwd=/archivenccsZone/home/kirk
NOTICE: irodsUserName=kirk
NOTICE: irodsZone=archivenccsZone
kirk@client1nccs> ils
/archivenccsZone/home/kirk:
  blah
  foo
kirk@client1nccs> ilsresc
archivenccsResc
client1nccsResc
48. Performance Assessment Summary
- Local testing of 1 Gigabit showed wire speeds for iputs and igets
- Artificial distance testing of 1 Gigabit (with two different delay simulators) yielded wire speed on iputs but significantly less on igets (10% of iputs)
  - Repeated dialogue with iRODS personnel, but the discrepancy remains unresolved
- Actual distance testing with ARSC showed acceptable results given 110 msec RTT and an OC-3 pipe
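To put the 110 msec RTT and OC-3 figures in context, the bandwidth-delay product gives the amount of data that must be in flight to keep such a link full. The Python sketch below only works through that arithmetic and does not explain the unresolved iget discrepancy. It assumes the nominal OC-3 line rate of 155.52 Mbit/s (usable payload rate is lower), and the 64 KiB window is a hypothetical example value, not a measured iRODS or TCP setting from the tests.

```python
OC3_BITS_PER_SEC = 155.52e6  # nominal OC-3 line rate (assumption: no overhead)
RTT_SEC = 0.110              # round-trip time quoted on the slide


def bdp_bytes(bits_per_sec: float, rtt_sec: float) -> float:
    """Bytes that must be in flight to keep the pipe full."""
    return bits_per_sec * rtt_sec / 8


def window_limited_rate(window_bytes: float, rtt_sec: float) -> float:
    """Upper bound in bit/s for one stream sending one window per RTT."""
    return window_bytes * 8 / rtt_sec


bdp = bdp_bytes(OC3_BITS_PER_SEC, RTT_SEC)
print(f"Bandwidth-delay product: {bdp / 1e6:.2f} MB")

rate = window_limited_rate(64 * 1024, RTT_SEC)  # hypothetical 64 KiB window
print(f"Single stream, 64 KiB window: {rate / 1e6:.2f} Mbit/s")
```

Under these assumptions a single stream with a small window would use only a few percent of the link, which is one common reason long-distance transfers rely on larger windows or multiple parallel streams.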
49. Example Rule core.irb
# 6) acPostProcForFilePathReg - Rule for post processing the registration
# of a physical file path (e.g. - ireg command).
#
# Currently, three post processing functions can be used individually or
# in sequence by these rules.
# msiExtractNaraMetadata - extract and register metadata from the just
# uploaded NARA files.
# msiSysReplDataObj(replResc, allFlag) - can be used to replicate a copy of
# the file just uploaded or copied data object to the specified replResc.
# The allFlag is only meaningful if the replResc is a resource group. In
# this case, setting allFlag to "all" means a copy will be made in all
# the resources in the resource group. A "null" input means a single copy
# will be made in one of the resources in the resource group.
#
# msiSysChksumDataObj - checksum the just uploaded or copied data object.
acPostProcForPut||msiSysChksumDataObj##msiSysReplDataObj(demoResc8,all)|nop##nop
acPostProcForPut||msiSysReplDataObj(demoResc8,all)|nop
acPostProcForPut||msiSysChksumDataObj|nop
acPostProcForPut||delayExec(<A></A>,msiSysReplDataObj(demoResc8,all),nop)|nop
rulegen is a parser that translates rules written in a nicer language into the cryptic one needed by irule and core.irb. The recommended extension for rulegen input files is .r, and the output created by rulegen has the extension .ir. The grammar for the language of the input files is given at the end of this note.
50. Local 1 Gigabit iputs
51. Local 1 Gigabit igets
52. Local 10 Gigabit iputs
53. Local 10 Gigabit igets
54. GSFC to ARSC iputs
55. ARSC to GSFC igets