iRODS Prototype Update NCCS Advanced Technology Team - PowerPoint PPT Presentation

1
iRODS Prototype Update: NCCS Advanced Technology
Team
  • 16 March 2009

2
Change Log
Version  Date           Author         Change
1.0      2 March 2009   Hoot Thompson
1.1      2 March 2009   Daniel Duffy   Changed background NCCS architecture tie in and concept of operations.
1.2      3 March 2009   Hoot Thompson  General Clean-Up
1.3      10 March 2009  Hoot Thompson  Added security related information
1.4
3
Outline
  • What is iRODS?
  • iRODS Commands
  • Rules and Micro-services
  • NCCS Prototype
  • Prototype Tests
  • Web Browser and HDF5 Viewer
  • NCCS Architecture and Data Management
  • What Next
  • Backup Slides
  • Additional iRODS Information
  • Performance Testing

4
What Is iRODS
  • Integrated Rule-Oriented Data System
  • Data grid software system developed by the Data
    Intensive Cyber Environments (DICE) group
    (developers of the SRB, the Storage Resource
    Broker), and collaborators.
  • Or it is everything and/or nothing

5
Basic iRODS Components
[Diagram: one or more iRODS installations linked by federation. Component labels: icommands, GUIs/APIs, admin(s), user(s), iCAT (metadata), collection(s), resource(s).]
6
icommands - Unix-Like
  • iinit Initialize - Store your password in a
    scrambled form for automatic use by other
    icommands.
  • iput Store a file
  • iget Get a file
  • imkdir Like mkdir, make an iRODS collection
    (similar to a directory or Windows folder)
  • ichmod Like chmod, allow (or later restrict)
    access to your data objects by other users.
  • icp Like cp or rcp, copy an iRODS data object
  • irm Like rm, remove an iRODS data object
  • ils Like ls, list iRODS data objects (files) and
    collections (directories)
  • ipwd Like pwd, print the iRODS current working
    directory
  • icd Like cd, change the iRODS current working
    directory
  • irepl Replicate data objects.
  • iexit Logout (use 'iexit full' to remove your
    scrambled password from the disk)
  • ipasswd Change your iRODS password.
  • ichksum Checksum one or more data objects or
    collections in iRODS space.
  • imv Move or rename an iRODS data object or
    collection.
  • iphymv Physically move files in iRODS to another
    storage resource.
  • ireg Register a file, or a directory of files and
    subdirectories, into iRODS.
  • irmtrash Remove one or more data objects or
    collections from an iRODS trash bin.
  • irsync Synchronize the data between a local copy
    and the copy stored in iRODS or between two iRODS
    copies.
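
Because the icommands behave like their Unix counterparts, they are easy to drive from a script. A minimal sketch in Python, assuming the icommands are on the PATH and that iinit has already been run. The helper names icommand and run_icommand, and the file, collection and resource names, are hypothetical and used only for illustration.

```python
import shlex
import subprocess

def icommand(name, *args):
    """Build the argument vector for one icommand (nothing is executed)."""
    return [name, *map(str, args)]

def run_icommand(name, *args):
    """Run an icommand and return its standard output as text.

    Assumes the icommands are installed and iinit has been run.
    """
    result = subprocess.run(icommand(name, *args), check=True,
                            capture_output=True, text=True)
    return result.stdout

# A typical session: make a collection, store a file, replicate it, list it.
# All names below are hypothetical.
session = [
    icommand("imkdir", "results"),
    icommand("iput", "run01.nc", "results/run01.nc"),
    icommand("irepl", "-R", "archiveResc", "results/run01.nc"),
    icommand("ils", "-L", "results"),
]
for argv in session:
    print(shlex.join(argv))
```

The sketch only prints the command lines. Calling run_icommand instead would execute them against whatever iRODS server the current environment points to.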

7
icommands - Metadata
  • imeta Add, remove, list, or query user-defined
    Attribute-Value-Unit triplets metadata
  • isysmeta Show or modify system metadata
  • iquest Query (pose a question to) the ICAT, via a
    SQL-like interface

8
icommands - Informational
  • ienv Show current iRODS environment
  • ilsresc List resources
  • iuserinfo List users
  • imiscsvrinfo Get basic server information; test
    communication
  • irule Submit a user-defined rule to be executed
    by an iRODS server.
  • iqstat Show pending iRODS rule executions.
  • iqdel Removes delayed rules from the queue.
  • iqmod Modifies delayed rules in the queue.

9
Rules
  • The Rule Engine is a critical and fundamental
    component of the iRODS system, and is involved in
    many iRODS operations.
  • The core set of rules is defined in the
    "core.irb" text file in the release.
  • The names that begin with "msi" in the rules are
    Micro-Service Interface routines. These are C
    functions that the rules call and that may then
    call other iRODS functions.
  • Rules format
      • actionDef|condition|workflow-chain|recovery-chain
  • Example
      • acCreateUser||msiCreateUser##acCreateDefaultCollections##msiCommit|msiRollback##msiRollback##nop
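
To make the four-part rule format concrete, here is a minimal parsing sketch in Python. It assumes the layout used by core.irb files of this iRODS generation: the four parts are separated by "|" and the micro-services within a chain by "##". The names Rule, split_chain and parse_rule are hypothetical, and this is not how the iRODS rule engine itself parses rules.

```python
from dataclasses import dataclass

def split_chain(chain):
    """Split a chain on '##', ignoring separators nested inside parentheses."""
    parts, depth, current, i = [], 0, "", 0
    while i < len(chain):
        ch = chain[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth == 0 and chain.startswith("##", i):
            parts.append(current)
            current = ""
            i += 2
            continue
        current += ch
        i += 1
    if current:
        parts.append(current)
    return parts

@dataclass
class Rule:
    action: str
    condition: str
    workflow: list
    recovery: list

def parse_rule(line):
    """Parse 'actionDef|condition|workflow-chain|recovery-chain'."""
    action, condition, workflow, recovery = line.strip().split("|")
    return Rule(action, condition, split_chain(workflow), split_chain(recovery))

rule = parse_rule("acCreateUser||msiCreateUser##acCreateDefaultCollections"
                  "##msiCommit|msiRollback##msiRollback##nop")
print(rule.action, repr(rule.condition))
print(rule.workflow)
print(rule.recovery)
```

The parenthesis tracking is there so that a chain passed as an argument to another micro-service (as with delayExec) is kept as one item rather than split.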

10
Micro-service
  • Small, well-defined procedures/functions that
    perform a certain task.
  • Developed and made available by system
    programmers and application programmers and
    compiled into the iRODS server code.
  • Users and administrators can chain these
    micro-services to implement a larger macro-level
    functionality (actions) that they want to use or
    provide for others.
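
Chaining with recovery can be pictured with a toy model. The sketch below is written in Python for brevity (real micro-services are C functions compiled into the server) and rests on one loudly stated assumption: that the recovery chain pairs with the workflow chain position by position, so a failure at step i triggers the recovery entries for steps i down to 1. The function run_chain and the stand-in services are hypothetical.

```python
def run_chain(workflow, recovery, services):
    """Run micro-services in order; on failure, run recovery steps in reverse.

    Assumption: recovery[i] undoes workflow[i]. Returns (ok, log).
    """
    log = []
    for i, name in enumerate(workflow):
        try:
            services[name]()
            log.append(name)
        except Exception:
            log.append(f"{name} FAILED")
            # Undo the failed step and every step before it, newest first.
            for undo in reversed(recovery[: i + 1]):
                if undo != "nop":
                    services[undo]()
                    log.append(undo)
            return False, log
    return True, log

def fail():
    raise RuntimeError("simulated failure")

# Stand-ins for real micro-services; the second step is made to fail.
services = {
    "msiCreateUser": lambda: None,
    "acCreateDefaultCollections": fail,
    "msiCommit": lambda: None,
    "msiRollback": lambda: None,
}
ok, log = run_chain(
    ["msiCreateUser", "acCreateDefaultCollections", "msiCommit"],
    ["msiRollback", "msiRollback", "nop"],
    services,
)
print(ok, log)
```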

11
Adding a Micro-service
  • Develop a module: a collection of specialized
    micro-services
  • Conform to directory structure
  • Write micro-service C code (hdf5 example
    printout)
  • Enable module
  • Make module
  • Rebuild action tables

12
msiDataObjRepl Micro-service Example
/**
 * \fn msiDataObjRepl
 * \module core
 * \author Mike Wan
 * \date 2007
 * \brief replicate an existing data object
 * \param[in] STR_MS_T or DataObjInp_MS_T dataObjName Path name of data object
 * \param[in] STR_MS_T rsrcName optional
 * \param[out] INT_MS_T status status of the operation
 * \DolVarDependence none
 * \DolVarModified none
 * \iCatAttrDependence none
 * \iCatAttrModified none
 * \sideeffect none
 * \return integer
 * \retval 0 on success
 * \bug no known bugs
 */

13
iRODS Prototype
14
iput
[Diagram: the Client runs iput. Data goes to the resource and is stored as Data under /<filesystem>; metadata goes to the iCAT and is stored as Metadata.]
iput -R <resource> </path/filename>
15
iput With Replicate
[Diagram: the Client runs iput. Data is stored on Resource 1 under /<filesystem>, and a rule added to core.irb also places a copy of the Data on Resource 2. Metadata from both resources is recorded in the iCAT.]
16
ils Showing Multiple Copies
kirk@client1nccs> ils -L
/archivenccsZone/home/kirk:
  kirk  0 client1nccsResc  0 2009-02-27.13:11 file_1
        /tms/home/kirk/file_1
  kirk  1 archivenccsResc  0 2009-02-27.13:12 file_1
        /home/archivenccs/iRODS/Vault/home/kirk/file_1
  kirk  0 client1nccsResc  0 2009-02-27.13:11 file_2
        /tms/home/kirk/file_2
  kirk  1 archivenccsResc  0 2009-02-27.13:13 file_2
        /home/archivenccs/iRODS/Vault/home/kirk/file_2
  kirk  0 archivenccsResc  0 2009-02-27.13:11 file_3
        /home/archivenccs/iRODS/Vault/home/kirk/file_3
  kirk  1 client1nccsResc  0 2009-02-27.13:13 file_3
        /tms/home/kirk/file_3
  kirk  0 archivenccsResc  0 2009-02-27.13:11 file_4
        /home/archivenccs/iRODS/Vault/home/kirk/file_4
  kirk  1 client1nccsResc  0 2009-02-27.13:13 file_4
        /tms/home/kirk/file_4
17
ireg
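[Diagram: the client runs ireg against a file that already exists as Data under /<filesystem> on the resource. Only metadata is sent to the iCAT, where it is stored as Metadata.]
ireg -R <resource> </path/filename> </irods/full/path>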
18
ireg With Replicate
[Diagram: the client registers Data that already exists under /<filesystem> on Resource 1. A rule added to core.irb places a copy of the Data on Resource 2. Metadata from both resources is recorded in the iCAT.]
19
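ireg With Replicate: Shared File System
[Diagram: as in the previous slide, but Resource 1 is also one of the clients (Client N/Resource 1) on a shared file system. Data under /<filesystem> is registered in place and a copy is placed on Resource 2. Metadata from both resources is recorded in the iCAT.]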
20
iget
[Diagram: the client runs iget. Data stored under /<filesystem> on the resource is returned to the client; the metadata is held in the iCAT.]
iget -R <resource> </path/filename>
21
iget Replication Number
[Diagram: Data is held on both Resource 1 (under /<filesystem>) and Resource 2, with metadata for each copy in the iCAT. The client uses iget -n to choose which replica is returned.]
iget -n
22
isysmeta
[hoot@leftknee src]$ isysmeta -l ls hdf5_test.h5
doing ls of /leftkneeZone/home/leftknee/hdf5_test.h5
data_name: hdf5_test.h5
data_id: 10012
coll_id: 10008
data_repl_num: 0
data_version:
data_type_name: generic
data_size: 1782027
resc_group_name:
resc_name: leftkneeResc
data_path: /home/hoot/irods/iRODS/Vault/home/leftknee/hdf5_test.h5
data_owner_name: leftknee
data_owner_zone: leftkneeZone
data_repl_status: 1
data_status:
data_checksum:
data_expiry_ts (expire time): None
data_map_id: 0
r_comment:
create_ts: 01235592554: 2009-02-25.15:09:14
modify_ts: 01235592554: 2009-02-25.15:09:14
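
The create_ts and modify_ts fields pair a zero-padded count of seconds since the Unix epoch (01235592554) with a human-readable local time. A small Python sketch of the conversion follows. The function name parse_irods_ts is hypothetical, and the UTC-5 offset is an assumption about the machine that produced the listing.

```python
from datetime import datetime, timedelta, timezone

def parse_irods_ts(raw):
    """Convert a zero-padded epoch-seconds string to an aware UTC datetime."""
    return datetime.fromtimestamp(int(raw), tz=timezone.utc)

created_utc = parse_irods_ts("01235592554")

# Assumed offset: US Eastern Standard Time (UTC-5).
eastern = timezone(timedelta(hours=-5))

print(created_utc.isoformat())
print(created_utc.astimezone(eastern).strftime("%Y-%m-%d.%H:%M:%S"))
```

With that offset, the second line printed matches the time of day shown next to the raw value in the listing.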
23
imeta - Attribute Value Units
[hoot@leftknee src]$ imeta ls -d hdf5_test.h5
AVUs defined for dataObj hdf5_test.h5:
None
[hoot@leftknee src]$ imeta add -d hdf5_test.h5 length 10 meters
[hoot@leftknee src]$ imeta ls -d hdf5_test.h5
AVUs defined for dataObj hdf5_test.h5:
attribute: length
value: 10
units: meters
[hoot@leftknee src]$ imeta add -d hdf5_test.h5 weight 213 kilograms
[hoot@leftknee src]$ imeta ls -d hdf5_test.h5
AVUs defined for dataObj hdf5_test.h5:
attribute: length
value: 10
units: meters
----
attribute: weight
value: 213
units: kilograms
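
Scripts that consume this listing need the Attribute-Value-Unit triplets back as data. A minimal Python sketch, assuming the output uses "attribute:", "value:" and "units:" lines with "----" between entries. The function name parse_avus is hypothetical, and the sample text is typed in by hand rather than captured from a server.

```python
def parse_avus(text):
    """Parse 'imeta ls' style output into (attribute, value, units) tuples."""
    avus, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "----":                 # separator between AVUs
            if current:
                avus.append(current)
            current = {}
        elif ":" in line:
            key, _, value = line.partition(":")
            if key in ("attribute", "value", "units"):
                current[key] = value.strip()
    if current:
        avus.append(current)
    return [(a["attribute"], a["value"], a.get("units", "")) for a in avus]

sample = """AVUs defined for dataObj hdf5_test.h5:
attribute: length
value: 10
units: meters
----
attribute: weight
value: 213
units: kilograms
"""
print(parse_avus(sample))
```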
24
iRODS Web Browser
25
HDFview iRODS
26
iRODS Explorer For Windows
27
Other iRODS Access Methods
  • FUSE
      • File system like interface
      • Tested; caching and performance concerns
  • PRODS
      • PHP client API
      • Does not depend on any external library
      • Talks to iRODS server directly via sockets with
        native iRODS XML protocol
  • Jargon
      • Pure Java API for developing programs with a data
        grid interface
      • Currently handles file I/O for local and
        SRB/iRODS file systems, as well as querying and
        modifying SRB/iRODS metadata
      • Easily extensible to other file systems.
  • WebDAV
      • Access from an iPhone

28
Security
  • Default is single authentication: user/password
  • Grid Security Infrastructure (GSI) option
      • Globus a prerequisite
      • Based on public key cryptography

29
Passwords
  • Challenge/response protocol using an MD5 hash
    confirms the user has the correct password
  • Routines are derived from the RSA Data Security,
    Inc. MD5 Message-Digest Algorithm
  • Password is not sent on the network
  • iRODS user passwords are stored in the iCAT database
    in a scrambled form
  • iinit stores the password on disk in a scrambled
    form
  • Avoids storing plain-text passwords in files
  • Warning: with the source code, anyone can
    descramble the passwords
  • Scrambling algorithm is iRODS-specific and is not
    high-grade encryption
  • Database system (PostgreSQL) passwords are used to
    control access to the iCAT database
  • Stored in a server configuration file (by the
    install script), also in a scrambled form
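
The challenge/response idea can be shown with a generic sketch in Python. This is not the iRODS wire protocol: the message layout, the challenge length and all function names here are assumptions made for illustration. MD5 appears only because the slide names it.

```python
import hashlib
import hmac
import os

def make_challenge(n_bytes=64):
    """Server side: a fresh random challenge for each login attempt."""
    return os.urandom(n_bytes)

def respond(challenge, password):
    """Client side: hash challenge plus password; only the digest is sent."""
    return hashlib.md5(challenge + password.encode("utf-8")).hexdigest()

def verify(challenge, response, stored_password):
    """Server side: recompute the digest and compare in constant time."""
    expected = respond(challenge, stored_password)
    return hmac.compare_digest(expected, response)

challenge = make_challenge()
good = verify(challenge, respond(challenge, "s3cret"), "s3cret")
bad = verify(challenge, respond(challenge, "wrong"), "s3cret")
print(good, bad)
```

Because the challenge changes on every attempt, a captured response cannot simply be replayed, and the password itself never crosses the network.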

30
Access Permissions - ichmod
  • Default: the file owner has full control (read,
    write or delete)
  • As owner, give access to other users or groups:
    either just read access, or read and write, or
    full ownership
  • If 'own' is given to someone else, they can also
    give (and remove) access to others.
  • Remove access by changing the access to 'null'.
  • Multiple paths can be entered on the command
    line.
  • If the entered path is a collection, then the
    access permissions to that collection will be
    modified
  • Give write access to a user or group so they can
    store files into one of your collections. Access
    permissions on collections are not currently
    displayed via ils
  • As normally configured, all users can read all
    collections
  • The inherit/noinherit form sets or clears the
    inheritance attribute of one or more collections.
    When collections have this attribute set, new
    dataObjects and collections added to the
    collection inherit the access permissions (ACLs)
    of the collection. 'ils -A' displays ACLs and the
    inheritance status.
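
The access levels and the inheritance attribute can be modeled in a few lines. The Python sketch below is a toy in-memory model, not the iCAT implementation: the Collection class, the can helper and the ordering null < read < write < own are assumptions drawn from the bullets above.

```python
LEVELS = {"null": 0, "read": 1, "write": 2, "own": 3}

class Collection:
    def __init__(self, owner):
        self.acl = {owner: "own"}   # the creator starts with full control
        self.inherit = False
        self.objects = {}           # data object name -> ACL dict

    def chmod(self, level, user, name=None):
        """Set a user's access on the collection, or on one of its objects."""
        acl = self.acl if name is None else self.objects[name]
        if level == "null":
            acl.pop(user, None)     # 'null' removes access
        else:
            acl[user] = level

    def put(self, name, owner):
        """Add a data object; copy the collection ACL if inheritance is on."""
        acl = dict(self.acl) if self.inherit else {}
        acl[owner] = "own"
        self.objects[name] = acl

def can(acl, user, needed):
    """True if the user's level in this ACL is at least the needed level."""
    return LEVELS[acl.get(user, "null")] >= LEVELS[needed]

home = Collection("hoot")
home.put("file1", "hoot")
home.chmod("read", "blue", "file1")
home.chmod("write", "red")          # red may now store into the collection
home.inherit = True                 # like 'ichmod inherit <collection>'
home.put("file4", "hoot")
print(home.objects["file1"])
print(home.objects["file4"])
```

Only file4, created after inheritance was switched on, picks up red's write access; file1 keeps the ACL it already had.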

31
Group ichmod Example
archivenccs@archivenccs:/test> ils -A
/archivenccsZone/home/hoot:
        ACL - hoot#archivenccsZone:own
        Inheritance - Disabled
  file1
        ACL - hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own
  file3
        ACL - hoot#archivenccsZone:own

ichmod read blue file1
ichmod write red file2
ichmod own rodsadmin file3

archivenccs@archivenccs:/test> ils -A
/archivenccsZone/home/hoot:
        ACL - hoot#archivenccsZone:own
        Inheritance - Disabled
  file1
        ACL - blue#archivenccsZone:read object  hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own  red#archivenccsZone:modify object
  file3
        ACL - hoot#archivenccsZone:own  rodsadmin#archivenccsZone:own
32
Collection ichmod Example
ichmod own rodsadmin /archivenccsZone/home/hoot

archivenccs@archivenccs:/test> ils -A
/archivenccsZone/home/hoot:
        ACL - george#archivenccsZone:own  hoot#archivenccsZone:own  rodsBoot#archivenccsZone:own
        Inheritance - Disabled
  file1
        ACL - blue#archivenccsZone:read object  hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own  red#archivenccsZone:modify object
  file3
        ACL - hoot#archivenccsZone:own  rodsadmin#archivenccsZone:own
33
Inheritance ichmod Example
ichmod inherit /archivenccsZone/home/hoot

archivenccs@archivenccs:/test> ils -A
/archivenccsZone/home/hoot:
        ACL - george#archivenccsZone:own  hoot#archivenccsZone:own  rodsBoot#archivenccsZone:own
        Inheritance - Enabled
  file1
        ACL - blue#archivenccsZone:read object  hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own  red#archivenccsZone:modify object
  file3
        ACL - hoot#archivenccsZone:own  rodsadmin#archivenccsZone:own
34
NCCS Representative Architecture
[Architecture diagram. Legend: Existing, Planned for FY09, Future Plans.
Network: NCCS LAN (1 GbE and 10 GbE).
Services on the LAN: Login, Data Portal, Analysis, Data Gateways, Data Management, Viz.
Compute: Existing Discover 65 TF, FY09 Upgrade 40 TF, Future Upgrades TBD.
Storage: Direct Connect GPFS Nodes, GPFS I/O Nodes, GPFS Disk Subsystems 1.3 PB, ARCHIVE, Disk 300 TB, Tape 8 PB.
Management Servers: License Servers, GPFS Management, PBS Servers, Other Services, Internal Services.]
35
Representative Architecture
  • The modelers require very fast I/O when
    generating data on the NCCS computational systems.
  • The analysis users also require very fast read
    access to this data from the NCCS analysis
    platform.
  • The generators of the data also want an easy
    method for sharing data.
  • The generators of the data also want to store the
    files into the archive for long term stewardship
    and retrieval (if necessary).
[Diagram: Compute Service, Analysis Service, Data Portal, ARCHIVE, and GPFS Storage Cluster. Two links are labeled FAST and three are labeled SLOW.]
36
Competing Requirements
  • Capacity and Throughput
  • IPCC, as an example, requires a large amount of
    data to be kept on disk.
  • The modelers generating the data also need a fast
    file system to write and subsequently read that
    data.
  • The analysis users need a fast file system from
    which to access the large amount of data.
  • All of this lends itself nicely to a global
    parallel file system (GPFS).
  • How do we include data management in this model?

37
Data Management Concept of Operations: Archive
Access
[Diagram: iRODS iCAT, iRODS Resource, iRODS Clients, ARCHIVE, DISCOVER, GPFS Storage Cluster (FAST). Transfer paths to the archive are labeled:
  SLOW (10 MB/sec): NFS, cp, scp
  NOT AS SLOW: bbftp
  A BIT FASTER: iput, iget]
  • Pros
      • Simple, parallel transfers
      • High throughput for large files (100 MB/sec)
      • Metadata captured
  • Cons
      • No file system level interface (Is this a con?)
      • Cannot open a file from the archive (Again, con?)

38
Data Management Concept of Operations: Data
Security and Access
  • Assume we have a well defined set of data
    security and access levels (examples for
    pedagogical purposes only)
      • Level 0: User only
      • Level 1: User and Project
      • Level 2: User, Project, and Service
      • Level 3: Publicly Accessible
  • Users define their data security and access
    levels using the appropriate process
  • When data is put into iRODS by the user under a
    specific project, it is labeled with the
    appropriate access level
  • All NCCS iRODS-enabled services must then check
    the access level to see if the service can access
    the data
  • In addition, the user must grant access to the
    data to the service
39
Data Management Concept of Operations for IPCC
Data
  • Step 1: Modelers generate large amounts of data
    and store it into GPFS (very fast).
  • Step 2: Modelers register the data sets into
    iRODS.
  • Step 3: Automatic rules kick in to do the
    following:
      • A. Automatically extract and publish
        metadata into a database.
      • B. Make a copy of the file into the NCCS
        archive.
  • Analysis users still have very fast (GPFS) file
    system access to the data.
  • IPCC data is presented to the data portal either
    by NFS or iRODS interface.
[Diagram: Compute Service, Analysis Service, Data Portal, iRODS iCAT, ARCHIVE, and GPFS Storage Cluster. Two links are labeled FAST and three are labeled SLOW.]
40
Data Management Concept of Operations: More
Implementation Details
  • Services on the data portal would have interfaces
    into iRODS. Could even have a local iRODS
    resource for caching data.
  • Archive accessible via iRODS; still use DMF.
  • Dedicated nodes would be a combination of GPFS
    clients and iRODS resources.
[Diagram: iRODS iCAT; Data Portal (iRODS Clients, iRODS Resource); ARCHIVE (iRODS Clients, iRODS Resource); DISCOVER; iRODS Resource Nodes acting as GPFS Clients with FAST access to the GPFS Storage Cluster.]
41
Pros and Cons
  • Pros
      • Very easy for users: they can register whatever
        they want.
      • NCCS specific micro-services can be set up to
        automatically copy files to the archive
      • Maintains the fast access to the data for both
        modelers and analysis users
      • Multi-stream throughput seems to work very well.
  • Cons
      • No file system level access to iRODS (could be a
        pro)
      • No link between data in GPFS and iRODS
      • Data changed with iRODS or GPFS will not be
        reflected in the other
      • Required to resynchronize the data every so often
      • Data within iRODS not accessible via a file
        system interface.
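
Since a file changed in GPFS is not reflected in iRODS, a periodic job has to find the files that drifted. One way to picture the check, as a Python sketch: compare each file's current checksum with the one recorded at registration time. The catalog dictionary is a hypothetical stand-in for checksums held in the iCAT, and a temporary file stands in for a GPFS file.

```python
import hashlib
import os
import tempfile

def md5_of(path, chunk=1 << 20):
    """Checksum a file in fixed-size chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def find_stale(catalog):
    """Return paths whose on-disk content no longer matches the catalog."""
    stale = []
    for path, recorded in catalog.items():
        if not os.path.exists(path) or md5_of(path) != recorded:
            stale.append(path)
    return stale

# Demonstration with a temporary file standing in for a GPFS file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_output.dat")
    with open(path, "wb") as handle:
        handle.write(b"version 1")
    catalog = {path: md5_of(path)}      # checksum taken at registration time
    before = find_stale(catalog)
    with open(path, "wb") as handle:
        handle.write(b"version 2")      # file changed behind the catalog's back
    after = find_stale(catalog)

print(len(before), len(after))
```

Files reported as stale would then be the candidates for re-registration or for irsync.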

42
Data Portal Services Architecture
  • Connectivity to the Goddard DISC and DISC SW.
  • Interfaces to ESG and PCMDI for model data (IPCC
    AR5).
  • Sufficient compute capability for some amount of
    analysis.
  • Local disk will allow for a relatively small amount
    of data to be cached in the portal.
  • Reach back capability into the much larger disk
    environment within the NCCS GPFS and Archive.
  • Users will not have to move or copy data in order
    to make it available to the portal services.
[Diagram: the Data Portal with external connections to NASA, ESG, PCMDI, and Other, and with storage interfaces Local Disk, NFS, iRODS, and GPFS MC.]
43
Concerns
  • Integration with ESG
  • Database design, implementation and number
  • iRODS security model versus NASA/NCCS policies
      • Simple single authentication
      • GSI (Grid Security Infrastructure)
  • Difficulty of developing module/micro-service
      • Try "get best copy" as an example
  • iput and iget bandwidth discrepancy with delay
    injected remains unresolved
      • Continuing to explore this in the prototype
  • Little to no services built on top of metadata
      • Expansive, detailed metadata will have to be
        scripted

44
Back-up Slides
45
Installation
  • Automated install script
  • Set of preinstall queries
  • Downloads and installs all components
      • postgres (can use Oracle, etc.)
      • unixodbc

46
icommands - Administration
  • iadmin Administration commands: add/remove/modify
    users, resources, etc. Commands are:
      • lu [name[#Zone]] (list user info; details if
        name entered)
      • lt [name] [subname] (list token info)
      • lr [name] (list resource info)
      • ls [name] (list directory: subdirs and files)
      • lz [name] (list zone info)
      • lg [name] (list group info (user member list))
      • lgd [name] (list group details)
      • lrg [name] (list resource group info)
      • lf DataId (list file details; DataId is the
        number (from ls))
      • mkuser Name[#Zone] Type [DN] (make user)
      • moduser Name[#Zone] [type | zone | DN | comment |
        info | password] newValue
      • rmuser Name[#Zone] (remove user, where userName:
        name[@department][#zone])
      • mkdir Name [username] (make directory (collection))
      • rmdir Name (remove directory)
      • mkresc Name Type Class Host [Path] (make Resource)
      • modresc Name [type, class, host, path, comment,
        info, freespace] Value (mod Resc)
      • rmresc Name (remove resource)
      • mkzone Name Type(remote) [Connection-info]
        [Comment] (make zone)
47
Example icommands
kirk@client1nccs> ienv
NOTICE: Release Version = rods2.0.1, API Version = d
NOTICE: irodsHost=archivenccs
NOTICE: irodsPort=1247
NOTICE: irodsDefResource=archivenccsResc
NOTICE: irodsHome=/archivenccsZone/home/kirk
NOTICE: irodsCwd=/archivenccsZone/home/kirk
NOTICE: irodsUserName=kirk
NOTICE: irodsZone=archivenccsZone
kirk@client1nccs> ils
/archivenccsZone/home/kirk:
  blah
  foo
kirk@client1nccs> ilsresc
archivenccsResc
client1nccsResc
48
Performance Assessment Summary
  • Local testing of 1 Gigabit showed wire speeds for
    iputs and igets
  • Artificial distance testing of 1 Gigabit (with two
    different delay simulators) yielded wire speed on
    iputs but significantly less on igets (10% of
    iputs)
  • Repeated dialogue with iRODS personnel, but the
    discrepancy remains unresolved
  • Actual distance testing with ARSC showed
    acceptable results given a 110 msec RTT and an OC-3
    pipe
49
Example Rule: core.irb
# 6) acPostProcForFilePathReg - Rule for post processing the registration
# of a physical file path (e.g. - ireg command).
# Currently, three post processing functions can be used individually or
# in sequence by these rules.
#    msiExtractNaraMetadata - extract and register metadata from the just
#    uploaded NARA files.
#    msiSysReplDataObj(replResc, allFlag) - can be used to replicate a copy of
#    the file just uploaded or copied data object to the specified replResc.
#    The allFlag is only meaningful if the replResc is a resource group. In
#    this case, setting allFlag to "all" means a copy will be made in all
#    the resources in the resource group. A "null" input means a single
#    copy will be made in one of the resources in the resource group.
#    msiSysChksumDataObj - checksum the just uploaded or copied data object.
acPostProcForPut||msiSysChksumDataObj##msiSysReplDataObj(demoResc8,all)|nop##nop
acPostProcForPut||msiSysReplDataObj(demoResc8,all)|nop
acPostProcForPut||msiSysChksumDataObj|nop
acPostProcForPut||delayExec(<A></A>,msiSysReplDataObj(demoResc8,all),nop)|nop

rulegen is a parser that translates rules written in a
nicer language into the cryptic one needed by irule
and core.irb. The recommended extension for rulegen
input files is .r, and the output created by rulegen
has the extension .ir. The grammar for the
language of the input files is given at the end
of this note.
50
Local 1 Gigabit iputs
51
Local 1 Gigabit igets
52
Local 10 Gigabit iputs
53
Local 10 Gigabit igets
54
GSFC to ARSC iputs
55
ARSC to GSFC igets