Title: OMII distribution evaluation activity at KIAM¹ and JINR²
1 OMII distribution evaluation activity at KIAM¹ and JINR²
- Viktor Pose² et al.
- EGEE-3
- 18.04-22.04.2005
¹ Keldysh Institute for Applied Mathematics, Russian Academy of Sciences
² Joint Institute for Nuclear Research, Dubna, Russia
2 Contents
- Evaluation team
- OMII functionality (based on our experience and the OMII User Guide)
- General description
- Architecture
- Account management
- User operation
- Resource Allocation
- Data Staging
- Security
- Authorization
- Administration
- Installation
- Tests
- Job Service
- Data Service
- Performance and concurrency of dummy services
- Security
- Adding an application
- Adding a new service
3 Evaluation team
- Keldysh Institute for Applied Mathematics, Russian Academy of Sciences
- P. Berezovsky
- E. Huhlaev
- V. Kovalenko
- D. Semyachkin
- Joint Institute for Nuclear Research, Dubna, Russia
- Y. Bugaenko
- V. Galaktionov
- N. Kutovskiy
- V. Pose
- I. Tkachev
4 General description
- OMII distribution (OMII) is middleware aimed at enabling access to remote distributed resources in a service-oriented environment
- Two kinds of user operations are primarily supported:
- Job execution
- File transfer
- Job execution in OMII means running applications that have been preinstalled on some servers or clusters
- Programs submitted by users as part of a job submission cannot be executed
- A distributed environment based on OMII is highly decentralized
- All interactions occur directly between a client-server pair, without intermediate servers such as a resource broker, information service or replica catalog
- OMII does not include any community services, and is analogous to the Globus Toolkit rather than to the Workload Management System of EU DataGrid
5 Architecture
- OMII is centered around web services and grid services standards
- Web services provide server-client connectivity, using SOAP for communication
- Web services are provided by Jakarta's Apache Axis and hosted by Jakarta's Apache Tomcat web server
- OMII innovates by suggesting a conversation mechanism for keeping state (context) over a series of interactions
- Similar functionality is provided by WSRF
- This mechanism is used in particular for authorization
- OMII server computer
- Hosts the services
- Serves as a gateway to a resource pool of computing nodes
- The resource pool is supposed to be managed by a batch system
- Interfaces to PBS and Condor are provided in the form of platform scripts
6 Account management
- To join an OMII-based Grid and perform any operations, the user needs:
- A certificate
- An account
- Account creation requires defining a relation for each (user, server) pair
- This does not scale well
- A user applies for an account at each OMII Service site
- The user runs the ogre_client open command against each OMII Server site
- The service provider grants (or declines) the application using a web-based tool
- The scheme is slightly improved by introducing an intermediary person, a budget holder
- Step 1. The budget holder applies for an account
- The budget holder runs the ogre_client open command against an OMII Server site
- The service provider grants (or declines) the application using a web-based tool
- After this step the new account can be used by its owner only
- Step 2. The budget holder gives users access to the account
- The user sends his certificate to the budget holder
- The budget holder stores the user's certificate in the Java Keystore of his client
- The budget holder enables the user to access the account using the graphical interface of the ogre_client browse command
- The budget holder is responsible for paying for usage of the account
- Under the conditions of a Grid with many Resource Centers and big virtual organizations
7 User operation
- A user's operations, in particular job execution and file transfer, are carried out by means of the OMII client part, ogre_client
- ogre_client is a script that starts a Java program
- Part of ogre_client's functionality is provided to the user through interactive graphical panels
- A Java library is provided to run jobs from your own applications
- On one client computer there must be a separate client installation for each user
- The ogre_client configuration file points explicitly to the Java Keystore, where the user's key is kept, and to some other personal files
- This is a fairly uncommon method of deployment for a client
- There is no single command for job execution; the procedure is divided into four steps:
- Resource allocation
- Uploading the input data
- Running the job
- Downloading the output data
- Such a separation
- Is partly a forced measure, as far as resource allocation is concerned
- May be useful from the point of view of granularity
- Places an additional burden on the user
- Each step generates some intermediate files, which must be passed to the following steps
- Accounts.xml stores a user's account information
- client.state stores state data related to a user's operations
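The four-step procedure can be pictured as a small pipeline in which every step depends on the one before it. The sketch below is only an illustration in Python: it pairs each step with the ogre_client subcommand named on these slides (tender, upload, run, download) and prints them in order. All command arguments and the handling of the intermediate files are deliberately left out, since the real invocations are not shown here, and nothing is actually executed.

```python
# Illustrative sketch only: models the order of the four steps, nothing is executed.
# The step/subcommand pairing follows the slides; the real command arguments are
# not shown there and are intentionally omitted.
STEPS = [
    ("Resource allocation", "tender"),
    ("Uploading the input data", "upload"),
    ("Running the job", "run"),
    ("Downloading the output data", "download"),
]


def plan_commands(client="./ogre_client"):
    """Return the subcommands in the order a user would have to issue them."""
    return [f"{client} {subcommand}" for _, subcommand in STEPS]


def next_step(completed):
    """Given the number of completed steps, name the step that must come next."""
    if completed >= len(STEPS):
        return None  # all four steps are done
    return STEPS[completed][0]


for (description, _), command in zip(STEPS, plan_commands()):
    print(f"{description}: {command}")
```

A single "submit" command would hide this sequence from the user; in OMII the user has to drive it by hand.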
8 Resource Allocation
- During resource allocation an execution server is selected
- Two files are used by the ogre_client tender command: Accounts.xml and Resources.xml
- Accounts.xml contains all accounts of a user
- Resources.xml describes the resources requested for the job(s) to be executed
- Name of the application suite
- Performance of the execution computer
- Memory and storage volumes
- Processor time
- Time boundary of the allocation
- As a result, the user gets a list of service providers that could grant the requested resources
- The final decision is made interactively by the user
- OMII has no information service
- The process of getting allocations (tender) is carried out by querying all the servers on which the user has an account
- This approach wastes time and network bandwidth in a grid with many users and RCs
- Resource consumption in OMII is payable
- The budget holder is responsible for paying for usage of the account
- Price per uploaded or downloaded byte
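The cost of tendering without an information service can be illustrated with a small model. The following Python sketch is an assumption-laden toy, not OMII code: the Request and Server classes, their fields and the sample values are all hypothetical. It shows only the behaviour described above, namely that every server on which the user holds an account is queried, and those able to grant the request are returned for the user to choose from.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for part of what Resources.xml describes."""
    suite: str
    memory_mb: int
    cpu_seconds: int


@dataclass
class Server:
    """Hypothetical model of one OMII server site."""
    name: str
    suites: tuple
    free_memory_mb: int
    free_cpu_seconds: int

    def can_grant(self, request):
        return (
            request.suite in self.suites
            and request.memory_mb <= self.free_memory_mb
            and request.cpu_seconds <= self.free_cpu_seconds
        )


def tender(request, servers_with_account):
    """Query every server the user has an account on.

    Returns the names of the servers that could grant the request and the
    number of queries made; with no information service the latter always
    equals the number of accounts.
    """
    offers = []
    queries = 0
    for server in servers_with_account:
        queries += 1  # one client-server exchange per account
        if server.can_grant(request):
            offers.append(server.name)
    return offers, queries


servers = [
    Server("site-a", ("test-suite",), 512, 3600),
    Server("site-b", ("other-suite",), 2048, 7200),
    Server("site-c", ("test-suite",), 128, 3600),
]
offers, queries = tender(Request("test-suite", 256, 600), servers)
print(offers, queries)
```

In this model the number of queries per tender grows with the number of accounts, which is the scaling concern raised above.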
9 Data staging
- The user uploads the input file(s) to the Data Service via the ogre_client upload command
- Currently the preferred option is to put all input files the job needs into one zip file
- The user submits the job
- The input file is staged in from the Data Service to the job workspace, an extra directory created for each job
- Input data files are moved into the working directory of the job, a subdirectory of the workspace directory
- This includes unpacking any compressed archives that contain multiple inputs
- If the job produces output files, they are created in the working directory
- Output files are copied from the working directory into the specified positions in the workspace
- This includes packing multiple outputs into compressed archive files where necessary
- The output file is staged out to the Data Service
- The user downloads the output file(s) via the ogre_client download command
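The pack, unpack and repack steps of this staging scheme can be imitated locally with Python's standard zipfile module. The sketch below is an assumption-based illustration, not the OMII implementation: the directory names ("workspace", "work"), the file names and the stand-in "job" are hypothetical, and no Data Service is contacted.

```python
import tempfile
import zipfile
from pathlib import Path


def pack(files, archive):
    """Pack several files into one zip archive (the preferred upload form)."""
    with zipfile.ZipFile(archive, "w") as zf:
        for path in files:
            zf.write(path, arcname=Path(path).name)
    return archive


def stage_in(archive, workspace):
    """Unpack the input archive into a working directory inside the workspace."""
    workdir = Path(workspace) / "work"
    workdir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(workdir)
    return workdir


def stage_out(workdir, workspace, output_names):
    """Pack the named outputs from the working directory into the workspace."""
    files = [Path(workdir) / name for name in output_names]
    return pack(files, Path(workspace) / "output.zip")


def demo():
    """Run one local round trip and return the names found in the output archive."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "a.txt").write_text("alpha")
        (tmp / "b.txt").write_text("beta")
        upload = pack([tmp / "a.txt", tmp / "b.txt"], tmp / "input.zip")
        workspace = tmp / "workspace"
        workdir = stage_in(upload, workspace)
        # Stand-in for the job: write one output file per input file.
        for name in ("a.txt", "b.txt"):
            text = (workdir / name).read_text()
            (workdir / (name + ".out")).write_text(text.upper())
        result = stage_out(workdir, workspace, ["a.txt.out", "b.txt.out"])
        with zipfile.ZipFile(result) as zf:
            return sorted(zf.namelist())


print(demo())
```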
10 Security
- OMII addresses two aspects of security: authentication and authorization
- The authentication mechanism is based on X.509 certificates and public key technology
- The current OMII distribution is configured to work with "temporary" certificates signed by the Certification Authority (CA) on the OMII security server
- Work with non-OMII certificates is allowed
- There are no special instruments except the standard Java tools
- In the future, it will be possible to set up a grid where different client and service certificates are signed by different CAs but this has not been implemented at present. (OMII 1.2.0 User Guide)
- OMII Extensions component (GridServIT)
- Initially supports message-level integrity based on WS-Security and X.509 PKI
- Provides authentication
- The identity of the sender of the message is taken from the WS-Security SOAP header after checking the message's integrity
- Provides authorization
- Process-based access control (PBAC), useful for enforcing a business process
- No proxy certificate support
- Unlike the Grid Security Infrastructure, OMII does not support proxy credentials, which allow a computation (e.g. a service) to securely delegate user rights to another computation
- This seriously limits the ability to perform secure service-to-service interactions
- According to OMII support, OMII is committed to achieving interoperability with the Globus Toolkit, and an interoperable authentication model is under discussion
- Java Keystore management
- Currently the Keystore password and the private key password are stored in clear text in a file
- One feature of the keytool utility used to manage the Java Keystore needs to be improved
11 Authorization
- All services and applications run under a single system account
- This makes the basic authorization mechanism on the server side primitive
- Authorization module: Process Based Access Control (PBAC)
- May optionally be used in service development to enforce a workflow (business process)
- E.g. commercial service workflow: payment -> run job
- Applicable in situations where the sequence of service operations is essential
- In PBAC the term "conversation" is used to represent the identity of a particular dialogue between a client and a service provider
- On the server side, PBAC records the state of a conversation as the set of operations that are accessible
- Possible authorizations are described by
- Identity of the user
- Conversation ID
- Operation name
- When a user calls a service, an operation is executed for a particular user in a particular conversation only if an authorization matching this specification is present
- If a service developer wants to provide such enforcement, he must wrap each operation of his service with PBAC API calls
- These calls must check whether an authorization is present for the calling user and the conversation ID
- Finally, the operation may open authorizations for subsequent operations and close authorizations for others
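The PBAC idea described above can be sketched in a few lines. The Python below is a conceptual model only and is not the OMII PBAC API (which is a Java API whose calls are not shown on these slides): the class and method names are hypothetical. Authorizations are modelled as (user, conversation ID, operation) triples, each operation first checks for a matching triple, and then opens or closes authorizations to enforce the "payment -> run job" order.

```python
class ConversationAuthorizations:
    """Hypothetical store of (user, conversation ID, operation) authorizations."""

    def __init__(self):
        self._granted = set()

    def open(self, user, conversation_id, operation):
        self._granted.add((user, conversation_id, operation))

    def close(self, user, conversation_id, operation):
        self._granted.discard((user, conversation_id, operation))

    def is_authorized(self, user, conversation_id, operation):
        return (user, conversation_id, operation) in self._granted


class PaidJobService:
    """Toy service whose operations are wrapped with authorization checks."""

    def __init__(self, authorizations):
        self.auth = authorizations

    def _require(self, user, conversation_id, operation):
        if not self.auth.is_authorized(user, conversation_id, operation):
            raise PermissionError(f"{operation} not authorized for {user}")

    def start_conversation(self, user, conversation_id):
        # At the start of a conversation only payment is accessible.
        self.auth.open(user, conversation_id, "pay")

    def pay(self, user, conversation_id):
        self._require(user, conversation_id, "pay")
        self.auth.close(user, conversation_id, "pay")
        self.auth.open(user, conversation_id, "run_job")  # unlock the next step
        return "paid"

    def run_job(self, user, conversation_id):
        self._require(user, conversation_id, "run_job")
        self.auth.close(user, conversation_id, "run_job")
        return "job started"


service = PaidJobService(ConversationAuthorizations())
service.start_conversation("alice", "conv-1")
print(service.pay("alice", "conv-1"))
print(service.run_job("alice", "conv-1"))
```

Calling run_job before pay in a fresh conversation raises PermissionError in this model, which is the workflow enforcement the slide describes.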
12 Administration
- Administration in OMII refers to the administration of a single server site
- A web-based tool, ra_admin, may be used to configure several parameters
- Data service
- Limits on storage capacities
- Upload and download bandwidths (network provider limits)
- Costs per uploaded/downloaded byte
- Name and capabilities of each machine in the resource pool
- Relative performance
- Memory
- Number of processors
- Machines can be added to or deleted from the resource pool
- Application suites and the applications within those suites
- Both are described by their URI
- An application suite is a collection of applications normally used together
- An administrator can specify which machines can run which application suites
- Work unit costs are set per application suite
- Apparently all this information is used
- At the tender step of a resource allocation
- For calculating the cost of a user's job
13 Installation
- OMII 1.0.0 and 1.1.1 client installation
- Easy and fast
- Successful on
- Windows XP (with flag CheckDiskSpace -> failonerror=false), small issue
- SuSE Linux 9.0
- CERN Linux 7.3
- On a computer with multiple users, one client installation per user is needed
- OMII 1.0.0 and 1.1.1 server installation
- Successful only on SuSE Linux 9.0 (small issue)
- OMII 1.2.0 released on 20.03.2005
- Server and client support
- Red Hat Enterprise Linux 3.0 ES/WS
- SuSE 9.0
- Client installation successful on SuSE Linux 9.0
- Server not tested at JINR and KIAM
14 Setup for Job Service tests
- This setup was used for the tests on the next 2 slides
- OMII 1.1.1 Server node
- P4 2.4GHz, 512MB RAM
- Preinstalled application GRIATestApp
- 2 client nodes, each having
- 10 OMII 1.1.1 client installations
- P4 3GHz, 1.5GB RAM
- The amount of server node CPU used to run the application is negligible
- Work.xml: the line containing the -cputime30 setting was removed
- The input file is the zipped test.txt file, which contains only one letter
- Rough profiling of Java versus Postgres CPU consumption on the server node was done using the ps utility
- Total CPU consumption on the server node was measured via vmstat
15 Job Service Concurrency Robustness
- Test Description
- Running 20 clients simultaneously
- Each client sequentially submits 20 jobs to the Job Service: 400 jobs in total
- Duration of the test: 1h 18min
- Results
- The server remained stable during and after this test, and all job submissions were successful
- The Job Service throughput was 5.1 jobs/minute
- Average server node CPU consumption: 79%
- The only issue that was noticed
- A few times the built-in monitoring of the running jobs failed with the message
- Status Status is now submitted (STATUS-SCRIPT-ERROR)
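The reported throughput follows directly from the test parameters: 20 clients times 20 jobs gives 400 jobs, submitted over 1h 18min (78 minutes). A minimal Python check of that arithmetic, with a hypothetical helper name:

```python
def throughput_per_minute(clients, jobs_per_client, hours, minutes):
    """Jobs per minute for a run of the given size and duration."""
    total_jobs = clients * jobs_per_client
    total_minutes = hours * 60 + minutes
    return total_jobs / total_minutes


rate = throughput_per_minute(clients=20, jobs_per_client=20, hours=1, minutes=18)
print(f"{rate:.1f} jobs/minute")  # 400 jobs / 78 min, prints "5.1 jobs/minute"
```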
16 Job Service Performance
- Test Description
- Different clients, each using a separate account, or
- Different clients using the same account (e.g. one account per VO)
- Each client submits 1 job using the ogre_client run command
- Results
- Average CPU consumption at the server node: 79 - 84% for 20 clients
- Job submission throughput cannot grow considerably with a further increase in the number of concurrent clients
- The maximal job submission rate to an OMII Job Service is about 6 jobs per minute for a 2.4GHz P4 CPU server node
- A client consumes on average 4.3s of a 3GHz P4 CPU per job submission
- Part of server CPU consumed by PostgreSQL: 10..20% - 23..26%; the other part is consumed by Java
17 Job Service Stability
- Test Description
- A user needs to submit multiple jobs to a batch queue via one account
- Account creation
- Resource allocation
- Uploading the input file (150 bytes)
- The batch mode job submission command ogre_client start is put in a loop to submit multiple jobs sequentially
- Each job executes the GRIATestApp application
- The ogre_client monitor command can be used to monitor the status of submitted jobs
- Server node: P4 2.4GHz, 512MB RAM
- Client node: P4 3GHz, 1.5GB RAM
- Results
- Successful execution until an error occurred near the 180th start
- SEVERE Problem creating input stream for reading SOAPMessage
- After the described failure, subsequent starts of the application from the command line were successful
- About 16s client execution time for 1 job submission
- This setup is convenient for submitting and monitoring up to about 10 - 30 jobs
- Bulk job submission is not supported in OMII
18 Monitoring job status
- OMIICLIENT> ./ogre_client monitor
- OGRE Client
- Contacting http://omii01.jinr.ru:18080/axis/services/JobService2273
- Contacting http://omii01.jinr.ru:18080/axis/services/JobService2271
- http://jinr.ru/sjob
- URL http://omii01.jinr.ru:18080/axis/services/JobService2273
- Status Status is now submitted (RUNNING)
- > JOB_STATUS RUNNING
- > Current status RUNNING
- >
- > Details
- > GRIA Test App running
- > Current time Mon Mar 7 13:41:41 2005
- http://jinr.ru/sjob
- URL http://omii01.jinr.ru:18080/axis/services/JobService2271
- Status Status is now output-staging-complete (FINISHED)
- > JOB_STATUS FINISHED
Example: A user submitted 2 jobs
19 Data Service Reliability and Performance
- Test Description
- A single client runs 1000 sequential upload/download cycles of files of about 10MB to/from the OMII Data Service
- Comparison of the uploaded and downloaded files
- P4 3GHz, 1.5GB RAM client and server nodes
- Results
- All uploads and downloads were successful
- One upload-download cycle consumed about 2.9s of CPU time on the OMII Server node
- Average client execution times
- Upload: 11.4s
- Overwriting: 5.7s
- Download: 5.5s
- About 3.5s of CPU consumed by the client per upload or download
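To put the average client execution times in perspective, they can be converted into effective per-operation data rates. The Python sketch below is a rough derived estimate, not a measured result: it assumes exactly 10 MB per file (the slides say "about 10MB"), and the times include client startup and processing, so these are not network throughput figures.

```python
# Assumption: 10.0 MB per file; the test description only says "about 10MB".
FILE_SIZE_MB = 10.0

# Average client execution times from the results above, in seconds.
AVERAGE_CLIENT_TIMES_S = {"upload": 11.4, "overwriting": 5.7, "download": 5.5}


def effective_rate_mb_per_s(size_mb, seconds):
    """Data moved per second of total client execution time."""
    return size_mb / seconds


for operation, seconds in AVERAGE_CLIENT_TIMES_S.items():
    rate = effective_rate_mb_per_s(FILE_SIZE_MB, seconds)
    print(f"{operation}: {rate:.2f} MB/s")
```

Under these assumptions a first upload runs at a little under 1 MB/s, and overwriting or downloading at a little under 2 MB/s.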
20 Data Service Concurrency
- Test Description
- Simultaneous upload or upload+download+compare of files to/from the OMII Data Service with up to 5 parallel clients
- Each client runs 10 sequential upload or upload+download+compare cycles
- Server node (OMII 1.0.0): P4 2.4GHz, 512MB RAM
- Client node with 5 OMII 1.0.0 client installations: Celeron 1.3 GHz, 425MB RAM
- Results
- All uploads and downloads were successful
- Client execution time grows approximately linearly with an increasing number of clients (may be partly influenced by the slow client node)
21 Performance of Dummy Services
- Test Description
- Estimate the overhead of the OMII/Axis infrastructure
- The dummy services Non-PBAC Test Service and ExampleGridServIT PBAC Service were tested using the clients from the OMII distribution
- Client node
- OMII 1.0.0 - P4 3 GHz, 1GB RAM, SUSE 9.0
- Server nodes
- Server at OMII
- Server at KIAM: OMII 1.0.0 - P4 3 GHz, 1GB RAM, SUSE 9.0
- Time measurements were carried out with the time command
- Results
- Below we list the response times of the tested services for a single client
- Server response time (server wall time + network delay) was estimated as (client wall time) - (client CPU time)
- The response time for the server at KIAM is less than the response time for the server at OMII because the network delay for the KIAM server is very small
[Tables: non-PBAC Test Service; ExampleGridServIT PBAC Service]
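The estimate of server response time used for these measurements can be written as a one-line calculation on the three figures that the time command reports (real, user, sys). The Python sketch below uses a hypothetical function name and made-up sample values; it only restates the subtraction described above and assumes a client whose CPU time does not exceed its wall time.

```python
def server_response_time(real_s, user_s, sys_s):
    """Estimate (server wall time + network delay) for one client call.

    real_s is the client wall time; user_s + sys_s is the client CPU time,
    as reported by the `time` command.
    """
    client_cpu_s = user_s + sys_s
    if client_cpu_s > real_s:
        # Can happen for multi-threaded clients; the estimate is then meaningless.
        raise ValueError("client CPU time exceeds wall time")
    return real_s - client_cpu_s


# Hypothetical sample values, not measurements from the evaluation.
print(server_response_time(real_s=2.5, user_s=1.5, sys_s=0.25))
```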
22 Concurrency with dummy services
- Test description
- The dummy service ExampleGridServIT PBAC Service was tested with the regular OMII client
- It was impossible to run the client utilities concurrently under a single user
- A minor change to the utilities' code made it possible to start any number of them in parallel
- It was not possible to create sufficient load on the server from a single client host
- Only one client node was available for this test
- Server wall time was much less than the client CPU time
- Client node
- OMII 1.0.0 - P4 3 GHz, 1GB RAM, SUSE 9.0
- Server nodes
- Server at OMII
- Server at KIAM: OMII 1.0.0 - P4 3 GHz, 1GB RAM, SUSE 9.0
- Results
- 100 parallel clients finished without any errors
- Client CPU time was approximately the same for all starts
- Total client wall time is equal to 100 × (single client wall time)
23 Security Tests
- Test OMII Services with Russian DataGrid CA user certificates, accepted by the LCG/EGEE communities
- OMII Services can't be used with Russian DataGrid CA user certificates; presumably there is no support for certificates signed with a 4096-bit CA public key
- According to OMII support
- It is possible that OMII will use other security providers, e.g. IBM JSSE or Bouncy Castle, in the future
- It's still not clear whether they support certificates signed with a 4096-bit CA public key
- Test authorization features
- Enable access of multiple users to one account: OK
- The account owner imports the certificates of the relevant users into the Java Keystore of his client
- The account owner enables access for each user certificate manually via a graphical UI
- The user manually adds the account to the accounts file (in XML format) of his client
- This was used in the Job Service tests
- A possible VO <-> OMII account mapping could not be managed effectively this way for big dynamic VOs
- One user accessing multiple accounts from the same client installation: OK
- Accounts are automatically added to a client's accounts file during account creation
- The user chooses the account he wants to use during resource allocation for the task with the graphical UI of the ogre_client tender command
24 Adding an application
- According to the OMII User Guide, to add a new application one has to
- Create an application startup wrapper script to start the application
- Simple applications can use the provided test startup wrapper script
- Optionally create additional wrapper scripts for application-specific status monitoring and/or job termination
- Simple applications can use the provided test status and job termination scripts
- Deploy the application and wrapper script(s) on all execution platform nodes assigned to that application suite
- Append application parameters to the job service configuration so the job service can find and use the new application
- Add the application to a new or existing application suite in the resource model, using the resource model admin web interface
- Using the provided
- Documentation
- Application wrapper and status scripts
- we successfully added simple applications
- This was easily done and no issues were noticed
25 Adding a new service
- OMII Extensions component (GridServIT)
- Built on the native Apache Axis SOAP container without changing it in any way
- Provides a service context API for eScientists wishing to deploy Grid Services
- Services may use this API to retrieve contextual information associated with a service
- Common data, such as the distinguished name of an authenticated remote user
- Headers from the SOAP messages
- Access to basic infrastructure security services (the only one available at present is the PBAC authorization module)
- The alternative to the OMII API-based approach: an extension of the container functionality
- Seems to be considerably more productive for enforcing common grid policies
- Would free a service developer from programming grid-related code
- All OMII services are primarily web services
- Common Tomcat and Axis tools can be used for service development and deployment
- OMII suggests the GEMSS Transport and Messaging framework as an invocation and messaging framework
- It enables client applications to make invocations against message-based services
- The proposed method is low-level though flexible: a client developer must write an invocation manually, without a stub
- OMII documentation describes a difficult way of creating services and clients, without using well-known Axis tools such as Java2WSDL and WSDL2Java
- A simple example service was eventually deployed
- There was a problem with deployment, caused by a small inaccuracy in the documentation
26 Interoperability with WMS
- Interoperability with WMS was evaluated based on a paper study.
- Can we use the OMII client to interface to the WMS?
- Different job management interfaces and architecture
- WMS UI
- Designed to contact the WMS Network Server and LB
- Functionally richer than the OMII UI
- Based on JDL
- OMII UI
- Designed to contact the OMII Job Service, Data Service and Resource Allocation Service
- Aimed at submission of a job to a certain site
- No support for VO and VOMS
- No support for passing job executables along with the job submission
- Security
- No support for user proxy certificates and VOMS in OMII
- WMS uses information added by VOMS to the proxy certificate: the VO of the user
- Conclusion
- The OMII UI can't be used to submit jobs to WMS
27 Interoperability with WMS
- Is the 3-tier architecture OMII UI -> OMII Server -> WMS an effective way to provide OMII users with the possibility to submit their jobs through WMS to the underlying CEs?
- Different job management interfaces and architecture
- Currently there are no ready-to-use means to make the 3-tier architecture OMII UI -> OMII Server -> WMS support the necessary amount of the functionality provided by WMS and the WMS UI
- Security
- No support for user proxy certificates and VOMS in OMII
- Additional latency
- The 3-tier architecture OMII Java UI <-> OMII Java Services <-> WMS introduces additional latency
- Conclusion
- The OMII server can't be used to interface to the WMS without additional development effort
28 Interoperability with WMS
- Can we use the WMS to submit to OMII servers?
- Different interfaces to computing resources and different job management architecture
- WMS submits jobs to the CE via Condor-C (web service interface)
- The OMII Job Service uses its own web service interface
- Does the OMII Job Service produce asynchronous job status notifications like a CE?
- Information services
- OMII has no means to publish CE and SE information for the OMII Job Service and OMII Data Service to an Information System or WMS
- Authorization
- No VOMS-based authentication and authorization support in OMII
- VOMS-based authentication and authorization support has to be added to OMII for it to work effectively with big dynamic VOs and be compatible with WMS
- Conclusion
- WMS can't be used to submit jobs to the OMII Job Service without additional development effort
29 Summary of tests
30 OMII Support and Documentation
- OMII support was contacted to resolve several issues and problems encountered
- Was mostly responsive
- The answers mostly came the same day or the next day
- A couple of problems were discussed, and the developers made corresponding changes in the documentation
- The provided documentation clearly covers the main topics, but at some points (e.g. budget management, service creation) it is unclear and unfinished
31 Summary
- OMII has several interesting features and abilities
- Web services architecture
- Account management
- Resource consumption accounting
- Conversation mechanism and authorization module PBAC (Process Based Access Control)
- Easy and compact installation (but restricted to certain operating systems)
- OMII is oriented towards web services rather than Grid architecture
- No community services
- Does not support the Grid Security Infrastructure and proxy credentials
- No support for Virtual Organisations and VOMS
- Does not support the execution of users' programs
- No interoperability with Globus Toolkit and WMS
- Management and administrative techniques are intended for servicing individual servers, not VOs and resource infrastructure
- User operation needs improvements in the implementation as well as enhanced functionality, especially in the case of larger grids
- More powerful resource selection language
- Account and resource allocation management