Title: Intro to arc middleware
1. Introduction to ARC Middleware
ISSGC09, Sophia Antipolis, Nice, France
- Ivan Degtyarenko and Michael Gindonis
- CSC IT Center for Science, Espoo, Finland
- July 11th, 2009
2. Today's session
What is it about?
After a quick introduction, you will familiarize yourselves with ARC middleware through practical examples. By this point you have already covered grid middleware basics, X.509 certificates, proxies, virtual organizations, etc., so let's dive in!
3. ARC Tutorial timetable for this morning
4. Short introduction: A Hello Grid job with ARC
- grid-proxy-init (generate proxy)
- ngsub -f hello.xrsl (submit)
- ngstat -a (monitor)
- ngget hello (fetch the results)
- hello.sh
- #!/bin/sh
- echo Hello Grid!
- hello.xrsl
- &(executable=hello.sh)(jobname=hello)(stdout=hello.out)(stderr=hello.err)(gmlog=gridlog)(cputime=10)(memory=200)(disk=1)
5. Steps to start running on Grid
Once:
- get an account on a system with a Grid User Interface installed (or install it on your own PC)
- request a certificate from a Certificate Authority (CA)
- install the certificate into ~/.globus/
- join a VO
Every session:
- log in to the Grid (create a proxy)
- write a job description in a file
- check available resources (optional)
- submit the job
- monitor the progress of the job
- fetch the results
6. Privacy
Note! When working on the Grid, you must accept that some information about your jobs and your Grid identity may be made public, for example via monitoring tools, such as:
- your name / affiliation
- IP address of your client computer
- job names and duration
- runtime environment
- other information
Fortunately, for today you are relatively anonymous:
/C=IT/O=GILDA/OU=Personal Certificate/L=Sophia Antipolis/CN=ISSGCXX
7. Security Policies
- policies vary between grids and VOs
- you will need to accept these terms to use the resources
- since you are in the GILDA VO, you have already accepted its policy
- you will need to accept the M-grid Acceptable Use Policy, since some resources used in this tutorial are part of M-grid
8. The NorduGrid collaboration
- a community around the open source ARC Grid middleware
- national Grids (e.g. M-grid, SweGrid, NorGrid), users also outside the Nordic countries
- real users, real applications
- implemented a production Grid system working non-stop since May 2002
- open for anyone to participate
9. ARC Middleware
- ARC middleware (Advanced Resource Connector)
- open source out-of-the-box Grid solution software which enables production quality computational and data Grids
- easily installable/buildable for a variety of distributions
- non-intrusive server installation
- supports many common LRMS (batch systems)
- Grid Engine, PBS/Torque, Platform LSF
- builds upon standard open source solutions such as OpenLDAP, OpenSSL, SASL and Globus Toolkit
- adds services not provided by Globus, such as scheduling
- extends or completely replaces some Globus components
10. ARC Middleware (cont.)
- provides a reliable implementation of the fundamental Grid services, such as information services, resource discovery and monitoring, job submission and management, brokering, data management and resource management
- integrates computing resources and storage elements via a secure Grid layer
- provides a light-weight standalone client, the User Interface, which lets users submit, manage and monitor jobs on the Grid, move data around and query resource information
- the broker built into the UI selects the best resource for a job
- Grid job requirements are expressed in the extended Resource Specification Language (xRSL)
11. ARC Middleware Architecture
12. The not so short introduction: Installing the ARC client
- required to submit jobs to NorduGrid
- download from http://ftp.nordugrid.org/download/
- binaries for various Linux distributions, source code also available
- the easiest way to install the client is to use the standalone version
- uncompress in a directory (no root privileges required): tar zxvf nordugrid-standalone-<latest>.i386.tgz
- run the environment setup script: cd nordugrid-standalone-<latest>, then . ./setup.sh
- RPM packages are recommended for multi-user installations
13. Requesting and Installing the Grid Certificate
- create a certificate request
- grid-cert-request -int
- generates the .globus subdirectory with a key (userkey.pem) and the request (usercert_request.pem)
- identity string, e.g. /O=Grid/O=NorduGrid/OU=bccs.uib.no/CN=Per Hansen
- remember to select a good passphrase and keep the key secret!
- send the file ~/.globus/usercert_request.pem to a Certification Authority (CA)
- see the instructions at your local site / country on which CA to contact
- wait for an answer from the CA
- the signed certificate returned by the Certificate Authority should be saved as the file ~/.globus/usercert.pem
14. Logging in to the Grid
- "Log in": grid-proxy-init
- the command does not actually log in anywhere, but decrypts the private key and uses it to create a time-limited proxy
- the proxy is used for authenticating to the resources
- "Log out": grid-proxy-destroy
- destroys the proxy
- "whoami": grid-proxy-info
- shows information about the validity of the proxy
- subject  : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis/CN=413289378
- issuer   : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis
- identity : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis
- type     : Proxy draft (pre-RFC) compliant impersonation proxy
- strength : 512 bits
- path     : /tmp/x509up_u7060
- timeleft : 11:59:39
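A script that runs for hours may want to check how much proxy lifetime remains before submitting more work. The sketch below parses text shaped like the grid-proxy-info listing above; it assumes the "key : value" layout with a colon-separated timeleft, and the helper names parse_proxy_info and time_left are made up for this illustration, not part of any ARC or Globus API.

```python
from datetime import timedelta

# Sample text in the assumed "key : value" layout of grid-proxy-info.
SAMPLE = """\
subject  : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis/CN=413289378
issuer   : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis
identity : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis
type     : Proxy draft (pre-RFC) compliant impersonation proxy
strength : 512 bits
path     : /tmp/x509up_u7060
timeleft : 11:59:39
"""


def parse_proxy_info(text):
    """Turn 'key : value' lines into a dict (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so 11:59:39 stays in one piece.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def time_left(info):
    """Convert the hh:mm:ss 'timeleft' field into a timedelta."""
    hours, minutes, seconds = (int(part) for part in info["timeleft"].split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


info = parse_proxy_info(SAMPLE)
if time_left(info) < timedelta(hours=1):
    print("Proxy is about to expire; run grid-proxy-init again.")
else:
    print("Proxy file:", info["path"], "time left:", time_left(info))
```

In practice the text would come from running grid-proxy-info rather than from a constant.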
15. Writing a job description file
- Resource Specification Language (RSL) files are used to specify job requirements and parameters for submission
- NorduGrid uses an extended language (xRSL) based on the Globus RSL
- similar to scripts for local batch systems, but with some additional attributes:
- job name
- executable location and parameters
- location of input and output files of the job
- architecture, memory, disk and CPU time requirements
- runtime environment requirements
16. xRSL example
- hellogrid.sh
- #!/bin/sh
- echo Hello Grid!
- hellogrid.xrsl
- &(executable=hellogrid.sh)(jobname=hellogrid)(stdout=hello.out)(stderr=hello.err)(gmlog=gridlog)(cputime=10)(memory=200)(disk=1)
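When many similar jobs are needed, the job description can be generated instead of typed by hand. The following is a minimal sketch that renders the attributes of the example above as an xRSL string; the function name to_xrsl is hypothetical, and quoting every value is a choice made for this sketch (the slide shows the values unquoted).

```python
def to_xrsl(attributes):
    """Render simple name/value attributes as a one-line xRSL conjunction.

    Hypothetical helper: handles plain string values only, not nested
    lists such as inputFiles.
    """
    clauses = "".join(f'({name}="{value}")' for name, value in attributes.items())
    return "&" + clauses


# Same attributes as the hellogrid example.
job = {
    "executable": "hellogrid.sh",
    "jobname": "hellogrid",
    "stdout": "hello.out",
    "stderr": "hello.err",
    "gmlog": "gridlog",
    "cputime": "10",
    "memory": "200",
    "disk": "1",
}

xrsl = to_xrsl(job)
print(xrsl)

# Write the description to the file name used on the slide.
with open("hellogrid.xrsl", "w") as handle:
    handle.write(xrsl + "\n")
```

The resulting file could then be passed to ngsub -f in the usual way.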
17. Submitting the job
- submit the job
- ngsub -d 1 -f hellogrid.xrsl
- a job id is returned
- > Job submitted with jobid gsiftp://ametisti.grid.helsinki.fi:2811/jobs/455611239779372141331307
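Scripts that submit jobs usually need to keep the returned job id for later status queries. Here is a small sketch that pulls the id out of a message like the one above and splits it into host, port and path; the exact wording of the message (including whether a colon follows "jobid") is an assumption, and extract_jobid is a made-up helper name.

```python
import re
from urllib.parse import urlparse

# Matches "jobid" optionally followed by a colon, then the id itself.
JOBID_RE = re.compile(r"jobid:?\s+(\S+)")


def extract_jobid(ngsub_output):
    """Return the job id found in ngsub's message, or None if absent."""
    match = JOBID_RE.search(ngsub_output)
    return match.group(1) if match else None


message = (
    "Job submitted with jobid "
    "gsiftp://ametisti.grid.helsinki.fi:2811/jobs/455611239779372141331307"
)

jobid = extract_jobid(message)
parts = urlparse(jobid)
print("cluster:", parts.hostname)
print("port:", parts.port)
print("job path:", parts.path)
```

The host part of the id shows which cluster the broker picked for the job.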
18. ARC Grid Monitor
- shows currently connected resources
- almost all elements "clickable"
- browse queues and job states by cluster
- list jobs belonging to a certain user
- no authentication, anyone can browse the info
- privacy issues
19. Monitoring the Job
- query the status using the command line
- ngstat hellogrid
- > Job gsiftp://ametisti.grid.helsinki.fi:2811/jobs/455611239779372141331307
  Jobname: hellogrid
  Status: INLRMS:Q
- most common status values are ACCEPTED, PREPARING, INLRMS:Q, INLRMS:R, FINISHING, FINISHED
- or use the Grid Monitor
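For unattended use, the status text can be read by a script that decides whether the results are ready to fetch. The sketch below is only an illustration: it assumes the "Status: VALUE" layout and the colon spelling INLRMS:Q / INLRMS:R, it knows only the six common states named above, and the helper names are hypothetical.

```python
# Usual lifecycle order of the common states (colon spelling assumed).
STATES = ["ACCEPTED", "PREPARING", "INLRMS:Q", "INLRMS:R", "FINISHING", "FINISHED"]

# Sample text in the assumed layout of ngstat output.
SAMPLE = """\
Job gsiftp://ametisti.grid.helsinki.fi:2811/jobs/455611239779372141331307
  Jobname: hellogrid
  Status: INLRMS:Q
"""


def parse_status(text):
    """Return the value of the 'Status' line, or None if there is none."""
    for line in text.splitlines():
        # Split on the first colon so that INLRMS:Q stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Status":
            return value.strip()
    return None


def stage_index(status):
    """Position in the usual lifecycle, or -1 for a state not listed here."""
    return STATES.index(status) if status in STATES else -1


def is_finished(status):
    """True once the job has reached the FINISHED state."""
    return status == "FINISHED"


status = parse_status(SAMPLE)
print("status:", status, "stage:", stage_index(status))
if is_finished(status):
    print("Results can be fetched with ngget.")
else:
    print("Still in progress; check again later.")
```

A real monitoring loop would call ngstat periodically and also handle states that are not in this short list.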
20. Fetching the results
- print the job output
- ngcat hellogrid
- shows the standard output of the job
- this can also be done while the job is running
- download the result files
- ngget hellogrid
- > ngget: downloading files to /home/ajt/455611239779372141331307
- > ngget: download successful - deleting job from gatekeeper.
21. Using a storage element
- Storage Elements are disk servers accessible via the Grid
- can be used to store job output while the user is logged out and the client machine is disconnected from the Grid
- allow input files to be stored close to the cluster where the program is executed, on a high bandwidth network
- files can be local and remote in the same job
- (inputFiles=("input1" "/home/user/myexperiment")("input2" "gsiftp://se.example.com/files/data"))
- (outputFiles=("output" "gsiftp://se.example.com/mydir/result1")("prog.out" "gsiftp://se.example.com/mydir/stdout"))
- (stdout="prog.out")
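The file staging attributes above are lists of name/location pairs, which are easy to get wrong when typed by hand. A minimal sketch of generating them follows; file_pairs and is_remote are hypothetical helper names, and the space-separated pair syntax is assumed from standard xRSL usage.

```python
def file_pairs(attribute, pairs):
    """Render (attribute=("name" "location")...) from a list of pairs."""
    body = "".join(f'("{name}" "{location}")' for name, location in pairs)
    return f"({attribute}={body})"


def is_remote(location):
    """Treat anything with a URL scheme (e.g. gsiftp://) as remote."""
    return "://" in location


inputs = [
    ("input1", "/home/user/myexperiment"),              # local file
    ("input2", "gsiftp://se.example.com/files/data"),   # on a storage element
]
outputs = [
    ("output", "gsiftp://se.example.com/mydir/result1"),
    ("prog.out", "gsiftp://se.example.com/mydir/stdout"),
]

print(file_pairs("inputFiles", inputs))
print(file_pairs("outputFiles", outputs))

for name, location in inputs:
    kind = "remote" if is_remote(location) else "local"
    print(name, "is", kind)
```

Keeping the pairs in ordinary lists makes it simple to mix local and remote files in the same job.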
22. Runtime environments
- software packages which are preinstalled on a computing resource and made available through the Grid
- just send the data and/or parameters to be processed
- useful if there are many users of the same software or if the same program is used frequently
- allows local platform-specific optimizations
- for a specific CPU or parallel environment
- perhaps in the near future GPUs, CUDA
- required runtime environments can be specified in the job description file, for example (runtimeenvironment=APPS/GRAPH/POVRAY-3.6)
- Runtime Environment Registry
- http://www.csc.fi/grid/rer/
23. ARC / NorduGrid / M-grid references
- NorduGrid (resource monitor, presentations, tutorials, docs, ...)
- http://nordugrid.org/
- ARC middleware
- http://nordugrid.org/middleware
- User guide: http://www.nordugrid.org/documents/ui.pdf
- user support mailing list: nordugrid-support at nordugrid.org
- M-grid (Finnish National Grid)
- http://www.csc.fi/english/research/Computing_services/grid_environments/mgrid
- https://extras.csc.fi/mgrid/
- support email at CSC: grid-support at csc.fi
- regular ARC training by CSC: http://www.csc.fi/english/csc/courses
24. Do I need to change my application to use ARC?
- three different approaches
- using the application as is: grid middleware will move the executable and the data to the target system
- library dependencies often need to be resolved by linking statically or packing them to go with the application
- installing the application on the target system and using it via the Grid interface
- batch processing type applications normally work without changes, interactive applications are more difficult
- with ARC middleware this is facilitated by runtime environments (RE)
- modifying the application to fully exploit a distributed environment
- using ARC libraries
- distributing over a large geographical area is not practical unless the computation can be split into independent parts
25. Real life applications
- it's common to send several smaller jobs to the Grid to solve a larger problem
- parallel MPI jobs to a single cluster are supported (if the correct runtime environment is installed), but no MPI between clusters
- splitting the job into suitable parts and gathering the parts together is left to the user
- more error-prone environment than traditional local systems => error checking and recovery are important
- fault reporting and debugging have room for improvement
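Since error checking and recovery are left to the user, a client-side script often wraps each Grid operation in a retry loop. The following is a generic sketch of that idea, not an ARC feature: the retry helper, the flaky_fetch stand-in and the choice of retrying on any exception are all assumptions made for illustration.

```python
import time


def retry(action, attempts=3, delay=0.0):
    """Call action() until it succeeds or the attempts run out.

    Re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # a real script would be more selective
            last_error = error
            if attempt < attempts:
                time.sleep(delay)
    raise last_error


# Stand-in for an operation that fails transiently, e.g. a file download.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "results downloaded"


print(retry(flaky_fetch, attempts=5))
print("attempts used:", calls["count"])
```

Which failures are worth retrying, and how long to wait between attempts, depends on the application and the resources involved.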
26. Real life applications
- size your job to best exploit the grid
- group many short jobs into one to avoid submission overhead
- if possible, break up larger or longer jobs into independent parts
- if your job must run for a long time, checkpoint your results so that your calculation can be resumed; no resource will stay up indefinitely
- M-grid is ideally suited to jobs lasting from 1 hour to 1 day
- use file caching if it is available
- eliminates unnecessary file transfers (load on network)
- saves time needed to stage files
- saves disk space on the cluster front-ends
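The advice on grouping short jobs can be turned into a little arithmetic: pick a target job length and pack enough short tasks into each Grid job to reach it. The sketch below assumes, purely for illustration, 100 tasks of about 2 minutes each and a target of 60 minutes (the lower end of the 1 hour to 1 day range mentioned above); the function names are hypothetical.

```python
def tasks_per_job(task_minutes, target_minutes):
    """How many short tasks fit into one job of the target length."""
    if task_minutes <= 0 or target_minutes <= 0:
        raise ValueError("durations must be positive")
    return max(1, target_minutes // task_minutes)


def group_tasks(tasks, per_job):
    """Split the task list into consecutive groups of at most per_job items."""
    if per_job < 1:
        raise ValueError("per_job must be at least 1")
    return [tasks[start:start + per_job] for start in range(0, len(tasks), per_job)]


# Hypothetical workload: 100 parameter sets, roughly 2 minutes each.
tasks = [f"param_{number:03d}" for number in range(100)]
per_job = tasks_per_job(task_minutes=2, target_minutes=60)
jobs = group_tasks(tasks, per_job)

print("tasks per job:", per_job)
print("number of Grid jobs:", len(jobs))
print("sizes:", [len(group) for group in jobs])
```

Each group would then become one job description whose script loops over its tasks, so the submission overhead is paid once per group rather than once per task.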
27. Further development of ARC middleware
- Stated goal: not to undermine existing functionality and capabilities available in pre-ARC components (current stable version)
- Two SVN branches
- ARC0 (version 0.6.5, 0.8rc)
- Pre-existing production components (pre-KnowARC project)
- Backported features from KnowARC
- Nordic DataGrid Facility provides support and backports features from the KnowARC project into the current stable releases of ARC
- ARC1 (0.9.xxx)
- Next generation components developed by the KnowARC project
- More information at www.ndgf.org and www.knowarc.eu
28. What is new
- Service Oriented Architecture
- Modular structure
- Self-sufficient core components
- Interoperability built on standards
- User and developer friendly
- Business friendly open source
- License: Apache 2.0
- Portable: runs on almost all Linux variants and Solaris, porting to Windows and Mac OS in progress
- Aiming at integration into Fedora, Debian and Ubuntu
11/18/2009
www.knowarc.eu
28
29. ARC WS-based components
- Internal structure of ARC components
30. Key Feature - New ARC client
- Relies on a dedicated library
- Implemented in C++
- Python and Java bindings
- Allows easy development of application-specific clients
- Implements a user Grid toolbox
- Handling of user and host credentials
- Computing resource discovery and information retrieval
- Matchmaking, brokering and job submission
- Input/output data handling
- The new library and arc commands can handle gLite CREAM and UNICORE
- Windows and Mac OS client
- GUI, just delivered!
31. Key Feature - HED
- HED: the Hosting Environment Daemon
- Container for all the server-side functional components
- Main functions
- Routes messages between the services and the outside world
- Provides inter-service communication
- Provides a basic security infrastructure
- Consists of pluggable modules
- Light-weight (no Apache, no Axis)
32. Key Service: A-REX
- ARC Resource-coupled Execution Service
- Provides Execution Management capability
- The Grid Manager from ARC Classic as core
- Extended with a WS interface implementing the Basic Execution Service (BES)
- Accepts Job Submission Description Language (JSDL)
- Information and resource discovery: GLUE 2 schema
- Support for a wide range of Local Resource Management Systems
- Torque, PBS/OpenPBS, SGE, LoadLeveler, LSF, Condor and SLURM
- Released in ARC 0.8, available at http://wiki.nordugrid.org/index.php/ARC_v0.8
33. Key Service: New Storage
- Storage system that is distributed by design
- Global namespace
- Supports collections and subcollections to any depth
- A-Hash: a replicated database to store metadata
- Librarian: handles
- metadata and hierarchy of collections and files
- the location of replicas
- health data of the Shepherd services
- Bartender: high-level interface for the users and for other services
- Shepherd: manages storage services and provides a simple interface for storing files on storage nodes
34. Welcome to ARC
Let's begin
Off to the PC classroom! (unless the coffee is ready)
35. Abstracting the middleware
- http://technical.eu-egee.org/index.php?id=290
- Expand the functionality of the grid infrastructure for users,
- Reduce duplicated development when porting applications, and
- Speed the porting of new applications to the grid.
- GridWay Metascheduler (http://www.gridway.org/)
- The GridWay Metascheduler performs job execution management and resource brokering, allowing unattended, reliable, and efficient execution of jobs, job arrays, and workflows on heterogeneous and dynamic Grids.
- P-GRADE Portal (http://portal.p-grade.hu/)
- The Parallel Grid Run-time and Application Development Environment Portal (P-GRADE Portal) is a workflow-oriented graphical environment that covers every stage of Grid application lifecycles.
- Ganga (http://ganga.web.cern.ch/ganga/)
- Ganga is an easy-to-use frontend for job definition and management, implemented in Python. Ganga allows trivial switching between testing on a local batch system and large-scale processing on Grid resources.