Title: Grape for analysis
1- Grape for analysis
- M. Corvo, F. Fanzago, N. Smirnov
- INFN Padova
2- Goals
- To show how we plan to implement real analysis jobs with GRAPE.
- GRAPE has already been used to run some analysis jobs, but it is necessary to add functionality (such as data discovery via PubDB and automatic retrieval of the output...), to evaluate the architecture and to test it.
- GRAPE was developed to run production; now we want to concentrate on analysis tasks only.
3- What the user should provide...
... as information written into the grape.cfg file (a sketch follows below):
a) The analysis input parameters: dataset and owner
b) The number of events to analyse for each job (job splitting)
c) The name of the ORCA executable to run on the WN
d) The name of the output file produced by the executable (root file)
e) The user orcarc card
... and what GRAPE does with it:
- GRAPE finds the executable and the libraries in the user SCRAM area, in order to pack them and include the archive in the JDL InputSandbox
- GRAPE modifies the orcarc card according to the job splitting and includes it in the JDL InputSandbox
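A minimal sketch of what grape.cfg could contain; the key names and values below are purely illustrative, not the actual GRAPE parameter names:

  # grape.cfg -- hypothetical key names, shell-style assignments
  DATASET=MyDataset          # a) dataset to analyse
  OWNER=MyOwner              # a) owner of the dataset
  EVENTS_PER_JOB=1000        # b) number of events analysed by each job
  EXECUTABLE=MyAnalysisExe   # c) ORCA executable to run on the WN
  OUTPUT_FILE=MyHisto.root   # d) root file produced by the executable
  ORCARC=.orcarc             # e) user orcarc card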
4- GRAPE workflow
1) Read the grape.cfg file
2) Create the scripts to submit:
   a) data discovery (querying PubDB)
   b) packaging of the user code
   c) modification of the orcarc card
   d) creation of the shell script to run on the WN (wrapper of the ORCA executable)
3) Create the JDL files
4) Submit the jobs to the Grid (without BOSS in the first prototype)
5) Automatic job output retrieval
A top-level sketch of this sequence is shown below.
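Purely to illustrate the order of the steps, a driver could look like the following sketch; every helper script name is hypothetical:

  #!/bin/sh
  # Hypothetical GRAPE driver: one call per phase of the workflow above
  . ./grape.cfg                      # 1) read the user configuration
  ./grape_discovery.sh > ce_list     # 2a) query PubDB: sites and local catalogues
  ./grape_package.sh                 # 2b) pack the user code from the SCRAM area
  ./grape_split_orcarc.sh            # 2c) one orcarc per job (FirstEvent/MaxEvents)
  ./grape_make_wrapper.sh            # 2d) shell script that runs ORCA on the WN
  ./grape_make_jdl.sh                # 3) one JDL file per job
  for jdl in job_*.jdl; do           # 4) submission (no BOSS in the first prototype)
      edg-job-submit -o jobids "$jdl"
  done
  ./grape_get_output.sh jobids       # 5) automatic output retrieval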
5- How GRAPE uses user information (1)
- Data discovery
  - Query the CERN PubDB to discover where the data are stored (by the RC name field). Possibly more than one site.
  - The sites storing the data are written as a requirement into the JDL file, so that the Resource Broker is driven to match one of them as the resource where to submit the analysis job. The RB decides where to send the job (see the sketch below).
  - The same query also returns the location of the local catalogues (and the access protocol) for all the sites.
- Local catalogue
  - This information is sent with the job via the InputSandbox (catalogs_file).
  - On the WN, catalogs_file is used to get the correct POOL catalogue, depending on the site, and to put it into the orcarc card.
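As a sketch of how the discovered sites could be turned into a JDL requirement, assuming the PubDB query leaves one CE host name per line in a file called ce_list (the ClassAd attribute used for the match is an assumption as well):

  # Build an OR of the CEs at the sites that host the data
  REQ=""
  while read ce; do
      [ -n "$REQ" ] && REQ="$REQ || "
      REQ="${REQ}RegExp(\"$ce\", other.GlueCEUniqueID)"
  done < ce_list
  # The resulting expression is later written into the JDL as:
  #   Requirements = $REQ;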
6- How GRAPE uses user information (2)
Packaging of the code and modification of the card
The name of the analysis executable is needed to package the code and the related libraries into a tgz archive to be sent via the InputSandbox. The environment variable LOCALRT provides the path of the user SCRAM area. The orcarc card provided by the user is modified by GRAPE according to the job splitting (will PubDB publish the total number of events of a dataset-owner?), which means changing the FirstEvent and MaxEvents parameters. A sketch of these two steps follows below.
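A sketch of the packaging and of the orcarc splitting; the variable names come from the hypothetical grape.cfg shown earlier, and NJOBS is assumed to be computed from the total number of events divided by EVENTS_PER_JOB:

  #!/bin/sh
  . ./grape.cfg
  # LOCALRT points to the user SCRAM area: pack binaries and libraries
  # so that they can be shipped in the InputSandbox
  tar czf user_code.tgz -C "$LOCALRT" bin lib
  # One orcarc card per job: job n analyses EVENTS_PER_JOB events
  # starting from event (n-1)*EVENTS_PER_JOB + 1
  n=1
  while [ $n -le $NJOBS ]; do
      first=`expr \( $n - 1 \) \* $EVENTS_PER_JOB + 1`
      # drop any FirstEvent/MaxEvents already in the user card, then append ours
      grep -v -e '^FirstEvent' -e '^MaxEvents' "$ORCARC" > orcarc_$n
      echo "FirstEvent = $first"          >> orcarc_$n
      echo "MaxEvents  = $EVENTS_PER_JOB" >> orcarc_$n
      n=`expr $n + 1`
  done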
Creation of the JDL to submit to the Grid
The InputSandbox is filled with: 1) the tgz archive of the user code, 2) the orcarc card, 3) the catalogs_file obtained from PubDB. The OutputSandbox is defined with: 1) the root output file, 2) the std.out and std.err of the Grid job. A sketch of such a JDL follows below.
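A sketch of the JDL that could be generated for job 1; the file names are the illustrative ones used above, and REQ is the Requirements expression built from the PubDB result:

  cat > job_1.jdl <<EOF
  Executable    = "grape_wrapper.sh";
  Arguments     = "1";
  StdOutput     = "std.out";
  StdError      = "std.err";
  InputSandbox  = {"grape_wrapper.sh", "user_code.tgz", "orcarc_1", "catalogs_file"};
  OutputSandbox = {"MyHisto_1.root", "std.out", "std.err"};
  Requirements  = $REQ;
  EOF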
7- How GRAPE uses user information (3)
Creation of the script that runs on the WN, which:
1) sets the CMS environment to run ORCA in the LCG environment
2) creates the SCRAM area
3) unpacks the user code into the SCRAM area
4) overwrites the InputFileCatalogURL in the orcarc card with the correct POOL catalogue to use, selected from catalogs_file according to the site where the job is running; possibly the local catalogue is copied if needed (e.g. with the RFIO protocol)
5) runs the executable
6) renames the output file according to the job splitting (mv MyHisto.root MyHisto_n.root)
7) returns the produced output (root file) to the user via the OutputSandbox (no staging of the output to a SE and registration in the RLS)
A minimal sketch of such a wrapper is shown below.
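The following wrapper is a sketch only: the environment setup script, the ORCA version, the catalogs_file format and the executable/output names are all assumptions.

  #!/bin/sh
  # grape_wrapper.sh <job number> -- sketch of the script executed on the WN
  N=$1
  WORKDIR=`pwd`
  # 1) CMS environment on an LCG WN (the setup script path is site/VO dependent)
  . $VO_CMS_SW_DIR/cmsset_default.sh
  # 2) create a SCRAM area for the ORCA version used to build the executable
  scram project ORCA ORCA_8_4_0
  cd ORCA_8_4_0
  # 3) unpack the user code (binaries and libraries) shipped in the InputSandbox
  tar xzf $WORKDIR/user_code.tgz
  eval `scram runtime -sh`
  # 4) pick the POOL catalogue of this site from catalogs_file (assumed format
  #    "<site> <contact string>"); the exact orcarc syntax is not reproduced here
  CATALOG=`grep "$SITE_NAME" $WORKDIR/catalogs_file | cut -d' ' -f2`
  cp $WORKDIR/orcarc_$N .orcarc            # ORCA reads .orcarc in the working directory
  echo "InputFileCatalogURL = $CATALOG" >> .orcarc
  # 5) run the executable (its path inside the unpacked user code is an assumption)
  ./bin/MyAnalysisExe
  # 6) rename the output according to the job splitting
  mv MyHisto.root MyHisto_$N.root
  # 7) the root file goes back to the user via the OutputSandbox
  mv MyHisto_$N.root $WORKDIR/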
Submission of the job to the Grid is done via the edg-job-submit command, possibly through BOSS. The monitoring is done via the Grid command edg-job-status; in the future we are thinking of using BOSS or GridICE (with the application monitoring implementation).
Retrieval of the output: a wrapper script around the edg-job-get-output command that, when the job has finished, automatically retrieves the output and puts the files into a user-defined directory (a sketch follows below).
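A sketch of the retrieval wrapper, assuming that the job identifiers were saved at submission time with edg-job-submit -o jobids and that the EDG UI options used here (-i, --dir, --noint) are available in the installed version:

  #!/bin/sh
  # grape_get_output.sh <jobid file> [output dir] -- sketch of the retrieval wrapper
  JOBIDS=$1
  OUTDIR=${2:-$HOME/grape_output}
  mkdir -p "$OUTDIR"
  # Wait until no job is in an active state any more (state list simplified)
  while edg-job-status --noint -i "$JOBIDS" | grep -E -q 'Running|Scheduled|Waiting|Ready'; do
      sleep 300
  done
  # Fetch the OutputSandbox (root file, std.out, std.err) of every job
  # into the user-defined directory
  edg-job-get-output --noint --dir "$OUTDIR" -i "$JOBIDS"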
8- What is done and what is still to do
The general architecture is already in place. GRAPE has already been used to run analysis in the LCG environment.
We are implementing:
- the connection with PubDB and the modification of the shell scripts (Nikolai and Federica)
- the software packaging and the automatic output retrieval (Marco)
- the monitoring is still to do...
We plan to have a running prototype by the end of next week. We would be happy if people tried it and provided feedback!