Title: PowerPoint Presentation
Nicola De Filippis
Dipartimento Interateneo di Fisica dell'Università e del Politecnico di Bari and INFN
- Outline
- goals of the procedure
- preparation of the POOL catalogues
- attaching the runs to the META data, and the validation
- conclusions
Goals of the procedure:
- to provide easy and fast access to the data locally for the people performing analysis on physics channels
- to test the handling of the MCinfo, Hit, Digi and Pileup information in a Tier-1/2 site, also via Grid tools
- to understand the relationship between the META data and the event files in COBRA and POOL (still not so clear!)
- to gain experience with data management and transfer
Datasets (PCP04) and POOL XML catalogues with Hits, Digis and pileup fully attached to virgin META data at Bari:

Signal events:
- eg03_hzz_2e2mu_160, eg03_hzz_2e2mu_170, eg03_hzz_2e2mu_180, eg03_hzz_2e2mu_190, eg03_hzz_2e2mu_200, eg03_hzz_2e2mu_250, eg03_hzz_2e2mu_300, eg03_hzz_2e2mu_450, eg03_hzz_2e2mu_500, eg03_hzz_2e2mu_600
- hg03_hzz_2e2mu_115a, hg03_hzz_2e2mu_120a, hg03_hzz_2e2mu_130a, hg03_hzz_2e2mu_140a, hg03_hzz_2e2mu_150a

Background:
- eg03_zz_2e2mu
- hg03_zbb_2e2mu_compHEP
- eg03_tt_2e2mu (copying)
- hg03_zbb_cc_2e2mu_compHEP, hg03_zbb_lc_2e2mu_compHEP (the last two still in production)

- The samples not available locally were transferred using castorgrid or SRB (a transfer sketch follows below).
- A set of scripts was created in order to prepare the local catalogues and attach the runs.
- The analysis jobs ran over a cluster (70 CPUs) with a hybrid configuration, in order to run CMS production and analysis both locally and in a Grid environment.

2. D. Giordano is one of the people responsible for the H → ZZ → 2e2μ analysis.
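As an illustration of the transfer step, here is a minimal sketch of how a missing sample could be pulled either through the castorgrid GridFTP door or with the SRB Scommands; the CASTOR and SRB paths, the local destination and the file name are assumptions for illustration, not the actual ones used at Bari:

    # via castorgrid (GridFTP door to CASTOR, needs a valid Grid proxy);
    # source and destination paths are hypothetical
    globus-url-copy \
        gsiftp://castorgrid.cern.ch/castor/cern.ch/cms/PCP04/eg03_hzz_2e2mu_200/EVD0_file.root \
        file:///local/data/eg03_hzz_2e2mu_200/EVD0_file.root

    # via SRB, after authenticating with Sinit;
    # the SRB collection path is hypothetical
    Sget /cms/PCP04/eg03_hzz_2e2mu_200/EVD0_file.root /local/data/eg03_hzz_2e2mu_200/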
(a) Preparation of the local POOL XML catalogues, in a few steps:
- Download the virgin (without runs) META data from CERN, http://cmsdoc.cern.ch/cms/production/www/cgi/data/META, and prepare the related POOL XML catalogue.
- Prepare the POOL XML catalogue of the HITs and DIGIs runs by extracting the compressed POOL string of the runs from RefDB (the pileup data and catalogue are assumed to be already local).
- Publish the POOL catalogues of the META data, hits, digis and pileup in just one complete POOL file catalogue.
- Change the physical filenames of the files in the catalogue according to the local path of the files, or to the rfio path.
- Make sure that the META data are accessed locally and not via rfio.
The POOL catalogue is READY to be used (a sketch of the steps follows below).
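A minimal shell sketch of the download and renaming steps, assuming the POOL command-line tools of that release; the dataset name, catalogue contact string, dummy and local paths and the exact FCrenamePFN options are assumptions, and the real commands are those shipped in kit_for_analysis.tar.gz:

    #!/bin/sh
    # hypothetical dataset and catalogue names, for illustration only
    DATASET=eg03_hzz_2e2mu_200
    CATALOG=xmlcatalog_file:PoolFileCatalog.xml

    # (1) download the virgin META data from the CERN web area
    wget "http://cmsdoc.cern.ch/cms/production/www/cgi/data/META/${DATASET}"

    # (4) replace a dummy PFN with the local path of the file
    #     (this is the step that gets slow on large catalogues)
    FCrenamePFN -p dummy/path/EVD0_file.root \
                -n /local/data/${DATASET}/EVD0_file.root \
                -u ${CATALOG}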
(b) Attaching the runs to the virgin META data, in a few steps:
- Extract the CARFResume runid string of the Digis from RefDB, or from the summary files if available.
- Attach the runs, fix the final collection, and check the attached META data with dsDump.
- Validate by running the ExSimHitStatistics and ExDigiStatistics ORCA executables to check the access to the hits and digis locally (a sketch follows below).
The data sample is READY to be analysed.
From my experience, running the ORCA analysis codes needs the attachment of the DIGI runs alone, together with access to the META data and to the EVD of the hits (which do not need to be attached).
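A minimal sketch of the validation step, assuming a standard SCRAM-based ORCA working area; the area path is hypothetical, and the dataset/owner to be checked are taken from the local .orcarc:

    #!/bin/sh
    # enter the ORCA project area and set up the runtime environment
    cd $HOME/ORCA_area
    eval `scram runtime -sh`

    # run the two validation executables over the freshly attached runs
    ExSimHitStatistics   # checks that the hits are accessible locally
    ExDigiStatistics     # checks that the digis are accessible locally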
- All the procedures are based on a parser of the RefDB web pages and depend strongly on the structure of the RefDB tables, which can change because of multiple hit or digi fields used for tests, or because of empty fields in the tables.
- It happens that after the decompression of the POOL strings some spurious characters (like 41435a) remain in the POOL fragment related to a run; in this case you have to remove them using the sed command already included in the scripts (see the sketch after this list).
- In addition to the expected META data related to the right owners, other ones (mostly Configuration files) sometimes need to be downloaded and published in the POOL XML catalogue. This problem is related to a sometimes wrong initialization procedure of the Digi META data at CERN and cannot be avoided; 5-10% of the datasets should be affected by it.
- There are problems related to an old version of POOL: the catalogue has to be migrated to the new one with the command XMLmigrate_POOL1toPOOL1.4.
- FCrenamePFN is very slow with large XML catalogues when replacing the dummy path with the local path of the files!
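A sketch of the character cleanup, assuming the spurious bytes show up as the literal string 41435a inside the decompressed run fragment; the file names are hypothetical and the sed call shipped with the scripts may differ:

    # strip the spurious characters left over by the decompression of the
    # POOL string before the fragment is merged into the catalogue
    sed 's/41435a//g' run_fragment.xml > run_fragment_clean.xml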
- Sometimes the CARFResume runid string in the smry files is different from the one in RefDB because of multiply submitted jobs (in RefDB the tables are updated only the first time you send the smry file), so it is better to extract the runids from RefDB in order to access validated information.
- Sometimes the runid as extracted from the RefDB tables is not correct, so the script has to be tuned to work properly (only the field number has to be changed; see the sketch below).
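To make the field-number tuning concrete, a hedged sketch of the kind of extraction involved; the row format and the field position are assumptions about the parser, not the actual RefDB schema:

    # pick the runid out of a whitespace-separated RefDB table row;
    # when RefDB grows an extra test hit/digi column, only FIELD changes
    FIELD=3
    awk -v f=${FIELD} '{ print $f }' refdb_table_row.txt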
- Publishing data for analysis is possible in a Tier-1/2 now!
- The global procedure can be optimized and automatized (this is under discussion in DAPROM).
- I'm in contact with KA (A. Schmidt) about creating scripts for analysis job submission on the Grid.
All the scripts and the documentation are available in the file kit_for_analysis.tar.gz at http://webcms.ba.infn.it/cms-software/orca
For information, mail to Nicola.Defilippis_at_ba.infn.it