Implementing Metadata Using RLS/LCG

1
Implementing Metadata Using RLS/LCG
  • James Cunha Werner
  • University of Manchester
  • http://www.hep.man.ac.uk/u/jamwer/

2
BaBar Experiment
  • The BaBar experiment studies the differences
    between matter and antimatter, to throw light on
    the problem, posed by Sakharov, of how the
    matter-antimatter symmetric Big Bang can have
    given rise to today's matter-dominated universe.
  • High energy collisions between electrons and
    positrons produce other elementary particles,
    giving tracks and clusters which are recorded by
    several high granularity detectors and from which
    the properties of the short-lived particles can
    be deduced.

3
  • Each recorded collision, called an event,
    comprises a large volume of data, and thousands of
    millions of events are recorded, giving a total
    dataset size of hundreds of thousands of
    Gigabytes (or hundreds of Terabytes).

4
Sources of Data in BaBar
5
Amount of data
Run                   Files      Size (TB)   Events (million)
Run1                  6,972      2.0         593
Run2                  11,527     6.3         1,925
Run3                  7,383      3.2         951
Run4                  16,671     12.2        3,999
Run5 (2 x Run4) ???   32,000     24          8
Run6 (2 x Run5) ???   64,000     48          16
Run7 (2 x Run6) ???   128,000    100         32
SuperBaBar!
Systematic errors >>> statistical errors
Plus the same amount of Monte Carlo generated data!
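The run figures can be checked with a little arithmetic. Below is a minimal sketch in the same bash/awk style as the scripts later in the talk, using the measured Run1 to Run4 rows of the table and the "2 x previous run" rule for the projections. Doubling Run4 gives 7,998 million events for Run5, which suggests (an inference, not stated on the slide) that the 8, 16 and 32 in the projected rows are thousands of millions of events.

```bash
#!/bin/bash
# Measured runs from the table: name, files, size (TB), events (millions).
runs='Run1 6972 2.0 593
Run2 11527 6.3 1925
Run3 7383 3.2 951
Run4 16671 12.2 3999'

# Totals over the measured runs: files, size (TB), events (millions).
totals() {
    printf '%s\n' "$runs" | awk '{ f += $2; s += $3; e += $4 }
        END { printf "%d %.1f %d\n", f, s, e }'
}

# Project the next N runs, each twice the previous one (the "2 x RunN" rule).
project() {
    printf '%s\n' "$runs" | awk -v n="$1" '
        { f = $2; s = $3; e = $4 }            # keep the last (Run4) row
        END {
            for (i = 1; i <= n; i++) {
                f *= 2; s *= 2; e *= 2
                printf "Run%d %d %.1f %d\n", 4 + i, f, s, e
            }
        }'
}

totals       # 42553 23.7 7468
project 3    # Run5 33342 24.4 7998 ... Run7 133368 97.6 31992
```

The slide's projected file counts (32,000, 64,000, 128,000) correspond to doubling a round 16,000 rather than the exact 16,671.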
6
Data Structure
  • The user interface to the eventstore is the event
    "collection". Each collection represents an
    ordered series of N events, and a user can choose
    to read the events starting from the first one in
    the sequence or from any given offset into the
    sequence.
  • Data components:
      • hdr - event header
      • usr - user data
      • tag - tag information
      • cnd - candidate information
      • aod - "analysis object data"
      • tru - MC truth data (only in MC data)
      • esd - "event summary data"
      • sim - simulation data from BgsApp or MooseApp,
        such as GHits/GVertices (only in MC data)
      • raw - subset of the raw data from xtc, persisted
        in the Kanga eventstore
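The collection interface described above (an ordered series of N events, read from the first event or from a given offset) can be sketched as follows. This is a hypothetical model only: the collection is a plain text file with one event identifier per line, which is not how the Kanga eventstore stores collections, and `read_collection` is an illustrative name.

```bash
#!/bin/bash
# Hypothetical model: a collection is a text file listing its N events in order,
# one event identifier per line (real eventstore collections are not plain files).
read_collection() {
    # $1 = collection file, $2 = offset (0 = start at the first event)
    offset=${2:-0}
    n=$(awk 'END { print NR }' "$1")      # N, the number of events
    if [ "$offset" -lt 0 ] || [ "$offset" -gt "$n" ]; then
        echo "offset $offset outside 0..$n" >&2
        return 1
    fi
    tail -n "+$((offset + 1))" "$1"       # skip the first $offset events
}

coll=$(mktemp)
trap 'rm -f "$coll"' EXIT
printf 'evt%03d\n' 0 1 2 3 4 > "$coll"    # a 5-event collection
read_collection "$coll" 3                  # evt003, then evt004
```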

7
Data organisation
  • How data are stored (level of detail):
      • micro = hdr + usr + tag + cnd + aod (+ tru)
      • mini = micro + esd
  • Data access:
      • collections - these are "logical" names that
        users use to configure their jobs. They are
        site-independent, so (assuming the site has
        imported the data) the same collection name
        should work at any site.
      • logical file names (LFN) - these are
        site-independent names given to all files in the
        eventstore. Any references within the event data
        itself _must_ use LFNs so that they remain
        valid when the files are moved from site to site.
      • physical file names (PFN) - these are file names
        that vary from site to site. In practice
        they are usually derived from the LFNs by adding
        a prefix that encapsulates how the data are
        accessed at that site.
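A minimal sketch of the two ideas on this slide: the micro and mini levels of detail as component lists, and a PFN derived from an LFN by adding a site prefix. The function names, the site names and both prefixes are hypothetical; real prefixes depend on how each site serves its data.

```bash
#!/bin/bash
# Levels of detail from the slide:
#   micro = hdr + usr + tag + cnd + aod (+ tru, MC only);  mini = micro + esd
components() {
    # $1 = micro|mini, $2 = data|mc
    list="hdr usr tag cnd aod"
    if [ "$2" = mc ]; then list="$list tru"; fi
    if [ "$1" = mini ]; then list="$list esd"; fi
    echo "$list"
}

# PFN = site-specific prefix + site-independent LFN.
# Both sites and prefixes are hypothetical examples.
lfn_to_pfn() {
    # $1 = LFN, $2 = site name
    case "$2" in
        siteA) prefix="/data/babar/kanga" ;;
        siteB) prefix="root://xrootd.example.org//store" ;;
        *) echo "unknown site: $2" >&2; return 1 ;;
    esac
    echo "$prefix$1"
}

components mini mc                          # hdr usr tag cnd aod tru esd
lfn_to_pfn /prod/run4/file001.root siteA    # /data/babar/kanga/prod/run4/file001.root
```

Because only the prefix changes between sites, references stored inside the event data can keep using the LFN unchanged.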

8
(No Transcript)
9
Feeding RLS with metadata
  • Generation of the basic metadata file with the file
    selection:
        #!/bin/bash
        BbkDatasetTcl --dbsite=local > MetaLista.txt
        awk '{print "BbkDatasetTcl --site local --nolocal \"" $1 "\""}' MetaLista.txt >> geratcl
        chmod 700 geratcl
        ./geratcl
  • Feeding RLS with the basic files:
        #!/bin/bash
        ls *.tcl | awk '{split($1, a, "."); print "edg-rm --vo babar cr file:///home/jamwer/PgmCM2/MetaData/" $1 " -l lfn:" a[1] " > " a[1] ".rlstok"}' >> alimrls
        chmod 700 alimrls
        ./alimrls

10
Conformity CE catalogue
  • Run evaluation software to establish CE
    conformity and perform the catalogue update:
        #!/bin/bash
        ldapsearch -x -H ldap://lcgbdii02.gridpp.rl.ac.uk:2170 -b 'Mds-vo-name=local,o=Grid' '(&(objectClass=GlueCE)(GlueCEAccessControlBaseRule=VO:babar))' | grep "^GlueCEUniqueID:" > cenames.txt
        awk '{print "./catal " $2}' cenames.txt > subload.sh
        chmod 700 subload.sh
        ./subload.sh
        cat loadrlssubm >> 1.histo
        awk '/Sub/ {FileName = $2} /https/ {HandleName = $2; print "echo " HandleName " > " FileName ".tok"}' 1.histo >> gridtok
        chmod 700 gridtok
        ./gridtok

11
Conformity validation
  • Verify whether the site follows the experiment
    standards:
        #!/bin/bash
        echo Hostname
        /bin/hostname
        echo Start time
        /bin/date
        echo
        local=`pwd`
        echo Babar initialisation
        . $VO_BABAR_SW_DIR/babar-grid-setup-env.sh
        echo
        echo "Environment variables"
        printenv
        echo
        cd $local
        echo Available files in $local
        ls
        echo
        echo " - - - - - - - - - - - - - - - - - - - - - - - - "
        echo
        cd $BFDIST/releases/14.5.2
        srtpath 14.5.2 Linux24RH72_i386_gcc2953
        cd $local
        BbkDatasetTcl --dbsite=local > MetaLista.txt
        awk '{print "BbkDatasetTcl --site local \"" $1 "\""}' MetaLista.txt >> geratcl
        chmod 700 geratcl
        ./geratcl
        export CE_NAME=$1
        ls *.tcl | awk -v site=$CE_NAME '{split($1, a, "."); print "edg-rm --vo babar addAlias `cat " $1 "` lfn:" a[1] "." site}' >> alimrls
        chmod 700 alimrls
        ./alimrls
        echo
        echo " - - - - - - - - - - - - - - - - - - - - - - - - "
        echo
        echo End time
        /bin/date

12
Analysis Submission to the Grid
(Prototype)
  • Single command: ./easygrid dataset_name
  • Performs handler management and submission
  • Configurable to meet users' requirements
  • Software based on a state machine:
      • Verify whether skim data are available.
      • If not, run BbkDatasetTcl to generate the skim
        data. Each file will be a job.
      • Verify whether there are handlers pending.
      • If not, generate the scripts (gera.c) with
        edg-job-submit and ClassAds, and execute them.
        This is the place for submission policy and
        optimisation.
      • If yes, verify the job status. When all jobs
        have ended, recover the results into the user's
        folder.
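The easygrid decision cycle above can be sketched as a small state machine. This is not easygrid's code: the directory layout (skim *.tcl files and *.tok handle files side by side, the latter following the handle files written by the CE catalogue script) is an assumption, and the grid commands `edg-job-submit` and `edg-job-status` are only printed, never executed.

```bash
#!/bin/bash
# Sketch of easygrid's decision logic as a small state machine (dry run).
# Hypothetical layout: skim data for a dataset are *.tcl files in one directory,
# and each submitted job leaves its grid handle in a *.tok file there.
next_state() {
    dir=$1
    if ! ls "$dir"/*.tcl >/dev/null 2>&1; then
        echo generate_skim      # no skim data yet
    elif ! ls "$dir"/*.tok >/dev/null 2>&1; then
        echo submit             # skim data present, no handlers pending
    else
        echo check_status       # handlers pending
    fi
}

easygrid_step() {
    dir=$1
    case "$(next_state "$dir")" in
        generate_skim)
            echo "would run BbkDatasetTcl for $(basename "$dir")" ;;
        submit)
            # one job per skim file; gera would write the matching JDL
            for tcl in "$dir"/*.tcl; do
                echo "edg-job-submit $(basename "$tcl" .tcl).jdl"
            done ;;
        check_status)
            # once every job has ended: edg-job-get-output, copy to the user folder
            for tok in "$dir"/*.tok; do
                echo "edg-job-status $(cat "$tok")"
            done ;;
    esac
}

work=$(mktemp -d)
easygrid_step "$work"              # would run BbkDatasetTcl for ...
touch "$work/skim01.tcl"
easygrid_step "$work"              # edg-job-submit skim01.jdl
```

Deriving the state from what is on disk, rather than keeping it in memory, lets the same single command be re-run until the dataset has been processed.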

13
Job Submission system, metadata and data
14
Metadata/Event files and Computing Elements
For each dataset there is a metadata file
containing the names of the event files. These
physical files are registered with the RLS, with
several logical file names in the format
datasetname_CEJobQueue assigned to them as
aliases, showing the CEs which contain copies of
that dataset. Searching all the aliases for a
dataset name provides a list of CEs to which jobs
can be submitted.
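Searching the aliases for a dataset name reduces to a prefix match. A minimal sketch, assuming the alias list has already been retrieved from the RLS (how is not shown) and arrives one alias per line; `ces_for_dataset` and the dataset and CE names are made up.

```bash
#!/bin/bash
# Given the aliases registered for a dataset's metadata file (one per line, in the
# <dataset>_<CEJobQueue> form), print the CEs that hold a copy.
# How the alias list is retrieved from the RLS is not shown; stdin stands in for it.
ces_for_dataset() {
    # $1 = dataset name
    awk -v ds="$1" '
        { sub(/^lfn:/, "") }                  # accept aliases with or without "lfn:"
        index($0, ds "_") == 1 { print substr($0, length(ds) + 2) }'
}

# Hypothetical dataset and CE names:
printf '%s\n' \
    'lfn:Tau-Run4_ce01.example.org:2119/jobmanager-lcgpbs-babar' \
    'lfn:Tau-Run4_ce02.example.org:2119/jobmanager-lcgpbs-long' \
    'lfn:Tau-Run3_ce03.example.org:2119/jobmanager-lcgpbs-babar' |
    ces_for_dataset Tau-Run4
```

`index` does a literal (non-regex) match, so dataset names containing characters such as "." or "+" are handled safely.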
15
Managing large files in the Grid
  • The analysis executable is stored in the SE
    and its logical file name (LFN) is also
    catalogued in the RLS, so each WN needs to
    download it only once.
  • Metadata are used not only for data but also to
    support other files.
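One way a worker node could honour "download it only once" is a local cache keyed by the LFN. A minimal sketch; `CACHE_DIR`, `cached_copy` and `fetch_cmd` are hypothetical names, and `fetch_cmd` stands in for whatever grid copy command the site actually uses.

```bash
#!/bin/bash
# Sketch of a per-WN cache so a file catalogued in the RLS is fetched only once.
# Assumptions: CACHE_DIR is a hypothetical node-local directory, and fetch_cmd is a
# placeholder for the real grid copy command (called as: fetch_cmd LFN LOCAL_PATH).
CACHE_DIR=${CACHE_DIR:-/tmp/babar-cache}
fetch_cmd=${fetch_cmd:-false}   # "false" = no fetcher configured, so copies fail

cached_copy() {
    # $1 = LFN; prints the local path of the cached copy
    name=$(printf '%s' "$1" | sed 's|^lfn:||; s|/|_|g')
    local_path="$CACHE_DIR/$name"
    if [ ! -s "$local_path" ]; then          # missing or empty: fetch it
        mkdir -p "$CACHE_DIR" || return 1
        $fetch_cmd "$1" "$local_path.part" || return 1
        mv "$local_path.part" "$local_path" || return 1
    fi
    echo "$local_path"
}
```

Fetching into a `.part` file and renaming it afterwards keeps a failed or interrupted download from being mistaken for a cached copy. A job would call, for example, `exe=$(cached_copy lfn:/babar/user/myAnalysisApp)` with `fetch_cmd` pointing at the site's copy tool.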

16
Gera
  • Generates all the information needed to submit
    the jobs on the Grid:
      • Job Description Language (JDL) files
      • the script with all the tasks needed to run the
        analysis remotely on a WN
      • some Grid-dependent analysis parameters.
  • The JDL files define the input sandbox with all
    the files that need to be transferred.
  • The WN load-balancing algorithm matches the
    requirements so that the task is performed optimally.
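A gera-like generator could write one JDL file per skim file. A minimal sketch: the attribute names (Executable, Arguments, InputSandbox, OutputSandbox, Requirements, ...) are standard EDG/LCG JDL attributes, but `write_jdl`, the wrapper name `run_analysis.sh`, the file layout and the use of `other.GlueCEUniqueID` to pin a job to one CE are illustrative assumptions, not gera's actual output.

```bash
#!/bin/bash
# Sketch of a gera-like generator: one JDL file per skim .tcl file.
# run_analysis.sh is a hypothetical wrapper script shipped in the input sandbox.
write_jdl() {
    # $1 = skim .tcl file, $2 = target CE (GlueCEUniqueID)
    job=$(basename "$1" .tcl)
    cat > "$job.jdl" <<EOF
Executable    = "run_analysis.sh";
Arguments     = "$job.tcl";
StdOutput     = "$job.out";
StdError      = "$job.err";
InputSandbox  = {"run_analysis.sh", "$1"};
OutputSandbox = {"$job.out", "$job.err"};
VirtualOrganisation = "babar";
Requirements  = other.GlueCEUniqueID == "$2";
EOF
    echo "$job.jdl"
}
```

For example, `write_jdl skim01.tcl <CE>` writes skim01.jdl in the current directory, which would then be passed to edg-job-submit. The .tcl file is listed with its path in the InputSandbox but passed by its basename in Arguments, because sandbox files land in the job's working directory on the WN.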

17
Running analysis programs
When the task is delivered to the WN, scripts
start running to initialise the specific BaBar
environment, and the analysis software is
downloaded.
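The start-up sequence on the WN can be sketched as a wrapper script. A minimal sketch in dry-run form: the setup script path is taken from the conformity-validation script, while `wn_main`, `fetch_analysis_software` and `./analysis_executable` are hypothetical placeholders.

```bash
#!/bin/bash
# Sketch of a worker-node wrapper following the steps described above.
# fetch_analysis_software and ./analysis_executable are hypothetical names.
# With DRY_RUN=1 (the default here) each step is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

wn_main() {
    # $1 = skim .tcl file delivered in the input sandbox
    echo "host: $(uname -n)  start: $(date)"
    # 1. initialise the BaBar environment (same setup script as the validation script)
    run . "$VO_BABAR_SW_DIR/babar-grid-setup-env.sh"
    # 2. download the analysis software
    run fetch_analysis_software
    # 3. run the analysis on this job's share of the dataset
    run ./analysis_executable "$1"
    echo "end: $(date)"
}

wn_main skim01.tcl
```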
18
Benchmarks
Behavior of particles in the BaBar
Electromagnetic Calorimeter (EMC)
  • The different behavior of electrons, hadrons, and
    muons can be distinguished.
  • Performing this analysis would take 7 days on one
    computer running 24 hours a day.
  • Using 10 CPUs in parallel, accessed via the Grid,
    it took only 8 hours.
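Taking the usual definition of speed-up, S = T_1/T_N (wall-clock time on one computer divided by wall-clock time on N CPUs), the figures above give:

```latex
\[
T_1 = 7\,\text{days} \times 24\,\text{h/day} = 168\,\text{h}, \qquad
T_{10} = 8\,\text{h}, \qquad
S = \frac{T_1}{T_{10}} = \frac{168\,\text{h}}{8\,\text{h}} = 21
\]
```

S is larger than the number of CPUs (10); the slide does not break the timings down, so the comparison is only indicative.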

19
  • π⁻ N π⁰ decays, with N = 1, 2, 3 and 4
  • Invariant masses of photon pairs from π⁰
    decays, as measured by the EMC, produce a mass
    peak at 135 MeV (the peak in the plot). All other
    combinations are spread randomly over all
    energies (background).
  • There were 81,700,000 events in the dataset, and
    it took 4 days to run in production with 26 jobs
    in parallel; running it on one single computer
    would take more than 3 months.
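A rough check of these numbers, assuming 3 months is about 90 days and the jobs are of equal size (neither is stated on the slide):

```latex
\[
\frac{81\,700\,000\ \text{events}}{26\ \text{jobs}} \approx 3.14 \times 10^{6}\ \text{events per job},
\qquad
S \gtrsim \frac{3 \times 30\ \text{days}}{4\ \text{days}} = 22.5
\]
```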

20
Summary
  • Easygrid is working and provides the complete job
    submission structure using the LCG Grid, RLS and
    metadata management.
  • Provides handler management that is transparent
    to the user.
  • Easy to use!
  • Configurable to meet users' requirements, and
    possibly for other experiments as well.
  • See the homepage http://www.hep.man.ac.uk/u/jamwer/
    for more details.
  • Thanks for the opportunity!