1
Software Packaging with DAR
  • Natalia Ratnikova, Anzar Afaq, Greg Graham
  • Fermilab
  • Tony Wildish, Veronique Lefebure
  • CERN

2
Introduction - Motivation
  • The Compact Muon Solenoid (CMS) HEP experiment will
    run on the LHC accelerator at CERN.
  • CMS is using GRID technologies to utilize
    available computing resources for the worldwide
    distributed Monte Carlo event production.
  • To make this possible CMS software applications
    must be brought to the production sites.

3
Introduction - Scope
  • CMS software includes a wide range of
    inter-related projects and external tools managed
    by the Software Configuration, Release and
    Management tool (SCRAM).
  • A complete installation of the CMS software and
    environment on remote sites is not an easy task,
    and it is not actually required in order to run
    ready-to-run applications.

4
Introduction - Goal
  • The USCMS software and computing project goal was
    to move CMS MC production in the US completely
    onto GRID computing resources.
  • We wanted to have an automated way to create
    self-consistent distributions of the
    applications, based on the software released at
    CERN.
  • The Distribution After Release (DAR) tool was
    developed at Fermilab for quick and easy
    deployment of CMS software applications that
    can run on systems that do not have a
    pre-existing CMS environment.

5
DAR concept
  • DAR automatically creates and installs software
    applications based on the runtime environment.
  • An application is a complete, self-contained
    software program, including all required shared
    libraries and other files, that can be executed
    in a particular environment to accomplish a
    particular computing task.
  • The runtime environment is the set of UNIX shell
    environment variables used by the program at
    runtime.

6
Concept - Choices, Decisions
  • There is a class of tools and utilities, such as
    the operating system kernel and the loader, that,
    though needed by the applications, are usually
    present on the remote computing node.
  • It is hard to define a clear border between the
    application and the operating system, so one
    sometimes has to decide what to include in the
    distribution, and what must be pre-installed by
    the local system administrator.
  • In CMS software these issues are controlled
    through the project's configuration, which
    specifies the required tools and the
    corresponding environment.

7
Concept - Conditionals
  • Application software must be relocatable.
  • This is the most important and most natural
    requirement. Most real, quality software products
    are relocatable, and the location is usually
    controlled through a shell environment variable.
  • No hard-coded absolute paths in the program or in
    the shared libraries (except those referring to
    the system area).
  • All executables are found in the PATH.
  • DAR distributions rely on system compatibility.
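The "no hard-coded absolute paths" condition can be checked mechanically. The sketch below is not part of DAR; it assumes that the build-time installation prefix is known and simply looks for that prefix inside the bytes of a binary or library. The function name and the example prefix are hypothetical.

```python
def hardcoded_offsets(data, build_prefix):
    """Return the byte offsets at which build_prefix occurs in data.

    An empty result suggests the file does not embed the build-time
    location; it does not prove the file is relocatable.
    """
    if not build_prefix:
        raise ValueError("build_prefix must not be empty")
    needle = build_prefix.encode()
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


# Hypothetical binary content with two embedded copies of the prefix.
sample = b"abc/build/cms/lib\0/usr/lib\0/build/cms/bin"
print(hardcoded_offsets(sample, "/build/cms"))
```

Paths that point into the system area, such as /usr/lib in the sample, are deliberately not reported, in line with the exception stated on the slide.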

8
DAR Implementation
  • DAR is implemented in scripting languages; no
    compilation is required.
  • The core of the DAR code is written in Perl.
  • Interfaces and extensions are written in Python.
  • DAR code can simply be downloaded from the CVS
    repository or from the web, and can be used
    immediately.
  • In the CMS environment:
    dar -c <top release directory> <temporary directory>
  • On the remote site:
    dar -i <distribution darball> <installation directory>

9
Implementation - Shared Libraries
  • DAR will walk through the directories specified
    in the LD_LIBRARY_PATH environment variable and
    package all libraries found into the
    distribution. It will ensure that, upon
    installation, the runtime environment scripts
    set LD_LIBRARY_PATH properly and in the correct
    order.
  • DAR does not rely on the output of the
    ldd <executable>
    command, as this is considered unsafe in the case
    of dynamically loaded libraries.
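The directory walk described above can be sketched as follows. This is an illustration rather than DAR's own code (the DAR core is written in Perl); the function name and the rule that a library is any file named *.so or *.so.<version> are assumptions of the sketch.

```python
import os
import tempfile


def collect_shared_libraries(ld_library_path):
    """Return (directory, filename) pairs for libraries, in search order."""
    found = []
    seen_dirs = set()
    for directory in ld_library_path.split(":"):
        # Skip empty entries, repeated entries and missing directories.
        if not directory or directory in seen_dirs:
            continue
        if not os.path.isdir(directory):
            continue
        seen_dirs.add(directory)
        for name in sorted(os.listdir(directory)):
            # Assumption: libraries are recognised by file name only.
            if name.endswith(".so") or ".so." in name:
                found.append((directory, name))
    return found


# Small demonstration with throwaway directories.
with tempfile.TemporaryDirectory() as root:
    lib_a = os.path.join(root, "a")
    lib_b = os.path.join(root, "b")
    os.makedirs(lib_a)
    os.makedirs(lib_b)
    for path in (os.path.join(lib_a, "libfoo.so"),
                 os.path.join(lib_b, "libbar.so.1"),
                 os.path.join(lib_b, "README")):
        open(path, "w").close()
    search_path = ":".join([lib_a, lib_b, lib_a])
    libraries = [name for _, name in collect_shared_libraries(search_path)]
    print(libraries)
```

Keeping the directory order of the original variable matters because the loader resolves a library name from the first directory that contains it.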

10
Implementation - Executables
  • By default DAR will walk through the directories
    in the PATH environment variable (only the
    portion added for this particular application)
    and include the contents of those directories in
    the DARball.
  • This behavior can be overridden by setting the
    DAR_runtime_PATH environment variable, in which
    case the associated files and directories will be
    included in the distribution, and will be added
    to the PATH in the DAR runtime environment
    scripts.
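A minimal sketch of this selection logic is given below. The slides do not say how DAR identifies the portion of PATH added for the application; comparing against a baseline PATH is an assumption here, as is treating DAR_runtime_PATH as a colon-separated list. Function names are hypothetical.

```python
def added_path_entries(base_path, app_path):
    """Entries of app_path that are absent from base_path, order preserved."""
    base = {entry for entry in base_path.split(":") if entry}
    added = []
    for entry in app_path.split(":"):
        if entry and entry not in base and entry not in added:
            added.append(entry)
    return added


def executables_to_package(base_path, environ):
    """Pick the PATH-like entries whose contents go into the distribution."""
    # DAR_runtime_PATH is named in the slides; its exact format is assumed.
    override = environ.get("DAR_runtime_PATH")
    if override:
        return [entry for entry in override.split(":") if entry]
    return added_path_entries(base_path, environ.get("PATH", ""))


print(executables_to_package("/bin:/usr/bin",
                             {"PATH": "/opt/app/bin:/bin:/usr/bin"}))
```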

11
Implementation - Other Variables
  • DAR distinguishes between three types of
    runtime environment variables:
  • Simple values (flags)
  • Variables associated with a path to an existing
    file or directory in the local file system
  • Variables associated with several paths in the
    local file system (PATH-like variables, where
    entries are separated by the colon delimiter)
  • All physical files and directories found in the
    specified paths are included, preserving the
    underlying directory structure.
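The three-way distinction can be illustrated with a small classifier. This is a sketch under the assumption that a value is treated as a path only if it names something that exists on the local file system; the category labels and the injectable `exists` parameter are hypothetical and are there to make the example testable without touching real files.

```python
import os


def classify_variable(value, exists=os.path.exists):
    """Classify an environment value as 'simple', 'path' or 'path_list'."""
    parts = [part for part in value.split(":") if part]
    existing = [part for part in parts if exists(part)]
    if len(parts) > 1 and existing:
        return "path_list"   # PATH-like: several colon-separated entries
    if len(parts) == 1 and existing:
        return "path"        # a single existing file or directory
    return "simple"          # a flag or any other plain value


# Demonstration against a pretend file system.
pretend_fs = {"/opt/app", "/opt/app/lib", "/opt/data"}
for sample in ("1", "/opt/app", "/opt/app/lib:/opt/data"):
    print(sample, "->", classify_variable(sample, pretend_fs.__contains__))
```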

12
General Practices, Tests
  • All the sophisticated work is done by DAR while
    creating the distribution.
  • The installation procedure is extremely simple.
  • Friendly user interface: simple commands,
    built-in help, backward compatibility.

13
Tests
  • Run the same application in the native environment
  • Install the DARball and run on the same node
  • Install and run the application on a remote host
    without a pre-installed CMS environment
  • The same output in all three cases means success.
  • The second type of test is optional, and can be
    used to identify any discrepancies in the
    operating system configuration.
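The comparison of the three runs can be sketched as a checksum check. Byte-identical output is an assumption of this sketch; real application output may contain timestamps or host names that would have to be filtered first. The run labels are hypothetical.

```python
import hashlib


def digest(output_bytes):
    """SHA-256 hex digest of one run's output."""
    return hashlib.sha256(output_bytes).hexdigest()


def differing_runs(outputs, reference="native"):
    """Return the labels of runs whose output differs from the reference run."""
    expected = digest(outputs[reference])
    return sorted(label for label, data in outputs.items()
                  if digest(data) != expected)


runs = {
    "native": b"events: 100\n",          # run in the native environment
    "local_darball": b"events: 100\n",   # DARball installed on the same node
    "remote_darball": b"events: 100\n",  # DARball installed on a remote host
}
print(differing_runs(runs))
```

An empty list corresponds to the success criterion on the slide: the same output in all three cases.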

14
Using DAR in Production
  • DAR-created distributions have been used as the
    mandatory way to install software for the
    official CMS Monte Carlo production.
  • Using the same set of applications and consistent
    software distribution mechanisms ensured stable
    performance and trustworthy results.
  • The RefDB2DAR interface has been developed to
    formalize the requests for applications and
    provide bookkeeping of the available
    distributions.

15
CMS production over GRID (fall 2002)
The CMS Integration GRID Testbed produced 1.2
million CMS Monte Carlo events, from generation
with the PYTHIA physics generator through simulation
with GEANT and digitization with Objectivity-based
applications. All results shown here were
run on Red Hat 6 systems, though some GEANT-only
production was also run on newer Red Hat 7
systems.

16
Next steps - Bookkeeping
  • The RefDB2DAR interface allows a request file to
    be downloaded from the RefDB.
  • The refdbdar utility is then used to:
  • parse and validate the RefDB request file,
  • call a packager: CMSIM_packager, CMKIN_packager, or
    DAR_packager for SCRAM-managed projects.
  • The packager builds executables as requested and
    creates the distribution.
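The parse, validate and dispatch sequence can be sketched as below. The slides do not describe the RefDB request file format, so the "key = value" layout, the field names and the rule that maps an application name to a packager are all hypothetical; only the three packager names come from the slides.

```python
REQUIRED_KEYS = ("application", "version")  # hypothetical request fields


def parse_request(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    request = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"malformed line: {line!r}")
        request[key.strip()] = value.strip()
    return request


def validate_request(request):
    """Raise ValueError if a required field is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not request.get(key)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))


def choose_packager(request):
    """Assumed rule: dedicated packagers for CMSIM and CMKIN, DAR_packager otherwise."""
    dedicated = {"CMSIM": "CMSIM_packager", "CMKIN": "CMKIN_packager"}
    return dedicated.get(request["application"].upper(), "DAR_packager")


demo = parse_request("# demo request\napplication = cmsim\nversion = 1.0\n")
validate_request(demo)
print(choose_packager(demo))
```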

17
Next steps - Optimizations
  • The runtime environment contains some superfluous
    directories and files. However, detecting files
    that can be safely excluded requires expert
    knowledge of the software application.
  • A number of new expert options allow the contents
    to be filtered, but it may take several iterations
    to figure out what can be removed, and whether
    doing so is efficient and safe.

18
Next steps - Optimizations
  • Space optimizations
  • Avoid duplication (all duplicated files are
    replaced by symbolic links)
  • Introduced expert options
  • The runtime environment contains some superfluous
    directories and files.
  • However, detecting files that can be safely
    excluded requires expert knowledge of the
    software application.
  • Time optimizations
  • Automating tests

19
Distribution process
  1. The Production Coordinator fills in a web form to
    create a DARball request. The generated request is
    stored in the RefDB, and a notification is sent by
    e-mail.
  2. The DARball is then created using refdbdar
    and the request file, based on the software
    release installation at CERN.
  3. The application is installed and tested in the DAR
    runtime environment.

20
Distribution process
  1. The DARball is put into SRB for distribution and is
    ready for the production assignments.
  2. Production sites get the assignments with an
    indication of the DARball (by name). The DARball is
    then downloaded from the SRB and installed, using
    DAR, on the worker nodes.
  3. The McRunJob tool creates a job based on the
    application and submits it to the production GRID.

21
Using DAR in MOP
  • MOP is a system for distributing CMS Monte Carlo
    production jobs over the GRID.
  • MOP is capable of running any type of script
    (job) at remote GRID sites, called Worker Sites.
  • MOP runs jobs as DAGs (Directed Acyclic Graphs),
    which can be combined to create complex
    workflows.

22
Using DAR in MOP
  • In general, every DAG contains four stages:
  • Stage-in: bring the required input files (from
    several sources) to the worker site.
  • Run: execute the job itself, producing results,
    logs, data.
  • Stage-out: send out the produced results/data/logs.
  • Clean-up: clean up the leftover files/directories
    at the worker site.
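The four stages form a small dependency graph. The sketch below models them with Python's standard graphlib module; it is not MOP code, and the assumption that the stages form a strict linear chain (each stage depends only on the previous one) is made for illustration. The action callables are placeholders.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages that must finish before it starts.
STAGES = {
    "stage-in": set(),
    "run": {"stage-in"},
    "stage-out": {"run"},
    "clean-up": {"stage-out"},
}


def execution_order(dependencies):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run_dag(dependencies, actions):
    """Call the action of each stage in dependency order; collect results."""
    return [actions[stage]() for stage in execution_order(dependencies)]


# Placeholder actions that only report which stage ran.
actions = {name: (lambda name=name: f"{name} done") for name in STAGES}
log = run_dag(STAGES, actions)
print(log)
```

Larger workflows can be expressed by adding nodes and dependency sets to the same mapping; a cycle in the mapping makes TopologicalSorter raise graphlib.CycleError.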

23
Using DAR with MOP
  • DAR installation at a worker site is achieved by
    creating a special MOP job that:
  • first pulls the DAR tool and the application DAR
    distribution in stage-in,
  • runs the installation by invoking DAR in the run
    stage,
  • brings back the results of the installation to the
    submission site in stage-out,
  • and then performs clean-up operations at the
    worker site.

24
Summary
  • The DAR-based distribution scheme has been
    successfully used in CMS event production for an
    extended period of time.
  • It allows us to keep pace with
    software developments and deliver software
    applications to the production sites with ease
    and in a timely fashion.
  • Once re-packaged into RPM files, applications
    can be re-used within different distribution
    approaches (e.g. LCFG).

25
Acknowledgements
  • The main credit for this work goes to
    the core CMS software developers, architects and
    release managers for their constant care about
    software quality.
  • We would like to thank the CMS and USCMS software
    and computing managers for the attention they paid
    to this project, the CMS Production Team for
    providing an excellent working environment, and
    all CMS colleagues from many countries and
    institutions for their useful feedback.
  • My special thanks to Dr. Yujun Wu for presenting
    this talk to you, and for numerous fruitful
    discussions.

THANK YOU