CMS Applications Towards Requirements for Data Processing and Analysis on the Open Science Grid

1
CMS Applications: Towards Requirements for Data
Processing and Analysis on the Open Science Grid
  • Greg Graham
  • FNAL CD/CMS for OSG Deployment
  • 16-Dec-2004

2
CMS Applications - General Requirements
  • Access to large datasets at a few central sites
  • Access to small datasets at many distributed
    sites
  • Ability to move large datasets between sites
  • Ability to create jobs to run against these
    datasets
  • Ability to submit jobs and track progress
  • Ability to control/restrict access to
    sites/resources
  • Ability to look up information about datasets and
    jobs

3
Specific Application Examples
  • CMS Distributed Processing Environment (DPE)
      • VDT/Grid2003-based software package that provides
        CMS-specific software on top of Grid software.
  • Monte Carlo Production
      • MCRunjob (CMS tool) to create jobs, MOP (PPDG) to
        submit jobs using Condor-G, and ConfMon to provide
        site parameters for MOP.
  • Large Scale Data Transfer
      • srmcp to transfer production results from one
        site (transient) to another (permanent)
      • PhEDEx to transfer data with metadata (GridFTP)
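
The MOP step above hands each job to Condor-G. As a rough sketch, assuming the `grid` universe syntax of later Condor releases (the 2004-era Condor-G used the older `globus` universe), generating one job's submit description might look like this. `condor_g_submit`, the gatekeeper contact, and the script name are hypothetical placeholders, not MOP's actual code.

```python
# Hypothetical sketch: build a Condor-G submit description for one
# MOP-style job. Gatekeeper, executable, and file names are placeholders.
from typing import List


def condor_g_submit(gatekeeper: str, executable: str, arguments: List[str]) -> str:
    """Return a submit description that routes one job to a GT2 gatekeeper."""
    lines = [
        "universe = grid",
        # GT2 resource string: <host>/<jobmanager>
        f"grid_resource = gt2 {gatekeeper}",
        f"executable = {executable}",
        f"arguments = {' '.join(arguments)}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(condor_g_submit("gate.example.org/jobmanager-condor",
                          "run_mc.sh", ["--events", "500"]))
```

The resulting text would be written to a file and passed to `condor_submit`. In a MOP-like setup, the site-specific values would come from ConfMon rather than being hard-coded.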

4
Monte Carlo Processing Service
  • A Clarens-based system for generating,
    processing, and analyzing Monte Carlo data.
  • Runjob, MOP, DAR software repository, and MOPDb
    deployed behind Clarens Web services
  • SC2004 demo: point-and-click MC generation and
    analysis (ROOT tuples also served by Clarens)
  • Currently deployed on top of DPE; requires
    Clarens.
  • Status: deployed now; need to groom CMS users
  • Needed a parameter service to accept and store
    arbitrary job configuration parameters AND a
    context service
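
The parameter and context services named on this slide could start out as a simple keyed store. This minimal in-memory sketch assumes a "context" is just a string identifier (for example, a production request) and that parameters are arbitrary key/value pairs. The class and method names are hypothetical and do not reflect the MCPS or Clarens API.

```python
# Hypothetical in-memory sketch of a parameter service with contexts.
# Names are illustrative; this is not the MCPS or Clarens API.
from collections import defaultdict
from typing import Any, Dict, List


class ParameterService:
    def __init__(self) -> None:
        # context id -> {parameter name -> value}
        self._store: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def set_params(self, context: str, **params: Any) -> None:
        """Accept arbitrary parameters and merge them into a context."""
        self._store[context].update(params)

    def get_params(self, context: str) -> Dict[str, Any]:
        """Return a shallow copy of a context's parameters ({} if unknown)."""
        return dict(self._store.get(context, {}))

    def contexts(self) -> List[str]:
        """List known contexts in sorted order."""
        return sorted(self._store)
```

Later calls for the same context override earlier values key by key, so a job configuration can be built up incrementally.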

5
CMS History with Grid
  • Using Condor-G/Globus-based technology to do real
    CMS MC production since 2001
      • Shook out bugs and performance issues; used MOP
  • Using Grid2003 technology to do real production
    since 2003
  • Stakeholder in security (SAZ), registration
    (VOMRS), and data transfer protocols
  • We plan to migrate to an OSG product based on the
    current Grid2003
      • It must meet our requirements, and we are
        working to discover what those are
      • In the meantime, we assume it will work as it
        currently does for DPE running on top of the
        Grid2003 cache

6
Current CMS Deployment Activities for OSG
  • Within the DPE scope
      • Moving to VDT 1.2.2 to be consistent with
        Grid3-dev
      • Testing latest versions of SRM
      • We are able to run MOP production with older
        versions of SRM. Craig Prescott is investigating
        later versions with Timur. 12/13/04 OSGD
        milestone.
  • MCPS rollout on OSG
      • 3/1/05 OSGD milestone
  • Testing Condor-C and providing feedback to the
    Condor team, and also testing VDT 1.3
      • No milestone listed in the OSG deployment doc
      • Keeping up will help us be ready for OSG turn-on

7
Conclusion
  • The requirements for CMS applications running on
    OSG can be gleaned from the current requirements
    for running on Grid2003.
  • The requirements laid out here should be made
    concrete in two documents:
      • CMS Requirements for OSG Deployment
          • To track the current requirements
      • Impact of OSG Deployment on CMS Software
          • To track evolution of the requirements
  • CMS has a lot of experience running on the Grid
      • Procedures are in place to deal with an evolving
        middleware environment.

8
Summary of Known Requirements for OSG Deployment from CMS

9
Infrastructure: MC Production
  • Support for MOP-style job submission
      • Condor-G/Globus from VDT 1.14 or better
      • But we are exploring use of Condor-C
  • Information service: ConfMon
      • MDS-based hack of the Glue Schema that tells MOP
        where to find software remotely and where to
        deposit output files
      • But we are (hopefully) moving to GridCat
  • Space to drop in CMS application software and
    hold the output temporarily
  • Servers to move the data off of the remote site
      • srmcp is preferred; GridFTP is the default right now.
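
ConfMon's role above, telling MOP where software lives and where to put output, together with the preference for srmcp over plain GridFTP, can be illustrated with a small sketch. It assumes a hypothetical `SiteInfo` record and a simplified URL layout, and it only builds the `srmcp` or `globus-url-copy` command line rather than running it. Hostnames and paths are placeholders.

```python
# Hypothetical sketch: ConfMon-style site parameters and a stage-out
# command chooser. Hostnames and paths are placeholders; nothing is executed.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteInfo:
    name: str
    software_dir: str                   # where CMS application software is installed
    output_dir: str                     # temporary area holding job output
    gridftp_host: str
    srm_endpoint: Optional[str] = None  # e.g. "srm://se.example.org:8443"; None if no SRM


def stage_out_command(site: SiteInfo, filename: str, dest_url: str) -> List[str]:
    """Prefer srmcp when the site runs SRM; otherwise fall back to GridFTP."""
    if site.srm_endpoint:
        src = f"{site.srm_endpoint}{site.output_dir}/{filename}"
        return ["srmcp", src, dest_url]
    src = f"gsiftp://{site.gridftp_host}{site.output_dir}/{filename}"
    return ["globus-url-copy", src, dest_url]
```

The returned list could be passed to `subprocess.run` on a host that has the relevant clients installed and a valid grid proxy.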

10
Infrastructure: Data Access
  • The requirements are less well known at the
    moment
  • Directory-based lookup of data products
      • Since this is CMS-based data, we would expect
        that CMS clients would be used to do lookup.
      • Are there any common lookup operations? If so,
        CLARENS may be required on the client side.
  • Data movement from large central sites to/from
    small sites
      • srmcp and GridFTP clients are required.
  • Data movement between all sites and push from
    large sites
      • srmcp and GridFTP servers are required.
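
Directory-based lookup of data products can be illustrated with a path-pattern search. This sketch assumes a hypothetical `<root>/<dataset>/<owner>/` layout holding `.root` files; that layout is an illustration only, not a CMS naming convention.

```python
# Hypothetical sketch of directory-based lookup. The layout
# <root>/<dataset>/<owner>/<file> is an assumption, not a CMS convention.
from pathlib import Path
from typing import List


def find_products(root: Path, dataset: str, owner: str = "*",
                  suffix: str = ".root") -> List[Path]:
    """Return data-product files for a dataset, sorted for stable output."""
    return sorted(p for p in root.glob(f"{dataset}/{owner}/*{suffix}") if p.is_file())
```

Passing a specific `owner` narrows the search to one production; the default wildcard searches all of them.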

11
Infrastructure: Common
  • Security
      • Middleware needs to have strong authentication
          • Kerberos tickets or equivalent VO authentication
      • Middleware needs to support the callouts or other
        mechanisms used by the SAZ database and GUMS
          • We currently depend on gridmap files, but I am
            not sure whether this is required
      • Participating sites need to support the
        interfaces and provide information needed by
        VOMRS.
          • Required to submit jobs to Fermilab, maybe not
            required to accept jobs from Fermilab :-)
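
Because sites currently rely on gridmap files, it may help to see what one contains. Each line maps a quoted certificate subject DN to one or more comma-separated local accounts, and lines starting with `#` are comments. The parser below is a sketch of that format. The DN and account names are invented, and callout-based services such as GUMS are meant to replace this kind of static file.

```python
# Sketch of parsing a Globus grid-mapfile: '"<subject DN>" acct[,acct...]'.
# The DN and account names in the usage example are invented.
import shlex
from typing import Dict, List


def parse_gridmap(text: str) -> Dict[str, List[str]]:
    """Map each certificate subject DN to its list of local accounts."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex keeps a double-quoted DN containing spaces as one token
        parts = shlex.split(line)
        dn = parts[0]
        accounts = parts[1] if len(parts) > 1 else ""
        mapping[dn] = [a for a in accounts.split(",") if a]
    return mapping


if __name__ == "__main__":
    sample = '"/DC=org/DC=doegrids/OU=People/CN=Jane Doe 12345" uscms01\n'
    print(parse_gridmap(sample))
```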

12
Infrastructure: Common
  • Information services
      • Real-time information about running jobs and
        resource usage
      • Historical information and accounting (soft
        requirement)
      • Remote viewing of selected log files would also be
        useful (soft requirement; satisfied by
        operations staff?)
  • Catalog services
      • CMS will initially come in with its own file and
        metadata catalogs. In the future we may rely on
        Globus RLS for file replicas. It is an open
        question whether common cataloging services would
        be useful.
  • Service Discovery
      • CMS will initially come in with its own service
        discovery method (i.e., the null one :-). In the
        future, we may rely on CLARENS-based services.
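
The replica-catalog role mentioned for Globus RLS comes down to mapping one logical file name (LFN) to several physical file names (PFNs). The following is a minimal in-memory sketch of that mapping only. `ReplicaCatalog` and its methods are hypothetical and do not mirror the RLS client API.

```python
# Hypothetical in-memory LFN -> PFN catalog in the spirit of a replica
# location service; this is not the Globus RLS client API.
from collections import defaultdict
from typing import Dict, List, Set


class ReplicaCatalog:
    def __init__(self) -> None:
        self._replicas: Dict[str, Set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        """Record that a physical copy of the logical file exists at pfn."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn: str, pfn: str) -> None:
        """Forget one replica; drop the LFN once no replicas remain."""
        replicas = self._replicas.get(lfn)
        if replicas is None:
            return
        replicas.discard(pfn)
        if not replicas:
            del self._replicas[lfn]

    def lookup(self, lfn: str) -> List[str]:
        """Return all known replicas of a logical file, sorted."""
        return sorted(self._replicas.get(lfn, set()))
```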

13
LCG Interoperability
  • We currently have a job creation and submission
    tool that can submit to either LCG or Grid2003
    resources.
  • Interoperability at a lower level may also be
    required to satisfy simultaneously the needs of
    the CMS collaboration and the institutional needs
    of Fermilab.
  • This is currently under development and we are
    very interested in the results.
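
A tool that targets both LCG and Grid2003 essentially renders one job description into two submission languages. The sketch below assumes a hypothetical `JobSpec` and emits either an LCG-style JDL (ClassAd syntax with `Executable`, `Arguments`, `StdOutput`, `StdError`, `OutputSandbox`) or a Condor-G submit description in the later `grid` universe syntax. It illustrates the idea only and is not the CMS tool itself.

```python
# Hypothetical sketch: render one job for either LCG (JDL) or Grid2003
# (Condor-G). JobSpec, render(), and all values are illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobSpec:
    executable: str
    arguments: List[str] = field(default_factory=list)


def render_lcg_jdl(job: JobSpec) -> str:
    args = " ".join(job.arguments)
    return "\n".join([
        f'Executable = "{job.executable}";',
        f'Arguments = "{args}";',
        'StdOutput = "job.out";',
        'StdError = "job.err";',
        'OutputSandbox = {"job.out", "job.err"};',
    ]) + "\n"


def render_condor_g(job: JobSpec, gatekeeper: str) -> str:
    args = " ".join(job.arguments)
    return "\n".join([
        "universe = grid",
        f"grid_resource = gt2 {gatekeeper}",
        f"executable = {job.executable}",
        f"arguments = {args}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]) + "\n"


def render(job: JobSpec, target: str, gatekeeper: Optional[str] = None) -> str:
    """Dispatch on the target grid; Grid2003 needs a gatekeeper contact."""
    if target == "lcg":
        return render_lcg_jdl(job)
    if target == "grid2003":
        if gatekeeper is None:
            raise ValueError("grid2003 target requires a gatekeeper")
        return render_condor_g(job, gatekeeper)
    raise ValueError(f"unknown target: {target}")
```

Keeping the job description separate from the renderers means a third target could be added without touching existing job-creation code.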