1
WP8 Report
  • F Harris (Oxford/CERN)

2
Outline of presentation
  • Overview of experiment plans for use of Grid
    facilities/services for tests and data challenges
  • ATLAS
  • ALICE
  • CMS
  • LHCb
  • BaBar
  • D0
  • Status of ATLAS/EDG Task Force work
  • Essential requirements for making 1.2.n usable by
    broader physics user community
  • Future activities of WP8 and some questions
    regarding LCG etc.
  • Summary

3
ATLAS
  • Currently in the middle of Phase 1 of DC1 (Geant3
    simulation, Athena reconstruction, analysis). Many
    sites in Europe, US, Australia, Canada, Japan,
    Taiwan, Israel and Russia are involved
  • Phase 2 of DC1 will begin Oct-Nov 2002 using the
    new event model
  • Plans for use of Grid tools in DCs
  • Phase 1: ATLAS-EDG Task Force to repeat with EDG
    1.2 1% of the simulations already done.
  • Using CERN, CNAF, Nikhef, RAL, Lyon
  • 9 GB input, 100 GB output, 2000 CPU hrs
  • Phase 2 will make larger use of Grid tools. Maybe
    different sites will use different tools. There
    will be (many?) more sites. This is to be defined
    Sep 16-20.
  • 10^6 CPU hrs, 20 TB input to reconstruction,
    5 TB output
    (? How much on the testbed?)

4
ALICE
  • ALICE assumes that as soon as a stable version of
    1.2.n is tested and validated it will be
    progressively installed on all EDG testbed
    sites
  • As new sites come online, an automatic tool will
    be used for submission of test jobs of increasing
    output size and duration (a sketch of such a tool
    follows this slide)
  • At the moment ALICE does not plan a "data
    challenge" with EDG, but does plan a data transfer
    test, as close as possible to the expected data
    transfer rate for a real production and analysis
  • Will concentrate on the AliEn/EDG and AliRoot/EDG
    interfaces, in particular for items concerning
    Data Management.
  • Will use CERN, CNAF, Nikhef, Lyon, Turin, Catania
    for first tests
  • CPU and storage requirements can be tailored to
    the availability of facilities in the testbed but
    will need some scheduling and priorities
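
A minimal sketch of what such an automatic submission tool could
look like, assuming jobs are handed to the EDG broker with the
standard dg-job-submit command. The ramp-up schedule, the test
script name and the JDL contents are illustrative assumptions,
not ALICE's actual tool.

    import os
    import subprocess
    import tempfile

    # Hypothetical ramp-up schedule: (output size in MB, duration in min).
    SCHEDULE = [(10, 5), (100, 30), (1000, 120)]

    # Standard EDG JDL attributes; the test script itself is assumed.
    JDL_TEMPLATE = """\
    Executable    = "alice-test-job.sh";
    Arguments     = "{size_mb} {minutes}";
    StdOutput     = "test.out";
    StdError      = "test.err";
    InputSandbox  = {{"alice-test-job.sh"}};
    OutputSandbox = {{"test.out", "test.err"}};
    """

    def submit(size_mb, minutes):
        """Write a JDL file and submit it via the EDG 1.x command."""
        with tempfile.NamedTemporaryFile("w", suffix=".jdl",
                                         delete=False) as f:
            f.write(JDL_TEMPLATE.format(size_mb=size_mb, minutes=minutes))
            jdl = f.name
        subprocess.run(["dg-job-submit", jdl], check=True)
        os.unlink(jdl)

    for size_mb, minutes in SCHEDULE:
        submit(size_mb, minutes)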

5
CMS
  • CMS is currently running production for the DAQ
    Technical Design Report (TDR)
  • Requires the full chain of CMS software and
    production tools. This includes use of
    Objectivity (licensing problem in hand).
  • The 5% Data Challenge (DC04) will start Summer
    2003 and will last 7 months. This will produce
    5x10^7 events. In the last month all data will be
    reconstructed and distributed to Tier1/2 centres
    for analysis.
  • 1000 CPUs for 5 months,
    100 TB output
  • Use of Grid tools and facilities
  • Will not be used for the current production
  • Plan to use in DC04 production
  • EDG 1.2 will be used to make scale and
    performance tests (proof of concept). Tests on
    RB, RC and GDMP. Will need Objectivity for the
    tests.
  • IC, RAL, CNAF/BO, Padova, CERN, Nikhef, IN2P3,
    Ecole Polytechnique, ITEP
  • Some sites will do EDT GLUE tests
  • CPU: 50 CPUs distributed; store: 200 GB
    per site
  • V2 seems the best candidate for DC04 starting
    summer 2003 (has functionality required by CMS)

6
LHCb
  • First intensive Data Challenge starts Oct 2002;
    currently doing intensive pre-tests at all sites.
  • Participating sites for 2002
  • CERN, Lyon, Bologna, Nikhef, RAL
  • Bristol, Cambridge, Edinburgh, Imperial, Oxford,
    ITEP Moscow, Rio de Janeiro
  • Use of EDG Testbed
  • Install the latest OO environment on testbed
    sites. Flexible job submission Grid/non-Grid
  • First tests (now) for MC, reconstruction and
    analysis with data stored to Mass Store
  • Large scale production tests (by October)
  • Production (if tests OK)
  • Aim to do a percentage of production on the
    Testbed
  • Total requirement is 500 CPUs for 2 months, 10 TB
  • (10% should be OK on the testbed?)

7
BaBar Grid and EDG (talk by G Grosdidier)
  • Target: have some production environment ready
    for all users by the end of this year
  • with attractive interface tools
  • customised to the SLAC site
  • There were 3 types of issues raised through
    EDG/Globus (experience with 1.1.4) which were
    solved by local hacks
  • use of the LSF Batch Scheduler (uses AFS)
  • AFS File System used for User Home Directories
  • Batch Workers located inside the IFZ (security
    issue)
  • Three parts of the Globus/EDG software were
    installed at SLAC: CE, WN and UI
  • The exercise clearly showed that they all run
    fine together, and also with the RB at IC
  • Had problems with an old version of the RB. Will
    move now to the latest version.
  • BaBar now have D. Boutigny on WP8/TWG

8
D0 (Nikhef)
  • Have already run many events on the testbeds of
    NIKHEF and SARA
  • Wish to extend tests to the whole testbed
  • D0 RPMs are already in the EDG releases and will
    be installed on all sites. Will set up a special
    VO and RC for D0 at NIKHEF on a rather short time
    scale.
  • Jeff Templon, NIKHEF rep. in WP8, will report
    on the work

9
Atlas/EDG Task Force (led by O Smirnova) -
foundation work accomplished since late July
  • ATLAS 3.2.1 RPMs are distributed with the EDG
    tools to provide the ATLAS runtime environment
  • Validation of the ATLAS runtime environment by
    submitting a short (100 input events) DC1 job was
    done at several sites
  • CERN
  • NIKHEF
  • RAL
  • CNAF
  • Lyon
  • Karlsruhe in progress
  • A very fruitful cooperation between ATLAS users
    and EDG experts has been ongoing since late July;
    this type of dialogue will be a principal factor
    in future developments

10
What's almost there
  • Input file replication for the user: this is a
    multi-step procedure requiring several complex
    GDMP commands
  • Theoretically, it works. However it is very
    sensitive to errors in any step of the chain
    (a sketch of the chain follows this slide)
  • So far, the recommended procedure has worked for
    NIKHEF (input partitions 0003 and 0004)
  • As of Sept 3, Atlas have looked at the use of the
    interim Replica Manager, which is much simpler
    for single file replication.
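
For flavour, the replication chain for one input file looked
roughly like the sketch below, driven from Python for
readability. The GDMP command names are as we recall them from
the GDMP user guide; the flag and the file path are illustrative
and should be checked against local documentation.

    import subprocess

    def step(*cmd):
        """Run one link of the chain; stop at the first failure, since
        each GDMP step depends on the previous one having succeeded."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Illustrative name; real DC1 input partitions had their own naming.
    LOCAL_FILE = "/flatfiles/atlas/dc1.0003.zebra"

    # Producer side: register the file in the local GDMP catalogue,
    # then publish the updated catalogue to subscribed sites.
    step("gdmp_register_local_file", "-d", LOCAL_FILE)
    step("gdmp_publish_catalogue")

    # Consumer side: pull the newly published files onto the local SE.
    step("gdmp_replicate_get")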

11
What has just become reliable
  • Submission of long jobs
  • The provisional fix of the long known
    gass-cache problem, which allowed the frequent
    submission of short jobs, turned out to cause
    long jobs (approx. >20) never to reach
    finished status
  • After the fix was removed in recent days,
    significant progress has been made. M Schulz had
    success with 23/24 long jobs. The single failure
    was probably related to network problems.
  • A temporary solution: the production testbed has
    an RB which points to fixed CEs for frequent
    submission of short jobs, and an ATLAS RB
    pointing to unfixed CEs for long jobs (it sees
    only the CERN and Karlsruhe sites, to be extended
    further); a sketch of submitting via the
    dedicated RB follows this slide
  • Atlas are running long jobs now, but the
    gass-cache problem has to be fixed to also allow
    frequent submission of short jobs
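
As an illustration of the two-broker workaround, a long job
would be steered to the dedicated ATLAS RB at submission time.
The JDL attributes are standard EDG 1.x; the config file name,
its use via --config and the sandbox contents are assumptions
(in practice the RB contact details lived in the UI
configuration).

    import subprocess

    # JDL for a long job; executable and sandbox are illustrative.
    LONG_JOB_JDL = """\
    Executable    = "dc1-sim.sh";
    StdOutput     = "sim.out";
    StdError      = "sim.err";
    InputSandbox  = {"dc1-sim.sh"};
    OutputSandbox = {"sim.out", "sim.err"};
    """

    with open("long-job.jdl", "w") as f:
        f.write(LONG_JOB_JDL)

    # Assumed: an alternate UI config pointing at the ATLAS RB (unfixed
    # CEs, long jobs); with the default config the job would go to the
    # production RB (fixed CEs, short jobs only).
    subprocess.run(
        ["dg-job-submit", "--config", "atlas-rb.conf", "long-job.jdl"],
        check=True)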

12
Essential requirements for making 1.2.n usable by
broader physics user community
  • Top level requirements
  • Production testbed to be stable for weeks, not
    hours, and to allow a spectrum of job submissions
  • Have reasonably easy to use basic functions for
    job submission, replica handling and mass
    storage utilisation
  • Good concise user documentation for all functions
  • Easy for the user to get certificates and to get
    into the correct VO working environment (a
    minimal sketch follows this slide)
  • We had very positive discussions this week on our
    needs in joint meetings with Workpackages
    1, 2, 5 and 6
  • The gass-cache problem is the absolute top
    priority
  • Can we wrap the data management complexity while
    waiting for version 2? (GDMP is too complex for
    the average user) Maybe use of the interim RM
    will help.
  • We need to clarify use of mass store (Castor,
    HPSS, RAL store) by multiple VOs
  • E.g. how is the store partitioned between VOs,
    and how does a non-Grid user access the data?
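
A minimal sketch of the user-side steps, assuming the standard
Globus commands (grid-proxy-init, grid-proxy-info) and the
conventional grid-mapfile location; the membership check is only
an illustration of what the VO machinery does on the site side.

    import subprocess

    # User side: create a short-lived proxy from the certificate in
    # ~/.globus (standard Globus command; prompts for the pass phrase).
    subprocess.run(["grid-proxy-init"], check=True)

    # Site side (illustrative): authorisation maps the certificate
    # subject to a local account via the grid-mapfile.
    subject = subprocess.run(
        ["grid-proxy-info", "-subject"],
        capture_output=True, text=True, check=True).stdout.strip()

    with open("/etc/grid-security/grid-mapfile") as f:  # conventional path
        authorised = any(subject in line for line in f)
    print("authorised:", authorised)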

13
More essential requirements on use of 1.2
  • We must put people and procedures in place for
    mapping the VO organisation onto testbed sites
    (e.g. quotas, priorities)
  • We must clarify user support at sites (middleware
    and applications)
  • Installation of applications software
  • should not be combined with the system
    installation
  • Authentication and authorisation
  • Can we streamline this procedure? (40-odd
    countries to accommodate for Atlas!)
  • Documentation (and training: EDG tutorials for
    the experiments)
  • Has to be user-oriented and concise
  • Much good work going on here (user guide and
    examples). About to be released

14
Some longer term requirements
  • Job Submission to take into account the
    availability of space on SEs and the quota
    assigned to the user (e.g. for macro-jobs, say
    500 jobs each generating 1 GB); a sketch of such
    a check follows this slide
  • Mass Store should be on the Grid in a transparent
    way (space management, archiving, staging)
  • Need an easy to use replica management system
  • Comments
  • Are some of these 1.2.n rather than 2, i.e.
    increments in functionality in successive
    releases?
  • Task Force people should maintain a continuing
    dialogue with the developers
  • (should include data challenge managers from all
    VOs in the dialogue)
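
A sketch of the behaviour this requirement asks for, for the
macro-job example above (500 jobs of 1 GB each). Nothing like
query_free_space_gb exists in EDG 1.2; it is a named placeholder
for whatever the middleware would query on the SE.

    # Purely illustrative: the check the middleware itself should
    # make before dispatching a macro-job of 500 x 1 GB outputs.
    JOBS = 500
    OUTPUT_PER_JOB_GB = 1.0

    def query_free_space_gb(se_host):
        """Placeholder: a real implementation would ask the SE's
        information provider for its free space in GB."""
        raise NotImplementedError

    def can_submit(se_host, user_quota_gb):
        needed_gb = JOBS * OUTPUT_PER_JOB_GB
        return needed_gb <= min(query_free_space_gb(se_host),
                                user_quota_gb)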

15
Future activities of WP8 and some questions
regarding LCG etc.
  • The mandate of WP8 is to facilitate the
    interfacing of applications to EDG middleware,
    to participate in the evaluation and to produce
    the evaluation reports (start writing very
    soon!).
  • The Loose Cannons have been heavily involved in
    testing middleware components, and have produced
    test software and documentation. This should be
    packaged for use by the Test Group.
  • LCs will be involved in liaising with the
    experiments testing their applications. The
    details of how this relates to the new
    Testing/Validation procedure have to be worked
    out.
  • WP8 have been involved in the development of
    application use cases and participate in
    current ATF activities. This is continuing.
  • We are interested in the feasibility of a common
    application layer running over the middleware
    functions. This issue goes into the domain of
    current LCG deliberations.
  • More generally, we need to clarify the
    relationship of WP8 work to applications work in
    LCG

16
Summary
  • The current WP8 top priority activity is the
    Atlas/EDG Task Force work
  • This has been very positive. It focuses attention
    on the real user problems, and as a result we
    review our requirements, design etc. Remember
    the eternal cycle! We should not be surprised if
    we change our ideas. We must maintain flexibility
    with a continuing dialogue between users and
    developers.
  • Will continue Task Force flavoured activities
    with the other experiments
  • Current use of the Testbed is focused on the main
    sites (CERN, Lyon, Nikhef, CNAF, RAL); this is
    mainly for reasons of support
  • Once stability is achieved (see Atlas/EDG work)
    we will expand to other sites. But we should be
    careful in the selection of these sites in the
    first instance. Local support would seem
    essential.
  • WP8 will maintain a role in architecture
    discussions, and may be involved in some common
    application layer developments
  • THANKS to the members of IT and the middleware
    WPs for heroic efforts in past months, and to
    Federico for laying the WP8 foundations