
1
ATLAS Data Challenge 2: A massive Monte Carlo Production on the Grid
  • Santiago González de la Hoz (Santiago.Gonzalez@ific.uv.es)
  • on behalf of ATLAS DC2 Collaboration
  • EGC 2005
  • Amsterdam, 14/02/2005

2
Overview
  • Introduction
  • ATLAS experiment
  • Data Challenge program
  • ATLAS production system
  • DC2 production phases
  • The 3 Grid flavours (LCG, GRID3 and NorduGrid)
  • ATLAS DC2 production
  • Distributed analysis system
  • Conclusions

3
Introduction: LHC/CERN
[Aerial view of the LHC site at CERN near Geneva, with Mont Blanc (4810 m) in the background]
4
The challenge of LHC computing
  • Storage: raw recording rate of 0.1-1 GBytes/sec, accumulating 5-8 PetaBytes/year, with about 10 PetaBytes of disk
  • Processing: the equivalent of 200,000 of today's fastest PCs
5
Introduction: ATLAS
  • Detector for the study of high-energy proton-proton collisions.
  • The offline computing will have to deal with an output event rate of 100 Hz, i.e. 10⁹ events per year with an average event size of 1 Mbyte (a rough arithmetic check follows this list).
  • Researchers are spread all over the world.
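As a rough sanity check of these figures, the quoted rate and event size can be multiplied out. The sketch below is illustrative only; the ~10⁷ seconds of effective data-taking per year is an assumed figure, not taken from the slides.

```python
# Back-of-the-envelope check of the numbers quoted on the two slides above.
# The ~1e7 s of effective data-taking per year is an assumed figure,
# not taken from the slides.
EVENT_RATE_HZ = 100          # offline output event rate
EVENT_SIZE_MB = 1.0          # average event size
LIVE_SECONDS_PER_YEAR = 1e7  # assumed effective data-taking time per year

events_per_year = EVENT_RATE_HZ * LIVE_SECONDS_PER_YEAR
raw_volume_pb = events_per_year * EVENT_SIZE_MB / 1e9    # MB -> PB

print(f"events per year ~ {events_per_year:.0e}")        # ~1e9 events
print(f"raw data volume ~ {raw_volume_pb:.0f} PB/year")  # PetaByte scale
```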

6
Introduction: Data Challenges
  • Scope and Goals
  • In 2002 ATLAS computing planned a first series of Data Challenges (DCs) in order to validate its:
    - Computing Model
    - Software
    - Data Model
  • The major features of DC1 were:
    - the development and deployment of the software required for the production of large event samples
    - the production of those samples, involving institutions worldwide
  • The ATLAS collaboration decided to perform DC2, and in the future DC3, using the Grid middleware developed in several Grid projects (Grid flavours):
    - LHC Computing Grid project (LCG), to which CERN is committed
    - GRID3
    - NorduGRID

7
ATLAS production system
  • In order to handle the task of ATLAS DC2, an automated production system was designed.
  • The ATLAS production system consists of 4 components (a sketch of the supervisor-executor hand-off follows this list):
    - The production database, which contains abstract job definitions.
    - The Windmill supervisor, which reads the production database for job definitions and presents them to the different Grid executors in an easy-to-parse XML format.
    - The executors, one for each Grid flavour, which receive the job definitions in XML format and convert them to the job description language of that particular Grid.
    - Don Quijote, the ATLAS Data Management System, which moves files from their temporary output locations to their final destination on some Storage Element and registers the files in the Replica Location Service of that Grid.
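As a concrete illustration of the executor role, the sketch below parses an abstract job definition and emits a JDL-like description for LCG. The XML element names, the sample values and the output format are assumptions for illustration; they are not the actual Windmill/executor schema.

```python
# Minimal sketch of the supervisor-to-executor hand-off described above.
# The XML element names, sample values, and the JDL-like output are
# illustrative assumptions, not the actual Windmill/executor schema.
import xml.etree.ElementTree as ET

SAMPLE_JOB_XML = """
<job id="dc2.simul.000123">
  <transformation>AtlasG4_trf</transformation>
  <inputfile>dc2.evgen.000123.pool.root</inputfile>
  <outputfile>dc2.simul.000123.pool.root</outputfile>
  <cputime>86400</cputime>
</job>
"""

def to_lcg_jdl(job_xml: str) -> str:
    """Convert an abstract XML job definition into a JDL-like text that an
    LCG executor could hand to the Resource Broker."""
    job = ET.fromstring(job_xml)
    return "\n".join([
        f'Executable = "{job.findtext("transformation")}";',
        f'InputData = {{"{job.findtext("inputfile")}"}};',
        f'OutputSandbox = {{"{job.findtext("outputfile")}"}};',
        f'MaxCPUTime = {job.findtext("cputime")};',
    ])

print(to_lcg_jdl(SAMPLE_JOB_XML))
```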

8
DC2 production phases
[Task-flow diagram for DC2 data: physics events (Pythia) and minimum-bias events are generated as HepMC, passed through Geant4 detector simulation (hits + MC truth), digitization and pile-up (RDO digits + MC truth), event mixing and byte-stream production (raw digits), and reconstruction (ESD), with persistency via Athena-POOL. The diagram also quotes the volume of data for 10⁷ events, roughly 5-30 TB per stage.]
9
DC2 production phases

  Process                      No. of events   Event size (MB)   CPU power (kSI2k-s)   Volume of data (TB)
  Event generation             10⁷             0.06              156                   -
  Simulation                   10⁷             1.9               504                   30
  Pile-up / Digitization       10⁷             3.3 / 1.9         144 / 16              35
  Event mixing / Byte-stream   10⁷             2.0               5.4                   20
  • ATLAS DC2, which started in July 2004, finished the simulation part at the end of September 2004.
  • 10 million events (100,000 jobs) were generated and simulated using the three Grid flavours (a quick consistency check of the table above is sketched after this list).
  • The Grid technologies have provided the tools to generate large Monte Carlo simulation samples.
  • The digitization and pile-up part was completed in December. The pile-up was done on a sub-sample of 2 million events.
  • The event mixing and byte-stream production are ongoing.
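Multiplying out the per-event figures in the table gives a rough cross-check of the totals quoted elsewhere in the talk. This is only an order-of-magnitude sketch: the seconds-per-month conversion is an added assumption, and the 30 TB quoted for simulation is somewhat larger than the naive product, presumably because it counts additional outputs.

```python
# Order-of-magnitude check of the DC2 simulation row in the table above.
# The seconds-per-month conversion is the only number not taken from the talk.
N_EVENTS = 1e7               # events generated and simulated
SIM_EVENT_SIZE_MB = 1.9      # simulation output per event (MB)
SIM_CPU_KSI2K_S = 504        # simulation CPU per event (kSI2k-s)
SECONDS_PER_MONTH = 30 * 24 * 3600

sim_volume_tb = N_EVENTS * SIM_EVENT_SIZE_MB / 1e6               # MB -> TB
sim_cpu_si2k_months = N_EVENTS * SIM_CPU_KSI2K_S * 1e3 / SECONDS_PER_MONTH

print(f"simulation output ~ {sim_volume_tb:.0f} TB")             # ~19 TB (30 TB quoted)
print(f"simulation CPU    ~ {sim_cpu_si2k_months:.1e} SI2k-months")
# ~2e6 SI2k-months, the same order of magnitude as the ~1.5 million
# SI2k-months quoted in the conclusions for the G4-simulation phase.
```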

10
The 3 Grid flavors
  • LCG (http://lcg.web.cern.ch/LCG/)
    - The job of the LHC Computing Grid Project (LCG) is to prepare the computing infrastructure for the simulation, processing and analysis of LHC data for all four of the LHC collaborations. This includes both the common infrastructure of libraries, tools and frameworks required to support the physics application software, and the development and deployment of the computing services needed to store and process the data, providing batch and interactive facilities for the worldwide community of physicists involved in the LHC.
  • NorduGrid (http://www.nordugrid.org/)
    - The aim of the NorduGrid collaboration is to deliver a robust, scalable, portable and fully featured solution for a global computational and data Grid system. NorduGrid develops and deploys a set of tools and services, the so-called ARC middleware, which is free software.
  • Grid3 (http://www.ivdgl.org/grid2003/)
    - The Grid3 collaboration has deployed an international Data Grid with dozens of sites and thousands of processors. The facility is operated jointly by the U.S. Grid projects iVDGL, GriPhyN and PPDG, and the U.S. participants in the LHC experiments ATLAS and CMS.
  • Both Grid3 and NorduGrid take similar approaches, using the same foundations (Globus) as LCG but with slightly different middleware.

11
The 3 Grid flavors: LCG
  • This infrastructure has been operating since
    2003.
  • The resources used (computational and storage)
    are installed at a large number of Regional
    Computing Centers, interconnected by fast
    networks.
  • 82 sites, 22 countries (This number is evolving
    very fast)
  • 6558 TB
  • 7269 CPUs (shared)

12
The 3 Grid flavors: NorduGrid
  • NorduGrid is a research collaboration established
    mainly across Nordic Countries but includes sites
    from other countries.
  • They contributed to a significant part of the DC1
    (using the Grid in 2002).
  • It supports production on non-RedHat 7.3
    platforms
  • 11 countries, 40 sites, 4000 CPUs, 30 TB of storage

13
The 3 Grid flavors: GRID3
  • Sep 04
  • 30 sites, multi-VO
  • shared resources
  • 3000 CPUs (shared)
  • The deployed infrastructure has been in operation
    since November 2003
  • At this moment 3 HEP and 2 biology applications are running
  • Over 100 users are authorized to run in GRID3

14
ATLAS DC2 production on LCG, GRID3 and
NorduGrid
  • G4 simulation

[Plot: number of validated jobs per day on each Grid flavour, and in total, as a function of time]
15
Typical job distribution on LCG, GRID3 and
NorduGrid
16
Distributed Analysis system: ADA
  • Physicists want to use the Grid to perform the analysis of the data too.
  • The ADA (ATLAS Distributed Analysis) project aims at putting together all the software components needed to facilitate end-user analysis.
  • DIAL: defines the job components (dataset, task, application, etc.); together with LSF or Condor it provides interactivity (a low response time). A minimal sketch of these components follows this list.
  • ATPROD: the production system, to be used for low-scale mass production.
  • ARDA: the analysis system to be interfaced to the EGEE middleware.
  • [Diagram: the ADA architecture]
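The sketch below is only an illustration of the job components named for DIAL (dataset, task, application) and of why splitting work across a batch system such as LSF or Condor keeps the response time low; the class and field names are assumptions, not the actual DIAL API.

```python
# Illustrative sketch only: a minimal model of the DIAL job components named
# above (dataset, task, application). The class and field names are
# assumptions, not the actual DIAL API.
from dataclasses import dataclass
from typing import List

@dataclass
class Dataset:
    name: str                 # e.g. a DC2 ESD sample
    files: List[str]          # logical file names making up the dataset

@dataclass
class Task:
    application: str          # e.g. "athena"
    version: str              # application release to use
    options: str              # job options / configuration for the application

@dataclass
class AnalysisJob:
    dataset: Dataset
    task: Task

    def split(self, files_per_subjob: int) -> List["AnalysisJob"]:
        """Split the job into sub-jobs so a batch system such as LSF or
        Condor can run them in parallel, keeping the response time low."""
        chunks = [self.dataset.files[i:i + files_per_subjob]
                  for i in range(0, len(self.dataset.files), files_per_subjob)]
        return [AnalysisJob(Dataset(self.dataset.name, c), self.task)
                for c in chunks]
```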

17
Lessons learned from DC2
  • Main problems:
    - The production system was still in development during the DC2 phase.
    - The beta status of the Grid services caused trouble while the system was in operation; for example, the Globus RLS, the Resource Broker and the information system were unstable in the initial phase.
    - Especially on LCG, the lack of a uniform monitoring system.
    - Mis-configuration of sites and site-stability related problems.
  • Main achievements:
    - An automatic production system making use of the Grid infrastructure.
    - 6 TB (out of 30 TB) of data have been moved among the different Grid flavours using the Don Quijote servers.
    - 235,000 jobs were submitted by the production system.
    - 250,000 logical files were produced, with 2500-3500 jobs per day distributed over the three Grid flavours.

18
Conclusions
  • The generation and simulation of events for ATLAS DC2 have been completed using 3 flavours of Grid technology.
  • They have proven to be usable in a coherent way for a real production, and this is a major achievement.
  • This exercise has taught us that all the elements involved (Grid middleware, production system, deployment and monitoring tools) need improvements.
  • Between the start of DC2 in July 2004 and the end of September 2004 (corresponding to the Geant4 simulation phase), the automatic production system submitted 235,000 jobs; they consumed about 1.5 million SI2k-months of CPU and produced more than 30 TB of physics data.
  • ATLAS is also pursuing a model for distributed analysis which would improve the productivity of end users by profiting from the available Grid resources.

19
Backup Slides
20
Supervisor-Executors
[Diagram: the Windmill supervisor talks to the executors (1. lexor, 2. dulcinea, 3. capone, 4. legacy) over a Jabber communication pathway, using the messages numJobsWanted, executeJobs, getExecutorData, getStatus, fixJob and killJob. The supervisor also connects to the production database (jobs database) and to Don Quijote (file catalog).]
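The six message names in the diagram suggest the interface each Grid executor has to implement. The sketch below is illustrative only: the message names come from the slide, while the Python signatures and payload types are assumptions, not the real Windmill/Jabber protocol.

```python
# Sketch of the supervisor-to-executor message set shown in the diagram above.
# Only the six message names come from the slide; the signatures and payload
# types are illustrative assumptions, not the real Windmill/Jabber protocol.
from abc import ABC, abstractmethod
from typing import Dict, List

class GridExecutor(ABC):
    """One implementation per Grid flavour, e.g. lexor (LCG),
    dulcinea (NorduGrid), capone (Grid3) and a legacy executor."""

    @abstractmethod
    def numJobsWanted(self) -> int:
        """Report how many job definitions this executor can accept now."""

    @abstractmethod
    def executeJobs(self, job_definitions_xml: List[str]) -> None:
        """Translate the XML job definitions and submit them to this Grid."""

    @abstractmethod
    def getExecutorData(self) -> Dict[str, str]:
        """Return executor-level bookkeeping information."""

    @abstractmethod
    def getStatus(self, job_ids: List[str]) -> Dict[str, str]:
        """Report the current state of the given jobs."""

    @abstractmethod
    def fixJob(self, job_id: str) -> None:
        """Attempt to recover a problematic job."""

    @abstractmethod
    def killJob(self, job_id: str) -> None:
        """Cancel a running or queued job."""
```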
21
NorduGrid ARC features
  • ARC is based on the Globus Toolkit, with core services replaced
  • Currently uses Globus Toolkit 2
  • Alternative/extended Grid services:
    - a Grid Manager that
      - checks user credentials and authorization
      - handles jobs locally on clusters (interfaces to the LRMS)
      - does stage-in and stage-out of files
    - a lightweight User Interface with a built-in resource broker
    - an Information System based on MDS with a NorduGrid schema
    - the xRSL job description language (extended Globus RSL); an illustrative example follows this list
    - a Grid Monitor
  • Simple, stable and non-invasive
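For illustration, a minimal xRSL-style job description is shown below, wrapped in a Python string so it could be handed to an ARC submission tool. The attribute values and file names are made up, and the exact attribute set should be treated as an assumption rather than a definitive ARC example.

```python
# Illustrative xRSL-style job description (extended Globus RSL), held in a
# Python string. File names, values and requested resources are made up;
# only the general attribute=value style prefixed by '&' is meant to be shown.
XRSL_EXAMPLE = """&
 (executable="run_simulation.sh")
 (arguments="dc2.job.conf")
 (inputFiles=("dc2.job.conf" ""))
 (outputFiles=("simul.pool.root" ""))
 (stdout="stdout.txt")
 (stderr="stderr.txt")
 (cpuTime="1440")
 (jobName="dc2-g4sim-example")
"""

print(XRSL_EXAMPLE)
```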

22
LCG software
  • LCG-2 core packages:
    - VDT (Globus2, Condor)
    - EDG WP1 (Resource Broker, job submission tools)
    - EDG WP2 (Replica Management tools) plus lcg tools: one central RMC and LRC for each VO, located at CERN, with an ORACLE backend
    - several bits from other WPs (Config objects, InfoProviders, Packaging)
    - GLUE 1.1 (information schema) plus a few essential LCG extensions
    - MDS-based Information System with significant LCG enhancements (replacements, simplified; see poster)
    - a mechanism for application (experiment) software distribution
  • Almost all components have gone through some re-engineering for:
    - robustness
    - scalability
    - efficiency
    - adaptation to local fabrics
  • The services are now quite stable, and the performance and scalability have been significantly improved (within the limits of the current architecture).

23
Grid3 software
  • Grid environment built from core Globus and Condor middleware, as delivered through the Virtual Data Toolkit (VDT): GRAM, GridFTP, MDS, RLS, VDS
  • Equipped with VO and multi-VO security, monitoring, and operations services
  • Allowing federation with other Grids where possible, e.g. the CERN LHC Computing Grid (LCG):
    - USATLAS: GriPhyN VDS execution on LCG sites
    - USCMS: storage element interoperability (SRM/dCache)
  • Delivering the US LHC Data Challenges

24
ATLAS DC2 (CPU)
25
Typical job distribution on LCG
26
Typical job distribution on Grid3
27
Job distribution on NorduGrid