ARDA Interim Report to the LCG SC2
1
ARDA Interim Report to the LCG SC2
  • L.A.T. Bauerdick / Fermilab
  • For the RTAG-11/ARDA group

2
ARDA Mandate

3
ARDA Mandate

4
ARDA Mandate
  • A long list of projects is being examined, analyzing how their
    components and services would map to the ARDA services; the results
    are synthesized into a description of the ARDA components
  • GAG discussed an initial internal working draft; GAG to follow up
  • Both of these are in progress --- a technical annex will document them
  • This is a main thrust of the ARDA roadmap
  • Will be part of the technical annex -- e.g. security, auditing, etc.
  • Main deliverable of ARDA; the approach is described in this talk
5
ARDA Schedule and Makeup

6
ARDA Schedule and Makeup
  • No written draft report today (too late for reviews anyway)
  • Instead a verbal interim report, with an indication of initial
    guidance to the LCG and experiments
  • The report is clearly not finished, but a blueprint for a roadmap
    and its waypoints exists (in the heads of the committee members)
  • Still talking to experiments and DA projects
  • See what we can do -- want to have initial recommendations for that date
  • ALICE: Fons Rademakers and Predrag Buncic
  • ATLAS: Roger Jones and Rob Gardner
  • CMS: Lothar Bauerdick and Lucia Silvestris
  • LHCb: Philippe Charpentier and Andrei Tsaregorodtsev
  • LCG GTA: David Foster (stand-in: Massimo Lamanna)
  • LCG AA: Torre Wenaus
  • GAG: Federico Carminati

7
ARDA mode of operation
  • Thank you for an excellent committee -- large
    expertise, agility and responsiveness, very
    constructive and open-minded, and sacrificing
    quite a bit of the summer
  • Series of weekly meetings July and August,
    mini-workshop in September
  • Invited talks from existing experiment projects
  • Summary of Caltech GAE workshop (Torre)
  • PROOF (Fons)
  • AliEn (Predrag)
  • DIAL (David Adams)
  • GAE and Clarens (Conrad Steenberg)
  • Ganga (Pere Mato)
  • Dirac (Andrei)
  • Cross-check w/ other projects of emerging ARDA
    decomposition of services
  • Magda, DIAL -- Torre, Rob
  • EDG, NorduGrid -- Andrei, Massimo
  • SAM, MCRunjob -- Roger, Lothar
  • BOSS, MCRunjob -- Lucia, Lothar
  • Clarens, GAE -- Lucia, Lothar
  • Ganga -- Rob, Torre
  • PROOF -- Fons
  • AliEn -- Predrag

8
Initial Picture of Distributed Analysis (Torre, Caltech workshop)
9
Hepcal-II Analysis Use Cases
  • Scenarios based on GAG HEPCAL-II report
  • Register as a user
  • Make sure resources are available
  • Perform queries on the Metadata Catalogue(s) to
    determine Data Sets
  • Select event components
  • Perform iterative analysis activity looping over
    event components
  • Specific requirements from Hepcal-II
  • Job traceability, provenance, logbooks
  • Also discussed support for finer-grained access control and for
    sharing data within physics groups

10
e.g. Asynchronous Analysis Mode in AliEn
11
ARDA Roadmap Informed By DA Implementations
  • Following SC2 advice, reviewed major existing DA
    projects
  • Clearly AliEn today provides the most complete, fully functional
    implementation of distributed analysis services -- it also
    interfaces to PROOF
  • Implements the major Hepcal-II use cases
  • Presents a clean API to experiment applications, Web portals, etc.
  • Should address most requirements for upcoming experiment physics studies
  • Existing and fully functional interface to a complete analysis
    package --- ROOT
  • Interface to the PROOF cluster-based interactive analysis system
  • Interfaces to any other system are well defined and certainly feasible
  • Based on Web services, with a global (federated) database to give
    state and persistency to the system
  • ARDA approach
  • Re-factor AliEn, using the experience of the other projects, to
    generalize it into an architecture; consider OGSI as a natural
    foundation for that
  • Confront ARDA services with existing projects (notably EDG, SAM,
    Dirac, etc.)
  • Synthesize service definitions, defining their contracts and behavior
  • Blueprint for an initial distributed analysis service infrastructure
  • The ARDA services blueprint gains credibility with a functional
    prototypical implementation

12
ARDA Distributed Analysis Services
  • Distributed Analysis in a Grid Services based
    architecture
  • ARDA Services should be OGSI compliant -- built
    upon OGSI middleware
  • Frameworks and applications use the ARDA API, with bindings to C,
    Java, Python, Perl, etc.
  • Interface through a UI/API factory -- authentication, persistent
    sessions
  • Fabric interface to resources through CE and SE services
  • Job description language based on Condor ClassAds and matchmaking
  • Database(s), through a Dbase Proxy, provide statefulness and
    persistence
  • We arrived at a decomposition into the following key services:
  • API and User Interface
  • Authentication, Authorization, Accounting and
    Auditing services
  • Workload Management and Data Management services
  • File and (event) Metadata Catalogues
  • Information service
  • Grid and Job Monitoring services
  • Storage Element and Computing Element services
  • Package Manager and Job Provenance services

13
AliEn (re-factored)

14

15
ARDA Key Services for Distributed Analysis
16
API and User Interface

17
API and User Interface
  • ARDA services present an API, called by applications like the
    experiments' frameworks, interactive analysis packages, Grid
    portals, Grid shells, etc.
  • The API allows a wide variety of applications to be implemented;
    examples are a command line interface similar to a UNIX file
    system, and equivalent functionality provided by graphical user
    interfaces.
  • Using these interfaces, it will be possible to
    access the catalogue, submit jobs and retrieve
    the output. Web portals can be provided as an
    alternative user interface, where one can check
    the status of the current and past jobs, submit
    new jobs and interact with them.
  • Web portals should also offer additional functionality to power
    users: Grid administrators can check the status of all services and
    can monitor, start and stop them, while VO administrators
    (production users) can submit and manipulate bulk jobs.
  • The user interface can use the Condor ClassAds as
    a Job Description Language
  • This will maintain compatibility with existing
    job execution services, in particular LCG-1.
  • The JDL defines the executable, its arguments, the software
    packages or data, and the resources that are required by the job
  • The Workload Management service can modify the job's JDL entry by
    adding or elaborating requirements, based on the detailed
    information it can get from the system, like the exact location of
    datasets and replicas, and client and service capabilities.
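To make this concrete, here is a small sketch that assembles a ClassAd-style job description and lets a stand-in Workload Management step elaborate it. All attribute names and values (the executable, the LFN, the CloseSE requirement) are illustrative assumptions, not attributes defined by ARDA or LCG-1.

```python
# Sketch: building a ClassAd-style JDL entry in Python.
# Every attribute name and value below is an illustrative assumption.

class Expr(str):
    """Marker type: rendered without quotes (a ClassAd expression)."""

def to_classad(attrs):
    """Render a dict as ClassAd-style job description text."""
    lines = ["["]
    for key, value in attrs.items():
        if isinstance(value, Expr):
            rendered = value                      # bare expression
        elif isinstance(value, str):
            rendered = f'"{value}"'               # quoted string
        elif isinstance(value, list):
            rendered = "{" + ", ".join(f'"{v}"' for v in value) + "}"
        else:
            rendered = str(value)
        lines.append(f"  {key} = {rendered};")
    lines.append("]")
    return "\n".join(lines)

# A user submits a bare job description ...
jdl = {
    "Executable": "analysis.sh",
    "Arguments": "--events 1000",
    "InputData": ["lfn:/alice/prod/run1234/AOD.root"],
    "Packages": ["AliRoot-v4.01"],
}

# ... and the Workload Management step adds a requirement derived,
# say, from the location of the dataset's replicas.
jdl["Requirements"] = Expr('other.CloseSE == "CERN-Castor"')

print(to_classad(jdl))
```

Keeping Requirements an unquoted expression rather than a string value matters because Condor matchmaking evaluates it against each resource's ad.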

18
File Catalogue and Data Management
  • Input and output associated with any job can be
    registered in the File Catalogue, a virtual file
    system in which a logical name is assigned to a
    file.
  • Unlike real file systems, the File Catalogue does not own the
    files; it only keeps an association between a Logical File Name
    (LFN) and (possibly more than one) Physical File Name (PFN) on a
    real file or mass storage system. PFNs describe the physical
    location of the files and include the name of the Storage Element
    and the path to the local file.
  • The system should support file replication and caching, and will
    use file location information when scheduling jobs for execution.
  • The directories and files in the File Catalogue
    have privileges for owner, group and the world.
    This means that every user can have exclusive
    read and write privileges for his portion of the
    logical file namespace (home directory).
  • Etc pp
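The LFN-to-PFN association described above can be sketched in a few lines of Python; the class and method names, and the example paths, are illustrative, not part of any ARDA interface.

```python
# Toy File Catalogue: maps a Logical File Name (LFN) to one or more
# Physical File Names (PFNs). The catalogue never owns the bytes;
# it only records where replicas live. All names are illustrative.

class FileCatalogue:
    def __init__(self):
        self._entries = {}   # LFN -> list of PFNs

    def register(self, lfn, pfn):
        """Associate one more physical replica with a logical name."""
        self._entries.setdefault(lfn, []).append(pfn)

    def replicas(self, lfn):
        """Return every known physical location of the file."""
        return list(self._entries.get(lfn, []))

cat = FileCatalogue()
# The same logical file can have replicas on several Storage Elements.
cat.register("/alice/sim/run42/hits.root",
             "srm://se.cern.ch/castor/alice/run42/hits.root")
cat.register("/alice/sim/run42/hits.root",
             "srm://se.fnal.gov/dcache/alice/run42/hits.root")

print(cat.replicas("/alice/sim/run42/hits.root"))
```

A scheduler could then prefer the Computing Element closest to one of the returned replicas, which is the "use file location information when scheduling" point above.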

19
Job Provenance service
  • The File Catalogue is not meant to support only regular files; it
    is extended to include information about running processes in the
    system (in analogy with the /proc directory on Linux systems) and
    to support virtual data services
  • Each job sent for execution gets a unique id and a corresponding
    /proc/id directory where it can register temporary files, standard
    input and output, as well as all job products. In a typical
    production scenario, the job products are renamed and registered at
    their final destination in the File Catalogue only after a separate
    process has verified the output. The entries (LFNs) in the File
    Catalogue have an immutable unique file id attribute, which is
    required to support long references (for instance in ROOT) and
    symbolic links.
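A toy sketch of that provenance flow, assuming a plain dict stands in for the catalogue and a uuid for the immutable file id; all paths and names are illustrative.

```python
# Sketch of the /proc-style provenance flow: a job's products live
# under /proc/<id> until a verification step promotes them to their
# final LFN. The dict-backed catalogue and uuid-based file id are
# stand-in assumptions for illustration.
import uuid

catalogue = {}   # LFN -> immutable unique file id

def register(lfn):
    """Register a new entry; the file id never changes afterwards."""
    catalogue[lfn] = uuid.uuid4().hex
    return catalogue[lfn]

def promote(job_id, product, final_lfn):
    """Rename a verified job product from /proc/<id> to its final LFN.
    The immutable file id survives the rename, so long references
    (e.g. from ROOT files) stay valid."""
    tmp_lfn = f"/proc/{job_id}/{product}"
    file_id = catalogue.pop(tmp_lfn)
    catalogue[final_lfn] = file_id
    return file_id

# Job 1234 registers its output under /proc first ...
fid = register("/proc/1234/histos.root")
# ... and only after verification is it moved to its final place.
promoted = promote(1234, "histos.root", "/alice/user/histos.root")
print(promoted == fid, sorted(catalogue))
```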

20
Package Manager Service
  • Allows dynamic installation of application
    software released by the VO (e.g. the experiment
    or a physics group).
  • Each VO can provide the Packages and Commands that can subsequently
    be executed. Once the corresponding files, with bundled executables
    and libraries, are published in the File Catalogue and registered,
    the Package Manager will install them automatically as soon as a
    job becomes eligible to run on a site whose policy accepts these
    jobs.
  • While installing a package in the shared package repository, the
    Package Manager resolves its dependencies on other packages and,
    taking package versions into account, installs them as well. This
    means that old versions of packages can be safely removed from the
    shared repository; if they are needed again at some later point,
    the system re-installs them automatically. This provides a
    convenient and automated way to distribute experiment-specific
    software across the Grid and assures accountability in the long
    term.
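The dependency handling described above amounts to a recursive install; a minimal sketch, assuming a made-up dependency table and package names.

```python
# Sketch of Package Manager dependency resolution: installing a
# package first installs its (versioned) dependencies, each at most
# once. The package names and dependency table are made up.

DEPS = {
    "AliRoot-v4.01": ["ROOT-v3.10", "GEANT3-v1.3"],
    "ROOT-v3.10": [],
    "GEANT3-v1.3": ["ROOT-v3.10"],
}

def install(pkg, installed=None):
    """Return the install order for pkg, dependencies first."""
    if installed is None:
        installed = []
    for dep in DEPS.get(pkg, []):
        install(dep, installed)
    if pkg not in installed:   # idempotent: safe after removal/re-install
        installed.append(pkg)
    return installed

print(install("AliRoot-v4.01"))
```

Because `install` is idempotent, a version removed from the shared repository is simply installed again the next time a job requires it, which is the automatic re-installation behaviour described above.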

21
Computing Element
  • The Computing Element is a service representing a computing
    resource. Its interface should allow submission of a job to be
    executed on the underlying computing facility, access to job status
    information, as well as high-level job manipulation commands. The
    interface should also provide access to the dynamic status of the
    computing resource, like its available capacity, load, and number
    of waiting and running jobs.
  • This service should be available on a per VO
    basis.
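The interface described above could look roughly like this; the method names and the trivial in-memory implementation are assumptions for illustration, not a defined ARDA contract.

```python
# Sketch of a Computing Element interface: job submission, status
# queries, and dynamic resource information. Names are illustrative.
from abc import ABC, abstractmethod

class ComputingElement(ABC):
    @abstractmethod
    def submit(self, jdl: str) -> str:
        """Submit a job (JDL text) to the underlying facility; return its id."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return the job's current state (e.g. queued, running, done)."""

    @abstractmethod
    def capacity(self) -> dict:
        """Dynamic resource status: waiting and running job counts, etc."""

class FakeCE(ComputingElement):
    """Trivial in-memory CE, for illustration only."""
    def __init__(self):
        self._jobs = {}
    def submit(self, jdl):
        job_id = str(len(self._jobs) + 1)
        self._jobs[job_id] = "queued"
        return job_id
    def status(self, job_id):
        return self._jobs[job_id]
    def capacity(self):
        return {"waiting": sum(s == "queued" for s in self._jobs.values()),
                "running": sum(s == "running" for s in self._jobs.values())}

ce = FakeCE()
jid = ce.submit('[ Executable = "analysis.sh"; ]')
print(ce.status(jid), ce.capacity())
```

Running one such service per VO, as the slide suggests, would mean each VO sees its own view of the queue and its own capacity numbers.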

22
Etc. pp

23
General ARDA Roadmap
  • Emerging picture of waypoints on the ARDA
    roadmap
  • ARDA RTAG report
  • review of existing projects, component decomposition and
    re-factoring, capturing of common architectures, synthesis of
    existing approaches
  • recommendations for a prototypical architecture
    and definition of prototypical functionality and
    a development strategy
  • development of a prototype and first release
  • Re-factoring of AliEn web services, studying the ARDA architecture
    in an OGSI context, based on the existing implementation
  • POOL and other LCG components (VO, CE, SE, etc.) interface to ARDA
  • Adaptation of specific ARDA services to experiment requirements
  • E.g. File catalogs, package manager, metadata
    handling for different data models
  • Integration with and deployment on LCG-1
    resources and services
  • Re-engineering of prototypical ARDA services, as
    required
  • Evolving services: scaling up and adding functionality, robustness,
    resilience, etc.

24
Talking Points
  • System deals with files, not objects
  • however, an Object Location service can be added if required
  • Investigate experiments' metadata/file-catalog interaction
  • Interface to LCG-1 infrastructure
  • VDT/EDG interface through CE, SE and the use of
    JDL
  • ARDA VO services should take into account
    emerging VO management infrastructure
  • VO system and site security
  • Jobs are executed on behalf of the VO; however, users are fully
    traceable
  • How do policies get implemented, e.g. analysis priorities, MoU
    contributions, etc.?
  • Auditing and accounting system; priorities through special
    optimizers
  • accounting of site contributions, which depends on what resources
    sites expose
  • Prototype could be based on global database
  • Address latency, stability and scalability issues up-front -- good
    experience exists
  • The ARDA Prototype provides a Physics Analysis environment for
    experiment-framework-based and ROOT-based analysis of distributed
    experiment data
  • Interfacing to other analysis packages, event
    displays, etc. can be implemented easily

25
Major Role for Middleware Engineering
  • The ARDA roadmap is based on a well-factored prototype
    implementation that allows evolutionary development into a complete
    system at the full LHC scale
  • David Foster: let's recognize that very little work has so far been
    done on the underlying mechanisms needed to provide the appropriate
    foundations (message-passing structures, fault recovery procedures,
    component instrumentation, etc.)
  • The ARDA prototype would be pretty lightweight
  • Stability through basing the system on a global database, to which
    services talk through a database proxy
  • people know how to do large databases -- a well-founded principle
    (see e.g. SAM for Run II), with many possible migration paths
  • HEP-specific services, however based on generic OGSI-compliant
    services
  • Expect the LCG/EGEE middleware effort to play a major role in
    evolving this foundation, its concepts and implementation
  • re-casting the (HEP-specific, event-data-analysis oriented)
    services into more general services, from which the ARDA services
    would be derived
  • addressing major issues like a solid OGSI
    foundation, robustness, resilience, fault
    recovery, operation and debugging
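The "services talk to the global database through a database proxy" idea can be sketched as follows; the proxy and service classes, and the dict-backed store, are illustrative assumptions rather than any ARDA design.

```python
# Minimal sketch: services never hold database handles themselves;
# a proxy mediates every read and write, which is where connection
# pooling, retries, access control and migration to a different
# backend would live. The dict-backed store is a stand-in assumption.

class DatabaseProxy:
    def __init__(self, store=None):
        self._store = store if store is not None else {}

    def get(self, key):
        # A real proxy would add authentication, retries, failover here.
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

class TaskQueueService:
    """A service that keeps all its state in the shared database."""
    def __init__(self, proxy):
        self._db = proxy

    def enqueue(self, job_id):
        queue = self._db.get("queue") or []
        self._db.put("queue", queue + [job_id])

    def pending(self):
        return self._db.get("queue") or []

proxy = DatabaseProxy()
svc = TaskQueueService(proxy)
svc.enqueue("job-1")
svc.enqueue("job-2")
print(svc.pending())
```

Because the service itself is stateless, it can be restarted or replicated without losing the queue, which is the stability argument made above.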

26
Conclusions
  • ARDA is identifying a services-oriented architecture and an initial
    decomposition of the services required for distributed analysis
  • We recognize a central role for a Grid API which provides a factory
    of user interfaces for experiment frameworks, applications,
    portals, etc.
  • The ARDA Prototype would provide a distributed physics analysis
    environment for distributed experimental data
  • for experiment-framework-based analysis
  • Cobra, Athena, Gaudi, AliRoot, etc.
  • for ROOT-based analysis
  • interfacing to other analysis packages like JAS, event displays
    like Iguana, grid portals, etc. can be implemented easily