ATLAS Distributed Analysis: Overview

1
ATLAS Distributed Analysis: Overview
Distributed Analysis working group, ATLAS software
workshop
  • David Adams
  • BNL
  • December 8, 2004

2
Contents
  • ADA Architecture
  • Components
  • Datasets
  • Transformations
  • Services
  • Changes
  • Generic dataset schema
  • Hierarchical content
  • DIAL catalog interfaces
  • Goals for this release
  • Current status
  • Goals for the next release
  • Transformation interface
  • Conclusions

3
ADA Architecture
Generalized
4
Components
  • ADA model
  • Data is described by a dataset
  • Location of data, e.g. files
  • Content, e.g. list of event IDs and the type of
    data for each event
  • Transformation describes an operation that can
    act on a dataset to produce a new dataset
  • Application: code shared by multiple
    transformations
  • Task: user-supplied configuration (parameters or
    code)
  • Job is an instance of a transformation acting on
    a dataset
  • User preferences may be provided
  • Should not affect the essential result
  • Typically run as a collection of sub-jobs by
    splitting the input dataset
  • Each sub-job applies the same xform to its
    sub-dataset
  • Results (output datasets) must be merged
  • More generally the transformation might be a DAG
    (future)
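
The split, apply and merge pattern described above can be sketched in a few lines of Python. This is only an illustration, not DIAL code: the Dataset class, the function names and the toy select_even transformation are all hypothetical. The max_events argument plays the role of a user preference, which changes how the job is split but not the events in the merged result.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset: content (event IDs) plus location (files)."""
    event_ids: tuple
    files: tuple = ()


def split(dataset: Dataset, max_events: int) -> List[Dataset]:
    """Split the input dataset into sub-datasets of at most max_events events."""
    ids = dataset.event_ids
    return [Dataset(ids[i:i + max_events]) for i in range(0, len(ids), max_events)]


def merge(results: List[Dataset]) -> Dataset:
    """Merge the sub-job output datasets into a single output dataset."""
    events = tuple(e for r in results for e in r.event_ids)
    files = tuple(f for r in results for f in r.files)
    return Dataset(events, files)


def run_job(xform: Callable[[Dataset], Dataset], dataset: Dataset, max_events: int) -> Dataset:
    """A job: one transformation applied to one dataset, run as sub-jobs."""
    sub_results = [xform(sub) for sub in split(dataset, max_events)]
    return merge(sub_results)


def select_even(ds: Dataset) -> Dataset:
    """Toy transformation: keep even event IDs and name one output file."""
    kept = tuple(e for e in ds.event_ids if e % 2 == 0)
    return Dataset(kept, (f"out_{ds.event_ids[0]}.root",))


whole = Dataset(tuple(range(10)))
merged = run_job(select_even, whole, max_events=4)   # three sub-jobs
single = run_job(select_even, whole, max_events=10)  # one sub-job
```

Both runs select the same events; only the number of output files differs, which is the sense in which the preference does not affect the essential result.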

5
Components (cont)
Transformation
6
Datasets
  • Datasets enable users to examine and access data
  • For ATLAS data, we identify
  • Types of data
  • Used to define dataset categories
  • Category is part of the content specification
  • Types of datasets
  • Currently C++ classes with XML data
    representation
  • Third column indicates if this class exists
  • Parameter in the new dataset XML
  • See table on following page for ATLAS examples
  • There is now a single XML schema for all types of
    datasets

7
Datasets
8
Datasets (cont)
Example dataset
acas001> dataset_property -i 10003-20151 print
AtlasPoolEventDataset 10003-20151 with no parent is locked and not empty
  Content includes 1 block
    AtlasPoolEventDatasetAOD
      Content ID list has 17 entries
        type MissingET with with key MET_Base
        type MissingET with with key MET_Calib
        type MissingET with with key MET_Truth
        type ParticleBaseContainer with with key BCandidates
        type ParticleBaseContainer with with key ElectronCollection
        type ParticleBaseContainer with with key MuonCollection
        type ParticleBaseContainer with with key ParticleJetContainer
        type ParticleBaseContainer with with key PhotonCollection
        type ParticleBaseContainer with with key TauJetCollection
        type LVL1_ROI with with key LVL1_ROI
        type VxContainer with with key VxPrimaryCandidate
        type CTP_Decision with with key CTP_Decision
        type INavigable4MomentumCollection with with key MuonboyTrackParticles
        type INavigable4MomentumCollection with with key TrackParticleCandidate
        type RecTrackParticleContainer with with key MooreTrackParticles
        type RecTrackParticleContainer with with key MuidCombnoSeedTrackParticles
        type RecTrackParticleContainer with with key MuidStandAloneTrackParticles
      Event count is 1073
  Location has 1 logical file
    Logical file
      Catalog MagdaFileCatalogAtlas
      ID AOD_3401_MultiLeptonGamma.AOD.pool.root
      State READONLY
Slide callouts: Type, ID, Content type, Content (too many events to list), Location, No sub-datasets
9
Transformations (cont)
Now
Soon
  • For ATLAS we identify the above transformations
  • Characterized by input and output dataset
    categories
  • Most common ones listed above
  • Others likely
  • Those available now are highlighted
  • See talks by F. Fassi and C. Haeberli

10
Services
  • Services enable users to find and examine
    existing data and create new data.
  • Services include
  • Analysis services to submit and monitor jobs
  • Catalog services to
  • Select data
  • Record data, metadata and transformations
  • Examine and record data provenance
  • Data management services to access the data
    (files)
  • Clients provide the user interface to these
    services
  • ROOT command line
  • Python command line (back soon)
  • GUI (based on Python) planned

11
Changes
  • Move from DIAL 0.92 to 0.94 (almost)
  • Generic dataset schema (see following)
  • Hierarchical content (see following)
  • Unique ID service
  • Many changes to catalog interface (see following)
  • Transformations
  • Integration with production system (C. Haeberli)
  • Integrate analysis algorithm from the analysis
    tools group (F. Fassi)
  • Package management
  • Define user/application interface (G. Rybkine)
  • Provide reference implementation (G. Rybkine)

12
Changes (cont)
  • Analysis services
  • Continued integration with gLite (D. Liko)
  • Begin work on prodsys analysis service (F.
    Brochu)
  • Data management
  • Improved understanding of SRM
  • Integration of gLite prototype file catalog into
    DQ (F. Orellana)

13
Generic dataset schema
  • Version 0.94 of DIAL includes a class
    GenericDataset
  • Means to write to and read from an XML
    description
  • All ADA datasets inherit from this without adding
    persistent data
  • Advantages
  • Processing system does not need to know the full
    dataset type
  • Much easier to make use of datasets outside of
    DIAL
  • Including languages other than C++, e.g. Python
  • Other components already have generic schema
  • I.e., the application, task, job
  • Schemas for the first two need work
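
As a rough illustration of why a generic schema helps, the sketch below writes a typed dataset to XML and reads it back knowing only the generic class. The element and attribute names (dataset, type, id, file, name) and the read_dataset helper are assumptions made for this example; they are not the DIAL 0.94 schema. The dataset ID and file name are taken from the example dataset shown earlier.

```python
import xml.etree.ElementTree as ET


class GenericDataset:
    """Holds all persistent data. Element and attribute names are hypothetical."""

    def __init__(self, type_name, dataset_id, files):
        self.type_name = type_name
        self.dataset_id = dataset_id
        self.files = list(files)

    def to_xml(self):
        """Write the dataset description as an XML string."""
        root = ET.Element("dataset", {"type": self.type_name, "id": self.dataset_id})
        for name in self.files:
            ET.SubElement(root, "file", {"name": name})
        return ET.tostring(root, encoding="unicode")


class AtlasPoolEventDataset(GenericDataset):
    """Typed dataset: adds behaviour only, no extra persistent fields."""

    def __init__(self, dataset_id, files):
        super().__init__("AtlasPoolEventDataset", dataset_id, files)


def read_dataset(text):
    """Read any dataset description without knowing its full type."""
    root = ET.fromstring(text)
    files = [f.get("name") for f in root.findall("file")]
    return GenericDataset(root.get("type"), root.get("id"), files)


typed = AtlasPoolEventDataset("10003-20151", ["AOD_3401_MultiLeptonGamma.AOD.pool.root"])
generic = read_dataset(typed.to_xml())
```

Because the typed class adds no persistent data, nothing is lost when a processing system handles the description as a GenericDataset.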

14
Hierarchical content
  • Each dataset description includes content
  • List of event IDs if relevant and not too large
  • List of type-keys describing the contained objects
  • For each event in an event dataset
  • Like the type-keys in StoreGate
  • New release sorts this list into groups
  • Typically one per processing stage
  • For ATLAS: RDO, ESD, AOD, ...
  • Dataset can now hold both ESD and AOD with clear
    distinction
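
The grouping of type-keys into one block per processing stage might look like the following sketch. The function name and the tuple layout are assumptions for illustration. The two AOD entries come from the example dataset shown earlier; the ESD entry is a made-up placeholder, not a real ATLAS type or key.

```python
def group_by_stage(entries):
    """Group (stage, type, key) entries into one content block per stage."""
    blocks = {}
    for stage, type_name, key in entries:
        blocks.setdefault(stage, []).append((type_name, key))
    return blocks


entries = [
    ("AOD", "MissingET", "MET_Base"),
    ("ESD", "HypotheticalEsdType", "HypotheticalEsdKey"),  # placeholder entry
    ("AOD", "ParticleBaseContainer", "ElectronCollection"),
]
content = group_by_stage(entries)
```

With the content held this way, a single dataset can carry both ESD and AOD objects while keeping them clearly separated.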

15
DIAL catalog interface
  • Much work in DIAL to rationalize the interface
    through which users interact with catalogs
  • Class interface for standard catalog types
  • XyzRepository stores string (XML) descriptions of
    Xyz objects
  • XyzSelectionCatalog associates metadata with Xyz
    ID and name
  • XyzReplicaCatalog associates replica-logical IDs
    for Xyz
  • Here Xyz = Dataset, Job, Application, Task, ...
  • Generic interface for each of the above
  • String ID instead of Xyz ID
  • So implementation of GenericRepository interface
    can be shared by DatasetRepository,
    JobRepository, ...
  • Generic implementations include
  • File based (only for GenericRepository)
  • MySQL table
  • AMI
  • Web service (so far only GenericRepository)
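
The idea of sharing one generic implementation across typed repositories can be sketched as below. The class names GenericRepository, DatasetRepository and JobRepository come from the slide, but every method name and signature here is an assumption, and the in-memory dictionary stands in for the file, MySQL, AMI or web service back ends.

```python
class GenericRepository:
    """In-memory stand-in: maps a string ID to a string (XML) description."""

    def __init__(self):
        self._store = {}

    def put(self, string_id, xml):
        self._store[string_id] = xml

    def get(self, string_id):
        return self._store[string_id]


class TypedRepository:
    """Typed front end that converts its own ID type to a string ID."""

    def __init__(self, backend):
        self._backend = backend

    def put(self, obj_id, xml):
        self._backend.put(str(obj_id), xml)

    def get(self, obj_id):
        return self._backend.get(str(obj_id))


class DatasetRepository(TypedRepository):
    pass


class JobRepository(TypedRepository):
    pass


# Each typed repository gets its own instance of the same generic implementation.
datasets = DatasetRepository(GenericRepository())
jobs = JobRepository(GenericRepository())
datasets.put("10003-20151", "<dataset/>")
jobs.put(42, "<job/>")
```

Swapping the dictionary for a database table or a web service would then change only the generic class, not the typed front ends.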

16
Goals for this release
  • User should be able to
  • Select dataset from DSC (dataset selection
    catalog)
  • Run aodhisto transformation
  • Input is any AOD (or other event collection)
    dataset
  • Output is a dataset containing root histograms
  • Makes use of the analysis tools algorithm
  • User can supply their own job options and
    analysis algorithm
  • Run atlasreco transformation
  • Input is any RDO dataset
  • Output is ESD dataset
  • Makes use of the production system transform for
    release 9.0.x
  • Monitor job status for running jobs
  • Get description including location of any output
    dataset
  • Easily view the histograms in a root histogram
    dataset

17
Current status
  • Releases
  • DIAL release 0.94 is on hold until everything
    else needed for the release goals is in place
  • DIAL 0.93 changes often but is now close to what
    0.94 will be
  • Functionality
  • Root demos 4 and 5 have been added to illustrate
    use of aodhisto and atlasreco, respectively
  • aodhisto has only been run with one dataset at
    one site
  • atlasreco cannot use 9.0.2 and is flaky with
    9.0.1 due to ATLAS SW problems
  • Magda is being used to catalog and move files
  • A few demo single-file datasets have been
    cataloged
  • See http://www.atlasgrid.bnl.gov/dialds/dlShowMain.pl

18
Goals for the next release
  • Transformations
  • Clarify transformation interface (see following)
  • So users can add transformations
  • Continue development of aodhisto (F. Fassi)
  • Complete suite of prodsys transformations (C.
    Haeberli)
  • Catalogs
  • Build catalog of datasets from existing
    production and user data
  • Add transformation catalogs
  • Add local (to server) and global job catalogs
  • Provide catalog interface integrated with job
    submission client(s)
  • Analysis services
  • Enable ADA production with DC2 production system
    (F. Brochu)
  • Enable ADA production and analysis with gLite WMS
    (D. Liko)

19
Goals for the next release (cont)
  • Data management
  • User anywhere can put and get data from a storage
    element (SE)
  • SE can retrieve requested data from other SEs
  • Integrate DIAL with DQ and SRM (F. Orellana)
  • Package management
  • Continue development of ADA package management
    interface and implementations (G. Rybkine)
  • Integrate DIAL with ADA PM system
  • Deploy PM at processing sites, i.e. integrate
    with existing systems
  • AJDL
  • Revisit transformation specification
  • Integrate with GANGA and DC2 production system
  • Better error reporting

20
Transformation interface
  • Clarify and document transformation interface
  • How xform is packaged and released
  • How analysis service finds xform
  • Runtime environment that a xform can expect
  • How xform is called
  • How xform finds input dataset and extracts its
    files
  • How transform locates software (including itself
    and its task)
  • How transform stores output files and creates
    output dataset
  • How transform indicates job status (running,
    failed, done, ...)
  • How task (user code) is built and accessed
  • Make it easy for users to add their own
    transformations
  • E.g. run my athena algorithm
  • Keep task mechanism for runtime configuration
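
Since the interface is still to be defined, the following is only one possible shape for a few of the points above: how an xform is called, how it reports status, and how a user adds their own. All names here (JobStatus, run_xform, my_algorithm, the task dictionary) are hypothetical and are not part of DIAL or ADA.

```python
from enum import Enum


class JobStatus(Enum):
    RUNNING = "running"
    FAILED = "failed"
    DONE = "done"


def run_xform(xform, input_files, task):
    """Call the transformation and return (status history, output files)."""
    history = [JobStatus.RUNNING]
    try:
        outputs = list(xform(input_files, task))
    except Exception:
        # Any error raised by the transformation marks the job as failed.
        history.append(JobStatus.FAILED)
        return history, []
    history.append(JobStatus.DONE)
    return history, outputs


def my_algorithm(input_files, task):
    """Toy user transformation: names one histogram file per input file."""
    if not input_files:
        raise ValueError("no input files")
    return [f"{task['prefix']}_{i}.root" for i, _ in enumerate(input_files)]


ok_history, ok_outputs = run_xform(
    my_algorithm, ["a.pool.root", "b.pool.root"], {"prefix": "hist"}
)
bad_history, bad_outputs = run_xform(my_algorithm, [], {"prefix": "hist"})
```

Here the task dictionary carries the runtime configuration, so the same transformation can be reused with different user settings.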

21
Conclusions
  • Status
  • Much progress since last meeting but more to do
  • Still in demo mode
  • Releases
  • Expect DIAL 0.94 soon
  • When other pieces are in place
  • Then would like to get feedback on interface and
    functionality
  • Aim for ADA/DIAL 1.0 in February
  • A useful system, more than a demo
  • Meeting the short-term goals outlined earlier
  • Need more people
  • Within ADA
  • More attention from external providers (DQ, AMI,
    prodsys)
  • Physics contributions of data and algorithms

22
More information
  • For more information on ADA, see the home page
  • http://www.usatlas.bnl.gov/ADA
  • Includes status of subprojects, relevant talks
    and documents, and links to associated projects
  • DIAL release 0.94 is described at
  • http://www.usatlas.bnl.gov/dladams/dial/releases/0.94/index.html
  • To try it out, run DIAL root demos 4 and 5 in
    that release
  • Comments and questions
  • ADA mailing list
  • ADA Savannah coming soon
  • DIAL Savannah (with bug reporting) linked from
    DIAL page