ATLAS Data Management over GRID

1
ATLAS Data Management over GRID
  • Alexei Klimentov, Pavel Nevski and Torre Wenaus,
    BNL
  • HEPiX spring 2006
  • Rome, Apr 5th 2006

2
Distributed Data Management
  • Very first assumption
  • Raw computing power and storage capacity are
    everywhere.
  • For ATLAS it is 3 Grids, 88 sites with 8K CPUs
    and 2 PB of disk
  • Distributed computing power
  • Distributed storage capacities
  • Data is stored on different storage systems using
    different access technologies
  • So we have not just a Grid of resources; it is a
    Grid of technologies (ML)
  • Do we have tools to manipulate terabytes of data?
  • What do we need?
  • High performance and reliable data movement
  • Manage information about data location
  • Manage information about replicating data
  • Support multiple Grid flavors; the Grid
    specifics must be hidden from the user

3
ATLAS average Tier-1 Data Flow (2008)
[Diagram, credit D. Barberis. Labels: Tier-0, tape, disk
buffer, CPU farm, disk storage.]
Inbound from T0 to T1: 58.6 MB/s (no HI data)
Outbound to T0: 2.5 MB/s
Data access for analysis: ESD, AODm
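
To put the quoted Tier-1 rates in perspective, the short sketch below converts them to daily volumes. It assumes the rates are sustained averages over a full 24 hours and uses decimal units (1 TB = 1e6 MB); both are assumptions made here, not statements from the slide, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s):
    """Convert a sustained rate in MB/s to TB/day (decimal: 1 TB = 1e6 MB)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1e6


# Rates quoted on the slide for an average Tier-1.
inbound_tb = daily_volume_tb(58.6)   # T0 -> T1, no HI data
outbound_tb = daily_volume_tb(2.5)   # T1 -> T0

print(f"inbound:  {inbound_tb:.2f} TB/day")
print(f"outbound: {outbound_tb:.3f} TB/day")
```

Under these assumptions the inbound stream is about 5 TB per day per Tier-1, which is the scale the transfer and storage systems have to sustain.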
4
ATLAS Distributed MC Production
Production per site
3 Grids, 20 countries, 69 sites, 260000 jobs,
2 MSi2k.months
5
ATLAS Data Management - Don Quijote
  • The second generation of the ATLAS DDM system (DQ2,
    M. Branco, D. Cameron)
  • Moved to a dataset-based approach
  • Technicalities
  • Dataset: an aggregation of files plus
    associated metadata
  • Datablock: a frozen (permanently immutable)
    aggregation of files for the purposes of
    distribution
  • Global services
  • No global physical file replica catalog
  • global dataset repository
  • global dataset location catalog
  • Local Site services
  • Per grid/site/tier catalog providing logical to
    physical file name mapping. Implementations of
    this catalog are Grid specific. Currently all
    local catalogues are deployed per ATLAS site/SE
  • Transfer service (currently gLite FTS) and a
    transient database of queued transfers. Triggers
    file transfers and handles all necessary
    bookkeeping.
  • Subscription
  • Any site can subscribe to a dataset
  • A new version of the dataset is automatically made
    available at the site
  • All managed data movement in the system is
    automated using the subscription system
  • Notification
  • When the content of a dataset is modified, the
    sites subscribing to it are notified and data is
    moved accordingly
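
The subscription and notification behaviour described on this slide can be sketched as a toy model. This is not the DQ2 API: the class names, method names, dataset name and file names below are all hypothetical, and the real system delegates transfers to a service such as gLite FTS rather than a Python list. The sketch only shows the idea that a content change bumps the dataset version and queues the missing files for every subscribed site.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A named aggregation of files; each content change bumps the version."""
    name: str
    files: set = field(default_factory=set)
    version: int = 0


class DatasetRepository:
    """Toy stand-in for a global dataset repository with subscriptions."""

    def __init__(self):
        self.datasets = {}        # dataset name -> Dataset
        self.subscriptions = {}   # dataset name -> set of site names
        self.transfer_queue = []  # (site, logical file name) pairs to move

    def subscribe(self, site, name, site_files):
        """Register a site and queue the files it does not hold yet."""
        self.subscriptions.setdefault(name, set()).add(site)
        self._queue_missing(site, name, site_files)

    def add_files(self, name, new_files, holdings):
        """Modify a dataset, then 'notify' every subscribed site.

        holdings maps site name -> set of files already present there.
        """
        ds = self.datasets.setdefault(name, Dataset(name))
        ds.files |= set(new_files)
        ds.version += 1
        for site in sorted(self.subscriptions.get(name, ())):
            self._queue_missing(site, name, holdings.get(site, set()))

    def _queue_missing(self, site, name, site_files):
        """Queue every dataset file the site lacks (sorted for determinism)."""
        ds = self.datasets.get(name)
        if ds is None:
            return
        for lfn in sorted(ds.files - set(site_files)):
            self.transfer_queue.append((site, lfn))


# Usage with made-up names: one site subscribes, then the dataset grows.
repo = DatasetRepository()
repo.add_files("tilecal.raw", ["f1", "f2"], holdings={})
repo.subscribe("BNL", "tilecal.raw", site_files={"f1"})
repo.add_files("tilecal.raw", ["f3"], holdings={"BNL": {"f1", "f2"}})
print(repo.transfer_queue)
```

Keeping the queue as plain (site, file) pairs mirrors the slide's split between the subscription logic and a separate transfer service that does the actual data movement and bookkeeping.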

6
DQ2 Architecture
7
ATLAS Distributed Data Management
DQ2 production status
  • The production version is 0_1_4
  • (the development version is 0_2_x)
  • Deployed on VO boxes at 7 T-1s and at T-2s, mostly
    in the US
  • Central database and services located at CERN
  • Integrated with Panda (the US ATLAS production and
    analysis job execution system)
  • In use for ATLAS Tile Calorimeter commissioning
    data
  • Under test for Distributed Analysis

8
ATLAS Data Distribution with DQ2
9
Data Handling in Panda Production
10
Data Flow in a Commissioning Project
[Diagram, credit A. Klimentov. Recoverable labels:
Project; Proj. Mgr, SwInG; partition; CondDB; TDAQ;
ATLAS Commissioning; ATLAS CdataAgent; IS; meta-info;
RunControl; Recorder; RunParams panel; DS Selection
Catalogue; EventStorage; ATLAS DDM; AMI; End users;
Raw data files meta; DQ2; local DAQ disks; Raw
datasets; Data transfer agent; CDR; Point-1; T0;
SE (CASTOR).]
11
Lessons we learned
  • Grid of technologies (and sometimes it is too
    complex)
  • We gain if we consolidate data on Tier-1s for
    permanent data storage
  • CPU resources are associated with storage
    resources
  • No gain from using large (TB) sets of data over
    the Grid
  • No gain from running very long (weeks) jobs over
    the Grid
  • Performance issues depend not so much on the
    network as on the source and destination storage
    systems

12
More Information
  • ATLAS Computing TDR
  • http://atlas-proj-computing-tdr.web.cern.ch/atlas-proj-computing-tdr/Html/Computing-TDR.htm
  • DDM
  • https://uimon.cern.ch/twiki/bin/view/Atlas/DistributedDataManagement
  • Panda
  • https://uimon.cern.ch/twiki/bin/view/Atlas/PanDA