Title: ATLAS Data Management over GRID
1 ATLAS Data Management over GRID
- Alexei Klimentov, Pavel Nevski and Torre Wenaus, BNL
- HEPiX Spring 2006
- Rome, April 5th 2006
2 Distributed Data Management
- Very first assumption
  - Raw computing power and storage capacity are everywhere
  - For ATLAS this means 3 Grids, 88 sites, 8K CPUs and 2 PB of disk
  - Distributed computing power
  - Distributed storage capacities
  - Data is stored on different storage systems using different access technologies
  - So we have not just a Grid of resources, but a Grid of technologies (ML)
  - Do we have tools to manipulate terabytes of data?
- What do we need (a sketch follows this list)
  - High-performance and reliable data movement
  - Manage information about data location
  - Manage information about data replication
  - Support the multiple Grid flavors; the Grid specifics must be hidden from the user
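A minimal sketch, in Python with purely hypothetical names (GridBackend, ExampleBackend and replicate are not ATLAS software), of how Grid-flavor specifics can be kept behind one common interface so the user never sees them:

    # Hypothetical sketch, not ATLAS software: hide Grid-flavor specifics
    # behind a common interface for replica lookup and data movement.
    from abc import ABC, abstractmethod


    class GridBackend(ABC):
        """Common interface so callers never see flavor-specific middleware."""

        @abstractmethod
        def replicas(self, lfn: str) -> list[str]:
            """Return the storage URLs currently holding the logical file."""

        @abstractmethod
        def transfer(self, source_url: str, dest_url: str) -> None:
            """Queue a copy of one file between storage elements."""


    class ExampleBackend(GridBackend):
        """Placeholder; a real backend would talk to the Grid middleware."""

        def replicas(self, lfn: str) -> list[str]:
            return [f"srm://some-se.example.org/atlas/{lfn}"]  # illustrative only

        def transfer(self, source_url: str, dest_url: str) -> None:
            print(f"queueing {source_url} -> {dest_url}")


    def replicate(lfn: str, dest_se: str, backend: GridBackend) -> None:
        # The user asks for a logical file at a destination site; the backend
        # resolves a source replica and moves the data, whatever the Grid flavor.
        source = backend.replicas(lfn)[0]
        backend.transfer(source, f"srm://{dest_se}/atlas/{lfn}")


    replicate("raw/run0001.data", "dest-se.example.org", ExampleBackend())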
3 ATLAS Average Tier-1 Data Flow (2008)
- [Diagram (D. Barberis): Tier-0 to Tier-1 flow through disk buffer, tape, CPU farm and disk storage; data access for analysis (ESD, AODm)]
- Inbound from T0 to T1: 58.6 MB/s (no HI data)
- Outbound to T0: 2.5 MB/s
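A quick back-of-envelope check on the quoted rates, assuming the 58.6 MB/s inbound average is sustained around the clock:

    # Sanity check of the average Tier-1 inbound rate from the slide above.
    inbound_mb_per_s = 58.6                                # average T0 -> T1 rate
    seconds_per_day = 86_400
    daily_tb = inbound_mb_per_s * seconds_per_day / 1e6    # decimal TB per day
    print(f"~{daily_tb:.1f} TB/day into an average Tier-1")  # ~5.1 TB/day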
4 ATLAS Distributed MC Production
- [Chart: production per site]
- 3 Grids, 20 countries, 69 sites, 260,000 jobs, 2 MSi2k.months
5 ATLAS Data Management - Don Quijote
- The second generation of the ATLAS DDM system (DQ2, M. Branco, D. Cameron)
- Moved to a dataset-based approach
- Technicalities
  - Dataset: an aggregation of files plus associated metadata
  - Datablock: a frozen (permanently immutable) aggregation of files, for the purposes of distribution
- Global services
  - No global physical file replica catalog
  - Global dataset repository
  - Global dataset location catalog
- Local site services
  - Per grid/site/tier, providing logical-to-physical file name mapping; implementations of this catalog are Grid specific. Currently all local catalogues are deployed per ATLAS site/SE
  - Transfer service (currently gLite FTS) and a transient database of queued transfers; triggers file transfers and handles all necessary bookkeeping
- Subscription
  - Any site can subscribe to a dataset
  - A new version of the dataset is automatically made available on that site
  - All managed data movement in the system is automated using the subscription system
- Notification
  - When the content of a dataset is modified, the subscribing sites are notified and data is moved accordingly (see the sketch below)
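A minimal sketch, in Python with illustrative names only (this is not the DQ2 API), of the dataset, datablock, catalog and subscription concepts above:

    # Toy model of DQ2 concepts: datasets/datablocks, global catalogs,
    # subscriptions and change notifications. Names are illustrative.
    from dataclasses import dataclass, field


    @dataclass
    class Dataset:
        name: str
        files: set[str] = field(default_factory=set)  # logical file names
        frozen: bool = False                          # frozen => "datablock"

        def add_file(self, lfn: str) -> None:
            if self.frozen:
                raise ValueError("datablocks are immutable")
            self.files.add(lfn)


    class GlobalCatalogs:
        """Global services: dataset repository + dataset location catalog.
        There is no global file replica catalog; file-level mapping stays local."""

        def __init__(self) -> None:
            self.repository: dict[str, Dataset] = {}      # dataset name -> content
            self.locations: dict[str, set[str]] = {}      # dataset name -> sites holding it
            self.subscriptions: dict[str, set[str]] = {}  # dataset name -> subscribed sites

        def register(self, ds: Dataset) -> None:
            self.repository[ds.name] = ds

        def subscribe(self, site: str, ds_name: str) -> None:
            self.subscriptions.setdefault(ds_name, set()).add(site)

        def notify_changed(self, ds_name: str) -> None:
            # Notification: subscribed sites learn the content changed; each local
            # site service would then queue the file transfers (e.g. via FTS).
            ds = self.repository[ds_name]
            for site in self.subscriptions.get(ds_name, set()):
                print(f"{site}: schedule transfer of {len(ds.files)} files of {ds_name}")
                self.locations.setdefault(ds_name, set()).add(site)


    catalogs = GlobalCatalogs()
    ds = Dataset("example.raw.dataset")    # illustrative dataset name
    ds.add_file("raw.0001.root")
    catalogs.register(ds)
    catalogs.subscribe("SITE_A", ds.name)
    catalogs.notify_changed(ds.name)       # SITE_A schedules the transfer

In this toy model a datablock is simply a dataset whose frozen flag forbids further changes, and every managed movement starts from a subscription followed by a change notification.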
6 DQ2 Architecture
7 ATLAS Distributed Data Management: DQ2 production status
- The production version is 0_1_4 (development version 0_2_x)
- Deployed on the VO boxes of 7 Tier-1s and on Tier-2s, mostly in the US
- Central database and services are located at CERN
- Integrated with Panda (the US ATLAS production and analysis job execution system)
- In use for ATLAS Tile Calorimeter commissioning data
- Under test for Distributed Analysis
8 ATLAS Data Distribution with DQ2
9 Data Handling in Panda Production
10 Data Flow in a Commissioning Project
- [Diagram (A. Klimentov): the TDAQ/RunControl partition at Point-1 (RunParams panel, Recorder, EventStorage, IS meta-info, CondDB) writes raw data files and metadata to local DAQ disks; a data transfer agent ships them via CDR to the T0 SE (CASTOR); raw datasets and metadata are registered in ATLAS DDM (DQ2) and the AMI dataset selection catalogue for project managers and end users]
11 Lessons we learned
- A Grid of technologies (and sometimes it is too complex)
- We gain by consolidating data on Tier-1s for permanent data storage
- CPU resources are associated with storage resources
- There is no gain in using large (TB) sets of data over the Grid
- There is no gain in running very long (weeks-long) jobs over the Grid
- Performance issues depend not so much on the network as on the source and destination storage systems
12 More Information
- ATLAS Computing TDR
  - http://atlas-proj-computing-tdr.web.cern.ch/atlas-proj-computing-tdr/Html/Computing-TDR.htm
- DDM
  - https://uimon.cern.ch/twiki/bin/view/Atlas/DistributedDataManagement
- Panda
  - https://uimon.cern.ch/twiki/bin/view/Atlas/PanDA