Title: ATLAS Data Management over GRID
1 ATLAS Data Management over GRID
- Alexei Klimentov, Pavel Nevski and Torre Wenaus, BNL
- HEPiX Spring 2006
- Rome, April 5th 2006
2 Distributed Data Management
- Very first assumption
  - Raw computing power and storage capacity are everywhere
  - For ATLAS this means 3 Grids, 88 sites, 8K CPUs and 2 PB of disk
  - Distributed computing power
  - Distributed storage capacities
  - Data is stored on different storage systems using different access technologies
  - So we have not just a Grid of resources, but a Grid of technologies (ML)
  - Do we have tools to manipulate terabytes of data?
- What do we need (a sketch follows this list)
  - High-performance and reliable data movement
  - Manage information about data location
  - Manage information about data replication
  - Support the multiple Grid flavors; the Grid specifics must be hidden from the user
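A minimal sketch, in Python with purely hypothetical names (GridBackend, ExampleBackend and replicate are not ATLAS software), of how Grid-flavor specifics can be kept behind one common interface so the user never sees them:

    # Hypothetical sketch, not ATLAS software: hide Grid-flavor specifics
    # behind a common interface for replica lookup and data movement.
    from abc import ABC, abstractmethod


    class GridBackend(ABC):
        """Common interface so callers never see flavor-specific middleware."""

        @abstractmethod
        def replicas(self, lfn: str) -> list[str]:
            """Return the storage URLs currently holding the logical file."""

        @abstractmethod
        def transfer(self, source_url: str, dest_url: str) -> None:
            """Queue a copy of one file between storage elements."""


    class ExampleBackend(GridBackend):
        """Placeholder; a real backend would talk to the Grid middleware."""

        def replicas(self, lfn: str) -> list[str]:
            return [f"srm://some-se.example.org/atlas/{lfn}"]  # illustrative only

        def transfer(self, source_url: str, dest_url: str) -> None:
            print(f"queueing {source_url} -> {dest_url}")


    def replicate(lfn: str, dest_se: str, backend: GridBackend) -> None:
        # The user asks for a logical file at a destination site; the backend
        # resolves a source replica and moves the data, whatever the Grid flavor.
        source = backend.replicas(lfn)[0]
        backend.transfer(source, f"srm://{dest_se}/atlas/{lfn}")


    replicate("raw/run0001.data", "dest-se.example.org", ExampleBackend())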
3 ATLAS Average Tier-1 Data Flow (2008)
- [Diagram (D. Barberis): Tier-0 to Tier-1 flow through disk buffer, tape, CPU farm and disk storage; data access for analysis (ESD, AODm)]
- Inbound from T0 to T1: 58.6 MB/s (no HI data)
- Outbound to T0: 2.5 MB/s
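A quick back-of-envelope check on the quoted rates, assuming the 58.6 MB/s inbound average is sustained around the clock:

    # Sanity check of the average Tier-1 inbound rate from the slide above.
    inbound_mb_per_s = 58.6                                # average T0 -> T1 rate
    seconds_per_day = 86_400
    daily_tb = inbound_mb_per_s * seconds_per_day / 1e6    # decimal TB per day
    print(f"~{daily_tb:.1f} TB/day into an average Tier-1")  # ~5.1 TB/day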
4 ATLAS Distributed MC Production
- [Chart: production per site]
- 3 Grids, 20 countries, 69 sites, 260,000 jobs, 2 MSi2k.months
5 ATLAS Data Management - Don Quijote
- The second generation of the ATLAS DDM system (DQ2, M. Branco, D. Cameron)
- Moved to a dataset-based approach
- Technicalities
  - Dataset: an aggregation of files plus associated metadata
  - Datablock: a frozen (permanently immutable) aggregation of files, for the purposes of distribution
- Global services
  - No global physical file replica catalog
  - Global dataset repository
  - Global dataset location catalog
- Local site services
  - Per grid/site/tier, providing logical-to-physical file name mapping; implementations of this catalog are Grid specific. Currently all local catalogues are deployed per ATLAS site/SE
  - Transfer service (currently gLite FTS) and a transient database of queued transfers; triggers file transfers and handles all necessary bookkeeping
- Subscription
  - Any site can subscribe to a dataset
  - A new version of the dataset is automatically made available on that site
  - All managed data movement in the system is automated using the subscription system
- Notification
  - When the content of a dataset is modified, the subscribing sites are notified and data is moved accordingly (see the sketch below)
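A minimal sketch, in Python with illustrative names only (this is not the DQ2 API), of the dataset, datablock, catalog and subscription concepts above:

    # Toy model of DQ2 concepts: datasets/datablocks, global catalogs,
    # subscriptions and change notifications. Names are illustrative.
    from dataclasses import dataclass, field


    @dataclass
    class Dataset:
        name: str
        files: set[str] = field(default_factory=set)  # logical file names
        frozen: bool = False                          # frozen => "datablock"

        def add_file(self, lfn: str) -> None:
            if self.frozen:
                raise ValueError("datablocks are immutable")
            self.files.add(lfn)


    class GlobalCatalogs:
        """Global services: dataset repository + dataset location catalog.
        There is no global file replica catalog; file-level mapping stays local."""

        def __init__(self) -> None:
            self.repository: dict[str, Dataset] = {}      # dataset name -> content
            self.locations: dict[str, set[str]] = {}      # dataset name -> sites holding it
            self.subscriptions: dict[str, set[str]] = {}  # dataset name -> subscribed sites

        def register(self, ds: Dataset) -> None:
            self.repository[ds.name] = ds

        def subscribe(self, site: str, ds_name: str) -> None:
            self.subscriptions.setdefault(ds_name, set()).add(site)

        def notify_changed(self, ds_name: str) -> None:
            # Notification: subscribed sites learn the content changed; each local
            # site service would then queue the file transfers (e.g. via FTS).
            ds = self.repository[ds_name]
            for site in self.subscriptions.get(ds_name, set()):
                print(f"{site}: schedule transfer of {len(ds.files)} files of {ds_name}")
                self.locations.setdefault(ds_name, set()).add(site)


    catalogs = GlobalCatalogs()
    ds = Dataset("example.raw.dataset")    # illustrative dataset name
    ds.add_file("raw.0001.root")
    catalogs.register(ds)
    catalogs.subscribe("SITE_A", ds.name)
    catalogs.notify_changed(ds.name)       # SITE_A schedules the transfer

In this toy model a datablock is simply a dataset whose frozen flag forbids further changes, and every managed movement starts from a subscription followed by a change notification.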
6 DQ2 Architecture
7 ATLAS Distributed Data Management: DQ2 production status
- The production version is 0_1_4 (development version 0_2_x)
- Deployed on the VO boxes of 7 Tier-1s and on Tier-2s, mostly in the US
- Central database and services are located at CERN
- Integrated with Panda (the US ATLAS production and analysis job execution system)
- In use for ATLAS Tile Calorimeter commissioning data
- Under test for Distributed Analysis
8 ATLAS Data Distribution with DQ2
9 Data Handling in Panda Production
10 Data Flow in a Commissioning Project
- [Diagram (A. Klimentov): the TDAQ/RunControl partition at Point-1 (RunParams panel, Recorder, EventStorage, IS meta-info, CondDB) writes raw data files and metadata to local DAQ disks; a data transfer agent ships them via CDR to the T0 SE (CASTOR); raw datasets and metadata are registered in ATLAS DDM (DQ2) and the AMI dataset selection catalogue for project managers and end users]
11 Lessons we learned
- A Grid of technologies (and sometimes it is too complex)
- We gain by consolidating data on Tier-1s for permanent data storage
- CPU resources are associated with storage resources
- There is no gain in using large (TB) sets of data over the Grid
- There is no gain in running very long (weeks-long) jobs over the Grid
- Performance issues depend not so much on the network as on the source and destination storage systems
12 More Information
- ATLAS Computing TDR
  - http://atlas-proj-computing-tdr.web.cern.ch/atlas-proj-computing-tdr/Html/Computing-TDR.htm
- DDM
  - https://uimon.cern.ch/twiki/bin/view/Atlas/DistributedDataManagement
- Panda
  - https://uimon.cern.ch/twiki/bin/view/Atlas/PanDA