Title: THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM: COMPONENTS AND OPERATIONS
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM
COMPONENTS AND OPERATIONS
Outline
- ATLAS Computing Model, tiers and clouds
- The ATLAS Distributed Data Management
  - Architecture and components
  - Interaction with external services
  - Support for users and groups
- Data Management Operations
- Experience of system commissioning and usage in production environments
- Wrap Up and Conclusions
The ATLAS Computing Model
- The ATLAS computing model is Grid oriented
  - High level of decentralization
- Sites are organized in a multi-tier structure
  - Hierarchical model
  - Tiers are defined by their ROLE in the experiment computing model
- Tier-0 at CERN
  - Records RAW data
  - Distributes a second copy to Tier-1s
  - Calibrates and does first-pass reconstruction
- Tier-1 centers
  - Manage permanent storage of RAW, simulated and processed data
  - Capacity for reprocessing and bulk analysis
- Tier-2 centers
  - Monte Carlo event simulation
  - End-user analysis
[Figure: ATLAS multi-tier data flow. The Tier-0 online filter farm and reconstruction farm produce RAW, ESD and AOD; RAW, ESD and AOD are shipped to the Tier-1s, which hold RAW, ESD, AOD and MC and exchange ESD, AOD and MC with their Tier-2s.]
ATLAS Tiers: the Cloud Model
- ATLAS sites are organized in clouds
- Every cloud consists of a T1 and several T2s
- Most clouds are defined by geography or funding
  - Not really a strict rule
- Implications of the cloud model
  - Services deployment
  - Support
The DDM Stack
The DDM in a Nutshell
- The Distributed Data Management system
  - enforces the concept of dataset
    - the unit of data placement and replication
  - is based on a subscription model
    - datasets are subscribed to sites
- A series of services enforce the subscriptions
  - Look up the data location in the LFC
  - Trigger data movement via FTS
  - Validate the data transfer
Dataset Definition and Organization: the DDM Central Catalogs
- Dataset Repository Catalog: holds all dataset names and unique IDs (plus system metadata)
- Dataset Content Catalog: maps each dataset to its constituent files; files are identified by a GUID (unique identifier) and an LFN (logical file name)
- Dataset Location Catalog: stores the locations of each dataset
- Container Catalog: maintains versioning information and information on container datasets, i.e. datasets consisting of other datasets
- Local Replica Catalog: one catalog per site, providing the logical to physical file name mapping (a sketch of how these catalogs relate follows below)
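Below is a minimal Python sketch of how the five catalogs relate, useful for following the rest of the talk. All names, structures and SURLs are illustrative; this is not the actual DDM schema or API.

# Dataset Repository Catalog: dataset name -> unique ID plus system metadata
repository = {
    "mydataset": {"duid": "duid-0001", "owner": "someuser", "state": "open"},
}

# Dataset Content Catalog: dataset name -> constituent files (GUID + LFN)
content = {
    "mydataset": [
        {"guid": "guid-1", "lfn": "file1.root"},
        {"guid": "guid-2", "lfn": "file2.root"},
    ],
}

# Dataset Location Catalog: dataset name -> sites holding a replica
location = {
    "mydataset": ["CNAF", "LYON"],
}

# Container Catalog: container name -> constituent datasets (1-level hierarchy)
containers = {
    "mycontainer/": ["mydataset", "otherdataset"],
}

# Local Replica Catalog (one per site): GUID -> physical replica (SURL)
lrc_cnaf = {
    "guid-1": "srm://storm.cnaf.infn.it/atlas/mydataset/file1.root",
    "guid-2": "srm://storm.cnaf.infn.it/atlas/mydataset/file2.root",
}

def replicas_of(dataset, site):
    """Resolve the physical replicas of a dataset at one site."""
    lrc = {"CNAF": lrc_cnaf}[site]   # pick the per-site catalog
    return [lrc[f["guid"]] for f in content[dataset] if f["guid"] in lrc]

print(replicas_of("mydataset", "CNAF"))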
More on Datasets
- DDM contains some system metadata for files and datasets
  - File sizes and file checksums
  - Dataset owner, size, ...
- But DDM is not a metadata catalog
  - ATLAS has a dataset metadata catalog, AMI
  - Link XXX
- Datasets can be in 3 states
  - Open: new files can be added
  - Closed: a dataset version has been tagged; no more files can be added to that version, but a new version can be opened and filled
  - Frozen: no more files can be added to (or removed from) the dataset
- Dataset Hierarchy
  - Dataset containers allow one to define a 1-level hierarchy
  - No containers of containers
- Datasets can overlap
  - The same file can be held by different datasets
(A small sketch of these rules follows below.)
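The state rules and the 1-level container hierarchy can be summarized with a small illustrative sketch; the classes below are an assumption about structure, not the DDM code.

from enum import Enum

class State(Enum):
    OPEN = "open"      # new files can be added
    CLOSED = "closed"  # this version is tagged; open a new version to add files
    FROZEN = "frozen"  # no more files can be added or removed

class Dataset:
    def __init__(self, name):
        self.name, self.state, self.files = name, State.OPEN, set()

    def add_file(self, guid):
        if self.state is not State.OPEN:
            raise RuntimeError(f"{self.name} is {self.state.value}: cannot add files")
        self.files.add(guid)

class Container:
    """A container holds datasets only: no containers of containers."""
    def __init__(self, name):
        self.name, self.datasets = name, []

    def add(self, element):
        if isinstance(element, Container):
            raise TypeError("containers of containers are not allowed")
        self.datasets.append(element)

d1, d2 = Dataset("data.run1"), Dataset("data.run2")
d1.add_file("guid-1")
d2.add_file("guid-1")                # datasets can overlap: same file in both
period = Container("data.period1/")
period.add(d1)
period.add(d2)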
Central Catalogs and a Site: an Example
- Dataset Content Catalog: <mydataset> -> {<guid1>: <lfn1>, <guid2>: <lfn2>, ...}
- Dataset Location Catalog: <mydataset> -> CNAF, LYON
- CNAF Local Catalog: <guid1> -> <SURL1>, <guid2> -> <SURL2>
- CNAF Storage: holds the physical replicas SURL1, SURL2
Catalogs and Storages in the Cloud Model
- ... i.e. how are the services really deployed?
- Every site (T1 and T2) hosts a Storage Element
- The local file catalog
  - relies on the LCG File Catalog (LFC) middleware
  - one LFC per cloud (at the T1)
  - contains information about all files in the T1 and all T2s of the cloud
  - this is purely a deployment strategy (a small deployment sketch follows the figure below)
[Figure: the Italian cloud. CNAF (T1) hosts the cloud LFC plus an SE and a CE; the T2s NAPOLI, MILANO, LNF and ROMA each host an SE and a CE.]
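A small sketch of the deployment rule, with an invented LFC host name; the cloud composition is the one shown in the figure.

# One LFC per cloud, hosted at the T1, holding the entries of the whole cloud.
CLOUDS = {
    "IT": {"t1": "CNAF", "t2s": ["NAPOLI", "MILANO", "LNF", "ROMA"],
           "lfc": "lfc://lfc.cr.cnaf.infn.it"},   # host name is made up
}

def lfc_for_site(site):
    """All T1 and T2 file entries of a cloud live in the T1-hosted LFC."""
    for cloud in CLOUDS.values():
        if site == cloud["t1"] or site in cloud["t2s"]:
            return cloud["lfc"]
    raise KeyError(f"unknown site {site}")

assert lfc_for_site("MILANO") == lfc_for_site("CNAF")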
SRM and Space Tokens
- There are many Storage Element implementations
  - Some offer disk-only storage, others offer a gateway to mass storage systems
- The Storage Resource Manager (SRM) offers a common interface to all storages
  - GridFTP is the common transfer protocol
  - Storage-specific access protocols
- SRM comes with space tokens
  - Partitioning of storage resources for different activities
- A DDM site is identified by the Grid site name plus the storage space token (an example and a parsing sketch follow below)
[Figure: requests go through SRM to the CASTOR, dCache, DPM, StoRM or BeStMan back-ends, with GridFTP and storage-specific access protocols underneath.]
Example DDM site definition (Grid site name + space token):
'CERN-PROD_DATADISK', 'srm': 'token:ATLASDATADISK:srm://srm-atlas.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/grid/atlas/atlasdatadisk/'
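A small Python sketch of how a DDM site name and its endpoint string (in the 'token:<SPACETOKEN>:<SRM URL>' layout of the example above) can be picked apart; the dictionary and the parsing are illustrative, not the real configuration code.

ENDPOINTS = {
    "CERN-PROD_DATADISK":
        "token:ATLASDATADISK:srm://srm-atlas.cern.ch:8443/srm/managerv2"
        "?SFN=/castor/cern.ch/grid/atlas/atlasdatadisk/",
}

def parse_ddm_site(ddm_site):
    entry = ENDPOINTS[ddm_site]
    _, token, srm = entry.split(":", 2)       # "token", space token, SRM URL
    grid_site, _ = ddm_site.rsplit("_", 1)    # Grid site name + token suffix
    return {"grid_site": grid_site, "space_token": token, "srm": srm}

print(parse_ddm_site("CERN-PROD_DATADISK"))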
Accounting
- Accounting is available with the granularity of one space token per site (a sketch of the aggregation follows below)
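A sketch of what accounting at this granularity amounts to: summing replica sizes grouped by DDM site (Grid site plus space token). The input records below are made up.

from collections import defaultdict

replicas = [
    {"ddm_site": "CERN-PROD_DATADISK", "bytes": 2_000_000_000},
    {"ddm_site": "CERN-PROD_DATADISK", "bytes": 3_500_000_000},
    {"ddm_site": "CNAF-T1_MCDISK",     "bytes": 1_200_000_000},   # invented name
]

usage = defaultdict(int)
for r in replicas:
    usage[r["ddm_site"]] += r["bytes"]

for site, total in sorted(usage.items()):
    print(f"{site}: {total / 1e9:.1f} GB")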
Subscriptions
- Subscriptions are dataset placement requests
  - "Dataset X should be hosted at site A"
- Transfer agents enforce the requests for a given site (see the sketch below). They:
  - resolve the dataset content
    - via the central Content Catalog
  - look for missing files at the destination site
    - via the destination site's LFC
  - find the existing locations of the missing files
    - via the Location Catalog and the source sites' LFCs
  - trigger the data movement
    - via FTS
  - register the destination files in the destination LFC
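The steps above can be read as the following pseudocode; every helper function is a stub standing in for the real catalog, LFC and FTS interactions, so this is a sketch of the workflow, not the site services implementation.

def process_subscription(dataset, dest_site):
    files = central_content_catalog(dataset)        # resolve the dataset content
    have = destination_lfc(dest_site, files)        # files already at destination
    missing = [f for f in files if f["guid"] not in have]
    for f in missing:
        sources = replica_locations(f)              # location catalog + source LFCs
        surl = submit_fts_transfer(sources, dest_site, f)   # trigger data movement
        register_in_lfc(dest_site, f["guid"], surl)         # register the new replica
    return len(missing)

# Stubs, for illustration only
def central_content_catalog(dataset): return [{"guid": "guid-1", "lfn": "f1.root"}]
def destination_lfc(site, files): return set()
def replica_locations(f): return ["CNAF", "LYON"]
def submit_fts_transfer(sources, dest, f): return f"srm://{dest.lower()}/atlas/{f['lfn']}"
def register_in_lfc(site, guid, surl): pass

print(process_subscription("mydataset", "TOKYO-LCG2"))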
FTS and Data Movement
- FTS is a point-to-point file transfer service
- One FTS server per cloud
- FTS channels are defined for privileged paths
  - They can be associated with privileged physical networks
  - Other transfers happen via normal network routes
- No site multi-hops
- The FTS server at the T1 of cloud X defines channels for
  - T1(X)-T2(X) and T2(X)-T1(X)
  - T1s-T1(X)
  - *-T1(X) and *-T2s(X)
- CERN-T1 exports are served by the CERN FTS
(A routing sketch follows the figure below.)
[Figure: FTS deployment. The Tier-0 hosts an LFC and an FTS server with a T0-T1 channel; each Tier-1 hosts an LFC and an FTS server with T1-T1 and T1-T2 channels serving its Tier-2s.]
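A routing sketch under these rules: a transfer is handled by the FTS server of the destination's cloud, while T0 (CERN) to T1 exports go through the CERN FTS. The cloud map and host names here are illustrative.

CLOUD_OF = {"CNAF": "IT", "MILANO": "IT", "LYON": "FR", "TOKYO-LCG2": "FR"}
FTS_OF_CLOUD = {"IT": "fts.cnaf.infn.it", "FR": "fts.in2p3.fr"}
CERN_FTS = "fts.cern.ch"

def fts_server(src_site, dst_site, src_is_t0=False, dst_is_t1=False):
    if src_is_t0 and dst_is_t1:                 # CERN -> T1 exports use the CERN FTS
        return CERN_FTS
    return FTS_OF_CLOUD[CLOUD_OF[dst_site]]     # otherwise: the destination cloud's FTS

print(fts_server("CNAF", "MILANO"))                                     # intra-cloud
print(fts_server("CERN-PROD", "CNAF", src_is_t0=True, dst_is_t1=True))  # T0 export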
Staging Services
- The Site Services also provide the capability for tape recalls
  - A special case of dataset subscription (a staging sketch follows the figure below)
[Figure: the DDM staging service sits between the other DDM services and the SRM; the SRM recalls files from TAPE to a disk buffer, from where they are read by the CPUs or transferred to other Storage Elements.]
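A rough sketch of the staging special case: before the usual transfer steps, the files of the subscription are recalled from tape to the disk buffer. The bring_online()/is_on_disk() helpers are stand-ins for the SRM staging calls, and the polling structure is an assumption.

import time

def stage_dataset(files, srm_endpoint, poll_seconds=60, timeout=3600):
    request = bring_online(srm_endpoint, [f["surl"] for f in files])
    waited = 0
    while waited <= timeout:
        if all(is_on_disk(srm_endpoint, f["surl"], request) for f in files):
            return True                      # files are ready on the disk buffer
        time.sleep(poll_seconds)
        waited += poll_seconds
    return False                             # the recall did not finish in time

# Stubs, for illustration only
def bring_online(endpoint, surls): return "request-42"
def is_on_disk(endpoint, surl, request): return True

print(stage_dataset([{"surl": "srm://some-t1/atlas/f1.root"}], "srm://some-t1"))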
Deletion Service
- Allows dataset deletion from a given site
- Deletion is kept consistent across
  - the Content Catalog
  - the Location Catalog
  - the Local File Catalogs
  - the Storage Elements
(A cleanup sketch follows below.)
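A sketch of what a consistent per-site deletion has to touch; the cleanup order used here (storage first, then catalogs) is an assumption for illustration.

def delete_dataset_at_site(dataset, site, content, location, lfc, storage):
    for f in content[dataset]:                  # Content Catalog: resolve the files
        surl = lfc.get(f["guid"])
        if surl:
            storage.remove(surl)                # Storage Element: physical replica
            del lfc[f["guid"]]                  # Local File Catalog: replica entry
    location[dataset].remove(site)              # Location Catalog: drop the site
    # The dataset definition itself stays in the central catalogs: only this
    # site's replicas are gone.

class FakeStorage:
    def remove(self, surl): print("deleted", surl)

content = {"mydataset": [{"guid": "guid-1", "lfn": "f1.root"}]}
location = {"mydataset": ["CNAF", "LYON"]}
lfc = {"guid-1": "srm://storm.cnaf.infn.it/atlas/f1.root"}
delete_dataset_at_site("mydataset", "CNAF", content, location, lfc, FakeStorage())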
Clients and APIs
- Command line tools and Python APIs exist for all basic operations
  - Create datasets, register subscriptions, delete datasets, etc.
- High level tools allow users to
  - upload a dataset into DDM (dq2-put)
  - download a dataset from DDM (dq2-get)
  - list the content of a DDM dataset (dq2-ls)
(A usage sketch follows below.)
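The tool names above come from the slide; the exact options in the sketch below are assumptions, shown only to illustrate how the clients are typically driven (wrapped from Python for consistency with the other examples).

import subprocess

dataset = "user.someuser.test.dataset"       # hypothetical dataset name

# List the files of a dataset
subprocess.run(["dq2-ls", "-f", dataset], check=True)

# Download a dataset to the current directory
subprocess.run(["dq2-get", dataset], check=True)

# Upload the files in ./results as a new dataset
subprocess.run(["dq2-put", "-s", "./results", dataset], check=True)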
Tracing
- The DDM high level tools record relevant information about dataset access and usage
[Plots: successful and unsuccessful dataset downloads vs time; most requested datasets]
Subscription Request Interface
- Allows users to request data placement at a given site
  - by dataset name or pattern
- Includes validation and approval steps
Groups
- We recognize groups as defined in the VOMS server
  - https://lcg-voms.cern.ch:8443/vo/atlas/vomrs
- Group space will be available at T1s and T2s
  - But not all sites will support all groups
  - From one to a few groups per site
- Every group at a Grid site will have data placement permission in a DDM site
  - e.g. CERN-PROD_PHYS-HIGGS, TOKYO-LCG2_PHYS-SUSY
- All groups will share the same space token at a given site
  - Accounting is available as for other DDM sites
  - A quota system will be put in place based on this accounting
Users
- Two types of storage for users: permanent and volatile
- Volatile: the SCRATCHDISK space token
  - Available at all T1s and T2s as a DDM site
  - No quota: first come, first served
    - But something basic will be put in place
  - Dataset-based garbage collection runs every month (a sketch follows below)
    - Relies on the DDM deletion service and DDM dataset catalog information
- Permanent: the LOCALGROUPDISK space
  - Not really ATLAS space (outside of the ATLAS pledged resources)
  - Available at many T1s, T2s and T3s under DDM
  - The owner of the space manages access rights, policies, etc.
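A sketch of what a dataset-based garbage collection pass on a SCRATCHDISK endpoint could look like: delete replicas older than a cutoff, relying on catalog information and the deletion service. The threshold and the helper functions are illustrative assumptions, not the actual DDM policy.

from datetime import datetime, timedelta

def collect_garbage(ddm_site, lifetime_days=30):
    cutoff = datetime.utcnow() - timedelta(days=lifetime_days)
    deleted = []
    for dataset, replicated_on in list_replicas(ddm_site):    # DDM catalog info
        if replicated_on < cutoff:
            request_deletion(dataset, ddm_site)               # DDM deletion service
            deleted.append(dataset)
    return deleted

# Stubs, for illustration only
def list_replicas(site):
    return [("user.alice.old.output", datetime(2008, 1, 1)),
            ("user.bob.fresh.output", datetime.utcnow())]
def request_deletion(dataset, site): print("deleting", dataset, "from", site)

print(collect_garbage("TOKYO-LCG2_SCRATCHDISK"))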
Monitoring: the DDM Dashboard
Monitoring: SLS
DDM Exports and Consolidation
[Figure: the main data flows between T0, T1s and T2s, covering detector data, MC simulation and reprocessing/reconstruction. This is the main data flow, but surely not complete.]
The Common Computing Readiness Challenge
- All experiments in the game
- Subscriptions injected every 4 hours and immediately honored
- A 12h backlog fully recovered in 30 minutes
[Plot: T0->T1s throughput in MB/s]
Data Exports and Consolidation
- Number of replicated files over a one-month period: 850K files/day on average, peak of 1.5M files/day
- ATLAS beam and cosmics data replicated to 70 Tier-2s
- Cosmic dataset replication time T0-T1s (completed transfers): 96.4% of the data replication is done (92% within 3h)
[Plot: cosmic data replication (number of datasets)]
DDM and Panda
- The ATLAS production system (Panda) relies on DDM
  - to deliver input data to the processing sites (ACTIVATED)
  - to aggregate output at the custodial T1 (TRANSFERRING)
User Analysis
- Both ATLAS analysis frameworks (pAthena and Ganga) are integrated with DDM
- Jobs go to the data, which is located via DDM automatically
- Output is organized in DDM datasets
- Strategies for output datasets
  - Write the output directly from the WN into the user's LOCALGROUPDISK (wherever that is)
    - Pro: data aggregation comes for free
    - Con: not very robust, especially for long distance uploads
  - Write the output into SCRATCHDISK at the site where the job runs, then aggregate it via DDM subscription
    - Pro: more robust upload (local to the WN)
    - Con: needs to rely on subscriptions (asynchronous... slower?)
    - Pro: traffic and load on the SRM servers is throttled
    - Notice also that SCRATCHDISK is volatile: pro and con
  - Alternatively... one can dq2-get the data from SCRATCHDISK to the local file system
Experience from One Week of Beam Data
Day 1: we were ready
Data arrived
We started exporting, and...
[Plot: throughput in MB/s]
- Interference between user activity and the centralized data export operations
  - Overload of a disk server
  - But the user was not reading the detector data
[Plot: number of errors]
Conclusions
- We have a Data Management system which has proved to work for the major ATLAS production challenges
  - In use for Data Export, Data Consolidation and Simulated Production for years
- But the real challenge now is the support of user activities
  - This is very difficult to simulate: real life will provide new challenges and opportunities