Title: Data Management
1Data Management
- Ron Trompert
- SARA
- Grid Tutorial, 18-19 September 2006
2Outline
- Storage Infrastructures
- SRM
- Storage Elements in gLite
- Low Level Data Management
- LCG File Catalog (LFC)
- Data Management CLIs and APIs
- Examples
- FTS
3Storage Infrastructures
- Disk-only
- Hierarchical storage management (HSM)
- Policy-based management of file backup and archiving in a way that uses storage devices economically, without the user needing to be aware of when files are being retrieved from or stored on backup storage media.
- The hierarchy represents different types of storage media, such as disk systems, optical storage, or tape, each type representing a different level of cost and speed of retrieval when access is needed. For example, as a file ages in an archive, it can be automatically moved to a slower but less expensive form of storage.
- HSM software: TSM, DMF, CASTOR, Enstore, HPSS, ...
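The age-based migration described above can be sketched as a small policy function. This is only an illustration: the tier names follow the slide, but the age thresholds are hypothetical and real HSM products use their own, site-configured policies.

```python
# Hypothetical age thresholds in days; real HSM policies are site-specific.
TIERS = [
    (30, "disk"),      # files younger than 30 days stay on fast disk
    (365, "optical"),  # files younger than a year move to optical storage
]
ARCHIVE_TIER = "tape"  # anything older goes to the slowest, cheapest medium


def tier_for_age(age_days: float) -> str:
    """Return the storage tier a file of the given age would be placed on."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for max_age, tier in TIERS:
        if age_days < max_age:
            return tier
    return ARCHIVE_TIER
```

The user never calls anything like this directly; the point of HSM is that the policy runs behind the scenes.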
4Storage Infrastructures
5SRM
- SRM standard
- SRM implementations provide uniform access to heterogeneous storage resources on the Grid
- Storage Resource Managers
- SRM is a control protocol for
- Space reservation
- File management
- Pinning
- Lifetime management
- Replication
- Protocol negotiation
6SRM
- SRM implementation
- SRM I/F is implemented as a web service
- Implementations
- dCache (disk/HSM)
- DPM (disk)
- CASTOR (HSM)
- SRB (disk/HSM)
- ...
- SRM Examples
- srmRm
- srmLs
- srmPrepareToPut
- srmBringOnline
- srmCopy
- srmGetTransferProtocols
- ...
7Storage Elements in gLite
- Classic SE
- No SRM
- Will become deprecated in the autumn of this year
- Transfer protocols: gridftp
- Storage type: disk
- DPM
- SRM
- Transfer protocols: gridftp, secure rfio
- Storage type: disk
- dCache
- SRM
- Transfer protocols: gridftp, gsidcap
- Storage type: disk, HSM
8Low Level Data Management
- GridFTP (all SEs)
- globus-url-copy file:///home/ron/file gsiftp://srm.grid.sara.nl/pnfs/grid.sara.nl/data/dteam/file
- Third party transfer
- globus-url-copy gsiftp://hostA/pathA gsiftp://hostB/pathB
- Also edg-gridftp-ls, edg-gridftp-rm, edg-gridftp-mkdir etc.
- Uberftp
- Interactive gridftp client
- ftp commands
- Gsi authentication
9Low Level Data Management
- Gsidcap (dCache SEs)
- dccp -p 20000:25000 /tmp/file gsidcap://srm.grid.sara.nl:22128/pnfs/grid.sara.nl/data/dteam/file
- 20000:25000 is derived from the GLOBUS_TCP_PORT_RANGE environment variable
- Secure rfio
- rfcp /path/myfile t2se01.physics.ox.ac.uk:/dpm/physics.ox.ac.uk/home/dteam/file
- Srmcp (not for Classic SEs)
- srmcp file:////tmp/file srm://srm.grid.sara.nl:8443//pnfs/grid.sara.nl/data/dteam/file
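Which client to use depends on the URL scheme of the remote end. The sketch below maps the schemes used on these two slides to their command-line clients and pulls the host and port out of a URL. The function names are made up for this example, and the URLs are assumed to be in the usual scheme://host:port/path form.

```python
from urllib.parse import urlsplit

# Scheme-to-client mapping taken from the commands on these slides.
CLIENT_FOR_SCHEME = {
    "gsiftp": "globus-url-copy",
    "gsidcap": "dccp",
    "srm": "srmcp",
}


def copy_client(remote_url: str) -> str:
    """Return the command-line client that speaks the URL's protocol."""
    scheme = urlsplit(remote_url).scheme
    if scheme not in CLIENT_FOR_SCHEME:
        raise ValueError(f"no client known for scheme {scheme!r}")
    return CLIENT_FOR_SCHEME[scheme]


def host_and_port(remote_url: str):
    """Return (hostname, port); port is None when the URL does not give one."""
    parts = urlsplit(remote_url)
    return parts.hostname, parts.port
```

rfcp is left out on purpose: its remote argument is a host:/path pair rather than a URL.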
10Information system
- LDAP-based
- LDAP servers running on service nodes (GRIS/BDII)
- LDAP servers collecting the information for a site (site BDII)
- LDAP servers collecting the information for all sites (BDII)
- Need to set environment variable LCG_GFAL_INFOSYS
- Needs to be set to a BDII
- lcg-infosites
- Example: finding an SE
- > lcg-infosites --vo tutor se
  Avail Space(Kb)  Used Space(Kb)  Type  SEs
  ----------------------------------------------------------
  214632           1901097784      n.a   tbn15.nikhef.nl
  626880000        1163120000      n.a   tbn18.nikhef.nl
  488106596        368854044       n.a   mu2.matrix.sara.nl
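If you want to pick an SE from a script, the listing above is easy to parse. The sketch below assumes the four-column layout shown on the slide (available space, used space, type, hostname) and silently skips anything that does not look like a data row; the names SEInfo, parse_infosites_se and most_free are made up for this example.

```python
from typing import List, NamedTuple


class SEInfo(NamedTuple):
    avail_kb: int
    used_kb: int
    se_type: str
    host: str


def parse_infosites_se(text: str) -> List[SEInfo]:
    """Parse the SE table printed by lcg-infosites into records."""
    records = []
    for line in text.splitlines():
        fields = line.lstrip("- ").split()
        # Data rows have four fields and start with two integers;
        # the header and the dashed separator do not.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            records.append(SEInfo(int(fields[0]), int(fields[1]), fields[2], fields[3]))
    return records


def most_free(records: List[SEInfo]) -> SEInfo:
    """Return the SE reporting the most available space."""
    return max(records, key=lambda r: r.avail_kb)


SAMPLE = """\
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
214632           1901097784      n.a   tbn15.nikhef.nl
626880000        1163120000      n.a   tbn18.nikhef.nl
488106596        368854044       n.a   mu2.matrix.sara.nl
"""
```

Calling most_free(parse_infosites_se(SAMPLE)) picks tbn18.nikhef.nl for the sample above.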
11Information system
- lcg-info
- For more advanced searches. For example, finding out where to put your files:
- > lcg-info --list-se --query 'SE=mu2.matrix.sara.nl' --attrs Path
- SE: mu2.matrix.sara.nl
- Path: /flatfiles/SE00/tutor
- ldapsearch
- For the real troopers among us
12LFC
- LFC stands for LCG File Catalog
- LCG stands for LHC Computing Grid
- LHC stands for Large Hadron Collider
- Users and programs produce and require data
- Resource Broker can send (small amounts of) data to/from jobs: Input and Output Sandbox. Not recommended for large amounts of data
- Data is stored on the grid
- Located in Storage Elements
- Several replicas of one file in different sites
- Accessible by Grid users and applications from anywhere
- Locatable by the WMS/RB (data requirements in JDL)
- Also
- Data may be copied between local filesystems (WNs, UIs) and the Grid, or opened remotely on the SE (GFAL, gsidcap, rfio).
13LFC
- LFC
- Keeps track of the location of copies (replicas)
of files on the Grid
14Name conventions
- Logical File Name (LFN)
- An alias created by a user to refer to some item of data, e.g. lfn:/grid/tutor/mydir/myfile
- Globally Unique Identifier (GUID)
- A non-human-readable unique identifier for an item of data, e.g.
- guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
- Site URL (SURL) (or Physical File Name (PFN) or Site FN)
- The location of an actual piece of data on a storage system, e.g.
- srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
- sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
- Transport URL (TURL)
- Locator of a replica plus the access protocol, understood by an SE, e.g.
- rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
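The four kinds of name can be told apart by their prefix. The sketch below assumes the usual prefix-colon form (lfn:, guid:, srm: or sfn:, and a transfer protocol such as rfio:). The list of TURL schemes is limited to protocols mentioned in this tutorial and is not exhaustive.

```python
# Transfer protocols mentioned in this tutorial; not an exhaustive list.
TURL_SCHEMES = {"rfio", "gsiftp", "gsidcap", "dcap"}
SURL_SCHEMES = {"srm", "sfn"}


def classify_name(name: str) -> str:
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' based on the name's prefix."""
    scheme, sep, _rest = name.partition(":")
    if not sep:
        raise ValueError(f"no scheme prefix in {name!r}")
    scheme = scheme.lower()
    if scheme == "lfn":
        return "LFN"
    if scheme == "guid":
        return "GUID"
    if scheme in SURL_SCHEMES:
        return "SURL"
    if scheme in TURL_SCHEMES:
        return "TURL"
    raise ValueError(f"unknown scheme {scheme!r}")
```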
15Naming conventions
- How do they fit together?
- LFC holds the mapping LFN-GUID-SURL
16LFC
17LFC
- LFN acts as main key in the database. It has
- Symbolic links to it (additional LFNs)
- Unique Identifier (GUID)
- System metadata
- Information on replicas
- One field of user metadata
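To make the structure concrete, here is a toy in-memory model of one catalog entry: an LFN maps to a GUID, additional LFNs can point at the same GUID, and the GUID carries the replica list and the single comment field. This is not the LFC API; ToyCatalog, its methods and the example.org hostnames are invented for illustration only.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    guid: str
    replicas: List[str] = field(default_factory=list)  # SURLs of the copies
    comment: str = ""                                   # the one user metadata field


class ToyCatalog:
    """Toy LFN -> GUID -> SURL mapping; NOT the real LFC interface."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}  # GUID -> entry
        self._names: Dict[str, str] = {}      # LFN (primary or alias) -> GUID

    def register(self, lfn: str, surl: str, guid: Optional[str] = None) -> str:
        if lfn in self._names:
            raise KeyError(f"{lfn} already registered")
        guid = guid or str(uuid.uuid4())
        self._entries[guid] = Entry(guid, [surl])
        self._names[lfn] = guid
        return guid

    def add_alias(self, lfn: str, alias: str) -> None:
        """Additional LFN (symbolic link) pointing at the same GUID."""
        self._names[alias] = self._names[lfn]

    def add_replica(self, lfn: str, surl: str) -> None:
        self._entries[self._names[lfn]].replicas.append(surl)

    def set_comment(self, lfn: str, comment: str) -> None:
        self._entries[self._names[lfn]].comment = comment

    def guid_of(self, lfn: str) -> str:
        return self._names[lfn]

    def replicas(self, lfn: str) -> List[str]:
        return list(self._entries[self._names[lfn]].replicas)
```

Looking up either the original LFN or an alias ends at the same GUID, and therefore at the same replica list.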
18LFC
- Two kinds of LFC
- Central LFC: for each VO, one site on the grid will publish a global catalog. This will record entries (file replicas or dataset entities) across the whole of the grid.
- Local LFC: local catalogs record the file replicas stored at that site's SEs only.
19LFC
- Provides
- User exposed transaction C/C++ API (with auto rollback on failure)
- Python wrapper provided (python module lfc)
- Command line tools with administrative functionality
- Hierarchical unix-like namespace and namespace operations for LFNs
- lfn:/grid/<VO name>/mydir/myfile
- lfc-mkdir, lfc-chmod
- Integrated GSI Authentication and Authorization
- Access Control Lists (Unix Permissions and POSIX ACLs)
- Checksums
- Sessions (multiple operations inside a single transaction)
- Bulk operations (inside transactions)
20LFC
Summary of the LFC Catalog commands
lfc-chmod       Change access mode of the LFC file/directory
lfc-chown       Change owner and group of the LFC file/directory
lfc-delcomment  Delete the comment associated with the file/directory
lfc-getacl      Get file/directory access control lists
lfc-ln          Make a symbolic link to a file/directory
lfc-ls          List file/directory entries in a directory
lfc-mkdir       Create a directory
lfc-rename      Rename a file/directory
lfc-rm          Remove a file/directory
lfc-setacl      Set file/directory access control lists
lfc-setcomment  Add/replace a comment
21LFC
C/C++ API: Low level methods (many POSIX-like)
lfc_access lfc_aborttrans lfc_addreplica lfc_apiinit lfc_chclass lfc_chdir lfc_chmod lfc_chown lfc_closedir lfc_creat lfc_delcomment lfc_delete
lfc_deleteclass lfc_delreplica lfc_endtrans lfc_enterclass lfc_errmsg lfc_getacl lfc_getcomment lfc_getcwd lfc_getpath lfc_lchown lfc_listclass lfc_listlinks
lfc_listreplica lfc_lstat lfc_mkdir lfc_modifyclass lfc_opendir lfc_queryclass lfc_readdir lfc_readlink lfc_rename lfc_rewind lfc_rmdir lfc_selectsrvr
lfc_setacl lfc_setatime lfc_setcomment lfc_seterrbuf lfc_setfsize lfc_starttrans lfc_stat lfc_symlink lfc_umask lfc_undelete lfc_unlink lfc_utime send2lfc
22LFC Interfaces
- Integration with GFAL and lcg_utils APIs
- lcg-utils/GFAL access the catalog in a transparent way
- Integration with the WMS
- The RB can locate Grid files, which allows for data-based match-making
- JDL file
- InputData = "lfn:/grid/tutor/MyFile"
23Data Management CLIs APIs
- lcg_utils: lcg-* commands and lcg_* API calls
- Provide (all) the functionality needed by the LCG user
- Transparent interaction with file catalogs and storage interfaces when needed
- Abstraction from technology of specific implementations
- Grid File Access Library (GFAL) API
- Adds file I/O and explicit catalog interaction functionality
- Still provides the abstraction and transparency of lcg_utils
24Data Management CLIs APIs
- lcg-utils commands: Replica Management
lcg-cp   Copies a grid file to a local destination
lcg-cr   Copies a file to a SE and registers the file in the catalog
lcg-del  Deletes one file
lcg-rep  Replication between SEs and registration of the replica
lcg-gt   Gets the TURL for a given SURL and transfer protocol
lcg-sd   Sets file status to Done for a given SURL in a SRM request
- lcg-utils commands: File Catalog Interaction
lcg-aa   Adds an alias in LFC for a given GUID
lcg-ra   Removes an alias in LFC for a given GUID
lcg-rf   Registers in LFC a file placed in a SE
lcg-uf   Unregisters in LFC a file placed in a SE
lcg-la   Lists the aliases for a given SURL, GUID or LFN
lcg-lg   Gets the GUID for a given LFN or SURL
lcg-lr   Lists the replicas for a given GUID, SURL or LFN
25Data Management CLIs APIs
lcg-cp lcg-lr
lcg-cr lcg-ra
lcg-del lcg-rf
lcg-rep lcg-uf
lcg-sd lcg-la
lcg-aa lcg-lg
lcg-gt
26Data Management CLIs APIs
- GFAL
- Grid storage interactions today require using some existing software components
- The file catalog services, to locate valid replicas of files in order to
- Download them to the user's local machine
- Move them from one SE to another
- Make jobs running on the worker node able to access and manage files stored on remote storage elements.
- The SRM software, to ensure
- File existence on disk or disk pool (files are recalled from mass storage if necessary)
- Space allocation on disk for new files (they are possibly migrated to mass storage later)
27Data Management CLIs APIs
- The GFAL Features
- Hides interactions with the SRM from the end user
- Provides a POSIX-like interface for file I/O operations
- POSIX calls prefixed with gfal_
- Based on shared libraries (both threaded and unthreaded versions)
- Needs only one header file (gfal_api.h) to write C applications
- Supports the following protocols
- file for local access, also lfn/guid
- dcap, gsidcap and kdcap for the dCache access protocol
- rfio for the CASTOR access protocol.
- SRM
- Access to SRMs in secure mode, i.e. using a valid Grid proxy obtained by the voms-proxy-init command.
28Examples
- Using lcg utils and lfc commands
- Define the server hostname
- The LFC server must be published in the BDII (LCG_GFAL_INFOSYS)
- Use environment variable LFC_HOST=<LFC_server_hostname>
- LFC_HOST must be set
29Examples
- Listing the entries of a LFC directory
- lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path
- where path specifies the LFN pathname (mandatory)
- Remember that LFC has a directory tree structure
- /grid/<VO_name>/<you create it>
- /grid/<VO_name> is the LFC namespace; what comes below it is defined by the user
- All members of a VO have read-write permissions under their directory
- You can set LFC_HOME to use relative paths
- > lfc-ls /grid/tutor/me
- > export LFC_HOME=/grid/tutor
- > lfc-ls -l me
- > lfc-ls -l -R /grid
- -l: long listing
- -R: list the contents of directories recursively. Don't use it!
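The effect of LFC_HOME on relative paths can be sketched in a few lines. The function below only mimics the behaviour described on this slide (relative paths are taken relative to LFC_HOME, absolute paths are left alone); resolve_lfn is a made-up helper, not part of any LFC library.

```python
import posixpath
from typing import Optional


def resolve_lfn(path: str, lfc_home: Optional[str] = None) -> str:
    """Turn a possibly relative LFC path into an absolute one."""
    if path.startswith("/"):
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))
```

With LFC_HOME set to /grid/tutor, the relative path me resolves to /grid/tutor/me, which is why the two lfc-ls calls on the slide list the same directory.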
30Examples
- Creating directories in the LFC
- lfc-mkdir [-m mode] [-p] path...
- Where path specifies the LFC pathname
- Remember that while registering a new file (using lcg-cr, for example) the corresponding destination directory must be created in the catalog beforehand.
- Examples
- > lfc-mkdir /grid/tutor/me
- You can just check the directory with
- > lfc-ls -l /grid/tutor/me
- drwxr-xrwx 0 19122 1077 0 Jun 14 11:36 demo
31Examples
- Let us copy and register a file using lcg-utils
- > lcg-cr --vo tutor -l me/test -d mu2.matrix.sara.nl file:`pwd`/test
- guid:7b4efaef-bb0f-42a3-bb6f-bbe35080d105
- > lcg-lr --vo tutor lfn:me/test
- sfn://mu2.matrix.sara.nl/flatfiles/SE00/tutor/generated/2006-09-18/file378fc829-351f-4558-8679-9d2ce530cbb4
- > lfc-ls -l me
- -rw-rw-r-- 1 30010 2024 114 Sep 18 10:33 test
32Examples
- Creating a symbolic link
- lfc-ln -s file linkname
- lfc-ln -s directory linkname
- Create a link to the specified file or directory with linkname
- Examples
- > lfc-ln -s /grid/tutor/me/test /grid/tutor/aLink
- Let's check the link using lfc-ls with long listing (-l)
- > lfc-ls -l
- lrwxrwxrwx 1 30010 2024 0 Sep 18 10:38 aLink -> /grid/tutor/me/test
- (aLink is the symbolic link, /grid/tutor/me/test is the original file)
33Examples
- Adding/deleting metadata information
- lfc-setcomment path comment
- lfc-delcomment path
- lfc-setcomment adds/replaces a comment associated with a file/directory in the LFC Catalog
- lfc-delcomment deletes a comment previously added
- This is the only metadata (one field) supported by the catalog
- Examples
- > lfc-setcomment me/test "nice file"
- Let's see what happened
- > lfc-ls --comment /grid/tutor/me/test
- /grid/tutor/me/test nice file
34Examples
- Deleting the file
- lfc-rm
- lfc-rm removes the file/link/directory from the catalog only
- lcg-del
- lcg-del removes the file from the SEs and the LFNs/links from the catalog
- Example: delete all replicas
- > lcg-del -a --vo tutor guid:8e413879-7cb3-4260-af9f-6964392da7e8
- Example: delete only one replica
- > lcg-del --vo tutor -s mu2.matrix.sara.nl guid:8e413879-7cb3-4260-af9f-6964392da7e8
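When lcg-del is called from a script, it helps to build the argument list in one place so the "all replicas" and "one replica" cases cannot be mixed up. The sketch below assumes that -a means "all replicas" and that -s names the single SE whose replica should go, as the two examples above suggest; lcg_del_args is a hypothetical helper and does not run anything itself.

```python
from typing import List, Optional


def lcg_del_args(vo: str, guid: str, se: Optional[str] = None) -> List[str]:
    """Build an lcg-del argument list.

    Without an SE, all replicas are deleted (-a, assumed meaning).
    With an SE, only the replica on that SE is deleted (-s, assumed meaning).
    """
    args = ["lcg-del"]
    if se is None:
        args.append("-a")
    args += ["--vo", vo]
    if se is not None:
        args += ["-s", se]
    args.append(guid)
    return args
```

The resulting list can be passed to subprocess.run on a UI machine with a valid proxy.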
35File Transfer Service
- A batch system for submitting data transfer jobs
- For data intensive sciences
- Currently in use in the LCG project
36FTS
- Allows for
- Managed transfers by means of channels to sites
- Channels are between sites, e.g. CERN-SARA.
- Site admins can adapt the configuration of incoming channels to their site, can switch their channel off etc.
- Set priorities for different VOs.
- Optimisation of network tuning parameters per channel
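The channel idea can be illustrated with a toy model: a channel links two sites, can be switched off by the site admin, and hands out queued transfers in order of VO priority. This is only a sketch of the concept, not how FTS is implemented; the Channel class, its methods, the priority numbers and the example.org URLs are all invented for this example.

```python
import heapq
import itertools
from typing import Dict, Optional, Tuple


class Channel:
    """Toy model of a transfer channel between two sites."""

    def __init__(self, source_site: str, dest_site: str,
                 vo_priority: Optional[Dict[str, int]] = None) -> None:
        self.name = f"{source_site}-{dest_site}"
        self.enabled = True                        # site admins can switch it off
        self._vo_priority = dict(vo_priority or {})
        self._queue: list = []
        self._counter = itertools.count()          # keeps FIFO order within a priority

    def submit(self, vo: str, source: str, dest: str) -> None:
        prio = self._vo_priority.get(vo, 0)
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._queue, (-prio, next(self._counter), vo, source, dest))

    def next_transfer(self) -> Optional[Tuple[str, str, str]]:
        """Return the next (vo, source, dest) to run, or None if idle or off."""
        if not self.enabled or not self._queue:
            return None
        _, _, vo, source, dest = heapq.heappop(self._queue)
        return vo, source, dest
```

A transfer submitted by a higher-priority VO overtakes earlier submissions from lower-priority VOs, and nothing is handed out while the channel is switched off.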
37FTS
- Command line interface
- glite-transfer-cancel
- Cancels a file transfer job
- glite-transfer-list
- Lists ongoing data transfer jobs
- glite-transfer-status
- Displays the status of an ongoing data transfer job
- glite-transfer-submit
- Submits a new data transfer job