Data Management - PowerPoint PPT Presentation

About This Presentation
Title:

Data Management

Description:

HSM Software: TSM, DMF, CASTOR, Enstore, HPSS,... Grid Tutorial, RC RUG, 18-19 September 2006 ... rfio for CASTOR access protocol. SRM ...

Slides: 38
Provided by: emidiog
Transcript and Presenter's Notes

Title: Data Management


1
Data Management
  • Ron Trompert
  • SARA
  • Grid Tutorial, 18-19 September 2006

2
Outline
  • Storage Infrastructures
  • SRM
  • Storage Elements in gLite
  • Low Level Data Management
  • LCG File Catalog (LFC)
  • Data management CLIs and APIs
  • Examples
  • FTS

3
Storage Infrastructures
  • Disk-only
  • Hierarchical storage management (HSM)
  • Policy-based management of file backup and
    archiving that uses storage devices economically,
    without the user needing to be aware of when
    files are being retrieved from or stored on
    backup storage media.
  • The hierarchy represents different types of
    storage media, such as disk systems, optical
    storage, or tape, each type representing a
    different level of cost and speed of retrieval
    when access is needed. For example, as a file
    ages in an archive, it can be automatically moved
    to a slower but less expensive form of storage.
  • HSM software: TSM, DMF, CASTOR, Enstore, HPSS, ...
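
The age-based migration described above can be sketched in a few lines of Python. This is only an illustration: the tier names, the age thresholds, and the `ArchivedFile` type are assumptions made for the example and are not taken from any of the HSM products listed.

```python
from dataclasses import dataclass

# Hypothetical tiers with age limits in days, ordered from fast/expensive
# to slow/cheap. The numbers are invented for illustration only.
TIERS = [(30, "disk"), (365, "optical"), (float("inf"), "tape")]


@dataclass
class ArchivedFile:
    name: str
    age_days: int


def choose_tier(f: ArchivedFile) -> str:
    """Return the first tier whose age limit the file has not yet reached."""
    for max_age, tier in TIERS:
        if f.age_days < max_age:
            return tier
    return TIERS[-1][1]


if __name__ == "__main__":
    for f in [ArchivedFile("run1.dat", 3), ArchivedFile("run0.dat", 400)]:
        print(f.name, "->", choose_tier(f))
```

A real HSM policy usually also looks at access frequency, file size and free space; age alone is used here because it is the example the slide gives.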

4
Storage Infrastructures
  • HSM example at SARA

5
SRM
  • SRM standard
  • SRM implementations provide uniform access to
    heterogeneous storage resources on the Grid
  • Storage Resource Managers
  • SRM is a control protocol for:
  • Space reservation
  • File management
  • Pinning
  • Lifetime management
  • Replication
  • Protocol negotiation

6
SRM
  • SRM implementation
  • SRM I/F is implemented as a web service
  • Implementations
  • dCache (disk/HSM)
  • DPM (disk)
  • CASTOR (HSM)
  • SRB (disk/HSM)
  • ...
  • SRM Examples
  • srmRm
  • srmLs
  • srmPrepareToPut
  • srmBringOnline
  • srmCopy
  • srmGetTransferProtocols
  • ...

7
Storage Elements in gLite
  • Classic SE
  • No SRM
  • Will become deprecated in the autumn of this year (2006)
  • Transfer protocols: gridftp
  • Storage type: disk
  • DPM
  • SRM
  • Transfer protocols: gridftp, secure rfio
  • Storage type: disk
  • dCache
  • SRM
  • Transfer protocols: gridftp, gsidcap
  • Storage type: disk, HSM

8
Low Level Data Management
  • GridFTP (all SEs)
  • globus-url-copy file:///home/ron/file \
    gsiftp://srm.grid.sara.nl/pnfs/grid.sara.nl/data/dteam/file
  • Third party transfer
  • globus-url-copy gsiftp://hostA/pathA
    gsiftp://hostB/pathB
  • Also edg-gridftp-ls, edg-gridftp-rm,
    edg-gridftp-mkdir etc.
  • UberFTP
  • Interactive gridftp client
  • ftp commands
  • GSI authentication

9
Low Level Data Management
  • Gsidcap (dCache SEs)
  • dccp -p 20000:25000 /tmp/file \
    gsidcap://srm.grid.sara.nl:22128/pnfs/grid.sara.nl/data/dteam/file
  • 20000:25000 is derived from the GLOBUS_TCP_PORT_RANGE
    environment variable
  • Secure rfio
  • rfcp /path/myfile \
    t2se01.physics.ox.ac.uk:/dpm/physics.ox.ac.uk/home/dteam/file
  • srmcp (not for Classic SEs)
  • srmcp file:////tmp/file \
    srm://srm.grid.sara.nl:8443//pnfs/grid.sara.nl/data/dteam/file
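
Each low-level client on these two slides is tied to one URL scheme. A small Python sketch can make that mapping explicit. The helper names `pick_client` and `describe` are hypothetical, and the table only covers the URL-style schemes shown above; the rfio form `host:/path` is not a URL and is left out.

```python
from urllib.parse import urlsplit

# Scheme -> command-line client, as paired on the slides.
CLIENT_BY_SCHEME = {
    "gsiftp": "globus-url-copy",
    "gsidcap": "dccp",
    "srm": "srmcp",
}


def pick_client(remote_url: str) -> str:
    """Return the client that the slides pair with the URL's scheme."""
    scheme = urlsplit(remote_url).scheme
    try:
        return CLIENT_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"no client known for scheme {scheme!r}") from None


def describe(remote_url: str):
    """Return (client, host, port); port is None when the URL omits it."""
    parts = urlsplit(remote_url)
    return pick_client(remote_url), parts.hostname, parts.port


if __name__ == "__main__":
    url = "gsidcap://srm.grid.sara.nl:22128/pnfs/grid.sara.nl/data/dteam/file"
    print(describe(url))
```

The sketch only inspects the URL; it does not run any of the clients, which need a valid Grid proxy and the corresponding software installed.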

10
Information system
  • LDAP-based
  • LDAP servers running on service nodes (GRIS/BDII)
  • LDAP servers collecting the information for a
    site (site BDII)
  • LDAP servers collecting the information for all
    sites (BDII)
  • Need to set environment variable LCG_GFAL_INFOSYS
  • Needs to be set to a BDII
  • lcg-infosites
  • Example: finding an SE
  • > lcg-infosites --vo tutor se
    Avail Space(Kb)  Used Space(Kb)  Type  SEs
    ----------------------------------------------------------
    214632           1901097784      n.a   tbn15.nikhef.nl
    626880000        1163120000      n.a   tbn18.nikhef.nl
    488106596        368854044       n.a   mu2.matrix.sara.nl
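
As a rough illustration of how such a listing could be post-processed, the following Python sketch parses the sample output from this slide and picks the SE with the most available space. The function names are hypothetical, and the parser assumes the four-column layout shown here; other versions of the tool may print different columns.

```python
SAMPLE = """\
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
214632           1901097784      n.a   tbn15.nikhef.nl
626880000        1163120000      n.a   tbn18.nikhef.nl
488106596        368854044       n.a   mu2.matrix.sara.nl
"""


def parse_infosites(text):
    """Return a list of (available_kb, used_kb, type, hostname) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields and begin with two numbers;
        # the header and the dashed separator do not match and are skipped.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            rows.append((int(fields[0]), int(fields[1]), fields[2], fields[3]))
    return rows


def most_free(rows):
    """Return the hostname of the SE with the largest available space."""
    return max(rows, key=lambda r: r[0])[3]


if __name__ == "__main__":
    print(most_free(parse_infosites(SAMPLE)))
```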

11
Information system
  • lcg-info
  • For more advanced searches. For example, finding
    out where to put your files:
    > lcg-info --list-se --query 'SE=mu2.matrix.sara.nl' --attrs Path
    - SE: mu2.matrix.sara.nl
      - Path    /flatfiles/SE00/tutor
  • ldapsearch
  • For the real troopers among us

12
LFC
  • LFC stands for LCG File Catalog
  • LCG stands for LHC Computing Grid
  • LHC stands for Large Hadron Collider
  • Users and programs produce and require data
  • The Resource Broker can send (small amounts of) data
    to/from jobs: Input and Output Sandbox. Not
    recommended for large amounts of data
  • Data is stored on the grid
  • Located in Storage Elements
  • Several replicas of one file in different sites
  • Accessible by Grid users and applications from
    anywhere
  • Locatable by the WMS/RB (data requirements in
    JDL)
  • Also
  • Data may be copied from/to local filesystems
    (WNs, UIs) to the Grid or opened remotely on the
    SE (GFAL,gsidcap,rfio).

13
LFC
  • LFC
  • Keeps track of the location of copies (replicas)
    of files on the Grid

14
Name conventions
  • Logical File Name (LFN)
  • An alias created by a user to refer to some item
    of data, e.g. lfn:/grid/tutor/mydir/myfile
  • Globally Unique Identifier (GUID)
  • A non-human-readable unique identifier for an
    item of data, e.g.
  • guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
  • Site URL (SURL) (or Physical File Name (PFN) or
    Site FN)
  • The location of an actual piece of data on a
    storage system, e.g.
    srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
    sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
  • Transport URL (TURL)
  • Locator of a replica plus access protocol,
    understood by an SE, e.g.
  • rfio://lxshare0209.cern.ch//data/alice/ntuples.dat

15
Naming conventions
  • How do they fit together?
  • LFC holds the mapping LFN-GUID-SURL

LFC
16
LFC
17
LFC
  • LFN acts as main key in the database. It has
  • Symbolic links to it (additional LFNs)
  • Unique Identifier (GUID)
  • System metadata
  • Information on replicas
  • One field of user metadata

18
LFC
  • Two kinds of LFC
  • Central LFC: For each VO, one site on the grid
    will publish a global catalog. This will record
    entries (file replicas or dataset entities)
    across the whole of the grid.
  • Local LFC: Local catalogs record the file replicas
    stored at that site's SEs only.

19
LFC
  • Provides
  • User exposed transaction C/C++ API (auto
    rollback on failure)
  • Python wrapper provided (python module lfc)
  • Command line tools with administrative
    functionality
  • Hierarchical unix-like namespace and namespace
    operations for LFNs
  • lfn:/grid/<VO name>/mydir/myfile
  • lfc-mkdir, lfc-chmod
  • Integrated GSI Authentication & Authorization
  • Access Control Lists (Unix Permissions and POSIX
    ACLs)
  • Checksums
  • Sessions (multiple operations inside a single
    transaction)
  • Bulk operations (inside transactions)

20
LFC
Summary of the LFC Catalog commands
lfc-chmod       Change access mode of the LFC file/directory
lfc-chown       Change owner and group of the LFC file/directory
lfc-delcomment  Delete the comment associated with the file/directory
lfc-getacl      Get file/directory access control lists
lfc-ln          Make a symbolic link to a file/directory
lfc-ls          List file/directory entries in a directory
lfc-mkdir       Create a directory
lfc-rename      Rename a file/directory
lfc-rm          Remove a file/directory
lfc-setacl      Set file/directory access control lists
lfc-setcomment  Add/replace a comment
21
LFC
C/C++ API: low-level methods (many POSIX-like)
lfc_access lfc_aborttrans lfc_addreplica lfc_apiinit lfc_chclass lfc_chdir lfc_chmod lfc_chown lfc_closedir lfc_creat lfc_delcomment lfc_delete
lfc_deleteclass lfc_delreplica lfc_endtrans lfc_enterclass lfc_errmsg lfc_getacl lfc_getcomment lfc_getcwd lfc_getpath lfc_lchown lfc_listclass lfc_listlinks
lfc_listreplica lfc_lstat lfc_mkdir lfc_modifyclass lfc_opendir lfc_queryclass lfc_readdir lfc_readlink lfc_rename lfc_rewind lfc_rmdir lfc_selectsrvr
lfc_setacl lfc_setatime lfc_setcomment lfc_seterrbuf lfc_setfsize lfc_starttrans lfc_stat lfc_symlink lfc_umask lfc_undelete lfc_unlink lfc_utime send2lfc
22
LFC Interfaces
  • Integration with GFAL and lcg_utils APIs
  • lcg-utils/GFAL access the catalog in a
    transparent way
  • Integration with the WMS
  • The RB can locate Grid files, which allows for
    data-based match-making
  • JDL file
  • InputData = "lfn:/grid/tutor/MyFile";

23
Data Management CLIs & APIs
  • lcg_utils: lcg-* commands + lcg_* API calls
  • Provide (all) the functionality needed by the LCG
    user
  • Transparent interaction with file catalogs and
    storage interfaces when needed
  • Abstraction from technology of specific
    implementations
  • Grid File Access Library (GFAL) API
  • Adds file I/O and explicit catalog interaction
    functionality
  • Still provides the abstraction and transparency
    of lcg_utils

24
Data Management CLIs & APIs
  • lcg-utils commands: Replica Management

lcg-cp   Copies a grid file to a local destination
lcg-cr   Copies a file to a SE and registers the file in the catalog
lcg-del  Deletes one file
lcg-rep  Replication between SEs and registration of the replica
lcg-gt   Gets the TURL for a given SURL and transfer protocol
lcg-sd   Sets file status to Done for a given SURL in a SRM request
lcg-utils commands: File Catalog Interaction
lcg-aa   Adds an alias in LFC for a given GUID
lcg-ra   Removes an alias in LFC for a given GUID
lcg-rf   Registers in LFC a file placed in a SE
lcg-uf   Unregisters in LFC a file placed in a SE
lcg-la   Lists the aliases for a given SURL, GUID or LFN
lcg-lg   Gets the GUID for a given LFN or SURL
lcg-lr   Lists the replicas for a given GUID, SURL or LFN
25
Data Management CLIs & APIs
  • lcg-utils C/C++ API

lcg-cp lcg-lr
lcg-cr lcg-ra
lcg-del lcg-rf
lcg-rep lcg-uf
lcg-sd lcg-la
lcg-aa lcg-lg
lcg-gt
26
Data Management CLIs & APIs
  • GFAL
  • Grid storage interactions today require using
    some existing software components
  • The file catalog services to locate valid
    replicas of files in order to
  • Download them to the user local machine
  • Move them from one SE to another
  • Make jobs running on the worker node able to
    access and manage files stored on remote storage
    elements.
  • The SRM software to ensure
  • File existence on disk or disk pool (files are
    recalled from mass storage if necessary)
  • Space allocation on disk for new files (they
    are possibly migrated to mass storage later)

27
Data Management CLIs & APIs
  • The GFAL Features
  • Hides interactions with the SRM from the end user
  • Provides a POSIX-like interface for file I/O
    operations
  • POSIX calls prefixed with gfal_
  • Based on shared libraries (both threaded and
    unthreaded versions)
  • Needs only one header file (gfal_api.h) to write
    C applications
  • Supports the following protocols:
  • file for local access, also lfn/guid
  • dcap, gsidcap and kdcap for dCache access
    protocol
  • rfio for CASTOR access protocol.
  • SRM
  • Access to SRMs in secure mode, i.e. using a valid
    Grid proxy obtained by voms-proxy-init command.

28
Examples
  • Using lcg utils and lfc commands
  • Define the server hostname
  • The LFC server must be published in the BDII
    (LCG_GFAL_INFOSYS)
  • Use the environment variable
    LFC_HOST=<LFC_server_hostname>
  • LFC_HOST must be set

29
Examples
  • Listing the entries of a LFC directory
  • lfc-ls [-cdiLlRTu] [--class] [--comment]
    [--deleted] [--display_side] [--ds] path
  • where path specifies the LFN pathname (mandatory)
  • Remember that LFC has a directory tree structure
  • /grid/<VO_name>/<you create it>
  • All members of a VO have read-write permissions
    under their directory
  • You can set LFC_HOME to use relative paths
  • > lfc-ls /grid/tutor/me
  • > export LFC_HOME=/grid/tutor
  • > lfc-ls -l me
  • > lfc-ls -l -R /grid

LFC Namespace
Defined by the user
-l: long listing
-R: list the contents of directories recursively. Don't use it!
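
The effect of LFC_HOME on relative paths can be pictured with a short Python sketch. The function `resolve_lfn_path` is hypothetical and only mirrors the behaviour described on this slide (absolute paths are used as given, relative ones are interpreted under LFC_HOME); it is not the catalog client's own code.

```python
import posixpath


def resolve_lfn_path(path, lfc_home=None):
    """Return the absolute catalog path that a given path refers to.

    Absolute paths are returned (normalised) unchanged; relative paths
    are joined onto lfc_home, which plays the role of LFC_HOME.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))


if __name__ == "__main__":
    # With LFC_HOME=/grid/tutor, "me" and "/grid/tutor/me" name the same entry.
    print(resolve_lfn_path("me", "/grid/tutor"))
    print(resolve_lfn_path("/grid/tutor/me"))
```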
30
Examples
  • Creating directories in the LFC
  • lfc-mkdir [-m mode] [-p] path...
  • Where path specifies the LFC pathname
  • Remember that while registering a new file (using
    lcg-cr, for example) the corresponding
    destination directory must be created in the
    catalog beforehand.
  • Examples
  • > lfc-mkdir /grid/tutor/me
  • You can just check the directory with
  • > lfc-ls -l /grid/tutor/me
  • drwxr-xrwx 0 19122 1077
    0 Jun 14 11:36 demo

31
Examples
  • Let us copy and register a file using lcg-utils
  • > lcg-cr --vo tutor -l me/test -d
    mu2.matrix.sara.nl file:`pwd`/test
  • guid:7b4efaef-bb0f-42a3-bb6f-bbe35080d105
  • > lcg-lr --vo tutor lfn:me/test
  • sfn://mu2.matrix.sara.nl/flatfiles/SE00/tutor/generated/2006-09-18/file378fc829-351f-4558-8679-9d2ce530cbb4
  • > lfc-ls -l me
  • -rw-rw-r-- 1 30010 2024
    114 Sep 18 10:33 test

32
Examples
  • Creating a symbolic link
  • lfc-ln -s file linkname
  • lfc-ln -s directory linkname
  • Create a link to the specified file or directory
    with linkname
  • Examples
  • > lfc-ln -s /grid/tutor/me/test
    /grid/tutor/aLink
  • Let's check the link using lfc-ls with long
    listing (-l)
  • > lfc-ls -l
  • lrwxrwxrwx 1 30010 2024
    0 Sep 18 10:38 aLink -> /grid/tutor/me/test

Original File
Symbolic link
33
Examples
  • Adding/deleting metadata information
  • lfc-setcomment path comment
  • lfc-delcomment path
  • lfc-setcomment adds/replaces a comment associated
    with a file/directory in the LFC Catalog
  • lfc-delcomment deletes a comment previously added
  • This is the only metadata (one field) supported
    by the catalog
  • Examples
  • > lfc-setcomment me/test nice file
  • Let's see what happened
  • > lfc-ls --comment /grid/tutor/me/test
  • /grid/tutor/me/test nice file

34
Examples
  • Deleting the file
  • lfc-rm
  • lfc-rm removes file/link/directory only from the
    catalog
  • lcg-del
  • lcg-del removes the file from the SEs and the
    LFNs/links from the catalog
  • Example: delete all replicas
  • > lcg-del -a --vo tutor
    guid:8e413879-7cb3-4260-af9f-6964392da7e8
  • Example: delete only one replica
  • > lcg-del -a --vo tutor -s mu2.matrix.sara.nl
    guid:8e413879-7cb3-4260-af9f-6964392da7e8
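
The difference between the two delete commands, as this slide states it, can be shown with a toy model in Python. Everything here is a simplification and an assumption: the catalog is a dict from file name to a list of SE host names, the storage is a dict from SE host name to a set of file names, and the functions `lfc_rm` and `lcg_del` only imitate the behaviour described above. The real tools work on GUIDs and SURLs and may refuse some operations.

```python
def lfc_rm(catalog, name):
    """Remove only the catalog entry; files on the SEs are left untouched."""
    del catalog[name]


def lcg_del(catalog, storage, name, se=None):
    """Delete replicas from storage and update the catalog.

    se=None deletes every replica; otherwise only the replica on `se`.
    The catalog entry is dropped once no replica remains.
    """
    replicas = catalog[name]
    doomed = [s for s in replicas if se is None or s == se]
    for s in doomed:
        storage[s].discard(name)
        replicas.remove(s)
    if not replicas:
        del catalog[name]


if __name__ == "__main__":
    catalog = {"test": ["se1", "se2"]}
    storage = {"se1": {"test"}, "se2": {"test"}}
    lcg_del(catalog, storage, "test", se="se1")  # one replica gone
    print(catalog, storage)
    lcg_del(catalog, storage, "test")            # remaining replicas gone
    print(catalog, storage)
```

In this model, calling `lfc_rm` on a file that still has replicas leaves the data on the SEs with no catalog entry pointing at it, which is why the two commands should not be confused.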

35
File Transfer Service
  • A batch system for submitting data transfer jobs
  • For data intensive sciences
  • Currently in use in the LCG project

36
FTS
  • Allows for
  • Managed transfers by means of channels to sites
  • Channels are between sites, e.g. CERN-SARA.
  • Site admins can adapt the configuration of
    incoming channels to their site, can switch their
    channel off etc.
  • Set priorities for different VOs.
  • Optimisation of network tuning parameters per
    channel

37
FTS
  • Command line interface
  • glite-transfer-cancel
  • Cancels a file transfer job
  • glite-transfer-list
  • Lists ongoing data transfer jobs
  • glite-transfer-status
  • Displays the status of an ongoing data transfer
    job
  • glite-transfer-submit
  • Submits a new data transfer job