Title: gLite Data Services/Data Management
1 gLite Data Services/Data Management
- UNN Grid Computing Overview
2 Acknowledgements
- The slides, including the illustrations, were derived from the following sources:
  - Annamaria, M. (2008) Architecture of the gLite Data Management System. First South African Grid Training, 16th-26th June, Catania, Italy.
  - Alex V. and Markus B. (2007) gLite/EGEE in Practice, ISPDC, 5th-8th July, Hagenberg, Austria.
  - Mike Mineter (2008) Overview of gLite, the EGEE Middleware. Presented at EGEE User Tutorial, 28-29 March, Johannesburg.
3 Outline
- Grid Data Management Challenge
- Storage Element Requirements
- Storage Resource Manager(SRM)
- Storage Element Protocols/Types
- Files Naming Conventions
- What is a Catalog?
- Different Types of Catalog
- LFC File Catalog
- LCG utils commands
4 gLite Data Services (File Management)
[Figure: users, gLite data services and the underlying resources]
- Users
  - "We've big files to manage and share"
  - "My data are in files, and I've terabytes"
  - "Our data are in files, and I've terabytes"
- Services: Storage, Transfer, Replica management, Metadata service
- EGEE data primarily file-based
- Resources: Data storage, Network resources, Compute elements
5 The Grid DM Challenge
- Need common interface to storage resources
  - Storage Resource Manager (SRM)
- Need to keep track of where data is stored
  - File and Replica Catalogs
- Need scheduled, reliable file transfer
  - File transfer service
- Need a way to describe file contents and query them
  - Metadata service
- Heterogeneity
  - Data are stored on different storage systems using different access technologies
- Distribution
  - Data are stored in different locations; in most cases there is no shared file system or common namespace
  - Data need to be moved between different locations
- Data description
  - Data are stored as files; we need a way to describe files and locate them according to their contents
6 Storage Element Requirements
- The Storage Element is the service which allows a user or an application to store data for future retrieval
- Manage local storage (disks) and interface to Mass Storage Systems (tapes) like
  - HPSS, CASTOR, DiskeXtender (UNITREE)
- Be able to manage different storage systems uniformly and transparently for the user (providing an SRM interface)
- Support basic file transfer protocols
  - GridFTP mandatory
  - Others if available (https, ftp, etc.)
- Support a native I/O (remote file) access protocol
  - POSIX (like) I/O client library for direct access of data (GFAL)
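The requirements on this slide can be read as a checklist. The following is a minimal sketch of such a check, not part of gLite: the `StorageElement` class, its field names and the host names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StorageElement:
    """Hypothetical description of an SE; field names are illustrative only."""
    name: str
    transfer_protocols: set = field(default_factory=set)
    io_protocols: set = field(default_factory=set)
    has_srm: bool = False


def unmet_requirements(se):
    """Return the requirements from the slide that this SE does not meet."""
    problems = []
    if "gsiftp" not in se.transfer_protocols:
        problems.append("GridFTP (gsiftp) transfer is mandatory")
    if not se.io_protocols:
        problems.append("no native remote file I/O protocol")
    if not se.has_srm:
        problems.append("no SRM interface for uniform access")
    return problems


# Example SEs (host names are made up).
dpm = StorageElement("dpm.example.org", {"gsiftp"}, {"gsirfio"}, has_srm=True)
classic = StorageElement("classic.example.org", {"gsiftp"}, {"rfio"})
```

Calling `unmet_requirements(classic)` reports only the missing SRM interface, which matches the later description of the Classic SE.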
7 SRM in an example 1
8 SRM in an example 2
SRM: "I talk to them on your behalf. I will even allocate space for your files. And I will use transfer protocols to send your files there."
9 Storage Resource Management Responsibilities
- Data are stored on disk pool servers or Mass Storage Systems
- Storage resource management needs to take into account
  - Transparent access to files (migration to/from disk pool)
  - File pinning
  - Space reservation
  - File status notification
  - Lifetime management
- The SRM (Storage Resource Manager) takes care of all these details
- The SRM is a single interface that takes care of local storage interaction and provides a Grid interface to the outside world
- In gLite, interaction with the SRM is hidden by higher-level services (DM tools and APIs)
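To make space reservation, file pinning and lifetime management more concrete, here is a toy in-memory model. It is only a sketch under stated assumptions: `ToySRM` and its method names are hypothetical and do not correspond to the real SRM interface, which is a web service with its own operations.

```python
import time


class ToySRM:
    """In-memory stand-in for an SRM; not the real SRM interface."""

    def __init__(self, capacity_bytes):
        self.free = capacity_bytes
        self.reservations = {}   # token -> reserved bytes
        self.pins = {}           # SURL -> pin expiry time (seconds)
        self._next_token = 0

    def reserve_space(self, size_bytes):
        """Space reservation: set aside room before a transfer starts."""
        if size_bytes > self.free:
            raise RuntimeError("not enough free space")
        self.free -= size_bytes
        self._next_token += 1
        token = f"space-{self._next_token}"
        self.reservations[token] = size_bytes
        return token

    def release_space(self, token):
        """Give reserved space back to the pool."""
        self.free += self.reservations.pop(token)

    def pin(self, surl, lifetime_s, now=None):
        """File pinning: keep a file on disk until the pin lifetime expires."""
        now = time.time() if now is None else now
        self.pins[surl] = now + lifetime_s

    def is_pinned(self, surl, now=None):
        """Lifetime management: a pin is valid only until its expiry time."""
        now = time.time() if now is None else now
        return self.pins.get(surl, 0) > now
```

The optional `now` argument is a design choice for this sketch: it lets the pin expiry be checked deterministically without waiting for real time to pass.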
10 SE Protocols/Types (1/2)
- gLite 3.0 data access protocols
  - File Transfer: GSIFTP (GridFTP)
  - File I/O (Remote File access)
    - gsidcap
    - insecure RFIO
    - secured RFIO (gsirfio)
- Classic SE
  - GridFTP server
  - Insecure RFIO daemon (rfiod): only LAN-limited file access
  - Single disk or disk array
  - No quota management
  - Does not support the SRM interface
11 SE Types (2/2)
- Mass Storage Systems (Castor: CERN Advanced STORage manager)
  - Files migrated between front-end disk and back-end tape storage hierarchies
  - GridFTP server
  - Insecure RFIO (Castor)
  - Provide an SRM interface with all the benefits
- Disk pool managers (dCache and gLite DPM)
  - Manage distributed storage servers in a centralized way
  - Physical disks or arrays are combined into a common (virtual) file system
  - Disks can be dynamically added to the pool
  - GridFTP server
  - Secure remote access protocols (gsidcap for dCache, gsirfio for DPM)
  - SRM interface
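The SE types and their protocols from the two "SE Types" slides can be summarised as a small lookup table. This is a sketch for illustration: the dictionary layout, the short keys (`classic`, `castor`, `dcache`, `dpm`) and the helper function are assumptions, not a gLite data structure.

```python
# Protocols per SE type, transcribed from the two "SE Types" slides.
SE_TYPES = {
    "classic": {"transfer": "gsiftp", "io": ["rfio"], "srm": False},
    "castor": {"transfer": "gsiftp", "io": ["rfio"], "srm": True},
    "dcache": {"transfer": "gsiftp", "io": ["gsidcap"], "srm": True},
    "dpm": {"transfer": "gsiftp", "io": ["gsirfio"], "srm": True},
}

# Remote I/O protocols the slides describe as secure.
SECURE_IO = {"gsidcap", "gsirfio"}


def secure_io_protocol(se_type):
    """Return a secure remote I/O protocol for the SE type, or None."""
    for proto in SE_TYPES[se_type]["io"]:
        if proto in SECURE_IO:
            return proto
    return None


def srm_capable_types():
    """List the SE types that expose an SRM interface."""
    return sorted(name for name, info in SE_TYPES.items() if info["srm"])
```

For example, `secure_io_protocol("dpm")` yields `gsirfio`, while the Classic SE has no secure I/O protocol and returns `None`.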
12 SRM Interactions
13 Files Naming Conventions
- Logical File Name (LFN)
  - An alias created by a user to refer to some item of data
  - e.g. lfn:/grid/gilda/20030203/run2/track1
- Globally Unique Identifier (GUID)
  - A non-human-readable unique identifier for an item of data
  - e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
- Site URL (SURL) (or Physical File Name (PFN) or Site FN)
  - The location of an actual piece of data on a storage system
  - e.g. srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1 (SRM)
- Transport URL (TURL)
  - Temporary locator of a replica together with an access protocol understood by an SE
  - e.g. rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
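The four kinds of name can be told apart by their prefix. The sketch below assumes the names carry scheme prefixes (`lfn:`, `guid:`, `srm://`, and a protocol scheme such as `rfio://` for a TURL); the function `classify_name` is a hypothetical helper, not a gLite API.

```python
def classify_name(name):
    """Classify a grid file name by its scheme prefix."""
    scheme, sep, rest = name.partition(":")
    if not sep or not rest:
        raise ValueError(f"no scheme in {name!r}")
    scheme = scheme.lower()
    if scheme == "lfn":
        return "LFN"
    if scheme == "guid":
        return "GUID"
    if scheme == "srm":
        return "SURL"
    # Any other scheme is treated as a TURL: access protocol plus location.
    return "TURL"


examples = [
    "lfn:/grid/gilda/20030203/run2/track1",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
]
for name in examples:
    print(classify_name(name), name)
```

Treating every unrecognised scheme as a TURL is a simplification made for this sketch; a stricter version could check against a list of known access protocols.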
14 What is a file catalog?
[Figure: gLite UI and three Storage Elements (SE)]
15 What is a File Catalog? 2
- Each file has a unique identifier
- Files/directories are organized in a Catalogue
  - Similar to a filesystem (Logical File Name)
- There is one Catalogue per VO
- The data can be stored on several Storage Elements (SE)
  - The Catalogue hides the actual location
[Figure: the Catalogue and five Storage Elements (SE), with one file shown under three names]
- Logical File Name (LFN): /grid/gilda/dornbirn/file.txt
- Storage Resource Manager: srm://trigrid-ce01.unime.it/dpm/unime.it/home/gilda/generated/2006-09-20/filef026441a-5834-431f-b28d-06cb7e4c784f
- Physical Filename: /home/gilda/generated/2006-09-20/filef026441a-5834-431f-b28d-06cb7e4c784f
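The idea that the Catalogue hides the actual location can be sketched as two lookups: LFN to GUID, then GUID to the SURLs of the replicas. The class below is a toy in-memory model; `ToyFileCatalog`, its method names and the host names in the usage note are hypothetical and are not the LFC API.

```python
import uuid


class ToyFileCatalog:
    """In-memory sketch of a file catalog: LFN -> GUID -> replica SURLs."""

    def __init__(self):
        self.lfn_to_guid = {}
        self.guid_to_surls = {}

    def register(self, lfn, surl):
        """Register a new file under an LFN and record its first replica."""
        if lfn in self.lfn_to_guid:
            raise KeyError(f"{lfn} already registered")
        guid = str(uuid.uuid4())
        self.lfn_to_guid[lfn] = guid
        self.guid_to_surls[guid] = [surl]
        return guid

    def add_replica(self, lfn, surl):
        """Record another copy of the same file on a different SE."""
        self.guid_to_surls[self.lfn_to_guid[lfn]].append(surl)

    def list_replicas(self, lfn):
        """Resolve an LFN to all known SURLs."""
        return list(self.guid_to_surls[self.lfn_to_guid[lfn]])
```

A user would register `/grid/gilda/dornbirn/file.txt` once, add replicas on other SEs (for example `srm://se1.example.org/...` and `srm://se2.example.org/...`), and afterwards refer to the file only by its LFN.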
16 Different Types of Catalog
- File Catalog
  - Filesystem-like view on logical file names
  - Keeps track of sites where data is stored
  - Conflict resolution
- Replica Catalog
  - Keeps information at a site
- (Metadata Catalog)
  - Attributes of files on the logical level
  - Boundary between generic middleware and application layer
[Figure: Metadata Catalog (Metadata); File Catalog (LFN, GUID, Site ID); Replica Catalogs at Site A and Site B (LFN, GUID, SURL)]
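A metadata catalog attaches attributes to logical file names so that files can be found by content rather than by name. The following sketch shows that idea with an in-memory dictionary; `ToyMetadataCatalog`, the attribute names and the second LFN are made up for illustration and do not describe any particular gLite metadata service.

```python
class ToyMetadataCatalog:
    """In-memory sketch: attributes attached to logical file names."""

    def __init__(self):
        self.attributes = {}  # LFN -> {attribute name: value}

    def set_attributes(self, lfn, **attrs):
        """Attach or update attributes for a logical file name."""
        self.attributes.setdefault(lfn, {}).update(attrs)

    def query(self, **wanted):
        """Return the LFNs whose attributes match every requested value."""
        return sorted(
            lfn
            for lfn, attrs in self.attributes.items()
            if all(attrs.get(key) == value for key, value in wanted.items())
        )


mdc = ToyMetadataCatalog()
mdc.set_attributes("/grid/gilda/20030203/run2/track1", run=2, kind="track")
mdc.set_attributes("/grid/gilda/20030203/run3/track1", run=3, kind="track")
print(mdc.query(kind="track", run=2))
```

The query returns LFNs only; resolving them to replicas remains the job of the file and replica catalogs.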
17 LFC Commands
Summary of the LFC Catalog commands
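As an illustration of how the LFC client tools are typically driven, the sketch below builds (but does not run) a few command lines. It assumes the standard LFC clients `lfc-mkdir`, `lfc-ls` and `lfc-rm` are installed and that the catalog host is taken from the `LFC_HOST` environment variable; the host name and directory are placeholders, and `lfc_command` is a hypothetical helper.

```python
import os


def lfc_command(tool, *args, lfc_host="lfc.example.org"):
    """Build (not run) an LFC client command line plus its environment."""
    # The LFC clients read the catalog host from LFC_HOST.
    env = dict(os.environ, LFC_HOST=lfc_host)
    argv = [tool, *args]
    return argv, env


commands = [
    lfc_command("lfc-mkdir", "/grid/gilda/demo"),      # create a directory
    lfc_command("lfc-ls", "-l", "/grid/gilda/demo"),   # long listing
    lfc_command("lfc-rm", "-r", "/grid/gilda/demo"),   # remove it again
]
for argv, _env in commands:
    print(" ".join(argv))
```

On a gLite UI with a valid proxy, each pair could be passed to `subprocess.run(argv, env=env, check=True)` to execute the command.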
18 lcg utils commands
File Catalog Interaction
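The lcg-utils commands combine file transfer with catalog interaction. The sketch below only assembles argument lists for four common commands (`lcg-cr`, `lcg-lr`, `lcg-cp`, `lcg-rep`); the VO name, SE host names and paths are placeholders, and the Python wrapper functions are hypothetical helpers rather than part of lcg-utils.

```python
def lcg_cr(vo, local_path, lfn, se_host):
    """Copy a local file to an SE and register it in the catalog."""
    return ["lcg-cr", "--vo", vo, "-l", f"lfn:{lfn}",
            "-d", se_host, f"file:{local_path}"]


def lcg_lr(vo, lfn):
    """List the replicas (SURLs) of a logical file."""
    return ["lcg-lr", "--vo", vo, f"lfn:{lfn}"]


def lcg_cp(vo, lfn, local_path):
    """Copy a grid file to a local path."""
    return ["lcg-cp", "--vo", vo, f"lfn:{lfn}", f"file:{local_path}"]


def lcg_rep(vo, lfn, se_host):
    """Replicate a file to another SE and register the new replica."""
    return ["lcg-rep", "--vo", vo, "-d", se_host, f"lfn:{lfn}"]


lfn = "/grid/gilda/demo/file.txt"
for argv in (
    lcg_cr("gilda", "/home/user/file.txt", lfn, "se1.example.org"),
    lcg_lr("gilda", lfn),
    lcg_rep("gilda", lfn, "se2.example.org"),
    lcg_cp("gilda", lfn, "/tmp/file.txt"),
):
    print(" ".join(argv))
```

Passing one of these lists to `subprocess.run(argv, check=True)` on a gLite UI with a valid proxy would execute it; here the commands are only printed.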
19 Thank you for listening