Title: OGF24
1 Data Area Overview
Erwin Laure <Erwin.Laure_at_cern.ch>, David E. Martin <martinde_at_us.ibm.com>
Data Area Directors
2 Data Area Goals
- The Data Area groups explore different aspects of data handling on grids
  - Access
  - Transport
  - Management
- Overall Data Architecture developed by the OGSA Data Architecture group
  - http://www.ogf.org/documents/GFD.121.pdf
3 Data Access
- Goals: locate and provide seamless access to data stored on Grids
- Data Access and Integration Services (DAIS-WG)
  - Base Specs Published for Database Access (GFD.74, GFD.75, GFD.76)
  - Implementation in OMII-UK
  - Now Working on Data Access Services for RDF Data Resources
- Grid File Systems (GFS-WG)
  - Naming Spec Published: Resource Namespace Service (GFD.101)
  - Working on Resource Catalog
  - Prototypes from SDSC, UVA, Univ. of Tsukuba
- Data Format Description Language (DFDL-WG)
  - XML-based language for describing the structure of binary and textual files and data streams
  - Simplifying the Concepts and Trying to Remove Complexity to Shorten Draft Spec
  - Prototypes from LANL and IBM
- Byte IO (ByteIO-WG)
  - Web Service interface for providing "POSIX-like" file functionality (GFD.87, GFD.88)
  - Spec Finished Public Comment, Need to Make Small Changes
  - Production Version from UVA, Will Be in OMII
4 Data Transport
- OGSA Data Movement Interface (OGSA-DMI-WG)
  - Discover and negotiate proper data transport protocols and manage data transport (GFD.134)
  - Working on interoperability
- GridFTP WG (GridFTP-WG)
  - Grid-enabled FTP protocol
  - Spec Published 3 Years Ago (GFD.20)
  - Many Production Implementations
  - Need Experience Report for Full Standard
5 Data Management
- Grid Storage Management (GSM-WG)
  - Storage Resource Manager (SRM) to provide a common interface to storage resources (GFD.129)
  - Several interoperating implementations in production use
  - Working on 3.0 Spec
- Information Dissemination (INFOD-WG)
  - Model for Information Dissemination, with a focus on query-like operations
  - Base specs published (GFD.110)
  - Looking at candidates for follow-on work
- Storage Networking Community Group (SN-CG)
  - Led by Vincent Franceschini, Chair of SNIA Board
  - Portal to SNIA Work
  - Follow-on to EGA Data Provisioning WG
6 Data Grid Specifications and Use Cases
- Material provided by Andrew Grimshaw (grimshaw_at_virginia.edu)
7 Outline
- Background: The Rule of 3s
- Specifications
- Implementations
8 Classic three-layer view
9 Classic 3-layer name scheme
[Figure: three naming layers. Human names (RNS file name 1 ... RNS file name n) map to an abstract name (WS-name EPR, carrying an EPI and supporting rebinding), which maps to addresses (File replica 1, File replica 2, ... File replica m).]
- This is essentially a table
- WS-Names are WS-Addresses with optional EPI and resolver EPR
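Since the scheme is "essentially a table", the three layers can be read as two lookup tables chained together. The sketch below is a minimal in-memory illustration of that idea, not an RNS or WS-Naming implementation. The class `NameTable`, its method names, and the `urn:epi:` and `gsiftp://` example strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NameTable:
    """Toy model of the three naming layers (all names here are hypothetical)."""
    # Layer 1 -> 2: human-readable name to abstract name (an EPI-like string).
    human_to_abstract: dict = field(default_factory=dict)
    # Layer 2 -> 3: abstract name to the current list of replica addresses.
    abstract_to_addresses: dict = field(default_factory=dict)

    def bind(self, human_name, abstract_name, addresses):
        self.human_to_abstract[human_name] = abstract_name
        self.abstract_to_addresses[abstract_name] = list(addresses)

    def link(self, human_name, abstract_name):
        # Several human names may point at the same abstract name.
        self.human_to_abstract[human_name] = abstract_name

    def rebind(self, abstract_name, addresses):
        # Rebinding changes addresses only; human and abstract names stay stable.
        self.abstract_to_addresses[abstract_name] = list(addresses)

    def resolve(self, human_name):
        abstract_name = self.human_to_abstract[human_name]
        return abstract_name, list(self.abstract_to_addresses[abstract_name])


table = NameTable()
table.bind("/home/alice/data.bin", "urn:epi:1234",
           ["gsiftp://site-a.example.org/r1", "gsiftp://site-b.example.org/r2"])
table.link("/projects/shared/data.bin", "urn:epi:1234")

# A replica migrates: only the address layer changes.
table.rebind("urn:epi:1234", ["gsiftp://site-c.example.org/r1"])

epi, replicas = table.resolve("/projects/shared/data.bin")
print(epi, replicas)
```

Both human names keep resolving after the rebind, which is the migration and replication transparency the abstract-name layer is meant to provide.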
11 Six specs
- RNS: directory service that maps human names (strings) to abstract names or addresses (EPRs)
  - Insert, delete, list
  - Can build directed graphs, including trees
  - Leaves can be most anything: web pages, ByteIO endpoints, DMI endpoints, BES resources
  - RNS 1.1 under development
- WS-Naming: a profile on WS-Addressing that supports identity, abstract name to address mapping, and rebinding of addresses (migration, failure, and replication transparency)
- ByteIO: think POSIX file/stream: read, write, stat
- WS-DAI: query interface onto structured data, e.g., relational databases or XML databases
- SRM: management of data stores
- BES: accepts JSDL documents and executes them
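The RNS operations listed above (insert, delete, list) are enough to build a directed graph of names. The following is a minimal in-memory sketch of that behaviour, assuming a hypothetical `RnsDirectory` class; it does not reproduce the RNS web-service interface, and the `byteio://` address is a made-up placeholder.

```python
class RnsDirectory:
    """In-memory stand-in for an RNS directory (hypothetical, not the RNS wire API)."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, target):
        # target is either another RnsDirectory or an opaque endpoint string.
        if name in self._entries:
            raise KeyError(f"entry already exists: {name}")
        self._entries[name] = target

    def delete(self, name):
        del self._entries[name]

    def list_entries(self):
        return sorted(self._entries)

    def lookup(self, path):
        # Walk a slash-separated path one entry at a time.
        node = self
        for part in (p for p in path.split("/") if p):
            node = node._entries[part]
        return node


root = RnsDirectory()
home = RnsDirectory()
root.insert("home", home)
home.insert("results.dat", "byteio://example.org/endpoint/42")  # hypothetical leaf
home.insert("scratch.tmp", "byteio://example.org/endpoint/43")  # hypothetical leaf

# A second edge to the same directory makes this a graph, not only a tree.
root.insert("shared", home)
home.delete("scratch.tmp")

print(root.list_entries())
print(root.lookup("/shared/results.dat"))
```

Leaves are opaque strings here because RNS itself does not care what a leaf is; a ByteIO endpoint, a DMI endpoint, or a BES resource would all be inserted the same way.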
13 There are several implementations (not a complete list!)
RNS ByteIO WS-Naming WS-DAI SRM
Genesis II Yes Yes Yes Yes
gFarm Yes planned
EGEE/glite Experimental Prototype Planned? Used by some user communities yes
NeSC Edinburgh yes yes
Globus yes (just rebinding) yes
There are over a dozen OGSA-BES/HPC-BP implementations.
14 Let's see what you can do with these specifications
- Imagine
  - an access layer that consists of a Grid-aware FUSE file system driver for Linux (both Genesis II and gFarm have these) or a Grid-aware Installable File System (IFS) for Windows (Genesis II has one: G-ICING).
  - a provisioning layer that proxies Windows/Unix files and directories into the Grid as RNS and ByteIO endpoints and relational databases as WS-DAI endpoints.
  - OGSA-BES endpoints that also support the RNS specification, allowing jobs to be started simply by copying a JSDL file into the directory.
  - a WS-Trust STS endpoint that also supports RNS
15
- Users can access Grid resources simply by copying files, dragging and dropping, etc.
- Applications don't need to be rewritten to access the Grid
16 You don't have to imagine
17 Windows Grid-aware IFS
18 Linux Grid-aware FUSE
19 Using RNS to name non-file-system components
- BES resources are also RNS directories
- We can schedule a job on a resource simply by
dropping it into the directory
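To make the "drop a job into the directory" idea concrete, here is a small sketch in which inserting a JSDL document into a directory-like object queues a job. `BesDirectory`, its methods, and the "Pending" state label are assumptions for illustration; a real OGSA-BES endpoint would create and run an activity rather than fill a dictionary, and the JSDL document shown is a deliberately minimal example.

```python
import xml.etree.ElementTree as ET

JSDL_EXAMPLE = """<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <posix:POSIXApplication>
        <posix:Executable>/bin/hostname</posix:Executable>
      </posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""


def local_name(tag):
    # "{namespace}Executable" -> "Executable"
    return tag.rsplit("}", 1)[-1]


class BesDirectory:
    """Hypothetical BES resource that behaves like a directory of jobs."""

    def __init__(self):
        self.jobs = {}  # entry name -> {"executable": ..., "state": ...}

    def insert(self, name, jsdl_text):
        # "Copying a JSDL file into the directory" becomes a job submission.
        root = ET.fromstring(jsdl_text)
        if local_name(root.tag) != "JobDefinition":
            raise ValueError("not a JSDL JobDefinition document")
        executable = next(
            (el.text.strip() for el in root.iter()
             if local_name(el.tag) == "Executable" and el.text),
            None,
        )
        if executable is None:
            raise ValueError("JSDL document names no executable")
        self.jobs[name] = {"executable": executable, "state": "Pending"}

    def list_entries(self):
        # Listing the directory shows the submitted jobs.
        return sorted(self.jobs)


bes = BesDirectory()
bes.insert("hostname.jsdl", JSDL_EXAMPLE)
print(bes.list_entries(), bes.jobs["hostname.jsdl"])
```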
20 Use SRM to abstract from Storage implementations
[Figure: Client, SRM, and Storage connected by arrows numbered 1 to 5. Annotations: could use RNS; give back byte-I/O endpoint.]
1. The client asks the SRM for the file, providing an SURL (Site URL)
2. The SRM asks the storage system to provide the file
3. The storage system notifies the availability of the file and its location
4. The SRM returns a TURL (Transfer URL), i.e. the location from where the file can be accessed
5. The client interacts with the storage using the protocol specified in the TURL
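The five-step exchange can be sketched as two small classes. This is a toy model under stated assumptions, not the SRM interface from GFD.129: the class and method names (`Storage.stage`, `Srm.request_file`), the host names, and the choice of `gsiftp` as the transfer protocol are all hypothetical.

```python
from urllib.parse import urlparse


class Storage:
    """Hypothetical storage system holding file contents keyed by path."""

    def __init__(self, host, files):
        self.host = host
        self.files = files

    def stage(self, path):
        # Steps 2-3: make the file available and report where it can be fetched.
        if path not in self.files:
            raise FileNotFoundError(path)
        return f"gsiftp://{self.host}{path}"

    def read(self, turl):
        # Step 5: the client talks to storage using the protocol named in the TURL.
        parsed = urlparse(turl)
        if parsed.scheme != "gsiftp":
            raise ValueError(f"unsupported transfer protocol: {parsed.scheme}")
        return self.files[parsed.path]


class Srm:
    """Hypothetical SRM front end that turns SURLs into TURLs."""

    def __init__(self, storage):
        self.storage = storage

    def request_file(self, surl):
        # Step 1 arrives here; the return value is step 4.
        parsed = urlparse(surl)
        if parsed.scheme != "srm":
            raise ValueError("expected an srm:// SURL")
        return self.storage.stage(parsed.path)


storage = Storage("disk01.example.org", {"/data/run1.dat": b"event data"})
srm = Srm(storage)

turl = srm.request_file("srm://srm.example.org/data/run1.dat")  # steps 1-4
data = storage.read(turl)                                        # step 5
print(turl, data)
```

The client never learns how the storage system is organized; it only sees the SURL it asked for and the TURL it was handed back.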
21 WS-DAI endpoints that support RNS
- To execute a query, copy a text file with the SQL into the directory that represents the database. The results of the query are accessible as a file (they can be read, cat'd, or loaded into Excel as a CSV) and can also be queried in turn.
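The query-by-copy pattern can be imitated locally. The sketch below uses an in-memory SQLite database and a temporary directory as stand-ins for a WS-DAI endpoint and its RNS directory; the function `run_dropped_queries`, the `runs` table, and the file names are assumptions made for this example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def run_dropped_queries(conn, directory):
    """Execute each *.sql file in `directory` and write the rows to a sibling .csv."""
    written = []
    for sql_file in sorted(Path(directory).glob("*.sql")):
        out_file = sql_file.with_suffix(".csv")
        if out_file.exists():
            continue  # this query has already been answered
        cursor = conn.execute(sql_file.read_text())
        with out_file.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow([col[0] for col in cursor.description])  # header row
            writer.writerows(cursor)
        written.append(out_file)
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER, site TEXT)")
conn.executemany("INSERT INTO runs VALUES (?, ?)", [(1, "UVA"), (2, "SDSC")])

with tempfile.TemporaryDirectory() as db_dir:
    # "Copy a text file with the SQL into the directory that represents the database."
    (Path(db_dir) / "all_runs.sql").write_text("SELECT id, site FROM runs ORDER BY id")
    results = run_dropped_queries(conn, db_dir)
    csv_lines = results[0].read_text().splitlines()

print(csv_lines)
```

The result file sits next to the query file, so any tool that can read a file (a shell, a spreadsheet) can consume it without knowing a database was involved.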
22 Mapping data into the Grid
- Links directories and files from source location to data grid directory and user-specified name
- Presents unified view of the data across platforms, locations, domains, etc.
- Data publisher controls authorization policy.
[Figure: data clients and data publishers on Windows and Linux hosts]
23 Moral of the story
- RNS allows us to place arbitrary resources into a traditional directed graph/tree structure
- FUSE/IFS map RNS namespaces into the local file system
- Users can interact with the grid without knowing anything about grids
24 Data Area Future
- From Data Area Gaps Analysis
  - High-level Data Movement
  - Caching and Replication
  - Integrated Data Management
  - Transactions in a Grid
- Recent Interest
  - Storage Provisioning
  - Virtualization
  - Provenance, Integrity, Policy
  - Link to Digital Libraries
- Dependencies
  - OGSA
  - Security: IETF, OASIS
  - Management: DMTF, WSDM/WS-Man Convergence
  - WS-*: OASIS and W3C, WS-RF/WS-T Convergence