Title: OGSITestbed Project
1. Integrating OGSA-DAI to computational Grid workflows
Tamas Kukla, Tamas Kiss, Gabor Terstyanszky (University of Westminster, UK)
Peter Kacsuk (MTA SZTAKI, Hungary)
2. Motivation I: What advantages does the integration of databases to workflow solutions provide?
- Several successful Grid workflow systems (e.g. Taverna, Triana, Kepler, P-GRADE)
  - composition, orchestration and execution of computationally intensive processes
- Limited data handling capabilities of Grid workflow solutions
  - Restricted mainly to file-based data
  - No or very limited database access
3. Motivation II.
Workflow level interoperation of grid data resources
[Figure: a workflow engine, jobs J1 to J5 and databases DB1 and DB2, spread over Grid 1 and Grid 2.]
Legend: J = Job; FS = File storage system, e.g. SRB or SRM; DB = Database management system
4. Why OGSA-DAI?
- The Open Grid Services Architecture Data Access and Integration (OGSA-DAI) project is concerned with constructing middleware to assist with access and integration of data from separate data sources via the grid.
- An engineered, extensible framework for data access and integration.
- Exposes heterogeneous data resources to a grid through web services.
- Interaction with data resources
  - Queries and updates
  - Data transformation / compression
  - Data delivery
- Customise for your project using
  - Additional Activities
  - Client Toolkit APIs
  - Data Resource handlers
- A base for higher-level services
  - e.g. federation, mining, visualisation
Source: GGF 16, Feb 2006, by Neil Chue Hong
5. OGSA-DAI integration aspects: Data staging
- Static: databases can be accessed before and after workflow execution, but they cannot be accessed at runtime
- Semi-dynamic: data is accessed during workflow execution, but the parameters of the OGSA-DAI request are already specified before execution and cannot be generated at runtime
- Dynamic: the databases are accessed at runtime, and the parameters of the request are also generated during workflow execution
[Figure: one diagram per case, marking where each request is specified (s) and executed (e).]
Legend: s = request specification, e = request execution, shown separately for data gathering requests and data uploading requests
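The three data staging cases differ only in when a request is specified and when it is executed relative to the workflow run. A minimal Python sketch of that distinction follows. The names `Phase`, `Staging`, `DataRequest` and `classify` are hypothetical and introduced only for illustration; they are not part of OGSA-DAI or P-GRADE.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """When something happens relative to the workflow run."""
    BEFORE = 0
    DURING = 1
    AFTER = 2


class Staging(Enum):
    STATIC = "static"
    SEMI_DYNAMIC = "semi-dynamic"
    DYNAMIC = "dynamic"


@dataclass(frozen=True)
class DataRequest:
    specified: Phase  # when the request parameters are fixed
    executed: Phase   # when the request is run against the database


def classify(request: DataRequest) -> Staging:
    """Map a request's specification/execution timing to a staging case."""
    if request.specified.value > request.executed.value:
        raise ValueError("a request cannot be executed before it is specified")
    if request.executed is not Phase.DURING:
        # Database touched only before or after the run: static.
        return Staging.STATIC
    if request.specified is Phase.BEFORE:
        # Executed at runtime, but parameters fixed in advance.
        return Staging.SEMI_DYNAMIC
    # Both specified and executed at runtime.
    return Staging.DYNAMIC
```

For example, a query whose text is written while authoring the workflow but which runs as a workflow step is classified as semi-dynamic under this sketch.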
6. OGSA-DAI integration aspects: Subject of OGSA-DAI integration
- Auxiliary tool: the workflow management system is extended with an auxiliary tool (typically a portlet)
- Workflow editor: the workflow editor is made capable of communicating with databases exposed via OGSA-DAI services, which allows data gathering during workflow authoring
- Workflow engine: the workflow engine is enhanced to be able to execute the OGSA-DAI requests
7. OGSA-DAI integration aspects: Request representation
8. OGSA-DAI integration aspects: Supported OGSA-DAI functionalities
- Specific support: only a subset of OGSA-DAI functionalities is supported; higher level of usability, but restricted functionality
- Generic support: full support for every OGSA-DAI functionality; could be more complex to use in specific use-cases
Client integration level
- Coupled: the OGSA-DAI client becomes part of the workflow system
- Decoupled: connection is provided via an interface through which the client can be invoked on behalf of the system
9. The targeted OGSA-DAI integration
OGSA-DAI integration aspects:
- Data staging: Static, Semi-dynamic, Dynamic
- Subject of integration: WF Editor, Auxiliary Tool, WF Engine
- Request representation: Port Level, Node Level
- Functionality support: Specific, General
- Client integration level: Coupled, Decoupled
10. Implementation environment: P-GRADE Portal
- Open source, general purpose, workflow-oriented computational Grid portal. Supports the development and execution of workflow-based Grid applications: a tool for Grid orchestration
- Based on GridSphere-2
  - Easy to expand with new portlets (e.g. application-specific portlets)
  - Easy to tailor to end-user needs
- Developed by the P-GRADE Portal Alliance (led by SZTAKI)
- Grid services supported by the portal:
Service | EGEE grids (LCG/gLite) | Globus 2 grids
Job execution | Computing Element | GRAM
File/data storage | Storage Element | GridFTP server, SRB server
Certificate management | MyProxy | MyProxy
Information system | BDII | MDS-2, MDS-4
Brokering | Workload Management System | (GTbroker)
Job monitoring | Mercury | Mercury
Workflow job visualization | PROVE | PROVE
Legacy Code Management | GEMLCA | GEMLCA
11. What is a P-GRADE Portal workflow?
- A directed acyclic graph where
  - Nodes represent jobs, either sequential or parallel programs
  - Ports represent input/output files the jobs expect/produce
  - Arcs represent file transfer between the jobs
- Integration at the required integration level: allow the submission of a general/specific OGSA-DAI command line client application to the Grid as a P-GRADE workflow node
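The graph model described on this slide (jobs as nodes, files as ports, file transfers as arcs) can be sketched in a few lines of Python. This is only an illustration using the standard `graphlib` module (Python 3.9+), not the P-GRADE workflow format or scheduler; the job and port names are made up.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(arcs):
    """Return one valid job order for arcs of the form
    ((producer_job, output_port), (consumer_job, input_port))."""
    dependencies = {}
    for (producer, _out_port), (consumer, _in_port) in arcs:
        dependencies.setdefault(producer, set())
        # The consumer needs the producer's file before it can start.
        dependencies.setdefault(consumer, set()).add(producer)
    return list(TopologicalSorter(dependencies).static_order())


def is_acyclic(arcs):
    """True if the arcs form a directed acyclic graph."""
    try:
        execution_order(arcs)
    except CycleError:
        return False
    return True


# Hypothetical four-job workflow: J1 feeds J2 and J3, which both feed J4.
example_arcs = [
    (("J1", "out0"), ("J2", "in0")),
    (("J1", "out1"), ("J3", "in0")),
    (("J2", "out0"), ("J4", "in0")),
    (("J3", "out0"), ("J4", "in1")),
]
```

In this picture an OGSA-DAI client would simply be one more node, whose output port carries the file holding the query results.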
12. How to submit the OGSA-DAI client to the Grid?
- Direct submission is not feasible
  - Software dependencies
  - Complexity for the user
- Requires an application repository integrated to the workflow engine
- GEMLCA
  - An application repository extended with a job submitter
  - Open source Globus incubator project
  - Deploying a code in the GEMLCA repository simply means creating an XML-based description file (supported even from a portlet interface)
  - Users can select previously deployed applications from the repository and run them with custom parameter values
  - GEMLCA is fully integrated to the P-GRADE workflow engine
13. OGSA-DAI integration through GEMLCA
[Figure labels: Workflow with an OGSA-DAI node; GEMLCA repository holding the OGSA-DAI client; "Set custom parameter values"; "submit"; Computational resources running the OGSA-DAI client; OGSA-DAI service; Database.]
The solution is generic, as any workflow engine can be made capable of communicating with the GEMLCA service (a GT4-based Grid service).
14. OGSA-DAI integration through GEMLCA
- OGSA-DAI client applications supporting both OGSA-DAI 3.0 Axis (WSI) and GT (WSRF) are deployed in the GEMLCA repository
  - Query client: submits query statements to a given database exposed by an OGSA-DAI service
  - Update client: submits update statements to a given database exposed by an OGSA-DAI service
  - Request document client: executes general OGSA-DAI workflows represented as request documents (database query and update execution, data transfer, data transformation)
15. Using the query client
Screenshot callouts:
- Selecting Grid
- Setting OGSA-DAI service URL
- Selecting deployed OGSA-DAI client
- Setting Database Resource ID
- Selecting computational site
- Setting query file
- Log file
- Results in CSV file
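The settings listed on this slide can be pictured as a small record that is checked before submission, with the CSV output read back afterwards. The sketch below is a hedged illustration only: `QueryClientSettings`, its field names and `read_results` are hypothetical and do not reflect the actual portal form or the GEMLCA client interface, and it assumes the first row of the CSV result file holds column names.

```python
import csv
import io
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class QueryClientSettings:
    """Hypothetical container for the values set on the portal form."""
    grid: str
    site: str
    client: str
    service_url: str
    resource_id: str
    query_file: str

    def problems(self):
        """Return a list of human-readable issues; empty means usable."""
        issues = []
        if urlparse(self.service_url).scheme not in ("http", "https"):
            issues.append("service_url must be an http(s) URL")
        for name in ("grid", "site", "client", "resource_id", "query_file"):
            if not getattr(self, name).strip():
                issues.append(f"{name} must not be empty")
        return issues


def read_results(csv_text):
    """Parse the CSV results (assumed to start with a header row)."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

A downstream workflow node could then consume the parsed rows directly instead of re-reading the raw file.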
16. An application example: developing a performance rating framework for UK hospitals (Health Care Modelling and Informatics Research Group, UoW)
Workflow node labels in the figure:
- Executes the given OGSA-DAI query
- Generates sampler queries
- Analysis on the sample data
- Gathering results
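The four stages named on this slide can be read as a small pipeline. The sketch below shows one plausible ordering (generate sampler queries, execute them, analyse each sample, gather the results); the ordering itself is an assumption. It uses an in-memory SQLite database as a stand-in for a database exposed through OGSA-DAI, and the table `ratings`, the column `score` and all function names are hypothetical, not taken from the actual application.

```python
import sqlite3
from statistics import mean


def generate_sampler_queries(n_samples, sample_size):
    """Build one parameterised query per sample (disjoint row slices)."""
    sql = "SELECT score FROM ratings ORDER BY id LIMIT ? OFFSET ?"
    return [(sql, (sample_size, i * sample_size)) for i in range(n_samples)]


def execute_query(conn, query):
    """Stand-in for the query client: run one query, return the values."""
    sql, params = query
    return [row[0] for row in conn.execute(sql, params)]


def analyse(sample):
    """Placeholder analysis: the mean of the sampled scores."""
    return mean(sample)


def gather(per_sample):
    """Collect the per-sample results into one summary."""
    return {"per_sample": per_sample, "overall": mean(per_sample)}


def run_pipeline(conn, n_samples, sample_size):
    queries = generate_sampler_queries(n_samples, sample_size)
    samples = [execute_query(conn, q) for q in queries]
    return gather([analyse(s) for s in samples])
```

In a workflow setting each sample would typically be a separate job, so the execute and analyse steps could run in parallel before the gathering node.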
17. So this is what we have achieved: Data Transfer Level Interoperation in P-GRADE
[Figure labels: Portal server; User level storage; Grid infrastructure; Computing resources; GridFTP servers; SRB servers; EGEE Storage elements; OGSA-DAI services; LOCAL INPUT FILES; REMOTE INPUT FILES; LOCAL OUTPUT FILES; REMOTE OUTPUT FILES; Control of remote input/output; Data manipulation; Input to workflows; Output from workflows]
Workflow level interoperation of local, GridFTP, SRM and SRB file catalogues and databases exposed by OGSA-DAI
18. How can the UK e-Science community utilise the solution?
- Deployed at production level in the NGS P-GRADE portal
  - Portal URL: https://grid2-portal.cpc.wmin.ac.uk:8080
  - Information page: http://ngs-portal.cpc.wmin.ac.uk
- Please visit our next demonstration session at the NGS booth: Booth 13, Appleton Tower, Wednesday 10-12
19. Thank you for your attention
Email: kisst_at_wmin.ac.uk
Website: www.cpc.wmin.ac.uk