myGrid Information Repository - PowerPoint PPT Presentation


About This Presentation

Title:

myGrid Information Repository

Description:

Based on the myGrid Information Model. Stores user and provenance data ... Browses data elements via the mIR service interface. 9/6/09. myGrid Developers' Day ...

Slides: 16

Provided by: arijitmu

Transcript and Presenter's Notes

Title: myGrid Information Repository

1
myGrid Information Repository

Arijit Mukherjee
School of Computing Science
University of Newcastle

2
Concept

Based on the myGrid Information Model
Stores user and provenance data
Exposed as a service
Uses ORM technology to map objects onto
relational databases
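
As a rough illustration of what "mapping objects onto relational databases" means, here is a minimal hand-rolled sketch in Python using the standard sqlite3 module. It is not Hibernate and not mIR code; the `Person` class, its fields, and the table layout are hypothetical and chosen only for the example.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Person:
    # Hypothetical entity; field names are illustrative, not taken from the myGrid schema.
    lsid: str
    name: str
    organization: str


def create_table(conn, cls):
    """Create one table per class, one TEXT column per field, keyed on lsid."""
    cols = ", ".join(f"{f.name} TEXT" for f in fields(cls))
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {cls.__name__} ({cols}, PRIMARY KEY (lsid))"
    )


def save(conn, obj):
    """Insert the object as a row, replacing any existing row with the same lsid."""
    names = [f.name for f in fields(obj)]
    col_list = ", ".join(names)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT OR REPLACE INTO {type(obj).__name__} ({col_list}) VALUES ({placeholders})",
        [getattr(obj, n) for n in names],
    )


def load(conn, cls, lsid):
    """Rebuild an object from its row, or return None if no such row exists."""
    names = [f.name for f in fields(cls)]
    col_list = ", ".join(names)
    row = conn.execute(
        f"SELECT {col_list} FROM {cls.__name__} WHERE lsid = ?", (lsid,)
    ).fetchone()
    return cls(*row) if row else None


conn = sqlite3.connect(":memory:")
create_table(conn, Person)
original = Person("urn:lsid:example.org:people:1", "A. Researcher", "Example University")
save(conn, original)
loaded = load(conn, Person, "urn:lsid:example.org:people:1")
```

The calling code only ever handles `Person` objects; the SQL stays inside `save` and `load`, which is the separation an ORM layer provides in a far more complete form.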

3
myGrid Information Model

Based on the CCLRC Scientific Metadata model
Encompasses
Design of eScience experiments
People and organizations
Biological and provenance results from
experiments
Computational operations involved
An XML schema conforming to the model defines all
the types.

4
Info Model: eScience experiments

5
Info Model: people and organizations

6
Info Model: Experiment Provenance

7
Info Model: Operations

8
mIR Service Interface

Document-oriented, based on the info model
schema
Rich semantics, able to capture any level of
complexity in the request
Exposes store/update/get/delete interfaces
Uses Hibernate to store/access objects to/from
the underlying store, hides the database access
complexity, database type, dialect and driver
issues
LSIDs, assigned by the myGrid LSID Authority, are
used as identifiers for each data item
mIR grabs base LSIDs for all namespaces during
startup
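
The store/update/get/delete interface and the start-up caching of base LSIDs can be sketched as follows. This is an in-memory toy under stated assumptions, not the mIR implementation: the authority name, the `fake_authority_base_lsid` function, and the rule of appending a counter to the base LSID are all hypothetical. The slides do not say how mIR derives per-item identifiers from the base LSIDs. Only the general LSID shape, `urn:lsid:authority:namespace:object`, comes from the LSID specification.

```python
import itertools


def fake_authority_base_lsid(namespace):
    # Stand-in for a remote call to an LSID authority (hypothetical authority name).
    return f"urn:lsid:authority.example.org:{namespace}"


class Repository:
    def __init__(self, namespaces):
        # Fetch one base LSID per namespace once, at start-up, and cache it.
        self._bases = {ns: fake_authority_base_lsid(ns) for ns in namespaces}
        self._counters = {ns: itertools.count(1) for ns in namespaces}
        self._items = {}

    def store(self, namespace, document):
        """Store a new document and return the LSID assigned to it."""
        lsid = f"{self._bases[namespace]}:{next(self._counters[namespace])}"
        self._items[lsid] = document
        return lsid

    def get(self, lsid):
        return self._items[lsid]

    def update(self, lsid, document):
        """Replace an existing document; unknown LSIDs are an error."""
        if lsid not in self._items:
            raise KeyError(lsid)
        self._items[lsid] = document

    def delete(self, lsid):
        del self._items[lsid]


repo = Repository(["experiment", "person"])
lsid = repo.store("person", {"name": "A. Researcher"})
repo.update(lsid, {"name": "A. Researcher", "organization": "Example University"})
```

Because the base LSIDs are fetched in the constructor, no request pays for a round trip to the authority.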

9
mIR Generic Query Interface

Able to send any SQL query to mIR
Uses slightly modified OGSA-DAI WS-I Tech Preview
package
OGSA-DAI abstracts underlying database and driver
issues
Able to make 3rd party deliveries (for example to
URL, files etc.)
Creates a client side library based on
Axis-1.2RC3 for other myGrid components
Returns results in WebRowSet format
Newer versions would remove Axis problems
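
The idea of sending an arbitrary SQL query and getting the rows back as an XML document can be sketched with the Python standard library. This does not use OGSA-DAI, and the XML element names below are illustrative only; they are not the actual WebRowSet schema. The `experiment` table is likewise hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(conn, sql, params=()):
    """Run a SQL query and serialize column names and rows as a simple XML document."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]

    root = ET.Element("rowset")
    meta = ET.SubElement(root, "columns")
    for name in columns:
        ET.SubElement(meta, "column").text = name

    data = ET.SubElement(root, "data")
    for row in cur:
        row_el = ET.SubElement(data, "row")
        for value in row:
            # NULLs are written as empty strings in this simplified format.
            ET.SubElement(row_el, "value").text = "" if value is None else str(value)

    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (lsid TEXT, title TEXT)")
conn.execute(
    "INSERT INTO experiment VALUES (?, ?)",
    ("urn:lsid:example.org:experiment:1", "Demo"),
)
xml_text = query_to_xml(conn, "SELECT lsid, title FROM experiment")
```

The caller receives a self-describing document (column names plus rows) and never touches a driver or connection object, which is the kind of abstraction the slide attributes to OGSA-DAI.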

10
Taverna Plugins

MIR-Taverna-Plug-in
Transparent to the user
Catches events from workflow enactor
Packs data elements into documents (conforming to
info model schema)
Sends requests to mIR for store/update/get
MIR Browser
Launched inside Taverna
Explorer-like user interface for mIR
Browses data elements via the mIR service
interface
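
The plug-in flow described above (catch an enactor event, pack the data into a document, send it to mIR) can be sketched like this. Everything named here is an assumption for illustration: the event is a plain dict, the XML element names are invented rather than taken from the info model schema, and `RecordingClient` stands in for a real mIR client.

```python
import xml.etree.ElementTree as ET


def pack_event(event):
    """Pack one enactor event (a plain dict here) into an XML request document."""
    request = ET.Element("storeRequest")
    item = ET.SubElement(
        request,
        "dataItem",
        {"workflow": event["workflow"], "processor": event["processor"]},
    )
    for name, value in event["outputs"].items():
        out = ET.SubElement(item, "output", {"name": name})
        out.text = str(value)
    return ET.tostring(request, encoding="unicode")


class RecordingClient:
    """Hypothetical stand-in for an mIR client; it only records what it is sent."""

    def __init__(self):
        self.sent = []

    def store(self, document):
        self.sent.append(document)


class EnactorListener:
    """Receives events from the workflow enactor and forwards them to the client."""

    def __init__(self, client):
        self.client = client

    def on_process_completed(self, event):
        self.client.store(pack_event(event))


client = RecordingClient()
listener = EnactorListener(client)
listener.on_process_completed(
    {"workflow": "wf-1", "processor": "blast", "outputs": {"hits": 42}}
)
```

The workflow author never calls the listener directly, which matches the "transparent to the user" point on the slide.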

11
Myths and facts

Myth: Hibernate increases overhead
Fact: Imagine inserting 10,000 rows explicitly,
each of which causes updates to several other
tables, let alone all the database dialect and
driver issues
Myth: OGSA-DAI causes problems, Axis
incompatibility
Fact: OGSA-DAI follows OMII (currently Axis
1.2beta); it abstracts database/driver issues and
allows 3rd party deliveries
Myth: MIR requires initialization, it's
difficult to start using, user context is
unnecessary
Fact: You always need some sort of context,
even at ATMs. MIR-INIT is a temporary solution
until some MIR-ADMIN is developed; you need just
a bit of copy-paste to start using it

12
Real Issues

Info Model does not pose any restrictions on the
request document, so any level of nesting is
permitted which in turn leads to
Performance Issues
Richness of the information model leads to
extreme complexity
Complex request documents affect processing: for
a deep-structured document, 50% of the processing
time goes to database commit as the number of
references grows
Usage of arrays instead of sets leads to
processing complexity (code duplication and
explicit index updates)
47% of the total time can be attributed to Axis
serialization and deserialization for complex
documents
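
The "arrays instead of sets" point can be made concrete with a small sketch. An ordered collection stored relationally needs an index column, and removing one element means rewriting the index of every element after it. A set needs only a single delete. The table and column names below are hypothetical, and the example uses sqlite3 directly rather than Hibernate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordered collection: each child row carries its position in the parent's array.
conn.execute("CREATE TABLE child_list (parent TEXT, idx INTEGER, child TEXT)")
# Unordered collection: membership only, no position to maintain.
conn.execute(
    "CREATE TABLE child_set (parent TEXT, child TEXT, PRIMARY KEY (parent, child))"
)

for i, c in enumerate(["a", "b", "c", "d", "e"]):
    conn.execute("INSERT INTO child_list VALUES ('p', ?, ?)", (i, c))
    conn.execute("INSERT INTO child_set VALUES ('p', ?)", (c,))


def remove_from_list(conn, parent, child):
    """Delete one element and close the gap in the index column; return rows touched."""
    (idx,) = conn.execute(
        "SELECT idx FROM child_list WHERE parent = ? AND child = ?", (parent, child)
    ).fetchone()
    touched = conn.execute(
        "DELETE FROM child_list WHERE parent = ? AND idx = ?", (parent, idx)
    ).rowcount
    # Explicit index update: every later element shifts down by one.
    touched += conn.execute(
        "UPDATE child_list SET idx = idx - 1 WHERE parent = ? AND idx > ?", (parent, idx)
    ).rowcount
    return touched


def remove_from_set(conn, parent, child):
    """Delete one element; no other row needs to change. Return rows touched."""
    return conn.execute(
        "DELETE FROM child_set WHERE parent = ? AND child = ?", (parent, child)
    ).rowcount


list_rows = remove_from_list(conn, "p", "b")
set_rows = remove_from_set(conn, "p", "b")
```

Removing "b" from the five-element array touches four rows (one delete plus three index shifts), while the set version touches one. With deeply nested documents this bookkeeping repeats at every level.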

13
Issues (contd.)

Usage of 3rd party assigned LSID as ID and
model complexity together restrict cascaded
store/update
MIRBrowser requires considerable changes
Steps taken to boost performance
Caching base LSIDs at start-up
Configuring Hibernate only once at start-up
Start-up cost can be reduced further if the
service is initialized during container start-up
De-normalizing certain tables, 23% gain
Saving data on a PER-WORKFLOW basis (can be set
to PER-PROCESS within a config file), 54% gain in
processing, but increases communication costs
And it does not stop you or slow you down
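
The difference between the PER-WORKFLOW and PER-PROCESS save modes can be sketched as a buffering writer. This is an assumption-laden toy, not the plug-in's code: the `ProvenanceWriter` class, the `provenance` table, and the callback names are hypothetical. The sketch only counts database commits; it does not model the communication cost mentioned above or reproduce the measured gains.

```python
import sqlite3


class ProvenanceWriter:
    def __init__(self, conn, mode="PER-WORKFLOW"):
        if mode not in ("PER-WORKFLOW", "PER-PROCESS"):
            raise ValueError(mode)
        self.conn = conn
        self.mode = mode
        self.pending = []   # results buffered since the last flush
        self.commits = 0    # number of database commits issued so far

    def process_finished(self, process, result):
        self.pending.append((process, result))
        if self.mode == "PER-PROCESS":
            self._flush()   # save immediately after every process

    def workflow_finished(self):
        if self.pending:
            self._flush()   # save everything buffered during the workflow

    def _flush(self):
        self.conn.executemany("INSERT INTO provenance VALUES (?, ?)", self.pending)
        self.conn.commit()
        self.commits += 1
        self.pending.clear()


def run(mode, n_processes=5):
    """Run a pretend workflow and return (commits issued, rows stored)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE provenance (process TEXT, result TEXT)")
    writer = ProvenanceWriter(conn, mode)
    for i in range(n_processes):
        writer.process_finished(f"process-{i}", f"result-{i}")
    writer.workflow_finished()
    rows = conn.execute("SELECT COUNT(*) FROM provenance").fetchone()[0]
    return writer.commits, rows


per_workflow = run("PER-WORKFLOW")
per_process = run("PER-PROCESS")
```

Both modes store the same five rows. PER-WORKFLOW does it in one commit and PER-PROCESS in five, at the price of holding results in memory until the workflow ends and sending them in one larger request.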

14
Possible Future Work