myGrid Information Repository

Description:

Based on the myGrid Information Model. Stores user and provenance data ... Browses data elements via the mIR service interface. 9/6/09. myGrid Developers' Day ...

Provided by: arijitmu

1
myGrid Information Repository
  • Arijit Mukherjee
  • School of Computing Science
  • University of Newcastle

2
Concept
  • Based on the myGrid Information Model
  • Stores user and provenance data
  • Exposed as a service
  • Uses ORM technology to map objects onto
    relational databases
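
The object-to-table mapping idea can be sketched in a few lines. This is only an illustration in Python with the standard-library `sqlite3` module, not the Hibernate-based Java code the slides refer to; the `Person` type, its fields, and the helper names `create_table`, `save`, and `load` are assumptions made up for the example.

```python
import sqlite3
from dataclasses import dataclass, fields, astuple


@dataclass
class Person:
    """Hypothetical info-model type; the field names are illustrative only."""
    lsid: str
    name: str
    organization: str


def create_table(conn, cls):
    # Derive one TEXT column per dataclass field, in declaration order.
    cols = ", ".join(f"{f.name} TEXT" for f in fields(cls))
    conn.execute(f"CREATE TABLE IF NOT EXISTS {cls.__name__} ({cols})")


def save(conn, obj):
    # Map the object's fields onto one row of the table named after its class.
    cls = type(obj)
    placeholders = ", ".join("?" for _ in fields(cls))
    conn.execute(f"INSERT INTO {cls.__name__} VALUES ({placeholders})", astuple(obj))


def load(conn, cls, lsid):
    # Rebuild an object from the row with the given identifier, if any.
    row = conn.execute(
        f"SELECT * FROM {cls.__name__} WHERE lsid = ?", (lsid,)
    ).fetchone()
    return cls(*row) if row else None


conn = sqlite3.connect(":memory:")
create_table(conn, Person)
alice = Person("urn:lsid:example.org:person:1", "Alice", "Example Org")
save(conn, alice)
restored = load(conn, Person, alice.lsid)
```

The caller only handles `Person` objects; the SQL stays inside the three helpers, which is the kind of separation an ORM layer provides.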

3
myGrid Information Model
  • Based on the CCLRC Scientific Metadata model
  • Encompasses
    • Design of eScience experiments
    • People and organizations
    • Biological and provenance results from
      experiments
    • Computational operations involved
  • An XML schema conforming to the model defines all
    the types.

4
Info Model: eScience experiments

5
Info Model: people and organization

6
Info Model: Experiment Provenance

7
Info Model: Operations

8
mIR Service Interface
  • Document-oriented, based on the info model
    schema
  • Rich semantics, able to capture any level of
    complexity in the request
  • Exposes store/update/get/delete interfaces
  • Uses Hibernate to store and retrieve objects in
    the underlying store, hiding the database access
    complexity and the database type, dialect and
    driver issues
  • LSIDs, assigned by the myGrid LSID Authority, are
    used as identifiers for each data item
  • mIR grabs base LSIDs for all namespaces during
    startup
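
A minimal sketch of a store/update/get/delete interface keyed by LSIDs, with the base LSID for each namespace cached once at start-up. It assumes the general LSID form `urn:lsid:<authority>:<namespace>:<object>`; the class names `LsidMinter` and `Repository`, the authority `example.org`, and the local counter are hypothetical. In mIR the identifiers come from the myGrid LSID Authority, which this example does not contact.

```python
import itertools


class LsidMinter:
    """Caches one base LSID per namespace and mints object LSIDs locally.

    Assumption: a real deployment would fetch the bases from an LSID
    authority at start-up; here they are built from a made-up authority name.
    """

    def __init__(self, authority, namespaces):
        self._bases = {ns: f"urn:lsid:{authority}:{ns}" for ns in namespaces}
        self._counters = {ns: itertools.count(1) for ns in namespaces}

    def mint(self, namespace):
        return f"{self._bases[namespace]}:{next(self._counters[namespace])}"


class Repository:
    """In-memory stand-in for the four service operations."""

    def __init__(self, minter):
        self._minter = minter
        self._items = {}

    def store(self, namespace, document):
        lsid = self._minter.mint(namespace)
        self._items[lsid] = dict(document)
        return lsid

    def get(self, lsid):
        return self._items.get(lsid)

    def update(self, lsid, changes):
        self._items[lsid].update(changes)

    def delete(self, lsid):
        # True if something was removed, False if the LSID was unknown.
        return self._items.pop(lsid, None) is not None


repo = Repository(LsidMinter("example.org", ["experiment", "person"]))
exp_id = repo.store("experiment", {"title": "demo run"})
repo.update(exp_id, {"status": "finished"})
```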

9
mIR Generic Query Interface
  • Accepts any SQL query sent to mIR
  • Uses slightly modified OGSA-DAI WS-I Tech Preview
    package
  • OGSA-DAI abstracts underlying database and driver
    issues
  • Can make 3rd party deliveries (for example to
    URLs, files, etc.)
  • Provides a client-side library based on
    Axis-1.2RC3 for other myGrid components
  • Returns results in WebRowSet format
  • Newer versions would remove Axis problems
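
The idea of a generic query interface that returns rows as an XML document can be sketched as follows. This is not OGSA-DAI and the output is not the actual WebRowSet schema; the element names `rowset`, `metadata`, `column`, `data`, `row`, and `value`, and the function `query_to_xml`, are simplifications invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(conn, sql, params=()):
    """Run an arbitrary SELECT and serialize the result as simple XML."""
    cursor = conn.execute(sql, params)
    root = ET.Element("rowset")
    metadata = ET.SubElement(root, "metadata")
    for column in cursor.description:
        # The first item of each description entry is the column name.
        ET.SubElement(metadata, "column").text = column[0]
    data = ET.SubElement(root, "data")
    for row in cursor:
        row_el = ET.SubElement(data, "row")
        for value in row:
            ET.SubElement(row_el, "value").text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO experiment VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
xml_text = query_to_xml(conn, "SELECT id, title FROM experiment ORDER BY id")
```

Because the result carries its own column metadata, a client can consume it without knowing the table layout in advance.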

10
Taverna Plugins
  • MIR-Taverna-Plug-in
    • Transparent to the user
    • Catches events from workflow enactor
    • Packs data elements into documents (conforming to
      info model schema)
    • Sends requests to mIR for store/update/get
  • MIR Browser
    • Launched inside Taverna
    • Explorer-like user interface for mIR
    • Browses data elements via the mIR service
      interface
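
The plug-in behaviour described above (catch an enactor event, pack it into a request document, send it on) follows a listener pattern, sketched here. The event types, dictionary fields, and the `MirListener` class are hypothetical; the actual Taverna event API and the info model schema are not reproduced.

```python
class MirListener:
    """Hypothetical listener; event names and fields are invented for illustration."""

    # Which repository action each (made-up) event type maps to.
    ACTIONS = {"process_completed": "store", "process_updated": "update"}

    def __init__(self, send):
        self._send = send  # callable that delivers a request document

    def on_event(self, event):
        action = self.ACTIONS.get(event["type"])
        if action is None:
            return  # ignore events that carry no data for the repository
        self._send({
            "action": action,
            "workflow": event["workflow"],
            "data": {"process": event["process"], "output": event["output"]},
        })


sent = []
listener = MirListener(sent.append)
events = [
    {"type": "workflow_started", "workflow": "wf-1"},
    {"type": "process_completed", "workflow": "wf-1",
     "process": "blast", "output": "hits.txt"},
]
for event in events:
    listener.on_event(event)
```

The user never calls the listener directly, which is what keeps the plug-in transparent.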

11
Myths and facts
  • Myth: Hibernate increases overhead
  • Fact: Imagine inserting 10,000 rows explicitly,
    each of which causes updates to several other
    tables, let alone all the database dialect and
    driver issues
  • Myth: OGSA-DAI causes problems, Axis
    incompatibility
  • Fact: OGSA-DAI follows OMII (currently Axis
    1.2beta); it abstracts database/driver issues and
    allows 3rd party deliveries
  • Myth: MIR requires initialization, it's
    difficult to start using, user context is
    unnecessary
  • Fact: You always need some sort of context,
    even at ATMs. MIR-INIT is a temporary solution
    until some MIR-ADMIN is developed; you need just
    a bit of copy-paste to start using it

12
Real Issues
  • The Info Model does not pose any restrictions on
    the request document, so any level of nesting is
    permitted, which in turn leads to the performance
    issues below
  • Performance Issues
    • Richness of the information model leads to
      extreme complexity
    • Complex request documents affect processing: for
      a deep-structured document, 50% of the processing
      time goes to database commit as the number of
      references grows
    • Usage of arrays instead of sets leads to
      processing complexity (code duplication and
      explicit index updates)
    • 47% of the total time can be attributed to Axis
      serialization and deserialization for complex
      documents
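
Why arrays cost more than sets when persisted can be shown with a small count of row writes. The sketch assumes an array is stored as (position, value) rows, so removing one element forces an index update on every later row, while a set needs a single delete. The function names and the write-counting scheme are illustrative, not taken from mIR.

```python
def remove_from_indexed_list(rows, target):
    """rows: list of (position, value) pairs, as an array would be persisted.

    Returns (new_rows, writes): one delete plus one update per shifted row.
    """
    new_rows, writes = [], 0
    for position, value in rows:
        if value == target:
            writes += 1  # the delete itself
            continue
        new_position = len(new_rows)
        if new_position != position:
            writes += 1  # explicit index update for a shifted row
        new_rows.append((new_position, value))
    return new_rows, writes


def remove_from_set(values, target):
    """Sets carry no positions, so a removal is a single delete."""
    if target in values:
        return values - {target}, 1
    return values, 0


refs = ["a", "b", "c", "d", "e"]
list_rows, list_writes = remove_from_indexed_list(list(enumerate(refs)), "a")
set_rows, set_writes = remove_from_set(set(refs), "a")
```

Removing the first of five references costs five writes in the indexed form and one in the set form, and the gap widens as the number of references grows.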

13
Issues (contd.)
  • Usage of 3rd party assigned LSIDs as IDs, together
    with model complexity, restricts cascaded
    store/update
  • MIRBrowser requires considerable changes
  • Steps taken to boost performance
    • Caching base LSIDs at start-up
    • Configuring Hibernate only once at start-up
    • Start-up cost can be reduced further if the
      service is initialized during container start-up
    • De-normalizing certain tables: 23% gain
    • Saving data on a PER-WORKFLOW basis (can be set
      to PER-PROCESS within a config file): 54% gain in
      processing, but increases communication costs
  • And it does not stop you or slow you down
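
The difference between saving on a PER-PROCESS and a PER-WORKFLOW basis can be sketched by counting database commits. This uses an in-memory SQLite table as a stand-in for the real store; the `save_results` function, the table layout, and the sample data are assumptions, and the sketch does not attempt to reproduce the timing figures quoted above.

```python
import sqlite3


def save_results(results, per_workflow):
    """Insert (process, value) pairs; return (commits, rows_stored).

    per_workflow=True  -> one commit after all results of the workflow
    per_workflow=False -> one commit after each process result
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE result (process TEXT, value TEXT)")
    conn.commit()
    commits = 0
    for process, value in results:
        conn.execute("INSERT INTO result VALUES (?, ?)", (process, value))
        if not per_workflow:
            conn.commit()
            commits += 1
    if per_workflow:
        conn.commit()
        commits += 1
    stored = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
    conn.close()
    return commits, stored


results = [("fetch", "seq.fasta"), ("blast", "hits.txt"), ("render", "tree.png")]
per_process = save_results(results, per_workflow=False)
per_workflow = save_results(results, per_workflow=True)
```

Both modes store the same rows; the per-workflow mode pays for one commit instead of one per process, at the price of holding the whole workflow's data until the end and sending it in one larger request.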

14
Possible Future Work
  • Need feedback from real biologists using it
  • Role-based Access (GOLD is investigating)
  • Data Security over the wire and uniform myGrid
    security framework
  • Distributed Query Processing between various mIR
    instances (OGSA-DQP can be used)
  • Database store/update notifications
  • Store large files elsewhere (SRB/XML?), use ref
    in mIR
  • Configurable canned-query interface
  • Modifications to the information model
    • Some simplifications
    • Some restrictions
    • Some new relationships

15
Thank You
  • Questions/Discussions