Title: Data Access


1
Data Access and Integration in the ISPIDER
Proteomics Grid
  • L. Zamboulis, H. Fan, K. Belhajjame, J. Siepen,
    A. Jones, N. Martin, A. Poulovassilis,
    S. Hubbard, S. M. Embury, N. W. Paton

2
Overview
  • The ISPIDER project
  • Data Access and Integration of Proteomics Resources
  • Challenges
  • Middleware
  • Proteomics resources and global schema
  • System architecture and query processing
  • Future Work

3
ISPIDER
  • Project Goals
  • Build an integrated platform of proteomic
    resources
  • Use existing resources and produce new ones
  • Create clients for querying, visualisation, etc.

4
ISPIDER
  • Objective: develop an integrated platform of
    proteome-related resources, using existing
    standards
  • Benefits
  • Access to increased breadth of information
  • More reliable analyses
  • Integration brings added value

5
Challenges
  • Proteomics repositories in disparate locations
      • → need for a distributed solution: common
        access, distributed query processing
      • → need for integration: overlapping data,
        different representations
  • Data and schemas are constantly updated and evolving
      • → need virtual or hybrid integration
      • → need schema evolution support

6
Middleware (1/2)
  • OGSA-DAI: middleware exposing data sources on
    Grids via web services
      • open-source and extensible
      • uniform access to relational and XML data sources
      • supports a variety of operations, e.g.
        querying/updating, data transformation, data
        delivery
  • OGSA-DQP: service-based distributed query
    processor
      • supports querying of relational OGSA-DAI data
        sources
      • offers implicit parallelism for data-intensive
        requests

7
Middleware (2/2)
  • AutoMed: heterogeneous data transformation and
    integration system
      • subsumes traditional data integration approaches
      • handles various data models, easily extensible
      • virtual/materialised/hybrid integration
      • schema evolution
      • data warehousing tools

8
Data Integration Approaches
  • Global-As-View (GAV) approach: describe GS
    (global schema) constructs with view definitions
    over LSi (local schema i) constructs
  • Local-As-View (LAV) approach: describe LSi
    constructs with view definitions over GS
    constructs
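
The difference between the two directions can be illustrated with a toy Python sketch. The schemas, field names and data are made up for illustration and do not come from the ISPIDER resources. In GAV the global construct is computed from the local ones. In LAV a local construct is described as a view over the global one.

```python
# Hypothetical local schemas LS1 and LS2: same kind of fact, different shapes.
LS1_PEPTIDE = [{"seq": "ACDK", "prot_acc": "P12345"}]
LS2_HIT = [("P67890", "GHIK"), ("P12345", "ACDK")]


def gs_peptide_hit_gav():
    """GAV: the GS construct is defined as a view over the LSi constructs."""
    from_ls1 = {(row["prot_acc"], row["seq"]) for row in LS1_PEPTIDE}
    from_ls2 = {(acc, seq) for acc, seq in LS2_HIT}
    return from_ls1 | from_ls2


def ls1_peptide_lav(gs_peptide_hit):
    """LAV: the LS1 construct is described as a view over the GS construct."""
    return [{"seq": seq, "prot_acc": acc} for acc, seq in sorted(gs_peptide_hit)]


gs = gs_peptide_hit_gav()
view = ls1_peptide_lav(gs)
# Every row actually stored in LS1 is contained in its LAV description.
print(all(row in view for row in LS1_PEPTIDE))
```

In this sketch the LAV view returns more rows than LS1 stores, which matches the usual reading of a LAV definition as a description that the source's contents must satisfy rather than a recipe for computing them.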

9
Both-As-View (BAV) Approach
  • Schema transformation approach
  • For each pair (LSi, GS): incrementally modify
    LSi/GS to match GS/LSi

10
BAV Example
  • Transformation pathway consists of primitive
    transformations
  • Pathway contains both GAV and LAV definitions
  • Transformations are automatically reversible
  • Metadata in AutoMed Repository
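
A minimal Python sketch of why such a pathway is automatically reversible follows. It assumes the primitive transformation kinds add, delete, extend, contract and rename. The `Step` class, the construct names and the query strings are hypothetical placeholders, not AutoMed's API or IQL syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str        # "add", "delete", "extend", "contract" or "rename"
    construct: str   # construct the step applies to
    arg: str = ""    # defining query, or the new name for "rename"


_INVERSE = {"add": "delete", "delete": "add",
            "extend": "contract", "contract": "extend"}


def reverse_step(step):
    """Return the primitive transformation that undoes `step`."""
    if step.kind == "rename":
        return Step("rename", step.arg, step.construct)
    return Step(_INVERSE[step.kind], step.construct, step.arg)


def reverse_pathway(pathway):
    """Reverse a pathway: invert every step and apply them in reverse order."""
    return [reverse_step(step) for step in reversed(pathway)]


# Hypothetical pathway from a local schema to the global schema.
LS_TO_GS = [
    Step("add", "gs_peptide", "<query over ls_pep>"),
    Step("delete", "ls_pep", "<query over gs_peptide>"),
    Step("rename", "prot", "gs_protein"),
]
GS_TO_LS = reverse_pathway(LS_TO_GS)
```

Read this way, the add steps carry GAV-style definitions (a global construct defined over local ones) and the delete steps carry LAV-style definitions, which is one way to see how a single pathway holds both.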

11
Proteomics Resources
  • PEDRo
      • collection of descriptions of experimental data
        sets in proteomics
      • has been used as a format for exchanging
        proteomics data
  • gpmDB
      • contains a large number of protein and peptide
        identifications
      • initially designed to assist in the validation of
        peptide MS/MS spectra and protein coverage
        patterns
  • PepSeeker
      • developed as part of the ISPIDER project
      • comprehensive resource of peptide/protein
        identifications
  • PRIDE
      • centralised, standards-compliant, public
        proteomics repository
      • contains protein/peptide identifications and the
        evidence supporting them

12
Global Schema
  • Trade-off between
      • being able to answer specific user queries
      • a full integration
  • Properties
      • Based on PEDRo's peptide/protein identification
        section, expanded with information unique to
        other resources
      • Entities identified by LSIDs (Life Science
        Identifiers)
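
As an illustration of LSID-based identification, here is a small Python sketch that parses the general LSID form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The `LSID` type, the `parse_lsid` helper and the example identifiers are assumptions for illustration. The slides do not say which authorities or namespaces the global schema uses.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID string into its components, raising ValueError if malformed."""
    parts = text.split(":")
    has_prefix = [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if len(parts) not in (5, 6) or not has_prefix:
        raise ValueError(f"not an LSID: {text!r}")
    if any(not part for part in parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])


# Hypothetical identifiers; example.org is a placeholder authority.
peptide = parse_lsid("urn:lsid:example.org:peptide:12345")
revised = parse_lsid("URN:LSID:example.org:protein:P12345:2")
```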

13
System Architecture
  • Sources wrapped with OGSA-DAI
  • AutoMed toolkit wraps OGSA-DAI resources
  • Integration of OGSA-DAI resources
  • Queries submitted to AutoMed QP are evaluated
    with the help of OGSA-DQP

18
Query Processing
  • Query is submitted to AutoMed's GQP
      • Reformulated
      • Optimised
  • AutoMed-DQP Wrapper
      • IQL → OQL
      • OGSA-DQP evaluates OQL queries
      • OQL result → IQL result
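
The order of these stages can be sketched as a toy Python pipeline. Everything here is a stand-in: the construct names, the in-memory `SOURCES` and the functions are hypothetical, and no real IQL or OQL syntax or OGSA-DQP call is used.

```python
# Hypothetical mapping from a global construct to the local constructs defining it.
VIEWS = {"gs_peptide": ["pride_peptide", "gpmdb_peptide"]}
# In-memory stand-ins for the wrapped data sources.
SOURCES = {"pride_peptide": ["ACDK"], "gpmdb_peptide": ["ACDK", "GHIK"]}


def reformulate(query):
    """Replace each global construct with the local constructs that define it."""
    return [local for name in query for local in VIEWS.get(name, [name])]


def optimise(plan):
    """Toy optimisation: drop repeated scans of the same local construct."""
    return list(dict.fromkeys(plan))


def to_backend(plan):
    """Stand-in for the IQL to OQL translation: one scan request per construct."""
    return [("scan", name) for name in plan]


def evaluate(requests):
    """Stand-in for the distributed query processor evaluating the requests."""
    return [row for _, name in requests for row in SOURCES[name]]


def to_result(rows):
    """Stand-in for translating the backend result back for the caller."""
    return sorted(set(rows))


def answer(query):
    return to_result(evaluate(to_backend(optimise(reformulate(query)))))


print(answer(["gs_peptide"]))
```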

20
Summary
  • Proteomics repositories in disparate locations
      • → need for a distributed solution
      • → need for integration
  • Data and schemas are constantly updated and evolving
      • → need virtual or hybrid integration
      • → support schema evolution

21
Future Work
  • Schema evolution
  • Evaluation of AutoMed advantage
  • Expose AutoMed functionality to the Grid
  • AutoMed and Taverna integration

22
Future Work
  • Taverna: tool for Web Service orchestration in
    workflows
      • Related services may be incompatible
      • Current solution involves writing custom code for
        every pair of WS
  • Use AutoMed toolkit for semi-automatic
    integration of XML Web Services
      • mappings from WS to ontologies
      • automatic integration
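
The idea of replacing per-pair custom code with mappings to a shared ontology can be sketched in Python as follows. The service names, field names and ontology terms are hypothetical, and this is not how AutoMed or Taverna implement it.

```python
# Hypothetical mappings from each service's field names to shared ontology terms.
TO_ONTOLOGY = {
    "service_a": {"acc": "protein_accession", "pep": "peptide_sequence"},
    "service_b": {"proteinId": "protein_accession", "sequence": "peptide_sequence"},
}


def translate(record, source, target):
    """Convert a record from source's field names to target's via the ontology."""
    to_onto = TO_ONTOLOGY[source]
    from_onto = {term: field for field, term in TO_ONTOLOGY[target].items()}
    return {from_onto[to_onto[field]]: value for field, value in record.items()}


output_of_a = {"acc": "P12345", "pep": "ACDK"}
input_for_b = translate(output_of_a, "service_a", "service_b")
```

With one mapping per service, adding a new service needs one new entry in `TO_ONTOLOGY` rather than a new adapter for every existing service.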

23
ISPIDER Project Members
  • Birkbeck College
      • Nigel Martin
      • Alex Poulovassilis
      • Lucas Zamboulis (R.A.)
      • Hao Fan (former R.A.)
  • European Bioinformatics Institute
      • Rolf Apweiler
      • Henning Hermjakob
      • Weimin Zhu
      • Chris Taylor
      • Phil Jones
      • Nisha Vinod
  • University of Manchester
      • Simon Hubbard
      • Steve Oliver
      • Suzanne Embury
      • Norman Paton
      • Carol Goble
      • Robert Stevens
      • Khalid Belhajjame (R.A.)
      • Jennifer Siepen (R.A.)
  • U.C.L.
      • David Jones
      • Christine Orengo
      • Melissa Pentony (R.A.)