caGrid Version 0.5 Reference Implementation: RProteomics. caBIG Architecture Workspace Face to Face, Georgetown University, August 16th-18th, 2005


1
caGrid Version 0.5 Reference Implementation
RProteomics
caBIG Architecture Workspace Face to Face
Georgetown University
August 16th-18th, 2005

Patrick McConnell
Duke Comprehensive Cancer Center
patrick.mcconnell@duke.edu
Shannon Hastings
Ohio State University
hastings@bmi.osu.edu

2
Outline

High Level Overview of Proteomics
Data Model
Project Architecture
Process of getting to Silver level compliance
Functionality Exposed to Grid
Process of Grid Enablement
Demo/Screenshots
Lessons Learned / Technical Difficulties / Wish List
Acknowledgements

3
Proteomics Overview

Goal
  Find biomarkers
  Build a predictive model
Proteins are split into peptide fragments
Mass is measured by time-of-flight (TOF)
Mass of peptides can be used to identify proteins
Peptides can undergo a second round of MS to aid identification
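
The idea that peptide masses identify proteins can be sketched as follows. This is only an illustration, not code from RProteomics: it assumes trypsin as the digestion enzyme (the slides do not name one), uses standard monoisotopic residue masses, and the function names `tryptic_digest` and `peptide_mass` are hypothetical.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one H2O is added per free peptide


def tryptic_digest(protein):
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


# A measured spectrum would be compared against masses computed like this.
for peptide in tryptic_digest("MKRAG"):
    print(peptide, round(peptide_mass(peptide), 4))
```

Matching the measured peptide masses against masses computed this way for candidate proteins is what lets a set of masses point to a protein.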

http://www.appliedbiosystems.com/catalog/myab/StoreCatalog/products/CategoryDetails.jsp?hierarchyID=101&category3rd=112051&trail=no
4
Proteomics Data

A modest study can be on the order of 10 GB of data

5
Project Overview

RProteomics is a development project in the Proteomics SIG of the ICR Workspace
Developing analytical routines for proteomics data
  Denoising, background removal, peak identification, spectral alignment, normalization, peptide quantitation
Focus is on analytics
  NOT databases, LIMS, protein identification
RProteomics is a critical step in the proteomics pipeline
  LIMS -> repository -> RProteomics -> classification -> protein identification
RProteomics provides integration
  Q5 classification has been integrated
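
Three of the analytical routines listed above can be chained as in the sketch below. The algorithms are deliberately simple stand-ins (moving average, rolling-minimum baseline, median-based threshold) chosen for illustration. The slides do not say which methods RProteomics uses, and every function and parameter name here is an assumption.

```python
from statistics import median


def denoise(intensities, window_size):
    """Smooth by averaging each point with its neighbours (stand-in method)."""
    half = window_size // 2
    smoothed = []
    for i in range(len(intensities)):
        window = intensities[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def remove_background(intensities, window_size):
    """Subtract a rolling-minimum baseline estimate (stand-in method)."""
    half = window_size // 2
    return [
        value - min(intensities[max(0, i - half): i + half + 1])
        for i, value in enumerate(intensities)
    ]


def identify_peaks(intensities, threshold_multiplier):
    """Indices of local maxima above threshold_multiplier * median intensity."""
    threshold = threshold_multiplier * median(intensities)
    return [
        i
        for i in range(1, len(intensities) - 1)
        if intensities[i] > threshold
        and intensities[i] > intensities[i - 1]
        and intensities[i] >= intensities[i + 1]
    ]


def run_pipeline(intensities, smooth_window, baseline_window, threshold_multiplier):
    """Denoise, remove background, then pick peaks, in that order."""
    denoised = denoise(intensities, smooth_window)
    corrected = remove_background(denoised, baseline_window)
    return identify_peaks(corrected, threshold_multiplier)
```

Keeping each routine a plain function of a spectrum makes it straightforward to expose the steps individually or as one composed call.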

6
Statistics: Background Removal
7
Statistics: Denoising
8
Statistics: Spectral Alignment
9
Statistics: Protein Quantitation
10
Data Model

mzXML
  Encodes raw spectra data (m/z-intensity pairs)
  Some metadata about instrumentation
  Utilizes base64 encoding for binary data
scanFeatures
  Encodes analysis results as a set of features
  Some metadata about the experiment
  Utilizes base64 encoding for binary data
Service parameters
  JpegImage
  Lsid
  WindowSize
  ThreshholdMultiplier
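
The base64 handling of m/z-intensity pairs can be sketched as below. It assumes the layout commonly seen in mzXML files, namely 32-bit floats in network (big-endian) byte order with m/z and intensity interleaved. A real file declares its precision and byte order on the peaks element, so treat those as assumptions here. The helper names are hypothetical.

```python
import base64
import struct


def encode_peaks(pairs):
    """Pack (m/z, intensity) pairs as big-endian 32-bit floats, then base64."""
    flat = [value for pair in pairs for value in pair]
    raw = struct.pack(f">{len(flat)}f", *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text):
    """Inverse of encode_peaks: base64 text to a list of (m/z, intensity)."""
    raw = base64.b64decode(text)
    count = len(raw) // 4  # 4 bytes per 32-bit float
    flat = struct.unpack(f">{count}f", raw)
    return list(zip(flat[0::2], flat[1::2]))


encoded = encode_peaks([(100.5, 2.0), (200.25, 4.0)])
print(decode_peaks(encoded))
```

Base64 inflates the binary payload by roughly a third, which is one reason large spectra become a concern when passed through web services.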

11
Project Architecture
12
Project Architecture
13
Process of getting to Silver level compliance

Programming and messaging interfaces
  Apache Axis for web services
  Wrapped functionality with Java interfaces that made sense
Vocabularies, terminologies, and ontologies
Data elements
  Wrote tool for XML Schema to XMI conversion
  Manually curated UML
  Went through semantic connecting process
Information models
  XML Schema to begin with, so information models were easy
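
The first half of an XML Schema to XMI conversion, reading the schema into a class-like model, might look like the sketch below. It is not the tool the team wrote and it does not emit XMI. It only collects named complex types with their members, and the example schema (including the structure given to `Lsid` and `WindowSize`) is invented for illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema; the real RProteomics types are not shown in the slides.
EXAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="WindowSize">
    <xs:attribute name="value" type="xs:int"/>
  </xs:complexType>
  <xs:complexType name="Lsid">
    <xs:sequence>
      <xs:element name="authority" type="xs:string"/>
      <xs:element name="identifier" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def schema_to_classes(xsd_text):
    """Map each named complexType to a list of (member name, member type)."""
    root = ET.fromstring(xsd_text.strip())
    classes = {}
    for complex_type in root.iter(f"{XS}complexType"):
        name = complex_type.get("name")
        if name is None:
            continue  # anonymous types are skipped in this sketch
        members = []
        for node in complex_type.iter():
            if node.tag in (f"{XS}element", f"{XS}attribute"):
                members.append((node.get("name"), node.get("type")))
        classes[name] = members
    return classes


print(schema_to_classes(EXAMPLE_XSD))
```

A model like this is what would then be serialized to XMI and curated by hand as UML.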

14
Functionality Exposed to the Grid

Analytical service: no security requirements
Discuss its input and output and what it does scientifically
Functionality to be exposed
  20 more statistical methods
  Data access methods, translation methods

15
Process of Grid Enablement

Process
  Creation/extraction of data types using XML Schema
  Upload data types into caGrid GME
  Use the Analytical Toolkit Portal to create and modify the grid service interface
  Implement the generated server stub by making the appropriate calls into the original non-grid-enabled RProteomics application
  Compile and deploy
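
The stub-implementation step amounts to a thin adapter: unwrap the request message, call the existing routine, wrap the result. A minimal sketch follows. All names (`legacy_normalize`, `NormalizeRequest`, `GeneratedNormalizeStub`, and so on) are hypothetical, and the real stubs would be generated Java classes rather than Python.

```python
from dataclasses import dataclass
from typing import List


def legacy_normalize(intensities: List[float]) -> List[float]:
    """Stand-in for a tested routine in the original, non-grid application."""
    total = sum(intensities)
    return [value / total for value in intensities]


@dataclass
class NormalizeRequest:
    """Message type that would be defined in XML Schema and uploaded."""
    intensities: List[float]


@dataclass
class NormalizeResponse:
    intensities: List[float]


class GeneratedNormalizeStub:
    """What a toolkit might generate: the interface with an empty body."""

    def normalize(self, request: NormalizeRequest) -> NormalizeResponse:
        raise NotImplementedError("service logic not filled in yet")


class NormalizeService(GeneratedNormalizeStub):
    """Stub implementation: unwrap the message, delegate, wrap the result."""

    def normalize(self, request: NormalizeRequest) -> NormalizeResponse:
        result = legacy_normalize(request.intensities)
        return NormalizeResponse(intensities=result)


response = NormalizeService().normalize(NormalizeRequest([1.0, 3.0]))
print(response.intensities)
```

Because the service class holds no analytical logic of its own, bugs can be isolated to either the wrapper or the already-tested application code.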

16
Demo and/or Screenshots

Demonstration of RProteomics GUI with grid functionality

17
Lessons Learned / Technical Difficulties / Wish List

Think grid from the beginning
  Have an idea what the service interface will be ahead of time
  Wrap parameters with objects
Technology is complex
  XML, Schema, CDEs, Globus, Web Services, etc.
Installation is complex
  Have to have working knowledge of Tomcat, Axis, Ant, environment variables, etc.
  Need to have compatible versions of each component, esp. Java 1.4.2_04
Wish list
  Wizard for grid-enabling existing code
  Documentation of every aspect of installation and functionality
  Clone Shannon for each development project
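
The advice to wrap parameters with objects can be illustrated with the `WindowSize` and `ThreshholdMultiplier` service parameters from the data model. The sketch below is an assumption about what such wrappers might look like. The validation rules are invented, and the original types would be schema-defined Java classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowSize:
    """Wraps a bare integer so the service signature carries a named type."""
    value: int

    def __post_init__(self):
        if self.value < 1:  # assumed rule, not taken from the slides
            raise ValueError("window size must be at least 1")


@dataclass(frozen=True)
class ThreshholdMultiplier:
    """Spelling follows the parameter name used in the data model."""
    value: float

    def __post_init__(self):
        if self.value <= 0:  # assumed rule, not taken from the slides
            raise ValueError("multiplier must be positive")


def describe(window: WindowSize, multiplier: ThreshholdMultiplier) -> str:
    """A service method takes typed objects instead of loose primitives."""
    return f"window={window.value}, multiplier={multiplier.value}"


print(describe(WindowSize(5), ThreshholdMultiplier(3.0)))
```

Typed wrappers give each parameter a schema type that can be registered and annotated, and they let validation live in one place rather than in every service method.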

18
Lessons Learned / Technical Difficulties / Wish List

Starting with a non-grid-enabled application that has been tested and is stable made wrapping it as a grid service easier to debug.
Need a standard mechanism for dealing with large data objects.
  Some sort of lazy-loaded object/pointer would be sufficient.
Integration of the toolkit portal into some standard IDEs might make development even easier.
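
The lazy-loaded object/pointer wished for above could take a form like the following sketch: the message carries only an identifier, and the bulk data is fetched the first time it is needed. The class name, the loader callback, and the example identifier are all hypothetical. caGrid 0.5 did not provide this mechanism, which is why it appears on the wish list.

```python
class LazySpectrum:
    """Holds a reference to large data and fetches it on first access."""

    def __init__(self, identifier, loader):
        self.identifier = identifier
        self._loader = loader  # callable that retrieves data for an identifier
        self._data = None
        self._loaded = False

    @property
    def data(self):
        if not self._loaded:
            self._data = self._loader(self.identifier)
            self._loaded = True
        return self._data


def example_loader(identifier):
    """Stand-in for a remote fetch; a real loader would hit a repository."""
    print("loading", identifier)
    return [1.0, 2.0, 3.0]


spectrum = LazySpectrum("urn:lsid:example.org:spectrum:1", example_loader)
print(len(spectrum.data))  # triggers the load
print(len(spectrum.data))  # served from the cached copy
```

Passing such a handle through a service call keeps messages small, and data that is never inspected is never transferred.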

19
Acknowledgements