RProteomics Interoperability Review - PowerPoint PPT Presentation

About This Presentation
Title:

RProteomics Interoperability Review

Description:

De facto standard by Sashimi. Instrumentation. Data processing. Separation technique ... http://sashimi.sourceforge.net/software_glossolalia.html. ScanFeatures ... – PowerPoint PPT presentation

Slides: 16
Provided by: NCI1

Transcript and Presenter's Notes



1
RProteomics Interoperability Review
  • Patrick McConnell¹, Salvatore Mungal¹, Richard Haney¹, Mark Peedin¹
  • ¹Duke Comprehensive Cancer Center

2
RProteomics Team
  • Duke, ICR Developer
  • Patrick McConnell, Project lead
  • Richard Haney, Architect and developer of
    statistical systems
  • Salvatore Mungal, Middle-tier Java developer
  • Mark Peedin, Database developer
  • Northwestern University, Collaborator
  • Simon Lin, Proteomics domain expert
  • Oregon Health Sciences University, ICR Adopter
  • Shannon McWeeney
  • Veena Rajaraman
  • University of Pennsylvania, ICR Adopter
  • David Fenstermacher
  • Craig Street
  • University of North Carolina, Collaborator
  • Christoph Borchers

3
RProteomics Overview
  • Statistical routines to analyze proteomics data
  • MS and LC-MS data
  • Integrative Cancer Research Workspace
  • Proteomics Special Interest Group
  • Adopters
  • University of Pennsylvania
  • Oregon Health Sciences University
  • Architecture Workspace
  • Reference implementation for analysis services
  • Project ends October 1, 2005

4
RProteomics Focus
  • We are NOT concerned with
  • LIMS and database management
  • Identification of proteins
  • Database searching and pattern matching
  • Statistical Modeling of Spectra (SMOS)
  • Data standards
  • Generic statistical data and spectral data
  • Analysis
  • Spectra processing and analysis audit trail

5
RProteomics Details
  • Statistical routines to analyze MS and LC-MS data
  • Background removal, denoising, alignment/calibration,
    normalization, peak finding, isotope
    deconvolution, peptide quantitation, high-level
    modeling
  • Open Statistical Services (OSS)
  • Bridging Java and R with web services and XML
  • Proteomics database
  • Data model (XML Schema), object model (Java), XML
    database (Mako)
  • Grid services
  • Data access, data transformation, and analysis
  • Graphical user interface
  • Load and query data, run analytics, view plots
  • Future plans
  • Multilevel and hierarchical statistics
  • Dynamic GUI
  • Peptide/protein identification services
  • Integrate with PIR, Q5, caMassClass, proteomics
    repository
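
The first few routines listed above can be sketched as a chain of functions over a list of intensities. This is only an illustration: the function names, window sizes, and the specific methods (local-minimum baseline, moving average, sum-to-one scaling, strict local maxima) are assumptions chosen for brevity, not the RProteomics implementations, which the slides describe as R routines.

```python
def remove_background(intensities, window=5):
    """Subtract a baseline estimated as the minimum in a sliding window."""
    n = len(intensities)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        out.append(intensities[i] - min(intensities[lo:hi]))
    return out


def denoise(intensities, window=1):
    """Smooth with a centered moving average (window is the half-width)."""
    n = len(intensities)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        chunk = intensities[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


def normalize(intensities):
    """Scale intensities so that they sum to 1; leave an all-zero scan as is."""
    total = sum(intensities)
    return [x / total for x in intensities] if total else list(intensities)


def find_peaks(mz, intensities, threshold=0.0):
    """Return (m/z, intensity) pairs at strict local maxima above threshold."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        here = intensities[i]
        if here > threshold and intensities[i - 1] < here > intensities[i + 1]:
            peaks.append((mz[i], here))
    return peaks


def process(mz, intensities):
    """Run the four steps in the order they appear on the slide."""
    step = remove_background(intensities)
    step = denoise(step)
    step = normalize(step)
    return find_peaks(mz, step)
```

Calling `process(mz, intensities)` with two equal-length lists returns the detected peaks; each step takes and returns a plain list, so steps can be reordered or replaced independently.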

6
Statistics: Denoising

7
Modeling Process
XMI: Class; Attribute; Association; Definition; Association multiplicity
XML Schema: Complex types; Elements with in-lined complex types; Elements of complex types; Elements of simple types; Attributes; Annotations; minOccurs/maxOccurs
  • Mapping Rules
  • Remove Type from end
  • Map schema types
  • Add id attribute
  • Move single-attribute classes to an attribute
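
A minimal sketch of how the first three mapping rules might be applied when turning an XML Schema complex type into a class description. The direction of the mapping (schema to class model), the type table, the `String` type chosen for `id`, and all function names are assumptions made for illustration; the fourth rule is not covered.

```python
# Hypothetical table from XML Schema built-in types to Java types.
SCHEMA_TO_JAVA = {
    "xs:string": "String",
    "xs:boolean": "boolean",
    "xs:int": "int",
    "xs:long": "long",
    "xs:float": "float",
    "xs:double": "double",
}


def class_name(complex_type_name):
    """Rule 'Remove Type from end': ScanType becomes Scan."""
    if complex_type_name.endswith("Type") and len(complex_type_name) > 4:
        return complex_type_name[:-4]
    return complex_type_name


def map_complex_type(name, fields):
    """Describe the class for one complex type.

    fields maps each element or attribute name to its schema type name.
    """
    # Rule 'Add id attribute'; the String type is an assumption.
    attributes = {"id": "String"}
    for field_name, schema_type in fields.items():
        # Rule 'Map schema types'; other complex types map to class names.
        attributes[field_name] = SCHEMA_TO_JAVA.get(
            schema_type, class_name(schema_type)
        )
    return {"class": class_name(name), "attributes": attributes}
```

For example, `map_complex_type("ScanType", {"intensity": "xs:double"})` describes a class named `Scan` with an `id` and a `double` attribute.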

8
Data Model Overview
  • mzXML
  • De facto standard for encoding raw proteomics
    data
  • ScanFeatures
  • Generically encodes proteomics data and
    analytical results
  • AML-routine
  • Describes analysis routines in great detail
  • Metadata to help researchers understand grid
    services
  • AML-run
  • Keeps track of analysis routines (provenance)
  • Hooks input data and output data together
  • StatML
  • Generic encoding for statistical data (lists,
    arrays, and scalars)
  • Service parameters
  • Parameters to the operations of the grid service

9
mzXML
  • Encodes raw spectral data (m/z-intensity pairs)
  • De facto standard by Sashimi
  • Instrumentation
  • Data processing
  • Separation technique
  • Spot description
  • m/z scan values
  • MALDI acquisition
  • Data integrity
  • Other candidates: mzData and mqData

http://sashimi.sourceforge.net/software_glossolalia.html
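
As an illustration of the m/z-intensity pairs mentioned above: mzXML files typically store each scan's peaks as Base64 text wrapping big-endian (network order) floats, interleaved as m/z, intensity, m/z, intensity. The sketch below assumes that layout with 32-bit precision by default; the function names are hypothetical and no XML parsing is shown.

```python
import base64
import struct


def encode_peaks(pairs, precision=32):
    """Pack (m/z, intensity) pairs as network-order floats, then Base64."""
    code = "f" if precision == 32 else "d"
    flat = [value for pair in pairs for value in pair]
    raw = struct.pack(">%d%s" % (len(flat), code), *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text, precision=32):
    """Reverse of encode_peaks: Base64 text to a list of (m/z, intensity)."""
    code = "f" if precision == 32 else "d"
    width = precision // 8
    raw = base64.b64decode(text)
    values = struct.unpack(">%d%s" % (len(raw) // width, code), raw)
    return list(zip(values[0::2], values[1::2]))
```

A round trip such as `decode_peaks(encode_peaks([(100.0, 10.0)]))` returns the original pairs whenever the values are exactly representable at the chosen precision.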
10
ScanFeatures
  • Support for statistical data
  • Metadata
  • project, patient, fraction, replicate, date,
    outcome, scanNumber, scanStartPos, scanStepSize
  • Features
  • Name
  • m/z, intensity, peakWidth, peakHeight, etc.
  • Controlled vocabulary
  • Value
  • Scalar or array
  • Support for a hierarchy of features
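
One way to picture the structure described above is a metadata record plus a tree of named features, each holding a scalar or an array. The class and field names below are assumptions made for illustration; the actual ScanFeatures model is defined by its XML Schema, which is not reproduced in these slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Feature:
    name: str  # drawn from a controlled vocabulary, e.g. "m/z"
    value: Union[float, List[float], None] = None  # scalar or array
    children: List["Feature"] = field(default_factory=list)  # hierarchy

    def find(self, name: str) -> Optional["Feature"]:
        """Depth-first search for the first feature with the given name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


@dataclass
class ScanFeatures:
    metadata: dict  # e.g. project, patient, fraction, replicate, date
    features: List[Feature] = field(default_factory=list)


# Hypothetical example: one peak with two child features.
peak = Feature("peak", children=[
    Feature("m/z", 1045.6),
    Feature("intensity", 1200.0),
])
scan = ScanFeatures({"project": "demo", "scanNumber": 1}, [peak])
```

Here `peak.find("intensity").value` retrieves the nested scalar, showing how a hierarchy of features can be searched by vocabulary name.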

11
AML-routine
  • Authors
  • Writers of the routine and contact information
  • Routine name
  • One-word description
  • Title
  • Formal one-sentence description
  • Aliases
  • Alternate names
  • Ontological description
  • Controlled vocabulary (CDEs)
  • Textual description
  • Human-readable description (1-3 sentences)
  • External references
  • Journals, websites
  • Pseudo-code
  • Approximation of actual code
  • Source code
  • Actual source code
  • Routine Signature
  • Input/output parameters
  • Contract
  • Pre and post conditions
  • Usage
  • Textual description and examples of how the
    routine is to be used
  • Implementation
  • OS/hardware/compiler on which the routine is
    implemented
  • Caveats
  • Any user comments not previously covered
  • Benchmarks
  • Theoretical performance and links to performance
    runs

12
AML-run
[Diagram: an AML-run record with an lsid, an AML-routine reference, and lists of inputs and outputs (data items, each identified by an lsid), plus user, submit time, and complete time]
13
AML-run Cont.
[Diagram: data nodes linked to one another through aml-run nodes]
  • Input can be used for more than one run
  • Different levels combined together
  • Output can be input for another run
  • Track back the final output to find what analysis
    was performed and what data was used
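
The trace-back idea can be sketched as a walk over run records, following each output to the run that produced it and then to that run's inputs. The record layout (dicts with `lsid`, `routine`, `inputs`, `outputs`) and the LSID strings are hypothetical; only the linking of data through runs comes from the slide.

```python
def trace_back(output_lsid, runs):
    """Return (run lsids, raw input lsids) that led to output_lsid.

    Runs are listed latest first; raw inputs are data items that no
    run in the list produced.
    """
    produced_by = {out: run for run in runs for out in run["outputs"]}
    history, raw_inputs, seen = [], [], set()
    pending = [output_lsid]
    while pending:
        lsid = pending.pop()
        run = produced_by.get(lsid)
        if run is None:
            if lsid not in raw_inputs:
                raw_inputs.append(lsid)
            continue
        if run["lsid"] in seen:
            continue  # a run with several outputs is reported once
        seen.add(run["lsid"])
        history.append(run["lsid"])
        pending.extend(run["inputs"])
    return history, raw_inputs


# Hypothetical runs: two denoising runs feed one alignment run.
runs = [
    {"lsid": "urn:lsid:example.org:run:1", "routine": "denoise",
     "inputs": ["data:raw1"], "outputs": ["data:clean1"]},
    {"lsid": "urn:lsid:example.org:run:2", "routine": "denoise",
     "inputs": ["data:raw2"], "outputs": ["data:clean2"]},
    {"lsid": "urn:lsid:example.org:run:3", "routine": "align",
     "inputs": ["data:clean1", "data:clean2"], "outputs": ["data:aligned"]},
]
history, raw = trace_back("data:aligned", runs)
```

In this example `history` starts with the alignment run and `raw` contains the two raw data items, which is the information the slide says a trace-back should recover.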
14
StatML
  • Scalars
  • String, boolean, integer, long, float, double
  • Arrays
  • Base64 encoded (every 3 bytes become 4 text characters)
  • Square arrays (multi-dimensional)
  • Integer, long, float, double
  • Lists
  • Can contain lists, arrays, scalars
  • Null
  • Lack of value

Efficient, textual encoding
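
A short sketch of the array encoding idea: the numbers are packed into bytes and the bytes are written as Base64 text, so every 3 bytes cost 4 characters. The big-endian byte order and the function names are assumptions; the slides do not state how StatML lays out the bytes or records array dimensions.

```python
import base64
import struct


def encode_array(values):
    """Pack doubles as big-endian bytes (assumed order), then Base64."""
    raw = struct.pack(">%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text):
    """Reverse of encode_array: Base64 text back to a list of doubles."""
    raw = base64.b64decode(text)
    return list(struct.unpack(">%dd" % (len(raw) // 8), raw))


def encoded_length(n_bytes):
    """Base64 output length: 4 characters per started group of 3 bytes."""
    return 4 * ((n_bytes + 2) // 3)
```

Three doubles occupy 24 bytes and therefore encode to 32 characters, roughly a one-third overhead, which is the sense in which the encoding is both textual and reasonably compact.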