Analytical and Data Services Guidelines - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Analytical and Data Services Guidelines


1
Analytical and Data Services Guidelines
  • Architecture/VCDE Workspaces Joint Face to
    Face, February 1st-2nd, 2006

Scott Oster, Ohio State University, oster_at_bmi.osu.edu
Patrick McConnell, Duke Comprehensive Cancer Center, patrick.mcconnell_at_duke.edu
2
Overview
  • Overview of Data and Analytical Services
  • Distinction between Analytical Tool and
    Analytical Service
  • Metadata definition and usage
  • Current UML model for service metadata
  • Need for harmonization
  • Plan to reach consensus
  • Leveraging existing data standards in caBIG
  • De facto standards into UML
  • Bridging caDSR and GME
  • Namespace issues (existing standards)
  • Connecting CDEs and Schema types

3
caBIG Services
[Diagram labels: Analytical Service, Grid Data Service, Grid-Enabled Client, Grid Portal, Tools 1-4, Research Center (x2), NCICB]
4
caBIG Services
  • Data Services
  • Data services present an object view of data
    sources
  • Objects exposed as data services comply with
    common data elements registered in the caDSR/EVS,
    and transported as XML using schema types
    registered in GME
  • Currently Query only (no update, insert, or
    delete)
  • Analytical Services
  • Analytical Services are base Globus services
  • Required to be strongly-typed with respect to
    input and output
  • Analytical services input and output objects
    conforming to registered classes in caDSR, and
    schema types registered in GME
  • Graphical tool to automatically create source
    code, configuration files, and build process for
    new analytical services
  • Input and output parameters can be discovered
    from GME

5
Analytical Tool vs. Analytical Service
  • Analytical services provide data back to the grid
  • Analytical tools only consume data from the grid
  • Examples
  • caWorkbench
  • RProteomics

6
Analytical Service Guidelines
  • Inputs and outputs (parameters) defined by
  • Objects with metadata registered in caDSR and
  • Objects with XML Schema defined
  • Parameters defined as objects, not simple data
    elements
  • a.k.a. no Java primitives
  • Provide service level metadata, the structure of
    which is defined in the caDSR
  • Internal (non API) classes do not need to be
    registered in the caDSR
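
The "objects, not simple data elements" guideline can be illustrated with a minimal sketch. It is written in Python for brevity, although the slide refers to Java, and every name in it (`Spectrum`, `PeakList`, `detect_peaks`) is hypothetical rather than part of caGrid or caDSR. The point is only that the operation accepts and returns typed objects, which are what would carry registered metadata and an XML Schema, instead of bare numbers or strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical input object; assumed to correspond to a registered class."""
    mz: List[float]
    intensity: List[float]


@dataclass(frozen=True)
class PeakList:
    """Hypothetical output object; likewise assumed to be registered."""
    mz: List[float]


def detect_peaks(spectrum: Spectrum) -> PeakList:
    """Toy analysis: keep m/z values whose intensity exceeds both neighbours."""
    peaks = [
        spectrum.mz[i]
        for i in range(1, len(spectrum.mz) - 1)
        if spectrum.intensity[i] > spectrum.intensity[i - 1]
        and spectrum.intensity[i] > spectrum.intensity[i + 1]
    ]
    return PeakList(mz=peaks)
```

Because both the parameter and the return value are named types, a caller can discover what the operation expects from the type definitions alone, which is the property the guideline is after.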

7
Analytical Tool Guidelines
  • Inputs defined by
  • Objects with metadata registered in caDSR and
  • Objects with XML Schema defined
  • No output types need be defined in the caDSR
  • No service-level metadata need be provided
  • Internal (non API) classes do not need to be
    registered in the caDSR

8
Analytical service and tool open questions
  • Tools that are provided as an API in a
    programming language
  • Example: Q5
  • Should tools be a dead end for data?
  • Many tools can output well-defined,
    standards-based objects
  • Example: caWorkbench
  • Many tools can abstract analyses into services
  • Example: VISDA
  • Should analytical service method signatures be
    reviewed and harmonized?
  • Issue raised in interoperability review of
    RProteomics
  • Promote interoperability, plug-and-play analytics
  • Provides context by which to evaluate parameter
    CDEs

9
caBIG Service Description
  • Client and service APIs are object oriented, and
    operate over well-defined and curated data types
  • Objects are defined in UML and converted into
    Administered Components, which are in turn
    registered in the Cancer Data Standards
    Repository (caDSR)
  • Object definitions draw from vocabulary
    registered in the Enterprise Vocabulary Services
    (EVS), and their relationships are thus
    semantically described
  • XML serializations of objects adhere to XML
    schemas registered in the Global Model Exchange
    (GME)
  • All data in caGrid travel between services and
    between client and services as XML documents that
    conform to well-defined schemas stored in GME

10
Current Metadata
  • Metadata and Registry Services
  • Support for Advertisement and Discovery processes
  • Metadata and registry services maintain metadata
    associated with data and analytical services
  • All services register information to an Index
    Service
  • Services can be discovered using semantics of
    their data types
  • Three types of Service Metadata
  • Common Metadata: describes generic information
    about the Cancer Center providing the service
  • Data Service Metadata: describes the data exposed
    using terminology and objects from caDSR/EVS
  • Analytical Service Metadata: describes the
    supported operations and their inputs and outputs
    using terminology and objects from caDSR/EVS

11
The need for more service-level metadata
  • Why?
  • Find the service you want (discovery)
  • Help understand what a service does (extension of
    advertisement)
  • Types of fields
  • Name
  • Description with concept
  • Keywords
  • For high-precision calculations: operating
    system, hardware
  • Contact information
  • Method signatures

12
VCDE proposed model for service level metadata
13
VCDE proposed model for service level metadata (cont.)
14
Service level metadata next steps
  • Form a cross-cutting working group
  • Evaluate two models, use cases
  • Get input from caGrid team
  • Propose model to VCDE, Architecture, caGrid

15
Bringing existing biomedical standards to caBIG
  • There is a wealth of existing standards in the
    biomedical field
  • The great thing about standards is that there
    are so many to choose from
  • The problem with standards is that there are so
    many to choose from
  • MAGE-OM/MAGE-ML, BioPax, mzXML, etc.
  • Most standards based on XML Schema
  • Or alternate non-UML encodings: RDF, OWL,
    Protégé, etc.
  • Translating XML Schema to well defined object
    models in UML is not trivial
  • Passing standards-based XML across the grid using
    the caGrid infrastructure has not been explored

16
Converting from XML Schema to caBIG UML
  • Names of classes and attributes fixed by schema
    (if you actually want to follow the schema)
  • Plurals, poor semantics, contain parent name,
    etc.
  • caGrid requires specific namespace to enter GME
  • The namespace is probably already defined in the
    schema
  • Extension of simple types (e.g. extending String)
  • XML Schema allows such extension, caDSR does not
  • Elements can contain both values (text) and
    sub-elements
  • Examples: XHTML, PubMed abstracts
  • caCORE SDK compatibility
  • id attributes, Collection
  • Elements can contain text and have attributes
  • Basically an extension of String, but also with
    attributes
  • XML Schema is intentionally very hierarchical
  • End up with a bunch of empty classes
  • XML Schema constructs not supported by UML and/or
    caDSR
  • Example: choice
  • Many simple types do not exist in the caDSR
  • Duration, int versus integer, etc.
  • Collections of primitives
  • Cannot model in caDSR with primitive type

17
Potential solutions: XSD->XMI
  • Preface: XMI->XSD is much easier
  • You can even do this with Enterprise Architect
  • HyperModel: XSD->UML, UML->XSD
  • De facto standard for XSD->UML conversion
  • Plugin to Eclipse
  • Freely available, but not open source
  • XMIGenerator: XSD->UML
  • Developed at Duke to address some deficiencies
    in HyperModel
  • Standalone, command-line based application
  • Open source, freely available
  • XSD->Java->UML
  • Many tools to do this, but you will get many
    artifacts in the UML

18
XSD->JAXB->Java->EA->UML (mzXML)
19
XSD->HyperModel->XMI (mzXML)
20
XSD->HyperModel->XMI (pepXML)
21
XSD->XMIGenerator->XMI (mzXML)
22
Discussion from breakout yesterday
  • Point 1
  • Point 2
  • Point 3

23
Existing Mapping from caDSR to GME
  • In caDSR, each project (application) will have
    its own Classification Scheme (e.g. caCORE). A
    Classification Scheme may define a subproject,
    which is represented as a Classification Scheme
    Item (CSI) (e.g. caBIO). In caGrid 0.5, each CSI
    had its own schema.
  • Each XML schema will be published into the caGrid
    GME service. As the caDSR ensures semantic
    interoperability, the GME ensures programmatic
    data exchange (syntactic) interoperability.

24
From caDSR to GME (cont)
  • The caGrid 0.5 recommendation for assigning
    schema namespaces for caBIG objects is shown
    below
  • For example:
  • gme://caTIES.caBIG/3.0/edu.upmc.opi.cabig.caties.document.domain
  • This provides a coarse-grain, rule-based mapping
    from caDSR to GME

caDSR: <Classification Scheme>.<Context>/<Classification Scheme Version>/<Classification Scheme Item>
GME: <domain> / <version> / <name>
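
The rule-based mapping can be sketched as a pair of small functions. The `gme://` prefix and the `.` and `/` separators are read off the caTIES example on this slide; the function names are hypothetical and this is not a caGrid API.

```python
def gme_namespace(classification_scheme, context, version, csi):
    """Build a GME namespace from caDSR identifiers, following the slide's pattern."""
    return f"gme://{classification_scheme}.{context}/{version}/{csi}"


def parse_gme_namespace(namespace):
    """Inverse of gme_namespace: recover (scheme, context, version, csi)."""
    prefix = "gme://"
    if not namespace.startswith(prefix):
        raise ValueError(f"not a GME namespace: {namespace!r}")
    # <domain>/<version>/<name>; the name itself may contain dots but no slashes.
    domain, version, name = namespace[len(prefix):].split("/", 2)
    # Assumes the Classification Scheme name contains no dot.
    scheme, _, context = domain.partition(".")
    return scheme, context, version, name
```

Calling `gme_namespace("caTIES", "caBIG", "3.0", "edu.upmc.opi.cabig.caties.document.domain")` reproduces the example namespace, and `parse_gme_namespace` splits it back into its four parts.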
25
Connecting the caDSR and GME
  • Some applications will need to work at both the
    CDE level and the XML level
  • Examples: workflow engine, translational query
    system, etc.
  • There is no defined link between
  • A CDE and an XML element
  • A CDE and an XML element or attribute

Names different
Attribute or element?
What about Collection associations?
26
Potential solutions
  • Change the caDSR
  • Provide a link from each CDE and attribute to the
    location in the XSD
  • Change the GME
  • Provide a link from each element/attribute in the
    XSD to the caDSR
  • Provide a mapping service
  • Given a context and CDE, give me the XSD
    element/attribute
  • Given an element/attribute and context, give me
    the CDE
  • Likely we should start a cross-cutting working
    group to address the problem
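
The mapping-service option amounts to a two-way lookup keyed by context. The sketch below is a minimal in-memory version under that assumption; the class name `CdeXsdMapping`, its methods, and the identifiers in the usage note are all hypothetical, and a real service would need to agree on how CDEs and XSD locations are identified.

```python
from typing import Dict, Optional, Tuple


class CdeXsdMapping:
    """In-memory two-way map between CDEs and XSD elements/attributes."""

    def __init__(self) -> None:
        self._cde_to_xsd: Dict[Tuple[str, str], str] = {}
        self._xsd_to_cde: Dict[Tuple[str, str], str] = {}

    def register(self, context: str, cde: str, xsd_location: str) -> None:
        """Record that `cde` corresponds to `xsd_location` within `context`."""
        self._cde_to_xsd[(context, cde)] = xsd_location
        self._xsd_to_cde[(context, xsd_location)] = cde

    def xsd_for(self, context: str, cde: str) -> Optional[str]:
        """Given a context and CDE, return the XSD element/attribute, if known."""
        return self._cde_to_xsd.get((context, cde))

    def cde_for(self, context: str, xsd_location: str) -> Optional[str]:
        """Given an element/attribute and context, return the CDE, if known."""
        return self._xsd_to_cde.get((context, xsd_location))
```

After `mapping.register("caBIG", "Document.text", "Document/@text")`, the two lookups return each other's keys, and an unregistered pair returns `None`.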