caBIG Pilot Project Selection Process - PowerPoint PPT Presentation

About This Presentation
Slides: 36
Provided by: ggf2
Learn more at: http://www.ggf.org

Transcript and Presenter's Notes

Title: caBIG Pilot Project Selection Process


1
Cancer Biomedical Informatics Grid (caBIG): An
Approach towards Data Access and Integration
Avinash Shanbhag, Director, Core Infrastructure
Engineering
National Cancer Institute Center for Bioinformatics
2
National Cancer Institute 2015 Goal
  • Relieve suffering and death due to cancer by the
    year 2015

3
Origins of caBIG
  • Need: Enable investigators and research teams
    nationwide to combine and leverage their findings
    and expertise in order to meet the NCI 2015 Goal.
  • Strategy: Create a scalable, actively managed
    organization that will connect members of the
    NCI-supported cancer enterprise by building a
    biomedical informatics network in which data can
    be seamlessly shared

4
caBIG Challenges
  • Handle diversity of data types
  • Precise Meaning of data
  • Provide local hosting of data
  • Local access control
  • Provide tools to publish and access data
    easily
  • High-performance computing will be needed in the
    future

5
Interoperability
  • ability of a system to access and use the parts
    or equipment of another system

Semantic interoperability
Syntactic interoperability
6
How to Achieve Interoperability for Data Systems?
  • Well Documented public API access to data
  • Based on object oriented abstraction of
    underlying data
  • No particular technology or tool specified
  • Abstraction layer must be derived using widely
    accepted standards
  • Model Driven Architecture
  • Information Model is the Metadata of the data
    and needs to be persisted and accessible via API
  • Need to be able to unambiguously and
    programmatically determine the meaning of data

7
OMG Model Driven Architecture (MDA) Approach
  • Analyze the problem space and develop the
    artifacts for each scenario
  • Use Cases
  • Use Unified Modeling Language (UML) to
    standardize model representations and artifacts.
    Design the system by developing artifacts based
    on the use cases
  • Class Diagram: Information Model
  • Sequence Diagram: Temporal Behavior
  • Use meta-model tools to generate the code

8
Limitations of MDA
  • Limited expressivity for semantics
  • No facility for runtime semantic metadata
    management

9
caCORE
  • Syntactic and Semantic Integration
  • MDA Plus a whole lot more!

10
caCORE
11
Use Cases
  • Description
  • Actors
  • Basic Course
  • Alternative Course

12
Bioinformatics Objects
13
Common Data Elements
  • What do all those data classes and attributes
    actually mean, anyway?
  • Data descriptors or semantic metadata required
  • Computable, commonly structured, reusable units
    of metadata are Common Data Elements or CDEs.
  • NCI uses the ISO/IEC 11179 standard for metadata
    structure and registration
  • Semantics all drawn from Enterprise Vocabulary
    Service resources

14
Description Logic
Enterprise Vocabulary
Concept Code
Relationships
Preferred Name
Definition
Synonyms
15
Semantic metadata example: Agent
  • <Agent>
  • <name>Taxol</name>
  • <nSCNumber>007</nSCNumber>
  • </Agent>

16
Why do you need metadata?
Class/Attribute | Example Object Data | CIA Metadata | NCI Metadata
Agent | - | A sworn intelligence agent; a spy | Chemical compound administered to a human being to treat a disease or condition, or prevent the onset of a disease or condition
Agent nSCNumber | 007 | Identifier given to an intelligence agent by the National Security Council | Identifier given to chemical compound by the US Food and Drug Administration Nomenclature Standards Committee
Agent name | Taxol | CIA code name given to intelligence agents | Common name of chemical compound used as an agent
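
The point of the table can be sketched in Java: identical object data takes on a different meaning depending on which metadata registry is consulted. This is only an illustration. The class name MetadataDemo, the describe method, and the "Class.attribute" key format are hypothetical; the definitions are copied from the table.

```java
import java.util.Map;

// Hypothetical sketch: the same element names resolve to different
// definitions depending on which metadata registry is consulted.
final class MetadataDemo {
    // Definitions taken from the table, keyed by "Class" or "Class.attribute"
    // (the key format is an assumption made for this sketch).
    static final Map<String, String> CIA = Map.of(
        "Agent", "A sworn intelligence agent; a spy",
        "Agent.nSCNumber", "Identifier given to an intelligence agent by the National Security Council",
        "Agent.name", "CIA code name given to intelligence agents");

    static final Map<String, String> NCI = Map.of(
        "Agent", "Chemical compound administered to a human being to treat a disease or condition, or prevent the onset of a disease or condition",
        "Agent.nSCNumber", "Identifier given to chemical compound by the US Food and Drug Administration Nomenclature Standards Committee",
        "Agent.name", "Common name of chemical compound used as an agent");

    // Pairs a raw value with the definition that the given registry assigns to it.
    static String describe(Map<String, String> registry, String element, String value) {
        String definition = registry.getOrDefault(element, "no definition registered");
        return element + " = " + value + " (" + definition + ")";
    }

    public static void main(String[] args) {
        // Same data, two interpretations.
        System.out.println(describe(CIA, "Agent.nSCNumber", "007"));
        System.out.println(describe(NCI, "Agent.nSCNumber", "007"));
    }
}
```

Without a shared registry, a program receiving nSCNumber = 007 has no way to tell which of the two definitions applies.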
17
Computable Interoperability
[Diagram: two class models compared side by side]
My model: Agent (name, nSCNumber, CTEPName, FDAIndID, IUPACName)
Your model: Drug (id, NDCCode, approvalDate, approver, fdaCode)
18
Cancer Data Standards Repository
  • ISO/IEC 11179 Registry for Common Data Elements,
    the units of semantic metadata
  • Client for Enterprise Vocabulary: metadata is
    constructed from controlled terminology and
    annotated with concept codes
  • Precise specification of Classes, Attributes,
    Data Types, and Permissible Values: strong typing
    of data objects.

19
caCORE Tools
  • UML Loader: automatically registers UML models as
    metadata components
  • CDE Curation: fine-tune metadata and constrain
    permissible values with data standards
  • Form Builder: create standards-based data
    collection forms
  • CDE Browser: search and export metadata
    components
  • Common Security Module: provides role-based
    security

20
caCORE Software Development Kit
  • UML Modeling Tool (any with XMI export)
  • Semantic Connector (concept binding utility)
  • UML Loader (model registration in caDSR)
  • Codegen (middleware code generator)
  • Security Adaptor (Common Security Module)

The caCORE SDK generates a syntactically and
semantically interoperable data service system
21
caGrid
caCORE meets grid technology!
22
Use cases not satisfied by caCORE alone
  • Advertisement
  • Service Provider composes service metadata
    describing the service and publishes it to grid.
  • Discovery
  • Researcher (or application developer) specifies
    search criteria describing a service of interest
  • The researcher submits the discovery request to a
    discovery service, which identifies a list of
    services matching the criteria, and returns the
    list.
  • Invocation
  • Researcher (or application developer)
    instantiates the grid service and accesses its
    resources
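
The three use cases above can be sketched with an in-memory registry standing in for the grid's discovery service. This is a minimal illustration assuming Java 16 or later, not the caGrid API: DiscoveryDemo, ServiceEntry, advertise, and discover are hypothetical names, and the service names and concepts are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical in-memory stand-in for a grid discovery service.
final class DiscoveryDemo {
    // Service metadata composed by the provider, plus a callable endpoint.
    record ServiceEntry(String name, Set<String> concepts, Function<String, String> endpoint) {}

    private final List<ServiceEntry> registry = new ArrayList<>();

    // Advertisement: the provider publishes its service metadata.
    void advertise(ServiceEntry entry) {
        registry.add(entry);
    }

    // Discovery: return every service whose metadata lists the concept.
    List<ServiceEntry> discover(String concept) {
        return registry.stream()
                .filter(e -> e.concepts().contains(concept))
                .toList();
    }

    public static void main(String[] args) {
        DiscoveryDemo grid = new DiscoveryDemo();
        // Made-up services for illustration only.
        grid.advertise(new ServiceEntry("AgentData", Set.of("Agent"), q -> "AgentData handled " + q));
        grid.advertise(new ServiceEntry("GeneData", Set.of("Gene"), q -> "GeneData handled " + q));

        // Invocation: call the endpoint of each discovered service.
        for (ServiceEntry s : grid.discover("Agent")) {
            System.out.println(s.endpoint().apply("query-1"));
        }
    }
}
```

In a real grid the registry is itself a remote service and the search criteria are richer than a single concept name; the sketch keeps only the publish, search, and call sequence.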

23
[Diagram labels: OTHER TOOLKITS; NCI; OTHER caBIG SERVICE PROVIDERS; Cancer Center (x5)]
24
caGrid Components
  • Leverage existing technologies
  • caDSR, EVS, Mobius GME: common data elements,
    controlled vocabularies, schema management
  • Globus Toolkit (currently version 4.0.1)
  • Core grid services infrastructure
  • Service deployment, service registry, invocation,
    base security infrastructure
  • Additional Core Infrastructure
  • Higher-level security services (Dorian)
  • Grid service access to metadata components
    (caDSR, GME, etc)
  • Workflow, Identifier services
  • Service Provider Tooling (Introduce)
  • Graphical service development and configuration
    environment
  • Abstractions from service infrastructure for Data
    and Analytical services
  • Deployment wizards
  • Client Tooling
  • High-level APIs for interacting with core
    components and services
  • Graphical Tools

25
caGrid 0.5 Architecture (may be updated for 1.0)
[Architecture diagram; labels: Functions, Quality of Service, Business Process, Semantic service, ID Resolution, GUMS, Analytical, UI, Security, Resource Management, caDSR, Service Registry, Service, GSI, OGSA-DAI, GT3, GME, Index, Service Description, Grid Communication Protocol, GLOBUS Toolkit, CAMS, Transport, EVS]
26
Data Object Semantics, Metadata, and Schemas
  • Object oriented, APIs, well-defined data types
  • Classes defined in UML and converted into ISO/IEC
    11179, registered in the caDSR
  • Definitions drawn from Enterprise Vocabulary
    Services (EVS), relationships semantically
    described
  • XML serialization of objects adheres to XML
    schemas registered in the Global Model Exchange
    (GME)

27
Introduce Toolkit
  • A framework which enables fast and easy creation
    of caGrid compatible services whether they are
    data, analytical, custom, or core services.
  • Provide easy to use graphical service authoring
    tools.
  • Hide all grid-ness from the developer so that
    they can concentrate on the domain expert
    implementation.
  • Utilize best practice layered grid service
    architectures.
  • Handle all service architecture requirements of
    the caGrid.
  • Strong service interface data typing
  • Metadata and service registration
  • Grid security integration

28
Data Service Access on caGrid
  • Specialization of caGrid grid services to expose
    data through a common query interface
  • Present an object view of data sources
  • Exposed objects are registered in caDSR and their
    XML representation in GME
  • Queries made with caBIG Query Language (CQL)
    Query objects
  • Results returned as objects (or identifiers)
    nested in a CQL Query Result Set

29
Data Service Query Language
  • Specialization of caGrid grid services to expose
    data through a common query interface
  • Present an object view of data sources
  • Exposed objects are registered in caDSR and their
    XML representation in GME
  • Queries made with CQL Query objects
  • Results returned as objects (or identifiers)
    nested in a CQL Query Result Set

30
Data Service Interface
public CQLQueryResultsType processQuery(CQLQueryType query)
  • The data provider's only responsibility is to
    implement CQL over their local data resource
  • A default implementation will be provided for
    caCORE SDK created systems
  • caGrid provides a grid service implementation to
    invoke the provider's CQL implementation
  • Service provides all features necessary for
    compliance, such as advertisement of data service
    metadata, and security integration
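
A provider-side implementation of this single operation might look like the sketch below, assuming Java 16 or later. Only the processQuery signature comes from the slide. The CQLQueryType and CQLQueryResultsType records here are simplified stand-ins, not the real caGrid classes, and DataServiceDemo, DataService, Agent, and AgentDataService are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch of a provider implementing processQuery over local data.
final class DataServiceDemo {
    // Simplified stand-ins for the types named in the signature; the real
    // caGrid classes are richer and are not reproduced here.
    record CQLQueryType(String targetClass, String attribute, String value) {}
    record CQLQueryResultsType(List<Object> objects) {}

    // The single operation a data provider implements.
    interface DataService {
        CQLQueryResultsType processQuery(CQLQueryType query);
    }

    // Local data resource: an in-memory list of Agent records.
    record Agent(String name, String nSCNumber) {}

    static final class AgentDataService implements DataService {
        private final List<Agent> agents;

        AgentDataService(List<Agent> agents) {
            this.agents = agents;
        }

        @Override
        public CQLQueryResultsType processQuery(CQLQueryType query) {
            // Only the Agent class is exposed by this provider.
            if (!"Agent".equals(query.targetClass())) {
                return new CQLQueryResultsType(List.of());
            }
            List<Object> hits = agents.stream()
                    .filter(a -> matches(a, query))
                    .map(a -> (Object) a)
                    .toList();
            return new CQLQueryResultsType(hits);
        }

        // Equality match on one attribute; unknown attributes match nothing.
        private static boolean matches(Agent a, CQLQueryType q) {
            return switch (q.attribute()) {
                case "name" -> a.name().equals(q.value());
                case "nSCNumber" -> a.nSCNumber().equals(q.value());
                default -> false;
            };
        }
    }

    public static void main(String[] args) {
        DataService service = new AgentDataService(List.of(new Agent("Taxol", "007")));
        System.out.println(service.processQuery(new CQLQueryType("Agent", "name", "Taxol")).objects());
    }
}
```

Everything outside processQuery (advertisement, security, transport) is left to the surrounding grid service, which is the division of labor the slide describes.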

31
Data Service Query Scenario
  1. Client builds a CQL Query
  2. CQL Query is serialized and submitted to the Grid
    Data Service
  3. Grid Data Service deserializes the CQL Query
    Object and processes it
  4. Data Source is queried by the Grid Data Service
  5. Grid Data Service builds a CQL Result Set
  6. Result Set is serialized and returned to the
    client
  7. Client deserializes result set
  8. Result set is iterated with client tools to
    retrieve objects
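
The round trip in this scenario can be walked through in a few lines of Java (16 or later). This is a sketch under loud assumptions: plain delimited strings stand in for the XML messages exchanged on the grid, the data rows are made up, and QueryScenarioDemo, Query, gridDataService, and runClient are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical walk-through of the query scenario; delimited strings stand in
// for the serialized XML exchanged between client and service.
final class QueryScenarioDemo {
    record Query(String target, String attribute, String value) {
        String serialize() {
            return target + "|" + attribute + "|" + value;
        }

        static Query deserialize(String wire) {
            String[] p = wire.split("\\|", 3);
            return new Query(p[0], p[1], p[2]);
        }
    }

    // Local data source with made-up rows of "name,nSCNumber".
    static final List<String> DATA_SOURCE = List.of("Taxol,007", "ExampleAgent,000");

    // Service side: deserialize the query, query the data source, then build
    // and serialize the result set.
    static String gridDataService(String wireQuery) {
        Query q = Query.deserialize(wireQuery);
        List<String> hits = new ArrayList<>();
        for (String row : DATA_SOURCE) {
            String name = row.split(",")[0];
            if (q.attribute().equals("name") && q.value().equals(name)) {
                hits.add(row);
            }
        }
        return String.join(";", hits);
    }

    // Client side: build and submit the query, then deserialize and iterate
    // the returned result set.
    static List<String> runClient(String agentName) {
        Query q = new Query("Agent", "name", agentName);
        String wireResult = gridDataService(q.serialize());
        List<String> results = new ArrayList<>();
        if (!wireResult.isEmpty()) {
            for (String row : wireResult.split(";")) {
                results.add(row);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runClient("Taxol"));
    }
}
```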

32
Federated and Aggregated Queries
  • Componentized library being developed to
    facilitate limited federation and aggregation of
    queries
  • An extension language is used to describe
    distributed queries
  • Library creates and executes a Query Plan for the
    distributed query, using multiple CQL queries to
    targeted data services
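
The idea of a query plan fanning out to several data services can be sketched as follows, assuming Java 16 or later. This is not the library the slide refers to, whose design is not described here. FederatedQueryDemo, PlanStep, plan, and execute are hypothetical names, and the simple "send the same query everywhere, then concatenate" strategy is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of aggregation: one distributed query becomes a plan of
// per-service queries whose results are merged.
final class FederatedQueryDemo {
    // A data service is modeled as a function from a query string to results.
    interface DataService extends Function<String, List<String>> {}

    // One planned step: which service to call with which query.
    record PlanStep(String serviceName, String query) {}

    // Builds a plan that sends the same query to every targeted service.
    static List<PlanStep> plan(String query, List<String> targets) {
        List<PlanStep> steps = new ArrayList<>();
        for (String target : targets) {
            steps.add(new PlanStep(target, query));
        }
        return steps;
    }

    // Executes each step in order and concatenates the results; steps that
    // name an unknown service are skipped.
    static List<String> execute(List<PlanStep> plan, Map<String, DataService> services) {
        List<String> merged = new ArrayList<>();
        for (PlanStep step : plan) {
            DataService service = services.get(step.serviceName());
            if (service != null) {
                merged.addAll(service.apply(step.query()));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up services for illustration only.
        Map<String, DataService> services = new LinkedHashMap<>();
        services.put("centerA", q -> List.of("A:" + q));
        services.put("centerB", q -> List.of("B:" + q));
        System.out.println(execute(plan("name=Taxol", List.of("centerA", "centerB")), services));
    }
}
```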

33
Data Service Client Tooling
  • APIs provided to discover available data services
    on the grid based on client-defined criteria
    (such as exposed data models and concepts)
  • Object-Oriented API for building queries,
    querying a given data service, and processing the
    results
  • Client tools available to iterate query result
    sets
  • Object iterator deserializes XML into registered
    objects
  • XML iterator simply returns XML documents
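
The difference between the two iterators can be illustrated with the JDK's built-in XML parser, assuming Java 16 or later. This is a sketch, not the caGrid client API: ResultIteratorDemo, xmlIterator, objectIterator, and parseAgent are hypothetical names, and the result set is assumed to be a list of small Agent documents.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch of the two result-set iterators.
final class ResultIteratorDemo {
    record Agent(String name, String nSCNumber) {}

    // XML iterator: hands back each result document unchanged.
    static Iterator<String> xmlIterator(List<String> resultSet) {
        return resultSet.iterator();
    }

    // Object iterator: parses each document into an Agent on demand.
    static Iterator<Agent> objectIterator(List<String> resultSet) {
        Iterator<String> xml = resultSet.iterator();
        return new Iterator<>() {
            @Override
            public boolean hasNext() {
                return xml.hasNext();
            }

            @Override
            public Agent next() {
                return parseAgent(xml.next());
            }
        };
    }

    // Reads the name and nSCNumber elements from one Agent document.
    static Agent parseAgent(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String name = doc.getElementsByTagName("name").item(0).getTextContent();
            String nsc = doc.getElementsByTagName("nSCNumber").item(0).getTextContent();
            return new Agent(name, nsc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable Agent document", e);
        }
    }

    public static void main(String[] args) {
        List<String> resultSet = List.of(
                "<Agent><name>Taxol</name><nSCNumber>007</nSCNumber></Agent>");
        objectIterator(resultSet).forEachRemaining(System.out::println);
    }
}
```

The XML iterator suits clients that forward or store results as-is; the object iterator suits clients that work with the registered object model directly.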

34
Acknowledgements (caGrid Team)
  • Ohio State University - Department of BioMedical
    Informatics
  • Dave Ervin
  • Shannon Hastings
  • Tahsin Kurc
  • Stephen Langella
  • Scott Oster
  • Joel Saltz
  • Argonne National Lab / University of Chicago
  • William Allcock
  • Jarek Gawor
  • Ravi Madduri
  • Frank Siebenlist
  • Michael Wilde
  • Duke University
  • A. Jamie Cuticchia
  • Patrick McConnell
  • Georgetown University
  • Colin Freas
  • Paul A. Kennedy
  • Chad La Joie
  • SAIC (http://www.saic.com)
  • Manav Kher
  • ScenPro/Semantic Bits
  • Vinay Kumar
  • David Wellborn
  • Valerie Bragg
  • Booz Allen Hamilton (http://www.bah.com)
  • Arumani Manisundaram
  • Michael Keller
  • Reechik Chatterjee

35
Acknowledgements
NCI: Andrew von Eschenbach, Anna Barker, Wendy Patterson, OC, DCTD, DCB, DCP, DCEG, DCCPS, CCR
Industry Partners: SAIC, BAH, Oracle, ScenPro, Ekagra, Apelon, Terrapin Systems, Panther Informatics
NCICB: Ken Buetow, Peter Covitz, George Komatsoulis, Denise Warzel, Frank Hartel, Sherri De Coronado, Dianne Reeves, Gilberto Fragoso, Jill Hadfield, Leslie Derr