The Cancer Biomedical Informatics Grid (caBIG), 2006 CODATA Conference, Beijing, China, Mary Jo Deering


Transcript and Presenter's Notes


1
The Cancer Biomedical Informatics Grid (caBIG)
2006 CODATA Conference, Beijing, China
Mary Jo Deering, Ph.D.
Director, Informatics Dissemination
NCI Center for Bioinformatics

2
Cancer Biomedical Informatics Grid (caBIG™)
  • Common, widely distributed infrastructure
    permits research community to focus on
    innovation
  • Shared vocabulary, data elements, data models
    facilitate information exchange
  • Collection of interoperable applications
    developed to common standards
  • Raw published cancer research data is available
    for mining and integration

3
Cancer Biomedical Informatics Grid (caBIG™)
  • caBIG infrastructure and tools are widely
    applicable outside cancer
  • caBIG components may be used by anyone

4
caBIG principles
  • Open source
  • Open access
  • Open development
  • Federated

5
caBIG's Informatics Core
6
caBIG Operational Structure
7
2006 Clinical Trial Tools Development Activities
  • caAERS
  • Patient Study Calendar
  • Lab Data Hub
  • Making other CTMS systems caBIG compatible

8
Clinical Research IT Infrastructure
[Diagram: clinical research IT infrastructure. Surviving labels: External Reporting (FDA, NCI, Sponsor, other); Clinical Systems (Labs, EMR, Tissue, etc.); Clinical Trials; Translation Service; Clinical Research Information Exchange; HL7 transactional database; HL7/CAM SDK; Lifecycle Management; Adverse Events; Participant Registry; EDC; Clinical Data Mgmt; Patient Health Record; Research Data Warehouse; De-identification Services. Connector labels: HL7 v3, Janus; HL7 v2.x, other; HL7 v3.]
9
Integrated Cancer Research
  • Microarray Repositories
  • Data Analysis Statistics
  • Informatics for Proteomics
  • Genome Annotation
  • Pathways Tools
  • Translational Tools
  • Population Sciences and Cancer Control

10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
Tissue Banks and Pathology Tools
  • caTISSUE Core (WU) - Core specimen handling and
    tracking functions
  • caTISSUE Clinical Annotation Engine (UPMC) -
    Annotation of specimens with clinical data
  • caTIES (UPMC) - Text extraction and
    de-identification of surgical pathology reports

14
caTISSUE Core: Register Specimen Group
15
caIMAGE: Cancer Images Database
  • caIMAGE allows researchers to submit and retrieve
    images and annotations.
  • Images are streamed for efficient access.
  • Researchers can search images by tissue,
    diagnosis, and experiment information.
  • Uses common terminology originating from the
    NCI Enterprise Vocabulary Services (EVS).

16
(No Transcript)
17
caBIG Compatibility
  • caBIG is all about Interoperability
  • Key is to create tools for sharing information
  • Extensible infrastructure
  • Expandable and modular software to plug into
    existing systems so current development efforts
    are not wasted
  • Ensures partnerships
  • Encourages relationships between academic,
    government and industry
  • Evolving
  • Compatibility guidelines are being translated
    into certification procedures
  • Compatibility Guidelines at
    https://cabig.nci.nih.gov/guidelines_documentation

18
Interoperability
The ability of a system to access and use the parts
or equipment of another system
Semantic interoperability
Syntactic interoperability
19
caCORE
20
Professional Documentation
21
caCORE Software Development Kit Components
  • UML Modeling Tool (any with XMI export)
  • Semantic Connector (concept binding utility)
  • UML Loader (model registration in caDSR)
  • Codegen (middleware code generator)
  • Security Adaptor (Common Security Module)
  • caCORE SDK generates a caBIG-Silver compliant
    system

22
(No Transcript)
23
Grid Technology in caBIG™
  • What is a Grid?
  • "A Grid is a system that coordinates resources
    that are not subject to centralized control using
    standard, open, general-purpose protocols and
    interfaces to deliver nontrivial qualities of
    service." - Ian Foster, Grid Today, July 20, 2002
  • Grid Technology supplies two useful components to
    a network of computers:
  • Advertising: Inform the network about the
    capabilities of new systems
  • Discovery: Allow users to find resources that
    meet their needs.
  • The caGrid project is the Grid in caBIG™: the
    actual infrastructure that data and analytical
    services will use to interoperate.
  • The current caGrid is version 0.5; caGrid 1.0 is
    expected in December.
  • The combination of data and analytical service
    nodes in caBIG™ produced a design that utilizes
    a variety of standard Grid technologies including
    the Globus Toolkit and OGSA-DAI, DQP, GRAM, etc.

24
Test bed Infrastructure
caGrid 0.5 Test Bed
25
Cancer Biomedical Informatics Grid (caBIG™)
  • caBIG infrastructure and tools are widely
    applicable outside cancer
  • caBIG components may be used by anyone

26
Contact Information
  • Mary Jo Deering, Ph.D.
  • Director for Informatics Dissemination
  • NCI Center for Bioinformatics
  • National Cancer Institute
  • National Institutes of Health, USDHHS
  • 6116 Executive Blvd. - 403
  • Rockville, MD  20852
  • (o) 301-496-3458
  • (f) 301-480-4222
  • deeringm_at_mail.nih.gov

27
Additional Background and Detail
  • The following slides were not included in the
    presentation.

28
Current caBIG community
  • NCI-designated Cancer Centers (50)
  • Academic Centers (integrated into broader
    biomedical infrastructure)
  • Stand-alone (community leaders)
  • Community outreach
  • NCI Divisions and Programs
  • National Institutes of Health
  • Other Government Agencies
  • Industry
  • International Groups
  • Standards development organizations
  • U.K.'s National Cancer Research Institute
  • 900 active participants

29
Four Domain Workspaces and two Cross Cutting
Workspaces have been launched
DOMAIN WORKSPACE 1: Clinical Trial Management
Systems
Addresses the need for consistent, open and
comprehensive tools for clinical trials
management.
DOMAIN WORKSPACE 2: Integrative Cancer Research
Provides tools and systems to enable integration
and sharing of information.
DOMAIN WORKSPACE 3: Tissue Banks and Pathology Tools
Provides for the integration, development, and
implementation of tissue and pathology tools.
DOMAIN WORKSPACE 4: Imaging
Provides for the sharing and analysis of in vivo
imaging data.
CROSS CUTTING WORKSPACE 1: Vocabularies and Common
Data Elements
Responsible for evaluating, developing, and
integrating systems for vocabulary and ontology
content, standards, and software systems for
content delivery.
CROSS CUTTING WORKSPACE 2: Architecture
Developing architectural standards and
architecture necessary for other workspaces.
30
Strategic Level Workspaces
Data Sharing and Intellectual Capital
Addresses issues related to the sharing of data,
applications and infrastructure both within the
consortium and in the larger cancer research
community.
Training
Developing strategies for providing training in
the use of the caBIG developed resources
including on-line tutorials, workshops, and
training programs.
Strategic Planning
Assists in identifying strategic priorities for
the development and evolution of the caBIG™
effort.
31
REMBRANDT: Building a robust translational
research framework for brain tumor studies
REpository of Molecular BRAin Neoplasia DaTa
http://rembrandt.nci.nih.gov
32
Rembrandt Knowledgebase
[Diagram: surviving labels are caIntegrator - DataMart; Expression array data; Clinical data; caBIG Analytic Tools; Better understanding, Better treatments.]
33
caBIG™ Compatibility Guidelines
  • The caBIG™ compatibility guidelines are designed
    to ensure that systems designed in a Federated
    environment are still interoperable on the
    caBIG™ Grid, both syntactically and semantically
  • Since achieving interoperability is a process,
    caBIG™ recognizes four levels of compatibility,
    starting from Legacy (not interoperable) through
    Bronze, Silver and Gold (fully interoperable)
  • caBIG™ compatibility is all about interfaces
    rather than the scientific content of the system
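
The four levels named above form an ordered scale, which can be sketched as an ordered enumeration. This is only an illustration: the class name, the numeric values, and the `meets` helper are assumptions made for the example and do not come from the caBIG guidelines.

```python
from enum import IntEnum


class CompatibilityLevel(IntEnum):
    """The four levels named on the slide, ordered from least to most interoperable."""
    LEGACY = 0  # not interoperable
    BRONZE = 1
    SILVER = 2
    GOLD = 3    # fully interoperable


def meets(actual: CompatibilityLevel, required: CompatibilityLevel) -> bool:
    """Return True when a system's level is at least the required level."""
    return actual >= required


# Hypothetical check: does a Silver system satisfy a Bronze requirement?
print(meets(CompatibilityLevel.SILVER, CompatibilityLevel.BRONZE))
```

Modeling the levels as integers makes "at least Silver" a simple comparison; the actual guidelines define each level by a set of interface requirements rather than by a number.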

34
caBIG Compatibility Guidelines
[Figure: only the label "SYNTACTIC" survives in the transcript.]
35
Common Data Elements
  • What do all those data classes and attributes
    actually mean, anyway?
  • Data descriptors or semantic metadata required
  • Computable, commonly structured, reusable units
    of metadata are Common Data Elements or CDEs.
  • NCI uses the ISO/IEC 11179 standard for metadata
    structure and registration
  • Semantics all drawn from Enterprise Vocabulary
    Services resources

36
Cancer Data Standards Repository (caDSR)
  • Basic caDSR unit of metadata information to
    describe a datum is a Common Data Element or CDE
  • Enterprise-class system for storing metadata,
    with APIs that give runtime access to both
    metadata and semantics
  • Implements the ISO 11179 standard, a flexible
    model for describing arbitrary metadata
  • Used to describe metadata associated with
    clinical case report forms and UML Models
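
As a rough illustration of the ISO/IEC 11179 idea behind a CDE, a data element pairs a data element concept (an object class plus a property) with a value domain. The sketch below assumes hypothetical class names, field names, and a placeholder concept code; it is not the caDSR API or its data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str    # the kind of thing described, e.g. "Specimen"
    property_name: str   # the characteristic described, e.g. "Anatomic Site"
    concept_code: str    # placeholder vocabulary identifier (hypothetical)


@dataclass(frozen=True)
class ValueDomain:
    datatype: str
    # None means the domain is not enumerated and any value is accepted.
    permissible_values: Optional[FrozenSet[str]] = None

    def accepts(self, value: str) -> bool:
        return self.permissible_values is None or value in self.permissible_values


@dataclass(frozen=True)
class CommonDataElement:
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def validate(self, value: str) -> bool:
        """Check a datum against this element's value domain."""
        return self.value_domain.accepts(value)


# Hypothetical CDE with an enumerated value domain.
site = CommonDataElement(
    name="Specimen Anatomic Site",
    concept=DataElementConcept("Specimen", "Anatomic Site", "HYPOTHETICAL-0001"),
    value_domain=ValueDomain("string", frozenset({"Brain", "Lung", "Breast"})),
)
print(site.validate("Brain"), site.validate("Kidney"))
```

Because the element carries both its meaning (the concept) and its allowed values, two systems that share the same CDE can check and interpret a datum the same way.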

37
Enterprise Vocabulary Services
  • Controlled vocabulary resources for caCORE and
    the cancer research community
  • Vocabulary Products and Services
  • NCI Thesaurus
  • NCI Metathesaurus
  • External vocabularies
  • NCI Thesaurus - controlled vocabulary source for
    metadata
  • Has excellent coverage of cancer terminology
  • Expands based on needs for additional terminology
  • Based on concepts rather than terms
  • Each concept has a unique identifier or CUI with
    definitions and synonyms
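
A vocabulary that is based on concepts rather than terms can be sketched as a lookup in which every synonym resolves to the same concept record. The classes, the identifier, and the definition text below are hypothetical placeholders for illustration, not the EVS API or real NCI Thesaurus content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    code: str                 # unique identifier (hypothetical value below)
    preferred_name: str
    definition: str
    synonyms: List[str] = field(default_factory=list)


class Thesaurus:
    """Toy in-memory vocabulary that indexes every term under its concept."""

    def __init__(self) -> None:
        self._by_term: Dict[str, Concept] = {}

    def add(self, concept: Concept) -> None:
        for term in [concept.preferred_name, *concept.synonyms]:
            self._by_term[term.lower()] = concept

    def lookup(self, term: str) -> Optional[Concept]:
        # Case-insensitive: different spellings of a term reach one concept.
        return self._by_term.get(term.lower())


thesaurus = Thesaurus()
thesaurus.add(Concept(
    code="HYPOTHETICAL-0002",
    preferred_name="Glioblastoma",
    definition="Placeholder definition for illustration only.",
    synonyms=["Glioblastoma Multiforme", "GBM"],
))
print(thesaurus.lookup("gbm").preferred_name)
```

Annotating data with the concept identifier instead of a free-text term means records labeled "GBM" and "Glioblastoma Multiforme" can be matched without string comparison.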

38
Data Standards in caBIG
  • The V/CDE workspace is responsible for
    facilitating the development and ratification of
    Data Standards for caBIG
  • Data Standards can be Vocabularies or Common Data
    Elements (CDEs) with their associated controlled
    terminology
  • A caBIG Data Standard is, in effect, a
    pre-approved mechanism for semantically
    modeling an attribute or series of attributes in
    a data object. Ideally, having a standard
    available shortens development time for other
    projects that need to present such data
  • Whenever possible, caBIG adopts standards that
    are derived from other standards bodies (HL7,
    ISO, USPS, UPU, W3C, etc.) and in general use
    within our community
  • In the last year, the V/CDE workspace has
    developed a consensus-driven mechanism for
    approving Data Standards and applied it to an
    increasing number of CDEs

39
caCORE Architecture
[Diagram: caCORE architecture with three columns labeled Clients, Middleware, and Data. Surviving labels: HTTP Clients, SOAP Clients, Perl Clients, Java Applications; Web Application Server; Interfaces (Java, SOAP, XML); API; Domain Objects (Gene, Disease, Agent, etc.); Data Access Objects; Biomedical Data; Common Data Elements; Enterprise Vocabulary; Authorization.]
40
Use cases for caGrid
  • Advertisement
  • Service Provider composes service metadata
    describing the service and publishes it to the grid.
  • Discovery
  • Researcher (or application developer) specifies
    search criteria describing a service of interest.
  • The researcher submits the discovery request to a
    discovery service, which identifies a list of
    services matching the criteria and returns the
    list.
  • Invocation
  • Researcher (or application developer)
    instantiates the grid service and accesses its
    resources.
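
The three use cases above can be sketched with a small in-memory registry. This is a toy model under stated assumptions: the `Registry` and `ServiceMetadata` classes, the service names, and the local callables standing in for remote services are all hypothetical and are not the caGrid API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ServiceMetadata:
    """Metadata a provider publishes about a service (fields are illustrative)."""
    name: str
    kind: str                     # "data" or "analytical"
    description: str
    invoke: Callable[[str], str]  # local stand-in for a remote grid call


class Registry:
    """In-memory stand-in for a grid discovery service."""

    def __init__(self) -> None:
        self._services: List[ServiceMetadata] = []

    def advertise(self, metadata: ServiceMetadata) -> None:
        # Advertisement: the provider publishes its metadata.
        self._services.append(metadata)

    def discover(self, **criteria: str) -> List[ServiceMetadata]:
        # Discovery: return every service whose metadata matches all criteria.
        return [
            s for s in self._services
            if all(getattr(s, key, None) == value for key, value in criteria.items())
        ]


registry = Registry()
registry.advertise(ServiceMetadata(
    name="example-gene-data",
    kind="data",
    description="Hypothetical gene lookup service",
    invoke=lambda query: f"gene record for {query}",
))
registry.advertise(ServiceMetadata(
    name="example-stats",
    kind="analytical",
    description="Hypothetical statistics service",
    invoke=lambda query: f"analysis of {query}",
))

# Invocation: pick a discovered service and call it.
matches = registry.discover(kind="data")
result = matches[0].invoke("TP53")
print(matches[0].name, "->", result)
```

In a real grid the registry and the services run on separate nodes and the metadata is exchanged over standard protocols; the sketch keeps everything in one process to show only the order of the steps.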

41
caGrid 0.5 Services
  • Data Services
  • caBIO: Gene-centric bioinformatics objects
    (NCICB, Rockville, MD)
  • caArray: MAGE-OM compliant microarray repository
    (NCICB, Rockville, MD; Lombardi Cancer Center,
    Georgetown, DC)
  • gridPIR: Protein Information Resource
    (Lombardi Cancer Center, Georgetown, DC)
  • caTIES: Text Information Extraction System for
    pathology reports (UPMC, Pittsburgh, PA)
  • SNP500: Polymorphism database with population
    frequencies (NCI Core Genotyping Facility,
    Gaithersburg, MD)
  • caMOD II: Cancer Model Organism Database
    (NCI Mouse Models of Human Cancer Consortium,
    MMHCC)
  • Analytical Service
  • RProteomics: Statistical analysis of proteomics
    data (Duke, Durham, NC)

42
caGrid Service-Oriented Architecture
[Diagram: caGrid service-oriented architecture stack. Surviving labels: Functions; Management; Metadata Management; ID Resolution; Schema Management; Workflow; Security; Resource Management; Service Registry; Service; Service Description; Grid Communication Protocol; Transport. Caption: OGSA Compliant - Service Oriented Architecture.]
43
Enabling Technology
  • The NCI provides freely available enabling
    technology for caBIG™ compatibility
  • These technologies are distributed under a
    non-viral open source license.
  • caCORE
  • Enterprise Vocabulary Services (EVS)
  • Cancer Data Standards Repository (caDSR)
  • caCORE Software Development Kit
  • When the complete process is followed, the
    outcome is a caBIG Silver compliant data system.

44
How can my research benefit from caBIG Tools?
  • Everything developed by the program is open
    source and freely available
  • Training is available at
    https://cabig.nci.nih.gov/training
  • The latest versions of all the software developed
    as part of the project can be obtained from the
    caBIG project gforge site
  • http://gforge.nci.nih.gov

45
caBIG Getting Involved
  • To get involved with caBIG:
  • Track caBIG activities on the NCI's caBIG
    website, https://cabig.nci.nih.gov/
  • Attend the caBIG Annual Meeting, February 5-7,
    2007, Wardman Park Marriott, Washington, DC
  • Learn about the existing bioinformatics
    infrastructure, caCORE, at
    https://ncicb.nci.nih.gov/core
  • Download currently available caBIG tools from
    the caBIG website at
    https://cabig.nci.nih.gov/inventory
  • Sign up for the caBIG mailing list at
    http://list.nih.gov/archives/cabig_announce.html
  • Please visit the main caBIG website for more
    information: https://cabig.nci.nih.gov/