Title: The Cancer Biomedical Informatics Grid (caBIG), 2006 CODATA Conference, Beijing, China, Mary Jo Deering
1. The Cancer Biomedical Informatics Grid (caBIG)
2006 CODATA Conference, Beijing, China
Mary Jo Deering, Ph.D.
Director, Informatics Dissemination
NCI Center for Bioinformatics
2. Cancer Biomedical Informatics Grid (caBIG™)
- Common, widely distributed infrastructure permits research community to focus on innovation
- Shared vocabulary, data elements, data models facilitate information exchange
- Collection of interoperable applications developed to common standards
- Raw published cancer research data is available for mining and integration
3. Cancer Biomedical Informatics Grid (caBIG™)
- caBIG infrastructure and tools are widely applicable outside cancer
- caBIG components may be used by anyone
4. caBIG principles
- Open source
- Open access
- Open development
- Federated
5. caBIG's Informatics Core
6. caBIG Operational Structure
7. 2006 Clinical Trial Tools Development Activities
- caAERS
- Patient Study Calendar
- Lab Data Hub
- Making other CTMS systems caBIG compatible
8. Clinical Research IT Infrastructure
[Diagram; recoverable labels: External Reporting; Clinical Systems; Clinical Trials; Translation Service; Lifecycle Management; Clinical Research Information Exchange; HL7 transactional database; HL7/CAM SDK; Adverse Events; FDA; Participant Registry; Sponsor; EDC; NCI; Clinical Data Mgmt; Patient Health Record; Research Data Warehouse; De-identification Services; Labs, EMR, Tissue, etc.; message formats: HL7 v3, Janus, HL7 v2.x, other]
9. Integrated Cancer Research
- Microarray Repositories
- Data Analysis and Statistics
- Informatics for Proteomics
- Genome Annotation
- Pathways Tools
- Translational Tools
- Population Sciences and Cancer Control
13. Tissue Banks and Pathology Tools
- caTISSUE Core (WU): Core specimen handling and tracking functions
- caTISSUE Clinical Annotation Engine (UPMC): Annotation of specimens with clinical data
- caTIES (UPMC): Text extraction and de-identification of surgical pathology reports
14. caTISSUE Core: Register Specimen Group
15. caIMAGE: Cancer Images Database
- caIMAGE allows researchers to submit and retrieve images and annotations.
- Images are streamed for efficient access.
- Researchers can search images based on tissue, diagnosis, and experiment information.
- Uses common terminology originating from the NCI Enterprise Vocabulary Services (EVS).
17. caBIG Compatibility
- caBIG is all about Interoperability
- Key is to create tools for sharing information
- Extensible infrastructure
- Expandable and modular software to plug into existing systems so current development efforts are not wasted
- Ensures partnerships
- Encourages relationships between academic, government and industry
- Evolving
- Compatibility guidelines are being translated into certification procedures
- Compatibility Guidelines at https://cabig.nci.nih.gov/guidelines_documentation
18. Interoperability
The ability of a system to access and use the parts or equipment of another system
- Semantic interoperability
- Syntactic interoperability
19. caCORE
20. Professional Documentation
21. caCORE Software Development Kit Components
- UML Modeling Tool (any with XMI export)
- Semantic Connector (concept binding utility)
- UML Loader (model registration in caDSR)
- Codegen (middleware code generator)
- Security Adaptor (Common Security Module)
- caCORE SDK generates a caBIG-Silver compliant
system
23. Grid Technology in caBIG™
- What is a Grid?
- "A Grid is a system that coordinates resources that are not subject to centralized control using standard, open, general-purpose protocols and interfaces to deliver nontrivial qualities of service." (Ian Foster, Grid Today, July 20, 2002)
- Grid Technology supplies two useful components to a network of computers
- Advertising: Inform the network about the capabilities of new systems
- Discovery: Allow users to find resources that meet their needs.
- The caGrid project is the Grid in caBIG™: the actual infrastructure that data and analytical services will use to interoperate.
- The current caGrid is version 0.5; caGrid 1.0 is planned for December.
- The combination of data and analytical service nodes in caBIG™ produced a design that utilizes a variety of standard Grid technologies including the Globus Toolkit and OGSA-DAI, DQP, GRAM, etc.
24. Test Bed Infrastructure
[Figure: caGrid 0.5 Test Bed]
25. Cancer Biomedical Informatics Grid (caBIG™)
- caBIG infrastructure and tools are widely applicable outside cancer
- caBIG components may be used by anyone
26. Contact Information
- Mary Jo Deering, Ph.D.
- Director for Informatics Dissemination
- NCI Center for Bioinformatics
- National Cancer Institute
- National Institutes of Health, USDHHS
- 6116 Executive Blvd. - 403
- Rockville, MD 20852
- (o) 301-496-3458
- (f) 301-480-4222
- deeringm_at_mail.nih.gov
27. Additional Background and Detail
- The following slides were not included in the presentation.
28. Current caBIG community
- NCI-designated Cancer Centers (50)
- Academic Centers (integrated into broader biomedical infrastructure)
- Stand-alone (community leaders)
- Community outreach
- NCI Divisions and Programs
- National Institutes of Health
- Other Government Agencies
- Industry
- International Groups
- Standards development organizations
- U.K.'s National Cancer Research Institute
- 900 active participants
29. Four Domain Workspaces and two Cross Cutting Workspaces have been launched
DOMAIN WORKSPACE 1: Clinical Trial Management Systems
Addresses the need for consistent, open and comprehensive tools for clinical trials management.
DOMAIN WORKSPACE 2: Integrative Cancer Research
Provides tools and systems to enable integration and sharing of information.
DOMAIN WORKSPACE 3: Tissue Banks and Pathology Tools
Provides for the integration, development, and implementation of tissue and pathology tools.
DOMAIN WORKSPACE 4: Imaging
Provides for the sharing and analysis of in vivo imaging data.
CROSS CUTTING WORKSPACE 1: Vocabularies and Common Data Elements
Responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.
CROSS CUTTING WORKSPACE 2: Architecture
Developing architectural standards and architecture necessary for other workspaces.
30. Strategic Level Workspaces
Data Sharing and Intellectual Capital
Addresses issues related to the sharing of data, applications and infrastructure both within the consortium and in the larger cancer research community.
Training
Developing strategies for providing training in the use of the caBIG developed resources including on-line tutorials, workshops, and training programs.
Strategic Planning
Assists in identifying strategic priorities for the development and evolution of the caBIG™ effort.
31. REMBRANDT: Building a robust translational research framework for brain tumor studies
REpository of Molecular BRAin Neoplasia DaTa
http://rembrandt.nci.nih.gov
32. Rembrandt Knowledgebase
[Diagram; recoverable labels: caIntegrator DataMart; Expression array data; Clinical data; caBIG Analytic Tools; "Better understanding, better treatments"]
33. caBIG™ Compatibility Guidelines
- The caBIG™ compatibility guidelines are designed to ensure that systems designed in a Federated environment are still interoperable on the caBIG™ Grid, both syntactically and semantically
- Since achieving interoperability is a process, caBIG™ recognizes four levels of compatibility, starting from Legacy (not interoperable) through Bronze, Silver and Gold (fully interoperable)
- caBIG™ compatibility is all about interfaces rather than the scientific content of the system
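The four levels named above form an ordered scale, which can be sketched as an ordered enumeration. This is only an illustration: the class and function names are hypothetical and are not part of any caBIG tool, and the real guidelines define each level through detailed criteria that are not modeled here.

```python
from enum import IntEnum


class Compatibility(IntEnum):
    """Hypothetical encoding of the four levels, ordered least to most interoperable."""
    LEGACY = 0   # not interoperable
    BRONZE = 1
    SILVER = 2
    GOLD = 3     # fully interoperable


def meets(level: Compatibility, required: Compatibility) -> bool:
    # A system satisfies a requirement if its level is at or above it.
    return level >= required


print(meets(Compatibility.SILVER, Compatibility.BRONZE))
print(meets(Compatibility.LEGACY, Compatibility.BRONZE))
```

Using an ordered type makes "at least Silver" style checks a simple comparison rather than a lookup table.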
34. caBIG Compatibility Guidelines
[Figure; recoverable label: SYNTACTIC]
35. Common Data Elements
- What do all those data classes and attributes actually mean, anyway?
- Data descriptors or semantic metadata are required
- Computable, commonly structured, reusable units of metadata are Common Data Elements or CDEs.
- NCI uses the ISO/IEC 11179 standard for metadata structure and registration
- Semantics are all drawn from Enterprise Vocabulary Services resources
36. Cancer Data Standards Repository (caDSR)
- Basic caDSR unit of metadata information to describe a datum is a Common Data Element or CDE
- Enterprise-class system for storing metadata, with APIs that give runtime access to both metadata and semantics
- Implements the ISO 11179 standard, a flexible model for describing arbitrary metadata
- Used to describe metadata associated with clinical case report forms and UML Models
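As a rough illustration of what a CDE carries, ISO/IEC 11179 describes a data element as a data element concept (an object class plus a property) paired with a value domain. The sketch below follows that shape under simplifying assumptions: the class names, the `accepts` helper, the `EXAMPLE-0001` identifier and the permissible values are all invented for illustration and are not drawn from caDSR or its API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataElementConcept:
    """What is being described: an object class plus one of its properties."""
    object_class: str
    prop: str


@dataclass
class ValueDomain:
    """How values are represented; permissible_values maps code -> meaning."""
    datatype: str
    permissible_values: Optional[Dict[str, str]] = None


@dataclass
class CommonDataElement:
    public_id: str                 # hypothetical identifier, not a real caDSR ID
    concept: DataElementConcept
    value_domain: ValueDomain

    def accepts(self, value: str) -> bool:
        # An enumerated value domain accepts only its listed codes;
        # a non-enumerated domain is not checked in this sketch.
        allowed = self.value_domain.permissible_values
        return allowed is None or value in allowed


specimen_type = CommonDataElement(
    public_id="EXAMPLE-0001",
    concept=DataElementConcept(object_class="Specimen", prop="Type"),
    value_domain=ValueDomain(
        datatype="string",
        permissible_values={"TISSUE": "Solid tissue", "FLUID": "Body fluid"},
    ),
)

print(specimen_type.accepts("TISSUE"))  # True
print(specimen_type.accepts("tissue"))  # False: codes are matched exactly
```

Separating the concept from the value domain is what lets two systems agree on meaning even when they would otherwise pick different local codes.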
37. Enterprise Vocabulary Services
- Controlled vocabulary resources for caCORE and the cancer research community
- Vocabulary Products and Services
- NCI Thesaurus
- NCI Metathesaurus
- External vocabularies
- NCI Thesaurus: controlled vocabulary source for metadata
- Has excellent coverage of cancer terminology
- Expands based on needs for additional terminology
- Based on concepts rather than terms
- Each concept has a unique identifier or CUI with definitions and synonyms
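A concept-based vocabulary can be pictured as many terms resolving to one identified concept. The following is a minimal sketch under stated assumptions: the `Concept` and `Vocabulary` classes, the `X0001` code and the definition text are placeholders made up for illustration, and the code does not call or imitate the actual EVS interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    code: str                       # unique concept identifier (made up here)
    preferred_name: str
    definition: str
    synonyms: List[str] = field(default_factory=list)


class Vocabulary:
    """Toy concept store: every term or synonym resolves to one concept."""

    def __init__(self) -> None:
        self._by_code: Dict[str, Concept] = {}
        self._by_term: Dict[str, str] = {}   # lowercased term -> concept code

    def add(self, concept: Concept) -> None:
        self._by_code[concept.code] = concept
        for term in [concept.preferred_name, *concept.synonyms]:
            self._by_term[term.lower()] = concept.code

    def find(self, term: str) -> Optional[Concept]:
        # Case-insensitive lookup by preferred name or synonym.
        code = self._by_term.get(term.lower())
        return self._by_code.get(code) if code is not None else None


vocab = Vocabulary()
vocab.add(Concept(
    code="X0001",
    preferred_name="Glioblastoma",
    definition="Placeholder definition for illustration.",
    synonyms=["GBM", "Glioblastoma Multiforme"],
))

# Different terms resolve to the same concept object.
print(vocab.find("gbm") is vocab.find("Glioblastoma"))
```

Because annotations store the concept identifier rather than a free-text term, two datasets that use different synonyms can still be matched.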
38. Data Standards in caBIG
- The V/CDE workspace is responsible for facilitating the development and ratification of Data Standards for caBIG
- Data Standards can be Vocabularies or Common Data Elements (CDEs) with their associated controlled terminology
- A caBIG Data Standard is, in effect, a pre-approved mechanism for semantically modeling an attribute or series of attributes in a data object. Ideally, having a standard available shortens development time for other projects that need to present such data
- Whenever possible, caBIG adopts standards that are derived from other standards bodies (HL7, ISO, USPS, UPU, W3C, etc.) and in general use within our community
- In the last year, the V/CDE workspace has developed a consensus-driven mechanism for approving Data Standards and applied it to an increasing number of CDEs
39. caCORE Architecture
[Diagram; layers: Clients, Middleware, Data. Other recoverable labels: HTTP Clients; SOAP Clients; Perl Clients; Java Applications; Web Application Server; Interfaces: Java, SOAP, XML; API; Domain Objects: Gene, Disease, Agent, etc.; Data Access Objects; Authorization; Biomedical Data; Common Data Elements; Enterprise Vocabulary]
40. Use cases for caGrid
- Advertisement
- Service Provider composes service metadata describing the service and publishes it to the grid.
- Discovery
- Researcher (or application developer) specifies search criteria describing a service of interest
- The researcher submits the discovery request to a discovery service, which identifies a list of services matching the criteria, and returns the list.
- Invocation
- Researcher (or application developer) instantiates the grid service and accesses its resources
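The three use cases above can be sketched as a toy in-memory registry. This is an assumption-laden illustration only: `ServiceMetadata`, `Registry` and their methods are hypothetical names, real caGrid services are built on grid middleware such as the Globus Toolkit rather than Python callables, and the metadata fields shown are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ServiceMetadata:
    """Hypothetical service description; field names are illustrative only."""
    name: str
    kind: str                                  # e.g. "data" or "analytical"
    keywords: List[str] = field(default_factory=list)


class Registry:
    """Toy stand-in for a grid discovery service."""

    def __init__(self) -> None:
        self._services: Dict[str, Tuple[ServiceMetadata, Callable[[str], str]]] = {}

    def advertise(self, meta: ServiceMetadata, endpoint: Callable[[str], str]) -> None:
        # Advertisement: the provider publishes metadata plus a callable endpoint.
        self._services[meta.name] = (meta, endpoint)

    def discover(self, kind: str, keyword: str) -> List[str]:
        # Discovery: return the names of services whose metadata match the criteria.
        return sorted(
            name
            for name, (meta, _) in self._services.items()
            if meta.kind == kind and keyword in meta.keywords
        )

    def invoke(self, name: str, query: str) -> str:
        # Invocation: look up the chosen service and call it with the request.
        _, endpoint = self._services[name]
        return endpoint(query)


registry = Registry()
registry.advertise(
    ServiceMetadata("caBIO", "data", ["gene"]),
    lambda q: f"caBIO result for {q}",
)
registry.advertise(
    ServiceMetadata("RProteomics", "analytical", ["proteomics"]),
    lambda q: f"RProteomics result for {q}",
)

matches = registry.discover(kind="data", keyword="gene")
print(matches)
print(registry.invoke(matches[0], "TP53"))
```

The point of the pattern is that the researcher never needs to know a service's location in advance; only the search criteria are required.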
41. caGrid 0.5 Services
- Data Services
- caBIO: Gene-centric bioinformatics objects
- NCICB-Rockville, MD
- caArray: MAGE-OM compliant microarray repository
- NCICB-Rockville, MD
- Lombardi Cancer Center-Georgetown, DC
- gridPIR: Protein Information Resource
- Lombardi Cancer Center-Georgetown, DC
- caTIES: Text Information Extraction System for pathology reports
- UPMC-Pittsburgh, PA
- SNP500: Polymorphism database with population frequencies
- NCI Core Genotyping Facility-Gaithersburg, MD
- caMOD II: Cancer Model Organism Database
- NCI Mouse Models of Human Cancer Consortium (MMHCC)
- Analytical Service
- RProteomics: Statistical analysis of proteomics data
- Duke-Durham, NC
42. caGrid Service-Oriented Architecture
[Diagram; recoverable labels: Functions; Management; Metadata Management; ID Resolution; Schema Management; Workflow; Security; Resource Management; Service Registry; Service; Service Description; Grid Communication Protocol; Transport. Caption: OGSA Compliant, Service Oriented Architecture]
43. Enabling Technology
- The NCI provides freely available enabling technology for caBIG™ compatibility
- These technologies are distributed under a non-viral open source license.
- caCORE
- Enterprise Vocabulary Services (EVS)
- Cancer Data Standards Repository (caDSR)
- caCORE Software Development Kit
- When the complete process is followed, the outcome is a caBIG Silver compliant data system.
44. How can my research benefit from caBIG Tools?
- Everything developed by the program is open source and freely available
- Training is available at https://cabig.nci.nih.gov/training
- The latest versions of all the software developed as part of the project can be obtained from the caBIG project gforge site
- http://gforge.nci.nih.gov
45. caBIG: Getting Involved
- To get involved with caBIG
- Track caBIG activities on the NCI's caBIG website, https://cabig.nci.nih.gov/
- Attend caBIG Annual Meeting, February 5-7, 2007, Wardman Park Marriott, Washington, DC
- Learn about the existing bioinformatics infrastructure, caCORE, at https://ncicb.nci.nih.gov/core
- Download currently available caBIG tools from the caBIG website at https://cabig.nci.nih.gov/inventory
- Sign up for the caBIG mailing list at http://list.nih.gov/archives/cabig_announce.html
- Please visit the main caBIG website for more information: https://cabig.nci.nih.gov/