The Cancer Biomedical Informatics Grid - PowerPoint PPT Presentation

Title:

The Cancer Biomedical Informatics Grid

Description:

The Cancer Biomedical Informatics Grid caBIG: Overview of the Integrative Cancer Research Workspace – PowerPoint PPT presentation

Slides: 39
Provided by: cabigN

Transcript and Presenter's Notes

1
The Cancer Biomedical Informatics Grid (caBIG)
Overview of the Integrative Cancer Research Workspace
Carl Schaefer
National Cancer Institute Center for Bioinformatics
March 16, 2006
2
caBIG Introductory Seminars, March 2006
  • Topics
    • caBIG Overview: March 13
    • Overview of caBIG Activities for Clinical Trials
      and Tissue Banking: March 15
    • Overview of caBIG Activities for Integrated
      Cancer Research: March 16
    • caBIG Interoperability and Compatibility Basics:
      March 17
  • https://cabig.nci.nih.gov/seminars
  • http://videocast.nih.gov/

3
Agenda
  • Mission and goals (briefly)
  • Overview of Year 1 and Year 2 products
  • Example usage scenarios
  • Year 3

4
Cancer Biomedical Informatics Grid (caBIG™)
  • Common, widely distributed infrastructure
    permits research community to focus on
    innovation
  • Shared vocabulary, data elements, data models
    facilitate information exchange
  • Collection of interoperable applications
    developed to common standards
  • Raw published cancer research data is available
    for mining and integration

5
System Interoperability
  • Need to use common data elements (e.g. mutation
    type)
    • registered in caDSR
  • Need to use common vocabularies (e.g. missense,
    nonsense, insertion)
    • registered in EVS
  • Need to know how data elements are aggregated
    into complex objects (e.g. mutation = locus +
    mutation type + normal allele + mutated allele)
    • UML model
  • Need to use a standard query/transport protocol
    (e.g. WSDL + SOAP, or HTTP GET/POST + XML)
  • caBIG Compatibility Guidelines at
    https://cabig.nci.nih.gov/guidelines_documentation
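
As an illustration of the first three points, here is a minimal Python sketch of a "mutation" object built from four data elements, with the mutation type restricted to a controlled vocabulary. The class names, field names, and the three vocabulary terms are assumptions made for this example; in caBIG the real data elements would be registered in caDSR, the terms in EVS, and the object structure would come from a UML model.

```python
from dataclasses import dataclass
from enum import Enum


class MutationType(Enum):
    """Hypothetical controlled vocabulary (real terms would come from EVS)."""
    MISSENSE = "missense"
    NONSENSE = "nonsense"
    INSERTION = "insertion"


@dataclass(frozen=True)
class Mutation:
    """Hypothetical complex object aggregating four data elements."""
    locus: str
    mutation_type: MutationType
    normal_allele: str
    mutated_allele: str

    @classmethod
    def from_record(cls, record: dict) -> "Mutation":
        # Enum lookup by value raises ValueError for a term
        # that is not in the controlled vocabulary.
        return cls(
            locus=record["locus"],
            mutation_type=MutationType(record["mutation_type"]),
            normal_allele=record["normal_allele"],
            mutated_allele=record["mutated_allele"],
        )
```

Two systems that agree on the element names and on the vocabulary can exchange such records without per-pair translation code, and a record with an unknown mutation type is rejected at load time rather than silently accepted.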

6
Four Domain Workspaces and two Cross Cutting
Workspaces have been launched
DOMAIN WORKSPACE 1: Clinical Trial Management
Systems
Addresses the need for consistent, open and
comprehensive tools for clinical trials
management.
DOMAIN WORKSPACE 2: Integrative Cancer Research
Provides tools and systems to enable integration
and sharing of information.
DOMAIN WORKSPACE 3: Tissue Banks and Pathology Tools
Provides for the integration, development, and
implementation of tissue and pathology tools.
DOMAIN WORKSPACE 4: Imaging
Provides for the sharing and analysis of in vivo
imaging data.
CROSS CUTTING WORKSPACE 1: Vocabularies and Common
Data Elements
Responsible for evaluating, developing, and
integrating systems for vocabulary and ontology
content, standards, and software systems for
content delivery.
CROSS CUTTING WORKSPACE 2: Architecture
Developing architectural standards and
architecture necessary for other workspaces.
7
Strategic Level Workspaces
Data Sharing and Intellectual Capital
Addresses issues related to the sharing of data,
applications and infrastructure both within the
consortium and in the larger cancer research
community.
Training
Developing strategies for providing training in
the use of caBIG-developed resources,
including on-line tutorials, workshops, and training
programs.
caBIG Strategic Planning
Assists in identifying strategic priorities for
the development and evolution of the caBIG effort.
8
ICR Mission
  • Facilitate translational research by integrating
    clinical and basic research data
  • Produce informatics systems and tools that are
    • interoperable
    • modular
    • well-engineered, well-documented
    • validated

9
Participating Cancer Centers
  • Burnham Institute
  • Cold Spring Harbor
  • Columbia University-Herbert Irving
  • Dartmouth-Norris Cotton
  • Duke University
  • Fox Chase
  • Fred Hutchinson Cancer Research Center
  • Georgetown University-Lombardi
  • Massachusetts Institute of Technology
  • Memorial Sloan Kettering
  • Meyer L. Prentis-Karmanos
  • New York University
  • Northwestern University-Robert H. Lurie
  • Oregon Health and Science University
  • Thomas Jefferson University-Kimmel
  • University of California San Francisco
  • University of Chicago
  • University of Iowa-Holden
  • University of Michigan
  • University of North Carolina-Lineberger
  • University of Pennsylvania-Abramson
  • University of South Florida-H. Lee Moffitt
  • Vanderbilt University-Ingram
  • Washington University-Siteman
  • Wistar
10
End-Users
  • Informatics researchers
  • Bench researchers
  • Clinical researchers
  • Patients

11
Typical Project Tasks
  • Use case document (developer, with adopter
    approval)
  • Software requirements specification (developer)
  • Data model (developer, with VCDE WS approval),
    data elements registered in caDSR
  • Code, compatible with caBIG guidelines
    (developer, with Architecture WS approval)
  • Test Procedures (adopter)
  • Installation Guide (developer)
  • Training Plan (adopter)
  • User Guide (adopter)

12
Overview of Year 1 and Year 2 Products
13
ICR Special Interest Groups
14
ICR Projects by Domain (SIG)
15
ICR Products by Type
16
Pathways Projects
  • Reactome Data
    • Developer: CSHL
    • Adopter: MSKCC
  • Pathways Tools (cPath, Cytoscape, BioPAX)
    • Developer: MSKCC
    • Adopter: OHSU
  • QPACA
    • Developer: UCSF
    • Adopter: OHSU

17
Proteomics Tools
  • RProteomics
    • Developer: Duke
    • Adopters: Penn, OHSU
  • Proteomics LIMS
    • Developer: Fox Chase
    • Adopter: Moffitt
  • Q5
    • Developer: Dartmouth
    • Adopter: OHSU

18
Genome Annotation
  • GOMiner
    • Developer: CCR
    • Adopter: Wistar
  • TrAPSS
    • Developer: Iowa
    • Adopter: Wistar
  • HapMap Data
    • Developer: CSHL
    • Adopter: Wistar
  • Vertebrate Promoter Data
    • Developer: CSHL
    • Adopter: MSKCC
  • FunctionExpress
    • Developer: Wash U
    • Adopter: Wistar
  • Cancer Molecular Pages
    • Developer: Burnham
    • Adopter: Moffitt
  • Seed
    • Developer: U Chicago
    • Adopter: Georgetown
  • PIR
    • Developer: Georgetown
    • Adopter: Penn

19
Microarray Repositories
  • caArray
    • Developer: NCICB
    • Adopters: Georgetown, NYU, Wistar, Thomas
      Jefferson
  • NCI-60 Data
    • Developer: CCR
    • Adopter: MSKCC

20
Data Analysis and Statistical Tools
  • DWD
    • Developer: UNC
    • Adopter: Wistar
  • VISDA
    • Developer: Georgetown/VA Tech
    • Adopter: Wistar
  • Magellan
    • Developer: UCSF
    • Adopter: Penn
  • GenePattern
    • Developer: MIT
    • Adopter: NYU
  • caWorkbench
    • Developer: Columbia
    • Adopter: Northwestern

21
CDEs and Vocabularies
22
Example Scenarios
23
Example Scenarios
  • Annotate lists of genes, proteins
  • Search for and retrieve array-based data
  • Display expression data on pathway networks
  • Integrate biologically heterogeneous data
  • Aggregate data from heterogeneous platforms
  • Build tumor/normal mass-spec classifier
  • Custom analysis and visualization

24
Annotate List of Genes and Proteins
  • Example: get physical and functional properties
    and homologies for 1500 proteins detected in
    a serum sample
  • Using caBIG standard APIs, query
    • Cancer Molecular Pages (Burnham)
    • PIR/UniProt (Georgetown), now on the grid
    • SEED (U. Chicago)
  • Retrieve data: molecular weight, functional
    domains, modified residues, homologies, etc.
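
The pattern in this scenario (send the same list of identifiers to several annotation services, then merge the answers per protein) can be sketched as follows. This is only an illustration: the `annotate` function, the two stub sources, and the field names are hypothetical, the stubs return placeholder values rather than real data, and no actual caBIG API is being called.

```python
from typing import Callable, Dict, List

# A source takes protein identifiers and returns {identifier: {field: value}}.
Source = Callable[[List[str]], Dict[str, Dict[str, object]]]


def annotate(ids: List[str], sources: Dict[str, Source]) -> Dict[str, Dict[str, object]]:
    """Query every source once and merge the answers per protein."""
    merged: Dict[str, Dict[str, object]] = {pid: {} for pid in ids}
    for name, query in sources.items():
        for pid, fields in query(ids).items():
            if pid not in merged:
                continue  # skip records that were not requested
            for field, value in fields.items():
                # Prefix each field with its source so provenance is kept.
                merged[pid][f"{name}.{field}"] = value
    return merged


# Hypothetical stand-ins for remote services; values are placeholders, not real data.
def fake_weights(ids: List[str]) -> Dict[str, Dict[str, object]]:
    return {pid: {"molecular_weight": None} for pid in ids}


def fake_domains(ids: List[str]) -> Dict[str, Dict[str, object]]:
    return {pid: {"functional_domains": []} for pid in ids}


result = annotate(["P1", "P2"], {"source_a": fake_weights, "source_b": fake_domains})
```

Keeping the source name in each merged field makes it easy to see which repository supplied which property when two sources disagree.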

25
Search for/Retrieve Array-Based Data
  • Example: find copy number alteration data and
    gene expression data for cases of invasive ductal
    carcinoma
  • caArray (NCICB and Cancer Center adopters)
  • MAGE-compliant repository for microarray data
    • international standard for array data
  • Oligo arrays, spotted arrays, array CGH
  • Raw data and (in the future) analyzed data
  • Data in via web-based data annotation forms
    • MIAME 1.1 level annotations using controlled
      terminology from MGED ontology
  • Data out via low-level MAGE-OM API and
    higher-level services API
  • Now on the grid

26
Display Expression Data on Pathways
  • Example: highlight functional roles of genes
    overexpressed in glioblastoma multiforme samples
    (compared with normal)
  • Query caArray repositories for availability of
    samples; retrieve data in MAGE-ML format.
  • Query cPath and Reactome for network data in
    BioPAX format
    • cPath: protein/protein interaction data (MSKCC)
    • Reactome: curated pathways (CSHL)
  • Using Cytoscape, superimpose expression data on a
    network, with gene expression values displayed
    along a color gradient
    • Cytoscape plugins for cPath, BioPAX, MAGE-ML
      (MSKCC)
  • Use QPACA (UCSF) to assess match between
    expression data and pathway membership
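
To make "values displayed along a color gradient" concrete, here is a small Python sketch that maps a log expression ratio to a blue-white-red color. It is not Cytoscape's implementation or API; the function name, the choice of palette, and the default saturation limit of 2.0 are assumptions for illustration.

```python
def expression_to_hex(log_ratio: float, limit: float = 2.0) -> str:
    """Map a log expression ratio to a blue-white-red hex color.

    Ratios at or below -limit are pure blue, 0 is white,
    and ratios at or above +limit are pure red.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Scale to [-1, 1] and clamp so outliers saturate instead of overflowing.
    t = max(-1.0, min(1.0, log_ratio / limit))
    fade = round(255 * (1 - abs(t)))
    if t >= 0:
        r, g, b = 255, fade, fade   # over-expressed: white toward red
    else:
        r, g, b = fade, fade, 255   # under-expressed: white toward blue
    return f"#{r:02x}{g:02x}{b:02x}"
```

A network viewer would call a function like this once per node, using the expression value retrieved for the gene that the node represents.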

27
Integrate Heterogeneous Data
  • Example: if we select genes whose mRNA expression
    correlates with an outcome, do copy number
    changes of loci that map close to those genes
    also correlate?
  • Magellan (UCSF)
    • Allows use of biological annotation information
      to reduce false positives from multiple
      comparisons in high-throughput data
      • qualitative descriptions of biological variables
      • quantitative results of computations
    • Allows user-defined data types (stored as
      entity-value pairs)
    • Interoperable with caArray (for mRNA expression,
      CGH) and caBIO (for genomic location).
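
The question posed in the example can be sketched in a few lines of Python. This is a toy illustration, not Magellan's method: it assumes copy number values have already been mapped to genes by genomic location, uses a plain Pearson correlation, and the function names and the 0.5 threshold are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def concordant_genes(
    expression: Dict[str, List[float]],
    copy_number: Dict[str, List[float]],
    outcome: List[float],
    threshold: float = 0.5,
) -> List[str]:
    """Genes whose expression AND copy number both correlate with outcome."""
    hits = []
    for gene, expr in expression.items():
        if gene not in copy_number:
            continue  # no copy number measurement mapped to this gene
        r_expr = pearson(expr, outcome)
        r_cn = pearson(copy_number[gene], outcome)
        if abs(r_expr) >= threshold and abs(r_cn) >= threshold:
            hits.append(gene)
    return hits
```

With thousands of genes, a fixed threshold like this produces many false positives from multiple comparisons, which is the problem the annotation-based filtering described on this slide is meant to reduce.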

28
Build Proteomics Classifier
  • Example: given peptide mass spectra from serum
    samples (100 cases of non-small-cell carcinoma,
    100 controls), infer a diagnostic profile
  • Retrieve data from future proteomics repository
    in mzXML format.
  • Use RProteomics (Duke) to de-noise, remove
    background, align peaks
    • Reads mzXML
    • Analysis routines in R with Java wrappers
    • Now on the grid
  • Use Q5 (Dartmouth) to build the classifier.
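
For readers unfamiliar with the preprocessing steps named here, the following Python sketch shows very simple versions of de-noising, background removal, and peak picking on a list of intensities. These are generic textbook stand-ins (moving average, minimum subtraction, local maxima) chosen for this example; they are not the algorithms used by RProteomics or Q5, and peak alignment across spectra is not shown.

```python
from typing import List


def moving_average(signal: List[float], window: int = 3) -> List[float]:
    """De-noise by averaging each point with its neighbours (odd window)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def subtract_baseline(signal: List[float]) -> List[float]:
    """Remove a constant background, estimated as the minimum intensity."""
    if not signal:
        return []
    base = min(signal)
    return [value - base for value in signal]


def local_maxima(signal: List[float], min_height: float = 0.0) -> List[int]:
    """Indices of interior points higher than the left neighbour and not lower than the right."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
        and signal[i] >= min_height
    ]
```

The peak positions and heights that come out of such a pipeline are the features a classifier would then be trained on, with cases and controls as the two classes.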

29
Aggregate Data from Heterogeneous Platforms
  • Example: infer differential expression patterns
    for subtypes of breast cancer where available
    data was generated on multiple array platforms by
    multiple institutions
  • FunctionExpress (WashU) to correlate probes on
    different platforms
  • Distance-Weighted Discrimination (DWD; UNC Lineberger)
    • Tools for combining comparable but distinct types
      of microarray data sets, with the goal of
      improving statistical power
    • Cross-platform analysis of oligo arrays
      (Affymetrix) and cDNA spotted arrays
    • In tests of DWD, institutional and chip
      clustering disappears while a clear clustering by
      cancer type emerges
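
The idea of removing a platform or institution effect before pooling data can be illustrated with a far simpler technique than DWD: subtracting each batch's mean from its own measurements. The sketch below is that simple per-batch mean-centering, explicitly not an implementation of Distance-Weighted Discrimination; the function name and the batch labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def center_by_batch(values: List[float], batches: List[str]) -> List[float]:
    """Subtract each batch's mean from the values measured in that batch."""
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for value, batch in zip(values, batches):
        totals[batch] += value
        counts[batch] += 1
    return [value - totals[batch] / counts[batch] for value, batch in zip(values, batches)]


# One gene measured on two platforms with different overall intensity scales.
centered = center_by_batch([1.0, 3.0, 10.0, 14.0], ["oligo", "oligo", "cdna", "cdna"])
```

After centering, samples no longer separate by platform offset alone, which is the kind of effect the DWD tests on this slide describe with a more sophisticated adjustment.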

30
Custom Analysis and Visualization
  • Example: jointly analyze microarray expression
    profiles, sequences, motifs, and transcription
    factors to identify candidate upstream regulators
    of a particular transcription factor
  • caWorkBench (Columbia)
    • Customizable, configurable graphical user
      interface
    • Visualization and analytical components can be
      plugged in
      • interoperable based on published interfaces
    • Scripting support: caScript
      • Java-like programmatic access to components from
        GUI

31
Year 3
  • More focused
    • Example: informatics for translational research
  • Other new projects
    • BioConductor
    • GeneConnect
    • DAS2 plugin for caCORE
    • possibly a new datatype for caArray
  • Continuing work
    • RProteomics
    • Grid enablement of caWorkBench and GenePattern
    • Proteomics LIMS

32
caBIG/ICR and Other NCI Initiatives
  • caIntegrator Clinical Genomics Object Model
    (CGOM)
  • SNP500Cancer
  • Mouse Proteomics Biomarker Discovery Initiative
  • CGEMS
  • TCGA

33
List of Tools
  • caBIG Program Update, March 2006
  • This issue spotlights caBIG products currently
    available and pending release in 2006-2007, and
    highlights the release of caGrid Software Version
    0.5
  • cabig.nci.nih.gov/Program_Updates/cabig_March_2006_Program_Update.pdf

34
How can my research benefit from caBIG Tools?
  • Everything developed by the program is open
    source and freely available
  • Training is available at
    https://cabig.nci.nih.gov/training
  • The latest versions of all the software developed
    as part of the project can be obtained from the
    caBIG CVS site
    • http://cabigcvs.nci.nih.gov/viewcvs/viewcvs.cgi/
  • Commercial-grade documentation is provided as
    part of the project and will be located at the
    project gforge site
    • http://gforge.nci.nih.gov

35
How can I get support for these tools?
  • NCICB Applications Support will coordinate
    support for caBIG tools
  • Live support: Monday-Friday, 8 am-8 pm Eastern
    Time
  • Telephone support is available Monday to Friday,
    8 am to 8 pm Eastern Time, excluding government
    holidays.
  • You may leave a message, send an email or submit
    a support request via the Web at any time.
  • Email: ncicb@pop.nci.nih.gov
  • Phone: 301-451-4384
  • Toll-free: 888-478-4423
  • Web: http://ncicbsupport.nci.nih.gov

36
caBIG: Getting Involved
  • To get involved with caBIG
    • Track caBIG activities on the NCI's caBIG
      website, https://cabig.nci.nih.gov/
    • Attend caBIG Annual Meeting, April 9-11, 2006,
      Hyatt Regency Crystal City, Arlington, Virginia
    • Learn about the existing bioinformatics
      infrastructure, caCORE, at
      https://ncicb.nci.nih.gov/core
    • Download currently available caBIG tools from
      the caBIG website at
      https://cabig.nci.nih.gov/inventory
    • Sign up for the caBIG mailing list at
      http://list.nih.gov/archives/cabig_announce.html
  • Please visit the main caBIG website for more
    information: https://cabig.nci.nih.gov/

37
Save the Date!
  • The caBIG™ 2006 Annual Meeting
  • April 9-11, 2006
  • Hyatt Regency Crystal City, Arlington, Virginia
  • Plenary sessions, 35 breakout sessions, dozens
    of demonstrations and posters, exhibits,
    hackathon
  • Tailored sessions for newcomers April 9 and
    throughout the conference
  • https://cabig.nci.nih.gov/2006_Annual_Meeting

38
Contact Information
  • Carl Schaefer, Ph.D.
  • Director for Biomedical Informatics
  • NCI Center for Bioinformatics
  • National Cancer Institute
  • 6116 Executive Blvd., Suite 403
  • Rockville, MD  20852
  • Tel: 301-435-1535
  • schaefec@mail.nih.gov