Title: Joel Saltz
2Joel Saltz
- Chair and Professor
- Biomedical Informatics Department
- Professor, Computer and Information Science, Pathology
- The Ohio State University
3Overview
- Grid based clinical research infrastructure
- OSU Grid Software Infrastructure
- CALGB and cardiac clinical research studies
4Grid based clinical trials support: Worldwide Scope for Clinical Research Studies
- 1000s of potential clinical research sites; different studies involve different subsets of sites
- Different sites can use different names for the same entity
- Semantic grid, SNOMED, LOINC
- Support for authentication, encryption, anonymization, role based data access
- Support for grid data aggregation
- Grid based coordination of clinical studies
- Must leverage pre-existing medical IS systems; each of these is a complex trigger based federated system
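The naming problem on this slide (different sites using different names for the same entity) can be sketched as a lookup from site-local terms to a shared vocabulary code, followed by aggregation on that code. This is only an illustration: the site names, local terms, and `SHARED:` codes below are hypothetical placeholders, not real SNOMED or LOINC codes.

```python
from collections import defaultdict

# Hypothetical mapping: (site, local term) -> shared concept code.
# In practice the shared codes would come from a vocabulary such as SNOMED or LOINC.
TERM_MAP = {
    ("site_a", "GLU"): "SHARED:glucose",
    ("site_b", "Glucose, serum"): "SHARED:glucose",
    ("site_a", "HGB"): "SHARED:hemoglobin",
    ("site_b", "Hemoglobin"): "SHARED:hemoglobin",
}


def aggregate(records):
    """Group (site, local_term, value) records by shared concept code."""
    merged = defaultdict(list)
    unmapped = []
    for site, term, value in records:
        code = TERM_MAP.get((site, term))
        if code is None:
            # No known mapping: keep the record aside for manual curation.
            unmapped.append((site, term, value))
        else:
            merged[code].append((site, value))
    return dict(merged), unmapped


records = [
    ("site_a", "GLU", 5.4),
    ("site_b", "Glucose, serum", 6.1),
    ("site_b", "Hemoglobin", 13.9),
    ("site_c", "K+", 4.2),
]
merged, unmapped = aggregate(records)
```

Records that cannot be mapped are returned separately rather than dropped, since a missing mapping is itself information a study coordinator would want to see.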
5Clinical Research Grid Types of Information
- Radiological Studies
- Pathology
- Molecular (Proteomics, gene expression)
- Genetic, Epigenetic (SNPs, haplotype analysis)
- Laboratory, pharmacy, outcome data
6Aggregation of Data in Virtual Information Warehouses
[Diagram labels: Virtual Information Warehouses, Tissue bank, Lab Data, Clinical Genomic Data, Clinical Data]
7Clinical Research Grid: More than just data aggregation
- Iteratively define clinical protocols
  - Changes arise from scientific and institutional review and from ongoing analysis of study data
- Patient accrual
  - Identify suitable patients, obtain patients' consent for study
- Execution of protocol
  - Maintain and execute rule base in order to carry out treatment and testing specified by protocol
  - Ongoing assessment of patient data determines patient treatment, what data will be obtained, which specimens will be collected, how the specimens will be processed, which tests will be carried out
- Patient safety and protocol optimization
  - Ongoing analysis of data from overall study and of data from individual patient
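The rule-driven protocol execution described on this slide can be sketched as a small rule base evaluated against the latest patient data. The rule names, thresholds, and actions below are invented for illustration; they are placeholders and do not come from any actual protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # predicate over patient data
    action: str                                     # step the protocol requests


# Hypothetical rules; thresholds are placeholders, not clinical guidance.
RULES: List[Rule] = [
    Rule("repeat_imaging",
         lambda p: p.get("tumor_size_mm", 0) > 20,
         "order follow-up imaging"),
    Rule("collect_specimen",
         lambda p: p.get("day", 0) == 14,
         "collect blood specimen"),
]


def evaluate(patient: Dict[str, float]) -> List[str]:
    """Return the actions whose rule conditions hold for this patient."""
    return [rule.action for rule in RULES if rule.condition(patient)]


actions = evaluate({"tumor_size_mm": 25, "day": 14})
no_actions = evaluate({"tumor_size_mm": 10, "day": 3})
```

Keeping rules as data makes it straightforward to revise them when the protocol changes, which the slide names as an ongoing part of a study.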
8Analysis: Prediction of patient outcome, effectiveness of treatment, relationship of genomic data to pathophysiological measurements, outcome
Drives accrual, protocol changes, choice of laboratory, imaging, genomic testing
Data streamed to Analysis
Analysis subscribes for data updates
Workflow: Execution of rule-based protocols, execution of algorithms that specify tests and treatments, coordination of patient consenting, specimen collection and analysis
Generates requests for data
Data: Diagnosis, Treatment, Laboratory, Imaging, Proteomic, Gene Expression, Gene Sequence
Data driven algorithms: patient accrual, clinical, laboratory, genomic testing
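The "Analysis subscribes for data updates" relationship on this slide can be sketched as a minimal in-process publish/subscribe hub. The `DataHub` class and the record fields are hypothetical; a grid deployment would use networked services in place of local callbacks.

```python
from collections import defaultdict


class DataHub:
    """Minimal publish/subscribe hub (hypothetical, for illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # data type -> list of callbacks

    def subscribe(self, data_type, callback):
        self._subscribers[data_type].append(callback)

    def publish(self, data_type, record):
        # Stream the new record to every subscriber of this data type.
        for callback in self._subscribers[data_type]:
            callback(record)


hub = DataHub()
received = []
hub.subscribe("laboratory", received.append)
hub.publish("laboratory", {"patient": "p1", "test": "CBC"})
hub.publish("imaging", {"patient": "p1", "study": "MRI"})  # no subscriber yet
```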
9Overview of Clinical Research Grid
- Customized access control
  - Institutions and patients decide what data to share
- Ad-hoc data warehouses
  - Each research project and consortium can maintain its own data view
- Institutional databases linked to grid
- Grid based molecular dataset and image analysis
- Images as first class objects
OSU Information Warehouse
10Clinical Research Environment at Single Site: OSU IS Infrastructure
11Components of Local Information System
- Electronic medical record
- Clinician order entry, clinical protocol specification and tracking
- Laboratory System
- Digital Radiology (PACS)
- Datagate: triggers invoked by message monitoring
- Appointments, billing
- Logistics: scheduling people and resources
- Information warehouse
- Security, single sign-on
16Infrastructure for Clinical Trials Support at OSU: Layered on OSU Information Warehouse
17CPR Decision Support: Facilitate Best Practice through POE
- Order sets designed to support evidence based clinical guidelines
- Standard templates/defaults for complex orders
18Ohio Clinical Trial Research Consortium
20Federate Emerging Databases
Infrastructure to relate, combine, and produce metadata
Deformation, Segmentation, Quantification
Slide courtesy of Arthur Toga
21Two groups have developed BIRN project partnerships so far
- Mouse BIRN: Animal Models of Disease / Multi Scale / Multi Method
  - MS Mouse and DAT KOM (a schizophrenic and otherwise interesting mouse animal model)
- Brain Morphology BIRN: Targets neuroanatomical correlates of neuropsychiatric illness (Unipolar Depression, mild Alzheimer's Disease (AD), mild cognitive impairment (MCI))
22OSU Grid Software Infrastructure
- Support for optimized query, processing, analysis of distributed datasets
- Integration of software with NSF PACI software suite (Globus, SRB, Network Weather Service)
- Collaboration with BIRN
23Software Support for Data Driven Applications
- DataCutter: Component Framework for Combined Task/Data Parallelism
  - Filtering/Program coupling Service: Distributed C++ component framework
- GridDB Lite: Large Data Query Layered on DataCutter
  - Indexing: Multilevel hierarchical indexes based on R-tree indexing method
  - Data Cluster/Decluster/Range Query
- Active Proxy G: Active Semantic Data Cache
  - Employ user semantics to cache and retrieve data
  - Store and reuse results of computations
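The range query mentioned on this slide can be sketched as a search for data chunks whose bounding boxes overlap a query box. The sketch below uses a plain linear scan and is not an R-tree; an R-tree answers the same question but prunes whole groups of chunks through nested bounding boxes. The chunk ids and coordinates are made up for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(chunks: List[Tuple[str, Box]], query: Box) -> List[str]:
    """Return ids of chunks whose bounding box overlaps the query box.

    Linear scan over all chunks; a hierarchical index would skip most of them.
    """
    return [chunk_id for chunk_id, box in chunks if intersects(box, query)]


chunks = [
    ("c0", (0, 0, 10, 10)),
    ("c1", (10, 0, 20, 10)),
    ("c2", (0, 10, 10, 20)),
    ("c3", (10, 10, 20, 20)),
]
hits = range_query(chunks, (2, 2, 8, 8))
edge_hits = range_query(chunks, (5, 5, 15, 8))
```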
24DataCutter
- Flow control between components
- Schedulers place filters on grid processors (scheduler API)
- Stream based communication
- MetaChaos: data descriptor, data mapping support for inter-component data transfers
- Data aggregation implemented as a component
- NPACKage
Download at www.datacutter.org
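The idea of filters connected by streams can be sketched with Python generators. This is a conceptual sketch only and does not use or imitate the DataCutter API: the filter names are hypothetical, and everything runs in one process, whereas the slide describes a scheduler placing filters on grid processors.

```python
def read_chunks(data, chunk_size):
    """Source filter: emit fixed-size chunks of the input."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def threshold(stream, cutoff):
    """Middle filter: keep only values above the cutoff within each chunk."""
    for chunk in stream:
        yield [value for value in chunk if value > cutoff]


def aggregate(stream):
    """Sink filter: combine per-chunk results into one (sum, count) summary."""
    total, count = 0, 0
    for chunk in stream:
        total += sum(chunk)
        count += len(chunk)
    return total, count


data = [3, 9, 1, 7, 5, 8, 2, 6]
result = aggregate(threshold(read_chunks(data, 3), cutoff=4))
```

Because each filter consumes and produces a stream, the thresholding step could run close to the data source and send only the surviving values onward, which is the data-reduction benefit later slides attribute to DataCutter.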
25Integrating DataCutter with existing Grid toolkits: SRB, Globus, NWS
- SRB integration: Subset and filter datasets
- Globus integration: DataCutter uses Globus resource discovery, resource allocation, authentication, and authorization services.
- Network Weather Service (NWS) integration: NWS is used for system monitoring.
Distributed by NPACI as NPACKage
26GridDB Lite: Select Operation on Grid Data -> Distributed Array
27Query Planning
[Diagram labels: Query, Data Source, Filter, Index, Distribution Generation Service, Distributed Program]
28Query Execution
[Diagram labels: Data Mover Service, Distribution Generation Service, Partition, Distributed Program]
29Multi-Query Optimization: Active Proxy G
- Goal: minimize the total cost of processing a series of queries by creating an optimized access plan for the entire sequence (Kang, Dietz, and Bhargava)
- Approach: minimize the total cost of processing a series of queries through data and computation reuse
- IPDPS 2002, SC 2002, ICS 2002
[Diagram: queries q1, q2, q3]
- q2: This blue slab is the same as in q1
- q3: We have seen the pieces of q3 computed for other queries in the past
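The reuse illustrated by q1, q2, and q3 can be sketched as a cache of per-block partial results: a later query that overlaps an earlier one recomputes only the blocks it has not seen. This is a simplified stand-in, not the Active Proxy G implementation; the `BlockCache` class, the block granularity, and the sum aggregate are assumptions chosen to keep the example small.

```python
class BlockCache:
    """Caches per-block aggregates so overlapping queries can reuse them."""

    def __init__(self, data, block_size):
        self.data = data
        self.block_size = block_size
        self.cache = {}    # block index -> sum of that block
        self.computed = 0  # number of blocks actually computed

    def _block_sum(self, block):
        if block not in self.cache:
            start = block * self.block_size
            self.cache[block] = sum(self.data[start:start + self.block_size])
            self.computed += 1
        return self.cache[block]

    def query(self, first_block, last_block):
        """Sum over blocks first_block..last_block (inclusive)."""
        return sum(self._block_sum(b) for b in range(first_block, last_block + 1))


data = list(range(100))              # 10 blocks of 10 values each
proxy = BlockCache(data, block_size=10)
q1 = proxy.query(0, 3)               # computes blocks 0-3
q2 = proxy.query(2, 5)               # reuses blocks 2-3, computes 4-5
q3 = proxy.query(1, 4)               # served entirely from cached blocks
```

Three queries touching twelve blocks in total trigger only six block computations here, which is the kind of saving the slide attributes to data and computation reuse.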
30Grid Based Image Analysis Toolkit
- Framework to support distributed image processing applications
- Use DataCutter, VTK, and ITK
- Provide a standard framework for describing image processing workflow and its data in order to enable creation of image processing Grid Services.
- DataCutter: Distributed workflow system used for building applications that can operate in a cluster computing environment.
- VTK: The Visualization Toolkit, used for creating visualization applications of all kinds which manipulate and view image data.
- ITK: The Insight Segmentation and Registration Toolkit, which is quickly becoming the standard toolkit for the archival and invention of image analysis algorithms.
31NPACI Telescience, BIRN and Microscopy: Support for Telescience Portal using VTK, ITK
- Goal
  - Remote access to and processing of subsets of large, distributed images.
  - Even single images can be very large (a few hundred MB to tens of GB per image for montaged digitized microscopy images).
- Support by DataCutter for
  - Basic database operations: indexing, querying, and subsetting of large images and image datasets.
  - Image processing supported by VTK (Visualization Toolkit) and ITK (Insight Segmentation and Registration Toolkit) layered on DataCutter.
  - Use of heterogeneous, distributed clusters for data processing.
- DataCutter is part of NPACKage, which also includes SRB, Globus, and Network Weather Service as an integrated suite of tools.
[Figure labels: 40,000 pixels (shown twice), Query, DataCutter, Telescience Portal]
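Subsetting a very large image usually starts by working out which stored tiles a requested region touches, so that only those tiles are read. The sketch below assumes the two "40,000 pixels" labels give the image width and height, and assumes a tile size of 1000 pixels; neither the tile size nor the function name comes from the slides.

```python
def tiles_for_region(x0, y0, x1, y1, tile=1000):
    """Return (col, row) indices of tiles overlapping the half-open
    pixel region [x0, x1) x [y0, y1)."""
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]


IMAGE_SIDE = 40_000  # pixels per side, read from the slide labels
TILE = 1000          # assumed tile size
total_tiles = (IMAGE_SIDE // TILE) ** 2
needed = tiles_for_region(1500, 2500, 3200, 3100, tile=TILE)
```

Under these assumptions the requested region needs only a handful of the image's tiles, so most of the image never has to leave the storage system.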
32Telescience Portal and DataCutter: Demo at NPACI 2003 All Hands Meeting (March 03)
[Diagram labels: Compute Cluster, Storage Systems, Globus, Storage Resource Broker (SRB), DataCutter Filter, VTK/ITK, DataCutter]
- Middleware Tools
  - DataCutter -- subsetting, filtering, and processing of data in a distributed environment.
  - Globus -- authentication, resource allocation, and remote process execution.
  - SRB -- file I/O to different storage systems.
- With DataCutter
  - Some of the processing can be done near data sources to reduce volume of data.
  - Compute intensive operations can be executed on collections of compute clusters.
33Radiology: Clinical Studies using Dynamic Contrast Imaging
- 1000s of dynamic image sets per clinical study
- Iterative investigation of image quantification, image registration and image normalization techniques
- Assess techniques' ability to correctly characterize anatomy and pathophysiology
  - Biopsy results
  - Changes in tumor structure and activity over time with treatment
- Images from many sites including NIH, Heidelberg, Oklahoma, Ohio State
- Collaboration with Michael Knopp, MD
35Large Scale DataCutter Testbed: Analysis of Oil Reservoir Simulation Data
- Evaluate geologic uncertainty and production strategies simultaneously
  - Multiple realizations of multiple geostatistical models
  - Multiple production strategies (number, location of wells)
- Dataset Size: 5TB
  - 500 simulations, selected from several geostatistics models and well patterns
  - Each simulation is 10GB
  - 2,000 time steps, 65K grid elements, 8 scalars + 3 vectors = 17 variables
- Stored at
  - SDSC: HPSS and 30TB Storage Area Network System
  - UMD: 9TB disks on 50 nodes, PIII-650, 128MB, Switched Ethernet
  - OSU: 7.2TB disks on 24 nodes, PIII-900, 512MB, Switched Ethernet
- Data Analysis
  - Economic model assessment
  - Bypassed oil regions
  - Representative Realization Selection for more simulations
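The dataset figures on this slide can be cross-checked with a short calculation. Two assumptions are made that the slide does not state: each value is stored as a 4-byte single-precision float, and each vector has 3 components (which is what makes 8 scalars and 3 vectors add up to 17 variables). "65K" is read as 65,000.

```python
SIMULATIONS = 500
TIME_STEPS = 2_000
GRID_ELEMENTS = 65_000               # "65K" on the slide, read as 65,000
SCALARS, VECTORS = 8, 3
VARIABLES = SCALARS + 3 * VECTORS    # assumes 3 components per vector
BYTES_PER_VALUE = 4                  # assumption: single-precision floats

per_simulation = TIME_STEPS * GRID_ELEMENTS * VARIABLES * BYTES_PER_VALUE
total = SIMULATIONS * per_simulation

per_simulation_gb = per_simulation / 1e9   # decimal gigabytes
total_tb = total / 1e12                    # decimal terabytes
```

Under these assumptions a simulation comes to a little under 9 GB and the full set to roughly 4.4 TB, consistent with the rounded "10GB" and "5TB" on the slide.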
36Clinical Studies
37Leukemia Correlative Science Committee (LCSC)
- Michael A. Caligiuri, MD
- Cancer and Leukemia Group B
38CALGB Leukemia Tissue Bank (80,000 Vials)
Bioinformatics
39LCS Committee Administration
- Michael A. Caligiuri, MD (Chair, 1999-)
- John Byrd, MD (Vice Chair, 1999-)
- Thomas Look, MD (Vice Chair, 2000-)
- Stephen George, PhD (Faculty Statistician)
- Jennifer Shoemaker, PhD (Faculty Statistician)
40CALGB LCSC History
- 27th year of existence, substantial impact
- Focus: molecular classification and molecular targeting
- For the last 18 years, the LCSC grant has been submitted as a separate U10 rather than as part of the Chair's grant
  - Critical for initiating and implementing a strong laboratory program in correlative science
  - Greater flexibility in managing the reference laboratories
  - Provides direct access and control of funds by the scientists
  - Allows rigorous peer review of laboratory science and guarantees a base of support for the LCSC
  - Alleviates additional administrative strain on the Central Office
41CALGB LCSC Purpose
- Pursue the biologic basis for malignant transformation in hematopoietic cells
- To identify the cytogenetic, cellular, and molecular aberrations of transformation
- To understand the relationship of these processes to clinical outcome in order to significantly increase the fraction of patients cured of leukemia
42Summary of LCSC Activity
- Eight LCSC protocols open
- Eleven Leukemia Tissue Bank studies opened
- Six LCSC Protocols completed or closed
- Eight LCSC Protocols in development
- 4,391 accruals to LCSC Protocols ( 71 ? )
- 41 Peer-reviewed manuscripts
- 43 Peer-reviewed abstracts presented
- 14 Primary peer-reviewed manuscripts submitted
43LCSC Overview of Major Accomplishments
- LCSC identified distinct groups of leukemia patients whose cytogenetic or molecular characteristics proved to be predictive of clinical outcome
- Implemented clinical trials whose treatment paradigms included stratification based on the presence or absence of markers identified as relevant to prognosis by the LCSC
44The Hemizygous FLT3 ITD Genotype Predicts Poor Prognosis in AML Patients < 60 Years of Age with Normal Cytogenetics
[Figure axis: Time (Years)]
Similar results: Thiede et al. Blood 99:4326-4335, 2002
45LCSC Research Themes and Aims
- Novel Molecular Markers in AML (Dr. Gilliland)
- To prospectively evaluate the prognostic significance of
  - the hemizygous Flt3 genotype
  - BAALC gene expression
  - predictive value of additional novel molecular markers following preliminary analyses
9621 9720 19808 10102
46LCSC Research Themes and Aims
- Molecular Profiling in Leukemia (Dr. Staudt)
- Develop a genetic and/or epigenetic expression profile or signature in AML, T-ALL, B-ALL, and CLL cases that lack specific indicators of clinical outcome, and assess its relevance to clinical outcome.
9420 9621 9720 19808 10101 10103
48What are the best predictors of outcome?
CALGB LTB Provision of Patient Material
Clinical
Correlation
Statistical Center Bioinformatics Unit
Microarrays For An Integrative Genomics
Storage QC/QA Validation Algorithms
49Pharmacogenetics-Pharmacogenomics
- OSU Program in Pharmacogenomics
- Director: W. Sadee
- Focus on genetics of complex diseases and therapy
  - Cardiovascular
  - CNS
  - Cancer
- Chemogenomics
51Treatment of Coronary Artery Disease: How to Exploit Genetic Information for Optimizing Therapy
- Atherosclerotic process components
  - Cholesterol metabolism and transport
  - Inflammation
  - Coagulation
- Drug response
  - Statins
Glen Cooke, Heart Lung Research Institute
Joel Saltz, Biomedical Informatics
Heifeng Wu, Coagulation Clinics
Bo Yuan, Genomics
Wolfgang Sadee, Audrey Papp, Julia Pinsonneault, Pharmacology
Clay Marsh, Heart Lung Research Institute
Xiaotong Shen and Dan Dougherty, Biomathematical Sciences Institute
52Overall Approach: CAD study
- 1. Large number of candidate genes
- 2. Haplotype: multiple phased sequence variants (SNPs), obtain genotype
- 3. Gene dosage
- 4. Allele-specific mRNA analysis
- 5. Associations with clinical phenotype
- 6. In vitro analysis of proteome, transcriptome (plasma, monocytes, plaque tissue)
- 7. Functional assays (e.g., monocytes)
53Cholesterol Lowering Therapy with Statins
Prediction of therapeutic efficacy
Pravastatin, simvastatin, lovastatin: inhibitors of HMG-CoA reductase
Kuivenhoven et al. - variant of the cholesteryl ester transfer protein gene in the progression of coronary atherosclerosis. NEJM 1998;338:86-93
Intronic common SNP: function?
54CETP Genomic Structure
- Taq1A/B in linkage disequilibrium with promoter polymorphisms
- I405V: frequent SNP in coding region (marker SNP for allele-specific mRNA analysis)
- Exon 9: splice variant lacking exon 9 yields dominant negative CETP
- Approach: measure specifically CETP and exon9- mRNA, each by allele-specific assays (allele-specific PCR or allele-specific ligation PCR)
55CETP Linkage Disequilibrium
56Cholesteryl ester transfer protein gene (CETP)
OSU CAD study (Glen Cooke): 950 patients and 500 controls
Possible number of haplotypes: 2^4 = 16
57Associations with Risk Factors
58Major Adverse Coronary Events
59Lipid Profiles: comparing genotypes
60Collaborators
OSU College of Medicine: Glen Cooke, Joel Saltz, Catalin Barbacioru, Dan Cowden, William Abraham
UCSF Plasma Membrane Transporter Group
Vera Rhakmanova, Ed Bilsky, Gordon L. Amidon, Patsy Babbitt, Carmine Coscia, John Weinstein, Kimberly Bussey
Acknowledgements
Sadee Lab: Pascale Anderle, Jung-eun Lim, Danxin Wang, Julie Lucas, Julia Pinsonneault, Audrey Papp, Xiaochun Sun, Ying Huang, Ying Zhang, Daniel Dougherty, Zunyan Dai
61Thanks to
- State of Ohio BRTT
- National Institutes of Health
- National Science Foundation
- OSU Comprehensive Cancer Center
- Department of Energy
- DARPA