Title: Informatics Tools for Molecular-Based Specimen Banks
1Informatics Tools for Molecular-Based Specimen
Banks
- Current Resources
- CaTIS Database
- VGSR Concept
Mark A. Watson, M.D., Ph.D.
2Informatics Limitations of Specimen Banks
- Inability to completely and accurately annotate specimens with clinical and pathological data.
- Inefficient tracking of specimens and specimen quality to multiple different research projects.
- Inability to perform real-time queries of readily available samples from secure data servers.
- Lack of biological (research) data annotation for specimens.
3SCC Tissue Procurement Core Overview
- Established 1997
- 24 studies
- Archival bank
- Institutional studies
- American College of Surgeons Oncology Group (ACOSOG)
- 21,000 specimens / 8,000 patients (sporadic cancer)
- Frozen Tissue / Paraffin Blocks / Serum
- 6,300 DNA and RNA samples
- 112 Sample distributions
- Intra- and extramural investigators
4SCC Tissue Procurement Bioinformatics
- Mark A. Watson, M.D., Ph.D.
- Dir. Siteman Cancer Center Tissue Procurement Core Facility
- Dir. ACOSOG Central Specimen Bank
- Dir. Siteman Cancer Center Multiplexed Gene Analysis Facility
- Rakesh Nagarajan, M.D., Ph.D.
- Asst. Dir. Siteman Cancer Center Bioinformatics Core Facility
- Jeff Milbrandt, M.D., Ph.D.
- Dir. Siteman Cancer Center Bioinformatics Core Facility
- Richard Wilson, Ph.D.
- Dir. Washington Univ. Genome Sequencing Center
- (5) Laboratory FTEs
- Persistent Systems (Software Development)
5SCC Tissue Procurement Core Capabilities
- Specimen Collection
- Multiple institutional sites
- Outside institutions
- Specimen Storage
- LN2, -80 °C, 4 °C
- Specimen Processing and QA
- DNA and RNA samples
- Sample Arraying
- Tissue / DNA / RNA
- Data Management (Tracking)
6A Paradigm for Specimen Data Utilization
Specimen Data
Multi-Dimensional Data Space
7CaTIS Specimen Database
- MS Access Based
- Non-networked
- Scalable
- Rapid customization
- Functionality
- Patient / Specimen / Sample Accession
- Storage / QA data (tissues and samples)
- Pathology data (manual entry)
- Distribution data
- Mapping to sample arrays / robotics
- Tracking through to outside institutions (e.g. WU-GSC)
- Mapping to experimental results
14CaTIS Specimen Database
- Specimen Data: Patient Code Number, Specimen Code Number, Specimen Info / QC, Submission Data, Path Data
- Sample Data: Specimen Code Number, Sample Code Number, Sample QC, Investigator Distribution, Experimental Results
- Patient Data: Patient Code Number, Demographics, Study / IRB
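The three data groups on this slide share code numbers: Patient Code Number appears in both Patient Data and Specimen Data, and Specimen Code Number appears in both Specimen Data and Sample Data. One way to read that is as a three-table relational layout. The sketch below is only an illustration of that reading, not the actual CaTIS schema (which is MS Access based); it uses SQLite, and every table name, column name, and the helper `samples_for_patient` are hypothetical.

```python
import sqlite3


def build_catis_sketch(conn):
    """Create three illustrative tables linked by the shared code numbers.

    Table and column names are assumptions derived from the slide labels.
    """
    conn.executescript(
        """
        CREATE TABLE patient (
            patient_code TEXT PRIMARY KEY,
            demographics TEXT,
            study_irb    TEXT
        );
        CREATE TABLE specimen (
            specimen_code    TEXT PRIMARY KEY,
            patient_code     TEXT NOT NULL REFERENCES patient(patient_code),
            specimen_info_qc TEXT,
            submission_data  TEXT,
            path_data        TEXT
        );
        CREATE TABLE sample (
            sample_code               TEXT PRIMARY KEY,
            specimen_code             TEXT NOT NULL REFERENCES specimen(specimen_code),
            sample_qc                 TEXT,
            investigator_distribution TEXT,
            experimental_results      TEXT
        );
        """
    )


def samples_for_patient(conn, patient_code):
    """Follow patient -> specimen -> sample and return the sample codes, sorted."""
    rows = conn.execute(
        """
        SELECT sa.sample_code
        FROM sample AS sa
        JOIN specimen AS sp ON sa.specimen_code = sp.specimen_code
        WHERE sp.patient_code = ?
        ORDER BY sa.sample_code
        """,
        (patient_code,),
    ).fetchall()
    return [row[0] for row in rows]
```

With tables like these, a derived DNA or RNA sample can be traced back through its parent specimen to the patient record, which is the tracking path the slide describes.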
15Aim 1: Commercialize and Distribute Key CaTIS Components
- Data security scheme
- Migrate to web based data entry / query
- Migrate to platform independence
- Open interface to other pathology data systems
- Open interface to genomics data systems (e.g. CHIPDB)
- Use of caBIG-defined common data elements
- Pathology
- Molecular
- Additional tools (e.g. pedigrees) suggested by
adopters
16Virtual Genomic Sample Repository (VGSR)
- Most specimen resources do not allow direct querying to sample detail.
- Most specimen resources do not allow querying by biological (experimental) data.
- Caucasian male / > 60 YO / T2N0 NSCLC
- T2N0 NSCLC / p53- / 50 µg DNA / U133 Array 4567
- Single specimens may be used for multiple studies at the genome / transcriptome / proteome level
- Maximum specimen utility
- Integrative systems biology
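The two example queries on this slide differ in kind: the first uses only clinical attributes, the second adds sample-level and experimental attributes (p53 status, DNA quantity, array data). A minimal sketch of a filter that accepts both kinds of criteria follows. It assumes a flat in-memory record per sample; the `SampleRecord` fields and the `query_samples` function are hypothetical names, not part of any VGSR interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical flattened view of one sample with clinical and lab annotation."""
    sample_code: str
    diagnosis: str
    stage: str
    p53_status: str            # e.g. "negative" or "positive" (assumed encoding)
    dna_ug: float              # micrograms of DNA available
    array_platforms: tuple = ()  # array types already run on this sample


def query_samples(records, diagnosis=None, stage=None, p53_status=None,
                  min_dna_ug=0.0, array_platform=None):
    """Return sample codes matching every criterion that was supplied."""
    hits = []
    for rec in records:
        if diagnosis is not None and rec.diagnosis != diagnosis:
            continue
        if stage is not None and rec.stage != stage:
            continue
        if p53_status is not None and rec.p53_status != p53_status:
            continue
        if rec.dna_ug < min_dna_ug:
            continue
        if array_platform is not None and array_platform not in rec.array_platforms:
            continue
        hits.append(rec.sample_code)
    return hits
```

A call such as `query_samples(records, diagnosis="NSCLC", stage="T2N0", p53_status="negative", min_dna_ug=50, array_platform="U133")` corresponds to the second example query; dropping the last three arguments reduces it to a purely clinical query.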
17Virtual Genomic Sample Repository (VGSR)
Goal: To develop an informatics system and
scientific culture to facilitate the sharing of
molecular biospecimens and biological data
associated with their use.
18 VGSR Conceptual Data Flow
Institutional Tissue Banks
19VGSR - Components
- Policy / Governance
- Can samples be shared (IRB / MTA)?
- Will samples be shared (ownership / IP)?
- How will samples be shared (review / prioritization)?
- Sample QA / Data standards
- Central Database Server
- Web-based Applications
- Sample Registration
- Sample Query / Request
- Sample Manager
- Data Registration
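To make the relationship between the policy questions and the web applications concrete, the sketch below gates sample registration on the "can" and "will" questions and places requests in a queue for later review. It is a toy model under stated assumptions: the class names, the boolean policy flags, and the first-come queue are all hypothetical, and a real review and prioritization process would be set by the governance policy, not by code like this.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    """Hypothetical registration record submitted by a contributing tissue bank."""
    sample_code: str
    source_bank: str
    irb_approved: bool    # "Can samples be shared?" (IRB)
    mta_in_place: bool    # "Can samples be shared?" (MTA)
    owner_consents: bool  # "Will samples be shared?" (ownership / IP)


class SampleRegistry:
    """Toy central registry: register samples, then queue requests for review."""

    def __init__(self):
        self._samples = {}
        self._requests = []

    def register(self, reg):
        # Refuse registration unless both policy gates are satisfied.
        if not (reg.irb_approved and reg.mta_in_place):
            raise ValueError("cannot share: IRB approval and MTA are required")
        if not reg.owner_consents:
            raise ValueError("will not share: owner has not agreed")
        if reg.sample_code in self._samples:
            raise ValueError("sample already registered")
        self._samples[reg.sample_code] = reg

    def request(self, sample_code, investigator):
        # "How will samples be shared?": here requests simply wait in
        # arrival order for a review step that is outside this sketch.
        if sample_code not in self._samples:
            raise KeyError(sample_code)
        self._requests.append((sample_code, investigator))
        return len(self._requests)  # position in the review queue

    def pending_requests(self):
        return list(self._requests)
```

Keeping the policy checks at registration time means that anything visible to a query is already cleared for sharing in principle, leaving only prioritization to the review step.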
20VGSR Key Features
- No other system currently meets this need
- Cooperative Group Tissue Banks
- Pathology Database Tools
- Unique opportunity to test the Virtual Specimen Bank concept
- Numerous points of interoperability with caBIG systems
- Based on caBIG Architecture Workspace recommendations
- Use of caBIG vocabularies and common data elements
- Integrated with NCICB tools (caIMAGE)
- Works with other software developed in Integrative Cancer Research Tools Workspace (e.g. Function Express, Mutation Viewer)
21CaTIS / VGSR Required Resources
- Developmental Requirements
- Consensus building / SOPs / CDEs
- Personnel
- 1 Architect (Shared with Architecture Workspace)
- 1 Project manager (50%, shared with Integrative Cancer Research Tools Workspace)
- 1 DBA (50%, shared with Integrative Cancer Research Tools Workspace)
- 2 Programmers
- Hardware
- Enterprise-class server / programmer workstations
- Software
- Rational Rose/XDE, ClearCase (CVS), ClearQuest
- JBuilder, C++ Builder, Ant
- RDBMS (e.g. Oracle, DB2, MSSQL, etc.)
22CaTIS Project Timeline
- Month 1-6
- Conversion of CaTIS to Commercialized CaTIS
- Month 7-12
- Deploy CaTIS API to extramural sites (adopters)
23VGSR Project Timeline (1-12 months)
- Month 1-8
- Consensus / Policy building
- Architecture design / CDEs and vocabularies
- Month 9-12
- Database building
- Web tools design
24CaTIS / VGSR Outcome Measures
- Utilization of CaTIS by adopter sites
- Successful registration of samples to VGSR
- Utilization of VGSR samples by new collaborative research teams for translational cancer research
- Funding opportunities
- Publications track record