Title: University of Iowa Holden Comprehensive Cancer Center caBIG Proposal Overview
1 University of Iowa Holden Comprehensive Cancer Center: caBIG Proposal Overview
Holden Comprehensive Cancer Center (HCCC): George Weiner, Director
Center for Bioinformatics and Computational Biology (CBCB): Thomas Casavant, Director
Coordinated Laboratory for Computational Genomics (CLCG): Terry Braun, Director
The University of Iowa, Iowa City, Iowa, USA
http://www.uihealthcare.com/depts/cancercenter/
http://genome.uiowa.edu
Meeting participants: Thomas Casavant, Terry Braun, Todd Scheetz, Andrew Williams, Ramon Lawrence, Jill Kuennen, Erin Brothers
2 Iowa's Trenches: NCI Cancer and Bioinformatics Centers, Experiences for Community Resource Development
- Study design assistance and execution
- Permit and promote data sharing through interoperability
- Resource development and integration at multiple sites
- Data curation and integration
3 Overview: Existing Projects Integral to caBIG Goals
- Transcript Annotation Prioritization and Screening System (TrAPSS)
- Clinical and Expression Database (CED)
- Integrated Expression Environment (IEE)
- Custom Sequence Annotation (CSA)
- Population Study/Disease Locus Association System (GenoMap)
4 TrAPSS in a Shared Architecture Environment
- Transcript Annotation Prioritization and Screening System (TrAPSS)
- Accelerates mutation identification
- laboratory and data automation
- quantitative method to infer pathogenicity
- established software system with user base, testing, and feedback
- future development for candidate prioritization
5 TrAPSS System Architecture
6 PrimerViewer (A TrAPSS component)
PrimerViewer is a program written to aid in the selection of oligo pairs for mutation screening. It also calculates PAR to prioritize regions of importance for genes.
[Screenshot: PrimerViewer, with panels labeled Annotation, Primers, Gene Structure (text-based and graphical views of a gene structure), and The PAR Graph (PAR graph for this gene; the two highest scoring PAR regions for this gene)]
7 Clinical and Expression Database (CED): The Challenge
- Expression Data
  - Multiple computers
  - Multiple formats
- Clinical Data
  - Paper records
  - Electronic Documents
    - e.g., pathology, histology
  - Diagnoses, prognoses, etc.
- Analysis
  - Time consuming
  - Manual operations
  - Difficult to repeat
  - Inaccessible to others
  - Difficult for collaborators
8 CED: Key Goals
- Develop extensible storage system for both clinical and expression data
- Accessible to lab members and collaborators
- Web-based Interface
- Secure data access
- Support multiple, diverse projects
- Allows user management
- Performs basic data analysis with extensive annotation
- Provides searching by clinical and expression parameters simultaneously
9 CED: Clinical and Expression Synergy
- Simultaneous analysis of data from multiple methods
  - EST libraries
  - SAGE
  - Microarray
- Analysis utilizing multiple annotations
  - Unigene
  - LocusLink
  - Gene Ontology (GO)
  - Enzyme Classification (EC)
  - Metabolic pathways
  - Genomic location
- Allow searches for genes and tissue samples based on clinical and expression data
  - List all grade 2 chondrosarcoma tissues where subject has undergone radiation therapy and p21 expression is more than 5 different than control samples
  - Find genes expressed at greater than 5-fold difference in grade 3 chondrosarcoma tissues than in normal cartilage tissues
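The second example search above (genes with a greater than 5-fold difference between grade 3 chondrosarcoma and normal cartilage) can be illustrated with a minimal sketch. This is not CED's implementation: the record layout, the function names `select_samples` and `genes_by_fold_difference`, the gene names, and all values are hypothetical, and the fold difference is assumed to be a ratio of mean expression levels taken in either direction.

```python
from statistics import mean

# Hypothetical records: each tissue sample carries clinical fields plus
# per-gene expression values. Gene names and numbers are made up.
SAMPLES = [
    {"id": "T1", "tissue": "chondrosarcoma", "grade": 3, "radiation": False,
     "expression": {"GENE_A": 60.0, "GENE_B": 10.0}},
    {"id": "T2", "tissue": "chondrosarcoma", "grade": 3, "radiation": True,
     "expression": {"GENE_A": 40.0, "GENE_B": 12.0}},
    {"id": "N1", "tissue": "normal cartilage", "grade": None, "radiation": False,
     "expression": {"GENE_A": 8.0, "GENE_B": 11.0}},
]


def select_samples(samples, **clinical):
    """Return the samples whose clinical fields equal every given keyword."""
    return [s for s in samples
            if all(s.get(key) == value for key, value in clinical.items())]


def genes_by_fold_difference(cases, controls, min_fold):
    """Map gene -> fold difference of mean expression (either direction),
    keeping only genes whose fold difference exceeds min_fold."""
    if not cases or not controls:
        return {}
    # Only compare genes measured in every sample of both groups.
    shared = set.intersection(*(set(s["expression"]) for s in cases + controls))
    hits = {}
    for gene in sorted(shared):
        case_mean = mean(s["expression"][gene] for s in cases)
        control_mean = mean(s["expression"][gene] for s in controls)
        if case_mean <= 0 or control_mean <= 0:
            continue  # a ratio is undefined without positive values
        fold = max(case_mean / control_mean, control_mean / case_mean)
        if fold > min_fold:
            hits[gene] = fold
    return hits


# Clinical criteria choose the tissues; the expression criterion filters genes.
grade3 = select_samples(SAMPLES, tissue="chondrosarcoma", grade=3)
normal = select_samples(SAMPLES, tissue="normal cartilage")
hits = genes_by_fold_difference(grade3, normal, min_fold=5.0)
print(hits)
```

The same `select_samples` call accepts further clinical keywords (for example `radiation=True`), which is how the first example search would narrow its tissue set before any expression filter is applied.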
10 IEE - Integrated Expression Environment: Infrastructure
[Diagram labels: Raw Data; Analysis Suite; File Server; Results (gene lists, graphs, etc.); link; HTTP Server; PC User; cgi; java; Web browser; GeneSpring. Two paths are labeled "For large and/or heterogeneous datasets" and "For small and/or homogeneous datasets".]
11 IEE: Web-based µ-array Design, Management, and Analysis
12 CSA: Custom Sequence Annotation
- CSA Goals
  - Online Access
  - User-customized pipeline
  - Modular components for pipeline
  - Flexible design
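As an illustration of the "user-customized pipeline" and "modular components" goals, the following minimal sketch treats each component as a function from a list of sequences to a list of sequences, so a user can choose and order the stages. It is an assumption about one possible design, not CSA's actual architecture; the stage names (`uppercase`, `trim_poly_a`, `min_length`) and the input sequences are hypothetical.

```python
from typing import Callable, List

# A stage is any function that maps a list of sequences to a list of sequences.
Stage = Callable[[List[str]], List[str]]


def uppercase(seqs: List[str]) -> List[str]:
    """Normalize case so later stages can compare bases directly."""
    return [s.upper() for s in seqs]


def trim_poly_a(seqs: List[str]) -> List[str]:
    """Strip any trailing run of 'A' characters (a crude poly-A tail trim)."""
    return [s.rstrip("A") for s in seqs]


def min_length(n: int) -> Stage:
    """Build a stage that drops sequences shorter than n bases."""
    def stage(seqs: List[str]) -> List[str]:
        return [s for s in seqs if len(s) >= n]
    return stage


def run_pipeline(seqs: List[str], stages: List[Stage]) -> List[str]:
    """Apply the user's chosen stages in order."""
    for stage in stages:
        seqs = stage(seqs)
    return seqs


# The user picks and orders the modules; swapping or removing one
# changes the pipeline without touching the others.
my_pipeline = [uppercase, trim_poly_a, min_length(4)]
result = run_pipeline(["acgtaaaa", "ggaa", "ttgcatgcaaa"], my_pipeline)
print(result)
```

Because every stage shares one signature, a new component only has to satisfy `Stage` to be usable in any position.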
13 CSA: Custom Sequence Annotation (An Illustrative Example: EST-based Gene Discovery Pipeline)
14 An Integrated Bioinformatics System Design for Expression: CED, IEE and CSA
Clinical Database (Hierarchy)
- Species, Subject, Tissue (e.g., tumor), Tissue Sample
- Demographics
- Pathology, Histology
- Cell line
- Lab protocol and material control
- mRNA, cDNA Library
Expression Database(s) (Data Sources)
- Expressed Sequence Tags (ESTs)
- Serial Analysis of Gene Expression (SAGE, MPSS)
- Micro-arrays (cDNA, Affy, Oligo, etc.)
- RT-PCR, QPCR, etc.
Analytical Tools/Database
- Statistical Analyses
  - Quality assessment
  - Significance tests
  - Sub-classification
- Algorithmic Classification
  - Pathway identification
  - Annotation search
  - Transcript analysis
Database Integration and Support for Complex Queries
15 Disease Gene Isolation: GenoMap
- Driven by large collaborations
- GenoMap goals
  - Web-based cooperating heterogeneous users
  - Portable, intuitive interface
  - Share information among multiple distributed clients
  - Avoid replication of data to prevent coherency problems
  - Provide security
- Fundamental Information Components Managed
  - pedigree information (familial relationships)
  - clinical observations of disease (phenotype)
  - sets of known informative genetic probes (genetic markers)
  - genotypes (inherited genetic patterns)
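The four information components listed above can be pictured with a small data-structure sketch. It is a hypothetical layout, not GenoMap's schema: the class names, field names, individual ids, and the marker `MK1` are all invented for illustration. Each record is stored once and referenced by id, in the spirit of the "avoid replication of data" goal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Individual:
    """Pedigree entry: parents are referenced by id, never copied."""
    id: str
    father_id: Optional[str] = None
    mother_id: Optional[str] = None


@dataclass(frozen=True)
class Marker:
    """A known informative genetic probe (hypothetical fields)."""
    name: str
    chromosome: str


@dataclass
class Study:
    individuals: Dict[str, Individual] = field(default_factory=dict)  # pedigree
    affected: Dict[str, bool] = field(default_factory=dict)           # phenotype
    markers: Dict[str, Marker] = field(default_factory=dict)          # genetic markers
    # (individual id, marker name) -> pair of alleles
    genotypes: Dict[Tuple[str, str], Tuple[str, str]] = field(default_factory=dict)

    def children_of(self, parent_id: str) -> List[str]:
        """Ids of individuals that list parent_id as father or mother."""
        return sorted(p.id for p in self.individuals.values()
                      if parent_id in (p.father_id, p.mother_id))

    def genotypes_of_affected(self, marker_name: str) -> Dict[str, Tuple[str, str]]:
        """Allele pairs at one marker, restricted to affected individuals."""
        return {ind_id: alleles
                for (ind_id, marker), alleles in self.genotypes.items()
                if marker == marker_name and self.affected.get(ind_id, False)}


study = Study()
for person in (Individual("dad"), Individual("mom"),
               Individual("kid1", "dad", "mom"), Individual("kid2", "dad", "mom")):
    study.individuals[person.id] = person
study.affected.update({"dad": True, "mom": False, "kid1": True, "kid2": False})
study.markers["MK1"] = Marker("MK1", "chr1")
study.genotypes.update({
    ("dad", "MK1"): ("1", "2"), ("mom", "MK1"): ("3", "3"),
    ("kid1", "MK1"): ("1", "3"), ("kid2", "MK1"): ("2", "3"),
})
print(study.children_of("dad"))
print(study.genotypes_of_affected("MK1"))
```

Keying genotypes by (individual, marker) keeps pedigree, phenotype, and marker data in one place each, so an update to any of them is visible to every query without synchronizing copies.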
16 UI-Holden Tools: caBIG Interoperability
[Diagram labels: caBIG Data; Clinical; GenoMap; Expression; CED; IEE; CSA; TrAPSS; Annotation; caBIG Data]
17 Iowa's Trenches: NCI Cancer and Bioinformatics Centers, Experiences for Community Resource Development
- Study design assistance and execution
- Permit and promote data sharing through interoperability
- Resource development and integration at multiple sites
- Data curation and integration
18 CBCB Infrastructure: HW/SW
- 190 Computer Systems, 224 processors, 115 GigaBytes RAM, 2.7 TeraBytes of Disk
- 5 dedicated clusters
  - 32 CPUs, 64 GB RAM, 1.0 Gigabit copper Ethernet N/W, Linux
  - 32 CPUs, 20 GB RAM, 1.0 Gigabit fiber Ethernet N/W, Linux
  - 18 CPUs, 9 GB RAM, 2.4 Gbit Multistage N/W, Linux
  - 17 CPUs, 8 GB RAM, ATM N/W, SunOS
  - 8 CPUs, 1 GB RAM, 100 Mbit and ATM N/Ws, Linux
- 9 compute, database, and file servers
- 40 Development systems
- 2000 sq. ft. lab space (100 Mbit and Gbit Ethernet, and 802.11a/b wireless system)
- Extensive installed base of software for genome, expression, and linkage study, and other analyses
19 CBCB Faculty Affiliates
- 5 Colleges
- More than 25 Departments/Programs
- Dr. William Ballard (Biological Sciences)
- Dr. Terry A. Braun (Biomedical Engineering)
- Dr. Adrian H. Elcock (Biochemistry)
- Dr. Caroline S. Harwood (Microbiology)
- Dr. Jian Huang (Statistics and Biostatistics)
- Dr. Kenneth P. Murphy (Biochemistry)
- Dr. Todd E. Scheetz (Ophthalmology)
- Dr. Deborah L. Segaloff (Biophysics)
- Dr. Alberto M. Segre (CS, Applied Math, CSG)
- Dr. Val C. Sheffield (Pediatrics, Genetics)
- Dr. Edwin M. Stone (Ophthalmology, Genetics)
- Dr. Debashish Bhattacharya (Biology, Genetics)
- Dr. Kevin Campbell (Physiology and Biophysics)
- Dr. James F. Cremer (Computer Science)
- Dr. Robin Davisson (Anatomy and Cell Biology)
- Dr. Beverly Davidson (Internal Medicine, Genetics)
- Dr. Connie Delaney (Nursing, Health Informatics)
- Dr. Robert Deschenes (Biochemistry, Genetics)
- Dr. John Donelson (Biochemistry)
- Dr. David Eichmann (Library/Information Science)
- Dr. Jan Fassler (Biological Sciences, Genetics)
- Dr. G. F. Gebhart (Pharmacology)
- Dr. E. Peter Greenberg (Microbiology)
- Dr. Ramon Lawrence (Computer Science)
- Dr. Matthew Howard (Neurosurgery)
- Dr. Khalid Kader (Biomedical Engineering)
- Dr. Robert J. Linhardt (Pharmacy, Chemical Engineering)
- Dr. Michael Mackey (Biomedical Engineering)
- Dr. Paul McCray (Pulmonary Pediatrics)
- Dr. William Nauseef (Internal Medicine)
- Dr. John P. Robinson (ECE)
- Dr. Curtis Sigmund (Physiology and Biophysics)
- Dr. M. Bento Soares (Physiology, Biochemistry, Genetics)
- Dr. Padmini Srinivasan (Library/Information Science)
- Dr. Mark Stinski (Microbiology)
- Dr. John B. Stokes (Internal Medicine)
- Dr. Jerrold Weiss (Internal Medicine and Microbiology)
- Dr. Michael J. Welsh (Internal Medicine)
20 CBCB/CLCG People
- Post-Docs/Sr. Computational Scientists
  - Dr. Todd E. Scheetz, Dr. Tom Bair, Dr. Vladimir Leontiev
- Staff
  - Erin Brothers, Hakeem Abdulkawy, Jason Grundstad, Gregg Webster, Dr. Bartley Brown, Dylan Tack, Jason Laffin, Rhett Sutphin
- Students
  - Annie Chiang, Nishank Trivedi, Jesse Walters, Brian O'Leary, LaVonne Mangin, Paul Song, Steve Davis, Jared Bischof, Brian Mokrzycki, Barry Gackle, Rani Kalari, Kevin Jenner, Chris Moressi, Mike Smith