For Nautilus: To OLAP or not to OLAP?

Slides: 32

Provided by: Happy96

Learn more at: http://www.hl7.org

1
REMBRANDT: Empowering Translational Research
REpository of Molecular BRAin Neoplasia DaTa
HL7 Clinical Genomics SIG, Atlanta, September 04
2
Agenda

Translational Research: Why do we care?
GMDI: How did we get here?
Conceptual Model
Gene Expression Use Case analysis
Gene Expression Data analysis
Wire Frames
System Architecture
Object Model
Data warehouse design

3
Translational Research: Why do we care?

Iressa Drug Case Study (at Harvard Medical
School)
Targeted towards lung cancer
Phase II trial: A minority of patients showed
dramatic tumor shrinkage.
Phase III randomized trial: No survival
improvement.
Patients with mutations in Iressa's target, EGFR,
showed response to the drug.
The future of pharmacogenomics is based on
translational research
Reference: "Clinical Pharmacogenomics: Almost a
reality", Modern Drug Discovery, August 2004

4
Scientific goals of GMDI

Develop a molecular classification schema that is
both clinically and biologically meaningful,
based on gene expression and genomic data from
tumors (gliomas) of patients who will be
prospectively followed through the natural history
and treatment phases of their illness

5
Rembrandt Knowledgebase
Better understanding Better treatments
6
REMBRANDT Project Goals

Produce a national molecular/genetic/clinical
database of several thousand primary brain tumors
that is fully open and accessible to all
investigators (including intramural and
extramural)
Provide informatics support to molecularly
characterize a large number of adult and
pediatric primary brain tumors and to correlate
those data with extensive retrospective and
prospective clinical data

7
Functional genomics data in the knowledge-base
RNA
Protein
DNA
100K SNP array
Tissue Arrays for ISH
ArrayCGH
Tissue Arrays (IHC)
Proteomics (Mass Spec)
Gene Expression Analysis
Copy No.
LOH
Affy/Oligo Arrays
cDNA/GenePix Arrays
Real-time RT-PCR
8
Conceptual Model
Prior_Therapy
Demographics
Survival
Outcome
Time course
C3D
Patient
Trial
Pathology
User Input
Sample
CaCore
Expr_Expt
CGH_Expt
SNP_Expt
Change_Status
Map_Location
caArray
Abnorm_Status
Gene
BAC_ID
SNP
E-value
Abnorm_Status
Call
9
REMBRANDT will Leverage NCICB and caBIG
Infrastructure Components

Aligns with caBIG principles
Open source
Open access
Syntactic and Semantic interoperability
Federated data
NCICB Infrastructure
caArray: gene expression data repositories and
analysis tools
Cancer Genome Anatomy Project (CGAP): genomic
tools
C3D: Clinical Informatics System
caCORE Infrastructure (caBIO, EVS, caDSR)
caBIG Infrastructure being delivered by caBIG
workspaces

10
Typical Rembrandt Search

Show me the tumors (tumor samples) that have
amplification and over-expression of the genes
EGFR and Cyclin D1.
Restrict the search to cases with
amplification confirmed by SNP Chip and CGH,
and over-expression confirmed by Oligo and cDNA
Arrays.
Presentation of Results
Which genes are under-expressed with respect to
normal?
Does this subset of tumors have better survival?
Do they segregate to a certain age group,
geographical area or ethnicity?
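
The search above can be read as a filter over per-sample calls. The following is a minimal sketch, not the Rembrandt implementation: the `TumorSample` record, its field names, and the example samples are all hypothetical and exist only to illustrate the "confirmed by both platforms" restriction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TumorSample:
    """Hypothetical record: which platforms reported each call, per gene."""
    sample_id: str
    amplified: Dict[str, Set[str]] = field(default_factory=dict)
    over_expressed: Dict[str, Set[str]] = field(default_factory=dict)


def confirmed(calls: Dict[str, Set[str]], gene: str, platforms: Set[str]) -> bool:
    """True if every required platform reported the call for this gene."""
    return platforms <= calls.get(gene, set())


def typical_search(samples: List[TumorSample], genes: List[str]) -> List[str]:
    """Return ids of samples where every gene is amplified (SNP and CGH)
    and over-expressed (Oligo and cDNA)."""
    copy_platforms = {"SNP", "CGH"}
    expr_platforms = {"Oligo", "cDNA"}
    return [
        s.sample_id
        for s in samples
        if all(
            confirmed(s.amplified, g, copy_platforms)
            and confirmed(s.over_expressed, g, expr_platforms)
            for g in genes
        )
    ]


# Made-up example data, for illustration only.
samples = [
    TumorSample(
        "T1",
        amplified={"EGFR": {"SNP", "CGH"}, "Cyclin D1": {"SNP", "CGH"}},
        over_expressed={"EGFR": {"Oligo", "cDNA"}, "Cyclin D1": {"Oligo", "cDNA"}},
    ),
    TumorSample(
        "T2",
        amplified={"EGFR": {"SNP"}, "Cyclin D1": {"SNP", "CGH"}},  # EGFR lacks CGH
        over_expressed={"EGFR": {"Oligo", "cDNA"}, "Cyclin D1": {"Oligo", "cDNA"}},
    ),
]
hits = typical_search(samples, ["EGFR", "Cyclin D1"])
print(hits)
```

Here only T1 passes, because T2's EGFR amplification is reported by one platform only.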

11
True Measure of Translational Research

To present all the down-regulated genes within
each sample in the result set, we have to pivot
the result set on its gene expression axis.
All translational queries should make it easy
to pivot between
Disease View
Patient / Sample View
Experiment/ Annotations View
Time Course View
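
As a rough illustration of what pivoting a result set means here, the sketch below regroups the same rows first by sample and then by gene. The row layout `(sample_id, gene, regulation)`, the gene names, and the `pivot` helper are assumptions made for this example, not part of the Rembrandt design.

```python
from collections import defaultdict

# Hypothetical flat result set: (sample_id, gene, regulation)
result_rows = [
    ("T1", "EGFR", "UP"),
    ("T1", "GENE_A", "DOWN"),
    ("T2", "GENE_A", "DOWN"),
    ("T2", "GENE_B", "DOWN"),
]


def pivot(rows, key_index, value_index, keep=lambda row: True):
    """Group the value column by the key column, keeping rows that pass `keep`."""
    view = defaultdict(list)
    for row in rows:
        if keep(row):
            view[row[key_index]].append(row[value_index])
    return dict(view)


def is_down(row):
    return row[2] == "DOWN"


# Patient / Sample view: down-regulated genes within each sample.
sample_view = pivot(result_rows, 0, 1, is_down)
# Gene expression view: samples in which each gene is down-regulated.
gene_view = pivot(result_rows, 1, 0, is_down)

print(sample_view)
print(gene_view)
```

The same rows feed both views; only the grouping axis changes.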

12
High-level Search Use cases
13
Gene Expression Search Use cases
14
Gene Expression data analysis
Binary chp files from GCOS
15
cDNA data handling
Technical Replicates
Compute the Pearson correlation between one spot
across all arrays and another spot for the same
clone across all arrays
Is the correlation > 0.7?
Yes: For each array, calculate the average of the
expression measurements
No: An "inconsistent" call is made and no e-value
is computed for that clone
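
The replicate rule on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, each clone is assumed to have exactly two replicate spots, and a spot with zero variance is treated as uncorrelated (the slide does not say how that case is handled).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat spot counts as uncorrelated
    return cov / sqrt(var_x * var_y)


def combine_replicates(spot_a, spot_b, threshold=0.7):
    """spot_a and spot_b hold one measurement per array for the same clone.

    Returns the per-array average if the replicates correlate above the
    threshold, otherwise None to signal an "inconsistent" call.
    """
    if pearson(spot_a, spot_b) > threshold:
        return [(a + b) / 2 for a, b in zip(spot_a, spot_b)]
    return None


consistent = combine_replicates([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
inconsistent = combine_replicates([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(consistent, inconsistent)
```

Returning `None` for the inconsistent case mirrors the slide's "no e-value computed" branch; a real pipeline might record an explicit status instead.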
16
UI Wire Frames
17
UI Wire Frames
18
Architecture
19
Object Model

DomainElement
Represents the basic elements involved in the
translational research space.
All queries, views and presentation objects are
composed of domain elements
Provides strong type checking and validations
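
One way to picture a domain element with type checking and validation is sketched below. Only the name `DomainElement` comes from the slide; the subclasses `GeneSymbol` and `FoldChange`, the `value_type` attribute, and the validation rules are hypothetical choices made for illustration.

```python
class DomainElement:
    """Hypothetical base class: wraps one value, type-checked and validated."""

    value_type = object  # subclasses narrow this

    def __init__(self, value):
        if not isinstance(value, self.value_type):
            raise TypeError(
                f"{type(self).__name__} expects {self.value_type.__name__}, "
                f"got {type(value).__name__}"
            )
        self.validate(value)
        self.value = value

    def validate(self, value):
        """Override to add element-specific checks."""


class GeneSymbol(DomainElement):
    value_type = str

    def validate(self, value):
        if not value.strip():
            raise ValueError("gene symbol must not be empty")


class FoldChange(DomainElement):
    value_type = float

    def validate(self, value):
        if value <= 0:
            raise ValueError("fold change must be positive")


# A query criterion composed of domain elements (illustrative only).
criterion = (GeneSymbol("EGFR"), FoldChange(2.0))
print(criterion[0].value, criterion[1].value)
```

Because every query, view, and presentation object would hold such elements rather than raw strings or numbers, an invalid value is rejected when the element is built rather than when the query runs.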

20
Database Schema

Star schema
Is a generic, query-optimized schema
A star schema consists of fact tables and
dimensions
Provides a highly de-normalized view of the data
Provides a data-neutral framework against which
queries can run quickly
Prototype usage will help us validate our approach

21
Database Schema

Fact Table
Contains key performance indicators
Helps eliminate expensive joins from queries
In the future, if multi-dimensional measures are
required, then our schema is extensible to allow
us to perform OLAP queries
Dimension
Dimensions are the categories of data analysis
When a report is requested "by" something, that
something is usually a dimension.
For example, in a gene expression query, the two
dimensions needed are genes (GENE_DM) and samples
(BIOSPECIMEN_DM)
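
To make the fact/dimension split concrete, here is a small sketch using Python's built-in `sqlite3` module. The dimension table names `GENE_DM` and `BIOSPECIMEN_DM` come from the slide; the fact table name `GENE_EXPR_FACT`, all column names, and the data are assumptions for illustration and are not the actual Rembrandt schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions: the "by" categories of a report.
    CREATE TABLE GENE_DM (gene_id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE BIOSPECIMEN_DM (
        biospecimen_id INTEGER PRIMARY KEY,
        sample_name TEXT
    );
    -- Hypothetical fact table: one measure per (gene, sample) pair.
    CREATE TABLE GENE_EXPR_FACT (
        gene_id INTEGER REFERENCES GENE_DM(gene_id),
        biospecimen_id INTEGER REFERENCES BIOSPECIMEN_DM(biospecimen_id),
        expr_ratio REAL
    );
    INSERT INTO GENE_DM VALUES (1, 'EGFR'), (2, 'GENE_A');
    INSERT INTO BIOSPECIMEN_DM VALUES (10, 'T1'), (11, 'T2');
    INSERT INTO GENE_EXPR_FACT VALUES (1, 10, 4.0), (1, 11, 0.5), (2, 10, 0.25);
    """
)

# Report "by gene": a single join from the fact table to one dimension.
avg_by_gene = conn.execute(
    """
    SELECT g.symbol, AVG(f.expr_ratio)
    FROM GENE_EXPR_FACT f
    JOIN GENE_DM g ON g.gene_id = f.gene_id
    GROUP BY g.symbol
    ORDER BY g.symbol
    """
).fetchall()
print(avg_by_gene)
conn.close()
```

Switching the report to "by sample" only changes which dimension is joined and grouped on; the fact table stays the same.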

22
Database Schema

23
Problem we are trying to solve

A typical Rembrandt data portal search
Show me all tumor samples that have amplification
of 13q11.3, deletion of 10p21, D7S522 and the
FHIT region confirmed by SNP chips and CGH
analysis.
Display regions with LOH for these samples.
Which genes are under-expressed in these tumor
samples with respect to normal?
Does this subset of tumors have better survival?
Do they segregate to a certain age group,
geographical area or ethnicity?

24
To solve this problem

Fact: Cancer develops as a result of chromosomal
aberrations
Duplications
Deletions
Somatic Mutations
We need to measure chromosomal aberrations

Chrom N, Copy 1
Chrom N, Copy 2
Complete Loss
Duplication
LOH
25
How to measure aberrations?

CGH
SNP Arrays
Have higher resolution than CGH
Analyze chromosomal copy number and genotype in
one experiment
SNP arrays help determine the following between
a normal blood sample and a tumor sample
Heterozygous to Homozygous: Loss of one allele
Heterozygous to No Call: Partial loss of one
allele/No Call
Homozygous to Homozygous: Unchanged/Loss of one
allele
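
The normal-versus-tumor comparison listed above amounts to a small lookup table. The sketch below assumes genotype calls are written as `AA`, `AB`, `BB`, or `NoCall`; that encoding and the function names are assumptions for illustration, and transitions not listed on the slide are reported as such rather than interpreted.

```python
def zygosity(call):
    """Classify a genotype call; the 'AA'/'AB'/'BB'/'NoCall' labels are assumed."""
    if call == "NoCall":
        return "No Call"
    if len(call) == 2 and call[0] != call[1]:
        return "Heterozygous"
    if len(call) == 2:
        return "Homozygous"
    raise ValueError(f"unrecognized genotype call: {call!r}")


# Interpretations exactly as listed on the slide.
TRANSITIONS = {
    ("Heterozygous", "Homozygous"): "Loss of one allele",
    ("Heterozygous", "No Call"): "Partial loss of one allele / No Call",
    ("Homozygous", "Homozygous"): "Unchanged / Loss of one allele",
}


def interpret(normal_call, tumor_call):
    """Map a (normal, tumor) genotype pair to the slide's interpretation."""
    key = (zygosity(normal_call), zygosity(tumor_call))
    return TRANSITIONS.get(key, "Not listed on the slide")


print(interpret("AB", "AA"))
print(interpret("AB", "NoCall"))
print(interpret("AA", "AA"))
```

Note that the homozygous-to-homozygous case stays ambiguous from genotype alone, which is why the slide pairs genotype with copy number from the same experiment.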

26
Genotype model for Rembrandt

Model basic science
Model SNPs in relation to chromosomal aberrations
and as markers on the genome
Model to include annotations and external
cross-references
Model Experimental observations
Capture observations such as LOH in relation to
SNPs and chromosomal aberrations (CGH data)
Capture expression value for SNP elements on
arrays to correlate with DNA copy number

27
Translational Research use case

The Clinical Genomics model should serve the
translational research use case
The model should allow for associations between
Basic science / molecular observations (gene
expression, SNP, pathway, etc.)
Clinical science data (prior therapy, outcome,
demographics, etc.)

28
Translational Research Space
29
Next Steps

Reviewing the HL7 Re-usable genotype R-MIM as a
starting point to build a clinical genomics
object model
Translating the genotype R-MIM into UML to
establish relationships and cardinalities between
various scientific observations
Extending the caBIO Object Model for REMBRANDT
Developing a data warehouse infrastructure for
REMBRANDT to define relevant translational spaces
and relationships between them
Future: We plan to merge our clinical objects
with the HL7 Clinical model

30
The Rembrandt Team!