For Nautilus: To OLAP or not to OLAP? - PowerPoint PPT Presentation

Slides: 32
Provided by: Happy96
Learn more at: http://www.hl7.org

Transcript and Presenter's Notes

Title: For Nautilus: To OLAP or not to OLAP?


1
REMBRANDT: Empowering Translational Research
REpository of Molecular BRAin Neoplasia DaTa
HL7 Clinical Genomics SIG, Atlanta, September 04
2
Agenda
  • Translational Research: Why do we care?
  • GMDI: How did we get here?
  • Conceptual Model
  • Gene Expression Use Case analysis
  • Gene Expression Data analysis
  • Wire Frames
  • System Architecture
  • Object Model
  • Data warehouse design

3
Translational Research: Why do we care?
  • Iressa drug case study (at Harvard Medical
    School)
      • Targeted towards lung cancer
      • Phase II trial: a minority of patients showed
        dramatic tumor shrinkage
      • Phase III randomized trial: no survival
        improvement
      • Patients with mutations in Iressa's target, EGFR,
        showed response to the drug
  • The future of pharmacogenomics is based on
    translational research
  • Reference: "Clinical Pharmacogenomics: Almost a
    Reality," Modern Drug Discovery, August 2004

4
Scientific goals of GMDI
  • Develop a molecular classification schema that is
    both clinically and biologically meaningful,
    based on gene expression and genomic data from
    tumors (Gliomas) of patients who will be
    prospectively followed through natural history
    and treatment phase of their illness

5
Rembrandt Knowledgebase
Better understanding, better treatments
6
REMBRANDT Project Goals
  • Produce a national molecular/genetic/clinical
    database of several thousand primary brain tumors
    that is fully open and accessible to all
    investigators (including intramural and
    extramural)
  • Provide informatics support to molecularly
    characterize a large number of adult and
    pediatric primary brain tumors and to correlate
    those data with extensive retrospective and
    prospective clinical data

7
Functional genomics data in the knowledge-base
[Figure: data types held in the knowledge base. Top-level labels: RNA, Protein, DNA. Other labels shown: 100K SNP array; Tissue Arrays for ISH; ArrayCGH; Tissue Arrays (IHC); Proteomics (Mass Spec); Gene Expression Analysis; Copy No.; LOH; Affy/Oligo Arrays; cDNA/GenePix Arrays; Real-time RT-PCR.]
8
Conceptual Model
[Diagram: conceptual model. Labels shown: Prior_Therapy, Demographics, Survival, Outcome, Time course, C3D, Patient, Trial, Pathology, User Input, Sample, CaCore, Expr_Expt, CGH_Expt, SNP_Expt, Change_Status, Map_Location, caArray, Abnorm_Status (appears twice), Gene, BAC_ID, SNP, E-value, Call.]
9
REMBRANDT will Leverage NCICB and caBIG Infrastructure Components
  • Aligns with caBIG principles
      • Open source
      • Open access
      • Syntactic and semantic interoperability
      • Federated data
  • NCICB infrastructure
      • caARRAY: gene expression data repositories and
        analysis tools
      • Cancer Genome Anatomy Project (CGAP): genomic
        tools
      • C3D: Clinical Informatics System
      • caCORE infrastructure (caBIO, EVS, caDSR)
  • caBIG infrastructure being delivered by caBIG
    workspaces

10
Typical Rembrandt Search
  • Show me the tumors (tumor samples) that have
    amplification and over-expression of genes EGFR
    and Cyclin D1.
  • Restrict the search to cases with
      • amplification confirmed by SNP chip and CGH,
      • and over-expression confirmed by oligo and cDNA
        arrays
  • Presentation of results
      • Which genes are under-expressed with respect to
        normal?
      • Does this subset of tumors have better survival?
      • Do they segregate to a certain age group,
        geographical area or ethnicity?
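
The "confirmed by" restriction in the search above amounts to intersecting the findings of several platforms. A minimal sketch, assuming each platform's findings are available as a set of (sample, gene) pairs. The sample IDs, the data, and the function name `confirmed_samples` are hypothetical, and `CCND1` is used as the gene symbol for Cyclin D1:

```python
# Hypothetical per-platform findings: (sample_id, gene_symbol) pairs.
snp_amplified = {("S1", "EGFR"), ("S1", "CCND1"), ("S2", "EGFR"),
                 ("S3", "EGFR"), ("S3", "CCND1")}
cgh_amplified = {("S1", "EGFR"), ("S1", "CCND1"), ("S2", "EGFR"),
                 ("S2", "CCND1"), ("S3", "EGFR")}
oligo_over = {("S1", "EGFR"), ("S1", "CCND1"), ("S2", "EGFR"),
              ("S2", "CCND1"), ("S3", "EGFR"), ("S3", "CCND1")}
cdna_over = set(oligo_over)


def confirmed_samples(genes, *platform_calls):
    """Return samples where every gene is flagged on every platform."""
    # A (sample, gene) pair is confirmed only if all platforms report it.
    confirmed = set.intersection(*platform_calls)
    samples = {sample for sample, _ in confirmed}
    return {s for s in samples if all((s, g) in confirmed for g in genes)}


hits = confirmed_samples(("EGFR", "CCND1"),
                         snp_amplified, cgh_amplified, oligo_over, cdna_over)
print(sorted(hits))
```

In this made-up data only one sample carries both genes on all four platforms; the other two each lack one confirmation and drop out.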

11
True Measure of Translational Research
  • To present all down-regulated genes within
    each sample in the result set, we have to pivot
    the result set on its gene expression axis.
  • All translational queries should allow the
    ability to easily pivot between
      • Disease view
      • Patient / Sample view
      • Experiment / Annotations view
      • Time Course view
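
As an illustration of the pivot described above, the same flat result set can be regrouped by sample or by gene. This is only a sketch: the rows, the fold-change measure, and the 0.5 cutoff for "down-regulated" are assumptions made for the example, not values from the presentation.

```python
from collections import defaultdict

# Hypothetical flat result set: (sample_id, gene_symbol, fold change vs. normal).
rows = [
    ("S1", "EGFR", 4.2),
    ("S1", "PTEN", 0.3),
    ("S2", "EGFR", 3.1),
    ("S2", "PTEN", 0.4),
    ("S2", "TP53", 0.45),
]

DOWN_CUTOFF = 0.5  # assumed threshold for calling a gene down-regulated


def pivot(result_rows, axis):
    """Group rows by sample (axis=0) or by gene (axis=1)."""
    other = 1 - axis
    view = defaultdict(dict)
    for row in result_rows:
        view[row[axis]][row[other]] = row[2]
    return dict(view)


down = [row for row in rows if row[2] < DOWN_CUTOFF]
by_sample = pivot(down, axis=0)  # Patient / Sample view
by_gene = pivot(down, axis=1)    # gene-centric view of the same rows
print(by_sample)
print(by_gene)
```

Both views are built from the same rows, which is the point of the pivot: the query runs once and only the grouping axis changes.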

12
High-level Search Use cases
13
Gene Expression Search Use cases
14
Gene Expression data analysis
Binary chp files from GCOS
15
cDNA data handling
Technical replicates
  • Compute the Pearson correlation between one spot
    across all arrays and another spot for the same
    clone across all arrays
  • Is the correlation > 0.7?
      • Yes: for each array, calculate the average of
        the expression measurements
      • No: an "inconsistent" call is made and no
        e-value is computed for that clone
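
The replicate check on this slide could be sketched roughly as follows. The 0.7 cutoff comes from the slide. The function names, the input layout (one list of measurements per spot, ordered by array), and the choice to treat a constant spot as uncorrelated are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant spot counts as uncorrelated
    return cov / sqrt(var_x * var_y)


def combine_replicates(spot_a, spot_b, cutoff=0.7):
    """Average two replicate spots per array, or return None if inconsistent.

    spot_a and spot_b hold the measurements of two spots for the same
    clone, one value per array, in the same array order.
    """
    if pearson(spot_a, spot_b) > cutoff:
        return [(a + b) / 2 for a, b in zip(spot_a, spot_b)]
    return None  # "inconsistent" call: no e-value for this clone


# Hypothetical measurements across four arrays.
consistent = combine_replicates([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.5, 4.5])
inconsistent = combine_replicates([1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 3.0, 2.0])
print(consistent, inconsistent)
```

Returning `None` for the inconsistent case keeps "no e-value" distinct from a computed value of zero.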
16
UI Wire Frames
17
UI Wire Frames
18
Architecture
19
Object Model
  • DomainElement
  • Represents the basic elements involved in
    translational research space.
  • All queries, views and presentation objects are
    composed of domain elements
  • Provides strong type checking and validations

20
Database Schema
  • Star schema
      • Is a generic, query-optimized schema
      • A star schema consists of fact tables and
        dimensions
      • Provides a highly de-normalized view of the data
      • Provides a data-neutral framework from which
        queries can be executed with very fast results
  • Prototype usage will help us validate our approach

21
Database Schema
  • Fact table
      • Contains key performance indicators
      • Helps eliminate expensive joins from queries
      • In the future, if multi-dimensional measures are
        required, our schema is extensible to allow
        us to perform OLAP queries
  • Dimension
      • Dimensions are the categories of data analysis
      • When a report is requested "by" something, that
        something is usually a dimension.
      • For example, in a gene expression query, the two
        dimensions needed are genes (GENE_DM) and samples
        (BIOSPECIMEN_DM)
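
To make the fact/dimension split concrete, here is a small sketch using an in-memory SQLite database. Only the dimension names GENE_DM and BIOSPECIMEN_DM come from the slide. The fact table name GENE_EXPR_FACT, every column name, and all data values are hypothetical, and the actual REMBRANDT schema may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (names from the slide; columns are assumed).
CREATE TABLE GENE_DM (
    gene_id     INTEGER PRIMARY KEY,
    gene_symbol TEXT
);
CREATE TABLE BIOSPECIMEN_DM (
    biospecimen_id INTEGER PRIMARY KEY,
    sample_name    TEXT
);
-- Hypothetical fact table: one measure per gene per sample.
CREATE TABLE GENE_EXPR_FACT (
    gene_id          INTEGER REFERENCES GENE_DM(gene_id),
    biospecimen_id   INTEGER REFERENCES BIOSPECIMEN_DM(biospecimen_id),
    expression_ratio REAL
);
""")
conn.executemany("INSERT INTO GENE_DM VALUES (?, ?)",
                 [(1, "EGFR"), (2, "PTEN")])
conn.executemany("INSERT INTO BIOSPECIMEN_DM VALUES (?, ?)",
                 [(10, "S1"), (11, "S2")])
conn.executemany("INSERT INTO GENE_EXPR_FACT VALUES (?, ?, ?)",
                 [(1, 10, 4.0), (1, 11, 2.0), (2, 10, 0.25), (2, 11, 0.5)])


def mean_ratio_by_gene(connection):
    """Report the mean expression ratio 'by' gene: one join, one GROUP BY."""
    cursor = connection.execute("""
        SELECT g.gene_symbol, AVG(f.expression_ratio)
        FROM GENE_EXPR_FACT AS f
        JOIN GENE_DM AS g ON g.gene_id = f.gene_id
        GROUP BY g.gene_symbol
        ORDER BY g.gene_symbol
    """)
    return cursor.fetchall()


print(mean_ratio_by_gene(conn))
```

Reporting "by sample" instead would join the same fact table to BIOSPECIMEN_DM, which is why each dimension hangs directly off the fact table in a star layout.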

22
Database Schema
     
23
Problem we are trying to solve
  • A typical Rembrandt data portal search
      • Show me all tumor samples that have amplification
        of 13q11.3, deletion of 10p21, D7S522 and the
        FHIT region confirmed by SNP chips and CGH
        analysis.
      • Display regions with LOH for these samples.
      • Which genes are under-expressed in these tumor
        samples with respect to normal?
      • Does this subset of tumors have better survival?
      • Do they segregate to a certain age group,
        geographical area or ethnicity?

24
To solve this problem
  • Fact: cancer develops as a result of chromosomal
    aberrations
      • Duplications
      • Deletions
      • Somatic mutations
  • We need to measure chromosomal aberrations

[Figure: labels shown are Chrom N, Copy 1; Chrom N, Copy 2; Complete Loss; Duplication; LOH.]
25
How to measure aberrations?
  • CGH
  • SNP arrays
      • Have higher resolution than CGH
      • Analyze chromosomal copy number and genotype in
        one experiment
      • Help determine the following changes between a
        normal blood sample and a tumor sample:
          • Heterozygous to homozygous: loss of one allele
          • Heterozygous to No Call: partial loss of one
            allele / No Call
          • Homozygous to homozygous: unchanged / loss of
            one allele
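
The three normal-to-tumor genotype transitions listed above can be read as a lookup table. The sketch below assumes genotype calls are encoded as two-letter strings such as "AA", "AB", "BB", or the string "NoCall". That encoding, the function names, and the fallback message are illustrative assumptions, while the interpretations themselves are taken from the slide.

```python
def zygosity(call):
    """Classify a genotype call such as 'AB', 'AA', or 'NoCall'."""
    if call == "NoCall":
        return "no_call"
    return "homozygous" if call[0] == call[1] else "heterozygous"


# (normal zygosity, tumor zygosity) -> interpretation from the slide.
INTERPRETATION = {
    ("heterozygous", "homozygous"): "Loss of one allele",
    ("heterozygous", "no_call"): "Partial loss of one allele / No Call",
    ("homozygous", "homozygous"): "Unchanged / Loss of one allele",
}


def interpret(normal_call, tumor_call):
    """Interpret one SNP by comparing the normal-blood and tumor calls."""
    key = (zygosity(normal_call), zygosity(tumor_call))
    return INTERPRETATION.get(key, "Not covered by the slide")


print(interpret("AB", "AA"))
print(interpret("AB", "NoCall"))
print(interpret("BB", "BB"))
```

Note that a homozygous normal sample is uninformative for LOH at that SNP, which is why the third case cannot distinguish "unchanged" from "loss of one allele".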

26
Genotype model for Rembrandt
  • Model basic science
      • Model SNPs in relation to chromosomal aberrations
        and as markers on the genome
      • Model to include annotations and external
        cross-references
  • Model experimental observations
      • Capture observations such as LOH in relation to
        SNPs and chromosomal aberrations (CGH data)
      • Capture expression value for SNP elements on
        arrays to correlate with DNA copy number

27
Translational Research use case
  • The Clinical Genomics model should serve the
    translational research use case
  • Model should allow for associations between
      • Basic science / molecular observations (gene
        expression, SNP, pathway, etc.)
      • Clinical science data (prior therapy, outcome,
        demographics, etc.)

28
Translational Research Space
29
Next Steps
  • Reviewing the HL7 Re-usable genotype R-MIM as a
    starting point to build a clinical genomics
    object model
  • Translating the genotype R-MIM into UML to
    establish relationships and cardinalities between
    various scientific observations
  • For REMBRANDT, extending the caBIO object model
  • Developing a data warehouse infrastructure for
    REMBRANDT to define relevant translational spaces
    and relationships between them
  • Future: we plan to merge our clinical objects
    with the HL7 Clinical model

30
The Rembrandt Team!
  • Internal Advisors
  • Ken Buetow
  • Peter Covitz
  • Sue Dubman
  • Mervi Heiskanen
  • Carl Schaefer
  • Christo Andonyadis
  • Scott Gustafson
  • Sharon Settnek
  • External Advisors
  • Jean-Claude Zenklusen
  • Yuri Kotliarov
  • Howard Fine
  • Tracy Lugo
  • Bob Finkelstein
  • Ram Bhattaru
  • James Luo
  • Alex Jiang
  • Prashant Shah
  • Ryan Landy
  • Kevin Rosso
  • Jyotsna Chilukuri
  • Dana Zhang
  • Nick Xiao
  • Smita Hastak
  • Himanso Sahni
  • Subha Madhavan

31
  • I am done?
  • Questions?