Title: BioRDF Breakout
1BioRDF Breakout
- Introduction Kei Cheung
- MAGE-TAB Michael Miller
- vOID Jun Zhao (remote)
- aTag Matthias Samwald (remote)
- Discussion All
2BioRDF Breakout Microarray Use Case
- Kei Cheung, Ph.D.
- Associate Professor
- Yale Center for Medical Informatics
HCLS IG Face-to-Face Meeting, Santa Clara,
California, November 2-3, 2009
3Introduction
- Whole-genome expression profiling has created a revolution in the way we study disease and basic biology.
- DNA microarrays allow scientists to quantify thousands of genomic features in a single experiment.
- Since 1997, the number of published results based on an analysis of gene expression microarray data has grown from 30 to over 5,000 publications per year.
- Major public microarray data repositories have been created in different countries (e.g., NCBI GEO, EBI ArrayExpress, and CIBEX).
4Microarray Workflow
5An Example of differentially expressed genes
6Importance of Integrating Microarray Data
- Because many microarray experiments are costly and poorly reproducible, each study typically includes only a limited number of patient samples.
- Different studies involving patients with the same disease identify very few marker genes in common.
- Merging data sets from multiple studies is both of great interest and a challenge: it increases the sample size, which may in turn increase the power of statistical inferences.
- Integrating external information resources is essential for interpreting intrinsic patterns and relationships in large-scale gene expression data.
7Microarray Data Standards
- MGED
- MIAME
- MAGE-ML
- MAGE-TAB
8Some Examples
- Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes (Jiang et al. 2004 BMC Bioinformatics)
- Large-scale integration of cancer microarray data identifies a robust common cancer signature (Xu et al. 2007 BMC Bioinformatics)
- What about neurosciences?
9Access to and Use of Microarray data in
Neuroscience
- NIH Neuroscience Microarray Consortium
- Public repositories such as GEO and ArrayExpress (including data generated from neuroscience microarray experiments)
- Brain atlases (e.g., Allen Brain Atlas and GenSAT)
10Ontology-Based Integration
Microarray experiment 1
Microarray experiment 2
11Example Federated Queries
- Retrieve a list of differentially expressed genes between different brain regions (e.g., hippocampus and entorhinal cortex) for normally aged human subjects.
- Retrieve a list of differentially expressed genes for the same brain region of normal human subjects and AD patients.
- Using these lists of genes one can issue (federated) queries to retrieve additional information about the genes for various types of analyses (e.g., GO term enrichment).
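A minimal sketch of how the first and third queries above might be combined into one federated query, assuming a SPARQL 1.1 style SERVICE clause. The two endpoint URLs, the ex: vocabulary, and the function name are hypothetical placeholders and do not come from the slides; the code only builds the query text and does not contact any endpoint.

```python
from string import Template

# Hypothetical endpoints: placeholders only, not real services.
EXPRESSION_ENDPOINT = "http://example.org/microarray/sparql"
ANNOTATION_ENDPOINT = "http://example.org/annotation/sparql"

# Hypothetical vocabulary (ex:) used purely for illustration.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/vocab#>
SELECT DISTINCT ?gene ?goTerm
WHERE {
  SERVICE <$expression_endpoint> {
    ?result ex:gene ?gene ;
            ex:regionA "$region_a" ;
            ex:regionB "$region_b" ;
            ex:differentiallyExpressed true .
  }
  SERVICE <$annotation_endpoint> {
    ?gene ex:annotatedWith ?goTerm .
  }
}
""")


def build_federated_query(region_a, region_b):
    """Return SPARQL text asking for genes differentially expressed
    between two brain regions, joined with their GO annotations."""
    # Note: region names are inserted as-is; real code should escape them.
    return QUERY_TEMPLATE.substitute(
        expression_endpoint=EXPRESSION_ENDPOINT,
        annotation_endpoint=ANNOTATION_ENDPOINT,
        region_a=region_a,
        region_b=region_b,
    )


if __name__ == "__main__":
    print(build_federated_query("hippocampus", "entorhinal cortex"))
```

The first SERVICE block stands in for a microarray results source and the second for a gene annotation source; the shared ?gene variable is what joins them.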
12Microarray Experiment Descriptions
- E-GEOD-3296: Transcription profiling of primary mouse embryonic fibroblasts (MEFs) from C57Bl/6x129/Sv F2 e14.5 embryos that contain a deletion in the CH1 domain of three of four alleles of CBP and p300. The CH1 protein interaction domain of the transcriptional coactivators p300 and CBP is thought to interact with HIF-1alpha, and this interaction is thought to be critical to the expression of HIF-1alpha target genes in response to hypoxia. Trichostatin A (TSA), an inhibitor of histone deacetylases, has been reported to repress the expression of HIF-1alpha target genes. To test the requirement of the CH1 domain and TSA for gene expression in response to dipyridyl (a hypoxia mimetic), primary MEFs were generated from C57Bl/6x129/Sv F2 e14.5 embryos that contain a deletion in the CH1 domain of three of four alleles of CBP and p300. The remaining allele of p300 or CBP was a conditional knockout allele. Control MEFs with only a single conditional knockout allele of p300 or CBP were also generated. At passage 3, MEFs were infected with Cre Adenovirus and grown until they had expanded at least 100-fold. Subconfluent MEFs were treated with ethanol vehicle or 100 ng/ml TSA with 5% carbon dioxide at 37 °C in a humid chamber for 30 min, followed by ethanol vehicle or 100 µM dipyridyl (DP) for an additional 3 hrs. Immediately after treatment, cells were lysed in Trizol for RNA extraction.
- E-GEOD-3327: Transcription profiling of different regions of mouse brain to study adult mouse gene expression patterns in common strains. Experiment overall design: six mouse strains and seven brain regions were analyzed.
- E-GEOD-358: Transcription profiling of rat whole brain samples from animals with repeated exposure to the anaesthetic isoflurane. 12 controls, 3 with 5 exposures, and 3 with 10 exposures. Rats were exposed to 90 minutes of 1.0% isoflurane twice a day for a total of 5 or 10 exposures. Animals did not require intubation. All exposures and hybridizations were performed at the Univ. of Pennsylvania.
13Open Biomedical Annotator
14Some Results
- Two microarray experiments (E-GEOD-4034, E-GEOD-4035) contain the following set of terms: fear, hippocampus, mouse.
- These microarray experiments study the role of the hippocampus in fear, using mouse as the model.
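A minimal sketch of how such a result could be obtained once each experiment description has been annotated with ontology terms. The term sets for E-GEOD-4034 and E-GEOD-4035 follow the slide; the term sets for the other two accessions are illustrative assumptions, not actual annotator output, and the function name is hypothetical.

```python
# Experiment accession -> set of annotated terms.
# Only the E-GEOD-4034 / E-GEOD-4035 entries reflect the slide;
# the other two are assumed values for illustration.
ANNOTATIONS = {
    "E-GEOD-4034": {"fear", "hippocampus", "mouse"},
    "E-GEOD-4035": {"fear", "hippocampus", "mouse"},
    "E-GEOD-3327": {"brain", "mouse"},
    "E-GEOD-358": {"brain", "rat", "isoflurane"},
}


def experiments_with_terms(annotations, required_terms):
    """Return the sorted accessions whose annotations contain every required term."""
    required = set(required_terms)
    return sorted(
        accession
        for accession, terms in annotations.items()
        if required <= terms  # subset test: all required terms are present
    )


if __name__ == "__main__":
    print(experiments_with_terms(ANNOTATIONS, ["fear", "hippocampus", "mouse"]))
```

The subset test is the whole technique: an experiment matches only when every requested term appears among its annotations.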
15Analysis tools
- Bioconductor
- GenePattern
- GeneSpring
16Intercommunity collaboration
- HCLS (BioRDF)
- MGED (ArrayExpress)
- NIF (NeuroLex)
- Ontology community (NCBO)
17Web of silos
18Semantic Web Brilliant Web!
19The End
20Discussion
- What is the RDF structure
- Extension of SPARQL to empower data analysis
- Workflow and provenance
- Visualization
- How to integrate database and literature
- Integration of other types of data
- Inter-community collaboration
- Translational use cases
21What should be the RDF structure?
- Experiments
- Samples
- Experimental conditions/factors
- Gene lists
- Arrays/chips
- Raw/processed data (e.g., CEL, GPR, gene matrix)
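One possible shape for such a structure, offered only as a starting point for discussion. The namespace, the property names, and the sample, array, gene list, and file identifiers are all hypothetical; only the accession E-GEOD-3327 and its two factors (strain and brain region) come from the experiment descriptions earlier in the slides.

```python
EX = "http://example.org/microarray/"  # hypothetical namespace


def iri(local_name):
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(text):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


EXPERIMENT = iri("experiment/E-GEOD-3327")
SAMPLE = iri("sample/sample-1")           # hypothetical sample
ARRAY = iri("array/array-1")              # hypothetical array/chip
GENE_LIST = iri("genelist/list-1")        # hypothetical gene list
RAW_DATA = iri("data/sample-1.CEL")       # hypothetical raw data file

# One triple per relationship named on the slide.
TRIPLES = [
    (EXPERIMENT, iri("vocab#hasFactor"), literal("brain region")),
    (EXPERIMENT, iri("vocab#hasFactor"), literal("strain")),
    (EXPERIMENT, iri("vocab#hasSample"), SAMPLE),
    (EXPERIMENT, iri("vocab#hasGeneList"), GENE_LIST),
    (SAMPLE, iri("vocab#hybridizedTo"), ARRAY),
    (SAMPLE, iri("vocab#hasRawData"), RAW_DATA),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples-style lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Keeping experiments, samples, arrays, and data files as separate resources lets each be linked to external vocabularies independently.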
22Extension of SPARQL
- Hierarchical queries
- Statistical analyses/tests
- Enrichment analysis
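For reference, enrichment analysis is commonly computed with a one-sided hypergeometric test; the sketch below shows that calculation outside SPARQL, as an illustration of the kind of statistic an extended query language would need to support. The choice of the hypergeometric test and the function name are assumptions, not something the slides specify. It requires Python 3.8 or later for math.comb.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """Probability of seeing at least `overlap` annotated genes when
    `selected` genes are drawn without replacement from `population`
    genes, of which `annotated` carry the term of interest."""
    upper = min(annotated, selected)
    total = comb(population, selected)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Toy numbers for illustration only: 10 genes on the array, 4 annotated
    # with a term, 3 in the differentially expressed list, 2 of them annotated.
    print(hypergeom_enrichment_p(population=10, annotated=4, selected=3, overlap=2))
```

A small p-value suggests the term is over-represented in the gene list; in practice a multiple-testing correction would be applied across all terms tested.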
23Workflow and provenance
- Taverna
- BioMoby
- GenePattern
24Visualization
25How to integrate database and literature
26Inter-community Collaboration
27What other types of data can be integrated with
microarray data?
28Translational use cases