Title: Integrating -Omics
1Integrating -Omics
- Brent D. Foy, Ph.D.
- Associate Professor
- Department of Physics
- Wright State University
- Dayton, OH
2Overview
- Combining Genomic Data with Proteomic Data
- Which gene makes which protein?
- If mRNA level goes up, does the protein level go up?
- Biomolecular Network Modeling
- Issues
- State of the Field
- Our work
3Gene to Protein Identification
Partial table from Affymetrix rat gene tox chip
J02722 is the GenBank nucleotide ID for this gene.
4Gene to Protein Identification
- A search for J02722 on GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) or EBI (http://www.ebi.ac.uk/cgi-bin/emblfetch) brings up the gene information page.
- Scroll down for the protein ID. GenBank gives a link for AA41346.1. EMBL gives links for EPD EP31003 and Swiss-Prot P06762. Clicking on a link leads to the information page on the protein.
- Match up the Affymetrix gene ID with the protein ID provided by the proteomics experiment.
- The reverse also works: given a protein ID, find the gene ID.
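The matching step can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used: the cross-reference table is a hypothetical stand-in for what the GenBank/EMBL records provide, only the J02722 to P06762 pair comes from the slide, and `X00000` is a made-up placeholder ID.

```python
# Hypothetical cross-reference table: GenBank nucleotide ID -> Swiss-Prot ID.
# Only the J02722 -> P06762 pair is taken from the slide.
GENE_TO_PROTEIN = {"J02722": "P06762"}


def match_ids(chip_gene_ids, proteomics_protein_ids, gene_to_protein):
    """Return (gene_id, protein_id) pairs present in both data sets."""
    observed = set(proteomics_protein_ids)
    return [
        (gene, gene_to_protein[gene])
        for gene in chip_gene_ids
        if gene_to_protein.get(gene) in observed
    ]


def invert(gene_to_protein):
    """Reverse lookup: protein ID -> list of gene IDs.

    A list is used because several gene entries may map to one protein.
    """
    protein_to_genes = {}
    for gene, protein in gene_to_protein.items():
        protein_to_genes.setdefault(protein, []).append(gene)
    return protein_to_genes


# "X00000" is a placeholder for a chip gene with no known protein match.
pairs = match_ids(["J02722", "X00000"], ["P06762"], GENE_TO_PROTEIN)
```

Keeping the reverse map as protein to list-of-genes leaves room for the case where more than one chip entry corresponds to the same protein.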
5Gene to Protein Identification
- Since we have 150 identified proteins from proteomics and 1000 genes on the Affymetrix gene chip, we took the reverse approach (given protein, find mRNA) and found 21 genes corresponding to 16 proteins that were present in both.
- Discrepancy?
- AFFY and GenBank M25157: Rat Cu, Zn superoxide dismutase, from Sprague Dawley, lung cell line, 601 base pairs
- AFFY and GenBank Y00404: Rat mRNA for copper-zinc-containing superoxide dismutase, from Sprague Dawley, liver, 650 base pairs
- Errors in public databases, or just incomplete knowledge of mRNA or protein varieties
6Change in mRNA Expression vs Change in Protein
Expression
Ratio of expression in absence of galactose to
expression in presence of galactose
Ideker T, et al., Science, 292: 929-934, 2001.
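A comparison of this kind can be sketched as follows: compute the log ratio (expression without galactose over expression with galactose) for both mRNA and protein, then ask whether they move in the same direction. The function names and the numbers are illustrative assumptions, not data from Ideker et al.

```python
import math


def log_ratio(level_without, level_with):
    """log10 of expression without galactose over expression with galactose."""
    return math.log10(level_without / level_with)


def same_direction(mrna_log_ratio, protein_log_ratio):
    """True when mRNA and protein both go up or both go down."""
    return mrna_log_ratio * protein_log_ratio > 0


# Illustrative values in arbitrary units (assumed, not measured).
mrna = log_ratio(10.0, 100.0)     # mRNA is tenfold lower without galactose
protein = log_ratio(50.0, 100.0)  # protein is twofold lower without galactose
agree = same_direction(mrna, protein)
```

Working on a log scale treats a tenfold increase and a tenfold decrease symmetrically, which is why ratio plots of this type are usually drawn on log axes.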
7mRNA Expression vs. Protein Level
8Time Course mRNA and Protein Levels
50 mM Hydrazine-exposed Hepatocytes
9Biomolecular Network Modeling
10Metabolic Network Modeling - Tracer studies
- Quantify activities of biochemical pathways
- For example, C-13 NMR analysis of TCA cycle and
gluconeogenesis in liver
11Genetic Regulation
- Genes are expressed in distinct domains, precisely delineated by time, state of cell, and level of response.
- This control is exerted by regulatory elements in the promoter and enhancer regions of genes.
- Field still young, but some quantitative results are appearing.
- Feedback with other genes
12Biomolecular Network Modeling Issues
- Compared to standard modeling of kinetic processes, challenges include:
- Stochastic reaction behavior due to random diffusion processes and small numbers of molecules
- Multiple protein-protein, protein-mRNA, etc. interactions
- Computational efficiency, parallelized code for operation on multiple CPUs
- Can you separate out the model for a pathway from the whole cell?
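To illustrate why small molecule numbers call for stochastic treatment, here is a minimal sketch using Gillespie's stochastic simulation algorithm, a standard method for this kind of problem; the slides do not say which method was used. The model (constant production plus first-order degradation of one species) and all parameter names are assumptions chosen for illustration.

```python
import random


def gillespie_birth_death(k_make, k_deg, n0, t_end, seed=0):
    """Simulate production (rate k_make) and degradation (rate k_deg * n).

    Returns event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_make = k_make          # propensity of making one molecule
        a_deg = k_deg * n        # propensity of degrading one molecule
        a_total = a_make + a_deg
        if a_total == 0.0:
            break                # nothing can happen any more
        t += rng.expovariate(a_total)  # waiting time to the next event
        if t > t_end:
            break
        # Pick which reaction fires, weighted by its propensity.
        if rng.random() * a_total < a_make:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_birth_death(k_make=1.0, k_deg=0.1, n0=0, t_end=50.0)
```

With these assumed rates the count fluctuates around `k_make / k_deg` = 10 molecules, and the fluctuations are a sizeable fraction of the mean, which a deterministic rate equation would not show.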
13Biomolecular Network Modeling Task
gene A   mRNA A   prot A   rxn A1 A2
gene B   mRNA B   prot B   rxn B1 B2
gene C   mRNA C   prot C
gene D   mRNA D   prot D
- Compounds other than genes are mobile
- Some of these mobile compounds affect many
reactions (e.g. ATP, ions)
14Biomolecular Network Modeling - Finding the Parameters
Use the simulation itself to narrow down the possibilities
1. Optimize on stability
[Figure: stable regions in the Parameter 1 / Parameter 2 plane]
2. Optimize on something else: maximum energy efficiency, rapid cell division
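The idea of mapping out stable regions in a two-parameter plane can be sketched with a toy model. Everything here is an assumption for illustration: the model is a hypothetical two-variable linearized system with Jacobian [[-1, p1], [p2, -1]], and stability is tested with the standard 2x2 criterion (trace < 0 and determinant > 0), not by running a full cell simulation.

```python
def jacobian(p1, p2):
    """Jacobian of a hypothetical two-variable model with cross-coupling p1, p2."""
    return [[-1.0, p1], [p2, -1.0]]


def is_stable(j):
    """A 2x2 linear system is stable when trace < 0 and determinant > 0."""
    trace = j[0][0] + j[1][1]
    det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
    return trace < 0 and det > 0


def stable_region(p1_values, p2_values):
    """Return the (p1, p2) grid points at which the model is stable."""
    return [
        (p1, p2)
        for p1 in p1_values
        for p2 in p2_values
        if is_stable(jacobian(p1, p2))
    ]


grid = [0.5, 2.0]
region = stable_region(grid, grid)
```

For this toy model the stable region is simply p1 * p2 < 1; in a realistic network the same scan would replace `is_stable` with a check on the simulated behavior.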
15Biomolecular Network Modeling - State of the Field
- E-Cell
- Virtual Cell
- Bio-Spice/Arkin
- Specific laboratories: Institute for Systems Biology/Leroy Hood's group
- Useful links page: http://www.cds.caltech.edu/erato/links.html
16E-Cell
- From Laboratory for Bioinformatics, Keio University, Japan
- Attempt to integrate genes, RNA, proteins, and metabolites of entire cell in one simulation
- Freely available, http://www.e-cell.org/
17E-Cell
- Used to simulate a minimal cell based on Mycoplasma genitalium
- 127 genes
- Integrate with online databases
- Many parameters estimated
- Substances modeled include small molecules, macromolecules, multi-protein complexes, protein-DNA complexes
- Multiple reaction types
18E-Cell, published results
[Figure: time courses of ATP and of some mRNA levels after removing glucose from the culture medium]
Tomita, M., et al. Bioinformatics, Volume 15,
Number 1, 72-84 (1999)
19Virtual Cell
- National Resource for Cell Analysis and Modeling (NRCAM), located at University of Connecticut Health Center
- Access via internet, http://www.nrcam.uchc.edu/
- Has a graphical, biological user interface
- Compared to E-Cell
- Includes 3-d spatial information within cell
- Has not been applied to gene->mRNA->protein->metabolites
20Virtual Cell
Define physiology, with reactions among substances
21Virtual Cell
Geometric results
22Bio-Spice
- Initiated at Berkeley National Laboratory, http://gobi.lbl.gov/aparkin/index.html
- Development of Bio-Spice is currently the subject of a DARPA project
- It will be a Simulation Program for Intra-Cell Evaluation, like SPICE for circuit design
- Intended to be a user-friendly simulation tool that captures the network of molecular interactions including gene-gene, gene-protein, and protein-protein interactions.
23Institute for Systems Biology - Galactose in Yeast
Ideker T, et al., Science, 292: 929-934, 2001.
24ISB - physical interaction network
Circles are genes, yellow means product affects another gene's transcription, blue means proteins interact. Grayscale of circles is mRNA change with galactose in medium.
Ideker T, et al., Science, 292: 929-934, 2001.
25Development of Quantitative Tools - Transcription
[Diagram of transcription. Labels: RNA Polymerase, Activated Nucleotides, TFIII, TF_A, TF_B, DNA (sites B, A, TATA), mRNA sequence, Regulatory factors]
26Development of Quantitative Tools - Transcription
(cont.)
State of Promoter      kon for RNA Polymerase
TATA   A     B         (M ms)^-1
off    any   any       1e-99
on     off   off       1e-30
on     on    off       5e-23
on     off   on        1e-99
on     on    on        5e-23
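The promoter-state table on this slide can be encoded as a simple lookup. This is a sketch of one possible representation, not the authors' code; the rate values are copied from the slide, while the names `KON_TABLE` and `kon` and the use of `None` for "any" are assumptions.

```python
# kon for RNA polymerase in (M ms)^-1, keyed by (TATA, A, B) site occupancy.
# None stands for "any" in the slide's table.
KON_TABLE = [
    (("off", None, None), 1e-99),
    (("on", "off", "off"), 1e-30),
    (("on", "on", "off"), 5e-23),
    (("on", "off", "on"), 1e-99),
    (("on", "on", "on"), 5e-23),
]


def kon(tata, a, b):
    """Return the polymerase on-rate for a given promoter state."""
    state = (tata, a, b)
    for pattern, rate in KON_TABLE:
        # A pattern matches when every non-wildcard entry equals the state.
        if all(p is None or p == s for p, s in zip(pattern, state)):
            return rate
    raise ValueError(f"no table entry for promoter state {state!r}")


rate_blocked = kon("off", "on", "on")   # TATA site off: effectively no binding
rate_active = kon("on", "on", "off")    # TATA and A occupied
```

A value such as 1e-99 acts as "effectively zero", so the same rate expression can be used for every state without a special case for blocked promoters.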
27Development of Quantitative Tools - Transcription
(cont.)
Gene A: promoter sites B, A, TATA; product TF_A
Gene B: promoter sites A, TATA; product TF_B
Plus a first-order process for degradation of TF_A and TF_B
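For reference, a first-order degradation process has the following form. The symbols are assumptions introduced here: N is the number of molecules of TF_A or TF_B and k_deg is its degradation rate constant, whose value the slides do not give.

```latex
\frac{dN}{dt} = -k_{\mathrm{deg}}\, N
\quad\Longrightarrow\quad
N(t) = N(0)\, e^{-k_{\mathrm{deg}} t},
\qquad
t_{1/2} = \frac{\ln 2}{k_{\mathrm{deg}}}
```

In the absence of new synthesis the factor count therefore decays exponentially, with a half-life set by k_deg alone.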
28Development of Quantitative Tools - Transcription
(cont.)
Time course of binding to gene A promoter
Time course of number of TF_A
29Biomolecular Network Modeling - Future Tasks
- Ultimate goal is to provide physiological insight on integrated genomic, proteomic, metabolic data sets in response to toxicity interventions
- Establish contact with online databases
- Gene->protein->metabolite connections (KEGG, others)
- Protein-protein interactions (published list, Nature Biotech)
- Protein-DNA interactions (TRANSFAC, SCPD)
- Evaluate proper scale of modeling effort relevant to task. Scale in both the level of biological detail, and in terms of man-hours.
- Choose software and gain expertise with it, or create software as needed.
- One early goal: explore minimal cell and its stability in response to perturbation
30Collaborators
AFIT: Dr. Dennis Quinn, 2Lt Matt Campbell
WSU: Dr. Tatiana Karpinets
AFRL: Dr. John Frazier, Dr. Charles Wang, Dr. Victor Chan
AFOSR: Dr. Walt Kozumbo
31Questions?