Title: Model-data integration. Issues of flux optimality
1Model-data integration. Issues of flux optimality and polymer mechanics of 4D cell models
DARPA BIOCOMP 23-May-2002
Thanks to Harvard/MIT Team: Jake Jaffe, Kyriacos Leptos, Matt Wright, Daniel Segre, Martin Steffen
2Post-300 genomes and 3D structures
gggatttagctcagttgggagagcgccagactgaa
gat ttg gag gtcctgtgttcgatccacagaattcgcac
ca
3DoD Relevance Accurate Bio I/O Engineering
Over-determined Calculable Protein folding
vs. crystallography Accurate Comprehensive/Quantitative
Bio-Systems Embrace outliers Analytic
Synthetic Useful
Computer-Aided-Design (CAD)
>>INTEGRATION<<
4Technical challenge: Integrating Measures and Models
Environment
Metabolites
RNAi Insertions SNPs
Protein in vivo in vitro interactions
RNA
DNA
Replication rate
Microbes Cancer stem cells
Darwinian In vitro replication Small
multicellular organisms
6Human Red Blood Cell ODE model: 200 measured parameters
[Figure: metabolic network diagram of the red blood cell model. Node labels: ADP, ATP, AMP, NAD, NADH, NADP, NADPH, 1,3 DPG, 2,3 DPG, 3PG, 2PG, PEP, PYR, GA3P, DHAP, FDP, F6P, G6P, GL6P, GO6P, RU5P, R5P, R1P, X5P, S7P, E4P, GSH, GSSG, LACi, LACe, GLCi, GLCe, ADO, ADOe, ADE, ADEe, INO, INOe, IMP, HYPX, PRPP, Cl-, K, Na, HCO3-, pH]
Jamshidi, Edwards, Fahland, Church, Palsson, B.O. (2001) Bioinformatics 17: 286.
(http://atlas.med.harvard.edu/gmc/rbc.html)
7Linear Programming Flux Balance Analysis
Normalized optimal growth
(v_ko = 0)
Gene deletions
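The linear program behind flux balance analysis can be written out explicitly. The block below is a sketch in standard FBA notation and is not taken from the slide: $S$ is the stoichiometric matrix, $v$ the flux vector, $c$ an objective weight vector that selects the growth flux, and $\alpha$, $\beta$ are assumed lower and upper flux bounds. A gene deletion is modeled by forcing the affected flux to zero, which is how the slide's knockout condition is read here.

```latex
\begin{aligned}
\text{maximize}\quad   & c^{T} v \\
\text{subject to}\quad & S v = 0, \\
                       & \alpha_i \le v_i \le \beta_i \quad \text{for all } i, \\
                       & v_{ko} = 0 \quad \text{for each deleted reaction } ko .
\end{aligned}
```

Under this reading, the normalized optimal growth of a mutant would be its optimal objective value divided by the wild-type optimum.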
8Challenge 1: Suboptimality of mutants
--integrating growth rate and flux data
Minimal Perturbation Analysis (MPA) for the analysis of
non-optimal metabolic phenotypes (Daniel Segre)
10This is a Quadratic Programming (QP) problem
Minimize $\mathrm{Dist} = \sum_i (x_i - a_i)^2$ given $Sx = b$, $x \ge 0$
Standard form
Minimize $\tfrac{1}{2} x^T Q x + a^T x$ given $Sx = b$, $x \ge 0$
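A small worked case may make the minimal-perturbation idea concrete. Up to a constant, the distance objective on this slide is the standard form with Q = 2I and a linear coefficient of -2a. The sketch below is not the authors' solver: it assumes a hypothetical three-flux network with a single balanced metabolite (so S has one row), and it swaps in Dykstra's alternating projections for a general QP solver. All names and numbers are illustrative.

```python
def project_hyperplane(x, s, b):
    """Euclidean projection of x onto the hyperplane {x : s . x = b}."""
    t = (sum(si * xi for si, xi in zip(s, x)) - b) / sum(si * si for si in s)
    return [xi - t * si for si, xi in zip(s, x)]


def project_box(x, lower, upper):
    """Euclidean projection of x onto the box lower <= x <= upper."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def minimal_perturbation(a, s, b, lower, upper, iterations=500):
    """Closest point to a that satisfies s . x = b and the flux bounds.

    Uses Dykstra's alternating projections, which converges to the
    projection of a onto the intersection of the two convex sets.
    """
    x = list(a)
    p = [0.0] * len(a)  # correction term for the hyperplane
    q = [0.0] * len(a)  # correction term for the box
    for _ in range(iterations):
        xp = [xi + pi for xi, pi in zip(x, p)]
        y = project_hyperplane(xp, s, b)
        p = [u - v for u, v in zip(xp, y)]
        yq = [yi + qi for yi, qi in zip(y, q)]
        x = project_box(yq, lower, upper)
        q = [u - v for u, v in zip(yq, x)]
    return x


# Hypothetical network: uptake -> A, then A leaves by route_1 or route_2.
# Steady state for A: uptake - route_1 - route_2 = 0.
stoichiometry = [1.0, -1.0, -1.0]
wild_type = [10.0, 10.0, 0.0]      # assumed wild-type fluxes (vector a)
lower = [0.0, 0.0, 0.0]
upper = [10.0, 0.0, 10.0]          # route_1 knocked out: upper bound 0

mutant = minimal_perturbation(wild_type, stoichiometry, 0.0, lower, upper)
print([round(v, 6) for v in mutant])   # approximately [5.0, 0.0, 5.0]
```

In this toy case a growth-maximizing LP would reroute the full uptake of 10 through route_2, whereas the minimal-perturbation solution settles at 5, which is the kind of suboptimal mutant state the slide describes.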
11χ² test for prediction of essential genes
Optimal (FBA): p = 4×10^-3
Suboptimal (MPA): p = 10^-5
13C009-limited
[Figure: three scatter plots of Predicted Fluxes vs. Experimental Fluxes, with flux points numbered 1-18. Panels: WT (LP), Δpyk (LP), Δpyk (QP). Correlations printed on the plots: r = 0.91, p = 8e-8 (WT panel); r = 0.56, p = 7e-3 and r = -0.06, p = 6e-1 (Δpyk panels).]
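The r values on these plots score how well predicted fluxes track measured ones. As a minimal sketch of that comparison, the function below computes a Pearson correlation coefficient in plain Python. The flux values are hypothetical placeholders, not the data plotted on the slide, and the p-value calculation is left out.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical flux values, for illustration only.
experimental = [100.0, 80.0, 20.0, 5.0, 60.0]
predicted = [110.0, 70.0, 25.0, 0.0, 65.0]
print(pearson_r(experimental, predicted))
```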
16Challenge 2: integrating proteomics and in vivo
crosslinking data
Polymer mechanics of 4D cell models (Automating
integration of data)
17Mapping genome folding: DNA:DNA, DNA:protein,
protein:protein in vivo crosslinks
Dekker et al. Science 2002; 295:1306-11. Capturing
chromosome conformation.
18In vivo crosslinking DNA-binding proteins
19Multidimensional protein and peptide separations
for MS quantitation
Optional 1st and 2nd protein dimensions: subcellular fractions, sizing of native protein complexes
1st peptide dimension: Strong Cation Exchange (charge)
2nd peptide dimension: Reverse Phase Chromatography (hydrophobicity)
3rd peptide dimension: Mass Spectrometry (mass per charge)
[Figure axes: m/z vs. retention time (min)]
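The third dimension separates peptides by mass per charge. As a small illustration of what that axis measures, the sketch below computes m/z for a peptide carrying z protons, assuming positive-ion mode with protonation as the only charging mechanism. The peptide mass used is a hypothetical value.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mz(neutral_mass_da, charge):
    """m/z of a peptide of the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


# A hypothetical 1000 Da peptide observed at charge states 1+ to 3+.
for z in (1, 2, 3):
    print(z, round(mz(1000.0, z), 4))
```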
20[Figure: panels A., C., D.; labels rt1, rt2, rt3]
21Minimal Cell Projects
- The first FULL proteome model would benefit from a small number of natural cell states and genes.
- 3D-structure of a cell during replication and motility.
- Genome engineering / complete synthesis.
22Small sequenced genomes (excludes
organelle/symbionts)
- Mollicutes: cell-wall-less bacteria, a subgroup of Clostridia (gram-positive)
- Acholeplasmataceae
- Acholeplasma, Anaeroplasma, Phytoplasma
- Mycoplasmatales
- Entomoplasmataceae (florum)
- Mycoplasmataceae: pulmonis, urealyticum, pneumoniae, genitalium (mobile)
- Spiroplasmataceae
Megabases
23Motility
Species               nm/sec             Replicate   Temp (°C)
M. mobile             3000               5 hr        25
M. pneumoniae         300                8           37
M. florum             0                  1.5         30
U. urealyticum        0                  >10         37
E. coli               20000              0.4         37
H. sapiens            1000               >10         37
RNA Pol / ribosome    20 (50 nt/s)
E. coli DNA Pol3      300 (1000 nt/s)
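The last two rows of the table convert polymerization rates into speeds along the template. A minimal sketch of that arithmetic, assuming the B-DNA rise of 0.34 nm per nucleotide applies to both rows (an approximation for the ribosome, which moves along mRNA):

```python
RISE_NM_PER_NT = 0.34  # B-DNA rise per base pair, in nanometres


def track_speed_nm_per_s(nt_per_s):
    """Linear speed along the template for a given polymerization rate."""
    return nt_per_s * RISE_NM_PER_NT


print(track_speed_nm_per_s(50))    # RNA Pol / ribosome row, about 17 nm/s
print(track_speed_nm_per_s(1000))  # DNA Pol3 row, about 340 nm/s
```

These agree with the rounded 20 and 300 nm/s entries in the table, and put the 3000 nm/s gliding speed listed for M. mobile at roughly ten times the speed of a replication fork.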
24Attachment organelle replication
Seto S, Layh-Schmitt G, Kenri T, Miyata M. Visualization of the
attachment organelle and cytadherence proteins of
Mycoplasma pneumoniae by immunofluorescence
microscopy. J Bacteriol 2001; 183:1621.
25Mycoplasma pneumoniae
Regula et al., Microbiology 147:1045-57; scale
bar 100 nm
26Hypothetical mechanisms
27Proteo-genomic mapping (of peptide data in 3
forward and 3 reverse frames)
28Use of proteogenomic mapping to discover:
B. a new ORF.
C. a new ORF and delete an inaccurately predicted ORF.
D. N-terminal extension of an existing ORF.
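Mapping peptides onto all six reading frames can be sketched in a few lines. The example below is an illustration rather than the pipeline used for these slides: it translates a DNA string in three forward and three reverse frames and reports where a peptide occurs. It uses translation table 4, in which TGA encodes tryptophan, since that is the genetic code of Mycoplasma. The function names and the test sequence are hypothetical.

```python
BASES = "TCAG"
# Translation table 4 (Mycoplasma/Spiroplasma): the standard code except
# that TGA encodes Trp (W) instead of stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame):
    """Translate dna starting at offset `frame` (0, 1 or 2); X = unknown codon."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_hits(dna, peptide):
    """Return (strand, frame, residue offset) for every match of peptide."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq, frame)
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(peptide, pos + 1)
    return hits


# Hypothetical sequence: frame 2 of the forward strand reads ATG AAA TGA CCC.
print(six_frame_hits("GGATGAAATGACCC", "MKW"))  # [('+', 2, 0)]
```

A peptide hit that falls outside every annotated ORF, or upstream of an annotated start codon in the same frame, is the kind of evidence the cases B to D above rely on.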
29Constraints
- Replication
- Membrane-bound polyribosomes
- Other RNA and/or protein complexes
- Metabolism
- DNA Structural Forces
30Genome folding and cell 3D structure
Seto and Miyata (1999) Partitioning, movement, and
positioning of nucleoids in Mycoplasma capricolum.
J. Bact. 181:6073.
Cell: 0.5 µm. Genome: 500-800 kbp. Extended diameter: 80 µm.
200 transverses, with each membrane encoding gene anchored to the
cell surface. How to segregate this?
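The extended diameter quoted here can be checked with a short calculation. The sketch assumes that "extended diameter" means the diameter of the circular genome laid out as an open circle, and uses the B-DNA rise of 0.34 nm per base pair.

```python
import math

RISE_NM_PER_BP = 0.34  # B-DNA rise per base pair, in nanometres


def extended_diameter_um(genome_bp):
    """Diameter (micrometres) of a circular genome opened into a circle."""
    contour_um = genome_bp * RISE_NM_PER_BP / 1000.0
    return contour_um / math.pi


for size_bp in (500_000, 750_000, 800_000):
    print(size_bp, round(extended_diameter_um(size_bp), 1))
```

For 500 to 800 kbp this gives roughly 54 to 87 micrometres, consistent with the figure of about 80 on the slide and more than a hundred times the 0.5 micrometre cell.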
31Paired fork model
Dingman CW. Bidirectional chromosome replication: some topological considerations. J Theor Biol. 1974 Jan;43(1):187-95.
Sundin O, Varshavsky A. Terminal stages of SV40 DNA replication proceed via multiply intertwined catenated dimers. Cell. 1980 Aug;21(1):103-14.
Hearst JE, Kauffman L, McClain WM. A simple mechanism for the avoidance of entanglement during chromosome replication. Trends Genet. 1998 Jun;14(6):244-7.
Bouligand Y, Norris V (2000) Both replication forks appear to be part of a single complex or factory, as first proposed by Dingman. http://wwwmc.bio.uva.nl/texel/tekst/norris.html
Roos M, Lingeman R, Woldringh CL, Nanninga N. Experiments on movement of DNA regions in Escherichia coli evaluated by computer simulation. Biochimie. 2001 Jan;83(1):67-74.
32Constraints
- Replication
- Membrane-bound polyribosomes
- could anchor the RNA polymerase and hence the genes' DNA to within 20 nm of the cell surface.
- Other RNA and/or protein complexes
- Metabolism
- DNA Structural Forces
33Side view, no replication (gene)
Origin. Blue = First MPN gene; Green = Mid gene 344 (ter); Red = Last gene 688
34Off-axial view, no replicated segments, unoptimized membrane
Yellow = Membrane; Pink = Ribosomal; White = Hypothetical abundant; Green = Misc. abundant; Blue = Weak
35Axial view, no replicated segments
Yellow = Membrane; Pink = Ribosomal; White = Hypothetical abundant; Green = Misc. abundant; Blue = Weak
36Side view, no replicated segments
Origin. Yellow = Membrane; Pink = Ribosomal; White = Hypothetical abundant; Green = Misc. abundant; Blue = Weak
37Side view, no replication (dis from ori)
Origin. Blue = Origin of replication; Red = Terminus
38Simple example cost function for chromosome
structure optimization
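The slide's own cost function is not preserved in this transcript. As a purely illustrative stand-in, the sketch below scores a set of gene positions against two of the constraints mentioned elsewhere in the talk: every gene must lie inside the cell, and membrane-encoding genes should sit within 20 nm of the surface. It assumes a spherical cell of 0.5 micrometre diameter; the function name, the quadratic penalties, and the equal weighting are all assumptions.

```python
import math

CELL_RADIUS_NM = 250.0   # assumed sphere, from the 0.5 micrometre cell size
ANCHOR_DEPTH_NM = 20.0   # anchoring distance quoted for membrane genes


def structure_cost(points, membrane_genes):
    """Sum of squared constraint violations for gene positions in nm.

    points: list of (x, y, z) coordinates, one per gene, cell centre at origin.
    membrane_genes: set of indices of genes that encode membrane proteins.
    """
    total = 0.0
    for index, (x, y, z) in enumerate(points):
        r = math.sqrt(x * x + y * y + z * z)
        # Penalize any gene placed outside the cell.
        total += max(0.0, r - CELL_RADIUS_NM) ** 2
        # Penalize membrane genes lying deeper than the anchoring distance.
        if index in membrane_genes:
            total += max(0.0, (CELL_RADIUS_NM - ANCHOR_DEPTH_NM) - r) ** 2
    return total


print(structure_cost([(0.0, 0.0, 240.0), (0.0, 0.0, 0.0)], {0}))  # 0.0
```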
39Searching six helical parameters for chromosomal fold
Columns as extracted: run label followed by seven values per run; the header labels that survive are "s" and "E_final".
2002_5_16_h18_42   31.5783    0.0595431   0.444777     -0.148005    -0.12554     39.676     0.007241
2002_5_16_h19_0    61.4522    0.046929    -0.0010534   -0.37642     0.64887      -7.9804    -0.1281
2002_5_16_h19_19   91.2823    0.075882    0.16159      -0.2373      1.0718       8.0774     0.076364
2002_5_16_h19_34   45.8961    0.10725     0.165795     -0.292295    -0.0370155   46.2283    0.3454
2002_5_16_h19_42   38.601     0.0410951   0.363854     0.154569     0.0889424    24.162     0.1203
2002_5_16_h20_3    35.3927    0.0355828   -0.434093    0.17439      0.0015235    -24.9479   -0.02968
2002_5_16_h20_30   36.5715    0.0495523   0.0201888    0.533363     0.04049      -11.7067   -0.0717
2002_5_16_h20_50   108.2712   -0.03419    0.366322     -0.216694    -1.30726     -23.67     0.0181
2002_5_16_h21_5    45.4948    0.022745    0.44564      -0.26902     -0.18342     -9.5072    0.27189
2002_5_16_h21_50   50.4768    0.172497    -0.282122    -0.285109    0.478558     -46.2911   0.2758
2002_5_16_h21_56   37.6382    0.0304836   0.398325     0.201159     0.0797413    17.013     -0.81
2002_5_16_h23_41   35.4194    0.0445114   0.532795     0.0134364    0.117782     -42.2785   0.451
2002_5_17_h0_2     39.8033    0.11543     -0.006943    -0.426032    -0.128618    -35.8674   -0.03049
2002_5_17_h0_10    62.7409    0.0093794   0.040845     -0.10502     0.35003      3.4834     0.23764
2002_5_17_h4_12    47.0811    0.116387    0.146311     -0.520041    -0.28928     20.3289    0.1700
2002_5_17_h4_20    33.5733    0.096       0.00628      0.547581     0.0413792    22.1782    -0.1598
2002_5_17_h4_29    41.1507    0.167149    0.422391     0.126038     0.59806      38.4758    0.1079
2002_5_17_h4_35    46.4101    0.0765229   0.106407     0.460038     0.350776     12.6997    -0.01097
2002_5_17_h4_50    31.2508    0.0209708   0.484708     -0.131666    0.0525948    17.7536    -0.07883
2002_5_17_h5_41    41.8434    0.0638499   0.411257     0.20358      0.380453     19.9535    -0.04410
2002_5_17_h5_54    31.7824    0.0219507   0.568525     -0.0296989   -0.25155     10.4541    0.01661
2002_5_17_h6_39    42.8122    0.21156     0.003633     -0.502632    0.315238     -61.1441   0.39604
2002_5_17_h6_45    31.5284    0.026136    0.52898      -0.0904436   -0.0902993   -25.0525   0.1101
2002_5_17_h7_17    44.8789    0.069805    -0.00365152  -0.539196    0.179759     -18.5657   0.0189
2002_5_17_h7_26    110.863    0.231782    0.311698     0.218959     -1.51978     11.0336    0.01407
2002_5_17_h7_34    27.5664    0.0463924   0.44446      0.077077     -0.237724    -26.988    -0.0272
2002_5_17_h7_51    43.5492    0.0300962   0.230355     0.293637     0.0425634    12.5355    -0.0275
2002_5_17_h8_15    44.922     0.107868    0.0263435    -0.554559    -0.298406    -18.3352   0.04061
40Monte Carlo minimization of the model fit to
constraints.
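A generic Monte Carlo minimizer of this kind can be sketched as a Metropolis search with a cooling temperature. The version below is an illustration under stated assumptions, not the procedure used for the runs in this talk: the Gaussian step size, the geometric cooling schedule, and the six-parameter quadratic toy cost are all placeholders for the real model fit.

```python
import math
import random


def monte_carlo_minimize(cost, start, steps=5000, step_size=0.3,
                         t_start=1.0, t_end=0.01, seed=0):
    """Metropolis search with geometric cooling; returns (best, best_cost)."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for k in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        trial = [value + rng.gauss(0.0, step_size) for value in current]
        trial_cost = cost(trial)
        delta = trial_cost - current_cost
        # Always accept improvements; accept uphill moves with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = trial, trial_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


def toy_cost(params):
    """Placeholder cost with its minimum at the origin."""
    return sum(p * p for p in params)


start = [3.0] * 6  # six parameters, echoing the six helical parameters
best, best_cost = monte_carlo_minimize(toy_cost, start)
print(toy_cost(start), best_cost)
```

Repeating the search from different seeds or starting points yields a set of independent runs whose final costs can be compared.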
412002_5_17_h5_54 70.5984 31.7824
422002_5_16_h20_3 95.1449 35.3927
432002_5_17_h4_20 92.7126 33.5733
442002_5_17_h4_50 749.4929 31.2508
45data_2002_5_19_h0_40
46data_2002_5_16_h18_42
47data_2002_5_16_h19_34
48data_2002_5_16_h21_50
49data_2002_5_16_h19_42
50data_2002_5_16_h21_56
51data_2002_5_16_h20_3
52data_2002_5_16_h19_0
53data_2002_5_16_h20_30
54data_2002_5_16_h21_5
55Avoidance of entanglement throughout cell cycle
Origin. Blue = Left replicated segment
(yelgrhigh gene); Red = Right (i.e. middle)
segment; Aqua = unduplicated segment of the
circular genome
56M. pneumoniae genes generally point away from Ori
More significant if abundance data are integrated.
Alignment of known motors: polymerases, ribosomes, F1 ATPase
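The claim that genes point away from the origin can be quantified as the fraction of genes transcribed in the same direction as the replication fork that passes them, optionally weighted by abundance. The sketch below assumes a circular genome with coordinates increasing along the "+" strand; the gene list, positions, and weights are hypothetical.

```python
def points_away_from_ori(position, strand, ori, ter, genome_length):
    """True if a gene is transcribed in the direction its fork travels."""
    to_gene = (position - ori) % genome_length
    to_ter = (ter - ori) % genome_length
    # Genes between ori and ter (in increasing coordinates) are replicated
    # by the fork moving in the "+" direction; the rest by the other fork.
    on_plus_arm = to_gene < to_ter
    return (strand == "+") == on_plus_arm


def co_oriented_fraction(genes, ori, ter, genome_length, weights=None):
    """Weighted fraction of genes pointing away from the origin."""
    if weights is None:
        weights = [1.0] * len(genes)
    away = sum(
        w for (position, strand), w in zip(genes, weights)
        if points_away_from_ori(position, strand, ori, ter, genome_length)
    )
    return away / sum(weights)


# Hypothetical genes as (start position, strand) on a 1000 bp circle.
genes = [(100, "+"), (300, "+"), (400, "-"), (600, "-"), (800, "-"), (900, "+")]
abundance = [5.0, 5.0, 1.0, 5.0, 5.0, 1.0]  # hypothetical abundance weights

print(co_oriented_fraction(genes, ori=0, ter=500, genome_length=1000))
print(co_oriented_fraction(genes, 0, 500, 1000, weights=abundance))
```

In this toy set the weighted fraction is higher than the unweighted one, which illustrates how integrating abundance data could sharpen the signal.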
57Biospice 2.0 Deliverables: toolsets for data
integration and optimality assessment
- 1. QP MPA flux and growth modeling
- 2. 4D-model current plan
- Chromosome segregation
- Membrane-bound polysomes
- Ribosomal protein/rRNA assembly
- Motility (coordination with replication origin)
- Next few months
- Other protein complexes
- Space filling metric
- Replication entanglement metric
- In vivo crosslinking