Title: BioSystems Synthesis: New optima demand new technologies
1 BioSystems Synthesis: New optima demand new technologies
17-Sep-2003, Virtual Conference on Genomics & Bioinformatics
Thanks to: DOE GtL, DARPA BioComp, PhRMA, NHLBI
2 Harvard-MIT DOE GtL Center
C. Ting
Collaborating PIs: Chisholm, Polz, Church, Kolter, Ausubel, Lory, Kucherlapati
3 Improving Models & Measures
Why model?
Killer applications: Share, Search, Merge, Check, Design (e.g. sequence & 3D alignment)
4 Biosystems: Integrating Measures & Models
Environment
Metabolites
RNAi, Insertions, SNPs
DNA
Proteins
RNA
Replication rate
interactions
Microbes; Cancer & stem cells; Darwinian optima; In vitro replication; Small multicellular organisms
5 Why improve measurements?
- Human genomes: (6 billion)^2 ≈ 10^19 bp
- Immune & cancer genome changes: >10^10 bp per time point
- RNA ends & splicing in situ: 10^12 bits/mm^3
- Biodiversity; environmental & lab evolution
- Compact storage: 10^5 bits/mm^3 now to 10^17 bits/mm^3 eventually
How? ($1K per genome, 10^8-10^13 bits/$)
- The issue is not speed, but integration.
- Cost per 99.99% bp, including reagents, personnel, equipment/5 yr, overhead/sq. m
- Sub-mm scale: 1 µm = femtoliter (10^-15)
- Instruments should match GHz / $2K CPU
6 Examples of cost bottlenecks
- Affymetrix: $30M(?) microfabricator limited by chemical reaction rate to one set of chips per day (10,000X CPU cost).
- Electrophoresis: limited to 4000 bp/capillary/day. Fixed cost ratio of capillaries to CPUs (1e9X CPU cost).
7 Projected costs determine when biosystems data overdetermination is feasible.
In 1984, pre-HGP (φX, pBR322, etc.): 0.1 bp/$, which would have been $30B per human genome.
In 2002 (de novo full vs. resequencing), ABI/Perlegen/Lynx: $300M vs. $3M, i.e. 10^3 bp/$ (4-log improvement).
Other data I/O (e.g. video): 10^13 bits/$.
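The per-genome figures on this slide follow from simple division. The sketch below reproduces that arithmetic, assuming a genome of about 3 billion bp and assuming the stripped currency symbols were dollars (neither is stated on the slide); the function name is illustrative only.

```python
import math

# Assumption: roughly 3e9 bp per human genome (not stated on the slide).
GENOME_BP = 3e9

def cost_per_genome(bp_per_dollar):
    """Cost in dollars of reading one genome at a given throughput per dollar."""
    return GENOME_BP / bp_per_dollar

cost_1984 = cost_per_genome(0.1)   # pre-HGP rate of 0.1 bp per dollar
cost_2002 = cost_per_genome(1e3)   # 2002 resequencing rate of 10^3 bp per dollar

# Improvement expressed in orders of magnitude ("logs").
logs_improved = math.log10(cost_1984 / cost_2002)

print(f"1984: ${cost_1984:.1e}  2002: ${cost_2002:.1e}  logs: {logs_improved:.1f}")
```

Under these assumptions the 1984 rate gives about 3 x 10^10 dollars and the 2002 resequencing rate about 3 x 10^6 dollars, a four-log difference.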
8 Steeper than exponential growth
Instructions Per Second
1965: Moore's law of integrated circuits
1999: Kurzweil's law
http://www.faughnan.com/poverty.html
http://www.kurzweilai.net/meme/frame.html?main=/articles/art0184.html
9 Why single molecules?
(1) Integrate from cells/genomes/RNAs to data
(2) Geometry, cis-ness on a molecule, complex, or cell, e.g. DNA haplotypes & RNA splice-forms
(3) Asynchronous dNTP incorporation
10 Polymerase colonies (Polonies) along a DNA or RNA molecule
HMS: Shendure, Zhu, Butty, Williams; Wash U: Mitra; Ambergen: Olejnik; U. Del: Edwards, Merritt
11 Polymerase colony (polony) PCR in a gel
Single molecule from library
Primer A is extended by polymerase
1st round of PCR
Primer A has a 5' immobilizing Acrydite
Mitra & Church, Nucleic Acids Res. 27: e34
12 Sequence polonies by sequential, fluorescent single-base extensions
- Hybridize Universal Primer
- Add Red (Cy3) dTTP. Wash.
- Add Green (FITC) dCTP
- Wash & Scan
(Figure: primed template with 3' and 5' ends marked; bases shown: G C A T C G C G T ...)
13 Inexpensive, off-the-shelf equipment
Automated slide fluidics: $4K
MJR in situ Cycler: $10K
Microarray Scanner: $26K-100K
14 Human Haplotype: CFTR gene, 45 kbp
Rob Mitra, Vincent Butty, Jay Shendure, Ben Williams
15 Quantitative removal of Fluorophores
Rob Mitra
16 Sequencing multiple polonies
Template ST30: 3' TCACGAGT
Base added: (C) A G T (C) (A) G (T) C (A) (G) T C A
3' TCACGAGT (template)
5' AGTGCTCA (extended strand)
Rob Mitra
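The cycle pattern on this slide can be reproduced with a small simulation of sequential single-base extension. This is a minimal sketch, not the authors' software: the function name `simulate_extension`, the fixed C, A, G, T addition order, and the reading of parenthesized bases as cycles with no incorporation are all assumptions inferred from the slide.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def simulate_extension(template_3to5, cycle_order="CAGT"):
    """Add one dNTP species per cycle; extend wherever it pairs with the template.

    template_3to5 is the template read 3'->5' downstream of the primer, so the
    growing strand is produced 5'->3'. Returns the extended strand and a log of
    (base_added, bases_incorporated) for each cycle.
    """
    needed = [COMPLEMENT[base] for base in template_3to5]
    position, cycle, log = 0, 0, []
    while position < len(needed):
        base = cycle_order[cycle % len(cycle_order)]
        incorporated = 0
        # A homopolymer run is filled within a single cycle.
        while position < len(needed) and needed[position] == base:
            position += 1
            incorporated += 1
        log.append((base, incorporated))
        cycle += 1
    return "".join(needed), log

extended, log = simulate_extension("TCACGAGT")
# Render cycles in the slide's style: parentheses mark no incorporation.
pattern = " ".join(b if n else f"({b})" for b, n in log)
print(extended)
print(pattern)
```

For the ST30 template this yields the extended strand AGTGCTCA and the same sequence of signal and no-signal cycles listed on the slide.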
17 Multiple Image Alignment
- Metric based on optimal coincidence of high-intensity noise pixels over a matrix of local offsets
- (0.4 pixel precision)
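One way to picture the coincidence metric is an exhaustive search over integer shifts that maximizes the number of overlapping bright pixels. The sketch below is an illustration under assumptions: the names `bright_pixels` and `best_offset`, the intensity threshold, and the search window are hypothetical, and it handles whole-pixel shifts for a single region only, so it does not reach the 0.4 pixel precision quoted on the slide.

```python
def bright_pixels(image, threshold):
    """Return the set of (row, col) positions whose intensity is >= threshold."""
    return {(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value >= threshold}

def best_offset(reference, moving, threshold, max_shift=3):
    """Find the integer (d_row, d_col) that best maps `moving` onto `reference`.

    The score of an offset is the number of bright pixels in `moving` that land
    on a bright pixel of `reference` after shifting.
    """
    ref = bright_pixels(reference, threshold)
    mov = bright_pixels(moving, threshold)
    best, best_score = (0, 0), -1
    for d_row in range(-max_shift, max_shift + 1):
        for d_col in range(-max_shift, max_shift + 1):
            score = sum((r + d_row, c + d_col) in ref for r, c in mov)
            if score > best_score:
                best, best_score = (d_row, d_col), score
    return best, best_score
```

Running the same search on tiles of the image would give one offset per tile, which is one possible reading of "a matrix of local offsets".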
18 1 micron bead sequences
Correct signatures are pseudocolored red, white, and yellow; noise signatures blue; and guide beads green.
19 Polony exclusion principle & single pixel sequences
Mitra & Shendure
20 Biosystems: Integrating Measures & Models (diagram as on slide 4)
21 CD44 Exon Combinatorics (Zhu & Shendure)
- Alternatively spliced cell adhesion molecule
- Specific variable exons are up- or down-regulated in various cancers
- Controversial prospective diagnostic/prognostic marker (>1000 papers)
- Can full isoforms resolve the controversy and/or act as superior markers?
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
22 Algorithm for RNA Polony Finding
1. Search Signature Image for qualified objects:
   a. > 50 connected pixels with same signature value
   b. solidity of > 0.50
   c. long axis / short axis ratio < 3
   OR
   a. > 25 connected pixels with same signature value
   b. solidity of > 0.80
   c. long axis / short axis ratio < 1.5
2. Search for internal regional maxima within each object (lest two adjacent polonies with same signature get counted as one)
3. Assign centroid locations as qualified individual polonies
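Step 1 of the algorithm above can be sketched as a connected-component pass followed by the two-tier size and shape filter. This is a minimal illustration, not the authors' code: the function names, the use of 4-connectivity, and treating signature value 0 as background are assumptions, and solidity and axis ratio are taken as already measured because computing them needs a convex hull and ellipse fit that are out of scope here.

```python
from collections import deque

def connected_objects(signature):
    """Group 4-connected pixels that share the same nonzero signature value."""
    rows, cols = len(signature), len(signature[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or signature[r][c] == 0:
                continue
            value = signature[r][c]
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and signature[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append((value, pixels))
    return objects

def is_qualified(area, solidity, axis_ratio):
    """Apply the slide's thresholds: a large, looser tier OR a small, stricter tier."""
    large = area > 50 and solidity > 0.50 and axis_ratio < 3
    compact = area > 25 and solidity > 0.80 and axis_ratio < 1.5
    return large or compact
```

The second tier lets smaller objects through only when they are nearly round and convex, which is presumably why its shape thresholds are tighter.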
23 RNA exon polony examples
24 RNA exon examples: auto-regridded & quantitated
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
25 Summary of Counts (RNA isoforms)
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
Jun Zhu
26 Polony Flavors
- Replica plating of DNA images: Mitra et al. NAR 1999
- Alternative RNA splicing combinatorics: Zhu et al. Science 2003
- Long range haplotyping: Mitra et al. PNAS 2003
- Precise SNP-mutant mRNA ratios: Merritt et al. NAR 2003
- Fluorescent in situ Sequencing (FISSEQ): Mitra et al. An. Bioch. 2003
- Tumor LOH: Butz et al. BMC Biotech. 2003
- Polony models: Aach & Church, submitted to JTB 2003
- http://arep.med.harvard.edu/Polonator/
27 Biosystems: Integrating Measures & Models (diagram as on slide 4)
28 Comparison of predicted with observed protein properties (abundance, localization, postsynthetic modifications) in E. coli
Link et al. 1997 Electrophoresis 18:1259-313
29 Multidimensional peptide measures
(Optionally protein separation steps)
3rd 2nd
30 Prochlorococcus Proteogenomic Map
Numbers on top are in basepairs. 1700 ORFs are predicted. The Proteomic Model is based on mass spectrometry of peptides at 24h time points. The Difference Map indicates new peptide regions. The 6 colors represent ORFs in the 6 reading frames.
(Harvard-MIT GtL: Jaffe, Church, Lindell, Chisholm, et al.)
31 Circadian time-series (Prochlorococcus): RNA & protein quantitation
(Figure: linear regressions against RNA (3 AM); values shown: R^2 = .992, R^2 = .635, R^2 = .1)
(Harvard-MIT GtL: Jaffe, Church, Lindell, Chisholm, et al.)
32 In vivo crosslinking DNA-binding proteins
33 RNAs & Proteomics Integration: Next steps
- Detect a higher fraction of peptides (currently 80 proteins, 87 peptides max, 19 average)
- Comparative proteomics (e.g. high vs. low light adapted)
- Smoother time-series
- Degradation
34 Biosystems: Integrating Measures & Models (diagram as on slide 4)
35 Synthetic Biology
- Test or manipulate optimality
- Program minimal cells (100kbp)
- Nanobiotechnology: new polymers
- Manage complex systems (e.g. stem cells & ocean ecology)
36 Suboptimality of mutants: integrating growth rate & flux data
Minimization of Metabolic Adjustment (MoMA) for the analysis of non-optimal metabolic phenotypes
Daniel Segre, Dennis Vitkup
37 MoMA/FBA REFERENCES
- Haemophilus influenzae metabolism (Schilling and Palsson, J. Theor. Biol. 2000)
- Escherichia coli metabolic network and gene deletions (Edwards and Palsson, PNAS 2000, BMC Bioinf. 2000)
- Helicobacter pylori (Edwards, Schilling, Covert, Church, Palsson, J. Bact. 2002)
- Escherichia coli MOMA (Segre, Vitkup, Church, PNAS 2003)
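39 Fluxes include transport & a growth flux
Steady state: $X_i = \text{const.} \;\Rightarrow\; \sum_j S_{ij} v_j = 0$
Growth: $c_1 X_1 + c_2 X_2 + \dots + c_m X_m \rightarrow \text{Biomass}$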
40 Biomass Composition
(Figure: coefficient in growth reaction, plotted per metabolite; labeled metabolites: ATP, GLY, LEU, ACCOA, NADH, FAD, SUCCOA, COA)
41 Flux Balance Analysis core
Find max Growth using simplex
$v \in \mathrm{Null}(S)$: $S v = 0$
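Written out as a linear program, the core of this slide might look as follows. This is a sketch of the standard flux balance formulation rather than a transcription of the slide: the flux bounds $\alpha_j$ and $\beta_j$ and the symbol $v_{\text{growth}}$ for the growth flux are assumptions that do not appear in the original.

```latex
\begin{aligned}
\max_{v}\quad & v_{\text{growth}} \\
\text{subject to}\quad & S\,v = 0, \\
& \alpha_j \le v_j \le \beta_j \quad \text{for each flux } j
\end{aligned}
```

Because the objective and all constraints are linear, the optimum lies at a vertex of the feasible region, which is why the simplex method applies.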
42 Can we use flux analysis to say something about suboptimal states?
43 Flux ratios at each branch point yield optimal polymer composition for replication
x, y are two of the 100s of flux dimensions
44 Projection can leave the mutant feasible space, so use quadratic programming (QP) to find the nearest point
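A toy case makes the difference between naive projection and the nearest feasible point concrete. The sketch below is an assumption-laden illustration, not the MoMA implementation: it uses a hypothetical one-metabolite network (uptake = v1 + v2), invented flux values, and a knockout that leaves a one-dimensional feasible space, for which the QP reduces to a clamped projection onto a line.

```python
def nearest_on_segment(wild_type, direction, lower, upper):
    """Closest point to wild_type on {t * direction : lower <= t <= upper}.

    Minimizing the squared Euclidean distance over one parameter t is a convex
    problem, so clamping the unconstrained optimum to [lower, upper] is exact.
    """
    t = (sum(w * d for w, d in zip(wild_type, direction))
         / sum(d * d for d in direction))
    t = max(lower, min(upper, t))
    return [t * d for d in direction]

# Hypothetical wild-type fluxes (uptake, v1, v2); steady state needs uptake = v1 + v2.
wild_type = [10.0, 8.0, 2.0]

# Naive projection: zero the knocked-out flux v1 and keep everything else.
naive = [wild_type[0], 0.0, wild_type[2]]
naive_imbalance = naive[0] - naive[1] - naive[2]  # nonzero means infeasible

# With v1 = 0 the feasible fluxes are t * (1, 0, 1), with uptake capped at 10.
mutant = nearest_on_segment(wild_type, [1.0, 0.0, 1.0], 0.0, 10.0)

print(naive, naive_imbalance)
print(mutant)
```

Here the naive projection violates mass balance, while the nearest feasible point spreads the adjustment across the remaining fluxes. Real networks have hundreds of flux dimensions and inequality bounds, which is where a general QP solver is needed.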
45 12C/13C Flux Ratio Data
46 Flux Data C009-limited
(Figure: three scatter plots of Predicted Fluxes vs. Experimental Fluxes with numbered points 1-18. Panel WT (LP): r = 0.91, p = 8e-8. Panels Δpyk (LP) and Δpyk (QP): values shown are r = 0.56, p = 7e-3 and r = -0.06, p = 6e-1.)
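The r values quoted for these panels are correlation coefficients between predicted and experimental fluxes. A minimal sketch of that calculation follows, assuming they are Pearson coefficients (the slide does not say); the function name and the sample numbers are made up for illustration and are not the plotted data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up example values, NOT the fluxes from the figure.
experimental = [10.0, 40.0, 80.0, 120.0, 200.0]
predicted = [15.0, 35.0, 90.0, 110.0, 190.0]
print(round(pearson_r(experimental, predicted), 3))
```

A value near 1 means the predictions track the measurements closely; a value near 0, as in one of the mutant panels, means they carry almost no linear relationship.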
47 Flux data (MOMA & FBA)
48 Competitive growth data
On minimal media: negative / small selection effect
χ^2 p-values: 4x10^-3, 1x10^-5
Novel redundancies
Position effects
49 Replication rate of a whole-genome set of mutants
Badarinarayana et al. (2001) Nature Biotech. 19: 1060
50 Replication rate challenge met: multiple homologous domains
(Figure: probes along homologous domains 1, 2, 3 of thrA and metL; selective disadvantage in minimal media; values shown: thrA 1.1, 6.7; metL 1.8, 1.8)
51 Multiple mutations per gene
Correlation between two selection experiments
Badarinarayana et al. (2001) Nature Biotech. 19: 1060
52 Synthetic Mini-genomes
- 90kbp genome? All 3D structures known.
- Comprehensive functional data too.
- 100X faster replication (10 sec doubling)
- Selection to evolve widgets & systems?
- Utility of mirror-image & other unnatural polymers.
- Chassis & power supply
53 A 90 kbp mini-genome
54 The in vitro assembly (& 3D structure) of the prokaryotic ribosomes is known (e.g. Nomura et al.; Noller et al.)
55 All 30S ribosomal-protein DNAs & mRNAs synthesized in vitro
(Gel figure: lanes M, 1-21; rows: DNA Template, RNA Transcript)
Tian & Church
56 His-tagged ribosomal proteins synthesized in vitro
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 as original constructs. RS1 required deletion of a feedback motif in the mRNA. RS-3, 7, 8, 11, 14, 18, 19, 20 are still weakly expressed. Note that S1, S4, S7, S8, S20, L1, L4, L10 are known to repress their own translation (and are likely titrated by rRNA).
In progress: resynthesize all genes with less structure.
Tian & Church
57 David Goodsell
58 Biosystems: Integrating Measures & Models
Environment
Metabolites
DNA
Proteins
RNA
interactions
Microbes; Cancer & stem cells; In vitro replication; multicellular organisms