Title: How the Genome and the Computer have Changed Biological Science
David Botstein
NIGMS
Lewis-Sigler Institute for Integrative Genomics, Princeton University
2. Genomics is a new science; the concept of genome is old
Genome (Oxford English Dictionary, 2nd Edition): A haploid set of chromosomes; the sum-total of the genes in such a set. [a. G. genom (H. Winkler, Verbreitung u. Ursache d. Parthenogenesis (1920) iv. 165), irreg. f. gen GENE¹ + chromosom CHROMOSOME.]
The OED gives citations for the use of this word in 1930, 1932, 1965 and 1970.
How far understanding of the genome has progressed is illustrated by the 1970 OED citation: "1970 Sci. Amer. Oct. 19/1 The human genome .. consists of perhaps as many as 10 million genes."
3. Origins of Genomics I
During the first half-century of genetics (ca. 1900-1960), the emphasis was on the mechanics of inheritance:
How genetic information is stored and transmitted: mutations, genes, chromosomes, mitosis, meiosis, recombination, mutagenesis, DNA, DNA transfer.
How genetic information is expressed: transcription, translation, genetic code, regulation, protein structure-function, macromolecular assembly, protein localization.
The central dogma: genetic information flows from DNA → RNA → Protein.
4. Origins of Genomics II
In the 1960s, the emphasis began to shift toward a more ambitious goal: to understand how each and every gene of an organism contributes to its biology.
The first programs explicitly aimed at finding all the genes were undertaken with bacteriophages, which contain ca. 100 genes. These programs depended on the isolation and study of conditional-lethal mutants.
The phages were attractive because of their size and simplicity. They were the first organisms to be well characterized from the point of view of the then-novel science called molecular biology.
5. Cold Spring Harbor Symp. Quant. Biol. (1963)
6. Origins of Genomics III
Surprisingly soon (mid-1960s), this program was applied to simple eukaryotic organisms: first yeast and worms (C. elegans) and, later, in a more focused way, flies (D. melanogaster) and plants (A. thaliana).
These organisms were chosen because they are fast-growing and easily studied by genetic as well as biochemical methods. They also have small genomes.
The founders of these new research communities successfully adapted not only the scientific ideas but also the cooperative spirit that had animated the phage group. They put much effort into teaching.
These became the model organisms, whose genomes were sequenced first.
7. Experimental Design: Two-color Fluorescent Hybridization
Reverse-transcribe each sample using a different fluorescent nucleotide (Cy3 or Cy5).
8. Extracting Data
(figure: Cy3 and Cy5 channels)
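One common way to reduce a spot's two channels to a single number is a background-subtracted log2 ratio. The sketch below is illustrative only. It assumes Cy5 marks the sample and Cy3 the reference, which is the labeling used in the phosphate experiments later in this talk. The function name `log2_ratio`, the background arguments and the intensity floor are hypothetical choices, not the lab's actual pipeline.

```python
import math


def log2_ratio(cy5, cy3, cy5_bg=0.0, cy3_bg=0.0, floor=1.0):
    """Background-subtracted log2(Cy5/Cy3) for one spot.

    After background subtraction, intensities below `floor` (a hypothetical
    choice) are clamped to `floor` so the ratio stays finite.
    """
    sample = max(cy5 - cy5_bg, floor)      # Cy5 channel: sample (assumed)
    reference = max(cy3 - cy3_bg, floor)   # Cy3 channel: reference (assumed)
    return math.log2(sample / reference)


# A spot four times brighter in Cy5 than in Cy3 after background subtraction.
print(log2_ratio(4200.0, 1100.0, cy5_bg=200.0, cy3_bg=100.0))  # prints 2.0
```

Working in log2 makes induction and repression symmetric: 4-fold up is +2 and 4-fold down is -2.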
9. The Display Connects Expression Data with Biology
10. Goals of DNA Microarray Research
Biological Insights and Discoveries. To this end:
Design experiments that take advantage of the ability to measure gene expression on a genomic scale.
Emphasize patterns of gene expression over individual measurements. Correlated patterns are more robust than individual measurements.
Emphasize hypothesis generation over hypothesis testing, although tests involving patterns can be very powerful.
Emphasize methods that allow ordinary, non-mathematical biologists to browse and manipulate all the data, not just selected subsets.
Emphasize cumulative methods wherever possible: comparison of gene expression patterns, like comparison of DNA sequences, is useful per se.
11. The Intellectual Impact of the Genomic View
The grand unification of biology: all the functional parts of all living things are related by lineage. Despite the diversity, the fundamental biological mechanisms must also ultimately be related.
"Once we understand the biology of E. coli, we will understand the biology of the elephant." (Jacques Monod, ca. 1960)
The challenge for the future is to understand not just mechanisms at the level of individual processes, but also the interactions among all the processes and their mechanisms.
Genomics makes possible experiments and analysis at the systems level. Because of the huge combinatorial possibilities for interactions, this means not just highly parallel experimental methods but also computation-intensive analysis.
12. System-Level Diagram for Regulation in Bacteriophage λ
Diagram by Ira Herskowitz, ca. 1975
13. Environmental Stress Response
Gasch et al., 2000
14. Jacques Monod and Leo Szilard at Cold Spring Harbor
15. The Chemostat: Continuous Culture at Steady State
Rate-limiting nutrient: altering its concentration in the fresh medium input changes the cell density in the culture vessel.
16. Theory of the Chemostat
- The combined relation (growth minus dilution) predicts exponential growth or washout unless the terms balance.
Steady state is reached when µ = D.
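The balance behind this statement can be written as a short worked equation. The notation is standard but assumed here rather than taken from the slide: x is cell density, µ the specific growth rate, and D = F/V the dilution rate for a medium flow F into a vessel of volume V.

```latex
\frac{dx}{dt} = \underbrace{\mu x}_{\text{growth}} - \underbrace{D x}_{\text{dilution}} = (\mu - D)\,x
\quad\Longrightarrow\quad x(t) = x(0)\,e^{(\mu - D)\,t}
```

If µ > D the density grows exponentially, if µ < D the culture washes out, and a nonzero steady state (dx/dt = 0) requires µ = D.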
17. Conditions for Steady State
- The culture runs continuously, at a constant flow rate set by the experimenter, producing a known dilution rate (D).
- The medium has one known limiting nutrient (S); all others are in excess.
- The dilution rate (D) is less than the maximal growth rate of the organism (µmax).
18. Growth rate accommodates to the environment
Bacteria and yeast will reach steady state for any D < µmax under virtually any kind of nutrient limitation regime, where µmax is the maximum growth rate in that regime.
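To make "reach steady state for any D < µmax" concrete, here is a minimal sketch. It assumes a Monod growth law µ(s) = µmax·s/(Ks + s) and a constant yield, neither of which the slides specify. The class name and all parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MonodChemostat:
    """Steady state of a chemostat under an assumed Monod growth law.

    mu(s) = mu_max * s / (ks + s); every parameter value is hypothetical.
    """
    mu_max: float   # maximal specific growth rate (1/h)
    ks: float       # half-saturation constant for the limiting nutrient (mM)
    yield_: float   # cells/mL produced per mM of limiting nutrient consumed
    s_feed: float   # limiting-nutrient concentration in the fresh medium (mM)

    def steady_state(self, d: float) -> Optional[Tuple[float, float]]:
        """Return (residual nutrient s*, cell density x*) at dilution rate d,
        or None if the culture washes out."""
        if d >= self.mu_max:
            return None  # steady state needs D < mu_max
        s_star = self.ks * d / (self.mu_max - d)  # solves mu(s*) = D
        if s_star >= self.s_feed:
            return None  # the feed cannot support growth this fast
        return s_star, self.yield_ * (self.s_feed - s_star)


culture = MonodChemostat(mu_max=0.4, ks=0.1, yield_=1e7, s_feed=1.0)
print(culture.steady_state(0.2))  # roughly (0.1, 9e6)
print(culture.steady_state(0.5))  # None: D exceeds mu_max
```

In this model s* depends only on D, while x* depends on the feed concentration. Raising s_feed therefore raises the steady-state density without changing the growth rate, in line with the chemostat slides.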
19. Low potassium limits steady-state biomass
(figure: steady-state cell number (millions of cells per mL) and residual phosphate (mM) vs. K+ (mM))
20. How low is low potassium?
Medium            K+
Sea water         0.5 mM
Low K media       1.3 mM
Blood             3.7 mM
Normal K media    13.0 mM
21. Transcriptional Response to Low Potassium
(figure: steady-state cell number (millions of cells per mL) and fold change in expression vs. K+ (mM))
22. Transcription arrays implicate altered nitrogen metabolism in low potassium media
Genes that change > 3-fold at 1.3 mM K+:
Gene   Fold   Transports
GNP1   12     gln
DIP5   11     glu, asp
TAT2   10     trp, tyr
AGP1   7      asn, gln, others
PTR2   6      di- and tripeptides
BAP2   5      leu, ile, val
MUP1   4      met, cys

Gene   Fold   Transports
MEP1   5      ammonium
MEP2   30     ammonium
GAP1   30     amino acids
23. Ammonium ion is toxic in low potassium media
(figure: steady-state cell number (millions of cells per mL) vs. NH4+ (mM) at the indicated K+)
24. Ammonium is required for nitrogen toxicity in low potassium media
(figure: steady-state cell number (Klett units) by nitrogen source)
25. Amino Acids are Excreted into the Medium when the growth medium contains low potassium and high ammonium
(figure: extracellular amino acid nitrogen (nmoles per million cells) at the indicated K+ and NH4+)
26. Hypothesis: Ammonium Ions are Transported by Potassium Channels
Ion     Ionic radius
K+      1.33 Å
NH4+    1.44 Å
(diagram labels: NH4+ and K+; K+ channel; amino acids; SPS amino acid transporters)
27. Constitutive Expression of Ammonium Ion Transporters is Lethal
Genes (MEP1, 2, or 3) encoding NH4+ transporters were fused to the GAL1 promoter, which is expressed in galactose but not in glucose media.
(figure: growth on glucose vs. galactose; strains: wild-type parent, MEP1 (high flux), MEP2 (low flux), MEP3 (high flux), Yil028w (control))
28. Constitutive Expression of the MEP Genes Causes Lethality Only in Media Containing Ammonium Ion
(figure: growth on galactose + asparagine vs. galactose + NH4+; strains: wild-type parent, MEP1 (high flux), MEP2 (low flux), MEP3 (high flux), Yil028w (control))
Constitutive MEP expression is not lethal when asparagine, rather than ammonium ion, is the nitrogen source.
29. Overexpression of the MEP genes causes amino acid excretion in high potassium
(figure: extracellular amino acid nitrogen (nmoles per million cells))
30. Conclusions
Under conditions of limiting potassium and substantial concentrations of ammonium ions, the latter become toxic because they can enter the cell through the potassium transporter(s), possibly because NH4+ and K+ ions have similar ionic radii. Constitutive over-expression of the MEP family of NH4+ permeases causes ammonium ion toxicity, triggering amino acid excretion, even in the presence of excess potassium. So ammonium toxicity itself is independent of potassium limitation. Thus, like other eukaryotes, yeast appears to have a mechanism to detoxify excess ammonia nitrogen. Unlike the higher eukaryotes, yeast do this directly, by excreting amino acids.
31. Fundamental Questions about Physiology in the Chemostat
- What is the relation between chemostat and batch culture?
- Is prolonged growth under nutrient limitation a fundamentally different cellular state from batch growth?
- Are cells in the chemostat undergoing a stress response?
32. Limiting Phosphate Batch Growth Curve
(figure: cell counts (Coulter counter))
Saldanha et al., 2004
33. Cell Morphology
As the cells run out of phosphate, they arrest as unbudded cells (G0/G1).
(figure: light microscopy of unbudded, small-bud and large-bud cells)
34. Gene expression during batch growth in a defined limiting nutrient
(figure panels: chemostat, batch)
35. Overview of Global Expression in Phosphate Limitation
(figure labels: 910, 920)
mRNA was collected at intervals during the two batch experiments and labeled with Cy5. mRNA from chemostat-grown cultures, labeled with Cy3, was the reference for all the arrays.
Batch and chemostat grown cultures are similar at the moment in batch growth when the limiting nutrient is exhausted.
36. Average Difference in Gene Expression between Batch and Chemostat
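Because every array uses the chemostat mRNA as its Cy3 reference, each batch time point's log2(Cy5/Cy3) values already measure its difference from the chemostat. One simple summary is the mean absolute log ratio across genes: the time point with the smallest value looks most chemostat-like. The minimal sketch below is an assumption about how such an average might be computed, not the published analysis. The time-point labels and numbers are made up.

```python
from statistics import mean


def average_difference(log_ratios_by_timepoint):
    """Mean |log2(batch/chemostat)| across genes for each batch time point.

    Input maps a time-point label to per-gene log2(Cy5/Cy3) values, with the
    chemostat sample as the common Cy3 reference; values near 0 mean the
    batch culture resembles the chemostat genome-wide.
    """
    return {t: mean(abs(v) for v in values)
            for t, values in log_ratios_by_timepoint.items()}


def most_chemostat_like(log_ratios_by_timepoint):
    """Label of the time point with the smallest average difference."""
    diffs = average_difference(log_ratios_by_timepoint)
    return min(diffs, key=diffs.get)


# Toy, made-up log ratios for three genes at three batch time points.
toy = {
    "t1": [1.5, -1.2, 0.8],
    "t2": [0.1, -0.2, 0.0],
    "t3": [-2.0, 2.4, 1.1],
}
print(most_chemostat_like(toy))  # prints t2
```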
37. Conclusions (Saldanha et al., 2004)
Cells growing exponentially at steady state in the chemostat are in virtually the same state, as measured by the genome-wide pattern of gene expression, as cells in batch culture just before the phosphate is exhausted.
Specifically:
Minimal stress response before the nutrient is exhausted
No sign of cell cycle arrest before the nutrient is exhausted
So despite the growth limitation in chemostats, cells are in balanced growth.
Cells are poor, not starving.
38. Morphological Consequences of Diverse Starvation Regimes
Natural limitations → evolved, organized response, including cell cycle arrest, that is absent when auxotrophs starve.
39. Getting to the System Level with Genomic Tools
Homeostasis: How do cells manage to adapt to so many unpredictable changes in conditions? Which responses are generic, and which are specific?
Perturbations: Well-defined, often transient, changes in conditions should reveal the regulatory logic that maintains homeostasis at the module or system level.
Experimental evolution: Subjecting populations to a well-defined, constant selective pressure. The range of evolved responses should be constrained by the organization of the functional and regulatory systems.
40. Environmental Stress Response or Growth Rate Response?
Two points of view:
A. Nutritional depletion and starvation are treated like "true" stresses, such as heat shock, osmotic shock and oxidative damage.
B. The "true" stresses cause a decrease in growth rate, which triggers a secondary "growth rate response".
41. Dilution Rate Series
Experimental design: Run chemostats with a variety of limiting nutrients (glucose, phosphate, sulfate, ammonia and, in suitable auxotrophs, uracil or leucine) at a number of different growth rates (µ) by setting the dilution rate (D).
Measure:
1. Bud index (fraction of unbudded cells, i.e. in G0/G1) and DNA content (FACS).
2. Nutritional parameters (residual glucose, ethanol, etc.).
3. Gene expression (DNA microarrays).
42. Fraction of Unbudded Cells as a Function of Growth Rate in Chemostats Limited by Diverse Nutrients
(figure: fraction of unbudded cells vs. growth rate (µ) = dilution rate (D), from slower to faster, for glucose, leucine, uracil, ammonia, phosphate and sulfate limitation)
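43. Hierarchical Clustering of Dilution-Rate Series Expression Data
(figure: nutrient-specific clusters labeled Glucose 1, Leucine, Phosphate, Sulfate, Glucose 2, Nitrogen, Glucose 3, 4; conditions G, N, P, S, L, U for glucose, nitrogen (ammonium), phosphate, sulfate, leucine and uracil limitation)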
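The slide does not state the clustering settings. A common choice for expression profiles is agglomerative clustering with average linkage on the distance 1 - Pearson r. The pure-Python sketch below implements that technique; it is not necessarily how this figure was made. The gene names and values are made up.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - r and average linkage.

    `profiles` maps gene name -> expression values across conditions.
    Returns merges as (cluster_a, cluster_b, distance) in merge order,
    where each cluster is a frozenset of gene names.
    """
    dist = {frozenset(pair): 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}

    def linkage(c1, c2):
        # Average of all gene-gene distances between the two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Made-up profiles: A and B rise with growth rate, C and D fall.
profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1], "D": [9, 6, 4, 1]}
for a, b, d in average_linkage(profiles):
    print(sorted(a), sorted(b), round(d, 3))
```

Correlation-based distance groups genes by the shape of their response across conditions rather than by absolute level, which suits log-ratio data.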
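44. Average expression of limitation-specific clusters
(figure: average expression of the glucose-, NH4+-, phosphate- and sulfate-specific clusters across conditions G, N, P, S, L, U, each from slower to faster growth)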
45. Singular Value Decomposition Analysis
(figure: eigengene 1 across conditions: ammonium, phosphate, uracil, glucose, leucine, sulfate)
Alter et al., 2000
46. Several Orthogonal Eigengenes are Still Correlated with Growth Rate
Eigengene   Information (%)
1           44
2           18
3           8
4           6
(figure: eigengenes across conditions G, N, P, S, L, U, from slower to faster growth)
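The eigengene analysis can be sketched with NumPy's SVD. This sketch assumes that "information" means each eigengene's fraction of eigenexpression, s_k² / Σ s², in the spirit of Alter et al. (2000). It also assumes row-centring as preprocessing. Neither assumption is confirmed by the slides. The data are synthetic: one growth-rate pattern plus noise.

```python
import numpy as np


def eigengenes(expr):
    """SVD of a genes x conditions matrix of log2 expression ratios.

    Each gene's row is mean-centred first (an assumed preprocessing step).
    Returns (vt, fraction): row k of vt is eigengene k, a pattern across
    conditions, and fraction[k] = s_k**2 / sum(s**2).
    """
    centred = expr - expr.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    return vt, s**2 / np.sum(s**2)


# Synthetic data: 200 genes whose expression tracks one growth-rate pattern
# across 6 chemostats (slower -> faster), plus noise. Not data from the talk.
rng = np.random.default_rng(0)
growth = np.linspace(-1.0, 1.0, 6)
expr = rng.normal(size=(200, 1)) * growth + 0.1 * rng.normal(size=(200, 6))
vt, fraction = eigengenes(expr)
print(np.round(fraction[:3], 3))
print(abs(np.corrcoef(vt[0], growth)[0, 1]))  # close to 1
```

The sign of an eigengene is arbitrary in SVD, so the correlation is compared in absolute value.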
47. PISA (Progressive Iterative Signature Algorithm) Modules
(figure panels: correlations with growth rate; module cross-correlations)
Kloster, Tang and Wingreen, 2004
48. The growth rate signal is apparent in the expression of many genes
49. Distribution of Slopes
50. The stress response
Gasch et al. (2000)
51. Distribution of Slopes
52. Phasogram
800 periodically expressed genes, identified by a Fourier algorithm and sorted by phase
Spellman et al., 1998
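Sorting genes "by phase" can be illustrated with a single-frequency Fourier projection. The sketch assumes a known cycle period and evenly spaced samples covering whole cycles. It is a simplified stand-in, not the exact algorithm of Spellman et al. The sampling times, the 60-minute period and the gene names are hypothetical.

```python
import math


def fourier_phase(values, times, period):
    """Single-frequency Fourier projection of one gene's time course.

    Returns (score, phase_degrees): score is the magnitude of the projection
    onto cos/sin at the assumed period; phase is the angle of peak
    expression within the cycle (0-360 degrees).
    """
    omega = 2 * math.pi / period
    mean = sum(values) / len(values)
    a = sum((v - mean) * math.cos(omega * t) for v, t in zip(values, times))
    b = sum((v - mean) * math.sin(omega * t) for v, t in zip(values, times))
    return math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360


# Hypothetical: 24 samples every 5 min, assumed cycle period 60 min.
times = list(range(0, 120, 5))
omega = 2 * math.pi / 60
genes = {
    "early": [math.cos(omega * t - math.radians(30)) for t in times],
    "late": [math.cos(omega * t - math.radians(250)) for t in times],
    "mid": [math.cos(omega * t - math.radians(120)) for t in times],
}
phases = {g: fourier_phase(v, times, 60)[1] for g, v in genes.items()}
print(sorted(phases, key=phases.get))  # phasogram order: early, mid, late
```

In a real screen the score would be used first to pick periodic genes. The phase is then used only to order them.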
53. Distribution of Slopes of Cell-Cycle-Regulated Genes
54. Distribution of Slopes of Cell-Cycle-Regulated Genes
55. Genes ranked by response to growth rate
(figure: top and bottom of the ranked list)
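The slides do not define "slope". A minimal reading is the per-gene least-squares slope of log2 expression against growth rate across the dilution-rate series. Ranking genes by that slope gives top and bottom lists. A histogram of all slopes gives a "distribution of slopes". This is an illustrative sketch under that assumption. The gene names and values are hypothetical.

```python
def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def rank_by_growth_rate(growth_rates, expression):
    """Sort genes from most positive to most negative slope versus growth rate.

    `expression` maps gene -> log2 expression at each growth rate.
    """
    slopes = {g: slope(growth_rates, ys) for g, ys in expression.items()}
    return sorted(slopes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical growth rates (1/h) and log2 expression for three genes.
rates = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
expression = {
    "up": [10 * r for r in rates],
    "flat": [1.0] * len(rates),
    "down": [1 - 8 * r for r in rates],
}
ranked = rank_by_growth_rate(rates, expression)
print([g for g, _ in ranked])  # ['up', 'flat', 'down']
```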
56. Stress response genes are highly growth-rate correlated
(figure: top and bottom of the ranked list)
57. Morphological Consequences of Diverse Starvation Regimes (repeat of slide 38)
58. Fraction of Unbudded Cells as a Function of Growth Rate in Chemostats Limited by Diverse Nutrients (repeat of slide 42)
60. Warburg Effect in Yeast?
(figure: glucose utilization in nutrient-limited batch growth)
Viktor Boer
61. Survival During Starvation of a Depends on Which Nutrient is Missing
(figure: leucine or uracil missing)
62. Conclusions (still a work in progress)
There is a "growth rate response" that is distinguishable from the "environmental stress response"; the latter is seen only in actual starvation.
The growth rate response includes a regulatory mechanism that continuously limits entry into the cell cycle regardless of the nature of the nutrient limitation. This mechanism is distinguishable from the one that causes accumulation of cells in G0/G1 during starvation (as opposed to limitation) for phosphate or sulfate.
The growth rate response includes a regulatory mechanism that limits fermentation of glucose at low growth rates. Like the cell cycle response to starvation, this mechanism works for "natural" limitations but not for limitations caused by nutrients that satisfy an auxotrophic requirement.
The uncontrolled fermentation in nutrient-limited auxotrophs may be analogous to the Warburg Effect seen in animal tumor cells.
63. The Chemostat: Continuous Exponential Growth at Steady State
Rate-limiting nutrient: altering its concentration in the fresh medium input changes the cell density in the culture vessel.
Primary selection is for organisms better able to use the rate-limiting nutrient.
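The selection point can be illustrated with an assumed Monod model, the same growth law used in classical chemostat theory but not given in these slides. Two hypothetical strains differ only in their half-saturation constant Ks. The higher-affinity strain (lower Ks) holds the residual nutrient below the level the other strain needs to keep up with D, so the other strain washes out. The strain parameters, inoculum and integration settings are all assumptions.

```python
def compete(d, s_feed, strains, hours=1500.0, dt=0.02):
    """Euler-integrate strains competing for one limiting nutrient in a chemostat.

    Each strain is (mu_max, ks, yield_) under an assumed Monod growth law;
    all parameter values are hypothetical. Returns final cell densities.
    """
    s = s_feed
    x = [1.0e3] * len(strains)  # equal small inocula
    for _ in range(int(hours / dt)):
        mus = [mu_max * s / (ks + s) for mu_max, ks, _ in strains]
        uptake = sum(mu * xi / y for mu, xi, (_, _, y) in zip(mus, x, strains))
        s = max(s + dt * (d * (s_feed - s) - uptake), 0.0)      # nutrient balance
        x = [xi + dt * (mu - d) * xi for xi, mu in zip(x, mus)]  # growth minus dilution
    return x


wild = (0.4, 0.10, 1e7)    # hypothetical parent strain
better = (0.4, 0.05, 1e7)  # hypothetical variant with higher nutrient affinity
x_wild, x_better = compete(0.2, 1.0, [wild, better])
print(f"{x_wild:.3g} {x_better:.3g}")
```

At D = 0.2/h the variant settles near its own steady state, with residual nutrient 0.05 mM. At that nutrient level the parent grows at only about 0.13/h and declines steadily.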
64. Evolution in the Chemostat
Three cultures that evolved independently under glucose limitation show similar genome-wide changes in gene expression, as determined with DNA microarrays. Glucose is redirected from fermentation to respiration. Glucose uptake is increased.
Ferea et al., 1999
65. Microarray Analysis of Gene Expression in Three Independently Evolved Strains
(Ferea et al., 1999)
66. Genome-Wide Summary
Eight independently evolved diploid strains:
3 similar breaks on chromosome 14
3 amplifications on chromosome 4
3 overlapping deletions on chromosome 15
Dunham et al., 2003
67. Recurring copy number changes
Local amplifications (hexose transporters)
Local amplifications (sulfur transporter)
Maitreya Dunham
68. Affymetrix yeast tiling arrays provide complete and redundant coverage of the genome
Probes:
5'-CTGAATATGCATTGAAATAAGATCC
ATATGCATTGAAATAAGATCCAAAC
GCATTGAAATAAGATCCAAACAGCT
TGAAATAAGATCCAAACAGCTAAGA
ATAAGATCCAAACAGCTAAGAACAG
GATCCAAACAGCTAAGAACAGGAAA
Sample:
3'-GACTTATACGTAACTTTATTCTATGTTTGTCGATTCTTGTCCTTT
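The probes listed on this slide are 25-mers whose start positions step by 4 nt along the same 45-nt region. That overlap is what makes coverage redundant: each interior base is read by several probes. The sketch below regenerates the slide's probes from that region and counts per-base coverage. The function names are hypothetical, and real array designs may tile differently elsewhere.

```python
def tile_probes(seq, length=25, step=4):
    """Return (start, probe) pairs tiling seq with overlapping fixed-length probes."""
    return [(s, seq[s:s + length]) for s in range(0, len(seq) - length + 1, step)]


def coverage(seq_len, probes, length=25):
    """Number of probes covering each position of the tiled sequence."""
    depth = [0] * seq_len
    for start, _ in probes:
        for i in range(start, start + length):
            depth[i] += 1
    return depth


# The 45-nt region spanned by the probes shown on the slide.
REGION = "CTGAATATGCATTGAAATAAGATCCAAACAGCTAAGAACAGGAAA"
probes = tile_probes(REGION)
for start, probe in probes:
    print(start, probe)
depth = coverage(len(REGION), probes)
print(max(depth))  # bases 20-24 are covered by all 6 probes
```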
69. Comparison of two genomes to model the decrease in hybridization due to SNPs
- Polymorphic strain: RM11-1A
  - High-quality sequence of a wild strain
  - 24,848 isolated SNPs overlapped by 123,016 probes
  - Hybridization intensities reflect the effect of mismatch on maximal binding
- Nonpolymorphic strain: S288C
  - Reference sequence represented on the array
  - Hybridization intensities reflect maximal binding of complementary DNA
70. Hybridization decrease in the presence of a SNP is related to its position within the probe
71. Derivation of the SNPscanner Algorithm
(figure: experimental observation x_i)
72. Sensitivity and Specificity of Mutation Detection
Analysis of the sequenced strain identifies >90% of 30,690 known SNPs.
(figure: true positives vs. false positives at prediction-score thresholds 0, 1, 3, 5 and 7; red numbers are prediction scores (log-likelihood ratios L_k))
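The idea of scoring a candidate site with a log-likelihood ratio L_k from probe observations x_i can be sketched as follows. This is explicitly not the published SNPscanner model. It assumes Gaussian noise on each x_i = log2(test/reference). It also assumes a made-up, position-dependent expected drop that is largest when the mismatch sits near the probe centre, echoing the observation on slide 70. All names, the noise level and the drop profile are hypothetical.

```python
import math

PROBE_LEN = 25


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def expected_drop(offset, probe_len=PROBE_LEN, max_drop=-1.5):
    """Hypothetical expected log2 intensity change for a mismatch at a 0-based
    offset within a probe: largest near the centre, weaker toward the ends."""
    centre = (probe_len - 1) / 2
    return max_drop * (1 - abs(offset - centre) / (centre + 1))


def snp_llr(position, probe_starts, observations, probe_len=PROBE_LEN, sd=0.3):
    """Log-likelihood ratio L_k for 'SNP at position' versus 'no SNP'.

    observations[i] is x_i = log2(test / reference intensity) for the probe
    that starts at probe_starts[i]; only probes overlapping the position count.
    """
    llr = 0.0
    for start, x in zip(probe_starts, observations):
        offset = position - start
        if 0 <= offset < probe_len:
            llr += (normal_logpdf(x, expected_drop(offset, probe_len), sd)
                    - normal_logpdf(x, 0.0, sd))
    return llr


# Noise-free toy data: probes every 4 nt, true SNP at position 22.
starts = [0, 4, 8, 12, 16, 20]
obs = [expected_drop(22 - s) for s in starts]
scores = {p: snp_llr(p, starts, obs) for p in range(45)}
print(max(scores, key=scores.get))  # 22
```

Thresholding such scores at different values trades true positives against false positives, which is what the slide's curve summarizes.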
73. Genome-wide mapping of polymorphisms at nucleotide resolution
Probe set 1: CTGAATATGCATTGAAATAAGATCC, ATATGCATTGAAATAAGATCCAAAC, GCATTGAAATAAGATCCAAACAGCT, TGAAATAAGATCCAAACAGCTAAGA, ATAAGATCCAAACAGCTAAGAACAG, GATCCAAACAGCTAAGAACAGGAAA
Sample 1: GACTTATACGTAACTTTATTCTATGTTTGTCGATTCTTGTCCTTT
Probe set 2: TAAATCTGATGTGCGAGATTGAGAT, TCTGATGTGCGAGATTGAGATAAAT, ATGTGCGAGATTGAGATAAATAACC, GCGAGATTGAGATAAATAACCATGC, GATTGAGATAAATAACCATGCAAAA, GAGATAAATAACCATGCAAAAAAGC
Sample 2: ATTTAGACTACACGCTCTAAGTCTATTTATTGGTACGTTTTTTCG
Probe set 3: TACACTAAGTTCCAGGGCAAAAGTG, CTAAGTTCCAGGGCAAAAGTGATTG, GTTCCAGGGCAAAAGTGATTGCCCA, CAGGGCAAAAGTGATTGCCCAAGAA, GCAAAAGTGATTGCCCAAGAAAACC
Sample 3: ATGTGATTCAAGGTCCCG-TTTCACTAACGGGTTCTTTTGG
Probe set 4: GTATATTAGAAACCCGATAATGGCT, ACCCGATAATGGCTAAAACTTTGAT, GATAATGGCTAAAACTTTGATGGAA, ATGGCTAAAACTTTGATGGAAGCGA, CTAAAACTTTGATGGAAGCGACCCA
Sample 4: CATATAATCTTTGGGCTATTACCCATTTTGAAACTACCTTCGCTGGGT
Polymorphisms: 32758 G→C, 32064 C→G, 31844 G→T, 32924 →T
D. Gresham and Kruglyak laboratory
74. Spontaneous Deletion in GAP1 Detected with a Tiling Array
513197-517103 (3,906 bp)
AGATAATTGTTGGAT
Δ(513193-513209)-(517115-517131) (3,922 bp)
75. Loss of GAP1 via LTR-LTR recombination is common
76. Finding can1 Mutations Among the Thousands of SNPs that Differ Between Strains S288C and CEN.PK
Independent CanR mutations were isolated in CEN.PK and analyzed with tiling arrays and SNPscanner.
can1 mutations in black, polymorphisms in red:
32119 G→T, 33346 G→A, 33002 T→G, 33169 G→A, 32811 →C, 32077 T→C, 32487 G→T, 32195 C→A, 32304 C→A, 32842 T, 32580 C→G, 31867 A→G
77. Selection of spontaneous mutants is associated with 1-2 additional sequence-verified SNPs
78. What is the mutational cost of transformation?
One additional SNP predicted genome-wide!
(i.e., no more than, and maybe fewer than, in a spontaneous mutant)
79. CEN.PK, an unsequenced strain, has high sequence identity with S288C
(figure: chrI)
80. Estimating the extent of nucleotide variation accumulation associated with adaptive evolution (work in progress)
Maitreya Dunham followed a haploid yeast strain (CEN.PK) in a sulfate-limited chemostat for 123 generations. She isolated clones at 63 and 123 generations. These clones showed amplification of a region that includes SUL1, which encodes a high-affinity sulfate transporter. The clones were examined for mutations on the tiling array, and the results were analyzed with SNPscanner. Since CEN.PK is not identical to the sequenced yeast strain, S288C, any mutations must be detected against the background of all the polymorphisms that distinguish CEN.PK from S288C.
The preliminary conclusion is that there are surprisingly few point mutations (on the order of 10) in the evolved strains.
81. SNPs associated with "evolved" strains
Haploid and diploid strains were grown under glucose limitation for about 200 generations. Clones isolated from these cultures contained genomic rearrangements but, again, remarkably few additional point mutations.
82. Acknowledgments
Alok Saldanha (phosphate and sulfate limitations in batch and chemostat)
Matt Brauer (dilution rate series in chemostat)
Rachel Rosenstein (dilution rate series in chemostat)
Viktor Boer (glucose wasting and survival after starvation)
David Hess (potassium limitation and ammonium ion detoxification)
Maitreya Dunham (evolution in the chemostat)
David Gresham (mutation detection)
Douglas Ruderfer, Stephen Pratt, Joseph Schacherer (SNPscanner analysis)
Leonid Kruglyak, Joshua Rabinowitz, Ned Wingreen