Title: Many genes, but what do they do?
1 Many genes, but what do they do?
[Figure: estimated number of genes per genome: 6000, 12,000, 19,000, 35,000?]
2 Genome sequence enables systematic approaches
[Diagram: genome sequence → experimental reagents (e.g. microarrays, knockout libraries) → experimental data; genome/proteome → experimental data (e.g. protein complex mass-spec); genome/proteome → computationally derived hypothesis → experimental test]
3 Systematic/high-throughput studies
- Reagents/libraries (experimental resource for the scientific community):
  - Yeast KO library
  - C. elegans RNAi library
  - Yeast C-terminal ORF-GFP fusion library
- Datasets (informational resource for the scientific community):
  - Microarrays
  - Websites/databases
5 High-throughput studies
False negative: failure to observe a phenotype, protein-protein interaction, etc. that normally occurs in the organism under study.
False positive: observation of a phenotype, protein-protein interaction, etc. that does not occur in the organism under study.
Quality control and verification of the genomic reagents are important for reducing both false negatives and false positives. High-throughput data will always contain false negatives; ideally, the method and the resulting data have a low level of them. High-throughput data is most useful to the researcher and the community when false positives are low, because this decreases the likelihood of researchers chasing false results.
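The two error types above can be made concrete by comparing a screen's hits against a trusted reference set. This is a minimal sketch, assuming such a reference set exists (e.g. interactions confirmed by low-throughput follow-up); the gene names are hypothetical placeholders.

```python
def screen_error_rates(observed: set, true: set) -> dict:
    """Compare hits from a high-throughput screen with a trusted reference set.

    observed: hits reported by the screen
    true:     hits assumed to really occur (how this set is built is an assumption)
    """
    false_pos = observed - true   # reported, but does not occur in the organism
    false_neg = true - observed   # occurs in the organism, but was not reported
    return {
        "false_positive_fraction": len(false_pos) / len(observed) if observed else 0.0,
        "false_negative_fraction": len(false_neg) / len(true) if true else 0.0,
    }


hits = {"geneA", "geneB", "geneC", "geneD"}   # hypothetical screen output
confirmed = {"geneA", "geneB", "geneE"}       # hypothetical reference set
print(screen_error_rates(hits, confirmed))
```

Here half of the reported hits are false positives and one of three real hits is missed.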
8 Gene function (gene product function)
In what biological processes does the gene
product act? What is the biochemical function of
the gene product? Where and when does the gene
product function?
9 Biological process (systems biology)
To execute the biological process, what are all the gene products that are necessary? How do all the gene products act together in a pathway/network/machine to execute the biological process?
10 Approaches to function
One fundamental approach to understanding gene function is to examine the cellular/organismal phenotype following loss of gene function.
11 Gene loss-of-function (gene inactivation) is achieved by:
- Deletion
- Point mutation (nonsense or missense)
- RNA-mediated interference (RNAi)
- Transposable element insertion
It is preferable if gene activity is completely eliminated (null).
12 The loss-of-function phenotype allows one to infer wild-type gene function. The wild-type function is what is necessary to correct the observed phenotype (the opposite of the mutant phenotype).
- A CDC28 mutant in yeast has a G2 cell cycle arrest phenotype; therefore, the wild-type function of the CDC28 gene is to promote G2 cell cycle progression.
- A ced-3 mutant in C. elegans is defective in apoptosis; therefore, the wild-type function of the ced-3 gene is to promote apoptosis.
The greater the depth of phenotypic characterization, the greater the specificity of the definition of wild-type function.
13 High-throughput/systematic genetic analysis of gene function
- S. cerevisiae (yeast) deletion collection
- C. elegans RNAi screens
These provide first-pass functional information on almost all genes in the respective genomes. More importantly, the yeast deletion collection and the C. elegans RNAi feeding bacteria library represent lasting resources that can be used by the community. Since the community will use the same resource, data will be more comparable from lab to lab than with de novo efforts by individual labs (e.g. because of the isogenic strain background).
14 High-throughput genetic analysis of gene function
- For first-pass functional information, the phenotypic analysis must be fast and simple:
  - Static visible phenotypes (e.g. alive or dead)
- More complex phenotypic screening that provides more functional information is less amenable to high-throughput analysis:
  - Quantitative phenotypes (growth rates, enzyme activities, life span)
  - Real-time visible phenotypes (microscopy)
  - Molecular marker phenotypes (presence/absence and temporal and spatial distribution of a gene product or metabolite), either static (e.g. antibody staining) or real-time (e.g. GFP-tagged protein)
15 Biology is complex
- Many gene products act at multiple times or continuously.
  - Cdc2 kinase acts at multiple points in the cell cycle.
  - Notch receptor signaling acts in multiple developmental decisions (sequentially and contemporaneously).
- Some (many?) gene products have diverse biochemical functions.
  - Yeast and mammalian mitochondrial transcription factor B is both a transcription factor and an adenine methyltransferase.
  - Hexokinase is both an enzyme and a glucose sensor that binds intracellular molecules to report phosphoglucose levels.
Such complexities are currently beyond high-throughput genetic/functional methods.
17 S. cerevisiae knockout collection (Winzeler et al., Science (1999) 285:901-906; Giaever et al., Nature (2002) 418:387-391)
- Deletions were constructed in 5,916 genes (96.5% of the total).
- Deletions, from the start codon to the stop codon of each predicted gene, are generated by homologous recombination using oligonucleotide primers that target the recombination to the specific gene, with a KanR marker to select for the integration event.
- Each deletion was verified by several PCR assays (quality control).
- The strategy is feasible in yeast because of its highly efficient homologous recombination system and because yeast genes are small: most have no introns, and when there are introns they are few and small.
18 Gene knock-out by homologous recombination
[Figure: example locus with ORFs YAL069w and YAL068c]
Rothstein and Szostak, 1981
19 [Figure: a KanR selectable marker is PCR-amplified with primers carrying 45 nt upstream and 45 nt downstream of the target gene]
Selectable marker flanked by sequences flanking the target gene
Baudin et al., 1993
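The PCR-based deletion strategy on slides 17 and 19 comes down to building two chimeric primers: 45 nt of genomic homology at the 5' end, followed by sequence that primes on the marker cassette. This is a minimal sketch, assuming a top-strand genome string, 0-based ORF coordinates, and hypothetical placeholder cassette-priming sequences (real kanMX primers are not given on the slide).

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def deletion_primers(genome: str, orf_start: int, orf_end: int,
                     fwd_anneal: str, rev_anneal: str,
                     flank: int = 45) -> tuple:
    """Return (forward, reverse) primers, both 5'->3', for start-to-stop replacement.

    orf_start: 0-based index of the first base of the start codon
    orf_end:   0-based index one past the last base of the stop codon
    fwd_anneal / rev_anneal: sequences priming on the marker cassette
                             (hypothetical placeholders in this sketch)
    """
    if orf_start < flank or orf_end + flank > len(genome):
        raise ValueError("not enough flanking sequence around the ORF")
    upstream = genome[orf_start - flank:orf_start]   # ends just before the ATG
    downstream = genome[orf_end:orf_end + flank]     # starts just after the stop codon
    forward = upstream + fwd_anneal
    # The reverse primer anneals to the top strand, so its homology tail is
    # the reverse complement of the downstream flank.
    reverse = reverse_complement(downstream) + rev_anneal
    return forward, reverse


# Toy locus: 50 nt upstream, a short ORF (ATG ... TAA), 50 nt downstream.
genome = "A" * 50 + "ATG" + "GCT" * 10 + "TAA" + "C" * 50
FWD_ANNEAL = "ACGTACGTACGTACGTAC"   # hypothetical cassette-priming sequence
REV_ANNEAL = "TGCATGCATGCATGCATG"   # hypothetical cassette-priming sequence
fwd, rev = deletion_primers(genome, 50, 86, FWD_ANNEAL, REV_ANNEAL)
print(fwd)
print(rev)
```

After PCR, the product is upstream flank + marker + downstream flank, which is what directs homologous recombination to replace the ORF.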
20 Summary of first-pass functional data
- 1,105 (19%) essential genes (growth in rich glucose media)
- 4,811 (81%) non-essential genes
- 8% of non-essential genes have a closely related homolog (p < 1e-150) elsewhere in the genome, while only 1% of the essential genes have a closely related homolog.
- Of the essential genes, 4% (15/356) were previously described as non-essential, and 0.2% (3/1620) of the non-essential genes were previously described as essential. (The different results are likely due to differences in strain or growth conditions.)
- Essential genes are more likely to have homologs in other organisms.
- Essential genes are more highly expressed than non-essential genes.
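The percentages on slide 20 follow directly from the counts, and the two gene classes sum to the 5,916 deleted genes reported on slide 17. A short arithmetic check:

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


essential, nonessential = 1105, 4811
total = essential + nonessential   # should equal the 5,916 deleted genes

print(f"total deleted genes: {total}")
print(f"essential:     {pct(essential, total):.1f}%")
print(f"non-essential: {pct(nonessential, total):.1f}%")
print(f"essential genes previously called non-essential: {pct(15, 356):.1f}%")
print(f"non-essential genes previously called essential: {pct(3, 1620):.1f}%")
```

This prints 18.7% and 81.3% (rounded to 19% and 81% on the slide), then 4.2% and 0.2%.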
21 How do we begin to understand the function of genes whose deletion is not lethal under optimal laboratory conditions?
- Such a gene may have a function under specialized growth conditions. Growth advantages or disadvantages can be revealed by competitive growth experiments over multiple generations.
- Each gene deletion is marked with two sequence-specific 20-base barcodes (UPTAG and DOWNTAG).
- The barcode scheme allows multiple deletion strains to be analyzed in parallel in competitive growth experiments.
22 Each deletion strain is tagged with two unique 20mers
[Figure: kanMX4 deletion cassette flanked by Tag 1 and Tag 2, added by PCR]
Ron Davis et al., Stanford
23 Two tags per deletion strain
[Figure: deletion strains and the list of 20mer tags]
1U. GATTCGATAGCCGGCAAGG 1D. AGGCTGCGAGAAGGCTCCG
2U. CGATTTAGGAATGTCATAG 2D. AGCGCTATTGCGAATGGCG
3U. AGCTCATACCTAGTAACTA 3D. GTGAGTGACTGAATGGTAG
...
6,200 U/D tag pairs, e.g. AGCTCATACCTAGTAACTA / CAAGGCGTAGGCTAGATCG
Ron Davis et al., Stanford
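Because thousands of tags are read in one pool, each tag must be distinguishable from all the others. One simple check is the minimum pairwise Hamming distance between tags. This is a sketch only: Hamming distance is an assumed, rough proxy for cross-hybridization risk, not the design criterion used for the real tag set. The sequences are the UPTAGs listed above, as transcribed.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags: list) -> int:
    """Smallest Hamming distance between any two tags; 0 means a duplicate tag."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


# UPTAGs 1U-3U as listed on the slide
uptags = ["GATTCGATAGCCGGCAAGG", "CGATTTAGGAATGTCATAG", "AGCTCATACCTAGTAACTA"]
print(min_pairwise_distance(uptags))
```

A low minimum distance would flag tag pairs at risk of being confused on the array.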
24 Detecting molecular tags in yeast pools
- PCR-amplify tags from pooled genomic DNA using fluorescently labeled primers.
- Hybridize the labeled tags to an oligonucleotide array containing tag complements.
Ron Davis et al., Stanford
25 Parallel analysis
[Figure: tag array signals before and after selection]
Ron Davis et al., Stanford
26 Parallel analysis of 558 mutants in one experiment
[Figure: growth in minimal media; tags at 0 hrs of growth in red, at 6 hrs of growth in green]
Winzeler et al., (1999) Science 285:793
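The before/after comparison on slides 25 and 26 can be summarized per strain as a log ratio of tag signal. This is a minimal sketch, assuming tag signals are already quantified per strain; the pseudocount and median-centring are illustrative choices rather than the published normalization, and the strain names are hypothetical.

```python
import math
import statistics


def tag_log2_ratios(before: dict, after: dict, pseudocount: float = 1.0) -> dict:
    """Median-centred log2(after/before) tag signal for each deletion strain.

    before: tag signal per strain at 0 h (hypothetical units)
    after:  tag signal per strain after growth (e.g. 6 h)
    A strongly negative value means the strain was depleted from the pool,
    i.e. it grew more slowly than most strains under the condition.
    """
    raw = {
        strain: math.log2((after.get(strain, 0.0) + pseudocount)
                          / (before[strain] + pseudocount))
        for strain in before
    }
    centre = statistics.median(raw.values())
    return {strain: value - centre for strain, value in raw.items()}


before = {"yfg1": 1000.0, "yfg2": 1000.0, "yfg3": 1000.0}   # hypothetical strains
after = {"yfg1": 2000.0, "yfg2": 2000.0, "yfg3": 250.0}
print(tag_log2_ratios(before, after, pseudocount=0.0))
```

Here yfg3 falls 3 log2 units behind the typical strain, which would mark it as having a growth defect in this condition.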
27 Comparison of expression and fitness profiling data
- Assumption: if a gene is expressed under a certain condition, then the gene is important for growth under that condition. Deletion of an up-regulated gene would therefore be expected to cause a growth defect in that condition.
- Example: growth in 1.0 M NaCl
  - Fitness determined by competitive growth experiment. Plotted on the Y-axis, from 0 (wild-type) to 1200 (>100 is a significant defect).
  - Expression profiling in an isogenic strain in media with or without 1.0 M NaCl. Plotted on the X-axis as the log ratio of expression (< -0.5 significant repression, > 0.5 significant induction).
28 Comparison of expression and fitness profiles
29 Little correlation between expression and fitness profiling data
This held for all conditions tested: 1.0 M NaCl, minimal media, galactose, 1.5 M sorbitol and alkali. This is strong evidence that expression profiling provides only part of the picture for a given biological process.
31 Systematic genetic analysis with ordered arrays of yeast deletion mutants
Tong et al., (2001) Science, 294:2364-2368
Why are 80% of yeast genes non-essential under optimal lab conditions?
- The gene may only have an important function under a particular environmental or developmental condition.
- A homologous gene may provide the same function (redundancy is likely for 8% of the non-essential genes).
- A non-homologous gene or pathway may provide the same function (redundancy, parallel activity or pathway). This type of redundancy may provide buffering, so that the process occurs with higher fidelity when the system is stressed environmentally or by the genetic variability that occurs in natural populations.
32 Redundant functions can be uncovered by synthetic genetic interactions: double loss-of-function mutants that enhance the original phenotype or give a phenotype not observed in either single mutant. (There are many examples in model organisms.)
Synthetic lethality, where the double mutant is lethal while neither single mutant is lethal, is often used in yeast to identify genes involved in the same process (Guarente, Trends Genet. 9:362, 1993).
Synthetic lethal interactions are identified in genetic screens (not selections). Thus, it is difficult to identify all the genes (to saturate) that could mutate to give a synthetic lethal interaction, given the selectivity of the mutagen, the variable target size of different genes and the limitations of a screening protocol.
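The definition of synthetic lethality above is binary. To also call "synthetic sick" double mutants, one needs an expectation for the double mutant's fitness. This sketch assumes a multiplicative null model (expected double-mutant fitness = product of the single-mutant fitnesses), one common convention; fitness is relative to wild type (1.0), and both thresholds are hypothetical.

```python
def expected_double(w_a: float, w_b: float) -> float:
    """Multiplicative expectation for double-mutant fitness (an assumed null model)."""
    return w_a * w_b


def call_interaction(w_a: float, w_b: float, w_ab: float,
                     lethal: float = 0.05, sick_margin: float = 0.2) -> str:
    """Classify a double mutant from relative fitness values (wild type = 1.0).

    `lethal` and `sick_margin` are hypothetical thresholds.
    """
    if w_a <= lethal or w_b <= lethal:
        return "single mutant already lethal"
    if w_ab <= lethal:
        return "synthetic lethal"
    if w_ab - expected_double(w_a, w_b) < -sick_margin:
        return "synthetic sick"
    return "no strong negative interaction"


print(call_interaction(1.0, 0.9, 0.0))   # synthetic lethal
print(call_interaction(0.9, 0.8, 0.3))   # synthetic sick (expected 0.72)
print(call_interaction(0.9, 0.8, 0.7))   # no strong negative interaction
```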
33 High-throughput synthetic lethal analysis
- Generate a tester strain (mating type MATalpha) in which the non-essential query gene has been deleted by homologous recombination using a NatR cassette.
- Mate the tester haploid strain to an ordered array of 4,700 yeast strains carrying non-essential gene deletions (from Winzeler et al. and Giaever et al.; MATa) and select for diploids that are KanR and NatR.
- Sporulate the diploids; the haploid double mutant is obtained, if viable, following selection for KanR, NatR and haploidy (MATa).
- The query gene BNI1 encodes a formin protein family member that functions in cortical actin assembly for polarized cell growth and spindle orientation.
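The selection steps above can be modeled as filters over genotypes. This is a toy sketch, assuming every marker is heterozygous and unlinked in the diploid, and treating "selection for haploidy (MATa)" as a simple genotype filter; the real selection machinery, and whether the double mutant actually grows (the screen's readout), are not modeled. The array deletion name is hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Cell:
    markers: frozenset   # drug-resistance markers carried, e.g. {"NatR", "KanR"}
    mating: str          # "MATa", "MATalpha", or "MATa/MATalpha" for a diploid
    ploidy: int


def mate(query: Cell, array_strain: Cell) -> Cell:
    """Diploid carrying both parents' markers and both mating-type alleles."""
    return Cell(query.markers | array_strain.markers,
                f"{array_strain.mating}/{query.mating}", 2)


def sporulate(diploid: Cell) -> list:
    """Every possible haploid spore genotype, assuming each marker is
    heterozygous and unlinked to the others and to the mating-type locus."""
    markers = sorted(diploid.markers)
    spores = []
    for mating in diploid.mating.split("/"):
        for keep in product([True, False], repeat=len(markers)):
            carried = frozenset(m for m, k in zip(markers, keep) if k)
            spores.append(Cell(carried, mating, 1))
    return spores


def select(cells, required, ploidy=None, mating=None):
    """Keep cells carrying all `required` markers (and matching ploidy/mating if given)."""
    return [c for c in cells
            if required <= c.markers
            and (ploidy is None or c.ploidy == ploidy)
            and (mating is None or c.mating == mating)]


query = Cell(frozenset({"NatR"}), "MATalpha", 1)      # query deletion marked with NatR (e.g. BNI1)
array_strain = Cell(frozenset({"KanR"}), "MATa", 1)   # one array deletion, yfgX::KanR (hypothetical)

diploids = select([mate(query, array_strain)], {"KanR", "NatR"}, ploidy=2)
spores = [s for d in diploids for s in sporulate(d)]
double_mutants = select(spores, {"KanR", "NatR"}, ploidy=1, mating="MATa")
print(len(spores), double_mutants)
```

Of the eight possible spore genotypes, only the MATa KanR NatR class survives all selections: the haploid double mutant.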
34 Synthetic Genetic Array methodology
35 Double-Mutant Array and Tetrad Analysis
36 Results from the synthetic lethal/sick screen with the BNI1 query gene
- 67 potential synthetic lethal/sick interactions were found.
- 51 (about 75%) were confirmed by tetrad analysis (about 25% false positives).
- The 51 synthetic lethal/sick interactors were grouped by cellular role as defined by the Yeast Proteome Database (YPD):
  - 20 are cell polarity genes, e.g. bud emergence genes
  - 18 are cell wall maintenance genes, e.g. chitin synthase genes
  - 16 are mitosis genes, e.g. dynein/dynactin spindle orientation genes
  - 18 are genes of unknown function
- 8 of 11 previously known synthetic lethal/sick genes were found. Of the 3 missing, 2 were not among the 4,700 strains tested and 1 was not found (false negative).
- 43 new synthetic lethal/sick interactions were found.
37 But what does the synthetic lethal/sick interaction of two deletion null mutants tell us?
- Two genes with redundant activities within a single pathway.
- Two genes acting in separate pathways that are redundant for the same biological process.
- Two separate biological processes that, when both are impaired, result in lethality (e.g. polarized cell growth and cell wall maintenance). This case is less informative or non-informative for understanding gene function (a false positive).
38 Interaction map of synthetic lethal/sick interactions
- Display genetic interactions as binary gene-gene relationships.
- Repeat the high-throughput synthetic lethal screen using (1) genes of interest with unknown cellular roles and (2) genes with well-characterized cellular roles.
- A gene with unknown cellular function(s) is expected to show greater connectivity with, and to be surrounded by, genes of similar function.
- The map/network was obtained using the BIND package (Bader et al., 2001, NAR 29:242).
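Treating each synthetic lethal/sick pair as an undirected edge turns the screen results into a graph. A gene's neighbors can then hint at its role: if most partners of an uncharacterized gene share one annotation, that role is a candidate. This is a minimal "guilt by association" sketch using plain Python rather than the BIND package; all gene names and annotations are hypothetical.

```python
from collections import Counter, defaultdict


def build_network(pairs):
    """Undirected gene-gene graph from binary synthetic lethal/sick pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighbour_roles(graph, gene, roles):
    """Tally the annotated cellular roles of a gene's interaction partners."""
    return Counter(roles[n] for n in graph[gene] if n in roles)


# Hypothetical interactions and annotations (placeholder gene names)
pairs = [("unk1", "polA"), ("unk1", "polB"), ("unk1", "wallA"), ("polA", "polB")]
roles = {"polA": "cell polarity", "polB": "cell polarity", "wallA": "cell wall"}

graph = build_network(pairs)
print("degree of unk1:", len(graph["unk1"]))
print(neighbour_roles(graph, "unk1", roles).most_common(1))
```

Here the uncharacterized gene unk1 has three partners, two of them cell polarity genes, so cell polarity would be the first hypothesis to test.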
39 Synthetic lethal/sick genetic interaction network
40 Two gene products that act in the same pathway but are not redundant with each other will not show synthetic lethality.
[Diagram: A, B and C act in one pathway toward the biological process; panels for wild-type, gene-A and gene-B]
However, if gene-A gene-Q is synthetic lethal and gene-B gene-Q is synthetic lethal, then A and B might act in the same pathway.
41 A test of whether A and B act in the same pathway is to use both gene-A and gene-B as query genes against the full set of deletions. If both genes act in the same pathway/process, we expect a large degree of overlap between the sets of synthetic lethal genes identified for gene-A and for gene-B.
42 A high-throughput screen was performed with two query genes that function in actin assembly, ARC40 and ARP2, which encode subunits of the Arp2/3 complex.
- ARC40 had 40 synthetic lethal/sick interactions.
- ARP2 had 44 synthetic lethal/sick interactions.
- 31/40 interactions with ARC40 and 31/44 interactions with ARP2 are shared.
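How surprising is an overlap of 31 shared interactors? This sketch compares it with chance using a hypergeometric tail probability. It assumes both queries were screened against the same ~4,700 deletion strains (from slide 33) and that hits would otherwise be independent random draws. Both are simplifying assumptions, not the study's analysis.

```python
from math import comb


def overlap_pvalue(n_tested: int, n_a: int, n_b: int, shared: int) -> float:
    """Hypergeometric P(overlap >= shared) if the two hit lists were independent random draws."""
    total = comb(n_tested, n_b)
    tail = sum(comb(n_a, k) * comb(n_tested - n_a, n_b - k)
               for k in range(shared, min(n_a, n_b) + 1))
    return tail / total


N_TESTED = 4700   # assumed size of the deletion array
arc40_hits, arp2_hits, shared = 40, 44, 31

print(f"fraction of ARC40 hits shared: {shared / arc40_hits:.1%}")   # 77.5%
print(f"fraction of ARP2 hits shared:  {shared / arp2_hits:.1%}")    # 70.5%
print(f"Jaccard index: {shared / (arc40_hits + arp2_hits - shared):.2f}")  # 0.58
print(f"expected overlap by chance: {arc40_hits * arp2_hits / N_TESTED:.2f}")  # 0.37
print(f"P(overlap >= {shared}): {overlap_pvalue(N_TESTED, arc40_hits, arp2_hits, shared):.1e}")
```

Fewer than one shared interactor would be expected by chance, so an overlap of 31 is consistent with ARC40 and ARP2 acting in the same process.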
43 Synthetic lethal/sick genetic interaction network
44 False negatives
a) Essential genes are not included in the analysis unless a temperature-sensitive allele is generated and used (e.g. ARC40).
b) Negative regulatory interactions will not be recovered in a synthetic lethal screen.
c) Deletion strains that are slow-growing or that fail to mate or sporulate are either absent or confound the interpretation.
46 Use of the yeast deletion set to identify the gene product targets of therapeutic compounds
- Most therapeutic compounds were discovered fortuitously, without prior knowledge of target or mode of action.
- Knowledge of target(s) and mode(s) of action is necessary for (a) development of improved second-generation compounds and (b) reducing drug side effects and chemical toxicity.
Giaever et al., 1999, Nat. Genet. 21:278-283; Lum et al., 2004, Cell 116:121-137
47 Compounds that bind to gene products and reduce or eliminate their activity mimic loss-of-function mutations in the corresponding gene. A diploid yeast strain that is heterozygous for a deletion in the target gene of a therapeutic compound is thus predicted to be hypersensitive to the compound.
- This requires that the compound be delivered at a dose that causes moderate inhibition of growth, so that the activities of the target gene product(s) are rate-limiting for growth.
- This reveals a haploinsufficiency for drug sensitivity: the compound causes a phenocopy of the loss-of-function mutant phenotype.
48 Strategy for assessing the effect of a compound on fitness
49 Identification of compound-specific growth defects
50 Identification of compound-specific growth defects
- The growth properties of each strain over a large number of competitive growth experiments generated a reference set that allowed calculation of a mean performance value.
- The scatter (error) of these measurements was then calculated to determine the reproducibility of each strain's performance.
- To identify strains with drug-specific changes in growth rate, fitness values for a given drug were compared to the corresponding values from the reference set using a modified Student's t-test.
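The slide does not say how the Student's t-test was modified. As a stand-in, this sketch computes a Welch-style statistic for one strain (drug replicates vs. its reference-set values) and takes a two-sided p-value from the normal distribution. That approximation is reasonable only when the reference set is large, and it is optimistic for very few drug replicates. All fitness values are hypothetical log ratios.

```python
import math
import statistics


def drug_effect(reference: list, drug: list) -> tuple:
    """Welch-style statistic and two-sided normal-approximation p-value for one strain.

    reference: the strain's fitness values across many control experiments
    drug:      the strain's fitness values in the presence of the compound
    Both lists need at least two values (sample variance).
    """
    m_ref, m_drug = statistics.mean(reference), statistics.mean(drug)
    se = math.sqrt(statistics.variance(reference) / len(reference)
                   + statistics.variance(drug) / len(drug))
    t = (m_drug - m_ref) / se
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p


reference = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02, 0.0]   # hypothetical
drug = [-1.2, -1.0, -1.1]                                       # hypothetical
t, p = drug_effect(reference, drug)
print(f"t = {t:.1f}, p = {p:.2g}")
```

A large negative statistic flags the strain as drug-hypersensitive, which, for a heterozygous deletion, is the haploinsufficiency signal described on slide 47.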
51 Fitness profiles for 78 compounds
[Figure: compounds grouped by fitness profile]
- No compound-specific changes
- Small number of strains with significant effects on fitness
- Large number of strains with significant effects on fitness
52 Fitness profiles of Group II compounds
[Figure panels: compounds with previously reported targets; compounds with no previously reported targets]
53 Molsidomine is a potent vasodilator used clinically to treat angina; it also lowers cholesterol in humans and rats. Fitness profiling revealed a significant molsidomine-induced haploinsufficiency for ERG7, which encodes the highly conserved enzyme lanosterol synthase.
Validation:
- Genetic
- Biochemical:
  - Molsidomine causes accumulation of the Erg7p substrate.
  - A molsidomine metabolite inhibits purified lanosterol synthase.
54 Mode of molsidomine action
[Figure: molsidomine metabolites and validation]
55 False negatives
a) The compound target is not encoded in the yeast genome (or the mutant is absent from the deletion library).
b) Compound effects are masked by other proteins with redundant activities.
c) The compound is not correctly metabolized by the cell.
d) The compound is not able to enter the cell.
e) The compound causes an altered function of the target (gain-of-function).
f) Failure to obtain high-quality data for both UPTAG and DOWNTAG hybridizations, due to tag mutations or defects generated during deletion construction.
56 False positives
- Deletion of an overlapping ORF, or a position effect of the deletion.
- Background mutations unrelated to the deletion.
- Cross-hybridizing tags.
- Deletion of drug pumps or detoxifying enzymes.
- Note that non-protein cellular targets, e.g. DNA (actinomycin D) or a small molecule (ergosterol, the target of the therapeutic compound amphotericin), appear not to be suitable for this approach.