Title: Bioinformatics and Evolutionary Genomics High throughput
1Bioinformatics and Evolutionary Genomics:
High-throughput functional data / functional
genomics / Omics
2High-throughput data on gene function
- What do I mean: omics, microarray, ChIP-on-chip
- Why are people generating these data?
- Post-genomic era / systems biology: the challenge
is to understand the roles of the e.g. 6,000 gene
products in yeast and how they interact to create
a eukaryotic organism.
- Because they can apply automation to areas of
molecular biology beyond sequencing
- To have screens for the research question at
hand, rather than having to test one guess at a
time
- What about evolutionary genomics?
- Yeast
- Accuracy / noise
3HTP data
- What do they mean: experimental knowledge, but
still, what do they mean in terms of e.g. function?
- A deluge
- Bioinformatics is needed for basic data handling
and has IMHO only scratched the surface in terms
of coming up with biological questions with which
we can probe these data
4Microarray data
5Microarray data
two conditions often used for screens
6(Correlated) mRNA expression
- mRNA levels are systematically measured under a
variety of different cellular conditions, and
genes are grouped if they show a similar
transcriptional response to these conditions.
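A minimal sketch of the grouping idea above, assuming Pearson correlation as the similarity measure and a cutoff of 0.9. Both are illustrative choices rather than anything prescribed by the slides, and the gene names and values are toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(profiles, min_r=0.9):
    """Return gene pairs whose profiles correlate at or above min_r."""
    pairs = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if r >= min_r:
            pairs.append((gene_a, gene_b, round(r, 3)))
    return pairs


# Toy log-ratios for four hypothetical genes across five conditions.
profiles = {
    "geneA": [1.0, 2.0, 0.5, -1.0, 0.0],
    "geneB": [1.1, 1.9, 0.4, -0.8, 0.1],
    "geneC": [-1.0, -2.0, -0.4, 1.2, 0.0],
    "geneD": [0.2, -0.1, 0.3, 0.0, -0.2],
}
print(coexpressed_pairs(profiles))
```

With these toy values only geneA and geneB pass the cutoff; geneC is anti-correlated with both and geneD is essentially flat.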
7Hughes et al. 2000 Cell
- Profile Similarity Identifies Sterol-Pathway
Disturbance Resulting from Deletion of
Uncharacterized ORF YER044c (ERG28) and from
Dyclonine Treatment
- Prominent gene clusters responding to
interference with ergosterol biosynthesis
- Comparison of the transcript profile of an erg28Δ
strain to that of an erg3Δ strain
- (C) Sterol content of wild-type (left) and erg28Δ
(right) strains
8Ihmels et al. 2002 Nature Genetics
Conventional hierarchical clustering of
co-expression data can fail, because genes can
play a role in multiple cellular processes and
their common regulation may be detectable in
only a subset of experiments. The approach: detect
genes that are co-expressed under a subset of
conditions. The result: a comprehensive set of
overlapping transcriptional modules.
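A rough sketch of how genes co-expressed under only a subset of conditions might be found: start from a seed gene set, pick the conditions in which the seeds respond, then collect all genes that respond in those conditions. This is a simplified illustration loosely inspired by the idea on this slide, not the published method; the function name, both cutoffs, and the toy data are assumptions, and normalisation and iteration are left out.

```python
def module_from_seed(expr, seed_genes, cond_cutoff=1.0, gene_cutoff=1.0):
    """One refinement round: seed genes -> conditions -> module genes.

    expr maps gene name -> list of log-ratios, one per condition.
    Both cutoffs are illustrative assumptions, not published values.
    """
    n_cond = len(next(iter(expr.values())))

    # Score each condition by the mean response of the seed genes.
    cond_scores = [
        sum(expr[g][c] for g in seed_genes) / len(seed_genes)
        for c in range(n_cond)
    ]
    # Keep only conditions in which the seed genes respond strongly.
    selected = [c for c, s in enumerate(cond_scores) if abs(s) >= cond_cutoff]
    if not selected:
        return [], set()

    # Score each gene over the selected conditions, weighting by the
    # condition score so induced and repressed conditions both count.
    module = set()
    for gene, values in expr.items():
        score = sum(values[c] * cond_scores[c] for c in selected) / len(selected)
        if score >= gene_cutoff:
            module.add(gene)
    return selected, module


# Toy data: g3 responds together with g1/g2 in conditions 0-1
# and together with g4/g5 in conditions 4-5.
expr = {
    "g1": [2.0, 1.8, 0.1, -0.1, 0.0, 0.2],
    "g2": [1.9, 2.1, -0.2, 0.0, 0.1, -0.1],
    "g3": [2.2, 1.7, 0.0, 0.1, 1.9, 2.0],
    "g4": [0.1, -0.1, 0.2, 0.0, 2.1, 1.8],
    "g5": [0.0, 0.2, -0.1, 0.1, 1.8, 2.2],
    "g6": [0.1, 0.0, 0.2, -0.2, 0.1, 0.0],
}
print(module_from_seed(expr, ["g1", "g2"]))
print(module_from_seed(expr, ["g4", "g5"]))
```

In the toy data g3 ends up in both modules, which is the kind of overlap a single hierarchical tree cannot represent.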
9Citric acid cycle? Different activity under
different experimental conditions
10Rapid divergence in expression between duplicate
genes inferred from microarray promoter data
[Figure labels: 0.1, 3.2 My]
11Clustering conditions where the conditions are
genes: yet another way to get to functional
links
12Yeast-2-hybrid
Pairs of proteins to be tested for interaction
are expressed as fusion proteins ('hybrids') in
yeast: one protein is fused to a DNA-binding
domain, the other to a transcriptional activator
domain. Any interaction between them is detected
by the formation of a functional transcription
factor.
13- Examples from the original Ito publication
- A: autophagy
- B: spindle pole body function
- C: vesicular transport
- Arrows: orientation of the two-hybrid interaction,
from the bait to the prey.
14Accuracy of Y2H and how to improve it
15Improving reliability using protein-complex
reasoning / internal consistency
Internal filtering!
16Accuracy of Y2H and how to improve it
17Mass spectrometry of purified complexes.
- Individual proteins are tagged and used as
'hooks' to biochemically purify whole protein
complexes. These are then separated and their
components identified by mass spectrometry.
21Exosome
Ski
Socio-affinity indices: dotted lines, 5-10;
dashed lines, 10-15; plain lines, >15. Bait
proteins are shown in bold, and shaded circles
around groups of proteins indicate cores and
modules.
Stages in mRNA degradation
22Cellular Function
Phylogenetic profile
pdb
Y2H
23Protein interactions: literature databases
- Literature derived, normally manually curated (as
opposed to text mining)
- Biased?
- No new knowledge
- Useful for benchmarking and for the study of the
evolution of e.g. protein complexes
- For example, the Munich Information Center for
Protein Sequences (MIPS)
- Databases that contain literature and omics data:
Database of Interacting Proteins (DIP),
Biomolecular Interaction Network Database (BIND)
24Systematic screening for lethality of knockouts
on a rich medium
- The functions of many open reading frames (ORFs)
identified in genome-sequencing projects are
unknown. New, whole-genome approaches are
required to systematically determine their
function. A total of 6925 Saccharomyces
cerevisiae strains were constructed, by a
high-throughput strategy, each with a precise
deletion of one of 2026 ORFs. Of the deleted ORFs,
17 percent were essential for viability in rich
medium.
Winzeler et al. 1999 Science
25Genetic interactions (synthetic lethal/sick)
- Two nonessential genes that cause lethality when
mutated at the same time form a synthetic lethal
interaction. Such genes are often functionally
associated and their encoded proteins may also
interact physically.
Tong et al. 2001 Science
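The definition above can be written down as a small check: both single deletions must be viable, and the double deletion must not be. The sketch below assumes viability has already been scored as True/False per strain; the data layout, function name, and gene names are hypothetical.

```python
from itertools import combinations


def synthetic_lethal_pairs(single_viable, double_viable):
    """Find pairs of nonessential genes whose double deletion is lethal.

    single_viable: gene -> True if the single deletion strain grows.
    double_viable: frozenset({gene_a, gene_b}) -> True if the double
    deletion strain grows. Untested pairs are simply absent.
    """
    nonessential = sorted(g for g, ok in single_viable.items() if ok)
    hits = []
    for a, b in combinations(nonessential, 2):
        grows = double_viable.get(frozenset((a, b)))
        if grows is False:  # tested, and the double mutant is dead
            hits.append((a, b))
    return hits


# Toy screen with hypothetical genes; geneD is essential on its own.
single_viable = {"geneA": True, "geneB": True, "geneC": True, "geneD": False}
double_viable = {
    frozenset(("geneA", "geneB")): False,
    frozenset(("geneA", "geneC")): True,
    frozenset(("geneB", "geneD")): False,
}
print(synthetic_lethal_pairs(single_viable, double_viable))
```

The geneB/geneD pair is not reported, because a lethal double mutant says nothing about an interaction when one of the genes is already essential by itself.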
27One thing we can do with synthetic lethals
- Ideker protein interactions
28What to do with synthetic lethals?
Kelley and Ideker 2005 Nature Biotech
30ChIP-on-chip
- Tagged strains (one strain for each regulator).
- A microarray for each strain shows which pieces of
DNA are enriched when you isolate the regulator
together with its bound DNA.
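As a hedged illustration of "found in excess": compare the array signal of the isolated (immunoprecipitated) DNA with a control signal per region and keep regions above a fold-change cutoff. The two-fold cutoff, function name, and region names are assumptions for this sketch; real analyses typically rely on replicates and a statistical error model rather than a bare ratio.

```python
from math import log2


def bound_regions(ip_signal, control_signal, min_log_ratio=1.0):
    """Return regions whose IP signal exceeds the control.

    ip_signal / control_signal: region name -> array intensity.
    min_log_ratio=1.0 (two-fold enrichment) is an arbitrary assumption.
    """
    enriched = {}
    for region, ip in ip_signal.items():
        ratio = log2(ip / control_signal[region])
        if ratio >= min_log_ratio:
            enriched[region] = round(ratio, 2)
    return enriched


# Toy intensities for three hypothetical promoter regions.
ip_signal = {"promoter_1": 800.0, "promoter_2": 210.0, "promoter_3": 1500.0}
control_signal = {"promoter_1": 200.0, "promoter_2": 200.0, "promoter_3": 300.0}
print(bound_regions(ip_signal, control_signal))
```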
31GFP localization
- Mating of strains with fluorescent protein markers
specific for organelles to strains with a
fluorescent protein tag for each gene
32Other functional genomics data: the omes
- quantitative proteomics
- Kinome
- PTMome
- (Almost) all of these data are freely and publicly
available
- Take-home message: wow, this exists!!!
33Bioinformatics for Benchmarking and Integration
[Figure: accuracy versus coverage of interaction data sets]
- Coverage: fraction of reference set covered by data
- Accuracy: fraction of data confirmed by reference set
- Data sets: purified complexes (TAP), purified
complexes (HMS-PCI), genomic context, mRNA
co-expression, synthetic lethality, yeast two-hybrid
- Combined evidence: two methods, three methods
- Markers: raw data, filtered data, parameter choices
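The two benchmark measures on this slide translate directly into set arithmetic. The sketch below computes coverage and accuracy for individual data sets and for pairs supported by at least two methods; the function names, the protein names, and the pair lists are toy assumptions, not real data.

```python
def coverage_and_accuracy(predicted, reference):
    """Benchmark a set of interactions against a reference set.

    coverage: fraction of reference set covered by data
    accuracy: fraction of data confirmed by reference set
    Pairs are stored as frozensets so that A-B equals B-A.
    """
    predicted = {frozenset(p) for p in predicted}
    reference = {frozenset(p) for p in reference}
    confirmed = predicted & reference
    coverage = len(confirmed) / len(reference)
    accuracy = len(confirmed) / len(predicted)
    return coverage, accuracy


def supported_by(datasets, min_methods=2):
    """Keep pairs reported by at least min_methods independent data sets."""
    counts = {}
    for pairs in datasets.values():
        for pair in {frozenset(p) for p in pairs}:
            counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, n in counts.items() if n >= min_methods}


# Toy reference set and two toy data sets of protein pairs.
reference = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5")]
datasets = {
    "y2h": [("p1", "p2"), ("p5", "p6"), ("p2", "p7"), ("p3", "p4")],
    "coexpression": [("p2", "p1"), ("p2", "p3"), ("p6", "p7"), ("p1", "p8")],
}
for name, pairs in datasets.items():
    print(name, coverage_and_accuracy(pairs, reference))
combined = supported_by(datasets, min_methods=2)
print("two methods", coverage_and_accuracy(combined, reference))
```

In this toy case requiring two methods raises accuracy and lowers coverage. Because any reference set is incomplete, accuracy computed this way is a lower bound: an unconfirmed pair may still be real.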
34Advanced integration