Day 7: Using genomics to predict new pathways

Provided by: huynenm
Transcript and Presenter's Notes

1
Day 7: Using genomics to predict new pathways
2
  • Genome sequences
  • Allowing us to interpret the function of proteins
    within the context in which they occur
  • Reverse this process: predict the function of a
    protein from the context in which it tends to
    occur → prediction of protein function/pathways
    from genome sequences

3
What on earth does the 2-ketoglutarate:ferredoxin
oxidoreductase do in P. abyssi when there are no
connecting enzymes of the citric acid cycle?
4
2-ketoglutarate likely derived from glutamate
5
Succinyl-CoA can be broken down via
methylmalonyl-CoA
6
Instead of interpreting, actually predicting
protein function using genomic association
(Figure: nucleoside salvage pathway. Labels: deoxycytidine, Cdd; deoxyuridine, deoxythymidine, DeoA; purine deoxyribonucleosides, DeoD; deoxyribose-1-P; deoB (?); deoxyribose-5-P; deoC; glyceraldehyde-3-P, acetaldehyde. Gene order in M. genitalium and M. tuberculosis: deoD deoC deoA cdd pmm)

7
  • The prediction that the pmm gene encodes a protein
    that (also) functions as a phosphoribomutase is
    based on:
  • Genomic association (operon) with genes involved
    in the nucleoside salvage pathway.
  • Conservation of this association among distantly
    related species.
  • Substrate specificity is less conserved than
    catalytic function → the mutase function is
    conserved, while the substrate specificity has
    changed from mannose/glucose to ribose.
  • A phosphoribomutase is required, and is otherwise
    absent from the genome.
  • Such predictions of course have to be confirmed
    by experimental research.

8
Annotation via guilt by association
Search bacterial genomes for proteins whose genes
often lie next to it
new, unknown gene
gene involved in a known process, e.g.
amino acid synthesis
Extrapolate the pathway
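The neighbour search described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the lecture: the genomes, the gene names (`orfX`, `synA`, `synB`, `otherC`, `otherD`) and the function `neighbour_counts` are all invented for the example, and the sketch ignores strand, circular chromosomes and orthology assignment.

```python
from collections import Counter


def neighbour_counts(genomes, query):
    """Count, per gene, the number of genomes in which it lies directly next to `query`.

    `genomes` maps a genome name to its genes in chromosomal order (hypothetical toy input).
    """
    counts = Counter()
    for gene_order in genomes.values():
        neighbours = set()
        for idx, gene in enumerate(gene_order):
            if gene != query:
                continue
            if idx > 0:
                neighbours.add(gene_order[idx - 1])
            if idx + 1 < len(gene_order):
                neighbours.add(gene_order[idx + 1])
        neighbours.discard(query)   # ignore tandem copies of the query itself
        counts.update(neighbours)   # each genome contributes at most 1 per neighbour
    return counts


# Invented toy data: orfX is the new, unknown gene; synA and synB stand for
# genes of a known process such as amino acid synthesis.
toy_genomes = {
    "genome_1": ["otherC", "synB", "orfX", "synA", "otherD"],
    "genome_2": ["otherD", "orfX", "synA", "otherC"],
    "genome_3": ["synA", "orfX", "otherC", "synB"],
}

counts = neighbour_counts(toy_genomes, "orfX")
for gene, n in counts.most_common():
    print(f"{gene}: next to orfX in {n} of {len(toy_genomes)} genomes")
```

In this toy set only `synA` neighbours `orfX` in every genome, which is the kind of conserved association that would motivate extrapolating the pathway to the unknown gene.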
9
Define "distantly related species".
Remember the rapid shuffling of genomes (compared
to 16S rRNA identity)
10
Variations in genome rearrangements depend
on the relative direction of transcription →
hints at the operon organization of genes in
prokaryotes
11
Besides the theoretical argument (proteins that
are not only encoded in the same operon, but whose
operon organization is also conserved in
evolution), we also need experimental benchmarks
(compare protein sequence similarity → homology,
benchmarked via structure).
Dandekar, Snel, Huynen and Bork, TIBS 1998.
Conservation of gene order: a fingerprint of
proteins that physically interact
12
Benchmarking
13
Conservation of the tryptophan synthesis operon
among the compared genomes
14
Types of Genomic Association for the Prediction
of Functional Interaction
  • I gene fusion/fission
  • II conservation of gene order (operons)
  • III co-occurrence of genes in genomes
  • IV shared regulatory elements
  • V coexpression data

15
All the genes in the tryptophan biosynthesis
pathway are linked via gene fusions. These
fusions do not give the order of the enzymes in
the pathway
16
Gene fission in the evolution of carbamoyl
phosphate synthase B (carB)
17
Predicting functional interactions between
proteins by the co-occurrence of their genes in
genomes.
Distribution of four M. genitalium genes among 25
genomes:
MG299 (pta)   0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
MG357 (ackA)  0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
MG019 (dnaJ)  0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
MG305 (dnaK)  0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
Using the mutual information between genes as a
scoring heuristic for their co-occurrence:
M(pta, ackA) = 0.69 (phosphotransacetylase, acetate kinase)
M(dnaJ, dnaK) = 0.55 (heat shock proteins)
M(dnaJ, ackA) = 0.19
18
Gene co-occurrence/phylogenetic profiling
Distribution of 2 M. genitalium genes in 25
genomes; 1 implies that the gene is present, 0
that it is absent:
MG299 (pta)   0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
MG357 (ackA)  0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
Phosphotransacetylase (pta)
Acetate kinase (ackA)
ackA and pta are in the same pathway, explaining
their co-occurrence
19
Distribution of 2 M. genitalium genes in 25
genomes; 1 implies that the gene is present, 0
that it is absent:
MG019 (dnaJ)  0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
MG305 (dnaK)  0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
Models for the interaction of DnaK and DnaJ with
their substrate (unfolded and misfolded
proteins). DnaK and DnaJ interact with each
other, explaining their genomic co-occurrence
20
Entropy and mutual information
$H(i) = -\sum_i P_i \log P_i$
$H(j) = -\sum_j P_j \log P_j$
$H(i,j) = -\sum_{i,j} P_{i,j} \log P_{i,j}$
Entropy (H) is the disorder of the system; it
is maximal when all states occur with equal
frequency and minimal when one state dominates the
distribution. In terms of the distribution of
genes, it is maximal when genes occur with 50%
frequency.
$M(i,j) = H(i) + H(j) - H(i,j)$
Mutual information (M) is the sum of the
individual entropies minus the combined entropy.
It is maximal when the individual entropies are
maximal (P = 0.5) and the combined entropy is
minimal (of the four possibilities, 0 0, 0 1, 1 0
and 1 1, only two are occupied: 0 0 and 1 1, or 0
1 and 1 0)
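The entropy and mutual information definitions can be checked against the four presence/absence profiles given on slide 17. The sketch below is a minimal illustration rather than the authors' implementation: the function names are made up here, and the natural logarithm is an assumption (the slides do not state the base), chosen because it reproduces the quoted scores of 0.69, 0.55 and 0.19.

```python
from collections import Counter
from math import log


def entropy(states):
    """Shannon entropy H = -sum(P * log P) over the observed states (natural log assumed)."""
    n = len(states)
    return -sum((c / n) * log(c / n) for c in Counter(states).values())


def mutual_information(profile_a, profile_b):
    """M(a, b) = H(a) + H(b) - H(a, b) for two equally long presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    joint = list(zip(profile_a, profile_b))  # joint states: (0,0), (0,1), (1,0), (1,1)
    return entropy(profile_a) + entropy(profile_b) - entropy(joint)


def parse(profile):
    """Turn a string such as '0 0 1' into a list of ints."""
    return [int(x) for x in profile.split()]


# Profiles copied from slide 17 (25 genomes each).
pta = parse("0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1")
ackA = parse("0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1")
dnaJ = parse("0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1")
dnaK = parse("0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1")

print(f"M(pta, ackA)  = {mutual_information(pta, ackA):.2f}")
print(f"M(dnaJ, dnaK) = {mutual_information(dnaJ, dnaK):.2f}")
print(f"M(dnaJ, ackA) = {mutual_information(dnaJ, ackA):.2f}")
```

Because the pta and ackA profiles are identical, their mutual information equals the entropy of either profile; it is higher than for dnaJ and dnaK only because pta is present in close to half of the genomes.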
21
Applicability of genomic context information for
M. genitalium genes (480 genes in total)

Gene order: 215
Fusion: 27
Co-occurrence: 54
22
Selectivity of Genomic Context for function
prediction
23
Correlation between the strength of the genomic
and functional associations (operon)
24
Correlation between the strength of the genomic
and functional associations (fusion)
25
Correlation between the strength of the genomic
and functional associations (co-occurrence)
26
The stronger the evolutionary conservation of the
genomic co-occurrence the more likely the
interaction
(Figure: probability that the genes are in the same pathway, y-axis from 0 to 1, plotted against the evolutionary conservation score, x-axis from 0 to 1, with curves for Fusion, Gene Order and Co-occurrence. The conservation score measures how often the genes lie next to each other, are fused, or to what extent their phylogenetic distributions are similar.)
27
Genomic context vs. homology-based function
prediction in M. genitalium

(Figure labels: Context 238; Homology 368; 21; 26; Added info from genomic context)
28
Combining homology information with genomic
association for function prediction
Repeated occurrence of MG009, which encodes a
phosphohydrolase and is one of the most widespread
enzymes on earth, together with thymidylate kinase
(tmk) suggests a role for MG009 in pyrimidine
metabolism.
29
Conservation of gene order of the hypothetical
gene MG134 with dnaX and recR suggests physical
interaction between their gene products
30
(No Transcript)
31
From pairwise interactions to functional modules,
pathways
32
The first iteration of trpB in M. jannaschii
(MJ1038) retrieves trpA (MJ1037), with which trpB
physically interacts
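Going from pairwise links to a module can be sketched as an iterative neighbourhood expansion: start from a seed gene, add every gene linked to it, and repeat from the newly added genes. This is a minimal sketch under stated assumptions, not the procedure used for the slides: only the MJ1038 (trpB) to MJ1037 (trpA) link comes from the slide, while `hypA`, `hypB`, the link table and the function `expand_module` are hypothetical.

```python
def expand_module(links, seed, max_iterations=3):
    """Collect genes reachable from `seed` via genomic-context links.

    `links` maps each gene to the set of genes it is associated with.
    One iteration adds the direct partners of the genes added last time.
    """
    module = {seed}
    frontier = {seed}
    for _ in range(max_iterations):
        found = set()
        for gene in frontier:
            found |= links.get(gene, set())
        found -= module          # keep only genes not seen before
        if not found:
            break                # module stopped growing
        module |= found
        frontier = found
    return module


# Toy link table: only MJ1038 <-> MJ1037 is taken from the slide;
# hypA and hypB are invented placeholders.
toy_links = {
    "MJ1038": {"MJ1037"},
    "MJ1037": {"MJ1038", "hypA"},
    "hypA": {"MJ1037", "hypB"},
    "hypB": {"hypA"},
}

print(sorted(expand_module(toy_links, "MJ1038", max_iterations=1)))
print(sorted(expand_module(toy_links, "MJ1038", max_iterations=3)))
```

Capping the number of iterations is a design choice in this sketch: each extra step pulls in more distantly linked genes, so the module grows at the cost of weaker support for each added member.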
33
(No Transcript)
34
Genomic context indicates a link between the
shikimate and tryptophan synthesis pathways
(Figure: network of genes linked by genomic context, with the shikimate pathway and the tryptophan synthesis pathway marked. Gene labels: tyrA, aroB, asd, truA, aroE, aroC, hemK, hyp, trpF, trpC, trpE, trpG, trpA, trpD, trpB, hyp, 2c-rr)
35
Biochemical pathways vs functional modules
Coverage: >70%
Specificity: ca. 90%
Von Mering et al. PNAS 100 (2003) 15428
36
Limited relevance of gene order for functional
interaction in eukaryotes
  • Operons in nematodes
  • Gene-order conservation of co-expressed genes
    between the fungi C. albicans and S. cerevisiae

37
Blumenthal, 2004
38
Finding interaction partners for a human disease
gene: frataxin
  • Friedreich's ataxia
  • No (homolog with) known function
  • No gene fusion or gene order conservation

39
(No Transcript)
40
Iron-Sulfur (2Fe-2S) cluster in the Rieske
protein (Iwata et al, Structure 1996)
41
(Figure labels: Ancestor, Proteobacteria, fdx, IscS, IscU, IscR, RnaM, time axis)
42
The mitochondrial HSP70 protein that is involved
in iron-sulfur cluster (isc) assembly in yeast is
derived from DnaK, rather than from HscA (the
proteobacterial isc HSP70), indicating a
paralogous switch in isc assembly from the
proteobacteria to the eukaryotes.
43
Mitochondrial iron-sulfur assembly
(Figure: mitochondrial iron-sulfur cluster assembly. Labels: Arh1/fpr, Atm1, Cys, Ala, NifS, NifU, fdx, e-, S, Fe, 2Fe2S, HscA/SSQ1, HscB, frataxin (?); targets e.g. fdx, Complex I)
44
Prediction
Confirmation
45
IND1, an FeS protein required for complex I
assembly
Katrine Bych, Janneke Balk, Stefan Kerscher,
Klaus Zwicker, Ulrich Brandt, Daili J A Netz,
Antonio J Pierik, Roland Lill
PHILIPPS-UNIVERSITÄT MARBURG
Martijn Huynen
46
Ind1: an Mrp-like NTPase
Ind1 is homologous in sequence to the cytosolic
proteins Cfd1 and Nbp35 in S. cerevisiae, which
can function as Fe-S scaffold proteins (Netz et
al. 2007 Nat. Chem. Biol. 3, 278-286).
47
Ind1 has a mitochondrial location
Cells fractionated into mitochondria and
post-mitochondrial supernatant (pms)
48
Phylogenetic relationship between NBP35, CFD1,
CF101, ApbC and IND1 indicates various
independent origins from bacteria and archaea
49
Co-evolution of IND1 with Complex I
(Figure: presence of IND1 and the complex I FeS proteins (75kD, 51kD, 24kD, TYKY, PSST) across eukaryotic lineages, with the events "gain of mito targeting signal" (MTS) and "loss of IND1" marked. Lineages: Fungi (17), Saccharomyces s.l. (5), S. pombe, E. cuniculi, Vertebrata (16), Insects (7), Nematodes (3), S. purpuratus, D. discoideum, E. histolytica, Plants, Algae (5), Apicomplexa (10), Ciliates (2), Euglenozoa (5), G. lamblia, T. vaginalis)
50
Deletion of IND1 specifically affects complex I
activity in Y. lipolytica carrying an
alternative NADH dehydrogenase
51
Deletion of IND1, or mutation of the conserved
cysteines of IND1, affects complex I activity in
mitochondrial membranes. A similar reduction of
NADH:HAR activity (oxidation of NADH by FMN on the
51 kDa subunit) and dNADH:DBQ activity (oxidation
of NADH coupled to reduction of ubiquinone)
suggests less complex I rather than impaired
complex I
52
Deletion of IND1 reduces complex I abundance
(Figure labels: cWT, ind1Δ, 75-kDa NUAM)
VD and VM: dimeric and monomeric forms of complex
V; I: complex I; S: incompletely characterized
supercomplex that contains complex III; IIID:
dimeric form of complex III.
53
Verified function predictions
Making predictions is easy, testing them is another matter.

Protein    Context             Type of interaction   Function                                     Ref
Mt-Ku      gene order          physical interaction  double-stranded DNA repair                   56
GlnK       gene order          physical interaction  signal transduction for ammonium transport   57,58
PH0272     gene order          metabolic pathway     methylmalonyl-CoA racemase                   59
PrpD       gene order          metabolic pathway     2-methylcitrate dehydratase                  22,60
arok       gene order          metabolic pathway     shikimate kinase                             61
ComB       gene order          metabolic pathway     2-phosphosulfolactate phosphatase            62
KynB       gene order          metabolic pathway     kynurenine formamidase                       63
PvlArgDC   gene order          metabolic pathway     arginine decarboxylase                       64
FabK       gene order          metabolic pathway     enoyl-ACP reductase                          65
FabM       gene order          metabolic pathway     trans-2-decenoyl-ACP isomerase               66
COG0042    gene order          tRNA modification     tRNA-dihydrouridine synthase                 67
Yfh1       co-occurrence       process               iron-sulfur cluster assembly                 68,69
YchB       co-occurrence       metabolic pathway     terpenoid synthesis                          70
SmpB       co-occurrence       process               trans-translation                            5,71
ThyX       complementary       enzymatic activity    thymidylate synthase                         14,72
ThiN       complementary       enzymatic activity    thiamine phosphate synthase                  73,74
ThiE       complementary       enzymatic activity    thiamine phosphate synthase                  74
Prx        fusion              pathway               peroxiredoxin                                75
YgbB       fusion/gene order   metabolic pathway     terpenoid synthesis                          76
SelR       fusion/order/co-o.  enzymatic activity    methionine sulfoxide reductase               14,22,77
FadE       reg. sequence       metabolic pathway     acyl-CoA dehydrogenase                       78,79
TogMNAB    reg. sequence       metabolic pathway     oligogalacturonide transport                 80,81
MetD       reg. sequence       metabolic pathway     methionine transport                         82
54
Further Reading
  • Genomic context: Huynen M, Snel B, Lathe W 3rd,
    Bork P. (2000) Predicting protein function by
    genomic context: quantitative evaluation and
    qualitative inferences. Genome Res.
    10(8):1204-10.
  • Genomic context: Gabaldon T, Huynen MA. (2004)
    Prediction of protein function and pathways in
    the genome era. Cell Mol Life Sci. 61:930-44.