Title: Comparative genome analysis
1Proteome analysis in silico
Part II: Protein interactions and networks
Peer Bork, EMBL Heidelberg and MDC Berlin
bork_at_embl.de http://www.bork.embl-heidelberg.de/
2II. Protein network analysis
Genomic context analysis: Interaction predictions
Building and destroying interaction networks
STRING: a framework for network analysis
Towards spatial and temporal network aspects
3Genomic context methods to predict protein
interactions
Dandekar et al. TIBS 98
Enright et al. Nature 99
Marcotte et al. Science 99
Overbeek et al. PNAS 99
Pellegrini et al. PNAS 99
Korbel et al., Nat. Biotechn. 04
Morett et al., Nat. Biotechn. 03
4Prediction of analogous enzymes by
anti-correlation of gene occurrences
Species A B C D
Gene a - - -
Gene b - -
Application: thiamine-PP biosynthesis
Collaboration with Enrique Morett et al., Mexico
Morett et al., Nature Biotech. 21(03)790
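The anti-correlation idea on this slide can be sketched in a few lines. This is a minimal illustration and not the published method of Morett et al.: the presence/absence profiles, the gene names and the scoring function `complementarity` are all hypothetical.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) profiles across four species A-D.
profiles = {
    "gene_a": (1, 0, 1, 0),
    "gene_b": (0, 1, 0, 1),
    "gene_c": (1, 1, 1, 0),
}

def complementarity(p, q):
    """Fraction of species in which exactly one of the two genes occurs."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same species")
    return sum(x != y for x, y in zip(p, q)) / len(p)

def anticorrelated_pairs(profiles, min_score=1.0):
    """Return gene pairs whose occurrence patterns are complementary."""
    hits = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        score = complementarity(p1, p2)
        if score >= min_score:
            hits.append((g1, g2, score))
    return hits

pairs = anticorrelated_pairs(profiles)
print(pairs)
```

Pairs with a high score never (or rarely) occur in the same genome, which makes them candidates for analogous enzymes that fill the same step. Lowering `min_score` would tolerate a few species that carry both genes or neither.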
5Gene neighbourhood conservation at evolutionary
time scales
Conservation of divergently transcribed gene
pairs reveals functional constraints
6The more conserved divergently transcribed
neighboring genes are, the higher is their level
of co-expression
The resulting prediction method can reliably
predict associations between >2500 pairs of
genes, ca. 650 of which are supported by other
methods
Korbel, Jensen, von Mering, Bork Nat. Biotechnol.
2004, July
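As a rough sketch of the first step behind such a method, the code below finds adjacent genes that are transcribed away from each other and counts in how many genomes each pair recurs. The genomes, gene names and coordinates are hypothetical, gene names stand in for orthologous groups, and circular replicons are ignored for brevity. It is not the procedure of Korbel et al.

```python
from collections import Counter

def divergent_pairs(genes):
    """Return adjacent gene pairs transcribed away from each other (<- ->).

    `genes` is a list of (name, start, strand) tuples for one replicon,
    with strand given as '+' or '-'.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        # Divergent: upstream gene on the minus strand, downstream on plus.
        if left[2] == "-" and right[2] == "+":
            pairs.append(frozenset((left[0], right[0])))
    return pairs

# Hypothetical genomes; gene names stand for orthologous groups.
genomes = {
    "species_1": [("regX", 100, "-"), ("opA", 400, "+"), ("opB", 900, "+")],
    "species_2": [("opA", 50, "+"), ("regX", 700, "-"), ("opC", 1200, "+")],
    "species_3": [("regX", 10, "-"), ("opA", 300, "+")],
}

conservation = Counter()
for genes in genomes.values():
    conservation.update(divergent_pairs(genes))

for pair, n in conservation.most_common():
    print(sorted(pair), n)
```

A pair seen in many distantly related genomes is a stronger candidate for a functional link than a pair seen once. A real analysis would also weight genomes by their evolutionary distance.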
7Transcriptional regulators comprise the majority
of conserved divergently transcribed gene pairs
They are all self-regulatory!
8Coverage: Homology vs. context
(80% accuracy level, taken from STRING COG mode)
Huynen, Snel, von Mering and Bork.
Curr. Opin. Cell Biol. 15(03)191
9II. Protein network analysis
Genomic context analysis: Interaction predictions
Building and destroying interaction networks
STRING: a framework for network analysis
Towards spatial and temporal network aspects
10Three context methods to predict functional
interactions
combined and quantified in STRING
Von Mering et al. NAR 31(03)258
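One simple way to combine several evidence channels into a single confidence value is to treat them as independent and ask for the probability that at least one of them is right. The sketch below shows that rule. The slide does not state the formula STRING uses, so the rule is an assumption here. The channel names and scores are hypothetical.

```python
from math import prod

def combined_score(scores):
    """Combine per-method confidence scores assuming independent evidence.

    Each score is read as the probability that the link is real; the
    combined value is the probability that at least one method is right.
    """
    scores = list(scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return 1.0 - prod(1.0 - s for s in scores)

# Hypothetical scores for one gene pair from three context methods.
evidence = {"neighbourhood": 0.6, "fusion": 0.0, "co-occurrence": 0.5}
score = combined_score(evidence.values())
print(round(score, 3))
```

Under this rule a pair supported weakly by two methods can outrank a pair supported moderately by one, which is the practical benefit of integrating the methods in a single framework.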
11Biochemical pathways vs functional modules
comparative genomics functional modules
purine biosynthesis
histidine biosynthesis
www.string.embl-heidelberg.de
pathway representation
12Giant component of gene context network
High local connectivity (c = 0.6), hence a lot
of substructure
The higher the conservation (red), the higher the
number of connections
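Assuming that "c" on this slide denotes the average clustering coefficient, the quantity can be computed as sketched below. The small network is hypothetical and only serves to show the calculation.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * linked / (k * (k - 1))

def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical undirected network as an adjacency dict of sets.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
c = average_clustering(adj)
print(round(c, 3))
```

A value near 1 means that the neighbours of a gene are mostly linked to each other, which is what one expects when the network is made of tightly knit modules.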
13Biochemical pathways vs functional modules
purine biosynthesis
histidine biosynthesis
pathway representation
unsupervised clustering
comparative genomics functional modules
Coverage >70%
Specificity ca. 90%
Von Mering et al. PNAS 100 (2003) 15428
14Biological discoveries
- Functional assignment of >3000 hypothetical
proteins
- Missing enzymes in known pathways
- Targets for transcription regulators,
transporters etc.
- Pathway links (CoA and nucleotide biosynth.)
- Independent modules within known pathways
- Potentially novel pathways/processes/complexes
15Synergies between homology and context based
methods
Query protein: known transcriptional regulator
PyrR
STRING annotations
known
Doerks et al. TIG, 2004
16Biological discoveries
- Functional assignment of >3000 hypothetical
proteins
- Missing enzymes in known pathways
- Targets for transcription regulators,
transporters etc.
- Pathway links (CoA and nucleotide biosynth.)
- Independent modules within known pathways
- Potentially novel pathways/processes/complexes
17Functional modules in E. coli
(Only modules with >3 nodes shown)
About 650 modules predicted (120 metabolic)
About 140 modules dominated by hypotheticals
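The counts above can be illustrated with a toy example. The slides do not say on this slide how modules are delineated, so the sketch swaps in the simplest stand-in, connected components of a high-confidence link network, and then keeps modules with more than three nodes. All gene names are hypothetical, and names starting with "hyp" stand for hypothetical proteins.

```python
def connected_components(edges):
    """Group nodes into connected components of an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical high-confidence links.
edges = [("purA", "purB"), ("purB", "purC"), ("purC", "purD"),
         ("hisA", "hisB"), ("hyp1", "hyp2"), ("hyp2", "hyp3"),
         ("hyp3", "hyp4"), ("hyp4", "purX")]

# Keep modules with more than three nodes, as on the slide.
modules = [c for c in connected_components(edges) if len(c) > 3]
# A module is "dominated by hypotheticals" if they form the majority.
dominated = [m for m in modules
             if sum(n.startswith("hyp") for n in m) > len(m) / 2]
print(len(modules), len(dominated))
```

Modules dominated by hypothetical proteins are the interesting ones for discovery, since a single characterised member can suggest a function for the rest.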
18II. Protein network analysis
Genomic context analysis: Interaction predictions
Building and destroying interaction networks
STRING: a framework for network analysis
Towards spatial and temporal network aspects
19Functional associations between proteins: 80,000
from large-scale approaches in yeast
20Counting functional associations
Binary interactions vs. groups of interacting
proteins
(Figure: septin complex example.
Proteins shown: SHS1, CDC10, CIN2, CDC12, CDC3, GIN4, CDC11, ARC1, SPR28, LPD1.
Legend: TAP purification; HMS-PCI purification; two-hybrid interaction;
annotated member of septin complex.)
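The difference between counting binary interactions and counting groups can be made concrete with two common conventions for turning a purification into pairs: the spoke model (bait to each prey) and the matrix model (every pair in the group). The sketch below uses septin names from the slide, but the choice of bait and preys is purely illustrative and not taken from any data set.

```python
from itertools import combinations

def spoke_pairs(bait, preys):
    """Count only bait-prey links (spoke model)."""
    return {frozenset((bait, p)) for p in preys if p != bait}

def matrix_pairs(members):
    """Count every pair within the purified group (matrix model)."""
    return {frozenset(pair) for pair in combinations(sorted(set(members)), 2)}

# Illustrative purification: bait and preys are assumptions.
bait = "CDC12"
preys = ["CDC3", "CDC10", "CDC11"]

spoke = spoke_pairs(bait, preys)
matrix = matrix_pairs([bait] + preys)
print(len(spoke), len(matrix))
```

For a group of n proteins the spoke model yields n - 1 pairs and the matrix model n(n - 1)/2, so the choice of convention strongly affects how many "interactions" a complex purification contributes relative to a two-hybrid screen.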
21Distribution of interacting proteins (TAP
complexes)
(Figure: interaction density by functional category,
scale 0 to 10 actual interactions per 1000 possible pairs.
Categories: energy production; amino acid metabolism; other metabolism;
translation; transcription; transcriptional control; protein fate;
cellular organization; transport and sensing; stress and defense;
genome maintenance; cellular fate/organization; uncharacterized.)
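The density measure named in the figure, actual interactions per 1000 possible pairs, can be written out as below. The protein and interaction counts are hypothetical and only illustrate the arithmetic.

```python
def interaction_density(n_a, n_b, observed, same_category=False):
    """Observed interactions per 1000 possible protein pairs.

    For two different categories the number of possible pairs is n_a * n_b.
    Within one category it is n_a * (n_a - 1) / 2 and n_b is ignored.
    """
    possible = n_a * (n_a - 1) // 2 if same_category else n_a * n_b
    if possible == 0:
        raise ValueError("no possible pairs")
    return 1000.0 * observed / possible

# Hypothetical counts, for illustration only.
within = interaction_density(200, 200, 150, same_category=True)
between = interaction_density(200, 50, 20)
print(round(within, 2), round(between, 2))
```

Normalising by the number of possible pairs makes categories of very different size comparable, so a high density within a category reflects enrichment and not just category size.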
22Reference interactions
manually annotated protein complexes MIPS / YPD
high-throughput interaction data OVERLAP OF 2
METHODS
2455 interactions
10907 interactions
23Protein interaction datasets
purified complexes (TAP)
purified complexes (HMS-PCI)
genomic associations
18027 interactions
7446 interactions
33014 interactions
synthetic lethals
yeast two-hybrid
mRNA synexpression
16496 interactions
886 interactions
5125 interactions
24A probabilistic approach for function prediction
Benchmarking high-throughput interaction data
Von Mering, C., Krause, R., Snel, B., Oliver, S.G.,
Fields, S. and Bork, P. Nature 417(2002)399
(Figure: log-log plot of coverage against accuracy.
Coverage: fraction of reference set covered by data (%, log scale), ticks from 0.1 to 100.
Accuracy: fraction of data confirmed by reference set (%, log scale), ticks from 0.1 to 100.
Data sets: purified complexes (TAP), purified complexes (HMS-PCI),
genomic associations, mRNA synexpression, synthetic lethality,
yeast two-hybrid, and combined evidence from two or three methods.
Point types: raw data, filtered data, parameter choices.)
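The two axes of the benchmark can be computed directly from their definitions on the slide. The sketch below is a simplification with hypothetical protein identifiers. The published benchmark may restrict the comparison further (for example to proteins that occur in the reference set), which is not modelled here.

```python
def as_pairs(interactions):
    """Normalise interactions to unordered pairs so (a, b) equals (b, a)."""
    return {frozenset(pair) for pair in interactions}

def benchmark(dataset, reference):
    """Return (coverage %, accuracy %) of a dataset against a reference set."""
    data, ref = as_pairs(dataset), as_pairs(reference)
    if not data or not ref:
        raise ValueError("both sets must be non-empty")
    shared = len(data & ref)
    coverage = 100.0 * shared / len(ref)   # fraction of reference recovered
    accuracy = 100.0 * shared / len(data)  # fraction of data confirmed
    return coverage, accuracy

# Hypothetical reference set and high-throughput data set.
reference = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5")]
dataset = [("p2", "p1"), ("p3", "p4"), ("p1", "p5"), ("p6", "p7"), ("p8", "p9")]

coverage, accuracy = benchmark(dataset, reference)
print(coverage, accuracy)
```

Requiring support from two or three methods amounts to intersecting the data sets before benchmarking, which typically raises accuracy at the cost of coverage.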
25STRING known and predicted functional links
Please show me the functional context of these
proteins?
ATP1
QCR2
26STRING known and predicted functional links
ATP synthase
Ubiquinol-Cyt.C reductase
27II. Protein network analysis
Genomic context analysis: Interaction predictions
Building and destroying interaction networks
STRING: a framework for network analysis
Towards spatial and temporal network aspects
28EMBL's Structural and Computational Biology unit
From molecules to organisms
NMR
Xray
EM
Computational Biology
3D tomography
Protein/DNA
Complex
Synchrotrons
Subcellular structure
Gene expression
Cell
Cell Biology
Core facilities
Developmental Biology
Organism
In red: other EMBL units
29From interactions to 3D protein complexes Large
scale modeling and EM mapping
(exosome case study: Aloy et al., EMBO Rep, 2002)
Characterise the domains
TAP (Cellzome)
x300
30Structure-based assembly of protein complexes
Analysis of 101 yeast complexes and their
interactions
From functional associations to three-dimensional
assemblies
Aloy, P., Boettcher B., Ceulemans, H., Leutwein,
C., Mellwig, C., Fischer, S., Gavin, A.-C.,
Bork, P., Superti-Furga, G., Serrano, L. and
Russell, R.B. Science 303 (2004) 2026
314D
Dynamic complex formation during the 90 min yeast
cell cycle
Multiple arrays reveal 600 periodically expressed
genes
Projection onto interaction data identifies novel
assemblies
Details of the time-dependent formation of some
assemblies revealed
Some unknown proteins detected in well-studied
cell cycle assemblies
Color: periodically expressed proteins
Lichtenberg, Larsen et al
32Losses/Gains of Functional Associations
M. pneumoniae M. genitalium
M. pneumoniae only
(Linked by conserved neighborhood or fused
proteins, combined score >0.95)
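Comparing the association networks of two genomes comes down to set operations on their confident links. The sketch below applies the slide's cut-off (combined score >0.95). The orthologous group identifiers and scores are hypothetical.

```python
def confident_links(scored_links, threshold=0.95):
    """Keep unordered pairs whose combined score exceeds the threshold."""
    return {frozenset((a, b)) for a, b, s in scored_links if s > threshold}

# Hypothetical scored links between orthologous groups in two genomes.
genome_1 = [("og1", "og2", 0.99), ("og2", "og3", 0.97), ("og4", "og5", 0.96)]
genome_2 = [("og2", "og1", 0.98), ("og2", "og3", 0.60)]

links_1 = confident_links(genome_1)
links_2 = confident_links(genome_2)

shared = links_1 & links_2   # associations present in both genomes
only_1 = links_1 - links_2   # associations found in genome 1 only
print(len(shared), len(only_1))
```

Links found in one genome only can reflect either a lost gene or a lost association between genes that are still present, so a follow-up step would check gene presence separately.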
33Comparison of the interaction networks in three
mollicutes
Differential analysis
ribose/xylose sugar-transport
Gene present ()
M. pneumoniae
fructose-specific phosphotransferase system (plus
assoc. enzymes)
U. parvum
M. pulmonis
urease enzyme complex
glycerol metabolism
ABC-type phosphate transport system (incl.
regulator)
34TCA cycle
Modification of functional modules at
evolutionary time scales
Huynen et al TIM 1999
35Summary (network analysis)
Gene context and other concepts for interaction
predictions not only complement homology
approaches, but are about to offer more
functional information than BLAST et al.
Gene context methods already achieve ca. 90%
specificity / 70% sensitivity in predicting
functional modules in prokaryotes
In eukaryotes, accurate prediction of networks
and modules is still difficult, and heterogeneous
experimental data have to be integrated
Spatial and temporal aspects of protein networks
have great potential, although data are still
limited
37Credits
Context methods
Enrique Morett et al. (Mex)
Functional modules
Christos Ouzounis et al. (EBI)
STRING
Berend Snel, Martijn Huynen (Nijmegen)
Networks in 3D
Rob Russell, Patrick Aloy, Bettina Boettcher
(EMBL), Cellzome AG
Network in 4D
Ulrik de Lichtenberg, Soren Brunak (CBS)
Chicken international sequencing and
analysis consortium
All other group members and many experimental
collaborators