Title: Protein Interaction Networks
1 Protein Interaction Networks
Feb. 21, 2013
Aalt-Jan van Dijk
Applied Bioinformatics, PRI, Wageningen UR
Mathematical and Statistical Methods, Biometris, Wageningen University
aaltjan.vandijk@wur.nl
2 My research
- Protein complex structures
- Protein-protein docking
- Correlated mutations
- Interaction site prediction/analysis
- Protein-protein interactions
- Enzyme active sites
- Protein-DNA interactions
- Network modelling
- Gene regulatory networks
- Flowering related
3 Overview
- Introduction to protein interaction networks
- Sequences and networks: predicting interaction sites
- Predicting protein interactions
- Sequence and network evolution
- Interaction network alignment
5 Protein Interaction Networks
- Obligatory interactions, e.g., hemoglobin
- Transient interactions, e.g., mitochondrial Cu transporters
6 Experimental approaches (1)
7 Experimental approaches (2)
- Affinity purification mass spectrometry (AP-MS)
8-17 Interaction Databases
- STRING http://string.embl.de/
- HPRD http://www.hprd.org/
- MINT http://mint.bio.uniroma2.it/mint/
- INTACT http://www.ebi.ac.uk/intact/
- BIOGRID http://thebiogrid.org/
18 Some numbers
Organism          Number of known interactions
H. sapiens        113,217
S. cerevisiae      75,529
D. melanogaster    35,028
A. thaliana        13,842
M. musculus        11,616
Source: Biogrid (physical interactions)
19 Overview
- Introduction to protein interaction networks
- Sequences and networks: predicting interaction sites
- Predicting protein interactions
- Sequence and network evolution
- Interaction network alignment
20 Binding site
21 Binding site prediction
23 Binding site prediction
- Applications
  - Understanding network evolution
  - Understanding changes in protein function
  - Predicting protein interactions
  - Manipulating protein interactions
- Input data
  - Interaction network
  - Sequences (possibly structures)
24 Sequence-based predictions
27 Sequences and networks
- Goal: predict interaction sites and/or motifs
- Data: interaction networks, sequences
- Validation: structure data, motif databases
28 Motif search in groups of proteins
- Group proteins that have the same interaction partner
- Use motif search, e.g., find PWMs (position weight matrices)
Neduva, PLoS Biol 2005
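The two steps on this slide can be sketched in Perl, the language of the exercise skeleton at the end of this deck. This is a minimal illustration, not the method of Neduva et al.: it assumes the candidate sites are already aligned and of equal length, uses a uniform background of 1/20 per amino acid plus a pseudocount, and all protein names and peptides are hypothetical.

```perl
use strict;
use warnings;

# Group proteins by shared interaction partner.
# Input: list of [protein1, protein2] pairs (interactions are undirected).
# Output: hash partner => sorted list of proteins that interact with it.
sub group_by_partner {
    my @pairs = @_;
    my %group;
    for my $pair (@pairs) {
        my ($p, $q) = @$pair;
        $group{$q}{$p} = 1;
        $group{$p}{$q} = 1;
    }
    return map { $_ => [ sort keys %{ $group{$_} } ] } keys %group;
}

# Build a log2-odds PWM from aligned sites of equal length.
# Assumption: uniform background (1/20 per amino acid) plus a pseudocount.
sub build_pwm {
    my ($pseudo, @sites) = @_;
    my @alphabet = split //, 'ACDEFGHIKLMNPQRSTVWY';
    my $n = scalar @sites;
    my @pwm;
    for my $i (0 .. length($sites[0]) - 1) {
        my %count = map { $_ => 0 } @alphabet;
        $count{ substr($_, $i, 1) }++ for @sites;
        for my $res (@alphabet) {
            my $freq = ($count{$res} + $pseudo) / ($n + 20 * $pseudo);
            $pwm[$i]{$res} = log($freq / 0.05) / log(2);
        }
    }
    return \@pwm;
}

# Score a candidate site (same length as the PWM) by summing per-position weights.
sub score_site {
    my ($pwm, $site) = @_;
    my $score = 0;
    $score += $pwm->[$_]{ substr($site, $_, 1) } for 0 .. $#$pwm;
    return $score;
}

# Toy usage with hypothetical proteins and peptides.
my %groups = group_by_partner(['P1', 'X'], ['P2', 'X'], ['P3', 'X'], ['P1', 'Y']);
print "X binds: @{ $groups{X} }\n";    # X binds: P1 P2 P3
my $pwm = build_pwm(0.5, 'PLKP', 'PLRP', 'PIKP');
printf "PLKP %.2f, AAAA %.2f\n", score_site($pwm, 'PLKP'), score_site($pwm, 'AAAA');
```

In a real analysis the grouped sequences would first go through a motif-discovery or alignment step to produce the sites that build_pwm expects.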
29 Correlated Motifs
30 Correlated Motifs
- Motif model
- Search
- Scoring
31-35 Predefined motifs
37 Correlated Motif Mining
- Find motifs in one set of proteins that interact with (almost) all proteins carrying another motif
- Motif models
  - PWM: so far not applied
  - (l,d): l = length, d = number of wildcards
- Score overrepresentation, e.g., χ²
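A minimal Perl sketch of the (l,d) motif model and of a χ² overrepresentation score follows. It assumes wildcards are written as '.', so a motif doubles as a regular expression, and it uses a generic Pearson χ² on a 2x2 table. The table layout (protein pairs drawn from the two motif sets versus all other pairs, split by interacting or not) is a hypothetical choice for illustration, not necessarily the exact score used by any published tool.

```perl
use strict;
use warnings;

# An (l,d) motif is written here as a string of length l in which exactly d
# positions are the wildcard '.', e.g. 'P..P' is a (4,2) motif.
sub is_ld_motif {
    my ($motif, $l, $d) = @_;
    return 0 unless $motif =~ /^[A-Z.]+$/ && length($motif) == $l;
    my $wildcards = () = $motif =~ /\./g;    # count the '.' characters
    return $wildcards == $d ? 1 : 0;
}

# Does the sequence contain the motif? '.' is also the regex wildcard.
sub has_motif {
    my ($seq, $motif) = @_;
    return $seq =~ /$motif/ ? 1 : 0;
}

# Pearson chi-square statistic for a 2x2 contingency table [[a, b], [c, d]].
# Hypothetical layout: rows = pairs (motif-1 protein, motif-2 protein) versus
# all other pairs; columns = interacting versus not interacting.
sub chi2_2x2 {
    my ($a1, $b1, $c1, $d1) = @_;
    my $n   = $a1 + $b1 + $c1 + $d1;
    my $den = ($a1 + $b1) * ($c1 + $d1) * ($a1 + $c1) * ($b1 + $d1);
    return 0 if $den == 0;
    return $n * ($a1 * $d1 - $b1 * $c1)**2 / $den;
}

printf "%d\n", is_ld_motif('P..P', 4, 2);       # 1
printf "%d\n", has_motif('MSPLKPE', 'P..P');    # 1
printf "%.2f\n", chi2_2x2(10, 5, 20, 65);       # 11.30
```

A larger χ² means the motif pair co-occurs across interactions more often than the marginal counts would predict.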
38 Correlated Motif Mining
- Find motifs in one set of proteins that interact with (almost) all proteins carrying another motif
- Search
  - Interaction driven
  - Motif driven
39 Interaction driven approaches
Mine for (quasi-)bicliques, i.e., most-versus-most interaction. Then derive a motif pair from the sequences.
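The core test inside biclique mining, checking how close two protein sets come to a complete biclique, can be sketched in Perl. This covers only that check, not the search for the sets themselves; the protein names and the 0.7 density threshold are hypothetical.

```perl
use strict;
use warnings;

# Store undirected interactions as a hash of "p|q" keys (both orders).
sub edge_set {
    my %edges;
    for my $pair (@_) {
        my ($p, $q) = @$pair;
        $edges{"$p|$q"} = $edges{"$q|$p"} = 1;
    }
    return \%edges;
}

# Fraction of cross pairs (x in X, y in Y) that interact.
# 1.0 means (X, Y) is a complete biclique; a value close to 1 is a quasi-biclique.
sub biclique_density {
    my ($edges, $X, $Y) = @_;
    my ($hits, $total) = (0, 0);
    for my $x (@$X) {
        for my $y (@$Y) {
            $total++;
            $hits++ if $edges->{"$x|$y"};
        }
    }
    return $total ? $hits / $total : 0;
}

my $edges   = edge_set(['A', 'C'], ['A', 'D'], ['B', 'C']);
my $density = biclique_density($edges, ['A', 'B'], ['C', 'D']);
print "density $density\n";    # density 0.75
my $min_density = 0.7;         # hypothetical quasi-biclique threshold
print $density >= $min_density ? "quasi-biclique\n" : "not a quasi-biclique\n";
```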
40 Motif driven approaches
Starting from candidate motif pairs, evaluate
their support in the network (and improve them)
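Evaluating the network support of one candidate motif pair can be sketched as a count of interactions in which one partner carries the first motif and the other partner carries the second. This Perl sketch assumes motifs are regular expressions with '.' as wildcard; sequences and interactions are hypothetical toy data.

```perl
use strict;
use warnings;

# Count network support for a candidate motif pair (m1, m2): the number of
# interactions in which one partner contains m1 and the other contains m2.
sub motif_pair_support {
    my ($seq, $pairs, $m1, $m2) = @_;
    my $support = 0;
    for my $pair (@$pairs) {
        my ($sp, $sq) = ($seq->{ $pair->[0] }, $seq->{ $pair->[1] });
        $support++ if ($sp =~ /$m1/ && $sq =~ /$m2/)
                   || ($sp =~ /$m2/ && $sq =~ /$m1/);
    }
    return $support;
}

# Hypothetical toy sequences and interactions.
my %seq   = (A => 'MSPLKPE', B => 'GWEWK', C => 'KKKK', D => 'GWAWPAAP');
my @pairs = (['A', 'B'], ['A', 'C'], ['C', 'D'], ['B', 'D']);
printf "%d\n", motif_pair_support(\%seq, \@pairs, 'P..P', 'W.W');    # 2
```

Checking both orientations matters because an interaction list does not say which partner carries which motif.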
41 D-MOTIF
Tan, BMC Bioinformatics 2006
43 IMSS application of D-MOTIF
[Figure: motif pairs in protein X and protein Y; plot of test error vs. number of selected motif pairs]
Van Dijk et al., Bioinformatics 2008; Van Dijk et al., PLoS Comp Biol 2010
44-46 Experimental validation
[Figure: protein X and protein Y]
Van Dijk et al., Bioinformatics 2008; Van Dijk et al., PLoS Comp Biol 2010
47 SLIDER
Boyen et al., Trans Comp Biol Bioinf 2011
48 SLIDER
- Faster approach, enabling genome-wide search
- Scoring: χ²
- Search: steepest ascent
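Steepest ascent over motifs can be sketched generically in Perl: evaluate every motif that differs from the current one at a single position, move to the best one, and stop at a local optimum. The scoring function is passed in as a code reference; here a hypothetical toy score (agreement with a hidden target) stands in for the χ² network score, and the small alphabet is also an assumption to keep the example short.

```perl
use strict;
use warnings;

# Steepest ascent over fixed-length motifs: at each step evaluate every motif
# that differs from the current one at a single position and move to the best
# one; stop when no neighbour scores strictly higher (a local optimum).
sub steepest_ascent {
    my ($start, $alphabet, $score) = @_;
    my ($current, $best) = ($start, $score->($start));
    while (1) {
        my ($cand, $cand_score) = ($current, $best);
        for my $i (0 .. length($current) - 1) {
            for my $c (@$alphabet) {
                next if $c eq substr($current, $i, 1);
                my $neighbour = $current;
                substr($neighbour, $i, 1) = $c;
                my $s = $score->($neighbour);
                ($cand, $cand_score) = ($neighbour, $s) if $s > $cand_score;
            }
        }
        last if $cand eq $current;    # no improving neighbour
        ($current, $best) = ($cand, $cand_score);
    }
    return ($current, $best);
}

# Hypothetical toy score (stand-in for a chi-square network score):
# number of positions that agree with a hidden target motif.
my $target    = 'P..P';
my $toy_score = sub {
    my $m = shift;
    return scalar grep { substr($m, $_, 1) eq substr($target, $_, 1) } 0 .. length($m) - 1;
};
my ($motif, $final) = steepest_ascent('AAAA', ['A', 'P', '.'], $toy_score);
print "$motif $final\n";    # P..P 4
```

Because each step requires a strict improvement, the loop always terminates; the price is that it can stop at a local rather than a global optimum.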
49 Validation
- Performance assessment on simulated data
- Performance assessment using protein structures
51 Extensions of SLIDER
- Extension I: better coverage of the network (Boyen et al., Trans Comp Biol Bioinf 2013)
- Extension II: use of more biological information
55 bioSLIDER
Example sequence, annotated with conservation and accessibility: DGIFELELYLPDDYPMEAPKVRFLTKI
- Thresholds for conservation and accessibility
- Extension of the motif model: amino acid similarity (BLOSUM)
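One simple way to apply conservation and accessibility thresholds is to mask residues that fail either threshold before motif matching. The Perl sketch below does this; the per-residue scores and the thresholds (0.5 and 0.3) are hypothetical, and the BLOSUM-based similarity extension is not shown.

```perl
use strict;
use warnings;

# Replace residues that fail either threshold with 'x', so that motif letters
# can only match residues that are both conserved and accessible.
sub mask_sequence {
    my ($seq, $cons, $acc, $min_cons, $min_acc) = @_;
    my $masked = '';
    for my $i (0 .. length($seq) - 1) {
        my $keep = $cons->[$i] >= $min_cons && $acc->[$i] >= $min_acc;
        $masked .= $keep ? substr($seq, $i, 1) : 'x';
    }
    return $masked;
}

# Hypothetical per-residue scores for a toy five-residue sequence.
my $masked = mask_sequence('PLKPE',
    [0.9, 0.2, 0.8, 0.9, 0.1],    # conservation
    [0.5, 0.5, 0.1, 0.6, 0.9],    # relative accessibility
    0.5, 0.3);
print "$masked\n";                                     # PxxPx
print $masked =~ /P..P/ ? "match\n" : "no match\n";    # match
```

With this design, wildcard positions of a motif still match masked residues; only the specified motif letters are required to be conserved and accessible.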
56 bioSLIDER
Using human and yeast data for training and optimizing parameters
[Plot, two panels (no conservation, no accessibility; conservation and accessibility): interaction coverage vs. motif accuracy]
Leal Valentim et al., PLoS ONE 2012
57 Application to Arabidopsis
- Input data: 6,200 interactions, 2,700 proteins
- Interface predictions for 985 proteins (on average 20 residues)
Arabidopsis Interactome Mapping Consortium, Science 2011
58 Ecotype sequence data (SNPs)
- SNPs tend to avoid predicted binding sites
- In 263 proteins there is a SNP in a predicted binding site; these proteins are much more connected to each other than would be expected by chance
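A claim like "more connected than randomly expected" can be checked by counting interactions within the protein subset and comparing with random subsets of the same size. The Perl sketch below uses a simple permutation test; drawing random subsets uniformly (ignoring protein degree) is a simplifying assumption, and the protein names are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Number of interactions with both partners in the given protein subset.
# Assumes each interaction appears once in the list.
sub edges_within {
    my ($pairs, $subset) = @_;
    my %in = map { $_ => 1 } @$subset;
    return scalar grep { $in{ $_->[0] } && $in{ $_->[1] } } @$pairs;
}

# Empirical p-value: how often does a random subset of the same size contain
# at least as many internal interactions? Random subsets are drawn uniformly,
# ignoring degree (a simplification).
sub permutation_p {
    my ($pairs, $subset, $all, $n_perm) = @_;
    my $observed = edges_within($pairs, $subset);
    my $k        = @$subset;
    my $hits     = 0;
    for (1 .. $n_perm) {
        my @random = (shuffle @$all)[0 .. $k - 1];
        $hits++ if edges_within($pairs, \@random) >= $observed;
    }
    return ($hits + 1) / ($n_perm + 1);
}

srand(1);    # seed Perl's random number generator
my @pairs = (['A', 'B'], ['B', 'C'], ['A', 'C'], ['D', 'E'], ['F', 'G']);
my @all   = qw(A B C D E F G H);
printf "%d\n", edges_within(\@pairs, [qw(A B C)]);    # 3
printf "p = %.3f\n", permutation_p(\@pairs, [qw(A B C)], \@all, 999);
```

A degree-preserving null model (e.g. rewiring the network while keeping each protein's number of partners) is stricter, because well-connected proteins are more likely to be linked by chance.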
59 Summary
- Prediction of interaction sites using protein interaction networks and protein sequences
- Correlated motif approaches
60 Overview
- Introduction to protein interaction networks
- Sequences and networks: predicting interaction sites
- Predicting protein interactions
- Sequence and network evolution
- Interaction network alignment
62 Protein Interaction Prediction
Lots of genomes are being sequenced (www.genomesonline.org):
            Complete   Incomplete
ARCHAEA          182          264
BACTERIA        3767        14393
EUKARYA          183         2897
TOTAL           4132        17514
But how do we know how the proteins in there work together?!
63 Protein Interaction Prediction
- Interactions of orthologs: interologs
- Phylogenetic profiles
- Domain-based predictions
Example profiles: A = 1 0 1 1 0 0 1, B = 1 0 1 1 0 0 1
64-65 Orthology-based prediction
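Orthology-based prediction (interologs) can be sketched in Perl: every known interaction p-q is transferred to all pairs of orthologs p'-q' in the target species. The ortholog table and identifiers below are hypothetical; a real run would read them from files such as those used in the exercises.

```perl
use strict;
use warnings;

# Transfer each known interaction p-q to all ortholog pairs p'-q' in a target
# species ("interologs"). $orth maps a protein to its orthologs.
sub predict_interologs {
    my ($pairs, $orth) = @_;
    my %pred;
    for my $pair (@$pairs) {
        my ($p, $q) = @$pair;
        for my $op (@{ $orth->{$p} || [] }) {
            for my $oq (@{ $orth->{$q} || [] }) {
                # store each predicted pair once, independent of order
                my $key = join '|', sort { $a cmp $b } $op, $oq;
                $pred{$key} = 1;
            }
        }
    }
    return sort keys %pred;
}

# Hypothetical identifiers: a* in the source species, b* in the target species.
my @known = (['a1', 'a2'], ['a2', 'a3']);
my %orth  = (a1 => ['b1'], a2 => ['b2', 'b3']);    # a3 has no ortholog
print join(' ', predict_interologs(\@known, \%orth)), "\n";    # b1|b2 b1|b3
```

Interactions whose partners lack orthologs (here a2-a3) cannot be transferred, which limits the coverage of this approach.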
66 Phylogenetic profiles
Presence (1) or absence (0) of each protein across seven genomes:
A 1 0 1 1 0 0 1
B 1 0 1 1 1 0 1
C 1 0 1 1 1 0 1
D 0 1 0 1 0 0 1
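Comparing phylogenetic profiles amounts to measuring how different two presence/absence vectors are. The Perl sketch below uses the Hamming distance on the profiles above; treating distance 0 as the prediction criterion is a simplifying assumption, since published methods often use similarity thresholds or probabilistic scores instead.

```perl
use strict;
use warnings;

# Hamming distance between two presence/absence profiles of equal length.
sub hamming {
    my ($x, $y) = @_;
    return scalar grep { $x->[$_] != $y->[$_] } 0 .. $#$x;
}

# Profiles from the slide above.
my %profile = (
    A => [1, 0, 1, 1, 0, 0, 1],
    B => [1, 0, 1, 1, 1, 0, 1],
    C => [1, 0, 1, 1, 1, 0, 1],
    D => [0, 1, 0, 1, 0, 0, 1],
);
my @names = sort keys %profile;
for my $i (0 .. $#names) {
    for my $j ($i + 1 .. $#names) {
        my ($p, $q) = @names[$i, $j];
        my $d = hamming($profile{$p}, $profile{$q});
        printf "%s-%s %d%s\n", $p, $q, $d, $d == 0 ? ' (identical profiles)' : '';
    }
}
```

This prints distance 0 only for B-C, 1 for A-B and A-C, 3 for A-D, and 4 for B-D and C-D, so B and C would be the predicted pair.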
67-68 Domain-Based Predictions
69 Overview
- Introduction to protein interaction networks
- Sequences and networks: predicting interaction sites
- Predicting protein interactions
- Sequence and network evolution
- Interaction network alignment
70 Duplications
71-73 Duplications and interactions
[Figure: gene duplication and interaction loss, with rates of 0.1 per million years and 0.001 per million years]
74 Duplications and interaction loss
Duplicate pairs share interaction partners
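How strongly a duplicate pair shares partners can be quantified, for example, with the Jaccard index of their partner sets (shared partners divided by all partners of either protein). The choice of Jaccard index and the protein names D1, D2, P1-P4 are assumptions for illustration.

```perl
use strict;
use warnings;

# Set of interaction partners of one protein (undirected interactions).
sub partners {
    my ($pairs, $protein) = @_;
    my %nb;
    for my $pair (@$pairs) {
        my ($p, $q) = @$pair;
        $nb{$q} = 1 if $p eq $protein;
        $nb{$p} = 1 if $q eq $protein;
    }
    return \%nb;
}

# Jaccard index of the partner sets of two proteins: |shared| / |union|.
sub jaccard_partners {
    my ($pairs, $x, $y) = @_;
    my ($nx, $ny) = (partners($pairs, $x), partners($pairs, $y));
    my %union = (%$nx, %$ny);
    return 0 unless %union;
    my $shared = grep { $ny->{$_} } keys %$nx;
    return $shared / scalar(keys %union);
}

# Hypothetical duplicate pair D1/D2 and their partners.
my @pairs = (['D1', 'P1'], ['D1', 'P2'], ['D1', 'P3'],
             ['D2', 'P2'], ['D2', 'P3'], ['D2', 'P4']);
printf "%.2f\n", jaccard_partners(\@pairs, 'D1', 'D2');    # 0.50
```

Comparing this value for duplicate pairs against random protein pairs is one way to test whether duplicates share more partners than expected.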
75 Interaction network evolution
Science 2011
76 Overview
- Introduction to protein interaction networks
- Sequences and networks: predicting interaction sites
- Predicting protein interactions
- Sequence and network evolution
- Interaction network alignment
77 Network alignment
- Local network alignment: find multiple, unrelated regions of isomorphism
- Global network alignment: find the best overall alignment
78 PATHBLAST
Kelley, PNAS 2003
79 PATHBLAST scoring
[Figure: score terms for homology and interaction]
Kelley, PNAS 2003
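The combination of a homology term and an interaction term can be made concrete as a sum of log-odds terms: one per aligned protein pair and one per aligned edge, each relative to a background probability. To our understanding this matches the form of the PATHBLAST score, but the sketch below is a simplified illustration; all probabilities and background values are placeholders, not the paper's parameters.

```perl
use strict;
use warnings;

# Log-odds score of an aligned path: one homology term per aligned protein pair
# and one interaction term per aligned edge, each relative to a background
# probability. All probabilities used below are placeholders.
sub path_score {
    my ($p_hom, $q_int, $p_rand, $q_rand) = @_;
    my $score = 0;
    $score += log($_ / $p_rand) / log(10) for @$p_hom;    # homology terms
    $score += log($_ / $q_rand) / log(10) for @$q_int;    # interaction terms
    return $score;
}

# A path of L = 4 aligned protein pairs has 4 homology terms and 3 edge terms.
my $s = path_score([0.9, 0.8, 0.9, 0.7], [0.6, 0.5, 0.8], 0.1, 0.1);
printf "%.2f\n", $s;    # 6.04
```

Terms at the background probability contribute zero, so only evidence above background raises the path score.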
80-81 PATHBLAST results
For yeast vs. H. pylori, with L = 4, all resulting paths with p < 0.05 can be merged into just five network regions.
Kelley, PNAS 2003
82 Multiple alignment
- Scoring: probabilistic model for interaction subnetworks
- Subnetworks: bottom-up search, starting with exhaustive search for L = 4, followed by local search
Sharan, PNAS 2005
83 Multiple alignment results
Sharan, PNAS 2005
84 Multiple alignment results
Applications include protein function prediction and interaction prediction.
Sharan, PNAS 2005
85-86 Global alignment
Singh, PNAS 2008
87 Global alignment
Alignment: greedy selection of matches
Singh, PNAS 2008
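Greedy selection of matches can be sketched in Perl: sort all candidate node pairs by score and accept a pair only if neither node has been matched yet. This assumes the pairwise scores are already given. In Singh et al. they come from combining sequence similarity with network topology; here they are hypothetical numbers.

```perl
use strict;
use warnings;

# Greedy one-to-one alignment: sort all candidate node pairs by score and
# accept a pair only if neither node has been matched yet.
sub greedy_align {
    my ($score) = @_;    # $score->{u}{v}: similarity of node u (network 1) and v (network 2)
    my @cands;
    for my $u (keys %$score) {
        push @cands, [$u, $_, $score->{$u}{$_}] for keys %{ $score->{$u} };
    }
    # highest score first; ties broken by node names for a deterministic result
    @cands = sort {
        $b->[2] <=> $a->[2] || $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1]
    } @cands;
    my (%used1, %used2, @matches);
    for my $c (@cands) {
        my ($u, $v) = @$c;
        next if $used1{$u} || $used2{$v};
        $used1{$u} = $used2{$v} = 1;
        push @matches, [$u, $v];
    }
    return @matches;
}

# Hypothetical scores between nodes a1, a2 (network 1) and b1, b2 (network 2).
my %score   = (a1 => { b1 => 0.9, b2 => 0.8 }, a2 => { b1 => 0.85, b2 => 0.1 });
my @matches = greedy_align(\%score);
print join(' ', map { "$_->[0]-$_->[1]" } @matches), "\n";    # a1-b1 a2-b2
```

Here greedy selection gives a total score of 0.9 + 0.1 = 1.0, whereas matching a1-b2 and a2-b1 would give 0.8 + 0.85 = 1.65: greedy matching is fast but not guaranteed to be optimal.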
88 Network alignment: the future?
Sharan & Ideker, Nature Biotech 2006
89 Summary
- Studies of interaction network evolution are mostly comparative, with little mechanistic modelling
- Approaches exist to integrate and model network analysis within the context of phylogeny (not discussed here)
- Outlook: combine interaction site prediction with network evolution analysis
90 Exercises
The data files arabidopsis_proteins.lis and interactions_arabidopsis.data contain Arabidopsis MADS proteins (which regulate various developmental processes, including flowering) and their mutual interactions, respectively.
91 Exercise 1
- Start by getting familiar with the basic Cytoscape features described in section 1 of the tutorial: http://opentutorials.cgl.ucsf.edu/index.php/Tutorial:Introduction_to_Cytoscape
- Load the data into Cytoscape
- Visualize the network and analyze the number of interactions per protein: which proteins have many interactions?
92 Exercise 2
Write a script that reads interaction data and implements a data structure that enables further analysis of the data (see the setup on the next slides). Use the data files arabidopsis_proteins.lis and interactions_arabidopsis.data, and let the script print a table in the following format:
PROTEIN Number_of_interactions
Make a plot of those data.
93
use strict;
use warnings;
# two subroutines
# input: filename; output: list with the content of the file
sub read_list {
    my $infile = $_[0];
    my @newlist;
    # YOUR CODE
    return @newlist;
}
# input: protein list and interaction list (passed as array references)
# output: hash with proteins -> list of their partners
sub combine_prot_int {
    my ($plist, $intlist) = @_;
    my %inthash;
    # YOUR CODE
    return %inthash;
}
94
# reading input data
my @plist   = read_list($ARGV[0]);
my @intlist = read_list($ARGV[1]);
# obtaining hash with interactions
my %inthash = combine_prot_int(\@plist, \@intlist);
# YOUR CODE: loop over all proteins and print their name and their number of interactions
96 Exercise 3
In orthology_relations.data we have a set of predicted orthologs for the Arabidopsis proteins from exercise 1. protein_information.data describes, among other things, which species each protein comes from. Finally, interactions.data contains interactions between those proteins. Use the Arabidopsis interaction data from exercise 1 to predict interactions in other species using the orthology information. Compare your predictions with the real interaction data and make a plot that visualizes how good your predictions are.