Title: Proteins, interactions, complexes: A computational approach
1. Proteins, interactions, complexes: A computational approach
- Haidong Wang
- Department of Computer Science
- Stanford University
2. Motivation: from protein to pathway
[Figure: from protein-protein interaction (example sequence DKPALAKPPKV) to complex to pathway]
3. Challenge: noisy data and their integration
- Large amount of proteomic data available
- Localization, microarray expression, protein-protein interaction, transcription regulation, sequence, genetic interaction, Gene Ontology, trans-membrane, growth fitness, protein abundance
- High-throughput data are noisy
- Measurements correlate only weakly with the objective
- Integrate multiple datasets in a principled way
- Reduce noise
- Combine weak signals
4. Outline
[Figure: protein-protein interaction (DKPALAKPPKV), complex, pathway]
5. Outline
[Figure: protein-protein interaction (DKPALAKPPKV), complex, pathway; citations shown: Wang et al. 2004, Wang et al. 2007]
6. Proteins interact at small regions
- Physical binding between amino acids
- Chemically attractive
- Structurally complementary
- Targeted by mutations, viruses, and drugs
- -> Disrupt interaction
- -> Disease or cure
7. Challenge: few data for interaction sites
- Co-crystallization
- Costly
- Time-consuming
- Many proteins do not crystallize
- Physics-based simulation
- Computationally intensive
- Low throughput
- Docking
- Require known structure
8. Our approach, InSite: the intuition
[Figure: proteins A-E carrying motifs a-e; sequences DKPALAKPPKV and GAPDKLLPPKAK both contain the motif PPK]
- Sequence motifs from existing database
- Evolutionarily conserved
- Cover more than 70% of interaction sites
- Explain PPI by interactions between motifs
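The idea of locating shared motifs on protein sequences can be illustrated with a minimal sketch. It assumes motifs are literal substrings, which is a simplification: database motifs such as Pfam or Prosite entries are profiles or patterns rather than fixed strings. The protein names and the `motif_index` helper are hypothetical; the two sequences and the motif PPK come from the slide.

```python
from collections import defaultdict

def motif_index(sequences, motifs):
    """Map each motif to the proteins (and 0-based start positions) where it occurs."""
    index = defaultdict(dict)
    for protein, seq in sequences.items():
        for motif in motifs:
            # Literal substring match; real motif databases use richer patterns.
            starts = [i for i in range(len(seq) - len(motif) + 1)
                      if seq.startswith(motif, i)]
            if starts:
                index[motif][protein] = starts
    return dict(index)

# Hypothetical protein names; sequences taken from the slide.
sequences = {"protein_1": "DKPALAKPPKV", "protein_2": "GAPDKLLPPKAK"}
index = motif_index(sequences, ["PPK"])
print(index)
```

An index of this kind is what lets an observed protein-protein interaction be attributed to candidate motif pairs on the two proteins.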
9. Bayesian network to integrate data
[Figure: proteins A and C with motifs a, b, c, d]
- Evidence for protein-protein interactions
- Y2H
- TAP-MS
- Gavin, Krogan
- Co-expression
- Same function
- Evidence for motif-motif interactions
- Domain fusion
- Same function
Probability of fusion: P(Ef = 1)
- Motif-motif interaction non-deterministic
- Interaction site outside motifs
- Small-scale reliable interactions
- Sparse
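[Figure: Bayesian network for proteins A and C. Node labels: Fusion, GO, Eg, Ef, motif-pair binding variables Bac, Bad, Bbc, Bbd, S, OR, AC, I, Oe, Og. Evidence labels: Gavin score, co-expression. Annotation: P(B = 1) = 0.15]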
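The OR node in the network can be read as follows: proteins A and C interact if at least one motif pair binds, or if a binding occurs outside the motifs. A minimal sketch of that marginal probability is below. It assumes the binding variables are independent, treats S as a separate "outside motifs" term, and reuses the slide's value 0.15 for every motif pair purely for illustration; the function and parameter names are hypothetical.

```python
import math

def interaction_probability(binding_probs, outside_prob=0.0):
    """P(A and C interact) when the interaction is an OR over binding events.

    binding_probs: P(B = 1) for each motif pair (assumed independent).
    outside_prob:  probability of a binding site outside the motifs (assumption).
    """
    none_bind = (1.0 - outside_prob) * math.prod(1.0 - p for p in binding_probs)
    return 1.0 - none_bind

# Four motif pairs (ac, ad, bc, bd), each with P(B = 1) = 0.15.
p_ac = interaction_probability([0.15, 0.15, 0.15, 0.15])
print(round(p_ac, 4))
```

Because the OR is taken over several non-deterministic binding events, even modest per-pair probabilities can add up to a substantial interaction probability.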
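10. Sharing of the parameters
[Figure: the Bayesian network for proteins A and C, with node labels Fusion, GO, Eg, Ef, Bac, Bad, Bbc, Bbd, S, OR, AC, I, Oe, Og and evidence labels Gavin score, co-expression]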
11. Learning with EM
[Figure: EM loop alternating between E-step and M-step]
- In the E-step, we exploit the OR structure
- Closed-form solution in a large and dense network
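To illustrate why an OR structure admits a closed-form E-step, here is a sketch of the posterior over binding variables for a single protein pair. It assumes a deterministic OR over independent binding variables and a directly observed interaction; in the full model the interaction is itself tied to noisy evidence, so this is a simplification rather than the talk's exact computation. All names are hypothetical.

```python
import math

def binding_posteriors(priors, interacts):
    """P(B_k = 1 | interaction outcome) under a deterministic OR.

    priors:    prior P(B_k = 1) for each motif pair, assumed independent.
    interacts: True if the protein pair is observed to interact.
    """
    if not interacts:
        # An OR that is off forces every input off.
        return [0.0 for _ in priors]
    p_any = 1.0 - math.prod(1.0 - q for q in priors)
    if p_any == 0.0:
        raise ValueError("observed interaction has zero probability under the priors")
    # B_k = 1 already implies the OR is on, so the posterior is prior / P(OR = 1).
    return [q / p_any for q in priors]

posteriors = binding_posteriors([0.15, 0.30], interacts=True)
print([round(p, 3) for p in posteriors])
```

Each posterior needs only one product over the motif pairs of that protein pair, which is what keeps the step cheap even when the network is large and dense.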
12. Predict interaction site: intuition
- Is a given motif on protein C the interaction site with protein A?
- Target for disrupting the C-A interaction
[Figure: interaction network of proteins A-E carrying motifs a-e]
- Do not allow that motif on C to bind to A
- Evaluate how well the protein-protein interaction network is explained now
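13. Predict interaction site: model
[Figure: Bayesian network for proteins A and C with motifs a, b, c, d; node labels Eg, Ef, B, S, OR, AC, I, Oe, Og; motif d on C is marked "Interaction site?"]
- Do not allow motif d on C to bind protein A
- Re-train the model
- Compute the change in likelihood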
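The "forbid a motif, then compare likelihoods" test can be sketched for a single observed interaction. This version assumes the OR model over independent motif-pair bindings and skips the re-training step, so it only approximates the procedure on the slide. The probabilities and the `site_score` helper are hypothetical.

```python
import math

def interaction_prob(pair_probs):
    """P(interaction) as an OR over independent motif-pair bindings."""
    return 1.0 - math.prod(1.0 - p for p in pair_probs.values())

def site_score(pair_probs, motif_on_c):
    """Drop in log-likelihood of an observed A-C interaction when one motif
    on C is not allowed to bind. Keys are (motif_on_A, motif_on_C)."""
    full = interaction_prob(pair_probs)
    if full == 0.0:
        return 0.0  # nothing to explain away
    kept = {k: p for k, p in pair_probs.items() if k[1] != motif_on_c}
    restricted = interaction_prob(kept)
    if restricted == 0.0:
        return math.inf  # the interaction cannot be explained without this motif
    return math.log(full) - math.log(restricted)

# Hypothetical binding probabilities for motifs a, b on A and c, d on C.
pair_probs = {("a", "c"): 0.05, ("a", "d"): 0.60, ("b", "c"): 0.05, ("b", "d"): 0.10}
score_d = site_score(pair_probs, "d")
score_c = site_score(pair_probs, "c")
print(round(score_d, 3), round(score_c, 3))
```

A large drop for motif d and a small drop for motif c would rank d as the more likely interaction site on C.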
14. Prediction: interaction and its site
- Protein-protein interaction predictions
- Interaction site predictions
[Figure: two panels showing proteins A and C with motifs a, b, c, d]
15. Related work
- Predict protein-protein interactions using motifs
- Graphical model (Deng et al. 2002; Liu et al. 2005)
- Attraction-repulsion model (Gomez et al. 2003)
- Affinities between motif types, not specific to protein pairs
- Graphical model, exclusion analysis (DPEA) (Riley et al. 2005)
- LP formulation of parsimonious explanation of PPI by MMI (Guimaraes et al. 2006)
- Expected number of MMIs integrated with domain fusion, etc. (Lee et al. 2006)
- Our improvement
- Predict interaction sites specific to protein pairs
- Integrate evidence for proteins and motifs
16. Better interaction prediction
- 10-fold cross-validation on yeast to predict PPIs
- Compare with Gavin/Krogan for proteins in their assays
[Figure: two panels, TAP-MS (Gavin) and TAP-MS (Krogan); axes: true interactions in top pairs vs. false interactions in top pairs (scale x 10^4); area under ROC reported]
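The two quantities on these plots can be computed from a ranked list of held-out pairs. The sketch below assumes each pair has a prediction score and a true/false label; the scores and labels are made up for illustration, and ties in the ranking are not treated specially in the curve.

```python
def top_pairs_curve(scores, labels):
    """Cumulative (false, true) interaction counts while walking down the ranking."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    curve, true_count, false_count = [(0, 0)], 0, 0
    for _, is_true in ranked:
        if is_true:
            true_count += 1
        else:
            false_count += 1
        curve.append((false_count, true_count))
    return curve

def area_under_roc(scores, labels):
    """Probability that a random true pair outranks a random false pair."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out predictions.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
curve = top_pairs_curve(scores, labels)
auc = area_under_roc(scores, labels)
print(curve, round(auc, 3))
```

In a cross-validation setting, the scores for each fold would come from a model trained on the other folds.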
17. Better interaction prediction
- Prediction over all proteins
- Pfam works better than Prosite
[Figure: true interactions in top pairs vs. false interactions in top pairs]
18. Better interaction site prediction
- Verify interaction site predictions against PDB
- PDB co-crystallized proteins -> known interaction sites
[Figure: Pfam motifs; PDB interaction sites vs. PDB non-interaction sites; area under ROC reported]
19. Cancer mutation in interaction site
- Top prediction: SH2 on FYN binds to VAV1
- Verified (Michel et al. 2007)
- VAV1 and FYN are both implicated in carcinoma
- Hypothesis
- FYN mutation
- -> disrupts FYN-VAV1 interaction
- -> cancer
[Figure: SH2 (green) on FYN, the interaction site with VAV1; cancer mutation marked]
20. OMIM, human genetic disorder: top 10 predictions
- OMIM: database of mutations in human genes that are related to genetic diseases
Protein | Partner | Binding site | OMIM disease | Status
PROC | PROS1 | PS01187 | Protein C deficiency | Validated
PROC | PROS1 | PS50026 | Protein C deficiency | Validated
BAX | BCL2L1 | PS01259 | Leukemia | Validated
MMP2 | BCAN | PS00142 | Winchester syndrome | Consistent
STAT1 | SRC | PS50001 | STAT1 deficiency | Consistent
VAPB | VAMP2 | PS50202 | Amyotrophic lateral sclerosis | Consistent
VAPB | VAMP1 | PS50202 | Amyotrophic lateral sclerosis | Consistent
MMP2 | BCAN | PS00546 | Multicentric osteolysis, | Wrong
PLAU | PLAT | PS50070 | Alzheimer disease | No info
UCHL1 | S100A7 | PS00140 | Parkinson disease | No info
21. Conclusion
- Probabilistic method for prediction of
- Protein-protein interactions
- Interaction sites from sequence motifs
- High quality
- Genome-wide
- Generate testable hypotheses for disease mechanisms based on interaction site predictions
- How does a disruption of interaction lead to disease?
22. Outline
[Figure: protein-protein interaction (DKPALAKPPKV), complex, pathway; citation shown: Wang et al., in preparation]
23. TAP-MS detects complexes
- Tandem Affinity Purification with Mass Spectrometry (TAP-MS)
- Gavin et al. 2006
- Krogan et al. 2006
- Relatively high quality, genome-wide
- Purifications -> pairwise Purification Enrichment (PE) score
- Collins et al. 2007
- Likelihood of being in the same complex
[Figure: a bait protein purified together with prey proteins; pairs of proteins linked by PE scores]
24. Prior work: identify complexes from PE score
- Purifications -> PE scores
- -> clusters as complexes
- Clustering algorithm
- Hierarchical agglomerative clustering (HAC)
- Collins et al. 2007
- Markov Clustering (MCL)
- Hart et al. 2007, Pu et al. 2007
- No overlap
But the PE score is still noisy
[Figure: a bait protein purified together with prey proteins; pairs of proteins linked by PE scores]
25. Complex prediction: our contribution
- Integrate PE score with indirect evidence
- Yeast two-hybrid, co-expression, localization,
- Geared toward complex identification
- Overlapping complexes to improve accuracy
26. Related work: data integration
- Identify pathway, functional module, co-expression cluster
- Chen and Yuan 2006, Lee et al. 2007, Marcotte et al. 1999, Schlitt et al. 2003, Strong et al. 2003, von Mering et al. 2003, Yanai and DeLisi 2002, Yellaboina et al. 2007
- Indirect evidence correlated with pathway or functional similarity
- Predict pairwise affinity scores
- Zhang et al. 2004, Jensen et al. 2003
- Do not reconstruct complexes
- Non-scalable algorithm
- Our method is geared toward complex reconstruction
27. Our approach
- Larger reference set for training and validation
- 340 complexes, double the size of others
- LogitBoost to learn affinity
- Integrate evidence into co-complex likelihood
- Complex identification
- Cluster pairwise affinity graph by HAC
[Figure: pairwise evidence between proteins (Y2H, PE score, co-expression, co-localization, membrane?) combined into an affinity]
28. Limitation of HAC
- RAD23 merges with PNG1 first,
- but has only slightly lower affinity with RAD4
- RAD23 and RAD4 form a reference complex, NEF2
- In HAC, RAD23 is stuck with PNG1
- Solution: reuse RAD23
- Overlapping clusters
[Figure: HAC tree joining RAD23 with PNG1; reference complex NEF2 contains RAD23 and RAD4]
29. HAC overlap (HACO) algorithm
- Merge 2 non-overlapping sets with least distance (d)
- Add merged set to the pool
- HAC: remove the 2 sets
- Universal cutoff, single granularity
- Stuck with early mistakes
- No overlap
- HACO: remove any set A from the pool if d - M(A) > θ, for a threshold θ
- M(A): distance when A was first merged
- Cut tree into clusters
- Cutoff level chosen by cross-validation
- Optimize for complexes
30. HACO vs. HAC
- RAD23 merges with PNG1 first,
- but has only slightly lower affinity with RAD4
- RAD23 and RAD4 form a reference complex, NEF2
- In HAC, RAD23 is stuck with PNG1
- In HACO, RAD23 is reused to merge with RAD4
[Figure: HAC joins RAD23 with PNG1 only; HACO also joins RAD23 with RAD4, recovering NEF2]
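The RAD23 example can be reproduced with a small sketch of clustering with reuse. This is one reading of the HACO rule, not the released implementation: a set that has already been merged stays eligible for further merges only while the new merge distance d satisfies d - M(A) <= theta. Average linkage, the per-merge cutoff, and the distance values are assumptions chosen for illustration.

```python
from itertools import combinations

def average_distance(a, b, dist):
    """Average-linkage distance between two disjoint sets of items."""
    total = sum(dist[frozenset((x, y))] for x in a for y in b)
    return total / (len(a) * len(b))

def haco(items, dist, theta, cutoff):
    """Agglomerative clustering in which merged sets may be reused."""
    pool = [frozenset([x]) for x in items]
    first_merge = {}  # M(A): distance at which set A was first merged
    merged = {}       # merged set -> distance at which it was created
    while True:
        best = None
        for a, b in combinations(pool, 2):
            if a & b or (a | b) in merged:
                continue  # overlapping sets, or this union already exists
            d = average_distance(a, b, dist)
            if d > cutoff:
                continue
            # A set stays reusable only while d - M(A) <= theta.
            if any(s in first_merge and d - first_merge[s] > theta for s in (a, b)):
                continue
            if best is None or d < best[0]:
                best = (d, a, b)
        if best is None:
            break
        d, a, b = best
        first_merge.setdefault(a, d)
        first_merge.setdefault(b, d)
        merged[a | b] = d
        pool.append(a | b)
    # Report only maximal merged sets as clusters.
    return [s for s in merged if not any(s < t for t in merged)]

# Hypothetical distances: RAD23 is close to PNG1 and almost as close to RAD4.
dist = {
    frozenset(("RAD23", "PNG1")): 0.10,
    frozenset(("RAD23", "RAD4")): 0.15,
    frozenset(("PNG1", "RAD4")): 0.90,
}
proteins = ["RAD23", "PNG1", "RAD4"]
hac_like = haco(proteins, dist, theta=0.0, cutoff=0.3)
with_overlap = haco(proteins, dist, theta=0.1, cutoff=0.3)
print(hac_like, with_overlap)
```

With theta = 0 and distinct merge distances the procedure behaves like plain HAC and only the RAD23-PNG1 cluster is formed; with theta = 0.1, RAD23 is reused and the RAD23-RAD4 cluster appears as well.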
31. HACO recovers more reference complexes
[Figure: reference complexes recovered by each method: Hart, Pu, HAC (PE), HAC (all), HACO]
32. More biologically coherent
- Compare proteins within same complex
[Figure: panels for regulator overlap, fitness correlation, and abundance, each comparing Hart, Pu, HACO, Reference, and Random on a scale from Good to Bad]
33. Newly discovered complex?
- Discovered a previously uncharacterized six-protein complex, involving four phosphatases
- Consistent with genetic interaction data
- Six proteins cluster together
- Have positive genetic interactions
34. Essential proteins are hubs?
- Jeong et al., Nature 2001: essential proteins are hubs in the protein-protein interaction network
35. It's all about complexes
- Larger complexes are more likely to be essential
- Complex size is a better predictor of essentiality than hubness
36. Conclusion
- Our contribution
- Integrated PE score with indirect evidence
- Geared toward identifying complexes
- Developed HACO to allow overlap
- Applicable to other clustering problems
- Our predicted set of complexes
- Matches better with reference set
- More biologically coherent
- Identifying unknown complexes
- Provide biological insight
37. Outline
[Figure: protein-protein interaction (DKPALAKPPKV), complex, pathway]
38. Complexes interact to coordinate a pathway
- Signaling pathways
- Activate, deactivate
- Modification, e.g. phosphorylation
- Protein degradation pathway
- 11S activator + 20S proteasome
- -> active unit
- Degrades short peptides
- More transient
- Specific time, location, condition
[Figure: protein degradation]
39. Few data and studies
- Difficult to measure
- Interactions are more transient in nature
- Lack of a comprehensive set of complexes
- Prior work on protein-protein interactions
- Graphical model (Deng et al. 2002; Liu et al. 2005)
- Attraction-repulsion model (Gomez et al. 2003)
- SVM (Bock and Gough 2001)
- Our work: predict interactions at the level of complexes
40. Extract signals for protein pairs between two complexes
- Features used for predicting complexes
- Genetic interactions
- Sharing of transcription factors
- InSite interaction probability
- Integrates multiple sources of evidence
- Correlates well with complex-complex interaction
- Co-expression across active conditions
- Best for predicting complex-complex interaction
[Figure: 11S and 20S form the active proteasome with stimulus, but not without stimulus]
41. Experiments
- Feature: aggregate protein-level signal between complexes
- E.g., between complexes X and Y: min_{A ∈ X, B ∈ Y} P(A, B)
- Complexes: our predictions
- More comprehensive
- More biologically coherent than the reference set
- Reference complex-complex interactions (CCIs)
- 59 complex pairs hand-labeled by biologists
- 82 complex pairs enriched for reliable PPIs
- 133 total unique CCIs
- Naïve Bayes with hidden variables for unknown pairs
- Learn using EM
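The aggregation of a protein-level signal into a complex-level feature can be sketched as follows. The minimum is the example named on the slide; the `reducer` parameter, the default for missing pairs, and the protein names are assumptions added for illustration.

```python
def aggregate_signal(complex_x, complex_y, pair_signal, reducer=min, default=0.0):
    """Combine P(A, B) over all A in X and B in Y into one complex-level feature.

    pair_signal maps frozenset({A, B}) to a protein-level signal; pairs that
    are missing fall back to `default` (an assumption of this sketch).
    """
    values = [pair_signal.get(frozenset((a, b)), default)
              for a in complex_x for b in complex_y if a != b]
    if not values:
        return default
    return reducer(values)

# Hypothetical complexes and protein-level signals.
complex_x = {"A1", "A2"}
complex_y = {"B1"}
pair_signal = {frozenset(("A1", "B1")): 0.8, frozenset(("A2", "B1")): 0.3}
feature_min = aggregate_signal(complex_x, complex_y, pair_signal)
feature_max = aggregate_signal(complex_x, complex_y, pair_signal, reducer=max)
print(feature_min, feature_max)
```

Passing a different reducer makes it easy to compare aggregation choices for the same protein-level signal.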
42. Accuracy on reference CCIs
[Figure: interacting reference CCIs vs. non-interacting reference CCIs; area under ROC curve reported]
43. Interacting complexes are likely in the same functional category
- Top 500 predicted pairs
- More than half of the proteins in a complex are in a category
- -> the complex is assigned to the category
[Figure: proportion of complex pairs in the same category]
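The majority rule for assigning a complex to a functional category, and the resulting check on a predicted pair, can be sketched as below. The protein names, category labels, and helper names are hypothetical.

```python
def complex_categories(complex_proteins, protein_categories):
    """Categories held by more than half of the proteins in a complex."""
    counts = {}
    for protein in complex_proteins:
        for category in protein_categories.get(protein, set()):
            counts[category] = counts.get(category, 0) + 1
    return {c for c, n in counts.items() if n > len(complex_proteins) / 2}

def share_category(complex_x, complex_y, protein_categories):
    """True if two complexes are assigned at least one common category."""
    cats_x = complex_categories(complex_x, protein_categories)
    cats_y = complex_categories(complex_y, protein_categories)
    return bool(cats_x & cats_y)

# Hypothetical annotations: each protein maps to a set of categories.
protein_categories = {
    "p1": {"degradation"}, "p2": {"degradation"}, "p3": {"transport"},
    "q1": {"degradation"}, "q2": {"degradation", "signaling"},
    "r1": {"transport"}, "r2": {"signaling"},
}
same_xy = share_category({"p1", "p2", "p3"}, {"q1", "q2"}, protein_categories)
same_xr = share_category({"p1", "p2", "p3"}, {"r1", "r2"}, protein_categories)
print(same_xy, same_xr)
```

Averaging this check over the top predicted complex pairs gives the proportion of pairs in the same category.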
44. Conclusion
- Our predicted set of complex-complex interactions
- High accuracy
- Functionally coherent
- Builds upon previous two stages
- InSite interaction probability as a feature
- Our predicted complexes as interaction candidates
45. Summary of the talk
Protein-protein interaction (DKPALAKPPKV)
- Unsupervised learning
- Bayesian network with EM
46. Summary of the talk
Complex
- Supervised learning
- LogitBoost
- Clustering
47. Summary of the talk
Complex, pathway
- Semi-supervised learning
- Naïve Bayes with EM
48. Summary of the talk
[Figure: protein-protein interaction (DKPALAKPPKV), complex, and pathway, tied together by data integration]
49. Contributions and resources
- List of predicted PPIs and interaction sites
- http://dags.stanford.edu/InSite/
- List of predicted complexes
- http://dags.stanford.edu/HACO/
- List of predicted CCIs
- http://dags.stanford.edu/CCI/
- InSite code
- http://dags.stanford.edu/InSite/software.html
- HACO for clustering with overlap
- http://dags.stanford.edu/HACO/software.html
50. Future work
- Reconstruct pathways and functional modules
- Different types of interactions
- Phosphorylation, ubiquitination
- Specific time, location, and condition
51. Acknowledgment
- Daphne Koller
- Serafim Batzoglou, Douglas Brutlag, Jean-Claude Latombe, Andrew Ng
- DAGS members
- Collaborators: Eran Segal, Asa Ben-Hur, Qianru Li, Marc Vidal, Sean Collins, Nevan Krogan
- My family and friends