Title: Untangling graphs: denoising protein-protein interaction networks
1. Untangling graphs: denoising protein-protein interaction networks
- Quaid Morris
- (joint work with Brendan Frey)
2. Motivation
- High-throughput graph data is noisy, e.g.,
- protein-protein interaction networks
- synthetic lethal interaction networks
- Real-world graphs are highly structured
Idea: use prior knowledge about structure to denoise graphs.
3. Protein-protein interaction network
(Figure: Jeong et al., Nature 2001)
4. Overview
- Illustrative example
- Model and inference algorithm
- Protein-protein interaction network denoising
5. Example: spy rings
(Figure: suspects and their phone records)
- Spies call exactly two other spies
- Suspects may call other suspects
- Phone records may be lost
6. Example: spy rings (cont.)
(Figure: phone records and possible spy rings)
7. Denoising example
Noise assumptions:
- No lost calls
- Rare, independent social calls
(Figure: possible rings consistent with the phone records)
10. Untangling example: telemarketing
Noise assumptions:
- Lost calls and rare social calls
- Telemarketing
(Figure: possible decompositions of the observed call graph)
11. Summary
- With structured noise, the observed graph is composed of several different graphs, each with its own properties.
12. Graph generative model
1) Sample hidden graphs E^h from P(E^h), for h = 1, ..., H.
2) Sample each observation x_{i,j} from P(x_{i,j} | e^1_{i,j}, e^2_{i,j}).
(Figure: hidden graphs E^1 and E^2 combine to produce the observed graph X.)
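The two-step generative process can be sketched in code. This is a minimal sketch: the graph sizes, edge probabilities, and loss rate are illustrative assumptions, and independent Bernoulli edges stand in for the degree-based priors introduced later in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H = 5, 2                     # nodes and hidden graphs (illustrative sizes)
p_edge = [0.4, 0.2]             # per-graph edge probabilities (assumed)

# 1) Sample hidden graphs E^h from P(E^h).  Independent Bernoulli edges
#    stand in here for the degree-based priors introduced later.
E = []
for h in range(H):
    upper = np.triu(rng.random((N, N)) < p_edge[h], k=1)
    E.append(upper | upper.T)   # undirected, no self-loops

# 2) Sample x_{i,j} from P(x_{i,j} | e^1_{i,j}, e^2_{i,j}): an edge is
#    observed if present in either hidden graph, but each observation is
#    independently lost with probability 0.1.
p_lost = 0.1
present = E[0] | E[1]
kept = np.triu(rng.random((N, N)) < 1 - p_lost, k=1)
X = present & (kept | kept.T)
```

The observation model here (union of hidden graphs with dropout) is one simple choice of P(x | e^1, e^2); any likelihood over the hidden edge indicators fits the same template.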
13. Model and inference
Joint:
P(X, E^1, E^2, ..., E^H) = [∏_h P(E^h)] [∏_{i>j} P(x_{i,j} | e^1_{i,j}, e^2_{i,j}, ..., e^H_{i,j})]
Posterior marginal:
P(e^h_{i,j} | X) = P(e^h_{i,j}, X) / P(X)
Probability of evidence:
P(X) = Σ_{E^1} Σ_{E^2} ... Σ_{E^H} P(X, E^1, E^2, ..., E^H)
These are generally intractable sums.
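To see why P(X) is problematic, here is a brute-force evaluation for a single hidden graph over just six edge slots; all probabilities are toy values, not numbers from the talk. The sum already has 2^6 terms and grows as 2^(number of edge variables). With the independent-edge prior used in this toy, the sum happens to factorize per edge (which the sanity check exploits); a degree-based prior couples the edges and destroys that factorization, which is why approximate sum-product inference is needed.

```python
from itertools import product

# Brute-force P(X) for a single hidden graph over M = 6 edge slots.
M = 6
x_obs = [1, 0, 1, 1, 0, 0]          # a hypothetical observed graph
p_prior = 0.3                       # P(e = 1): toy independent-edge prior
p_keep, p_false = 0.9, 0.05         # P(x=1 | e=1), P(x=1 | e=0) (assumed noise)

def lik(x, e):
    p1 = p_keep if e else p_false
    return p1 if x else 1.0 - p1

p_X = 0.0
for E in product([0, 1], repeat=M):  # enumerate all 2^M hidden configurations
    term = 1.0
    for e, x in zip(E, x_obs):
        term *= (p_prior if e else 1.0 - p_prior) * lik(x, e)
    p_X += term
```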
14. Three tricks for tractability
- Degree-based graph priors
- Sum-product approximate inference
- Dynamic programming trick
15. Degree-based graph priors
P(E^h) = ∏_i f^h(d^h_i) / Z
where d^h_i = Σ_j e^h_{i,j} is the degree of vertex i, and f^h is the degree potential for graph h.
- Captures real-world network structure
- Admits a clean sum-product (loopy belief propagation) algorithm
- Introduces a dummy degree variable, d_i
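Evaluating the unnormalized prior ∏_i f(d_i) is straightforward; a minimal sketch, assuming a Poisson-shaped degree potential with mean degree 2 for illustration (not the potential fitted in the talk):

```python
import numpy as np
from math import exp, factorial, log

# Unnormalized degree-based prior P(E) ∝ ∏_i f(d_i).
def f(d, mean=2.0):
    # Illustrative Poisson-shaped degree potential (an assumption).
    return exp(-mean) * mean**d / factorial(d)

def log_prior(E):
    """E: symmetric 0/1 adjacency matrix.  Returns log ∏_i f(d_i), up to log Z."""
    degrees = E.sum(axis=1)          # d_i = Σ_j e_{i,j}
    return sum(log(f(int(d))) for d in degrees)

# A 4-cycle: every vertex has degree exactly 2, near the potential's peak,
# so it scores higher than a star with degrees (3, 1, 1, 1).
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
```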
16. Two types of random graphs
Exponential vs. scale-free
(Figure: Jeong et al., Nature 2000)
17. Random graph degree distributions
- Scale-free: f(k) = C k^{-p}, with p > 1
- Exponential: Poisson(<k>)
(Figure: Jeong et al., Nature 2000)
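The contrast between the two laws can be checked numerically; the exponent, mean degree, and cutoff below are illustrative choices, not fitted values. The point is that the power-law (scale-free) distribution keeps substantial mass on high-degree hubs, while the Poisson (exponential) tail vanishes.

```python
import numpy as np
from math import exp, factorial

# Compare the two degree laws over k = 1..kmax (illustrative parameters).
kmax, p, mean_k = 50, 2.5, 4.0
k = np.arange(1, kmax + 1)

scale_free = k.astype(float) ** -p              # f(k) = C k^{-p}
scale_free /= scale_free.sum()                  # normalize over 1..kmax

poisson = np.array([exp(-mean_k) * mean_k**int(kk) / factorial(int(kk))
                    for kk in k])
poisson /= poisson.sum()                        # exponential (Poisson) network

# Mass on high-degree hubs (k >= 20): large for scale-free, negligible
# for Poisson.
tail_sf = scale_free[k >= 20].sum()
tail_po = poisson[k >= 20].sum()
```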
18. Other real-world structure
- Small-worldness
- Degree correlations
- Clustering
Google "Mark Newman Michigan" for more info.
19. Factor graph for denoising
(Figure, built up over slides 19-23: a four-node example with edge variables e_{1,2}, ..., e_{3,4}, observed variables x_{1,2}, ..., x_{3,4}, and degree variables d_1, ..., d_4. Degree potentials f(d) attach to each d_i; indicator functions I(d_i, Σ_j e_{i,j}) tie each degree variable to its incident edges; likelihood functions P(x|e) connect each observation to its edge variable.)
24. Sum-product approximate inference
- Two types of binary messages:
  - edge variables → constraint nodes
  - constraint nodes → edge variables
25. Calculating edge → constraint messages
For e ∈ {0, 1}, the message from edge e_{i,j} to degree constraint I_i is
m_{e_{i,j} → I_i}(e) = m_{I_j → e_{i,j}}(e) · m_{x_{i,j} → e_{i,j}}(e)
where m_{x_{i,j} → e_{i,j}}(e) is the likelihood message, i.e. P(x_{i,j} | e_{i,j} = e), and m_{I_j → e_{i,j}}(e) is the constraint → edge message from the edge's other degree constraint.
26. Calculating constraint → edge messages
The message from degree constraint I_i to edge e_{i,j} is
m_{I_i → e_{i,j}}(e) = Σ_d f(d) Σ_{e_1, ..., e_N s.t. e_j = e and Σ_k e_k = d} ∏_{k ≠ j} m_{e_{i,k} → I_i}(e_k)
where f(d) is the degree prior. An intractable sum? No: use dynamic programming.
27. Dynamic programming solution
Introduce a chain of partial-sum variables s_1, s_2, ..., s_N over the edges e_{1,j}, ..., e_{N,j}, with the constraint s_{i+1} = s_i + e_{i+1,j} and d_j = s_N.
Summing over edge configurations directly takes O(2^N) time; passing messages along the partial-sum chain takes O(N^2) time.
(Figure: the degree constraint I_j unrolled into a chain of partial sums s_1, ..., s_N.)
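The partial-sum chain amounts to folding in the incoming binary messages one at a time (a sequence of small convolutions), then weighting by the degree potential f(d). A sketch in O(N^2) time; the function and variable names are mine, not from the talk:

```python
import numpy as np

# Constraint → edge message via the partial-sum chain:
#   m_{I→e_j}(e) = Σ_d f(d) Σ_{e_k, k≠j : Σ_k e_k = d} ∏_{k≠j} m_k(e_k)
def constraint_to_edge(msgs, f, j):
    """msgs: list of N incoming messages (m_k(0), m_k(1)); f: degree potential."""
    others = [m for k, m in enumerate(msgs) if k != j]
    dist = np.array([1.0])            # dist[s] = total weight of partial sum s
    for m0, m1 in others:             # one O(chain length) update per edge
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * m0         # e_k = 0 leaves the sum unchanged
        new[1:] += dist * m1          # e_k = 1 increments it
        dist = new
    # Fold in e_j itself and weight by the degree potential f(d).
    out0 = sum(f(s) * w for s, w in enumerate(dist))       # e_j = 0: d = s
    out1 = sum(f(s + 1) * w for s, w in enumerate(dist))   # e_j = 1: d = s + 1
    return out0, out1
```

Each of the N updates touches a vector of length at most N, giving the O(N^2) total instead of the O(2^N) enumeration.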
29. Inference for untangling
- Message passing is the same as for denoising, except that the likelihood message must be recalculated.
- The likelihood message incorporates edge
30. Factor graph for untangling
(Figure: two hidden graphs over a three-node example. Each pair (i, j) has edge variables e^1_{i,j} and e^2_{i,j} feeding a shared likelihood P(x | e^1, e^2); each graph h has its own degree variables d^h_i, indicator functions I(d, Σe), and degree potentials f^h(d).)
31. Protein-protein interaction network denoising
von Mering et al. (2002) dataset:
- Eight PPI networks consisting of:
  - low-quality direct evidence (high-throughput)
  - indirect evidence
- Gold standard: a small set of confirmed interactions
32. Empirical degree distributions
33. Methods
- Split ~6,000 ORFs into training and test sets.
- On the training set:
  - Fit degree priors to both the true graph and the false graph.
  - Construct the likelihood function from all observations using Naïve Bayes.
- Every observed interaction must be placed in exactly one of the two hidden graphs.
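The Naïve Bayes likelihood step can be sketched as a log-likelihood ratio over independent evidence sources. The source names and conditional probabilities below are invented placeholders for illustration, not the values fitted on the von Mering data.

```python
import numpy as np

# Naive Bayes likelihood over evidence sources: P(obs | e) = ∏_s P(obs_s | e).
# All names and probabilities here are hypothetical placeholders.
p_given_true = {"y2h": 0.6, "coexpr": 0.7, "colocalization": 0.8}
p_given_false = {"y2h": 0.1, "coexpr": 0.3, "colocalization": 0.5}

def log_likelihood_ratio(obs):
    """obs maps source -> bool.  Returns log [P(obs | true) / P(obs | false)]."""
    llr = 0.0
    for s in p_given_true:
        pt, pf = p_given_true[s], p_given_false[s]
        if obs.get(s, False):
            llr += np.log(pt / pf)      # source observed the interaction
        else:
            llr += np.log((1.0 - pt) / (1.0 - pf))  # source did not
    return llr
```

A positive ratio favors placing the interaction in the true graph; a negative one favors the false graph.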
34. Results
(Figure: untangling vs. baseline performance)
35. Summary
- A generative model for observed graphs composed of many hidden graphs
- A sum-product approximate inference algorithm for degree-based priors
- An application to protein-protein interaction network noise removal