Title: Network Motifs: Simple Building Blocks of Complex Networks
1Network Motifs: Simple Building Blocks of Complex Networks
2Introduction
- Recently, it was found that biochemical and neuronal networks share a similar property: they contain recurring circuit elements that occur far more often than in randomized networks.
- We call such simple building blocks network motifs.
3Introduction
- In the case of biological regulation networks, it
has been suggested that network motifs play key
information processing roles.
4Introduction
Three major network motifs were found in the transcription networks of bacteria and yeast.
One of these, the feed-forward loop, has been shown theoretically to perform information-processing tasks such as sign-sensitive filtering, response acceleration, and pulse generation.
5Introduction
6Introduction
Red dashed lines indicate edges that participate in the feed-forward loop motif, which occurs five times in the real network.
7Introduction
- Applications in other networks
- Ecology (food webs)
- Neurobiology (neuron connectivity)
- Engineering (electronic circuits, WWW)
8Introduction
- Some remarks
- The motifs we find depend closely on the randomized network model, so a reasonable choice of randomized network model is very important.
- Some functionally important but less frequent building blocks will be missed no matter which model we select. Finding them requires specific knowledge and information beyond the scope of a graph-theoretic approach.
9Related Problems
- Theoretical perspective
- Efficiently counting cycles.
- Counting spanning trees.
- Counting nonisomorphic graphs.
- Testing isomorphism.
- Approximating perfect matching.
- Approximating frequent subgraphs based on the regularity lemma.
10Related Problems
- Data mining perspective.
- Mining frequent subgraphs.
- Mining a given subgraph.
- Mining subgraphs in sparse networks.
- Graph-based substructure pattern mining (gSpan)
11Related Problems
- Random network.
- Generating randomized networks with a prescribed degree sequence.
- Estimating subgraphs in random networks.
12Related Problems
- Random network.
- Erdős-Rényi model
- The number of edges per node follows a Poisson distribution.
- Scale-free model
- The number of edges per node follows a power-law distribution.
13Randomized Network
- Generating randomized network
- Here we only give a simple algorithm.
- We employ a Markov-chain algorithm: starting from the real network, we repeatedly swap randomly chosen pairs of connections (X1 -> Y1, X2 -> Y2 is replaced by X1 -> Y2, X2 -> Y1) until the network is well randomized.
- Switching is prohibited if either of the connections X1 -> Y2 or X2 -> Y1 already exists.
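The switching rule above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function name `switch_randomize`, the fixed number of switch attempts `n_switches` (used here in place of a "well randomized" test), and the rejection of self-loops are assumptions added for illustration. Double (mutual) edges are not treated separately.

```python
import random


def switch_randomize(edges, n_switches, seed=None):
    """Degree-preserving randomization of a directed graph by edge switching.

    edges      : iterable of distinct (source, target) pairs
    n_switches : number of switch attempts (assumed stopping rule)
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_switches):
        # Pick two distinct connections X1 -> Y1 and X2 -> Y2 at random.
        i, j = rng.sample(range(len(edge_list)), 2)
        (x1, y1), (x2, y2) = edge_list[i], edge_list[j]
        new1, new2 = (x1, y2), (x2, y1)
        # Switching is prohibited if either new connection already exists.
        if new1 in edge_set or new2 in edge_set:
            continue
        # Assumption (not in the slides): also reject self-loops.
        if x1 == y2 or x2 == y1:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list
```

Each accepted switch leaves every node's in-degree and out-degree unchanged, which is why the result can serve as a null model with the same degree sequence as the real network.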
14Randomized Network
- Controlling for Appearances of (n-1)-Node Motifs
- We generate a series of randomized network ensembles, each of which has the same (n-1)-node subgraph counts as the real network, as a null hypothesis for detecting n-node motifs.
- This is done to avoid assigning high significance to a structure only because it includes a highly significant substructure.
15Randomized Network
- Controlling for Appearances of (n-1)-Node Motifs
- Metropolis Monte-Carlo approach
- Let $V_{\mathrm{real},k}$ be the number of appearances of the kth (n-1)-node subgraph in the real network, and $V_{\mathrm{rand},k}$ the corresponding count in the randomized network.
- We define an energy
- $E = \sum_k \frac{|V_{\mathrm{real},k} - V_{\mathrm{rand},k}|}{V_{\mathrm{real},k} + V_{\mathrm{rand},k}}$
- The energy E is zero only when all the three-node subgraph counts of the real and randomized graphs are equal.
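As a small illustration of the energy, the sketch below computes E from two count tables. It assumes the energy is the sum over subgraph classes of the absolute count difference divided by the count sum. The dictionary representation, the name `energy`, and the choice to skip classes that are absent from both networks (a 0/0 term) are assumptions made for this example.

```python
def energy(v_real, v_rand):
    """E = sum_k |V_real,k - V_rand,k| / (V_real,k + V_rand,k).

    v_real, v_rand : dicts mapping a subgraph class label to its count.
    """
    total = 0.0
    for k in v_real.keys() | v_rand.keys():
        a, b = v_real.get(k, 0), v_rand.get(k, 0)
        if a + b > 0:  # assumption: a class missing from both contributes 0
            total += abs(a - b) / (a + b)
    return total
```

Each term lies between 0 and 1, so a class contributes the maximum of 1 when it appears in only one of the two networks.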
16Randomized Network
- Controlling for Appearances of (n-1)-Node Motifs
- Start by fully randomizing the network according to the first algorithm.
- Then generate a random switch (X1 -> Y1, X2 -> Y2 to X1 -> Y2, X2 -> Y1, and similarly for double edges, as described above).
- If this switch lowers E, it is accepted.
- Otherwise, it is accepted with probability $\exp(-\Delta E/T)$, where $\Delta E$ is the difference in energy before and after the switch and T is an effective temperature.
17Graph Theoretical Results
- Controlling for Appearances of (n-1)-Node Motifs
- This process is repeated, with a simulated annealing regimen that lowers T slowly until a solution with E = 0 is obtained.
- This can be readily generalized to form (n-1)-node null-hypothesis networks.
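The accept/reject rule and the cooling loop can be sketched generically. In this hedged example, `propose` and `energy_of` are hypothetical callbacks standing in for "generate a random switch" and "compute E", and the geometric cooling schedule (`cooling`, `steps_per_t`, `t_min`) is an assumption, since the slides only say that T is lowered slowly.

```python
import math
import random


def accept_switch(delta_e, temperature, rng=random):
    """Metropolis rule: accept if E drops, else with probability exp(-dE/T)."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(state, propose, energy_of, t_start=1.0, cooling=0.95,
           steps_per_t=50, t_min=1e-6, seed=None):
    """Simulated annealing until E == 0 or the temperature falls below t_min."""
    rng = random.Random(seed)
    e = energy_of(state)
    t = t_start
    while e > 0 and t > t_min:
        for _ in range(steps_per_t):
            candidate = propose(state, rng)   # e.g. one random edge switch
            e_new = energy_of(candidate)
            if accept_switch(e_new - e, t, rng):
                state, e = candidate, e_new
                if e == 0:
                    break
        t *= cooling  # assumed geometric cooling schedule
    return state, e
```

Recomputing the full energy for every candidate is the simplest choice. A practical implementation would more likely update the subgraph counts incrementally around the switched edges.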
18Algorithm Counting
- Goal: find all n-node network motifs.
- Method
- Do the following for both the real network and the randomized networks:
- Enumerate all possible n-node subgraphs and classify them into non-isomorphic classes.
- Count the number of subgraphs in each class (see all types of 3- and 4-node nonisomorphic graphs).
19Algorithm Counting
- Efficiently count all connected n-node subgraphs in a connectivity matrix M:
- main:
-   for all rows i
-     for each nonzero element (i, j)
-       search(i, j)
- search(i, j):
-   for each k such that M_ik = 1 and k != j
-     if an n-node subgraph is obtained then record it and return
-     else search(i, k)
-   do the same for each k with M_ki = 1, M_kj = 1, M_jk = 1
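The enumeration and classification steps can be illustrated with a short Python sketch. It is not the search routine from the slide: it grows node sets through neighbors in either direction and removes duplicates with a set, and it classifies subgraphs by brute-force canonical form over all node orderings, which is only practical for small n such as 3 or 4. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def connected_subgraphs(adj, n):
    """All n-node sets that induce a weakly connected subgraph.

    adj : square 0/1 matrix; adj[i][j] == 1 means an edge i -> j.
    """
    size = len(adj)
    # k neighbors i if there is an edge in either direction.
    nbrs = [{k for k in range(size) if k != i and (adj[i][k] or adj[k][i])}
            for i in range(size)]
    found = set()

    def search(nodes):
        if len(nodes) == n:
            found.add(nodes)
            return
        frontier = set().union(*(nbrs[v] for v in nodes)) - nodes
        for k in frontier:
            search(nodes | {k})

    for i in range(size):
        search(frozenset([i]))
    return found


def canonical_form(adj, nodes):
    """Smallest flattened adjacency matrix over all orderings of the nodes."""
    return min(tuple(adj[a][b] for a in perm for b in perm)
               for perm in permutations(sorted(nodes)))


def count_subgraph_types(adj, n):
    """Table of isomorphism class -> number of appearances."""
    return Counter(canonical_form(adj, nodes)
                   for nodes in connected_subgraphs(adj, n))
```

Two subgraphs receive the same key exactly when some relabeling of nodes maps one adjacency matrix onto the other, so the keys of the returned table are the non-isomorphic classes.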
20Algorithm Counting
- A table is formed that counts the number of appearances of each type of subgraph in the network.
- This process is repeated for each of the randomized networks. The number of appearances of each type of subgraph in the random ensemble is recorded, to assess its statistical significance.
21Algorithm Counting
- Criteria for Network Motif Selection
- (i) The probability that it appears in a randomized network an equal or greater number of times than in the real network is smaller than P = 0.01.
- (ii) The number of times it appears in the real network with distinct sets of nodes is at least 4.
- (iii) The number of appearances in the real network is significantly larger than in the randomized networks: $N_{\mathrm{real}} - N_{\mathrm{rand}} > 0.1\,N_{\mathrm{rand}}$. This is done to avoid detecting as motifs some common subgraphs that have only a slight difference between Nrand and Nreal but have a narrow distribution in the randomized networks.
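The three criteria can be combined into a single check, sketched below. Two points are assumptions rather than statements from the slides: the probability in criterion (i) is estimated empirically as the fraction of randomized networks whose count is at least the real count, and Nrand in criterion (iii) is taken to be the mean count over the randomized ensemble. The function and parameter names are hypothetical.

```python
def is_motif(n_real, n_rand_counts, n_distinct,
             p_cutoff=0.01, min_distinct=4, min_excess=0.1):
    """Apply criteria (i)-(iii) to one subgraph type.

    n_real        : appearances in the real network
    n_rand_counts : appearances in each randomized network
    n_distinct    : appearances in the real network on distinct node sets
    """
    n_nets = len(n_rand_counts)
    mean_rand = sum(n_rand_counts) / n_nets
    # (i) empirical probability of seeing >= n_real in a randomized network
    p_value = sum(1 for c in n_rand_counts if c >= n_real) / n_nets
    rare_in_random = p_value < p_cutoff
    # (ii) enough appearances on distinct sets of nodes
    enough_instances = n_distinct >= min_distinct
    # (iii) N_real - N_rand > 0.1 * N_rand
    large_excess = (n_real - mean_rand) > min_excess * mean_rand
    return rare_in_random and enough_instances and large_excess
```

With an empirical estimate, at least 100 randomized networks are needed before a probability below 0.01 can be distinguished from zero occurrences.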
22Algorithm Counting
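Concentration of subgraph type i: $C_i = N_i / \sum_k N_k$
Z-scores: $Z = (C_{\mathrm{real}} - \langle C_{\mathrm{rand}} \rangle) / \sigma_{\mathrm{rand}}$, where $\langle C_{\mathrm{rand}} \rangle$ and $\sigma_{\mathrm{rand}}$ are the mean and standard deviation over the randomized networks.
Note the Chebyshev inequality: $P(|X - E(X)| \ge Z\,\sigma(X)) \le 1/Z^2$.
High Z-scores indicate that the event is quite unlikely.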
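A worked example of the Z-score and the accompanying Chebyshev bound, as a hedged sketch. It assumes the denominator is the standard deviation over the randomized ensemble and uses the sample standard deviation (`statistics.stdev`); whether the sample or population form was intended is not stated in the slides. The function names are hypothetical.

```python
import statistics


def z_score(c_real, c_rand):
    """Z = (C_real - mean(C_rand)) / SD(C_rand) over a randomized ensemble."""
    mean_rand = statistics.mean(c_rand)
    sd_rand = statistics.stdev(c_rand)  # assumption: sample standard deviation
    return (c_real - mean_rand) / sd_rand


def chebyshev_bound(z):
    """Distribution-free upper bound on P(|X - E(X)| >= z * SD), capped at 1."""
    return min(1.0, 1.0 / (z * z))
```

For example, with randomized concentrations [1, 2, 3] and a real concentration of 5.0, the mean is 2 and the sample standard deviation is 1, so Z = 3 and the Chebyshev bound is 1/9. The bound holds for any distribution, which makes it conservative compared with a normal approximation.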
23Algorithm Sampling
- A clever trade-off between accuracy and efficiency.
- The counting algorithm can exactly enumerate the number of subgraphs, but to detect network motifs we only need to know which types of subgraph occur more frequently in the real network than in randomized networks.
24Algorithm Sampling
- Random sampling can give a pretty good estimate.
- Random sampling has many applications:
- approximating dense subsets
- approximating P-complete problems
- machine learning
25Algorithm Sampling
- This algorithm does not enumerate subgraphs exhaustively but instead samples subgraphs in order to estimate their relative frequency.
- The runtime of the algorithm asymptotically does not depend on the network size.
- Surprisingly, few samples are needed to detect network motifs reliably.
- The sampling method is useful for analyzing very large networks or for detection of high-order motifs, which are beyond the reach of exhaustive enumeration algorithms.
26Algorithm Sampling
- Definition: Es is the set of picked edges.
- Vs is the set of all nodes that are touched by the edges in Es.
- ALGORITHM Sampling
- Initialize Vs = ∅ and Es = ∅.
- 1. Pick a random edge e1 = (vi, vj). Update Es = {e1}, Vs = {vi, vj}.
- 2. Make a list L of all neighboring edges of Es, omitting all edges between members of Vs. If L = ∅, return to 1.
- 3. Pick a random edge e = (vk, vl) from L. Update Es = Es ∪ {e}, Vs = Vs ∪ {vk, vl}.
- 4. Repeat steps 2-3 until completing an n-node subgraph S.
- 5. Calculate the probability P to sample S.
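Steps 1 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the network is given as a list of distinct edges without self-loops, with each connected pair listed once, and a neighboring edge is taken to be one with exactly one endpoint inside Vs. The `max_restarts` guard and the function name are additions for this example, and step 5 is not included here.

```python
import random


def sample_subgraph(edges, n, rng, max_restarts=1000):
    """Sample one connected n-node subgraph by random edge expansion.

    Returns (V_s as a frozenset, E_s as the ordered list of picked edges).
    """
    for _ in range(max_restarts):
        # Step 1: pick a random starting edge.
        e1 = rng.choice(edges)
        v_s = set(e1)
        e_s = [e1]
        while len(v_s) < n:
            # Step 2: neighboring edges, omitting edges between nodes of V_s.
            candidates = [e for e in edges if (e[0] in v_s) != (e[1] in v_s)]
            if not candidates:
                break  # L is empty: return to step 1
            # Step 3: pick a random neighboring edge and grow E_s and V_s.
            e = rng.choice(candidates)
            e_s.append(e)
            v_s.update(e)
        # Step 4: stop once an n-node subgraph is complete.
        if len(v_s) == n:
            return frozenset(v_s), e_s
    raise RuntimeError("no connected n-node subgraph found (assumed guard)")
```

Because every picked edge after the first adds exactly one new node, a completed sample always consists of n-1 picked edges.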
27Algorithm Sampling
- The probability of sampling the subgraph is the sum of the probabilities of all such possible ordered sets of n-1 edges:
- $P = \sum_{\sigma \in S_m} \prod_{E_j \in \sigma} \Pr[E_j = e_j \mid (E_1, \ldots, E_{j-1}) = (e_1, \ldots, e_{j-1})]$
- where $S_m$ is the set of all (n-1)-permutations of the edges of the specific subgraph that could lead to a sample of the subgraph, and $E_j$ is the j-th edge in a specific (n-1)-permutation $\sigma$.
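The sum over ordered edge sets can be computed directly for small subgraphs. The sketch below assumes the edge-expansion sampling described earlier: the first edge is chosen uniformly from all edges, and each later edge uniformly from the edges with exactly one endpoint in the current node set. The function name and the edge-list representation are assumptions for illustration.

```python
from itertools import permutations


def sampling_probability(edges, nodes):
    """Probability that edge-expansion sampling ends with the node set `nodes`.

    edges : list of distinct edges (pairs of nodes, no self-loops)
    nodes : set of n nodes forming a connected subgraph
    """
    n = len(nodes)
    inner = [e for e in edges if e[0] in nodes and e[1] in nodes]
    total = 0.0
    # Sum over all (n-1)-permutations of the subgraph's own edges.
    for order in permutations(inner, n - 1):
        v_s = set(order[0])
        p = 1.0 / len(edges)          # probability of the first edge
        for e in order[1:]:
            # A valid next edge has exactly one endpoint already in V_s.
            if (e[0] in v_s) == (e[1] in v_s):
                p = 0.0               # this ordering cannot occur
                break
            n_candidates = sum(1 for c in edges
                               if (c[0] in v_s) != (c[1] in v_s))
            p /= n_candidates         # uniform choice among neighboring edges
            v_s.update(e)
        total += p
    return total
```

For the path 0-1-2-3 with n = 3, the two orderings that reach {0, 1, 2} contribute 1/3 and 1/6, so P = 1/2. The number of orderings grows quickly with n, so this brute-force form is only meant for small subgraphs.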
28Algorithm Sampling
29Algorithm Sampling
- Add the score W = 1/P to the accumulated score $S_i$ of the relevant subgraph type i: $S_i = S_i + W$.
- After $S_T$ samples, assuming we sampled L different subgraph types, we calculate the estimated subgraph concentrations:
- $C_i = S_i / \sum_{k=1}^{L} S_k$
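The weighting step can be sketched in a few lines. The input format, a sequence of (subgraph type, P) pairs with one pair per sample, and the function name are assumptions for this example.

```python
from collections import defaultdict


def estimate_concentrations(samples):
    """Turn weighted samples into estimated concentrations C_i.

    samples : iterable of (subgraph_type, p) pairs, where p is the
              probability of having sampled that particular subgraph.
    """
    scores = defaultdict(float)
    for subgraph_type, p in samples:
        scores[subgraph_type] += 1.0 / p   # S_i = S_i + W with W = 1/P
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}
```

Weighting by 1/P corrects for the fact that some subgraphs are more likely to be sampled than others. For instance, two samples of type A with P = 0.5 and one sample of type B with P = 0.25 both accumulate a score of 4, giving estimated concentrations of 0.5 each.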
30Algorithm Sampling
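- Z-scores are calculated as before:
- $Z = (C_{\mathrm{real}} - \langle C_{\mathrm{rand}} \rangle) / \sigma_{\mathrm{rand}}$
- where $C_{\mathrm{real}}$ is the concentration in the real network, and $\langle C_{\mathrm{rand}} \rangle$ and $\sigma_{\mathrm{rand}}$ are the mean and standard deviation in the randomized networks.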
31Algorithm Sampling
Sampling method versus exhaustive enumeration. Highlighted subgraphs were found to be network motifs.
32Algorithm Sampling
- Algorithm convergence
- The subgraph concentrations calculated by the sampling algorithm converged to the fully enumerated concentrations. Different numbers of samples were required for achieving good estimations for different subgraphs and in different networks.
- All of the simulations we performed, on a variety of networks, showed that the results converge toward the real values within $S_T = 10^5$ samples or fewer.
33Algorithm Sampling
- Algorithm convergence
- It is seen that even with a small number of samples one can reliably estimate concentrations as low as $C = 10^{-5}$.
- It is possible to use convergence studies to decide the required number of samples (an adaptive sampling method that uses the instantaneous convergence rate to decide how many samples are enough).
34Algorithm Sampling
- The sampling method allows accurate counting of
rare, high-order subgraphs and motifs
35Discussion and Future Work
- We focus on comparing the real network with randomized networks that have a prescribed degree sequence. So our question is whether some frequent building blocks in real networks are caused by the degree sequence itself.
- If so, our approach will miss this type of building block. Other randomized network models (rather than those with a prescribed degree sequence) could be introduced to deal with such cases.
36Discussion and Future Work
- Embedding the graph in Euclidean space and considering subgraphs with not only topological but also geometric properties.
37THANKS