Title: Statistical physics of complex networks
1Statistical physics of complex networks
- Sergei Maslov
- Brookhaven National Laboratory
2Short history: complex systems before and after networks
- Statistical physics of complex systems was active in the 80s-90s (following the chaos boom of the 70s)
- Fractals (Mandelbrot and many others)
- Self-Organized Criticality (Per Bak and co-authors) → sandpiles → granular systems
- Complex = multiple time and length scales (e.g. avalanches) → cult of power laws
- Cellular automata (mostly in real space-time)
- Examples
- earthquakes
- disordered moving interfaces
- (co)-evolution of species
- agent-based modeling (ants)
- By the end of the 90s: breakup of the community and specialization
- Biology
- Economics and finance
- Internet
- Social sciences
3Networks in complex systems
- Complex systems
- Large number of components interacting with each other
- All components and/or interactions are different from each other (unlike in traditional physics, where $10^{23}$ electrons are all the same!)
- Paradigms
- $10^4$ types of proteins in an organism
- $10^6$ routers in the Internet
- $10^9$ web pages in the WWW
- $10^{11}$ neurons in a human brain
- The simplest property, "who interacts with whom?", can be visualized as a network
- Complex networks are just a backbone for complex dynamical processes
4Why study the topology of complex networks?
- Lots of easily available data: that's where the state-of-the-art information is (at least in biology)
- Large networks may contain information about basic design principles and/or the evolutionary history of the complex system
- This is similar to paleontology: learning about an animal from its backbone
5 6- A small part of a metabolic network: the citric acid cycle
7Metabolic pathway chart by ExPASy
8Protein binding networks
Baker's yeast S. cerevisiae (only nuclear proteins shown)
Nematode worm C. elegans
9Transcription regulatory networks
Single-celled eukaryote S. cerevisiae
Bacterium E. coli
10GENOME
protein-gene interactions
PROTEOME
protein-protein interactions
METABOLISM
bio-chemical reactions
slide after Reka Albert
11- Between cells in a multi-cellular organism
12Sea urchin embryonic development (endomesoderm up to 30 hours) by Davidson's lab
13C. elegans neurons
14 15Freshwater food web by Neo Martinez and Richard
Williams
16Sexual contacts M. E. J. Newman, The structure
and function of complex networks, SIAM Review 45,
167-256 (2003).
17 18High school dating Data drawn from Peter S.
Bearman, James Moody, and Katherine Stovel
visualized by Mark Newman
19Network of actor co-starring in movies
20Networks of scientists co-authorship of papers
21Webpages connected by hyperlinks on the AT&T website circa 1996, visualized by Mark Newman
Citation networks are similar to the WWW but time-ordered
22 23Internet as measured by Hal Burch and Bill
Cheswick's Internet Mapping Project.
24(No Transcript)
25transportation networks airlines
26transportation networks railway maps
Tokyo rail map
27- Lecture 1: General introduction into networks
- Node degrees, their distribution, and correlations
- Simple models
- Preferential attachment and the Simon model
- Growth model for protein families
- Percolation transition on networks
- Clustering coefficient
- Lectures 2-3: Biomolecular (mostly protein) networks
- Regulatory and signaling networks
- How many regulators? "Bureaucratic collapse"
- Network motifs in directed (e.g. regulatory) networks
- Protein binding networks
- Broad degree distributions in protein binding networks and possible explanations
- Evolutionary (duplication-divergence)
- Biophysical (stickiness)
- Functional
- Beyond degree distributions: how is it all wired together? Correlations in degrees
- Randomization of networks
- Law of Mass Action and propagation of perturbations
28Degree (or connectivity) of a node: the number of neighbors
Degree $K = 2$
Degree $K = 4$
29Directed networks have in- and out-degrees
In-degree $K_{in} = 2$
Out-degree $K_{out} = 5$
30- Degree distributions in random and real networks
31Degree distribution in a random network
- Randomly throw E edges among N nodes
- Solomonoff, Rapoport, Bull. Math. Biophysics (1951); Erdős-Rényi (1960)
- Degree distribution: binomial → Poisson
- K stays close to its mean, with no hubs (fast decay of N(K))
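The recipe on this slide can be tried out directly. The following is a minimal sketch, assuming a simple graph (no self-loops or multiple edges); the function names and the values of `n` and `e` are illustrative choices, not taken from the lecture. It throws E edges among N nodes and prints the measured degree histogram next to a Poisson distribution with the same mean.

```python
import math
import random
from collections import Counter

def random_graph_degrees(n, e, seed=0):
    """Throw e distinct edges at random among n nodes; return the degree list."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < e:
        a, b = rng.sample(range(n), 2)      # two distinct nodes: no self-loops
        edges.add((min(a, b), max(a, b)))   # a set cannot hold duplicate edges
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def poisson_pmf(k, mean):
    """Poisson probability of observing exactly k events."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

n, e = 5000, 10000                          # illustrative sizes: mean degree 2E/N = 4
degree = random_graph_degrees(n, e)
mean_k = sum(degree) / n
var_k = sum((k - mean_k) ** 2 for k in degree) / n
hist = Counter(degree)
for k in range(11):
    # measured fraction of nodes with degree k vs. the Poisson prediction
    print(k, hist.get(k, 0) / n, round(poisson_pmf(k, mean_k), 4))
```

For a Poisson distribution the variance equals the mean, so `var_k` should come out close to `mean_k`, and no node should have a degree far above the average.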
32Degree distribution in a real protein binding network
- Histogram N(K) is broad: most nodes have low degree (~1), few nodes have high degree (~100)
- Can be approximately fitted with the functional form $N(K) \sim K^{-\gamma}$ with $\gamma \approx 2.5$
33Many real-world networks have broad degree distributions
34Basic BA-model
- Very simple algorithm to implement
- start with an initial set of $m_0$ fully connected nodes
- e.g. $m_0 = 3$
- now add new vertices one by one, each one with exactly m edges
- each new edge connects to an existing vertex in proportion to the number of edges that vertex already has → preferential attachment
- easiest if you keep track of edge endpoints in one large array and select an element from this array at random
- the probability of selecting any one vertex will be proportional to the number of times it appears in the array, which corresponds to its degree
1 1 2 2 2 3 3 4 5 6 6 7 8 ...
35Generating BA graphs, contd.
- To start, each vertex has an equal number of edges (2)
- the probability of choosing any vertex is 1/3
- We add a new vertex, and it will have m edges; here take m = 2
- draw 2 random elements from the array; suppose they are 2 and 3
- Now the probabilities of selecting 1, 2, 3, or 4 are 1/5, 3/10, 3/10, 1/5
- Add a new vertex, draw a vertex for it to connect from the array
- etc.
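The endpoint-array trick described on these two slides can be sketched in a few lines. This is an illustration under stated assumptions: the function name `barabasi_albert`, its parameters, and the rule of redrawing duplicate targets (to avoid multiple edges, which requires m ≤ m0) are choices made here, not details given in the lecture. Nodes are labeled from 0 rather than from 1.

```python
import random
from collections import Counter
from fractions import Fraction

def barabasi_albert(n, m, m0=3, seed=0):
    """Grow a BA graph with the endpoint-array trick (assumes m <= m0)."""
    rng = random.Random(seed)
    # Initial set of m0 fully connected nodes.
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Every node appears in `endpoints` once per edge it belongs to,
    # so a uniform draw from the array picks nodes in proportion to degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:             # redraw duplicates: no multi-edges
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges, endpoints

# The worked example from the slide: a triangle 1-2-3, then node 4 joins 2 and 3.
slide_array = [1, 1, 2, 2, 3, 3] + [4, 2, 4, 3]
probs = {v: Fraction(c, len(slide_array)) for v, c in Counter(slide_array).items()}
print(probs)

edges, endpoints = barabasi_albert(n=2000, m=2)
degree = Counter(endpoints)
print("largest degree:", max(degree.values()))
```

The `probs` dictionary reproduces the selection probabilities 1/5, 3/10, 3/10, 1/5 quoted on the slide, and the largest degree in the grown graph is far above the minimum degree m, which is the hub formation that preferential attachment produces.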
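36The tale of linear vs exponential growth
- Linear growth: the Barabási-Albert model with $\gamma = 3$ is a version of Simon's word usage model, $\gamma = 2 + \alpha$
- $dn_k/dt = (k-1)\,n_{k-1}/(t + \alpha t) - k\,n_k/(t + \alpha t)$
- Exponential growth: protein duplication-deletion model, $\gamma = 2 + \alpha/(\delta_{dup} - \delta_{del})$
- $dn_k/dt = \delta_{dup}\,(k-1)\,n_{k-1} - (\delta_{dup} + \delta_{del})\,k\,n_k + \delta_{del}\,(k+1)\,n_{k+1}$
- $N_F = \sum_k n_k$ also grows exponentially: $dN_F/dt = \alpha N_G = \alpha \sum_k k\,n_k$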
37Preferential attachment with fitness
- Bianconi-Barabási (2001)
- Attractiveness of a node to new edges is given by $f_i k_i / \sum_r f_r k_r$
- For uniform $\rho(f)$: $P(k) \sim k^{-(1+C)}/\ln(k)$, where $C = 1.255$
- Generally C depends on $\rho(f)$
- Some $\rho(f)$ result in Bose-Einstein condensation, in which super-hubs emerge
38- Percolation transition in networks
39Why should we care?
- The most important property of a network: it quantifies how broken-up the network is
- Below the percolation threshold: many small components
- At the percolation threshold: scale-free distribution of component sizes, $P(S) \sim S^{-2.5}$
- Above the percolation threshold: a giant connected component and a few small ones
- Determines the propagation of perturbations which affect neighbors with probability p (e.g. infections)
40Naïve (and wrong) argument
- An average node has $\langle K \rangle$ first neighbors, $\langle K \rangle \langle K-1 \rangle$ second neighbors, $\langle K \rangle \langle K-1 \rangle \langle K-1 \rangle$ third neighbors
- We neglect overlap between e.g. second and first neighbors: in random networks it is a small effect, ~1/N
- If $\langle K-1 \rangle \geq 1$, a single node is connected to a finite fraction of all nodes in the network
41Where is it wrong?
- Probability to arrive at a node with K neighbors is proportional to K!
- All averages have to be modified: $\langle F(K) \rangle \to \langle F(K)\,K \rangle / \langle K \rangle$
- The right answer: if $\langle K(K-1) \rangle / \langle K \rangle \geq 1$, a perturbation would spread
- In directed networks it is $\langle K_{in} K_{out} \rangle / \langle K_{in} \rangle \geq 1$
- Correlations between degrees of neighbors and an abnormally large number of triangles (clustering) would affect the answer
42How many clusters?
- If $\langle K(K-1) \rangle / \langle K \rangle \ll 1$, there are only small clusters
- If $\langle K(K-1) \rangle / \langle K \rangle \approx 1$, cluster sizes S have a scale-free distribution $P(S) \sim S^{-2.5}$
- If $\langle K(K-1) \rangle / \langle K \rangle \gg 1$, there is one giant cluster and a few small ones
- A perturbation which affects neighbors with probability p propagates if $p\,\langle K(K-1) \rangle / \langle K \rangle \geq 1$
- For scale-free networks $P(K) \sim K^{-\gamma}$ with $\gamma < 3$, $\langle K^2 \rangle \to \infty$: a perturbation always spreads in a large enough network
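The three regimes can be seen in a small simulation. The sketch below assumes a random graph with a Poisson degree distribution, for which $\langle K(K-1) \rangle / \langle K \rangle$ equals the mean degree, so the threshold sits at mean degree 1. The function name, the union-find bookkeeping, and the sizes used are illustrative assumptions; rare repeated edges are simply tolerated.

```python
import random

def largest_component_fraction(n, mean_degree, seed=0):
    """Fraction of nodes in the largest component of a random graph."""
    rng = random.Random(seed)
    parent = list(range(n))                 # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    n_edges = int(mean_degree * n / 2)
    for _ in range(n_edges):
        a, b = rng.sample(range(n), 2)      # random edge between distinct nodes
        parent[find(a)] = find(b)           # merge the two components

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

below = largest_component_fraction(20000, mean_degree=0.5)
above = largest_component_fraction(20000, mean_degree=2.0)
print("below threshold:", below)
print("above threshold:", above)
```

Below the threshold the largest component holds a vanishing fraction of the nodes, while above it a single giant component holds most of them.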
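43Diameter and mean cluster size are determined by $\langle k(k-1) \rangle / \langle k \rangle$
- Mean diameter L: $1 + \langle k \rangle + \langle k \rangle \frac{\langle k(k-1) \rangle}{\langle k \rangle} + \dots + \langle k \rangle \left( \frac{\langle k(k-1) \rangle}{\langle k \rangle} \right)^{L-1} = N \Rightarrow L \approx \frac{\log(N/\langle k \rangle)}{\log(\langle k(k-1) \rangle / \langle k \rangle)} + 1$
- Mean cluster size below $p_c$: $\langle S \rangle = 1 + \langle k \rangle / (1 - \langle k(k-1) \rangle / \langle k \rangle)$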
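Both estimates on this slide depend only on the degree sequence, so they are easy to evaluate. The following is a minimal sketch; the function names and the toy degree sequences are assumptions chosen for illustration, and the formulas are the tree-like approximations above, not exact results for any particular network.

```python
import math

def branching_ratio(degrees):
    """<k(k-1)>/<k> for a degree sequence."""
    mean_k = sum(degrees) / len(degrees)
    mean_kk1 = sum(k * (k - 1) for k in degrees) / len(degrees)
    return mean_kk1 / mean_k

def estimated_diameter(degrees):
    """L ~ log(N/<k>) / log(<k(k-1)>/<k>) + 1, meaningful above the threshold."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    ratio = branching_ratio(degrees)
    if ratio <= 1:
        raise ValueError("the estimate needs <k(k-1)>/<k> > 1")
    return math.log(n / mean_k) / math.log(ratio) + 1

def mean_cluster_size(degrees):
    """<S> = 1 + <k> / (1 - <k(k-1)>/<k>), meaningful below the threshold."""
    mean_k = sum(degrees) / len(degrees)
    ratio = branching_ratio(degrees)
    if ratio >= 1:
        raise ValueError("the estimate needs <k(k-1)>/<k> < 1")
    return 1 + mean_k / (1 - ratio)

# Toy degree sequences (hypothetical): every node has degree 3; every node has degree 1.
print(estimated_diameter([3] * 3000))
print(mean_cluster_size([1] * 100))
```

The second call is a useful sanity check: when every node has degree 1 the network is a set of disconnected pairs, and the formula returns a mean cluster size of exactly 2.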
44Amplification ratios
- A(dir) = 1.08 (E. coli), 0.58 (yeast)
- A(undir) = 10.5 (E. coli), 13.4 (yeast)
- A(PPI) = ? (E. coli), 26.3 (yeast)
45Clustering coefficient $C_\Delta$
- $C_\Delta = 3 N_\Delta / \sum_k n_k\,k(k-1)/2$
- Could be defined for individual nodes or as a function of k: $C_\Delta(k) = 3 N_\Delta(k) / [n_k\,k(k-1)/2]$
- $C_\Delta = 1$ could not be realized if k is heterogeneous
- Needs to be compared to its value in randomized networks with the same degree sequence
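As an illustration of the global definition, here is a minimal sketch that counts triangles and pairs of neighbors from an undirected edge list. The function name and the three toy graphs are assumptions made for this example.

```python
from itertools import combinations

def clustering_coefficient(edges):
    """C = 3 * (number of triangles) / sum over nodes of k(k-1)/2."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Count connected pairs of neighbors; each triangle is seen from 3 corners.
    corner_count = sum(
        1
        for nbrs in neighbors.values()
        for a, b in combinations(nbrs, 2)
        if b in neighbors[a]
    )
    triangles = corner_count // 3
    pairs = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in neighbors.values())
    return 3 * triangles / pairs if pairs else 0.0

triangle = [(0, 1), (1, 2), (0, 2)]
star = [(0, 1), (0, 2), (0, 3)]
triangle_with_tail = triangle + [(2, 3)]
print(clustering_coefficient(triangle))            # every neighbor pair is linked
print(clustering_coefficient(star))                # no neighbor pair is linked
print(clustering_coefficient(triangle_with_tail))  # 3 of 5 neighbor pairs are linked
```

As the slide notes, the raw number is only informative once it is compared with the value obtained in randomized networks with the same degree sequence.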
46End lecture 1
47Lecture 2
48 49Places to learn molecular biology
- Molecular Biology of the Cell. Fourth Edition. Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter. Garland Science. 2002.
- DNA from the beginning. http://www.dnaftb.org/
- Online Biology Book. http://gened.emc.maricopa.edu/bio/bio181/BIOBK/BioBookTOC.html
- Kimball's Biology Pages. http://www.ultranet.com/~jkimball/BiologyPages/
- Gene expression. http://vlib.org/Science/Cell_Biology/gene_expression.shtml
- Human Genome Project. http://www.ornl.gov/hgmis/
- Microarrays. http://www.gene-chips.com/
From Prof. Michael Hallett (McGill) online lectures
50Protein networks
- Nodes: proteins
- Edges: interactions between proteins
- Metabolic (enzymes sharing common metabolites are connected)
- Physical (binding interactions)
- Regulatory and signaling (transcriptional regulation, protein modifications)
- Co-expression networks from microarray data (connect genes with similar expression (abundance) patterns under many conditions)
- Genetic interactions, e.g. synthetic lethal protein pairs (removal of any one of the two proteins doesn't kill the cell, but removal of both proteins does)
- Etc., etc., etc.
51Sources of data on protein networks
- Genome-wide experiments
- Binding: two-hybrid (Y2H) and mass-spec (MS) high-throughput techniques
- Transcriptional regulation: ChIP-on-chip, or ChIP-then-SAGE
- Expression, disruption networks: microarrays
- Lethality of genes (including synthetic lethals)
- Gene knockout: yeast
- RNAi: worm, fly
- Many small or intermediate-scale experiments
- All stored in public databases: BIOGRID, DIP, BIND, YPD (no longer public), SGD, Flybase, Ecocyc, etc.
52Pathway → network paradigm shift
53Images from ResNet3.0 by Ariadne Genomics
MAPK signaling
Inhibition of apoptosis
54- Transcription regulatory networks
55Transcription factors bind DNA
56Activators and repressors
- Depending on the position of the binding site (operator) with respect to the RNA-polymerase binding site (promoter), transcription factors can either activate or repress the production of mRNA from a given gene (transcription) and thus affect the abundance of its protein product
57Transcription regulatory networks
58Sea urchin embryonic development (endomesoderm up to 30 hours) by Davidson's lab
59- How many transcriptional regulators are out
there?
60Fraction of transcriptional regulators in bacteria
61Figure from Erik van Nimwegen, TIG 2003
62Complexity of regulation grows with complexity of organism
- $N_R \langle K_{out} \rangle = N \langle K_{in} \rangle$ = number of edges
- $N_R/N = \langle K_{in} \rangle / \langle K_{out} \rangle$ increases with N
- $\langle K_{in} \rangle$ grows with N
- In bacteria $N_R \sim N^2$ (Stover et al., 2000)
- In eukaryotes $N_R \sim N^{1.3}$ (van Nimwegen, 2002)
- Networks in more complex organisms are more interconnected than in simpler ones
63Complexity is manifested in Kin distribution
E. coli vs H. sapiens
64Table from Erik van Nimwegen, TIG 2003
65Toolbox model
- $N_{TF} = A N^2 \Rightarrow dN_{TF} = 2AN\,dN \Rightarrow dN/dN_{TF} = 1/(2AN)$
- In small genomes: ~100 genes per TF. In large ones: only ~4!
- A toolbox (e.g. metabolic network) grows linearly with N. To handle a new condition ($N_{TF} \to N_{TF} + 1$) one needs fewer and fewer new tools.
- S. Maslov, S. Krishna, K. Sneppen, in preparation
66How is it all connected? (beyond degree
distribution)
67What is unusual about the topology of a given network?
- Look for the number of occurrences of a certain topological pattern
- Compare with a randomized network
- What patterns to look for?
- Number of edges connecting nodes with given degrees (degree-degree correlations)
- Motifs: small subgraphs of 3-4 nodes (in undirected networks: clustering, or the triangles)
- Overrepresentation: nature needs them for some function
- Underrepresentation: they are detrimental and nature avoids them
68- How to construct a proper random network?
69Randomization of a network
70Stub reconnection algorithm
- Break every edge into two halves (stubs)
- Randomly reconnect stubs
- Watch for multiple edges!
- For example, in the AS-Internet two largest hubs
would end up being connected with 50 edges (sic!)
- Not adaptable to conserve other low-level
topological properties of the network
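The stub reconnection procedure, and the reason to "watch for multiple edges", can be illustrated with a short sketch. The function names and the two-hub toy network are assumptions made for this example; the toy network is not the AS-Internet data mentioned on the slide.

```python
import random
from collections import Counter

def stub_reconnect(edges, seed=0):
    """Cut every edge into two stubs, shuffle them, and pair them up again."""
    rng = random.Random(seed)
    stubs = [v for edge in edges for v in edge]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

def count_defects(edges):
    """Return (number of self-loops, number of surplus parallel edges)."""
    loops = sum(1 for a, b in edges if a == b)
    multiplicity = Counter(frozenset(e) for e in edges if e[0] != e[1])
    extra = sum(c - 1 for c in multiplicity.values())
    return loops, extra

# Hypothetical toy network: two hubs (nodes 0 and 1) with 200 leaves each.
hubs = [(0, leaf) for leaf in range(2, 202)] + [(1, leaf) for leaf in range(202, 402)]
shuffled = stub_reconnect(hubs)
print(count_defects(hubs))        # the original network is a simple graph
print(count_defects(shuffled))    # reconnection creates loops and parallel edges
```

Every node keeps its degree, but the two hubs end up joined by many parallel edges and also acquire self-loops, which is the problem the slide warns about.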
71Local rewiring algorithm
- R. Kannan, P. Tetali, and S. Vempala, Random Structures and Algorithms (1999)
- SM, K. Sneppen, Science (2002)
- Randomly select and rewire two edges
- Repeat many times
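A degree-preserving edge swap of this kind can be sketched as follows. This is one possible reading of the slide, assuming an undirected simple graph; the function name, the rule of skipping swaps that would create self-loops or multiple edges, and the ring used as input are illustrative choices.

```python
import random
from collections import Counter

def local_rewire(edges, n_attempts, seed=0):
    """Repeatedly pick two edges A-B and C-D and rewire them to A-D and C-B."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c                     # undirected: try either pairing
        if a == d or c == b:
            continue                        # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                        # would create a multiple edge
        present.difference_update({frozenset((a, b)), frozenset((c, d))})
        present.update({frozenset((a, d)), frozenset((c, b))})
        edges[i], edges[j] = (a, d), (c, b)
    return edges

ring = [(i, (i + 1) % 50) for i in range(50)]
rewired = local_rewire(ring, n_attempts=1000)
print(Counter(v for e in rewired for v in e).most_common(3))
```

Each swap leaves the degree of all four nodes unchanged, so after many attempts the network is randomized while its degree sequence stays exactly the same, and no multiple edges appear.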
72Metropolis rewiring algorithm
energy E
energy E+ΔE
SM, K. Sneppen, cond-mat preprint (2002), Physica A (2004)
- Randomly select two edges
- Calculate the change $\Delta E$ in the energy function $E = (N_{actual} - N_{desired})^2 / N_{desired}$
- Rewire with probability $p = \exp(-\Delta E / T)$
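A sketch of the Metropolis variant follows. The slide does not say which pattern N counts; here it is assumed to be the number of triangles, purely for illustration. The function names, the acceptance rule written as min(1, exp(-ΔE/T)), and all parameter values are likewise assumptions.

```python
import math
import random

def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def count_triangles(adj):
    # Common neighbors per edge count each triangle three times.
    return sum(len(adj[a] & adj[b]) for a in adj for b in adj[a] if a < b) // 3

def remove_edge(adj, x, y):
    adj[x].discard(y)
    adj[y].discard(x)

def add_edge(adj, x, y):
    adj[x].add(y)
    adj[y].add(x)

def metropolis_rewire(edges, n_desired, temperature, n_attempts, seed=0):
    """Degree-preserving rewiring biased towards n_desired (> 0) triangles."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    adj = build_adjacency(edges)
    energy = lambda n: (n - n_desired) ** 2 / n_desired
    n_actual = count_triangles(adj)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue                        # keep the graph simple
        # Apply the swap (a,b),(c,d) -> (a,d),(c,b), tracking the triangle count.
        delta = 0
        for x, y in ((a, b), (c, d)):
            remove_edge(adj, x, y)
            delta -= len(adj[x] & adj[y])   # triangles destroyed with this edge
        for x, y in ((a, d), (c, b)):
            delta += len(adj[x] & adj[y])   # triangles created by this edge
            add_edge(adj, x, y)
        d_e = energy(n_actual + delta) - energy(n_actual)
        if d_e <= 0 or rng.random() < math.exp(-d_e / temperature):
            edges[i], edges[j] = (a, d), (c, b)
            n_actual += delta
        else:                               # rejected: undo the swap
            for x, y in ((a, d), (c, b)):
                remove_edge(adj, x, y)
            for x, y in ((a, b), (c, d)):
                add_edge(adj, x, y)
    return edges, n_actual

# Hypothetical input: a ring with next-nearest-neighbor links (60 triangles).
lattice = [(i, (i + 1) % 60) for i in range(60)] + [(i, (i + 2) % 60) for i in range(60)]
start_triangles = count_triangles(build_adjacency(lattice))
rewired, n_final = metropolis_rewire(lattice, n_desired=5, temperature=0.5, n_attempts=20000)
print(start_triangles, "->", n_final)
```

Lowering the temperature pins the pattern count more tightly to the desired value, while a very high temperature recovers the unbiased local rewiring.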
73- Degree-degree correlations
74Central vs peripheral network architecture
random
A. Trusina, P. Minnhagen, SM, K. Sneppen, Phys.
Rev. Lett. 92, 17870, (2004)
75What is the case for protein interaction network
SM, K. Sneppen, Science 296, 910 (2002)
76Correlation profile
- Count $N(k_0,k_1)$, the number of links between nodes with connectivities $k_0$ and $k_1$
- Compare it to $N_r(k_0,k_1)$, the same property in a random network
- Qualitative features are very noise-tolerant with respect to both false positives and false negatives
77(No Transcript)
78Correlation profile of the protein interaction network
$R(k_0,k_1) = N(k_0,k_1) / N_r(k_0,k_1)$
$Z(k_0,k_1) = (N(k_0,k_1) - N_r(k_0,k_1)) / \sigma_{N_r}(k_0,k_1)$
Similar profile is seen in the yeast regulatory network
79Hubs may act within a module, or connect modules
- Party hub
- simultaneous interactions
- tends to be within the same module
- Date hub
- sequential interactions
- connect different modules
Han et al., Nature 430, 88 (2004)
80(No Transcript)
81Correlation profile of the yeast regulatory network
$R(k_{out},k_{in}) = N(k_{out},k_{in}) / N_r(k_{out},k_{in})$
$Z(k_{out},k_{in}) = (N(k_{out},k_{in}) - N_r(k_{out},k_{in})) / \sigma_{N_r}(k_{out},k_{in})$
82Some scale-free networks may appear similar
In both networks the degree distribution is scale-free: $P(k) \sim k^{-\gamma}$ with $\gamma \approx 2.2$-$2.5$
83But correlation profiles give them unique
identities
Internet
Protein interactions
84- Small network motifs (Uri Alon and his group)
85All 3 node motifs
86Motifs can overlap in the network
motif to be found
graph
motif matches in the target graph
http://mavisto.ipk-gatersleben.de/frequency_concepts.html
87Detection of important network motifs
- Technique
- construct many random graphs with the same number of nodes and degree distribution
- count the number of motifs in those graphs
- calculate the Z score, which gives the probability that the same or larger number of motifs in the real-world network could have occurred in a random one
- Software available
- http://www.weizmann.ac.il/mcb/UriAlon/
88What the Z score means
$z_x = (x - \mu_x) / \sigma_x$
- $\mu_x$: mean number of times the motif appeared in the random graph
- $\sigma_x$: standard deviation
- $x$: number of times the motif appeared in the random graph
- The probability of observing a Z score of 2 or larger is 0.02275
- In the context of motifs:
- Z > 0: motif occurs more often than in random graphs
- Z < 0: motif occurs less often than in random graphs
- Z > 1.65: only a 5% chance of random occurrence
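The numbers quoted on this slide can be checked with a few lines. This sketch assumes the motif counts in the random graphs are approximately normally distributed; the function names are illustrative, the sample standard deviation is one possible choice of $\sigma_x$, and the motif counts in the example are made up.

```python
import math
import statistics

def z_score(observed, random_counts):
    """(x - mean) / standard deviation, with statistics taken over random graphs."""
    mu = statistics.mean(random_counts)
    sigma = statistics.stdev(random_counts)   # sample standard deviation (assumption)
    return (observed - mu) / sigma

def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical counts: the motif appears 14 times in the real network
# and 8, 10 and 12 times in three randomized versions of it.
z = z_score(14, [8, 10, 12])
print(z, upper_tail_probability(z))
print(upper_tail_probability(1.65))
```

In practice many more than three randomized networks are needed for the mean and standard deviation to be reliable.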
89Examples of network motifs (3 nodes)
- Feed forward loop
- Found in many transcriptional regulatory
networks
90Possible functional role of a coherent
feed-forward loop
- Noise filtering: short pulses in the input do not result in turning on Z
- To function, it needs a time delay (about 0.5 hrs for bacterial transcription)
91All 4 node subgraphs (computational expense
increases with the size of the graph!)
92Higher-order motifs
- 4-node motifs contain some 3-node motifs
- One needs to be careful when calculating over-representation
- Alon and co-authors use our Metropolis algorithm to generate networks with a given number of low-level motifs
93Table 1 from R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon, Network Motifs: Simple Building Blocks of Complex Networks, Science 298, 824-827 (2002)
94Examples of network motifs (4 nodes)
- Parallel paths are over represented
- Neural networks
- Food webs
95Finding classes on graphs based on their motif
profiles
96THE END