1
Statistical physics of complex networks
  • Sergei Maslov
  • Brookhaven National Laboratory

2
Short history: complex systems before and after
networks
  • Statistical physics of complex systems was active
    in the 80s-90s (following the chaos boom of the 70s)
  • Fractals (Mandelbrot and many others)
  • Self-Organized Criticality (Per Bak and
    co-authors) → sandpiles → granular systems
  • Complex = multiple time and length scales (e.g.
    avalanches) → cult of power laws
  • Cellular automata (mostly in real space-time)
  • Examples:
  • earthquakes
  • disordered moving interfaces
  • (co)-evolution of species
  • agent-based modeling (ants)
  • By the end of the 90s: breakup of the community and
    specialization:
  • Biology
  • Economics and finance
  • Internet
  • Social sciences

3
Networks in complex systems
  • Complex systems
  • Large number of components interacting with each
    other
  • All components and/or interactions are different
    from each other (unlike in traditional physics
    where $10^{23}$ electrons are all the same!)
  • Paradigms
  • $10^4$ types of proteins in an organism,
  • $10^6$ routers in the Internet
  • $10^9$ web pages in the WWW
  • $10^{11}$ neurons in a human brain
  • The simplest property, who interacts with whom,
    can be visualized as a network
  • Complex networks are just a backbone for complex
    dynamical processes

4
Why study the topology of complex networks?
  • Lots of easily available data: that's where the
    state-of-the-art information is (at least in
    biology)
  • Large networks may contain information about
    basic design principles and/or evolutionary
    history of the complex system
  • This is similar to paleontology: learning about
    an animal from its backbone

5
  • Inside single cells

6
  • A small part of a metabolic network: the citric
    acid cycle

7
Metabolic pathway chart by ExPASy
8
Protein binding networks
Baker's yeast S. cerevisiae (only nuclear
proteins shown)
Nematode worm C. elegans
9
Transcription regulatory networks
Single-celled eukaryote S. cerevisiae
Bacterium E. coli
10
GENOME
protein-gene interactions
PROTEOME
protein-protein interactions
METABOLISM
bio-chemical reactions
slide after Reka Albert
11
  • Between cells in a multi-cellular organism

12
Sea urchin embryonic development (endomesoderm up
to 30 hours) by Davidson's lab
13
C. elegans neurons
14
  • Between organisms

15
Freshwater food web by Neo Martinez and Richard
Williams
16
Sexual contacts: M. E. J. Newman, The structure
and function of complex networks, SIAM Review 45,
167-256 (2003).
17
  • Social

18
High school dating: data drawn from Peter S.
Bearman, James Moody, and Katherine Stovel,
visualized by Mark Newman
19
Network of actors co-starring in movies
20
Networks of scientists: co-authorship of papers
21
Webpages connected by hyperlinks on the AT&T
website circa 1996, visualized by Mark Newman.
Citation networks are similar to the WWW
but time-ordered
22
  • Technological

23
Internet as measured by Hal Burch and Bill
Cheswick's Internet Mapping Project.
25
Transportation networks: airlines
26
Transportation networks: railway maps
Tokyo rail map
27
  • Lecture 1: General introduction to networks
  • Node degree, its distribution, and correlations
  • Simple models
  • Preferential attachment and the Simon model
  • Growth model for protein families
  • Percolation transition on networks
  • Clustering coefficient
  • Lectures 2-3: Biomolecular (mostly protein)
    networks
  • Regulatory and signaling networks
  • How many regulators? Bureaucratic collapse
  • Network motifs in directed (e.g. regulatory)
    networks
  • Protein binding networks
  • Broad degree distributions in protein binding
    networks and possible explanations
  • Evolutionary (duplication-divergence)
  • Biophysical (stickiness)
  • Functional
  • Beyond degree distributions: how is it all wired
    together? Correlations in degrees
  • Randomization of networks
  • Law of Mass Action and propagation of
    perturbations

28
Degree (or connectivity) of a node: the number of
neighbors
Degree K = 2
Degree K = 4
29
Directed networks have in- and out-degrees
In-degree $K_{in} = 2$
Out-degree $K_{out} = 5$
30
  • Degree distributions in random and real networks

31
Degree distribution in a random network
  • Randomly throw E edges among N nodes
  • Solomonoff, Rapoport, Bull. Math. Biophysics
    (1951); Erdős-Rényi (1960)
  • Degree distribution: binomial → Poisson
  • $K \approx \langle K \rangle \pm \sqrt{\langle K \rangle}$, with no hubs (fast decay of N(K))
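
The random-graph recipe above can be sketched in a few lines of Python. This is a minimal illustration, not the construction used in the lecture: the function name `random_graph_degrees`, the sizes, and the choice to forbid self-loops and multiple edges are all assumptions made for the example. For a Poisson-like degree distribution the variance comes out close to the mean and the largest degree stays small.

```python
import random
from collections import Counter

def random_graph_degrees(n_nodes, n_edges, seed=0):
    """Throw n_edges distinct edges at random among n_nodes nodes; return the degree list."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)  # two distinct nodes, so no self-loops
        edges.add((min(a, b), max(a, b)))     # the set rules out multiple edges
    degrees = [0] * n_nodes
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

degrees = random_graph_degrees(n_nodes=2000, n_edges=4000)
mean_k = sum(degrees) / len(degrees)
var_k = sum((k - mean_k) ** 2 for k in degrees) / len(degrees)
histogram = Counter(degrees)  # N(K): number of nodes with degree K
print(f"<K> = {mean_k:.2f}, variance = {var_k:.2f}, max degree = {max(degrees)}")
```

Printing `sorted(histogram.items())` shows N(K) falling off quickly on both sides of the mean, which is the "no hubs" statement in numbers.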

32
Degree distribution in real protein binding
network
  • Histogram N(K) is broad: most nodes have low
    degree ~1, a few nodes have high degree ~100
  • Can be approximately fitted with the functional
    form $N(K) \sim K^{-\gamma}$ with $\gamma = 2.5$

33
Many real-world networks have broad degree
distributions
34
Basic BA-model
  • Very simple algorithm to implement
  • start with an initial set of m0 fully connected
    nodes
  • e.g. m0 = 3
  • now add new vertices one by one, each one with
    exactly m edges
  • each new edge connects to an existing vertex in
    proportion to the number of edges that vertex
    already has → preferential attachment
  • easiest if you keep track of edge endpoints in
    one large array and select an element from this
    array at random
  • the probability of selecting any one vertex will
    be proportional to the number of times it appears
    in the array, which corresponds to its degree

1 1 2 2 2 3 3 4 5 6 6 7 8 ...
35
generating BA graphs contd
  • To start, each vertex has an equal number of
    edges (2)
  • the probability of choosing any vertex is 1/3
  • We add a new vertex, and it will have m edges,
    here take m = 2
  • draw 2 random elements from the array; suppose
    they are 2 and 3
  • Now the probabilities of selecting 1,2,3,or 4 are
    1/5, 3/10, 3/10, 1/5
  • Add a new vertex, draw a vertex for it to connect
    from the array
  • etc.
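
The endpoint-array recipe from the last two slides can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `barabasi_albert` is made up for the example, vertices are numbered from 0 rather than 1, and duplicate draws are simply redrawn so that the m edges of a new vertex go to distinct targets (the slides do not say how duplicates are handled).

```python
import random

def barabasi_albert(n_nodes, m=2, m0=3, seed=0):
    """Grow a graph by preferential attachment using one array of edge endpoints."""
    rng = random.Random(seed)
    # Start with m0 fully connected nodes.
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Each vertex appears in this array once per edge it has, i.e. degree-many times.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m0, n_nodes):
        targets = set()
        while len(targets) < m:            # assumption: redraw duplicates (needs m <= m0)
            targets.add(rng.choice(endpoints))
        for t in targets:                  # all m targets were drawn before the array changes
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges, endpoints

edges, endpoints = barabasi_albert(n_nodes=1000)
degree = {}
for v in endpoints:
    degree[v] = degree.get(v, 0) + 1
print("largest degrees:", sorted(degree.values(), reverse=True)[:5])
```

Drawing a uniformly random element of `endpoints` selects a vertex with probability proportional to its degree, so no explicit probability table is ever built.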

36
The tale of linear vs exponential growth
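  • Linear growth: the Barabási-Albert model with $\gamma = 3$ is
    a version of Simon's word usage model, $\gamma = 2 + \alpha$
  • $dn_k/dt = (k-1)\,n_{k-1}/(t + \alpha t) - k\,n_k/(t + \alpha t)$
  • Exponential growth: protein duplication-deletion
    model, $\gamma = 2 + \omega/(\delta_{dup} - \delta_{del})$
  • $dn_k/dt = \delta_{dup}(k-1)\,n_{k-1} - (\delta_{dup} + \delta_{del})\,k\,n_k + \delta_{del}(k+1)\,n_{k+1}$
  • $N_F = \sum_k n_k$ also grows exponentially:
    $dN_F/dt = \omega N_G = \omega \sum_k k\,n_k$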

37
Preferential attachment with fitness
  • Bianconi-Barabási (2001)
  • Attractiveness of a node to new edges is given by
    $f_i k_i / \sum_r f_r k_r$
  • For uniform $\rho(f)$: $P(k) \sim k^{-(1+C)}/\ln k$, where
    $C = 1.255$
  • Generally C depends on $\rho(f)$
  • Some $\rho(f)$ result in Bose-Einstein condensation
    in which super-hubs emerge

38
  • Percolation transition in networks

39
Why should we care?
  • The most important property of a network: it
    quantifies how broken-up the network is
  • Below the percolation threshold: many small
    components
  • At the percolation threshold: scale-free
    distribution of component sizes, $P(S) \sim S^{-2.5}$
  • Above the percolation threshold: a giant connected
    component and a few small ones
  • Determines the propagation of perturbations which
    affect neighbors with probability p (e.g.
    infections)

40
Naïve (and wrong) argument
  • An average node has $\langle K \rangle$ first neighbors,
    $\langle K \rangle \langle K-1 \rangle$ second neighbors,
    $\langle K \rangle \langle K-1 \rangle \langle K-1 \rangle$ third neighbors
  • We neglect overlap between e.g. second and first
    neighbors: in random networks it is a small effect, ~1/N
  • If $\langle K-1 \rangle \geq 1$, a single node is connected to a
    finite fraction of all nodes in the network

41
Where is it wrong?
  • The probability to arrive at a node with K neighbors
    is proportional to K!
  • All averages have to be modified:
    $\langle F(K) \rangle \to \langle F(K)\,K \rangle / \langle K \rangle$
  • The right answer: if $\langle K(K-1) \rangle / \langle K \rangle \geq 1$, a
    perturbation would spread
  • In directed networks it is $\langle K_{in} K_{out} \rangle / \langle K_{in} \rangle \geq 1$
  • Correlations between degrees of neighbors and an
    abnormally large number of triangles (clustering)
    would affect the answer
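
The difference between the naive and the corrected criterion is easy to check numerically. The sketch below uses a made-up degree list (700 nodes of degree 1 and 300 nodes of degree 4) and hypothetical function names; it only evaluates the two averages from the slides and does not build a network.

```python
def naive_ratio(degrees):
    """<K-1>: the naive estimate, which ignores the bias towards high-degree nodes."""
    return sum(degrees) / len(degrees) - 1

def branching_ratio(degrees):
    """<K(K-1)>/<K>: mean number of onward neighbors of a node reached along a random edge."""
    mean_k = sum(degrees) / len(degrees)
    mean_kk1 = sum(k * (k - 1) for k in degrees) / len(degrees)
    return mean_kk1 / mean_k

degrees = [1] * 700 + [4] * 300   # hypothetical degree sequence
print("naive <K-1>:      ", round(naive_ratio(degrees), 3))
print("<K(K-1)>/<K>:     ", round(branching_ratio(degrees), 3))
```

For this degree list the naive estimate is below 1 while the corrected ratio is well above 1, so the two arguments disagree about whether a perturbation spreads.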

42
How many clusters?
  • If $\langle K(K-1) \rangle / \langle K \rangle \ll 1$ there are only small
    clusters
  • If $\langle K(K-1) \rangle / \langle K \rangle \approx 1$ cluster sizes S have a
    scale-free distribution $P(S) \sim S^{-2.5}$
  • If $\langle K(K-1) \rangle / \langle K \rangle \gg 1$ there is one giant
    cluster and a few small ones
  • A perturbation which affects neighbors with
    probability p propagates if $p \langle K(K-1) \rangle / \langle K \rangle \geq 1$
  • For scale-free networks $P(K) \sim K^{-\gamma}$ with $\gamma < 3$,
    $\langle K^2 \rangle \to \infty$ → a perturbation always spreads in a large
    enough network

43
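Diameter and mean cluster size are determined by
$\langle k(k-1) \rangle / \langle k \rangle$
  • Mean diameter L:
    $1 + \langle k \rangle + \langle k \rangle\,\langle k(k-1) \rangle/\langle k \rangle + \dots + \langle k \rangle\,(\langle k(k-1) \rangle/\langle k \rangle)^{L-1} = N$
    → $L \approx \log(N/\langle k \rangle) / \log(\langle k(k-1) \rangle/\langle k \rangle) + 1$
  • Mean cluster size below $p_c$:
    $\langle S \rangle = 1 + \langle k \rangle / (1 - \langle k(k-1) \rangle/\langle k \rangle)$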
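
The two estimates on this slide are plain arithmetic once the mean degree and the ratio ⟨k(k-1)⟩/⟨k⟩ are known. The helper names and the input numbers below are illustrative assumptions; for a Poisson degree distribution the ratio equals the mean degree, which is what the two calls use.

```python
import math

def estimated_diameter(n_nodes, mean_k, ratio):
    """L ~ log(N/<k>) / log(<k(k-1)>/<k>) + 1; meaningful only for ratio > 1."""
    return math.log(n_nodes / mean_k) / math.log(ratio) + 1

def mean_cluster_size(mean_k, ratio):
    """<S> = 1 + <k> / (1 - <k(k-1)>/<k>); meaningful only for ratio < 1."""
    return 1 + mean_k / (1 - ratio)

# Above the threshold: a million nodes with <k> = 4 are only about ten steps across.
print(estimated_diameter(n_nodes=10**6, mean_k=4.0, ratio=4.0))
# Below the threshold: clusters stay small.
print(mean_cluster_size(mean_k=0.5, ratio=0.5))
```

The logarithm is why the diameter barely moves when N grows by orders of magnitude, while the cluster size diverges as the ratio approaches 1 from below.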

44
Amplification ratios
  • A(dir): 1.08 - E. coli, 0.58 - Yeast
  • A(undir): 10.5 - E. coli, 13.4 - Yeast
  • A(PPI): ? - E. coli, 26.3 - Yeast

45
Clustering coefficient $C_\Delta$
  • $C_\Delta = 3 N_\Delta / [\sum_k n_k\, k(k-1)/2]$
  • Could be defined for individual nodes or as a
    function of k: $C_\Delta(k) = 3 N_\Delta(k) / [n_k\, k(k-1)/2]$
  • $C_\Delta = 1$ could not be realized if k is heterogeneous
  • Needs to be compared to its value in randomized
    networks with the same degree sequence
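
A minimal sketch of the global clustering coefficient, three times the number of triangles divided by the number of connected triples, for an undirected simple graph stored as adjacency sets. The function name and the toy graph are assumptions made for the example.

```python
from itertools import combinations

def clustering_coefficient(adjacency):
    """3 * (number of triangles) / (sum over nodes of k(k-1)/2)."""
    closed = 0   # linked neighbor pairs; every triangle is counted once per corner, i.e. 3 times
    triples = 0  # sum over nodes of k(k-1)/2
    for node, neighbors in adjacency.items():
        k = len(neighbors)
        triples += k * (k - 1) // 2
        for a, b in combinations(neighbors, 2):
            if b in adjacency[a]:
                closed += 1
    return closed / triples if triples else 0.0

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(adjacency))  # 3 * 1 triangle / 5 triples = 0.6
```

As the last bullet says, the number by itself means little: it should be compared with the same quantity measured in randomized networks with the same degree sequence.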

46
End lecture 1
47
Lecture 2
48
  • Protein networks

49
Places to learn molecular biology
  • Molecular Biology of the Cell. Fourth Edition.
    Bruce Alberts, Alexander Johnson, Julian Lewis,
    Martin Raff, Keith Roberts, Peter Walter. Garland
    Science. 2002.
  • DNA from the beginning. http://www.dnaftb.org/
  • Online Biology Book.
    http://gened.emc.maricopa.edu/bio/bio181/BIOBK/BioBookTOC.html
  • Kimball's Biology Pages.
    http://www.ultranet.com/~jkimball/BiologyPages/
  • Gene expression.
    http://vlib.org/Science/Cell_Biology/gene_expression.shtml
  • Human Genome Project. http://www.ornl.gov/hgmis/
  • Microarrays. http://www.gene-chips.com/

From Prof. Michael Hallett (McGill) online
lectures
50
Protein networks
  • Nodes: proteins
  • Edges: interactions between proteins
  • Metabolic (enzymes sharing common
    metabolites are connected)
  • Physical (binding interactions)
  • Regulatory and signaling (transcriptional
    regulation, protein modifications)
  • Co-expression networks from microarray data
    (connect genes with similar expression
    (abundance) patterns under many conditions)
  • Genetic interactions, e.g. synthetic lethal
    protein pairs (removal of either one of the two
    proteins doesn't kill the cell, but removal of
    both proteins does)
  • Etc, etc, etc.

51
Sources of data on protein networks
  • Genome-wide experiments
  • Binding: two-hybrid (Y2H) and mass-spec (MS)
    high-throughput techniques
  • Transcriptional regulation: ChIP-on-chip, or
    ChIP-then-SAGE
  • Expression, disruption networks: microarrays
  • Lethality of genes (including synthetic lethals)
  • Gene knockout: yeast
  • RNAi: worm, fly
  • Many small or intermediate-scale experiments
  • All stored in public databases: BioGRID, DIP,
    BIND, YPD (no longer public), SGD, FlyBase,
    EcoCyc, etc.

52
Pathway → network paradigm shift
53
Images from ResNet3.0 by Ariadne Genomics
MAPK signaling
Inhibition of apoptosis
54
  • Transcription regulatory networks

55
Transcription factors bind DNA
56
Activators and repressors
  • Depending on the position of the binding site
    (operator) with respect to the RNA-polymerase
    binding site (promoter), transcription factors
    can either activate or repress the production
    of mRNA from a given gene (transcription) and
    thus affect the abundance of its protein product

57
Transcription regulatory networks
58
Sea urchin embryonic development (endomesoderm up
to 30 hours) by Davidson's lab
59
  • How many transcriptional regulators are out
    there?

60
Fraction of transcriptional regulators in bacteria
61
Figure from Erik van Nimwegen, TIG 2003
62
Complexity of regulation grows with complexity of
organism
  • $N_R \langle K_{out} \rangle = N \langle K_{in} \rangle$ = number of edges
  • $N_R/N = \langle K_{in} \rangle / \langle K_{out} \rangle$ increases with N
  • $\langle K_{in} \rangle$ grows with N
  • In bacteria $N_R \sim N^2$ (Stover et al., 2000)
  • In eukaryotes $N_R \sim N^{1.3}$ (van Nimwegen, 2002)
  • Networks in more complex organisms are more
    interconnected than in simpler ones

63
Complexity is manifested in the $K_{in}$ distribution
E. coli vs H. sapiens
64
Table from Erik van Nimwegen, TIG 2003
65
Toolbox model
  • $N_{TF} = A N^2$ → $dN_{TF} = 2AN\,dN$ → $dN/dN_{TF} = 1/(2AN)$
  • In small genomes 100 genes per TF. In large ones
    only 4!
  • A toolbox (e.g. metabolic network) grows linearly
    with N. To handle a new condition ($N_{TF} \to N_{TF} + 1$) one
    needs fewer and fewer new tools.
  • S. Maslov, S. Krishna, K. Sneppen, in preparation

66
How is it all connected? (beyond degree
distribution)
67
What is unusual about topology of a given network?
  • Look for a number of occurrences of a certain
    topological pattern
  • Compare with a randomized network
  • What patterns to look for?
  • Number of edges connecting nodes with given
    degrees (degree-degree correlations)
  • Motifs: small subgraphs of 3-4 nodes (in
    undirected networks: clustering, i.e. triangles)
  • Overrepresentation: nature needs them for some
    function
  • Underrepresentation: they are detrimental and
    nature avoids them

68
  • How to construct a proper random network?

69
Randomization of a network
70
Stub reconnection algorithm
  • Break every edge into two halves (stubs)
  • Randomly reconnect stubs
  • Watch for multiple edges!
  • For example, in the AS-Internet the two largest hubs
    would end up being connected with 50 edges (sic!)
  • Not adaptable to conserve other low-level
    topological properties of the network
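
The stub reconnection steps above can be sketched as follows. The function names and the two-hub toy network are assumptions made for the example; the code deliberately does nothing to prevent self-loops or multiple edges, it only counts them, to show why the slide says to watch for them.

```python
import random
from collections import Counter

def stub_reconnect(edges, seed=0):
    """Cut every edge into two stubs and pair the stubs up at random."""
    rng = random.Random(seed)
    stubs = [v for edge in edges for v in edge]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

def count_defects(edges):
    """Return (number of self-loops, number of surplus parallel edges)."""
    loops = sum(1 for a, b in edges if a == b)
    multiplicity = Counter(frozenset(e) for e in edges if e[0] != e[1])
    extra = sum(c - 1 for c in multiplicity.values())
    return loops, extra

# Toy network: hubs 0 and 1, each linked to 30 leaves of its own.
edges = [(0, leaf) for leaf in range(2, 32)] + [(1, leaf) for leaf in range(32, 62)]
rewired = stub_reconnect(edges)
print("self-loops, surplus parallel edges:", count_defects(rewired))
```

Every node keeps its number of stubs, so the degree sequence is preserved exactly; the price is that large hubs tend to pick up self-loops and repeated edges between each other.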

71
Local rewiring algorithm
  • R. Kannan, P. Tetali, and S. Vempala, Random
    Structures and Algorithms (1999)
  • SM, K. Sneppen, Science (2002)
  • Randomly select and rewire two edges
  • Repeat many times
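
A minimal sketch of the local rewiring move for an undirected simple graph: pick two edges A-B and C-D, and replace them with A-D and C-B unless that would create a self-loop or a multiple edge. The function name, the attempt cap, and the ring used as input are assumptions for the example; for directed networks one would keep the orientation of each edge instead of flipping it.

```python
import random

def local_rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization by repeatedly swapping partners of two random edges."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c                    # choose one of the two possible re-pairings
        if a == d or c == b:
            continue                       # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                       # would create a multiple edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

ring = [(i, (i + 1) % 20) for i in range(20)]
randomized = local_rewire(ring, n_swaps=100)
print(randomized[:5])
```

Because rejected moves leave the network untouched, the result always has the same degree for every node and stays free of self-loops and multiple edges, which is what the stub reconnection algorithm cannot guarantee.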

72
Metropolis rewiring algorithm
energy E
energy $E + \Delta E$
SM, K. Sneppen, cond-mat preprint
(2002), Physica A (2004)
  • Randomly select two edges
  • Calculate the change $\Delta E$ in the energy function
    $E = (N_{actual} - N_{desired})^2 / N_{desired}$
  • Rewire with probability $p = \exp(-\Delta E/T)$
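
The Metropolis variant adds an accept/reject step to the same edge swap. The sketch below is an illustration under several assumptions that are not in the slides: the conserved property N is taken to be the number of triangles, the graph is undirected and simple, triangles are recounted from scratch after every move (fine for toy sizes only), and all function names are made up. Moves with ΔE ≤ 0 are always accepted, since exp(-ΔE/T) ≥ 1 there.

```python
import math
import random
from itertools import combinations

def count_triangles(adj):
    """Number of triangles in an undirected simple graph given as {node: set(neighbors)}."""
    closed = sum(1 for n in adj for a, b in combinations(adj[n], 2) if b in adj[a])
    return closed // 3  # every triangle is seen once from each of its 3 corners

def energy(n_actual, n_desired):
    return (n_actual - n_desired) ** 2 / n_desired

def swap_edges(adj, remove, add):
    """Remove the edges in `remove` from the adjacency sets, then add the edges in `add`."""
    for x, y in remove:
        adj[x].discard(y)
        adj[y].discard(x)
    for x, y in add:
        adj[x].add(y)
        adj[y].add(x)

def metropolis_rewire(edges, n_desired, temperature, n_steps, seed=0):
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    e_now = energy(count_triangles(adj), n_desired)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue  # skip moves that would create self-loops or multiple edges
        swap_edges(adj, remove=[(a, b), (c, d)], add=[(a, d), (c, b)])
        delta = energy(count_triangles(adj), n_desired) - e_now
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            edges[i], edges[j] = (a, d), (c, b)   # accept the move
            e_now += delta
        else:
            swap_edges(adj, remove=[(a, d), (c, b)], add=[(a, b), (c, d)])  # undo
    return edges, e_now

ring = [(i, (i + 1) % 12) for i in range(12)]   # starts with 0 triangles
rewired, final_energy = metropolis_rewire(ring, n_desired=4, temperature=0.5, n_steps=2000)
print("final energy:", final_energy)
```

Lowering `temperature` makes the sampler hold the target count more tightly; raising it approaches the unconstrained local rewiring algorithm.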

73
  • Degree-degree correlations

74
Central vs peripheral network architecture
random
A. Trusina, P. Minnhagen, SM, K. Sneppen, Phys.
Rev. Lett. 92, 17870, (2004)
75
What is the case for the protein interaction network?
SM, K. Sneppen, Science 296, 910 (2002)
76
Correlation profile
  • Count $N(k_0,k_1)$, the number of links between
    nodes with connectivities $k_0$ and $k_1$
  • Compare it to $N_r(k_0,k_1)$, the same property in a
    random network
  • Qualitative features are very noise-tolerant with
    respect to both false positives and false
    negatives
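
Counting links by the degrees of their two ends, and comparing with randomized versions, can be sketched as below. The function names are hypothetical, exact degrees are used as bins (an assumption; broad distributions usually call for coarser bins), and the "randomized" edge lists are simply passed in, so any degree-preserving randomizer can supply them. The comparison is reported as the ratio of the real count to the random mean and as the deviation in units of the random standard deviation.

```python
from collections import Counter
from statistics import mean, pstdev

def degree_pair_counts(edges):
    """N(k0, k1): number of links joining a node of degree k0 to one of degree k1 (k0 <= k1)."""
    degree = Counter(v for edge in edges for v in edge)
    return Counter(tuple(sorted((degree[a], degree[b]))) for a, b in edges)

def correlation_profile(real_edges, randomized_edge_lists):
    """Return {(k0, k1): (ratio, z)} comparing the real network with its randomized versions."""
    real = degree_pair_counts(real_edges)
    samples = [degree_pair_counts(e) for e in randomized_edge_lists]
    profile = {}
    for pair, n_real in real.items():
        values = [s[pair] for s in samples]      # a Counter returns 0 for absent pairs
        n_rand, spread = mean(values), pstdev(values)
        ratio = n_real / n_rand if n_rand else float("inf")
        z = (n_real - n_rand) / spread if spread else float("nan")
        profile[pair] = (ratio, z)
    return profile

star = [(0, 1), (0, 2), (0, 3)]
print(degree_pair_counts(star))   # all 3 links join a degree-1 node to the degree-3 hub
```

In practice `randomized_edge_lists` would hold many independently rewired copies of the real network, and the interesting entries are those where the ratio is far from 1 and the deviation is large.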

78
Correlation profile of the protein interaction
network
$R(k_0,k_1) = N(k_0,k_1)/N_r(k_0,k_1)$
$Z(k_0,k_1) = (N(k_0,k_1) - N_r(k_0,k_1))/\Delta N_r(k_0,k_1)$
Similar profile is seen in the yeast regulatory
network
79
Hubs may act within a module, or connect modules
  • Party hub
  • simultaneous interactions
  • tends to be within the same module
  • Date hub
  • sequential interactions
  • connect different modules

Han et al., Nature 430, 88 (2004)
81
Correlation profile of the yeast regulatory
network
$R(k_{out},k_{in}) = N(k_{out},k_{in})/N_r(k_{out},k_{in})$
$Z(k_{out},k_{in}) = (N(k_{out},k_{in}) - N_r(k_{out},k_{in}))/\Delta N_r(k_{out},k_{in})$
82
Some scale-free networks may appear similar
In both networks the degree distribution is
scale-free: $P(k) \sim k^{-\gamma}$ with $\gamma = 2.2$-$2.5$
83
But correlation profiles give them unique
identities
Internet
Protein interactions
84
  • Small network motifs (Uri Alon and his group)

85
All 3 node motifs
86
Motifs can overlap in the network
motif to be found
graph
motif matches in the target graph
http://mavisto.ipk-gatersleben.de/frequency_concepts.html
87
Detection of important network motifs
  • Technique
  • construct many random graphs with the same number
    of nodes and degree distribution
  • count the number of motifs in those graphs
  • calculate the Z score, which measures how unlikely it is
    that the same or larger number of motifs as in the
    real-world network could have occurred in a random one
  • Software available
  • http://www.weizmann.ac.il/mcb/UriAlon/

88
What the Z score means
$z_x = (x - \mu_x)/\sigma_x$
x: number of times the motif appeared in the real network
$\mu_x$: mean number of times the motif appeared in
the random graphs
$\sigma_x$: standard deviation of the number of times the
motif appeared in the random graphs
The probability of observing a Z score of 2 or more is
0.02275. In the context of motifs:
Z > 0: the motif occurs more often than in random graphs
Z < 0: the motif occurs less often than in random graphs
Z > 1.65: only a 5% chance of random occurrence
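
The Z score and the tail probabilities quoted on this slide can be reproduced with the standard library. The function names and the list of random-graph counts are made up for the example, and reading Z as a probability assumes the counts in random graphs are approximately normally distributed.

```python
import math
from statistics import mean, stdev

def z_score(real_count, random_counts):
    """Z = (x - mean) / standard deviation, both taken over the random-graph counts."""
    return (real_count - mean(random_counts)) / stdev(random_counts)

def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

random_counts = [8, 10, 12, 9, 11]   # hypothetical motif counts in 5 random graphs
z = z_score(14, random_counts)
print(f"Z = {z:.2f}, P(Z >= z) = {upper_tail_probability(z):.4f}")
print(f"P(Z >= 2) = {upper_tail_probability(2.0):.5f}")
```

With only a handful of random graphs the estimated standard deviation is itself noisy, so real analyses use many more randomized networks than the five shown here.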
89
Examples of network motifs (3 nodes)
  • Feed forward loop
  • Found in many transcriptional regulatory
    networks
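
Counting feed-forward loops in a directed edge list takes only a few lines. This is a minimal sketch with hypothetical names: it counts ordered triples (X, Y, Z) with edges X→Y, Y→Z and X→Z, and ignores self-regulation.

```python
def count_feed_forward_loops(edges):
    """Count triples (X, Y, Z) with directed edges X->Y, Y->Z and X->Z."""
    targets = {}
    for source, target in edges:
        if source != target:                       # ignore self-regulation
            targets.setdefault(source, set()).add(target)
    count = 0
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                if z != x and z in x_targets:      # Y->Z is closed by the direct edge X->Z
                    count += 1
    return count

feed_forward = [("X", "Y"), ("Y", "Z"), ("X", "Z")]
cycle = [("X", "Y"), ("Y", "Z"), ("Z", "X")]
print(count_feed_forward_loops(feed_forward), count_feed_forward_loops(cycle))
```

To decide whether the motif is over-represented, the same count is repeated on randomized versions of the network and the results are compared through a Z score.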

90
Possible functional role of a coherent
feed-forward loop
  • Noise filtering: short pulses in the input do not
    turn on the output Z
  • To function it needs a time delay (about 0.5 hrs for
    bacterial transcription)
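
The noise-filtering idea can be illustrated with a toy discrete-time model. Everything in it is an assumption made for the illustration, not the model from the lecture: the output Z uses AND logic (it needs both X and Y), Y builds up by one unit per time step while X is on, Y resets instantly when X turns off, and `delay_steps` stands in for the transcription delay.

```python
def simulate_ffl(x_signal, delay_steps):
    """Toy coherent feed-forward loop: Z is on only if X is on and Y has had time to build up."""
    y_level = 0
    z_output = []
    for x_on in x_signal:
        y_level = y_level + 1 if x_on else 0      # Y accumulates while X is on, resets otherwise
        z_output.append(bool(x_on) and y_level > delay_steps)
    return z_output

short_pulse = [0, 1, 1, 0, 0, 0]
long_pulse = [1, 1, 1, 1, 1, 1]
print(simulate_ffl(short_pulse, delay_steps=3))   # Z never turns on
print(simulate_ffl(long_pulse, delay_steps=3))    # Z turns on after the delay
```

A pulse shorter than the delay never switches Z on, while a sustained input does, after a lag equal to the delay.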

91
All 4 node subgraphs (computational expense
increases with the size of the graph!)
92
Higher-order motifs
  • 4-node motifs contain some 3-node motifs
  • One needs to be careful when calculating
    over-representation
  • Alon and co-authors use our Metropolis algorithm to
    generate networks with a given number of
    low-level motifs

93
Table 1 from R. Milo, S. Shen-Orr, S. Itzkovitz, N.
Kashtan, D. Chklovskii, and U. Alon, Network Motifs:
Simple Building Blocks of Complex Networks,
Science 298, 824-827 (2002)
94
Examples of network motifs (4 nodes)
  • Parallel paths are over-represented
  • Neural networks
  • Food webs

95
Finding classes on graphs based on their motif
profiles
96
THE END