Title: How Scale-Free are Biological Networks?
1How Scale-Free are Biological Networks?
- Raya Khanin and Ernst Wit
- Department of Statistics
- University of Glasgow
2Biological Networks
- Protein interaction network: proteins that are
(or might be) connected by physical interactions
- Metabolic network: metabolic products and
substrates that participate in one reaction
- Gene regulatory network: two genes are connected
if the expression of one gene modulates the
expression of the other by either activation or
inhibition
3Biological networks
- A node represents a gene, protein or metabolite
- An edge represents an association, interaction
or co-expression
- A directed edge stands for the modulation
(regulation) of one node by another, e.g. an arrow
from gene X to gene Y means that gene X affects
the expression of gene Y
4Network measures
- Degree (or connectivity) of a node, k, is the number
of links (edges) this node has.
- The degree distribution, P(k), is the probability that
a selected node has exactly k links.
- Networks are classified by their degree distributions.
- (Barabasi and Oltvai, Nature, 2004)
5Random Network
- A fixed number of nodes are connected randomly to
each other: start with N nodes and connect each
pair with probability p, creating a graph with
approximately pN(N-1)/2 randomly placed links.
- The degrees follow a Poisson distribution: most
nodes have roughly the same number of links,
approximately equal to the network's average
degree; nodes that have significantly more
or fewer links than the average are very rare.
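The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the values N = 1000 and p = 0.005, and the fixed seed are all assumptions chosen for the example. It builds the random graph pair by pair and prints the empirical degree frequencies next to the Poisson probabilities with mean p(N-1).

```python
import math
import random
from collections import Counter

def random_graph_degrees(n, p, seed=0):
    """Connect each of the n(n-1)/2 node pairs independently with probability p
    and return the degree of every node."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

def poisson_pmf(k, mean):
    """Poisson probability of observing exactly k links."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Illustrative (assumed) parameters: 1000 nodes, average degree close to 5.
n, p = 1000, 0.005
degrees = random_graph_degrees(n, p)
mean_degree = sum(degrees) / n
counts = Counter(degrees)

# Empirical P(k) next to the Poisson approximation with mean p(N-1).
for k in range(13):
    print(k, counts.get(k, 0) / n, round(poisson_pmf(k, p * (n - 1)), 4))
```

In such a run the degrees cluster tightly around the average, which is the "typical node" that a scale-free network lacks.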
6Scale-Free Network
- Scale-free networks have a few nodes with a very
large number of links (hubs) and many nodes
with only a few links.
- This indicates the absence of a typical node in the
network
- Scale-free networks are characterized by a
power-law distribution
7Comparing Random and Scale-free distribution
- In the random network, the five nodes with the
most links (in red) are connected to only 27% of
all nodes (green). In the scale-free network, the
five most connected nodes (red) are connected to
60% of all nodes (green) (source: Nature)
8Examples of scale-free networks
- Network of citations between scientific papers
- Network of collaborations, movie actors
- Electrical power grids, airline traffic routes,
railway networks
- World Wide Web and other communication systems
- Biological Networks
- "These laws, applying equally well to the cell
and the ecosystem, demonstrate how unavoidable
nature's laws are and how deeply
self-organization shapes the world around
us." (A. Barabasi, 2002).
9Google scale-free networks
- "Scale-free networks are everywhere. They can be
seen in terrorist networks. What does this mean for
counter-terrorism experts?"
- Global guerrillas (network organization,
infrastructure disruption, and the emerging
marketplace of violence):
http://globalguerrillas.typepad.com/globalguerrillas/2004/05/scalefree_terro.html
- Scalefree: http://www.scalefree.info (Social
Networking, Communities of Practice and Knowledge
Management; publishes articles such as "love and
knowledge", "what lovers tell us about
persuasion", "how social contagion affects
consumer behaviour")
10Scale-free Networks
- Properties of scale-free systems are
invariant to changes in scale. The ratio of two
connectivities in a scale-free network is
invariant under rescaling:
$P(a k_1) / P(a k_2) = P(k_1) / P(k_2)$
- This implies that scale-free networks are
self-similar, i.e. any part of the network is
statistically similar to the whole network and
parameters are assumed to be independent of the
system size.
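The invariance under rescaling can be checked numerically. The sketch below is illustrative only: the exponent 2.0, the decay rate 0.5, the connectivities 2 and 8, and the helper names are assumptions. It compares an unnormalized power law with an exponential law, for which the ratio does depend on the scale factor.

```python
import math

def power_law_weight(k, gamma):
    """Unnormalized power-law weight k^-gamma."""
    return k ** (-gamma)

def exponential_weight(k, rate):
    """Unnormalized exponential weight exp(-rate * k), used as a contrast."""
    return math.exp(-rate * k)

def ratio_after_rescaling(weight, k1, k2, a, param):
    """Ratio of the weights at two connectivities after rescaling both by a."""
    return weight(a * k1, param) / weight(a * k2, param)

gamma, rate = 2.0, 0.5  # assumed illustrative parameters
scales = (1, 2, 10)
power_law_ratios = [ratio_after_rescaling(power_law_weight, 2, 8, a, gamma) for a in scales]
exponential_ratios = [ratio_after_rescaling(exponential_weight, 2, 8, a, rate) for a in scales]

for a, pl, ex in zip(scales, power_law_ratios, exponential_ratios):
    print(a, pl, ex)
```

The power-law ratio equals (k1/k2)^-gamma whatever the scale factor, while the exponential ratio changes with it, so the exponential law carries a characteristic scale.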
11Scale-free Networks
- Scale-free (self-similarity) properties of a
common cauliflower plant: it is virtually
impossible to determine whether one is looking at
a photograph of a complete vegetable or its part,
unless an additional scale-dependent object (a
match) is added. (A) Complete vegetable; (B) a
small segment of the same vegetable; (C) a small
part of the segment shown in B. The same match
was used in all three photographs to provide a
sense of scale for an otherwise scale-free
structure (Gomez et al, 2001).
- Note: vegetable was purchased in Sloan
supermarket in Manhattan's Upper West Side.
12Biological networks are reported to be scale-free
- Metabolic networks
- Protein interaction networks
- Protein domain networks
- Gene co-expression networks
- Distribution of gene expression and spot
intensities on microarrays
- Frequency of occurrence of generalized parts in
genomes of different organisms
13Example of gene network
- To find an indication of a power law,
the data are usually fitted graphically
by a straight line on a log-log scale
- Taking logs contracts the data
- The fit to a few points is not particularly
good
- Figure: number of regulated genes
per regulating protein (Guelzim et al,
Nature, 2002)
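For reference, the graphical procedure amounts to an ordinary least-squares line through the points (log k, log count). The sketch below assumes made-up counts that follow an exact power law, so the fit recovers the exponent perfectly; real connectivity data are noisy and have only a handful of points in the tail, which is why the slide treats the method with caution. The function name and the data are hypothetical.

```python
import math

def loglog_line_fit(ks, counts):
    """Least-squares fit of log(count) against log(k); returns (slope, intercept)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: counts that follow 1000 * k^-2 exactly.
ks = [1, 2, 4, 8, 16]
counts = [1000 * k ** -2.0 for k in ks]

slope, intercept = loglog_line_fit(ks, counts)
gamma_graphical = -slope  # the exponent is read off as minus the slope
print(gamma_graphical, math.exp(intercept))
```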
14Estimating the power exponent by
maximum-likelihood
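- The power-law distribution:
$P(k) = k^{-\gamma} / \zeta(\gamma)$, $k = 1, 2, \ldots$
- Observed value $x_i$ is the connectivity of node $i$
- The likelihood function for $N$ observed
connectivities:
$L(\gamma \mid x) = \prod_{i=1}^{N} x_i^{-\gamma} / \zeta(\gamma)$
- The log-likelihood
$\ell(\gamma \mid x) = -\gamma \sum_{i=1}^{N} \log x_i - N \log \zeta(\gamma)$
is maximized by finding the zero
of its derivative using the Newton-Raphson
method.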
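A minimal sketch of the Newton-Raphson step follows, assuming the discrete power law is normalized by the zeta function, so that the log-likelihood is -gamma * sum(log x_i) - N * log(zeta(gamma)). Since the Python standard library has no zeta function, the sums are truncated at an assumed cut-off K_MAX; the connectivity data, the starting value and the function names are hypothetical as well.

```python
import math

K_MAX = 10_000  # assumption: truncation point of the sums approximating zeta

def zeta_sums(gamma, k_max=K_MAX):
    """Truncated sums approximating zeta(gamma) and its first two derivatives."""
    z = dz = d2z = 0.0
    for k in range(1, k_max + 1):
        w = k ** (-gamma)
        log_k = math.log(k)
        z += w
        dz -= log_k * w
        d2z += log_k * log_k * w
    return z, dz, d2z

def score(gamma, xs):
    """Derivative of the log-likelihood with respect to gamma."""
    z, dz, _ = zeta_sums(gamma)
    return -sum(math.log(x) for x in xs) - len(xs) * dz / z

def fit_gamma(xs, gamma0=2.0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration for the zero of the score function."""
    n = len(xs)
    sum_log = sum(math.log(x) for x in xs)
    gamma = gamma0
    for _ in range(max_iter):
        z, dz, d2z = zeta_sums(gamma)
        first = -sum_log - n * dz / z
        second = -n * (d2z * z - dz * dz) / (z * z)
        step = first / second
        gamma -= step
        if abs(step) < tol:
            break
    return gamma

# Hypothetical connectivities of 90 nodes (not taken from any dataset).
xs = [1] * 60 + [2] * 15 + [3] * 7 + [4] * 4 + [5] * 2 + [6] * 2
gamma_hat = fit_gamma(xs)
print(gamma_hat, score(gamma_hat, xs))
```

The second derivative is minus N times the variance of log k under the fitted law, so the log-likelihood is concave and the zero of the score is its unique maximum.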
15Goodness-of-fit testing
- $O(k)$: observed number of nodes with connectivity $k$;
$E(k)$: expected number with estimated $\hat{\gamma}$
- Consider a chi-squared statistic,
$T = \sum_k (O(k) - E(k))^2 / E(k)$
- $T \sim \chi^2_{k-2}$ (approximately) under $H_0$: network is scale-free
- Pool connectivity values k for which
the expected count E(k) is less than
5. As a result, the chi-squared statistic is
approximately chi-squared distributed with k-2
degrees of freedom (k here being the number of
connectivity classes after pooling), if the data
truly come from a power-law distribution.
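The pooling rule and the statistic can be sketched as follows. The merging strategy (adjacent classes, left to right, with any leftover tail folded into the last class) is one reasonable reading of the slide rather than the authors' stated procedure, and the observed and expected counts are hypothetical.

```python
def pooled_chi_squared(observed, expected, min_expected=5.0):
    """Merge adjacent connectivity classes until each pooled class has an
    expected count of at least min_expected, then compute the statistic.
    Returns (statistic, degrees_of_freedom, pooled_classes)."""
    pooled = []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            pooled.append((o_acc, e_acc))
            o_acc = e_acc = 0.0
    if o_acc > 0 or e_acc > 0:
        # Leftover tail with too small an expected count: fold it into the last class.
        if pooled:
            last_o, last_e = pooled.pop()
            pooled.append((last_o + o_acc, last_e + e_acc))
        else:
            pooled.append((o_acc, e_acc))
    statistic = sum((o - e) ** 2 / e for o, e in pooled)
    # Classes minus one, minus one more for the estimated exponent.
    dof = len(pooled) - 2
    return statistic, dof, pooled

# Hypothetical counts for classes k = 1, 2, 3, 4, 5 and "6 or more".
observed = [60, 15, 7, 4, 2, 2]
expected = [54.7, 13.7, 6.1, 3.4, 2.2, 9.9]

statistic, dof, pooled = pooled_chi_squared(observed, expected)
print(statistic, dof, pooled)
```

With these numbers the classes k = 4 and k = 5 are merged, leaving five classes and three degrees of freedom.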
16Goodness-of-fit testing
- The p-value for each of the networks can be
calculated as the exceedance probability of a
chi-squared distribution: $p = P(\chi^2_{k-2} > t)$
- $t$ is the calculated (observed) value of $T$ with
estimated $\hat{\gamma}$
- The p-value is the probability of observing
connectivities at least this far from the expected
ones if they were drawn from the
power-law distribution.
- If $p$ is below the chosen significance level, $H_0$ is rejected.
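The exceedance probability can be computed without external libraries for integer degrees of freedom, using the closed forms for one and two degrees of freedom and a standard recurrence that raises the degrees of freedom by two. The sketch below is an illustration under assumptions: the observed statistic, the degrees of freedom and the 0.05 significance level are example values, not figures from the talk.

```python
import math

def chi2_sf(x, dof):
    """P(X > x) for X chi-squared distributed with integer dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        nu, sf = 2, math.exp(-half)               # closed form for 2 degrees of freedom
    else:
        nu, sf = 1, math.erfc(math.sqrt(half))    # closed form for 1 degree of freedom
    while nu < dof:
        # Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
        sf += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return sf

# Hypothetical observed statistic and degrees of freedom.
t, dof = 7.10, 3
p_value = chi2_sf(t, dof)
alpha = 0.05  # assumed significance level
print(p_value, "reject H0" if p_value < alpha else "do not reject H0")
```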
17Exponents of the power-law and scale-free
p-values
Dataset            γ             p-value
Uetz [25]          2.05          0.00004
Schwikowski [26]   1.865         0
Ito [27]           2.02          0
Li [28]            2.3           0
Rain [29]          2.1           0
Giot [21]          1.53          0
Tong [9]           1.44          0
Lee [20]           1.99          0
Guelzim [17]       1.49          0
Spellman/Cho       1.27 (1.06)   0
18Truncated power-law
- $P(k) \propto k^{-\gamma} e^{-k/k_c}$
- $k_c$ is the cut-off, such that the number of
connections is less than expected for pure
scale-free networks for $k > k_c$,
- and the behaviour is approximately scale-free
within the range $k < k_c$
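The effect of the cut-off can be illustrated numerically. The sketch below assumes the common form of the truncated power law, with P(k) proportional to k^-gamma * exp(-k / k_c), normalized over a finite support 1..k_max. The value of k_max and the function names are assumptions; gamma = 1.6 and k_c = 8.67 are simply borrowed as example values.

```python
import math

def truncated_power_law_pmf(gamma, k_c, k_max):
    """Normalized P(k) proportional to k^-gamma * exp(-k / k_c) on k = 1..k_max."""
    weights = [k ** (-gamma) * math.exp(-k / k_c) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def damping(k, k_c):
    """Factor by which the cut-off reduces the pure power-law weight at k."""
    return math.exp(-k / k_c)

gamma, k_c, k_max = 1.6, 8.67, 500  # k_max is an assumed truncation of the support
pmf = truncated_power_law_pmf(gamma, k_c, k_max)

# Below the cut-off the damping factor stays close to 1 (near power-law
# behaviour); well above it the probabilities fall off much faster.
for k in (1, 2, 5, 10, 20, 50):
    print(k, pmf[k - 1], damping(k, k_c))
```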
19Parameters of the truncated power-law and
truncated power-law p-values
Dataset            γ             k_c       p-value
Uetz [25]          1.6           8.67      0.37
Schwikowski [26]   1.26          6.2       0.105
Ito [27]           1.79          26        0
Li [28]            2.1           19.5      0.0178
Rain               1.12          11.5      0.2
Giot [21]          1.09          20        0.0013
Tong [9]           0.96          23.7      0
Lee [20]           1.96          294       0
Guelzim [17]       1.18          15        0.00001
Spellman/Cho       1.07 (0.78)   73 (99)   0.7 (0.1)
20Scale-free or not: why is it important?
- Network architecture
- Evolutionary models
- If the scalability (self-similarity) criterion does not
hold, one has to exercise caution when applying
the features of an already studied part of the
network to investigate properties of the unknown
part of the same network, or of other networks
that seem similar
21Example of gene network
- Guelzim and co-authors (Nature, 2002)
conclude that connectivity distributions
of gene transcriptional networks for yeast
and E. coli are scale-free, and thus
bacterial and fungal genetic networks are
free of characteristic scale with respect to
the distributions of both regulating and
departing connections.
- Figure: number of regulated genes per
regulating protein
- We found, however, that the scale-free property
does not even hold, and therefore such biological
conclusions are invalid
22Qualitative properties of biological networks
- Existence of hubs
- Many nodes with a few connections
- Lethality and centrality (the more essential a
gene/protein is, the more connections it has)
- Small-world property (short average path)
- High clustering of nodes
23Alternative non-scale free distributions
- generalized truncated power-law
- stretched exponential distribution
- geometric random graph: a geometric graph with n
independently and uniformly distributed points in
a metric space
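As an illustration of the last alternative, a geometric random graph can be sketched by scattering n points and linking every pair that lies close enough. The slide does not specify the space or the linking rule, so the unit square, the Euclidean distance, the connection radius and the parameter values below are all assumptions.

```python
import math
import random

def geometric_random_graph_degrees(n, radius, seed=0):
    """Place n points independently and uniformly in the unit square, link every
    pair closer than radius, and return the degree of each point."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                degree[i] += 1
                degree[j] += 1
    return degree

# Assumed illustrative parameters.
n, radius = 500, 0.05
geo_degrees = geometric_random_graph_degrees(n, radius)
geo_mean_degree = sum(geo_degrees) / n

# Away from the border the expected degree is about n * pi * radius^2.
print(geo_mean_degree, n * math.pi * radius ** 2)
```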