1 Scale-free networks: mathematical properties

Slides: 28
Provided by: volkhar

1
1 Scale-free networks: mathematical properties
Random graphs: a classical field in graph theory,
well studied analytically and numerically.
Scale-free networks: quite new. Their properties have
so far mostly been studied numerically and
heuristically.
Nice review (suggested reading for today): "Graph Theory
Approaches to Protein Interaction Data Analysis",
Nataša Pržulj.

2
Erdős-Rényi model
n nodes (vertices) joined by edges that have been
chosen and placed between pairs of nodes
uniformly at random. $G_{n,p}$: each possible edge
in the graph on n nodes is present with
probability p and absent with probability 1 − p.
Average number of edges in $G_{n,p}$: $p\,n(n-1)/2$.
Each edge connects two vertices → average degree of a
vertex: $(n-1)\,p \approx np$.
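
The definition above can be tried out directly. The following is a minimal sketch, assuming nodes are labelled 0..n-1; the function names are illustrative and not from the lecture. It samples one graph from $G_{n,p}$ and compares the edge count and average degree with the expected values n(n-1)p/2 and (n-1)p.

```python
import random
from itertools import combinations

def gnp_random_graph(n, p, seed=None):
    """Return the edge list of one G(n, p) sample on nodes 0..n-1."""
    rng = random.Random(seed)
    # Each of the n(n-1)/2 possible edges is kept independently with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def average_degree(n, edges):
    """Each edge contributes to the degree of two vertices."""
    return 2 * len(edges) / n

if __name__ == "__main__":
    n, p = 400, 0.05
    edges = gnp_random_graph(n, p, seed=1)
    print("edges:", len(edges), "expected:", n * (n - 1) / 2 * p)
    print("average degree:", average_degree(n, edges), "expected:", (n - 1) * p)
```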

3
Erdős-Rényi model: components
Erdős and Rényi studied how the expected topology
of a random graph with n nodes changes as a
function of the number of edges m. When m is
small, the graph is likely fragmented into many
small connected components with vertex sets of
size at most O(log n). As m increases, the
components grow, at first by linking to isolated
nodes and later by fusing with other
components. A transition happens at m = n/2,
when many clusters cross-link spontaneously to
form a unique largest component called the giant
component. Its vertex set is much larger
than the vertex set of any other
component: it contains O(n) nodes, while the
second largest component contains O(log n)
nodes. In statistical physics, this phenomenon
is called percolation.
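
The transition can be observed numerically. The sketch below is an illustration under stated assumptions: it places m distinct edges uniformly at random and tracks components with a union-find structure, which is an implementation choice and not part of the lecture. It returns the component sizes for a given n and m.

```python
import random
from collections import Counter

def component_sizes(n, m, seed=None):
    """Sizes (descending) of the connected components of a random graph
    with n nodes and m distinct edges chosen uniformly at random."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edge = (min(u, v), max(u, v))
        if edge not in edges:
            edges.add(edge)
            parent[find(u)] = find(v)  # merge the two components
    sizes = Counter(find(x) for x in range(n))
    return sorted(sizes.values(), reverse=True)

if __name__ == "__main__":
    n = 2000
    for m in (n // 4, n // 2, n, 2 * n):
        sizes = component_sizes(n, m, seed=42)
        second = sizes[1] if len(sizes) > 1 else 0
        print(f"m = {m:5d}: largest = {sizes[0]:5d}, second largest = {second}")
```

Below m = n/2 the largest component should stay small; well above it, one component should hold most of the nodes.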

4
Erdős-Rényi model: shortest path length
  • The shortest path length between any pair of
    nodes in the giant component grows like log n.
  • Therefore, these graphs are called small
    worlds.
  • The properties of random graphs have been studied
    very extensively.
  • Literature: B. Bollobás, Random Graphs, Academic
    Press, London, 1985.
  • However, random graphs are not adequate models for
    real-world networks because
  • (1) real networks appear to have a power-law degree
    distribution (while random graphs have a Poisson
    degree distribution), and
  • (2) real networks have strong clustering, while
    the clustering coefficient of a random graph is
    C = p, independent of whether two vertices have a
    common neighbor.
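
The statement about the clustering coefficient of a random graph can be checked on a sample. This is a minimal sketch, assuming the average of the local clustering coefficients as the definition of C and counting nodes with fewer than two neighbours as 0; both are common conventions, not something the slides specify.

```python
import random
from itertools import combinations

def gnp_adjacency(n, p, seed=None):
    """Adjacency sets of one G(n, p) sample."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible neighbour pairs)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # contributes 0 by the convention assumed here
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

if __name__ == "__main__":
    n, p = 300, 0.1
    print("C =", round(average_clustering(gnp_adjacency(n, p, seed=3)), 3), "p =", p)
```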

5
Generalized Random Graphs
  • Aim: allow a power-law degree distribution in a
    graph while leaving all other aspects as in the
    random graph model.
  • Given a degree sequence (e.g. drawn from a power-law
    distribution), one can generate a random graph by
    assigning to each vertex i a degree $k_i$ from the
    given degree sequence. Then choose pairs of
    vertices uniformly at random to make edges so
    that the assigned degrees are preserved.
  • When all degrees have been used up to make edges,
    the resulting graph is a random member of the set
    of graphs with the desired degree distribution.
  • Problem: the method does not allow one to specify
    the clustering coefficient.
  • On the other hand, this property makes it
    possible to determine many properties of
    these graphs exactly in the limit of large n.
  • E.g. almost all random graphs with a fixed degree
    distribution and no nodes of degree smaller than
    2 have a unique giant component.
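
One common way to realize this construction is "stub matching": every vertex i gets $k_i$ half-edges, and the half-edges are paired at random. The sketch below assumes this variant; the function names, the degree cutoff k_max, and the parity fix are illustrative choices. Note that stub matching can produce self-loops and repeated edges, which a stricter implementation would reject or rewire.

```python
import random

def power_law_degrees(n, gamma, k_min=1, k_max=50, seed=None):
    """Draw n degrees with P(k) proportional to k**(-gamma) on [k_min, k_max]."""
    rng = random.Random(seed)
    ks = list(range(k_min, k_max + 1))
    weights = [k ** -gamma for k in ks]
    degrees = rng.choices(ks, weights=weights, k=n)
    if sum(degrees) % 2:
        degrees[0] += 1  # the degree sum must be even to pair all stubs
    return degrees

def configuration_model(degrees, seed=None):
    """Pair up edge stubs uniformly at random; returns a list of edges.
    Self-loops and repeated edges can occur."""
    if sum(degrees) % 2:
        raise ValueError("sum of degrees must be even")
    rng = random.Random(seed)
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

if __name__ == "__main__":
    degrees = power_law_degrees(1000, gamma=2.5, seed=0)
    edges = configuration_model(degrees, seed=0)
    print("nodes:", len(degrees), "edges:", len(edges), "max degree:", max(degrees))
```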

6
Barabási scale-free model
Input: (n0, m, t), where n0 is the initial number
of vertices, m (m ≤ n0) is the number of edges
added every time one new vertex is added to the
graph, and t is the number of iterations.
Algorithm:
a) Start with n0 isolated nodes.
b) Every time we add one new node v, m edges are
linked from v to the existing nodes with the
preferential attachment probability
$\Pi(k_i) = k_i / \sum_j k_j$, where $k_i$
is the number of links at the i-th node.
Eventually, the graph has (n0 + t) nodes
and (mt) edges.
Problem of pure mathematicians
with this algorithm: how to start from n0 = 0?
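
The growth procedure can be sketched in a few lines. This is an illustration, not the authors' reference implementation. Two choices are assumptions made here: in the very first step all degrees are zero, so the attachment probability is undefined and the sketch falls back to a uniform choice; and the m targets of each new node are required to be distinct.

```python
import random

def barabasi_albert(n0, m, t, seed=None):
    """Grow a graph for t steps; each new node attaches m edges to distinct
    existing nodes chosen with probability proportional to their degree."""
    if not 1 <= m <= n0:
        raise ValueError("need 1 <= m <= n0")
    rng = random.Random(seed)
    stubs = []  # node i appears k_i times, so a uniform draw is degree-proportional
    edges = []
    for step in range(t):
        new = n0 + step
        targets = set()
        while len(targets) < m:
            if stubs:
                targets.add(rng.choice(stubs))
            else:
                # First step only: all degrees are zero, choose uniformly (assumption).
                targets.add(rng.randrange(n0))
        for v in targets:
            edges.append((new, v))
            stubs.extend((new, v))
    return edges

if __name__ == "__main__":
    edges = barabasi_albert(n0=3, m=2, t=2000, seed=7)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    print("edges:", len(edges), "max degree:", max(degree.values()))
```

With pure degree-proportional attachment, initial nodes that receive no edge in the first step stay isolated, which is one practical face of the start-up problem.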

7
Properties of Barabási-Albert scale-free model
$P(k) \sim k^{-\gamma}$ with $\gamma = 3$. Real networks
often show $\gamma \approx 2.1$ to $2.4$.
Observation: if either growth or
preferential attachment is eliminated, the
resulting network does not exhibit scale-free
properties.
The average path length in the
BA-model is proportional to ln n / ln ln n, which is
shorter than in random graphs → scale-free
networks are ultrasmall worlds.
Observation: non-trivial correlations (clustering)
between the degrees of connected nodes. Numerical
result for the BA-model: $C \sim n^{-0.75}$. No
analytical predictions of C so far.
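
To get a feeling for how much shorter ln n / ln ln n is than ln n, the two scalings can be tabulated. This sketch only evaluates the functional forms; the proportionality constants are unknown here and omitted, so the numbers are not actual path lengths.

```python
import math

def path_length_scalings(n):
    """Return (ln n, ln n / ln ln n); prefactors are omitted."""
    ln_n = math.log(n)
    return ln_n, ln_n / math.log(ln_n)

if __name__ == "__main__":
    for n in (10**3, 10**4, 10**6):
        random_graph, ba_model = path_length_scalings(n)
        print(f"n = {n:>8}: ln n = {random_graph:5.2f}, ln n / ln ln n = {ba_model:5.2f}")
```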

8
Properties of scale-free models
Scale-free networks are resistant to random
failures (robustness) because a few high-degree
hubs dominate their topology: a randomly chosen node
that fails probably has a small degree, and its
failure thus does not severely affect the rest of the
network. However, scale-free networks are quite
vulnerable to attacks on the hubs. See the example
from the last lecture about the lethality of gene
deletions in yeast. These properties have been
confirmed numerically and analytically by
studying the average path length and the size of
the giant component.
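
A small numerical experiment illustrates the contrast between random failures and targeted attacks. The sketch below rests on several assumptions that are not from the lecture: the network is grown by preferential attachment from a small clique, hubs are ranked once by their initial degree, and the size of the largest remaining component is used as the damage measure.

```python
import random

def preferential_attachment_graph(n, m, rng):
    """Adjacency sets of a BA-style graph seeded with a clique of m + 1 nodes."""
    adj = {i: set() for i in range(n)}
    stubs = []  # node i appears once per incident edge
    for u in range(m + 1):
        for v in range(u):
            adj[u].add(v)
            adj[v].add(u)
            stubs.extend((u, v))
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for v in targets:
            adj[new].add(v)
            adj[v].add(new)
            stubs.extend((new, v))
    return adj

def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def surviving_fraction(adj, fraction, targeted, rng):
    """Remove a fraction of the nodes (hubs first if targeted, else at random)
    and return the largest component size divided by the original n."""
    n = len(adj)
    count = int(fraction * n)
    if targeted:
        removed = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:count]
    else:
        removed = rng.sample(sorted(adj), count)
    return largest_component_size(adj, removed) / n

if __name__ == "__main__":
    rng = random.Random(0)
    adj = preferential_attachment_graph(2000, 2, rng)
    print("random failure:", surviving_fraction(adj, 0.05, False, rng))
    print("hub attack:    ", surviving_fraction(adj, 0.05, True, rng))
```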

9
Properties of Barabási-Albert scale-free model
  • The BA-model is a minimal model that captures the
    mechanisms responsible for the power-law degree
    distribution observed in real networks.
  • A discrepancy is the fixed exponent of the
    predicted power-law distribution ($\gamma = 3$).
  • Does the BA-model describe the true biological
    evolution of networks?
  • Recent efforts:
  • study variants with cleaner mathematical
    properties (Bollobás, LCD-model),
  • include effects of adding or re-wiring edges,
  • allow nodes to age so that they can no longer
    accept new edges,
  • or vary the form of preferential attachment.
  • These models also predict exponential and
    truncated power-law degree distributions in some
    parameter regimes.

10
2 Scale-free behavior in protein domain networks
  • Domains are fundamental units of protein
    structure.
  • Most proteins contain only a single domain.
  • Some sequences appear as multidomain proteins. On
    average, they have 2-3 domains, but can have up
    to 130 domains!
  • Most new sequences show homologies to parts of
    known protein sequences
  • → most proteins may have descended from relatively
    few ancestral types.
  • Sequences of large proteins often seem to have
    evolved by joining preexisting domains in new
    combinations (domain shuffling),
  • domain duplication, or domain insertion.

Wuchty Mol. Biol. Evol. 18, 1694 (2001)
11
Protein domain database SMART
http://smart.embl-heidelberg.de/ contains
153 signalling domains,
176 nuclear domains (e.g. HLH domains),
225 extracellular domains,
115 other domains.

Wuchty Mol. Biol. Evol. 18, 1694 (2001)
12
Protein Domain databases
Prosite (http://expasy.proteome.org.au/prosite/)
contains 1400 biologically significant motifs and
profiles.
Pfam (http://www.sanger.ac.uk/Software/Pfam/index.shtml):
collection of multiple-sequence alignments of protein
families and profile HMMs. Curated documentation on
2500 families.
ProDom (http://www.toulouse.inra.fr/prodom.html)
contains all 160,000 protein domain
families that can be automatically generated from
the SwissProt and TrEMBL databases. Here, only
consider families with ≥ 10 members → 6000 ProDom
families.
InterPro: Proteome Analysis of 41
nonredundant proteomes of genomes of archaea,
bacteria, and eukaryotes (http://www.ebi.ac.uk/proteome)
yields domains which appear along with
other domains in a protein sequence → vertices and
links.
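
The step from domain annotations to a network can be sketched as follows: domains are vertices, and two domains are linked when they occur together in at least one protein. The protein identifiers and domain architectures below are invented toy data for illustration only and do not come from any of the databases above.

```python
from collections import Counter, defaultdict
from itertools import combinations

def domain_network(proteins):
    """Map each domain to the set of domains it co-occurs with in a protein."""
    links = defaultdict(set)
    for domains in proteins.values():
        unique = sorted(set(domains))
        for d in unique:
            links[d]  # register the domain even if it has no partner
        for a, b in combinations(unique, 2):
            links[a].add(b)
            links[b].add(a)
    return dict(links)

def degree_distribution(links):
    """Fraction of domains P(k) that are linked to k other domains."""
    counts = Counter(len(partners) for partners in links.values())
    total = len(links)
    return {k: c / total for k, c in sorted(counts.items())}

if __name__ == "__main__":
    # Hypothetical domain architectures (toy data).
    toy_proteins = {
        "protA": ["SH3", "SH2", "Pkinase"],
        "protB": ["PH", "SH3"],
        "protC": ["SH2"],
    }
    network = domain_network(toy_proteins)
    print(network)
    print(degree_distribution(network))
```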

Wuchty Mol. Biol. Evol. 18, 1694 (2001)
13
Protein Domain databases

[Figure: P(number of links to other domains) versus number of links to other domains]
Wuchty Mol. Biol. Evol. 18, 1694 (2001)
14
Which are highly connected domains?
The majority of highly connected InterPro domains
appear in signalling pathways. List of the 10
best linked domains in various species.

Number of links increases. Number of signalling
domains (PH, SH3), their ligands (proline-rich
extensions), and receptors (GPCR/RHODOPSIN)
increases.
→ The evolutionary trend toward compartmentalization
of the cell and multicellularity demands a higher
degree of organization.
Wuchty Mol. Biol. Evol. 18, 1694 (2001)
15
Evolutionary Aspects
  • BA-model of scale-free networks is constructed by
    preferential attachment of newly added vertices
    to already well connected ones.
  • Fell and Wagner (2000) argued that vertices with
    many connections in metabolic networks were
    metabolites originating very early in the course
    of evolution, where they shaped a core metabolism.
  • Analogously, highly connected domains could have
    also originated very early.
  • Is this true?

No. The majority of highly connected domains in
Methanococcus and in E. coli are concerned with
maintenance of metabolism. None of the highly
connected domains of higher organisms is found
here. On the other hand, helicase C has roughly
similar degrees of connection in all organisms.
Wuchty Mol. Biol. Evol. 18, 1694 (2001)
16
Conclusions
  • Expansion of protein families in multicellular
    vertebrates coincides with higher connectivity of
    the respective domains.
  • Extensive shuffling of domains to increase
    combinatorial diversity might provide protein
    sets which are sufficient to preserve cellular
    procedures without dramatically expanding the
    absolute size of the protein complement.
  • The greater proteome complexity of higher eukaryotes
    is not simply a consequence of the genome size,
    but must also be a consequence of innovations in
    domain arrangements.
  • Highly linked domains represent functional
    centers in various different cellular aspects.
  • They could be treated as evolutionary hubs
    which help to organize the domain space.

Wuchty Mol. Biol. Evol. 18, 1694 (2001)
17
3 How did complexity evolve II?
At the molecular level, biological complexity
involves networks of ligand-protein,
protein-protein, and protein-nucleic acid
interactions in metabolism, signal transduction,
gene regulation, protein synthesis etc. As
organismal complexity increases, more control is
required for the positive and negative regulation
of genes.
→ Complexity correlates with an
increase in both the ratio and absolute number of
transcription factors.

Amoutzias et al. EMBO reports 5, 1 (2004)
18
Evolution of complex genetic networks
Duplication of genes is the predominant factor for
the generation of new members of a protein
family. A duplicated gene creates redundancy if
multiple proteins have the same or overlapping
function. Alternatively, due to reduced
selective pressure, one of the gene copies can
become nonfunctional or acquire a new function.

What is more important: duplication of single
genes or duplication of large gene clusters
(building blocks)?
Here: empirical evidence for scale-free protein
networks emerging through single-gene duplication.
Amoutzias et al. EMBO reports 5, 1 (2004)
19
bHLH protein family
bHLH protein family: ancient class of eukaryotic
transcription factors found in fungi, plants,
and animals. bHLHs may form homo- and
heterodimers. They form a complex protein-protein
interaction network. Very conserved 60-residue
basic region/helix-loop-helix motif. bHLHs
dimerize into a 4-helix bundle and recognize DNA
with their basic regions. Additional regions are
responsible for activation or repression of
target gene activity.

http://www.biochem.ucl.ac.uk/bsm/pdbsum/
20
Two possible patterns of network evolution
A: Evolution of a heterodimerization network by
single-gene duplication.

B: Evolution of a heterodimerization network by
large-scale gene duplication.
Amoutzias et al. EMBO reports 5, 1 (2004)
21
bHLH heterodimerization network
A: phylogenetic analysis of human bHLH proteins.
B: domain architecture.

Amoutzias et al. EMBO reports 5, 1 (2004)
22
Topology of bHLH heterodimerization network
Topology based on protein-protein interaction
data (from literature).

Networks with hubs E2A, Arnt, Max. E2A and Arnt
sub-networks are connected. Max is distinct.
Hubs are shown as circles.
Amoutzias et al. EMBO reports 5, 1 (2004)
23
Topology of bHLH heterodimerization network
P(k) follows a scale-free behavior! Relative
connectivity of hubs is higher ($\gamma \approx 1$) than
reported for most other networks.

This high connectivity appears to result from
gene duplication generating new, peripheral
proteins that interact preferentially with the
hub.
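
Why single-gene duplication favours already well-connected proteins can be seen in a toy growth model. This is a generic duplication sketch under stated assumptions, not the analysis of Amoutzias et al.: at each step a randomly chosen gene is copied, and the copy inherits each dimerization partner of the original with a hypothetical probability `keep`. A protein with k partners among N proteins then has a chance proportional to k/N of gaining a new partner per step, which is a form of preferential attachment.

```python
import random

def duplication_growth(steps, keep=1.0, seed=None):
    """Start from two proteins (0 and 1) that dimerize; at each step copy a
    randomly chosen gene, the copy inheriting each interaction partner of
    the original with probability `keep`. Returns partner sets per protein."""
    rng = random.Random(seed)
    partners = {0: {1}, 1: {0}}
    for _ in range(steps):
        original = rng.randrange(len(partners))
        new = len(partners)
        inherited = {p for p in partners[original] if rng.random() < keep}
        partners[new] = inherited
        for p in inherited:
            partners[p].add(new)
    return partners

if __name__ == "__main__":
    net = duplication_growth(500, keep=0.7, seed=5)
    degrees = sorted((len(p) for p in net.values()), reverse=True)
    print("proteins:", len(net), "five highest degrees:", degrees[:5])
```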
Amoutzias et al. EMBO reports 5, 1 (2004)
24
Topology of bHLH heterodimerization network
The hub proteins are usually widely expressed in
different tissues and organs. They
heterodimerize with peripheral proteins that have a
more limited expression pattern → specific effects.

The topology of the Max network parallels that of
the E2A/Arnt superfamily: 2 hubs connected by the Mad
family (repressors) vs. 2 families connected by the
HES family (repressors).
Hubs are shown as circles.
Amoutzias et al. EMBO reports 5, 1 (2004)
25
Phylogenetic relationships
  • The parallelism between the E2A/Arnt and Max families
    is also reflected in their phylogenetic relationships:
  • in the Max (and E2A/Arnt) network, the 2 hubs
    (families) are not clustered together;
  • the bridge linking the 2 hubs is a family of
    repressor proteins that are phylogenetically
    quite distant from the hubs.
  • → The 2 bHLH networks show evolutionary convergence.

Amoutzias et al. EMBO reports 5, 1 (2004)
26
Conclusions
  • (1) For the evolution of networks based on one kind
    of binding domain,
    a model of single-gene duplication, followed by
    domain rearrangements, point mutations and
    ongoing gene duplication, is sufficient to
    generate quite complex interaction patterns,
    which mediate activation and repression.
  • This seems to be the first example where a hub-based
    network with scale-free properties is based on
    real-data phylogenies.
  • (2) Compelling symmetry between the 2 networks
    E2A/Arnt and Max.

Amoutzias et al. EMBO reports 5, 1 (2004)
27
Watts-Strogatz model
Input: (n, k, p), where n is the number of
vertices, k is the distance within which each vertex
is connected initially to its neighbors by
undirected edges, and p (0 ≤ p ≤ 1) is the
probability of rewiring each edge.
Algorithm:
a) Start with a ring lattice with n nodes, each connected
to its neighbors up to the k-th nearest on either side.
Thus, the degree of each vertex is 2k and the ring has
(nk) edges.
b) Replace original edges by random ones, each with
probability p.
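
The two steps can be sketched as below. The rewiring rule used here is an assumption, one common reading of step b): the far end of a rewired edge is moved to a uniformly chosen node, avoiding self-loops and duplicate edges. The function name is illustrative.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes, each linked to its k nearest neighbours on
    either side (degree 2k, n*k edges); every lattice edge is then rewired
    with probability p. Returns adjacency sets."""
    if not (k >= 1 and n > 2 * k):
        raise ValueError("need k >= 1 and n > 2k")
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    # a) ring lattice
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    # b) rewire each lattice edge (u, u + j) with probability p
    for j in range(1, k + 1):
        for u in range(n):
            v = (u + j) % n
            if v in adj[u] and rng.random() < p and len(adj[u]) < n - 1:
                w = rng.randrange(n)
                while w == u or w in adj[u]:
                    w = rng.randrange(n)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

if __name__ == "__main__":
    for p in (0.0, 0.1, 1.0):
        adj = watts_strogatz(20, 2, p, seed=0)
        n_edges = sum(len(nb) for nb in adj.values()) // 2
        print(f"p = {p}: edges = {n_edges}, max degree = {max(len(nb) for nb in adj.values())}")
```

Rewiring keeps the number of edges at nk; only the degree of individual nodes changes.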