1 Gene duplication models and reconstruction of
gene regulatory network evolution from network
structure
Juris Viksna, David Gilbert
Riga, IMCS, 10.02.2006
2 Gene regulatory networks
Yeast network
J.Rung, T.Schlitt, A.Brazma, K.Freivalds, J.Vilo, Bioinformatics 18 S2 (ECCB), 202-210
3 Gene regulatory networks
- Directed graph
- Graph vertices correspond to genes
- An edge from gene A to gene B means that gene B is (directly) regulated by gene A
4 Properties of gene networks (1)
- Believed to be scale-free (vertex degrees satisfy a so-called power law)
- N(k): number of vertices with degree k
- $N(k) \sim k^{-\beta}$
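A minimal sketch of how N(k) could be tabulated and an exponent estimated, assuming the power law has the form N(k) ~ k^(-β). The function names `degree_counts` and `fit_beta` are hypothetical, the degree is taken as in-degree plus out-degree, and the least-squares fit on log-log data is only one simple estimator, not necessarily the one used for the plots in these slides.

```python
import math
from collections import Counter

def degree_counts(edges):
    """Return N(k): maps degree k to the number of vertices with that degree.

    Assumption: degree = in-degree + out-degree of the directed edge list.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())

def fit_beta(n_of_k):
    """Estimate beta as minus the least-squares slope of log N(k) vs log k."""
    points = [(math.log(k), math.log(n)) for k, n in n_of_k.items() if k > 0 and n > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx
```

For an exact power law such as N(1) = 1024, N(2) = 256, N(4) = 64, N(8) = 16 the fit returns β = 2.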
5 Properties of gene networks (1)
F.Chung, L.Lu, T.Dewey, D.Gallas, JCB 10, 677-687
$N(k) \sim k^{-\beta}$
6 Properties of gene networks (2)
- Believed to have a noticeable modularity
- $i$: a vertex
- $k_i$: number of neighbours of vertex $i$
- $n_i$: number of direct links between these $k_i$ neighbours
- Clustering coefficient (for vertex $i$): $C_i = 2 n_i / (k_i (k_i - 1))$
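The clustering coefficient can be computed directly from its definition, C_i = 2 n_i / (k_i (k_i - 1)). The sketch below assumes the network is treated as undirected and stored as a dictionary of neighbour sets; the name `clustering_coefficient` and the convention of returning 0 for vertices with fewer than two neighbours are assumptions.

```python
def clustering_coefficient(adj, i):
    """Clustering coefficient of vertex i.

    adj: dict mapping each vertex to the set of its neighbours (undirected).
    """
    neighbours = adj[i] - {i}
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumption: undefined cases are reported as 0
    # Each link between two neighbours is seen from both ends, hence // 2.
    n_links = sum(len(adj[u] & neighbours) for u in neighbours) // 2
    return 2.0 * n_links / (k * (k - 1))

# Triangle 1-2-3 with a pendant vertex 4 attached to vertex 1.
example = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
```

In the example, vertex 1 has three neighbours with one link among them, so its coefficient is 1/3, while vertex 2 has coefficient 1.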
7 Properties of gene networks (2)
Clustering coefficient (for vertex $i$): $C_i = 2 n_i / (k_i (k_i - 1))$
E.Ravasz, A.Somera, D.Mongru, Z.Oltvai, A.Barabasi, Science 297, 1551-1555
8 Network evolution models (1)
- (i) networks expand continuously by the addition of new vertices,
- (ii) new vertices attach preferentially to sites that are already well connected.
- A model based on these two ingredients reproduces the observed stationary scale-free distributions.
A.Barabasi, R.Albert, Science 286, 509-512
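The two ingredients (growth and preferential attachment) can be sketched as follows. This is an illustrative simulation under assumptions: the starting graph is a single edge, each new vertex attaches to `m` distinct existing vertices, and the function name and parameters are hypothetical.

```python
import random

def preferential_attachment(n, m=1, seed=None):
    """Grow a graph to n vertices; return the list of edges (new, target)."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each vertex appears in `endpoints` once per incident edge, so a uniform
    # choice from this list picks a vertex with probability proportional to
    # its degree.
    endpoints = [0, 1]
    for new in range(2, n):
        targets = set()
        while len(targets) < min(m, new):
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges
```

A call such as `preferential_attachment(5000, m=2, seed=1)` gives an edge list whose degree distribution can then be inspected on a log-log scale.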
9 Network evolution models (2)
"Hierarchical" model
Sample hierarchical networks (scale-free and modular)
E.Ravasz, A.Somera, D.Mongru, Z.Oltvai, A.Barabasi, Science 297, 1551-1555
10 Network evolution models (3)
"Duplication" model
Scale-free with $\beta < 2$ for $1/2 < p < 1$
F.Chung, L.Lu, T.Dewey, D.Gallas, JCB 10, 677-687
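The slide does not spell out the duplication step, so the sketch below assumes the commonly described partial duplication rule: pick an existing vertex uniformly at random, add a new vertex, and copy each edge of the picked vertex to the new one independently with probability p. The undirected representation, the single-edge starting graph and the name `partial_duplication` are all assumptions.

```python
import random

def partial_duplication(n, p, seed=None):
    """Grow an undirected graph to n vertices by partial duplication.

    Returns a dict mapping each vertex to the set of its neighbours.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for new in range(2, n):
        source = rng.randrange(new)  # vertex to be duplicated
        # Keep each edge of the source independently with probability p.
        kept = {w for w in adj[source] if rng.random() < p}
        adj[new] = kept
        for w in kept:
            adj[w].add(new)
    return adj
```

With p = 1 every new vertex is an exact copy of its source, and with p = 0 every new vertex is isolated; intermediate values interpolate between the two.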
11 Network evolution models (4)
12 Network evolution models (M1)
M1
13 M1, p = 0.1, 5000 vertices
$\beta \approx 4.5$
14 M1, p = 0.01, 5000 vertices
$\beta \approx 3$
15 M1, p = 0.05, d = 0.2, 5000 vertices
16 M1, p = 0.05, d = 0.2, 5000 vertices
$\beta \approx 2.5$
17 Network evolution models (M1)
M1
V       E
20      40
50      200
100     700
500     15000
1000    50000
5000    800000
18 Network evolution models (M2)
[Figure: genome evolution step in model M2; vertex labels A, X and X']
19 Network evolution models (M2)
[Figure: genome evolution step in model M2 with two alternative outcomes ("or"); vertex labels A, X and X']
20 Network evolution models (M2)
M2
21 M2, p = 0.1, 20000 vertices
22 M2, p = 0.1, 20000 vertices
$\beta \approx 1$
23 Network evolution models (M2)
M2
V       E
20      40
50      80
100     150
500     700
1000    1500
5000    7000
24 Evolution graphs
k2 vertices
Two types of edges:
- for swappable events (black)
- for dependent events (grey)
25 Evolution graphs
26 Evolution graphs
Initial graph G.
Numbered vertices correspond to evolution steps and are marked by the vertices duplicated in the corresponding steps.
Intermediate graphs between G and G' correspond to cuts of the evolution graph (G and G' can also be obtained in this way).
Graph G' is obtained from G after k (in this example k = 6) evolution steps.
27 Evolution graphs - some questions
- Equivalence: decide whether two given evolution graphs are equivalent.
- Irreducible networks: networks that can't be obtained from simpler networks by an evolution graph.
- Uniqueness of evolution: is it possible that D(G1,E1) = D(G2,E2) for two different irreducible networks G1 and G2?
28 "Reverse engineering" problems
Given: G'
Reconstruct: G and E
29 "Reverse engineering" problem (1)
(Assuming either model M1 or M2.)
Reconstruction of evolution graph: for a given network N', find an irreducible network N, the sequence of duplication events D1,...,Dm and the corresponding evolution tree, such that N' = D(N,E).
30 "Reverse engineering" problem (2)
(Assuming either model M1 or M2.)
Reconstruction of duplication event: for a given network N', find a network N and a duplication event D, such that N' = D(N).
31 "Reverse engineering" problem (3)
(Assuming either model M1 or M2.)
Reconstruction of the largest duplication event: for a given network N', find a network N with the smallest possible number of genes and a duplication event D, such that N' = D(N).
32 "Reverse engineering" - complexity
For a given network N', find a network N with the smallest possible number of genes and a duplication event D, such that N' = D(N).
- at least as hard as the graph isomorphism problem
- likely NP-hard (maximum clique for reconstruction graphs)
- reconstruction graphs are much smaller than networks
- still might be practically solvable for random graphs of reasonable size (a few tens of thousands of vertices).
33 Algorithm - stage 1
Partition the vertices of G' into orbits.
This can be done e.g. with the nauty package.
One can try to use some property p which is simpler to compute than automorphisms and is such that p(G1) = p(G2) for isomorphic graphs G1 and G2.
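One property of this kind is iterated colour refinement: vertices start with their degree as a colour and are repeatedly split by the multiset of their neighbours' colours. Vertices in the same orbit always end up with the same colour, so the resulting classes are unions of orbits and can serve as orbit candidates. This is a swapped-in standard technique, not necessarily the property the authors used; the graph is treated as undirected and the function names are hypothetical.

```python
def refine_colours(adj):
    """Colour refinement on adj (dict: vertex -> set of neighbours).

    Returns a dict mapping each vertex to an integer colour.
    """
    colour = {v: len(adj[v]) for v in adj}
    while True:
        # New signature: own colour plus sorted colours of the neighbours.
        signature = {
            v: (colour[v], tuple(sorted(colour[w] for w in adj[v]))) for v in adj
        }
        ids = {s: i for i, s in enumerate(sorted(set(signature.values())))}
        refined = {v: ids[signature[v]] for v in adj}
        # Stop once no class was split in this round.
        if len(set(refined.values())) == len(set(colour.values())):
            return refined
        colour = refined

def candidate_orbits(adj):
    """Group vertices by final colour; each group is a union of true orbits."""
    classes = {}
    for v, c in refine_colours(adj).items():
        classes.setdefault(c, set()).add(v)
    return list(classes.values())
```

For the path 0-1-2-3 this yields the classes {0, 3} and {1, 2}, which here coincide with the true orbits; in general the classes may be coarser than the orbits and need a further check.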
34 Reconstruction graphs
Vertices correspond to non-singleton orbits.
Two types of edges:
- (1) have to participate in the same duplication event (solid)
- (2) cannot participate in the same duplication event (dotted)
35 Algorithm - stage 2
Find the reconstruction graph.
36 Algorithm - stage 3
Find the largest independent set (according to type 2 edges) in the reconstruction graph.
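Since reconstruction graphs are said to be much smaller than the networks, a brute-force search for the largest independent set may be acceptable. The sketch below is an assumption about how stage 3 could be done, not the authors' implementation: it considers only type 2 ("cannot participate together") edges, ignores type 1 edges, and takes exponential time in the number of orbits.

```python
from itertools import combinations

def largest_independent_set(vertices, conflict_edges):
    """Return a largest set of vertices with no conflict edge inside it.

    conflict_edges: iterable of pairs (u, v) that may not be chosen together.
    """
    vertices = list(vertices)
    conflicts = {frozenset(e) for e in conflict_edges}
    # Try subset sizes from largest to smallest; the first hit is optimal.
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) not in conflicts
                   for pair in combinations(subset, 2)):
                return set(subset)
    return set()
```

For four orbits a, b, c, d with conflicts a-b, b-c and c-d, the call returns {"a", "c"}.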
37Algorithm stage 4
- if all selected orbits contain just 2 nodes, we
are practically done - otherwise we have to find
a pair of (largest) sets of vertices from
selected orbits, which correspond to duplication
event currently exhaustive search
38 Algorithm
The evolution graph can be reconstructed by repeated use of "Largest duplication event".
39 Algorithm - efficiency
- using nauty we can deal with networks with < 200 genes
- for larger graphs one can use heuristics to compute orbits
- vertex/edge counts at different DFS levels seem to work quite well
- likely to find a large part of the duplication event
- for < 200 vertices often gives the exact result
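The slides do not define the level-count heuristic precisely. The sketch below is one possible reading, with breadth-first distance levels swapped in for the DFS levels mentioned above: for each vertex it records how many vertices lie at each distance and how many edges run inside each level, then groups vertices with identical records as orbit candidates. All names and the `max_depth` parameter are hypothetical, and the graph is treated as undirected.

```python
from collections import deque

def level_signature(adj, start, max_depth=3):
    """Tuple of (vertex count, internal edge count) for levels 1..max_depth."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if depth[v] == max_depth:
            continue
        for w in adj[v]:
            if w not in depth:
                depth[w] = depth[v] + 1
                queue.append(w)
    signature = []
    for d in range(1, max_depth + 1):
        level = {v for v, dv in depth.items() if dv == d}
        # Edges inside a level are counted from both ends, hence // 2.
        inner = sum(len(adj[v] & level) for v in level) // 2
        signature.append((len(level), inner))
    return tuple(signature)

def group_by_signature(adj, max_depth=3):
    """Groups of at least two vertices sharing the same level signature."""
    groups = {}
    for v in adj:
        groups.setdefault(level_signature(adj, v, max_depth), []).append(v)
    return [g for g in groups.values() if len(g) > 1]
```

Vertices in the same orbit always share a signature, so no true orbit is split; vertices from different orbits may still be grouped together, which is why the result is only a heuristic.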
40 Algorithm - Model 2
General case: check automorphisms for all k-tuples of vertices.
A serious problem even for k = 2.
However, large components are duplicated not that often.
The previous algorithm could be used to find a "large" part of the duplicated genes.
Still an open problem.
Also, a question about good heuristics.
41 Model 2 - Component sizes
Model M2, 550 vertices, 132 duplications
42 Model 2 - Component sizes
Constructing a random network with 20000 genes
Component sizes of events:
Component size            Number of events
1                         177008
2                         342
3                         97
4                         49
5                         37
6                         18
7                         13
8                         10
10, 11, 14                4
9, 12, 13, 15, 27         3
16, 24                    2
17, 18, 21, 22, 31, 27    1
43 Experiments with yeast network
6270 genes, 106 regulators
44 Experiments with yeast network
p = 0.0001, E = 106, V = 216
45 Experiments with yeast network
- 277 pairs of duplication candidates were discovered
- Few "real": COS5 and COS8, YLR460C and YNL134L
- All 5962 genes were compared all-v-all using SW
- Normalized compression score: ssearch_score(P1,P2) / min(length(P1), length(P2))
- Scores for the found duplication pairs were compared with average values
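The normalization can be written as a one-line function. The sketch assumes the garbled formula means the ssearch score divided by the length of the shorter of the two protein sequences; the function name and the example values are hypothetical.

```python
def normalized_score(ssearch_score, p1, p2):
    """Alignment score divided by the length of the shorter sequence.

    ssearch_score: raw score reported for the pair (p1, p2).
    p1, p2: the two protein sequences as strings.
    """
    shorter = min(len(p1), len(p2))
    if shorter == 0:
        raise ValueError("sequences must be non-empty")
    return ssearch_score / shorter
```

For a raw score of 120 between sequences of lengths 3 and 6, the normalized score is 40.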
46 Experiments with yeast network
Observed distances vs average, all non-adjacent gene pairs