Random sequence-matching model for emergent gene-regulatory networks

1
Random sequence-matching model for emergent
gene-regulatory networks
  • Ayse Erzan
  • Istanbul Technical University, Gürsey Institute,
  • Collegium Budapest
  • Duygu Balcan (ITÜ) Muhittin Mungan (BÜ)
  • Alkan Kabakçioglu (Padova) Ayse H. Bilge (ITÜ)
  • Yasemin Sengün (ITÜ)

2
outline
  • Random and real networks
  • central dogma of gene regulation
  • RNA interference and more
  • sequence matching model for gene regulatory
    networks
  • simulations and analytical results
  • comparison with experiments
  • outcomes of similar models

3
classical random networks
  • Erdős and Rényi
  • (Publ. Math. Inst. Hung. Acad. Sci. 5, 17 (1960))
  • N vertices
  • N(N-1)/2 possible connections
  • with probability p
  • degree distribution
  • Poissonian for large N
  • $P(k) \simeq e^{-z} z^k / k!$
  • $z = \langle k \rangle = pN$, $z_c = 1$
  • Average minimal path length
  • $\ell_{ER} \sim \ln N / \ln z = (1 + \ln p / \ln N)^{-1}$
  • Clustering coefficient
  • $C_{ER} = z/N = p$
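
As a quick numerical check of the classical results above, the following minimal sketch builds a G(N, p) random graph and measures its mean degree and clustering coefficient. The function names and the parameter values (N = 400, p = 0.05) are illustrative assumptions, not taken from the slides.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Adjacency sets of an undirected G(n, p) graph: each pair linked with probability p."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def mean_degree(adj):
    return sum(len(nb) for nb in adj) / len(adj)

def mean_clustering(adj):
    """Average of C_i = 2 E_i / (k_i (k_i - 1)) over nodes with at least two neighbors."""
    values = []
    for nb in adj:
        k = len(nb)
        if k < 2:
            continue
        # E_i: number of links among the neighbors of node i
        links = sum(1 for u in nb for v in nb if u < v and v in adj[u])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)

n_nodes, prob = 400, 0.05          # illustrative values (assumption)
adj = erdos_renyi(n_nodes, prob)
z = mean_degree(adj)
c = mean_clustering(adj)
print(f"z = {z:.2f} (pN = {prob * n_nodes:.2f}), C = {c:.3f} (p = {prob})")
```

Up to sampling noise, the measured values should come out close to z = pN and C = p.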

4
Random Networks
  • Probability of a connection between any two nodes is the same, p
  • With N nodes, each node has on average Np connections
  • Small-world property: distances between nodes grow very weakly with N
  • The most highly connected nodes directly reach 25% of the rest
5
naturally occurring networks
  • Social and economic networks
  • Citation and collaborative networks
  • Technological networks
  • WWW, communications networks
  • Biological networks
  • Neural networks, food networks
  • Co-evolutionary networks, genomic networks

R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002)
S.N. Dorogovtsev and J.F.F. Mendes, Adv. Phys. 51, 1079 (2002)
6
Real Networks
  • considerable number of very highly connected nodes
  • their first neighbors make up 60% of the total
  • most frequent are nodes with very few
    connections (1)

Small world!
7
small world / scale free networks
  • High clustering coefficient
  • $\langle C_i \rangle = \langle 2 E_i / [k_i (k_i - 1)] \rangle$
  • $> C_{ER} = z/N$
  • Short average minimum path length $\langle \ell_{min} \rangle$
  • (comparable to an ER network
  • for the same C and N; differs from regular lattices)
  • Scale-free degree distribution
  • $P(k) \sim k^{-\gamma}$, cutoff $k_c$
  • a realisation
  • Barabási-Albert model of
  • preferential attachment
  • growing network in which the probability of attaching a
    new edge to vertex i is proportional to $k_i$
  • $P(k) \sim k^{-3}$
  • (exact)
  • (Models with preferential attachment: $\gamma \geq 2$)
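
The preferential attachment rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: the seed graph (a complete graph on m + 1 nodes), the number m of edges per new node, and the function name are choices made here, not details given on the slide.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=0):
    """Grow a network to n nodes; each new node attaches m edges to
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # assumption: start from a complete graph on m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # every node appears in 'stubs' once per incident edge, so a uniform
    # draw from this list selects node i with probability proportional to k_i
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = barabasi_albert(n=2000, m=2)
degree = Counter(v for e in edges for v in e)
mean_k = sum(degree.values()) / len(degree)
print(f"mean degree {mean_k:.2f}, largest degree {max(degree.values())}")
```

The largest degree ends up far above the mean, which is the signature of the hubs that a Poissonian (ER) degree distribution does not produce.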

8
Genotype → Phenotype
  • Genomic networks: a network of interactions that control and
    modulate genetic expression
  • the output (expressed proteins) is capable of great variability
  • a dynamical system (e.g., Wagner, PNAS 1994)
  • $s_i(t+1) = \mathrm{sign}\left[ \sum_j w_{ij} s_j(t) + h_i \right]$
  • w may be sparse, but incorporates correlations of all orders
  • scale-free degree distribution?
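
The update rule $s_i(t+1) = \mathrm{sign}[\sum_j w_{ij} s_j(t) + h_i]$ can be iterated directly. The sketch below is only an illustration: the three-gene cyclic matrix is a hypothetical toy example, and taking sign(0) = +1 is an assumption made here.

```python
def step(w, s, h):
    """One synchronous update s_i(t+1) = sign(sum_j w_ij s_j(t) + h_i)."""
    new_state = []
    for w_i, h_i in zip(w, h):
        field = sum(w_ij * s_j for w_ij, s_j in zip(w_i, s)) + h_i
        new_state.append(1 if field >= 0 else -1)  # assumption: sign(0) = +1
    return new_state

# hypothetical toy network: gene 0 reads gene 1, gene 1 reads gene 2, gene 2 reads gene 0
w = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
h = [0, 0, 0]
s0 = [1, -1, -1]

trajectory = [s0]
for _ in range(3):
    trajectory.append(step(w, trajectory[-1], h))
print(trajectory)
```

In this toy case the expression pattern simply circulates around the cycle and returns to its starting state after three updates.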
9
Genotype → Phenotype
  • Nucleic acids: 4-letter alphabet, can replicate
  • DNA: Adenine, Guanine, Thymine, Cytosine
  • Watson-Crick base pairing: A-T, C-G
  • RNA: Adenine, Guanine, Uracil, Cytosine; A-U, C-G
  • Amino acids
  • 20-word dictionary
  • → proteins
  • Coded for by the nucleic acids
  • 3 nucleic acids = 1 codon
    on the mRNA
  • → 1 amino acid
  • not all combinations correspond to amino acids
  • some combinations are degenerate

10
gene regulation networks - transcription
regulatory networks: the central dogma
[Figure: DNA (promoter1 gene1, promoter2 gene2, promoter3 gene3) → transcription → mRNA → translation (ribosome, tRNA, rRNA) → amino acid chain → proteins (structural and regulatory), including transcription factors]
Adapted from Alvis Brazma, www.ebi.ac.uk/microarray/research/networks/genetics
11
(from S. Maslov)
Data from the Regulon Database: 606 interactions, 424 operons
Out-degree: $1 < k_{out} < 85$ (broader)
In-degree: $1 < k_{in} < 6$
12
(from S. Maslov)
Data obtained from literature search: 1449 regulations, 689 proteins
$k_{out} < 96$, $k_{in} < 40$
$n(k_{out}) \sim k_{out}^{-\gamma}$, $\gamma \approx 2.5$
13
  • A. Wagner, Mol. Biol. Evol. 18, 1283 (2001).
  • Duplication and divergence of genes - interaction
    between their regulatory proteins

14
Transcription Regulatory genomic and Protein
Interaction Networks (interactions between
regulatory proteins)
  • for a review of properties see
  • R.V. Solé and R. Pastor-Satorras, in Handbook of
    Graphs and Networks (Bornholdt and Schuster, eds.,
    Wiley-VCH, Berlin, 2002)
  • Previous wisdom
  • out-degree distribution scale free with $\gamma \approx 2.5$
    !!?
  • A. Wagner, Mol. Biol. Evol. 18, 1283 (2001);
    Jeong, Mason, Barabási, Oltvai, Nature 411, 41
    (2001); Maslov and Sneppen, Science 296, 910
    (2002)
  • narrower in-degree distribution than out-degree
    distribution?
  • small world with non-classical clustering?

15
RNA interference
  • New!

16
New paradigm? Post Transcriptional Gene
Suppression (PTGS)
17
RNA can bind directly on similar DNA
sequences and silence genes at the
transcriptional stage
18
  • Watson-Crick base pairing between nucleic acids
  • DNA: Adenine, Guanine, Thymine, Cytosine; A-T,
    C-G
  • RNA: Adenine, Guanine, Uracil, Cytosine;
    A-U, C-G
  • stabilisation, replication and transcription of
    DNA
  • RNA interference (siRNA binding to mRNA or
    chromosomal DNA)
  • binding of regulatory proteins onto mRNA

Basic mechanism: sequence matching
(lock-and-key combinations)
  • D. Balcan, AE, Eur. Phys. J. B 38, 253 (2004)
  • M. Mungan, A. Kabakcioglu, D. Balcan, AE,
    q-bio.MN/0406049

19
  • three-dimensional architecture (secondary
    structure)
  • also sequence dependent
  • - amino-acid recognition by tRNA
  • - amino-acid binding by rRNA in the ribosome
  • - binding of transcription factors to
    promoter regions
  • Greater generality for modeling genomic
    interactions?
  • Stay tuned!

20
emergent gene expression networks?
21
sequence matching → gene regulation?
Model: connectivity matrix of the genomic network
$w_{ij} = 1$ iff the string $G_i$ is embedded inside the
string $G_j$ ($G_i \subseteq G_j$, $l_i \leq l_j$);
$w_{ij} = 0$ otherwise.

[Figure: example chromosome 2011000101201000110211 with the gene 1101 matched inside longer genes; interference (suppression) edges are directed; in the example $k_{in} = 1$, $k_{out} = 2$]
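
The connectivity rule can be made concrete with a small sketch. Several details here are assumptions made for illustration: genes are taken to be the runs of 0s and 1s enclosed between two consecutive 2s, the symbol 2 appears with probability p and 0/1 are equally likely otherwise, self-links are excluded, and all function names are hypothetical.

```python
import random

def random_chromosome(length, p, seed=0):
    """Random string over {0, 1, 2}; '2' (start/stop sign) occurs with probability p."""
    rng = random.Random(seed)
    return "".join("2" if rng.random() < p else rng.choice("01") for _ in range(length))

def genes(chromosome):
    """Assumption: genes are the non-empty 0/1 runs enclosed between two '2' signs."""
    return [g for g in chromosome.split("2")[1:-1] if g]

def adjacency(gene_list):
    """w[i][j] = 1 iff gene i occurs as a contiguous substring of gene j (i != j)."""
    n = len(gene_list)
    return [[int(i != j and gene_list[i] in gene_list[j]) for j in range(n)]
            for i in range(n)]

def degrees(w):
    """k_out(i) = sum_j w_ij and k_in(i) = sum_j w_ji."""
    n = len(w)
    k_out = [sum(w[i]) for i in range(n)]
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    return k_in, k_out

toy = ["1", "1101", "011011", "00"]     # hypothetical toy genome
w = adjacency(toy)
k_in, k_out = degrees(w)
print("k_in ", k_in)
print("k_out", k_out)

chrom = random_chromosome(length=2000, p=0.05)
net = adjacency(genes(chrom))
print(len(net), "genes in the random chromosome")
```

In the toy genome the gene 1101 has one incoming edge (from the single-bit gene 1) and one outgoing edge (to 011011), while the gene 00 is isolated because no other gene contains it.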
22
connected network
  • Transitivity: if $w_{ij} w_{jk} = 1$ then $w_{ik} = 1$
    ⇒ preferential linking, clustering
  • Congruence: if $l_i = l_j$ then $w_{ij} = w_{ji}$
  • $k_{in}(i) = \sum_j w_{ji}$, $k_{out}(i) = \sum_j w_{ij}$
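
Transitivity and congruence follow directly from the substring rule, and both can be checked by brute force. The sketch below does this on a hypothetical toy genome of 40 random binary genes; the gene lengths, the seed, and the exclusion of self-links are assumptions made for the example.

```python
import random
from itertools import permutations

rng = random.Random(1)
# hypothetical toy genome: 40 random binary genes of length 1 to 6
gene_list = ["".join(rng.choice("01") for _ in range(rng.randint(1, 6)))
             for _ in range(40)]
n = len(gene_list)

# w[i][j] = 1 iff gene i is embedded in gene j (self-links excluded)
w = [[int(i != j and gene_list[i] in gene_list[j]) for j in range(n)]
     for i in range(n)]

# transitivity: w_ij = w_jk = 1 must imply w_ik = 1 for distinct i, j, k
violations = [(i, j, k) for i, j, k in permutations(range(n), 3)
              if w[i][j] and w[j][k] and not w[i][k]]

# congruence: genes of equal length are either linked both ways or not at all
congruent = all(w[i][j] == w[j][i]
                for i in range(n) for j in range(n)
                if len(gene_list[i]) == len(gene_list[j]))

k_out = [sum(w[i]) for i in range(n)]
k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
print("transitivity violations:", len(violations), "| congruence holds:", congruent)
```

Every edge is counted once as outgoing and once as incoming, so the two degree sequences always have the same total.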

23
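simulations: clustering coefficient
  • $C_i = 2E(i) / \{ k(i) [k(i) - 1] \}$
  • = number of edges connecting nearest neighbors / total number of
    possible connections
  • for incoming or outgoing bonds to the site i
  • $\langle C_{out} \rangle = 0.034$
  • $\langle C_{in} \rangle = 0.648$
  • $\langle C \rangle = 0.534$ (compare the classical value $\langle z \rangle / \langle s \rangle$)

non-classical behaviour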
24
giant cluster breaks up for $p < p_c(L)$
(L: chromosome length; p: frequency of stop-start signs)
N (number of genes) too small, genes too long
percolation threshold $p_c$: exponent -3/4 (preliminary)
25
extremely small world networks!
  • cluster radius = average minimum path length
  • directed edges (in or out): $\ell_{min} = 1$
    (transitivity!)
  • undirected edges:
  • $\ell_{min} = 1$, $\ell_{max} \leq 4$, e.g. 11111 - 1 - 001101 - 0 - 00000
  • $\langle \ell_{min} \rangle$ depends very weakly on p for fixed L
  • $p_c < p < 1/2$; most genes of length
    unity
  • $\ell_{min}$ undefined for $p \leq p_c(L)$
  • L = 15000: $\langle \ell_{min} \rangle = 1.66$; $\langle \ell_{min} \rangle \to 1.87$
    as $p \to p_c$

26
simulations: network robust under random
mutations
  • random point mutations
  • $x_i \in \{0, 1\}$: $x_i \to (x_i + 1) \bmod 2$
  • $x_i = 2$: $x_i \leftrightarrow x_{i \pm 1}$
  • random walk steps taken by
  • STOP and GO signs
  • long range modifications due to
    change in reading frame

27
Degree distribution: preliminary simulation
results
  • peaks geometrically spaced for small $k_{out}$ (log-periodic)
  • periodic for large $k_{out}$
  • last peak: the size distribution of the giant
    cluster (single-bit genes connect to almost all others)
28
distribution of out-going bonds at peak maxima
- a single sample
$n_m(k_{out}) \sim k_{out}^{-\gamma}$, $\gamma = 0.45 \pm 0.06$
averaging over 500 runs, L = 15 000, p = 0.05
29
$n \sim k_{out}^{-\gamma}$, $\gamma \approx 0.9$
30
$n_m \sim k_{out}^{-\gamma}$
Maxima of the peaks: $\gamma \approx 0.9$ for small k, $\gamma \approx 0.4$ for large k
no double scaling for p = 0.05: $\gamma \approx 0.45$
31
$n(k) \sim k^{-\gamma}$, $\gamma \in (1.1, 1.8)$
32
Simulation results: crossover in the scaling
behaviour of the degree distribution
[Figure: degree distribution, analytical result (solid line) compared with simulation; the crossover degree $d_c$ is marked]
33
Analytical calculations
1. The matching probabilities
Probability that a given string of length l is reproduced in a
randomly chosen string of length k, for an alphabet of r letters:
$p(l, k) = 1 - (1 - r^{-l})^{k-l+1} \simeq r^{-l} (k - l + 1)$ for l large,
neglecting correlations between overlaps
  • $r^l$: number of l-strings with r letters
  • $(k - l + 1)$: number of shifted l-substrings in a k-string
    (1 for k = l)
  • very good approximation for $r^{-l} (k - l) \ll 1$
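
The quality of the approximation $p(l, k) = 1 - (1 - r^{-l})^{k-l+1}$ can be checked against brute-force enumeration for short strings. The sketch below is an illustration only; the function names are hypothetical, and the enumeration averages uniformly over all strings x of length l and y of length k.

```python
from itertools import product

def p_match(l, k, r=2):
    """Approximate matching probability, neglecting correlations between overlaps."""
    if l > k:
        return 0.0
    return 1.0 - (1.0 - r ** (-l)) ** (k - l + 1)

def p_exact(l, k, r=2):
    """Exact fraction of pairs (x, y), |x| = l and |y| = k, with x a substring of y."""
    if l > k:
        return 0.0
    total = 0
    for y in product(range(r), repeat=k):
        # the number of l-strings x embedded in y equals the number
        # of distinct length-l windows of y
        total += len({y[a:a + l] for a in range(k - l + 1)})
    return total / (r ** l * r ** k)

k = 8
for l in range(1, k + 1):
    print(f"l = {l}: exact {p_exact(l, k):.4f}, approx {p_match(l, k):.4f}")
```

For k = l both expressions reduce to $r^{-l}$, and the agreement improves as l grows, in line with the condition $r^{-l}(k - l) \ll 1$.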
34
Computing the matching probabilities
  • strings x and y, of lengths l and $k \geq l$
  • $y_{a,l}$: substring of y, of length l, that has been shifted by a
  • $U(x, y_{a,l})$: Hamming distance between x and $y_{a,l}$
    (U = 0: match; $U \neq 0$: no match)
  • $1 - f_a(x, y; \beta) = 1 - \exp[-\beta U(x, y_{a,l})] \to 0$ or 1
    for $\beta \to \infty$ (counts no-matches)
  • $p(l, k \mid x, \beta) = 1 - (\text{number of no-matches} / r^k)$, summed over y
  • $p(l, k \mid x, \beta) = 1 - r^{-k} \sum_y \prod_{0 \leq a \leq k-l} [1 - f_a(x, y; \beta)]$
    (all no-match for any shift a)
  • Cluster expansion. Do x averages: 2-pt averages over the f
    factorise; approximate all higher orders with factorised ones
  • for $k \geq l$ get $p(l, k) = 1 - (1 - z^l)^{k-l+1}$,
    $z = [1 + (r - 1) e^{-\beta}] / r$
  • $p(l, k) \simeq z^l (k - l + 1)$ for l large
35
matching probability for r = 2
$p(l, k) = 1 - (1 - 2^{-l})^{k-l+1} \simeq 2^{-l} (k - l + 1)$ for $l \leq k$;
0 otherwise
[Figure: p(l, k) versus l; symbols: exact enumeration, solid line: the above expression. Curves for embedding string length k = 16, 14, 12, 10, 8, 6, 4, 2 from top to bottom, $k \geq l$]
36
2. Understanding the sequence matching data
Matching l with d: long genes → small degree
37
3. Calculation of the out-degree distribution
  • number of out-edges from a randomly chosen gene of length l
    to genes of length k: $X_{lk} = \sum_\alpha X_{lk}(\alpha)$
  • $\alpha$: different realisations of genes of length k
  • $X_{lk}(\alpha)$ independent random variables, so $X_{lk}$ is
    binomially distributed with probability p(l, k);
    Poisson for small p(l, k), i.e. large l
  • total number of out-edges from a randomly chosen gene of
    length l: $X_l = \sum_k X_{lk}$
  • Gaussian distributed via the Central Limit Theorem, with mean
    $\langle X_l \rangle$ and variance $\langle X_l^2 \rangle - \langle X_l \rangle^2$
  • $X_l$ Poisson for large l
38
mean out-degree for genes of length l, for the
model with exponential gene length distribution
  • $\langle n(k) \rangle = L p^2 q^k$, $q = 1 - p$:
    probability of a coding element
  • $d_l \equiv \langle X_l \rangle = \sum_{k \geq l} \langle X_{lk} \rangle = \sum_{k \geq l} p(l, k) \langle n(k) \rangle = L p (qz)^l / (p + q z^l) \sim (qz)^l$
  • variance of the out-degree distribution for length l:
    $\sigma_l^2 = \langle X_l^2 \rangle - \langle X_l \rangle^2 = d_l \, p (1 - z^l) / [1 - q (1 - z^l)^2] \to d_l$ for large l
  • ⇒ for large l, $d_l \simeq \sigma_l^2$: Poissonian
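
The closed form for the mean out-degree, $d_l = L p (qz)^l / (p + q z^l)$, follows from summing a geometric series, and it can be verified numerically against the direct sum $\sum_{k \geq l} p(l, k) \langle n(k) \rangle$. In the sketch below the parameter values (L = 15000, p = 0.05, z = 1/2) correspond to the binary, perfect-matching case quoted in the simulations; the function names and the truncation of the sum at k = 5000 are assumptions.

```python
import math

def n_genes(k, L, p):
    """Expected number of genes of length k: L p^2 q^k with q = 1 - p."""
    return L * p ** 2 * (1.0 - p) ** k

def d_direct(l, L, p, z, k_max=5000):
    """Direct sum over k >= l of p(l, k) n(k), truncated at k_max (assumption)."""
    return sum((1.0 - (1.0 - z ** l) ** (k - l + 1)) * n_genes(k, L, p)
               for k in range(l, k_max))

def d_closed(l, L, p, z):
    """Closed form d_l = L p (q z)^l / (p + q z^l)."""
    q = 1.0 - p
    return L * p * (q * z) ** l / (p + q * z ** l)

L, p, z = 15000, 0.05, 0.5
for l in range(1, 13):
    print(f"l = {l:2d}: direct {d_direct(l, L, p, z):10.4f}, "
          f"closed {d_closed(l, L, p, z):10.4f}")
```

The two columns agree to numerical precision, and for large l successive values shrink by a factor close to qz, as expected from $d_l \sim (qz)^l$.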
39
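out-degree distribution for small l (large d):
scaling behaviour of the envelope
  • $h_l \sigma_l \simeq n(l)$, so $h_l \simeq n(l) / \sigma_l = L p^2 q^l / d_l^{1/2}$
  • $d_l \sim (qz)^l$ ⇒ $h_l \sim (q/z)^{l/2}$
  • $h(d) \sim d^{-\mu}$: $(qz)^{-\mu} = (q/z)^{1/2}$ gives
    $\mu = \frac{1}{2} (\ln z - \ln q) / (\ln z + \ln q)$
  • ⇒ $\mu \simeq \frac{1}{2} - p / \ln r$ (for z = 1/r and small p)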
40
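out-degree distribution for large l (small d)
  • $P(X_l = d) = d_l^{\,d} \exp(-d_l) / d!$ (Poisson)
  • $P(d) = \sum_l n(l) P(X_l = d) = L p \sum_l p q^l \, d_l^{\,d} \exp(-d_l) / d! \propto \int_0^\infty dx \, x^{d - \mu - 1/2} e^{-x} / d!$ for large l
  • $P(d) \sim \Gamma(d + \frac{1}{2} - \mu) / \Gamma(d + 1) \sim d^{-\mu - 1/2}$ ($\Gamma$: Gamma function)
  • where $\mu = \frac{1}{2} (\ln z - \ln q) / (\ln z + \ln q) \simeq \frac{1}{2} - p / \ln r$
  • Scaling exponent $\gamma_1 \equiv \mu + \frac{1}{2} \simeq 1 - p / \ln r$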
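
To put numbers on the two exponents, the sketch below evaluates them for the simulation parameters p = 0.05, r = 2 with perfect matching. It assumes the envelope exponent in the form $\mu = \frac{1}{2}(\ln z - \ln q)/(\ln z + \ln q)$, which is what follows from $h_l = n(l)/\sigma_l$ with $\sigma_l^2 \simeq d_l \sim (qz)^l$, together with $z = [1 + (r-1)e^{-\beta}]/r$; the function name is hypothetical.

```python
import math

def scaling_exponents(p, r, beta=math.inf):
    """Return (mu, gamma_1): envelope exponent at large d and power-law exponent at small d."""
    q = 1.0 - p
    z = (1.0 + (r - 1) * math.exp(-beta)) / r   # z = 1/r for perfect matching
    ln_z, ln_q = math.log(z), math.log(q)
    mu = 0.5 * (ln_z - ln_q) / (ln_z + ln_q)
    gamma_1 = ln_z / (ln_z + ln_q)              # identical to mu + 1/2
    return mu, gamma_1

p, r = 0.05, 2
mu, gamma_1 = scaling_exponents(p, r)
print(f"mu      = {mu:.3f}  (small-p form 1/2 - p/ln r = {0.5 - p / math.log(r):.3f})")
print(f"gamma_1 = {gamma_1:.3f}  (small-p form 1 - p/ln r   = {1.0 - p / math.log(r):.3f})")
```

Under these assumptions the envelope exponent falls inside the range $0.45 \pm 0.06$ quoted for the simulated peak maxima, and the small-d exponent is close to the value of about 0.9 seen in the simulations.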
41
out-degree distribution: finite size effects
  • dotted: full Gaussian distribution taken for $P(X_l = d)$
  • solid lines: finite size correction
    $d_l^{out} = (\sigma_l^{out})^2$, $P(X_l = d)$ Poisson
Thus both for large and small l,
$P(d) = L p \sum_l p q^l \, d_l^{\,d} \exp(-d_l) / d!$
provides a good representation
42
Note: either for $\beta \to 0$ or for a unique letter
(r = 1), the out-degree distribution is simply
controlled by the length distribution, in which
case the scaling exponent becomes -1 !!
43
4. Crossover in the scaling behaviour
  • peaks well separated for $l < l_c \simeq 8$
  • $d_l \sim (q/2)^l$; $\sigma_l \simeq \sqrt{d_l} \to 0$ slower than $d_l$
  • crossover occurs where $d_l - d_{l+1} \simeq \sigma_l$
  • More precisely, requiring that the minimum between the two
    Gaussian peaks centered at $d_l$ and $d_{l+1}$ vanish
    gives $d_c \simeq 6.6$
44
in-degree distribution: superposition of
two peaks
  • first peak: $A (z - z_0)^2 \exp[-B (z - z_0)]$
  • second peak: Gaussian; circular very near the maximum
5. Simulation and analytical results: the
in-degree distribution
Solid line: finite size effect taken care of by
inserting $d_l^{in} = (\sigma_l^{in})^2$
46
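The in-degree distribution
The second peak can be obtained accurately from
  • $d_l^{in} = \sum_{k \leq l} n(k) \, p(k, l)$
  • $(\sigma_l^{in})^2 = \sum_{k \leq l} n(k) \, p(k, l) [1 - p(k, l)]$
  • $p(d^{in}) = \sum_l p q^l \, [2\pi (\sigma_l^{in})^2]^{-1/2} \exp[-(d - d_l^{in})^2 / 2 (\sigma_l^{in})^2]$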
47
modelling gene interactions
A. Kabakcioglu, M. Mungan, D. Balcan, AE, preprint
  • sequence matching also operates in the case of
    transcriptional gene interaction?
  • claim: secondary structures (conformations) of
    transcription factors are determined by their
    amino acid sequence, coded for by the
    corresponding DNA sequence
  • the different folds expose precise regulatory sites, which
    are recognized by regulatory sequences on the genome?
  • !

48
Experimental data from expression of mRNA in DNA array
M. Gustafsson, M. Hörnquist, A. Lombardi,
Large-scale reverse engineering by Lasso,
q-bio.MN/040312. On data from P.T. Spellman et
al., Mol. Biol. Cell 9, 3273 (1998), from
microarray experiments
  • Yeast data
  • (Saccharomyces cerevisiae)

49
Expected model out-degree distribution, with
Gaussian RS length distribution
50
Model with a Gaussian RS length distribution:
single realisation, adjustable parameters
$\langle l \rangle$, $\sigma_l$, and yeast data

51
Comparison of network of a single realisation of
the model chromosome and yeast microarray
experiment
52
Consensus data (http://cgsigma.cshl.org) for the
length distribution of Regulatory Segments
RS length: Gaussian distribution with parameters
fixed by comparison with the out-degree of yeast data
53
Single realisation for two independent sets of
Regulatory Sequences associated with each node
of the network: $S_i$, $S_i'$
Connectivity rule: $S_i \subseteq S_j'$
Note: expected distributions will not change
54
Advances in Artificial Life: 5th Eur. Conf.
(ECAL99), Vol. 1674, LNAI, Springer
promoter seq. of length p 4 2
NL/4 p
Thanks Chrisantha!
55
averaged over 20 genomes - oscillatory behavior
from superposition of Poisson peaks
56
Evolution of gene networks by gene duplication
Wagner, PNAS 91, 4387 (1994);
Vazquez, Flammini, Maritan and Vespignani,
cond-mat/0108043; Solé, Pastor-Satorras, Smith
and Kepler, Adv. Complex Syst. 5, 43 (2002)
  • take random network
  • duplicate gene with connections
  • take out the connections with prob. $\delta$ and
    establish new connection to random node with
    probability $\alpha$
  • ⇒ scale free proteomic model
  • $\gamma \approx 2.5$; C and minimum path length compare
    well with data
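
The duplication and divergence steps listed above can be sketched as a simple growth process. This is a minimal illustration under stated assumptions rather than the cited authors' implementation: the seed network is a single linked pair instead of a random network, `delta` is the probability of removing each inherited link, `alpha` is the per-node probability of adding a new link, and all names and parameter values are hypothetical.

```python
import random

def duplication_divergence(n_final, delta, alpha, seed=0):
    """Grow an undirected network by node duplication followed by link loss and gain."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}                    # assumption: seed network of two linked nodes
    while len(adj) < n_final:
        parent = rng.choice(list(adj))
        child = len(adj)
        # duplication: the copy inherits each parent link, lost with probability delta
        adj[child] = {v for v in adj[parent] if rng.random() >= delta}
        for v in adj[child]:
            adj[v].add(child)
        # divergence: new links to other existing nodes, each with probability alpha
        for v in range(child):
            if v not in adj[child] and rng.random() < alpha:
                adj[child].add(v)
                adj[v].add(child)
    return adj

net = duplication_divergence(n_final=500, delta=0.5, alpha=0.002)
degrees = sorted((len(nb) for nb in net.values()), reverse=True)
print("largest degrees:", degrees[:5], "| isolated nodes:", degrees.count(0))
```

Because copies inherit the neighbors of their parent, well-connected nodes tend to gain links faster, which is the mechanism behind the broad degree distributions reported for this class of models.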

57
Sequence similarity
58
Gaussian network, evolution by duplication of
randomly chosen RSs, mutation (Yasemin Sengün)
59
  • Summary
  • random gene interaction network model with
    sequence matching for
  • - arbitrary alphabet
  • - finite temperature (partial matching)
  • out-degree distribution: power law for small d
  • - log-periodic for large d
  • exponents $\gamma_1 \simeq 1 - p / \ln r$, $\mu \simeq 0.5 - p / \ln r$,
    universal for small p
  • single realisations compare well with experiment
  • not scale free - crossover behaviour?