Title: Motif Mining from Gene Regulatory Networks
1. Motif Mining from Gene Regulatory Networks
- Based on the publications of Uri Alon's group
- Presented by Pavlos Pavlidis
- Tartu University, December 2005
2. Gene Regulatory Networks
- From Wikipedia: A gene regulatory network is a collection of DNA
segments in a cell which interact with each other
and with other substances in the cell, thereby
governing the rates at which genes in the network
are transcribed into mRNA.
- From DOE: Gene regulatory networks (GRNs) are the on-off
switches and rheostats that dynamically orchestrate
the level of expression for each gene.
3. Why can networks regulate gene expression?
- U. Alon and his group stress the importance of
the building blocks of the network.
- These building blocks are called motifs.
4. Motifs
- Motifs are also called n-node subgraphs of a
directed graph.
- (The work has also been extended to undirected
graphs.)
- A motif is characterized by its number of nodes n
and by the relations between them, i.e. the directed
edges.
5. The 13 different 3-node subgraphs
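The count of 13 can be checked by brute force. Below is a minimal Python sketch, assuming simple directed graphs (no self-loops, at most one edge in each direction) and treating two subgraphs as the same when some relabelling of the nodes maps one onto the other. Only connected subgraphs (ignoring edge direction) are kept. The function names are hypothetical and are not taken from mfinder.

```python
from itertools import permutations, product

NODES = (0, 1, 2)
# All 6 possible directed edges between distinct nodes (no self-loops).
PAIRS = [(i, j) for i in NODES for j in NODES if i != j]

def is_weakly_connected(edges):
    """True if all 3 nodes form one component when edge direction is ignored."""
    adj = {n: set() for n in NODES}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(NODES)

def canonical(edges):
    """Smallest relabelled edge list over all node permutations (an isomorphism-class key)."""
    return min(
        tuple(sorted((p[i], p[j]) for i, j in edges))
        for p in permutations(NODES)
    )

def three_node_subgraph_classes():
    """Enumerate all 2^6 edge sets and keep one key per connected isomorphism class."""
    classes = set()
    for mask in product((0, 1), repeat=len(PAIRS)):
        edges = [e for e, bit in zip(PAIRS, mask) if bit]
        if is_weakly_connected(edges):
            classes.add(canonical(edges))
    return classes

if __name__ == "__main__":
    print(len(three_node_subgraph_classes()))  # 13
```

Of the 16 non-isomorphic directed graphs on 3 nodes, the 3 disconnected ones (no edges, a single edge, a single mutual pair) are dropped, which leaves 13.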
6. Feed-Forward Loop
In a feed-forward loop (FFL), X regulates Y, and both X and Y regulate Z.
The loop rapidly regulates the production of Z.
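Finding FFLs in a network amounts to finding node triples X, Y, Z connected by exactly the edges X->Y, X->Z and Y->Z. A minimal Python sketch, assuming the network is given as a list of (source, target) pairs and that a triple counts only when these three edges are the only edges among its nodes (an induced subgraph); `count_ffls` is a hypothetical name. Each distinct node set is counted once.

```python
from itertools import permutations

def count_ffls(edges):
    """Count feed-forward loops X->Y, X->Z, Y->Z.

    A triple counts only if these are the only edges among its three
    nodes (induced subgraph); each node set is counted at most once.
    """
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    found = set()
    for x, y, z in permutations(nodes, 3):
        # Every directed edge present among the three nodes.
        present = {(a, b) for a, b in permutations((x, y, z), 2) if (a, b) in edge_set}
        if present == {(x, y), (x, z), (y, z)}:
            found.add(frozenset((x, y, z)))
    return len(found)

print(count_ffls([("X", "Y"), ("X", "Z"), ("Y", "Z")]))  # 1
```

Enumerating all ordered triples is cubic in the number of nodes, which is fine for a sketch but not for large networks.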
7. Which motifs are they interested in?
- Not in biologically significant ones:
- They don't know a priori whether a motif is
biologically significant.
- They can, however, calculate statistical significance.
- The probability P that a randomized network
contains the same number or more instances of a
particular motif than the real network must be
smaller than 0.01.
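With a finite set of randomized networks, P can be estimated as the fraction of them that contain at least as many instances of the motif as the real network. A minimal Python sketch, assuming the motif counts have already been computed for each randomized network; `empirical_p_value` is a hypothetical name.

```python
def empirical_p_value(n_real, random_counts):
    """Fraction of randomized networks with at least n_real instances of the motif."""
    if not random_counts:
        raise ValueError("need at least one randomized network")
    hits = sum(1 for c in random_counts if c >= n_real)
    return hits / len(random_counts)

print(empirical_p_value(10, [3, 5, 12, 10]))  # 0.5
```

If none of the randomized networks reaches n_real, the estimate is 0, which in practice only bounds P from above by 1 divided by the number of randomized networks.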
8. Randomized Network
- A randomized network is not completely
random. It preserves some properties of the real network:
- The same number of nodes as in the real network
- For each node, the same number of incoming and
outgoing edges as in the real network
9. Representation of the network as a matrix M
- Randomization: select at random two cells that
are 1, e.g. A = M(1,3) and B = M(2,1). If M(1,1) and
M(2,3) are 0, swap them (the two 1s become 0 and the
two 0s become 1).
- Goal: the randomized network must have the same
column sums and row sums as the real network.
- Columns: the number of outgoing edges
- Rows: the number of incoming edges
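The swap step above can be sketched in Python as follows. This assumes the network is stored as a list-of-lists 0/1 matrix using the slide's convention (row sums = incoming edges, column sums = outgoing edges) and that self-loops, i.e. diagonal cells, are not allowed. `switch_randomize` and its parameters are hypothetical names, and the cap on attempts is an arbitrary choice.

```python
import random

def switch_randomize(M, n_swaps, seed=None, allow_self_loops=False):
    """Return a randomized copy of the 0/1 matrix M built from 2x2 switches.

    A switch turns M[r1][c1] = M[r2][c2] = 1, M[r1][c2] = M[r2][c1] = 0
    into the opposite pattern, so every row sum and column sum is kept.
    """
    rng = random.Random(seed)
    M = [row[:] for row in M]  # do not modify the caller's matrix
    ones = [(r, c) for r, row in enumerate(M) for c, v in enumerate(row) if v]
    if len(ones) < 2:
        return M
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:  # arbitrary attempt cap
        attempts += 1
        i, j = rng.sample(range(len(ones)), 2)
        (r1, c1), (r2, c2) = ones[i], ones[j]
        if r1 == r2 or c1 == c2:
            continue  # same row or column: the switch would change nothing
        if not allow_self_loops and (r1 == c2 or r2 == c1):
            continue  # a new 1 would land on the diagonal (a self-loop)
        if M[r1][c2] or M[r2][c1]:
            continue  # both target cells must be 0
        M[r1][c1] = M[r2][c2] = 0
        M[r1][c2] = M[r2][c1] = 1
        ones[i], ones[j] = (r1, c2), (r2, c1)
        done += 1
    return M
```

Each accepted switch moves one 1 within row r1 and one within row r2, and likewise within columns c1 and c2, which is why the in- and out-degree of every node stays fixed.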
10. One more requirement
If we are looking for n-node subgraphs, then the number of
(n-1)-node subgraphs must be the same in the real and the
randomized networks. This avoids assigning high
significance to a structure only because it contains a
highly significant substructure.
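For 3-node motifs, the (n-1)-node subgraphs are the two connected 2-node patterns: a one-way edge and a mutual (bidirectional) pair. Degree-preserving switches already keep the total number of edges, so keeping the number of mutual pairs fixed keeps the number of one-way pairs fixed as well. The Python sketch below simply rejects any switch that would change the mutual-pair count; this rejection rule is an assumption for illustration, not necessarily the procedure used by the authors, and the function names are hypothetical.

```python
import random

def count_two_node_patterns(edges):
    """Return (one_way, mutual): counts of the two connected 2-node subgraphs."""
    edges = set(edges)
    mutual = sum(1 for a, b in edges if (b, a) in edges) // 2
    return len(edges) - 2 * mutual, mutual

def randomize_keep_two_node_counts(edges, n_swaps, seed=None):
    """Degree-preserving switches that also keep the 2-node subgraph counts fixed.

    Assumes a simple directed graph given as (source, target) pairs, no self-loops.
    """
    rng = random.Random(seed)
    edges = set(edges)
    target = count_two_node_patterns(edges)
    done = 0
    for _ in range(100 * n_swaps):  # arbitrary cap on attempts
        if done == n_swaps or len(edges) < 2:
            break
        (a, b), (c, d) = rng.sample(sorted(edges), 2)
        # a->b, c->d become a->d, c->b: every in- and out-degree is unchanged.
        # Requiring four distinct nodes rules out self-loops and no-op switches.
        if len({a, b, c, d}) < 4 or (a, d) in edges or (c, b) in edges:
            continue
        trial = (edges - {(a, b), (c, d)}) | {(a, d), (c, b)}
        if count_two_node_patterns(trial) != target:
            continue  # the switch would change the number of mutual pairs
        edges = trial
        done += 1
    return edges
```

Looking for 4-node motifs would need the 3-node subgraph counts preserved instead, which a simple rejection rule like this one handles poorly; the sketch only covers the 3-node case.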
11. Significance of a motif
- Three requirements:
- P < 0.01, where P was estimated (or bounded) by using 1000
randomized networks.
- The number of times the motif appears in the real
network with distinct sets of nodes is at least U = 4.
- The number of appearances in the real network is
significantly larger than in the randomized
networks: Nreal - Nrand > 0.1 Nrand, where Nrand is the
mean count over the randomized networks (Why?).
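The three requirements can be combined into a single check. A minimal Python sketch, assuming the real count already uses distinct node sets and that the motif has been counted in each randomized network; `is_network_motif` and its parameter names are hypothetical, with defaults set to the thresholds listed above.

```python
from statistics import mean

def is_network_motif(n_real, random_counts, p_max=0.01, u_min=4, min_excess=0.1):
    """Apply the three criteria: P < p_max, n_real >= u_min, and
    n_real - mean(random_counts) > min_excess * mean(random_counts)."""
    if not random_counts:
        raise ValueError("need at least one randomized network")
    # Empirical P: fraction of randomized networks with at least n_real instances.
    p = sum(1 for c in random_counts if c >= n_real) / len(random_counts)
    n_rand = mean(random_counts)
    return p < p_max and n_real >= u_min and (n_real - n_rand) > min_excess * n_rand

print(is_network_motif(40, [10] * 1000))   # True
print(is_network_motif(105, [100] * 1000)) # False: excess 5 is not above 0.1 * 100
```

The second call shows the role of the third criterion: a pattern can be beaten by no randomized network at all (P = 0) and still be rejected because it exceeds the random mean by only 5%.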
13. What did they find?
- In biological systems such as E. coli and
S. cerevisiae, only certain types of motifs
are statistically significant.
- They also studied other systems, such as:
- Food webs: a database of seven ecosystem food webs
- Neuronal networks: the neural system of C. elegans
- The WWW
- In these systems, OTHER KINDS OF MOTIFS WERE STATISTICALLY
IMPORTANT.
14. FFL, SIM, DOR (feed-forward loop, single-input module, dense overlapping regulons)
15. FFL
- Biological example: the L-arabinose utilization system of E. coli
- Crp is the general transcription factor and AraC
is the specific transcription factor.
16. The real model
17. FFL
- Coherent
- Incoherent
- Important for the speed of response
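Whether an FFL is coherent or incoherent depends on the signs of its three edges: it is coherent when the direct path X -> Z has the same overall sign as the indirect path X -> Y -> Z, and incoherent otherwise. A minimal Python sketch of that rule, assuming activation is encoded as +1 and repression as -1 (the function name and the encoding are hypothetical):

```python
def ffl_type(sign_xy, sign_xz, sign_yz):
    """Classify an FFL (X->Y, X->Z, Y->Z) from its edge signs (+1 activation, -1 repression)."""
    for s in (sign_xy, sign_xz, sign_yz):
        if s not in (1, -1):
            raise ValueError("edge signs must be +1 or -1")
    # The sign of the indirect path X->Y->Z is the product of its two edge signs.
    return "coherent" if sign_xz == sign_xy * sign_yz else "incoherent"

print(ffl_type(1, 1, 1))   # coherent: all three edges are activations
print(ffl_type(1, 1, -1))  # incoherent: X activates Z directly but represses it via Y
```

Of the 8 possible sign combinations, exactly 4 are coherent and 4 incoherent, because for each choice of signs on X -> Y and Y -> Z only one sign on X -> Z matches the indirect path.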
18. Software
- mDraw: network visualization tool (with mfinder and a
network motifs visualization tool embedded)