Title: Gene Network Modeling
2 Gene Network Modeling
- Prof. Yasser Kadah
- Eng. Fadhl Al-Akwaa
3 OUTLINE
- What is a Gene Regulatory Network (GRN)?
- Application of GRN
- GRN Construction Methodology
- GRN Modeling Steps
- GRN Models
- GRN Software
- Future Work
- References
4 From the Last Lecture
- DNA sequence alphabet: A, T, C, G
- Example: ATCGAATCGA
- Protein sequence alphabet: the 20 letters that remain after excluding B, J, O, U, X, Z
- Example: KMLSLLMARTYW
5 The Central Dogma: Protein Synthesis
[Figure: Genome -> Transcriptome -> Proteome -> Cell Function]
7 Important Challenges in Bioinformatics
- Protein function
- Protein 3D structure
- Gene prediction
- Gene function
8 Public Databases
- DNA sequence: A, T, C, G
- Protein sequence: KMLSLLMARTYW
- Microarray: gene expression level
9 Gene Expression
10 Microarray Technology
11-13 [Figure, built up over three slides, with the labels Transcription Rate, Gene Expression Level, Translation Rate, and Protein Level; the last build adds four "?" marks]
15 What is a Gene Regulatory Network (GRN)?
[Figure: network of Gene A, Gene B, Gene C, and Gene D]
16 GRN: An Example from Fission Yeast
Lackner DH, 2007
http://www.sanger.ac.uk/Info/News-releases/2007/070413.shtml
17 http://en.wikipedia.org/wiki/Metabolic_network_modelling
18 http://www.enm.bris.ac.uk/anm/summerschools/complexity/imagery/191.html
20 Why Build a Gene Network? Functional Genomics
- Gene networks allow researchers to make predictions about gene function that can then be tested at the bench.
- The focus is gradually shifting to functional genomics.
21 Application of GRN: Translational Genomics
- We can study the effects of a compound (such as a drug) on the expression level of many genes.
- The mission of translational genomics is to translate genomic discoveries into advances in human health.
22 Application of GRN: Understanding Experimental Data
- Biologists expect powerful computational tools to extract functional information from experimental data.
23 GRN Model Objective
- Construct a gene network model that
- describes known gene interactions well
- predicts interactions not known so far
- allows for simulation of drug effects
- helps in understanding the etiology of the disease
25 GRN Construction Methodology
- Forward Engineering
- Inverse Engineering Traditional methodology
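26 Forward Engineering
Hard
27 Reverse Engineering
[Figure: gene network model and microarray data. Going from the model to the data is the forward problem (possible); going from the data to the model is the inverse problem (very difficult).]
28 Reverse Engineering
[Figure: Boolean networks and Boolean data; both directions are labeled "easy".]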
29 Data Required: DNA Microarray
[Figure: microarray measurements for gene 1, gene 2, gene 3]
30 Data Required: Gene Expression Matrix
     t1  t2  t3  t4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
31 Data Required: Gene Expression Matrix
Time series
     t1  t2  t3  t4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
Snapshot
     a1  a2  a3  a4
g1    0   3   1   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
33 Overview of Steps in Modeling and Control of Probabilistic Boolean Networks (Ranadip Pal, 2007)
[Figure: pipeline flowchart with panels A1-A3 and B-H. Recoverable stage labels:
- Data extraction: microarray image, grid alignment, segmentation, gene expression extraction
- Discretization: hypothesis testing; up-regulated (1), 0, down-regulated (-1)
- Gene selection, with prior biological knowledge
- Network generation: seed algorithm, BN generation, PBN steady state matched
- Control of network: (I) penalty assignment, (II) formulation of optimal control problem, dynamic programming, design of optimal control policy, application of stationary policy; original steady state and steady state using control]
34 GRN Modeling Steps: Discretization
[Figure: expression profiles of gene 1, gene 2, gene 3]
- Assume that genes exist in two states: on and off.
- If the expression of gene i is above a threshold ti, consider it on; otherwise, consider it off.
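The thresholding rule above can be sketched in a few lines of Python. This is only an illustration: the expression values and the per-gene thresholds are hypothetical, and the slides do not say how the thresholds ti are chosen.

```python
import numpy as np

def discretize_binary(expr, thresholds):
    """Return 1 where a gene's expression exceeds its threshold, else 0.

    expr: array of shape (genes, timepoints)
    thresholds: array of shape (genes,), one threshold per gene (assumed given)
    """
    expr = np.asarray(expr, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Broadcast each gene's threshold across that gene's row.
    return (expr > thresholds[:, None]).astype(int)

# Hypothetical expression values for three genes at four time points.
expr = np.array([[0.2, 0.4, 1.8, 2.1],
                 [1.5, 0.3, 0.2, 1.9],
                 [2.2, 2.0, 0.1, 0.3]])
thresholds = np.array([1.0, 1.0, 1.0])  # assumed values, not from the slides
bits = discretize_binary(expr, thresholds)
print(bits)
```

Each row of `bits` is the on/off bit stream of one gene.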
35-38 GRN Modeling Steps: Discretization
[Figure, built up over four slides: expression profiles of gene 1, gene 2, gene 3 with thresholds t1, t2, t3; samples are labeled "on" or "off" relative to the threshold.]
39 GRN Modeling Steps: Discretization
- We obtain the following discretized gene expression data:
time     0  5 10 15 20 25 30 35 40 45 50 55
gene 1   0  0  0  0  0  0  1  1  1  1  1  1
gene 2   0  0  0  0  0  0  0  1  1  0  0  0
gene 3   1  1  1  1  1  1  1  0  0  0  0  0
- The gene expression data are now in the form of bit streams.
40 GRN Modeling Steps: Discretization
- Alternatively, assume that genes exist in three states:
- Up-regulated: 1
- Unchanged: 0
- Down-regulated: -1
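A three-state discretization could look like the sketch below. It assumes the input is a log-ratio against a reference and uses a fixed symmetric cutoff; both are assumptions made for illustration. The overview slide mentions hypothesis testing for this step, which the fixed cutoff here stands in for.

```python
import numpy as np

def discretize_ternary(log_ratio, cutoff=1.0):
    """Map log-ratios to 1 (up-regulated), 0 (unchanged), -1 (down-regulated).

    cutoff is an assumed symmetric threshold, not a value from the slides.
    """
    log_ratio = np.asarray(log_ratio, dtype=float)
    states = np.zeros(log_ratio.shape, dtype=int)
    states[log_ratio >= cutoff] = 1
    states[log_ratio <= -cutoff] = -1
    return states

# Hypothetical log-ratios for six measurements.
log_ratio = np.array([1.72, 2.25, 0.94, -1.56, 0.10, -1.00])
states = discretize_ternary(log_ratio)
print(states)
```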
41 GRN Modeling Steps: Gene Selection (Clustering)
     a1  a2  a3  a4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
42 Clustering Steps: Correlation
- Choose a similarity metric to compare the transcriptional responses or the expression profiles:
- Pearson correlation
- Spearman correlation
- Euclidean distance
43 Clustering Steps: Correlation Algorithm
- Correlation coefficients take values from -1 to 1, with 1 indicating similar behavior, -1 indicating opposite behavior, and 0 indicating no direct relation.
      g1     g2     g3     g4     g5
g1     1   0.23   0.00   0.95  -0.63
g2    -1      1   0.91   0.56   0.56
g3     0   0.23      1   0.32   0.77
g4     1    0.5   0.56      1  -0.36
g5    -1   0.91   0.32    0.4      1
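As an illustration of the Pearson option, the sketch below computes the gene-by-gene correlation matrix for the four-gene expression matrix shown on the gene selection slide. It uses `numpy.corrcoef`, which treats each row as one variable. The resulting numbers are not those of the five-gene table above, whose underlying expression data are not given.

```python
import numpy as np

# Expression matrix from the gene selection slide (rows: genes, columns: a1..a4).
expr = np.array([
    [0, 1, 2, 1],  # g1
    [1, 2, 1, 0],  # g2
    [0, 1, 1, 1],  # g3
    [1, 2, 1, 0],  # g4
], dtype=float)

# Pearson correlation between every pair of rows (genes).
corr = np.corrcoef(expr)
print(np.round(corr, 2))
```

Genes g2 and g4 have identical profiles, so their correlation is 1; g1 and g2 are uncorrelated under this metric.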
44 Clustering Steps: Clustering Algorithm
- Choose a clustering algorithm:
- Hierarchical
- K-means
45 Hierarchical Clustering
      g1     g2     g3     g4     g5
g1         0.23   0.00   0.95  -0.63
g2                0.91   0.56   0.56
g3                       0.32   0.77
g4                             -0.36
g5
- Find the largest value in the similarity matrix.
- Recompute the matrix and iterate.
46 Hierarchical Clustering
         g1,g4     g2     g3     g5
g1,g4            0.37   0.16  -0.52
g2                      0.91   0.56
g3                             0.77
g5
- Find the largest value in the similarity matrix.
- Recompute the matrix and iterate.
47 Hierarchical Clustering
         g1,g4   g2,g3     g5
g1,g4             0.27  -0.52
g2,g3                    0.68
g5
- Find the largest value in the similarity matrix.
- Recompute the similarity matrix and iterate.
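The "find the largest value, merge, recompute" loop can be sketched as follows, using the pairwise similarities from the first hierarchical clustering table. The recompute step here is plain average linkage, which is an assumption: it reproduces the merge order on the slides (g1 with g4, then g2 with g3), but some of the slides' recomputed values (for example 0.37 and -0.52) differ slightly from it, so the slides may use a different update rule.

```python
import itertools

def average_linkage(sim, labels):
    """Repeatedly merge the most similar pair of clusters.

    sim: dict mapping frozenset({a, b}) -> similarity between two genes
    labels: list of gene names
    Returns the merges as (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = [(name,) for name in labels]
    merges = []

    def cluster_sim(a, b):
        # Average linkage (assumed): mean over all cross-cluster gene pairs.
        pairs = [sim[frozenset((x, y))] for x in a for y in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Step 1: find the largest value in the current similarity matrix.
        a, b = max(itertools.combinations(clusters, 2),
                   key=lambda pair: cluster_sim(*pair))
        merges.append((a, b, cluster_sim(a, b)))
        # Step 2: replace the pair by the merged cluster and iterate.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Upper triangle of the similarity matrix from the slides.
values = {
    ("g1", "g2"): 0.23, ("g1", "g3"): 0.00, ("g1", "g4"): 0.95,
    ("g1", "g5"): -0.63, ("g2", "g3"): 0.91, ("g2", "g4"): 0.56,
    ("g2", "g5"): 0.56, ("g3", "g4"): 0.32, ("g3", "g5"): 0.77,
    ("g4", "g5"): -0.36,
}
sim = {frozenset(pair): value for pair, value in values.items()}
merges = average_linkage(sim, ["g1", "g2", "g3", "g4", "g5"])
for a, b, s in merges:
    print(a, b, round(s, 3))
```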
48 Clustering Example
Eisen et al. (1998), PNAS, 95(25): 14863-14868
49 GRN Modeling Steps: GRN Generation
     t1  t2  t3  t4
g1    0   1   2   1
g2    1   2   1   0
g3    0   1   1   1
g4    1   2   1   0
Statistical Signal Processing Technique
51 GRN Models
- Directed and undirected graphs
- Bayesian networks
- Boolean networks
- Generalized logical networks
- Non-linear ordinary differential equations
- Piecewise linear differential equations
- Qualitative differential equations
- Partial differential equations
- Stochastic master equations
- Rule based formalisms
52 GRN Models
- Hidde de Jong, "Modeling and simulation of genetic regulatory systems: a literature review", J Comput Biol. 2002;9(1):67-103. Review.
[Figure labels: Node States, Computation, Data, Complexity, Dynamics]
53 What Class of Models Should Be Chosen?
- The selection should be made in view of
- data requirements
- goals of modeling and analysis.
54 Classical Tradeoff
- A fine model with many parameters
- may be able to capture detailed low-level phenomena (protein concentrations, reaction kinetics)
- requires very large amounts of data for inference, lest the model be overfit.
- A coarse model with lower complexity
- may succeed in capturing high-level phenomena (which genes are ON/OFF)
- requires smaller amounts of data.
55 Occam's Razor
56 Model Reliability and Adequacy
- P is the set of all possible observations
- S is the set of all observations made on the study system
- M is the set of all model outputs
- Q = S ∩ M
[Figure: Venn diagram of S and M inside P, with overlap Q]
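57 Model Reliability and Adequacy
[Figure: two Venn diagrams of S and M inside P, captioned "Useless Model" and "Dream Model"]
58 Model Reliability and Adequacy
[Figure: two Venn diagrams of S, M, and Q inside P, captioned "Complete, but erring model" and "Incomplete model"]
- Model reliability = |Q| / |M|
- Model adequacy = |Q| / |S|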
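A small worked example of the two ratios, using Python sets. The element names are hypothetical placeholders for individual observations; only the definitions Q = S ∩ M, reliability = |Q|/|M|, and adequacy = |Q|/|S| come from the slides.

```python
# Hypothetical observation labels, for illustration only.
S = {"obs_a", "obs_b", "obs_c", "obs_d"}   # observations made on the system
M = {"obs_c", "obs_d", "obs_e"}            # outputs produced by the model

Q = S & M                                  # observations the model reproduces
reliability = len(Q) / len(M)              # share of model outputs that are observed
adequacy = len(Q) / len(S)                 # share of observations the model covers

print(sorted(Q), reliability, adequacy)
```

In this example two of the three model outputs are observed (reliability 2/3) and the model covers half of the observations (adequacy 1/2).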
60 Directed and Undirected Graphs
- Probably the most straightforward way to model a GRN
- G = <V, E>
- V: set of vertices
- E: set of edges <i, j>, where i, j ∈ V are the head and tail of the edge
- Additional labels denote positive/negative influence
61 Directed and Undirected Graphs
- Advantages
- Intuitive way of visualization
- Common and well-explored graph algorithms can make biologically relevant predictions about GRNs:
- paths between genes may reveal missing regulatory interactions or provide clues about redundancy
- cycles in the network point at feedback relations
- connectivity characteristics give an indication of the complexity
- loosely connected subgraphs point at functional modules
- Disadvantages
- Time does not play a role
- Too much abstraction: a very simplified model, far from reality
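To illustrate how a standard graph algorithm can point at a feedback relation, the sketch below stores a signed directed graph as nested dictionaries and looks for a cycle with a depth-first search. The gene names, edges, and signs are hypothetical, and the representation is one choice among many.

```python
def find_cycle(graph):
    """Return one directed cycle as a list of nodes, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for target in graph[node]:
            if color[target] == GRAY:     # back edge: target is on the current path
                return stack[stack.index(target):] + [target]
            if color[target] == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Hypothetical network: edge label +1 = activation, -1 = repression.
grn = {
    "geneA": {"geneB": +1},
    "geneB": {"geneC": +1},
    "geneC": {"geneA": -1, "geneD": +1},
    "geneD": {},
}
cycle = find_cycle(grn)

# The product of the edge signs tells whether the loop is positive or negative.
sign = 1
for source, target in zip(cycle, cycle[1:]):
    sign *= grn[source][target]
print(cycle, sign)
```

Here the loop geneA -> geneB -> geneC -> geneA contains one repression, so it is a negative feedback loop.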
62 GRN Models
- Directed and undirected graphs
- Bayesian networks
- Boolean networks
- Generalized logical networks
- Non-linear ordinary differential equations
- Piecewise linear differential equations
- Qualitative differential equations
- Partial differential equations
- Stochastic master equations
- Rule based formalisms
[Slide callout: "More popular and efficient"]
63 Boolean Network Model
- A Boolean network is defined by a set of nodes, V = {x1, x2, ..., xn}, and a list of Boolean functions, F = {f1, f2, ..., fn}.
- Each xk represents the state (expression) of a gene gk, where xk = 1 means the gene is expressed and xk = 0 means the gene is not expressed.
64 Boolean Network
t    0  1  2  3  4
x1   1  1  0  1  1
x2   1  0  0  0  1
x3   1  0  1  1  1
- At any given time, combining the gene states gives a gene activity pattern (GAP).
65 Boolean Network
- Given a GAP at time t, a deterministic function (a set of logical rules) provides the GAP at time t + 1.
t    0  1  2  3  4
x1   1  1  0  1  1
x2   1  0  0  0  0
x3   1  0  1  1  0
[Figure labels: GAP(t) and GAP(t+1) mark two adjacent columns of the table]
66-74 Boolean Network Example
t    0  1  2  3  4
x1   1  1  0  1  1
x2   1  0  0  0  0
x3   1  0  1  1  0
[Figure, built up over nine slides: a wiring diagram in which the states x1, x2, x3 at time t determine x1, x2, x3 at time t + 1 through Boolean functions labeled "or", "nor", and "nand".]
- For a node with k inputs there are 2^(2^k) possible Boolean functions.
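The sketch below simulates a three-gene Boolean network with synchronous updates. The slides name the functions "or", "nor", and "nand" but the transcript does not preserve which inputs feed which function, so the wiring used here is an assumption: it is the assignment of two-input rules that reproduces the time-series table of the example.

```python
def step(state):
    """Synchronously update all three genes from the GAP at time t."""
    x1, x2, x3 = state
    return (
        int(x2 or x3),           # x1(t+1) = x2(t) OR x3(t)     (assumed wiring)
        int(not (x1 or x3)),     # x2(t+1) = NOR(x1(t), x3(t))  (assumed wiring)
        int(not (x1 and x3)),    # x3(t+1) = NAND(x1(t), x3(t)) (assumed wiring)
    )

# Start from the GAP at t = 0 in the table and iterate four times.
trajectory = [(1, 1, 1)]
for _ in range(4):
    trajectory.append(step(trajectory[-1]))

for t, gap in enumerate(trajectory):
    print(t, gap)

# Number of possible Boolean functions for a node with k = 2 inputs.
k = 2
n_functions = 2 ** (2 ** k)
```

Reading the trajectory column by column gives x1 = 1 1 0 1 1, x2 = 1 0 0 0 0, and x3 = 1 0 1 1 0, matching the table.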
75 Boolean Network Example
I. Shmulevich et al., Bioinformatics (2002), 18(2): 261-274
76 Boolean Networks: Summary
- Advantages
- Efficient analysis of large regulatory networks
- Positive/negative feedback cycles can be modeled with BNs
- Disadvantages
- Strong simplifying assumptions: a gene is either on or off, with no in-between states
- The computation time needed to construct large-scale gene networks is very high and often impractical
- Very susceptible to noise
- There are situations where the Boolean idealization is not appropriate and more general methods are required
77 Bayesian Networks
- A gene regulatory network is represented by a directed acyclic graph.
- Vertices correspond to genes.
- Edges correspond to direct influence or interaction.
- For each gene xi, a conditional distribution p(xi | parents(xi)) is defined.
- The graph and the conditional distributions uniquely specify the joint probability distribution.
78 Bayesian Network Example
Conditional distributions: p(x1), p(x2), p(x3 | x2), p(x4 | x1, x2), p(x5 | x4)
p(X) = p(x1) p(x2) p(x3 | x2) p(x4 | x1, x2) p(x5 | x4)
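The factorization above can be checked numerically. The sketch below assumes binary genes and uses made-up conditional probability tables; only the graph structure (which variables condition on which) follows the example.

```python
import itertools

# Hypothetical tables: each entry is P(variable = 1 | parent values).
p_x1 = 0.6
p_x2 = 0.3
p_x3_given_x2 = {0: 0.2, 1: 0.9}
p_x4_given_x1_x2 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.95}
p_x5_given_x4 = {0: 0.05, 1: 0.8}

def bernoulli(p_one, value):
    """Probability of a binary value given the probability of 1."""
    return p_one if value == 1 else 1.0 - p_one

def joint(x1, x2, x3, x4, x5):
    """p(X) = p(x1) p(x2) p(x3 | x2) p(x4 | x1, x2) p(x5 | x4)."""
    return (bernoulli(p_x1, x1)
            * bernoulli(p_x2, x2)
            * bernoulli(p_x3_given_x2[x2], x3)
            * bernoulli(p_x4_given_x1_x2[(x1, x2)], x4)
            * bernoulli(p_x5_given_x4[x4], x5))

# The joint distribution must sum to 1 over all 2^5 assignments.
total = sum(joint(*xs) for xs in itertools.product((0, 1), repeat=5))
print(total, joint(1, 0, 0, 1, 1))
```

Five small tables are enough to specify all 32 joint probabilities, which is the point of the factorization.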
79 Learning Bayesian Models
- Using gene expression data, the goal is to find the Bayesian network that best matches the data.
- Recovering the optimal conditional probability distributions when the graph is known is easy.
- Recovering the structure of the graph is NP-hard (NP: non-deterministic polynomial time).
- But good statistics are available:
- What is the likelihood of a specific assignment?
- What is the distribution of xi given xj?
80 Issues with Bayesian Models
- Computationally intensive.
- Requires lots of data.
- Does not allow for feedback loops, which are known to play an important role in biological networks.
- Does not make use of the temporal aspect of the data.
- Dynamic Bayesian networks aim at solving some of these issues, but they require even more data.
81 Differential Equations
- Typically uses linear differential equations to model the gene trajectories:
  dxi(t)/dt = ai,0 + ai,1 x1(t) + ai,2 x2(t) + ... + ai,n xn(t)
- Several reasons for that choice:
- a lower number of parameters implies that we are less likely to overfit the data
- sufficient to model complex interactions between the genes
82 Small Network Example
dx1(t)/dt = 0.491 - 0.248 x1(t)
dx2(t)/dt = -0.473 x3(t) + 0.374 x4(t)
dx3(t)/dt = -0.427 + 0.376 x1(t) - 0.241 x3(t)
dx4(t)/dt = 0.435 x1(t) - 0.315 x3(t) - 0.437 x4(t)
83 Small Network Example
[Figure: network diagram of x1, x2, x3, x4 for the same four equations, with the callout "one interaction coefficient"]
84 Small Network Example
[Figure: the same four equations with the callout "constant coefficients"]
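The four-gene linear system of the small network example can be simulated directly. The sketch below uses forward Euler integration; the step size, the number of steps, and the initial state are arbitrary choices made for illustration.

```python
import numpy as np

# Constant terms and interaction coefficients of the small network example.
b = np.array([0.491, 0.0, -0.427, 0.0])
A = np.array([
    [-0.248, 0.0,  0.0,    0.0],
    [ 0.0,   0.0, -0.473,  0.374],
    [ 0.376, 0.0, -0.241,  0.0],
    [ 0.435, 0.0, -0.315, -0.437],
])

def simulate(x0, dt=0.01, steps=5000):
    """Integrate dx/dt = b + A x with forward Euler (assumed scheme)."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * (b + A @ x)
        traj.append(x.copy())
    return np.array(traj)

traj = simulate([1.0, 1.0, 1.0, 1.0])   # arbitrary initial expression levels
print(traj[-1])
```

With these coefficients x1 settles at 0.491 / 0.248, while x2 has no self effect and keeps drifting once the other genes have settled.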
85 Problem Revisited
      ai,0    ai,1   ai,2    ai,3    ai,4
x1   0.491  -0.248      0       0       0
x2       0       0      0  -0.473   0.374
x3  -0.427   0.376      0  -0.241       0
x4       0   0.435      0  -0.315  -0.437
Given the time-series data, can we find the interaction coefficients?
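One simple way to approach this question is ordinary least squares on finite-difference derivatives, sketched below. This is an illustration under idealized assumptions, not the method proposed in the lecture: the time series is synthetic and noise-free, it is generated from the example's own coefficients, and it has many more time points than genes, which real microarray data typically do not.

```python
import numpy as np

# Coefficients of the small network example: column 0 is the constant term.
true_coef = np.array([
    [ 0.491, -0.248, 0.0,  0.0,    0.0],
    [ 0.0,    0.0,   0.0, -0.473,  0.374],
    [-0.427,  0.376, 0.0, -0.241,  0.0],
    [ 0.0,    0.435, 0.0, -0.315, -0.437],
])

def simulate(coef, x0, dt, steps):
    """Generate a synthetic time series with forward Euler steps."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * (coef[:, 0] + coef[:, 1:] @ x)
        traj.append(x.copy())
    return np.array(traj)

def fit_linear_ode(traj, dt):
    """Estimate [constant | interaction] coefficients by least squares."""
    derivs = (traj[1:] - traj[:-1]) / dt                 # finite-difference dx/dt
    design = np.hstack([np.ones((len(traj) - 1, 1)), traj[:-1]])
    solution, *_ = np.linalg.lstsq(design, derivs, rcond=None)
    return solution.T                                    # one row per gene

dt = 0.1
traj = simulate(true_coef, [0.5, 1.0, 0.2, 0.8], dt, 200)  # arbitrary start
estimated = fit_linear_ode(traj, dt)
print(np.round(estimated, 3))
```

With noisy data and few time points the same fit becomes underdetermined, which is why additional constraints on the model are needed in practice.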
86 Issues with Differential Equations
- Even under the simplest linear model, there are m(m+1) unknown parameters to estimate:
- m(m-1) directional effects
- m self effects
- m constant effects
- The number of data points is mn, and we typically have n << m (few time points).
- To avoid overfitting, extra constraints must be incorporated into the model, such as
- smoothness of the equations
- sparseness of the network (few non-null interaction coefficients)
88 GRN Software
- GNA: Genetic Network Analyzer
- Helix Bioinformatics
- http://www-helix.inrialpes.fr/article122.html
89 GRN Software
- Probabilistic Boolean Networks (PBN)
- MATLAB toolbox
- Ilya Shmulevich
- Institute for Systems Biology
91 Future Work: Literature Review
- Study the noisy nature of microarray data.
- Study the existing modeling methodologies in depth.
- Focus on a specialized problem such as cancer.
92 Future Work: GSP Statistics Books
- Genomic Signal Processing and Statistics, Edward, 2006
- Introduction to Genomic Signal Processing with Control, Ily, 2006
- Computational and Statistical Approaches to Genomics (Springer, 2006), Ily
93 Future Work: Statistics Books
- Handbook of Computational Statistics
- An Introduction to Statistical Signal Processing, Robert M. Gray, 2007
- Fundamentals of Statistical Signal Processing: Estimation Theory, Steven Kay
- Nonlinear Signal Processing: A Statistical Approach, Gonzalo R, 2005
- Inference in HMM, Olivier Cappe, 2005
94 Future Work: Modeling Books
- Modeling and Control of Complex Systems (Control Engineering), Petros A. Ioannou and Andreas Pitsillides, 2008
- Modeling Biological Systems: Principles and Applications, 2005
- Gene Regulation and Metabolism: Postgenomic Computational Approaches, Julio, 2000
95 Future Work: Resources
- IEEE Transactions on Computational Biology and Bioinformatics
- IEEE International Workshop on Genomic Signal Processing and Statistics
- IEEE Journal of Selected Topics in Signal Processing, special issue on Genomic and Proteomic Signal Processing
- EURASIP Journal on Bioinformatics and Systems Biology, special issue on Genetic Regulatory Networks
- IEEE Signal Processing Magazine, special issue on Signal Processing Methods in Genomics and Proteomics
- IEEE Transactions on Signal Processing, special issue on Genomic Signal Processing
- Workshop on Discrete Models for Genetic Regulatory Networks
97-98 References
- Hidde de Jong, "Modeling and simulation of genetic regulatory systems: a literature review", J Comput Biol. 2002;9(1):67-103. Review.
- Ranadip Pal, Aniruddha Datta, Edward R. Dougherty, "Bayesian robustness in the control of gene regulatory networks".
- Anastassiou, D. (2001). "Genomic signal processing." IEEE Signal Processing
- Dougherty, E. R. and A. Datta (2005). "Genomic signal processing: diagnosis and therapy." Signal Processing Magazine, IEEE 22(1): 107-112.
- Vaidyanathan, P. P. (2004). "Genomics and proteomics: a signal processor's tour." Circuits and Systems Magazine, IEEE 4.
- Vaidyanathan, P. P. and B.-J. Yoon (2004). "The role of signal-processing concepts in genomics and proteomics." Journal of the Franklin Institute (Special Issue on Genomics).