Title: Outline
1- Outline
- Biological motivation
- Introduction to graphical models and Bayesian networks
- Case study
- Module networks: identifying regulatory modules
and their condition-specific regulators from gene
expression data. Segal, Shapira, Regev, Pe'er,
Botstein, Koller, Friedman. Nature Genetics. 2003.
- Large-scale mapping and validation of E. coli
transcriptional regulation from a compendium of
expression profiles. PLoS Biology. 2007.
2- 1. Biological motivation
- cis-regulatory motif: a short (roughly 6 to 12)
series of DNA bases that can bind to an
activator or repressor protein. Illustrated
at right as activator/repressor binding sites.
3- 1. Biological motivation
- Module: a set of genes that participate in a
coherent biological process
- Module group: a set of modules that all share at
least one cis-regulatory motif
- Regulator: a gene that encodes a protein whose
concentration regulates the expression of other
genes
- Expression profile: measured expression levels of
various genes in given bio-experimental circumstances
4- 1. Biological motivation
5- 2. Introduction to Bayesian Network
"Graphical models are a marriage between
probability theory and graph theory. They provide
a natural tool for dealing with two problems that
occur throughout applied mathematics and
engineering -- uncertainty and complexity.
... Fundamental to the idea of a graphical model
is the notion of modularity -- a complex system
is built by combining simpler parts. Probability
theory provides the glue whereby the parts are
combined, ensuring that the system as a whole is
consistent. Many of the classical multivariate
probabilistic systems are special cases of the
general graphical model formalism -- examples
include mixture models, factor analysis, hidden
Markov models, Kalman filters and Ising
models...the graphical model formalism provides
a natural framework for the design of new
systems." --- Michael Jordan, 1998.
6- 2. Introduction to Bayesian Network
Cloudy: P(C)=0.5, P(-C)=0.5
Sprinkler: P(S|C)=0.1, P(-S|C)=0.9; P(S|-C)=0.5, P(-S|-C)=0.5
Rain: P(R|C)=0.8, P(-R|C)=0.2; P(R|-C)=0.2, P(-R|-C)=0.8
Wet Grass: P(W|S,R)=0.99, P(-W|S,R)=0.01; P(W|S,-R)=0.9, P(-W|S,-R)=0.1;
P(W|-S,R)=0.9, P(-W|-S,R)=0.1; P(W|-S,-R)=0, P(-W|-S,-R)=1
From http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
7- 2. Introduction to Bayesian Network
- Graphs in which nodes represent random variables
(cloudy? sprinkler? rain? wet grass?)
- Arrows represent conditional independence
assumptions (e.g. P(W|S,R,C) = P(W|S,R))
- The pattern of present and absent arrows provides a
compact representation of the joint probability distribution
- BNs have a complicated notion of independence,
which takes into account the directionality of
the arrows
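The compact representation can be made concrete with a minimal Python sketch, assuming the sprinkler network and the probability tables quoted elsewhere in these slides. The helper names (`bernoulli`, `joint`, `prob_wet_given`) are illustrative choices, not part of any library. The sketch builds the joint distribution as P(C) P(S|C) P(R|C) P(W|S,R) and checks numerically that P(W|S,R,C) equals P(W|S,R) for every value of C.

```python
from itertools import product

# Tables from the sprinkler example (True means the event occurs).
P_C = {True: 0.5, False: 0.5}
P_S_GIVEN_C = {True: 0.1, False: 0.5}   # P(S=True | C)
P_R_GIVEN_C = {True: 0.8, False: 0.2}   # P(R=True | C)
P_W_GIVEN_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.9, (False, False): 0.0}  # P(W=True | S, R)


def bernoulli(p_true, value):
    """Probability that a binary variable takes `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(c, s, r, w):
    """P(C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S,R), one factor per node."""
    return (P_C[c]
            * bernoulli(P_S_GIVEN_C[c], s)
            * bernoulli(P_R_GIVEN_C[c], r)
            * bernoulli(P_W_GIVEN_SR[(s, r)], w))


def prob_wet_given(s, r, c):
    """P(W=True | S, R, C), computed from the joint by the definition of conditioning."""
    wet = joint(c, s, r, True)
    return wet / (wet + joint(c, s, r, False))


total = sum(joint(*values) for values in product([True, False], repeat=4))
print(f"sum of joint = {total:.6f}")
for s, r, c in product([True, False], repeat=3):
    # The last two columns agree: C adds nothing once S and R are known.
    print(s, r, c, round(prob_wet_given(s, r, c), 6), P_W_GIVEN_SR[(s, r)])
```

The 16 joint probabilities sum to 1, and the conditional probability of wet grass does not change with C once S and R are fixed.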
8- Bayes' Rule
2. Introduction to Bayesian Network
- Can rearrange the conditional probability formula
P(A|B) = P(A,B) / P(B)
to get P(A|B) P(B) = P(A,B), but by symmetry we
can also get P(B|A) P(A) = P(A,B). It follows
that P(A|B) = P(B|A) P(A) / P(B).
- The power of Bayes' rule is that in many
situations where we want to compute P(A|B) it
turns out that it is difficult to do so directly,
yet we might have direct information about
P(B|A). Bayes' rule enables us to compute P(A|B)
in terms of P(B|A).
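As a worked instance of Bayes' rule, the sketch below asks how likely it is that the sprinkler was on, given that the grass is wet. It assumes the sprinkler-network tables quoted in these slides. The helpers `joint` and `prob` are hypothetical names introduced for illustration. P(S|W) is computed from P(W|S), P(S) and P(W).

```python
from itertools import product


def joint(c, s, r, w):
    """Joint probability of the sprinkler network, using the tables quoted in the slides."""
    p_s = 0.1 if c else 0.5          # P(S=True | C)
    p_r = 0.8 if c else 0.2          # P(R=True | C)
    p_w = {(True, True): 0.99, (True, False): 0.9,
           (False, True): 0.9, (False, False): 0.0}[(s, r)]  # P(W=True | S, R)
    return (0.5                      # P(C) = P(-C) = 0.5
            * (p_s if s else 1 - p_s)
            * (p_r if r else 1 - p_r)
            * (p_w if w else 1 - p_w))


def prob(event):
    """Sum the joint over every assignment (c, s, r, w) for which `event` holds."""
    return sum(joint(*v) for v in product([True, False], repeat=4) if event(*v))


p_s = prob(lambda c, s, r, w: s)                       # P(S)
p_w = prob(lambda c, s, r, w: w)                       # P(W)
p_w_given_s = prob(lambda c, s, r, w: s and w) / p_s   # P(W|S)

# Bayes' rule: P(S|W) = P(W|S) P(S) / P(W)
p_s_given_w = p_w_given_s * p_s / p_w
print(f"P(S)={p_s:.4f} P(W)={p_w:.4f} P(W|S)={p_w_given_s:.4f} P(S|W)={p_s_given_w:.4f}")
```

With these tables P(W) is about 0.647 and P(S|W) is about 0.43, which is higher than the prior P(S) = 0.3.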
9- 2. Introduction to Bayesian Network
Need a prior probability for each root node and, for each
non-root node, conditional probabilities that
cover all possible values of its parent nodes.
Cloudy: P(C)=0.5, P(-C)=0.5
Sprinkler: P(S|C)=0.1, P(-S|C)=0.9; P(S|-C)=0.5, P(-S|-C)=0.5
Rain: P(R|C)=0.8, P(-R|C)=0.2; P(R|-C)=0.2, P(-R|-C)=0.8
Wet Grass: P(W|S,R)=0.99, P(-W|S,R)=0.01; P(W|S,-R)=0.9, P(-W|S,-R)=0.1;
P(W|-S,R)=0.9, P(-W|-S,R)=0.1; P(W|-S,-R)=0, P(-W|-S,-R)=1
From http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
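One way to hold such a specification in code is sketched below. The dictionary layout and the function `check_network` are assumptions made for illustration, not a standard format. Each node lists its parents and gives P(node=True) for every combination of parent values, and the check reports any combination that was left out.

```python
from itertools import product

# Hypothetical layout: node -> (parents, table mapping parent values -> P(node=True)).
NETWORK = {
    "Cloudy":    ([], {(): 0.5}),
    "Sprinkler": (["Cloudy"], {(True,): 0.1, (False,): 0.5}),
    "Rain":      (["Cloudy"], {(True,): 0.8, (False,): 0.2}),
    "WetGrass":  (["Sprinkler", "Rain"],
                  {(True, True): 0.99, (True, False): 0.9,
                   (False, True): 0.9, (False, False): 0.0}),
}


def check_network(network):
    """Return a list of problems; an empty list means every table is complete."""
    problems = []
    for node, (parents, table) in network.items():
        for parent in parents:
            if parent not in network:
                problems.append(f"{node}: unknown parent {parent}")
        # A root node needs one number; a node with k binary parents needs 2**k rows.
        expected_rows = set(product([True, False], repeat=len(parents)))
        missing = expected_rows - set(table)
        if missing:
            problems.append(f"{node}: no entry for parent values {sorted(missing)}")
        for row, p_true in table.items():
            if not 0.0 <= p_true <= 1.0:
                problems.append(f"{node}: P(True | {row}) = {p_true} is not a probability")
    return problems


print(check_network(NETWORK))
```

Only P(node=True) is stored, because the complementary probability follows by subtraction from 1.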
10- Major benefit of BN
2. Introduction to Bayesian Network
- W depends directly only on its parent nodes (R and
S): once R and S are known, the conditional
probabilities P(W|S,R) give the probability of W.
We don't need to know or include the probabilities
of the more distant ancestors between W and the root
nodes (C).
Network: Cloudy -> Sprinkler, Cloudy -> Rain, Sprinkler -> Wet Grass, Rain -> Wet Grass
11- This BN benefit hugely reduces the numbers and
computations needed for large networks, e.g.
hundreds or thousands of genes
2. Introduction to Bayesian Network
- SSR article: many separate Bayesian networks
generated based on gene expression data. Here one
activator and one repressor form a basic BN, with 3
corresponding expression contexts shown at
bottom.
12- 2. Introduction to Bayesian Network
Order of reduction of required numbers: reduce
from 2^4 - 1 = 15 to 9
Cloudy: P(C)=0.5, P(-C)=0.5
Sprinkler: P(S|C)=0.1, P(-S|C)=0.9; P(S|-C)=0.5, P(-S|-C)=0.5
Rain: P(R|C)=0.8, P(-R|C)=0.2; P(R|-C)=0.2, P(-R|-C)=0.8
Wet Grass: P(W|S,R)=0.99, P(-W|S,R)=0.01; P(W|S,-R)=0.9, P(-W|S,-R)=0.1;
P(W|-S,R)=0.9, P(-W|-S,R)=0.1; P(W|-S,-R)=0, P(-W|-S,-R)=1
From http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
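The count can be checked with a short sketch, assuming all variables are binary. The function names and the 100-node chain are illustrative assumptions. An unrestricted joint table over n variables needs 2^n - 1 independent numbers, whereas a Bayesian network needs one number per parent configuration of each node.

```python
def full_joint_parameters(n_binary_variables):
    """Independent numbers in an unrestricted joint table over binary variables."""
    return 2 ** n_binary_variables - 1


def network_parameters(parents):
    """Independent numbers in a BN over binary variables: 2**(number of parents) per node."""
    return sum(2 ** len(node_parents) for node_parents in parents.values())


SPRINKLER = {
    "Cloudy": [],
    "Sprinkler": ["Cloudy"],
    "Rain": ["Cloudy"],
    "WetGrass": ["Sprinkler", "Rain"],
}
print(full_joint_parameters(len(SPRINKLER)), network_parameters(SPRINKLER))

# Hypothetical larger case: 100 binary nodes in a chain, each with at most one parent.
chain = {i: ([i - 1] if i > 0 else []) for i in range(100)}
print(full_joint_parameters(len(chain)), network_parameters(chain))
```

For the sprinkler network this gives 15 versus 1 + 2 + 2 + 4 = 9. The gap widens quickly with network size, provided each node keeps only a few parents.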
13- 2. Introduction to Bayesian Network
Bayesian network general formulation
Given a graph G, the likelihood of observing the
data D
De Jong (2002)
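For orientation, a sketch of the standard textbook form follows. It is not necessarily the exact expression used in De Jong (2002). The symbols are assumptions introduced here: X_1, ..., X_n are the variables, Pa_G(X_i) the parents of X_i in G, θ the parameters of the conditional distributions, and M the number of independent, fully observed samples in D.

```latex
% Joint distribution factorizes over the graph G, one factor per node
P(X_1, \dots, X_n \mid G, \theta)
  = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}_G(X_i), \theta_i\bigr)

% Likelihood of M independent, fully observed samples D = \{x^{(1)}, \dots, x^{(M)}\}
P(D \mid G, \theta)
  = \prod_{m=1}^{M} \prod_{i=1}^{n}
    P\bigl(x_i^{(m)} \mid \mathrm{pa}_G(x_i)^{(m)}, \theta_i\bigr)
```

For the sprinkler network the first line reads P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R).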
14- Evaluating Bayesian networks
2. Introduction to Bayesian Network
Where do the numerical estimates of probability
come from?
- Can be supplied, or at least initialized, by expert opinion
- Can be learned by the system from data
- Both the SSR and BSK articles lay out the basics and some
details of iterative algorithms for finding the
probability values.
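The simplest form of learning the numbers from data is relative-frequency (maximum likelihood) estimation for a fixed graph with fully observed samples, sketched below. This is a generic textbook technique, not the iterative algorithms of the SSR or BSK articles. The function `estimate_cpt` and the toy observations are invented for illustration.

```python
from collections import Counter


def estimate_cpt(samples, child, parents):
    """Estimate P(child=True | parents) by relative frequency in fully observed samples."""
    seen = Counter()        # how often each parent configuration occurs
    child_true = Counter()  # how often the child is True under that configuration
    for sample in samples:
        key = tuple(sample[p] for p in parents)
        seen[key] += 1
        if sample[child]:
            child_true[key] += 1
    return {key: child_true[key] / seen[key] for key in seen}


# Hypothetical toy observations (made up for illustration, not real measurements).
samples = (
    [{"Cloudy": True, "Rain": True}] * 8
    + [{"Cloudy": True, "Rain": False}] * 2
    + [{"Cloudy": False, "Rain": True}] * 1
    + [{"Cloudy": False, "Rain": False}] * 4
)
print(estimate_cpt(samples, "Rain", ["Cloudy"]))
```

Parent configurations that never occur in the data get no estimate at all. This is one reason expert opinion or a prior is often used to initialize the tables.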
15- Modelling regulatory network
2. Introduction to Bayesian Network
De Jong (2002)
16- 3. Case study
- Module networks: identifying regulatory modules
and their condition-specific regulators from gene
expression data
- Segal, Shapira, Regev, Pe'er, Botstein, Koller,
Friedman (SSR)
- Nature Genetics, June 2003
- Bayesian network-based algorithms are applied to
gene expression data to generate good testable
hypotheses.
17- 3. Case study
- Expression data set, from Patrick Brown's lab, is
for genes of yeast subjected to various kinds of
stress
- Compiled list of 466 candidate regulators
- Applied analysis to 2355 genes in all 173 arrays
of yeast data set
- This gave automatic inference of 50 modules of
genes
- All modules were analyzed with external data
sources to check functional coherence of gene
products and validity of regulatory program
- Three novel hypotheses suggested by the method were
tested in the lab and found to be accurate
18- 3. Case study
19- 3. Case study
20- 3. Case study
- 2 examples of the 50 modules inferred by SSR methods
- Respiration: mostly genes encoding respiration
proteins or glucose-metabolism proteins. One
primary regulator predicted, Hap4, which is
known from past experiments to play an activation
role in respiration. Secondary regulators affect
Hap4 expression.
- Nitrogen catabolite repression: 29 genes tied to the
process by which yeast uses the best available
nitrogen source. Key regulator suggested is Gat1,
due to 26 of 29 genes having the Gat1 regulatory
motif in their upstream regions.
21- 3. Case study
Respiration Network
Two major motifs found
22- 3. Case study
- Evaluating module content and regulation programs
- All 50 modules were tested to see if proteins
coded in the same module had related functions
- Scored modules on how many genes are noted in
current bio databases as being related to the
predicted function (diagram, next slide)
- 31 of 50 modules had coherence > 50%; only 4 had
coherence < 30%.
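A coherence score of this kind can be sketched as the percentage of a module's genes that carry the module's predicted function annotation. The slides do not give the exact scoring procedure used by SSR, so this simple fraction is an assumption. The gene identifiers, annotations, and the function `coherence_percent` are invented for illustration.

```python
def coherence_percent(module_genes, annotated_genes):
    """Percentage of a module's genes that carry the module's predicted function annotation."""
    module_genes = set(module_genes)
    if not module_genes:
        raise ValueError("module has no genes")
    return 100.0 * len(module_genes & set(annotated_genes)) / len(module_genes)


# Hypothetical gene identifiers and annotations, invented for illustration only.
modules = {
    "module_A": ["g1", "g2", "g3", "g4"],
    "module_B": ["g5", "g6", "g7", "g8", "g9"],
}
annotated = {
    "module_A": {"g1", "g2", "g3"},
    "module_B": {"g5"},
}

scores = {name: coherence_percent(genes, annotated[name])
          for name, genes in modules.items()}
high = [name for name, score in scores.items() if score > 50]   # coherence above 50%
low = [name for name, score in scores.items() if score < 30]    # coherence below 30%
print(scores, high, low)
```

The two thresholds mirror the 50% and 30% cut-offs used to summarize the 50 modules.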
23- 3. Case study
- Colored boxes indicate that known experimental
evidence validates the predicted regulatory role
of a regulator (named in one of the Reg
columns) in a given module (each row of the
table).
- M, C and G column headers and different colors of
boxes represent different sorts of experimental
evidence that validate the model's prediction.
- C(%): functional coherence of module, from
literature mentions of module genes.
- G: number of genes in module
24- 3. Case study
- To find global relationships between modules, a
graph (next 2 slides) was made showing modules and
their motifs. Motifs were found within the 500
base pairs upstream from each gene.
- Observations from this graph: modules with
related biological functions often shared at
least one motif, and sometimes shared one or more
regulator genes.
25- Module relationships
3. Case study
Yeast mutants of Kin82, Ppt1, Ypl230W were
further tested to validate their relationship with
the module.
26- 3. Case study
27- What does a BN look like here?
3. Case study
- Need to specify two things to describe a BN
- Graph topology (structure)
- Parameters of each conditional probability
distribution
- Possible to learn both from data
- Learning structure is much harder than learning
parameters
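One reason structure learning is harder is the size of the search space. The sketch below counts labeled directed acyclic graphs on n nodes with Robinson's recurrence. The function name `count_dags` is an illustrative choice. The count says nothing about how any particular algorithm, including SSR's, searches this space.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over k, the number of nodes with no incoming arrows.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


for n in range(1, 7):
    print(n, count_dags(n))
```

Four nodes already allow 543 candidate structures and five allow 29281, so exhaustive search is out of reach long before a network reaches hundreds of genes. For a fixed structure, parameter learning only has to fill in the tables.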
28- What can we learn? Why was Segal's paper
successful?
3. Case study
- Yeast! Not mouse or human.
- Use gene clustering first. Start from a simplified
hypothesis and a small set of known regulators;
do not attempt a network of thousands of genes.
- Experimental validation.