Using Bayesian Networks to Analyze Expression Data
- By Nir Friedman, Michal Linial, Iftach Nachman and Dana Pe'er (2000)
- Presented by
- Nikolaos Aravanis
- Lysimachos Zografos
- Alexis Ioannou
Outline
- Introduction
- Bayesian Networks
- Application to expression data
- Application to cell cycle expression patterns
- Discussion and future work
The Road to Microarray Data Analysis
- Development of microarrays
- Measure the expression of all the genes of an organism
- Enormous amount of data
- Challenge: analyze the datasets and infer biological interactions
Most Common Analysis Tool
- Clustering algorithms
- Identify groups of genes with similar expression patterns over a set of experiments
- Discover genes that are co-regulated
Problems
- The data give only a partial picture
- Key events are not reflected (translation and protein (in)activation)
- The small number of samples gives little information for constructing fully detailed models
- With current technologies, even these few samples have a high noise-to-signal ratio
Possible Solution
- Analyze gene expression patterns to uncover properties of the transcriptional program
- Examine dependence and conditional independence in the data
- → Bayesian networks
Bayesian Networks
- Represent the dependence structure between multiple interacting quantities
- Capable of handling noise and of estimating the confidence in the different features of the network
- Focus on interactions whose signal is strong
- Useful for describing processes composed of locally interacting components
- The statistical foundations for learning Bayesian networks are well understood and have been successfully applied
- Provide models of causal influence
Informal Introduction to Bayesian Networks
- Let P(X,Y) be a joint distribution over variables X and Y
- X and Y are independent if P(X,Y) = P(X)P(Y) for all values of X and Y
- Suppose gene A is a transcription factor of gene B
- We expect their expression levels to be dependent
- A is a parent of B
- Suppose B is a transcription factor of C
- The expression levels of each pair are dependent
- If A does not directly affect C, then once we fix the expression level of B, we observe that A and C are independent
- P(A | B, C) = P(A | B) (A and C are conditionally independent given B)
- Ind(A; C | B)
- (see the sketch below)
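A minimal sketch in Python (not from the paper) of the chain example above: the joint distribution is built from made-up CPDs for A → B → C, and brute-force enumeration confirms that P(A | B, C) = P(A | B).

```python
# Hypothetical CPD numbers for the chain A -> B -> C; only the structure matters.
import itertools

p_a = {0: 0.6, 1: 0.4}                          # P(A)
p_b_given_a = {0: {0: 0.8, 1: 0.2},             # P(B | A)
               1: {0: 0.3, 1: 0.7}}
p_c_given_b = {0: {0: 0.9, 1: 0.1},             # P(C | B)
               1: {0: 0.25, 1: 0.75}}

# Joint distribution from the product form P(A,B,C) = P(A) P(B|A) P(C|B)
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in itertools.product((0, 1), repeat=3)}

def p_a1_given(b=None, c=None):
    """P(A=1 | evidence) by summing and renormalising the joint."""
    rows = [k for k in joint
            if (b is None or k[1] == b) and (c is None or k[2] == c)]
    z = sum(joint[k] for k in rows)
    return sum(joint[k] for k in rows if k[0] == 1) / z

for b, c in itertools.product((0, 1), repeat=2):
    # Fixing B makes C irrelevant to A: Ind(A; C | B) holds in this network.
    assert abs(p_a1_given(b=b, c=c) - p_a1_given(b=b)) < 1e-12
print("P(A | B, C) = P(A | B) for all values: A, C independent given B")
```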
Informal Introduction to Bayesian Networks (contd)
- A key component of Bayesian networks is that each variable is a stochastic function of its parents
- Stochastic models are natural in the gene expression domain
- The biological systems we want to model are stochastic
- Measurements are noisy
Representing Distributions with Bayesian Networks
- Representation of a joint probability distribution consisting of 2 components
- A directed acyclic graph G
- A conditional distribution for each variable given its parents in G
- G encodes the Markov assumption: each variable is independent of its non-descendants, given its parents in G
- By applying the chain rule, the joint distribution decomposes into the product form P(X_1, ..., X_n) = ∏_i P(X_i | Pa^G(X_i)) (derivation below)
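Spelled out, the two steps the slide refers to (a standard result; Pa^G(X_i) denotes the parents of X_i in G):

```latex
\begin{align*}
P(X_1,\dots,X_n)
  &= \prod_{i=1}^{n} P\bigl(X_i \mid X_1,\dots,X_{i-1}\bigr)
     && \text{(chain rule, any ordering consistent with } G\text{)} \\
  &= \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}^G(X_i)\bigr)
     && \text{(Markov assumption drops the non-parents)}
\end{align*}
```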
Equivalence Classes of BNs
- A BN implies further independence statements → Ind(G)
- More than one graph can imply the same statements
- Networks are equivalent if Ind(G) = Ind(G')
Equivalence Classes of BNs (contd)
- For equivalent networks:
- The DAGs have the same underlying undirected graph.
- PDAGs (partially directed acyclic graphs) are used to represent them; an edge whose direction the equivalent DAGs disagree on is left undirected.
Learning BNs
- Question
- Given a dataset D, which BN B = <G, T> best matches D?
- Answer
- A statistically motivated scoring function to evaluate each BN, e.g. the Bayesian score
- S(G : D) = log P(G | D) = log P(D | G) + log P(G) + C,
- where C is a constant independent of G,
- and P(D | G) = ∫ P(D | G, T) P(T | G) dT
- is the marginal likelihood, averaging over all parameter assignments for G (closed-form sketch below).
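For discrete variables with Dirichlet (BDe) priors, this integral has a closed form that decomposes into one term per family (a variable together with its parents). A minimal sketch, with hypothetical names (`family_log_marginal`, uniform pseudo-count `alpha`):

```python
import numpy as np
from scipy.special import gammaln

def family_log_marginal(child, parents, data, arity=3, alpha=1.0):
    """log marginal-likelihood term for one family (child given its parents).

    data: (n_samples, n_vars) integer array with values in {0, ..., arity-1}.
    """
    cols = data[:, parents] if parents else np.zeros((len(data), 0), dtype=int)
    score = 0.0
    for config in {tuple(row) for row in cols}:
        # Dirichlet-multinomial term for one parent configuration j:
        # log G(a_j) - log G(a_j + N_j) + sum_k [log G(a_jk + N_jk) - log G(a_jk)]
        rows = data[np.all(cols == config, axis=1)]
        counts = np.bincount(rows[:, child], minlength=arity)
        a = np.full(arity, alpha)
        score += (gammaln(a.sum()) - gammaln(a.sum() + counts.sum())
                  + np.sum(gammaln(a + counts) - gammaln(a)))
    return score

# The full structure score sums the family terms over variables (plus log P(G)
# and a constant): log P(D | G) = sum_i family_log_marginal(i, Pa_G(i), data)
```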
Learning BNs (contd)
- Steps
- Decide priors (P(T | G), P(G))
- → Use BDe priors (structure-equivalent, decomposable)
- Find G to maximize S(G : D)
- This is an NP-hard problem
- → local search using local modifications (edge addition, deletion, reversal) of the candidate G (Heckerman et al. 1995; sketch below)
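A minimal sketch of such a local search (first-improvement hill climbing over edge additions and deletions; reversal can be composed from the two). `score_family` is assumed to be a decomposable family score such as the BDe term sketched above; this is an illustration, not the exact procedure of Heckerman et al.:

```python
import itertools

def creates_cycle(parents, child, new_parent):
    """Would adding new_parent -> child create a directed cycle?
    True iff child is already an ancestor of new_parent."""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(n_vars, score_family, data):
    parents = {i: set() for i in range(n_vars)}
    total = {i: score_family(i, sorted(parents[i]), data) for i in range(n_vars)}
    improved = True
    while improved:
        improved = False
        for x, y in itertools.permutations(range(n_vars), 2):
            if x in parents[y]:                     # candidate move: delete x -> y
                cand = parents[y] - {x}
            elif not creates_cycle(parents, y, x):  # candidate move: add x -> y
                cand = parents[y] | {x}
            else:
                continue
            # A decomposable score means only y's family term changes.
            new = score_family(y, sorted(cand), data)
            if new > total[y]:                      # keep any improving move
                parents[y], total[y] = cand, new
                improved = True
    return parents
```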
Learning Causal Patterns
- A Bayesian network is a model of dependencies
- We are interested in modelling the process that generated them
- → model the flow of causality in the system of interest and create a causal network (CN)
- A causal network models the probability distribution as well as the effects of interventions (causality).
Learning Causal Patterns (contd)
- CNs vs BNs
- CNs interpret parents as immediate causes (cf. BNs)
- CNs and BNs are related through the Causal Markov Assumption:
- given the values of a variable's immediate causes, it is independent of its earlier causes
- If this holds, the CN can be read as a BN (BN = CN)
- (figure: two graphs that are equivalent as BNs but not as CNs)
Applying BNs to Expression Data
- Model the expression level of each gene as a random variable
- Other attributes that affect the system (e.g. temperature, experimental conditions) can also be modelled as random variables
- The Bayesian network / dependency structure can answer queries
- CON: problems with computational complexity and with the statistical significance of the resulting networks
- PRO: genetic regulation networks are sparse
Representing Partial Models
- Gene networks involve many variables
- → more than one plausible model (not enough data); we can learn only up to an equivalence class
- Focus on learning features in order to move toward a causal network
Representing Partial Models (contd)
- Features
- Markov relations (e.g. Y is in the Markov blanket of X)
- Order relations (e.g. X is an ancestor of Y in all networks)
- Feature learning leads toward a causal network (see the sketch below)
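A minimal sketch of testing the two feature types on a single learned DAG, using the networkx library (the gene names and edges below are made up for illustration; in practice these indicators are averaged over bootstrap networks, as the next slides describe):

```python
import networkx as nx

def order_relation(G, x, y):
    """Order feature: is x an ancestor of y in G?"""
    return x in nx.ancestors(G, y)

def markov_relation(G, x, y):
    """Markov feature: is y in the Markov blanket of x
    (a parent, a child, or a co-parent of a child of x)?"""
    blanket = set(G.predecessors(x)) | set(G.successors(x))
    for child in G.successors(x):
        blanket |= set(G.predecessors(child))
    blanket.discard(x)
    return y in blanket

# Made-up toy network, not a learned result.
G = nx.DiGraph([("CLN2", "RNR3"), ("CLN2", "SVS1"), ("SWI4", "CLN2")])
print(order_relation(G, "SWI4", "RNR3"))   # True: SWI4 -> CLN2 -> RNR3
print(markov_relation(G, "RNR3", "SVS1"))  # False: linked only through CLN2
```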
Statistical Confidence of Features
- The likelihood that a given feature is actually true
- We cannot compute the posterior P(G | D) directly
- → Bootstrap method:
- for i = 1..n: resample D with replacement → D_i; learn G_i from D_i
Statistical Confidence of Features (contd)
- Individual feature confidence (IFC)
- IFC = (1/n) Σ_i f(G_i),
- where f(G_i) = 1 if the feature exists in G_i, and 0 otherwise (sketch below)
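A minimal sketch of the estimator, with `learn_network` standing in for the full structure-learning procedure and `feature` for an indicator function such as an order or Markov relation (both hypothetical names):

```python
import numpy as np

def feature_confidence(data, learn_network, feature, n=200, seed=0):
    """IFC = (1/n) * sum_i f(G_i) over networks learned from resampled data."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n):
        # Resample the experiments (rows) with replacement -> D_i
        rows = rng.integers(0, len(data), size=len(data))
        G_i = learn_network(data[rows])
        hits += 1 if feature(G_i) else 0
    return hits / n
```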
Efficient Learning Algorithms
- Vast search space
- → need efficient algorithms
- Focus attention on the relevant regions of the search space
- → Sparse Candidate Algorithm
Efficient Learning Algorithms (contd)
- Sparse Candidate Algorithm
- Identify a small number of candidate parents for each gene, based on simple local statistics (e.g. correlation)
- Restrict the search to networks in which each gene's parents are drawn from its candidate set
- Potential pitfall: an early bad choice of candidates
- → Solution: an adaptive algorithm (sketch below)
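A minimal sketch of the candidate-selection step, assuming absolute Pearson correlation as the "simple local statistic" (the paper also considers information-theoretic measures):

```python
import numpy as np

def candidate_parents(expr, k=5):
    """expr: (n_experiments, n_genes) array. Returns {gene: [candidate parents]}."""
    corr = np.corrcoef(expr, rowvar=False)      # gene-by-gene correlation matrix
    np.fill_diagonal(corr, 0.0)                 # a gene is not its own parent
    return {g: list(np.argsort(-np.abs(corr[:, g]))[:k])
            for g in range(expr.shape[1])}

# The search is then restricted so Pa(g) is a subset of candidate_parents[g];
# the adaptive version re-estimates the candidate sets after each search
# iteration, so an early bad choice can be revised.
```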
Discretization
- The practical side
- We need to define the local probability model for each variable
- → discretize the expression data into {-1, 0, 1}
- (expression level lower than, similar to, or higher than control)
- Set the control by averaging.
- Set a threshold ratio for significantly higher/lower expression (sketch below).
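A minimal sketch of this rule, assuming `expr` holds log-ratios of expression to the averaged control and using an illustrative threshold of 0.5 (not necessarily the paper's value):

```python
import numpy as np

def discretize(expr, threshold=0.5):
    """Map each measurement to -1 (lower), 0 (similar), or +1 (higher)."""
    out = np.zeros_like(expr, dtype=int)
    out[expr >  threshold] = 1
    out[expr < -threshold] = -1
    return out
```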
Application to Cell Cycle Expression Patterns
- 76 gene expression measurements of the mRNA levels of 6177 Saccharomyces cerevisiae ORFs: six time series under different cell cycle synchronization methods (Spellman et al. 1998).
- 800 genes were differentially expressed; 250 of them fall into 8 distinct clusters. The network variables represent the expression levels of the 800 genes.
- An additional variable denoting the cell cycle phase was introduced to deal with the temporal nature of the cell cycle process, and was forced to be a root in the network.
- Applied the Sparse Candidate Algorithm to a 200-fold bootstrap of the original data.
- Used no prior biological knowledge in the learning algorithm.
Network with all edges
Network with edges that represent relations with confidence level above 0.3
YNL058C Local Map
- Edges
- Markov
- Ancestors
- Descendants
- SGD entry
- YPD entry
Robustness analysis
- Used the 250-gene data set for the robustness analysis
- Created a random data set by permuting the order of the experiments independently for each gene
- No real features are expected to be found in the randomized data
Robustness analysis (contd)
- Lower confidence for order and Markov relations in the random data set
- Longer and heavier tail in the high-confidence region for the original data set
- Sparser networks learned from the real data
- Features learned from the original data with a high confidence level are not an artifact of the bootstrap estimation
Robustness analysis (contd)
- Compared the confidence levels of learned features between the 250-gene and 800-gene data sets
- Strong linear correlation
- Compared the confidence levels of learned features between different discretization thresholds
- Definite linear tendency
Biological Analysis
- Order relations
- Dominant genes indicate potential causal sources of the cell cycle process
- Dominance score of X:
- d_o(X) = Σ_{Y : C(X,Y) > t} C(X,Y)^k,
- where C(X,Y) is the confidence in X being an ancestor of Y, k is used to reward high-confidence features, and t is a threshold to discard low-confidence ones (sketch below)
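A minimal sketch of the score, assuming `C` is a dict mapping gene pairs (X, Y) to the bootstrap confidence that X is an ancestor of Y:

```python
def dominance_score(x, C, genes, k=2, t=0.3):
    """d_o(x) = sum over Y with C(x, Y) > t of C(x, Y) ** k."""
    return sum(C.get((x, y), 0.0) ** k
               for y in genes if y != x and C.get((x, y), 0.0) > t)
```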
Biological Analysis (contd)
- Dominant genes are key genes in basic cell functions
Biological Analysis (contd)
Markov relations
- The top Markov relations reveal functional relations between genes
- 1. Both genes known: the relations make sense biologically
- 2. One gene unknown: firm homologies to proteins functionally related to the other gene
- 3. Both genes unknown: physically adjacent on the chromosome, presumably regulated by the same mechanism
- FAR1 and ASH1: low correlation, in different clusters, yet known to participate in mating-type switching
- CLN2 is likely to be a parent of RNR3, SVS1, SRO4 and RAD51. All appeared in the same cluster, with no links among the four genes. CLN2 is known to be a central cell cycle control, and there is no clear biological relationship among the others.
Discussion and Future Work
- Applied the Sparse Candidate Algorithm and bootstrap resampling to extract a Bayesian network from the 800-gene data set of Spellman et al.
- Used no prior biological knowledge
- Derived biologically plausible conclusions
- Capable of discovering causal relationships, interactions between genes, and rich structure between clusters
- Developing hybrid algorithms with clustering to learn models over clustered genes
- Extensions
- Learn local probability models that handle continuous data
- Improve the theory and algorithms
- Include biological knowledge as prior knowledge
- Improve the search heuristics
- Apply Dynamic Bayesian Networks to temporal data
- Discover causal patterns (using interventional data)