Title: Slide 1
1Graphical Models and Biological Networks
Recommended: Christopher Bishop's tutorial on
graphical models, based on his book Pattern
Recognition and Machine Learning
http://research.microsoft.com/en-us/um/people/cmbishop/prml/slides/prml-slides-8.ppt
2Learning from non-interventional data
- Gaussian Graphical Models
- Pruning
Best suited for high-dimensional, noisy data
What do the arrows mean? Do they have a
biological interpretation?
3Learning from non-interventional data
- Possible models include:
  - Correlation Graphs
  - Gaussian Graphical Models
  - Bayesian Networks
- However, correct reconstruction of the complete regulatory network is impossible due to:
  - Lack of data
  - Measurement error
  - Oversimplified or wrong model assumptions
"All models are wrong, some of them are
useful" (W. Edwards Deming, George Box)
4Correlation Graphs
5Correlation Graphs
- An expression profile is a collection of expression vectors $X_g = (X_{g,s})_{s \in \text{samples}}$, $g \in \text{genes}$.
- Correlation graph: Depict genes as vertices of a graph and draw an undirected edge $(i, j)$ if some correlation measure (Pearson correlation, Spearman rank correlation, Kendall's tau) between $X_i$ and $X_j$ is sufficiently different from zero.
- Advantage: This representation of the marginal dependence structure is easy to interpret and can be estimated accurately even if the number of genes far exceeds the number of measurements ($p \gg N$ situation).
- Application: Stuart et al. (Science, 2003) build a graph from coexpression across multiple organisms.
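The construction above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `correlation_graph`, the fixed cutoff `threshold=0.8`, and the toy data are assumptions made here. "Sufficiently different from zero" is implemented as a plain threshold on the absolute Pearson correlation; a significance test would be an equally valid reading.

```python
import numpy as np

def correlation_graph(X, threshold=0.8):
    """Return undirected edges (i, j), i < j, with |Pearson r| >= threshold.

    X has one row per gene and one column per sample.
    The threshold is a hypothetical choice, not taken from the slides.
    """
    R = np.corrcoef(X)                           # gene-by-gene Pearson correlations
    i_idx, j_idx = np.triu_indices_from(R, k=1)  # visit each unordered pair once
    keep = np.abs(R[i_idx, j_idx]) >= threshold
    return [(int(i), int(j)) for i, j in zip(i_idx[keep], j_idx[keep])]

# Toy data (hypothetical): gene 1 closely follows gene 0, gene 2 is pure noise.
rng = np.random.default_rng(0)
n_samples = 200
g0 = rng.normal(size=n_samples)
g1 = g0 + 0.1 * rng.normal(size=n_samples)
g2 = rng.normal(size=n_samples)
X = np.vstack([g0, g1, g2])

edges = correlation_graph(X, threshold=0.8)
print(edges)  # only the pair (0, 1) passes the threshold
```

Replacing `np.corrcoef` by a rank-based measure gives the Spearman or Kendall variant of the same graph.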
6Correlation Measures
Pearson
with
Spearman
Kendall
Sample (Pearson) correlations (taken from
Wikipedia)
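For reference, the standard textbook definitions of the three measures are given here. The notation is an assumption made for this note: $x_s, y_s$ are the values of two genes in sample $s = 1, \dots, N$, and $n_c$, $n_d$ count the concordant and discordant sample pairs. The Kendall formula is the tau-a variant, which assumes no ties.

```latex
% Pearson correlation
\[
  r_{xy} = \frac{\sum_{s=1}^{N} (x_s - \bar{x})(y_s - \bar{y})}
                {\sqrt{\sum_{s=1}^{N} (x_s - \bar{x})^2}\;
                 \sqrt{\sum_{s=1}^{N} (y_s - \bar{y})^2}}
  \quad\text{with}\quad
  \bar{x} = \frac{1}{N}\sum_{s=1}^{N} x_s,\;
  \bar{y} = \frac{1}{N}\sum_{s=1}^{N} y_s
\]

% Spearman: Pearson correlation applied to the ranks of the observations
\[
  \rho_{xy} = r_{\operatorname{rank}(x),\,\operatorname{rank}(y)}
\]

% Kendall (tau-a): balance of concordant and discordant pairs
\[
  \tau_{xy} = \frac{n_c - n_d}{\binom{N}{2}}
\]
```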
7Correlation Graphs
- It is impossible to distinguish direct from indirect dependence.
- Three reasons why X, Y, and Z may be highly correlated
- Possible remedies:
  - search for correlations which cannot be explained by other variables
  - measure effects of gene perturbations
- A strong correlation is not strong evidence for regulatory dependence (lots of false positives); rather, a low correlation is strong evidence for no regulatory edge.
8Recap Conditional Independence
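- X and Y are conditionally independent given Z, written $X \perp Y \mid Z$, if $p(x, y \mid z) = p(x \mid z)\, p(y \mid z)$.
- In other words:
  - Knowing Z, knowing Y is irrelevant for knowing X (and vice versa).
  - Z explains any observed dependence between X and Y.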
taken from Florian Markowetz
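A small simulation makes the recap concrete. It is a sketch under assumptions chosen here: Z is a common cause of X and Y, all noise is Gaussian, and the coefficients are arbitrary. Conditioning on Z is approximated by regressing Z out of X and Y and correlating the residuals, which for jointly Gaussian variables is equivalent to testing conditional independence.

```python
import numpy as np

def residuals(a, z):
    """Residuals of a least-squares regression of a on z (with intercept)."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, a, rcond=None)
    return a - design @ coef

# Hypothetical common-cause model: Z drives both X and Y, no direct X-Y link.
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

# Marginal correlation is high, correlation after removing Z is close to zero.
marginal = np.corrcoef(x, y)[0, 1]
partial = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(f"corr(X, Y)         = {marginal:.2f}")
print(f"corr(X, Y given Z) = {partial:.2f}")
```

A correlation graph would draw the edge X - Y here, although Z explains the whole dependence.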
9Gaussian Graphical Models
taken from Florian Markowetz
10Gaussian Graphical Models
11Gaussian Graphical Models
If we assume that the joint expression
distribution of all genes follows a multivariate
Gaussian distribution (which is of course never
the case), conditional independence can be
assessed as follows: with covariance matrix $\Sigma$,
$X_i \perp X_j \mid X_{\text{rest}} \iff (\Sigma^{-1})_{ij} = 0$
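A minimal numerical sketch of this kind of check: estimate the precision matrix $K$ as the inverse of the sample covariance and rescale its off-diagonal entries to partial correlations, $-k_{ij} / \sqrt{k_{ii} k_{jj}}$. The function name `partial_correlations` and the simulated chain X0 -> X1 -> X2 are assumptions made for illustration, and the sketch presumes many more samples than variables.

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation of each pair given all other variables.

    X has one row per sample and one column per variable. The sample
    covariance must be invertible, which needs more samples than variables.
    """
    S = np.cov(X, rowvar=False)
    K = np.linalg.inv(S)             # estimated precision matrix
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)          # -k_ij / sqrt(k_ii * k_jj)
    np.fill_diagonal(P, 1.0)
    return P

# Hypothetical chain X0 -> X1 -> X2: X0 and X2 are linked only through X1.
rng = np.random.default_rng(2)
n = 5000
x0 = rng.normal(size=n)
x1 = x0 + 0.5 * rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

R = np.corrcoef(X, rowvar=False)     # marginal correlations
P = partial_correlations(X)          # full-order partial correlations
print(np.round(R, 2))
print(np.round(P, 2))
```

In this toy example the marginal correlation between X0 and X2 is large, while their partial correlation is close to zero, so the Gaussian graphical model drops the indirect edge that a correlation graph would keep.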
12Problems in high dimensions
- Full conditional relationships can only be
accurately estimated if the number of samples N
is relatively large compared to the number of
variables p.
- Thus, if $p \gg N$, you can ...
  - use the Moore-Penrose pseudoinverse, bootstrap aggregation and shrinkage estimators to stabilize the result
  - resort to a simpler model that does not rely on full conditional independence
Graph from Basso et al. (Nat Genet, 2005)
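Two of the stabilization options can be sketched as follows. This is an illustration under assumptions, not the procedure of any cited paper: the helper names, the diagonal shrinkage target, and the fixed intensity `lam=0.2` are all hypothetical (in practice the intensity is usually estimated from the data), and bootstrap aggregation is left out.

```python
import numpy as np

def to_partial_corr(K):
    """Rescale a (pseudo-)precision matrix to partial correlations."""
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

def pinv_partial_corr(X):
    """Option 1: Moore-Penrose pseudoinverse of the singular sample covariance."""
    S = np.cov(X, rowvar=False)
    return to_partial_corr(np.linalg.pinv(S))

def shrinkage_partial_corr(X, lam=0.2):
    """Option 2: shrink the covariance toward its diagonal, then invert.

    lam is a hypothetical fixed shrinkage intensity in (0, 1].
    """
    S = np.cov(X, rowvar=False)
    target = np.diag(np.diag(S))
    S_shrunk = (1.0 - lam) * S + lam * target   # positive definite for lam > 0
    return to_partial_corr(np.linalg.inv(S_shrunk))

# Toy p >> N setting (hypothetical sizes): 20 samples, 50 variables.
rng = np.random.default_rng(3)
N, p = 20, 50
X = rng.normal(size=(N, p))

S = np.cov(X, rowvar=False)
rank = np.linalg.matrix_rank(S)      # at most N - 1, so S cannot be inverted
P_pinv = pinv_partial_corr(X)
P_shrunk = shrinkage_partial_corr(X, lam=0.2)
print(rank, P_pinv.shape, P_shrunk.shape)
```

Both estimators return a full $p \times p$ matrix even though the sample covariance is rank deficient; how trustworthy the individual entries are at this sample size is a separate question.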
13Problems in high dimensions
14Modified GGMs
Correlation Graphs
GGMs
Wille / Bühlmann
Recall that independence does not imply
conditional independence, nor vice versa; thus all
these methods are distinct.
All methods failed to accurately reconstruct
networks, even those of very moderate
size (20)
15Markov Random Fields
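Definition: An undirected graph $G = (V, E)$, together
with a family of random variables $(X_v)_{v \in V}$, is a
Markov network (Markov Random Field, MRF) if one
of the following three conditions holds (they are
equivalent when the joint density is strictly positive):
Pairwise Markov property
For all non-adjacent $u, v \in V$: $X_u \perp X_v \mid X_{V \setminus \{u, v\}}$
Local Markov property
For all $v \in V$: $X_v \perp X_{V \setminus (\{v\} \cup \mathrm{ne}(v))} \mid X_{\mathrm{ne}(v)}$, where $\mathrm{ne}(v)$ denotes the neighbours of $v$
Global Markov property
For all subsets $A, B, S$ of $V$ such that $S$ separates
$A$ and $B$: $X_A \perp X_B \mid X_S$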
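The graph-theoretic part of the global Markov property, "S separates A and B", means that every path from A to B passes through S. A minimal sketch of that check, assuming the graph is given as an adjacency dictionary (the function name `separates` and the example graph are made up here):

```python
from collections import deque

def separates(adjacency, A, B, S):
    """True if every path from A to B in the undirected graph passes through S.

    adjacency maps each vertex to an iterable of its neighbours.
    """
    A, B, S = set(A), set(B), set(S)
    start = A - S
    visited = set(start)
    queue = deque(start)
    while queue:                      # breadth-first search that never enters S
        u = queue.popleft()
        if u in B:
            return False              # reached B without crossing S
        for w in adjacency[u]:
            if w not in visited and w not in S:
                visited.add(w)
                queue.append(w)
    return True

# Hypothetical chain graph 0 - 1 - 2 - 3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(separates(chain, {0}, {3}, {2}))    # True: vertex 2 blocks the only path
print(separates(chain, {0}, {3}, set()))  # False: the path 0-1-2-3 is open
```

For a Markov random field on this chain, the first call corresponds to the statement $X_0 \perp X_3 \mid X_2$.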
16Markov Random Fields
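The joint density of a Markov Random field can be
factorized into clique potentials,
$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$, where $\mathcal{C}(G)$ is the set of cliques of $G$ and $Z$ is a normalizing constant,
if the density is positive (Hammersley-Clifford
theorem), or if the graph is chordal (without
proof).
A Gaussian Graphical Model is a particular Markov
random field:
$p(x) \propto \exp\left(-\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$
and $(u, v) \in E$ whenever $(\Sigma^{-1})_{uv} \neq 0$.
(Proof: blackboard)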
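The edge rule for Gaussian graphical models can be illustrated numerically: edges are read off the nonzero off-diagonal entries of the inverse covariance matrix, not off the covariance itself. The sketch below assumes a hand-picked precision matrix for a chain 0 - 1 - 2 - 3; the values and the helper `ggm_edges` are hypothetical.

```python
import numpy as np

# Hypothetical precision matrix of a chain 0 - 1 - 2 - 3 (positive definite).
K = np.array([
    [ 2.0, -1.0,  0.0,  0.0],
    [-1.0,  2.0, -1.0,  0.0],
    [ 0.0, -1.0,  2.0, -1.0],
    [ 0.0,  0.0, -1.0,  2.0],
])
Sigma = np.linalg.inv(K)              # implied covariance matrix

def ggm_edges(precision, tol=1e-10):
    """Edges (u, v), u < v, where the precision entry is nonzero up to tol."""
    p = precision.shape[0]
    return [(u, v) for u in range(p) for v in range(u + 1, p)
            if abs(precision[u, v]) > tol]

edges = ggm_edges(K)
covariance_is_dense = bool(np.all(np.abs(Sigma) > 1e-10))
edges_recovered = ggm_edges(np.linalg.inv(Sigma), tol=1e-8)

print(edges)                # the three chain edges
print(covariance_is_dense)  # every pair of variables is marginally correlated
print(edges_recovered)      # same edges, recovered from the covariance
```

All pairs are marginally correlated, so a correlation graph on these variables would be complete, while the Markov random field has only the three chain edges.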
17Graphical Models - Overview
Probabilistic models
Graphical models
- Undirected (Markov Random Fields, MRFs)
- Directed (Bayesian networks)