Title: Identifying co-regulation using Probabilistic Relational Models
Slide 1: Identifying co-regulation using Probabilistic Relational Models
- by Christoforos Anagnostopoulos
- BA Mathematics, Cambridge University
- MSc Informatics, Edinburgh University
supervised by Dirk Husmeier
Slide 2: The General Problem
- Bringing together disparate data sources
Promoter sequence data: ...ACGTTAAGCCAT... ...GGCATGAATCCC...
Slide 3: The General Problem
- Bringing together disparate data sources
Promoter sequence data: ...ACGTTAAGCCAT... ...GGCATGAATCCC...
Gene expression data (mRNA): gene 1 overexpressed, gene 2 overexpressed, ...
Slide 4: The General Problem
- Bringing together disparate data sources
Promoter sequence data: ...ACGTTAAGCCAT... ...GGCATGAATCCC...
Gene expression data (mRNA): gene 1 overexpressed, gene 2 overexpressed, ...
Protein interaction data (proteins):
protein 1 | protein 2 | ORF 1   | ORF 2
AAC1      | TIM10     | YMR056C | YHR005CA
AAD6      | YNL201C   | YFL056C | YNL201C
Slide 5: Our Data
Promoter sequence data: ...ACGTTAAGCCAT... ...GGCATGAATCCC...
Gene expression data (mRNA): gene 1 overexpressed, gene 2 overexpressed, ...
Slide 6: Bayesian Modelling Framework
Slide 7: Bayesian Modelling Framework
- Conditional independence assumptions
- Factorisation of the joint probability distribution
- Unified training
Slide 8: Bayesian Modelling Framework
Probabilistic Relational Models
Slide 9: Aims for This Presentation
- Briefly present the Segal model and the main criticisms offered in the thesis
- Briefly introduce PRMs
- Outline directions for future work
Slide 10: The Segal Model
- Cluster genes into transcriptional modules...
[Figure: a gene, with a question mark, to be assigned to Module 1 or Module 2]
Slide 11: The Segal Model
[Figure: the gene is assigned to Module 1 with probability P(M = 1) and to Module 2 with probability P(M = 2)]
Slide 12: The Segal Model
[Figure: the gene is assigned to Module 1 with probability P(M = 1)]
Slide 13: The Segal Model
Motif profile: motif 3 active, motif 4 very active, motif 16 very active, motif 29 slightly active
[Figure: the motif profile attached to Module 1 and the gene]
Slide 14: The Segal Model
Predicted expression levels: array 1 overexpressed, array 2 overexpressed, array 3 underexpressed, ...
Motif profile: motif 3 active, motif 4 very active, motif 16 very active, motif 29 slightly active
[Figure: Module 1 links the gene's motif profile to its predicted expression levels]
Slide 15: The Segal Model
Predicted expression levels: array 1 overexpressed, array 2 overexpressed, array 3 underexpressed, ...
Motif profile: motif 3 active, motif 4 very active, motif 16 very active, motif 29 slightly active
[Figure: the full chain, with the gene assigned to Module 1 with probability P(M = 1)]
Slide 16: The Segal Model
PROMOTER SEQUENCE
Slide 17: The Segal Model
PROMOTER SEQUENCE → MOTIF PRESENCE
Slide 18: The Segal Model
PROMOTER SEQUENCE → [MOTIF MODEL] → MOTIF PRESENCE
Slide 19: The Segal Model
MOTIF PRESENCE → MODULE ASSIGNMENT
Slide 20: The Segal Model
MOTIF PRESENCE → [REGULATION MODEL] → MODULE ASSIGNMENT
Slide 21: The Segal Model
MODULE ASSIGNMENT → EXPRESSION DATA
Slide 22: The Segal Model
MODULE ASSIGNMENT → [EXPRESSION MODEL] → EXPRESSION DATA
Slide 23: Learning via Hard EM
[Figure: the model with its hidden variables highlighted]
Slide 24: Learning via Hard EM
Initialise hidden variables
Slide 25: Learning via Hard EM
Initialise hidden variables
Set parameters to Maximum Likelihood
Slide 26: Learning via Hard EM
Initialise hidden variables
Set parameters to Maximum Likelihood
Set hidden values to their most probable value
given the parameters (hard EM)
Slide 27: Learning via Hard EM
Initialise hidden variables
Set parameters to their maximum-likelihood values
Set hidden values to their most probable value given the parameters (hard EM)
Iterate the last two steps until convergence
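The hard EM loop above can be sketched in a few lines. This is a generic illustration, not the Segal implementation: the module-specific likelihood is stubbed with a toy one-dimensional unit-variance Gaussian per module, so the M-step reduces to a mean and the hard E-step to a nearest-mean assignment.

```python
def hard_em(data, k, iters=20):
    """Generic hard EM: alternate ML parameter fitting with hard
    assignment of the hidden module labels."""
    # Initialise hidden variables (deterministic round-robin assignment).
    z = [i % k for i in range(len(data))]
    means = [0.0] * k
    for _ in range(iters):
        # M-step: set parameters to their maximum-likelihood values
        # (toy model: one unit-variance Gaussian mean per module).
        for m in range(k):
            members = [x for x, zi in zip(data, z) if zi == m]
            if members:
                means[m] = sum(members) / len(members)
        # Hard E-step: set each hidden value to its single most
        # probable setting given the current parameters.
        z = [min(range(k), key=lambda m: (x - means[m]) ** 2) for x in data]
    return means, z

means, z = hard_em([0.1, 0.2, 0.0, 5.1, 4.9, 5.0], k=2)
```

With two well-separated groups the loop converges after a couple of iterations; the hard assignment is what distinguishes this from soft EM, which would instead keep a full posterior over module labels.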
Slide 28: Motif Model
Objective: learn the motif so as to discriminate between genes for which the regulation variable is on (r = 1) and genes for which it is off (r = 0).
Slide 29: Motif Model: Scoring Scheme
...CATTCC... → high score
...TGACAA... → low score
Slide 30: Motif Model: Scoring Scheme
...CATTCC... → high score
...TGACAA... → low score
High-scoring subsequences within the promoter: ...AGTCCATTCCGCCTCAAG...
Slide 31: Motif Model: Scoring Scheme
...CATTCC... → high score
...TGACAA... → low score
High-scoring subsequences: ...AGTCCATTCCGCCTCAAG...
Low-scoring (background) subsequences
Slide 32: Motif Model: Scoring Scheme
...CATTCC... → high score
...TGACAA... → low score
High-scoring subsequences: ...AGTCCATTCCGCCTCAAG...
Low-scoring (background) subsequences
Promoter sequence scoring
Slide 33: Motif Model
Scoring scheme: P(g.r = true | g.S, w)
The parameter set w can be taken to represent motifs.
Slide 34: Motif Model
Scoring scheme: P(g.r = true | g.S, w)
The parameter set w can be taken to represent motifs.
Maximum-likelihood setting → most discriminatory motif
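The scoring idea can be illustrated with a standard PSSM log-odds score against a uniform background. This is a generic sketch, not the exact Segal scoring function, and the PSSM probabilities below are made up for illustration: each column puts most of its mass on one consensus base of CATTCC, so windows containing the motif outscore background windows.

```python
import math

def pssm_score(window, pssm):
    """Log-odds score of one window under the motif model vs a
    uniform (0.25 per base) background."""
    return sum(math.log(pssm[i][base] / 0.25) for i, base in enumerate(window))

def best_window_score(seq, pssm):
    """Scan the promoter sequence and return the best-scoring window."""
    w = len(pssm)
    return max(pssm_score(seq[i:i + w], pssm) for i in range(len(seq) - w + 1))

def column(fav, p=0.85):
    # Hypothetical column: probability p on the favoured base,
    # the remainder spread over the other three.
    other = (1 - p) / 3
    return {b: (p if b == fav else other) for b in "ACGT"}

# Hypothetical PSSM for the consensus CATTCC (one column per position).
pssm = [column(b) for b in "CATTCC"]

hi = best_window_score("AGTCCATTCCGCCTCAAG", pssm)  # contains CATTCC
lo = best_window_score("AGTCTGACAAGCCTCAAG", pssm)  # background-like
```

Maximising the likelihood of such a score with respect to the column probabilities is what pushes the learnt PSSM towards the most discriminatory motif, and also what opens the door to the overfitting discussed on the next slides.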
Slide 35: Motif Model: Overfitting
[Figure: the true PSSM]
Slide 36: Motif Model: Overfitting
Typical motif instance: ...TTT.CATTCC... → high score under the true PSSM
Slide 37: Motif Model: Overfitting
Typical motif instance: ...TTT.CATTCC... → high score under the true PSSM
The inferred PSSM can triple the score!
Slide 38: Regulation Model
For each module m and each motif i, we estimate the association u_mi.
P(g.M = m | g.R) is proportional to exp(Σ_i u_mi · g.R_i)
Slide 39: Regulation Model: Geometrical Interpretation
The vectors (u_mi)_i define separating hyperplanes. The classification criterion is the inner product: each datapoint is given the label of the hyperplane it lies furthest from, on its positive side.
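Under this geometric reading, module assignment is a softmax over inner products with the hyperplane normals. A minimal sketch, with hypothetical association weights u_mi rather than weights learnt from data:

```python
import math

def module_posterior(r, U):
    """P(M = m | r) proportional to exp(<u_m, r>): a softmax over
    the inner products with each module's weight vector u_m."""
    scores = [sum(u_i * r_i for u_i, r_i in zip(u, r)) for u in U]
    z = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - z) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical associations u_mi: 2 modules over 3 motifs.
U = [[2.0, 0.0, -1.0],   # module 0: rewards motif 1, penalises motif 3
     [-1.0, 2.0, 1.0]]   # module 1: rewards motifs 2 and 3

p = module_posterior([1, 0, 0], U)       # gene with only motif 1 present
label = max(range(len(p)), key=p.__getitem__)
```

The datapoint gets the label of the module whose inner product (signed distance from the hyperplane, up to normalisation) is largest, which is exactly the "furthest on the positive side" criterion; when the data are pairwise linearly separable, nothing stops these inner products from growing without bound, producing the overconfident classification of the next slide.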
Slide 40: Regulation Model: Divergence and Overfitting
Pairwise linear separability → overconfident classification
Method A: dampen the parameters (e.g. a Gaussian prior)
Method B: make the dataset linearly inseparable by augmentation
Slide 41: Erroneous Interpretation of the Parameters
Segal et al. claim that:
- when u_mi = 0, motif i is inactive in module m;
- when u_mi > 0 for all i, m, only the presence of motifs is significant, not their absence.
Slide 42: Erroneous Interpretation of the Parameters
Segal et al. claim that:
- when u_mi = 0, motif i is inactive in module m;
- when u_mi > 0 for all i, m, only the presence of motifs is significant, not their absence.
Both claims contradict the normalisation conditions!
Slide 43: Sparsity
[Figure: the inferred process vs. the true process]
Slide 44: Sparsity
Reconceptualise the problem:
- Sparsity can be understood as pruning.
- Pruning can improve generalisation performance: it deals with overfitting both by damping and by decreasing the degrees of freedom.
- Pruning ought not be seen as a combinatorial problem; it can be dealt with via appropriate prior distributions.
Slide 45: Sparsity: the Laplacian
How to prune using a prior:
- Choose a prior with a simple discontinuity at the origin, so that the penalty term does not vanish near the origin.
- Every time a parameter crosses the origin, establish whether it will escape the origin or is trapped in Brownian motion around it.
- If trapped, force both its gradient and value to 0 and freeze it.
- One can actively look for nearby zeros to accelerate the pruning rate.
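An illustrative implementation of this pruning rule, not the thesis code: one gradient step under a Laplace prior (penalty λ|u_i|, whose subgradient has constant magnitude λ near the origin), where a parameter that crosses the origin is tested with the standard escape condition that the data gradient must beat the penalty, and is otherwise clamped to zero and frozen.

```python
def l1_prune_step(u, grad, lam, lr, frozen):
    """One gradient step with a Laplace (L1) penalty and origin-freezing.

    u      : current parameter values
    grad   : gradient of the negative log-likelihood w.r.t. u
    lam    : strength of the Laplace prior (penalty lam * |u_i|)
    lr     : learning rate
    frozen : set of indices already pinned to zero
    """
    new_u = []
    for i, (ui, gi) in enumerate(zip(u, grad)):
        if i in frozen:
            new_u.append(0.0)                    # stay pruned
            continue
        sign = 1.0 if ui > 0 else -1.0
        step = ui - lr * (gi + lam * sign)       # penalty never vanishes near 0
        if step * ui < 0 and abs(gi) <= lam:
            # Crossed the origin and the data gradient cannot escape the
            # penalty: trap the parameter at zero and freeze it.
            frozen.add(i)
            new_u.append(0.0)
        else:
            new_u.append(step)
    return new_u, frozen

u, frozen = [0.05, 1.0], set()
u, frozen = l1_prune_step(u, grad=[0.2, -0.3], lam=0.5, lr=0.2, frozen=frozen)
```

Here the small first parameter crosses zero and is frozen, while the second takes an ordinary damped step; repeating such steps is how the prior produces a sparse weight matrix without any combinatorial search over which weights to delete.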
Slide 46: Results: Generalisation Performance
Synthetic dataset with 49 motifs, 20 modules and 1800 datapoints
Slide 47: Results: Interpretability
[Figure: learnt weights of the default model vs. the true module structure vs. learnt weights of the Laplacian-prior model]
Slide 48: Regrets: Biological Data
Slide 49: Aims for This Presentation
- Briefly present the Segal model and the main criticisms offered in the thesis
- Briefly introduce PRMs
- Outline directions for future work
Slide 50: Probabilistic Relational Models
How do we model context-specific regulation? We need to cluster the experiments...
Slide 51: Probabilistic Relational Models
Variable A can vary with genes but not with
experiments
Slide 52: Probabilistic Relational Models
We now have variability with experiments but also
with genes!
Slide 53: Probabilistic Relational Models
Variability with experiments as required, but too many dependencies
Slide 54: Probabilistic Relational Models
Variability with experiments as required, provided we constrain the parameters of the probability distributions P(E | A) to be equal
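The constraint amounts to parameter tying: every experiment-level expression variable E_j shares the single CPD P(E | A), so the instantiated BN's parameter count stays constant however many experiments are added. A toy sketch with hypothetical binary variables:

```python
# One shared CPD P(E = 1 | A = a): tying it across experiments keeps the
# number of parameters constant as experiments are added. The variables
# and probabilities here are hypothetical, for illustration only.
shared_cpd = {
    0: 0.1,   # P(E = 1 | A = 0)
    1: 0.8,   # P(E = 1 | A = 1)
}

def joint_prob(a, expressions, p_a=0.5):
    """P(A = a, E_1, ..., E_n): every E_j uses the same tied CPD."""
    p = p_a if a == 1 else 1 - p_a
    for e in expressions:
        p_e1 = shared_cpd[a]
        p *= p_e1 if e == 1 else 1 - p_e1
    return p

# Three experiments, but still only the two numbers in shared_cpd.
p = joint_prob(1, [1, 1, 0])
```

Without tying, each experiment would carry its own table P(E_j | A), and the number of parameters would grow linearly with the number of experiments; with tying, the same two entries serve every instantiation, which is exactly what makes the BN on this slide well-behaved.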
Slide 55: Probabilistic Relational Models
The resulting BN is essentially UNIQUE, but the derivation is VAGUE, COMPLICATED and UNSYSTEMATIC
Slide 56: Probabilistic Relational Models
GENES: g.S1, g.S2, ...; g.R1, g.R2, ...; g.M; g.E1, g.E2, ...
This variable cannot be considered an attribute of a gene, because it has attributes of its own that are gene-independent.
Slide 57: Probabilistic Relational Models
GENES: g.S1, g.S2, ...; g.R1, g.R2, ...; g.M; g.E1, g.E2, ...
Slide 58: Probabilistic Relational Models
GENES: g.S1, g.S2, ...; g.R1, g.R2, ...; g.M; g.E1, g.E2, ...
EXPERIMENTS: e.Cycle_Phase, e.Dye_Type
Slide 59: Probabilistic Relational Models
GENES: g.S1, g.S2, ...; g.R1, g.R2, ...; g.M; g.E1, g.E2, ...
EXPERIMENTS: e.Cycle_Phase, e.Dye_Type
An expression measurement is an attribute of both a gene and an experiment.
Slide 60: Probabilistic Relational Models
GENES: g.S1, g.S2, ...; g.R1, g.R2, ...; g.M; g.E1, g.E2, ...
EXPERIMENTS: e.Cycle_Phase, e.Dye_Type
MEASUREMENTS: m(e,g).Level
Slide 61: Examples of PRMs - 1
Segal et al., From Promoter Sequence to Gene Expression
Slide 62: Examples of PRMs - 1
Segal et al., From Promoter Sequence to Gene Expression
Slide 63: Examples of PRMs - 2
Segal et al., Decomposing gene expression into cellular processes
Slide 64: Examples of PRMs - 2
Segal et al., Decomposing gene expression into cellular processes
Slide 65: Probabilistic Relational Models
A PRM gives rise to a family of BNs: BN1, BN2, BN3, ...
Given Dataset1, the PRM yields BN1; given Dataset2, the PRM yields BN2.
Relational schema: a higher-level description of the data.
PRM: a higher-level description of BNs.
Slide 66: Probabilistic Relational Models
- Relational vs flat data structures
- Natural generalisation: knowledge carries over
- Expandability
- Richer semantics: better interpretability
- No loss in coherence
Personal opinion (not tested yet):
- Not entirely natural as a generalisation
- Some loss in interpretability
- Some loss in coherence
Slide 67: Aims for This Presentation
- Briefly present the Segal model and the main criticisms offered in the thesis
- Briefly introduce PRMs
- Outline directions for future work
Slide 68: Future Research
1. Improve the learning algorithm:
- soften it by exploiting sparsity
- systematise dynamic addition / deletion
Slide 69: Future Research
2. Model selection techniques improve interpretability: learn the optimal number of modules in our model.
Slide 70: Future Research
2. Model selection techniques improve interpretability: learn the optimal number of modules in our model.
- Are such methods consistent?
- Do they carry over just as well to PRMs?
Slide 71: Future Research
3. Fine-tune the Laplacian regulariser to fit the skewing of the model.
Slide 72: Future Research
4. The choice of encoding the question into a BN/PRM is only partly determined by the domain. Are there any general rules about how to restrict the choice so as to promote interpretability?
Slide 73: Future Research
5. Explore methods to express structural, non-quantifiable prior beliefs about the biological domain using Bayesian tools.
Slide 74: Summary
- Briefly presented the Segal model and the main observations offered in the thesis
- Briefly introduced PRMs
- Hinted at directions for future work