From Sequence to Expression: A Probabilistic Framework


Slides: 34
Provided by: erans4


1
From Sequence to Expression: A Probabilistic Framework
  • Eran Segal (Stanford)

Joint work with
Nir Friedman (Hebrew U.), Daphne Koller (Stanford),
Yoseph Barash (Hebrew U.), Itamar Simon (Whitehead Inst.)
2
Understanding Cellular Processes
  • Complex biological processes (e.g. cell cycle)
  • Coordination of multiple events
  • Each event requires different modules

Can we recover the regulatory circuits that
control such processes?
3
Gene Structure
4
Gene Regulation
A
mRNA
5
Gene Regulation
A
mRNA
6
Gene Regulation
Swi5
mRNA
7
Gene Regulation
Swi5
mRNA
8
Gene Regulation
mRNA
9
Goal
ACTAGTGCTGA

CTATTATTGCA
CTGATGCTAGC
10
Model of Gene Regulation
Probabilistic Relational Models (PRMs): Pfeffer and Koller (1998);
Friedman et al (1999); Segal et al (2001)
Sequence
Promoter sequences
Gene
Experiment
Regulation by transcription factors
  • Context
  • Cluster

Expression measurements
Expression
11
Regulation to Expression
Gene
Experiment
R(t2)
R(t1)
Exp. type
Exp. cluster
Level
Expression
R(t1) = yes → t1 regulates gene
R(t1) = no → t1 does not regulate gene
12
Regulation to Expression
Gene
Experiment
R(t2)
R(t1)
Exp. type
Exp. cluster
Level
Expression
13
Modeling Context Specificity
Gene
Experiment
R(t2)
R(t1)
Exp. type
Exp. cluster
Level
Expression
14
Sequence Model
  • Assumptions
  • Binding site is of length k
  • Binding may occur at any k-mer
  • TF regulates gene if binding occurs anywhere

Sequence
Gene
Experiment
R(t2)
R(t1)
Exp. type
Exp. cluster
Level
Expression
15
From Sequence to Regulation
  • Assumptions
  • Binding site is of length k
  • Binding may occur at any k-mer
  • TF regulates gene if binding occurs anywhere

16
From Sequence to Regulation
  • Model for one gene g, promoter region of length 5 and k = 2
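
The assumptions above can be made concrete with a small sketch. A promoter of length 5 with k = 2 has 4 candidate k-mers; if each k-mer is bound independently with some probability, the gene is regulated with probability one minus the probability that no k-mer is bound. The per-k-mer binding model below (a logistic function of summed position weights), the function names, and all numbers are illustrative assumptions, not the parameterization used in the talk.

```python
import math


def kmers(promoter, k):
    """Return all overlapping k-mers of the promoter, left to right."""
    return [promoter[i:i + k] for i in range(len(promoter) - k + 1)]


def binding_prob(kmer, weights, bias):
    """Hypothetical binding model: logistic of a sum of per-position weights."""
    score = bias + sum(weights[pos][base] for pos, base in enumerate(kmer))
    return 1.0 / (1.0 + math.exp(-score))


def regulation_prob(promoter, weights, bias):
    """P(R = yes): binding occurs at one or more k-mers (treated as independent)."""
    k = len(weights)
    p_no_binding = 1.0
    for kmer in kmers(promoter, k):
        p_no_binding *= 1.0 - binding_prob(kmer, weights, bias)
    return 1.0 - p_no_binding


# Made-up weights for a 2-mer motif that prefers "CG" (assumption for illustration).
weights = [
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
]
bias = -3.0

print(kmers("ACGTA", 2))                         # the 4 candidate binding sites
print(regulation_prob("ACGTA", weights, bias))   # contains the preferred "CG"
print(regulation_prob("AAAAA", weights, bias))   # no good match anywhere
```

A promoter containing a strong match anywhere gets a high regulation probability, while weak matches at every position still contribute a small amount each.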

17
Joint Probabilistic Model
k-mer

s1
sk
B(t1)
B(t2)
Gene
Experiment
R(t2)
R(t1)
Exp. type
Exp. Cluster
Level
Expression
18
Localization Assay
19
Localization Assay
Swi5
  • Induce TF protein level

20
Localization Assay
Swi5
  • Induce TF protein level
→ TF binds to targets

21
Localization Assay
[Figure labels: Swi5; Gene Bound; Gene Not Bound]
  • Induce TF protein level
→ TF binds to targets
  • Measure TF binding to promoter of every gene
  • Assign confidence for each binding

22
Localization Assay
  • Localization data measure TF binding to promoter
    of each gene (assign binding confidence)

Simon et al (2001)
23
Is Regulation Observed?
  • Not quite
  • Localization is measured for specific conditions
  • Localization is measured for large DNA regions
  • Localization is noisy

24
Incorporating Localization
Gene
Experiment
R(t2)
R(t1)
Exp. type
Exp. Cluster
L(t2)
L(t1)
Observed localization
Level
Expression
  • Localization p-value is noisy sensor of actual
    regulation
  • If regulation occurs, p-value likely to be low
  • If no regulation, p-value likely to be high

25
Localization Model
Gene
R(t1)
L(t1)
Observed
  • Localization p-value is noisy sensor of actual
    regulation
  • If regulation occurs, p-value likely to be low
  • If no regulation, p-value likely to be high
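
The noisy-sensor idea can be sketched as a Bayes update on the regulation variable. The densities here are assumptions chosen for illustration: when the TF regulates the gene, the p-value is drawn from a Beta(alpha, 1) distribution with alpha < 1, which concentrates near 0; when it does not, the p-value is uniform on (0, 1]. The prior, alpha, and the function names are hypothetical.

```python
def p_value_density(p, regulated, alpha=0.2):
    """Assumed sensor model for a localization p-value in (0, 1].

    regulated:     Beta(alpha, 1) density, alpha * p**(alpha - 1), peaked near 0
    not regulated: Uniform(0, 1) density, i.e. 1.0 everywhere
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    return alpha * p ** (alpha - 1.0) if regulated else 1.0


def posterior_regulated(p, prior=0.05, alpha=0.2):
    """P(R = yes | observed p-value) by Bayes' rule."""
    on = prior * p_value_density(p, True, alpha)
    off = (1.0 - prior) * p_value_density(p, False, alpha)
    return on / (on + off)


for p in (1e-6, 1e-3, 0.05, 0.5):
    print(p, posterior_regulated(p))
```

Low p-values push the posterior toward regulation and high p-values push it below the prior, without ever making regulation a hard observed fact.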

26
Joint Probabilistic Model
promoter

s1
sk
Gene
Experiment
R(t2)
R(t1)
Exp. type
Exp. Cluster
L(t2)
L(t1)
Level
Expression
27
Learning the Models
Experimental Details
Localization Data
28
Learning the Models
Experimental Details
Localization Data
29
Model Learning
  • Structure Learning
  • Tree structure
  • Missing Data
  • Experiment cluster
  • Regulation variables
  • Motif Model
  • Parameter estimation

30
Model Learning
promoter

s1
sk
Gene
Experiment
R(t2)
R(t1)
Exp. type
L(t1)
Exp. cluster
L(t1)
Level
Expression
31
Resulting Bayesian Network
[Figure: ground Bayesian network for three genes (i = 1, 2, 3) and two experiments (j = 1, 2). Per-gene nodes: s1i ... ski, R(t1)i, R(t2)i, L(t1)i, L(t2)i. Per-experiment nodes: Exp. type, Exp. cluster. Per gene-experiment pair: Level i,j]
32
Model Learning E-Step
[Figure: the ground Bayesian network of slide 31 (three genes, two experiments), annotated: Loopy belief propagation]
33
Model Learning M-Step
[Figure: the ground Bayesian network of slide 31 (three genes, two experiments), annotated: Conjugate Gradient; Standard ML estimation]
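
The E-step / M-step alternation can be illustrated on a deliberately reduced toy problem. This sketch is not the model from the slides: it keeps only a hidden binary regulation variable per gene and an observed localization p-value, so genes are independent and the E-step is exact (the full model couples variables and uses loopy belief propagation instead). The sensor densities (Beta(alpha, 1) when regulated, uniform otherwise), the synthetic data, and all names are assumptions.

```python
import math
import random


def e_step(p_values, prior, alpha):
    """Posterior P(R = yes | p) for each gene under the current parameters."""
    posts = []
    for p in p_values:
        on = prior * alpha * p ** (alpha - 1.0)  # regulated: Beta(alpha, 1)
        off = 1.0 - prior                        # not regulated: Uniform(0, 1)
        posts.append(on / (on + off))
    return posts


def m_step(p_values, posts):
    """Closed-form weighted maximum-likelihood updates from soft assignments."""
    total = sum(posts)
    prior = total / len(p_values)
    alpha = -total / sum(w * math.log(p) for w, p in zip(posts, p_values))
    return prior, alpha


def log_likelihood(p_values, prior, alpha):
    """Marginal log-likelihood of the p-values (hidden R summed out)."""
    return sum(
        math.log(prior * alpha * p ** (alpha - 1.0) + 1.0 - prior)
        for p in p_values
    )


def fit(p_values, prior=0.5, alpha=0.5, iterations=50):
    """Run EM and record the log-likelihood after every iteration."""
    history = []
    for _ in range(iterations):
        posts = e_step(p_values, prior, alpha)
        prior, alpha = m_step(p_values, posts)
        history.append(log_likelihood(p_values, prior, alpha))
    return prior, alpha, history


# Synthetic p-values: a made-up 30% of genes regulated, with alpha = 0.2.
random.seed(0)
true_prior, true_alpha = 0.3, 0.2
p_values = []
for _ in range(500):
    u = 1.0 - random.random()  # uniform on (0, 1], avoids log(0)
    regulated = random.random() < true_prior
    # Inverse-CDF sampling: Beta(alpha, 1) has CDF p**alpha.
    p_values.append(u ** (1.0 / true_alpha) if regulated else u)

prior, alpha, history = fit(p_values)
print(prior, alpha, history[-1])
```

Each iteration cannot decrease the marginal log-likelihood, which is a useful sanity check when adapting a sketch like this to a richer model.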
34
Experimental Results
  • Yeast
  • Cell Cycle expression data (Spellman et al)
  • Localization data for 9 TFs (Simon et al)
  • Yeast genome (promoters)

35
Generalization
Gene log-likelihood
-112.24
Experiment
Gene
R(t2)
R(t1)
Exp. Cluster
Level
Expression
36
Generalization
Gene log-likelihood
-121.48
  • Localization

-112.24
Experiment
Gene
L(t2)
L(t1)
Exp. type
Level
Expression
37
Generalization
Gene log-likelihood
-112.24
Experiment
Gene
R(t2)
R(t1)
Exp. type
Exp. Cluster
Level
Expression
38
Generalization
Gene log-likelihood
-112.24
Experiment
Gene
R(t2)
R(t1)
Exp. type
Exp. Cluster
Level
Expression
39
Generating Hypotheses
40
Expression vs Regulation
Genes predicted to be regulated by Swi5 are probably real Swi5 targets
[Figure: four panels (cdc15, cdc28, elu, alpha); y-axis from -1 to 1]
41
Combinatorial Effects
[Figure: four panels (cdc15, cdc28, elu, alpha); y-axis from -1 to 1; label: Phase; legend: Fkh2 Swi4, Fkh2 Ndd1]
42
Combinatorial Effects
[Figure: four panels (cdc15, cdc28, elu, alpha); y-axis from -1 to 1; label: Phase; legend: Mcm1 Ndd1, Mcm1 Ace2, Mcm1 Swi5]
43
Localization Assignment Changes
44
Motifs Found
  • Ndd1

Simon et al.: 17
Expanded Set: 28
Remaining Genes: 1
Expanded set identified additional genes regulated by Ndd1
45
TF    Simon et al.  Expanded Set  Remaining Genes  P-Value
Ace2  10            9             1                1.4e-6
Fkh1  29            25            8                4.4e-10
Fkh2  29            29            10               5.4e-11
Mbp1  66            56            8                1.9e-45
Mcm1  28            24            2                4.2e-18
Ndd1  17            28            1                1.9e-24
Swi4  41            37            5                6.4e-26
Swi5  28            23            2                4.9e-15
Swi6  50            52            6                2.3e-48
46
Induced Interaction Network
  • TF pairs whose regulation predicts expression of
    same gene cluster

[Figure: network linking the TFs Swi5, Swi6, Ace2, Mcm1, Mbp1, Swi4, Fkh1, Ndd1 and Fkh2, laid out around the cell-cycle phases G1, S, G2 and M]
47
Conclusions
  • Unified probabilistic model explaining gene
    regulation using sequence, localization and
    expression data
  • Models complex interactions between regulators
  • Discriminative model maximizing P(Expr. | Seq.)
  • Sequence data helps explain expression patterns
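
One way to write the discriminative objective, as a sketch under simplifying assumptions: for each gene g with promoter sequence S_g and expression profile E_g, the hidden regulation variables R_g are summed out and the parameters are chosen to maximize the conditional likelihood of expression given sequence. The symbols θ, S_g, E_g and R_g are notation introduced here for illustration, and the experiment-cluster variables that couple genes are omitted.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{g} \log P(E_g \mid S_g; \theta),
\qquad
P(E_g \mid S_g; \theta) = \sum_{r} P(R_g = r \mid S_g; \theta)\, P(E_g \mid R_g = r; \theta)
```

Under this reading the sequence model is trained only for how well it predicts expression, not for how well it describes the promoter sequences themselves.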

48
Big Picture
  • Goal: unified probabilistic framework
  • Models complex biological domains
  • Incorporates heterogeneous data
  • Framework explicitly incorporates basic biological building blocks
    within the model
  • Genes, TFs, proteins, patients, cells, species,
  • Much closer connection between biology and model
  • Can read biology directly from model
  • Can incorporate prior knowledge easily
  • Can explicitly represent and learn biological
    models