Rich Probabilistic Methods for Gene Expression

Description:

Automatically trades off fit to data (likelihood of data) with model complexity ... Handling time. Handling sequence data (TFs) Incorporate structure information ...

Slides: 28

Provided by: get73

Learn more at: http://ai.stanford.edu

Transcript and Presenter's Notes

1
Rich Probabilistic Methods for Gene Expression
Eran Segal, Ben Taskar, Audrey Gasch, Nir Friedman, Daphne Koller
2
Outline

Motivation for richer models
PRMs for gene expression: modeling, learning, inference
Results: synthetic, stress, compendium

3
One-Sided Clustering

Non-parametric clustering: hierarchical agglomerative, SVD, K-means
Parametric clustering: probabilistic clustering

AutoClass using expression levels
[Figure: Gene-cluster node with expression nodes Level-1, Level-2, ..., Level-n, one per experiment]
4
One-Sided Clustering
[Figure: genes-by-experiments expression matrix with the genes split into Cluster 1 through Cluster 6; annotations: undetected separability, undetected similarity]
5
Basic Bi-Clustering
[Figure: genes-by-experiments expression matrix partitioned into a grid of clusters C1 through C18; annotations: detected separability, undetected similarity]
6
Desired Clustering

Allow for non-grid clusters
Rows no longer correspond to genes (similarly for columns)

[Figure: genes-by-experiments expression matrix partitioned into non-grid clusters C1 through C15; annotations: detected separability, detected similarity]
7
Basic Bi-Clustering
[Figure: network over cluster nodes Clust(gene1), Clust(gene2), Clust(exp1), Clust(exp2) and expression nodes G1-E1, G1-E2, G2-E1, G2-E2]
Two-sided clustering (PLSA, Hofmann)
8
Outline

Motivation for richer models
PRMs for gene expression: modeling, learning, inference
Results: synthetic, stress, compendium

9
PRMs: Basic Bi-Clustering
Classes of objects: Gene (Gene-cluster), Experiment (Exp. cluster), Expression (Level)
Compact representation of two-sided clustering
10
PRMs: Relational Schema

Describes the types of objects and relations in the database

[Figure: relational schema with classes Gene, Experiment, and Expression; attributes shown: Mutation, Cluster (twice), Binding Sites, Exp. Attributes, Functional Classes, Exp. Level]
11
PRM for Compendium Data

Parameters for nodes
Structure over gene features

[Figure: PRM with classes Gene, Array/Mutated Gene, and Expression; nodes: GCluster, GCluster (of mutated gene), GCN4, HSF, Lipid (of mutated gene), ACluster, Lipid, Endoplasmatic, Level]
12
Resulting Bayesian Network

3 genes, 2 mutation experiments

[Figure: ground Bayesian network with GCluster and Endoplasmatic nodes for the genes, ACluster and Lipid nodes for the arrays, and expression nodes E1,1, E1,2, E2,1, E2,2, E3,1, E3,2]
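
Unrolling a class-level model into a ground network like the one on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each expression node E[g,a] has only the gene's GCluster and the array's ACluster as parents, and the function name `unroll` and the node-naming scheme are hypothetical.

```python
from itertools import product

def unroll(num_genes, num_arrays):
    """Return {node: [parents]} for the ground network (hypothetical naming)."""
    parents = {}
    for g in range(1, num_genes + 1):
        parents[f"GCluster{g}"] = []   # one hidden cluster variable per gene
    for a in range(1, num_arrays + 1):
        parents[f"ACluster{a}"] = []   # one hidden cluster variable per array
    for g, a in product(range(1, num_genes + 1), range(1, num_arrays + 1)):
        # Assumed dependency: the level depends on both cluster variables.
        parents[f"E{g},{a}"] = [f"GCluster{g}", f"ACluster{a}"]
    return parents

# 3 genes and 2 mutation experiments, as on the slide.
network = unroll(num_genes=3, num_arrays=2)
for node, pa in network.items():
    print(node, "<-", pa)
```

The class-level model stays the same size however many genes and arrays there are; only the unrolled network grows.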
13
PRM Learning
[Figure: data and expert knowledge feed a learner, which outputs a PRM over Gene (Gene-cluster), Experiment (Exp. cluster), and Expression (Level)]

PRMs can be learned from empirical data:
parameter estimation
structure learning: learning the dependency structure
Can learn with missing data and hidden variables

14
PRM Learning

Goal: find a PRM structure that explains the data well
Define a scoring function to evaluate models
Bayesian score works best: Score(S : D) = log P(D | S) + log P(S), where P(D | S) is the marginal likelihood and P(S) is the prior
Automatically trades off fit to data (likelihood of data) with model complexity
Do heuristic search to find a high-scoring structure
Structure found is not necessarily the best one
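
The trade-off between fit and complexity in the Bayesian score can be seen in a small sketch. It scores one discrete child variable under different parent sets, using the closed-form Dirichlet-multinomial marginal likelihood. The symmetric Dirichlet prior (`alpha`), the linear per-parent structure prior, and the toy rows are all assumptions made for illustration; the slides do not specify them.

```python
from math import lgamma
from collections import Counter, defaultdict

def log_marginal_likelihood(child, parent_cols, rows, arity, alpha=1.0):
    """log P(D | S) for one discrete child, with a symmetric Dirichlet(alpha) prior."""
    counts = defaultdict(Counter)
    for row in rows:
        config = tuple(row[p] for p in parent_cols)   # parent configuration
        counts[config][row[child]] += 1
    total = 0.0
    for config_counts in counts.values():
        n = sum(config_counts.values())
        total += lgamma(arity * alpha) - lgamma(arity * alpha + n)
        for c in config_counts.values():              # unseen values contribute 0
            total += lgamma(alpha + c) - lgamma(alpha)
    return total

def bayesian_score(child, parent_cols, rows, arity, log_prior_per_parent=-1.0):
    # Hypothetical structure prior: log P(S) drops linearly with the number of parents.
    log_prior = log_prior_per_parent * len(parent_cols)
    return log_marginal_likelihood(child, parent_cols, rows, arity) + log_prior

# Toy data (made up): "level" copies "cluster"; "noise" is unrelated to "level".
rows = [{"cluster": c, "noise": i % 2, "level": c}
        for i, c in enumerate([0, 0, 0, 0, 1, 1, 1, 1])]
score_none = bayesian_score("level", [], rows, arity=2)
score_cluster = bayesian_score("level", ["cluster"], rows, arity=2)
score_noise = bayesian_score("level", ["noise"], rows, arity=2)
print(score_cluster > score_none > score_noise)  # prints True
```

On this toy data the informative parent raises the score, while the uninformative parent lowers it below the empty parent set.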
15
Learning PRMs

Parameter estimation: EM, with approximate inference for the E-step
Structure learning, complete data: learning tree splits, avoiding local maxima
Structure learning, incomplete data: iterate until convergence (hard SEM): EM, hard assignment, structure learning
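
The hard-assignment idea can be sketched on a much simpler model than a PRM. The example below is an illustration under stated assumptions, not the method from the slides: it clusters expression profiles with fixed-variance Gaussian clusters, where hard EM reduces to k-means-style updates, and it omits the structure-learning step of hard SEM. The function name, the initial means, and the toy profiles are hypothetical.

```python
def hard_em(profiles, init_means, max_iters=100):
    """Alternate hard cluster assignment and mean re-estimation until stable."""
    means = [list(m) for m in init_means]
    assignment = None
    for _ in range(max_iters):
        # Hard E-step: give each profile to its single most likely cluster
        # (for fixed-variance Gaussians, the nearest mean).
        new_assignment = [
            min(range(len(means)),
                key=lambda k: sum((x - m) ** 2 for x, m in zip(p, means[k])))
            for p in profiles
        ]
        if new_assignment == assignment:   # converged: assignments stopped changing
            break
        assignment = new_assignment
        # M-step: re-estimate each cluster mean from its assigned profiles.
        for k in range(len(means)):
            members = [p for p, z in zip(profiles, assignment) if z == k]
            if members:
                means[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means

# Toy expression profiles (made up): two induced genes, two repressed genes.
profiles = [[2.0, 2.1], [1.9, 2.2], [-2.0, -1.8], [-2.1, -2.2]]
assignment, means = hard_em(profiles, init_means=[[1.0, 1.0], [-1.0, -1.0]])
print(assignment)
```

As with the structure search, the result depends on initialization and is only a local optimum.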

16
Context Specific Dependencies
[Figure: tree-structured conditional distribution for the expression Level; internal nodes test GCluster = 0 (of gene), GCluster = 3 (of mutant), HSF > 2, ACluster = 4, and Endoplasmatic, with true/false branches ending in Level distributions at the leaves]
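
A tree-structured conditional distribution of this kind can be sketched as a small decision tree whose leaves hold the parameters for Level. The exact tree on the slide is not recoverable from the transcript, so the shape below, the class names `Split` and `Leaf`, and all leaf parameters are hypothetical placeholders, not learned values.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    mean: float   # parameters of the Level distribution in this context
    std: float

@dataclass
class Split:
    test: Callable[[dict], bool]       # test on the parent attributes
    if_true: "Union[Split, Leaf]"
    if_false: "Union[Split, Leaf]"

def lookup(node, context):
    """Walk the tree and return the leaf that applies in this context."""
    while isinstance(node, Split):
        node = node.if_true if node.test(context) else node.if_false
    return node

# Hypothetical tree: HSF matters only for genes in gene cluster 0.
tree = Split(
    test=lambda ctx: ctx["GCluster"] == 0,
    if_true=Split(
        test=lambda ctx: ctx["HSF"] > 2,
        if_true=Leaf(mean=1.5, std=0.5),
        if_false=Leaf(mean=0.0, std=1.0),
    ),
    if_false=Leaf(mean=-0.5, std=1.0),
)

print(lookup(tree, {"GCluster": 0, "HSF": 3}))
print(lookup(tree, {"GCluster": 2, "HSF": 3}))
```

In the second lookup the HSF value is never consulted, which is what makes the dependency context specific.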
17
Learning Process
[Figure: PRM with classes Gene, Array/Mutated Gene, and Expression (nodes: GCluster, HSF, GCN4, Lipid (of mutated gene), GCluster (of mutated gene), Lipid, Endoplasmatic, ACluster, Level), shown with the genes-by-experiments matrix]
18
Learning Process
[Figure: same PRM and genes-by-experiments matrix as on the previous slide]
19
Learning Process
[Figure: PRM with classes Gene, Array/Mutated Gene, and Expression, shown with the genes-by-experiments matrix; annotation: gene similarity]
20
Learning Process
[Figure: PRM with classes Gene, Array/Mutated Gene, and Expression, shown with the genes-by-experiments matrix; annotation: experiment similarity]
21
Learning Process
[Figure: PRM with classes Gene, Array/Mutated Gene, and Expression, shown with the genes-by-experiments matrix; annotation: separability by TF]
22
Learning Process
[Figure: PRM with classes Gene, Array/Mutated Gene, and Expression, shown with the genes-by-experiments matrix; annotation: attribute dependencies induce cluster change]
23
Learning Process
[Figure: PRM with classes Gene, Array/Mutated Gene, and Expression, shown with the genes-by-experiments matrix; annotation: achieved desired clustering]
24
Outline

Motivation for richer models
PRMs for gene expression Modeling Learning Inf
erence
Results Synthetic Stress Compendium

25
Synthetic Data: Recovering Structure

Synthetic data: 1000 genes, 90 arrays (12 types)
Parents recovered:
  Simulated data: 84.5 +/- 2.5
  Permuted data: 56 +/- 2.5
Cluster recovery:
  Simulated data: PRMs 98.4 +/- 1.07, Naïve Bayes 90.8 +/- 0.42
  Permuted data: PRMs 88.1 +/- 1.52, Naïve Bayes 76.7 +/- 1.42

26
Stress Data

954 genes, 88 arrays (12 types)
Structure learning: 15 significant TFs, 7 significant function categories
Cluster coherence: average variance reduction 0.69 -> 0.61 in 3 iterations
Allowing annotation changes: average variance reduction 0.69 -> 0.56 in 3 iterations

27
Fragment of PRM for Yeast Stress Data (Gasch et al.)
[Figure: PRM fragment with classes Gene, Array, and Expression; nodes: GCluster, Carbon, AAM, Condition, Mig1, Level]
28
Result: Context-Specific Groupings

A grouping is a set of genes that behave the same within a certain context: a condition or a set of conditions
Breakdown of genes into clusters is different in different contexts

Yeast Stress Data (Gasch et al.)
29
Example Biological Result

Discovered grouping of 17 genes
all induced in diauxic shift
all have ≥ 2 binding sites for the Mig1 transcription factor
many not known to have been regulated by Mig1
Context-sensitive groupings were key to identifying the cluster

30
Compendium Data Results

Goal: predict the array cluster of a particular gene mutation before performing the experiment
Can hope to do this because:
array cluster depends on gene cluster
gene cluster is predicted based on behavior in other arrays

[Figure: plot with x-axis "Prediction confidence" (0.2 to 1) and y-axis "Accuracy / Predicted" (0 to 1); series: correct predictions, total predicted; annotation: 44 arrays predicted at 95% accuracy]
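
The relationship between prediction confidence, number of arrays predicted, and accuracy can be sketched as a simple thresholding computation. This is an assumed reading of how such a curve is produced, not a description of the authors' evaluation; the function name and the toy predictions are made up and are not the results reported on the slide.

```python
def accuracy_at_confidence(predictions, threshold):
    """predictions: list of (confidence, is_correct) pairs.

    Returns (number predicted, accuracy) over predictions at or above threshold.
    """
    kept = [ok for conf, ok in predictions if conf >= threshold]
    if not kept:
        return 0, None                      # nothing predicted at this threshold
    return len(kept), sum(kept) / len(kept)

# Toy predictions (made up): confidence in the predicted array cluster, and
# whether that prediction turned out to be correct.
toy = [(0.95, True), (0.9, True), (0.8, False), (0.6, True), (0.4, False)]
for threshold in (0.85, 0.5):
    print(threshold, accuracy_at_confidence(toy, threshold))
```

Raising the threshold keeps fewer predictions; whether accuracy also improves depends on how well calibrated the confidences are.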
31
Future Directions