Context-Specific Bayesian Clustering for Gene Expression Data

Slides: 19

Provided by: yoseph3

1
Context-Specific Bayesian Clustering for Gene
Expression Data

Yoseph Barash & Nir Friedman
School of Computer Science & Engineering
Hebrew University

2
Introduction

New experimental methods → abundance of data
Gene expression
Genomic sequences
Protein levels
Data analysis methods are crucial for
understanding such data
Clustering serves as a tool for organizing the data
and finding patterns in it

3
This Talk

New method for clustering
Combines different types of data
Emphasis on learning context-specific description
of the clusters
Application to gene expression data
Combine expression data with genomic information

4
The Data
[Figure: data matrix with rows for genes (indexed by i) and columns for experiments and binding sites]

Goal
Understand interactions between transcription factors (TFs)
and expression levels

5
Simple Clustering Model

Attributes are independent given the cluster
Simple model → computationally cheap
Genes are clustered according to both expression
levels and binding sites
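
As a rough illustration of the independence assumption above, the following Python sketch computes the posterior over clusters for a single gene described by one Gaussian expression attribute and one binary binding-site attribute. The parameter values and the names `CLUSTERS` and `cluster_posterior` are hypothetical and are not taken from the talk.

```python
import math

# Hypothetical two-cluster model: each cluster has a prior weight, a Gaussian
# over one expression level, and a Bernoulli over one binding-site indicator.
CLUSTERS = [
    {"weight": 0.5, "mean": -1.0, "std": 0.5, "p_site": 0.9},
    {"weight": 0.5, "mean": 1.0, "std": 0.5, "p_site": 0.1},
]


def gaussian_log_pdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def cluster_posterior(expression, has_site):
    """P(cluster | attributes), with attributes independent given the cluster."""
    log_joint = []
    for c in CLUSTERS:
        log_p = math.log(c["weight"])
        # Independence given the cluster: per-attribute log terms simply add.
        log_p += gaussian_log_pdf(expression, c["mean"], c["std"])
        log_p += math.log(c["p_site"] if has_site else 1.0 - c["p_site"])
        log_joint.append(log_p)
    # Normalize in a numerically stable way (log-sum-exp).
    top = max(log_joint)
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


posterior = cluster_posterior(expression=-0.8, has_site=True)
```

Because each attribute contributes its own additive log term, adding another experiment or binding site only adds one more term per cluster, which is what keeps the model cheap.
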

6
Local Probability Models

[Figure: local probability models; labels: TF1, TF2, Multinomial, Gaussian]

7
Structure in Local Probability Models

[Figure: structure in local probability models; labels: TF1, TF2]

8
Context-Specific Independence

Benefits
Identifies what features characterize each
cluster
Reduces bias during learning
A compact and efficient representation
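
One way to picture the compact representation mentioned above is the sketch below, in which an attribute gets its own parameter only in the clusters where it is informative, and every other cluster shares a default. The attribute names, probabilities, and helper functions are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical CSI representation: for each attribute, only the clusters listed
# in "specific" have their own parameter; all other clusters share "default".
N_CLUSTERS = 5

csi_model = {
    "TF1": {"default": 0.05, "specific": {2: 0.85}},
    "TF2": {"default": 0.10, "specific": {}},
    "TF3": {"default": 0.02, "specific": {0: 0.70, 2: 0.60}},
}


def site_probability(attribute, cluster):
    """P(binding site present | cluster) under the CSI representation."""
    entry = csi_model[attribute]
    return entry["specific"].get(cluster, entry["default"])


def characterizing_attributes(cluster):
    """Attributes whose distribution in this cluster differs from the default."""
    return sorted(a for a, e in csi_model.items() if cluster in e["specific"])


def parameter_count():
    """One default parameter per attribute plus one per cluster-specific entry."""
    return sum(1 + len(e["specific"]) for e in csi_model.values())


# A full table would need one parameter per (cluster, attribute) pair.
full_count = N_CLUSTERS * len(csi_model)
```

In this toy setup the CSI form needs 6 parameters where the full table needs 15, and `characterizing_attributes(2)` directly lists the features that distinguish cluster 2.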

9
Scoring CSI Cluster Models

Represent conditional probabilities with
different parametric families
Gaussian,
Multinomial,
Poisson
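Choose parameter priors from the appropriate
conjugate prior families
Score:
$$\mathrm{Score}(M : D) = \log P(D \mid M) + \log P(M)$$
where $P(D \mid M)$ is the marginal likelihood of the data $D$ under model $M$, and $P(M)$ is the prior over models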
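
Conjugate priors matter here because they give the marginal likelihood in closed form. As a minimal sketch for the multinomial family with a Dirichlet prior (the function name and the example counts are assumptions, not taken from the talk), the log marginal likelihood of an observed sequence can be computed from the counts alone:

```python
import math


def log_marginal_likelihood(counts, alphas):
    """Log marginal likelihood of a sequence of categorical observations.

    counts[k] is the number of times value k was observed; alphas[k] is the
    matching Dirichlet hyperparameter. The parameters are integrated out
    analytically, which is what conjugacy provides.
    """
    total_alpha = sum(alphas)
    total_count = sum(counts)
    result = math.lgamma(total_alpha) - math.lgamma(total_alpha + total_count)
    for n, a in zip(counts, alphas):
        result += math.lgamma(a + n) - math.lgamma(a)
    return result


# One observation of each of two values under a uniform Dirichlet(1, 1) prior.
score = log_marginal_likelihood(counts=[1, 1], alphas=[1.0, 1.0])
```

For this example the probability is (1/2)(1/3) = 1/6, so `score` equals log(1/6). Analogous closed forms exist for the Gaussian and Poisson families with their own conjugate priors.
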
10
Learning Structure: Naive Approach

A hard problem

Standard approach

Basic problem: efficiency

11
Learning Structure: Structural EM

Given complete data, we can evaluate each edge's
parameters separately
For MAP, we compute EM only once for each iteration
Guaranteed to converge to a local optimum
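
To illustrate why completed data lets each edge be evaluated on its own, the sketch below scores a single binary attribute in two ways: depending on the cluster, or independent of it. It works from expected counts derived from soft cluster assignments. This is a simplified illustration rather than the authors' algorithm: the Beta(1, 1) prior, the function names, and the use of fractional expected counts inside the marginal likelihood are all assumptions.

```python
import math


def log_beta_marginal(n1, n0, a1=1.0, a0=1.0):
    """Log marginal likelihood of binary counts under a Beta(a1, a0) prior."""
    return (math.lgamma(a1 + a0) - math.lgamma(a1 + a0 + n1 + n0)
            + math.lgamma(a1 + n1) - math.lgamma(a1)
            + math.lgamma(a0 + n0) - math.lgamma(a0))


def expected_counts(responsibilities, values, n_clusters):
    """Expected [present, absent] counts of a binary attribute per cluster."""
    counts = [[0.0, 0.0] for _ in range(n_clusters)]
    for resp, value in zip(responsibilities, values):
        for k in range(n_clusters):
            counts[k][0 if value else 1] += resp[k]
    return counts


def keep_edge(responsibilities, values, n_clusters):
    """True if letting the attribute depend on the cluster scores higher."""
    counts = expected_counts(responsibilities, values, n_clusters)
    dependent = sum(log_beta_marginal(n1, n0) for n1, n0 in counts)
    pooled_present = sum(c[0] for c in counts)
    pooled_absent = sum(c[1] for c in counts)
    independent = log_beta_marginal(pooled_present, pooled_absent)
    return dependent > independent


# Toy completed data: 10 genes assigned to cluster 0, then 10 to cluster 1.
resp = [[1.0, 0.0]] * 10 + [[0.0, 1.0]] * 10
informative = [1] * 10 + [0] * 10   # site present only in cluster 0
uninformative = [1, 0] * 10         # site equally common in both clusters

edge_informative = keep_edge(resp, informative, 2)
edge_uninformative = keep_edge(resp, uninformative, 2)
```

The decision for one attribute uses only that attribute's counts, so the search over edges decomposes once the E-step has produced the assignments.
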
12
Results on Synthetic Data

Basic approach
Generate data from a known structure
Evaluate learned structures for different sample
numbers (200-800).
Add noise of unrelated samples to the training
set to simulate genes that do not fall into
nice functional categories (10-30).
Test the learned model for structure as well as for
correlation between its tagging and the one
given by the original model.
Main results

Cluster number: models with fewer clusters were
sharply penalized. Models with 1-2 additional
clusters often got a similar score, with
degenerate clusters to which none of the real
samples were classified.
Structure accuracy: very few false negative edges,
10-20 false positive edges (score dependent).
Mutual information ratio: max for 800 samples,
95-100 for 500, and 90 for 200 samples.
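
The slides do not define the mutual information ratio. One plausible reading, assumed here, is the mutual information between the true and learned cluster labels divided by the entropy of the true labels, so that a perfect recovery of the original tagging scores 1. A minimal sketch under that assumption:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two equally long label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


def information_ratio(true_labels, learned_labels):
    """Assumed definition: I(true; learned) / H(true)."""
    return mutual_information(true_labels, learned_labels) / entropy(true_labels)


true_labels = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]   # same partition, different cluster names
unrelated = [0, 1, 0, 1]   # carries no information about the true partition

ratio_relabeled = information_ratio(true_labels, relabeled)
ratio_unrelated = information_ratio(true_labels, unrelated)
```

The measure is invariant to how clusters are numbered, which matters because a learned model has no reason to use the same cluster indices as the generating model.
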
13
Yeast Stress Data (Gasch et al. 2001)

Examines response of yeast to stress situations
Total: 93 arrays
We selected 900 genes that changed in a
selective manner
Treatment steps:
Initial clustering
Found putative binding sites based on clusters
Re-clustered with these sites

14
Stress Data -- CSI Clusters

15
CSI Clusters

16
Promoter Analysis

Cluster 3
MIG1: CCCCGC, CGGACC, ACCCCG
GAL4: CGGGCC
Others: CCAATCA

17
Promoter Analysis

Cluster 7
GCN4: TGACTCA
Others: CGGAAAA, ACTGTGG
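
Turning a motif such as the GCN4 site above into a binding-site attribute requires checking each gene's promoter for it. The sketch below does an exact-match scan on both strands. The promoter sequence is made up for illustration, and exact matching is an assumption: the talk does not say how sites were detected.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence over the alphabet A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(promoter, motif):
    """Return (position, strand) for each exact motif match on either strand."""
    hits = []
    rc = reverse_complement(motif)
    for i in range(len(promoter) - len(motif) + 1):
        window = promoter[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            # Match on the opposite strand; elif avoids double-counting
            # motifs that equal their own reverse complement.
            hits.append((i, "-"))
    return hits


promoter = "AATGACTCATTTGAGTCAGG"  # hypothetical promoter fragment
hits = find_sites(promoter, "TGACTCA")
has_gcn4_site = bool(hits)
```

The boolean `has_gcn4_site` is the kind of per-gene binding-site attribute that can be clustered alongside expression levels.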

18
Discussion

Goals
Identify binding sites/transcription factors
Understand interactions among transcription
factors
Combinatorial effects on expression
Predict role/function of the genes
Methods
Integration of a model of the statistical patterns of
binding sites (see Holmes & Bruno, ISMB 2000)
Additional dependencies among attributes
Tree-augmented naive Bayes
Probabilistic Relational Models (see poster)
