Using Bayesian Networks to Analyze Expression Data

Slides: 29
Provided by: nirf
Transcript and Presenter's Notes


1
Using Bayesian Networks to Analyze Expression Data
  • N. Friedman, M. Linial, I. Nachman, D. Pe'er
  • Hebrew University, Jerusalem

2
Central Dogma
[Figure: central dogma diagram; surviving labels: Translation, Protein]
Cells express different subsets of their genes in
different tissues and under different conditions
3
Gene Regulation
  • Regulation of expression of genes is crucial
  • Regulation occurs at many stages:
      • Pre-transcriptional (chromatin structure)
      • Transcription initiation
      • RNA editing (splicing) and transport
      • Translation initiation
      • Post-translational modification
      • RNA and protein degradation
  • Understanding regulatory processes is a central
    problem of biological research

4
Microarrays (aka DNA chips)
  • New technological breakthrough
  • Measure RNA expression levels of thousands of
    genes in one experiment
  • Measure expression on a genomic scale
  • Opens up new experimental designs
  • Many major labs already use this technology,
    or will in the near future

5
The Problem
[Figure: expression data matrix indexed by gene (j) and experiment (i)]
  • Goal:
      • Learn regulatory/metabolic networks
      • Identify causal sources of the biological
        phenomena of interest

6
Analysis Approaches
  • Clustering of expression data
      • Groups together genes with similar expression
        patterns
      • Does not reveal structural relations
        between genes
  • Boolean networks
      • Deterministic models of the logical interactions
        between genes
      • Deterministic, impractical for real data

7
Example: Cell-Cycle Data (Spellman et al.)
[Figure: clustered expression data; surviving labels: cell cycle stages, clusters]
8
Our Approach
  • Characterize statistical relationships between
    expression patterns of different genes
  • Beyond pair-wise interactions:
      • Many interactions are explained by intermediate
        factors
      • Regulation involves combined effects of several
        gene products
  • We build on the language of Bayesian networks

9
Network Example
  • Noisy stochastic process
  • Example: Pedigree
      • A node represents an individual's genotype
  • Modeling assumptions:
      • Ancestors can affect descendants' genotypes only
        by passing genetic material through intermediate
        generations

10
Network Structure
  • Generalizing to DAGs:
      • A child is conditionally independent of its
        non-descendants, given the values of its parents
  • Often a natural assumption for causal processes,
    if we believe that we capture the relevant state
    of each intermediate stage

[Figure: DAG with nodes labeled Ancestor, Parent, Non-descendant (two nodes), Descendant]
11
Local Probabilities
  • Associated with each variable $X_i$ is a conditional
    probability distribution $P(X_i \mid \mathrm{Pa}_i)$
  • Discrete variables: multinomial distribution
  • Continuous variables: a choice of models, for example
    linear Gaussian
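As a minimal sketch of the two local models named on this slide, the following Python evaluates a linear Gaussian conditional density and a multinomial table lookup. The function names, the table layout, and the example numbers are illustrative assumptions and do not come from the slides.

```python
import math


def linear_gaussian_logpdf(x, parent_values, intercept, weights, sigma):
    """Log density of x under N(intercept + sum_i weights[i] * parent_values[i], sigma^2)."""
    mean = intercept + sum(w * u for w, u in zip(weights, parent_values))
    z = (x - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def multinomial_prob(x, parent_values, table):
    """Look up P(X = x | parents) in a table keyed by the tuple of parent values."""
    return table[tuple(parent_values)][x]


# Hypothetical CPD for a 3-level variable with one binary parent.
example_table = {(0,): [0.7, 0.2, 0.1], (1,): [0.1, 0.3, 0.6]}
print(multinomial_prob(1, [0], example_table))  # 0.2
```

In the multinomial case every parent configuration gets its own row of probabilities, while the linear Gaussian case needs only one weight per parent plus an intercept and a variance.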

12
Bayesian Network Semantics
Qualitative part: the DAG specifies
conditional independence statements
Quantitative part: local probability models
Together: a unique joint distribution over the domain

  • Compact, efficient representation:
      • at most $k$ parents per node: $O(2^k n)$ vs. $O(2^n)$ parameters
      • parameters pertain to local interactions

$$P(C,A,R,E,B) = P(B)\,P(E \mid B)\,P(R \mid E,B)\,P(A \mid R,B,E)\,P(C \mid A,R,B,E)$$
versus
$$P(C,A,R,E,B) = P(B)\,P(E)\,P(R \mid E)\,P(A \mid B,E)\,P(C \mid A)$$
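To make the saving concrete, here is a small sketch that counts free parameters for the two factorizations on this slide, assuming all five variables are binary (an assumption; the slide does not state the cardinalities). The helper name `num_free_params` is hypothetical.

```python
def num_free_params(parents, cardinality=2):
    """Count free parameters of a discrete Bayesian network.

    `parents` maps each variable to the list of its parents. Each variable
    needs (cardinality - 1) free parameters per joint parent configuration.
    """
    return sum((cardinality - 1) * cardinality ** len(pa) for pa in parents.values())


# Structure read off the factored form P(B) P(E) P(R|E) P(A|B,E) P(C|A).
factored = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

# Chain-rule form: every variable conditions on all earlier ones.
order = ["B", "E", "R", "A", "C"]
full = {v: order[:i] for i, v in enumerate(order)}

print(num_free_params(factored))  # 1 + 1 + 2 + 4 + 2 = 10
print(num_free_params(full))      # 1 + 2 + 4 + 8 + 16 = 31
```

The gap widens quickly as variables are added: the chain-rule count grows exponentially in the number of variables, while the factored count grows linearly when the number of parents per node is bounded.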
13
Why Bayesian Networks?
  • Bayesian networks
      • Flexible representation of the dependency structure
        of multivariate distributions
      • Natural for modeling processes with local
        interactions
  • Learning of Bayesian networks
      • Can learn dependencies from observations
      • Handles stochastic processes:
          • true stochastic behavior
          • noise in measurements

14
Modeling Biological Regulation
  • Variables of interest:
      • Expression levels of genes
      • Concentration levels of proteins
      • Exogenous variables: nutrient levels, metabolite
        levels, temperature
      • Phenotype information
  • Bayesian network structure:
      • Captures dependencies among these variables

15
Examples
  • Interactions are represented by a graph
  • Each gene is represented by a node in the graph
  • Edges between the nodes represent direct
    dependency

16
More Complex Examples
  • Dependencies can be mediated through other nodes
  • Common effects can imply conditional dependence

[Figure: two three-node networks over genes A, B, C, labeled "Common cause" and "Intermediate gene"]
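The "intermediate gene" case can be checked numerically. The sketch below assumes a chain A -> B -> C over binary variables with made-up probabilities (none of the numbers come from the slides) and shows that A and C are dependent, yet become independent once B is known.

```python
from itertools import product

# Hypothetical CPDs for a chain A -> B -> C over binary variables.
p_a = {1: 0.3, 0: 0.7}
p_b_given_a = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # p_b_given_a[a][b]
p_c_given_b = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # p_c_given_b[b][c]

# Joint distribution from the factorization P(A) P(B|A) P(C|B).
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product((0, 1), repeat=3)
}


def p_c1(a=None, b=None):
    """P(C = 1 | A = a, B = b), where None means 'not conditioned on'."""
    match = [
        (k, p) for k, p in joint.items()
        if (a is None or k[0] == a) and (b is None or k[1] == b)
    ]
    total = sum(p for _, p in match)
    return sum(p for k, p in match if k[2] == 1) / total


# Marginally, C depends on A ...
print(round(p_c1(a=1), 3), round(p_c1(a=0), 3))            # 0.73 0.24
# ... but given the intermediate gene B, A carries no further information.
print(round(p_c1(a=1, b=1), 3), round(p_c1(a=0, b=1), 3))  # 0.8 0.8
```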
17
Outline of Our Approach
Expression data → Bayesian network learning algorithm → learned network
Use the learned network to make predictions about the
structure of the interactions between genes
18
Learning With Many Variables
Sparse Candidate algorithm: an efficient heuristic
search that relies on sparseness
  • Choose a candidate set of direct influences for
    each gene
  • Find the optimal BN constrained to the candidates
  • Iteratively improve the candidate set
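The first step, choosing candidates, can be sketched as follows. This is only an illustration: it assumes discretized data and uses pairwise mutual information as the relevance measure, which is one possible choice and is not stated on the slide. The constrained search and the iterative refinement are not shown, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (nxy / n) * math.log(nxy * n / (cx[a] * cy[b]))
        for (a, b), nxy in cxy.items()
    )


def candidate_parents(columns, k):
    """For each gene (one column of values per gene), return the k other
    genes with the highest mutual information as its candidate set."""
    candidates = {}
    for j, target in enumerate(columns):
        scores = [
            (mutual_information(target, other), i)
            for i, other in enumerate(columns)
            if i != j
        ]
        scores.sort(reverse=True)
        candidates[j] = [i for _, i in scores[:k]]
    return candidates


# Toy data: gene 1 copies gene 0, gene 2 is unrelated.
toy = [
    [0, 0, 1, 1, 2, 2, 0, 1],
    [0, 0, 1, 1, 2, 2, 0, 1],
    [0, 1, 0, 1, 0, 1, 1, 0],
]
print(candidate_parents(toy, k=1)[0])  # [1]
```

Restricting each gene to a small candidate set is what keeps the subsequent structure search tractable when there are hundreds of genes.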

19
Experiment
  • Data from Spellman et al. (Mol. Biol. of the Cell,
    1998).
  • Contains 76 samples covering the whole yeast genome
  • Different methods for synchronizing the cell cycle in
    yeast.
  • Time series at intervals of a few minutes (5-20 min).
  • Spellman et al. identified 800 cell-cycle-regulated
    genes.

20
Methods
  • Experiment 1: discretized data into 3 levels
      • Learn multinomial probabilities
  • Experiment 2:
      • Learn linear interactions (w/ Gaussian noise)
  • No prior biological knowledge was used
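A three-level discretization of the kind used in Experiment 1 might look like the sketch below. The slide does not give the cut-offs, so the symmetric threshold on a log expression ratio and its default value of 0.5 are assumptions chosen for illustration.

```python
def discretize(log_ratio, threshold=0.5):
    """Map a log expression ratio to -1 (under-expressed), 0 (unchanged),
    or +1 (over-expressed). The threshold is an assumed parameter."""
    if log_ratio < -threshold:
        return -1
    if log_ratio > threshold:
        return 1
    return 0


# Hypothetical log ratios for one gene across five experiments.
print([discretize(v) for v in [-1.2, -0.3, 0.0, 0.4, 0.9]])  # [-1, 0, 0, 0, 1]
```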

21
Network Learned
22
Challenge: Statistical Significance
  • Sparse data:
      • Small number of samples
      • Flat posterior -- many networks fit the data
  • Solution:
      • Estimate confidence in network features
  • Two types of features:
      • Markov neighbors: X directly interacts with Y
      • Order relations: X is an ancestor of Y
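Both feature types can be read off a given DAG. The sketch below assumes that "Markov neighbors" means Markov blanket membership (parents, children, and other parents of shared children), which is the standard graph-theoretic reading but is not spelled out on the slide. The example graph reuses the five-variable factorization P(B) P(E) P(R|E) P(A|B,E) P(C|A) from the Bayesian Network Semantics slide.

```python
def ancestors(parents, node):
    """All ancestors of `node` in a DAG given as {child: [parents]}."""
    seen, stack = set(), list(parents.get(node, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def markov_blanket(parents, node):
    """Parents, children, and co-parents of `node`."""
    children = [v for v, pa in parents.items() if node in pa]
    blanket = set(parents.get(node, [])) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket


dag = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

print(sorted(ancestors(dag, "C")))       # ['A', 'B', 'E']
print(sorted(markov_blanket(dag, "B")))  # ['A', 'E']
```

Here E is in the Markov blanket of B only because both are parents of A, which is the "common effect" situation.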

23
Confidence Estimates
Bootstrap approach [FGW, UAI99]
[Figure: dataset D is resampled into D1, D2, ..., Dm; a network over E, B, R, A, C is learned from each resample; feature confidence is estimated from the learned networks]
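The resample, learn, estimate loop can be sketched generically. In the code below the "learner" is a deliberately simple stand-in that reports strongly correlated variable pairs; it is an assumption used to keep the example self-contained and is not a Bayesian network learner. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(samples, threshold=0.8):
    """Stand-in learner (an assumption): report variable pairs whose
    absolute correlation exceeds `threshold`."""
    cols = list(zip(*samples))
    return [
        (i, j)
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if abs(pearson(cols[i], cols[j])) > threshold
    ]


def bootstrap_confidence(samples, learn_features, m=50, seed=0):
    """Confidence of a feature = fraction of the m bootstrap resamples
    in which the learner reports it."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(m):
        resample = [rng.choice(samples) for _ in samples]  # with replacement
        counts.update(set(learn_features(resample)))
    return {feature: c / m for feature, c in counts.items()}


# Toy data: variable 1 is exactly twice variable 0; variable 2 alternates.
toy_samples = [(x, 2 * x, x % 2) for x in range(10)]
confidence = bootstrap_confidence(toy_samples, correlated_pairs)
```

Replacing `correlated_pairs` with a structure learner that returns Markov or order features would give the confidence estimates described on this slide.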
24
Testing for Significance
  • We run our procedure on randomized data where we
    reshuffled the order of values for each gene
  • Histograms of the number of Markov features at each
    confidence level

[Figure: two histograms, labeled Randomized Data and Original Data]
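The randomization step can be sketched as follows, assuming the data is stored with one row per experiment and one column per gene (the layout and the function name are assumptions). Each gene keeps its own set of values, but any relationship between genes is destroyed.

```python
import random


def shuffle_each_gene(matrix, seed=0):
    """Return a copy of `matrix` (rows = experiments, columns = genes) in
    which each gene's values are independently permuted across experiments."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


# Hypothetical expression values: 4 experiments, 2 genes.
original = [[1, 10], [2, 20], [3, 30], [4, 40]]
randomized = shuffle_each_gene(original)
```

Features that still receive high confidence on such randomized data indicate the level of confidence that can arise by chance.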
25
Testing for Significance
  • We run our procedure on randomized data where we
    reshuffled the order of values for each gene

[Figure: Markov features with Gaussian models. x-axis: confidence threshold t (0 to 1); y-axis: number of features with confidence above t (0 to 4000)]
26
Testing for Significance
[Figure: Markov features with multinomial models. x-axis: confidence threshold t (0 to 1); y-axis: number of features with confidence above t (0 to 1400)]
27
Local Map
28
Finding Key Genes
  • Key gene: a gene that precedes many other genes
      • YLR183C
      • MCD1: mitotic chromosome determinant
      • RAD27: DNA repair protein
      • CLN2: role in cell cycle START
      • SRO4: involved in cellular polarization during
        budding
      • YOX1: homeodomain protein that binds the leu-tRNA gene
      • POL30: required for DNA replication and repair
      • YLR467W
      • CDC5
      • MSH6: homolog of the human GTBP protein
      • YML119W
      • CLN1: role in cell cycle START

29
Strong Markov Relations
30
Future Work
  • Finding suitable local distribution models
  • Temporal aspect: DBN
  • Correct handling of hidden variables
      • Can we recognize hidden causes of coordinated
        regulation events?
  • Incorporating prior knowledge
      • Incorporate the large mass of biological knowledge
        and insight from sequence/structure databases
  • Abstraction
      • Combine with cluster analysis

31
Future Work -- Causality
  • Greatest promise -- dealing with causality
  • Biological techniques allow us to experiment with
    interventions
      • These clearly provide a better handle on causal
        interactions
  • Goal: learning from observations and interventions
      • How to learn causality from knockout experiments?
      • How to plan such experiments?

32
Sensitivity Analysis: Genes in Analysis
  • To find sensitivity to removal of genes, we run
    our procedure on a subset of 250 genes

[Figure: Markov and Order panels comparing the 800-gene dataset with the 250-gene dataset]