Using Bayesian Networks to Analyze Expression Data (Presentation Transcript)

1
Using Bayesian Networks to Analyze Expression Data
  • Nir Friedman, Michal Linial, Iftach Nachman, and
    Dana Pe'er
  • RECOMB 2000
  • Talk by Kyu-Baek Hwang

2
Abstract
3
Introduction
  • A central goal of molecular biology is to
    understand the regulation of protein synthesis.
  • DNA microarray experiments can measure thousands
    of gene expression levels simultaneously.
  • An important challenge is to develop
    methodologies that are both statistically sound
    and computationally tractable.
  • Bayesian network learning.

4
An Example of a Simple Bayesian Network Structure
  • Gene B and Gene D are independent given Gene A.
  • Observing Gene B induces a dependency between
    Gene A and Gene E.
  • Gene A and Gene C are independent given Gene B.

5
Representing Distributions with Bayesian Networks
  • A Bayesian network is a representation of a
    joint probability distribution.
  • A Bayesian network has two components:
  • G, a directed acyclic graph (DAG) structure
  • Θ, a set of parameters specifying the conditional
    distribution of each variable given its parents
  • The joint probability distribution over X1, …, Xn
    is represented by the Bayesian network as
    P(X1, …, Xn) = ∏i P(Xi | PaG(Xi)),
    where PaG(Xi) is the set of parents of Xi in G.
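To make the factorization concrete, here is a minimal Python sketch (the structure and probabilities are illustrative, not taken from the paper) that evaluates the joint probability of a full assignment as the product of each variable's conditional probability given its parents:

    # Minimal sketch: the joint distribution factors as
    # P(X1, ..., Xn) = prod_i P(Xi | PaG(Xi)).
    # Expression levels are discretized to "low"/"high"; the toy structure
    # A -> B, A -> D mirrors "Gene B and Gene D are independent given Gene A".
    parents = {"A": (), "B": ("A",), "D": ("A",)}

    # cpt[X][parent values] -> {value: probability}  (illustrative numbers)
    cpt = {
        "A": {(): {"low": 0.4, "high": 0.6}},
        "B": {("low",): {"low": 0.7, "high": 0.3},
              ("high",): {"low": 0.2, "high": 0.8}},
        "D": {("low",): {"low": 0.5, "high": 0.5},
              ("high",): {"low": 0.1, "high": 0.9}},
    }

    def joint_probability(assignment):
        """P(assignment) = product over variables of P(X | Pa(X))."""
        p = 1.0
        for var, pa in parents.items():
            pa_vals = tuple(assignment[q] for q in pa)
            p *= cpt[var][pa_vals][assignment[var]]
        return p

    print(joint_probability({"A": "high", "B": "high", "D": "high"}))  # 0.6*0.8*0.9 = 0.432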

6
Equivalence Classes of Bayesian Networks
  • More than one graph can imply exactly the same
    set of independencies.
  • Theorem 2.1: Two DAGs are equivalent if and only
    if they have the same underlying undirected graph
    and the same v-structures (converging edges
    X → Z ← Y whose tails X and Y are not adjacent).
  • The PDAG (Partially Directed Acyclic Graph)
    structure uniquely represents an equivalence
    class of network structures.
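Theorem 2.1 gives a direct way to test equivalence. A hedged Python sketch, with DAGs given as sets of (parent, child) edges; the helper names are illustrative:

    # Equivalence test per Theorem 2.1: same skeleton and same v-structures
    # (converging edges a -> z <- b with a and b not adjacent).
    def skeleton(edges):
        return {frozenset(e) for e in edges}

    def v_structures(edges):
        skel = skeleton(edges)
        return {(frozenset((a, b)), z)
                for (a, z) in edges
                for (b, z2) in edges
                if z == z2 and a != b and frozenset((a, b)) not in skel}

    def equivalent(g1, g2):
        return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

    # A -> B -> C and A <- B <- C encode the same independencies:
    print(equivalent({("A", "B"), ("B", "C")}, {("B", "A"), ("C", "B")}))  # True
    # but A -> B <- C contains a v-structure, so it is not equivalent:
    print(equivalent({("A", "B"), ("C", "B")}, {("A", "B"), ("B", "C")}))  # False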

7
Learning Bayesian Networks
  • Given a training set D = {x1, …, xN} of
    independent instances of X, find a network
    B = ⟨G, Θ⟩ that best matches D.
  • The Bayesian score of a network structure is
    S(G : D) = log P(G | D) = log P(D | G) + log P(G) + C,
    where C is a constant independent of G, and
    P(D | G) = ∫ P(D | G, Θ) P(Θ | G) dΘ
    is the marginal likelihood, which averages the
    probability of the data over all possible
    parameter assignments to G.
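For multinomial local models with Dirichlet parameter priors, P(D | G) decomposes over families and has a closed form (a BDe-style score). A hedged sketch of the per-family term, assuming a uniform pseudo-count alpha as an illustrative prior rather than the exact prior used in the paper:

    import math
    from collections import Counter

    def family_log_marginal_likelihood(data, x, pa, values, alpha=1.0):
        """log marginal likelihood of variable x with parent set pa.
        data is a list of dicts mapping variable name -> discrete value."""
        r = len(values[x])              # number of states of x
        n_ijk = Counter()               # counts per (parent config j, state k)
        n_ij = Counter()                # counts per parent config j
        for row in data:
            j = tuple(row[p] for p in pa)
            n_ijk[(j, row[x])] += 1
            n_ij[j] += 1
        score = 0.0
        for j, nj in n_ij.items():
            score += math.lgamma(alpha * r) - math.lgamma(alpha * r + nj)
            for k in values[x]:
                score += math.lgamma(alpha + n_ijk[(j, k)]) - math.lgamma(alpha)
        return score

Because the score decomposes into per-family terms like this one, local search moves (adding, deleting, or reversing a single edge) only require re-scoring the affected families.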

8
Learning Causal Patterns
  • Causal networks have a stricter interpretation of
    the meaning of edges: the parents of a variable
    are its immediate causes.

9
Analyzing Expression Data
  • Consider probability distributions over all
    possible states of the system in question.
  • Describe the state of the system using random
    variables.
  • These random variables include
  • Expression levels of individual genes,
  • Experimental conditions,
  • Temporal indicators, and
  • Background variables.

10
Representing Partial Models
  • Analyze the set of plausible networks and attempt
    to characterize features that are common to most
    of these networks.
  • Features:
  • Markov relations: Is Y in the Markov blanket of
    X?
  • Order relations: Is X an ancestor of Y in all the
    networks of a given equivalence class?
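A small sketch of the two feature tests on a single learned structure. In the paper the features are read off the equivalence class (PDAG); here a plain DAG, given as (parent, child) edges, is used for brevity, and the function names are illustrative:

    def markov_blanket(edges, x):
        """Markov relation: the blanket of x is its parents, its children,
        and its children's other parents."""
        parents = {a for (a, b) in edges if b == x}
        children = {b for (a, b) in edges if a == x}
        spouses = {a for (a, b) in edges if b in children and a != x}
        return parents | children | spouses

    def is_ancestor(edges, x, y):
        """Order relation: is there a directed path from x to y?"""
        frontier, seen = [x], set()
        while frontier:
            node = frontier.pop()
            for (a, b) in edges:
                if a == node and b not in seen:
                    if b == y:
                        return True
                    seen.add(b)
                    frontier.append(b)
        return False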

11
Estimating Statistical Confidence in Features
  • To what extent does the data support a given
    feature?
  • An effective and relatively simple approach for
    estimating confidence is the bootstrap method.
  • For i = 1, …, m:
  • Re-sample with replacement N instances from D.
    Denote by Di the resulting dataset.
  • Apply the learning procedure on Di to induce a
    network structure Gi.
  • For each feature f of interest calculate
    conf(f) = (1/m) ∑i f(Gi),
    where f(G) is 1 if f is a feature in G, and 0
    otherwise.
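A compact Python sketch of this bootstrap procedure; learn_network and has_feature stand in for the structure-learning algorithm and the feature indicator f(G), and are placeholders rather than functions from the paper:

    import random

    def bootstrap_confidence(data, learn_network, has_feature, m=200):
        """conf(f) = (1/m) * sum_i f(G_i), with each G_i learned from a
        dataset D_i resampled with replacement from the original data."""
        n = len(data)
        hits = 0
        for _ in range(m):
            d_i = [random.choice(data) for _ in range(n)]   # resample N instances
            g_i = learn_network(d_i)
            hits += 1 if has_feature(g_i) else 0
        return hits / m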

12
Efficient Learning Algorithms
  • The Sparse Candidate algorithm:
  • Identify a relatively small number of candidate
    parents for each variable based on simple local
    statistics (such as correlation).
  • Restrict the search space to these candidate
    parents.
  • Perform a greedy search within the restricted
    search space.
  • Score for the set of candidate parents.
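A hedged sketch of the candidate-selection step only: for each gene, keep the k most strongly correlated genes as its allowed parents (plain correlation is used here for brevity; the paper's local statistics and candidate-set scoring follow the Sparse Candidate algorithm cited above). The restricted greedy search then considers only these parents.

    import numpy as np

    def candidate_parents(expression, k=5):
        """expression: (num_samples, num_genes) array.
        Returns {gene index: list of k candidate parent indices}."""
        corr = np.corrcoef(expression, rowvar=False)   # gene-by-gene correlation
        np.fill_diagonal(corr, 0.0)                    # a gene cannot parent itself
        return {g: list(np.argsort(-np.abs(corr[g]))[:k])
                for g in range(corr.shape[1])}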

13
Local Probability Models
  • Multinomial model
  • Discretization of the expression levels
  • Linear Gaussian model
  • P(X | u1, …, uk) = N(a0 + ∑i ai ui, σ²)
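A brief sketch of the two local model families; the three-level discretization threshold and the use of log expression ratios are illustrative assumptions rather than exact values from the paper:

    import numpy as np

    def discretize(log_ratios, threshold=0.5):
        """Map log expression ratios to -1 / 0 / +1 (under / unchanged / over)."""
        ratios = np.asarray(log_ratios, dtype=float)
        levels = np.zeros(ratios.shape, dtype=int)
        levels[ratios > threshold] = 1
        levels[ratios < -threshold] = -1
        return levels

    def linear_gaussian_logpdf(x, u, a0, a, sigma):
        """log P(x | u1..uk) where P(X | u) = N(a0 + sum_i a_i * u_i, sigma^2)."""
        mean = a0 + np.dot(a, u)
        return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)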

14
Application to Cell Cycle Expression Patterns
  • The data of Spellman et al. (1998):
  • 76 gene expression measurements of the mRNA
    levels of 6177 S. cerevisiae ORFs.
  • Six time series under different cell cycle
    synchronization methods.
  • Each measurement was treated as an independent
    sample from a distribution.
  • An additional root variable denoting the cell
    cycle phase.

15
The Learned Bayesian Network Structure
16
Robustness Analysis
  • Create a random data set by randomly permuting
    the order of the experiments independently for
    each gene.
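This permutation keeps each gene's marginal distribution intact while destroying any dependencies between genes, so features recovered from the permuted data estimate the rate of spurious findings. A minimal sketch, assuming experiments in rows and genes in columns:

    import numpy as np

    def permute_per_gene(expression, rng=None):
        """Randomly permute the order of the experiments independently
        for each gene (column)."""
        rng = rng or np.random.default_rng()
        permuted = expression.copy()
        for g in range(expression.shape[1]):
            permuted[:, g] = rng.permutation(expression[:, g])
        return permuted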

17
Comparison of Multinomial Distribution and Linear
Gaussian Distribution
18
Biological Analysis of Order Relations
  • The dominance score of X is defined as
    d(X) = ∑{Y : Co(X,Y) > t} Co(X,Y)^k,
    where Co(X,Y) is the confidence of the order
    relation "X is an ancestor of Y", t is a
    confidence threshold, and k weights
    high-confidence relations more heavily.
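A small sketch of this dominance score; order_confidence stands in for the bootstrap confidence Co(X, Y) of the order relation, and the default t and k are illustrative:

    def dominance_score(order_confidence, x, genes, t=0.5, k=1):
        """d(x) = sum over genes y with Co(x, y) > t of Co(x, y)**k.
        Genes that robustly precede many others receive high scores."""
        return sum(order_confidence(x, y) ** k
                   for y in genes
                   if y != x and order_confidence(x, y) > t)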

19
Biological Analysis of Markov Relations
  • Multinomial experiment

20
Conditional Independence in the Network
21
Discussion and Future Work
  • A novel search algorithm
  • An approach for estimating statistical confidence
  • Discover causal relationships and interactions
    between genes
  • Probabilistic semantics
  • Future extensions:
  • Local probability models
  • Estimating confidence levels
  • Biological knowledge as a prior
  • Search heuristics
  • Temporal models
  • Discover hidden variables (e.g. protein
    activation)