Outline - PowerPoint PPT Presentation

Provided by: biostatPi

Transcript and Presenter's Notes

Title: Outline


1
  • Outline
  • Biological motivation
  • Introduction to graph models and Bayesian network
  • Case study
  • Module networks: identifying regulatory modules
    and their condition-specific regulators from gene
    expression data. Segal, Shapira, Regev, Pe'er,
    Botstein, Koller, Friedman. Nature Genetics, 2003.
  • Large-scale mapping and validation of E. coli
    transcriptional regulation from a compendium of
    expression profiles. PLoS Biology, 2007.

2
1. Biological motivation
  • cis-regulatory motif: a short (roughly 6 to 12
    bases) DNA sequence that can bind an
    activator or repressor protein. Illustrated
    at right as activator/repressor binding sites.

3
1. Biological motivation
  • Module: a set of genes that participate in a
    coherent biological process
  • Module group: a set of modules that all share at
    least one cis-regulatory motif
  • Regulator: a gene that encodes a protein whose
    concentration regulates the expression of other
    genes
  • Expression profile: expression levels (transcript
    concentrations) of various genes under given
    experimental conditions

4
1. Biological motivation
5
2. Introduction to Bayesian Network
"Graphical models are a marriage between
probability theory and graph theory. They provide
a natural tool for dealing with two problems that
occur throughout applied mathematics and
engineering -- uncertainty and complexity.
... Fundamental to the idea of a graphical model
is the notion of modularity -- a complex system
is built by combining simpler parts. Probability
theory provides the glue whereby the parts are
combined, ensuring that the system as a whole is
consistent. Many of the classical multivariate
probabilistic systems are special cases of the
general graphical model formalism -- examples
include mixture models, factor analysis, hidden
Markov models, Kalman filters and Ising
models...the graphical model formalism provides
a natural framework for the design of new
systems." --- Michael Jordan, 1998.
6
2. Introduction to Bayesian Network
Cloudy (C): P(C) = 0.5, P(-C) = 0.5
Sprinkler (S): P(S|C) = 0.1, P(-S|C) = 0.9, P(S|-C) = 0.5, P(-S|-C) = 0.5
Rain (R): P(R|C) = 0.8, P(-R|C) = 0.2, P(R|-C) = 0.2, P(-R|-C) = 0.8
Wet Grass (W): P(W|S,R) = 0.99, P(-W|S,R) = 0.01, P(W|S,-R) = 0.9,
P(-W|S,-R) = 0.1, P(W|-S,R) = 0.9, P(-W|-S,R) = 0.1,
P(W|-S,-R) = 0, P(-W|-S,-R) = 1
From http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
7
2. Introduction to Bayesian Network
  • Graphs in which nodes represent random variables
    (cloudy? sprinkler? rain? wet grass?)
  • Arrows represent conditional independence
    assumptions (e.g. P(W|S,R,C) = P(W|S,R))
  • Present and absent arrows together provide a compact
    representation of the joint probability distribution
  • BNs have a complicated notion of independence,
    which takes into account the directionality of
    the arrows
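The independence claim for Wet Grass can be checked numerically. The following is a minimal sketch, assuming the sprinkler-network probabilities from the Murphy tutorial cited on these slides; the helper names (`p`, `joint`, `marginal`) are illustrative and not from the original deck. It builds the joint distribution from the local tables and compares P(W|S,R,C) with P(W|S,R) for every setting of S, R and C.

```python
from itertools import product

# Local probability tables of the sprinkler network (probability of "true").
P_C = 0.5
P_S = {True: 0.1, False: 0.5}   # P(S | C), keyed by the value of C
P_R = {True: 0.8, False: 0.2}   # P(R | C), keyed by the value of C
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}   # P(W | S, R)

def p(value, p_true):
    """Probability that a binary variable takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(c, s, r, w):
    """Joint probability as the product of the local tables."""
    return p(c, P_C) * p(s, P_S[c]) * p(r, P_R[c]) * p(w, P_W[(s, r)])

def marginal(**fixed):
    """Sum the joint over every assignment consistent with `fixed`."""
    total = 0.0
    for values in product((True, False), repeat=4):
        world = dict(zip(("c", "s", "r", "w"), values))
        if all(world[k] == v for k, v in fixed.items()):
            total += joint(**world)
    return total

# Largest difference between P(W|S,R,C) and P(W|S,R) over all settings.
max_gap = 0.0
for c, s, r in product((True, False), repeat=3):
    with_c = marginal(w=True, s=s, r=r, c=c) / marginal(s=s, r=r, c=c)
    without_c = marginal(w=True, s=s, r=r) / marginal(s=s, r=r)
    max_gap = max(max_gap, abs(with_c - without_c))
print(max_gap)
```

The gap is zero up to floating-point rounding: once S and R are known, C adds no information about W.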

8
Bayes' Rule
2. Introduction to Bayesian Network
  • Can rearrange the conditional probability formula
    to get P(A|B) P(B) = P(A,B), but by symmetry we
    can also get P(B|A) P(A) = P(A,B). It follows
    that P(A|B) = P(B|A) P(A) / P(B).
  • The power of Bayes' rule is that in many
    situations where we want to compute P(A|B) it
    turns out that it is difficult to do so directly,
    yet we might have direct information about
    P(B|A). Bayes' rule enables us to compute P(A|B)
    in terms of P(B|A).
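A small worked example of Bayes' rule, as a sketch using the Cloudy and Rain numbers from the sprinkler network on these slides. The network gives P(R|C) directly; the code inverts it to get P(C|R). Variable names are illustrative.

```python
# Numbers taken from the sprinkler example on these slides.
p_c = 0.5               # prior P(C)
p_r_given_c = 0.8       # P(R|C)
p_r_given_not_c = 0.2   # P(R|-C)

# Law of total probability gives the normalizing term P(R).
p_r = p_r_given_c * p_c + p_r_given_not_c * (1 - p_c)

# Bayes' rule: P(C|R) = P(R|C) P(C) / P(R)
p_c_given_r = p_r_given_c * p_c / p_r
print(round(p_r, 4), round(p_c_given_r, 4))
```

Here P(R) comes out to 0.5, so seeing rain raises the probability of clouds from the prior of 0.5 to 0.8.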

9
2. Introduction to Bayesian Network
Root nodes need a prior probability. Each non-root
node needs conditional probabilities that cover
every combination of values of its parent nodes.
Cloudy (C): P(C) = 0.5, P(-C) = 0.5
Sprinkler (S): P(S|C) = 0.1, P(-S|C) = 0.9, P(S|-C) = 0.5, P(-S|-C) = 0.5
Rain (R): P(R|C) = 0.8, P(-R|C) = 0.2, P(R|-C) = 0.2, P(-R|-C) = 0.8
Wet Grass (W): P(W|S,R) = 0.99, P(-W|S,R) = 0.01, P(W|S,-R) = 0.9,
P(-W|S,-R) = 0.1, P(W|-S,R) = 0.9, P(-W|-S,R) = 0.1,
P(W|-S,-R) = 0, P(-W|-S,-R) = 1
From http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
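As a sketch of how the prior and the conditional tables combine, the code below multiplies them into a joint distribution and answers queries by brute-force enumeration. It assumes the sprinkler-network numbers from this slide; the function names are illustrative, and enumeration is only practical for networks this small.

```python
from itertools import product

# Prior for the root node and conditional tables for the non-root nodes.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}   # P(S | C), keyed by the value of C
P_R = {True: 0.8, False: 0.2}   # P(R | C), keyed by the value of C
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}   # P(W | S, R)

def p(value, p_true):
    """Probability that a binary variable takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    return p(c, P_C) * p(s, P_S[c]) * p(r, P_R[c]) * p(w, P_W[(s, r)])

def marginal(**evidence):
    """Sum the joint over all assignments that agree with `evidence`."""
    total = 0.0
    for values in product((True, False), repeat=4):
        world = dict(zip(("c", "s", "r", "w"), values))
        if all(world[k] == v for k, v in evidence.items()):
            total += joint(**world)
    return total

p_w = marginal(w=True)                           # P(W)
p_s_given_w = marginal(s=True, w=True) / p_w     # P(S|W)
p_r_given_w = marginal(r=True, w=True) / p_w     # P(R|W)
print(round(p_w, 4), round(p_s_given_w, 4), round(p_r_given_w, 4))
```

With these tables, P(W) is about 0.647, and given wet grass, rain (about 0.708) is a more likely explanation than the sprinkler (about 0.430).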
10
Major benefit of BN
2. Introduction to Bayesian Network
  • Given its parents (R and S), W is conditionally
    independent of its other ancestors:
    P(W|S,R,C) = P(W|S,R). So W needs only a table of
    conditional probabilities given R and S; we don't
    need to include every ancestor between W and the
    root node (C).

Cloudy
Sprinkler
Rain
Wet Grass
11
This benefit hugely reduces the number of parameters and
computations needed for large networks, e.g.
hundreds or thousands of genes
2. Introduction to Bayesian Network
  • SSR article: many separate Bayesian networks were
    generated from gene expression data. Here one
    activator and one repressor form a basic BN, with 3
    corresponding expression contexts shown at
    bottom.

12
2. Introduction to Bayesian Network
Cloudy (C): P(C) = 0.5, P(-C) = 0.5
Order of reduction of required numbers: reduce
from 2^4 - 1 = 15 to 9
Sprinkler (S): P(S|C) = 0.1, P(-S|C) = 0.9, P(S|-C) = 0.5, P(-S|-C) = 0.5
Rain (R): P(R|C) = 0.8, P(-R|C) = 0.2, P(R|-C) = 0.2, P(-R|-C) = 0.8
Wet Grass (W): P(W|S,R) = 0.99, P(-W|S,R) = 0.01, P(W|S,-R) = 0.9,
P(-W|S,-R) = 0.1, P(W|-S,R) = 0.9, P(-W|-S,R) = 0.1,
P(W|-S,-R) = 0, P(-W|-S,-R) = 1
From http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
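The count of required numbers can be reproduced with a short sketch, assuming all variables are binary. A full joint table over n variables needs 2^n - 1 independent numbers, while a Bayesian network needs one number per parent configuration of each node. The `parents` dictionary encodes the sprinkler graph; the 100-gene case at the end is a hypothetical illustration, not a figure from the slides.

```python
def full_joint_params(n_vars):
    """Independent numbers in a full joint table over n binary variables."""
    return 2 ** n_vars - 1

def bn_params(parents):
    """One number per parent configuration of each binary node."""
    return sum(2 ** len(node_parents) for node_parents in parents.values())

# Sprinkler network: each node mapped to its list of parents.
parents = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}
n_full = full_joint_params(len(parents))
n_bn = bn_params(parents)
print(n_full, n_bn)

# Hypothetical larger case: 100 binary genes, each with 3 parents.
big = {f"gene{i}": ["p1", "p2", "p3"] for i in range(100)}
print(full_joint_params(len(big)), bn_params(big))
```

For the sprinkler network this gives 15 versus 9 (1 + 2 + 2 + 4). The saving grows quickly with network size: in the hypothetical 100-gene case the network needs 800 numbers, against 2^100 - 1 for the full table.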
13
2. Introduction to Bayesian Network
Bayesian network: general formulation
Given a graph G, the likelihood of observing the
data D
De Jong (2002)
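For reference, a standard textbook way to write the factorization that a directed acyclic graph G implies, and the resulting likelihood of a data set D of M independent samples, is sketched below. The notation (Pa_G for the parents of a node in G, theta for the parameters) is an assumption and may differ from the formula De Jong (2002) uses.

```latex
% Joint distribution factorized along the graph G over X_1, ..., X_n
P(X_1, \dots, X_n \mid G, \theta)
  = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}_G(X_i), \theta_i\bigr)

% Likelihood of M independent samples D = \{x^{(1)}, \dots, x^{(M)}\}
P(D \mid G, \theta)
  = \prod_{m=1}^{M} \prod_{i=1}^{n}
    P\bigl(x_i^{(m)} \mid \mathrm{pa}_G(x_i)^{(m)}, \theta_i\bigr)
```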
14
Evaluating Bayesian networks
2. Introduction to Bayesian Network
  • Generally NP-hard!

Where do the numerical estimates of probability
come from?
  • Can come from expert opinion, at least as
    initial values
  • Can be learned from data by the system
  • Both the SSR and BSK articles lay out the basics and some
    details of iterative algorithms for estimating
    the probabilities.
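The simplest case of learning probabilities from data is counting. The sketch below estimates a conditional table such as P(R|C) from fully observed samples, with an optional pseudocount that plays the role of a prior (for example, an expert's initial guess). The data are invented for illustration, and this is plain maximum-likelihood counting, not the iterative algorithms of the articles mentioned above.

```python
from collections import Counter

# Toy (cloudy, rain) observations, invented for illustration only.
samples = ([(True, True)] * 8 + [(True, False)] * 2 +
           [(False, True)] * 1 + [(False, False)] * 4)

def estimate_cpt(pairs, pseudocount=0.0):
    """Estimate P(child=True | parent) from (parent, child) pairs by counting.

    Assumes each parent value is observed at least once, or that
    pseudocount > 0; otherwise the division below would fail.
    """
    counts = Counter(pairs)
    cpt = {}
    for parent in (True, False):
        n_true = counts[(parent, True)] + pseudocount
        n_false = counts[(parent, False)] + pseudocount
        cpt[parent] = n_true / (n_true + n_false)
    return cpt

mle = estimate_cpt(samples)                        # pure counting
smoothed = estimate_cpt(samples, pseudocount=1.0)  # counting with a weak prior
print(mle, smoothed)
```

With these toy counts the unsmoothed estimates are P(R|C) = 0.8 and P(R|-C) = 0.2. The pseudocount pulls both toward 0.5, which matters most when few samples are available.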

15
Modelling regulatory networks
2. Introduction to Bayesian Network
De Jong (2002)
16
3. Case study
  • Module networks: identifying regulatory modules
    and their condition-specific regulators from gene
    expression data
  • Segal, Shapira, Regev, Pe'er, Botstein, Koller,
    Friedman (SSR)
  • Nature Genetics, June 2003
  • Bayesian network-based algorithms are applied to
    gene expression data to generate good testable
    hypotheses.

17
3. Case study
  • Expression data set, from Patrick Brown's lab, is
    for genes of yeast subjected to various kinds of
    stress
  • Compiled a list of 466 candidate regulators
  • Applied the analysis to 2355 genes in all 173 arrays
    of the yeast data set
  • This gave automatic inference of 50 modules of
    genes
  • All modules were analyzed with external data
    sources to check the functional coherence of gene
    products and the validity of the regulatory program
  • Three novel hypotheses suggested by the method were
    tested in the lab and found to be accurate

18
3. Case study
19
3. Case study
20
3. Case study
  • 2 examples of the 50 modules inferred by SSR methods
  • Respiration: mostly genes encoding respiration
    proteins or glucose-metabolism proteins. One
    primary regulator is predicted, Hap4, which is
    known from past experiments to play an activation
    role in respiration. Secondary regulators affect
    Hap4 expression.
  • Nitrogen catabolite repression: 29 genes tied to the
    process by which yeast uses the best available
    nitrogen source. The key regulator suggested is Gat1,
    because 26 of the 29 genes have the Gat1 regulatory
    motif in their upstream regions.

21
3. Case study
Respiration Network
Two major motifs found
22
3. Case study
  • Evaluating module content and regulation programs
  • All 50 modules were tested to see if proteins
    coded in the same module had related functions
  • Scored modules on how many genes are noted in
    current bio databases as being related to the
    predicted function (diagram on next slide)
  • 31 of 50 modules had coherence > 50%; only 4 had
    coherence < 30%.
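One simple way to express such a score is sketched below, assuming coherence is the percentage of a module's genes that carry the annotation matching the module's predicted function. The gene identifiers and annotations are hypothetical, and the exact scoring used in the SSR article may differ.

```python
def coherence(module_genes, annotated_genes):
    """Percentage of a module's genes annotated with its predicted function."""
    module_genes = set(module_genes)
    if not module_genes:
        return 0.0
    return 100.0 * len(module_genes & set(annotated_genes)) / len(module_genes)

# Hypothetical gene identifiers and annotations, invented for illustration.
modules = {
    "module_1": ["g1", "g2", "g3", "g4"],
    "module_2": ["g5", "g6", "g7", "g8", "g9"],
}
annotated = {
    "module_1": {"g1", "g2", "g3"},
    "module_2": {"g5"},
}

scores = {name: coherence(genes, annotated[name])
          for name, genes in modules.items()}
high = [name for name, score in scores.items() if score > 50]
low = [name for name, score in scores.items() if score < 30]
print(scores, high, low)
```

In this toy input, module_1 scores 75% and falls in the "> 50%" group, while module_2 scores 20% and falls in the "< 30%" group.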

23
3. Case study
  • Colored boxes indicate that known experimental
    evidence validates the predicted regulatory role
    of a regulator (named in one of the Reg
    columns) in a given module (each row of the
    table).
  • M, C and G column headers and different colors of
    boxes represent different sorts of experimental
    evidence that validate the model's predictions.
  • C (%): functional coherence of module, from
    literature mentions of module genes.
  • G: number of genes in module
24
3. Case study
  • To find global relationships between modules, a
    graph (next 2 slides) was made showing modules and
    their motifs. Motifs were found within the 500
    base pairs upstream from each gene.
  • Observations from this graph: modules with
    related biological functions often shared at
    least one motif, and sometimes shared one or more
    regulator genes.
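Checking which genes carry a given motif in their upstream region can be sketched as a plain string scan. This is exact matching of a known motif on both strands, a much simpler technique than the motif discovery used in the article. The sequences, gene names and motif are hypothetical, and each upstream string is assumed to be written 5' to 3' and to end at the gene start.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def has_motif(upstream, motif, window=500):
    """True if `motif` occurs on either strand in the last `window` bases."""
    region = upstream[-window:].upper()
    motif = motif.upper()
    return motif in region or reverse_complement(motif) in region

def motif_fraction(upstream_by_gene, motif, window=500):
    """Genes whose upstream window contains the motif, and their fraction."""
    hits = [gene for gene, seq in upstream_by_gene.items()
            if has_motif(seq, motif, window)]
    return hits, len(hits) / len(upstream_by_gene)

# Hypothetical upstream sequences and motif, invented for illustration.
upstream_by_gene = {
    "geneA": "TTTT" * 10 + "GATAAG" + "CCCC",   # motif on the forward strand
    "geneB": "ACGT" * 20,                        # no motif
    "geneC": "CTTATC" + "AAAA" * 5,              # motif on the reverse strand
}
hits, fraction = motif_fraction(upstream_by_gene, "GATAAG")
print(hits, fraction)
```

Modules can then be linked in a graph whenever a sufficient fraction of their genes share a motif. The threshold for "sufficient" is a design choice left open here.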

25
Module relationships
3. Case study
Yeast mutants of Kin82, Ppt1 and Ypl230W were
further tested to validate their relationships with
the modules.
26
3. Case study
27
What does a BN look like here?
3. Case study
  • Need to specify two things to describe a BN
  • Graph topology (structure)
  • Parameters of each conditional probability
    distribution
  • Possible to learn both from data
  • Learning structure is much harder than learning
    parameters

28
What can we learn? Why was the Segal et al. paper
successful?
3. Case study
  • Yeast! Not mouse or human.
  • Use gene clustering first. Start from a simplified
    hypothesis and a small set of known regulators
    rather than attempting a network of thousands of genes.
  • Experimental validation.