Modeling regulatory networks in cells using Bayesian networks - PowerPoint PPT Presentation

About This Presentation
Title:

Modeling regulatory networks in cells using Bayesian networks

Description:

Measure the expression levels of thousands of genes in a cell under specific ... nuclear proteins, but also cytoplasm membrane proteins (budding and sporulation) ... – PowerPoint PPT presentation

Number of Views:71
Avg rating:3.0/5.0
Slides: 31
Provided by: csCor
Category:

less

Transcript and Presenter's Notes

Title: Modeling regulatory networks in cells using Bayesian networks


1
Modeling regulatory networks in cells using
Bayesian networks
Golan Yona
Department of Computer Science Cornell University
2
Outline
  • Regulatory networks
  • Expression data
  • Bayesian Networks
  • What
  • Why
  • How
  • Learning networks from expression data
  • Using Bayesian networks to analyze expression
    data (Friedman et al)

3
Regulatory networks
  • KEGG Regulatory
  • Pathways

4
(No Transcript)
5
Metabolic pathways
  • KEGG Metabolic
  • Pathways

6
(No Transcript)
7
Expression Arrays
  • Measure the expression levels of thousands of
    genes in a cell under specific conditions (e.g.
    cell cycle) simultaneously
  • Each cell has the same genomic data but different
    subsets of proteins are being expressed in
    different cells and at the same cell under
    different conditions.
  • Protein level is controlled by controlling
  • transcription initiation
  • mRNA transcription
  • mRNA transport
  • splicing
  • post-translational modifications
  • degradation of mRNA and proteins.
  • Microarray measure the level of mRNA, thus
    providing an indirect evidence for the control of
    protein levels

8
Micro Spotting pin
9
(No Transcript)
10
  • Some are over-expressed (red), some
    under-expressed (green) measured with respect to
    a control group of genes (fixed genes)
  • Different pathways are activated under different
    conditions

11
Goals
  • Recover protein interactions and sub-networks
    that correspond to regulatory networks in the
    cell.
  • Basic assumption some genes are dependent on
    others while others exhibit independence or
    conditional independence
  • The means Bayesian networks. Capable of modeling
    the statistical dependencies between different
    variables (genes)
  • Different from clustering analysis ..
  • Applicable when the dependency between genes is
    local
  • Problems data is noisy, partial, sometimes
    misleading (translation, activation), not enough
    to ensure statistically significant models, time
    scale

12
Bayesian Networks
  • A compromise between the assumption of complete
    dependency and complete conditional independence
    (Naïve Bayes)
  • Less constraining yet still tractable
  • We know something about the statistical
    dependencies between features but not necessarily
    about the type of the underlying distributions

13
Example
Oil pressure In engine
Oil temp.
Fan speed
Engine temp.
Air pressure In tire
Coolant temp.
Smoke
14
Bayesian Networks
  • Also called belief nets
  • A graph description of dependencies and
    independencies between variables. Each node
    corresponds to a variable (gene).
  • The graph is directed and acyclic
  • The variables are discrete
  • A variable can take on a value from a set of
    values a1,a2, e.g. on/off
  • The probability of a specific value P(ai) and ?i
    P(ai) 1
  • A link joining node A to node C is directional
    and represents the set of conditional
    probabilities P(cj/ai) causality (the
    probability that C is on when A is off)
  • The network is described in terms of its nodes
    and edges and the conditional probability
    distributions associated with each node.

15
Network structure
  • For every node A
  • The parents of A is the set of immediate
    predecessors of A
  • The children of A is the set of immediate
    successors of A
  • B is a descendant of A if there is a directed
    path from A to B
  • Conditional probability
  • Network assertions
  • The value of a variable depends on its parents
  • A variable is conditionally independent of it
    non-descendants given its parents

Parents
Eng cold, Eng cold, Eng hot, Eng hot Fan
fast Fan slow Fan fast Fan slow High 0.1 0.4 0.6 0
.9 Low 0.9 0.6 0.4 0.1
Coolant temp.
16
Calculating the probability of an assignment
  • The network describes the joint probability
    distribution of all variables (some conditionally
    independent and some are not)
  • Depends on the structure!
  • The probability of a specific assignment of
    values y1,y2,yn for the variables Y1,Y2,,Yn
  • This is the likelihood of the data given the
    model.
  • All you need to know is..

17
Learning Bayesian network from data
  • Given the data set with specific assignments for
    variables (on/off for each gene), how can we find
    the most probable network structure that explains
    the data (the best match to the data)?
  • How to quantify a match?
  • Note that there are two aspects of the network
    that we need to learn
  • Structure (nodes, edges)
  • Conditional probability distributions
  • Common strategy assign a score to each network G

18
  • Common strategy assign a score to each network G
  • Pick the network that maximizes the score

Likelihood prior
19
Learning
  • The likelihood of the data given the model is
    estimated by averaging over all possible
    assignments of parameters (conditional
    probabilities) to G
  • Summation over all possible assignments for
    conditional probabilities. The major contribution
    is from the set estimated from the data
  • Given a specific structure, for every node we
    lookup its parents and calculate the empirical
    conditional probability distribution

20
Model selection
  • The second term (log prior) is a measure for the
    complexity of the model (through uncertainty)
  • Occam razor entia non sunt multiplicanda
    praeter necessitatem(thou shall not multiply
    entities)
  • MDL principle
  • In the papers discussed here it is being ignored

21
In search for the best network
  • In theory test different structures, calculate
    the probability of assignment to variables for
    each network structure, and output the network
    that maximizes the likelihood of the data given
    the network.
  • Impossible in practice the number of possible
    networks over n genes is
  • For the yeast genome with 6000 genes this is gt
    105,000,000

22
Possible solution
  • Apply a heuristic local greedy search Start
    with a random network and locally improve it, by
    testing perturbations over the original
    structure.
  • Test one edge at a time, by adding, removing or
    reversing the edge, and testing its affect on the
    score. If the score improves - accept

23
How to learn from expression data
  • Two types of features learned from multiple
    networks
  • First - a gene Y is in the Markov blanket of X
    (two genes are involved in the same biological
    process. No other gene mediates the dependence)
  • Problem of unobserved variables that can
    intermediate the interaction
  • Second type a gene X is ancestor of Y (based on
    all networks that are learned)

24
Application to the Yeast Cell cycle data
  • Expression level measurements for 6177 genes
    along different time points in six cell cycles
    altogether 76 measurements for each gene
  • Only 800 genes vary during cell cycle and 250
    cluster into 8 fairly distinct classes.
  • Networks are learned for the 800 genes
  • Confidence values based on the set of networks
    learned from different bootstrap sets

25
Typical sub-network
26
Biological significance
  • Order relations there are a few dominant genes
    that appear before many others, e.g. genes that
    are involved in cell cycle control and initiation.

27
  • Most are nuclear proteins, but also cytoplasm
    membrane proteins (budding and sporulation)
  • Some DNA repair proteins (prerequisite for
    transcription)
  • RSR1 initiator of signal trunsduction cascades
    in the cell

28
Biological significance
  • Markov connection functionally related

29
  • Most pairs have similar functions (verified
    sometimes through transitivity)
  • Some are physically adjacent on the chromosome
  • Some relations cannot be detected directly from
    expression data
  • Detect conditional independence group of genes
    that are expressed similarly, but one is a parent
    of all others and there are no connections
    between the others the parent is a control gene
    (e.g. CLN2 early cell cycle control gene, that
    controls RNR3, SVS1, SRO4 and RAD41 that are
    functionally unrelated).

30
Conclusions
  • A powerful tool, but
  • not enough data
  • Computational problems
  • Learning algorithms
  • Authors decompose networks into basic elements
    again
  • Many possible extensions
Write a Comment
User Comments (0)
About PowerShow.com