Learning gene regulatory networks in Arabidopsis thaliana




1
  • Learning gene regulatory networks in Arabidopsis
    thaliana
  • Chris Needham, Andy Bulpitt
  • School of Computing
  • Iain Manfield, Phil Gilmartin
  • Institute of Integrative and Comparative Biology
  • David Westhead
  • Institute of Molecular and Cellular Biology

2
Gene Regulatory Networks
  • GRNs govern the functional development and
    biological processes of cells in all organisms.
  • GRNs are a representation that encapsulates all
    information about gene regulation, incorporating
    time, conditions and development.
  • We aim to learn transcription networks for
    components of Arabidopsis thaliana from gene
    expression microarray data.

3
Gene Expression Microarrays
[Diagram: DNA -> mRNA (transcription) -> protein (translation). Microarray data: genes by experiments.]
4
Arabidopsis thaliana
  • Plants are important.
  • Arabidopsis is the best-annotated plant (though
    poorly annotated relative to yeast).
  • It has an excellent, large, uniform microarray
    dataset.
  • It has a large genome of 30,000 genes, with many
    large gene families and duplications.
  • It has many mutants, but their analysis is often
    not very successful.
  • It has many transcription factors (TFs): what do
    they do?
  • Even well-characterised TFs are not fully
    characterised.

5
Arabidopsis GATA Factor genes
[Figure: GATA factor genes grouped by regulation: night-phased clock regulation, light up-regulated, day-phased clock regulation, light down-regulated.]
Inconsistent clock regulation of GATA2 and GATA4
between experiments.
6
Biological approach
  • The experimental biological work needed to
    discover regulatory networks is hard and
    expensive:
  • mutants in TFs
  • microarray experiments
  • time course experiments
  • How do poorly-characterised genes fit into
    well-characterised networks, such as light
    up-regulation, light down-regulation, clock
    and abiotic stress?

What can we get from the existing data?
7
Informatics approaches
  • Ordinary differential equations: dynamical
    systems
  • Boolean networks: logical relations between
    genes
  • Bayesian networks: modelling a stochastic system
  • Friedman. Inferring cellular networks using
    probabilistic graphical models. Science 303(6),
    2004. Review article.
  • Imoto et al. Combining microarrays and biological
    knowledge for estimating gene networks via
    Bayesian networks. CSB 2003. Incorporates prior
    knowledge from protein-protein interactions,
    protein-DNA interactions, gene networks and
    literature. Analysis of Saccharomyces cerevisiae
    gene expression data newly obtained by disrupting
    100 genes, mainly transcription factors.
  • Sachs et al. Causal protein signalling networks
    derived from multi-parameter single-cell data.
    Science 308(5721), 2005.

8
  • Meaningful gene regulatory networks can be
    learned from microarray data
  • without interventions
  • but using large datasets
  • publicly available
  • start to design before extra data collection

9
Data: Arabidopsis thaliana
  • 2466 microarrays (NASC), 25,000 genes
  • Filtering
  • Genes with low entropy are removed.
  • Can select a subset of genes to consider
  • Quantisation
  • Expression signal values discretised into 2 or 3
    classes.
  • Boundaries chosen to create classes with equal
    probability masses.

[Figure: GATA2 (AT2G45050) expression discretised into three classes; class counts 825, 819 and 822; marked values 21.9 and 48.6.]
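
The filtering and quantisation steps can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the slides do not say how entropy is computed, so a fixed-width histogram of the raw signal is assumed here, and the function names, the bin count and the threshold of 1.0 bit are all hypothetical.

```python
import bisect
import math
from collections import Counter

def histogram_entropy(values, n_bins=10):
    """Shannon entropy (bits) of a fixed-width histogram of the signal (assumed measure)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    bins = Counter(min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in bins.values())

def equal_mass_boundaries(values, n_classes=3):
    """Class boundaries placed at quantiles, so classes hold about equal numbers of arrays."""
    s = sorted(values)
    return [s[len(s) * k // n_classes] for k in range(1, n_classes)]

def quantise(values, boundaries):
    """Map each expression value to the index of the class it falls into."""
    return [bisect.bisect_right(boundaries, v) for v in values]

# Toy expression values (illustrative only, not NASC data).
expression = {
    "flat_gene": [5.0] * 8 + [5.1],
    "varying_gene": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
}

# Filtering: drop genes whose histogram entropy is below the assumed threshold.
kept = [g for g, v in expression.items() if histogram_entropy(v) >= 1.0]

# Quantisation: three classes with equal probability mass.
classes = {g: quantise(expression[g], equal_mass_boundaries(expression[g])) for g in kept}
print(kept, classes)
```

Ties in the expression values can make the class sizes slightly unequal, which would be consistent with the near-equal counts shown on the slide.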
10
Bayesian networks
  • BNs are a framework for explaining causal
    relationships consisting of a set of variables
    connected by a set of directed edges
  • Probability calculus is used to describe the
    probabilistic relationship of each variable with
    its parents
  • The joint probability distribution over all the
    variables can be written as a product of
    conditional probability distributions
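  • $p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid pa_i)$
  • where $pa_i$ are the parents of $x_i$

$p(x_1,\ldots,x_7) = p(x_1)\,p(x_2)\,p(x_3)\,p(x_4 \mid x_1,x_2,x_3)\,p(x_5 \mid x_1,x_3)\,p(x_6 \mid x_4)\,p(x_7 \mid x_4,x_5)$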
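
A short sketch of this factorisation for the seven-variable example, assuming binary variables. The conditional probability values are random placeholders, and `random_cpts` and `joint_probability` are hypothetical helper names.

```python
import itertools
import random

# Parent sets of the seven-variable example on the slide.
PARENTS = {
    "x1": (), "x2": (), "x3": (),
    "x4": ("x1", "x2", "x3"),
    "x5": ("x1", "x3"),
    "x6": ("x4",),
    "x7": ("x4", "x5"),
}

def random_cpts(parents, seed=0):
    """Placeholder binary CPTs: cpts[node][parent_values] = P(node = 1 | parents)."""
    rng = random.Random(seed)
    return {
        node: {pv: rng.random() for pv in itertools.product((0, 1), repeat=len(pa))}
        for node, pa in parents.items()
    }

def joint_probability(assignment, parents, cpts):
    """p(x1..xn) as the product over nodes of p(xi | pa_i)."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpts[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

cpts = random_cpts(PARENTS)
nodes = list(PARENTS)

# The factorised joint must sum to 1 over all 2^7 assignments.
total = sum(
    joint_probability(dict(zip(nodes, values)), PARENTS, cpts)
    for values in itertools.product((0, 1), repeat=len(nodes))
)

# Number of free parameters needed by the factorised model.
n_params = sum(len(table) for table in cpts.values())
print(total, n_params)
```

For binary variables the factorised model needs 21 free parameters, against 127 for an unconstrained joint table over seven variables.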
11
Conditional Probability Distributions
$p(x_i \mid pa_i)$
Conditional probability tables for GATA4
Marginal probabilities for GATA4
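
As a rough sketch, a conditional probability table can be estimated from discretised arrays by counting. The parent gene "regulator" and the eight toy samples below are hypothetical, and relative frequencies (maximum likelihood) are an assumption; the slides do not state which estimator was used.

```python
from collections import Counter, defaultdict

def estimate_cpt(samples, child, parents):
    """Relative-frequency CPT: cpt[parent_values][child_value] = probability."""
    counts = defaultdict(Counter)
    for s in samples:
        counts[tuple(s[p] for p in parents)][s[child]] += 1
    return {
        pv: {value: c / sum(ctr.values()) for value, c in ctr.items()}
        for pv, ctr in counts.items()
    }

def marginal(samples, gene):
    """Marginal distribution of one gene, i.e. a CPT with no parents."""
    return estimate_cpt(samples, gene, ())[()]

# Toy discretised data (0 = low, 1 = high); illustrative only.
samples = (
    [{"regulator": 0, "GATA4": 0}] * 3 + [{"regulator": 0, "GATA4": 1}]
    + [{"regulator": 1, "GATA4": 1}] * 3 + [{"regulator": 1, "GATA4": 0}]
)

cpt = estimate_cpt(samples, "GATA4", ("regulator",))
print(cpt)
print(marginal(samples, "GATA4"))
```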
12
Structure Learning
  • Aim is to find the model (network structure) that
    has the maximum likelihood for a given set of
    genes (nodes)
  • For a given set of genes, the likelihood
    $L = p(D \mid \hat{\theta}_S, S)$ is the probability of the data $D$ being
    generated by the model
  • To search for a good model structure, a greedy
    learning algorithm is used. From an initial
    network, edges are added, reversed or deleted
    until an optimum is reached.

Learned structure: $\hat{S} = \arg\max_S \left[ \ln p(D \mid \hat{\theta}_S, S) - \tfrac{1}{2}\, d \ln N \right]$
The BIC score combines a measure of how well the
model fits the data with a penalty term that
penalises model complexity. $\hat{\theta}_S$ is an estimate of
the model parameters for the structure $S$, $d$ is
the number of model parameters, and $N$ is the size
of the dataset.
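
The BIC score and the greedy search can be sketched as below. This is an illustrative implementation under stated assumptions, not the authors' code: all variables are binary, the search starts from the empty network (the slide only says "an initial network"), and the toy data set is constructed so that B follows A while C is independent of both.

```python
import itertools
import math
from collections import Counter

def node_bic(data, child, parents, arity):
    """One node's contribution: log-likelihood minus (d / 2) ln N."""
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    totals = Counter()
    for (pv, _), c in joint.items():
        totals[pv] += c
    loglik = sum(c * math.log(c / totals[pv]) for (pv, _), c in joint.items())
    d = (arity - 1) * arity ** len(parents)
    return loglik - 0.5 * d * math.log(len(data))

def bic(data, parents, arity):
    """BIC of a whole structure given as {gene: set of parent genes}."""
    return sum(node_bic(data, v, sorted(pa), arity) for v, pa in parents.items())

def is_acyclic(parents):
    """Repeatedly strip nodes without remaining parents; failure means a cycle."""
    remaining = {v: set(pa) for v, pa in parents.items()}
    while remaining:
        roots = [v for v, pa in remaining.items() if not pa]
        if not roots:
            return False
        for r in roots:
            del remaining[r]
        for pa in remaining.values():
            pa.difference_update(roots)
    return True

def neighbours(parents):
    """All graphs one edge addition, deletion or reversal away."""
    for a, b in itertools.permutations(parents, 2):
        new = {v: set(pa) for v, pa in parents.items()}
        if a in parents[b]:                # edge a -> b exists
            new[b].discard(a)              # deletion
            yield new
            rev = {v: set(pa) for v, pa in new.items()}
            rev[a].add(b)                  # reversal to b -> a
            yield rev
        elif b not in parents[a]:          # no edge in either direction
            new[b].add(a)                  # addition of a -> b
            yield new

def greedy_search(data, genes, arity=2):
    """Hill-climb on BIC from the empty graph until no move improves the score."""
    current = {g: set() for g in genes}
    best = bic(data, current, arity)
    while True:
        candidates = [c for c in neighbours(current) if is_acyclic(c)]
        top = max(candidates, key=lambda c: bic(data, c, arity), default=None)
        if top is None or bic(data, top, arity) <= best + 1e-9:
            return current, best
        current, best = top, bic(data, top, arity)

# Toy data: B agrees with A in 9 rows out of 10; C is exactly independent.
data = []
for a, c in itertools.product((0, 1), repeat=2):
    for k in range(10):
        b = a if k < 9 else 1 - a
        data.extend({"A": a, "B": b, "C": c} for _ in range(5))

structure, score = greedy_search(data, ["A", "B", "C"])
edges = {frozenset((p, child)) for child, pa in structure.items() for p in pa}
print(edges, score)
```

On this toy data the search links A and B and leaves C unconnected. The direction of the A-B edge is arbitrary, because both orientations have the same BIC score.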
13
Conditional Independence
  • The different structures encode the conditional
    independences between the genes.
  • Causality: the directionality of the arrows can
    be determined when they lead into a v-structure,
    where the gene at the v depends on all of its
    parents.
  • Otherwise, the direction of the causal relation
    between genes cannot be discovered from data
    alone. Interventions can be used.
  • i.e. test using mutants in the respective genes
    to see which gene is mis-regulated in which
    mutant. (transcript levels)
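
The role of v-structures can be illustrated with a small check. It relies on the standard result that two directed acyclic graphs encode the same conditional independences exactly when they share the same skeleton and the same v-structures. The gene names A, B and C are placeholders.

```python
import itertools

def skeleton(edges):
    """Undirected version of the graph: the set of connected gene pairs."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, child, b) with a -> child <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found = set()
    for child, pa in parents.items():
        for a, b in itertools.combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures: indistinguishable from data alone."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = [("A", "B"), ("B", "C")]          # A -> B -> C
reverse_chain = [("C", "B"), ("B", "A")]  # A <- B <- C
fork = [("B", "A"), ("B", "C")]           # A <- B -> C
collider = [("A", "B"), ("C", "B")]       # A -> B <- C (a v-structure)

print(markov_equivalent(chain, reverse_chain))
print(markov_equivalent(chain, fork))
print(markov_equivalent(chain, collider))
```

The chain, the reversed chain and the fork are equivalent, so their arrow directions cannot be recovered from observational data. Only the collider has directions that the data can identify.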

14
Method
  • An initial set of key genes of interest is chosen
    (e.g. circadian clock regulated genes) and a
    network structure is inferred.
  • To this model a number of genes may be added:
    either all genes, or a selection. Genes are added
    separately.
  • The structure learning algorithm is applied to
    each set of genes, finding the GRN which is most
    likely to have generated the data.
  • The best network structure is chosen, and the
    gene is added to the model.

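
The incremental procedure can be sketched as a loop around any structure learner. The function names are hypothetical, `learn_structure` stands in for whatever structure learning algorithm is used, and the toy learner with its fixed scores exists only to make the example runnable.

```python
def add_best_gene(model_genes, candidates, learn_structure):
    """Try each candidate separately; keep the one whose learned network scores best.

    learn_structure(genes) must return a (structure, score) pair.
    """
    best = None
    for gene in candidates:
        structure, score = learn_structure(list(model_genes) + [gene])
        if best is None or score > best[2]:
            best = (gene, structure, score)
    return best

def grow_model(seed_genes, candidates, learn_structure, n_to_add):
    """Repeat the add-one-gene step n_to_add times, starting from the seed genes."""
    genes, pool, structure = list(seed_genes), list(candidates), None
    for _ in range(min(n_to_add, len(pool))):
        gene, structure, _ = add_best_gene(genes, pool, learn_structure)
        genes.append(gene)
        pool.remove(gene)
    return genes, structure

# Stand-in learner with made-up scores (higher is better); not a real algorithm.
TOY_FIT = {"geneX": -12.0, "geneY": -3.5, "geneZ": -8.0}

def toy_learn(genes):
    return tuple(genes), sum(TOY_FIT.get(g, 0.0) for g in genes)

genes, structure = grow_model(["CCA1", "LHY"], ["geneX", "geneY", "geneZ"], toy_learn, 2)
print(genes)
```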
15
Results
  • Meaningful gene regulatory networks can be
    learned from microarray data
  • without interventions
  • but using large datasets
  • publicly available
  • start to design before extra data collection

16
Predictive models
Figure 2. Given information about the state of a
gene's expression level (or that of a set of
genes), the marginal probability of any other gene
(or set of genes) being in a particular state may
be calculated. Fixing the value of a gene (in
this case through growing a specific mutant)
allows predictions about the likely values of
other genes to be made and tested experimentally
to verify the predictive model of the GRN. This
figure shows the change in marginal likelihood of
each gene (y-axis) in Figure 1 when one other
gene's value is fixed (x-axis), based on real
data and the learned network in Figure 1. Dark
values show the greatest expected change in
expression levels, whereas white values show
little observable change.
Figure 1. Bayesian network of the transcription
network for forty genes identified in light/clock
regulation of selected GATAs from the literature.
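
The calculation behind a plot of this kind can be sketched on a toy network. The three-gene structure and its probabilities are invented for illustration. The code conditions on the fixed gene's value by brute-force enumeration, which is an assumption: the slides do not say how the fixed value was handled or which inference method was used.

```python
import itertools

# Hypothetical network: R -> T1 and R -> T2 (states: 0 = low, 1 = high).
PARENTS = {"R": (), "T1": ("R",), "T2": ("R",)}
# CPTS[gene][parent_values] = P(gene = 1 | parents); illustrative values only.
CPTS = {
    "R": {(): 0.5},
    "T1": {(0,): 0.1, (1,): 0.9},
    "T2": {(0,): 0.5, (1,): 0.5},
}

def joint(assignment):
    """Joint probability as the product of each gene's conditional probability."""
    p = 1.0
    for gene, pa in PARENTS.items():
        p_one = CPTS[gene][tuple(assignment[x] for x in pa)]
        p *= p_one if assignment[gene] == 1 else 1.0 - p_one
    return p

def marginal_high(gene, evidence=None):
    """P(gene = 1 | evidence), summing the joint over all consistent states."""
    evidence = evidence or {}
    genes = list(PARENTS)
    num = den = 0.0
    for values in itertools.product((0, 1), repeat=len(genes)):
        a = dict(zip(genes, values))
        if any(a[g] != v for g, v in evidence.items()):
            continue
        p = joint(a)
        den += p
        if a[gene] == 1:
            num += p
    return num / den

def change_matrix(fixed_value=0):
    """change[fixed][target] = |P(target = 1 | fixed gene) - P(target = 1)|."""
    genes = list(PARENTS)
    return {
        f: {t: abs(marginal_high(t, {f: fixed_value}) - marginal_high(t))
            for t in genes if t != f}
        for f in genes
    }

change = change_matrix(fixed_value=0)
print(change)
```

In this toy case, fixing R low shifts T1 strongly and leaves T2 unchanged, which would correspond to a dark and a white cell respectively.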
17
Future Computation
  • New structure learning algorithms
  • Strength of connections
  • Selecting relevant experiments
  • Effect of discretisation
  • Sensitivity to noise

18
Future Biology
  • We wish to learn GRNs in order to form hypotheses
    about possible roles of a gene and likely
    redundant genes.
  • Main aim is to reduce the number of related genes
    to be screened for experimental verification of
    findings.
  • Look for mis-regulation of genes predicted to be
    downstream of e.g. well characterised regulators.
  • Make mutants of poorly characterised genes and
    look for mis-regulation of gene expression or
    other phenotype.
  • Carry these predictions from this model organism
    to a crop plant, e.g. rice, where many of the
    regulatory components are conserved.

19
Conclusions
20
Acknowledgments
  • Paul Devlin, Enrique Lopez (RHUL).
  • NASCArrays team.
  • People contributing samples for array analysis at
    NASC.
  • BBSRC, University of Leeds.

21
  • Extra slides and slides pre-empting questions.

22
Benchmarks for assessment of network accuracy
23
Generating testable hypotheses
  • Can we generate hypotheses using gene expression
    data?
  • Genevestigator Tool
  • OK for small numbers of genes
  • ACT (Arabidopsis Co-expression Tool)
  • co-expressed ? co-regulated
  • What is the regulator?

24
Arabidopsis thaliana
  • Many transcription factors (TFs)
  • what do they do?
  • even well-characterised TFs are not fully
    characterised
  • Many mutants
  • analysis often not very successful

25
Co-expression and Co-regulation
Promoter motif over-representation
26
GATA factors and abiotic stress
27
Meristem de-etiolation arrays
  • Etiolated (dark-grown) seedlings.
  • Time course array analysis of meristem and
    cotyledons following illumination.
  • Expression of selected GATA genes.
  • Enrique Lopez, Royal Holloway.

28
Co-expression scatter plots
GATA2 and GATA4 are co-expressed with phyA but
not with other phy genes. Expression of most
genes shows similar correlation of expression with
GATA2 and GATA4, suggesting conservation of
expression pattern following gene duplication.
GATA9 and 12 do not show co-expression with any
of the well-characterised genes seen with GATA2
and 4. Divergence of expression following
duplication is indicated by correlation of some
genes with GATA9 but not with GATA12, perhaps
leading to sub-functionalization.
GATA21 (GNC) and 22 are co-expressed with lhy and
cca1, suggesting these GATA genes may fit within
characterised pathways. Expression of most genes
is better correlated with GATA21 than GATA22.
This may reflect a subtle divergence of
expression pattern for these duplicated genes.
29
AtGATA gene conservation
30
Expression divergence
31
Phenotypes of GATA mutants
32
Leave one out network learning
Clock genes: CCA1, LHY, TOC1, GI, ELF3, ELF4.
Subsidiary list: CBF1, COL1, PHYA, PIF3, HY5.
33
Networks from expression correlation r-values
  • A set of genes from a microarray experiment.
  • Find the r-values for correlation of expression
    between all these genes.
  • Connect genes with high r-values.
  • Gordon Breen, University of Bristol
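
The three steps above can be sketched as follows. The Pearson coefficient, the use of the absolute value of r and the threshold of 0.8 are assumptions, since the slide only says "high r-values"; the expression values are invented.

```python
import itertools
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_network(expression, threshold=0.8):
    """Connect every pair of genes whose |r| reaches the threshold."""
    edges = {}
    for g1, g2 in itertools.combinations(sorted(expression), 2):
        r = pearson_r(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges

# Toy expression profiles across five arrays (illustrative only).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # tracks geneA closely
    "geneC": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to the others
}

network = correlation_network(expression)
print(network)
```

Unlike a Bayesian network, a graph built this way is undirected and does not separate direct from indirect relationships.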

34
Results
  • Meaningful gene regulatory networks can be
    learned from microarray data
  • without interventions
  • but using large datasets
  • publicly available
  • start to design before extra data collection

35
Well-characterised networks
  • Light up-regulation
  • Light down-regulation
  • Clock
  • Abiotic stress
  • How do poorly-characterised genes fit into these
    networks?