Linear Modeling of Genetic Networks from Experimental Data - PowerPoint PPT Presentation

Provided by: Kyubae8

Transcript and Presenter's Notes

1
Linear Modeling of Genetic Networks from
Experimental Data
  • E.P. van Someren, L.F.A. Wessels and M.J.T.
    Reinders
  • ISMB 2000.
  • Talk by Kyu-Baek Hwang

2
Abstract
  • Topic
  • Modeling regulatory interactions between genes
  • Linear genetic networks
  • Gene expression data
  • The dimensionality problem (contribution of this
    paper)
  • The number of genes >> the number of measured
    time points → many solutions fit the training
    data
  • Prototypical genes (by clustering), motivated by
    the fact that biological genetic networks are
    sparse and redundant.
  • Experiments
  • An artificial dataset
  • S. cerevisiae yeast cell-cycle dataset

3
Exploitation of DNA Microarray Datasets
  • DNA microarrays → simultaneous measurement of
    the expression levels of thousands of genes
  • Infer the functionality of genes from these new,
    massive datasets.
  • Clustering and pattern recognition techniques
    (NNs and SVMs)
  • The regulatory interactions between genes
  • Boolean networks, Bayesian networks, linear
    networks, neural networks, and differential
    equations
  • Data sparseness problem inherent in the analysis
    of microarray data → use as few parameters as
    possible

4
Linear Networks
  • The basic linear model
  • $x_j(t+1) = \sum_{i=1}^{N} r_{i,j}\, x_i(t)$    (1)
  • where $x_j(t)$ represents the activity level of
    gene j at time point t, $r_{i,j}$ represents how
    strongly gene i controls gene j, and N is the
    total number of genes under consideration.
  • Prototypical genes via hierarchical clustering
  • Tackling the dimensionality problem
  • Input and output sharing among genes involved
    within a gene family or pathway
  • Genes are estimated to interact with four to
    eight other genes.
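A minimal Python/NumPy sketch of iterating Eq. (1), assuming the gene states are stored as a vector with one entry per gene and the weights as an N × N array `R` with `R[i, j]` = $r_{i,j}$. Because $r_{i,j}$ is the influence of gene i on gene j, one update is $x(t+1) = R^\top x(t)$. The 3-gene matrix is hypothetical and only illustrates the indexing.

```python
import numpy as np

def step(R: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One update of Eq. (1): x_j(t+1) = sum_i R[i, j] * x_i(t), i.e. R^T x(t)."""
    return R.T @ x

def simulate(R: np.ndarray, x0: np.ndarray, n_steps: int) -> np.ndarray:
    """Return an (N, n_steps) array whose column t is the state x(t)."""
    X = np.empty((len(x0), n_steps))
    X[:, 0] = x0
    for t in range(1, n_steps):
        X[:, t] = step(R, X[:, t - 1])
    return X

# Hypothetical 3-gene network: diagonal = self-terms,
# gene 0 activates gene 1 (R[0, 1] > 0), gene 1 represses gene 2 (R[1, 2] < 0).
R = np.array([[0.9, 0.5, 0.0],
              [0.0, 0.8, -0.4],
              [0.0, 0.0, 0.7]])
X = simulate(R, np.array([1.0, 0.0, 0.0]), n_steps=5)
```

Storing R with the controlling gene as the row index matches the slide's wording; the transpose in `step` is the only place where that convention matters.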

5
The Modeling Approach
6
Preprocessing Step I: Thresholding
  • Eliminate insignificant signals (genes).
  • Due to experimental noise
  • Gene expression levels in different cultures
    under similar conditions
  • Vary by up to a factor of two
  • Gene expression levels in different cultures
    under different conditions
  • Vary by a factor of two to five
  • Genes with profiles that remain below an absolute
    value of two → assumed not to participate in
    regulation
  • Reduce the dimensionality problem.
  • Avoid learning erroneous relationships.
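A sketch of the thresholding rule, assuming one row per gene and one column per time point, with the threshold of two applied to the absolute value of whatever scale the profiles are stored in (the slide does not state the scale). The helper name `threshold_genes` and the example profiles are hypothetical.

```python
import numpy as np

def threshold_genes(X: np.ndarray, names: list[str], level: float = 2.0):
    """Keep genes whose profile reaches |value| >= level at some time point.

    Rows that stay strictly below `level` in absolute value at every time
    point are treated as insignificant and dropped.
    """
    keep = np.max(np.abs(X), axis=1) >= level
    return X[keep], [n for n, k in zip(names, keep) if k]

# Hypothetical profiles: 3 genes x 3 time points.
X = np.array([[0.5, -1.2, 1.9],    # always below 2 in magnitude -> dropped
              [0.3, 2.4, -0.7],    # exceeds 2 once -> kept
              [-3.1, -0.2, 0.1]])  # |-3.1| >= 2 -> kept
X_kept, kept = threshold_genes(X, ["geneA", "geneB", "geneC"])
```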

7
Preprocessing Step II: Normalization
  • If two signals share essentially the same
    characteristics, they should be very similar
    after normalization (Euclidean distance vs.
    Pearson correlation).

Used in the experiments
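The link between Euclidean distance and Pearson correlation can be made concrete. Assuming z-score normalization (zero mean, unit population standard deviation per profile; the transcript does not list which normalizations were compared), two z-scored profiles of length T satisfy $\lVert z_a - z_b \rVert^2 = 2T(1 - r_{ab})$, so Euclidean distance after normalization ranks pairs exactly as Pearson correlation does. The profiles below are hypothetical.

```python
import numpy as np

def zscore_rows(X: np.ndarray) -> np.ndarray:
    """Normalize each profile (row) to zero mean and unit population std."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / sd

a = np.array([1.0, 2.0, 4.0, 3.0])
b = 10.0 * a + 5.0                 # same shape as a, different scale and offset
c = np.array([4.0, 3.0, 1.0, 2.0])  # c = 5 - a, i.e. perfectly anti-correlated

raw = np.linalg.norm(a - b)        # large: raw Euclidean distance sees scale/offset
Z = zscore_rows(np.vstack([a, b, c]))
after = np.linalg.norm(Z[0] - Z[1])  # ~0: identical shape after normalization

T = len(a)
r_ac = np.corrcoef(a, c)[0, 1]     # Pearson correlation of a and c
```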
8
The Linear Model Calls for Clustering
  • A set of measurements of gene expression levels
    at consecutive time points.
  • The linear model is learned by Gaussian
    elimination.
  • P: a particular solution
  • H: a basis of homogeneous solutions
  • F: a set of free variables

More details on the whiteboard.
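Why the solution is not unique: stacking Eq. (1) over the time points gives, for each target gene, T − 1 equations in N unknowns, so with N > T − 1 every exact weight matrix has the form R = P + H F. The sketch below computes P and H with NumPy, swapping in a least-squares/SVD route for the Gaussian elimination named on the slide; the function name `general_solution` and the random data are hypothetical.

```python
import numpy as np

def general_solution(X: np.ndarray, tol: float = 1e-10):
    """Describe all R with X[:, 1:] = R.T @ X[:, :-1] (X is N genes x T points).

    Returns a particular solution P (N x N) and a basis H (N x d) of the null
    space of A = X[:, :-1].T, so every exact solution is R = P + H @ F for
    some free (d x N) matrix F.
    """
    A = X[:, :-1].T            # (T-1) x N: states at t = 0..T-2
    B = X[:, 1:].T             # (T-1) x N: states at t = 1..T-1
    P, *_ = np.linalg.lstsq(A, B, rcond=None)  # minimum-norm particular solution
    _, s, Vt = np.linalg.svd(A)                # full V^T (N x N)
    rank = int(np.sum(s > tol * s[0]))
    H = Vt[rank:].T            # columns span {r : A r = 0}
    return P, H

rng = np.random.default_rng(0)
N, T = 6, 4                    # more genes than time points: underdetermined
X = rng.normal(size=(N, T))
P, H = general_solution(X)
F = rng.normal(size=(H.shape[1], N))  # arbitrary values for the free variables
R = P + H @ F                          # another exact fit to the same data
```

Every choice of F fits the training data perfectly, which is the "many solutions" problem from the abstract.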
9
Clustering → Prototypes
  • Find groups (clusters) of signals based on their
    similarity.
  • Conceptualize the data by representing each
    cluster with an appropriate prototype.
  • The choice of distance metric is very
    important.
  • Clustering method: complete-linkage hierarchical
    clustering based on the Euclidean distance
    measure.
  • Prototype
  • Reduction of noise in the gene expression levels
  • The mean value of all the signals in one cluster
    (RMS)
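A from-scratch sketch of complete-linkage agglomerative clustering with Euclidean distance, cutting the hierarchy at k clusters and taking each cluster's mean as its prototype. Cutting at a fixed k is an assumption about how the tree was used; the function names and data are hypothetical, and the naive merging loop is only meant for small inputs.

```python
import numpy as np

def complete_linkage(X: np.ndarray, k: int) -> list[list[int]]:
    """Merge rows of X until k clusters remain.

    Cluster distance = largest Euclidean distance between any two members
    (complete linkage). Naive implementation for small numbers of genes.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a stays valid
    return clusters

def prototypes(X: np.ndarray, clusters: list[list[int]]) -> np.ndarray:
    """One prototype per cluster: the mean of its member profiles."""
    return np.array([X[c].mean(axis=0) for c in clusters])

X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 2.9],
              [-1.0, 0.0, 1.0],
              [-0.9, 0.1, 1.2]])
C = complete_linkage(X, k=2)
Y = prototypes(X, C)
```

For real data sizes, `scipy.cluster.hierarchy.linkage` with `method="complete"` builds the same kind of tree far more efficiently.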

10
Prototypes
  • Transforming signals to prototypes
  • The inverse
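The transformation formulas on this slide did not survive extraction. One consistent reading, assumed here, uses an N × K membership matrix M: prototypes are cluster means, Y = A X with A = diag(1/|C_k|) Mᵀ, and the inverse maps each gene back to its cluster's prototype, X̂ = M Y. The inverse is exact only up to within-cluster variation. The cluster labels and data are hypothetical.

```python
import numpy as np

def membership_matrix(labels: np.ndarray, k: int) -> np.ndarray:
    """N x K indicator matrix: M[n, c] = 1 if gene n belongs to cluster c."""
    M = np.zeros((len(labels), k))
    M[np.arange(len(labels)), labels] = 1.0
    return M

labels = np.array([0, 0, 1, 1, 1])   # hypothetical cluster assignment of 5 genes
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 4.0]])
M = membership_matrix(labels, k=2)
A = M.T / M.sum(axis=0)[:, None]     # K x N averaging matrix
Y = A @ X                            # signals -> prototypes (cluster means)
X_hat = M @ Y                        # inverse: each gene takes its prototype
```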

11
Experiments
12
An Artificial Linear System
  • An artificial linear system with five genes.
  • The $R_5$ matrix in graphical representation.

13
Expansion of the System
  • Replication of $R_5$ into the $R_{25}$ matrix.
  • The (i, j)-th 5 × 5 sub-matrix of $R_{25}$ is
    constructed by placing the (i, j)-th entry of
    $R_5$ on the diagonal, with all other positions
    in the sub-matrix set to zero.
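The construction described above is a Kronecker product, $R_{25} = R_5 \otimes I_5$. A NumPy sketch follows; the actual $R_5$ entries are only shown graphically, so the matrix below is a hypothetical stand-in.

```python
import numpy as np

# Hypothetical 5-gene weight matrix (the real R5 is only given as a figure).
R5 = np.array([[0.0, 0.6, 0.0, 0.0, -0.3],
               [0.0, 0.0, 0.5, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.7, 0.0],
               [0.4, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.5, 0.0]])

# Block (i, j) of R25 is R5[i, j] * I_5: the entry on the diagonal, zeros elsewhere.
R25 = np.kron(R5, np.eye(5))
```

With this layout, gene 5i + c is copy c of original gene i, and copy c only interacts with other genes of the same copy index, so R25 holds five independent replicas of the 5-gene system; genes 5i, …, 5i + 4 form the natural cluster for original gene i.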

14
Time Response for the 25 × 25 System
  • Genes in the same cluster were given initial
    values more similar to each other than to those
    of genes in other clusters (20 time points).
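A sketch of generating such a 20-point time response. The true $R_5$ and initial values are not given, so this assumes a random stand-in for $R_5$ rescaled to spectral radius 0.9 (to keep the response bounded), one base value per cluster, and a small within-cluster spread of 0.05; all of these values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in for R5, rescaled so its spectral radius is 0.9.
R5 = rng.normal(size=(5, 5))
R5 *= 0.9 / np.max(np.abs(np.linalg.eigvals(R5)))
R25 = np.kron(R5, np.eye(5))        # block (i, j) = R5[i, j] * I_5

# Gene 5*i + c is copy c of original gene i, so cluster i = genes 5i..5i+4.
centers = rng.uniform(-2.0, 2.0, size=5)                  # one base value per cluster
x0 = np.repeat(centers, 5) + 0.05 * rng.normal(size=25)   # small within-cluster spread

T = 20
X = np.empty((25, T))
X[:, 0] = x0
for t in range(1, T):
    X[:, t] = R25.T @ X[:, t - 1]   # Eq. (1): x_j(t+1) = sum_i r_ij x_i(t)
```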

15
Estimation of the Model (1/2)
  • Experimental steps (k = 1, …, 25)
  • 1. The set of prototypes $Y_k$ associated with the
    clusters in $C_k$ was determined.
  • 2. The weight matrix corresponding to each
    clustering was determined from $Y_k$.
  • 3. Given the complete model and the initial
    state, approximations to the original signals can
    be computed as follows
  • One-step approximation
  • Free-run approximation

16
Estimation of the Model (2/2)
  • Experimental steps (continued)
  • 4. The mean squared errors (MSE) were computed
    ($E_{os,k}$, $E_{fr,k}$).
  • 5. The weighted prototype MSE, $E_{wp,k}$, was
    computed.
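The one-step and free-run formulas were lost in extraction. The sketch below assumes the usual definitions: the one-step approximation predicts x(t+1) from the measured x(t), while the free run starts from the measured x(0) and feeds its own predictions back through Eq. (1). Averaging the squared error over genes and predicted time points (t ≥ 1) is also an assumption, and the weighted prototype MSE is omitted because its weighting is not given. The matrix and series are hypothetical.

```python
import numpy as np

def one_step(R: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predict x(t+1) from the *measured* x(t): x_hat(t+1) = R^T x(t)."""
    X_hat = X.copy()
    X_hat[:, 1:] = R.T @ X[:, :-1]
    return X_hat

def free_run(R: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Start at the measured x(0), then iterate x_hat(t+1) = R^T x_hat(t)."""
    X_hat = np.empty_like(X)
    X_hat[:, 0] = X[:, 0]
    for t in range(1, X.shape[1]):
        X_hat[:, t] = R.T @ X_hat[:, t - 1]
    return X_hat

def mse(X: np.ndarray, X_hat: np.ndarray) -> float:
    """Mean squared error over all genes and predicted time points (t >= 1)."""
    return float(np.mean((X[:, 1:] - X_hat[:, 1:]) ** 2))

R = np.array([[0.5, 0.0],
              [0.0, 2.0]])
X = np.array([[1.0, 0.6, 0.0],    # measured series, deliberately not an exact trajectory
              [1.0, 2.0, 4.0]])
E_os = mse(X, one_step(R, X))
E_fr = mse(X, free_run(R, X))
```

The free run is the stricter test of a model, since errors made early propagate to all later time points instead of being reset by each new measurement.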

17
Error Curves
  • Error curves as a function of the number of
    clusters

18
The Resulting Model
  • The resulting model (analyzing multiple causes is
    not easy)

19
Real Experiments: Yeast Data Set
  • Gene expression profiles of 2467 genes in the
    budding yeast S. cerevisiae, from the dataset of
    Eisen et al.
  • Considered conditions
  • Mitotic cell division cycle, sporulation, and
    temperature and reducing shocks
  • After thresholding
  • ALPHA subset: 18 time points with 45 genes.
  • CDC15 subset: 15 time points with 113 genes.
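Counting equations against unknowns per target gene in Eq. (1) shows how severe the dimensionality problem is for these subsets if the model were fitted directly to all thresholded genes, before clustering. A worked count, assuming the (T − 1) × N matrix of states at t = 0, …, T − 2 has full rank T − 1:

```latex
\begin{aligned}
\text{ALPHA:}\quad & T - 1 = 17 \text{ equations},\; N = 45 \text{ unknowns}
  && \Rightarrow\; 45 - 17 = 28 \text{ free directions per target gene} \\
\text{CDC15:}\quad & T - 1 = 14 \text{ equations},\; N = 113 \text{ unknowns}
  && \Rightarrow\; 113 - 14 = 99 \text{ free directions per target gene}
\end{aligned}
```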

20
The Effect of Normalization
  • Normalization
  • If the number of clusters is equal to or greater
    than the number of time points minus one, the
    prototype free-run error is zero (the limit of
    the over-constrained condition).

The one-step MSE of all datasets and the four
kinds of normalization
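A small numerical check of the zero-error condition, assuming "cluster size" refers to the number of prototypes K and that the prototype model is fitted by least squares (swapped in for the Gaussian elimination used in the slides). With K ≥ T − 1, the (T − 1) × K system generically has an exact solution, so the free run reproduces the prototype series up to rounding; with K < T − 1 it is over-constrained and the error is generally nonzero. The random prototype series are hypothetical.

```python
import numpy as np

def free_run_mse(Y: np.ndarray) -> float:
    """Fit R to prototype series Y (K x T) by least squares, then free-run it."""
    A, B = Y[:, :-1].T, Y[:, 1:].T                 # (T-1) x K each
    R, *_ = np.linalg.lstsq(A, B, rcond=None)      # minimum-norm least-squares fit
    Y_hat = np.empty_like(Y)
    Y_hat[:, 0] = Y[:, 0]
    for t in range(1, Y.shape[1]):
        Y_hat[:, t] = R.T @ Y_hat[:, t - 1]        # Eq. (1) on the prototypes
    return float(np.mean((Y - Y_hat) ** 2))

rng = np.random.default_rng(1)
T = 5                                              # so T - 1 = 4
errors = {K: free_run_mse(rng.normal(size=(K, T))) for K in (2, 3, 4, 6)}
```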
21
Error Curves on the CDC15 Dataset
  • Fitting the linear model on the CDC15 subset.
  • The error curve

22
The Resulting Model on CDC15
  • The resulting model

23
Error Curves on the ALPHA Dataset
  • Fitting the linear model on the ALPHA subset.
  • The error curve

24
The Resulting Model on ALPHA
  • The resulting model

25
Resulting Model without Normalization
  • The resulting model without normalization (more
    reasonable?)

Mating
26
Summary
  • The dimensionality problem was tackled by
    clustering.
  • Biologically sound.
  • Linear models were used to represent the
    relationships between the resulting
    gene-prototypes.
  • Good balance between model complexity and
    accuracy.
  • No intrinsic semantics was found.