Linear Modeling of Genetic Networks from Experimental Data

Slides: 27

Provided by: Kyubae8

Transcript and Presenter's Notes

1
Linear Modeling of Genetic Networks from Experimental Data

E.P. van Someren, L.F.A. Wessels and M.J.T. Reinders
ISMB 2000.
Talk by Kyu-Baek Hwang

2
Abstract

Topic
Modeling regulatory interactions between genes
Linear genetic networks
Gene expression data
The dimensionality problem (contribution of this paper)
The number of genes >> the number of measured time points → many solutions fit the training data
Prototypical genes (by clustering): biological genetic networks are sparse and redundant.
Experiments
An artificial dataset
S. cerevisiae yeast cell-cycle dataset

3
Exploitation of DNA Microarray Datasets

DNA microarray → simultaneous measurements of the expression levels of thousands of genes
Infer the functionality of genes from these new, massive datasets.
Clustering and pattern recognition techniques (NNs and SVMs)
The regulatory interactions between genes
Boolean networks, Bayesian networks, linear networks, neural networks, and differential equations
Data sparseness problem inherent in the analysis of microarray data → use as few parameters as possible

4
Linear Networks

The basic linear model
$$x_j(t+1) = \sum_{i=1}^{N} r_{i,j}\, x_i(t) \qquad (1)$$
where $x_j(t)$ represents the activity level of gene j at time point t, $r_{i,j}$ represents how strongly gene i controls gene j, and N is the total number of genes under consideration.
Prototypical genes (by hierarchical clustering)
Tackling the dimensionality problem
Input and output sharing among genes involved within a gene family or pathway
Genes are estimated to interact with four to eight other genes.

5
The Modeling Approach
6
Preprocessing Step I: Thresholding

Eliminate insignificant signals (genes).
Due to experimental noise
Gene expression levels in different cultures under similar conditions
Vary by up to a factor of two
Gene expression levels in different cultures under different conditions
Vary by a factor of two to five
Genes whose profiles remain below an absolute value of two → do not participate in regulation
Reduce the dimensionality problem.
Avoid learning erroneous relationships.
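
The thresholding rule above can be sketched in a few lines of Python. This is only an illustration: the function name `threshold_genes`, the toy matrix, and the reading of "remain below an absolute value of two" as "never reaches 2 in absolute value at any time point" are assumptions, not details taken from the talk.

```python
import numpy as np

def threshold_genes(expr, min_abs=2.0):
    """Keep genes (rows) whose profile reaches |value| >= min_abs at least once."""
    expr = np.asarray(expr, dtype=float)
    keep = np.abs(expr).max(axis=1) >= min_abs
    return expr[keep], keep

# Hypothetical toy data: 4 genes x 5 time points (values are illustrative only).
expr = np.array([
    [0.5, -1.0, 1.5, 0.2, -0.3],   # stays below 2 in absolute value: dropped
    [0.1,  2.5, 3.0, 1.0, -0.5],   # exceeds 2: kept
    [-2.2, -0.4, 0.3, 0.9, 1.1],   # reaches -2.2: kept
    [1.9,  1.8, -1.9, 0.0, 0.4],   # never reaches 2: dropped
])
filtered, keep = threshold_genes(expr)
```

Here `keep` is a boolean mask over the original genes, so the surviving rows can be mapped back to gene names if needed.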

7
Preprocessing Step II: Normalization

If two signals share the same characteristics, they should be very similar after normalization. (Euclidean distance vs. Pearson correlation)
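
A small sketch of why normalization ties Euclidean distance to Pearson correlation. The transcript does not say which normalizations were used, so z-scoring (zero mean, unit standard deviation) is assumed here purely for illustration, and all signals are made up.

```python
import numpy as np

def zscore(x):
    """Scale a signal to zero mean and unit (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Two hypothetical signals with the same shape but different offset and amplitude.
a = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
b = 10.0 + 4.0 * a

dist_raw = np.linalg.norm(a - b)                    # large: the scales differ
dist_norm = np.linalg.norm(zscore(a) - zscore(b))   # about 0 after normalization

# For z-scored signals of length T: squared distance = 2 * T * (1 - Pearson r).
c = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
r = np.corrcoef(a, c)[0, 1]
d2 = np.sum((zscore(a) - zscore(c)) ** 2)
```

Under this assumed normalization, ranking signal pairs by Euclidean distance gives the same order as ranking them by decreasing Pearson correlation.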

Used in the experiments
8
The Linear Model Calls for Clustering

A set of measurements of gene expression levels at consecutive time points.
The linear model is learned by Gaussian elimination.
P: a particular solution
H: a basis of homogeneous solutions
F: a set of free variables
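
To make the P / H / F decomposition concrete, here is a minimal sketch for one target gene when there are more genes than time points. It does not reproduce the Gaussian elimination used in the talk: `numpy.linalg.lstsq` and an SVD are swapped in to obtain a particular solution and a null-space basis, and the "free variables" are taken to be coefficients on that basis. The sizes and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 5, 8                      # hypothetical sizes: fewer time points than genes
X = rng.normal(size=(T, N))      # rows are time points, columns are genes

A = X[:-1]                       # states at t = 0 .. T-2
b = X[1:, 0]                     # next-step values of gene 0

# Particular solution P (here: the minimum-norm least-squares solution).
P, *_ = np.linalg.lstsq(A, b, rcond=None)

# Basis H of homogeneous solutions (A @ h = 0), read off the SVD of A.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10 * s.max()))
H = Vt[rank:].T                  # shape (N, N - rank)

# Any choice of free coefficients F gives another weight vector that fits exactly.
F = rng.normal(size=H.shape[1])
r_alt = P + H @ F
```

Both `P` and `r_alt` reproduce the training data exactly, which is the dimensionality problem in miniature: the data alone cannot choose between them.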

More details on the whiteboard.
9
Clustering → Prototypes

Find groups (clusters) of signals based on similarity.
Conceptualize the data by representing each cluster with a proper prototype.
Selection of the distance metric is very important.
Clustering method: complete-linkage hierarchical clustering based on the Euclidean distance measure.
Prototype
Reduces noise in the gene expression levels
The mean value of all the signals in one cluster (RMS)
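
A minimal sketch of the clustering-to-prototype step, assuming SciPy is available. It uses complete linkage with Euclidean distance as named on the slide and takes the plain mean of each cluster as its prototype; the "(RMS)" remark is not modeled. The helper name `cluster_prototypes` and the toy signals are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_prototypes(signals, n_clusters):
    """Complete-linkage clustering (Euclidean); prototype = mean signal per cluster."""
    signals = np.asarray(signals, dtype=float)
    Z = linkage(signals, method="complete", metric="euclidean")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")  # labels start at 1
    ids = np.unique(labels)
    prototypes = np.vstack([signals[labels == c].mean(axis=0) for c in ids])
    return labels, prototypes

# Hypothetical data: two well-separated groups of three noisy signals each.
rng = np.random.default_rng(1)
base = np.array([[0.0, 1.0, 2.0, 3.0], [5.0, 4.0, 3.0, 2.0]])
signals = np.repeat(base, 3, axis=0) + 0.05 * rng.normal(size=(6, 4))
labels, prototypes = cluster_prototypes(signals, n_clusters=2)
```

Averaging the members of a cluster is also what gives the noise reduction mentioned above, since independent noise partly cancels in the mean.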

10
Prototypes

Transforming signals to prototypes
The inverse

11
Experiments
12
An Artificial Linear System

An artificial linear system with five genes.
R5 matrix in graphical representation.

13
Expansion of the System

Replication of the R5 matrix to the R25 matrix.
The (i, j)-th 5 × 5 sub-matrix in R25 is constructed by placing $r^{5}_{i,j}$ (element (i, j) of R5) on the diagonal, with all other positions in the sub-matrix occupied by zeros.
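
The block construction described above is what a Kronecker product with the identity produces, so it can be sketched with `numpy.kron`. The entries of R5 are not in the transcript, so a random matrix stands in for it here.

```python
import numpy as np

# Hypothetical 5 x 5 weight matrix; the values from the slide are not available.
rng = np.random.default_rng(2)
R5 = rng.uniform(-1.0, 1.0, size=(5, 5))

# Block (i, j) of the Kronecker product is R5[i, j] * I_5: that entry repeated
# on the diagonal of a 5 x 5 sub-matrix, zeros elsewhere.
R25 = np.kron(R5, np.eye(5))

i, j = 1, 3
block = R25[5 * i:5 * (i + 1), 5 * j:5 * (j + 1)]
```

With this layout, genes 5i, ..., 5i+4 are the five replicas of original gene i, so they form one cluster by construction.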

14
Time Response for the 25 × 25 System

Initial values of genes in the same cluster were set to values more similar to each other than to the values of genes in the other clusters. (20 time points)
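
A sketch of how such a time response could be generated, assuming the update rule x_j(t+1) = sum_i r_{i,j} x_i(t). The weight matrix, its rescaling to a spectral radius of 0.9 (so the response stays bounded), and the noise level on the initial values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
R5 = rng.uniform(-1.0, 1.0, size=(5, 5))
R5 *= 0.9 / np.max(np.abs(np.linalg.eigvals(R5)))  # spectral radius 0.9
R25 = np.kron(R5, np.eye(5))                       # genes 5i..5i+4 replicate gene i

# Genes in the same cluster start close together; clusters start further apart.
base = rng.uniform(-1.0, 1.0, size=5)
x0 = np.repeat(base, 5) + 0.01 * rng.normal(size=25)

T = 20
X = np.empty((T, 25))
X[0] = x0
for t in range(T - 1):
    X[t + 1] = X[t] @ R25      # row-vector form of x_j(t+1) = sum_i r_ij x_i(t)

# Cluster means (one per original gene) follow the 5-gene model exactly.
M = X.reshape(T, 5, 5).mean(axis=2)
```

The last line hints at why prototypes are a reasonable stand-in here: in this replicated system the cluster means obey the original 5 × 5 dynamics.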

15
Estimation of the Model (1/2)

Experimental steps (k = 1, ..., 25)
1. The set of prototypes, Yk, associated with the clusters in Ck was determined.
2. The weight matrix corresponding to each clustering was determined from Yk.
3. Given the complete model and the initial state, approximations to the original signals can be computed as follows:
One-step approximation
Free-run approximation

16
Estimation of the Model (2/2)

Experimental steps (continued)
4. The mean squared errors (MSE) were computed ($E_{os,k}$ and $E_{fr,k}$).
5. The weighted prototype MSE, $E_{wp,k}$, was computed.
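
The formulas for the two approximations did not survive in the transcript, so the sketch below uses the usual reading of the terms: the one-step approximation predicts each time point from the measured previous one, while the free-run approximation starts from the initial state and feeds its own predictions back in. The function names, the toy matrices, and the perturbed estimate `R_hat` are hypothetical; the weighted prototype MSE is not modeled because its definition is not given.

```python
import numpy as np

def one_step(X, R):
    """Predict each x(t+1) from the measured x(t); the first row is copied."""
    return np.vstack([X[:1], X[:-1] @ R])

def free_run(x0, R, T):
    """Iterate the model from the initial state only, feeding predictions back in."""
    out = [np.asarray(x0, dtype=float)]
    for _ in range(T - 1):
        out.append(out[-1] @ R)
    return np.vstack(out)

def mse(X, X_hat):
    """Mean squared error over all time points and genes."""
    return float(np.mean((X - X_hat) ** 2))

# Hypothetical 2-gene system and a slightly wrong estimate of its weights.
R = np.array([[0.9, 0.1], [-0.2, 0.8]])
X = free_run([1.0, 0.5], R, T=6)
R_hat = R + 0.05

X_os = one_step(X, R_hat)
X_fr = free_run(X[0], R_hat, T=6)
E_os, E_fr = mse(X, X_os), mse(X, X_fr)
```

The two approximations agree on the first predicted time point and can diverge afterwards, because free-run errors are propagated through the model while one-step errors are not.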

17
Error Curves

Error curves as a function of the number of clusters

18
The Resulting Model

The resulting model (analysis of multiple causes is not easy)

19
Real Experiments: Yeast Data Set

Gene expression profiles extracted from the 2467 genes of the budding yeast S. cerevisiae by Eisen.
Considered conditions
Mitotic cell division cycle, sporulation, and temperature and reducing shocks
Thresholding
ALPHA subset: 18 time points with 45 genes.
CDC15 subset: 15 time points with 113 genes.

20
The Effect of Normalization

Normalization
If the number of clusters is equal to or greater than the number of time points minus one, the prototype free-run error is zero (the limit of the over-constrained condition).
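
The zero-error limit can be checked numerically. In the sketch below, random prototype signals stand in for real data, and a least-squares fit via `numpy.linalg.lstsq` stands in for the estimation procedure of the talk; both are assumptions. With T time points there are T - 1 transitions to fit, so K >= T - 1 prototypes leave enough unknowns to fit every transition exactly.

```python
import numpy as np

def fit_and_score(Y):
    """Least-squares weights for Y[t+1] ~ Y[t] @ W, plus one-step and free-run MSE."""
    W, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)
    one_step = np.vstack([Y[:1], Y[:-1] @ W])
    free = [Y[0]]
    for _ in range(len(Y) - 1):
        free.append(free[-1] @ W)
    e_os = float(np.mean((Y - one_step) ** 2))
    e_fr = float(np.mean((Y - np.vstack(free)) ** 2))
    return e_os, e_fr

rng = np.random.default_rng(4)
T = 6  # hypothetical number of time points, so T - 1 = 5 transitions
results = {K: fit_and_score(rng.normal(size=(T, K))) for K in (3, 5, 7)}
```

`results[K]` holds the one-step and free-run MSE for K prototypes. For K = 3 the fit is over-constrained and leaves a residual, while for K = 5 and K = 7 every transition is fitted exactly, so the errors vanish up to floating-point rounding. A zero error in this regime therefore says nothing about how well the model generalizes.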

The one-step MSE for all datasets and the four kinds of normalization
21
Error Curves on the CDC15 Dataset

Fitting the linear model on the CDC15 subset.
The error curve

22
The Resulting Model on CDC15

The resulting model

23
Error Curves on the ALPHA Dataset

Fitting the linear model on the ALPHA subset.
The error curve

24
The Resulting Model on ALPHA

The resulting model

25
Resulting Model without Normalization

The resulting model without normalization (more reasonable?)

Mating
26
Summary

The dimensionality problem was tackled by clustering.
Biologically sound.
Linear models were used to represent the relationships between the resulting gene-prototypes.
Good balance between model complexity and accuracy.
No intrinsic semantics was found.
