Reverse engineering gene networks using singular value decomposition and robust regression - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Reverse engineering gene networks using singular value decomposition and robust regression


1
Reverse engineering gene networks using singular
value decomposition and robust regression
  • M. K. Stephen Yeung
  • Jesper Tegnér
  • James J. Collins

2
General idea
  • Reverse-engineer
  • Genome-wide scale
  • Small amount of data
  • No prior knowledge
  • Use SVD to obtain a family of possible solutions
  • Use robust regression to choose among them

3
  • If the system is near a steady state, its
    dynamics can be approximated by a linear system
    of N ODEs (see the sketch below)
  • dxi/dt = -λi xi + Σj Wij xj + bi + ξi
  • xi concentration of the ith mRNA
    (reflects the expression level of gene i)
  • λi self-degradation rate
  • bi external stimulus
  • ξi noise
  • Wij type and strength of the effect
    of the jth gene on the ith gene
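
A minimal numerical sketch of this model (numpy; the function and variable names are illustrative, not taken from the paper):

  import numpy as np

  def dxdt(x, W, lam, b, xi):
      """Right-hand side of the linearised model:
      dx_i/dt = -lam_i * x_i + sum_j W_ij * x_j + b_i + xi_i."""
      return -lam * x + W @ x + b + xi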

4
(No Transcript)
5
  • Assumptions made
  • Connections are not time-dependent
    (so W does not depend on time), and they are not
    changed by the experiments
  • System is near a steady state
  • Noise is discarded, so exact measurements
    are assumed
  • The time derivatives dxi/dt can be computed
    accurately enough

6
  • In M experiments with N genes,
  • each time apply stimuli (b1, …, bN) to the genes
  • measure the concentrations of the N mRNAs
    (x1, …, xN) using a microarray
  • You get measurements xi^j, where
  • subscript i mRNA number
  • superscript j experiment number

7
  • The goal is to use as few measurements as
    possible. With this method (and exact
    measurements)
  • M = O(log(N))
  • e.g. the 1st test (below) illustrates this
    scaling

8
  • In matrix form the system becomes
    dX/dt = A·X + B
  • with A = W + diag(-λi)
  • Compute dX/dt from several measurements of the
    data X (e.g. using interpolation) - see the
    sketch below
  • Goal: deduce W (or A) from the rest
  • If M ≥ N, we could simply invert X^T, but
    usually M << N
  • (our goal is M = O(log(N)))
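
A short sketch of this step, assuming the time course of each experiment is sampled into arrays (the cubic-spline derivative is just one plausible choice of interpolation):

  import numpy as np
  from scipy.interpolate import CubicSpline

  # t: sample times of one experiment (length T); x: N x T concentrations.
  def estimate_derivative(t, x, t_eval):
      """Estimate dx/dt at t_eval by fitting a cubic spline through the
      sampled concentrations and differentiating it."""
      return CubicSpline(t, x, axis=-1)(t_eval, 1)   # nu=1: first derivative

  # With X, dX/dt and B assembled as N x M matrices (one column per
  # experiment), the square case M == N could be solved directly, e.g.
  #   A = np.linalg.solve(X.T, (Xdot - B).T).T
  # but in practice M << N, so the system is under-determined and we
  # turn to SVD (next slides).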

9
  • Therefore, use SVD (to find the least-squares
    solution)
  • X^T = U · diag(w1, …, wN) · V^T
  • Here, U and V are orthogonal (U^T = U^-1)
  • and diag(w1, …, wN) contains the singular
    values wi of X
  • Suppose all zero singular values come first, so
    wi = 0 for i = 1…L and wi ≠ 0 for i = L+1…N
  • (see the numpy sketch below)
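
A minimal numpy sketch of this decomposition (names and the tolerance are illustrative; numpy returns singular values in descending order, so the zero ones end up last rather than first):

  import numpy as np

  def svd_of_data(X, rtol=1e-10):
      """SVD of X^T for an N x M data matrix X (one column per experiment).
      Returns U, the padded singular values, V, and the columns of V that
      belong to (numerically) zero singular values."""
      N, M = X.shape
      U, w, Vt = np.linalg.svd(X.T, full_matrices=True)   # X^T = U diag(w) V^T
      V = Vt.T
      w_full = np.concatenate([w, np.zeros(N - w.size)])  # only min(M, N) values are returned
      zero = w_full < rtol * w_full.max()                 # which singular values count as zero
      return U, w_full, V, V[:, zero]                     # V[:, zero] spans the null space of X^T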

10
  • Then the least-squares (L2) solution to the
    problem is
  • A0^T = V · diag(1/wj) · U^T · (dX/dt - B)^T
  • with 1/wj replaced by 0 if wj = 0
  • So this formula tries to match every data point
    as closely as possible to the solution
    (see the sketch below)
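
A self-contained sketch of this particular solution (shapes as before: X, dX/dt and B are N x M; the tolerance is an arbitrary choice):

  import numpy as np

  def l2_solution(X, Xdot, B):
      """Least-squares (pseudoinverse) solution A0 of dX/dt = A X + B."""
      U, w, Vt = np.linalg.svd(X.T, full_matrices=False)  # economy SVD of X^T
      w_inv = np.zeros_like(w)
      keep = w > 1e-10 * w.max()                          # skip the (numerically) zero wj
      w_inv[keep] = 1.0 / w[keep]
      # A0^T = V diag(1/wj) U^T (dX/dt - B)^T, with 1/wj -> 0 where wj = 0
      return (Vt.T @ np.diag(w_inv) @ U.T @ (Xdot - B).T).T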

11
  • But all possible solutions are
  • A = A0 + C · V^T
  • with C = (cij)N×N, where cij = 0 if j > L and
    otherwise an arbitrary scalar coefficient
  • How to choose from this family of solutions?
  • The least-squares method tries to match every
    data point as closely as possible
  • → a not-so-sparse matrix with a lot of small
    entries (see the sketch below)
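
A small sketch of how this family looks in code (Vz stands for the columns of V with zero singular value, as returned by the earlier snippet; C_free is an arbitrary example choice):

  import numpy as np

  def family_member(A0, Vz, C_free):
      """One member of the solution family A = A0 + C V^T.
      Only the columns of V with zero singular value (Vz, N x L) contribute,
      so C reduces to an N x L matrix of free coefficients."""
      return A0 + C_free @ Vz.T

  # e.g. C_free = np.zeros((N, L)) returns the L2 solution A0 itself,
  # while every other choice fits the measured data equally well.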

12
  1. Based on prior biological knowledge, impose this
     on the solutions. E.g. when we know two genes
     are related, the solution must reflect this in
     the matrix.
  2. Work from the assumption that naturally
     occurring gene networks are sparse, and look for
     the matrix that is most sparse; thus search for
     the cij that maximize the number of zero entries
     in A.

13
  • So
  • get as many zero entries as you can
  • therefore get a sparse matrix
  • the non-zero entries form the connections
  • fit as many measurements as you can exactly
    → robust regression
  • (so exact measurements are assumed)

14
  • Do this using L1 regression. Thus, when
    considering A = A0 + C · V^T,
  • we want to minimize the L1 norm of A.
  • The L1 regression idea is then to look for the
    solution C for which Σij |aij| is minimal.
  • This produces as many zeros as possible.
  • Implementation was done using the simplex method
    (a linear programming method) - see the sketch
    below
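
A sketch of this L1 step as a linear program (one row of A at a time; scipy's HiGHS LP solver is used here as a stand-in for the paper's simplex implementation, and all names are illustrative):

  import numpy as np
  from scipy.optimize import linprog

  def sparsest_row(a0_row, Vz):
      """Minimise the L1 norm of one row of A = A0 + C V^T over the free
      coefficients c.  Standard LP reformulation: minimise sum(t) subject
      to -t <= a0 + Vz c <= t."""
      N = a0_row.size                 # number of genes
      L = Vz.shape[1]                 # number of zero singular values (free directions)
      obj = np.concatenate([np.zeros(L), np.ones(N)])   # variables [c, t]
      A_ub = np.block([[ Vz, -np.eye(N)],               #  Vz c - t <= -a0
                       [-Vz, -np.eye(N)]])              # -Vz c - t <=  a0
      b_ub = np.concatenate([-a0_row, a0_row])
      bounds = [(None, None)] * L + [(0, None)] * N     # c free, t >= 0
      res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
      return a0_row + Vz @ res.x[:L]                    # sparsest row in the family

  # Applying this row by row gives a sparse estimate of the whole matrix:
  #   A_sparse = np.vstack([sparsest_row(A0[i], Vz) for i in range(A0.shape[0])])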

15
  • Thus, to reverse-engineer a network of N genes,
    we 'only' need Mc = O(log N) experiments.
  • Then Mc << N, and the computational cost will be
    O(N^4)
  • (Brute-force methods would have a cost of
    O(N!/(k!(N-k)!)), with k the number of non-zero
    entries)

16
Test 1
  • Create a random connectivity matrix: for each
    row, select k entries to be non-zero
  • - k < kmax << N (to impose sparseness)
  • - each non-zero entry drawn from a uniform
    distribution
  • Apply random perturbations
  • Take measurements while the system relaxes back
    to its previous steady state → X
  • Compute dX/dt by interpolation
  • Do this M times (see the sketch below)
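
A rough sketch of this test setup (all function names and parameter values are illustrative, not the paper's):

  import numpy as np

  rng = np.random.default_rng(0)

  def random_network(N, k):
      """Random sparse connectivity: k non-zero entries per row drawn from
      a uniform distribution, with a negative diagonal added so that the
      relaxation back to steady state is stable (an illustrative choice)."""
      A = np.zeros((N, N))
      for i in range(N):
          cols = rng.choice(N, size=k, replace=False)
          A[i, cols] = rng.uniform(-1.0, 1.0, size=k)
      A -= np.diag(np.abs(A).sum(axis=1) + rng.uniform(0.5, 1.5, size=N))
      return A

  def one_experiment(A):
      """Perturb the steady state x = 0 and record x together with its
      derivative A x + b (standing in for the interpolated dX/dt)."""
      N = A.shape[0]
      b = np.zeros(N)                   # no sustained stimulus during relaxation
      x = rng.normal(0.0, 0.1, size=N)  # random perturbation away from steady state
      return x, A @ x + b, b

  # Repeating this M times and stacking columns gives X, dX/dt and B
  # (each N x M), which feed the SVD + L1 steps sketched earlier.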

17
Test 1
  • Then apply the algorithm to obtain an
    approximation of A
  • Compute the error (using the computed A)

18
  • Results: Mc = O(log(N))
  • Better than SVD alone,
    without the regression step

19
Test 2
  • One-dimensional cascade of genes
  • Result for N = 400
  • Mc ≈ 70

20
Test 3
  • Large sparse gene network, with random
    connections and external stimuli
  • Results are the same as in the previous tests

21
Discussion
  • Advantages
  • Very little data needed, compared with neural
    networks or Bayesian models
  • No prior knowledge needed
  • Easy to parallelize, as it recovers the
    connectivity matrix row by row (gene by gene)
  • Also applicable to protein networks

22
Discussion
  • Disadvantages
  • Less efficient for small networks (M ≈ N)
  • No quantification yet of the necessary
    sparseness, though on average 10 connections per
    gene works well for a network of > 200 genes
  • Uncertain: the method is especially suited to
    exact data, which we don't have

23
Improvements
  • Other algorithms to impose sparseness:
    alternatives are possible both for L1 (the basic
    criterion) and for the simplex method (the
    implementation)
  • By using a deterministic linear system of ODEs, a
    lot has been neglected (noise, time delays,
    nonlinearities)
  • Connections could be changed by the experiments;
    then a time-dependent W becomes necessary