Reverse engineering gene networks using singular value decomposition and robust regression - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Reverse engineering gene networks using singular value decomposition and robust regression


1
Reverse engineering gene networks using singular
value decomposition and robust regression
  • M. K. Stephen Yeung
  • Jesper Tegnér
  • James J. Collins

2
General idea
  • Reverse-engineer
  • Genome-wide scale
  • Small amount of data
  • No prior knowledge
  • Use SVD to obtain a family of possible solutions
  • Use robust regression to choose among them

3
  • If the system is near a steady state, its
    dynamics can be approximated by a linear system
    of N ODEs (see the sketch below)
  • dxi/dt = -λi xi + Σj Wij xj + bi + ξi
  • xi concentration of the ith mRNA
    (reflects the expression level of gene i)
  • λi self-degradation rate
  • bi external stimulus
  • ξi noise
  • Wij type and strength of the effect
    of the jth gene on the ith gene
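
A minimal numerical sketch of this model (numpy; the function and variable names are illustrative, not taken from the paper):

  import numpy as np

  def dxdt(x, W, lam, b, xi):
      """Right-hand side of the linearised model:
      dx_i/dt = -lam_i * x_i + sum_j W_ij * x_j + b_i + xi_i."""
      return -lam * x + W @ x + b + xi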

4
(No Transcript)
5
  • Assumptions made
  • Connections are not time-dependent
    (so W does not depend on time), and they are not
    changed by the experiments
  • System is near a steady state
  • Noise is discarded, so exact measurements
    are assumed
  • The time derivatives dxi/dt can be computed
    accurately enough

6
  • In M experiments with N genes,
  • each time apply stimuli (b1, …, bN) to the genes
  • measure the concentrations of the N mRNAs
    (x1, …, xN) using a microarray
  • You get measurements xi^j, where
  • subscript i mRNA number
  • superscript j experiment number

7
  • The goal is to use as few measurements as
    possible. With this method (and exact
    measurements)
  • M = O(log(N))
  • e.g. the 1st test (below) illustrates this
    scaling

8
  • In matrix form the system becomes
    dX/dt = A·X + B
  • with A = W + diag(-λi)
  • Compute dX/dt from several measurements of the
    data X (e.g. using interpolation) - see the
    sketch below
  • Goal: deduce W (or A) from the rest
  • If M ≥ N, we could simply invert X^T, but
    usually M << N
  • (our goal is M = O(log(N)))
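
A short sketch of this step, assuming the time course of each experiment is sampled into arrays (the cubic-spline derivative is just one plausible choice of interpolation):

  import numpy as np
  from scipy.interpolate import CubicSpline

  # t: sample times of one experiment (length T); x: N x T concentrations.
  def estimate_derivative(t, x, t_eval):
      """Estimate dx/dt at t_eval by fitting a cubic spline through the
      sampled concentrations and differentiating it."""
      return CubicSpline(t, x, axis=-1)(t_eval, 1)   # nu=1: first derivative

  # With X, dX/dt and B assembled as N x M matrices (one column per
  # experiment), the square case M == N could be solved directly, e.g.
  #   A = np.linalg.solve(X.T, (Xdot - B).T).T
  # but in practice M << N, so the system is under-determined and we
  # turn to SVD (next slides).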

9
  • Therefore, use SVD (to find the least-squares
    solution)
  • X^T = U · diag(w1, …, wN) · V^T
  • Here, U and V are orthogonal (U^T = U^-1)
  • and diag(w1, …, wN) contains the singular
    values wi of X
  • Suppose all zero singular values come first, so
    wi = 0 for i = 1…L and wi ≠ 0 for i = L+1…N
  • (see the numpy sketch below)
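
A minimal numpy sketch of this decomposition (names and the tolerance are illustrative; numpy returns singular values in descending order, so the zero ones end up last rather than first):

  import numpy as np

  def svd_of_data(X, rtol=1e-10):
      """SVD of X^T for an N x M data matrix X (one column per experiment).
      Returns U, the padded singular values, V, and the columns of V that
      belong to (numerically) zero singular values."""
      N, M = X.shape
      U, w, Vt = np.linalg.svd(X.T, full_matrices=True)   # X^T = U diag(w) V^T
      V = Vt.T
      w_full = np.concatenate([w, np.zeros(N - w.size)])  # only min(M, N) values are returned
      zero = w_full < rtol * w_full.max()                 # which singular values count as zero
      return U, w_full, V, V[:, zero]                     # V[:, zero] spans the null space of X^T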

10
  • Then the least-squares (L2) solution to the
    problem is
  • A0^T = V · diag(1/wj) · U^T · (dX/dt - B)^T
  • with 1/wj replaced by 0 if wj = 0
  • So this formula tries to match every data point
    as closely as possible to the solution
    (see the sketch below)
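
A self-contained sketch of this particular solution (shapes as before: X, dX/dt and B are N x M; the tolerance is an arbitrary choice):

  import numpy as np

  def l2_solution(X, Xdot, B):
      """Least-squares (pseudoinverse) solution A0 of dX/dt = A X + B."""
      U, w, Vt = np.linalg.svd(X.T, full_matrices=False)  # economy SVD of X^T
      w_inv = np.zeros_like(w)
      keep = w > 1e-10 * w.max()                          # skip the (numerically) zero wj
      w_inv[keep] = 1.0 / w[keep]
      # A0^T = V diag(1/wj) U^T (dX/dt - B)^T, with 1/wj -> 0 where wj = 0
      return (Vt.T @ np.diag(w_inv) @ U.T @ (Xdot - B).T).T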

11
  • But all possible solutions are
  • A = A0 + C · V^T
  • with C = (cij)N×N, where cij = 0 if j > L and
    otherwise an arbitrary scalar coefficient
  • How to choose from this family of solutions?
  • The least-squares method tries to match every
    data point as closely as possible
  • → a not-so-sparse matrix with a lot of small
    entries (see the sketch below)
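
A small sketch of how this family looks in code (Vz stands for the columns of V with zero singular value, as returned by the earlier snippet; C_free is an arbitrary example choice):

  import numpy as np

  def family_member(A0, Vz, C_free):
      """One member of the solution family A = A0 + C V^T.
      Only the columns of V with zero singular value (Vz, N x L) contribute,
      so C reduces to an N x L matrix of free coefficients."""
      return A0 + C_free @ Vz.T

  # e.g. C_free = np.zeros((N, L)) returns the L2 solution A0 itself,
  # while every other choice fits the measured data equally well.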

12
  1. Based on prior biological knowledge, impose this
     on the solutions. E.g. when we know two genes
     are related, the solution must reflect this in
     the matrix.
  2. Work from the assumption that naturally
     occurring gene networks are sparse, and look for
     the matrix that is most sparse; thus search for
     the cij that maximize the number of zero entries
     in A.

13
  • So
  • get as many zero entries as you can
  • therefore get a sparse matrix
  • the non-zero entries form the connections
  • fit as many measurements as you can exactly
    → robust regression
  • (so exact measurements are assumed)

14
  • Do this using L1 regression. Thus, when
    considering A = A0 + C · V^T,
  • we want to minimize the L1 norm of A.
  • The L1 regression idea is then to look for the
    solution C for which Σij |aij| is minimal.
  • This produces as many zeros as possible.
  • Implementation was done using the simplex method
    (a linear programming method) - see the sketch
    below
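
A sketch of this L1 step as a linear program (one row of A at a time; scipy's HiGHS LP solver is used here as a stand-in for the paper's simplex implementation, and all names are illustrative):

  import numpy as np
  from scipy.optimize import linprog

  def sparsest_row(a0_row, Vz):
      """Minimise the L1 norm of one row of A = A0 + C V^T over the free
      coefficients c.  Standard LP reformulation: minimise sum(t) subject
      to -t <= a0 + Vz c <= t."""
      N = a0_row.size                 # number of genes
      L = Vz.shape[1]                 # number of zero singular values (free directions)
      obj = np.concatenate([np.zeros(L), np.ones(N)])   # variables [c, t]
      A_ub = np.block([[ Vz, -np.eye(N)],               #  Vz c - t <= -a0
                       [-Vz, -np.eye(N)]])              # -Vz c - t <=  a0
      b_ub = np.concatenate([-a0_row, a0_row])
      bounds = [(None, None)] * L + [(0, None)] * N     # c free, t >= 0
      res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
      return a0_row + Vz @ res.x[:L]                    # sparsest row in the family

  # Applying this row by row gives a sparse estimate of the whole matrix:
  #   A_sparse = np.vstack([sparsest_row(A0[i], Vz) for i in range(A0.shape[0])])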

15
  • Thus, to reverse-engineer a network of N genes,
    we 'only' need Mc = O(log N) experiments.
  • Then Mc << N, and the computational cost will be
    O(N^4)
  • (Brute-force methods would have a cost of
    O(N!/(k!(N-k)!)), with k the number of non-zero
    entries)

16
Test 1
  • Create a random connectivity matrix: for each
    row, select k entries to be non-zero
  • - k < kmax << N (to impose sparseness)
  • - each non-zero entry drawn from a uniform
    distribution
  • Apply random perturbations
  • Take measurements while the system relaxes back
    to its previous steady state → X
  • Compute dX/dt by interpolation
  • Do this M times (see the sketch below)
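
A rough sketch of this test setup (all function names and parameter values are illustrative, not the paper's):

  import numpy as np

  rng = np.random.default_rng(0)

  def random_network(N, k):
      """Random sparse connectivity: k non-zero entries per row drawn from
      a uniform distribution, with a negative diagonal added so that the
      relaxation back to steady state is stable (an illustrative choice)."""
      A = np.zeros((N, N))
      for i in range(N):
          cols = rng.choice(N, size=k, replace=False)
          A[i, cols] = rng.uniform(-1.0, 1.0, size=k)
      A -= np.diag(np.abs(A).sum(axis=1) + rng.uniform(0.5, 1.5, size=N))
      return A

  def one_experiment(A):
      """Perturb the steady state x = 0 and record x together with its
      derivative A x + b (standing in for the interpolated dX/dt)."""
      N = A.shape[0]
      b = np.zeros(N)                   # no sustained stimulus during relaxation
      x = rng.normal(0.0, 0.1, size=N)  # random perturbation away from steady state
      return x, A @ x + b, b

  # Repeating this M times and stacking columns gives X, dX/dt and B
  # (each N x M), which feed the SVD + L1 steps sketched earlier.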

17
Test 1
  • Then apply the algorithm to obtain an
    approximation of A
  • Compute the error (using the computed A)

18
  • Results: Mc = O(log(N))
  • Better than SVD alone,
    without the regression step

19
Test 2
  • One-dimensional cascade of genes
  • Result for N = 400
  • Mc ≈ 70

20
Test 3
  • Large sparse gene network, with random
    connections and external stimuli
  • Results are the same as in the previous tests

21
Discussion
  • Advantages
  • Very little data needed, compared with neural
    networks or Bayesian models
  • No prior knowledge needed
  • Easy to parallelize, as it recovers the
    connectivity matrix row by row (gene by gene)
  • Also applicable to protein networks

22
Discussion
  • Disadvantages
  • Less efficient for small networks (M ≈ N)
  • No quantification yet of the necessary
    sparseness, though on average 10 connections per
    gene works well for a network of > 200 genes
  • Uncertain: the method is especially suited to
    exact data, which we don't have

23
Improvements
  • Other algorithms to impose sparseness:
    alternatives are possible both for L1 (the basic
    criterion) and for the simplex method (the
    implementation)
  • By using a deterministic linear system of ODEs, a
    lot has been neglected (noise, time delays,
    nonlinearities)
  • Connections could be changed by the experiments;
    then a time-dependent W becomes necessary