Aracne - PowerPoint PPT Presentation

1
Aracne
  • Jorge Viveros
  • Summer 2006 Workshop
  • June 29th, 2006

2
Contents
  1. Overview (the problem, the alternatives, the central
    idea of ARACNE's algorithm)
  2. Demo (reconstruction of gene regulatory networks
    from Affymetrix gene expression data)
  3. Algorithm details (approximating the mutual
    information, comparative study results: ARACNE vs.
    Bayesian and Relevance Networks)
  4. Conclusions
  5. Bibliography

3
1. Overview: ARACNE
  • Algorithm for the Reconstruction of Accurate
    Cellular Networks
  • Reverse engineering or
    deconvolution problem

[Figure: expression samples for genes ga, gb, gc, gd, ge are turned into a gene regulatory network using information-theoretic (maximum entropy) methods.]
4
(overview, contd) Authors
  • A.A. Margolin (1,2), I. Nemenman (2), K. Basso
    (3), C. Wiggins (2,4), G. Stolovitzky (5), R.
    Dalla-Favera (3), A. Califano (1,2)
  • (1) Dept. of Biomedical Informatics, (2) Joint
    Centers for Systems Biology, (3) Institute for Cancer
    Genetics, (4) Dept. of Applied Physics and Applied
    Math., Columbia University
  • (5) IBM T.J. Watson Research Center

Main reference: http://www.arxiv.org/abs/q-bio/0410037;
BMC Bioinformatics 2006, 7(Suppl 1):S7
5
(overview, contd)
Goal
  • Understand normal mammalian cell physiology and
    complex pathologic phenotypes by elucidating gene
    transcriptional regulatory networks.
Thesis
  • Statistical associations between mRNA abundance
    levels help to uncover gene regulatory mechanisms.

6
(overview: alternatives) ARACNE vs.
Clustering
  • ARACNE recovers specific transcriptional
    interactions but does not attempt to
    recover all of them (too complex a problem).
  • Genome-wide clustering of gene expression
    profiles cannot distinguish direct
    (irreducible) from cascade transcriptional gene
    interactions.

[Figure: for genes ga, gb, gc, gd, ge, clustering yields the groups {ga, gb}, {gc, gd}, {ge}, whereas ARACNE yields a network on the nodes a, b, c, d, e.]
7
(central idea) Gene network
inference
  • edge = (direct) statistical dependency
    = direct regulatory interaction
  • nodes = genes
  • Temporal gene expression data for higher
    eukaryotes are difficult to obtain.
  • Only steady-state statistical dependencies are
    studied.

[Figure: nodes gi and gj.]
8
Accounting for dependence: definition and
measurement
  • Gene expression values: samples from a
    joint probability distribution.
  • Consider the multi-information: the average
    log-deviation of the joint probability
    distribution (JPD) from the product of its
    marginals (also a Kullback-Leibler divergence,
    KL-div).
  • Use maximum entropy methods to approximate the JPD by
    an element of its m-way marginal Fréchet class
    (the m-way maximum-entropy estimate, m-MEE).
  • Use the m-MEE to define the mth-order connected
    information (m-cinfo), which accounts for m-way
    statistical dependencies (only!).
  • Multi-info = sum of all m-cinfos.

9
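The multi-information
  • Multi-information (KL-div):

$$I[P] = \sum_{x} P(x)\,\log\frac{P(x)}{\prod_{i=1}^{N} P_i(x_i)} = \sum_{i=1}^{N} S[P_i] - S[P]$$

  • $P(x)$: the JPD; $P_i$: its one-variable marginals; $i = 1, \dots, N$ indexes the nodes (expressions or genes).
  • The sum is an integral in the continuous case.
  • $S[P]$: entropy of $P(x)$.
JPD not known, approximate it!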
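
As a small numerical illustration of the multi-information (sum of the marginal entropies minus the joint entropy), the sketch below evaluates it for a discrete joint distribution stored as a NumPy array. The function names and the two toy distributions are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats; zero-probability cells contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def multi_information(joint):
    """Sum of the one-variable marginal entropies minus the joint entropy."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()  # normalize, in case counts were passed
    marginal_entropies = 0.0
    for axis in range(joint.ndim):
        others = tuple(a for a in range(joint.ndim) if a != axis)
        marginal_entropies += entropy(joint.sum(axis=others))
    return marginal_entropies - entropy(joint)

# Toy case 1: three binary variables that are exact copies of each other.
copies = np.zeros((2, 2, 2))
copies[0, 0, 0] = copies[1, 1, 1] = 0.5

# Toy case 2: three independent fair coins.
independent = np.full((2, 2, 2), 1.0 / 8.0)

print(multi_information(copies), multi_information(independent))
```

The fully dependent case gives 2 log 2 nats and the independent case gives zero, as expected for a KL divergence from the product of marginals.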
10
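m-way max entropy estimate of JPD
  • The m-MEE, $P^{(m)}$, has the same m-way marginals as the JPD $P$.
  • The m-MEE has the following (exponential) form, where the
    $\lambda$ are Lagrange multipliers enforcing the marginal constraints:

$$P^{(m)}(x) = \frac{1}{Z}\exp\Big[-\sum_{i_1<\dots<i_m}\lambda_{i_1\dots i_m}(x_{i_1},\dots,x_{i_m})\Big]$$

  • The multipliers have no analytical solution in general, BUT they can be obtained
    via an iterative proportional fitting procedure (IPFP).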
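
A minimal sketch of iterative proportional fitting for the simplest nontrivial case, m = 2 with three discrete variables: start from the uniform distribution and repeatedly rescale it so that each pairwise marginal matches the target. The function name, the fixed number of sweeps, and the random test distribution are assumptions made for illustration.

```python
import numpy as np

def ipf_pairwise(target, n_sweeps=200):
    """Fit a 3-variable table to the pairwise marginals of `target` by IPF."""
    target = np.asarray(target, dtype=float)
    target = target / target.sum()
    # The uniform start makes the fixed point the maximum-entropy solution.
    est = np.full(target.shape, 1.0 / target.size)
    pairs = [(0, 1), (0, 2), (1, 2)]
    for _ in range(n_sweeps):
        for a, b in pairs:
            drop = ({0, 1, 2} - {a, b}).pop()  # axis summed out for this pair
            want = target.sum(axis=drop, keepdims=True)
            have = est.sum(axis=drop, keepdims=True)
            ratio = np.divide(want, have, out=np.zeros_like(want), where=have > 0)
            est = est * ratio  # broadcast the correction along the dropped axis
    return est

rng = np.random.default_rng(0)
target = rng.random((2, 3, 2)) + 0.1
target /= target.sum()
estimate = ipf_pairwise(target)
```

The result shares all pairwise marginals with `target` but in general differs from it, because any genuine three-way dependence in `target` is not constrained.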
11
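Connected and Multi informations
  • mth-order connected information (with $P^{(m)}$ the m-MEE and $S$ the entropy):

$$I_C^{(m)} = S[P^{(m-1)}] - S[P^{(m)}]$$

  • Multi-information:

$$I[P] = \sum_{m=2}^{N} I_C^{(m)}$$

Compensate for the lack of knowledge of the JPD by
using the (truncated!) multi-info to establish
and quantify statistical dependencies.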
12
Detecting a particular m-way interaction
  • An m-way interaction among $x_{i_1},\dots,x_{i_m}$
    contributes to the multi-info iff the minimum of the
    interaction multi-information (inter multi-info)
    over the interaction-specific Fréchet class is positive.
  • Inter multi-info: involves two m-MEEs sharing the same
    m-way marginals except for, perhaps, the marginal of
    the interaction in question.

Positivity of the minimal inter multi-info ⇒ the interaction
is irreducible (direct). Thus draw
edges coming from the m nodes and meeting at
an m-edge vertex.
13
Examples
Regulatory cascade (Markov chain)
Information processing inequality
generically dependent (similarly, )
generically independent
No triplet interactions (coregulation)
14
(examples, contd) Other
dependencies
  • 2 regulates 1 and 3, OR 1 and 3 regulate 2
    jointly.
  • The JPD does not factor, but the pairwise marginals
    do.
15
2. Demo
  • Platforms
  • caWorkBench2.0 (downloadable through the web site)
    (Java)
  • Most developed features: microarray
    data analysis, pathway analysis and reverse
    engineering, sequence analysis, transcription
    factor binding site analysis, pattern discovery.
  • http://amdec-bioinfo.cu-genome.org/html/caWorkBench.htm
  • Cygwin (for Windows). Windows and Linux versions
    available on the web site.

16
(Demo) Sample input data
file
  • Input_file_name.exp
  • N = 3 genes
  • M = 2 microarrays
  • Input file has N + 1 = 4 lines
  • the header line has M + 2 fields; each gene line has 2M + 2 fields

    AffyID  HG_U95Av2  SudHL6.CHP             ST486.CHP
    G1      G1         16.477367  0.69939363  20.150969  0.5297595
    G2      G2         7.6989274  0.55935365  26.04019   0.5445875
    G3      G3         8.8098955  0.5445875   21.554955  0.31372303

  • Header line: AffyID, annotation name, then the microarray chip names.
  • Gene lines: AffyID, annotation, then one (value, p-value) pair per chip.
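
The layout above can be read with a few lines of Python. This is only a sketch under stated assumptions: fields are split on whitespace (so IDs must contain no spaces), and `read_exp` and `SAMPLE` are hypothetical names, not part of ARACNE or caWorkBench.

```python
import io

def read_exp(handle):
    """Parse an expression file: header line, then one line per gene."""
    header = handle.readline().split()
    chips = header[2:]  # everything after the AffyID and annotation columns
    genes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        probe, annotation, rest = fields[0], fields[1], fields[2:]
        if len(rest) != 2 * len(chips):
            raise ValueError(f"{probe}: expected {2 * len(chips)} numeric fields")
        values = [float(v) for v in rest[0::2]]   # expression value per chip
        pvalues = [float(v) for v in rest[1::2]]  # p-value per chip
        genes[probe] = (annotation, values, pvalues)
    return chips, genes

SAMPLE = """AffyID HG_U95Av2 SudHL6.CHP ST486.CHP
G1 G1 16.477367 0.69939363 20.150969 0.5297595
G2 G2 7.6989274 0.55935365 26.04019 0.5445875
G3 G3 8.8098955 0.5445875 21.554955 0.31372303
"""

chips, genes = read_exp(io.StringIO(SAMPLE))
```

Keeping values and p-values in separate lists makes it easy to build the N x M expression matrix that the MI estimation works on.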
17
(Demo, contd)
Syntax (Cygwin)
  • ARACNE algorithm for gene regulatory network
    computation given microarray data.

Usage:
    aracne GeneExpressionFile -a -k -s -t -e -f
    aracne -adj GeneExpressionFile AdjacencyFile -t -e

    -a  accurate or fast (default: accurate)
    -k  Gaussian kernel width, accurate method only (default: 0.15)
    -s  averaging window step size, fast method only (default: 6)
    -t  mutual information threshold (default: 0)
    -e  DPI tolerance, between 0 and 1 (default: 1)
    -f  mean stdev (default: no filtering)
18
(Demo, contd) Sample output data file
  • <input_data_file_name><non-default_param_vals>.adj
  • number of lines = N genes
  • each line: AffyID:ID, followed by (associated gene ID, MI value) pairs

    G1:0   8 0.064729
    G2:1   2 0.0298643  7 0.0521425
    G3:2   1 0.0298643
    G4:3   8 0.0427217
    G5:4   5 0.403516
    G6:5   4 0.403516  6 0.582265
    G7:6   5 0.582265  9 0.38039
    G8:7   1 0.0521425  8 0.743262
    G9:8   0 0.064729  3 0.0427217  7 0.743262  9 0.333104
    G10:9  6 0.38039  8 0.333104

[Figure: the corresponding network drawn on nodes 1 to 10.]
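
A short sketch of turning the adjacency listing into a set of undirected, MI-weighted edges. It assumes the first field has the form AffyID:ID (the ":" separator is an assumption about how the transcript lost its punctuation) and that the remaining fields alternate between a neighbor ID and an MI value; `read_adj` is a hypothetical helper name.

```python
def read_adj(lines):
    """Return {frozenset({i, j}): mi} from 'AffyID:ID  id mi  id mi ...' lines."""
    edges = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        node = int(fields[0].rpartition(":")[2])  # numeric ID after the colon
        # Remaining fields come in (neighbor ID, MI value) pairs.
        for neighbor, mi in zip(fields[1::2], fields[2::2]):
            edges[frozenset((node, int(neighbor)))] = float(mi)
    return edges

SAMPLE_ADJ = """G1:0 8 0.064729
G2:1 2 0.0298643 7 0.0521425
G3:2 1 0.0298643
G4:3 8 0.0427217
G5:4 5 0.403516
G6:5 4 0.403516 6 0.582265
G7:6 5 0.582265 9 0.38039
G8:7 1 0.0521425 8 0.743262
G9:8 0 0.064729 3 0.0427217 7 0.743262 9 0.333104
G10:9 6 0.38039 8 0.333104
""".splitlines()

edges = read_adj(SAMPLE_ADJ)
```

Because each interaction is listed once from each endpoint, storing edges as `frozenset` keys collapses the 18 listed entries into 9 undirected edges.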
19
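3.
Algorithm details
  • Incorporate information-theoretic ideas (Markov
    networks) to model statistical dependencies (cf.
    ref. 2).
  • Joint probability distribution function of the
    stationary expressions of all genes ($i = 1, \dots, N$):

$$P(\{g_i\}) = \frac{1}{Z}\,e^{-H(\{g_i\})}, \qquad H(\{g_i\}) = \sum_{i}\phi_i(g_i) + \sum_{i,j}\phi_{ij}(g_i,g_j) + \sum_{i,j,k}\phi_{ijk}(g_i,g_j,g_k) + \dots$$

  • N: number of genes; Z: partition function (normalization
    factor); H: Hamiltonian.
  • $\phi_i$, $\phi_{ij}$, $\phi_{ijk}$, ...: interaction potentials
    (e.g., genes i, j, k do not interact in the
    model iff $\phi_{ijk} = 0$).
  • Aim: identify the nonzero potentials.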

20
(Algorithm details) ARACNE's
model
  • First-order approximation: genes are independent,
    $P(\{g_i\}) = \prod_i P(g_i)$.
  • 1st-order potentials are obtained from the marginal
    probabilities (estimated experimentally).
  • ARACNE's approximation: truncate the joint probability
    distribution function to pairwise potentials.
  • In this model, $\phi_{ij} = 0$ for
    non-interacting genes (this includes statistically
    independent genes, $P(g_i,g_j) \approx P(g_i)P(g_j)$,
    and genes that do not interact directly,
    i.e., $P(g_i,g_j) \neq P(g_i)P(g_j)$ but
    $\phi_{ij} = 0$).
  • Reduce the number of potential pairwise interactions
    via realistic biological assumptions.

21
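(algorithm details, contd) MI estimation
  • Assume two-way interactions: pairwise potentials
    determine all statistical dependencies.
  • Mutual information (MI), a measure of relatedness:
    $I(g_i, g_j) = S(g_i) + S(g_j) - S(g_i, g_j)$
  • $I(g_i, g_j) = 0$ iff $P(g_i, g_j) = P(g_i)P(g_j)$
  • MI approximation (Gaussian kernel estimator over the M samples):
    $I(\{x_i\},\{y_i\}) = \frac{1}{M}\sum_{i}\log\frac{f(x_i,y_i)}{f(x_i)\,f(y_i)}$,
    with $f(x,y) = \frac{1}{M}\sum_{i} h^{-2}\,G\big(h^{-1}\lvert (x,y)-(x_i,y_i)\rvert\big)$
  • G: bivariate standard Gaussian density
  • h: kernel width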

22
(algorithm details, contd)
  • Some details and technicalities
  • Transform x and y so that
    their marginal distributions appear uniform.
  • There is no universal way of choosing h;
    however, the ranking of the MIs
    depends only weakly on it.
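
The two steps described above (rank-transforming each variable so its marginal looks uniform, then estimating MI with a Gaussian kernel of width h) can be sketched as follows. This is an illustrative reimplementation, not ARACNE's code: the function names are hypothetical, the marginal densities use a one-dimensional Gaussian kernel of the same width, and the default h = 0.15 is simply borrowed from the -k default in the usage text.

```python
import numpy as np

def rank_transform(v):
    """Map samples to (0, 1) by rank so the marginal looks uniform."""
    v = np.asarray(v, dtype=float)
    ranks = np.argsort(np.argsort(v))  # rank of each sample, 0 .. M-1
    return (ranks + 0.5) / len(v)

def kernel_mi(x, y, h=0.15):
    """Gaussian-kernel MI estimate: mean of log f(x,y) / (f(x) f(y))."""
    x, y = rank_transform(x), rank_transform(y)
    dx = (x[:, None] - x[None, :]) / h
    dy = (y[:, None] - y[None, :]) / h
    kx = np.exp(-0.5 * dx**2) / (np.sqrt(2 * np.pi) * h)
    ky = np.exp(-0.5 * dy**2) / (np.sqrt(2 * np.pi) * h)
    fx = kx.mean(axis=1)          # marginal density estimate at each sample
    fy = ky.mean(axis=1)
    fxy = (kx * ky).mean(axis=1)  # isotropic bivariate Gaussian kernel
    return float(np.mean(np.log(fxy / (fx * fy))))

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y_dependent = x + 0.1 * rng.normal(size=300)
y_independent = rng.normal(size=300)
mi_dependent = kernel_mi(x, y_dependent)
mi_independent = kernel_mi(x, y_independent)
```

The absolute values depend on h, but the dependent pair scores well above the independent one, which is the ranking property the slide relies on.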

23
(algorithm details, contd) Establishing
the network
  • Define a threshold $I_0$ to discard low MIs
    (lower bound on interaction strength).
  • Shuffle genes across microarray profiles,
    evaluate MIs for the resulting seemingly
    independent genes, and choose $I_0$ based on what
    fraction of those MIs falls below the
    threshold.
  • Data processing inequality (DPI): if genes g1 and g2
    interact only through g3, then
    $I(g_1, g_2) \le \min[I(g_1, g_3), I(g_3, g_2)]$.
  • ARACNE starts with a network in which $I(g_i, g_j) > I_0$ for
    every edge $(i, j)$.
  • Look at gene triplets and remove the
    edge with the smallest MI.
24
(algorithm details, contd) Establishing the
network
ARACNE's algorithm complexity: $O(N^3 + N^2 M^2)$
N: number of genes, M: number of samples
DPI analysis: $O(N^3)$
MI estimation: $O(N^2 M^2)$ (order $N^2$ pairwise interactions)
25
Perfect network reconstruction
theorems
  • Thm 1: If MIs are estimated with no errors and the
    true underlying interaction network is a tree
    with only pairwise interactions, then ARACNE will
    reconstruct it.
  • Thm 2: If the Chow-Liu maximum mutual information tree is
    a subnetwork of ARACNE's network, then this is the
    true network.
  • Thm 3: ARACNE will reconstruct tree-network
    topologies exactly.
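
For reference, the Chow-Liu tree mentioned in Thm 2 is the maximum-weight spanning tree of the complete graph weighted by pairwise MI. A minimal sketch using Kruskal's algorithm with a union-find structure follows; the function name and the toy MI values are illustrative assumptions.

```python
def chow_liu_tree(mi):
    """Maximum-MI spanning tree (Kruskal) of a symmetric MI matrix."""
    n = len(mi)
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    candidates = sorted(
        ((mi[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,  # strongest MI first
    )
    tree = set()
    for _, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not close a loop
            parent[root_i] = root_j
            tree.add((i, j))
    return tree

# Toy chain 0 - 1 - 2 - 3: MI decays with distance along the chain.
toy_mi = [[0.0, 0.9, 0.5, 0.2],
          [0.9, 0.0, 0.8, 0.4],
          [0.5, 0.8, 0.0, 0.7],
          [0.2, 0.4, 0.7, 0.0]]
tree = chow_liu_tree(toy_mi)
```

On this chain-like example the tree consists of exactly the three nearest-neighbor edges.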

26
Comparative study results
  • Reconstruction of a class of synthetic
    transcriptional networks by Mendes et al.
    (cf. ref. 1) and of the human B lymphocyte genetic network
    from gene expression profile data.
  • Performance of ARACNE compared against Bayesian
    Networks (using the LibB
    package) and Relevance Networks (similar to
    ARACNE, but with a less accurate
    MI estimation procedure and a less developed method of
    assigning statistical significance).

27
(results) Synthetic
networks
  • 100 genes, 200 interactions organized in two
    types of networks:
  • 1. Erdős-Rényi: each vertex interaction is
    equally likely.
  • 2. Scale-free topology: the distribution of vertex
    connections obeys a power law.
28
(results) Performance metrics
  • A pairwise gene interaction is a
  • (true) positive if the genes' statistical regulatory
    interaction is a direct link;
  • (true) negative if their interaction is not
    direct.
  • Precision, TP / (TP + FP): fraction
    of true interactions among all inferred
    ones (the expected success rate in experimental validation
    of predicted interactions).
  • Recall, TP / (TP + FN):
    fraction of true interactions correctly inferred.
  • TP, FP, FN: numbers of true positives, false positives and
    false negatives.
  • Performance is assessed via Precision-Recall
    curves (PRCs).
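
A minimal sketch of the two metrics for undirected edge lists, using the standard definitions: precision is the share of inferred edges that are true, and recall is the share of true edges that were inferred. The function name and the toy edge lists are illustrative.

```python
def precision_recall(true_edges, inferred_edges):
    """Precision and recall of inferred undirected edges against the truth."""
    true_set = {frozenset(e) for e in true_edges}         # ignore orientation
    inferred_set = {frozenset(e) for e in inferred_edges}
    true_positives = len(true_set & inferred_set)
    precision = true_positives / len(inferred_set) if inferred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall

true_network = [(0, 1), (1, 2), (2, 3), (3, 4)]
inferred_network = [(1, 0), (1, 2), (0, 4)]
precision, recall = precision_recall(true_network, inferred_network)
```

Sweeping the MI threshold and recomputing both numbers at each value traces out a precision-recall curve.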

29
(results contd) PRCs for synthetic
data
[Figure: precision-recall curves in two panels, 1 and 2, one per network model.]
ARACNE's performance is above 40% for both models
30
(results contd) Quantitative results on
synthetic data
ARACNE recovers far more true connections and
predicts far fewer false ones
31
(results contd) Results on human
B cells
  • Assembled an expression profile data set of 340 B
    lymphocytes from normal, tumor-related and
    experimentally manipulated populations.
  • The data set was deconvoluted by ARACNE to generate a
    B-cell-specific regulatory network of 129,000
    interactions.
  • The network's quality was validated by
    comparing inferred interactions
    with those identified through biochemical
    methods.
  • See ref. 3.

32
Conclusions and Discussions
  1. The algorithm is robust enough to be applied to
    other network reconstruction problems in biology
    and in the social and engineering fields.
  2. Pairwise interaction model ⇒ higher-order
    potential interactions will not be accounted for
    (ARACNE's algorithm will open 3-gene loops).
  3. A two-gene interaction will be detected iff there
    are no alternate paths.
  4. To keep three-gene loops, relax the
    edge-removal criterion by introducing a tolerance
    parameter.
  5. ARACNE's performance deteriorates as the local (true)
    network topology deviates from a tree (tight
    loops may be a problem).
  6. ARACNE achieved high precision and substantial
    recall even for few data points when compared to
    BN and RN (synthetic data).
  7. ARACNE cannot predict the orientation of the
    edges of the networks.
  8. The algorithm is suited for more complex
    (mammalian) networks.

33
Bibliography
  1. P. Mendes, W. Sha, K. Ye. Artificial gene
    networks for objective comparison of analysis
    algorithms. Bioinformatics 2003, 19(Suppl 2):II122-II129.
  2. I. Nemenman. Information theory, multivariate
    dependence and genetic network inference.
    Technical report, arXiv:q-bio/0406015, 2004.
  3. K. Basso, A.A. Margolin, G. Stolovitzky, U.
    Klein, R. Dalla-Favera, A. Califano. Reverse
    engineering of regulatory networks in human B
    cells. Nature Genetics 2005, 37(4):382-390.

34
Main web site
  • Important documentation and relevant
    publications, application download and support.
  • AMDeC Bioinformatics Core Facility at the
    Columbia Genome Center
  • AMDeC (Academic Medicine Development Company)
  • http://amdec-bioinfo.cu-genome.org/html/ARACNE.htm