DISCOVERING LARGER NETWORK MOTIFS - PowerPoint PPT Presentation

About This Presentation

Description:

A ROADMAP OF CLUSTERING ALGORITHMS IN BIOINFORMATICS APPLICATIONS ... Examples of bioinformatics applications: finding the densest subspaces in ... – PowerPoint PPT presentation

Slides: 31
Provided by: csG7
Learn more at: http://www.cs.gsu.edu

Transcript and Presenter's Notes



1
DISCOVERING LARGER NETWORK MOTIFS
  • Li Chen
  • 4/16/2009
  • CSC 8910 Analysis of Biological Network, Spring 2009
  • Dr. Yi Pan

2
THE REVIEW ON MODELS AND ALGORITHMS FOR MOTIF
DISCOVERY IN PROTEIN-PROTEIN INTERACTION NETWORKS
3
  • Two distinct definitions of a motif, based on
    frequency and on statistical significance
  • Definition 1: a motif is a sub-graph that appears
    more than a threshold number of times.
  • Definition 2: a motif is a sub-graph that
    appears more often than expected by chance
    (over-represented motif).

4
  • Two characteristics are used to evaluate a motif
  • Frequency, which can be counted in three ways:
  • 1. Arbitrary overlaps of nodes and edges allowed
    (non-identical case)
  • 2. Only overlaps of nodes allowed (edge-disjoint case)
  • 3. No overlaps (edge- and vertex-disjoint case)
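To make the three frequency concepts concrete, here is a minimal Python sketch that uses the triangle as the pattern. It assumes an undirected graph given as an edge list, and the function names are hypothetical. The edge-disjoint and vertex-disjoint counts require the largest set of pairwise non-conflicting occurrences. That is a maximum independent set problem in general, so the brute-force search here only suits toy graphs.

```python
from itertools import combinations


def triangles(edges):
    """Return every triangle occurrence as a frozenset of three nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = {frozenset((u, v, w)) for u, v in edges for w in adj[u] & adj[v]}
    return sorted(found, key=sorted)


def triangle_edges(t):
    a, b, c = sorted(t)
    return {(a, b), (a, c), (b, c)}


def max_non_conflicting(occurrences, conflict):
    """Size of the largest subset with no conflicting pair (brute force, toy sizes only)."""
    for r in range(len(occurrences), 0, -1):
        for subset in combinations(occurrences, r):
            if not any(conflict(x, y) for x, y in combinations(subset, 2)):
                return r
    return 0


def triangle_frequencies(edges):
    occ = triangles(edges)
    f1 = len(occ)  # overlaps of nodes and edges allowed
    f2 = max_non_conflicting(occ, lambda x, y: bool(triangle_edges(x) & triangle_edges(y)))  # edge-disjoint
    f3 = max_non_conflicting(occ, lambda x, y: bool(x & y))  # vertex- and edge-disjoint
    return f1, f2, f3


# Two triangles sharing node 2 (a "bowtie").
print(triangle_frequencies([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]))  # (2, 2, 1)
```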

5
  • Statistical significance compares the frequency
    of a sub-graph in the observed network with its
    frequencies in random networks, measured by:
  • 1. Z-score
  • 2. Abundance
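The following minimal Python sketch shows the two significance measures in the form commonly used in the motif literature. It assumes the motif's count in the observed network and its counts in an ensemble of random networks are already known. The smoothing constant eps (4 here) and the choice of the population standard deviation are assumptions of this sketch, not values taken from the slides.

```python
import math
from statistics import mean, pstdev


def z_score(real_count, random_counts):
    """Z = (N_real - mean(N_rand)) / sd(N_rand)."""
    mu, sigma = mean(random_counts), pstdev(random_counts)
    if sigma == 0:
        # Degenerate ensemble: every random network has the same count.
        return 0.0 if real_count == mu else math.copysign(math.inf, real_count - mu)
    return (real_count - mu) / sigma


def abundance(real_count, random_counts, eps=4.0):
    """Delta = (N_real - mean(N_rand)) / (N_real + mean(N_rand) + eps), in (-1, 1)."""
    mu = mean(random_counts)
    return (real_count - mu) / (real_count + mu + eps)


# Hypothetical counts: 20 occurrences in the real network, 8-12 in four random networks.
print(z_score(20, [8, 10, 12, 10]), abundance(20, [8, 10, 12, 10]))
```

The Z-score depends on the spread of the random counts. The abundance is bounded, and eps damps sub-graphs that occur only a few times.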

6
  • Models of random graphs
  • Preserve the degree distribution of the
    biological network
  • Preserve the degree sequence (used in the search
    for n-node motifs)
  • Based on geometric random networks and a Poisson
    degree distribution
  • Incorporate node clustering into the model
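Repeated edge switching is a common way to generate random networks that preserve the degree sequence. The sketch below is a generic Python version of that technique, not the exact procedure of any tool reviewed here. It assumes a simple undirected graph given as an edge list. The function name and the retry limit are illustrative choices.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, seed=None):
    """Randomize a simple undirected graph while keeping every node's degree.

    Repeatedly picks two edges (a, b) and (c, d) and rewires them to (a, d) and (c, b),
    rejecting swaps that would create a self-loop or a duplicate edge.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:  # cap attempts on graphs with few valid swaps
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # either orientation of the second edge may be used
        if len({a, b, c, d}) < 4:
            continue  # the edges share a node; swapping would create a loop or change nothing
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a multi-edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Usage: randomize an 8-node ring with two chords; every node keeps its degree.
ring = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
print(degree_preserving_shuffle(ring, n_swaps=20, seed=1))
```

Each accepted swap moves one endpoint of each edge, so every node loses one neighbour and gains another. The degree sequence is therefore preserved exactly.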

7
  • 3. Compact Topological Motifs: introduces a
    compact graph representation obtained by grouping
    together maximal sets of nodes that are
    indistinguishable.
  • In the graph shown on the original slide (figure
    not included in this transcript), the sets U1 and
    U2 are compact nodes and U1U2 is a compact edge.
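The grouping step can be approximated in Python under a simplifying assumption: two nodes count as indistinguishable when they have exactly the same neighbour set. This is one possible reading, simpler than the full compact notation, and the function name is hypothetical. Under this rule, nodes in the same group are never adjacent to each other. Any two groups that touch are completely connected, so each link between groups behaves like a compact edge.

```python
from collections import defaultdict


def compact_graph(adj):
    """Group nodes with identical neighbour sets and link the groups.

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    Returns (groups, compact_edges): groups is a sorted list of frozensets and
    compact_edges is a set of index pairs (i, j) with i < j.
    """
    by_neighbourhood = defaultdict(set)
    for node, nbrs in adj.items():
        by_neighbourhood[frozenset(nbrs)].add(node)
    groups = sorted((frozenset(g) for g in by_neighbourhood.values()), key=sorted)
    index = {node: i for i, g in enumerate(groups) for node in g}
    compact_edges = set()
    for node, nbrs in adj.items():
        for other in nbrs:
            i, j = index[node], index[other]
            compact_edges.add((min(i, j), max(i, j)))
    return groups, compact_edges


# K_{2,3}: nodes a, b form one compact node and x, y, z another, joined by one compact edge.
adj = {"a": {"x", "y", "z"}, "b": {"x", "y", "z"},
       "x": {"a", "b"}, "y": {"a", "b"}, "z": {"a", "b"}}
groups, compact_edges = compact_graph(adj)
print(groups, compact_edges)
```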

8
  • Motif discovery algorithms
  • Exact algorithms for motifs with a small number of
    nodes
  • 1. Exhaustive Recursive Search (ERS): the input
    network is represented by an adjacency matrix M
    (motif size < 4).
  • 2. ESU: starts with individual nodes and adds one
    node at a time until the required size k is
    reached (motif size < 14).
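The ESU enumeration idea can be sketched in a few lines of Python. Each sub-graph is grown from a root node v, using only nodes with larger labels than v. The extension set of the newly added node is restricted to its "exclusive neighbours": nodes not already in the current sub-graph and not adjacent to it. This rule ensures every connected induced sub-graph of size k is produced exactly once. The sketch assumes an adjacency dict with sortable node labels. It leaves out the classification of sub-graphs into isomorphism classes, which actual motif counting also needs.

```python
def esu(adj, k):
    """Yield every connected induced sub-graph with k nodes exactly once, as a frozenset."""
    def extend(sub, ext, v, sub_nbhd):
        if len(sub) == k:
            yield frozenset(sub)
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # exclusive neighbours of w: label above the root v, not in the sub-graph,
            # and not adjacent to any node already in it
            excl = {u for u in adj[w] if u > v and u not in sub and u not in sub_nbhd}
            yield from extend(sub | {w}, ext | excl, v, sub_nbhd | adj[w])

    for v in adj:
        # the extension set starts with v's neighbours that have larger labels
        yield from extend({v}, {u for u in adj[v] if u > v}, v, set(adj[v]))


adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(sorted(s) for s in esu(adj, 3)))
# [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3], [1, 3, 4], [2, 3, 4]]
```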

9
  • Approximate algorithms
  • 1. Search Algorithm Based on Sampling (MFINDER):
    it picks random edges of the input graph until a
    set of k nodes is obtained, giving a sampled
    sub-graph, and assigns weights to the samples to
    correct for the non-uniform sampling. It scales
    well with large networks, but does not scale well
    with large motifs.
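Here is a small Python sketch of the edge-sampling idea, under stated assumptions:

- Each step adds a uniformly chosen edge that leaves the current node set.
- The exact sampling probability P of a node set is computed by enumerating growth orders, which is feasible only for small k.
- Each sample is weighted by 1/P (a Horvitz-Thompson style estimate), which makes the weighted count unbiased.

The function names are hypothetical. MFINDER itself may differ in details, for example by reporting concentrations rather than counts.

```python
import random
from functools import lru_cache


def boundary_edges(nodes, adj):
    """Edges with exactly one endpoint in `nodes`, as (inside, outside) pairs."""
    return [(a, b) for a in nodes for b in adj[a] if b not in nodes]


def sample_subgraph(edges, adj, k, rng):
    """Start from a random edge, then add random boundary edges until k nodes are collected."""
    nodes = set(rng.choice(edges))
    while len(nodes) < k:
        boundary = boundary_edges(nodes, adj)
        if not boundary:
            return None  # the component is smaller than k nodes
        nodes.add(rng.choice(boundary)[1])
    return frozenset(nodes)


def sampling_probability(target, edges, adj):
    """Exact probability that sample_subgraph returns the node set `target` (small k only)."""
    @lru_cache(maxsize=None)
    def grow(nodes):
        if nodes == target:
            return 1.0
        boundary = boundary_edges(nodes, adj)
        if not boundary:
            return 0.0
        # only boundary edges leading into `target` can still end at `target`
        return sum(grow(nodes | {b}) for _, b in boundary if b in target) / len(boundary)
    return sum(grow(frozenset(e)) for e in edges if set(e) <= target) / len(edges)


def estimate_count(edges, adj, k, in_class, n_samples, seed=0):
    """Estimate how many k-node sub-graphs satisfy in_class, weighting each sample by 1/P."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s = sample_subgraph(edges, adj, k, rng)
        if s is not None and in_class(s):
            total += 1.0 / sampling_probability(s, edges, adj)
    return total / n_samples


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}


def is_triangle(s):
    return sum(1 for u, v in edges if u in s and v in s) == 3


print(estimate_count(edges, adj, 3, is_triangle, n_samples=4000))  # the graph has exactly 2 triangles
```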

10
  • 2. Rand-ESU: unlike MFINDER, it does not need to
    compute weights for the samples. ESU builds a tree
    whose leaves correspond to sub-graphs of size k,
    while internal nodes correspond to sub-graphs of
    size 1 up to k-1, depending on the tree level.
    Rand-ESU assigns to each level of the tree a
    probability that its nodes are further explored,
    so that all leaves are visited with equal
    probability.
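Here is a minimal Python sketch of the RAND-ESU idea. It repeats the ESU tree walk so that the block is self-contained, and explores each tree node at depth d only with probability p_d. Every leaf is then reached with probability p_1 * ... * p_k. Dividing the number of sampled leaves by that product therefore gives an unbiased count estimate without per-sample weights. The names and the choice of probabilities are assumptions for illustration.

```python
import math
import random


def rand_esu(adj, k, probs, rng):
    """RAND-ESU sketch: walk the ESU tree, but explore a tree node at depth d only
    with probability probs[d - 1]; each leaf is reached with probability prod(probs)."""
    def extend(sub, ext, v, sub_nbhd):
        if len(sub) == k:
            yield frozenset(sub)
            return
        ext = set(ext)
        while ext:
            w = ext.pop()  # w leaves the extension set whether or not it is explored
            if rng.random() < probs[len(sub)]:  # the child has len(sub) + 1 nodes
                excl = {u for u in adj[w] if u > v and u not in sub and u not in sub_nbhd}
                yield from extend(sub | {w}, ext | excl, v, sub_nbhd | adj[w])

    for v in adj:
        if rng.random() < probs[0]:  # depth-1 tree node: the single root vertex v
            yield from extend({v}, {u for u in adj[v] if u > v}, v, set(adj[v]))


def estimate_total(adj, k, probs, seed=0):
    """Unbiased estimate of the number of connected k-node sub-graphs."""
    sampled = sum(1 for _ in rand_esu(adj, k, probs, random.Random(seed)))
    return sampled / math.prod(probs)


adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
print(estimate_total(adj, 3, [1.0, 1.0, 1.0]))  # 6.0: with every probability 1 this is plain ESU
print(estimate_total(adj, 3, [1.0, 1.0, 0.5]))  # a random estimate of the same total
```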

11
  • 3. NeMoFINDER: combines approaches from the data
    mining and computational biology communities. It
    searches for repeated trees and extends them to
    sub-graphs. This reduces the computation time for
    discovering larger motifs, but at the cost of
    missing some potentially interesting sub-graphs.

12
  • 4. Sub-graph Counting by Scalar Computation: it
    characterizes a biological network by a set of
    measures based on scalars and functionals of the
    network's adjacency matrix. Its advantages are
    mathematical elegance and computational
    efficiency.
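As a small illustration of counting sub-graphs through scalar functions of the adjacency matrix A, here is a pure-Python sketch. It uses three standard identities for simple undirected graphs:

- edges = sum of degrees / 2
- two-edge paths = sum of C(d, 2) over node degrees d
- triangles = trace(A^3) / 6

These identities illustrate the idea and are not necessarily the specific measures of the method named above.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def scalar_counts(adj_matrix):
    """Count edges, two-edge paths and triangles of a simple undirected graph
    from its 0/1 adjacency matrix."""
    a3 = matmul(matmul(adj_matrix, adj_matrix), adj_matrix)
    degrees = [sum(row) for row in adj_matrix]
    return {
        "edges": sum(degrees) // 2,                               # each edge appears twice in A
        "wedges": sum(d * (d - 1) // 2 for d in degrees),         # two-edge paths centred on each node
        "triangles": sum(a3[i][i] for i in range(len(a3))) // 6,  # trace(A^3) counts each triangle 6 times
    }


k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(scalar_counts(k4))  # {'edges': 6, 'wedges': 12, 'triangles': 4}
```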

13
  • 5. Apriori-based Motif Detection: the basic idea
    is that if a sub-graph is frequent, so are all of
    its sub-graphs. It builds candidate motifs of size
    k by joining frequent motifs of size k-1 and then
    evaluates their frequency.

14
A ROADMAP OF CLUSTERING ALGORITHMS IN
BIOINFORMATICS APPLICATIONS
15
  • Desirable features for evaluating clustering
    algorithms
  • Scalability
  • Robustness
  • Order insensitivity
  • Minimum user-specified input
  • Mixed data types
  • Arbitrary-shaped clusters
  • Point proportion admissibility: duplicating the
    data and re-clustering should not alter the
    results.

16
  • Six categories of clustering algorithms
  • Partitioning Clustering Algorithm
  • Hierarchical Clustering Algorithm
  • Grid-based Clustering Algorithm
  • Density-based Clustering Algorithm
  • Model-based Clustering Algorithm
  • Graph-based Clustering Algorithm

17
  • Partitioning Clustering Algorithm
  • Numerical methods
  • 1. K-means and the Farthest First Traversal
    k-center (FFT) algorithm
  • 2. K-medoids, or PAM (Partitioning Around
    Medoids)
  • 3. CLARA (Clustering Large Applications)
  • 4. CLARANS (Clustering Large Applications based
    upon Randomized Search) and Fuzzy K-means
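The following minimal Python sketch combines the two numerical methods named first. Farthest-first traversal, a greedy heuristic for the k-center problem, seeds Lloyd's k-means iterations. Seeding k-means with FFT is a choice made for this illustration. Points are assumed to be equal-length numeric tuples, and the first FFT center is simply the first point.

```python
import math


def farthest_first(points, k):
    """Farthest-first traversal: greedy 2-approximation for the k-center problem."""
    centers = [points[0]]  # the first center is arbitrary; here, the first point
    while len(centers) < k:
        # next center: the point farthest from its nearest already chosen center
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iterations=100):
    """Lloyd's k-means seeded with farthest-first centers; returns (centers, clusters)."""
    centers = [tuple(map(float, c)) for c in farthest_first(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: math.dist(p, centers[i]))].append(p)
        new_centers = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return centers, clusters


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, 2)
print(centers)
```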

18
  • Discrete Methods
  • 1. K-modes
  • 2. Fuzzy K-modes
  • 3. Squeezer and COOLCAT.
  • Mixed Discrete and Numerical Clustering
    Methods
  • 1. K-prototypes

19
  • Hierarchical Clustering Algorithm
  • Divides the data into a tree of nodes, where each
    node represents a cluster.
  • Two categorizations, based on method or purpose:
  • 1. Agglomerative vs. Divisive
  • 2. Single vs. Complete vs. Average linkage
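The agglomerative strategy and the three linkage criteria can be sketched in a few lines of Python. This naive version runs in roughly cubic time. It starts from singleton clusters and repeatedly merges the closest pair. Cluster distance is the minimum (single), maximum (complete), or mean (average) of the pairwise point distances. The sketch assumes numeric tuples and stops at a requested number of clusters instead of building the full tree.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters, linkage="single"):
    """Naive agglomerative clustering; linkage is 'single', 'complete' or 'average'."""
    combine = {
        "single": min,                            # closest pair of points
        "complete": max,                          # farthest pair of points
        "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
    }[linkage]

    def cluster_distance(x, y):
        return combine([math.dist(a, b) for a in x for b in y])

    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # i < j, so index i is still valid after pop
    return clusters


pts = [(0.0,), (2.0,), (4.0,), (6.5,)]
print(agglomerative(pts, 2, "single"))    # [[(0.0,), (2.0,), (4.0,)], [(6.5,)]]
print(agglomerative(pts, 2, "complete"))  # [[(0.0,), (2.0,)], [(4.0,), (6.5,)]]
```

The two linkages disagree on the same data. Single linkage chains through the evenly spaced points, while complete linkage penalizes the wide first cluster.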

20
  • Popular because data can naturally have various
    levels of subsets
  • Drawbacks
  • 1. Slow
  • 2. Errors are not tolerable
  • 3. Information loss when moving between levels
  • Two kinds of methods
  • 1. Numerical methods: BIRCH, CURE, spectral
    clustering
  • 2. Discrete methods: ROCK, Chameleon, LIMBO

21
  • Grid-based Clustering Algorithm
  • Forms a grid structure of cells from the input
    data; each data point is then assigned to a cell
    of the grid.
  • STING combines a numerical grid-based clustering
    method with a hierarchical method
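Here is a generic grid-based clustering sketch in Python, assuming 2-D points. Points are binned into square cells, and cells holding at least min_points points are kept as dense. Touching dense cells are then joined into clusters. This illustrates the general grid idea only, not STING's hierarchical cell statistics. The parameters cell_size and min_points are illustrative.

```python
import math
from collections import Counter


def grid_clusters(points, cell_size, min_points):
    """Return a list of clusters (lists of points); points in sparse cells are left out as noise."""
    cells = [(math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points]
    counts = Counter(cells)
    dense = {c for c, n in counts.items() if n >= min_points}
    label, next_id = {}, 0  # dense cell -> cluster id
    for start in sorted(dense):
        if start in label:
            continue
        label[start] = next_id
        stack = [start]
        while stack:  # flood-fill over the 8 neighbouring cells
            cx, cy = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in label:
                        label[nb] = next_id
                        stack.append(nb)
        next_id += 1
    clusters = [[] for _ in range(next_id)]
    for p, c in zip(points, cells):
        if c in label:
            clusters[label[c]].append(p)
    return clusters


pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.2, 0.5), (1.3, 0.4), (1.1, 0.2),
       (5.1, 5.1), (5.2, 5.5), (5.3, 5.8), (3.5, 3.5)]
print(grid_clusters(pts, cell_size=1.0, min_points=2))
```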

22
  • Density-based Clustering Algorithm
  • Uses a local density criterion
  • Clusters are dense subspaces separated by
    low-density spaces
  • Example bioinformatics application: finding the
    densest subspaces in interactome (protein-protein
    interaction) networks

23
  • DBSCAN, OPTICS, DENCLUE, WaveCluster, CLIQUE use
    numerical values for clustering
  • SEQOPTICS is used for sequence clustering
  • HIERDENC (Hierarchical Density-based Clustering),
    MULIC (Multiple Layer Incremental Clustering),
    projected (subspace) clustering, CACTUS, STIRR,
    CLICK and CLOPE use discrete values for clustering
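DBSCAN, the first numerical method listed, can be sketched in Python as follows, assuming numeric tuples and Euclidean distance:

- A point with at least min_pts points (itself included) within distance eps is a core point.
- Clusters grow from core points through their eps-neighbourhoods.
- Points reachable from no core point are labelled noise (-1).

The neighbourhood search is quadratic, so this is only a sketch for small inputs.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)
    ]  # eps-neighbourhood of each point, including the point itself
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not core, so not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours[j])
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```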

24
  • Model-based Clustering Algorithm
  • Uses a model, often derived from a statistical
    distribution
  • Bioinformatics applications
  • 1. gene expression
  • 2. interactomes
  • 3. sequences
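As one concrete instance of a statistical model, the Python sketch below fits a one-dimensional Gaussian mixture with the expectation-maximization (EM) algorithm. It then assigns each point to its most likely component. The Gaussian model, the initial parameters supplied by the caller, and the variance floor are assumptions for illustration. This is not the procedure of any specific tool listed in this deck. The sketch also assumes every point has non-negligible density under at least one component; otherwise the E-step would divide by zero.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em_1d(xs, means, variances, weights, iterations=100):
    """Fit a 1-D Gaussian mixture by EM; return the parameters and a hard label
    (the most responsible component) for each point."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixture weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsing component
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels


xs = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
means, variances, weights, labels = gmm_em_1d(xs, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```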

25
  • Numerical model-based methods
  • 1. Self-Organizing Maps
  • Discrete model-based clustering algorithms
  • 1. COBWEB
  • Numerical and discrete model-based clustering
    methods
  • 1. BILCOM (Bi-Level Clustering of Mixed Discrete
    and Numerical Biomedical Data), using an empirical
    Bayesian approach

26
  • Examples
  • 1. Gene expression clustering
  • 2. Protein sequence clustering
  • 3. AutoClass
  • 4. SVM Clustering methods
  • Graph-based Clustering Algorithm
  • Applied to interactomes for complex prediction,
    and to sequence networks

27
  • Examples
  • 1. MCODE (Molecular Complex Detection)
  • 2. SPC (Super Paramagnetic Clustering)
  • 3. RNSC (Restricted Neighborhood Search
    Clustering)
  • 4. MCL (Markov Clustering)
  • 5. TribeMCL
  • 6. CD-HIT
  • 7. ProClust
  • 8. BAG algorithms
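MCL alternates two operations on a column-stochastic matrix derived from the graph:

- Expansion (here, squaring the matrix) spreads flow along longer walks.
- Inflation (raising entries to a power and re-normalizing columns) strengthens strong flows and weakens weak ones.

The Python sketch below assumes a symmetric 0/1 adjacency matrix and adds self-loops, a common practice. It reads clusters off the converged matrix from its non-zero rows. The parameter values are illustrative, not recommendations from the source.

```python
def normalise_columns(m):
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj_matrix, inflation=2.0, iterations=100, tol=1e-9):
    """Markov Clustering sketch; returns a list of clusters as frozensets of node indices."""
    n = len(adj_matrix)
    # add self-loops, then make every column sum to 1
    m = normalise_columns([[adj_matrix[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(iterations):
        expanded = matmul(m, m)  # expansion: flow along two-step random walks
        inflated = normalise_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # rows that keep non-zero entries are "attractors"; each one's columns form a cluster
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two separate triangles: MCL keeps them apart.
two_triangles = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
print(mcl(two_triangles))  # [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
```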

28
  • Usage in Bioinformatics Applications
  • Gene expression clustering
  • 1. K-means algorithm
  • 2. Hierarchical algorithm
  • 3. SOMs
  • Interactomes
  • 1. AutoClass
  • 2. SVM clustering
  • 3. COBWEB
  • 4. MULIC
  • Sequence clustering
  • 1. Hierarchical clustering algorithm

29
REFERENCES
  • [1] Bill Andreopoulos, Aijun An, Xiaogang Wang,
    and Michael Schroeder. A roadmap of clustering
    algorithms: finding a match for a biomedical
    application. Brief Bioinform, bbn058,
    February 2009.
  • [2] Alberto Apostolico, Matteo Comin, and Laxmi
    Parida. Bridging Lossy and Lossless Compression
    by Motif Pattern Discovery. Electronic Notes in
    Discrete Mathematics, 21:219-225, 2005. General
    Theory of Information Transfer and Combinatorics.
  • [3] Giovanni Ciriello and Concettina Guerra. A
    review on models and algorithms for motif
    discovery in protein-protein interaction
    networks. Brief Funct Genomic Proteomic,
    7(2):147-156, 2008.
  • [4] Jun Huan, Wei Wang, and Jan Prins. Efficient
    Mining of Frequent Subgraphs in the Presence of
    Isomorphism. IEEE International Conference on
    Data Mining, page 549, 2003.
  • [5] Michihiro Kuramochi and George Karypis.
    Finding Frequent Patterns in a Large Sparse
    Graph. Data Mining and Knowledge Discovery,
    11(3):243-271, November 2005.
  • [6] Laxmi Parida. Discovering Topological Motifs
    Using a Compact Notation. Journal of
    Computational Biology, 14(3):300-323, 2007.

30
  • Thank you so much!