Graph mining in bioinformatics - PowerPoint PPT Presentation

1
Graph mining in bioinformatics
  • Laur Tooming

2
Graphs in biology
  • Graphs are often used in bioinformatics for
    describing processes in the cell
  • Vertices are genes or proteins
  • The meaning of an edge depends on the type of the
    graph:
  • Protein-protein interaction
  • Gene regulation

3
What we're looking for
  • We want to find sets of genes that have a
    biological meaning.
  • Idea: find graph-theoretically relevant sets of
    vertices and find out if they are also
    biologically meaningful.
  • Simple example: connected components
  • A more advanced idea: graph clustering. Find
    subgraphs that have a high edge density.
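
The two ideas on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the gene names g1..g5 and the toy edge list are made up for the example, and the graph is assumed to be undirected and unweighted.

```python
from collections import deque

def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        components.append(comp)
    return components

def edge_density(subset, edges):
    """Fraction of the possible vertex pairs inside `subset` that are joined by an edge."""
    n = len(subset)
    if n < 2:
        return 0.0
    inside = {frozenset((u, v)) for u, v in edges
              if u in subset and v in subset and u != v}
    return len(inside) / (n * (n - 1) / 2)

# Hypothetical toy graph: a triangle of genes plus a separate pair.
vertices = ["g1", "g2", "g3", "g4", "g5"]
edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g4", "g5")]
components = connected_components(vertices, edges)
```

Here the triangle g1, g2, g3 forms one component with edge density 1, which is the kind of dense vertex set a clustering method would report as a candidate group.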

6
Markov Cluster Algorithm (MCL)
  • If there is cluster structure in a graph, random
    walks tend to remain in a cluster for a long time
  • Graph modelled as a column-stochastic matrix: the
    entries in each column sum to 1
  • a_ij: probability that a random walk leaving j
    goes to i on the next step
  • Bigger edge weight means greater probability of
    choosing that edge
  • Stijn van Dongen, Graph Clustering by Flow
    Simulation. PhD thesis, University of Utrecht,
    May 2000.
  • http://micans.org/mcl/
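
As a rough illustration of the stochastic matrix described on this slide, the sketch below scales each column of a weight matrix so that it sums to 1. The function name and the 3-vertex weight matrix are assumptions made up for the example; entry [i][j] is taken to be the weight of the edge from j to i.

```python
def column_stochastic(weights):
    """Scale each column of a square weight matrix so that it sums to 1."""
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    return [
        [weights[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Hypothetical weighted graph on vertices 0, 1, 2:
# edge 0-1 has weight 3, edge 0-2 has weight 1.
weights = [
    [0, 3, 1],
    [3, 0, 0],
    [1, 0, 0],
]
A = column_stochastic(weights)
# Column 0 of A holds the probabilities of leaving vertex 0:
# 0.75 towards vertex 1 (the heavier edge) and 0.25 towards vertex 2.
```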

7
Markov Cluster Algorithm (MCL)
  • Two procedures, inflation and expansion, are
    applied alternately
  • Expansion: matrix squaring
  • considers longer random walks
  • Inflation: raising each entry to some power, then
    rescaling each column to remain stochastic
  • weakens weak edges and strengthens strong ones
  • Converges to a steady state
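
The alternation of expansion and inflation can be sketched in plain Python as follows. This is a minimal sketch, not the reference implementation from micans.org: the function names, the inflation exponent of 2, the stopping tolerance, the added self-loops and the way clusters are read off the final matrix are all assumptions chosen for the example.

```python
def normalize_columns(m):
    """Rescale every column in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m

def expand(m):
    """Expansion: square the matrix (two steps of the random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, r):
    """Inflation: raise each entry to the power r, then rescale the columns."""
    return normalize_columns([[x ** r for x in row] for row in m])

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    n = len(adjacency)
    # Assumption: add a self-loop to every vertex before normalising.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    # Each row that still carries flow is an attractor; the columns with
    # flow into that row are read as the members of its cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)

# Hypothetical test graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in edge_list:
    adjacency[u][v] = adjacency[v][u] = 1
clusters = mcl(adjacency)
```

On this toy graph the flow across the single bridging edge dies out and the two triangles end up as separate clusters. A larger inflation exponent would generally give finer clusters.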

8
Markov Cluster Algorithm (MCL)
  • Images from http://micans.org/mcl/ani/mcl-animation.html

9
Betweenness centrality clustering
  • An edge between different clusters is on many
    shortest paths from one cluster to another.
  • An edge inside a cluster is on fewer shortest
    paths, because there are more alternative paths
    inside a cluster.
  • Betweenness centrality of an edge: the number of
    shortest paths in the graph containing that edge.
  • Remove edges with the highest centrality from the
    graph to obtain a clustering.
  • Optimisations:
  • instead of all shortest paths, pick a sample of
    vertices and calculate shortest paths from them
  • remove several edges at once
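
The edge-removal procedure can be sketched as below, under a few assumptions. The graph is undirected and unweighted with no self-loops. When a pair of vertices has several shortest paths, each path counts as a fraction (the usual Brandes-style convention), which is slightly different from the plain path count in the definition on the slide. The loop stops at a requested number of clusters and removes one edge at a time, with none of the optimisations listed. All function names are made up for the example.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph given as {vertex: set}."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s: count shortest paths and record predecessors.
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds, order, queue = {v: [] for v in adj}, [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest vertices, pushing path counts onto edges.
        delta = {v: 0.0 for v in adj}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[frozenset((u, v))] += share
                delta[u] += share
    # Every unordered vertex pair was counted once from each endpoint.
    return {e: c / 2.0 for e, c in score.items()}

def components(adj):
    """Connected components of the graph as a list of sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def betweenness_clustering(edges, n_clusters):
    """Remove the highest-betweenness edge until n_clusters components remain."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while len(components(adj)) < n_clusters:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical test graph: two triangles joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
clusters = betweenness_clustering(edges, 2)
```

In the toy graph the bridging edge c-d lies on all nine shortest paths between the two triangles, so it is removed first and the triangles become the two clusters.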

10
GraphWeb
  • Web interface for analysing biological graphs
  • Simple syntax for entering graphs
  • multiple datasets
  • directed edges
  • edge weights
  • Visualising graphs with GraphViz
  • Finding biological meaning with gProfiler
  • ds1 A > B 10
  • ds2 A > B 4
  • ds1 B C 5
  • ds2 C > D 12
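
To make the example lines concrete, here is a small hypothetical parser. It is not GraphWeb's own code: it assumes that each line consists of whitespace-separated fields in the order dataset, source vertex, target vertex, weight, and that an optional `>` between the two vertex names marks a directed edge. The `Edge` type and `parse_edge` function are names invented for the sketch.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    dataset: str
    source: str
    target: str
    weight: float
    directed: bool

def parse_edge(line):
    """Parse one line of the assumed form: dataset source [>] target weight."""
    tokens = line.split()
    directed = ">" in tokens
    if directed:
        tokens.remove(">")
    if len(tokens) != 4:
        raise ValueError(f"cannot parse edge line: {line!r}")
    dataset, source, target, weight = tokens
    return Edge(dataset, source, target, float(weight), directed)

lines = ["ds1 A > B 10", "ds2 A > B 4", "ds1 B C 5", "ds2 C > D 12"]
parsed = [parse_edge(line) for line in lines]
# parsed[0] is a directed edge A -> B of weight 10 in dataset ds1;
# parsed[2] is an undirected edge between B and C in the same dataset.
```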

11
Combining several datasets
  • Whether or not there is an edge between two
    vertices is determined in biological experiments,
    which may sometimes give false results.
  • For a given graph different sources may give
    different information. Some sources may be more
    trustworthy than others.
  • We would like to combine different sources and
    assess the trustworthiness of each edge in the
    resulting graph.
  • Edge weight in summary graph: sum over datasets
  • $w(e, G) = \sum_i w(e, G_i)\, w(G_i)$
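
The summary weight, read as the sum over datasets of the edge's weight in dataset G_i times the weight w(G_i) given to that dataset, can be computed as in the sketch below. The dataset contents and the trust values 0.5 and 0.25 are hypothetical numbers chosen for the example, and an edge missing from a dataset is assumed to contribute 0.

```python
from collections import defaultdict

def combine_datasets(datasets, dataset_weight):
    """Summary weight of each edge: sum over datasets of w(e, G_i) * w(G_i)."""
    summary = defaultdict(float)
    for name, edge_weights in datasets.items():
        for edge, w in edge_weights.items():
            summary[edge] += w * dataset_weight[name]
    return dict(summary)

# Hypothetical inputs: two datasets and a trust weight for each of them.
datasets = {
    "ds1": {("A", "B"): 10.0, ("B", "C"): 5.0},
    "ds2": {("A", "B"): 4.0, ("C", "D"): 12.0},
}
dataset_weight = {"ds1": 0.5, "ds2": 0.25}
summary = combine_datasets(datasets, dataset_weight)
# Edge (A, B) appears in both datasets: 10 * 0.5 + 4 * 0.25 = 6.0
```

An edge reported by several sources accumulates weight from each of them, while an edge seen only in a less trusted dataset stays light.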

12
Combining several datasets
15
The end