A scalable multilevel algorithm for community structure detection

1
A scalable multilevel algorithm for community
structure detection
  • Melih Onus, Arizona State University
  • Hristo Djidjev, Los Alamos National Laboratory

Models and Algorithms for the Web Graph (WAW
2006), November 29 - December 2, 2006
2
Community Structure Detection Problem
  • The problem of identifying communities in a
    network is usually modeled as a graph clustering
    problem
  • Vertices correspond to individual items
  • Edges describe relationships
  • The communities correspond to subgraphs
  • Dense connections between vertices from the same
    subgraph
  • Fewer connections between vertices in different
    subgraphs

3
Motivation: Why detect communities?
  • Analyze and understand the information contained
    in the huge amount of data available on the WWW
  • Finding related commercial items
  • Recommendation systems
  • Important for
    • Social networks
    • Ad-hoc networks
    • Protein interaction networks
    • Genetic networks

4
Motivation: Why detect communities?
  • Predict how much someone is going to love a
    movie based on their movie preferences

Grand Prize: $1,000,000
5
Outline of the talk
  • Previous work
  • Graph partitioning problem
  • Our approach
  • Modularity
  • Reduction
  • Multilevel graph partitioning
  • Experimental results
  • Conclusions

6
Previous Work
  • Two main classes
    • Agglomerative methods (addition of edges)
    • Divisive methods (removal of edges)
  • Algorithms based on
    • Laplacian matrix
    • Centrality measures
    • Flow models
    • Random walks
    • Resistor networks
    • Optimization
  • Existing methods are either not fast enough or
    inaccurate

7
Graph Partitioning Problem
  • Given a graph G = (V, E), find a partition of V
    such that
    • The partition is balanced (i.e., all subsets
      have roughly the same number of vertices)
    • The cut size is minimized (i.e., the number of
      edges with endpoints in different subsets is
      minimized)
  • Previous work
    • Kernighan-Lin algorithm
    • Spectral partitioning
    • Multilevel algorithms
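
The two quantities in the problem statement can be checked directly on a small graph. The sketch below is only an illustration: the helper names `cut_size` and `subset_sizes` and the six-vertex example graph are assumptions, not part of the talk.

```python
from collections import Counter

def cut_size(edges, part):
    """Number of edges whose endpoints lie in different subsets."""
    return sum(1 for u, v in edges if part[u] != part[v])

def subset_sizes(part):
    """Number of vertices assigned to each subset (used to judge balance)."""
    return Counter(part.values())

# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

print(cut_size(edges, part))      # 1
print(dict(subset_sizes(part)))   # {0: 3, 1: 3}
```

Here the partition is perfectly balanced and cuts only the bridging edge, which is the kind of solution a graph partitioner looks for.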

8
Kernighan-Lin Algorithm
  • Find an initial random partition
  • Improve by a greedy procedure that swaps pairs of
    vertices from different partitions
  • Minimize the size of the cut set
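
The swap step can be illustrated with the classical gain formula g(u, v) = D(u) + D(v) - 2c(u, v), where D(x) is the number of external minus internal edges of x and c(u, v) is 1 if u and v are adjacent. The sketch below performs a single greedy search for the best swap. It is not the full Kernighan-Lin pass (which locks swapped vertices and keeps the best prefix of swaps), and the function names and example graph are assumptions.

```python
def build_adj(edges):
    """Adjacency sets for an undirected, unweighted graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def swap_gain(adj, part, u, v):
    """Reduction in cut size if u and v (in different parts) trade places."""
    def d(x):
        external = sum(1 for y in adj[x] if part[y] != part[x])
        internal = len(adj[x]) - external
        return external - internal
    c_uv = 1 if v in adj[u] else 0
    return d(u) + d(v) - 2 * c_uv

def best_swap(adj, part):
    """Best pair (gain, u, v) with u in part 0 and v in part 1, if gain > 0."""
    best = (0, None, None)
    for u in adj:
        for v in adj:
            if part[u] == 0 and part[v] == 1:
                g = swap_gain(adj, part, u, v)
                if g > best[0]:
                    best = (g, u, v)
    return best

# Hypothetical example: two triangles joined by edge (2, 3), with vertices
# 2 and 3 initially placed on the wrong sides (cut size 5).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = build_adj(edges)
part = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}

print(best_swap(adj, part))   # (4, 3, 2): swapping 3 and 2 lowers the cut from 5 to 1
```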

[Figure: a pair of vertices u and v]
9
Graph Partitioning vs Graph Clustering
  • Graph clustering
    • Find clusters
    • Community sizes may differ
    • Number of subsets varies
  • Graph partitioning
    • Minimize cut size
    • Equal number of vertices in each subset
    • Number of subsets is an input
  • Algorithms for graph partitioning cannot be
    used directly to produce a good-quality clustering

10
Our approach
  • Convert the original graph G into a complete
    edge-weighted graph G'
  • Find a min-cut of G' using a modified graph
    partitioning method
  • This will produce a good-quality (high
    modularity) clustering for G

11
Modularity
  • A useful measure of clustering quality
  • Introduced by Newman [6]
  • Modularity of a partitioning =
    (number of edges within communities) -
    (expected number of such edges)
  • We are trying to find a division of graph with
    high modularity
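
As a concrete reading of this definition, the sketch below computes modularity in the widely used normalized form Q = sum over communities c of [e_c / m - (d_c / 2m)^2], where e_c is the number of edges inside c, d_c is the sum of degrees in c, and m is the total number of edges. The slide does not show the formula, so this normalization and the degree-based expectation are assumptions, as are the function name and example graph.

```python
def modularity(edges, community):
    """Modularity of a clustering of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    internal = {}     # e_c: edges with both endpoints in community c
    for u, v in edges:
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1

    degree_sum = {}   # d_c: sum of vertex degrees in community c
    for v, d in degree.items():
        c = community[v]
        degree_sum[c] = degree_sum.get(c, 0) + d

    return sum(internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
everything = {v: 0 for v in range(6)}

print(modularity(edges, two_triangles))   # 5/14, about 0.357
print(modularity(edges, everything))      # 0.0
```

Putting every vertex into one community gives zero modularity, while the natural two-triangle split scores higher, which is the behaviour the quality measure is meant to capture.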

12
Reduction
Min-Cut Problem: the problem of finding a
minimum cut in a complete edge-weighted graph G'
Graph Clustering Problem: the problem of finding
a clustering of maximum modularity in G
13
Reduction
  • Maximize modularity of a partitioning
  • Modularity = (number of edges within communities) -
    (expected number of such edges)

Graph Clustering Problem: maximize modularity, i.e.,
minimize (-modularity) = (cut size) -
(expected cut size)
Min-Cut Problem: minimize cut size
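
The step from "edges within communities" to "cut size" can be spelled out as follows. This is a sketch under stated assumptions: modularity is normalized by the number of edges m, the random model has m expected edges in total, a_ij is 1 if {i, j} is an edge of G and 0 otherwise, and p_ij is the model's edge probability. The edge weight a_ij - p_ij for G' is what this derivation suggests; the slides do not state it explicitly.

```latex
\begin{align*}
Q &= \frac{1}{m}\sum_{c}\bigl(e_c - \mathbb{E}[e_c]\bigr),
  \qquad e_c = \text{number of edges inside community } c \\
\sum_{c} e_c &= m - \mathrm{cut},
  \qquad \sum_{c}\mathbb{E}[e_c] = m - \mathbb{E}[\mathrm{cut}] \\
-\,m\,Q &= \mathrm{cut} - \mathbb{E}[\mathrm{cut}]
  = \sum_{\substack{\{i,j\}:\ i,\,j \text{ in}\\ \text{different communities}}}
    \bigl(a_{ij} - p_{ij}\bigr)
\end{align*}
```

Under these assumptions, maximizing modularity in G is the same as minimizing the total weight of the cut in a complete graph G' whose edge {i, j} carries weight a_ij - p_ij.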
14
Random Graph Models
  • $p_{ij}$: the probability that there is an edge
    between vertices i and j in a random graph from a
    given distribution
  • Erdős-Rényi model
  • Chung-Lu model
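
The formulas for the two models did not survive in this transcript. The standard definitions, which agree with the expected cut sizes used on the later modularity slides, are given below as an assumption: p is a single global edge probability, d_i is the degree of vertex i, and m is the number of edges.

```latex
\begin{align*}
\text{Erd\H{o}s-R\'enyi:} \quad & p_{ij} = p \\
\text{Chung-Lu:} \quad & p_{ij} = \frac{d_i\, d_j}{2m}
\end{align*}
```

Summing these probabilities over all pairs separated by a two-way cut gives n1 n2 p in the first model and w1 w2 / 2m in the second, where n_i is the number of vertices and w_i the sum of degrees in part i.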
15
Multilevel graph partitioning
  • A fast and accurate method for producing
    high-quality partitions
  • Consists of three phases
    • Coarsening phase
    • Partitioning phase
    • Uncoarsening and refinement phase

16
Coarsening Phase
  • Find a maximal matching and collapse each
    matched edge into a single vertex
  • Recursive coarsening
  • $\langle G = G_1, G_2, \ldots, G_k \rangle$
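
One coarsening level can be sketched as below. The matching here is a simple greedy maximal matching in dictionary order, chosen for brevity. Multilevel packages typically use more refined rules such as heavy-edge matching, and the talk does not say which rule is used. The graph representation (dict of dicts with edge weights) and the function names are assumptions.

```python
def maximal_matching(adj):
    """Greedy maximal matching: pair each unmatched vertex with its first
    unmatched neighbour."""
    matched = {}
    for u in adj:
        if u in matched:
            continue
        for v in adj[u]:
            if v != u and v not in matched:
                matched[u] = v
                matched[v] = u
                break
    return matched

def coarsen(adj):
    """Collapse every matched edge into one coarse vertex.

    Returns (coarse_adj, coarse_of), where coarse_of maps each fine vertex
    to the coarse vertex that contains it.
    """
    matched = maximal_matching(adj)
    coarse_of = {}
    next_id = 0
    for u in adj:
        if u in coarse_of:
            continue
        coarse_of[u] = next_id
        if u in matched:
            coarse_of[matched[u]] = next_id
        next_id += 1

    coarse_adj = {c: {} for c in range(next_id)}
    for u in adj:
        for v, w in adj[u].items():
            cu, cv = coarse_of[u], coarse_of[v]
            if cu != cv:   # edges inside a collapsed pair disappear
                coarse_adj[cu][cv] = coarse_adj[cu].get(cv, 0) + w
    return coarse_adj, coarse_of

# Hypothetical example: the path 0 - 1 - 2 - 3 with unit edge weights.
path = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
coarse_adj, coarse_of = coarsen(path)

print(coarse_of)    # {0: 0, 1: 0, 2: 1, 3: 1}
print(coarse_adj)   # {0: {1: 1}, 1: {0: 1}}
```

Applying `coarsen` repeatedly to its own output yields the sequence of progressively smaller graphs.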

17
Partitioning Phase
  • Greedy graph growing partitioning
  • Partition Gk

18
Uncoarsening and Refinement Phase
  • Project the partitioning Pi of Gi to Pi-1 of Gi-1
  • More degrees of freedom at Gi-1 than at Gi
  • Improve Pi-1 using the KL algorithm
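
Projection itself is a lookup: every vertex of the finer graph inherits the part of the coarse vertex it was collapsed into. The sketch below assumes the coarsening step recorded a map `coarse_of` from fine to coarse vertices. Both names and the example values are hypothetical.

```python
def project(coarse_part, coarse_of):
    """Assign each fine vertex the part of its coarse vertex."""
    return {v: coarse_part[c] for v, c in coarse_of.items()}

# Hypothetical example: five fine vertices collapsed into three coarse ones.
coarse_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
coarse_part = {0: 0, 1: 0, 2: 1}

fine_part = project(coarse_part, coarse_of)
print(fine_part)   # {0: 0, 1: 0, 2: 0, 3: 0, 4: 1}
```

After projection, vertices that were merged can be moved independently, which is where the refinement step finds improvements that were invisible on the coarser graph.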

19
Implementation
  • Our implementation is based on the graph
    partitioning package METIS [3], which employs a
    multilevel strategy
  • Convert the graph partitioning algorithm into a
    clustering one
    • The optimal clustering might not be balanced:
      we ignore the restrictions that control the
      sizes of the parts
    • The number of parts in the optimal clustering
      is not known: we employ a recursive bisection
      procedure
    • The original graph G might be sparse, while the
      transformed one G' is complete: our algorithm
      does not explicitly generate G'

20
Modularity: Erdős-Rényi Model
  • $(-\text{Modularity}) = \text{cut size} - n_1 n_2 p$
  • $(-\text{Modularity}) = \text{cut size} - (n_1 + 1)(n_2 - 1)p$

[Figure: Erdős-Rényi model, two parts labeled n1 and n2]
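
A plausible reading of the two expressions is "objective before" and "objective after moving one vertex from part 2 to part 1", with the cut size updated accordingly. The sketch below follows that reading. The function names and the numbers are assumptions, and the objective is the unnormalized quantity from the slide.

```python
def objective_er(cut, n1, n2, p):
    """Slide objective under the Erdős-Rényi model: cut size - n1 * n2 * p."""
    return cut - n1 * n2 * p

def objective_er_after_move(cut, n1, n2, p, to_part1, to_part2):
    """Objective after moving a vertex v from part 2 to part 1.

    to_part1: number of edges from v into part 1 (they leave the cut)
    to_part2: number of edges from v into part 2 (they enter the cut)
    """
    new_cut = cut - to_part1 + to_part2
    return new_cut - (n1 + 1) * (n2 - 1) * p

# Hypothetical numbers: parts of 3 and 5 vertices, p = 0.25, cut size 4;
# v has 3 neighbours in part 1 and 1 neighbour in part 2.
before = objective_er(4, 3, 5, 0.25)
after = objective_er_after_move(4, 3, 5, 0.25, to_part1=3, to_part2=1)

print(before)   # 0.25
print(after)    # -2.0
```

Only the part sizes and the moved vertex's neighbours are needed for the update, so a refinement step can evaluate a candidate move without looking at the rest of the graph.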
21
Modularity: Chung-Lu Model
  • $(-\text{Modularity}) = \text{cut size} - w_1 w_2 / 2m$
  • $(-\text{Modularity}) = \text{cut size} - (w_1 + w(v))(w_2 - w(v)) / 2m$
  • $w_i$: sum of degrees in partition i

[Figure: two parts labeled w1 and w2]
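
Read as a before/after pair for moving a vertex v from part 2 to part 1, these expressions need only the current cut size in the sparse graph G and the two degree sums, which suggests how the complete graph G' can be avoided. The sketch below follows that reading and assumes w(v) is the degree of v. The function names and the example are hypothetical.

```python
def objective_cl(cut, w1, w2, m):
    """Slide objective under the Chung-Lu model: cut size - w1 * w2 / (2m)."""
    return cut - w1 * w2 / (2 * m)

def objective_cl_after_move(cut, w1, w2, m, deg_v, to_part1):
    """Objective after moving v (degree deg_v) from part 2 to part 1.

    to_part1 is the number of edges from v into part 1; the remaining
    deg_v - to_part1 edges go to part 2 and become cut edges.
    """
    new_cut = cut - to_part1 + (deg_v - to_part1)
    return new_cut - (w1 + deg_v) * (w2 - deg_v) / (2 * m)

# Hypothetical example: two triangles {0, 1, 2} and {3, 4, 5} joined by
# edge (2, 3), so m = 7. Part 1 = {0, 1}, part 2 = {2, 3, 4, 5}:
# cut = 2, w1 = 2 + 2 = 4, w2 = 3 + 3 + 2 + 2 = 10.
before = objective_cl(2, 4, 10, 7)
# Move vertex 2 (degree 3, two neighbours in part 1) into part 1.
after = objective_cl_after_move(2, 4, 10, 7, deg_v=3, to_part1=2)

print(before)   # -6/7, about -0.857
print(after)    # -2.5
```

With the usual 1/m normalization, -2.5 corresponds to a modularity of 2.5 / 7 = 5/14 for the two-triangle split, so the move that lowers this objective is the one that raises modularity.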
22
Analysis
  • Time complexity: O(nm)
  • Experiments
    • Random graphs
    • k-community graphs
    • nd.edu

23
Experiment I: Random Graphs
  • We generated random graphs with 128 vertices and
    4 communities of size 32 each
  • The expected degree of any vertex is 16
  • Out degree varies
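
A generator for graphs of this kind might look as follows. The slides do not give the edge probabilities, so the sketch assumes the usual planted-partition construction: "out degree" is taken to mean the expected number of edges from a vertex to other communities (`z_out`), and the probabilities are chosen so that the expected total degree is 16. The function name, parameters, and seed are hypothetical.

```python
import random

def planted_partition(z_out, n=128, k=4, degree=16, seed=0):
    """Random graph with k equal communities and expected vertex degree
    `degree`, of which z_out edges (in expectation) leave the community."""
    rng = random.Random(seed)
    size = n // k
    community = {v: v // size for v in range(n)}
    p_in = (degree - z_out) / (size - 1)   # pairs inside one community
    p_out = z_out / (n - size)             # pairs in different communities
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if community[u] == community[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, community

edges, community = planted_partition(z_out=6)
average_degree = 2 * len(edges) / len(community)
print(len(community), round(average_degree, 1))
```

Raising `z_out` blurs the communities, so sweeping it shows how the quality of the recovered clustering degrades as the structure weakens.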

24
Experiment II: k-community graphs
  • We generated graphs with k communities
  • The size of each community is 100
  • The expected number of edges inside a community
    equals the expected number of edges leaving
    the community
  • The probability of an edge inside a community
    varies between 0.5 and 0.1
  • Results show that the graphs are clustered
    99% correctly

25
Experiment III: nd.edu
  • Data consists of the complete map of the nd.edu
    domain, which contains 325,729 documents and
    1,090,108 links
  • Our algorithm clusters this graph into 280
    clusters with modularity 0.925579
  • This high modularity indicates strong community
    structure in the graph
  • We show the dendrogram generated by our
    algorithm.
  • The sizes of the rectangles are proportional to
    the sizes of the communities.

26
Conclusions
  • Community structure detection problem
  • A scalable algorithm
  • Based on multilevel graph partitioning
  • Uses modularity as a quality measure