A scalable multilevel algorithm for community structure detection

1
A scalable multilevel algorithm for community
structure detection

Melih Onus, Arizona State University
Hristo Djidjev, Los Alamos National Laboratory

Models and Algorithms for the Web Graph (WAW 2006), November 29 to December 2, 2006
2
Community Structure Detection Problem

The problem of identifying communities in a
network is usually modeled as a graph clustering
problem
Vertices correspond to individual items
Edges describe relationships
The communities correspond to subgraphs
Dense connections between vertices from the same
subgraph
Fewer connections between vertices in different
subgraphs

3
Motivation: Why detect communities?

Analyze and understand the information contained
in the huge amount of data available on the WWW
Finding related commercial items
Recommendation systems
Important for
Social networks
Ad-hoc networks
Protein interaction networks
Genetic networks

4
Motivation: Why detect communities?

Predict how much someone is going to love a movie based on their movie preferences

Grand Prize: $1,000,000
5
Outline of the talk

Previous work
Graph partitioning problem
Our approach
Modularity
Reduction
Multilevel graph partitioning
Experimental results
Conclusions

6
Previous Work

Two main classes
Agglomerative Methods (addition of edges)
Divisive Methods (removal of edges)
Algorithms based on
Laplacian Matrix
Centrality measures
Flow models
Random walks
Resistor networks
Optimization
Existing methods are either not fast enough or not accurate enough

7
Graph Partitioning Problem

Given a graph G = (V, E), find a partition of V such that
The partition is balanced (i.e., the numbers of vertices in all subsets are roughly equal)
The cut size is minimized (i.e., the number of edges with endpoints in different subsets is minimized)
Previous Work
Kernighan-Lin algorithm
Spectral partitioning
Multilevel algorithms

8
Kernighan-Lin Algorithm

Find an initial random partition
Improve by a greedy procedure that swaps pairs of
vertices from different partitions
Minimize the size of the cut set

[Figure: vertices u and v in different parts of the partition]
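
The swap step can be sketched in Python as follows. This is a simplified illustration, not the full Kernighan-Lin algorithm (which tentatively swaps all pairs and keeps the best prefix of swaps). The function names, the adjacency-dict representation, and the 0/1 part labels are assumptions made for the example.

```python
def cut_size(adj, side):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])


def swap_gain(adj, side, u, v):
    """Reduction in cut size if u and v (in different parts) trade places."""
    def d(x):
        external = sum(1 for y in adj[x] if side[y] != side[x])
        internal = len(adj[x]) - external
        return external - internal
    # An edge between u and v stays in the cut after the swap, hence the -2.
    return d(u) + d(v) - 2 * (1 if v in adj[u] else 0)


def greedy_swap_pass(adj, side):
    """Repeatedly apply the best positive-gain swap until none remains."""
    side = dict(side)
    while True:
        candidates = (
            (swap_gain(adj, side, u, v), u, v)
            for u in adj for v in adj
            if side[u] == 0 and side[v] == 1
        )
        gain, u, v = max(candidates, key=lambda t: t[0], default=(0, None, None))
        if gain <= 0:
            return side
        side[u], side[v] = 1, 0
```

Each accepted swap strictly decreases the cut size, so the loop terminates, and part sizes never change because vertices are only exchanged in pairs.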
9
Graph Partitioning vs Graph Clustering

Graph Clustering
Find clusters
Community sizes may differ
Number of subsets varies

Graph Partitioning
Minimize cut size
Equal number of vertices in each subset
Number of subsets is an input

Algorithms for graph partitioning cannot be directly used to produce good-quality clusterings

10
Our approach

Convert the original graph G into a complete edge-weighted graph G'
Find a min-cut of G' using a modified graph partitioning method
This will produce a good-quality (high-modularity) clustering of G

11
Modularity

A useful measure of clustering quality
Introduced by Newman [6]
Modularity of a partitioning = (number of edges within communities) - (expected number of such edges)
We are trying to find a division of the graph with high modularity
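
As an illustration, the definition can be turned into a few lines of Python. This sketch assumes the commonly used normalized form of Newman's modularity, in which both terms are divided by the number of edges m and the expectation is taken under a degree-preserving random model; the function name and data layout are hypothetical.

```python
def modularity(adj, community):
    """Modularity of a partition of an undirected, unweighted graph.

    adj: dict mapping each vertex to the set of its neighbours.
    community: dict mapping each vertex to a community label.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    inside = {}  # edges with both endpoints in the community
    degree = {}  # sum of vertex degrees in the community
    for u, nbrs in adj.items():
        c = community[u]
        degree[c] = degree.get(c, 0) + len(nbrs)
        # Every internal edge is seen from both endpoints, so count halves.
        inside[c] = inside.get(c, 0) + sum(1 for v in nbrs if community[v] == c) / 2
    # (fraction of edges inside) minus (expected fraction of such edges)
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

Putting every vertex in one community gives modularity 0 under this definition, which is why a clearly positive value is read as evidence of community structure.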

12
Reduction
Min-Cut Problem: The problem of finding a minimum cut in a complete edge-weighted graph G'
Graph Clustering Problem: The problem of finding a clustering of maximum modularity in G
13
Reduction

Maximize the modularity of a partitioning:
Modularity = (number of edges within communities) - (expected number of such edges)

Graph Clustering Problem: Maximize modularity
Equivalently, minimize (-modularity) = (cut size) - (expected cut size)
Min-Cut Problem: Minimize cut size
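
The step from modularity to cut size can be written out explicitly. The derivation below is a sketch: the symbols within(P), cut(P), and m (the number of edges of G) are notation introduced here, and the expectation is over whichever random graph model is used.

```latex
\begin{align*}
\mathrm{within}(P) &= m - \mathrm{cut}(P), \\
\mathbb{E}[\mathrm{within}(P)] &= \mathbb{E}[m] - \mathbb{E}[\mathrm{cut}(P)], \\
-\,\mathrm{modularity}(P) &= \mathbb{E}[\mathrm{within}(P)] - \mathrm{within}(P) \\
  &= \mathrm{cut}(P) - \mathbb{E}[\mathrm{cut}(P)] + \bigl(\mathbb{E}[m] - m\bigr).
\end{align*}
```

The last term does not depend on the partitioning P, so minimizing (-modularity) is the same as minimizing (cut size) - (expected cut size). If the random model is chosen so that its expected number of edges equals m, the last term vanishes altogether.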
14
Random Graph Models
p_ij: the probability that there is an edge between vertices i and j in a random graph from a given distribution

Erdős-Rényi Model

Chung-Lu Model
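
The formulas for the two models did not survive in this transcript. The sketch below uses their standard definitions, which agree with the expected cut sizes n1 n2 p and w1 w2 / 2m given on the later modularity slides: a constant probability p for Erdős-Rényi, and d_i d_j / (2m) for Chung-Lu, where d_i is the degree of vertex i. Choosing p so that the expected edge count equals m, and weighting each pair of G' by a_ij - p_ij, are assumptions made for illustration, as are all function names.

```python
def erdos_renyi_p(n, m):
    """Constant edge probability p, chosen so the expected edge count is m."""
    return m / (n * (n - 1) / 2)


def chung_lu_p(deg, i, j):
    """Chung-Lu edge probability d_i * d_j / (2m); sum of degrees equals 2m."""
    return deg[i] * deg[j] / sum(deg.values())


def reduced_weight(adj, i, j, p_ij):
    """Weight of the pair (i, j) in the complete graph G': a_ij - p_ij."""
    a_ij = 1 if j in adj[i] else 0
    return a_ij - p_ij
```

With these weights, the total weight of the pairs crossing a cut of G' is (cut size in G) - (expected cut size), the quantity to be minimized.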
15
Multilevel graph partitioning

A fast and accurate method for producing high-quality partitions

Consists of three phases
Coarsening phase
Partitioning phase
Uncoarsening and refinement phase

16
Coarsening Phase

Find a maximal matching and collapse each matched edge into a single vertex
Recursive coarsening
<G = G1, G2, ..., Gk>
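
A minimal Python sketch of the coarsening phase follows. It uses a simple greedy matching and tracks only edge weights; production multilevel partitioners also track vertex weights and usually prefer heavy-edge matching, so this is an illustration under simplifying assumptions rather than the implementation used in the talk. All names are hypothetical.

```python
def maximal_matching(adj):
    """Greedy maximal matching: match each free vertex to a free neighbour."""
    mate = {}
    for u in adj:
        if u in mate:
            continue
        for v in adj[u]:
            if v != u and v not in mate:
                mate[u], mate[v] = v, u
                break
    return mate


def coarsen(adj):
    """Collapse every matched edge into one coarse vertex.

    adj: dict of dicts, adj[u][v] = weight of edge (u, v), stored both ways.
    Returns (coarse_adj, parent), where parent maps fine to coarse vertices.
    """
    mate = maximal_matching(adj)
    parent, next_id = {}, 0
    for u in adj:
        if u in parent:
            continue
        parent[u] = next_id
        if u in mate:
            parent[mate[u]] = next_id
        next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for u in adj:
        for v, w in adj[u].items():
            a, b = parent[u], parent[v]
            if a != b:  # edges inside a collapsed pair disappear
                coarse[a][b] = coarse[a].get(b, 0) + w
    return coarse, parent


def coarsen_until(adj, max_vertices):
    """Apply coarsening repeatedly, returning the whole sequence of graphs."""
    levels = [adj]
    while len(levels[-1]) > max_vertices:
        coarse, _ = coarsen(levels[-1])
        if len(coarse) == len(levels[-1]):  # no edge left to collapse
            break
        levels.append(coarse)
    return levels
```

Parallel edges created by a collapse are merged and their weights added, so the cut size of any partition of a coarse graph equals the cut size of its projection onto the finer graph.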

17
Partitioning Phase

Greedy graph growing partitioning
Partition Gk

18
Uncoarsening and Refinement Phase

Project the partitioning Pi of Gi to a partitioning Pi-1 of Gi-1
More degrees of freedom at Gi-1 than at Gi
Improve Pi-1 using the KL algorithm

19
Implementation

Our implementation is based on the graph partitioning package METIS [3], which employs a multilevel strategy
Convert the graph partitioning algorithm into a clustering one:
The optimal clustering might not be balanced.
We ignore the restrictions that control the sizes of the parts.
The number of parts in the optimal clustering is not known.
We employ a recursive bisection procedure.
The original graph G might be sparse, while the transformed one G' is complete.
Our algorithm does not explicitly generate G'.

20
Modularity: Erdős-Rényi Model

(-Modularity) = cut size - n1 n2 p

After moving one vertex from part 2 to part 1:
(-Modularity) = cut size - (n1 + 1)(n2 - 1) p

n1, n2: number of vertices in part 1 and part 2
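
The point of these formulas is that the objective can be updated after a single vertex move from the part sizes alone, without building the complete graph. A small Python sketch, assuming a graph without self-loops, parts labelled 1 and 2, and hypothetical function names:

```python
def neg_modularity_er(cut, n1, n2, p):
    """(-modularity) under the Erdős-Rényi model: cut size - n1 * n2 * p."""
    return cut - n1 * n2 * p


def move_delta_er(v, adj, side, n1, n2, p):
    """Change in (-modularity) if v moves from part 2 to part 1."""
    to_1 = sum(1 for u in adj[v] if side[u] == 1)
    to_2 = len(adj[v]) - to_1
    # Edges from v into part 2 become cut edges; edges into part 1 stop being cut.
    delta_cut = to_2 - to_1
    delta_expected = ((n1 + 1) * (n2 - 1) - n1 * n2) * p
    return delta_cut - delta_expected
```

A negative return value means the move improves modularity; the cost of evaluating it is proportional to the degree of v.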
21
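Modularity: Chung-Lu Model

(-Modularity) = cut size - w1 w2 / (2m)

After moving vertex v from part 2 to part 1:
(-Modularity) = cut size - (w1 + w(v))(w2 - w(v)) / (2m)

wi: sum of the degrees of the vertices in part i (w(v) is taken here to be the degree of v)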
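
The same kind of incremental update works for the Chung-Lu model, with degree sums in place of vertex counts. The sketch below assumes w(v) is the degree of v, m is the number of edges, and the graph has no self-loops; the function name is hypothetical.

```python
def move_delta_cl(v, adj, side, w1, w2, m):
    """Change in (-modularity) under Chung-Lu if v moves from part 2 to part 1.

    w1, w2: sums of vertex degrees in part 1 and part 2 before the move.
    """
    wv = len(adj[v])  # degree of v
    to_1 = sum(1 for u in adj[v] if side[u] == 1)
    to_2 = wv - to_1
    delta_cut = to_2 - to_1
    delta_expected = ((w1 + wv) * (w2 - wv) - w1 * w2) / (2 * m)
    return delta_cut - delta_expected
```

After an accepted move, w1 and w2 are updated by adding and subtracting w(v), so no pass over the whole graph is needed.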
22
Analysis

Time Complexity O(nm)
Experiments
Random Graphs
k-community graphs
nd.edu

23
Experiment I Random Graphs

We generated random graphs with 128 vertices and 4 communities of size 32 each
The expected degree of each vertex is 16
The out-degree (the number of edges from a vertex to other communities) varies
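
A generator for graphs of this kind can be sketched as follows. It assumes that edges are drawn independently, with one probability for pairs inside a community and another for pairs in different communities, chosen so that the expected degree is 16 and the expected out-degree is `z_out`. The parameter names and the seeding are illustrative assumptions, not details taken from the talk.

```python
import random


def planted_partition(n=128, k=4, degree=16, z_out=4, seed=0):
    """Random graph with k equal-size communities and a fixed expected degree."""
    rng = random.Random(seed)
    size = n // k
    p_in = (degree - z_out) / (size - 1)  # probability inside a community
    p_out = z_out / (n - size)            # probability between communities
    community = {v: v // size for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if community[u] == community[v] else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, community
```

Increasing `z_out` while keeping `degree` fixed blurs the communities, which makes it a natural knob for testing how accuracy degrades.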

24
Experiment II k-community graphs

We generated graphs with k communities
The size of each community is 100
The expected number of edges inside each community equals the expected number of edges leaving it.
The probability of an edge inside a community varies between 0.1 and 0.5.
Results show that the graphs are clustered 99% correctly.

25
Experiment III nd.edu

The data consists of the complete map of the nd.edu domain, which contains 325,729 documents and 1,090,108 links
Our algorithm clusters this graph into 280 clusters with modularity 0.925579
This high modularity indicates strong community structure in the graph
We show the dendrogram generated by our algorithm.
The sizes of the rectangles are proportional to the sizes of the communities.

26
Conclusions