Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies

1
Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies
Akshay Java, Anupam Joshi, Tim Finin
University of Maryland, Baltimore County
KDD 2008 Workshop on Web Mining and Web Usage Analysis
2
Outline
  • Introduction
  • Community Detection
  • Clustering Approach
  • Spectral Approach
  • Co-Clustering
  • Simultaneous Clustering
  • Evaluation
  • Future Work
  • Conclusions

4
Social Media
"Describes the online technologies and practices that people use to share
opinions, insights, experiences, and perspectives and engage with each other."
(Wikipedia)
5
Social Media Graphs
G = (V, E): a graph describing the relationships between
different entities (People, Documents, etc.)
G = <V, T, R>: a tri-partite graph that
expresses how entities tag some resource
6
What is a Community?
A community in the real world is identified in a
graph as a set of nodes that have more links
within the set than outside it.
(Figures: Political Blogs, Twitter Network, Facebook Network)
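The definition above can be checked directly by counting links. The following is a minimal sketch, not taken from the talk: the function name `link_counts` and the toy graph (two triangles joined by one bridge edge) are hypothetical and chosen only for illustration.

```python
def link_counts(edges, community):
    """Return (internal, external) link counts for a candidate node set."""
    members = set(community)
    internal = external = 0
    for u, v in edges:
        endpoints_inside = (u in members) + (v in members)
        if endpoints_inside == 2:
            internal += 1      # both endpoints inside the set
        elif endpoints_inside == 1:
            external += 1      # link leaves the set
    return internal, external


# Hypothetical toy graph: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]

internal, external = link_counts(edges, {"a", "b", "c"})
print(internal, external)
```

Under this definition, {a, b, c} looks like a community because it has more links inside the set than leaving it, while {c, d} does not.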
8
Community Detection: Clustering Approach
  • Clustering Approach
  • Agglomerative/Hierarchical
  • Topological Overlap: similarity is measured in
    terms of the number of nodes that both i and j link
    to. (Ravasz et al.)
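
As a rough illustration of the topological overlap idea, the sketch below counts the neighbors two nodes share. The normalization by the smaller degree and the +1 for a direct link follow a commonly cited form of the measure and are an assumption here; the slide only states the shared-neighbor part. The helper names and the toy graph are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def topological_overlap(adj, i, j):
    """Shared neighbors of i and j (+1 if directly linked), over the smaller degree."""
    common = len(adj[i] & adj[j])
    direct = 1 if j in adj[i] else 0
    smaller_degree = min(len(adj[i]), len(adj[j]))
    return (common + direct) / smaller_degree if smaller_degree else 0.0


# Hypothetical toy graph: two triangles joined by the bridge c-d.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"),
                       ("c", "d"),
                       ("d", "e"), ("e", "f"), ("d", "f")])

print(topological_overlap(adj, "a", "b"))  # same triangle
print(topological_overlap(adj, "a", "e"))  # different triangles
```

An agglomerative method would then merge the most similar pairs first.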

9
Community Detection: Clustering Approach
  • Clustering Approach
  • Agglomerative/Hierarchical
  • Divisive/Partition based
  • Remove edges that have highest edge betweenness
    centrality

(Girvan-Newman Algorithm)
Political Books
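One divisive step can be sketched as follows: compute edge betweenness with a breadth-first, Brandes-style accumulation, then remove the top-scoring edge. This is an illustrative sketch for small unweighted, undirected graphs without self-loops, not the implementation used in the talk; all names and the toy graph are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph given as adjacency sets."""
    score = {}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing path credit with predecessors.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    # Every undirected pair was counted once from each endpoint.
    return {edge: total / 2.0 for edge, total in score.items()}


def remove_most_central_edge(adj):
    """Delete the edge with the highest betweenness and return it."""
    scores = edge_betweenness(adj)
    u, v = tuple(max(scores, key=scores.get))
    adj[u].discard(v)
    adj[v].discard(u)
    return frozenset((u, v))


# Hypothetical toy graph: two triangles joined by the bridge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
scores = edge_betweenness(adj)
removed = remove_most_central_edge(adj)
print(sorted(removed))
```

Repeating the removal step and tracking connected components yields the hierarchy of communities.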
10
Community Detection: Spectral Approach
Graph Laplacian
  • The graph can be partitioned using the
    eigenspectrum of the Laplacian. (Shi and Malik)
  • The eigenvector for the second smallest eigenvalue
    of the graph Laplacian is the Fiedler vector.
  • The graph can be recursively partitioned using
    the sign of the values in its Fiedler vector.
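
The sign-based split can be illustrated without a linear algebra library. The sketch below approximates the Fiedler vector by power iteration on a shifted Laplacian while projecting out the constant vector. It assumes a small, connected, unweighted graph; the function name, iteration count, and toy graph are hypothetical, and a production system would use an eigensolver instead.

```python
import math


def fiedler_vector(adj, iterations=500):
    """Approximate the Fiedler vector of a connected, unweighted graph."""
    nodes = sorted(adj)
    n = len(nodes)
    index = {v: k for k, v in enumerate(nodes)}
    degree = [len(adj[v]) for v in nodes]
    # 2 * max degree bounds the largest Laplacian eigenvalue, so
    # (shift * I - L) has non-negative eigenvalues.
    shift = 2.0 * max(degree)
    x = [float(k) for k in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        # Project out the constant vector (eigenvalue 0 of L).
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (shift * I - L) x, with L = D - A.
        y = []
        for k, v in enumerate(nodes):
            laplacian_x = degree[k] * x[k] - sum(x[index[w]] for w in adj[v])
            y.append(shift * x[k] - laplacian_x)
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return dict(zip(nodes, x))


# Hypothetical toy graph: two triangles joined by the bridge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}

vector = fiedler_vector(adj)
positive = sorted(v for v, value in vector.items() if value > 0)
negative = sorted(v for v, value in vector.items() if value <= 0)
print(positive, negative)
```

Applying the same split to each side again gives the recursive partitioning described above.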

Normalized Cuts
Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)
cut(A, B): cost of edges deleted to disconnect the graph
assoc(B, V): total cost of all edges that start from B
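A small worked example may help show why the normalization matters. The sketch below evaluates the standard normalized-cut value, cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), for a given two-way partition. The function name, the (u, v, weight) edge format, and the toy graph are assumptions made for illustration.

```python
def ncut(edges, part_a):
    """Normalized cut of the partition (part_a, rest) for weighted undirected edges.

    Assumes both sides contain at least one edge endpoint, so neither
    association term is zero.
    """
    a = set(part_a)
    cut = assoc_a = assoc_b = 0.0
    for u, v, weight in edges:
        # Each endpoint contributes the edge weight to its side's association.
        for node in (u, v):
            if node in a:
                assoc_a += weight
            else:
                assoc_b += weight
        if (u in a) != (v in a):
            cut += weight  # edge crosses the partition
    return cut / assoc_a + cut / assoc_b


# Hypothetical toy graph: two triangles joined by the bridge c-d, unit weights.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
         ("c", "d", 1.0),
         ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0)]

balanced = ncut(edges, {"a", "b", "c"})
lopsided = ncut(edges, {"a"})
print(balanced, lopsided)
```

Cutting off the single node `a` removes only two edges, but the normalization penalizes it heavily, so the balanced split scores lower.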
11
Community Detection: Co-Clustering
  • Spectral graph bipartitioning
  • Compute the graph Laplacian of the bipartite graph with
    adjacency matrix W = [0, A; A^T, 0]
  • where A is the document-by-term matrix
  • (Dhillon et al.)
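
To make the bipartite construction concrete, the sketch below builds a block adjacency matrix with documents first and terms second, then forms the unnormalized Laplacian L = D - W. The block layout is the usual one for spectral co-clustering and is an assumption here, since the formula did not survive in the transcript; the function names and the tiny matrix are hypothetical.

```python
def bipartite_adjacency(doc_term):
    """Block matrix [[0, A], [A^T, 0]] for a document-by-term matrix A."""
    n_docs = len(doc_term)
    n_terms = len(doc_term[0])
    size = n_docs + n_terms
    w = [[0.0] * size for _ in range(size)]
    for d in range(n_docs):
        for t in range(n_terms):
            w[d][n_docs + t] = float(doc_term[d][t])   # document -> term
            w[n_docs + t][d] = float(doc_term[d][t])   # term -> document
    return w


def laplacian(w):
    """Unnormalized graph Laplacian L = D - W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical 2-document, 2-term count matrix.
doc_term = [[1, 0],
            [2, 3]]
w = bipartite_adjacency(doc_term)
lap = laplacian(w)
for row in lap:
    print(row)
```

Documents and terms end up as nodes of one graph, so a single spectral cut assigns both to clusters at once.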

13
Social Media Graphs
Links Between Nodes and Tags
Links Between Nodes
Simultaneous Cuts
14
Communities in Social Media
A community in the real world is identified in a
graph as a set of nodes that have more links
within the set than outside it and share similar
tags.
15
Clustering Tags and Graphs
(Figure: matrix blocks labeled Nodes and Tags; Fiedler vector polarity)
β = 0 is like co-clustering; β = 1 gives equal
importance to blog-blog and blog-tag links; β >> 1 is like NCut
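The role of β can be sketched with a block matrix in which the node-node links are scaled by β and the node-tag links are left as they are. The layout [[β·C, A], [A^T, 0]] is an assumption that matches the three β cases listed above; the transcript does not preserve the exact construction, and the function name and toy matrices are hypothetical.

```python
def simcut_matrix(node_links, node_tags, beta):
    """Combined matrix [[beta * C, A], [A^T, 0]] (assumed layout).

    node_links: square node-by-node link matrix C.
    node_tags:  node-by-tag matrix A.
    """
    n = len(node_links)
    t = len(node_tags[0])
    size = n + t
    w = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            w[i][j] = beta * node_links[i][j]        # weighted node-node block
        for k in range(t):
            w[i][n + k] = float(node_tags[i][k])     # node -> tag
            w[n + k][i] = float(node_tags[i][k])     # tag -> node
    return w


# Hypothetical data: 2 linked blogs, 3 tags.
links = [[0, 1],
         [1, 0]]
tags = [[1, 0, 0],
        [0, 1, 1]]

co_clustering_like = simcut_matrix(links, tags, beta=0)
equal_weight = simcut_matrix(links, tags, beta=1)
print(co_clustering_like[0][:2], equal_weight[0][:2])
```

With β = 0 the node-node block vanishes and only the bipartite node-tag structure remains; as β grows, the node-node links dominate the cut.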
16
Clustering Tags and Graphs
Clustering Only Links
Clustering Links + Tags
β = 0 is like co-clustering; β = 1 gives equal
importance to blog-blog and blog-tag links; β >> 1 is like NCut
17
Clustering Tags and Graphs
Clustering Only Links
Clustering Links + Tags
19
Datasets
  • Citeseer
  • Agents, AI, DB, HCI, IR, ML
  • Words used in place of tags
  • Blog data
  • derived from the WWE/Buzzmetrics dataset
  • Tags associated with Blogs derived from
    del.icio.us
  • For dimensionality reduction, 100 topics derived
    from blog homepages using LDA (Latent Dirichlet
    Allocation)
  • Pairwise similarity computed
  • RBF Kernel for Citeseer
  • Cosine for blogs
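
The two pairwise similarity measures named above can be written in a few lines. This is a generic sketch of an RBF kernel and cosine similarity; the `gamma` parameter and the example vectors are hypothetical, since the transcript does not give the settings used.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * squared Euclidean distance); gamma is an assumed parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


# Hypothetical feature vectors (for example, topic weights).
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```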

20
Citeseer Data
Accuracy: 36%
Accuracy: 62%
Higher accuracy by adding tag information
21
Citeseer Data
NCut
SimCut
  • SimCut Results in
  • Higher intra-cluster similarity
  • Lower inter-cluster similarity
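
One way to quantify the intra-cluster and inter-cluster comparison is to average pairwise similarities within and across clusters. The sketch below assumes a symmetric similarity matrix and one cluster label per item; the function name and the numbers are hypothetical and are not results from the talk.

```python
from itertools import combinations


def intra_inter_similarity(similarity, labels):
    """Return (mean intra-cluster similarity, mean inter-cluster similarity)."""
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            intra.append(similarity[i][j])
        else:
            inter.append(similarity[i][j])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)


# Hypothetical similarity matrix for four items in two clusters.
similarity = [[1.0, 0.9, 0.1, 0.2],
              [0.9, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.8],
              [0.2, 0.1, 0.8, 1.0]]
intra, inter = intra_inter_similarity(similarity, [0, 0, 1, 1])
print(intra, inter)
```

A better clustering raises the first number and lowers the second.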

22
Citeseer Data
NCut
True
SimCut
  • Constrains cuts based on both
  • Link Structure
  • Tags

23
Blog Data
NCut
SimCut
  • SimCut Results in
  • Higher intra-cluster similarity
  • Lower inter-cluster similarity

24
Blog Data
NCut
SimCut
35 Clusters
  • NCut
  • Few, large clusters with low intra-cluster
    similarity
  • SimCut
  • Moderate-size clusters with higher intra-cluster
    similarity

25
Effect of Number of Tags, Clusters
Citeseer
Mutual Information compares clusters to ground
truth.
More tags help, to an extent.
Lower mutual information if only the graph is used.
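For reference, mutual information between a clustering and a ground-truth labeling can be computed from label counts alone. The sketch below uses the plain (unnormalized) definition in nats; whether the talk used a normalized variant is not stated, so treat this choice as an assumption. The function name and label lists are hypothetical.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    total = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        total += (count / n) * math.log(count * n / (count_a[a] * count_b[b]))
    return total


truth = [0, 0, 1, 1]
same_up_to_renaming = [1, 1, 0, 0]
unrelated = [0, 1, 0, 1]
print(mutual_information(truth, same_up_to_renaming))
print(mutual_information(truth, unrelated))
```

The score does not depend on how clusters are numbered, which is why it suits comparing detected communities with labeled classes.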
26
Effect of Number of Tags, Clusters
Blogs
Mutual Information compares clusters to
content-based clusters (no tags/graph).
More tags help, to an extent.
Lower mutual information if only the graph is used.
28
Future Work
  • Evaluating the SimCut algorithm on derived feature
    types like named entities, sentiments and
    opinions, and links to mainstream media.
  • For a dataset with ground truth, a comparison of
    graph-based, text-based and graph+tag-based
    clustering
  • Evaluating the effect of varying β

30
Conclusions
  • Many Social Media sites allow users to tag
    resources
  • Incorporating folksonomies in community detection
    can yield better results
  • SimCut can be easily implemented and relates to
    NCut with two simultaneous objectives:
  • Minimize number of node-node edges being cut
  • Minimize number of node-tag edges being cut
  • Detected communities can be associated with
    meaningful, descriptive tags

31
Thanks!
32
http://ebiquity.umbc.edu
http://socialmedia.typepad.com
33
More Tags
Only Graph
SimCut
34
Citeseer (Community Size, Similarity)
35
Blogs (Community Size, Similarity)