Efficient Identification of Overlapping Communities

1
Efficient Identification of Overlapping
Communities
  • Jeffrey Baumes
  • Mark Goldberg
  • Malik Magdon-Ismail

Rensselaer Polytechnic Institute, Troy, NY
2
Outline
  • Communities as clusters
  • What is a cluster?
  • Cluster seed procedure (LA)
  • Cluster refinement procedure (IS2)
  • Experimental results
  • Conclusions and future work

3
Communities as clusters
  • Malicious groups use large communication networks
    for planning and coordination
  • Their goal: remain undetected
  • Our goal: sift through communications for
    suspicious patterns, using structure only, not
    content

4
Communities as clusters
  • Detecting all social groups (malicious or not)
    will aid in searching for hidden groups
  • Social groups tend to communicate densely
  • Approach: find social groups by finding clusters
    in the graph of the communication network

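[Figure: nodes are actors (actor A, actor B); an edge means A communicates with B. One group is labeled "likely a social group"; after the step "Add external edges" it is labeled "likely not a social group".]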
5
What is a cluster?
  • Many partitioning algorithms exist
  • Social groups often overlap
  • Instead, define clusters as node sets that are
    locally optimal with respect to density

[Figure: overlapping clustering vs. partitioning]
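
One way to make "locally optimal with respect to density" concrete is sketched below. The density W used here (internal edges over internal plus boundary edges) is an assumption chosen for illustration; the slides do not state which density metric is used.

```latex
% Assumed notation (not from the slides):
%   E_in(C)  = edges with both endpoints in C
%   E_out(C) = edges with exactly one endpoint in C
W(C) = \frac{|E_{\mathrm{in}}(C)|}{|E_{\mathrm{in}}(C)| + |E_{\mathrm{out}}(C)|}

% C is locally optimal if no single-node change increases W:
W(C \cup \{v\}) \le W(C) \quad \forall\, v \notin C,
\qquad
W(C \setminus \{v\}) \le W(C) \quad \forall\, v \in C.
```

Under this definition each cluster is judged on its own, so two locally optimal clusters may share nodes, which a partitioning cannot express.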
6
Two-stage process
communication network → seed procedure → seed clusters → refinement procedure → final clusters
7
Original procedures
communication network → Rank Removal (RaRe) → seed clusters → Iterative Scan (IS) → final clusters
Jeffrey Baumes, Mark Goldberg, Mukkai
Krishnamoorthy, Malik Magdon-Ismail, Nathan
Preston. "Finding Communities by Clustering a
Graph into Overlapping Subgraphs", International
Conference on Applied Computing (IADIS 2005), Feb
22-25, Algarve, Portugal.
8
Proposed new procedures
communication network → Link Aggregate (LA) → seed clusters → Iterative Scan 2 (IS2) → final clusters
9
Link Aggregate (LA)
  • Order the nodes (two routines are used)
  • Pass through the nodes
  • For each node, add it to the clusters it
    improves, or start a new cluster
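
The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `density` metric (internal edges over internal plus boundary edges) is an assumption, the node ordering is passed in by the caller because the slides do not say which two ordering routines are used, and the names `Graph`, `density`, and `link_aggregate` are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed format)


def density(graph: Graph, cluster: Set[int]) -> float:
    """Assumed metric: internal edges / (internal + boundary edges)."""
    internal_twice = 0  # every internal edge is seen from both endpoints
    boundary = 0
    for u in cluster:
        for v in graph[u]:
            if v in cluster:
                internal_twice += 1
            else:
                boundary += 1
    internal = internal_twice // 2
    total = internal + boundary
    return internal / total if total else 0.0


def link_aggregate(graph: Graph, order: List[int]) -> List[Set[int]]:
    """One pass over `order`; a node may join several clusters."""
    clusters: List[Set[int]] = []
    for node in order:
        joined = False
        for cluster in clusters:
            # Add the node to every cluster whose density it improves.
            if density(graph, cluster | {node}) > density(graph, cluster):
                cluster.add(node)
                joined = True
        if not joined:
            # The node improves no existing cluster, so it seeds a new one.
            clusters.append({node})
    return clusters
```

Because a node is tested against every existing cluster, it can end up in more than one of them, which is how overlapping seed clusters arise in this sketch.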

10
LA procedure
11
LA procedure
[Figure, repeated on slides 11 to 16: example graph with nodes labeled 1 to 35]
12
LA procedure
13
LA procedure
14
LA procedure
15
LA procedure
16
LA procedure
17
Iterative Scan (IS)
  • Old refinement procedure
  • Traverses the entire node list, adding or
    removing nodes when doing so increases the density
  • Repeats the process until no improvements are
    possible
  • May be inefficient in sparse networks
  • Result is guaranteed to be locally optimal
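
A minimal sketch of this kind of full-scan refinement is shown below. It assumes a density of internal edges over internal plus boundary edges, which the slides do not specify; `Graph`, `density`, and `iterative_scan` are hypothetical names, not the authors' code.

```python
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed format)


def density(graph: Graph, cluster: Set[int]) -> float:
    """Assumed metric: internal edges / (internal + boundary edges)."""
    internal_twice = sum(1 for u in cluster for v in graph[u] if v in cluster)
    boundary = sum(1 for u in cluster for v in graph[u] if v not in cluster)
    internal = internal_twice // 2  # each internal edge was seen from both ends
    total = internal + boundary
    return internal / total if total else 0.0


def iterative_scan(graph: Graph, seed: Iterable[int]) -> Set[int]:
    cluster = set(seed)
    best = density(graph, cluster)
    improved = True
    while improved:  # repeat until a full pass makes no change
        improved = False
        for node in graph:  # full pass over every node in the network
            candidate = cluster ^ {node}  # add if absent, remove if present
            if not candidate:
                continue  # never empty the cluster
            score = density(graph, candidate)
            if score > best:
                cluster, best = candidate, score
                improved = True
    return cluster
```

The loop stops only after a pass in which no single addition or removal raises the density, so the returned cluster is locally optimal for the assumed metric. Every pass touches all nodes, including those far from the cluster, which is where the cost in sparse networks comes from.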

18
Iterative Scan 2 (IS2)
  • New refinement procedure
  • Traverses only the neighborhood of the cluster,
    adding or removing nodes when doing so increases
    the density
  • Repeats the process until no improvements are
    possible
  • More efficient in sparse networks despite its
    overhead; less efficient in dense networks
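
The neighborhood-only scan can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: the density is taken to be internal edges over internal plus boundary edges, the graph is assumed to have no self-loops, and the incremental edge-count update is one plausible way to keep each step local. All names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected, no self-loops (assumed format)


def edge_counts(graph: Graph, cluster: Set[int]) -> Tuple[int, int]:
    """Return (internal edges, boundary edges) of a cluster."""
    internal = sum(1 for u in cluster for v in graph[u] if v in cluster) // 2
    boundary = sum(1 for u in cluster for v in graph[u] if v not in cluster)
    return internal, boundary


def ratio(internal: int, boundary: int) -> float:
    total = internal + boundary
    return internal / total if total else 0.0


def iterative_scan_2(graph: Graph, seed: Iterable[int]) -> Set[int]:
    cluster = set(seed)
    internal, boundary = edge_counts(graph, cluster)
    improved = True
    while improved:
        improved = False
        # Candidates: cluster members plus nodes adjacent to the cluster.
        neighborhood = set(cluster)
        for u in cluster:
            neighborhood |= graph[u]
        for node in sorted(neighborhood):
            k_in = len(graph[node] & cluster)  # links into the cluster
            k_out = len(graph[node]) - k_in    # links leaving the cluster
            if node in cluster:
                if len(cluster) == 1:
                    continue  # never empty the cluster
                new_internal = internal - k_in
                new_boundary = boundary + k_in - k_out
            else:
                new_internal = internal + k_in
                new_boundary = boundary - k_in + k_out
            if ratio(new_internal, new_boundary) > ratio(internal, boundary):
                cluster ^= {node}  # toggle membership in place
                internal, boundary = new_internal, new_boundary
                improved = True
    return cluster
```

A node with no link into the cluster can only add boundary edges, so skipping such nodes loses nothing under this metric. Rebuilding the neighborhood on every pass is the overhead, which grows as clusters touch more of the graph in dense networks.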

19
IS2 procedure
20
IS2 procedure
21
IS2 procedure
22
IS2 procedure
23
IS2 procedure
24
Experimental results
  • Compare run time of new vs. old
  • Compare cluster quality of new vs. old
  • Compare on different network types
      • Random
      • Preferential attachment
      • Real-world
  • Compare possible actor orderings for LA
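
For readers who want to reproduce comparisons of this kind, the two synthetic network types can be generated roughly as follows. The slides do not say which models were used; this sketch assumes an Erdős–Rényi G(n, p) graph for "random" and a Barabási–Albert style process for "preferential attachment". The function names and parameters are hypothetical.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed format)


def random_graph(n: int, p: float, seed: int = 0) -> Graph:
    """G(n, p): each of the n*(n-1)/2 possible edges appears with probability p."""
    rng = random.Random(seed)
    graph: Graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def preferential_attachment_graph(n: int, m: int, seed: int = 0) -> Graph:
    """Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree. Requires n > m >= 1."""
    rng = random.Random(seed)
    graph: Graph = {v: set() for v in range(n)}
    endpoints: List[int] = []  # each node appears once per incident edge
    # Start from a clique on m + 1 nodes so every early node has degree > 0.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            graph[u].add(v)
            graph[v].add(u)
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets: Set[int] = set()
        while len(targets) < m:
            # Uniform choice from the endpoint list is degree-proportional.
            targets.add(rng.choice(endpoints))
        for t in targets:
            graph[new].add(t)
            graph[t].add(new)
            endpoints += [new, t]
    return graph
```

The preferential attachment graph has exactly m(m+1)/2 + (n-m-1)m edges, so small m yields the sparse, hub-dominated networks where a neighborhood-only scan is expected to pay off.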

25
RaRe vs. LA run time
[Plots: run time of Original RaRe, New RaRe, and LA]
26
IS vs. IS2 run time
Define a combined IS procedure: IS for dense graphs, IS2 for sparse graphs
27
Old vs. new quality
[Plots: cluster quality of New RaRe → IS vs. LA → IS2]
28
Preferential attachment
[Plots: New RaRe → IS vs. LA → IS2 on preferential attachment networks]
29
Real-World Networks
  • Ratio new/old
  • (LA → IS) / (RaRe → IS)

[Table column, one entry per network: IS, IS2, IS2, IS2, IS]
30
LA ordering
31
Conclusions and future work
  • Overlapping clustering may be used to discover
    social groups in communication networks
  • The new algorithm is more efficient in many
    cases, while keeping the same or better quality
  • A unified algorithm should choose strategies and
    parameters based on network properties

32
Questions
33
Rank Removal
  • Existing seed procedure
  • Removes highly connected nodes until network is
    broken into small clusters
  • Adds each removed node back into the clusters it
    is well connected to
  • Two main inefficiencies
      • Computed PageRank at each iteration
      • Computed connected components at each iteration
  • PageRank could be computed once, but recomputing
    connected components is crucial
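
The procedure described above can be sketched as follows. This is a simplified illustration, not the published RaRe algorithm: node degree, computed once, stands in for PageRank; "well connected" is approximated by a hypothetical `min_links` threshold; and `max_size` and `min_size` are hypothetical parameters. Connected components are recomputed after every removal, as the last bullet requires.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed format)


def components(graph: Graph, alive: Set[int]) -> List[Set[int]]:
    """Connected components of the subgraph induced by `alive`."""
    seen: Set[int] = set()
    result: List[Set[int]] = []
    for start in alive:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        result.append(comp)
    return result


def rank_removal(graph: Graph, max_size: int, min_size: int = 2,
                 min_links: int = 2) -> List[Set[int]]:
    # Stand-in rank (assumption): degree, computed once, highest first.
    ranked = sorted(graph, key=lambda v: (-len(graph[v]), v))
    rank_index = {v: i for i, v in enumerate(ranked)}
    alive = set(graph)
    removed: List[int] = []
    while True:
        # Components are recomputed after every single removal.
        big = [c for c in components(graph, alive) if len(c) > max_size]
        if not big:
            break
        candidates = set().union(*big)
        node = min(candidates, key=lambda v: rank_index[v])
        alive.discard(node)
        removed.append(node)
    cores = [c for c in components(graph, alive) if len(c) >= min_size]
    clusters = [set(c) for c in cores]
    for node in removed:
        for core, cluster in zip(cores, clusters):
            # A removed node rejoins every core it has enough links into.
            if len(graph[node] & core) >= min_links:
                cluster.add(node)
    return clusters
```

A removed hub can rejoin several cores at once, which is where the overlap between seed clusters comes from in this sketch.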

34
LA procedure detail
35
IS2 procedure detail
36
RaRe vs. LA
37
RaRe vs. LA
38
RaRe vs. LA
39
IS vs. IS2
40
IS vs. IS2
41
IS vs. IS2
42
Run time RaRe vs. LA
43
Run time IS vs. IS2
44
Cluster quality
45
Cluster quality
46
Preferential attachment run time
47
Preferential attachment quality
48
LA ordering run time
49
LA ordering quality