Title: On Triangulation-based Dense Neighbourhood Graph Discovery
1 On Triangulation-based Dense Neighbourhood Graph Discovery
Nan Wang, Jingbo Zhang, Kian-Lee Tan, Anthony K. H. Tung
School of Computing, National University of Singapore
2 Outline
- Motivation
- Related Work
- Terms Definition
- Triangulation based DN-graph mining
- Semi-streaming DN-graph model
- Experimental Study
- Future Work and Conclusion
3 Motivation
- Define a dense graph pattern from a perspective that considers both the size of the substructure and the minimum level of interaction between vertices.
- Locate dense patterns in large-scale graphs under restricted resources.
4 Related Work
- Other Dense Patterns
  - Clique/Quasi-Clique
  - High Degree Patterns
  - Dense Bipartite Patterns
  - Heavy Patterns
- Triangle Counting
- CSV
  - Density-based closed clique discovery with a linear visualization.
5 Terms Definition
6 Terms Definition (contd)
7 DN-graph
Proof
8 Restrictions on the minimal size of the shared neighborhood
9 DN-graph and Other Dense Patterns
DN-Graph
10 DN-graph and Closed Clique
Proof
11 Computation Bottleneck in DN-graph Mining
- Most sub-graphs are not DN-graphs.
- Most of these operations are redundant.
12 How to tackle the bottleneck?
- Reduce the number of joins
  - Local maximal feature: two DN-graphs share no edge.
  - Edges that share common vertices and have locally maximal λ values make up the DN-graph.
- Locate a DN-graph using the λ(e) value
  - All edges within a DN-graph have equal λ(e), denoted λmax.
  - All edges connecting to neighboring vertices have a smaller λ value: λ(e) = λ(u,v) < λmax, where u is not in G and v is in G.
- Use approximation methods to compute λ(e) efficiently
14 Graph Triangulation
- Given a graph triangle, the upper bound of the
other two edges can be used to tighten the
density estimation of the third edge.
[Figure: triangle on vertices u, v, w with edge labels λ(w,v): 3, λ(u,w): 3, λ(u,v): 5]
15 Triangulation Based DN-graph Mining
- DN-graph Mining Algorithm
  - Step One: Sort vertices according to their degrees.
  - Step Two: Generate triangles in a streaming fashion.
  - Step Three: Obtain the local density information gradually along the triangle stream.
- Initial Upper Bound: TC(e), the number of triangles an edge participates in.
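The steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `stream_triangles` and `triangle_counts` are hypothetical, the graph is held in memory for simplicity, and orienting each edge from its lower-degree endpoint to its higher-degree endpoint is an assumed (though common) way to emit every triangle exactly once.

```python
from collections import defaultdict


def stream_triangles(edges):
    """Yield every triangle (u, v, w) of an undirected graph exactly once."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Step One: rank vertices by degree (ties broken by vertex id).
    order = sorted(adj, key=lambda x: (len(adj[x]), x))
    rank = {v: i for i, v in enumerate(order)}
    # Assumption: keep only neighbours of higher rank, so each triangle
    # is discovered from its lowest-ranked vertex only.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    # Step Two: emit triangles one at a time.
    for u in order:
        for v in out[u]:
            for w in out[u] & out[v]:
                yield (u, v, w)


def triangle_counts(edges):
    """Initial upper bound TC(e): number of triangles containing each edge."""
    tc = {frozenset(e): 0 for e in edges}
    for u, v, w in stream_triangles(edges):
        for a, b in ((u, v), (u, w), (v, w)):
            tc[frozenset((a, b))] += 1
    return tc


# Hypothetical toy graph: a 4-clique on a, b, c, d plus a pendant edge d-e.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("b", "d"), ("c", "d"), ("d", "e")]
tc = triangle_counts(edges)
```

Consuming the triangles through a generator mirrors the streaming idea: per-edge counters are updated as each triangle arrives, and the full triangle list is never stored.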
16 Counting of Supporting Nodes
[Figure: example graph with vertices a, b, n1, n2, n3, n4 and numeric edge labels; one vertex is annotated "Not Supporting Node".]
18 Convergence
[Figure: worked example on a six-vertex graph (V1 to V6) with panels Initialization, First Iteration, Second Iteration and Converge. A table indexed by Edge(e) lists the edges V1V2, V1V3, V2V3, V2V4, V3V4, V4V6, V2V6, V3V6, V3V5, V2V5 and V5V6. Annotations: "Two Support Vertices"; "One Support Vertex"; the estimates for V2V3, V2V6 and V3V6 each decrease by one; "The local maximal neighborhood size" (value 2).]
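One plausible reading of the refinement shown on this slide can be sketched in code. The rules below are assumptions inferred from the slide annotations, not a statement of the authors' algorithm: each edge starts with its triangle count as an upper bound; a common neighbour w supports edge (u, v) at bound k only if the edges (u, w) and (v, w) both currently have bounds of at least k; an edge with fewer supporting vertices than its bound has the bound lowered by one; rounds repeat until nothing changes. The function names are hypothetical, and the edge list is the one named in the slide's Edge(e) table.

```python
def edge_key(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u <= v else (v, u)


def refine_bounds(edges):
    """Lower per-edge upper bounds until a fixed point is reached.

    Returns (bounds, rounds), where rounds counts the passes that changed
    at least one bound.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Initial upper bound: number of triangles through the edge, TC(e).
    bound = {edge_key(u, v): len(adj[u] & adj[v]) for u, v in edges}
    rounds = 0
    while True:
        updated = dict(bound)
        for (u, v), k in bound.items():
            # Count common neighbours whose two connecting edges can
            # themselves sustain a bound of at least k.
            support = sum(
                1
                for w in adj[u] & adj[v]
                if min(bound[edge_key(u, w)], bound[edge_key(v, w)]) >= k
            )
            if support < k:
                updated[(u, v)] = k - 1  # assumed step size: one per round
        if updated == bound:
            return bound, rounds
        bound = updated
        rounds += 1


# Edge list as named in the slide's Edge(e) table.
slide_edges = [
    ("V1", "V2"), ("V1", "V3"), ("V2", "V3"), ("V2", "V4"),
    ("V3", "V4"), ("V4", "V6"), ("V2", "V6"), ("V3", "V6"),
    ("V3", "V5"), ("V2", "V5"), ("V5", "V6"),
]
bounds, rounds = refine_bounds(slide_edges)
```

Under these assumptions the bound for V2V3 starts at 4 and settles at 2, while V2V6 and V3V6 drop from 3 to 2, which agrees with the "decreases by one" annotations on the slide.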
19 Semi-Streaming Graph Model
- Graph vertices fit into main memory, while edges are kept in secondary storage in the form of an adjacency list.
- Random access is available in primary storage (i.e. memory); secondary storage allows only sequential access.
- A feasible solution for a streaming graph G(V, E) should not exceed log |V| scans of G's adjacency list.
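The access pattern described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class name `AdjacencyStream` is hypothetical, the on-disk format is taken to be one whitespace-separated line per vertex (the vertex followed by its neighbours), and the scan budget uses a base-2 logarithm because the slide does not state the base.

```python
import io
import math


class AdjacencyStream:
    """Sequential-only view of an adjacency list kept on secondary storage."""

    def __init__(self, open_stream, num_vertices):
        self._open = open_stream  # callable returning a fresh text stream
        self.scans = 0
        # Assumed budget: ceil(log2 |V|) passes, at least one.
        self.max_scans = max(1, math.ceil(math.log2(num_vertices)))

    def scan(self):
        """Yield (vertex, neighbours) records in storage order, one pass."""
        if self.scans >= self.max_scans:
            raise RuntimeError("scan budget exceeded")
        self.scans += 1
        with self._open() as stream:
            for line in stream:
                fields = line.split()
                if fields:
                    yield fields[0], fields[1:]


# Per-vertex state (here, degrees) lives in memory; edges are only streamed.
data = "a b c\nb a c\nc a b d\nd c\n"
graph = AdjacencyStream(lambda: io.StringIO(data), num_vertices=4)
degree = {v: len(nbrs) for v, nbrs in graph.scan()}
```

In practice the callable would reopen a file on disk; `io.StringIO` stands in for it so the sketch runs without external data.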
20 DN-graph mining in semi-streaming model
- Estimate the shared neighbor size using the min-wise independent set property.
- Min-wise independent set property: given two sets A, B over a universe X with a total order, and a uniformly chosen permutation p over X, the probability that min(p(A)) = min(p(B)) equals the Jaccard coefficient J(A, B) = |n(A) ∩ n(B)| / |n(A) ∪ n(B)|.
- We can use this to estimate the shared neighbor size |n(A) ∩ n(B)|.
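To make the estimation step concrete: if S and T are two neighbour sets with Jaccard coefficient J, then J = |S ∩ T| / (|S| + |T| - |S ∩ T|), which rearranges to |S ∩ T| = J (|S| + |T|) / (1 + J). The sketch below estimates J as the fraction of random permutations on which the two minima coincide and then applies that identity. It is an illustration under stated assumptions, not the StreamDN implementation: the function name and parameters are hypothetical, explicit permutations of a small universe replace the hash functions a real system would use, and both sets are assumed to be non-empty.

```python
import random


def estimate_shared_neighbours(nbrs_a, nbrs_b, universe, num_perms=200, seed=0):
    """Estimate |nbrs_a & nbrs_b| from min-wise agreement under random permutations."""
    rng = random.Random(seed)
    items = sorted(universe)  # fixed starting order for reproducibility
    matches = 0
    for _ in range(num_perms):
        rng.shuffle(items)  # a uniformly random permutation of the universe
        position = {x: i for i, x in enumerate(items)}
        min_a = min(position[x] for x in nbrs_a)
        min_b = min(position[x] for x in nbrs_b)
        if min_a == min_b:
            matches += 1
    jaccard = matches / num_perms  # estimate of J
    # Invert J = I / (|A| + |B| - I) to recover the intersection size I.
    return jaccard * (len(nbrs_a) + len(nbrs_b)) / (1 + jaccard)


universe = set(range(20))
same = estimate_shared_neighbours({1, 2, 3}, {1, 2, 3}, universe)
disjoint = estimate_shared_neighbours({1, 2, 3}, {4, 5, 6}, universe)
overlap = estimate_shared_neighbours({1, 2, 3, 4}, {3, 4, 5, 6}, universe)
```

Identical sets always agree on the minimum and disjoint sets never do, so those two cases return exactly 3 and 0; for partially overlapping sets the result is a randomized estimate whose accuracy improves with `num_perms`.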
21 Experimental Setting
- Quad-Core AMD Opteron(tm) processor 8356
- 128 GB memory
- 700 GB hard disk
- OS: Windows Server 2003
22 Experimental Study
- Comparison with CSV on Stock Market Dataset
23 Convergence
- Dataset: Flickr graph (1.7 million vertices and 22.6 million edges)
- Running time per iteration is between 55 minutes and 1 hour.
24 StreamDN Performance on Flickr Dataset
- StreamDN over-estimates with respect to the BiTriDN algorithm's results by 72 during the first 66 scans.
- StreamDN can handle the streaming setting with reasonable accuracy.
25 DN-graph Semantics in Various Domains
26 Future Work and Conclusion
- DN-graph
- DN-graph Mining Problem
- Semi-streaming Approach
- Future Work
27 Thank You. Questions?
28 Reference
- [WSTT08] N. Wang, P. Srinivasan, K.-L. Tan, and A.K.H. Tung. CSV: visualizing and mining cohesive subgraphs. In SIGMOD'08, pages 445-458, 2008.
- [WZTT11] N. Wang, J. Zhang, K.-L. Tan, and A.K.H. Tung. On triangulation-based dense neighbourhood graph discovery. In VLDB'11, volume 4, 2011.
- [ABC04] P. Aloy, B. Böttcher, H. Ceulemans, C. Leutwein, C. Mellwig, S. Fischer, and A.C. Gavin. Structure-based assembly of protein complexes in yeast. Volume 303, pages 2026-2029, 2004.
- [ATH03] I. Akihiro, W. Takashi, and M. Hiroshi. Complete mining of frequent patterns from graphs: Mining graph data. Volume 50, pages 321-354, Hingham, MA, USA, 2003. Kluwer Academic Publishers.
- [BBP06] V. Boginski, S. Butenko, and P.M. Pardalos. Mining market data: a network approach. Computers and Operations Research, 33(11):3171-3184, 2006.
- [GRT05] D. Gibson, K. Ravi, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB'05, pages 721-732, Trondheim, Norway, 2005.
- [Bla94] R.E. Blake. Partitioning graph matching with constraints. Volume 27, pages 439-446, 1994.
29 Reference (cont.)
- [DT99] L. Dehaspe and H. Toivonen. Discovery of frequent datalog patterns. Data Mining and Knowledge Discovery, 3:7-36, 1999.
- [HCD94] L. Holder, D. Cook, and S. Djoko. Substructure discovery in the SUBDUE system. In Proceedings of the Workshop on Knowledge Discovery in Databases, pages 169-180, 1994.
- [MARW90] E.M. Mitchell, P.J. Artymiuk, D.W. Rice, and P. Willett. Use of techniques derived from graph theory to compare secondary structure motifs in proteins. Journal of Molecular Biology, 212:151-166, 1990.
- [MK01] K. Michihiro and G. Karypis. Frequent subgraph discovery. In ICDM'01, pages 313-320, 2001.
- [RRRT99] K. Ravi, Prabhakar R., Sridhar R., and A. Tomkins. Trawling the web for emerging cyber-communities. In Computer Networks, pages 1481-1493, 1999.
- [SK98] A. Srivastav and W. Katja. Finding dense subgraphs with semidefinite programming. In APPROX '98, pages 181-191, London, UK, 1998. Springer-Verlag.
- [ZWZK] Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In KDD'06, Philadelphia, USA.
30 Proof: A DN-graph is a local maximum graph
31 Proof: DN-graph and Closed Clique