On Triangulation-based Dense Neighbourhood Graph Discovery

Learn more at: http://www.vldb.org
1
On Triangulation-based Dense Neighbourhood Graph Discovery
Nan Wang, Jingbo Zhang, Kian-Lee Tan, Anthony K. H. Tung
School of Computing, National University of Singapore
2
Outline

Motivation
Related Work
Terms Definition
Triangulation based DN-graph mining
Semi-streaming DN-graph model
Experimental Study
Future Work and Conclusion

3
Motivation

Define a dense graph pattern that considers both the size of the substructure and the minimum level of interaction between its vertices.
Locate dense patterns in large-scale graphs under restricted resources, where the problem is otherwise unsolvable.

4
Related Work

Other Dense Patterns
Clique/Quasi-Clique
High Degree Patterns
Dense Bipartite Patterns
Heavy Patterns
Triangle Counting
CSV
Density-based discovery of closed cliques, visualized in a linear fashion.

5
Terms Definition
6
Terms Definition (cont.)

7
DN-graph

Proof
8
Restrictions on the minimal size of the shared
neighborhood
9
DN-graph and Other Dense Patterns
DN-Graph
10
DN-graph and Closed Clique

Proof
11
Computation Bottleneck in DN-graph Mining

Most subgraphs are not DN-graphs.
Most of these operations are redundant.
12
How to tackle the bottleneck?

Reduce the number of joins
Local maximal feature: two DN-graphs share no edge.
A DN-graph comprises all edges that share common vertices and have locally maximal $\lambda$ values.
Locate DN-graphs using the $\lambda(e)$ value
All edges within a DN-graph have equal $\lambda(e)$, denoted $\lambda_{max}$.
All edges connecting to neighboring vertices have smaller $\lambda$ values: $\lambda(e) = \lambda(u,v) < \lambda_{max}$, where u is not in G and v is in G.
Use approximation methods to compute $\lambda(e)$ efficiently

14
Graph Triangulation

Given a triangle in the graph, the upper bounds on two of its edges can be used to tighten the density estimate of the third edge.

[Figure: triangle on vertices u, v, w with edge labels $\lambda(w,v)$: 3, $\lambda(u,w)$: 3, $\lambda(u,v)$: 5]
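One way to read this slide as an inequality is sketched below. The notation is assumed for illustration and does not come from the slides: $\tilde{\lambda}(e)$ stands for the current upper bound on $\lambda(e)$, $N(x)$ for the neighbour set of vertex x, and $S_k(u,v)$ for the common neighbours that can still support edge (u, v) at level k.

```latex
% Assumed notation, not taken from the slides:
%   \tilde{\lambda}(e)  current upper bound on \lambda(e)
%   N(x)                neighbour set of vertex x
\[
  S_k(u,v) \;=\; \bigl\{\, w \in N(u) \cap N(v) \;:\;
      \min\bigl(\tilde{\lambda}(u,w),\ \tilde{\lambda}(v,w)\bigr) \ge k \,\bigr\}
\]
\[
  |S_k(u,v)| < k \quad\Longrightarrow\quad \lambda(u,v) \le k - 1
\]
```

If the three labels in the figure are read as current bounds, then w does not belong to $S_5(u,v)$, because $\min(3, 3) = 3 < 5$; the triangle therefore gives no support for keeping the bound of edge (u, v) at 5.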
15
Triangulation Based DN-graph Mining

DN-graph Mining Algorithm
Step One: Sort vertices according to their degrees.
Step Two: Generate triangles in a streaming fashion.
Step Three: Obtain the local density information gradually along the triangle stream.
Initial upper bound: TC(e), the number of triangles an edge participates in.
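The steps above can be sketched in Python as follows. This is a minimal illustration that assumes an in-memory edge list with comparable vertex ids. It uses the standard degree-ordered triangle listing technique, which is not necessarily the authors' implementation. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicate edges."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def stream_triangles(adj):
    """Yield every triangle exactly once as (u, v, w), lowest-ranked vertex first."""
    # Step One: rank vertices by degree (the vertex id breaks ties).
    order = sorted(adj, key=lambda x: (len(adj[x]), x))
    rank = {v: i for i, v in enumerate(order)}
    # Orient each edge from its lower-ranked endpoint to its higher-ranked one.
    higher = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    # Step Two: emit triangles one at a time instead of materialising them all.
    for u in order:
        for v in higher[u]:
            for w in higher[u] & higher[v]:
                yield u, v, w


def triangle_counts(edges):
    """Return TC(e) for every edge e, keyed by the sorted vertex pair."""
    adj = build_adjacency(edges)
    tc = {tuple(sorted((u, v))): 0 for u in adj for v in adj[u]}
    # Initial upper bound: accumulate counts along the triangle stream.
    for u, v, w in stream_triangles(adj):
        for a, b in ((u, v), (u, w), (v, w)):
            tc[tuple(sorted((a, b)))] += 1
    return tc


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} plus vertex 4 attached to 0 and 1.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 4)]
tc = triangle_counts(toy_edges)
print(tc[(0, 1)], tc[(2, 3)], tc[(0, 4)])  # prints: 3 2 1
```

Orienting edges from lower to higher degree rank means each triangle is reported only from its lowest-ranked vertex, so no triangle is counted twice.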

16
Counting of Supporting Nodes
[Figure: edge (a, b) with neighbouring nodes n1 to n4 and numeric edge labels; one node is marked "Not Supporting Node".]
17

18
Convergence
[Figure: convergence example on a graph with vertices V1 to V6 and edges V1V2, V1V3, V2V3, V2V4, V2V5, V2V6, V3V4, V3V5, V3V6, V4V6, V5V6. Panels: Initialization, First Iteration, Second Iteration, Converge. Callouts: "Two Support Vertices", "One Support Vertex". A table lists the value for each Edge(e) at every iteration.]
The values for V2V3, V2V6 and V3V6 each decrease by one.
The local maximal neighborhood size is 2.
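The iterate-until-convergence idea on this slide can be sketched as below. The support rule is an assumption based on the slides' "supporting" wording: a common neighbour w supports edge (u, v) at its current bound k only if both (u, w) and (v, w) have bounds of at least k, and an edge with fewer than k supporters has its bound decreased by one. The function names and the toy graph are hypothetical, and the sketch keeps the whole graph in memory.

```python
from collections import defaultdict


def edge_key(a, b):
    """Canonical key for an undirected edge (vertex ids must be comparable)."""
    return (a, b) if a <= b else (b, a)


def refine_bounds(edges):
    """Iteratively tighten per-edge upper bounds until no bound changes.

    Returns (bounds, iterations), where iterations counts the rounds in which
    at least one bound decreased.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Initialization: the bound of an edge is its triangle count TC(e).
    bound = {edge_key(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u]}

    iterations = 0
    while True:
        new_bound = {}
        for (u, v), k in bound.items():
            # Count the common neighbours that still support level k (assumed rule).
            support = sum(
                1
                for w in adj[u] & adj[v]
                if bound[edge_key(u, w)] >= k and bound[edge_key(v, w)] >= k
            )
            new_bound[(u, v)] = k - 1 if support < k else k
        if new_bound == bound:
            return bound, iterations
        bound = new_bound
        iterations += 1


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} plus vertex 4 attached to 0 and 1.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 4)]
bounds, rounds = refine_bounds(toy_edges)
print(bounds[(0, 1)], bounds[(2, 3)], bounds[(0, 4)], rounds)  # prints: 2 2 1 1
```

Edge (0, 1) starts at 3 because it lies in three triangles, but no common neighbour supports level 3, so it drops to 2 in the first round and then stays there. Bounds are non-negative integers that never increase, so the loop always terminates.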
19
Semi-Streaming Graph Model

Graph vertices fit into main memory, while edges reside in secondary storage in the form of an adjacency list.
Random access is available in primary storage (i.e. memory), but only sequential access in secondary storage.
A feasible solution for a streaming graph G(V, E) should not exceed log |V| scans of G's adjacency list.
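The access pattern described above can be mimicked with a small wrapper, shown here as a rough sketch. The class name, the in-memory list standing in for secondary storage, and the use of a base-2 logarithm rounded up as the scan budget are all assumptions; the slide only says "log V".

```python
import math


class AdjacencyListStream:
    """Sequential-only view of an adjacency list, with a bounded number of scans."""

    def __init__(self, records, num_vertices):
        # records: iterable of (vertex, neighbour_list) pairs, simulating the file on disk.
        self._records = records
        # Assumed budget: ceil(log2 |V|) passes, with at least one pass allowed.
        self.max_scans = max(1, math.ceil(math.log2(num_vertices)))
        self.scans_used = 0

    def scan(self):
        """Start one sequential pass; random access to the edges is not offered."""
        if self.scans_used >= self.max_scans:
            raise RuntimeError("scan budget of %d passes exhausted" % self.max_scans)
        self.scans_used += 1
        return iter(self._records)


def vertex_degrees(stream):
    """Per-vertex state (here: degree) fits in memory and is filled in one pass."""
    return {vertex: len(neighbours) for vertex, neighbours in stream.scan()}


# Hypothetical adjacency list for a graph with four vertices.
records = [(0, [1, 2, 3]), (1, [0, 2]), (2, [0, 1]), (3, [0])]
stream = AdjacencyListStream(records, num_vertices=4)
degrees = vertex_degrees(stream)
print(degrees, stream.scans_used, stream.max_scans)  # prints: {0: 3, 1: 2, 2: 2, 3: 1} 1 2
```

Only the per-vertex dictionary lives in memory; the edges are touched once per pass and in file order, which is the constraint the model imposes.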

20
DN-graph mining in semi-streaming model

Estimate the shared neighbor size using the min-wise independent set property.
Min-wise independent set property: given two sets A, B over a universe X with a total order, and a permutation p chosen uniformly at random over X, the probability that $\min(p(A)) = \min(p(B))$ equals the Jaccard coefficient $J(A, B) = |n(A) \cap n(B)| / |n(A) \cup n(B)|$.
This can be used to estimate the shared neighbor size $|n(A) \cap n(B)|$.
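The estimate described above can be sketched as follows. The sketch assumes vertex ids are the integers 0 to n - 1 and draws explicit random permutations, which matches the stated property but would be replaced by hash functions at scale. The function names and the example sets are hypothetical. The last step relies on the identity |A ∩ B| = J (|A| + |B|) / (1 + J), which follows from J = |A ∩ B| / (|A| + |B| - |A ∩ B|).

```python
import random


def make_permutations(universe_size, k, seed=0):
    """Draw k uniformly random permutations of the vertex ids 0 .. universe_size - 1."""
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        perm = list(range(universe_size))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def signature(neighbours, perms):
    """Min-wise signature of a non-empty neighbour set: its minimum under each permutation."""
    return [min(perm[x] for x in neighbours) for perm in perms]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of permutations on which the two minima agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def estimate_shared(size_a, size_b, jaccard):
    """Turn a Jaccard estimate into an estimate of the shared neighbour count."""
    return jaccard * (size_a + size_b) / (1.0 + jaccard)


# Hypothetical neighbour sets with 50 shared neighbours (true Jaccard = 1/3).
perms = make_permutations(universe_size=200, k=400, seed=7)
n_u = set(range(0, 100))
n_v = set(range(50, 150))
j = estimate_jaccard(signature(n_u, perms), signature(n_v, perms))
shared = estimate_shared(len(n_u), len(n_v), j)
print(round(j, 3), round(shared, 1))
```

A signature needs only the neighbour list of one vertex at a time, so all signatures can be built during a single sequential scan of the adjacency list. The set sizes are the vertex degrees, which fit in memory.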

21
Experimental Setting

Quad-Core AMD Opteron(tm) Processor 8356
128 GB memory
700 GB hard disk
OS: Windows Server 2003

22
Experimental Study

Comparison with CSV on Stock Market Dataset

23
Convergence

Dataset: Flickr graph (1.7 million vertices and 22.6 million edges)
Running time per iteration is between 55 minutes and 1 hour.

24
StreamDN Performance on Flickr Dataset

StreamDN over-estimates relative to the BiTriDN algorithm's results by 72 during the first 66 scans.
StreamDN can handle the streaming setting with reasonable accuracy.

25
DN-graph Semantics in Various Domains
26
Future Work and Conclusion

DN-graph
DN-graph Mining Problem
Semi-streaming Approach
Future Work

27
Thank You! Questions?
28
Reference

[WSTT08] N. Wang, P. Srinivasan, K.-L. Tan, and A.K.H. Tung. CSV: visualizing and mining cohesive subgraphs. In SIGMOD'08, pages 445-458, 2008.
[WZTT11] N. Wang, J. Zhang, K.-L. Tan, and A.K.H. Tung. On triangulation-based dense neighbourhood graph discovery. In VLDB'11, volume 4, 2011.
[ABC04] P. Aloy, B. Böttcher, H. Ceulemans, C. Leutwein, C. Mellwig, S. Fischer, and A.C. Gavin. Structure-based assembly of protein complexes in yeast. Volume 303, pages 2026-2029, 2004.
[ATH03] I. Akihiro, W. Takashi, and M. Hiroshi. Complete mining of frequent patterns from graphs: Mining graph data. Volume 50, pages 321-354, Hingham, MA, USA, 2003. Kluwer Academic Publishers.
[BBP06] V. Boginski, S. Butenko, and P.M. Pardalos. Mining market data: a network approach. Computers and Operations Research, 33(11):3171-3184, 2006.
[GRT05] D. Gibson, K. Ravi, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB'05, pages 721-732, Trondheim, Norway, 2005.
[Bla94] R.E. Blake. Partitioning graph matching with constraints. Volume 27, pages 439-446, 1994.

29
Reference (cont.)

[DT99] L. Dehaspe and H. Toivonen. Discovery of frequent datalog patterns. Data Mining and Knowledge Discovery, 3:7-36, 1999.
[HCD94] L. Holder, D. Cook, and S. Djoko. Substructure discovery in the SUBDUE system. In Proceedings of the Workshop on Knowledge Discovery in Databases, pages 169-180, 1994.
[MARW90] E.M. Mitchell, P.J. Artymiuk, D.W. Rice, and P. Willett. Use of techniques derived from graph theory to compare secondary structure motifs in proteins. Journal of Molecular Biology, 212:151-166, 1990.
[MK01] K. Michihiro and G. Karypis. Frequent subgraph discovery. In ICDM'01, pages 313-320, 2001.
[RRRT99] K. Ravi, R. Prabhakar, R. Sridhar, and A. Tomkins. Trawling the web for emerging cyber-communities. In Computer Networks, pages 1481-1493, 1999.
[SK98] A. Srivastav and W. Katja. Finding dense subgraphs with semidefinite programming. In APPROX'98, pages 181-191, London, UK, 1998. Springer-Verlag.
[ZWZK] Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In KDD'06, Philadelphia, USA.