On Triangulation-based Dense Neighbourhood Graph Discovery

On Triangulation-based Dense Neighbourhood Graph Discovery. Nan Wang, Jingbo Zhang, Kian-Lee Tan, Anthony K. H. Tung. School of Computing, National University of Singapore.

Number of Views:106
Avg rating:3.0/5.0
Slides: 33
Provided by: suchen
Learn more at: http://www.vldb.org
1
On Triangulation-based Dense Neighbourhood Graph Discovery
Nan Wang, Jingbo Zhang, Kian-Lee Tan, Anthony K. H. Tung
School of Computing, National University of Singapore
2
Outline
  • Motivation
  • Related Work
  • Terms Definition
  • Triangulation based DN-graph mining
  • Semi-streaming DN-graph model
  • Experimental Study
  • Future Work and Conclusion

3
Motivation
  • Define a dense graph pattern from a perspective
    that considers both the size of the substructure
    and the minimum level of interaction between
    vertices.
  • Locate dense patterns in large-scale graphs,
    where mining is otherwise infeasible under
    restricted resources.

4
Related Work
  • Other Dense Patterns
      • Clique/Quasi-Clique
      • High Degree Patterns
      • Dense Bipartite Patterns
      • Heavy Patterns
  • Triangle Counting
  • CSV
      • Density-based closed clique discovery with a
        linear visualization.

5
Terms Definition
6
Terms Definition (contd)
7
DN-graph
Proof
8
Restrictions on the minimal size of the shared
neighborhood
9
DN-graph and Other Dense Patterns
DN-Graph
10
DN-graph and Closed Clique
Proof
11
Computation Bottleneck in DN-graph Mining
Most sub-graphs are not DN-graphs.
Most of these operations are redundant.
12
How to tackle the bottleneck ?
  • Reduce the number of joins
  • Local maximal feature: two DN-graphs share no
    edge.
  • All edges that share common vertices and have
    locally maximal λ values make up the DN-graph.
  • Locate DN-graphs using the λ(e) value
  • All edges within a DN-graph have equal λ(e),
    denoted λmax.
  • All edges connecting to neighboring vertices have
    a smaller λ value: λ(e) = λ(u,v) < λmax, where u
    is not in G and v is in G.
  • Use approximation methods to compute λ(e)
    efficiently

13
 
14
Graph Triangulation
  • Given a graph triangle, the upper bound of the
    other two edges can be used to tighten the
    density estimation of the third edge.

[Figure: triangle on vertices u, v, w; edges (w,v) and (u,w) are labeled 3 and edge (u,v) is labeled 5.]
15
Triangulation Based DN-graph Mining
  • DN-graph Mining Algorithm
  • Step One: Sort vertices according to their
    degrees.
  • Step Two: Generate triangles in a streaming
    fashion.
  • Step Three: Obtain the local density information
    gradually along the triangle streams.
  • Initial upper bound: TC(e), the number of
    triangles an edge e participates in.
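The initial upper bound TC(e) can be sketched in a few lines. This is a minimal in-memory illustration only: it skips the degree ordering and the streaming triangle generation from the steps above, and the function name `triangle_counts` and the toy graph are assumptions made for the example, not part of the slides.

```python
def triangle_counts(edges):
    """Return {frozenset({u, v}): TC(e)} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # A vertex w closes a triangle over (u, v) exactly when it is a
    # neighbour of both endpoints, so TC(e) is the common-neighbour count.
    return {frozenset((u, v)): len(adj[u] & adj[v]) for u, v in edges}


# Hypothetical toy graph: a 4-clique {1, 2, 3, 4} plus a pendant edge (4, 5).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
tc = triangle_counts(edges)
print(tc[frozenset((1, 2))], tc[frozenset((4, 5))])
```

Every clique edge here lies in two triangles, while the pendant edge lies in none, so its bound starts at zero.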

16
Counting of Supporting Nodes
[Figure: counting supporting nodes on a graph with vertices a, b, and n1 to n4 and numeric edge labels; one neighbour is marked "Not Supporting Node".]
17
 
18
Convergence
Panels: Initialization, First Iteration, Second Iteration, Converge
[Figure: worked example on a graph with vertices V1 to V6 and a per-edge table, Edge(e), covering V1V2, V1V3, V2V3, V2V4, V2V5, V2V6, V3V4, V3V5, V3V6, V4V6, and V5V6.]
  • Annotations: Two Support Vertices; One Support Vertex.
  • The values for V2V3, V2V6, and V3V6 each decrease by one.
  • The local maximal neighborhood size is 2.
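The iterate-until-convergence idea can be sketched as follows. The exact supporting-node rule is not recoverable from the transcript, so this sketch ASSUMES one plausible reading: a common neighbour w supports edge (u, v) at level k when both other triangle edges, (u, w) and (v, w), currently have bounds of at least k, and an edge with fewer than k supporters has its bound lowered by one. The name `refine_bounds` and the toy graph are hypothetical.

```python
def refine_bounds(edges):
    """Iteratively tighten per-edge upper bounds until nothing changes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def key(a, b):
        return frozenset((a, b))

    # Initialization: the triangle count of each edge.
    bound = {key(u, v): len(adj[u] & adj[v]) for u, v in edges}

    changed = True
    while changed:
        changed = False
        for u, v in edges:
            k = bound[key(u, v)]
            # ASSUMED rule: w supports (u, v) only if both other
            # edges of the triangle have a bound of at least k.
            support = sum(
                1
                for w in adj[u] & adj[v]
                if bound[key(u, w)] >= k and bound[key(v, w)] >= k
            )
            if support < k:
                bound[key(u, v)] = k - 1  # decrease by one
                changed = True
    return bound


# Hypothetical toy graph: 4-clique {1, 2, 3, 4}, plus vertex 5 joined to 3 and 4.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
bound = refine_bounds(edges)
```

Edge (3, 4) starts with three triangles, but none of its common neighbours supports a level of 3, so its bound drops to 2 and then stabilizes. The loop always terminates because bounds only decrease and never go below zero.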
19
Semi-Streaming Graph Model
  • Graph vertices fit into main memory, while edges
    reside in secondary storage in the form of an
    adjacency list.
  • Primary storage (i.e. memory) allows random
    access; secondary storage allows only sequential
    access.
  • A feasible solution for a streaming graph
    G(V,E) should not exceed log |V| scans of G's
    adjacency list.
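The access pattern this model allows can be illustrated with a single sequential pass that keeps only per-vertex state in memory. This is a sketch under stated assumptions: the one-line-per-vertex file format, the `scan_degrees` helper, and the choice of base 2 for the logarithm in the scan budget are all hypothetical, since the slides do not specify them.

```python
import io
import math


def scan_degrees(adjacency_file):
    """One sequential scan; only per-vertex counters are held in memory."""
    degree = {}
    adjacency_file.seek(0)  # each scan starts again from the beginning
    for line in adjacency_file:
        if not line.strip():
            continue
        vertex, *neighbours = line.split()
        degree[vertex] = len(neighbours)
    return degree


# Assumed on-disk format: "vertex neighbour neighbour ..." per line.
disk = io.StringIO("a b c\nb a c\nc a b d\nd c\n")
degree = scan_degrees(disk)

# Scan budget of log |V| passes, assuming base 2.
scan_budget = math.ceil(math.log2(len(degree)))
print(degree, scan_budget)
```

The edge data is never indexed or revisited out of order; any algorithm in this model has to be phrased as a small number of such passes.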

20
DN-graph mining in semi-streaming model
  • Estimate the shared neighbor size using the
    min-wise independent set property.
  • Min-wise independent set property: given two sets
    A, B over a universe X with a total order, and a
    uniformly chosen permutation p over X, the
    probability that min(p(A)) = min(p(B)) equals the
    Jaccard coefficient
    J(A, B) = |n(A) ∩ n(B)| / |n(A) ∪ n(B)|.
  • We can use this to estimate the shared neighbor
    size |n(A) ∩ n(B)|.
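A small sketch of this estimator, treating A and B directly as the two neighbour sets. It draws explicit random permutations of a tiny universe, which is only practical for illustration; a streaming implementation would use hash functions instead. The function names, the number of permutations, and the example sets are assumptions. The last step uses the identity |A ∩ B| = J (|A| + |B|) / (1 + J), which follows from |A ∪ B| = |A| + |B| − |A ∩ B|.

```python
import random


def minwise_jaccard(a, b, universe, num_perms=1000, seed=0):
    """Estimate J(a, b) as the fraction of permutations whose minima agree."""
    rng = random.Random(seed)
    items = list(universe)
    matches = 0
    for _ in range(num_perms):
        rng.shuffle(items)  # a uniformly random permutation of the universe
        rank = {x: i for i, x in enumerate(items)}
        if min(rank[x] for x in a) == min(rank[x] for x in b):
            matches += 1
    return matches / num_perms


def shared_size(jaccard, size_a, size_b):
    """Recover |A ∩ B| from J and the two set sizes."""
    return jaccard * (size_a + size_b) / (1 + jaccard)


# Hypothetical neighbour sets: true overlap is 20, true Jaccard is 0.5.
a = set(range(0, 30))
b = set(range(10, 40))
j_hat = minwise_jaccard(a, b, universe=range(50))
size_hat = shared_size(j_hat, len(a), len(b))
print(round(j_hat, 3), round(size_hat, 1))
```

The set sizes are cheap to maintain (they are vertex degrees), so only the Jaccard coefficient needs to be estimated from the stream.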

21
Experimental Setting
  • Quad-Core AMD Opteron(tm) processor 8356
  • 128GB memory
  • 700 GB hard disk
  • OS: Windows Server 2003

22
Experimental Study
  • Comparison with CSV on Stock Market Dataset

23
Convergence
  • Dataset: Flickr graph (1.7 million vertices and
    22.6 million edges)
  • Running time per iteration is between 55 minutes
    and 1 hour.

24
StreamDN Performance on Flickr Dataset
  • StreamDN over-estimates relative to the BiTriDN
    algorithm's results by 72 during the first 66
    scans.
  • StreamDN can handle the streaming setting with
    reasonable accuracy.

25
DN-graph Semantics in Various Domain
26
Future work and Conclusion
  • DN-graph
  • DN-graph Mining Problem
  • Semi-streaming Approach
  • Future Work

27
Thank You. Questions?
28
Reference
  • [WSTT08] N. Wang, P. Srinivasan, K.-L. Tan, and
    A.K.H. Tung. CSV: visualizing and mining cohesive
    subgraphs. In SIGMOD'08, pages 445-458, 2008.
  • [WZTT11] N. Wang, J. Zhang, K.-L. Tan, and A.K.H.
    Tung. On triangulation-based dense neighbourhood
    graph discovery. In VLDB'11, volume 4, 2011.
  • [ABC04] P. Aloy, B. Böttcher, H. Ceulemans, C.
    Leutwein, C. Mellwig, S. Fischer, and A.C. Gavin.
    Structure-based assembly of protein complexes in
    yeast. Volume 303, pages 2026-2029, 2004.
  • [ATH03] I. Akihiro, W. Takashi, and M. Hiroshi.
    Complete mining of frequent patterns from graphs:
    Mining graph data. Volume 50, pages 321-354,
    Hingham, MA, USA, 2003. Kluwer Academic
    Publishers.
  • [BBP06] V. Boginski, S. Butenko, and P.M.
    Pardalos. Mining market data: a network approach.
    Computers and Operations Research,
    33(11):3171-3184, 2006.
  • [GRT05] D. Gibson, K. Ravi, and A. Tomkins.
    Discovering large dense subgraphs in massive
    graphs. In VLDB'05, pages 721-732, Trondheim,
    Norway, 2005.
  • [Bla94] R.E. Blake. Partitioning graph matching
    with constraints. Volume 27, pages 439-446, 1994.

29
Reference (cont.)
  • [DT99] L. Dehaspe and H. Toivonen. Discovery of
    frequent datalog patterns. Data Mining and
    Knowledge Discovery, 3(7-36), 1999.
  • [HCD94] L. Holder, D. Cook, and S. Djoko.
    Substructure discovery in the SUBDUE system. In
    Proceedings of the Workshop on Knowledge
    Discovery in Databases, pages 169-180, 1994.
  • [MARW90] E.M. Mitchell, P.J. Artymiuk, D.W. Rice,
    and P. Willett. Use of techniques derived from
    graph theory to compare secondary structure
    motifs in proteins. Journal of Molecular Biology,
    212:151-166, 1990.
  • [MK01] K. Michihiro and G. Karypis. Frequent
    subgraph discovery. In ICDM'01, pages 313-320,
    2001.
  • [RRRT99] K. Ravi, Prabhakar R., Sridhar R., and A.
    Tomkins. Trawling the web for emerging
    cyber-communities. In Computer Networks, pages
    1481-1493, 1999.
  • [SK98] A. Srivastav and W. Katja. Finding dense
    subgraphs with semidefinite programming. In
    APPROX'98, pages 181-191, London, UK, 1998.
    Springer-Verlag.
  • [ZWZK] Z. Zeng, J. Wang, L. Zhou, and G. Karypis.
    Coherent closed quasi-clique discovery from large
    dense graph databases. In KDD'06, Philadelphia,
    USA.

30
Proof: A DN-graph is a local maximum graph
31
Proof: DN-graph and Closed Clique