CLAN: An algorithm for mining closed cliques from large dense graph database

1
  • CLAN: An algorithm for mining closed cliques from
    large dense graph database
  • ICDE 2006

2
Outline
  • Introduction
  • Problem definition
  • CLAN
  • Empirical results
  • Conclusion

3
Introduction
  • Most previously proposed frequent graph mining
    algorithms mine the complete set of frequent (or
    closed) subgraphs without considering the
    underlying topology of these subgraphs.
  • Mining the complete set of all frequent subgraphs
    wastes considerable computing power and space.
  • Most clique-related research only studies mining
    clique patterns from a single graph (e.g., finding
    the maximum clique or enumerating all the cliques),
    which is different from the problem studied in this
    paper: mining frequent closed cliques from a set of
    graph transactions.
  • We develop a new algorithm, CLAN, to mine the
    frequent closed cliques.
  • Several effective pruning methods are proposed to
    prune the search space, while the clique closure
    checking scheme is used to remove the non-closed
    clique patterns.

4
Problem definition
5
Problem definition
6
Problem definition
  • D: a graph transaction database, which consists of
    a set of undirected graph transactions.
  • G: a graph transaction, a tuple $G = (V, E, L_V, F_V)$,
    where $V$ is the set of vertices, $E$ is the set of
    edges, $L_V$ is the set of vertex labels, and
    $F_V : V \to L_V$ maps the vertices to their labels.
  • C: a clique, a set of fully connected labeled
    vertices, denoted by a tuple $C = (V, L_V, F_V)$.
    A clique of size n is also called an n-clique.
  • A clique $C_1 = (V_1, L_{V_1}, F_{V_1})$ is isomorphic to
    another clique $C_2 = (V_2, L_{V_2}, F_{V_2})$ iff
    $|V_1| = |V_2|$ and there exists a bijection
    $f : V_1 \to V_2$ s.t. $F_{V_1}(v) = F_{V_2}(f(v))$ for
    every $v \in V_1$.
  • If a clique $C$ is isomorphic to a subgraph of
    another clique $C'$, $C$ is called a subclique of $C'$
    and $C'$ is called a superclique of $C$.
  • Embedding: if a fully connected subgraph g of a
    graph G is isomorphic to a clique C, we call g an
    embedding of C in G.

7
Problem definition
  • C is supported by a graph G if there is at least
    one embedding of C in G.
  • $\mathrm{sup}_D(C)$: the number of graph transactions
    in graph database D that support C.
  • If $\mathrm{sup}_D(C) \ge min\_sup$, C is called a
    frequent clique.
  • If there does not exist another clique $C'$ s.t.
    $C \sqsubset C'$ (C is a proper subclique of $C'$) and
    $\mathrm{sup}_D(C') = \mathrm{sup}_D(C)$, C is called a
    closed clique.
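The support definition above can be sketched by brute force. This is only an illustration, not the CLAN procedure: the two-graph database, the `make_graph` helper, and the dictionary layout are assumptions made for the example, and a clique is identified simply by the string of its vertex labels.

```python
from itertools import combinations

def make_graph(labels, edges):
    """Build an undirected labeled graph (hypothetical helper)."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {"labels": labels, "adj": adj}

def supports(graph, clique_labels):
    """True if the graph contains at least one embedding of the clique."""
    target = sorted(clique_labels)
    for vs in combinations(graph["labels"], len(target)):
        fully_connected = all(v in graph["adj"][u]
                              for u, v in combinations(vs, 2))
        if fully_connected and sorted(graph["labels"][v] for v in vs) == target:
            return True
    return False

def support(db, clique_labels):
    """Number of graph transactions in db that support the clique."""
    return sum(supports(g, clique_labels) for g in db)

# Made-up database with two graph transactions.
g1 = make_graph({1: "a", 2: "b", 3: "c", 4: "d"},
                [(1, 2), (1, 3), (2, 3), (3, 4)])
g2 = make_graph({1: "a", 2: "b", 3: "c"},
                [(1, 2), (1, 3), (2, 3)])
db = [g1, g2]

print(support(db, "abc"))  # 2
print(support(db, "cd"))   # 1
print(support(db, "abd"))  # 0
```

With min_sup = 2 on this toy input, abc is frequent, and ab is frequent but not closed because its superclique abc has the same support.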

fig1
8
CLAN
  • Canonical form
  • Definition 4.1 (Canonical form)
  • Given a clique $C$, its canonical form is defined
    as the minimum string among all its possible
    strings and is denoted by $CF_C$.
  • If the canonical forms of two cliques are
    identical, the cliques are isomorphic.
  • A string $S_a = a_1 a_2 \ldots a_n$ is called a
    substring of another string $S_b = b_1 b_2 \ldots b_m$
    (denoted by $S_a \sqsubseteq S_b$) if there exist
    integers $1 \le i_1 < i_2 < \ldots < i_n \le m$ s.t.
    $a_1 = b_{i_1}, \ldots, a_n = b_{i_n}$.
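A minimal sketch of these two definitions, assuming that a clique's "possible strings" are the orderings of its vertex labels (the slides do not spell this out). Under that assumption the canonical form is just the sorted label string. Note that the slide's "substring" allows gaps, so it is what is usually called a subsequence. The function names are illustrative.

```python
from itertools import permutations

def canonical_form(labels):
    """Minimum string over all orderings of the clique's vertex labels."""
    return min("".join(p) for p in permutations(labels))

def canonical_form_fast(labels):
    """Same result as canonical_form, obtained by sorting the labels."""
    return "".join(sorted(labels))

def is_substring(sa, sb):
    """Slide-style substring test: sa's symbols occur in sb in order, gaps allowed."""
    remaining = iter(sb)
    # 'ch in remaining' advances the iterator, so matches must occur in order.
    return all(ch in remaining for ch in sa)

print(canonical_form("dbca"))        # abcd
print(canonical_form_fast("dbca"))   # abcd
print(is_substring("abd", "abcd"))   # True
print(is_substring("ba", "abcd"))    # False
```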

9
CLAN
  • Lemma 4.1 (subclique relationship test)
  • Given any two cliques $C_a$ and $C_b$ with
    canonical forms $CF_{C_a}$ and $CF_{C_b}$, respectively,
    $C_a \sqsubseteq C_b$ holds iff $CF_{C_a} \sqsubseteq CF_{C_b}$ holds.
  • The subclique relationship between two cliques
    can be converted to a substring relationship
    between their corresponding canonical forms.
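The lemma can be checked exhaustively on small cases. The sketch below assumes that a clique is fully described by the multiset of its vertex labels (every vertex pair is connected, so only labels matter) and that the canonical form is the sorted label string. It compares multiset inclusion against the substring test for all cliques of up to four vertices over the labels a, b, c. This is a sanity check under those assumptions, not a proof.

```python
from collections import Counter
from itertools import combinations_with_replacement

def canonical_form(labels):
    return "".join(sorted(labels))

def is_substring(sa, sb):
    """Symbols of sa occur in sb in order, gaps allowed."""
    remaining = iter(sb)
    return all(ch in remaining for ch in sa)

def is_subclique(labels_a, labels_b):
    """Subclique test on label multisets."""
    ca, cb = Counter(labels_a), Counter(labels_b)
    return all(cb[label] >= n for label, n in ca.items())

# Every label multiset of size 1..4 over {a, b, c}.
cliques = ["".join(c)
           for k in range(1, 5)
           for c in combinations_with_replacement("abc", k)]

mismatches = [(a, b) for a in cliques for b in cliques
              if is_subclique(a, b)
              != is_substring(canonical_form(a), canonical_form(b))]
print(len(mismatches))  # 0
```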

10
CLAN
  • Lattice-like structure
  • Node notation: canonical form:support
  • An edge means a direct subclique relationship.
  • Mining frequent cliques becomes a matter of
    traversing the lattice-like structure to enumerate
    frequent cliques
  • DFS
  • a:2, ab:2, abc:2, abcd:2, abd:2,
  • abcd:2, ac:2, abc:2, abcd:2, ...
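To see why a plain DFS over the lattice is wasteful, the sketch below walks it without remembering visited nodes. It uses the 19 frequent cliques listed for the running example in these slides, and assumes that a direct superclique is obtained by adding one label and re-sorting. The function and variable names are illustrative.

```python
# The 19 frequent cliques of the running example, by canonical form.
FREQUENT = {"a", "ab", "abc", "abcd", "abd", "ac", "acd", "ad", "b", "bc",
            "bcd", "bd", "bde", "be", "c", "cd", "d", "de", "e"}
ALPHABET = "abcde"

def naive_dfs(cf, visits):
    """Visit cf, then every direct frequent superclique, with no pruning."""
    visits.append(cf)
    for label in ALPHABET:
        bigger = "".join(sorted(cf + label))
        if bigger in FREQUENT:
            naive_dfs(bigger, visits)

visits = []
for label in ALPHABET:
    naive_dfs(label, visits)

print(visits[:9])
# ['a', 'ab', 'abc', 'abcd', 'abd', 'abcd', 'ac', 'abc', 'abcd']
print(len(visits), len(set(visits)))  # 75 19
```

The first nine visits reproduce the sequence on the slide, and the 19 distinct cliques are reached 75 times in total, which is the redundancy the pruning methods aim to remove.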

11
CLAN
  • Structural redundancy pruning
  • Lemma 4.2 (Prefix closure property of canonical
    form)
  • Given any clique $C$ and its canonical form $CF_C$,
    any non-empty prefix of $CF_C$ represents the
    canonical form of a certain clique.
  • Example
  • For the current prefix clique $C$ with canonical
    form $CF_C$, let the last vertex label of $CF_C$ be c.
  • To grow C in order to mine larger cliques, we
    require that C be extended only with vertices whose
    labels are lexicographically no smaller than c.
  • If the current prefix clique is ac:2, we can
    only grow it with vertices whose labels are from
    {c, d, e}.
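The extension rule can be sketched as a small embedding-based miner. This is a simplified illustration, not the CLAN algorithm itself: it has no closure checking and no low-degree pruning. The toy database is invented (two identical graphs whose maximal cliques are abcd and bde), and `make_graph` and `mine_frequent_cliques` are hypothetical names.

```python
from collections import defaultdict

def make_graph(labels, edges):
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {"labels": labels, "adj": adj}

def mine_frequent_cliques(db, min_sup):
    """Return [(canonical_form, support)] in depth-first order."""
    results = []

    def report_and_grow(by_label, prefix):
        for label in sorted(by_label):
            sup = len({gi for gi, _ in by_label[label]})
            if sup >= min_sup:
                results.append((prefix + label, sup))
                grow(prefix + label, by_label[label])

    def grow(cf, embeddings):
        by_label = defaultdict(set)
        for gi, vs in embeddings:
            g = db[gi]
            # Vertices adjacent to every vertex of this embedding.
            candidates = set.intersection(*(g["adj"][v] for v in vs))
            for v in candidates:
                label = g["labels"][v]
                if label >= cf[-1]:  # only labels no smaller than the last one
                    by_label[label].add((gi, vs | {v}))
        report_and_grow(by_label, cf)

    singles = defaultdict(set)
    for gi, g in enumerate(db):
        for v, label in g["labels"].items():
            singles[label].add((gi, frozenset([v])))
    report_and_grow(singles, "")
    return results

# Made-up database: two copies of a graph with cliques {a,b,c,d} and {b,d,e}.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (2, 5), (4, 5)]
labels = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
db = [make_graph(labels, edges), make_graph(labels, edges)]

found = mine_frequent_cliques(db, min_sup=2)
print([cf for cf, _ in found])
```

On this input each frequent clique is reported exactly once, because a clique such as abd is never extended with the smaller label c.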

12
CLAN
  • The 19 frequent cliques in the running example will
    be mined in this order:
  • a:2, ab:2, abc:2, abcd:2, abd:2, ac:2, acd:2,
    ad:2, b:2, bc:2, bcd:2, bd:2, bde:2, be:2, c:2,
    cd:2, d:2, de:2, e:2
  • Pseudo low-degree vertex pruning
  • Observation 4.1 (Low-degree vertex pruning)
  • No vertex with a degree lower than (k-1) can be
    contained in a k-clique.
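Observation 4.1 on its own can be sketched as a degree filter. The version below applies the observation repeatedly, since removing one vertex can push a neighbor below the bound. That repetition is a design choice of this example, and the "pseudo" variant named on the slide is not reproduced here. The adjacency dictionary is a made-up graph.

```python
def prune_low_degree(adj, k):
    """Drop vertices that cannot belong to any k-clique (degree < k - 1)."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    queue = [v for v, ns in adj.items() if len(ns) < k - 1]
    while queue:
        v = queue.pop()
        if v not in adj:          # already removed earlier
            continue
        for u in adj.pop(v):
            adj[u].discard(v)
            if len(adj[u]) < k - 1:
                queue.append(u)
    return adj

# Made-up graph: a 4-clique {1, 2, 3, 4} plus vertex 5 attached to 2 and 4.
adj = {1: {2, 3, 4}, 2: {1, 3, 4, 5}, 3: {1, 2, 4},
       4: {1, 2, 3, 5}, 5: {2, 4}}

print(sorted(prune_low_degree(adj, 4)))  # [1, 2, 3, 4]
print(sorted(prune_low_degree(adj, 5)))  # []
```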

13
CLAN
  • Clique closure checking scheme
  • If there exists any vertex with a label $\beta$
    that can be used to grow the prefix k-clique $C$ into
    a (k+1)-clique $C'$ s.t.
    $\mathrm{sup}_D(C') = \mathrm{sup}_D(C)$, $C$ must be
    non-closed.
  • $\beta$ is called a new extension vertex if
    $\beta \ge a_k$, or an old extension vertex if
    $\beta < a_k$, where $a_k$ is the last label of $CF_C$.
  • Lemma 4.3 (Clique closure checking)
  • If there exists neither a new extension vertex nor
    an old extension vertex w.r.t. a prefix k-clique C,
    C must be closed; otherwise C must be non-closed.
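A minimal sketch of the closure test, assuming a complete table of frequent cliques and their supports is already available. The slides perform the check against embeddings during mining, so this table-lookup version only illustrates the criterion. The support table holds the 19 cliques of the running example, and the helper names are illustrative.

```python
from collections import Counter

# canonical form -> support for the running example's frequent cliques
SUPPORTS = {cf: 2 for cf in [
    "a", "ab", "abc", "abcd", "abd", "ac", "acd", "ad", "b", "bc",
    "bcd", "bd", "bde", "be", "c", "cd", "d", "de", "e"]}

def is_substring(sa, sb):
    remaining = iter(sb)
    return all(ch in remaining for ch in sa)

def extension_labels(cf, supports):
    """Labels that grow cf into a (k+1)-clique with the same support,
    split into new (>= last label of cf) and old (< last label of cf)."""
    new, old = [], []
    for big, sup_big in supports.items():
        if (len(big) == len(cf) + 1 and sup_big == supports[cf]
                and is_substring(cf, big)):
            (beta,) = (Counter(big) - Counter(cf)).elements()
            (new if beta >= cf[-1] else old).append(beta)
    return new, old

def is_closed(cf, supports):
    new, old = extension_labels(cf, supports)
    return not new and not old

print(extension_labels("bd", SUPPORTS))                  # (['e'], ['a', 'c'])
print([cf for cf in SUPPORTS if is_closed(cf, SUPPORTS)])  # ['abcd', 'bde']
```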

fig4
14
CLAN
  • Let the current prefix clique be $C$, the number of
    vertices in $C$ be k,
  • and its canonical form be $CF_C$.
  • Assume there are in total m embeddings of C in the
    graph database D, and the set of embeddings is
    denoted by EMB(C).
  • For any embedding in EMB(C), a vertex v is
    an extension to that embedding if there exists an
    edge between v and every vertex in the embedding.
  • Let denote the set of extension vertices
    of the ith embedding .
  • A vertex v is called a fully
    connected vertex in , if there is an edge
    between vertex v
  • and any vertex in
  • The set of fully connected vertex labels in
    is denoted by
  • and let
  • If , and
    and we call a non-closed
    extension vertex label w.r.t. clique C.

15
CLAN
16
Empirical results
  • The stock market data w.r.t. a certain period of
    time can be converted to a graph based on the
    cross correlations of price fluctuations.
  • Each stock is a vertex whose label is the
    corresponding stock index
  • If the correlation coefficient between two stocks
    reaches a given threshold, there is an edge
    between them.
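The conversion can be sketched as follows. The ticker names, the price-change series, and the 0.9 threshold are all made up for illustration, and the Pearson coefficient is assumed as the correlation measure. The slides do not give the threshold or the exact measure used in the experiments.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series with non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def correlation_graph(series, threshold):
    """Each stock is a vertex labeled by its name; an edge joins two stocks
    whose correlation is at least the threshold."""
    return {(s, t)
            for s, t in combinations(sorted(series), 2)
            if pearson(series[s], series[t]) >= threshold}

# Hypothetical price fluctuations over one period.
series = {
    "AAA": [1.0, 2.0, 3.0, 4.0],
    "BBB": [2.0, 4.0, 6.0, 8.1],
    "CCC": [4.0, 3.0, 2.0, 1.0],
}

print(correlation_graph(series, threshold=0.9))  # {('AAA', 'BBB')}
```

Building one such graph per time period would yield a set of graph transactions over the same vertex labels.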

17
Empirical results
18
Empirical results
19
Empirical results
20
Conclusion
  • CLAN is efficient for large dense graph databases
  • Clique closure checking scheme
  • Bipartite?