New Algorithms for Enumerating All Maximal Cliques

1
New Algorithms for Enumerating All Maximal Cliques
  • Kazuhisa Makino (Osaka University, JAPAN)
  • Takeaki Uno (National Institute of Informatics, JAPAN)
  • 9/Jul/2004, SWAT 2004

2
Background
  • Recently, enumeration algorithms have become an interesting topic
  • There are still many nice unsolved problems
  • (unlike in ordinary discrete algorithms)
  • The recent increase in computer power makes
  • many enumeration problems practically solvable
  • ⇒ many applications have been appearing,
  • such as genome analysis, data mining, clustering, and so on
  • Some (theoretical) algorithms use enumeration as a subroutine
  • (e.g., recognition of perfect graphs)

3
Background (cont.)
  • My institute has 100 researchers in informatics
  • At least 5 of them (independently) use
    implementations of enumeration algorithms
  • Suppose that there are 100,000 researchers in
    informatics in the world

⇒ 5,000 researchers use enumeration algorithms ???
4
Problems and Results
  • Problem 1: for a given graph G = (V, E),
  • enumerate all maximal cliques in G
  • Problem 2: for a given bipartite graph G = (V1 ∪ V2, E),
  • enumerate all maximal bipartite cliques in G
  • (Problem 2 is a special case of Problem 1)
  • We propose algorithms for solving these problems
  • that reduce the time complexity in dense cases and
    sparse cases
  • Computational experiments on random graphs and
    real-world data
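
To make Problem 1 concrete, here is a minimal brute-force sketch in Python. It is only a reference definition, not one of the proposed algorithms, and it takes exponential time. The graph representation (a dict `adj` from each vertex to its neighbor set), the function names, and the example graph are assumptions made for illustration.

```python
from itertools import combinations

def is_clique(adj, S):
    """True if every pair of vertices in S is adjacent."""
    return all(u in adj[v] for u, v in combinations(S, 2))

def is_maximal_clique(adj, S):
    """True if S is a clique and no outside vertex is adjacent to all of S."""
    S = set(S)
    return is_clique(adj, S) and not any(S <= adj[u] for u in adj if u not in S)

def brute_force_maximal_cliques(adj):
    """Exponential-time reference: test every nonempty vertex subset."""
    vertices = sorted(adj)
    return [set(S)
            for r in range(1, len(vertices) + 1)
            for S in combinations(vertices, r)
            if is_maximal_clique(adj, S)]

# Hypothetical example: two triangles sharing vertex 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(brute_force_maximal_cliques(adj))
```

In this example the two triangles are the only maximal cliques; every smaller clique can be extended by another vertex.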

5
Difficulty
  • Consider branch-and-bound type enumeration:
  • divide the maximal cliques into two groups,
  • maximal cliques including v / not including v
  • If a group includes no maximal clique ⇒ cut
    off the branch
  • But finding a maximal clique that includes none of
    the vertices of a given set S
  • is NP-complete
  • ⇒ we cannot cut off subproblems (branches)
  • that include no maximal clique

[Figure: branching tree with branches v1 ∈ K / v1 ∉ K, then v2 ∈ K / v2 ∉ K]
6
Existing Studies and Ours
  • O(|V||E|): Tsukiyama, Ide, Ariyoshi & Shirakawa
  • O(|V||E|), lexicographic order: Johnson,
    Yannakakis & Papadimitriou
  • O(a(G)|E|): Chiba & Nishizeki
  • (a(G): arboricity of G, with m/(n−1) ≤ a(G)
    ≤ m^(1/2))
  • many heuristic algorithms in data mining, for the
    bipartite case

Ours:
  • O(|V|^2.376) (dense case)
  • O(Δ^4) (sparse case)
  • O((Δ')^4 + θ^3) (θ vertices have degree > Δ')
  • O(Δ^3) (bipartite case)
  • O(Δ^2) (bipartite case, using much memory)
7
Enumeration of Maximal Cliques
  • Improved version of the algorithm of Tsukiyama et al.
  • Idea: construct a route on all maximal cliques to
    be traversed
  • For a maximal clique K of G = (V, E):
  • C(K): lexicographically maximum maximal clique
    including K
  • K≤i: vertices of K with indices ≤ i
  • i(K): minimum index i s.t. C(K≤i) = C(K≤i+1) = … = K
  • parent of a maximal clique K: C(K≤i(K)−1)
  • the parent is lexicographically larger than K

Lexicographically larger: {1,2,3} > {1,2,4}, {1,3,6} > {1,4,5}
[Figure: example graph on vertices 1 to 11, marking a maximal clique K and its index i(K)]
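
The definitions of C(K), K≤i, i(K), and the parent can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: it assumes vertices are labelled 1, …, n by their index, that `adj` maps each vertex to its neighbor set, and that the argument of `parent_index` is a maximal clique. The function names and the example graph are hypothetical. C(K) is computed greedily, which yields the lexicographically maximum maximal clique under the ordering used on this slide (smaller indices first).

```python
def complete(adj, K):
    """C(K): scan vertices by increasing index, adding each one adjacent to all members."""
    clique = set(K)
    for v in sorted(adj):
        if v not in clique and clique <= adj[v]:
            clique.add(v)
    return frozenset(clique)

def prefix(K, i):
    """K<=i: the vertices of K with index at most i."""
    return frozenset(v for v in K if v <= i)

def parent_index(adj, K):
    """i(K): smallest i with C(K<=i) = K; 0 for the root K0 = C(empty set)."""
    K = frozenset(K)
    for i in [0] + sorted(adj):
        if complete(adj, prefix(K, i)) == K:
            return i

def parent(adj, K):
    """Parent of K: C(K<=i(K)-1); None for the root, which has no parent."""
    i = parent_index(adj, K)
    if i == 0:
        return None
    return complete(adj, prefix(K, i - 1))

# Hypothetical example: two triangles sharing vertex 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
K = {3, 4, 5}
print(parent_index(adj, K), sorted(parent(adj, K)))
```

Here C({3}) = {1,2,3} while C({3,4}) = {3,4,5}, so i(K) = 4 and the parent of {3,4,5} is {1,2,3}, which is lexicographically larger.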
8
Graph Representation of Relation
  • The parent-child relation is acyclic
  • ⇒ its graph representation forms a tree
    (the enumeration tree)

Visit all maximal cliques by depth-first search
⇒ need to find the children of a maximal clique
9
Child of Maximal Clique
  • Γ(vi): vertices adjacent to vi
  • Ki = C((K≤i ∩ Γ(vi)) ∪ {vi})
  • H is a child of K only if H = Ki for some
    i > i(K)
  • (H = Ki is a child of K if the parent of Ki is K)
  • i(Ki) = i

Construct Ki in O(|E|) time
Construct its parent in O(|E|) time (O(Δ^2) time)
For i = i(K)+1, …, |V|: O(|V||E|) time in total
⇒ enumeration in O(|V||E|) time per maximal clique
[Figure: example graph on vertices 1 to 11 with a maximal clique K, i(K) = 6]
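
Putting the pieces together, the traversal of the enumeration tree can be sketched as follows. This is a straightforward Python sketch of the basic O(|V||E|)-style scheme on this slide, not the improved algorithms; it recomputes the parent of each candidate Ki from the definition. Vertices are assumed to be labelled 1, …, n by index, `adj` maps each vertex to its neighbor set, and all function names are illustrative.

```python
def complete(adj, K):
    """C(K): greedy extension of clique K, scanning vertices by increasing index."""
    clique = set(K)
    for v in sorted(adj):
        if v not in clique and clique <= adj[v]:
            clique.add(v)
    return frozenset(clique)

def prefix(K, i):
    """K<=i: the vertices of K with index at most i."""
    return frozenset(v for v in K if v <= i)

def parent_index(adj, K):
    """i(K): smallest i with C(K<=i) = K; 0 for the root C(empty set)."""
    for i in [0] + sorted(adj):
        if complete(adj, prefix(K, i)) == K:
            return i

def parent(adj, K):
    """C(K<=i(K)-1); only called on non-root maximal cliques below."""
    return complete(adj, prefix(K, parent_index(adj, K) - 1))

def children(adj, K):
    """Yield each Ki (i > i(K), vi not in K) whose parent is K and whose index is i."""
    i_K = parent_index(adj, K)
    for i in sorted(adj):
        if i <= i_K or i in K:
            continue
        H = complete(adj, (prefix(K, i) & adj[i]) | {i})
        if parent_index(adj, H) == i and parent(adj, H) == K:
            yield H

def enumerate_maximal_cliques(adj):
    """Depth-first traversal of the enumeration tree, starting at the root."""
    stack = [complete(adj, set())]
    while stack:
        K = stack.pop()
        yield K
        stack.extend(children(adj, K))

# Hypothetical example: a cycle on 5 vertices; its maximal cliques are its 5 edges.
adj = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 1}}
for K in enumerate_maximal_cliques(adj):
    print(sorted(K))
```

Because every maximal clique other than the root has exactly one parent, no clique is output twice and no set of visited cliques has to be stored.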
10
Characterization of Child
  • The parent of Ki is K ⇔
  • (1) no vj, j < i, is adjacent to all vertices in
    (K≤i ∩ Γ(vi)) ∪ {vi}
  • (2) no vj, j < i, is adjacent to all vertices in
    (K≤i ∩ Γ(vi)) ∪ K≤j
  • (1) is not satisfied ⇒ Ki and the parent of Ki
    include some vj ∉ K
  • (2) is not satisfied ⇒ the parent of Ki includes
    some vj ∉ K

Example: K = {3,4,7,9}, K10 = {3,7,10}, K≤5 = {3,4}, K≤7 ∩ Γ(v10) = {3,7}
[Figure: vertices 3, 4, 7, 9, 10, highlighting (K≤10 ∩ Γ(v10)) ∪ {v10}]
11
Use of Matrix Multiplication
  • Check conditions (1) and (2) by matrix
    multiplication
  • (1) no vj, j < i, is adjacent to all vertices in
    (K≤i ∩ Γ(vi)) ∪ {vi}
  • ith row of the left matrix: (K≤i ∩ Γ(vi)) ∪ {vi}
  • jth column of the right matrix: Γ(vj)
  • (i, j) cell of the product: |((K≤i ∩ Γ(vi)) ∪ {vi}) ∩ Γ(vj)|

[Figure: matrix product; row i of the left matrix encodes (K≤i ∩ Γ(vi)) ∪ {vi}, column j of the right matrix encodes Γ(vj), and cell (i, j) of the product counts their common vertices]
Condition (2) can be checked in the same way
Checked in O(|V|^2.376) time ⇒ time complexity
is O(|V|^2.376) for each maximal clique
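
A small Python sketch of the counting idea for condition (1): vj is adjacent to all vertices of Si = (K≤i ∩ Γ(vi)) ∪ {vi} exactly when cell (i, j) of the product equals |Si|. The naive triple-loop product below only stands in for fast matrix multiplication, so it does not achieve the stated bound. The function name, the representation (`adj` maps each vertex 1, …, n to its neighbor set), and the example are assumptions; condition (2) is not covered.

```python
def condition1_by_matrix_product(adj, K):
    """For each vi not in K, report whether condition (1) holds, using one 0/1 matrix product."""
    V = sorted(adj)
    rows = [i for i in V if i not in K]
    # S[i] = (K<=i ∩ Gamma(vi)) ∪ {vi}
    S = {i: ({v for v in K if v <= i} & adj[i]) | {i} for i in rows}
    # Left matrix: the row for vi is the 0/1 indicator vector of S[i].
    left = [[1 if v in S[i] else 0 for v in V] for i in rows]
    # Right matrix: the column for vj is the 0/1 indicator vector of Gamma(vj).
    right = [[1 if u in adj[w] else 0 for w in V] for u in V]
    # Naive product; cell (i, j) equals |S[i] ∩ Gamma(vj)|.
    product = [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
               for row in left]
    result = {}
    for r, i in enumerate(rows):
        # Condition (1): no vj with j < i is adjacent to every vertex of S[i].
        result[i] = all(product[r][c] < len(S[i])
                        for c, j in enumerate(V) if j < i)
    return result

# Hypothetical example: two triangles sharing vertex 3, with K = {1, 2, 3}.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(condition1_by_matrix_product(adj, {1, 2, 3}))
```

In the example, condition (1) holds for i = 4 but fails for i = 5, because v4 is adjacent to both vertices of {3, 5}.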
12
Sparse Cases
  • If vi is adjacent to no vertex in K
  • ⇒ Ki = C((K≤i ∩ Γ(vi)) ∪ {vi}) = C({vi})
  • ⇒ parent of Ki = C(C({vi})≤i−1)
  • If C({vi})≤i−1 = ∅, the parent of Ki is K0 = C(∅)
  • If C({vi})≤i−1 ≠ ∅, (1) is not satisfied
  • ⇒ if K ≠ K0, Ki is not a child of K
  • Since |K| ≤ Δ+1, at most Δ(Δ+1) vertices are
    adjacent to K
  • Each Ki takes O(Δ^2) time to construct the
    parent

Δ: max. degree
⇒ O(Δ^4) per maximal clique
13
Bipartite Clique
  • Enumerate maximal bipartite cliques in G = (V1
    ∪ V2, E)
  • (= maximal cliques in the graph (V1 ∪ V2, E ∪ (V1 × V1)
    ∪ (V2 × V2)))
  • ⇒ enumerated in O(|V|^2.376) time for each
  • But a sparse bipartite graph becomes dense under this transformation
  • ⇒ need some improvements for sparse cases

[Figure: bipartite graph with vertex sets V1 and V2]
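
The reduction on this slide can be sketched in Python as follows. The transformed graph is called G' here, a name that does not appear in the slides; the helper names and the small example graph are likewise assumptions for illustration.

```python
from itertools import combinations

def augment(adj, V1, V2):
    """Adjacency of G' = (V1 ∪ V2, E ∪ V1×V1 ∪ V2×V2), without self-loops."""
    new_adj = {}
    for side in (V1, V2):
        for v in side:
            new_adj[v] = set(adj[v]) | (set(side) - {v})
    return new_adj

def is_clique(adj, S):
    """True if every pair of vertices in S is adjacent."""
    return all(u in adj[v] for u, v in combinations(S, 2))

def edge_count(adj):
    """Number of undirected edges."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2

# Hypothetical bipartite graph with V1 = {1, 2, 3} and V2 = {4, 5, 6}.
V1, V2 = {1, 2, 3}, {4, 5, 6}
adj = {1: {4, 5}, 2: {4, 5, 6}, 3: {6}, 4: {1, 2}, 5: {1, 2}, 6: {2, 3}}
g_prime = augment(adj, V1, V2)
print(edge_count(adj), edge_count(g_prime))
print(is_clique(g_prime, {1, 2, 4, 5}), is_clique(g_prime, {1, 3, 4}))
```

The bipartite clique ({1, 2}, {4, 5}) becomes an ordinary clique of G'. The edge counts also show why this reduction hurts in sparse cases: each side of G' is a complete graph, whatever the density of the input.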
14
Fast Construction of Ki
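  • For any maximal bipartite clique K:
  • K ∩ V2 = ∩_{v ∈ K ∩ V1} Γ(v)
  • K ∩ V1 = ∩_{v ∈ K ∩ V2} Γ(v)
  • Ki ∩ V1 for all i are computed in O(Δ^2) time
  • Ki for all i are computed in O(Δ^3) time

[Figure: bipartite example with V1 = {1, 2, 3, 4}, the neighbor sets Γ(1), …, Γ(4) in V2, the vertices v1, v2, v5, v6, and the cliques Kv1 and Kv6]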
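
The two identities above say that each side of a maximal bipartite clique is the common neighborhood of the other side. A minimal Python check of this property might look as follows; the function names and the example graph are hypothetical, and an empty side is treated by convention as having the whole opposite side as its common neighborhood.

```python
def common_neighbors(adj, S, other_side):
    """Intersection of Gamma(v) over v in S (the whole other side if S is empty)."""
    result = set(other_side)
    for v in S:
        result &= adj[v]
    return result

def is_maximal_biclique(adj, V1, V2, X, Y):
    """X ⊆ V1 and Y ⊆ V2 form a maximal bipartite clique iff each side determines the other."""
    return (set(Y) == common_neighbors(adj, X, V2)
            and set(X) == common_neighbors(adj, Y, V1))

# Hypothetical bipartite graph with V1 = {1, 2, 3} and V2 = {4, 5, 6}.
V1, V2 = {1, 2, 3}, {4, 5, 6}
adj = {1: {4, 5}, 2: {4, 5, 6}, 3: {6}, 4: {1, 2}, 5: {1, 2}, 6: {2, 3}}
print(is_maximal_biclique(adj, V1, V2, {1, 2}, {4, 5}))
print(is_maximal_biclique(adj, V1, V2, {1}, {4, 5}))
```

The second call fails because vertex 2 is also adjacent to both 4 and 5, so ({1}, {4, 5}) can be extended.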
15
Checking the Parent
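  • Assign small indices to V1, large indices to V2
  • ⇒ Ki is a child of K ⇔ (Ki)≤i−1 = K≤i−1
  • ⇒ checked in O(Δ) time

[Figure: index assignment; the vertices of V1 get indices 1, 2, 3, …, |V1|−1, |V1|, and the vertices of V2 get indices |V1|+1, |V1|+2, …, |V1|+|V2|]
Enumerated in O(Δ^3) time for each
O(Δ^2) by using memory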
16
Computational Experiments
  • For randomly generated graphs:
  • vertex vi is connected to each of the vertices from
    vi−r to vi+r with probability 1/2

Faster than Tsukiyama's algorithm
Computation time is linear in the maximum degree
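
A generator for this kind of random instance could be sketched as below. The slides do not give the authors' generator, so the function name, the `seed` parameter, and the choice to draw each pair once are assumptions.

```python
import random

def random_band_graph(n, r, p=0.5, seed=None):
    """Vertices 1..n; each pair (i, j) with 0 < |i - j| <= r is joined with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(1, n + 1)}
    for i in range(1, n + 1):
        # Only look forward so that every unordered pair is drawn exactly once.
        for j in range(i + 1, min(i + r, n) + 1):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

adj = random_band_graph(n=50, r=3, seed=1)
print(max(len(neighbors) for neighbors in adj.values()))
```

With this construction the maximum degree is at most 2r, so varying r controls the degree parameter against which the running time is measured.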
17
Benchmark Problems
  • Problem of finding frequent closed item sets
    from a database
  • ⇔ equivalent to maximal bipartite clique
    enumeration
  • Used in KDD Cup (data mining algorithm
    competition)
  • BMS-WebView1 (from Web-log data)
  • |V| = 60,000, ave. degree 2.5
  • BMS-WebView2 (from Web-log data)
  • |V| = 80,000, ave. degree 5
  • BMS-POS (from POS data)
  • |V| = 510,000, ave. degree 6
  • IBM-Artificial (artificial data)
  • |V| = 100,000, ave. degree 10

18
Results
19
Conclusion and Future Work
  • Proposed fast algorithms for enumerating
  • maximal cliques: O(|V|^2.376), O(Δ^4), O((Δ')^4
    + θ^3)
  • maximal bipartite cliques: O(|V|^2.376), O(Δ^3),
    O(Δ^2)
  • Examined benchmark problems of data mining,
  • and showed that our algorithm performs well
  • Future work:
  • Can we improve more? What is the difficulty?
  • Can we enumerate other maximal (minimal) graph
    objects?
  • Can we apply matrix multiplication to other
    enumeration problems?
  • What can be enumerated efficiently in practice?

20
Frequent Sets
  • Input graph:
  • an item and a customer are connected
  • iff the customer purchased the item
  • In a maximal bipartite clique:
  • customers have similar favorites
  • items are frequently purchased together
  • Agrawal et al. 96, Zaki et al. 02, Pei 00, Han
    00,

[Figure: bipartite graph between customers (customer1 to customer4) and items (beer, nappy, milk)]
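
The correspondence between closed item sets and maximal bipartite cliques can be illustrated with a short Python sketch. The purchase data below is made up (the edges of the figure are not recoverable from the transcript), and the function name is hypothetical.

```python
def closed_item_set(purchases, items):
    """Return (customers who bought all of `items`, items bought by all of those customers)."""
    items = set(items)
    customers = {c for c, bought in purchases.items() if items <= bought}
    # Start from every item in the database and intersect over the matching customers.
    closed = set().union(*purchases.values())
    for c in customers:
        closed &= purchases[c]
    return customers, closed

# Hypothetical purchase data (customer -> set of purchased items).
purchases = {
    "customer1": {"beer", "nappy"},
    "customer2": {"beer", "nappy", "milk"},
    "customer3": {"milk"},
    "customer4": {"beer", "nappy"},
}
customers, closed = closed_item_set(purchases, {"beer"})
print(sorted(customers), sorted(closed))
```

In this data every customer who bought beer also bought nappy, so the closed item set generated by {beer} is {beer, nappy}. Together with its three customers it forms a maximal bipartite clique, and the number of customers is the frequency of the item set.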
21
Few Large Degree Vertices
  • Very few vertices (denoted by T) have large
    degrees
  • Divide the maximal cliques into two groups:
  • (a) cliques not included in T
  • (b) cliques included in T
  • (a) can be enumerated in O(Δ^4) time
  • A maximal clique K in the graph induced by T is
  • a maximal clique of G ⇔ K is not included in any
    of (a)
  • ⇒ O(|T|^3) time for each

[Figure: vertices of small degree (< Δ) and the set T of large-degree vertices]
⇒ O(Δ^4 + |T|^3) per maximal clique
22
Avoid Duplications by Using Memory
  • We can avoid duplications by storing all maximal
    bipartite cliques
  • Since K ∩ V1 = Γ(K ∩ V2), it suffices to store K ∩ V1 for each K
  • 1. Get a K from memory that has not been processed yet
  • 2. Generate all Ki ∩ V1
  • 3. Store each Ki ∩ V1 if it is not in memory
  • 4. Go to 1 if some maximal clique is still unprocessed

Enumerated in O(Δ^2) time for each
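
The four steps above can be sketched in Python as follows. This is a simplified variant, not the authors' algorithm: instead of the index-restricted Ki of the slides, it generates a neighbor of K from every vertex u of V2 outside K by intersecting K ∩ V1 with Γ(u). That keeps the store-and-check structure but not the O(Δ^2) bound. Function names and the example graph are assumptions. As in the reduction to ordinary cliques, a clique with one empty side is reported when it is maximal.

```python
def common_neighbors(adj, S, other_side):
    """Intersection of Gamma(v) over v in S (the whole other side if S is empty)."""
    result = set(other_side)
    for v in S:
        result &= adj[v]
    return frozenset(result)

def enumerate_bicliques_with_memory(adj, V1, V2):
    """Yield (K ∩ V1, K ∩ V2) for every maximal bipartite clique K, storing only K ∩ V1."""
    root = frozenset(V1)
    memory = {root}            # every K ∩ V1 generated so far
    unprocessed = [root]       # stored sets whose neighbors were not generated yet
    while unprocessed:                         # step 4
        X = unprocessed.pop()                  # step 1
        Y = common_neighbors(adj, X, V2)       # K ∩ V2 is determined by K ∩ V1
        yield X, Y
        for u in V2:                           # step 2 (simplified generation rule)
            if u in Y:
                continue
            X_new = X & adj[u]
            if X_new not in memory:            # step 3
                memory.add(X_new)
                unprocessed.append(X_new)

# Hypothetical bipartite graph with V1 = {1, 2, 3} and V2 = {4, 5, 6}.
V1, V2 = {1, 2, 3}, {4, 5, 6}
adj = {1: {4, 5}, 2: {4, 5, 6}, 3: {6}, 4: {1, 2}, 5: {1, 2}, 6: {2, 3}}
for X, Y in enumerate_bicliques_with_memory(adj, V1, V2):
    print(sorted(X), sorted(Y))
```

The memory lookup replaces the parent check: a set is expanded only the first time it is generated, at the cost of storing every K ∩ V1.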