1
Frequent Subgraph Discovery
  • Michihiro Kuramochi and George Karypis
  • ICDM 2001

2
Outline
  • Introduction
  • FSG algo.
  • Canonical Labeling
  • Conclusion

3
Introduction
  • Most existing data mining algorithms assume that the data is
    represented as one of the following:
  • Transactions (sets of items)
  • Sequences of items or events
  • Multi-dimensional vectors
  • Time series
  • Scientific datasets with layers, hierarchy, geometry, and arbitrary
    relations cannot be accurately modeled using such frameworks
    (e.g., chemical compounds)

4
  • Graphs can accurately model and represent scientific datasets
  • Graphs are suitable for capturing arbitrary relations between the
    various elements

5
Example

6
FSG (Frequent Subgraph Discovery Algorithm)
  • Level-by-level approach: incremental on the number of edges of the
    frequent subgraphs (like Apriori)
  • Count the frequent single-edge and double-edge subgraphs
  • To find the frequent size-k subgraphs (k ≥ 3):
  • Candidate generation: join two size-(k-1) subgraphs that are
    similar to each other
  • Candidate pruning by the downward closure property
  • Frequency counting: check whether a subgraph is contained in a
    transaction
  • Repeat the steps for k = k + 1, increasing the size of the
    subgraphs by one edge
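
The level-by-level loop above can be sketched as follows. This is only a control-flow skeleton under assumptions: the function name `fsg_levels` and its four callable parameters (`generate`, `prune`, `support`, plus the seed list of frequent double-edge subgraphs) are hypothetical placeholders for the steps named on the slide, not the authors' implementation.

```python
def fsg_levels(seed_frequent, generate, prune, support, min_support):
    """Level-wise search skeleton (all helper callables are assumptions).

    seed_frequent: frequent 2-edge subgraphs, found by direct counting
    generate(prev): size-k candidates built from the frequent (k-1)-subgraphs
    prune(cand, prev): True if cand passes the downward closure check
    support(cand): number of transactions that contain cand
    """
    levels = {2: list(seed_frequent)}
    k = 3
    while levels[k - 1]:
        prev = levels[k - 1]
        # Candidate generation followed by candidate pruning.
        candidates = [c for c in generate(prev) if prune(c, prev)]
        # Frequency counting: keep only candidates that are frequent.
        levels[k] = [c for c in candidates if support(c) >= min_support]
        k += 1  # grow the subgraphs by one edge
    del levels[k - 1]  # the last level built is always empty
    return levels
```

The loop stops as soon as a level produces no frequent subgraph, which mirrors how Apriori terminates.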

7
FSG Approach: Candidate Generation
  • Generate a size-k subgraph by merging two size-(k-1) subgraphs
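
A deliberately simplified sketch of the join step: it assumes every vertex name is unique across the dataset, so a subgraph is just a frozenset of edges and two subgraphs are "similar" when they share k-2 edges. The general case needs isomorphism-aware matching and can yield several candidates per pair; the helper name `join` is hypothetical.

```python
def join(a, b):
    """Join two (k-1)-edge subgraphs given as frozensets of edges.

    Simplifying assumption: vertex names are globally unique, so two
    subgraphs can be compared edge by edge without an isomorphism test.
    Returns the k-edge union when the inputs share k-2 edges, else None.
    """
    shared = a & b
    if len(a) == len(b) and len(shared) == len(a) - 1:
        return a | b
    return None
```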

8
FSG Approach: Candidate Pruning
  • Pruning of size-k candidates
  • For all the (k-1)-subgraphs of a size-k candidate, check whether
    the downward closure property holds
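
The pruning check can be sketched like this, assuming a candidate is a frozenset of edges (2-tuples of vertex names) and that a `canon` function turning a subgraph into a hashable canonical key is supplied by the caller. Both helper names are hypothetical, and only connected (k-1)-subgraphs are checked.

```python
def is_connected(edges):
    """True if a non-empty collection of (u, v) edges forms one component."""
    edges = list(edges)
    if not edges:
        return False
    seen = set(edges[0])
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            # Pull in any edge that touches the component on exactly one side.
            if (u in seen) != (v in seen):
                seen.update((u, v))
                changed = True
    return all(u in seen for u, v in edges)


def survives_pruning(candidate_edges, frequent_keys, canon):
    """Keep a size-k candidate only if each connected (k-1)-edge
    subgraph obtained by dropping one edge is already known to be frequent."""
    for edge in candidate_edges:
        sub = candidate_edges - {edge}
        if is_connected(sub) and canon(sub) not in frequent_keys:
            return False
    return True
```

When vertex names are unique, `frozenset` itself can serve as `canon`; otherwise a canonical labeling is needed so that isomorphic subgraphs map to the same key.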

9
FSG Approach: Frequency Counting
  • For each candidate subgraph, scan every transaction graph and
    determine whether the candidate is contained in it, using subgraph
    isomorphism
  • In Apriori, frequency counting is performed substantially faster
    by building a hash-tree of candidate itemsets and scanning each
    transaction to determine which itemsets in the hash-tree it
    supports
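
For illustration, a small backtracking containment test. It assumes undirected graphs with vertex labels only (edge labels are ignored), each given as a dict of vertex labels plus a set of `frozenset` edges. The name `contains_subgraph` is hypothetical, and the search is exponential in the worst case.

```python
def contains_subgraph(p_labels, p_edges, t_labels, t_edges):
    """True if the pattern maps injectively into the target graph,
    preserving vertex labels and every pattern edge."""
    p_vertices = list(p_labels)

    def extend(mapping):
        if len(mapping) == len(p_vertices):
            return True
        u = p_vertices[len(mapping)]  # next pattern vertex to place
        for v in t_labels:
            if v in mapping.values() or t_labels[v] != p_labels[u]:
                continue
            # Each pattern edge from u to an already placed vertex must
            # also exist between the corresponding target vertices.
            ok = all(
                frozenset((v, mapping[w])) in t_edges
                for w in mapping
                if frozenset((u, w)) in p_edges
            )
            if ok:
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]  # backtrack
        return False

    return extend({})
```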

10
FSG Approach: Frequency Counting
  • Keep track of a TID list (the IDs of the transactions that contain
    it) for each frequent subgraph
  • Perform subgraph isomorphism only on the intersection of the TID
    lists of the parent frequent subgraphs of size k-1
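
A minimal sketch of the TID-list shortcut, assuming each parent's TID list is a Python set of transaction IDs and that `contains` is whatever subgraph isomorphism test is in use. The function name and parameters are hypothetical.

```python
def candidate_tid_list(candidate, parent_tids, transactions, contains):
    """Return the TID list of a size-k candidate.

    parent_tids: list of sets, one TID set per frequent (k-1)-subgraph parent
    transactions: dict mapping transaction ID -> transaction graph
    contains(candidate, graph): subgraph isomorphism test (assumed given)
    """
    # A transaction can only contain the candidate if it contains
    # every parent, so all other transactions are skipped.
    possible = set.intersection(*parent_tids)
    return {tid for tid in possible if contains(candidate, transactions[tid])}
```

The candidate's support is the size of the returned set, and the set becomes the TID list used at the next level.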

11
FSG Approach: Frequency Counting
  • Perform only subgraph_isomorphism(c, T1)

12
Key to Performance: Canonical Labeling
  • Given a graph, we want to find a unique order of
    vertices, by permuting rows and columns of its
    adjacency matrix

13
Canonical Labeling
  • Find the vertex order for which the adjacency matrix, read column
    by column, is lexicographically the largest
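
A brute-force sketch of this idea, assuming an undirected graph with vertex labels and unlabeled edges. It tries every vertex order, reads the upper triangle of the permuted adjacency matrix column by column, and keeps the largest code. The name `canonical_code` and the exact code layout (labels first, then matrix bits) are assumptions for illustration. The cost is factorial in the number of vertices, so this is only usable on very small graphs.

```python
from itertools import permutations


def canonical_code(labels, edges):
    """Return the lexicographically largest code over all vertex orders.

    labels: list of vertex labels, indexed by vertex id 0..n-1
    edges:  set of frozenset({u, v}) for an undirected graph
    """
    n = len(labels)
    best = None
    for order in permutations(range(n)):
        # Vertex labels in the chosen order lead the code.
        label_part = tuple(labels[v] for v in order)
        # Upper triangle of the permuted adjacency matrix, column by column.
        bits = tuple(
            1 if frozenset((order[i], order[j])) in edges else 0
            for j in range(1, n)
            for i in range(j)
        )
        code = (label_part, bits)
        if best is None or code > best:
            best = code
    return best
```

Two graphs are isomorphic exactly when their codes are equal, so the code can serve as a dictionary key for candidate subgraphs.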

14
Conclusion
  • Both candidate generation and frequency counting in FSG
    essentially come down to solving subgraph isomorphism
  • Canonical labeling can reduce the search space
  • Using TID lists reduces the number of subgraph isomorphism checks
  • FSG is suitable for sparse graph transactions