Top Down FP-Growth for Association Rule Mining - PowerPoint PPT Presentation

1
Top Down FP-Growth for Association Rule Mining
  • By
  • Ke Wang

2
Introduction
  • Classically, for rule A → B
  • support computed by count(AB)
  • frequent if it passes the minimum support threshold
  • confidence computed by count(AB) / count(A)
  • confident if it passes the minimum confidence threshold
  • How to mine association rules?
  • find all frequent patterns
  • generate rules from the frequent patterns
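The two measures above can be computed directly from a transaction database. A minimal Python sketch (the toy transactions and function names are illustrative, not from the slides):

```python
# Minimal sketch of support and confidence for a rule A -> B.
# The toy transactions below are illustrative, not from the slides.
transactions = [
    {"a", "b", "e"},
    {"b", "d"},
    {"b", "c"},
    {"a", "b", "d"},
]

def count(itemset, db):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def support(A, B, db):
    # support of A -> B: count(AB)
    return count(A | B, db)

def confidence(A, B, db):
    # confidence of A -> B: count(AB) / count(A)
    return count(A | B, db) / count(A, db)

A, B = {"a"}, {"b"}
print(support(A, B, transactions))     # 2 (transactions 1 and 4)
print(confidence(A, B, transactions))  # 2 / 2 = 1.0
```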

3
Introduction
  • Limitations of current research
  • uses a uniform minimum support threshold
  • uses only support as the pruning measure
  • Our contributions
  • improve efficiency
  • adopt multiple minimum supports
  • introduce confidence pruning

4
Related work -- Frequent pattern mining
  • Apriori algorithm
  • method: use the anti-monotone property of support
    for pruning, i.e.
  • if a length-k pattern is infrequent, its length-(k+1)
    super-patterns can never be frequent
  • FP-growth algorithm -- better than Apriori
  • method:
  • build an FP-tree to store the database
  • mine the FP-tree in bottom-up order
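The anti-monotone property gives Apriori its pruning step: a (k+1)-candidate is viable only if every k-subset is frequent. A small sketch (names and data are illustrative, not from the paper):

```python
from itertools import combinations

# Sketch of Apriori-style candidate pruning via the anti-monotone
# property: a (k+1)-candidate is kept only if every one of its
# k-item subsets is already known to be frequent.
def prune_candidates(candidates, frequent_k):
    kept = []
    for cand in candidates:
        subsets = (frozenset(s) for s in combinations(cand, len(cand) - 1))
        if all(s in frequent_k for s in subsets):
            kept.append(cand)
    return kept

frequent_2 = {frozenset(p) for p in [("a","b"), ("a","c"), ("b","c"), ("b","e")]}
cands = [frozenset({"a","b","c"}), frozenset({"a","b","e"})]
# {a,b,c} survives (all 2-subsets frequent); {a,b,e} is pruned
# because its subset {a,e} is infrequent.
print(prune_candidates(cands, frequent_2))
```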

5
Related work -- Association rule mining
  • Fast algorithms trying to guarantee completeness
    of frequent patterns
  • Parallel algorithms; association-rule-based
    query languages
  • Various association rule mining problems
  • multi-level, multi-dimension rules
  • constraints on specific items

6
TD-FP-Growth for frequent pattern mining
  • Similar tree structure to FP-growth
  • compressed tree to store the database
  • nodes on each path of the tree are globally
    ordered
  • Different mining method vs. FP-growth
  • FP-growth: bottom-up tree mining
  • TD-FP-Growth: top-down tree mining

7
TD-FP-Growth for frequent pattern mining
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}; minsup = 2
Construct an FP-tree
(tree figure omitted; header table H:)
entry value   count   side-link
a             3
b             3
c             3
e             3
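The construction above can be sketched as a prefix tree plus a header table of side-links. This is a simplified reading on the running example, not the authors' exact data structure:

```python
# Simplified FP-tree-style construction for the running example.
# The Node layout and header-table representation are assumptions,
# not the authors' exact data structure.
class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

transactions = [["b","e"], ["a","b","c","e"], ["b","c","e"], ["a","c","d"], ["a"]]
minsup = 2

# 1. Count items; keep frequent ones in a fixed global order.
counts = {}
for t in transactions:
    for i in t:
        counts[i] = counts.get(i, 0) + 1
order = [i for i in sorted(counts, key=counts.get, reverse=True)
         if counts[i] >= minsup]

# 2. Insert each transaction (frequent items only, globally ordered),
#    threading every new node onto its item's side-link list.
root = Node(None)
header = {i: [] for i in order}
for t in transactions:
    node = root
    for i in [x for x in order if x in t]:
        if i not in node.children:
            node.children[i] = Node(i)
            header[i].append(node.children[i])
        node = node.children[i]
        node.count += 1

# Side-link counts per item add up to the header-table counts (all 3 here).
print({i: sum(n.count for n in header[i]) for i in order})
```

Item d drops out because its count (1) is below minsup.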
8
TD-FP-Growth for frequent pattern mining
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}; minsup = 2
FP-growth: bottom-up mining
Mining order: e, c, b, a
(figure omitted; header table with items a, b, c, e
and the head of each item's node-link)
9
TD-FP-Growth for frequent pattern mining
  • FP-growth: bottom-up mining

(figure omitted; conditional header table for e with
items b, c and their node-links)
→ drawback: a conditional database and sub-tree must
be built for each item!
10
TD-FP-Growth for frequent pattern mining
  • TD-FP-Growth adopts a top-down mining strategy
  • motivation: avoid building extra databases and
    sub-trees as FP-growth does
  • method: process nodes on the upper level before
    those on the lower level
  • result: any modification on the upper-level
    nodes does not affect the lower-level nodes

See example →
11
TD-FP-Growth for frequent pattern mining
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}; minsup = 2
Mining order: a, b, c, e
(tree figure omitted; header table H:)
entry value   count   side-link
a             3
b             3
c             3
e             3
CT-tree and header table H
12
CT-tree for frequent pattern mining
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}; minsup = 2
(tree figure omitted)
Sub-header-table:
entry value   count   side-link
a             2
b             2
Header table H:
entry value   count   side-link
a             3
b             3
c             3
e             3
13
CT-tree for frequent pattern mining
  • Completeness
  • for entry i in H, we mine all the frequent
    patterns that end with item i, no more and no
    less
  • Complete set of frequent patterns
  • {a}
  • {b}
  • {c}, {b, c}, {a, c}
  • {e}, {b, e}, {c, e}, {b, c, e}
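The completeness claim can be cross-checked on this example by brute-force enumeration (a naive check, not TD-FP-Growth itself):

```python
from itertools import combinations

# Naive enumeration of all frequent itemsets on the running example,
# to cross-check the complete set listed above (minsup = 2).
transactions = [{"b","e"}, {"a","b","c","e"}, {"b","c","e"}, {"a","c","d"}, {"a"}]
minsup = 2

items = sorted(set().union(*transactions))
frequent = []
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        if sum(1 for t in transactions if set(cand) <= t) >= minsup:
            frequent.append(set(cand))

print(frequent)  # 9 patterns: a; b; c, bc, ac; e, be, ce, bce
```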

14
TD-FP-Growth for frequent pattern mining
  • Compared with FP-growth, TD-FP-Growth is
  • space-saving
  • only one tree and a few header tables
  • no extra databases or sub-trees
  • time-saving
  • does not build extra databases and sub-trees
  • walks up each path only once to update count
    information for nodes on the tree and to build
    sub-header-tables

15
TD-FP-Growth for association rule mining
  • Assumptions
  • there is a class attribute in the database
  • items of the class attribute are called class-items;
    the others are non-class-items
  • each transaction is associated with a class-item
  • only a class-item may appear on the right-hand side
    of a rule

Transaction ID   non-class-attribute   class-attribute
1                a, b                  C1
2                d                     C2
3                e, d, f               C3

example rule: a, b → Ci
16
TD-FP-Growth for association rule mining -- multiple
minimum supports
  • Why?
  • with a uniform minimum support, counting considers
    only the number of appearances
  • a uniform minimum support is unfair to items that
    appear less often but are worth more
  • e.g., responders vs. non-responders
  • How?
  • use a different support threshold for each
    class

17
TD-FP-Growth for association rule mining -- multiple
minimum supports
  • multiple vs. uniform
  • |C1| = 4, |C2| = 2
  • rules with relative minsup 50%, proportional to
    each class (more in performance)
  • uniform minimum support: absolute minsup = 1
  • 11-node tree, 23 rules
  • multiple minimum supports: absolute minsup1 = 2,
    absolute minsup2 = 1
  • 7-node tree, 9 rules
  • more effective and space-saving
  • time-saving (shown in performance)

Transactions: (c, f, C1), (b, e, C2), (b, e, f, C1),
(a, c, f, C1), (c, e, C2), (b, c, d, C1)
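One plausible reading of per-class thresholds on this example (the helper name and data representation are mine, not the paper's): an itemset is checked against the threshold of the class it predicts.

```python
# One plausible reading of per-class minimum supports on the slide's
# example; the helper name and representation are illustrative.
db = [
    ({"c","f"}, "C1"), ({"b","e"}, "C2"), ({"b","e","f"}, "C1"),
    ({"a","c","f"}, "C1"), ({"c","e"}, "C2"), ({"b","c","d"}, "C1"),
]
# Relative minsup 50% of each class size (|C1| = 4, |C2| = 2):
minsup = {"C1": 2, "C2": 1}

def frequent_for_class(itemset, cls):
    """Does `itemset` meet the support threshold of class `cls`
    among that class's own transactions?"""
    cnt = sum(1 for t, c in db if c == cls and itemset <= t)
    return cnt >= minsup[cls]

print(frequent_for_class({"b"}, "C1"))  # b occurs in 2 of the 4 C1 rows
print(frequent_for_class({"e"}, "C2"))  # e occurs in 2 of the 2 C2 rows
```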
18
TD-FP-Growth for association rule mining --
confidence pruning
  • Motivation
  • make use of the other association-rule constraint,
    confidence, to speed up mining
  • Method
  • confidence itself is not anti-monotone
  • introduce an acting constraint for confidence that
    is anti-monotone
  • push it inside the mining process

19
TD-FP-Growth for association rule mining --
confidence pruning
conf(A → B) = count(AB) / count(A) ≥ minconf ⇒
count(AB) ≥ count(A) × minconf ⇒
count(AB) ≥ minsup × minconf (since count(A) ≥
minsup; anti-monotone, but weaker)
--- the acting constraint of confidence for the
original confidence constraint of rule A → B
  • the support of a rule is computed by count(A)
  • count(AB): the class-count of itemset A related to
    class B

20
TD-FP-Growth for association rule mining --
confidence pruning
Transactions: (c, f, C1), (b, e, C2), (b, e, f, C1),
(a, c, f, C1), (a, c, d, C2); minsup = 2, minconf = 60%

entry value i   count(i)   count(i, C1)   count(i, C2)   side-link
a               2          1              1
b               2          1              1
c               3          2              1
e               2          1              1
f               3          3              0

count(e) ≥ minsup; however, both count(e, C1) and
count(e, C2) < minsup × minconf → terminate
mining for e!

entry value i   count(i)   count(i, C1)   count(i, C2)   side-link
b               2          1              1
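The termination test above can be sketched directly: mining for an item stops when every one of its class-counts falls below minsup × minconf (2 × 0.6 = 1.2 here). The dict layout and function name are mine; the counts follow the slide's example.

```python
# Sketch of the acting-constraint check from the slide: mining for an
# item can stop when every one of its class-counts falls below
# minsup * minconf (1.2 here). Counts follow the slide's example.
minsup, minconf = 2, 0.60
class_counts = {
    "c": {"C1": 2, "C2": 1},
    "e": {"C1": 1, "C2": 1},
    "f": {"C1": 3, "C2": 0},
}

def can_prune(item):
    """True if no class-count of `item` can still satisfy the
    acting constraint count(AB) >= minsup * minconf."""
    return all(c < minsup * minconf for c in class_counts[item].values())

print(can_prune("e"))  # both class-counts of e are below 1.2
print(can_prune("f"))  # count(f, C1) = 3 still qualifies
```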
21
Performance
  • Chose several data sets from the UC Irvine Machine
    Learning Repository,
    http://www.ics.uci.edu/~mlearn/MLRepository.html.

name of dataset   # of transactions   # of items per transaction   class distribution (%)                         # of distinct items
Dna-train         2000                61                           23.2, 24.25, 52.55                             240
Connect-4         67557               43                           9.55, 24.62, 65.83                             126
Forest            581012              13                           0.47, 1.63, 2.99, 3.53, 6.15, 36.36, 48.76     15916
22
Performance: frequent pattern mining
23
Performance: mining rules with multiple minimum
supports
FP-growth is only for frequent pattern mining;
relative minsup is proportional to each class
24
Performance: mining rules with confidence pruning
25
Conclusions and future work
  • Conclusions for the TD-FP-Growth algorithm
  • more efficient in finding both frequent patterns
    and association rules
  • more effective in mining rules by using multiple
    minimum supports
  • introduces a new pruning method, confidence
    pruning, and pushes it inside the mining process,
    further speeding up mining

26
Conclusions and future work
  • Future work
  • explore other constraint-based association rule
    mining methods
  • mine association rules with an item concept
    hierarchy
  • apply TD-FP-Growth to applications based on
    association rule mining
  • clustering
  • classification

27
Reference
  • (1) R. Agrawal, T. Imielinski, and A. Swami.
    Mining association rules between sets of items in
    large databases. Proc. 1993 ACM-SIGMOD Int. Conf.
    on Management of Data (SIGMOD'93), pages 207-216,
    Washington, D.C., May 1993.
  • (2) U. M. Fayyad, G. Piatetsky-Shapiro, P.
    Smyth, and R. Uthurusamy (eds.). Advances in
    Knowledge Discovery and Data Mining. AAAI/MIT
    Press, 1996.
  • (3) H. Toivonen. Sampling large databases for
    association rules. Proc. 1996 Int. Conf. Very
    Large Data Bases (VLDB'96), pages 134-145,
    Bombay, India, September 1996.
  • (4) R. Agrawal and S. Srikant. Mining
    sequential patterns. Proc. 1995 Int. Conf. Data
    Engineering (ICDE'95), pages 3-14, Taipei,
    Taiwan, March 1995.
  • (5) J. Han, J. Pei, and Y. Yin. Mining frequent
    patterns without candidate generation. Proc. 2000
    ACM-SIGMOD Int. Conf. on Management of Data
    (SIGMOD'00), pages 1-12, Dallas, TX, May 2000.
  • (6) J. Han, J. Pei, G. Dong, and K. Wang.
    Efficient computation of iceberg cubes with
    complex measures. Proc. 2001 ACM-SIGMOD Int.
    Conf., Santa Barbara, CA, May 2001.
  • And more!