
Title: Fast Algorithms for Mining Association Rules


1
Fast Algorithms for Mining Association Rules
  • Rakesh Agrawal and Ramakrishnan Srikant
  • VLDB '94
  • presented by Kurt Partridge
  • CSE 590DB
  • Oct 4, 1999

2
Mining Association Rules
  • DB of "Basket Data"
    • TID: items
    • 100: 1 3 4
    • 200: 2 3 5
    • 300: 1 2 3 5
    • 400: 2 5
  • association rules
    • 1 => 3
    • 2,3 => 5
    • 2,5 => 3
  • association rule metrics: support and confidence
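
The two usual rule metrics, support and confidence, can be computed directly on the basket data above. This is a minimal sketch: the function names `support` and `confidence` are illustrative choices, and support is expressed as a fraction of transactions.

```python
# Basket data from the slide: TID -> set of items.
DB = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
}

def support(itemset, db=DB):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= items for items in db.values()) / len(db)

def confidence(antecedent, consequent, db=DB):
    """support(antecedent + consequent) divided by support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, db) / support(antecedent, db)

print(support({1, 3}))          # rule 1 => 3 is backed by TIDs 100 and 300
print(confidence({1}, {3}))     # every basket with item 1 also has item 3
print(confidence({2, 5}, {3}))  # only two of the three {2, 5} baskets have 3
```

On this data the rule 2,5 => 3 has the same support as 2,3 => 5 but lower confidence, because {2, 5} appears in one more basket than {2, 3}.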
3
General Strategy
  • Step I: Find all itemsets with minimum support
    (minsup)
  • Step II: Generate rules from minsup'ed itemsets

4
Step I: Finding Minsup Itemsets
  • Key fact: Adding items to an itemset never
    increases its support
  • General strategy: Proceed inductively on itemset
    size
  • Apriori Algorithm
    1. Base case: Begin with all minsup itemsets of
       size 1 (L1)
    2. Without peeking at the DB, generate candidate
       itemsets of size k (Ck) from Lk-1
    3. Remove candidate itemsets that contain
       unsupported subsets
    4. Further refine Ck using the database to
       produce Lk
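
The four steps can be sketched as a level-wise loop. This is a minimal illustration, not the paper's implementation: it uses a plain pairwise union for step 2 rather than the prefix join, counts candidates by brute force, and takes minsup as an absolute transaction count (`minsup_count` is an assumed parameter name).

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return {itemset: support count} for every itemset meeting minsup_count."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the database, keeping only supported candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= minsup_count}

    # Step 1: base case, the supported itemsets of size 1 (L1).
    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Step 2: build size-k candidates from L(k-1) alone, without the DB.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Step 3: drop candidates that have a (k-1)-subset outside L(k-1).
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Step 4: count the survivors against the DB to obtain Lk.
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(db, minsup_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On the four-transaction basket data with a minimum count of 2, the only supported itemset of size 3 is {2, 3, 5}, and the loop stops once no size-4 candidate can be formed.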

5
Algorithm to Guess Itemsets
  • Naïve way
    • Extend all itemsets with all possible items
  • More sophisticated
    • Join Lk-1 with itself, adding only a single,
      final item
    • e.g. {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4}
      produces {1 2 3 4} and {1 3 4 5}
    • Remove itemsets with an unsupported subset
    • e.g. {1 3 4 5} has an unsupported subset {1 4 5}
      if minsup = 50
    • Use the database to further refine Ck
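
The join and prune steps can be sketched as follows, using the five 3-itemsets from the example. This is a minimal sketch that assumes itemsets are stored as sorted tuples of equal length; the name `apriori_gen` follows the paper, but the code is an illustration rather than the authors' implementation.

```python
from itertools import combinations

def apriori_gen(prev_level):
    """Return (joined, pruned) size-k candidates built from size-(k-1) itemsets."""
    prev = sorted(set(prev_level))
    prev_set = set(prev)
    if not prev:
        return [], []
    k = len(prev[0]) + 1

    # Join: pair itemsets that agree on everything except their last item,
    # then append the larger last item so the result stays sorted.
    joined = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                joined.append(a + (b[-1],))

    # Prune: keep a candidate only if all of its (k-1)-subsets are supported.
    pruned = [c for c in joined
              if all(s in prev_set for s in combinations(c, k - 1))]
    return joined, pruned

L3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
joined, pruned = apriori_gen(L3)
print(joined)  # candidates after the join
print(pruned)  # candidates after subset filtering
```

Restricting the join to itemsets that share a common prefix means each candidate is produced once, which is what makes this cheaper than extending every itemset with every item.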

6
Example
7
Part II: Generating Rules
  • Key fact
    • Moving items from the antecedent to the
      consequent never changes support, and never
      increases confidence
  • Algorithm
    • For each itemset IS with minsup
      • Find all minconf rules with a single consequent
        of the form (IS - L1 => L1)
      • Guess candidate consequents Ck by appending items
        from IS - Lk-1 to Lk-1
      • Verify confidence of each rule IS - Ck => Ck
        using known itemset support values
      • Repeat the last two steps for larger k
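
The rule-generation loop for a single itemset can be sketched as below. It is a minimal illustration under stated assumptions: `support` is a hypothetical dictionary holding support counts for the itemset and all of its subsets (as Step I would produce), and consequents are grown exactly as the slide describes, by appending one remaining item to each consequent that met minconf.

```python
def gen_rules(itemset, support, minconf):
    """Return [(antecedent, consequent, confidence)] for one supported itemset."""
    itemset = frozenset(itemset)
    rules = []
    # Start with every single-item consequent.
    consequents = {frozenset([i]) for i in itemset}
    while consequents:
        kept = set()
        for c in consequents:
            antecedent = itemset - c
            # Confidence uses only support values that are already known.
            conf = support[itemset] / support[antecedent]
            if conf >= minconf:
                rules.append((antecedent, c, conf))
                kept.add(c)
        # Grow only the consequents that passed: moving more items into a
        # failing consequent can never raise its confidence.
        consequents = {c | {i} for c in kept for i in itemset - c}
        # The antecedent must stay non-empty.
        consequents = {c for c in consequents if len(c) < len(itemset)}
    return rules

# Support counts for {2, 3, 5} and its subsets in the basket data.
support = {
    frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({2, 3}): 2, frozenset({2, 5}): 3, frozenset({3, 5}): 2,
    frozenset({2, 3, 5}): 2,
}
rules = gen_rules({2, 3, 5}, support, minconf=0.7)
for a, c, conf in rules:
    print(sorted(a), "=>", sorted(c), round(conf, 2))
```

With a minimum confidence of 0.7, only 2,3 => 5 and 3,5 => 2 survive. The rule 2,5 => 3 has confidence 2/3 and is dropped, and none of the two-item consequents grown from the survivors reaches the threshold.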
8
Other Details
  • Itemset hash trees for subset testing
  • Buffering
  • Variations
    • Fewer database passes, itemsets from multiple
      iterations
    • AprioriTID -- exclude unnecessary database
      records
    • AprioriHybrid -- use either Apriori or AprioriTID
  • Future Work
    • Multiple ISA Taxonomies
    • constraints on rules (e.g. number of items)
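
The idea of excluding unnecessary database records can be illustrated with a simplified counting pass. This sketch is not the paper's AprioriTID encoding, which replaces each transaction with the set of candidates it contains. It only shows the underlying observation that a transaction containing no size-k candidate cannot contain any larger candidate, so later passes can skip it. The function name `count_and_shrink` is hypothetical.

```python
def count_and_shrink(transactions, candidates):
    """Count candidates and return only the transactions still worth scanning."""
    counts = {c: 0 for c in candidates}
    survivors = []
    for t in transactions:
        hits = [c for c in candidates if c <= t]
        for c in hits:
            counts[c] += 1
        if hits:
            # Only transactions with at least one hit can matter next pass.
            survivors.append(t)
    return counts, survivors

db = [frozenset(t) for t in ({1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5})]
C3 = [frozenset({2, 3, 5})]
counts, survivors = count_and_shrink(db, C3)
print(counts)
print(len(survivors), "of", len(db), "transactions kept for the next pass")
```

On the basket data, the pass over the size-3 candidates keeps only the two transactions that contain {2, 3, 5}.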

9
Subsequent Papers
  • Mining sequenced rules
  • Finding "interesting" rules
  • Efficiently handling long itemsets
  • Integration with query optimizers
  • Adjustments to handle dense/relational databases
  • Apply constraints to further filter association
    rules

10
Questions
  • How are rules ranked? Do the minsup and minconf
    find interesting rules? Do they omit any
    interesting rules?
  • What about maximum support?
  • How well will this approach work for other
    problems (e.g. clustering, classification)?

11
Apriori
12
Apriori
  • Join operation
  • Subset filtering