Apriori Algorithm - PowerPoint PPT Presentation

Slides: 18
Provided by: ramakr1
Learn more at: http://www.cs.cmu.edu

1
Apriori Algorithm
  • Rakesh Agrawal
  • Ramakrishnan Srikant

2
Association Rules (Agrawal, Imielinski & Swami:
SIGMOD '93)
  • I = {i1, i2, ..., im}: a set of literals, called
    items.
  • Transaction T: a set of items such that T ⊆ I.
  • Database D: a set of transactions.
  • A transaction T contains X, a set of some items
    in I, if X ⊆ T.
  • An association rule is an implication of the form
    X ⇒ Y, where X, Y ⊂ I.
  • Support: % of transactions in D that contain
    X ∪ Y.
  • Confidence: among transactions that contain X,
    what % also contain Y.
  • Find all rules that have support and confidence
    greater than user-specified minimum support and
    minimum confidence.
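
The two measures defined above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy database D are assumptions made for the example, not part of the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Among transactions that contain X, the fraction that also contain Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical toy database D (invented for this example).
D = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
)]

print(support({"diapers", "beer"}, D))       # support of the rule diapers => beer
print(confidence({"diapers"}, {"beer"}, D))  # confidence of the same rule
```

Note that the support of a rule X ⇒ Y is computed on X ∪ Y, so it is the same for X ⇒ Y and Y ⇒ X; only the confidence depends on the direction.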

3
Computing Association Rules: Problem
Decomposition
  • Find all sets of items that have minimum support
    (frequent itemsets).
  • Use the frequent itemsets to generate the desired
    rules.
  • confidence(X ⇒ Y) = support(X ∪ Y) /
    support(X)
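
The second step of the decomposition can be sketched as follows, assuming the frequent itemsets and their supports are already available in a dictionary. The name generate_rules, the dictionary layout, and the sample supports are assumptions for illustration. The sketch keeps rules whose confidence is at least min_conf; use a strict comparison if the threshold should be exclusive.

```python
from itertools import combinations


def generate_rules(supports, min_conf):
    """Return (X, Y, confidence) for rules X => Y built from frequent itemsets.

    `supports` maps each frequent itemset (a frozenset) to its support and is
    assumed to contain every non-empty subset of every itemset in it.
    """
    rules = []
    for itemset, supp in supports.items():
        # Try every non-empty proper subset of the itemset as the antecedent X.
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                conf = supp / supports[x]  # support(X ∪ Y) / support(X)
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules


# Hypothetical supports (invented for this example).
supports = {
    frozenset({"a"}): 1.0,
    frozenset({"b"}): 0.5,
    frozenset({"a", "b"}): 0.5,
}
rules = generate_rules(supports, min_conf=0.8)
print(rules)  # only b => a passes: a => b has confidence 0.5
```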

What itemsets should you count? How do you
count them efficiently?
4
What itemsets do you count?
  • Search space is exponential.
  • With n items, nCk potential candidates of size k.
  • Anti-monotonicity: any superset of an infrequent
    itemset is also infrequent (SIGMOD '93).
  • If an itemset is infrequent, don't count any of
    its extensions.
  • Flip the property: all subsets of a frequent
    itemset are frequent.
  • Need not count any candidate that has an
    infrequent subset (VLDB '94).
  • Simultaneously observed by Mannila et al.,
    KDD '94.
  • Broadly applicable to extensions and
    restrictions.
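
A small sketch of the two observations above: how quickly nCk grows, and a brute-force check on a toy database that adding an item to an itemset never increases its count. The helper names and the database are assumptions made for the example.

```python
from itertools import combinations
from math import comb

# Number of potential candidates of size k when there are n = 100 items.
n = 100
print([comb(n, k) for k in (1, 2, 3, 4)])


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)


def extensions_never_gain_support(transactions):
    """Check that count(X ∪ {i}) <= count(X) for every itemset X and item i."""
    items = sorted(set().union(*transactions))
    for k in range(1, len(items)):
        for x in combinations(items, k):
            for extra in items:
                if extra not in x and count(x + (extra,), transactions) > count(x, transactions):
                    return False
    return True


# Hypothetical toy database (invented for this example).
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
print(extensions_never_gain_support(D))
```

Because a transaction that contains X ∪ {i} necessarily contains X, the check holds for any database; the code only makes the property concrete.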

5
Apriori Algorithm: Breadth-First Search
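
The figures for this slide did not survive in the transcript. As a rough stand-in, here is a minimal level-wise (breadth-first) sketch in Python. It is an assumption-laden illustration rather than the authors' implementation: candidates are formed by a simple pairwise union instead of the prefix join, counting is a plain subset test, and min_count is an absolute count instead of a percentage.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: count} for all itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 1
    while level:
        # Build size-(k+1) candidates from pairs of frequent k-itemsets and
        # keep only those whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in level for s in combinations(union, k)
            ):
                candidates.add(union)

        # One pass over the data to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy database (invented for this example).
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
result = apriori(D, min_count=2)
print(sorted("".join(sorted(s)) for s in result))
```

Each iteration of the while loop corresponds to one level of the search: all frequent itemsets of size k are found before any itemset of size k+1 is counted.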
11
Apriori: Candidate Generation (VLDB '94)
  • Lk: frequent itemsets of size k; Ck: candidate
    itemsets of size k.
  • Given Lk, generate Ck+1 in two steps.
  • Join step: join Lk with Lk, with the join
    condition that the first k-1 items should be the
    same and l1[k] < l2[k].

L3
a b c
a b d
a c d
a c e
b c d
C4 (after join)
a b c d
a c d e
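
The join step can be sketched as below, under the assumption that every itemset is stored as a tuple of items in sorted order. The function name join_step is chosen for the example.

```python
def join_step(L_k):
    """Join L_k with itself; return candidate (k+1)-itemsets as sorted tuples."""
    L_k = sorted(L_k)
    candidates = []
    for i, p in enumerate(L_k):
        for q in L_k[i + 1:]:
            # Join condition: the first k-1 items agree. Because the list is
            # sorted and q comes after p, p's last item < q's last item.
            if p[:-1] != q[:-1]:
                break  # itemsets sharing p's prefix are contiguous when sorted
            candidates.append(p + (q[-1],))
    return candidates


# L3 from the slide.
L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")]
C4 = join_step(L3)
print(C4)  # [('a', 'b', 'c', 'd'), ('a', 'c', 'd', 'e')]
```

Sorting first lets the inner loop stop as soon as the prefix changes, which avoids comparing every pair of itemsets.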
12
Apriori: Candidate Generation (VLDB '94)
  • Lk: frequent itemsets of size k; Ck: candidate
    itemsets of size k.
  • Given Lk, generate Ck+1 in two steps.
  • Join step: join Lk with Lk, with the join
    condition that the first k-1 items should be the
    same and l1[k] < l2[k].
  • Prune step: delete all candidates which have a
    non-frequent subset.

C4 (after join)
a b c d
a c d e
L3
a b c
a b d
a c d
a c e
b c d
C4 (after prune)
a b c d
(a c d e is deleted because its subset a d e is not in L3.)
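
The prune step can be sketched as a filter over the joined candidates, again assuming itemsets are sorted tuples. The function name prune_step is chosen for the example.

```python
from itertools import combinations


def prune_step(candidates, L_k):
    """Keep only the candidates whose k-item subsets are all in L_k."""
    frequent = set(L_k)
    return [
        c for c in candidates
        # combinations() of a sorted tuple yields sorted tuples, so each
        # subset can be looked up in `frequent` directly.
        if all(s in frequent for s in combinations(c, len(c) - 1))
    ]


# L3 and the joined C4 from the slide.
L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")]
C4_joined = [("a", "b", "c", "d"), ("a", "c", "d", "e")]
C4 = prune_step(C4_joined, L3)
print(C4)  # [('a', 'b', 'c', 'd')]
```

Here (a, c, d, e) is dropped because (a, d, e) and (c, d, e) are not in L3, so it cannot be frequent.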
13
How do you count?
  • Given a set of candidates Ck, for each
    transaction T:
  • Find all members of Ck which are contained in T.
  • Hash-tree data structure (VLDB '94).
  • Example (figure): C2 = {a, b}, {e, f}, {e, g};
    T = {c, e, f}; hash table over the items a, b, c,
    d, e, f, g.
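
Before any hash tree, the counting task itself can be written as a plain double loop. This baseline is an assumption made for comparison, not the method on the slide; it tests every candidate against every transaction, which is exactly the cost the hash tree is meant to reduce.

```python
def count_candidates(candidates, transactions):
    """Count, for each candidate itemset, the transactions that contain it."""
    counts = {frozenset(c): 0 for c in candidates}
    for t in transactions:
        t = frozenset(t)
        for c in counts:
            if c <= t:  # candidate is contained in the transaction
                counts[c] += 1
    return counts


# Candidates and transaction from the slide's example.
C2 = [("a", "b"), ("e", "f"), ("e", "g")]
T = {"c", "e", "f"}
counts = count_candidates(C2, [T])
print(counts)  # only {e, f} is contained in T
```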

15
How do you count?
  • Given a set of candidates Ck, for each
    transaction T:
  • Find all members of Ck which are contained in T.
  • Hash-tree data structure (VLDB '94).
  • Example (figure): C2 = {a, b}, {e, f}, {e, g};
    T = {c, e, f}; hash table over the items a, b, c,
    d, e, f, g, and a second-level hash table over
    f, g.
  • Recursively construct hash tables if number of
    itemsets is above a threshold.
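
A simplified hash tree in the spirit of the description above might look like this. It is a sketch under several assumptions: itemsets are sorted tuples of equal length k, a leaf is a Python list, an interior node is a dict keyed by bucket number, and the class name, leaf_capacity, and buckets parameters are invented for the example. Matches are collected in a set so that an itemset reached through two colliding buckets is reported only once.

```python
class HashTree:
    """Hash tree over sorted itemsets (tuples) of equal length k."""

    def __init__(self, k, leaf_capacity=2, buckets=7):
        self.k = k
        self.leaf_capacity = leaf_capacity
        self.buckets = buckets
        self.root = []  # leaf: list of itemsets; interior node: dict

    def _bucket(self, item):
        return hash(item) % self.buckets

    def insert(self, itemset):
        self.root = self._insert(self.root, tuple(sorted(itemset)), 0)

    def _insert(self, node, itemset, depth):
        if isinstance(node, dict):  # interior node: hash on the depth-th item
            b = self._bucket(itemset[depth])
            node[b] = self._insert(node.get(b, []), itemset, depth + 1)
            return node
        node.append(itemset)
        # Above the threshold, turn the leaf into a hash table (recursively),
        # unless all k items have already been used for hashing.
        if len(node) > self.leaf_capacity and depth < self.k:
            table = {}
            for s in node:
                b = self._bucket(s[depth])
                table[b] = self._insert(table.get(b, []), s, depth + 1)
            return table
        return node

    def contained_in(self, transaction):
        """Return the set of stored itemsets that are subsets of `transaction`."""
        t = tuple(sorted(transaction))
        found = set()
        self._search(self.root, t, 0, set(t), found)
        return found

    def _search(self, node, t, start, t_set, found):
        if isinstance(node, dict):
            # Hash on every remaining item of the transaction and descend.
            for i in range(start, len(t)):
                child = node.get(self._bucket(t[i]))
                if child is not None:
                    self._search(child, t, i + 1, t_set, found)
        else:
            found.update(s for s in node if t_set.issuperset(s))


# Candidates and transaction from the slide's example.
tree = HashTree(k=2, leaf_capacity=2)
for candidate in [("a", "b"), ("e", "f"), ("e", "g")]:
    tree.insert(candidate)
print(tree.contained_in({"c", "e", "f"}))  # {('e', 'f')}
```

Only the leaves reachable by hashing the items of T are inspected, so candidates whose first item does not hash to the bucket of some item in T are never tested.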

16
Impact
  • Concepts in Apriori also applied to many
    generalizations, e.g., taxonomies, quantitative
    associations, sequential patterns, graphs, ...
  • Over 3000 citations in Google Scholar.

17
Subsequent Algorithmic Innovations
  • Reducing the cost of checking whether a candidate
    itemset is contained in a transaction:
      • TID intersection.
      • Database projection, FP-Growth.
  • Reducing the number of passes over the data:
      • Sampling and dynamic counting.
  • Reducing the number of candidates counted:
      • For maximal patterns and constraints.
  • Many other innovative ideas.
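
As one illustration of the list above, TID intersection can be sketched by storing, for each item, the set of transaction IDs (TIDs) that contain it. The function names and the toy database are assumptions made for the example, and the sketch assumes a non-empty itemset.

```python
def vertical_layout(transactions):
    """Map each item to the set of transaction IDs (TIDs) that contain it."""
    tidlists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def support_count(itemset, tidlists):
    """Count transactions containing `itemset` by intersecting TID sets."""
    return len(set.intersection(*(tidlists.get(i, set()) for i in itemset)))


# Hypothetical toy database (invented for this example).
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
tidlists = vertical_layout(D)
print(support_count(("a", "b"), tidlists))  # transactions 0 and 1 contain both
```

With this layout, checking a candidate no longer requires scanning each transaction; the count comes from intersecting the TID sets of its items.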