Mining Generalized Association Rules

1
Mining Generalized Association Rules
  • R. Srikant & R. Agrawal (IBM)
  • Presentation by Colin Cherry

2
Objectives
  • What are generalized association rules?
  • Why do we care?
  • How can we get them efficiently?
  • How can we reduce rule redundancy?
  • Is the efficient method any good?

3
Motivation
  • Association rules find rules of the form
    X ⇒ Y, where X and Y are sets of items
  • What if there is structure over your items?
  • Structure can be used to generalize
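As a concrete reference point for the rule form X ⇒ Y, the Python sketch below computes support (the fraction of transactions containing an itemset) and confidence (Sup(X ∪ Y) / Sup(X)) over a handful of made-up transactions. The data and the helper names `support` and `confidence` are illustrative assumptions, not taken from the paper.

```python
# Toy transactions, invented for illustration only.
TRANSACTIONS = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Conf(X => Y) = Sup(X u Y) / Sup(X); assumes Sup(X) > 0."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


print(support({"milk", "cereal"}, TRANSACTIONS))       # 0.5
print(confidence({"milk"}, {"cereal"}, TRANSACTIONS))  # 2 of the 3 milk baskets have cereal
```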

4
Hierarchy Example
[Figure: example item hierarchy, Beverage > Soft Drink > Cola > {Pepsi, Coke}]


5
Hierarchy Example
[Figure: hierarchy with nodes On Sale and Not On Sale]

  • Goal of this paper
  • Given hierarchies over items
  • Capture interesting rules at all levels of
    multiple hierarchies

6
Simple Fix
  • Just add all ancestors (parents, grandparents, and so on)
    to each transaction.
  • {Coke, 7-up, ranch Doritos, bananas}
  • would become
  • {Coke, 7-up, ranch Doritos, bananas, Doritos,
    cola, clear pop, soft drink, chips, junk food,
    fruit, produce}
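A minimal Python sketch of the simple fix. The slide lists the expanded transaction but not the hierarchy edges, so the `PARENTS` mapping below is an assumption chosen to reproduce it; parents are stored as sets so that an item with several parents would also work. The function names are hypothetical.

```python
# Assumed child -> parents mapping; the slide does not spell out these edges.
PARENTS = {
    "Coke": {"cola"}, "cola": {"soft drink"},
    "7-up": {"clear pop"}, "clear pop": {"soft drink"},
    "ranch Doritos": {"Doritos"}, "Doritos": {"chips"}, "chips": {"junk food"},
    "bananas": {"fruit"}, "fruit": {"produce"},
}


def ancestors(item, parents):
    """All proper ancestors of item, following parent links transitively."""
    seen, stack = set(), list(parents.get(item, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def expand(transaction, parents):
    """The simple fix: add every ancestor of every item to the transaction."""
    expanded = set(transaction)
    for item in transaction:
        expanded |= ancestors(item, parents)
    return expanded


print(sorted(expand({"Coke", "7-up", "ranch Doritos", "bananas"}, PARENTS)))
```

Apriori can then be run unchanged over the expanded transactions.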

7
Fix Contd
  • Run Apriori on expanded database
  • Redefine association rules: for a rule X ⇒ Y,
    make sure
  • X ∩ Y = ∅
  • Y contains no ancestors of any item in X
  • Otherwise Coke ⇒ cola would hold trivially
    with 100% confidence
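The rule condition can be checked directly. In this hypothetical sketch, `is_valid_rule` rejects a rule if X and Y overlap (the usual association-rule requirement) or if Y contains an ancestor of any item in X; without the second check, a rule such as Coke ⇒ cola would hold with 100% confidence purely because of the hierarchy. The tiny `PARENTS` hierarchy is assumed for illustration.

```python
# Hypothetical hierarchy (child -> set of parents), assumed for illustration.
PARENTS = {"Coke": {"cola"}, "cola": {"soft drink"}, "bananas": {"fruit"}}


def ancestors(item, parents):
    """All proper ancestors of item, following parent links transitively."""
    seen, stack = set(), list(parents.get(item, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def is_valid_rule(x, y, parents):
    """Check the extra conditions on a generalized rule X => Y."""
    x, y = set(x), set(y)
    if x & y:  # X and Y must be disjoint
        return False
    x_ancestors = set().union(*(ancestors(i, parents) for i in x))
    return not (y & x_ancestors)  # no item of Y may be an ancestor of an item of X


print(is_valid_rule({"Coke"}, {"cola"}, PARENTS))     # False: trivially 100% confident
print(is_valid_rule({"cola"}, {"bananas"}, PARENTS))  # True
```

Note the asymmetry: cola ⇒ Coke is still allowed, since Coke is a descendant, not an ancestor, of cola.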

8
Problems with the fix
  • Counting may slow down
  • Total number of items and average transaction size
    will grow
  • Could get a lot of redundant rules
  • Milk ⇒ Cereal (70% confidence)
  • Skim Milk ⇒ Cereal (70% confidence)
  • Do we care?

9
An Efficient Algorithm
  • Cumulate
  • Filtering ancestors added to transactions
  • Hierarchy-aware itemset pruning
  • For more complicated, speculative algorithms, see
    paper

10
Filtering Ancestors
  • Not counting soft drink? Don't add it.
  • Only add ancestors that are in at least one of
    the candidate itemsets
  • Delete any items we are not counting
  • Not counting Doritos? Replace with chips
  • Pre-compute the ancestors for each item once, then
    each iteration drop ancestors that no candidate uses
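A rough Python sketch of the transaction-side filtering described on this slide, assuming a small hypothetical hierarchy and a made-up candidate list: ancestors are pre-computed once, each pass keeps only the ancestors some candidate uses, and items no candidate uses are dropped. The function names are ours, and the candidate counting itself (the rest of Cumulate) is omitted.

```python
# Hypothetical hierarchy (child -> set of parents), assumed for illustration.
PARENTS = {
    "Coke": {"cola"}, "cola": {"soft drink"},
    "ranch Doritos": {"Doritos"}, "Doritos": {"chips"}, "chips": {"junk food"},
    "bananas": {"fruit"}, "fruit": {"produce"},
}


def ancestor_table(parents):
    """Pre-compute the proper ancestors of every item once, up front."""
    def walk(item):
        seen, stack = set(), list(parents.get(item, ()))
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(parents.get(p, ()))
        return seen

    items = set(parents) | {p for ps in parents.values() for p in ps}
    return {item: walk(item) for item in items}


def filter_table(table, candidates):
    """Each pass: keep only ancestors that occur in at least one candidate."""
    counted = set().union(*candidates)
    return {item: anc & counted for item, anc in table.items()}, counted


def extend_transaction(transaction, filtered, counted):
    """Add the filtered ancestors, then drop every item no candidate uses."""
    extended = set(transaction)
    for item in transaction:
        extended |= filtered.get(item, set())
    return extended & counted


TABLE = ancestor_table(PARENTS)                      # once, before the passes
candidates = [{"Coke", "chips"}, {"cola", "chips"}]  # hypothetical candidates
filtered, counted = filter_table(TABLE, candidates)
print(extend_transaction({"Coke", "ranch Doritos", "bananas"}, filtered, counted))
```

With these candidates, soft drink is never added, bananas is dropped, and ranch Doritos is effectively replaced by chips.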

11
Itemset Pruning
  • No sense counting both {coke, cola, chips} and
    {coke, chips}: every transaction with coke also
    has cola, so their support is always the same
  • Remove {coke, cola} when counting size-2 itemsets
    and you'll never have to deal with it again
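The pruning can be sketched as a filter over the size-2 candidates. The hierarchy and candidate list below are hypothetical, and the name `prune_item_with_ancestor` is ours.

```python
from itertools import combinations

# Hypothetical hierarchy (child -> set of parents), assumed for illustration.
PARENTS = {"coke": {"cola"}, "cola": {"soft drink"}, "Doritos": {"chips"}}


def ancestors(item, parents):
    """All proper ancestors of item, following parent links transitively."""
    seen, stack = set(), list(parents.get(item, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def prune_item_with_ancestor(candidates, parents):
    """Drop candidates containing an item together with one of its ancestors.

    Doing this on the size-2 candidates is enough when later candidates are
    generated Apriori-style: a k-itemset holding both coke and cola has the
    subset {coke, cola}, which is never frequent-counted, so the prune step
    of candidate generation always discards it.
    """
    return [
        c for c in candidates
        if not any(a in ancestors(b, parents) or b in ancestors(a, parents)
                   for a, b in combinations(c, 2))
    ]


c2 = [{"coke", "cola"}, {"coke", "chips"}, {"cola", "chips"}, {"Doritos", "chips"}]
print(prune_item_with_ancestor(c2, PARENTS))
```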

12
Reducing Redundancy
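  • Milk ⇒ Cereal (8% support, 70% confidence)
  • Skim Milk ⇒ Cereal (2% support, 70% confidence)
  • If Skim Milk accounts for 1/4 of Milk sales, then
    the 2nd rule is redundant: the first rule already
    predicts 8% × 1/4 = 2% support and 70% confidence
  • Expected support and confidence (with respect to
    the hierarchy) will define "interesting"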
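The redundancy argument is arithmetic on expected values. The sketch below follows our reading of the paper's expectation: scale the ancestor's support by Sup(z)/Sup(ẑ) for each item z that specializes an ancestor item ẑ, and for confidence apply that scaling only to items specialized in the consequent. The supports for Milk (40%) and Skim Milk (10%) are hypothetical, picked so that Skim Milk is 1/4 of Milk.

```python
from math import prod


def expected_support(sup_ancestor_itemset, specialised):
    """Expected Sup(Z) given an ancestor itemset Z-hat.

    specialised: (Sup(z_i), Sup(zhat_i)) pairs, one per item z_i of Z that
    replaces an ancestor item zhat_i; shared items contribute nothing.
    """
    return prod(s / s_hat for s, s_hat in specialised) * sup_ancestor_itemset


def expected_confidence(conf_ancestor_rule, specialised_in_consequent):
    """Expected Conf(X => Y) given an ancestor rule.

    Only items specialised in the consequent Y change the expectation;
    specialising the antecedent X leaves it equal to the ancestor's confidence.
    """
    return prod(s / s_hat for s, s_hat in specialised_in_consequent) * conf_ancestor_rule


# Hypothetical item supports: Skim Milk accounts for 1/4 of Milk.
sup_milk, sup_skim = 0.40, 0.10
exp_sup = expected_support(0.08, [(sup_skim, sup_milk)])  # 8% scaled by 1/4
exp_conf = expected_confidence(0.70, [])                  # Skim Milk is in X
print(exp_sup, exp_conf)
```

Both expectations match the observed 2% support and 70% confidence of Skim Milk ⇒ Cereal, which is why that rule adds nothing.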

13
Close Ancestors
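  • An itemset $\hat{Z}$ is an ancestor of Z if
  • $\hat{Z}$ is Z with some items replaced by their ancestors
  • $\hat{Z}$ has the same number of items as Z
  • $\hat{Z}$ is a close ancestor of Z if
  • No ancestor of Z has $\hat{Z}$ as an ancestor
  • Take {coke, bananas} as Z
  • $\hat{Z}$ = {cola, bananas} is a close ancestor
  • $\hat{Z}$ = {soft drink, bananas} is not close
    ({cola, bananas} lies in between)
  • $\hat{Z}$ = {cola, fruit} is not close
    ({cola, bananas} lies in between)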
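The two definitions translate into a brute-force check that is fine for small itemsets. This is a hypothetical Python sketch under an assumed hierarchy; `is_close_ancestor` enumerates every itemset reachable by generalizing items of Z and returns False if one of them sits strictly between Z and the candidate ancestor.

```python
from itertools import permutations, product

# Hypothetical hierarchy (child -> set of parents), assumed for illustration.
PARENTS = {
    "coke": {"cola"}, "cola": {"soft drink"},
    "bananas": {"fruit"}, "fruit": {"produce"},
}


def ancestors(item):
    """All proper ancestors of an item (transitive closure over PARENTS)."""
    seen, stack = set(), list(PARENTS.get(item, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(PARENTS.get(p, ()))
    return seen


def is_ancestor_itemset(zhat, z):
    """True if zhat is z with one or more items replaced by their ancestors."""
    zhat, z = frozenset(zhat), frozenset(z)
    if len(zhat) != len(z) or zhat == z:
        return False
    z_list = list(z)
    # Look for a one-to-one pairing: each zhat item equals, or is an ancestor of, its z item.
    return any(
        all(a == b or a in ancestors(b) for a, b in zip(perm, z_list))
        for perm in permutations(zhat)
    )


def is_close_ancestor(zhat, z):
    """True if zhat is an ancestor of z and no other ancestor of z lies between them."""
    if not is_ancestor_itemset(zhat, z):
        return False
    choices = [[item, *ancestors(item)] for item in z]
    for combo in product(*choices):
        mid = frozenset(combo)
        if mid in (frozenset(z), frozenset(zhat)):
            continue
        if is_ancestor_itemset(mid, z) and is_ancestor_itemset(zhat, mid):
            return False
    return True


Z = {"coke", "bananas"}
print(is_close_ancestor({"cola", "bananas"}, Z))        # True
print(is_close_ancestor({"soft drink", "bananas"}, Z))  # False
print(is_close_ancestor({"cola", "fruit"}, Z))          # False
```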

14
Interestingness
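  • A rule $X \Rightarrow Y$ is interesting if, for all interesting,
    close ancestors $\hat{X} \Rightarrow \hat{Y}$:
  • $\mathrm{Sup}(X \cup Y) > R \times \mathrm{ExpSup}(X \cup Y \mid \hat{X} \cup \hat{Y})$
  • or
  • $\mathrm{Conf}(X \Rightarrow Y) > R \times \mathrm{ExpConf}(X \Rightarrow Y \mid \hat{X} \Rightarrow \hat{Y})$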
  • R is defined by the user
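A minimal sketch of the interestingness test, assuming the caller has already computed the expected support and confidence with respect to each interesting close ancestor (for instance as in the Milk / Skim Milk example). The function names are hypothetical, and R = 1.1 in the usage lines is an arbitrary choice for illustration.

```python
def r_interesting_wrt(sup, conf, exp_sup, exp_conf, r):
    """R-interesting w.r.t. one ancestor: support or confidence beats R x expectation."""
    return sup > r * exp_sup or conf > r * exp_conf


def is_interesting(rule_stats, ancestor_expectations, r):
    """rule_stats: (support, confidence) of the rule.

    ancestor_expectations: (expected support, expected confidence) pairs, one
    per interesting close ancestor. With no such ancestors, all() returns True.
    """
    sup, conf = rule_stats
    return all(r_interesting_wrt(sup, conf, es, ec, r) for es, ec in ancestor_expectations)


R = 1.1  # arbitrary, user-chosen interest level
print(is_interesting((0.02, 0.70), [(0.02, 0.70)], R))  # False: Skim Milk => Cereal
print(is_interesting((0.04, 0.70), [(0.02, 0.70)], R))  # True: twice expected support
```

Because `all` over an empty list is `True`, a rule with no ancestors is interesting by default.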

15
Putting it all together
  • Rule 1 is interesting: it has no ancestors
  • Rule 2 is interesting: it has twice its expected support
  • Rule 3 is not interesting
  • It has exactly the support expected from its closest
    ancestor (rule 2)

16
Experiments
  • Lots of experiments on artificial data in paper.
  • We'll look at the results of using Cumulate on
    real data
  • Compare to the quick fix - just adding in
    ancestors to transactions

17
Supermarket
18
Department Store
19
Interestingness Results
  • Hierarchical Interestingness pruning
  • R 25 resulted in pruning roughly 40% of the
    rules
  • R 50 resulted in pruning roughly 50% of the
    rules
  • Pruning had a significant impact!

20
Objectives Revisited
  • What are generalized association rules?
  • Rules aware of hierarchies over items
  • Why do we care?
  • Support can be low for individual items
  • How can we get them efficiently?
  • Cumulate algorithm - hierarchy-aware counting
  • How can we reduce rule redundancy?
  • Check surprise with respect to ancestors
  • Is the efficient method any good?
  • Yep!

21
Questions?

22
Hierarchy Example
[Figure: hierarchy with nodes Impulse, Fridge, Beverage, Cans, Bottles]


23
Pros
  • Rules over items low in the tree may not have
    minimum support
  • Can raise min support
  • Shoot for fewer, more general rules
  • BUT you can catch rules at any level of the
    hierarchy

24
Data Sets
  • Supermarket
  • 500,000 items
  • 1.5 million transactions
  • Hierarchy has 4 levels, 118 roots
  • Department Store
  • 200,000 items
  • 500,000 transactions
  • Hierarchy has 7 levels, 89 roots

25
Summary
  • Nothing ground-breaking in this paper
  • But, it provides a solid, efficient method for
    working with hierarchies
  • Generalization is a powerful tool to have
    available in association rules