Association Rules - PowerPoint Presentation
Source: http://web.cs.ucla.edu

Transcript and Presenter's Notes

Title: Association Rules

1
Association Rules & Correlations
  • Basic concepts
  • Efficient and scalable frequent itemset mining
    methods
  • Apriori, and improvements
  • FP-growth
  • Rule post-mining: visualization and validation
  • Interesting association rules.

2
Rule Validations
  • Only a small subset of derived rules might be
    meaningful/useful
  • Domain expert must validate the rules
  • Useful tools
  • Visualization
  • Correlation analysis

3
Visualization of Association Rules: Plane Graph
4
Visualization of Association Rules (SGI/MineSet
3.0)
5
Pattern Evaluation
  • Association rule algorithms tend to produce too
    many rules
  • many of them are uninteresting or redundant
  • confidence(A → B) = P(B|A) = P(A ∧ B) / P(A)
  • Confidence is not a discriminative enough criterion
  • Beyond the original support & confidence,
    interestingness measures can be used to
    prune/rank the derived patterns
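As a minimal sketch of the confidence computation above (the five transactions here are made up for illustration):

```python
# Estimate confidence(A -> B) = P(A and B) / P(A) from transactions.
# The transaction list is hypothetical, purely for illustration.
transactions = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, {"A", "B", "C"}]

n_a = sum(1 for t in transactions if "A" in t)           # 4 transactions contain A
n_ab = sum(1 for t in transactions if {"A", "B"} <= t)   # 3 contain both A and B

confidence = n_ab / n_a
print(confidence)  # 0.75
```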

6
Application of Interestingness Measure
7
Computing Interestingness Measure
  • Given a rule X → Y, the information needed to compute
    rule interestingness can be obtained from a
    contingency table

Contingency table for X → Y:

            Y     ¬Y
  X        f11   f10  | f1+
  ¬X       f01   f00  | f0+
           f+1   f+0  | T

  • Used to define various measures
  • support, confidence, lift, Gini, J-measure,
    etc.
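A sketch of how the basic measures fall out of the four cell counts, in the f-notation above (the counts used here are the Tea/Coffee numbers from the next slide):

```python
# support, confidence, and lift for X -> Y from contingency-table cells.
f11, f10, f01, f00 = 15, 5, 75, 5   # Tea/Coffee cell counts from the slides
T = f11 + f10 + f01 + f00           # total transaction count

support = f11 / T                       # P(X, Y)  = 15/100
confidence = f11 / (f11 + f10)          # P(Y | X) = 15/20
lift = confidence / ((f11 + f01) / T)   # P(Y|X) / P(Y) = 0.75 / 0.9

print(support, confidence, round(lift, 4))  # 0.15 0.75 0.8333
```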

8
Drawback of Confidence

              Coffee   ¬Coffee   Total
  Tea           15        5        20
  ¬Tea          75        5        80
  Total         90       10       100
9
Statistical-Based Measures
  • Measures that take into account statistical
    dependence

Does X lift the probability of Y? I.e., the
probability of Y given X over the probability of Y:

  Lift = P(Y|X) / P(Y)

This is the same as the interest factor:

  Interest = P(X,Y) / (P(X) P(Y))

I = 1: independence; I > 1: positive association;
I < 1: negative association.

Many other measures exist, e.g. PS (Piatetsky-Shapiro):

  PS = P(X,Y) - P(X) P(Y)
10
Example: Lift/Interest

              Coffee   ¬Coffee   Total
  Tea           15        5        20
  ¬Tea          75        5        80
  Total         90       10       100

  • Association Rule: Tea → Coffee
  • Confidence = P(Coffee|Tea) = 0.75
  • but P(Coffee) = 0.9
  • Lift = 0.75/0.9 = 0.8333 (< 1, therefore
    negatively associated)

11
Drawback of Lift & Interest

Statistical independence: if P(X,Y) = P(X)P(Y),
then Lift = 1.

            Y    ¬Y
  X        10     0  |  10
  ¬X        0    90  |  90
           10    90  | 100

            Y    ¬Y
  X        90     0  |  90
  ¬X        0    10  |  10
           90    10  | 100
  • Lift favors infrequent items
  • Other criteria proposed Gini, J-measure, etc.
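The two tables above can be checked numerically; this sketch shows that lift rewards the rare-item table far more, even though both exhibit the same perfect association:

```python
# Lift computed from the cells of the two contingency tables above.
def lift(f11, f10, f01, f00):
    total = f11 + f10 + f01 + f00
    p_xy = f11 / total            # P(X, Y)
    p_x = (f11 + f10) / total     # P(X)
    p_y = (f11 + f01) / total     # P(Y)
    return p_xy / (p_x * p_y)

print(lift(10, 0, 0, 90))  # rare items:     0.1 / (0.1 * 0.1) = 10
print(lift(90, 0, 0, 10))  # frequent items: 0.9 / (0.9 * 0.9) ≈ 1.11
```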

12
There are lots of measures proposed in the
literature. Some measures are good for certain
applications, but not for others. What criteria
should we use to determine whether a measure is
good or bad? What about Apriori-style support-based
pruning? How does it affect these measures?
13
Association Rules & Correlations
  • Basic concepts
  • Efficient and scalable frequent itemset mining
    methods
  • Apriori, and improvements
  • FP-growth
  • Rule derivation, visualization and validation
  • Multi-level Associations
  • Summary

14
Multiple-Level Association Rules
  • Items often form hierarchies.
  • Items at the lower levels are expected to have
    lower support.
  • Rules regarding itemsets at appropriate levels
    could be quite useful.
  • Transaction databases can be encoded based on
    dimensions and levels
  • We can explore shared multi-level mining

15
Mining Multi-Level Associations
  • A top-down, progressive deepening approach
  • First find high-level strong rules
  • milk → bread
    [support = 20%, confidence = 60%]
  • Then find their lower-level weaker rules
  • 2% milk → wheat
    bread [support = 6%, confidence = 50%]
  • Variations of mining multiple-level association
    rules:
  • Level-crossed association rules
  • 2% milk → Wonder wheat bread
  • Association rules with multiple, alternative
    hierarchies
  • 2% milk → Wonder bread

16
Multi-level Association: Uniform Support vs.
Reduced Support
  • Uniform Support: the same minimum support for all
    levels
  • One minimum support threshold. No need to
    examine itemsets containing any item whose
    ancestors do not have minimum support.
  • Lower-level items do not occur as frequently.
    If the support threshold is
  • too high → miss low-level associations
  • too low → generate too many high-level
    associations
  • Reduced Support: reduced minimum support at lower
    levels
  • There are 4 search strategies:
  • Level-by-level independent
  • Level-cross filtering by k-itemset
  • Level-cross filtering by single item
  • Controlled level-cross filtering by single item

17
Uniform Support
Multi-level mining with uniform support:
  Level 1 (min_sup = 5%): Milk [support = 10%]
  Level 2 (min_sup = 5%): 2% Milk [support = 6%],
                          Skim Milk [support = 4%]
18
Reduced Support
Multi-level mining with reduced support:
  Level 1 (min_sup = 5%): Milk [support = 10%]
  Level 2 (min_sup = 3%): 2% Milk [support = 6%],
                          Skim Milk [support = 4%]
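The level-wise filtering in the milk example above can be sketched as follows (supports in percent, per-level thresholds as on the slides):

```python
# Reduced support: each hierarchy level gets its own minimum support.
min_sup = {1: 5.0, 2: 3.0}          # percent, per level (from the slides)
items = [("Milk", 1, 10.0),         # (item, level, support %)
         ("2% Milk", 2, 6.0),
         ("Skim Milk", 2, 4.0)]

frequent = [name for name, level, sup in items if sup >= min_sup[level]]
print(frequent)  # all three pass; a uniform min_sup of 5% would have
                 # dropped Skim Milk (4% < 5%)
```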
19
Multi-level Association: Redundancy Filtering
  • Some rules may be redundant due to ancestor
    relationships between rules. Example:
  • milk → wheat bread [support = 8%, confidence =
    70%]
  • Say that 2% milk is 25% of milk sales; then
  • 2% milk → wheat bread [support = 2%, confidence =
    72%]
  • We say the first rule is an ancestor of the
    second rule.
  • A rule is redundant if its support is close to
    the expected value, based on the rule's
    ancestor.
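The redundancy test on this slide can be sketched numerically; the 0.5% tolerance below is an assumption chosen for illustration, not something the slide specifies:

```python
# Expected support of "2% milk -> wheat bread" from its ancestor rule:
# 2% milk is 25% of milk sales, and the ancestor rule has support 8%.
ancestor_support = 8.0    # % (milk -> wheat bread)
share = 0.25              # fraction of milk sales that are 2% milk
observed_support = 2.0    # % (2% milk -> wheat bread)

expected = ancestor_support * share                  # 8% * 0.25 = 2%
redundant = abs(observed_support - expected) <= 0.5  # tolerance is assumed
print(expected, redundant)  # 2.0 True
```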

20
Multi-Level Mining: Progressive Deepening
  • A top-down, progressive deepening approach
  • First mine high-level frequent items
  • milk (15%), bread (10%)
  • Then mine their lower-level weaker frequent
    itemsets
  • 2% milk (5%), wheat bread (4%)
  • Different min_support thresholds across
    multi-levels lead to different algorithms
  • If adopting the same min_support across
    multi-levels,
  • then toss t if any of t's ancestors is
    infrequent.
  • If adopting reduced min_support at lower levels,
  • then examine only those descendants whose
    ancestors' support is frequent/non-negligible.

21
Association Rules & Correlations
  • Basic concepts
  • Efficient and scalable frequent itemset mining
    methods
  • Apriori, and improvements
  • FP-growth
  • Rule derivation, visualization and validation
  • Multi-level Associations
  • Temporal associations and frequent sequences
  • Other association mining methods
  • Summary
  • Temporal associations and frequent sequences
    (covered later)

22
Other Association Mining Methods
  • CHARM: Mining frequent itemsets by a vertical
    data format
  • Mining Frequent Closed Patterns
  • Mining Max-patterns
  • Mining Quantitative Associations, e.g., what is
    the implication between age and income?
  • Constraint-based association mining
  • Frequent Patterns in Data Streams: a very
    difficult problem. Performance is a real issue
  • Constraint-based (Query-Directed) Mining
  • Mining sequential and structured patterns

23
Summary
  • Association rule mining:
  • probably the most significant contribution from
    the database community to KDD
  • New interesting research directions:
  • Association analysis in other types of data:
    spatial data, multimedia data, time-series data, etc.
  • Association Rule Mining for Data Streams: a very
    difficult challenge.

24
Statistical Independence
  • Population of 1000 students
  • 600 students know how to swim (S)
  • 700 students know how to bike (B)
  • 420 students know how to swim and bike (S,B)
  • P(S∧B) = 420/1000 = 0.42
  • P(S) × P(B) = 0.6 × 0.7 = 0.42
  • P(S∧B) = P(S) × P(B) ⇒ statistical independence
  • P(S∧B) > P(S) × P(B) ⇒ positively correlated
  • P(S∧B) < P(S) × P(B) ⇒ negatively correlated
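The slide's numbers can be checked directly:

```python
# Swim/bike independence check from the counts on the slide.
n, n_swim, n_bike, n_both = 1000, 600, 700, 420

p_s, p_b, p_sb = n_swim / n, n_bike / n, n_both / n
independent = abs(p_sb - p_s * p_b) < 1e-9  # 0.42 vs 0.6 * 0.7
print(p_sb, independent)  # 0.42 True
```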