CSE 980: Data Mining

Provided by: Computa3

1
CSE 980: Data Mining
  • Lecture 12: Extension to Association Analysis
    Formulation

2
Mining Infrequent Patterns
  • An infrequent pattern is an itemset or rule whose
    support is less than the minsup threshold
  • When do infrequent patterns become interesting?
      • Negative correlation
          • P(A,B) << P(A)P(B)
          • e.g., Windows vs. Linux
      • Exception rules
          • (Fire = Yes) → (Alarm = Off) may be infrequent, but
            it is interesting because it may suggest a faulty
            alarm system
  • Challenge
      • There is an enormous number of infrequent patterns
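
The negative-correlation condition P(A,B) << P(A)P(B) can be checked directly from transaction data. The following is a minimal Python sketch and is not part of the original slides: the transactions are invented, and the `ratio` cutoff standing in for "much less than" is an assumed parameter.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"Windows", "Office"},
    {"Windows", "Office"},
    {"Windows"},
    {"Windows", "Photoshop"},
    {"Linux", "Apache"},
    {"Linux"},
    {"Linux", "Apache"},
    {"Windows", "Linux"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def negatively_correlated(a, b, transactions, ratio=0.5):
    """True if P(a,b) falls below ratio * P(a) * P(b).

    `ratio` is an assumed cutoff that approximates "much less than".
    """
    p_a = support({a}, transactions)
    p_b = support({b}, transactions)
    p_ab = support({a, b}, transactions)
    return p_ab < ratio * p_a * p_b


print(negatively_correlated("Windows", "Linux", transactions))
print(negatively_correlated("Windows", "Office", transactions))
```

A smaller `ratio` makes the test stricter, so fewer item pairs are reported as negatively correlated.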

3
Negative Associations
  • A negative itemset X is an itemset that satisfies
    the following properties:
      • X = A ∪ ¬B, where A is a set of positive items and
        ¬B is a set of negative items
      • Support s(X) ≥ minsup
  • A negative association rule r is a rule extracted
    from a negative itemset X that satisfies the
    following properties:
      • s(X) ≥ minsup
      • confidence(r) ≥ minconf
  • Example: tea → ¬coffee
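
To make the two conditions concrete, the sketch below computes the support of a negative itemset (positive items present, negated items absent) and the confidence of a negative rule such as tea → ¬coffee. It is an illustrative Python sketch only: the baskets are invented, and the function names `negative_support` and `negative_rule_confidence` are hypothetical.

```python
# Hypothetical baskets, invented for illustration only.
baskets = [
    {"tea"},
    {"tea"},
    {"tea", "coffee"},
    {"coffee"},
    {"coffee", "milk"},
]


def negative_support(positive, negated, transactions):
    """Support of X = A ∪ ¬B: all of A present, every item of B absent."""
    positive, negated = set(positive), set(negated)
    hits = sum(positive <= t and not (negated & t) for t in transactions)
    return hits / len(transactions)


def negative_rule_confidence(antecedent, negated, transactions):
    """Confidence of the rule antecedent → ¬negated."""
    antecedent, negated = set(antecedent), set(negated)
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(not (negated & t) for t in covered) / len(covered)


s = negative_support({"tea"}, {"coffee"}, baskets)
c = negative_rule_confidence({"tea"}, {"coffee"}, baskets)
print(s, c)
```

With these baskets, tea appears without coffee in 2 of 5 transactions, and in 2 of the 3 transactions that contain tea.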

4
Negative vs Frequent Patterns
Venn diagram includes all possible patterns
extracted from a given itemset. Negative patterns
are either negative itemsets or negative
association rules.
5
Negative vs Frequent Patterns
minsup = 40%, minconf = 50%
  • Coke → Milk (support = 40%, conf = 100%)
      • ⇒ frequent and strong association rule
  • Beer → Milk (support = 40%, conf = 67%, but
    support(Milk) = 80%)
      • ⇒ negatively correlated pattern
  • Milk → ¬Eggs (support = 80%, conf = 100%)
      • ⇒ negative association rule

6
Approach 1: Using Negative Items
  • Computationally expensive
  • Tends to produce many uninteresting negative
    associations
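
One way to see both drawbacks is to add an explicit negative item for every item missing from a transaction and then count frequent pairs. The Python sketch below is an illustration under assumptions: the transactions are invented, and the `"not "` prefix is just a convention chosen here for naming negative items.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions over items A, B, C, D.
original = [{"A", "B"}, {"A", "C"}, {"B"}, {"D"}]


def augment_with_negatives(transactions):
    """Add 'not x' to each transaction for every item x it lacks."""
    items = sorted(set().union(*transactions))
    return [
        frozenset(t) | {"not " + i for i in items if i not in t}
        for t in transactions
    ]


def frequent_pairs(transactions, minsup):
    """Pairs whose support is at least minsup, by brute-force counting."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= minsup}


augmented = augment_with_negatives(original)
print(len(frequent_pairs(original, 0.5)))   # frequent pairs without negatives
print(len(frequent_pairs(augmented, 0.5)))  # frequent pairs with negatives
```

Every augmented transaction is as wide as the full item set, so the number of candidate itemsets grows quickly, and most of the new frequent pairs consist only of negative items.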

7
Approach 1: Using Negative Items
[Figure: size-2 and size-3 itemsets over items A, B and C]
Support of {¬A,B}, {A,¬B} and {¬A,¬B} can be very
large
8
Approach 2: Using Positive Itemsets
  • Boulicaut et al. (2000)
  • Compute support of negative itemsets based on the
    support of positive itemsets
      • s(X ∪ ¬Y) = Σ over Z ⊆ Y of (-1)^|Z| × s(X ∪ Z)
      • e.g., s(AB¬C¬D) = s(AB) - s(ABC) - s(ABD) + s(ABCD)
  • To use this formula
      • Need to use a very low support threshold, or
      • Use approximation

s(X) = support of X
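
The support of an itemset with negated items follows from inclusion-exclusion over the positive itemsets. The Python sketch below illustrates this on invented transactions and checks the result against a direct count; the function names are hypothetical and the code is not taken from Boulicaut et al.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "B", "C", "D"},
    {"A", "C"},
]


def support(itemset):
    """Support of a positive itemset, counted directly."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def negative_itemset_support(positive, negated, support_fn):
    """s(X ∪ ¬Y) as the sum over Z ⊆ Y of (-1)^|Z| * s(X ∪ Z)."""
    positive = frozenset(positive)
    negated = sorted(negated)
    total = 0.0
    for k in range(len(negated) + 1):
        for subset in combinations(negated, k):
            total += (-1) ** k * support_fn(positive | frozenset(subset))
    return total


# s(AB¬C¬D) = s(AB) - s(ABC) - s(ABD) + s(ABCD)
via_formula = negative_itemset_support({"A", "B"}, {"C", "D"}, support)
direct = sum(
    {"A", "B"} <= t and not ({"C", "D"} & t) for t in transactions
) / len(transactions)
print(via_formula, direct)
```

The formula needs the support of every X ∪ Z, including itemsets that may be infrequent, which is why a very low support threshold or an approximation is required.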
9
Approach 3: Using Domain Knowledge
  • Approach
      • Compute expected support using item taxonomy
      • If actual support is much lower than expected
        support, then declare it a negative itemset
  • Challenges
      • There could be multiple taxonomies (based on
        brand, size, etc.)
      • Limited to nodes that are directly connected to
        the frequent itemsets

Suppose C and G are frequent
10
Approach 3: Using Domain Knowledge
  • A negative itemset is a set of items whose actual
    support is significantly lower than its expected
    support
  • Negative association rule: X → ¬Y
  • Rule interest measure
  • Approach
      • Find frequent itemsets at each level of the
        taxonomy
      • Identify candidate negative itemsets based on the
        frequent itemsets found and their item taxonomy
      • Count actual support of candidate itemsets and
        retain only the negative itemsets
      • Generate negative association rules from negative
        itemsets
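
The slides do not state how expected support is computed. The Python sketch below assumes one common estimate: the support of a child item together with another item G is the support of its taxonomy parent with G, scaled by the child's share of the parent's support. The numbers, the function names and the `max_ratio` cutoff for "significantly lower" are all hypothetical.

```python
def expected_support(s_parent_with_other, s_child, s_parent):
    """Assumed estimate: E[s(child, G)] = s(parent, G) * s(child) / s(parent)."""
    if s_parent == 0:
        return 0.0
    return s_parent_with_other * s_child / s_parent


def is_negative_itemset(actual, expected, max_ratio=0.5):
    """Flag the itemset when actual support is below max_ratio * expected.

    `max_ratio` is an assumed cutoff for "significantly lower".
    """
    return actual < max_ratio * expected


# Hypothetical supports: parent P with G, child C, and parent P.
expected = expected_support(s_parent_with_other=0.20, s_child=0.15, s_parent=0.30)
print(expected)
print(is_negative_itemset(actual=0.02, expected=expected))
```

Only candidates that pass this test would be kept before generating negative association rules from them.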

11
Approach 4: Indirect Association
[Figure: items a and b, each connected to a mediator M]
IF a and b are each associated with M, THEN a and b are
expected to occur frequently together
  • a and b are indirectly associated via mediator M
  • M identifies the context in which the negative
    association is interesting

12
When does Indirect Association become interesting?
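For all pairs of items:

              With Mediator   No Mediator
Frequent      FM              FN
Infrequent    IM              IN

(Frequent and infrequent pairs are separated by the minimum
itempair support; the mediator thresholds decide whether a
pair has a mediator.)

If IM/FM ≈ IN/FN, then indirect association is not surprising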
13
Finding Interesting Negative Associations
              With Mediator   No Mediator
Frequent      FM              FN
Infrequent    IM              IN

  • IM/FM is small
  • IM/IN is small
  • ⇒ Indirect association is interesting
14
Finding Interesting Negative Association
Indirect association is interesting when the minimum
itempair support threshold is small. But if the
threshold is too low, very few indirect
associations are obtained.
15
Grouping Indirect Associations
  • Indirect associations can be grouped together
    into more compact structures if they have the same
    mediator

Check degree of association
16
Mining Indirect Associations
[Figure: mining algorithm with a join step and a prune step]
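
The join and prune steps can be sketched as follows. This Python sketch is an illustration, not the algorithm from the slide: the transactions are invented, the IS (cosine) measure used for dependence is an assumed choice, and the threshold names `pair_minsup` and `min_dep` are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"coke", "chips"}, {"coke", "chips"}, {"coke", "chips"},
    {"pepsi", "chips"}, {"pepsi", "chips"}, {"pepsi", "chips"},
]


def support(itemset):
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def dependence(item, mediator):
    """IS (cosine) measure between an item and a mediator set (assumed)."""
    denom = sqrt(support({item}) * support(mediator))
    return support(mediator | {item}) / denom if denom else 0.0


def mine_indirect(frequent_itemsets, pair_minsup, min_dep):
    """Return (a, b, mediator) triples from equal-size frequent itemsets."""
    results = []
    for x, y in combinations(frequent_itemsets, 2):
        mediator = x & y
        # Join step: x and y must differ in exactly one item each.
        if len(x) < 2 or len(x) != len(y) or len(mediator) != len(x) - 1:
            continue
        (a,), (b,) = x - mediator, y - mediator
        # Prune step: a and b rarely co-occur, yet both depend on the mediator.
        if (support({a, b}) < pair_minsup
                and dependence(a, mediator) >= min_dep
                and dependence(b, mediator) >= min_dep):
            results.append((a, b, mediator))
    return results


frequent = [frozenset({"coke", "chips"}), frozenset({"pepsi", "chips"})]
found = mine_indirect(frequent, pair_minsup=0.1, min_dep=0.5)
print(found)
```

The frequent itemsets are passed in rather than mined here, so any frequent itemset generator could supply them.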
17
Application: LA-Times
18
Application: LA-Times
19
Application: LA-Times
20
Application: Reuters-21578 news
  • Indirect association can identify different
    contexts of a word

21
Application: Reuters-21578 news
22
Application: Reuters-21578 news
23
Application: Retail Data
  • Indirect association can identify competing
    (and sometimes complementary) items

24
Application: Retail Data
  • Note: There is no checkered-flag border wallpaper