Title: CS/EngMt/CpEng 404 Data Mining
1CS/EngMt/CpEng 404 Data Mining and Knowledge Discovery
- Daniel C. St. Clair, PhD, University of MO Rolla
- Christopher Merz, PhD, Mastercard
- Lect 10: Association Rules
2Lecture 10 Contents
- Questions
- Association Rules
- Semester project discussion
3Questions/Announcements
- Just a reminder: lecture notes are INCOMPLETE; you will need to complete them during lecture
- When submitting problems, label the files
- L<>_<prob>.ppt
4Schedule
5The Knowledge Discovery Process
- Ch. 4.1 7.1
- Quinlan C4.5
- Experimental Design
- Quinlan C4.5
- Unknown values
- Pruning
- Neural networks
- Classical methods
- Association Rules
Source: Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., From Data Mining to Knowledge Discovery in Databases, AI Magazine, Fall 1996.
6Tools for Data Mining and Knowledge Discovery
- Machine Learning: C4.5 (ID3), Association Rules, CN2, Genetic, AQ17, EBL, CBR
- Neural Networks: Backprop, RBF, SOM, ART2
- Statistical/Mathematical: Correlation, Regression, Logistic Reg., Clustering, Autoclass
- Fuzzy Set Theory: Fuzzy Logic, Fuzzy ID3, ..
7Topics to Be Covered in Association Rules
- What is an association?
- Metrics
- The apriori algorithm
- A market basket example
- An attribute-value example
- Using the apriori algorithm
8What is an Association Rule?
- An implication: Antecedent(s) => Consequent(s)
- Not necessarily for prediction
- E.g., 70% of people who buy soda and beer also buy nachos
9Why use Association Rules?
- Data Understanding
- Which attributes are strongly related?
- People who shop at Home Depot also shop at?
- Model Understanding
- Compare / contrast highest-scoring decile with lowest-scoring decile
- Can help to explain the behavior of complex models like neural networks
- Segmentation
- Cross marketing
10Example Applications
- Catalog design
- place the pillows near the couch section
- Add-on sales
- people who bought this book also bought
- Store layout
- place birthday candles near birthday cards
11What kind of data are used?
- Transaction data
- store receipt, e.g., Apples 3, Dog food 15, Flea collar 4
- Attribute-value data
- project data
- Typically categorical, but may be used with partitioned numeric attributes
12Class Reading
- Fast Algorithms for Mining Association Rules (1994), Rakesh Agrawal and Ramakrishnan Srikant, Proc. 20th Int. Conf. Very Large Data Bases, VLDB
- Sections 1-2.1.1
14Definitions
- Let
- I be the set of items
- D be the set of transactions
- T be a transaction
- where $T \subseteq I$ for every T in D
15Confidence
- The rule X => Y has confidence c if c% of transactions in D that contain X also contain Y
16Support
- The rule X => Y has support s in the transaction set D if s% of transactions in D contain X union Y
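The two metrics can be sketched in a few lines of Python. The baskets below are made-up illustration data, not from the lecture, and the helper names `support` and `confidence` are hypothetical; both return fractions rather than percentages.

```python
# Hypothetical toy transaction set D; each transaction T is a set of items.
D = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Among transactions containing X, the fraction that also contain Y."""
    x, y = set(x), set(y)
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        raise ValueError("no transaction contains the antecedent")
    return sum(1 for t in with_x if y <= t) / len(with_x)

# Rule {diapers} => {beer}
s = support({"diapers", "beer"}, D)       # transactions containing X union Y
c = confidence({"diapers"}, {"beer"}, D)  # restricted to transactions with X
print(f"support={s:.2f} confidence={c:.2f}")
```

Note that support is measured against all of D, while confidence is measured only against the transactions that contain X.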
17Graphic Example
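Rule: X => Y
[Figure: Venn diagram of the transaction set D, with one region for txns with items in X and one for txns with items in Y; each of the 18 dots is one transaction, 8 fall in X, and 3 fall in both X and Y.]
Support 3/18, Confidence 3/8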
18Example
Rule: {Diaper, Milk} => Beer
Support = |{3, 4}| / |all| = 2/5 = 0.4
Confidence = |{3, 4}| / |{3, 4, 5}| = 2/3 ≈ 0.67
(Here {3, 4} and {3, 4, 5} are sets of transaction IDs.)
20Apriori Principle
- Collect single item counts. Find large items.
- Find candidate pairs, count them => large pairs of items.
- Find candidate triplets, count them => large triplets of items, and so on
- Guiding principle: Every subset of a frequent item set has to be frequent
- Used to prune many candidate sets
Adapted from R. Grossman, C. Kamath, and V. Kumar, Data Mining for Scientific and Engineering Applications
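The level-wise search described above can be sketched in Python as follows. This is a minimal illustration, not the optimized candidate generation from the Agrawal and Srikant paper: the function name `frequent_itemsets` and the baskets are assumptions made for the example, and minimum support is given as an absolute count.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Level-wise search: large 1-itemsets, then pairs, triplets, and so on."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    large = {c: n for c, n in count(items).items() if n >= min_count}
    result = dict(large)
    k = 2
    while large:
        prev = set(large)
        # Join step: unite pairs of large (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate with a (k-1)-subset that is not large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        large = {c: n for c, n in count(candidates).items() if n >= min_count}
        result.update(large)
        k += 1
    return result

# Hypothetical baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = frequent_itemsets(baskets, min_count=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With these baskets the prune step discards {bread, diapers, beer} and {milk, diapers, beer} without counting them, because {bread, beer} and {milk, beer} are not large.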
21Illustrating Apriori Principle
Items (1-itemsets)
Minimum Support = 3
Triplets (3-itemsets)
Example from R. Grossman, C. Kamath, and V. Kumar, Data Mining for Scientific and Engineering Applications
22Discovering All Association Rules
- Find all item sets that have transaction support above minimum support
- The support of an item set is the number of transactions that contain the item set
- Item sets with minimum support are called large item sets
23Discovering All Association Rules
- Use the large item sets to generate the rules
- For every large item set l, find all non-empty proper subsets
- For every such subset, a, output a rule of the form a => (l - a)
- Keep rules such that
- support(l) / support(a) >= min_confidence
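The rule-generation step can be sketched as below. The function name `rules_from_itemset` and the baskets are assumptions for illustration; the confidence test is the ratio of the item set's count to the antecedent's count, which equals support(l) / support(a).

```python
from itertools import combinations

def rules_from_itemset(l, transactions, min_confidence):
    """Return (antecedent, consequent, confidence) for rules a => (l - a)."""
    l = frozenset(l)

    def count(s):
        return sum(1 for t in transactions if s <= t)

    count_l = count(l)
    if count_l == 0:
        return []                             # l never occurs, so no rules
    rules = []
    for size in range(1, len(l)):             # non-empty proper subsets of l
        for a in map(frozenset, combinations(sorted(l), size)):
            conf = count_l / count(a)         # support(l) / support(a)
            if conf >= min_confidence:
                rules.append((a, l - a, conf))
    return rules

# Hypothetical baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
rules = rules_from_itemset({"diapers", "beer"}, baskets, min_confidence=0.8)
for a, rest, conf in rules:
    print(sorted(a), "=>", sorted(rest), round(conf, 2))
```

Here beer => diapers is kept (3 of 3 beer baskets contain diapers), while diapers => beer is dropped (3 of 4).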
24Rule Generation
- What are all the rule candidates for the item set {Beer, Bread, Diapers}?
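As a worked count, assuming every candidate rule has a non-empty antecedent and a non-empty consequent: an item set with k items has one candidate per non-empty proper subset, so for k = 3 the count and the candidates are

```latex
\[
  \#\text{candidates} = 2^{k} - 2 = 2^{3} - 2 = 6
\]
\begin{align*}
  \{\text{Beer}\} &\Rightarrow \{\text{Bread}, \text{Diapers}\} &
  \{\text{Beer}, \text{Bread}\} &\Rightarrow \{\text{Diapers}\} \\
  \{\text{Bread}\} &\Rightarrow \{\text{Beer}, \text{Diapers}\} &
  \{\text{Beer}, \text{Diapers}\} &\Rightarrow \{\text{Bread}\} \\
  \{\text{Diapers}\} &\Rightarrow \{\text{Beer}, \text{Bread}\} &
  \{\text{Bread}, \text{Diapers}\} &\Rightarrow \{\text{Beer}\}
\end{align*}
```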
26Stadium Heartburn Data
- beer hotdogs ice_cream
- beer nachos soda
- hotdogs ice_cream nachos
- beer hotdogs ice_cream nachos
- hotdogs ice_cream
- beer hotdogs nachos
- nachos soda
- beer hotdogs ice_cream nachos
- ice_cream nachos soda
- beer hotdogs ice_cream
27Example
- What association rule has the highest confidence with nachos as a consequent?
- ??? => nachos
- or
- nachos <= ???
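28Soda => Nachos
[Figure: Venn diagram of the 10 transactions in D. The txns with soda (2, 7, 9) lie entirely inside the txns with nachos (2, 3, 4, 6, 7, 8, 9); transactions 1, 5, and 10 contain neither.]
Support 3/10, Confidence 3/3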
29Results
Each rule is followed by (Support, Confidence), in percent.
- nachos <- soda (30.0, 100.0)
- hotdogs <- beer (60.0, 83.3)
- ice_cream <- hotdogs (70.0, 85.7)
- hotdogs <- ice_cream (70.0, 85.7)
- nachos <- soda beer (10.0, 100.0)
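These figures can be checked against the Stadium Heartburn Data with a short sketch. The helper names `pct` and `describe` are hypothetical. For the rules checked here, the first number in each pair on the slide appears to match the percentage of transactions containing the antecedent, which can differ from the percentage containing antecedent and consequent together, so the sketch reports both.

```python
# The ten Stadium Heartburn transactions, in the order listed on the slide.
stadium = [
    {"beer", "hotdogs", "ice_cream"},
    {"beer", "nachos", "soda"},
    {"hotdogs", "ice_cream", "nachos"},
    {"beer", "hotdogs", "ice_cream", "nachos"},
    {"hotdogs", "ice_cream"},
    {"beer", "hotdogs", "nachos"},
    {"nachos", "soda"},
    {"beer", "hotdogs", "ice_cream", "nachos"},
    {"ice_cream", "nachos", "soda"},
    {"beer", "hotdogs", "ice_cream"},
]

def pct(itemset):
    """Percentage of stadium transactions containing every item in itemset."""
    return 100.0 * sum(1 for t in stadium if set(itemset) <= t) / len(stadium)

def describe(antecedent, consequent):
    """Return (antecedent %, rule %, confidence %) for antecedent => consequent."""
    body = pct(antecedent)
    both = pct(set(antecedent) | {consequent})
    return round(body, 1), round(both, 1), round(100.0 * both / body, 1)

for ante, cons in [
    ({"soda"}, "nachos"),
    ({"beer"}, "hotdogs"),
    ({"hotdogs"}, "ice_cream"),
    ({"ice_cream"}, "hotdogs"),
    ({"soda", "beer"}, "nachos"),
]:
    print(cons, "<-", " ".join(sorted(ante)), describe(ante, cons))
```

For example, beer appears in 6 of 10 transactions and beer with hotdogs in 5 of 10, giving a confidence of 5/6.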
31Attribute-value Example
- Take several attributes from Work at Home data set
- Convert them to descriptive values (i.e., not PESMJ1-1)
- Execute apriori to explore the associations in the data
- Which rules are intuitive?
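The conversion step might look like the sketch below. The records, the coded column names (`FLEX`, `PHONE`, `FAX`, `HOURS`), and the cut points for the numeric attribute are all invented for illustration and are not taken from the Work at Home data set; only the descriptive item names follow the style used in this lecture.

```python
# Hypothetical survey records; the coded column names and values are invented.
records = [
    {"FLEX": 1, "PHONE": 1, "FAX": 0, "HOURS": 6},
    {"FLEX": 0, "PHONE": 1, "FAX": 1, "HOURS": 25},
    {"FLEX": 0, "PHONE": 0, "FAX": 0, "HOURS": 12},
]

# Map each coded yes/no attribute to a pair of descriptive item names,
# indexed by the coded value (0 = no, 1 = yes).
NAMES = {
    "FLEX": ("no_flex_hours", "flex_hours"),
    "PHONE": ("no_telephone", "telephone"),
    "FAX": ("no_fax", "fax"),
}

def hours_item(hours):
    """Partition a numeric attribute into categorical bins (cut points assumed)."""
    if hours < 8:
        return "hours_lt_8"
    if hours < 20:
        return "hours_8_to_19"
    return "hours_20_plus"

def to_transaction(record):
    """Turn one attribute-value record into a set of descriptive items."""
    items = {NAMES[attr][record[attr]] for attr in NAMES}
    items.add(hours_item(record["HOURS"]))
    return items

transactions = [to_transaction(r) for r in records]
for t in transactions:
    print(" ".join(sorted(t)))
```

Each record becomes one transaction, so the same item set counting used for market baskets applies unchanged.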
32Results (Part 1)
- no_other_electronics <- (100.0, 93.3)
- telephone <- flex_hours (14.6, 84.0)
- no_other_electronics <- flex_hours (14.6, 90.0)
- telephone <- reason_business_conducted_from_home (19.2, 90.2)
- no_other_electronics <- reason_business_conducted_from_home (19.2, 92.8)
33Results (Part 2)
- no_fax <- no_telephone (23.3, 93.2)
- no_other_electronics <- no_telephone (23.3, 97.8)
- telephone <- no_flex_hours (25.5, 82.5)
- no_other_electronics <- no_flex_hours (25.5, 93.8)
- no_other_electronics <- reason_nature_of_job (26.5, 94.3)
- email_or_internet <- fax (35.3, 85.9)
- telephone <- fax (35.3, 95.5)
35Using Apriori
- Weka Explorer panel
- Associate tab
- In-class example
36Homework 10
- In the Stadium Heartburn Data,
- a.) Derive an association rule with two items in the antecedent and ice_cream as the consequent. Compute the support and confidence.
- b.) What are 5 possible rules for the item set {beer, hotdogs, nachos}? Compute their support and confidence. (Assume min_confidence = 0)
- Choose 4 or 5 categorical attributes from your class project data set and run apriori in Weka. List 3 rules that you find interesting.