CS/EngMt/CpEng 404 Data Mining

2002 by D. C. St. Clair.


1
CS/EngMt/CpEng 404 Data Mining & Knowledge
Discovery
  • Daniel C. St. Clair, PhD, University of MO Rolla
  • Christopher Merz, PhD, Mastercard
  • Lect 10: Association Rules

2
Lecture 10 Contents
  • Questions
  • Association Rules
  • Semester project discussion

3
Questions/Announcements
  • Homework from last week
  • Just a reminder: lecture notes are INCOMPLETE;
    you will need to complete them during lecture
  • When submitting problems, label the files
  • L<>_<prob>.ppt

4
Schedule
5
The Knowledge Discovery Process
  • Ch. 4.1 7.1
  • Quinlan C 4.5
  • Experimental Design
  • Quinlan C 4.5
  • Unknown values
  • Pruning
  • Neural networks
  • Classical methods
  • Association Rules

Source: Fayyad, U., Piatetsky-Shapiro, G.,
Smyth, P., "From Data Mining to Knowledge Discovery
in Databases," AI Magazine, Fall 1996.
6
Tools for Data Mining Knowledge Discovery
  • Machine Learning: C4.5 (ID3), Association
    Rules, CN2, Genetic, AQ17, EBL, CBR
  • Neural Networks: Backprop, RBF, SOM, ART2
  • Statistical/Mathematical: Correlation, Regression,
    Logistic Reg., Clustering, Autoclass
  • Fuzzy Set Theory: Fuzzy Logic, Fuzzy ID3, ..
7
Topics to Be Covered in Association Rules
  • What is an association?
  • Metrics
  • The apriori algorithm
  • A market basket example
  • An attribute-value example
  • Using the apriori algorithm

8
What is an Association Rule?
  • Implication; not necessarily for prediction
  • nachos <- soda beer
  • Consequent(s): nachos
  • Antecedent(s): soda, beer
  • E.g., 70% of people who buy soda and beer also
    buy nachos
9
Why use Association Rules?
  • Data Understanding
  • Which attributes are strongly related?
  • People who shop at Home Depot also shop at?
  • Model Understanding
  • Compare / contrast highest-scoring decile with
    lowest-scoring decile
  • Can help to explain the behavior of complex
    models like neural networks
  • Segmentation
  • Cross marketing

10
Example Applications
  • Catalog design
  • place the pillows near the couch section
  • Add-on sales
  • people who bought this book also bought
  • Store layout
  • place birthday candles near birthday cards

11
What kind of data are used?
Example store receipt: Apples 3; Dog food 15; Flea collar 4
  • Transaction data
  • store receipt
  • Attribute-value data
  • project data
  • Typically categorical, but may be used with
    partitioned numeric attributes
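
One way attribute-value data can be turned into items is sketched below: each attribute becomes an "attribute=value" item, and a numeric attribute is first partitioned into named intervals. The record, the attribute names, the cut points, and the helper `to_items` are all hypothetical, chosen only to illustrate the idea.

```python
def to_items(record, bins=None):
    """Turn one attribute-value record into a set of 'attribute=value' items.

    bins maps a numeric attribute name to a list of (upper_bound, label)
    pairs sorted by upper_bound; the first bound the value falls under wins.
    """
    bins = bins or {}
    items = set()
    for attr, value in record.items():
        if attr in bins:
            # Partition the numeric value into a named interval.
            value = next((label for bound, label in bins[attr] if value < bound),
                         bins[attr][-1][1])
        items.add(f"{attr}={value}")
    return items


# Hypothetical record and cut points, for illustration only.
record = {"flex_hours": "yes", "hours_per_week": 45}
bins = {"hours_per_week": [(20, "low"), (40, "medium"), (float("inf"), "high")]}
print(sorted(to_items(record, bins)))
```

After this conversion every record is a set of items, so it can be treated like a store receipt.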

12
Class Reading
  • Rakesh Agrawal and Ramakrishnan Srikant, "Fast
    Algorithms for Mining Association Rules," Proc.
    20th Int. Conf. Very Large Data Bases (VLDB),
    1994
  • Sections 1-2.1.1

13
Topics to Be Covered in Association Rules
  • What is an association?
  • Definitions and Metrics
  • The apriori algorithm
  • A market basket example
  • An attribute-value example
  • Using the apriori algorithm

14
Definitions
  • Let
  • I be the set of items
  • D be the set of transactions
  • T be a transaction
  • where T is a set of items and T ⊆ I

15
Confidence
  • The rule X => Y has confidence c if c% of
    transactions in D that contain X also contain Y

16
Support
  • The rule X => Y has support s in the transaction
    set D if s% of transactions in D contain X ∪ Y
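
The two metrics can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names and the four-transaction set `D` are invented here, and the values are returned as fractions rather than percentages.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y.

    Assumes X occurs in at least one transaction.
    """
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical transaction set D, invented for this illustration.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, D))       # 2 of the 4 transactions
print(confidence({"bread"}, {"milk"}, D))  # 2 of the 3 bread transactions
```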

17
Graphic Example
[Figure: Venn diagram of the transaction set D for the rule X => Y, with one region for txns with items in X and one for txns with items in Y; each dot is a transaction.]
Support = 3/18, Confidence = 3/8
18
Example
Rule: {Diaper, Milk} => Beer
Support = |{3, 4}| / |all| = 2/5 = 0.4
Confidence = |{3, 4}| / |{3, 4, 5}| = 2/3 ≈ 0.67
19
Topics to Be Covered in Association Rules
  • What is an association?
  • Definitions and metrics
  • The apriori algorithm
  • A market basket example
  • An attribute-value example
  • Using the apriori algorithm

20
Apriori Principle
  • Collect single item counts. Find large items.
  • Find candidate pairs, count them => large pairs
    of items.
  • Find candidate triplets, count them => large
    triplets of items, and so on
  • Guiding principle Every subset of a frequent
    item set has to be frequent
  • Used to prune many candidate sets

Adapted from R. Grossman, C. Kamath, and V.
Kumar, Data Mining for Scientific and
Engineering Applications
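
The level-wise procedure above can be sketched as follows. This is a simplified illustration under stated assumptions, not the optimized algorithm from the class reading: support is an absolute transaction count, candidates are formed by a naive join of large item sets, and the function name `apriori` and the transactions in `D` are invented here.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: count} for every item set with count >= min_count."""
    def count(candidates):
        return {c: sum(c <= t for t in transactions) for c in candidates}

    # Level 1: count single items and keep the large ones.
    items = {item for t in transactions for item in t}
    large = {c: n for c, n in count({frozenset([i]) for i in items}).items()
             if n >= min_count}
    result = dict(large)
    k = 2
    while large:
        # Join large (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in large for b in large if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be large.
        candidates = {c for c in candidates
                      if all(frozenset(s) in large
                             for s in combinations(c, k - 1))}
        large = {c: n for c, n in count(candidates).items() if n >= min_count}
        result.update(large)
        k += 1
    return result


# Hypothetical transactions, invented for this illustration.
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = apriori(D, 3)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

With a minimum count of 3, every single item and every pair in this toy data is large, while the triplet survives the subset check but is dropped because only two transactions contain it.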
21
Illustrating Apriori Principle
[Figure: count tables for items (1-itemsets) and triplets (3-itemsets), with Minimum Support = 3.]
Example from R. Grossman, C. Kamath, and V.
Kumar, Data Mining for Scientific and
Engineering Applications
22
Discovering All Association Rules
  • Find all item sets that have transaction support
    above minimum support
  • The support of an item set is the number of
    transactions that contain the item set
  • Item sets with minimum support are called large
    item sets

23
Discovering All Association Rules
  • Use the large item sets to generate the rules
  • For every large item set l, find all non-empty
    subsets
  • For every such subset, a, output a rule of the
    form a => (l - a)
  • Keep rules such that
  • support(l) / support(a) >= min_confidence
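
The rule-generation step can be sketched as below. Two assumptions are made for the illustration: only proper subsets are used as antecedents, so that the consequent l - a is never empty, and the confidence test is read as "at least min_confidence". The function name `generate_rules` and the support counts are hypothetical.

```python
from itertools import combinations


def generate_rules(l, support_count, min_confidence):
    """Yield (antecedent, consequent, confidence) for a large item set l.

    support_count maps frozensets to transaction counts and must contain
    l and all of its non-empty subsets.
    """
    l = frozenset(l)
    for size in range(1, len(l)):  # proper subsets only, so l - a is non-empty
        for a in map(frozenset, combinations(sorted(l), size)):
            conf = support_count[l] / support_count[a]
            if conf >= min_confidence:
                yield a, l - a, conf


# Hypothetical support counts, invented for this illustration.
counts = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 5,
    frozenset({"a", "b"}): 3,
}
for a, c, conf in generate_rules({"a", "b"}, counts, 0.7):
    print(sorted(a), "=>", sorted(c), round(conf, 2))
```

Here a => b is kept (3/4 = 0.75) and b => a is dropped (3/5 = 0.6).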

24
Rule Generation
  • What are all the rule candidates for the item set
    {Beer, Bread, Diapers}?

25
Topics to Be Covered in Association Rules
  • What is an association?
  • Definitions and metrics
  • The apriori algorithm
  • A market basket example
  • An attribute-value example
  • Using the apriori algorithm

26
Stadium Heartburn Data
  • beer hotdogs ice_cream
  • beer nachos soda
  • hotdogs ice_cream nachos
  • beer hotdogs ice_cream nachos
  • hotdogs ice_cream
  • beer hotdogs nachos
  • nachos soda
  • beer hotdogs ice_cream nachos
  • ice_cream nachos soda
  • beer hotdogs ice_cream

27
Example
  • What association rule has the highest confidence
    with nachos as a consequent?
  • ??? => nachos
  • or
  • nachos <- ???

28
Soda => Nachos
[Figure: Venn diagram of D (transactions 1-10), with one region for txns with nachos and one for txns with soda.]
Support = 3/10, Confidence = 3/3
29
Results
Rule (Support %, Confidence %)
  • nachos <- soda (30.0, 100.0)
  • hotdogs <- beer (60.0, 83.3)
  • ice_cream <- hotdogs (70.0, 85.7)
  • hotdogs <- ice_cream (70.0, 85.7)
  • nachos <- soda beer (10.0, 100.0)
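
The figures above can be checked against the Stadium Heartburn Data with a short script. This is a sketch for verification only; the helper `rule_stats` is hypothetical. It reports the antecedent's support alongside the support of the whole rule, because the first number listed for each rule matches the fraction of transactions containing the antecedent (for example, beer appears in 6 of 10 transactions, while beer and hotdogs appear together in 5).

```python
# Stadium Heartburn Data, transcribed in slide order as transactions 1-10.
D = [
    {"beer", "hotdogs", "ice_cream"},
    {"beer", "nachos", "soda"},
    {"hotdogs", "ice_cream", "nachos"},
    {"beer", "hotdogs", "ice_cream", "nachos"},
    {"hotdogs", "ice_cream"},
    {"beer", "hotdogs", "nachos"},
    {"nachos", "soda"},
    {"beer", "hotdogs", "ice_cream", "nachos"},
    {"ice_cream", "nachos", "soda"},
    {"beer", "hotdogs", "ice_cream"},
]


def rule_stats(antecedent, consequent, transactions):
    """Return (antecedent support %, rule support %, confidence %)."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)
    n_rule = sum((antecedent | consequent) <= t for t in transactions)
    return (round(100 * n_a / n, 1),
            round(100 * n_rule / n, 1),
            round(100 * n_rule / n_a, 1))


rules = [
    ({"soda"}, {"nachos"}),
    ({"beer"}, {"hotdogs"}),
    ({"hotdogs"}, {"ice_cream"}),
    ({"ice_cream"}, {"hotdogs"}),
    ({"soda", "beer"}, {"nachos"}),
]
for a, c in rules:
    print(sorted(c), "<-", sorted(a), rule_stats(a, c, D))
```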

30
Topics to Be Covered in Association Rules
  • What is an association?
  • Definitions and metrics
  • The apriori algorithm
  • A market basket example
  • An attribute-value example
  • Using the apriori algorithm

31
Attribute-value Example
  • Take several attributes from Work at Home data
    set
  • Convert them to descriptive values (i.e., not
    PESMJ1-1)
  • Execute apriori to explore the associations in
    the data
  • Which rules are intuitive?

32
Results (Part 1)
  • no_other_electronics <- (100.0, 93.3)
  • telephone <- flex_hours (14.6, 84.0)
  • no_other_electronics <- flex_hours (14.6, 90.0)
  • telephone <- reason_business_conducted_from_home
    (19.2, 90.2)
  • no_other_electronics <-
    reason_business_conducted_from_home (19.2, 92.8)

33
Results (Part 2)
  • no_fax <- no_telephone (23.3, 93.2)
  • no_other_electronics <- no_telephone (23.3, 97.8)
  • telephone <- no_flex_hours (25.5, 82.5)
  • no_other_electronics <- no_flex_hours (25.5, 93.8)
  • no_other_electronics <- reason_nature_of_job
    (26.5, 94.3)
  • email_or_internet <- fax (35.3, 85.9)
  • telephone <- fax (35.3, 95.5)

34
Topics to Be Covered in Association Rules
  • What is an association?
  • Definitions and metrics
  • The apriori algorithm
  • A market basket example
  • An attribute-value example
  • Using the apriori algorithm

35
Using Apriori
  • Weka Explorer panel
  • Associate tab
  • In-class example

36
Homework 10
  • In the Stadium Heartburn Data:
  • a.) Derive an association rule with two items in
    the antecedent and ice_cream as the consequent.
    Compute the support and confidence.
  • b.) What are 5 possible rules for the item set
    {beer, hotdogs, nachos}? Compute their support
    and confidence. (Assume min_confidence = 0)
  • Choose 4 or 5 categorical attributes from your
    class project data set and run apriori in Weka.
    List 3 rules that you find interesting.