Data Mining II: Association Rule Mining

Acc 522, Fall 2001. Jagdish S. Gangolly. 9/24/09.


1
Data Mining II: Association Rule Mining
Classification
  • Jagdish Gangolly
  • State University of New York at Albany

2
Data Mining II
  • Attribute-Oriented Induction
  • Mining association rules
  • Mining single-dimensional Boolean association
    rules
  • Classification

3
Attribute-Oriented Induction I
  • Steps
    • Original query (in DMQL)
      • specify the database to be mined
      • specify relevant attributes
      • specify the relation to be mined
      • specify the concept in the hierarchy
    • Transformation of the DMQL query into a relational query
      whose execution yields the initial working relation

4
Attribute-Oriented Induction II
  • Attribute removal/generalisation
    • Removal rule: remove an attribute if
      • there is no generalisation operator on the attribute
        (large set of attribute values, but no
        generalisation operator), or
      • higher-level concepts in the hierarchy are expressed
        in terms of other attributes (address example)
    • Generalisation rule: if there are many attribute
      values and there are generalisation operators,
      use them
    • Attribute generalisation threshold control
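
The removal and generalisation rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `decide_attribute_action`, the threshold of 3, and the sample values are hypothetical, and the second removal case (higher-level concepts expressed through other attributes) is not modelled.

```python
def decide_attribute_action(values, has_hierarchy, threshold):
    """Return 'keep', 'generalise', or 'remove' for one attribute.

    values        -- the attribute's values in the working relation
    has_hierarchy -- True if a generalisation operator (concept hierarchy) exists
    threshold     -- assumed attribute generalisation threshold: the largest
                     number of distinct values tolerated without action
    """
    distinct = len(set(values))
    if distinct <= threshold:
        return "keep"        # few distinct values: leave the attribute alone
    if has_hierarchy:
        return "generalise"  # generalisation rule: many values, operator exists
    return "remove"          # removal rule: many values, no operator


# Hypothetical sample columns from a working relation.
names = ["Ann", "Bob", "Cy", "Dee", "Eve"]
cities = ["Albany", "Troy", "Buffalo", "Albany", "Rochester"]
gender = ["F", "M", "M", "F", "F"]

actions = {
    "name": decide_attribute_action(names, has_hierarchy=False, threshold=3),
    "city": decide_attribute_action(cities, has_hierarchy=True, threshold=3),
    "gender": decide_attribute_action(gender, has_hierarchy=False, threshold=3),
}
print(actions)
```

Here `name` has many distinct values and no hierarchy, so it is removed; `city` has many values but can be climbed to a higher concept, so it is generalised; `gender` is already below the threshold and is kept.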

5
Basic Algorithm for Attribute-Oriented Induction
  • Input: relational database, DMQL query, a list
    of attributes, a set of concept hierarchies,
    attribute generalisation thresholds
  • Output: a prime generalised relation
  • Method
    • Collect task-relevant data into a working
      relation W
    • Collect statistics on the working relation
    • Derive the prime relation P
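
One generalisation pass of the method above might look like the following sketch. It assumes a one-level concept hierarchy and skips the statistics and threshold checks; `CITY_TO_STATE`, `generalise_and_merge`, and the sample tuples are hypothetical names and data introduced for illustration, not part of the slides.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> state (one level up).
CITY_TO_STATE = {"Albany": "NY", "Buffalo": "NY", "Boston": "MA"}


def generalise_and_merge(working_relation, hierarchy, attribute_index):
    """Replace one attribute by its parent concept, then merge identical
    tuples and record how many original tuples each merged tuple covers."""
    counts = Counter()
    for row in working_relation:
        row = list(row)
        row[attribute_index] = hierarchy[row[attribute_index]]
        counts[tuple(row)] += 1
    # Append the count as the last column of each generalised tuple.
    return [row + (count,) for row, count in sorted(counts.items())]


# Working relation W with columns (major, city).
W = [
    ("Accounting", "Albany"),
    ("Accounting", "Buffalo"),
    ("Finance", "Boston"),
    ("Accounting", "Albany"),
]

P = generalise_and_merge(W, CITY_TO_STATE, attribute_index=1)
print(P)
```

The count column is what lets the generalised relation stay small while still reflecting how much of the original data each tuple summarises.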

6
Mining association rules I
  • Some examples
    • Market basket analysis: analysing customer buying
      habits
    • Intrusion detection by analysing user habits

7
Mining association rules II
  • Basic concepts
    • Set of items I
    • Task-relevant data D, consisting of
      database transactions T ⊆ I
    • An association rule is an implication of the form
      A ⇒ B, where A ⊂ I, B ⊂ I, A ∩ B = ∅
    • support(A ⇒ B) = P(A ∪ B)
    • confidence(A ⇒ B) = P(B|A)
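
Support and confidence can be estimated from a transaction database by counting. The sketch below assumes transactions are stored as Python sets; the function names and the sample transactions are illustrative only. Note that the probability in the support definition is read as the fraction of transactions containing every item of both A and B.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(B|A) as support(A and B together) / support(A).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical transaction database D.
D = [
    frozenset({"computer", "software"}),
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
]

rule_support = support({"computer", "software"}, D)
rule_confidence = confidence({"computer"}, {"software"}, D)
print(rule_support, rule_confidence)
```

In this toy database two of four transactions contain both items, and two of the three transactions containing `computer` also contain `software`.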

8
Mining association rules II
  • Classification of association rules
    • Based on types of values
      • Boolean: computer ⇒ financial-management-software
      • Quantitative association rule: age(X, 30..39) ∧
        income(X, 42K..48K) ⇒ buys(X,
        financial-management-software)
    • Based on dimensions of data involved in the
      rule: buys(X, computer) ⇒ buys(X,
      financial-management-software)

9
Mining association rules III
  • Based on levels of abstraction
    • age(X, 30..39) ⇒ buys(X, laptop)
    • age(X, 30..39) ⇒ buys(X, computer)

10
Mining single-dimensional Boolean association
rules I
  • Apriori algorithm for finding frequent itemsets
    • Apriori property: all nonempty subsets of a
      frequent itemset must also be frequent. If P(I)
      < min_sup, then for any item A, P(I ∪ A) < min_sup
    • Steps
      • Join step: a set of candidate k-itemsets, denoted
        C_k, is generated by joining L_(k-1) with itself
      • Prune step: prune C_k by removing any candidate
        that has an infrequent (k-1)-subset
    • Example 6-1 (p.232)
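
The join and prune steps can be sketched as follows. This is a simplified illustration, not the textbook procedure verbatim: the join here unions any two frequent (k-1)-itemsets that yield a k-itemset rather than joining on a shared prefix, which gives the same candidates once pruning is applied. Function names, the absolute `min_count` threshold, and the sample transactions are assumptions.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = set(prev_frequent)
    # Join step: union pairs of (k-1)-itemsets that produce a k-itemset.
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
    return {
        c for c in candidates
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        c = frozenset([item])
        n = sum(1 for t in transactions if c <= t)
        if n >= min_count:
            current[c] = n
    frequent = dict(current)
    k = 2
    while current:
        counts = {}
        for c in apriori_gen(current.keys(), k):
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                counts[c] = n
        frequent.update(counts)
        current = counts  # the new L_k; the loop stops when it is empty
        k += 1
    return frequent


# Hypothetical transactions; an itemset is frequent if it occurs at least 3 times.
D = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
result = apriori(D, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

In this toy run every single item and every pair is frequent, while the triple {A, B, C} survives pruning as a candidate but fails the support count.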

11
Classification I
  • Supervised learning
    • Training data
    • Test data
    • Training data are analysed to derive
      classification rules; the test data are used to
      estimate the accuracy of the classification rules
  • Unsupervised learning or clustering
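
The supervised workflow above (derive rules from training data, estimate accuracy on test data) can be illustrated with a deliberately tiny classifier. The one-attribute majority-class rule learner below is a stand-in chosen for brevity, not a method named in the slides; the function names and the data are hypothetical.

```python
from collections import Counter, defaultdict


def train_one_rule(training_data):
    """Learn one rule per attribute value: predict the majority class."""
    by_value = defaultdict(Counter)
    for value, label in training_data:
        by_value[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in by_value.items()}


def accuracy(rules, test_data, default):
    """Fraction of test tuples whose predicted class matches the true class."""
    correct = sum(
        1 for value, label in test_data if rules.get(value, default) == label
    )
    return correct / len(test_data)


# Hypothetical data: (age group, buys a computer?)
training = [("young", "yes"), ("young", "yes"), ("young", "no"),
            ("senior", "no"), ("senior", "no"), ("middle", "yes")]
test = [("young", "yes"), ("senior", "no"),
        ("senior", "yes"), ("middle", "yes")]

rules = train_one_rule(training)
estimated_accuracy = accuracy(rules, test, default="no")
print(rules, estimated_accuracy)
```

Keeping the test tuples out of training is what makes the resulting figure an estimate of accuracy on unseen data rather than a measure of fit.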

12
Classification II
  • Preliminary steps
    • Data cleaning (reduction of noise, missing
      values, etc.)
    • Relevance analysis (feature selection)
    • Data transformation (generalisation,
      normalisation)
  • Comparison/evaluation of methods
    • Predictive accuracy
    • Speed
    • Robustness
    • Scalability
    • Interpretability
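
As one concrete instance of the normalisation mentioned under data transformation, the sketch below applies min-max scaling to a numeric attribute. Min-max scaling is only one common choice and is an assumption here, as are the function name and the sample income values.

```python
def min_max_normalise(values, new_min=0.0, new_max=1.0):
    """Linearly rescale `values` into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to new_min.
        return [new_min for _ in values]
    return [
        new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values
    ]


# Hypothetical income values.
incomes = [42000, 45000, 48000]
normalised = min_max_normalise(incomes)
print(normalised)
```

Rescaling of this kind keeps attributes with large numeric ranges from dominating those with small ranges in distance-based methods.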