Title: ART. Author: Fernando Berzal. Last modified by: Fernando Berzal. Created: 7/16/2002 8:36:45 AM. Presentation format: on-screen presentation (PowerPoint).

1
Fundamentals of Data Mining (Fundamentos de Minería de Datos)
  • Association rules

Fernando Berzal, fberzal_at_decsai.ugr.es
2
Motivation
  • Association mining searches for interesting
    relationships among items in a given data set
  • EXAMPLES
  • Diapers and six-packs are bought together,
    especially on Thursday evenings (a myth?)
  • A sequence such as buying first a digital camera
    and then a memory card is a frequent (sequential)
    pattern
  • Motivation
  • Definition
  • Discovery
  • Variations
  • Visualization
  • Extensions
  • Applications
  • ART
  • ATBAR

3
Motivation
MARKET BASKET ANALYSIS: The earliest form of association rule mining
Applications: Catalog design, store layout, cross-marketing

4
Definition
  • Item
  • In transactional databases: any of the items included in a transaction
  • In relational databases: an (attribute, value) pair
  • k-itemset: set of k items
  • Itemset support: support(I) = P(I)

5
Definition
  • Association rule: X ⇒ Y
  • Support: support(X ⇒ Y) = support(X ∪ Y) = P(X ∪ Y)
  • Confidence: confidence(X ⇒ Y) = support(X ∪ Y) / support(X) = P(Y | X)
  • NOTE: Both support and confidence are relative
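
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the toy transactions and the function names `support` and `confidence` are assumptions made for the example.

```python
# Toy transactional database (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, db):
    """Relative support: fraction of transactions that contain every item."""
    itemset = frozenset(itemset)
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(antecedent, consequent, db):
    """confidence(X => Y) = support(X u Y) / support(X), i.e. P(Y | X)."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    return support(xy, db) / support(x, db)

print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Both values are fractions of the database size, which is what "relative" means in the note above.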

6
Discovery
  • Association rule mining
  • Find all frequent itemsets
  • Generate strong association rules from the
    frequent itemsets
Strong association rules are those that satisfy both a minimum support
threshold and a minimum confidence threshold.
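
The second step (rule generation) can be sketched as follows, assuming the frequent itemsets and their supports are already available in a dictionary. The function name `strong_rules` and the support values are hypothetical.

```python
from itertools import combinations

def strong_rules(frequent, min_conf):
    """Generate rules X => Y with confidence >= min_conf.

    frequent: dict mapping frozenset -> relative support; it must contain
    every non-empty subset of each itemset (true for frequent itemsets).
    """
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as antecedent.
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                conf = supp / frequent[x]
                if conf >= min_conf:
                    rules.append((x, itemset - x, supp, conf))
    return rules

# Hypothetical supports, already filtered by the minimum support threshold.
frequent = {
    frozenset({"A"}): 0.7,
    frozenset({"B"}): 0.8,
    frozenset({"A", "B"}): 0.6,
}
rules = strong_rules(frequent, min_conf=0.8)
for x, y, supp, conf in rules:
    print(sorted(x), "=>", sorted(y), round(supp, 2), round(conf, 2))
```

Only A ⇒ B passes here: both directions share the same support, but their confidences differ because the antecedent supports differ.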

7
Discovery
Apriori
Observation: All non-empty subsets of a frequent itemset must also be frequent.
Algorithm: Frequent k-itemsets are used to explore potentially frequent
(k+1)-itemsets (i.e. candidates).
→ Agrawal & Srikant, "Fast Algorithms for
Mining Association Rules", VLDB'94
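
A compact level-wise sketch of this idea is shown below. It is a simplified illustration under stated assumptions, not the implementation from the cited paper: it rescans an in-memory list for every level and uses a naive join step, and the name `apriori` and the toy data are hypothetical.

```python
from itertools import combinations

def apriori(db, min_supp):
    """Return {frozenset: relative support} for all frequent itemsets."""
    db = [frozenset(t) for t in db]
    n = len(db)

    def frequent_among(candidates):
        # One pass over the database per level to count the candidates.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_supp}

    level = frequent_among({frozenset([i]) for t in db for i in t})
    result = dict(level)
    k = 1
    while level:
        # Join step: unite pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = frequent_among(candidates)
        result.update(level)
        k += 1
    return result

toy_db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
freq = apriori(toy_db, min_supp=0.6)
for itemset, supp in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

In the toy data every single item and every pair is frequent, while {A, B, C} is generated as a candidate and then rejected by counting.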

8
Discovery
  • Apriori improvements (I)
  • Reducing the number of candidates: Park, Chen &
    Yu, "An Effective Hash-Based Algorithm for Mining
    Association Rules", SIGMOD'95
  • Sampling: Toivonen, "Sampling Large Databases
    for Association Rules", VLDB'96; Park, Yu &
    Chen, "Mining Association Rules with Adjustable
    Accuracy", CIKM'97
  • Partitioning: Savasere, Omiecinski & Navathe,
    "An Efficient Algorithm for Mining Association
    Rules in Large Databases", VLDB'95

9
Discovery
  • Apriori improvements (II)
  • Transaction reduction: Agrawal & Srikant, "Fast
    Algorithms for Mining Association Rules", VLDB'94
    (AprioriTID)
  • Dynamic itemset counting: Brin, Motwani, Ullman &
    Tsur, "Dynamic Itemset Counting and Implication
    Rules for Market Basket Data", SIGMOD'97 (DIC);
    Hidber, "Online Association Rule Mining",
    SIGMOD'99 (CARMA)

10
Discovery
Apriori-like algorithm: TBAR (Tree-based
association rule mining)

→ Berzal, Cubero, Sánchez & Serrano, "TBAR: An
efficient method for association rule mining in
relational databases", Data & Knowledge
Engineering, 2001
11
Discovery: TBAR

[Figure: TBAR itemset tree with levels L1, L2 and L3. Node annotations:
L1: 7 instances with A
L2: 6 instances with AB, 5 instances with AD, 6 instances with BC
L3: 5 instances with ABD]
12
Discovery
An alternative to Apriori: Compress the database
representing frequent items into a
frequent-pattern tree (FP-tree).
→ Han, Pei & Yin, "Mining Frequent Patterns without
Candidate Generation", SIGMOD'2000
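
The compression step can be sketched as follows. This covers only the construction of the tree in two scans, not the pattern-growth mining that follows it, and it is a simplified illustration rather than the authors' implementation. The names `FPNode` and `build_fp_tree` and the toy transactions are assumptions.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, a counter and links to its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(db, min_count):
    """Build an FP-tree; returns (root, header), header maps item -> its nodes."""
    # First scan: count items and keep only the frequent ones.
    counts = Counter(item for t in db for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None)
    header = {i: [] for i in frequent}
    # Second scan: insert each transaction, items sorted by descending
    # frequency so that common prefixes share the same path.
    for t in db:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for i in items:
            child = node.children.get(i)
            if child is None:
                child = FPNode(i, node)
                node.children[i] = child
                header[i].append(child)
            child.count += 1
            node = child
    return root, header

toy_db = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b"], ["e"]]
root, header = build_fp_tree(toy_db, min_count=2)
print(root.children["b"].count, len(header["d"]))
```

Infrequent items (c and e in the toy data) never enter the tree, and transactions that share a prefix such as b, a are stored once with a counter.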

13
Discovery
  • A challenge
  • When an itemset is frequent, all its subsets are
    also frequent
  • Closed itemset C: There exists no proper
    super-itemset S such that support(S) = support(C)
  • Maximal (frequent) itemset M: M is frequent and
    there exists no super-itemset Y such that M ⊂ Y and
    Y is frequent.
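
Both definitions can be checked directly on a table of frequent itemsets. The sketch below assumes absolute support counts (to avoid comparing floats) and a hypothetical function name, `closed_and_maximal`. It only inspects frequent supersets, which is enough because a superset with the same support as a frequent itemset is itself frequent.

```python
def closed_and_maximal(frequent):
    """frequent: dict frozenset -> support count for ALL frequent itemsets."""
    closed, maximal = set(), set()
    for c, supp in frequent.items():
        supersets = [s for s in frequent if c < s]  # proper frequent supersets
        if not supersets:
            maximal.add(c)
        if all(frequent[s] != supp for s in supersets):
            closed.add(c)
    return closed, maximal

# Hypothetical support counts.
frequent = {
    frozenset({"A"}): 4,
    frozenset({"B"}): 3,
    frozenset({"A", "B"}): 3,
}
closed, maximal = closed_and_maximal(frequent)
print(sorted(map(sorted, closed)), sorted(map(sorted, maximal)))
```

Here {B} is not closed because {A, B} has the same support, and every maximal itemset is also closed.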

14
Variations
  • Based on the kinds of patterns to be mined
  • Frequent itemset mining (transactional and
    relational data)
  • Sequential pattern mining (sequence data sets,
    e.g. bioinformatics)
  • Structured pattern mining (structured data, e.g.
    graphs)

15
Variations
  • Based on the types of values handled
  • Boolean association rules
  • Quantitative association rules
  • Fuzzy association rules

→ Delgado, Marín, Sánchez & Vila, "Fuzzy
association rules: General model and
applications", IEEE Transactions on Fuzzy
Systems, 2003
16
Variations
  • More options
  • Generalized association rules (a.k.a. multilevel
    association rules)
  • Constraint-based association rule mining
  • Incremental algorithms
  • Top-k algorithms

ICDM FIMI: Workshop on Frequent Itemset Mining
Implementations, http://fimi.cs.helsinki.fi/
17
Visualization
  • Integrated into data mining tools to help users
    understand data mining results
  • Table-based approach, e.g. SAS Enterprise Miner,
    DBMiner
  • 2D Matrix-based approach, e.g. SGI MineSet,
    DBMiner
  • Graph-based techniques, e.g. DBMiner ball graphs

18
Visualization: Tables

19
Visualization: Visual aids

20
Visualization: 2D Matrix

21
Visualization: Graphs

22
Visualization: VisAR
Based on parallel coordinates (Techapichetvanich &
Datta, ADMA 2005)

23
Extensions
Confidence is not the best possible interestingness
measure for rules, e.g. a very frequent item
will always appear in rule consequents,
regardless of its true relationship with the rule
antecedent:
X went to war ⇒ X did not serve in Vietnam (from the US Census)
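
A small worked example makes the problem concrete. The counts below are hypothetical (they are not the Census figures), and lift, P(Y | X) / P(Y), is used here only as one common point of comparison.

```python
def rule_stats(n, n_x, n_y, n_xy):
    """Return (confidence, lift) of X => Y from absolute counts."""
    confidence = n_xy / n_x        # P(Y | X)
    lift = confidence / (n_y / n)  # P(Y | X) / P(Y)
    return confidence, lift

# Hypothetical counts: Y is a very frequent item (95% of all records).
conf, lift = rule_stats(n=1000, n_x=100, n_y=950, n_xy=90)
print(conf, lift)
```

The rule looks strong by confidence alone, yet its lift is below 1: knowing X actually makes Y slightly less likely than it already was.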

24
Extensions
  • Desirable properties for interestingness
    measures (Piatetsky-Shapiro, 1991)
  • P1: ACC(A⇒C) = 0 when supp(A⇒C) = supp(A)·supp(C)
  • P2: ACC(A⇒C) monotonically increases with
    supp(A⇒C)
  • P3: ACC(A⇒C) monotonically decreases with supp(A)
    (or supp(C))
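
As an illustration, the three properties can be checked numerically on one simple measure, supp(A⇒C) − supp(A)·supp(C), often called leverage. The measure is not named on the slide and is used here only as an assumed example; the support values are hypothetical.

```python
def leverage(supp_ac, supp_a, supp_c):
    """supp(A => C) - supp(A) * supp(C): zero under statistical independence."""
    return supp_ac - supp_a * supp_c

base = leverage(0.2, 0.4, 0.5)         # P1: independence, so the value is 0
more_joint = leverage(0.3, 0.4, 0.5)   # P2: higher supp(A => C), rest fixed
more_a = leverage(0.2, 0.5, 0.5)       # P3: higher supp(A), rest fixed
print(base, more_joint, more_a)
```

Monotonicity in P2 and P3 is understood with the remaining supports held fixed, which is how the three calls above differ.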

25
Extensions
  • Certainty factors
  • satisfy Piatetsky-Shapiro's properties
  • are widely used in expert systems
  • are not symmetric (unlike interest/lift)
  • can substitute conviction when CF > 0

→ Berzal, Blanco, Sánchez & Vila, "Measuring the
accuracy and interest of association rules: A new
framework", Intelligent Data Analysis, 2002
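
The slide does not spell out the formulas, so the sketch below assumes the usual definitions: the certainty factor compares conf(A⇒C) with supp(C) and rescales the difference, and conviction is (1 − supp(C)) / (1 − conf(A⇒C)). Function names and numbers are hypothetical.

```python
def certainty_factor(conf, supp_c):
    """CF(A => C), assuming the usual piecewise definition (value in [-1, 1])."""
    if conf > supp_c:
        return (conf - supp_c) / (1 - supp_c)
    if conf < supp_c:
        return (conf - supp_c) / supp_c
    return 0.0

def conviction(conf, supp_c):
    """Conviction of A => C; unbounded, infinite when conf == 1."""
    return float("inf") if conf == 1 else (1 - supp_c) / (1 - conf)

cf_pos = certainty_factor(0.9, 0.5)    # C becomes more likely given A
cf_neg = certainty_factor(0.9, 0.95)   # C becomes less likely given A
conv = conviction(0.9, 0.5)
print(cf_pos, cf_neg, conv, 1 - 1 / conv)
```

Under these definitions CF = 1 − 1/conviction whenever CF > 0, which is one way to read the last bullet: the bounded certainty factor carries the same information as conviction in that range.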
26
Extensions
References

→ Hilderman & Hamilton, "Evaluation of
interestingness measures for ranking discovered
knowledge", PAKDD, 2001
→ Tan, Kumar & Srivastava, "Selecting the right
objective measure for association analysis",
Information Systems, vol. 29, pp. 293-313, 2004
→ Berzal, Cubero, Marín, Sánchez, Serrano & Vila,
"Association rule evaluation for classification
purposes", TAMIDA 2005
27
Applications
  • Two sample applications where association rules
    have been successful
  • Classification (ART)
  • Anomaly detection (ATBAR)

→ Berzal, Cubero, Sánchez & Serrano, "ART: A
hybrid classification model", Machine Learning
Journal, 2004
→ Balderas, Berzal, Cubero, Eisman &
Marín, "Discovering Hidden Association Rules",
KDD 2005, Chicago, Illinois, USA
28
Classification
  • Classification models based on association rules
  • Partial classification models, e.g. Bayardo
  • Associative classification models, e.g. CBA
    (Liu et al.)
  • Bayesian classifiers, e.g. LB (Meretakis et al.)
  • Emerging patterns, e.g. CAEP (Dong et al.)
  • Rule trees, e.g. Wang et al.
  • Rules with exceptions, e.g. Liu et al.

29
Classification
GOAL: Simple, intelligible, and robust
classification models obtained in an efficient
and scalable way
MEANS:

Decision Tree Induction + Association Rule
Mining = ART (Association Rule Trees)
30
ART Classification Model
IDEA: Make use of efficient association rule
mining algorithms to build a decision-tree-shaped
classification model.
ART = Association Rule Tree
KEY: Association rules + "else" branches
Hybrid between decision trees and decision lists

31
ART Classification Model
SPLICE

32
Construction
ART classification model

33
Construction
ART classification model
  • Rule mining: Candidate hypotheses
  • MinSupp: Minimum support threshold
  • MinConf: Minimum confidence threshold
  • Fixed threshold
  • Automatic selection

34
Construction
ART classification model
  • Rule selection
  • Rules grouped by sets of attributes.
  • Preference criterion.

35
Example: Dataset
ART classification model

36
Example: Level 1, K = 1
ART classification model
  • LEVEL 1: Association rule mining
  • Minimum support threshold: 20%
  • Automatic confidence threshold selection

S1: if (Y=0) then C=0 with confidence 75%
    if (Y=1) then C=1 with confidence 75%
S2: if (Z=0) then C=0 with confidence 75%
    if (Z=1) then C=1 with confidence 75%
37
Example: Level 1, K = 2
ART classification model
  • LEVEL 1: Association rule mining
  • Minimum support threshold: 20%
  • Automatic confidence threshold selection

S1: if (X=0 and Y=0) then C=0 (100%)
    if (X=0 and Y=1) then C=1 (100%)
S2: if (X=1 and Z=0) then C=0 (100%)
    if (X=1 and Z=1) then C=1 (100%)
S3: if (Y=0 and Z=0) then C=0 (100%)
    if (Y=1 and Z=1) then C=1 (100%)
38
Example: Level 1
ART classification model

LEVEL 1: Best rule set selection, e.g. S1
S1: if (X=0 and Y=0) then C=0 (100%)
    if (X=0 and Y=1) then C=1 (100%)
X=0 and Y=0: C=0 (2)
X=0 and Y=1: C=1 (2)
else ...
39
Example: Level 1 → Level 2
ART classification model

40
Example: Level 2
ART classification model

LEVEL 2: Rule mining
S1: if (Z=0) then C=0 with confidence 100%
    if (Z=1) then C=1 with confidence 100%
RESULT
X=0 and Y=0: C=0 (2)
X=0 and Y=1: C=1 (2)
else
    Z=0: C=0 (2)
    Z=1: C=1 (2)
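
Read as a classifier, the resulting model is a short decision list with an "else" branch. The sketch below transcribes it into Python, assuming that the slide's rules mean X = 0, Y = 0, and so on, with binary attributes X, Y, Z and class C. The function name `art_classify` is hypothetical.

```python
def art_classify(record):
    """Classify a record (dict with binary X, Y, Z) with the example ART model."""
    # Level 1: rules over the attribute set {X, Y}.
    if record["X"] == 0 and record["Y"] == 0:
        return 0
    if record["X"] == 0 and record["Y"] == 1:
        return 1
    # "else" branch, Level 2: rules over {Z} for the remaining records.
    if record["Z"] == 0:
        return 0
    if record["Z"] == 1:
        return 1
    return None  # no rule applies (cannot happen with binary Z)

print(art_classify({"X": 0, "Y": 1, "Z": 0}))
print(art_classify({"X": 1, "Y": 0, "Z": 1}))
```

Records with X = 0 are decided by Y at the first level; everything else falls through to the second level and is decided by Z.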
41
Example: ART vs. TDIDT
ART classification model
ART
TDIDT

42
Classifier accuracy
ART classification model > Experimental results

43
Classifier complexity
ART classification model > Experimental results

44
Training time
ART classification model > Experimental results

45
I/O Operations - Scans
ART classification model > Experimental results

46
I/O Operations - Records
ART classification model > Experimental results
47
I/O Operations - Pages
ART classification model > Experimental results

48
Final comments
ART classification model
  • Classification models
  • Acceptable accuracy
  • Reduced complexity
  • Attribute interactions
  • Robustness (noise & primary keys)
  • Classifier building method
  • Efficient algorithm
  • Good scalability properties
  • Automatic parameter selection

49
Anomaly detection
  • It is often more interesting to find surprising
    non-frequent events than frequent ones
  • EXAMPLES
  • Abnormal network activity patterns in intrusion
    detection systems.
  • Exceptions to common rules in Medicine (useful
    for diagnosis, drug evaluation, detection of
    conflicting therapies)

50
Anomaly detection
Anomalous association rule: Confident rule
representing homogeneous deviations from common
behavior.

51
Anomaly detection

When X does not imply Y, then it usually implies
A (the Anomaly)
[Diagram: X ⇒ Y (confident)
Anomalous association rule: X ¬Y ⇒ A (confident)]
52
Anomaly detection
X Y A1 Z1
X Y A1 Z2
X Y A2 Z3
X Y A2 Z1
X Y A3 Z2
X Y A3 Z3
X Y A Z
X Y3 A Z3
X Y3 A Z
X Y4 A Z

X ⇒ Y is the dominant rule
X ⇒ A when ¬Y is the anomalous rule
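
The idea can be sketched as a brute-force check: take the transactions that contain X, confirm that X ⇒ Y is confident, and then look for an item A that is confident among the transactions where Y is absent. This is an illustration under stated assumptions, not the ATBAR algorithm; the published definition may impose further conditions, and the function name `anomalous_rules`, the toy data and the threshold are hypothetical.

```python
def anomalous_rules(db, x, y, min_conf):
    """Return (conf(X => Y), {A: conf(X and not Y => A)}) for confident A."""
    x = frozenset(x)
    with_x = [t for t in db if x <= t]
    without_y = [t for t in with_x if y not in t]
    conf_xy = (len(with_x) - len(without_y)) / len(with_x)
    anomalies = {}
    if conf_xy >= min_conf and without_y:
        # Candidate anomalies: any item seen together with X when Y is absent.
        for a in set().union(*without_y) - x:
            conf_a = sum(1 for t in without_y if a in t) / len(without_y)
            if conf_a >= min_conf:
                anomalies[a] = conf_a
    return conf_xy, anomalies

# Hypothetical transactions: Y usually accompanies X; when it does not, A does.
db = [
    {"X", "Y", "A1", "Z1"}, {"X", "Y", "A1", "Z2"}, {"X", "Y", "A2", "Z3"},
    {"X", "Y", "A2", "Z1"}, {"X", "Y", "A3", "Z2"}, {"X", "Y", "A3", "Z3"},
    {"X", "Y", "A4", "Z1"},
    {"X", "Y3", "A", "Z1"}, {"X", "Y4", "A", "Z2"}, {"X", "Y5", "A", "Z3"},
]
conf_xy, anomalies = anomalous_rules(db, {"X"}, "Y", min_conf=0.6)
print(conf_xy, anomalies)
```

In the toy data the deviations from X ⇒ Y are homogeneous: every transaction without Y contains A, while the other items in those transactions vary.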


53
Anomaly detection
Suzuki et al.'s Exception Rules

X ⇒ Y is an association rule
X ∧ I ⇒ ¬Y is the exception rule
I is the interacting itemset
X ⇒ I is the reference rule
  • Too many exceptions
  • The cause needs to be present

54
Anomaly detection: ATBAR
Anomalous association rules

First scan
Second scan
56
Anomaly detection: ATBAR
Anomalous association rules: Rule generation
is immediate from the frequent and extended
itemsets obtained by ATBAR

57
Anomaly detection: Results
  • Experiments on health-related datasets from the
    UCI Machine Learning Repository
  • Relatively small set of anomalous rules
    (typically, >90% reduction with respect to
    standard association rules)
  • Reasonable overhead needed to obtain anomalous
    association rules (about 20% in ATBAR w.r.t. TBAR)

58
Anomaly detection: Results
An example from the Census dataset

if WORKCLASS = Local-gov then CAPGAIN =
[99999.0, 99999.0] (7 out of 7) when not
CAPGAIN = [0.0, 20051.0]
59
Anomaly detection: Results
  • Anomalous association rules (novel
    characterization of potentially interesting
    knowledge)
  • An efficient algorithm for discovering anomalous
    association rules: ATBAR
  • Some heuristics for filtering the discovered
    anomalous association rules

60
Anomaly detection: Future
  • Additional heuristics for focusing on interesting
    anomalies (maybe domain- or even
    application-specific).
  • Alternative measures for the evaluation and
    ranking of anomalous association rules
  • Certainty factors / Conviction