Associative Classification of Imbalanced Datasets - PowerPoint PPT Presentation

1
Associative Classification of Imbalanced Datasets
  • Sanjay Chawla
  • School of IT
  • University of Sydney

2
Overview
  • Data Mining Tasks
  • Associative Classifiers
  • Downside of Support and Confidence
  • Mining Rules from Imbalanced Data Sets
  • Fisher's Exact Test
  • Class Correlation Ratio (CCR)
  • Searching and Pruning Strategies
  • Experiments

3
Data Mining
  • Data Mining research has settled into an
    equilibrium involving four tasks:
  • Pattern Mining (Association Rules)
  • Classification
  • Clustering
  • Anomaly or Outlier Detection

[Figure: diagram of the four tasks, with the labels DB and ML]

4
Association Rule Mining
  • In terms of impact, nothing rivals association
    rule mining within the data mining community
  • SIGMOD 93 (4100 citations): Agrawal, Imielinski,
    Swami
  • VLDB 94 (4900 citations): Agrawal, Srikant
  • C4.5 93 (7000 citations): Ross Quinlan
  • Gibbs Sampling 84 (IEEE PAMI, 5000 citations):
    Geman, Geman
  • Content Addressable Network (3000 citations):
    Ratnasamy, Francis, Handley, Karp

5
Association Rules (Agrawal, Imielinski and
Swami, 93 SIGMOD)
  • An implication expression of the form X → Y,
    where X and Y are itemsets
  • Example: {Milk, Diaper} → {Beer}
  • Rule Evaluation Metrics
  • Support (s)
  • Fraction of transactions that contain both X and
    Y
  • Confidence (c)
  • Measures how often items in Y appear in
    transactions that contain X

From Introduction to Data Mining, Tan, Steinbach
and Kumar
6
Mining Association Rules
  • Two-step approach
  • Frequent Itemset Generation
  • Generate all itemsets whose support ≥ minsup
  • Rule Generation
  • Generate high confidence rules from each frequent
    itemset, where each rule is a binary partitioning
    of a frequent itemset
  • Frequent itemset generation is computationally
    expensive
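
The two-step approach above can be sketched in a few lines of Python. This is only an illustrative level-wise (Apriori-style) search, not the implementation used in the talk; the function names `frequent_itemsets` and `generate_rules` are hypothetical, and the baskets are the item columns of the five-transaction table on the Associative Classifiers (2) slide.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Step 1: level-wise search for itemsets with support >= minsup."""
    n = len(transactions)
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while level:
        # Keep the candidates of this size that meet the support threshold.
        kept = {}
        for cand in level:
            sup = sum(cand <= t for t in transactions) / n
            if sup >= minsup:
                kept[cand] = sup
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        joined = {x | y for x in kept for y in kept if len(x | y) == len(x) + 1}
        level = {cand for cand in joined
                 if all(frozenset(sub) in kept
                        for sub in combinations(cand, len(cand) - 1))}
    return frequent

def generate_rules(frequent, minconf):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                conf = sup / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= minconf:
                    rules.append((set(lhs), set(itemset - lhs), sup, conf))
    return rules

transactions = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diaper", "Beer", "Eggs"}),
    frozenset({"Milk", "Diaper", "Beer", "Coke"}),
    frozenset({"Bread", "Milk", "Diaper", "Beer"}),
    frozenset({"Bread", "Milk", "Diaper", "Coke"}),
]
freq = frequent_itemsets(transactions, minsup=0.6)
rules = generate_rules(freq, minconf=0.9)
```

Even on this toy input, step 1 has to count support for every surviving candidate at every level, which is why frequent itemset generation dominates the cost.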

7
Overview
  • Data Mining Tasks
  • Associative Classifiers
  • Downside of Support and Confidence
  • Mining Rules from Imbalanced Data Sets
  • Fisher's Exact Test
  • Class Correlation Ratio (CCR)
  • Searching and Pruning Strategies
  • Experiments

8
Associative Classifiers
  • Most of the Associative Classifiers are based on
    rules discovered using the support-confidence
    criterion.
  • The classifier itself is a collection of rules
    ranked using their support or confidence.

9
Associative Classifiers (2)
TID  Items                       Gender
1    Bread, Milk                 F
2    Bread, Diaper, Beer, Eggs   M
3    Milk, Diaper, Beer, Coke    M
4    Bread, Milk, Diaper, Beer   M
5    Bread, Milk, Diaper, Coke   F
In a classification task we want to predict the
class label (Gender) using the attributes.
A good (albeit stereotypical) rule is
{Beer, Diaper} → Male, whose support is 60% and
confidence is 100%.
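
The support and confidence quoted for this rule can be checked directly against the five transactions. The snippet below is a minimal sketch; the helper name `support_confidence` is hypothetical, and the gender labels are abbreviated to "M" and "F" as in the table.

```python
rows = [
    ({"Bread", "Milk"}, "F"),
    ({"Bread", "Diaper", "Beer", "Eggs"}, "M"),
    ({"Milk", "Diaper", "Beer", "Coke"}, "M"),
    ({"Bread", "Milk", "Diaper", "Beer"}, "M"),
    ({"Bread", "Milk", "Diaper", "Coke"}, "F"),
]

def support_confidence(rows, antecedent, label):
    """Support and confidence of the class rule antecedent -> label."""
    # Class labels of the transactions that contain the whole antecedent.
    covered = [cls for items, cls in rows if antecedent <= items]
    hits = sum(cls == label for cls in covered)
    support = hits / len(rows)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence

print(support_confidence(rows, {"Beer", "Diaper"}, "M"))  # (0.6, 1.0)
```

Transactions 2, 3 and 4 contain both Beer and Diaper and are all labelled M, which gives 3/5 support and 3/3 confidence.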
10
Overview
  • Data Mining Tasks
  • Associative Classifiers
  • Downside of Support and Confidence
  • Mining Rules from Imbalanced Data Sets
  • Fisher's Exact Test
  • Class Correlation Ratio (CCR)
  • Searching and Pruning Strategies
  • Experiments

11
Imbalanced Data Set
  • In some application domains, Data Sets are
    Imbalanced
  • The proportion of samples from one class is much
    smaller than the other class/classes.
  • And the smaller class is the class of interest.
  • Support and confidence are biased toward the
    majority class, and do not perform well in such
    cases.

12
Downsides of Support
  • Support is biased towards the majority class
  • E.g. classes {yes, no}, sup(yes) = 90%
  • minSup > 10% wipes out any rule predicting no
  • Suppose X → no has confidence 1 and support 3%.
    The rule is discarded if minSup > 3%, even though
    it perfectly predicts 30% of the instances in the
    minority class!

13
Downside of Confidence(1)

         C    ¬C   Total
A        20    5     25
¬A       70    5     75
Total    90   10    100

Conf(A → C) = 20/25 = 0.8
Support(A → C) = 20/100 = 0.2
Correlation between A and C: 0.2 / (0.25 × 0.9) ≈ 0.89 < 1
Thus, when the data set is imbalanced, a high
support and high confidence rule does not
necessarily imply that the antecedent and the
consequent are positively correlated.
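
The numbers on this slide can be reproduced from the four inner cells of the table. The sketch below assumes the cell layout a = count(A and C), b = count(A and not C), c = count(not A and C), d = count(not A and not C), and measures correlation as the usual ratio of confidence to the class prior; the helper name `rule_stats` is hypothetical.

```python
def rule_stats(a, b, c, d):
    """Support, confidence and correlation of A -> C from a 2x2 table."""
    n = a + b + c + d
    support = a / n
    confidence = a / (a + b)
    prior = (a + c) / n              # P(C), the class prior
    correlation = confidence / prior  # > 1 means positive correlation
    return support, confidence, correlation

s, conf, corr = rule_stats(20, 5, 70, 5)
print(s, conf, round(corr, 3))  # 0.2 0.8 0.889
```

The confidence of 0.8 is below the 0.9 prior of C, so observing A actually makes C slightly less likely.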
14
Downside of Confidence (2)
  • Reasonable to expect that for good rules the
    antecedent and consequent are not independent!
  • Suppose
  • P(Class = Yes) = 0.9
  • P(Class = Yes | X) = 0.9
  • Then X → Yes has confidence 0.9, yet X and the
    class are independent.
15
Downsides of Confidence (3)
  • Another useful observation
  • Higher confidence (support) for a rule in the
    minority class implies higher correlation, and
    lower correlation in the minority class implies
    lower confidence, but neither of these applies to
    the majority class.
  • Confidence (support) tends to be biased towards
    the majority class.

16
Overview
  • Data Mining Tasks
  • Associative Classifiers
  • Downside of Support and Confidence
  • Mining Rules from Imbalanced Data Sets
  • Fisher's Exact Test
  • Class Correlation Ratio (CCR)
  • Searching and Pruning Strategies
  • Experiments

17
Contingency Table
  • A 2×2 contingency table for X → y.
  • We will use the notation [a, b; c, d] to
    represent this table.
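
The table itself did not survive in this transcript, so the sketch below assumes a layout: rows are X and ¬X, columns are y and ¬y, giving a = count(X, y), b = count(X, ¬y), c = count(¬X, y), d = count(¬X, ¬y). The helper name `contingency` is hypothetical, and the data are the five labelled transactions from the Associative Classifiers (2) slide.

```python
def contingency(rows, antecedent, label):
    """Count the four cells (a, b, c, d) for the rule antecedent -> label."""
    a = b = c = d = 0
    for items, cls in rows:
        has_x = antecedent <= items   # transaction contains all of X
        is_y = cls == label
        if has_x and is_y:
            a += 1
        elif has_x:
            b += 1
        elif is_y:
            c += 1
        else:
            d += 1
    return a, b, c, d

rows = [
    ({"Bread", "Milk"}, "F"),
    ({"Bread", "Diaper", "Beer", "Eggs"}, "M"),
    ({"Milk", "Diaper", "Beer", "Coke"}, "M"),
    ({"Bread", "Milk", "Diaper", "Beer"}, "M"),
    ({"Bread", "Milk", "Diaper", "Coke"}, "F"),
]
print(contingency(rows, {"Beer", "Diaper"}, "M"))  # (3, 0, 0, 2)
```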

18
Fisher's Exact Test
  • Given a table [a, b; c, d], Fisher's Exact Test
    will find the probability (p-value) of obtaining
    the given table under the hypothesis that {X, ¬X}
    and {y, ¬y} are independent.
  • The margin sums (Σrows, Σcols) are fixed.

19
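Fisher's Exact Test (2)
  • The p-value is given by

    p([a, b; c, d]) = Σ_{i=0}^{min(b,c)} (a+b)! (c+d)! (a+c)! (b+d)! / ( n! (a+i)! (b-i)! (c-i)! (d+i)! )

    where n = a + b + c + d.
  • We will only use rules whose p-values are below
    the desired significance level (e.g. 0.01).
  • Rules that pass this test are statistically
    significant in the positively associated
    direction (e.g. X → y).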
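
A minimal sketch of the one-sided test follows. It assumes that a is the (X, y) cell and d the (¬X, ¬y) cell, keeps the margins fixed, and sums the hypergeometric probabilities of the observed table and of every table with a larger a; the function name `fisher_p` is hypothetical.

```python
from math import comb

def fisher_p(a, b, c, d):
    """One-sided Fisher exact p-value for positive association of X and y."""
    n = a + b + c + d
    row_x, row_not_x, col_y = a + b, c + d, a + c
    # Number of ways to choose which col_y of the n transactions have class y.
    total = comb(n, col_y)
    # Shift i units from b and c into a and d; the margins stay unchanged.
    tail = sum(comb(row_x, a + i) * comb(row_not_x, c - i)
               for i in range(min(b, c) + 1))
    return tail / total

print(fisher_p(3, 0, 0, 2))  # 0.1
```

With only five transactions even a perfect rule reaches p = 0.1, so it would not pass a 0.01 threshold. As a cross-check, `scipy.stats.fisher_exact` with `alternative="greater"` should report the same one-sided value if SciPy is available.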

20
Overview
  • Data Mining Tasks
  • Associative Classifiers
  • Downside of Support and Confidence
  • Mining Rules from Imbalanced Data Sets
  • Fisher's Exact Test
  • Class Correlation Ratio (CCR)
  • Searching and Pruning Strategies
  • Experiments

21
Class Correlation Ratio
  • In Class Correlation, we are interested in rules
    X → y where X is more positively correlated with
    y than it is with ¬y.
  • The correlation is defined by

    corr(X → y) = sup(X ∪ y) · |T| / (sup(X) · sup(y))

where sup(·) is a support count and |T| is the number of transactions n.
22
Class Correlation Ratio (2)
  • We then use corr() to measure how correlated X is
    with y compared to ¬y.
  • X and y are positively correlated if corr(X → y) > 1,
    and negatively correlated if corr(X → y) < 1.

23
Class Correlation Ratio (3)
  • Based on the correlation corr(), we define the Class
    Correlation Ratio (CCR):

    CCR(X → y) = corr(X → y) / corr(X → ¬y)

  • The CCR measures how much more positively the
    antecedent is correlated with the class it
    predicts (e.g. y), relative to the alternative
    class (e.g. ¬y).
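
As a hedged illustration, corr() and the CCR can be computed from the four cells of the contingency table. The sketch assumes the layout a = count(X, y), b = count(X, ¬y), c = count(¬X, y), d = count(¬X, ¬y), takes corr() to be the usual lift-style ratio of observed to expected co-occurrence, and assumes both X and the class occur at least once; the function names are hypothetical.

```python
def corr(a, b, c, d):
    """corr(X -> y): observed count of (X, y) over the count expected under independence."""
    n = a + b + c + d
    return a * n / ((a + b) * (a + c))

def ccr(a, b, c, d):
    """corr(X -> y) divided by corr(X -> not y)."""
    # Swapping the two class columns gives the table for X -> not y.
    other = corr(b, a, d, c)
    return float("inf") if other == 0 else corr(a, b, c, d) / other

# The A/C table from the Downside of Confidence (1) slide.
print(round(corr(20, 5, 70, 5), 3))  # 0.889
print(round(ccr(20, 5, 70, 5), 3))   # 0.444
print(round(ccr(5, 20, 5, 70), 3))   # 2.25
```

On that table A → C has a CCR below 1 despite its 0.8 confidence, while the minority-class rule A → ¬C has a CCR of 2.25.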

24
Class Correlation Ratio (4)
  • We only use rules with CCR higher than a desired
    threshold, so that no rules are used that are
    more positively associated with the classes they
    do not predict.

25
The two measurements
  • We perform the following tests to determine
    whether a potentially interesting rule is indeed
    interesting
  • Check the significance of a rule X → y by
    performing Fisher's Exact Test.
  • Check whether CCR(X → y) > 1.
  • Those rules that pass the above two tests are
    candidates for the classification task.
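
The two checks can be combined into one filter. This is a sketch under assumptions rather than the authors' code: the cells are taken as a = count(X, y), b = count(X, ¬y), c = count(¬X, y), d = count(¬X, ¬y), the p-value is the one-sided hypergeometric tail, the CCR is written in its simplified form a(b+d) / (b(a+c)), and the names `fisher_p`, `ccr` and `is_candidate` are hypothetical.

```python
from math import comb

def fisher_p(a, b, c, d):
    """One-sided Fisher exact p-value (positive association of X and y)."""
    n = a + b + c + d
    tail = sum(comb(a + b, a + i) * comb(c + d, c - i)
               for i in range(min(b, c) + 1))
    return tail / comb(n, a + c)

def ccr(a, b, c, d):
    """corr(X -> y) / corr(X -> not y), simplified to a(b+d) / (b(a+c))."""
    return float("inf") if b == 0 else a * (b + d) / (b * (a + c))

def is_candidate(a, b, c, d, alpha=0.01):
    """Keep a rule only if it is significant and has CCR > 1."""
    return fisher_p(a, b, c, d) < alpha and ccr(a, b, c, d) > 1

# Made-up minority-class rule: X covers 32 of 1000 rows, 30 of them in the
# 100-row minority class. Its support is only 3%.
print(is_candidate(30, 2, 70, 898))  # True
# The high-confidence majority-class rule A -> C from the earlier table.
print(is_candidate(20, 5, 70, 5))    # False
```

The first rule would be lost to any minSup above 3%, yet it passes both tests; the second has confidence 0.8 and fails both.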

26
Overview
  • Data Mining Tasks
  • Associative Classifiers
  • Downside of Support and Confidence
  • Mining Rules from Imbalanced Data Sets
  • Fisher's Exact Test
  • Class Correlation Ratio (CCR)
  • Searching and Pruning Strategies
  • Experiments

27
Search and Pruning Strategies
  • To avoid examining the whole set of possible
    rules, we use search strategies that ensure the
    concept of being potentially interesting is
    anti-monotonic
  • X → y may be considered potentially interesting
    if and only if all {X' → y | X' ⊂ X} have been found
    to be potentially interesting.
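
The anti-monotonic condition lends itself to a simple candidate check. The sketch below assumes a level-wise search, so that testing the immediate generalizations (one item removed) is enough because those were themselves checked one level earlier; the function name `may_be_interesting` and the item labels are hypothetical.

```python
from itertools import combinations

def may_be_interesting(antecedent, label, accepted):
    """True if every one-item-smaller generalization of the rule was accepted.

    accepted: set of (frozenset antecedent, label) pairs found so far.
    """
    if len(antecedent) <= 1:
        return True  # single-item rules have no generalization to check
    return all((frozenset(sub), label) in accepted
               for sub in combinations(antecedent, len(antecedent) - 1))

accepted = {
    (frozenset({"p"}), 1),
    (frozenset({"q"}), 1),
    (frozenset({"r"}), 1),
    (frozenset({"p", "q"}), 1),
    (frozenset({"p", "r"}), 1),
}
print(may_be_interesting({"p", "q"}, 1, accepted))       # True
print(may_be_interesting({"p", "q", "r"}, 1, accepted))  # False: {q, r} -> 1 was never accepted
```

Rules that fail this check are never tested, which is what prunes the search space.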

28
Search and Pruning Strategies (2)
  • The contingency table [a, b; c, d] used to test
    for the significance of the rule X → y in
    comparison to one of its generalizations X - {z} →
    y for the Aggressive search strategy.

29
Example
  • Suppose we have already determined that the rules
    (A = a1) → 1 and (A = a2) → 1 are significant.
  • Now we want to test whether the rule X → 1, with
    X = (A = a1) ∧ (A = a2), is significant.
  • We then carry out a FET and calculate the CCR on
    X and X - (A = a1) (i.e. z = a2), and on X and
    X - (A = a2) (i.e. z = a1).
  • If the minimum of their p-values is less than the
    significance level, and their CCR is greater than
    1, we keep the rule X → 1; otherwise we discard it.

30
Ranking Rules
  • Strength Score (SS)
  • In order to determine how interesting a rule is,
    we need a ranking (ordering) of the rules, and
    the ordering is defined by the Strength Score.

31
Overview
  • Data Mining Tasks
  • Associative Classifiers
  • Downside of Support and Confidence
  • Mining Rules from Imbalanced Data Sets
  • Fisher's Exact Test
  • Class Correlation Ratio (CCR)
  • Searching and Pruning Strategies
  • Experiments

32
Experiments (Balanced Data)
  • The preceding approach is represented by
    SPARCCC.
  • The experiments on Balanced Data Sets show that
    the average accuracy of SPARCCC compares
    favourably to CBA and C4.5.
  • The table below is the prediction accuracy on
    balanced data sets.

33
Experiments (Imbalanced Data)
  • True Positive Rate (Recall/Sensitivity) is a
    better performance measure for imbalanced data
    sets.
  • SPARCCC outperforms other rule-based techniques
    such as CBA and CCCS.
  • The table below is True Positive Rate of the
    Minority Class on Imbalanced version of the
    Datasets.
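
A small made-up example shows why the true positive rate of the minority class is more informative than accuracy here. The labels and the trivial "always predict the majority class" classifier are invented for illustration and are not results from the experiments; the helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of all instances that are predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def true_positive_rate(y_true, y_pred, positive):
    """Fraction of the positive-class instances that are predicted positive."""
    preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds) / len(preds)

y_true = ["no"] * 90 + ["yes"] * 10   # "yes" is the minority class
always_no = ["no"] * 100              # classifier that ignores the minority

print(accuracy(y_true, always_no))                   # 0.9
print(true_positive_rate(y_true, always_no, "yes"))  # 0.0
```

The classifier scores 90% accuracy while recovering none of the minority class.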

34
References
  • Florian Verhein, Sanjay Chawla. Using
    Significant, Positively Associated and Relatively
    Class Correlated Rules for Associative
    Classification of Imbalanced Datasets. The 2007
    IEEE International Conference on Data Mining.
    Omaha, NE, USA. October 28-31, 2007.