Title: Associative Classification of Imbalanced Datasets
1 Associative Classification of Imbalanced Datasets
- Sanjay Chawla
- School of IT
- University of Sydney
2 Overview
- Data Mining Tasks
- Associative Classifiers
- Downside of Support and Confidence
- Mining Rules from Imbalanced Data Sets
- Fisher's Exact Test
- Class Correlation Ratio (CCR)
- Searching and Pruning Strategies
- Experiments
3 Data Mining
- Data Mining research has settled into an equilibrium involving four tasks:
  - Pattern Mining (Association Rules)
  - Classification
  - Clustering
  - Anomaly or Outlier Detection
(Diagram: the four tasks positioned between the DB and ML communities.)
4 Association Rule Mining
- In terms of impact, nothing rivals association rule mining within the data mining community:
  - SIGMOD 93 (4100 citations): Agrawal, Imielinski, Swami
  - VLDB 94 (4900 citations): Agrawal, Srikant
- For comparison:
  - C4.5, 93 (7000 citations): Ross Quinlan
  - Gibbs Sampling, 84 (IEEE PAMI, 5000 citations): Geman & Geman
  - Content Addressable Network (3000 citations): Ratnasamy, Francis, Handley, Karp
5 Association Rules (Agrawal, Imielinski and Swami, SIGMOD 93)
- An implication expression of the form X → Y, where X and Y are itemsets
- Example: {Milk, Diaper} → {Beer}
- Rule Evaluation Metrics (formulas below)
  - Support (s): the fraction of transactions that contain both X and Y
  - Confidence (c): measures how often items in Y appear in transactions that contain X
From Introduction to Data Mining, Tan, Steinbach and Kumar
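These definitions correspond to the standard formulas from Tan, Steinbach and Kumar, where σ denotes the support count and |T| the number of transactions:

    s(X → Y) = σ(X ∪ Y) / |T|
    c(X → Y) = σ(X ∪ Y) / σ(X)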
6 Mining Association Rules
- Two-step approach (see the sketch below)
  - Frequent Itemset Generation: generate all itemsets whose support ≥ minsup
  - Rule Generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
- Frequent itemset generation is computationally expensive
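A minimal brute-force sketch of the two steps in Python (illustrative only: the basket data is hypothetical, and a real miner would use Apriori-style pruning instead of enumerating every candidate itemset):

    from itertools import combinations

    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]

    def support(itemset, transactions):
        # Fraction of transactions containing every item in itemset
        return sum(itemset <= t for t in transactions) / len(transactions)

    def frequent_itemsets(transactions, minsup):
        # Step 1: all itemsets with support >= minsup (brute force)
        items = sorted(set().union(*transactions))
        return {
            frozenset(cand): support(set(cand), transactions)
            for k in range(1, len(items) + 1)
            for cand in combinations(items, k)
            if support(set(cand), transactions) >= minsup
        }

    def rules(frequent, transactions, minconf):
        # Step 2: binary partitions of each frequent itemset with high confidence
        out = []
        for itemset, s in frequent.items():
            for k in range(1, len(itemset)):
                for lhs in combinations(itemset, k):
                    conf = s / support(set(lhs), transactions)
                    if conf >= minconf:
                        out.append((set(lhs), set(itemset) - set(lhs), s, conf))
        return out

    for lhs, rhs, s, c in rules(frequent_itemsets(transactions, 0.4), transactions, 0.8):
        print(sorted(lhs), "->", sorted(rhs), f"(s={s:.2f}, c={c:.2f})")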
7 Overview
- Data Mining Tasks
- Associative Classifiers
- Downside of Support and Confidence
- Mining Rules from Imbalanced Data Sets
- Fisher's Exact Test
- Class Correlation Ratio (CCR)
- Searching and Pruning Strategies
- Experiments
8 Associative Classifiers
- Most associative classifiers are based on rules discovered using the support-confidence criterion.
- The classifier itself is a collection of rules ranked by their support or confidence.
9 Associative Classifiers (2)

TID  Items                       Gender
1    Bread, Milk                 F
2    Bread, Diaper, Beer, Eggs   M
3    Milk, Diaper, Beer, Coke    M
4    Bread, Milk, Diaper, Beer   M
5    Bread, Milk, Diaper, Coke   F

In a classification task we want to predict the class label (Gender) using the other attributes.
A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% and confidence is 100% (checked in the sketch below).
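A quick check of this rule's metrics on the five transactions above (a minimal sketch; the class label is treated as just another attribute of each transaction):

    transactions = [
        ({"Bread", "Milk"}, "F"),
        ({"Bread", "Diaper", "Beer", "Eggs"}, "M"),
        ({"Milk", "Diaper", "Beer", "Coke"}, "M"),
        ({"Bread", "Milk", "Diaper", "Beer"}, "M"),
        ({"Bread", "Milk", "Diaper", "Coke"}, "F"),
    ]

    antecedent = {"Beer", "Diaper"}
    # Class labels of the transactions that contain the antecedent
    matches = [label for items, label in transactions if antecedent <= items]
    support = matches.count("M") / len(transactions)   # 3/5 = 0.6 (60%)
    confidence = matches.count("M") / len(matches)     # 3/3 = 1.0 (100%)
    print(support, confidence)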
10 Overview
- Data Mining Tasks
- Associative Classifiers
- Downside of Support and Confidence
- Mining Rules from Imbalanced Data Sets
- Fisher's Exact Test
- Class Correlation Ratio (CCR)
- Searching and Pruning Strategies
- Experiments
11 Imbalanced Data Set
- In some application domains, data sets are imbalanced:
  - the proportion of samples from one class is much smaller than that of the other class/classes,
  - and the smaller class is the class of interest.
- Support and confidence are biased toward the majority class, and do not perform well in such cases.
12 Downsides of Support
- Support is biased towards the majority class.
- E.g., with classes {yes, no} and sup(yes) = 90%, minSup > 10% wipes out any rule predicting no.
- Suppose X → no has confidence 1 and support 3%. The rule is discarded if minSup > 3%, even though it perfectly predicts 30% of the instances in the minority class!
13 Downside of Confidence (1)

         C    ¬C     Σ
A       20     5    25
¬A      70     5    75
Σ       90    10   100

Conf(A → C) = 20/25 = 0.8; Support(A → C) = 20/100 = 0.2
Correlation between A and C: corr(A → C) = P(A ∧ C) / (P(A) · P(C)) = 0.2 / (0.25 × 0.9) ≈ 0.89 < 1
Thus, when the data set is imbalanced, a high-support, high-confidence rule does not necessarily imply that the antecedent and the consequent are positively correlated.
14 Downside of Confidence (2)
- It is reasonable to expect that for good rules the antecedent and consequent are not independent!
- Suppose
  - P(Class = Yes) = 0.9
  - P(Class = Yes | X) = 0.9
- Then X → Yes has confidence 0.9, yet X is independent of the class: the rule carries no predictive information.
15 Downsides of Confidence (3)
- Another useful observation:
  - Higher confidence (support) for a rule in the minority class implies higher correlation, and lower correlation in the minority class implies lower confidence; neither of these holds for the majority class.
  - Confidence (support) thus tends to be biased toward the majority class.
16 Overview
- Data Mining Tasks
- Associative Classifiers
- Downside of Support and Confidence
- Mining Rules from Imbalanced Data Sets
- Fisher's Exact Test
- Class Correlation Ratio (CCR)
- Searching and Pruning Strategies
- Experiments
17 Contingency Table
- A 2 × 2 contingency table for X → y, where a is the number of transactions containing both X and y, and so on:

         y    ¬y
X        a     b
¬X       c     d

- We will use the notation [a, b; c, d] to represent this table.
18 Fisher Exact Test
- Given a table [a, b; c, d], the Fisher Exact Test finds the probability (p-value) of obtaining a table at least as extreme as the given one, under the hypothesis that {X, ¬X} and {y, ¬y} are independent.
- The margin sums (Σ rows, Σ cols) are fixed (see the sketch below).
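A sketch of the test using SciPy's standard implementation; alternative="greater" gives the one-sided p-value for positive association. Run on slide 13's table, the p-value comes out large, confirming that the 0.8-confidence rule A → C is not significantly positively associated:

    from scipy.stats import fisher_exact

    # Contingency table [a, b; c, d] for A -> C (slide 13):
    # rows: A present / A absent; columns: class C / class not-C
    table = [[20, 5],
             [70, 5]]

    # One-sided test: probability of a table at least this positively
    # associated, with margins fixed, under independence
    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    print(p_value)  # large here, so A -> C fails a 0.01 significance test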
19 Fisher Exact Test (2)
- We only use rules whose p-values are below the desired significance level (e.g., 0.01).
- Rules that pass this test are statistically significant in the positively associated direction (e.g., X → y).
20 Overview
- Data Mining Tasks
- Associative Classifiers
- Downside of Support and Confidence
- Mining Rules from Imbalanced Data Sets
- Fisher's Exact Test
- Class Correlation Ratio (CCR)
- Searching and Pruning Strategies
- Experiments
21 Class Correlation Ratio
- In class correlation, we are interested in rules X → y where X is more positively correlated with y than it is with ¬y.
- The correlation is defined by
  corr(X → y) = P(X ∧ y) / (P(X) · P(y)) = (a · n) / ((a + b)(a + c))
  where |T| = n is the number of transactions.
22 Class Correlation Ratio (2)
- We then use corr() to measure how correlated X is with y compared to ¬y.
- X and y are positively correlated if corr(X → y) > 1, and negatively correlated if corr(X → y) < 1.
23 Class Correlation Ratio (3)
- Based on the correlation corr(), we define the Class Correlation Ratio (CCR):
  CCR(X → y) = corr(X → y) / corr(X → ¬y) = (a · (b + d)) / (b · (a + c))
- The CCR measures how much more positively the antecedent is correlated with the class it predicts (e.g., y) relative to the alternative class (e.g., ¬y). Both quantities are computed in the sketch below.
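A minimal sketch computing corr and CCR directly from the [a, b; c, d] counts, using the formulas as written above:

    def corr(a, b, c, d):
        # corr(X -> y) = P(X and y) / (P(X) * P(y))
        n = a + b + c + d
        return (a * n) / ((a + b) * (a + c))

    def ccr(a, b, c, d):
        # CCR(X -> y) = corr(X -> y) / corr(X -> not-y)
        return (a * (b + d)) / (b * (a + c))

    # Slide 13's table: conf(A -> C) = 0.8, yet both measures are below 1
    print(corr(20, 5, 70, 5))  # ~0.889: A and C are negatively correlated
    print(ccr(20, 5, 70, 5))   # ~0.444: A is more correlated with not-C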
24 Class Correlation Ratio (4)
- We only use rules whose CCR exceeds a desired threshold, so that we never use a rule that is more positively associated with the class it does not predict.
25 The Two Measurements
- We perform the following tests to determine whether a potentially interesting rule is indeed interesting:
  - Check the significance of the rule X → y by performing Fisher's Exact Test.
  - Check whether CCR(X → y) > 1.
- Rules that pass both tests are candidates for the classification task (combined in the sketch below).
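Putting the two tests together (a sketch reusing SciPy's fisher_exact and the CCR formula above; the parameter names are illustrative):

    from scipy.stats import fisher_exact

    def is_candidate(a, b, c, d, alpha=0.01):
        # Keep X -> y only if it is significantly positively associated
        # (Fisher's Exact Test) and CCR(X -> y) > 1
        _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
        ccr = (a * (b + d)) / (b * (a + c))
        return p < alpha and ccr > 1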
26 Overview
- Data Mining Tasks
- Associative Classifiers
- Downside of Support and Confidence
- Mining Rules from Imbalanced Data Sets
- Fisher's Exact Test
- Class Correlation Ratio (CCR)
- Searching and Pruning Strategies
- Experiments
27 Search and Pruning Strategies
- To avoid examining the whole set of possible rules, we use search strategies that make the notion of being potentially interesting anti-monotonic:
  - X → y is considered potentially interesting if and only if all of its generalizations X − {z} → y, z ∈ X, have been found to be potentially interesting.
28 Search and Pruning Strategies (2)
- Under the Aggressive search strategy, a contingency table [a, b; c, d] is used to test the significance of the rule X → y in comparison to one of its generalizations X − z → y.
29 Example
- Suppose we have already determined that the rules (A1 = a1) → 1 and (A2 = a2) → 1 are significant.
- Now we want to test whether X = (A1 = a1) ∧ (A2 = a2) → 1 is significant.
- We carry out a FET and calculate the CCR comparing X with X − {(A1 = a1)} = (A2 = a2) (i.e., z = (A1 = a1)), and X with X − {(A2 = a2)} = (A1 = a1) (i.e., z = (A2 = a2)).
- If the minimum of the p-values is less than the significance level, and the CCR is greater than 1, we keep the rule X → 1; otherwise we discard it (see the sketch below).
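The decision at the end of this example as a tiny sketch (names are illustrative; the p-values are assumed to come from the FETs against each generalization, per the Aggressive strategy):

    def keep_rule(p_values_vs_generalizations, ccr_value, alpha=0.01):
        # Keep X -> y if it is significant against at least one
        # generalization (minimum p-value) and its CCR exceeds 1
        return min(p_values_vs_generalizations) < alpha and ccr_value > 1

    print(keep_rule([0.003, 0.2], ccr_value=1.7))  # True: keep the rule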
30 Ranking Rules
- Strength Score (SS)
- To determine how interesting a rule is, we need a ranking (ordering) of the rules; the ordering is defined by the Strength Score.
31 Overview
- Data Mining Tasks
- Associative Classifiers
- Downside of Support and Confidence
- Mining Rules from Imbalanced Data Sets
- Fisher's Exact Test
- Class Correlation Ratio (CCR)
- Searching and Pruning Strategies
- Experiments
32 Experiments (Balanced Data)
- The preceding approach is referred to as SPARCCC.
- The experiments on balanced data sets show that the average accuracy of SPARCCC compares favourably to CBA and C4.5.
- The table below shows the prediction accuracy on the balanced data sets.
33 Experiments (Imbalanced Data)
- The True Positive Rate (recall/sensitivity) is a better performance measure for imbalanced data sets.
- SPARCCC outperforms other rule-based techniques such as CBA and CCCS.
- The table below shows the True Positive Rate on the minority class for the imbalanced versions of the datasets.
34 References
- Florian Verhein, Sanjay Chawla. Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets. In Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, October 28-31, 2007.