Title: Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree
1. Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree
How to find good features from semi-structured raw data for classification
- Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure
2. Feature Construction
- Most data mining and machine learning models assume structured data: (x1, x2, ..., xk) -> y (see the sketch after this list)
  - where the xi are independent variables and y is the dependent variable
  - y drawn from a discrete set: classification
  - y drawn from a continuous range: regression
- When the feature vectors are good, the differences in accuracy among learners are small.
- Question: where do good features come from?
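To make the notation concrete, here is a minimal sketch of the structured-data assumption; the values and label names are made up for illustration, not from the slides.

```python
# Toy illustration of the structured-data assumption (values are made up).
# Classification: y comes from a discrete set of labels.
classification_data = [
    ((5.1, 3.5, 1.4), "class_A"),   # (x1, x2, x3) -> y
    ((4.9, 3.0, 1.3), "class_B"),
]
# Regression: y is a continuous value.
regression_data = [
    ((5.1, 3.5, 1.4), 0.27),
    ((4.9, 3.0, 1.3), 0.81),
]
```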
3. Frequent Pattern-Based Feature Extraction
- Data not given as pre-defined feature vectors:
  - transactions
  - biological sequences
  - graph databases
- Frequent patterns are good candidates for discriminative features. So, how do we mine them?
4. FP: Sub-graph
(example borrowed from George Karypis' presentation)
5. Frequent Pattern: Feature Vector Representation

        P1  P2  P3
Data1   1   1   0
Data2   1   0   1
Data3   1   1   0
Data4   0   0   1

- Mining these predictive features is an NP-hard problem: 100 examples can yield up to 10^10 patterns, and most of them are useless. (A sketch of the pattern-to-feature mapping follows.)
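A minimal sketch (hypothetical patterns P1-P3, not the paper's code) of how the table above is produced: feature j of an example is 1 iff the example contains pattern Pj.

```python
# Represent transactions in a frequent-pattern feature space:
# feature j is 1 iff the example contains pattern Pj.
P1, P2, P3 = {"a"}, {"a", "b"}, {"c"}          # hypothetical mined patterns
transactions = [
    {"a", "b"},        # Data1 -> (1, 1, 0)
    {"a", "c"},        # Data2 -> (1, 0, 1)
    {"a", "b", "d"},   # Data3 -> (1, 1, 0)
    {"c", "e"},        # Data4 -> (0, 0, 1)
]
patterns = [P1, P2, P3]
feature_vectors = [
    [int(p <= t) for p in patterns]            # subset (containment) test
    for t in transactions
]
print(feature_vectors)  # [[1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]]
```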
6. Example
- 192 examples
- With support >= 12 (at least 12 examples contain the pattern), itemset mining returns 8,600 patterns: 192 examples vs. 8,600 patterns?
- With support >= 4: 92,000 patterns. 192 vs. 92,000??
- Most patterns have no predictive power and cannot be used to construct features.
- Our algorithm:
  - finds only 20 highly predictive patterns
  - can construct a decision tree with about 90% accuracy
7. Data in a Bad Feature Space
- Discriminative patterns:
  - a non-linear combination of single feature(s)
  - increase the expressive and discriminative power of the feature space
- An example: data that is non-linearly separable in (x, y) (see the next slide)
8. New Feature Space

[Figure: the same data augmented with a third, mined binary feature. Itemset: F = {x=0, y=0}; association rule: F: x=0 => y=0. The pipeline is mine, then transform.]

Data is linearly separable in (x, y, F). (A small sketch of the mine-and-transform step follows.)
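A small sketch of the slide's mine-and-transform idea under assumed XOR-style data: the itemset F = {x=0, y=0} is mined and appended as a binary feature, after which a single linear separator works. The data points and the separating weights below are illustrative assumptions, not values from the paper.

```python
# XOR-style data: not linearly separable in (x, y) alone.
data = [  # (x, y) -> label
    ((0, 0), 0), ((1, 1), 0), ((0, 1), 1), ((1, 0), 1),
]

def mine_feature(x, y):
    # "Mine": the pattern F fires when the itemset {x=0, y=0} is contained.
    return int(x == 0 and y == 0)

# "Transform": append F to every example.
transformed = [((x, y, mine_feature(x, y)), label) for (x, y), label in data]

# One linear separator over (x, y, F); impossible over (x, y) alone.
w, b = (-1.0, -1.0, -2.0), 1.5
for (x, y, f), label in transformed:
    assert (w[0]*x + w[1]*y + w[2]*f + b > 0) == (label == 1)
```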
9. Computational Issues
- Pattern quality is measured by frequency or support, e.g., frequent subgraphs with sup >= 10 (at least 10 examples contain the pattern).
- Ordered enumeration cannot enumerate patterns with sup = 10 without first enumerating all patterns with sup > 10.
- NP-hard problem: easily up to 10^10 patterns for a realistic problem.
- Most patterns are non-discriminative, yet low-support patterns can have high discriminative power. Bad!
- Random sampling does not work, since it is not exhaustive: because most patterns are useless, randomly sampled patterns (or blind enumeration that ignores frequency) are useless too.
- Small number of examples:
  - with only a subset of the vocabulary, the search is incomplete;
  - with the complete vocabulary, it won't help much but introduces a sample selection bias problem, in particular missing low-support but high-information-gain patterns.
The combinatorial blow-up is easy to see even on a tiny synthetic dataset (sketch below).
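A quick synthetic illustration (random 10-item transactions, not the paper's data) of the enumeration blow-up: as min_sup drops, the number of frequent itemsets grows sharply, which is why exhaustive enumeration at low support is hopeless at realistic scale.

```python
# Count frequent itemsets at decreasing support thresholds on synthetic data.
from itertools import combinations
import random

random.seed(0)
items = list("abcdefghij")                       # 10-item vocabulary
transactions = [set(random.sample(items, 5)) for _ in range(100)]

def count_frequent(min_sup):
    n = 0
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):   # brute-force enumeration
            sup = sum(set(cand) <= t for t in transactions)
            if sup >= min_sup:
                n += 1
    return n

for min_sup in (50, 20, 10, 4):
    print(min_sup, count_frequent(min_sup))      # counts grow as min_sup drops
```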
10. Conventional Procedure: Two-Step Batch Method
- Mine frequent patterns (sup > min_sup).
- Select the most discriminative patterns.
- Represent the data in the feature space using the selected patterns.
- Build classification models.
Feature construction and selection; a minimal end-to-end sketch follows.
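A minimal sketch of the conventional two-step batch method, under toy assumptions (four labeled transactions, brute-force enumeration standing in for a real frequent-pattern miner): mine everything above min_sup, then rank by information gain computed on the complete dataset.

```python
# Two-step batch method: (1) mine frequent patterns, (2) select by InfoGain.
from itertools import combinations
from math import log2

transactions = [({"a", "b"}, 1), ({"a", "c"}, 1), ({"b", "c"}, 0), ({"c", "d"}, 0)]
items = sorted(set().union(*(t for t, _ in transactions)))

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(0), labels.count(1)) if c)

def info_gain(pattern):
    labels = [y for _, y in transactions]
    inside = [y for t, y in transactions if pattern <= t]
    outside = [y for t, y in transactions if not pattern <= t]
    split = sum(len(part) / len(labels) * entropy(part)
                for part in (inside, outside) if part)
    return entropy(labels) - split

# Step 1: mine (brute force stands in for a real frequent-pattern miner).
min_sup = 2
frequent = [set(c) for k in (1, 2) for c in combinations(items, k)
            if sum(set(c) <= t for t, _ in transactions) >= min_sup]

# Step 2: select the top-k patterns by InfoGain, then featurize the data.
top = sorted(frequent, key=info_gain, reverse=True)[:2]
X = [[int(p <= t) for p in top] for t, _ in transactions]
```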
11. Two Problems
- Mine step: combinatorial explosion
  1. exponential explosion of the pattern space;
  2. patterns are not considered if min_support isn't small enough.
12. Two Problems (cont.)
- Select step: issue of discriminative power
  3. InfoGain is evaluated against the complete dataset, NOT on subsets of examples;
  4. correlation among patterns is not directly evaluated on their joint predictability.
13. Direct Mining & Selection via Model-based Search Tree

[Figure: a model-based search tree coupling a feature miner with a classifier. Divide-and-conquer frequent-pattern mining on successively smaller subsets (nodes 1, 2, 3, 4, 5, 6, 7, ...) yields a compact set of highly discriminative mined patterns; patterns found deep in the tree can have extremely low global support (the slide's example works out to 2/10,000 = 0.02%).]

A minimal sketch of this divide-and-conquer scheme follows.
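The following is a reconstruction of the divide-and-conquer idea, not the authors' code: each node mines the single most discriminative frequent pattern on that node's examples only, splits on its presence, and recurses, so a pattern mined deep in the tree can be globally rare yet locally decisive. `best_local_pattern`, `build_mbt`, and the size bounds are assumed names and simplifications.

```python
# Model-based search tree sketch: mine one pattern per node, split, recurse.
from itertools import combinations
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(0), labels.count(1)) if c)

def best_local_pattern(examples, min_sup=2):
    # Mine the highest-InfoGain frequent itemset on THIS node's examples only.
    items = sorted(set().union(*(t for t, _ in examples)))
    best, best_gain = None, 0.0
    for k in (1, 2):                              # bounded pattern size
        for cand in combinations(items, k):
            p = set(cand)
            inside = [y for t, y in examples if p <= t]
            outside = [y for t, y in examples if not p <= t]
            if len(inside) < min_sup:
                continue
            labels = [y for _, y in examples]
            split = sum(len(part) / len(labels) * entropy(part)
                        for part in (inside, outside) if part)
            gain = entropy(labels) - split
            if gain > best_gain:
                best, best_gain = p, gain
    return best

def build_mbt(examples, depth=0, max_depth=4):
    labels = [y for _, y in examples]
    if depth == max_depth or len(set(labels)) <= 1:
        return {"leaf": max(set(labels), key=labels.count)}
    pattern = best_local_pattern(examples)
    if pattern is None:                           # no discriminative pattern
        return {"leaf": max(set(labels), key=labels.count)}
    with_p = [(t, y) for t, y in examples if pattern <= t]
    without_p = [(t, y) for t, y in examples if not pattern <= t]
    return {"pattern": pattern,                   # a mined feature AND a test
            "yes": build_mbt(with_p, depth + 1, max_depth),
            "no": build_mbt(without_p, depth + 1, max_depth)}
```

The tree is thus both the feature constructor (the mined patterns at its internal nodes) and the classifier (its leaf predictions), matching the slide's "Feature Miner + Classifier" coupling.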
14. Analyses (I)
- Scalability (Theorem 1):
  - upper bound;
  - "scale-down" ratio for obtaining extremely low-support patterns.
- Bound on the number of returned features (Theorem 2).
15. Analyses (II)
- Subspaces are important for discriminative patterns.
- On the original set, a pattern a carries no information gain if it is equally frequent in both classes, i.e., if P1/C1 = P0/C0, where
  - C1 and C0 are the numbers of examples belonging to classes 1 and 0,
  - P1 is the number of examples in C1 that contain pattern a,
  - P0 is the number of examples in C0 that contain the same pattern a.
- Subsets can still have information gain (worked check below).
- Non-overfitting.
- Optimality under exhaustive search.
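A worked check of the two claims above, with toy numbers of my choosing: when P1/C1 = P0/C0 the information gain on the full set is exactly zero, while a subset with unequal proportions gives the same pattern positive gain.

```python
# Information gain of a pattern, from the class counts defined on the slide.
from math import log2

def entropy(pos, neg):
    tot = pos + neg
    return -sum((c / tot) * log2(c / tot) for c in (pos, neg) if c)

def info_gain(c1, c0, p1, p0):
    # c1, c0: class sizes; p1, p0: examples per class containing pattern a.
    total = entropy(c1, c0)
    inside = entropy(p1, p0)                 # examples containing a
    outside = entropy(c1 - p1, c0 - p0)      # examples not containing a
    n = c1 + c0
    return total - ((p1 + p0) / n) * inside - ((n - p1 - p0) / n) * outside

# Full set: C1 = C0 = 100, pattern in half of each class -> P1/C1 = P0/C0.
print(info_gain(100, 100, 50, 50))   # 0.0: no gain on the original set
# A subset where the proportions differ -> positive gain.
print(info_gain(40, 60, 30, 10))     # > 0: the same pattern is useful here
```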
16. Experimental Studies: Itemset Mining (I)
17. Experimental Studies: Itemset Mining (II)
- Accuracy of mined itemsets: 4 wins, 1 loss, with a much smaller number of patterns.
18. Experimental Studies: Itemset Mining (III)
19. Experimental Studies: Graph Mining (I)
- 9 NCI anti-cancer screen datasets
  - The PubChem Project, URL: pubchem.ncbi.nlm.nih.gov
  - active (positive) class: around 1%-8.3%
- 2 AIDS anti-viral screen datasets
  - URL: http://dtp.nci.nih.gov
  - H1: CM+CA, about 3.5% positive
  - H2: CA, about 1% positive
20. Experimental Studies: Graph Mining (II)
21. Experimental Studies: Graph Mining (III)
- AUC: 11 wins; 10 wins, 1 loss.
22. Experimental Studies: Graph Mining (IV)
- AUC of MbT and DT(MbT) vs. benchmarks: 7 wins, 4 losses.
23. Summary
- Model-based search tree:
  - integrated feature mining and construction;
  - dynamic support: can mine patterns with extremely small support;
  - both a feature constructor and a classifier;
  - not limited to one type of frequent pattern: plug-and-play.
- Experimental results:
  - itemset mining;
  - graph mining.
- Software and datasets available from www.cs.columbia.edu/~wfan