Title: The AIM-F Algorithm review
1. The AIM-F Algorithm review
Presented by Sagi Shporer
2. Frequent Itemset Problem
- Let I = {i1, i2, ..., im} be a set of items.
- Let T ⊆ I be a transaction.
- Let D be a dataset of n transactions.
- Task: Find all X ⊆ I s.t. support(X) ≥ minsupport
(i.e. there are at least minsupport transactions T in D
for which X ⊆ T).
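The definition above can be made concrete with a brute-force sketch. This is only an illustration of the problem statement, not the AIM-F algorithm; the function names `support` and `frequent_itemsets` and the toy dataset `D` are assumptions introduced here, not taken from the slides.

```python
from itertools import combinations

def support(itemset, dataset):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in dataset if items <= t)

def frequent_itemsets(dataset, minsupport):
    """Brute force: test every non-empty subset of the item universe."""
    universe = sorted(set().union(*dataset))
    result = {}
    for k in range(1, len(universe) + 1):
        for cand in combinations(universe, k):
            s = support(cand, dataset)
            if s >= minsupport:
                result[cand] = s
    return result

# Hypothetical toy dataset (not from the slides).
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
FI = frequent_itemsets(D, minsupport=2)
for itemset, s in sorted(FI.items()):
    print("".join(itemset), s)
```

The cost is exponential in the number of items, which is why the techniques on the following slides exist.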
3. Example: Frequent Itemsets
What itemsets are frequent itemsets (FI)?
a, b, c, d, e,
ab, ac, ad, bc, bd, be, cd, ce, de,
abc, abd, acd, bcd, cde,
abcd
4. Previous research work
- Candidate set generate-and-test approach
  - Apriori, VLDB 94, R. Agrawal.
- Sampling technique
  - H. Toivonen
- Adaptive Support
  - SLPMiner, ICDM 2002, M. Seno and G. Karypis
- Data transform
  - FP-tree, SIGMOD 2000, J. Han.
5. General
- Goal: Mining frequent itemsets
- Main features:
  - DFS generate-and-test
  - Compressed vertical database
  - Diffsets
  - PEP (Parent Equivalence Pruning)
  - Dynamic reordering
  - Vector projection
  - Optimized initialization
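A minimal sketch of the first two features, DFS generate-and-test over a vertical database, might look like the following. It assumes plain Python sets as TID lists and omits diffsets, PEP, reordering, and projection; the names `vertical` and `dfs_mine` and the toy dataset are hypothetical, not from the slides.

```python
def vertical(dataset):
    """Build item -> set of TIDs (a vertical layout of the dataset)."""
    tids = {}
    for tid, transaction in enumerate(dataset):
        for item in transaction:
            tids.setdefault(item, set()).add(tid)
    return tids

def dfs_mine(dataset, minsupport):
    """Depth-first generate-and-test over the itemset enumeration tree."""
    tids = vertical(dataset)
    found = {}

    def extend(prefix, prefix_tids, tail):
        for i, item in enumerate(tail):
            new_tids = prefix_tids & tids[item]   # generate the candidate
            if len(new_tids) >= minsupport:       # test its support
                itemset = prefix + (item,)
                found[itemset] = len(new_tids)
                # Only items after `item` may extend this node.
                extend(itemset, new_tids, tail[i + 1:])

    extend((), set(range(len(dataset))), sorted(tids))
    return found

# Hypothetical toy dataset (not from the slides).
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
found = dfs_mine(D, minsupport=2)
```

An infrequent candidate is never extended, so its whole subtree is skipped.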
6. Enumeration tree
7. Pruning - PEP
8. An Example (Illustration only)
abcd
9. Diffsets
- Let t(P) be the set of transactions (TIDs) supporting P.
- Define diffset d(PX) = t(P) \ t(X)
- Then support(PX) = support(P) - |d(PX)|
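As a small worked check of the definition, using Python sets for the TID lists (the TID values are made up for illustration):

```python
# Hypothetical TID sets, not from the slides.
t_P = {1, 2, 3, 4, 5}
t_X = {1, 2, 3, 6}

d_PX = t_P - t_X                      # diffset: TIDs of P that lack X
support_P = len(t_P)
support_PX = support_P - len(d_PX)    # support via the diffset

# Cross-check against the direct TID-list intersection.
assert support_PX == len(t_P & t_X)
print(sorted(d_PX), support_PX)
```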
10. Diffsets
- How to calculate support(PXY) using d(PX) and d(PY)?
- support(PXY) = support(PX) - |d(PXY)|
- d(PXY) = d(PY) \ d(PX)
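The recursion can be sketched as follows, again with Python sets standing in for TID lists. The helper name `child_diffset` and the TID values are assumptions for illustration only.

```python
def child_diffset(d_PX, d_PY):
    """d(PXY) = d(PY) \\ d(PX): TIDs that contain PX but not Y."""
    return d_PY - d_PX

# Hypothetical TID sets, not from the slides.
t_P = {1, 2, 3, 4, 5, 6}
t_X = {1, 2, 3, 4, 7}
t_Y = {1, 2, 5, 8}

d_PX = t_P - t_X
d_PY = t_P - t_Y
support_PX = len(t_P) - len(d_PX)

d_PXY = child_diffset(d_PX, d_PY)
support_PXY = support_PX - len(d_PXY)

# Cross-check against the direct intersection of the TID lists.
assert support_PXY == len(t_P & t_X & t_Y)
```

Once the diffsets of the siblings PX and PY are known, the TID lists t(X) and t(Y) are no longer needed to compute support(PXY).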
11. Example
[Figure: diagram of the sets t(P), t(X), t(Y), d(PX), d(PY), d(PXY) and t(PXY)]
12. Contributions
- Dynamic use of various itemset mining optimizations (specifically diffsets).
- Use of a compressed vertical bit vector with diffsets.
13. Dynamic Optimization Usage
- Every optimization has strengths and weaknesses.
- Optimizations should be used only when they give
some benefit.
14. Dynamic Optimization Usage (Cont.)
- Diffsets: Start using diffsets only when |d(PX)| < |t(PX)|
- Optimized Initialization: Use only for sparse datasets (when the number of 1s reaches a threshold)
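The diffset switching rule can be sketched as a per-node decision. This assumes both t(P) and t(X) are available as Python sets at the point of the decision; the function name `choose_representation` is hypothetical, and the actual implementation may make the switch differently.

```python
def choose_representation(t_P, t_X):
    """Keep whichever of d(PX) and t(PX) is smaller for the node PX."""
    t_PX = t_P & t_X
    d_PX = t_P - t_X
    if len(d_PX) < len(t_PX):
        return "diffset", d_PX
    return "tidset", t_PX

# Dense case: X occurs in almost every transaction of P.
dense = choose_representation(set(range(10)), set(range(9)))
# Sparse case: X occurs in few transactions of P.
sparse = choose_representation(set(range(10)), {0, 1})
```

In the dense case the diffset holds one TID instead of nine; in the sparse case the TID list holds two instead of eight.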
15. Compressed Bit Vector
- Sparse Vertical Bit Vector: Hold only the needed cells in the vertical bit vector
16. Compressed Bit Vector (Cont.)
- Use of diffsets directly from the compressed form
- Faster than tid-list for dense datasets.
- Competitive with tid-list for sparse datasets
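One possible reading of "hold only the needed cells" is a bit vector that stores only its non-zero machine words. The sketch below assumes that layout (a dict from word index to a 64-bit word) and computes a diffset directly on it. The slides do not specify the actual data structure, so the layout and all names here are assumptions.

```python
WORD = 64  # assumed word size in bits

def compress(tids):
    """Pack a TID set into {word_index: non-zero 64-bit word}."""
    words = {}
    for tid in tids:
        idx, bit = divmod(tid, WORD)
        words[idx] = words.get(idx, 0) | (1 << bit)
    return words

def difference(a, b):
    """Set difference of a and b, word by word; zero words are dropped."""
    out = {}
    for idx, word in a.items():
        rest = word & ~b.get(idx, 0)
        if rest:
            out[idx] = rest
    return out

def count(words):
    """Number of set bits, i.e. number of TIDs represented."""
    return sum(bin(w).count("1") for w in words.values())

# Hypothetical TID sets, not from the slides.
t_P = compress({0, 1, 2, 64, 130, 1000})
t_X = compress({0, 1, 64, 130})

d_PX = difference(t_P, t_X)            # diffset, still in compressed form
support_PX = count(t_P) - count(d_PX)
```

Each stored word handles up to 64 TIDs in one operation, and words that are entirely zero cost nothing, which is consistent with the dense and sparse claims above.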
17. Optimization Contributions
18. Questions & Comments