Title: Mining Frequent Patterns Using FP-Growth Method
1Mining Frequent Patterns Using FP-Growth Method
- Ivan Tanasic (itanasic_at_gmail.com)
- Department of Computer Engineering and Computer Science,
- School of Electrical Engineering,
- University of Belgrade
2- Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
- Jiawei Han (UIUC)
- Jian Pei (Buffalo)
- Yiwen Yin (SFU)
- Runying Mao (Microsoft)
3Problem Definition
- Mining frequent patterns from a DB
- Frequent itemsets
- (milk, bread)
- Frequent sequential patterns
- (computer -> printer -> paper)
- Frequent structural patterns
- (subgraphs, subtrees)
4Problem Importance 1/2
- Basic DM primitive
- Used for mining data relationships
- Associations
- Correlations
- Helps with basic DM tasks
- Classification
- Clustering
5Problem importance 2/2
- Association rules
- buys(laptop) => buys(mouse) [support = 2%, confidence = 30%]
- Support: percentage of all transactions containing those items
- Confidence: percentage of transactions containing I1 that also contain I2
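As a rough illustration of the two measures, the sketch below computes support and confidence over a small list of transactions. The toy data and the helper names `support` and `confidence` are assumptions made for this example and do not come from the slides; the sketch also assumes the antecedent occurs in at least one transaction.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    both = support(transactions, set(antecedent) | set(consequent))
    # Assumes the antecedent appears at least once (otherwise this divides by zero).
    return both / support(transactions, antecedent)


# Hypothetical toy data (not from the slides).
transactions = [
    {"laptop", "mouse"},
    {"laptop"},
    {"mouse", "paper"},
    {"laptop", "mouse", "paper"},
]

# Rule: buys(laptop) => buys(mouse)
rule_support = support(transactions, {"laptop", "mouse"})
rule_confidence = confidence(transactions, {"laptop"}, {"mouse"})
```

Here two of the four transactions contain both items, and two of the three laptop transactions also contain a mouse.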
6Problem Trend
- Speeding up Apriori with additional techniques
- New data structures (trees)
- Association rule specific algorithms
- Specific AR algorithms (OneR, ZeroR)
- FP-Growth still widely used
7Existing Solutions 1/3 (Apriori)
- Agrawal et al. (1994)
- Apriori property (AP): all nonempty subsets of a frequent itemset must also be frequent
- Starts from 1-itemsets
- Join and prune steps (using the AP and minimum support)
- Generates a huge number of candidates
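The join and prune steps above can be sketched roughly as follows. This is a minimal illustration, not the published algorithm's exact candidate generation: the function name `apriori`, the simplified join, and the minimum support count of 2 are assumptions. The data are the nine transactions from the FP-Tree construction slides.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One database scan per level: count each candidate's occurrences.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})  # frequent 1-itemsets
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset (Apriori property).
        candidates = {
            c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


db = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
frequent_itemsets = apriori(db, min_count=2)  # min_count=2 is an assumption
```

Every level needs its own pass over the database, and the joined set can grow quickly, which is the candidate explosion the last bullet refers to.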
8Existing Solutions 2/3 (ECLAT)
- Zaki (2000)
- Equivalence CLass Transformation
- Vertical format: {item, TID_set} instead of {TID, itemset}
- Intersects TID_sets of candidates
- TID_sets hold support info (no scans)
- Still generates candidates
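A minimal sketch of the vertical-format idea, assuming a depth-first search over TID-set intersections. The function name `eclat`, the transaction IDs, and the minimum support count of 2 are illustrative assumptions; the data are the nine transactions from the FP-Tree construction slides.

```python
def eclat(transactions, min_count):
    """Return {frozenset(itemset): support_count} using TID-set intersections."""
    # Build the vertical format: item -> set of IDs of transactions containing it.
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tid_set) pairs that can extend `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)  # support is the TID-set size
            # Intersect with the remaining items; no database scan is needed.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    suffix.append((other, shared))
            extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


db = {
    "T1": ["I2", "I1", "I5"], "T2": ["I2", "I4"], "T3": ["I2", "I3"],
    "T4": ["I2", "I1", "I4"], "T5": ["I1", "I3"], "T6": ["I2", "I3"],
    "T7": ["I1", "I3"], "T8": ["I2", "I1", "I3", "I5"], "T9": ["I2", "I1", "I3"],
}
eclat_result = eclat(db, min_count=2)  # min_count=2 is an assumption
```

The database is read once to build the TID sets; after that, each candidate's support is obtained from an intersection, though candidates are still enumerated.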
9Existing Solutions 3/3 (TreeProjection)
- Agarwal et al. (2001)
- Creates a lexicographical tree and projects the db into sub-dbs based on the patterns mined so far
- Recursively mines the sub-databases
- Less scalable than FP-Growth
10FP-Tree construction 1/6
Desc. supp. sort
11FP-Tree construction 2/6
Desc. supp. sort
T1 I2, I1, I5
12FP-Tree construction 3/6
Desc. supp. sort
T1 I2, I1, I5
T2 I2, I4
13FP-Tree construction 4/6
Desc. supp. sort
T1 I2, I1, I5
T2 I2, I4
T3 I2, I3
14FP-Tree construction 5/6
Desc. supp. sort
T1 I2, I1, I5
T2 I2, I4
T3 I2, I3
T4 I2, I1, I4
15FP-Tree construction 6/6
Desc. supp. sort
T1 I2, I1, I5
T2 I2, I4
T3 I2, I3
T4 I2, I1, I4
T5 I1, I3
T6 I2, I3
T7 I1, I3
T8 I2, I1, I3, I5
T9 I2, I1, I3
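The construction walked through on these slides can be sketched as a two-scan procedure. This is a minimal illustration under stated assumptions: the names `Node` and `build_fp_tree`, the tie-breaking by item name, and the minimum support count of 2 are not given in the slides.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # First scan: count item supports and keep only frequent items.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_count}

    root = Node(None)
    header = {item: [] for item in frequent}  # item -> list of its nodes
    # Second scan: insert each transaction, items in descending support order.
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),  # ties broken by name (assumption)
        )
        node = root
        for item in ordered:
            if item not in node.children:
                child = Node(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1  # shared prefixes only increment counts
    return root, header


db = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
root, header = build_fp_tree(db, min_count=2)  # min_count=2 is an assumption
```

Transactions that share a prefix, such as T1, T4, T8 and T9 (all starting with I2, I1), reuse the same path, which is what keeps the tree compact.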
16Mining of the FP-Tree 1/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {I2, I1: 1}, {I2, I1, I3: 1} | <I2: 2, I1: 2> | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
17Mining of the FP-Tree 2/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {I2, I1: 1}, {I2, I1, I3: 1} | <I2: 2, I1: 2> | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {I2, I1: 1}, {I2: 1} | <I2: 2> | {I2, I4: 2}
18Mining of the FP-Tree 3/4
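Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {I2, I1: 1}, {I2, I1, I3: 1} | <I2: 2, I1: 2> | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {I2, I1: 1}, {I2: 1} | <I2: 2> | {I2, I4: 2}
I3 | {I2, I1: 2}, {I2: 2}, {I1: 2} | <I2: 4, I1: 2>, <I1: 2> | {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}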
19Mining of the FP-Tree 4/4
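Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {I2, I1: 1}, {I2, I1, I3: 1} | <I2: 2, I1: 2> | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {I2, I1: 1}, {I2: 1} | <I2: 2> | {I2, I4: 2}
I3 | {I2, I1: 2}, {I2: 2}, {I1: 2} | <I2: 4, I1: 2>, <I1: 2> | {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1 | {I2: 4} | <I2: 4> | {I2, I1: 4}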
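The mining step summarized in the table can be sketched as a recursive procedure: for each item, collect its conditional pattern base from the prefix paths, build a conditional FP-tree from that base, and recurse. This is a simplified illustration, not the paper's exact pseudocode; the names `Node`, `build_tree` and `fp_growth`, the (items, count) input format, and the minimum support count of 2 are assumptions. It also assumes each transaction lists an item at most once.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    support = defaultdict(int)
    for items, count in weighted:
        for item in items:
            support[item] += count
    frequent = {i: c for i, c in support.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> nodes holding that item
    for items, count in weighted:
        ordered = sorted((i for i in items if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, frequent


def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frozenset(pattern): support_count} for all frequent patterns."""
    header, frequent = build_tree(weighted, min_count)
    patterns = {}
    for item, support in frequent.items():
        pattern = suffix | {item}
        patterns[pattern] = support
        # Conditional pattern base: the prefix path of every node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on the conditional FP-tree built from that base.
        patterns.update(fp_growth(base, min_count, pattern))
    return patterns


db = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
# Each original transaction has weight 1; min_count=2 is an assumption.
mined = fp_growth([(t, 1) for t in db], min_count=2)
```

No candidate itemsets are generated and tested against the database; every pattern is grown from counts already stored in a (conditional) tree.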
20How much better is it 1/3?
21How much better is it 2/3?
22How much better is it 3/3?
23Is it Original?
- A lot of methods try to improve Apriori
- Hashing
- Transaction reduction
- Partitioning
- Sampling
- TreeProjection uses a similar structure, but it is still a different method
24Importance over time
- Basic primitive (a strong foundation for a tall building)
- Performance becomes very important as databases are getting huge
- Scalability also
- FP-Growth has both performance and scalability
25Conclusion
- An important method for solving important DM tasks
- Fast
- Compact
- Scalable (db projection/tree on disk)