Mining Frequent Patterns Using FP-Growth Method - PowerPoint PPT Presentation



1
Mining Frequent Patterns Using FP-Growth Method
  • Ivan Tanasic (itanasic_at_gmail.com)
  • Department of Computer Engineering and Computer Science,
  • School of Electrical Engineering,
  • University of Belgrade

2
  • Mining Frequent Patterns without Candidate
    Generation: A Frequent-Pattern Tree Approach
  • Jiawei Han (UIUC)
  • Jian Pei (Buffalo)
  • Yiwen Yin (SFU)
  • Runying Mao (Microsoft)

3
Problem Definition
  • Mining frequent patterns from a DB
  • Frequent itemsets
  • (milk, bread)
  • Frequent sequential patterns
  • (computer -> printer -> paper)
  • Frequent structural patterns
  • (subgraphs, subtrees)

4
Problem Importance 1/2
  • Basic data mining (DM) primitive
  • Used for mining data relationships
  • Associations
  • Correlations
  • Helps with basic DM tasks
  • Classification
  • Clustering

5
Problem importance 2/2
  • Association rules
  • buys(laptop) => buys(mouse) [support = 2%,
    confidence = 30%]
  • Support: % of all transactions that contain
    those items
  • Confidence: % of transactions containing I1 that
    also contain I2
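Both measures are simple ratios of transaction counts. The sketch below computes them on a small hypothetical database (the items and numbers are illustrative assumptions, not the 2%/30% figures from the slide); `count`, `support` and `confidence` are helper names chosen for this example.

```python
# Hypothetical toy database (not from the slides): each transaction is a set of items.
transactions = [
    {"laptop", "mouse"},
    {"laptop"},
    {"laptop", "mouse", "bag"},
    {"mouse"},
    {"laptop", "bag"},
]

def count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    """Share of all transactions that contain `itemset`."""
    return count(itemset, db) / len(db)

def confidence(lhs, rhs, db):
    """Share of transactions containing `lhs` that also contain `rhs`."""
    return count(lhs | rhs, db) / count(lhs, db)

rule_support = support({"laptop", "mouse"}, transactions)
rule_confidence = confidence({"laptop"}, {"mouse"}, transactions)
print(f"buys(laptop) => buys(mouse) [support = {rule_support:.0%}, "
      f"confidence = {rule_confidence:.0%}]")
```

On this toy data the rule holds in 2 of 5 transactions (support 40%), and 2 of the 4 laptop buyers also bought a mouse (confidence 50%).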

6
Problem Trend
  • Techniques for speeding up Apriori
  • New data structures (trees)
  • Association rule specific algorithms
  • Specific AR algorithms (OneR, ZeroR)
  • FP-Growth still widely used

7
Existing Solutions 1/3 (Apriori)
  • Agrawal et al. (1994)
  • Apriori property (AP): all nonempty subsets of a
    frequent itemset must also be frequent
  • Starts from 1-itemsets
  • Join and prune steps (prune using AP and min.
    support)
  • Generates a huge number of candidates
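The level-wise join/prune/count loop can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `apriori` is hypothetical, and instead of the classic prefix-based join it unions any two frequent k-itemsets whose union has k+1 items, which yields the same candidate set once the Apriori-property prune is applied. The database is the nine transactions T1 to T9 used later in the slides.

```python
from itertools import combinations

def apriori(db, min_sup):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    db = [frozenset(t) for t in db]
    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_sup}
    frequent = dict(current)
    k = 1
    while current:
        # Join: union two frequent k-itemsets that differ in exactly one item.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune: every k-subset must be frequent (Apriori property).
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        # Count: one pass over the database per level.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_sup}
        frequent.update(current)
        k += 1
    return frequent

DB = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"], ["I2", "I1", "I4"], ["I1", "I3"],
    ["I2", "I3"], ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
freq = apriori(DB, min_sup=2)
```

With min support 2 this finds 13 frequent itemsets; the nested candidate loop over every transaction is exactly the cost that grows with the number of candidates.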

8
Existing Solutions 2/3 (ECLAT)
  • Zaki (2000)
  • Equivalence CLass Transformation
  • Vertical data format {item: TID_set} instead of
    the horizontal {TID: itemset}
  • Intersects the TID_sets of candidates
  • TID_sets hold the support information (no extra
    DB scans for support counting)
  • Still generates candidates
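A minimal depth-first sketch of the vertical idea, assuming the same T1 to T9 database: the support of an itemset is simply the size of its TID_set, and extending an itemset means intersecting TID_sets. The function name `eclat` and the dict-of-sets layout are illustrative choices, not Zaki's exact data structures.

```python
def eclat(db, min_sup):
    """Return {frozenset(itemset): support_count} using TID_set intersection."""
    # Horizontal {TID: itemset} -> vertical {item: TID_set}.
    vertical = {}
    for tid, items in db.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    frequent = {}

    def mine(prefix, items):
        # items: (item, TID_set) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(items):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)  # support = size of the TID_set
            # Extend only with later items so each itemset is generated once.
            extensions = []
            for other, other_tids in items[i + 1:]:
                common = tids & other_tids  # intersection instead of a DB scan
                if len(common) >= min_sup:
                    extensions.append((other, common))
            if extensions:
                mine(itemset, extensions)

    start = sorted((i, t) for i, t in vertical.items() if len(t) >= min_sup)
    mine(frozenset(), start)
    return frequent

DB = {
    "T1": {"I1", "I2", "I5"}, "T2": {"I2", "I4"}, "T3": {"I2", "I3"},
    "T4": {"I1", "I2", "I4"}, "T5": {"I1", "I3"}, "T6": {"I2", "I3"},
    "T7": {"I1", "I3"}, "T8": {"I1", "I2", "I3", "I5"}, "T9": {"I1", "I2", "I3"},
}
freq = eclat(DB, min_sup=2)
```

Each candidate extension is still materialized (as an intersected TID_set), which is why the slide notes that ECLAT still generates candidates.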

9
Existing Solutions 3/3 (TreeProjection)
  • Agarwal et al. (2001)
  • Creates a lexicographic tree and projects the DB
    into sub-DBs based on the patterns mined so far
  • Recursively mines the sub-databases
  • Less scalable than FP-Growth
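The core "project, then recurse" idea can be illustrated with a heavily simplified sketch. This is not the actual TreeProjection algorithm (which builds an explicit lexicographic tree and uses matrix-based counting); it only shows how a database is projected on an item in a fixed lexicographic order and mined recursively. `project` and `mine_projected` are hypothetical names, and the data is the T1 to T9 database from the slides.

```python
DB = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"], ["I2", "I1", "I4"], ["I1", "I3"],
    ["I2", "I3"], ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
# Assumption: plain lexicographic item order.
ORDER = sorted({item for t in DB for item in t})

def project(db, item, order):
    """Sub-database for `item`: from each transaction containing it,
    keep only the items that come after `item` in the lexicographic order."""
    rank = {x: r for r, x in enumerate(order)}
    return [[x for x in t if rank[x] > rank[item]] for t in db if item in t]

def mine_projected(db, order, min_sup, prefix=()):
    """Recursively mine frequent itemsets by projecting on each frequent item."""
    counts = {}
    for t in db:
        for x in t:
            counts[x] = counts.get(x, 0) + 1
    result = {}
    for item in order:
        if counts.get(item, 0) >= min_sup:
            pattern = prefix + (item,)
            result[pattern] = counts[item]
            sub_db = project(db, item, order)
            result.update(mine_projected(sub_db, order, min_sup, pattern))
    return result

patterns = mine_projected(DB, ORDER, min_sup=2)
```

Every recursive call materializes a new sub-database, which hints at why projection-based mining can be costlier in memory than sharing prefixes in one compact tree.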

10
FP-Tree construction 1/6
Desc. supp. sort








  • Min support 2
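The first step (one DB scan to count item supports, then dropping infrequent items and sorting each transaction by descending support) can be sketched as below. The raw item order inside each transaction is an assumption, since the slide's original table did not survive; ties (I1/I3 with 6, I4/I5 with 2) are broken by item name, which is also an assumption that happens to reproduce the order shown for T1 to T9.

```python
from collections import Counter

# Same items as T1..T9 on the slides, in an assumed (unsorted) raw order.
raw = {
    "T1": ["I1", "I2", "I5"],
    "T2": ["I2", "I4"],
    "T3": ["I2", "I3"],
    "T4": ["I1", "I2", "I4"],
    "T5": ["I1", "I3"],
    "T6": ["I2", "I3"],
    "T7": ["I1", "I3"],
    "T8": ["I1", "I2", "I3", "I5"],
    "T9": ["I1", "I2", "I3"],
}
MIN_SUP = 2

# First DB scan: support count of every item.
support = Counter(item for items in raw.values() for item in items)
# Keep only frequent items (all five pass min support 2 here).
frequent = {item: n for item, n in support.items() if n >= MIN_SUP}

def sort_transaction(items):
    """Drop infrequent items; order the rest by descending support, ties by name."""
    return sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))

sorted_db = {tid: sort_transaction(items) for tid, items in raw.items()}
for tid, items in sorted_db.items():
    print(tid, ", ".join(items))
```

The supports come out as I2: 7, I1: 6, I3: 6, I4: 2, I5: 2, so every transaction is rewritten in the order I2, I1, I3, I4, I5 before insertion into the tree.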

11
FP-Tree construction 2/6
Desc. supp. sort
T1 I2, I1, I5








12
FP-Tree construction 3/6
Desc. supp. sort
T1 I2, I1, I5
T2 I2, I4







13
FP-Tree construction 4/6
Desc. supp. sort
T1 I2, I1, I5
T2 I2, I4
T3 I2, I3






14
FP-Tree construction 5/6
Desc. supp. sort
T1 I2, I1, I5
T2 I2, I4
T3 I2, I3
T4 I2, I1, I4





15
FP-Tree construction 6/6
Desc. supp. sort
T1 I2, I1, I5
T2 I2, I4
T3 I2, I3
T4 I2, I1, I4
T5 I1, I3
T6 I2, I3
T7 I1, I3
T8 I2, I1, I3, I5
T9 I2, I1, I3
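Inserting the sorted transactions one by one builds the tree: a shared prefix only increments counts, and a new suffix starts a new branch. A minimal sketch follows; `FPNode`, `build_fp_tree` and `show` are hypothetical names, and node-links are kept as a plain list per item in a header table rather than as a chain of pointers.

```python
class FPNode:
    """One FP-tree node: item, count, parent link, children keyed by item."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(sorted_transactions):
    """Insert pre-sorted transactions; return (root, header_table).

    header_table maps each item to the list of its nodes (the node-links).
    """
    root = FPNode(None, None)
    header = {}
    for items in sorted_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:  # no shared prefix: start a new branch
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1  # shared prefix: just increment the count
            node = child
    return root, header

def show(node, depth=0):
    """Print the tree, one 'item:count' per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

# T1..T9, already sorted by descending support as on the slide.
SORTED_DB = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"], ["I2", "I1", "I4"], ["I1", "I3"],
    ["I2", "I3"], ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
root, header = build_fp_tree(SORTED_DB)
show(root)
```

For this data the root gets two children, I2:7 and I1:2; under I2 hang I1:4, I4:1 and I3:2, and the nine transactions collapse into 10 tree nodes.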
16
Mining of the FP-Tree 1/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {I2, I1: 1}, {I2, I1, I3: 1} | <I2: 2, I1: 2> | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}



17
Mining of the FP-Tree 2/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {I2, I1: 1}, {I2, I1, I3: 1} | <I2: 2, I1: 2> | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {I2, I1: 1}, {I2: 1} | <I2: 2> | {I2, I4: 2}


18
Mining of the FP-Tree 3/4
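Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {I2, I1: 1}, {I2, I1, I3: 1} | <I2: 2, I1: 2> | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {I2, I1: 1}, {I2: 1} | <I2: 2> | {I2, I4: 2}
I3 | {I2, I1: 2}, {I2: 2}, {I1: 2} | <I2: 4, I1: 2>, <I1: 2> | {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
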

19
Mining of the FP-Tree 4/4
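Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {I2, I1: 1}, {I2, I1, I3: 1} | <I2: 2, I1: 2> | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {I2, I1: 1}, {I2: 1} | <I2: 2> | {I2, I4: 2}
I3 | {I2, I1: 2}, {I2: 2}, {I1: 2} | <I2: 4, I1: 2>, <I1: 2> | {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1 | {I2: 4} | <I2: 4> | {I2, I1: 4}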
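The table can be reproduced with a compact recursive sketch: for each item (least frequent first), collect its conditional pattern base from the prefix paths of its nodes, build a conditional FP-tree from that base, and recurse. The names `Node`, `build`, `mine` and `fp_growth` are hypothetical; the sketch assumes conditional trees keep the global descending-support order with ties broken by item name, and it omits the single-path shortcut described in the paper.

```python
from collections import defaultdict

class Node:
    """FP-tree node: item, parent link, count, children keyed by item."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build(paths, min_sup, rank):
    """Build an FP-tree from weighted (items, count) paths; return header, supports."""
    support = defaultdict(int)
    for items, n in paths:
        for i in items:
            support[i] += n
    support = {i: s for i, s in support.items() if s >= min_sup}
    root, header = Node(None, None), defaultdict(list)
    for items, n in paths:
        node = root
        # Keep frequent items only, in the global descending-support order.
        for i in sorted((i for i in items if i in support), key=rank.get):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])  # node-link
            node = node.children[i]
            node.count += n
    return header, support

def mine(paths, min_sup, rank, suffix=()):
    """Yield (pattern, support) for every frequent pattern ending with `suffix`."""
    header, support = build(paths, min_sup, rank)
    # Least frequent item first: I5, I4, I3, I1, I2 on the slides' data.
    for item in sorted(support, key=rank.get, reverse=True):
        pattern = (item,) + suffix
        yield pattern, support[item]
        base = []  # conditional pattern base: prefix paths weighted by node count
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path[::-1], node.count))
        yield from mine(base, min_sup, rank, pattern)

def fp_growth(transactions, min_sup):
    """Fix the global item order (support desc, ties by name), then mine."""
    counts = defaultdict(int)
    for t in transactions:
        for i in t:
            counts[i] += 1
    order = sorted(counts, key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    return dict(mine([(t, 1) for t in transactions], min_sup, rank))

DB = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"], ["I1", "I2", "I4"], ["I1", "I3"],
    ["I2", "I3"], ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
patterns = fp_growth(DB, min_sup=2)
```

For I5 the conditional pattern base is {I2, I1: 1}, {I2, I1, I3: 1}; I3 drops out (support 1), leaving <I2: 2, I1: 2>, exactly as in the table's first row. No candidate itemsets are generated along the way.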
20
How much better is it? 1/3
  • Runtime on sparse data

21
How much better is it? 2/3
  • Runtime on mixed data

22
How much better is it? 3/3
  • Compactness

23
Is it Original?
  • A lot of methods try to improve Apriori
  • Hashing
  • Transaction reduction
  • Partitioning
  • Sampling
  • TreeProjection uses a similar structure, but it is
    still a different method

24
Importance over time
  • Basic primitive (a strong foundation for a tall
    building)
  • Performance becomes very important as databases
    grow huge
  • So does scalability
  • FP-Growth offers both performance and scalability

25
Conclusion
  • An important method for solving important DM
    tasks
  • Fast
  • Compact
  • Scalable (db projection/tree on disk)
