Title: Frequent Itemset Mining on Graphics Processors

Slide 1: Frequent Itemset Mining on Graphics Processors
- Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He¹, Qiong Luo
- Hong Kong Univ. of Sci. and Tech.
- ¹ Microsoft Research Asia
- Presenter: Wenbin Fang
Slide 2: Outline
- Contribution
- Introduction
- Design
- Evaluation
- Conclusion
Slide 3: Contribution
- Accelerate the Apriori algorithm for Frequent Itemset Mining using Graphics Processors (GPUs).
- Two GPU implementations:
  - Pure Bitmap-based Implementation (PBI): processing entirely on the GPU.
  - Trie-based Implementation (TBI): GPU/CPU co-processing.
Slide 4: Frequent Itemset Mining (FIM)
Aims at finding groups of items, or itemsets, that co-occur frequently in a transaction database.
Minimum support = 2.
1-itemsets (frequent items) and their supports: A: 3, B: 2, C: 3, D: 4.
Slide 5: Frequent Itemset Mining (FIM)
Aims at finding groups of items, or itemsets, that co-occur frequently in a transaction database.
Minimum support = 2.
1-itemsets (frequent items): A, B, C, D.
2-itemsets and their supports: AB: 2, AC: 2, AD: 3, BD: 2, CD: 3.
Slide 6: Frequent Itemset Mining (FIM)
Aims at finding groups of items, or itemsets, that co-occur frequently in a transaction database.
Minimum support = 2.
1-itemsets (frequent items): A, B, C, D.
2-itemsets: AB, AC, AD, BD, CD.
3-itemsets: ABD, ACD.
Slide 7: Graphics Processors (GPUs)
- Exist in commodity machines, mainly for graphics rendering.
- Specialized for compute-intensive, highly data-parallel applications.
- Compared with CPUs, GPUs provide 10x the computational horsepower and 10x the memory bandwidth.

[Figure: CPU vs. GPU architecture -- from the NVIDIA CUDA Programming Guide]
Slide 8: Programming on GPUs
- OpenGL/DirectX
- AMD CTM
- NVIDIA CUDA
SIMD parallelism (Single Instruction, Multiple Data).

[Figure: the many-core architecture model of the GPU]
Slide 9: Hierarchical multi-threading in NVIDIA CUDA

[Figure: thread blocks, each containing multiple warps]

- A warp = 32 GPU threads -> the SIMD scheduling unit.
- Threads are grouped into thread blocks; thread blocks form a grid.
Slide 10: General-Purpose GPU Computing (GPGPU)
- Applications utilizing GPUs:
  - Scientific computing
    - Molecular dynamics simulation
    - Weather forecasting
    - Linear algebra
    - Computational finance
    - Folding@home, SETI@home
  - Database applications
    - Basic DB operators [SIGMOD'04]
    - Sorting [SIGMOD'06]
    - Join [SIGMOD'08]
Slide 11: Our work
- As a first step, we consider GPU-based Apriori, with the intention to extend to another efficient FIM algorithm, FP-growth.
- Why Apriori?
  - A classic algorithm for mining frequent itemsets.
  - Also applied in other data mining tasks, e.g., clustering and functional dependency discovery.
Slide 12: The Apriori Algorithm
Input: 1) transaction database; 2) minimum support.
Output: all frequent itemsets.

    L1 = all frequent 1-itemsets
    k = 2
    while (Lk-1 != empty)
        // Generate candidate k-itemsets.
        Ck <- self join on Lk-1
        Ck <- (k-1)-subset test on Ck
        // Generate frequent k-itemsets.
        Lk <- support counting on Ck
        k += 1

[Figure: the level-wise loop, alternating candidate k-itemsets and frequent k-itemsets from frequent 1-itemsets up to frequent K-itemsets]
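The level-wise loop above can be sketched in Python (a minimal illustrative sketch, not the paper's GPU code; the function and variable names are my own):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise Apriori: transactions is a list of sets of items,
    minsup is an absolute support count."""
    # L1 = all frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s for s, c in counts.items() if c >= minsup}
    frequent = set(Lk)
    k = 2
    while Lk:
        # Ck <- self join on Lk-1: unions of two (k-1)-itemsets with k items
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Ck <- (k-1)-subset test: every (k-1)-subset must be frequent
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # Lk <- support counting on Ck
        Lk = {c for c in Ck
              if sum(1 for t in transactions if c <= t) >= minsup}
        frequent |= Lk
        k += 1
    return frequent
```

On the slides' running example (minimum support 2), this yields the frequent 1-itemsets A, B, C, D, the 2-itemsets AB, AC, AD, BD, CD, and the 3-itemsets ABD, ACD.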
Slide 13: Outline
- Contribution
- Introduction
- Design
- Evaluation
- Conclusion
Slide 14: GPU-based Apriori
Input: 1) transaction database; 2) minimum support.
Output: all frequent itemsets.
Both implementations follow the Apriori loop of Slide 12 but differ in where each step runs:
- Pure Bitmap-based Impl. (PBI): itemsets as bitmaps; candidate generation on the GPU; transactions as bitmaps; support counting on the GPU.
- Trie-based Impl. (TBI): itemsets in a trie; candidate generation on the CPU; transactions as bitmaps; support counting on the GPU.
Slide 15: Horizontal and vertical data layout
- Horizontal data layout: support counting scans all transactions.
- Vertical data layout: support counting is done on specific itemsets:
  - Intersect two transaction lists.
  - Count the number of transactions in the intersection result.
Slide 16: Bitmap representation for transactions
[Figure: a bitmap with one row per itemset and one column per transaction]
- Intersection = bitwise AND operation.
- Support = counting of 1s in a string of bits.
Slide 17: Lookup table
Counting 1s via a precomputed lookup table over 16-bit values (2^16 = 65536 entries, 1 byte each):

    TABLE[12] = 2  // decimal 12 = binary 1100 (a string of bits with two 1s)

The table resides in GPU constant memory:
- Cacheable
- 64 KB
- Shared by all GPU threads
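The lookup scheme can be sketched in Python (illustrative only; on the GPU the table is an array of bytes in constant memory):

```python
# Precompute the number of 1s for every 16-bit value: 2^16 = 65536 entries,
# one byte each, which is why the table fits in 64 KB of constant memory.
TABLE = [bin(v).count('1') for v in range(1 << 16)]

def count_ones(word):
    """Count the 1s of a 32-bit word with two 16-bit table lookups."""
    return TABLE[word & 0xFFFF] + TABLE[(word >> 16) & 0xFFFF]
```

For example, TABLE[12] is 2, since decimal 12 is binary 1100.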
Slide 18: Support Counting on the GPU (Cont.)
[Figure: thread blocks 1 and 2 counting via the LOOKUP TABLE]
- Intersect two transaction lists.
- Count the number of transactions in the intersection result.
Slide 19: Support Counting on the GPU (Cont.)
[Figure: within a thread block, each thread accesses the bitmaps through the vector type int4 (four ints in one instruction). Threads AND the AB and AD transaction bitmaps to produce the ABD bitmap, look up the count of 1s for every 16-bit integer in the LOOKUP TABLE, then combine the per-thread counts with a parallel reduce to obtain the support for this itemset: support = 2.]
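The per-itemset computation in the figure can be sketched in Python (illustrative; the names are my own, and the GPU's int4 vector loads and parallel reduction become a plain loop and sum):

```python
def bitmap_support(bitmap_a, bitmap_b):
    """Support of the itemset formed from two itemsets, given their
    transaction bitmaps as lists of 32-bit integer words.

    Each word pair is intersected with bitwise AND (the GPU handles four
    ints at once via int4); the 1s of each result word are counted (via
    the lookup table on the GPU); and the per-word counts are summed
    (a parallel reduce across threads on the GPU)."""
    return sum(bin(a & b).count('1') for a, b in zip(bitmap_a, bitmap_b))
```

For instance, if AB occurs in transactions 1 and 2 (bitmap 0b0011) and AD in transactions 1, 2, and 3 (bitmap 0b0111), the support of ABD comes out as 2.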
Slide 20: GPU-based Apriori
Within the Apriori loop (join, subset test, support counting):
- Candidate generation
  - Join
    - e.g., join two 2-itemsets to obtain a candidate 3-itemset: AC JOIN AD -> ACD
  - Subset test
    - e.g., test all 2-subsets of ACD: AC, AD, CD
- Support counting on the GPU
Slide 21: GPU-based Apriori
(Same loop as Slide 12; this slide highlights PBI's candidate generation.)
- Pure Bitmap-based Impl. (PBI): itemsets as bitmaps; candidate generation on the GPU.
- Trie-based Impl. (TBI): itemsets in a trie; candidate generation on the CPU.
- Both: transactions as bitmaps; support counting on the GPU.
Slide 22: Pure Bitmap-based Impl. (PBI)
[Figure: an itemset bitmap with one row per itemset and one column per item]
- Bitwise OR in the join (e.g., AB JOIN AD -> ABD).
- Binary search in the subset test (e.g., the 2-subsets of ABD: AB, AD, BD).
- One GPU thread generates one candidate itemset.
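PBI's join and subset test can be sketched in Python (illustrative only; an integer stands in for an itemset bitmap with one bit per item, and the serial loops stand in for one GPU thread per candidate):

```python
from bisect import bisect_left

def pbi_generate(prev_level, k):
    """Generate candidate k-itemsets from a sorted list of frequent
    (k-1)-itemsets, each encoded as an integer bitmap over the items.

    Join = bitwise OR of two itemset bitmaps; the (k-1)-subset test
    drops each item in turn and binary-searches the previous level."""
    candidates = set()
    for i, a in enumerate(prev_level):
        for b in prev_level[i + 1:]:
            c = a | b                       # join: bitwise OR
            if bin(c).count('1') != k:      # must yield exactly k items
                continue
            ok = True
            m = c
            while m:
                bit = m & -m                # isolate the lowest set bit
                sub = c ^ bit               # (k-1)-subset: drop that item
                j = bisect_left(prev_level, sub)
                if j == len(prev_level) or prev_level[j] != sub:
                    ok = False              # an infrequent subset: prune
                    break
                m ^= bit
            if ok:
                candidates.add(c)
    return sorted(candidates)
```

With items A, B, C, D on bits 0 to 3 and the frequent 2-itemsets of the running example, this produces exactly the candidates ABD and ACD (ABC and BCD are pruned because BC is not frequent).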
Slide 23: GPU-based Apriori
(Same loop as Slide 12; this slide highlights TBI's candidate generation.)
- Pure Bitmap-based Impl. (PBI): itemsets as bitmaps; candidate generation on the GPU.
- Trie-based Impl. (TBI): itemsets in a trie; candidate generation on the CPU.
- Both: transactions as bitmaps; support counting on the GPU.
Slide 24: Trie-based Impl. (TBI)
[Figure: a trie rooted at depth 0; depth 1 holds the 1-itemsets A, B, C, D; depth 2 holds the 2-itemsets AB, AC, AD, BD, CD.]
Candidate generation joins sibling itemsets and applies the subset test:
- AB JOIN AC -> ABC, subsets AB, AC, BC (pruned: BC is not frequent)
- AB JOIN AD -> ABD, subsets AB, AD, BD
- AC JOIN AD -> ACD, subsets AC, AD, CD
Candidate 3-itemsets: ABD, ACD.
Done on the CPU because the trie causes: 1) irregular memory access; 2) branch divergence.
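The sibling join on the trie can be sketched in Python (illustrative; sorted tuples of items stand in for trie paths, so itemsets sharing the same parent prefix are siblings):

```python
def trie_generate(prev_level, k):
    """Generate candidate k-itemsets from a sorted list of frequent
    (k-1)-itemsets given as tuples of items.

    Only siblings (itemsets sharing their first k-2 items, i.e. the
    same trie parent) are joined; each candidate then passes the
    (k-1)-subset test against the previous level."""
    prev = set(prev_level)
    candidates = []
    for i, a in enumerate(prev_level):
        for b in prev_level[i + 1:]:
            if a[:-1] != b[:-1]:        # not siblings: stop extending a
                break
            c = a + (b[-1],)            # join siblings, e.g. AB + AD -> ABD
            # (k-1)-subset test: drop each item and check the previous level
            if all(c[:j] + c[j + 1:] in prev for j in range(k)):
                candidates.append(c)
    return candidates
```

On the slide's trie this joins AB with AC (pruned, since BC is infrequent), AB with AD (giving ABD), and AC with AD (giving ACD). The pointer chasing and data-dependent branching in this traversal are why the paper keeps it on the CPU.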
Slide 25: Outline
- Contribution
- Introduction
- Design
- Evaluation
- Conclusion
Slide 26: Experimental setup
[Table: platform configuration]
[Table: experimental datasets, with columns for density, average transaction length, and number of items]
Slide 27: Apriori Implementations
Including the best Apriori implementation in the FIMI repository (Frequent Itemset Mining Implementations Repository).
Slide 28: TBI-CPU vs. GOETHALS
[Charts: dense dataset Chess; sparse dataset Retail]
The impact of using the bitmap representation for transactions in support counting: 1.2x--25.7x speedup.
Slide 29: TBI-GPU vs. TBI-CPU
[Charts: sparse dataset Retail; dense dataset Chess]
The impact of GPU acceleration in support counting: 1.1x--7.8x speedup.
Slide 30: PBI-GPU vs. TBI-GPU
[Charts: sparse dataset Retail; dense dataset Chess]
The impact of bitmap-based vs. trie-based itemset representation in candidate generation: PBI-GPU is faster on the dense dataset; TBI-GPU is better on the sparse dataset.
Slide 31: PBI-GPU/TBI-CPU vs. BORGELT
[Charts: sparse dataset Retail; dense dataset Chess]
Comparison to the best Apriori implementation in FIMI: 1.2x--24.2x speedup.
Slide 32: Comparison to FP-growth
With minimum support 1, 60, and 0.01.
[Chart: comparison against the FP-growth implementation from the PARSEC benchmark suite]
Slide 33: Conclusion
- GPU-based Apriori:
  - Pure Bitmap-based impl.:
    - Bitmap representation for itemsets.
    - Bitmap representation for transactions.
    - GPU processing.
  - Trie-based impl.:
    - Trie representation for itemsets.
    - Bitmap representation for transactions.
    - GPU/CPU co-processing.
- Better than CPU-based Apriori.
- Still worse than CPU-based FP-growth.
Slide 34: Backup Slide -- Time Breakdown
[Charts: time breakdown on the dense dataset Chess and on the sparse dataset Retail]