Parallelizing FP-growth Frequent Patterns Mining Algorithm Using OpenMP

1
Parallelizing FP-growth Frequent Patterns Mining
Algorithm Using OpenMP

Gengbin Zheng

2
Parallel Computing and Data Mining

Performance issues in data mining: scalability
Parallel computing: what is it?
With parallel computing, a parallel program can either
decrease the runtime needed to solve a problem, or
increase the size of the problem that can be solved.
Parallel computing gives you more performance to throw at your problems.
Data mining problems can benefit from parallel computing:
Intensive computation
Mining under time constraints

3
FP-Growth algorithm brief review

Mining frequent patterns without candidate generation
Frequent pattern tree (FP-Tree)
FP-Tree based pattern growth mining method

main()
  (1) find frequent 1-items
  (2) build a global FP-tree from the transaction database
  (3) fptree_mining() // recursively mine the FP-tree
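
Step (1) of main() can be illustrated with a short C sketch. This is only a minimal illustration, not the presenter's code: it assumes transactions are rows of small integer item IDs terminated by -1, with each item appearing at most once per transaction. The names count_item_support, frequent_items, NUM_ITEMS and MAX_LEN are hypothetical.

```c
#include <stddef.h>

#define NUM_ITEMS 5   /* hypothetical: item IDs are 0..NUM_ITEMS-1 */
#define MAX_LEN   8   /* hypothetical: row width, including the -1 terminator */

/* Count in how many transactions each item occurs (its support).
   Each transaction is a row of item IDs terminated by -1. */
void count_item_support(int db[][MAX_LEN], size_t n_trans,
                        int support[NUM_ITEMS])
{
    for (int i = 0; i < NUM_ITEMS; i++)
        support[i] = 0;
    for (size_t t = 0; t < n_trans; t++)
        for (int k = 0; k < MAX_LEN && db[t][k] >= 0; k++)
            support[db[t][k]]++;
}

/* Write the IDs of items with support >= min_support into frequent[]
   and return how many were written. */
int frequent_items(const int support[NUM_ITEMS], int min_support,
                   int frequent[NUM_ITEMS])
{
    int n = 0;
    for (int i = 0; i < NUM_ITEMS; i++)
        if (support[i] >= min_support)
            frequent[n++] = i;
    return n;
}
```

The resulting frequent 1-item list is what the later steps would use to order items when building the global FP-tree.
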
4
FP-Growth Sequential algorithm

Procedure fptree_mining()
Input: FP-tree, item_table (frequent 1-item set), cond
(1) for each item a in the frequent item_table do {
(2)   if the tree at this level contains only a single path
(3)   then branch_mine() // generate all patterns along the path
(4)   else {
(5)     cond_table = conditional_table_construct() // construct a conditional transaction table for the condition and get the frequent 1-item set in cond_item_table
(6)     cond_fptree = fptree_conditional_build(cond_table) // build a conditional FP-tree using the conditional table
(7)     cond = cond + a // pattern growth by appending a
(8)     fptree_mining(cond_fptree, cond_item_table, cond, condpt1) // recursively mine the conditional FP-tree
(9)   }
(10) }
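
The single-path case in step (3) can be sketched as follows. The slides do not show branch_mine(), so this is an assumed illustration of the usual idea: every non-empty combination of nodes on the path is a pattern, and its support is the count of the deepest chosen node. The struct path_node type and the function signature are hypothetical, and the sketch ignores the cond suffix that a full implementation would append to each pattern.

```c
#include <stdio.h>

/* One node on a single-path FP-tree: an item ID and its count.
   Counts never increase from the root (index 0) towards the leaf. */
struct path_node { int item; int count; };

/* Enumerate every non-empty combination of nodes on the path.
   The support of a combination is the count of its deepest node,
   which is the smallest count among the chosen nodes.
   Returns the number of patterns meeting min_support.
   Assumes 0 < len < 31 so the bitmask fits in an unsigned int. */
int branch_mine(const struct path_node *path, int len,
                int min_support, int print)
{
    int n_patterns = 0;
    for (unsigned mask = 1; mask < (1u << len); mask++) {
        int support = 0;
        for (int i = 0; i < len; i++)
            if (mask & (1u << i))
                support = path[i].count;   /* deepest chosen node wins */
        if (support < min_support)
            continue;
        if (print) {
            for (int i = 0; i < len; i++)
                if (mask & (1u << i))
                    printf("%d ", path[i].item);
            printf(": %d\n", support);
        }
        n_patterns++;
    }
    return n_patterns;
}
```

Because a path of length n yields up to 2^n - 1 patterns, this shortcut avoids building further conditional trees for that branch.
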

5
OpenMP programming model

Multi-threaded programming model
Fork-Join Parallelism
Master thread spawns a team of threads
Parallelism is added incrementally, i.e., the sequential program evolves into a parallel program.

6
OpenMP: How to parallelize?

OpenMP is usually used to parallelize loops
Find the most time-consuming loops.
Split them up between threads.
Simple example

Split up this loop between multiple threads

Sequential program:
  int main() {
    double Res[1000];
    for (int i = 0; i < 1000; i++) {
      compute(Res[i]);
    }
  }

Parallel program:
  int main() {
    double Res[1000];
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
      compute(Res[i]);
    }
  }
7
OpenMP Synchronization

OpenMP is a shared memory model.
Threads communicate by sharing variables.
Unintended sharing of data can lead to race conditions:
the program's outcome changes as the threads are scheduled differently (nondeterministic).
To control race conditions:
Use synchronization to protect against data conflicts.
Synchronization is expensive, so change how data is stored to minimize the need for synchronization.
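
A small example of protecting a shared variable, written as a hedged sketch rather than code from the presentation. The function count_even is hypothetical; it uses the standard `#pragma omp critical` directive so that only one thread at a time updates the shared counter. If the code is compiled without OpenMP support, the pragmas are ignored and the loop runs sequentially with the same result.

```c
/* Count how many values in data[0..n-1] are even, using a counter
   that is shared by all threads in the team. */
int count_even(const int *data, int n)
{
    int count = 0;   /* shared: declared outside the parallel loop */

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            /* Without this directive, two threads could read, increment
               and write count at the same time and lose an update. */
            #pragma omp critical
            count++;
        }
    }
    return count;
}
```

Every even element pays for entering the critical section, which is the cost the slide refers to when it calls synchronization expensive.
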

8
Parallelization of FP-Growth

Task: parallelize the for loop -> parallel for
Conceptually parallelizable: each iteration is a subtask, and subtasks can execute in any order
But not ready yet, because of race conditions on global variables
Two strategies to handle the race conditions:
Privatize
The condition pattern variable is passed as an argument in the recursive function but is only used within one iteration, so each thread can work on its own copy
Critical section/Atomic
The global frequent counter array is updated concurrently by all threads; use synchronization to prevent conflicts
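
The two strategies can be sketched together in C. This is an assumed illustration only: mine_all, pattern_count and MAX_PATTERN are hypothetical names, and the loop body stands in for the real recursive mining. The condition pattern is copied into a buffer declared inside the loop (so it is private to the executing thread), while the shared counter array is updated with `#pragma omp atomic`.

```c
#include <string.h>

#define MAX_PATTERN 16   /* hypothetical bound on pattern length */

/* Hypothetical global: pattern_count[k] = number of patterns of length k. */
int pattern_count[MAX_PATTERN + 1];

/* For each item, grow the condition pattern by that item and record it.
   Requires cond_len < MAX_PATTERN. */
void mine_all(const int *items, int n_items, const int *cond, int cond_len)
{
    #pragma omp parallel for
    for (int i = 0; i < n_items; i++) {
        /* Privatize: variables declared inside the loop body are private,
           so each iteration grows its own copy of the condition pattern. */
        int my_cond[MAX_PATTERN];
        memcpy(my_cond, cond, (size_t)cond_len * sizeof(int));
        my_cond[cond_len] = items[i];   /* pattern growth by appending */
        int my_len = cond_len + 1;

        /* Atomic: the counter array is shared by all threads, so the
           increment must not be interleaved with another thread's. */
        #pragma omp atomic
        pattern_count[my_len]++;
    }
}
```

Privatization removes the conflict entirely, whereas the atomic update only serializes it, which is why the later optimization targets the counter array.
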

9
Load balance

Load imbalance problem
The work load of each iteration can vary dramatically
Dynamic thread scheduling in OpenMP
A thread executes a chunk of iterations, then waits for another assignment from the work pool
Compiler support!
Better load balancing if iterations are sorted by work load (hint)

[Figure: a work pool handing out chunks of iterations to Thread0, Thread1, Thread2 and Thread3]
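
Dynamic scheduling as described on this slide can be requested with the standard `schedule(dynamic, chunk)` clause. The sketch below is an assumption-laden illustration: work() is a hypothetical stand-in for mining one item, with a cost that grows with the iteration index, and the loop runs from the most expensive iteration to the cheapest to mimic the "sort by work load" hint.

```c
/* Hypothetical stand-in for one mining subtask: iteration i performs
   about i * 1000 inner steps, so the cost per iteration varies widely. */
long work(int i)
{
    long s = 0;
    for (long k = 0; k < (long)i * 1000; k++)
        s += k % 7;
    return s;
}

/* Run all n subtasks. With schedule(dynamic, 1), each thread takes one
   iteration from the pool, finishes it, then comes back for the next,
   so a thread stuck on a heavy iteration does not hold up light ones. */
void run_all(long *result, int n)
{
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = n - 1; i >= 0; i--)
        result[i] = work(i);
}
```

Starting with the heaviest iterations means the small ones are left to fill in the gaps at the end, which tends to even out the finishing times of the threads.
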
10
GuideView screenshots
11
Optimization: reduce sync overhead

Remove synchronization
Reduction on array
Reduction with the SUM operation on an array
However, OpenMP reduction support is only for scalar variables
Manually implement array-based reduction
Manually allocate a counter array for each thread
Each thread works on its own copy of the array
Sum up when the loop finishes
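
The manual array reduction can be sketched as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: count_items is a hypothetical function that counts item IDs, and the serial fallbacks for omp_get_max_threads() and omp_get_thread_num() exist only so the sketch also builds without OpenMP.

```c
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Count occurrences of each item ID (0..n_items-1) in data[0..n-1]
   with no synchronization inside the loop. Returns 0 on success,
   -1 if memory allocation fails. */
int count_items(const int *data, int n, int *counts, int n_items)
{
    int n_threads = omp_get_max_threads();

    /* One zero-initialized counter array per thread, stored back to back. */
    int *local = calloc((size_t)n_threads * (size_t)n_items, sizeof(int));
    if (!local)
        return -1;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* Each thread only ever touches its own slice of local[]. */
        int *mine = local + (size_t)omp_get_thread_num() * (size_t)n_items;
        mine[data[i]]++;
    }

    /* Sum up the per-thread copies once the loop has finished. */
    for (int j = 0; j < n_items; j++) {
        counts[j] = 0;
        for (int t = 0; t < n_threads; t++)
            counts[j] += local[(size_t)t * (size_t)n_items + j];
    }
    free(local);
    return 0;
}
```

The trade-off is extra memory (one array per thread) and a short sequential summation in exchange for a loop body free of locks and atomics.
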

12
GuideView screenshots after optimization
13
(No Transcript)
14
Speedup
15
Future work

The resulting frequent patterns can be stored in a compressed tree structure
Each thread maintains its own private compressed tree to store result patterns
Combine these trees (sum up) when mining has finished.

16
Conclusion

New parallelized FP-Growth program
Removed locks/synchronization from the parallel region
Good parallel speedup and parallel efficiency
A good data mining algorithm designed with a divide-and-conquer strategy is easier to parallelize
OpenMP is efficient for parallelizing data mining algorithms on SMP machines, but could improve its syntax for array privatization and reduction

17
Acknowledgements

Used Intel Threading Tools (Guide and GuideView) from Intel (license from Intel)
Help from the Intel Parallel Applications Center with the performance tuning tools
