Parallelizing FP-growth Frequent Patterns Mining Algorithm Using OpenMP (PowerPoint presentation transcript)

1
Parallelizing FP-growth Frequent Patterns Mining
Algorithm Using OpenMP
  • Gengbin Zheng

2
Parallel Computing and Data Mining
  • Performance issue in data mining: scalability
  • Parallel computing: what is it?
  • With parallel computing, a parallel program can either:
  • decrease the runtime needed to solve a problem, or
  • increase the size of the problem that can be solved
  • Parallel computing gives you more performance to throw at your problems
  • Data mining problems can benefit from parallel computing:
  • intensive computation
  • mining under time constraints

3
FP-Growth algorithm brief review
  • Mining frequent patterns without candidate
    generation
  • Frequent pattern tree (FP-Tree)
  • FP-Tree based pattern growth mining method

main()
(1) find frequent 1-items
(2) build a global FP-tree from the transaction database
(3) fptree_mining()        // recursively mine the FP-tree
4
FP-Growth Sequential algorithm
Procedure fptree_mining()
Input: FP-tree, item_table (frequent 1-item set), cond

(1)  for each item a in the frequent item_table do {
(2)    if the tree at this level contains only a single path
(3)    then branch_mine()       // generate all patterns along the path
(4)    else {
(5)      cond_table = conditional_table_construct()
         // construct a conditional transaction table for the condition
         // and get its frequent 1-item set in cond_item_table
(6)      cond_fptree = fptree_conditional_build(cond_table)
         // build a conditional FP-tree using the conditional table
(7)      cond = cond + a        // pattern growth by appending a
(8)      fptree_mining(cond_fptree, cond_item_table, cond, condpt1)
         // recursively mine the conditional FP-tree
(9)    }
(10) }

5
OpenMP programming model
  • Multi-threaded programming model
  • Fork-Join Parallelism
  • Master thread spawns a team of threads
  • Parallelism is added incrementally, i.e. the sequential program evolves into a parallel program.

6
OpenMP How to parallelize?
  • OpenMP is usually used to parallelize loops
  • Find most time consuming loops.
  • Split them up between threads.
  • Simple example

Split up this loop between multiple threads:

Sequential program:

void main() {
  double Res[1000];
  for (int i = 0; i < 1000; i++)
    compute(Res[i]);
}

Parallel program:

void main() {
  double Res[1000];
  #pragma omp parallel for
  for (int i = 0; i < 1000; i++)
    compute(Res[i]);
}
7
OpenMP Synchronization
  • OpenMP is a shared-memory model.
  • Threads communicate by sharing variables.
  • Unintended sharing of data can lead to race conditions:
  • the program's outcome changes as the threads are scheduled differently (nondeterminism)
  • To control race conditions:
  • use synchronization to protect data conflicts.
  • Synchronization is expensive, so:
  • change how data is stored to minimize the need for synchronization.

8
Parallelization of FP-Growth
  • Task: parallelize the for loop with OpenMP's parallel for
  • Conceptually parallelizable: each iteration is a subtask and can execute in any order
  • But not ready yet, due to race conditions on global variables
  • Two strategies to handle the race conditions:
  • Privatize
  • the conditional pattern variable: passed as an argument to the recursive function, but used only within one iteration
  • Critical section / atomic
  • the global frequent-item counter array: updated concurrently by all threads, so use synchronization to prevent conflicts

9
Load balance
  • Load imbalance problem:
  • the workload in each iteration can vary dramatically
  • Dynamic thread scheduling in OpenMP:
  • a thread executes a chunk of iterations, then waits for another assignment from the work pool
  • Compiler support!
  • Load balancing improves further if iterations are sorted by workload (as a hint)

[Figure: work pool dispatching iteration chunks to Thread0-Thread3]
10
GuideView screenshots
11
Optimization: reducing synchronization overhead
  • Remove synchronization:
  • reduction on an array
  • reduction with the SUM operation over an array
  • However, OpenMP's reduction clause supports only scalar variables
  • Manually implement the array-based reduction:
  • allocate a counter array for each thread
  • each thread works on its own copy of the array
  • sum the copies when the loop finishes

12
GuideView screenshots after optimization
13
(screenshot only; no transcript)
14
Speedup
15
Future work
  • The mining results (frequent patterns) can be stored in a compressed tree structure
  • Each thread maintains its own private compressed tree to store the result patterns
  • Combine these trees (sum them up) when mining finishes

16
Conclusion
  • A new parallelized FP-Growth program
  • Removed locks/synchronization from the parallel region
  • Good parallel speedup and parallel efficiency
  • A good data mining algorithm designed with a divide-and-conquer strategy is easier to parallelize
  • OpenMP is efficient for parallelizing data mining algorithms on SMP machines; it could improve in its syntax for array privatization and reduction

17
Acknowledgements
  • Used the Intel Threading Tools (Guide and GuideView), under license from Intel
  • Help from the Intel Parallel Applications Center with the performance-tuning tools