Title: Issues in Parallelizing Pattern Mining Workloads
1Issues in Parallelizing Pattern Mining Workloads
- Shirish Tatikonda
- CSE 788Z11
05th December, 2006
2Outline
- Pattern mining algorithms
- Parallelization
- Issues
- Possible solutions
3Pattern Mining Algorithms
- Patterns
- Transactions
- Sequences
- Trees
- Graphs
- Goal
- Given a large database of transactions,
- a threshold s
- Need the set of all frequent patterns
- Frequent: number of occurrences ≥ s
Walmart Data
- T1: Milk, Bread, Beer, Diaper
- T2: Milk, Beer
- T3: Milk, Bread, Ice cream
- T4: Bread, Beer, Diaper
- s = 3
- Frequent: {Milk}, {Bread}, {Beer}, {Bread, Beer}
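The definition of a frequent pattern can be made concrete with a minimal Python sketch. It assumes transactions are plain itemsets; the names `support` and `is_frequent` and the toy database are hypothetical and do not come from the slides.

```python
def support(pattern, database):
    """Number of transactions that contain every item of `pattern`."""
    return sum(1 for transaction in database if pattern <= transaction)


def is_frequent(pattern, database, s):
    """A pattern is frequent when its support reaches the threshold s."""
    return support(pattern, database) >= s


# Hypothetical toy database of itemset transactions (illustration only).
toy_db = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]

print(support(frozenset({"a"}), toy_db))
print(is_frequent(frozenset({"a", "b"}), toy_db, s=3))
```

For sequences, trees, or graphs the containment test `pattern <= transaction` would have to be replaced by the matching subsequence, subtree, or subgraph check.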
4Generic Approach
- Candidate Generation
- Level-wise approach
- Pattern growth approach
- Support Counting
- Evaluate each candidate
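The two steps above, candidate generation and support counting, can be sketched for the level-wise case. This is a minimal Apriori-style illustration for itemsets only, not the presenter's implementation; the function name `level_wise` and the toy database are assumptions.

```python
from itertools import combinations


def level_wise(database, s):
    """Level-wise mining: candidates of size k+1 are built from frequent patterns of size k."""

    def support(pattern):
        return sum(1 for t in database if pattern <= t)

    items = set().union(*database)
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= s}
    frequent = {}
    while level:
        for pattern in level:
            frequent[pattern] = support(pattern)
        size = len(next(iter(level))) + 1
        # Candidate generation: join pairs of frequent patterns from the current level.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Keep a candidate only if all of its subsets one item smaller are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        }
        # Support counting: evaluate each surviving candidate against the database.
        level = {c for c in candidates if support(c) >= s}
    return frequent


# Hypothetical toy database (illustration only).
toy_db = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]
result = level_wise(toy_db, s=2)
```

Each pass of the loop handles one level, so all candidates of a given size are counted before any larger candidate is generated.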
5Pattern growth approach
- Depth-first traversal
- Equivalence classes
- Seed pattern
- Growing method
- Apriori Principle
- For pruning
Figure: Search Space Traversal (legend: Seed, Frequent, Infrequent, Pruned)
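A depth-first pattern growth traversal with pruning might look like the following minimal sketch. It assumes itemset patterns and a fixed lexicographic item order, so that every seed item roots one equivalence class; `pattern_growth` and the toy database are hypothetical names.

```python
def pattern_growth(database, s):
    """Depth-first pattern growth: extend a pattern one item at a time, pruning infrequent branches."""

    def support(pattern):
        return sum(1 for t in database if pattern <= t)

    items = sorted(set().union(*database))
    frequent = {}

    def grow(pattern, extensions):
        # `extensions` holds the items that come after the last item of `pattern`.
        for idx, item in enumerate(extensions):
            candidate = pattern | {item}
            count = support(candidate)
            if count < s:
                # Apriori principle: no superset of an infrequent pattern can be
                # frequent, so the whole subtree below `candidate` is pruned.
                continue
            frequent[candidate] = count
            grow(candidate, extensions[idx + 1:])

    # Each top-level item acts as a seed pattern for its own equivalence class.
    grow(frozenset(), items)
    return frequent


# Hypothetical toy database (illustration only).
toy_db = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]
result = pattern_growth(toy_db, s=2)
```

Unlike the level-wise approach, a whole branch is explored before the next seed is touched, which is what makes the equivalence classes independent units of work.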
6Parallelization
- Coarse-grain: equivalence class level
- Load balancing
- Fine-grain: each pattern level
- High overhead
Figure: Parallelizing equivalence classes (labels P1 to P6)
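Coarse-grain parallelization can be sketched by submitting one task per equivalence class to a shared-memory thread pool. This is only an illustration of the task decomposition under assumed names (`mine_class`, `mine_parallel`); in CPython the global interpreter lock prevents real speedup for this CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def support(pattern, database):
    return sum(1 for t in database if pattern <= t)


def mine_class(seed_index, items, database, s):
    """Mine one equivalence class: frequent patterns whose first item is items[seed_index]."""
    found = {}

    def grow(pattern, extensions):
        for idx, item in enumerate(extensions):
            candidate = pattern | {item}
            count = support(candidate, database)
            if count >= s:
                found[candidate] = count
                grow(candidate, extensions[idx + 1:])

    seed = frozenset([items[seed_index]])
    count = support(seed, database)
    if count >= s:
        found[seed] = count
        grow(seed, items[seed_index + 1:])
    return found


def mine_parallel(database, s, workers=4):
    """One task per equivalence class; results are merged after all tasks finish."""
    items = sorted(set().union(*database))
    frequent = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(mine_class, i, items, database, s)
                   for i in range(len(items))]
        for future in futures:
            frequent.update(future.result())
    return frequent


# Hypothetical toy database (illustration only).
toy_db = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]
items = sorted(set().union(*toy_db))
class_sizes = [len(mine_class(i, items, toy_db, 2)) for i in range(len(items))]
result = mine_parallel(toy_db, s=2)
```

Even on this toy input the classes differ in size, which hints at why load balancing is the main concern at this granularity.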
7Issues
- Considering only the shared-memory systems
- Load imbalance
- In case of coarse-grain parallelization
- Speculative Execution
- Large working sets
- In case of chip multiprocessors (CMPs)
- Algorithmic rather than architectural solutions
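The load imbalance issue can be illustrated with a small worked example. The per-class costs below are hypothetical numbers chosen for illustration, not measurements, and the round-robin assignment is an assumed static scheme.

```python
def static_assignment_loads(class_costs, num_processors):
    """Assign equivalence classes to processors round-robin and sum the work per processor."""
    loads = [0] * num_processors
    for i, cost in enumerate(class_costs):
        loads[i % num_processors] += cost
    return loads


def imbalance(loads):
    """Ratio of the busiest processor's load to the average load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical cost of mining each equivalence class (arbitrary work units).
costs = [40, 12, 6, 3, 2, 1]
loads = static_assignment_loads(costs, 3)
ratio = imbalance(loads)
```

Because the total time is set by the busiest processor, a ratio well above 1.0 means the remaining processors sit idle for much of the run.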
8Related Work
- Thread-Level Speculation (TLS)
- Mitosis compiler (PLDI 2005)
- Speculative thread level parallelism (SpMT)
- Speculative threads
- Executes a part of the original application
- When finished, speculation is verified
- Determines the spawning points
- Loops, basic blocks, subroutines etc.
- Inter-thread dependencies
- Synchronization
- Value prediction using pre-computation slices (p-slices)
- Compute p-slices using speculative optimizations
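The spawn, verify, and re-execute cycle of a speculative thread can be mimicked in software with a rough sketch. This is not how Mitosis or any TLS hardware works internally; `run_speculatively` and its parameters are hypothetical, and the sketch assumes the speculated code has no side effects, so a wrong result can simply be discarded.

```python
from concurrent.futures import ThreadPoolExecutor


def run_speculatively(first_part, second_part, predict, x):
    """Start `second_part` early on a predicted input, then verify the prediction.

    `second_part` depends on the output of `first_part`. Returns (result, committed),
    where `committed` tells whether the speculative result was used.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        predicted = predict(x)
        # Speculative thread: runs ahead using the predicted value.
        speculative = pool.submit(second_part, predicted)
        # Non-speculative thread: computes the real value.
        actual = pool.submit(first_part, x).result()
        if actual == predicted:
            # Speculation verified: commit the speculative result.
            return speculative.result(), True
        # Misspeculation: ignore the speculative result and re-execute.
        return second_part(actual), False


good = run_speculatively(lambda x: x * 2, lambda y: y + 1, lambda x: x * 2, 5)
bad = run_speculatively(lambda x: x * 2, lambda y: y + 1, lambda x: 0, 5)
```

Both calls return the same final value; only the second one pays for the wasted speculative work plus the re-execution.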
9Related Work
- POSH compiler (PPoPP 2006)
- Fully automated TLS compiler
- Three phases
- Task selection
- based on program substructures
- value prediction
- Spawn hoisting
- Task refinement
- to improve the quality of chosen tasks
- through profiling
Parallelism
Pre-fetching
10Speculative Execution (1)
Figure: speculative execution example (labels P1, P2, P3)
11Speculative Execution (2)
- Performance improvement
- High
- Marginal
- None
- Negative
- Depends on dataset characteristics
- Characterization
- Application in data mining
- Speculatively execute future iterations
- Tracking followed by classification
12Large Working Sets
- Support Counting
- Evaluate a candidate pattern
- Scan a subset of database transactions
- Working set can potentially be large
- CMP (vs SMP)
- Less cache memory available to each processor
- Less main memory shared by all processors
Working set has to be reduced to limit the bandwidth utilization