On Benchmarking Frequent Itemset Mining Algorithms

Transcript and Presenter's Notes
1
On Benchmarking Frequent Itemset Mining Algorithms
  • Balázs Rácz, Ferenc Bodon, Lars Schmidt-Thieme

Computer-Based New Media Group, Institute for
Computer Science
Computer and Automation Research Institute of the
Hungarian Academy of Sciences
Budapest University of Technology and Economics
2
History
  • Over 100 papers on Frequent Itemset Mining
  • Many of them claim to be the best
  • Claims based on benchmarks run against some publicly
    available implementations on some datasets
  • The FIMI'03 and '04 workshops ran extensive benchmarks
    with many implementations and data sets
  • These have served as a guideline ever since
  • How fair was the benchmark, and what did it
    measure?

3
On FIMI contests
  • Problem 1: We are interested in the quality of
    algorithms, but we can only measure
    implementations.
  • There is no good theoretical data model yet for
    analytical comparison
  • As we will see later, it would also need a good
    hardware model
  • Problem 2: If we gave our algorithms and ideas to
    a very talented and experienced low-level
    programmer, that could completely redraw the
    current FIMI rankings.
  • A FIMI contest is all about the constant factor

4
On FIMI contests (2)
  • Problem 3: Seemingly unimportant implementation
    details can hide all algorithmic features when
    benchmarking.
  • These details often go unnoticed even by the
    author and are almost never published.

5
On FIMI contests (3)
  • Problem 4: FIM implementations are complete
    suites of a basic algorithm and several
    algorithmic/implementational optimizations.
    Comparing such complete suites tells us what is
    fast, but does not tell us why.
  • Recommendation:
  • Modular programming
  • Benchmarks on the individual features

6
On FIMI contests (4)
  • Problem 5: The run time of all dense mining tasks
    is dominated by I/O.
  • Problem 6: On dense datasets, FIMI benchmarks
    are measuring the ability of submitters to code
    a fast integer-to-string conversion function
    (sketched below).
  • Recommendation:
  • Have as much identical code as possible
  • → a library of FIM functions
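To illustrate how much such a detail can matter, here is a minimal sketch (not the authors' code) of the kind of hand-rolled integer-to-text routine submitters ended up writing to avoid the format-parsing overhead of sprintf; the helper name write_item is hypothetical:

```cpp
#include <cstdio>

// Writes the decimal representation of `v` followed by a space into `out`.
// Returns a pointer one past the written characters. Skipping sprintf's
// format parsing matters when millions of itemsets are emitted.
inline char* write_item(unsigned v, char* out) {
    char tmp[10];                        // enough digits for 32-bit unsigned
    int n = 0;
    do {
        tmp[n++] = char('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0) *out++ = tmp[--n];     // digits were generated in reverse
    *out++ = ' ';
    return out;
}

int main() {
    char buf[64];
    char* end = write_item(42, write_item(7, buf));
    std::fwrite(buf, 1, end - buf, stdout);   // prints "7 42 "
    std::fputc('\n', stdout);
    return 0;
}
```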

7
On FIMI contests (5)
  • Problem 7: Run time differences are small
  • Problem 8: Run time varies from run to run
  • Even for the very same executable on the very same
    input
  • Bug or feature of modern hardware?
  • What to measure?
  • Recommendation: a "winner takes all" evaluation of
    a mining task is unfair
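One way to act on this, sketched with a stand-in workload() function in place of a real mining run, is to repeat each measurement and report the mean together with its spread rather than a single number:

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

// Stand-in for one complete mining run; a real benchmark would invoke the
// FIM implementation on a given dataset and support threshold here.
void workload() {
    volatile unsigned long long acc = 0;
    for (unsigned long long i = 0; i < 50000000ULL; ++i) acc += i;
}

int main() {
    const int runs = 10;
    std::vector<double> seconds;
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        workload();
        auto t1 = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    double mean = 0, var = 0;
    for (double s : seconds) mean += s;
    mean /= runs;
    for (double s : seconds) var += (s - mean) * (s - mean);
    var /= runs;
    // A ranking decided by a difference smaller than the run-to-run
    // deviation is meaningless, so report both.
    std::printf("mean %.3f s, stddev %.3f s\n", mean, std::sqrt(var));
    return 0;
}
```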

8
On FIMI contests (6)
  • Problem 9: Traditional run-time (and memory-need)
    benchmarks do not tell us whether an
    implementation is better than another in
    algorithmic aspects or in implementational
    (hardware-friendliness) aspects.
  • Problem 10: Traditional benchmarks do not show
    whether the conclusions would still hold on a
    slightly different hardware architecture
    (e.g. AMD vs. Intel).
  • Recommendation: extend benchmarks

9
Library and pluggability
  • Code reuse, pluggable components and data
    structures
  • Object-oriented design
  • Do not sacrifice efficiency
  • No virtual method calls allowed in the core
  • Then how?
  • C++ templates
  • Allow pluggability with inlining
  • Plugging requires a source code change, but several
    versions can coexist (see the sketch after this list)
  • Sometimes tricky to code with templates
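A minimal sketch of the template-based plugging mechanism, with hypothetical names (DepthFirstCore, SimpleOutput) rather than the FIMI library's actual interface: the core takes its output routine as a template parameter, so the call can be inlined and no virtual dispatch occurs, yet several variants coexist in one binary.

```cpp
#include <cstdio>
#include <vector>

// One pluggable component: an output policy. Swapping it only requires
// changing a template argument; the core pays no virtual-call cost because
// the compiler can inline render().
struct SimpleOutput {
    void render(const std::vector<unsigned>& itemset) {
        for (unsigned item : itemset) std::printf("%u ", item);
        std::printf("\n");
    }
};

struct CountingOutput {
    unsigned long long count = 0;
    void render(const std::vector<unsigned>&) { ++count; }
};

// The mining core is parameterized by the output policy at compile time.
// (The mining recursion is omitted; only the plugging mechanism is shown.)
template <class Output>
class DepthFirstCore {
public:
    explicit DepthFirstCore(Output& out) : out_(out) {}
    void report(const std::vector<unsigned>& itemset) { out_.render(itemset); }
private:
    Output& out_;
};

int main() {
    SimpleOutput text;
    CountingOutput counter;
    DepthFirstCore<SimpleOutput> core1(text);
    DepthFirstCore<CountingOutput> core2(counter);
    core1.report({1, 3, 7});   // printed as text
    core2.report({1, 3, 7});   // only counted
    std::printf("counted %llu itemsets\n", counter.count);
    return 0;
}
```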

10
I/O efficiency
  • Variations of the output routine (sketched after
    this list):
  • normal-simple renders each itemset and each item
    separately to text
  • normal-cache caches the string representation of
    item identifiers
  • df-buffered (depth-first) reuses the string
    representation of the previous line and appends the
    last item
  • df-cache is like df-buffered, but also caches the
    string representation of item identifiers
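A minimal sketch of the caching ideas behind normal-cache and df-cache, using hypothetical class names (ItemStringCache, DfOutput) rather than the library's own: item identifiers are rendered to text once, and during a depth-first traversal the text of the current prefix itemset is kept in a buffer so only the newly added item has to be appended.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Renders each item identifier to text at most once (the normal-cache idea).
class ItemStringCache {
public:
    const std::string& get(unsigned item) {
        if (item >= cache_.size()) cache_.resize(item + 1);
        if (cache_[item].empty()) cache_[item] = std::to_string(item) + ' ';
        return cache_[item];
    }
private:
    std::vector<std::string> cache_;
};

// Depth-first output buffer (the df-cache idea): the text of the current
// prefix itemset is reused, only the last item is appended or removed.
class DfOutput {
public:
    void push(unsigned item) {
        lengths_.push_back(line_.size());
        line_ += cache_.get(item);
    }
    void pop() {
        line_.resize(lengths_.back());
        lengths_.pop_back();
    }
    void emit() { std::printf("%s\n", line_.c_str()); }
private:
    ItemStringCache cache_;
    std::string line_;
    std::vector<std::size_t> lengths_;
};

int main() {
    DfOutput out;
    out.push(3);  out.emit();   // "3 "
    out.push(17); out.emit();   // "3 17 "  (prefix text reused)
    out.pop();
    out.push(25); out.emit();   // "3 25 "
    return 0;
}
```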

11
(No Transcript)
12
Benchmarking desiderata
  • The benchmark should be stable and
    reproducible. Ideally it should have no
    variation, certainly not on the same hardware.
  • The benchmark numbers should reflect the actual
    performance. The benchmark should be a fairly
    accurate model of the actual hardware.
  • The benchmark should be hardware-independent, in
    the sense that it should be stable against slight
    variations of the underlying hardware
    architecture, such as changing the processor
    manufacturer or model.

13
Benchmarking reality
  • Different implementations stress different
    aspects of the hardware
  • Migrating to other hardware:
  • May be better in one aspect, worse in another
  • Rankings cannot be migrated between hardware
  • Complex benchmark results are necessary
  • Is a win due to algorithmic reasons or to
    hardware-friendliness?
  • Performance is not as simple as run time in
    seconds

14
Benchmark platform
  • Virtual machine
  • How to define?
  • How to code the implementations?
  • Cost function?
  • Instrumentation (simulation of actual CPU)
  • Slow (100-fold slower than plain run time)
  • Accuracy?
  • Cost function?

15
Benchmark platform (2)
  • Run-time measurement
  • Performance counters
  • Present in all modern processors (since the i586)
  • Count performance-related events in real time
  • PerfCtr kernel patch under Linux, vendor-specific
    software under Windows
  • Problem: the measured numbers reflect the actual
    execution and are thus subject to variation
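A minimal sketch, assuming a Linux system, of counting one hardware event around a code section. It uses the kernel's perf_event_open interface rather than the PerfCtr patch mentioned on the slide, and the counted workload is a stand-in loop:

```cpp
// Linux-only sketch: counts retired instructions for a code section.
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;   // instructions retired
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    // Count for the calling thread, on any CPU.
    int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd == -1) { std::perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    // Stand-in workload; a real benchmark would run the mining code here.
    volatile unsigned long long acc = 0;
    for (unsigned long long i = 0; i < 10000000ULL; ++i) acc += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long instructions = 0;
    if (read(fd, &instructions, sizeof(instructions)) != sizeof(instructions))
        std::fprintf(stderr, "short read\n");
    std::printf("instructions retired: %lld\n", instructions);
    close(fd);
    return 0;
}
```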

16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
Conclusion
  • We cannot measure algorithms, only
    implementations
  • Modular implementations with pluggable features
  • Shared code for the common functionality (like
    I/O)
  • FIMI library with C++ templates
  • Benchmark run time varies and depends on the
    hardware used
  • Complex benchmarks needed
  • Conclusions on algorithmic aspects or hardware
    friendliness?

20
Thank you for your attention
  • Big question: how does the choice of compiler
    influence the performance and the ranking?