Arif Djunaidy Rully Soelaiman Daning Tyaspamadya

About This Presentation

Description:

Title: Design and Development of Data Mining Software for Association Rule Discovery Using the Bottom-Up Method Author: Arif Djunaidy Last modified by – PowerPoint PPT presentation

Slides: 23

Provided by: Arif73

Learn more at: https://www.iiwas.org

Transcript and Presenter's Notes

1
MINING ASSOCIATION RULES FROM LARGE DATABASES
USING THE LATTICE-BASED APPROACH AND HYBRID
SEARCH METHOD

Arif Djunaidy, Rully Soelaiman, Daning Tyaspamadya

Faculty of Information Technology ITS - Surabaya
2
Background - 1

In data mining, association rules represent
relationships that may exist among items in
transactional databases
Since the association rules that can be exploited
may represent customer behavior, identifying the
frequent itemsets and forming the conditional
implication rules among items are of paramount
importance
Efficient algorithms capable of reducing the
overheads of mining meaningful association rules
are therefore required
However, for large databases, the extraction of a
set of meaningful association rules may require
substantial memory and database scanning that may
in turn increase the overall computing time of
the mining process

3
Background - 2

The task of discovering all frequent associations
in very large databases is quite challenging
The search space is exponential in the number of
database attributes
With millions of database objects, the problem of
I/O minimization becomes paramount
Most current approaches are iterative in nature,
requiring multiple database scans
Most approaches use very complicated internal
data structures, which have poor locality and add
space and computation overheads

4
Key Features of Our Approach

All frequent itemsets are enumerated via simple
tid-list intersections
A lattice-theoretic approach is used to decompose
the original search space (lattice) into smaller
pieces (sub-lattices) that can be processed
independently and more easily
A hybrid search strategy is used to enumerate the
frequent itemsets within each sub-lattice
Our approach is designed to involve only a few
database scans to minimize the I/O costs
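
One way to picture the decomposition into independent sub-lattices is to group the frequent 2-itemsets by their first item. The slides do not say which decomposition rule is used, so the prefix-based grouping below is an assumption, and the names `prefix_classes`, `tidlists`, and `min_count` (the smallest transaction count treated as frequent) are hypothetical. The toy tid-lists are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

PairEntry = Tuple[Tuple[str, str], Set[int]]


def prefix_classes(tidlists: Dict[str, Set[int]], min_count: int) -> Dict[str, List[PairEntry]]:
    """Group frequent 2-itemsets by their first item (assumed decomposition)."""
    # Keep only items whose own tid-list is long enough to be frequent.
    items = sorted(i for i, tids in tidlists.items() if len(tids) >= min_count)
    classes: Dict[str, List[PairEntry]] = defaultdict(list)
    for a, b in combinations(items, 2):
        tids = tidlists[a] & tidlists[b]  # transactions containing both a and b
        if len(tids) >= min_count:
            classes[a].append(((a, b), tids))
    return dict(classes)


# Hypothetical tid-lists: item -> ids of the transactions that contain it.
toy_tidlists = {"A": {1, 2, 3}, "B": {3, 4}, "C": {1, 2, 3}, "D": {1, 4}}
classes = prefix_classes(toy_tidlists, min_count=1)
```

Each class (one per first item) only needs its own tid-lists, so the classes can be mined one after another without consulting each other.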

5
Problem Statement - 1

An association rule can be written as A → B,
where
A is an itemset called the antecedent or
left-hand side (LHS), and
B is an itemset called the consequent or
right-hand side (RHS)
The association mining task is to discover a set
of association rules among a large number of
objects in a given database

6
Problem Statement - 2

The fundamental task of association rule mining
is to generate all association rules X → Y (X, Y
are itemsets) that can be extracted from the
database. These rules must satisfy both the
support and confidence constraints
Support constraint: Sup(X ∪ Y)
Confidence constraint: Sup(X ∪ Y) / Sup(X)
Sup(X) is defined as the number of transactions
in which X occurs as a subset
An itemset is categorized as a frequent itemset
if its support is more than a minimum support
(MinSup) supplied by a user
The confidence factor represents the conditional
probability that a transaction contains Y (given
that the transaction contains X)
An association rule is said to be confident if
its confidence factor value is more than the
minimum confidence (MinCof) supplied by the user.
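
The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names and the four-transaction toy database are hypothetical.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Sup(X): number of transactions that contain X as a subset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(lhs: Iterable[str], rhs: Iterable[str], transactions: List[Transaction]) -> float:
    """Confidence of X -> Y, computed as Sup(X ∪ Y) / Sup(X)."""
    sup_lhs = support(lhs, transactions)
    if sup_lhs == 0:
        raise ValueError("the antecedent does not occur in any transaction")
    return support(frozenset(lhs) | frozenset(rhs), transactions) / sup_lhs


# Hypothetical toy database with four transactions.
transactions = [
    frozenset({"A", "C", "D"}),
    frozenset({"A", "C"}),
    frozenset({"A", "B", "C"}),
    frozenset({"B", "D"}),
]

sup_ac = support({"A", "C"}, transactions)
conf_a_c = confidence({"A"}, {"C"}, transactions)
conf_b_d = confidence({"B"}, {"D"}, transactions)
```

In this toy database every transaction that contains A also contains C, so the rule A → C has the highest possible confidence, while only half of the transactions containing B also contain D.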

7
Simple Example - 1

Consider the sales database of a food store, where
the objects represent customers and itemsets
represent food
In this example, the discovered patterns are the
sets of food frequently bought together by the
customers
An example pattern could be that 60 percent of
the customers who buy cereal also buy milk
The store can then use this knowledge for shelf
placement, controlling the stock, etc.
There are many potential application areas for
association rule technology, which include
catalog design, customer segmentation, store
layout, and so on

8
Simple Example - 2
MinSup 50
MinCof 100
9
The Lattice-Based Approach - 1

We use the lattice-theoretic approach to
Identify all frequent itemsets
Count the support of association rules
Prerequisite: construct the tid-lists from the
transaction database
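
Building the tid-lists amounts to inverting the database in one pass, so that each item maps to the identifiers of the transactions containing it. The sketch below assumes the database fits in a dictionary keyed by transaction id. The name `build_tidlists` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_tidlists(database: Dict[int, Iterable[str]]) -> Dict[str, Set[int]]:
    """Invert tid -> items into item -> set of tids in a single pass."""
    tidlists: Dict[str, Set[int]] = defaultdict(set)
    for tid, items in database.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


# Hypothetical toy database: transaction id -> items bought.
database = {
    1: ["A", "C", "D"],
    2: ["A", "C"],
    3: ["A", "B", "C"],
    4: ["B", "D"],
}
tidlists = build_tidlists(database)
```

After this step the support of a single item is just the size of its tid-list, and the original transactions no longer need to be read.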

10
The Lattice-Based Approach - 2

Construct the powerset Lattice P(I)

MinSup 50
Maximal freq. itemsets
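
For a handful of items, the powerset lattice and its maximal frequent itemsets can be enumerated by brute force. The sketch below is only meant to make the two notions concrete; it is exponential in the number of items and is not the method used in the presentation. The toy transactions and the `min_count` parameter (the smallest transaction count treated as frequent) are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def powerset(items: Iterable[str]) -> List[Itemset]:
    """All subsets of the item set, from the empty set up to the full set."""
    ordered = sorted(items)
    return [frozenset(c) for k in range(len(ordered) + 1) for c in combinations(ordered, k)]


def frequent_itemsets(transactions: List[Itemset], min_count: int) -> Set[Itemset]:
    """Non-empty lattice elements contained in at least min_count transactions."""
    all_items = set().union(*transactions)
    return {
        s for s in powerset(all_items)
        if s and sum(1 for t in transactions if s <= t) >= min_count
    }


def maximal(frequent: Set[Itemset]) -> Set[Itemset]:
    """Frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical toy database.
transactions = [frozenset("ACD"), frozenset("AC"), frozenset("ABC"), frozenset("BD")]
frequent = frequent_itemsets(transactions, min_count=2)
maximal_sets = maximal(frequent)
```

Every frequent itemset is a subset of some maximal frequent itemset, which is why the maximal ones summarize the frequent part of the lattice.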
11
The Lattice-Based Approach - 3

Compute support of itemsets via tid-list
intersections
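
As a minimal sketch of this step: the tid-list of an itemset is the intersection of the tid-lists of its items, and its support is the size of that intersection. The helper names and the toy tid-lists are hypothetical.

```python
from typing import Dict, Iterable, Set


def tidlist_of(itemset: Iterable[str], tidlists: Dict[str, Set[int]]) -> Set[int]:
    """Transactions containing every item of the itemset."""
    lists = [tidlists[item] for item in itemset]
    if not lists:
        raise ValueError("itemset must contain at least one item")
    # set.intersection returns a new set, so the stored tid-lists are not modified.
    return set.intersection(*lists)


def support_from_tidlists(itemset: Iterable[str], tidlists: Dict[str, Set[int]]) -> int:
    """Support of the itemset, read off as the length of its tid-list."""
    return len(tidlist_of(itemset, tidlists))


# Hypothetical tid-lists: item -> ids of the transactions that contain it.
tidlists = {"A": {1, 2, 3}, "B": {3, 4}, "C": {1, 2, 3}, "D": {1, 4}}

tids_ac = tidlist_of(["A", "C"], tidlists)
sup_abc = support_from_tidlists(["A", "B", "C"], tidlists)
```

No pass over the original transactions is needed here; only the tid-lists are touched.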

12
Hybrid Search for Freq. Itemsets - 1

Hybrid search is used to quickly enumerate all
frequent itemsets
Hybrid search combines the top-down and
bottom-up search strategies and is based on the
intuition that the greater the support of a
frequent itemset, the more likely it is to be
part of a longer frequent itemset
The hybrid approach is divided into two main steps
An initial phase that rearranges the atoms, and
The hybrid process itself, which generates all
frequent itemsets. In this second step, the
recursion is repeated until no more frequent
itemsets can be generated

13
Hybrid Search for Freq. Itemsets - 2

The first step simply rearranges the atoms in
descending order of their supports, using a
sorting algorithm
The second step starts by intersecting the atoms
one pair at a time
The intersection process starts from the pair of
atoms with the largest supports, in order to
produce a longer frequent itemset
The process stops when an extension becomes
infrequent (i.e., an itemset that does not
satisfy the minimum support requirement)
The second, bottom-up phase is then entered
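
The steps above can be sketched as a simplified two-phase procedure. This is a reading of the slides, not the authors' implementation: the greedy phase extends the highest-support atom until an extension becomes infrequent, and the bottom-up phase is a plain recursive tid-list enumeration that does not use the long itemset for pruning. All names, the `min_count` parameter (the smallest count treated as frequent), and the toy tid-lists are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Entry = Tuple[Itemset, Set[int]]


def greedy_extension(atoms: List[Entry], min_count: int) -> Entry:
    """Extend the first atom with the following atoms until an extension is infrequent."""
    items, tids = atoms[0][0], set(atoms[0][1])
    for atom_items, atom_tids in atoms[1:]:
        candidate = tids & atom_tids
        if len(candidate) < min_count:
            break  # the extension is infrequent: stop the first phase
        items, tids = items | atom_items, candidate
    return items, tids


def bottom_up(entries: List[Entry], min_count: int, out: Dict[Itemset, int]) -> None:
    """Record each entry, then join it with the entries after it and recurse."""
    for i, (items_i, tids_i) in enumerate(entries):
        out[items_i] = len(tids_i)
        next_entries = []
        for items_j, tids_j in entries[i + 1:]:
            tids = tids_i & tids_j
            if len(tids) >= min_count:
                next_entries.append((items_i | items_j, tids))
        if next_entries:
            bottom_up(next_entries, min_count, out)


def hybrid_search(tidlists: Dict[str, Set[int]], min_count: int) -> Tuple[Itemset, Dict[Itemset, int]]:
    """Return the long itemset found greedily and all frequent itemsets with supports."""
    # Step 1: keep frequent atoms and sort them by descending support (ties by name).
    atoms = sorted(
        ((frozenset([item]), tids) for item, tids in tidlists.items() if len(tids) >= min_count),
        key=lambda entry: (-len(entry[1]), sorted(entry[0])),
    )
    if not atoms:
        return frozenset(), {}
    # Step 2: greedy extension, followed by the bottom-up enumeration.
    longest, _ = greedy_extension(atoms, min_count)
    frequent: Dict[Itemset, int] = {}
    bottom_up(atoms, min_count, frequent)
    return longest, frequent


# Hypothetical tid-lists: item -> ids of the transactions that contain it.
toy_tidlists = {"A": {1, 2, 3}, "B": {3, 4}, "C": {1, 2, 3}, "D": {1, 4}}
longest, frequent = hybrid_search(toy_tidlists, min_count=2)
```

Because every subset of a frequent itemset is itself frequent, a fuller implementation could skip re-deriving the subsets of `longest` in the bottom-up phase; that refinement is left out here to keep the sketch short.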

14
Hybrid Search for Freq. Itemsets - 3
Infrequent Itemsets (MinSup 50)
Infrequent Itemsets
15
Design of Application

16
Test Data
Statistics of Test Data

17
Experimental Results - 1

Number of k-itemsets
18
Experimental Results - 2

Number of Association Rules
19
Experimental Results - 3

Computing Time
20
Experimental Results - 4

Support Counting Performance
21
Experimental Results - 5

Comparison Results
22
Conclusions

Experimental results show that this approach,
together with the hybrid search method, reduces
the computing time compared with both
apriori-based algorithms and the similar
lattice-based approach that uses the bottom-up
search strategy
Another advantage of the lattice-based algorithm
concerns the time spent scanning the database:
the lattice-based algorithm requires only a
single database scan, so the I/O overhead is kept
to a minimum
As far as computing speed is concerned,
substantial computing time still seems to be
required to process large databases. Although the
lattice-based approach is relatively powerful,
this indicates that other computing methodologies,
such as parallel algorithms running in
distributed computing environments, need to be
considered to solve the computing speed problem