Mining Confident Rules Without Support Requirements
1
Mining Confident Rules Without Support
Requirements
  • Ke Wang
  • Yu He
  • D. W. Cheung
  • F. Y. L. Chin

2
Association Rules
  • Given a table over A1, ..., Ak, C
  • Find all rules Ai=ai → C=c with minimum
    confidence and minimum support
  • Support: sup(Ai=ai) = number of records containing Ai=ai
  • Confidence: sup(Ai=ai, C=c) / sup(Ai=ai) (sketched below)
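
A minimal sketch of these two measures over a small in-memory table; the records, attribute names, and helper functions below are made up for illustration and are not from the slides.

# Illustrative table over attributes Age, Gender and class Buy.
records = [
    {"Age": "young", "Gender": "M", "Buy": "yes"},
    {"Age": "young", "Gender": "F", "Buy": "no"},
    {"Age": "old",   "Gender": "M", "Buy": "yes"},
    {"Age": "young", "Gender": "F", "Buy": "yes"},
]

def support(conditions):
    # sup(x): number of records satisfying every attribute=value condition in x
    return sum(all(r.get(a) == v for a, v in conditions.items()) for r in records)

def confidence(antecedent, class_attr, class_value):
    # conf(x -> C=c) = sup(x, C=c) / sup(x)
    sup_x = support(antecedent)
    sup_xc = support({**antecedent, class_attr: class_value})
    return sup_xc / sup_x if sup_x else 0.0

print(support({"Age": "young"}))                     # 3
print(confidence({"Age": "young"}, "Buy", "yes"))    # 2/3 = 0.666...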

3
Low Support Rules
  • Interesting rules may be unknown in advance and
    have low support
  • High support rules often have low confidence
  • Often, patterns are fragmented into many low
    support rules

Goal: find all rules above the minimum confidence
4
Confidence-based Pruning
  • Without a minimum support, the classic
    support-based pruning is inapplicable
  • Confident rules are neither downward closed nor
    upward closed
  • New strategies are needed for pushing the
    confidence requirement into the mining process.

5
Confidence-based Pruning
  • r1: Age=young → Buy=yes
  • r2: Age=young, Gender=M → Buy=yes
  • r3: Age=young, Gender=F → Buy=yes

Observation 1: if r1 is confident, so is one of
r2 and r3 (its specializations by Gender), because conf(r1)
is a support-weighted average of conf(r2) and conf(r3)
(a numeric sketch follows)
Observation 2: if no specialized rule of r1 is
confident, r1 can be pruned
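
The weighted-average argument behind Observation 1, with made-up counts (the numbers below are purely illustrative):

# Hypothetical counts: 100 records with Age=young, split 60/40 by Gender.
sup_r2, conf_r2 = 60, 0.90      # Age=young, Gender=M -> Buy=yes
sup_r3, conf_r3 = 40, 0.45      # Age=young, Gender=F -> Buy=yes

# conf(r1) is the support-weighted average of conf(r2) and conf(r3).
conf_r1 = (sup_r2 * conf_r2 + sup_r3 * conf_r3) / (sup_r2 + sup_r3)
print(conf_r1)                               # 0.72
assert conf_r1 <= max(conf_r2, conf_r3)      # so a confident r1 implies a confident r2 or r3
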
6
Confidence-based Pruning
  • Level-wise rule generation: generate a candidate
    rule x → c only if, for every attribute A not in
    x → c, some A-specialization of x → c is
    confident (a sketch of this test follows).
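
A sketch of that generation test, assuming rule bodies are frozensets of (attribute, value) pairs and Rule_k is a set of (body, class) pairs; the function and parameter names are illustrative.

def is_candidate(x, c, attributes, confident_k_rules):
    # Generate x -> c only if every attribute A not used in x -> c has some
    # confident A-specialization in Rule_k (a rule extending x by one
    # condition on A, with the same class c).
    used = {a for a, _ in x}
    for attr in attributes:
        if attr in used:
            continue
        has_confident_specialization = any(
            rule_c == c and x < body and attr in dict(body - x)
            for body, rule_c in confident_k_rules
        )
        if not has_confident_specialization:
            return False
    return True

rule_k = {(frozenset({("Age", "young"), ("Gender", "M")}), "yes")}
print(is_candidate(frozenset({("Age", "young")}), "yes", ["Age", "Gender"], rule_k))  # True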

7
The Algorithm
  • Input: a table T over A1, ..., Am, C, and minconf
  • Output: all confident rules (a Python sketch of the
    loop follows this slide)
  • 1. k = m
  • 2. Rule_k = all confident m-rules
  • 3. while k > 1 and Rule_k is not empty do
  • 4.   generate Cand_(k-1) from Rule_k
  • 5.   compute the confidence of Cand_(k-1) in one
    pass of T
  • 6.   Rule_(k-1) = all confident candidates in
    Cand_(k-1)
  • 7.   k--
  • 8. return all Rule_k
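
A compact, in-memory sketch of this loop under simplifying assumptions: the whole table fits in memory, the disk-based bucket machinery of the later slides is omitted, and all names are illustrative.

def mine_confident_rules(records, attributes, class_attr, minconf):
    def confident(cands):
        # Keep the (body, class) candidates whose confidence reaches minconf.
        # (For clarity this scans the records per candidate; the slide's
        # algorithm computes all confidences in one pass of T per level.)
        kept = set()
        for body, c in cands:
            covered = [r for r in records if all(r[a] == v for a, v in body)]
            if covered and sum(r[class_attr] == c for r in covered) / len(covered) >= minconf:
                kept.add((body, c))
        return kept

    m = len(attributes)
    k = m                                                           # step 1
    # Step 2: Rule_m = all confident m-rules (every attribute instantiated).
    rules = {m: confident({(frozenset((a, r[a]) for a in attributes),
                            r[class_attr]) for r in records})}
    while k > 1 and rules[k]:                                       # step 3
        # Step 4: generate Cand_(k-1) with the test from slide 6.
        cands = set()
        for body, c in rules[k]:
            for dropped in body:
                x = body - {dropped}
                unused = set(attributes) - {a for a, _ in x}
                if all(any(rc == c and x < b and u in dict(b - x)
                           for b, rc in rules[k])
                       for u in unused):
                    cands.add((x, c))
        rules[k - 1] = confident(cands)                             # steps 5-6
        k -= 1                                                      # step 7
    return rules                                                    # step 8: all Rule_k

# Example call on the illustrative table from the earlier sketch:
# mine_confident_rules(records, ["Age", "Gender"], "Buy", 0.6)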

8
Disk-based Implementation
  • Assumption: T, Rule_k, Cand_(k-1) are stored on disk.
  • We focus on
  • generating Cand_(k-1) from Rule_k, and
  • computing the confidence of Cand_(k-1).
  • Key idea: cluster T, Rule_k, Cand_(k-1) according to
    the attributes Ai

9
Clustering by Hash Partitioning
  • h_i: the hash function for attribute Ai, i = 1, ...,
    m
  • Table T is partitioned into T-buckets
  • Rule_k is partitioned into R-buckets
  • Cand_(k-1) is partitioned into C-buckets
  • A bucket id is the sequence of hash values involved,
    ⟨b1, ..., bk⟩ (a sketch follows)
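
One way the partitioning could look in code; the per-attribute hash function and bucket count below are illustrative (Python's built-in hash is not stable across processes, so a real implementation would use a deterministic hash).

NUM_BUCKETS = 4   # illustrative number of hash values per attribute

def h(attribute, value):
    # Per-attribute hash function h_i mapping a value to a small bucket number.
    return hash((attribute, value)) % NUM_BUCKETS

def t_bucket_id(record, attributes):
    # T-bucket id: one (attribute, hash value) pair for every attribute.
    return tuple((a, h(a, record[a])) for a in attributes)

def rule_bucket_id(body, attributes):
    # R-/C-bucket id: hash values only for the attributes the rule body uses,
    # listed in attribute order.
    b = dict(body)
    return tuple((a, h(a, b[a])) for a in attributes if a in b)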

10
Pruning by Checking Bucket Ids
  • A tuple in a T-bucket supports a candidate in a
    C-bucket only if the T-bucket id matches the
    C-bucket id.
  • E.g., T-bucket ⟨A1.1, A2.1, A3.2⟩ matches C-buckets
    ⟨A1.1, A3.2⟩ and ⟨A1.1, A2.1⟩
  • A C-bucket ⟨b1, ..., bk⟩ is nonempty only if, for
    every other attribute A, some R-bucket
    ⟨b1, ..., bk, bA⟩ is nonempty (both checks are
    sketched below)
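
A sketch of both checks, reusing the bucket-id helpers from the previous sketch (all names illustrative).

def t_matches_c(t_id, c_id):
    # A tuple in a T-bucket can support a candidate in a C-bucket only if
    # the two ids agree on every attribute the C-bucket id mentions.
    t = dict(t_id)
    return all(t.get(a) == v for a, v in c_id)

def c_bucket_can_be_nonempty(c_id, attributes, nonempty_r_bucket_ids):
    # A C-bucket <b1,...,bk> can be nonempty only if, for every attribute A
    # it does not cover, some R-bucket <b1,...,bk,bA> is nonempty.
    c = set(c_id)
    used = {a for a, _ in c_id}
    for attr in attributes:
        if attr in used:
            continue
        if not any(c < set(r) and attr in dict(set(r) - c)
                   for r in nonempty_r_bucket_ids):
            return False
    return True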

11
Hypergraph H_(k-1)
  • A vertex corresponds to a T-bucket
  • An edge corresponds to a C-bucket and contains a
    vertex if and only if the C-bucket matches the
    T-bucket
  • H_(k-1) is kept in memory.

12
The Optimal Blocking
  • Assume that we can read several T-buckets at a
    time, called a T-block.
  • For each T-block, we need to access the matching
    C-buckets from disk.
  • We want the optimal partitioning into T-blocks so
    that the number of C-bucket accesses is minimized.
  • This problem is NP-hard.

13
Heuristics
  • Heuristic I: the more T-buckets match a C-bucket,
    the higher the priority such T-buckets should be
    given in the next T-block.
  • Heuristic II: the more C-buckets match a
    T-bucket, the higher the priority this T-bucket
    should be given in the next T-block
    (one possible reading of both is sketched below).
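
One possible reading of the two heuristics as priority orderings over the T-buckets, where matches[t] is the set of C-buckets that T-bucket t matches; this is an interpretation for illustration, not the paper's exact scheduling rule.

from collections import defaultdict

def order_by_heuristic_1(matches):
    # Heuristic I (one reading): rank T-buckets by how widely shared their
    # C-buckets are, so heavily shared C-buckets are served by consecutive
    # T-buckets and re-read from disk less often.
    c_degree = defaultdict(int)            # how many T-buckets need each C-bucket
    for cs in matches.values():
        for c in cs:
            c_degree[c] += 1
    return sorted(matches, key=lambda t: sum(c_degree[c] for c in matches[t]),
                  reverse=True)

def order_by_heuristic_2(matches):
    # Heuristic II: rank T-buckets by the number of C-buckets they match.
    return sorted(matches, key=lambda t: len(matches[t]), reverse=True)

# matches maps a T-bucket name to the C-buckets it matches (illustrative).
matches = {"T1": {"C1", "C2"}, "T2": {"C1"}, "T3": {"C2", "C3", "C4"}}
print(order_by_heuristic_2(matches))       # ['T3', 'T1', 'T2']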

14
  • (Figure: T-buckets T1-T5 connected to the C-buckets
    C1-C4 they match)
  • (T1 T2 T3)(T4 T5): C1, C2, C4 read twice, C3 read once
  • Heuristic I, (T1 T2 T5)(T3 T4): C1, C2, C4 read once,
    C3 read twice
  • Heuristic II, (T1 T3 T5)(T2 T4): C1, C4 read twice,
    C2, C3 read once.

15
Experiments
  • Synthetic datasets from "An interval classifier
    for database mining applications", VLDB 1992.
  • 9 attributes, 1 class.
  • Default data size: 100K records

16-19
(No transcript for slides 16-19)
20
Conclusion
  • The experiments show that the proposed
    confidence-based pruning is effective.