Frequent Itemsets Mining for Database autoadministration

About This Presentation

Title:

Frequent Itemsets Mining for Database autoadministration

Description:

Frequent Itemsets Mining for Database auto-administration. Le Gruenwald ... search. Set of frequent itemsets. Index construction. Candidate. indexes. Queries ... – PowerPoint PPT presentation

Number of Views:53

Avg rating:3.0/5.0

Slides: 15

Provided by: etua

Category:

more less

Transcript and Presenter's Notes

Title: Frequent Itemsets Mining for Database autoadministration

1
Frequent Itemsets Mining for Database
auto-administration
International Database EngineeringApplications
Symposium
2
Plan

Introduction
State of the art
Our proposal
Experiment results
Conclusion and perspectives

3
Introduction

Database administration
Definition of conceptual schema
Definition of files and disc storage
Data access optimization (creation of indexes and
materialized views)
Minimize and automate the administration
functions
Auto-select indexes
Using data mining (frequent itemsets)

4
State of the art

Frank et al., 1992 (Georgia Tech)
Construction of an index graph based on a
workload
Dialogue with the query optimizer ? index cost
Choice of the least expensive indexes for the
workload
Chaudhuri et al., 1997-2001 (Microsoft)
Selection of indexable attributes based on a
workload
Construction of mono and multi-attribute
configuration
Dialogue with the query optimizer ? configuration
cost
Choice of the optimal configuration
Brunel, Rollin et al., 2001 (Lyon 2 - Oklahoma)
Based on the work of Frank et al.
First work about the extraction of workload
characteristics

5
Log file of transactions
Log file extraction
SQL query syntactic analyzer
Ensemble dattributs candidats
Construction of the Queries-attributes matrix
Queries-attributes matrix
Frequent Itemsets search
Database and workload
Set of frequent itemsets
Candidate indexes
Index construction
6
Queries-attributes matrix

For each query within the workload? extract
attributes present on the clauses
Where
Group by
Order By

Xij 1 if the attribute j appears in the clauses
of the query i Xij 0 in the other case
7
Example
Q1 Select from T1, T2 where A between 1 and
10 And C D Q2 Select from T1, T2 where B
like sample and C 5 and E lt 100 Q3 Select
from T1, T2 where A 30 and B gt 3 group by C
Having Sum(E) gt 2 Q4 Select from T1 where B
gt 2 and E in (3,2,5) Q5 Select from T1, T2
where A 30 and B gt 3 group by C Having Sum(E) gt
2 Q6 Select from T1, T2 where B gt 3 group by
C Having Sum(E) gt 2
Application of Close with a minimal support
minsup 2/6
8
Indexes construction

Building all indexes
For each frequent itemset corresponds an index
This method is feasible when the number of
indexes is relatively small
Building indexes on large tables
Reduce the number of indexes to build
The DBA defines whether a table is large or not

9
Database experiments

TPC-R Database
Transaction Processing Council
Decision support benchmark TPC-R
8 Tables
Size 1 GB
Accidentology Data Mart
BDD research group (ERIC)
One Fact table Accident and 4 Dimension tables
Place, Date, Condition and PersonResponsible
65 000 Records

10
TPC-R results- All indexes
11
TPC-R results - Indexes on large tables
12
Data Mart results
13
Conclusions

Use of data mining to determine an index
configuration
Mono et multi-attribute indexes are generated on
the fly
Response time improvements of 20 to 25 for
decision support worklod (TPC-R)
Response time improvements is about 14

14
Perspectives

Improve index selection by designing more
elaborated strategies (cost model, weighting the
workloads queries
Tests on large data warehouses
Comparison with the Microsoft technique (query
optimizer vs. frequent itemsets)
Extending or coupling our approach with other
performance optimization (materialized views,
physical clustering, ...)
Exploit other data mining algorithms (functional
dependencies or inclusion dependencies)