Frequent Itemsets Mining for Database autoadministration - PowerPoint PPT Presentation

1 / 14
About This Presentation
Title:

Frequent Itemsets Mining for Database autoadministration

Description:

Frequent Itemsets Mining for Database auto-administration. Le Gruenwald ... search. Set of frequent itemsets. Index construction. Candidate. indexes. Queries ... – PowerPoint PPT presentation

Number of Views:53
Avg rating:3.0/5.0
Slides: 15
Provided by: etua
Category:

less

Transcript and Presenter's Notes

Title: Frequent Itemsets Mining for Database autoadministration


1
Frequent Itemsets Mining for Database
auto-administration
International Database EngineeringApplications
Symposium
2
Plan
  • Introduction
  • State of the art
  • Our proposal
  • Experiment results
  • Conclusion and perspectives

3
Introduction
  • Database administration
  • Definition of conceptual schema
  • Definition of files and disc storage
  • Data access optimization (creation of indexes and
    materialized views)
  • Minimize and automate the administration
    functions
  • Auto-select indexes
  • Using data mining (frequent itemsets)

4
State of the art
  • Frank et al., 1992 (Georgia Tech)
  • Construction of an index graph based on a
    workload
  • Dialogue with the query optimizer ? index cost
  • Choice of the least expensive indexes for the
    workload
  • Chaudhuri et al., 1997-2001 (Microsoft)
  • Selection of indexable attributes based on a
    workload
  • Construction of mono and multi-attribute
    configuration
  • Dialogue with the query optimizer ? configuration
    cost
  • Choice of the optimal configuration
  • Brunel, Rollin et al., 2001 (Lyon 2 - Oklahoma)
  • Based on the work of Frank et al.
  • First work about the extraction of workload
    characteristics

5
Log file of transactions
Log file extraction
SQL query syntactic analyzer
Ensemble dattributs candidats
Construction of the Queries-attributes matrix
Queries-attributes matrix
Frequent Itemsets search
Database and workload
Set of frequent itemsets
Candidate indexes
Index construction
6
Queries-attributes matrix
  • For each query within the workload? extract
    attributes present on the clauses
  • Where
  • Group by
  • Order By

Xij 1 if the attribute j appears in the clauses
of the query i Xij 0 in the other case
7
Example
Q1 Select from T1, T2 where A between 1 and
10 And C D Q2 Select from T1, T2 where B
like sample and C 5 and E lt 100 Q3 Select
from T1, T2 where A 30 and B gt 3 group by C
Having Sum(E) gt 2 Q4 Select from T1 where B
gt 2 and E in (3,2,5) Q5 Select from T1, T2
where A 30 and B gt 3 group by C Having Sum(E) gt
2 Q6 Select from T1, T2 where B gt 3 group by
C Having Sum(E) gt 2
Application of Close with a minimal support
minsup 2/6
8
Indexes construction
  • Building all indexes
  • For each frequent itemset corresponds an index
  • This method is feasible when the number of
    indexes is relatively small
  • Building indexes on large tables
  • Reduce the number of indexes to build
  • The DBA defines whether a table is large or not

9
Database experiments
  • TPC-R Database
  • Transaction Processing Council
  • Decision support benchmark TPC-R
  • 8 Tables
  • Size 1 GB
  • Accidentology Data Mart
  • BDD research group (ERIC)
  • One Fact table Accident and 4 Dimension tables
    Place, Date, Condition and PersonResponsible
  • 65 000 Records

10
TPC-R results- All indexes
11
TPC-R results - Indexes on large tables
12
Data Mart results
13
Conclusions
  • Use of data mining to determine an index
    configuration
  • Mono et multi-attribute indexes are generated on
    the fly
  • Response time improvements of 20 to 25 for
    decision support worklod (TPC-R)
  • Response time improvements is about 14

14
Perspectives
  • Improve index selection by designing more
    elaborated strategies (cost model, weighting the
    workloads queries
  • Tests on large data warehouses
  • Comparison with the Microsoft technique (query
    optimizer vs. frequent itemsets)
  • Extending or coupling our approach with other
    performance optimization (materialized views,
    physical clustering, ...)
  • Exploit other data mining algorithms (functional
    dependencies or inclusion dependencies)
Write a Comment
User Comments (0)
About PowerShow.com