Caching Multi-dimensional Queries Using Chunks - PowerPoint PPT Presentation

About This Presentation
Title:

Caching Multi-dimensional Queries Using Chunks

Description:

Q60, Q80 and Q100 - 60, 80 and 100% of the queries access 20% of the cube ... Percentage of the total cost of queries saved due to hits in the cache ... – PowerPoint PPT presentation

Number of Views:30
Avg rating:3.0/5.0
Slides: 39
Provided by: karthikeya7
Category:

less

Transcript and Presenter's Notes

Title: Caching Multi-dimensional Queries Using Chunks


1
Caching Multi-dimensional Queries Using Chunks
  • Prasad Deshpande
  • Joint work with
  • Jeffrey F. Naughton
  • Karthikeyan Ramasamy
  • Amit Shukla

2
OLAP Schema
  • Star Schema of Fact and Dimension tables
  • Example
  • Product(pname, pid, pcategory)
  • Store(sname, sid, scity, sstate, country)
  • Date(dname, did, dday, dyear)
  • Sales(pid, sid, did, dollar_sales)

3
OLAP Queries
  • View data for some dimension members at different
    levels of aggregation
  • Example
  • Select pname, dmonth, sum(dollar_sales)
  • From Sales, Date, Product
  • Where pcategory clothes
  • AND dmonth in Jan, Feb, Mar
  • AND Sales.did Date.did
  • AND Sales.pid Product.pid
  • Group by pid, dmonth

4
Outline of Talk
  • OLAP data model and queries
  • Caching for OLAP queries
  • Chunk based caching
  • Chunked file organization
  • Implementation and performance results
  • Summary and future work

5
Motivation for Caching
  • Require interactive response time
  • Queries computationally expensive due to
    aggregation
  • Possible to exploit special properties of the
    OLAP data model and queries

6
Motivation
Select pname, dmonth, sum(dollar_sales) From
Sales, Date, Product Where pname
blaire_cotton_shirts AND dmonth in Jan,
Feb, Mar, Apr, May, Jun AND Sales.did
Date.did AND Sales.pid Product.pid Group by
pname, pmonth
Select pname, dmonth, sum(dollar_sales) From
Sales, Date, Product Where pname
blaire_cotton_shirts AND dmonth in Apr,
May, Jun, Jul, Aug, Sep AND Sales.did
Date.did AND Sales.pid Product.pid Group by
pname, pmonth
Overlapping Queries
7
Locality in OLAP Queries
  • Temporal
  • Hierarchical
  • Due to access patterns along the hierarchies in
    the dimensions
  • Data members related by the parent-child or
    sibling relationships are accessed together
  • Example Cities in Wisconsin

8
Classification
  • Unit of Caching
  • Query Level, Table level, Semantic Regions etc.
  • Nature of caching
  • Static, Dynamic
  • Desirable properties
  • Cache only relevant parts
  • Dynamic caching

9
Multi-dimensional Chunking
  • Multi-dimensional arrays are typically stored in
    chunked format
  • Distinct values of each dimension are divided
    into ranges
  • Chunks represent semantic regions

10
Chunking
Region
Date
Product
Schema (Product, Region, Date, Dollar Sales)
11
Chunk Caching
  • Motivation - partial/full reuse of results
  • Chunks -- Unit for caching
  • Query results
  • Split into a set of chunks
  • chunks are cached
  • Cache contains chunks at different levels of
    aggregation
  • Dynamic scheme
  • different replacement policies

12
Reusing Cached Chunks
Q1
Q3
Region
Q2
Product
13
Closure Property of Chunks
  • Mapping between chunks at different levels of
    aggregation leads to efficient computation of
    missing chunks
  • Example

0
12
14
13
15
8
11
1
9
10
Region
Region
5
6
7
4
2
3
3
0
1
2
Product
14
Issues
  • Imposing order on the domain of dimensions
  • Use hierarchy on the dimension
  • Better use of hierarchical locality
  • Example
  • Cities in Wisconsin are grouped together for
    Region
  • Size of the chunk
  • Granularity vs. Overhead

15
Chunked File Organization
  • Motivation
  • Reduce cost of a chunk miss
  • Chunk-based multi-dimensional arrays
  • Loss of relational access to the data
  • Apply chunking to relational tables
  • Data still stored as tuples
  • Tuples clustered on a chunk basis
  • A chunk index is built to get access based on
    chunk numbers

16
Advantages
  • Cost of accessing a chunk proportional to the
    size of chunk rather than entire table
  • Maintains relational interface
  • Achieves multi-dimensional clustering
  • Improves performance of Bitmap Index

17
Putting it Together
  • Analyze query selection predicates to get a list
    of chunk numbers
  • Group by clause specifies the level of
    aggregation (denoted by group by identifier)
  • (Group by id, Chunk number) is a key for looking
    up in the cache

18
Processing a Query
Client
OLAP Server
Cache
Backend
19
Implementation
  • Chunked file implemented in SHORE storage manager
    of the Paradise Database System
  • Tuples are stored in a fixed length record file
    called Fact file
  • Tuples are clustered on chunk number
  • A B-Tree is used to implement the chunk index

20
Replacement Policies
  • Simple LRU
  • Not very suitable for OLAP queries
  • Chunks at different levels of aggregation have
    different costs of computation
  • Benefit Based Policies
  • Associate a profit metric (benefit) with each
    chunk
  • Benefit of a chunk is measured by the fraction of
    base table it represents
  • Combined CLOCK scheme with benefit

21
Experimental Setup
  • Four dimensions with hierarchy sizes 3, 2, 3, 2
  • Base data size 500,000 tuples of 20 bytes each
  • Cube size 300 MB
  • Cache size 30 MB
  • Buffer pool size at backend 8MB
  • Platform Dual processor Pentium 133 MHz with
    128 MB memory
  • Query stream consisting of 1500 queries

22
Query Profile
  • Designated hot region - a large percentage of the
    queries access data in the hot region
  • Q60, Q80 and Q100 - 60, 80 and 100 of the
    queries access 20 of the cube
  • Proximity queries - model hierarchical locality
  • QRandom - 100 randomly generated
  • QEqual - 50 random, 50 proximity
  • Qproximity - 20 random, 80 proximity

23
Comparison With Query Caching
24
Comparison With Query Caching
25
Varying the Chunk Range
26
Replacement Policies
27
Summary and Future Work
  • Chunk based caching performs better than
    traditional query caching
  • Chunk range has to be chosen optimally
  • Benefit based LRU performs better than Simple LRU
  • Future work
  • Implement aggregation in the cache
  • Other cache policies such as pre-fetching

28
OLAP Data Model
  • OLAP data is multi-dimensional
  • Dimensions and Metrics
  • Example
  • Schema (Product, Region, Date, Dollar Sales)
  • Product, Region and Date are Dimensions
  • Dollar Sales is a metric
  • Dimensions have hierarchies

29
Previous work
  • Static caching of entire tables
  • Harinarayan, Rajaraman, Ullman
  • Materialized views
  • Srivastava, Dar, Jagadish, Levy
  • Limited by query containment

30
Benefits of Chunk Caching
  • Granularity of caching
  • Only frequently accessed chunks get cached
  • Query containment not necessary
  • Better reuse of cached results
  • Uniformity
  • Easier to reuse results

31
Bitmap performance
  • Bitmaps used for selection
  • Bits set may be scattered
  • Chunks lead to clustering of data
  • Fewer I/Os
  • Example

Product
Schema (Product, Region, Sales)
Select all tuples corresponding to Madison
Region
Madison
32
Bitmap Performance
33
Performance Metrics
  • Average query execution time
  • Cost saving ratio (CSR)
  • Percentage of the total cost of queries saved due
    to hits in the cache
  • Gives a more accurate measure of the benefit of
    caching than the hit ratio

34
Comparison With Query Caching
35
Comparison With Query Caching
36
Varying the Chunk Range
37
Replacement Policies
38
Putting it Together
  • Split chunk number list depending on chunks
    present in the cache
  • Chunks present in the cache are answered from the
    cache
  • Generate SQL statement for fetching missing
    chunks from the backend
  • Backend uses chunked file for efficient retrieval
    of missing chunks
Write a Comment
User Comments (0)
About PowerShow.com