Join Synopses for Approximate Query Answering

Transcript and Presenter's Notes

Title: Join Synopses for Approximate Query Answering

1
Join Synopses for Approximate Query Answering

Swarup Acharya, Philip B. Gibbons, Viswanath
Poosala, Sridhar Ramaswamy

By Vladimir Gamaley
2
Abstract

In large data environments it is difficult to
provide fast and reliable answers.
In this paper we demonstrate the difficulties in
the traditional approach and propose new
techniques for the evaluation and maintenance of
join synopses.

3
Introduction

Traditional query processing approach: exact
answers in minimal time
An exact answer is not always needed
Sometimes an approximate answer is enough

4
Introduction (continued)

Schemes for providing approximate answers that
rely on samples of base relations alone suffer
from serious disadvantages.
Proposed instead: precomputed samples of a small
set of distinguished joins.

5
Introduction (continued)

Careful allocation of space
Allocation heuristics
Providing approximate bounds
Join synopses maintenance
Experimental study results

6
AQUA System

The goal of the AQUA system is to improve response
times by avoiding accesses to the original data.
It maintains small synopses of the data, such as
samples and histograms.

7
AQUA System (Components)

Statistics Collection
Query Rewriting
Maintenance

8
AQUA System (Architecture)
9
Problems with joins

Joining uniform random samples of base relations
yields:
Non-uniform result samples
Small join result sizes

10
Problems with joins (example)
Base probability for a tuple to be selected: 1/r
a1 and a2 both selected: 1/r^3
a1 and b1 both selected: 1/r^4
For a k-way foreign key join: 1/r^k
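The shrinking join size can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the two relations (orders referencing customers), their sizes, and the function name join_size_of_samples are hypothetical and not taken from the slides. Each relation is sampled independently with probability 1/r, so a join tuple survives only if both of its base tuples are sampled, which happens with probability about 1/r^2.

```python
import random

def join_size_of_samples(n_orders=20000, n_customers=2000, r=10, seed=0):
    """Join independent samples of two relations linked by a foreign key."""
    rng = random.Random(seed)
    # Each order references exactly one customer, so the full join has n_orders tuples.
    orders = [(i, rng.randrange(n_customers)) for i in range(n_orders)]
    # Sample each relation independently, keeping each tuple with probability 1/r.
    order_sample = [o for o in orders if rng.random() < 1 / r]
    customer_sample = {c for c in range(n_customers) if rng.random() < 1 / r}
    # A sampled order joins only if its customer was sampled too.
    joined = [o for o in order_sample if o[1] in customer_sample]
    return len(orders), len(order_sample), len(joined)

full, sampled, joined = join_size_of_samples()
print("full join:", full, "sampled orders:", sampled, "join of samples:", joined)
```

With r = 10 the sample of orders holds roughly a tenth of the relation, while the join of the two samples holds roughly a hundredth of the full join.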
11
Join Synopses
Foreign key join: a two-way join r1 ⋈ r2 is a
foreign key join if the join attribute is a
foreign key in r1 (a key in r2). For k > 2, a
k-way join is a k-way foreign key join if there is
an ordering r1, r2, ..., rk such that, for
i = 2, 3, ..., k, s_(i-1) ⋈ r_i is a two-way
foreign key join, where s_(i-1) is the relation
obtained by joining r1, r2, ..., r_(i-1).
12
Join synopses
TPC-D schema
13
Join synopses (continued)
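G is the schema graph: one node per relation, with
an edge from u to v when u holds a foreign key
to v.
Lemma 1: The subgraph of G on the k nodes in any
k-way foreign key join must be a connected
graph with a single root node.
Lemma 2: There is a 1-1 correspondence between
tuples in a relation r1 and tuples in any k-way
foreign key join whose source relation is r1.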
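The condition in Lemma 1 can be checked mechanically. The sketch below assumes G has an edge from u to v when relation u holds a foreign key to v. The dictionary TPCD_GRAPH is a simplified rendering of the TPC-D foreign keys written for illustration, and has_single_root_connected is a hypothetical helper name. It tests only the necessary condition stated in the lemma.

```python
# Simplified TPC-D foreign-key graph: relation -> relations it references.
TPCD_GRAPH = {
    "lineitem": ["orders", "partsupp"],
    "orders": ["customer"],
    "customer": ["nation"],
    "partsupp": ["part", "supplier"],
    "supplier": ["nation"],
    "nation": ["region"],
    "part": [],
    "region": [],
}

def has_single_root_connected(graph, relations):
    """True if the subgraph on `relations` has one root that reaches every node."""
    rels = set(relations)
    # Keep only edges whose endpoints are both in the chosen set.
    sub = {u: [v for v in graph.get(u, []) if v in rels] for u in rels}
    referenced = {v for children in sub.values() for v in children}
    roots = rels - referenced
    if len(roots) != 1:
        return False
    (root,) = roots
    # Depth-first search from the root, staying inside the subgraph.
    seen, stack = {root}, [root]
    while stack:
        for child in sub[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen == rels

print(has_single_root_connected(TPCD_GRAPH, ["lineitem", "orders", "customer"]))
print(has_single_root_connected(TPCD_GRAPH, ["customer", "supplier", "nation"]))
```

The first set forms a chain rooted at lineitem, so it passes. The second has two roots (customer and supplier), so by Lemma 1 it cannot be a foreign key join.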
14
Join Synopses (continued)
Join synopses: for each node u in G,
corresponding to a relation r1, define J(u) to be
the output of the maximum foreign key join
r1 ⋈ r2 ⋈ ... ⋈ rk with source r1. Let Su be a
uniform random sample of r1. Define a join
synopsis J(Su) to be the output of
Su ⋈ r2 ⋈ ... ⋈ rk. The join synopses for a
schema consist of J(Su) for all nodes u in G.
15
Join Synopses (continued)
Theorem: Let r1 ⋈ r2 ⋈ ... ⋈ rk, k > 3, be an
arbitrary k-way foreign key join with source
relation r1. Let u be the node in G corresponding
to r1 and let Su be a uniform random sample of r1.
Let A be the set of attributes in r1, r2, ..., rk.
Then:
1. J(Su) is a uniform random sample of J(u) with
|Su| tuples.
2. r1 ⋈ r2 ⋈ ... ⋈ rk is the projection of J(u)
on the attributes in A.
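The definition of a join synopsis and the projection step in the theorem can be illustrated on a toy schema. This is a sketch under stated assumptions: the three relations (orders referencing customers referencing nations), their contents, and the helper names join_synopsis and project are hypothetical. The source relation is sampled uniformly, then each sampled tuple is extended by following its foreign keys, so the synopsis has exactly |Su| tuples.

```python
import random

# Hypothetical toy schema: orders -> customers -> nations (foreign keys).
nations = {10: {"n_name": "FRANCE"}, 20: {"n_name": "JAPAN"}}
customers = {
    1: {"c_name": "Ann", "c_nation": 10},
    2: {"c_name": "Bob", "c_nation": 20},
    3: {"c_name": "Eve", "c_nation": 10},
}
orders = [{"o_id": i, "o_cust": 1 + i % 3, "o_total": 100 + i} for i in range(3000)]

def join_synopsis(source_rows, sample_size, seed=0):
    """Return J(Su): a uniform sample of the source joined along its foreign keys."""
    rng = random.Random(seed)
    sample = rng.sample(source_rows, sample_size)  # Su, drawn without replacement
    synopsis = []
    for order in sample:
        customer = customers[order["o_cust"]]      # exactly one match per foreign key
        nation = nations[customer["c_nation"]]
        synopsis.append({**order, **customer, **nation})
    return synopsis

def project(rows, attributes):
    """Keep only the listed attributes of each row."""
    return [{a: row[a] for a in attributes} for row in rows]

synopsis = join_synopsis(orders, sample_size=100)
# Sample of the smaller join orders ⋈ customers, obtained by projection.
orders_customers = project(synopsis, ["o_id", "o_cust", "o_total", "c_name", "c_nation"])
# Approximate SUM(o_total) for French customers by scaling the sample sum.
scale = len(orders) / len(synopsis)
french_estimate = scale * sum(r["o_total"] for r in synopsis if r["n_name"] == "FRANCE")
print(len(synopsis), len(orders_customers), round(french_estimate))
```

The same synopsis answers queries over orders alone, over orders joined with customers, and over the full three-way join, which is the point of precomputing only the maximum foreign key join.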
16
Join Synopses (continued)
Lemma: From a single join synopsis for a node
whose maximum foreign key join has k relations, we
can extract a uniform random sample of the output
of between k-1 and 2^(k-1) - 1 distinct foreign
key joins.
Lemma: For any node u whose maximum foreign key
join is a k-way join, the number of tuples in its
renormalized join synopsis J(Su) is at most k|Su|.
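The bounds k-1 and 2^(k-1) - 1 in the first lemma can be checked by brute force on two extreme shapes. This sketch assumes a tree-shaped foreign-key graph and counts the sets of at least two relations that contain the source and stay connected to it. The graphs chain and star and the function names are hypothetical.

```python
from itertools import combinations

def reachable(graph, root, allowed):
    """Nodes reachable from root using only nodes in `allowed`."""
    seen, stack = {root}, [root]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child in allowed and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def count_foreign_key_joins(graph, root):
    """Count relation sets (size >= 2) that form a foreign key join with source root."""
    all_nodes = set(graph) | {c for children in graph.values() for c in children}
    others = sorted(reachable(graph, root, all_nodes) - {root})
    count = 0
    for size in range(1, len(others) + 1):
        for subset in combinations(others, size):
            chosen = set(subset) | {root}
            # Valid only if every chosen relation is linked back to the source.
            if reachable(graph, root, chosen) == chosen:
                count += 1
    return count

chain = {"r1": ["r2"], "r2": ["r3"], "r3": ["r4"]}  # k = 4, a single path
star = {"r1": ["r2", "r3", "r4"]}                   # k = 4, all keys in r1
print(count_foreign_key_joins(chain, "r1"), count_foreign_key_joins(star, "r1"))
```

For k = 4 the chain yields 3 joins (k-1) and the star yields 7 joins (2^(k-1) - 1), matching the two ends of the range.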
17
Space allocation strategies
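n_i - number of tuples allocated to the join
synopsis of relation r_i
f_i - fraction of queries for which r_i is either
the sole relation or the source of a foreign key
join
s_i - size of a join synopsis tuple for r_i
N - total space available
Theorem: n_i = N' (f_i / s_i)^(2/3), where
N' = N / sum_j f_j^(2/3) s_j^(1/3)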
18
Space allocation strategies
Heuristics
EquiJoin: space divided equally between relations
CubeJoin: space in proportion to the cube root of
their join synopsis tuple size
PropJoin: space in proportion to their join
synopsis tuple size
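The allocation strategies above can be compared numerically. This is a sketch under stated assumptions, not the authors' code: it assumes the error bound for relation i shrinks like 1/sqrt(n_i), so the quantity minimized is the sum of f_i / sqrt(n_i) subject to the sum of s_i * n_i equal to N. Under that assumption a Lagrange-multiplier argument gives n_i proportional to (f_i / s_i)^(2/3). The function names and the sample values of N, f, and s are hypothetical.

```python
def optimal_allocation(N, f, s):
    """Tuples per synopsis minimizing sum(f_i / sqrt(n_i)) with sum(s_i * n_i) = N."""
    norm = sum(fj ** (2 / 3) * sj ** (1 / 3) for fj, sj in zip(f, s))
    return [N / norm * (fi / si) ** (2 / 3) for fi, si in zip(f, s)]

def heuristic_allocation(N, s, exponent):
    """Give relation i a space share proportional to s_i ** exponent.

    exponent 0   -> EquiJoin (equal space per relation)
    exponent 1/3 -> CubeJoin (space ~ cube root of tuple size)
    exponent 1   -> PropJoin (space ~ tuple size, so equal tuple counts)
    """
    total = sum(si ** exponent for si in s)
    # Space share divided by tuple size gives the number of tuples.
    return [N * si ** exponent / total / si for si in s]

N = 10000                  # total space, in the same units as s
f = [0.5, 0.3, 0.2]        # hypothetical query fractions
s = [100, 200, 400]        # hypothetical join synopsis tuple sizes

opt = optimal_allocation(N, f, s)
equi = heuristic_allocation(N, s, 0)
cube = heuristic_allocation(N, s, 1 / 3)
prop = heuristic_allocation(N, s, 1)
for name, alloc in [("optimal", opt), ("EquiJoin", equi), ("CubeJoin", cube), ("PropJoin", prop)]:
    print(name, [round(n, 1) for n in alloc])
```

Under the same assumption, CubeJoin coincides with the optimal allocation when all f_i are equal, which suggests why it is a natural choice when query frequencies are unknown.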
19
Maintenance of Join Synopses
Adding a tuple
Deleting a tuple
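One way to keep a join synopsis current under inserts and deletes is sketched below. The slides do not spell out the algorithm, so this uses reservoir-style sampling as a stand-in technique. The class JoinSynopsis, its parameters, and the single foreign-key lookup table are all hypothetical. The sketch assumes referential integrity and that only existing keys are deleted.

```python
import random

class JoinSynopsis:
    """Uniform sample of a source relation, stored as already-joined rows."""

    def __init__(self, capacity, lookup, seed=0):
        self.capacity = capacity
        self.lookup = lookup          # foreign key value -> referenced row
        self.rng = random.Random(seed)
        self.n = 0                    # current size of the source relation
        self.rows = {}                # source key -> joined row

    def insert(self, key, row, fk):
        """Handle a tuple added to the source relation."""
        self.n += 1
        joined = {**row, **self.lookup[fk]}
        if len(self.rows) == self.n - 1 and self.n <= self.capacity:
            # The sample still holds the whole relation, so keep everything.
            self.rows[key] = joined
        elif self.rng.random() < len(self.rows) / self.n:
            # Replace a random victim so the sample stays uniform at its size.
            victim = self.rng.choice(list(self.rows))
            del self.rows[victim]
            self.rows[key] = joined

    def delete(self, key):
        """Handle a tuple removed from the source relation."""
        self.n -= 1
        self.rows.pop(key, None)      # the sample shrinks only if the tuple was in it

customers = {1: {"c_name": "Ann"}, 2: {"c_name": "Bob"}}
syn = JoinSynopsis(capacity=50, lookup=customers)
for i in range(1000):
    syn.insert(i, {"o_id": i, "o_total": 100 + i}, fk=1 + i % 2)
print(len(syn.rows), syn.n)
```

In this sketch deletions shrink the sample and it never grows back on its own, so a real system would need to repopulate it, for example by rescanning the source relation, once it becomes too small.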
20
Experimental Results
Testbed
TPC-D decision support benchmark
Scale factor 0.3 (database of about 300 megabytes)
296 MHz UltraSPARC-II
I/O rate: 5 MB/sec
21
Experiment 1: accuracy vs. summary size
EquiBase, PropBase - produce answers only when
the summary size exceeds 1.5% of the database.
EquiJoin, PropJoin - good results even
for 0.1% of the database.
22
Experiment 2: execution time
Exact query execution time - 122 seconds.
The response time increases with the summary
size.
Queries using join synopses take about two
orders of magnitude less time!
23
Related Work
Approximate query answering
Statistical techniques
24
Conclusions
Schemes based on join synopses provide better
answers than those based only on samples of the
base relations. Approximate answering is
becoming extremely important in new applications
of data warehouses. However, open problems
remain: group-bys, rank queries, etc.
