Join Synopses for Approximate Query Answering

1
Join Synopses for Approximate Query Answering
  • Swarup Acharya, Philip B. Gibbons, Viswanath
    Poosala, Sridhar Ramaswamy

By Vladimir Gamaley
2
Abstract
  • In large data environments it is difficult to
    provide fast and reliable answers.
  • In this paper we demonstrate the difficulties in
    the traditional approach and propose new
    techniques for the evaluation and maintenance of
    join synopses.

3
Introduction
  • Traditional query processing approach: exact
    answers in minimal time
  • An exact answer is not always needed
  • Sometimes an approximate answer is enough

4
Introduction (continued)
  • Schemes for providing approximate answers that
    rely on basic relations alone suffer from serious
    disadvantages.
  • Use of precomputed samples of a small set of
    distinguished joins.

5
Introduction (continued)
  • Careful allocation of space
  • Allocation heuristics
  • Providing approximate bounds
  • Join synopses maintenance
  • Experimental study results

6
AQUA System
  • The goal of the AQUA system is to improve response
    times by avoiding accesses to the original data.
  • It maintains small synopses of the data, such as
    samples and histograms.

7
AQUA System (Components)
  • Statistics Collection
  • Query Rewriting
  • Maintenance

8
AQUA System (Architecture)
9
Problems with joins
  • Joining uniform random samples of the base
    relations yields:
    • Non-uniform result samples
    • Small join result sizes

10
Problems with joins (example)
  • Base probability for a tuple to be selected: $1/r$
  • $a_1$ and $a_2$: $1/r^3$
  • $a_1$ and $b_1$: $1/r^4$
  • For a k-way foreign key join: $1/r^k$
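These probabilities can be checked exactly on a toy instance. The sketch below is only an illustration: the base tuples (A1, A2, B1, B2), the join tuples E1..E3 and the helper `prob_all_present` are hypothetical names, not taken from the slides. It assumes every base tuple is kept independently with probability 1/r and enumerates all possible samples.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy instance: base tuples of relations A and B, and three
# join tuples of A ⋈ B. E1 and E2 share the base tuple A1; E1 and E3 share none.
BASE_TUPLES = ["A1", "A2", "B1", "B2"]
E1, E2, E3 = ("A1", "B1"), ("A1", "B2"), ("A2", "B2")


def prob_all_present(join_tuples, p):
    """Exact probability that every tuple in `join_tuples` appears in the
    join of the samples, when each base tuple is kept with probability p."""
    total = Fraction(0)
    for keep in product([False, True], repeat=len(BASE_TUPLES)):
        kept = {t for t, k in zip(BASE_TUPLES, keep) if k}
        weight = Fraction(1)
        for k in keep:
            weight *= p if k else 1 - p
        if all(a in kept and b in kept for a, b in join_tuples):
            total += weight
    return total


r = 10
p = Fraction(1, r)
print(prob_all_present([E1], p))      # 1/100   = 1/r^2
print(prob_all_present([E1, E2], p))  # 1/1000  = 1/r^3 (shared base tuple)
print(prob_all_present([E1, E3], p))  # 1/10000 = 1/r^4 (no shared base tuple)
```

In a sample drawn directly from the join result, with each join tuple kept independently at rate 1/r^2, every pair would appear together with probability 1/r^4. In the join of the samples, pairs that share a base tuple are over-represented, so the result is not uniform.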
11
Join Synopses
Foreign key join: A two-way join $r_1 \bowtie r_2$ is a
foreign key join if the join attribute is a
foreign key in $r_1$ (a key in $r_2$). For $k > 2$, a
k-way join is a k-way foreign key join if there is
an ordering $r_1, r_2, \dots, r_k$ such that, for
$i = 2, 3, \dots, k$, $s_{i-1} \bowtie r_i$ is a
2-way foreign key join, where $s_{i-1}$ is the relation
obtained by joining $r_1, r_2, \dots, r_{i-1}$.
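The ordering condition can be checked mechanically. The following is a minimal sketch, assuming foreign keys are given as a map from each relation to the relations it references. The `FOREIGN_KEYS` table is a simplified, illustrative subset of TPC-D-style references, and `is_foreign_key_join_order` is a hypothetical helper.

```python
# Simplified, illustrative foreign-key references:
# relation -> set of relations whose key it references.
FOREIGN_KEYS = {
    "lineitem": {"orders", "partsupp"},
    "orders": {"customer"},
    "customer": {"nation"},
    "partsupp": {"part", "supplier"},
    "supplier": {"nation"},
    "nation": {"region"},
}


def is_foreign_key_join_order(order, foreign_keys=FOREIGN_KEYS):
    """Return True if order = [r1, ..., rk] satisfies the definition: every
    r_i (i >= 2) is referenced by a foreign key of some earlier relation,
    so that s_(i-1) ⋈ r_i is a two-way foreign key join."""
    if len(order) < 2 or len(set(order)) != len(order):
        return False
    for i in range(1, len(order)):
        earlier = order[:i]
        if not any(order[i] in foreign_keys.get(r, set()) for r in earlier):
            return False
    return True


print(is_foreign_key_join_order(["lineitem", "orders", "customer"]))  # True
print(is_foreign_key_join_order(["orders", "lineitem"]))              # False
```

The second call fails because, in this toy map, `orders` holds no foreign key into `lineitem`. The join is a foreign key join only when `lineitem` is the source.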
12
Join synopses
TPC-D schema
13
Join synopses (continued)
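Here G is the schema graph: one node per relation,
with a directed edge from u to v when u's relation
has a foreign key referencing v's relation.
Lemma 1: The subgraph of G on the k nodes in any
k-way foreign key join must be a connected
graph with a single root node.
Lemma 2: There is a 1-1 correspondence between
tuples in a relation $r_1$ and tuples in any k-way
foreign key join with source relation $r_1$.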
14
Join Synopses (continued)
Join synopses: For each node u in G,
corresponding to a relation $r_1$, define $J(u)$ to be
the output of the maximum foreign key join
$r_1 \bowtie r_2 \bowtie \dots \bowtie r_k$ with source $r_1$.
Let $S_u$ be a uniform random sample of $r_1$. Define
the join synopsis $J(S_u)$ to be the output of
$S_u \bowtie r_2 \bowtie \dots \bowtie r_k$. The join synopses
for a schema consist of $J(S_u)$ for all nodes u in G.
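The construction can be sketched on toy data. Everything below is assumed for illustration: the relations `orders`, `customer` and `nation`, their attributes, and the helper `join_synopsis` are hypothetical. The point is only that the source relation is sampled first and each sampled tuple is then completed by following its foreign keys.

```python
import random

# Hypothetical toy relations. orders.cust_id is a foreign key into customer,
# and customer.nation_id is a foreign key into nation.
nation = {1: {"nation": "FR"}, 2: {"nation": "JP"}}
customer = {c: {"cust_id": c, "nation_id": 1 + c % 2} for c in range(100)}
orders = [{"order_id": o, "cust_id": o % 100, "price": 10 + o % 7}
          for o in range(5000)]


def join_synopsis(source_rows, sample_size, rng):
    """Draw a uniform sample S_u of the source relation, then join each
    sampled tuple with the customer and nation tuples it references."""
    sample = rng.sample(source_rows, sample_size)
    synopsis = []
    for row in sample:
        cust = customer[row["cust_id"]]
        nat = nation[cust["nation_id"]]
        synopsis.append({**row, **cust, **nat})
    return synopsis


syn = join_synopsis(orders, 50, random.Random(0))
print(len(syn))  # 50: one joined tuple per sampled source tuple
```

Because every foreign key matches exactly one referenced tuple, the synopsis has the same number of tuples as the sample, in line with Lemma 2.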
15
Join Synopses (continued)
Theorem: Let $r_1 \bowtie r_2 \bowtie \dots \bowtie r_k$, $k > 3$, be an
arbitrary k-way foreign key join with source
relation $r_1$. Let u be the node in G corresponding
to $r_1$ and let $S_u$ be a uniform random sample of
$r_1$. Let A be the set of attributes in
$r_1, r_2, \dots, r_k$. Then:
1. $J(S_u)$ is a uniform random sample of $J(u)$ with
$|S_u|$ tuples.
2. $r_1 \bowtie r_2 \bowtie \dots \bowtie r_k = \pi_A J(u)$, the projection
of $J(u)$ on the attributes in $r_1, r_2, \dots, r_k$.
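One way to use this property is to answer an aggregate over a foreign key join from the synopsis alone and scale the result by the ratio of source size to sample size. The sketch below uses hypothetical toy data and the standard scale-up estimator for a uniform sample. It is an illustration under those assumptions, not the estimator used in the paper.

```python
import random

# Hypothetical toy data: orders.cust_id is a foreign key into customer.
customer = {c: {"nation": "FR" if c % 4 == 0 else "JP"} for c in range(100)}
orders = [{"order_id": o, "cust_id": o % 100, "price": 10 + o % 7}
          for o in range(5000)]


def exact_sum(nation):
    """SUM(price) over orders ⋈ customer, restricted to one nation."""
    return sum(o["price"] for o in orders
               if customer[o["cust_id"]]["nation"] == nation)


def estimated_sum(nation, sample_size, rng):
    """The same query answered from a join synopsis of `orders`."""
    sample = rng.sample(orders, sample_size)                       # S_u
    synopsis = [{**o, **customer[o["cust_id"]]} for o in sample]   # J(S_u)
    scale = len(orders) / sample_size                              # |r1| / |S_u|
    return scale * sum(t["price"] for t in synopsis
                       if t["nation"] == nation)


print(exact_sum("FR"), estimated_sum("FR", 500, random.Random(1)))
```

When the sample covers the whole source relation, the estimate equals the exact answer. Smaller samples trade accuracy for speed.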
16
Join Synopses (continued)
Lemma: From a single join synopsis for a node
whose maximum foreign key join has k relations, we can
extract uniform random samples of between $k-1$ and
$2^{k-1}-1$ distinct foreign key joins.
Lemma: For any node u whose maximum foreign key
join is a k-way join, the number of tuples in its
renormalized join synopsis $J(S_u)$ is at most $k|S_u|$.
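The two extremes of the first lemma can be reproduced by brute force. This sketch rests on assumptions that the slide does not state: the maximum foreign key join is tree-shaped, and the foreign key joins with the same source correspond to the connected node sets that contain the root and at least one more relation. The helper `count_foreign_key_joins` and the node names are hypothetical.

```python
from itertools import combinations


def count_foreign_key_joins(parent):
    """Count node sets that contain the root plus at least one other node
    and are connected, i.e. each chosen node's parent is the root or chosen.
    `parent` maps every non-root relation to the relation that references it."""
    others = list(parent)
    count = 0
    for size in range(1, len(others) + 1):
        for subset in combinations(others, size):
            chosen = set(subset)
            if all(parent[n] == "root" or parent[n] in chosen for n in chosen):
                count += 1
    return count


k = 5
# Chain: root -> r2 -> r3 -> r4 -> r5
chain = {f"r{i}": ("root" if i == 2 else f"r{i - 1}") for i in range(2, k + 1)}
# Star: root references r2, r3, r4 and r5 directly
star = {f"r{i}": "root" for i in range(2, k + 1)}

print(count_foreign_key_joins(chain), count_foreign_key_joins(star))  # 4 15
```

For k = 5 the chain gives k - 1 = 4 joins and the star gives 2^(k-1) - 1 = 15, the two bounds in the lemma.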
17
Space allocation strategies
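  • $n_i$: number of tuples allocated to join synopsis i
  • $f_i$: fraction of queries for which relation i is
    either the sole relation or the source of a
    foreign key join
  • $s_i$: size of a join tuple
Theorem: $n_i = N' (f_i / s_i)^{2/3}$, where
$N' = N / \sum_j f_j^{2/3} s_j^{1/3}$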
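An allocation of this kind can be derived with a Lagrange multiplier. The derivation below is a sketch under two assumptions that the slide does not state: N is the total space budget, and the error of a query answered from synopsis i scales as $1/\sqrt{n_i}$, so the objective is the query-weighted average error.

```latex
\begin{align*}
\text{minimize}\quad & \sum_i f_i\, n_i^{-1/2}
  \quad\text{subject to}\quad \sum_i s_i\, n_i = N \\
\frac{\partial}{\partial n_i}:\quad
  & -\tfrac{1}{2} f_i\, n_i^{-3/2} + \lambda s_i = 0
  \;\Longrightarrow\; n_i \propto (f_i / s_i)^{2/3} \\
\text{so}\quad & n_i = N' (f_i / s_i)^{2/3},
  \qquad N' = \frac{N}{\sum_j f_j^{2/3}\, s_j^{1/3}}
\end{align*}
```

With equal $f_i$, the space $s_i n_i$ given to each synopsis is proportional to $s_i^{1/3}$, the cube root of its tuple size.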
18
Space allocation strategies
Heuristics
  • EquiJoin: divide the space equally between
    relations
  • CubeJoin: in proportion to the cube root of
    their join synopsis tuple size
  • PropJoin: in proportion to their join synopsis
    tuple size
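The three heuristics differ only in the weight given to each relation. The function below is a minimal sketch that assumes a space budget in bytes and one join tuple size per relation. It reads the PropJoin rule as proportional to join tuple size. The name `allocate` and the example sizes are hypothetical.

```python
def allocate(space_budget, tuple_sizes, strategy):
    """Split `space_budget` across join synopses and return how many tuples
    each synopsis gets. tuple_sizes[i] is the join tuple size of relation i."""
    if strategy == "EquiJoin":      # equal space per relation
        weights = [1.0 for _ in tuple_sizes]
    elif strategy == "CubeJoin":    # space proportional to cube root of size
        weights = [s ** (1.0 / 3.0) for s in tuple_sizes]
    elif strategy == "PropJoin":    # space proportional to size
        weights = [float(s) for s in tuple_sizes]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights)
    return [space_budget * w / total / s for w, s in zip(weights, tuple_sizes)]


sizes = [100, 200, 800]   # hypothetical join tuple sizes in bytes
budget = 110_000          # hypothetical space budget in bytes
for name in ("EquiJoin", "CubeJoin", "PropJoin"):
    print(name, [round(n, 1) for n in allocate(budget, sizes, name)])
```

Under this reading, PropJoin gives every synopsis the same number of tuples, EquiJoin favours relations with small join tuples, and CubeJoin sits between the two.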
19
Maintenance of Join Synopses
  • Adding a tuple
  • Deleting a tuple
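The slide names only the two operations, so the following is an assumed illustration rather than the paper's procedure. It maintains a Bernoulli sample of the source relation: each inserted tuple is sampled with a fixed probability and stored already joined, and a deleted tuple is dropped from the synopsis if it was sampled. The class name and the `lookup` callback are hypothetical, and changes to the referenced relations are not handled.

```python
import random


class BernoulliJoinSynopsis:
    """Join synopsis of a source relation, kept as a Bernoulli sample."""

    def __init__(self, rate, lookup, rng=None):
        self.rate = rate            # sampling probability for each insert
        self.lookup = lookup        # hypothetical: row -> attributes of referenced tuples
        self.rng = rng or random.Random()
        self.synopsis = {}          # source key -> joined tuple

    def insert(self, key, row):
        """With probability `rate`, sample the new tuple and store it joined."""
        if self.rng.random() < self.rate:
            self.synopsis[key] = {**row, **self.lookup(row)}

    def delete(self, key):
        """Remove the tuple from the synopsis if it had been sampled."""
        self.synopsis.pop(key, None)


syn = BernoulliJoinSynopsis(0.1, lambda row: {"nation": "FR"}, random.Random(0))
for k in range(1000):
    syn.insert(k, {"order_id": k})
syn.delete(17)
print(len(syn.synopsis))
```

Since every tuple is sampled independently, inserts and deletes on the source relation never bias the remaining sample. The sample size varies, which a fixed-size scheme would need extra work to control.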
20
Experimental Results
Testbed
  • TPC-D decision support benchmark
  • Scale factor 0.3 (database of about 300 megabytes)
  • 296 MHz UltraSparc-II
  • I/O: 5 MB/sec
21
Experiment 1: accuracy vs. summary size
  • EquiBase, PropBase: produce answers only when
    the summary size exceeds 1.5% of the database.
  • EquiJoin, PropJoin: good results even
    for 0.1% of the database.
22
Experiment 2: execution time
  • Actual execution time: 122 seconds.
  • The response time increases with the summary
    size.
  • A query using join synopses needs two orders of
    magnitude less time.
23
Related Work
  • Approximate query answering
  • Statistical techniques
24
Conclusions
  • Schemes based on join synopses provide better
    answers than those based only on samples of the
    base relations.
  • Approximate answering is becoming extremely
    important in new applications of data warehouses.
  • However, open problems remain: group-bys,
    ranks, etc.