Join%20Synopses%20for%20Approximate%20Query%20Answering - PowerPoint PPT Presentation

About This Presentation
Title:

Join%20Synopses%20for%20Approximate%20Query%20Answering

Description:

demonstrates difficulty of providing good approximate answers to join queries. proposes join synopses as the efficient ... sum (nisi) = Total memory allocated ... – PowerPoint PPT presentation

Number of Views:53
Avg rating:3.0/5.0
Slides: 19
Provided by: cmk264
Learn more at: https://crystal.uta.edu
Category:

less

Transcript and Presenter's Notes

Title: Join%20Synopses%20for%20Approximate%20Query%20Answering


1
Join Synopses for Approximate Query Answering
  • Swarup Achrya
  • Philip B. Gibbons
  • Viswanath Poosala
  • Sridhar Ramaswamy
  • Presented by
  • Bhushan Pachpande

2
Contents
  1. Introduction
  2. Need for approximate answers
  3. Problem with joins
  4. Join synopses
  5. Allocation
  6. Maintenance of join synopses
  7. Experimental Evaluation

3
Introduction
  • This paper
  • demonstrates difficulty of providing good
    approximate answers to join queries
  • proposes join synopses as the efficient solution
    for this problem
  • presents strategy for allocating available space
    for join synopses
  • provides efficient algorithm for maintaining join
    synopses in presence of updates to the base
    relations

4
Why Approximate answers ?
  • Reduce overhead for large DBs and improve
    response time
  • Reduce access to the base relation
  • Example of Approximate answers
  • Initial queries in the data mining which are used
    to determine what the interesting queries are
  • Queries requesting numerical answers and full
    precision of exact answer not needed e.g. total,
    average
  • The research in this paper was conducted while
    developing efficient approximate query answering
    system, Aqua.

5
Aqua System
  • improve response time by avoiding frequent access
    to original data
  • maintains smaller sized statistical summaries,
    called synopses, on warehouse.
  • sits on the top of the DBMS.

Collects all synopses, uses it to answer queries
posed by user
  • There key components
  • Statistic Collection
  • Query Rewriting
  • Maintenance

parses sql input, rewrite queries for scaling
certain operators to fit for synopses
Keep synopses up to date during updating of
original data
6
Problem with Joins
  • Natural set of synopses for an approximate query
    includes uniform random samples of each base
    relations
  • Non-uniform result samples - For the join to be
    uniform random sample, probabilities of tuples in
    join samples must be equal
  • Small join result

Join of relations R S on attribute X
Probabilities of tuples a1 and a2 being
selected should be same as prob. of tuples a1
and b1 selected in join
R.X
S.X
a a b b
a b
a1
a2
prob. (a1,a2)(1/r)(1/r)(1/r)1/r3
b1
prob. (a1,b1)(1/r)(1/r)(1/r) (1/r)1/r4
To get uniform join samples is very difficult
uniform random sampling
7
Join Synopses
  • Naïve way - execute all possible join queries and
    collect samples
  • Join synopses - samples are taken from small set
    of distinguished joins
  • Can obtain random samples of all possible joins
    in the schema
  • This is scheme is for foreign key joins
  • Modeled database schema as a graph
  • vertex - base relation
  • directed edge (u to v) if u has at least one
    attribute which is foreign key in v

8
Join Synopses
  • Key result proved - There is 1-1 correspondence
    between a tuple in relation r a tuple in the
    output of any foreign key join involving r
    any of its descendents in the graph.
  • A sample Sr of a relation r can be used to
    produce another relation J(Sr) called a join
    synopsis of r. ( provides random samples).
  • Join synopses of R is simply a sample of R where
    as for C it is the join of N, R and sample of C.

9
Join Synopses
  • For each node u in database schema G,
    corresponding to a relation r1, define J(u) to be
    the output of the maximum foreign key join
    r1xr2x..xrk with source r1.
  • Let Su be a uniform random sample of r1.
  • The join synopsis J(Su) is the output of
    Suxr2xr3..xrk.
  • J(Su) is a uniform random sample of J(u) with
    Su tuples.
  • Thus we can extract from our synopsis a uniform
    random sample of the output of any k-way foreign
    key join.
  • From 1 join synopsis for a node whose foreign key
    join has k relations, we can extract a URS of the
    output of between k-1 pow(2,k-1)-1 distinct
    foreign key joins.

10
Allocation
  • Allocate space among various join synopses when
    certain properties of query workload are known.
  • Identify heuristics for the common case when such
    properties are not known.
  • Let S be a set of queries with selects,
    aggregates, group bys foreign key joins.
  • For each relation Ri, find fraction Fi of queries
    in S for which Ri is the source relation in a
    foreign key join.
  • It is known that the error bounds are inversely
    proportional to sqrt(n).(n- number of tuples in
    join sample).
  • Select join synopsis sizes so as to minimize the
    average relative error.

11
Allocation
  • The average relative error bound over the queries
    is proportional to sum(fi/sqrt(ni))
  • ni is selected so as to minimize the above
    equation for the total memory allocated for join
    synopses
  • For each relation Ri if si size of single join
    synopses tuple then join synopses size is chosen
    so as
  • sum (nisi) lt Total memory allocated
  • In the absence of query work load information
    heuristic strategies can be used.
  • EqJoin
  • CubeJoin
  • PropJoin

divides space equally amongst the relations
divides space proportional to their join synopses
tuple sizes
divides space proportional to cube root of their
join synopses tuple sizes
12
Maintenance of Join Synopses
  • Need to maintain the join synopses when base
    relation is updated (insert or delete)
  • does not require frequent access to base relation
  • If a new tuple is inserted
  • Let Pu be the probability of newly arrived tuple
    for relation u in random sample Su
  • Let uxr2xr3x.xrk be the max foreign key join
    with source u.
  • We add T (new tuple) to Su with probability Pu.
  • If T is added to Su, we add to J(Su) the tuple
    Txr2xr3x.rk

13
Maintenance of Join Synopses
  • If T is added to Su and Su exceeds its target
    size, then select uniformly at random a tuple T
    to evict from Su and remove the tuple in J(Su)
    corresponding to T.
  • On delete of a tuple T from u
  • T is in Su delete the tuple from Su and remove
    the tuple from J(Su) corresponding to T
  • If sample becomes too small due to many deletions
    repopulate by scanning relation u.
  • This algorithm performs lookups with the base
    relation with small probability Pu

14
Experimental Evaluation
  • Two classes of experiments
  • Accuracy experiments
  • Maintenance experiments
  • Accuracy Experiments
  • Compares accuracy of techniques based on join
    synopses and based on base samples
  • parameters varied - query selectivity and total
    space allocated to precomputed summaries (summary
    size/join synopses size)
  • Maintenance Experiments
  • Study cost of keeping join synopses up to date in
    presence of insertions/deletions to the
    underlying data.

15
Experimental Evaluation
  • Ran results on TPC-D decision support benchmark
  • Query used is an aggregate that is computed on
    join of Lineitem, Customer, Order, Supplier,
    Nation and Region.
  • The query used is

Query selectivity is varied using these parameters
  • region parameter is set to ASIA and selection
    predicate is on o_orderdate column to the range
    1/1/94, 1/1/95

16
Experimental Evaluation
Accuracy Experiments
17
Experimental Evaluation
Maintenance Experiments
tuples inserted in lineitem table
18
Conclusion
  • Provides uniform random sampling for joins in the
    database having foreign key joins.
  • Focus on computing approximate answers to
    aggregates computed on multi-way joins.
  • Join synopses can be maintained effectively
    during updates.
Write a Comment
User Comments (0)
About PowerShow.com