Containment of Relational Queries with Annotation Propagation

1
Containment of Relational Queries with Annotation
Propagation
  • Wang-Chiew Tan
  • University of California, Santa Cruz

2
Annotation Management System
  • A system that is able to propagate meta-data that
    is associated with a piece of data along with the
    data as the data is being moved around
  • Main feature
  • To trace the provenance and flow of data

[Figure: annotations a1 and a2 are carried along with the data through a transformation]
3
Tracing the Provenance and Flow of Data
[Figure: annotations a1, a2, a3 and b1, b2, b3 are traced through two successive transformations]
4
Other Applications
  • Keep information that cannot be otherwise stored
    in the current database design
  • Highlight wrong data
  • Erroneous data may be copied, but the comment
    that it is wrong goes along with it
  • Security
  • Annotate security level of data items
  • Quality metric
  • Annotate quality level of data items

5
Main Question
  • Are the annotated outcomes the same for
    equivalent queries?
  • Why this question?
  • A query optimizer rewrites a query. Will the
    rewritten query have the same annotation
    propagation behavior?

6
A Simple Example
  • Given two relation schemas R(A,B), S(B,C)
  • SELECT *
  • FROM R NATURAL JOIN S
  • versus
  • SELECT r.A, r.B, s.C
  • FROM R r, S s
  • WHERE r.B = s.B

[Figure: instance R = {(1, 2)}, S = {(2, 3)}, with annotations a and b attached to values]
7
In a More Concise Notation
  • Ans(x,y,z) :- R(x,y), S(y,z)
    x → 1, y → 2, z → 3
  • Ans(x,y,z) :- R(x,y), S(y',z), y = y'
    x → 1, y → 2, y' → 2, z → 3
  • A location is a triple (R, t, A)
  • Annotations of values that reside in different
    locations but are bound to the same variable are
    unioned together
  • Ans(y) :- R(x,y)
  • Ans(y) :- S(y,z)
  • Ans(2)
  • Annotations that belong to the same output
    location are unioned together
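The propagation rule for a single rule body can be sketched in a few lines of Python. This is only an illustration of the semantics stated in the bullets above, not the author's system: the function name `evaluate`, the dictionary encoding of locations as (relation, row, column), and the placement of annotations a and b on the two occurrences of the value 2 are all assumptions made for the example.

```python
from itertools import product

def evaluate(head, body, equalities, db, ann):
    """Evaluate one conjunctive rule and propagate annotations.

    head: tuple of variable names; body: list of (relation, variables);
    equalities: list of (var, var) pairs that only filter valuations;
    db: {relation: [tuple, ...]}; ann: {(relation, row, column): set}.
    Returns {output tuple: [annotation set for each head column]}.
    """
    out = {}
    # Try every combination of one row per subgoal.
    for rows in product(*(range(len(db[rel])) for rel, _ in body)):
        val, carried, ok = {}, {}, True
        for (rel, args), r in zip(body, rows):
            for col, var in enumerate(args):
                value = db[rel][r][col]
                # A variable must be bound to the same value everywhere.
                ok = ok and val.setdefault(var, value) == value
                # Annotations at every location bound to var are unioned.
                carried.setdefault(var, set()).update(ann.get((rel, r, col), ()))
        if not ok or any(val[u] != val[v] for u, v in equalities):
            continue
        slots = out.setdefault(tuple(val[v] for v in head), [set() for _ in head])
        # Annotations arriving at the same output location are unioned.
        for slot, v in zip(slots, head):
            slot |= carried[v]
    return out

db = {"R": [(1, 2)], "S": [(2, 3)]}
ann = {("R", 0, 1): {"a"}, ("S", 0, 0): {"b"}}  # hypothetical placement

natural = evaluate(("x", "y", "z"),
                   [("R", ("x", "y")), ("S", ("y", "z"))], [], db, ann)
explicit = evaluate(("x", "y", "z"),
                    [("R", ("x", "y")), ("S", ("y2", "z"))], [("y", "y2")], db, ann)
print(natural)
print(explicit)
```

Under this assumed placement, the shared variable y collects both a and b, whereas the rule with an explicit equality returns the same tuple with only the annotation from R on its second column.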

8
More Examples
  • Q1: Ans(x,v) :- R(x,y,u), R(x,z,v), R(t,w,z)
  • Q2: Ans(x,v) :- R(p,q,v), R(x,z,v), R(t,w,z)
  • First answer: Ans(1, 5)
  • Second answer: Ans(1, 5)

[Figure: instance R = {(1, 2, 3), (1, 4, 5), (1, 8, 4), (8, 9, 5)}, with annotations a, b, c, d attached to values]
9
A sufficient condition for annotation containment
  • Theorem: If Q1 and Q2 are equivalent and Q1 is
    minimal, then Q1 is annotation-contained in Q2
  • Intuition of proof
  • If Q1 is minimal, then no proper subquery of Q1
    is equivalent to Q1
  • The minimal query of Q2 is isomorphic to Q1 up to
    variable renaming. Assume that they are
    identical.
  • Any valuation ν for Q1 can be simulated by a
    valuation ν ∘ h for Q2 that carries annotations in the
    same way as ν does for Q1 (h is the homomorphism from
    Q2 to its minimal subquery)

10
Is the sufficient condition too strong?
  • Is it true that if Q1 is equivalent to Q2, then
    Q1 is annotation-contained in Q2?
  • Answer: No.
  • Is it true that if Q1 is contained in Q2 and Q1
    is minimal, then Q1 is annotation contained in
    Q2?
  • Answer: No.
  • Q1: Ans(x) :- R(x, y), S(x, y)
  • Q2: Ans(x) :- R(x, y)
  • Q1 returns Ans(1) and Q2 returns Ans(1), with
    incomparable annotation sets on the value 1
  • Both Q1 and Q2 are minimal queries, but neither
    is annotation-contained in the other

[Figure: instance R = {(1, 2), (1, 3)}, S = {(1, 2)}, with annotations a, b, c attached to values]
11
Necessary and Sufficient condition?
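Q1: H( x ) :- S( x ), where x occurs at the ith column of the head and at the jth column of the pth subgoal
h(y) = x, and h maps the qth subgoal of Q2 to the pth subgoal of Q1
Q2: H( y ) :- S( y ), where y occurs at the ith column of the head and at the jth column of the qth subgoal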
  • If Q1 carries an annotation of the jth column of
    some S-tuple to the output, there is a way for Q2
    to simulate this behavior via homomorphism h

12
A necessary and sufficient condition for
annotation-containment via homomorphisms
  • Theorem: Q1 is annotation-contained in Q2 iff for
    every distinguished variable x that occurs at the
    ith column in the head and jth column of the pth
    subgoal in the body of Q1, there exists a
    homomorphism h from Q2 to Q1 such that
  • h maps the body of Q2 into the body of Q1 and the
    head of Q2 to the head of Q1
  • Let the qth subgoal of Q2 be the preimage of the pth
    subgoal of Q1 under h. The variable that occurs
    at the jth column of the qth subgoal of Q2 is
    identical to the variable that occurs at the ith
    column in the head of Q2
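The condition in the theorem can be checked by brute force, as in the following sketch. It assumes queries whose heads and subgoals contain only variables, encodes a query as a (head, body) pair, and enumerates every assignment of Q2 subgoals to Q1 subgoals; the names `homomorphisms` and `annotation_contained` are illustrative, and the enumeration is exponential in the size of Q2, so this is not meant as a practical algorithm.

```python
from itertools import product

def homomorphisms(src, dst):
    """Yield (h, image) for every homomorphism from query src to query dst.

    A query is (head, body); the head and each subgoal are (name, variables).
    image[q] is the index of the dst subgoal that the qth src subgoal maps to.
    """
    (src_head, src_body), (dst_head, dst_body) = src, dst
    for image in product(range(len(dst_body)), repeat=len(src_body)):
        pairs = [(src_head, dst_head)]
        pairs += [(src_body[q], dst_body[p]) for q, p in enumerate(image)]
        h, ok = {}, True
        for (name_s, vars_s), (name_d, vars_d) in pairs:
            ok = ok and name_s == name_d and len(vars_s) == len(vars_d)
            for a, b in zip(vars_s, vars_d):
                ok = ok and h.setdefault(a, b) == b  # h must be a function
        if ok:
            yield h, image

def annotation_contained(q1, q2):
    """Check the homomorphism condition for 'q1 is annotation-contained in q2'."""
    (head1, body1), (head2, body2) = q1, q2
    homs = list(homomorphisms(q2, q1))
    if not homs:  # assumption: classical containment is required as well
        return False
    for i, x in enumerate(head1[1]):           # ith column of the head of q1
        for p, (_, args) in enumerate(body1):  # pth subgoal of q1
            for j, var in enumerate(args):     # jth column of that subgoal
                if var != x:
                    continue
                # Some h must map a subgoal q of q2 onto p such that column j
                # of q holds the variable at the ith head column of q2.
                witnessed = any(
                    image[q] == p and body2[q][1][j] == head2[1][i]
                    for _, image in homs
                    for q in range(len(body2))
                )
                if not witnessed:
                    return False
    return True

# Ans(x) :- R(x,y), S(x,y)   versus   Ans(x) :- R(x,y)
with_s = (("Ans", ("x",)), [("R", ("x", "y")), ("S", ("x", "y"))])
only_r = (("Ans", ("x",)), [("R", ("x", "y"))])
print(annotation_contained(with_s, only_r), annotation_contained(only_r, with_s))

# Ans(x) :- R(x,y), R(x,z)   versus   Ans(x) :- R(x,y)
two_r = (("Ans", ("x",)), [("R", ("x", "y")), ("R", ("x", "z"))])
print(annotation_contained(two_r, only_r), annotation_contained(only_r, two_r))
```

The check quantifies over homomorphisms separately for each occurrence of a distinguished variable, so different occurrences may be witnessed by different homomorphisms.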

13
Can a single homomorphism do the job?
  • Q1: Ans(x) :- R(x,y), R(x,z)
  • Q2: Ans(x) :- R(x,y)
  • Every homomorphism from Q2 to Q1 maps the body of
    Q2 to only one subgoal of Q1

14
Complexity of Annotation-Containment
  • Proposition: It is NP-complete to decide if Q1 is
    annotation-contained in Q2

15
Propagating annotations back
  • If we wish to attach an annotation on a piece of
    data in the output, on which source data should
    we attach an annotation?
  • The user should be given the choice
  • Alert the user of a side-effect-free annotation
    when there is one

16
Annotation Placement Problem
  • Given the source database, the query, and the output
    data that we wish to annotate, it is DP-hard to
    decide if there is a side-effect-free annotation
  • Upper-bound is not DP
  • Conjecture: in a class slightly above DP
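For a small, fixed query the placement question can be explored by exhaustive search, as in the sketch below. It assumes one reading of "side-effect-free": an annotation placed on a source location reaches the chosen output location and no other output location. The names `reach` and `side_effect_free`, the location encoding, and the sample instance are hypothetical, and the search is exponential in the number of subgoals.

```python
from itertools import product

def reach(head, body, db):
    """Map each source location (relation, row, column) to the set of output
    locations (output tuple, head column) that an annotation placed there
    would be propagated to."""
    reached = {}
    for rows in product(*(range(len(db[rel])) for rel, _ in body)):
        val, ok = {}, True
        for (rel, args), r in zip(body, rows):
            for col, var in enumerate(args):
                value = db[rel][r][col]
                ok = ok and val.setdefault(var, value) == value
        if not ok:
            continue
        out = tuple(val[v] for v in head)
        # A location feeds head column i when it is bound to the head variable.
        for (rel, args), r in zip(body, rows):
            for col, var in enumerate(args):
                for i, v in enumerate(head):
                    if v == var:
                        reached.setdefault((rel, r, col), set()).add((out, i))
    return reached

def side_effect_free(head, body, db, target):
    """Source locations whose annotation reaches exactly the target location."""
    return sorted(loc for loc, outs in reach(head, body, db).items()
                  if outs == {target})

# Hypothetical instance and query: Ans(x,z) :- R(x,y), S(y,z)
db = {"R": [(1, 2)], "S": [(2, 3), (2, 4)]}
head, body = ("x", "z"), [("R", ("x", "y")), ("S", ("y", "z"))]
print(side_effect_free(head, body, db, ((1, 3), 0)))
print(side_effect_free(head, body, db, ((1, 3), 1)))
```

In this instance the value 1 in R feeds both output tuples, so annotating the first column of the output (1, 3) always has a side effect, while the second column can be annotated through a single location in S.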

17
Related Work
  • The idea is not new, though annotations were never
    explicitly stated as provenance-based [Wang &
    Madnick, VLDB 90], [Lee, Bressan & Madnick, WIDM
    98], [Bernstein & Bergstraesser, IEEE Data Eng.
    99]
  • Annotations of Web Documents
  • Annotations on genomic sequences

18
Open Issues
  • Are there polynomial-time algorithms for deciding
    annotation-containment for the class of queries
    with bounded treewidth?
  • Is query minimization Church-Rosser?
  • Exact complexity of the annotation placement
    problem?
  • Annotation and propagation for XML data
  • Relationship between annotation-containment and
    containment of conjunctive queries under bag
    semantics

19
Open Issues (contd)
  • Annotation propagation semantics other than
    those based on provenance?
  • Querying the annotations?

20
Other results that do not carry over
  • Query Minimization
  • We can no longer minimize a query and preserve
    annotation-equivalence by discarding one subgoal
    at a time
  • Answering Queries using Views
  • Some classical results no longer hold
  • [LMSS95]: if a query Q has p subgoals and a query
    Q' is a complete minimal rewriting of Q using a
    set of views V, then Q' has at most p subgoals

21
Example
  • Q: A(x) :- R(x,z,v), R(x,u,z), R(x,z,t),
    R(x,s,z)
  • Q': A(x) :- R(x,u,z), R(x,z,t), R(x,s,z)
  • Qmin: A(x) :- R(x,z,t), R(x,s,z)
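The classical procedure of discarding one subgoal at a time can be sketched as follows. This is a plain minimizer for conjunctive queries under ordinary semantics, written only to make the sequence of queries above reproducible; it ignores annotations entirely, and the names `maps_into` and `minimize` are illustrative.

```python
from itertools import product

def maps_into(head, body, target_body):
    """True if some homomorphism fixes every head variable and sends each
    atom of body to an atom of target_body."""
    for image in product(target_body, repeat=len(body)):
        h = {v: v for v in head}  # the head must be mapped to itself
        ok = True
        for (name_s, vars_s), (name_d, vars_d) in zip(body, image):
            ok = ok and name_s == name_d and len(vars_s) == len(vars_d)
            for a, b in zip(vars_s, vars_d):
                ok = ok and h.setdefault(a, b) == b
        if ok:
            return True
    return False

def minimize(head, body):
    """Drop one redundant subgoal at a time until none can be dropped."""
    body = list(body)
    changed = True
    while changed:
        changed = False
        for k in range(len(body)):
            smaller = body[:k] + body[k + 1:]
            # The subgoal is redundant if the query maps into what is left.
            if maps_into(head, body, smaller):
                body, changed = smaller, True
                break
    return body

q_body = [("R", ("x", "z", "v")), ("R", ("x", "u", "z")),
          ("R", ("x", "z", "t")), ("R", ("x", "s", "z"))]
print(minimize(("x",), q_body))
```

On the four-subgoal query this removes R(x,z,v) first and R(x,u,z) second, leaving two subgoals. The point made on the slides is that such steps preserve classical equivalence but need not preserve annotation-equivalence.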

[Figure: instance R = {(1, 2, 3), (1, 3, 2), (1, 4, 5), (1, 4, 6)}, with annotations a1, a2, a3, a4]