Title: Containment of Relational Queries with Annotation Propagation
1Containment of Relational Queries with Annotation
Propagation
- Wang-Chiew Tan
- University of California, Santa Cruz
2Annotation Management System
- A system that propagates the metadata associated with a piece of data along with the data as the data is moved around
- Main feature
- To trace the provenance and flow of data
[Figure: annotations a1 and a2 before and after a transformation]
3Tracing the Provenance and Flow of Data
[Figure: annotations a1, a2, a3 and b1, b2, b3 traced through two transformations]
4Other Applications
- Keep information that cannot otherwise be stored in the current database design
- Highlight wrong data
- Erroneous data may be copied, but the comment that it is wrong goes along with it
- Security
- Annotate security level of data items
- Quality metric
- Annotate quality level of data items
5Main Question
- Are the annotated outcomes the same for equivalent queries?
- Why this question?
- A query optimizer rewrites a query. Will the
rewritten query have the same annotation
propagation behavior?
6A Simple Example
- Given two relation schemas R(A,B), S(B,C)
- SELECT *
- FROM R NATURAL JOIN S
- versus
- SELECT r.A, r.B, s.C
- FROM R r, S s
- WHERE r.B = s.B
- R = {(1, 2^a)}, S = {(2^b, 3)}, where a superscript denotes the annotation on a value
7In a More Concise Notation
- Ans(x,y,z) :- R(x,y), S(y,z)
- x → 1, y → 2^{a,b}, z → 3
- Ans(x,y,z) :- R(x,y), S(y',z), y = y'
- x → 1, y → 2^{a}, y' → 2^{b}, z → 3
- A location is a triple (R, t, A)
- Annotations of values that reside in different locations but are bound to the same variable are unioned together
- Ans(y) :- R(x,y)
- Ans(y) :- S(y,z)
- Ans(2^{a,b})
- Annotations that belong to the same output location are unioned together
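The two propagation rules on this slide can be tried out with a small evaluator. The following is a minimal sketch, not the system described in the talk: the function name `evaluate`, the rule encoding, and the placement of annotation a on the 2 in R and annotation b on the 2 in S are all assumptions made for illustration. The primed variable y' is written `y2`.

```python
from itertools import product

def evaluate(rules, db):
    """Evaluate a union of conjunctive queries with annotation propagation.

    rules: list of (head_vars, body, equalities); body is a list of
           (relation, vars) and equalities is a list of (var, var).
    db:    {relation: [row, ...]}, each row a list of (value, annotation_set).
    Returns {output_tuple: [annotation set for each output column]}.
    """
    out = {}
    for head, body, equalities in rules:
        # Try every way of picking one stored tuple per subgoal.
        for rows in product(*(db[rel] for rel, _ in body)):
            binding, notes, consistent = {}, {}, True
            for (_, variables), row in zip(body, rows):
                for var, (value, annotations) in zip(variables, row):
                    if binding.setdefault(var, value) != value:
                        consistent = False
                    # Same variable bound at several locations: union.
                    notes.setdefault(var, set()).update(annotations)
            if not consistent or any(binding[l] != binding[r]
                                     for l, r in equalities):
                continue
            key = tuple(binding[v] for v in head)
            columns = out.setdefault(key, [set() for _ in head])
            # Same output location reached again: union.
            for column, var in zip(columns, head):
                column |= notes[var]
    return out

# Assumed annotated instance: R = {(1, 2^a)}, S = {(2^b, 3)}.
db = {"R": [[(1, set()), (2, {"a"})]],
      "S": [[(2, {"b"}), (3, set())]]}

natural = [(("x", "y", "z"), [("R", ("x", "y")), ("S", ("y", "z"))], [])]
explicit = [(("x", "y", "z"), [("R", ("x", "y")), ("S", ("y2", "z"))],
             [("y", "y2")])]
union = [(("y",), [("R", ("x", "y"))], []),
         (("y",), [("S", ("y", "z"))], [])]

for name, rules in [("natural", natural), ("explicit", explicit),
                    ("union", union)]:
    print(name, evaluate(rules, db))
```

Under these assumptions the shared variable y in the first query collects both a and b, the explicit equality y = y' compares values without merging annotations (so the output B value carries only a), and the two-rule query merges a and b at the single output location.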
8More Examples
- Q1
- Ans(x,v) :- R(x,y,u), R(x,z,v), R(t,w,z)
- Q2
- Ans(x,v) :- R(p,q,v), R(x,z,v), R(t,w,z)
- R = {(1^a, 2, 3), (1^b, 4, 5^c), (1, 8, 4), (8, 9, 5^d)}
- First answer Ans(1^{a,b}, 5^{c})
- Second answer Ans(1^{b}, 5^{c,d})
9A sufficient condition for annotation containment
- Theorem: If Q1 and Q2 are equivalent and Q1 is minimal, then Q1 is annotation-contained in Q2
- Intuition of proof
- If Q1 is minimal, then no proper subquery of Q1 is equivalent to Q1
- The minimal query of Q2 is isomorphic to Q1 up to variable renaming. Assume that they are identical.
- Any valuation ν for Q1 can be simulated by a valuation ν ∘ h for Q2 that carries annotations in the same way as ν does for Q1 (h is the homomorphism from Q2 to its minimal subquery)
10Is the sufficient condition too strong?
- Is it true that if Q1 is equivalent to Q2, then Q1 is annotation-contained in Q2?
- Answer: No.
- Is it true that if Q1 is contained in Q2 and Q1 is minimal, then Q1 is annotation-contained in Q2?
- Answer: No.
- Q1: Ans(x) :- R(x, y), S(x, y)
- Q2: Ans(x) :- R(x, y)
- R = {(1^a, 2), (1^b, 3)}, S = {(1^c, 2)}
- Q1 returns Ans(1^{a,c}); Q2 returns Ans(1^{a,b})
- Both Q1 and Q2 are minimal queries, but neither is annotation-contained in the other
11Necessary and Sufficient condition?
- Q1: H(... x ...) :- S(... x ...), where x occurs at the ith column of the head and at the jth column of the pth subgoal
- Q2: H(... y ...) :- S(... y ...), where y occurs at the ith column of the head and at the jth column of the qth subgoal
- h(y) = x, and h maps the qth subgoal of Q2 to the pth subgoal of Q1
- If Q1 carries an annotation of the jth column of some S-tuple to the output, there is a way for Q2 to simulate this behavior via the homomorphism h
12A necessary and sufficient condition for
annotation-containment via homomorphisms
- Theorem: Q1 is annotation-contained in Q2 iff for every distinguished variable x that occurs at the ith column in the head and the jth column of the pth subgoal in the body of Q1, there exists a homomorphism h from Q2 to Q1 such that
- h maps the body of Q2 into the body of Q1 and the head of Q2 to the head of Q1
- Let the qth subgoal of Q2 be the preimage of the pth subgoal of Q1 under h. The variable that occurs at the jth column of the qth subgoal of Q2 is identical to the variable that occurs at the ith column in the head of Q2
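The condition in the theorem can be checked by brute force on small queries. The sketch below is one possible reading of the statement, not the author's algorithm: the names `homomorphisms` and `annotation_contained` are hypothetical, queries are assumed to contain variables only (no constants), and all homomorphisms are enumerated exhaustively, which takes exponential time.

```python
from itertools import product

def homomorphisms(src, dst):
    """Yield (h, image) for each homomorphism from query src to query dst.

    A query is (head_vars, body), body a list of (relation, vars).
    h maps src variables to dst variables; image[q] is the index of the
    dst subgoal onto which src subgoal q is mapped.
    """
    (src_head, src_body), (dst_head, dst_body) = src, dst
    for image in product(range(len(dst_body)), repeat=len(src_body)):
        ok = len(src_head) == len(dst_head)
        pairs = list(zip(src_head, dst_head))   # head must map to head
        for q, p in enumerate(image):
            (rel_q, vars_q), (rel_p, vars_p) = src_body[q], dst_body[p]
            if rel_q != rel_p or len(vars_q) != len(vars_p):
                ok = False
            pairs += zip(vars_q, vars_p)
        h = {}
        for a, b in pairs:
            if h.setdefault(a, b) != b:         # h must be a function
                ok = False
        if ok:
            yield h, image

def annotation_contained(q1, q2):
    """Test the homomorphism condition for 'q1 annotation-contained in q2'."""
    (head1, body1), (head2, body2) = q1, q2
    homs = list(homomorphisms(q2, q1))
    for i, x in enumerate(head1):
        for p, (_, vars_p) in enumerate(body1):
            for j, y in enumerate(vars_p):
                if y != x:
                    continue
                # Some h must send a subgoal q of q2 onto subgoal p of q1,
                # with q's jth variable equal to q2's ith head variable.
                if not any(image[q] == p and body2[q][1][j] == head2[i]
                           for _, image in homs
                           for q in range(len(body2))):
                    return False
    return True

# Ans(x) :- R(x,y), S(x,y)  versus  Ans(x) :- R(x,y)
p1 = (("x",), [("R", ("x", "y")), ("S", ("x", "y"))])
p2 = (("x",), [("R", ("x", "y"))])
print(annotation_contained(p1, p2), annotation_contained(p2, p1))

# Ans(x) :- R(x,y), R(x,z)  versus  Ans(x) :- R(x,y)
s1 = (("x",), [("R", ("x", "y")), ("R", ("x", "z"))])
s2 = (("x",), [("R", ("x", "y"))])
print(annotation_contained(s1, s2))
print(sorted(image for _, image in homomorphisms(s2, s1)))
```

For the second pair, the check succeeds only because a different homomorphism is chosen for each subgoal of the two-subgoal query: each homomorphism from the one-subgoal query covers exactly one subgoal.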
13Can a single homomorphism do the job?
- Q1: Ans(x) :- R(x,y), R(x,z)
- Q2: Ans(x) :- R(x,y)
- Every homomorphism from Q2 to Q1 maps the body of
Q2 to only one subgoal of Q1
14Complexity of Annotation-Containment
- Proposition: It is NP-complete to decide if Q1 is annotation-contained in Q2
15Propagating annotations back
- If we wish to attach an annotation to a piece of data in the output, to which source data should we attach the annotation?
- The user should be given the choice
- Alert the user of a side-effect-free annotation
when there is one
16Annotation Placement Problem
- Given the source database, the query, and the output data that we wish to annotate, it is DP-hard to decide if there is a side-effect-free annotation
- Upper bound is not DP
- Conjecture: in a class slightly above DP
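For small instances the placement problem can be explored by exhaustive search. The sketch below assumes that "side-effect-free" means an annotation placed on one source location reaches the chosen output location and no other output location; the slides do not spell this out, so treat the definition, the function names `reached_outputs` and `side_effect_free_sources`, and the example database as assumptions. A source location is written (relation, row index, column) and an output location (output tuple, column).

```python
from itertools import product

def reached_outputs(head, body, db, source):
    """Output locations that receive an annotation placed on `source`."""
    hit = set()
    for choice in product(*(range(len(db[rel])) for rel, _ in body)):
        binding, marked, ok = {}, set(), True
        for (rel, variables), r in zip(body, choice):
            for c, var in enumerate(variables):
                value = db[rel][r][c]
                if binding.setdefault(var, value) != value:
                    ok = False
                if (rel, r, c) == source:
                    marked.add(var)      # this variable carries the annotation
        if ok:
            out = tuple(binding[v] for v in head)
            hit |= {(out, i) for i, v in enumerate(head) if v in marked}
    return hit

def side_effect_free_sources(head, body, db, target):
    """Source locations whose annotation reaches exactly `target`."""
    return [(rel, r, c)
            for rel, rows in db.items()
            for r, row in enumerate(rows)
            for c in range(len(row))
            if reached_outputs(head, body, db, (rel, r, c)) == {target}]

# Hypothetical instance: Ans(x,y,z) :- R(x,y), S(y,z)
db = {"R": [(1, 2)], "S": [(2, 3), (2, 4)]}
head = ("x", "y", "z")
body = [("R", ("x", "y")), ("S", ("y", "z"))]

# Annotating the 1 in the output tuple (1, 2, 3): the only candidate source,
# the 1 in R, also annotates (1, 2, 4), so no placement is side-effect-free.
print(side_effect_free_sources(head, body, db, ((1, 2, 3), 0)))
# Annotating the 2 in (1, 2, 3): the 2 in the S tuple (2, 3) works.
print(side_effect_free_sources(head, body, db, ((1, 2, 3), 1)))
```

The search tries every source location and re-evaluates the query each time, so it is only practical for toy inputs, in line with the hardness result stated on the slide.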
17Related Work
- Idea is not new, though annotations were never explicitly stated as provenance-based [Wang & Madnick VLDB 90], [Lee, Bressan & Madnick WIDM 98], [Bernstein & Bergstraesser IEEE Data Eng. 99]
- Annotations of Web documents
- Annotations on genomic sequences
18Open Issues
- Are there polynomial-time algorithms for deciding annotation-containment for the class of queries with bounded treewidth?
- Is query minimization Church-Rosser?
- Exact complexity of the annotation placement problem?
- Annotation and propagation for XML data
- Relationship between annotation-containment and containment of conjunctive queries under bag semantics
19Open Issues (contd)
- Annotation propagation semantics other than those based on provenance?
- Querying the annotations?
20Other results that do not carry over
- Query Minimization
- We can no longer minimize a query and preserve annotation-equivalence by discarding one subgoal at a time
- Answering Queries using Views
- Some classical results no longer hold
- [LMSS95]: if a query Q has p subgoals and a query Q' is a complete minimal rewriting of Q using a set of views V, then Q' has at most p subgoals
21Example
- Q: A(x) :- R(x,z,v), R(x,u,z), R(x,z,t), R(x,s,z)
- Q': A(x) :- R(x,u,z), R(x,z,t), R(x,s,z)
- Qmin: A(x) :- R(x,z,t), R(x,s,z)
- R = {(1, 2, 3), (1, 3, 2), (1, 4, 5), (1, 4, 6)}, with annotations a1, a2, a3, a4