Title: Query languages II: equivalence
1 Query languages II: equivalence & containment
(Motivation: rewriting queries using views)
- Conjunctive queries (CQs)
- Extensions of CQs
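2 Conjunctive queries: equivalence & containment
- For CQs q1, q2 with the same head predicate
- Decision problems
  - containment: is q1 ⊆ q2, i.e., q1(D) ⊆ q2(D) for every db D?
  - equivalence: is q1 ≡ q2, i.e., q1(D) = q2(D) for every db D?
- The two problems are equivalent: solve one, and you have solved the other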
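3 Solution for containment ⇒ solution for equivalence:
q1 ≡ q2 iff q1 ⊆ q2 and q2 ⊆ q1
Solution for equivalence ⇒ solution for containment:
for q1: p(x) :- r1(..), ..., rk(..) and q2: p(x) :- s1(..), ..., sm(..),
q1 ⊆ q2 iff q1 ≡ q1 ∧ q2, where q1 ∧ q2 is p(x) :- r1(..), ..., rk(..), s1(..), ..., sm(..)
(with the non-head variables of q2 renamed apart from those of q1)
(here, the ri and sj are db predicates, not necessarily different)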
4 Characterizations for containment (assume q1, q2 are given)
- A mapping h from the variables of q2 to variables/constants (extended naturally to constants and atoms) is a homomorphism from q2 to q1 if
  - it maps head(q2) to head(q1) (assuming same heads ⇒ identity on head vars)
  - it maps each atom of q2 to an atom of q1
  - if there are constraints on the side, Ci in qi, then h(C2) is implied by C1
- Notation
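The definition above can be checked mechanically. The following is a minimal sketch under an assumed encoding that is not part of the slides: a CQ is a pair (head, body), an atom is (predicate, args), and a term counts as a variable iff it is a string starting with a lowercase letter. Side constraints are not handled, and the two queries at the end are hypothetical.

```python
def is_var(term):
    # Assumed convention: lowercase-initial strings are variables, all else constants.
    return isinstance(term, str) and term[:1].islower()

def apply_mapping(h, term):
    # h is extended to constants as the identity.
    return h[term] if is_var(term) else term

def is_homomorphism(h, q2, q1):
    """True iff h maps head(q2) to head(q1) and each body atom of q2 to a body atom of q1."""
    (head2, body2), (head1, body1) = q2, q1
    try:
        if tuple(apply_mapping(h, t) for t in head2) != tuple(head1):
            return False
        atoms1 = {(pred, tuple(args)) for pred, args in body1}
        return all((pred, tuple(apply_mapping(h, t) for t in args)) in atoms1
                   for pred, args in body2)
    except KeyError:
        return False  # h must be defined on every variable of q2

# Hypothetical queries:  q1: p(x) :- e(x, y), e(y, A)    q2: p(x) :- e(x, z)
q1 = (("x",), [("e", ("x", "y")), ("e", ("y", "A"))])
q2 = (("x",), [("e", ("x", "z"))])
print(is_homomorphism({"x": "x", "z": "y"}, q2, q1))  # True
print(is_homomorphism({"x": "y", "z": "A"}, q2, q1))  # False: head is not preserved
```

Checking a given mapping takes time polynomial in the size of the two queries; finding one is the hard part.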
5 Thm: The following are equivalent for CQs w/o built-in preds:
(i) q1 ⊆ q2 (q2 contains q1)
(ii) there is a homomorphism from q2 to q1
Proof: (ii) ⇒ (i) is easy (and holds even with b.i. preds). Every valuation v from q1 into a db D can be composed with h to a valuation from q2 into D. Hence, every answer of q1 on D is also an answer of q2 on D.
(Figure: h maps q2 to q1; v maps q1 into D.)
6 For (i) ⇒ (ii)
- The body of a CQ (w/o b.i.s) can be viewed as a db
  - consider each variable as a constant, different from all constants in the CQ and the other variables
  - or, replace each variable x by a distinct constant cx
- Denote this db by db(q)
- Obviously, q(db(q)) contains the head of q (or its image)
- Example
  - Q: q(d) :- movies(t,d,a), directory(Plaza, t, 1930)
  - db(Q): movies(ct,cd,ca), directory(Plaza, ct, 1930)
  - Obviously, applying Q to this db, one obtains q(cd) (use the identity valuation)
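The construction of db(q) can be sketched in a few lines. This assumes a hypothetical encoding (a CQ is a pair (head, body), variables are lowercase-initial strings) and writes the fresh constant for variable x as "C_x", assuming no constant of the query already has that form.

```python
def is_var(term):
    # Assumed convention: lowercase-initial strings are variables, all else constants.
    return isinstance(term, str) and term[:1].islower()

def freeze(term):
    # Replace variable x by the fresh constant "C_x"; constants are unchanged.
    return "C_" + term if is_var(term) else term

def canonical_db(q):
    """Return (image of the head, db(q)) where db(q) is the frozen body of q."""
    head, body = q
    frozen_head = tuple(freeze(t) for t in head)
    facts = {(pred, tuple(freeze(t) for t in args)) for pred, args in body}
    return frozen_head, facts

# Q: q(d) :- movies(t,d,a), directory(Plaza, t, 1930)
Q = (("d",), [("movies", ("t", "d", "a")), ("directory", ("Plaza", "t", 1930))])
frozen_head, db = canonical_db(Q)
print(frozen_head)  # ('C_d',)
for fact in sorted(db, key=str):
    print(fact)
```

The identity valuation (x to C_x) sends every body atom of Q into db(Q), which is why the image of the head is always an answer.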
7 (i) ⇒ (ii) (q2 contains q1 ⇒ homomorphism from q2 to q1)
- Clearly, q1(db(q1)) contains head(q1)
- Since q1 ⊆ q2, q2(db(q1)) contains head(q1)
- The valuation from q2 to db(q1) that yields this answer is a homomorphism
- Example
  - q1: p(d) :- movies(t,d,Jane), directory(Plaza, t, 1930), location(Plaza, a, 01-58776655)
  - q2: p(z) :- movies(t,z,a), directory(Plaza, t, 1930)
  - Obviously, q1 is contained in q2, with h: t → t, z → d, a → Jane,
  - which maps the two atoms of body(q2) to the first two of body(q1), and head(q2) to head(q1)
8 Because of this characterization, such a homomorphism is also called a containment mapping from q2 to q1
- Intuition: q1 is contained in q2 iff
  - it has the same or more atoms
  - it may have some constants where q2 has variables
9 Another characterization
For a rule p(..) :- r1(..), ..., rk(..), a model is a set of facts over p, r1, .., rk that satisfies the rule as a logical formula (assuming all variables are universally quantified).
Thm: the following are equivalent
The important & useful characterization: homomorphism, i.e., containment mapping
10 Algorithm and complexity
- To decide if q1 is contained in q2, search for a containment mapping from the variables of q2 to the variables and constants of q1
  - easy & fast in many cases, exponential in the worst case
- Containment is in NP
  - given a mapping on the variables of q2, it is easy to check that it is a homomorphism to q1
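The search described above can be sketched as a simple backtracking procedure. This is one possible implementation, not the one from the lecture, under an assumed encoding: a CQ is a pair (head, body), and variables are lowercase-initial strings. The function names are illustrative.

```python
def is_var(term):
    # Assumed convention: lowercase-initial strings are variables, all else constants.
    return isinstance(term, str) and term[:1].islower()

def extend(h, src_args, dst_args):
    """Extend mapping h so that src_args is sent to dst_args; None on conflict."""
    if len(src_args) != len(dst_args):
        return None
    h = dict(h)
    for s, d in zip(src_args, dst_args):
        if is_var(s):
            if h.setdefault(s, d) != d:   # s already mapped elsewhere
                return None
        elif s != d:                      # constants must map to themselves
            return None
    return h

def find_containment_mapping(q2, q1):
    """Return a containment mapping from q2 to q1 (witnessing q1 ⊆ q2), or None."""
    (head2, body2), (head1, body1) = q2, q1

    def search(h, i):
        if i == len(body2):
            return h
        pred, args = body2[i]
        for pred1, args1 in body1:        # try every candidate target atom
            if pred1 == pred:
                h2 = extend(h, args, args1)
                if h2 is not None:
                    found = search(h2, i + 1)
                    if found is not None:
                        return found
        return None                       # backtrack

    h0 = extend({}, head2, head1)         # the head must map to the head
    return None if h0 is None else search(h0, 0)

# The movie example from the slides
q1 = (("d",), [("movies", ("t", "d", "Jane")),
               ("directory", ("Plaza", "t", 1930)),
               ("location", ("Plaza", "a", "01-58776655"))])
q2 = (("z",), [("movies", ("t", "z", "a")),
               ("directory", ("Plaza", "t", 1930))])
print(find_containment_mapping(q2, q1))  # {'z': 'd', 't': 't', 'a': 'Jane'}
print(find_containment_mapping(q1, q2))  # None
```

Each body atom of q2 has at most |body(q1)| candidate targets, so the search explores at most |body(q1)|^|body(q2)| combinations, which is the exponential worst case mentioned above.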
11 It is NP-hard
  - a graph G is 3-colorable iff there is a homomorphism from G (represented as an edge relation) to the 3-clique
  - one can represent G as the body of q2 (using distinct variables for distinct nodes), the 3-clique as the body of q1
  - for both, the head can be q( )
- Hence, containment & equivalence are NP-complete (even for queries with no head variables)
- Note: this is expression complexity, not data complexity (here there is actually no db)
  - (when such a query is applied to a db, it returns either {()} or {})
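The reduction can be illustrated with a small sketch. It assumes graphs given as lists of node pairs, uses a hypothetical predicate name e for the edge relation, and tests containment by brute force over all mappings, so it is only meant for tiny inputs.

```python
from itertools import product

def queries_from_graph(edges):
    """q2 encodes G (one variable per node); q1 encodes the 3-clique. Both heads are q()."""
    q2 = ((), [("e", (f"x{u}", f"x{v}")) for u, v in edges])
    colors = ("r", "g", "b")
    q1 = ((), [("e", (a, b)) for a in colors for b in colors if a != b])
    return q1, q2

def contained(q1, q2):
    """Brute-force test of q1 ⊆ q2: try every mapping from vars(q2) to vars(q1).

    Assumes all terms are variables and both heads are empty, as in the reduction.
    """
    (_, body1), (_, body2) = q1, q2
    vars2 = sorted({t for _, args in body2 for t in args})
    terms1 = sorted({t for _, args in body1 for t in args})
    atoms1 = set(body1)
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if all((p, tuple(h[t] for t in args)) in atoms1 for p, args in body2):
            return True
    return False

def three_colorable(edges):
    q1, q2 = queries_from_graph(edges)
    return contained(q1, q2)

triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(three_colorable(triangle))  # True
print(three_colorable(k4))        # False
```

A containment mapping here assigns one of the three clique variables to each node of G, and an edge atom can only land on an atom e(a, b) with a different from b, which is exactly the coloring condition.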
12 Minimization of CQs
- For q, define a minimal equivalent query as any equivalent q' with a minimal number of body atoms
- Thm: the minimal equivalent query of q
  - is unique up to isomorphism,
  - and can be obtained by removing some atoms from body(q)
- Proof
13 Thus, for every CQ Q, there is a subset of the body that gives a minimal equivalent query, called a core of Q.
It is not necessarily unique (different subsets may yield cores), but all cores are isomorphic.
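Since a core can be obtained by removing atoms, one way to compute it is to try dropping each body atom in turn and keep the smaller body whenever a containment mapping from the current body into it exists. The sketch below assumes a CQ encoded as (head, body) with lowercase-initial strings as variables; the function names and the example queries are illustrative.

```python
def is_var(term):
    # Assumed convention: lowercase-initial strings are variables, all else constants.
    return isinstance(term, str) and term[:1].islower()

def find_mapping(src, dst, h):
    """Backtracking search: map every atom in src to some atom in dst, extending h."""
    if not src:
        return h
    (pred, args), rest = src[0], src[1:]
    for pred_d, args_d in dst:
        if pred_d != pred or len(args_d) != len(args):
            continue
        h2 = dict(h)
        if all(h2.setdefault(s, d) == d if is_var(s) else s == d
               for s, d in zip(args, args_d)):
            found = find_mapping(rest, dst, h2)
            if found is not None:
                return found
    return None

def minimize(q):
    """Drop body atoms one at a time while the smaller query stays equivalent to q."""
    head, body = q
    body = list(body)
    fixed = {t: t for t in head if is_var(t)}   # identity on head variables
    i = 0
    while i < len(body):
        smaller = body[:i] + body[i + 1:]
        # The smaller query always contains the current one; the converse holds
        # iff the current body maps into the smaller body.
        if find_mapping(body, smaller, fixed) is not None:
            body = smaller
        else:
            i += 1
    return head, body

# Hypothetical queries
redundant = (("x",), [("e", ("x", "y")), ("e", ("x", "z"))])   # p(x) :- e(x,y), e(x,z)
minimal = (("x",), [("e", ("x", "y")), ("e", ("y", "x"))])     # p(x) :- e(x,y), e(y,x)
print(minimize(redundant))  # (('x',), [('e', ('x', 'z'))])
print(minimize(minimal))    # body unchanged, no atom can be dropped
```

Which core is returned depends on the order in which atoms are tried; starting from the last atom in the first example would keep e(x, y) instead, an isomorphic result.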
14 Containment & equivalence for extensions of CQs
- Extension to UCQs: let
  - q1: r1: p(x) :- body1,1 ... rk: p(x) :- body1,k
  - q2: s1: p(x) :- body2,1 ... sm: p(x) :- body2,m
- Thm: q1 ⊆ q2 iff for each ri there is some sj such that ri ⊆ sj
- Proof: ⇐ is obvious
  - ⇒: if q1 is contained in q2, then each ri is contained in q2
  - q2(db(ri)) contains p(x)
  - for some sj, sj(db(ri)) contains p(x)
  - ⇒ sj contains ri
15 Containment algorithm: For each ri, loop over the sj, and search for a containment mapping from sj to ri.
Still exponential in size (of both queries)
Complexity: The containment problem is now
Explanation: A relation R(..) is ptime if membership can be verified in ptime
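The loop over the ri and sj can be sketched as follows. The encoding is an assumption: a UCQ is a list of CQs, each a pair (head, body), with lowercase-initial strings as variables. The head is treated as one extra atom with the reserved name "$head" so that a single search handles head and body together.

```python
def is_var(term):
    # Assumed convention: lowercase-initial strings are variables, all else constants.
    return isinstance(term, str) and term[:1].islower()

def find_mapping(src, dst, h):
    """Backtracking search: map every atom in src to some atom in dst, extending h."""
    if not src:
        return h
    (pred, args), rest = src[0], src[1:]
    for pred_d, args_d in dst:
        if pred_d != pred or len(args_d) != len(args):
            continue
        h2 = dict(h)
        if all(h2.setdefault(s, d) == d if is_var(s) else s == d
               for s, d in zip(args, args_d)):
            found = find_mapping(rest, dst, h2)
            if found is not None:
                return found
    return None

def cq_contained(r, s):
    """r ⊆ s iff there is a containment mapping from s to r."""
    src = [("$head", tuple(s[0]))] + list(s[1])
    dst = [("$head", tuple(r[0]))] + list(r[1])
    return find_mapping(src, dst, {}) is not None

def ucq_contained(q1, q2):
    """q1 ⊆ q2 iff every rule ri of q1 is contained in some rule sj of q2."""
    return all(any(cq_contained(r, s) for s in q2) for r in q1)

# Hypothetical UCQs
q1 = [(("x",), [("a", ("x",)), ("b", ("x",))]),   # p(x) :- a(x), b(x)
      (("x",), [("c", ("x",))])]                  # p(x) :- c(x)
q2 = [(("x",), [("a", ("x",))]),                  # p(x) :- a(x)
      (("x",), [("c", ("x",))])]                  # p(x) :- c(x)
print(ucq_contained(q1, q2))  # True
print(ucq_contained(q2, q1))  # False
```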
16 For a UCQ Q we can also consider the canonical db of Q, denoted db(Q), obtained by taking the bodies of all the rules together as a db (with different existential variables in different rules).
Here also, Thm: Q1 is contained in Q2 iff Q2(db(Q1)) contains head(Q1)
(this also gives an algorithm for checking containment, which boils down to finding containment mappings)
17 Another extension of CQs: b.i. preds in the body
- Example
  - Q1: p(x, y) :- q(x, y), r(u, v), u ≤ v
  - Q2: p(x, y) :- q(x, y), r(u, v), r(v, u)
  - Is Q2 contained in/equivalent to Q1?
- Q2 is equivalent to the union of
  - Q2,1: p(x, y) :- q(x, y), r(u, v), r(v, u), u ≤ v
  - Q2,2: p(x, y) :- q(x, y), r(u, v), r(v, u), v ≤ u
- Clearly, Q2,1 and Q2,2 are both contained in Q1
- This can be generalized to an algorithm that reduces containment to that of UCQs (omitted)
18 Containment of a UCQ Q and a (recursive) Datalog program P
Still decidable, but double exponential time (upper & lower bound)
Here also, Thm: P contains Q iff P(db(Q)) contains head(Q)
This gives an algorithm for checking containment: apply P to db(Q), see if you obtain head(Q) (do you see exponentials in this algorithm?)
Containment of Datalog programs: undecidable
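The test "apply P to db(Q), see if you obtain head(Q)" can be sketched with naive bottom-up evaluation. The sketch covers the case where Q is a single CQ; for a union, the same test would be applied to each rule of Q separately. The encoding is an assumption: a rule is (head_atom, body), an atom is (predicate, args), variables are lowercase-initial strings, rules are safe, and "C_x" is the fresh constant for variable x.

```python
def is_var(term):
    # Assumed convention: lowercase-initial strings are variables, all else constants.
    return isinstance(term, str) and term[:1].islower()

def freeze(term):
    return "C_" + term if is_var(term) else term

def matches(body, facts, h):
    """Yield every valuation extending h that sends all atoms of body into facts."""
    if not body:
        yield h
        return
    (pred, args), rest = body[0], body[1:]
    for pred_f, args_f in facts:
        if pred_f != pred or len(args_f) != len(args):
            continue
        h2 = dict(h)
        if all(h2.setdefault(s, d) == d if is_var(s) else s == d
               for s, d in zip(args, args_f)):
            yield from matches(rest, facts, h2)

def evaluate(program, facts):
    """Naive bottom-up evaluation: apply all rules until no new fact is derived."""
    facts = set(facts)
    while True:
        new = set()
        for (pred, head_args), body in program:
            for h in matches(body, facts, {}):
                new.add((pred, tuple(h.get(t, t) for t in head_args)))
        if new <= facts:
            return facts
        facts |= new

def program_contains(program, q):
    """Test Q ⊆ P for a single CQ Q by checking that head(Q) is in P(db(Q))."""
    (pred, head_args), body = q
    db = {(p, tuple(freeze(t) for t in args)) for p, args in body}
    goal = (pred, tuple(freeze(t) for t in head_args))
    return goal in evaluate(program, db)

# Hypothetical program P: transitive closure of e
P = [(("t", ("x", "y")), [("e", ("x", "y"))]),
     (("t", ("x", "y")), [("e", ("x", "z")), ("t", ("z", "y"))])]
two_steps = (("t", ("x", "y")), [("e", ("x", "u")), ("e", ("u", "y"))])
reverse = (("t", ("x", "y")), [("e", ("y", "x"))])
print(program_contains(P, two_steps))  # True
print(program_contains(P, reverse))    # False
```

The fixpoint loop terminates because db(Q) has finitely many constants, so only finitely many facts can ever be derived.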