Title: Containment of Conjunctive Queries over Databases with Null Values
1Containment of Conjunctive Queries
overDatabases with Null Values
Carles Farré 1 Werner Nutt 2 Ernest Teniente
1 Toni Urpà 1 1 Universitat Politecnica to
Catalunya 2 Freie Universität Bozen
2Containment of Queries
- Introduced for conjunctive queries
(Chandra/Merlin 1977) - Investigated for a wealth of query types
- Underlying assumption
- databases contain constants
- but no null values
- However, nulls are ubiquitous in SQL databases
3Our Topic
- Containment
- of conjunctive queries
- (without comparisons)
- evaluated under set-semantics
- over databases that may contain null values
- possibly with a test is null
- according to the semantics of SQL for null values
- ? Single block SQL queries with SELECT DISTINCT
4Nulls in SQL
- residence(person,loc)
- Returns an answer if the WHERE clause evaluates
to true - Equalities involving NULL have the logical value
unknown - Conjunctions of conditions are true iff all
conjuncts are true - ? q returns locations ?NULL
5Null-containment Example
-
- Over databases w/ null values q ? q?, but not
q? ? q - Over databases w/o null values q q?
6Query Rules
- Rule notation
- q(x) ? residence(y1,x), residence(y2,x)
- The constant ? does not occur in queries
- A join variable is a variable that
occurs at least twice in the query body
7Query Evaluation
- A database D is a set of ground atoms, which may
contain the value ? - A query
- q(X) ? B
-
- is satisfied by an assignment ? over D if
- ?B ? D
- no join variable of q is mapped to ?
- Over D, the query q returns the set of answers
- q(D) ?X ? satisfies q over D
8Null-containment
- q is null-contained in q? (q ?? q?)
- iff
- q(D) ? q?(D) for all databases D possibly
containing nulls - Clearly,
- q ?? q? implies q ? q?, but not vice versa
-
9Checking Null-containment q ?? q??
- Given q(X) ? B, q?(X) ? B?
- Test Databases
- View B as a database Dq (freeze the variables)
- Let ? be a substitution that replaces some
non-join variables of q with ? - The database ?Dq is a null-version of Dq
- Theorem
- q ?? q? iff ?X ? q?(?Dq) for all
null-versions ?Dq of Dq
10Null Versions Example
- q?(x) ? residence(y1,x), residence(y2,x)
- q(x) ? residence(y,x)
- Test database Dq residence(y, x)
- Null versions of Dq are
- D1 residence(y, x)
- D2 residence( ?, x)
- D3 residence(y, ?)
- D4 residence( ?, ?)
- We have
- q(D3) ?, but q?(D3) ?
- Hence q ?? q?
11Upper Complexity Bound
- Corollary Null-containment of conjunctive
queries is in ?P2 - Questions
- Is the upper bound tight?
- Are there reasonable sufficient criteria
that are less
expensive?
12Homomorphisms
- Consider q(X) ? B, q?(X) ? B?
- A substitution ? for the variables of q? is
- a condition homomorphism if ?B? ? B
- a query homomorphism if in addition ?X X
- a J-homomorphism if in addition ? maps every
join variable of q? - either to a join variable of q
- or to a constant
13Sufficient Condition
- Proposition If there is a J-homomorphism ? from
q? to q, - then q ?? q?
- Proof If ? satisfies q over D, then ?? satisfies
q? over D -
-
When is this also necessary?
14Null-containment of Boolean Queries
- Theorem Let q(), q?() be boolean queries. Then
- q ?? q? ? there is a J-homomorphism from q?
to q - Intuition One test database ??Dq is enough,
- where ?? replaces every non-join variable with ?
- Corollary For boolean queries,
- null-containment is NP-complete
15Null-containment Example
q?(x) ? p(x, v1, w1), p(u, v2, w2), p(u, v3,
w3), r(v1, w2)
q?(x) ? p(x, v1, w1), p(u, v2, w2), p(u, v3,
w3), r(v1, w2)
q(x) ? p(x, y1, z1), p(x2, y2, z2), p(x3, y3,
x3), r(y1, z1), r(y1, z2), r(y2, x3)
- Note
- q? has three occurrences of p
- q? has three join variables,
- u occurs twice in a p-atom
- We look
for homomorphisms from q? to q
v1 , w2 and u
16Null-containment Example
q?(x) ? p(x, v1, w1), p(u, v2, w2), p(u, v3,
w3), r(v1, w2)
?1
q(x) ? p(x, y1, z1), p(x2, y2, z2), p(x3, y3,
x3), r(y1, z1), r(y1, z2), r(y2, x3)
- ?1 is a query homomorphism, but not a
J-homomorphism
17Null-containment Example
q?(x) ? p(x, v1, w1), p(u, v2, w2), p(u, v3,
w3), r(v1, w2)
?2
q(x) ? p(x, y1, z1), p(x2, y2, z2), p(x3, y3,
x3), r(y1, z1), r(y1, z2), r(y2, x3)
- ?1 is a query homomorphism, but not a
J-homomorphism - ?2 is a query homomorphism, but not a
J-homomorphism
18Null-containment Example
q?(x) ? p(x, v1, w1), p(u, v2, w2), p(u, v3,
w3), r(v1, w2)
?3
q(x) ? p(x, y1, z1), p(x2, y2, z2), p(x3, y3,
x3), r(y1, z1), r(y1, z2), r(y2, x3)
- ?1 is a query homomorphism, but not a
J-homomorphism - ?2 is a query homomorphism, but not a
J-homomorphism - ?3 is a condition homomorphism, but not a query
homomorphism
19Null-containment Example
q?(x) ? p(x, v1, w1), p(u, v2, w2), p(u, v3,
w3), r(v1, w2)
?
q(x) ? p(x, y1, z1), p(x2, y2, z2), p(x3, y3,
x3), r(y1, z1), r(y1, z2), r(y2, x3)
- ?1 is a query homomorphism, but not a
J-homomorphism - ?2 is a query homomorphism, but not a
J-homomorphism - ?3 is a condition homomorphism, but not a query
homomorphism - There is no other condition homomorphism from q?
to q - Hence There is no J-homomorphism from q? to q
-
What about containment?
20Null-containment Example
q?(x) ? p(x, v1, w1), p(u, v2, w2), p(u, v3,
w3), r(v1, w2)
q(x) ? p(x, y1, z1), p(x2, y2, z2), p(x3, y3,
x3), r(y1, z1), r(y1, z2), r(y2, x3)
- Suppose ? satisfies q over a database D and
retrieves ?x - Case 1 ?x ? ?
-
- Then ??1 satisfies the body of q? and ??1x ?x
21Null-containment Example
q?(x) ? p(x, v1, w1), p(u, v2, w2), p(u, v3,
w3), r(v1, w2)
q(x) ? p(x, y1, z1), p(x2, y2, z2), p(x3, y3,
x3), r(y1, z1), r(y1, z2), r(y2, x3)
?
D
- Suppose ? satisfies q over a database D and
retrieves ?x - Case 2 ?x ? and ?x2 ? ?
-
- Then ??2 satisfies the body of q? and ??2x ?x
22Null-containment Example
q?(x) ? p(x, v1, w1), p(u, v2, w2), p(u, v3,
w3), r(v1, w2)
q(x) ? p(x, y1, z1), p(x2, y2, z2), p(x3, y3,
x3), r(y1, z1), r(y1, z2), r(y2, x3)
?
D
- Suppose ? satisfies q over a database D and
retrieves ?x - Case 3 ?x ? and ?x2 ?
-
- Then ??3 satisfies the body of q? and ??3x ?x2
? ?x
23Observations
- Existence of J-homomorphisms is not necessary
for null-containment - We may have to consider condition homomorphisms,
not only query homomorphisms - Sometimes, several homomorphisms together give
rise to containment
24Complexity
- Theorem Null-containment of conjunctive queries
is - ?P2-complete
- When is it impossible for
such weird things to happen?
25Binary Queries
- Definition A query q(X) ? B is binary, if no
predicate - occurs more than twice in B
- Theorem (Sagiv/Saraiya 92) For binary queries,
existence - of a query homomorphism can be checked in
polynomial - time
- None of the queries in the last
example was binary
26Null-containment of Binary Queries
- Theorem Suppose q ?? q?. If q is binary, then
- there exists a J-homomorphism from q? to q
- Intuition
- Weird things as in the last example can only
happen - if some predicate occurs at least three times
27Complexity for Binary Queries
- Lemma
- Existence of J-homomorphisms can be reduced in
PTIME to the existence of query homomorphisms - The reduction translates binary queries into
binary queries - Theorem For binary queries,
- null containment can be decided in polynomial time
28Test Is Null
- In SQL, we can test whether an attribute value is
null - att is null
- We model this by a new built-in predicate
- isNull(y)
- How difficult is containment
if we add isNull?
29Test Is Not Null
- In a conjunctive query, we can force a variable y
- to have only non-null values
- Transform q() ? , p(x, y, w),
- into
- q() ? , p(x, y, w), p(x?, y, w?),
- where x?, w? are new variables.
30Checking Null-containment with isNull
- Modify the previous approach
- Consider only null versions of test databases
where - ?y ? if isNull(y)
- Consequence Containment is still in ?P2
- Sufficient conditions?
31J-Homomorphisms
- Now
- J-Homomorphism
- respects join variables
- maps isNull to isNull
- Existence of a J-homomorphism continues
to be sufficient for
null-containment
32Null Tests Example
- Also for boolean queries, existence of a
J-homomorphism - is no more necessary for null-containment
- q?() ? p(c,u1), p(u1,u2),
- r(u1,w1), isNull(w1),
- r(u2,w2), r(w3,w2)
- q() ? p(c,v1), p(c,v2), p(v1,v2), p(v2,v3),
- r(v1,z1), isNull(z1),
- r(v2,z2),
- r(w3,w2), r(w3,w2)
33Null Tests Example
- Note
- w1 must be bound to ?
- w2 must be bound to a non-null value
- z1 must be bound to ?
- z3 must be bound to a non-null value
- There is no constraint on z2
q?()
q()
34Null Tests Example
- There is no
- J-homomorphism
q?()
q()
35Null Tests Example
- Suppose, ? satisfies q
- over a database D.
- Case 1 ?z2 ? ?
- ??1 satisfies q?
- Case 2 ?z2 ?
- ? ??2 satisfies q?
q?()
q()
36Null Tests Complexity
- Theorem Null-containment of boolean conjunctive
queries - with null tests is ?P2-complete
- Observation We only used the fact that a
variable - is either bound to ?, or
- to a value ??
- The proof can be adapted to
- x ? c, x lt c
- x c, x ? c
- p(x), ?p(x)
37Summary
- Formalisation of containment in the presence of
SQL nulls - Null-containment is
- ?P2-complete for conjunctive queries with null
tests - ?P2-hard for
- boolean queries with null tests
- queries w/o null tests with distinguished
variables - NP-complete for boolean queries w/o null tests
- polynomial for binary queries w/o null tests
- There is a sufficient PTIME test, which is
necessary for binary queries - Containment is ?P2-hard for conjunctive queries
with negated atoms