Title: On the Inverse rules algorithm
1 On the Inverse rules algorithm
- It is guaranteed to compute the certain answers
- But, what about its efficiency?
- As presented, it computes tuples using views that cannot contribute to the rewriting, and then discards these tuples
- We show examples, and then how to address the problems
2 Example
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren
The algorithm inverts the view:
  par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
Given n tuples in the view, it produces 2n tuples, then joins, then discards the results that contain f(-,-)
The bucket algorithm will spend more time on rewriting, and find q(X, Y) :- v(X, Y)
And then output the n results
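The wasted work can be seen in a minimal sketch of the evaluation, assuming the view is given as a set of (child, grandparent) pairs. The function names and the sample tuples are made up for illustration; a Skolem term f(c, g) is modelled as a Python tuple.

```python
def invert_view(view_tuples):
    """Apply the inverse rules par(C, f(C,G)) and par(f(C,G), G) :- v(C,G)."""
    par = set()
    for c, g in view_tuples:
        skolem = ("f", c, g)          # the term f(c, g) stands for the unknown parent
        par.add((c, skolem))
        par.add((skolem, g))
    return par


def grandparents(par):
    """Evaluate q(X, Y) :- par(X, Z), par(Z, Y), then drop answers containing f(-,-)."""
    joined = {(x, y) for (x, z) in par for (z2, y) in par if z == z2}
    return {(x, y) for (x, y) in joined
            if not isinstance(x, tuple) and not isinstance(y, tuple)}


# Hypothetical view contents: n = 2 tuples.
v = {("ann", "carl"), ("carl", "eve")}
par_facts = invert_view(v)            # 2n facts for par
answers = grandparents(par_facts)     # join, then discard Skolem answers
```

On this input the join also produces pairs with Skolem terms, which are thrown away; the surviving answers are exactly the tuples of v, which is what q(X, Y) :- v(X, Y) returns directly.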
3 Example (university db)
Views:
  v1(s, c, q, t) :- registered(s, c, q), course(c, t), c > 500, q > a98
  v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)
  v3(s, c) :- registered(s, c, q), q < a94
  v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q < a97
Query:
  q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c > 300, q > a95
Inverting v3:
  registered(s, c, f(s, c)) :- v3(s, c)
This may produce any number of facts for registered, but for this query none can be used. Why?
4
- v3(s, c) :- registered(s, c, q), q < a94
- q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c > 300, q > a95
- How should the constraint on q in v3 be represented?
- Could export it by f(s, c) < a94, then notice the conflict with f(s, c) > a95 in the query (how is q in the query transformed to f(s, c)?)
- But, what if the view contained no constraint?
- The view must export variables constrained in the query
- The query has a join on q with teaches; teaches facts are derived only from other views, so q will be exported as a different function symbol, or as q (which of these here?)
- => a join will fail (cannot join f1(-,-) with f2(-,-) or a regular variable)
- => The view must export join variables of the query
5 The factors that determine usability of a view are the same as in the bucket algorithm, but the inverse rules algorithm tries to use all views anyway
Solution: compose the query with the inverse rules, to obtain a new query that uses the views directly
Composition:
- Consider the heads of the inverse rules as a db (a collection of facts)
- Look for valuations (mappings of query variables) that map query atoms to this db
- Then replace query goals by views
6 Example
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren
The algorithm inverts the view:
  par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
Two candidate valuations (mappings):
  X -> C, Z -> f(C, G), Y -> G  =>  q(C, G) :- v(C, G), v(C, G)
  X -> f(C, G), Z -> G, Y -> f(C, G) (assuming we add C = G)  =>  q(f(G, G), f(G, G)) :- v(G, G), v(G, G)
The 2nd is discarded: no function symbols are allowed in the result
Minimization of the 1st gives q(C, G) :- v(C, G), same as bucket
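Composition for the parenthood example can be sketched with a small unifier. This is an illustration under stated assumptions, not the algorithm as published: every string is treated as a variable, each use of an inverse rule gets its own renamed variables (so the candidates differ slightly from the two listed on the slide), there is no occurs check, and "minimization" only removes duplicate view atoms. All function names are hypothetical.

```python
from itertools import product

# Terms: a string is a variable; a tuple ("f", t1, t2) is a function term.
# Atoms use the same tuple shape: ("par", t1, t2).
INVERSE_RULES = [
    (("par", "C", ("f", "C", "G")), ("v", "C", "G")),
    (("par", ("f", "C", "G"), "G"), ("v", "C", "G")),
]

def rename(t, i):
    """Suffix the variables of a rule instance so that instances do not clash."""
    if isinstance(t, str):
        return f"{t}{i}"
    return (t[0],) + tuple(rename(x, i) for x in t[1:])

def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b become equal, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return {**s, a: b}
    if isinstance(b, str):
        return {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def resolve(t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(resolve(x, s) for x in t[1:])

def has_function(atom):
    return any(isinstance(arg, tuple) for arg in atom[1:])

def compose(q_head, q_body):
    rewritings = []
    for choice in product(INVERSE_RULES, repeat=len(q_body)):
        s, view_atoms = {}, []
        for i, (atom, (rule_head, view_atom)) in enumerate(zip(q_body, choice)):
            s = unify(atom, rename(rule_head, i), s)
            if s is None:
                break
            view_atoms.append(rename(view_atom, i))
        if s is None:
            continue
        head = resolve(q_head, s)
        body = list(dict.fromkeys(resolve(v, s) for v in view_atoms))  # drop duplicates
        if has_function(head) or any(has_function(v) for v in body):
            continue                      # function symbols are not allowed in the result
        rewritings.append((head, body))
    return rewritings

result = compose(("q", "X", "Y"), [("par", "X", "Z"), ("par", "Z", "Y")])
```

Of the four ways to match the two query atoms against the two rule heads, only one survives here; up to variable names it is q(C, G) :- v(C, G).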
7
- q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c > 300, q > a95
- registered(s, c, f(s, c)), f(s, c) < a94 :- v3(s, c)
- Any valuation that uses this fact must map q -> f(s, c)
- The constraint f(s, c) < a94 conflicts with f(s, c) > a95,
- but what if there is no constraint to export?
- The mapping q -> f(s, c) cannot be used to map teaches to any fact derived from other views
- => v3 cannot be used
8
- A mapping will fail to define a valuation if
- a view does not export a join variable, and does not contain the join (why?)
- the view does not export a variable that is constrained in the query (cannot check the constraint in the db)
- Thus, the results (for a CQ query, possibly with constraints) will be the same as for bucket (assuming it is correct and complete)
- The amount of work invested will probably be similar
- Composition can be performed also for Datalog queries, but weeding out useless mappings is more difficult
9 The MiniCon algorithm --- the final one?
- Motivation
- Preliminaries
- The MiniCon algorithm
10 Motivation
- Previous algorithms (bucket, inverse rules) may be quite expensive to use, especially for systems with many views.
- The bucket algorithm has a narrow peephole in the 1st stage: each bucket is for a single atom
- => global constraints are treated only in the 2nd stage
- => Many useless combinations may be examined
- The inverse rules algorithm, improved by composition, seems to perform similar work
- The motivation: find an algorithm that will do more work in preliminary filtering, and will scale up to hundreds of views
11 Preliminaries
- The idea
- Once a view is put in a bucket of a query atom, switch to considering join variables and find which other atoms are necessarily covered by the view
- Along the way, find out also which view head variables need to be equated
- Given coverage by views, combine views with disjoint covers
- Expected gain
- more filtering in the 1st stage,
- better representation of information
- => A smaller number of combinations, reduced number of containment checks in the 2nd stage
12 Example
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
Bucket: one view in each bucket
  par(X, Z): v(X, G)
  par(Z, Y): v(P, Y)
When the two view atoms are combined, a containment check discovers that G = Y => containment, and redundancy of the 2nd atom
Alternative: given par(X, Z): v(X, G), since Z (join var) occurs in the 2nd atom of the query, add par(Z, Y) to the coverage of v(X, G), with G = Y
In the 2nd stage, just use v(X, Y)
13
- Assumptions, terminology
- CQ queries and views; for now, no constants / constraints in query/views
- View definitions use variables different from those in the query or other views (disjoint sets of variables)
- b(Q): body atoms of Q, b(V): body atoms of view V
- A mapping from vars(Q) to vars(V) is interesting only if it maps a non-empty subset of b(Q) to b(V)
- Considered mappings always map Q head vars to V head vars: head var preservation (hvp)
- If h maps x in vars(Q) to an existential var in some V, then all atoms of b(Q) that contain x must be mapped to the same V: join variable condition (jvc)
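The two conditions can be stated as small checks. This is a sketch under the assumptions above (no constants, no constraints); the representation is made up for illustration: an atom is a tuple (predicate, var, ...), h is a dict from query variables to view variables, and `covered` holds the indices of the query atoms mapped into the view.

```python
def satisfies_hvp(q_head, v_head, h):
    """Head variable preservation: mapped query head vars land on view head vars."""
    return all(h[x] in v_head for x in q_head if x in h)


def satisfies_jvc(q_body, v_head, h, covered):
    """Join variable condition for the atoms whose indices are in `covered`."""
    for x, target in h.items():
        if target in v_head:
            continue                      # exported variable: it can be joined outside the view
        for i, atom in enumerate(q_body):
            if x in atom[1:] and i not in covered:
                return False              # x is existential in the view, yet used outside it
    return True


# q(X, Y) :- par(X, Z), par(Z, Y)   and   v(C, G) :- par(C, P), par(P, G)
q_head, q_body = ("X", "Y"), [("par", "X", "Z"), ("par", "Z", "Y")]
v_head = ("C", "G")

partial = {"X": "C", "Z": "P"}                 # covers only par(X, Z)
full = {"X": "C", "Z": "P", "Y": "G"}          # covers both atoms
```

With `partial`, Z is mapped to the existential P while par(Z, Y) is left uncovered, so (jvc) fails; with `full` both conditions hold.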
14
- Given Q(X), assume Q' is a rewriting in terms of views
- Q': q(X) :- v1(X1), ..., vn(Xn)
- (some vi, vj may be occurrences of the same view v)
- There exists a containment mapping h from Q to exp(Q') (satisfies hvp)
- Let
- Gi be the set of atoms of b(Q) mapped to b(exp(vi))
- h/i: h restricted to vars(Gi)
- Then the Gi are disjoint and together cover b(Q)
- And Gi satisfies (jvc):
- if h/i maps x of vars(Gi) to an existential variable of vi,
- then every atom g in b(Q) that contains this variable is in Gi
15 The occurrence of vi in Q' may have some head variables equated
Example: the original head might be vi(A, B, C), the head in Q': vi(X, X, Z)
These equalities are given by a unique least set of equality constraints Ei (v/E: the view v, with head variables equated as specified by E)
Summary (so far): the containment mapping can be decomposed into disjoint components (vi, Ei, h/i, Gi)
All we need to do is find such components, then combine them
What is the condition for successful combination? Does a combination (s.t. the Gi are disjoint and cover b(Q)) ever fail?
16
- To find such components, we must use the given view definitions (variables different from those of Q or exp(Q')).
- Answer: a component and its mapping can be expressed as hi followed by a variable renaming
- Here
- hi is a mapping from Q to the given view definition for vi
- Ei is the least set of equalities that make hi a good mapping
- the second mapping, from vi/Ei to exp(vi(Xi)), is a variable renaming
- Ei and hi depend only on Q and the definition of vi
- We can find components (mappings from Q to the view defs), then combine: rename, possibly equating more head vars
(Diagram: h/i maps Gi to exp(vi(Xi)); equivalently, hi maps Gi to vi/Ei, and the renaming maps vi/Ei to exp(vi(Xi)).)
17
- One more step
- A component (vi, Ei, hi, Gi) may be further decomposed into smaller components (vi, Ei1, hi1, Gi1), (vi, Ei2, hi2, Gi2) provided
- each of Gi1, Gi2 satisfies (jvc), and they are disjoint
- each of Ei1, Ei2 is a subset of Ei, and is the least set for the mapping hi1 (resp. hi2) to be ok
- When these are combined, Ei1 union Ei2 is augmented with the remaining equalities of Ei
- Minimal such components
- Easier to find
- Can be re-used for different combinations.
18
- What is a minimal component?
- C = (vi, Ei, hi, Gi) is minimal if
- hi satisfies (hvp) and (jvc) (assuming the equalities in Ei)
- There is no component C1 whose last three components are contained in C's last three components (at least one is proper containment)
- A component: minicon (mini containment) description (MCD)
- The algorithm constructs and combines minimal MCDs
19 The MiniCon Algorithm
- Minimal MCD Construction Algorithm
- For each g in b(Q), each k in each b(vi)
- Let E(g,k) be the least set of equalities s.t. a mapping h(g,k) from g to k that satisfies (hvp) exists
  // E(g,k) and h(g,k), if they exist, are uniquely determined by g, k
- If E(g,k) and h(g,k) exist
- find all minimal MCDs that extend them
- (vi, Ei, hi, Gi) extends them if Ei contains E(g,k), hi contains h(g,k), Gi contains g
- For the final set of MCDs: remove duplicates
20
- How do we find minimal MCDs that extend a given mapping?
- I. Extension to one more query atom, one view atom
- extend(vi, E, h, g, k)
  // E: equalities on head vars of vi
  // h: vars(Q) -> vars(vi), partial, hvp with E
  // g in b(Q), k in b(vi)
- try to extend h to map g to k, with hvp, by adding equalities to E
- return fail, or the (uniquely determined) E, h
- (The first step in the algorithm of the previous page is this one, given empty E and h)
21
- How do we find minimal MCDs that extend a given mapping?
- II. Extend repeatedly, as long as needed and successful
- Given vi, g, k, E(g,k) and h(g,k)
- Let C = { (vi, E(g,k), h(g,k), {g}) }, MC = {}  // C: initial component, (jvc) possibly not satisfied
- While C not empty
  - remove some c = (vi, E, h, G) from C
  - if (jvc) is satisfied, put c in MC
  - if not, there exist x in vars(Q) s.t. h(x) is existential, and g' that contains x, g' not in G
    - for each k in b(vi)
      - if extend(vi, E, h, g', k) succeeds, put the extension in C
- Remove duplicates from MC
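The two steps (extend once, then extend repeatedly until the join variable condition holds) can be sketched as follows. This is an illustration under the stated assumptions (conjunctive queries, no constants or comparisons), not the paper's implementation. The representation is hypothetical: an atom is a tuple (predicate, var, ...), E maps a view head variable to the representative of its equality class, and G is a set of indices into b(Q). Unlike the slide's signature, `extend` also receives the query and view head variables.

```python
from itertools import product


def extend(q_head, v_head, E, h, g, k):
    """Try to extend h so that query atom g maps to view atom k (with hvp).

    Returns the new (E, h), or None on failure.
    """
    if g[0] != k[0] or len(g) != len(k):
        return None
    E, h = dict(E), dict(h)
    for x, y in zip(g[1:], k[1:]):
        y = E.get(y, y)
        old = h.setdefault(x, y)
        if old != y:
            # x is already mapped elsewhere: only head variables may be equated
            if old not in v_head or y not in v_head:
                return None
            keep, drop = min(old, y), max(old, y)
            for d in (E, h):
                for key, val in list(d.items()):
                    if val == drop:
                        d[key] = keep
            E[drop] = keep
    if any(x in q_head and h[x] not in v_head for x in g[1:]):
        return None                       # hvp violated
    return E, h


def minimal_mcds(q_head, q_body, view):
    name, v_head, v_body = view
    found = {}                            # keyed on (E, h, G) to remove duplicates
    for gi, k in product(range(len(q_body)), v_body):
        start = extend(q_head, v_head, {}, {}, q_body[gi], k)
        if start is None:
            continue
        work = [(start[0], start[1], frozenset([gi]))]
        while work:
            E, h, G = work.pop()
            # atoms outside G that contain a variable mapped to an existential variable
            missing = [j for j, atom in enumerate(q_body) if j not in G
                       and any(x in h and h[x] not in v_head for x in atom[1:])]
            if not missing:               # (jvc) holds
                found[(frozenset(E.items()), frozenset(h.items()), G)] = (name, E, h, G)
                continue
            for k2 in v_body:
                ext = extend(q_head, v_head, E, h, q_body[missing[0]], k2)
                if ext is not None:
                    work.append((ext[0], ext[1], G | {missing[0]}))
    return list(found.values())


grand = ("v", ("C", "G"), [("par", "C", "P"), ("par", "P", "G")])
body = [("par", "X", "Z"), ("par", "Z", "Y")]
mcds = minimal_mcds(("X", "Y"), body, grand)
```

For the grandparent view and the query q(X, Y) :- par(X, Z), par(Z, Y), the sketch finds a single MCD that covers both query atoms with an empty set of equalities.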
22
- Example
- A db: parenthood relation par(c, p)
- A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
- A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
- MCDs
- 1st query atom, 1st view atom: h(1,1) = {X -> C, Z -> P}, E(1,1) = {}
- need to extend to par(Z, Y), can only map to 2nd view atom
- MCD: (v, E = {}, h = {X -> C, Z -> P, Y -> G}, b(Q))
- 1st query atom, 2nd view atom: no mapping
- The only MCD is the above
23 Comment: In the paper, if (vi, Ei1, hi1, Gi1) and (vi, Ei2, hi2, Gi2) are both minimal extensions, and Gi1 is contained in Gi2, then the 2nd is thrown away (another minimization)
I do not know how to explain this optimization, or prove that with it the algorithm is still complete
24 2nd phase: MCD combination, and variable renaming
A set of MCDs (vi, Ei, hi, Gi) is a candidate if the Gi are pairwise disjoint and their union is b(Q)
For each candidate set
  Rename variables: for each view variable y
    If hi(x) = y (y a view variable), rename y to x
    else rename y to a fresh distinct variable
Note: if x is in the domain of both hi, hj, then hi(x), hj(x) are head variables of vi, vj (by def of MCD), => renaming makes them equal
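A minimal sketch of the combination phase, assuming MCDs are given as tuples (view_name, view_head, E, h, G) with G a set of indices of covered query atoms, and taking "candidate" to mean that the covers are pairwise disjoint and together cover b(Q). The representation and the function name are hypothetical. One simplification is labelled in the code: if two query variables map to the same view variable, the sketch keeps the first name instead of equating them.

```python
from itertools import combinations


def rewritings(n_atoms, mcds):
    """Return the bodies of the rewritings obtained from candidate sets of MCDs."""
    results = []
    for size in range(1, len(mcds) + 1):
        for combo in combinations(mcds, size):
            covers = [m[4] for m in combo]
            # candidate: the covers are pairwise disjoint and their union is b(Q)
            if (sum(map(len, covers)) != n_atoms
                    or set().union(*covers) != set(range(n_atoms))):
                continue
            body = []
            for i, (name, v_head, E, h, _) in enumerate(combo):
                back = {}
                for x, y in h.items():
                    back.setdefault(y, x)     # view var -> query var (simplification: first wins)
                args = []
                for y in v_head:
                    y = E.get(y, y)           # apply the head equalities
                    args.append(back.get(y, f"_{y}{i}"))  # fresh name if nothing maps to y
                body.append((name,) + tuple(args))
            results.append(body)
    return results


# MCDs for the view v(C, P) :- par(C, P), par(P, G) and the query
# q(X, Y) :- par(X, Z), par(Z, Y)
a1 = ("v", ("C", "P"), {}, {"X": "C", "Z": "P"}, frozenset({0}))
a2 = ("v", ("C", "P"), {}, {"Z": "C", "Y": "P"}, frozenset({1}))
combined = rewritings(2, [a1, a2])
```

Here neither MCD alone covers the query, and the pair yields the body v(X, Z), v(Z, Y); Z is a head variable in both views, so the renaming makes the two occurrences equal.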
25 Example (contd)
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCD: (v, E = {}, h = {X -> C, Z -> P, Y -> G}, b(Q))
Rename in v: C to X, G to Y
Rewriting: q(X, Y) :- v(X, Y)
26
- Example
- A db: parenthood relation par(c, p)
- A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
- A query Q: q(X, X) :- par(X, Z), par(Z, X) // I am my own grandpa
- MCDs
- 1st query atom, 1st view atom: h(1,1) = {X -> C, Z -> P}, E(1,1) = {}
- need to extend to par(Z, X), can only map to 2nd view atom
- MCD: (v, {C = G}, {X -> C, Z -> P}, b(Q))
- 1st query atom, 2nd view atom: no mapping
- The only MCD is the above
27
- Example
- A db: parenthood relation par(c, p)
- A view: v(C, P) :- par(C, P), par(P, G) // parents where grandparents exist
- A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
- MCDs
- h(1,1) = {X -> C, Z -> P}, E(1,1) = {}
- => MCD A1 = ( v(C, P), {}, h(1,1), {par(X, Z)} )
- h(1,2) = {X -> P, Z -> G}, E(1,2) = {}: fails (why?)
- h(2,1) = {Z -> C, Y -> P}, E(2,1) = {}
- => MCD A2 = ( v(C, P), {}, h(2,1), {par(Z, Y)} )
- h(2,2) = {Z -> P, Y -> G}: fails (why?)
28
A view: v(C, P) :- par(C, P), par(P, G)
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs:
  A1 = ( v(C, P), {}, h(1,1), {par(X, Z)} )
  A2 = ( v(C, P), {}, h(2,1), {par(Z, Y)} )
Rewritings (rename views to have distinct vars)
  A1 + A2: X -> C1, Z -> P1, Z -> C2, Y -> P2
  add P1 (in 1st v) = C2 (in 2nd v)
  rewriting: v(C1, P1), v(P1, P2)
  renaming: v(X, Z), v(Z, Y), a correct rewriting
29
- When Q or views contain constants
- MCD formation
- A constant a of Q must be mapped to a head variable of vi, or to itself
- If x is in headvar(Q), it can be mapped to headvar(vi) or to a constant a
- Whenever x is mapped to a, hi records this fact
- MCD combination
- If A1, A2 are defined on x, then allow also
- Both map x to a
- One maps x to a, the other to head var of view
- In either case, rename x to a in rewriting
30
- When Q or views contain comparisons
- If views contain comparisons, no change to the algorithm (it finds contained rewritings anyway)
- If Q contains comparisons, then there may be no Datalog program that computes the certain answers (can express x != y)
- But, we can expect that extending the algorithm for comparisons will be a good heuristic, and will find certain answers in many cases
31
- When Q or views contain comparisons
- C(Q): constraints of Q (closed under inference)
- MCD formation (vi, Ei, hi, Gi) (extend the join variable condition)
- If hi(x) is an existential of vi, and c(x, y) is in C(Q), then hi(y) is defined
- C(vi) must imply all constraints in hi(C(Q)) that involve at least one existential of vi
- MCD combination
- Add all constraints of C(Q) not covered by those of the views