1
On the Inverse rules algorithm

It is guaranteed to compute the certain answers
But, what about its efficiency?
As presented, it computes tuples using views that
cannot contribute to the rewriting, and then
discards these tuples
We show examples, and then how to address the
problems

2
Example
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)   // find grandchildren
The algorithm inverts the view:
  par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
Given n tuples in the view, it produces 2n tuples, then joins, then discards the results that contain f(-,-)
The bucket algorithm will spend more time on rewriting, and find q(X, Y) :- v(X, Y)
And then output the n results
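The cost described on this slide can be sketched in Python. This is only an illustration under assumptions: the function names (`invert_view`, `grandchildren`), the tuple encoding of the function term f(C, G), and the sample view tuples are all hypothetical and not part of the original slides.

```python
def invert_view(view_tuples):
    """Apply the inverse rules par(C, f(C,G)), par(f(C,G), G) :- v(C, G)."""
    par = set()
    for c, g in view_tuples:
        skolem = ("f", c, g)      # stands for the function term f(C, G)
        par.add((c, skolem))
        par.add((skolem, g))
    return par


def is_skolem(term):
    """True for terms built with the function symbol f."""
    return isinstance(term, tuple) and term[0] == "f"


def grandchildren(par):
    """Evaluate q(X, Y) :- par(X, Z), par(Z, Y), then drop answers with f(-,-)."""
    joined = {(x, y) for (x, z1) in par for (z2, y) in par if z1 == z2}
    return {(x, y) for (x, y) in joined if not is_skolem(x) and not is_skolem(y)}


# Hypothetical view contents: n = 3 tuples.
view = {("ann", "carl"), ("bob", "dina"), ("carl", "eve")}
facts = invert_view(view)          # 2n = 6 derived par facts
answers = grandchildren(facts)     # the join also builds tuples that are discarded
```

With these sample tuples the join also pairs par(f(ann, carl), carl) with par(carl, f(carl, eve)), a result that is built and then thrown away, while the surviving answers coincide with the view itself.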
3
Example (university db)
Views:
  v1(s, c, q, t) :- registered(s, c, q), course(c, t), c > 500, q > a98
  v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)
  v3(s, c) :- registered(s, c, q), q < a94
  v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q < a97
Query:
  q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c > 300, q > a95
Inverting v3:
  registered(s, c, f(s, c)) :- v3(s, c)
This may produce any number of facts for registered, but for this query none can be used. Why?
4

v3(s, c) :- registered(s, c, q), q < a94
q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c > 300, q > a95
How should the constraint on q in v3 be represented?
Could export it by f(s, c) < a94, then notice the conflict with f(s, c) > a95 in the query (how is q in the query transformed to f(s, c)?)
But, what if the view contained no constraint?
The view must export variables constrained in the query
The query has a join on q with teaches; teaches facts are derived only from other views, so q will be exported as a different function symbol, or as q (which of these here?)
⇒ a join will fail (cannot join f1(-,-) with f2(-,-) or a regular variable)
⇒ The view must export join variables of the query

5
The factors that determine usability of a view are the same as in the bucket algorithm, but the inverse rules algorithm tries to use all views anyway
Solution: compose the query with the inverse rules, to obtain a new query that uses the views directly
Composition:
Consider the heads of the inverse rules as a db (a collection of facts)
Look for valuations (mappings of query variables) that map query atoms to this db
Then replace query goals by views
6
Example
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)   // find grandchildren
The algorithm inverts the view:
  par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
Two candidate valuation mappings:
  X → C, Z → f(C, G), Y → G  ⇒  q(C, G) :- v(C, G), v(C, G)
  X → f(C, G), Z → G, Y → f(C, G) (assuming we add C = G)  ⇒  q(f(G, G), f(G, G)) :- v(G, G), v(G, G)
2nd is discarded: no function symbols are allowed in the result
Minimization of 1st gives q(C, G) :- v(C, G), same as bucket
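A minimal sketch of the composition step on this example, in Python. It treats the heads of the inverse rules as facts over the frozen symbols C and G, as the slide does, and enumerates the valuations that map both query atoms onto those facts. All names here (`FACTS`, `match`, `valuations`, `has_function_symbol`) are assumptions made for illustration. The sketch does not try adding equalities such as C = G, so it does not produce the second candidate mapping.

```python
from itertools import product

F = ("f", "C", "G")                                # the term f(C, G)
FACTS = [("par", ("C", F)), ("par", (F, "G"))]     # heads of the inverse rules
QUERY_BODY = [("par", ("X", "Z")), ("par", ("Z", "Y"))]
QUERY_HEAD = ("X", "Y")


def match(atom, fact, valuation):
    """Extend valuation so that atom maps onto fact; None on a conflict."""
    (pred, args), (fact_pred, fact_args) = atom, fact
    if pred != fact_pred or len(args) != len(fact_args):
        return None
    extended = dict(valuation)
    for var, term in zip(args, fact_args):
        if extended.setdefault(var, term) != term:
            return None
    return extended


def valuations(body, facts):
    """All mappings of query variables that send every body atom to a fact."""
    found = []
    for choice in product(facts, repeat=len(body)):
        val = {}
        for atom, fact in zip(body, choice):
            val = match(atom, fact, val)
            if val is None:
                break
        if val is not None:
            found.append(val)
    return found


def has_function_symbol(term):
    return isinstance(term, tuple)


vals = valuations(QUERY_BODY, FACTS)
# Keep only valuations whose head image is free of function symbols.
usable = [v for v in vals if not any(has_function_symbol(v[x]) for x in QUERY_HEAD)]
```

The single valuation found sends X to C, Z to f(C, G) and Y to G. Both query atoms then come from v(C, G), which gives q(C, G) :- v(C, G), v(C, G) before minimization.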
7

q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c > 300, q > a95
registered(s, c, f(s, c)), f(s, c) < a94 :- v3(s, c)
Any valuation that uses this fact must map q → f(s, c)
The constraint f(s, c) < a94 conflicts with f(s, c) > a95,
but what if there is no constraint to export?
The mapping q → f(s, c) cannot be used to map teaches to any fact derived from other views
⇒ v3 cannot be used

A mapping will fail to define a valuation if:
A view does not export a join variable, and does not contain the join (why?)
The view does not export a variable that is constrained in the query (cannot check the constraint in the db)
Thus, the results (for a CQ query, possibly with constraints) will be the same as for bucket (assuming it is correct and complete)
The amount of work invested will probably be
similar
Composition can be performed also for Datalog
queries, but weeding out useless mappings is more
difficult

9
The MiniCon algorithm --- the final one?

Motivation
Preliminaries
The MiniCon algorithm

10
Motivation

Previous algorithms (bucket, inverse rules) may be quite expensive to use, especially for systems with many views.
The bucket algorithm has a narrow peephole in the 1st stage: each bucket is for a single atom
⇒ global constraints are treated only in the 2nd stage
⇒ Many useless combinations may be examined
The inverse rules algorithm, improved by composition, seems to perform similar work
The motivation: find an algorithm that will do more work in preliminary filtering, and will scale up to hundreds of views

11
Preliminaries

The idea
Once a view is put in a bucket of a query atom,
switch to considering join variables and find
which other atoms are necessarily covered by the
view
Along the way, find out also which view head
variables need to be equated
Given coverage by views, combine views with
disjoint covers
Expected gain:
more filtering in the 1st stage,
better representation of information
⇒ A smaller number of combinations, reduced number of containment checks in the 2nd stage

12
Example
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
Bucket: one view in each bucket
  par(X, Z): v(X, G)
  par(Z, Y): v(P, Y)
When the two view atoms are combined, a containment check discovers that G = Y ⇒ containment, redundancy of 2nd atom
Alternative: given par(X, Z): v(X, G), since Z (join var) occurs in the 2nd atom of the query, add par(Z, Y) to the coverage of v(X, G), with G = Y
In the 2nd stage, just use v(X, Y)
13

Assumptions, terminology
CQ queries and views; for now, no constants / constraints in query/views
View definitions use variables different from those in the query or other views (disjoint sets of variables)
b(Q): body atoms of Q, b(V): body atoms of view V
A mapping from vars(Q) to vars(V) is interesting only if it maps a non-empty subset of b(Q) to b(V)
Considered mappings always map Q head vars to V head vars: head variable preservation (hvp)
If h maps x in vars(Q) to an existential var in some V, then all atoms of b(Q) that contain x must be mapped to the same V: join variable condition (jvc)

Given Q(X), assume Q' is a rewriting in terms of views
Q': q(X) :- v1(X1), ..., vn(Xn)
(some vi, vj may be occurrences of the same view v)
There exists a containment mapping h from Q to exp(Q') (satisfies hvp)
Let
Gi be the set of atoms of b(Q) mapped to b(exp(vi))
h/i: h restricted to vars(Gi)
Then
And Gi satisfies (jvc):
if h/i maps x of vars(Gi) to an existential variable of vi,
then every atom g in b(Q) that contains x is in Gi

15
The occurrence of vi in Q' may have some head variables equated
Example: the original head might be vi(A, B, C), and the head in Q' vi(X, X, Z)
These equalities are given by a unique least set of equality constraints Ei (v/E: the view v, with head variables equated as specified by E)
Summary (so far): the containment mapping can be decomposed into disjoint components (vi, Ei, h/i, Gi)
All we need to do is find such components, then combine them
What is the condition for successful combination?
Does a combination (s.t. ) ever fail?

16

To find such components, we must use the given view definitions (variables different from those of Q or of exp(Q'), the expansion of the rewriting).
Answer: a component and its mapping can be expressed as
[Diagram: h/i maps Gi to exp(vi(Xi)); hi and h'i are drawn via vi/Ei]
Here
hi is a mapping from Q to the given view definition for vi
Ei is the least set of equalities that make hi a good mapping
h'i is a variable renaming
Ei and hi depend only on Q and the definition of vi
We can find components (mappings from Q to the view defs), then combine and rename, possibly equating more head vars
17

One more step
A component (vi, Ei, hi , Gi) may be further
decomposed into smaller components (vi, Ei1, hi1
, Gi1), (vi, Ei2, hi2 , Gi2) provided
each of Gi1, Gi2 satisfies (jvc), and they are
disjoint
Each of Ei1, Ei2 is a subset of Ei, least sets
for the mappings hi1, hi2 to be ok
When these are combined, Ei1 union Ei2 is
augmented with the remaining equalities of Ei
Minimal such components
Easier to find
Can be re-used for different combinations.

What is a minimal component?
C = (vi, Ei, hi, Gi) is minimal if
hi satisfies (hvp) and (jvc) (assuming the equalities in Ei)
There is no component C1 whose last three components are contained in C's last three components (at least one is proper containment)
A component = MiniCon (mini containment) description: MCD
The algorithm constructs and combines minimal MCDs

19
The MiniCon Algorithm

Minimal MCD Construction Algorithm
For each g in b(Q), each k in each b(vi)
  Let E(g,k) be the least set of equalities s.t. a mapping h(g,k) from g to k that satisfies (hvp) exists
  // E(g,k) and h(g,k), if they exist, are uniquely determined by g, k
  If E(g,k) and h(g,k) exist
    find all minimal MCDs that extend them
    (vi, Ei, hi, Gi) extends them if Ei contains E(g,k), hi contains h(g,k), Gi contains g
For the final set of MCDs: remove duplicates

How do we find minimal MCDs that extend a given
mapping?
I. Extension to one more query atom, one view
atom
extend(vi, E, h, g, k)
  // E: equalities on head vars of vi
  // h: vars(Q) → vars(vi), partial, hvp with E
  // g in b(Q), k in b(vi)
  try to extend h to map g to k, with hvp, by adding equalities to E
  return fail, or the (uniquely determined) E, h
(The first step in alg. of previous page is this
one, given empty E and h)

How do we find minimal MCDs that extend a given
mapping?
II. Extend repeatedly, as long as needed and
successful
Given vi, g, k, E(g,k) and h(g,k):
Let C = {(vi, E(g,k), h(g,k), {g})}, MC = {}
// C: initial component, (jvc) possibly not satisfied
While C not empty
  remove some c = (vi, E, h, G) from C
  if (jvc) is satisfied, put c in MC
  if not, there exist x in vars(Q) s.t. h(x) is existential, and an atom g that contains x, g not in G
    for each k in b(vi)
      if extend(vi, E, h, g, k) succeeds, put the extension (with g added to G) in C
Remove duplicates from MC
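The two steps (single-atom extension, then repeated extension until the join variable condition holds) can be sketched in Python for a single view, under the slides' assumptions of no constants and no comparisons. Everything about the encoding is an assumption made for illustration: atoms are `(predicate, args)` pairs, the equalities E are kept as a dict from each view head variable to a class representative, covers are sets of query atom indices, and the names `extend`, `uncovered_join_atom` and `find_mcds` are hypothetical. The extra minimization mentioned in the paper is not attempted.

```python
def extend(view_head, eq, h, g, k, query_head):
    """Try to extend h so that query atom g maps onto view atom k.

    eq maps every view head variable to the representative of its equality
    class (it plays the role of E). Returns (eq, h), or None on failure.
    """
    (g_pred, g_args), (k_pred, k_args) = g, k
    if g_pred != k_pred or len(g_args) != len(k_args):
        return None
    eq, h = dict(eq), dict(h)
    for x, y in zip(g_args, k_args):
        if x in query_head and y not in view_head:
            return None                      # violates (hvp)
        y = eq.get(y, y)
        if x in h:
            old = eq.get(h[x], h[x])
            if old != y:
                if old not in view_head or y not in view_head:
                    return None              # only head variables can be equated
                keep, drop = min(old, y), max(old, y)
                eq = {v: (keep if r == drop else r) for v, r in eq.items()}
                y = keep
        h[x] = y
    return eq, {x: eq.get(y, y) for x, y in h.items()}


def uncovered_join_atom(query_body, view_head, h, covered):
    """Index of an uncovered query atom that contains a variable mapped to an
    existential view variable, or None if (jvc) holds."""
    for j, (_, args) in enumerate(query_body):
        if j not in covered and any(x in h and h[x] not in view_head for x in args):
            return j
    return None


def find_mcds(query_head, query_body, view_head, view_body):
    """Return the MCDs of one view as (equalities, mapping, covered atoms)."""
    mcds = set()
    identity = {v: v for v in view_head}
    for gi, g in enumerate(query_body):
        for k in view_body:
            start = extend(view_head, identity, {}, g, k, query_head)
            if start is None:
                continue
            work = [(start[0], start[1], frozenset([gi]))]
            while work:
                eq, h, covered = work.pop()
                j = uncovered_join_atom(query_body, view_head, h, covered)
                if j is None:                # (jvc) holds: record the MCD
                    equalities = frozenset(p for p in eq.items() if p[0] != p[1])
                    mcds.add((equalities, frozenset(h.items()), covered))
                    continue
                for k2 in view_body:
                    ext = extend(view_head, eq, h, query_body[j], k2, query_head)
                    if ext is not None:
                        work.append((ext[0], ext[1], covered | {j}))
    return mcds


query_body = [("par", ("X", "Z")), ("par", ("Z", "Y"))]
view_body = [("par", ("C", "P")), ("par", ("P", "G"))]
mcds = find_mcds(("X", "Y"), query_body, ("C", "G"), view_body)
```

For the grandparent view v(C, G), this yields a single MCD that covers both query atoms with the mapping X → C, Z → P, Y → G and no equalities. Storing MCDs in a set is what removes the duplicates reached from different starting atoms.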

Example
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs:
1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = {}
need to extend to par(Z, Y), can only map to 2nd view atom
MCD: (v, E = {}, h = {X → C, Z → P, Y → G}, b(Q))
1st query atom, 2nd view atom: no mapping
The only MCD is the above

23
Comment: In the paper, if (vi, Ei1, hi1, Gi1) and (vi, Ei2, hi2, Gi2) are both minimal extensions, and Gi1 is contained in Gi2, then the 2nd is thrown away (another minimization)
I do not know how to explain this optimization, or prove that with it the algorithm is still complete
24
2nd phase: MCD combination, and variable renaming
A set of MCDs (vi, Ei, hi, Gi) is a candidate if the Gi are pairwise disjoint and together cover b(Q)
For each candidate set:
Rename variables: for each view variable y,
if hi(x) = y (y a view variable), rename y to x
else rename y to a fresh distinct variable
Note: if x is in the domain of both hi, hj, then hi(x), hj(x) are head variables of vi, vj (by def. of MCD) ⇒ renaming makes them equal
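A sketch of the second phase in Python, assuming that a candidate set is a set of MCDs whose covers are pairwise disjoint and together contain every query atom. The dictionary layout of an MCD (`view`, `head`, `eq`, `h`, `covered`), the function names, and the fresh-variable naming scheme `_F1, _F2, ...` are all hypothetical. When several query variables map to the same head variable, the sketch simply picks the first one, which is a simplification.

```python
from itertools import combinations, count


def candidate_sets(mcds, n_atoms):
    """Yield groups of MCDs with pairwise disjoint covers that cover all atoms."""
    everything = frozenset(range(n_atoms))
    for size in range(1, len(mcds) + 1):
        for combo in combinations(mcds, size):
            covers = [m["covered"] for m in combo]
            # Sizes adding up to n_atoms plus full coverage implies disjointness.
            if sum(len(c) for c in covers) == n_atoms and \
                    frozenset().union(*covers) == everything:
                yield combo


def rewrite(query_head, combo):
    """Rename each view head variable y to x when h(x) = y, else to a fresh one."""
    fresh = count(1)
    body = []
    for m in combo:
        inverse = {}
        for x, y in m["h"].items():
            inverse.setdefault(y, x)         # simplification: first x wins
        args = []
        for y in m["head"]:
            y = m["eq"].get(y, y)            # apply the head equalities E
            if y not in inverse:
                inverse[y] = f"_F{next(fresh)}"
            args.append(inverse[y])
        body.append((m["view"], tuple(args)))
    return (("q", tuple(query_head)), body)


# Two MCDs of a hypothetical view v(C, P), each covering one atom of
# q(X, Y) :- par(X, Z), par(Z, Y).
A1 = {"view": "v", "head": ("C", "P"), "eq": {},
      "h": {"X": "C", "Z": "P"}, "covered": frozenset({0})}
A2 = {"view": "v", "head": ("C", "P"), "eq": {},
      "h": {"Z": "C", "Y": "P"}, "covered": frozenset({1})}
rewritings = [rewrite(("X", "Y"), c) for c in candidate_sets([A1, A2], 2)]
```

Here the only candidate set is {A1, A2}, and renaming gives q(X, Y) :- v(X, Z), v(Z, Y). Neither MCD alone is a candidate because each covers only one of the two query atoms.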
25
Example (contd)
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCD: (v, E = {}, h = {X → C, Z → P, Y → G}, b(Q))
Rename in v: C to X, G to Y
Rewriting: q(X, Y) :- v(X, Y)
26

Example
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
A query Q: q(X, X) :- par(X, Z), par(Z, X)   // I am my own grandpa
MCDs:
1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = {}
need to extend to par(Z, X), can only map to 2nd view atom
MCD: (v, {C = G}, {X → C, Z → P}, b(Q))
1st query atom, 2nd view atom: no mapping
The only MCD is the above

Example
A db: parenthood relation par(c, p)
A view: v(C, P) :- par(C, P), par(P, G)   // parents where grandparents exist
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs:
h(1,1) = {X → C, Z → P}, E(1,1) = {}
⇒ MCD A1 = (v(C, P), {}, h(1,1), {par(X, Z)})
h(1,2) = {X → P, Z → G}, E(1,2) = {}, fails (why?)
h(2,1) = {Z → C, Y → P}, E(2,1) = {}
⇒ MCD A2 = (v(C, P), {}, h(2,1), {par(Z, Y)})
h(2,2) = {Z → P, Y → G}, fails (why?)

28
A view: v(C, P) :- par(C, P), par(P, G)
A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs:
A1 = (v(C, P), {}, h(1,1), {par(X, Z)})
A2 = (v(C, P), {}, h(2,1), {par(Z, Y)})
Rewritings (rename views to have distinct vars):
A1 + A2: X → C1, Z → P1, Z → C2, Y → P2
add P1 (in 1st v) = C2 (in 2nd v)
rewriting: v(C1, P1), v(P1, P2)
renaming: v(X, Z), v(Z, Y), a correct rewriting
29

When Q or views contain constants
MCD formation:
A constant a of Q must be mapped to a head variable of vi, or to itself
If x is in headvar(Q), it can be mapped to headvar(vi) or to a
Whenever x is mapped to a, hi records this fact
MCD combination:
If A1, A2 are both defined on x, then allow also:
Both map x to a
One maps x to a, the other to a head var of the view
In either case, rename x to a in the rewriting

When Q or views contain comparisons
If views contain comparisons, no change to the algorithm (it finds contained rewritings anyway)
If Q contains comparisons, then there may be no Datalog program that computes the certain answers (can express x ≠ y)
But we can expect that extending the algorithm for comparisons will be a good heuristic, and will find the certain answers in many cases

When Q or views contain comparisons
C(Q): constraints of Q (closed under inference)
MCD formation (vi, Ei, hi, Gi): extend the join variable condition
If hi(x) is an existential of vi, and c(x, y) is in C(Q), then hi(y) is defined
C(vi) must imply all constraints in hi(C(Q)) that involve at least one existential of vi
MCD combination:
Add all constraints of C(Q) not covered by those of the views