On the Inverse rules algorithm

Provided by: off9
1
On the Inverse rules algorithm
  • It is guaranteed to compute the certain answers
  • But, what about its efficiency?
  • As presented, it computes tuples using views that
    cannot contribute to the rewriting, and then
    discards these tuples
  • We show examples, and then how to address the
    problems

2
Example
  • A db: parenthood relation par(c, p)
  • A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
  • A query Q: q(X, Y) :- par(X, Z), par(Z, Y)   // find grandchildren
  • The algorithm inverts the view:
    par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
  • Given n tuples in the view, it produces 2n tuples, then
    joins them, then discards the results that contain f(-,-)
  • The bucket algorithm will spend more time on rewriting,
    find q(X, Y) :- v(X, Y), and then output the n results
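
To make the wasted work concrete, here is a minimal Python sketch of evaluating the inverse rules on this example. The view tuples (ann, carl, eve) are made-up sample data, and the tuple ("f", c, g) is only one assumed way to encode the Skolem term f(C, G).

```python
from itertools import product

# Hypothetical view extension: (child, grandparent) pairs.
view_v = [("ann", "carl"), ("carl", "eve")]


def invert_view(view_tuples):
    """Apply par(C, f(C,G)) and par(f(C,G), G) :- v(C,G) to every view tuple."""
    par = set()
    for c, g in view_tuples:
        skolem = ("f", c, g)  # the unknown parent, encoded as a Skolem term
        par.add((c, skolem))
        par.add((skolem, g))
    return par


def is_skolem(term):
    """True for terms of the form ("f", c, g)."""
    return isinstance(term, tuple) and len(term) == 3 and term[0] == "f"


def evaluate_query(par):
    """q(X, Y) :- par(X, Z), par(Z, Y); returns (kept, discarded) answers."""
    kept, discarded = set(), set()
    for (x, z1), (z2, y) in product(par, repeat=2):
        if z1 != z2:
            continue
        if is_skolem(x) or is_skolem(y):
            discarded.add((x, y))  # contains f(-,-), so it is thrown away
        else:
            kept.add((x, y))
    return kept, discarded


par_facts = invert_view(view_v)
kept, discarded = evaluate_query(par_facts)
print(len(par_facts))  # 2n facts for n view tuples
print(sorted(kept))
print(len(discarded))
```

With this data the join also derives one answer that pairs two Skolem terms, which is exactly the kind of tuple the algorithm computes and then discards.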
3
Example (university db)
  • Views:
    v1(s, c, q, t) :- registered(s, c, q), course(c, t), c > 500, q > a98
    v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)
    v3(s, c) :- registered(s, c, q), q < a94
    v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q < a97
  • Query:
    q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c > 300, q > a95
  • Inverting v3:
    registered(s, c, f(s, c)) :- v3(s, c)
  • This may produce any number of facts for registered,
    but for this query none can be used. Why?
4
  • v3(s, c) :- registered(s, c, q), q < a94
  • q(s, p, c) :- registered(s, c, q), teaches(p, c, q),
    course(c, t), c > 300, q > a95
  • How should the constraint on q in v3 be
    represented?
  • Could export it as f(s, c) < a94, then notice the
    conflict with f(s, c) > a95 in the query (how is q
    in the query transformed to f(s, c)?)
  • But what if the view contained no constraint?
  • The view must export variables constrained in the
    query
  • The query has a join on q with teaches; teaches
    facts are derived only from other views, so q
    will be exported as a different function symbol,
    or as q (which of these here?)
  • ⇒ a join will fail (cannot join f1(-,-) with
    f2(-,-) or a regular variable)
  • ⇒ The view must export join variables of the
    query

5
  • The factors that determine usability of a view are the
    same as in the bucket algorithm, but the inverse rules
    algorithm tries to use all views anyway
  • Solution: compose the query with the inverse rules, to
    obtain a new query that uses the views directly
  • Composition:
  • Consider the heads of the inverse rules as a db
    (a collection of facts)
  • Look for valuations (mappings of query variables) that
    map query atoms to this db
  • Then replace query goals by views
6
Example
  • A db: parenthood relation par(c, p)
  • A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
  • A query Q: q(X, Y) :- par(X, Z), par(Z, Y)   // find grandchildren
  • The algorithm inverts the view:
    par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
  • Two candidate valuation mappings:
  • X → C, Z → f(C, G), Y → G
    ⇒ q(C, G) :- v(C, G), v(C, G)
  • X → f(C, G), Z → G, Y → f(C, G)
    ⇒ (assuming we add C=G) q(f(G, G), f(G, G)) :- v(G, G), v(G, G)
  • The 2nd is discarded: no function symbols in the result
  • Minimization of the 1st gives q(C, G) :- v(C, G), same as bucket
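
The composition step can be sketched as a small unification search. This is an illustrative sketch under stated assumptions, not the presenter's implementation: every string is treated as a variable, atoms and Skolem terms are tuples whose first element is the predicate or function name, each use of an inverse rule gets its own renamed copy (suffix 0, 1, ...), there is no occurs check, and "minimization" is reduced to dropping duplicate view atoms. The names `walk`, `unify`, `resolve`, `rename` and `compose` are hypothetical helpers.

```python
from itertools import product


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Return an extended substitution unifying a and b, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return {**s, a: b}
    if isinstance(b, str):
        return {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s


def resolve(t, s):
    """Apply substitution s to term t all the way down."""
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(resolve(x, s) for x in t[1:])


def rename(t, i):
    """Give the variables of one rule copy the suffix i."""
    if isinstance(t, str):
        return f"{t}{i}"
    return (t[0],) + tuple(rename(x, i) for x in t[1:])


def compose(query_head, query_body, inverse_rules):
    """Map each query atom to an inverse-rule head; keep function-free results."""
    rewritings = []
    for choice in product(inverse_rules, repeat=len(query_body)):
        s, views = {}, []
        for i, (atom, (head, view)) in enumerate(zip(query_body, choice)):
            s = unify(atom, rename(head, i), s)
            if s is None:
                break
            views.append(rename(view, i))
        else:
            new_head = resolve(query_head, s)
            if any(isinstance(arg, tuple) for arg in new_head[1:]):
                continue  # a function symbol in the result: discard
            body = list(dict.fromkeys(resolve(v, s) for v in views))
            rewritings.append((new_head, body))
    return rewritings


rules = [
    (("par", "C", ("f", "C", "G")), ("v", "C", "G")),
    (("par", ("f", "C", "G"), "G"), ("v", "C", "G")),
]
result = compose(("q", "X", "Y"), [("par", "X", "Z"), ("par", "Z", "Y")], rules)
print(result)
```

Of the four ways to pair the two query atoms with the two rule heads, only one survives here, and after removing the duplicate view atom it is the single-atom rewriting over v.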
7
  • q(s, p, c) :- registered(s, c, q), teaches(p, c, q),
    course(c, t), c > 300, q > a95
  • registered(s, c, f(s, c)), f(s, c) < a94 :-
    v3(s, c)
  • Any valuation that uses this fact must map q →
    f(s, c)
  • The constraint f(s, c) < a94 conflicts with
    f(s, c) > a95,
  • but what if there is no constraint to
    export?
  • The mapping q → f(s, c) cannot be used to map
    teaches to any fact derived from other views
  • ⇒ v3 cannot be used
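
The conflict between the exported constraint and the query constraint amounts to a bounds check on the single term f(s, c). A minimal sketch, assuming the quarter constants can be encoded as integers (a94 as 94, a95 as 95) and assuming a dense order, so strict bounds conflict only when the lower bound reaches the upper one; `satisfiable` and `QUARTER` are hypothetical names:

```python
def satisfiable(constraints):
    """Check whether some value meets all strict bounds on one term."""
    low, high = float("-inf"), float("inf")
    for op, bound in constraints:
        if op == "<":
            high = min(high, bound)
        elif op == ">":
            low = max(low, bound)
        else:
            raise ValueError(f"unsupported comparison: {op}")
    return low < high


# Assumed integer encoding of the quarter constants used in the slides.
QUARTER = {"a94": 94, "a95": 95}

# f(s, c) < a94 comes from v3, f(s, c) > a95 comes from the query.
on_f = [("<", QUARTER["a94"]), (">", QUARTER["a95"])]
print(satisfiable(on_f))
```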

8
  • A mapping will fail to define a valuation if:
  • a view does not export a join variable, and does
    not contain the join (why?)
  • the view does not export a variable that is
    constrained in the query (cannot check the
    constraint in the db)
  • Thus, the results (for a CQ query, possibly with
    constraints) will be the same as for bucket
    (assuming it is correct and complete)
  • The amount of work invested will probably be
    similar
  • Composition can be performed also for Datalog
    queries, but weeding out useless mappings is more
    difficult
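
The two failure conditions can be turned into a quick filter. The sketch below is a simplification made for illustration: it looks at one query atom matched positionally against one view atom, and it treats any hidden join variable as blocking, ignoring the escape clause where the view itself contains the join. The function name `blocking_vars` and the tuple encoding of atoms are assumptions.

```python
def blocking_vars(query_body, atom_index, constrained, view_atom, view_head):
    """Query variables that the view hides but the query still needs.

    Returns None if the atoms cannot be matched at all, otherwise the set of
    query variables that map to non-exported view variables and are either
    constrained in the query or shared with another query atom.
    """
    pred, args = query_body[atom_index]
    v_pred, v_args = view_atom
    if pred != v_pred or len(args) != len(v_args):
        return None
    hidden = {x for x, y in zip(args, v_args) if y not in view_head}
    joined = {
        x
        for i, (_, other_args) in enumerate(query_body)
        if i != atom_index
        for x in other_args
    }
    return {x for x in hidden if x in constrained or x in joined}


query_body = [
    ("registered", ("s", "c", "q")),
    ("teaches", ("p", "c", "q")),
    ("course", ("c", "t")),
]
constrained = {"c", "q"}  # c > 300, q > a95
registered_in_view = ("registered", ("s", "c", "q"))

v3_blocked = blocking_vars(query_body, 0, constrained, registered_in_view, {"s", "c"})
v2_blocked = blocking_vars(
    query_body, 0, constrained, registered_in_view, {"s", "p", "c", "q"}
)
print(v3_blocked)
print(v2_blocked)
```

For v3 the hidden variable q is both constrained and joined, so the view is filtered out; v2 exports q and passes this particular check.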

9
The MiniCon algorithm --- the final one?
  • Motivation
  • Preliminaries
  • The MiniCon algorithm

10
Motivation
  • Previous algorithms (bucket, inverse rules) may be
    quite expensive to use, especially for systems
    with many views.
  • The bucket algorithm has a narrow peephole in the 1st
    stage: each bucket is for a single atom
  • ⇒ global constraints are treated only in the 2nd
    stage
  • ⇒ Many useless combinations may be examined
  • The inverse rules algorithm, improved by
    composition, seems to perform similar work
  • The motivation: find an algorithm that will do
    more work in preliminary filtering, and will
    scale up to hundreds of views

11
Preliminaries
  • The idea
  • Once a view is put in a bucket of a query atom,
    switch to considering join variables and find
    which other atoms are necessarily covered by the
    view
  • Along the way, find out also which view head
    variables need to be equated
  • Given coverage by views, combine views with
    disjoint covers
  • Expected gain
  • more filtering in the 1st stage,
  • better representation of information
  • ⇒ A smaller number of combinations, reduced
    number of containment checks in the 2nd stage

12
Example
  • A db: parenthood relation par(c, p)
  • A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
  • A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
  • Bucket: one view in each bucket
    par(X, Z): v(X, G)
    par(Z, Y): v(P, Y)
  • When the two view atoms are combined, a containment check
    discovers that G=Y ⇒ containment, redundancy of the 2nd atom
  • Alternative: given par(X, Z): v(X, G), since Z (join var)
    occurs in the 2nd atom of the query, add par(Z, Y) to the
    coverage of v(X, G), with G=Y
  • In the 2nd stage, just use v(X, Y)
13
  • Assumptions, terminology
  • CQ queries and views, for now no constants /
    constraints in query/views
  • View definitions use variables different from
    those in query or other views (disjoint sets of
    variables)
  • b(Q): body atoms of Q; b(V): body atoms of view
    V
  • A mapping from vars(Q) to vars(V) is
    interesting only if it maps a non-empty subset
    of b(Q) to b(V)
  • Considered mappings always map Q head vars to V
    head vars: head var preservation (hvp)
  • If h maps x in vars(Q) to an existential var in
    some V, then all atoms of b(Q) that contain x
    must be mapped to the same V:
    the join variable condition (jvc)
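
Both conditions are easy to state as predicates over a candidate mapping. A minimal sketch, assuming atoms are encoded as (predicate, args) tuples, the mapping h is a dict from query variables to view variables, and the covered atoms are given by their positions in b(Q); equalities on head variables are ignored here. The function names are hypothetical.

```python
def satisfies_hvp(h, query_head, view_head):
    """Head var preservation: mapped query head vars land on view head vars."""
    return all(h[x] in view_head for x in query_head if x in h)


def satisfies_jvc(h, covered, query_body, view_head):
    """Join variable condition.

    If x is mapped to an existential (non-head) view variable, every query
    atom that contains x must be among the covered atoms.
    """
    for x, y in h.items():
        if y in view_head:
            continue
        for i, (_, args) in enumerate(query_body):
            if x in args and i not in covered:
                return False
    return True


# q(X, Y) :- par(X, Z), par(Z, Y)   and   v(C, G) :- par(C, P), par(P, G)
query_head = ("X", "Y")
query_body = [("par", ("X", "Z")), ("par", ("Z", "Y"))]
view_head = {"C", "G"}

partial = {"X": "C", "Z": "P"}            # covers only the 1st query atom
full = {"X": "C", "Z": "P", "Y": "G"}     # covers both query atoms

print(satisfies_hvp(partial, query_head, view_head))
print(satisfies_jvc(partial, {0}, query_body, view_head))
print(satisfies_jvc(full, {0, 1}, query_body, view_head))
```

The partial mapping satisfies hvp but not jvc, because Z goes to the existential P while par(Z, Y) is still uncovered; extending the cover to both atoms repairs it.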

14
  • Given Q(X), assume Q' is a rewriting in terms of
    views
  • Q': q(X) :- v1(X1), ..., vn(Xn)
  • (some vi, vj may be occurrences of the
    same view v)
  • There exists a containment mapping h from Q to
    exp(Q') (it satisfies hvp)
  • Let
  • Gi be the set of atoms of b(Q) mapped to
    b(exp(vi))
  • h/i = h restricted to vars(Gi)
  • Then
  • And Gi satisfies (jvc):
  • if h/i maps x of vars(Gi) to an existential
    variable of vi,
  • then every atom g in b(Q) that contains
    x is in Gi

15
  • The occurrence of vi in the rewriting may have some
    head variables equated
  • Example: the original head might be vi(A, B, C),
    the head in the rewriting vi(X, X, Z)
  • These equalities are given by a unique least set of
    equality constraints Ei (v/E: the view v, with head
    variables equated as specified by E)
  • Summary (so far): the containment mapping can be
    decomposed into disjoint components (vi, Ei, h/i, Gi)
  • All we need to do is find such components, then combine
    them
  • What is the condition for successful combination?
    Does a combination (s.t. ) ever fail?

16
  • To find such components, we must use the given
    view definitions (variables different from those
    of Q or of the expanded rewriting).
  • Answer: a component and its mapping can be
    expressed as
  • Here
  • hi is a mapping from Q to the given view
    definition for vi
  • Ei is the least set of equalities that make
    hi a good mapping
  • hi is a variable renaming
  • Ei and hi depend only on Q and the definition of
    vi
  • We can find components (mappings from Q to the
    view defs), then combine and rename, possibly
    equating more head vars

[Diagram: the mappings h/i and hi relating Gi, exp(vi(Xi)) and vi/Ei]
17
  • One more step
  • A component (vi, Ei, hi , Gi) may be further
    decomposed into smaller components (vi, Ei1, hi1
    , Gi1), (vi, Ei2, hi2 , Gi2) provided
  • each of Gi1, Gi2 satisfies (jvc), and they are
    disjoint
  • Each of Ei1, Ei2 is a subset of Ei, least sets
    for the mappings hi1, hi2 to be ok
  • When these are combined, Ei1 union Ei2 is
    augmented with the remaining equalities of Ei
  • Minimal such components
  • Easier to find
  • Can be re-used for different combinations.

18
  • What is a minimal component?
  • C = (vi, Ei, hi, Gi) is minimal if
  • hi satisfies (hvp) and (jvc) (assuming the
    equalities in Ei)
  • There is no component C1 whose last three
    components are contained in C's last three
    components (at least one is proper containment)
  • A component is a MiniCon (mini containment)
    description (MCD)
  • The algorithm constructs and combines minimal MCDs

19
The MiniCon Algorithm
  • Minimal MCD Construction Algorithm
  • For each g in b(Q), each k in each b(vi)
  • Let E(g,k) be the least set of equalities s.t.
    a mapping h(g,k) from g to k that satisfies (hvp)
    exists
    // E(g,k) and h(g,k), if they exist,
    // are uniquely determined by g, k
  • If E(g,k) and h(g,k) exist
  • find all minimal MCDs that extend them
  • (vi, Ei, hi, Gi) extends them if
  • Ei contains E(g,k), hi contains
    h(g,k), Gi contains g
  • For the final set of MCDs: remove duplicates

20
  • How do we find minimal MCDs that extend a given
    mapping?
  • I. Extension to one more query atom, one view
    atom
  • extend(vi, E, h, g, k)
    // E: equalities on head vars of vi
    // h: vars(Q) → vars(vi), partial, hvp with E
    // g in b(Q), k in b(vi)
  • try to extend h to map g to k, with hvp, by
    adding equalities to E
  • return fail, or the (uniquely determined)
    E, h
  • (The first step in alg. of previous page is this
    one, given empty E and h)
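
One possible reading of extend as code is sketched below. The assumptions are mine: atoms are (predicate, args) tuples, E is kept as a parent map over view head variables (G mapped to C stands for C=G), failure is reported as None, and the query head variables are passed as an extra argument that the slide's signature leaves implicit.

```python
def find(E, y):
    """Representative of view variable y under the equalities in E."""
    while E.get(y, y) != y:
        y = E[y]
    return y


def extend(view_head, query_head, E, h, g, k):
    """Try to extend h so that query atom g maps onto view atom k.

    Returns the (E, h) pair after extension, or None on failure.
    The inputs E and h are copied, never modified.
    """
    (g_pred, g_args), (k_pred, k_args) = g, k
    if g_pred != k_pred or len(g_args) != len(k_args):
        return None
    E, h = dict(E), dict(h)
    for x, y in zip(g_args, k_args):
        if x in query_head and y not in view_head:
            return None                 # would violate hvp
        if x not in h:
            h[x] = y
            continue
        old, new = find(E, h[x]), find(E, y)
        if old == new:
            continue
        if old not in view_head or new not in view_head:
            return None                 # only head variables can be equated
        E[new] = old                    # record the equality old = new
    return E, h


# q(X, X) :- par(X, Z), par(Z, X)   and   v(C, G) :- par(C, P), par(P, G)
view_head, query_head = {"C", "G"}, {"X"}
step1 = extend(view_head, query_head, {}, {}, ("par", ("X", "Z")), ("par", ("C", "P")))
step2 = extend(view_head, query_head, *step1, ("par", ("Z", "X")), ("par", ("P", "G")))
blocked = extend(view_head, query_head, {}, {}, ("par", ("X", "Z")), ("par", ("P", "G")))
print(step1)
print(step2)
print(blocked)
```

The second call has to equate C and G because X is already mapped to C, and the third call fails because the head variable X would land on the existential P.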

21
  • How do we find minimal MCDs that extend a given
    mapping?
  • II. Extend repeatedly, as long as needed and
    successful
  • Given vi, g, k, E(g,k) and h(g,k)
  • Let C = {(vi, E(g,k), h(g,k), {g})}, MC = {}
    // C: initial component, (jvc) possibly not
    satisfied
  • While C not empty
  • remove some c = (vi, E, h, G) from C
  • if (jvc) is satisfied, put c in MC
  • if not, there exist x in vars(Q) s.t. h(x) is
    existential, and g' that contains x, g' not in G
  • for each k' in b(vi)
  • if extend(vi, E, h, g', k')
    succeeds, put the extension in C
  • Remove duplicates from MC
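
The worklist above can be sketched for a single view as follows. This is a hedged illustration rather than the algorithm as published: it repeats a compact extend helper so that it runs on its own, represents E as a parent map over head variables, identifies covered atoms by their position in b(Q), and removes duplicates by plain equality only, so two descriptions that differ merely in how an equality is recorded are both kept.

```python
def find(E, y):
    while E.get(y, y) != y:
        y = E[y]
    return y


def extend(view_head, query_head, E, h, g, k):
    """Extend h to map atom g onto atom k, or return None."""
    (g_pred, g_args), (k_pred, k_args) = g, k
    if g_pred != k_pred or len(g_args) != len(k_args):
        return None
    E, h = dict(E), dict(h)
    for x, y in zip(g_args, k_args):
        if x in query_head and y not in view_head:
            return None
        if x not in h:
            h[x] = y
            continue
        old, new = find(E, h[x]), find(E, y)
        if old != new:
            if old not in view_head or new not in view_head:
                return None
            E[new] = old
    return E, h


def uncovered_atom(h, G, query_body, view_head):
    """Index of a query atom that jvc forces into the cover, or None."""
    for x, y in h.items():
        if y in view_head:
            continue
        for i, (_, args) in enumerate(query_body):
            if x in args and i not in G:
                return i
    return None


def minimal_mcds(query_head, query_body, view_head, view_body):
    MC = []
    for gi, g in enumerate(query_body):
        for k in view_body:
            first = extend(view_head, query_head, {}, {}, g, k)
            if first is None:
                continue
            C = [(first[0], first[1], frozenset([gi]))]
            while C:
                E, h, G = C.pop()
                missing = uncovered_atom(h, G, query_body, view_head)
                if missing is None:          # jvc satisfied
                    if (E, h, G) not in MC:
                        MC.append((E, h, G))
                    continue
                for k2 in view_body:
                    ext = extend(view_head, query_head, E, h, query_body[missing], k2)
                    if ext is not None:
                        C.append((ext[0], ext[1], G | {missing}))
    return MC


query_body = [("par", ("X", "Z")), ("par", ("Z", "Y"))]
view_body = [("par", ("C", "P")), ("par", ("P", "G"))]
grand = minimal_mcds({"X", "Y"}, query_body, {"C", "G"}, view_body)
parent = minimal_mcds({"X", "Y"}, query_body, {"C", "P"}, view_body)
print(grand)
print(parent)
```

The first call uses the head v(C, G) and the second the head v(C, P), with the same view body in both cases.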

22
  • Example
  • A db: parenthood relation par(c, p)
  • A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
  • A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
  • MCDs
  • 1st query atom, 1st view atom: h(1,1) = {X → C, Z → P},
    E(1,1) = {}
  • need to extend to par(Z, Y), can only map to the
    2nd view atom
  • MCD: (v, E = {}, h = {X → C, Z → P, Y → G}, b(Q))
  • 1st query atom, 2nd view atom: no mapping
  • The only MCD is the above

23
Comment
  • In the paper, if (vi, Ei1, hi1, Gi1) and (vi, Ei2, hi2, Gi2)
    are both minimal extensions, and Gi1 is contained in Gi2,
    then the 2nd is thrown away (another minimization)
  • I do not know how to explain this optimization, or prove
    that with it the algorithm is still complete
24
2nd phase: MCD combination, and variable renaming
  • A set of MCDs (vi, Ei, hi, Gi) is a candidate if
  • For each candidate set, rename variables:
  • for each view variable y:
    if hi(x) = y (y a view variable), rename y to x,
    else rename y to a fresh distinct variable
  • Note: if x is in the domain of both hi, hj, then hi(x), hj(x)
    are head variables of vi, vj (by def. of MCD),
    ⇒ renaming makes them equal
25
Example (contd)
  • A db: parenthood relation par(c, p)
  • A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
  • A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
  • MCD: (v, E = {}, h = {X → C, Z → P, Y → G}, b(Q))
  • Rename in v: C to X, G to Y
  • Rewriting: q(X, Y) :- v(X, Y)
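
The second phase can be sketched as follows. The candidate condition is not legible in the transcript, so this sketch assumes it is the one suggested earlier by "combine views with disjoint covers": the Gi are pairwise disjoint and together cover b(Q). MCDs are encoded as dicts with hypothetical keys (view, head, E, h, G), E is a parent map over head variables, and fresh variables are named _F1, _F2, ... purely for illustration.

```python
from itertools import combinations, count


def find(E, y):
    """Representative of view variable y under the equalities in E."""
    while E.get(y, y) != y:
        y = E[y]
    return y


def rewritings(n_atoms, mcds):
    """Combine MCDs whose covers partition the query body, then rename."""
    fresh = count(1)
    results = []
    for r in range(1, len(mcds) + 1):
        for combo in combinations(mcds, r):
            covers = [m["G"] for m in combo]
            union = set().union(*covers)
            disjoint = sum(len(G) for G in covers) == len(union)
            if not disjoint or union != set(range(n_atoms)):
                continue
            body = []
            for m in combo:
                names = {}
                for y in m["head"]:
                    rep = find(m["E"], y)
                    if rep in names:
                        continue
                    # Query variables whose image is (equated with) y.
                    sources = sorted(
                        x for x, img in m["h"].items() if find(m["E"], img) == rep
                    )
                    names[rep] = sources[0] if sources else f"_F{next(fresh)}"
                args = tuple(names[find(m["E"], y)] for y in m["head"])
                body.append((m["view"], args))
            results.append(body)
    return results


# The MCD of the running example: (v, {}, {X→C, Z→P, Y→G}, b(Q)).
grand_mcd = {
    "view": "v", "head": ("C", "G"), "E": {},
    "h": {"X": "C", "Z": "P", "Y": "G"}, "G": frozenset({0, 1}),
}
print(rewritings(2, [grand_mcd]))

# Same view, but with C=G recorded in E (stored here as G -> C).
loop_mcd = {
    "view": "v", "head": ("C", "G"), "E": {"G": "C"},
    "h": {"X": "C", "Z": "P"}, "G": frozenset({0, 1}),
}
print(rewritings(2, [loop_mcd]))
```

When several query variables map to the same head variable, the sketch simply picks the alphabetically first one; a fuller version would also equate those variables in the rewriting.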
26
  • Example
  • A db: parenthood relation par(c, p)
  • A view: v(C, G) :- par(C, P), par(P, G)   // only grandchildren
  • A query Q: q(X, X) :- par(X, Z), par(Z, X)
    // I am my own grandpa
  • MCDs
  • 1st query atom, 1st view atom: h(1,1) = {X → C, Z → P},
    E(1,1) = {}
  • need to extend to par(Z, X), can only map to the
    2nd view atom
  • MCD: (v, {C=G}, {X → C, Z → P}, b(Q))
  • 1st query atom, 2nd view atom: no mapping
  • The only MCD is the above

27
  • Example
  • A db: parenthood relation par(c, p)
  • A view: v(C, P) :- par(C, P), par(P, G)
    // parents where grandparents exist
  • A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
  • MCDs
  • h(1,1) = {X → C, Z → P}, E(1,1) = {}
  • ⇒ MCD A1 = (v(C, P), {}, h(1,1), {par(X, Z)})
  • h(1,2) = {X → P, Z → G}, E(1,2) = {}: fails (why?)
  • h(2,1) = {Z → C, Y → P}, E(2,1) = {}
  • ⇒ MCD A2 = (v(C, P), {}, h(2,1), {par(Z, Y)})
  • h(2,2) = {Z → P, Y → G}: fails (why?)

28
  • A view: v(C, P) :- par(C, P), par(P, G)
  • A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
  • MCDs:
    A1 = (v(C, P), {}, h(1,1), {par(X, Z)})
    A2 = (v(C, P), {}, h(2,1), {par(Z, Y)})
  • Rewritings (rename views to have distinct vars)
  • A1 + A2: X → C1, Z → P1, Z → C2, Y → P2
  • add P1 (in 1st v) = C2 (in 2nd v)
  • rewriting: v(C1, P1), v(P1, P2)
  • renaming: v(X, Z), v(Z, Y), a correct rewriting
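
That the combined rewriting is correct can be checked by expanding it and searching for a containment mapping from Q into the expansion. The sketch below is a brute-force check for small examples, under the assumptions that all terms are variables (no constants or comparisons) and that fresh existential variables may be named _E1, _E2, ...; `expand` and `containment_mapping` are hypothetical helper names.

```python
from itertools import count


def expand(rewriting_body, views):
    """Replace each view atom by the view's body, with fresh existentials."""
    fresh = count(1)
    atoms = []
    for name, args in rewriting_body:
        head, body = views[name]
        sub = dict(zip(head, args))       # one substitution per occurrence
        for pred, b_args in body:
            for y in b_args:
                if y not in sub:
                    sub[y] = f"_E{next(fresh)}"
            atoms.append((pred, tuple(sub[y] for y in b_args)))
    return atoms


def containment_mapping(query_head, query_body, target_head, target_body):
    """Find h mapping the query head to the target head and each query atom
    to some target atom; returns the mapping as a dict, or None."""
    if len(query_head) != len(target_head):
        return None
    start = {}
    for x, t in zip(query_head, target_head):
        if start.setdefault(x, t) != t:
            return None

    def search(i, h):
        if i == len(query_body):
            return h
        pred, args = query_body[i]
        for t_pred, t_args in target_body:
            if t_pred != pred or len(t_args) != len(args):
                continue
            h2 = dict(h)
            if all(h2.setdefault(x, t) == t for x, t in zip(args, t_args)):
                found = search(i + 1, h2)
                if found is not None:
                    return found
        return None

    return search(0, start)


views = {"v": (("C", "P"), [("par", ("C", "P")), ("par", ("P", "G"))])}
query_head = ("X", "Y")
query_body = [("par", ("X", "Z")), ("par", ("Z", "Y"))]

good = expand([("v", ("X", "Z")), ("v", ("Z", "Y"))], views)
bad = expand([("v", ("X", "Y"))], views)
print(good)
print(containment_mapping(query_head, query_body, query_head, good))
print(containment_mapping(query_head, query_body, query_head, bad))
```

A mapping exists for v(X, Z), v(Z, Y), so its expansion is contained in Q; for the single atom v(X, Y) with this view no mapping is found.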
29
  • When Q or views contain constants
  • MCD formation
  • A constant a of Q must be mapped to a head variable
    of vi, or to itself
  • If x is in headvar(Q), it can be mapped to
    headvar(vi) or to a
  • Whenever x is mapped to a, hi records this fact
  • MCD combination
  • If A1, A2 are both defined on x, then allow also:
  • Both map x to a
  • One maps x to a, the other to a head var of its view
  • In either case, rename x to a in the rewriting

30
  • When Q or views contain comparisons
  • If views contain comparisons, no change to
    algorithm (it finds contained
    rewritings anyway)
  • If Q contains comparisons, then there may be no
    Datalog program that computes the certain answers
    (can express x != y)
  • But we can expect that extending the algorithm
    for comparisons will be a good heuristic, and
    will find certain answers in many cases

31
  • When Q or views contain comparisons
  • C(Q): constraints of Q (closed under inference)
  • MCD formation for (vi, Ei, hi, Gi) (extends the
    join variable condition)
  • If hi(x) is existential of vi, and c(x, y) in
    C(Q), then hi(y) is defined
  • C(vi) must imply all constraints in hi(C(Q))
    that involve at least one existential of vi
  • MCD combination
  • Add all constraints of C(Q) not covered by those
    of the views