Views as Incomplete Databases: Certain and Possible Answers

Slides: 20
Provided by: off9

Transcript and Presenter's Notes

1
Views as Incomplete Databases: Certain and Possible Answers
  • Views: an incomplete representation
  • Certain and possible answers
  • Complexity results for certain answers

2
Views: an incomplete representation
  • Given a view definition V and a view extension I
  • Sound V: I is contained in V(D)
  • Complete V: I contains V(D)
  • Precise V: I = V(D)
  • V may also be mixed: some views are sound, others
    are complete
  • In general, more than one db D may exist s.t. I is
    a possible view extension of D

3
  • Example: teams in the World Cup soccer tournament
  • Global schema: Team(country, group) (group
    assignment for the first round)
  • Source 1: S-C(C), the countries that participate
  • Source 2: S-Q(C), the countries that participated
    in qualifying games
  • Source 3: S-T(C), the teams whose games will be on
    TV
  • For all three, the logical mapping is
  • v(X) :- Team(X, Y)
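
The sound/complete/precise conditions from the previous slide can be checked mechanically on this example. The following is a minimal Python sketch; the database contents, the three source extensions, and the function names `view` and `classify` are made up for illustration and do not come from the slides.

```python
# Hypothetical global database: Team(country, group). The rows are invented.
team = {("Brazil", "A"), ("Norway", "A"), ("Italy", "B"), ("Chile", "B")}

def view(db):
    """Evaluate v(X) :- Team(X, Y): project Team onto the country column."""
    return {(country,) for country, _group in db}

def classify(extension, db):
    """Report which of sound / complete / precise hold for extension I w.r.t. D."""
    answers = view(db)
    return {
        "sound": extension <= answers,     # I is contained in V(D)
        "complete": extension >= answers,  # I contains V(D)
        "precise": extension == answers,   # I = V(D)
    }

# Invented extensions for the three sources.
s_c = {("Brazil",), ("Norway",), ("Italy",), ("Chile",)}  # all participants
s_q = s_c | {("Peru",)}                                   # plus a failed qualifier
s_t = {("Brazil",), ("Italy",)}                           # only televised teams

for name, ext in [("S-C", s_c), ("S-Q", s_q), ("S-T", s_t)]:
    print(name, classify(ext, team))
```

With these invented extensions, S-C comes out precise, S-Q complete but not sound, and S-T sound but not complete, which is one natural reading of the three sources.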

4
Given V (including a specification of each view as
sound/complete/precise) and I:
  • poss(V, I) = {D | D is a db for which I is a
    possible view extension}
  • Since we have only the views, this is the set of
    possible databases
  • For sound views: an infinite set
  • For complete views: contains the empty db
  • For precise views: may be empty (inconsistent views)
  • Example: v1(X, Y) :- R(X, Y, Z), with extension {(a, b), (b, c)},
    and v2(X, Z) :- R(X, Y, Z), with extension {(a, d), (c, e)}
  • The above changes when the global db is known to
    satisfy constraints (e.g. keys)
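
To see why the example is inconsistent under the precise reading: both views keep column X of R, so any D with v1(D) and v2(D) equal to the given extensions would have to produce the same set of X-values in both. A minimal Python sketch of this necessary-condition check follows; the helper name `x_values` is an assumption of the sketch, not something from the slides.

```python
v1 = {("a", "b"), ("b", "c")}   # extension of v1(X, Y) :- R(X, Y, Z)
v2 = {("a", "d"), ("c", "e")}   # extension of v2(X, Z) :- R(X, Y, Z)

def x_values(extension):
    """X is the first column of both views."""
    return {row[0] for row in extension}

# Precise views force: x_values(v1) == projection of R on X == x_values(v2).
consistent = x_values(v1) == x_values(v2)
print(sorted(x_values(v1)), sorted(x_values(v2)), consistent)
# prints: ['a', 'b'] ['a', 'c'] False
```

The check is a necessary condition for consistency; here it fails, so poss(V, I) is empty when both views are precise.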
5
Certain and possible answers
  • Now, assume also a query Q
  • cert(Q, V, I): the tuples that are in Q(D) for every
    D in poss(V, I); seems easier to compute, always
    finite
  • poss(Q, V, I): the tuples that are in Q(D) for some
    D in poss(V, I); may be infinite, and where do we
    obtain values not in I?
  • A possible approach: a finite representation of a
    possibly infinite family of partially unknown
    databases

6
We concentrate on certain answers: an absolute
notion of answering queries using views
  • cert(Q, V, I) depends on soundness/completeness of views
  • Example: global relation p(x, y)
  • Views: v1(x) :- p(x, y), v2(y) :- p(x, y)
  • I: v1(a), v2(b)
  • Q: q(x, y) :- p(x, y)
  • Sound views: cert(Q, V, I) is empty
  • Precise views: cert(Q, V, I) is {(a, b)}
7
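An issue in query processing
  • For the same example, let Q be s(x) :- p(x, y)
  • With sound views, (a) is a certain answer to s even
    though q(x, y) has no certain answers, so it cannot be
    obtained by projecting the certain answers of q
  • To allow relational algebra manipulation of certain
    answers, we need more than a simple relational
    representation!
  • We need algorithms for performing operations on
    representations of partially unknown dbs (not in
    this course)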
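
The example from the last two slides can be replayed by brute force. The sketch below is only an illustration under a loud assumption: it enumerates databases over a small finite domain {a, b, c}, whereas poss(V, I) for sound views is really infinite, so in general this kind of enumeration only over-approximates the certain answers. All function names are hypothetical.

```python
from itertools import chain, combinations, product

DOMAIN = ["a", "b", "c"]                 # assumption: a small finite domain
PAIRS = list(product(DOMAIN, repeat=2))  # all candidate p(x, y) tuples

def all_databases():
    """Every instance of p(x, y) over DOMAIN (2**9 = 512 of them)."""
    subsets = chain.from_iterable(
        combinations(PAIRS, k) for k in range(len(PAIRS) + 1))
    return (set(subset) for subset in subsets)

def v1(db):
    return {x for x, _ in db}            # v1(x) :- p(x, y)

def v2(db):
    return {y for _, y in db}            # v2(y) :- p(x, y)

def q(db):
    return set(db)                       # q(x, y) :- p(x, y)

def s(db):
    return {(x,) for x, _ in db}         # s(x) :- p(x, y)

I1, I2 = {"a"}, {"b"}                    # I: v1(a), v2(b)

def poss(kind):
    """Databases over DOMAIN for which I is a sound (or precise) extension."""
    for db in all_databases():
        if kind == "sound" and I1 <= v1(db) and I2 <= v2(db):
            yield db
        elif kind == "precise" and I1 == v1(db) and I2 == v2(db):
            yield db

def cert(query, kind):
    """Intersect the query answers over all enumerated possible databases."""
    return set.intersection(*(query(db) for db in poss(kind)))

print(cert(q, "sound"))    # set()
print(cert(q, "precise"))  # {('a', 'b')}
print(cert(s, "sound"))    # {('a',)}
```

The last line shows the issue: the certain answer to s under sound views is not the projection of the (empty) certain answer to q.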
8
  • From now on: sound views, certain answers
  • This was investigated for
  • views defined in L1, query defined in L2,
    where
  • L1, L2 in CQ, CQ≠ (CQ with inequalities), NR-Datalog, Datalog, FO
  • Results include
  • Complexity: lower bounds
  • Algorithms: upper bounds

9
Complexity results for certain answers
  • Thm: for V in L1, Q in L2, the following are
    equivalent
  • (a) computing cert(Q, V, I)
  • (b) deciding containment: is Q1 (in L1)
    contained in Q2 (in L2)?
  • (a) is decidable iff (b) is
  • When decidable,
  • combined complexity of (a) = query complexity
    of (b)
  • data complexity of (a) <= query complexity of (b)
  • Data complexity: a function of the db size
  • Query complexity: a function of the query size
  • Combined: both

10
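Proof (sketch), reducing (a) to (b): given t, how hard is it to
decide if t is in cert(Q, V, I)?
  • Let I = {vi(tij)}, and define Q' as follows
  • Q' contains the rules that define V, and one more
    large rule q'(t) :- (the conjunction of all facts
    vi(tij) in I), i.e. t follows from the facts in I
  • Claim: t is in cert(Q, V, I) iff Q' is contained in Q
  • Hence deciding if t is in cert(Q, V, I) is no harder
    than this containment
  • (Note: for L1 = CQ, need to massage Q' into a CQ)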
11
  • How hard is it to check containment of Q1 in Q2?
  • Let p be a new predicate
  • Define V by the rules of Q1, and v(c) :- q1(X),
    p(X)
  • Let I = {v(c)}
  • Define Q by the rules of Q2, and q(c) :- q2(X),
    p(X)
  • Then (c) is in cert(Q, V, I) iff Q1 is
    contained in Q2

12
  • Consequences: computing certain answers
    (depends on L1, L2)
  • is undecidable for Datalog, FO
  • is decidable if one side <= Datalog, other
    side <= NR-Datalog
  • For decidable cases, the above gives the combined
    complexity
  • We are more interested in data complexity
    [table of data complexity results not preserved in the transcript]
  • Co-NP data complexity is bad: impractical to
    compute, no Datalog plan!
  • We will not prove co-NP complexity results

13
  • Claim: For Q in Datalog, V in CQ(≠), let V' be
    the same view def, with inequalities omitted
  • Then cert(Q, V, I) = cert(Q, V', I)
  • (Computing the certain answers from I using V w/o
    the inequalities gives the same results)
  • Proof
  • (a) Every D in poss(V, I) is also in poss(V', I),
    so cert(Q, V', I) is contained in cert(Q, V, I)
  • (b) If t is in cert(Q, V, I), then for each D in
    poss(V, I), t is in Q(D); now take any D in poss(V', I)
  • If D is also in poss(V, I): fine
  • If D is not in poss(V, I), there exists a larger D' in
    poss(V, I) s.t. t in Q(D') implies t in Q(D)
  • Hence, t is in cert(Q, V', I)

14
  • Proof of last claim
  • Some s is in I, but s is not in V(D), because of some
    inequality
  • Since s is in V'(D), the inequality involves an
    attribute in the view body
  • We can add some tuples to D to obtain D1, s.t. s
    is in V(D1)
  • Adding tuples for all such s gives D' that contains
    D, s.t. D' is in poss(V, I)
  • If t is in Q(D'), then since Q has no inequalities, t
    is also in Q(D)

15
  • For CQ views, Datalog queries:
  • Query plan: a Datalog program P on V
  • exp(P): replace views by their definitions
  • (using fresh names for existential
    variables)
  • P is maximally contained in Q if, for every D,
  • exp(P)(D) is contained in Q(D), and
  • exp(P')(D) is contained in exp(P)(D) for all other
    plans P' contained in Q
  • Such a plan is best among all plans
  • (This is a language-dependent notion: given a
    more expressive language, P may not be best any
    more)
  • But, if a plan delivers cert(Q, V, I) it is
    absolutely best
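
The expansion step exp(P) can be sketched for a single plan rule as follows. This is a minimal illustration, assuming that view bodies contain only variables (no constants) and that plan variables never clash with the generated fresh names; the representation of atoms as `(predicate, args)` pairs and the function name `expand` are choices of the sketch, not of the slides.

```python
from itertools import count

def expand(plan_body, views):
    """Replace each view atom in a plan body by the view's definition.

    views maps a view name to (head_vars, body_atoms). Existential variables
    of a view (body variables not in its head) get a fresh name per use.
    """
    fresh = count(1)
    expanded = []
    for pred, args in plan_body:
        if pred not in views:          # not a view atom: keep as is
            expanded.append((pred, args))
            continue
        head, body = views[pred]
        substitution = dict(zip(head, args))
        n = next(fresh)
        for body_pred, body_args in body:
            expanded.append(
                (body_pred,
                 tuple(substitution.get(a, f"{a}_{n}") for a in body_args)))
    return expanded

# Views of the earlier example: v1(X) :- p(X, Y) and v2(Y) :- p(X, Y).
views = {
    "v1": (("X",), [("p", ("X", "Y"))]),
    "v2": (("Y",), [("p", ("X", "Y"))]),
}

# Body of a hypothetical plan rule: plan(A, B) :- v1(A), v2(B).
print(expand([("v1", ("A",)), ("v2", ("B",))], views))
# prints: [('p', ('A', 'Y_1')), ('p', ('Y_2'.replace('Y', 'X'), 'B'))] is NOT the format;
# the actual list is [('p', ('A', 'Y_1')), ('p', ('X_2', 'B'))]
```

The fresh names Y_1 and X_2 keep the two uses of p independent, which is why this plan is not contained in q(x, y) :- p(x, y).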

16
  • Thm: For CQ sound views, Datalog queries,
  • the inverse rules algorithm computes cert(Q, V,
    I)
  • (Thus, for this case, a Datalog query plan can
    give the absolute best possible answer)
  • Corollary: If P is max-cont(Q) then, for all
    view instances I, P(I) = cert(Q,
    V, I)
  • We proceed to prove the theorem

17
  • Def: A tableau is a collection of atoms, with
    constants and variables
  • A tableau T represents a db D if
  • there is a valuation h from T
    into D
  • Rep(T) = {D | for some h, D contains h(T)}
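
Membership in rep(T) can be tested by searching for a valuation h. The sketch below does this by brute force over the constants of D; the convention that tableau variables are strings starting with "?" and the function names are assumptions made for the illustration.

```python
from itertools import product

def is_var(term):
    """Assumed convention: tableau variables are strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")

def represents(tableau, db):
    """True if some valuation h maps every atom of the tableau into db."""
    variables = sorted({t for _, args in tableau for t in args if is_var(t)})
    constants = sorted({c for _, args in db for c in args})
    for values in product(constants, repeat=len(variables)):
        h = dict(zip(variables, values))
        image = {(pred, tuple(h.get(t, t) for t in args))
                 for pred, args in tableau}
        if image <= db:               # D contains h(T)
            return True
    return False

T = {("p", ("a", "?x")), ("p", ("?x", "b"))}
D1 = {("p", ("a", "c")), ("p", ("c", "b")), ("p", ("d", "d"))}
D2 = {("p", ("a", "c")), ("p", ("d", "b"))}
print(represents(T, D1), represents(T, D2))   # True False
```

D1 is in rep(T) via h(?x) = c and may contain extra tuples; D2 is not, because no single value for ?x fits both atoms.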

18
  • Claim: For a Datalog query Q and a tableau T,
  • cert(Q, rep(T)) = the tuples w/o variables
    in Q(T)
  • Proof
  • (a) Can consider only D in rep(T) s.t. D = h(T)
  • every tuple in Q(D') but not in Q(D), where D'
    is larger than D = h(T), is not in cert(Q, rep(T))
  • (b) For such D, h(Q(T)) is contained in Q(D)
  • => a ground tuple in Q(T) is in cert(Q, rep(T))
  • (c) For a non-ground tuple t in Q(T), can find
    D1, D2 in rep(T) that give different values to
    variables in t
  • => no instance of this tuple is in cert(Q, rep(T))
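
The claim suggests a simple procedure: evaluate Q over T as if the tableau variables were ordinary values, then keep only the variable-free tuples. The sketch below does this for a single conjunctive query rather than a full Datalog program, which is a simplifying assumption; the naming conventions ("?x" for tableau variables, capitalized names for query variables) and the function names are also assumptions.

```python
def is_var(term):
    """Tableau variables: strings starting with '?' (assumed convention)."""
    return isinstance(term, str) and term.startswith("?")

def is_qvar(term):
    """Query variables: capitalized strings (assumed convention)."""
    return isinstance(term, str) and term[:1].isupper()

def evaluate(head, body, facts):
    """Evaluate one conjunctive query, treating tableau variables as values."""
    bindings = [{}]
    for pred, args in body:
        extended = []
        for binding in bindings:
            for fact_pred, fact_args in facts:
                if fact_pred != pred or len(fact_args) != len(args):
                    continue
                candidate = dict(binding)
                # Bind query variables consistently; constants must match.
                if all((candidate.setdefault(a, v) == v) if is_qvar(a) else a == v
                       for a, v in zip(args, fact_args)):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b.get(a, a) for a in head) for b in bindings}

def certain(head, body, tableau):
    """Keep only the tuples of Q(T) without tableau variables."""
    return {t for t in evaluate(head, body, tableau)
            if not any(is_var(x) for x in t)}

T = {("p", ("a", "?x")), ("p", ("?x", "b")), ("p", ("c", "d"))}

# q(X, Y) :- p(X, Y)
print(certain(("X", "Y"), [("p", ("X", "Y"))], T))                      # {('c', 'd')}
# r(X, Z) :- p(X, Y), p(Y, Z)
print(certain(("X", "Z"), [("p", ("X", "Y")), ("p", ("Y", "Z"))], T))   # {('a', 'b')}
```

The second query shows that a join through a variable can still produce a ground, and hence certain, tuple.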

19
  • The inverse rules of V create from a view instance I a
    database whose elements may be Skolem function terms
  • Consider each Skolem term to be a distinct
    variable
  • This is a tableau T(V, I)
  • Claim: T(V, I) represents poss(V, I)
  • Proof: easy
  • Corollary: the set of variable-free tuples in
    Q(T(V, I)) is cert(Q, V, I)
  • This is precisely what the inverse rules algorithm
    produces
  • For each I, the inverse rules produce T(V, I),
    then apply Q
  • End of story
  • Next: one more (last) algorithm, for CQ queries
    and views, that is the fastest so far
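
The construction of T(V, I) on this slide can be sketched as follows, using the views v1(x) :- p(x, y) and v2(y) :- p(x, y) from the earlier example. This is an illustration only: it assumes view bodies contain only variables, represents a Skolem term as a pair `(function_name, arguments)`, and uses hypothetical names such as `inverse_rules` and `build_tableau`.

```python
def inverse_rules(views):
    """For each view v(head) :- body, emit one inverse rule per body atom.

    Existential variables (in the body, not in the head) are replaced by
    Skolem terms of the form (f_<view>_<var>, head_variables).
    """
    rules = []
    for name, (head, body) in views.items():
        for pred, args in body:
            new_args = tuple(
                a if a in head else (f"f_{name}_{a}", head) for a in args)
            rules.append((name, head, pred, new_args))
    return rules

def build_tableau(views, instance):
    """Apply the inverse rules to every view fact in I, producing T(V, I)."""
    facts = set()
    for name, head, pred, args in inverse_rules(views):
        for row in instance.get(name, ()):
            h = dict(zip(head, row))

            def ground(term):
                if isinstance(term, tuple):          # Skolem term
                    function, params = term
                    return (function, tuple(h[p] for p in params))
                return h[term]                       # head variable

            facts.add((pred, tuple(ground(a) for a in args)))
    return facts

views = {
    "v1": (("X",), [("p", ("X", "Y"))]),   # v1(X) :- p(X, Y)
    "v2": (("Y",), [("p", ("X", "Y"))]),   # v2(Y) :- p(X, Y)
}
I = {"v1": {("a",)}, "v2": {("b",)}}

for fact in sorted(build_tableau(views, I), key=repr):
    print(fact)
```

The result has the two atoms p(a, f_v1_Y(a)) and p(f_v2_X(b), b). Reading each Skolem term as a distinct variable, q(x, y) :- p(x, y) yields no variable-free tuple, while s(x) :- p(x, y) yields (a), in line with the sound-view example earlier in the deck.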