NTUA - PowerPoint PPT Presentation

Slides: 79
Provided by: Alin161

Transcript and Presenter's Notes


1
  • XML Query Reformulation
  • Val Tannen
  • University of Pennsylvania
  • Joint work with Alin Deutsch, UC San Diego
  • and in part with Lucian Popa, IBM Almaden

2
Data Exchange Between Businesses Using XML
published data
published data
pharmaceutical company
insurance company
published data
published data
hospital
3
XML?
<drug> <name>aspirin</name>
<price>4</price> <notes>
<side-effects>upset stomach</side-effects>
<maker>Bayer</maker> </notes> </drug>
text
4
A Simple Publishing Scenario
virtual data
<study> <case> <diag>migraine</diag>
<drug>aspirin</drug>
<usage>2/day</usage> </case> <case>
<diag>allergy</diag>
<drug>cortisone</drug> <usage>3/day</usage>
</case> </study>
patient name is hidden
XML query language standard (draft)
published data
proprietary data
prescription (usage, drug, name):
  2/day, aspirin, John
  3/day, cortisone, Jane
patient (name, diagnosis):
  John, migraine
  Jane, allergy
How to express the view?
View query which, if executed,
would produce the virtual data
How to compose the client query with the
view, obtaining the reformulation?
5
The General Problem of Query Reformulation
client
query Q(P)
? reformulated query X(S)
schema P
schema S
schema correspondence
Given query Q(P), find query(ies) X(S) returning
the same answer (soundness),
whenever such X(S) exists (completeness)
6
Applications of Query Reformulation
  • data publishing
  • data integration
  • schema evolution
  • data security

data publishing: public schema (P) / storage schema (S); we just saw it
data integration: global schema (P) / local schema (S)
schema evolution: old schema (P) / new schema (S)
data security: illustrated next
7
An Application: Data Security
client
query E(S) (exposes secret data correlation)
public schema P
proprietary schema S
schema correspondence
Only possible if Completeness Property holds!
8
More Complicated Data Publishing: Mixed And
Redundant Storage (MARS)
initial configuration
9
An Example With Tuning
XML
XML
drug,usage,diagnosis
simple publishing view
identity view
XML
drug,price,notes
drug,usage,name
name,diagnosis
10
Redundancy Enables Multiple Reformulations
client query: find how much each treatment costs
XML
XML
drug,usage,diagnosis
simple publishing view
identity view
cached query
relational view
XML
XML
drug,price,notes
drug,price
drug,usage,name
name,diagnosis
diagnosis,drug
Some reformulations are potentially cheaper to
execute than others. Want to find an optimal
one!
11
Schema Correspondence Expressible in XQuery
The DB administrator must be able to specify the
correspondence.
XML
XML
XQuery
XQuery
XQuery
XQuery
XML
XML
encode
encode
XML
XML
Can use XQuery, fixing any of the common
encodings of relational tables in XML.
12
XQuery?
binding part
drug
for $d in document/drug,
$m in $d//maker return <producedBy>$m/text()</producedBy>
name
price
notes
aspirin
4
side-effects
maker
tagging template
upset stomach
Bayer
// (descendant) is the transitive closure of /
(child)
Result should contain <producedBy>Bayer</producedBy>
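The two stages of this query (binding part, then tagging template) can be imitated outside an XQuery engine. The sketch below is only an illustration in Python using the standard library's ElementTree, not the system described in these slides; the function name `produced_by` is made up for the example.

```python
import xml.etree.ElementTree as ET

DOC = ("<drug><name>aspirin</name><price>4</price><notes>"
       "<side-effects>upset stomach</side-effects>"
       "<maker>Bayer</maker></notes></drug>")

def produced_by(xml_text):
    """Imitate: for d in document/drug, m in d//maker return <producedBy>m/text()</producedBy>."""
    root = ET.fromstring(xml_text)
    # Binding part: d ranges over top-level drug elements,
    # m over the maker elements that are proper descendants of d.
    drugs = [root] if root.tag == "drug" else []
    results = []
    for d in drugs:
        for m in d.iter("maker"):
            if m is d:
                continue  # iter() also visits d itself; // wants proper descendants
            # Tagging template: wrap the text of m in a new element.
            out = ET.Element("producedBy")
            out.text = m.text
            results.append(ET.tostring(out, encoding="unicode"))
    return results

print(produced_by(DOC))
```

The nested loops play the role of the binding part; the element construction inside them plays the role of the tagging template.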
13
Approach: XQuery Reformulation Reduced to
Relational Reformulation
14
XQuery Semantics
Variable binding stage
for $d in document/drug, $m
in $d//maker return <producedBy>$m/text()</producedBy>
XML data model is a tagged tree
<drug> <name>aspirin</name>
<price>4</price> <notes>
<side-effects>upset stomach</side-effects>
<maker>Bayer</maker> </notes> </drug>
tagging stage
15
Compiling the Binding Part of XQueries to
Relational Queries
XBind query: the binding part of an XQuery (returns a
relation whose tuples are variable bindings)
compiles to a relational conjunctive query:
P(d,m) :- Root(r), child(r,d), tag(d,drug),
          desc(d,x), child(x,m), tag(m,maker)
But not all models of this schema correspond to
the intended model: need GReX!
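To make the compilation concrete, here is a small Python sketch that encodes the aspirin document as relations root, child, desc and tag and then evaluates the conjunctive query P(d,m) over them. The integer node identifiers, the use of node 0 as the document root, and the function names `encode` and `xbind` are assumptions of this illustration, not part of the original system.

```python
import xml.etree.ElementTree as ET
from itertools import count

DOC = ("<drug><name>aspirin</name><price>4</price><notes>"
       "<side-effects>upset stomach</side-effects>"
       "<maker>Bayer</maker></notes></drug>")

def encode(xml_text):
    """Encode a document as relations over integer node ids (preorder numbering)."""
    ids = count(1)
    child, tag = set(), set()

    def walk(elem, parent):
        node = next(ids)
        child.add((parent, node))
        tag.add((node, elem.tag))
        for sub in elem:
            walk(sub, node)

    walk(ET.fromstring(xml_text), 0)  # node 0 plays the document root
    nodes = {0} | {n for n, _ in tag}
    # desc = reflexive, transitive closure of child (the intended model)
    desc = {(n, n) for n in nodes} | child
    while True:
        new = {(x, z) for (x, y) in desc for (y2, z) in child if y == y2} - desc
        if not new:
            break
        desc |= new
    return {"root": {0}, "child": child, "desc": desc, "tag": tag}

def xbind(db):
    """P(d,m) :- root(r), child(r,d), tag(d,drug), desc(d,x), child(x,m), tag(m,maker)"""
    return {(d, m)
            for r in db["root"]
            for (r2, d) in db["child"] if r2 == r and (d, "drug") in db["tag"]
            for (d2, x) in db["desc"] if d2 == d
            for (x2, m) in db["child"] if x2 == x and (m, "maker") in db["tag"]}

print(xbind(encode(DOC)))
```

Here `encode` builds the intended model by construction. A reformulation algorithm that only sees the relational schema has no such guarantee, which is the gap the GReX constraints are meant to narrow.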
16
Sample Constraints from GReX
  • Relationship between child and descendant
    navigation
  • ∀x∀y child(x,y) → desc(x,y)
    (desc contains child)
  • ∀x el(x) → desc(x,x)
    (desc is reflexive)
  • ∀x∀y∀z desc(x,y) ∧ desc(y,z) →
    desc(x,z) (desc is transitive)
  • Tagged tree structure of XML
  • ∀r∀x root(r) ∧ desc(x,r) → x = r
    (root has no ancestors)
  • ∀x∀y∀z child(x,z) ∧ child(y,z) → x = y
    (at most one parent)

These do not capture transitive closure
completely, nor is it possible to do so in
first-order logic. Still...
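The remark that these constraints do not pin down transitive closure can be checked on a tiny instance. The Python sketch below tests the five sample constraints on relations given as sets of pairs; the function name and the two example instances are invented for illustration.

```python
def satisfies_sample_grex(root, child, desc, el):
    """Check the five sample constraints on a finite instance."""
    contains_child = child <= desc
    reflexive = all((x, x) in desc for x in el)
    transitive = all((x, z) in desc
                     for (x, y) in desc for (y2, z) in desc if y == y2)
    top_root = all(x == r for r in root for (x, r2) in desc if r2 == r)
    one_parent = all(x == y
                     for (x, z) in child for (y, z2) in child if z == z2)
    return all([contains_child, reflexive, transitive, top_root, one_parent])

# Intended model: a chain 0 -> 1 -> 2, desc is the closure of child.
intended = dict(root={0}, el={0, 1, 2},
                child={(0, 1), (1, 2)},
                desc={(0, 0), (1, 1), (2, 2), (0, 1), (1, 2), (0, 2)})

# Unintended model: node 2 is a "descendant" of 0 although no
# child path leads to it, yet all five constraints still hold.
unintended = dict(root={0}, el={0, 1, 2},
                  child={(0, 1)},
                  desc={(0, 0), (1, 1), (2, 2), (0, 1), (0, 2)})

print(satisfies_sample_grex(**intended), satisfies_sample_grex(**unintended))
```

Both instances pass, even though in the second one desc is strictly larger than the reflexive, transitive closure of child.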
17
More Constraints from GReX
  • (someTag) ∀x el(x) → ∃t tag(x,t)
    (every element has a tag)
  • (oneTag) ∀x∀t1∀t2 tag(x,t1) ∧ tag(x,t2) →
    t1 = t2 (one tag per element)
  • (noLoop) ∀x∀y desc(x,y) ∧ desc(y,x) → x = y
    (no non-trivial cycles)
  • (noShare) ∀x∀y∀u∀v child(x,u) ∧ child(x,v)
    ∧ desc(u,y) ∧ desc(v,y) → u = v
    (unique path between elements)
  • (inLine) ∀x∀y∀u desc(x,u) ∧ desc(y,u) →
    x = y ∨ desc(x,y) ∨ desc(y,x)
    (ancestors of an element are collinear)

18
Which Reformulations Do We Find This Way?
client XQuery
Mappings (?) as XQueries
schema correspondence
GReX built-in constraints capture XML data model
reformulated queries (multiple solutions)
all of them?
19
Restrictions on XQuery
  • Main restriction: no aggregates (to be
    investigated)
  • Leaving out aggregates, most common queries can
    be processed.
  • Minor restrictions:
  • no user-defined functions (of course!)
  • limited use of negation (or else the problem
    becomes undecidable)
  • limited use of document order (to be
    investigated)
  • no navigation to parent or wildcard child (of
    unspecified tag) (unintuitive, but we can show
    that this needs another algorithm, unless NP = Π₂ᵖ)
20
The Reduction is Sound and Complete
  • For the restricted XQuery fragment,
  • Given
  • - XBind query B
    → compiled to a relational query
    c(B)
  • - schema correspondence C given by XQueries →
    compiled to a set of constraints c(C)

Relative Completeness Theorem: R
is a minimal reformulation of B under
C iff
c(R) is a minimal reformulation of
c(B) under c(C) and GReX.
R can be computed from c(R).
All of them are found by CB.
21
A Glimpse at the Chase: Transforming Queries
Using Constraints
A query: find data satisfying condition A
A
Q
The chase: repeatedly applying chase steps until
no new conditions can be added
In general, Q and Q1 are not equivalent, but in
all DBs satisfying the constraint, they are!
Theory of the chase: 20 years old, deep and rich,
due to Beeri, Maier, Mendelson, Sagiv, Vardi,
Yannakakis and others!
22
How Do We Use the Chase? Capturing Relational
Views With Constraints
Let the schema correspondence be the view V:
retrieve the data satisfying conditions A and
B
V
A
B
all data satisfying A and B appears in
result of V
all data appearing in V satisfies A and B
23
Chase & Backchase
First, chase
A
Q
Next inspect all subqueries (syntactic pieces)
of the chase result Q2
SQ
V
It turns out that SQ is equivalent to Q
Presence of constraint A → B allows reformulation

24
General CB Algorithm (joint work with Lucian
Popa, IBM Almaden)
  • (public) schema P , (proprietary) schema S
  • Let C be a set of constraints (e.g., on P
    and/or P ∪ S)

Assume some terminating chasing sequence
Q(P)
25
Two Sets of Experiments
  • Synthetic queries
  • reformulation time as function of query
    complexity
  • XML analog of relational star queries,
    increasing number of joins
  • can very complex queries still be
    reformulated in a practical amount of time ?
  • Realistic queries from the XML Benchmark
    Project http://monetdb.cwi.nl/xml
  • The queries: 20 queries designed to
    exercise interesting features of XQuery
  • The schema correspondence: views in both
    directions

  • compiles to about 200 constraints!

Much more than in typical relational schemas!
26
Experiments with Synthetic Queries
Number of joins (number of corners in the star)
27
Experiments with Benchmark Queries
Reformulation times must be understood in
conjunction with execution times (e.g., tens of
seconds for Q10)
28
Summary of Contributions
  • MARS, a system for XQuery reformulation,
  • - with mixed and redundant storage, under
    integrity constraints.
  • - complex schema correspondence (views in both
    directions)
  • Showed practical relevance of CB method
    (feasible and worthwhile)
  • A completeness result for a significant fragment
    of XQuery and a large class of schema
    correspondences. The method
    remains sound for the full language.
  • A reduction between minimal reformulation and
    query equivalence, and we gave matching lower
    bounds showing our chase-based decision
    procedure is asymptotically optimal for the
    fragment considered.

29
The End
30
Why XML?
  • The relational data model is still the dominant
    concept in databases.
  • All data can be coded into tables.
  • (For that matter into (Gödel) numbers too!)
  • Artificial coding makes life harder for query
    programmers.
  • Result: less productivity, more bugs.
  • XML is much more flexible. It is also
    self-describing, i.e., there is no a priori
    need for types/schemas (but this is
    sometimes a bad idea).
  • It came from the document community (tagged text)
    and was cheered by industry gurus. So we have to
    live with it.
  • (Although one can imagine better data models)

31
Making It Work
  • Chase: each chase step is similar to the evaluation
    of a recursive Datalog rule on a
    symbolic database built from
    the query
  • → we borrowed classical query
    processing techniques:
  • compiling constraints to join tree
  • joins implemented as hash-joins
  • pushing selections into joins

Backchase: the size of the search space is O(2^u), where u
is the size of the universal plan. We
found criteria for pruning this space.
  • Cost-independent: prune subqueries that
  • - do not correspond to legal XML queries
  • - contain redundant descendant navigation
    steps (e.g., x child-of y, y child-of z, x descendant-of z)

Bottom-up exploration of subqueries: first
all performing 1 navigation step, next all
performing 2 navigation steps, etc.
Perform contiguous navigation steps starting from
the root.
  • Cost-based: a pruning strategy parameterized by
    costing model
  • - finds optimal reformulation for any monotonic
    cost model
  • - cost models for XML are still under
    research
  • - heuristic cost model: cost is the
    number of table scans/XML navigation steps
    performed
  • - amenable to experimenting with
    other cost models
32
Benefit of Reformulation For Execution Time
no. of elements in document
Benefit increases with increasing complexity of
query and increasing database size
33
More Results for Benchmark Queries
Delta to finish search
Delta to best reformulation
Time to first reformulation
For redundancy: materialized the XBind query for
each query
(particular case of Access Support Relation)
Time to find first reformulation is essentially
the same as in the absence of redundancy.
Additional time is spent only on finding the optimal one.
34
Related Work: Data Integration As Particular Case
of MARS Applications
Global As View (GAV)
Q
XQ o CR
P
(global schema)
CR
S
(local schema)
with Fernandez and Suciu in SIGMOD99
reformulation by composition-with-views
TSIMMIS, SilkRoute, XPeranto
35
Future Work Directions
  • Short-Term
  • - tuning of CB implementation for further
    speedup
  • - XML-specific strategies for pruning the
    backchase stage
  • - in particular, finding a good cost model to
    perform cost-based pruning
  • Medium-Term
  • - Applying CB to Data Security
  • - Applications to Adaptive Distributed Query
    Optimization
  • Long Term
  • - a unified framework for integrating data from
    various, heterogeneous sources going
    beyond classical databases (XML/relational/LDAP,
    web forms, web services)

36
Application 3: Schema Evolution (e.g. Caching)
Goal: support existing client applications even
after changing the schema
client
old query Q (O)
old schema O
new schema N
schema correspondence
could be O extended with cached results
37
A Source of Redundancy: Relational Storage of XML
catalog
drug
drug
name
price
notes
price
notes
name
50
aspirin
cortisone
4
38
Containment Under Integrity Constraints
  • Decision procedure for containment is based on
    chasing with constraints from GReX.
  • Natural extension to XML integrity constraints.
  • Some results:
  • Containment of well-behaved XPath/XBind queries
    under bounded simple XML integrity constraints
    (SXICs) is decidable (used in relative
    completeness theorem).
  • Even modest use of unboundedness makes the
    problem undecidable.
  • Corollary: containment under bounded SXICs and
    DTDs is undecidable.
  • Containment under DTDs only is an open problem,
    but we have a PSPACE lower bound.
  • See proposal for details.

39
LDAP
40
The Very End
41
The Architecture of Our Solution
client XQuery
defined next
Mappings (?) as XQueries rel/XML encodings
schema correspondence
not shown here
reformulated queries (multiple solutions)
42
  • Problem
  • XML/MARS XQuery Reformulation
  • schema correspondence given by views in both
    directions
  • multiple solutions

43
Capturing Relational Views With Constraints
Let the schema correspondence be a view defined
as the relational conjunctive query
V(x,z) :- A(x,y), B(y,z)
Capture the definition with constraints,
(cV) ∀x ∀y ∀z A(x,y) ∧ B(y,z) → V(x,z)
(bV) ∀x ∀z V(x,z) → ∃y A(x,y) ∧ B(y,z)
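As a sanity check on the two constraints, the following Python sketch tests (cV) and (bV) on concrete relations stored as sets of pairs. The function names and sample data are made up for this illustration; together the two checks say exactly that V equals the join of A and B.

```python
def join_view(A, B):
    """V(x,z) :- A(x,y), B(y,z) evaluated on sets of pairs."""
    return {(x, z) for (x, y) in A for (y2, z) in B if y == y2}

def satisfies_cV(A, B, V):
    # (cV): every joined pair from A and B must appear in V
    return join_view(A, B) <= V

def satisfies_bV(A, B, V):
    # (bV): every tuple of V must have a witness y in A and B
    return V <= join_view(A, B)

A = {(1, 2), (3, 4)}
B = {(2, 5), (4, 6)}
V = join_view(A, B)

print(V, satisfies_cV(A, B, V), satisfies_bV(A, B, V))
```

Adding a spurious tuple to V breaks (bV) only, while dropping a tuple from V breaks (cV) only.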
44
Partially capturing the XML model
  • Partially, because some features cannot fully be
    captured with constraints
  • descendant is the transitive closure of child,
    but this is not FO-definable
  • neither is the treeness property
  • our solution
  • add a set of constraints GReX to approximate
    intended models
  • it turns out that capturing descendant
    helps in capturing treeness
  • then, we define a significant XQuery fragment
    (we call it well-behaved)
  • that cannot distinguish between
    intended and approximate models

45
Constraints in GReX (2): the tagged tree
structure of XML
  • (topRoot) ∀r∀x root(r) ∧ desc(x,r) → x = r
    (root has no ancestors)
  • (oneTag) ∀x∀t1∀t2 tag(x,t1) ∧ tag(x,t2) →
    t1 = t2 (one tag per element)
  • (noLoop) ∀x∀y desc(x,y) ∧ desc(y,x) → x = y
    (no non-trivial cycles)
  • (oneParent) ∀x∀y∀z child(x,z) ∧ child(y,z) → x = y
    (at most one parent)
  • (noShare) ∀x∀y∀u∀v child(x,u) ∧ child(x,v)
    ∧ desc(u,y) ∧ desc(v,y) → u = v
    (unique path between elements)
  • (inLine) ∀x∀y∀u desc(x,u) ∧ desc(y,u) →
    x = y ∨ desc(x,y) ∨ desc(y,x)
    (ancestors of an element are collinear)

46
XQuery Restrictions
  • What it allows:
  • composition of navigation steps,
  • navigation axes: self,
    (named) child, descendant, ancestor, idrefs
  • qualifiers: path,
    string = path, and, or, path
    equality/inequality
  • where clause:
    disjunction, path equality/inequality,
    existential quantification
  • What it rules out:
  • user-defined functions,
  • range, before predicates,
  • aggregates, arbitrary
    negation, universal quantification,
  • concatenation (,)
  • navigation to parent (..) or
    to child of unspecified name (*)

47
CB Completeness
  • Let C be a set of constraints (relates public
    schema P and proprietary schema S)
  • C-minimal query:
    removing any of its relational atoms
    produces a non-equivalent query under C
  • Q1 is a subquery of Q2:
    Q1 is isomorphic to a piece of Q2

Q(P)
Completeness Theorem: Any C-minimal reformulation
of Q is a subquery of U
48
A Completeness Result for Our Solution
  • Given
  • - well-behaved XBind query B,
    compiled to a relational query c(B)
  • - schema correspondence M given by well-behaved
    XQueries (in both directions),
    compiled to a set of relational
    constraints c(M)
  • - bounded XML integrity constraints XIC
    (a class of XML integrity constraints, see
    [KRDB01]),
    compiled to a set of relational
    constraints c(XIC)

Relative Completeness Theorem: for any R,
R is a (M ∪ XIC)-minimal
reformulation of B
iff c(R) is
a (GReX ∪ c(M) ∪ c(XIC))-minimal reformulation of
c(B).
R can be computed from c(R).
All of them are found by CB. Corollary:
completeness of the reformulation algorithm for XBind
queries.
49
Capturing XML Semantics
client XQuery
Mappings (?) as XQueries
schema correspondence
GReX built-in constraints capture XML data model
reformulated queries (multiple solutions)
50
Summary of Constraints Used in CB Phase
  • Built-in constraints in GReX
  • Relational views compile to inclusion
    constraints
  • XQuery views
  • their XBind queries compile to inclusion
    constraints as for relational views
  • their return clause compiles to several
    decorrelated queries, each captured with
    constraints
  • the XML template in the return clause compiles to
    several Skolem and copy functions, each compiled
    to constraints
  • Integrity constraints
  • XML constraints compile to relational constraints
  • relational schema constraints

51
Are the Restrictions Justified?
  • Our completeness result holds for well-behaved
    XQueries, under bounded
  • XML integrity constraints.
  • What about reformulating
  • XQueries with parent and wildcard child
    navigation?
  • Under other XML integrity constraints?
  • Even under full-fledged DTDs?
  • For such extensions, we make a deeper study of
    equivalence, which is an even simpler problem in
    reformulation.
  • The equivalence checker is invoked as black-box
    algorithm during CB.

52
XBind (includes XPath) Fragments:
Equivalence
path concatenation, attribute values; navigation
axes self, (named) child, descendant; qualifiers
path, string = path, and:
PTIME
adding join on attribute variables:
NP-complete
adding any or all (!) of the following:
disjunction, ancestor navigation,
path equality, wildcard child (*) navigation
parent, preceding(following)-sibling
53
Containment for the well-behaved fragment of
XBind/XPath
Theorem: Let B1, B2 be XBind/XPath queries from our
well-behaved fragment, and c(B1), c(B2) their
relational compilation. Then B1 is
equivalent to B2 iff c(B1) is
equivalent to c(B2) under GReX
(decidable in Π₂ᵖ using the chase).
This result about containment is used in the
relative completeness theorem
54
Extensions of the NP fragment: Π₂ᵖ fragments
  • any or all (!) of the following make equivalence
    Π₂ᵖ-complete
  • disjunction
  • unsurprising: conjunctive queries + union
    already Π₂ᵖ-complete [SY80]
  • ancestor navigation
  • translate ancestor away introducing union
    /a/b/ancestor ? /a/b ? /ab
  • path equality qualifier
  • can simulate ancestor
    //..//./p/s ? /p/ancestor/s
  • wildcard child navigation
  • union introduced by interaction of // and *:
    //a ≡ /a ∪ /*//a

Not well-behaved, but we have a different
decision procedure
55
Experimental Setup: Started From the XML Benchmark
  • Used the official XML Benchmark Project
    http://monetdb.cwi.nl/xml
  • The application domain: an online auctioning
    application.
  • The published schema: a DTD given by the XML
    Benchmark Project
  • Data is partially nicely structured.
  • The queries: 20 queries designed
    to exercise interesting features of XQuery

56
What We Added to the XML Benchmark Setup
The mixed storage schema:
relationally: person, item, open auction,
closed auction, etc.
unstructured part: annotations on items
The redundancy: materialized the XBind query for
each query
(particular case of Access Support Relation)
The mappings in both directions:
relations → XML, XML → XML
It all compiles to about 200
constraints!
Much more than in typical relational schemas! Had
to change the original implementation [SIGMOD00] to
scale.
57
Related Work
  • Publishing systems
  • Schema mapping proprietary relational →
    published XML: SilkRoute, XPeranto
  • reformulation by composition-with-views.
  • Schema mapping published XML → proprietary
    relational: STORED, Agora
  • reformulation by rewriting-with-views
  • Information Integration
  • TSIMMIS (composition-w-views), Information
    Manifold (rewriting-w-views)
  • Containment
  • Miklau and Suciu, smaller fragment of
    XPath (they too find that * is naughty)
  • [FLS], [CGLV]: conjunctive regular path
    queries
  • Amer-Yahia and Srivastava: minimization of
    tree pattern queries
  • Containment under integrity constraints
  • XML keys [BDFHT], description logics [CGL]

58
Query Reformulation in Data Publishing
public schema P (virtual data)
schema interface against which
queries are formulated
publishing query (may hide some proprietary
data)
proprietary storage schema S (materialized data)
59
Compiling the Binding Part of XQueries to
Relational Queries
But, over arbitrary DBs with this schema, the
relational translation of Root ⋈ desc ⋈
desc is not equivalent to that of
Root ⋈ desc:
we must communicate to the CB that the desc table is
transitive
60
The Challenge for Reformulation on MARS
  • To find the reformulations efficiently, we need
    to
  • reason with schema correspondence
  • efficiently construct the search space for
    reformulations
  • - must contain all reformulations (for
    completeness)
  • explore search space
  • - exhaustively (for security applications)
  • - maybe trading optimality of reformulation for
    search speed
  • (for optimization purposes)

61
Contributions
  • A novel algorithm for reformulation of relational
    queries under relational constraints
  • Chase & Backchase

Uses this semantics and exploits CB
  • A declarative semantics for most of XQuery

[VLDB99] with Popa and Tannen; [SIGMOD00] with
Popa, Sahuguet and Tannen
  • A reformulation algorithm for XQuery
  • practical (feasible and worthwhile)
  • complete for most of XQuery
  • optimal (we show lower bounds for various XQuery
    fragments KRDB01, DBPL01)
  • MARS a system for XQuery reformulation over
    Mixed And Redundant Storage
  • constructs and represents search space
    efficiently
  • cost-based exploration strategy parameterized by
    traditional costing module
  • finds first reformulation fast
  • Experimental evaluation time to first
    reformulation, simple cost

62
Compiling Client XQueries
client XQuery
Mappings (?) as XQueries
schema correspondence
GReX built-in constraints capture XML data model
reformulated queries (multiple solutions)
63
Capturing the Schema Correspondence
client XQuery
Mappings (?) as XQueries
schema correspondence
GReX built-in constraints capture XML data model
reformulated queries (multiple solutions)
64
Major Obstacles in Compiling Schema Mappings to
Constraints
  • Schema correspondence given by XQueries. As
    opposed to relational queries,
  • XQueries have nested, correlated subqueries in
    return clause
  • XQueries create new elements
  • XQueries return deep, recursive copies of
    input XML trees
  • (solution not shown)

65
Compiling Nested Subqueries: Decorrelation
  • the query
  • for $p in doc("foo.xml")//person
  • return <res>$p/phone/text()</res>
  • is short for the nested query
  • for $p in doc("foo.xml")//person
  • return <res>for $t in $p/phone/text()
  • return $t
  • </res>

compile XBind parts to two decorrelated
relational queries (shown here in Datalog
syntax):
Bouter(p) :- Root(r), desc(r,x), child(x,p), tag(p,person)
Binner(p,t) :- Bouter(p), child(p,n), tag(n,phone), text(n,t)
capture each with two inclusion
constraints, as done in the original CB method
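The split into an outer and an inner binding relation can be mimicked in a few lines of Python. This is only a sketch: persons are identified by their position in document order, the sample document is invented, and the tagging step simply concatenates the phone texts of each person.

```python
import xml.etree.ElementTree as ET

DOC = ("<people><person><phone>111</phone><phone>222</phone></person>"
       "<person/></people>")

def decorrelate(xml_text):
    """Return (Bouter, Binner) as flat relations over person ids."""
    root = ET.fromstring(xml_text)
    persons = list(root.iter("person"))       # //person, in document order
    b_outer = list(range(len(persons)))       # Bouter(p)
    b_inner = [(p, phone.text)                # Binner(p, t)
               for p, person in enumerate(persons)
               for phone in person.findall("phone")]
    return b_outer, b_inner

def tag_results(b_outer, b_inner):
    """One <res> element per outer binding, filled from the inner relation."""
    return ["<res>" + "".join(t for q, t in b_inner if q == p) + "</res>"
            for p in b_outer]

outer, inner = decorrelate(DOC)
print(tag_results(outer, inner))
```

The second person has no phone, so it contributes nothing to Binner, but it still appears in Bouter and therefore still yields an (empty) res element. That is why the outer bindings need their own relation.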
66
Capturing Creation of New Elements
  • for $p in
    doc("foo.xml")//person
  • return <res>$p/phone/text()</res>
  • For each binding of $p, a distinct <res>-element
    is constructed.

Capture F by the relation G representing its
graph, and the constraints
∀p∀r1∀r2 G(p,r1) ∧ G(p,r2) → r1 = r2   (r = F(p))
∀p1∀p2∀r G(p1,r) ∧ G(p2,r) → p1 = p2   (F is injective)
∀p∀r G(p,r) → Bouter(p)   (F's domain is included in Bouter)
∀p Bouter(p) → ∃r G(p,r)   (Bouter is included in F's domain)
F is the Skolem function that validates this
constraint
67
Stratified-Witness Constraints (with L.P.)
Full dependencies: no existential quantifier. The
chase always terminates. Beyond this? Given a set
C of dependencies, define the chase flow
graph. Nodes correspond to relation components:
an R of arity 3 produces 3 nodes. Edges are drawn
between the ith component of R and the jth of S iff R appears on
the left side and S appears on the right side of
the implication of some dependency. The edge is
labeled ∃ if the corresponding variable in S is
existentially quantified. C is
stratified-witness if there is no cycle with an
∃-labeled edge. Proposition: The chase with
stratified-witness constraints always terminates.
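The test described on this slide is a plain graph check, sketched below in Python. The sketch follows the slide's wording literally (an edge from every component of a left-hand relation to every component of a right-hand relation); a finer definition that tracks which variables are actually shared may accept more constraint sets. The representation of a dependency as a triple (left atoms, right atoms, existential variables) is an assumption of this example.

```python
def chase_flow_edges(dependencies):
    """Edges ((R,i), (S,j), is_existential) of the chase flow graph."""
    edges = set()
    for lhs, rhs, exist_vars in dependencies:
        for r, r_args in lhs:
            for i in range(len(r_args)):
                for s, s_args in rhs:
                    for j, var in enumerate(s_args):
                        edges.add(((r, i), (s, j), var in exist_vars))
    return edges

def is_stratified_witness(dependencies):
    """True iff no cycle of the chase flow graph uses an ∃-labeled edge."""
    edges = chase_flow_edges(dependencies)
    succ = {}
    for u, v, _ in edges:
        succ.setdefault(u, set()).add(v)

    def reachable(src, dst):
        seen, stack = set(), [src]
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ.get(n, ()))
        return False

    # An ∃-edge u -> v lies on a cycle iff u is reachable from v.
    return not any(label and reachable(v, u) for u, v, label in edges)

# desc(x,y) ∧ desc(y,z) → desc(x,z): a full dependency
transitivity = ([("desc", ("x", "y")), ("desc", ("y", "z"))],
                [("desc", ("x", "z"))], set())
# el(x) → ∃t tag(x,t)
some_tag = ([("el", ("x",))], [("tag", ("x", "t"))], {"t"})
# R(x,y) → ∃z R(y,z): keeps inventing new witnesses
successor = ([("R", ("x", "y"))], [("R", ("y", "z"))], {"z"})

print(is_stratified_witness([transitivity, some_tag]),
      is_stratified_witness([successor]))
```

The first set passes because no edge leaves the tag components, so the ∃-labeled edge cannot be part of a cycle. The second fails because the ∃-labeled edge into the second component of R loops straight back to itself.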
68
(Relational) Conjunctive Queries
Q(x,z) :- R(x,y,z), R(y,x,u), S(z,u)
select r1.A, s.A from R r1, R r2, S s
where r1.A = r2.B and r1.B = r2.A and
r1.C = s.A and r2.C = s.B
notation: r stands for r1, ..., rn
queries: select O(r) from R r where C(r)
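The correspondence between the Datalog-style rule and the select-from-where form can be tried out with Python's built-in sqlite3 module. The column names A, B, C follow the slide; the sample rows and function names are invented, and `distinct` is added so the SQL result can be compared as a set.

```python
import sqlite3

def run_sql(R, S):
    """Evaluate the slide's select-from-where query on in-memory tables."""
    con = sqlite3.connect(":memory:")
    con.execute("create table R (A, B, C)")
    con.execute("create table S (A, B)")
    con.executemany("insert into R values (?, ?, ?)", R)
    con.executemany("insert into S values (?, ?)", S)
    rows = con.execute(
        "select distinct r1.A, s.A from R r1, R r2, S s "
        "where r1.A = r2.B and r1.B = r2.A "
        "and r1.C = s.A and r2.C = s.B").fetchall()
    con.close()
    return set(rows)

def run_rule(R, S):
    """Evaluate Q(x,z) :- R(x,y,z), R(y,x,u), S(z,u) directly."""
    return {(x, z)
            for (x, y, z) in R
            for (y2, x2, u) in R if (y2, x2) == (y, x) and (z, u) in S}

R = [(1, 2, 3), (2, 1, 4), (5, 5, 5)]
S = [(3, 4), (5, 5)]
print(run_sql(R, S), run_rule(R, S))
```

Both evaluations agree on this sample data, which is all the sketch is meant to show.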
69
(Relational) Dependencies, a.k.a. Integrity
Constraints
∀(r∈R) B(r) → ∃(s∈S) C(r,s)
B and C are conjunctions of equalities,
as in a where clause
example: ∀(r1∈R)∀(r2∈R) r1.E = r2.E →
∃(s∈R) s.D = r1.D ∧ s.E = r1.E ∧ s.F = r2.F
70
Query Containment and Dependencies
Q1: select O1(r1) from R1 r1 where C1(r1)
Q2: select O2(r2) from R2 r2 where C2(r2)
define cont(Q1,Q2) as
∀(r1∈R1) C1(r1) →
∃(r2∈R2) C2(r2) ∧ O1(r1) = O2(r2)
we have, in each instance: Q1 ⊆ Q2 iff
cont(Q1,Q2)
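A small Python sketch may help relate cont(Q1,Q2) to ordinary query evaluation. `cont_holds` checks the dependency on one given instance; `contained` decides containment over all instances for conjunctive queries without dependencies, using the classical canonical-instance test (treat the variables of Q1 as constants and evaluate Q2 on the result). The query representation (head variables, list of atoms) and all names are assumptions of this example.

```python
def evaluate(head, body, db):
    """Head tuples for all matches of the body atoms in db (rel -> set of tuples)."""
    results = set()

    def extend(i, h):
        if i == len(body):
            results.add(tuple(h[v] for v in head))
            return
        rel, args = body[i]
        for row in db.get(rel, ()):
            g = dict(h)
            if all(g.setdefault(a, b) == b for a, b in zip(args, row)):
                extend(i + 1, g)

    extend(0, {})
    return results

def cont_holds(q1, q2, db):
    """cont(Q1,Q2) on db: each Q1 binding has a Q2 binding with equal output."""
    return evaluate(*q1, db) <= evaluate(*q2, db)

def canonical_instance(body):
    db = {}
    for rel, args in body:
        db.setdefault(rel, set()).add(tuple(args))
    return db

def contained(q1, q2):
    """Q1 ⊆ Q2 on every instance (no dependencies)."""
    head1, body1 = q1
    return tuple(head1) in evaluate(q2[0], q2[1], canonical_instance(body1))

Q1 = (("x",), [("E", ("x", "y")), ("E", ("y", "x"))])  # x lies on a 2-cycle
Q2 = (("x",), [("E", ("x", "y"))])                     # x has an outgoing edge
db = {"E": {(1, 2), (2, 1), (3, 4)}}
print(contained(Q1, Q2), contained(Q2, Q1),
      cont_holds(Q1, Q2, db), cont_holds(Q2, Q1, db))
```

With dependencies present, the canonical instance would first have to be chased, which is what the chase slides that follow address.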
71
And Vice Versa
d: ∀(r∈R) B(r) → ∃(s∈S) C(r,s)
front(d): select r from R r where B(r)
back(d): select r from R r, S s
where B(r) ∧ C(r,s)
we have, in each instance: d
iff front(d) ⊆ back(d)
72
Chase Step
d: ∀(r∈R) B(r) → ∃(s∈S) C(r,s)
Q:  select O(r) from R r where B(r)
Q': select O(r) from R r, S s where B(r) ∧ C(r,s)
basic fact: if Q chases to Q' with d, then Q and Q'
are equivalent under d
the chase step is applicable if Q' is not trivially
equivalent to Q (for example, we cannot chase
Q' with d!)
73
Using the Chase
basic fact: if the chase step of Q with d is
not applicable then Inst(Q) ⊨ d
(canonical instance Inst(Q) built from query
Q)
Basic Theorem: D a set of dependencies,
Q1 → ... → chaseD(Q1) a terminating chase
sequence
(no more applicable steps). Then Q1 ⊆ Q2
under D iff chaseD(Q1) ⊆ Q2
74
Reformulation with Views
a view is just a query V: select
O(r) from R r where C(r)
Reformulation of query Q(R) with view V: finding
X(R,V) such that Q(R) ≡ X(R,V) under the definition of V
75
One View: Two Dependencies
V: select O(r) from R r where C(r)
the chase-in dependency cV: ∀(r∈R) C(r) →
∃(x∈V) x = O(r)
the backchase dependency bV:
∀(x∈V) ∃(r∈R) C(r) ∧ x = O(r)
It turns out that if rewritings of Q with V exist
then such a rewriting can be obtained by chasing
Q with cV
76
The Chase and Backchase (CB) Algorithm (joint
work with Lucian Popa, IBM Almaden)
The chase with cV always terminates. The search
space for rewritings of Q with V consists of the
subqueries of chasecV(Q). (S is a
subquery: there is an injective homomorphism from S to
chasecV(Q).) Keep only subqueries S such that
S is equivalent to chasecV(Q) under V. This can be checked by
(back!)chasing with cV, bV (also terminating)
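The steps on this slide can be traced on the running view V(x,z) :- A(x,y), B(y,z). The Python sketch below is a toy that hard-codes this single view and is not the MARS implementation: it chases with cV to get the universal plan, enumerates subqueries bottom-up, and keeps those that are equivalent to the query under cV and bV. The query representation and all function names are assumptions of this example.

```python
from itertools import combinations, count

VIEW_BODY = [("A", ("x", "y")), ("B", ("y", "z"))]  # V(x,z) :- A(x,y), B(y,z)
_fresh = count()

def homomorphisms(atoms, target, h):
    """Yield extensions of h that map every atom in `atoms` into `target`."""
    if not atoms:
        yield dict(h)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for rel2, args2 in target:
        if rel2 != rel:
            continue
        g = dict(h)
        if all(g.setdefault(a, b) == b for a, b in zip(args, args2)):
            yield from homomorphisms(rest, target, g)

def chase(body, use_bV=True):
    """Chase a query body with cV (and optionally bV) until nothing changes."""
    body, changed = set(body), True
    while changed:
        changed = False
        for h in list(homomorphisms(VIEW_BODY, body, {})):   # cV adds V atoms
            atom = ("V", (h["x"], h["z"]))
            if atom not in body:
                body.add(atom)
                changed = True
        if use_bV:
            for rel, args in list(body):                     # bV adds witnesses
                if rel != "V":
                    continue
                x, z = args
                if next(homomorphisms(VIEW_BODY, body, {"x": x, "z": z}), None) is None:
                    y = f"_n{next(_fresh)}"
                    body |= {("A", (x, y)), ("B", (y, z))}
                    changed = True
    return frozenset(body)

def contained_in(q1, q2):
    """q1 ⊆ q2 under {cV, bV}: map q2 into the chase of q1, fixing the head."""
    (head1, body1), (head2, body2) = q1, q2
    start = dict(zip(head2, head1))
    return next(homomorphisms(sorted(body2), chase(body1), start), None) is not None

def reformulations(query):
    """Backchase: minimal subqueries of the universal plan equivalent to query."""
    head, body = query
    universal = sorted(chase(body, use_bV=False))
    found = []
    for k in range(1, len(universal) + 1):
        for atoms in combinations(universal, k):
            cand = frozenset(atoms)
            if not set(head) <= {a for _, args in atoms for a in args}:
                continue          # candidate must mention the head variables
            if any(f <= cand for f in found):
                continue          # contains a smaller reformulation already
            if contained_in((head, cand), query) and contained_in(query, (head, cand)):
                found.append(cand)
    return found

Q = (("x", "z"), frozenset({("A", ("x", "y")), ("B", ("y", "z"))}))
for r in reformulations(Q):
    print(sorted(r))
```

For this query the search returns two reformulations: the single view atom V(x,z) and the original join of A and B. A cost model would then pick between them.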
77
Preliminary Completeness Result for CB (with
L.P.)
Theorem: Any scan-minimal reformulation of Q
with V is a subquery of
chasecV(Q). scan-minimal: no scan (from
item) can be removed without compromising
equivalence with Q. Fewer scans means faster
execution under most cost models.
78
Additional Integrity Constraints
In general the storage schema contains integrity
constraints that restrict its class of instances
(models). This may extend the set of
reformulation solutions! Let C be a set of
dependencies. Reformulating query Q(R) with
view V under C: finding X(R,V) such that
Q(R) is equivalent to X(R,V) under V and C. That's the same as
reformulating Q under C ∪ {cV, bV}. Can we still
use the chase?