Title: NTUA
1 - XML Query Reformulation
- Val Tannen
- University of Pennsylvania
- Joint work with Alin Deutsch, UC San Diego
- and in part with Lucian Popa, IBM Almaden
2 - Data Exchange Between Businesses Using XML
(Figure: a pharmaceutical company, an insurance company, and a hospital exchanging published data.)
3 - XML?
<drug>
  <name>aspirin</name>
  <price>4</price>
  <notes>
    <side-effects>upset stomach</side-effects>
    <maker>Bayer</maker>
  </notes>
</drug>
4 - A Simple Publishing Scenario
Published (virtual) data; the patient name is hidden:
<study>
  <case>
    <diag>migraine</diag>
    <drug>aspirin</drug>
    <usage>2/day</usage>
  </case>
  <case>
    <diag>allergy</diag>
    <drug>cortisone</drug>
    <usage>3/day</usage>
  </case>
</study>
Proprietary data:
prescription
  usage | drug      | name
  2/day | aspirin   | John
  3/day | cortisone | Jane
patient
  name | diagnosis
  John | migraine
  Jane | allergy
How to express the view? As a view query, written in the XML query language standard (draft), which, if executed, would produce the virtual data.
How to compose the client query with the view, obtaining the reformulation?
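To make the view concrete, here is a minimal Python sketch of what the publishing view computes, assuming the two tables above and treating name as a key of patient: it joins prescription with patient on name and emits one case per match, omitting the name. The function name publish_study and the use of xml.etree.ElementTree are illustrative choices; the actual view would be written in XQuery.

```python
import xml.etree.ElementTree as ET

# Proprietary relational data from the slide.
prescription = [("2/day", "aspirin", "John"), ("3/day", "cortisone", "Jane")]  # (usage, drug, name)
patient = [("John", "migraine"), ("Jane", "allergy")]                          # (name, diagnosis)

def publish_study(prescription, patient):
    """Hypothetical view: join on name, emit one <case> per match, hide the name."""
    diagnosis_of = {name: diag for name, diag in patient}  # assumes name is a key of patient
    study = ET.Element("study")
    for usage, drug, name in prescription:
        if name not in diagnosis_of:
            continue  # inner join: skip prescriptions without a patient row
        case = ET.SubElement(study, "case")
        ET.SubElement(case, "diag").text = diagnosis_of[name]
        ET.SubElement(case, "drug").text = drug
        ET.SubElement(case, "usage").text = usage
    return study

print(ET.tostring(publish_study(prescription, patient), encoding="unicode"))
```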
5 - The General Problem of Query Reformulation
(Figure: a client query Q(P) over schema P is reformulated into a query X(S) over schema S, using the schema correspondence between P and S.)
Given a query Q(P), find the query (or queries) X(S) returning the same answer (soundness), whenever such an X(S) exists (completeness).
6 - Applications of Query Reformulation
- data publishing (we just saw it): P = public schema, S = storage schema
- data integration: P = global schema, S = local schema
- schema evolution: P = old schema, S = new schema
- data security (illustrated next)
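7 - An Application: Data Security
(Figure: a client query E(S) over the proprietary schema S exposes a secret data correlation; the question is whether it can be reformulated over the public schema P through the schema correspondence.)
Answering this is only possible if the Completeness Property holds!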
8 - More Complicated Data Publishing: Mixed And Redundant Storage (MARS)
(Figure: initial configuration.)
9 - An Example With Tuning
(Figure labels: XML documents with (drug, usage, diagnosis) and (drug, price, notes); a simple publishing view; an identity view; relations (drug, usage, name) and (name, diagnosis).)
10 - Redundancy Enables Multiple Reformulations
Client query: find how much each treatment costs.
(Figure labels: XML documents with (drug, usage, diagnosis) and (drug, price, notes); a simple publishing view; an identity view; a cached query; a relational view; relations (drug, price), (drug, usage, name), (name, diagnosis), and (diagnosis, drug).)
Some reformulations are potentially cheaper to execute than others. We want to find an optimal one!
11 - Schema Correspondence Expressible in XQuery
The DB administrator must be able to specify the correspondence.
(Figure: XML documents related by XQuery mappings; relational tables enter through an XML encoding.)
We can use XQuery, fixing any of the common encodings of relational tables in XML.
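A sketch of one such encoding, assuming the common row-per-element layout: the table name becomes the document element, each tuple a row element, and each column a child element. The helper encode_table is hypothetical, and other encodings are equally possible.

```python
import xml.etree.ElementTree as ET

def encode_table(table_name, columns, rows):
    """One possible encoding: <table><row><col>value</col>...</row>...</table>."""
    root = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return root

doc = encode_table("patient", ["name", "diagnosis"], [("John", "migraine"), ("Jane", "allergy")])
print(ET.tostring(doc, encoding="unicode"))
```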
12 - XQuery?
for $d in document/drug,
    $m in $d//maker
return <producedBy>{$m/text()}</producedBy>
The for clause is the binding part; the return clause is the tagging template.
(Figure: the drug document as a tree: drug has children name (aspirin), price (4), and notes; notes has children side-effects (upset stomach) and maker (Bayer).)
// (descendant) is the transitive closure of / (child).
The result should contain <producedBy>Bayer</producedBy>.
13 - Approach: XQuery Reformulation Reduced to Relational Reformulation
14 - XQuery Semantics
Variable binding stage:
for $d in document/drug, $m in $d//maker
Tagging stage:
return <producedBy>{$m/text()}</producedBy>
The XML data model is a tagged tree:
<drug>
  <name>aspirin</name>
  <price>4</price>
  <notes>
    <side-effects>upset stomach</side-effects>
    <maker>Bayer</maker>
  </notes>
</drug>
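The two stages can be mimicked in Python with xml.etree.ElementTree, as a rough sketch: the binding stage produces the relation of (d, m) binding tuples, and the tagging stage instantiates the template once per tuple. We assume the parsed top element stands for document/drug; the function names are our own.

```python
import xml.etree.ElementTree as ET

DOC = ("<drug><name>aspirin</name><price>4</price><notes>"
       "<side-effects>upset stomach</side-effects><maker>Bayer</maker></notes></drug>")

def binding_stage(root):
    """for $d in document/drug, $m in $d//maker: a relation of (d, m) tuples."""
    drugs = [root] if root.tag == "drug" else root.findall("drug")
    return [(d, m) for d in drugs for m in d.findall(".//maker")]  # .//maker: any descendant maker

def tagging_stage(bindings):
    """return <producedBy>{$m/text()}</producedBy>, once per binding tuple."""
    out = []
    for _d, m in bindings:
        el = ET.Element("producedBy")
        el.text = m.text
        out.append(ET.tostring(el, encoding="unicode"))
    return out

print(tagging_stage(binding_stage(ET.fromstring(DOC))))  # ['<producedBy>Bayer</producedBy>']
```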
15 - Compiling the Binding Part of XQueries to Relational Queries
XBind query: the binding part of an XQuery (it returns a relation of tuples of variable bindings).
It compiles to a relational conjunctive query:
P(d,m) :- Root(r), child(r,d), tag(d,drug), desc(d,x), child(x,m), tag(m,maker)
But not all models of this relational schema correspond to the intended (tree) model: we need GReX!
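A minimal sketch of this compilation, assuming a simple encoding of the tree into relations over integer node ids (our choice, with a document node 0 playing Root) and desc computed as the reflexive-transitive closure of child, i.e. the intended model. Under that assumption the conjunctive query returns the single binding of d to the drug node and m to the maker node.

```python
import xml.etree.ElementTree as ET
from itertools import count

DOC = ("<drug><name>aspirin</name><price>4</price><notes>"
       "<side-effects>upset stomach</side-effects><maker>Bayer</maker></notes></drug>")

def encode(xml_text):
    """Encode an XML tree as relations Root, child, tag, desc over integer node ids."""
    ids = count()
    root_id = next(ids)                 # node 0: the document node above <drug>
    child, tag = set(), set()
    def walk(el, parent):
        n = next(ids)
        child.add((parent, n))
        tag.add((n, el.tag))
        for c in el:
            walk(c, n)
    walk(ET.fromstring(xml_text), root_id)
    nodes = {root_id} | {n for _, n in child}
    # desc: the reflexive-transitive closure of child, computed to a fixpoint.
    desc = {(n, n) for n in nodes} | child
    while True:
        new = {(x, z) for (x, y) in desc for (y2, z) in desc if y == y2} - desc
        if not new:
            break
        desc |= new
    return {root_id}, child, tag, desc

def P(Root, child, tag, desc):
    """P(d,m) :- Root(r), child(r,d), tag(d,drug), desc(d,x), child(x,m), tag(m,maker)."""
    return {(d, m)
            for r in Root
            for (r2, d) in child if r2 == r and (d, "drug") in tag
            for (d2, x) in desc if d2 == d
            for (x2, m) in child if x2 == x and (m, "maker") in tag}

print(P(*encode(DOC)))  # {(1, 6)}: drug is node 1, maker is node 6
```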
16 - Sample Constraints from GReX
- Relationship between child and descendant navigation:
  - $\forall x \forall y\; child(x,y) \rightarrow desc(x,y)$ (desc contains child)
  - $\forall x\; el(x) \rightarrow desc(x,x)$ (desc is reflexive)
  - $\forall x \forall y \forall z\; desc(x,y) \land desc(y,z) \rightarrow desc(x,z)$ (desc is transitive)
- Tagged tree structure of XML:
  - $\forall r \forall x\; root(r) \land desc(x,r) \rightarrow x = r$ (root has no ancestors)
  - $\forall x \forall y \forall z\; child(x,z) \land child(y,z) \rightarrow x = y$ (at most one parent)
These do not capture transitive closure completely, nor is it possible to do so in first-order logic. STILL...
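To see why these constraints fall short, the following sketch (our own illustration) checks them on a tiny relational instance in which desc relates a to b although there is no child path: every constraint holds, yet desc is not the transitive closure of child.

```python
def check_slide16(el, root, child, desc):
    """Return the names of the slide-16 GReX constraints violated by an instance."""
    violated = []
    if not child <= desc:
        violated.append("desc contains child")
    if any((x, x) not in desc for x in el):
        violated.append("desc is reflexive")
    if any((x, z) not in desc for (x, y) in desc for (y2, z) in desc if y == y2):
        violated.append("desc is transitive")
    if any(x != r for r in root for (x, r2) in desc if r2 == r):
        violated.append("root has no ancestors")
    if any(x != y for (x, z) in child for (y, z2) in child if z == z2):
        violated.append("at most one parent")
    return violated

def closure(el, child):
    """Reflexive-transitive closure of child: the desc of the intended model."""
    desc = {(x, x) for x in el} | child
    while True:
        new = {(x, z) for (x, y) in desc for (y2, z) in desc if y == y2} - desc
        if not new:
            return desc
        desc |= new

# A non-intended model: b is a "descendant" of a, yet no child edge exists at all.
el, root, child = {"a", "b"}, {"a"}, set()
desc = {("a", "a"), ("b", "b"), ("a", "b")}
print(check_slide16(el, root, child, desc))  # []: every constraint holds
print(desc == closure(el, child))            # False: desc is not the closure of child
```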
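17 - More Constraints from GReX
- (someTag) $\forall x\; el(x) \rightarrow \exists t\; tag(x,t)$: every element has a tag
- (oneTag) $\forall x \forall t_1 \forall t_2\; tag(x,t_1) \land tag(x,t_2) \rightarrow t_1 = t_2$: one tag per element
- (noLoop) $\forall x \forall y\; desc(x,y) \land desc(y,x) \rightarrow x = y$: no non-trivial cycles
- (noShare) $\forall x \forall y \forall u \forall v\; child(x,u) \land child(x,v) \land desc(u,y) \land desc(v,y) \rightarrow u = v$: unique path between elements
- (inLine) $\forall x \forall y \forall u\; desc(x,u) \land desc(y,u) \rightarrow x = y \lor desc(x,y) \lor desc(y,x)$: the ancestors of an element are collinear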
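These constraints can likewise be checked mechanically. This sketch, with instance layouts of our own choosing, finds no violation on a genuine tree and flags noShare and inLine on a DAG whose node c has two parents.

```python
def check_slide17(el, child, desc, tag):
    """Return the names of the slide-17 GReX constraints violated by an instance."""
    v = []
    if any(not any(x2 == x for (x2, _) in tag) for x in el):
        v.append("someTag")
    if any(t1 != t2 for (x, t1) in tag for (x2, t2) in tag if x == x2):
        v.append("oneTag")
    if any(x != y and (y, x) in desc for (x, y) in desc):
        v.append("noLoop")
    if any(u != w and (u, y) in desc and (w, y) in desc
           for (x, u) in child for (x2, w) in child if x == x2
           for y in el):
        v.append("noShare")
    if any(x != y and (x, y) not in desc and (y, x) not in desc
           for (x, u) in desc for (y, u2) in desc if u == u2):
        v.append("inLine")
    return v

# A genuine tree: r has children a and b.
tree_el = {"r", "a", "b"}
tree_child = {("r", "a"), ("r", "b")}
tree_desc = {("r", "r"), ("a", "a"), ("b", "b"), ("r", "a"), ("r", "b")}
tree_tag = {("r", "root"), ("a", "x"), ("b", "y")}
print(check_slide17(tree_el, tree_child, tree_desc, tree_tag))  # []

# A DAG, not a tree: c is shared by a and b (desc given as its closure).
dag_el = {"r", "a", "b", "c"}
dag_child = {("r", "a"), ("r", "b"), ("a", "c"), ("b", "c")}
dag_desc = {(n, n) for n in dag_el} | dag_child | {("r", "c")}
dag_tag = {(n, n) for n in dag_el}
print(check_slide17(dag_el, dag_child, dag_desc, dag_tag))  # ['noShare', 'inLine']
```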
18 - Which Reformulations Do We Find This Way?
(Figure labels: client XQuery; mappings (in both directions) as XQueries, forming the schema correspondence; GReX built-in constraints capture the XML data model; reformulated queries (multiple solutions).)
All of them?
19 - Restrictions on XQuery
- Main restriction: no aggregates (to be investigated). Leaving out aggregates, most common queries can be processed.
- Minor restrictions:
  - no user-defined functions (of course!)
  - limited use of negation (or else the problem becomes undecidable)
  - limited use of document order (to be investigated)
  - no navigation to parent or wildcard child (of unspecified tag) (unintuitive, but we can show that this needs another algorithm, unless $NP = \Pi_2^p$)
20 - The Reduction is Sound and Complete
- For the restricted XQuery fragment, given
  - an XBind query B, compiled to a relational query c(B)
  - a schema correspondence C given by XQueries, compiled to a set of constraints c(C)
Relative Completeness Theorem: R is a minimal reformulation of B under C iff c(R) is a minimal reformulation of c(B) under c(C) and GReX.
R can be computed from c(R).
All of them are found by CB.
21 - A Glimpse at the Chase: Transforming Queries Using Constraints
A query Q: "find data satisfying condition A".
(Figure: a chase step uses a constraint to add a new condition to Q, giving Q1.)
The chase: repeatedly apply chase steps until no new conditions can be added.
In general, Q and Q1 are not equivalent, but in all DBs satisfying the constraint, they are!
The theory of the chase is 20 years old, deep and rich, due to Beeri, Maier, Mendelzon, Sagiv, Vardi, Yannakakis and others!
22 - How Do We Use the Chase? Capturing Relational Views With Constraints
Let the schema correspondence be the view V: "retrieve the data satisfying conditions A and B". It is captured by two constraints:
- $A \land B \rightarrow V$: all data satisfying A and B appears in the result of V
- $V \rightarrow A \land B$: all data appearing in V satisfies A and B
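23 - Chase Backchase
First, chase Q ("find data satisfying condition A") with the constraints, obtaining the chase result Q2.
Next, inspect all subqueries (syntactic pieces) of Q2, such as the subquery SQ that only uses V.
It turns out that SQ is equivalent to Q: the presence of the constraint $A \rightarrow B$ allows the reformulation.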
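The chase/backchase idea can be illustrated with a deliberately propositional toy, which is our simplification and not the actual CB over conjunctive queries: a query is a set of conditions, and a constraint says that one set of conditions implies another. Chasing Q = {A} with A → B and the two view constraints gives {A, B, V}; backchasing keeps the minimal pieces still equivalent to Q, namely {A} itself and the reformulation {V}.

```python
from itertools import combinations

# Toy model: a constraint (lhs, rhs) means "data satisfying all of lhs also satisfies all of rhs".
CONSTRAINTS = [({"A"}, {"B"}),            # the integrity constraint A -> B
               ({"A", "B"}, {"V"}),       # all data satisfying A and B appears in V
               ({"V"}, {"A", "B"})]       # all data in V satisfies A and B

def chase(query, constraints):
    """Add implied conditions until no constraint adds anything new."""
    result = set(query)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in constraints:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def equivalent(q1, q2, constraints):
    """In this toy, equivalence under the constraints means equal chase results."""
    return chase(q1, constraints) == chase(q2, constraints)

def chase_backchase(query, constraints):
    """Chase to the universal plan, then keep the minimal equivalent subqueries."""
    universal = chase(query, constraints)
    found = []
    for size in range(1, len(universal) + 1):       # smaller subqueries first
        for sub in map(frozenset, combinations(sorted(universal), size)):
            if any(f <= sub for f in found):
                continue                             # contains a smaller one: not minimal
            if equivalent(sub, query, constraints):
                found.append(sub)
    return [sorted(s) for s in found]

print(chase_backchase({"A"}, CONSTRAINTS))  # [['A'], ['V']]
```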
24 - General CB Algorithm (joint work with Lucian Popa, IBM Almaden)
- (public) schema P, (proprietary) schema S
- Let C be a set of constraints (e.g., on P and/or relating P and S).
Assume some terminating chase sequence of Q(P) with C.
25 - Two Sets of Experiments
- Synthetic queries
  - reformulation time as a function of query complexity
  - XML analog of relational star queries, with an increasing number of joins
  - can very complex queries still be reformulated in a practical amount of time?
- Realistic queries from the XML Benchmark Project, http://monetdb.cwi.nl/xml
  - the queries: 20 queries designed to exercise interesting features of XQuery
  - the schema correspondence: views in both directions, which compile to about 200 constraints! Much more than in typical relational schemas!
26 - Experiments with Synthetic Queries
(Figure: reformulation time plotted against the number of joins (number of corners in the star).)
27 - Experiments with Benchmark Queries
Reformulation times must be understood in conjunction with execution times (e.g., tens of seconds for Q10).
28 - Summary of Contributions
- MARS, a system for XQuery reformulation
  - with mixed and redundant storage, under integrity constraints
  - complex schema correspondence (views in both directions)
- Showed practical relevance of the CB method (feasible and worthwhile)
- A completeness result for a significant fragment of XQuery and a large class of schema correspondences. The method remains sound for the full language.
- A reduction between minimal reformulation and query equivalence, and matching lower bounds showing that our chase-based decision procedure is asymptotically optimal for the fragment considered.
29 - The End
30 - Why XML?
- The relational data model is still the dominant concept in databases.
- All data can be coded into tables (for that matter, into (Gödel) numbers too!).
- Artificial coding makes life harder for query programmers. Result: less productivity, more bugs.
- XML is much more flexible. It is also self-describing, i.e., there is no a priori need for types/schemas (but this is sometimes a bad idea).
- It came from the document community (tagged text) and was cheered by industry gurus. So we have to live with it (although one can imagine better data models).
31 - Making It Work
- Chase: each chase step is similar to the evaluation of a recursive Datalog rule on a symbolic database built from the query, so we borrowed classical query processing techniques:
  - compiling constraints to join trees
  - joins implemented as hash-joins
  - pushing selections into joins
- Backchase: the size of the search space is $O(2^u)$, where u is the size of the universal plan. We found criteria for pruning this space:
  - cost-independent: prune subqueries that
    - do not correspond to legal XML queries (these perform contiguous navigation steps starting from the root)
    - contain redundant descendant navigation steps (x child-of y, y child-of z, x descendant-of z)
  - bottom-up exploration of subqueries: first all those performing 1 navigation step, next all those performing 2 navigation steps, etc.
  - a cost-based pruning strategy parameterized by the costing model
    - finds the optimal reformulation for any monotonic cost model
    - cost models for XML are still under research
    - heuristic cost model: cost is the number of table scans/XML navigation steps performed
    - amenable to experimenting with other cost models
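A sketch of the bottom-up exploration with cost-based pruning, assuming the caller supplies an equivalence test and a monotonic cost function (both hypothetical here, as are the atom names and per-atom costs in the usage lines). Because cost never decreases when atoms are added, any candidate whose cost already reaches the best found so far, and every superset of it, can be skipped without losing the optimum.

```python
from itertools import combinations

def backchase(universal_plan, is_reformulation, cost):
    """Bottom-up backchase with cost-based pruning.

    is_reformulation(subset) -> bool stands in for the equivalence check (e.g., via the chase);
    cost(subset) -> number is assumed monotonic (a superset never costs less).
    Returns the cheapest minimal reformulation found and its cost."""
    best, best_cost, found = None, float("inf"), []
    for size in range(1, len(universal_plan) + 1):     # 1 atom, then 2, ...
        for sub in map(frozenset, combinations(universal_plan, size)):
            if any(f <= sub for f in found):
                continue            # contains a smaller reformulation: not minimal
            if cost(sub) >= best_cost:
                continue            # monotonic cost: cannot beat the best so far
            if is_reformulation(sub):
                found.append(sub)
                best, best_cost = sub, cost(sub)
    return best, best_cost

# Toy usage: hypothetical atoms with per-atom scan costs.
costs = {"A": 5, "B": 3, "V": 1}

def is_ref(sub):
    return {"A"} <= sub or {"V"} <= sub   # stand-in for the real equivalence test

best, best_cost = backchase(["A", "B", "V"], is_ref, lambda s: sum(costs[a] for a in s))
print(sorted(best), best_cost)  # ['V'] 1
```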
32 - Benefit of Reformulation For Execution Time
(Figure: execution time plotted against the number of elements in the document.)
The benefit increases with increasing complexity of the query and increasing database size.
33 - More Results for Benchmark Queries
(Figure legend: time to first reformulation; delta to best reformulation; delta to finish search.)
For redundancy, we materialized the XBind query of each query (a particular case of Access Support Relation).
The time to find the first reformulation is essentially the same as in the absence of redundancy. Additional time is spent only on finding the optimal one.
34 - Related Work: Data Integration As a Particular Case of MARS Applications
Global As View (GAV):
(Figure: the query XQ over the global schema P is composed with the correspondence CR, which maps the local schema S to P, giving the reformulation XQ o CR.)
(with Fernandez and Suciu in SIGMOD99)
Reformulation by composition-with-views: TSIMMIS, SilkRoute, XPeranto.
35 - Future Work Directions
- Short-term
  - tuning of the CB implementation for further speedup
  - XML-specific strategies for pruning the backchase stage
  - in particular, finding a good cost model to perform cost-based pruning
- Medium-term
  - applying CB to data security
  - applications to adaptive distributed query optimization
- Long-term
  - a unified framework for integrating data from various, heterogeneous sources, going beyond classical databases (XML/relational/LDAP, web forms, web services)
36 - Application 3: Schema Evolution (e.g., Caching)
Goal: support existing client applications even after changing the schema.
(Figure: the client's old query Q(O) over the old schema O is reformulated over the new schema N via the schema correspondence; N could be O extended with cached results.)
37 - A Source of Redundancy: Relational Storage of XML
(Figure: a catalog document containing two drug elements, each with name, price, and notes children; the values shown include aspirin, cortisone, 4, and 50.)
38 - Containment Under Integrity Constraints
- The decision procedure for containment is based on chasing with constraints from GReX.
- Natural extension to XML integrity constraints.
- Some results:
  - Containment of well-behaved XPath/XBind queries under bounded simple XML integrity constraints (SXICs) is decidable (used in the relative completeness theorem).
  - Even modest use of unboundedness makes the problem undecidable.
  - Corollary: containment under bounded SXICs and DTDs is undecidable.
  - Containment under DTDs only is an open problem, but we have a PSPACE lower bound.
- See proposal for details.
39 - LDAP
40 - The Very End
41 - The Architecture of Our Solution
(Figure labels: client XQuery (defined next); mappings (in both directions) as XQueries and rel/XML encodings, forming the schema correspondence (not shown here); reformulated queries (multiple solutions).)
42 - Problem
- XML/MARS XQuery reformulation
- schema correspondence given by views in both directions
- multiple solutions
43 - Capturing Relational Views With Constraints
Let the schema correspondence be a view defined as the relational conjunctive query
V(x,z) :- A(x,y), B(y,z)
Capture the definition with constraints:
(cV) $\forall x \forall y \forall z\; A(x,y) \land B(y,z) \rightarrow V(x,z)$
(bV) $\forall x \forall z\; V(x,z) \rightarrow \exists y\; A(x,y) \land B(y,z)$
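A small Python check of the two constraints on a concrete instance (the values are made up). Together, cV says that V contains the join of A and B projected on (x,z), and bV says that V contains nothing else, so both hold exactly when V is the materialized view.

```python
def satisfies_cV(A, B, V):
    """(cV): for all x,y,z, A(x,y) and B(y,z) imply V(x,z)."""
    return all((x, z) in V for (x, y) in A for (y2, z) in B if y == y2)

def satisfies_bV(A, B, V):
    """(bV): for all x,z, V(x,z) implies there is a y with A(x,y) and B(y,z)."""
    return all(any((y, z) in B for (x2, y) in A if x2 == x) for (x, z) in V)

A = {(1, 2), (3, 4)}
B = {(2, 5)}
V = {(x, z) for (x, y) in A for (y2, z) in B if y == y2}   # materialize the view
print(satisfies_cV(A, B, V), satisfies_bV(A, B, V))        # True True
print(satisfies_cV(A, B, set()), satisfies_bV(A, B, V | {(3, 9)}))  # False False
```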
44 - Partially capturing the XML model
- Partially, because some features cannot be fully captured with constraints:
  - descendant is the transitive closure of child, but this is not FO-definable
  - neither is the treeness property
- Our solution:
  - add a set of constraints GReX to approximate the intended models
  - it turns out that capturing descendant helps in capturing treeness
  - then, we define a significant XQuery fragment (we call it well-behaved) that cannot distinguish between intended and approximate models
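45 - Constraints in GReX (2): the tagged tree structure of XML
- (topRoot) $\forall r \forall x\; root(r) \land desc(x,r) \rightarrow x = r$: root has no ancestors
- (oneTag) $\forall x \forall t_1 \forall t_2\; tag(x,t_1) \land tag(x,t_2) \rightarrow t_1 = t_2$: one tag per element
- (noLoop) $\forall x \forall y\; desc(x,y) \land desc(y,x) \rightarrow x = y$: no non-trivial cycles
- (oneParent) $\forall x \forall y \forall z\; child(x,z) \land child(y,z) \rightarrow x = y$: at most one parent
- (noShare) $\forall x \forall y \forall u \forall v\; child(x,u) \land child(x,v) \land desc(u,y) \land desc(v,y) \rightarrow u = v$: unique path between elements
- (inLine) $\forall x \forall y \forall u\; desc(x,u) \land desc(y,u) \rightarrow x = y \lor desc(x,y) \lor desc(y,x)$: the ancestors of an element are collinear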
46 - XQuery Restrictions
- What it allows:
  - composition of navigation steps
  - navigation axes: self, (named) child, descendant, ancestor, idrefs
  - qualifiers: path, string = path, and, or, path equality/inequality
  - where clause: disjunction, path equality/inequality, existential quantification
- What it rules out:
  - user-defined functions
  - range, before predicates
  - aggregates, arbitrary negation, universal quantification
  - concatenation (,)
  - navigation to parent (..) or to a child of unspecified name (*)
47 - CB Completeness
- Let C be a set of constraints (relating the public schema P and the proprietary schema S).
- C-minimal query: removing any of its relational atoms produces a query that is not equivalent under C.
- Q1 is a subquery of Q2: Q1 is isomorphic to a piece of Q2.
(Figure: chasing Q(P) with C yields the universal plan U.)
Completeness Theorem: any C-minimal reformulation of Q is a subquery of U.
48 - A Completeness Result for Our Solution
- Given:
  - a well-behaved XBind query B, compiled to a relational query c(B)
  - a schema correspondence M given by well-behaved XQueries (in both directions), compiled to a set of relational constraints c(M)
  - bounded XML integrity constraints XIC (a class of XML integrity constraints, see KRDB01), compiled to a set of relational constraints c(XIC)
Relative Completeness Theorem: for any R, R is an $(M \cup XIC)$-minimal reformulation of B iff c(R) is a $(GReX \cup c(M) \cup c(XIC))$-minimal reformulation of c(B).
R can be computed from c(R). All of them are found by CB.
Corollary: completeness of the reformulation algorithm for XBind queries.
49 - Capturing XML Semantics
(Figure labels: client XQuery; mappings (in both directions) as XQueries, forming the schema correspondence; GReX built-in constraints capture the XML data model; reformulated queries (multiple solutions).)
50 - Summary of Constraints Used in the CB Phase
- Built-in constraints in GReX
- Relational views compile to inclusion constraints
- XQuery views:
  - their XBind queries compile to inclusion constraints, as for relational views
  - their return clause compiles to several decorrelated queries, each captured with constraints
  - the XML template in the return clause compiles to several Skolem and copy functions, each compiled to constraints
- Integrity constraints:
  - XML constraints compile to relational constraints
  - relational schema constraints
51 - Are the Restrictions Justified?
- Our completeness result holds for well-behaved XQueries, under bounded XML integrity constraints.
- What about reformulating
  - XQueries with parent and wildcard child navigation?
  - under other XML integrity constraints?
  - even under full-fledged DTDs?
- For such extensions, we make a deeper study of equivalence, which is an even simpler problem than reformulation.
- The equivalence checker is invoked as a black-box algorithm during CB.
52 - XBind (includes XPath) Fragments and Their Equivalence Problem
- path concatenation, attribute values; navigation axes: self, (named) child, descendant; qualifiers: path, string = path, and: PTIME
- adding join on attribute variables: NP-complete
- adding any or all (!) of the following: disjunction, ancestor navigation, path equality, wildcard child (*) navigation
- parent, preceding(following)-sibling navigation
53 - Containment for the well-behaved fragment of XBind/XPath
Theorem: let B1, B2 be XBind/XPath queries from our well-behaved fragment, and c(B1), c(B2) their relational compilation. B1 is equivalent to B2 iff c(B1) is equivalent to c(B2) under GReX, which is decidable in $\Pi_2^p$ using the chase.
This result about containment is used in the relative completeness theorem.
54 - Extensions of the NP fragment: $\Pi_2^p$ fragments
- Any or all (!) of the following make equivalence $\Pi_2^p$-complete:
  - disjunction: unsurprising, since for conjunctive queries with union it is already $\Pi_2^p$-complete [SY80]
  - ancestor navigation: translate ancestor away by introducing union, /a/b/ancestor ? /a/b ? /ab
  - path equality qualifier: can simulate ancestor, //..//./p/s ? /p/ancestor/s
  - wildcard child navigation: union is introduced by the interaction of // and *, e.g. //a is equivalent to /a | /*//a
- Not well-behaved, but we have a different decision procedure.
55 - Experimental Setup: Started From the XML Benchmark
- Used the official XML Benchmark Project, http://monetdb.cwi.nl/xml
- The application domain: an online auctioning application.
- The published schema: a DTD given by the XML Benchmark Project. The data is partially nicely structured.
- The queries: 20 queries designed to exercise interesting features of XQuery.
56 - What We Added to the XML Benchmark Setup
- The mixed storage schema: relationally, person, item, open auction, closed auction, etc.; unstructured part: annotations on items.
- The redundancy: we materialized the XBind query of each query (a particular case of Access Support Relation).
- The mappings in both directions: between relations and XML, and between XML and XML.
It all compiles to about 200 constraints! Much more than in typical relational schemas! We had to change the original implementation [SIGMOD00] to scale.
57 - Related Work
- Publishing systems
  - Schema mapping from proprietary relational to published XML: SilkRoute, XPeranto; reformulation by composition-with-views.
  - Schema mapping from published XML to proprietary relational: STORED, Agora; reformulation by rewriting-with-views.
- Information integration
  - TSIMMIS (composition-with-views), Information Manifold (rewriting-with-views)
- Containment
  - Miklau and Suciu: a smaller fragment of XPath (they too find that * is naughty)
  - FLS, CGLV: conjunctive regular path queries
  - Amer-Yahia and Srivastava: minimization of tree pattern queries
- Containment under integrity constraints
  - XML keys [BDFHT], description logics [CGL]
58 - Query Reformulation in Data Publishing
- public schema P (virtual data): the schema interface against which queries are formulated
- publishing query (may hide some proprietary data)
- proprietary storage schema S (materialized data)
59 - Compiling the Binding Part of XQueries to Relational Queries
But, over arbitrary DBs with this schema, the relational translation of Root followed by two desc steps is not equivalent to that of Root followed by a single desc step: we must communicate to CB that the desc table is transitive.
60 - The Challenge for Reformulation on MARS
- To find the reformulations efficiently, we need to
  - reason with the schema correspondence
  - efficiently construct the search space for reformulations, which must contain all reformulations (for completeness)
  - explore the search space
    - exhaustively (for security applications)
    - maybe trading optimality of the reformulation for search speed (for optimization purposes)
61 - Contributions
- A novel algorithm for reformulation of relational queries under relational constraints: Chase Backchase [VLDB99 with Popa and Tannen; SIGMOD00 with Popa, Sahuguet and Tannen]
- A declarative semantics for most of XQuery
- A reformulation algorithm for XQuery, which uses this semantics and exploits CB:
  - practical (feasible and worthwhile)
  - complete for most of XQuery
  - optimal (we show lower bounds for various XQuery fragments [KRDB01, DBPL01])
- MARS, a system for XQuery reformulation over Mixed And Redundant Storage:
  - constructs and represents the search space efficiently
  - cost-based exploration strategy parameterized by a traditional costing module
  - finds the first reformulation fast
- Experimental evaluation: time to first reformulation, simple cost
62 - Compiling Client XQueries
(Figure labels: client XQuery; mappings (in both directions) as XQueries, forming the schema correspondence; GReX built-in constraints capture the XML data model; reformulated queries (multiple solutions).)
63 - Capturing the Schema Correspondence
(Figure labels: client XQuery; mappings (in both directions) as XQueries, forming the schema correspondence; GReX built-in constraints capture the XML data model; reformulated queries (multiple solutions).)
64 - Major Obstacles in Compiling Schema Mappings to Constraints
- The schema correspondence is given by XQueries. As opposed to relational queries,
  - XQueries have nested, correlated subqueries in the return clause
  - XQueries create new elements
  - XQueries return deep, recursive copies of input XML trees (solution not shown)
65 - Compiling Nested Subqueries: Decorrelation
The query
  for $p in doc("foo.xml")//person
  return <res>{$p/phone/text()}</res>
is short for the nested query
  for $p in doc("foo.xml")//person
  return <res>{ for $t in $p/phone/text()
                return $t }</res>
Compile the XBind parts to two decorrelated relational queries (shown here in Datalog syntax):
  Bouter(p) :- Root(r), desc(r,x), child(x,p), tag(p,person)
  Binner(p,t) :- Bouter(p), child(p,n), tag(n,phone), text(n,t)
Capture each with two inclusion constraints, as done in the original CB method.
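The decorrelated queries can be evaluated directly. This sketch assumes hypothetical contents for foo.xml and the same style of relational encoding (child, tag, text, with desc as the reflexive-transitive closure of child); regrouping Binner by p recovers one res element per outer binding, including an empty one for a person without phones.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count

# Hypothetical contents of foo.xml: one person with two phones, one with none.
FOO = ("<people><person><phone>555-1</phone><phone>555-2</phone></person>"
       "<person/></people>")

def encode(xml_text):
    """Relations Root, child, tag, text; node 0 is the document node."""
    child, tag, text, ids = set(), set(), set(), count(1)
    def walk(el, parent):
        n = next(ids)
        child.add((parent, n))
        tag.add((n, el.tag))
        if el.text:
            text.add((n, el.text))
        for c in el:
            walk(c, n)
    walk(ET.fromstring(xml_text), 0)
    return {0}, child, tag, text

Root, child, tag, text = encode(FOO)
nodes = {0} | {n for _, n in child}
desc = {(n, n) for n in nodes} | child          # reflexive ...
while True:                                     # ... and transitive closure of child
    step = {(x, z) for (x, y) in desc for (y2, z) in desc if y == y2} - desc
    if not step:
        break
    desc |= step

# Bouter(p) :- Root(r), desc(r,x), child(x,p), tag(p,person)
Bouter = {p for r in Root for (r2, x) in desc if r2 == r
          for (x2, p) in child if x2 == x and (p, "person") in tag}
# Binner(p,t) :- Bouter(p), child(p,n), tag(n,phone), text(n,t)
Binner = {(p, t) for p in Bouter for (p2, n) in child
          if p2 == p and (n, "phone") in tag
          for (n2, t) in text if n2 == n}

# Re-nest: one <res> per outer binding, holding that binding's inner results.
groups = defaultdict(list)
for p, t in sorted(Binner):
    groups[p].append(t)
print([(p, groups[p]) for p in sorted(Bouter)])  # [(2, ['555-1', '555-2']), (5, [])]
```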
66 - Capturing Creation of New Elements
  for $p in doc("foo.xml")//person
  return <res>{$p/phone/text()}</res>
For each binding of $p, a distinct <res> element is constructed, namely F(p) for a function F.
Capture F by the relation G representing its graph, and the constraints
- $\forall p \forall r_1 \forall r_2\; G(p,r_1) \land G(p,r_2) \rightarrow r_1 = r_2$ (r = F(p): F is a function)
- $\forall p_1 \forall p_2 \forall r\; G(p_1,r) \land G(p_2,r) \rightarrow p_1 = p_2$ (F is injective)
- $\forall p \forall r\; G(p,r) \rightarrow Bouter(p)$ (F's domain is included in Bouter)
- $\forall p\; Bouter(p) \rightarrow \exists r\; G(p,r)$ (Bouter is included in F's domain)
F is the Skolem function that validates this constraint.
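A sketch of the graph G for one concrete choice of Skolem function, F(p) = ("res", p) (our assumption; any injective choice works), together with a check of the four constraints. The second call shows a graph that sends two persons to the same new element and therefore violates injectivity.

```python
def skolem_graph(bouter):
    """Graph G of a Skolem function F: each outer binding p gets its own new element id."""
    return {(p, ("res", p)) for p in bouter}   # F(p) = ("res", p), injective by construction

def check_G(G, bouter):
    """Check the four constraints: functional, injective, dom(F) in Bouter, Bouter in dom(F)."""
    functional = all(r1 == r2 for (p1, r1) in G for (p2, r2) in G if p1 == p2)
    injective = all(p1 == p2 for (p1, r1) in G for (p2, r2) in G if r1 == r2)
    domain_in_bouter = all(p in bouter for (p, _) in G)
    bouter_in_domain = all(any(p2 == p for (p2, _) in G) for p in bouter)
    return functional, injective, domain_in_bouter, bouter_in_domain

bouter = {2, 5}
print(check_G(skolem_graph(bouter), bouter))      # (True, True, True, True)
print(check_G({(2, "r"), (5, "r")}, bouter))      # (True, False, True, True)
```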
67 - Stratified-Witness Constraints (with L.P.)
Full dependencies (no existential quantifier): the chase always terminates. Beyond this?
Given a set C of dependencies, define the chase flow graph:
- nodes correspond to relation components: an R of arity 3 produces 3 nodes
- an edge is drawn between the i-th component of R and the j-th component of S iff R appears on the left side and S appears on the right side of the implication of some dependency
- the edge is labeled $\exists$ if the corresponding variable in S is existentially quantified
C is stratified-witness if there is no cycle with an $\exists$-labeled edge.
Proposition: the chase with stratified-witness constraints always terminates.
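The chase flow graph and the stratified-witness test translate directly into code. In this sketch (the representation is our own: a dependency is a triple of premise atoms, conclusion atoms, and existential variables), transitivity of desc is a full dependency and passes, while A(x,y) → ∃z A(y,z), whose chase can run forever, has a cycle through an ∃-labeled edge.

```python
def flow_graph(deps):
    """Chase flow graph: nodes are (relation, position); the flag marks existential edges."""
    edges = set()
    for premise, conclusion, exists in deps:
        for (r, r_args) in premise:
            for (s, s_args) in conclusion:
                for i in range(len(r_args)):
                    for j, var in enumerate(s_args):
                        edges.add(((r, i), (s, j), var in exists))
    return edges

def reachable(edges, start, goal):
    """Depth-first search over the flow graph."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n == goal:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(v for (u, v, _) in edges if u == n)
    return False

def stratified_witness(deps):
    """True iff no cycle of the flow graph goes through an existential edge (u, v)."""
    edges = flow_graph(deps)
    return not any(ex and reachable(edges, v, u) for (u, v, ex) in edges)

trans = ([("desc", ("x", "y")), ("desc", ("y", "z"))], [("desc", ("x", "z"))], set())
succ = ([("A", ("x", "y"))], [("A", ("y", "z"))], {"z"})
print(stratified_witness([trans]), stratified_witness([succ]))  # True False
```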
68 - (Relational) Conjunctive Queries
Q(x,z) :- R(x,y,z), R(y,x,u), S(z,u)
select r1.A, s.A from R r1, R r2, S s where r1.A = r2.B and r1.B = r2.A and r1.C = s.A and r2.C = s.B
Notation: r stands for r1, ..., rn; queries have the form select O(r) from R r where C(r)
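69 - (Relational) Dependencies, a.k.a. Integrity Constraints
$\forall (r \in R)\; B(r) \rightarrow \exists (s \in S)\; C(r,s)$
B and C are conjunctions of equalities, as in the where clause. Example:
$\forall (r_1 \in R)(r_2 \in R)\; r_1.E = r_2.E \rightarrow \exists (s \in R)\; s.D = r_1.D \land s.E = r_1.E \land s.F = r_2.F$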
70 - Query Containment and Dependencies
Q1: select O1(r1) from R1 r1 where C1(r1)
Q2: select O2(r2) from R2 r2 where C2(r2)
Define cont(Q1,Q2) as
$\forall (r_1 \in R_1)\; C_1(r_1) \rightarrow \exists (r_2 \in R_2)\; C_2(r_2) \land O_1(r_1) = O_2(r_2)$
We have, in each instance: $Q_1 \subseteq Q_2$ iff cont(Q1,Q2) holds.
71 - And Vice Versa
d: $\forall (r \in R)\; B(r) \rightarrow \exists (s \in S)\; C(r,s)$
front(d): select r from R r where B(r)
back(d): select r from R r, S s where B(r) and C(r,s)
We have, in each instance: d holds iff $front(d) \subseteq back(d)$.
72 - Chase Step
d: $\forall (r \in R)\; B(r) \rightarrow \exists (s \in S)\; C(r,s)$
Chasing Q: select O(r) from R r where B(r)
with d gives Q': select O(r) from R r, S s where B(r) and C(r,s)
Basic fact: $Q' \subseteq Q$ and $Q \subseteq_d Q'$.
The chase step is applicable if Q' is not trivially equivalent to Q (for example, we cannot chase Q' with d!).
73 - Using the Chase
Basic fact: if the chase step of Q with d is not applicable, then $Inst(Q) \models d$ (the canonical instance Inst(Q) is built from the query Q).
Basic Theorem: let D be a set of dependencies and $Q_1 \rightarrow \cdots \rightarrow chase_D(Q_1)$ a terminating chase sequence (no more applicable steps). Then $Q_1 \subseteq_D Q_2$ iff $chase_D(Q_1) \subseteq Q_2$.
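The Basic Theorem suggests a decision procedure: chase Q1 with D, then look for a containment mapping from Q2 into the result that maps head to head (the classical homomorphism test). A minimal sketch, assuming set semantics, distinct head variables, and a chase that terminates; atoms are (relation, arguments) pairs, dependencies are (premise, conclusion) pairs whose conclusion-only variables are existential, and nulls are fresh strings.

```python
from itertools import count

def homomorphisms(atoms, target, mapping):
    """Yield extensions of mapping that send every atom in atoms into the set target."""
    if not atoms:
        yield mapping
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for (trel, targs) in target:
        if trel != rel or len(targs) != len(args):
            continue
        m = dict(mapping)
        if all(m.setdefault(a, t) == t for a, t in zip(args, targs)):
            yield from homomorphisms(rest, target, m)

fresh = count()

def chase(body, deps, max_steps=1000):
    """Apply chase steps until none is applicable (the chase may not terminate in general)."""
    body = set(body)
    for _ in range(max_steps):
        for premise, conclusion in deps:
            step = None
            for h in homomorphisms(list(premise), body, {}):
                if next(homomorphisms(list(conclusion), body, h), None) is None:
                    step = h          # premise matched, conclusion missing: applicable
                    break
            if step is not None:
                for (rel, args) in conclusion:
                    for a in args:
                        if a not in step:
                            step[a] = f"_n{next(fresh)}"   # fresh null for an existential
                    body.add((rel, tuple(step[a] for a in args)))
                break
        else:
            return body               # no dependency applicable: chase has terminated
    raise RuntimeError("chase did not terminate within max_steps")

def contained(q1, q2, deps=()):
    """Q1 is contained in Q2 under deps iff Q2 maps into chase(Q1), head to head."""
    (head1, body1), (head2, body2) = q1, q2
    start = dict(zip(head2, head1))   # assumes distinct head variables in Q2
    return next(homomorphisms(list(body2), chase(body1, deps), start), None) is not None

q1 = (("x",), [("A", ("x", "y"))])
q2 = (("x",), [("A", ("x", "y")), ("B", ("y", "z"))])
dep = ([("A", ("x", "y"))], [("B", ("y", "z"))])   # A(x,y) -> exists z B(y,z)
print(contained(q1, q2), contained(q1, q2, [dep]))  # False True
```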
74 - Reformulation with Views
A view is just a query V: select O(r) from R r where C(r).
Reformulation of the query Q(R) with the view V: finding X(R,V) such that $Q(R) \equiv_V X(R,V)$.
75 - One View, Two Dependencies
V: select O(r) from R r where C(r)
The chase-in dependency (cV): $\forall (r \in R)\; C(r) \rightarrow \exists (x \in V)\; x = O(r)$
The backchase dependency (bV): $\forall (x \in V)\; \exists (r \in R)\; C(r) \land x = O(r)$
It turns out that if rewritings of Q with V exist, then such a rewriting can be obtained by chasing Q with cV.
76 - The Chase and Backchase (CB) Algorithm (joint work with Lucian Popa, IBM Almaden)
The chase with cV always terminates. The search space for rewritings of Q with V consists of the subqueries of $chase_{cV}(Q)$ (S is a subquery iff there is an injective homomorphism from S to $chase_{cV}(Q)$). Keep only the subqueries S such that $S \equiv_V chase_{cV}(Q)$. This can be checked by (back!)chasing with cV and bV (also terminating).
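A compact, naive sketch of CB for the view of slide 43, assuming set semantics and a terminating backchase: chase Q with cV to get the universal plan, then enumerate its subqueries by size and keep the minimal ones equivalent to Q under cV and bV (the same as comparing against chase_cV(Q), since Q and its chase are equivalent under cV). Actual implementations prune far more aggressively.

```python
from itertools import combinations, count

def homs(atoms, target, m):
    """Yield extensions of mapping m that send every atom of atoms into target."""
    if not atoms:
        yield m
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for trel, targs in target:
        if trel == rel and len(targs) == len(args):
            m2 = dict(m)
            if all(m2.setdefault(a, t) == t for a, t in zip(args, targs)):
                yield from homs(rest, target, m2)

fresh = count()

def chase(body, deps):
    """Chase a query body with (premise, conclusion) dependencies; assumed to terminate."""
    body = set(body)
    while True:
        for prem, concl in deps:
            h = next((h for h in homs(list(prem), body, {})
                      if next(homs(list(concl), body, h), None) is None), None)
            if h is not None:
                for rel, args in concl:
                    for a in args:
                        h.setdefault(a, f"_n{next(fresh)}")   # fresh null for existentials
                    body.add((rel, tuple(h[a] for a in args)))
                break
        else:
            return body

def equivalent(head, b1, b2, deps):
    """Bodies b1 and b2 (sharing head variables) are equivalent under deps."""
    same = {v: v for v in head}
    return (next(homs(list(b2), chase(b1, deps), same), None) is not None and
            next(homs(list(b1), chase(b2, deps), same), None) is not None)

def cb(head, body, cV, bV):
    """Chase with cV to the universal plan U, then keep minimal subqueries of U equivalent to Q."""
    U = sorted(chase(body, [cV]))
    found = []
    for k in range(1, len(U) + 1):
        for sub in map(frozenset, combinations(U, k)):
            if any(f <= sub for f in found):
                continue                      # not minimal
            if not set(head) <= {a for _, args in sub for a in args}:
                continue                      # must still bind every head variable
            if equivalent(head, body, sub, [cV, bV]):
                found.append(sub)
    return [sorted(s) for s in found]

# Q(x,z) :- A(x,y), B(y,z) and view V(x,z) :- A(x,y), B(y,z)
cV = ([("A", ("x", "y")), ("B", ("y", "z"))], [("V", ("x", "z"))])
bV = ([("V", ("x", "z"))], [("A", ("x", "y")), ("B", ("y", "z"))])
Q = [("A", ("x", "y")), ("B", ("y", "z"))]
print(cb(("x", "z"), Q, cV, bV))
# [[('V', ('x', 'z'))], [('A', ('x', 'y')), ('B', ('y', 'z'))]]
```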
77 - Preliminary Completeness Result for CB (with L.P.)
Theorem: any scan-minimal reformulation of Q with V is a subquery of $chase_{cV}(Q)$.
Scan-minimal: no scan (item in the from clause) can be removed without compromising equivalence with Q. Fewer scans means faster execution under most cost models.
78 - Additional Integrity Constraints
In general, the storage schema contains integrity constraints that restrict its class of instances (models). This may extend the set of reformulation solutions!
Let C be a set of dependencies. Reformulating the query Q(R) with the view V under C: finding X(R,V) such that $Q(R) \equiv_{V,C} X(R,V)$. That's the same as reformulating Q under $C \cup \{cV, bV\}$.
Can we still use the chase?