Title: Rewriting Nested XML Queries Using Nested Views
1 Rewriting Nested XML Queries Using Nested Views
- Nicola Onose, joint work with Alin Deutsch, Yannis Papakonstantinou, Emiran Curtmola
University of California, San Diego
2 The problem
INTRO
Can we answer Q using only view access paths?
[Figure: the views V1, ..., Vn are evaluated over the input XML data and materialized as docV1, ..., docVn; the query Q produces the query result.]
- views defined by queries V1, ..., Vn and materialized as docV1, ..., docVn
3 The problem
INTRO
[Figure: the same setting, where a rewriting query R would compute the query result of Q from docV1, ..., docVn instead of the input XML data.]
- views defined by queries V1, ..., Vn and materialized as docV1, ..., docVn
- is there a query R such that R(V1(Input), ..., Vn(Input)) = Q(Input)?
4 Motivation: caching, indexes
INTRO
[Figure: the rewriting query R answers Q from docV1, ..., docVn, which are materialized views, faster to access than the original input XML data.]
- caching: answer new queries using the results of previously answered ones
- (partial) indexes: materialized references to frequently accessed parts of the data
5 Motivation: security views
INTRO
[Figure: the views V1, ..., Vn over the input XML data act as security views (permitted queries); is there a rewriting query R that answers Q from docV1, ..., docVn?]
- checking the existence of R addresses a security problem: allow only queries that can be expressed in terms of certain permitted queries, the security views
6 Motivation: data integration
INTRO
[Figure: the query Q is posed against a virtual global DB; the rewriting query R computes the query result from source1, ..., sourcen, with local/global mappings expressed as views.]
- data integration: given a query expressed in global terms, rewrite it using the descriptions of the particular sources
7 Rewritings enabled by pattern matching
INTRO
- Previous literature: find parts of the query that are precomputed by the views.
- How to decide that: match the patterns of the views into the query.
- In the relational case, patterns were tableaux (conjunctive queries).
- For XPath: tree patterns.
- Matching XML queries?
- (until recently) no pattern-based description of XQuery semantics
- Nested XML Tableaux (NEXT) come to fill the gap: The NEXT Logical Framework for XQuery, A. Deutsch et al., VLDB04
8 Scope of Our Approach
INTRO
Tree Patterns: cover XPath
NEXT: extends Tree Patterns with
- nested for-loops
- joins
- element construction etc.
NEXT+: extends NEXT to the whole XQuery language, including
- function calls
- universal quantification
- disjunction, negation etc.
Summary:
- Nested XML Tableaux (NEXT) extend previous work on tree patterns.
- NEXT+ extends NEXT to the whole XQuery.
9 Scope of Our Approach
INTRO
Tree Patterns: cover XPath
NEXT: extends Tree Patterns with
- nested for-loops
- joins
- element construction etc.
NEXT+: extends NEXT to the whole XQuery language, including
- function calls
- universal quantification
- disjunction, negation etc.
completeness guarantee: if a rewriting exists, we will find one
soundness guarantee: if a rewriting is found, it is equivalent to the original query
10 Rewriting using views: example
INTRO
Query Q (group titles by author): for each distinct author, output the titles of his/her books.
View V (group authors by title): for each book, output its title and the list of authors.
The result of the view is cached and has a faster access time than getting the data directly from the source.
Rewriting R: scan the view and create an entry for each distinct author in the view output; add to it all the titles of the respective author.
[Figure: bib.xml drawn as a tree of book elements with author and title children; one sample title is "Data on the Web".]
11 Rewriting using views: example
INTRO
Query Q (group titles by author): for each distinct author, output the titles of his/her books.
View V (group authors by title):
  for $b1 in $doc//book, $t1 in $b1/title
  return <authorlist> {$t1, $b1/author} </authorlist>
- Previous work captures: XPath navigation
Rewriting R: scan the view and create an entry for each distinct author in the view output; add to it all the titles of the respective author.
12 Rewriting using views: example
INTRO
Query Q (group titles by author):
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> {$a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t} </bibentry>
View V (group authors by title):
  for $b1 in $doc//book, $t1 in $b1/title
  return <authorlist> {$t1, $b1/author} </authorlist>
- Previous work captures: XPath navigation
- NEXT captures:
  - XPath navigation
  - nested for loops
  - joins
  - element construction etc.
13 Rewriting using views: example
INTRO
Query Q (group titles by author), with the variables $a and $a1 highlighted:
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> {$a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t} </bibentry>
View V (group authors by title):
  for $b1 in $doc//book, $t1 in $b1/title
  return <authorlist> {$t1, $b1/author} </authorlist>
- Previous work captures: XPath navigation
- NEXT captures:
  - XPath navigation
  - nested for loops
  - joins
  - element construction etc.
15 Rewriting using views: example
INTRO
Query Q (group titles by author):
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> {$a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t} </bibentry>
View V (group authors by title):
  for $b1 in $doc//book, $t1 in $b1/title
  return <authorlist> {$t1, $b1/author} </authorlist>
Rewriting R ($docV is bound to the root of the view output; R navigates inside the view output):
  for $a3 in distinct-values($docV/authorlist[title]/author)
  return <bibentry> {$a3,
    for $p in $docV/authorlist, $t3 in $p/title
    where some $a4 in $p/author satisfies $a4 eq $a3
    return $t3} </bibentry>
[Figure: bib.xml drawn as a tree of book elements with author and title children; one sample title is "Data on the Web".]
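The relationship between Q, V and R in this example can be checked on a small instance. The following is only a minimal Python sketch under stated assumptions: the list-of-dicts stand-in for bib.xml, the sample books, and the function names `query_q`, `view_v` and `rewriting_r` are all illustrative and not part of the original slides.

```python
# Toy stand-in for bib.xml (assumed data): each book has a title and authors.
BIB = [
    {"title": "Data on the Web", "authors": ["Abiteboul", "Buneman", "Suciu"]},
    {"title": "Foundations of Databases", "authors": ["Abiteboul", "Hull", "Vianu"]},
]


def query_q(bib):
    """Q: for each distinct author, the titles of his/her books."""
    authors = []
    for book in bib:
        for a in book["authors"]:
            if a not in authors:  # distinct values, in first-seen order
                authors.append(a)
    return [(a, [b["title"] for b in bib if a in b["authors"]]) for a in authors]


def view_v(bib):
    """V: for each book, its title and its list of authors."""
    return [{"title": b["title"], "authors": list(b["authors"])} for b in bib]


def rewriting_r(doc_v):
    """R: the same grouping as Q, but navigating only inside the view output."""
    authors = []
    for entry in doc_v:
        for a3 in entry["authors"]:
            if a3 not in authors:
                authors.append(a3)
    return [(a3, [p["title"] for p in doc_v if a3 in p["authors"]]) for a3 in authors]


doc_v = view_v(BIB)  # materialized once, then reused by the rewriting
same_answer = rewriting_r(doc_v) == query_q(BIB)
```

Agreement on one instance does not prove equivalence; it only illustrates the condition R(V(Input)) = Q(Input) that the rewriting must satisfy on every input.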
16 Outline
- NEXT (NEsted XML Tableaux)
- Rewriting Algorithm and Extensions
- Experiments
- Previous Work
- Conclusions
18 Architecture of the NEXT framework
NEXT
[Figure: processing pipeline. XQuery query and views -> Normalization -> patterns as Nested XML Tableaux (NEXT) -> Logical Optimization (Rewriting Using Views, Minimization) -> Nested XML Tableaux (NEXT) -> Logical Plan for the Plan Execution Engine, or Translate to XQuery for any XQuery processor. Callouts next to the Logical Optimization stage: "presented at this conference" and "VLDB04".]
19 The need for normalization
NEXT
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> {$a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t} </bibentry>
[Figure: XQuery query and views -> Normalization -> Nested XML Tableaux (NEXT)]
20 Normalization into NEXT
NEXT
Before normalization:
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> {$a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t} </bibentry>
After moving the quantified variable $a1 into the for clause:
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> {$a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    return $t} </bibentry>
[Figure: XQuery query and views -> Normalization -> Nested XML Tableaux (NEXT)]
21 Normalization into NEXT
NEXT
Before normalization:
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> {$a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t} </bibentry>
After normalization (cardinality? restored by the groupby clause):
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> {$a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    groupby $b, $t
    return $t} </bibentry>
[Figure: XQuery query and views -> Normalization -> Nested XML Tableaux (NEXT)]
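The "cardinality?" remark presumably refers to the fact that turning the existential "some" into an ordinary for-variable can emit a title once per matching author. The following minimal Python sketch illustrates that reading; the one-book data with a repeated author value is a made-up corner case, and the use of `id(b)` to stand in for node identity is an assumption of the sketch.

```python
# One book whose two author children carry the same value (assumed corner case).
book = {"title": "T", "authors": ["Ann", "Ann"]}
books = [book]
a = "Ann"

# Original form: where some $a1 in $b/author satisfies $a1 eq $a
with_some = [b["title"] for b in books if any(a1 == a for a1 in b["authors"])]

# Naive unnesting: $a1 becomes a for-variable, so the title repeats per match.
unnested = [b["title"] for b in books for a1 in b["authors"] if a1 == a]

# Unnesting plus "groupby $b, $t": one output per distinct (b, t) binding.
seen = set()
grouped = []
for b in books:
    for a1 in b["authors"]:
        if a1 == a:
            key = (id(b), b["title"])  # id(b) plays the role of node identity
            if key not in seen:
                seen.add(key)
                grouped.append(b["title"])
```

Here `with_some` and `grouped` agree, while `unnested` contains the title twice.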
22 NEXT Patterns
NEXT
- alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns
View V (outer block B1(V), nested block B2(V)):
  for $b1 in $doc//book, $t1 in $b1/title
  groupby $b1, $t1
  return <authorlist> {$t1,
    for $a2 in $b1/author
    groupby $a2
    return $a2} </authorlist>
- graphical representation of NEXT nested patterns: a forest of tree patterns
[Figure: block B1(V) is the tree pattern doc//book(b1)/title(t1), with groupby list b1, t1 and return function <authorlist> t1, B2(V) </authorlist>; block B2(V) is the tree pattern book(b1)/author(a2), with groupby list a2 and return function a2.]
23 NEXT Patterns
NEXT
[Figure: the NEXT pattern of View V (blocks B1(V) and B2(V)) shown again, with callouts marking descendant navigation (doc//book) and child navigation (book/title, book/author).]
24 NEXT Patterns
NEXT
[Figure: the NEXT pattern of View V (blocks B1(V) and B2(V)) shown again, with a callout marking the return function of a block, for example <authorlist> t1, B2(V) </authorlist> for B1(V).]
25 NEXT Patterns
NEXT
[Figure: the NEXT pattern of View V (blocks B1(V) and B2(V)) shown again, with a callout marking the list of groupby variables of a block, for example b1, t1 for B1(V).]
26 NEXT Patterns
NEXT
- alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns
View V (outer block B1(V), nested block B2(V)):
  for $b1 in $doc//book, $t1 in $b1/title
  groupby $b1, $t1
  return <authorlist> {$t1,
    for $a2 in $b1/author
    groupby $a2
    return $a2} </authorlist>
Query Q (outer block B1(Q), nested block B2(Q)):
  for $b0 in $doc//book, $t0 in $b0/title, $a in $b0/author
  groupby $a
  return <bibentry> {$a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    groupby $b, $t
    return $t} </bibentry>
- graphical representation of NEXT nested patterns
[Figure: for V, block B1(V) is doc//book(b1)/title(t1) with groupby b1, t1 and return <authorlist> t1, B2(V) </authorlist>, and block B2(V) is book(b1)/author(a2) with groupby a2 and return a2. For Q, block B1(Q) is doc//book(b0) with children title(t0) and author(a), groupby a and return <bibentry> a, B2(Q) </bibentry>, and block B2(Q) is doc//book(b) with children title(t) and author(a1), groupby b, t and return t.]
28 Outline
- NEXT (NEsted XML Tableaux)
- Rewriting Algorithm and Extensions
- Experiments
- Previous Work
- Conclusions
29 Architecture of the NEXT framework
NEXT
[Figure: processing pipeline. XQuery query and views -> Normalization -> Nested XML Tableaux (NEXT) -> Logical Optimization (Rewriting Using Views, Minimization) -> Nested XML Tableaux (NEXT) -> Logical Plan for the Plan Execution Engine, or Translate to XQuery for an independent XQuery processor. A "rewriting algorithm" callout marks the Logical Optimization stage.]
30 Overview of the Rewriting Algorithm
REWRITING ALGORITHM
- Input: query Q, views V
- phase 1: detect alternative access paths towards the variable bindings through the views
- phase 2: build a candidate rewriting R that uses only the access paths from phase 1
- phase 3: check that R is equivalent to Q
[Figure: Query Q is extended with access paths through V; the candidate rewriting keeps only the access paths through V.]
31 Step 1: Detect View Access Paths
REWRITING ALGORITHM
- access paths: ways of accessing data using the view
- identify matching subqueries (extended tree pattern matching)
- find a mapping and add navigation from the view return
[Figure: the view pattern (doc//book(b1)/title(t1), nested block book(b1)/author(a2), return <authorlist> t1, B2(V) </authorlist>) next to the query body (doc//book(b0) with title(t0) and author(a); doc//book(b) with author(a1) and title(t)).]
32 Step 1: Detect View Access Paths
REWRITING ALGORITHM
- access paths: ways of accessing data using the view
- identify matching subqueries (extended tree pattern matching)
- find a mapping and add navigation from the view return
[Figure: the view pattern and the query body as before; the extended query adds the view access path docV/authorlist(p0)/title(t2) to the first query block.]
33 Step 1: Detect View Access Paths
REWRITING ALGORITHM
- access paths: ways of accessing data using the view
- identify matching subqueries (extended tree pattern matching)
- find a mapping and add navigation from the view return
- and another one
[Figure: the view pattern and the query body as before; the extended query now contains docV/authorlist(p0) with children title(t2) and author(a3) in the first query block.]
34 Step 1: Detect View Access Paths
REWRITING ALGORITHM
- access paths: ways of accessing data using the view
- identify matching subqueries (extended tree pattern matching)
- find a mapping and add navigation from the view return
- and another one
- computing all such mappings yields a query extension that uses only view access paths
[Figure: the view pattern and the query body as before; the extended query contains docV/authorlist(p0) with children title(t2) and author(a3) in the first query block, and docV/authorlist(p) with children author(a4) and title(t3) in the second query block.]
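Finding the mappings from a view pattern into the query body can be sketched as a small backtracking search over tree patterns. This is only an illustration in Python, not the implementation from the talk: it handles plain tree patterns with child ("/") and descendant ("//") edges, ignores nesting, groupby lists and equalities, and the `Node` class and the `mappings` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    var: str
    children: list = field(default_factory=list)  # list of (axis, Node) pairs


def descendants(node):
    """All nodes reachable through one or more edges below `node`."""
    out = []
    for _, child in node.children:
        out.append(child)
        out.extend(descendants(child))
    return out


def mappings(v, q):
    """Yield dicts (view var -> query var) mapping pattern v onto q, root to root."""
    if v.label != q.label:
        return
    partial = [{v.var: q.var}]
    for axis, vc in v.children:
        if axis == "/":
            # A child edge of the view must land on a child edge of the query.
            targets = [qc for ax, qc in q.children if ax == "/"]
        else:
            # A descendant edge may land on any node strictly below q.
            targets = descendants(q)
        partial = [{**m, **sub} for m in partial for t in targets
                   for sub in mappings(vc, t)]
    yield from partial


# First query block and the flattened view pattern from the running example.
query = Node("doc", "doc", [("//", Node("book", "b0", [
    ("/", Node("title", "t0")), ("/", Node("author", "a"))]))])
view = Node("doc", "doc", [("//", Node("book", "b1", [
    ("/", Node("title", "t1")), ("/", Node("author", "a2"))]))])

found = list(mappings(view, query))
```

For these two patterns the search finds a single mapping, sending b1 to b0, t1 to t0 and a2 to a; each such mapping is what justifies adding navigation inside the view output to the query.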
35 Step 2: Candidate Rewriting
REWRITING ALGORITHM
- same return function as the initial query, but with other variable bindings
[Figure: the original query inside the extended query. Block B1(Q): doc//book(b0) with title(t0) and author(a), groupby a, return <bibentry> a, B2(Q) </bibentry>, extended with docV/authorlist(p0) with title(t2) and author(a3). Block B2(Q): doc//book(b) with author(a1) and title(t), groupby b, t, return t, extended with docV/authorlist(p) with author(a4) and title(t3).]
36 Step 2: Candidate Rewriting
REWRITING ALGORITHM
- same return function as the initial query, but with other variable bindings
[Figure: the original query next to the candidate rewriting. Block B1(R) keeps only docV/authorlist(p0) with title(t2) and author(a3), and returns <bibentry> a3, B2(R) </bibentry> in place of <bibentry> a, B2(Q) </bibentry>. Block B2(R) keeps only docV/authorlist(p) with author(a4) and title(t3), and returns t3 in place of t.]
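Building the candidate's return function amounts to renaming query variables to the view-side variables they are matched with. The following Python sketch shows only that renaming step, on return functions written as strings; the `rename_vars` helper and the string encoding are assumptions made for illustration.

```python
import re


def rename_vars(expr, renaming):
    """Replace each $var in expr by its image under renaming (others unchanged)."""
    return re.sub(
        r"\$(\w+)",
        lambda m: "$" + renaming.get(m.group(1), m.group(1)),
        expr,
    )


# Query variables paired with the view-side variables of the extended query.
renaming = {"a": "a3", "t": "t3"}

outer_return = rename_vars("<bibentry> {$a, B2} </bibentry>", renaming)
inner_return = rename_vars("$t", renaming)
```

With this renaming the outer return function mentions `$a3` instead of `$a` and the inner one returns `$t3`, matching the blocks B1(R) and B2(R) in the figure.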
37 Step 3: Equivalence Check
REWRITING ALGORITHM
- check that R is equivalent to Q: containment mappings defined on the tree of query blocks
- and then (optional step) translate back to XQuery
Rewriting R:
  for $a3 in distinct-values($docV/authorlist[title]/author)
  return <bibentry> {$a3,
    for $p in $docV/authorlist, $t3 in $p/title
    where some $a4 in $p/author satisfies $a4 eq $a3
    return $t3} </bibentry>
[Figure: block B1(R) is docV/authorlist(p0) with title(t2) and author(a3), returning <bibentry> a3, B2(R) </bibentry>; block B2(R) is docV/authorlist(p) with title(t3) and author(a4), returning t3.]
38 Under the Hood
REWRITING ALGORITHM
- two types of equality: by value and by node id
  - mappings must take it into consideration
  - the groupby clause also
- XQuery results have order. We consider rewritings that
  - do not respect order (for DB-centric applications)
  - respect order (for text-centric applications)
- for rewritings that respect order: look for an ordering of the view access paths that preserves the original query order (details in the paper)
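The difference between the two kinds of equality can be illustrated with a small Python sketch. The `XmlNode` class is a hypothetical stand-in for an XML element node; value equality corresponds to comparing content (XQuery `eq`), identity equality to comparing the nodes themselves (XQuery `is`).

```python
class XmlNode:
    """Minimal stand-in for an XML element node (hypothetical class)."""

    def __init__(self, tag, text):
        self.tag = tag
        self.text = text


a1 = XmlNode("author", "Suciu")
a2 = XmlNode("author", "Suciu")  # a different node with the same content

same_value = a1.text == a2.text  # equality by value
same_node = a1 is a2             # equality by node id

# Grouping by value merges the two authors; grouping by node id keeps both.
by_value = {n.text for n in (a1, a2)}
by_id = {id(n) for n in (a1, a2)}
```

The two groupings have different sizes (one group by value, two by node id), which is why both the mappings and the groupby clause have to track the kind of equality involved.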
40 Extensions to NEXT
REWRITING ALGORITHM
- Extended NEXT to NEXT+
  - extend the pattern-based representation to the whole XQuery
  - functions and other expressions (negation, disjunction, aggregates etc.) modeled as uninterpreted functions
- Extended the algorithm to use NEXT+: need to identify maximal subparts that are pure NEXT blocks.
  for $x in $doc/book
  where count( for $a in $x/author where $x/price eq 60 groupby $a return $a ) eq count( )
  groupby $x
  return $x
- rewrite the outer block, disregarding function calls
- rewrite the blocks inside function arguments, with free variables bound in upper blocks
41 Formal Guarantees
REWRITING ALGORITHM
- The rewriting algorithm is sound
- and complete for a large fragment of XQuery (the one that can be translated into NEXT), without order
- Completeness means that if there are any rewritings, we are guaranteed to find at least one.
- There is no hope for completeness for
  - ordered rewritings: equivalence is undecidable
  - expressions beyond NEXT: negation and universal quantification also lead to undecidability
- In these cases, our algorithm is a best-effort approach, with guaranteed soundness.
42 Implementation (considerations)
REWRITING ALGORITHM
- completeness guarantees come at a price: computing mappings between view and query patterns
- in general NP-complete, but PTIME if the patterns are trees (no equality conditions), based on M. Yannakakis, Algorithms for acyclic database schemes, 1981
- our goal: design an implementation whose running time is polynomial for pure tree patterns and degrades progressively with the number of added joins
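For pure tree patterns, the existence of a mapping can be decided in polynomial time because the children of a pattern node can be matched independently of each other. The following Python sketch shows a memoized check in that spirit; it is not the Yannakakis algorithm itself, and the tuple encoding of patterns and the names `below` and `embeds` are assumptions of the sketch.

```python
from functools import lru_cache

# A pattern node is (label, children); children is a tuple of (axis, node)
# pairs, where axis is "/" (child) or "//" (descendant).


def below(q):
    """Nodes strictly below q: the possible targets of a descendant edge."""
    out = []
    for _, child in q[1]:
        out.append(child)
        out.extend(below(child))
    return out


@lru_cache(maxsize=None)
def embeds(v, q):
    """True if view pattern v can be mapped into query pattern q, root to root."""
    if v[0] != q[0]:
        return False
    for axis, vc in v[1]:
        if axis == "/":
            targets = [c for ax, c in q[1] if ax == "/"]
        else:
            targets = below(q)
        # Without join equalities each child can be placed independently.
        if not any(embeds(vc, t) for t in targets):
            return False
    return True


TITLE = ("title", ())
AUTHOR = ("author", ())
BOOK_Q = ("book", (("/", TITLE), ("/", AUTHOR)))
BOOK_V = ("book", (("/", TITLE),))
Q = ("doc", (("//", BOOK_Q),))
V = ("doc", (("//", BOOK_V),))
V_CHILD = ("doc", (("/", ("book", ())),))  # doc/book instead of doc//book

v_in_q = embeds(V, Q)
v_child_in_q = embeds(V_CHILD, Q)
```

Each pair of pattern nodes is evaluated at most once thanks to the cache. Join equalities break the independence between children, which is where the general problem becomes NP-complete.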
43 Implementation in practice
REWRITING ALGORITHM
[Figure: V and Q are each compiled, one into an XML instance and the other into a query plan (SPJ); evaluating the plan over the instance produces the mappings.]
- when computing the query plan, apply techniques from the Yannakakis algorithm: push projections and selections
- performance degrades with the number of equalities: the problem is NP-complete in the width of the view pattern (see the paper) and in PTIME when there are no join equalities
44 Outline
- NEXT (NEsted XML Tableaux)
- Rewriting Algorithm and Extensions
- Experiments
- Previous Work
- Conclusions
45 Experiments: Design
EXPERIMENTS
- The running time of the algorithm increases with
  - the number of nested levels: mappings are computed block by block
  - the size of the pattern: the number of mapped and target nodes increases
  - the number of views: more patterns to match
- Our experiments measured how the algorithm scales with these parameters.
- We designed a configuration where we generated queries and views of increasing size and nesting depth.
46 Experiments: Implementation
EXPERIMENTS
Queries and views with similar basic patterns, in a vertical chain of blocks
[Figure: a vertical chain of nested blocks Bk, Bk+1; each block repeats a basic pattern under doc, with nodes mk and mk+1 and branches c1, c2, ..., ci, each ending in a node a.]
- Irrelevant views don't matter (they can be quickly discarded), so we create only relevant views (with mappings into the query)
  - split the query recursively into fragments: the views
  - make them overlap on basic patterns
47 Experiments: Good Scalability
EXPERIMENTS
d = depth (number of nested levels in a query), b = breadth (number of basic patterns in a block)
1.25s for d=16, b=16 and 128 views
48 Previous work
- rewriting XPath queries using XPath views: Rewriting XPath Queries Using Materialized Views, W. Xu et al., VLDB 2005
- rewriting XQuery using XPath views: A Framework for Using Materialized XPath Views in XML Query Processing, A. Balmin et al., VLDB 2004
- rewriting an XQuery with only one XQuery view that has to contain the query: ACE-XQ: A CachE-aware XQuery Answering System, L. Chen et al., WebDB 2002
- caching common XQuery subexpressions: Implementing Memoization in a Streaming XQuery Processor, Y. Diao et al., XSym 2004
49 Conclusions
- NEXT is a pattern-based representation that describes what the query result is and not how it is computed, which gives more opportunities for semantic optimizations
  - extensible to all of XQuery, using NEXT+
- rewriting using views algorithm
  - sound for the whole language
  - complete for a large fragment of XQuery
  - good scalability
  - independent of the underlying algebra of the query processor
50 Online Demo