Title: Containment of Nested XML Queries
1Containment of Nested XML Queries
- Xin (Luna) Dong, Alon Halevy, Igor Tatarinov
- University of Washington
2Query Containment
- The most fundamental relationship between a pair of queries
- Query Q is contained in Q' (written Q ⊆ Q') if
- For any database D,
- Q(D) is a subset of Q'(D)
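A minimal sketch of the definition, assuming queries are modeled as Python functions from a database to a set of answers. The two queries and the (project, member) pairs are hypothetical stand-ins, not taken from the talk. Testing sample databases can only refute containment; it cannot prove it, because the definition quantifies over every database.

```python
# Hypothetical relational stand-ins for two queries; names and data are illustrative only.
def q(db):
    """Members of project 'p1'."""
    return {m for (p, m) in db if p == "p1"}

def q_prime(db):
    """Members of any project."""
    return {m for (p, m) in db}

def refuted_on(samples, query, other):
    """Return a sample D with query(D) not a subset of other(D), or None if no sample refutes containment."""
    for db in samples:
        if not query(db) <= other(db):
            return db
    return None

samples = [
    frozenset(),
    frozenset({("p1", "Alice")}),
    frozenset({("p1", "Alice"), ("p2", "Bob")}),
]
print(refuted_on(samples, q, q_prime))   # None: no sample is a counterexample
print(refuted_on(samples, q_prime, q))   # the third sample is a counterexample
```

A `None` result is only evidence, not a proof, which is why the talk looks for a finite set of representative inputs.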
3Applications of Query Containment
- Semantic caching
- Reasoning about contents of data sources in data integration
- Verification of integrity constraints
- Verification of knowledge bases
- Determining queries independent of updates
- Query answering using views
4Query Processing in PDMS
- XML query containment in a Peer Data Management System (PDMS)
- Answering queries using views to extract remote data
- Removing redundant queries to enhance performance
- Tatarinov and Halevy, SIGMOD 2004
[Figure: PDMS diagram with queries labeled QW, QS, QB1, QB2, and QP]
5Query Containment: Relational vs. XML
6Query Containment: Relational vs. XML
7Example: An XML Instance
D:
  <project> <member>Alice</member> </project>
  <project> <member>Bob</member> </project>
8Example: An XML Query
Q:
  for $x in /project return
  <group>
    for $y in $x/member
    return
    <name>
      where $y = "Alice"
      return <Alice/>
      where $y = "Bob"
      return <Bob/>
    </name>
  </group>
[Figure: the instance D and the answer Q(D) drawn as trees]
9Example: Another XML Query
Q':
  for $x in /project return
  <group>
    for $y in /project/member
    return
    <name>
      where $y = "Alice"
      return <Alice/>
      where $y = "Bob"
      return <Bob/>
    </name>
  </group>
[Figure: the instance D and the answer Q'(D) drawn as trees]
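The two example queries differ only in the path of the inner loop. The sketch below evaluates both on the instance D with Python's standard `xml.etree.ElementTree`. It is an illustration under assumptions: the wrapping `<root>` element and the helper names (`name_element`, `evaluate`, `q`, `q_prime`) are hypothetical, and the queries are hand-translated rather than run by an XQuery engine.

```python
import xml.etree.ElementTree as ET

# The instance D from the slides, wrapped in a hypothetical <root> element
# so that it parses as a single document.
D = ET.fromstring(
    "<root>"
    "<project><member>Alice</member></project>"
    "<project><member>Bob</member></project>"
    "</root>"
)

def name_element(member):
    """Build <name> holding <Alice/> or <Bob/>, mirroring the two where clauses."""
    name = ET.Element("name")
    for person in ("Alice", "Bob"):
        if member.text == person:
            ET.SubElement(name, person)
    return name

def evaluate(db, members_of):
    """One <group> per project; members_of picks the members iterated inside it."""
    groups = []
    for project in db.findall("project"):
        group = ET.Element("group")
        for member in members_of(db, project):
            group.append(name_element(member))
        groups.append(group)
    return groups

def q(db):
    # Inner loop over the current project's members only.
    return evaluate(db, lambda db, project: project.findall("member"))

def q_prime(db):
    # Inner loop over the members of every project.
    return evaluate(db, lambda db, project: db.findall("project/member"))

def shape(answer):
    """Nested lists of tag names: groups -> names -> returned tags."""
    return [[[c.tag for c in n] for n in g] for g in answer]

print(shape(q(D)))        # each group holds one name
print(shape(q_prime(D)))  # each group holds a name for every member
```

On this instance every group produced by the first query has a single name, while every group produced by the second has one name per member of any project.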
10Example: Tree Homomorphism and Query Containment
[Figure: tree homomorphism between the answers Q(D) and Q'(D)]
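Containment between answers is judged by a tree homomorphism. A minimal sketch, assuming trees are (label, children) pairs, that a homomorphism must map root to root and preserve labels and child edges, and that each answer is wrapped under a hypothetical "root" node. The two trees are the answers of the example queries on D, written out by hand.

```python
# Trees are (label, [children]) pairs.
def contained(t1, t2):
    """True if some homomorphism maps t1 into t2: labels match and child edges are preserved."""
    label1, children1 = t1
    label2, children2 = t2
    return label1 == label2 and all(
        any(contained(c1, c2) for c2 in children2) for c1 in children1
    )

def leaf(label):
    return (label, [])

# Answers of the two example queries, under a hypothetical wrapping root.
q_d = ("root", [
    ("group", [("name", [leaf("Alice")])]),
    ("group", [("name", [leaf("Bob")])]),
])
q_prime_d = ("root", [
    ("group", [("name", [leaf("Alice")]), ("name", [leaf("Bob")])]),
    ("group", [("name", [leaf("Alice")]), ("name", [leaf("Bob")])]),
])

print(contained(q_d, q_prime_d))   # True: each group of Q(D) maps into a group of Q'(D)
print(contained(q_prime_d, q_d))   # False: no group of Q(D) holds both Alice and Bob
```

Because the children of a node can be mapped independently of one another, the simple recursion is enough; no backtracking across siblings is needed.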
11Query Containment Problem
- From answer containment to query containment
- Our problems
- Given queries Q and Q', decide whether Q ⊆ Q'
- The complexity of query containment
[Figure: from answer containment (Q(D) vs. Q'(D)) to query containment (Q vs. Q')]
12Previous Work (I)
- Relational query containment
- Conjunctive queries [Chandra and Merlin, STOC 1977]
- Acyclic queries [Yannakakis, VLDB 1981]
- Queries with union [Sagiv and Yannakakis, JACM 1980]
- Queries with negation [Levy and Sagiv, VLDB 1993]
- Queries with arithmetic comparisons [Klug, JACM 1988]
- Recursive queries [Shmueli, 1993; Chaudhuri and Vardi, 1992]
- Queries over bags [Ioannidis and Ramakrishnan, 1995]
13Previous Work (II)
- XML query containment: two new challenges
- XPath containment
- With *, // and [] [Miklau and Suciu, PODS 2002]
- With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
- Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
- Nested query containment
14Containment Cannot be Determined Solely by Comparing XPath Components
Q:
  for $g in /group
  where $g/gname/text() = "database"
  return
  <area>
    for $p in $g/person return
    <person>
      <name> $p/text() </name>
      for $q in $g/paper
      where $q/author/text() = $p/text()
      return <paper> $q/title/text() </paper>
    </person>
  </area>
Q':
  for $g in /group return
  <area>
    for $p in $g/person return
    <person>
      <name> $p/text() </name>
      <group> $g/gname/text() </group>
      for $q in $g/paper
      where $q/author/text() = $p/text()
      return <paper> $q/title/text() </paper>
    </person>
  </area>
15Previous Work (II)
- XML query containment: two new challenges
- XPath containment
- With *, // and [] [Miklau and Suciu, PODS 2002]
- With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
- Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
- Nested query containment
- Complex object query containment [Levy and Suciu, PODS 1997]
Containment of nested XML queries has not been fully studied
16Our Focus: Nested XML Queries
- Returned tag: constants
- Conjunctive: no two sibling query blocks return the same tag
- XPath
- HAVE
- Child axis (/)
- Wildcards (*)
- Branches ([])
- NOT HAVE
- Descendant axis (//)
- Arithmetic comparison
- Union
Here, XPath containment is in PTIME
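For path expressions restricted to the child axis, wildcards, and branches, one polynomial-time check is to look for a homomorphism between tree patterns. The sketch below is an illustration, not the method used in the talk. It assumes boolean patterns (no distinguished output node) written as (label, children) pairs with "*" as the wildcard, and it relies on the fact that a homomorphism from p' into p is a sufficient condition for p ⊆ p'. The example patterns are hypothetical.

```python
# Tree patterns are (label, [children]) pairs; "*" is the wildcard label.
def maps_into(p_prime, p):
    """True if p_prime maps homomorphically into p (root to root, child edges preserved).

    A wildcard node of p_prime may map to any node of p; a labeled node
    must map to a node carrying the same label.
    """
    label_prime, kids_prime = p_prime
    label, kids = p
    if label_prime != "*" and label_prime != label:
        return False
    return all(any(maps_into(kp, k) for k in kids) for kp in kids_prime)

# Hypothetical patterns: /group[gname]/person, /group/person, /group/*
p = ("group", [("gname", []), ("person", [])])
p_prime = ("group", [("person", [])])
p_star = ("group", [("*", [])])

print(maps_into(p_prime, p))  # True: p is at least as restrictive as p_prime
print(maps_into(p_star, p))   # True: the wildcard maps to either child
print(maps_into(p, p_prime))  # False: the gname branch has no image
```

Each pair of nodes is compared at most once per recursion path, so the check stays polynomial in the sizes of the two patterns.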
17Complexity Result (I)
18Complexity Result (II)
19Complexity Result (II)
20Complexity Result (II)
21Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
- Query containment is decidable
- Query containment in practice
- Relaxing the assumptions
- Conclusions
22Deciding Q ⊆ Q'?
- How to find a property for an infinite number of input XML instances?
- Standard technique
- Find a finite set of input representatives: Canonical Databases
- Relational query: each canonical database is a minimal input to generate the answer template
- XML query: answers have an infinite number of shapes
- Find a finite set of answer templates: Canonical Answers
23Answer Shapes Determined by the Head Tree
Q:
  for $x in /project return
  <group>
    for $y in /project/member return
    <name>
      where $y = "Alice"
      return <Alice/>
      where $y = "Bob"
      return <Bob/>
    </name>
  </group>
Head Tree:
  group
    name
      Alice
      Bob
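The head tree keeps only the nesting of returned tags and drops the for and where clauses. A minimal sketch, assuming a hypothetical `Block` model in which each query block records the tag it returns, a free-text description of its binding, and its nested blocks:

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    """One query block (hypothetical model): the tag it returns and its nested blocks."""
    tag: str
    binding: str = ""  # e.g. "for $x in /project"; kept only for readability
    children: list = field(default_factory=list)

def head_tree(block):
    """Keep the returned tags and their nesting as a (label, [children]) tree."""
    return (block.tag, [head_tree(child) for child in block.children])

def render(tree, depth=0):
    """Indented text lines for a (label, [children]) tree."""
    label, children = tree
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

query = Block("group", "for $x in /project", [
    Block("name", "for $y in /project/member", [
        Block("Alice", 'where $y = "Alice"'),
        Block("Bob", 'where $y = "Bob"'),
    ]),
])
print("\n".join(render(head_tree(query))))
```

The printed tree has group at the root, name below it, and Alice and Bob as the two leaves.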
24An Additional Candidate Answer
Head Tree:
  group
    name
      Alice
      Bob
25Why Consider the Additional Case
[Figure: an instance D, the head tree (group > name > Alice, Bob), and two answer trees labeled Q(D)]
26What can Serve as Canonical Answers?
- Prefix subtrees of the head tree? Necessary but not sufficient
- Trees contained in the head tree?
- Necessary and sufficient
- But too many and too complex
27A Head Tree can Have Many Trees Contained in it
[Figure: the head tree (group > name > Alice, Bob) and several trees contained in it, built from group, name, Alice, and Bob nodes]
28What can Serve as Canonical Answers?
- Prefix subtrees of the head tree? Necessary but not sufficient
- Trees contained in the head tree?
- Necessary and sufficient
- But too many and too complex
- Our solution: consider only minimal trees that are contained in the head tree
29Canonical Answer
- A minimal XML instance: no two sibling subtrees where one is contained in the other
- Canonical Answer: a minimal XML instance contained in the head tree
- Every answer A of query Q corresponds to a unique canonical answer CA, s.t. A ⊆ CA and CA ⊆ A
[Figure: example trees over group, name, Alice, and Bob illustrating answers and their canonical answers]
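One straightforward way to obtain a minimal instance from an answer is to drop every sibling subtree that is contained in another sibling, bottom-up. This is a sketch under assumptions, not necessarily the authors' procedure: trees are (label, children) pairs, containment is the homomorphism test, and the sample answer is hypothetical.

```python
def contained(t1, t2):
    """True if t1 maps homomorphically into t2 (labels and child edges preserved)."""
    label1, children1 = t1
    label2, children2 = t2
    return label1 == label2 and all(
        any(contained(c1, c2) for c2 in children2) for c1 in children1
    )

def minimize(tree):
    """Remove sibling subtrees that are contained in another sibling, bottom-up."""
    label, children = tree
    kids = [minimize(child) for child in children]
    kept = []
    for i, kid in enumerate(kids):
        redundant = False
        for j, other in enumerate(kids):
            if i == j or not contained(kid, other):
                continue
            # Drop kid if it is strictly contained in other, or if it is
            # equivalent to a sibling that appears earlier in the list.
            if not contained(other, kid) or j < i:
                redundant = True
                break
        if not redundant:
            kept.append(kid)
    return (label, kept)

def leaf(label):
    return (label, [])

# Hypothetical answer with redundant name subtrees.
answer = ("group", [
    ("name", [leaf("Alice")]),
    ("name", [leaf("Alice"), leaf("Bob")]),
    ("name", [leaf("Bob")]),
    ("name", [leaf("Alice")]),
])
minimal = minimize(answer)
print(minimal)
print(contained(answer, minimal), contained(minimal, answer))
```

The minimized tree and the original answer are contained in each other, which matches the correspondence between an answer A and its canonical answer CA stated on the slide.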
30Canonical Database
- Canonical Database DB_CA
- The minimal XML instance to generate CA
Q:
  for $x in /project return
  <group>
    for $y in /project/member return
    <name>
      where $y = "Alice"
      return <Alice/>
      where $y = "Bob"
      return <Bob/>
    </name>
  </group>
CA:
  group
    name
      Alice
    name
      Bob
[Figure: the canonical database DB for this CA]
31Sound and Complete Conditions for Nested Query
Containment
- Theorem 1. Q ⊆ Q' if and only if, for every canonical database DB of Q, Q(DB) ⊆ Q'(DB)
- Theorem 2. Q ⊆ Q' if and only if, for every canonical answer CA of Q,
- CA is a canonical answer of Q'
- DB'_CA ⊆ DB_CA, where DB_CA and DB'_CA are the canonical databases of Q and Q' for CA
32Query Containment Algorithm
- Algorithm
- for every canonical answer CA of Q do
- check whether CA is a canonical answer of Q'
- generate DB_CA and DB'_CA
- check DB'_CA ⊆ DB_CA
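The loop above can be sketched as a short driver. Everything query-specific is passed in as a callable, and all four callables (`canonical_answers`, `is_canonical_answer`, `canonical_db`, `tree_contained`) are hypothetical placeholders for procedures the slides do not spell out. The direction of the final check (the canonical database of Q' contained in that of Q) is an assumption of this sketch. The toy data at the bottom only exercises the control flow.

```python
def is_contained(q, q_prime, canonical_answers, is_canonical_answer,
                 canonical_db, tree_contained):
    """Containment test driven by canonical answers; all helpers are caller-supplied."""
    for ca in canonical_answers(q):
        # Step 1: CA must also be a canonical answer of Q'.
        if not is_canonical_answer(ca, q_prime):
            return False
        # Step 2: build both canonical databases for this CA.
        db = canonical_db(q, ca)
        db_prime = canonical_db(q_prime, ca)
        # Step 3: assumed direction, DB'_CA contained in DB_CA.
        if not tree_contained(db_prime, db):
            return False
    return True

# Toy stand-ins: a "query" is a dict from canonical-answer name to a
# canonical database modeled as a frozenset; containment is set inclusion.
toy_q = {"ca1": frozenset({"a", "b"}), "ca2": frozenset({"a", "b", "c"})}
toy_q_prime = {"ca1": frozenset({"a"}), "ca2": frozenset({"b", "c"}),
               "ca3": frozenset({"d"})}

helpers = dict(
    canonical_answers=lambda query: list(query),
    is_canonical_answer=lambda ca, query: ca in query,
    canonical_db=lambda query, ca: query[ca],
    tree_contained=lambda small, big: small <= big,
)
print(is_contained(toy_q, toy_q_prime, **helpers))   # True on the toy data
print(is_contained(toy_q_prime, toy_q, **helpers))   # False on the toy data
```

The running time of this driver is the number of canonical answers times the cost of the per-answer checks, which is why the size and number of canonical answers matter.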
33Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
- Query containment is decidable
- Query containment in practice
- Relaxing the assumptions
- Conclusions
34Query Containment Algorithm
- Algorithm
- for every canonical answer CA of Q do
- check whether CA is a canonical answer of Q'
- generate DB_CA and DB'_CA
- check DB'_CA ⊆ DB_CA
- Polynomial in the size and number of canonical answers
- What are the sizes of canonical answers?
- What is the number of canonical answers?
35Containment of XML Queries with Fanout 1
- E.g. d = 3 (the depth), m = 1 (the maximum fanout)
- Canonical Answers and Complexity
- Number: the depth of the query
- Size: bounded by the depth of the query
- Complexity: O(d·|Q|·|Q'|)
- Theorem: Testing containment of XML queries with fanout 1 is in PTIME
  for $x in /project return
  <group>
    for $y in /project/member return
    <name>
      where $y = "Alice" return <Alice/>
    </name>
  </group>
[Figure: the three canonical answers: group; group > name; group > name > Alice]
Nesting with fanout 1 does not increase complexity
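With fanout 1 the head tree is a single chain, so the count on this slide can be illustrated by listing the non-empty prefixes of that chain. The sketch assumes that these prefixes are exactly the canonical answers, which matches the stated count (the depth) but is not spelled out in the slide text. The function name is hypothetical.

```python
def chain_prefixes(tags):
    """Each non-empty prefix of a fanout-1 head tree, as a nested (label, [children]) tree."""
    answers = []
    for length in range(1, len(tags) + 1):
        tree = None
        # Build the chain bottom-up from the last tag of the prefix.
        for tag in reversed(tags[:length]):
            tree = (tag, [] if tree is None else [tree])
        answers.append(tree)
    return answers

answers = chain_prefixes(["group", "name", "Alice"])
for tree in answers:
    print(tree)
print(len(answers))  # equals the depth of the chain
```

For the example query of depth 3 this yields three trees, each no larger than the depth.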
36Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
- Query containment is decidable
- Query containment in practice
- Relaxing the assumptions
- Conclusions
37Containment of XML Queries with Arbitrary Fanout
- E.g. d = 4 (the depth), m = 3 (the maximum fanout)
- Canonical Answers and Complexity
- Number
- Size
- Theorem: Testing containment of XML queries with depth 2 and arbitrary fanout is coNP-hard
38Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
- Query containment is decidable
NOT TIGHT
- Query containment in practice
- Relaxing the assumptions
- Conclusions
39Effect of the Depth on Containment of XML Queries
- Insight: Kernel Canonical Answer
- The root node has a single child
- In any subtree, a path pattern is repeated no more than cd times.
- d: query depth
- c: (maximum path steps in a query block)
- The size of kernel canonical answers
- Polynomial in the query size
- Exponential in the query depth
- Theorem
- Testing containment of XML queries with fixed depth is coNP-complete
- Testing containment of XML queries with arbitrary depth is in coNEXPTIME
40Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
- Query containment is decidable
- Query containment in practice
- Relaxing the assumptions
- Conclusions
41Containment Checking in Practice
- Analyze element cardinality to reduce the number of canonical answers for containment checking
- Canonical answers: originally 71, after analysis 2
Q:
  for $g in /group
  where $g/gname/text() = "database"
  return
  <area>
    for $p in $g/person return
    <person>
      <name> $p/text() </name>
      for $q in $g/paper
      where $q/author/text() = $p/text()
      return <paper> $q/title/text() </paper>
    </person>
  </area>
Q':
  for $g in /group return
  <area>
    for $p in $g/person return
    <person>
      <name> $p/text() </name>
      <group> $g/gname/text() </group>
      for $q in $g/paper
      where $q/author/text() = $p/text()
      return <paper> $q/title/text() </paper>
    </person>
  </area>
42Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
- Query containment is decidable
- Query containment in practice
- Relaxing the assumptions
- Conclusions
43An Example Query that Returns Tag Variables
for $x in dbGrp return
<result>
  for $y in $x/proj return
  <group>
    for $u in $y/member return <name> $u/text() </name>
    for $v in $y/paper return <pub> $v/text() </pub>
  </group>
</result>
44Deciding Query Containment
- Leverage previous results: simulation mapping [Levy and Suciu, PODS 97]
- Check query simulation mapping for every canonical answer
- Complexity
- Simulation mapping can be checked in polynomial time in terms of query size
- Complexity of checking containment does not increase
45Other Extensions
46Conclusions
- Contributions
- A sound and complete condition for containment of nested XML queries
- Detailed complexity analysis
- Future work
- Fill in the open gap of complexity in case of queries with arbitrary fanout and arbitrary nesting depth
- Evaluate and optimize the containment algorithm with element cardinality analysis
- Answering nested XML queries using views
47Containment of Nested XML Queries
- @ VLDB 2004
- Xin (Luna) Dong, Alon Halevy, Igor Tatarinov
- University of Washington
- www.cs.washington.edu/homes/lunadong