Containment of Nested XML Queries

1
Containment of Nested XML Queries
  • Xin (Luna) Dong, Alon Halevy, Igor Tatarinov
  • University of Washington

2
Query Containment
  • The most fundamental relationship between a pair
    of queries
  • Query Q is contained in Q' if
  • for any database D,
  • Q(D) is a subset of Q'(D)
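The definition can be illustrated with a small Python sketch. It assumes, purely for illustration, that a database is a set of (project, member) pairs and that a query is a plain function returning a set; the names `q`, `q_prime` and `find_counterexample` are hypothetical and not from the talk. Testing sample databases can only refute containment, never prove it.

```python
from itertools import combinations

def q(db):
    # Members of project "p1" only.
    return {m for (p, m) in db if p == "p1"}

def q_prime(db):
    # Members of any project.
    return {m for (p, m) in db}

def find_counterexample(qa, qb, universe, max_size=2):
    """Return a database D with qa(D) not a subset of qb(D), or None."""
    for k in range(max_size + 1):
        for rows in combinations(universe, k):
            db = set(rows)
            if not qa(db) <= qb(db):
                return db
    return None

universe = [("p1", "Alice"), ("p2", "Bob")]
print(find_counterexample(q, q_prime, universe))  # None: nothing refutes q in q_prime
print(find_counterexample(q_prime, q, universe))  # {('p2', 'Bob')}
```

Finding no counterexample among a few sample inputs says nothing about the infinitely many other databases, which is why a decision procedure needs a finite set of representative inputs.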

3
Applications of Query Containment
  • Semantic caching
  • Reasoning about contents of data sources in data
    integration
  • Verification of integrity constraints
  • Verification of knowledge bases
  • Determining queries independent of updates
  • Query answering using views

4
Query Processing in PDMS
  • XML Query Containment in Peer Data Management
    System (PDMS)
  • Answering queries using views to extract remote
    data
  • Removing redundant queries to enhance performance
  • Tatarinov and Halevy, SIGMOD 2004

[Figure: queries QW, QS, QB1, QB2 and QP in a PDMS]
5
Query Containment: Relational vs. XML
6
Query Containment: Relational vs. XML
7
Example: An XML Instance
D:
  <project> <member>Alice</member> </project>
  <project> <member>Bob</member> </project>
8
Example: An XML Query
Q:
  for $x in /project return
    <group>
      for $y in $x/member return
        <name>
          where $y = Alice return <Alice/>
          where $y = Bob return <Bob/>
        </name>
    </group>

[Figure: the input D and the answer Q(D)]
9
Example: Another XML Query
Q':
  for $x in /project return
    <group>
      for $y in /project/member return
        <name>
          where $y = Alice return <Alice/>
          where $y = Bob return <Bob/>
        </name>
    </group>

[Figure: the input D and the answer Q'(D)]
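To make the difference between the two example queries concrete, here is a minimal Python sketch that evaluates both on the two-project instance from the earlier slide. It is a hand-written evaluator for these two queries only, not a general XQuery engine; the `<db>` wrapper element and the helper names `name_block`, `run`, `own_members` and `all_members` are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# The example instance, wrapped in a root element so that it parses as a
# single document (the wrapper is an assumption of this sketch).
D = ET.fromstring(
    "<db><project><member>Alice</member></project>"
    "<project><member>Bob</member></project></db>")

def name_block(member):
    """Build <name> with <Alice/> or <Bob/> depending on the member text."""
    name = ET.Element("name")
    for who in ("Alice", "Bob"):
        if member.text == who:
            ET.SubElement(name, who)
    return name

def run(db, members_of):
    """One <group> per project; members_of picks the inner-loop bindings."""
    groups = []
    for project in db.findall("project"):
        group = ET.Element("group")
        for member in members_of(db, project):
            group.append(name_block(member))
        groups.append(group)
    return groups

def own_members(db, project):
    # First query: the inner loop ranges over the current project's members.
    return project.findall("member")

def all_members(db, project):
    # Second query: the inner loop ranges over the members of all projects.
    return db.findall("project/member")

def show(groups):
    return [ET.tostring(g, encoding="unicode") for g in groups]

print(show(run(D, own_members)))
print(show(run(D, all_members)))
```

With the first query each group holds a single name element; with the second, every group holds a name element for each member in the whole instance.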
10
Example: Tree Homomorphism and Query Containment
[Figure: the answer trees Q(D) and Q'(D), related by a tree homomorphism]
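Containment between two answers can be checked with a tree homomorphism, sketched below in Python. Trees are represented as `(label, children)` tuples and the answers are wrapped in an artificial `root` node; both choices, and the function name `embeds`, are assumptions of this sketch. The two answer trees are the ones the example queries produce on the two-project instance.

```python
def embeds(t1, t2):
    """True if t1 maps homomorphically into t2: root to root, labels
    preserved, and every child edge of t1 mapped to a child edge of t2."""
    label1, kids1 = t1
    label2, kids2 = t2
    if label1 != label2:
        return False
    return all(any(embeds(c1, c2) for c2 in kids2) for c1 in kids1)

def leaf(label):
    return (label, [])

# Answer of the first query: each group lists only its own project's member.
q_d = ("root", [("group", [("name", [leaf("Alice")])]),
                ("group", [("name", [leaf("Bob")])])])

# Answer of the second query: every group lists the members of all projects.
both = ("group", [("name", [leaf("Alice")]), ("name", [leaf("Bob")])])
q_prime_d = ("root", [both, both])

print(embeds(q_d, q_prime_d))  # True
print(embeds(q_prime_d, q_d))  # False
```

The mapping need not be injective, so several nodes of the smaller tree may share one image in the larger tree.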
11
Query Containment Problem
  • From answer containment to query containment
  • Our problems
  • Given queries Q and Q', decide whether Q ⊆ Q'
  • The complexity of query containment

[Figure: from answer containment on D (Q(D) vs. Q'(D)) to query containment (Q vs. Q')]
12
Previous Work (I)
  • Relational query containment
  • Conjunctive queries [Chandra and Merlin, STOC 1977]
  • Acyclic queries [Yannakakis, VLDB 1981]
  • Queries with union [Sagiv and Yannakakis, JACM 1980]
  • Queries with negation [Levy and Sagiv, VLDB 1993]
  • Queries with arithmetic comparisons [Klug, JACM 1988]
  • Recursive queries [Shmueli, 1993; Chaudhuri and Vardi, 1992]
  • Queries over bags [Ioannidis and Ramakrishnan, 1995]

13
Previous Work (II)
  • XML query containment: two new challenges
  • XPath containment
  • With *, // and [] [Miklau and Suciu, PODS 2002]
  • With equality testing on tag variables [Deutsch
    and Tannen, KRDB 2001]
  • Conjunctive queries over path expressions
    [Florescu, Levy and Suciu, PODS 1998]
  • Nested query containment

14
Containment Cannot be Determined Solely by
Comparing XPath Components
Q:
  for $g in /group where $g/gname/text() = database return
    <area>
      for $p in $g/person return
        <person>
          <name>$p/text()</name>
          for $q in $g/paper where $q/author/text() = $p/text() return
            <paper>$q/title/text()</paper>
        </person>
    </area>
Q':
  for $g in /group return
    <area>
      for $p in $g/person return
        <person>
          <name>$p/text()</name>
          <group>$g/gname/text()</group>
          for $q in $g/paper where $q/author/text() = $p/text() return
            <paper>$q/title/text()</paper>
        </person>
    </area>
15
Previous Work (II)
  • XML query containment: two new challenges
  • XPath containment
  • With *, // and [] [Miklau and Suciu, PODS 2002]
  • With equality testing on tag variables [Deutsch
    and Tannen, KRDB 2001]
  • Conjunctive queries over path expressions
    [Florescu, Levy and Suciu, PODS 1998]
  • Nested query containment
  • Complex object query containment [Levy and Suciu,
    PODS 1997]

Containment of nested XML queries has not been
fully studied
16
Our Focus: Nested XML Queries
  • Returned tags: constants
  • Conjunctive: no two sibling query blocks return
    the same tag
  • XPath
  • HAVE
  • Child axis (/)
  • Wildcards (*)
  • Branches ([])
  • NOT HAVE
  • Descendant axis (//)
  • Arithmetic comparison
  • Union

Here, XPath containment is in PTIME
17
Complexity Result (I)
18
Complexity Result (II)
19
Complexity Result (II)
20
Complexity Result (II)
21
Roadmap
  • Introduction and problem definition
  • Containment of a subset of XML queries
  • Query containment is decidable
  • Query containment in practice
  • Relaxing the assumptions
  • Conclusions

22
Deciding Q ⊆ Q'?
  • How to establish a property over an infinite number of
    input XML instances?
  • Standard technique
  • Find a finite set of input representatives:
    Canonical Databases
  • Relational query: each canonical database is a
    minimal input to generate the answer template
  • XML query: answers have an infinite number of shapes
  • Find a finite set of answer templates: Canonical
    Answers

23
Answer Shapes Determined by the Head Tree
Q:
  for $x in /project return
    <group>
      for $y in /project/member return
        <name>
          where $y = Alice return <Alice/>
          where $y = Bob return <Bob/>
        </name>
    </group>

Head Tree:
  group
    name
      Alice
      Bob
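One way to picture the head tree is as the nesting of query blocks with everything but the returned tags stripped away. The Python sketch below assumes a hypothetical `Block` record for a query block (its fields are illustrative, not the authors' data structure) and derives the head tree of the query on this slide.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    tag: str                  # tag returned by this query block
    binding: str = ""         # for/where clause, kept only as text here
    children: list = field(default_factory=list)

def head_tree(block):
    """Keep the returned tags and their nesting; drop the bindings."""
    return (block.tag, [head_tree(c) for c in block.children])

query = Block("group", "for $x in /project", [
    Block("name", "for $y in /project/member", [
        Block("Alice", "where $y = Alice"),
        Block("Bob", "where $y = Bob"),
    ]),
])

print(head_tree(query))
# ('group', [('name', [('Alice', []), ('Bob', [])])])
```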
24
An Additional Candidate Answer
[Figure: the head tree (group, name, Alice, Bob) and an additional candidate answer]
25
Why Consider the Additional Case
[Figure: the input D, the head tree (group, name, Alice, Bob), and the answers Q(D) and Q'(D)]
26
What can Serve as Canonical Answers?
  • Prefix subtrees of the head tree? Necessary
    but not sufficient
  • Trees contained in the head tree?
  • Necessary and sufficient
  • But too many and too complex

27
A Head Tree can Have Many Trees Contained in it
[Figure: the head tree (group, name, Alice, Bob) and several trees contained in it, built from repeated group, name, Alice and Bob nodes]
28
What can Serve as Canonical Answers?
  • Prefix subtrees of the head tree? Necessary
    but not sufficient
  • Trees contained in the head tree?
  • Necessary and sufficient
  • But too many and too complex
  • Our solution: consider only minimal trees that
    are contained in the head tree

29
Canonical Answer
  • A minimal XML instance: no two sibling subtrees
    where one is contained in the other
  • Canonical Answer: a minimal XML instance
    contained in the head tree
  • Every answer A of query Q corresponds to a unique
    canonical answer CA, s.t. A ⊆ CA, CA ⊆ A

[Figure: example answer trees over group, name, Alice and Bob nodes, illustrating the correspondence between an answer and its canonical answer]
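The minimality condition on this slide can be sketched as a tree-reduction step in Python. Trees are `(label, children)` tuples, `embeds` is a plain tree-homomorphism test, and `minimize` drops any sibling subtree that is contained in another sibling; the representation and the function names are assumptions of this sketch, and the example answer (with an empty name element) is made up for illustration.

```python
def embeds(t1, t2):
    """Homomorphism test: root to root, labels and child edges preserved."""
    label1, kids1 = t1
    label2, kids2 = t2
    if label1 != label2:
        return False
    return all(any(embeds(c1, c2) for c2 in kids2) for c1 in kids1)

def minimize(tree):
    """Remove every sibling subtree that is contained in another sibling.
    Among equivalent siblings only the first one is kept."""
    label, kids = tree
    kids = [minimize(k) for k in kids]
    kept = []
    for i, k in enumerate(kids):
        redundant = any(
            embeds(k, other) and (not embeds(other, k) or j < i)
            for j, other in enumerate(kids) if j != i)
        if not redundant:
            kept.append(k)
    return (label, kept)

alice = ("name", [("Alice", [])])
bob = ("name", [("Bob", [])])
empty = ("name", [])

answer = ("group", [alice, empty, alice, bob])
canonical = minimize(answer)
print(canonical)
# ('group', [('name', [('Alice', [])]), ('name', [('Bob', [])])])
print(embeds(answer, canonical), embeds(canonical, answer))  # True True
```

The last line checks the property stated in the bullet above: the answer and its minimized form are contained in each other.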
30
Canonical Database
  • Canonical Database DB_CA
  • The minimal XML instance to generate CA

  for $x in /project return
    <group>
      for $y in /project/member return
        <name>
          where $y = Alice return <Alice/>
          where $y = Bob return <Bob/>
        </name>
    </group>

[Figure: a canonical answer CA with nodes group, name, name, Alice, Bob, and its canonical database DB]
31
Sound and Complete Conditions for Nested Query
Containment
  • Theorem 1. Q ⊆ Q' if and only if, for every
    canonical database DB of Q, Q(DB) ⊆ Q'(DB)
  • Theorem 2. Q ⊆ Q' if and only if, for every
    canonical answer CA of Q,
  • CA is a canonical answer of Q'
  • DB'_CA ⊆ DB_CA, where DB_CA and DB'_CA are the
    canonical databases of Q and Q' for CA

32
Query Containment Algorithm
  • Algorithm
  • for every canonical answer CA of Q do
  • check whether CA is a canonical answer of Q'
  • generate DB_CA and DB'_CA
  • check DB'_CA ⊆ DB_CA
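The loop above can be sketched in Python as follows. This is only an outline under stated assumptions: the final check is read as "the canonical database of the second query embeds into that of the first", canonical answers are referred to by made-up names, and the canonical databases are hand-written stand-ins for the two example queries rather than something the code derives. The list of canonical answers is hand-picked and not claimed to be complete.

```python
def embeds(t1, t2):
    """Homomorphism test: root to root, labels and child edges preserved."""
    (label1, kids1), (label2, kids2) = t1, t2
    return label1 == label2 and all(
        any(embeds(c1, c2) for c2 in kids2) for c1 in kids1)

def contained(canonical_answers, is_answer_of_other, db_of, db_of_other):
    """Every canonical answer of the first query must also be a canonical
    answer of the second, and the second query's canonical database for it
    must embed into the first query's."""
    for ca in canonical_answers:
        if not is_answer_of_other(ca):
            return False
        if not embeds(db_of_other(ca), db_of(ca)):
            return False
    return True

def project(*members):
    return ("project", [("member", [(m, [])]) for m in members])

# Hand-written stand-ins (assumptions), keyed by a name per canonical answer.
DB_Q = {"one_name": ("db", [project("Alice")]),
        "two_names": ("db", [project("Alice", "Bob")])}
DB_Q_PRIME = {"one_name": ("db", [project("Alice")]),
              "two_names": ("db", [project("Alice"), project("Bob")])}

answers = ["one_name", "two_names"]
print(contained(answers, DB_Q_PRIME.__contains__, DB_Q.get, DB_Q_PRIME.get))
print(contained(answers, DB_Q.__contains__, DB_Q_PRIME.get, DB_Q.get))
```

On these stand-ins the first call succeeds and the second fails, because a project holding both members cannot be mapped into a project holding only one of them.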

33
Roadmap
  • Introduction and problem definition
  • Containment of a subset of XML queries
  • Query containment is decidable
  • Query containment in practice
  • Relaxing the assumptions
  • Conclusions

34
Query Containment Algorithm
  • Algorithm
  • for every canonical answer CA of Q do
  • check whether CA is a canonical answer of Q'
  • generate DB_CA and DB'_CA
  • check DB'_CA ⊆ DB_CA
  • Polynomial in the size and number of canonical
    answers
  • What are the sizes of canonical answers?
  • What is the number of canonical answers?

35
Containment of XML Queries with Fanout 1
  • E.g., d = 3 (the depth), m = 1 (the maximum fanout)
  • Canonical Answers and Complexity
  • Number: the depth of the query
  • Size: bounded by the depth of the query
  • Complexity: O(d·|Q|·|Q'|)
  • Theorem: Testing containment of XML queries with
    fanout 1 is in PTIME

  for $x in /project return
    <group>
      for $y in /project/member return
        <name>
          where $y = Alice return <Alice/>
        </name>
    </group>

[Figure: three canonical answers: group; group with name; group with name with Alice]
Nesting with fanout 1 does not increase complexity
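For a fanout-1 query the head tree is a single chain, so the candidate canonical answers are just its prefixes, one per level of depth. A minimal Python sketch, assuming the chain is given as a list of tags from the root down (the function name `chain_prefixes` is hypothetical):

```python
def chain_prefixes(head_chain):
    """Prefixes of a fanout-1 head tree, from the root alone to the full chain."""
    return [head_chain[:k] for k in range(1, len(head_chain) + 1)]

for prefix in chain_prefixes(["group", "name", "Alice"]):
    print(" / ".join(prefix))
# group
# group / name
# group / name / Alice
```

The number of prefixes equals the depth of the chain, which matches the count of canonical answers stated on this slide.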
36
Roadmap
  • Introduction and problem definition
  • Containment of a subset of XML queries
  • Query containment is decidable
  • Query containment in practice
  • Relaxing the assumptions
  • Conclusions

37
Containment of XML Queries with Arbitrary Fanout
  • E.g., d = 4 (the depth), m = 3 (the maximum fanout)
  • Canonical Answers Complexity
  • Number
  • Size
  • Theorem: Testing containment of XML queries with
    depth 2 and arbitrary fanout is coNP-hard

38
Roadmap
  • Introduction and problem definition
  • Containment of a subset of XML queries
  • Query containment is decidable
    (NOT TIGHT)
  • Query containment in practice
  • Relaxing the assumptions
  • Conclusions

39
Effect of the Depth on Containment of XML Queries
  • Insight: Kernel Canonical Answer
  • The root node has a single child
  • In any subtree, a path pattern is repeated no
    more than cd times.
  • d: query depth
  • c: maximum number of path steps in a query block
  • The size of kernel canonical answers
  • Polynomial in the query size
  • Exponential in the query depth
  • Theorem
  • Testing containment of XML queries with fixed
    depth is coNP-complete
  • Testing containment of XML queries with arbitrary
    depth is in coNEXPTIME

40
Roadmap
  • Introduction and problem definition
  • Containment of a subset of XML queries
  • Query containment is decidable
  • Query containment in practice
  • Relaxing the assumptions
  • Conclusions

41
Containment Checking in Practice
  • Analyze element cardinality to reduce the number
    of canonical answers for containment checking
  • Canonical answers: originally 71, after
    analysis 2

Q:
  for $g in /group where $g/gname/text() = database return
    <area>
      for $p in $g/person return
        <person>
          <name>$p/text()</name>
          for $q in $g/paper where $q/author/text() = $p/text() return
            <paper>$q/title/text()</paper>
        </person>
    </area>
Q':
  for $g in /group return
    <area>
      for $p in $g/person return
        <person>
          <name>$p/text()</name>
          <group>$g/gname/text()</group>
          for $q in $g/paper where $q/author/text() = $p/text() return
            <paper>$q/title/text()</paper>
        </person>
    </area>
42
Roadmap
  • Introduction and problem definition
  • Containment of a subset of XML queries
  • Query containment is decidable
  • Query containment in practice
  • Relaxing the assumptions
  • Conclusions

43
An Example Query that Returns Tag Variables
for $x in dbGrp return
  <result>
    for $y in $x/proj return
      <group>
        for $u in $y/member return
          <name> $u/text() </name>
        for $v in $y/paper return
          <pub> $v/text() </pub>
      </group>
  </result>
44
Deciding Query Containment
  • Leverage previous results: simulation mapping
    [Levy and Suciu, PODS 1997]
  • Check query simulation mapping for every
    canonical answer
  • Complexity
  • Simulation mapping can be checked in polynomial
    time in terms of query size
  • The complexity of checking containment does not increase

45
Other Extensions
46
Conclusions
  • Contributions
  • A sound and complete condition for containment of
    nested XML queries
  • Detailed complexity analysis
  • Future work
  • Fill in the open gap of complexity in case of
    queries with arbitrary fanout and arbitrary
    nesting depth
  • Evaluate and optimize the containment algorithm
    with element cardinality analysis
  • Answering nested XML queries using views

47
Containment of Nested XML Queries
  • @ VLDB 2004
  • Xin (Luna) Dong, Alon Halevy, Igor Tatarinov
  • University of Washington
  • www.cs.washington.edu/homes/lunadong