Title: Evaluation of Partial Path Queries on XML Data
Stefanos Souldatos (NTUA, GREECE), Xiaoying Wu (NJIT, USA), Dimitri Theodoratos (NJIT, USA), Theodore Dalamagas (NTUA, GREECE), Timos Sellis (NTUA, GREECE)
Outline
- Partial path queries
- Query processing
- Query evaluation
- Experiments
- Conclusion
Difficulties on Querying XML Data
(Figure: Creta)
Search problem 1
- Name: Xiaoying Wu
- Place: Athens Center, Heraklio
- Purpose: Sightseeing
- Problem: structural difference
(Figure: Parthenon (438 BC), Phaistos Disk (1700 BC))
Search problem 2
- Name: Theodore Dalamagas
- Place: Islands
- Purpose: Sea sports
- Problem: structural inconsistency
(Figure: Windsurf, Jet ski)
Search problem 3
- Name: Dimitri Theodoratos
- Place: Heraklio
- Purpose: HDMS Conference
- Problem: unknown structure
(Figure: HDMS 2008)
Search problem 4
- Name: Stefanos Souldatos
- Place: Any island
- Purpose: Escape from PhD!
- Problem: multiple sources
(Figure: theHotel.gr, hotels.gr, holidays.gr; 1400 islands)
- Can we use existing query languages (XPath, XQuery) to express our queries?
- Can we use existing techniques to evaluate our queries?
Path Queries in XPath
Degrees of structure: no structure (keywords), partial path queries, full structure (path patterns).
- //theHotel.gr descendant-or-self ancestor-or-self City ancestor-or-self Island
- //theHotel.gr//City descendant-or-self ancestor-or-self Island
- /theHotel.gr/City//Island
Partial Path Queries
A partial path query (figure) is built from:
- a root node r (optional)
- query nodes, each labelled by a label such as a
- child relationships (x/y)
- descendant relationships (x//y)
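To make these building blocks concrete, here is a minimal Python sketch of one possible in-memory representation, assuming a query is stored as labelled nodes plus typed edges. The class `PartialPathQuery`, its fields and the `CHILD`/`DESCENDANT` constants are hypothetical names introduced for illustration; they are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

CHILD = "/"        # x/y: y is a child of x
DESCENDANT = "//"  # x//y: y is a descendant of x


@dataclass
class PartialPathQuery:
    """Hypothetical container for a partial path query (illustration only)."""

    labels: dict = field(default_factory=dict)  # query node id -> label
    edges: set = field(default_factory=set)     # {(x, kind, y)}
    root: Optional[str] = None                  # the root node r is optional

    def add_node(self, node: str, label: str) -> None:
        self.labels[node] = label

    def add_edge(self, x: str, kind: str, y: str) -> None:
        # Only the two relationship kinds of a partial path query are allowed.
        if kind not in (CHILD, DESCENDANT):
            raise ValueError(f"unknown relationship {kind!r}")
        known = set(self.labels) | ({self.root} if self.root else set())
        if x not in known or y not in known:
            raise KeyError("both endpoints must be query nodes")
        self.edges.add((x, kind, y))


# Usage: r//theHotel.gr and theHotel.gr//City; where Island goes is left open.
q = PartialPathQuery(root="r")
q.add_node("h", "theHotel.gr")
q.add_node("c", "City")
q.add_node("i", "Island")
q.add_edge("r", DESCENDANT, "h")
q.add_edge("h", DESCENDANT, "c")
```

Storing edges as a flat set keeps the graph general: nothing forces the nodes into a single chain, which is what distinguishes a partial path query from a path query.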
Processing pipeline: partial path query → query processing → partial path query in canonical form → query evaluation
Query Processing
- Full form
- Satisfiability
- Redundant nodes
- Canonical form
Full form: inference rules (x, y, z, w are query nodes; a_i and b_j are nodes labelled by a and b, respectively)
- (IR1) ⊢ r//a_i
- (IR2) x/y ⊢ x//y
- (IR3) x//y, y//z ⊢ x//z
- (IR4) x/a_i, x//b_j ⊢ a_i//b_j
- (IR5) a_i/x, b_j//x ⊢ b_j//a_i
- (IR6) x/y, y/w, x//z, z//w ⊢ x/z
- (IR7) x/y, x//z, w/z, w//y ⊢ x/z
- (IR8) x/y, y/w, x/z ⊢ z/w
- (IR9) x//y, y//w, x/z ⊢ z//w
- (IR10) x/y, w/y, w/z ⊢ x/z
- (IR11) x//y, w/y, w//z ⊢ x//z
- (IR12) x/y, y/w, z/w ⊢ x/z
- (IR13) x//y, y//w, z/w ⊢ x//z
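The following Python sketch applies only IR1 to IR3 until no new edge appears, to show how such a closure can be computed. The function name `full_form_ir1_ir3` and the representation of edges as sets of `(x, y)` pairs are assumptions of this example. A complete full form would also apply IR4 to IR13, which can add child edges, after which IR2 and IR3 would have to run again.

```python
def full_form_ir1_ir3(nodes, child, desc, root=None):
    """Close a query's edges under IR1-IR3 only (a partial sketch of the full form).

    nodes: query node ids other than the root
    child: set of (x, y) pairs meaning x/y
    desc:  set of (x, y) pairs meaning x//y
    root:  optional root node id r
    """
    child, desc = set(child), set(desc)
    if root is not None:
        desc |= {(root, n) for n in nodes if n != root}  # IR1: r//a_i
    desc |= child                                        # IR2: x/y gives x//y
    changed = True
    while changed:                                       # IR3: x//y, y//z gives x//z
        changed = False
        for x, y in list(desc):
            for y2, z in list(desc):
                if y == y2 and (x, z) not in desc:
                    desc.add((x, z))
                    changed = True
    return child, desc


# Usage: r (root), a/b and b//c.
child, desc = full_form_ir1_ir3(["a", "b", "c"], {("a", "b")}, {("b", "c")}, root="r")
print(sorted(desc))
```

The loop is a naive fixpoint iteration; it is cubic in the number of edges, which is acceptable for queries with a handful of nodes.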
Satisfiability: a query is unsatisfiable if its full form contains a trivial cycle.
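A hedged Python sketch of this check follows. It assumes that a trivial cycle means a node x for which x//x is derivable. With only IR2 and IR3, this happens exactly when x lies on a directed cycle of / and // edges; consequences of IR4 to IR13 are not considered. The function names are hypothetical.

```python
from collections import defaultdict


def has_trivial_cycle(nodes, child, desc):
    """True if some node x would have to satisfy x//x.

    Sketch: uses only IR2 (x/y gives x//y) and IR3 (transitivity), so x//x
    is derivable exactly when x lies on a directed cycle of / and // edges.
    """
    succ = defaultdict(set)
    for x, y in set(child) | set(desc):
        succ[x].add(y)
    for start in nodes:
        # Depth-first search from start; reaching start again means start//start.
        stack, seen = list(succ[start]), set()
        while stack:
            n = stack.pop()
            if n == start:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ[n])
    return False


def is_satisfiable(nodes, child, desc):
    return not has_trivial_cycle(nodes, child, desc)
```

For example, a/b together with b//a is rejected, because IR2 and IR3 derive a//a.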
Redundant nodes: a node y is redundant if one of the following patterns occurs.
(Figure: patterns a) to d))
Canonical form: the canonical form of a satisfiable query is obtained from its full form by removing the edges derivable by IR2 and IR3 and removing the redundant nodes.
The canonical form of a query is a directed acyclic graph (dag).
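Reading the slide as "full form minus the edges derivable by IR2 and IR3 minus the redundant nodes", the edge-removal part can be sketched in Python as below. This is an illustration under that assumption, with a hypothetical function name. Redundant-node removal depends on patterns a) to d), which are not reproduced here, so it is omitted.

```python
def drop_ir2_ir3_edges(child, desc):
    """Remove the // edges of a full form that IR2 or IR3 could re-derive.

    Expects desc to be closed under IR2/IR3 and acyclic (a satisfiable full form).
    Child edges are always kept.
    """
    nodes = {n for edge in desc for n in edge}
    kept = set()
    for x, z in desc:
        if (x, z) in child:
            continue  # IR2 re-derives x//z from x/z
        if any((x, y) in desc and (y, z) in desc for y in nodes - {x, z}):
            continue  # IR3 re-derives x//z from x//y and y//z
        kept.add((x, z))
    return set(child), kept


# Usage: full form of a query with root r, a/b and b//c.
full_child = {("a", "b")}
full_desc = {("r", "a"), ("r", "b"), ("r", "c"), ("a", "b"), ("b", "c"), ("a", "c")}
child, desc = drop_ir2_ir3_edges(full_child, full_desc)
```

On an acyclic, transitively closed edge set this is a transitive reduction, so removing all these edges at once still lets IR2 and IR3 regenerate the full edge set. The example keeps r//a, a/b and b//c.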
Evaluation Algorithms
- Based on PathStack (Bruno et al. 02)
  - Produce all possible path queries
  - Decompose into root-to-leaf paths
  - PartialMJ: decompose a spanning tree into paths
- Extending PathStack (Bruno et al. 02)
  - PartialPathStack: produce a topological order of the query nodes and extend PathStack to handle it
Based on PathStack
1. Producing all possible path queries
(Figure: example partial path query over the nodes r, a, b, c, d, e, f, g)
(Figure: the four path queries produced from the example query, each a path over the nodes r, a, b, c, d, e, f, g)
Problems:
- too many queries to evaluate
- multiple traversals of the XML tree
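Producing all possible path queries amounts to enumerating the topological orders (linear extensions) of the query dag. Each order fixes one sequence of query nodes for a path query, although how edge types carry over is not derived here. The backtracking sketch below is a standard enumeration technique, not the authors' code. The example adjacency dict is a hypothetical reading of the example figure in which b, c and e, f are incomparable. It yields four orders and shows why the number of queries grows quickly.

```python
def all_topological_orders(succ):
    """Enumerate every topological order (linear extension) of a DAG by backtracking."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    indeg = {n: 0 for n in nodes}
    for vs in succ.values():
        for v in vs:
            indeg[v] += 1
    order, placed, result = [], set(), []

    def extend():
        if len(order) == len(nodes):
            result.append(list(order))
            return
        for n in sorted(nodes - placed):
            if indeg[n] == 0:
                # Place n next, then undo the choice after exploring it.
                order.append(n)
                placed.add(n)
                for v in succ.get(n, ()):
                    indeg[v] -= 1
                extend()
                for v in succ.get(n, ()):
                    indeg[v] += 1
                placed.discard(n)
                order.pop()

    extend()
    return result


# Hypothetical example dag: b, c unordered and e, f unordered.
example = {"r": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"],
           "d": ["e", "f"], "e": ["g"], "f": ["g"]}
orders = all_topological_orders(example)
```

Every pair of incomparable nodes can double the count, and each resulting path query needs its own pass over the data. This is why the two problems listed above arise.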
2. Decomposing into root-to-leaf paths
(Figure: the root-to-leaf paths of the example query, each evaluated with PathStack)
Problems:
- path overlaps
- more than one component to evaluate
- intermediate results
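A small Python sketch of the decomposition step follows. It uses a hypothetical dag over r, a, b, c, d, e, f, g, where `succ` maps a node to its successors. Each path would then be evaluated separately with PathStack. In this example all four paths share r, a, d and g, which is the path overlap listed above.

```python
def root_to_leaf_paths(succ, root):
    """List every root-to-leaf path of a DAG given as an adjacency dict."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        children = succ.get(node, [])
        if not children:  # a leaf (sink) ends one path
            paths.append(prefix)
        for child in children:
            walk(child, prefix)

    walk(root, [])
    return paths


# Hypothetical example dag (b, c and e, f are incomparable).
example = {"r": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"],
           "d": ["e", "f"], "e": ["g"], "f": ["g"]}
paths = root_to_leaf_paths(example, "r")
```

Four separate PathStack runs would each produce partial matches that must later be combined, which is where the intermediate results come from.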
PartialMJ: using a spanning tree
- Remove edges to create a spanning tree.
- Decompose the spanning tree into paths and evaluate each path with PathStack.
- Join the path results using join conditions (identity, structural, path).
Problems:
- path overlaps
- more than one component to evaluate
- intermediate results
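One simple way to obtain a spanning tree is a breadth-first traversal that keeps the first edge reaching each node and removes the rest. The sketch below uses that choice, which is an assumption, because the slides do not say which edges PartialMJ removes. The removed edges are presumably the constraints the join step would still have to check. The function name and the example dag over r, a, b, c, d, e, f, g are hypothetical.

```python
from collections import deque


def spanning_tree(succ, root):
    """BFS spanning tree: keep the first edge reaching each node, report the others."""
    parent = {root: None}
    tree, removed = {}, []
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in succ.get(x, []):
            if y not in parent:
                parent[y] = x
                tree.setdefault(x, []).append(y)
                queue.append(y)
            else:
                removed.append((x, y))  # y already reached: edge dropped from the tree
    return tree, removed


# Hypothetical example dag (b, c and e, f are incomparable).
example = {"r": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"],
           "d": ["e", "f"], "e": ["g"], "f": ["g"]}
tree, removed = spanning_tree(example, "r")
```

Here the tree has the paths r a b d e g, r a b d f and r a c. Node d appears once but is reachable from c in the query, so the path results have to be matched on node identity and on the dropped edges c→d and f→g.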
Extending PathStack
PartialPathStack: employ a topological order of the query nodes.
(Figure: the example query over the nodes r, a, b, c, d, e, f, g and a topological order of its nodes, processed by PartialPathStack)
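PartialPathStack needs a topological order of the query nodes. A standard way to compute one is Kahn's algorithm, sketched below in Python on a hypothetical dag over r, a, b, c, d, e, f, g. It also rejects cyclic input, which a satisfiable canonical form should never produce. The function name is hypothetical, and ties are broken by adjacency-list order, so other valid orders exist.

```python
from collections import deque


def topological_order(succ):
    """Kahn's algorithm: repeatedly emit a node whose predecessors are all emitted."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    indeg = {n: 0 for n in nodes}
    for vs in succ.values():
        for v in vs:
            indeg[v] += 1
    ready = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for v in succ.get(n, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("query graph has a cycle")
    return order


# Hypothetical example dag (b, c and e, f are incomparable).
example = {"r": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"],
           "d": ["e", "f"], "e": ["g"], "f": ["g"]}
order = topological_order(example)
```

A single order covers the whole dag in linear time, in contrast to enumerating every order as separate path queries.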
PartialPathStack Example
(Figure: panels "query", "tree" and "results"; the tree contains the nodes r, a1, b1, c1, c2, d1, d2, e1, e2, and the sink nodes of the query are marked)
Results: (r, a1, b1, d1, c1, e1), (r, a1, b1, d1, c2, e1), (r, a1, b1, d1, c1, e2)
- only one component to evaluate
- no intermediate results
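To check the output of an implementation on tiny inputs, a brute-force reference can enumerate every mapping of query nodes to tree nodes. The sketch below assumes that an answer maps each query node to a tree node with the same label, respects the / and // edges, and places all images on one root-to-leaf path. It is exponential and is not the PartialPathStack algorithm. The tree, the query and the function names are all hypothetical.

```python
from itertools import combinations, product


def ancestors(parent, n):
    """Proper ancestors of tree node n, nearest first."""
    out = []
    while parent[n] is not None:
        n = parent[n]
        out.append(n)
    return out


def on_one_path(parent, tree_nodes):
    """True if the given tree nodes all lie on a single root-to-leaf path."""
    return all(u == v or u in ancestors(parent, v) or v in ancestors(parent, u)
               for u, v in combinations(tree_nodes, 2))


def brute_force_answers(tree_label, parent, query_label, child, desc):
    """Label-, /- and //-respecting mappings whose images lie on one path."""
    qnodes = sorted(query_label)
    candidates = [[t for t in tree_label if tree_label[t] == query_label[q]]
                  for q in qnodes]
    answers = []
    for images in product(*candidates):
        m = dict(zip(qnodes, images))
        if not all(parent[m[y]] == m[x] for x, y in child):
            continue
        if not all(m[x] in ancestors(parent, m[y]) for x, y in desc):
            continue
        if on_one_path(parent, images):
            answers.append(m)
    return answers


# Hypothetical tree: t0(r) -> t1(a) -> t2(b) -> t3(c), and a second branch t1 -> t4(c).
tree_label = {"t0": "r", "t1": "a", "t2": "b", "t3": "c", "t4": "c"}
parent = {"t0": None, "t1": "t0", "t2": "t1", "t3": "t2", "t4": "t1"}
# Hypothetical query: R/A, A//B, A//C, with the order of B and C left open.
query_label = {"R": "r", "A": "a", "B": "b", "C": "c"}
answers = brute_force_answers(tree_label, parent, query_label,
                              {("R", "A")}, {("A", "B"), ("A", "C")})
```

In this data C could match t3 or t4, but only t3 lies on the same path as B's image t2, so a single answer remains.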
PartialPathStack vs PathStack
- PathStack
  - Path queries
  - Indegree 1
  - Outdegree 1
  - O(input + output)
- PartialPathStack
  - Partial path queries
  - Indegree > 1
  - Outdegree > 1
  - O(input · indegree + output · outdegree)
Queries Used in the Experiments
(Figure: query pairs Q1/Q5, Q2/Q6, Q3/Q7, Q4/Q8)
Experiment 1
- Execution time on Treebank (2.5 million nodes). (Figure annotations: "path queries", "too many results")
- Execution time on synthetic data (2.5 million nodes, generated with the IBM AlphaWorks XML generator).
Experiment 2
Execution time varying the size of the XML tree (1 to 3 million nodes).
(Figure: panels Q2, Q3 and Q7, each comparing PartialMJ and PartialPathStack)
Conclusion
Questions?