Title: Honey, I Shrunk the XQuery!
1Honey, I Shrunk the XQuery! An XML Algebra
Optimization Approach
- Xin Zhang, Bradford Pielech
- and
- Elke A. Rundensteiner
2XML and Relational
The look and feel of an XML query system combined
with the maturity and technology support of RDB
Rainbow
3 Architecture
Rainbow
XAT Merger
XAT Generator
XAT Decorrelator
User Query Results in XML
User XAT
User XAT
User XQuery
XAT Executor
XAT Optimizer
SQL Generator
XAT
XAT
XAT
Tuples
XAT
View XAT
View XAT
View XQuery
SQL
RDBMS
XAT XML Algebra Tree
4GOAL XQuery level optimization
5Running Example
ltresultgt FOR t IN document(prices.xml)/book
/title RETURN t lt/resultgt
ltpricesgt ltbookgt lttitlegtTCP/IP
Illustratedlt/titlegt ltpricegt65.95lt/pricegt lt/book
gt ltbookgt lttitlegtData on the Weblt/titlegt ltpric
egt34.95lt/pricegt lt/bookgt lt/pricesgt
6User XML Algebra Tree (XAT)
User Query
?col3
1
ltresultgt FOR t IN document(prices.xml)/book
/title RETURN t lt/resultgt
2
Tltresultsgttlt/resultgtcol3
Agg
3
?R0, book/titlet
6
Sprices.xmlR0
7
7View XML Algebra Tree (XAT)
View Query
ltpricesgt FOR book IN document(dxv.xml)/book/ro
w prices IN document(dxv.xml)/prices/row WHE
RE book/bid prices/bid RETURN ltbookgt bo
ok/title, prices/price lt/bookgt lt/pricesgt
?col6col7
26
?book, bidcol6
27
?prices, bidcol7
28
?
31
Tltbookgt col10col12 lt/bookgtcol5
22
23
Tltpricesgtcol5lt/pricesgtcol4
?book, titlecol10
11
?R3, /prices/rowprices
?R1, /book/rowbook
20
14
Agg
?prices, pricecol12
12
25
15
Sdxv.xml R3
Sdxv.xml R1
21
8Merged XML Algebra Tree (XAT)
User Query
View Query
?col6col7
?col3
26
1
2
Tltpricesgtcol5lt/pricesgtcol4
?book, bidcol6
Tltresultsgttlt/resultgtcol3
11
27
Agg
3
Agg
?prices, bidcol7
12
28
?R0, book/titlet
Tltbookgt col10col12 lt/bookgtcol5
6
?
22
31
?col4 R0
23
7
?book, titlecol10
?R3, /prices/rowprices
?R1, /book/rowbook
20
14
?prices, pricecol12
25
15
Sdxv.xml R3
Sdxv.xml R1
21
9Outline
- XAT Optimization
- XAT Rewrite
- XAT Cleanup
- Preliminary Evaluation
- Related Work
- Summary
10XAT Rewrite
- Query Optimization at Logic Level.
- Goal
- Redundancy Elimination.
- Computation Pushdown.
- Technique
- Equivalence Rewrite Rules.
- Heuristics
- Pushdown Navigates
- Remove Construction of Intermediate Result
- Combine Multiple Operators.
11Before Navigation Pushdown
User Query
View Query
?col6col7
?col3
26
1
?book, bidcol6
2
Tltresultsgttlt/resultgtcol3
27
Agg
3
?prices, bidcol7
28
Tltbookgt col10col12 lt/bookgtcol5
?R0, book/titlet
?
6
22
31
?col4 R0
23
7
?book, titlecol10
?R3, /prices/rowprices
?R1, /book/rowbook
20
Tltpricesgtcol5lt/pricesgtcol4
11
14
?prices, pricecol12
25
15
Agg
Sdxv.xml R3
Sdxv.xml R1
12
21
12After Navigation Pushdown
View Query
User Query
Tltbookgt col10col12 lt/bookgtcol5
22
?col3
1
?col6col7
26
2
Tltresultsgttlt/resultgtcol3
?
31
Agg
3
?prices, pricecol12
?book, titlecol10
23
25
?R0, book/titlet
6
?prices, bidcol7
?book, bidcol6
28
27
Tltpricesgtcol5lt/pricesgtR0
11
Agg
12
13After Tagger Cancel Out
View Query
User Query
JOIN col6col7
31
?prices, pricecol12
25
?prices, bidcol7
28
?R3, /prices/rowprices
20
Sdxv.xml R3
21
14Outline
- XAT Optimization
- XAT Rewrite
- XAT Cleanup
- Preliminary Evaluation
- Related Work
- Summary
15XAT Cleanup
- Why
- SQL engine cannot reduce redundancy in XQuery.
- How
- Data Redundancy by Schema Cleanup
- Each operator produced, consumed and modified
some columns. - Minimum schema is then computed.
- Tree Redundancy by Unused Operator Cutting
- Cutting matrix generation.
- Required columns analysis.
- Operator cutting.
16XAT Operator Properties
- Produced
- Desc New column generated by operator.
- Example ?, S, T
- Consumed
- Desc Columns required by operator.
- Example ?, ?
- Modified
- Desc Columns modified by operator.
- Example ?, ?, ?
17Schema Computation
Old Schema
Consumed
Produced
Parent
Node
col3
col3
1
1
?col3
col3, R1, book, col6, t, R3, prices, col7,
col12
t
col3
1
2
Tltresultsgttlt/resultgtcol3
2
R1, book, col6, t, R3, prices, col7, col12
2
3
Agg
3
R1, book, col6, t, R3, prices, col7, col12
col6, col7
3
31
? col6col7
31
R1, book, col6, t
book
t
31
23
?prices, pricecol12
25
?book, titlet
23
R1, book, col6
book
col6
23
27
R1, book
R1
book
27
14
28
?prices, bidcol7
?book, bidcol6
27
R1
R1
14
15
20
R3, prices, col7, col12
prices
col12
31
25
?R3, /prices/rowprices
?R1, /book/rowbook
14
R3, prices, col7
prices
col7
25
28
Sdxv.xml R3
21
R3, prices
R3
prices
28
20
Sdxv.xml R1
15
R3
R3
20
21
18Schema Computation
Intuition Dont keep anything thats not used
later.
New Schema
R3
prices
col12
R1
book
col7
col6
t
col3
Parent()
1
col3
C
1
?col3
col3
C
P
1
2
Tltresultsgttlt/resultgtcol3
2
t
2
3
t
C
C
3
31
Agg
3
col6, t
C
P
31
23
book, col6
C
P
23
27
? col6col7
31
book
C
P
27
14
?prices, pricecol12
25
?book, titlet
23
R1
P
14
15
col7, col12
C
P
31
25
28
?prices, bidcol7
?book, bidcol6
27
prices, col7
C
P
25
28
20
?R3, /prices/rowprices
?R1, /book/rowbook
prices
C
P
28
20
14
R3
P
20
21
Sdxv.xml R3
21
Sdxv.xml R1
We assume Join didnt modify t. Otherwise, only
node 25 will be deleted.
15
19Schema Cleanup Result
Node Original Schema Minimum Schema
1 col3, R1, book, col6, t, R3, prices, col7, col12 col3
2 col3, R1, book, col6, t, R3, prices, col7, col12 col3
3 R1, book, col6, t, R3, prices, col7, col12 t
31 R1, book, col6, t, R3, prices, col7, col12 t
23 R1, book, col6, t col6, t
27 R1, book, col6 book, col6
14 R1, book book
15 R1 R1
25 R3, prices, col7, col12 col7, col12
28 R3, prices, col7 prices, col7
20 R3, prices prices
21 R3 R3
20XAT Cleanup
- Schema Cleanup
- Each operator produced, consumed and modified
some columns. - Minimum schema is then computed.
- Unused Operator Cutting
- Cutting matrix generation.
- Required columns analysis.
- Operator cutting.
21Cutting Matrix
- Purpose
- Get rid of the unused operators.
- Equations
- Propagation of modified
- Propagation of required
- Identify cuttable node.
22Matrix Computation
1
?col3
Parent() col3 t col6 col7 book R1 col12 prices R3 Cut?
1 C
2 1 P C
3 2 - - - - - - - - -
31 3 C C
23 31 P C
27 23 P C
14 27 P C
15 14 P
25 31 P C
28 25 P C
20 28 P C
21 20 P
Tltresultsgttlt/resultgtcol3
2
Agg
3
JOIN col6col7
31
?prices, pricecol12
25
?book, titlet
23
28
?prices, bidcol7
?book, bidcol6
27
20
?R3, /prices/rowprices
?R1, /book/rowbook
We assume Join didnt modify t. Otherwise, only
node 25 will be deleted.
14
Sdxv.xml R3
21
Sdxv.xml R1
15
23Matrix Computation (Cont.1)
Intuition Give me only the required columns in
order to get the final result.
1
?col3
Cut?
R3
prices
col12
R1
book
col7
col6
t
col3
Parent()
Tltresultsgttlt/resultgtcol3
2
R
R
R
R
1
C
P
1
2
-
-
-
-
-
-
-
M
-
2
3
Agg
3
C
C
3
31
C
P
31
23
JOIN col6col7
31
C
P
23
27
C
P
27
14
?prices, pricecol12
25
?book, titlet
23
P
14
15
C
P
31
25
28
?prices, bidcol7
C
P
25
28
?book, bidcol6
27
C
P
28
20
20
P
20
21
?R3, /prices/rowprices
?R1, /book/rowbook
We assume Join didnt modify t. Otherwise, only
node 25 will be deleted.
14
Sdxv.xml R3
21
Sdxv.xml R1
15
24Matrix Computation (Cont. 2)
1
?col3
Parent() col3 t col6 col7 book R1 col12 prices R3 Cut?
1 R R R R
2 1 P C
3 2 - M - - - - - - -
31 3 C C X
23 31 P C
27 23 P C X
14 27 P C
15 14 P
25 31 P C X
28 25 P C X
20 28 P C X
21 20 P X
Tltresultsgttlt/resultgtcol3
2
Agg
3
JOIN col6col7
31
?prices, pricecol12
25
?book, titlet
23
28
?prices, bidcol7
?book, bidcol6
27
20
?R3, /prices/rowprices
?R1, /book/rowbook
We assume Join didnt modify t. Otherwise, only
node 25 will be deleted.
14
Sdxv.xml R3
21
Sdxv.xml R1
15
25XAT after Cutting
1
?col3
?col3
1
Tltresultsgttlt/resultgtcol3
2
Tltresultsgttlt/resultgtcol3
Agg
3
2
Agg
JOIN col6col7
31
3
Reduced To
?prices, pricecol12
25
?book, titlet
?book, titlet
23
23
?R1, /book/rowbook
28
?prices, bidcol7
14
?book, bidcol6
27
20
Sdxv.xml R1
15
?R3, /prices/rowprices
?R1, /book/rowbook
14
Sdxv.xml R3
21
Sdxv.xml R1
15
26SQL Generated
SELECT book.title as t, book.bid as
col6, prices.price as col12, prices.bi
d as col7 FROM book book, prices
prices WHERE col6col7
1
?col3
?col3
1
Tltresultsgttlt/resultgtcol3
2
Tltresultsgttlt/resultgtcol3
2
Agg
3
Agg
3
31
JOIN col6col7
?book, titlet
23
?prices, pricecol12
25
23
?book, titlet
?R1, /book/rowbook
14
28
?prices, bidcol7
27
?book, bidcol6
Sdxv.xml R1
15
20
?R3, /prices/rowprices
?R1, /book/rowbook
14
SELECT book.title as t, FROM book book,
Sdxv.xml R3
21
15
Sdxv.xml R1
27Outline
- XAT Optimization
- XAT Rewrite
- XAT Cleanup
- Preliminary Evaluation
- Related Work
- Summary
28Preliminary Evaluation
- Experiment Setup
- XQuery over Kweelt Parser
- PIII800 256 MB, Win 2k Pro.
- Data Setup
- Synthetic Data
- Synthetic Queries
- Query Execution
- Native XML Engine.
29Performance Gain in Execution
30Query Engine Overhead
XAT Optimizer
XAT Merger
XAT Generator
XAT Decorrelator
User XAT
User XAT
XAT Rewrite
User XQuery
XAT
XAT
XAT
XAT Executor
SQL Generator
XAT
View XAT
View XAT
XAT Cleanup
View XQuery
Total 32,522 ms
31Outline
- XAT Optimization
- XAT Rewrite
- XAT Cleanup
- Preliminary Evaluation
- Related Work
- Summary
32Related Work
- Rainbow
- Optimize on XAT. (static analysis)
- Algebra level rewriting.
- SQL Optimization
- Algebra based optimization.
- Static analysis.
- XQuery by Views Optimize in SQL.
- XPERANTOVLDBJ2000 XQGM vs. XAT
- Extension by UDFs for XML features.
- SilkRouteIEEE2001(242)
- Generate SQL Efficiently.
- AGORAVLDB2000
- Syntax level rewriting.
33Summary
- Efficient XQuery Processing
- XML Algebra Tree (XAT)
- XAT Optimization
- Rewrite by using equivalent rules
- Cleanup
- Schema cleanup
- Operator cutting
- Prototype system implementation.
34Questions?(Futures!)
- http//davis.wpi.edu/dsrg/rainbow
- https//sourceforge.net/projects/rainbow-engine/
- Special Thanks
- Brian Murphy, Luping Ding, DSRG group.
35(No Transcript)
36Schema Computation
Node Parent Produced Consumed Minimum Schema
1 col3 col3
2 1 col3 t col3
3 2 t
31 3 col6, col7 t
23 31 t book col6, t
27 23 col6 book book, col6
14 27 book R1 book
15 14 R1 R1
25 31 col12 prices col7, col12
28 25 col7 prices prices, col7
20 28 prices R3 prices
21 20 R3 R3
?col3
1
Tltresultsgttlt/resultgtcol3
2
Agg
3
? col6col7
31
?prices, pricecol12
25
?book, titlet
23
?prices, bidcol7
28
?book, bidcol6
27
?R3, /prices/rowprices
20
?R1, /book/rowbook
14
Sdxv.xml R3
21
Sdxv.xml R1
15
37After Tagger Cancel Out
User Query
View Query
?col6col7
26
?
31
?book, titlet
23