Title: Schema-Based Query Optimization for XQuery over XML Streams
1. Schema-Based Query Optimization for XQuery over XML Streams
- Hong Su
- Elke A. Rundensteiner
- Murali Mani
- Worcester Polytechnic Institute, Massachusetts, USA
- VLDB 2005
2. Schema-Based Query Optimization (SQO)
- Schema knowledge can be utilized to optimize queries
- Well studied in deductive/relational databases
  - Join elimination
  - Predicate elimination
  - Detection of empty answer sets
- Equally applicable to XML for flat value filtering
3. SQO for XML Pattern Retrieval
- General XML SQO
  - Applicable to both static and streaming XML
  - E.g., query tree minimization [Amer-Yahia02]
- Static XML specific SQO
  - Focus on expediting random access of data
  - E.g., query rewrite using extents (indices built on element types) [Fernandez98]
- Stream specific XML SQO
  - Focus on expediting token-by-token sequential access of data
4. Stream Specific SQO Example
Input tokens: <seller><sameAddr><url><url></seller>
Without schema:
- Buffer seller element
- Retrieve /shipTo
With schema <!element seller((billTo,shipTo)|sameAddr, ...)>:
- Buffer seller element
- Retrieve /shipTo
- Retrieve /sameAddr
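The saving in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Raindrop implementation: the function name `scan_seller`, the list-of-tag-names input, and the `ending_marks` parameter are hypothetical, and the schema is assumed to make `sameAddr` exclusive with `shipTo`.

```python
# Hypothetical sketch: count how many child tokens of one seller element
# must be examined while looking for /shipTo.

def scan_seller(child_tags, ending_marks=frozenset()):
    """Return (found_shipTo, tokens_examined) for one seller element.

    child_tags: start-tag names of the seller's children, in document order.
    ending_marks: tags whose appearance proves shipTo can no longer occur
                  (assumed to have been derived from the schema).
    """
    examined = 0
    for tag in child_tags:
        examined += 1
        if tag == "shipTo":
            return True, examined
        if tag in ending_marks:
            # The schema rules out shipTo from here on: stop early.
            return False, examined
    return False, examined


children = ["sameAddr", "url", "url"]
print(scan_seller(children))                             # (False, 3)
print(scan_seller(children, ending_marks={"sameAddr"}))  # (False, 1)
```

Without schema knowledge the whole seller element is scanned before the failure of /shipTo is known; with the ending mark the scan stops at the first child.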
5. Related Work
- YFilter [Diao02] and XSM [Ludäscher03]
  - Use schema to decide whether pattern results are recursive, or the types of child elements
  - Essentially propose general XML SQO
- FluXQuery [Koch04]
  - Use schema to minimize buffer size
  - Is complementary to our focus (aim to skip unnecessary computations)
- SIX [Gupta03]
  - Use indices interleaved with XML data to reduce parsing
  - Could be combined with our techniques
6. Challenge: Constraint Useful?
/seller/shipTo
Retrieve /shipTo
Nothing to save: /shipTo is the only pattern retrieval
Retrieve /sameAddr
<!element seller((billTo,shipTo)|sameAddr, ...)>
When retrieved
7. Challenge: Benefits/Overhead?
- Maximal benefits: no beneficial optimization should be missed
  - Any failed patterns should be detected as early as possible
- Minimal overhead: no redundant optimization should be introduced
  - Whether a particular pattern fails should not be repeatedly checked
8. Challenge: Plan Execution
- Optimization at lower level than query rewrite
- Specific physical implementations are needed
No query can capture this optimization
Buffer seller element
/seller[shipTo]
Retrieve /shipTo
Retrieve /sameAddr
<!element seller((billTo,shipTo)|sameAddr, ...)>
When retrieved
9. Outline
- SQO Technique Design
- SQO Application
- Execution of Optimized Plan
- Experimentations
10. Physical Implementation of Pattern Retrieval
- Note
  - Important to understand physical stream engine implementation for designing effective SQO
- Our implementation
  - Widely used automata implementation, e.g., Tukwila, YFilter
11. Example Query and its Automata
for $a in /auctions/auction, $b in $a/seller[shipTo]
where $b//phone = "508-123-4567"
return <auction> for $c in $a/item where $c//keyword = "auto" return $b//phone </auction>
[Figure: automaton for the query, with states 0-3 and 9-12 and transitions labeled auctions, auction, seller, shipTo, "primary, secondary", and phone]
12. Example Query and its Automata
[Figure: the same automaton, with states 0-3 and 9-12 and transitions labeled auctions, auction, seller, shipTo, "primary, secondary", and phone]
13. Is Constraint Useful for Opt.?
- Constraints used to find ending marks of a pattern within a context element
<!element seller((billTo, shipTo)|sameAddr?, ...)>
<sameAddr> is the ending mark of /shipTo within the seller element context
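One way to picture how an exclusive constraint yields an ending mark is the following Python sketch. It is an assumption-laden simplification: the content model is reduced to a plain choice between branches, each branch a list of child names, and the function name `exclusive_ending_marks` is hypothetical.

```python
# Hypothetical content model: a choice between alternative branches, each
# branch a non-empty list of child names, e.g. (billTo, shipTo) | sameAddr.

def exclusive_ending_marks(branches, pattern):
    """Start tags that prove `pattern` cannot occur in this context element.

    A branch that does not contain `pattern` excludes it, so the first
    element of such a branch is an ending mark. First elements that also
    begin a branch containing `pattern` are ambiguous and are left out.
    """
    with_pattern = [b for b in branches if pattern in b]
    without_pattern = [b for b in branches if pattern not in b]
    ambiguous = {b[0] for b in with_pattern}
    return {b[0] for b in without_pattern} - ambiguous


seller_choice = [["billTo", "shipTo"], ["sameAddr"]]
print(exclusive_ending_marks(seller_choice, "shipTo"))  # {'sameAddr'}
```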
14. Is Constraint Useful for Opt.?
- Ending mark helpful if
  - Context element can be filtered out earlier
15. Is Constraint Useful for Opt.?
- Ending mark helpful if
  - Context element can be filtered out earlier
    - Pattern may fail to appear
for $a in /auctions/auction, $b in $a/seller
<!element auction(seller, ...)>: ending mark for $a/seller is not helpful
<!element auction(seller?, ...)>: ending mark for $a/seller is helpful
16. Is Constraint Useful for Opt.?
- Ending mark helpful if
  - Context element can be filtered out earlier
    - Pattern may fail to appear
    - Pattern is required
for $a in /auctions/auction, $b in $a/seller
<!element auction(seller, ...)>: ending mark for $a/seller is not helpful
<!element auction(seller?, ...)>: ending mark for $a/seller is helpful
17. Is Constraint Useful for Opt.?
- Ending mark helpful if
  - Context element can be filtered out earlier
    - Pattern may fail to appear
    - Pattern is required
<!element item (category?, desc, ...)>
for $c in $a/item return <c>$a/category</c>: ending mark for $a/category is not helpful
for $c in $a/item[category] return <c>$a/category</c>: ending mark for $a/category is helpful
18. Is Constraint Useful for Opt.?
- Ending mark helpful if
  - Context element can be filtered out earlier
    - Pattern may fail to appear
    - Pattern is required
  - and
  - The early filtering can be beneficial
    - Transitions may happen after ending marks
    - Buffering flags may be raised before ending marks
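The criteria listed on this slide can be read as a small decision rule. The Python sketch below is one possible reading, not the authors' formulation: the class and field names are hypothetical, and combining the last two criteria with "or" is an assumption (either one alone is taken to make early filtering pay off).

```python
from dataclasses import dataclass


@dataclass
class EndingMarkCandidate:
    pattern_may_fail: bool        # schema allows the pattern to be absent
    pattern_required: bool        # query drops the context element if it is absent
    transitions_after_mark: bool  # automaton work remains after the ending mark
    buffering_before_mark: bool   # data is already buffered when the mark arrives


def is_helpful(c):
    # The context element can be filtered out earlier only if the pattern
    # can actually be missing and the query needs it.
    can_filter_early = c.pattern_may_fail and c.pattern_required
    # Early filtering is beneficial if it saves transitions or buffered data.
    filtering_pays_off = c.transitions_after_mark or c.buffering_before_mark
    return can_filter_early and filtering_pays_off


# auction(seller, ...): seller always appears, so its ending mark cannot help.
print(is_helpful(EndingMarkCandidate(False, True, True, True)))   # False
# auction(seller?, ...): seller may be missing and the query requires it.
print(is_helpful(EndingMarkCandidate(True, True, True, False)))   # True
```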
19. SQO Design
- Helpful ending marks identified by our SQO
- Three SQO rules designed using
  - Occurrence constraints
  - Exclusive constraints
  - Order constraints
20. Example SQO Rule
- Use occurrence constraint
- Event-condition-action output by rule
for $a in /auctions/auction, $b in $a/seller where $b//phone = "508-1234567"
<!element seller(primary, secondary, ...)>
<!element primary (phone)>
<!element secondary (phone)>
Event: second </phone> is encountered in a seller
Condition: $b//phone = "508-1234567" not satisfied yet
Action: skip the remaining computations within the current seller element
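The event-condition-action rule on this slide can be traced with a short Python sketch. It assumes a hypothetical event format of `(kind, value)` tuples and a hypothetical function `process_seller`; the `url` child in the sample data is also an assumption, standing in for whatever follows `secondary` in the seller element.

```python
# Hypothetical sketch of the occurrence-constraint rule: a seller contains at
# most two phone elements (one in primary, one in secondary), so after the
# second </phone> a still-unsatisfied phone predicate can never be satisfied.

def process_seller(events, target_phone, max_phones=2):
    """Return (matched, events_processed) for one seller element."""
    phones_closed = 0
    matched = False
    in_phone = False
    processed = 0
    for kind, value in events:
        processed += 1
        if kind == "start" and value == "phone":
            in_phone = True
        elif kind == "text" and in_phone:
            matched = matched or value == target_phone
        elif kind == "end" and value == "phone":
            in_phone = False
            phones_closed += 1
            # Event: last possible </phone>. Condition: predicate not satisfied.
            if phones_closed == max_phones and not matched:
                break  # Action: skip the rest of this seller element.
    return matched, processed


seller = [
    ("start", "primary"), ("start", "phone"), ("text", "508-0000000"),
    ("end", "phone"), ("end", "primary"),
    ("start", "secondary"), ("start", "phone"), ("text", "508-1111111"),
    ("end", "phone"), ("end", "secondary"),
    ("start", "url"), ("text", "http://example.org"), ("end", "url"),
]
print(process_seller(seller, "508-1234567"))  # (False, 9)
print(process_seller(seller, "508-1111111"))  # (True, 13)
```

In the failing case, processing stops after 9 of the 13 events; in the matching case the rule never fires and the whole element is processed.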
21. Outline
- SQO Technique Design
- SQO Application
- Execution of Optimized Plan
- Experimentations
22. Properties of SQO Application
- Maximal benefits
- Minimal overhead
23. Maximal Benefit
- Definition of rule independence
- Proof of maximal benefits given
If the rules are all independent, then as long as each rule is applied to each pattern, maximal benefits are ensured
24. Minimal Overhead: Redundancy
- Same pattern redundancy: multiple ending marks adopted for the same pattern
Query: for $a in /auctions/auction, $b in $a/seller[shipTo]
Schema constraints: <!element seller (shipTo?, billTo, url)>
- Ending mark <billTo> for $b/shipTo
- Ending mark <url> for $b/shipTo: redundant, because <billTo> is guaranteed to capture the failure of /shipTo
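A minimal Python sketch of how same pattern redundancy could be avoided under an order constraint is given here. It assumes a content model simplified to an ordered list of `(name, required)` pairs; the function name `minimal_ending_marks` is hypothetical.

```python
# Hypothetical sketch: in an ordered content model, every element after the
# pattern is a possible ending mark, but nothing beyond the first *required*
# one can ever be the first to signal failure, so later marks are redundant.

def minimal_ending_marks(sequence, pattern):
    """sequence: ordered list of (child_name, required) pairs."""
    names = [name for name, _ in sequence]
    position = names.index(pattern)
    marks = []
    for name, required in sequence[position + 1:]:
        marks.append(name)
        if required:
            break  # this element always appears, so later marks are redundant
    return marks


# <!element seller (shipTo?, billTo, url)>
seller = [("shipTo", False), ("billTo", True), ("url", True)]
print(minimal_ending_marks(seller, "shipTo"))  # ['billTo']

# If billTo were optional, url would be needed as well.
seller_opt = [("shipTo", False), ("billTo", False), ("url", True)]
print(minimal_ending_marks(seller_opt, "shipTo"))  # ['billTo', 'url']
```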
25. Minimal Overhead: Redundancy?
- Parent-child pattern redundancy: ending marks of child patterns filter the parent pattern early
Query: for $a in /auctions/auction, $b in $a/seller[shipTo]
Ending marks: <billTo> for $b/shipTo, <bidder> for $a/seller
Constraints (billTo optional):
<!element auction (seller, bidder)>
<!element seller (shipTo, billTo?)>
<bidder> can be used to capture failure of $a/seller[shipTo]
Constraints (billTo required):
<!element auction (seller, bidder)>
<!element seller (shipTo, billTo)>
<bidder> is redundant
26. SQO Application Algorithm
- Input
  - XQuery represented as a tree
  - XML Schema represented as a graph
- Processing
  - Query tree traversed top-down
    - Maximal benefits ensured
  - Local/regional appliers applied to each tree node
    - Same pattern redundancy excluded by local applier
    - Parent-child pattern redundancy excluded by regional applier
- Output
  - Event-condition-actions attached to tree nodes
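The overall shape of this algorithm can be sketched in Python as a top-down traversal that attaches event-condition-action triples to query tree nodes. This is a heavily simplified illustration, not the algorithm from the talk: only order constraints are handled, only a local applier is shown (the regional applier is omitted), the schema is a dictionary rather than a graph, and all names (`QueryNode`, `local_applier`, `apply_sqo`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    element: str                                   # element matched by this step
    children: list = field(default_factory=list)   # child pattern steps
    ecas: list = field(default_factory=list)       # attached (event, condition, action)


def local_applier(schema, parent, child):
    """Non-redundant ending marks of `child` inside `parent` (order constraints only)."""
    sequence = schema.get(parent, [])
    required_of = dict(sequence)
    if child not in required_of or required_of[child]:
        return []  # unknown child, or child always present: nothing to detect
    names = [name for name, _ in sequence]
    marks = []
    for name, required in sequence[names.index(child) + 1:]:
        marks.append(name)
        if required:
            break  # later marks would be same pattern redundancy
    return marks


def apply_sqo(schema, node):
    """Visit each parent before its children and attach ECAs to the children."""
    for child in node.children:
        for mark in local_applier(schema, node.element, child.element):
            child.ecas.append((f"<{mark}> starts",
                               f"{child.element} not found",
                               f"skip rest of current {node.element}"))
        apply_sqo(schema, child)
    return node


schema = {
    "auction": [("seller", False), ("bidder", True)],
    "seller": [("shipTo", False), ("billTo", True), ("url", True)],
}
query = QueryNode("auction", [QueryNode("seller", [QueryNode("shipTo")])])
apply_sqo(schema, query)
print(query.children[0].ecas)
print(query.children[0].children[0].ecas)
```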
27. Outline
- SQO Technique Design Guideline
- SQO Application
- Execution of Optimized Plan
- Experimentations
28. Encoding ECAs in Automata
- E: push-in or pop-out of state
- C: pattern result buffer checked
- A: actions include
  - Suspend computations by removing automata transitions
  - Clean up results generated within current context element
  - Prepare for recovering computation for next context element (e.g., back up transitions)
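How such an encoding might look in a stack-based automaton is sketched below in Python. Everything here is an assumption made for illustration: the `Automaton` class, the state numbers, the transition table, and the `run` driver are hypothetical and far simpler than a real engine. The event is a state push or pop, the condition is a check of the shipTo result buffer, and the action removes (and backs up) transitions, then restores them for the next seller.

```python
class Automaton:
    def __init__(self, transitions):
        self.transitions = dict(transitions)  # {(state, tag): next_state}
        self.backup = {}                      # transitions removed by suspend()
        self.stack = [0]                      # state 0 is the start state
        self.visited = []                     # states actually entered

    def suspend(self, state):
        """Action: remove the outgoing transitions of `state`, keeping a backup."""
        for key in [k for k in self.transitions if k[0] == state]:
            self.backup[key] = self.transitions.pop(key)

    def restore(self):
        """Recover the suspended transitions for the next context element."""
        self.transitions.update(self.backup)
        self.backup.clear()

    def start_tag(self, tag):
        nxt = self.transitions.get((self.stack[-1], tag))  # None: no transition
        self.stack.append(nxt)
        if nxt is not None:
            self.visited.append(nxt)
        return nxt

    def end_tag(self):
        return self.stack.pop()


def run(tokens):
    a = Automaton({(0, "seller"): 1, (1, "shipTo"): 2,
                   (1, "sameAddr"): 3, (1, "url"): 4})
    ship_to_buffer = []
    for kind, tag in tokens:
        if kind == "start":
            state = a.start_tag(tag)
            if state == 2:
                ship_to_buffer.append(tag)
            # E: state 3 pushed. C: shipTo buffer empty. A: suspend seller's transitions.
            if state == 3 and not ship_to_buffer:
                a.suspend(1)
        else:
            if a.end_tag() == 1:        # E: seller state popped
                ship_to_buffer.clear()  # clean up results of this seller
                a.restore()             # prepare for the next seller
    return a.visited


tokens = [("start", "seller"), ("start", "sameAddr"), ("end", "sameAddr"),
          ("start", "url"), ("end", "url"), ("end", "seller"),
          ("start", "seller"), ("start", "shipTo"), ("end", "shipTo"),
          ("start", "url"), ("end", "url"), ("end", "seller")]
print(run(tokens))  # [1, 3, 1, 2, 4]
```

In the first seller the url state (4) is never entered because the transitions were suspended once sameAddr was seen; in the second seller, processing proceeds normally.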
29. Example ECAs in Automata
for $a in /auctions/auction, $b in $a/seller[shipTo]
where $b//phone = "508-123-4567"
return <auction> for $c in $a/item ... </auction>
[Figure: the query automaton annotated with ECAs such as "(, state 3)" and "(1, startTag, none, state 2)"; states 0-3, 5, and 9-13; transitions labeled auctions, auction, seller, sameAddr, shipTo, item, "primary, secondary", and phone; input tokens <auction> <seller> <sameAddr> </sameAddr> <item> </item> <primary> </primary>]
30. Outline
- SQO technique design guideline
- SQO application
- Execution of optimized plan
- Experimentations
31. Optimization Affected by?
- How often pattern fails (pattern selectivity)
- How much gain each early filtering brings (unit gain)
32. Necessity of Design Guideline
[Figure: chart comparing three plans: plan without SQO; plan with SQO (1 ending mark); plan with SQO but no guideline considered (30 ending marks). Axis label: selectivity of the pattern with the only useful ending mark]
33. Conclusion
- First SQO on streaming XML
- Support SQO on nested XQuery with * or //
- Offer criteria of useful constraints
- Ensure maximal benefits and minimal overhead in SQO application
- Provide execution strategy in widely-used automata-based model
- Implement SQO optimizer in Raindrop system (VLDB04 demo)
- Experimentally demonstrate SQO brings significant improvement with little overhead
34. Visit our XQuery engine over XML stream project (RAINDROP) website
- http://davis.wpi.edu/dsrg/raindrop/
Supported by USA National Science Foundation and IBM PhD Fellowship