Schema-Based Query Optimization for XQuery over XML Streams

1
Schema-Based Query Optimization for XQuery over
XML Streams
  • Hong Su
  • Elke A. Rundensteiner
  • Murali Mani
  • Worcester Polytechnic Institute, Massachusetts,
    USA
  • VLDB 2005

2
Schema-Based Query Optimization (SQO)
  • Schema knowledge can be utilized to optimize
    queries
  • Well studied in deductive/relational databases
    • Join elimination
    • Predicate elimination
    • Detection of empty answer sets
  • Equally applicable to XML for flat value
    filtering
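Of the relational techniques listed above, detection of an empty answer set carries over most directly to XML paths. The sketch below is a minimal illustration, assuming the schema is reduced to a hypothetical map from each element name to the child names it may contain; `SCHEMA` and `path_can_match` are illustrative names, not part of any system in this talk.

```python
from typing import Dict, List, Set

# Hypothetical, simplified schema graph: element -> children it may contain.
# Assembled from element names on these slides; not the authors' data structure.
SCHEMA: Dict[str, Set[str]] = {
    "auctions": {"auction"},
    "auction": {"seller", "item", "bidder"},
    "seller": {"billTo", "shipTo", "sameAddr", "url", "primary", "secondary"},
    "item": {"category", "desc"},
}

def path_can_match(root: str, steps: List[str], schema: Dict[str, Set[str]]) -> bool:
    """Return False when the child-axis path /root/step1/step2/... violates
    the schema, i.e. the answer set is guaranteed to be empty."""
    current = root
    for step in steps:
        if step not in schema.get(current, set()):
            return False  # schema forbids this child: empty answer detected statically
        current = step
    return True

print(path_can_match("auctions", ["auction", "seller", "shipTo"], SCHEMA))  # True
print(path_can_match("auctions", ["auction", "shipTo"], SCHEMA))            # False
```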

3
SQO for XML Pattern Retrieval
  • General XML SQO
    • Applicable to both static and streaming XML
    • E.g., query tree minimization [Amer-Yahia02]
  • Static-XML-specific SQO
    • Focuses on expediting random access to data
    • E.g., query rewrite using extents (indices
      built on element types) [Fernandez98]
  • Stream-specific XML SQO
    • Focuses on expediting token-by-token sequential
      access to data

4
Stream-Specific SQO Example
[Figure: evaluating /seller[shipTo] over a seller element whose tokens are <seller> <sameAddr> ... <url> ... </seller>. Without schema: buffer the seller element and retrieve /shipTo. With schema <!ELEMENT seller ((billTo, shipTo) | sameAddr, ...)>: buffer the seller element, retrieve /shipTo, and also retrieve /sameAddr.]
5
Related Work
  • YFilter [Diao02] and XSM [Ludäscher03]
    • Use schema to decide whether pattern results are
      recursive, or to determine the types of child elements
    • Essentially propose general XML SQO
  • FluXQuery [Koch04]
    • Uses schema to minimize buffer size
    • Complementary to our focus, which is skipping
      unnecessary computations
  • SIX [Gupta03]
    • Uses indices interleaved with XML data to reduce
      parsing
    • Could be combined with our techniques

6
Challenge: Is a Constraint Useful?
[Figure: for the path /seller/shipTo, /shipTo is the only pattern retrieval, so also retrieving /sameAddr under <!ELEMENT seller ((billTo, shipTo) | sameAddr, ...)> has nothing to save.]
7
Challenge: Benefits vs. Overhead
  • Maximal benefits: no beneficial optimization
    should be missed
    • Any failing pattern should be detected as early
      as possible
  • Minimal overhead: no redundant optimization
    should be introduced
    • Whether a particular pattern fails should not be
      checked repeatedly

8
Challenge: Plan Execution
  • Optimization happens at a lower level than query rewrite
  • Specific physical implementations are needed

[Figure: plan for /seller[shipTo] that buffers the seller element, retrieves /shipTo and /sameAddr under <!ELEMENT seller ((billTo, shipTo) | sameAddr, ...)>, and acts when <sameAddr> is retrieved. No query can capture this optimization.]
9
Outline
  • SQO Technique Design
  • SQO Application
  • Execution of Optimized Plan
  • Experiments

10
Physical Implementation of Pattern Retrieval
  • Note: understanding the physical stream engine
    implementation is important for designing effective SQO
  • Our implementation: the widely used automata-based
    implementation, e.g., Tukwila, YFilter
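For readers unfamiliar with the automata model, the sketch below shows stack-based pattern retrieval in the spirit of YFilter: each start tag pushes the set of active NFA states and each end tag pops it. It is a simplified illustration under stated assumptions (absolute paths with only child `/` and descendant `//` steps and name tests; `parse_path` and `match_positions` are hypothetical names), not the Raindrop, Tukwila, or YFilter code.

```python
from typing import Iterable, List, Set, Tuple

Step = Tuple[str, str]  # (axis, name) with axis "/" (child) or "//" (descendant)

def parse_path(path: str) -> List[Step]:
    """Split an absolute path such as '/auctions/auction//phone' into steps."""
    steps: List[Step] = []
    i = 0
    while i < len(path):
        axis = "//" if path.startswith("//", i) else "/"
        i += len(axis)
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        steps.append((axis, path[i:j]))
        i = j
    return steps

def match_positions(path: str, events: Iterable[Tuple[str, str]]) -> List[int]:
    """Run a stack-based NFA over ("start" | "end", tag) events and return the
    indices of the start events at which the path is matched."""
    steps = parse_path(path)
    stack: List[Set[int]] = [{0}]  # state i = number of steps matched so far
    hits: List[int] = []
    for pos, (kind, tag) in enumerate(events):
        if kind == "end":
            stack.pop()  # leaving an element restores the parent's active states
            continue
        nxt: Set[int] = set()
        for i in stack[-1]:
            if i == len(steps):
                continue  # accepting state has no outgoing transitions
            axis, name = steps[i]
            if axis == "//":
                nxt.add(i)  # self-loop: a descendant step may match deeper down
            if name == tag:
                nxt.add(i + 1)
        if len(steps) in nxt:
            hits.append(pos)
        stack.append(nxt)
    return hits

doc = [("start", "auctions"), ("start", "auction"), ("start", "seller"),
       ("start", "primary"), ("start", "phone"), ("end", "phone"),
       ("end", "primary"), ("end", "seller"), ("end", "auction"), ("end", "auctions")]
print(match_positions("/auctions/auction/seller//phone", doc))  # [4]
```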

11
Example Query and its Automata
for $a in /auctions/auction, $b in $a/seller[shipTo]
where $b//phone = "508-123-4567"
return <auction> {
  for $c in $a/item
  where $c//keyword = "auto"
  return $b//phone
} </auction>
[Figure: automaton for this query, with states 0, 1, 2, 3, 9, 10, 11, 12 and transitions labeled auctions, auction, seller, shipTo, primary/secondary, and phone]
12
Example Query and its Automata (continued)
[Figure: the same automaton as on the previous slide]
13
Is a Constraint Useful for Optimization?
  • Constraints are used to find the ending marks of a
    pattern within a context element

<!ELEMENT seller ((billTo, shipTo) | sameAddr?, ...)>
<sameAddr> is an ending mark of /shipTo within the
seller element context
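The effect of an ending mark can be sketched as follows, assuming the children of one `seller` element arrive as a list of start-tag names. `retrieve_with_ending_mark` is a hypothetical helper; it only decides the `[shipTo]` filter and ignores buffering of the rest of the element.

```python
from typing import Iterable, Set, Tuple

def retrieve_with_ending_mark(child_tags: Iterable[str], pattern: str,
                              ending_marks: Set[str]) -> Tuple[bool, int]:
    """Scan the child start tags of one context element (e.g. <seller>).
    Returns (pattern_found, tags_examined); stops once the outcome is known."""
    examined = 0
    for tag in child_tags:
        examined += 1
        if tag == pattern:
            return True, examined
        if tag in ending_marks:
            # The schema guarantees the pattern cannot follow an ending mark,
            # so failure is detected before </seller> arrives.
            return False, examined
    return False, examined  # no ending mark: failure known only at the end tag

children = ["sameAddr", "url", "url"]
print(retrieve_with_ending_mark(children, "shipTo", {"sameAddr"}))  # (False, 1)
print(retrieve_with_ending_mark(children, "shipTo", set()))         # (False, 3)
```

With the ending mark, the failing `seller` is recognized after one child instead of after all of them.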
14
Is a Constraint Useful for Optimization?
  • An ending mark is helpful if
    • The context element can be filtered out earlier

15
Is a Constraint Useful for Optimization?
  • An ending mark is helpful if
    • The context element can be filtered out earlier
      • The pattern may fail to appear

Ending mark for $a/seller is not helpful:
<!ELEMENT auction (seller, ...)>
for $a in /auctions/auction, $b in
$a/seller

Ending mark for $a/seller is helpful:
<!ELEMENT auction (seller?, ...)>
16
Is a Constraint Useful for Optimization?
  • An ending mark is helpful if
    • The context element can be filtered out earlier
      • The pattern may fail to appear
      • The pattern is required

Ending mark for $a/seller is not helpful:
<!ELEMENT auction (seller, ...)>
for $a in /auctions/auction, $b in
$a/seller

Ending mark for $a/seller is helpful:
<!ELEMENT auction (seller?, ...)>
17
Is a Constraint Useful for Optimization?
  • An ending mark is helpful if
    • The context element can be filtered out earlier
      • The pattern may fail to appear
      • The pattern is required

for $c in $a/item return <c>{$a/category}</c>
Ending mark for $a/category is not helpful:
<!ELEMENT item (category?, desc, ...)>

for $c in $a/item[category] return
<c>{$a/category}</c>
Ending mark for $a/category is helpful
18
Is a Constraint Useful for Optimization?
  • An ending mark is helpful if
    • The context element can be filtered out earlier
      • The pattern may fail to appear
      • The pattern is required
    • and the early filtering can be beneficial
      • Transitions may happen after ending marks
      • Buffering flags may be raised before ending marks
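The criteria above can be read as a single predicate. The sketch below is one possible encoding; the field names are hypothetical, and treating the two benefit conditions as alternatives (either one suffices) is an assumption, not something the slides state.

```python
from dataclasses import dataclass

@dataclass
class PatternInfo:
    may_fail: bool                # schema lets the pattern be absent, e.g. seller?
    required_by_query: bool       # query discards the context element without it
    transitions_after_mark: bool  # automaton transitions may happen after the mark
    buffering_before_mark: bool   # buffering flags may be raised before the mark

def ending_mark_helpful(p: PatternInfo) -> bool:
    # The context element can be filtered out earlier only if the pattern may
    # fail to appear AND the query requires it.
    can_filter_early = p.may_fail and p.required_by_query
    # Assumption: early filtering is beneficial if it saves either kind of work.
    beneficial = p.transitions_after_mark or p.buffering_before_mark
    return can_filter_early and beneficial

# <!ELEMENT auction (seller, ...)>: seller always appears, so no help
print(ending_mark_helpful(PatternInfo(False, True, True, True)))   # False
# <!ELEMENT auction (seller?, ...)> with $a/seller required by the query
print(ending_mark_helpful(PatternInfo(True, True, True, False)))   # True
```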

19
SQO Design
  • Helpful ending marks are identified by our SQO
  • Three SQO rules are designed using
    • Occurrence constraints
    • Exclusive constraints
    • Order constraints
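To give an intuition for how each constraint kind yields ending marks, here is a hypothetical sketch over flat content models. These helpers only illustrate the idea behind the three constraint kinds; they are not the paper's rules, and their names and encodings are assumptions.

```python
from typing import List, Set, Tuple

# Hypothetical helpers over flat content models; marks are written as tag strings.

def order_marks(sequence: List[str], target: str) -> Set[str]:
    """Order constraint: in a sequence (e1, e2, ...), once a sibling listed
    after target starts, target cannot appear any more."""
    i = sequence.index(target)
    return {f"<{e}>" for e in sequence[i + 1:]}

def exclusive_marks(choice: List[List[str]], target: str) -> Set[str]:
    """Exclusive constraint: in a choice (g1 | g2 | ...), any element of an
    alternative that does not contain target rules target out."""
    return {f"<{e}>" for alt in choice if target not in alt for e in alt}

def occurrence_mark(target: str, max_occurs: int) -> Tuple[str, int]:
    """Occurrence constraint: after the max_occurs-th </target>, no further
    target can appear in the context element."""
    return (f"</{target}>", max_occurs)

print(exclusive_marks([["billTo", "shipTo"], ["sameAddr"]], "shipTo"))  # {'<sameAddr>'}
print(sorted(order_marks(["shipTo", "billTo", "url"], "shipTo")))      # ['<billTo>', '<url>']
print(occurrence_mark("phone", 2))                                      # ('</phone>', 2)
```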

20
Example SQO Rule
  • Uses an occurrence constraint
  • The rule outputs an event-condition-action (ECA)

for $a in /auctions/auction, $b in
$a/seller where $b//phone = "508-1234567"
Event: the second </phone> is encountered in a seller
Condition: $b//phone = "508-1234567" is not yet satisfied
Action: skip the rest of the computations within the
current seller element

<!ELEMENT seller (primary, secondary, ...)>
<!ELEMENT primary (phone)>
<!ELEMENT secondary (phone)>
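The ECA above can be simulated directly on a token stream. This is a minimal sketch, assuming events are hypothetical `(kind, payload)` tuples and that each phone's text arrives as a single text event; `eval_seller` is an illustrative name, not the Raindrop operator.

```python
from typing import Iterable, Tuple

Event = Tuple[str, str]  # ("start" | "end" | "text", tag name or text)

def eval_seller(events: Iterable[Event], value: str) -> Tuple[bool, int]:
    """Evaluate $b//phone = value over the events inside one <seller>.
    Returns (matched, events_processed). Assumes each phone's text arrives
    as a single "text" event."""
    phones_closed, in_phone, satisfied, processed = 0, False, False, 0
    for kind, payload in events:
        processed += 1
        if kind == "start" and payload == "phone":
            in_phone = True
        elif kind == "text" and in_phone and payload == value:
            satisfied = True
        elif kind == "end" and payload == "phone":
            in_phone = False
            phones_closed += 1
            # Event: second </phone> (schema allows exactly two per seller)
            # Condition: the where-clause is not satisfied yet
            if phones_closed == 2 and not satisfied:
                return False, processed  # Action: skip rest of this seller
    return satisfied, processed

seller = [("start", "primary"), ("start", "phone"), ("text", "111"), ("end", "phone"),
          ("end", "primary"), ("start", "secondary"), ("start", "phone"),
          ("text", "222"), ("end", "phone"), ("end", "secondary"),
          ("start", "shipTo"), ("end", "shipTo")]
print(eval_seller(seller, "508-1234567"))  # (False, 9)
```

The three remaining events of the seller are never touched once the rule fires.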
21
Outline
  • SQO Technique Design
  • SQO Application
  • Execution of Optimized Plan
  • Experiments

22
Properties of SQO Application
  • Maximal benefits
  • Minimal overhead

23
Maximal Benefit
  • Definition of rule independence
  • Proof of maximal benefits is given

If the rules are all independent, then as long as each
rule is applied to each pattern,
maximal benefits are ensured
24
Minimal Overhead: Redundancy
  • Same-pattern redundancy: multiple ending marks
    adopted for the same pattern

Query:
for $a in /auctions/auction, $b in
$a/seller[shipTo]
Schema constraints:
<!ELEMENT seller (shipTo?, billTo, url)>
Ending mark <billTo> for $b/shipTo: <billTo> is guaranteed to capture the failure of /shipTo
Ending mark <url> for $b/shipTo: redundant
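One way to drop same-pattern redundancy in a flat sequence is to keep ending marks only up to the first required sibling, since that sibling always arrives before any later one. The sketch below assumes a hypothetical `(element, occurrence)` encoding of the content model; it is an illustration, not the paper's local applier.

```python
from typing import List, Tuple

def non_redundant_marks(sequence: List[Tuple[str, str]], target: str) -> List[str]:
    """sequence: flat content model as [(element, occurrence)] with occurrence in
    {"1", "?", "*", "+"}. Returns the ending marks for target worth keeping:
    siblings after target up to and including the first required one. Any later
    sibling is redundant, because the required one always arrives first."""
    names = [name for name, _ in sequence]
    kept: List[str] = []
    for name, occ in sequence[names.index(target) + 1:]:
        kept.append(f"<{name}>")
        if occ in ("1", "+"):
            break  # guaranteed to appear, so it alone captures the failure
    return kept

# <!ELEMENT seller (shipTo?, billTo, url)>
print(non_redundant_marks([("shipTo", "?"), ("billTo", "1"), ("url", "1")], "shipTo"))
# ['<billTo>']
```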
25
Minimal Overhead: Redundancy?
  • Parent-child pattern redundancy: ending marks of
    child patterns early-filter the parent pattern

Query:
for $a in /auctions/auction, $b in
$a/seller[shipTo]
Ending marks: <billTo> for $b/shipTo, <bidder> for $a/seller

Case 1 (optional):
<!ELEMENT auction (seller, bidder)>
<!ELEMENT seller (shipTo, billTo?)>
Can be used to capture the failure of $a/seller[shipTo]

Case 2 (required):
<!ELEMENT auction (seller, bidder)>
<!ELEMENT seller (shipTo, billTo)>
Redundant
26
SQO Application Algorithm
  • Input
    • XQuery represented as a tree
    • XML Schema represented as a graph
  • Processing
    • Query tree traversed top-down
      • Maximal benefits ensured
    • Tree nodes handled by local/regional appliers
      • Same-pattern redundancy excluded by the local applier
      • Parent-child pattern redundancy excluded by
        the regional applier
  • Output
    • Event-condition-actions attached to tree nodes
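The top-down application can be sketched as a pre-order traversal that calls a local applier on every query-tree node and attaches ECAs to it. This is a hypothetical simplification: `QueryNode`, `local_apply`, and `apply_sqo` are illustrative names, each candidate mark is assumed to carry a precomputed "guaranteed" flag, and the regional applier is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class QueryNode:
    pattern: str  # e.g. "$a/seller"
    # Candidate ending marks: (mark, guaranteed_to_capture_failure)
    candidates: List[Tuple[str, bool]] = field(default_factory=list)
    children: List["QueryNode"] = field(default_factory=list)
    ecas: List[str] = field(default_factory=list)  # output attached to the node

def local_apply(node: QueryNode) -> None:
    """Hypothetical local applier: stop adding marks for a pattern once one
    guaranteed to capture its failure has been kept (same-pattern redundancy)."""
    for mark, guaranteed in node.candidates:
        node.ecas.append(f"on {mark}: if {node.pattern} not matched, skip context")
        if guaranteed:
            break

def apply_sqo(root: QueryNode) -> List[str]:
    """Traverse the query tree top-down (pre-order); returns the visit order."""
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.pattern)
        local_apply(node)
        stack.extend(reversed(node.children))  # keep left-to-right order
    return order

ship_to = QueryNode("$b/shipTo", [("<billTo>", True), ("<url>", True)])
seller = QueryNode("$a/seller", [("<bidder>", False)], [ship_to])
print(apply_sqo(seller))  # ['$a/seller', '$b/shipTo']
print(len(ship_to.ecas))  # 1
```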

27
Outline
  • SQO Technique Design Guideline
  • SQO Application
  • Execution of Optimized Plan
  • Experiments

28
Encoding ECAs in Automata
  • E: push-in or pop-out of a state
  • C: pattern result buffer checked
  • A: actions include
    • Suspending computations by removing automaton
      transitions
    • Cleaning up results generated within the current
      context element
    • Preparing to recover computation for the next
      context element (e.g., backup transitions)
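The action part can be pictured as editing a transition table. The toy class below is a sketch under assumptions: the class name and the state numbers in the usage are hypothetical, and deciding when to fire (the E and C parts) is left to the caller.

```python
from typing import Dict, List, Set, Tuple

class SuspendableAutomaton:
    """Toy automaton whose transitions can be suspended and later restored."""

    def __init__(self, transitions: Dict[Tuple[int, str], int]):
        self.transitions = dict(transitions)  # (state, tag) -> next state
        self.backup: Dict[Tuple[int, str], int] = {}
        self.results: List[str] = []  # pattern results for the current context

    def suspend(self, states: Set[int]) -> None:
        """Action: stop computations below the given states."""
        for key in [k for k in self.transitions if k[0] in states]:
            self.backup[key] = self.transitions.pop(key)  # back up, then remove
        self.results.clear()  # clean up results of the current context element

    def recover(self) -> None:
        """Restore backed-up transitions before the next context element."""
        self.transitions.update(self.backup)
        self.backup.clear()

# Illustrative state numbers only.
a = SuspendableAutomaton({(2, "seller"): 9, (9, "shipTo"): 10, (9, "primary"): 11})
a.suspend({9})
print(sorted(a.transitions))  # [(2, 'seller')]
a.recover()
print(len(a.transitions))     # 3
```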

29
Example ECAs in Automata
for $a in /auctions/auction, $b in $a/seller[shipTo]
where $b//phone = "508-123-4567"
return <auction> { for $c in $a/item ... } </auction>

[Figure: automaton annotated with ECAs such as (1, startTag, none, state 2); states 0, 1, 2, 3, 5, 9, 10, 11, 12, 13 with transitions labeled auctions, auction, seller, shipTo, sameAddr, item, primary/secondary, and phone; input tokens shown include <auction> <seller>, <sameAddr> </sameAddr>, <primary> </primary>, and <item> </item>]
30
Outline
  • SQO Technique Design Guideline
  • SQO Application
  • Execution of Optimized Plan
  • Experiments

31
What Affects the Optimization?
  • How often a pattern fails (pattern selectivity)
  • How much gain each early filtering brings (unit
    gain)
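The two factors combine naturally into a back-of-envelope estimate. The linear model below is an assumption for illustration only (it is not taken from the slides), and `estimated_gain` and its parameters are hypothetical names.

```python
def estimated_gain(n_contexts: int, fail_rate: float, unit_gain: float,
                   overhead_per_context: float = 0.0) -> float:
    """Rough model (an assumption, not from the slides): each context element
    whose pattern fails (fraction fail_rate) saves unit_gain work, while checking
    ending marks costs overhead_per_context for every context element."""
    return n_contexts * (fail_rate * unit_gain - overhead_per_context)

print(estimated_gain(1000, 0.3, 5.0, 0.5))  # positive: failures pay for the checks
print(estimated_gain(1000, 0.0, 5.0, 0.1))  # negative: pure overhead when nothing fails
```

Under this model the optimization only pays off once the pattern fails often enough, or saves enough per failure, to outweigh the per-element checking cost.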

32
Necessity of Design Guideline
[Figure: performance of three plans (without SQO; with SQO, 1 ending mark; with SQO but no guideline considered, 30 ending marks) plotted against the selectivity of the pattern with the only useful ending mark]
33
Conclusion
  • First SQO on streaming XML
  • Support SQO on nested XQuery with or //
  • Offer criteria for useful constraints
  • Ensure maximal benefits and minimal overhead in
    SQO application
  • Provide an execution strategy in the widely used
    automata-based model
  • Implement the SQO optimizer in the Raindrop system
    (VLDB04 demo)
  • Experimentally demonstrate that SQO brings significant
    improvement with little overhead

34
  • Visit the website of our XQuery engine over XML streams
    project (RAINDROP)
  • http://davis.wpi.edu/dsrg/raindrop/

Supported by USA National Science
Foundation and IBM PhD Fellowship