Title: AVOIDING UNNECESSARY ORDERING OPERATIONS IN XPATH
1 AVOIDING UNNECESSARY ORDERING OPERATIONS IN XPATH
- Jan Hidders
- Philippe Michiels
2 Problem
- XPath semantics require the result of a query to
be in document order and contain no duplicate nodes
- Many implementations achieve this by explicitly
ordering intermediate results, or else sacrifice
correctness for efficiency
- For large input documents, explicit ordering
hurts performance
- Motivation: Galax
3 XPath implementation
- Explicit sorting and duplicate elimination after
each step by inserting distinct-docorder (ddo)
operations in the query plan
- Semantics of the implementation without ddos:
sloppy semantics
- Example: /descendant-or-self::a/child::b
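The difference between the two semantics can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Node` class, the helper names, the tiny hypothetical document, and the simplification of starting from the root element instead of the document node. It is not Galax code. It evaluates the example path once without ddo operations (sloppy) and once with a ddo after each step.

```python
class Node:
    """Element node; `pos` is its position in document order (pre-order)."""
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)
        self.pos = -1

def number(root):
    """Assign pre-order positions, which coincide with document order."""
    stack, counter = [root], 0
    while stack:
        node = stack.pop()
        node.pos = counter
        counter += 1
        stack.extend(reversed(node.children))

def descendant_or_self(node):
    yield node
    for child in node.children:
        yield from descendant_or_self(child)

def step_descendant_or_self(nodes, name):
    # Apply the axis to each context node in turn and concatenate the results.
    return [d for n in nodes for d in descendant_or_self(n) if d.name == name]

def step_child(nodes, name):
    return [c for n in nodes for c in n.children if c.name == name]

def ddo(nodes):
    """distinct-docorder: remove duplicates and sort into document order."""
    by_pos = {n.pos: n for n in nodes}
    return [by_pos[p] for p in sorted(by_pos)]

# Hypothetical document: <a><b/><a><b/></a><b/></a>
root = Node("a", Node("b"), Node("a", Node("b")), Node("b"))
number(root)

# Sloppy semantics: no ddo between or after the steps.
sloppy = step_child(step_descendant_or_self([root], "a"), "b")
# A ddo after every step, as in the query plan described above.
formal = ddo(step_child(ddo(step_descendant_or_self([root], "a")), "b"))

sloppy_positions = [n.pos for n in sloppy]  # [1, 4, 3]: out of document order
formal_positions = [n.pos for n in formal]  # [1, 3, 4]
```

On this document the sloppy result contains the right nodes but in the wrong order, so the final ddo cannot be dropped for this path.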
4 XPath Properties
- We need to identify several properties of path
expressions that help us determine whether
their sloppy semantics are equal to their formal
semantics
- Two main properties
  - ord (result always in order)
  - nodup (result never contains duplicate nodes)
- Additional properties necessary, e.g.
  - gen (all nodes in result belong to same
  generation)
  - max1 (result contains at most one node)
  - unrel (result contains no related nodes)
  - lin (all nodes in result are anc-desc related)
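To make the six properties concrete, here is a minimal sketch that checks each of them on one concrete result sequence. Note the difference in scope: the properties above are statements about a path expression over all documents, whereas these functions only test a single result. The tree encoding (`PARENT`), the function names, and the reading of max1 as sequence length are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical tree: node ids are document-order positions, and
# PARENT maps each id to its parent's id (None for the root).
PARENT = {0: None, 1: 0, 2: 0, 3: 2, 4: 0}

def ancestors(n):
    out = set()
    p = PARENT[n]
    while p is not None:
        out.add(p)
        p = PARENT[p]
    return out

def depth(n):
    return len(ancestors(n))

def related(x, y):
    """True if x and y are in an ancestor-descendant relationship."""
    return x in ancestors(y) or y in ancestors(x)

def ord_(result):
    # In document order; adjacent duplicates are tolerated (assumption).
    return all(a <= b for a, b in zip(result, result[1:]))

def nodup(result):
    return len(set(result)) == len(result)

def gen(result):
    # All nodes at the same depth, i.e. in the same generation.
    return len({depth(n) for n in result}) <= 1

def max1(result):
    # Read here as sequence length (assumption).
    return len(result) <= 1

def unrel(result):
    return not any(related(x, y) for x, y in combinations(set(result), 2))

def lin(result):
    return all(related(x, y) for x, y in combinations(set(result), 2))
```

For example, `ord_([1, 4, 3])` is false while `nodup([1, 4, 3])` is true, and `lin([0, 2, 3])` holds because the three nodes lie on a single root-to-leaf path.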
5 Rules for the Inference of Path Properties
- We define a set of inference rules for the
deduction of the ord and nodup properties
- For example
  - The gen property is preserved by the child,
  parent, foll-sibl and prec-sibl axes.
  - If the gen property holds, then the ord property
  is preserved by the parent axis.
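The two example rules can be sketched as a small property-propagation function. This is only an illustration under stated assumptions: it encodes just the two rules quoted above and none of the others, so it is far from the full rule set, and the names `step`, `infer` and `GEN_PRESERVING` are hypothetical. A property missing from the output means "not derived by these two rules", not "false".

```python
# Axes that preserve the gen property, as stated in the first example rule.
GEN_PRESERVING = {"child", "parent", "foll-sibl", "prec-sibl"}

def step(props, axis):
    """Properties derivable after one step along `axis`, given `props` before it."""
    out = set()
    # Rule 1: gen is preserved by child, parent, foll-sibl and prec-sibl.
    if "gen" in props and axis in GEN_PRESERVING:
        out.add("gen")
    # Rule 2: if gen holds, ord is preserved by the parent axis.
    if "gen" in props and "ord" in props and axis == "parent":
        out.add("ord")
    return frozenset(out)

def infer(axes, start=frozenset({"ord", "gen"})):
    """Propagate properties over a sequence of axes.

    The default `start` assumes a single context node, whose result is
    trivially in order and of one generation.
    """
    props = start
    for axis in axes:
        props = step(props, axis)
    return props
```

For instance, `infer(["parent", "parent"])` yields both ord and gen, while `infer(["child", "parent"])` yields only gen: with just these two rules nothing lets ord survive the child step, which shows why the complete rule set is needed.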
6 Deterministic Automata
- The rules allow us to construct a deterministic
automaton that verifies whether the result of an
XPath query under the sloppy semantics has the
nodup/ord property
- For brevity, we only discuss the automaton that
checks for the ord property
7 A_ord Automaton
8 Example (1)
/desc-or-self::a/child::b/foll-sibl::b/parent::a
9 Example (2)
/ → 1
10 Example (3)
/ → 1
desc-or-self::a → 1,6
11 Example (4)
/ → 1
desc-or-self::a → 1,6
child::b → 2,3,4,5,9,10,7,8
12 Example (5)
/ → 1
desc-or-self::a → 1,6
child::b → 2,3,4,5,9,10,7,8
foll-sibl::b → 3,4,5,4,5,5,10,8
13 Example (6)
/ → 1
desc-or-self::a → 1,6
child::b → 2,3,4,5,9,10,7,8
foll-sibl::b → 3,4,5,4,5,5,10,8
parent::a → 1,1,1,1,1,1,1,6
14 Soundness and Completeness
- For the XPath fragment
  - P ::= A | P/A
  - A ::= parent | child | ancestor | descendant | ...
- the set of inference rules is sound and complete
for the ord and nodup properties
15 Conclusions
- We can derive whether a query, evaluated under the
sloppy semantics, returns a result that is free
from duplicates and/or in document order
- We can use this knowledge to
  - eliminate unnecessary ddo operations in the query
  plan
  - rewrite the query to avoid generating
  unnecessary ordering operations
16 Further Work
- Our automaton does not consider ddo operations in
the query plan: it defines no transitions for
ddo operations
- For example, if a sorting operation is performed
after every step in the expression
child/foll-sibl/child, there is no need
for sorting at the end of the expression. But our
algorithm is incapable of deducing this.
17 Further Work (2)
- How to decide where to sort?
Example: /descendant-or-self::a/child::b/parent::a