Title: XPath Query Evaluation - A Top Down Approach
1. XPath Query Evaluation - A Top Down Approach
- Mohammed Pithapurwala (mp66_at_cse.buffalo.edu)
- Pejus Das (pejusdas_at_cse.buffalo.edu)
2. Introduction
- XPath query evaluation
- Uses
  - Select nodes in an XML document
  - XSLT, XQuery
- Polynomial vs. exponential
- Top-down algorithm
3. XPath
- What is XPath?
  - child::section[position()<6]/descendant::cite/attribute::href
  - Selects all href attributes in cite elements in the first 5 sections of an article document
- Structure of an XPath expression
  - Axes
  - Node types
  - Node test
- Returns
  - Number, node set, string, boolean
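The example query can be tried with the XPath engine that ships with the JDK (`javax.xml.xpath`). This is a minimal sketch, not the implementation described in these slides: the class name `XPathExample`, the generated sample article, and the `/child::article` prefix (needed because evaluation starts at the document node) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathExample {
    // The slide's query, prefixed with /child::article (assumption) so that it
    // can be evaluated from the document node.
    static final String QUERY =
        "/child::article/child::section[position()<6]/descendant::cite/attribute::href";

    /** Builds a hypothetical article with one cite element per section. */
    static String sampleArticle(int sections) {
        StringBuilder xml = new StringBuilder("<article>");
        for (int i = 1; i <= sections; i++) {
            xml.append("<section><p><cite href='s").append(i).append(".html'/></p></section>");
        }
        return xml.append("</article>").toString();
    }

    /** Evaluates a node-set query and returns the value of each selected node. */
    static List<String> select(String xml, String query) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid query or document", e);
        }
    }
}
```

Calling `XPathExample.select(XPathExample.sampleArticle(7), XPathExample.QUERY)` on a seven-section article keeps only the href values from the first five sections, because the predicate `position()<6` filters the section nodes before the descendant step is applied.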
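4. Implementation
- XPath axes
  - child
  - parent
  - descendant
- Axis functions (primitive tree relations)
  - firstChild
  - nextSibling
- Axes expressed through the primitives
  - $\text{child} := \text{firstchild}.\text{nextsibling}^{*}$
  - $\text{parent} := (\text{nextsibling}^{-1})^{*}.\text{firstchild}^{-1}$
  - $\text{descendant} := \text{firstchild}.(\text{firstchild} \cup \text{nextsibling})^{*}$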
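The decomposition of the axes into firstchild and nextsibling can be sketched in Java as follows. This is a minimal sketch on the standard `org.w3c.dom` API, not the Element class used in the slides; the class name `AxesSketch` and the `parse` helper are hypothetical. Note that DOM counts text nodes as children and does not treat attributes as children.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class AxesSketch {
    /** Parses an XML string; wraps checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** child := firstchild.nextsibling* */
    static List<Node> child(Node x) {
        List<Node> out = new ArrayList<>();
        for (Node y = x.getFirstChild(); y != null; y = y.getNextSibling()) {
            out.add(y);
        }
        return out;
    }

    /** parent := (nextsibling^-1)*.firstchild^-1 */
    static Node parent(Node x) {
        Node first = x;
        // Undo nextsibling until the first sibling is reached.
        while (first.getPreviousSibling() != null) {
            first = first.getPreviousSibling();
        }
        // The inverse of firstchild on the first sibling is its parent.
        return first.getParentNode();
    }

    /** descendant := firstchild.(firstchild U nextsibling)* */
    static List<Node> descendant(Node x) {
        List<Node> out = new ArrayList<>();
        collect(x.getFirstChild(), out);
        return out;
    }

    // Visits n and everything reachable from it via firstchild or nextsibling,
    // in document order.
    private static void collect(Node n, List<Node> out) {
        for (; n != null; n = n.getNextSibling()) {
            out.add(n);
            collect(n.getFirstChild(), out);
        }
    }
}
```

For the document `<a><b><c/></b><d/></a>`, `child` applied to the a element yields b and d, and `descendant` yields b, c, d in document order.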
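5. Code Snippet
    // Returns the first child element of currNode, or null if it has none.
    // Assumes a JDOM-style Element whose getChildren() returns a java.util.List;
    // requires java.util.List and java.util.Iterator.
    public static Element firstChild(Element currNode) {
        Element fChild = null;
        List childNode = currNode.getChildren();
        Iterator iterator = childNode.iterator();

        if (iterator.hasNext()) {
            fChild = (Element) iterator.next();
        }
        return fChild;
    }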
6. Node Test Expressions
- Node test expression
  - $T(\text{node()})$: all nodes in the document
  - $T(\text{attribute(href)})$: all nodes labelled href
  - $\text{attribute}(S) := \text{child}(S) \cap T(\text{attribute()})$
- Node numbering
  - $<_{doc,\chi}$: the node order relative to the axis $\chi$, derived from document order
  - $idx_\chi(x, S)$: the index of node $x$ in the set $S$ with respect to $<_{doc,\chi}$
- Context
  - $c = \langle x, k, n \rangle$
  - $x$: node
  - $k$: position of the node
  - $n$: context size
- Evaluation of XPath relative to context
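The context triples can be illustrated with a small Java sketch. It assumes the node set is already given in document order and that positions run in document order for forward axes and in reverse document order for reverse axes (in XPath, ancestor and preceding-sibling are examples of reverse axes). The names `ContextSketch`, `Context` and `contexts` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ContextSketch {
    /** A context c = <x, k, n>: node, 1-based position, context size. */
    static final class Context<T> {
        final T node;
        final int position;
        final int size;

        Context(T node, int position, int size) {
            this.node = node;
            this.position = position;
            this.size = size;
        }
    }

    /**
     * Builds one context per node of a set given in document order.
     * For a reverse axis the positions are counted from the end.
     */
    static <T> List<Context<T>> contexts(List<T> docOrder, boolean reverseAxis) {
        int n = docOrder.size();
        List<Context<T>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int k = reverseAxis ? n - i : i + 1;
            out.add(new Context<>(docOrder.get(i), k, n));
        }
        return out;
    }
}
```

With three nodes and a forward axis, the last node gets position 3; with a reverse axis the same node gets position 1, while the context size is 3 in both cases.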
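7. XPath Evaluation
- Location step $\chi::t[e]$
  - $\chi \in \{\text{child}, \text{parent}, \text{descendant}, \ldots\}$: axis
  - $t$: node test expression
  - $e$: expression
- Expressions
  - $e$ evaluates to one of: node set, number, string, boolean
  - $\text{ArithOp} \in \{+, -, *, \text{div}, \text{mod}\}$
  - $\text{EqOp} \in \{=, \neq\}$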
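8. XPath Semantics
$[\![\pi]\!](\langle x, k, n \rangle) := P[\![\pi]\!](x)$
$[\![\text{position()}]\!](\langle x, k, n \rangle) := k$
$[\![\text{last()}]\!](\langle x, k, n \rangle) := n$
For all other kinds of expressions, $e = Op(e_1, \ldots, e_m)$:
$[\![Op(e_1, \ldots, e_m)]\!](c) := \mathcal{F}[\![Op]\!]([\![e_1]\!](c), \ldots, [\![e_m]\!](c))$
The semantics function $[\![e]\!]$ maps a context to a value of the expression's type.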
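9. Intuitive Algorithm
$P[\![\chi::t[e_1] \cdots [e_m]]\!](x) :=$
begin
  $S := \{ y \mid x \chi y,\ y \in T(t) \}$
  for $1 \le i \le m$ (in ascending order) do
    $S := \{ y \in S \mid [\![e_i]\!](\langle y, idx_\chi(y, S), |S| \rangle) = \text{true} \}$
  return $S$
end
$P[\![\pi_1 \mid \pi_2]\!](x) := P[\![\pi_1]\!](x) \cup P[\![\pi_2]\!](x)$
$P[\![/\pi]\!](x) := P[\![\pi]\!](\text{root})$
$P[\![\pi_1/\pi_2]\!](x) := \bigcup_{y \in P[\![\pi_1]\!](x)} P[\![\pi_2]\!](y)$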
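The predicate loop of the location-step rule filters S one predicate at a time and recomputes position and size after each filter. A minimal Java sketch of just that loop, assuming a forward axis and a candidate list already in document order; `PredicateFilterSketch`, `ContextPredicate` and `filter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class PredicateFilterSketch {
    /** A predicate e_i evaluated on a context <y, k, n>. */
    interface ContextPredicate<T> {
        boolean test(T node, int position, int size);
    }

    /**
     * Applies the predicates in ascending order. Each predicate sees positions
     * and size relative to the set left over by the previous one.
     */
    @SafeVarargs
    static <T> List<T> filter(List<T> candidates, ContextPredicate<T>... predicates) {
        List<T> s = candidates;
        for (ContextPredicate<T> e : predicates) {
            List<T> kept = new ArrayList<>();
            for (int i = 0; i < s.size(); i++) {
                // idx(y, S) is i + 1 and |S| is s.size() for the current S.
                if (e.test(s.get(i), i + 1, s.size())) {
                    kept.add(s.get(i));
                }
            }
            s = kept;
        }
        return s;
    }
}
```

Filtering four sections with `[position()>1][position()=1]` keeps the second section: the first predicate drops the first section, and the second predicate then selects position 1 of the remaining three.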
10. Runtime
- Example
  - Document: <a><b/><b/></a>
  - Query: //a/b/parent::a/b/parent::a/b
  - Construct longer queries by appending /parent::a/b
- procedure process-location-step(n0, Q)
  /* n0 is the context node; query Q is a list of location steps */
  begin
    node set S := apply Q.head to node n0;
    if (Q.tail is not empty) then
      for each node n ∈ S do process-location-step(n, Q.tail);
  end
- Complexity: $\text{Time}(|Q|) = |D|^{|Q|}$, i.e. exponential in the query size $|Q|$ for a document of size $|D|$
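The blow-up can be reproduced by counting the recursive calls of process-location-step on the example document. The following is a minimal sketch under stated assumptions: a hand-rolled tree instead of a real XML parser, only the child and parent axes, and `child::a` from the document node in place of `//a`. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NaiveEvaluation {
    /** Minimal tree node standing in for an XML element. */
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Applies one step such as "child::b" or "parent::a" to a single node. */
    static List<Node> apply(String step, Node n) {
        String[] parts = step.split("::");
        List<Node> out = new ArrayList<>();
        if (parts[0].equals("child")) {
            for (Node c : n.children) {
                if (c.name.equals(parts[1])) {
                    out.add(c);
                }
            }
        } else if (parts[0].equals("parent")) {
            if (n.parent != null && n.parent.name.equals(parts[1])) {
                out.add(n.parent);
            }
        } else {
            throw new IllegalArgumentException("Unsupported axis: " + parts[0]);
        }
        return out;
    }

    /** process-location-step; returns the number of calls it triggers. */
    static long process(Node n0, List<String> q) {
        long calls = 1;
        List<Node> s = apply(q.get(0), n0);
        if (q.size() > 1) {
            for (Node n : s) {
                calls += process(n, q.subList(1, q.size()));
            }
        }
        return calls;
    }

    /** Calls needed for a/b followed by `repeats` copies of /parent::a/b. */
    static long callsFor(int repeats) {
        Node root = new Node("#document", null);
        Node a = new Node("a", root);
        new Node("b", a);
        new Node("b", a);
        List<String> q = new ArrayList<>(Arrays.asList("child::a", "child::b"));
        for (int i = 0; i < repeats; i++) {
            q.add("parent::a");
            q.add("child::b");
        }
        return process(root, q);
    }
}
```

In this sketch every appended `/parent::a/b` roughly doubles the work: `callsFor(0)` is 2, `callsFor(1)` is 6, `callsFor(2)` is 14, and `callsFor(10)` is 4094, although the result never contains more than the two b nodes.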
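11. Algorithm
$S_\downarrow[\![\chi::t[e_1] \cdots [e_m]]\!](X_1, \ldots, X_k) :=$
begin
  $S := \{ \langle x, y \rangle \mid x \in \bigcup_{i=1}^{k} X_i,\ x \chi y,\ \text{and } y \in T(t) \}$
  for each $1 \le i \le m$ (in ascending order) do
  begin
    fix some order $\vec{S} = \langle x_1, y_1 \rangle, \ldots, \langle x_l, y_l \rangle$ for $S$
    $\langle r_1, \ldots, r_l \rangle := \mathcal{E}_\downarrow[\![e_i]\!](t_1, \ldots, t_l)$
      where $t_j = \langle y_j, idx_\chi(y_j, S_j), |S_j| \rangle$
      and $S_j := \{ z \mid \langle x_j, z \rangle \in S \}$
    $S := \{ \langle x_i, y_i \rangle \mid r_i \text{ is true} \}$
  end
  for each $1 \le i \le k$ do
    $R_i := \{ y \mid \langle x, y \rangle \in S,\ x \in X_i \}$
  return $\langle R_1, \ldots, R_k \rangle$
end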
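12. Algorithm (contd.)
$S_\downarrow[\![/\pi]\!](X_1, \ldots, X_k) := S_\downarrow[\![\pi]\!](\underbrace{\{\text{root}\}, \ldots, \{\text{root}\}}_{k \text{ times}})$
$S_\downarrow[\![\pi_1/\pi_2]\!](X_1, \ldots, X_k) := S_\downarrow[\![\pi_2]\!](S_\downarrow[\![\pi_1]\!](X_1, \ldots, X_k))$
$S_\downarrow[\![\pi_1 \mid \pi_2]\!](X_1, \ldots, X_k) := S_\downarrow[\![\pi_1]\!](X_1, \ldots, X_k) \cup_{\langle\rangle} S_\downarrow[\![\pi_2]\!](X_1, \ldots, X_k)$
Here $\cup_{\langle\rangle}$ denotes the componentwise union of two tuples of node sets.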
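The composition rule for $\pi_1/\pi_2$ is what avoids the exponential blow-up: each step is applied once to a whole set of context nodes, and duplicates collapse in the result. A minimal Java sketch of that idea, restricted to predicate-free paths, a single input set instead of a tuple $X_1, \ldots, X_k$, and the child and parent axes only. The tree class and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TopDownSketch {
    /** Minimal tree node standing in for an XML element. */
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Applies one location step to a whole set of context nodes. */
    static Set<Node> step(String step, Set<Node> xs) {
        String[] parts = step.split("::");
        Set<Node> out = new LinkedHashSet<>();
        for (Node x : xs) {
            if (parts[0].equals("child")) {
                for (Node c : x.children) {
                    if (c.name.equals(parts[1])) {
                        out.add(c);
                    }
                }
            } else if (parts[0].equals("parent")) {
                if (x.parent != null && x.parent.name.equals(parts[1])) {
                    out.add(x.parent); // the set removes duplicates
                }
            } else {
                throw new IllegalArgumentException("Unsupported axis: " + parts[0]);
            }
        }
        return out;
    }

    /** S[[p1/p2]](X) = S[[p2]](S[[p1]](X)): one set operation per step. */
    static Set<Node> evaluate(List<String> path, Set<Node> xs) {
        Set<Node> current = xs;
        for (String s : path) {
            current = step(s, current);
        }
        return current;
    }

    /** Result size for a/b followed by `repeats` copies of /parent::a/b. */
    static int resultSizeFor(int repeats) {
        Node root = new Node("#document", null);
        Node a = new Node("a", root);
        new Node("b", a);
        new Node("b", a);
        List<String> q = new ArrayList<>();
        q.add("child::a");
        q.add("child::b");
        for (int i = 0; i < repeats; i++) {
            q.add("parent::a");
            q.add("child::b");
        }
        Set<Node> start = new LinkedHashSet<>();
        start.add(root);
        return evaluate(q, start).size();
    }
}
```

Because every intermediate set holds at most as many nodes as the document, the number of set operations grows linearly with the number of steps; `resultSizeFor(1000)` finishes immediately and still returns the two b nodes.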
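13. Semantics Function
$\mathcal{E}_\downarrow[\![\pi]\!](\langle x_1, k_1, n_1 \rangle, \ldots, \langle x_l, k_l, n_l \rangle) := S_\downarrow[\![\pi]\!](\{x_1\}, \ldots, \{x_l\})$
$\mathcal{E}_\downarrow[\![\text{position()}]\!](\langle x_1, k_1, n_1 \rangle, \ldots, \langle x_l, k_l, n_l \rangle) := \langle k_1, \ldots, k_l \rangle$
$\mathcal{E}_\downarrow[\![\text{last()}]\!](\langle x_1, k_1, n_1 \rangle, \ldots, \langle x_l, k_l, n_l \rangle) := \langle n_1, \ldots, n_l \rangle$
For the remaining kinds of expressions:
$\mathcal{E}_\downarrow[\![Op(e_1, \ldots, e_m)]\!](c_1, \ldots, c_l) := \mathcal{F}[\![Op]\!]_{\langle\rangle}(\mathcal{E}_\downarrow[\![e_1]\!](c_1, \ldots, c_l), \ldots, \mathcal{E}_\downarrow[\![e_m]\!](c_1, \ldots, c_l))$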
14. Benchmark results in seconds for IE6 vs. the implementation
Q    IE6 (10, 20, 200)*    Top Down Algorithm
                           10      20      200     500     1000
1                          0.00    0.00    0.00    0.00    0.00
2    2                     0.00    0.00    0.02    0.12    0.57
3    346                   0.00    0.00    0.02    0.23    1.14
4    1                     0.00    0.00    0.05    0.33    1.70
5    21                    0.00    0.00    0.07    0.44    2.32
6    5, 406                0.00    0.00    0.09    0.54    2.88
7    42                    0.00    0.00    0.12    0.67    3.45
8    437                   0.00    0.00    0.14    0.78    4.03
* IE6 values are listed in the order they appear; their assignment to the 10, 20 and 200 columns is not preserved.
15. References
- G. Gottlob, Ch. Koch, R. Pichler. XPath Processing in a Nutshell. SIGMOD Record, March 2003.
- G. Gottlob, Ch. Koch, R. Pichler. Efficient Algorithms for Processing XPath Queries. ACM TODS, to appear.