Title: Structural Joins: A Primitive for Efficient XML Query Pattern Matching
1Structural Joins: A Primitive for Efficient XML Query Pattern Matching
2Example XPath Query
- book[title='XML']//author[.='jane']
- Structural Relationships
- book/title, title/XML, book//author, author/jane
- [Figure: query pattern tree with nodes book, title, author, XML, jane]
3Overview
- Range-based XML Numbering Scheme
- (DocId, StartPos:EndPos, LevelNum)
- Structural Relationships
- (D2, S2:E2, L2) is a descendant of (D1, S1:E1, L1) iff D1 = D2, S1 < S2 and E2 < E1
- Parent-child: above conditions + L1 + 1 = L2
- 2 Families of Structural Join Algorithms
- Tree-Merge: Anc, Desc
- Stack-Tree: Anc, Desc
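The two containment tests above can be written down directly. The following is a minimal sketch; the `Node` type, the function names, and the sample positions are illustrative assumptions, not taken from the slides.

```python
from typing import NamedTuple

class Node(NamedTuple):
    """One element in the range-based numbering scheme (names are illustrative)."""
    doc_id: int
    start: int
    end: int
    level: int

def is_descendant(d: Node, a: Node) -> bool:
    # Same document, and d's interval lies strictly inside a's interval.
    return a.doc_id == d.doc_id and a.start < d.start and d.end < a.end

def is_child(d: Node, a: Node) -> bool:
    # Descendant test plus the level condition L1 + 1 = L2.
    return is_descendant(d, a) and a.level + 1 == d.level

# Hypothetical positions for a small book document.
book = Node(doc_id=1, start=1, end=20, level=1)
title = Node(doc_id=1, start=2, end=5, level=2)
author = Node(doc_id=1, start=6, end=9, level=3)

print(is_child(title, book))        # True: book/title
print(is_descendant(author, book))  # True: book//author
print(is_child(author, book))       # False: author is two levels below book
```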
4A Sample XML Document Fragment Tree Representation
5Structural Join Algorithms
- AList: [a1, a2, ...]
- list of potential ancestors, sorted on StartPos
- DList: [d1, d2, ...]
- list of potential descendants, sorted on StartPos
- OutputList: [(ai, dj), ...]
- join results, sorted
- Either by (DocId, ai.StartPos, dj.StartPos) // Anc version
- Or by (DocId, dj.StartPos, ai.StartPos) // Desc version
6Algorithm Tree-Merge-Anc
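The algorithm listing for this slide did not survive extraction. As a stand-in, here is a minimal sketch of a tree-merge join that emits results in ancestor order. It assumes a single document (DocId is omitted), both lists sorted on StartPos, and illustrative names (`Node`, `tree_merge_anc`); it is not the authors' exact pseudocode.

```python
from typing import NamedTuple

class Node(NamedTuple):
    name: str
    start: int
    end: int
    level: int

def tree_merge_anc(alist, dlist, parent_child=False):
    """Outer loop over ancestors, inner scan over descendants."""
    out = []
    begin = 0  # first DList position that can still match the current ancestor
    for a in alist:
        # Descendants starting before a cannot match a or any later ancestor.
        while begin < len(dlist) and dlist[begin].start < a.start:
            begin += 1
        j = begin
        # Scan only the descendants that start inside a's interval.
        while j < len(dlist) and dlist[j].start < a.end:
            d = dlist[j]
            if a.start < d.start and d.end < a.end:
                if not parent_child or a.level + 1 == d.level:
                    out.append((a.name, d.name))
            j += 1
    return out

# Hypothetical data: a1 contains a2, d1 and d2; a2 contains d1 only.
a1, a2 = Node("a1", 1, 12, 1), Node("a2", 3, 8, 2)
d1, d2 = Node("d1", 4, 5, 3), Node("d2", 9, 10, 2)
print(tree_merge_anc([a1, a2], [d1, d2]))
# [('a1', 'd1'), ('a1', 'd2'), ('a2', 'd1')]
```

Note that the inner scan restarts from `begin` for every ancestor, so nested ancestors rescan the same descendants.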
7Algorithm Tree-Merge-Desc
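The listing for this slide is also missing. A minimal sketch of the descendant-ordered variant follows, under the same assumptions (single document, lists sorted on StartPos, illustrative names); it swaps the loops so the output is sorted on the descendant's StartPos.

```python
from typing import NamedTuple

class Node(NamedTuple):
    name: str
    start: int
    end: int
    level: int

def tree_merge_desc(alist, dlist, parent_child=False):
    """Outer loop over descendants, inner scan over ancestors."""
    out = []
    begin = 0  # first AList position that can still match the current descendant
    for d in dlist:
        # Leading ancestors that end before d starts cannot match d or later ones.
        while begin < len(alist) and alist[begin].end < d.start:
            begin += 1
        i = begin
        # Scan only the ancestors that start before d.
        while i < len(alist) and alist[i].start < d.start:
            a = alist[i]
            if d.end < a.end:
                if not parent_child or a.level + 1 == d.level:
                    out.append((a.name, d.name))
            i += 1
    return out

# Hypothetical data: a1 contains a2, d1 and d2; a2 contains d1 only.
a1, a2 = Node("a1", 1, 12, 1), Node("a2", 3, 8, 2)
d1, d2 = Node("d1", 4, 5, 3), Node("d2", 9, 10, 2)
print(tree_merge_desc([a1, a2], [d1, d2]))
# [('a1', 'd1'), ('a2', 'd1'), ('a1', 'd2')]
```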
8Stack-Tree Algorithms
- Depth first traversal of XML tree
- Conceptual merge of AList nodes and DList nodes on StartPos
- Stack of AList nodes
- Node pushed onto the stack is a descendant of the node below it on the stack
- 3 cases (Stack-Tree-Desc version)
- AList/DList node is not a descendant of stack top: pop
- AList node is a descendant of stack top: push
- DList node is a descendant of stack top: output
9Algorithm Stack-Tree-Desc (parent/child case)
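- Input (merged on StartPos): a1, d1, a2, d2, ..., an, dn, dn+1, dn+2, ..., d2n
- [Figure: tree in which a1 is the parent of d1, a2 and d2n; a2 is the parent of d2, a3 and d2n-1; ...; an is the parent of dn and dn+1]
- Stack (top to bottom): an, ..., a2, a1
- Pop test: e.startPos > stack->top.endPos ?
- Output: (a1,d1), (a2,d2), ..., (an-1,dn-1), (an,dn), (an,dn+1), (an-1,dn+2), ..., (a3,d2n-2), (a2,d2n-1), (a1,d2n)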
10Algorithm Stack-Tree-Desc
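The listing for this slide is missing from the extracted text. The sketch below follows the three stack cases (pop, push, output) and is checked against the parent/child example with n = 2. It assumes a single document and lists sorted on StartPos; the names and positions are illustrative, not the authors' pseudocode.

```python
from typing import NamedTuple

class Node(NamedTuple):
    name: str
    start: int
    end: int
    level: int

def stack_tree_desc(alist, dlist, parent_child=False):
    out, stack = [], []
    i = j = 0
    # Stop once descendants run out, or no ancestor is left to match them.
    while j < len(dlist) and (i < len(alist) or stack):
        a = alist[i] if i < len(alist) else None
        d = dlist[j]
        next_start = min(a.start, d.start) if a is not None else d.start
        # Case 1: next node is not inside the stack top, so pop.
        while stack and stack[-1].end < next_start:
            stack.pop()
        if a is not None and a.start < d.start:
            # Case 2: next AList node nests inside the stack top, so push.
            stack.append(a)
            i += 1
        else:
            # Case 3: d lies inside every stack node; emit bottom to top.
            for s in stack:
                if not parent_child or s.level + 1 == d.level:
                    out.append((s.name, d.name))
            j += 1
    return out

# Slide example with n = 2: a1 d1 a2 d2 d3 d4 (positions are hypothetical).
a1, a2 = Node("a1", 1, 12, 1), Node("a2", 4, 9, 2)
d1, d2 = Node("d1", 2, 3, 2), Node("d2", 5, 6, 3)
d3, d4 = Node("d3", 7, 8, 3), Node("d4", 10, 11, 2)
print(stack_tree_desc([a1, a2], [d1, d2, d3, d4], parent_child=True))
# [('a1', 'd1'), ('a2', 'd2'), ('a2', 'd3'), ('a1', 'd4')]
```

Each input node is pushed and popped at most once, so no list is rescanned.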
11Algorithm Stack-Tree-Anc
- Problem
- Sorting on StartPos of DList nodes is natural/easy
- Sorting on StartPos of AList nodes is not
- Solution
- keep 2 lists of matching descendant nodes with each stack node
- self-list
- inherit-list
12Algorithm Stack-Tree-Anc(parent/child case)
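- Input (merged on StartPos): a1, d1, a2, d2, ..., an, dn, dn+1, dn+2, ..., d2n
- Pop test: e.startPos > stack->top.endPos ?
- [Figure: stack from top to bottom, with the self-list and inherit-list of each stack node]
- an: self-list (an,dn), (an,dn+1)
- an-1: self-list (an-1,dn-1), ...; inherit-list (an,dn), (an,dn+1)
- ...
- a2: self-list (a2,d2), (a2,d2n-1); inherit-list (a3,d3), (a3,d2n-2), ..., (an,dn), (an,dn+1)
- a1: self-list (a1,d1), (a1,d2n); inherit-list (a2,d2), (a2,d2n-1), ..., (an,dn), (an,dn+1)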
13Algorithm Stack-Tree-Anc
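The listing for this slide is missing. The sketch below shows one way the self-list and inherit-list idea can produce ancestor-ordered output: pairs are buffered per stack node and released only when the bottom of the stack is popped. It assumes a single document and lists sorted on StartPos; names and positions are illustrative, not the authors' pseudocode.

```python
from typing import NamedTuple

class Node(NamedTuple):
    name: str
    start: int
    end: int
    level: int

def stack_tree_anc(alist, dlist, parent_child=False):
    out = []
    stack = []  # each entry: [node, self_list, inherit_list]

    def pop():
        _node, self_list, inherit_list = stack.pop()
        pairs = self_list + inherit_list
        if stack:
            stack[-1][2].extend(pairs)  # hand pairs to the new top's inherit-list
        else:
            out.extend(pairs)           # bottom node popped: safe to output

    i = j = 0
    while j < len(dlist) and (i < len(alist) or stack):
        a = alist[i] if i < len(alist) else None
        d = dlist[j]
        next_start = min(a.start, d.start) if a is not None else d.start
        while stack and stack[-1][0].end < next_start:
            pop()
        if a is not None and a.start < d.start:
            stack.append([a, [], []])
            i += 1
        else:
            # Record d in the self-list of every matching stack node.
            for entry in stack:
                if not parent_child or entry[0].level + 1 == d.level:
                    entry[1].append((entry[0].name, d.name))
            j += 1
    while stack:  # flush whatever is still buffered
        pop()
    return out

# Slide example with n = 2: a1 d1 a2 d2 d3 d4 (positions are hypothetical).
a1, a2 = Node("a1", 1, 12, 1), Node("a2", 4, 9, 2)
d1, d2 = Node("d1", 2, 3, 2), Node("d2", 5, 6, 3)
d3, d4 = Node("d3", 7, 8, 3), Node("d4", 10, 11, 2)
print(stack_tree_anc([a1, a2], [d1, d2, d3, d4], parent_child=True))
# [('a1', 'd1'), ('a1', 'd4'), ('a2', 'd2'), ('a2', 'd3')]
```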
14Experiment XML Data Queries
15Experimental Results
16Efficient Structural Joins on Indexed XML Documents
17Motivation (Why use indices?)
18A Sample XML Document
19XML Data Indexed with B+ Tree
- Key (DocID, tag name, StartPos)
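To illustrate what the composite key allows, the sketch below uses a sorted Python list with `bisect` as a stand-in for the leaf level of the tree index. The entries, the tuple layout, and the helper `first_after` are assumptions made for illustration only. The point is that one key lookup finds the first element of a given tag that starts after a given position, so a join can skip elements instead of scanning them.

```python
import bisect

# Entries as (doc_id, tag, start_pos, end_pos, level); values are hypothetical.
entries = sorted([
    (1, "book", 1, 20, 1),
    (1, "title", 2, 5, 2),
    (1, "author", 6, 9, 3),
    (1, "author", 14, 17, 3),
])
# The index key is the (DocID, tag name, StartPos) prefix of each entry.
keys = [(e[0], e[1], e[2]) for e in entries]

def first_after(doc_id, tag, pos):
    """Return the first `tag` element in `doc_id` with StartPos > pos, or None."""
    idx = bisect.bisect_right(keys, (doc_id, tag, pos))
    if idx < len(entries) and entries[idx][:2] == (doc_id, tag):
        return entries[idx]
    return None

print(first_after(1, "author", 9))   # (1, 'author', 14, 17, 3)
print(first_after(1, "author", 14))  # None
```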
20Algorithm Anc_Des_B+
21Typo
- Section 3. Structural Join using B+-trees [Chie02]
- 4-th paragraph (i.e. 1-st paragraph of the right column of 4-th page)
- Correction
- Original: Figure 3a depicts ... (2) pop a3 and a2 from the stack ... A list ... (4) push ... as follows: after a2 is popped from the stack, directly go to a14. Here ... than a2.end.
- Corrected: Figure 3a depicts ... (2) pop a3 from the stack ... A list ... pop a2 ... (4) push ... as follows: after a3 is popped from the stack, directly go to a14 (after popping up a2). Here ... than a2.end.
- Note
- The paragraph above describes how algorithm Stack_Tree_Desc [Alkh02] would work for the case of Fig. 3(a). According to algorithm Stack_Tree_Desc [Alkh02], the corrected description is accurate.