Title: On the Minimization of XPath Queries
1On the Minimization of XPath Queries
- Paper by
- S. Flesca, F. Furfaro, E. Masciari
- CmpE 521 Presentation
- Emre Yurtsever 2002701372
- Muammer Yüzügüldü 2003700183
2Outline
- Introduction
- Trees and Tree Patterns
- Problem Statement
- A Framework for minimizing XPath queries
- Complexity Results
- Tractability Results
- Conclusions and Future Works
3Introduction
- XML queries are usually expressed by means of XPath expressions.
- XPath expressions
- A way of navigating an XML tree to return the set of nodes reached through the paths specified by the expression.
4Introduction contd
- An XPath expression can be represented
graphically as a tree pattern.
An XML Tree...
5Introduction contd
- For example
- Find the titles of all the books for which at least one author is known.
- XPath Expression
- bib/book[//author]/title
[Figure: a tree pattern, with the descendant edge and the output node marked]
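The query on this slide can be tried out on a small document. The following is a minimal sketch, assuming the query is read as "from the bib root, take each book child that has some author descendant, and return its title children". The sample XML and the function name `titles_of_books_with_author` are invented for illustration, and the predicate is checked by hand instead of relying on a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
XML = """
<bib>
  <book><title>T1</title><author>A. Writer</author></book>
  <book><title>T2</title></book>
  <book><title>T3</title><chapter><author>B. Writer</author></chapter></book>
</bib>
"""


def titles_of_books_with_author(root):
    """Return the titles of the books that have at least one author descendant."""
    result = []
    for book in root.findall("book"):            # child step from bib to book
        if book.find(".//author") is not None:   # predicate: some author below book
            # child step from book to title (the output node)
            result.extend(t.text for t in book.findall("title"))
    return result


root = ET.fromstring(XML)
print(titles_of_books_with_author(root))  # ['T1', 'T3']
```

The second book is skipped because it has no author, while the third one qualifies through a descendant edge (its author sits below a chapter element).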
6Introduction contd
- The efficiency of evaluating an XPath expression depends on its size.
- Optimization can therefore be achieved through minimization.
- We should minimize the expression.
7Introduction contd
- Example Query
- Retrieve the editors that published thrillers and whose authors have written a thriller.
- Query containment
- Reduced Query
- Retrieve the editors that published thrillers.
8Introduction contd
- The minimization problem for some XPath fragments can be solved efficiently, because:
- It can be reduced to solving a number of instances of containment between pairs of tree patterns.
- For these fragments, containment can be reduced to finding a homomorphism between the two patterns.
9Trees and Tree Patterns
- A tree t is a tuple $(r_t, N_t, E_t, \lambda_t)$ where
- $N_t \subseteq \mathbb{N}$ is the set of nodes.
- $\lambda_t : N_t \rightarrow \Sigma$ is a node labelling function.
- $r_t \in N_t$ is the distinguished root of t.
- $E_t \subseteq N_t \times N_t$ is the set of edges.
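10Trees and Tree Patterns contd
- Given a tree $t = (r_t, N_t, E_t, \lambda_t)$
- A tree $t' = (r_{t'}, N_{t'}, E_{t'}, \lambda_{t'})$ is a subtree of t if
- $N_{t'} \subseteq N_t$
- The edge $(n_i, n_j)$ belongs to $E_{t'}$ iff $n_i \in N_{t'}$, $n_j \in N_{t'}$ and $(n_i, n_j) \in E_t$.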
11Trees and Tree Patterns contd
- Definition
- A tree pattern p is a pair $(t_p, o_p)$, where
- $t_p = (r_p, N_p, E_p, \lambda_p)$ is a tree.
- $E_p$ is partitioned into the two disjoint sets $C_p$ and $D_p$ denoting, respectively, the child and descendant branches.
- $o_p \in N_p$ is a distinguished output node.
12Trees and Tree Patterns contd
- Grammar for XPath expressions
- $exp \rightarrow exp/exp \mid exp//exp \mid exp[exp] \mid \sigma \mid * \mid .$
- where $\sigma$ is a symbol in $\Sigma$, and the symbol . stands for the current node.
- Given XPath expression
- ab///c//d
[Figure: tree pattern of the expression, with nodes a, b, c and d]
13Trees and Tree Patterns contd
- Given a tree t and a tree pattern p, an embedding e of p into t is a total function $e : N_p \rightarrow N_t$ such that
- $e(r_p) = r_t$,
- $\forall (x, y) \in C_p$, $e(y)$ is a child of $e(x)$ in t,
- $\forall (x, y) \in D_p$, $e(y)$ is a descendant of $e(x)$ in t, and
- $\forall x \in N_p$, if $\lambda_p(x) = a$ (where $a \in \Sigma$) then $\lambda_t(e(x)) = a$.
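The four conditions of an embedding translate directly into a recursive check. The following is a minimal sketch, not taken from the paper: the class names `TNode` and `PNode`, the function names, and the use of the label "*" for a pattern node that matches any symbol are assumptions made for illustration. Because an embedding is a total function that need not be injective, each branch of the pattern can be checked independently of the others.

```python
from dataclasses import dataclass, field


@dataclass
class TNode:
    """A node of a tree t: a label plus its children."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class PNode:
    """A node of a tree pattern p (hypothetical representation)."""
    label: str                                 # a symbol, or "*" for a wildcard
    child: list = field(default_factory=list)  # targets of child edges (C_p)
    desc: list = field(default_factory=list)   # targets of descendant edges (D_p)


def descendants(n):
    """Yield every proper descendant of tree node n."""
    for c in n.children:
        yield c
        yield from descendants(c)


def can_map(x, n):
    """True if the subpattern rooted at x can be embedded with e(x) = n."""
    if x.label != "*" and x.label != n.label:  # label condition
        return False
    return (all(any(can_map(y, c) for c in n.children) for y in x.child) and
            all(any(can_map(y, d) for d in descendants(n)) for y in x.desc))


def has_embedding(p_root, t_root):
    """True if some embedding maps the pattern root to the tree root."""
    return can_map(p_root, t_root)


# Pattern: bib with a book child that has a title child and an author descendant.
p = PNode("bib", child=[PNode("book", child=[PNode("title")], desc=[PNode("author")])])
t = TNode("bib", [TNode("book", [TNode("title"), TNode("chapter", [TNode("author")])])])
print(has_embedding(p, t))  # True
```

The sketch only decides whether an embedding exists; it does not return the images of the output node.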
14Trees and Tree Patterns contd
- Models and Canonical Models of Tree Patterns
- The models of a tree pattern p defined over the alphabet $\Sigma$ are the trees of $T_\Sigma$ into which p can be embedded. The set of models of p is $Mod(p) = \{ t \in T_\Sigma \mid p(t) \neq \emptyset \}$.
- Canonical models of a tree pattern p are models having the same shape as p. That is, a canonical
15Trees and Tree Patterns contd
- Model and Canonical Model of a tree pattern
16Trees and Tree Patterns contd
- Given two tree patterns $p_1$, $p_2$, we say that $p_1$ is contained in $p_2$ ($p_1 \subseteq p_2$) iff $\forall t \; p_1(t) \subseteq p_2(t)$.
- We say that $p_1$ and $p_2$ are equivalent ($p_1 \equiv p_2$) iff $p_1 \subseteq p_2$ and $p_2 \subseteq p_1$ (i.e. $\forall t \; p_1(t) = p_2(t)$).
- The set of patterns which are equivalent to a given pattern p will be denoted as Eq(p).
17Trees and Tree Patterns contd
- Notations on tree patterns.
A pattern p and its subpatterns spb, spd, spa
18Trees and Tree Patterns contd
- Tree pattern p whose root has 2 children
Subpattern examples
19Problem Statement
- Given a tree pattern p, construct a tree pattern $p_{min}$ which is equivalent to p and has minimum size (i.e. $size(p_{min}) = minsize(p)$).
20Problem Statement contd
- A minimum-size tree pattern equivalent to p can be found among the subpatterns of p.
- For some XPath fragments, the containment between two tree patterns p, q ($p \subseteq q$) is equivalent to the problem of finding a homomorphism from q to p. A homomorphism h from a pattern q to a pattern p is a total mapping from the nodes of q to the nodes of p such that
- h preserves node types (i.e. $\forall u \in N_q$, $\lambda_q(u) \neq {*} \Rightarrow \lambda_q(u) = \lambda_p(h(u))$)
- h preserves structural relationships (i.e. whenever v is a child (resp. descendant) of u in q, h(v) is a child (resp. descendant) of h(u) in p).
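The two homomorphism conditions can be checked with a short recursive procedure. The following is a minimal sketch under stated assumptions: the `PNode` class and the function names are hypothetical, the root of q is required to map to the root of p, and output nodes are ignored, so the patterns are treated as boolean filters.

```python
from dataclasses import dataclass, field


@dataclass
class PNode:
    """A tree pattern node (hypothetical representation)."""
    label: str                                 # a symbol, or "*" for a wildcard
    child: list = field(default_factory=list)  # targets of child edges
    desc: list = field(default_factory=list)   # targets of descendant edges


def proper_descendants(u):
    """Yield every node reachable from u through one or more edges of any kind."""
    for v in u.child + u.desc:
        yield v
        yield from proper_descendants(v)


def hom_at(u, v):
    """True if the subpattern of q rooted at u maps homomorphically with h(u) = v."""
    if u.label != "*" and u.label != v.label:  # h preserves node types
        return False
    # h preserves structure: child edges go to child edges,
    # descendant edges go to any downward path in p.
    return (all(any(hom_at(c, w) for w in v.child) for c in u.child) and
            all(any(hom_at(d, w) for w in proper_descendants(v)) for d in u.desc))


def has_homomorphism(q_root, p_root):
    """True if there is a homomorphism from q to p mapping root to root."""
    return hom_at(q_root, p_root)


p = PNode("a", child=[PNode("b"), PNode("c")])  # a with children b and c
q = PNode("a", child=[PNode("c")])              # a with child c
print(has_homomorphism(q, p))  # True: p is contained in q
print(has_homomorphism(p, q))  # False: the b branch of p has no image in q
```

Whenever such a homomorphism from q to p exists, p is contained in q, because composing it with any embedding of p yields an embedding of q. The converse direction only holds for the fragments mentioned on the slide.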
21Problem Statement contd
A homomorphism between two tree patterns
22Problem Statement contd
Two tree patterns not related by a homomorphism
23A framework for minimizing XPath Queries
- Two fundamental contributions
- Proving that Property 1 holds for XP{/, //, [ ], *}
- An algorithm for minimizing a tree pattern query
24Proving that Property 1 holds for XP{/, //, [ ], *}
- Various lemmas are introduced
- Lemma 1: Let p and q be two patterns with root r, such that $p \subseteq q$. Then, for each subpattern $q_j \in P(q)$ there exists a subpattern $p_i \in P(p)$ s.t. $p_i \subseteq q_j$.
25Example for Lemma 1
26Proving that Property 1 holds for XP{/, //, [ ], *}
- Lemma 2: Let p and q be two patterns rooted in r s.t. $p \equiv q$, and let m and n, with $m > n$, be the number of children of r in p and, respectively, q. Then there exists a set $S \subseteq SP(p)$ consisting of $m - n$ subpatterns $sp_i$, such that $p - S \equiv p$.
27Example for Lemma 2
28Proving that Property 1 holds for XP{/, //, [ ], *}
- Lemma 3: Let p and q be two equivalent patterns rooted in r having the same number of child and descendant nodes of r, and let q be of minimum size. Then there does not exist a subpattern $sp_k \in SP(p)$ such that $p - sp_k \equiv p$.
29Proving that Property 1 holds for XP{/, //, [ ], *}
- Lemma 4: Let p and q be two equivalent patterns whose roots have the same number of child and descendant nodes, and let q be of minimum size. For each subpattern $p_i \in P(p)$ there exists a unique subpattern $q_j \in P(q)$ directly connected to $r_q$ s.t. $p_i \equiv q_j$.
30Proving that Property 1 holds for XP{/, //, [ ], *}
- Lemma 5: A pattern p in XP{/, //, [ ], *} is not of minimum size iff at least one of the following conditions holds
- there exists a pair of subpatterns $p_i$, $p_j$ s.t. $p_i \subseteq p_j$
- there exists a subpattern $p_i$ of p which is not of minimum size.
31Proving that Property 1 holds for XP{/, //, [ ], *}
- Theorem 1: Given a pattern p in XP{/, //, [ ], *}, if $minsize(p) = k$ then there exists a subpattern $p_{min}$ of p such that $p \equiv p_{min}$ and $size(p_{min}) = k$.
32An Algorithm for tree pattern minimization
- Function Minimize(p: a tree pattern): pmin, a minimum tree pattern equivalent to p
- Begin
- pmin := p
- For each sp_i ∈ SP(pmin) do
- if (pmin − sp_i ⊆ pmin)
- pmin := pmin − sp_i
- SPnew := ∅
- For each sp_i ∈ SP(pmin) do
- SPnew := SPnew ∪ Minimize(sp_i)
- pmin := assemble(pmin, SPnew)
- return pmin
- End
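The pruning loop of Algorithm 1 can be sketched in a few lines. This is an illustration under explicit assumptions, not the paper's implementation: the containment test is replaced by a homomorphism check, which is always sufficient for containment but can miss redundant branches in the full fragment with wildcards and descendant edges; output nodes are ignored; and the names `PNode`, `contained`, `without`, `minimize` and `size` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PNode:
    """A tree pattern node (hypothetical representation)."""
    label: str                                 # a symbol, or "*" for a wildcard
    child: list = field(default_factory=list)  # targets of child edges
    desc: list = field(default_factory=list)   # targets of descendant edges


def proper_descendants(u):
    for v in u.child + u.desc:
        yield v
        yield from proper_descendants(v)


def hom_at(u, v):
    """True if the pattern rooted at u maps homomorphically with u sent to v."""
    if u.label != "*" and u.label != v.label:
        return False
    return (all(any(hom_at(c, w) for w in v.child) for c in u.child) and
            all(any(hom_at(d, w) for w in proper_descendants(v)) for d in u.desc))


def contained(p, q):
    """Sufficient (not complete) test for p ⊆ q: a homomorphism from q to p."""
    return hom_at(q, p)


def without(p, kind, i):
    """Copy of the root of p with the i-th branch of the given kind removed."""
    child = p.child[:i] + p.child[i + 1:] if kind == "child" else list(p.child)
    desc = p.desc[:i] + p.desc[i + 1:] if kind == "desc" else list(p.desc)
    return PNode(p.label, child, desc)


def minimize(p):
    pmin = PNode(p.label, list(p.child), list(p.desc))
    # Step 1: drop each root branch whose removal keeps the pattern equivalent.
    for kind in ("child", "desc"):
        i = 0
        while i < len(getattr(pmin, kind)):
            candidate = without(pmin, kind, i)
            if contained(candidate, pmin):  # pmin - sp_i ⊆ pmin
                pmin = candidate
            else:
                i += 1
    # Step 2: minimize the surviving subpatterns and reassemble.
    pmin.child = [minimize(c) for c in pmin.child]
    pmin.desc = [minimize(d) for d in pmin.desc]
    return pmin


def size(p):
    """Number of nodes of the pattern."""
    return 1 + sum(size(x) for x in p.child + p.desc)


# a with two child branches: b, and b with child c. The plain b branch is redundant.
p = PNode("a", child=[PNode("b"), PNode("b", child=[PNode("c")])])
print(size(p), size(minimize(p)))  # 4 3
```

Only the direction pmin − sp_i ⊆ pmin needs testing, since removing a branch can only make a pattern less restrictive.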
33Upper Bound
- Algorithm 1 works in $O(b \cdot r \cdot |p|^2 \cdot (w+1)^{d+1})$
- |p| is the size of p
- d is the number of descendant edges in p
- w is the length of the longest chain of * nodes in p
- b is the number of branches of p
- r is the maximum degree of any node of p
34Complexity Results
- In XP{/, //, [ ], *}, it is not possible to define an algorithm performing much better than Algorithm 1.
- Lemma 6: Let p be a pattern in XP{/, //, [ ], *} and k a positive integer. The problem of testing whether $minsize(p) > k$ is NP-complete.
35Complexity Results
- Theorem 2: Let p be a pattern in XP{/, //, [ ], *} and k a positive integer. The problem of testing whether there exists a pattern p' equivalent to p such that $size(p') < k$ is coNP-complete.
36Tractability Results
- Definition: A limited branched tree pattern p is a tree pattern in XP{/, //, [ ], *} such that
- Every non-leaf node of p may have any number of children
- If a node n has k children $n_1 \ldots n_k$, then at least $k-1$ of the patterns $sp_{n_i}$ (where $i \in [1..k]$) are linear.
37Example
38Tractability Results
- Theorem: Let p be a limited branched tree pattern. A minimum pattern pmin equivalent to p can be found in polynomial time (w.r.t. the size of p).
- Linear patterns have minimum size
- The containment between pairs of linear patterns can be decided in polynomial time.
39Tractability Results
- Algorithm 2
- Function Minimize(p: a limited branched tree pattern): pmin, a minimum tree pattern equivalent to p
- Begin
- pmin := p
- B := {b1, ..., bm}
- while (B ≠ ∅)
- b := deepest(B)
- q := sp_b
- Redq := ∅
- For each q_i ∈ P(q) do
- For each q_j ∈ P(q) do
- if ((i ≠ j) ∧ (q_i is linear) ∧ (q_j ∉ Redq) ∧ (q_j ⊆ q_i))
- Redq := Redq ∪ {q_i}
- q := q − Redq
- pmin := replace(pmin, sp_b, q)
- end while
- return pmin
- End
40Conclusion
- The global minimality property has been proved: a minimum tree pattern equivalent to a given tree pattern p can be found among the subpatterns of p, and thus obtained by pruning redundant branches from p.
- The complexity of the minimization problem has been characterized, showing that the corresponding decision problem is coNP-complete.
41Conclusion
- A tractable form of tree pattern, which can be minimized in polynomial time, has been studied.
- Algorithms for these minimization problems are proposed in the paper.
42Future Works
- Extending the minimization framework to deal with XPath queries that must satisfy some constraints, such as join conditions on tree pattern nodes.
- The introduction of these constraints makes the minimization problem harder, and the global minimality property does not hold.
43Questions?