On the Minimization of XPath Queries - PowerPoint PPT Presentation

Provided by: cmpeBo


1
On the Minimization of XPath Queries
  • Paper by
  • S. Flesca, F. Furfaro, E. Masciari
  • CmpE 521 Presentation
  • Emre Yurtsever 2002701372
  • Muammer Yüzügüldü 2003700183

2
Outline
  • Introduction
  • Trees and Tree Patterns
  • Problem Statement
  • A Framework for Minimizing XPath Queries
  • Complexity Results
  • Tractability Results
  • Conclusions and Future Work

3
Introduction
  • XML queries are usually expressed by means of
    XPath expressions.
  • An XPath expression defines a way of navigating
    an XML tree: it returns the set of nodes reached
    through the paths specified by the expression.

4
Introduction cont'd
  • An XPath expression can be represented
    graphically as a tree pattern.

An XML Tree...
5
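Introduction cont'd
  • For example:
  • Find the titles of all the books for which at
    least one author is known.
  • XPath expression:
  • bib/book[//author]/title
    (the predicate is evaluated relative to book; in
    standard XPath syntax: bib/book[.//author]/title)

A tree pattern (with a descendant edge and the output node marked)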
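The example query can be tried with Python's standard `xml.etree.ElementTree`. This is a minimal sketch over a made-up sample document: ElementTree implements only a limited subset of XPath, so rather than relying on a path-valued predicate, the code navigates `bib/book`, tests the `.//author` branch explicitly, and returns the `title` children (the output node). The function name `titles_with_author` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample document: only the first book has an author.
DOC = """<bib>
  <book><title>Book One</title><authors><author>A. Author</author></authors></book>
  <book><title>Book Two</title></book>
</bib>"""


def titles_with_author(xml_text):
    """Hand-evaluate bib/book[.//author]/title over an XML string."""
    root = ET.fromstring(xml_text)              # the document root (pattern root "bib")
    if root.tag != "bib":
        return []                               # the root label must match
    result = []
    for book in root.findall("book"):           # child edge: bib/book
        if book.find(".//author") is not None:  # descendant branch: [.//author]
            result.extend(t.text for t in book.findall("title"))  # output step: /title
    return result


print(titles_with_author(DOC))  # ['Book One']
```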
6
Introduction cont'd
  • The efficiency of evaluating an XPath expression
    depends on its size.
  • Optimization via minimization:
  • we should minimize the expression.

7
Introduction cont'd
  • Example query:
  • Retrieve the editors that published thrillers
    and whose authors have written a thriller.
  • By query containment, the second condition is
    implied by the first and can be dropped.
  • Reduced query:
  • Retrieve the editors that published thrillers.

8
Introduction cont'd
  • For some XPath fragments, the minimization
    problem can be solved efficiently, since
  • it can be reduced to solving a number of instances
    of containment between pairs of tree patterns, and
  • for these fragments, containment can in turn be
    reduced to finding a homomorphism between them.

9
Trees and Tree Patterns
  • A tree t is a tuple (r_t, N_t, E_t, λ_t) where
  • N_t ⊆ N is a set of nodes,
  • λ_t: N_t → Σ is a node labelling function,
  • r_t ∈ N_t is the distinguished root of t,
  • E_t ⊆ N_t × N_t is a set of edges.

10
Trees and Tree Patterns cont'd
  • Given a tree t = (r_t, N_t, E_t, λ_t),
  • a tree t' = (r_t', N_t', E_t', λ_t') is a subtree of t if
  • N_t' ⊆ N_t, and
  • the edge (n_i, n_j) belongs to E_t' iff n_i ∈ N_t',
    n_j ∈ N_t' and (n_i, n_j) ∈ E_t.
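The two definitions can be mirrored directly in Python. A minimal sketch, assuming integer node identifiers and a hypothetical `induced_subtree` helper; the connectivity check is our own addition, so that the restricted edge set is again a tree rooted at the chosen node.

```python
from dataclasses import dataclass


@dataclass
class Tree:
    """A tree t = (r_t, N_t, E_t, lambda_t)."""
    root: int          # r_t, the distinguished root
    nodes: frozenset   # N_t, node identifiers
    edges: frozenset   # E_t, (parent, child) pairs
    label: dict        # lambda_t: node -> symbol of Sigma


def induced_subtree(t, keep, root):
    """Subtree t' with N_t' = keep: (ni, nj) is in E_t' iff ni, nj are in keep and (ni, nj) is in E_t."""
    keep = frozenset(keep)
    parent = {child: par for (par, child) in t.edges}
    if not keep <= t.nodes or root not in keep:
        raise ValueError("keep must be a subset of N_t containing root")
    # Our addition: every kept node except root must keep its parent, so the result is a tree.
    if any(n != root and parent.get(n) not in keep for n in keep):
        raise ValueError("keep does not form a tree rooted at root")
    edges = frozenset((a, b) for (a, b) in t.edges if a in keep and b in keep)
    return Tree(root, keep, edges, {n: t.label[n] for n in sorted(keep)})


# bib(0) -> book(1) -> {title(2), author(3)}
t = Tree(0, frozenset({0, 1, 2, 3}), frozenset({(0, 1), (1, 2), (1, 3)}),
         {0: "bib", 1: "book", 2: "title", 3: "author"})
sub = induced_subtree(t, {1, 2}, root=1)
print(sorted(sub.edges), sub.label)  # [(1, 2)] {1: 'book', 2: 'title'}
```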

11
Trees and Tree Patterns cont'd
  • Definition
  • A tree pattern p is a pair (t_p, o_p), where
  • t_p = (r_p, N_p, E_p, λ_p) is a tree,
  • E_p is partitioned into the two disjoint sets C_p
    and D_p denoting, respectively, the child and
    descendant edges, and
  • o_p ∈ N_p is a distinguished output node.

12
Trees and Tree Patterns cont'd
  • Grammar for XPath expressions:
  • exp → exp/exp | exp//exp | exp[exp] | σ | * | .
  • where σ is a symbol in Σ, * is the wildcard, and
    the symbol . stands for the current node.
  • Given the XPath expression
  • a[b//*]/c//d

Tree pattern of a[b//*]/c//d (nodes a, b, *, c, d)
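A minimal sketch of turning expressions of this grammar into tree patterns. It assumes a small hypothetical representation (`PNode`, with each edge stored as `"/"` or `"//"`), the tree-pattern convention that a predicate such as `[//author]` or `[.//author]` hangs below the current node, and that the last step of the main path is the output node. `Parser`, `parse` and `show` are illustrative names, and the `.` step is only accepted at the start of a predicate.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PNode:
    label: str                                      # a symbol of Sigma, or "*"
    children: list = field(default_factory=list)   # (edge, PNode), edge is "/" or "//"
    out: bool = False                               # True for the output node


TOKEN = re.compile(r"//|/|\[|\]|\*|\.|[A-Za-z_][\w-]*")


class Parser:
    def __init__(self, text):
        text = text.replace(" ", "")
        self.toks = TOKEN.findall(text)
        if "".join(self.toks) != text:
            raise ValueError(f"unexpected character in {text!r}")
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def step(self):
        """A label or '*', followed by any number of [predicate]s."""
        tok = self.take()
        if tok in ("/", "//", "[", "]", "."):
            raise ValueError(f"expected a label or '*', got {tok!r}")
        node = PNode(tok)
        while self.peek() == "[":
            self.take("[")
            self.relative_path(node)            # the predicate becomes a branch of node
            self.take("]")
        return node

    def path_from(self, cur):
        """Parse a run of (/ | //) steps below cur; return the last node reached."""
        while self.peek() in ("/", "//"):
            edge = self.take()
            nxt = self.step()
            cur.children.append((edge, nxt))
            cur = nxt
        return cur

    def relative_path(self, ctx):
        """Predicate path relative to ctx: 'b/c', './b', './/b' or '//b'."""
        if self.peek() == ".":
            self.take(".")                      # '.' denotes the current node ctx
        if self.peek() in ("/", "//"):
            return self.path_from(ctx)
        first = self.step()                     # a bare first step is a child of ctx
        ctx.children.append(("/", first))
        return self.path_from(first)


def parse(text):
    """Build a tree pattern; the last step of the main path becomes the output node."""
    p = Parser(text)
    root = p.step()
    last = p.path_from(root)
    if p.peek() is not None:
        raise ValueError(f"unexpected trailing token {p.peek()!r}")
    last.out = True
    return root


def show(n):
    """Compact rendering: label, '!' for the output node, then (edge+child, ...)."""
    inner = ",".join(edge + show(c) for edge, c in n.children)
    return n.label + ("!" if n.out else "") + (f"({inner})" if inner else "")


print(show(parse("a[b//*]/c//d")))              # a(/b(//*),/c(//d!))
print(show(parse("bib/book[//author]/title")))  # bib(/book(//author,/title!))
```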
13
Trees and Tree Patterns cont'd
  • Given a tree t and a tree pattern p, an embedding
    e of p into t is a total function e: N_p → N_t
    such that
  • e(r_p) = r_t,
  • ∀(x, y) ∈ C_p, e(y) is a child of e(x) in t,
  • ∀(x, y) ∈ D_p, e(y) is a descendant of e(x) in t,
    and
  • ∀x ∈ N_p, if λ_p(x) = a (where a ≠ *) then
    λ_t(e(x)) = a.
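The embedding conditions can be checked by a direct recursive search. A minimal sketch, assuming a hypothetical nested-node representation (`TNode` for trees, `PNode` for patterns with `"/"` or `"//"` edges and an `out` flag) and taking p(t), as is usual for tree patterns, to be the set of tree nodes onto which some embedding maps the output node. The search is exponential in the worst case and only makes the definition concrete.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)            # eq=False: tree nodes are compared by identity
class TNode:
    label: str
    children: list = field(default_factory=list)   # child TNodes
    text: str = ""


@dataclass(eq=False)
class PNode:
    label: str                                      # a symbol of Sigma, or "*"
    children: list = field(default_factory=list)   # (edge, PNode), edge is "/" or "//"
    out: bool = False                               # marks the output node o_p


def descendants(tn):
    for c in tn.children:
        yield c
        yield from descendants(c)


def has_output(pn):
    return pn.out or any(has_output(c) for _, c in pn.children)


def embeds(pn, tn):
    """Is there an embedding of the subpattern rooted at pn with e(pn) = tn?"""
    if pn.label != "*" and pn.label != tn.label:     # label condition
        return False
    for edge, pc in pn.children:
        targets = tn.children if edge == "/" else list(descendants(tn))
        if not any(embeds(pc, c) for c in targets):  # child / descendant condition
            return False
    return True


def answer(p, t):
    """p(t): the nodes e(o_p) over all embeddings e of p into t, with e(r_p) = r_t."""
    result = []

    def visit(pn, tn):                               # pn lies on the root-to-output path
        if pn.label != "*" and pn.label != tn.label:
            return
        main = None
        for edge, pc in pn.children:
            targets = tn.children if edge == "/" else list(descendants(tn))
            if has_output(pc):
                main = (pc, targets)                 # the branch leading to o_p
            elif not any(embeds(pc, c) for c in targets):
                return                               # a side branch cannot be embedded
        if pn.out:
            if tn not in result:
                result.append(tn)
        elif main is not None:
            for c in main[1]:
                visit(main[0], c)

    visit(p, t)
    return result


book1 = TNode("book", [TNode("title", text="Book One"), TNode("authors", [TNode("author")])])
book2 = TNode("book", [TNode("title", text="Book Two")])
t = TNode("bib", [book1, book2])
# bib/book[//author]/title
p = PNode("bib", [("/", PNode("book", [("//", PNode("author")), ("/", PNode("title", out=True))]))])
print([n.text for n in answer(p, t)])  # ['Book One']
```

A tree t is a model of p exactly when `answer(p, t)` is non-empty.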

14
Trees and Tree Patterns cont'd
  • Models and Canonical Models of Tree Patterns
  • The models of a tree pattern p defined over the
    alphabet Σ are the trees of T_Σ into which p can
    be embedded. Writing p(t) for the set of nodes of
    t onto which some embedding maps o_p, the set of
    models of p is Mod(p) = {t ∈ T_Σ | p(t) ≠ ∅}.
  • Canonical models of a tree pattern p are models
    having the same shape as p. That is, a canonical

15
Trees and Tree Patterns cont'd
  • Model and Canonical Model of a tree pattern

16
Trees and Tree Patterns cont'd
  • Given two tree patterns p1, p2, we say that p1 is
    contained in p2 (p1 ⊑ p2) iff ∀t p1(t) ⊆ p2(t).
  • We say that p1 and p2 are equivalent (p1 ≡ p2)
    iff p1 ⊑ p2 and p2 ⊑ p1 (i.e. ∀t p1(t) = p2(t)).
  • The set of patterns which are equivalent to a
    given pattern p will be denoted as Eq(p).

17
Trees and Tree Patterns cont'd
  • Notations on tree patterns.

A pattern p and its subpatterns sp_b, sp_d, sp_a
18
Trees and Tree Patterns cont'd
  • Tree pattern p whose root has 2 children

Subpattern examples
19
Problem Statement
  • Given a tree pattern p, construct a tree
    pattern p_min which is equivalent to p and has
    minimum size (i.e. size(p_min) = minsize(p)).

20
Problem Statement cont'd
  • A minimum-size tree pattern equivalent to p can
    be found among the subpatterns of p.
  • The containment between two tree patterns p, q
    (p ⊑ q) can be tested by finding a homomorphism
    from q to p: a homomorphism always implies
    containment, and in the fragment XP{/, //, [ ]}
    the converse also holds. A homomorphism h from a
    pattern q to a pattern p is a total mapping from
    the nodes of q to the nodes of p such that
  • h maps the root of q to the root of p, and the
    output node of q to the output node of p,
  • h preserves node types (i.e. ∀u ∈ N_q,
    λ_q(u) ≠ * ⇒ λ_q(u) = λ_p(h(u))),
  • h preserves structural relationships
    (i.e. whenever v is a child (resp. descendant) of
    u in q, h(v) is a child (resp. descendant) of
    h(u) in p).
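The homomorphism conditions translate into a short recursive check. A minimal sketch, assuming a hypothetical `PNode` representation (edges stored as `"/"` or `"//"`, an `out` flag on the output node), and requiring that the root of q goes to the root of p and the output node of q to the output node of p. Finding such a mapping proves p ⊑ q; for patterns that combine `*` and `//`, containment can hold without any homomorphism, so a negative answer is not conclusive there.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class PNode:
    label: str                                      # a symbol of Sigma, or "*"
    children: list = field(default_factory=list)   # (edge, PNode), edge is "/" or "//"
    out: bool = False                               # True for the output node


def below(n):
    """All proper descendants of n in the pattern, through child or descendant edges."""
    for _, c in n.children:
        yield c
        yield from below(c)


def maps_to(u, v):
    """Is there a homomorphism of the subpattern at u into the one at v, with u -> v?"""
    if u.label != "*" and u.label != v.label:      # node types are preserved
        return False
    if u.out and not v.out:                        # o_q must be mapped to o_p
        return False
    for edge, uc in u.children:
        if edge == "/":                            # a child edge needs a child edge in p
            targets = [vc for e, vc in v.children if e == "/"]
        else:                                      # a descendant edge needs any downward path
            targets = list(below(v))
        if not any(maps_to(uc, t) for t in targets):
            return False
    return True


def contained_in(p, q):
    """Sufficient test for p ⊑ q: a homomorphism from q to p, root to root."""
    return maps_to(q, p)


p = PNode("a", [("/", PNode("b", [("/", PNode("c", out=True))]))])   # a/b/c
q = PNode("a", [("//", PNode("c", out=True))])                       # a//c
print(contained_in(p, q), contained_in(q, p))                        # True False
```

For instance, `a/*//b` and `a//*/b` both select `b` nodes at depth at least two below `a`, so they are equivalent, yet `contained_in` returns `False` in both directions for them.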

21
Problem Statement cont'd
A homomorphism between two tree patterns
22
Problem Statement cont'd
Two tree patterns not related by a homomorphism
23
A Framework for Minimizing XPath Queries
  • Two fundamental contributions:
  • Proving that Property 1 holds for XP{/, //, [ ], *}
  • An algorithm for minimizing a tree pattern query

24
Proving that Property 1 holds for XP{/, //, [ ], *}
  • Various lemmas are introduced
  • Lemma 1: Let p and q be two patterns with root
    r, such that p ⊑ q. Then, for each subpattern
    q_j ∈ P(q) there exists a subpattern p_i ∈ P(p)
    s.t. p_i ⊑ q_j.

25
Example for Lemma 1

26
Proving that Property 1 holds for XP{/, //, [ ], *}
  • Lemma 2: Let p and q be two patterns rooted in r
    s.t. p ≡ q, and let m and n, with m > n, be the
    number of children of r in p and, respectively, q.
    Then, there exists a set S ⊆ SP(p) consisting of
    m − n subpatterns sp_i, such that p − S ≡ p.

27
Example for Lemma 2
28
Proving that Property 1 holds for XP{/, //, [ ], *}
  • Lemma 3: Let p and q be two equivalent patterns
    rooted in r having the same number of child and
    descendant nodes of r, and let q be of minimum
    size. Then, there exists no subpattern
    sp_k ∈ SP(p) such that p − sp_k ≡ p.

29
Proving that Property 1 holds for XP{/, //, [ ], *}
  • Lemma 4: Let p and q be two equivalent patterns
    whose roots have the same number of child and
    descendant nodes, and let q be of minimum size.
    For each subpattern p_i ∈ P(p) there exists a
    unique subpattern q_j ∈ P(q) directly connected
    to r_q s.t. p_i ≡ q_j.

30
Proving that Property 1 holds for XP{/, //, [ ], *}
  • Lemma 5: A pattern p in XP{[ ], /, //, *} is not
    of minimum size iff at least one of the following
    conditions holds:
  • there exists a pair of subpatterns p_i, p_j
    (i ≠ j) s.t. p_i ⊑ p_j;
  • there exists a subpattern p_i of p which is not of
    minimum size.

31
Proving that Property 1 holds for XP{/, //, [ ], *}
  • Theorem 1: Given a pattern p in XP{/, //, [ ], *},
    if minsize(p) = k then there exists a subpattern
    p_min of p such that p ≡ p_min and size(p_min) = k.
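Theorem 1 suggests an exhaustive, exponential procedure: enumerate the subpatterns of p and keep the smallest one equivalent to p. A minimal sketch, assuming that a subpattern is obtained by deleting whole branches that do not contain the output node, and using a homomorphism test as a stand-in for exact containment (sound, but not complete once `*` and `//` are combined). `PNode`, `subpatterns` and `brute_force_min` are hypothetical names.

```python
import itertools
from dataclasses import dataclass, field


@dataclass(eq=False)
class PNode:
    label: str                                      # a symbol of Sigma, or "*"
    children: list = field(default_factory=list)   # (edge, PNode), edge is "/" or "//"
    out: bool = False                               # True for the output node


def below(n):
    """All proper descendants of n in the pattern, whatever the edge types."""
    for _, c in n.children:
        yield c
        yield from below(c)


def maps_to(u, v):
    """Homomorphism from the subpattern at u into the subpattern at v, with u -> v."""
    if (u.label != "*" and u.label != v.label) or (u.out and not v.out):
        return False
    for edge, uc in u.children:
        targets = [c for e, c in v.children if e == "/"] if edge == "/" else list(below(v))
        if not any(maps_to(uc, t) for t in targets):
            return False
    return True


def size(n):
    return 1 + sum(size(c) for _, c in n.children)


def has_out(n):
    return n.out or any(has_out(c) for _, c in n.children)


def subpatterns(n):
    """All patterns obtained from n by deleting whole branches that do not hold the output."""
    options = []
    for edge, c in n.children:
        opts = [(edge, s) for s in subpatterns(c)]
        if not has_out(c):
            opts.append(None)                       # None means: drop this branch
        options.append(opts)
    for combo in itertools.product(*options):
        yield PNode(n.label, [b for b in combo if b is not None], n.out)


def brute_force_min(p):
    """Smallest subpattern s with s ⊑ p (p ⊑ s holds because s only drops branches)."""
    best = p
    for s in subpatterns(p):
        if size(s) < size(best) and maps_to(p, s):  # homomorphism p -> s gives s ⊑ p
            best = s
    return best


# a[b][b/c]/d: the lone b branch is implied by the b/c branch
p = PNode("a", [("/", PNode("b")),
                ("/", PNode("b", [("/", PNode("c"))])),
                ("/", PNode("d", out=True))])
m = brute_force_min(p)
print(size(p), size(m))  # 5 4
```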

32
An Algorithm for Tree Pattern Minimization
  • Function Minimize(p: a tree pattern): p_min, a
    minimum tree pattern equivalent to p
  • Begin
  • p_min ← p
  • For each p_i ∈ P(p_min) do
  • if (p_min − sp_i ⊑ p_min)
  • p_min ← p_min − sp_i
  • SP_new ← ∅
  • For each sp_i ∈ SP(p_min) do
  • SP_new ← SP_new ∪ {Minimize(sp_i)}
  • p_min ← assemble(p_min, SP_new)
  • return p_min
  • End
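Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: P(p_min) and SP(p_min) are both read as the branches hanging from the root, `assemble` is read as re-attaching the minimized branches, the branch holding the output node is never pruned, and containment is approximated by a homomorphism test, which may miss some redundant branches in patterns mixing `*` and `//`. Since dropping a branch only generalizes a pattern, the test p_min − sp_i ⊑ p_min is an equivalence test.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class PNode:
    label: str                                      # a symbol of Sigma, or "*"
    children: list = field(default_factory=list)   # (edge, PNode), edge is "/" or "//"
    out: bool = False                               # True for the output node


def below(n):
    for _, c in n.children:
        yield c
        yield from below(c)


def maps_to(u, v):
    """Homomorphism from the subpattern at u into the subpattern at v, with u -> v."""
    if (u.label != "*" and u.label != v.label) or (u.out and not v.out):
        return False
    for edge, uc in u.children:
        targets = [c for e, c in v.children if e == "/"] if edge == "/" else list(below(v))
        if not any(maps_to(uc, t) for t in targets):
            return False
    return True


def has_out(n):
    return n.out or any(has_out(c) for _, c in n.children)


def size(n):
    return 1 + sum(size(c) for _, c in n.children)


def minimize(p):
    """Sketch of Algorithm 1: prune redundant root branches, then recurse into the rest."""
    kids = list(p.children)                     # the branches sp_i hanging from the root
    i = 0
    while i < len(kids):
        rest = kids[:i] + kids[i + 1:]
        current = PNode(p.label, kids, p.out)
        pruned = PNode(p.label, rest, p.out)    # p_min - sp_i
        # p_min - sp_i ⊑ p_min, tested as a homomorphism p_min -> p_min - sp_i
        if not has_out(kids[i][1]) and maps_to(current, pruned):
            kids = rest                          # sp_i is redundant; re-test position i
        else:
            i += 1
    # SP_new: minimize every surviving branch, then assemble them under the root
    return PNode(p.label, [(edge, minimize(sp)) for edge, sp in kids], p.out)


# a[b][b/c]/d  ->  a[b/c]/d
p = PNode("a", [("/", PNode("b")),
                ("/", PNode("b", [("/", PNode("c"))])),
                ("/", PNode("d", out=True))])
print(size(minimize(p)))  # 4
```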

33
Upper Bound
  • Algorithm 1 works in O(b · r · |p|² · (w+1)^(d+1)) time, where
  • |p| is the size of p,
  • d is the number of descendant edges in p,
  • w is the length of the longest chain of *'s in p,
  • b is the number of branches of p,
  • r is the maximum degree of any node of p.

34
Complexity Results
  • In XP{/, //, [ ], *}, it is not possible (unless
    P = NP) to define an algorithm performing much
    better than Algorithm 1.
  • Lemma 6: Let p be a pattern in XP{/, //, [ ], *}
    and k a positive integer. The problem of
    testing if minsize(p) > k is NP-complete.

35
Complexity Results
  • Theorem 2: Let p be a pattern in XP{/, //, [ ], *}
    and k a positive integer. The problem of
    testing if there exists a pattern p' equivalent
    to p such that size(p') < k is coNP-complete.

36
Tractability Results
  • Definition: A limited branched tree pattern p is
    a tree pattern in XP{/, //, [ ], *} such that
  • every non-leaf node of p may have any number of
    children;
  • if a node n has k children n_1, ..., n_k, then at
    least k−1 of the subpatterns sp_{n_i} (where
    i ∈ {1, ..., k}) are linear.
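Both notions used in the definition are easy to test. A minimal sketch over a hypothetical `PNode` representation: a pattern is taken to be linear when every node has at most one child, and the limited-branching condition is checked at every node of the pattern.

```python
from dataclasses import dataclass, field


@dataclass
class PNode:
    label: str                                      # a symbol of Sigma, or "*"
    children: list = field(default_factory=list)   # (edge, PNode), edge is "/" or "//"
    out: bool = False                               # True for the output node


def is_linear(n):
    """Linear pattern: every node has at most one child."""
    return len(n.children) <= 1 and all(is_linear(c) for _, c in n.children)


def is_limited_branched(n):
    """At every node with k children, at least k-1 of the child subpatterns are linear."""
    kids = [c for _, c in n.children]
    if sum(1 for c in kids if not is_linear(c)) > 1:
        return False
    return all(is_limited_branched(c) for c in kids)


# a[b/c][d]//e is limited branched; a[b[c][d]]/e[f]/g is not (two non-linear branches)
p_ok = PNode("a", [("/", PNode("b", [("/", PNode("c"))])),
                   ("/", PNode("d")),
                   ("//", PNode("e", out=True))])
p_bad = PNode("a", [("/", PNode("b", [("/", PNode("c")), ("/", PNode("d"))])),
                    ("/", PNode("e", [("/", PNode("f")), ("/", PNode("g", out=True))]))])
print(is_limited_branched(p_ok), is_limited_branched(p_bad))  # True False
```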

37
Example
38
Tractability Results
  • Theorem: Let p be a limited branched tree pattern.
    A minimum pattern p_min equivalent to p can be
    found in polynomial time (w.r.t. the size of p).
  • Linear patterns have minimum size
  • The containment between pairs of linear patterns
    can be decided in polynomial time.

39
Tractability Results
  • Algorithm 2
  • Function Minimize(p: a limited branched tree
    pattern): p_min, a minimum tree pattern equivalent
    to p
  • Begin
  • p_min ← p
  • B ← {b_1, ..., b_m}
  • while (B ≠ ∅)
  • b ← deepest(B); B ← B − {b}
  • q ← sp_b
  • Red_q ← ∅
  • For each q_i ∈ P(q) do
  • For each q_j ∈ P(q) do
  • if ((i ≠ j) ∧ (q_i is linear) ∧ (q_j ∉ Red_q)
    ∧ (q_j ⊑ q_i))
  • Red_q ← Red_q ∪ {q_i}
  • q ← q − Red_q
  • p_min ← replace(p_min, sp_b, q)
  • end while
  • return p_min
  • End
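A sketch of Algorithm 2 in Python over a hypothetical `PNode` representation. Assumptions to flag: B is taken to be the set of branching nodes (nodes with two or more children), processing them in order of decreasing depth stands in for repeated `deepest(B)`, branches containing the output node are never pruned, and the test q_j ⊑ q_i uses a homomorphism check instead of the exact polynomial-time containment test for linear patterns, so it can miss some redundant branches.

```python
import copy
from dataclasses import dataclass, field


@dataclass(eq=False)
class PNode:
    label: str                                      # a symbol of Sigma, or "*"
    children: list = field(default_factory=list)   # (edge, PNode), edge is "/" or "//"
    out: bool = False                               # True for the output node


def below(n):
    for _, c in n.children:
        yield c
        yield from below(c)


def maps_to(u, v):
    """Homomorphism test (sufficient, not necessary, for containment)."""
    if (u.label != "*" and u.label != v.label) or (u.out and not v.out):
        return False
    for edge, uc in u.children:
        targets = [c for e, c in v.children if e == "/"] if edge == "/" else list(below(v))
        if not any(maps_to(uc, t) for t in targets):
            return False
    return True


def is_linear(n):
    return len(n.children) <= 1 and all(is_linear(c) for _, c in n.children)


def has_out(n):
    return n.out or any(has_out(c) for _, c in n.children)


def branching_nodes(n, depth=0):
    """Yield (depth, node) for every node with at least two children (assumed meaning of B)."""
    if len(n.children) >= 2:
        yield depth, n
    for _, c in n.children:
        yield from branching_nodes(c, depth + 1)


def minimize_limited(p):
    """Sketch of Algorithm 2 for a limited branched pattern p (p itself is not modified)."""
    pmin = copy.deepcopy(p)
    # Visiting B deepest-first plays the role of b <- deepest(B); B <- B - {b}.
    for _, b in sorted(branching_nodes(pmin), key=lambda dn: dn[0], reverse=True):
        branches = b.children                    # P(q) for q = sp_b
        red = set()                              # Red_q, stored as branch indices
        for i, (ei, qi) in enumerate(branches):
            if not is_linear(qi) or has_out(qi):  # only linear, output-free branches are pruned
                continue
            for j, (ej, qj) in enumerate(branches):
                # q_j ⊑ q_i, tested as a homomorphism from b[q_i] into b[q_j]
                if i != j and j not in red and maps_to(PNode(b.label, [(ei, qi)]),
                                                       PNode(b.label, [(ej, qj)])):
                    red.add(i)
                    break
        b.children = [br for k, br in enumerate(branches) if k not in red]  # replace(p_min, sp_b, q)
    return pmin


# a[b/c][b]/d  ->  a[b/c]/d
p = PNode("a", [("/", PNode("b", [("/", PNode("c"))])),
                ("/", PNode("b")),
                ("/", PNode("d", out=True))])
m = minimize_limited(p)
print([c.label for _, c in m.children])  # ['b', 'd']
```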

40
Conclusion
  • The global minimality property has been proved:
    a minimum tree pattern equivalent to a given tree
    pattern p can be found among the subpatterns of p,
    and thus obtained by pruning redundant branches
    from p.
  • The complexity of the minimization problem has
    been characterized, showing that the
    corresponding decision problem is
    coNP-complete.

41
Conclusion
  • A tractable class of tree patterns, which can be
    minimized in polynomial time, has been studied.
  • Minimization algorithms for both the general and
    the tractable case have been proposed in the paper.

42
Future Work
  • Extending the minimization framework to deal with
    XPath queries that must satisfy some constraints,
    such as join conditions on tree pattern nodes.
  • The introduction of these constraints makes the
    minimization problem harder, and the global
    minimality property no longer holds.

43
Questions?
  • Thank you...