Core Labeling: A New Way to Compress Transitive Closure


1
Core Labeling: A New Way to Compress Transitive Closure
  • Yangjun Chen
  • Dept. Applied Computer Science,
  • University of Winnipeg
  • 515 Portage Ave.
  • Winnipeg, Manitoba, Canada R3B 2E9

2
Outline
  • Motivation
  • Tree labeling
  • Main algorithm
  • - Core tree
  • - Graph labeling Core-I
  • - Graph labeling Core-II
  • Conclusion

3
Motivation
  • An efficient method to evaluate reachability
    queries on sparse graphs
  • Given a directed sparse graph G, check whether a
    node v is reachable from another node u through a
    path in G.
  • Application
  • XML data processing, gene-regulatory networks and
    metabolic networks. XML documents are commonly
    represented as trees. However, an XML document may
    contain IDREF/ID references that turn it into a
    directed but sparse graph: a tree structure plus
    a few reference links. In a metabolic network,
    reachability models whether two genes interact
    with each other or whether two proteins
    participate in a common pathway. Many such graphs
    are sparse.

4
Motivation
  • A simple method
  • - store the transitive closure as a matrix M,
    which needs O(n²) space
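For comparison, the simple method can be sketched as follows: one depth-first search per node fills an n × n boolean matrix, so the space is n² entries no matter how sparse G is. The function name, the numbering of nodes as 0..n-1 and the convention that every node reaches itself are our own assumptions.

```python
def transitive_closure_matrix(n, edges):
    """n x n boolean matrix M with M[u][v] True iff v is reachable from u (nodes 0..n-1)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    M = [[False] * n for _ in range(n)]
    for s in range(n):               # one depth-first search per source node
        M[s][s] = True               # assumption: every node reaches itself
        stack = [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if not M[s][y]:
                    M[s][y] = True
                    stack.append(y)
    return M
```

For example, with edges 0→1, 1→2 and 3→1, the row of node 0 marks 1 and 2 but not 3, and the whole matrix always has n² cells.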
5
Tree labeling
  • Tree encoding
  • Let G be a sparse graph. We first find a
    spanning tree T of G.
  • Each node v in T is assigned an interval
    [start, end), where start is v's preorder number
    and end - 1 is the largest preorder number among
    all the nodes in T[v]. So another node u labeled
    [start', end') is a descendant of v (with respect
    to T) iff start' ∈ [start, end).

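[Figure: a spanning tree T with root a [0, 12), whose children are b [1, 5), r [5, 9) and h [9, 12); b has children c [2, 4) (with child k [3, 4)) and d [4, 5); r has child e [6, 9) with children f [7, 8) and g [8, 9); h has children i [10, 11) and j [11, 12)]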
Let v and u be two nodes in T, labeled [a, b) and
[a', b'), respectively. If a ∈ [a', b'), v is a
descendant of u. In this case, we say [a, b) is
subsumed by [a', b'); we must also have b ≤ b'.
Therefore, if v and u are not on the same path in
T, we have either b ≤ a' or b' ≤ a. In the former
case, we say [a, b) is smaller than [a', b'),
denoted [a, b) < [a', b'). In the latter case,
[a', b') is smaller than [a, b).
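The labeling scheme and the two interval tests above can be sketched as follows. The function names are hypothetical, and the `children` map in the usage note is our reading of the slide's example tree, so treat it as an assumption.

```python
def interval_labels(children, root):
    """Label each node v of T with [start, end): start = v's preorder number,
    end - 1 = the largest preorder number in T[v]."""
    label = {}
    counter = 0

    def visit(v):
        nonlocal counter
        start = counter
        counter += 1
        for c in children.get(v, []):   # children listed left to right
            visit(c)
        label[v] = (start, counter)      # counter is now 1 + largest preorder number in T[v]

    visit(root)
    return label


def is_descendant(lu, lv):
    """True iff u is in T[v] (v itself included): start(u) lies in [start(v), end(v))."""
    return lv[0] <= lu[0] < lv[1]


def smaller(l1, l2):
    """[a, b) < [a', b') for intervals on different paths: the first ends before the second starts."""
    return l1[1] <= l2[0]
```

With `children = {"a": ["b", "r", "h"], "b": ["c", "d"], "c": ["k"], "r": ["e"], "e": ["f", "g"], "h": ["i", "j"]}`, `interval_labels(children, "a")` yields a [0, 12), b [1, 5), c [2, 4), k [3, 4), d [4, 5), r [5, 9), e [6, 9), f [7, 8), g [8, 9), h [9, 12), i [10, 11), j [11, 12).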
6
Tree labeling
  • Tree encoding

Interval sequences (label space)
[Figure: the intervals of all nodes of T laid out over the label space [0, 12)]
7
Main Algorithm
  • Core tree (core of G)
  • Let T be a spanning tree. We denote by E' the set
    of all the non-tree edges and by V' the set of
    all the end points of the non-tree edges. Then,
    V' = V_start ∪ V_end, where V_start stands for the
    set of all the start nodes of the non-tree edges
    and V_end for the set of all their end nodes.
  • Definition 1. (anti-subsuming subset) A subset S
    ⊆ V_start is called an anti-subsuming set iff
    |S| > 1 and no two nodes in S are related by an
    ancestor-descendant relationship with respect to
    T.

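Example: V_start = {d, f, g, h}, V_end = {c, k, e, d, g}.
Anti-subsuming subsets: {d, f}, {d, g}, {d, h}, {f, g}, {f, h}, {g, h},
{d, f, g}, {d, f, h}, {d, g, h}, {f, g, h}, {d, f, g, h}
[Figure: the spanning tree T over the nodes a, b, c, d, e, f, g, h, i, j, k, r]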
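Definition 1 can be checked by brute force, which reproduces the 11 subsets listed for the example. This is an exponential illustration only, not part of the slides' method; the function name is hypothetical, and the intervals d [4, 5), f [7, 8), g [8, 9), h [9, 12) are our reading of the example tree.

```python
from itertools import combinations


def anti_subsuming_subsets(v_start, interval):
    """Brute-force enumeration of every S ⊆ V_start with |S| > 1 whose nodes are
    pairwise not ancestor/descendant in T (exponential; for illustration only)."""
    def related(u, v):
        (a, b), (c, d) = interval[u], interval[v]
        return a <= c < b or c <= a < d   # one interval's start lies in the other interval

    nodes = sorted(v_start)
    subsets = []
    for k in range(2, len(nodes) + 1):
        for s in combinations(nodes, k):
            if not any(related(u, v) for u, v in combinations(s, 2)):
                subsets.append(set(s))
    return subsets
```

Because d, f, g and h lie on pairwise different paths, every subset of size at least 2 qualifies: 16 - 4 - 1 = 11 subsets.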
8
Main Algorithm
  • Core tree (core of G)
  • Definition 2. (critical node) A node v in a
    spanning tree T of G is critical if
  • v ∈ V_start, or there exists an anti-subsuming
    subset S = {v1, v2, ..., vk} for k ≥ 2 such that
    v is the lowest common ancestor of v1, v2, ...,
    vk. We denote by V_critical the set of all
    critical nodes.
  • In the graph, node e is the lowest common
    ancestor of f and g, and node a is the lowest
    common ancestor of d, f, g, h. So e and a are
    critical nodes. In addition, each v ∈ V_start is a
    critical node. So all the critical nodes of G
    with respect to T are d, f, g, h, e, a.

[Figure: the spanning tree T over the nodes a, b, c, d, e, f, g, h, i, j, k, r, with the critical nodes d, f, g, h, e, a]
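Critical nodes need not be found by enumerating subsets. Our own observation (not stated on the slides): the lowest common ancestor of an anti-subsuming set equals the LCA of its first and last members in preorder, which are themselves an anti-subsuming pair, so V_start plus the LCAs of all "unrelated" pairs is enough. A sketch under that assumption, with a naive parent-walking LCA and hypothetical names; the `parent` map in the test is our reading of the example tree.

```python
from itertools import combinations


def depths(parent):
    """Depth of every node, given parent[v] (None for the root)."""
    depth = {}

    def d(v):
        if v not in depth:
            depth[v] = 0 if parent[v] is None else d(parent[v]) + 1
        return depth[v]

    for v in parent:
        d(v)
    return depth


def lca(u, v, parent, depth):
    """Lowest common ancestor: lift the deeper node, then lift both together."""
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


def critical_nodes(v_start, parent):
    """V_critical = V_start plus the LCA of every pair of V_start nodes on different paths."""
    depth = depths(parent)
    crit = set(v_start)
    for u, v in combinations(sorted(v_start), 2):
        w = lca(u, v, parent, depth)
        if w not in (u, v):          # u and v are not ancestor/descendant of each other
            crit.add(w)
    return crit
```

On the example it returns {d, f, g, h, e, a}, matching the slide.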
9
Main Algorithm
  • Core tree (core of G)
  • Definition 3. (core of G) Let G = (V, E) be a
    directed graph. Let T be a spanning tree of G.
    The core of G with respect to T is a tree
    structure with the node set V_critical, in which
    there is an edge from u to v (u, v ∈ V_critical)
    iff there is a path p from u to v in T and p
    contains no other critical nodes. The core of G
    with respect to T is denoted G_core = (V_core,
    E_core).

[Figure: G_core over the critical nodes a, h, e, f, d, g, shown next to T, with the interval sequences a: [0, 12); h: [2, 4)[4, 5)[6, 9)[9, 12); e: [2, 4)[4, 5)[6, 9); f: [3, 4)[4, 5)[7, 8); d: [3, 4)[4, 5); g: [2, 4)[8, 9)]
10
Main Algorithm
  • Core generation
  • Algorithm core-generation(T)
  • 1. Mark every node in T that belongs to V_start.
  • 2. Let v be the first marked node encountered
    during the bottom-up search of T. Create the
    first node for v in G_core.
  • 3. Let u be the currently encountered node in T,
    and let u' be the node in T for which a node in
    G_core was created just before u is met. Do (4)
    or (5), depending on whether u is a marked node
    or not.
  • 4. If u is a marked node, then do the following.
  • (a) If u' is not a child (descendant) of u,
    create a link from u to u', called a
    left-sibling link and denoted
    left-sibling(u) = u'.

11
Main Algorithm
  • Core generation
  • Algorithm core-generation(T) (continued)
  • (b) If u' is a child (descendant) of u, we first
    create a link from u' to u, called a parent link
    and denoted parent(u') = u. Then, we go along the
    left-sibling chain starting from u' until we meet
    a node u'' which is not a child (descendant) of
    u. For each encountered node w except u'', set
    parent(w) ← u. Set left-sibling(u) ← u''. Remove
    left-sibling(w) for each child w of u.
  • 5. If u is a non-marked node, then do the
    following.
  • (c) If u' is not a child (descendant) of u, no
    node is created.
  • (d) If u' is a child (descendant) of u, we go
    along the left-sibling chain starting from u'
    until we meet a node u'' which is not a child
    (descendant) of u. If more than one node is
    encountered during the chain navigation (not
    including u''), we create a new node for u in
    G_core and do the same operation as in (4.b).
    Otherwise, no node is created.
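A compact sketch of core-generation, assuming the bottom-up search is a postorder traversal. It keeps the left-sibling chain as a Python list used as a stack: the created nodes that are descendants of the current u always sit at its top, since postorder visits T[u] just before u. The names `core_generation`, `chain` and `below` are hypothetical; descendant tests use the preorder intervals of T.

```python
def core_generation(postorder, interval, marked):
    """Bottom-up construction of G_core; returns {core node: its parent in G_core (None at the top)}."""
    def is_desc(x, u):                      # x lies in T[u] and x != u
        return x != u and interval[u][0] <= interval[x][0] < interval[u][1]

    chain = []          # left-sibling chain: created nodes without a parent yet; chain[-1] is u'
    core_parent = {}
    for u in postorder:
        below = []      # chain nodes that are descendants of u (a suffix of the chain)
        while chain and is_desc(chain[-1], u):
            below.append(chain.pop())
        if u in marked or len(below) > 1:   # cases (4.a), (4.b), and (5.d) with more than one node
            for w in below:
                core_parent[w] = u          # parent links; w leaves the left-sibling chain
            core_parent.setdefault(u, None)
            chain.append(u)
        else:                               # cases (5.c), and (5.d) with a single node
            chain.extend(reversed(below))   # put the popped node (if any) back unchanged
    return core_parent
```

On the example tree, visited in the postorder k, c, d, b, f, g, e, r, i, j, h, a with d, f, g, h marked, it creates e (because of f and g) and a (because of d, e and h), giving parent(f) = parent(g) = e and parent(d) = parent(e) = parent(h) = a.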

12
Main Algorithm
  • Core tree (core of G)

[Figure: snapshots (a) to (f) of core-generation on the example tree; when u' is not a child of u, a link to the left sibling is created]
13
Main Algorithm
  • Graph labeling Core-I
  • Definition 4. Let V_core = {v1, ..., vg} be the
    node set of G_core. The core label for G is a set
    {L(v1), ..., L(vg)}, where each L(vl) (l = 1,
    ..., g) is an interval sequence associated with
    vl, satisfying the following two properties:
  • (1) Let L(vl) = [a_l1, b_l1), ..., [a_lr, b_lr) for
    some r. Then, for any i, j ∈ {1, ..., r},
    b_li ≤ a_lj if i < j. That is, [a_li, b_li) <
    [a_lj, b_lj) for i < j. (In this sense, the
    intervals in L(vl) are considered to be sorted.)
  • (2) Let [a, b) be the interval associated with a
    descendant of vl with respect to G. There exists
    an interval [a_li, b_li) (1 ≤ i ≤ r) in L(vl) such
    that a ∈ [a_li, b_li).
  • Definition 5. (link graph) Let G = (V, E) be a
    directed graph. Let T be a spanning tree of G.
    The link graph of G with respect to T, denoted
    G_link, is a graph with the node set V' (the end
    points of all the non-tree edges) and the edge
    set E'' ∪ E', where (v, u) ∈ E'' iff v ∈ V_end,
    u ∈ V_start, and there exists a path from v to u
    in T.
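Definition 5 translates directly into code. A sketch with hypothetical names; the path test uses the tree intervals, and skipping the trivial path from a node to itself is our assumption. The slides do not list the example's non-tree edges, so the edges in the test (d→k, f→d, g→c, h→e, h→g) are our inference from the core labels and should be treated as hypothetical.

```python
def link_graph(v_start, v_end, non_tree_edges, interval):
    """G_link: node set V' = V_start ∪ V_end, edge set E'' ∪ E', where (v, u) ∈ E'' iff
    v ∈ V_end, u ∈ V_start and T has a path from v down to u."""
    nodes = set(v_start) | set(v_end)
    edges = set(non_tree_edges)                      # E'
    for v in v_end:
        a, b = interval[v]
        for u in v_start:
            if u != v and a <= interval[u][0] < b:   # u in T[v]; the trivial path u = v is skipped
                edges.add((v, u))                    # E''
    return nodes, edges
```

In the example, the only E'' edges are e→f and e→g, because e is the only end node with a start node strictly below it.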

14
Main Algorithm
  • Graph labeling Core-I

[Figure: G_link over the nodes c, d, e, f, g, h, k, and G_com = G_core ∪ G_link scanned in reverse topological order, with the interval sequences a: [0, 12); h: [2, 4)[4, 5)[6, 9)[9, 12); e: [2, 4)[4, 5)[6, 9); f: [3, 4)[4, 5)[7, 8); d: [3, 4)[4, 5); k: [3, 4); g: [2, 4)[8, 9); c: [2, 4)]
15
Main Algorithm
  • - Generation of interval sequences
  • 1. Scan the reverse topological order of G_com.
  • 2. For each node v, the interval sequence L(v) is
    stored in a linked list A_v. Initially, A_v
    contains only one interval, which is generated by
    labeling T.
  • 3. Let v1, ..., vk be the children of v (in
    G_com). Merge A_v with each A_vl for the child
    node vl (l = 1, ..., k) as follows.
  • Assume A_v = p1 → p2 → ... → pg and
    A_vl = q1 → q2 → ... → qh, and that both A_v and
    A_vl are increasingly ordered. (As we will see,
    any interval sequence generated by the following
    algorithm has this property: it contains only
    intervals that are not on the same path in T.
    Initially, A_v contains only one interval, so it
    is trivially sorted.)
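The overall scan can be sketched as below. For brevity, the merge here is a compact equivalent rather than the slides' step-by-step one: take the union of both sequences and drop every interval that starts inside an earlier kept one, which is valid because intervals from one tree labeling are either nested or disjoint. We assume G_com is acyclic. The G_com edges in the test combine the example's core edges with non-tree edges we inferred from the labels (d→k, f→d, g→c, h→e, h→g), so they are an assumption; the resulting labels match the ones shown on the slides.

```python
def merge(seq_v, seq_c):
    """Compact merge: union of two sorted interval sequences, dropping subsumed intervals.
    Valid because intervals from one tree labeling are either nested or disjoint."""
    out = []
    for a, b in sorted(set(seq_v) | set(seq_c), key=lambda iv: (iv[0], -iv[1])):
        if out and a < out[-1][1]:      # starts inside the last kept interval, so it is subsumed
            continue
        out.append((a, b))
    return out


def core_labels(order, children, interval):
    """Scan G_com in reverse topological order: L(v) starts as v's tree interval and
    absorbs L(c) for every child c of v in G_com (children are always finished first)."""
    L = {}
    for v in order:
        seq = [interval[v]]
        for c in children.get(v, []):
            seq = merge(seq, L[c])
        L[v] = seq
    return L
```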
16
Main Algorithm
  • - Generation of interval sequences
  • 4. We step through both A_v and A_vl from left
    to right. Let p_i = [a_i, b_i) and
    q_j = [a'_j, b'_j) be the intervals encountered.
    We conduct the following checks.
  • (i) If a_i ≥ b'_j, insert q_j into A_v after
    p_{i-1} and before p_i, and move to q_{j+1}.
  • (ii) If a_i ∈ [a'_j, b'_j), remove p_i from A_v
    and move to p_{i+1}. (p_i is subsumed by q_j.)
  • (iii) If a'_j ∈ [a_i, b_i), ignore q_j and move to
    q_{j+1}. (q_j is subsumed by p_i, but it should
    not be removed from A_vl.)
  • (iv) If a'_j ≥ b_i, ignore p_i and move to
    p_{i+1}.
  • (v) If a_i = a'_j and b_i = b'_j, ignore both p_i
    and q_j, and move to p_{i+1} and q_{j+1}.
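The five checks can be sketched on Python lists as follows. Two assumptions on our part: the function builds a new list instead of editing a linked list in place, and once either sequence is exhausted, the remaining intervals are appended (the slides' example implies this for A_vl). The function name is hypothetical.

```python
def merge_into(a_v, a_vl):
    """Merge the sorted sequence A_vl into A_v with checks (i)-(v); returns the new A_v."""
    out = []
    i = j = 0
    while i < len(a_v) and j < len(a_vl):
        (a, b), (a2, b2) = a_v[i], a_vl[j]
        if (a, b) == (a2, b2):      # (v) same interval: keep p_i, move past both
            out.append((a, b))
            i += 1
            j += 1
        elif a >= b2:               # (i) q_j lies wholly before p_i: insert q_j
            out.append((a2, b2))
            j += 1
        elif a2 <= a < b2:          # (ii) p_i is subsumed by q_j: drop p_i
            i += 1
        elif a <= a2 < b:           # (iii) q_j is subsumed by p_i: ignore q_j
            j += 1
        else:                       # (iv) a2 >= b: p_i lies wholly before q_j, keep it
            out.append((a, b))
            i += 1
    out.extend(a_v[i:])             # assumption: leftovers of either sequence are appended
    out.extend(a_vl[j:])
    return out
```

On the slides' example, `merge_into([(2, 4), (4, 5), (7, 8)], [(2, 4), (8, 9)])` gives `[(2, 4), (4, 5), (7, 8), (8, 9)]`: the equal first intervals hit case (v), [4, 5) and [7, 8) are kept by case (iv), and [8, 9) is appended once A1 runs out.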

17
Main Algorithm
  • - Generation of interval sequences
  • Example. A1 = [2, 4)[4, 5)[7, 8), A2 = [2, 4)[8, 9)
[Figure: pointers p (on A1) and q (on A2) step through both sequences until p becomes nil]
  • Result: A1 = [2, 4)[4, 5)[7, 8)[8, 9), A2 = [2, 4)[8, 9)
18
Main Algorithm
  • - Core labels
[Figure: the core labels attached to the nodes of G_core: a: [0, 12); h: [2, 4)[4, 5)[6, 9)[9, 12); e: [2, 4)[4, 5)[6, 9); f: [3, 4)[4, 5)[7, 8); d: [3, 4)[4, 5); g: [2, 4)[8, 9)]
19
Main Algorithm
  • - Non-tree labeling
  • Let V_core = {v1, ..., vj}. We store the core
    label of G as a list s1 = L(v1), ..., sj = L(vj).
    Then, we define a function f: V_core → {1, ..., j}
    such that for each v ∈ V_core, f(v) = i iff
    si = L(v). Based on these concepts, we define
    Core-I below.
  • Example: f(a) = 1, f(h) = 2, f(e) = 3, f(f) = 4,
    f(d) = 5, f(g) = 6, with
    s1 = L(a) = [0, 12)
    s2 = L(h) = [2, 4)[4, 5)[6, 9)[9, 12)
    s3 = L(e) = [2, 4)[4, 5)[6, 9)
    s4 = L(f) = [3, 4)[4, 5)[7, 8)
    s5 = L(d) = [3, 4)[4, 5)
    s6 = L(g) = [2, 4)[8, 9)
20
Main Algorithm
  • - Non-tree labeling
  • Each node v in V is associated with two nodes,
    v- and v+:
  • v-: the critical node in T[v] which is closest
    to v.
  • v+: the lowest ancestor of v (in T, possibly v
    itself) which has an incoming non-tree edge.
  • Example: r- = e, and r+ does not exist; e- = e,
    and e+ = e.
[Figure: the spanning tree T over the nodes a, b, c, d, e, f, g, h, i, j, k, r]
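Both nodes can be computed directly from the tree intervals. A sketch with a hypothetical function name; we take v- to be the critical node in T[v] with the smallest preorder start (the highest one), and walk up the parent chain for v+. The parent map, intervals, critical set and V_end in the test are our reading of the running example; the first two checks reproduce r- = e, r+ missing, e- = e, e+ = e from the slide.

```python
def minus_plus(v, parent, interval, critical, v_end):
    """Return (v-, v+) for node v; None stands for 'does not exist'.
    v-: the critical node in T[v] closest to v (the one with the smallest preorder start);
    v+: the lowest node on the path from v up to the root (v included) with an incoming non-tree edge."""
    start, end = interval[v]
    inside = [c for c in critical if start <= interval[c][0] < end]
    v_minus = min(inside, key=lambda c: interval[c][0], default=None)
    v_plus = v
    while v_plus is not None and v_plus not in v_end:
        v_plus = parent[v_plus]
    return v_minus, v_plus
```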
21
Main Algorithm
  • - Non-tree labeling
  • Definition (Core-I) Let v be a node in G. The
    non-tree label of v is a pair <d, t>, where
  • d = i if v- exists and f(v-) = i. If v- does not
    exist, let d be the special symbol '-'.
  • t = [x, y) if v+ exists and [x, y) is the
    interval of v+. If v+ does not exist, let t be
    '-'.
22
Main Algorithm
  • - Non-tree labeling
  • Proposition Assume that u and v are two nodes in
    G, labeled ([a1, b1), <x1, y1>) and
    ([a2, b2), <x2, y2>), respectively. Node v is
    reachable from u iff one of the following
    conditions holds: (i) [a2, b2) is subsumed by
    [a1, b1), or (ii) there exists an interval [a, b)
    in s_x1 such that, for y2 = [a', b'), we have
    a' ∈ [a, b) (i.e., y2 is subsumed by [a, b)).
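The proposition gives a query that needs only one binary search in s_x1, because its intervals are sorted and pairwise disjoint: the only candidate is the last interval that starts at or before the start of y2. A sketch; the tuple layout of a label and the names `prepare` and `reachable` are our own assumptions, and None plays the role of '-'. The core labels in the test are those listed for the example; the per-node labels are derived from our reading of the example tree.

```python
from bisect import bisect_right


def prepare(core_label_list):
    """Pair every s_i with the list of its interval starts so queries can binary-search it."""
    return [([a for a, _ in s], s) for s in core_label_list]


def reachable(label_u, label_v, prepared):
    """Core-I reachability test.  A label is ((a, b), x, y): the tree interval, the index
    f(v-) (None for '-'), and the interval of v+ (None for '-')."""
    (a1, b1), x1, _ = label_u
    (a2, _), _, y2 = label_v
    if a1 <= a2 < b1:                      # (i) [a2, b2) is subsumed by [a1, b1)
        return True
    if x1 is None or y2 is None:
        return False
    starts, s = prepared[x1 - 1]          # s_x1 (indices start at 1, as on the slides)
    k = bisect_right(starts, y2[0]) - 1   # last interval of s_x1 starting at or before y2's start
    return k >= 0 and y2[0] < s[k][1]     # (ii) y2's start lies in that interval
```

For instance, h = ((9, 12), 2, None) reaches k = ((3, 4), None, (3, 4)) because 3 lies in [2, 4) ∈ s2, while d = ((4, 5), 5, (4, 5)) does not reach f = ((7, 8), 4, (6, 9)).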
23
Main Algorithm
  • Graph labeling Core-II
  • We can store the core label of G as a d × g
    boolean matrix M, where d is the number of the
    end nodes of all non-tree edges and g the number
    of the nodes in G_core.
  • Let u1, u2, ..., ud be all the end nodes of the
    non-tree edges, and let v1, v2, ..., vg be all the
    nodes in G_core. Assign each ui an index, denoted
    index(ui) (i.e., u1, u2, ..., ud are assigned
    contiguous integers, starting from 0), and assign
    each vj an index, denoted index(vj). An entry
    M[index(ui), index(vj)] is set to 1 if there
    exists an interval [a, b) in L(vj) such that,
    for ui's interval [a', b'), we have a' ∈ [a, b);
    otherwise, it is set to 0.

Example: index(c) = 0, index(k) = 1, index(d) = 2, index(e) = 3, index(g) = 4;
index(a) = 0, index(h) = 1, index(e) = 2, index(f) = 3, index(d) = 4, index(g) = 5.
M, shown with one row per node of G_core and one column per end node:
            c(0)  k(1)  d(2)  e(3)  g(4)
    a(0)     1     1     1     1     1
    h(1)     1     1     1     1     1
    e(2)     1     1     1     1     1
    f(3)     0     1     1     0     0
    d(4)     0     1     1     0     0
    g(5)     1     0     0     0     1
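Filling M takes one binary search per entry, since each L(vj) is sorted and disjoint; afterwards every lookup is a single array access. A sketch with a hypothetical function name; it builds M in the d × g orientation of the definition (rows: end nodes, columns: core nodes), i.e. the transpose of the table as drawn. Computed this way, the (g, k) cell comes out as 1, because k's start 3 lies in [2, 4), which belongs to L(g); the slide's table shows 0 in that cell.

```python
from bisect import bisect_right


def core2_matrix(end_nodes, core_nodes, interval, core_label):
    """d x g matrix: M[index(u)][index(v)] = 1 iff u's start lies in some interval of L(v)."""
    M = [[0] * len(core_nodes) for _ in end_nodes]
    for j, v in enumerate(core_nodes):
        seq = core_label[v]                      # sorted, pairwise disjoint intervals
        starts = [a for a, _ in seq]
        for i, u in enumerate(end_nodes):
            a_u = interval[u][0]
            k = bisect_right(starts, a_u) - 1    # one binary search per entry
            M[i][j] = int(k >= 0 and a_u < seq[k][1])
    return M
```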
24
  • Conclusion
  • A new algorithm for graph reachability
  • - Core tree
  • - Graph labeling Core-I
  • query time O(log(min{b, s}))
  • labeling time O(n + e + t·min{b, s})
  • space overhead O(n + s·min{b, s})
  • - Graph labeling Core-II
  • query time O(1)
  • labeling time O(n + e + t·min{b, s} +
    d·s·log(min{b, s}))
  • space overhead O(n + d·s)

25
Evaluation of Twig Pattern Queries Based on
Ordered Tree Matching
  • Yangjun Chen
  • Dept. Applied Computer Science,
  • University of Winnipeg
  • 515 Portage Ave.
  • Winnipeg, Manitoba, Canada R3B 2E9
26
Outline
  • Motivation
  • Algorithm for tree pattern query evaluation based
    on ordered tree matching
  • - Tree encoding
  • - Algorithm description
  • Index-based algorithm
  • Conclusion

27
Motivation
  • XPath evaluation against XML documents
  • - XPath expression
  • abc and .//d/bc and e//d
  • book[title = 'Art of Programming']//author[fn =
    'Donald' and ln = 'Knuth']

<document> <book> <title> Art of Programming </title> <author> <fn>Donald</fn> <ln>Knuth</ln> ...
[Figure: the document tree: book has the children title ("Art of Programming") and author, whose children fn and ln hold "Donald" and "Knuth"]
28
Motivation
  • XPath evaluation against XML documents
  • Evaluation based on unordered tree matching
  • XPath expression
  • Definition An embedding of a twig pattern Q into
    an XML document T is a mapping f: Q → T from the
    nodes of Q to the nodes of T which satisfies the
    following conditions:
  • (i) Preserve node label: For each u ∈ Q,
    label(u) matches label(f(u)).
  • (ii) Preserve parent-child/ancestor-descendant
    relationships: If (u, v) is a /-edge in Q, then
    f(v) is a child of f(u) in T; if (u, v) is a
    //-edge in Q, then f(v) is a descendant of f(u)
    in T.

[Figure: a twig pattern Q and a document tree T over the labels a, b, c, d, e, f, g]
29
Motivation
  • XPath evaluation against XML documents
  • - Evaluation based on ordered tree matching
  • XPath expression
  • abc/following-sibling .//d/following-sibling
    bc/following-sibling e//d

30
Motivation
  • XPath evaluation against XML documents
  • - Evaluation based on ordered tree matching
  • Definition An embedding of a twig pattern Q into
    an XML document T is a mapping f: Q → T from the
    nodes of Q to the nodes of T which satisfies the
    following conditions:
  • (i) Preserve node label: For each u ∈ Q,
    label(u) matches label(f(u)).
  • (ii) Preserve parent-child/ancestor-descendant
    relationships: If (u, v) is a /-edge in Q, then
    f(v) is a child of f(u) in T; if (u, v) is a
    //-edge in Q, then f(v) is a descendant of f(u)
    in T.
  • (iii) Preserve sibling order: For any two nodes
    v1 ∈ Q and v2 ∈ Q, if v1 is to the left of v2,
    then f(v1) is to the left of f(v2) in T.

[Figure: a pattern Q with nodes q1, q2, q3 and a document tree T with nodes v1 to v6]
31
Algorithm for tree pattern query evaluation
  • Tree encoding
  • Let T be a document tree. We associate each node
    v in T with a quadruple (DocId, LeftPos,
    RightPos, LevelNum), denoted a(v), where DocId
    is the document identifier; LeftPos and RightPos
    are generated by counting word numbers from the
    beginning of the document until the start and end
    of the element, respectively; and LevelNum is the
    nesting depth of the element in the document.
  • (i) ancestor-descendant: a node v1 associated
    with (d1, l1, r1, ln1) is an ancestor of another
    node v2 with (d2, l2, r2, ln2) iff d1 = d2,
    l1 < l2, and r1 > r2.
  • (ii) parent-child: a node v1 associated with
    (d1, l1, r1, ln1) is the parent of another node
    v2 with (d2, l2, r2, ln2) iff d1 = d2, l1 < l2,
    r1 > r2, and ln2 = ln1 + 1.
  • (iii) from left to right: a node v1 associated
    with (d1, l1, r1, ln1) is to the left of another
    node v2 with (d2, l2, r2, ln2) iff d1 = d2 and
    r1 < l2.

32
Algorithm for tree pattern query evaluation
  • Tree encoding
  • Example.

[Figure: T, where v6 (1, 1, 9, 1) is the root with children v4 (1, 2, 7, 2) and v5 (1, 8, 8, 2); v4 has children v1 (1, 3, 3, 3) and v3 (1, 4, 6, 3); v3 has the single child v2 (1, 5, 5, 4)]
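The three structural tests on quadruples are one-liners. A sketch; the `Pos` type and the function names are our own, and the test uses the quadruples from the example above.

```python
from typing import NamedTuple


class Pos(NamedTuple):
    doc: int    # DocId
    left: int   # LeftPos
    right: int  # RightPos
    level: int  # LevelNum


def is_ancestor(v1: Pos, v2: Pos) -> bool:
    return v1.doc == v2.doc and v1.left < v2.left and v1.right > v2.right


def is_parent(v1: Pos, v2: Pos) -> bool:
    return is_ancestor(v1, v2) and v2.level == v1.level + 1


def is_left_of(v1: Pos, v2: Pos) -> bool:
    return v1.doc == v2.doc and v1.right < v2.left
```

In the example, v3 (1, 4, 6, 3) is the parent of v2 (1, 5, 5, 4), while v4 (1, 2, 7, 2) is only an ancestor of v2, and v1 (1, 3, 3, 3) is to the left of v3.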
33
Algorithm for tree pattern query evaluation
  • Main algorithm
  • 1. First, we will number both T and Q in
    postorder. So the nodes in both trees will be
    referenced by their postorder numbers.

[Figure: T and Q numbered in postorder: v1, ..., v6 are numbered 1, ..., 6, and q1, q2, q3 are numbered 1, 2, 3]
  • 2. We access the nodes in T and the nodes in Q
    in the order of their postorder numbers. Each
    time we meet a node i in Q, we associate it with
    an array A_i of length |T|, indexed from 0 to
    |T| - 1. The arrays A_i are manipulated as
    follows.
34
Algorithm for tree pattern query evaluation
  • (i) We set a virtual node for T, numbered 0,
    which is considered to be to the left of any
    node in T.
  • (ii) If we find that Q[i] can be embedded in
    T[j], we set A_i[j_1], ..., A_i[j_k]
    (0 ≤ k ≤ j - 1) to j, where each j_l
    (1 ≤ l ≤ k) is a node to the left of j, to
    record the fact that j is the closest node to
    the right of j_l such that T[j] embeds Q[i].
[Figure: T numbered in postorder (v1 to v6 as 1 to 6), with the virtual node v0 numbered 0]
35
Algorithm for tree pattern query evaluation
  • (iii) If some time later we find another node p
    such that Q[i] can be embedded in T[p], we set
    A_i[p_1], ..., A_i[p_q] to p, where each p_s
    (1 ≤ s ≤ q) is to the left of p but to the right
    of j_k.
  • For all the other nodes j such that T[j] embeds
    Q[i], we set the entries of A_i in the same way
    as in (ii) and (iii).
  • 3. During the process, when we meet i in Q and j
    in T, we do the following.
  • Let i1, ..., ik be the child nodes of i in Q. We
    first check A_i1[l], where l = min(desc(j)) - 1
    and desc(j) represents all the descendants of j.
    We begin the search from min(desc(j)) - 1
    because it is the closest node to the left of
    the descendant of j that has the least postorder
    number. Let A_i1[l] = j'. If (i, i1) is a /-edge,
    we check whether (j, j') is a /-edge. Otherwise,
    we only check whether j' is a descendant of j.
    If this is not the case, we check A_i1[j']. This
    process continues until one of the following
    conditions is satisfied:
  • (i) A_i1 is exhausted (we cannot find a
    descendant j' of j such that T[j'] contains
    Q[i1]); or
  • (ii) we find a j' satisfying the parent-child or
    ancestor-descendant relationship, depending on
    whether (i, i1) is a /-edge or a //-edge. Then,
    we check A_i2[j'].

36
Algorithm for tree pattern query evaluation
  • If A_i1 is exhausted starting from l (case (i)),
    Q[i1] cannot be embedded in any subtree rooted at
    a child (for a /-edge) or a descendant (for a
    //-edge) of j. Hence Q[i1] cannot be embedded
    into T[j] below j, and thus T[j] cannot embed
    Q[i]. We continue to check i against the next
    node in T.
  • In case (ii), we check A_i2, starting from j'.
    For all the other A_il (l = 3, ..., k), we do the
    same checks. If for each il (l = 1, ..., k) we
    can find a j' such that T[j'] embeds Q[il], then
    T[j] embeds Q[i], and we set new values in A_i as
    described in (2).

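[Figure: node i of Q with children i1, i2, ..., ik, matched against node j of T; the search for each child starts at position l and proceeds through nodes j']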
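A sketch of the postorder, array-based matching idea, under stated assumptions. T is given by `labels[j]` and `parent[j]` for nodes j = 1..n in postorder (index 0 is the virtual node; parent 0 marks the root); lm[j], the smallest postorder number in T[j], gives min(desc(j)) for a non-leaf j. For a //-edge the sketch follows A_i1 as described; for a /-edge it scans the children of j directly instead of following A, because the closest node that A returns can be a deeper descendant that hides a child of j that also embeds Q[i1] (this adjustment is ours). All names are hypothetical, and the labels c, b, c, b, c, a for v1 to v6 in the test are our reading of the XB-tree figure.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                      # eq=False keeps identity hashing, so nodes can be dict keys
class PNode:
    label: str
    children: list = field(default_factory=list)   # [(edge, PNode)], edge "/" or "//", left to right


def pattern_postorder(q):
    for _, child in q.children:
        yield from pattern_postorder(child)
    yield q


def ordered_matches(labels, parent, pattern):
    """Return the nodes j such that T[j] embeds Q with parent-child, ancestor-descendant
    and sibling order preserved."""
    n = len(labels) - 1
    lm = list(range(n + 1))                  # lm[j]: smallest postorder number in T[j]
    kids = {j: [] for j in range(n + 1)}     # children of j, left to right
    for j in range(1, n + 1):
        if parent[j]:
            kids[parent[j]].append(j)
            lm[parent[j]] = min(lm[parent[j]], lm[j])

    A, emb = {}, {}
    for q in pattern_postorder(pattern):
        arr, hits, filled = [None] * n, set(), -1
        for j in range(1, n + 1):
            if labels[j] == q.label and children_fit(j, q, A, emb, kids, lm):
                hits.add(j)
                for x in range(filled + 1, lm[j]):   # nodes left of T[j] not yet assigned
                    arr[x] = j
                filled = max(filled, lm[j] - 1)
        A[q], emb[q] = arr, hits
    return sorted(emb[pattern])


def children_fit(j, q, A, emb, kids, lm):
    """Greedily place q's children, left to right, at the earliest-finishing node inside T[j]."""
    x = lm[j] - 1                           # closest node to the left of T[j]
    for edge, child in q.children:
        if edge == "//":
            y = A[child][x]                 # closest node right of x whose subtree embeds child
            if y is None or y >= j:         # none inside T[j]
                return False
        else:                               # "/": scan j's children instead of following A
            y = next((c for c in kids[j] if lm[c] > x and c in emb[child]), None)
            if y is None:
                return False
        x = y                               # the next sibling must lie to the right of y
    return True
```

For the pattern a with children //b and then /c, only v6 matches; swapping the two children gives no match, since no b lies to the right of v5.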
37
Algorithm for tree pattern query evaluation
  • Example.
[Figure: the example run on T (nodes v0 to v6, numbered in postorder), panels (e) and (f)]
  • The time complexity of the algorithm is O(|T|·|Q|).
38
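Index-based algorithm
  • XB-tree
  • An XB-tree is a variant of the B-tree over a
    sequence of quadruples.

[Figure: an XB-tree over the quadruples (1, 3, 3, 3) (1, 5, 5, 4) (1, 4, 6, 3) (1, 2, 7, 2) (1, 8, 8, 2) (1, 1, 9, 1), sorted by RightPos values. Root page P1 holds the entries [3, 5], [2, 7], [1, 9]; leaf pages P2: [3, 3] c, [5, 5] b; P3: [4, 6] c, [2, 7] b; P4: [8, 8] c, [1, 9] a. Each page P records P.parent and P.parentIndex.]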
39
Index-based algorithm
  • Searching an XB-tree
  • - β = (P, i) indicates that the ith entry in
    the page P is currently accessed.
  • - advance(β) (going up from a page to its
    parent): If β = (P, i) does not point to the
    last entry of P, i ← i + 1. Otherwise,
    β ← (P.parent, P.parentIndex).
  • - drilldown(β) (going down from a page to one of
    its children): If β = (P, i) and P is not a leaf
    page, β ← (P', 1), where P' is the ith child
    page of P.
  • - Initially, β ← (rootPage, 1), pointing to the
    first entry in the root page. We finish a
    traversal of the XB-tree when β = (rootPage,
    last), where last points to the last entry in
    the root page, and we advance it (in this case,
    we set β to nil).

40
Index-based algorithm
  • Searching an XB-tree
  • Assume that i in Q is the node currently
    encountered. By searching the XB-tree, we find a
    node j of T with label(i) = label(j) for which it
    is possible that T[j] embeds Q[i].
  • - L(i): the most recently found node such that
    Q[i] can be embedded into T[L(i)].
  • Procedure search(XB, i)
  • 1. Let i1, ..., ik be the children of i. Assume
    that L(ik) = v. l ← v.LeftPos, r ← v.RightPos.
    If i is a leaf node, then l ← ∞, r ← 0.
  • 2. Assume that β = (P, c). Let j be the entry
    pointed to by β. We do the following checks.
  • (i) If P is a leaf page, label(j) = label(i),
    j.LeftPos < l, and j.RightPos > r, then
    β ← advance(β) and return j.
  • (ii) If P is an internal page, j.LeftPos < l,
    and j.RightPos > r, then β ← drilldown(β).
  • (iii) If j.RightPos < r, then β ← advance(β). If
    β = nil, return nil.
  • 3. Repeat (2) until the whole XB-tree is
    traversed (i.e., β = nil) or a node j is found
    (i.e., the condition in (2)-(i) is satisfied).
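A sketch of the XB-tree cursor and the search procedure, under stated assumptions. Entries and positions are 0-based here rather than 1-based. When advance climbs to the parent, we also move past the parent entry (otherwise the cursor would drill into the same page again), and every entry that matches none of the checks (i) to (iii) is skipped with advance; both choices are ours, not the slides'. An internal entry stores the smallest LeftPos and largest RightPos below it, consistent with the figure's root entries [3, 5], [2, 7], [1, 9]. `Entry`, `Page` and `build_two_level` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Entry:
    left: int                          # LeftPos (leaf) or smallest LeftPos below (internal)
    right: int                         # RightPos (leaf) or largest RightPos below (internal)
    label: Optional[str] = None        # element label, leaf entries only
    child: Optional["Page"] = None     # child page, internal entries only


@dataclass(eq=False)
class Page:
    entries: list
    parent: Optional["Page"] = None
    parent_index: int = 0              # position of this page's entry in its parent

    def is_leaf(self):
        return self.entries[0].child is None


def advance(beta):
    page, i = beta
    if i + 1 < len(page.entries):
        return (page, i + 1)
    if page.parent is None:
        return None                    # advanced past the last root entry: traversal finished
    return advance((page.parent, page.parent_index))   # assumption: move past the parent entry


def drilldown(beta):
    page, i = beta
    return (page.entries[i].child, 0)


def search(beta, label, l=math.inf, r=0):
    """Return (entry, new beta) for the next leaf entry with the given label,
    LeftPos < l and RightPos > r; (None, None) once the XB-tree is exhausted."""
    while beta is not None:
        page, i = beta
        e = page.entries[i]
        if page.is_leaf() and e.label == label and e.left < l and e.right > r:
            return e, advance(beta)                      # check (2)(i)
        if not page.is_leaf() and e.left < l and e.right > r:
            beta = drilldown(beta)                       # check (2)(ii)
        else:
            beta = advance(beta)                         # check (2)(iii), and any other miss
    return None, None


def build_two_level(groups):
    """Hypothetical helper: one leaf page per group of (left, right, label), under a single root."""
    root = Page([])
    for k, group in enumerate(groups):
        leaf = Page([Entry(a, b, lab) for a, b, lab in group], parent=root, parent_index=k)
        root.entries.append(Entry(min(a for a, _, _ in group), max(b for _, b, _ in group), child=leaf))
    return root
```

Built from the figure's three leaf pages, repeated calls of `search(beta, "b")` starting at `(root, 0)` return the entries [5, 5] and then [2, 7] before reporting exhaustion, and `search((root, 0), "a", l=5, r=5)` finds [1, 9], the a-node that contains position 5.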

41
  • Conclusion
  • Algorithm for evaluating tree pattern queries
    based on ordered tree matching
  • time complexity O(|T|·|Q|)
  • space complexity O(|T|·|Q|)
  • The algorithm can be integrated into an index
    environment by using XB-trees.