Title: Constant-Time LCA Retrieval
- Presentation by Danny Hermelin,
- String Matching Algorithms Seminar,
- Haifa University.
The Lowest Common Ancestor
- In a rooted tree T, a node u is an ancestor of a
node v if u is on the unique path from the root
to v (in particular, every node is an ancestor of
itself).
- In a rooted tree T, the Lowest Common Ancestor
(LCA) of two nodes u and v is the deepest node in
T that is an ancestor of both u and v.
For example
[Figure: a rooted tree with nodes numbered 1 to 6.]
- Node 3 is the LCA of nodes 4 and 6.
- Node 1 is the LCA of nodes 2 and 5.
The LCA Problem
- The LCA problem is then: given a rooted tree T,
preprocess it so that the LCA of any two given
nodes in T can be retrieved in constant time.
- In this presentation we present a preprocessing
algorithm that requires no more than linear time
and space.
The assumed machine model
- We make the following two assumptions on our
computational model. Let n denote the size of our
input in unary representation.
- All arithmetic, comparison and logical
operations on numbers whose binary representation
has no more than log n bits can be done in
constant time.
- Finding the left-most or the right-most 1 bit
of a log n-bit number can be done in constant
time.
- The first assumption is a very reasonable,
straightforward one, considering most machines
on the market today.
- The second seems less reasonable, but it can be
achieved with the help of a constant number of
tables of size O(n).
- These assumptions help our discussion focus on
the more interesting parts of the algorithm
solving the LCA problem.
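A minimal sketch of how the second assumption can be met, assuming every queried number is smaller than the table size n: one table answers left-most-1-bit queries, and the right-most 1 bit reduces to it through the two's-complement trick x & -x. The function names are hypothetical, and Python is used only for illustration (its built-in int.bit_length() would do the job directly).

```python
def build_msb_table(n):
    """msb[x] = position (counted from the right, starting at 0) of the
    left-most 1 bit of x, for 1 <= x < n. O(n) time and space."""
    msb = [-1] * n                  # msb[0] stays -1: zero has no 1 bit
    for x in range(1, n):
        msb[x] = msb[x >> 1] + 1    # x has one more bit than x >> 1
    return msb


def leftmost_bit(msb, x):
    """Constant-time lookup; assumes 1 <= x < len(msb)."""
    return msb[x]


def rightmost_bit(msb, x):
    """x & -x keeps only the right-most 1 bit of x, so the left-most
    1 bit of x & -x is the answer; x & -x <= x keeps it inside the table."""
    return msb[x & -x]


table = build_msb_table(64)
print(leftmost_bit(table, 0b101100), rightmost_bit(table, 0b101100))  # 5 2
```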
The Simple Case: Complete Binary Tree
- Our discussion begins with a particularly simple
instance of the LCA problem: LCA queries on
complete binary trees.
- We will later use our solution for complete
binary trees as a building block for solving the
LCA problem on an arbitrary rooted tree T.
- Let B denote a complete binary tree with n nodes.
- The key here is to encode the unique path from
the root to a node in the node itself. We assign
each node a path number, a log n-bit number that
encodes the unique path from the root to the
node.
The Path Number
- For each node v in B we encode a path number in
the following way.
- Counting from the left-most bit, the ith bit of
the path number of v corresponds to the ith edge
on the path from the root to v.
- A 0 for the ith bit from the left indicates that
the ith edge on the path goes to a left child,
and a 1 indicates that it goes to a right child.
- Let k denote the number of edges on the path
from the root to v. Then we set the (k+1)th bit
(the height bit) of the path number to 1, and the
remaining log n - k - 1 bits to 0.
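For example
[Figure: a complete binary tree in which node j is reached from the
root by the edges right, left, and node i by the edges left, right, left.]
- Node i's path number is 0101.
- Node j's path number is 1010.
In 0101 the first three bits are path bits and the last bit is the
height bit; in 1010 the third bit is the height bit and the last bit
is a padded bit.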
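[Figure: the complete binary tree with 15 nodes and their path numbers,
level by level: 1000; 0100, 1100; 0010, 0110, 1010, 1110;
0001, 0011, 0101, 0111, 1001, 1011, 1101, 1111.]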
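- Path numbers can easily be assigned in a simple
O(n) in-order traversal of B: in a tree like the
one above, the in-order traversal visits the nodes
in the order 0001, 0010, ..., 1111, so the rth node
visited receives path number r.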
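A minimal sketch, assuming B is stored in heap order (root 1, children of i at 2i and 2i + 1) and has every level full, i.e. 2^h - 1 nodes and h-bit path numbers. It computes each path number straight from the definition and checks that an in-order traversal handing out 1, 2, 3, ... produces the same numbers. The heap layout and the function names are assumptions made for the illustration.

```python
def path_number(i, h):
    """Path number of heap index i, straight from the definition."""
    k = i.bit_length() - 1            # number of edges from the root to i
    path_bits = i - (1 << k)          # bits of i after its leading 1:
                                      # 0 = left child, 1 = right child
    # k path bits, then the height bit, then h - k - 1 padded zeros
    return ((path_bits << 1) | 1) << (h - k - 1)


def path_numbers_by_inorder(h):
    """One O(n) in-order traversal: the r-th node visited gets number r."""
    n = (1 << h) - 1
    numbers, counter = {}, 0
    stack, i = [], 1
    while stack or i <= n:
        while i <= n:                 # go as far left as possible
            stack.append(i)
            i = 2 * i
        i = stack.pop()
        counter += 1
        numbers[i] = counter
        i = 2 * i + 1                 # then continue in the right subtree
    return numbers


h = 4                                 # the 15-node tree from the figure
by_inorder = path_numbers_by_inorder(h)
assert all(by_inorder[i] == path_number(i, h) for i in range(1, 2 ** h))
print(format(path_number(10, h), "04b"))  # 0101: left, right, left
```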
How do we solve LCA queries in B?
- Suppose now that u and v are two nodes in B, and
that path(u) and path(v) are their path numbers.
- We denote the lowest common ancestor of u and v
by lca(u,v).
- We call the prefix bits of a path number, those
that correspond to edges on the path from the
root, the path bits of the path number.
- First we calculate path(u) XOR path(v) and find
the left-most bit which equals 1.
- If there is no such bit, then path(u) = path(v)
and so u = v. Otherwise, assume that the kth bit
of the result is this left-most 1.
- If both the kth bit in path(u) and the kth bit
in path(v) are path bits, then u and v agree on
the first k-1 edges of their paths from the root
and diverge at the kth edge. Hence the (k-1)-bit
prefix of each node's path number encodes the path
from the root to lca(u,v).
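For example
[Figure: nodes u = 0010 and v = 0111, with lca(u,v) = 0100.]
- path(u) XOR path(v) = 0010 XOR 0111 = 0101, whose
left-most 1 is bit k = 2.
- Keeping the first k-1 = 1 bit (0), setting the
height bit and padding with 0s gives
path(lca(u,v)) = 0100.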
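For example
[Figure: nodes u = 1001 and v = 1011, with lca(u,v) = 1010.]
- path(u) XOR path(v) = 1001 XOR 1011 = 0010, whose
left-most 1 is bit k = 3.
- Keeping the first k-1 = 2 bits (10), setting the
height bit and padding with a 0 gives
path(lca(u,v)) = 1010.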
- We conclude that if we take the first k-1 bits of
path(u) (equivalently, of path(v), since the two
agree there), add a 1 as the kth bit, and pad with
log n - k zero bits, we get path(lca(u,v)).
- If the kth bit in path(u) or the kth bit in
path(v) (or both) is not a path bit, then one node
is an ancestor of the other, and lca(u,v) can
easily be retrieved by comparing the positions of
the height bits of path(u) and path(v): the node
whose height bit is further to the left is
lca(u,v).
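A minimal sketch of the query, assuming h-bit path numbers stored as Python ints (bit 1 from the left is the most significant of the h bits). The depth of a node is read off its height bit, which is always its right-most 1. The names are hypothetical; the two calls at the end reproduce the worked examples.

```python
def depth(p, h):
    """Edges from the root: the height bit (the right-most 1) sits at
    position depth + 1 counted from the left."""
    rightmost = (p & -p).bit_length() - 1     # position from the right
    return h - 1 - rightmost


def lca_path(pu, pv, h):
    """Path number of lca(u, v), given the h-bit path numbers of u and v."""
    x = pu ^ pv
    if x == 0:                                # same path number: u = v
        return pu
    k = h - x.bit_length() + 1                # left-most 1 of the XOR, from the left
    if k <= depth(pu, h) and k <= depth(pv, h):
        # bit k is a path bit in both: keep the shared k-1 prefix,
        # put the height bit at position k, pad with zeros
        shift = h - k + 1
        return ((pu >> shift) << shift) | (1 << (h - k))
    # otherwise one node is an ancestor of the other: return the one
    # whose height bit is further to the left (the shallower node)
    return pu if depth(pu, h) <= depth(pv, h) else pv


print(format(lca_path(0b0010, 0b0111, 4), "04b"))  # 0100
print(format(lca_path(0b1001, 0b1011, 4), "04b"))  # 1010
```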
The general LCA algorithm
- The following are the two stages of the general
LCA algorithm for an arbitrary tree T.
- First, we reduce the LCA problem to the
Restricted Range Minima problem: the problem of
finding the smallest number in an interval of a
fixed list of numbers, where the difference
between two successive numbers in the list is
exactly one.
- Second, we solve the Restricted Range Minima
problem, and thus solve the LCA problem.
The Reduction
- Let T denote an arbitrary tree.
- Let lca(u,v) denote the lowest common ancestor of
nodes u and v in T.
- First we execute a depth-first traversal of T to
label the nodes in the order in which they are
first encountered.
- In the same traversal we maintain a list L of
nodes of T, in the order in which they are
visited: a node is appended when the traversal
first reaches it, and again each time the
traversal returns to it from a child.
- The only property of the depth-first numbering we
need is that the number given to any node is
smaller than the number given to any of its
descendants.
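For example
[Figure: a tree whose nodes carry the depth-first numbers 0 to 7
(000 to 111 in binary): node 0 is the root with children 1 and 2,
node 2 has children 3, 4 and 5, and node 5 has children 6 and 7.]
- The depth-first traversal creates these depth-first
numbers and the following list:
L = 0, 1, 0, 2, 3, 2, 4, 2, 5, 6, 5, 7, 5, 2, 0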
- Now if we want to find lca(u,v), we find the first
occurrences of the two nodes in L; these define an
interval I in L.
- Suppose u occurs in L before v. Then I describes
the part of the traversal from the point we first
discovered u to the point we first discovered v.
- lca(u,v) can be retrieved by finding the minimum
number in I.
- This is due to the following two simple facts.
- If u is an ancestor of v, then all the nodes
visited between u and v are in u's subtree, and
thus the depth-first number assigned to u is
minimal in I.
- If u is not an ancestor of v, then all the nodes
visited between u and v are in lca(u,v)'s subtree,
and the traversal must return to lca(u,v) after
leaving u's subtree and before entering v's
subtree. Thus the minimum of I is the depth-first
number assigned to lca(u,v).
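For example
- To find lca(3,6), take the first occurrences of 3
and 6 in L, the 5th and 10th entries. The interval
between them is I = 3, 2, 4, 2, 5, 6, whose minimum
is 2, and indeed lca(3,6) = 2.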
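A minimal sketch of this reduction, assuming the tree is given as a dictionary from each node to its ordered list of children, with nodes already labelled by their depth-first numbers as in the example. The minimum is found here by a linear scan; answering it in constant time is the job of the range-minima preprocessing. Recursion is used for brevity and would have to be made iterative for very deep trees. The function names are hypothetical.

```python
def euler_list(children, root):
    """Depth-first traversal producing the list L: a node is appended when
    the walk reaches it and again each time the walk returns from a child."""
    L = []

    def visit(v):
        L.append(v)
        for c in children.get(v, []):
            visit(c)
            L.append(v)                 # back at v after c's subtree

    visit(root)
    return L


def first_occurrences(L):
    first = {}
    for idx, v in enumerate(L):
        first.setdefault(v, idx)        # keep only the earliest index
    return first


def lca_by_min(L, first, u, v):
    """Minimum depth-first number in the interval between the first
    occurrences of u and v (a linear scan in this sketch)."""
    i, j = sorted((first[u], first[v]))
    return min(L[i:j + 1])


children = {0: [1, 2], 2: [3, 4, 5], 5: [6, 7]}   # the example tree
L = euler_list(children, 0)
first = first_occurrences(L)
print(L)                           # [0, 1, 0, 2, 3, 2, 4, 2, 5, 6, 5, 7, 5, 2, 0]
print(lca_by_min(L, first, 3, 6))  # 2
```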
The Restricted Reduction
- So far we have shown how to reduce the LCA problem
to the range minima problem. The next step shows
how to reduce it to the restricted range minima
problem.
- Denote by level(u) the number of edges on the
unique path from the root to node u in T.
- If $L = l_1, l_2, \ldots, l_z$, then we build the
following list:
$L' = \mathrm{level}(l_1), \mathrm{level}(l_2), \ldots, \mathrm{level}(l_z)$.
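- We use L' in the same manner we used L in the
previous reduction scheme, keeping L alongside it
so that the position of the minimum in L' can be
mapped back to a node.
- This works because in every interval I = [u,v] of
L', lca(u,v) has the smallest level in I, for the
same reasons mentioned earlier.
- The difference between two adjacent elements of
L' is exactly one, since two consecutive entries of
L are always a node and one of its children.
- This completes the reduction to the restricted
range minima problem.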
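A minimal sketch of the restricted version, under the same assumption about how the tree is given (a dictionary of ordered children lists; labels no longer need to be depth-first numbers). It builds L and L' in one traversal, checks the plus-or-minus-one property, and maps the position of the minimum level back to a node through L. The scan over the interval again stands in for the constant-time range-minima structure; the function names are hypothetical.

```python
def restricted_lists(children, root):
    """One traversal building L (nodes), L' (their levels) and the index
    of each node's first occurrence in L."""
    L, levels, first = [], [], {}

    def visit(v, level):
        first.setdefault(v, len(L))
        L.append(v)
        levels.append(level)
        for c in children.get(v, []):
            visit(c, level + 1)
            L.append(v)                  # returning to v from child c
            levels.append(level)

    visit(root, 0)
    return L, levels, first


def lca(L, levels, first, u, v):
    i, j = sorted((first[u], first[v]))
    # position of a smallest level in the interval, mapped back through L
    m = min(range(i, j + 1), key=lambda t: levels[t])
    return L[m]


children = {0: [1, 2], 2: [3, 4, 5], 5: [6, 7]}   # the example tree
L, levels, first = restricted_lists(children, 0)
print(levels)   # [0, 1, 0, 1, 2, 1, 2, 1, 2, 3, 2, 3, 2, 1, 0]
assert all(abs(a - b) == 1 for a, b in zip(levels, levels[1:]))
print(lca(L, levels, first, 3, 6))  # 2
```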
The reduction complexity
- Denote by n the number of nodes in T.
- The depth-first traversal can be done in O(n)
time and space.
- L (and likewise L') has 2n - 1 entries, one for
each arrival at a node, so its creation and
initialization can be done in O(n) time and space.
- To find lca(u,v) we need the first occurrences of
u and v in L. These can be stored in a table of
size O(n), whose creation and initialization take
O(n) time and space.
- The total time and space complexity of the
reduction is therefore O(n).
The Range Minima Problem
- The Range Minima problem is the problem of
finding the smallest number in an interval of a
fixed list of numbers.
- The Restricted Range Minima problem is the
special case of the Range Minima problem where the
difference between any two successive numbers is
exactly one.
More Formally
- The Restricted Range Minima problem is stated
formally as follows.
- Given a list $L = l_1, l_2, \ldots, l_n$ of n real
numbers, where $|l_i - l_{i+1}| = 1$ for each
$i = 1, \ldots, n-1$, preprocess the list so that for
any interval $l_i, l_{i+1}, \ldots, l_j$ with
$1 \le i < j \le n$, the minimum over the interval
can be retrieved in constant time.
Two preprocessing methods for the Range Minima
Problem
- The algorithm for solving the Range Minima
problem uses two preprocessing methods.
- Procedure I makes no assumption about the
difference between adjacent elements, and
requires O(n log n) time and space.
- Procedure II relies on the restricted assumption
about adjacent elements, and requires
exponential time and space.
Procedure I
- Suppose that our list L is of size n, and for
convenience suppose n is a power of 2. The
procedure has two main stages.
- First, build a complete binary tree B with
2n-1 nodes and n leaves. Then, for i from 1 to n,
record the ith element of L at the ith leaf.
- Second, for each internal node (not a leaf) of B,
maintain a prefix list and a suffix list
containing all prefix minima and suffix minima
with respect to the leaves in its subtree.
- Let L_v denote the number of leaves in the
subtree rooted at node v, which is internal in B.
- The prefix list of an internal node v in B is a
list of size L_v. Its kth entry is the smallest
number among the numbers stored at the first k
consecutive leaves of v's subtree.
- Likewise, the suffix list of v has the same size,
and its kth entry contains the smallest number
among the numbers stored at the last L_v - k + 1
consecutive leaves of v's subtree.
For Example
- Suppose L = 6, 7, 4, 1, 5, 2, 9, 9.
- Then Procedure I builds a complete binary tree for
L whose leaves, from left to right, hold
6, 7, 4, 1, 5, 2, 9, 9.
- The prefix list of the root node is then
6, 6, 4, 1, 1, 1, 1, 1.
In the same manner, its suffix list is
1, 1, 1, 1, 2, 2, 9, 9.
Finding the Range Minima
- After the preprocessing stages are complete, the
smallest number in any interval [u,v], where u and
v are leaves of B and u is to the left of v, can be
found in constant time as follows.
- First, find the LCA of u and v and call it z.
Recall that we already know how to answer LCA
queries in complete binary trees in constant time.
- The minimum is then the smaller of two values:
the entry of u in the suffix list of z's left
child, and the entry of v in the prefix list of
z's right child. (The entry of a leaf is its
position among the leaves of that child's subtree.)
For Example
- Suppose I = 4, 1, 5, 2.
- The endpoints of I, 4 and 2, are leaves of B
whose LCA is the root node.
- Denote the root's left child by left and the
root's right child by right.
- Leaf 4 is then the third leaf from the left in
left's subtree, and leaf 2 is the second leaf from
the left in right's subtree.
- left's suffix list at entry 3 = min{4, 1} = 1.
- right's prefix list at entry 2 = min{5, 2} = 2.
- The minimum over I is then min{1, 2} = 1.
- Procedure I clearly requires O(n log n) time and
space. This follows from two simple facts:
- The total size of all the prefix and suffix lists
of all the internal nodes of B is O(n log n), since
each of the log n levels of internal nodes
contributes lists of total length 2n.
- Each entry in these lists requires constant time
to calculate using simple dynamic programming: each
prefix (or suffix) minimum is the smaller of the
previous one and the next leaf.
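A minimal sketch of Procedure I, assuming n is a power of 2 and the leaves are numbered 0, ..., n-1 from the left. Rather than building B explicitly, it stores, for every level of B, the prefix and suffix lists of all nodes on that level side by side in one array of length n; this flattened layout is a choice made for the illustration, and the function names are hypothetical. Because all leaves sit at the same depth, the LCA z of leaves i and j comes from the same XOR idea as in the complete-binary-tree slides: the highest bit in which i and j differ tells how many leaves z has.

```python
def build_procedure_one(values):
    """prefix[l][x] / suffix[l][x]: prefix / suffix minimum at leaf x inside
    the node on level l (2**l leaves) whose subtree contains leaf x."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    log_n = n.bit_length() - 1
    prefix, suffix = [], []
    for l in range(log_n + 1):               # level log_n is the root
        size = 1 << l
        pre, suf = [0] * n, [0] * n
        for x in range(n):                   # dynamic programming, O(1) per entry
            pre[x] = values[x] if x % size == 0 else min(pre[x - 1], values[x])
        for x in range(n - 1, -1, -1):
            suf[x] = values[x] if (x + 1) % size == 0 else min(suf[x + 1], values[x])
        prefix.append(pre)
        suffix.append(suf)
    return prefix, suffix                    # (log n + 1) * 2n entries in total


def range_min(values, prefix, suffix, i, j):
    """Minimum of values[i..j] with two table lookups."""
    if i > j:
        i, j = j, i
    if i == j:
        return values[i]
    k = (i ^ j).bit_length()                 # z = lca(i, j) has 2**k leaves
    # z's left child (level k-1) holds leaf i, its right child holds leaf j
    return min(suffix[k - 1][i], prefix[k - 1][j])


values = [6, 7, 4, 1, 5, 2, 9, 9]
prefix, suffix = build_procedure_one(values)
print(prefix[3])                                # root: [6, 6, 4, 1, 1, 1, 1, 1]
print(suffix[3])                                # root: [1, 1, 1, 1, 2, 2, 9, 9]
print(range_min(values, prefix, suffix, 2, 5))  # I = 4, 1, 5, 2  ->  1
```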
Procedure II
- Procedure II uses the assumption that the
difference between any two adjacent elements of L
is exactly one. We assume without loss of
generality that the first element of L is zero
(otherwise, we can subtract the value of the first
element from every element of L, and then add it
back to the range-minima result).
- The procedure runs in two main stages.
- First, a table with $2^{n-1}$ entries is built.
Each entry in this table represents one possible
instance of L (a list starting at 0 followed by
n-1 steps of +1 or -1), and is a reference to a
particular subtable.
- Second, in each subtable we store the answer to
each of the n(n-1)/2 possible range queries.
- All the possible instances of L can be enumerated,
and so can all the range-minima queries. Thus,
given an instance of L, any range-minima query on
it can be answered in constant time.
[Figure: a main table with $2^{n-1}$ entries, each pointing to an n × n query table.]
- It is easy to see, then, that Procedure II uses
$O(2^{n} n^{2})$ time and space ($2^{n-1}$ subtables
of at most $n^2$ entries each).
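A minimal sketch of Procedure II for lists of length m, assuming the list is normalized to start at 0. Each possible list is identified by an (m-1)-bit number whose bit t records whether step t goes up (+1) or down (-1); this encoding and the function names are assumptions for the illustration. In the full algorithm the code of every block would be computed once during preprocessing, rather than at query time as query does here.

```python
def build_procedure_two(m):
    """table[code][i][j] = min of list[i..j] for every +/-1 list of length m
    starting at 0: 2**(m-1) subtables of m * m entries."""
    table = []
    for code in range(1 << (m - 1)):
        values = [0]
        for t in range(m - 1):              # bit t set -> step t goes up
            values.append(values[-1] + (1 if (code >> t) & 1 else -1))
        sub = [[None] * m for _ in range(m)]
        for i in range(m):
            running = values[i]
            for j in range(i, m):           # answer every query with i <= j
                running = min(running, values[j])
                sub[i][j] = running
        table.append(sub)
    return table


def encode(block):
    """Main-table index of a +/-1 list; only differences are encoded, so
    subtracting block[0] is implicit."""
    code = 0
    for t in range(len(block) - 1):
        if block[t + 1] - block[t] == 1:
            code |= 1 << t
    return code


def query(table, block, i, j):
    """min(block[i..j]) for i <= j: look up the normalized list, add block[0] back."""
    return block[0] + table[encode(block)][i][j]


table = build_procedure_two(4)
print(len(table))                            # 8 = 2**(4-1) subtables
print(query(table, [5, 4, 5, 6], 0, 3))      # 4
```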
We now show how, using Procedure I and
Procedure II, we achieve linear-time and
linear-space preprocessing for answering all
range-minima queries on L.
The Restricted Range-Minima preprocessing
algorithm
- Our algorithm runs in three stages.
- First, we partition L into subsets of size log n,
giving a total of n/log n such subsets. We apply
Procedure I to the array of the minima of these
subsets.
[Figure: L, of size n, split into subsets of size log n, with the array of subset minima.]
- Furthermore, we partition each subset of size
log n into smaller subsets of size loglog n,
giving log n/loglog n partitions in each subset.
Again, within each subset, we apply Procedure I to
the array of the minima of these partitions.
[Figure: a subset of size log n split into partitions of size loglog n, with the arrays of subset minima and subset partition minima.]
- Finally, we run Procedure II to build the table
required for any array of size loglog n. For each
subset partition we identify its proper entry in
this table.
[Figure: each partition of size loglog n pointing to its entry in the Procedure II table.]
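- After these stages are completed, any
range-minima query on L can be answered in
constant time. Consider a query requesting the
minimum over [i, j]. The range [i, j] can easily
be presented as the union of the following (at
most) five ranges:
$[i, x_1],\ [x_1+1, x_2],\ [x_2+1, x_3],\ [x_3+1, x_4],\ [x_4+1, j]$
Here $x_1$ is the last position of the subset
partition containing i, $x_2$ the last position of
the subset containing i, $x_3 + 1$ the first
position of the subset containing j, and $x_4 + 1$
the first position of the subset partition
containing j. When i and j share a subset or a
subset partition, some of these ranges are empty.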
- $[i, x_1]$ and $[x_4+1, j]$ each fall within a
single subset partition of size loglog n, so their
minima are available from the Procedure II table
entry of that partition.
- $[x_1+1, x_2]$ and $[x_3+1, x_4]$ are unions of
subset partitions of size loglog n, and each falls
within a single subset of size log n, so their
minima are available from the application of
Procedure I on that subset.
- $[x_2+1, x_3]$ is a union of subsets of size
log n, so its minimum is available from the
first application of Procedure I.
- The minimum over [i, j] is the smallest of these
(at most) five minima.
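A minimal sketch of the query decomposition, assuming 0-based positions, subsets of size b and subset partitions of size s, with s dividing b; in the algorithm b would be about log n and s about loglog n, rounded so that this holds. The function is a hypothetical helper that returns the non-empty ranges among the five, in order; it only shows how x1, ..., x4 are located and is not a full range-minima structure.

```python
def decompose(i, j, b, s):
    """Split [i, j] (inclusive, i <= j) into the at most five ranges
    [i, x1], [x1+1, x2], [x2+1, x3], [x3+1, x4], [x4+1, j]."""
    assert i <= j and b % s == 0
    if i // s == j // s:                  # both ends in one partition:
        return [(i, j)]                   # a single Procedure II lookup
    x1 = (i // s + 1) * s - 1             # last position of i's partition
    x4 = (j // s) * s - 1                 # position just before j's partition
    if i // b == j // b:                  # same subset, different partitions
        parts = [(i, x1), (x1 + 1, x4), (x4 + 1, j)]
    else:
        x2 = (i // b + 1) * b - 1         # last position of i's subset
        x3 = (j // b) * b - 1             # position just before j's subset
        parts = [(i, x1), (x1 + 1, x2), (x2 + 1, x3), (x3 + 1, x4), (x4 + 1, j)]
    return [(lo, hi) for lo, hi in parts if lo <= hi]   # drop empty ranges


print(decompose(3, 20, 8, 2))
# [(3, 3), (4, 7), (8, 15), (16, 19), (20, 20)]
```

With b = 8 and s = 2, the pieces (3, 3) and (20, 20) are answered by Procedure II, (4, 7) and (16, 19) by Procedure I inside a single subset, and (8, 15) by the first application of Procedure I.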
Space and Time Complexity
- Did we achieve linear time and space complexity,
as promised? Let's check.
- Recall that our preprocessing algorithm runs in
three stages. We'll check each stage separately.
- Denote by n the size of our input list L.
- We assume n is a power of 2 for convenience.
- The time and space complexity of the first stage
can be computed as follows.
- Partitioning L into n/log n subsets of size log n
each, and finding the minimum of each subset:
- Time O(n): one pass through L is enough.
- Space O(n/log n) for storing all subset data.
- Applying Procedure I to an array of n/log n
minima takes time and space according to the
complexity of Procedure I:
$O\left(\frac{n}{\log n} \log \frac{n}{\log n}\right) \le O\left(\frac{n}{\log n} \log n\right) = O(n)$,
since $n/\log n < n$.
- Total time and space complexity: O(n).
- The time and space complexity of the second stage
can be computed as follows.
- Partitioning each of the n/log n subsets into
smaller subsets of size loglog n each, and finding
the minimum of each new subset:
- Time O(n): one pass through L is enough.
- Space O(n/loglog n) for storing all subset data.
- Applying Procedure I to n/log n arrays of
log n/loglog n minima each takes time and space
according to the complexity of Procedure I:
$\frac{n}{\log n} \cdot O\left(\frac{\log n}{\log\log n} \log \frac{\log n}{\log\log n}\right) \le \frac{n}{\log n} \cdot O\left(\frac{\log n}{\log\log n} \log\log n\right) = O(n)$,
since $\log n/\log\log n < \log n$.
- Total time and space complexity: O(n).
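- The third stage simply runs Procedure II on
inputs of size loglog n. So the time and space
complexity of the third stage of the algorithm
can be computed as follows.
- Building the table takes time and space according
to the complexity of Procedure II:
$O\left(2^{\log\log n} \log^2\log n\right) = O\left(\log n \cdot \log^2\log n\right) \le O(\log^2 n)$,
since $\log^2\log n < \log n$ for sufficiently large n.
- Identifying the table entry of each subset
partition takes one more pass through L: O(n) time.
- Total time and space complexity: O(n), of which
the table itself takes only O(log^2 n).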
Total time and space complexity: O(n)
Aftermath
- How much did we really gain by reducing the LCA
problem to the restricted range-minima problem?
- Could we be satisfied with reducing only to the
range-minima problem?
- Recall that the restricted range-minima
reduction allows us to use Procedure II, which
assumes input of a restricted nature. We used
Procedure II to answer range queries on subsets of
size at most loglog n.
- We could instead apply Procedure I to each of
these subsets of size loglog n, which would bring
the time and space complexity of the whole
algorithm to
$\frac{n}{\log\log n} \cdot O(\log\log n \cdot \log\log\log n) = O(n \log\log\log n)$.
- If we chose to further partition these subsets
into subsets of size logloglog n, the same
computation would give O(n loglogloglog n). We can
continue in this fashion as far as we like,
improving the algorithm's complexity along the way.
- If k is the number of partition stages our
algorithm applies, each stage costs O(n) and the
final application of Procedure I costs
$O(n \log^{(k+1)} n)$, where $\log^{(i)}$ denotes the
logarithm applied i times; the total is
$O(kn + n \log^{(k+1)} n)$.
- The time and space complexity of our
preprocessing algorithm for the unrestricted
range minima problem is then $O(n \log^* n)$: with
about $\log^* n$ partition stages the smallest
subsets have constant size, and each stage costs
O(n).
- For practical applications the unrestricted
range minima reduction is therefore enough,
considerably simplifying the implementation.
- The restricted range minima reduction is needed
mostly for theoretical purposes.
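To get a feel for how slowly these bounds grow, a small sketch of the iterated logarithm, assuming base 2: iterated_log(n, k) applies log k times, and log_star(n) counts how many applications bring n down to at most 1. Both names are hypothetical.

```python
import math


def iterated_log(n, k):
    """log applied k times to n (base 2)."""
    x = n
    for _ in range(k):
        x = math.log2(x)
    return x


def log_star(n):
    """Number of times log2 must be applied before the value is <= 1."""
    count, x = 0, n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


print(iterated_log(65536, 2))   # 4.0
print(log_star(65536))          # 4
```

Since log* only reaches 5 at n = 2^65536, the number of partition stages stays tiny for any realistic input.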