Title: 4. Search Trees
- Balanced Binary Search Trees
- Self-Adjusting Binary Search Trees
Read Tarjan 45-70, CLR 244-277
Sorted Sets
- A collection of sorted sets is an abstract data type representing a collection of items, each having a key and belonging to one of several sets. Sets are identified by one of their items. Initially, each item belongs to a singleton set.
- The operations are:
  - setkey(i,k): Initialize the key of item i to k. i is assumed to belong to a singleton set.
  - access(k,s): Return the item in set s having key k.
  - insert(i,s): Insert item i into s. i is assumed to be in a singleton set initially.
  - delete(i,s): Remove item i from set s. This leaves i in a singleton set.
  - join(s1,i,s2): Return the set formed by combining s1, i and s2, where every item in s1 is assumed to have key less than key(i) and every item in s2 is assumed to have key greater than key(i). This operation destroys s1 and s2.
  - split(i,s): Split the sorted set s containing i into three sets: s1, containing all items with key less than key(i); i itself; and s2, containing all items with key larger than key(i). Return the pair s1,s2. This operation destroys s.
- The keys of the items in each set must be distinct.
Sorted Sets and Binary Search Trees
- Symmetric ordering: items are placed in the tree so that for every node v
  - keys of nodes in v's left subtree are smaller than key(v)
  - keys of nodes in v's right subtree are larger than key(v)
- To insert a new key value, search for the key and insert at the place where the search falls out of the tree.
Implementing Sorted Sets with BSTs
    typedef int bst, item, keytyp;
    struct tpair { bst t1, t2; };
    class bsts {
        int n;
        struct {
            int lchild, rchild, parent;
            keytyp keyfield;
        } vec[SPSIZ+1];
    public:
        bsts(int);
        keytyp key(item);
        void setkey(item, keytyp);
        . . .
    };
    #define left(x) vec[x].lchild
    // right(x) is assumed to be defined analogously

    item bsts::access(keytyp k, bst t) {
        // Walk down from t until k is found or the search falls out of the tree.
        while (t != Null && k != key(t)) {
            if (k < key(t)) t = left(t);
            else t = right(t);
        }
        return t;
    }
    item function access(keytype k, bst t);
        // invariant: if any node in the tree has key k, then the subtree rooted at t does
        do t ≠ null and k < key(t) → t := left(t)
         | t ≠ null and k > key(t) → t := right(t)
        od;
        return t
    end;

    procedure insert(item i, bst t);
        item x; x := t;
        // invariant: the proper insertion location for i is in the subtree with root x
        do key(i) < key(x) and left(x) ≠ null → x := left(x)
         | key(i) > key(x) and right(x) ≠ null → x := right(x)
        od;
        if key(i) < key(x) → left(x) := i
         | key(i) > key(x) → right(x) := i
        fi;
        p(i) := x
    end;
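The two routines above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the `Item` class and its field names are assumptions, `None` stands in for null, and `insert` assumes a non-empty tree, as the pseudocode does.

```python
class Item:
    """An item with a key and left/right/parent links (names are illustrative)."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def access(k, t):
    """Return the item with key k in the tree rooted at t, or None."""
    while t is not None and k != t.key:
        t = t.left if k < t.key else t.right
    return t

def insert(i, t):
    """Insert singleton item i into the non-empty tree rooted at t."""
    x = t
    # Walk down until the search would fall out of the tree below x.
    while True:
        if i.key < x.key and x.left is not None:
            x = x.left
        elif i.key > x.key and x.right is not None:
            x = x.right
        else:
            break
    if i.key < x.key:
        x.left = i
    elif i.key > x.key:
        x.right = i
    else:
        raise ValueError("keys in a set must be distinct")
    i.p = x

root = Item(50)
for k in [30, 70, 20, 40, 60]:
    insert(Item(k), root)
found = access(40, root)     # present: hangs below 30
missing = access(45, root)   # absent: the search falls out of the tree
```

Both routines follow a single root-to-leaf path, so their cost is proportional to the depth reached.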
    procedure delete(item i, bst t);
        item j;
        if left(i) ≠ null and right(i) ≠ null →
            // find node j with next smaller key
            j := left(i);
            do right(j) ≠ null → j := right(j) od;
            swapplaces(i,j)
        fi;
        // i has < 2 children
        if left(i) = null → left(i) ↔ right(i) fi;
        p(left(i)) := p(i);
        if i = left(p(i)) → left(p(i)) := left(i)
         | i = right(p(i)) → right(p(i)) := left(i)
        fi;
        left(i), right(i), p(i) := null
    end;
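A Python sketch of deletion follows. It is a variant rather than a transcription: instead of calling swapplaces and then splicing out i, it splices out the predecessor j and moves j into i's position, which yields the same tree. The helper `replace_child`, the `Item` class, and the convention of returning the new root are all assumptions made for this example.

```python
class Item:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def replace_child(parent, old, new):
    """Make `new` occupy the position `old` held under `parent` (either may be None)."""
    if parent is not None:
        if parent.left is old:
            parent.left = new
        else:
            parent.right = new
    if new is not None:
        new.p = parent

def delete(i, t):
    """Remove item i from the tree rooted at t; return the new root."""
    if i.left is not None and i.right is not None:
        j = i.left                       # find j, the item with the next smaller key
        while j.right is not None:
            j = j.right
        replace_child(j.p, j, j.left)    # splice j out (it has no right child)
        j.left, j.right = i.left, i.right
        for child in (j.left, j.right):
            if child is not None:
                child.p = j
        replace_child(i.p, i, j)         # j takes i's place
        new_root = j if t is i else t
    else:
        child = i.left if i.left is not None else i.right
        replace_child(i.p, i, child)     # i has fewer than 2 children
        new_root = child if t is i else t
    i.left = i.right = i.p = None        # i is a singleton again
    return new_root

def insert(i, t):
    x = t
    while True:
        nxt = x.left if i.key < x.key else x.right
        if nxt is None:
            break
        x = nxt
    if i.key < x.key:
        x.left = i
    else:
        x.right = i
    i.p = x

def inorder(x):
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)

items = {k: Item(k) for k in [50, 30, 70, 20, 40, 35]}
root = items[50]
for k in [30, 70, 20, 40, 35]:
    insert(items[k], root)
root = delete(items[50], root)   # two children: predecessor 40 takes its place
after_first = inorder(root)
root = delete(items[20], root)   # a leaf
after_second = inorder(root)
```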
    sorted set function join(bst t1, item i, bst t2);
        left(i) := t1; right(i) := t2;
        p(t1), p(t2) := i;
        return i
    end;

    bst, bst function split(item i, bst t);
        bst x, y, t1, t2;
        x, y := p(i), i; t1, t2 := left(y), right(y);
        // invariant: t1 (t2) includes all nodes at or below y that belong in the left (right) tree after the split
        do y ≠ t and y = left(x) → x, y, t2 := p(x), x, join(t2, x, right(x))
         | y ≠ t and y = right(x) → x, y, t1 := p(x), x, join(left(x), x, t1)
        od;
        left(i), right(i), p(i) := null;
        p(t1), p(t2) := null;
        return t1, t2
    end;
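The bottom-up split can be sketched in Python as below. The `Item` class and the `None` checks are assumptions of this sketch (the slides' null appears to be assignable, which `None` is not). The example tree is itself built with join.

```python
class Item:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def join(t1, i, t2):
    """Make i the root, with t1 as its left subtree and t2 as its right subtree."""
    i.left, i.right = t1, t2
    for child in (t1, t2):
        if child is not None:
            child.p = i
    return i

def split(i, t):
    """Split the tree rooted at t at item i; return (t1, t2)."""
    x, y = i.p, i
    t1, t2 = i.left, i.right
    while y is not t:
        up = x.p                        # remember p(x) before x is relinked
        if y is x.left:
            t2 = join(t2, x, x.right)   # x and its right subtree are larger than i
        else:
            t1 = join(x.left, x, t1)    # x and its left subtree are smaller than i
        x, y = up, x
    i.left = i.right = i.p = None
    for root in (t1, t2):
        if root is not None:
            root.p = None
    return t1, t2

def inorder(x):
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)

n = {k: Item(k) for k in [20, 30, 40, 50, 60, 70, 80]}
tree = join(join(n[20], n[30], n[40]), n[50], join(n[60], n[70], n[80]))
t1, t2 = split(n[40], tree)
smaller, larger = inorder(t1), inorder(t2)
```

The loop climbs from i to the root once, doing constant work per level, which matches the claim that split costs time proportional to the depth of i.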
Analysis of Binary Search Trees
- Access takes time proportional to the depth of the accessed item.
- Insert takes time proportional to the depth of the item after insertion.
- Delete takes time proportional to the depth of the deleted item if it has a null child, and time proportional to the depth of its symmetric-order predecessor if it has no null child.
- Join takes constant time.
- Split takes time proportional to the depth of the item on which the split is taking place.
- The depth of a binary search tree on n nodes can be $n-1$ in the worst case, so most operations have worst-case running time $\Theta(n)$.
- We can improve the running time of BST operations to $O(\log n)$ by balancing subtrees.
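To make the worst case concrete, the sketch below inserts the same 15 keys in two orders and measures the resulting depth. The helper names (`Node`, `build`, `depth`) are illustrative, and the "balanced" order is simply a level-order listing chosen by hand.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain (unbalanced) BST insertion; returns the root."""
    if root is None:
        return Node(key)
    x = root
    while True:
        if key < x.key:
            if x.left is None:
                x.left = Node(key)
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = Node(key)
                return root
            x = x.right

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

def depth(root):
    """Depth of the deepest node (the root has depth 0), computed iteratively."""
    best, stack = -1, [(root, 0)]
    while stack:
        node, d = stack.pop()
        if node is None:
            continue
        best = max(best, d)
        stack.append((node.left, d + 1))
        stack.append((node.right, d + 1))
    return best

n = 15
sorted_depth = depth(build(range(1, n + 1)))        # keys in increasing order: one long path
level_order = [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]
balanced_depth = depth(build(level_order))          # same keys, perfectly balanced
```

With sorted input the depth is n-1 = 14, while the level-order input gives depth 3.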
Balanced Binary Trees
- A balanced binary tree is a full binary tree each of whose nodes x has an integer rank, denoted rank(x), such that the ranks satisfy the following properties.
  - if x is a node with a parent, $\mathrm{rank}(x) \le \mathrm{rank}(p(x)) \le \mathrm{rank}(x) + 1$
  - if x is a node with a grandparent, $\mathrm{rank}(x) < \mathrm{rank}(p(p(x)))$
  - if x is an external node, $\mathrm{rank}(x) = 0$; if x also has a parent, $\mathrm{rank}(p(x)) = 1$
- Also called red-black trees.
- It is sufficient to store 1 bit of balance information per node.
Depth of Balanced Binary Trees
- Lemma 4.1. A node of rank k in a balanced binary tree has height at most $2k$ and at least $2^{k+1} - 1$ descendants. Therefore, a balanced binary tree with n internal nodes has depth at most $2\lg(n+1)$.
- Proof. The proof of the first part is by induction on k. The basis ($k = 0$) is obvious: by the definition of the ranks, any node of rank 0 must be an external node, hence its height is 0 and it has 1 descendant. Assuming the lemma is true for nodes of rank k, let x be a node of rank $k+1$. By the definition of ranks and the induction hypothesis, the grandchildren of x have height at most $2k$, so x can have height at most $2(k+1)$. Similarly, its two subtrees must each contain at least $2^{k+1} - 1$ nodes, so x has a total of at least $2(2^{k+1} - 1) + 1 = 2^{k+2} - 1$ descendants. A full binary tree with n internal nodes contains a total of $2n+1$ nodes. By the first part of the lemma, the rank of the root is at most $\lg(n+1)$ and the height of the root is at most twice its rank. □
- By Lemma 4.1, the access time in a balanced binary tree is $O(\log n)$.
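The last step of the proof can be spelled out as follows. This is a reconstruction of the arithmetic, writing k for the rank of the root and counting all $2n+1$ nodes of the tree as descendants of the root.

```latex
\begin{align*}
2^{k+1} - 1 &\le 2n + 1 && \text{(the root has at least $2^{k+1}-1$ descendants)} \\
2^{k+1} &\le 2(n + 1) \\
k &\le \lg(n + 1) \\
\mathrm{height}(\mathrm{root}) &\le 2k \le 2\lg(n + 1)
\end{align*}
```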
Rotation Operations
[Figure: single rotation (rrotate(x), lrotate(y)); double rotation (rrotate(y), lrotate(x)).]
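A possible Python rendering of the two single rotations is sketched below. It assumes the argument is the node at the top of the rotated pair and that the new subtree root is returned, which is how the pseudocode later in these slides uses rrotate and lrotate. The `Item` class and the `_hang` helper are illustrative.

```python
class Item:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.p = self

def _hang(parent, old, new):
    """Put `new` where `old` was under `parent` (parent may be None)."""
    new.p = parent
    if parent is not None:
        if parent.left is old:
            parent.left = new
        else:
            parent.right = new

def rrotate(y):
    """Rotate right at y: y's left child x moves up and y becomes x's right child."""
    x = y.left
    _hang(y.p, y, x)
    y.left = x.right            # x's right subtree changes sides
    if y.left is not None:
        y.left.p = y
    x.right, y.p = y, x
    return x

def lrotate(x):
    """Rotate left at x: x's right child y moves up and x becomes y's left child."""
    y = x.right
    _hang(x.p, x, y)
    x.right = y.left
    if x.right is not None:
        x.right.p = x
    y.left, x.p = x, y
    return y

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

root = Item(20, Item(10, Item(5), Item(15)), Item(25))
before = inorder(root)
top = rrotate(root)                                    # 10 moves up
shape = (top.key, top.right.key, top.right.left.key)
after = inorder(top)
restored = lrotate(top)                                # the inverse rotation
```

A rotation changes only a constant number of links and leaves the symmetric order unchanged, which is why `before` and `after` agree.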
Insertion in a Balanced Binary Tree
[Figure: insertion example showing an insert followed by promote(m) and rrotate(n).]
Implementation of Insertion Operation
    procedure insert(item i, bst t);
        item x, gpx;
        left(i), right(i) := null; x := t;
        do key(i) < key(x) and left(x) ≠ null → x := left(x)
         | key(i) > key(x) and right(x) ≠ null → x := right(x)
        od;
        if key(i) < key(x) → left(x) := i
         | key(i) > key(x) → right(x) := i
        fi;
        p(i) := x; x := i;
        do p(x) ≠ null and p(p(x)) ≠ null and rank(x) = rank(p(p(x))) →
            gpx := p(p(x));
            if rank(left(gpx)) = rank(right(gpx)) →
                rank(gpx) := rank(gpx) + 1; x := gpx
             | rank(left(gpx)) ≠ rank(right(gpx)) →
                if x = left(left(gpx)) → x := rrotate(gpx)
                 | x = right(right(gpx)) → x := lrotate(gpx)
                 | x = left(right(gpx)) → x := rrotate(p(x)); x := lrotate(p(x))
                 | x = right(left(gpx)) → x := lrotate(p(x)); x := rrotate(p(x))
                fi
            fi
        od
    end;
Self-Adjusting Binary Trees
- By Theorem 5.1, a sequence of m dynamic tree operations requires $O(m \log n)$ path set operations. If path sets are implemented with balanced binary search trees, each operation takes $O(\log n)$ time, giving $O(m \log^2 n)$ time for m dynamic tree operations. This can be improved with self-adjusting binary search trees.
- By restructuring a binary search tree after each operation, we can get an $O(\log n)$ running time per operation in an amortized sense, without the need for an explicit balance condition.
- The restructuring operation is the splay, which moves one vertex x to the root of the tree by a sequence of rotations. This restructuring also moves other vertices closer to the root.
  - a descendant z of x moves at least $\lfloor \mathrm{depth}(x)/2 \rfloor$ steps closer to the root
  - an ancestor z of x moves at least $\lfloor \mathrm{depth}(z)/2 \rfloor - 2$ steps closer to the root
  - an unrelated vertex z moves at least $\lfloor \mathrm{depth}(y)/2 \rfloor - 2$ steps closer to the root, where y is the nearest common ancestor of x and z (before the splay)
Illustration of Splay Steps
[Figure: the cases of splaystep(x). No grandparent and x is a left child. Grandparent and x is a left-left grandchild: rrotate(z), rrotate(y). Grandparent and x is a right-left grandchild: rrotate(y), lrotate(z).]
Implementation of Splay
    sorted set function splay(item x);
        if x = null → return null fi;
        do p(x) ≠ null → splaystep(x) od;
        return x
    end;

    procedure splaystep(item x);
        item y, z;
        if p(x) = null → return fi;
        y := p(x);
        // no grandparent: last step of splay
        if p(y) = null and x = left(y) → rrotate(y); return
         | p(y) = null and x = right(y) → lrotate(y); return
        fi;
        z := p(y);
        if x = left(left(z)) → rrotate(z); rrotate(y)
         | x = right(right(z)) → lrotate(z); lrotate(y)
         // each of the next two cases moves descendants of x up 1
         | x = left(right(z)) → rrotate(y); lrotate(z)
         | x = right(left(z)) → lrotate(y); rrotate(z)
        fi
    end;
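The splay procedure can be sketched in Python as below, following the same case analysis. The `Item` class, the `_hang` helper, and the rotation bodies are assumptions of this sketch; only the order of rotations in each case is taken from the pseudocode.

```python
class Item:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def _hang(parent, old, new):
    """Put `new` where `old` was under `parent` (parent may be None)."""
    new.p = parent
    if parent is not None:
        if parent.left is old:
            parent.left = new
        else:
            parent.right = new

def rrotate(y):
    x = y.left
    _hang(y.p, y, x)
    y.left = x.right
    if y.left is not None:
        y.left.p = y
    x.right, y.p = y, x

def lrotate(y):
    x = y.right
    _hang(y.p, y, x)
    y.right = x.left
    if y.right is not None:
        y.right.p = y
    x.left, y.p = y, x

def splaystep(x):
    y = x.p
    if y is None:
        return
    z = y.p
    if z is None:                         # no grandparent: last step of the splay
        if x is y.left:
            rrotate(y)
        else:
            lrotate(y)
    elif x is y.left and y is z.left:     # x = left(left(z))
        rrotate(z)
        rrotate(y)
    elif x is y.right and y is z.right:   # x = right(right(z))
        lrotate(z)
        lrotate(y)
    elif x is y.left:                     # x = left(right(z))
        rrotate(y)
        lrotate(z)
    else:                                 # x = right(left(z))
        lrotate(y)
        rrotate(z)

def splay(x):
    if x is None:
        return None
    while x.p is not None:
        splaystep(x)
    return x

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

items = [Item(k) for k in range(1, 6)]
for child, parent in zip(items, items[1:]):   # left path 5-4-3-2-1, with 1 deepest
    parent.left, child.p = child, parent
root = splay(items[0])
```

Splaying the deepest node of this 5-node path takes two left-left steps and leaves 1 at the root with 4 as its right child, so the tree is shallower than before.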
Implementing Self-Adjusting BSTs
    item function access(keytype k, bst t);
        if t = null → return null fi;
        do k < key(t) and left(t) ≠ null → t := left(t)
         | k > key(t) and right(t) ≠ null → t := right(t)
        od;
        t := splay(t);      // time bounded by number of splay steps
        if k = key(t) → return t
         | k ≠ key(t) → return null
        fi
    end;

    bst, bst function split(item i, bst t);
        bst t1, t2;
        splay(i);           // time bounded by number of splay steps
        t1, t2 := left(i), right(i); p(t1), p(t2) := null;
        left(i), right(i) := null;
        return t1, t2
    end;
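The splay-based access and split might look as follows in Python. Several things here are assumptions of the sketch: the compact `rotate_up` helper (one rotation at x's parent), `access` returning the new root alongside the result so the caller can keep track of the tree, and the `chain` helper used to build a worst-case starting tree.

```python
class Item:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def rotate_up(x):
    """Rotate x above its parent (an rrotate or lrotate at the parent)."""
    y, z = x.p, x.p.p
    if x is y.left:
        y.left, x.right = x.right, y
        if y.left is not None:
            y.left.p = y
    else:
        y.right, x.left = x.left, y
        if y.right is not None:
            y.right.p = y
    y.p, x.p = x, z
    if z is not None:
        if z.left is y:
            z.left = x
        else:
            z.right = x

def splay(x):
    while x.p is not None:
        y, z = x.p, x.p.p
        if z is not None:
            # same-side case: rotate at the grandparent first; otherwise rotate x twice
            rotate_up(y if (x is y.left) == (y is z.left) else x)
        rotate_up(x)
    return x

def access(k, t):
    """Return (item with key k or None, new root); the last node visited is splayed."""
    if t is None:
        return None, None
    while True:
        nxt = t.left if k < t.key else t.right if k > t.key else None
        if nxt is None:
            break
        t = nxt
    t = splay(t)
    return (t if t.key == k else None), t

def split(i):
    """Splay i to the root, then detach its two subtrees."""
    splay(i)
    t1, t2 = i.left, i.right
    for root in (t1, t2):
        if root is not None:
            root.p = None
    i.left = i.right = None
    return t1, t2

def chain(keys):
    """Build a degenerate right path from sorted keys (worst-case shape)."""
    nodes = [Item(k) for k in keys]
    for a, b in zip(nodes, nodes[1:]):
        a.right, b.p = b, a
    return nodes

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

nodes = chain(range(1, 8))
found, root = access(5, nodes[0])
miss, root = access(10, root)      # unsuccessful search still splays the last node
t1, t2 = split(nodes[3])           # split at the item with key 4
```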
    procedure insert(item i, bst t);
        item x; x := t;
        do key(i) < key(x) and left(x) ≠ null → x := left(x)
         | key(i) > key(x) and right(x) ≠ null → x := right(x)
        od;
        if key(i) < key(x) → left(x) := i
         | key(i) > key(x) → right(x) := i
        fi;
        p(i) := x;
        splay(i)            // time bounded by number of splay steps
    end;
    procedure delete(item i, bst t);
        item j;
        if left(i) ≠ null and right(i) ≠ null →
            j := left(i);
            do right(j) ≠ null → j := right(j) od;
            swapplaces(i,j)
        fi;
        if left(i) = null → left(i) ↔ right(i) fi;
        p(left(i)) := p(i);
        if i = left(p(i)) → left(p(i)) := left(i)
         | i = right(p(i)) → right(p(i)) := left(i)
        fi;
        splay(p(i));        // time bounded by number of splay steps
        left(i), right(i), p(i) := null
    end;
Analysis of Self-Adjusting BSTs
- Objective is to show that a sequence of m operations on a collection of trees with a total of n vertices takes $O(m \log n)$ time.
- We use a credit scheme to account for running time.
  - all operations but join include a splay, so we can account for their running time by bounding the time for all the splays
  - we allocate up to $C \lg n$ credits for each splay and each join (C to be determined)
  - time for a splay is proportional to the number of splay steps, so we can account for the running time of a splay by spending one credit for each splay step
  - credits not needed to pay for performing an operation are retained for use in later steps
- To ensure there are enough credits on hand to pay for later operations, we maintain the following credit invariant.
  - for a vertex x, keep rank(x) credits, where $\mathrm{rank}(x) = \lfloor \lg(\text{number of descendants of } x) \rfloor$
- Note that balanced trees need fewer credits than unbalanced trees, so splay operations release credits that can be used to pay for splay steps.
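The remark that balanced trees need fewer credits can be checked on a small case. The sketch below assumes that a node counts as one of its own descendants, and represents trees as nested `(left, right)` tuples purely for brevity; neither choice comes from the slides.

```python
def path(n):
    """Degenerate tree: a left path of n nodes, as nested (left, right) tuples."""
    t = None
    for _ in range(n):
        t = (t, None)
    return t

def balanced(n):
    """A tree of n nodes whose subtree sizes are as equal as possible."""
    if n == 0:
        return None
    left = (n - 1) // 2
    return (balanced(left), balanced(n - 1 - left))

def credits(t):
    """Return (size, total credits), where each node holds floor(lg(size)) credits."""
    if t is None:
        return 0, 0
    left_size, left_credits = credits(t[0])
    right_size, right_credits = credits(t[1])
    size = left_size + right_size + 1
    rank = size.bit_length() - 1          # floor(lg(size)) for size >= 1
    return size, left_credits + right_credits + rank

path_credits = credits(path(15))[1]
balanced_credits = credits(balanced(15))[1]
```

For 15 nodes the path holds 34 credits and the balanced tree only 11, so a restructuring that turns the former into something closer to the latter releases credits.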
- Lemma 4.2. Splaying a tree with root v at a node u while maintaining the credit invariant requires at most $3(\mathrm{rank}(v) - \mathrm{rank}(u)) + 1$ new credits.
- Proof. The credits are divided among the different splay steps. A splay step at node x with parent y and grandparent z is allocated $3(\mathrm{rank}(z) - \mathrm{rank}(x))$ credits. A splay step at a node x with a parent y but no grandparent is allocated $3(\mathrm{rank}(y) - \mathrm{rank}(x)) + 1$ credits. Let $\mathrm{rank}$ and $\mathrm{rank}'$ be the rank functions before and after the step.
- Case 1. x has no grandparent. This is the last step, and the extra credit pays for it. The number of additional credits needed to maintain the invariant is
  $$(\mathrm{rank}'(x) - \mathrm{rank}(x)) + (\mathrm{rank}'(y) - \mathrm{rank}(y)) = \mathrm{rank}'(y) - \mathrm{rank}(x) \le \mathrm{rank}(y) - \mathrm{rank}(x),$$
  which is one third of the credits available after the extra one is spent.
- Case 2. x = left(left(z)) or x = right(right(z)). If $\mathrm{rank}(z) = \mathrm{rank}(x) = k$, we get no new credits for this step, but $\mathrm{rank}'(z) < k$, so maintaining the invariant frees up at least one credit, which pays for the step. If $\mathrm{rank}(z) > \mathrm{rank}(x)$, the number of credits needed to maintain the invariant is
  $$(\mathrm{rank}'(x) - \mathrm{rank}(x)) + (\mathrm{rank}'(y) - \mathrm{rank}(y)) + (\mathrm{rank}'(z) - \mathrm{rank}(z)) = \mathrm{rank}'(y) + \mathrm{rank}'(z) - \mathrm{rank}(x) - \mathrm{rank}(y) \le 2(\mathrm{rank}(z) - \mathrm{rank}(x)) < 3(\mathrm{rank}(z) - \mathrm{rank}(x)),$$
  releasing at least one extra credit to pay for the step.
- Case 3. x = left(right(z)) or x = right(left(z)). If $\mathrm{rank}(z) = \mathrm{rank}(x) = k$, we get no new credits for this step, but either $\mathrm{rank}'(z) < k$ or $\mathrm{rank}'(y) < k$, so maintaining the invariant frees up at least one credit, which pays for the step. If $\mathrm{rank}(z) > \mathrm{rank}(x)$, the number of credits needed to maintain the invariant is
  $$(\mathrm{rank}'(x) - \mathrm{rank}(x)) + (\mathrm{rank}'(y) - \mathrm{rank}(y)) + (\mathrm{rank}'(z) - \mathrm{rank}(z)) = \mathrm{rank}'(y) + \mathrm{rank}'(z) - \mathrm{rank}(x) - \mathrm{rank}(y) \le 2(\mathrm{rank}(z) - \mathrm{rank}(x)) < 3(\mathrm{rank}(z) - \mathrm{rank}(x)),$$
  releasing at least one extra credit to pay for the step. □
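The per-step allocations add up to the bound in the lemma by a telescoping sum. In the sketch below, the notation $r_j$ is introduced only for this illustration: $r_0 = \mathrm{rank}(u)$ is the rank of u before the splay and $r_j$ is its rank after step j, for a splay of s steps. After a step, u has exactly the descendants that the old subtree root (z, or y in the last step) had before it, so its new rank equals that node's old rank, and after the final step $r_s = \mathrm{rank}(v)$.

```latex
\begin{align*}
\text{step } j \text{ (with grandparent } z\text{)}: &\quad 3(\mathrm{rank}(z) - \mathrm{rank}(x)) = 3(r_j - r_{j-1}) \\
\text{last step (no grandparent)}: &\quad 3(\mathrm{rank}(y) - \mathrm{rank}(x)) + 1 = 3(r_s - r_{s-1}) + 1 \\
\text{total}: &\quad \sum_{j=1}^{s} 3(r_j - r_{j-1}) + 1 = 3(r_s - r_0) + 1 = 3(\mathrm{rank}(v) - \mathrm{rank}(u)) + 1
\end{align*}
```

If the final step happens to have a grandparent, the extra credit is not needed, so the total is at most this amount.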
- By the lemma, each splay takes at most $3\lfloor \lg n \rfloor + 1$ credits. The number of credits needed for an insert is this number plus the number of new credits needed to maintain the credit invariant after the new item is inserted but before the splay is done. The only nodes whose ranks can increase are those on the path from the root to the inserted node that have exactly $2^k - 1$ descendants before the operation (where $k \in \{0, \ldots, \lfloor \lg n \rfloor\}$). There can be at most $\lfloor \lg n \rfloor + 1$ of these, so the total number of credits required for an insert is at most $4\lfloor \lg n \rfloor + 2$.
- The join operation requires at most $\lfloor \lg n \rfloor$ credits.
- All other operations require no credits beyond those used by the splay.
- Theorem 4.1. The total time required for a sequence of m sorted set operations on n vertices, using self-adjusting binary search trees, is $O(m \log n)$, where n is the number of insert and join operations.
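As an informal sanity check of the credit argument (not a proof), the sketch below counts splay steps over a sequence of splays and compares the total with a budget of $3\lfloor \lg n \rfloor + 1$ credits per splay plus the credits an arbitrary initial tree may hold, at most $n\lfloor \lg n \rfloor$. The starting shape, the access pattern, and all helper names are choices made for this example.

```python
class Item:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def rotate_up(x):
    """Rotate x above its parent."""
    y, z = x.p, x.p.p
    if x is y.left:
        y.left, x.right = x.right, y
        if y.left is not None:
            y.left.p = y
    else:
        y.right, x.left = x.left, y
        if y.right is not None:
            y.right.p = y
    y.p, x.p = x, z
    if z is not None:
        if z.left is y:
            z.left = x
        else:
            z.right = x

def splay(x):
    """Splay x to the root; return the number of splay steps performed."""
    steps = 0
    while x.p is not None:
        y, z = x.p, x.p.p
        if z is not None:
            rotate_up(y if (x is y.left) == (y is z.left) else x)
        rotate_up(x)
        steps += 1
    return steps

n, rounds = 64, 4
nodes = [Item(k) for k in range(n)]
for a, b in zip(nodes, nodes[1:]):        # start from the worst case: one long path
    a.right, b.p = b, a

m = total_steps = 0
for _ in range(rounds):
    for x in nodes:                       # splay every item, in key order
        total_steps += splay(x)
        m += 1

lg_n = n.bit_length() - 1                 # floor(lg n)
budget = m * (3 * lg_n + 1) + n * lg_n    # per-splay credits plus initial invariant
```

Individual splays in this run can be long (the first splay of a deep node walks most of the path), but the total stays within the budget, which is the amortized claim.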