Title: Monkey Business: In The Trees
1. Monkey Business: In The Trees
- Helpful Reading: CLR Ch. 13, 14, 19
- (MUCH more detail than this lecture will contain)
2. Why Trees? We're Not Monkeys
- If a tree of n elements is balanced and complete,
then we can expect its height to be O(log n) --
this often leads to efficient operations on trees
- Search trees are good priority queues and
dictionaries!
- We've already seen binary heaps, which are
complete trees with minimal ordering discipline
- Not good for searches or deletes -- O(n)
- We will study searchable trees
- Binary search trees
- Red-black trees (balanced binary search trees)
- B-trees (generalized balanced trees)
3. Foundation: Binary Search Trees
- Data in the tree is attached to a key for
ordering purposes (keys must be comparable)
- Binary search tree property: For any node, every
key in its left subtree (if any) is less than or
equal to that node's key, and every key in its
right subtree (if any) is greater than or equal to
that node's key
        LEGAL                  ILLEGAL
          12                     12
        /    \                 /    \
       7      15              7      13
      / \    /  \            / \    /  \
     3   9  14   17         3   9  14   17
4. BST: SEARCH(key) -> data
- To find a particular key, start at root
- If the root has the key, return its attached data
- Otherwise, recursively search the subtree in
which the key should be found
- if the key you're looking for is less than the
root's key, search the left subtree
- if the key you're looking for is bigger than the
root's key, search the right subtree
- Expect O(log n), but really is O(n) -- WHY?
- Let's look at inserts
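The search steps can be sketched in a few lines of Python. This is only an illustrative sketch: the `Node` class and the name `bst_search` are assumptions made for the example, not names from the lecture or from CLR.

```python
class Node:
    """A BST node: a key, its attached data, and two child links."""
    def __init__(self, key, data, left=None, right=None):
        self.key = key
        self.data = data
        self.left = left
        self.right = right


def bst_search(node, key):
    """Return the data attached to key, or None if key is absent."""
    if node is None:
        return None                        # fell off the tree: key not present
    if key == node.key:
        return node.data
    if key < node.key:
        return bst_search(node.left, key)  # key can only be in the left subtree
    return bst_search(node.right, key)     # otherwise only in the right subtree


# A small legal BST with keys 12, 7, 15, 3, 9, 14, 17.
root = Node(12, "twelve",
            Node(7, "seven", Node(3, "three"), Node(9, "nine")),
            Node(15, "fifteen", Node(14, "fourteen"), Node(17, "seventeen")))
print(bst_search(root, 14))   # fourteen
print(bst_search(root, 10))   # None
```

Each call descends one level, so the cost is proportional to the height of the tree.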
5. BST: INSERT(key, data)
- Insertion always occurs at a spot with an empty
pointer (no tree rearrangement)
- Starting at the root, search for the leaf to
which to append the new key
- If the new key is less than the current node's
key, move left to access the lesser subtree;
otherwise move right
- Repeat the process until the subtree you wish to
move to is null, then attach the new key
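A minimal iterative sketch of this insert, assuming a bare-bones `Node` class; the names `bst_insert` and `inorder` are made up for the example.

```python
class Node:
    def __init__(self, key, data):
        self.key, self.data = key, data
        self.left = self.right = None


def bst_insert(root, key, data):
    """Attach (key, data) as a new leaf; return the (possibly new) root."""
    new = Node(key, data)
    if root is None:
        return new                      # empty tree: the new node is the root
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:        # empty pointer found: attach here
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def inorder(node):
    """Keys in sorted order, used here only to inspect the result."""
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


root = None
for k in [12, 7, 15, 3, 9, 14, 17]:
    root = bst_insert(root, k, str(k))
print(inorder(root))   # [3, 7, 9, 12, 14, 15, 17]
```

No existing node is ever moved; the shape of the tree depends entirely on the order in which keys arrive.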
6. The Problem with BSTs
- Insertion always attaches a new leaf
- Height of a BST may be O(n) because there is no
requirement that the tree be balanced
- Consider inserting the following keys in this
order: 1, 2, 3, 4, 5, 6, 7, 8, 9
- In this and similar worst cases, BST degenerates
into a linked list
- Thus, insertion is O(n)
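To see the degenerate case concretely, the sketch below builds one BST from the sorted keys 1 through 9 and another from the same keys in a mixed order, then compares heights. The helper names (`insert`, `height`, `build`) and the mixed ordering are assumptions chosen for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None


def insert(node, key):
    """Plain BST insert; returns the root of the updated subtree."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


sorted_tree = build([1, 2, 3, 4, 5, 6, 7, 8, 9])
mixed_tree = build([5, 3, 8, 2, 4, 7, 9, 1, 6])
print(height(sorted_tree))   # 9: every node has only a right child
print(height(mixed_tree))    # 4
```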
7. BST: DELETE(key)
- To delete a node N from a BST
- if N has no children, just pluck it from tree
- if N has one child, splice out N (link N's
parent directly to N's child)
- if N has two children, find the successor node to
N (i.e. the node with the next largest key after
N's key), swap the contents of N with the
successor node's contents, and then delete the
successor node from the right subtree
- successor node is leftmost node in right subtree
- Deletion is O(n)
8. BST: PQ operations
- MAXIMUM returns the rightmost node in the tree --
O(n)
- MINIMUM returns the leftmost node in the tree --
O(n)
- EXTRACT-MIN / EXTRACT-MAX are simple deletions of
nodes with at most one child -- O(n)
- ALL operations would be O(log n) if we could
guarantee a balanced, complete tree
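A short sketch of these priority-queue operations on a plain BST. The function names (`minimum`, `maximum`, `extract_min`) are assumptions for the example, and `extract_min` assumes a non-empty tree.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def minimum(node):
    """Leftmost node of a non-empty tree."""
    while node.left is not None:
        node = node.left
    return node


def maximum(node):
    """Rightmost node of a non-empty tree."""
    while node.right is not None:
        node = node.right
    return node


def extract_min(root):
    """Remove the leftmost node; return (its key, new root)."""
    parent, cur = None, root
    while cur.left is not None:
        parent, cur = cur, cur.left
    if parent is None:
        return cur.key, cur.right      # the root itself was the minimum
    parent.left = cur.right            # the minimum has no left child: splice
    return cur.key, root


root = Node(12, Node(7, Node(3), Node(9)), Node(15, Node(14), Node(17)))
print(minimum(root).key, maximum(root).key)   # 3 17

drained = []
while root is not None:
    key, root = extract_min(root)
    drained.append(key)
print(drained)                                # [3, 7, 9, 12, 14, 15, 17]
```

Every loop follows a single root-to-leaf path, which is why the cost tracks the height of the tree.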
9. Red-Black Trees
- Binary search trees with four important
additional red-black properties
- 1. Every node is either red or black
- 2. Every leaf is black
- For purposes of RB trees, assume any pointer to
NULL actually points to an empty black node
- 3. If a node is red, it has two black children
- 4. Every simple (non-retracing) path from a node
to a descendant leaf contains the same number of
black nodes
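One way to make the four properties concrete is a small checker. The sketch below treats `None` as the empty black leaf; the names `RBNode` and `satisfies_rb_properties` are hypothetical, and the checker does not verify the BST ordering itself.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right


def _black_count(node):
    """Black nodes on every path from node down to an empty leaf (both ends
    counted), or None if some property fails at or below node."""
    if node is None:
        return 1                                 # property 2: empty leaves are black
    if node.color not in (RED, BLACK):
        return None                              # property 1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return None                      # property 3: red node, red child
    left = _black_count(node.left)
    right = _black_count(node.right)
    if left is None or right is None or left != right:
        return None                              # property 4: unequal black counts
    return left + (1 if node.color == BLACK else 0)


def satisfies_rb_properties(root):
    return _black_count(root) is not None


good = RBNode(12, BLACK, RBNode(7, RED), RBNode(15, RED))
red_red = RBNode(12, BLACK, RBNode(7, RED, RBNode(3, RED)), RBNode(15, RED))
lopsided = RBNode(12, BLACK, RBNode(7, BLACK), None)
print(satisfies_rb_properties(good))       # True
print(satisfies_rb_properties(red_red))    # False (property 3)
print(satisfies_rb_properties(lopsided))   # False (property 4)
```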
10. Why is an RB-tree balanced?
- Let bh(x) represent the black height of node x
(the number of black nodes any simple path from x
to the bottom of the tree encounters, not
including x itself)
- Because null pointers are replaced by empty black
nodes, any node with a NIL child limits its other
child to be a subtree of height at most two (one
node with two black empty children), or else
Property 4 is violated
- The end result is that the tree is as bushy as
possible, and thus any subtree rooted at x of an
RB tree contains at least 2^{bh(x)} - 1 non-empty,
key-bearing nodes (proof in CLR)
- By property 3, the black height of the root of an
RB tree of height h is at least h/2, and so for a
tree with n keyed nodes, it follows that
n >= 2^{h/2} - 1, which gives h <= 2 lg(n + 1),
i.e. h = O(lg n)
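Written out step by step, with h the height of the tree and n the number of keyed nodes, the last bullet's argument is:

```latex
\begin{align*}
n &\ge 2^{bh(\mathrm{root})} - 1
   && \text{(bushiness bound applied at the root)} \\
  &\ge 2^{h/2} - 1
   && \text{(since } bh(\mathrm{root}) \ge h/2 \text{ by property 3)} \\
n + 1 &\ge 2^{h/2} \\
\lg(n + 1) &\ge h/2 \\
h &\le 2\lg(n + 1) = O(\lg n)
\end{align*}
```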
11. RB Trees are Complex
- See CLR chapter 14 for complete pseudocode and
explanations
- I will briefly explain RB insert to show what's
involved
- The minutiae of managing RB trees are not material
for this class (but the use of an RB tree and why
it is balanced are!)
- The only difference in managing an RB tree from a
BST is that inserts and deletes have to preserve
the red-black properties of the tree
- Example insert in CLR p. 269
- Example delete in CLR p. 276
12. RB INSERT(key, data)
- Begins with insertion of new key done as if the
RB tree were a normal BST
- In insert, the inserted node (call it x) is
colored red (and black empty children added)
- This can ONLY cause a violation of property 3 (a
red node might have been attached to a red
parent)
- Swap colors of parent of x and the grandparent of
x (the grandparent is black by property 3, because
x's parent is red); if x's uncle is red, color it
black as well
- This could violate property 3 further up the tree
- Move the violation up to where the immediate
ancestor of the two red nodes has a black child
as its other child, then perform rotations to
remove the violation (or until you recolor root)
- After any insert, color the root of the tree black
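For a sense of what is involved, here is a compact sketch of red-black insertion. It fills in details the slide leaves out, using the usual textbook scheme as an assumption: recolor while the uncle is red, otherwise rotate. `None` stands in for the empty black leaves, and all names (`RBTree`, `_rotate`, `height`, `inorder`) are made up for the example.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key):
        self.key = key
        self.color = RED                  # new nodes start out red
        self.left = self.right = self.parent = None


class RBTree:
    def __init__(self):
        self.root = None

    def _rotate(self, x, left):
        """Rotate x down to the left (left=True) or to the right."""
        y = x.right if left else x.left        # y moves up into x's place
        inner = y.left if left else y.right    # subtree that changes parents
        if left:
            x.right, y.left = inner, x
        else:
            x.left, y.right = inner, x
        if inner is not None:
            inner.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x is x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        x.parent = y

    def insert(self, key):
        # Step 1: ordinary BST insert of a red node x.
        x, parent, cur = Node(key), None, self.root
        while cur is not None:
            parent, cur = cur, (cur.left if key < cur.key else cur.right)
        x.parent = parent
        if parent is None:
            self.root = x
        elif key < parent.key:
            parent.left = x
        else:
            parent.right = x
        # Step 2: repair red-red (property 3) violations, moving upward.
        while x.parent is not None and x.parent.color == RED:
            p, g = x.parent, x.parent.parent   # g exists: a red p is never the root
            p_is_left = p is g.left
            uncle = g.right if p_is_left else g.left
            if uncle is not None and uncle.color == RED:
                p.color = uncle.color = BLACK  # recolor, push the problem up
                g.color = RED
                x = g
            else:
                if x is (p.right if p_is_left else p.left):
                    self._rotate(p, left=p_is_left)   # straighten a zig-zag
                    x, p = p, x
                p.color, g.color = BLACK, RED  # swap parent/grandparent colors
                self._rotate(g, left=not p_is_left)
        self.root.color = BLACK                # Step 3: root is always black


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


tree = RBTree()
for k in range(1, 10):       # the sorted order that degenerates a plain BST
    tree.insert(k)
print(inorder(tree.root))    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(height(tree.root))     # at most 6, since h <= 2 lg(n + 1)
```

The recoloring branch can repeat up the tree, but the rotation branch runs at most once per insert, after which the loop ends.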
13. RB Rotations

        y       RightRotate(T,y)        x
       / \      --------------->       / \
      x   C                           A   y
     / \        <---------------         / \
    A   B       LeftRotate(T,x)         B   C

x and y are (red) nodes; A, B, and C are
subtrees. These O(1) rotations do not violate any
RB properties.
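Stripped of colors and parent pointers, the two rotations come down to a pair of pointer changes each. The sketch below uses hypothetical function names and returns the new subtree root instead of updating a tree object T.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def right_rotate(y):
    """Lift y's left child x above y; return x, the new subtree root."""
    x = y.left
    y.left = x.right      # subtree B moves from x over to y
    x.right = y
    return x


def left_rotate(x):
    """Inverse of right_rotate: lift x's right child y above x; return y."""
    y = x.right
    x.right = y.left      # subtree B moves from y back to x
    y.left = x
    return y


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


# Single-node stand-ins for the subtrees A, B, and C.
top = Node("y", Node("x", Node("A"), Node("B")), Node("C"))
print(inorder(top))            # ['A', 'x', 'B', 'y', 'C']
top = right_rotate(top)
print(top.key, inorder(top))   # x ['A', 'x', 'B', 'y', 'C']
top = left_rotate(top)
print(top.key, inorder(top))   # y ['A', 'x', 'B', 'y', 'C']
```

The in-order sequence is the same before and after, which is why a rotation never breaks the search tree ordering.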
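14. RB Insert: Solving Violations
[Figure: configurations of nodes x, y, and z, with subtrees A, B, C, and D, before and after the rotations that remove a red-red violation]
A, B, C, and D are subtrees with black roots.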
15. B-Trees
- Generalized trees (see CLR Ch. 19)
- B-tree nodes may have many thousands of keys,
compared to RB trees, which have one
- If a node has N keys in it, we have to make a big
(N+1)-way decision on which child to visit next
- A B-tree is a rooted tree with root R and
branching factor t such that
- Every node other than the root must have at least
t-1 keys (and thus at least t children, if the
node is not a leaf) and at most 2t-1 keys (thus
at most 2t children)
- If tree not empty, root must have at least one
key
- Every leaf must be at same depth
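The shape rules above translate directly into a checker. This is a sketch with hypothetical names (`BTreeNode`, `check_btree`); it checks key counts, child counts, and leaf depth only, not the ordering of keys.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                    # sorted list of keys
        self.children = children or []      # an empty list marks a leaf


def check_btree(root, t):
    """True if the tree obeys the B-tree shape rules for branching factor t."""
    leaf_depths = set()

    def visit(node, depth, is_root):
        n = len(node.keys)
        low = 1 if is_root else t - 1       # the root only needs one key
        if not (low <= n <= 2 * t - 1):
            return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        if len(node.children) != n + 1:     # n keys separate n + 1 children
            return False
        return all(visit(c, depth + 1, False) for c in node.children)

    return visit(root, 0, True) and len(leaf_depths) == 1


good = BTreeNode([10], [BTreeNode([3, 7]), BTreeNode([15])])
too_small = BTreeNode([10], [BTreeNode([3, 7]), BTreeNode([])])
uneven = BTreeNode([10, 20], [
    BTreeNode([5]),
    BTreeNode([15], [BTreeNode([12]), BTreeNode([17])]),
    BTreeNode([25]),
])
print(check_btree(good, 2))        # True
print(check_btree(too_small, 2))   # False: a non-root node with fewer than t-1 keys
print(check_btree(uneven, 2))      # False: leaves at different depths
```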
16. Why B-Trees?
- We fix t as constant for any particular run
- Height is O(log_t n)
- Branching decision at each node is O(t), thus it
is O(1) because t is a constant
- Usually use B-trees to maximize disk efficiency
(a disk seek is mechanical, and thus it takes a
long time)
- Can store one node per disk page, and thus it
takes at worst O(log_t n) disk accesses
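A search sketch that counts nodes visited makes the disk-access argument concrete: each node visited stands for one page read. The names `BTreeNode` and `btree_search` are assumptions for the example, and the linear scan inside a node is the O(t) branching decision.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                    # sorted list of keys
        self.children = children or []      # an empty list marks a leaf


def btree_search(node, key):
    """Return (found, nodes_visited); each node visited models one page read."""
    visited = 0
    while node is not None:
        visited += 1
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1                          # O(t) scan for the branching decision
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        if not node.children:
            return False, visited           # reached a leaf without finding key
        node = node.children[i]             # descend into the i-th child
    return False, visited


root = BTreeNode([10, 20], [
    BTreeNode([3, 7]),
    BTreeNode([12, 15]),
    BTreeNode([25, 30]),
])
print(btree_search(root, 15))   # (True, 2)
print(btree_search(root, 10))   # (True, 1)
print(btree_search(root, 8))    # (False, 2)
```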
17. What, no implementation details?
- B-trees are also very complex
- Insert a key into some sorted list within a node
- If a node gets too full (it would have 2t
keys, which is disallowed), on next insert it
is split into two nodes and the median key is
inserted into the parent node with pointers to
the new nodes
- Deleting a key from a node may require parents or
siblings to contribute keys to it if the node has
only t children (the minimum)
- Insertion, deletion take O(t log_t n) (each level
visited O(1) times, O(t) work done)
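The split step on its own is short. The sketch below, with hypothetical names, splits a full child (2t-1 keys) of a parent that is assumed to have room for one more key; it is not a complete insert routine.

```python
class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # an empty list marks a leaf


def split_child(parent, i, t):
    """Split parent's full child i (2t-1 keys) around its median key."""
    full = parent.children[i]
    assert len(full.keys) == 2 * t - 1, "only a full node is split"
    median = full.keys[t - 1]
    # The new right node takes the upper t-1 keys (and upper t children, if any).
    right = BTreeNode(full.keys[t:], full.children[t:])
    full.keys = full.keys[:t - 1]           # the lower t-1 keys stay put
    full.children = full.children[:t]
    parent.keys.insert(i, median)           # the median moves up into the parent
    parent.children.insert(i + 1, right)


parent = BTreeNode([10], [BTreeNode([3, 5, 7]), BTreeNode([15])])
split_child(parent, 0, t=2)
print(parent.keys)                          # [5, 10]
print([c.keys for c in parent.children])    # [[3], [7], [15]]
```

The work is O(t) for the slicing and list insertion, which matches the O(t) per level in the running time above.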
18. B-Trees: For More Examples
- See CLR p. 393 for a B-tree insert example
- See CLR pp. 396-397 for a B-tree delete example
- We will NOT cover examples of B-tree management
on homework, exams, or quizzes -- however, you
will be expected to judge when a B-tree or
RB-tree is an appropriate data structure to use
based on the time complexities of their operations