Title: BST Data Structure
1. BST Data Structure
- A BST node contains:
  - A key (used to search)
  - The data associated with that key
  - Pointers to its children and its parent
- Leaf nodes have NULL pointers for children.
- A BST contains:
  - A pointer to the root of the tree
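A minimal C++ sketch of the two structures described above. The names `Node` and `BST`, the `int` key, and the `std::string` data type are assumptions made for illustration; the slides do not fix them.

```cpp
#include <cassert>
#include <string>

// Hypothetical node layout: a key, its data, and three pointers.
struct Node {
    int key;                 // used to search
    std::string data;        // data associated with the key
    Node* left = nullptr;    // both children are null for a leaf
    Node* right = nullptr;
    Node* parent = nullptr;  // null for the root

    Node(int k, const std::string& d) : key(k), data(d) {}

    bool is_leaf() const { return left == nullptr && right == nullptr; }
};

// The tree itself only needs a pointer to the root.
struct BST {
    Node* root = nullptr;
    bool empty() const { return root == nullptr; }
};
```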
2. BST Operations: Insert
- The BST property must be maintained.
- Algorithm sketch, to insert data with key k:
  - Compare k to root.key.
  - If k < root.key, go left.
  - If k > root.key, go right.
  - Repeat until the child pointer you would follow is NULL. That's where the new node should be inserted, as a new leaf.
- Note: keep track of the prospective parent along the way.
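The algorithm sketch above could look roughly like this in C++. This is a sketch under assumptions: the node stores only an `int` key, the function name `insert` is illustrative, and a key that is already present is simply ignored.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    explicit Node(int k) : key(k) {}
};

// Inserts key k and returns the (possibly new) root.
// Assumption: a key that already exists is ignored.
Node* insert(Node* root, int k) {
    Node* parent = nullptr;   // prospective parent of the new node
    Node* cur = root;
    while (cur != nullptr) {
        if (k == cur->key) return root;   // already present: do nothing
        parent = cur;
        cur = (k < cur->key) ? cur->left : cur->right;
    }
    Node* fresh = new Node(k);
    fresh->parent = parent;
    if (parent == nullptr) return fresh;  // the tree was empty
    if (k < parent->key) parent->left = fresh;
    else                 parent->right = fresh;
    return root;
}
```

The `parent` variable is the "prospective parent" from the note: once `cur` becomes null there is no other way to know where to attach the new node.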
3. BST Operations: Insert
- Running time
  - The new node is inserted at a leaf position, so the running time depends on the height of the tree.
- Worst case
  - Inserting keys 1, 2, 3, ... in this order results in a tree that looks like a chain.
  - The tree has degenerated to a list.
  - The height is linear in the number of nodes.
  - Note also that such a tree is worse than a linked list, since it takes up more space (more pointers).
[Figure: the chain 1, 2, 3, where each node is the right child of the previous one]
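The worst case above can be checked with a short sketch. The helper names `height` and `build_sorted` are assumptions for illustration, and height is counted in edges (a single node has height 0).

```cpp
#include <algorithm>
#include <cassert>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Plain recursive insert; keys already present are ignored.
Node* insert(Node* root, int k) {
    if (root == nullptr) return new Node(k);
    if (k < root->key)      root->left = insert(root->left, k);
    else if (k > root->key) root->right = insert(root->right, k);
    return root;
}

// Height counted in edges: an empty tree is -1, a single node is 0.
int height(const Node* root) {
    if (root == nullptr) return -1;
    return 1 + std::max(height(root->left), height(root->right));
}

// Builds a tree by inserting 1, 2, ..., n in increasing order.
Node* build_sorted(int n) {
    Node* root = nullptr;
    for (int k = 1; k <= n; ++k) root = insert(root, k);
    return root;
}
```

With this convention, `height(build_sorted(n))` is n - 1: every node has only a right child, so each further insert walks the whole chain.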
4. BST Operations: Insert
- Running time
  - The new node is inserted at a leaf position, so the running time depends on the height of the tree.
- Best case
  - The top levels of the tree are filled up completely.
  - The height is then log n, where n is the number of nodes in the tree.
[Figure: a BST with root 12; second level 4 and 14; third level 2 and 8 under 4, and 16 under 14]
5. BST Operations: Insert
- The height of a complete (i.e. all levels filled up) BST with n nodes is logarithmic. Why?
  - Level i has $2^i$ nodes, for $i = 0$ (top level) through $h$ (height).
  - The total number of nodes, n, is then
    $n = 2^0 + 2^1 + \dots + 2^h = (2^{h+1} - 1)/(2 - 1) = 2^{h+1} - 1$
  - Solving for h gives us $h = \log_2(n + 1) - 1$, so $h \in \Theta(\log n)$.
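As a quick check of the geometric sum, take a complete tree of height 3 (a value chosen here purely for illustration):

```latex
n = 2^0 + 2^1 + 2^2 + 2^3 = 1 + 2 + 4 + 8 = 15 = 2^{3+1} - 1,
\qquad
h = \log_2(15 + 1) - 1 = 4 - 1 = 3.
```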
6. BST Operations: Insert
- Analysis conclusion: an insert operation consists of two parts.
  - Search for the position
    - best case (balanced tree): logarithmic
    - worst case (degenerate tree): linear
  - Physically insert the node
    - constant
7. BST Operations: Insert
- What if we allow duplicate keys?
  - Idea 1: Always insert in the right subtree.
    - Results in a very unbalanced tree.
  - Idea 2: Insert in alternate subtrees.
    - Makes it difficult to search for all occurrences.
  - Idea 3: All elements with the same key are inserted in a single node.
    - Good idea!
    - Easy to search, and it does not affect balance any more than non-duplicate insertion.
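One way Idea 3 might be realized is to give each node a list of items. This is only a sketch: the `items` vector, the `std::string` payload, and the helper `find_all` are assumptions, not something the slides prescribe.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node {
    int key;
    std::vector<std::string> items;  // every item inserted under this key
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k, const std::string& d) : key(k), items{d} {}
};

Node* insert(Node* root, int k, const std::string& d) {
    if (root == nullptr) return new Node(k, d);
    if (k < root->key)      root->left = insert(root->left, k, d);
    else if (k > root->key) root->right = insert(root->right, k, d);
    else                    root->items.push_back(d);  // duplicate key: reuse the node
    return root;
}

// Returns all items stored under key k, or nullptr if k is absent.
const std::vector<std::string>* find_all(const Node* root, int k) {
    while (root != nullptr && root->key != k)
        root = (k < root->key) ? root->left : root->right;
    return root ? &root->items : nullptr;
}
```

A single search finds every occurrence, and the shape of the tree is the same as if each key had been inserted once.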
8. BST Operations: Insert
- What if we allow a variable number of children (n-ary tree)?
  - Idea: Use a vector/list of pointers to children.
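A sketch of the vector-of-children idea, assuming `std::vector` as the container. The names `NaryNode`, `add_child`, and `count_nodes` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct NaryNode {
    int data;
    std::vector<NaryNode*> children;  // any number of children, kept in order
    explicit NaryNode(int d) : data(d) {}
};

// Attaches a new child to `parent` and returns it.
NaryNode* add_child(NaryNode* parent, int d) {
    assert(parent != nullptr);
    NaryNode* child = new NaryNode(d);
    parent->children.push_back(child);
    return child;
}

// Counts the nodes in the subtree rooted at `node`.
std::size_t count_nodes(const NaryNode* node) {
    if (node == nullptr) return 0;
    std::size_t total = 1;
    for (const NaryNode* c : node->children) total += count_nodes(c);
    return total;
}
```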
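9. BST Operations: Search
- Take advantage of the BST property.
- Algorithm sketch
  - Compare the target to the root.
  - If equal, return success.
  - If target < root, search left.
  - If target > root, search right.
  - If the subtree to search is empty, the target is not in the tree.
- Running time
  - Similar to insert: it depends on the height of the tree.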
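The search sketch translates almost line for line into an iterative C++ function. The node layout and the name `search` are assumptions for illustration.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the node holding `target`, or nullptr if it is absent.
const Node* search(const Node* root, int target) {
    while (root != nullptr) {
        if (target == root->key) return root;       // success
        root = (target < root->key) ? root->left    // search left
                                    : root->right;  // search right
    }
    return nullptr;  // ran off the tree: target is not present
}
```

The loop follows a single root-to-leaf path, so the number of comparisons is bounded by the height of the tree plus one.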
10. BST Operations: Delete
- The Delete operation consists of two parts.
  - Search for the node to be deleted
    - best case: constant (deleting the root)
    - worst case: linear
  - Delete the node
    - best case?
    - worst case?
11. BST Operations: Delete
- CASE 1
  - The node to be deleted is a leaf node.
  - Easy!
  - Physically remove the node.
  - Constant time
    - We are just resetting its parent's child pointer and deallocating memory.
12. BST Operations: Delete
- CASE 2
  - The node to be deleted has exactly one child.
  - Easy!
  - Physically remove the node.
  - Constant time
    - We are just resetting its parent's child pointer and its child's parent pointer, and deallocating memory.
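Cases 1 and 2 can share one routine, since a leaf is just a node whose single "child" is null. The sketch below assumes nodes with parent pointers; the name `remove_simple` is hypothetical.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    explicit Node(int k) : key(k) {}
};

// Removes `z`, which must have at most one child, and returns the new root.
Node* remove_simple(Node* root, Node* z) {
    assert(z != nullptr && (z->left == nullptr || z->right == nullptr));
    Node* child = (z->left != nullptr) ? z->left : z->right;  // null for a leaf
    if (child != nullptr) child->parent = z->parent;  // reset child's parent pointer
    if (z->parent == nullptr)      root = child;      // z was the root
    else if (z->parent->left == z) z->parent->left = child;
    else                           z->parent->right = child;
    delete z;                                         // deallocate memory
    return root;
}
```

Only a fixed number of pointers change, which is why both cases take constant time once the node has been found.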
13. BST Operations: Delete
- CASE 3
  - The node to be deleted has two children.
  - Not so easy
    - If we physically delete the node, we'll have to place its two children somewhere. This seems to require too much tree restructuring.
    - But we know it's easy to delete a node that has at most one child. What if we find such a node whose contents can be copied over without violating the BST property and then physically delete that node?
14. BST Operations: Delete
- CASE 3, continued
  - The node to be deleted, x, has two children.
  - Idea
    - Find x's immediate successor, y. It is guaranteed to have at most one child.
    - Copy y's contents over to x.
    - Physically delete y.
15. BST Operations: Delete
- Finding the immediate successor
  - We know that the node has two children. Due to the BST property, the immediate successor will be in the right subtree.
  - In particular, the immediate successor will be the smallest element in the right subtree.
  - The smallest element in a BST is always the leftmost node: start at the root and follow left pointers until a node has no left child. It need not be a leaf, since it may still have a right child.
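Finding the successor of a node with two children reduces to a "keep going left" loop in its right subtree. A sketch, with the hypothetical names `leftmost` and `successor_in_right_subtree`:

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Smallest node of a non-empty subtree: follow left pointers as far as possible.
Node* leftmost(Node* subroot) {
    assert(subroot != nullptr);
    while (subroot->left != nullptr) subroot = subroot->left;
    return subroot;
}

// Immediate successor of a node x that has a right child.
Node* successor_in_right_subtree(Node* x) {
    assert(x != nullptr && x->right != nullptr);
    return leftmost(x->right);
}
```

The node returned has no left child by construction, so it has at most one child (a right one), which is exactly what the deletion idea relies on.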
16. BST Operations: Delete
- Finding the immediate successor
  - Since it requires traveling down the tree from the current node toward the bottom, it may take up to linear time in the worst case.
  - In a balanced tree it takes at most logarithmic time.
  - The time to perform the copy and delete the successor is constant.
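Putting the three cases together, a recursive delete might look like the sketch below. It assumes nodes without parent pointers and copies only the key; the function name `erase_key` is hypothetical.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Removes key k from the subtree and returns the subtree's new root.
Node* erase_key(Node* root, int k) {
    if (root == nullptr) return nullptr;              // key not found
    if (k < root->key) {
        root->left = erase_key(root->left, k);
    } else if (k > root->key) {
        root->right = erase_key(root->right, k);
    } else if (root->left != nullptr && root->right != nullptr) {
        // Case 3: copy the successor's key, then delete the successor.
        Node* y = root->right;
        while (y->left != nullptr) y = y->left;       // leftmost node of right subtree
        root->key = y->key;
        root->right = erase_key(root->right, y->key); // y has at most one child
    } else {
        // Cases 1 and 2: at most one child, so splice the node out.
        Node* child = (root->left != nullptr) ? root->left : root->right;
        delete root;
        return child;
    }
    return root;
}
```

Returning the new subtree root from each call stands in for the parent's child pointer, which keeps the sketch short at the cost of walking down to the successor a second time.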
17. Binary Search Trees
- Traversing a tree: visiting its nodes
- Three major ways to traverse a binary tree
  - preorder
    - visit root
    - visit left subtree
    - visit right subtree
  - postorder
    - visit left subtree
    - visit right subtree
    - visit root
  - inorder
    - visit left subtree
    - visit root
    - visit right subtree
    - When applied to a BST, it visits the nodes in order from smaller to larger.
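The three traversals differ only in where the "visit root" step sits. In this sketch, visiting a node means appending its key to a vector; that choice and the function names are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Each function appends the keys it visits to `out`.
void preorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    out.push_back(n->key);       // visit root
    preorder(n->left, out);      // visit left subtree
    preorder(n->right, out);     // visit right subtree
}

void postorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    postorder(n->left, out);     // visit left subtree
    postorder(n->right, out);    // visit right subtree
    out.push_back(n->key);       // visit root
}

void inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inorder(n->left, out);       // visit left subtree
    out.push_back(n->key);       // visit root
    inorder(n->right, out);      // visit right subtree
}
```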
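18. Binary Search Trees
// Assumes <iostream>, using std::cout, and a Node with left, right, and data members.
void print_inorder(Node* subroot) {
    if (subroot != NULL) {
        print_inorder(subroot->left);    // smaller keys first
        cout << subroot->data;           // then the node itself
        print_inorder(subroot->right);   // then larger keys
    }
}
How long does this take? There is exactly one call to print_inorder() for each node of the tree, plus one call for each NULL child pointer. There are n nodes (and n + 1 NULL child pointers), so the running time of this operation is $\Theta(n)$.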
19. Binary Search Trees
- A tree may also be traversed one "level" at a time (top to bottom, left to right). This is usually called a level-order traversal.
- It requires the use of a temporary queue:
    enqueue root
    while (queue is not empty)
        get the front element, f
        print f
        enqueue f's children
        dequeue
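The queue-based pseudocode maps directly onto `std::queue`. In this sketch the keys are collected into a vector instead of being printed, and the name `level_order` is an assumption.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the keys in level order (top to bottom, left to right).
std::vector<int> level_order(const Node* root) {
    std::vector<int> out;
    if (root == nullptr) return out;
    std::queue<const Node*> q;
    q.push(root);                          // enqueue root
    while (!q.empty()) {
        const Node* f = q.front();         // get the front element
        q.pop();                           // dequeue
        out.push_back(f->key);             // "print" f
        if (f->left != nullptr)  q.push(f->left);   // enqueue f's children
        if (f->right != nullptr) q.push(f->right);
    }
    return out;
}
```

Each node is enqueued and dequeued exactly once, so the traversal is linear in the number of nodes.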
20. Binary Search Trees
        12
       /  \
      4    14
     / \     \
    2   8     16
       / \
      6   10
in-order: 2 - 4 - 6 - 8 - 10 - 12 - 14 - 16
pre-order: 12 - 4 - 2 - 8 - 6 - 10 - 14 - 16
post-order: 2 - 6 - 10 - 8 - 4 - 16 - 14 - 12
level-order: 12 - 4 - 14 - 2 - 8 - 16 - 6 - 10
21. Binary Search Trees
- Idea for a sorting algorithm
  - Given a sequence of integers, insert each one in a BST.
  - Perform an inorder traversal. The elements will be accessed in sorted order.
- Running time
  - In the worst case, the tree will degenerate to a list. Creation will take quadratic time and traversal will be linear. Total: $O(n^2)$
  - On average, the tree will be mostly balanced. Creation will take $O(n \log n)$ and traversal will again be linear. Total: $O(n \log n)$
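The sorting idea can be sketched in a few lines. The function name `tree_sort` is hypothetical, and sending equal keys to the right subtree is an assumption made here so that duplicates in the input survive.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Assumption: equal keys go to the right subtree so duplicates are kept.
Node* insert(Node* root, int k) {
    if (root == nullptr) return new Node(k);
    if (k < root->key) root->left = insert(root->left, k);
    else               root->right = insert(root->right, k);
    return root;
}

void inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inorder(n->left, out);
    out.push_back(n->key);
    inorder(n->right, out);
}

void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Builds a BST from the input, then reads it back in sorted order.
std::vector<int> tree_sort(const std::vector<int>& input) {
    Node* root = nullptr;
    for (int k : input) root = insert(root, k);   // creation
    std::vector<int> sorted;
    sorted.reserve(input.size());
    inorder(root, sorted);                        // linear traversal
    destroy(root);
    return sorted;
}
```

Feeding this function an already sorted sequence produces the degenerate chain and therefore the quadratic worst case.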
22. BSTs vs. Lists
- Time
  - In the worst case, all dictionary operations are linear.
  - On average, BSTs are expected to do better.
- Space
  - BSTs store an additional pointer per node.
- The BST seemed like a good idea, but in the worst case it doesn't offer much improvement over a list.
- We must find a way to keep the tree balanced and guarantee logarithmic height.
23. Balanced Trees
- There are several ways to define balance.
- Examples
  - Force the subtrees of each node to have almost equal heights.
  - Place upper and lower bounds on the heights of the subtrees of each node.
  - Force the subtrees of each node to have similar sizes (number of nodes).