Title: Chapter 6: Dynamic Sets
Chapter 6: Dynamic Sets and Searching
- We will study several different topics in this chapter; mostly, though, we concentrate on height-balanced trees
- First, we briefly look at array doubling and its consequences in terms of added complexity to such structures as stacks and lists
- Next, we examine three related forms of height-balanced trees: 2-3 trees, 2-3-4 trees and red-black trees (Note: 2-3 and 2-3-4 trees are not covered in this textbook, and the treatment of red-black trees will be somewhat different from the book as well!)
- We will next examine hashing as an improved form of search over log n based trees
- We will conclude by finding the best way to implement the Union-Find data structure (recall from chapter 2)
Array Doubling
- When creating a data structure, arrays often have the drawback that they are static in size from the point they are created, while linked lists are dynamic
- However, linked lists do not offer random access, and so the complexity of a single access can be far worse
- Can we get the best of both? Sort of.
- In languages like C and Java, an array can be created at run-time and the contents of another array can be copied into the new array
- Thus, an array can grow in size with a basic strategy
strategy - For example, see the Java code to the right
Assume a is an array with n as its
size arrayDouble( ) int temp new
int2 n for(int j0j
tempj aj a temp
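
To see how arrayDouble fits into a real structure, here is a minimal, hedged sketch of an array-based stack in Java; the class and method names (ArrayStack, push, pop) are illustrative choices, not from the slides:

  // Minimal sketch of an array-based stack that grows by doubling.
  public class ArrayStack {
      private int[] a = new int[4];   // initial capacity chosen arbitrarily
      private int n = 0;              // number of items currently stored

      private void arrayDouble() {
          int[] temp = new int[2 * a.length];
          for (int j = 0; j < n; j++)
              temp[j] = a[j];
          a = temp;
      }

      public void push(int x) {
          if (n == a.length)          // array is full: double it first
              arrayDouble();
          a[n++] = x;
      }

      public int pop() {
          return a[--n];              // no underflow check, for brevity
      }
  }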
When Will We Use Array Doubling?
- It should be obvious that array doubling is ∈ Θ(n)
- Consider a stack implemented using an array
- The Push and Pop operations are both Θ(1)
- However, if we have enough Push operations to fill the entire array, we must then perform array doubling
- Therefore, the worst case complexity of Push is ∈ Θ(n), leading us to believe that a linked list-based implementation would be better (since it will be ∈ Θ(1))
- But this is misleading
- We instead turn to the accounting practice of amortization
- In order to spread the cost of a purchase over many years, companies will often amortize costs
- For instance, if a purchase is K and is expected to last for 5 years, the actual cost is K for year 1 and 0 for years 2-5, but for the sake of consistency, we might expect to see an amortized cost of K/5 each year
- We use this approach to amortize the expense of the Push operation
Amortized Cost Analysis
- We find that most Push operations are Θ(1), with an occasional operation requiring Θ(n)
- Notice however that once the array has been doubled, the size of the array is now 2n
- So the next array doubling actually costs 2n instead of n
- A third array doubling will cost 4n
- A fourth will cost 8n, etc.
- So, even though an individual Push operation costs 1, occasionally one of them will cost n, 2n, 4n, etc.
- To determine the amortized cost, we will work things somewhat differently
- Assume that instead of a Push costing 1 unit, it really costs 1 + t units (t is some constant derived to offset the amortized cost)
- Thus, a sequence of n Push operations will really cost n + t*n
- The t*n is an amount that we have saved up by distributing the cost
- Now, our array doubling can be deducted from what we have saved up, resulting in a Push that costs no more than 1 + t, and so all Push operations are ∈ Θ(1) since t is a constant
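
To see why a constant t suffices, sum the true cost of n Push operations starting from a 1-element array (a back-of-the-envelope check consistent with the argument above, not a formula from the slides):

  total cost ≤ n (the basic pushes) + (1 + 2 + 4 + ... + n) (copying during doublings)
             < n + 2n = 3n

So the amortized cost per Push is under 3 units; choosing t = 2 is enough, and Push is ∈ Θ(1) amortized.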
Trees and Search
- Search is a very common activity for programs, and so we want to come up with an efficient data structure to accommodate search
- Sorted arrays have two drawbacks
- Sorting can be time consuming
- Arrays as a dynamic structure are inefficient because of array doubling, in spite of amortizing costs
- Trees are another approach, but trees can be difficult to search if the tree is not well balanced
- Building an efficient (balanced) tree is possible, but complicated
- Techniques include using some form of rotation (such as AVL rotation)
- However, another approach is to design a tree that will always be balanced. How?
2-3 Trees
- The first height-balanced tree we will examine is the 2-3 tree
- A 2-3 tree is always balanced: all leaf nodes are at the same level
- The 2-3 tree contains nodes that have 1 or 2 data items and 2 or 3 pointers to children (thus the name 2-3, for the number of pointers)
- A tree of all 2-nodes (1 datum, 2 pointers) is identical in appearance to a binary tree (except that this tree must always remain balanced)
- However, as data are added and deleted, nodes change from 2-nodes to 3-nodes and back to 2-nodes
- The trick is to figure out how to keep the tree balanced in the face of adding and deleting values (a possible node layout is sketched below)
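
Here is one possible Java layout for a 2-3 node; the field names (datum1, datum2, isThreeNode) are my own choices, not the book's:

  // One possible layout for a 2-3 tree node; names are illustrative.
  class Node23 {
      int datum1, datum2;          // datum2 is meaningful only in a 3-node
      boolean isThreeNode;         // false: 1 datum, 2 children; true: 2 data, 3 children
      Node23 left, middle, right;  // middle is used only by a 3-node
      Node23(int d) { datum1 = d; isThreeNode = false; }
  }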
2-3 Tree Example and Searching
- Below we have a 2-3 tree example
- Notice that nodes with 2 data have 3 pointers
- The relationship between nodes is similar to a binary tree
- If a datum is less than the current datum, it is down the left subtree
- If a datum is greater than the current datum, it is down the right subtree
- Here, however, we have a 3rd possibility: if the node is a 3-node, a datum might be greater than the 1st datum but less than the 2nd, in which case the datum is found down the middle subtree
- Searching the tree follows a fairly basic strategy (coded below)
- Compare the datum to the first datum in the node
- if equal, done
- else if less than the first datum, search the first ptr
- else if the node is a 2-node, then search the second ptr
- else if less than the second datum, search the second ptr
- else search the third ptr
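
The strategy above translates almost line for line into code. A hedged Java sketch using the Node23 layout introduced earlier (assuming datum1 < datum2 in a 3-node):

  // Returns true if key is present in the 2-3 tree rooted at t.
  static boolean search23(Node23 t, int key) {
      if (t == null) return false;
      if (key == t.datum1) return true;                   // equal to 1st datum: done
      if (key < t.datum1) return search23(t.left, key);   // less than 1st: left subtree
      if (!t.isThreeNode) return search23(t.right, key);  // 2-node: only one subtree left
      if (key == t.datum2) return true;                   // equal to 2nd datum: done
      if (key < t.datum2) return search23(t.middle, key); // between the data: middle subtree
      return search23(t.right, key);                      // greater than both: right subtree
  }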
Adding a Datum
- As with any tree, we add a new datum at the leaf level
- But, we must not extend the level further, or else the tree will no longer be height balanced
- So, how do we add a datum?
- If the leaf node is a 2-node, easy: just make it a 3-node and add the datum there
- For instance, if we want to add 9, we make the 2-node storing 10 into a 3-node and now it stores 9 and 10
- What if the node is a 3-node?
- If this is the case, then the node already has 2 data and we want to store a 3rd, so we split the node: the 3 values become 2 2-nodes, the middle value moves up to the parent, and the parent node's pointers are adjusted appropriately
- What if the parent was also a 3-node? We split it and pass the middle value up to its parent
Example
- Consider adding 3
- 3 will go into the 3-node with 1 and 2, but since it is already a 3-node, we split it up
- We create 2 2-nodes, one with 1, one with 3, and the value 2 goes up to its parent and creates a 3-node with 5, giving us the tree on the right
- Now consider adding 68, what will happen?
- The 3-node with 61 and 69 must split, pushing the middle value (68) up one level
- This in turn causes the 3-node with 55 and 70 to split, pushing the middle value (68 again) up another level
- Finally, the root must split
(Figure: the resulting tree after 68 is added. Note: new levels are formed by extending upward, by adding a new root node)
Adding: More Detail
- The add algorithm begins like a binary tree add algorithm: find the leaf node where the new value should be inserted
- Once the position is found, if the current node is a 2-node, merely add the datum, rearranging the node if necessary (i.e., if the new datum is smaller than the datum already there)
- Otherwise, perform a split
- Take the current 3-node and make it into 2 2-nodes, moving the middle of the 3 values up to the parent's node
- Rearrange the parent's node to accommodate the new value and rearrange the node's pointers appropriately
- There are numerous cases; we will examine this pictorially next
Rearranging Pointers
(Figure captions) The new value is between a and b, so it is moved up to join c. The node with a and b is split and the parent becomes a 3-node, with a third pointer now pointing at the new node with b. The new value is between c and d, so it is moved up to join b. The node with c and d is split and the parent becomes a 3-node, with a third pointer now pointing at the new node with c.
If the parent were already a 3-node, then it would be split and the middle value passed up, thus repeating this whole process at another level.
Deleting a Node?
- The main reason that we will go on to study red-black trees is that deleting a node from a 2-3 tree is difficult
- There are many possibilities
- Recall that the tree must always be balanced, whereby all leaf nodes are at the same level
- When deleting a node in an ordinary binary tree, we do not delete the node itself, but instead copy the first value greater than that node into the node-to-be-deleted, and then delete the node of the first value greater, since that node is guaranteed to be a leaf; thus we only delete leaf nodes
- Can we do something similar for the 2-3 tree?
- Yes, but we cannot delete the leaf node itself as it might unbalance our tree
- We will visit a few examples next, but we will not try to solve this problem
2-3 Tree Deletion Examples
Deleting 70 can be accomplished by moving 69 into its place, but what if all 3 children of 55-70 were 2-nodes?
Deleting 16 removes a leaf node; we can repair it by rotating the 1st value of its 3-node sibling around, but what if its sibling was a 2-node?
Deleting 36 can be done by moving 44 into its place, and then deleting 44 by rotating 55 down and 61 up, but what if we delete 36 from the above tree where 70 has already been deleted?
When necessary, deletions will require collapsing the tree down 1 level, moving values into lower-level 2-nodes, making them into 3-nodes
2-3 Tree Analysis
- The 2-3 tree will have a height of log n or less
- Why? Since it is a balanced tree, it can be no worse than log₂ n; it might be less if there are any 3-nodes
- In the best case (all 3-nodes, so about n/2 nodes), the height will be about log₃(n/2) = log₂(n/2) / log₂ 3, which is a constant times log₂ n, so the height is ∈ Θ(log n) either way
- A search requires examining 1 or 2 data per level, so the complexity of search ∈ Θ(c · log n) = Θ(log n)
- Adding requires first searching for the proper position and then adding the node and possibly performing a split on that node
- A split is a constant number of instructions, where the specific number depends on the number of data in the parent node
- However, a split may cause a split further up the tree, etc.
- So in the worst case, adding requires log n comparisons followed by log n splits, which is ∈ Θ(log n)
- Deleting will be similar, but we won't analyze it since we didn't look at the deletion algorithm
2-3-4 Trees
- These trees are much like 2-3 trees; we only extend the idea so that a node can store 1 datum, 2 data, or 3 data, and have 2, 3 or 4 pointers
- Why should we extend 2-3 trees into 2-3-4 trees?
- There is no great reason to make 2-3-4 trees except that 2-3-4 trees can be represented as binary trees, which we will call red-black trees
- But first, we will examine 2-3-4 trees
- As with 2-3 trees, 2-3-4 trees must remain height balanced
- Searching is similar, except that now we have the possibility of searching down the first, second, third or fourth pointer
- Adding, however, will be handled somewhat differently
Adding to a 2-3-4 Tree
- When we added to a 2-3 tree, we always added at a leaf and split the node if necessary (possibly also causing parent nodes to be split)
- Here, when searching for the leaf node to insert the new value, we will split any 4-nodes that we come across
- The split will not require that we also split a parent node because, at most, the parent will become a 4-node, and we only split 4-nodes on the way down the tree
- So, starting at the root: if it's a 4-node, split it, otherwise search down the tree, continuing this process for each node until we reach a leaf
- Since the leaf will not be a 4-node (it would have already been split), adding to the node is simple
Splitting a Node
- There are 6 possibilities
- Node to split is the root node
- Node to split is the 1st child of a 2-node
- Node to split is the 2nd child of a 2-node
- Node to split is the 1st child of a 3-node
- Node to split is the 2nd child of a 3-node
- Node to split is the 3rd child of a 3-node
Case 2: Split the 4-node, sending the middle value up (making the parent a 3-node), and adding a new child consisting of the current node's largest value
Case 1: Split the node into 3, distribute the values, one per node, and reattach the subtrees (1-4) as shown
Splitting a Node, continued
Case 3: Move the middle value up and redistribute the other two values into 2 2-nodes. Case 4: Move the middle value up, creating a new 4-node, but that node is not split until it is visited in the next add. Case 5: Same as 4 with different pointers and values moved. Case 6 is omitted for space, but is a mirror image of Case 4.
Complexity of 2-3-4 Trees
- The 2-3-4 tree is always balanced, so the height ranges between log₂ n and roughly log₄(n/3) = log₄ n − log₄ 3
- Search requires at most 3 comparisons per node
- So search is at most 3 log n ∈ Θ(log n)
- Adding requires searching and splitting combined
- A split takes a constant number of operations
- So even if there are multiple splits, adding is ∈ Θ(log n)
- Deleting would similarly be ∈ Θ(log n)
- Unfortunately, as with 2-3 trees, deleting is complex and we will not cover it here
- There is one thing that makes 2-3-4 trees more appealing than 2-3 trees
- the 2-3-4 tree itself can be represented using a binary tree
- this is known as a red-black tree
Binary Tree Implementation
- The binary tree implementation adds one new field called black
- This field denotes whether the node is a true child of its parent, or if it is in fact a co-resident in a 3-node or 4-node (that is, is the parent its real parent?)
- If true, the node is a true child
- the root node is always a black node, and a 2-3-4 child is always a black node
- If false, the node is part of a 3-node or 4-node
- 2-nodes and 4-nodes are easy to represent in this way; 3-nodes are a little bit more awkward (a node sketch follows)
(Figure: a 4-node represented as 3 2-nodes)
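
In code, the new field is just a boolean on an ordinary binary tree node; a minimal sketch (names are illustrative):

  // Binary-tree node for the red-black representation of a 2-3-4 tree.
  class RBNode {
      int datum;
      boolean black;      // true: a real 2-3-4 child; false (red): co-resident in a 3- or 4-node
      RBNode left, right;
      RBNode(int d, boolean isBlack) { datum = d; black = isBlack; }
  }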
3-Node Implementations
- The only concern with the 2-3-4 tree implemented as a binary tree is what to do with a 3-node
- The 3-node has two data, say a and b
- Should b be the root of the binary subtree, or should a be the root? This decision affects where the pointers are placed and the shape of the tree
- As long as we are consistent, there will be no problem
- For instance, we will arbitrarily choose to use the larger value as the root of the subtree, so this matches the figure below on the right
Note: For convenience, in future red-black trees, red nodes will be denoted by dashed lines pointing to them instead of as the boolean false; black nodes will be denoted by solid lines
Red-Black Tree Implementation
- Now that we have described how the 2-3-4 tree can be implemented as a binary tree,
- We examine how to implement the search, add and delete operations in the binary tree implementation
- Search: same as any binary tree
- A 2-node is identical to any binary tree node
- A 4-node differs from the binary tree implementation only in that the 1st and 3rd data are distributed to separate nodes in a subtree
- But the relationship between the 1st, 2nd and 3rd data in the subtree is identical to their placement in a binary tree
- A 3-node differs from the binary tree but, like the 4-node, retains the proper placement of the 2 nodes
- Therefore, we ignore whether a node is black or red
- We use the ordinary binary tree traversal to search for a value
Red-Black Tree Add
- The book offers a mechanism for adding nodes to the red-black tree
- We will look at two approaches, both of which are easier
- The first method mimics the process of the 2-3-4 tree
- Search for the proper place to insert the new value while splitting any 4-nodes on the way down the tree
- Add the node at the leaf level, as with any binary tree add
- any node added will be to a 2-node, creating a 3-node, or to a 3-node, creating a 4-node
- the added node, since it is now part of a 3-node or 4-node, will be a red node
- We examine all 6 split cases from the 2-3-4 tree and see how they will be performed on a red-black tree
- Note: the other method we will look at is actually easier, so we will concentrate on it in more detail than these 6 splits
Splitting in the Red-Black Tree
The root node is simply split from a 4-node into 3 2-nodes: a and c were red nodes, now they are black nodes. When the 1st child of a 2-node is split, b is joined with d, becoming a red node, while a and c are placed into their own 2-nodes, becoming black nodes. When the 2nd child of a 2-node is split, c is joined with a, becoming a red node, while b and d are placed into their own 2-nodes, becoming black nodes.
Splitting Red-Black Trees, cont.
The various splits for a child of a 3-node, whether the 1st, 2nd or 3rd child, are shown here. Notice that in two of these cases we must physically rotate nodes instead of just changing red or black information.
Method 2 for Adding
- Adding boils down to one of three basic situations, as described below
- Case 1: the added node is the root; it is a black node, and we are done
- In any other case, the added node is made a red node, at least at first
- Case 2: the added node is a child of a black node; the added node is then a red node and we are done (we are adding to a 2-node or 3-node, no need to make any changes)
- Case 3: the added node is a child of a red node
- Now we have to be cautious, because we have a red child of a red node, which means an imbalanced tree
- There are two possible subcases here
- The added node has a parent whose sibling is black (or the parent has no sibling): we will call this case 3a
- The added node has a parent whose sibling is also red: we will call this case 3b
Handling Case 3a
- This case represents a situation in which the new node's parent and its parent made up a 3-node (since the parent is red)
- Since the newly added value is also red, we want to create a 4-node of these three nodes
- But we must make sure the 3 nodes are in the correct order
- We have to rotate the values in order to make it correct while also maintaining the proper pointers
- Examples are shown to the right
- When done, the new root is black while the two children of the root are red
- Thus, we have maintained a 4-node
In all cases, the rebalancing moves the middle value to the root and makes both children red nodes
Handling Case 3b
- In this case, not only is the newly added node's parent red, but so is that parent's sibling
- The parent, grandparent, and parent's sibling make up a 4-node
- With the new value added to this node, it overflows the 4-node and so a split must occur
- To split this node, one value goes up to join the parent 2-3-4 tree node
- The node moved up will be the parent's parent, which will join with the parent's parent's parent (creating a 3-node, 4-node, or possibly adding to a 4-node), and thus the parent's parent node is changed to red
- The parent and parent's sibling are split up into their own nodes and are colored black
- The newly added node stays with the parent and is thus red
- notice that if the parent's parent was part of a 4-node, adding a new value there requires a split to occur
- so in this one case, we must now go up to the parent's parent's parent node and check to see if it is a red node and, if so, handle case 3a or 3b again (so 3b may occur at every level up the tree if necessary)
Red-Black Tree Deletion
- The deletion starts just like the binary tree deletion
- Find the node to delete
- if it is a leaf node, delete it; otherwise
- find the node's successor (smallest value greater than the node to be deleted)
- copy this successor value into the node to be deleted
- delete the successor node, which by definition is a leaf
- ensure the tree is properly balanced by recoloring nodes and/or rotating nodes as necessary
- Like the red-black tree addition, deletion has several possibilities; we explore each of these next
- In each case, assume the node to be deleted is v
- v's parent is x
- v is a left child, since if v were x's right child it would not be the node being deleted, OR v is the only node in the subtree under x
- v may have a right child, which we will call r (if it exists)
- r will be moved into the place of v, becoming a child of x
- x may have another child, which we will call y (if it exists)
(Figure: x with children v and y; v has child r. Dotted lines here denote optional nodes)
Deletion Case 1
- If y is black and has a red child z
- We must now rotate the nodes x, y, and z
- Recall that x is the parent of the node to be deleted, whereas y and z are a child and a grandchild of x
- Rotate x, y and z so that the middle value of the three becomes the root and the other two nodes are distributed appropriately
- Also make sure that r, another child of x, is attached appropriately
- Assign the following colors: the new root takes on the color that x had formerly, while the two children are black, and r is made or kept black
Deletion Case 2
- If y is black and both children of y are black
- NOTE: if y has null pointers, they are considered to be black
- Here, we have 1, 2 or 3 2-nodes, and what we want is to combine them into a 3-node or 4-node
- This is done by recoloring these nodes
- Color r black, y red, and if x is red, color it black
- That is, the parent becomes the root of a larger node, with y as a red node within that larger node
- r is kept as a separate node
- Note that in doing this, since x may have shifted from red to black, we may have separated the parent from its 2-3-4 tree node
- If so, we must now move up to the parent of x and see if the change of colors to x has affected the parent
- If so, we have to check Cases 1, 2 and 3 again
- If Case 2 applies again, we must again check to see if one of the 3 cases applies to the parent
- In the worst case, Case 2 continues to apply all the way up the tree!
Deletion Case 3
- If y is red
- We must perform a rotation on x, y and z (similar to case 1)
- In this case, y is the middle value between x and z
- Make y the parent, with x and z being children of y
- Also, r must be moved appropriately
- Make y black, x red; r remains black
- Case 1 or Case 2 may now apply to y and its parent, so we must move up to y and check again
- If case 2 does apply, it will not propagate any further up the tree (unlike case 2 applying by itself), and so we can stop after fixing y's parent (if it is necessary to do so)
Example: Adding to a Red-Black Tree
Start with a tree that has a single value, 4 (a black node). Add 7, a red node. Add 12: case 3a, requires rotating; after rotation, both children remain red. Add 15: case 3b, requires recoloring; since 7 is the root, changing 12's color does not affect 7, so case 3b stops. Add 3: since 4 is now a black node, there is no effect on 4.
Example Continued
Adding 5 also does not affect 4. But adding 14 is an example of case 3a; the figure shows the tree after rotating 12-14-15. Adding 18 is case 3b: recolor 12, 14 and 15. Note that recoloring 14 does not affect 7 (since it is the root), otherwise we might have had to continue recoloring up the tree. Add 16: case 3a will require rotation.
Example Concluded
After rotating 15-16-18, 16 is black while 15 and 18 are red. Add 17: case 3b, we have to recolor 15, 16 and 18. But now notice that 14 and 16 are both red, so we must fix this by rotation. Rotate 7, 14, 16 so that 14 is the new root and 7 and 16 are its children. Notice that 12 had to be shifted to be a child of 7. 7-14-16 are a 4-node, so 7 and 16 are red.
Deletion Example
Starting from our previous tree, let's delete 3. Since 3 is a leaf and there is no node to move into its place, we are done after removing 3. Now, let's remove 12. While 12 is also a leaf, removing it leaves the tree unbalanced, since 7 and 12 were both black. This is Case 1 and is handled by rotation of 4-5-7. Delete 17 just by removing it (same as with deleting 3). Deleting 18 causes an imbalance, handled by case 2, recoloring 15 and 16. But we don't have to recolor 14, since it is the root.
Red-Black Tree Complexity
- Unlike a 2-3 or 2-3-4 tree,
- The red-black tree is not necessarily height-balanced
- However, we can guarantee, because of the add and delete algorithms, that the tree's height will be within a constant factor of log n. Why?
- So, search will be ∈ Θ(c · log n) = Θ(log n)
- Add requires possibly performing a split (rotate and/or recolor) operation, which is ∈ Θ(1)
- How many of these might occur?
- At most one per black level, and there are log n black levels, so add is ∈ Θ(log n)
- Delete is the same: searching for the item to delete, shifting the successor into that position, deleting the node that contained the successor, and possibly rotating/recoloring
- Rotating/recoloring is ∈ Θ(1)
- there will be no more than log n of these, so delete is ∈ Θ(log n)
Decisions, Decisions, Decisions
- You probably have learned, from 364, how to height balance a binary tree through some form of rotation
- The rotations are complex, but yield a height-balanced tree so that search, add and delete are all ∈ Θ(log n)
- So, which tree should we use?
- Binary with height balancing
- 2-3 trees
- 2-3-4 trees
- Red-black trees
- In all cases, adding is a challenge, but deleting is even harder
- The 2-3 and 2-3-4 trees may be wasteful of space
- each node in a 2-3 tree has space for 2 data even if only 1 is used, and 3 pointers even if only 2 are used
- each node in a 2-3-4 tree has space for 3 data and 4 pointers
- The red-black tree is more space efficient and somewhat simpler to implement than a binary tree with rotation
Another Idea: Hashing
- Hashing provides for storage that permits Θ(1) add, delete and search routines in many cases (but in the worst case is Θ(n))
- Is hashing then better, worse or about the same as one of the balanced trees?
- We will now explore hashing to see how we can get Θ(1) behavior
- Since you probably already covered hashing in 364, some of this discussion will be review and not covered in detail
Hashing
- The basic idea behind hashing is that you have an array to store your values and a function which maps each value onto a storage location
- H(x) = i means that x is stored in the array at location i
- There are many ways to perform this mapping, but they tend to mostly revolve around using mod, as in
- H(x) = x mod max
- where max is the size of the array (coded below)
- Since the mod operation is Θ(1), searching for an item, storing a new item into the array, or finding and deleting an item in the array should be Θ(1)
- This is not the case, however, because of collisions
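
In Java the mapping is a single expression; a minimal sketch (the parameter max is the table size):

  // Hash a non-negative integer key into a table of size max.
  static int h(int x, int max) {
      return x % max;    // for possibly-negative keys, Math.floorMod(x, max) is safer
  }

With max = 11 this gives h(39) = h(17) = h(28) = 6, which is exactly the collision scenario examined next.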
Collisions
- Imagine a hash table (array) of size 11 and items to store there of 39, 17, and 28 (added in that order)
- Unfortunately, all three hash into the same location, 6
- So, what happens if we are searching for 17? We look at the position where it is supposed to be, but we find 39
- We must find a way to handle these collisions
- There are numerous approaches to handling collisions, but they all cause the hashing operations of searching, adding and deleting to degenerate from Θ(1) to Θ(n) in the worst case
- Collision handling:
- Closed address (or chained) hashing
- Linear probing
- Quadratic probing
- Rehashing
Chained Hashing
- Rather than implementing an array of data, implement an array of pointers
- Now, each array entry is not a datum, but instead a pointer to a linked list of data
- If we have a collision, then the collided datum can be added at that location by extending the linked list (a sketch of this scheme follows below)
- Search: hash into the location and then follow the linked list until the datum is found, or the list ends
- Add: hash into the location and insert the datum at the front or rear of the linked list, or do an ordered insert
- Delete: hash into the location and search for the datum; if found, remove it from the linked list, otherwise report failure
- If we have a good distribution of items, our linked lists should rarely have more than 1 item, making search, add and delete Θ(1)
- However, if we have a poor distribution, we could wind up with all n data colliding, giving us a Θ(n) list, and thus all three operations will be Θ(n)
- What about on average?
Average Case for Chained Hashing
- The load factor λ = n / h, where h is the size of the array
- Assume that a given item is in a linked list of size Lᵢ
- Then, to search for that item, on average, it takes (Lᵢ + 1) / 2 comparisons
- Averaging over all n items, a successful search takes roughly 1 + λ/2 comparisons
- If the table size h is close to n, then λ is about 1, and the average search is Θ(1)
- whereas if n grows far beyond h, λ = n/h grows in proportion to n, and the average search degenerates toward Θ(n)
- So, in part, our performance depends on the size of the hash table (with respect to the expected number of entries)
Open Address Hashing
- A different approach on a collision is to perform rehashing
- Rehashing may either use the same function applied to the current location, or some other function
- For instance, consider applying the function rehash(j) = (j + 1) mod h to the collided location j
- Then, rehashing will compute (h(key) + 1) mod h
- This is the next array location
- If there is another collision, just apply the same function again
- This is known as linear probing
- In effect, we are trying the next array location iteratively until we have found the item we are looking for (for a search or delete) or we have found an opening (for an add or a failed search)
Implementing Linear Probing
- Consider a table of size 11
- We want to insert 13, 26, 3, 24
- 13 goes into position 2
- 26 goes into position 4
- 3 goes into position 3
- 24 should go into 2, but there is a collision, so ultimately 24 goes into position 5
- Now, we want to find 24
- It should be in 2, but is not; do we stop?
- No, we apply linear probing
- For how long?
- If we apply linear probing and find 3 in position 3, we might be tempted to stop
- 3 is in its correct position, and so we did not find 24
- This would be a mistake
- we need to continue to search for 24 until we find it or we have reached a gap in the table
Linear Probing Algorithm
- To add and to search, applying linear probing is as you would expect
- Use the hashing function as normal, but if there is a collision, just increment the position until you find the first empty location or until you find the item you are searching for
- If you have reached an empty location and have not found what you are looking for, the item is not in the table

To add:
  location = h(key)
  while (table[location] is not empty)
      location = (location + 1) mod size
  table[location] = key

To search:
  location = h(key)
  while (table[location] is not empty and table[location] != key)
      location = (location + 1) mod size
  if (table[location] == key) return key
  else return -1    // item not found
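
A runnable Java version of the pseudocode above; null marks an empty slot, and the sketch assumes the table never becomes completely full (otherwise the loops would not terminate):

  static Integer[] table = new Integer[11];   // null = empty slot

  static int h(int key) { return Math.floorMod(key, table.length); }

  static void add(int key) {
      int loc = h(key);
      while (table[loc] != null)              // probe until an empty slot
          loc = (loc + 1) % table.length;
      table[loc] = key;
  }

  static int search(int key) {
      int loc = h(key);
      while (table[loc] != null && table[loc] != key)
          loc = (loc + 1) % table.length;
      return (table[loc] == null) ? -1 : loc; // -1: hit a gap, item not present
  }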
Deleting a Value
- Now consider deleting a value from the table
- Searching for the value is the same as before
- But what happens once you find and delete the item? If other items collided with this one, they would appear later in the table
- And yet our search routine searches until we find the item or find a gap
- So, deleting opens a gap that may cause us to not find an item
- From our previous example, if we delete 3 and then search for 24,
- we would reach a gap before finding 24 and stop, even though 24 is in the table
- Solution: deletion should remove the item but place a special note in the location that an item had been here (sketched below)
- Consider for instance using the value -9999
- Now, our search procedure is modified to continue searching until the item is found or a true gap is found
- Add, however, can place an item either in a true gap or in a position storing -9999
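
A sketch of deletion with a marker value, reusing table and h from the linear-probing sketch above (the sentinel constant is my own stand-in for the slides' -9999):

  static final Integer TOMBSTONE = Integer.MIN_VALUE;  // "an item was here" marker

  static void delete(int key) {
      int loc = h(key);
      while (table[loc] != null) {                     // only a true gap ends the search
          if (!TOMBSTONE.equals(table[loc]) && table[loc] == key) {
              table[loc] = TOMBSTONE;                  // leave a marker, not a gap
              return;
          }
          loc = (loc + 1) % table.length;
      }
  }

Search must likewise skip over TOMBSTONE entries rather than stopping at them, while add may overwrite them.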
Linear Probing Has Another Problem
- While linear probing tends to be more memory efficient than chained hashing and can potentially have a better average case complexity,
- It has a problem known as clustering
- A cluster occurs whenever there are collisions, so that linear probing places the collided item in the next available location, creating a cluster of at least 2 nodes
- The problem is that another collision extends the cluster
- Even if a value doesn't collide with the original item, it may collide with part of the cluster, which continues to extend the cluster
- There are two approaches to getting around this clustering problem: quadratic probing and rehashing
Quadratic Probing
- Rather than using the rehashing formula
- (h(j) + 1) mod h, we use instead
- (h(j) + i²) mod h, where i is the iteration attempt for rehashing
- For instance, imagine trying to add 29 in a table of size 11; at first, we try to hash into 7
- Using linear probing, if there is a collision, we would then try 8, 9, 10, 0, 1, 2, ...
- Using quadratic probing, if there is a collision, we would then try 8, 0, 5, 1, 10, ... (a small sketch follows)
- Clusters are not generated as quickly or easily as with linear probing
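
As a sketch, only the probe formula changes relative to linear probing (i counts the collision retries):

  // Quadratic probing: the i-th retry inspects (h(key) + i*i) mod size.
  static int quadraticProbe(int key, int i) {
      return (h(key) + i * i) % table.length;
  }

For key 29 in a table of size 11, h(29) = 7, and successive retries (i = 1, 2, 3, ...) probe 8, 0, 5, 1, 10, matching the sequence above.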
Rehashing
- Rehashing is a more general term for what we should do when there is a collision
- We could apply probing as we saw, whether linear or quadratic
- Or, we could have a different hashing formula that is applied in the case of a collision
- Let H(key) be the hashing function
- If H(key) causes a collision, use H2(key)
- For this to work properly, H and H2 should always map to different sections of the hashing table, so that applying the second function is not a waste of effort
- What if the second function also causes a collision? We could use a third function, etc.
- However, keeping track of which function to use may be too much effort, and so the more common forms of collision handling are either chained hashing, linear probing or quadratic probing
Hashing Functions
- It is common to use mod as the basis for our hashing functions
- What size table should we use?
- If we anticipate n keys, our table should have a size m > n to avoid collisions
- However, this may be wasteful of space
- We might also select m as a prime number, which tends to reduce collisions
- Consider using 100 as a table size instead of 101: we would find more collisions with 100 because of the common factors of 2, 4, 5 and 10 in many numbers
- What if our keys are not numeric?
- We might translate strings into their equivalent ascii values
- And then what? Take the average ascii value? This would cause collisions, why?
- We could weight the ascii values based on their position (sketched below), but then computing the hashing function takes more time
- Also, what if all keys have a common element, such as CSC 260 and CSC 364 both starting with CSC? We could remove the redundant elements if all keys share those values (CSC in this case)
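
One standard way to weight character values by position is Horner's rule; a hedged sketch (the multiplier 31 is a common convention, my choice rather than the slides'):

  // Position-weighted string hash: anagrams such as "abc" and "cba" no longer collide.
  static int h(String key, int max) {
      int sum = 0;
      for (int i = 0; i < key.length(); i++)
          sum = (sum * 31 + key.charAt(i)) % max;  // weight each char by its position
      return sum;
  }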
Union-Find ADT
- Recall back in Chapter 2, we briefly introduced the union-find ADT
- It stores a collection of sets and has two operations
- Union takes two sets and makes them into a single set
- Find determines if two items are in the same set or not
- This ADT has few applications, but there are two that we will examine
- one in this chapter briefly (implementing EQUIVALENCE in FORTRAN) and one in chapter 8, which will allow us to implement an efficient graph application
- To start with, let's look at an example
- We start with 5 sets: {1}, {2}, {3}, {4}, {5}
- Are 2 and 4 in the same set? No
- Are 3 and 5 in the same set? No
- Union 3 and 5 yields {1}, {2}, {3, 5}, {4}
- Union 2 and 5 yields {1}, {2, 3, 5}, {4}
- Union 1 and 4 yields {1, 4}, {2, 3, 5}
- Are 2 and 3 in the same set? Yes
- Are 2 and 4 in the same set? No
Set Implementations
- You may or may not have covered set implementations in a previous course
- Briefly, there are two implementations, as with most ADTs
- Array based: s[i] represents element i in the universe of which s is a subset, and s[i] is true if i is in set s, false otherwise
- This implementation has the advantage of Θ(n) intersection and union operations and Θ(1) insert, delete and is-an-element-of operations
- But this implementation requires that the universe be finite in size and known in advance, and can be wasteful of space if the universe has a lot of elements but sets have few
- Linked-list based: s points to a linked list of the elements that are in s
- This implementation has a Θ(n) worst case for is-an-element-of and delete (and also insert if items are stored in order), and as bad as Θ(n²) for intersection and union
- The advantage of the linked-list based implementation is that the universe can be infinite in size (although a set cannot)
- Additionally, it is much more memory efficient
Union-Find Implementation 1
- So, how will we implement the Union-Find?
- Recall that we do not need such operations as intersection, add, delete
- Union is not quite the same as an ordinary set union, because it removes one of the two sets from existence by moving the elements into the other set
- Find is not quite the same as is-an-element-of, because find determines if two items are in the same set
- Let's try an array-based implementation (sketched below)
- Each element of a union-find stores the number of the set it is currently in
- Initially, element i is placed in set i
- For instance, our previous example starts as [1, 2, 3, 4, 5]
- element 1 is in array location 1, etc.
- And ends with [1, 2, 2, 1, 2]
- elements 1 and 4 are in the same set, and elements 2, 3 and 5 are in the same set
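
A minimal Java sketch of this array-based implementation (elements are numbered 1..n; names are illustrative):

  // Array-based union-find: set[i] holds the name of element i's current set.
  static int[] set;

  static void init(int n) {
      set = new int[n + 1];                 // index 0 unused
      for (int i = 1; i <= n; i++)
          set[i] = i;                       // element i starts in set i
  }

  static boolean find(int i, int j) {       // Θ(1)
      return set[i] == set[j];
  }

  static void union(int i, int j) {         // Θ(n): relabel every member of j's set
      int from = set[j], to = set[i];
      for (int k = 1; k < set.length; k++)
          if (set[k] == from) set[k] = to;
  }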
Analysis
- The find operation is Θ(1), as it merely determines if set[i] == set[j]; that is, is the set that i is currently in equal to the set that j is currently in?
- However, union is a more difficult operation
- Consider union(1, 2) from our previous example
- This requires finding all elements in set 2 and changing their set to be 1
- If there is 1 item in set 1 and n − 1 items in set 2, this requires Θ(n) operations
- Can we improve? Yes, but we need a different implementation mechanism
Union-Find Implementation 2
- Let's try a linked list implementation
- Assume that we have an array of n pointers
- Initially, each pointer points to one of the n single-item sets
- Thus, we have n linked lists of size 1
- To perform a union, simply manipulate a pointer as follows
- Union(1, 2) will take the last pointer of list 1 and point it at list 2, changing the pointer in array location 2 to nil
- Notice that to make this work efficiently, the array of pointers should actually have both front and rear pointers to the linked list
- Now, union is Θ(1)
- What about find?
- Unfortunately, to find if two elements are in the same set requires searching through the first item's linked list to see if the second item is in it, or, if both items are stored in linked lists of other entries, it might take as much as searching all linked lists
- So, find is Θ(n)
A Better Implementation
- Recall in-trees from chapter 2
- The in-tree was a tree where each node pointed to its parent
- Parents did not point to children
- Let's use the in-tree to implement our sets
- also use an array where each array element points to a node in an in-tree
- node i is found by following array[i] to i in one of the in-trees
- note that we manipulate pointers in the in-tree, but we will not change the array at all
- Now, how do we use the in-tree?
- To implement Union:
- combine the two nodes' in-trees
- combining is done by using the setParent operation from the in-tree
- Thus, union(i, j) follows the array's pointers to find the in-tree with i and the in-tree with j
- now, find i's root node by moving up i's tree until we reach i's root
- next, add a pointer from i's root node to j
- now, the set with i and the set with j are the same set
- This operation can be as bad as Θ(n) if either set has n links to follow
Implementing Find
- Given two values, i and j, are they in the same set?
- We must find i's root node and j's root node and see if they are the same
- We follow i's parent pointer iteratively (or recursively) until we reach a node for which isRoot is true
- note: a root node has a parent pointer of -1, which allows us to determine when to stop our upward searching
- We do the same for j's parent
- We compare the two roots to see if they are the same
- Like Union, this operation can also be as bad as Θ(n) if unions are performed in such a way that the in-tree has depth n − 1 (a sketch of this parent-pointer implementation follows)
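
A minimal Java sketch of the in-tree (parent-pointer) implementation described in the last two slides; the array names are my own:

  // In-tree union-find: parent[i] is i's parent, or -1 if i is a root.
  static int[] parent;

  static void init(int n) {
      parent = new int[n + 1];
      for (int i = 1; i <= n; i++)
          parent[i] = -1;                   // n single-node in-trees
  }

  static int root(int i) {                  // follow parent pointers upward
      while (parent[i] != -1)
          i = parent[i];
      return i;
  }

  static void union(int i, int j) {         // attach i's root beneath j's root
      parent[root(i)] = root(j);
  }

  static boolean find(int i, int j) {       // same root means same set
      return root(i) == root(j);
  }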
Example
- Consider a program that performs n union operations followed by m find operations (the code is listed below)
- Notice the pattern of the unions
- 2 is attached to 1, 3 is then attached to 2, so that we have an in-tree of depth 2
- Continuing this trend, 4 would be attached to 3, so that the in-tree now has a depth of 3, etc.
- The in-tree's depth is n − 1 by the time we are done
- The tree is a single chain containing all n elements
- Each find requires traversing n − 1 links, and we do it m times
- Complexity of this code:
- n − 1 unions take a total of n(n−1)/2 operations
- m finds, each of which takes n − 1 operations
- Complexity: n(n−1)/2 + m(n−1) ∈ Θ(n² + mn), or a worst case complexity ∈ Θ(n²), where n is the number of elements in all of the sets
- This shows that n unions and m finds average out to Θ(n) each in the worst case
- Union(1, 2)
- Union(2, 3)
- ...
- Union(n-1, n)
- Find(1, n)
- Find(1, n)
- ...
- Find(1, n)
(Figure: the resulting in-tree is a single chain of depth n − 1 containing n, ..., 3, 2, 1)
An Improvement
- We improve both union and find operations
- We will be more clever about attaching subtrees during unions
- Add to each root node a weight,
- Which is the number of nodes stored in that in-tree
- initially, this value is 1 (each in-tree starts with a single node)
- after every union, we add the two root nodes' weights and store the new value in the surviving root node
- We enhance union so that, when we do union(i, j), we do not simply attach i to j, but instead attach the smaller subtree (the one with fewer nodes) to the root node of the larger subtree (sketched below)
- this will keep the height of the subtree to a more reasonable amount
- How much of an improvement is this?
- It turns out that we can keep the height of an in-tree to Θ(log n) or less, thus making the worst case of both union and find Θ(log n) instead of potentially Θ(n) operations
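
A hedged sketch of weighted union, building on the parent[] array above; weight[r] is meaningful only while r is a root, and every entry starts at 1:

  static int[] weight;   // initialize every entry to 1 alongside parent[]

  static void weightedUnion(int i, int j) {
      int ri = root(i), rj = root(j);
      if (ri == rj) return;                 // already in the same set
      if (weight[ri] <= weight[rj]) {       // attach the smaller tree...
          parent[ri] = rj;                  // ...beneath the larger tree's root
          weight[rj] += weight[ri];
      } else {
          parent[rj] = ri;
          weight[ri] += weight[rj];
      }
  }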
Example
- Consider the code below
- It generates the trees described in the figures
- The final tree, consisting of 10 nodes, has a height of 3

Union(1, 2), Union(3, 4), Union(4, 5), Union(6, 7), Union(1, 6), Union(1, 8), Union(9, 10), Union(5, 10), Union(1, 5)

(Figures: the tree with 3, 4 and 5 after the 3rd operation; the tree with 1, 2, 6, 7, 8 after the 6th operation; the final tree of all 10 nodes, height 3)
Proof
- We will show, by induction, that the new union operation maintains an in-tree of height no more than log n
- The base case is an in-tree with 1 node
- By definition, such a tree has height 0
- log 1 = 0
- Assume (inductive hypothesis) that any tree of k nodes built this way has height ≤ log k
- When we want to union two in-trees of size k or less, we will attach the smaller tree to the larger tree
- Assume the larger tree has k1 nodes and the smaller tree has k2 nodes, with k2 ≤ k1
- The height of the larger tree is ≤ log k1
- The height of the smaller tree is ≤ log k2
- The height of our new tree is at most max(log k1, log k2 + 1), and the new tree has k = k1 + k2 nodes, where k2 ≤ k/2
- Since k ≥ 2·k2, we have log k2 + 1 ≤ log(k/2) + 1 = log k, and clearly log k1 ≤ log k
- So, the height of the tree with k nodes is ≤ log k
Complexity of Union-Find
- So, with our better union operation (known as weighted union),
- We have an in-tree whose height will never be worse than log n for n nodes
- And thus union, which requires finding the root node of the smaller tree and then manipulating that root node's pointer, is Θ(log n)
- The find operation requires traversing from any node up to the root node
- If the in-tree is never greater than log n in height, this requires at most log n + 1 link traversals
- So, find is log n + 1 in the worst case, and union is (log n + 1) + 1, or log n + 2, in the worst case
- Can we improve on this? Sort of.
Path Compression
- Another improvement is to the find operation
- Assume that find has been implemented recursively
- Then, in searching up the in-tree to the root node, each child node on the path is stored on the run-time stack
- After reaching the root, when we recursively return, we can change each node on the stack so that its parent is now the root node (a sketch follows)
- This can have the effect of changing a tree of height log n into a tree of height 1, if our find starts at a leaf node
- This is not necessarily the case: we may have started somewhere in the middle of the tree, or we may have union and find operations interspersed, so that we do not get to compress all paths in the tree
- While Path Compression and Weighted Union do not reduce the worst case complexity of union and find beyond Θ(log n), they do improve the average case complexity to a very small value (that is, it is log_k n where k is larger than 2)
- The book denotes this as log* n
A Union-Find App