Tirgul 10

1
Tirgul 10

Review of Universal Hashing
Solving two problems from the theoretical exercises:
T2 q. 1
T3 q. 2

2
Universal Hashing

Starting point: for every hash function, there is
a really bad input.
A possible solution: choose the hash function
randomly from a family of hash functions.
The logic behind it: for any given input, we want
most of the hash functions in our family to
handle it with few collisions.

[Figure: our family of hash functions, with members such as h_{10,5}(), h_{68,53}(), h_{2,13}() and h_{24,82}(); one specific hash function is picked from it.]
3
Demonstration

Let us conduct an experiment:
A family of about 10,000 hash functions (the
family you saw in class, details later on).
One fixed input (50 keys), inserted into a table
of size 70. (Student grades of exercises - dast
2003)
Question: how many functions will behave really
badly?
The next slide shows the results: the x-axis
describes the number of collisions (in this case
it was also equal to the number of pairs that
collide), and the y-axis describes how many
functions had that number of collisions.

4
Results
5
Most functions perform close to average
performance

The average number of collisions is 8-9. All
entries of the hash table always contained at
most two elements, so the number of collisions is
actually the number of entries with more than one
element.
Very few functions had more than twice this
number of collisions (or less than half).
This is no accident!
We constructed the family of functions so that
the average performance of all the functions over
any input will be good.
Probability laws (e.g. the Markov inequality you
saw in class) tell us that very few elements of a
universe will behave much worse (or much better)
than the average behavior.

6
A good family of hash functions

Conclusion: designing a family with good average
performance is enough.
We need to know two things:
A criterion that guarantees good average
performance.
How to construct a family that satisfies this
criterion.

7
Ensuring good average performance

Definition: A family of hash functions H is
universal if for any two distinct keys k1 and k2,
and any two slots in the table y1 and y2, the
probability (over a random choice of h from H)
that h(k1) = y1 and h(k2) = y2 is at most 1/m^2
(m is the size of the hash table).
Remark: This means that the chance that two keys
will fall into the same slot is at most 1/m - just
as if the hash function was truly random!
Claim: When using a universal hash family H, the
average time of any hash operation is at most
n/m + 1 (n is the number of elements we insert
into the table).
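
The step from the definition to the remark and the claim can be written out as follows. This is only a sketch of the standard argument, assuming chaining is used to resolve collisions (the slides do not say which scheme is meant); the indicator sum over the other keys is notation introduced here.

```latex
% Two distinct keys collide with probability at most 1/m:
\Pr_{h \in H}\bigl[h(k_1) = h(k_2)\bigr]
  = \sum_{y=0}^{m-1} \Pr_{h \in H}\bigl[h(k_1) = y \text{ and } h(k_2) = y\bigr]
  \le m \cdot \frac{1}{m^2} = \frac{1}{m}

% Expected number of the other stored keys that share a slot with key k:
\mathbb{E}\bigl[\#\{k' \ne k : h(k') = h(k)\}\bigr]
  = \sum_{k' \ne k} \Pr_{h \in H}\bigl[h(k') = h(k)\bigr]
  \le \frac{n}{m}

% Adding 1 for handling k itself gives the bound n/m + 1.
```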

8
Is this better than a balanced tree?

If we have an estimate of n, the number of
elements we will insert into the table, we can
choose m proportional to n and get constant
average time per operation - no matter how many
elements we have: 10^6, 10^8, 10^10, or more...
In contrast, the performance of a balanced tree,
O(log n), is affected by the number of elements
we have! As we have more elements, we have slower
operations. For very large numbers, like 10^10,
this makes a difference.
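
To get a feeling for the sizes involved, here is a small sketch that prints log2(n) for the three values mentioned. It assumes that an operation on a balanced binary tree costs about log2(n) comparisons, which is a simplification made here and not a figure from the slides.

```python
import math

for n in (10**6, 10**8, 10**10):
    # Height of a perfectly balanced binary tree on n keys is about log2(n).
    print(f"n = {n:>14,}  log2(n) = {math.log2(n):.1f}")
```

The values come out at roughly 20, 27 and 33, while the hashing bound n/m + 1 stays fixed as long as m grows with n.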

9
Constructing a universal family

Choose p - a prime larger than all keys.
For any a, b in Z_p = {0, ..., p-1}, define the
hash function
h_{a,b}(k) = ((a*k + b) mod p) mod m
The universal family: H_{p,m} = { h_{a,b}() : a, b
in Z_p }
Theorem: H_{p,m} is a universal family of hash
functions.
In our demonstration, the set of keys was all
possible grades. We chose p = 101 and inserted 50
(real) grades into a hash table of size 70 (doing
this for all the hash functions in H_{101,70} and
counting collisions).
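
The demonstration can be re-created along the following lines. This is a hedged sketch, not the code behind the slides: the real grades are not available, so the keys are 50 made-up distinct values, and the names `make_hash`, `colliding_pairs`, `counts` and `histogram` are hypothetical. Collisions are counted as colliding pairs of keys.

```python
import random
from collections import Counter
from itertools import product

P, M = 101, 70  # prime p and table size m from the demonstration


def make_hash(a, b, p=P, m=M):
    """Return h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return lambda k: ((a * k + b) % p) % m


def colliding_pairs(h, keys):
    """Count unordered pairs of keys that h sends to the same slot."""
    slot_sizes = Counter(h(k) for k in keys)
    return sum(c * (c - 1) // 2 for c in slot_sizes.values())


# Stand-in input: 50 distinct made-up "grades" (NOT the real course data).
rng = random.Random(2003)
keys = rng.sample(range(P), 50)

# One collision count per member of the family H_{101,70}.
counts = {
    (a, b): colliding_pairs(make_hash(a, b), keys)
    for a, b in product(range(P), repeat=2)
}

# histogram[c] = number of functions with exactly c colliding pairs.
histogram = Counter(counts.values())
```

Plotting `histogram` gives a picture of the same kind as the results slide. Note that with a = 0 the function is constant, so those 101 members put all 50 keys in one slot; they form the far right end of the histogram.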

10
A second approach - average over inputs

In Universal Hashing - no assumptions about the
input (i.e. for any input, most hash functions
will handle it well).
For example, we don't know a priori how the
grades are distributed (surely they are not
uniform over 0-100).
If we know that the input has a specific
distribution, we might want to use this.
For example, if the input is uniformly
distributed, then the simple division method will
obtain simple uniform hashing.
In general, we don't know the input's
distribution, and so Universal Hashing is
superior!

11
T2 q.1

Reminder - quicksort
quicksort(A[1..n])
1. Choose a pivot p from A.
2. Re-arrange A s.t. all elements smaller than
p are located before it in A, and all
larger elements are after it.
3. Suppose now p is in slot k.
4. Recursively sort A[1..k-1] and A[k+1..n].
The connection to the previous discussion: if we
choose the pivot randomly, we actually have a
family of algorithms, from which we choose one.
The average performance is good, and so, for any
input, most algorithms will perform well!
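
A minimal sketch of quicksort with a randomly chosen pivot, following the four steps of the reminder. The partition scheme (pivot swapped to the end, one left-to-right pass) and the parameter names `lo`, `hi` and `rng` are choices made here for illustration; the slides do not fix them.

```python
import random


def quicksort(a, lo=0, hi=None, rng=random):
    """Sort list a in place with randomly chosen pivots."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:                       # zero or one element: nothing to do
        return
    r = rng.randint(lo, hi)            # 1. choose a pivot at random
    a[r], a[hi] = a[hi], a[r]          # park the pivot at the end
    pivot, k = a[hi], lo
    for i in range(lo, hi):            # 2. move smaller elements to the front
        if a[i] < pivot:
            a[i], a[k] = a[k], a[i]
            k += 1
    a[k], a[hi] = a[hi], a[k]          # 3. the pivot is now in slot k
    quicksort(a, lo, k - 1, rng)       # 4. recurse on both sides
    quicksort(a, k + 1, hi, rng)
```

Each sequence of random choices corresponds to one member of the "family of algorithms"; passing a seeded `random.Random` as `rng` fixes which member is used.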

12
T2 q.1 - continued

Question: How many calls to the random number
generator will we have in the worst case, and in
the best case?
Answer: The number of these calls will always be
Θ(n)!
Proof: Let us draw the recursion tree.
An internal node represents a call to quicksort
with an array of size at least 2.
A leaf represents a call to quicksort with an
array of size 1.
Any internal node is also the father of a leaf
that represents the pivot it used.

13
The recursion tree

Any leaf represents a single element in the
array.
Therefore the number of leaves is exactly n.
The ordered array is actually the leaves, from
left to right.
The random number generator is called once in
every internal node.
Therefore we actually ask: how many internal
nodes are there? Let X be the number of internal
nodes.

14
Proof (continued)

Observation 1: X is at most n, since any internal
node points to at least one leaf.
Observation 2: X is at least n/3.
Divide the set of leaves into subsets according
to their father.
Each subset contains at most 3 leaves, and
therefore there are at least n/3 subsets.
Therefore
X = no. of subsets ≥ n/3
Conclusion: n/3 ≤ X ≤ n, so X = Θ(n).

Q.E.D
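
The two observations can be checked empirically. The sketch below does not sort anything; it only tracks the sizes of the sub-arrays, assuming distinct keys so that a random pivot ends up in a uniformly random slot k. The function name `count_rng_calls` is hypothetical.

```python
import random


def count_rng_calls(n, rng):
    """Number of pivot choices made by randomized quicksort on n distinct keys."""
    calls = 0
    stack = [n]            # sizes of the sub-arrays still waiting to be sorted
    while stack:
        size = stack.pop()
        if size <= 1:      # leaf of the recursion tree: no pivot is chosen
            continue
        calls += 1         # internal node: one call to the generator
        k = rng.randint(1, size)   # final position of the pivot
        stack.append(k - 1)        # left part A[1..k-1]
        stack.append(size - k)     # right part A[k+1..n]
    return calls
```

For any n of at least 2 and any seed, the returned value X should satisfy n/3 ≤ X ≤ n, matching the proof.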
15
T3 q.2

Reminder - Red-Black trees: a binary search tree
with the following properties:
1. Every node has a color - either red or black.
2. The root is black.
3. Every leaf (empty child) is black.
4. Both children of a red node are black.
5. Every path from a node to a descendant leaf
contains the same number of black nodes.
The black height of a tree is the number of black
nodes in a path from the root to some leaf (not
counting the root).
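
The properties translate into a short checker. This is an illustrative sketch only: the `Node` class and the function names are hypothetical, `None` plays the role of a black leaf (property 3), colors are assumed to be the strings "red" or "black" (property 1), and the search-tree ordering of the keys is not checked.

```python
class Node:
    """Internal node of a red-black tree; None stands for a black leaf."""
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color      # color is "red" or "black"
        self.left, self.right = left, right


def black_height(node):
    """Black nodes on every path from node down to a leaf, not counting node.

    Raises ValueError if property 4 or 5 fails somewhere below node.
    """
    if node is None:                 # a leaf: no nodes below it
        return 0
    heights = []
    for child in (node.left, node.right):
        child_is_black = child is None or child.color == "black"
        if node.color == "red" and not child_is_black:
            raise ValueError("property 4 violated: red node with a red child")
        heights.append(black_height(child) + (1 if child_is_black else 0))
    if heights[0] != heights[1]:
        raise ValueError("property 5 violated: unequal black counts")
    return heights[0]


def is_red_black(root):
    """Check properties 2, 4 and 5 under the assumptions stated above."""
    if root is not None and root.color != "black":
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True
```

For example, a black root with two red children passes the check and has black height 1, while a black root with a single black child fails property 5.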

16
T3 q.2 - first part

Question: What is the maximal number of internal
nodes of an RB tree with black height h?
First intuition: The path from the root to a leaf
must contain exactly h black nodes. We want it to
be long, so we can put a red node between each
two black nodes. That's the most we can do,
since otherwise we'll violate property 4.
Important: This is just intuition, not a proof! A
proof must show, by one or more arguments,
without gaps between them, that the claim must be
true. For example, in the above intuition there
are two gaps: how do we know there is no way to
make it even longer? And can we actually
construct such a tree?

17
A maximal tree

First part: showing we can actually construct
such a tree.
Take a complete binary tree with 2h+1 levels
(the last level being the leaves).
Color the root black, the second level red, the
third level black, and so on.
Number of internal nodes: 2^(2h) - 1
Notice that this is a valid RB tree (with black
height h):
Properties 1, 2 and 4 immediately hold.
3 - The number of levels is odd, and we colored
the first level black, so the last level
(leaves) is black too.
5 - All paths have alternating red and black
nodes, and have the same length.

[Figure: a complete binary tree whose levels are colored black, red, black, ...]
18
This tree is indeed maximal

Claim: Any RB tree with black height h has at
most 2h levels (ignoring the leaves).
Proof: What is the no. of nodes in some path from
the root to a leaf?
All paths contain the same no. of black nodes, h
in our case. Including the root, we have h+1
black nodes.
There can be at most h red nodes by property 4
(a red node must sit between two black nodes),
therefore the path has at most 2h+1 nodes, or 2h
if we ignore the leaf.
Remark: A binary tree with 2h levels contains at
most 2^(2h) - 1 nodes. This happens when the tree
is complete.
Answer: An RB tree with black height h can have
at most 2^(2h) - 1 internal nodes.
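
The node count in the remark is a geometric sum over the levels. As a short worked check, with the level index i introduced here for illustration (level i of a complete binary tree holds 2^i nodes):

```latex
\sum_{i=0}^{2h-1} 2^i = 2^{2h} - 1,
\qquad \text{e.g. for } h = 2:\quad 1 + 2 + 4 + 8 = 15 = 2^{4} - 1
```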

19
T3 q.2 - second part

Question: What is the minimal number of internal
nodes of an RB tree with black height h?
Claim: There cannot be any red nodes in the
minimal RB tree.
Proof: Suppose there is a red node, x. It must
have two black sons. We can delete x and one of
its sub-trees, T1, and connect x's father to the
other sub-tree, T2. The only property we need to
check is 5 - but for any path in T1 there is a
path with the same number of black nodes in T2.
So property 5 holds. Therefore the original tree
wasn't minimal.

20
T3 q.2 - second part (continued)

Claim: An RB tree with no red nodes must be a
complete binary tree.
Proof: Consider only the internal nodes. If this
tree is not complete, there are missing nodes at
the last level. Then there is a node with two
paths to a leaf, with different lengths. Since
all nodes are black, this violates property 5.
Answer: There is a single RB tree of black height
h with minimal no. of internal nodes. It has
2^h - 1 internal nodes.
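
Both extremal trees can be built and counted directly. The sketch below represents a tree as nested tuples and is only an illustration; the helper names `complete_tree`, `count_internal` and `black_height` are hypothetical, and `None` stands for a black leaf.

```python
def complete_tree(level_colors):
    """Complete binary tree as nested tuples (color, left, right).

    level_colors[i] is the color of every internal node on level i;
    None stands for a (black) leaf below the last level.
    """
    if not level_colors:
        return None
    child = complete_tree(level_colors[1:])
    return (level_colors[0], child, child)   # both subtrees are identical


def count_internal(tree):
    """Number of internal (non-leaf) nodes."""
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + count_internal(left) + count_internal(right)


def black_height(tree):
    """Black nodes on the leftmost root-to-leaf path, not counting the root."""
    count, node = 0, tree
    while node is not None:
        node = node[1]
        if node is None or node[0] == "black":
            count += 1
    return count


h = 3
minimal = complete_tree(["black"] * h)            # no red nodes at all
maximal = complete_tree(["black", "red"] * h)     # alternating levels
```

With h = 3 both trees have black height 3; the all-black tree has 2^3 - 1 = 7 internal nodes and the alternating tree has 2^6 - 1 = 63.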
