Tirgul 10 - PowerPoint PPT Presentation

Title:

Tirgul 10

Provided by: IRIS

Transcript and Presenter's Notes

Title: Tirgul 10


1
Tirgul 10
  • Review of Universal Hashing
  • Solving two problems from theoretical exercises
  • T2 q. 1
  • T3 q. 2

2
Universal Hashing
  • Starting point: for every hash function, there is
    a really bad input.
  • A possible solution: choose the hash function
    randomly from a family of hash functions.
  • The logic behind it: for any given input, we want
    most of the hash functions in our family to
    handle it with few collisions.

[Figure: our family of hash functions, with members h_{10,5}(), h_{68,53}(), h_{2,13}() and h_{24,82}(), each of them a specific hash function.]

3
Demonstration
  • Let us conduct an experiment:
  • A family of about 10,000 hash functions (the
    family you saw in class, details later on).
  • One fixed input (50 keys), inserted into a table
    of size 70. (Student grades of exercises - dast
    2003)
  • Question: how many functions will behave really
    badly?
  • Next slide shows the results - the x-axis
    describes the number of collisions (in this case
    it was also equal to the number of pairs that
    collide), and the y-axis describes how many
    functions had such a number of collisions.

4
Results
5
Most functions perform close to average performance
  • The average number of collisions is 8-9. All
    entries of the hash table always contained at
    most two elements, so the number of collisions is
    actually the number of entries with more than one
    element.
  • Very few functions had more than twice this
    number of collisions (or less than half).
  • This is no accident!
  • We constructed the family of functions so that
    the average performance of all the functions over
    any input will be good.
  • Probability laws (e.g. the Markov inequality you
    saw in class) tell us that very few elements of a
    universe will behave much worse (or much better)
    than the average behavior.

6
A good family of hash functions
  • Conclusion: Designing a family with good average
    performance is enough.
  • We need to know two things:
  • A criterion that guarantees good average
    performance.
  • How to construct a family that satisfies this
    criterion.

7
Ensuring good average performance
  • Definition: A family of hash functions H is
    universal if for any two distinct keys k1 and k2,
    and any two slots in the table y1 and y2, the
    probability (over a random choice of h from H)
    that h(k1) = y1 and h(k2) = y2 is at most 1/m^2
    (m is the size of the hash table).
  • Remark: This means that the chance that two keys
    will fall into the same slot is at most 1/m -
    just as if the hash function were truly random!
  • Claim: When using a universal hash family H, the
    average time of any hash operation is at most
    n/m + 1 (n is the number of elements we insert
    into the table).

8
Is this better than a balanced tree?
  • If we have an estimate of n, the number of
    elements we will insert into the table, we can
    size the table accordingly and get constant time
    performance - no matter how many elements we
    have: 10^6, 10^8, 10^10, or more...
  • In contrast, the performance of a balanced tree,
    O(log n), is affected by the number of elements
    we have! The more elements we have, the slower
    the operations. For very large numbers, like
    10^10, this makes a difference.
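
To make the comparison concrete, here is a small sketch that evaluates both costs for the sizes mentioned above. The function names are hypothetical, and it assumes the table size m is chosen equal to the estimate of n; the tree cost is taken as the number of levels of a perfectly balanced binary tree.

```python
import math

def hash_cost_bound(n, m):
    """Bound from the claim on universal hashing: average cost at most n/m + 1."""
    return n / m + 1

def balanced_tree_depth(n):
    """Number of levels of a perfectly balanced binary tree holding n keys."""
    return math.ceil(math.log2(n + 1))

for n in (10**6, 10**8, 10**10):
    m = n  # assumption: the table is sized to the estimate of n
    print(n, hash_cost_bound(n, m), balanced_tree_depth(n))
```

The hash bound stays the same for every n, while the tree depth keeps growing with n.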

9
Constructing a universal family
  • Choose p - a prime larger than all keys.
  • For any a, b in Zp = {0,...,p-1}, fix a hash
    function
  • h_{a,b}(k) = ((a*k + b) mod p) mod m
  • The universal family: H_{p,m} = { h_{a,b}() :
    a, b in Zp }
  • Theorem: H_{p,m} is a universal family of hash
    functions.
  • In our demonstration, the set of keys was all
    possible grades. We chose p = 101, inserted 50
    (real) grades into a hash table of size 70 (doing
    this for all the hash functions in H_{101,70} and
    counting collisions).
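
The experiment can be sketched in a few lines. This is only an illustration: the 50 keys below are hypothetical stand-ins (the real grades are not in these slides), and the helper names are made up for the example. It builds every h_{a,b} in H_{101,70} and records how many key pairs collide under each function.

```python
from collections import Counter

def make_hash(a, b, p, m):
    """Return h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return lambda k: ((a * k + b) % p) % m

def colliding_pairs(h, keys):
    """Number of unordered key pairs that h sends to the same slot."""
    slot_sizes = Counter(h(k) for k in keys).values()
    return sum(s * (s - 1) // 2 for s in slot_sizes)

def collision_histogram(keys, p, m):
    """Map 'number of colliding pairs' -> 'how many functions of H_{p,m} give it'."""
    return Counter(
        colliding_pairs(make_hash(a, b, p, m), keys)
        for a in range(p)   # a ranges over all of Zp, as on the slide
        for b in range(p)
    )

# Hypothetical input: 50 distinct keys below p = 101 (not the real grades).
keys = list(range(0, 100, 2))
histogram = collision_histogram(keys, p=101, m=70)
for pairs in sorted(histogram):
    print(pairs, histogram[pairs])
```

Note that the functions with a = 0 send every key to slot b mod m, so they appear as a far outlier in the histogram; some textbook presentations exclude a = 0 for this reason.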

10
A second approach - average over inputs
  • In Universal Hashing - no assumptions about the
    input (i.e. for any input, most hash functions
    will handle it well).
  • For example, we don't know a priori how the
    grades are distributed (surely they are not
    uniform over 0-100).
  • If we know that the input has a specific
    distribution, we might want to use this.
  • For example, if the input is uniformly
    distributed, then the simple division method will
    obtain simple uniform hashing.
  • In general, we don't know the input's
    distribution, and so Universal Hashing is
    superior!

11
T2 q.1
  • Reminder - quicksort
  • quicksort(A[1..n])
  • 1. Choose a pivot p from A.
  • 2. Re-arrange A s.t. all elements smaller than
    p are located before it in A, and all larger
    elements are located after it.
  • 3. Suppose now p is in slot k.
  • 4. Recursively sort A[1..k-1] and A[k+1..n].
  • The connection to the previous discussion: If we
    choose the pivot randomly, we actually have a
    family of algorithms, from which we choose one.
    The average performance is good, and so, for any
    input, most algorithms will perform well!
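
The four steps above can be sketched as follows. This is one possible in-place version, assuming 0-based indices and a Lomuto-style partition for step 2; the slides do not say which partition scheme was used.

```python
import random

def quicksort(a, lo=0, hi=None):
    """Sort the list a in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return  # zero or one element: nothing to do
    # Step 1: choose a pivot at random and park it at the end of the range.
    r = random.randint(lo, hi)
    a[r], a[hi] = a[hi], a[r]
    pivot = a[hi]
    # Step 2: move all elements smaller than the pivot to the front.
    k = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[k] = a[k], a[i]
            k += 1
    # Step 3: the pivot now belongs in slot k.
    a[k], a[hi] = a[hi], a[k]
    # Step 4: recursively sort both sides of the pivot.
    quicksort(a, lo, k - 1)
    quicksort(a, k + 1, hi)

data = [5, 2, 9, 1, 7, 3]
quicksort(data)
print(data)
```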

12
T2 q.1 - continued
  • Question: How many calls to the random number
    generator will we have in the worst case, and in
    the best case?
  • Answer: The number of these calls will always be
    Θ(n)!
  • Proof: Let us draw the recursion tree.
  • An internal node represents a call to quicksort
    with an array of size at least 2.
  • A leaf represents a call to quicksort with an
    array of size 1.
  • Any internal node is also the father of a leaf
    that represents the pivot it used.

13
The recursion tree
  • Any leaf represents a single element in the
    array.
  • Therefore the number of leaves is exactly n.
  • The ordered array is actually the leaves, from
    left to right.
  • The random number generator is called once in
    every internal node.
  • Therefore we actually ask: how many internal
    nodes are there? Let X be the number of internal
    nodes.

14
Proof (continued)
  • Observation 1: X is at most n, since any internal
    node points to at least one leaf (its pivot), and
    different internal nodes point to different
    leaves.
  • Observation 2: X is at least n/3.
  • Divide the set of leaves into subsets according
    to their father.
  • Each subset contains at most 3 leaves, and
    therefore there are at least n/3 subsets.
    Therefore:
  • X = no. of subsets ≥ n/3
  • Conclusion: n/3 ≤ X ≤ n, so X = Θ(n).

Q.E.D
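
As a sanity check on the two observations, the following sketch computes the fewest and the most pivot draws over all possible pivot choices, using the shape of the recursion tree only (no actual sorting). The function name is hypothetical, and distinct keys are assumed.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def extreme_draws(n, pick):
    """Fewest (pick=min) or most (pick=max) pivot draws on n distinct keys."""
    if n < 2:
        return 0  # arrays of size 0 or 1 never draw a pivot
    # If the pivot lands in slot k (1-based), k-1 keys go left and n-k go right.
    return 1 + pick(
        extreme_draws(k - 1, pick) + extreme_draws(n - k, pick)
        for k in range(1, n + 1)
    )

for n in (2, 10, 30):
    print(n, extreme_draws(n, min), extreme_draws(n, max))
```

For every n of at least 2, both values should fall between n/3 and n, in line with the proof.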
15
T3 q.2
  • Reminder - Red-Black trees: a binary search tree
    with the following properties:
  • 1. Every node has a color - either red or black.
  • 2. The root is black.
  • 3. Every leaf (empty child) is black.
  • 4. Both children of a red node are black.
  • 5. Every path from a node to a descendant leaf
    contains the same number of black nodes.
  • The black height of a tree is the number of black
    nodes in a path from the root to some leaf (not
    counting the root).
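
The color properties can be checked mechanically. Below is a minimal sketch, assuming a hypothetical Node class in which None stands for a black leaf (empty child) and colors are the strings "R" and "B". It checks properties 2, 4 and 5 and computes the black height as defined above; it does not check the binary-search-tree ordering of the keys.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    color: str                      # "R" or "B" (property 1)
    left: Optional["Node"] = None   # None is a black leaf (property 3)
    right: Optional["Node"] = None

def blacks_from(child):
    """Black nodes on a path from child (inclusive) down to a leaf."""
    if child is None:
        return 1  # the leaf itself is black
    return black_height(child) + (1 if child.color == "B" else 0)

def black_height(node):
    """Black nodes below node on any path to a leaf, not counting node."""
    for child in (node.left, node.right):
        if node.color == "R" and child is not None and child.color == "R":
            raise ValueError("property 4 violated: red node with a red child")
    left, right = blacks_from(node.left), blacks_from(node.right)
    if left != right:
        raise ValueError("property 5 violated: unequal black counts")
    return left

def is_red_black(root):
    """True if the tree rooted at root satisfies properties 2, 4 and 5."""
    if root.color != "B":
        return False  # property 2
    try:
        black_height(root)
    except ValueError:
        return False
    return True

tree = Node(2, "B", Node(1, "R"), Node(3, "R"))
print(is_red_black(tree), black_height(tree))
```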

16
T3 q.2 - first part
  • Question: What is the maximal number of internal
    nodes of an RB tree with black height h?
  • First intuition: The path from the root to a leaf
    must contain exactly h black nodes. We want it to
    be long, so we can put a red node between each
    two black nodes. That's the most we can do,
    since otherwise we'll violate property 4.
  • Important: This is just intuition, not a proof! A
    proof must show, by one or more arguments,
    without gaps between them, that the claim must be
    true. For example, in the above intuition there
    are two gaps: how do we know there is no way to
    make it even longer? And can we actually
    construct such a tree?

17
A maximal tree
  • First part: showing we can actually construct
    such a tree.
  • Take a complete binary tree with 2h+1 levels.
  • Color the root black, the second level red, the
    third level black, and so on.
  • Number of internal nodes: 2^(2h) - 1
  • Notice that this is a valid RB tree (with black
    height h):
  • Properties 1, 2 and 4 immediately hold.
  • 3 - The number of levels is odd, and we colored
    the first level black, so the last level
    (leaves) is black too.
  • 5 - All paths have alternating red and black
    nodes, and have the same length.

[Figure: the levels of the tree, colored black, red, black, ... from the root down.]

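
The construction can be checked with a short sketch that describes the tree only by the colors of its levels. The representation and the function names are assumptions made for this example; since every level is uniformly colored, all root-to-leaf paths see the same colors and property 5 holds automatically.

```python
def alternating_levels(h):
    """Colors of the 2h+1 levels: black, red, black, ..., black (leaf level last)."""
    return ["B" if i % 2 == 0 else "R" for i in range(2 * h + 1)]

def black_height(levels):
    """Black nodes on a root-to-leaf path, not counting the root."""
    return levels[1:].count("B")

def internal_nodes(levels):
    """Nodes of a complete binary tree over all levels except the leaf level."""
    return 2 ** (len(levels) - 1) - 1

def is_valid(levels):
    """Properties 2, 3 and 4 for a tree whose levels are uniformly colored."""
    no_red_red = all(not (a == "R" and b == "R")
                     for a, b in zip(levels, levels[1:]))
    return levels[0] == "B" and levels[-1] == "B" and no_red_red

for h in range(1, 5):
    levels = alternating_levels(h)
    print(h, is_valid(levels), black_height(levels), internal_nodes(levels))
```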
18
This tree is indeed maximal
  • Claim: Any RB tree with black height h has at
    most 2h levels (ignoring the leaves).
  • Proof: What is the no. of nodes in some path from
    the root to a leaf?
  • All paths contain the same no. of black nodes, h
    in our case. Including the root, we have h+1
    black nodes.
  • There can be at most h red nodes by property 4,
    therefore the path has at most 2h+1 nodes, or 2h
    if we ignore the leaf.
  • Remark: A binary tree with 2h levels contains at
    most 2^(2h) - 1 nodes. This happens when the tree
    is complete.
  • Answer: An RB tree with black height h can have
    at most 2^(2h) - 1 internal nodes.

19
T3 q.2 - second part
  • Question: What is the minimal number of internal
    nodes of an RB tree with black height h?
  • Claim: There cannot be any red nodes in the
    minimal RB tree.
  • Proof: Suppose there is a red node, x. It must
    have two black sons. We can delete x and one of
    its sub-trees, T1, and connect x's father to the
    other sub-tree, T2. The only property we need to
    check is 5 - but for any path in T1 there is a
    path with the same number of black nodes in T2.
    So property 5 holds. Therefore the original tree
    wasn't minimal.

20
T3 q.2 - second part (continued)
  • Claim: An RB tree with no red nodes must be a
    complete binary tree.
  • Proof: Consider only the internal nodes. If this
    tree is not complete, there are missing nodes at
    the last level. Then there is a node with two
    paths to a leaf, with different lengths. Since
    all nodes are black, this violates property 5.
  • Answer: There is a single RB tree of black height
    h with minimal no. of internal nodes. It has
    2^h - 1 internal nodes.
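
Both answers (maximal and minimal) can be cross-checked with a small recurrence over the black height. This is a sketch, not part of the original solution: it tracks the extreme number of internal nodes in a subtree with a black root, where each child is either black with black height one less, or red with the same black height and two black children.

```python
def extreme_internal_nodes(h, pick):
    """Fewest (pick=min) or most (pick=max) internal nodes for black height h."""
    black = 0  # black height 0: the subtree is just a black leaf
    for _ in range(h):
        red = 1 + 2 * black              # red root over two black subtrees
        black = 1 + 2 * pick(black, red)  # black root; each child black or red
    return black  # the root of the whole tree is black (property 2)

for h in range(1, 5):
    print(h, extreme_internal_nodes(h, min), extreme_internal_nodes(h, max))
```

For small h the two columns can be compared by hand with 2^h - 1 and 2^(2h) - 1.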