CS1102 Tut 9 Hashing - PowerPoint PPT Presentation

About This Presentation
Title:

CS1102 Tut 9 Hashing

Description:

The elements are email addresses. ... will all be hashed to the same value e.g. 'hotmail.com' ... The search keys are integers in the range 0 through 99999. ... – PowerPoint PPT presentation

Slides: 28
Provided by: soc128

Transcript and Presenter's Notes



1
CS1102 Tut 9 Hashing
  • Max Tan
  • tanhuiyi_at_comp.nus.edu.sg
  • COM1-01-09, Tel: 65164364, http://www.comp.nus.edu.sg/tanhuiyi

2
Group Assignments
  • Group 1: Q4
  • Group 2: Q1
  • Group 3: Q2
  • Group 4: Q3

3
First 15 minutes
  • Hashing
  • Key idea is to map a range of keys to integers
    between 0 and arraySize - 1
  • To treat data access as array access
  • Keys can be any attribute of the object we want
    to store that uniquely identifies the object
  • E.g. NRIC number

4
First 15 minutes
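  • $a[h(key)] = data$
  • However, it is very hard to guarantee that
    $h(key_1) \neq h(key_2)$ for every pair of distinct keys
  • When two distinct keys do hash to the same slot,
    this is called a collision!
  • Several strategies to resolve collisions (probe
    number $d = 0, 1, 2, \ldots$, table size $m$):
  • Linear probing: $(h(key) + d) \bmod m$
  • Quadratic probing: $(h(key) + d^2) \bmod m$
  • Double hashing: $(h(key) + d \cdot h_2(key)) \bmod m$
  • Separate chaining: $h(key) \bmod m$, with each slot
    holding a list of all the keys that hash to it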
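To make the probing formulas concrete, the Java sketch below lists the first few slots each open-addressing scheme examines for one key, using $h(key) = key \bmod m$ as in Question 1. The class and method names are hypothetical, and since no secondary hash is fixed here, $h_2(key) = 7 - (key \bmod 7)$ is an assumption chosen only because it is never 0. Separate chaining is left out because it does not probe.

```java
import java.util.Arrays;

// Sketch: the slot examined on probe number d (d = 0, 1, 2, ...) under each
// open-addressing scheme, with h(key) = key mod m.
public class ProbeSequences {
    // Hypothetical secondary hash for double hashing: always in 1..7, never 0
    static int h2(int key) {
        return 7 - (key % 7);
    }

    static int probe(String scheme, int key, int d, int m) {
        int home = key % m;                                     // h(key)
        switch (scheme) {
            case "linear":    return (home + d) % m;            // (h(key) + d) mod m
            case "quadratic": return (home + d * d) % m;        // (h(key) + d^2) mod m
            case "double":    return (home + d * h2(key)) % m;  // (h(key) + d * h2(key)) mod m
            default: throw new IllegalArgumentException("unknown scheme: " + scheme);
        }
    }

    // First `count` slots that a scheme would examine for `key`
    static int[] firstProbes(String scheme, int key, int m, int count) {
        int[] slots = new int[count];
        for (int d = 0; d < count; d++) {
            slots[d] = probe(scheme, key, d, m);
        }
        return slots;
    }

    public static void main(String[] args) {
        // Key 59 in a table of size 11 has home slot 4 (59 = 5 * 11 + 4)
        System.out.println(Arrays.toString(firstProbes("linear", 59, 11, 4)));    // [4, 5, 6, 7]
        System.out.println(Arrays.toString(firstProbes("quadratic", 59, 11, 4))); // [4, 5, 8, 2]
        System.out.println(Arrays.toString(firstProbes("double", 59, 11, 4)));    // [4, 8, 1, 5]
    }
}
```

Linear probing walks through neighbouring slots, so keys with nearby home slots pile up into clusters; quadratic probing and double hashing jump further apart to break those clusters up.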

5
Question 1
  • a) Given a hash table with size 11, hash
    function $h(key) = key \bmod 11$, where collisions are
    resolved using linear probing, show the contents
    of the hash table after each of the following
    operations.
  • Insert(17), Insert(37), Insert(59), Insert(70),
    Find(60), Delete(59), Find(70), Insert(16).

6
Question 1 Linear Probing
  • Insert 17, 37, 59

7
Question 1 Linear Probing
  • Insert 70, Find 60, Delete 59

8
Question 1 Linear Probing
  • Find 70, Insert 16
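The table diagrams for these linear probing slides are not part of this transcript. The Java sketch below replays the Question 1(a) operations, assuming lazy deletion: Delete leaves a tombstone that Find probes past and that Insert may reuse. The class name, the -1/-2 sentinel values and the reuse-on-insert policy are assumptions and may differ from the convention used in the tutorial; Insert also does not check for duplicates.

```java
import java.util.Arrays;

// Sketch: open addressing with linear probing and lazy deletion (tombstones).
public class LinearProbingTable {
    static final int EMPTY = -1;   // slot never used (assumes keys are non-negative)
    static final int DELETED = -2; // tombstone: Find must probe past it

    final int[] slots;

    LinearProbingTable(int m) {
        slots = new int[m];
        Arrays.fill(slots, EMPTY);
    }

    // d-th slot on the probe sequence: (key mod m + d) mod m
    int probe(int key, int d) {
        int m = slots.length;
        return (key % m + d) % m;
    }

    // Returns the slot holding key, or -1; skips tombstones, stops at EMPTY
    int find(int key) {
        for (int d = 0; d < slots.length; d++) {
            int i = probe(key, d);
            if (slots[i] == EMPTY) return -1;
            if (slots[i] == key) return i;
        }
        return -1;
    }

    // Places key in the first EMPTY or DELETED slot on its probe sequence
    void insert(int key) {
        for (int d = 0; d < slots.length; d++) {
            int i = probe(key, d);
            if (slots[i] == EMPTY || slots[i] == DELETED) {
                slots[i] = key;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Marks the key's slot as a tombstone so later probe chains stay intact
    void delete(int key) {
        int i = find(key);
        if (i >= 0) slots[i] = DELETED;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(11);
        for (int key : new int[] {17, 37, 59, 70}) t.insert(key);
        System.out.println(t.find(60));  // -1: probes 5, 6, 7, then empty slot 8
        t.delete(59);
        System.out.println(t.find(70));  // 7: probes past the tombstone in slot 5
        t.insert(16);                    // reuses the tombstone in slot 5
        System.out.println(Arrays.toString(t.slots));
        // [-1, -1, -1, -1, 37, 16, 17, 70, -1, -1, -1]
    }
}
```

If Delete(59) had simply emptied slot 5, Find(70) would stop there and wrongly report 70 as missing, which is why the tombstone is needed.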

9
Question 1 Quadratic Probing
  • Given a hash table with size 11, hash function
    $h(key) = key \bmod 11$, where collisions are resolved
    using quadratic probing, show the contents of
    the hash table after each of the following
    operations.
  • Insert(20), Insert(82), Insert(28), Insert(93),
    Find(51), Delete(82), Find(93), Insert(24),
    Insert(68).

10
Question 1 Quadratic Probing
  • Insert 20, 82, 28

11
Question 1 Quadratic Probing
  • Insert 93, Find 51, Delete 82

12
Question 1 Quadratic Probing
  • Find 93, Insert 24, Insert 68
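As with the linear probing slides, the quadratic probing diagrams are missing from this transcript. This Java sketch replays the Question 1(b) operations, again assuming lazy deletion with tombstones that Insert may reuse. The class name, sentinel values and reuse policy are assumptions, not necessarily the tutorial's exact convention.

```java
import java.util.Arrays;

// Sketch: open addressing with quadratic probing (h(key) + d^2) mod m and tombstones.
public class QuadraticProbingTable {
    static final int EMPTY = -1;   // slot never used (assumes keys are non-negative)
    static final int DELETED = -2; // tombstone left by Delete

    final int[] slots;

    QuadraticProbingTable(int m) {
        slots = new int[m];
        Arrays.fill(slots, EMPTY);
    }

    // d-th slot on the probe sequence: (key mod m + d^2) mod m
    int probe(int key, int d) {
        int m = slots.length;
        return (key % m + d * d) % m;
    }

    // Returns the slot holding key, or -1; probes past tombstones, stops at EMPTY
    int find(int key) {
        for (int d = 0; d < slots.length; d++) {
            int i = probe(key, d);
            if (slots[i] == EMPTY) return -1;
            if (slots[i] == key) return i;
        }
        return -1;
    }

    // Places key in the first EMPTY or DELETED slot on its probe sequence
    void insert(int key) {
        for (int d = 0; d < slots.length; d++) {
            int i = probe(key, d);
            if (slots[i] == EMPTY || slots[i] == DELETED) {
                slots[i] = key;
                return;
            }
        }
        throw new IllegalStateException("no free slot found on the probe sequence");
    }

    void delete(int key) {
        int i = find(key);
        if (i >= 0) slots[i] = DELETED;
    }

    public static void main(String[] args) {
        QuadraticProbingTable t = new QuadraticProbingTable(11);
        for (int key : new int[] {20, 82, 28, 93}) t.insert(key);
        System.out.println(t.find(51));  // -1: home slot 7 is empty
        t.delete(82);
        System.out.println(t.find(93));  // 3: probes 5 (tombstone), 6, 9, then 3
        t.insert(24);
        t.insert(68);                    // probes 2, 3, 6, then lands in 0
        System.out.println(Arrays.toString(t.slots));
        // [68, -1, 24, 93, -1, -2, 28, -1, -1, 20, -1]
    }
}
```

Insert(93) ends up in slot 3 because slots 5, 6 and 9 are taken and $(5 + 3^2) \bmod 11 = 3$. With a prime table size such as 11, quadratic probing is guaranteed to find an empty slot as long as the table is less than half full.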

13
Question 1 Double Hashing
  • Insert 32, 49, 65

14
Question 1 Double Hashing
  • Insert 26, Find 37, Delete 26

15
Question 1 Double Hashing
  • Delete 98

16
Question 2
  • The hash table has size 2047. The search keys are
    English words. The hash function is
    $h(key) = (\text{sum of the alphabet positions of the key's letters}) \bmod 2047$
  • English words are short (10 letters or less)
  • Hash values will therefore be at most $10 \times 26 = 260$
  • Slots 261 to 2046 (about 1800 of them) can never be
    reached, so the keys are not distributed uniformly.
  • Words with the same letters will be hashed to the
    same value, e.g. $h(post) = h(stop) = h(spot)$.
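A short Java sketch of the Question 2(a) hash function shows both problems at once. It assumes that upper-case letters count the same as lower-case ones and that non-letter characters are ignored, since the question does not say; the class name is hypothetical.

```java
// Sketch of the Question 2(a) hash: sum of alphabet positions (a = 1, ..., z = 26) mod 2047.
public class LetterSumHash {
    static final int TABLE_SIZE = 2047;

    static int h(String word) {
        int sum = 0;
        for (char c : word.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1;   // position in the alphabet; other characters ignored (assumption)
            }
        }
        return sum % TABLE_SIZE;
    }

    public static void main(String[] args) {
        // Anagrams always collide because addition ignores letter order
        System.out.println(h("post") + " " + h("stop") + " " + h("spot")); // 70 70 70
        // The largest value a 10-letter word can reach is 10 * 26 = 260
        System.out.println(h("zzzzzzzzzz"));                               // 260
    }
}
```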

17
Question 2
  • The hash table has size 1024. The elements are
    email addresses.
  • The hash function is
    $h(key) = (\text{sum of the ASCII values of the last 10 characters}) \bmod 1024$.
  • Many email addresses share the same domain name,
    and they will all be hashed to the same value,
    e.g. every address ending in hotmail.com.
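A Java sketch of the Question 2(b) hash function makes the collision visible. The two addresses are made up for illustration, and summing shorter strings in full is an assumption; the class name is hypothetical. Because the last 10 characters of any address ending in hotmail.com are "otmail.com", the user name never affects the result.

```java
// Sketch of the Question 2(b) hash: sum of the ASCII values of the last 10 characters, mod 1024.
public class EmailHash {
    static final int TABLE_SIZE = 1024;

    static int h(String email) {
        int start = Math.max(0, email.length() - 10); // shorter strings are summed in full (assumption)
        int sum = 0;
        for (int i = start; i < email.length(); i++) {
            sum += email.charAt(i);                   // a char promotes to its code value
        }
        return sum % TABLE_SIZE;
    }

    public static void main(String[] args) {
        // Hypothetical addresses: both end in "otmail.com"
        System.out.println(h("alice@hotmail.com") == h("bob@hotmail.com")); // true
    }
}
```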

18
Question 2
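  • The hash table is 10000 entries long. The search
    keys are integers in the range 0 to 9999. The
    hash function is $h(key) = \lfloor key \times random \rfloor$,
    where $0.0 \le random < 1.0$
  • The random number is different on every call, so the
    same key can hash to different slots and a stored key
    may never be found again!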

19
Question 2
  • The hash table is 100000 entries long. The search
    keys are integers in the range 0 through 99999.
    The hash function is given by the following Java
    method
  • First, computing $x^2$ will overflow the int, since
    $(100000)^2 = 10^{10}$ is far beyond the range of int.
  • Second, it takes 1000000 iterations to generate
    one hash value. This is too slow.
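The Java method for this part is not reproduced in this transcript, so the sketch below only checks the overflow claim for the largest key, 99999. The class and method names are hypothetical; `Math.multiplyExact` is the standard `java.lang.Math` method that throws `ArithmeticException` on int overflow instead of wrapping.

```java
// Sketch: checking the overflow claim for the largest key in the range 0..99999.
public class SquareOverflow {
    static long exactSquare(int x) {
        return (long) x * x;        // widened to 64 bits before multiplying
    }

    static int wrappedSquare(int x) {
        return x * x;               // 32-bit int multiplication wraps around silently
    }

    public static void main(String[] args) {
        int x = 99999;
        System.out.println(exactSquare(x));                      // 9999800001
        System.out.println(exactSquare(x) > Integer.MAX_VALUE);  // true
        System.out.println(wrappedSquare(x) == exactSquare(x));  // false
        try {
            Math.multiplyExact(x, x);  // throws ArithmeticException instead of wrapping
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

The wrapped value is still a valid non-negative int here, which is what makes this bug easy to miss: the hash function keeps returning numbers, just not the ones the formula intends.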

20
Question 3
  • a) Describe how you can use an additional
    Hashtable to improve the bound of addPenalty to
    worst case O(log n).
  • b) What is the complexity of addPenalty(String
    AthleteName, int penalty) if you use an
    additional AVL tree instead of Hashtable?

21
Question 3(a)
  • The O(n) complexity comes from the linear search
    to find the athlete with the corresponding name
    (a heap does not support efficient search for a
    specific element).
  • Once we have found it, we can perform the modified
    updateKey for Min Heap in O(log n) time (recall
    the updateKey method from Tut. 8 Qn 2).

22
Question 3(a)
  • Store EACH ATHLETE'S NAME AND POSITION IN THE
    HEAP in a Hashtable
  • This allows us to find the position of each name in
    the heap in constant time!
  • However, the updateKey operation might involve some
    bubbleUp or bubbleDown operations.
  • Every time we swap two nodes in the heap, we need
    to update the positions of the two athletes in the
    Hashtable, each update taking O(1) time.

23
Question 3(a)
  • This gives a total time complexity of:
  • Searching for the heap position of the name to be
    updated: O(1)
  • Updating the swapped athletes' positions in the
    Hashtable for each bubbling step:
    $O(1) \times O(\lg n) = O(\lg n)$
  • Total: O(lg n)
  • This is better than the worst case of linear
    search on a Min-Heap (O(n)) or a treap (worst case
    still O(n), in the case of a skewed treap).
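A minimal Java sketch of the Question 3(a) idea, assuming a min-heap ordered by penalty stored in an `ArrayList` and a `java.util.HashMap` from athlete name to heap index. The `Athlete` class, the method names and the sample athletes are hypothetical; the tutorial's own heap code (Tut. 8 Qn 2) is not shown here. Each swap performs two HashMap updates, so `addPenalty` costs an expected O(1) lookup plus O(lg n) swaps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: min-heap of athletes keyed by penalty, plus a name -> heap index map.
public class PenaltyHeap {
    // Hypothetical record type; the tutorial's Athlete class is not shown.
    static final class Athlete {
        final String name;
        int penalty;
        Athlete(String name, int penalty) { this.name = name; this.penalty = penalty; }
    }

    final List<Athlete> heap = new ArrayList<>();
    final Map<String, Integer> position = new HashMap<>(); // name -> index in heap

    void insert(String name, int penalty) {
        heap.add(new Athlete(name, penalty));
        position.put(name, heap.size() - 1);
        bubbleUp(heap.size() - 1);
    }

    // Expected O(1) lookup, then O(lg n) bubbling with O(1) map work per swap
    void addPenalty(String name, int penalty) {
        int i = position.get(name);
        heap.get(i).penalty += penalty;
        bubbleUp(i);                      // only one of these two calls moves anything
        bubbleDown(position.get(name));
    }

    String min() { return heap.get(0).name; }

    private void swap(int i, int j) {
        Athlete a = heap.get(i), b = heap.get(j);
        heap.set(i, b);
        heap.set(j, a);
        position.put(b.name, i);          // two O(1) map updates per swap
        position.put(a.name, j);
    }

    private void bubbleUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (heap.get(parent).penalty <= heap.get(i).penalty) break;
            swap(i, parent);
            i = parent;
        }
    }

    private void bubbleDown(int i) {
        int n = heap.size();
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < n && heap.get(left).penalty < heap.get(smallest).penalty) smallest = left;
            if (right < n && heap.get(right).penalty < heap.get(smallest).penalty) smallest = right;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    public static void main(String[] args) {
        PenaltyHeap h = new PenaltyHeap();
        h.insert("Ann", 3);
        h.insert("Bo", 5);
        h.insert("Cy", 1);
        System.out.println(h.min());      // Cy
        h.addPenalty("Cy", 10);           // Cy now has 11 and sinks to the bottom
        System.out.println(h.min());      // Ann
    }
}
```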

24
Question 3(b)
  • Use a balanced binary search tree (e.g. an AVL tree)
    to store the names as the keys and the positions
    as the values.
  • But since finding a name inside the tree takes
    O(lg n) time, updating the positions of the pair
    of athletes in one swap takes $2 \cdot O(\lg n)$
    time, which is in O(lg n).
  • The number of swapping/bubbling steps in an
    updateKey operation is O(log n), giving a total
    complexity of $O(\lg^2 n)$ for the whole update.
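The costs listed for Question 3(b) combine as follows: one tree search to locate the athlete, then $O(\lg n)$ bubbling steps, each needing two tree updates.

```latex
\underbrace{O(\lg n)}_{\text{find the name in the AVL tree}}
\;+\; \underbrace{O(\lg n)}_{\text{bubbling steps}}
\times \underbrace{2 \cdot O(\lg n)}_{\text{two AVL updates per swap}}
\;=\; O(\lg^2 n)
```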

25
Question 4(a)
  • a) Consider the hashing function $h(key) = key \bmod 100$,
    where the table size is 100 and the keys are
    even integers from 0 to 1000000. Is this a
    uniform hashing function?
  • No, because no keys will be hashed into
    odd-numbered positions in the table.
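A quick Java check of the Question 4(a) answer counts how many of the even keys land in each slot (class and method names are hypothetical).

```java
// Sketch for Question 4(a): count even keys 0..1,000,000 per slot of h(key) = key mod 100.
public class EvenKeysMod100 {
    static int[] slotCounts() {
        int[] counts = new int[100];
        for (int key = 0; key <= 1_000_000; key += 2) {
            counts[key % 100]++;           // h(key) = key mod 100
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = slotCounts();
        int unused = 0;
        for (int c : counts) if (c == 0) unused++;
        System.out.println(unused);        // 50: every odd slot stays empty
        System.out.println(counts[2]);     // 10000 keys share slot 2
    }
}
```

Half the table is wasted, and every even slot carries twice the load it would under a uniform hash.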

26
Question 4(b)
  • Consider the hashing function $h(key) = (key \times 7) \bmod 49$,
    where the table size is 49 and the keys are
    integers from 0 to 1000000. Is this a uniform
    hashing function?
  • No, because all keys will be hashed only into
    positions 0, 7, 14, 21, 28, 35 and 42.
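The reason is that $(key \times 7) \bmod 49 = 7 \cdot (key \bmod 7)$, so only the seven multiples of 7 can ever come out. A Java sketch (hypothetical class name) that collects every slot the function uses:

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch for Question 4(b): which slots does h(key) = (key * 7) mod 49 ever use?
public class TimesSevenMod49 {
    static int h(int key) {
        return (key * 7) % 49;   // key * 7 <= 7,000,000, so no int overflow here
    }

    static Set<Integer> usedSlots() {
        Set<Integer> used = new TreeSet<>();   // sorted, for readable output
        for (int key = 0; key <= 1_000_000; key++) {
            used.add(h(key));
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(usedSlots()); // [0, 7, 14, 21, 28, 35, 42]
    }
}
```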

27
Question 4(c)
  • Consider the hashing function
    $h(key) = \lfloor \sqrt{key} \rfloor \bmod 100$, where the table size is 100
    and the keys are integers from 1 to 10000. Is
    this a uniform hashing function?
  • No. Note that $h(key) = 1$ for $1 \le key \le 3$, i.e. 3 keys
    are mapped to slot 1; $h(key) = 2$ for $4 \le key \le 8$,
    i.e. 5 keys are mapped to slot 2; $h(key) = 3$ for
    $9 \le key \le 15$, i.e. 7 keys are mapped to slot 3;
    and so on. In general slot $k$ receives
    $(k+1)^2 - k^2 = 2k + 1$ keys, so the gap between
    successive square numbers grows as the numbers get
    larger, leading to a non-uniform distribution of keys
    over the slots.
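A Java sketch (hypothetical class name) that counts the keys per slot confirms the $2k + 1$ pattern and shows how lopsided the table becomes at the top end:

```java
// Sketch for Question 4(c): how many keys 1..10000 land in each slot of
// h(key) = floor(sqrt(key)) mod 100?
public class SqrtMod100 {
    static int h(int key) {
        return (int) Math.floor(Math.sqrt(key)) % 100;
    }

    static int[] slotCounts() {
        int[] counts = new int[100];
        for (int key = 1; key <= 10_000; key++) {
            counts[h(key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = slotCounts();
        // Slot k (1 <= k <= 99) receives the 2k + 1 keys from k^2 to (k+1)^2 - 1
        System.out.println(counts[1] + " " + counts[2] + " " + counts[3]); // 3 5 7
        System.out.println(counts[99]);  // 199
        System.out.println(counts[0]);   // 1: only key 10000 wraps around to slot 0
    }
}
```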