CS1102 Tut 9 Hashing

Slides: 28

Provided by: soc128

1
CS1102 Tut 9 Hashing

Max Tan
tanhuiyi_at_comp.nus.edu.sg
COM1-01-09, Tel: 65164364
http://www.comp.nus.edu.sg/tanhuiyi

2
Group Assignments

Group 1 Q4
Group 2 Q1
Group 3 Q2
Group 4 Q3

3
First 15 minutes

Hashing
Key idea is to map a range of keys to integers
between 0 and arraySize - 1
To treat data access as array access
Keys can be any attributes in the object we want
to store which can uniquely identify the object
E.g. NRIC number

4
First 15 minutes

a[h(key)] = data
However, it is very hard to guarantee that
h(key1) ≠ h(key2) for every pair of distinct keys.
When h(key1) = h(key2), this is called a collision!
Several strategies to resolve collisions (d is the probe number, m is the table size)
Linear probing (h(key) + d) % m
Quadratic probing (h(key) + d^2) % m
Double hashing (h(key) + d * h2(key)) % m
Separate chaining h(key) % m
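
The open-addressing formulas above can be written out as a small Java sketch. The class name ProbeSketch and the parameter names are illustrative assumptions, not part of the tutorial; h stands for the already-computed h(key) and h2 for the value of a second hash function.

```java
// Sketch only: the slot examined on the d-th probe for each open-addressing strategy.
// h is the primary hash value h(key), m is the table size (names are assumptions).
class ProbeSketch {
    static int linear(int h, int d, int m) {
        return (h + d) % m;
    }

    static int quadratic(int h, int d, int m) {
        return (h + d * d) % m;
    }

    // h2 is the value of the secondary hash function for the same key.
    static int doubleHash(int h, int h2, int d, int m) {
        return (h + d * h2) % m;
    }

    public static void main(String[] args) {
        // Print the first four probe positions for h = 5, h2 = 3, m = 11.
        for (int d = 0; d < 4; d++) {
            System.out.println(d + ": " + linear(5, d, 11) + " "
                    + quadratic(5, d, 11) + " " + doubleHash(5, 3, d, 11));
        }
    }
}
```

With d = 0 all three strategies start at the home slot h(key); they differ only in how far each later probe jumps.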

5
Question 1

a) Given a hash table with size 11, hash
function h(key) = key % 11, where collisions are
resolved using linear probing, show the contents
of the hash table after each of the following
operations.
Insert(17), Insert(37), Insert(59), Insert(70),
Find(60), Delete(59), Find(70), Insert(16).

6
Question 1 Linear Probing

Insert 17, 37, 59

7
Question 1 Linear Probing

Insert 70, Find 60, Delete 59,

8
Question 1 Linear Probing

Find 70, Insert 16
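
The tables from these slides did not survive in the transcript, so here is a minimal sketch that replays the linear probing operations of Question 1. It assumes non-negative, distinct keys and lazy deletion (a deleted slot is marked and may be reused by a later insert); the class name and the EMPTY/DELETED sentinels are illustrative assumptions.

```java
import java.util.Arrays;

// Sketch only: open addressing with linear probing and lazy deletion.
class LinearProbingTable {
    static final int EMPTY = -1;    // assumed marker: slot never used
    static final int DELETED = -2;  // assumed marker: slot freed by delete
    final int[] slots;

    LinearProbingTable(int m) {
        slots = new int[m];
        Arrays.fill(slots, EMPTY);
    }

    int h(int key) {
        return key % slots.length;
    }

    // Returns the slot where the key was stored, or -1 if the table is full.
    int insert(int key) {
        for (int d = 0; d < slots.length; d++) {
            int i = (h(key) + d) % slots.length;
            if (slots[i] == EMPTY || slots[i] == DELETED) {
                slots[i] = key;
                return i;
            }
        }
        return -1;
    }

    // Returns the slot holding the key, or -1 if it is absent.
    int find(int key) {
        for (int d = 0; d < slots.length; d++) {
            int i = (h(key) + d) % slots.length;
            if (slots[i] == EMPTY) return -1;  // a never-used slot ends the search
            if (slots[i] == key) return i;     // DELETED slots are skipped over
        }
        return -1;
    }

    boolean delete(int key) {
        int i = find(key);
        if (i < 0) return false;
        slots[i] = DELETED;
        return true;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(11);
        for (int key : new int[] {17, 37, 59, 70}) {
            System.out.println("insert " + key + " -> slot " + t.insert(key));
        }
        System.out.println("find 60 -> " + t.find(60));
        t.delete(59);
        System.out.println("find 70 -> " + t.find(70));
        System.out.println("insert 16 -> slot " + t.insert(16));
    }
}
```

Under these assumptions 17, 37, 59 and 70 land in slots 6, 4, 5 and 7; Find(60) fails at the empty slot 8; Find(70) still succeeds after Delete(59) because the deleted slot 5 does not stop the search; and Insert(16) reuses slot 5.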

9
Question 1 Quadratic Probing

Given a hash table with size 11, hash function
h(key) = key % 11, where collisions are resolved
using quadratic probing, show the contents of
the hash table after each of the following
operations.
Insert(20), Insert(82), Insert(28), Insert(93),
Find(51), Delete(82), Find(93), Insert(24),
Insert(68).

10
Question 1 Quadratic Probing

Insert 20, 82, 28

11
Question 1 Quadratic Probing

Insert 93, find 51, delete 82

12
Question 1 Quadratic Probing

Find 93, Insert 24, Insert 68
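
Since the slide tables are missing here as well, the quadratic probing operations can be replayed with a sketch like the following. It assumes the probe sequence (h(key) + d^2) % 11 for d = 0, 1, 2, ..., non-negative distinct keys, and lazy deletion; the class name and the sentinels are illustrative assumptions.

```java
import java.util.Arrays;

// Sketch only: size-11 table with quadratic probing and lazy deletion.
class QuadraticProbingTable {
    static final int EMPTY = -1, DELETED = -2;  // assumed slot markers
    final int[] slots = new int[11];

    QuadraticProbingTable() {
        Arrays.fill(slots, EMPTY);
    }

    // Slot examined on the d-th probe, with h(key) = key % m.
    int probe(int key, int d) {
        return (key % slots.length + d * d) % slots.length;
    }

    // Returns the slot used, or -1 if no free slot was found on the probe sequence.
    int insert(int key) {
        for (int d = 0; d < slots.length; d++) {
            int i = probe(key, d);
            if (slots[i] < 0) {  // EMPTY or DELETED
                slots[i] = key;
                return i;
            }
        }
        return -1;
    }

    // Returns the slot holding the key, or -1 if it is absent.
    int find(int key) {
        for (int d = 0; d < slots.length; d++) {
            int i = probe(key, d);
            if (slots[i] == EMPTY) return -1;
            if (slots[i] == key) return i;
        }
        return -1;
    }

    void delete(int key) {
        int i = find(key);
        if (i >= 0) slots[i] = DELETED;
    }

    public static void main(String[] args) {
        QuadraticProbingTable t = new QuadraticProbingTable();
        for (int key : new int[] {20, 82, 28, 93}) {
            System.out.println("insert " + key + " -> slot " + t.insert(key));
        }
        System.out.println("find 51 -> " + t.find(51));
        t.delete(82);
        System.out.println("find 93 -> " + t.find(93));
        System.out.println("insert 24 -> slot " + t.insert(24));
        System.out.println("insert 68 -> slot " + t.insert(68));
    }
}
```

Under these assumptions 93 probes slots 5, 6 and 9 before settling in slot 3, and 68 probes slots 2, 3 and 6 before settling in slot 0.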

13
Question 1 Double Hashing

Insert 32, 49, 65

14
Question 1 Double Hashing

Insert 26, find 37, delete 26

15
Question 1 Double Hashing

Delete 98

16
Question 2

The hash table has size 2047. The search keys are
English words. The hash function is h(key) =
(sum of positions in alphabet of the key's letters)
mod 2047
English words are short (10 letters or less)
Hash values will be no more than 10 × 26 = 260
The remaining roughly 1800 slots are never used
(non-uniform distribution of keys).
Words with the same letters will be hashed to the
same value, e.g. h(post) = h(stop) =
h(spot).
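
A short sketch makes the anagram collision concrete. The class name LetterSumHash is an illustrative assumption, and the method ignores any character that is not a letter a to z.

```java
// Sketch only: h(key) = (sum of alphabet positions of the letters) mod 2047.
class LetterSumHash {
    static int h(String word) {
        int sum = 0;
        for (char c : word.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1;  // 'a' -> 1, ..., 'z' -> 26
            }
        }
        return sum % 2047;
    }

    public static void main(String[] args) {
        // Anagrams share the same letter sum, so they collide.
        System.out.println(h("post") + " " + h("stop") + " " + h("spot"));
    }
}
```

All three words hash to 16 + 15 + 19 + 20 = 70, and even a ten-letter word made only of z's reaches just 260.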

17
Question 2

The hash table has size 1024. The elements are
email addresses.
The hashing function is h(key) = (sum
of ASCII values of last 10 characters) mod 1024.
Many email addresses have the same domain names,
and they will all be hashed to the same value
e.g. hotmail.com.
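
The effect can be checked with a small sketch. The class name EmailSuffixHash and the two sample addresses are made up for illustration; addresses shorter than 10 characters are hashed in full, which is an assumption the slide does not cover.

```java
// Sketch only: h(key) = (sum of character codes of the last 10 characters) mod 1024.
class EmailSuffixHash {
    static int h(String email) {
        int start = Math.max(0, email.length() - 10);
        int sum = 0;
        for (int i = start; i < email.length(); i++) {
            sum += email.charAt(i);
        }
        return sum % 1024;
    }

    public static void main(String[] args) {
        // Hypothetical addresses: both end in "otmail.com", so the hashes match.
        System.out.println(h("alice@hotmail.com") == h("bob@hotmail.com"));
    }
}
```

The part of the address that actually distinguishes users, the name before the @ sign, never reaches the hash function when the domain is 10 characters or longer.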

18
Question 2

The hash table is 10000 entries long. The search
keys are integers in the range 0 to 9999. The
hash function is h(key) = floor(key * random),
where random is a random number between 0.0 and 1.0
The random number gives a different output each time,
so a key that was inserted cannot be found again!

19
Question 2

The hash table is 100000 entries long. The search
keys are integers in the range 0 through 99999.
The hash function is given by the following Java
method
First, x^2 will overflow the integer since
(100000)^2 = 10^10 is way beyond the range of int
(about 2.1 × 10^9).
Secondly, it takes 1000000 iterations to generate
one hash value. This is too slow.
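
The Java method from the question is not reproduced in this transcript, so the sketch below illustrates only the overflow point. The class and method names are illustrative assumptions; Math.multiplyExact is the standard library call that throws instead of silently wrapping around.

```java
// Sketch only: squaring a value near 100000 in int arithmetic versus long arithmetic.
class SquareOverflow {
    static int squareInt(int x) {
        return x * x;  // wraps around silently on overflow
    }

    static long squareLong(int x) {
        return (long) x * x;  // widened before multiplying, so no overflow here
    }

    static int squareChecked(int x) {
        return Math.multiplyExact(x, x);  // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        System.out.println(squareInt(100000));   // wrapped value, not 10^10
        System.out.println(squareLong(100000));  // 10000000000
    }
}
```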

20
Question 3

a) Describe how you can use an additional
Hashtable to improve the bound of addPenalty to
worst case O(log n).
b) What is the complexity of addPenalty(String
AthleteName, int penalty) if you use an
additional AVL tree instead of Hashtable?

21
Question 3(a)

The O(n) complexity comes from the linear search
to find the athlete with the corresponding name
(heap is not optimized for specific element
search).
Once we have found it, we can perform the modified
updateKey for Min Heap in O(log n) time (recall
the updateKey method from Tut. 8 Qn 2).

22
Question 3(a)

Store each athlete's name and position in the
heap in a hashtable
This allows us to find the position of each name in
the heap in constant time!
However, the updateKey operation might involve some
bubbleUp or bubbleDown operations.
Every time we swap two nodes in the heap, we need
to update the positions of the two athletes in the
Hashtable, each update taking O(1) time.

23
Question 3(a)

This gives a total time complexity of
Searching for the position of the name to be updated:
O(1)
Updating the positions of the swapped athletes in the
Hashtable at each bubbling step: O(1) per step ×
O(lg n) steps = O(lg n)
Total: O(lg n)
This is better than the worst case of linear
search on a Min-Heap (O(n)) or a treap (worst case
still O(n), in the case of a skewed treap).
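
The idea can be sketched as a min-heap that keeps a name-to-index map in step with every swap. The question's original class is not shown in this transcript, so everything here is an assumption for illustration: the class name AthleteHeap, the use of an int time as the heap key, and java.util.HashMap standing in for the Hashtable (its O(1) lookup is an expected-case bound).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: min-heap on time, with a map from athlete name to heap index.
class AthleteHeap {
    private final List<String> names = new ArrayList<>();
    private final List<Integer> times = new ArrayList<>();
    private final Map<String, Integer> position = new HashMap<>();

    void add(String name, int time) {
        names.add(name);
        times.add(time);
        position.put(name, names.size() - 1);
        bubbleUp(names.size() - 1);
    }

    // Lookup through the map, then O(log n) bubbling.
    void addPenalty(String name, int penalty) {
        Integer i = position.get(name);
        if (i == null) throw new IllegalArgumentException("unknown athlete: " + name);
        times.set(i, times.get(i) + penalty);
        bubbleUp(i);                       // handles a decreased key
        bubbleDown(position.get(name));    // handles an increased key
    }

    String peekMin() {
        return names.get(0);
    }

    int positionOf(String name) {
        return position.get(name);
    }

    private void bubbleUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (times.get(parent) <= times.get(i)) break;
            swap(i, parent);
            i = parent;
        }
    }

    private void bubbleDown(int i) {
        int n = names.size();
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < n && times.get(left) < times.get(smallest)) smallest = left;
            if (right < n && times.get(right) < times.get(smallest)) smallest = right;
            if (smallest == i) break;
            swap(i, smallest);
            i = smallest;
        }
    }

    // Every swap also updates the two affected entries in the position map.
    private void swap(int i, int j) {
        String ni = names.get(i), nj = names.get(j);
        int ti = times.get(i), tj = times.get(j);
        names.set(i, nj);
        times.set(i, tj);
        names.set(j, ni);
        times.set(j, ti);
        position.put(nj, i);
        position.put(ni, j);
    }
}
```

A short usage example: after add("A", 10), add("B", 20) and add("C", 30), calling addPenalty("A", 25) raises A's time to 35, so A sinks below B and peekMin() returns "B".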

24
Question 3(b)

Use a balanced binary search tree (e.g. AVL tree)
to store the names as the keys and the positions
as the values.
But since finding a name inside the tree takes
O(lg n) time, updating the positions
of each pair of athletes in one swap takes 2 ×
O(lg n) time, which is in O(lg n).
The number of swapping/bubbling steps in an
updateKey operation is O(lg n), giving a total
complexity of O(lg^2 n) for the whole update.

25
Question 4(a)

a) Consider the hashing function h(key) = key %
100, where the table size is 100 and the keys are
even integers from 0 to 1000000. Is this a
uniform hashing function?
No, because no keys will be hashed into
odd-numbered positions in the table.
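
A quick count confirms the answer. This is a sketch with an assumed class name; it simply hashes every even key in the stated range and tallies the slots.

```java
// Sketch only: slot occupancy for h(key) = key % 100 over the even keys 0..1000000.
class EvenKeyHash {
    static int[] slotCounts() {
        int[] counts = new int[100];
        for (int key = 0; key <= 1000000; key += 2) {
            counts[key % 100]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = slotCounts();
        System.out.println("slot 0: " + counts[0] + ", slot 1: " + counts[1]);
    }
}
```

An even key minus a multiple of 100 is still even, so every odd-numbered slot keeps a count of zero.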

26
Question 4(b)

Consider the hashing function h(key) = (key * 7) %
49, where the table size is 49 and the keys are
integers from 0 to 1000000. Is this a uniform
hashing function?
No, because all keys will be hashed only into
positions 0, 7, 14, 21, 28, 35 and 42.
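
The set of reachable slots can be collected directly. This is a sketch with an assumed class name; a TreeSet is used only so that the slots print in ascending order.

```java
import java.util.TreeSet;

// Sketch only: which slots h(key) = (key * 7) % 49 can produce for keys 0..1000000.
class TimesSevenHash {
    static TreeSet<Integer> usedSlots() {
        TreeSet<Integer> used = new TreeSet<>();
        for (int key = 0; key <= 1000000; key++) {
            used.add((key * 7) % 49);  // key * 7 stays well inside the int range
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(usedSlots());
    }
}
```

Because 7 divides both key * 7 and 49, the remainder is always a multiple of 7, so only 7 of the 49 slots are ever used.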

27
Question 4(c)

Consider the hashing function h(key) =
floor(√key) % 100, where the table size is 100
and the keys are integers from 1 to 10000. Is
this a uniform hashing function?
No. Note that h(key) = 1 for 1 ≤ key ≤ 3, i.e. 3 keys are
mapped to slot 1; h(key) = 2 for 4 ≤ key ≤ 8, i.e. 5 keys
are mapped to slot 2; h(key) = 3 for 9 ≤ key ≤ 15, i.e. 7
keys are mapped to slot 3; and so on. The
difference between successive square numbers
increases as the numbers get larger, leading to
a non-uniform distribution of keys into each slot.
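
The growing bucket sizes can be tallied with a sketch like this one (the class name is an assumption).

```java
// Sketch only: slot occupancy for h(key) = floor(sqrt(key)) % 100 over keys 1..10000.
class SqrtHash {
    static int[] slotCounts() {
        int[] counts = new int[100];
        for (int key = 1; key <= 10000; key++) {
            int slot = (int) Math.floor(Math.sqrt(key)) % 100;
            counts[slot]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = slotCounts();
        System.out.println(counts[1] + " " + counts[2] + " " + counts[3] + " " + counts[99]);
    }
}
```

Slot k, for k from 1 to 99, receives the 2k + 1 keys from k^2 to (k + 1)^2 - 1, so slot 99 holds 199 keys while slot 1 holds 3. Slot 0 receives only the key 10000, since floor(√10000) = 100 and 100 % 100 = 0.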
