11. Hash Tables

Author: WHITE. Created: 3/23/2006 4:00:39 AM.

Slides: 26

Provided by: WHIT1168

1
11. Hash Tables

Heejin Park
College of Information and Communications
Hanyang University

2
Contents

Direct-address tables
Hash tables
Hash functions
Open addressing

3
Hash functions

What makes a good hash function?
Assumption of simple uniform hashing
Each key is equally likely to hash to any of the
m slots.
Each key hashes independently of where any other
key has hashed to.

4
Hash functions

Interpreting keys as natural numbers
Hash functions assume that the universe of keys
is the set N = {0, 1, 2, ...} of natural numbers.
If the keys are not natural numbers, a way is
found to interpret them as natural numbers.
For example, the string pt:
p = 112 and t = 116 in the ASCII character set.
Expressed as a radix-128 integer,
pt becomes (112 × 128) + 116 = 14452.
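
A minimal Python sketch of this interpretation, assuming every character code is smaller than the radix. The function name key_to_natural is illustrative and not from the slides.

```python
def key_to_natural(s: str, radix: int = 128) -> int:
    """Interpret string s as a natural number written in the given radix.

    Each character contributes its character code as one digit.
    """
    n = 0
    for ch in s:
        digit = ord(ch)
        if digit >= radix:
            raise ValueError(f"character {ch!r} does not fit in radix {radix}")
        n = n * radix + digit
    return n


print(key_to_natural("pt"))  # (112 * 128) + 116 = 14452
```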

5
The division method

The division method
Map a key k into one of m slots by taking the
remainder of k divided by m.
h(k) = k mod m.
Example
m = 12, k = 100
h(k) = 100 mod 12 = 4

6
The division method

Certain values of m are avoided.
m = 2^p
h(k) is just the p lowest-order bits of k.
Example: m = 2^4, h(k) = k mod 2^4
k = 10110100, m = 00010000, h(k) = 0100 (binary)
Thus m should not be a power of 2.
m = 2^p - 1
When k is a character string interpreted in radix
2^p, permuting the characters of k does not change
its hash value.
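
The two cautions above can be checked with a short Python sketch. The helper names h_div and key_to_natural are illustrative, and the choice p = 7 (radix 128, m = 127) is an assumption made only for the demonstration.

```python
def h_div(k: int, m: int) -> int:
    """Division-method hash: the remainder of k divided by m."""
    return k % m


def key_to_natural(s: str, radix: int) -> int:
    """Interpret s as a number in the given radix (one character per digit)."""
    n = 0
    for ch in s:
        n = n * radix + ord(ch)
    return n


# The earlier example: m = 12, k = 100.
print(h_div(100, 12))  # 4

# m = 2^4: the hash is just the 4 lowest-order bits of k.
k = 0b10110100
print(format(h_div(k, 2**4), "04b"))  # 0100

# m = 2^7 - 1 with strings read in radix 2^7: "pt" and "tp" collide.
m = 2**7 - 1
print(h_div(key_to_natural("pt", 2**7), m))  # 101
print(h_div(key_to_natural("tp", 2**7), m))  # 101
```

The collision happens because 2^p mod (2^p - 1) = 1, so the hash reduces to the sum of the character codes modulo m.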

7
The division method

A good choice
A prime not too close to an exact power of 2.
If the number of keys is about 2000 and we don't
mind examining an average of 3 elements in an
unsuccessful search, we can allocate a hash table
of size m = 701.
701 is a prime near 2000/3 but not near any power
of 2.
h(k) = k mod 701.

8
The multiplication method

The multiplication method
h(k) = ⌊m (kA mod 1)⌋
Multiply the key k by a constant A in the range
0 < A < 1 and extract the fractional part of
kA.
Multiply this value by m and take the floor of
the result.

9
The multiplication method

An advantage of the multiplication method is that
the value of m is not critical.
We typically choose m to be a power of 2
(m = 2^p).
It makes the implementation of the function easy.
Suppose that the word size of the machine is w
bits and that k fits into a single word.
We restrict A to be a fraction of the form s/2^w,
where s is an integer in the range 0 < s < 2^w.
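
A minimal Python sketch of this word-based implementation, checked against the defining formula in exact rational arithmetic. The function names and the values w = 32, p = 14, s = 2654435769, k = 123456 are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction
from math import floor


def h_mul(k: int, s: int, w: int, p: int) -> int:
    """Multiplication-method hash with A = s / 2**w and m = 2**p.

    Uses only an integer multiply, a mask, and a shift.
    """
    if not (0 < s < 2**w and 0 <= k < 2**w and 0 < p <= w):
        raise ValueError("need 0 < s < 2^w, 0 <= k < 2^w, 0 < p <= w")
    r0 = (k * s) & (2**w - 1)  # low-order word of the 2w-bit product k*s
    return r0 >> (w - p)       # the p most significant bits of r0


def h_mul_exact(k: int, s: int, w: int, p: int) -> int:
    """Reference version: floor(m * (k*A mod 1)) with exact fractions."""
    A = Fraction(s, 2**w)
    return floor(2**p * ((k * A) % 1))


# Illustrative values (assumptions, not from the slides).
w, p, s = 32, 14, 2654435769
print(h_mul(123456, s, w, p))        # 67
print(h_mul_exact(123456, s, w, p))  # 67
```

The two versions agree because kA mod 1 equals r0 / 2^w, so multiplying by m = 2^p and taking the floor is the same as shifting r0 right by w - p bits.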

10
The multiplication method
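[Figure: the w-bit key k is multiplied by the w-bit value s = A·2^w. The 2w-bit product is split into a high-order word r1 and a low-order word r0, and h(k) is the p most significant bits of r0.]
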
11
Open addressing

Open addressing
All elements are stored in the hash table itself.
The advantage of open addressing
It avoids pointers altogether.
The memory freed by not storing pointers provides
the hash table with a larger number of slots for
the same amount of memory,
potentially yielding fewer collisions and faster retrieval.

12
Open addressing

Insertion
Successively examine (probe) the hash table until
an empty slot is found.
The sequence of positions probed depends upon the
key being inserted.
The probe sequence for every key k must
be a permutation of <0, 1, ..., m-1>:

<h(k, 0), h(k, 1), ..., h(k, m-1)>
13
Open addressing

HASH-INSERT
It takes as input a hash table T and a key k.
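
A minimal Python sketch of HASH-INSERT as described on the insertion slide. It assumes that empty slots hold None and that the probe function h(k, i) is passed in as a parameter. The linear probe in the usage lines is only a placeholder.

```python
from typing import Callable, List, Optional


def hash_insert(T: List[Optional[int]], k: int,
                h: Callable[[int, int], int]) -> int:
    """Insert key k into table T and return the slot used.

    Probes h(k, 0), h(k, 1), ... until an empty (None) slot is found.
    """
    m = len(T)
    for i in range(m):
        j = h(k, i)
        if T[j] is None:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


# Placeholder probe function for a table of m = 5 slots (an assumption).
T = [None] * 5
probe = lambda k, i: (k + i) % 5
print(hash_insert(T, 7, probe))   # 2
print(hash_insert(T, 12, probe))  # 3, because slot 2 is taken
print(T)                          # [None, None, 7, 12, None]
```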

14
Open addressing

HASH-SEARCH
It takes as input a hash table T and a key k.
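
A minimal Python sketch of HASH-SEARCH, under the same assumptions as insertion: empty slots hold None and the probe function h(k, i) is supplied by the caller. The search follows the same probe sequence an insertion of k would follow and can stop at the first empty slot.

```python
from typing import Callable, List, Optional


def hash_search(T: List[Optional[int]], k: int,
                h: Callable[[int, int], int]) -> Optional[int]:
    """Return the slot holding key k, or None if k is not in T."""
    m = len(T)
    for i in range(m):
        j = h(k, i)
        if T[j] is None:
            return None  # k would have been inserted here, so it is absent
        if T[j] == k:
            return j
    return None          # all m slots probed without finding k


# A table of m = 5 slots in which 7 and 12 both hashed to slot 2
# (placeholder probe function, an assumption).
T = [None, None, 7, 12, None]
probe = lambda k, i: (k + i) % 5
print(hash_search(T, 12, probe))  # 3
print(hash_search(T, 17, probe))  # None
```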

15
Open addressing

Deletion
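Can you remove the key physically?
No. Emptying the slot would cut off the probe
sequences that pass through it, so keys inserted
after it could no longer be found.
Instead, mark the slot with the special value DELETED.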
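
A minimal Python sketch of deletion with a DELETED marker. It assumes that empty slots hold None, that the marker is a unique sentinel object, and that insertion may reuse marked slots. The function names and the placeholder probe function are illustrative.

```python
DELETED = object()  # sentinel marking a slot whose key was removed


def insert(T, k, h):
    """Insert k into the first slot on its probe sequence that is free."""
    for i in range(len(T)):
        j = h(k, i)
        if T[j] is None or T[j] is DELETED:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def search(T, k, h):
    """Return the slot of k, skipping DELETED slots; None if absent."""
    for i in range(len(T)):
        j = h(k, i)
        if T[j] is None:
            return None
        if T[j] is not DELETED and T[j] == k:
            return j
    return None


def delete(T, k, h):
    """Mark the slot of k as DELETED instead of emptying it."""
    j = search(T, k, h)
    if j is not None:
        T[j] = DELETED
    return j


T = [None] * 5
probe = lambda k, i: (k + i) % 5
insert(T, 7, probe)    # lands in slot 2
insert(T, 12, probe)   # collides at slot 2, lands in slot 3
delete(T, 7, probe)    # slot 2 becomes DELETED, not None
print(search(T, 12, probe))  # 3
```

Had slot 2 been reset to None, the search for 12 would have stopped there and reported the key as missing.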

16
Open addressing

Three common techniques for open addressing.
Linear probing
Quadratic probing
Double hashing

17
Linear probing

Given an ordinary hash function h': U → {0, 1,
..., m-1}, which we refer to as an auxiliary
hash function, the method of linear probing uses
the hash function

h(k, i) = (h'(k) + i) mod m

for i = 0, 1, ..., m-1.
18
Linear probing
[Figure: hash table T with slots 0 to 12, filled by linear probing.]

m = 13
k = 5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27
h(k, i) = (k + i) mod 13

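
The example above can be replayed with a short Python sketch. It assumes the keys are inserted in the order listed and that empty slots hold None. The function name build_linear is illustrative.

```python
from typing import List, Optional


def build_linear(keys: List[int], m: int) -> List[Optional[int]]:
    """Insert keys in order using h(k, i) = (k + i) mod m."""
    T: List[Optional[int]] = [None] * m
    for k in keys:
        for i in range(m):
            j = (k + i) % m
            if T[j] is None:
                T[j] = k
                break
        else:
            raise OverflowError("hash table overflow")
    return T


keys = [5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27]
for slot, key in enumerate(build_linear(keys, 13)):
    print(slot, key)
```

Under these assumptions only slot 0 stays empty, and key 27, which hashes to slot 1, ends up in slot 11 after 11 probes.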
19
Linear probing

Linear probing is easy to implement, but it
suffers from a problem known as primary
clustering.
Long runs of occupied slots build up, increasing
the average search time.
Clusters arise since an empty slot preceded by i
full slots gets filled next with probability
(i + 1)/m.
Long runs of occupied slots tend to get longer,
and the average search time increases.

20
Quadratic probing

Quadratic probing uses a hash function of the form

h(k, i) = (h'(k) + c1 i + c2 i^2) mod m

where h' is an auxiliary hash function, c1 and c2
≠ 0 are auxiliary constants, and i = 0, 1, ..., m-1.
21
Quadratic probing
[Figure: hash table T with slots 0 to 12, filled by quadratic probing.]

m = 13
k = 5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27
h(k, i) = (k + i + 3i^2) mod 13

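
The same keys can be replayed under quadratic probing with a short Python sketch. It assumes insertion in the order listed, None for empty slots, and the constants c1 = 1, c2 = 3 from the formula on this slide. The function name build_quadratic is illustrative.

```python
from typing import List, Optional


def build_quadratic(keys: List[int], m: int,
                    c1: int = 1, c2: int = 3) -> List[Optional[int]]:
    """Insert keys in order using h(k, i) = (k + c1*i + c2*i^2) mod m."""
    T: List[Optional[int]] = [None] * m
    for k in keys:
        for i in range(m):
            j = (k + c1 * i + c2 * i * i) % m
            if T[j] is None:
                T[j] = k
                break
        else:
            # The m probes need not cover every slot, so this can
            # trigger even when the table is not full.
            raise OverflowError("no free slot found in m probes")
    return T


keys = [5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27]
for slot, key in enumerate(build_quadratic(keys, 13)):
    print(slot, key)
```

Under these assumptions slot 10 stays empty, and key 27 is stored in slot 11 after 7 probes.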
22
Quadratic probing

If two keys have the same initial probe position,
then their probe sequences are the same, since
h(k1, 0) = h(k2, 0) implies h(k1, i) = h(k2, i).
This property leads to a milder form of
clustering, called secondary clustering.
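
A small Python check of this property, reusing m = 13 and the constants c1 = 1, c2 = 3 from the quadratic probing example. The keys 5 and 18 are chosen because both start at slot 5. The function name quad_probes is illustrative.

```python
def quad_probes(k: int, m: int = 13, c1: int = 1, c2: int = 3) -> list:
    """Full probe sequence h(k, 0), ..., h(k, m-1) for quadratic probing."""
    return [(k + c1 * i + c2 * i * i) % m for i in range(m)]


print(quad_probes(5)[:4])                 # [5, 9, 6, 9]
print(quad_probes(18)[:4])                # [5, 9, 6, 9]
print(quad_probes(5) == quad_probes(18))  # True
```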

23
Double hashing

Double hashing uses a hash function of the form

h(k, i) = (h1(k) + i h2(k)) mod m

where h1 and h2 are auxiliary hash functions.
The initial probe is to position T[h1(k)].
Successive probe positions are offset from
previous positions by the amount h2(k), modulo m.
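
A minimal Python sketch of double hashing probe sequences. The slides do not fix h1 and h2, so the choices h1(k) = k mod m and h2(k) = 1 + (k mod (m - 2)) with m = 13 are assumptions made for illustration.

```python
def h1(k: int, m: int) -> int:
    """First auxiliary hash (assumed): k mod m."""
    return k % m


def h2(k: int, m: int) -> int:
    """Second auxiliary hash (assumed): always between 1 and m - 2."""
    return 1 + (k % (m - 2))


def double_hash_probes(k: int, m: int) -> list:
    """Probe sequence h(k, i) = (h1(k) + i * h2(k)) mod m for i < m."""
    return [(h1(k, m) + i * h2(k, m)) % m for i in range(m)]


# 14 and 27 share the initial slot 1 but diverge afterwards.
print(double_hash_probes(14, 13)[:3])  # [1, 5, 9]
print(double_hash_probes(27, 13)[:3])  # [1, 7, 0]
```

Unlike linear and quadratic probing, two keys with the same initial probe position can follow different probe sequences here, because the step size depends on the key.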
24
Double hashing

The value h2(k) must be relatively prime to the
hash-table size m for the entire hash table to be
searched.
A way to ensure this condition is to let m be a
power of 2 and to design h2 so that it always
produces an odd number.
Another way is to let m be prime and to design h2
so that it always returns a positive integer less
than m.
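
The relative-primality condition can be checked with a short Python sketch. The function name slots_reached and the table size m = 8 are illustrative choices.

```python
from math import gcd


def slots_reached(step: int, m: int, start: int = 0) -> set:
    """Set of slots visited by m probes that advance by step modulo m."""
    return {(start + i * step) % m for i in range(m)}


m = 8  # a power of 2
print(sorted(slots_reached(3, m)))  # odd step:  [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(slots_reached(2, m)))  # even step: [0, 2, 4, 6]
print(gcd(3, m), gcd(2, m))         # 1 2
```

In general the probes reach m / gcd(h2(k), m) distinct slots, so the whole table is searched only when the gcd is 1.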

25
Double hashing
[Figure: hash table T.]