11. Hash Tables

Author: WHITE. Created: 3/23/2006 4:00:39 AM. Format: PowerPoint presentation.

1
11. Hash Tables
  • Heejin Park
  • College of Information and Communications
  • Hanyang University

2
Contents
  • Direct-address tables
  • Hash tables
  • Hash functions
  • Open addressing

3
Hash functions
  • What makes a good hash function?
  • Assumption of simple uniform hashing
  • Each key is equally likely to hash to any of the
    m slots.
  • Each key hashes independently of where any other
    key has hashed to.

4
Hash functions
  • Interpreting keys as natural numbers
  • Hash functions assume that the universe of keys
    is the set N = {0, 1, 2, ...} of natural numbers.
  • If the keys are not natural numbers, a way is
    found to interpret them as natural numbers.
  • For example, take the character string pt.
  • p = 112 and t = 116 in the ASCII character set.
  • Expressed as a radix-128 integer,
    pt becomes (112 × 128) + 116 = 14452.
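
The radix-128 interpretation above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the function name string_to_natural is an assumption made for the example.

```python
def string_to_natural(text: str, radix: int = 128) -> int:
    """Interpret a character string as a natural number written in `radix`.

    Each character contributes its character code as one digit, most
    significant digit first. (Illustrative helper, not from the slides.)
    """
    value = 0
    for ch in text:
        code = ord(ch)
        if code >= radix:
            raise ValueError(f"character {ch!r} does not fit in radix {radix}")
        value = value * radix + code  # shift previous digits left, append new one
    return value


print(string_to_natural("pt"))  # 14452, i.e. (112 * 128) + 116
```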

5
The division method
  • The division method
  • Map a key k into one of m slots by taking the
    remainder of k divided by m.
  • h(k) = k mod m.
  • Example
  • m = 12, k = 100
  • h(k) = 100 mod 12 = 4
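
As a quick check of the example above, the division method is a single remainder operation. The sketch below assumes keys are natural numbers; the name division_hash is illustrative.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: map natural-number key k to one of m slots."""
    if k < 0 or m <= 0:
        raise ValueError("k must be a natural number and m must be positive")
    return k % m


print(division_hash(100, 12))  # 4
```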

6
The division method
  • Certain values of m should be avoided.
  • m = 2^p
  • h(k) is just the p lowest-order bits of k.
  • Example: m = 2^4, h(k) = k mod 2^4
  • k = 10110100, m = 00010000, h(k) = 0100 (in binary)
  • Thus m should not be a power of 2.
  • m = 2^p - 1
  • When k is a character string interpreted in radix
    2^p, permuting the characters of k does not change
    its hash value.
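
Both pitfalls can be observed directly. The sketch below reuses the binary example from the slide for m = 2^4, and uses the strings "pt" and "tp" with radix 2^7 = 128 and m = 127 as an assumed illustration of the permutation problem; the helper names are not from the slides.

```python
def string_to_natural(text: str, radix: int = 128) -> int:
    """Interpret a string as a natural number in the given radix (illustrative)."""
    value = 0
    for ch in text:
        value = value * radix + ord(ch)
    return value


def low_order_bits(k: int, p: int) -> int:
    """Return the p lowest-order bits of k."""
    return k & ((1 << p) - 1)


# Case m = 2^p: the hash ignores everything except the p lowest-order bits.
k = 0b10110100
print(format(k % 2**4, "04b"))              # 0100
print(k % 2**4 == low_order_bits(k, 4))     # True

# Case m = 2^p - 1 with radix 2^p: permuting characters gives the same hash,
# because 2^p is congruent to 1 modulo 2^p - 1.
m = 2**7 - 1
print(string_to_natural("pt") % m)          # 101
print(string_to_natural("tp") % m)          # 101
```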

7
The division method
  • A good choice
  • A prime not too close to an exact power of 2.
  • If the number of keys is about 2000 and we don't
    mind examining an average of 3 elements in an
    unsuccessful search, we can allocate a hash table
    of size m = 701.
  • 701 is a prime near 2000/3 but not near any power
    of 2.
  • h(k) = k mod 701.
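
The numbers in this example can be checked with a short sketch. It assumes the average of 3 elements refers to the load factor n/m (about 3 keys per slot, which implies that several keys share a slot); the helper is_prime is illustrative.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small table sizes."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


n, m = 2000, 701
print(is_prime(m))            # True
print(round(n / m, 2))        # 2.85, close to the target of 3 keys per slot
print(m - 512, 1024 - m)      # 189 323: distance to the nearest powers of 2
```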

8
The multiplication method
  • The multiplication method
  • h(k) = ⌊m (kA mod 1)⌋
  • Multiply the key k by a constant A in the range
    0 < A < 1 and extract the fractional part of
    kA.
  • Multiply this value by m and take the floor of
    the result.
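
The two steps above translate directly into floating-point code. This is a minimal sketch; the function name is illustrative, and the default A = (√5 - 1)/2 is an assumption for the example (a commonly suggested constant), not a value given in the slides.

```python
import math


def multiplication_hash(k: int, m: int, A: float = (math.sqrt(5) - 1) / 2) -> int:
    """Multiplication method: floor(m * (k*A mod 1)).

    A must satisfy 0 < A < 1. The default value is an assumed choice for
    illustration only.
    """
    if not 0 < A < 1:
        raise ValueError("A must lie strictly between 0 and 1")
    fractional = (k * A) % 1.0        # fractional part of k*A
    return math.floor(m * fractional)


# With A = 0.25 the arithmetic is easy to follow by hand:
# 3 * 0.25 = 0.75, fractional part 0.75, floor(8 * 0.75) = 6.
print(multiplication_hash(3, 8, A=0.25))  # 6
```

Floating-point rounding limits this version to keys of moderate size.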

9
The multiplication method
  • An advantage of the multiplication method is that
    the value of m is not critical.
  • We typically choose m to be a power of 2 (m =
    2^p).
  • It makes the implementation of the function easy.
  • Suppose that the word size of the machine is w
    bits and that k fits into a single word.
  • We restrict A to be a fraction of the form s/2^w,
  • where s is an integer in the range 0 < s < 2^w.

10
The multiplication method
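[Figure: the multiplication method. The w-bit key k is multiplied by the w-bit value s = A · 2^w, giving a 2w-bit product with high-order word r1 and low-order word r0; h(k) is the p most significant bits of r0.]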
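
With m = 2^p and A = s/2^w, the method needs only an integer multiplication, a mask, and a shift. The sketch below is one way to write it; the function name and the example values (k = 123456, s = 2654435769, w = 32, p = 14) are assumptions chosen for illustration and do not appear in the slides.

```python
def multiplication_hash_fixed(k: int, s: int, w: int, p: int) -> int:
    """Fixed-point multiplication method with m = 2**p and A = s / 2**w."""
    if not 0 <= k < (1 << w):
        raise ValueError("k must fit in a single w-bit word")
    if not 0 < s < (1 << w):
        raise ValueError("s must satisfy 0 < s < 2**w")
    if not 0 < p <= w:
        raise ValueError("p must satisfy 0 < p <= w")
    product = k * s                    # 2w-bit value r1 * 2**w + r0
    r0 = product & ((1 << w) - 1)      # low-order word: fractional part of k*A
    return r0 >> (w - p)               # p most significant bits of r0


print(multiplication_hash_fixed(123456, 2654435769, 32, 14))  # 67
```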
11
Open addressing
  • Open addressing
  • All elements are stored in the hash table itself.
  • The advantage of open addressing
  • It avoids pointers altogether.
  • The memory freed by not storing pointers provides
    the hash table with a larger number of slots for
    the same amount of memory,
  • yielding fewer collisions and faster retrieval.

12
Open addressing
  • Insertion
  • Successively examine (probe) the hash table until
    an empty slot is found.
  • The sequence of positions probed depends upon the
    key being inserted.
  • The probe sequence for every key k,
    ⟨h(k, 0), h(k, 1), ..., h(k, m-1)⟩,
  • must be a permutation of ⟨0, 1, ..., m-1⟩.

13
Open addressing
  • HASH-INSERT
  • It takes as input a hash table T and a key k.
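
The slide's pseudocode did not survive in this transcript. A minimal sketch of insertion under open addressing, assuming empty slots hold None and the probe function h(k, i) is passed in as a parameter, might look as follows; the names and the linear probe used in the example are illustrative.

```python
from typing import Callable, List, Optional


def hash_insert(table: List[Optional[int]], k: int,
                probe: Callable[[int, int], int]) -> int:
    """Insert key k into table and return the slot used.

    probe(k, i) gives the i-th position of k's probe sequence.
    """
    m = len(table)
    for i in range(m):
        j = probe(k, i)
        if table[j] is None:        # first empty slot on the probe sequence
            table[j] = k
            return j
    raise OverflowError("hash table overflow")


m = 13
table: List[Optional[int]] = [None] * m
linear = lambda k, i: (k + i) % m   # assumed probe function for the example
print(hash_insert(table, 5, linear))    # 5
print(hash_insert(table, 18, linear))   # 6 (slot 5 is already taken)
```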

14
Open addressing
  • HASH-SEARCH
  • It takes as input a hash table T and a key k.
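
The search pseudocode is likewise missing from the transcript. The sketch below assumes the same conventions as insertion (None marks an empty slot, the probe function is a parameter): it follows the key's probe sequence and stops at the first empty slot, since the key would have been placed there had it been inserted. Names and example data are illustrative.

```python
from typing import Callable, List, Optional


def hash_search(table: List[Optional[int]], k: int,
                probe: Callable[[int, int], int]) -> Optional[int]:
    """Return the slot holding k, or None if k is not in the table."""
    m = len(table)
    for i in range(m):
        j = probe(k, i)
        if table[j] is None:    # empty slot: k cannot be further along
            return None
        if table[j] == k:
            return j
    return None                 # whole probe sequence examined


m = 13
linear = lambda k, i: (k + i) % m   # assumed probe function for the example
table: List[Optional[int]] = [None] * m
table[5], table[6] = 5, 18          # 18 collided with 5 and moved to slot 6
print(hash_search(table, 18, linear))   # 6
print(hash_search(table, 31, linear))   # None (probes 5, 6, then empty slot 7)
```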

15
Open addressing
  • Deletion
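  • Can you remove the key physically? No: emptying
    the slot would cut the probe sequences that pass
    through it, so keys inserted after it could no
    longer be found.
  • Instead, mark the slot with the special value
    DELETED.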
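
The effect of the DELETED marker can be sketched as follows. This is an assumed implementation for illustration: search skips DELETED slots and keeps probing, while insertion treats them as free. The sentinel object, function names, and linear probe are not from the slides.

```python
from typing import Callable, List, Optional

DELETED = object()  # sentinel marking a slot whose key was removed


def hash_search(table: List, k: int, probe: Callable[[int, int], int]) -> Optional[int]:
    for i in range(len(table)):
        j = probe(k, i)
        if table[j] is None:                       # truly empty: stop
            return None
        if table[j] is not DELETED and table[j] == k:
            return j
    return None


def hash_delete(table: List, k: int, probe: Callable[[int, int], int]) -> bool:
    j = hash_search(table, k, probe)
    if j is None:
        return False
    table[j] = DELETED                             # keep the probe chain intact
    return True


def hash_insert(table: List, k: int, probe: Callable[[int, int], int]) -> int:
    for i in range(len(table)):
        j = probe(k, i)
        if table[j] is None or table[j] is DELETED:  # DELETED slots are reusable
            table[j] = k
            return j
    raise OverflowError("hash table overflow")


m = 13
probe = lambda k, i: (k + i) % m
table: List = [None] * m
hash_insert(table, 5, probe)             # lands in slot 5
hash_insert(table, 18, probe)            # collides at 5, lands in slot 6
hash_delete(table, 5, probe)             # slot 5 becomes DELETED, not None
print(hash_search(table, 18, probe))     # 6: still reachable past slot 5
print(hash_insert(table, 31, probe))     # 5: the DELETED slot is reused
```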

16
Open addressing
  • Three common techniques for open addressing.
  • Linear probing
  • Quadratic probing
  • Double hashing

17
Linear probing
  • Given an ordinary hash function h' : U → {0, 1,
    ..., m-1}, which we refer to as an auxiliary
    hash function, the method of linear probing uses
    the hash function
    h(k, i) = (h'(k) + i) mod m
  • for i = 0, 1, ..., m-1.

18
Linear probing
  • m = 13
  • k = 5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27
  • h(k, i) = (k + i) mod 13

[Figure: hash table T with slots 0 to 12, filled by inserting the keys in order using linear probing.]
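
The example can be replayed with a short simulation. The sketch below inserts the keys in the listed order with h(k, i) = (k + i) mod 13; the function name is illustrative, and the printed table is what this sketch computes, not a transcription of the original figure.

```python
from typing import List, Optional


def insert_all_linear(keys: List[int], m: int) -> List[Optional[int]]:
    """Insert keys in order using linear probing with h'(k) = k."""
    table: List[Optional[int]] = [None] * m
    for k in keys:
        for i in range(m):
            j = (k + i) % m
            if table[j] is None:
                table[j] = k
                break
        else:
            raise OverflowError("hash table overflow")
    return table


keys = [5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27]
print(insert_all_linear(keys, 13))
# [None, 14, 15, 29, 17, 5, 18, 32, 21, 20, 9, 27, 25]
```

Key 27 hashes to slot 1 and has to step through the whole run of occupied slots 1 to 10 before reaching slot 11.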
19
Linear probing
  • Linear probing is easy to implement, but it
    suffers from a problem known as primary
    clustering.
  • Long runs of occupied slots build up, increasing
    the average search time.
  • Clusters arise since an empty slot preceded by i
    full slots gets filled next with probability
    (i + 1) / m.
  • Long runs of occupied slots therefore tend to get
    longer, and the average search time increases.
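
The (i + 1) / m rule can be made concrete by computing, for each empty slot, the probability that the next insertion lands there. The sketch assumes the initial probe position is uniformly distributed over the m slots; the function name and the 8-slot example table are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List


def next_fill_probabilities(occupied: List[bool]) -> Dict[int, Fraction]:
    """For each empty slot, the chance the next key inserted ends up there.

    An empty slot preceded by i full slots is reached from i + 1 starting
    positions: itself and each of the i full slots before it.
    """
    m = len(occupied)
    probs: Dict[int, Fraction] = {}
    for slot in range(m):
        if occupied[slot]:
            continue
        run = 0
        j = (slot - 1) % m
        while occupied[j]:          # count full slots immediately before `slot`
            run += 1
            j = (j - 1) % m
        probs[slot] = Fraction(run + 1, m)
    return probs


# Hypothetical table with m = 8 and a cluster in slots 1, 2, 3.
occupied = [False, True, True, True, False, False, False, False]
probs = next_fill_probabilities(occupied)
print(probs[4])   # 1/2: slot 4 follows the cluster, so it is 4 times likelier
print(probs[5])   # 1/8
```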

20
Quadratic probing
  • Quadratic probing uses a hash function of the form
    h(k, i) = (h'(k) + c1 i + c2 i²) mod m
  • where h' is an auxiliary hash function, c1 and c2
    ≠ 0 are auxiliary constants, and i = 0, 1, ..., m-1.

21
Quadratic probing
  • m = 13
  • k = 5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27
  • h(k, i) = (k + i + 3i²) mod 13

[Figure: hash table T with slots 0 to 12, filled by inserting the keys in order using quadratic probing.]
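
The same keys can be replayed with the quadratic probe h(k, i) = (k + i + 3i²) mod 13, that is, h'(k) = k, c1 = 1, c2 = 3. The function name is illustrative, and the printed table is what this sketch computes rather than a transcription of the original figure.

```python
from typing import List, Optional


def insert_all_quadratic(keys: List[int], m: int,
                         c1: int = 1, c2: int = 3) -> List[Optional[int]]:
    """Insert keys in order using quadratic probing with h'(k) = k."""
    table: List[Optional[int]] = [None] * m
    for k in keys:
        for i in range(m):
            j = (k + c1 * i + c2 * i * i) % m
            if table[j] is None:
                table[j] = k
                break
        else:
            # A quadratic probe sequence need not visit every slot, so this
            # can happen even when the table still has empty slots.
            raise OverflowError("no empty slot found on the probe sequence")
    return table


keys = [5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27]
print(insert_all_quadratic(keys, 13))
# [9, 14, 15, 29, 17, 5, 32, 20, 21, 18, None, 27, 25]
```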
22
Quadratic probing
  • If two keys have the same initial probe position,
    then their probe sequences are the same, since
    h(k1, 0) = h(k2, 0) implies h(k1, i) = h(k2, i).
  • This property leads to a milder form of
    clustering, called secondary clustering.
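
Secondary clustering is easy to observe: two keys that collide on the first probe follow identical probe sequences. The sketch uses keys 5 and 18 (both start at slot 5 when m = 13) with the constants c1 = 1 and c2 = 3 from the earlier example; the function name is illustrative.

```python
from typing import List


def quadratic_sequence(k: int, m: int = 13, c1: int = 1, c2: int = 3) -> List[int]:
    """Full probe sequence of key k under quadratic probing with h'(k) = k mod m."""
    return [((k % m) + c1 * i + c2 * i * i) % m for i in range(m)]


print(quadratic_sequence(5)[:4])                        # [5, 9, 6, 9]
print(quadratic_sequence(5) == quadratic_sequence(18))  # True: same start, same path
print(quadratic_sequence(5) == quadratic_sequence(6))   # False: different start
```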

23
Double hashing
  • Double hashing uses a hash function of the form
    h(k, i) = (h1(k) + i · h2(k)) mod m
  • The initial probe is to position T[h1(k)].
  • Successive probe positions are offset from
    previous positions by the amount h2(k), modulo m.

24
Double hashing
  • The value h2(k) must be relatively prime to the
    hash-table size m for the entire hash table to be
    searched.
  • A way to ensure this condition is to let m be a
    power of 2 and to design h2 so that it always
    produces an odd number.
  • Another way is to let m be prime and to design h2
    so that it always returns a positive integer less
    than m.
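
The relative-primality condition can be checked by counting how many distinct slots a probe sequence reaches. The sketch below uses small hypothetical values of m and h2(k); the function name is illustrative.

```python
from math import gcd
from typing import Set


def slots_visited(h1_value: int, h2_value: int, m: int) -> Set[int]:
    """Distinct slots reached by (h1 + i * h2) mod m for i = 0, ..., m - 1."""
    return {(h1_value + i * h2_value) % m for i in range(m)}


# gcd(4, 12) = 4: only 12 / 4 = 3 slots are ever probed.
print(len(slots_visited(0, 4, 12)), gcd(4, 12))    # 3 4
# gcd(5, 12) = 1: every slot is probed.
print(len(slots_visited(0, 5, 12)), gcd(5, 12))    # 12 1

# m a power of 2 with odd h2, and m prime with 0 < h2 < m, both cover the table.
print(all(len(slots_visited(0, h2, 16)) == 16 for h2 in range(1, 16, 2)))  # True
print(all(len(slots_visited(0, h2, 13)) == 13 for h2 in range(1, 13)))     # True
```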

25
Double hashing
  • m = 13
  • k = 5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27
  • h1(k) = k mod 13
  • h2(k) = 1 + (k mod 11)
  • h(k, i) = (h1(k) + i · h2(k)) mod 13

[Figure: hash table T with slots 0 to 12, filled by inserting the keys in order using double hashing.]
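
The double-hashing example can be replayed in the same way, with h1(k) = k mod 13 and h2(k) = 1 + (k mod 11). The function name is illustrative, and the printed table is what this sketch computes rather than a transcription of the original figure.

```python
from typing import List, Optional


def insert_all_double(keys: List[int], m: int = 13) -> List[Optional[int]]:
    """Insert keys in order using double hashing with the example's h1 and h2."""
    table: List[Optional[int]] = [None] * m
    for k in keys:
        h1 = k % m
        h2 = 1 + (k % 11)               # always in 1..11, so relatively prime to 13
        for i in range(m):
            j = (h1 + i * h2) % m
            if table[j] is None:
                table[j] = k
                break
        else:
            raise OverflowError("hash table overflow")
    return table


keys = [5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27]
print(insert_all_double(keys))
# [18, 14, 15, 29, 17, 5, 32, 20, 21, 9, None, 27, 25]
```

Keys 5 and 18 both start at slot 5 but use different offsets (6 and 8), so their probe sequences diverge after the first probe.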