Hash table - PowerPoint PPT Presentation

About This Presentation

Title:

Hash table

Provided by: phi762

Learn more at: http://www.cs.gsu.edu

Transcript and Presenter's Notes

Title: Hash table

1
Hash table
2
Objective

To learn
Hash function
Linear probing
Quadratic probing
Chained hash table

3
A basic problem

We have to store some records and perform the
following operations
add a new record
delete a record
search for a record by key
Find a way to do these efficiently!

4
Unsorted array

Use an array to store the records in unsorted
order
add - append the record as the last entry: fast, O(1)
delete a target - slow at finding the target, O(n),
but fast at filling the hole (just move the last
entry into it)
search - sequential search: slow, O(n)
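
The three operations above can be sketched as follows. This is a minimal illustration, not code from the slides: the function names and the (key, value) record layout are assumptions.

```python
def unsorted_add(records, key, value):
    """Append at the end of the array: O(1)."""
    records.append((key, value))


def unsorted_search(records, key):
    """Sequential search: O(n). Returns the index, or -1 if absent."""
    for i, (k, _) in enumerate(records):
        if k == key:
            return i
    return -1


def unsorted_delete(records, key):
    """Find the target in O(n), then fill the hole in O(1)."""
    i = unsorted_search(records, key)
    if i == -1:
        return False
    records[i] = records[-1]  # move the last entry into the hole
    records.pop()             # drop the now-duplicated last entry
    return True
```

Because order does not matter in an unsorted array, the hole can be filled with the last entry instead of shifting every later record.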

5
Sorted array

Use an array to store the records, keeping them
in sorted order
add - insert the record in its proper position;
much record movement: slow, O(n)
delete a target - the hole left by the deletion
must be closed; much record movement: slow, O(n)
search - binary search: fast, O(log n)
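
A minimal sketch of the sorted-array variant, using Python's standard bisect module for the binary search. The two parallel lists and the function names are assumptions made for illustration.

```python
import bisect


def sorted_add(keys, values, key, value):
    """Binary search for the position, then shift later entries: O(n)."""
    i = bisect.bisect_left(keys, key)
    keys.insert(i, key)       # list.insert shifts everything after i
    values.insert(i, value)


def sorted_search(keys, values, key):
    """Binary search: O(log n). Returns the value, or None if absent."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None


def sorted_delete(keys, values, key):
    """Locate in O(log n), then close the hole by shifting: O(n)."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        del keys[i]
        del values[i]
        return True
    return False
```

Finding the position is cheap in both add and delete; the O(n) cost comes from moving the records that follow it.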

6
Linked list

Store the records in a linked list (sorted /
unsorted)
add - fast if the node can be inserted anywhere: O(1)
delete a target - fast at disposing of the node, but
slow at finding the target: O(n)
search - sequential search: slow, O(n) (if we only
use a linked list, we cannot use binary search even
if the list is sorted)

7
Array as table
studid    name    score
0012345   andy    81.5
0033333   betty   90
0056789   david   56.8
...
9801010   peter   20
9802020   mary    100
...
9903030   tom     73
9908080   bill    49

Consider this problem. We want to store 1000
student records and search them by student id.
8
Array as table
One naive way is to store the records in a huge
array (index 0..9999999). The index is used as
the student id, i.e. the record of the student
with studid 0012345 is stored at A[12345].

index     name    score
0
...
12345     andy    81.5
...
33333     betty   90
...
56789     david   56.8
...
9908080   bill    49
...
9999999
9
Array as table

Store the records in a huge array where the index
corresponds to the key
add - very fast O(1)
delete - very fast O(1)
search - very fast O(1)
But it wastes a lot of memory! Not feasible.

10
Hash function
function Hash(key: KeyType): integer
Imagine that we have such a magic function Hash.
It maps the keys (stud_id) of the 1000 records
onto the integers 0..999, one to one. No two
different keys map to the same number.
H(0012345) = 134, H(0033333) = 67,
H(0056789) = 764, H(9908080) = 3
11
Hash table
To store a record, we compute Hash(stud_id) for
the record and store it at the location
Hash(stud_id) of the array. To search for a
student, we only need to peek at the location
Hash(target stud_id).

index   studid    name    score
0
...
3       9908080   bill    49
...
67      0033333   betty   90
...
134     0012345   andy    81.5
...
764     0056789   david   56.8
...
999

12
Hash table with Perfect Hash

Such a magic function is called a perfect hash
function
add - very fast, O(1)
delete - very fast, O(1)
search - very fast, O(1)
But it is generally difficult to design a perfect
hash function (e.g. when the potential key space
is large).

13
Hash function

A hash function maps a key to an index within
a range
Desirable properties
simple and quick to calculate
even distribution, avoiding collisions as much as
possible

function Hash(key: KeyType): integer
14
Division Method
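h(k) = k mod m

Certain values of m may not be good
Good values for m are prime numbers which are not
close to exact powers of 2. For example, if you
want to store 2000 elements, then m = 701 (m = hash
table length) yields the hash function

h(k) = k mod 701

With 2000 elements in 701 slots, each slot holds
about three elements on average, so this choice
assumes that collisions are resolved, e.g. by
chaining.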
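
A small sketch of why the choice of m matters. The key set (multiples of 256) and the comparison value m = 512 are assumptions chosen to make the effect visible; only m = 701 comes from the slide.

```python
def division_hash(key, m=701):
    """Division method: h(k) = k mod m."""
    return key % m


# Hypothetical keys that share a common factor with a power-of-2 table size.
keys = [256 * j for j in range(1000)]

slots_power_of_two = {division_hash(k, 512) for k in keys}
slots_prime = {division_hash(k, 701) for k in keys}

print(len(slots_power_of_two))  # 2
print(len(slots_prime))         # 701
```

With m = 512 only the low-order bits of the key are used, so these keys pile into two slots; with the prime m = 701 they spread over the whole table.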
15
Collision

In most cases, we cannot avoid collisions
Collision resolution - how to handle the case where
two different keys map to the same index

H(0012345) = 134, H(0033333) = 67,
H(0056789) = 764, H(9903030) = 3, H(9908080) = 3
16
Hash Tables

The problem arises because we have two keys that
hash to the same array entry: a collision. There
are two ways to resolve a collision
Hashing with Chaining: every hash table entry
contains a pointer to a linked list of the keys
that hash to that entry
Hashing with Open Addressing: every hash table
entry contains only one key. If a new key hashes
to a table entry which is filled, systematically
examine other table entries until you find an
empty entry in which to place the new key

17
Open Addressing

The key is first mapped to a slot, h1(k)
If there is a collision, subsequent probes are
performed: h(k, i) = (h1(k) + i*c) mod m,
for i = 0, 1, 2, ...
If the offset constant c and the table size m are
not relatively prime, we will not examine all the
cells. Ex.
Consider m = 4 and c = 2; then only every other
slot is checked.
When c = 1, the collision resolution is done as a
linear search. This is known as linear probing.
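
The effect of the offset constant can be checked with a short sketch. It assumes the probe sequence has the form (home + i*c) mod m; the function name is illustrative.

```python
from math import gcd


def probe_sequence(home, c, m):
    """Slots visited by (home + i*c) mod m for i = 0 .. m-1."""
    return [(home + i * c) % m for i in range(m)]


print(probe_sequence(1, 2, 4))  # [1, 3, 1, 3]
print(probe_sequence(1, 1, 4))  # [1, 2, 3, 0]

# The number of distinct slots reached is m / gcd(c, m).
for c in range(1, 12):
    assert len(set(probe_sequence(0, c, 12))) == 12 // gcd(c, 12)
```

With m = 4 and c = 2 the probes only ever see slots 1 and 3, while c = 1 (linear probing) reaches every slot.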

18
Linear Probing Example-1
Insert 89, 18, 49, 58, 9 into a table of size 10;
the hash function is key mod tablesize
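
The insertions in this example can be traced with a minimal sketch, assuming the hash function is key mod table size and that empty slots are marked with None. The function name is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing; return the slot it landed in."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m      # step one slot at a time, wrapping around
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


table = [None] * 10
slots = [linear_probe_insert(table, k) for k in (89, 18, 49, 58, 9)]
print(slots)  # [9, 8, 0, 1, 2]
print(table)  # [49, 58, 9, None, None, None, None, None, 18, 89]
```

89 and 18 land in their home slots; 49, 58 and 9 each collide and wrap around to the start of the table, where they end up next to each other.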
19
Linear Probing Example-2

Single character keys, table size m = 8
Hash function (maps characters to the range 0...7)

k       APQ  BOR  CNS  DMT  ELU  FKN  GJWZ  HIXY
h1(k)   0    1    2    3    4    5    6     7

20
Choosing a Hash Function

Notice that the insertion of Q required several
probes (5). This was caused by A and P both mapping
to slot 0, which lies next to the slots holding
the C and D keys.
The performance of the hash table depends on
having a hash function which evenly distributes
the keys.
The statistics of the key distribution need to
be accounted for. For example
choosing the first letter of a surname will cause
problems depending on the nationality of the
population
the variable names in a compiler often differ by
one character, e.g. t1, t2, t3, etc.
Consult computer science texts, such as Knuth's
The Art of Computer Programming.

21
Clustering

Even with a good hash function, linear probing
has its problems
The position of the initial mapping (i = 0) of key
k is called the home position of k.
When several insertions map to the same home
position, they end up placed contiguously in the
table. This collection of keys with the same
home position is called a cluster.
As clusters grow, the probability that a key will
map to the middle of a cluster increases,
increasing the rate of the cluster's growth.
This tendency of linear probing to place items
together is known as primary clustering.
As these clusters grow, they merge with other
clusters forming even bigger clusters which grow
even faster.

22
Performance Analysis

If n slots in a table of size m are occupied, the
load factor is defined as α = n/m, where α = 1
means the table is full and α = 0 means the table
is empty.
It can be shown that the number of probes in a
successful search, C, and the number of probes in
an unsuccessful search, C', are given by formulas
that depend only on α.
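
As a rough sketch, assuming the slide refers to the standard approximations for linear probing found in Knuth's The Art of Computer Programming, C ≈ (1/2)(1 + 1/(1 - α)) and C' ≈ (1/2)(1 + 1/(1 - α)^2), the growth of the probe counts with the load factor can be tabulated. The function names are illustrative.

```python
def probes_successful(alpha):
    """Approximate probes for a successful search (assumed Knuth formula)."""
    return 0.5 * (1 + 1 / (1 - alpha))


def probes_unsuccessful(alpha):
    """Approximate probes for an unsuccessful search (assumed Knuth formula)."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)


for alpha in (0.25, 0.5, 0.75, 0.9):
    print(f"{alpha:.2f}  {probes_successful(alpha):6.2f}  "
          f"{probes_unsuccessful(alpha):6.2f}")
```

Under these approximations a half-full table needs about 1.5 probes for a hit and 2.5 for a miss, while at α = 0.9 a miss costs roughly 50 probes, which is why open-addressing tables are kept well below full.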

23
Quadratic Probing

h_i(k) = (h(k) + f(i)) mod TS, for i = 0, 1, 2, ...
(TS = table size)
h(k) = k mod TS
f(i) = i^2
Theorem 20.4 If quadratic probing is used and
the table size is prime, then a new element can
always be inserted if the table is at least half
empty. Furthermore, in the course of the
insertion, no cell is probed twice.

24
Quadratic Probing Example
Insert 89, 18, 49, 58, 9 into a table of size 10;
the hash function is key mod tablesize
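
The same insertions can be traced with quadratic probing. This is a minimal sketch assuming the hash function is key mod table size and the i-th probe examines (home + i^2) mod table size; the function name is illustrative.

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing; return the slot it landed in."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i * i) % m   # offsets 0, 1, 4, 9, ...
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 10
slots = [quadratic_probe_insert(table, k) for k in (89, 18, 49, 58, 9)]
print(slots)  # [9, 8, 0, 2, 3]
print(table)  # [49, None, 58, 9, None, None, None, None, 18, 89]
```

Compared with linear probing, 58 and 9 jump past the occupied run instead of extending it. Note that a table size of 10 is not prime, so the guarantee of the theorem does not apply here; the example simply happens to find free slots.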
25
Double Hashing

Recall that in open addressing the sequence of
probes follows h(k, i) = (h1(k) + i*c) mod m
We can solve the problem of primary clustering in
linear probing by having the keys which map to
the same home position use differing probe
sequences. In other words, different values
of c should be used for different keys.
Double hashing refers to the scheme of using
another hash function for c: c = h2(k)
Note that h1 and h2 need to be evaluated only
once per key.
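
A minimal sketch of double hashing. The slides do not say which second hash function is used, so h2(k) = R - (k mod R) with R = 7 is an assumption, as are the table size of 11 and the sample keys.

```python
def h1(key, m):
    """Primary hash: the home position."""
    return key % m


def h2(key, r=7):
    """Assumed secondary hash; it is never 0, so each probe moves on."""
    return r - (key % r)


def double_hash_insert(table, key):
    """Insert key using double hashing; return the slot it landed in."""
    m = len(table)
    home, c = h1(key, m), h2(key)   # each evaluated once per key
    for i in range(m):
        slot = (home + i * c) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 11                 # prime size, so every c reaches all slots
slots = [double_hash_insert(table, k) for k in (1, 12, 23)]
print(slots)  # [1, 3, 6]
```

All three keys share home position 1, but their offsets differ (6, 2 and 5), so they follow different probe sequences instead of forming one contiguous cluster.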

26
Chained Hash Table
One way to handle collisions is to store the
collided records in a linked list. The array now
stores pointers to such lists. If no key maps to
a certain hash value, that array entry points to
nil.

index     entry
0         -> ...
1         nil
2         nil
3         -> (Key: 9903030, name: tom, score: 73)
4         nil
5         -> ...
...
HASHMAX   nil
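
A minimal sketch of a chained hash table. It is not the slides' implementation: a Python list stands in for each linked list, None plays the role of nil, and the class name, the method names and the key mod size hash function are assumptions.

```python
class ChainedHashTable:
    def __init__(self, size=701):
        self.buckets = [None] * size        # None plays the role of nil

    def _index(self, key):
        return key % len(self.buckets)      # assumed division-method hash

    def add(self, key, record):
        i = self._index(key)
        if self.buckets[i] is None:
            self.buckets[i] = []            # first record for this slot
        chain = self.buckets[i]
        for j, (k, _) in enumerate(chain):
            if k == key:
                chain[j] = (key, record)    # key already present: overwrite
                return
        chain.append((key, record))

    def search(self, key):
        chain = self.buckets[self._index(key)]
        if chain is None:
            return None
        for k, record in chain:
            if k == key:
                return record
        return None

    def delete(self, key):
        i = self._index(key)
        chain = self.buckets[i]
        if chain is None:
            return False
        for j, (k, _) in enumerate(chain):
            if k == key:
                del chain[j]
                if not chain:
                    self.buckets[i] = None  # empty chain goes back to nil
                return True
        return False


t = ChainedHashTable(size=10)
t.add(9903030, ("tom", 73))
t.add(9908080, ("bill", 49))    # same slot as tom: both keys end in 0
print(t.search(9908080))        # ('bill', 49)
```

Colliding records simply share a chain, so an insertion never has to look for another slot; the cost of a search is the length of the chain it lands on.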
27
Chained Hash table