Hash Tables - PowerPoint PPT Presentation

Slides: 19

Provided by: ellenw4

Learn more at: http://cs.hiram.edu

Transcript and Presenter's Notes

1
Hash Tables

Ellen Walker
CPSC 201 Data Structures
Hiram College

2
Breaking the Rules

The fastest possible search algorithm, if you
only compare two items at once, is O(log n) where
n is the number of items in the table.
But, if we can figure out a way to compare
multiple items at once, we can beat that!

3
Magic Address Calculator

Represent your table as an array
Add a new function, the magic address
calculator
The input to this function is the key
The output of this function is the address to
look in
No comparisons, so we're not limited to log n.
In fact, if the calculator takes the same time
for every input, it's constant-time search!

4
Hash Function

The magic calculator function is called a hash
function
It treats the key as a sequence of bits or an
integer, regardless of its original type
Example hash functions (not very good ones)
Last two digits of the integer (table size 100)
Divide the bit string into sequences of 8 bits
and XOR all sequences together (table size 256)
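Both example hash functions fit in a few lines of Java. A minimal sketch, assuming `int` keys; the names `ExampleHashes`, `lastTwoDigits`, and `xorBytes` are hypothetical, chosen for illustration:

```java
// Sketch of the two example hash functions; both are deliberately weak.
class ExampleHashes {
    // Last two digits of the integer: result is 0..99 (table size 100).
    // Assumes key >= 0.
    static int lastTwoDigits(int key) {
        return key % 100;
    }

    // Split the 32-bit key into four 8-bit pieces and XOR them together:
    // result is 0..255 (table size 256).
    static int xorBytes(int key) {
        int h = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            h ^= (key >>> shift) & 0xFF;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(lastTwoDigits(12345));                      // 45
        System.out.println(xorBytes(0x01020304));                      // 1 ^ 2 ^ 3 ^ 4 = 4
        System.out.println(lastTwoDigits(200) == lastTwoDigits(300));  // true: both are 0
    }
}
```

The last line shows why these are "not very good": `lastTwoDigits` ignores every digit but the last two, so 200 and 300 share an address.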

5
Hash Table

Table is an array of size TableSize; hash(key) is a
function that returns a value from 0 to
TableSize - 1.
To insert
Table[hash(key)] = key
To retrieve
Result = Table[hash(key)]
To delete
Table[hash(key)] = empty marker
Can it really be that simple?
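The three one-line operations can be sketched as below, assuming integer keys, `key % TABLE_SIZE` as the hash, and `null` as the empty marker (these choices and the name `NaiveHashTable` are illustrative, not from the slides). Valid addresses run from 0 to TableSize - 1:

```java
// The "simple" table: one slot per hash value and no collision handling at all.
class NaiveHashTable {
    private static final int TABLE_SIZE = 100;
    private final Integer[] table = new Integer[TABLE_SIZE]; // null = empty marker

    // Returns a value from 0 to TABLE_SIZE - 1 (assumes key >= 0).
    private int hash(int key) {
        return key % TABLE_SIZE;
    }

    void insert(int key)      { table[hash(key)] = key; }
    Integer retrieve(int key) { return table[hash(key)]; }
    void delete(int key)      { table[hash(key)] = null; }

    public static void main(String[] args) {
        NaiveHashTable t = new NaiveHashTable();
        t.insert(102);
        t.insert(202);                       // same slot as 102: silently overwrites it
        System.out.println(t.retrieve(102)); // 202, not 102
    }
}
```

The `main` method answers the slide's question: two keys that hash to the same slot overwrite each other, so `retrieve(102)` hands back 202.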

6
Hash Table Collisions

If the size of the table is smaller than the
number of possible keys, then there must be at
least two keys with the same hash value.
E.g. 202 and 102 if the hash is the last 2 digits
If we want to insert both values, we will get a
collision
The item we retrieve might not really have a
matching key
The location to insert into might already be full
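The counting argument can be checked directly. This sketch (hypothetical name `CollisionFinder`, hash = last two digits, non-negative keys assumed) records which key first claimed each address and reports the first key that finds its address already taken:

```java
// Finds the first two keys that land on the same address when
// hash(key) = last two digits (table size 100).
class CollisionFinder {
    static final int TABLE_SIZE = 100;

    static int hash(int key) {
        return key % TABLE_SIZE; // assumes key >= 0
    }

    // Returns {earlierKey, laterKey} for the first collision, or null if there is none.
    static int[] firstCollision(int[] keys) {
        Integer[] owner = new Integer[TABLE_SIZE]; // owner[h] = first key seen with hash h
        for (int key : keys) {
            int h = hash(key);
            if (owner[h] != null) {
                return new int[] {owner[h], key};
            }
            owner[h] = key;
        }
        return null;
    }

    public static void main(String[] args) {
        int[] pair = firstCollision(new int[] {7, 102, 55, 202});
        System.out.println(pair[0] + " and " + pair[1]); // 102 and 202
    }
}
```

With 101 distinct keys and only 100 addresses, `firstCollision` can never return `null`; that is the pigeonhole argument stated on the slide.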

7
Avoiding Collisions

Make the table big (if you can afford it)
Pick the right hash function
If you know all possible keys, create a perfect
hash function (unique value for each possible
key)
Try to distribute all possible keys evenly among
the addresses
Try to distribute the most likely keys evenly
among the addresses
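When every possible key is known in advance, one simple way to get a perfect hash function is to search for a table size m that makes key % m unique for all of them. This is one illustrative technique, not necessarily the one intended in the course; the class name and the assumption of distinct, non-negative keys are hypothetical:

```java
// For a fixed, fully known set of DISTINCT non-negative keys, find the smallest
// table size m (starting at the number of keys) such that key % m is unique for
// every key. key % m is then a perfect hash function for that set.
class PerfectModulus {
    static int smallestPerfectSize(int[] keys) {
        // Terminates: m = (largest key + 1) always works for distinct keys.
        for (int m = keys.length; ; m++) {
            boolean[] used = new boolean[m];
            boolean perfect = true;
            for (int key : keys) {
                int h = key % m;
                if (used[h]) { perfect = false; break; } // two keys share an address
                used[h] = true;
            }
            if (perfect) return m;
        }
    }

    public static void main(String[] args) {
        int[] keys = {1, 14, 12, 2, 3, 41, 27, 15};
        System.out.println(smallestPerfectSize(keys)); // 16
    }
}
```

For the key set used later in these slides (1, 14, 12, 2, 3, 41, 27, 15), the search stops at m = 16, where every key gets its own address. The trade-off is space: 16 slots for 8 keys.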

8
Choosing a Hash Function

Should return integers in a fixed range
Should be quick to compute
Should avoid obvious patterns of results
Should involve the entire search key
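To involve the entire search key when the key is a string, every character has to affect the result. A common approach (not from the slides) is a polynomial hash computed with Horner's rule; the multiplier 31 is an assumption borrowed from Java's `String.hashCode`, and `Math.floorMod` keeps the result in 0..tableSize-1 even after integer overflow:

```java
// Polynomial string hash: h = s[0]*31^(n-1) + ... + s[n-1], reduced to a fixed range.
class StringHash {
    static int hash(String key, int tableSize) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.charAt(i); // every character changes h (int overflow wraps)
        }
        return Math.floorMod(h, tableSize); // 0..tableSize-1 even when h is negative
    }

    // Contrast: a hash that uses only the first character ignores most of the key.
    static int firstCharHash(String key, int tableSize) {
        return key.charAt(0) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash("cat", 101));          // 90
        System.out.println(hash("cow", 101));          // 22
        System.out.println(firstCharHash("cat", 101)); // 99
        System.out.println(firstCharHash("cow", 101)); // 99
    }
}
```

With table size 101, "cat" and "cow" get different addresses from `hash` but collide under `firstCharHash`, which never looks past the first character.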

9
Typical Hash Functions

Taking an integer modulo a prime number
Prime number has only 1 and itself as factors
This avoids patterns of addresses
Easiest to analyze and most common
Folding (integer or bits)
Divide value into subgroups (k bits or digits)
Add or XOR together subgroups
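A sketch of both typical hash functions on `int` keys (names hypothetical, non-negative keys assumed). The `distinctAddresses` helper makes the "avoids patterns" claim concrete by counting how many slots a patterned set of keys actually reaches:

```java
// Two typical hash functions: an integer modulo the table size (ideally prime),
// and folding by groups of k decimal digits.
class TypicalHashes {
    static int modHash(int key, int tableSize) {
        return key % tableSize; // assumes key >= 0
    }

    // Folding: cut the decimal digits into groups of k (from the right),
    // add the groups together, then reduce into the table range.
    static int foldHash(int key, int k, int tableSize) {
        int groupBase = 1;
        for (int i = 0; i < k; i++) groupBase *= 10;
        int sum = 0;
        while (key > 0) {
            sum += key % groupBase; // lowest k digits
            key /= groupBase;       // drop them
        }
        return sum % tableSize;
    }

    // Counts how many different addresses modHash uses for the given keys.
    static int distinctAddresses(int[] keys, int tableSize) {
        boolean[] used = new boolean[tableSize];
        int count = 0;
        for (int key : keys) {
            int h = modHash(key, tableSize);
            if (!used[h]) { used[h] = true; count++; }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] multiplesOf10 = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(distinctAddresses(multiplesOf10, 12)); // 6: only even slots are hit
        System.out.println(distinctAddresses(multiplesOf10, 11)); // 10: every key gets its own slot
        System.out.println(foldHash(123456, 2, 100));            // 12 + 34 + 56 = 102 -> 2
    }
}
```

Keys that are all multiples of 10 share a factor of 2 with a table of size 12, so they only ever reach the six even slots; with the prime 11 they spread over ten different slots.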

10
Resolving Collisions by Open Addressing

Find another place within the table for the item
Linear probing: the new item goes in the first empty
space after the result of the hash function
(offsets are the sequence 1, 2, 3, ...)
Quadratic probing: first look in the next space,
then skip to the 4th space, then the 9th, then the 16th, etc.
(offsets are the squares 1, 4, 9, 16, ...)
Double hashing: use a second hash function on the
key to find the offset (offsets are multiples
of the second hash value)
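The three probe sequences differ only in the offset added to the home address. A sketch, assuming offsets are measured from the home address hash(key) and wrapped around with `% tableSize` (the slides leave wrap-around implicit), with the second hash value passed in as `step`; the class name is hypothetical:

```java
import java.util.Arrays;

// First few addresses probed by each open-addressing scheme.
class ProbeSequences {
    static int[] linear(int home, int tableSize, int probes) {
        int[] seq = new int[probes];
        for (int i = 0; i < probes; i++) seq[i] = (home + i) % tableSize;        // offsets 0, 1, 2, 3, ...
        return seq;
    }

    static int[] quadratic(int home, int tableSize, int probes) {
        int[] seq = new int[probes];
        for (int i = 0; i < probes; i++) seq[i] = (home + i * i) % tableSize;    // offsets 0, 1, 4, 9, ...
        return seq;
    }

    static int[] doubleHash(int home, int step, int tableSize, int probes) {
        int[] seq = new int[probes];
        for (int i = 0; i < probes; i++) seq[i] = (home + i * step) % tableSize; // offsets 0, s, 2s, 3s, ...
        return seq;
    }

    public static void main(String[] args) {
        // Home address 3, table size 11, second hash value 5.
        System.out.println(Arrays.toString(linear(3, 11, 5)));        // [3, 4, 5, 6, 7]
        System.out.println(Arrays.toString(quadratic(3, 11, 5)));     // [3, 4, 7, 1, 8]
        System.out.println(Arrays.toString(doubleHash(3, 5, 11, 5))); // [3, 8, 2, 7, 1]
    }
}
```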

11
Insertion with Open Addressing

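void insert(E item) {
    int home = hash(item);        // 0 .. table.length - 1
    int address = home;
    for (int i = 1; table[address] != null; i++) {
        // compute next offset: i (linear), i*i (quadratic),
        // or i*hash2(item) (double hashing); assumes a free slot exists
        address = (home + offset(i)) % table.length;
    }
    table[address] = item;
}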

12
Retrieval with Open Addressing

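E retrieve(E item) {
    int home = hash(item);
    int address = home;
    // test for null first so equals() is never called on an empty cell
    for (int i = 1; table[address] != null && !table[address].equals(item); i++) {
        address = (home + offset(i)) % table.length; // same probe sequence as insert
    }
    return table[address]; // returns null if not found
}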
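A runnable version of the two routines above, assuming `Integer` keys, `Math.floorMod(key, size)` as the hash, and linear probing (`offset(i) = i`). Unlike the slide pseudocode, both loops stop after `table.length` probes so a full table cannot cause an infinite loop; the class name and that bound are assumptions:

```java
// Open addressing with linear probing; offsets are measured from the home address.
class OpenAddressingTable {
    private final Integer[] table;

    OpenAddressingTable(int size) { table = new Integer[size]; }

    private int hash(int key) { return Math.floorMod(key, table.length); }

    private int offset(int i) { return i; } // linear probing; i * i would give quadratic

    boolean insert(int item) {
        int home = hash(item);
        for (int i = 0; i < table.length; i++) {
            int address = (home + offset(i)) % table.length;
            if (table[address] == null) {
                table[address] = item;
                return true;
            }
        }
        return false; // every probed slot was full
    }

    Integer retrieve(int item) {
        int home = hash(item);
        for (int i = 0; i < table.length; i++) {
            int address = (home + offset(i)) % table.length;
            if (table[address] == null) return null;               // empty cell: item is absent
            if (table[address].equals(item)) return table[address];
        }
        return null; // probed every slot without finding it
    }

    public static void main(String[] args) {
        OpenAddressingTable t = new OpenAddressingTable(11);
        t.insert(1);
        t.insert(12);                       // collides with 1, goes to slot 2
        t.insert(23);                       // collides again, goes to slot 3
        System.out.println(t.retrieve(23)); // 23
        System.out.println(t.retrieve(34)); // null: probes 1, 2, 3, then finds empty slot 4
    }
}
```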

13
Issues with Open Addressing

Retrieval must follow the same sequence of probes as
insertion
If a collision fills a cell, then it forces a
collision with the value that hashes directly to
the cell.
Consider
Hash(key) = key % 11
Sequence of items 1, 14, 12, 2, 3, 41, 27, 15
Try linear, quadratic, and double hashing with hash2(key) = key % 7
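The exercise can be checked by machine. This sketch inserts the eight keys into an 11-slot table under each scheme, offsets measured from the home address. It reads the second hash as hash2(key) = key % 7, used as the double-hashing step (an assumption); the class name is hypothetical:

```java
import java.util.Arrays;

// Table size 11, hash(key) = key % 11, keys inserted in the given order.
class ProbingExercise {
    static final int SIZE = 11;

    // Offset for the i-th probe: 'L' linear, 'Q' quadratic, 'D' double hashing.
    static int offset(char scheme, int key, int i) {
        if (scheme == 'L') return i;
        if (scheme == 'Q') return i * i;
        return i * (key % 7); // double hashing step hash2(key) = key % 7
    }

    static Integer[] build(int[] keys, char scheme) {
        Integer[] table = new Integer[SIZE];
        for (int key : keys) {
            int home = key % SIZE;
            boolean placed = false;
            for (int i = 0; i < SIZE && !placed; i++) {
                int address = (home + offset(scheme, key, i)) % SIZE;
                if (table[address] == null) {
                    table[address] = key;
                    placed = true;
                }
            }
            if (!placed) throw new IllegalStateException("no free slot found for " + key);
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {1, 14, 12, 2, 3, 41, 27, 15};
        for (char scheme : new char[] {'L', 'Q', 'D'}) {
            System.out.println(scheme + ": " + Arrays.toString(build(keys, scheme)));
        }
        // L: [null, 1, 12, 14, 2, 3, 27, 15, 41, null, null]
        // Q: [null, 1, 12, 14, 3, 27, 2, null, 41, 15, null]
        // D: [null, 1, 2, 14, 15, 27, 12, null, 41, 3, null]
    }
}
```

In the linear table, slots 1 through 8 form one solid cluster. Note also that key % 7 is 0 for multiples of 7 such as 14: had 14 collided, its double-hashing step would be 0 and every probe would revisit the same cell, so `build` would throw.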

14
Comparing Open Addressing Schemes

Linear probing is most prone to clustering
Large clumps of cells fill, causing long
sequences of probes for each insertion
Quadratic probing is less prone to clustering
Each probe is even further from the cluster
No guarantee every slot will be searched, though!
Double hashing depends on the other hash function
Its base should be relatively prime to the
original base so there is no pattern
In this case, it is as good as or better than
quadratic probing
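Both coverage claims can be tested by counting reachable slots. Because (i + m)² ≡ i² (mod m), trying i = 0..m-1 covers every offset quadratic probing will ever use. The `reachableWithStep` helper reads "relatively prime" as a double-hashing step that shares no factor with the table size; that reading and the class name are assumptions:

```java
// Counts how many distinct slots a probe sequence can reach from one home address.
class QuadraticCoverage {
    // Quadratic probing: offsets i*i % tableSize, which repeat with period tableSize.
    static int reachableSlots(int tableSize) {
        boolean[] reached = new boolean[tableSize];
        int count = 0;
        for (int i = 0; i < tableSize; i++) {
            int slot = (i * i) % tableSize;
            if (!reached[slot]) { reached[slot] = true; count++; }
        }
        return count;
    }

    // Double hashing: offsets i*step % tableSize.
    static int reachableWithStep(int step, int tableSize) {
        boolean[] reached = new boolean[tableSize];
        int count = 0;
        for (int i = 0; i < tableSize; i++) {
            int slot = (i * step) % tableSize;
            if (!reached[slot]) { reached[slot] = true; count++; }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(reachableSlots(11));        // 6 of 11 slots
        System.out.println(reachableSlots(16));        // 4 of 16 slots
        System.out.println(reachableWithStep(4, 11));  // 11: gcd(4, 11) = 1
        System.out.println(reachableWithStep(4, 12));  // 3: gcd(4, 12) = 4
    }
}
```

From any home address in an 11-slot table, quadratic probing can reach only 6 slots (only 4 with 16 slots), so an insert can fail while empty cells remain. A step of 4 reaches every slot of an 11-slot table but only 3 slots of a 12-slot one.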

15
Restructuring the Hash Table

Each address can contain multiple items
Bucket (set max items per hash key)
Separate chaining (array of linked lists)
Our example again
Hash(key) = key % 11
Sequence of items 1,14,12,2,3,41,27,15

16
Bucket: Multiple Cells per Hash Value
[Figure: rows indexed by hash value; row 0 holds the first, second, and third data items with hash value 0, row 1 holds the first data item with hash value 1, and so on]
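A sketch of bucket hashing on the running example (hash(key) = key % 11). The bucket capacity of 2 and the choice to reject an insert when its bucket is full are assumptions; the slides do not say how overflow is handled, and the class name is hypothetical:

```java
import java.util.Arrays;

// Each address holds up to BUCKET_SIZE items; a full bucket refuses new items.
class BucketTable {
    static final int BUCKET_SIZE = 2;
    private final Integer[][] buckets;

    BucketTable(int tableSize) { buckets = new Integer[tableSize][BUCKET_SIZE]; }

    private int hash(int key) { return Math.floorMod(key, buckets.length); }

    boolean insert(int key) {
        Integer[] bucket = buckets[hash(key)];
        for (int j = 0; j < BUCKET_SIZE; j++) {
            if (bucket[j] == null) { bucket[j] = key; return true; } // first free cell
        }
        return false; // bucket full (overflow policy is an assumption)
    }

    boolean contains(int key) {
        for (Integer item : buckets[hash(key)]) {
            if (item != null && item == key) return true;
        }
        return false;
    }

    Integer[] bucket(int address) { return buckets[address].clone(); }

    public static void main(String[] args) {
        BucketTable t = new BucketTable(11);
        for (int key : new int[] {1, 14, 12, 2, 3, 41, 27, 15}) t.insert(key);
        System.out.println(Arrays.toString(t.bucket(1))); // [1, 12]
        System.out.println(Arrays.toString(t.bucket(3))); // [14, 3]
        System.out.println(t.insert(23));                 // false: 23 % 11 = 1, bucket 1 is full
    }
}
```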

17
Separate chaining

Hash table as array of linked lists

[Figure: array slots 0 to 4; slots 1 and 3 are null, the other slots each point to a linked list]
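A minimal separate-chaining sketch using `java.util.LinkedList`, with hash(key) = key % size. Slots stay `null` until a key lands there, so unused slots show up as null; the class name and the lazy list creation are assumptions:

```java
import java.util.LinkedList;

// Separate chaining: an array of linked lists; colliding keys share a list.
class ChainedTable {
    private final LinkedList<Integer>[] chains;

    @SuppressWarnings("unchecked")
    ChainedTable(int tableSize) {
        chains = new LinkedList[tableSize]; // every slot starts as null
    }

    private int hash(int key) { return Math.floorMod(key, chains.length); }

    void insert(int key) {
        int address = hash(key);
        if (chains[address] == null) chains[address] = new LinkedList<>();
        chains[address].add(key); // append to this slot's list
    }

    boolean contains(int key) {
        LinkedList<Integer> chain = chains[hash(key)];
        return chain != null && chain.contains(key);
    }

    boolean delete(int key) {
        LinkedList<Integer> chain = chains[hash(key)];
        return chain != null && chain.remove(Integer.valueOf(key)); // remove by value, not index
    }

    LinkedList<Integer> chain(int address) { return chains[address]; }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(11);
        for (int key : new int[] {1, 14, 12, 2, 3, 41, 27, 15}) t.insert(key);
        System.out.println(t.chain(1)); // [1, 12]
        System.out.println(t.chain(3)); // [14, 3]
        System.out.println(t.chain(0)); // null
    }
}
```

With the running example (table size 11), slot 1 holds [1, 12] and slot 3 holds [14, 3]; colliding keys lengthen a list instead of spilling into a neighboring slot.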
18
Growing a Hash Table

Open addressing
When the hash table is full, allocate a bigger
one.
Rehashing: add each element from the original
table to the new one using the new hash function.
Chaining
When the lists are getting too long, allocate a
bigger table
Rehash as above.
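A sketch of growing an open-addressing table that uses linear probing and hash(key) = key % size. The growth policy (double the size, then move up to the next prime so the hash stays "modulo a prime") is an assumption, as is the class name:

```java
// Rehashing: every element is re-inserted using the NEW table size in the hash.
class Rehash {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) if (n % d == 0) return false;
        return true;
    }

    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    // Linear-probing insert; the caller must leave at least one free slot.
    static void insert(Integer[] table, int key) {
        int address = Math.floorMod(key, table.length);
        while (table[address] != null) address = (address + 1) % table.length;
        table[address] = key;
    }

    static Integer[] grow(Integer[] old) {
        Integer[] bigger = new Integer[nextPrime(2 * old.length)];
        for (Integer key : old) {
            if (key != null) insert(bigger, key); // address recomputed with bigger.length
        }
        return bigger;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[11];
        for (int key : new int[] {1, 14, 12, 2, 3, 41, 27, 15}) insert(table, key);
        System.out.println(table[6]);     // 27 (after probing past slot 5)
        Integer[] bigger = grow(table);
        System.out.println(bigger.length); // 23
        System.out.println(bigger[4]);     // 27 (27 % 23 = 4)
    }
}
```

Growing the 11-slot table gives 23 slots; 27, which sat in slot 6 after probing, now goes straight to 27 % 23 = 4, because the address depends on the new table size.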
