Hash Tables - PowerPoint PPT Presentation


Provided by: patric190

1
Hash Tables
2
Overview

What are hash tables?
what
why
operations
key words: collision, hash function
Implementation 1: open addressing
Implementation 2: chained lists

3
Definition

A hash table is a data structure that uses a hash
function to efficiently map identifiers, or keys,
to associated values
A hash table is
A container/collection, i.e. an object that holds
a group of other objects (just like arrays,
lists, stacks, queues, trees and graphs)
A structure in which VALUES are associated with KEYS
(just as values in an array are associated with
an index and values in a list are associated with a
position)
Hashing function
A hash function maps a search key to a single integer
between 0 and n-1, where n is the size of the array,
so the result can serve as an index into the array.
The values returned by a hash function are called
hash values, hash codes, hash sums, or simply
hashes.

4
Why?

Using balanced trees (e.g. AVL trees) we can implement
table operations (retrieval, insertion and
deletion) efficiently → O(log N)
Can we find a data structure so that we can
perform these table operations better than
balanced search trees? → O(1)
In a hash table
Searching for a value is O(1) on average, i.e. constant time
Inserting a value is O(1) on average
Better than a binary search tree!

5
How?

Uses an array to store data
The position of an item in the array is computed
using a hash function applied to the key, i.e.
position = hashFunction(key)
Example hash functions
(ASCII value of first letter - 65) MOD array size
sum of digits in student number MOD array size
Store values in the array (open addressing) or
store lists in the array (chained lists)
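The two example hash functions can be sketched in Java as follows. This is only an illustration: the class and method names are made up here, and the first function assumes the slide means "(ASCII value of the first letter minus 65) MOD array size", where 65 is the code for 'A'.

```java
class ExampleHashes {
    // (ASCII value of first letter - 65) MOD array size; 65 is 'A'.
    // Assumes a non-empty key; the letter is upper-cased so "ann" and "Ann" agree.
    static int firstLetterHash(String key, int size) {
        int code = Character.toUpperCase(key.charAt(0)) - 65;
        return Math.floorMod(code, size);
    }

    // Sum of the decimal digits in a student number, MOD array size.
    // Characters that are not ASCII digits are ignored.
    static int digitSumHash(String studentNumber, int size) {
        int sum = 0;
        for (char c : studentNumber.toCharArray()) {
            if (c >= '0' && c <= '9') sum += c - '0';
        }
        return sum % size;
    }

    public static void main(String[] args) {
        System.out.println(firstLetterHash("Ann", 10));  // 'A' - 65 = 0
        System.out.println(firstLetterHash("Bob", 10));  // 'B' - 65 = 1
        System.out.println(digitSumHash("20311", 10));   // 2+0+3+1+1 = 7
    }
}
```

Both functions are cheap to compute but distribute keys poorly: every name starting with the same letter lands in the same position.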

6
Problems

Will two keys map to the same location in the
table?
How to decide the size of the table?
If the data set is of known size
If a perfect hashing function can be used, the
table can be made the same size as the data set.
Otherwise, make the table 150% of the size of
the data set.
If we do not know the size of the data set
Dynamic resizing
When to resize?
Can we simply expand the table when it is full?

7
Terminology

Perfect hashing function
A hashing function that maps each element to a
unique position in a table.
Collision
The situation where two elements or keys map to
the same location in the table
Dynamic resizing
Dynamic resizing of a hash table involves
creating a new hash table that is larger than the
original, inserting all of the elements of the
original table into the new table, and then
discarding the original one.
Load factor
The ratio of the number of elements in a hash
table to its size
Used to describe how full the table currently is
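The load factor and dynamic resizing definitions can be made concrete with a small sketch. The 0.75 threshold, the "2n + 1" growth rule and all names below are assumptions chosen for illustration, not values from the slides. Collisions during re-insertion are handled by stepping to the next free slot.

```java
class LoadFactorDemo {
    // Hypothetical policy: resize once the table is more than 75% full.
    static final double MAX_LOAD = 0.75;

    // Load factor = number of stored elements / table size.
    static double loadFactor(int count, int size) {
        return (double) count / size;
    }

    static boolean needsResize(int count, int size) {
        return loadFactor(count, size) > MAX_LOAD;
    }

    // Dynamic resizing: create a larger table and re-insert every element.
    // Elements cannot simply be copied across, because each position
    // depends on the table size. null marks an empty slot.
    static Integer[] resize(Integer[] old) {
        Integer[] bigger = new Integer[old.length * 2 + 1];
        for (Integer key : old) {
            if (key == null) continue;
            int pos = Math.floorMod(key, bigger.length);
            while (bigger[pos] != null) pos = (pos + 1) % bigger.length;
            bigger[pos] = key;
        }
        return bigger;
    }
}
```

Resizing before the table is completely full keeps probe sequences and chains short, which is one answer to the question of whether to wait until the table is full.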

8
Hashing Functions

We do not need the hashing function to be perfect
to get good performance from the hash table
We need a function that does a reasonably good job of
distributing our elements in the table so that
collisions are rare.
A reasonably good hashing function will still
result in constant time access on average
Examples
ASCII value of the first letter MOD array size
Sum of digits MOD array size
Division: use the remainder of the key divided by
some positive integer (the table size, for example) as
the index of the given element

Hashcode(key) = Math.abs(key) % size
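A short sketch of the division method, reading the formula as Math.abs(key) % size. The class and method names are hypothetical. It also shows one edge case worth knowing: Math.abs(Integer.MIN_VALUE) is still negative in Java, so Math.floorMod is shown as an alternative that always yields an index in 0..size-1.

```java
class DivisionHash {
    // The formula as given: absolute value of the key, MOD table size.
    static int hashAbs(int key, int size) {
        return Math.abs(key) % size;
    }

    // Alternative: floorMod returns a value in 0..size-1 for any int key
    // when size > 0, including Integer.MIN_VALUE.
    static int hashFloorMod(int key, int size) {
        return Math.floorMod(key, size);
    }

    public static void main(String[] args) {
        System.out.println(hashAbs(24, 11));        // 2
        System.out.println(hashAbs(-24, 11));       // 2
        System.out.println(hashFloorMod(-24, 11));  // 9
        // Math.abs(Integer.MIN_VALUE) overflows and stays negative,
        // so this result is not a valid array index:
        System.out.println(hashAbs(Integer.MIN_VALUE, 11) < 0);  // true
    }
}
```

The two versions map negative keys to different slots, which is fine as long as one of them is used consistently.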
9
Resolving Collisions: Chaining

Definition
The chaining method for handling collisions
treats the hash table conceptually as a
table of collections rather than as a table of
individual cells.
Uses an array of lists
The key and the hash function are used to compute
which list the value will be stored in
Each cell in the hash table would be something
like the LinearNode class
Advantages
No problems with collisions, as values are just
added to the end of the appropriate list
The hash table can never be full
Disadvantages
Needs lists; constructing new chain nodes
is relatively expensive
Parts of the array might never be used.
As chains get longer, search time increases to
O(n) in the worst case.
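A minimal sketch of chaining with integer keys. It uses java.util.LinkedList for each chain rather than the LinearNode class mentioned above, and the class and method names are assumptions made for this example.

```java
import java.util.LinkedList;

class ChainedTable {
    // Each array cell holds a list (chain) of the keys that hash to it.
    private final LinkedList<Integer>[] cells;

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        cells = new LinkedList[size];
        for (int i = 0; i < size; i++) cells[i] = new LinkedList<>();
    }

    private int position(int key) {
        return Math.floorMod(key, cells.length);
    }

    // A collision simply extends the chain at that position.
    void insert(int key) {
        cells[position(key)].addLast(key);
    }

    // Only one chain is scanned, but that scan is linear in the chain length.
    boolean contains(int key) {
        return cells[position(key)].contains(key);
    }

    boolean remove(int key) {
        return cells[position(key)].remove(Integer.valueOf(key));
    }

    int chainLength(int pos) {
        return cells[pos].size();
    }
}
```

With a table of size 11, the keys 2, 13 and 24 all end up in the chain at position 2, which illustrates both the advantage (no slot search) and the cost (a longer chain to scan).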

10
Example
11
Resolving Collisions: Open Addressing

Definition
The open addressing method for handling
collisions looks for another open position in the
table, other than the one to which the element
originally hashed.
Values are stored directly in the array, i.e. an array
of Objects
Problem
Collisions: two keys compute to the same location
Solutions
Linear probing: look in slots pos+1, pos+2,
pos+3, pos+4 etc. (i.e. use the next available free
slot)
Quadratic probing: look in slots pos+1, pos+4,
pos+9, pos+16 etc.
Rehash:
calculate another position

12
Examples
13
Linear probing

In linear probing, we search the hash table
sequentially starting from the original hash
location.
If a location is occupied, we check the next
location
We wrap around from the last table location to
the first table location if necessary.
Advantages
Simple to implement
Disadvantages
Tends to create clusters of filled positions
within the table
These clusters will affect the performance of
insertions/search
Deletion becomes trickier.
The array can become full

14
Linear Probing: an Example

If slot t = h(key) is occupied and the hash table is
not full, attempt to store the key in the next array
elements (t+1)%N, (t+2)%N, (t+3)%N, ... until you
find an empty slot
Example
Table Size is 11 (0..10)
Hash Function h(x) = x mod 11
Insert keys 20, 30, 2, 13, 25, 24, 10, 9
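The example can be traced with a short sketch. The class and method names are made up for illustration, and null is used to mark an empty slot.

```java
import java.util.Arrays;

class LinearProbingDemo {
    // Probes t, t+1, t+2, ... (mod N) and stores the key in the first
    // empty slot. Returns the slot used, or -1 if the table is full.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        int t = Math.floorMod(key, n);
        for (int i = 0; i < n; i++) {
            int pos = (t + i) % n;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[11];
        for (int key : new int[] {20, 30, 2, 13, 25, 24, 10, 9}) {
            insert(table, key);
        }
        System.out.println(Arrays.toString(table));
        // [9, null, 2, 13, 25, 24, null, null, 30, 20, 10]
    }
}
```

Keys 2, 13, 25 and 24 form a cluster in slots 2 to 5, and key 9 wraps around from slot 10 to slot 0.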

15
Quadratic Probing

In quadratic probing,
We start from the original hash location i
If a location is occupied, we check the locations
i+1², i+2², i+3², i+4², ...
We wrap around from the last table location to
the first table location if necessary
Advantages and disadvantages
Tends to distribute keys better than linear
probing
Alleviates the problem of clustering
Calculating each new probe position takes extra time
Runs the risk of an infinite loop on insertion
and might not find free space for an item even if
the table is not full
Consider inserting the key 16 into a table of
size 16, with positions 0, 1, 4 and 9 already
occupied: every probe (0 + k²) mod 16 lands on 0, 1, 4
or 9, so no free slot is ever found. The table size
should be prime.
Deletion becomes trickier.
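The size-16 case can be checked by listing which slots a quadratic probe sequence can ever reach. This is a small sketch with hypothetical names. It relies on the fact that k² mod n repeats after n steps, so trying k = 0..n-1 is enough.

```java
import java.util.TreeSet;

class QuadraticCoverage {
    // Distinct slots reachable from the home slot via (home + k*k) mod n.
    static TreeSet<Integer> reachable(int home, int n) {
        TreeSet<Integer> slots = new TreeSet<>();
        for (int k = 0; k < n; k++) {
            slots.add((home + k * k) % n);
        }
        return slots;
    }

    public static void main(String[] args) {
        // Key 16 hashes to slot 0 in a table of size 16.
        System.out.println(reachable(0, 16));         // [0, 1, 4, 9]
        // With the prime size 11, more than half of the slots are reachable.
        System.out.println(reachable(0, 11).size());  // 6
    }
}
```

A prime table size does not make every slot reachable, but it does guarantee that the first half of the probe sequence visits distinct slots, so an insertion succeeds as long as the table is less than half full.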

16
Quadratic Probing: an Example

If slot t = h(key) is occupied and the hash table is
not full, attempt to store the key in array elements
(t+1²)%N, (t+2²)%N, (t+3²)%N, ... until you find an
empty slot
Example
Table Size is 11 (0..10)
Hash Function h(x) = x mod 11
Insert keys 20, 30, 2, 13, 25, 24, 10, 9
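A sketch of the same insertions with quadratic probing. Names are hypothetical and null marks an empty slot. The loop gives up after N probes, which is an assumption made here to avoid the infinite loop described earlier.

```java
import java.util.Arrays;

class QuadraticProbingDemo {
    // Probes t, t+1, t+4, t+9, ... (mod N). Returns the slot used,
    // or -1 if no free slot is found along the probe sequence.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        int t = Math.floorMod(key, n);
        for (int i = 0; i < n; i++) {
            int pos = (t + i * i) % n;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[11];
        for (int key : new int[] {20, 30, 2, 13, 25, 24, 10, 9}) {
            insert(table, key);
        }
        System.out.println(Arrays.toString(table));
        // [null, null, 2, 13, 25, null, 24, 9, 30, 20, 10]
    }
}
```

Key 24 jumps from slot 2 past slot 3 to slot 6, and key 9 tries slots 9, 10 and 2 before settling in slot 7.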

17
Double Hashing

Resolving collisions by providing a secondary
hashing function, h2, to be used when the primary
hashing function, h1, results in a collision.
Basic requirements
h2(key) ≠ 0
h1 ≠ h2
Implementation: let t = h1(key) and let the second
hash function give h2(key) = d. Attempt to store the
key in array elements (t+d)%N, (t+2d)%N, (t+3d)%N, ...
until you find an open slot.
The division method (% N) keeps the
calculated index within the bounds of the table

18
Double Hashing: an Example

Typical second hash function
h2(x) = R - (x % R)
where R is a prime number, R < N (size of the
table)
Example
Table Size is 11 (0..10)
Hash Function
h1(x) = x mod 11
h2(x) = 7 - (x mod 7)
Insert keys 20, 30, 2, 13, 25, 24, 10, 9
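The example can be traced with the sketch below, taking h1(x) = x mod 11 and h2(x) = 7 - (x mod 7). Class and method names are hypothetical, keys are assumed to be non-negative, and null marks an empty slot.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    static int h1(int x, int n) {
        return x % n;
    }

    // R = 7; the result is always in 1..7, so it is never 0.
    static int h2(int x) {
        return 7 - (x % 7);
    }

    // Probes t, t+d, t+2d, ... (mod N) where t = h1(key) and d = h2(key).
    // Returns the slot used, or -1 if no free slot is found in N probes.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        int t = h1(key, n);
        int d = h2(key);
        for (int i = 0; i < n; i++) {
            int pos = (t + i * d) % n;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[11];
        for (int key : new int[] {20, 30, 2, 13, 25, 24, 10, 9}) {
            insert(table, key);
        }
        System.out.println(Arrays.toString(table));
        // [null, 9, 2, 13, null, null, 25, 10, 30, 20, 24]
    }
}
```

Because each key has its own step size d, keys that collide at the same home slot follow different probe sequences: 13 (d = 1) and 24 (d = 4) both start at slot 2 but end up in slots 3 and 10.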

19
Open Addressing: Retrieval and Deletion

In open addressing, to find an item with a given
key
We probe the locations (same as insertion) until
we find the desired item or we reach an empty
location.
Deletions in open addressing cause complications
Example: elements Ann, Andrew, and Amy all
mapped to the same location in the table and the
collisions were resolved using linear probing. What
happens if we now remove Andrew?
If Andrew's slot is simply emptied, a later search
for Amy stops at that empty slot and never reaches Amy.

(Figure: table slots in order: Ann, Bob, Andrew, Doug, Bill, Amy)
20
Solutions

Solution: mark items as deleted but do not
actually remove them from the table until some
future point when either
the deleted element is overwritten by a newly inserted item, or
the entire table is rehashed.
Each cell is in one of 3 possible states
active
empty
deleted
For Find or Delete
only stop the search when an EMPTY state is detected (not
DELETED)

A deleted location is treated as an occupied
location during retrieval and insertion.
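The three cell states can be sketched as below, using the Ann, Andrew and Amy scenario. Everything here is an assumption made for illustration: the class name, the table size of 7, and the first-letter hash. Following the last statement above, insertion also skips DELETED cells; reusing them for new items is a common variation that is not shown.

```java
import java.util.Arrays;

class TombstoneTable {
    enum State { EMPTY, ACTIVE, DELETED }

    private final String[] keys;
    private final State[] states;

    TombstoneTable(int size) {
        keys = new String[size];
        states = new State[size];
        Arrays.fill(states, State.EMPTY);
    }

    // Assumed hash: (first letter - 'A') MOD table size.
    private int home(String key) {
        return Math.floorMod(Character.toUpperCase(key.charAt(0)) - 'A', keys.length);
    }

    // Linear probing; the search stops only at an EMPTY cell, not a DELETED one.
    int find(String key) {
        int n = keys.length, t = home(key);
        for (int i = 0; i < n; i++) {
            int pos = (t + i) % n;
            if (states[pos] == State.EMPTY) return -1;
            if (states[pos] == State.ACTIVE && keys[pos].equals(key)) return pos;
        }
        return -1;
    }

    // Stores the key in the first EMPTY cell; assumes keys are distinct.
    boolean insert(String key) {
        int n = keys.length, t = home(key);
        for (int i = 0; i < n; i++) {
            int pos = (t + i) % n;
            if (states[pos] == State.EMPTY) {
                keys[pos] = key;
                states[pos] = State.ACTIVE;
                return true;
            }
        }
        return false;
    }

    // Marks the cell as DELETED instead of emptying it.
    boolean delete(String key) {
        int pos = find(key);
        if (pos < 0) return false;
        states[pos] = State.DELETED;
        return true;
    }
}
```

Inserting Ann, Bob, Andrew, Doug, Bill and Amy in that order fills slots 0 to 5. After delete("Andrew"), find("Amy") still returns 5 because the search walks past the DELETED cell in slot 2.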

21
Hash Table Operations

public
insert(key, item)
store the item in the hash table at the position
dictated by the key
delete(key)
delete the item in the hash table at the position
dictated by the key
fetch(key) -> item
get the item in the hash table at the position
dictated by the key
private
hashFunction(key) -> position
calculate the position for the given key

22
Java Implementation
interface HashTable {
    public void put(String key, Object value);
    public Object get(String key);
    public void remove(String key);
}
23
DataItem
class DataItem {
    private String key;
    private Object value;
    private boolean deleted;

    DataItem(String key, Object value) {
        this.key = key;
        this.value = value;
        deleted = false;
    }

    public String getKey() { return key; }
    public Object getValue() { return value; }
    public void markDeleted() { deleted = true; }
    public boolean isDeleted() { return deleted; }
}
24
HashTable Java Implementation