Hashing - PowerPoint PPT Presentation

About This Presentation

Title:

Hashing

Description:

Problem: storing and retrieving an item using its key (for example, ID number, name) ... Applications: symbol table, on-line spelling checker ...

Slides: 42

Provided by: veeramu

Transcript and Presenter's Notes

1
Hashing
Problem: store and retrieve an item using its
key (for example, ID number, name)

Linked list
takes O(N) time
Binary search tree
takes O(log N) time on average
Array indexed directly by the key
takes O(1) time

2
Array
ID: 4112041, Name: Somsri, Faculty: Science
ID: 4163490, Name: Sompong, Faculty: Engineering
Problem: using the ID itself as the array index leaves a lot of empty space
3
Hashing
ID: 4112041, Name: Somsri, Faculty: Science
ID: 4163490, Name: Sompong, Faculty: Engineering
Map the key to a number between 0 and
ArraySize - 1
4
Hashing

Map the key to an array position using a hash
function
ArrayIndex = hash(key)
Takes O(1) time to access an item
Much less empty space than with a directly indexed array

5
Hash Function

Must return a valid array index.
Should ideally be a 1-to-1 mapping:
if key1 != key2 then hash(key1) != hash(key2).
A collision occurs when two distinct keys hash to
the same location in the array.
Should distribute the keys evenly:
any key value k is equally likely to hash to any
of the m array locations.

6
Simple Hash Function

ArrayIndex = key mod TableSize
Example with TableSize = 1000
4112041 -> 12041 mod 1000 -> 41
4163490 -> 63490 mod 1000 -> 490
TableSize should be a prime number for even
distribution
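
The mod-based mapping above can be sketched in a few lines of Java. This is only an illustration: the class name SimpleModHash and the use of Math.floorMod (so that negative keys still give a valid index) are assumptions, not part of the slides.

```java
// Minimal sketch of ArrayIndex = key mod TableSize.
// SimpleModHash is a hypothetical name used only for this illustration.
class SimpleModHash {
    // Math.floorMod keeps the result in [0, tableSize) even for negative keys.
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash(4112041, 1000)); // 41
        System.out.println(hash(4163490, 1000)); // 490
    }
}
```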

7
Another Hash Function

ArrayIndex = (k0 + 37 k1 + 37^2 k2 + . . . ) mod
TableSize
Example: 3-character key
ArrayIndex = (k0 + 37 k1 + 37^2 k2) mod TableSize
ArrayIndex = (k0 + 37 (k1 + 37 k2)) mod
TableSize

8
Hash Function
public static int hash( String key, int tableSize ) {
    int hashVal = 0;
    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );
    hashVal %= tableSize;
    if( hashVal < 0 )   // overflow
        hashVal += tableSize;
    return hashVal;
}
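
As a quick check of the nested form, the sketch below compares the loop with the explicit polynomial for a 3-character key. The class name HornerHashDemo, the helper polynomial3, the sample key "abc" and the table size 101 are all assumptions chosen for illustration. Note that in this loop the last character plays the role of k0 and the first character gets the highest power of 37.

```java
// Sketch: the loop (Horner's rule) and the explicit polynomial give the same index.
// HornerHashDemo and polynomial3 are hypothetical names used only for this illustration.
class HornerHashDemo {
    // Same loop as the hash function on the slide.
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal = 37 * hashVal + key.charAt(i);
        hashVal %= tableSize;
        if (hashVal < 0)          // the sum may overflow to a negative int
            hashVal += tableSize;
        return hashVal;
    }

    // Explicit polynomial for a 3-character key, for comparison only.
    static int polynomial3(String key, int tableSize) {
        int sum = key.charAt(2) + 37 * key.charAt(1) + 37 * 37 * key.charAt(0);
        return sum % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash("abc", 101));        // 67
        System.out.println(polynomial3("abc", 101)); // 67
    }
}
```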
9
Collision

When an element is inserted, if it hashes to the
same value as an already inserted element, then
we have a collision.
Collision resolution techniques
Separate chaining
Open addressing:
linear probing, quadratic probing, double
hashing

10
Separate Chaining
11
Separate Chaining

Load factor λ = number of elements / table size
Average length of a list = λ
Successful search cost ≈ 1 + (λ/2) link traversals
Cost depends on λ

12
Separate Chaining: keys evenly distributed
13
Separate Chaining: the last digit of every key is zero
Solution: make TableSize prime
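
The effect of a prime table size can be seen with a small experiment. The sketch below is an assumption-laden illustration: the keys 10, 20, ..., 100, the table sizes 10 and 11, and the class name ChainingDistributionDemo are not from the slides, and java.util lists stand in for the chains.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: keys ending in zero pile up in one chain when TableSize is 10,
// but spread out when TableSize is the prime 11.
// ChainingDistributionDemo is a hypothetical name used only for this illustration.
class ChainingDistributionDemo {
    // Builds the chains for the given keys using index = key mod tableSize.
    static List<List<Integer>> build(int[] keys, int tableSize) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++)
            table.add(new ArrayList<>());
        for (int key : keys)
            table.get(key % tableSize).add(key);
        return table;
    }

    // Length of the longest chain in the table.
    static int longestChain(List<List<Integer>> table) {
        int longest = 0;
        for (List<Integer> chain : table)
            longest = Math.max(longest, chain.size());
        return longest;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(longestChain(build(keys, 10))); // 10: every key lands in cell 0
        System.out.println(longestChain(build(keys, 11))); // 1: every key gets its own cell
    }
}
```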
14
Open Addressing

No linked lists. All items are stored in the array itself.
If a collision occurs, alternative locations are
tried until an empty cell is found:
try h0(x), h1(x), h2(x), . . .
hi(x) = (hash(x) + f(i)) mod TableSize
f(i) is the collision resolution strategy
Requires a bigger table; the load factor λ should be below 0.5

15
Linear Probing

If a collision occurs, try the next cell
sequentially
f(i) = i
hi(x) = (hash(x) + i) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1) mod
TableSize, (hash(x) + 2) mod TableSize, (hash(x) +
3) mod TableSize, . . .

16
Linear Probing
Insert 89, 18, 49, 58, 69
TableSize = 10, hash(x) = x mod 10
89 is inserted directly into cell 9
18 is inserted directly into cell 8
49 has a collision at cell 9 and is finally put into cell 0
58 has collisions at cells 8, 9, 0 and is finally put into cell 1
69 has collisions at cells 9, 0, 1 and is finally put into cell 2
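
The trace above can be reproduced with a short simulation. This is a minimal sketch assuming a table of size 10 and hash(x) = x mod 10, as the cell numbers in the example imply; the class name LinearProbingDemo and the Integer array standing in for the table are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of insertion with linear probing, f(i) = i.
// LinearProbingDemo is a hypothetical name used only for this illustration.
class LinearProbingDemo {
    // Inserts key into table (null = empty cell) and returns the cells probed, in order.
    static List<Integer> insert(Integer[] table, int key) {
        List<Integer> probes = new ArrayList<>();
        int home = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i) % table.length;   // hi(x) = (hash(x) + i) mod TableSize
            probes.add(pos);
            if (table[pos] == null) {
                table[pos] = key;
                return probes;
            }
        }
        throw new IllegalStateException("table is full");
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69})
            System.out.println(key + " -> " + insert(table, key));
        // 89 -> [9]
        // 18 -> [8]
        // 49 -> [9, 0]
        // 58 -> [8, 9, 0, 1]
        // 69 -> [9, 0, 1, 2]
    }
}
```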
17
Primary Clustering

Blocks of occupied cells (called clusters)
build up in the table.
A collision occurs if a key hashes to any cell
in a cluster. Several attempts may then be
needed to resolve the collision before a free
cell is found, and the new item is added to the
cluster, making it larger.

18
Linear Probing

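Problem: primary clustering
Normal deletion cannot be performed (some
later find operations would fail because the
chain of collisions that leads to the data is cut)
Use lazy deletion instead
Insertion cost = number of probes to find an
empty cell
≈ 1 / (fraction of empty cells)
= 1 / (1 - λ), where λ is the load factor
(an estimate that treats probes as independent;
primary clustering makes the real cost higher)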

19
Quadratic Probing

Eliminates primary clustering
f(i) = i^2
hi(x) = (hash(x) + i^2) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1^2) mod
TableSize,
(hash(x) + 2^2) mod TableSize, (hash(x) + 3^2) mod
TableSize, . . .
The table must be at most half full and the table size
must be prime; otherwise an insertion may fail
(every probe hits an occupied cell)

20
Quadratic Probing
Insert 89, 18, 49, 58, 69
Insert 89: try cell 9
Insert 18: try cell 8
Insert 49: try cells 9, 0
Insert 58: try cells 8, 9, 2
Insert 69: try cells 9, 0, 3
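
The same kind of simulation reproduces the quadratic probing trace. This sketch assumes a table of size 10 and hash(x) = x mod 10, as implied by the cell numbers above; the class name QuadraticProbingDemo and the cap of TableSize probes before giving up are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of insertion with quadratic probing, f(i) = i^2.
// QuadraticProbingDemo is a hypothetical name used only for this illustration.
class QuadraticProbingDemo {
    // Inserts key into table (null = empty cell) and returns the cells probed, in order.
    static List<Integer> insert(Integer[] table, int key) {
        List<Integer> probes = new ArrayList<>();
        int home = key % table.length;
        for (int i = 0; i < table.length; i++) {       // assumed cap: give up after TableSize probes
            int pos = (home + i * i) % table.length;   // hi(x) = (hash(x) + i^2) mod TableSize
            probes.add(pos);
            if (table[pos] == null) {
                table[pos] = key;
                return probes;
            }
        }
        throw new IllegalStateException("no empty cell found for " + key);
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69})
            System.out.println(key + " -> " + insert(table, key));
        // 89 -> [9]
        // 18 -> [8]
        // 49 -> [9, 0]
        // 58 -> [8, 9, 2]
        // 69 -> [9, 0, 3]
    }
}
```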
21
Quadratic Probing
Insert 10, 20, 30, 40, 50, 60, 70
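Insert 10: try cell 0
Insert 20: try cells 0, 1
Insert 30: try cells 0, 1, 4
Insert 40: try cells 0, 1, 4, 9
Insert 50: try cells 0, 1, 4, 9, 6 (16)
Insert 60: try cells 0, 1, 4, 9, 6 (16), 5 (25)
Insert 70: try cells 0, 1, 4, 9, 6 (16), 5 (25), 6 (36), 9 (49), 4 (64), 1 (81), 0 (100), 1 (121), 4 (144), 9 (169), 6 (196), . . .
Numbers in parentheses are the offsets i^2 before taking mod 10.
Resulting table: cell 0 = 10, cell 1 = 20, cell 4 = 30, cell 5 = 60, cell 6 = 50, cell 9 = 40. 70 is never placed.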
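
Why the insertion of 70 never succeeds can be checked by listing every cell that quadratic probing can reach from home cell 0 in a table of size 10. The sketch below is illustrative only; the class name QuadraticReachDemo is an assumption. Cells 2, 3, 7 and 8 are empty in the example but are never probed.

```java
import java.util.TreeSet;

// Sketch: the set of cells reachable by quadratic probing from a given home cell.
// QuadraticReachDemo is a hypothetical name used only for this illustration.
class QuadraticReachDemo {
    // Collects (home + i^2) mod tableSize for i = 0 .. tableSize - 1.
    // Larger i add nothing new because (i + tableSize)^2 mod tableSize == i^2 mod tableSize.
    static TreeSet<Integer> reachable(int home, int tableSize) {
        TreeSet<Integer> cells = new TreeSet<>();
        for (int i = 0; i < tableSize; i++)
            cells.add((home + i * i) % tableSize);
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(reachable(0, 10)); // [0, 1, 4, 5, 6, 9]
    }
}
```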
22
Quadratic Probing

Secondary clustering:
elements that hash to the same position probe
the same alternative cells and are put into the
next available one, forming a cluster.
In the first example, inserting 89, 49, 69 forms
a secondary cluster. Inserting 18, 58 forms
another secondary cluster.

23
Double Hashing

f(i) = i * hash2(x)
hi(x) = (hash(x) + i * hash2(x)) mod TableSize
Try hash(x) mod TableSize, (hash(x) + hash2(x))
mod TableSize,
(hash(x) + 2 * hash2(x)) mod TableSize, . . .
Example: hash2(x) = R - (x mod R)
R is a prime number smaller than TableSize

24
Double Hashing
Insert 89, 18, 49, 58, 69, 23
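TableSize = 10, R = 7
hash2(49) = 7 - (49 mod 7) = 7
hash2(58) = 7 - (58 mod 7) = 5
hash2(69) = 7 - (69 mod 7) = 1
hash2(23) = 7 - (23 mod 7) = 5
Insert 49: try 9, (9 + 7) mod 10 = 6
Insert 58: try 8, (8 + 5) mod 10 = 3
Insert 69: try 9, (9 + 1) mod 10 = 0
Insert 23: try 3, (3 + 5) mod 10 = 8, (3 + 10) mod 10 = 3, (3 + 15) mod 10 = 8, . . .
23 cannot be inserted: with step 5 and TableSize 10, only cells 3 and 8 are ever probed.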
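
The double hashing example can be replayed with the sketch below. It assumes TableSize = 10, hash(x) = x mod 10 and R = 7, as in the example; the class name DoubleHashingDemo, the helper contains, and the choice to give up after TableSize probes are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of insertion with double hashing, f(i) = i * hash2(x).
// DoubleHashingDemo is a hypothetical name used only for this illustration.
class DoubleHashingDemo {
    static final int R = 7;   // prime smaller than the table size, as in the example

    static int hash2(int x) {
        return R - (x % R);
    }

    // Tries to insert key (null = empty cell) and returns the cells probed, in order.
    // Gives up after table.length probes; in that case the key is not stored.
    static List<Integer> insert(Integer[] table, int key) {
        List<Integer> probes = new ArrayList<>();
        int home = key % table.length;
        int step = hash2(key);
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i * step) % table.length;
            probes.add(pos);
            if (table[pos] == null) {
                table[pos] = key;
                break;
            }
        }
        return probes;
    }

    // True if key is stored somewhere in the table.
    static boolean contains(Integer[] table, int key) {
        for (Integer cell : table)
            if (cell != null && cell == key)
                return true;
        return false;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69, 23})
            System.out.println(key + " -> " + insert(table, key)
                    + (contains(table, key) ? "" : " (not inserted)"));
    }
}
```

In this run 23 only ever probes cells 3 and 8, both occupied, which is the cycle shown in the example. A prime table size avoids this, because every step size is then relatively prime to the table size.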
25
Rehashing

When the table is too full, create a new table at
least twice as big (with a prime size), compute
the new hash value of each element, and insert it
into the new table.
Rehash when the table is half full, or when an
insertion fails, or when a certain load factor is
reached.
Because of lazy deletion, deleted cells are also
counted when the load factor is calculated.
Rehashing takes O(N) time, but the cost is shared
by the preceding N/2 insertions, so it adds only a
constant cost to each insertion.

26
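public interface Hashable {
    int hash( int tableSize );
}
public class MyInteger implements Comparable, Hashable {
    // The value field, constructor, intValue( ) and compareTo( ) are not shown on the slide
    public int hash( int tableSize ) {
        if( value < 0 )
            return -value % tableSize;
        else
            return value % tableSize;
    }
}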
27
public static void main( String [ ] args ) {
    SeparateChainingHashTable H = new SeparateChainingHashTable( );
    final int NUMS = 4000;
    final int GAP = 37;

    // Insert 1..NUMS-1 in scrambled order, remove the odd keys, then check the even keys
    for( int i = GAP; i != 0; i = ( i + GAP ) % NUMS )
        H.insert( new MyInteger( i ) );
    for( int i = 1; i < NUMS; i += 2 )
        H.remove( new MyInteger( i ) );
    for( int i = 2; i < NUMS; i += 2 )
        if( ((MyInteger)(H.find( new MyInteger( i ) ))).intValue( ) != i )
            System.out.println( "Find fails " + i );
}
28
// Class outline; the method bodies appear on the following slides.
// LinkedList here is a custom list class (it provides makeEmpty, zeroth and
// LinkedListItr), not java.util.LinkedList.
public class SeparateChainingHashTable {
    private LinkedList [ ] theLists;

    public SeparateChainingHashTable( )
    public SeparateChainingHashTable( int size )
    public void insert( Hashable x )
    public void remove( Hashable x )
    public Hashable find( Hashable x )
    public void makeEmpty( )
    public static int hash( String key, int tableSize )

    private static final int DEFAULT_TABLE_SIZE = 101;
    private static int nextPrime( int n )
    private static boolean isPrime( int n )
}
29
public class SeparateChainingHashTable {
    public SeparateChainingHashTable( ) {
        this( DEFAULT_TABLE_SIZE );
    }

    public SeparateChainingHashTable( int size ) {
        theLists = new LinkedList[ nextPrime( size ) ];
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ] = new LinkedList( );
    }

    public void makeEmpty( ) {
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ].makeEmpty( );
    }
}
30
public static int hash( String key, int tableSize ) {
    int hashVal = 0;
    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );
    hashVal %= tableSize;
    if( hashVal < 0 )
        hashVal += tableSize;
    return hashVal;
}
31
public void insert( Hashable x ) {
    LinkedList whichList = theLists[ x.hash( theLists.length ) ];
    LinkedListItr itr = whichList.find( x );
    if( itr.isPastEnd( ) )   // insert only if x is not already in the list
        whichList.insert( x, whichList.zeroth( ) );
}

public void remove( Hashable x ) {
    theLists[ x.hash( theLists.length ) ].remove( x );
}

public Hashable find( Hashable x ) {
    return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
}
32
public class Employee implements Hashable {
    public int hash( int tableSize ) {
        return SeparateChainingHashTable.hash( name, tableSize );
    }

    public boolean equals( Object rhs ) {
        return name.equals( ((Employee)rhs).name );
    }

    private String name;
    private double salary;
    private int seniority;
}
33
// Class outline; the method bodies appear on the following slides.
public class QuadraticProbingHashTable {
    public static final int DEFAULT_TABLE_SIZE = 11;
    protected HashEntry [ ] array;
    private int currentSize;

    public QuadraticProbingHashTable( )
    public QuadraticProbingHashTable( int size )
    public void makeEmpty( )
    public Hashable find( Hashable x )
    public void insert( Hashable x )
    public void remove( Hashable x )
    public static int hash( String key, int tableSize )
}
34
class HashEntry {
    Hashable element;   // the element
    boolean isActive;   // false if deleted

    public HashEntry( Hashable e ) { this( e, true ); }

    public HashEntry( Hashable e, boolean i ) {
        element = e;
        isActive = i;
    }
}
35
public class QuadraticProbingHashTable {
    public QuadraticProbingHashTable( ) {
        this( DEFAULT_TABLE_SIZE );
    }

    public QuadraticProbingHashTable( int size ) {
        allocateArray( size );
        makeEmpty( );
    }

    public void makeEmpty( ) {
        currentSize = 0;
        for( int i = 0; i < array.length; i++ )
            array[ i ] = null;
    }

    private void allocateArray( int arraySize ) {
        array = new HashEntry[ arraySize ];
    }
}
36
public Hashable find( Hashable x ) {
    int currentPos = findPos( x );
    return isActive( currentPos ) ? array[ currentPos ].element : null;
}

private int findPos( Hashable x ) {
    int collisionNum = 0;
    int currentPos = x.hash( array.length );

    while( array[ currentPos ] != null &&
           !array[ currentPos ].element.equals( x ) ) {
        // Successive squares differ by 2i - 1, so add that to get the ith probe
        currentPos += 2 * ++collisionNum - 1;
        if( currentPos >= array.length )
            currentPos -= array.length;
    }
    return currentPos;
}
37
private boolean isActive( int currentPos ) {
    return array[ currentPos ] != null && array[ currentPos ].isActive;
}

public void insert( Hashable x ) {
    int currentPos = findPos( x );
    if( isActive( currentPos ) )   // x is already present
        return;
    array[ currentPos ] = new HashEntry( x, true );
    if( ++currentSize > array.length / 2 )   // keep the table at most half full
        rehash( );
}

public void remove( Hashable x ) {
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        array[ currentPos ].isActive = false;   // lazy deletion
}
38
private void rehash( ) {
    HashEntry [ ] oldArray = array;

    // Create a new double-sized, empty table
    allocateArray( nextPrime( 2 * oldArray.length ) );
    currentSize = 0;

    // Copy table over
    for( int i = 0; i < oldArray.length; i++ )
        if( oldArray[ i ] != null && oldArray[ i ].isActive )
            insert( oldArray[ i ].element );
    return;
}
39
private static int nextPrime( int n ) {
    if( n % 2 == 0 ) n++;
    for( ; !isPrime( n ); n += 2 )
        ;
    return n;
}

private static boolean isPrime( int n ) {
    if( n == 2 || n == 3 ) return true;
    if( n == 1 || n % 2 == 0 ) return false;
    for( int i = 3; i * i <= n; i += 2 )
        if( n % i == 0 ) return false;
    return true;
}
40
Summary