Title: Hashing
1Hashing
Problem: storing and retrieving an item by its key (for example, an ID number or a name)
- Linked list
  - takes O(N) time
- Binary search tree
  - takes O(log N) time when the tree is balanced
- Array indexed directly by the key
  - takes O(1) time
2Array
ID: 4112041, Name: Somsri, Faculty: Science
ID: 4163490, Name: Sompong, Faculty: Engineering
Problem: if the ID itself is used as the array index, most of the array is empty space.
3Hashing
ID: 4112041, Name: Somsri, Faculty: Science
ID: 4163490, Name: Sompong, Faculty: Engineering
Map the key to a number between 0 and ArraySize - 1.
4Hashing
- Map the key to an array position using a hash function
  - ArrayIndex = hash(key)
- Takes O(1) time on average to access an item
- Much less empty space than a directly indexed array
5Hash Function
- Must return a valid array index.
- Should ideally be a 1-to-1 mapping.
  - If key1 != key2 then hash(key1) != hash(key2)
  - A collision occurs when two distinct keys hash to the same location in the array
- Should distribute the keys evenly
  - Any key value k is equally likely to hash to any of the m array locations.
6Simple Hash Function
- ArrayIndex = key mod TableSize
- Example (TableSize = 1000)
  - 4112041 -> 12041 mod 1000 -> 41
  - 4163490 -> 63490 mod 1000 -> 490
- TableSize should be a prime number for an even distribution
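As a minimal sketch of the mod-based hash above, the following Java reproduces the two example keys. The class and method names are illustrative assumptions, not part of the slides. `Math.floorMod` is used instead of `%` so that a negative key would still map to a valid index.

```java
class ModHashDemo {
    // Hypothetical helper: maps an integer key to an index in [0, tableSize).
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 1000; // matches the slide's example; a prime size spreads keys more evenly
        System.out.println(hash(4112041, tableSize)); // prints 41
        System.out.println(hash(4163490, tableSize)); // prints 490
    }
}
```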
7Another Hash Function
- ArrayIndex = (k0 + 37*k1 + 37^2*k2 + . . . ) mod TableSize
- Example: 3-character key
  - ArrayIndex = (k0 + 37*k1 + 37^2*k2) mod TableSize
  - ArrayIndex = (k0 + 37*(k1 + 37*(k2))) mod TableSize
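The nested form is Horner's rule: it gives the same value as the polynomial form while using only one multiplication per character. The sketch below checks this for a 3-character key. The class and method names are hypothetical, and `k[i]` is assumed to be the coefficient of 37^i.

```java
class HornerDemo {
    // Direct evaluation of k0 + 37*k1 + 37^2*k2 + ...
    static long polynomial(int[] k) {
        long sum = 0;
        long power = 1; // holds 37^i
        for (int i = 0; i < k.length; i++) {
            sum += k[i] * power;
            power *= 37;
        }
        return sum;
    }

    // Horner's rule: k0 + 37*(k1 + 37*(k2 + ...)), evaluated from the innermost term outwards.
    static long horner(int[] k) {
        long result = 0;
        for (int i = k.length - 1; i >= 0; i--) {
            result = 37 * result + k[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] k = {'a', 'b', 'c'}; // character codes 97, 98, 99
        System.out.println(polynomial(k)); // prints 139254
        System.out.println(horner(k));     // prints 139254
    }
}
```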
8Hash Function
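public static int hash( String key, int tableSize )
{
    int hashVal = 0;

    // Horner's rule: one multiplication by 37 per character
    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );

    hashVal %= tableSize;
    if( hashVal < 0 )   // overflow: % can return a negative remainder
        hashVal += tableSize;

    return hashVal;
}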
9Collision
- When an element is inserted, if it hashes to the same value as an already inserted element, then we have a collision.
- Collision resolution techniques
  - Separate Chaining
  - Open Addressing
    - Linear Probing, Quadratic Probing, Double Hashing
10Separate Chaining
11Separate Chaining
- Load factor λ = number of elements / table size
  - average length of a list = λ
  - successful search cost = 1 + (λ/2) link traversals
  - cost depends on λ
12Separate Chaining: evenly distributed
13Separate Chaining: last digit is zero
Solution: make TableSize prime
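To illustrate why a prime table size helps when the keys share a pattern, the sketch below hashes ten keys that all end in zero into tables of size 10 and 11 and reports the longest chain. The helper name and the key set are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ChainLengthDemo {
    // Hypothetical helper: builds a separate-chaining table with hash(x) = x mod tableSize
    // and returns the length of its longest chain. Assumes non-negative keys.
    static int longestChain(int[] keys, int tableSize) {
        List<List<Integer>> table = new ArrayList<List<Integer>>();
        for (int i = 0; i < tableSize; i++) {
            table.add(new ArrayList<Integer>());
        }
        for (int key : keys) {
            table.get(key % tableSize).add(key);
        }
        int longest = 0;
        for (List<Integer> chain : table) {
            longest = Math.max(longest, chain.size());
        }
        return longest;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(longestChain(keys, 10)); // prints 10: every key lands in cell 0
        System.out.println(longestChain(keys, 11)); // prints 1: every key gets its own cell
    }
}
```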
14Open Addressing
- No linked lists. All items are stored in the array itself.
- If a collision occurs, alternative locations are tried until an empty cell is found
  - try h0(x), h1(x), h2(x), . . .
  - hi(x) = (hash(x) + f(i)) mod TableSize
  - f(i) is the collision resolution strategy
- Requires a bigger table; λ should be below 0.5
15Linear Probing
- If a collision occurs, try the next cell sequentially
  - f(i) = i
  - hi(x) = (hash(x) + i) mod TableSize
  - Try hash(x) mod TableSize, (hash(x) + 1) mod TableSize, (hash(x) + 2) mod TableSize, (hash(x) + 3) mod TableSize, . . .
16Linear Probing
Insert 89, 18, 49, 58, 69 (TableSize = 10, hash(x) = x mod 10)
- 89 is inserted directly into cell 9
- 18 is inserted directly into cell 8
- 49 has a collision at cell 9 and is finally put into cell 0
- 58 has collisions at cells 8, 9, 0 and is finally put into cell 1
- 69 has collisions at cells 9, 0, 1 and is finally put into cell 2
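The insertion sequence above can be replayed with a short simulation. This is only a sketch: it assumes a table of size 10, hash(x) = x mod 10 and non-negative keys, and the class and method names are made up for the example.

```java
import java.util.Arrays;

class LinearProbingDemo {
    // Inserts each key with linear probing, f(i) = i. A null cell means "empty".
    static Integer[] insertAll(int[] keys, int tableSize) {
        Integer[] table = new Integer[tableSize];
        for (int key : keys) {
            int home = key % tableSize;
            for (int i = 0; i < tableSize; i++) {
                int pos = (home + i) % tableSize;
                if (table[pos] == null) {
                    table[pos] = key;
                    break; // stored; move on to the next key
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] table = insertAll(new int[] {89, 18, 49, 58, 69}, 10);
        System.out.println(Arrays.toString(table));
        // prints [49, 58, 69, null, null, null, null, null, 18, 89]
    }
}
```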
17Primary Clustering
- Blocks of occupied cells (called clusters) form in the table
- A collision occurs if a key hashes to anywhere in a cluster. Several attempts may then be needed to resolve the collision before a free cell is found, and the new item is added to the cluster.
18Linear Probing
- Problem: primary clustering
- Normal deletion cannot be performed (some later find operations would fail because the chain of probes that leads to the data is cut)
  - Use lazy deletion: mark the cell as deleted instead of emptying it
- Insertion cost = number of probes to find an empty cell
  - approximately 1/(fraction of empty cells) = 1/(1 - λ), an estimate that ignores clustering
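The deletion problem can be seen in a small linear-probing sketch. It assumes a table of size 10 with hash(x) = x mod 10, non-negative keys, and no key inserted twice; the class and its method names are hypothetical.

```java
class LazyDeletionDemo {
    private static final int SIZE = 10;
    private final Integer[] keys = new Integer[SIZE];    // null = cell never used
    private final boolean[] deleted = new boolean[SIZE]; // lazy-deletion marks

    // Linear probing: returns the cell holding key, or -1 if an empty cell is reached first.
    private int findPos(int key) {
        for (int i = 0; i < SIZE; i++) {
            int pos = (key % SIZE + i) % SIZE;
            if (keys[pos] == null) return -1;
            if (keys[pos] == key) return pos;
        }
        return -1;
    }

    void insert(int key) {
        for (int i = 0; i < SIZE; i++) {
            int pos = (key % SIZE + i) % SIZE;
            if (keys[pos] == null) {
                keys[pos] = key;
                return;
            }
        }
    }

    boolean contains(int key) {
        int pos = findPos(key);
        return pos >= 0 && !deleted[pos];
    }

    // Normal deletion: empties the cell, which can cut the probe chain of other keys.
    void hardRemove(int key) {
        int pos = findPos(key);
        if (pos >= 0) keys[pos] = null;
    }

    // Lazy deletion: keeps the cell occupied and only marks it as deleted.
    void lazyRemove(int key) {
        int pos = findPos(key);
        if (pos >= 0) deleted[pos] = true;
    }

    public static void main(String[] args) {
        LazyDeletionDemo normal = new LazyDeletionDemo();
        normal.insert(89);      // goes to cell 9
        normal.insert(49);      // collides at cell 9, goes to cell 0
        normal.hardRemove(89);  // cell 9 becomes empty
        System.out.println(normal.contains(49)); // prints false: 49 is still stored but unreachable

        LazyDeletionDemo lazy = new LazyDeletionDemo();
        lazy.insert(89);
        lazy.insert(49);
        lazy.lazyRemove(89);    // cell 9 stays occupied, marked deleted
        System.out.println(lazy.contains(49));   // prints true
        System.out.println(lazy.contains(89));   // prints false
    }
}
```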
19Quadratic Probing
- Eliminates primary clustering
- f(i) = i^2
- hi(x) = (hash(x) + i^2) mod TableSize
- Try hash(x) mod TableSize, (hash(x) + 1^2) mod TableSize, (hash(x) + 2^2) mod TableSize, (hash(x) + 3^2) mod TableSize, . . .
- The table must be at most half full and the table size must be prime; otherwise an insertion may fail (every probe hits an occupied cell)
20Quadratic Probing
Insert 89, 18, 49, 58, 69 (TableSize = 10)
- Insert 89: try cell 9
- Insert 18: try cell 8
- Insert 49: try cells 9, 0
- Insert 58: try cells 8, 9, 2
- Insert 69: try cells 9, 0, 3
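The probe sequences in this example can be reproduced with the sketch below. It assumes a table of size 10, hash(x) = x mod 10 and non-negative keys; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class QuadraticProbingDemo {
    // Tries to insert key using h_i(x) = (x mod size + i^2) mod size.
    // Returns the cells probed, in order. Gives up after `size` probes,
    // in which case the key is not stored.
    static List<Integer> insert(Integer[] table, int key) {
        List<Integer> probed = new ArrayList<Integer>();
        int size = table.length;
        for (int i = 0; i < size; i++) {
            int pos = (key % size + i * i) % size;
            probed.add(pos);
            if (table[pos] == null) {
                table[pos] = key;
                break;
            }
        }
        return probed;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) {
            System.out.println(key + " -> " + insert(table, key));
        }
        // prints:
        // 89 -> [9]
        // 18 -> [8]
        // 49 -> [9, 0]
        // 58 -> [8, 9, 2]
        // 69 -> [9, 0, 3]
    }
}
```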
21Quadratic Probing
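Insert 10, 20, 30, 40, 50, 60, 70 (TableSize = 10; the value of i^2 is shown in parentheses)
- Insert 10: try cell 0
- Insert 20: try cells 0, 1
- Insert 30: try cells 0, 1, 4
- Insert 40: try cells 0, 1, 4, 9
- Insert 50: try cells 0, 1, 4, 9, 6 (16)
- Insert 60: try cells 0, 1, 4, 9, 6 (16), 5 (25)
- Insert 70: try cells 0, 1, 4, 9, 6 (16), 5 (25), 6 (36), 9 (49), 4 (64), 1 (81), 0 (100), 1 (121), 4 (144), 9 (169), 6 (196), . . . (an empty cell is never found)
Table after the first six insertions: cell 0 = 10, cell 1 = 20, cell 4 = 30, cell 5 = 60, cell 6 = 50, cell 9 = 40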
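The failure to insert 70 can be checked by listing every cell that quadratic probing can reach from home cell 0 in a table of size 10. The sketch below uses a hypothetical helper for that. Only six cells are reachable, and they are exactly the cells already holding 10 through 60, so the table is more than half full and the probe sequence can never find an empty cell.

```java
import java.util.TreeSet;

class QuadraticReachDemo {
    // Cells that quadratic probing can ever visit from a given home cell.
    // i^2 mod tableSize repeats with period tableSize, so tableSize probes cover every case.
    static TreeSet<Integer> reachableCells(int home, int tableSize) {
        TreeSet<Integer> cells = new TreeSet<Integer>();
        for (int i = 0; i < tableSize; i++) {
            cells.add((home + i * i) % tableSize);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(reachableCells(0, 10)); // prints [0, 1, 4, 5, 6, 9]
    }
}
```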
22Quadratic Probing
- Secondary clustering
  - Elements that hash to the same position probe the same alternative cells and are put into the next available cell along that sequence, forming a cluster.
  - In the first example, inserting 89, 49, 69 forms one secondary cluster. Inserting 18, 58 forms another.
23Double Hashing
- f(i) = i * hash2(x)
- hi(x) = (hash(x) + i * hash2(x)) mod TableSize
- Try hash(x) mod TableSize, (hash(x) + hash2(x)) mod TableSize, (hash(x) + 2 * hash2(x)) mod TableSize, . . .
- Example: hash2(x) = R - (x mod R)
  - R is a prime number smaller than TableSize
24Double Hashing
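Insert 89, 18, 49, 58, 69, 23 (TableSize = 10, R = 7)
- hash2(49) = 7 - (49 mod 7) = 7
- hash2(58) = 7 - (58 mod 7) = 5
- hash2(69) = 7 - (69 mod 7) = 1
- hash2(23) = 7 - (23 mod 7) = 5
- Insert 49: try 9, (9 + 7) mod 10 = 6
- Insert 58: try 8, (8 + 5) mod 10 = 3
- Insert 69: try 9, (9 + 1) mod 10 = 0
- Insert 23: try 3, (3 + 5) mod 10 = 8, (3 + 10) mod 10 = 3, (3 + 15) mod 10 = 8, . . . (the probes cycle between cells 3 and 8, so the insertion fails)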
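The double-hashing example, including the failed insertion of 23, can be replayed with the following sketch. It assumes a table of size 10, hash(x) = x mod 10, R = 7 and non-negative keys; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class DoubleHashingDemo {
    static final int R = 7; // prime smaller than the table size, as in the example

    static int hash2(int x) {
        return R - (x % R);
    }

    // Tries to insert key using h_i(x) = (x mod size + i * hash2(x)) mod size.
    // Returns the cells probed, in order. Gives up after `size` probes,
    // in which case the key is not stored.
    static List<Integer> insert(Integer[] table, int key) {
        List<Integer> probed = new ArrayList<Integer>();
        int size = table.length;
        for (int i = 0; i < size; i++) {
            int pos = (key % size + i * hash2(key)) % size;
            probed.add(pos);
            if (table[pos] == null) {
                table[pos] = key;
                break;
            }
        }
        return probed;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69, 23}) {
            System.out.println(key + " -> " + insert(table, key));
        }
        // prints:
        // 89 -> [9]
        // 18 -> [8]
        // 49 -> [9, 6]
        // 58 -> [8, 3]
        // 69 -> [9, 0]
        // 23 -> [3, 8, 3, 8, 3, 8, 3, 8, 3, 8]
    }
}
```

Because the step size 5 shares a factor with the table size 10, the probes for 23 only ever alternate between two cells; a prime table size avoids this.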
25Rehashing
- When the table is too full, create a new table at least twice as big (with a prime size), compute the new hash value of each element, and insert it into the new table.
- Rehash when the table is half full, or when an insertion fails, or when a certain load factor is reached.
- Because of lazy deletion, deleted cells are also counted when the load factor is calculated.
- Rehashing time is O(N), but the cost is shared by the preceding N/2 insertions, so it adds only a constant cost to each insertion.
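The cost-sharing argument can be checked with a simplified model that only counts element copies. The model is an assumption made for illustration: the capacity starts at 4 and simply doubles (ignoring the prime-size adjustment), and a rehash is triggered when the table becomes more than half full.

```java
class RehashCostDemo {
    // Returns the total number of element copies performed by all rehashes
    // while n elements are inserted one at a time.
    static long copiesDuringInserts(int n) {
        int capacity = 4;
        int size = 0;
        long copies = 0;
        for (int k = 0; k < n; k++) {
            size++;
            if (size > capacity / 2) {
                copies += size; // every stored element is re-inserted into the new table
                capacity *= 2;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println(copiesDuringInserts(1000)); // prints 1031
    }
}
```

For 1000 insertions the rehashes copy 1031 elements in total, which is about one extra copy per insertion.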
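26
public interface Hashable
{
    int hash( int tableSize );
}

public class MyInteger implements Comparable, Hashable
{
    // The value field, constructor, compareTo( ) and intValue( ) are not shown on this slide.

    public int hash( int tableSize )
    {
        if( value < 0 )
            return -value % tableSize;
        else
            return value % tableSize;
    }
}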
27
public static void main( String [ ] args )
{
    SeparateChainingHashTable H = new SeparateChainingHashTable( );

    final int NUMS = 4000;
    final int GAP  = 37;

    // Insert the values 1 .. NUMS - 1 in a scattered order
    for( int i = GAP; i != 0; i = ( i + GAP ) % NUMS )
        H.insert( new MyInteger( i ) );

    // Remove the odd values
    for( int i = 1; i < NUMS; i += 2 )
        H.remove( new MyInteger( i ) );

    // Every even value must still be found
    for( int i = 2; i < NUMS; i += 2 )
        if( ((MyInteger)(H.find( new MyInteger( i ) ))).intValue( ) != i )
            System.out.println( "Find fails " + i );
}
28
public class SeparateChainingHashTable
{
    private LinkedList [ ] theLists;

    public SeparateChainingHashTable( )                  { /* ... */ }
    public SeparateChainingHashTable( int size )         { /* ... */ }
    public void insert( Hashable x )                     { /* ... */ }
    public void remove( Hashable x )                     { /* ... */ }
    public Hashable find( Hashable x )                   { /* ... */ }
    public void makeEmpty( )                             { /* ... */ }
    public static int hash( String key, int tableSize )  { /* ... */ }

    private static final int DEFAULT_TABLE_SIZE = 101;
    private static int nextPrime( int n )                { /* ... */ }
    private static boolean isPrime( int n )              { /* ... */ }
}
29
public class SeparateChainingHashTable
{
    public SeparateChainingHashTable( )
    {
        this( DEFAULT_TABLE_SIZE );
    }

    public SeparateChainingHashTable( int size )
    {
        // LinkedList here is the course's own list class, not java.util.LinkedList
        theLists = new LinkedList[ nextPrime( size ) ];
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ] = new LinkedList( );
    }

    public void makeEmpty( )
    {
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ].makeEmpty( );
    }
}
30
public static int hash( String key, int tableSize )
{
    int hashVal = 0;

    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );

    hashVal %= tableSize;
    if( hashVal < 0 )
        hashVal += tableSize;

    return hashVal;
}
31
public void insert( Hashable x )
{
    LinkedList whichList = theLists[ x.hash( theLists.length ) ];
    LinkedListItr itr = whichList.find( x );

    // Insert at the front of the list only if x is not already present
    if( itr.isPastEnd( ) )
        whichList.insert( x, whichList.zeroth( ) );
}

public void remove( Hashable x )
{
    theLists[ x.hash( theLists.length ) ].remove( x );
}

public Hashable find( Hashable x )
{
    return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
}
32
public class Employee implements Hashable
{
    public int hash( int tableSize )
    {
        return SeparateChainingHashTable.hash( name, tableSize );
    }

    public boolean equals( Object rhs )
    {
        return name.equals( ((Employee)rhs).name );
    }

    private String name;
    private double salary;
    private int seniority;
}
33
public class QuadraticProbingHashTable
{
    public static final int DEFAULT_TABLE_SIZE = 11;

    protected HashEntry [ ] array;
    private int currentSize;

    public QuadraticProbingHashTable( )                  { /* ... */ }
    public QuadraticProbingHashTable( int size )         { /* ... */ }
    public void makeEmpty( )                             { /* ... */ }
    public Hashable find( Hashable x )                   { /* ... */ }
    public void insert( Hashable x )                     { /* ... */ }
    public void remove( Hashable x )                     { /* ... */ }
    public static int hash( String key, int tableSize )  { /* ... */ }
}
34
class HashEntry
{
    Hashable element;   // the element
    boolean  isActive;  // false if deleted

    public HashEntry( Hashable e )
    {
        this( e, true );
    }

    public HashEntry( Hashable e, boolean i )
    {
        element  = e;
        isActive = i;
    }
}
35
public class QuadraticProbingHashTable
{
    public QuadraticProbingHashTable( )
    {
        this( DEFAULT_TABLE_SIZE );
    }

    public QuadraticProbingHashTable( int size )
    {
        allocateArray( size );
        makeEmpty( );
    }

    public void makeEmpty( )
    {
        currentSize = 0;
        for( int i = 0; i < array.length; i++ )
            array[ i ] = null;
    }

    private void allocateArray( int arraySize )
    {
        array = new HashEntry[ arraySize ];
    }
}
36
public Hashable find( Hashable x )
{
    int currentPos = findPos( x );
    return isActive( currentPos ) ? array[ currentPos ].element : null;
}

private int findPos( Hashable x )
{
    int collisionNum = 0;
    int currentPos = x.hash( array.length );

    while( array[ currentPos ] != null &&
           !array[ currentPos ].element.equals( x ) )
    {
        // i^2 - (i-1)^2 = 2i - 1, so adding 2i - 1 moves to the ith quadratic probe
        currentPos += 2 * ++collisionNum - 1;
        if( currentPos >= array.length )    // implement the mod
            currentPos -= array.length;
    }

    return currentPos;
}
37
private boolean isActive( int currentPos )
{
    return array[ currentPos ] != null && array[ currentPos ].isActive;
}

public void insert( Hashable x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )    // x is already present
        return;

    array[ currentPos ] = new HashEntry( x, true );

    // Rehash once the table is more than half full
    if( ++currentSize > array.length / 2 )
        rehash( );
}

public void remove( Hashable x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        array[ currentPos ].isActive = false;   // lazy deletion
}
38
private void rehash( )
{
    HashEntry [ ] oldArray = array;

    // Create a new double-sized, empty table
    allocateArray( nextPrime( 2 * oldArray.length ) );
    currentSize = 0;

    // Copy table over
    for( int i = 0; i < oldArray.length; i++ )
        if( oldArray[ i ] != null && oldArray[ i ].isActive )
            insert( oldArray[ i ].element );

    return;
}
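39
private static int nextPrime( int n )
{
    if( n % 2 == 0 )
        n++;

    // Step through odd numbers until a prime is found
    for( ; !isPrime( n ); n += 2 )
        ;

    return n;
}

private static boolean isPrime( int n )
{
    if( n == 2 || n == 3 )
        return true;

    if( n == 1 || n % 2 == 0 )
        return false;

    // Trial division by odd numbers up to the square root of n
    for( int i = 3; i * i <= n; i += 2 )
        if( n % i == 0 )
            return false;

    return true;
}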
40Summary
- insert and find take constant time on average
- the load factor affects performance
- the load factor of separate chaining hashing should be close to 1
- the load factor of open addressing hashing should not exceed 0.5
41Summary
- Hashing is good when ordering information is not required
- Applications
  - symbol table
  - on-line spelling checker