Hashing - PowerPoint PPT Presentation

About This Presentation
Title:

Hashing

Description:

Problem: storing and retrieving an item using its key (for example, ID number, name) ... Applications: symbol table, on-line spelling checker ...

Slides: 42
Provided by: veeramu

Transcript and Presenter's Notes


1
Hashing
Problem: storing and retrieving an item using its
key (for example, ID number, name)
  • Linked List: takes O(N) time
  • Binary Search Tree: takes O(log N) time
  • Array (indexed directly by the key): takes O(1) time

2
Array
ID: 4112041, Name: Somsri, Faculty: Science
ID: 4163490, Name: Sompong, Faculty: Engineering
Problem: a lot of empty space
3
Hashing
ID: 4112041, Name: Somsri, Faculty: Science
ID: 4163490, Name: Sompong, Faculty: Engineering
Map the key into some number between 0 and
ArraySize-1
4
Hashing
  • Map the key into an array position using a hash
    function
  • ArrayIndex = hash(key)
  • Takes O(1) time to access an item
  • Much less empty space than using a normal array

5
Hash Function
  • Must return a valid array index.
  • Should ideally be a 1-to-1 mapping.
  • If key1 != key2 then hash(key1) != hash(key2)
  • A collision occurs when two distinct keys hash to
    the same location in the array
  • Should distribute the keys evenly
  • Any key value k is equally likely to hash to any
    of the m array locations.

6
Simple Hash Function
  • ArrayIndex = key mod TableSize
  • Example (TableSize = 1000)
  • 4112041 -> 12041 mod 1000 -> 41
  • 4163490 -> 63490 mod 1000 -> 490
  • TableSize should be a prime number for even
    distribution
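
The modulo mapping above can be sketched in Java as follows. This is only an illustration, not code from the slides: the class name SimpleHashDemo and the method simpleHash are hypothetical, and Math.floorMod is an assumption made here so that negative keys still produce a valid index.

```java
// Minimal sketch of the "key mod TableSize" hash; names are illustrative.
class SimpleHashDemo {
    // Maps an integer key to an index in the range [0, tableSize - 1].
    static int simpleHash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(simpleHash(4112041, 1000)); // 41
        System.out.println(simpleHash(4163490, 1000)); // 490
    }
}
```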

7
Another Hash Function
  • ArrayIndex = (k0 + 37 k1 + 37^2 k2 + . . . ) mod
    TableSize
  • Example: 3-character key
  • ArrayIndex = (k0 + 37 k1 + 37^2 k2) mod TableSize
  • ArrayIndex = (k0 + 37 (k1 + 37 (k2))) mod
    TableSize

8
Hash Function
public static int hash( String key, int tableSize ) {
    int hashVal = 0;
    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );

    hashVal %= tableSize;
    if( hashVal < 0 )   // overflow
        hashVal += tableSize;
    return hashVal;
}
9
Collision
  • When an element is inserted, if it hashes to the
    same value as an already inserted element, then
    we have a collision.
  • Collision resolving techniques
  • Separate Chaining
  • Open Addressing
  • Linear Probing, Quadratic Probing, Double
    Hashing

10
Separate Chaining
11
Separate Chaining
  • Load factor λ = number of elements / table size
  • average length of a list = λ
  • successful search cost = 1 + (λ/2) link traversals
  • cost depends on λ

12
Separate Chaining: evenly distributed
13
Separate Chaining: last digit is zero
Solution: TableSize is prime
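
The effect of a prime table size can be illustrated with a small sketch. The keys (multiples of 10), the table sizes 10 and 11, and the names ChainLengthDemo and longestChain are assumptions chosen for illustration; the figure from the slide is not available.

```java
import java.util.Arrays;

// Sketch: how the table size affects chain lengths for keys ending in zero.
class ChainLengthDemo {
    // Returns the length of the longest chain when non-negative keys are
    // hashed with "key mod tableSize" into tableSize buckets.
    static int longestChain(int[] keys, int tableSize) {
        int[] chainLength = new int[tableSize];
        for (int key : keys) {
            chainLength[key % tableSize]++;
        }
        return Arrays.stream(chainLength).max().getAsInt();
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(longestChain(keys, 10)); // 10: every key lands in bucket 0
        System.out.println(longestChain(keys, 11)); // 1: every key gets its own bucket
    }
}
```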
14
Open Addressing
  • No linked lists. All items are stored in the array
  • If a collision occurs, alternative locations are
    tried until an empty cell is found
  • Try h0(x), h1(x), h2(x), . . .
  • hi(x) = (hash(x) + f(i)) mod TableSize
  • f(i) is the collision resolution strategy
  • Requires a bigger table; λ should be below 0.5

15
Linear Probing
  • If a collision occurs, try the next cell
    sequentially
  • f(i) = i
  • hi(x) = (hash(x) + i) mod TableSize
  • Try hash(x) mod TableSize, (hash(x) + 1) mod
    TableSize, (hash(x) + 2) mod TableSize, (hash(x) +
    3) mod TableSize, . . .

16
Linear Probing
Insert 89, 18, 49, 58, 69 (TableSize = 10, hash(x) = x mod 10)
  • 89 is inserted directly into cell 9
  • 18 is inserted directly into cell 8
  • 49 has a collision at cell 9 and is finally put into cell 0
  • 58 has collisions at cells 8, 9, 0 and is finally put into
    cell 1
  • 69 has collisions at cells 9, 0, 1 and is finally put into
    cell 2
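
The insertion sequence above can be reproduced with a short sketch of linear probing. It assumes a table of size 10 and hash(x) = x mod 10, as the cell numbers in the example imply; the class name LinearProbingDemo and the use of null for an empty cell are illustrative choices, not part of the slides.

```java
// Sketch of linear probing insertion: f(i) = i, hash(x) = x mod tableSize.
class LinearProbingDemo {
    // Stores key in the first empty cell of the probe sequence and returns
    // its index, or -1 if the table is full. A null entry marks an empty cell.
    static int insert(Integer[] table, int key) {
        int home = Math.floorMod(key, table.length);
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i) % table.length;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) {
            System.out.println(key + " -> cell " + insert(table, key));
        }
        // Cells used, in insertion order: 9, 8, 0, 1, 2
    }
}
```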
17
Primary Clustering
  • Blocks of occupied cells (called clusters) form
    in the table
  • A collision occurs if a key hashes to any cell
    in a cluster. Several attempts may then be needed
    to resolve the collision before a free cell is
    found, and the new item is added to the cluster,
    making it larger.

18
Linear Probing
  • Problem: primary clustering
  • Normal deletion cannot be performed (some
    later find operations would fail because the
    chain of collisions that leads to the data is
    cut). Use lazy deletion instead.
  • Insertion cost = number of probes to find an
    empty cell
  • ≈ 1 / (fraction of empty cells)
  • = 1 / (1 - λ)
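
To get a feel for the estimate above, the following sketch evaluates 1 / (1 - λ) for a few load factors. It only evaluates the slide's formula and does not measure a real table; the class and method names are hypothetical.

```java
// Sketch: evaluates the slide's insertion-cost estimate 1 / (1 - loadFactor).
class ProbeEstimateDemo {
    // Estimated number of probes needed to find an empty cell.
    static double estimatedProbes(double loadFactor) {
        if (loadFactor < 0.0 || loadFactor >= 1.0) {
            throw new IllegalArgumentException("load factor must be in [0, 1)");
        }
        return 1.0 / (1.0 - loadFactor);
    }

    public static void main(String[] args) {
        for (double lf : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("load factor %.2f -> about %.1f probes%n",
                              lf, estimatedProbes(lf));
        }
    }
}
```

At a load factor of 0.5 the estimate is 2 probes, and at 0.9 it grows to about 10, which is consistent with the advice to keep open-addressing tables at most half full.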

19
Quadratic Probing
  • Eliminates primary clustering
  • f(i) = i^2
  • hi(x) = (hash(x) + i^2) mod TableSize
  • Try hash(x) mod TableSize, (hash(x) + 1^2) mod
    TableSize, (hash(x) + 2^2) mod TableSize,
    (hash(x) + 3^2) mod TableSize, . . .
  • Table must be at most half full and table size
    must be prime, otherwise insertion may fail
    (always have a collision)

20
Quadratic Probing
Insert 89, 18, 49, 58, 69 (TableSize = 10, hash(x) = x mod 10)
  • Insert 89: try cell 9
  • Insert 18: try cell 8
  • Insert 49: try cells 9, 0
  • Insert 58: try cells 8, 9, 2
  • Insert 69: try cells 9, 0, 3
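
The probe sequences above can be checked with a small sketch of quadratic probing. It assumes a table of size 10 and hash(x) = x mod 10, as the cell numbers imply. The name QuadraticProbingDemo is illustrative, and this is a simplified stand-alone version, not the QuadraticProbingHashTable class shown later in the slides.

```java
// Sketch of quadratic probing insertion: f(i) = i^2, hash(x) = x mod tableSize.
class QuadraticProbingDemo {
    // Stores key in the first empty cell of the probe sequence and returns
    // its index, or -1 if no empty cell is found within table.length probes.
    static int insert(Integer[] table, int key) {
        int home = Math.floorMod(key, table.length);
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i * i) % table.length;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) {
            System.out.println(key + " -> cell " + insert(table, key));
        }
        // Cells used, in insertion order: 9, 8, 0, 2, 3
    }
}
```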
21
Quadratic Probing
Insert 10, 20, 30, 40, 50, 60, 70 (TableSize = 10; i^2 shown in parentheses)
  • Insert 10: try cell 0
  • Insert 20: try cells 0, 1
  • Insert 30: try cells 0, 1, 4
  • Insert 40: try cells 0, 1, 4, 9
  • Insert 50: try cells 0, 1, 4, 9, 6 (16)
  • Insert 60: try cells 0, 1, 4, 9, 6 (16), 5 (25)
  • Insert 70: try cells 0, 1, 4, 9, 6 (16), 5 (25),
    6 (36), 9 (49), 4 (64), 1 (81), 0 (100), 1 (121),
    4 (144), 9 (169), 6 (196), . . . ; no empty cell
    is ever reached
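
Why the insertion of 70 never succeeds can be seen by listing every cell that quadratic probing can reach from home cell 0 in a table of size 10. The sketch below is illustrative only; the class and method names are hypothetical.

```java
import java.util.TreeSet;

// Sketch: which cells can quadratic probing reach from a given home cell?
class QuadraticReachDemo {
    // Collects the cells (home + i^2) mod tableSize for i = 0 .. tableSize - 1.
    // Probes beyond that repeat, because (i + tableSize)^2 and i^2 are
    // congruent modulo tableSize.
    static TreeSet<Integer> reachableCells(int home, int tableSize) {
        TreeSet<Integer> cells = new TreeSet<>();
        for (int i = 0; i < tableSize; i++) {
            cells.add((home + i * i) % tableSize);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(reachableCells(0, 10)); // [0, 1, 4, 5, 6, 9]
    }
}
```

Only six of the ten cells are reachable, and after inserting 10 through 60 all six are occupied, so the probe sequence for 70 cycles forever even though four cells are still empty.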
22
Quadratic Probing
  • Secondary clustering
  • Elements that hash to the same position probe
    the same alternative cells and are put into the
    next available space, forming a cluster.
  • In the first example, inserting 89, 49, 69 forms
    a secondary cluster. Inserting 18, 58 forms
    another secondary cluster.

23
Double Hashing
  • f(i) = i * hash2(x)
  • hi(x) = (hash(x) + i * hash2(x)) mod TableSize
  • Try hash(x) mod TableSize, (hash(x) + hash2(x))
    mod TableSize, (hash(x) + 2 * hash2(x)) mod
    TableSize, . . .
  • Example: hash2(x) = R - (x mod R)
  • R is a prime number smaller than TableSize

24
Double Hashing
Insert 89, 18, 49, 58, 69, 23 (TableSize = 10, R = 7)
  • hash2(49) = 7 - (49 mod 7) = 7
  • hash2(58) = 7 - (58 mod 7) = 5
  • hash2(69) = 7 - (69 mod 7) = 1
  • hash2(23) = 7 - (23 mod 7) = 5
  • Insert 49: try 9, (9 + 7) mod 10 = 6
  • Insert 58: try 8, (8 + 5) mod 10 = 3
  • Insert 69: try 9, (9 + 1) mod 10 = 0
  • Insert 23: try 3, (3 + 5) mod 10 = 8, (3 + 10) mod 10 = 3,
    (3 + 15) mod 10 = 8, . . .
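
The example above can be replayed with a short sketch of double hashing. It assumes TableSize = 10, hash(x) = x mod 10 and R = 7, as in the example. The class name DoubleHashingDemo and the choice to give up after TableSize probes are assumptions made for illustration.

```java
// Sketch of double hashing: f(i) = i * hash2(x), hash2(x) = R - (x mod R).
class DoubleHashingDemo {
    static final int R = 7; // prime smaller than the table size, as in the example

    static int hash2(int key) {
        return R - Math.floorMod(key, R);
    }

    // Stores key in the first empty cell of the probe sequence and returns
    // its index, or -1 if no empty cell is found within table.length probes.
    static int insert(Integer[] table, int key) {
        int home = Math.floorMod(key, table.length);
        int step = hash2(key);
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i * step) % table.length;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69, 23}) {
            System.out.println(key + " -> " + insert(table, key));
        }
        // Results, in insertion order: 9, 8, 6, 3, 0, -1
    }
}
```

The insertion of 23 fails because its step, 5, shares a factor with the table size 10, so the probes alternate between cells 3 and 8 only. A prime table size avoids this.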
25
Rehashing
  • When the table is too full, create a new table at
    least twice as big (with a prime size), compute
    the new hash value of each element, and insert it
    into the new table.
  • Rehash when the table is half full, or when an
    insertion fails, or when a certain load factor is
    reached.
  • Because of lazy deletion, deleted cells are also
    counted when the load factor is calculated.
  • Rehashing takes O(N) time, but the cost is shared
    by the preceding N/2 insertions, so it adds only a
    constant cost to each insertion.
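
The amortized-cost argument can be checked with a small simulation that counts how many elements are copied by rehashing. This is a simplified model, not the deck's implementation: the capacity is exactly doubled rather than rounded up to the next prime, and the name RehashCostDemo is hypothetical.

```java
// Sketch: counts element copies caused by rehashing during n insertions.
class RehashCostDemo {
    // Simulates n insertions into a table that is rehashed into a table of
    // twice the capacity whenever it becomes more than half full. Returns
    // the total number of elements copied by all rehashes.
    static long copiesDuringRehash(int n, int initialCapacity) {
        int capacity = initialCapacity;
        int size = 0;
        long copies = 0;
        for (int k = 0; k < n; k++) {
            size++;
            if (size > capacity / 2) {
                copies += size;   // every stored element is re-inserted
                capacity *= 2;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        int n = 1000;
        long copies = copiesDuringRehash(n, 11);
        System.out.println(copies);              // 1410
        System.out.println((double) copies / n); // 1.41 extra copies per insertion
    }
}
```

In this model the total number of copies stays below 2N, so each insertion pays only a constant extra cost on average.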

26
public interface Hashable {
    int hash( int tableSize );
}

public class MyInteger implements Comparable, Hashable {
    public int hash( int tableSize ) {
        if( value < 0 )
            return -value % tableSize;
        else
            return value % tableSize;
    }
}
27
public static void main( String [ ] args ) {
    SeparateChainingHashTable H = new SeparateChainingHashTable( );

    final int NUMS = 4000;
    final int GAP  = 37;

    for( int i = GAP; i != 0; i = ( i + GAP ) % NUMS )
        H.insert( new MyInteger( i ) );
    for( int i = 1; i < NUMS; i += 2 )
        H.remove( new MyInteger( i ) );
    for( int i = 2; i < NUMS; i += 2 )
        if( ((MyInteger)(H.find( new MyInteger( i ) ))).intValue( ) != i )
            System.out.println( "Find fails " + i );
}
28
// Class outline; the method bodies appear on the following slides
public class SeparateChainingHashTable {
    private LinkedList [ ] theLists;

    public SeparateChainingHashTable( )
    public SeparateChainingHashTable( int size )
    public void insert( Hashable x )
    public void remove( Hashable x )
    public Hashable find( Hashable x )
    public void makeEmpty( )
    public static int hash( String key, int tableSize )

    private static final int DEFAULT_TABLE_SIZE = 101;
    private static int nextPrime( int n )
    private static boolean isPrime( int n )
}
29
public class SeparateChainingHashTable {
    public SeparateChainingHashTable( ) {
        this( DEFAULT_TABLE_SIZE );
    }

    public SeparateChainingHashTable( int size ) {
        theLists = new LinkedList[ nextPrime( size ) ];
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ] = new LinkedList( );
    }

    public void makeEmpty( ) {
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ].makeEmpty( );
    }
}
30
public static int hash( String key, int tableSize ) {
    int hashVal = 0;
    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );

    hashVal %= tableSize;
    if( hashVal < 0 )
        hashVal += tableSize;
    return hashVal;
}
31
public void insert( Hashable x ) {
    LinkedList whichList = theLists[ x.hash( theLists.length ) ];
    LinkedListItr itr = whichList.find( x );

    // Insert at the front of the list only if x is not already present
    if( itr.isPastEnd( ) )
        whichList.insert( x, whichList.zeroth( ) );
}

public void remove( Hashable x ) {
    theLists[ x.hash( theLists.length ) ].remove( x );
}

public Hashable find( Hashable x ) {
    return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
}
32
public class Employee implements Hashable {
    public int hash( int tableSize ) {
        return SeparateChainingHashTable.hash( name, tableSize );
    }

    public boolean equals( Object rhs ) {
        return name.equals( ((Employee)rhs).name );
    }

    private String name;
    private double salary;
    private int seniority;
}
33
// Class outline; the method bodies appear on the following slides
public class QuadraticProbingHashTable {
    public static final int DEFAULT_TABLE_SIZE = 11;
    protected HashEntry [ ] array;
    private int currentSize;

    public QuadraticProbingHashTable( )
    public QuadraticProbingHashTable( int size )
    public void makeEmpty( )
    public Hashable find( Hashable x )
    public void insert( Hashable x )
    public void remove( Hashable x )
    public static int hash( String key, int tableSize )
}
34
class HashEntry {
    Hashable element;   // the element
    boolean  isActive;  // false if deleted

    public HashEntry( Hashable e ) { this( e, true ); }

    public HashEntry( Hashable e, boolean i ) {
        element  = e;
        isActive = i;
    }
}
35
public class QuadraticProbingHashTable {
    public QuadraticProbingHashTable( ) {
        this( DEFAULT_TABLE_SIZE );
    }

    public QuadraticProbingHashTable( int size ) {
        allocateArray( size );
        makeEmpty( );
    }

    public void makeEmpty( ) {
        currentSize = 0;
        for( int i = 0; i < array.length; i++ )
            array[ i ] = null;
    }

    private void allocateArray( int arraySize ) {
        array = new HashEntry[ arraySize ];
    }
}
36
public Hashable find( Hashable x ) {
    int currentPos = findPos( x );
    return isActive( currentPos ) ? array[ currentPos ].element : null;
}

private int findPos( Hashable x ) {
    int collisionNum = 0;
    int currentPos = x.hash( array.length );

    while( array[ currentPos ] != null &&
            !array[ currentPos ].element.equals( x ) ) {
        // i^2 - (i-1)^2 = 2i - 1, so this steps to the ith quadratic probe
        currentPos += 2 * ++collisionNum - 1;
        if( currentPos >= array.length )    // wrap around (the mod)
            currentPos -= array.length;
    }
    return currentPos;
}
37
private boolean isActive( int currentPos ) {
    return array[ currentPos ] != null && array[ currentPos ].isActive;
}

public void insert( Hashable x ) {
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        return;

    array[ currentPos ] = new HashEntry( x, true );
    if( ++currentSize > array.length / 2 )
        rehash( );
}

public void remove( Hashable x ) {
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        array[ currentPos ].isActive = false;   // lazy deletion
}
38
private void rehash( ) {
    HashEntry [ ] oldArray = array;

    // Create a new double-sized, empty table
    allocateArray( nextPrime( 2 * oldArray.length ) );
    currentSize = 0;

    // Copy table over
    for( int i = 0; i < oldArray.length; i++ )
        if( oldArray[ i ] != null && oldArray[ i ].isActive )
            insert( oldArray[ i ].element );

    return;
}
39
private static int nextPrime( int n ) {
    if( n % 2 == 0 ) n++;
    for( ; !isPrime( n ); n += 2 ) ;
    return n;
}

private static boolean isPrime( int n ) {
    if( n == 2 || n == 3 ) return true;
    if( n == 1 || n % 2 == 0 ) return false;
    for( int i = 3; i * i <= n; i += 2 )
        if( n % i == 0 ) return false;
    return true;
}
40
Summary
  • insert and find take constant average time
  • load factor affects performance
  • load factor of separate chaining hashing should
    be close to 1
  • load factor of open addressing hashing should not
    exceed 0.5

41
Summary
  • Hashing is good when ordering information is not
    required
  • Applications
  • symbol table
  • on-line spelling checker