Title: ROAD MAP
1 DATA STRUCTURES AND ALGORITHMS
Lecture Notes 7 Prepared by Inanç TAHRALI
2 REVIEW
- We have investigated the following ADTs
  - LISTS
    - Array
    - Linked List
  - STACKS
  - QUEUES
  - TREES
    - Binary Trees
    - Binary Search Trees
    - AVL Trees
- What about their running times?
3 Running times of important operations
               insertion   deletion    find
Array          O(n)        O(n)        O(n)
Linked list    O(1)        O(n)        O(n)
Tree           O(log n)    O(log n)    O(log n)
Can we decrease the running times further?
4 ROAD MAP
- HASHING
  - General Idea
  - Hash Function
  - Separate Chaining
  - Open Addressing
  - Rehashing
5 Hashing
- Hashing: a technique for implementing hash tables
- Hash table: an array of elements
  - of fixed size TableSize
- Search is performed on a part of the item: the key
- Each key is mapped into a number
  - in the range 0 to TableSize-1
  - used as the array index
- The mapping is done by a hash function, which should
  - be simple to compute
  - ideally ensure that any two distinct keys get different cells
- How can insert, delete and find be performed in O(1) time?
6 An ideal hash table
- Each key is mapped to a different index!
  - Not always possible
  - many keys, finite number of indexes
  - Aim for an even distribution of keys over the cells
- Considerations
  - Choose a hash function
  - Decide what to do when two keys hash to the same value
  - Decide on the table size
7 Hash function
- If keys are integers
  - the hash function returns Key mod TableSize
- Ex: TableSize = 10
  - Keys 120, 330, 1000 all hash to cell 0
- TableSize should be prime
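The example above can be checked with a minimal sketch. The name hashInt and the prime table size 13 are choices made for this illustration only; they do not appear on the slides.

```cpp
#include <cassert>

// Hash a non-negative integer key into a cell index in [0, tableSize).
// hashInt is a hypothetical name chosen for this sketch.
int hashInt(int key, int tableSize)
{
    assert(key >= 0 && tableSize > 0);
    return key % tableSize;
}
```

With tableSize = 10, the keys 120, 330 and 1000 all return 0, because every key is a multiple of 10. With the prime size 13 (an assumed value), the same keys land in cells 3, 5 and 12.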
8 Hash function
- If keys are strings
  - Add the ASCII values of the characters
- Poor distribution if TableSize is large and the number of characters is small
  - TableSize = 10000, number of characters in a key = 8
  - 127 * 8 = 1016 < 10000, so only cells 0 to 1016 can ever be used

    int hash( const string & key, int tableSize )
    {
        int hashVal = 0;

        for( int i = 0; i < key.length( ); i++ )
            hashVal += key[ i ];

        return hashVal % tableSize;
    }
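The weakness of the additive scheme can be made concrete with a small sketch. The name additiveHash and the table size 10007 used in the note below are assumptions for illustration; the sketch also assumes keys short enough that the sum fits in an int.

```cpp
#include <cassert>
#include <string>

// Sum the character codes of key and reduce modulo tableSize.
// additiveHash is a hypothetical name chosen for this sketch.
int additiveHash(const std::string& key, int tableSize)
{
    assert(tableSize > 0);
    int hashVal = 0;
    for (unsigned char ch : key)   // unsigned char avoids negative character codes
        hashVal += ch;
    return hashVal % tableSize;
}
```

For a table of 10007 cells, a key of eight characters with code 127 hashes to 1016, the largest value an 8-character ASCII key can reach. The sum also ignores character order, so "ab" and "ba" always collide.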
9 Hash function
- If keys are strings
  - Use all characters
    - $\sum_{i=0}^{KeySize-1} Key[KeySize-i-1] \cdot 32^{i}$
    - For long keys, the early characters do not count (they are shifted out of the result)
  - Use only some number of characters
    - Use the characters in odd positions
10 Hash function
- If keys are strings
  - Use the first three characters
  - 729 * key[2] + 27 * key[1] + key[0]
  - If the keys are not random, some part of the table is not used.

    int hash( const string & key, int tableSize )
    {
        return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
    }
11 A good hash function
    int hash( const string & key, int tableSize )
    {
        int hashVal = 0;

        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key[ i ];

        hashVal %= tableSize;
        if( hashVal < 0 )
            hashVal += tableSize;

        return hashVal;
    }
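The routine above relies on signed overflow and then repairs a negative result, which standard C++ does not define. A possible variant, sketched here under the assumption that unsigned arithmetic is acceptable, keeps the same multiply-by-37 accumulation (Horner's rule) but lets the value wrap in a well-defined way. The name hornerHash is hypothetical.

```cpp
#include <cassert>
#include <string>

// Polynomial hash evaluated with Horner's rule, using multiplier 37.
// Unsigned arithmetic wraps on overflow, which is well defined in C++,
// so no negative value can appear and no correction step is needed.
unsigned int hornerHash(const std::string& key, unsigned int tableSize)
{
    assert(tableSize > 0);
    unsigned int hashVal = 0;
    for (unsigned char ch : key)
        hashVal = 37 * hashVal + ch;
    return hashVal % tableSize;
}
```

Unlike a plain sum of character codes, this function depends on character order: with tableSize = 101, "ab" hashes to 51 and "ba" to 87.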
12 Collision
- The main programming detail is collision resolution
- If an element, when inserted, hashes to the same value as an already inserted element, there is a collision.
- There are several methods to deal with this problem
  - Separate chaining
  - Open addressing
13 Separate Chaining Hash Table
- Keep a list of all elements that hash to the same value
- TableSize = 10
  - is not good
  - not prime
14 Type declaration for separate chaining hash table
    template <class HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( const HashedObj & notFound, int size = 101 );
        HashTable( const HashTable & rhs )
          : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), theLists( rhs.theLists ) { }

        const HashedObj & find( const HashedObj & x ) const;
        void makeEmpty( );
        void insert( const HashedObj & x );
        void remove( const HashedObj & x );
        const HashTable & operator=( const HashTable & rhs );

      private:
        vector<List<HashedObj> > theLists;   // The array of Lists
        const HashedObj ITEM_NOT_FOUND;
    };
15
    // Construct the hash table.
    template <class HashedObj>
    HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
      : ITEM_NOT_FOUND( notFound ), theLists( nextPrime( size ) )
    {
    }

    // Make the hash table logically empty.
    template <class HashedObj>
    void HashTable<HashedObj>::makeEmpty( )
    {
        for( int i = 0; i < theLists.size( ); i++ )
            theLists[ i ].makeEmpty( );
    }

    // Deep copy.
    template <class HashedObj>
    const HashTable<HashedObj> & HashTable<HashedObj>::
    operator=( const HashTable<HashedObj> & rhs )
    {
        if( this != &rhs )
            theLists = rhs.theLists;   // reconstructed: the slide text is cut off after the test
        return *this;
    }
16
    // Remove item x from the hash table.
    template <class HashedObj>
    void HashTable<HashedObj>::remove( const HashedObj & x )
    {
        theLists[ hash( x, theLists.size( ) ) ].remove( x );
    }

    // Find item x in the hash table.
    template <class HashedObj>
    const HashedObj & HashTable<HashedObj>::
    find( const HashedObj & x ) const
    {
        ListItr<HashedObj> itr;
        itr = theLists[ hash( x, theLists.size( ) ) ].find( x );
        if( itr.isPastEnd( ) )
            return ITEM_NOT_FOUND;
        else
            return itr.retrieve( );
    }
17
    // Insert item x into the hash table.
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        List<HashedObj> & whichList = theLists[ hash( x, theLists.size( ) ) ];
        ListItr<HashedObj> itr = whichList.find( x );
        if( itr.isPastEnd( ) )
            whichList.insert( x, whichList.zeroth( ) );
    }
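The class above depends on the course's own List and ListItr types. As a rough, self-contained sketch of the same idea, the version below uses the standard std::vector and std::list containers and stores non-negative int keys only. The class name ChainedIntSet and its member names are assumptions made for this illustration, not part of the lecture code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Minimal separate chaining table for non-negative int keys.
class ChainedIntSet
{
public:
    explicit ChainedIntSet(std::size_t tableSize = 101) : lists(tableSize)
    {
        assert(tableSize > 0);
    }

    bool contains(int x) const
    {
        const std::list<int>& chain = lists[cell(x)];
        return std::find(chain.begin(), chain.end(), x) != chain.end();
    }

    // Insert x at the front of its chain unless it is already present.
    void insert(int x)
    {
        if (!contains(x)) {
            lists[cell(x)].push_front(x);
            ++count;
        }
    }

    void remove(int x)
    {
        std::list<int>& chain = lists[cell(x)];
        auto it = std::find(chain.begin(), chain.end(), x);
        if (it != chain.end()) {
            chain.erase(it);
            --count;
        }
    }

    // Load factor: number of elements / TableSize.
    double loadFactor() const
    {
        return static_cast<double>(count) / lists.size();
    }

private:
    // Hash function: key mod TableSize.
    std::size_t cell(int x) const
    {
        assert(x >= 0);
        return static_cast<std::size_t>(x) % lists.size();
    }

    std::vector<std::list<int>> lists;   // one chain per cell
    std::size_t count = 0;               // number of stored elements
};
```

With a table of 7 cells, the keys 13, 6 and 20 all hash to cell 6 and share one chain, yet each can still be found and removed independently.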
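18 Analysis
- Let $\lambda$ be the load factor of a hash table
  - $\lambda$ = number of elements / TableSize
- $\lambda$ is the average length of a list
- Successful find: about $1 + \lambda/2$ comparisons (the matching node plus, on average, half of the other nodes in its list) + time to evaluate the hash function
- Unsuccessful find and insert: about $\lambda$ comparisons + time to evaluate the hash function
- Good choice: $\lambda \approx 1$
- Disadvantage of separate chaining: list nodes have to be allocated and deallocated!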
19 Open Addressing
- If a collision occurs, try an alternate cell
  - $h_0(x), h_1(x), h_2(x), \ldots$
  - $h_i(x) = (\mathrm{hash}(x) + F(i)) \bmod TableSize$
  - $F(0) = 0$
- $\lambda < 1$
  - Good choice: $\lambda < 0.5$
20 Linear Probing
- F is a linear function of i
  - $F(i) = i$
- Insert keys
  - 89, 18, 49, 58, 69
- When 49 is inserted, a collision occurs
  - It is put into the next available spot, cell 0
- 58 collides with 18, 89 and 49
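The insertion sequence above can be replayed with a short sketch. It assumes TableSize = 10 and hash(x) = x mod 10, which is consistent with 49 wrapping around to cell 0 but is not stated in the slide text. The function name insertLinear and the EMPTY_CELL marker are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY_CELL = -1;   // marker for an unused cell (keys are assumed >= 0)

// Insert the keys in order with linear probing, F(i) = i, and return the table.
std::vector<int> insertLinear(const std::vector<int>& keys, std::size_t tableSize)
{
    assert(keys.size() <= tableSize);   // otherwise no empty cell would be left
    std::vector<int> table(tableSize, EMPTY_CELL);
    for (int key : keys) {
        assert(key >= 0);
        std::size_t pos = static_cast<std::size_t>(key) % tableSize;
        while (table[pos] != EMPTY_CELL)   // collision: try the next cell
            pos = (pos + 1) % tableSize;
        table[pos] = key;
    }
    return table;
}
```

Calling insertLinear({89, 18, 49, 58, 69}, 10) places 89 in cell 9, 18 in cell 8, 49 in cell 0, 58 in cell 1 (after colliding at cells 8, 9 and 0) and 69 in cell 2.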
21 Linear Probing
- Problem: it is not easy to delete an element
  - It may have caused a collision before, so later elements depend on its cell staying occupied
  - Mark the element as deleted instead of removing it
- Problem: primary clustering
22 Linear Probing
Problem: primary clustering
23 Quadratic Probing
- F(i) is a quadratic function
  - Ex: $F(i) = i^2$
24 Quadratic Probing
- When 49 collides with 89, the next position attempted is one cell away
- 58 collides at position 8. The cell one away is tried and another collision occurs. It is inserted into the cell $2^2 = 4$ away
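The two insertions described above can be traced with a sketch of quadratic probing. It assumes the same keys as the linear probing example (89, 18, 49, 58, 69), TableSize = 10 and hash(x) = x mod 10; these values and the names insertQuadratic and EMPTY_CELL are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

const int EMPTY_CELL = -1;   // marker for an unused cell (keys are assumed >= 0)

// Insert the keys in order with quadratic probing, F(i) = i^2, and return the table.
std::vector<int> insertQuadratic(const std::vector<int>& keys, std::size_t tableSize)
{
    assert(tableSize > 0);
    std::vector<int> table(tableSize, EMPTY_CELL);
    for (int key : keys) {
        assert(key >= 0);
        const std::size_t home = static_cast<std::size_t>(key) % tableSize;
        std::size_t pos = home;
        for (std::size_t i = 1; table[pos] != EMPTY_CELL; ++i) {
            if (i >= tableSize)   // give up: the probe sequence only repeats from here
                throw std::runtime_error("no empty cell found by quadratic probing");
            pos = (home + i * i) % tableSize;   // i-th probe is i^2 cells from home
        }
        table[pos] = key;
    }
    return table;
}
```

Under these assumptions 49 goes to cell 0 (one cell past 9), 58 goes to cell 2 (four cells past 8) and 69 goes to cell 3.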
25 Quadratic Probing
- Solves the primary clustering problem
- Not all empty cells may be reachable
  - The probe sequence may loop around full cells
  - Hash table not full, but no empty cell is found
- Theorem: if the table size is prime and $\lambda < 0.5$, a new element can always be inserted.
- Problem: secondary clustering!...
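Why the table size matters can be seen by counting how many different cells a quadratic probe sequence can reach from one home cell. The helper below is a sketch for that experiment only; the name distinctQuadraticCells is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Count the distinct offsets i^2 mod tableSize for i = 0 .. tableSize-1,
// i.e. the number of different cells quadratic probing can visit from one home cell.
std::size_t distinctQuadraticCells(std::size_t tableSize)
{
    assert(tableSize > 0);
    std::set<std::size_t> cells;
    for (std::size_t i = 0; i < tableSize; ++i)
        cells.insert((i * i) % tableSize);
    return cells.size();
}
```

For the prime size 17 the sequence reaches 9 cells, more than half of the table, which agrees with the theorem: if fewer than half of the cells are full, one of the reachable cells must be empty. For the non-prime size 16 only 4 cells are reachable, so an insertion can fail even though most of the table is empty.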
26 Type declaration for open addressing hash table
    template <class HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( const HashedObj & notFound, int size = 101 );
        HashTable( const HashTable & rhs )
          : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ),
            array( rhs.array ), currentSize( rhs.currentSize ) { }

        const HashedObj & find( const HashedObj & x ) const;
        void makeEmpty( );
        void insert( const HashedObj & x );
        void remove( const HashedObj & x );
        const HashTable & operator=( const HashTable & rhs );

        enum EntryType { ACTIVE, EMPTY, DELETED };
27 Type declaration for open addressing hash table
      private:
        struct HashEntry
        {
            HashedObj element;
            EntryType info;

            HashEntry( const HashedObj & e = HashedObj( ), EntryType i = EMPTY )
              : element( e ), info( i ) { }
        };

        vector<HashEntry> array;
        int currentSize;
        const HashedObj ITEM_NOT_FOUND;

        bool isActive( int currentPos ) const;
        int findPos( const HashedObj & x ) const;
        void rehash( );
    };
28
    // Construct the hash table.
    template <class HashedObj>
    HashTable<HashedObj>::
    HashTable( const HashedObj & notFound, int size )
      : ITEM_NOT_FOUND( notFound ), array( nextPrime( size ) )
    {
        makeEmpty( );
    }

    // Make the hash table logically empty.
    template <class HashedObj>
    void HashTable<HashedObj>::makeEmpty( )
    {
        currentSize = 0;
        for( int i = 0; i < array.size( ); i++ )
            array[ i ].info = EMPTY;
    }
29
    // Find item x in the hash table.
    template <class HashedObj>
    const HashedObj & HashTable<HashedObj>::
    find( const HashedObj & x ) const
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return array[ currentPos ].element;
        else
            return ITEM_NOT_FOUND;
    }

    // Method that performs quadratic probing resolution.
    template <class HashedObj>
    int HashTable<HashedObj>::findPos( const HashedObj & x ) const
    {
        int collisionNum = 0;
        int currentPos = hash( x, array.size( ) );

        while( array[ currentPos ].info != EMPTY &&
               array[ currentPos ].element != x )
        {
            // i-th probe: i^2 - (i-1)^2 = 2i - 1 cells past the previous one
            currentPos += 2 * ++collisionNum - 1;
            // Wrap-around and return reconstructed: the slide text is cut off here
            if( currentPos >= array.size( ) )
                currentPos -= array.size( );
        }
        return currentPos;
    }
30
    // Return true if currentPos exists and is active.
    template <class HashedObj>
    bool HashTable<HashedObj>::isActive( int currentPos ) const
    {
        return array[ currentPos ].info == ACTIVE;
    }

    // Remove item x from the hash table.
    template <class HashedObj>
    void HashTable<HashedObj>::remove( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            array[ currentPos ].info = DELETED;
    }

    // Insert routine with quadratic probing (the body appears on slide 40)
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
31
    // Deep copy.
    template <class HashedObj>
    const HashTable<HashedObj> & HashTable<HashedObj>::
    operator=( const HashTable<HashedObj> & rhs )
    {
        if( this != &rhs )
        {
            array = rhs.array;
            currentSize = rhs.currentSize;
        }
        return *this;
    }
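32 Double Hashing
- Use a second hash function
  - $F(i) = i \cdot \mathrm{hash}_2(x)$
- Poor example
  - $\mathrm{hash}_2(x) = X \bmod 9$
  - $\mathrm{hash}_1(x) = X \bmod 10$
  - TableSize = 10
  - If X = 99, what happens? $\mathrm{hash}_2(99) = 0$, so every probe lands on the same cell
- Requirement: $\mathrm{hash}_2(x) \ne 0$ for any X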
33 Double Hashing
- Good choice
  - $\mathrm{hash}_2(x) = R - (X \bmod R)$
  - R is a prime and R < TableSize
34 Double Hashing
$\mathrm{hash}_2(x) = 7 - (X \bmod 7)$
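A small sketch shows double hashing with the second hash function above. It assumes the keys 89, 18, 49, 58, 69 from the earlier probing examples, TableSize = 10 and hash1(x) = x mod 10; those values and the names insertDouble and EMPTY_CELL are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

const int EMPTY_CELL = -1;   // marker for an unused cell (keys are assumed >= 0)

// Insert the keys in order with double hashing:
//   hash1(x) = x mod tableSize,  hash2(x) = r - (x mod r),  F(i) = i * hash2(x).
std::vector<int> insertDouble(const std::vector<int>& keys,
                              std::size_t tableSize, std::size_t r)
{
    assert(tableSize > 0 && r > 0 && r < tableSize);
    std::vector<int> table(tableSize, EMPTY_CELL);
    for (int key : keys) {
        assert(key >= 0);
        const std::size_t k = static_cast<std::size_t>(key);
        const std::size_t home = k % tableSize;
        const std::size_t step = r - (k % r);   // always in 1..r, never 0
        std::size_t pos = home;
        for (std::size_t i = 1; table[pos] != EMPTY_CELL; ++i) {
            if (i >= tableSize)   // give up: the probe sequence only repeats from here
                throw std::runtime_error("no empty cell found by double hashing");
            pos = (home + i * step) % tableSize;
        }
        table[pos] = key;
    }
    return table;
}
```

With r = 7, key 49 collides at cell 9 and moves 7 cells to cell 6, key 58 collides at cell 8 and moves 5 cells to cell 3, and key 69 collides at cell 9 and moves 1 cell to cell 0.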
35 Analysis
- Random collision resolution
  - Probes are independent
  - No clustering problem
- Unsuccessful search and insert
  - Number of probes until an empty cell is found
  - $(1 - \lambda)$ = fraction of cells that are empty
  - $1 / (1 - \lambda)$ = expected number of probes
- Successful search
  - $P(X)$ = number of probes when the element X was inserted
  - $\frac{1}{N} \sum_{X} P(X)$, approximately
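The estimates above can be turned into numbers with a short sketch. The successful-search formula used here is the standard closed form of the average: under the random-probing assumption the insertion cost $1/(1-x)$ is averaged over load factors $x$ from 0 to $\lambda$, giving $\frac{1}{\lambda}\ln\frac{1}{1-\lambda}$. That closed form and both function names are additions for illustration, not taken from the slide.

```cpp
#include <cassert>
#include <cmath>

// Expected probes for an unsuccessful search or an insertion: 1 / (1 - lambda).
double expectedProbesUnsuccessful(double lambda)
{
    assert(lambda >= 0.0 && lambda < 1.0);
    return 1.0 / (1.0 - lambda);
}

// Expected probes for a successful search: (1 / lambda) * ln(1 / (1 - lambda)),
// the average insertion cost over load factors from 0 to lambda.
double expectedProbesSuccessful(double lambda)
{
    assert(lambda > 0.0 && lambda < 1.0);
    return std::log(1.0 / (1.0 - lambda)) / lambda;
}
```

At $\lambda = 0.5$ an insertion needs 2 probes on average and a successful search about 1.39; at $\lambda = 0.9$ an insertion already needs about 10 probes.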
36 Rehashing
- If $\lambda$ gets large, the number of probes increases.
  - Operations start taking too long and insertions might fail
- Solution: rehashing with a larger TableSize (usually about twice as large)
- When to rehash
  - if $\lambda > 0.5$
  - if an insertion fails
37 Rehashing Example
- Elements 13, 15, 24 and 6 are inserted into an open addressing hash table of size 7
- $H(X) = X \bmod 7$
- Linear probing is used to resolve collisions
38 Rehashing Example
- If 23 is inserted, the table is over 70 percent full.
- A new table is created. 17 is the first prime at least twice as large as the old table size (7), so $H_{new}(X) = X \bmod 17$
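The example can be replayed with a short sketch that builds the size-7 table with linear probing and then rehashes it into 17 cells. The function names insertLinear and rehash, the EMPTY_CELL marker and the choice to scan the old table in index order are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY_CELL = -1;   // marker for an unused cell (keys are assumed >= 0)

// Place key in table with linear probing and H(X) = X mod table.size().
// Assumes the table is non-empty and still has at least one empty cell.
void insertLinear(std::vector<int>& table, int key)
{
    assert(key >= 0 && !table.empty());
    std::size_t pos = static_cast<std::size_t>(key) % table.size();
    while (table[pos] != EMPTY_CELL)
        pos = (pos + 1) % table.size();
    table[pos] = key;
}

// Build a table of newSize cells and reinsert every element of oldTable,
// scanning the old cells in index order. Takes O(N) time.
std::vector<int> rehash(const std::vector<int>& oldTable, std::size_t newSize)
{
    assert(newSize >= oldTable.size());
    std::vector<int> newTable(newSize, EMPTY_CELL);
    for (int cell : oldTable)
        if (cell != EMPTY_CELL)
            insertLinear(newTable, cell);
    return newTable;
}
```

After inserting 13, 15, 24, 6 and 23 the old table holds 6, 15, 23, 24 in cells 0 to 3 and 13 in cell 6. Rehashing into 17 cells puts 6 in cell 6, 23 in cell 7, 24 in cell 8, 13 in cell 13 and 15 in cell 15.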
39 Rehashing
- Rehashing is an expensive operation
  - Running time is O(N)
- Rehashing frees the programmer from worrying about table size
- Amortized analysis: average over N operations
  - Operations take O(1) time on average
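40
    // Insert routine with quadratic probing
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return;                        // x is already in the table

        array[ currentPos ] = HashEntry( x, ACTIVE );
        if( ++currentSize > array.size( ) / 2 )
            rehash( );                     // keep the table less than half full
    }

    // Expand the hash table.
    template <class HashedObj>
    void HashTable<HashedObj>::rehash( )
    {
        vector<HashEntry> oldArray = array;

        // Create a new, empty table about twice as large
        array.resize( nextPrime( 2 * oldArray.size( ) ) );
        for( int j = 0; j < array.size( ); j++ )
            array[ j ].info = EMPTY;

        // Copy the active elements over
        // (reconstructed: the slide text is cut off before this step)
        currentSize = 0;
        for( int i = 0; i < oldArray.size( ); i++ )
            if( oldArray[ i ].info == ACTIVE )
                insert( oldArray[ i ].element );
    }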