CS223 Advanced Data Structures - PowerPoint PPT Presentation

Slides: 36
Provided by: holge
1
CS223 Advanced Data Structures
  • Dr. Wenzhan Song
  • Assistant Professor, Computer Science

2
Chapter 5: Hashing
An ideal hash table
3
    int hash( const string & key, int tableSize )
    {
        int hashVal = 0;
        for( int i = 0; i < key.length( ); i++ )
            hashVal += key[ i ];      // add up the character codes
        return hashVal % tableSize;
    }

Fig. 5.2 A simple hash function

    int hash( const string & key, int tableSize )
    {
        // Uses only the first three characters of the key
        return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
    }

Fig. 5.3 Another possible hash function, not too good

Fig. 5.4 A good hash function (a hash routine for string objects)
  • Not necessarily the best with respect to table
    distribution
  • But extremely simple and reasonably fast
  • Typical implementations may use only some of the
    characters (e.g., those in odd positions) to
    calculate the hash
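
The code for Fig. 5.4 is not reproduced in this transcript. As a rough sketch of the kind of function that caption describes, the version below evaluates a polynomial over all characters with Horner's rule. The name hashString, the multiplier 37, and the use of unsigned arithmetic are assumptions made for illustration, not details confirmed by the slide.

```cpp
#include <cassert>
#include <string>

// Hypothetical name; the multiplier 37 is an assumed, commonly used constant.
// Treats the key as the coefficients of a polynomial in 37 and evaluates it
// with Horner's rule. Unsigned arithmetic lets overflow wrap around instead
// of being undefined behaviour.
unsigned int hashString( const std::string & key, unsigned int tableSize )
{
    assert( tableSize > 0 );
    unsigned int hashVal = 0;
    for( unsigned char ch : key )
        hashVal = 37 * hashVal + ch;
    return hashVal % tableSize;
}
```

Unlike the function in Fig. 5.2, this one is sensitive to character order, so "ab" and "ba" land in different cells of a 101-cell table.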

4
Separate Chaining Hashing
5
    template <typename HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( int size = 101 );
        bool contains( const HashedObj & x ) const;
        void makeEmpty( );
        bool insert( const HashedObj & x );   // bool, to match Fig. 5.10
        bool remove( const HashedObj & x );   // bool, to match Fig. 5.9
      private:
        vector<list<HashedObj> > theLists;    // The array of Lists
        int currentSize;
        void rehash( );
        int myhash( const HashedObj & x ) const;
    };

Fig. 5.6 Type declaration for separate chaining hash table
6
    int myhash( const HashedObj & x ) const
    {
        int hashVal = hash( x );
        hashVal %= theLists.size( );
        if( hashVal < 0 )                 // hash( x ) may have overflowed
            hashVal += theLists.size( );
        return hashVal;
    }

Fig. 5.7 myhash member function for hash tables
7
    // Example of an Employee class
    class Employee
    {
      public:
        const string & getName( ) const
          { return name; }
        bool operator==( const Employee & rhs ) const
          { return getName( ) == rhs.getName( ); }
        bool operator!=( const Employee & rhs ) const
          { return !( *this == rhs ); }
        // Additional public members not shown
      private:
        string name;
        double salary;
        int seniority;
    };

Fig. 5.8 Example of a class that can be used as a HashedObj
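
A class like this also needs a hash function before it can be stored in a hash table. One way to supply it is sketched here, using std::unordered_set from the standard library rather than the HashTable class of these slides. The EmployeeHash functor, the constructor, and countDistinctSample are assumptions added for illustration. The point is that the hash uses the same field that operator== compares.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

class Employee
{
  public:
    Employee( const std::string & n, double s, int sen )   // assumed constructor
      : name( n ), salary( s ), seniority( sen ) { }
    const std::string & getName( ) const { return name; }
    bool operator==( const Employee & rhs ) const
      { return getName( ) == rhs.getName( ); }
  private:
    std::string name;
    double salary;
    int seniority;
};

// Hypothetical functor: employees that compare equal (same name) must hash
// to the same value, so only the name takes part in the hash.
struct EmployeeHash
{
    std::size_t operator()( const Employee & e ) const
      { return std::hash<std::string>{ }( e.getName( ) ); }
};

// Returns how many distinct employees (by name) a small sample contains.
std::size_t countDistinctSample( )
{
    std::unordered_set<Employee, EmployeeHash> staff;
    staff.insert( Employee( "Ada", 100.0, 3 ) );
    staff.insert( Employee( "Ada", 250.0, 9 ) );   // same name: treated as a duplicate
    staff.insert( Employee( "Bob", 120.0, 1 ) );
    assert( staff.count( Employee( "Ada", 0.0, 0 ) ) == 1 );
    return staff.size( );
}
```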
8
    void makeEmpty( )
    {
        for( int i = 0; i < theLists.size( ); i++ )
            theLists[ i ].clear( );
        currentSize = 0;                  // keep the element count in sync
    }

    bool contains( const HashedObj & x ) const
    {
        const list<HashedObj> & whichList = theLists[ myhash( x ) ];
        return find( whichList.begin( ), whichList.end( ), x ) != whichList.end( );
    }

    bool remove( const HashedObj & x )
    {
        list<HashedObj> & whichList = theLists[ myhash( x ) ];
        typename list<HashedObj>::iterator itr =
            find( whichList.begin( ), whichList.end( ), x );

        if( itr == whichList.end( ) )
            return false;                 // x is not in the table

        whichList.erase( itr );
        --currentSize;
        return true;
    }

Fig. 5.9 makeEmpty, contains and remove routines for separate chaining hash table
9
    bool insert( const HashedObj & x )
    {
        list<HashedObj> & whichList = theLists[ myhash( x ) ];
        if( find( whichList.begin( ), whichList.end( ), x ) != whichList.end( ) )
            return false;                 // x is already present
        whichList.push_back( x );

        // Rehash; see Section 5.5
        if( ++currentSize > theLists.size( ) )
            rehash( );

        return true;
    }

Fig. 5.10 insert routine for separate chaining hash table
10
Analysis of Separate Chaining
  • Load factor r: the ratio of the number of
    elements in the hash table to the table size
  • The average length of a list is r
  • Search cost
  • Unsuccessful search: visits r nodes on average;
    successful search: traverses about 1 + r/2 links on
    average (the matching node plus about half of the
    other nodes in its list)
  • Conclusion: the table size is not really
    important, but the load factor r is. The general
    rule for separate chaining hashing is to make the
    table size about as large as the number of
    elements expected. In other words, let r ≈ 1.
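
As a quick numeric check of these estimates, the helpers below compute the load factor and the two average search costs. The function names are made up for this sketch.

```cpp
#include <cassert>

// Hypothetical helpers. r is the load factor: elements / table size.
double loadFactor( int elements, int tableSize )
{
    assert( tableSize > 0 );
    return static_cast<double>( elements ) / tableSize;
}

// Average nodes examined by an unsuccessful search: the whole list, r nodes.
double unsuccessfulCost( double r ) { return r; }

// Average links followed by a successful search: about 1 + r/2.
double successfulCost( double r ) { return 1.0 + r / 2.0; }
```

With 101 elements in 101 lists, r = 1, so an unsuccessful search examines 1 node on average and a successful one follows about 1.5 links. Doubling the number of elements without rehashing raises these to 2 and 2.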

11
Hash Tables without Linked Lists
  • Linear Probing
  • Quadratic Probing
  • Double Hashing

12
Linear probing
f(i) = i
Figure: hash table with linear probing, after each insertion
13
Linear probing
Primary clustering: any key that hashes into the
cluster will require several attempts to resolve
the collision
Figure: number of probes plotted against load factor
for linear probing (dashed) and random strategy
(solid). S - successful search, U - unsuccessful
search, I - insertion
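
The figure itself is not part of this transcript, so here is a small sketch of linear probing that makes the clustering visible. It assumes hash(x) = x mod tableSize and non-negative integer keys. The function name and the sample keys mentioned below are illustrative choices, not taken from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts each key with linear probing, f(i) = i, into a table of
// tableSize cells using hash(x) = x mod tableSize. Returns the cell each
// key ended up in, in insertion order. Assumes non-negative keys and
// fewer keys than cells, so a free cell always exists.
std::vector<int> linearProbeSlots( const std::vector<int> & keys, int tableSize )
{
    assert( tableSize > 0 && keys.size( ) < static_cast<std::size_t>( tableSize ) );
    std::vector<bool> occupied( tableSize, false );
    std::vector<int> slots;
    for( int key : keys )
    {
        int pos = key % tableSize;
        while( occupied[ pos ] )
            pos = ( pos + 1 ) % tableSize;   // try the next cell, wrapping around
        occupied[ pos ] = true;
        slots.push_back( pos );
    }
    return slots;
}
```

For the keys 89, 18, 49, 58, 69 and a table of 10 cells, the keys end up in cells 9, 8, 0, 1, 2. Both 58 and 69 need three extra probes because they hash into the cluster that has already formed around cells 8, 9 and 0.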
14
Quadratic probing
f(i) = i^2: a collision resolution method that
eliminates the primary clustering problem of
linear probing. Notice also that f(i) = f(i-1) + 2i - 1
Figure: hash table with quadratic probing, after each
insertion
15
Quadratic probing
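  • Theorem 5.1.
  • If quadratic probing is used, and the table
    size is prime, then a new element can always be
    inserted if the table is at least half empty.
  • Proof
  • Take two probe locations h(x) + i^2 and h(x) + j^2
    (mod TableSize), where 0 <= i, j <= TableSize/2
    (rounded down). Suppose, for the sake of
    contradiction, that the locations are the same,
    but i != j. Then
  • h(x) + i^2 ≡ h(x) + j^2 (mod TableSize)
  • i^2 ≡ j^2 (mod TableSize)
  • (i - j)(i + j) ≡ 0 (mod TableSize)
  • Since TableSize is prime, it would have to divide
    i - j or i + j. This is impossible, because i != j
    and 0 < i + j < TableSize. Contradiction. Hence the
    first half of the probe locations are all distinct,
    and a table that is at least half empty has a free
    cell among them.
  • It is crucial that the table size be prime. If it
    is not prime, the number of alternative locations
    can be severely reduced.
  • For example, if TableSize = 16, the only
    alternative locations are at distances 1, 4, or 9
    from the home position
  • Secondary clustering: elements that hash to the same
    position will probe the same alternative cells.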
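
The effect of a non-prime table size can be checked directly by counting how many different offsets i^2 mod TableSize exist. The helper below is a sketch with a made-up name. It counts the home position (i = 0) as one of the reachable cells.

```cpp
#include <cassert>
#include <set>

// Counts the distinct values of i*i mod tableSize for i = 0 .. tableSize-1,
// i.e. how many different cells (including the home cell, i = 0) quadratic
// probing can ever reach from one starting position.
int distinctQuadraticOffsets( int tableSize )
{
    assert( tableSize > 0 );
    std::set<long long> offsets;
    for( long long i = 0; i < tableSize; ++i )
        offsets.insert( ( i * i ) % tableSize );
    return static_cast<int>( offsets.size( ) );
}
```

For TableSize = 16 the result is 4: the home cell plus the offsets 1, 4 and 9. For the prime 17 it is 9, which is more than half of the table, as Theorem 5.1 requires.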

16
Double hashing
f(i) = i * hash2(x)
Figure: hash table with double hashing, after each
insertion
17
Double hashing
  • One popular choice is f(i) = i * hash2(x)
  • Probe at distance hash2(x), 2 * hash2(x), ...
  • Choose hash2(x) such that it is never 0
  • For example, hash2(x) = x mod 9 is not good,
    because hash2(99) = 0
  • hash2(x) = R - (x mod R), with R a prime smaller
    than TableSize, will work well
  • TableSize must be a prime number
  • In the previous example, imagine inserting 23 into
    the table, where it collides with 58
  • hash2(23) = 7 - 2 = 5
  • 1st try: probe the slot 5 away -> another collision
  • 2nd try: probe the slot 10 away, which wraps around
    to 0 away in a table of 10 slots, i.e., the same
    as the current location
  • Hence, only one alternative location is possible
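
The 23 example can be replayed in code. This sketch assumes the first hash function is x mod TableSize and uses R = 7 as in the slide. The helper names are invented for illustration.

```cpp
#include <cassert>
#include <set>

// Second hash function from the slide: R - (x mod R), never 0 for x >= 0.
int hash2( int x, int R )
{
    assert( R > 0 && x >= 0 );
    return R - ( x % R );
}

// Number of distinct cells visited by the double-hashing probe sequence
// ( x mod tableSize + i * hash2(x) ) mod tableSize, for i = 0 .. tableSize-1.
int distinctDoubleHashCells( int x, int tableSize, int R )
{
    assert( tableSize > 0 );
    std::set<int> cells;
    const int step = hash2( x, R );
    for( int i = 0; i < tableSize; ++i )
        cells.insert( ( x % tableSize + i * step ) % tableSize );
    return static_cast<int>( cells.size( ) );
}
```

With a table of 10 cells, key 23 only ever visits two cells (its home cell and one alternative), because the step 5 shares a factor with 10. With a prime table size such as 11, the same key reaches all 11 cells.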

18
Hash Tables without Linked Lists
  • Standard deletion cannot be performed in a
    probing hash table, because the cell might have
    caused a collision to go past it.
  • Solution: lazy deletion. Each cell carries a flag:
    ACTIVE, EMPTY, or DELETED
  • See Fig. 5.14, 5.15, 5.16, 5.17
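
To see why the DELETED flag matters, consider the small sketch below. It is not the class from the figures: it uses linear probing, stores non-negative ints only, never rehashes, and the name ProbingSet is invented. Removing a key only marks its cell, so later searches still walk past that cell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of lazy deletion in a linear-probing table of non-negative ints.
// Assumes at least one cell always stays EMPTY, so every search terminates.
class ProbingSet
{
  public:
    explicit ProbingSet( std::size_t size ) : cells( size )
    { assert( size > 0 ); }

    bool contains( int x ) const
    { return cells[ findPos( x ) ].info == ACTIVE; }

    bool insert( int x )
    {
        int pos = findPos( x );
        if( cells[ pos ].info == ACTIVE )
            return false;
        cells[ pos ].element = x;
        cells[ pos ].info = ACTIVE;
        return true;
    }

    bool remove( int x )
    {
        int pos = findPos( x );
        if( cells[ pos ].info != ACTIVE )
            return false;
        cells[ pos ].info = DELETED;   // mark only; the probe chain stays intact
        return true;
    }

  private:
    enum EntryType { ACTIVE, EMPTY, DELETED };
    struct HashEntry
    {
        int element = 0;
        EntryType info = EMPTY;
    };
    std::vector<HashEntry> cells;

    // Stops at the first EMPTY cell or at a cell holding x (ACTIVE or DELETED).
    int findPos( int x ) const
    {
        const int n = static_cast<int>( cells.size( ) );
        int pos = x % n;
        while( cells[ pos ].info != EMPTY && cells[ pos ].element != x )
            pos = ( pos + 1 ) % n;
        return pos;
    }
};
```

In a 10-cell table, 89 and 49 both hash to cell 9, so 49 is stored in cell 0. If removing 89 reset cell 9 to EMPTY, a later search for 49 would stop at cell 9 and wrongly report that 49 is absent. With the DELETED mark the search continues and finds it.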

19
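    template <typename HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( int size = 101 );
        bool contains( const HashedObj & x ) const;
        void makeEmpty( );
        bool insert( const HashedObj & x );
        bool remove( const HashedObj & x );
        enum EntryType { ACTIVE, EMPTY, DELETED };
      private:
        struct HashEntry
        {
            HashedObj element;
            EntryType info;
            HashEntry( const HashedObj & e = HashedObj( ), EntryType i = EMPTY )
              : element( e ), info( i ) { }
        };
        // Cut off in the transcript; these are the members that Figs. 5.15-5.17 and 5.22 use
        vector<HashEntry> array;
        int currentSize;
        bool isActive( int currentPos ) const;
        int findPos( const HashedObj & x ) const;
        void rehash( );
        int myhash( const HashedObj & x ) const;
    };

Fig. 5.14 Class interface for hash tables using probing strategies, including the nested HashEntry class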
20
    explicit HashTable( int size = 101 ) : array( nextPrime( size ) )
      { makeEmpty( ); }

    void makeEmpty( )
    {
        currentSize = 0;
        for( int i = 0; i < array.size( ); i++ )
            array[ i ].info = EMPTY;
    }

Fig. 5.15 Routines to initialize quadratic probing hash table
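
The constructor relies on a nextPrime helper that the slides do not show. A simple trial-division version, written here as an assumption about what such a helper might look like, is enough for table sizes.

```cpp
#include <cassert>

// Trial division; fine for table sizes, not meant for very large numbers.
bool isPrime( int n )
{
    if( n < 2 )
        return false;
    for( int d = 2; d <= n / d; ++d )
        if( n % d == 0 )
            return false;
    return true;
}

// Smallest prime that is greater than or equal to n.
int nextPrime( int n )
{
    while( !isPrime( n ) )
        ++n;
    assert( isPrime( n ) );
    return n;
}
```

The default size 101 is already prime, so it is returned unchanged, and doubling a 7-cell table gives nextPrime( 14 ), which is 17.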
21
    bool contains( const HashedObj & x ) const
      { return isActive( findPos( x ) ); }

    int findPos( const HashedObj & x ) const
    {
        int offset = 1;
        int currentPos = myhash( x );

        while( array[ currentPos ].info != EMPTY &&
               array[ currentPos ].element != x )
        {
            currentPos += offset;   // Compute ith probe
            offset += 2;            // f(i) = f(i-1) + 2i - 1
            if( currentPos >= array.size( ) )
                currentPos -= array.size( );
        }
        return currentPos;
    }

    // Helper used above; its body is not shown on the slide
    bool isActive( int currentPos ) const
      { return array[ currentPos ].info == ACTIVE; }

Fig. 5.16 contains routine for hashing with quadratic probing
22
    bool insert( const HashedObj & x )
    {
        // Insert x as active
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return false;

        array[ currentPos ] = HashEntry( x, ACTIVE );

        // Rehash; see Section 5.5
        if( ++currentSize > array.size( ) / 2 )
            rehash( );

        return true;
    }

    bool remove( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( !isActive( currentPos ) )
            return false;

        array[ currentPos ].info = DELETED;   // lazy deletion
        return true;
    }

Fig. 5.17 insert and remove routines for hash tables with quadratic probing
23
Hash Tables without Linked Lists
  • If double hashing is correctly implemented,
    simulations imply that the expected number of
    probes is almost the same as for a random
    collision resolution strategy.
  • Quadratic Probing, however, does not require the
    use of a second hash function and is thus likely
    simpler and faster in practice.

24
Rehashing
  • Motivation
  • Insertion might fail with these probing methods
    once the load factor r rises above a threshold;
    the hash table should then be enlarged to at
    least twice its size.
  • With quadratic probing, three strategies
  • Rehash as soon as the table is half full
  • Rehash only when an insertion fails
  • Middle-of-the-road strategy: rehash when the
    table reaches a certain load factor
  • With a good cutoff, it could be the best

25
Rehashing
Hash table with linear probing with input
13, 15, 6, 24; h(x) = x mod 7
26
Rehashing
Hash table with linear probing after 23 is
inserted
27
Rehashing
  • New hash table after rehashing
  • Scan the previous table and add the numbers
    sequentially: 6, 15, 23, 24, 13
  • In the left figure, the hash table is enlarged from
    7 to 17 (because 17 is the first prime at least
    twice 7), and the new hash function is
    h(x) = x mod 17
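
The example on these three slides can be replayed with a few lines of code. This is a sketch only: it handles non-negative ints, uses -1 as the empty marker, and the names EMPTY_CELL, probeInsert and rehashInto are invented here.

```cpp
#include <cassert>
#include <vector>

const int EMPTY_CELL = -1;   // marker for an unused cell (keys are non-negative)

// Inserts key with linear probing and hash(x) = x mod table.size().
// Assumes the table has at least one empty cell.
void probeInsert( std::vector<int> & table, int key )
{
    const int size = static_cast<int>( table.size( ) );
    assert( size > 0 );
    int pos = key % size;
    while( table[ pos ] != EMPTY_CELL )
        pos = ( pos + 1 ) % size;
    table[ pos ] = key;
}

// Builds a new table of newSize cells and reinserts every key found while
// scanning the old table from cell 0 upward.
std::vector<int> rehashInto( const std::vector<int> & oldTable, int newSize )
{
    assert( newSize > 0 );
    std::vector<int> newTable( newSize, EMPTY_CELL );
    for( int key : oldTable )
        if( key != EMPTY_CELL )
            probeInsert( newTable, key );
    return newTable;
}
```

Inserting 13, 15, 6, 24 and then 23 into a 7-cell table leaves the keys in cells 6, 1, 0, 3 and 2. Rehashing into 17 cells scans them in the order 6, 15, 23, 24, 13 and places them in cells 6, 15, 7, 8 and 13.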

28
Rehashing Implementation
    /**
     * Rehashing for quadratic probing hash table.
     */
    void rehash( )
    {
        vector<HashEntry> oldArray = array;

        // Create new double-sized, empty table
        array.resize( nextPrime( 2 * oldArray.size( ) ) );
        for( int j = 0; j < array.size( ); j++ )
            array[ j ].info = EMPTY;

        // Copy table over
        currentSize = 0;
        for( int i = 0; i < oldArray.size( ); i++ )
            if( oldArray[ i ].info == ACTIVE )
                insert( oldArray[ i ].element );
    }

Fig. 5.22
29
Rehashing Implementation
    /**
     * Rehashing for separate chaining hash table.
     */
    void rehash( )
    {
        vector<list<HashedObj> > oldLists = theLists;

        // Create new double-sized, empty table
        theLists.resize( nextPrime( 2 * theLists.size( ) ) );
        for( int j = 0; j < theLists.size( ); j++ )
            theLists[ j ].clear( );

        // Copy table over
        currentSize = 0;
        for( int i = 0; i < oldLists.size( ); i++ )
        {
            typename list<HashedObj>::iterator itr = oldLists[ i ].begin( );
            while( itr != oldLists[ i ].end( ) )
                insert( *itr++ );         // reinsert, then advance
        }
    }

Fig. 5.22
30
Hash tables in the Standard Library
  • Hash_set: http://msdn.microsoft.com/en-us/library/bksash1t(VS.80).aspx
  • Hash_map: http://msdn.microsoft.com/en-us/library/6x7w9f6z(VS.80).aspx
  • Compare it with your AVL set implementation ...
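
hash_set and hash_map are pre-standard extensions. Since C++11 the standard library itself provides std::unordered_set and std::unordered_map, which expose the load factor and rehashing directly. The sketch below shows those calls; the function name and the sample words are invented for the example.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <unordered_set>

// Builds a small std::unordered_set and returns true if its load factor
// respects the container's own max_load_factor() limit.
bool loadFactorWithinLimit( )
{
    std::unordered_set<std::string> words;
    for( const char * w : { "hash", "table", "probe", "chain", "hash" } )
        words.insert( w );              // the second "hash" is ignored
    assert( words.size( ) == 4 );
    assert( words.count( "probe" ) == 1 );
    words.rehash( 50 );                 // request at least 50 buckets
    assert( words.bucket_count( ) >= 50 );
    return words.load_factor( ) <= words.max_load_factor( );
}
```

The container rehashes on its own whenever an insertion would push the load factor past max_load_factor(), which corresponds to the middle-of-the-road rehashing strategy described earlier.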

31
Extensible Hashing
32
Extensible Hashing
33
Extensible Hashing
34
Summary
  • Hash tables implement the insert and contains
    operations in constant average time
  • Load factor is important for efficiency
  • Comparison with binary search trees
  • Insert and contains (e.g., isElementOf): a BST
    supports these too, in O(log N) average time, and
    is the better choice when ordered operations are
    also needed
  • But if the input is sorted, a plain BST can be
    expensive, and AVL and splay trees need expensive
    operations to stay balanced; in that case hashing
    is the better choice
  • Applications
  • Compilers use hash tables to keep track of
    declared variables in source code: the symbol table
  • Any graph theory problem where the nodes have
    real names instead of numbers
  • Programs that play games: the transposition table
  • Online spelling checkers

35
Summary (continued)
  • Separate chaining hashing requires the use of
    links, which costs some memory, and the standard
    method of implementing calls on memory allocation
    routines, which typically are expensive.
  • Linear probing is easily implemented, but
    performance degrades severely as the load factor
    increases because of primary clustering.
  • Quadratic probing is only slightly more difficult
    to implement and gives good performance in
    practice. An insertion can fail once the table is
    more than half full (or earlier if the table size
    is not prime), but this is not likely. Even if it
    did, such an insertion would be so expensive
    that it wouldn't matter and would almost
    certainly point to a weakness in the hash
    function.
  • Double hashing eliminates primary and secondary
    clustering, but the computation of a second hash
    function can be costly.
  • Gonnet and Baeza-Yates compare several hashing
    strategies; their results suggest that quadratic
    probing is the fastest method.