Title: Hashing as a Dictionary Implementation
1 Hashing as a Dictionary Implementation
2 Chapter Contents
- What is Hashing?
- Hash Functions
- Computing Hash Codes
- Compressing a Hash Code into a Hash Table Index
- Resolving Collisions
- Open Addressing with Linear Probing
- Open Addressing with Quadratic Probing
- Open Addressing with Double Hashing
- A Potential Problem with Open Addressing
- Separate Chaining
3 Chapter Contents (ctd.)
- Efficiency
- Load Factor
- Cost of Open Addressing
- Cost of Separate Chaining
- Rehashing
- Comparing Schemes for Collision Resolution
- A Dictionary Implementation that Uses Hashing
- Entries in the Hash Table
- Data Fields and Constructors
- The Methods getValue, remove, and add
- Iterators
- Java Class Library: The Class HashMap
4 What is Hashing?
- A technique that determines an index for storage of an item in a data structure
- The hash function receives the search key
  - Returns the index of an element in an array called the hash table
  - The index is known as the hash index
- A perfect hash function maps each search key into a different integer suitable as an index to the hash table
5 What is Hashing?
Assume a 911 system maps phone numbers to street addresses. Mapping a phone number (key) to an index (hash) requires knowledge of the domain of keys.
Fig. 19-1 A hash function indexes its hash table.
6 What is Hashing?
- Two steps of the hash function
  - Convert the search key into an integer called the hash code
  - Compress the hash code into the range of indices for the hash table
- Typical hash functions are not perfect
  - They can allow more than one search key to map into a single index
  - This is known as a collision
- Alternative is a sparse table, which wastes memory (i.e., hash tables are a tradeoff)
7 What is Hashing?
Suppose the table size is 101 and the hash code is the last four digits of the phone number: 4294 % 101 = 52 and 8132 % 101 = 52.
Fig. 19-2 A collision caused by the hash function h.
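The two steps (hash code, then compression) can be sketched in a few lines. This is only an illustration: the class and method names are hypothetical, and the keys are phone numbers that appear elsewhere in these slides.

```java
class PhoneHash {
    // Step 1 (assumed scheme): the hash code is the last four digits, e.g. "555-8132" -> 8132.
    static int hashCodeOf(String phone) {
        return Integer.parseInt(phone.substring(phone.length() - 4));
    }

    // Step 2: compress the hash code into the index range 0 .. tableSize - 1.
    static int hashIndex(String phone, int tableSize) {
        return hashCodeOf(phone) % tableSize;
    }

    public static void main(String[] args) {
        // All three numbers map to index 52 in a table of size 101, so they collide.
        for (String phone : new String[] {"555-8132", "555-4294", "555-2072"}) {
            System.out.println(phone + " -> " + hashIndex(phone, 101));
        }
    }
}
```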
8 Hash Functions
- General characteristics of a good hash function
  - Minimize collisions
  - Distribute entries uniformly throughout the hash table
  - Compute quickly
9 Computing Hash Codes
- We will override the hashCode method of Object
- Guidelines
  - If a class overrides the method equals, it should override hashCode
  - If the method equals considers two objects equal, hashCode must return the same value for both objects
  - If an object invokes hashCode more than once during execution of a program on the same data, it must return the same hash code
  - An object's hash code during one execution of a program can differ from its hash code during another execution of the same program
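A minimal sketch of a class that follows these guidelines. PhoneNumber and its fields are hypothetical; the point is only that hashCode is built from exactly the fields that equals compares.

```java
import java.util.Objects;

// Hypothetical key class used only to illustrate the guidelines.
final class PhoneNumber {
    private final String areaCode;
    private final String number;

    PhoneNumber(String areaCode, String number) {
        this.areaCode = areaCode;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PhoneNumber)) return false;
        PhoneNumber that = (PhoneNumber) other;
        return areaCode.equals(that.areaCode) && number.equals(that.number);
    }

    // Uses the same fields as equals, so equal objects always share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(areaCode, number);
    }
}
```

Because the fields are final, repeated calls to hashCode on the same object return the same value during one execution.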
10 Computing Hash Codes
- The hash code for a string, s
    int hash = 0;
    int n = s.length();
    for (int i = 0; i < n; i++)
        hash = g * hash + s.charAt(i);
  where g is a positive constant
- Hash code for a primitive type
  - Use the primitive typed key itself
  - Manipulate internal binary representations
  - Use folding (XOR of the left and right halves of a 64-bit value)
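Both techniques can be tried out directly. The sketch below assumes g = 31, the multiplier that java.lang.String documents for its own hashCode, and folds a long the way Long.hashCode is documented to; the class and method names are hypothetical.

```java
class HashCodes {
    // Polynomial hash of a string; g is the positive constant.
    static int stringHash(String s, int g) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = g * hash + s.charAt(i);   // int overflow simply wraps around
        }
        return hash;
    }

    // Folding: XOR the upper and lower 32-bit halves of a 64-bit value.
    static int foldLong(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        System.out.println(stringHash("hash", 31) == "hash".hashCode());
        System.out.println(foldLong(123456789012345L) == Long.hashCode(123456789012345L));
    }
}
```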
11 Compressing a Hash Code
- Must compress the hash code so it fits into the index range
- Typical method for a code c is to compute c modulo n
  - n is a prime number (and the size of the table)
  - Index will then be between 0 and n - 1
    private int getHashIndex(Object key)
    {
        int hashIndex = key.hashCode() % hashTable.length;
        if (hashIndex < 0)
            hashIndex = hashIndex + hashTable.length;
        return hashIndex;
    }
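The check for a negative result matters because Java's % operator keeps the sign of the dividend. A small sketch, with the table length passed in as a parameter (an assumption made so the method stands alone), compared against Math.floorMod:

```java
class HashIndexDemo {
    // Same idea as getHashIndex, but independent of any particular table field.
    static int hashIndex(Object key, int tableLength) {
        int index = key.hashCode() % tableLength;   // may be negative in Java
        if (index < 0) {
            index += tableLength;
        }
        return index;
    }

    public static void main(String[] args) {
        Integer key = -7;   // Integer.hashCode() is the int value itself
        System.out.println(hashIndex(key, 101));                  // 94
        System.out.println(Math.floorMod(key.hashCode(), 101));   // 94
    }
}
```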
12 Resolving Collisions
- Options when the hash function returns a location already used in the table
  - Use another location in the table
  - Change the structure of the hash table so that each array location can represent multiple values
13 Open Addressing with Linear Probing
- Open addressing locates an alternate location
  - New location must be open, available
- Linear probing
  - If a collision occurs at hashTable[k], look successively at locations k + 1, k + 2, ...
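A minimal sketch of adding with linear probing, assuming integer keys, a plain Integer array as the table, and wraparound at the end of the array. It ignores duplicates and removals; the names are hypothetical.

```java
class LinearProbeDemo {
    // Stores key at the first open location among k, k + 1, k + 2, ... (wrapping around)
    // and returns the index that was used.
    static int add(Integer[] table, int key) {
        int k = Math.floorMod(key, table.length);
        for (int i = 0; i < table.length; i++) {
            int index = (k + i) % table.length;
            if (table[index] == null) {
                table[index] = key;
                return index;
            }
        }
        throw new IllegalStateException("hash table is full");
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[101];
        // Each key hashes to 52, so they land at 52, 53, and 54.
        for (int key : new int[] {8132, 4294, 2072}) {
            System.out.println(key + " stored at " + add(table, key));
        }
    }
}
```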
14 Open Addressing with Linear Probing
Fig. 19-3 The effect of linear probing after
adding four entries whose search keys hash to the
same index.
15 Open Addressing with Linear Probing
Fig. 19-4 A revision of the hash table shown in Fig. 19-3 when linear probing resolves collisions; each entry contains a search key and its associated value.
16 Removals
remove(555-8132) and remove(555-4294), but we don't want the removal of the entries at locations 53 and 54 to cause a search for 555-2072 to fail!
Fig. 19-5 A hash table if remove replaces removed entries with null (bad idea).
17 Removals
- We need to distinguish among three kinds of locations in the hash table
  - Occupied
    - The location references an entry in the dictionary
  - Empty
    - The location contains null and always did
  - Available
    - The location's entry was removed from the dictionary
18 Open Addressing with Linear Probing
Fig. 19-6 A linear probe sequence (a) after adding an entry; (b) after removing two entries.
19 Open Addressing with Linear Probing
Fig. 19-6 A linear probe sequence (c) after a search; (d) during the search while adding an entry; (e) after an addition to a formerly occupied location.
20 Searches that Dictionary Operations Require
- To retrieve an entry
  - Search the probe sequence for the key
  - Examine entries that are present, ignore locations in available state
  - Stop search when key is found or null reached
- To remove an entry
  - Search the probe sequence same as for retrieval
  - If key is found, mark location as available
- To add an entry
  - Search probe sequence same as for retrieval
  - Note first available slot
  - Use available slot if the key is not found
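The three searches can be sketched as one small class. This is an illustration under stated assumptions, not the chapter's implementation: the class and member names are hypothetical, the "available" state is represented by a shared sentinel entry rather than a flag in each entry, and the table never resizes.

```java
// Illustrative sketch; the class and member names are hypothetical.
class ProbingDictionary<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final Entry<K, V>[] table;
    // Sentinel for the "available" state: an entry was here but has been removed.
    private final Entry<K, V> available = new Entry<>(null, null);

    @SuppressWarnings("unchecked")
    ProbingDictionary(int tableSize) {
        table = (Entry<K, V>[]) new Entry[tableSize];
    }

    private int home(K key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Follows the probe sequence; returns the key's index, or -1 once an
    // empty (null) location or a full cycle shows the key is absent.
    private int locate(K key) {
        int index = home(key);
        for (int i = 0; i < table.length; i++) {
            Entry<K, V> entry = table[index];
            if (entry == null) return -1;
            if (entry != available && entry.key.equals(key)) return index;
            index = (index + 1) % table.length;
        }
        return -1;
    }

    V getValue(K key) {
        int index = locate(key);
        return index < 0 ? null : table[index].value;
    }

    V remove(K key) {
        int index = locate(key);
        if (index < 0) return null;
        V removedValue = table[index].value;
        table[index] = available;   // not null, so later searches keep probing
        return removedValue;
    }

    V add(K key, V value) {
        int index = home(key);
        int firstAvailable = -1;
        for (int i = 0; i < table.length; i++) {
            Entry<K, V> entry = table[index];
            if (entry == null || entry == available) {
                if (firstAvailable < 0) firstAvailable = index;   // note first usable slot
                if (entry == null) break;                         // key cannot be further along
            } else if (entry.key.equals(key)) {
                V oldValue = entry.value;
                entry.value = value;                              // key present: replace value
                return oldValue;
            }
            index = (index + 1) % table.length;
        }
        if (firstAvailable < 0) throw new IllegalStateException("hash table is full");
        table[firstAvailable] = new Entry<>(key, value);
        return null;
    }

    public static void main(String[] args) {
        ProbingDictionary<Integer, String> d = new ProbingDictionary<>(101);
        d.add(8132, "first");
        d.add(4294, "second");
        d.add(2072, "third");
        d.remove(8132);
        d.remove(4294);
        System.out.println(d.getValue(2072));   // still found after the two removals
    }
}
```

Marking removed locations with the sentinel instead of null is what lets the search for 2072 continue past locations 52 and 53.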
21 Open Addressing, Quadratic Probing
- Change the probe sequence
  - Given a collision at index k
  - Probe to k + 1, k + 2^2, k + 3^2, ..., k + n^2
- Reaches half of the locations in the hash table if the table size is a prime number
- Avoids primary clustering
  - But can lead to secondary clustering
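To see how much of the table a quadratic probe sequence actually reaches, the sketch below collects the distinct locations visited for a prime table size. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticProbeDemo {
    // Distinct locations visited by k, k + 1, k + 2^2, k + 3^2, ..., taken modulo tableSize.
    static Set<Integer> probeSequence(int k, int tableSize) {
        Set<Integer> visited = new LinkedHashSet<>();
        for (long j = 0; j < tableSize; j++) {
            visited.add((int) ((k + j * j) % tableSize));
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(probeSequence(2, 7));            // [2, 3, 6, 4]
        System.out.println(probeSequence(52, 101).size());  // 51
    }
}
```

For a prime table size p the sequence visits (p + 1) / 2 distinct locations, which is why keeping the table at most half full guarantees that an open location will be found.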
22 Open Addressing, Quadratic Probing
Fig. 19-7 A probe sequence of length 5 using quadratic probing.
23 Open Addressing with Double Hashing
- Resolves collision by examining locations
  - At original hash index
  - Plus an increment determined by 2nd function
- Second hash function
  - Different from first
  - Depends on search key
  - Returns nonzero value
- Reaches every location in hash table if table size is prime
- Avoids both primary and secondary clustering
24 Open Addressing with Double Hashing
Per book (p. 425), h1(16) = 2 and h2(16) = 4.
Fig. 19-8 The first three locations in a probe sequence generated by double hashing for the search key 16.
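A sketch of the probe sequence for key 16. The two hash functions and the table size of 7 are assumptions, chosen only because they reproduce h1(16) = 2 and h2(16) = 4; non-negative integer keys are assumed as well.

```java
import java.util.ArrayList;
import java.util.List;

class DoubleHashingDemo {
    static final int TABLE_SIZE = 7;   // assumed prime table size

    // Assumed hash functions that give h1(16) = 2 and h2(16) = 4.
    static int h1(int key) { return key % TABLE_SIZE; }
    static int h2(int key) { return 5 - key % 5; }   // in 1..5, never zero

    // Starts at h1(key) and repeatedly steps by h2(key), wrapping around the table.
    static List<Integer> probeSequence(int key) {
        List<Integer> sequence = new ArrayList<>();
        int index = h1(key);
        int step = h2(key);
        for (int i = 0; i < TABLE_SIZE; i++) {
            sequence.add(index);
            index = (index + step) % TABLE_SIZE;
        }
        return sequence;
    }

    public static void main(String[] args) {
        System.out.println(probeSequence(16));   // [2, 6, 3, 0, 4, 1, 5]
    }
}
```

Because the table size is prime and the step is nonzero, the sequence visits all seven locations before repeating.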
25 Separate Chaining
- Alter the structure of the hash table
  - Each location can represent multiple values
  - Each location called a bucket
- Bucket can take many forms
  - List
  - Sorted list
  - Chain of linked nodes
  - Array
  - Vector
26 Separate Chaining
Fig. 19-9 A hash table for use with separate chaining; each bucket is a chain of linked nodes.
27 Separate Chaining
Fig. 19-10 Where new entry is inserted into linked bucket when integer search keys are (a) duplicate and unsorted
28 Separate Chaining
Fig. 19-10 Where new entry is inserted into linked bucket when integer search keys are (b) distinct and unsorted
29 Separate Chaining
Fig. 19-10 Where new entry is inserted into linked bucket when integer search keys are (c) distinct and sorted
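A minimal sketch of separate chaining in which each bucket is a chain of linked nodes holding distinct, unsorted keys. The names are hypothetical, and inserting a new node at the head of the chain is a simplifying choice made here, not necessarily the placement shown in the figure.

```java
// Illustrative sketch; the class and member names are hypothetical.
class ChainedDictionary<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedDictionary(int tableSize) {
        buckets = (Node<K, V>[]) new Node[tableSize];
    }

    private int indexOf(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Searches the chain first so keys stay distinct, then links a new node at the head.
    V add(K key, V value) {
        int index = indexOf(key);
        for (Node<K, V> node = buckets[index]; node != null; node = node.next) {
            if (node.key.equals(key)) {
                V oldValue = node.value;
                node.value = value;
                return oldValue;
            }
        }
        buckets[index] = new Node<>(key, value, buckets[index]);
        return null;
    }

    V getValue(K key) {
        for (Node<K, V> node = buckets[indexOf(key)]; node != null; node = node.next) {
            if (node.key.equals(key)) return node.value;
        }
        return null;
    }

    // Unlinks the node; no "available" marker is needed with chaining.
    V remove(K key) {
        int index = indexOf(key);
        Node<K, V> previous = null;
        for (Node<K, V> node = buckets[index]; node != null; previous = node, node = node.next) {
            if (node.key.equals(key)) {
                if (previous == null) buckets[index] = node.next;
                else previous.next = node.next;
                return node.value;
            }
        }
        return null;
    }
}
```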
30 Efficiency Observations
- Successful retrieval or removal
  - Same efficiency as successful search
- Unsuccessful retrieval or removal
  - Same efficiency as unsuccessful search
- Successful addition
  - Same efficiency as unsuccessful search
- Unsuccessful addition
  - Same efficiency as successful search
31 Load Factor
- Perfect hash function not always possible or practical
  - Thus, collisions likely to occur
- As hash table fills
  - Collisions occur more often
- Measure for table fullness, the load factor
32 Cost of Open Addressing
Fig. 19-11 The average number of comparisons
required by a search of the hash table for given
values of the load factor when using linear
probing.
33 Cost of Open Addressing
Fig. 19-12 The average number of comparisons
required by a search of the hash table for given
values of the load factor when using either
quadratic probing or double hashing.
34 Cost of Separate Chaining
Fig. 19-13 Average number of comparisons required
by search of hash table for given values of load
factor when using separate chaining.
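The values in Figs. 19-11 to 19-13 are not reproduced in these slides. As a rough guide, the sketch below evaluates the standard approximations from the classical analysis of hashing, which the figures are assumed to tabulate; the load factor is taken to be the number of entries divided by the number of table locations, and all names are hypothetical.

```java
class SearchCosts {
    static double loadFactor(int entries, int tableSize) {
        return (double) entries / tableSize;
    }

    // Linear probing
    static double linearSuccessful(double lf)   { return 0.5 * (1 + 1 / (1 - lf)); }
    static double linearUnsuccessful(double lf) { return 0.5 * (1 + 1 / ((1 - lf) * (1 - lf))); }

    // Quadratic probing or double hashing
    static double doubleSuccessful(double lf)   { return (1 / lf) * Math.log(1 / (1 - lf)); }
    static double doubleUnsuccessful(double lf) { return 1 / (1 - lf); }

    // Separate chaining
    static double chainSuccessful(double lf)    { return 1 + lf / 2; }
    static double chainUnsuccessful(double lf)  { return lf; }

    public static void main(String[] args) {
        for (double lf : new double[] {0.1, 0.5, 0.9}) {
            System.out.printf("load %.1f: linear %.1f/%.1f, double %.1f/%.1f, chaining %.1f/%.1f%n",
                    lf, linearSuccessful(lf), linearUnsuccessful(lf),
                    doubleSuccessful(lf), doubleUnsuccessful(lf),
                    chainSuccessful(lf), chainUnsuccessful(lf));
        }
    }
}
```

Each pair is printed as successful/unsuccessful. The open-addressing costs grow sharply as the load factor approaches 1, while the chaining costs grow only linearly.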
35 Rehashing
- When load factor becomes too large
  - Expand the hash table
  - Double present size, increase result to next prime number
  - Use method add to place current entries into new hash table
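A sketch of the resizing step, assuming integer keys stored in a plain Integer array with linear probing. The helper names are hypothetical, and the trial-division primality test is just the simplest thing that works for table-sized numbers.

```java
class RehashDemo {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime greater than or equal to n.
    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    // Double the present size, then increase the result to the next prime.
    static int newTableSize(int currentSize) {
        return nextPrime(2 * currentSize);
    }

    // Entries cannot simply be copied: each index depends on the table length,
    // so every key is added again to the larger table.
    static Integer[] rehash(Integer[] oldTable) {
        Integer[] newTable = new Integer[newTableSize(oldTable.length)];
        for (Integer key : oldTable) {
            if (key != null) {
                int index = Math.floorMod(key, newTable.length);
                while (newTable[index] != null) {
                    index = (index + 1) % newTable.length;
                }
                newTable[index] = key;
            }
        }
        return newTable;
    }

    public static void main(String[] args) {
        System.out.println(newTableSize(101));   // 211
    }
}
```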
36 Comparing Schemes for Collision Resolution
Fig. 19-14 Average number of comparisons required by search of hash table versus load factor for four techniques when search is (a) successful; (b) unsuccessful.
37 A Dictionary Implementation That Uses Hashing
Fig. 19-15 A hash table and one of its entry
objects
38 A Dictionary Implementation That Uses Hashing
- Beginning of private class TableEntry
  - Made internal to dictionary class
    private class TableEntry implements java.io.Serializable
    {
        private Object entryKey;
        private Object entryValue;
        private boolean inTable; // true if the entry is in the hash table

        private TableEntry(Object key, Object value)
        {
            entryKey = key;
            entryValue = value;
            inTable = true;
        } // end constructor
        . . .
39 A Dictionary Implementation That Uses Hashing
Fig. 19-16 A hash table containing dictionary
entries, removed entries, and null values.
40 Java Class Library: The Class HashMap
- Assumes search-key objects belong to a class that overrides methods hashCode and equals
- Hash table is collection of buckets
- Constructors
  - public HashMap()
  - public HashMap(int initialSize)
  - public HashMap(int initialSize, float maxLoadFactor)
  - public HashMap(Map table)
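A short usage sketch of java.util.HashMap with String keys, which already override hashCode and equals. The initial size, load factor, and street addresses are made-up values for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapDemo {
    static Map<String, String> buildDirectory() {
        // Initial size 101 and maximum load factor 0.75 are arbitrary illustrative choices.
        Map<String, String> directory = new HashMap<>(101, 0.75f);
        directory.put("555-8132", "12 Elm Street");   // made-up address
        directory.put("555-4294", "7 Oak Avenue");    // made-up address
        return directory;
    }

    public static void main(String[] args) {
        Map<String, String> directory = buildDirectory();
        System.out.println(directory.get("555-8132"));
        directory.remove("555-4294");
        System.out.println(directory.containsKey("555-4294"));

        // The Map-based constructor copies the entries of an existing map.
        Map<String, String> copy = new HashMap<>(directory);
        System.out.println(copy.size());
    }
}
```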