Hashing as a Dictionary Implementation - PowerPoint PPT Presentation

About This Presentation
Title:

Hashing as a Dictionary Implementation

Description:

A Dictionary Implementation That Uses Hashing ... Fig. 19-16 A hash table containing dictionary entries, removed entries, and null values. ... – PowerPoint PPT presentation

Slides: 41
Provided by: steve1789

Transcript and Presenter's Notes



1
Hashing as a Dictionary Implementation
  • Chapter 19

2
Chapter Contents
  • What is Hashing?
  • Hash Functions
  • Computing Hash Codes
  • Compressing a Hash Code into a Hash Table Index
  • Resolving Collisions
  • Open Addressing with Linear Probing
  • Open Addressing with Quadratic Probing
  • Open Addressing with Double Hashing
  • A Potential Problem with Open Addressing
  • Separate Chaining

3
Chapter Contents (ctd.)
  • Efficiency
  • Load Factor
  • Cost of Open Addressing
  • Cost of Separate Chaining
  • Rehashing
  • Comparing Schemes for Collision Resolution
  • A Dictionary Implementation that Uses Hashing
  • Entries in the Hash Table
  • Data Fields and Constructors
  • The Methods getValue, remove, and add
  • Iterators
  • Java Class Library: The Class HashMap

4
What is Hashing?
  • A technique that determines an index for storage
    of an item in a data structure
  • The hash function receives the search key
  • Returns the index of an element in an array
    called the hash table
  • The index is known as the hash index
  • A perfect hash function maps each search key into
    a different integer suitable as an index to the
    hash table

5
What is Hashing?
Assume a 911 system maps phone numbers to
street addresses. Mapping a phone number (key) to
an index (hash) requires knowledge of the domain of keys.
Fig. 19-1 A hash function indexes its hash table.
6
What is Hashing?
  • Two steps of the hash function
  • Convert the search key into an integer called the
    hash code
  • Compress the hash code into the range of indices
    for the hash table
  • Typical hash functions are not perfect
  • They can allow more than one search key to map
    into a single index
  • This is known as a collision
  • The alternative is a sparse table, which wastes
    memory (i.e., hash tables involve a tradeoff)

7
What is Hashing?
Suppose the table size is 101 and the hash code is the
last four digits of the phone number: 8132 % 101 = 52
and 4294 % 101 = 52, so both numbers map to index 52.
Fig. 19-2 A collision caused by the hash function h
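A small sketch of the collision above, assuming (as the slide does) that the hash code is the last four digits of the phone number and the table has 101 locations. The class and method names are hypothetical; the phone numbers are the ones used in the removal example later in this chapter.

```java
// Hypothetical demo: hash code = last four digits, compressed modulo 101.
class PhoneHashDemo {
    static final int TABLE_SIZE = 101;

    // Step 1: convert the search key to an integer hash code, e.g. "555-8132" -> 8132
    static int phoneHashCode(String phone) {
        return Integer.parseInt(phone.substring(phone.length() - 4));
    }

    // Step 2: compress the hash code into the index range 0..TABLE_SIZE - 1
    static int hashIndex(String phone) {
        return phoneHashCode(phone) % TABLE_SIZE;
    }

    public static void main(String[] args) {
        for (String phone : new String[] {"555-8132", "555-4294", "555-2072"}) {
            System.out.println(phone + " -> " + hashIndex(phone));  // all three print 52
        }
    }
}
```

Three different keys landing on index 52 is exactly the situation the collision-resolution schemes below must handle.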
8
Hash Functions
  • General characteristics of a good hash function
  • Minimize collisions
  • Distribute entries uniformly throughout the hash
    table
  • Compute quickly

9
Computing Hash Codes
  • We will override the hashCode method of Object
  • Guidelines
  • If a class overrides the method equals, it should
    override hashCode
  • If the method equals considers two objects equal,
    hashCode must return the same value for both
    objects
  • If hashCode is invoked more than once on the same
    object during one execution of a program, it must
    return the same hash code, provided the data used
    by equals has not changed
  • An object's hash code during one execution of a
    program can differ from its hash code during
    another execution of the same program
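One way to follow these guidelines, sketched for a hypothetical immutable `PhoneNumber` key class: `hashCode` combines exactly the fields that `equals` compares (here via `java.util.Objects.hash`), so equal objects always get equal hash codes, and immutability keeps the value stable across calls.

```java
import java.util.Objects;

class PhoneNumberDemo {
    public static void main(String[] args) {
        PhoneNumber a = new PhoneNumber(212, 555, 8132);
        PhoneNumber b = new PhoneNumber(212, 555, 8132);
        System.out.println(a.equals(b));                   // true
        System.out.println(a.hashCode() == b.hashCode());  // true
    }
}

// Hypothetical search-key class; not part of the original slides.
final class PhoneNumber {
    private final int areaCode;
    private final int exchange;
    private final int line;

    PhoneNumber(int areaCode, int exchange, int line) {
        this.areaCode = areaCode;
        this.exchange = exchange;
        this.line = line;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PhoneNumber)) return false;
        PhoneNumber p = (PhoneNumber) other;
        return areaCode == p.areaCode && exchange == p.exchange && line == p.line;
    }

    // Uses the same three fields as equals, so equal keys hash identically
    @Override
    public int hashCode() {
        return Objects.hash(areaCode, exchange, line);
    }
}
```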

10
Computing Hash Codes
  • The hash code for a string, s
        int hash = 0;
        int n = s.length();
        for (int i = 0; i < n; i++)
           hash = g * hash + s.charAt(i); // g is a positive constant
  • Hash code for a primitive type
  • Use the primitive typed key itself
  • Manipulate internal binary representations
  • Use folding (XOR the left and right halves of a 64-bit key)
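Both techniques can be checked against the Java library. This sketch assumes g = 31, which is the constant `java.lang.String.hashCode` uses, so the loop above reproduces it exactly; folding a `long` by XOR-ing its halves matches `Long.hashCode`. The helper names are hypothetical.

```java
class HashCodeDemo {
    static final int G = 31;  // assumed value of the positive constant g

    // The string hash code from the slide (Horner's rule); int overflow
    // simply wraps around, which is acceptable for a hash code
    static int stringHash(String s) {
        int hash = 0;
        int n = s.length();
        for (int i = 0; i < n; i++) {
            hash = G * hash + s.charAt(i);
        }
        return hash;
    }

    // Folding: XOR the high (left) 32 bits with the low (right) 32 bits
    static int foldLong(long key) {
        return (int) (key ^ (key >>> 32));
    }

    public static void main(String[] args) {
        String s = "Java";
        System.out.println(stringHash(s) == s.hashCode());        // true
        long key = 0x123456789ABCDEF0L;
        System.out.println(foldLong(key) == Long.hashCode(key));  // true
    }
}
```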
11
Compressing a Hash Code
  • Must compress the hash code so it fits into the
    index range
  • Typical method for a hash code c is to compute
    c modulo n
  • n is the size of the table and should be a prime
    number
  • The index will then be between 0 and n - 1

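    private int getHashIndex(Object key)
    {
       int hashIndex = key.hashCode() % hashTable.length;
       if (hashIndex < 0) // % returns a negative value for a negative hash code
          hashIndex = hashIndex + hashTable.length;
       return hashIndex;
    } // end getHashIndex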
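Why getHashIndex needs the `if`: Java's `%` keeps the sign of the dividend, and `hashCode` may return a negative int. The sketch below isolates that logic in a hypothetical `compress` helper (taking the hash code and table size directly) and compares it with the library method `Math.floorMod`, which yields the same non-negative result.

```java
class CompressDemo {
    // Same logic as getHashIndex, with the hash code and table size as parameters
    static int compress(int hashCode, int tableSize) {
        int hashIndex = hashCode % tableSize;
        if (hashIndex < 0) {
            hashIndex = hashIndex + tableSize;
        }
        return hashIndex;
    }

    public static void main(String[] args) {
        int n = 101;                                              // prime table size
        System.out.println(-5 % n);                               // -5: not a valid index
        System.out.println(compress(-5, n));                      // 96
        System.out.println(Math.floorMod(-5, n));                 // 96
        System.out.println(compress(Integer.MIN_VALUE, n) >= 0);  // true
    }
}
```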
12
Resolving Collisions
  • Options when the hash function returns a location
    already used in the table
  • Use another location in the table
  • Change the structure of the hash table so that
    each array location can represent multiple values

13
Open Addressing with Linear Probing
  • Open addressing locates an alternate location
  • The new location must be open (available)
  • Linear probing
  • If a collision occurs at hashTable[k], look
    successively at locations k + 1, k + 2, ...,
    wrapping around to the start of the table
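The linear probe sequence as a short sketch (hypothetical helper name), assuming a starting hash index k with 0 ≤ k < n. The `% n` gives the wrap-around from the last index back to 0.

```java
import java.util.Arrays;

class LinearProbeDemo {
    // First `count` indices examined when probing from hash index k in a table of size n
    static int[] probeSequence(int k, int n, int count) {
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            seq[i] = (k + i) % n;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(52, 101, 4)));  // [52, 53, 54, 55]
        System.out.println(Arrays.toString(probeSequence(99, 101, 4)));  // [99, 100, 0, 1]
    }
}
```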

14
Open Addressing with Linear Probing
Fig. 19-3 The effect of linear probing after
adding four entries whose search keys hash to the
same index.
15
Open Addressing with Linear Probing
Fig. 19-4 A revision of the hash table shown in
Fig. 19-3 when linear probing resolves collisions;
each entry contains a search key and its
associated value
16
Removals
remove(555-8132) and remove(555-4294). BUT we
don't want the removal of the entries at locations
53 and 54 to cause a search for 555-2072 to fail!
Fig. 19-5 A hash table if remove replaces removed
entries with null (bad idea)
17
Removals
  • We need to distinguish among three kinds of
    locations in the hash table
  • Occupied
  • The location references an entry in the
    dictionary
  • Empty
  • The location contains null and always did
  • Available
  • The location's entry was removed from the
    dictionary

18
Open Addressing with Linear Probing
Fig. 19-6 A linear probe sequence (a) after
adding an entry (b) after removing two entries
19
Open Addressing with Linear Probing
Fig. 19-6 A linear probe sequence (c) after a
search (d) during the search while adding an
entry (e) after an addition to a formerly
occupied location.
20
Searches that Dictionary Operations Require
  • To retrieve an entry
  • Search the probe sequence for the key
  • Examine entries that are present, ignore
    locations in available state
  • Stop the search when the key is found or a null
    location is reached
  • To remove an entry
  • Search the probe sequence same as for retrieval
  • If key is found, mark location as available
  • To add an entry
  • Search probe sequence same as for retrieval
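  • Note the first available location
  • If the key is not found, add the entry at that
    available location (or at the null location that
    ended the search, if none was available)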
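The three searches can be sketched as a minimal open-addressing dictionary with linear probing. All names are hypothetical, and to stay short it never resizes, so it assumes the table is never completely full. A `null` slot is empty; an entry whose `inTable` flag is false is available (removed), which keeps later probe sequences intact.

```java
class LinearProbingDemo {
    public static void main(String[] args) {
        ProbingDictionary<Integer, String> d = new ProbingDictionary<>(7);
        d.add(3, "a");
        d.add(10, "b");   // 10 % 7 == 3: collides, probes to index 4
        d.add(17, "c");   // 17 % 7 == 3: probes to index 5
        d.remove(10);     // index 4 becomes available, not empty
        System.out.println(d.getValue(17));  // c (search continues past index 4)
        d.add(24, "d");   // 24 % 7 == 3: reuses the available index 4
        System.out.println(d.getValue(24));  // d
    }
}

// Hypothetical sketch; no rehashing, so keep the load factor below 1.
class ProbingDictionary<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        boolean inTable = true;   // false means the location is available
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final Entry<K, V>[] table;

    @SuppressWarnings("unchecked")
    ProbingDictionary(int size) {
        table = (Entry<K, V>[]) new Entry[size];
    }

    private int hashIndex(K key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Follows the probe sequence, ignoring available locations;
    // stops at the key (returns its index) or at a null location (returns -1)
    private int locate(K key) {
        int index = hashIndex(key);
        for (int probes = 0; probes < table.length && table[index] != null; probes++) {
            if (table[index].inTable && table[index].key.equals(key)) return index;
            index = (index + 1) % table.length;
        }
        return -1;
    }

    V getValue(K key) {
        int index = locate(key);
        return index >= 0 ? table[index].value : null;
    }

    V remove(K key) {
        int index = locate(key);
        if (index < 0) return null;
        table[index].inTable = false;   // mark available; do NOT set to null
        return table[index].value;
    }

    // Returns the old value if the key was present, otherwise null
    V add(K key, V value) {
        int index = hashIndex(key);
        int firstAvailable = -1;
        for (int probes = 0; probes < table.length && table[index] != null; probes++) {
            Entry<K, V> e = table[index];
            if (e.inTable && e.key.equals(key)) {   // key found: replace its value
                V old = e.value;
                e.value = value;
                return old;
            }
            if (!e.inTable && firstAvailable < 0) firstAvailable = index;
            index = (index + 1) % table.length;
        }
        int target = firstAvailable >= 0 ? firstAvailable : index;
        if (table[target] != null && table[target].inTable) {
            throw new IllegalStateException("table is full");
        }
        table[target] = new Entry<>(key, value);
        return null;
    }
}
```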

21
Open Addressing, Quadratic Probing
  • Change the probe sequence
  • Given search key k
  • Probe to k + 1, k + 2², k + 3², ..., k + n²
    (all modulo the table size)
  • If the table size is a prime number, reaches at
    least half of the locations in the hash table
  • Avoids primary clustering
  • But can lead to secondary clustering
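A quick way to see which locations a quadratic probe sequence can reach: this sketch (hypothetical names, nonnegative k assumed) collects k + i² mod n for i = 0, ..., n - 1, which covers every offset because i² mod n repeats with period n. For a prime table size the count is (n + 1) / 2, so an addition may find no free location once more than half the table is full.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticProbeDemo {
    // Distinct indices probed from hash index k: k, k + 1², k + 2², ... (mod n)
    static Set<Integer> reachable(int k, int n) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add((int) ((k + (long) i * i) % n));   // long avoids overflow of i * i
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(0, 7));         // [0, 1, 4, 2]
        System.out.println(reachable(0, 7).size());  // 4 of 7 locations
    }
}
```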

22
Open Addressing, Quadratic Probing
Fig. 19-7 A probe sequence of length 5 using
quadratic probing.
23
Open Addressing with Double Hashing
  • Resolves collision by examining locations
  • At original hash index
  • Plus an increment determined by a second function
  • Second hash function
  • Different from first
  • Depends on search key
  • Returns nonzero value
  • Reaches every location in hash table if table
    size is prime
  • Avoids both primary and secondary clustering

24
Open Addressing with Double Hashing
Per the book (p. 425), h1(16) = 2 and h2(16) = 4
Fig. 19-8 The first three locations in a probe
sequence generated by double hashing for the
search key 16.
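A sketch of that probe sequence. The slide gives only the values h1(16) = 2 and h2(16) = 4, so the functions below, h1(key) = key % 7 and h2(key) = 5 - key % 5 with a table of size 7, are assumptions chosen to reproduce them. Because 7 is prime and h2 never returns 0, the sequence visits every location once before repeating.

```java
import java.util.Arrays;

class DoubleHashDemo {
    static final int TABLE_SIZE = 7;  // prime (assumed)

    // Hypothetical hash functions chosen so that h1(16) = 2 and h2(16) = 4
    static int h1(int key) { return key % TABLE_SIZE; }
    static int h2(int key) { return 5 - key % 5; }   // always 1..5, never 0

    // First `count` probe indices: h1, h1 + h2, h1 + 2*h2, ... (mod TABLE_SIZE)
    static int[] probeSequence(int key, int count) {
        int[] seq = new int[count];
        int index = h1(key);
        for (int i = 0; i < count; i++) {
            seq[i] = index;
            index = (index + h2(key)) % TABLE_SIZE;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(16, 3)));  // [2, 6, 3]
        System.out.println(Arrays.toString(probeSequence(16, 7)));  // [2, 6, 3, 0, 4, 1, 5]
    }
}
```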
25
Separate Chaining
  • Alter the structure of the hash table
  • Each location can represent multiple values
  • Each location is called a bucket
  • Bucket can take many forms
  • List
  • Sorted list
  • Chain of linked nodes
  • Array
  • Vector
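A minimal separate-chaining sketch in which each bucket is a `java.util.LinkedList` of entries with distinct keys, unsorted, new entries appended at the end after checking for the key. Class names are hypothetical. Since `Integer.hashCode()` is the value itself, the keys 8132, 4294, and 2072 all fall into bucket 52 of a 101-location table, and removal simply unlinks a node with no "available" marker.

```java
import java.util.Iterator;
import java.util.LinkedList;

class ChainingDemo {
    public static void main(String[] args) {
        ChainedDictionary<Integer, String> d = new ChainedDictionary<>(101);
        d.add(8132, "first");
        d.add(4294, "second");
        d.add(2072, "third");
        d.remove(4294);
        System.out.println(d.getValue(2072));  // third
        System.out.println(d.bucketSize(52));  // 2
    }
}

// Hypothetical sketch; buckets are unsorted chains of distinct keys.
class ChainedDictionary<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedDictionary(int size) {
        buckets = (LinkedList<Entry<K, V>>[]) new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    V getValue(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    // Replaces the value if the key is already in its bucket; otherwise appends a node
    V add(K key, V value) {
        LinkedList<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        bucket.addLast(new Entry<>(key, value));
        return null;
    }

    V remove(K key) {
        Iterator<Entry<K, V>> it = bucketFor(key).iterator();
        while (it.hasNext()) {
            Entry<K, V> e = it.next();
            if (e.key.equals(key)) {
                it.remove();   // unlink the node from the chain
                return e.value;
            }
        }
        return null;
    }

    int bucketSize(int index) { return buckets[index].size(); }
}
```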

26
Separate Chaining
Fig. 19-9 A hash table for use with separate
chaining; each bucket is a chain of linked nodes.
27
Separate Chaining
Fig. 19-10 Where a new entry is inserted into a
linked bucket when the integer search keys are (a)
duplicate and unsorted
28
Separate Chaining
Fig. 19-10 Where a new entry is inserted into a
linked bucket when the integer search keys are (b)
distinct and unsorted
29
Separate Chaining
Fig. 19-10 Where a new entry is inserted into a
linked bucket when the integer search keys are (c)
distinct and sorted
30
Efficiency Observations
  • Successful retrieval or removal
  • Same efficiency as successful search
  • Unsuccessful retrieval or removal
  • Same efficiency as unsuccessful search
  • Successful addition
  • Same efficiency as unsuccessful search
  • Unsuccessful addition
  • Same efficiency as successful search

31
Load Factor
  • Perfect hash function not always possible or
    practical
  • Thus, collisions are likely to occur
  • As the hash table fills
  • Collisions occur more often
  • A measure of table fullness is the load factor,
    λ = (number of entries) / (number of locations
    in the hash table)

32
Cost of Open Addressing
Fig. 19-11 The average number of comparisons
required by a search of the hash table for given
values of the load factor when using linear
probing.
33
Cost of Open Addressing
Fig. 19-12 The average number of comparisons
required by a search of the hash table for given
values of the load factor when using either
quadratic probing or double hashing.
34
Cost of Separate Chaining
Fig. 19-13 The average number of comparisons required
by a search of the hash table for given values of the
load factor when using separate chaining.
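Curves like those in Figs. 19-11 to 19-13 are usually drawn from the following standard approximations (from Knuth's analysis, assuming keys are spread uniformly), where S(λ) and U(λ) are the average numbers of comparisons for a successful and an unsuccessful search and λ is the load factor. The open-addressing formulas require λ < 1; with separate chaining λ may exceed 1. The quadratic probing / double hashing pair is really the uniform-hashing estimate that both schemes approximate.

```latex
\text{Linear probing:}\quad
S(\lambda) \approx \tfrac{1}{2}\left(1 + \frac{1}{1-\lambda}\right), \qquad
U(\lambda) \approx \tfrac{1}{2}\left(1 + \frac{1}{(1-\lambda)^{2}}\right)

\text{Quadratic probing, double hashing:}\quad
S(\lambda) \approx \frac{1}{\lambda}\ln\frac{1}{1-\lambda}, \qquad
U(\lambda) \approx \frac{1}{1-\lambda}

\text{Separate chaining:}\quad
S(\lambda) \approx 1 + \frac{\lambda}{2}, \qquad
U(\lambda) \approx \lambda

\text{Linear probing at } \lambda = 0.5:\quad
S \approx \tfrac{1}{2}(1 + 2) = 1.5, \qquad
U \approx \tfrac{1}{2}(1 + 4) = 2.5
```

The squared term is why linear probing degrades sharply as the table fills: at λ = 0.9 its unsuccessful-search estimate is ½(1 + 100) = 50.5 comparisons.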
35
Rehashing
  • When load factor becomes too large
  • Expand the hash table
  • Double present size, increase result to next
    prime number
  • Use method add to place current entries into new
    hash table
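A rehashing sketch under simplifying assumptions: keys are non-negative `Integer`s, collisions use linear probing, the table holds no removed (available) entries, and the maximum load factor of 0.5 is a hypothetical choice. Every key must be re-added because `key % oldSize` and `key % newSize` generally differ. All names are hypothetical.

```java
class RehashDemo {
    static final double MAX_LOAD_FACTOR = 0.5;  // assumed threshold

    // Load factor = number of entries / number of locations
    static double loadFactor(int entries, int tableSize) {
        return (double) entries / tableSize;
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime that is >= n
    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) candidate++;
        return candidate;
    }

    // Double the size, round up to a prime, and re-add every key
    static Integer[] rehash(Integer[] oldTable) {
        Integer[] newTable = new Integer[nextPrime(2 * oldTable.length)];
        for (Integer key : oldTable) {
            if (key != null) {
                int index = key % newTable.length;      // keys assumed non-negative
                while (newTable[index] != null) {       // linear probing
                    index = (index + 1) % newTable.length;
                }
                newTable[index] = key;
            }
        }
        return newTable;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7];
        table[3] = 3;
        table[4] = 10;  // 10 % 7 == 3, placed at 4 by linear probing
        table[5] = 17;  // 17 % 7 == 3, placed at 5
        table[6] = 24;  // 24 % 7 == 3, placed at 6
        if (loadFactor(4, table.length) > MAX_LOAD_FACTOR) {
            table = rehash(table);
        }
        System.out.println(table.length);  // 17
        System.out.println(table[7]);      // 24 (24 % 17 == 7)
    }
}
```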

36
Comparing Schemes for Collision Resolution
Fig. 19-14 The average number of comparisons required
by a search of the hash table versus the load factor
for four techniques when the search is (a) successful;
(b) unsuccessful.
37
A Dictionary Implementation That Uses Hashing
Fig. 19-15 A hash table and one of its entry
objects
38
A Dictionary Implementation That Uses Hashing
  • Beginning of private class TableEntry
  • Made internal to the dictionary class

    private class TableEntry implements java.io.Serializable
    {
       private Object entryKey;
       private Object entryValue;
       private boolean inTable; // true if the entry is in the hash table,
                                // false if it has been removed

       private TableEntry(Object key, Object value)
       {
          entryKey = key;
          entryValue = value;
          inTable = true;
       } // end constructor
       . . .
39
A Dictionary Implementation That Uses Hashing
Fig. 19-16 A hash table containing dictionary
entries, removed entries, and null values.
40
Java Class Library The Class HashMap
  • Assumes search-key objects belong to a class that
    overrides methods hashCode and equals
  • The hash table is a collection of buckets
  • Constructors
  • public HashMap()
  • public HashMap(int initialSize)
  • public HashMap(int initialSize, float
    maxLoadFactor)
  • public HashMap(Map table)
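A short usage sketch of `java.util.HashMap` with the four constructors listed above; in the Java API the parameters are named `initialCapacity` and `loadFactor` (defaults 16 and 0.75). The phone numbers and addresses are made-up sample data. Note that `put` returns the value previously associated with the key, or `null`, and that the copy constructor produces an independent map.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapDemo {
    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();           // default capacity and load factor
        Map<String, String> b = new HashMap<>(64);         // initial capacity
        Map<String, String> c = new HashMap<>(64, 0.5f);   // capacity and maximum load factor
        a.put("555-8132", "12 Main St");                   // hypothetical sample data
        Map<String, String> d = new HashMap<>(a);          // copies a's entries

        System.out.println(d.put("555-8132", "9 Oak Ave")); // 12 Main St (old value)
        System.out.println(d.get("555-8132"));              // 9 Oak Ave
        System.out.println(a.get("555-8132"));              // 12 Main St: the copy is independent
        System.out.println(b.isEmpty() && c.isEmpty());     // true
    }
}
```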