Hash Table - PowerPoint PPT Presentation

About This Presentation

Title:

Hash Table

Description:

Chapter 12 Hash Table – PowerPoint PPT presentation

Slides: 49

Provided by: Darw2

Transcript and Presenter's Notes

Title: Hash Table

1
Chapter 12

Hash Table

2
Hash Table

So far, the best worst-case time for searching is
O(log n).
Hash tables offer
an average search time of O(1),
a worst-case search time of O(n).

3
Learning Objectives

Develop the motivation for hashing.
Study hash functions.
Understand collision resolution and compare and
contrast various collision resolution schemes.
Summarize the average running times for hashing
under various collision resolution schemes.
Explore the java.util.HashMap class.

4
12.1 Motivation

Let's design a data structure using an array for
which the indices could be the keys of entries.
Suppose we wanted to store the keys 1, 3, 5, 8,
10, with a guaranteed one-step access to any of
these.
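
A minimal sketch of this idea, assuming the keys are small non-negative integers. The class name DirectAddressTable and its methods are hypothetical and chosen only for illustration.

```java
// Direct-address table sketch: the key itself is the array index.
class DirectAddressTable {
    private final boolean[] present;

    // maxKey is the largest key we expect to store (an assumption of this sketch).
    DirectAddressTable(int maxKey) {
        present = new boolean[maxKey + 1];
    }

    void insert(int key) {
        present[key] = true;
    }

    // One array access, regardless of how many keys are stored.
    boolean contains(int key) {
        return key >= 0 && key < present.length && present[key];
    }

    int slots() {
        return present.length;
    }

    public static void main(String[] args) {
        DirectAddressTable t = new DirectAddressTable(10);
        for (int k : new int[] {1, 3, 5, 8, 10}) {
            t.insert(k);
        }
        System.out.println(t.contains(8));
        System.out.println(t.contains(4));
        System.out.println(t.slots() + " slots for 5 keys");
    }
}
```

Note that the array needs 11 slots to hold 5 keys, because its size follows the largest key rather than the number of entries.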

5
12.1 Motivation

The space consumption does not depend on the
actual number of entries stored.
It depends on the range of keys.
What if we wanted to store strings?
For each string, we would first have to compute a
numeric key that is equivalent to it.
java.lang.String.hashCode() computes the numeric
equivalent (or hashcode) of a string by an
arithmetic manipulation involving its individual
characters.
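
As an illustration, the documented formula for String.hashCode() is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated in int arithmetic. The sketch below recomputes it by hand and compares the result with the library call. The helper name polyHash is hypothetical.

```java
class StringHashDemo {
    // Recomputes the documented polynomial with base 31 in int arithmetic
    // (overflow wraps around, exactly as in String.hashCode()).
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"cat", "ear", "sad", "aid"}) {
            boolean same = polyHash(s) == s.hashCode();
            System.out.println(s + " -> " + s.hashCode() + " (matches: " + same + ")");
        }
    }
}
```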

6
12.1 Motivation

Using numeric keys directly as indices is out of
the question for most applications.
There isn't enough space

7
12.1 Motivation
8
12.2 Hashing

A simple hash function
table size of 10
h(k) = k mod 10
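
A small sketch of this mapping function. The sample keys 14, 24, and 37 are made up for illustration and do not come from the slides. Math.floorMod is used so that negative hashcodes still map into the range 0..9.

```java
class ModHash {
    static final int TABLE_SIZE = 10;

    // h(k) = k mod 10, kept non-negative even for negative k.
    static int h(int k) {
        return Math.floorMod(k, TABLE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(h(14)); // location 4
        System.out.println(h(24)); // also location 4: a collision
        System.out.println(h(37)); // location 7
    }
}
```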

9
12.2 Hashing

"ear" collides with "cat" at position 4.
There is empty space in the table, and it is up
to the collision resolution scheme to find an
appropriate position for this string.
A better mapping function
For any hash function one could devise, there are
always hashcodes that could force the mapping
function to be ineffective by generating lots of
collisions.

10
12.2 Hashing
11
12.3 Collision Resolution

There are two ways to resolve collisions.
open addressing
Find another location for the colliding key
within the hash table.
closed addressing
store all keys that hash to the same location in
a data structure that hangs off that location.

12
12.3.1 Linear Probing
13
12.3.1 Linear Probing

As more and more entries are hashed into the
table, they tend to form clusters that get bigger
and bigger.
The number of probes on collisions gradually
increases, thus slowing down the hash time to a
crawl.
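
A minimal sketch of linear probing, assuming integer keys and the mapping k mod N. The class LinearProbingTable and the sample keys are hypothetical. Deletion is left out on purpose, since it needs extra care in open addressing.

```java
class LinearProbingTable {
    private final Integer[] slots;

    LinearProbingTable(int capacity) {
        slots = new Integer[capacity];
    }

    // Returns the index where key ends up, or -1 if the table is full.
    int insert(int key) {
        int n = slots.length;
        int home = Math.floorMod(key, n);
        for (int i = 0; i < n; i++) {
            int pos = (home + i) % n; // step one slot at a time, wrapping around
            if (slots[pos] == null || slots[pos].intValue() == key) {
                slots[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Follows the same probe sequence as insert; an empty slot ends the search.
    int find(int key) {
        int n = slots.length;
        int home = Math.floorMod(key, n);
        for (int i = 0; i < n; i++) {
            int pos = (home + i) % n;
            if (slots[pos] == null) {
                return -1;
            }
            if (slots[pos].intValue() == key) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        for (int k : new int[] {14, 24, 34, 15}) {
            System.out.println(k + " stored at " + t.insert(k));
        }
    }
}
```

In this run the keys 14, 24, and 34 all have home location 4 and fill locations 4, 5, and 6. Key 15, whose home is 5, is then pushed to location 7 by a cluster it did not originally collide with.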

14
12.3.1 Linear Probing

Insert "cat", "ear", "sad", and "aid"

15
12.3.1 Linear Probing

Clustering is the downfall of linear probing, so
we need to look to another method of collision
resolution that avoids clustering.

16
12.3.2 Quadratic Probing
17
12.3.2 Quadratic Probing

Avoids Clustering
When the probing stops with a failure to find an
empty spot, as many as half the locations of the
table may still be unoccupied.
The probe locations 2, 3, 6, 0, 7, and 5 are
endlessly repeated, so the insertion is never
done, even though about half the table is empty.
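
The repetition can be reproduced with a short sketch. It assumes a table of size 11, a key whose home location is 2, and the probe formula (h + i*i) mod N. The slides do not state these values, but they are consistent with the locations listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class QuadraticProbeDemo {
    // The first `probes` locations examined for home location h in a table of size n.
    static List<Integer> probeSequence(int h, int n, int probes) {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < probes; i++) {
            seq.add((h + i * i) % n);
        }
        return seq;
    }

    public static void main(String[] args) {
        List<Integer> seq = probeSequence(2, 11, 11);
        Set<Integer> distinct = new LinkedHashSet<>(seq);
        System.out.println("probes:   " + seq);
        System.out.println("distinct: " + distinct);
    }
}
```

With these assumptions, the first six probes visit 2, 3, 6, 0, 7, 5 and every later probe lands on one of those six again, so the other five locations are never examined.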

18
12.3.2 Quadratic Probing

For any given prime N, once a location is
examined twice, all locations that are examined
thereafter are also ones that have been already
examined.

19
12.3.3 Chaining

If a collision occurs at location i of the hash
table, it simply adds the colliding entry to a
linked list that is built at that location.
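
A minimal sketch of chaining, assuming integer keys and the mapping k mod N. The class ChainedTable is hypothetical and is not the java.util.HashMap implementation.

```java
import java.util.LinkedList;

class ChainedTable {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private LinkedList<Integer> chainFor(int key) {
        return buckets[Math.floorMod(key, buckets.length)];
    }

    // A colliding key is simply appended to the list at its location.
    void insert(int key) {
        LinkedList<Integer> chain = chainFor(key);
        if (!chain.contains(key)) {
            chain.add(key);
        }
    }

    boolean contains(int key) {
        return chainFor(key).contains(key);
    }

    int chainLength(int index) {
        return buckets[index].size();
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(10);
        t.insert(14);
        t.insert(24); // collides with 14 at location 4
        t.insert(37);
        System.out.println("chain length at 4: " + t.chainLength(4));
        System.out.println("contains 24: " + t.contains(24));
    }
}
```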

20
Running times

We assume that the hashing process itself
(hashcode and mapping) takes O(1).
Running time of insertion is determined by the
collision resolution scheme.

21
12.4 The java.util.HashMap Class

Consider a university-wide database that stores
student records.
Every student is assigned a unique id (key), with
which is associated several pieces of information
such as name, address, credits, gpa, etc.
These pieces of information constitute the value.

22
12.4 The java.util.HashMap Class

A StudentInfo dictionary that stores (id, info)
pairs for all the students enrolled in the
university.
The operations corresponding to this relationship
can be found in java.util.Map<K,V>.

23
12.4 The java.util.HashMap Class

The Map interface also provides operations to
enumerate all the keys, enumerate all the values,
get the size of the dictionary, check whether the
dictionary is empty, and so on.
The java.util.HashMap implements the dictionary
abstraction as specified by the java.util.Map
interface. It resolves collisions using chaining.
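
A short usage sketch of these Map operations on a HashMap. The class name StudentDirectory, the ids, and the names are made up, and the value is reduced to a single String for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class StudentDirectory {
    // Builds a tiny (id, name) dictionary; the ids and names are sample data.
    static Map<Integer, String> build() {
        Map<Integer, String> students = new HashMap<>();
        students.put(1001, "Ada");
        students.put(1002, "Grace");
        students.put(1001, "Ada L."); // same key: the old value is replaced
        return students;
    }

    public static void main(String[] args) {
        Map<Integer, String> students = build();
        System.out.println(students.get(1001));         // search by key
        System.out.println(students.containsKey(9999)); // membership test
        System.out.println(students.size());            // number of mappings
        System.out.println(students.isEmpty());
        System.out.println(students.keySet());          // enumerate keys
        System.out.println(students.values());          // enumerate values
        students.remove(1002);                          // delete by key
        System.out.println(students.size());
    }
}
```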

24
12.4.1 Table and Load Factor

When the no-arg constructor is used:
Default initial capacity of 16.
Default load factor of 0.75.
The table size is defined as the actual number of
key-value mappings in the hash table.

25
12.4.1 Table and Load Factor

We can choose an initial capacity
Only uses capacities that are powers of 2.
101 becomes 128
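
One way to picture the rounding is the sketch below. It is an illustration only, not the actual HashMap source, and the helper name roundUpToPowerOfTwo and the cap of 2^30 are assumptions of this sketch.

```java
class CapacityRounding {
    static final int MAX_CAPACITY = 1 << 30; // largest power of 2 that fits in an int

    // Smallest power of 2 that is >= requested (capped at MAX_CAPACITY).
    static int roundUpToPowerOfTwo(int requested) {
        if (requested > MAX_CAPACITY) {
            return MAX_CAPACITY;
        }
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1; // double until large enough
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(101)); // 128
        System.out.println(roundUpToPowerOfTwo(16));  // 16
    }
}
```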

26
12.4.1 Table and Load Factor

An initial capacity of 128.

27
12.4.2 Storage of Entries

Relevant fields in the HashMap class.
threshold is the size threshold:
the product of the capacity and the threshold
load factor (N × t).
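
As a quick arithmetic check of this product, here is a sketch with a hypothetical helper named threshold. It only mirrors the formula above and is not taken from the HashMap source.

```java
class ThresholdDemo {
    // threshold = capacity * load factor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));  // 12
        System.out.println(threshold(128, 0.75f)); // 96
    }
}
```

With the defaults, the table of capacity 16 reaches its threshold at 12 mappings.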

28
12.4.2 Storage of Entries

Entry table sets up an array of chains.
Map.Entry<K,V> is defined inside the Map<K,V>
interface.
next holds a reference to the next Entry in its
linked list.

29
12.4.3 Adding an Entry

Example
Name serves as a key to the phone number value.

30
12.4.3 Adding an Entry
31
12.4.3 Adding an Entry

If the key argument is null, a special object,
NULL_KEY, is returned; otherwise the argument key
is returned as is.

32
12.4.3 Adding an Entry
33
12.4.3 Adding an Entry

Example
h = 25 and length = 16
The binary representation of h and length-1
(11001 and 01111).

34
12.4.3 Adding an Entry

Since length is a power of 2, say 2^k, the binary
representation of length will be 100...0 with k
zeros, and length - 1 will be k ones.
Any h is expressible as c * 2^k + r, with
0 <= r < 2^k.
r is the result of the bit-wise and, since the
c * 2^k part occupies only higher-order bits that
will be zeroed out in the process.
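
The bit-wise and can be checked directly. In the sketch below, the helper name indexFor is an assumption of this example. It reproduces the h = 25, length = 16 case and compares the result with the ordinary remainder.

```java
class IndexForDemo {
    // Valid only when length is a power of 2: keeps the low-order bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = 25;
        int length = 16;
        System.out.println(Integer.toBinaryString(h));          // 11001
        System.out.println(Integer.toBinaryString(length - 1)); // 1111
        System.out.println(indexFor(h, length));                // 9, same as 25 % 16
    }
}
```

For non-negative h the result equals h % length. The and also yields a valid index for negative h, where the % operator would give a negative remainder.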

35
12.4.3 Adding an Entry
36
12.4.3 Adding an Entry

The if statement triggers a rehashing process if
the size is equal to or greater than the
threshold.

37
12.4.4 Rehashing
38
12.4.4 Rehashing
39
12.4.5 Searching
40
12.5 Quadratic Probing Repetition of Probe
Locations

Quadratic probing only examines N/2 locations of
the table before starting to repeat locations.
Suppose a key is hashed to location h, where
there is a collision.
The following locations are examined: h + 1,
h + 4, h + 9, ..., each taken mod N.

41
12.5 Quadratic Probing Repetition of Probe
Locations

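What if two different probes (i and j) end up at
the same location?
Then (h + i^2) mod N = (h + j^2) mod N, so N
divides i^2 - j^2, which factors into a sum and
a difference of i and j.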

42
12.5 Quadratic Probing Repetition of Probe
Locations

Since N is a prime number, it must divide one of
the factors (i + j) or (i - j).
N divides (i - j) only when at least N probes
have been made already.
N divides (i + j) when (i + j = N), at the very
least.
j = N - i
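
The argument can be checked empirically. The sketch below assumes the probe formula (h + i*i) mod N and reports the index of the first probe that lands on an already examined location. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class FirstRepeat {
    // Index i of the first probe (h + i*i) mod n that revisits an earlier location.
    // The loop always terminates: only n distinct locations exist.
    static int firstRepeatIndex(int h, int n) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; ; i++) {
            int pos = (int) ((h + (long) i * i) % n);
            if (!seen.add(pos)) {
                return i;
            }
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {7, 11, 13, 17}) {
            System.out.println("N = " + n + ": first repeat at probe " + firstRepeatIndex(0, n));
        }
    }
}
```

For the odd primes tried here, the first repeat occurs at probe (N + 1) / 2, which agrees with i + j = N being the earliest way for two probes to coincide.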

43
12.6 Summary

A hash table implements the dictionary operations
of insert, search, and delete on (key, value)
pairs.
Given a key, a hash function for a given hash
table computes an index into the table as a
function of the key by first obtaining a numeric
hashcode, and then mapping this hashcode to a
table location.

44
12.6 Summary

When a new key hashes to a location in the hash
table that is already occupied, it is said to
collide with the occupying key.
Collision resolution is the process used upon
collision to determine an unoccupied location in
the hash table where the colliding key may be
inserted.
In searching for a key, the same hash function
and collision resolution scheme must be used as
for its insertion.

45
12.6 Summary

A good hash function must run in O(1) time and
must distribute entries uniformly over the hash
table.
Open addressing relocates a colliding entry in
the hash table itself. Closed addressing stores
all entries that hash to a location, in a data
structure that hangs off that location.
Linear probing and quadratic probing are
instances of open addressing, while chaining is
an instance of closed addressing.

46
12.6 Summary

Linear probing leads to clustering of entries
with the clusters becoming increasingly larger as
more and more collisions occur. Clustering
degrades performance significantly.
Quadratic probing attempts to reduce clustering.
On the other hand, quadratic probing may leave as
many as half the hash table empty while reporting
failure to insert a new entry.

47
12.6 Summary

Chaining is the simplest way to resolve
collisions and also results in better performance
than linear probing or quadratic probing.
The worst-case search time for linear probing,
quadratic probing, and chaining is O(n).
The load factor of a hash table is the ratio of
the number of keys, n, to the capacity, N.

48
12.6 Summary

The average performance of chaining depends on
the load factor. For a perfect hash function that
always distributes keys uniformly, the average
search time for chaining is O(1).
