Hash Tables - PowerPoint PPT Presentation

1
Hash Tables
  • Briana B. Morrison
  • Adapted from William Collins

4
Sequential Search
  • Given a vector of integers
  • v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44}
  • What is the best case for sequential search?
  • O(1) when value is the first element
  • What is the worst case?
  • O(n) when value is last element, or value is not
    in the list
  • What is the average case?
  • O(n/2), which is O(n)

7
Binary Search
  • Given a vector of integers
  • v = {3, 9, 12, 14, 15, 18, 33, 44, 51, 76}
  • What is the best case for binary search?
  • O(1) when element is the middle element
  • What is the worst case?
  • O(log n) when element is first, last, or not in
    list
  • What is the average case?
  • O(log n)

21
Map vs. Hashmap
  • What are the differences between a map and a
    hashmap?
  • Interface
  • Efficiency
  • Applications
  • Implementation

24
  • CONTIGUOUS
  • array? vector? deque? heap?
  • LINKED
  • Linked? list? map?
  • BUT NONE OF THESE WILL GIVE CONSTANT AVERAGE TIME FOR
    SEARCHES, INSERTIONS AND REMOVALS.

32
To make these values fit into the table, we need
to mod by the table size, i.e., key % 1000.
210
256
816
OOPS!
36
Hash Codes
  • Suppose we have a table of size N
  • A hash code is
  • A number in the range 0 to N-1
  • We compute the hash code from the key
  • You can think of this as a default position
    when inserting, or a position hint when looking
    up
  • A hash function is a way of computing a hash code
  • Desire: the set of keys should spread evenly over
    the N values
  • When two keys have the same hash code: collision

37
Hash Functions
  • A hash function should be quick and easy to
    compute.
  • A hash function should achieve an even
    distribution of the keys that actually occur
    across the range of indices for both random and
    non-random data.
  • Calculation should involve the entire search key.

38
Examples of Hash Functions
  • Usually involves taking the key, chopping it up,
    and mixing the pieces together in various ways
  • Examples
  • Truncation: ignore part of the key, use the
    remaining part as the index
  • Folding: partition the key into several parts
    and combine the parts in a convenient way
    (adding, etc.)
  • After calculating the index, use modular
    arithmetic: divide by the size of the index
    range, and take the remainder as the result
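A minimal sketch of truncation and folding for integer keys. The function names and the choice of three-digit groups are assumptions made for illustration; the slides do not fix either.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Truncation (assumed variant): ignore everything except the last
// three decimal digits of the key.
std::size_t truncation_hash(std::uint64_t key) {
    return static_cast<std::size_t>(key % 1000);
}

// Folding (assumed variant): split the key into three-digit groups
// and add the groups together.
std::uint64_t fold_digits(std::uint64_t key) {
    std::uint64_t sum = 0;
    while (key > 0) {
        sum += key % 1000;  // take the lowest three digits
        key /= 1000;        // drop them and continue
    }
    return sum;
}

// Final step: modular arithmetic brings the folded value into the
// index range 0 .. table_size - 1.
std::size_t folding_hash(std::uint64_t key, std::size_t table_size) {
    assert(table_size > 0);
    return static_cast<std::size_t>(fold_digits(key) % table_size);
}
```

For the key 123456789, truncation keeps 789, while folding adds 123 + 456 + 789 = 1368 and then reduces it modulo the table size.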

39
Example Hash Function
40
Devising Hash Functions
  • Simple functions often produce many collisions
  • ... but complex functions may not be good either!
  • It is often an empirical process
  • Adding letter values in a string: same hash for
    strings with same letters in different order
  • Better approach
  • size_t hash = 0;
  • for (size_t i = 0; i < s.size(); i++)
  •     hash = hash * 31 + s[i];
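The two approaches on this slide can be compared side by side. This is a sketch: the function names and the `bucket_index` helper are illustrative assumptions, and characters are treated as unsigned values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Adds the letter values: anagrams always get the same hash.
std::size_t additive_hash(const std::string& s) {
    std::size_t hash = 0;
    for (unsigned char c : s) hash += c;
    return hash;
}

// Multiplies by 31 before adding each letter, so the position of a
// letter affects the result. Overflow wraps around (unsigned).
std::size_t string_hash(const std::string& s) {
    std::size_t hash = 0;
    for (std::size_t i = 0; i < s.size(); i++)
        hash = hash * 31 + static_cast<unsigned char>(s[i]);
    return hash;
}

// Reduce the hash to a table index.
std::size_t bucket_index(const std::string& s, std::size_t table_size) {
    assert(table_size > 0);
    return string_hash(s) % table_size;
}
```

With these definitions "tea" and "eat" collide under `additive_hash` but not under `string_hash`.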

41
Devising Hash Functions (2)
  • The String hash is good in that
  • Every letter affects the value
  • The order of the letters affects the value
  • The values tend to be spread well over the
    integers

42
Devising Hash Functions (3)
  • Guidelines for good hash functions
  • Spread values evenly as if random
  • Cheap to compute
  • Generally, number of possible values much greater
    than table size

43
Hash Code Maps
  • Memory address
  • We reinterpret the memory address of the key
    object as an integer
  • Good in general, except for numeric and string
    keys
  • Integer cast
  • We reinterpret the bits of the key as an integer
  • Suitable for keys of length less than or equal to
    the number of bits of the integer type (e.g.,
    char, short, int and float on many machines)
  • Component sum
  • We partition the bits of the key into components
    of fixed length (e.g., 16 or 32 bits) and we sum
    the components (ignoring overflows)
  • Suitable for numeric keys of fixed length greater
    than or equal to the number of bits of the
    integer type (e.g., long and double on many
    machines)

44
Hash Code Maps (cont.)
  • Polynomial accumulation
  • We partition the bits of the key into a sequence
    of components of fixed length (e.g., 8, 16 or 32
    bits) $a_0, a_1, \dots, a_{n-1}$
  • We evaluate the polynomial
  • $p(z) = a_0 + a_1 z + a_2 z^2 + \dots + a_{n-1} z^{n-1}$
  • at a fixed value z, ignoring overflows
  • Especially suitable for strings (e.g., the choice
    z = 33 gives at most 6 collisions on a set of
    50,000 English words)
  • Polynomial p(z) can be evaluated in O(n) time
    using Horner's rule
  • The following polynomials are successively
    computed, each from the previous one in O(1) time
  • $p_0(z) = a_{n-1}$
  • $p_i(z) = a_{n-i-1} + z\,p_{i-1}(z)$ (i = 1, 2, ..., n-1)
  • We have $p(z) = p_{n-1}(z)$
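Horner's rule as described above can be sketched as follows, assuming each component a_i is one character (8 bits) of a string key. The names `polynomial_hash` and `compress` are illustrative, not from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Evaluates p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1} where a_i is the
// i-th character of the key. Unsigned arithmetic wraps, which is the
// "ignoring overflows" part.
std::uint32_t polynomial_hash(const std::string& key, std::uint32_t z = 33) {
    std::uint32_t p = 0;
    // Walk from a_{n-1} down to a_0; each step is p = a + z * p.
    for (std::size_t i = key.size(); i-- > 0;) {
        p = static_cast<unsigned char>(key[i]) + z * p;
    }
    return p;
}

// Compress the hash code into the index range of a table.
std::size_t compress(std::uint32_t code, std::size_t table_size) {
    assert(table_size > 0);
    return code % table_size;
}
```

For the key "ab" with z = 33 this gives 97 + 98 * 33 = 3331, using one multiplication and one addition per character.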

60
Collision Handlers
  • NOW WE'LL LOOK AT SPECIFIC COLLISION HANDLERS
  • Chaining
  • Linear Probing (Open Addressing)
  • Double Hashing
  • Quadratic Hashing

61
Collision Handling
  • Collisions occur when different elements are
    mapped to the same cell
  • Chaining: let each cell in the table point to a
    linked list of elements that map there
  • Chaining is simple, but requires additional
    memory outside the table
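A minimal sketch of chaining for integer keys, assuming the hash function is simply the key modulo the number of buckets. The class name `ChainedSet` and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class ChainedSet {
public:
    explicit ChainedSet(std::size_t bucket_count) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    // Appends the key to its chain unless it is already there.
    bool insert(int key) {
        std::list<int>& chain = buckets_[index_of(key)];
        for (int k : chain)
            if (k == key) return false;
        chain.push_back(key);
        return true;
    }

    // Only the items that collided at the same cell are searched.
    bool contains(int key) const {
        const std::list<int>& chain = buckets_[index_of(key)];
        for (int k : chain)
            if (k == key) return true;
        return false;
    }

    // Removal only touches the corresponding list.
    bool erase(int key) {
        std::list<int>& chain = buckets_[index_of(key)];
        for (std::list<int>::iterator it = chain.begin(); it != chain.end(); ++it) {
            if (*it == key) { chain.erase(it); return true; }
        }
        return false;
    }

    std::size_t chain_length(std::size_t bucket) const {
        return buckets_[bucket].size();
    }

private:
    // Assumed hash: key mod bucket count (intended for non-negative keys).
    std::size_t index_of(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::list<int> > buckets_;
};
```

With 11 buckets, the keys 14 and 25 both map to cell 3 and end up in the same list.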

64
Chaining with Separate Lists Example
65
Chaining Picture
Two items hashed to bucket 3. Three items hashed
to bucket 4.
68
FOR THE find METHOD, averageTimeS(n, m) ≈ n /
2m ITERATIONS
< 0.75 / 2, SO averageTimeS(n, m) < A
CONSTANT. averageTimeS(n, m) IS CONSTANT.
75
Hash Table Using Open Probe Addressing Example
Insert 45 (mod by table size 11)
76
Hash Table Using Open Probe Addressing Example
Insert 35
77
Hash Table Using Open Probe Addressing Example
Insert 76
78
Hash Table Using Open Probe Addressing Example
79
Linear Probing
  • Open addressing: the colliding item is placed in
    a different cell of the table
  • Linear probing handles collisions by placing the
    colliding item in the next (circularly) available
    table cell
  • Each table cell inspected is referred to as a
    probe
  • Colliding items lump together, causing future
    collisions to cause a longer sequence of probes
  • Example
  • h(x) = x mod 13
  • Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in
    this order

    Index:  0   1   2   3   4   5   6   7   8   9  10  11  12
    Key:    -   -  41   -   -  18  44  59  32  22  31  73   -
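The insertion sequence in this example can be reproduced with a short sketch. The `Slot` struct, the function names and the "return table size when full" convention are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

struct Slot {
    int key = 0;
    bool occupied = false;  // explicit flag, so any int can be a valid key
};

// Inserts key using h(x) = x mod table size and linear probing.
// Returns the index used, or table.size() if every cell is occupied.
std::size_t linear_probe_insert(std::vector<Slot>& table, int key) {
    assert(!table.empty() && key >= 0);
    const std::size_t n = table.size();
    std::size_t index = static_cast<std::size_t>(key) % n;
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (!table[index].occupied) {
            table[index].key = key;
            table[index].occupied = true;
            return index;
        }
        index = (index + 1) % n;  // next cell, wrapping around
    }
    return n;  // table is full
}

// Builds the 13-cell table from the slide's example.
std::vector<Slot> build_example_table() {
    std::vector<Slot> table(13);
    for (int key : {18, 41, 22, 44, 59, 32, 31, 73})
        linear_probe_insert(table, key);
    return table;
}
```

Key 73 hashes to cell 8 but needs four probes (cells 8, 9, 10, 11) because of the cluster built up by the earlier collisions.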
80
  • WE NEED TO KNOW WHEN A SLOT IS FULL
  • OR OCCUPIED.
  • HOW?
  • INSTEAD OF JUST T() STORED IN THE BUCKETS
    (BECAUSE T() COULD BE A VALID VALUE), THE BUCKET
    WILL STORE AN INSTANCE OF THE VALUE_TYPE CLASS.

83
Retrieve
  • What about when we want to retrieve?
  • Consider the previous example.

84
Hash Table Using Open Probe Addressing Example
Find the value 35 (% 11). Now find
the value 76. Now find the value 33.
85
Hash Table Using Open Probe Addressing Example
Now delete 35 (% 11). Now find the
value 76. Now find the value 33.
86
Linear Probing
  • Probe by incrementing the index
  • If you fall off the end, wrap around to the beginning
  • Take care not to cycle forever!
  • 1. Compute index as hash_fcn() % table.size()
  • 2. If table[index] == NULL, item is not in the table
  • 3. If table[index] matches item, found item (done)
  • 4. Increment index circularly and go to 2
  • Why must we probe repeatedly?
  • hashCode may produce collisions
  • remainder by table.size may produce collisions

87
Search Termination
  • Ways to obtain proper termination
  • Stop when you come back to your starting point
  • Stop after probing N slots, where N is table size
  • Stop when you reach the bottom the second time
  • Ensure table never full
  • Reallocate when occupancy exceeds threshold

89
Erase value 1069.
false
90
Now search for 460.
92
SOLUTION: bool marked_for_removal. THE
CONSTRUCTOR FOR VALUE_TYPE SETS EACH bucket's
marked_for_removal FIELD TO false. insert SETS
marked_for_removal TO false; erase SETS
marked_for_removal TO true. SO AFTER THE
INSERTIONS
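A sketch of how a marked_for_removal flag keeps searches working after an erase, assuming linear probing and integer keys. Only the field name marked_for_removal comes from the slides; the `Bucket` and `ProbingSet` names, the `occupied` flag, and the decision to reuse marked buckets on insert are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Bucket {
    int key = 0;
    bool occupied = false;            // bucket has held a key at some point
    bool marked_for_removal = false;  // the key it held was erased
};

class ProbingSet {
public:
    explicit ProbingSet(std::size_t length) : buckets_(length) {
        assert(length > 0);
    }

    bool insert(int key) {
        if (find_index(key) != buckets_.size()) return false;  // already present
        std::size_t index = home(key);
        for (std::size_t probes = 0; probes < buckets_.size(); ++probes) {
            Bucket& b = buckets_[index];
            if (!b.occupied || b.marked_for_removal) {  // free or reusable
                b.key = key;
                b.occupied = true;
                b.marked_for_removal = false;
                return true;
            }
            index = (index + 1) % buckets_.size();
        }
        return false;  // table is full
    }

    bool contains(int key) const { return find_index(key) != buckets_.size(); }

    // erase only marks the bucket, so later probes still pass through it.
    bool erase(int key) {
        std::size_t index = find_index(key);
        if (index == buckets_.size()) return false;
        buckets_[index].marked_for_removal = true;
        return true;
    }

private:
    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    // Returns the bucket index of key, or buckets_.size() if absent.
    std::size_t find_index(int key) const {
        std::size_t index = home(key);
        for (std::size_t probes = 0; probes < buckets_.size(); ++probes) {
            const Bucket& b = buckets_[index];
            if (!b.occupied) break;  // never used: key cannot be further on
            if (!b.marked_for_removal && b.key == key) return index;
            index = (index + 1) % buckets_.size();
        }
        return buckets_.size();
    }

    std::vector<Bucket> buckets_;
};
```

With length 11, the keys 15, 26 and 37 all hash to index 4. After erasing 26, a search for 37 still succeeds because the probe does not stop at the marked bucket.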
97
CLUSTER: A SEQUENCE OF NON-EMPTY LOCATIONS. KEYS
THAT HASH TO 54 FOLLOW THE SAME COLLISION-PATH AS
KEYS THAT HASH TO 55, ...
100
SOLUTION 1: DOUBLE HASHING, THAT IS, OBTAIN BOTH
INDICES AND OFFSETS BY HASHING
    unsigned long hash_int = hash (key);
    int index = hash_int % length,
        offset = hash_int / length;    // secondary hash function
NOW THE OFFSET DEPENDS ON THE KEY, SO DIFFERENT KEYS WILL
USUALLY HAVE DIFFERENT OFFSETS, SO NO MORE
PRIMARY CLUSTERING!
101
TO GET A NEW INDEX:
    index = (index + offset) % length;
Notice that if a collision occurs, you rehash
from the NEW index value.
102
EXAMPLE: length = 11
    key   index   offset
     15     4       1
     19     8       1
     16     5       1
     58     3       5
     27     5       2
     35     2       3
     30     8       2
     47     3       4
WHERE WOULD THESE KEYS GO IN buckets?
103
    index   key
      0     47
      1
      2     35
      3     58
      4     15
      5     16
      6
      7     27
      8     19
      9
     10     30
104
PROBLEM: WHAT IF OFFSET IS A MULTIPLE OF length?
EXAMPLE: length = 11
    key   index   offset
     15     4       1
     19     8       1
     16     5       1
     58     3       5
     27     5       2
     35     2       3
     47     3       4
    246     4      22    // BUT 15 IS AT INDEX 4
// FOR KEY 246, NEW INDEX = (4 + 22) % 11 = 4. OOPS!
105
SOLUTION:
    if (offset % length == 0)
        offset = 1;
ON AVERAGE, offset % length WILL EQUAL 0 ONLY ONCE IN EVERY length TIMES.
106
FINAL PROBLEM: WHAT IF length HAS SEVERAL FACTORS?
EXAMPLE: length = 20
    key   index   offset
     20     0       1
     25     5       1
     30    10       1
     35    15       1
    110    10       5    // BUT 30 IS AT INDEX 10
FOR KEY 110, NEW INDEX = (10 + 5) % 20 = 15, WHICH IS OCCUPIED,
SO NEW INDEX = (15 + 5) % 20, WHICH IS OCCUPIED, SO NEW INDEX = ...
107
SOLUTION: MAKE length A PRIME.
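The index/offset scheme and its two fixes can be put together in a short sketch. It assumes non-negative integer keys that serve directly as hash_int, uses -1 as a hypothetical empty marker, and reduces the offset modulo length up front, which yields the same probe sequence while avoiding large intermediate sums.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // assumed marker; keys must be non-negative

// Inserts key with double hashing: index = key % length and
// offset = key / length. Returns the bucket used, or -1 if no
// free bucket was reached within length probes.
int double_hash_insert(std::vector<int>& buckets, int key) {
    assert(!buckets.empty() && key >= 0);
    const int length = static_cast<int>(buckets.size());
    int index = key % length;
    int offset = (key / length) % length;  // same steps as key / length
    if (offset == 0) offset = 1;           // offset was a multiple of length
    for (int probes = 0; probes < length; ++probes) {
        if (buckets[index] == EMPTY) {
            buckets[index] = key;
            return index;
        }
        index = (index + offset) % length;  // rehash from the NEW index
    }
    return -1;
}
```

With length 11, the keys 15, 19, 16, 58, 27, 35, 30, 47 land at indices 4, 8, 5, 3, 7, 2, 10, 0, matching the table on the earlier slide. Key 246 then gets offset 1 instead of 22 and settles at index 6. When length is prime, every offset between 1 and length - 1 visits all buckets.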
108
Example of Double Hashing
  • Consider a hash table storing integer keys that
    handles collision with double hashing
  • N = 13
  • h(k) = k mod 13
  • d(k) = 7 - k mod 7
  • Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in
    this order

    Index:  0   1   2   3   4   5   6   7   8   9  10  11  12
    Key:   31   -  41   -   -  18  32  59  73  22  44   -   -
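This example can be checked with a small sketch that uses h(k) = k mod N and d(k) = 7 - k mod 7 exactly as stated. The function name, the -1 empty marker and the restriction to non-negative keys are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // assumed marker; keys must be non-negative

// Inserts key using h(k) = k mod N as the home cell and
// d(k) = 7 - k mod 7 as the step between probes.
// Returns the cell used, or -1 if N probes found no free cell.
int insert_with_secondary_hash(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const int n = static_cast<int>(table.size());
    int index = key % n;           // h(k)
    const int step = 7 - key % 7;  // d(k), always in 1..7
    for (int probes = 0; probes < n; ++probes) {
        if (table[index] == EMPTY) {
            table[index] = key;
            return index;
        }
        index = (index + step) % n;
    }
    return -1;
}
```

For instance, 44 collides with 18 at cell 5, has d(44) = 5, and lands in cell 10; 31 also starts at cell 5, has d(31) = 4, finds cell 9 taken, and wraps around to cell 0.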
110
ANOTHER SOLUTION: QUADRATIC HASHING, THAT IS,
ONCE COLLISION OCCURS AT h, GO TO LOCATION h + 1,
THEN IF COLLISION OCCURS THERE GO TO LOCATION h +
4, then h + 9, then h + 16, etc.
    unsigned long hash_int = hash (key);
    int index = hash_int % length,
        offset = i * i;    // i squared on the i-th probe
Notice that h stays at the same location. No primary clustering.
111
QUADRATIC REHASHING EXAMPLE: length = 11
    key   index   offset   final place
     15     4
     19     8
     16     5
     58     3
     27     5       1      index 6
     35     2
     30     8       1      index 9
     47     3       4      index 7
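The quadratic example can be reproduced with the sketch below. As before, integer keys stand in for hash_int, and the -1 empty marker and function name are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // assumed marker; keys must be non-negative

// Probes h, h + 1, h + 4, h + 9, ... (all modulo length), always
// measured from the original location h.
// Returns the bucket used, or -1 if length probes found no free bucket.
int quadratic_probe_insert(std::vector<int>& buckets, int key) {
    assert(!buckets.empty() && key >= 0);
    const int length = static_cast<int>(buckets.size());
    const int h = key % length;
    for (int i = 0; i < length; ++i) {
        const int index = (h + i * i) % length;
        if (buckets[index] == EMPTY) {
            buckets[index] = key;
            return index;
        }
    }
    return -1;
}
```

Unlike linear probing, this sequence does not visit every bucket, so an insert can fail even when free buckets remain; that is why the sketch gives up after length probes.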
112
Performance
  • HOW DOES DOUBLE-HASHING COMPARE WITH CHAINED
    HASHING?

113
Performance of Hash Tables
  • Load factor = filled cells / table size
  • Between 0 and 1
  • Load factor has greatest effect on performance
  • Lower load factor ⇒ better performance
  • Reduce collisions in sparsely populated tables
  • Knuth gives expected probes p for open
    addressing, linear probing, load factor L:
    $p = \frac{1}{2}\left(1 + \frac{1}{1-L}\right)$
  • As L approaches 1, this zooms up
  • For chaining, $p = 1 + \frac{L}{2}$
  • Note: here L can be greater than 1!
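The two formulas are easy to tabulate. A small sketch, with hypothetical function names:

```cpp
#include <cassert>

// Expected probes for open addressing with linear probing,
// p = 1/2 * (1 + 1 / (1 - L)), valid for 0 <= L < 1.
double linear_probing_expected_probes(double load) {
    assert(load >= 0.0 && load < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - load));
}

// Expected probes for chaining, p = 1 + L / 2; L may exceed 1.
double chaining_expected_probes(double load) {
    assert(load >= 0.0);
    return 1.0 + load / 2.0;
}
```

At L = 0.5 linear probing expects 1.5 probes and chaining 1.25; at L = 0.75 the figures are 2.5 and 1.375, which shows how much faster the open-addressing cost grows.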

114
Performance of Hash Tables (2)
115
Performance of Hash Tables (3)
  • Hash table
  • Insert average O(1)
  • Search average O(1)
  • Sorted array
  • Insert average O(n)
  • Search average O(log n)
  • Binary Search Tree
  • Insert average O(log n)
  • Search average O(log n)
  • But balanced trees can guarantee O(log n)

116
We know that hashing becomes inefficient as
the table fills up. What to do? EXPAND!
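One way to expand is sketched below for a chained table: when an insertion would push the load factor above 0.75, allocate a larger bucket vector and rehash every key into it. The class name, the 0.75 threshold as a trigger, and the growth rule 2 * size + 1 are assumptions; the new size is not guaranteed to be prime.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class GrowingSet {
public:
    explicit GrowingSet(std::size_t bucket_count = 11) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    bool insert(int key) {
        if (contains(key)) return false;
        // Expand first if (size + 1) / bucket_count would exceed 0.75.
        if (4 * (size_ + 1) > 3 * buckets_.size()) expand();
        buckets_[index_of(key, buckets_.size())].push_back(key);
        ++size_;
        return true;
    }

    bool contains(int key) const {
        const std::list<int>& chain = buckets_[index_of(key, buckets_.size())];
        for (int k : chain)
            if (k == key) return true;
        return false;
    }

    std::size_t size() const { return size_; }
    std::size_t bucket_count() const { return buckets_.size(); }

private:
    static std::size_t index_of(int key, std::size_t n) {
        return static_cast<std::size_t>(key) % n;
    }

    // Every key must be rehashed: its index depends on the table size.
    void expand() {
        std::vector<std::list<int> > bigger(2 * buckets_.size() + 1);
        for (const std::list<int>& chain : buckets_)
            for (int k : chain)
                bigger[index_of(k, bigger.size())].push_back(k);
        buckets_.swap(bigger);
    }

    std::vector<std::list<int> > buckets_;
    std::size_t size_ = 0;
};
```

Starting from 11 buckets, the ninth insertion triggers the expansion to 23 buckets. Each expansion costs O(n), but it happens rarely enough that insertion stays O(1) on average.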
121
Summary Slide 1
  • Hash Table
  • Simulates the fastest searching technique: knowing
    the index of the required value in a vector or
    array and applying the index to access the value.
    It does so by applying a hash function that
    converts the data to an integer
  • After obtaining an index by dividing the value
    from the hash function by the table size and
    taking the remainder, access the table. Normally,
    the number of elements in the table is much
    smaller than the number of distinct data values,
    so collisions occur
  • To handle collisions, we must place a value that
    collides with an existing table element into the
    table in such a way that we can efficiently
    access it later
122
Summary Slide 2
  • Hash Table (Cont)
  • Average running time for a search of a hash table
    is O(1)
  • The worst case is O(n)
123
Summary Slide 3
  • Collision Resolution
  • Types: 1) linear open probe addressing
  • The table is a vector or array of static size
  • After using the hash function to compute a table
    index, look up the entry in the table
  • If the values match, perform an update if
    necessary
  • If the table entry is empty, insert the value in
    the table
124
Summary Slide 4
  • Collision Resolution (Cont)
  • Types: 1) linear open probe addressing
  • Otherwise, probe forward circularly, looking for
    a match or an empty table slot
  • If the probe returns to the original starting
    point, the table is full
  • The search may examine table items that hashed to
    different table locations
  • Deleting an item is difficult
125
Summary Slide 5
  • Collision Resolution (Cont): 2) chaining with
    separate lists
  • The hash table is a vector of list objects
  • Each list is a sequence of colliding items
  • After applying the hash function to compute the
    table index, search the list for the data value
  • If it is found, update its value; otherwise,
    insert the value at the back of the list
  • You search only items that collided at the same
    table location
126
Summary Slide 6
  • Collision Resolution (Cont)
  • There is no limitation on the number of values in
    the table, and deleting an item from the table
    involves only erasing it from its corresponding
    list