Title: Resolving collisions: Open addressing
1. Resolving collisions: Open addressing
- Open addressing
- Store all elements within the table itself.
- The space saved by not storing chain pointers is used instead to make the array larger.
- If there is a collision, probe the table in a systematic way to find an empty slot.
- If the table fills up, we need to enlarge it and rehash all the keys.
2. Open addressing: Linear Probing
- Hash function: (h(k) + i) mod m for i = 0, 1, ..., m-1
- Insert: Start at the location where the key hashed and do a sequential search for an empty slot.
- Search: Start at the location where the key hashed and do a sequential search until you either find the key (success) or find an empty slot (failure).
- Delete (lazy deletion): Follow the same route, but mark the slot as DELETED rather than EMPTY; otherwise subsequent searches for keys stored further along the probe sequence will fail.
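The three operations above can be sketched as a small table class. This is a minimal illustration, not a definitive implementation: it assumes integer keys and h(k) = k mod m, and the names LinearProbingTable, EMPTY and DELETED are hypothetical choices made for this example.

```python
EMPTY = object()    # marker for a slot that has never been used
DELETED = object()  # tombstone left behind by lazy deletion


class LinearProbingTable:
    """Toy open-addressing table (assumes integer keys, h(k) = k mod m)."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        # Yield slot indices (h(k) + i) mod m for i = 0, 1, ..., m-1.
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None              # empty slot reached: failure
            if slot is not DELETED and slot == key:
                return idx               # key found: success
        return None                      # whole table scanned

    def insert(self, key):
        if self.search(key) is not None:
            return                       # key already present
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key    # reuse the first free slot or tombstone
                return
        raise RuntimeError("table full: enlarge it and rehash all keys")

    def delete(self, key):
        idx = self.search(key)
        if idx is not None:
            self.slots[idx] = DELETED    # mark, do not reset to EMPTY


t = LinearProbingTable(m=8)
for k in (3, 11, 19):                    # all three keys hash to slot 3
    t.insert(k)
t.delete(11)                             # slot 4 becomes a tombstone
found_19 = t.search(19)                  # search walks past the tombstone
```

If delete reset the slot to EMPTY instead, the search for 19 would stop at slot 4 and wrongly report failure.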
3. Open addressing: Linear Probing
- Advantage: very easy to implement
- Disadvantage: primary clustering
- Long sequences of used slots build up with gaps between them. Every insertion requires several probes and adds to the cluster.
- The average length of a probe sequence when inserting is
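The expression that completed this bullet is not preserved in this copy of the slides. For reference, the classical textbook estimate for linear probing (from Knuth's analysis, assuming hash values are uniformly distributed) is given below. The symbol λ for the load factor (number of stored keys divided by m) is an assumption made here, and this may not be the exact form the slide showed.

```latex
\[
  E[\text{probes for an insertion}] \;\approx\; \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^{2}}\right)
\]
% Worked values:
%   \lambda = 0.5  gives  (1 + 4)/2   = 2.5  probes
%   \lambda = 0.9  gives  (1 + 100)/2 = 50.5 probes
```

The rapid growth as λ approaches 1 is the cost of primary clustering.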
4. Open addressing: Quadratic Probing
- Probe the table at slots (h(k) + i^2) mod m for i = 0, 1, 2, 3, ..., m-1
- Ease of computation
- Not as easy as linear probing.
- Do we really have to compute a power?
- Clustering
- Primary clustering is avoided, since the probes are not sequential. But...
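On the question of computing a power: since i^2 - (i-1)^2 = 2i - 1, each probe can be obtained from the previous one by adding the next odd number. The sketch below illustrates this; the function name quadratic_probes and the parameter name home (standing for h(k)) are hypothetical.

```python
def quadratic_probes(home, m):
    """Yield (home + i*i) mod m for i = 0..m-1 using additions only."""
    idx = home % m
    step = 1                      # next odd number: i^2 - (i-1)^2 = 2i - 1
    for _ in range(m):
        yield idx
        idx = (idx + step) % m    # move from home + (i-1)^2 to home + i^2
        step += 2


first_four = list(quadratic_probes(3, 16))[:4]
```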
5. Open addressing: Quadratic Probing
- Probe sequence for hash value 3 in a table of size 16 (all results mod 16):
3 + 0^2 = 3     3 + 1^2 = 4     3 + 2^2 = 7     3 + 3^2 = 12
3 + 4^2 = 3     3 + 5^2 = 12    3 + 6^2 = 7     3 + 7^2 = 4
3 + 8^2 = 3     3 + 9^2 = 4     3 + 10^2 = 7    3 + 11^2 = 12
3 + 12^2 = 3    3 + 13^2 = 12   3 + 14^2 = 7    3 + 15^2 = 4
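6. Open addressing: Quadratic Probing
- Probe sequence for hash value 3 in a table of size 19 (all results mod 19):
3 + 0^2 = 3     3 + 1^2 = 4     3 + 2^2 = 7     3 + 3^2 = 12    3 + 4^2 = 0
3 + 5^2 = 9     3 + 6^2 = 1     3 + 7^2 = 14    3 + 8^2 = 10    3 + 9^2 = 8
3 + 10^2 = 8    3 + 11^2 = 10   3 + 12^2 = 14   3 + 13^2 = 1    3 + 14^2 = 9
3 + 15^2 = 0    3 + 16^2 = 12   3 + 17^2 = 7    3 + 18^2 = 4
Why did this happen?
i^2 is not a good idea after all. Use a small variant instead: (h(k) + (-1)^(i-1) * ((i+1)/2)^2) mod m, with (i+1)/2 rounded down, which probes at offsets +1, -1, +4, -4, +9, -9, ...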
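The repetition comes from the fact that (m - i)^2 and i^2 are congruent mod m, so the second half of the sequence mirrors the first. The sketch below counts the distinct slots reached by plain i^2 probing and by the alternating-sign variant, reading that variant as offsets 0, +1, -1, +4, -4, ... (an interpretation of the slide's formula). The function names are hypothetical.

```python
def plain_quadratic(home, m):
    # Slots (home + i^2) mod m for i = 0..m-1.
    return [(home + i * i) % m for i in range(m)]


def alternating_quadratic(home, m):
    # Slots with offsets 0, +1, -1, +4, -4, +9, -9, ... (mod m).
    seq = []
    for i in range(m):
        sign = 1 if i % 2 else -1            # odd i: plus, even i: minus
        offset = sign * ((i + 1) // 2) ** 2  # i = 0 gives offset 0
        seq.append((home + offset) % m)
    return seq


slots_16 = len(set(plain_quadratic(3, 16)))            # 4 distinct slots
slots_19 = len(set(plain_quadratic(3, 19)))            # 10 distinct slots
slots_19_alt = len(set(alternating_quadratic(3, 19)))  # all 19 slots
```

The variant reaches every slot here because 19 is a prime of the form 4j + 3; for other table sizes full coverage should not be assumed.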
7. Open addressing: Quadratic Probing
- If the table size is prime and the load factor λ < 0.5, then:
- An element can always be inserted.
- All probes will be at different indices (the first ⌈m/2⌉ probes are distinct).
8. Open addressing: Quadratic Probing
- Disadvantage: secondary clustering
- If h(k1) = h(k2), the probing sequences for k1 and k2 are exactly the same.
- Is this really bad?
- In practice, not so much.
- It becomes an issue when the load factor is high.
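A small illustration of secondary clustering, assuming h(k) = k mod m (the slides do not fix a particular h) and a hypothetical helper quadratic_sequence: two different keys with the same home slot follow identical probe sequences.

```python
def quadratic_sequence(key, m, probes=5):
    home = key % m                       # assumed hash function h(k) = k mod m
    return [(home + i * i) % m for i in range(probes)]


seq_a = quadratic_sequence(3, 19)        # h(3)  = 3
seq_b = quadratic_sequence(22, 19)       # h(22) = 3 as well
same_route = seq_a == seq_b              # True: both keys compete for the same slots
```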
9. Open addressing: Double hashing
- The hash function is (h(k) + i * h2(k)) mod m
- In English: use a second hash function to obtain the next slot.
- The probing sequence is
- h(k), h(k) + h2(k), h(k) + 2 h2(k), h(k) + 3 h2(k), ... (all mod m)
- Performance
- Much better than linear or quadratic probing.
- Does not suffer from clustering
- BUT requires computation of a second function
10. Open addressing: Double hashing
- The choice of h2(k) is important
- It must never evaluate to zero
- Consider h2(k) = k mod 9 for k = 81: it evaluates to 0, so every probe lands on the same slot.
- The choice of m is important
- If it is not prime, we may run out of alternate locations very fast.
- If m and h2(k) are relatively prime, we'll end up probing the entire table.
- A good choice for h2 is h2(k) = p - (k mod p), where p is a prime less than m.
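The points above can be checked with a short sketch. It assumes h(k) = k mod m as the primary hash (not specified on the slide) and uses the hypothetical name double_hash_probes; the values m = 11 and p = 7 are chosen only for illustration.

```python
def double_hash_probes(k, m, p):
    h1 = k % m                 # assumed primary hash h(k) = k mod m
    h2 = p - (k % p)           # always in 1..p, so never zero
    return [(h1 + i * h2) % m for i in range(m)]


# Prime m = 11, p = 7, k = 81: h1 = 4, h2 = 3, every slot is visited once.
good = double_hash_probes(81, m=11, p=7)

# h2(k) = k mod 9 with k = 81 gives a step of 0: the probe never leaves slot 4.
stuck = {(81 % 11 + i * (81 % 9)) % 11 for i in range(11)}

# Non-prime m = 12 with step 4 from slot 9: only 3 distinct slots are reached.
short = {(9 + i * 4) % 12 for i in range(12)}
```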
11. Open addressing: Random hashing
- Use a pseudorandom number generator to obtain the
sequence of slots that are probed.
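One possible reading of this idea, sketched under assumptions the slide does not state: the generator is seeded from the key so that a later search replays the same sequence as the insertion, and the sequence is a permutation of all slots. The function name random_probe_sequence is hypothetical, and shuffling the whole table per key is done only for clarity.

```python
import random


def random_probe_sequence(key, m):
    rng = random.Random(key)     # assumption: seed with the key so the order is repeatable
    order = list(range(m))
    rng.shuffle(order)           # pseudorandom permutation of the m slots
    return order


first = random_probe_sequence(42, 10)
again = random_probe_sequence(42, 10)   # same key, same sequence
```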