Some Hash Functions

Slides: 13
Provided by: RandalE9
Learn more at: http://www.cs.cmu.edu
1
Experiments with Hashing 15-451 Feb. 15, 2001
  • Some Hash Functions
  • Bucket Size Distribution
  • Maximum Bucket Sizes

http://www.cs.cmu.edu/~bryant
2
Parameters
  • Keys
    • /usr/dict/words
    • N = 45,402 English words
      • From 'Aarhus' to 'Zurich'
    • 1-28 characters long
      • e.g. antidisestablishmentarianism (28 characters)
  • Hashing
    • Into M buckets
    • Load = N/M
    • 8 different hash functions

3
Hash Functions
  • Key x = c_1 c_2 ... c_len(x)
  • Functions
    • h1(x) = c_1 mod M
      • This is really bad!
      • Since there are only 52 possible characters
    • h2(x) = (Σ_i c_i) mod M
      • Hashes 'not' and 'ton' to the same bucket
    • h3(x) = (Σ_i a_i · c_i) mod M
      • a_i's are random 22-bit numbers
      • This should be a good function
    • h4(x) = (Σ_i (a_i · c_i + b_i)) mod M
      • a_i's, b_i's are random 22-bit numbers
      • This should be an even better function
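The first four functions can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: characters are converted to integers with `ord`, the coefficient tables `A` and `B` are hypothetical names filled from a seeded generator, and h4 is read as adding b_i to each term, since the operator did not survive in the transcript.

```python
import random

MAX_LEN = 28  # longest key length stated in the slides


def make_coeffs(n, bits=22, seed=0):
    """Return n random `bits`-bit integers (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    return [rng.getrandbits(bits) for _ in range(n)]


# Hypothetical coefficient tables, one entry per character position.
A = make_coeffs(MAX_LEN, seed=1)
B = make_coeffs(MAX_LEN, seed=2)


def h1(x, M):
    """First character only: at most 52 distinct values for alphabetic keys."""
    return ord(x[0]) % M


def h2(x, M):
    """Sum of character codes: ignores character order."""
    return sum(ord(c) for c in x) % M


def h3(x, M):
    """Position-weighted sum with random 22-bit weights a_i."""
    return sum(a * ord(c) for a, c in zip(A, x)) % M


def h4(x, M):
    """Like h3, plus a random offset b_i per position (assumed '+')."""
    return sum(a * ord(c) + b for a, b, c in zip(A, B, x)) % M
```

Because `h2` only sums character codes, `h2("not", M)` and `h2("ton", M)` are equal for every M, which is the collision the slide points out.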

4
More Hash Functions
  • h5(x) = (Σ_i a_i · c_i) mod M
    • a_i's are random 22-bit numbers
    • All sums and products computed modulo p = 524,287
    • This should be a good function
  • h6(x) = (Σ_i (a_i · c_i + b_i)) mod M
    • a_i's, b_i's are random 22-bit numbers
    • All sums and products computed modulo p = 524,287
    • This should be the best function
  • h7(x) = h6(first 5 characters of x)
    • Hashes 'botch', 'botches', 'botching', and
      'botched' to the same bucket
  • h8(x) = random(0..M-1)
    • Not a real hash function
    • Should represent the ideal case
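A rough Python sketch of h5 through h8 follows. As before, this is an illustration rather than the original experiment code: `ord` as the character encoding, the seeded tables `A` and `B`, and the reading of h6 as "a_i · c_i + b_i" are all assumptions. The modulus p = 524,287 is 2^19 - 1.

```python
import random

P = 524_287   # 2**19 - 1, the modulus p from the slides
MAX_LEN = 28  # longest key length stated in the slides

# Hypothetical coefficient tables of random 22-bit numbers.
_rng = random.Random(1)
A = [_rng.getrandbits(22) for _ in range(MAX_LEN)]
B = [_rng.getrandbits(22) for _ in range(MAX_LEN)]


def h5(x, M):
    """Weighted sum with every sum and product reduced modulo P."""
    total = 0
    for a, c in zip(A, x):
        total = (total + (a % P) * ord(c)) % P
    return total % M


def h6(x, M):
    """Like h5, plus a random offset b_i per position (assumed '+')."""
    total = 0
    for a, b, c in zip(A, B, x):
        total = (total + (a % P) * ord(c) + b) % P
    return total % M


def h7(x, M):
    """h6 applied to the first 5 characters only."""
    return h6(x[:5], M)


_h8_rng = random.Random(8)


def h8(x, M):
    """Ignores the key and returns a random bucket; not usable for lookup."""
    return _h8_rng.randrange(M)
```

Since `h7` sees only the prefix `"botch"`, the words `botch`, `botches`, `botching`, and `botched` all land in the same bucket regardless of M.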

5
Bucket Size Distribution
  • Experiment
    • Hash 45,402 keys into 128 buckets
    • Load = 354.7
      • Average number of keys per bucket
  • Measure
    • Range of bucket sizes
    • Normalize as count/load
      • Average = 1.0
    • Determines how well the hash function does at
      distributing keys
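The measurement described here can be sketched in Python as below. The helper names and the stand-in hash function `char_sum_hash` are illustrative choices, not taken from the slides; any function taking a key and M could be passed in its place.

```python
from collections import Counter


def bucket_sizes(keys, h, M):
    """Number of keys landing in each of the M buckets (zeros included)."""
    counts = Counter(h(k, M) for k in keys)
    return [counts.get(b, 0) for b in range(M)]


def normalized_sizes(keys, h, M):
    """Bucket sizes divided by the load N/M, so the average is 1.0."""
    load = len(keys) / M
    return [size / load for size in bucket_sizes(keys, h, M)]


def size_range(keys, h, M):
    """(smallest, largest) normalized bucket size for hash function h."""
    norm = normalized_sizes(keys, h, M)
    return min(norm), max(norm)


def char_sum_hash(x, M):
    """Stand-in hash for demonstration: sum of character codes mod M."""
    return sum(ord(c) for c in x) % M
```

With the slide's parameters the load is 45,402 / 128 ≈ 354.7, so a bucket holding 529 keys would have a normalized size of about 1.49.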

6
Bucket Size Distribution Results
7
Bucket Size Dist. Results (closeup)
8
Distribution Observations
  • Load = 354.7
  • h1 is really bad
    • Only uses 52 buckets
    • Largest one has 4532 elements
  • h7 is pretty bad too
    • Good function, but only over first 5 characters
    • Largest has 529 elements
  • Rest look fairly decent
    • h2: 441 max. Ignores order of characters
    • h4: 428 max. Why not better than h3?
    • h6: 409 max. Why not better than h5?
    • h8: 403 max. Random
      • Hey! This should be the best!
    • h3: 402 max. Mod p helps
    • h5: 400 max. Mod p helps

9
Maximum Bucket Size
  • Experiment
    • Hash 45,402 keys into M buckets
    • M = powers of 2 from 128 to 65,536
    • Load = 354.7 down to 0.69
  • Measure
    • Maximum bucket size
    • Normalize as count/load
    • Determines worst-case access time
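A minimal Python sketch of this sweep is given below. The function names are hypothetical, and the random stand-in hash plays the role of the "ideal" h8; the real experiment would pass each of the eight functions in turn.

```python
import random
from collections import Counter


def max_bucket_size(keys, h, M):
    """Size of the fullest bucket when keys are hashed by h into M buckets."""
    return max(Counter(h(k, M) for k in keys).values())


def sweep(keys, h, m_min=128, m_max=65536):
    """For M = m_min, 2*m_min, ..., m_max return (M, load, max, max/load)."""
    rows = []
    M = m_min
    while M <= m_max:
        load = len(keys) / M
        largest = max_bucket_size(keys, h, M)
        rows.append((M, load, largest, largest / load))
        M *= 2
    return rows


def make_random_hash(seed=0):
    """Stand-in for h8: ignores the key and picks a random bucket."""
    rng = random.Random(seed)
    return lambda key, M: rng.randrange(M)
```

Calling `sweep(keys, make_random_hash())` on 45,402 keys yields ten rows, with the load falling from about 354.7 at M = 128 to about 0.69 at M = 65,536.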

10
Max. Bucket Size Results
11
Max. Bucket Size Results (Closeup)
12
Bucket Size Observations
  • h1 is really bad
    • Only uses 52 buckets
    • Largest one has 4532 elements, independent of M
  • h7 is pretty bad too
    • Good function, but only over first 5 characters
    • Largest bucket with M = 65,536 has 197 elements
  • h2 doesn't do very well
    • Ignores order of characters
    • Largest bucket with M = 65,536 has 163 elements
  • Rest are comparable
    • 6-7 elements in largest bucket for M = 65,536
  • Compare to theory
    • When M = N, E[largest bucket size] ≈ log N / log
      log N
    • For M = 65,536, this would be 16/4 = 4.
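The arithmetic in the theory comparison can be checked with a short Python sketch. Two assumptions are made here: logarithms are taken base 2, which is what the slide's 16/4 figure implies, and the estimate is treated as an order-of-magnitude guide with constant factors omitted. The simulation helper is an illustrative addition for the M = N case, not part of the original experiment.

```python
import math
import random
from collections import Counter


def theory_estimate(N):
    """log N / log log N with base-2 logarithms (constants omitted)."""
    return math.log2(N) / math.log2(math.log2(N))


def simulate_max_bucket(N, seed=0):
    """Throw N keys into N buckets uniformly at random; return the max size."""
    rng = random.Random(seed)
    return max(Counter(rng.randrange(N) for _ in range(N)).values())


# log2(65,536) = 16 and log2(16) = 4, so the estimate is 16 / 4 = 4.
print(theory_estimate(65536))
```

Comparing `simulate_max_bucket(65536)` against the estimate gives a feel for how loose the approximation is at this size.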