Hash Functions - PowerPoint PPT Presentation

Provided by: csF2
Learn more at: http://www.cs.fsu.edu
Transcript and Presenter's Notes


1
Hash Functions
  • Andy Wang
  • Data Structures, Algorithms, and Generic
    Programming

2
Introduction
  • Hash function
  • Maps keys to integers (buckets)
  • Hash(Key) = Integer
  • Ideally in a random-like manner
  • Evenly distributed bucket values
  • Even if the input data is not evenly distributed

3
An Example
  • ID Number Generation
  • Key = your name
  • Hash(Key) = a number
  • Not a great hash function
  • Two people with the same name will have the same
    number

4
Simple Hash Functions
  • Assumptions
  • K = an unsigned 32-bit integer
  • M = the number of buckets (the number of entries
    in a hash table)
  • Goal
  • If a bit is changed in K, all bits are equally
    likely to change for Hash(K)

5
A Simple Hash Function
  • What if K ≤ M?
  • Hash(K) = K
  • What is wrong?
  • Your student ID = SSN
  • I can't use your SSN to post your grades

6
Another Simple Function
  • If K > M
  • Hash(K) = K % M
  • What is wrong?
  • Suppose M = 4, K = 2, 4, 6, 8
  • K % M = 2, 0, 2, 0

7
Yet Another Simple Function
  • If K > P, P = prime number
  • Hash(K) = K % P
  • Suppose P = 3, K = 2, 4, 6, 8
  • K % P = 2, 1, 0, 2
  • More uniform distribution, but still problematic
    for other cases
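
The difference between a modulus of 4 and a prime modulus of 3 can be checked with a minimal C sketch. The function name `mod_hash` is hypothetical and does not come from the slides.

```c
#include <stdint.h>

/* Modulo hash: maps a 32-bit key onto one of `buckets` slots.
   `mod_hash` is an illustrative name, not taken from the slides. */
uint32_t mod_hash(uint32_t key, uint32_t buckets) {
    return key % buckets;
}
```

For the keys 2, 4, 6, 8, `mod_hash(k, 4)` gives 2, 0, 2, 0 (only two of four buckets are used), while `mod_hash(k, 3)` gives 2, 1, 0, 2 (all three buckets are used).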

8
More on Prime Numbers
  • K > P1 > P2, P1 and P2 are prime numbers
  • Hash(K) = (K % P1) % P2
  • Suppose P1 = 5, P2 = 3, K = 2, 4, 6, 8, 10
  • (K % 5) = 2, 4, 1, 3, 0
  • (K % 5) % 3 = 2, 1, 1, 0, 0
  • Still uniform distribution
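
A small C sketch of the two-stage reduction, assuming both moduli are passed in as parameters. The name `double_mod_hash` is hypothetical.

```c
#include <stdint.h>

/* Two-stage modulo hash: reduce by prime p1 first, then by a smaller prime p2.
   `double_mod_hash` is an illustrative name, not taken from the slides. */
uint32_t double_mod_hash(uint32_t key, uint32_t p1, uint32_t p2) {
    return (key % p1) % p2;
}
```

With p1 = 5 and p2 = 3, the keys 2, 4, 6, 8, 10 map to 2, 1, 1, 0, 0, matching the values on the slide.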

9
Polynomial Functions
  • If K > P, P = prime number
  • Hash(K) = K(K + 3) % P
  • Slightly better than pure modulo functions
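
One way to sketch the polynomial form K(K + 3) % P in C. Reducing each factor modulo P before multiplying is an implementation choice made here, not something the slides specify: it keeps the product inside 64 bits for every 32-bit key. The name `poly_hash` is hypothetical.

```c
#include <stdint.h>

/* Polynomial hash K * (K + 3) mod p.
   Each factor is reduced mod p first so the 64-bit product cannot overflow. */
uint32_t poly_hash(uint32_t key, uint32_t p) {
    uint64_t k = key;            /* widen so k + 3 cannot wrap around */
    uint64_t a = k % p;
    uint64_t b = (k + 3) % p;
    return (uint32_t)((a * b) % p);
}
```

For example, with p = 7 the key 2 gives 2 * 5 = 10, and 10 % 7 = 3.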

10
How About
  • Hash(K) = rand()
  • What is wrong?
  • Not repeatable

11
How About
  • K > P, P = prime number
  • Hash(K) = rand(K) % P
  • Better randomness
  • Can be expensive to compute random numbers
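
A rough sketch of a repeatable "random" hash, assuming rand(K) means a pseudo-random generator seeded with the key. A single linear congruential step stands in for the generator here; that substitution, the constants, and the names `lcg_step` and `seeded_hash` are assumptions for illustration, not part of the slides.

```c
#include <stdint.h>

/* One step of a linear congruential generator, using the key as the seed.
   Arithmetic wraps modulo 2^32 because the operands are 32-bit unsigned. */
static uint32_t lcg_step(uint32_t seed) {
    return seed * 1664525u + 1013904223u;
}

/* Same key always produces the same bucket, unlike an unseeded rand(). */
uint32_t seeded_hash(uint32_t key, uint32_t p) {
    return lcg_step(key) % p;
}
```

Because the result depends only on the key, calling `seeded_hash` twice with the same arguments returns the same bucket, which is the property the plain rand() version lacks.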

12
Pre-generated Randomness
  • Two prime numbers P1 and P2
  • K > P1 and K > P2
  • A table R[P1], with R[i] pre-initialized to
    rand(i) % P2
  • Hash(K) = R[K % P1]
  • Slight problem: possible duplicate mapping

13
To Avoid Duplicate Mapping
  • Two prime numbers P1 and P2
  • K > P1 and K > P2
  • A table R[P1], with R[i] pre-initialized to
    unique random numbers
  • Hash(K) = R[K % P1]
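
One possible way to fill such a table with unique values is a partial Fisher-Yates shuffle over the bucket numbers 0..P2-1. This is a sketch under assumptions: the slides do not say how the unique numbers are produced, P2 >= P1 is required for uniqueness to be possible, and the names `init_table` and `table_hash` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define P1 3   /* table size (prime) */
#define P2 5   /* number of buckets (prime); must be >= P1 */

uint32_t R[P1];

/* Fill R with P1 distinct values drawn from 0..P2-1. */
void init_table(unsigned seed) {
    uint32_t pool[P2];
    for (uint32_t i = 0; i < P2; i++)
        pool[i] = i;
    srand(seed);
    for (uint32_t i = 0; i < P1; i++) {
        /* pick a random remaining element and swap it into position i */
        uint32_t j = i + (uint32_t)rand() % (P2 - i);
        uint32_t tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
        R[i] = pool[i];
    }
}

/* Table lookup hash: no random numbers are computed at lookup time. */
uint32_t table_hash(uint32_t key) {
    return R[key % P1];
}
```

The table is built once, so the cost of generating random numbers is paid at initialization rather than on every call.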

14
An Example
  • K ∈ [0, 2^32), P1 = 3, P2 = 5
  • R[3] = {0, 4, 1}
  • Hash(K) = R[K % 3]
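
The example table can be written out directly in C. The name `example_hash` is hypothetical; the table values are the ones given on the slide.

```c
#include <stdint.h>

/* Pre-generated table from the example: three unique values below P2 = 5. */
static const uint32_t R[3] = {0, 4, 1};

/* Hash(K) = R[K % 3] */
uint32_t example_hash(uint32_t key) {
    return R[key % 3];
}
```

Keys 0, 1, 2 map to buckets 0, 4, 1, and the pattern repeats every three keys, so key 10 (10 % 3 = 1) lands in bucket 4.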

15
Hashing a Sequence of Keys
  • K = (K1, K2, ..., Kn)
  • E.g., Hash("test") = 98157
  • Design Principles
  • Use the entire key
  • Use the ordering information
  • Use pre-generated randomness

16
Use the Entire Key
  • unsigned int Hash(const char *Key) {
  •   unsigned int hash = 0;
  •   for (unsigned int j = 0; j < K; j++)   // K = length of Key
  •     hash = hash ^ Key[j];
  •   return hash;
  • }
  • Problem: Hash("ab") = Hash("ba")
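
A runnable sketch of the whole-key hash, assuming the characters are combined with XOR and the key is a NUL-terminated string. Any commutative combining step (addition, for instance) shows the same ordering problem. The name `xor_hash` is hypothetical.

```c
#include <stddef.h>

/* Combine every character of the key with XOR.
   XOR is commutative, so the order of the characters is lost. */
unsigned int xor_hash(const char *key) {
    unsigned int hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++)
        hash ^= (unsigned char)key[j];
    return hash;
}
```

Both `xor_hash("ab")` and `xor_hash("ba")` evaluate to 0x61 ^ 0x62 = 3, which is exactly the collision the slide points out.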

17
Use the Ordering Information
  • unsigned int Hash(const char *Key) {
  •   unsigned int hash = 0;
  •   for (unsigned int j = 0; j < K; j++) {   // K = length of Key
  •     hash = hash ^ Key[j];
  •     hash = /* hash with some shiftings */;
  •   }
  •   return hash;
  • }
  • Problem: H(short keys) will not perturb all
    32 bits (clustering)
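
The slide leaves the shifting step unspecified. As one arbitrary illustration, the sketch below rotates the hash left by 3 bits after mixing in each character; the rotation amount and the name `order_hash` are assumptions, not taken from the slides.

```c
#include <stdint.h>
#include <stddef.h>

/* XOR in each character, then rotate left by 3 bits so that
   a character's contribution depends on its position in the key. */
uint32_t order_hash(const char *key) {
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        hash ^= (unsigned char)key[j];
        hash = (hash << 3) | (hash >> 29);   /* 32-bit rotate left by 3 */
    }
    return hash;
}
```

Here "ab" hashes to 0x1B50 and "ba" to 0x1B88, so ordering now matters. The single-character key "a" hashes to 0x308, which touches only the low bits and illustrates the clustering problem for short keys.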

18
Use Pre-generated Randomness
  • unsigned int Hash(const char *Key) {
  •   unsigned int hash = 0;
  •   for (unsigned int j = 0; j < K; j++) {   // K = length of Key
  •     hash = hash ^ R[Key[j]];
  •     hash = /* hash with some shiftings */;
  •   }
  •   return hash;
  • }

19
CRC Variant
  • Do 5-bit circular shift of hash
  • XOR hash and K[j]
  • for (...) {
  •   highorder = hash & 0xf8000000;
  •   hash = hash << 5;
  •   hash = hash ^ (highorder >> 27);
  •   hash = hash ^ K[j];
  • }
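
The loop body above can be wrapped into a complete function. This sketch assumes a NUL-terminated string key and an initial hash of 0; the name `crc_variant_hash` is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* CRC-variant hash: 5-bit circular left shift, then XOR in the next character. */
uint32_t crc_variant_hash(const char *key) {
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        uint32_t highorder = hash & 0xf8000000u;  /* save the top 5 bits */
        hash <<= 5;                               /* shift them out */
        hash ^= highorder >> 27;                  /* wrap them into the low end */
        hash ^= (unsigned char)key[j];            /* mix in the next character */
    }
    return hash;
}
```

For instance, "ab" hashes to 0xC42 and "ba" to 0xC21, so the ordering of characters affects the result.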

20
CRC Variant
  • For long keys, all 32 bits are exercised
  • More randomness toward lower bits
  • Drawback: not all bits are changed for short keys

21
BUZ Hash
  • Set up an array R to store precomputed random
    numbers
  • for (...) {
  •   highorder = hash & 0x80000000;
  •   hash = hash << 1;
  •   hash = hash ^ (highorder >> 31);
  •   hash = hash ^ R[K[j]];
  • }
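
A self-contained sketch of BUZ hash in C. The slides do not say how R is filled, so this version assumes a 256-entry table (one entry per byte value) populated by a xorshift generator; the generator and the names `buz_init` and `buz_hash` are illustrative choices.

```c
#include <stdint.h>
#include <stddef.h>

uint32_t R[256];   /* one precomputed random word per possible byte value */

/* Fill R using a 32-bit xorshift generator (state must be nonzero). */
void buz_init(uint32_t seed) {
    uint32_t x = seed ? seed : 1u;
    for (int i = 0; i < 256; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        R[i] = x;
    }
}

/* BUZ hash: 1-bit circular left shift, then XOR in the table entry
   selected by the next character. */
uint32_t buz_hash(const char *key) {
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        uint32_t highorder = hash & 0x80000000u;  /* save the top bit */
        hash <<= 1;
        hash ^= highorder >> 31;                  /* wrap it into bit 0 */
        hash ^= R[(unsigned char)key[j]];
    }
    return hash;
}
```

Because each character contributes a full 32-bit random word, even a one-character key can affect all 32 bits, which addresses the short-key weakness of the CRC variant. `buz_init` must be called once before `buz_hash` is used.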

22
References
  • Aho, Sethi, and Ullman. Compilers: Principles,
    Techniques, and Tools, 1986.
  • Cormen, Leiserson, and Rivest. Introduction to
    Algorithms, 1990.
  • Knuth. The Art of Computer Programming, 1973.
  • Kuenning. Hash Functions, 2003.