Title: Look-up problem
1 Look-up problem
Given an IP address: did we see this IP address before?
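2 Hashing: chaining
Idea: use the IP address as an index into a table. Since there are $2^{32}$ possible (IPv4) addresses, first apply a hash function to the IP address to get an index into a smaller table; each table entry holds a linked list of the stored IP addresses that hash to that index.
[Figure: IP address → hash function → index → linked list]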
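A minimal sketch of hashing with chaining, assuming IPv4 addresses are first converted to 32-bit integers with Python's standard `ipaddress` module. The class name `ChainedHashTable` and the choice $h(x) = x \bmod 257$ are illustrative only, and Python lists stand in for the linked lists.

```python
import ipaddress

class ChainedHashTable:
    """Hash table with chaining: slot i holds all stored keys x with h(x) == i."""

    def __init__(self, n, h):
        self.n = n
        self.h = h  # hypothetical hash function mapping a key to 0..n-1
        self.slots = [[] for _ in range(n)]  # one chain per table entry

    def insert(self, x):
        chain = self.slots[self.h(x)]
        if x not in chain:  # store each address at most once
            chain.append(x)

    def seen(self, x):
        # Only the chain of slot h(x) has to be scanned.
        return x in self.slots[self.h(x)]

def ip_to_int(s):
    """Convert a dotted IPv4 string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(s))

table = ChainedHashTable(257, lambda x: x % 257)
table.insert(ip_to_int("192.168.0.1"))
print(table.seen(ip_to_int("192.168.0.1")), table.seen(ip_to_int("10.0.0.1")))
```

A look-up costs time proportional to the length of one chain, so the quality of the hash function decides how well this works.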
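3 How to choose a hash function?
It depends on the distribution of the data:
$x$ uniformly distributed in $[0,1)$: $x \mapsto \lfloor x \cdot n \rfloor$
interpret $x$ as a number: $x \mapsto x \bmod n$
IP addresses: $n = 256$ is bad (it keeps only the last byte of the address); is $n = 257$ good?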
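A small illustration of why $n = 256$ is bad, using a hypothetical set of addresses $10.0.c.0$ ($c = 0,\dots,255$) that all end in the byte 0. The helper name `octets_to_int` is made up for this sketch.

```python
def octets_to_int(octets):
    """Pack (x1, x2, x3, x4) into the 32-bit number x1*256^3 + x2*256^2 + x3*256 + x4."""
    x1, x2, x3, x4 = octets
    return (x1 << 24) | (x2 << 16) | (x3 << 8) | x4

# Hypothetical workload: 256 addresses that differ only in the third byte.
addrs = [octets_to_int((10, 0, c, 0)) for c in range(256)]

slots_256 = {x % 256 for x in addrs}  # x mod 256 is just the last byte
slots_257 = {x % 257 for x in addrs}  # depends on all four bytes
print(len(slots_256), len(slots_257))
```

Here every address lands in the same slot modulo 256 but in 256 different slots modulo 257. Because $256 \equiv -1 \pmod{257}$, $x \bmod 257 \equiv x_4 - x_3 + x_2 - x_1$, so all bytes matter; still, structured inputs can collide (for example $1.1.0.0$ and $0.0.0.0$ both map to 0), which is why a fixed hash function is only "good?".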
5 Universal hash functions
Choose a hash function randomly.
$n$ = number of entries in the hash table, $U$ = the universe, $h: U \to \{0,\dots,n-1\}$ a hash function.
A set of hash functions $H$ is universal if for all $x \neq y \in U$ and a random $h \in H$: $P(h(x) = h(y)) \le 1/n$.
For IP addresses, choose $a_1,a_2,a_3,a_4 \in \{0,1,\dots,256\}$ at random and hash $(x_1,x_2,x_3,x_4) \mapsto a_1x_1 + a_2x_2 + a_3x_3 + a_4x_4 \bmod 257$.
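A sketch of this family, assuming an IP address is given as a 4-tuple of bytes; the function names are hypothetical. The helper `colliding_coefficients` shows the core of the universality argument: if $x_j \neq y_j$, then for any fixed values of the other coefficients, exactly one of the 257 choices of $a_j$ makes $h(x) = h(y)$, because $257$ is prime and $x_j - y_j \not\equiv 0$ can be inverted modulo 257. Hence $P(h(x) = h(y)) = 1/257$.

```python
import random

N = 257  # prime table size from the slide; slots are 0..256

def make_hash(a):
    """h_a(x1, x2, x3, x4) = a1*x1 + a2*x2 + a3*x3 + a4*x4 mod 257."""
    def h(ip):
        return sum(ai * xi for ai, xi in zip(a, ip)) % N
    return h

def random_hash(rng):
    """Draw h uniformly from H by choosing each a_i from {0, 1, ..., 256}."""
    return make_hash([rng.randrange(N) for _ in range(4)])

def colliding_coefficients(x, y, a, j):
    """Assuming x[j] != y[j]: with all coefficients except a[j] fixed,
    count the values of a[j] in {0..256} for which h_a(x) == h_a(y)."""
    count = 0
    for aj in range(N):
        b = list(a)
        b[j] = aj
        h = make_hash(b)
        if h(x) == h(y):
            count += 1
    return count

h = random_hash(random.Random(1))
print(h((192, 168, 0, 1)))
print(colliding_coefficients((192, 168, 0, 1), (192, 168, 0, 2), [5, 17, 99, 0], 3))
```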
7 Perfect hashing
Goal: worst-case $O(1)$ search, space used $O(m)$, for a static set of $m$ elements.
First attempt: $n = m^2$, i.e., space used $\Theta(m^2)$.
If $H$ is a family of universal hash functions, then there is a hash function $h \in H$ with no collision.
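Why such an $h$ exists (a standard counting argument): let $C$ be the number of colliding pairs. By universality each of the $\binom{m}{2}$ pairs collides with probability at most $1/n = 1/m^2$, so

$$\mathbb{E}[C] \le \binom{m}{2}\cdot\frac{1}{m^2} = \frac{m-1}{2m} < \frac12,
\qquad P(C \ge 1) \le \mathbb{E}[C] < \frac12 .$$

So a random $h \in H$ is collision-free with probability more than $1/2$.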
Second attempt: $n = m$, $H$ a family of universal hash functions, and $x_1,\dots,x_n$ the numbers of elements that map to slots $1,2,\dots,n$.
Then there is $h \in H$ such that $\sum_i x_i^2 = O(m)$.
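A sketch of why: write $C$ for the number of colliding pairs under $h$. Since $x_i^2 = x_i + 2\binom{x_i}{2}$ and $\sum_i x_i = m$,

$$\sum_i x_i^2 = m + 2C, \qquad \mathbb{E}[C] \le \binom{m}{2}\cdot\frac{1}{n} = \frac{m-1}{2},
\qquad \mathbb{E}\Big[\sum_i x_i^2\Big] \le m + (m-1) < 2m .$$

So some $h \in H$ achieves $\sum_i x_i^2 < 2m$, and by Markov's inequality a random $h$ gives $\sum_i x_i^2 \le 4m$ with probability more than $1/2$.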
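Store the $x_i$ elements of slot $i$ in a secondary hash table of size $x_i^2$, using a collision-free hash function for that slot (as in the $n = m^2$ case); the total space is $O(n + \sum_i x_i^2) = O(m)$.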
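A minimal sketch of the two-level scheme, under stated assumptions: keys are distinct integers (for example IPv4 addresses as 32-bit numbers), and, since the slides do not fix a family for general $n$, it swaps in the Carter-Wegman family $h(x) = ((a x + b) \bmod P) \bmod n$ with the prime $P = 2^{61}-1$. Each level retries random functions until the bound from the argument above holds; the names `build_perfect` and `member` are hypothetical.

```python
import random

P = 2**61 - 1  # Mersenne prime; keys are assumed to be distinct integers in [0, P)

def random_hash(n, rng):
    """Draw h(x) = ((a*x + b) mod P) mod n from a universal (Carter-Wegman) family."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % n

def build_perfect(keys, seed=0):
    """Two-level perfect hashing for a static list of distinct integer keys."""
    rng = random.Random(seed)
    m = len(keys)
    n = max(m, 1)  # primary table with n = m slots
    # Primary level: retry until sum of x_i^2 <= 4m (succeeds with prob. > 1/2).
    while True:
        h = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for key in keys:
            buckets[h(key)].append(key)
        if sum(len(b) ** 2 for b in buckets) <= 4 * m:
            break
    # Secondary level: slot i gets a table of size x_i^2 with no collisions.
    tables = []
    for bucket in buckets:
        size = len(bucket) ** 2
        if size == 0:
            tables.append((None, []))
            continue
        while True:  # a random g is collision-free with prob. > 1/2
            g = random_hash(size, rng)
            slots = [None] * size
            for key in bucket:
                if slots[g(key)] is not None:
                    break  # collision: try another g
                slots[g(key)] = key
            else:
                break  # no collision in this bucket
        tables.append((g, slots))
    return h, tables

def member(structure, x):
    """Worst-case O(1): one primary and one secondary hash evaluation."""
    h, tables = structure
    g, slots = tables[h(x)]
    return g is not None and slots[g(x)] == x
```

The retry loops run an expected constant number of times per level; the search itself never loops, which is what gives the worst-case $O(1)$ bound.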
11 Bloom filter
Goal: store an $m$-element subset of IP addresses.
[Figure: an IP address is fed to several hash functions, each selecting one bit in an array of $n$ bits of storage, initially all 0]
12 Bloom filter: insert
INSERT(x):
  for i from 1 to k do
    A[h_i(x)] ← 1
[Figure: the bits selected by the hash functions of the IP address are set to 1 in the $n$ bits of storage]
13 Bloom filter: member
MEMBER(x):
  for i from 1 to k do
    if A[h_i(x)] = 0 then return FALSE
  return TRUE
[Figure: the bits selected by the hash functions of the IP address are checked in the $n$ bits of storage]
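A sketch of INSERT and MEMBER in Python. The slides do not say which $k$ hash functions to use; this sketch assumes integer keys (IPv4 addresses as 32-bit numbers) and draws each $h_i(x) = ((a_i x + b_i) \bmod P) \bmod n$ at random. The class name `BloomFilter` is hypothetical, and a `bytearray` with one byte per bit stands in for the bit array $A$.

```python
import random

P = 2**61 - 1  # prime larger than any 32-bit key

class BloomFilter:
    def __init__(self, n_bits, k, seed=0):
        rng = random.Random(seed)
        self.n = n_bits
        self.bits = bytearray(n_bits)  # A[0..n-1], all 0 initially
        # Parameters (a_i, b_i) of the k hash functions h_1, ..., h_k.
        self.params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]

    def _positions(self, x):
        """The k bit positions h_1(x), ..., h_k(x)."""
        return [((a * x + b) % P) % self.n for a, b in self.params]

    def insert(self, x):
        for j in self._positions(x):
            self.bits[j] = 1

    def member(self, x):
        for j in self._positions(x):
            if self.bits[j] == 0:
                return False  # x was certainly never inserted
        return True  # x was inserted, or this is a false positive

bf = BloomFilter(n_bits=10000, k=7, seed=3)
bf.insert(3232235521)  # 192.168.0.1 as an integer
print(bf.member(3232235521))
```

An inserted element always has all of its $k$ bits set, so MEMBER never answers FALSE for it; errors are only one-sided.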
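MEMBER sometimes gives a false positive answer: it may return TRUE for an element that was never inserted, but it never returns FALSE for one that was.
Error parameter: the false positive probability.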
15 Bloom filter analysis
Error parameter: the false positive probability.
$m$ = number of items to be stored, $n$ = number of bits of storage, $k$ = number of hash functions.
$p$ = fraction of the bits still 0 after all $m$ items are inserted: $p \approx e^{-km/n}$.
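Where the approximation comes from, assuming each hash function sends each item to a uniformly random bit, independently: a fixed bit is left at 0 by all $km$ hash evaluations with probability

$$\left(1 - \frac{1}{n}\right)^{km} = \left(\left(1 - \frac{1}{n}\right)^{n}\right)^{km/n} \approx e^{-km/n},$$

using $(1 - 1/n)^n \approx e^{-1}$ for large $n$. So the expected fraction of bits still 0 is about $e^{-km/n}$.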
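False positive probability: approximately $(1-p)^k \approx \left(1 - e^{-km/n}\right)^k$, since a non-member is reported present only if all $k$ of its bits are 1.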
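As a worked example (numbers chosen only for illustration), take $n/m = 10$ bits per item and $k = 7$ hash functions:

$$p \approx e^{-7/10} \approx 0.4966, \qquad (1-p)^k \approx 0.5034^{7} \approx 0.0082,$$

i.e., a false positive rate of roughly 0.8%.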
Optimal $k \approx (\ln 2)\, n/m \approx 0.7\, n/m$; then $p \approx 1/2$ and the false positive rate is $\approx 2^{-k} \approx 0.6185^{n/m}$.
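A quick numerical check of the optimum, evaluating the approximate formula $(1 - e^{-km/n})^k$ for integer $k$; the function names and the example sizes $m = 1000$, $n = 10000$ are illustrative assumptions.

```python
import math

def false_positive_rate(m, n, k):
    """Approximate false positive probability (1 - e^{-km/n})^k."""
    p = math.exp(-k * m / n)  # expected fraction of bits still 0
    return (1 - p) ** k

def best_k(m, n, k_max=50):
    """Integer k in 1..k_max that minimizes the approximate false positive rate."""
    return min(range(1, k_max + 1), key=lambda k: false_positive_rate(m, n, k))

m, n = 1000, 10000
k = best_k(m, n)
print(k, round(math.log(2) * n / m, 2))
print(false_positive_rate(m, n, k), 0.6185 ** (n / m))
```

The integer optimum sits next to $(\ln 2)\, n/m$, and the resulting rate agrees closely with $0.6185^{n/m}$.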