Bloom filters - PowerPoint PPT Presentation

About This Presentation
Title: Bloom filters
Last modified by: ncnu

Slides: 10
Provided by: edut1522

Transcript and Presenter's Notes

1
Bloom filters
  • Probability and Computing: Randomized Algorithms and
    Probabilistic Analysis, pp. 109-111
  • Michael Mitzenmacher and Eli Upfal

2
Introduction
  • The approximate set membership problem.
  • A trade-off between space and the false-positive
    probability.
  • A generalization of the ideas behind hashing.

3
Approximate set membership problem
  • Suppose we have a set
    $S = \{s_1, s_2, \ldots, s_m\} \subseteq U$ from a universe $U$.
  • Represent S in such a way that we can quickly answer:
    "Is $x$ an element of $S$?"
  • To use as little space as possible, we allow
    false positives (i.e., $x \notin S$, but we answer yes).
  • If $x \in S$, we must answer yes.

4
Bloom filters
  • A Bloom filter consists of an array $A$ of n bits (the space used) and
    k independent random hash functions
  • $h_1, \ldots, h_k : U \to \{0, 1, \ldots, n-1\}$.
  • 1. Initially, set every bit of the array to 0.
  • 2. For each $s \in S$, set $A[h_i(s)] = 1$ for $1 \le i \le k$.
  • (An entry may be set to 1 multiple times; only
    the first time has an effect.)
  • 3. To check whether $x \in S$, we check whether all
    locations $A[h_i(x)]$ for $1 \le i \le k$ are set to 1.
  • If not, clearly $x \notin S$.
  • If all $A[h_i(x)]$ are set to 1, we assume $x \in S$ (this may be a false positive).
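The three steps above translate directly into code. Below is a minimal Python sketch, not taken from the slides: the class name `BloomFilter`, the methods `add` and `might_contain`, and the use of `hashlib.blake2b` over the index i concatenated with the item to stand in for $h_1, \ldots, h_k$ are all illustrative assumptions. Real hash functions are not truly independent and random, so the analysis on the following slides applies to it only approximately.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an n-bit array A and k hash functions."""

    def __init__(self, n: int, k: int) -> None:
        self.n = n  # number of bits in A
        self.k = k  # number of hash functions
        # Step 1: all n bits start at 0 (packed 8 bits per byte).
        self.bits = bytearray((n + 7) // 8)

    def _positions(self, item: str):
        # Assumption: h_i(x) is simulated by hashing i and x together with
        # BLAKE2b and reducing mod n; these are not truly random functions.
        data = item.encode("utf-8")
        for i in range(self.k):
            digest = hashlib.blake2b(i.to_bytes(4, "little") + data, digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.n

    def add(self, item: str) -> None:
        # Step 2: set A[h_i(s)] = 1 for 1 <= i <= k; setting a 1 again has no effect.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Step 3: any A[h_i(x)] == 0 means x is definitely not in S;
        # if all are 1 we answer "yes", which may be a false positive.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


bf = BloomFilter(n=1000, k=7)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
print(bf.might_contain("apple"))  # True: inserted items are always reported present
```

A query for an inserted item always returns `True`. A query for an item that was never inserted usually returns `False`, but it can return `True`, which is exactly a false positive.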

5
[Figure: the bit array A, initialized with all entries 0]

6
The probability of a false positive
  • We assume the hash functions are random.
  • After all the elements of S are hashed into the
    Bloom filter, the probability that a specific
    bit is still 0 is
    $p' = \left(1 - \frac{1}{n}\right)^{km} \approx e^{-km/n} = p$.
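Under the random-hash assumption, each of the km hash evaluations misses a fixed bit with probability $1 - 1/n$, independently, which gives $(1 - 1/n)^{km}$; this is close to $e^{-km/n}$ when n is large. The short Python sketch below compares the two expressions for a couple of arbitrarily chosen parameter settings; the function names are illustrative.

```python
import math


def prob_bit_still_zero(n: int, m: int, k: int) -> float:
    """Exact probability (1 - 1/n)^(k*m) that a given bit is still 0."""
    return (1 - 1 / n) ** (k * m)


def prob_bit_still_zero_approx(n: int, m: int, k: int) -> float:
    """Approximation e^(-k*m/n), accurate when n is large."""
    return math.exp(-k * m / n)


for n, m, k in [(100, 10, 3), (10_000, 1_000, 5)]:
    exact = prob_bit_still_zero(n, m, k)
    approx = prob_bit_still_zero_approx(n, m, k)
    print(f"n={n}, m={m}, k={k}: exact={exact:.6f}, approx={approx:.6f}")
```

Because $1 - x < e^{-x}$ for $x > 0$, the exponential form always slightly overestimates the exact probability, and for a fixed ratio km/n the gap shrinks as n grows.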

7
  • To simplify the analysis, we can assume that a
    fraction p of the entries are still 0 after all
    the elements of S are hashed into the Bloom filter.
  • In fact, let X be the random variable counting the
    positions that are still 0, so $E[X] = n(1 - 1/n)^{km}$.
    By a Chernoff bound, X is concentrated around its expectation.
  • It follows that X/n is very close to p with very
    high probability.
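The concentration of X/n can also be observed empirically. The sketch below assumes ideal random hash functions, so each of the km hash values is modeled as a uniform random position drawn with Python's `random` module; it repeatedly fills an n-bit array and records the fraction of bits still 0. The function name `zero_fraction`, the parameter values, and the seed are arbitrary illustrative choices.

```python
import math
import random


def zero_fraction(n: int, m: int, k: int, rng: random.Random) -> float:
    """Fraction X/n of bits still 0 after inserting m items with k hashes each.

    Assumption: every hash value is an independent uniform position in [0, n).
    """
    bits = [0] * n
    for _ in range(k * m):
        bits[rng.randrange(n)] = 1
    return bits.count(0) / n


rng = random.Random(2024)  # fixed seed so runs are reproducible
n, m, k = 10_000, 1_000, 5
p = math.exp(-k * m / n)
samples = [zero_fraction(n, m, k, rng) for _ in range(20)]
print(f"p = {p:.4f}")
print(f"X/n ranged from {min(samples):.4f} to {max(samples):.4f} over 20 trials")
```

Across the repeated trials the observed values of X/n should cluster tightly around $e^{-km/n}$, which is why the analysis can treat the fraction of 0 bits as if it were exactly p.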

8
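  • The probability of a false positive f is
    $f = \left(1 - \left(1 - \frac{1}{n}\right)^{km}\right)^k \approx \left(1 - e^{-km/n}\right)^k = (1 - p)^k$.
  • We want the k that minimizes f.
  • Minimizing f is equivalent to minimizing
    $g = \ln f = k \ln\left(1 - e^{-km/n}\right)$; the derivative $dg/dk$ is 0 at
  • $k = \ln 2 \cdot (n/m)$, which is the global minimum.
  • At this k, $f = (1/2)^k \approx (0.6185)^{n/m}$.
  • The false-positive probability falls
    exponentially in n/m, the number of bits used per
    item!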
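The optimization can be checked numerically. The sketch below evaluates the approximation $f = (1 - e^{-km/n})^k$ for 10 bits per item (an arbitrary choice), compares the real-valued optimum $k = \ln 2 \cdot (n/m)$ with the best integer k found by brute force, and checks that $f = (1/2)^k \approx 0.6185^{n/m}$ at the optimum. The function names `false_positive_rate` and `optimal_k` are illustrative.

```python
import math


def false_positive_rate(n: int, m: int, k: float) -> float:
    """Approximate false-positive probability f = (1 - e^{-km/n})^k."""
    return (1 - math.exp(-k * m / n)) ** k


def optimal_k(n: int, m: int) -> float:
    """Real-valued minimizer k = ln(2) * (n/m) of the approximation above."""
    return math.log(2) * n / m


n, m = 10_000, 1_000  # 10 bits per item (illustrative)
k_star = optimal_k(n, m)
# k must be a positive integer in practice, so search a range of integers.
best_int_k = min(range(1, 30), key=lambda k: false_positive_rate(n, m, k))

print(f"k* = {k_star:.3f}, best integer k = {best_int_k}")
print(f"f(k*) = {false_positive_rate(n, m, k_star):.6f}")
print(f"(1/2)^k* = {0.5 ** k_star:.6f}, 0.6185^(n/m) = {0.6185 ** (n / m):.6f}")
```

Since k must be an integer, one rounds $\ln 2 \cdot (n/m)$ to a nearby integer; here $k^* \approx 6.93$ and the brute-force search picks 7.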

9
Conclusion
  • A Bloom filter is like a hash table, but it simply
    uses one bit per location to record whether any item hashed
    to that location.
  • If k = 1, it is equivalent to a hashing-based
    fingerprint system.
  • If n = cm for a small constant c, such as c = 8, then
    with k = 5 or 6 the false-positive probability is just
    over 2%.
  • Interestingly, when k is optimal,
    $k = \ln 2 \cdot (n/m)$, we get $p = e^{-km/n} = 1/2$.
  • An optimized Bloom filter therefore looks like a random
    bit string: each bit is 0 with probability about 1/2.
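The figures for n = 8m can be checked with a small simulation. The sketch below assumes ideal random hash functions (every hash value, for stored items and for queried non-members alike, is an independent uniform position), uses one byte per bit for simplicity, and compares the empirical false-positive rate with $(1 - e^{-k/c})^k$. The function name `simulate_false_positive_rate`, m = 1000, the trial count, and the seeds are all illustrative choices.

```python
import math
import random


def simulate_false_positive_rate(m: int, c: int, k: int, trials: int, seed: int = 0):
    """Return (empirical false-positive rate, fraction of 0 bits) for n = c*m.

    Assumption: every hash value, for stored items and for queried
    non-members alike, is an independent uniform position in [0, n).
    """
    rng = random.Random(seed)
    n = c * m
    bits = bytearray(n)  # one byte per bit, for simplicity
    for _ in range(m * k):  # insert m items, k hash positions each
        bits[rng.randrange(n)] = 1
    # A query is a false positive when all k of its positions are already 1.
    hits = sum(all(bits[rng.randrange(n)] for _ in range(k)) for _ in range(trials))
    return hits / trials, bits.count(0) / n


m, c = 1_000, 8
for k in (5, 6):
    analytic = (1 - math.exp(-k / c)) ** k  # (1 - e^{-km/n})^k with n = c*m
    empirical, zero_frac = simulate_false_positive_rate(m, c, k, trials=100_000, seed=k)
    print(f"k={k}: formula {analytic:.4f}, simulated {empirical:.4f}, 0-bit fraction {zero_frac:.3f}")

k_opt = math.log(2) * c  # real-valued optimum ln(2) * (n/m)
print(f"k_opt = {k_opt:.2f}, p = e^(-k_opt/c) = {math.exp(-k_opt / c):.3f}")
```

For k = 5 and k = 6 the formula gives about 0.0217 and 0.0216, i.e. just over 2%, and the simulated rates should land close to those values. The real-valued optimum $k = 8 \ln 2 \approx 5.55$ lies between 5 and 6, which is why both choices perform almost equally well.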