History-Independent Cuckoo Hashing

1
History-Independent Cuckoo Hashing
Moni Naor
Gil Segev
Weizmann Institute, Israel
Udi Wieder
Microsoft Research Silicon Valley
2
Election Day
  • Elections for class president
  • Each student whispers in Mr. Drew's ear
  • Mr. Drew writes down the votes

[Figure: Mr. Drew's notebook, listing the votes in the order they were cast: Carol, Alice, Alice, Bob]
  • Problem: Mr. Drew's notebook leaks sensitive
    information
  • First student voted for Carol
  • Second student voted for Alice

May compromise the privacy of the elections
3
Election Day
  • What about more involved applications?
  • Write-in candidates
  • Votes which are subsets or rankings
  • ...
  • A simple solution
  • Lexicographically sorted list of candidates
  • Unary counters

[Figure: the notebook entries Carol, Alice, Alice, Bob replaced by a sorted list with unary counters: Alice 1 1, Bob 1, Carol 1]
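A minimal sketch of the simple solution, assuming votes arrive one at a time as candidate names. The class name `SortedTally` and its methods are hypothetical and serve only as illustration; the point is that the stored rows depend on the multiset of votes, not on the order in which they were cast.

```python
from bisect import bisect_left


class SortedTally:
    """Vote tally kept as a lexicographically sorted list of (name, count)."""

    def __init__(self):
        self._rows = []  # invariant: sorted by candidate name

    def vote(self, name):
        names = [row[0] for row in self._rows]
        i = bisect_left(names, name)
        if i < len(self._rows) and self._rows[i][0] == name:
            self._rows[i] = (name, self._rows[i][1] + 1)
        else:
            self._rows.insert(i, (name, 1))

    def memory(self):
        # Unary counters, as on the slide: one mark per vote.
        return [(name, "1" * count) for name, count in self._rows]


# Two different voting orders with the same outcome.
a, b = SortedTally(), SortedTally()
for v in ["Carol", "Alice", "Alice", "Bob"]:
    a.vote(v)
for v in ["Alice", "Bob", "Alice", "Carol"]:
    b.vote(v)
print(a.memory() == b.memory())
```

Both tallies end with the same rows, so the notebook no longer reveals who voted first.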
4
Learning From History
  • The two levels of a data structure
  • Legitimate interface
  • Memory representation
  • History independence: the memory representation
    should not reveal information that cannot be
    obtained using the legitimate interface
  • A simple example: sorted list
  • Canonical memory representation
  • Not really efficient...

5
Typical Applications
  • Incremental cryptography [BGG94, Mic97]
  • Voting [MKSW06, MNS07]
  • Set comparison & reconciliation [MNS08]
  • Computational geometry [BGV08]
  • ...

6
Our Contribution
A history-independent (HI) dictionary that simultaneously
achieves the following:
  • Efficiency
  • Lookup time: O(1) worst case
  • Update time: O(1) expected amortized
  • Memory utilization: 50% (25% with deletions)
  • Strongest notion of history independence
  • Simple and fast

7
Notions of History Independence
  • Micciancio (1997): oblivious trees
  • Motivated by incremental cryptography
  • Only considered the shape of the trees and not
    their memory representation
  • Naor and Teague (2001)
  • Memory representation
  • Weak & strong history independence

8
Notions of History Independence
Naor and Teague (2001), following Micciancio
(1997)
  • Weak history independence
  • Memory revealed at the end of an activity period
  • Any two sequences of operations S1 and S2 that
    lead to the same content induce the same
    distribution on the memory representation
  • Strong history independence
  • Memory revealed several times during an activity
    period
  • Any two sets of breakpoints along S1 and S2 with
    the same content at each breakpoint, induce the
    same distributions on the memory representation
    at all these points
  • Completely randomizing memory after each
    operation is not good enough
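The last point can be illustrated with a small experiment. This is a sketch under stated assumptions: `ShuffledSet` and `two_snapshots` are hypothetical names, and the structure simply reshuffles its array after every update. A single snapshot is a uniformly random ordering of the content, so the scheme is weakly history independent, but two snapshots taken at breakpoints with the same content reveal whether any operation happened in between.

```python
import random


class ShuffledSet:
    """Stores a set as an array that is reshuffled after every update."""

    def __init__(self, items, rng):
        self._rng = rng
        self._cells = list(items)
        self._rng.shuffle(self._cells)

    def insert(self, x):
        self._cells.append(x)
        self._rng.shuffle(self._cells)

    def delete(self, x):
        self._cells.remove(x)
        self._rng.shuffle(self._cells)

    def snapshot(self):
        return tuple(self._cells)


def two_snapshots(ops, rng):
    """Reveal memory, run ops, then reveal memory again."""
    s = ShuffledSet(range(8), rng)
    first = s.snapshot()
    for op, x in ops:
        getattr(s, op)(x)
    return first, s.snapshot()


rng = random.Random(0)
idle = [two_snapshots([], rng) for _ in range(20)]
busy = [two_snapshots([("insert", 99), ("delete", 99)], rng) for _ in range(20)]

# With no operations in between, the two snapshots always coincide.
print(all(a == b for a, b in idle))
# After insert-then-delete the content is unchanged, yet the snapshots
# almost always differ, so an observer can tell the histories apart.
print(sum(a != b for a, b in busy), "of", len(busy), "pairs differ")
```

A canonical representation avoids this: both snapshots would be identical in either history.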

9
Notions of History Independence
  • We consider strong history independence
  • Canonical representation (up to initial
    randomness) implies SHI
  • Other direction shown to hold for reversible data
    structures [HHMPR05]
  • Weak & strong are not equivalent
  • WHI for reversible data structures is possible
    without a canonical representation
  • Provable efficiency gaps [BP06] (in restricted
    models)

10
SHI Dictionaries
Scheme                  Update time    Lookup time      Memory utilization  Practical?
Naor & Teague 01        O(1) expected  O(1) worst case  99%                 (mem. util. < 50%)
Blelloch & Golovin 07   O(1) expected  O(1) expected    99%                 (mem. util. < 50%)
Blelloch & Golovin 07   O(1) expected  O(1) worst case  < 9%                ?
This work               O(1) expected  O(1) worst case  < 25% (< 50%)
[The Deletions column and the remaining Practical? entries were graphical marks that are not preserved in this transcript.]
11
Our Approach
  • Cuckoo hashing [PR01]: a simple practical
    scheme with worst-case constant lookup time
  • Force a canonical representation on cuckoo
    hashing
  • No significant loss in efficiency
  • Avoid rehashing by using a small stash
  • What happens when hash functions fail?
  • Rehashing is problematic in SHI data structures
  • All hash functions need to be sampled in advance
    (theoretical problem)
  • When an item is deleted, may need to roll back to
    previous functions
  • We use secondary storage to reduce the failure
    probability exponentially [KMW08]

12
Cuckoo Hashing
  • Tables T1 and T2 with hash functions h1 and h2
  • Store x in one of T1[h1(x)] and T2[h2(x)]
  • Insert(x)
  • Greedily insert in T1 or T2
  • If both are occupied then store x in T1
  • Repeat in other table with the previous occupant
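The insertion procedure above can be sketched as follows. This is a minimal illustration of standard (not history-independent) cuckoo insertion, under assumptions that are not in the slides: the table size `R`, the eviction bound `MAX_KICKS`, and the hash functions `H1` and `H2`, which are modeled here as random lookup tables over a small universe.

```python
import random

R = 11           # cells per table (hypothetical)
MAX_KICKS = 32   # eviction bound before declaring failure (hypothetical)

rng = random.Random(1)
U = range(100)   # small universe, for illustration only
H1 = {x: rng.randrange(R) for x in U}
H2 = {x: rng.randrange(R) for x in U}

T1 = [None] * R
T2 = [None] * R


def lookup(x):
    """Two probes, worst case."""
    return T1[H1[x]] == x or T2[H2[x]] == x


def insert(x):
    """Return None on success, else the item left without a cell."""
    if lookup(x):
        return None
    # Greedy step: use an empty cell if one of the two is free.
    if T1[H1[x]] is None:
        T1[H1[x]] = x
        return None
    if T2[H2[x]] is None:
        T2[H2[x]] = x
        return None
    # Both occupied: store x in T1, then repeat in the other table
    # with the previous occupant.
    in_t1 = True
    for _ in range(MAX_KICKS):
        if in_t1:
            T1[H1[x]], x = x, T1[H1[x]]
        else:
            T2[H2[x]], x = x, T2[H2[x]]
        if x is None:
            return None
        in_t1 = not in_t1
    return x  # failure: a rehash (or a stash) is needed for this item
```

Returning the homeless item instead of silently dropping it keeps the sketch honest about the failure case: the caller has to decide what to do with it.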

[Figure: tables T1 and T2, before and after inserting into a configuration with elements V, W, X, Y, Z]
Successful insertion
13
Cuckoo Hashing
  • Tables T1 and T2 with hash functions h1 and h2
  • Store x in one of T1[h1(x)] and T2[h2(x)]
  • Insert(x)
  • Greedily insert in T1 or T2
  • If both are occupied then store x in T1
  • Repeat in other table with the previous occupant

[Figure: tables T1 and T2 with elements U, V, X, Y, Z]
Failure: rehash required
14
The Cuckoo Graph
  • Set S ⊆ U containing n keys
  • h1, h2 : U → {1,...,r}
  • Bipartite graph with sets of size r; edge (h1(x),
    h2(x)) for every x ∈ S

S is successfully stored ⇔ every connected component
has at most one cycle
Main theorem: if r ≥ (1 + ε)n and h1, h2 are
log(n)-wise independent, then the failure probability
is Θ(1/n)
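The storability condition can be checked directly. A connected component has at most one cycle exactly when it has no more edges than vertices, so a union-find pass over the keys suffices. The following is a sketch; the function name `storable` and the convention of passing `h1` and `h2` as callables are assumptions made for illustration.

```python
def storable(keys, h1, h2):
    """True iff every component of the cuckoo graph has at most one cycle."""
    parent = {}
    edges = {}  # root -> number of edges (keys) in its component
    nodes = {}  # root -> number of vertices (cells) in its component

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for x in keys:
        u, w = ("T1", h1(x)), ("T2", h2(x))
        for v in (u, w):
            if v not in parent:
                parent[v] = v
                edges[v] = 0
                nodes[v] = 1
        ru, rw = find(u), find(w)
        if ru != rw:  # the key joins two components
            parent[rw] = ru
            edges[ru] += edges[rw]
            nodes[ru] += nodes[rw]
        edges[ru] += 1
        if edges[ru] > nodes[ru]:  # more keys than cells: a second cycle
            return False
    return True


# Three keys that all hash to the same pair of cells cannot be stored.
h = {"a": 0, "b": 0, "c": 0}.get
print(storable(["a", "b"], h, h), storable(["a", "b", "c"], h, h))
```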
15
The Canonical Representation
  • Assume that S can be stored using h1 and h2
  • We force a canonical representation on the cuckoo
    graph
  • Suffices to consider a single connected component
  • Assume that S forms a tree in the cuckoo graph
    (the typical case)
  • One location must be empty. The choice of the
    empty location uniquely determines the location
    of all elements

[Figure: a tree component with elements a, b, c, d, e]
Rule: h1(minimal element) is empty
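A sketch of how the rule fixes the layout of a tree component: treat the cell T1[h1(min)] as the root, leave it empty, and store every element at the endpoint of its edge that lies farther from the root. The function name and the cell encoding `("T1", i)` / `("T2", i)` are hypothetical, and the input is assumed to be exactly one tree component.

```python
from collections import defaultdict, deque


def canonical_tree_layout(elements, h1, h2):
    """Place the elements of one tree component; returns {element: cell}."""
    adj = defaultdict(list)  # cell -> [(element, other cell)]
    for x in elements:
        a, b = ("T1", h1(x)), ("T2", h2(x))
        adj[a].append((x, b))
        adj[b].append((x, a))

    root = ("T1", h1(min(elements)))  # the rule: this cell stays empty
    layout, seen, queue = {}, {root}, deque([root])
    while queue:
        cell = queue.popleft()
        for x, other in adj[cell]:
            if other not in seen:  # in a tree, each edge is followed once
                seen.add(other)
                layout[x] = other  # x sits at the endpoint away from the root
                queue.append(other)
    return layout


# A path-shaped component: T2[0] -a- T1[0] -b- T2[1] -c- T1[1] -d- T2[2] -e- T1[2]
h1 = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}.get
h2 = {"a": 0, "b": 1, "c": 1, "d": 2, "e": 2}.get
print(canonical_tree_layout(["c", "e", "a", "d", "b"], h1, h2)
      == canonical_tree_layout(["a", "b", "c", "d", "e"], h1, h2))
```

The result does not depend on the order in which the elements are listed, which is the property the canonical representation needs.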
16
The Canonical Representation
  • Assume that S can be stored using h1 and h2
  • We force a canonical representation on the cuckoo
    graph
  • Suffices to consider a single connected component
  • Assume that S has one cycle
  • Two ways to assign elements in the cycle
  • Each choice uniquely determines the location of
    all elements

[Figure: a component with one cycle, elements a, b, c, d, e]
Rule: the minimal element in the cycle lies in T1
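A sketch for the one-cycle case, assuming the input is exactly one component with as many elements as cells. In such a component every cell is occupied, so a cell with a single remaining element must hold it; peeling these cells leaves the cycle, and the rule then fixes the cycle's orientation. The function name and the cell encoding are hypothetical.

```python
def canonical_cycle_layout(elements, h1, h2):
    """Place a component with exactly one cycle; returns {element: cell}."""
    cells_of = {x: (("T1", h1(x)), ("T2", h2(x))) for x in elements}
    incident = {}  # cell -> set of elements not yet placed
    for x, (a, b) in cells_of.items():
        incident.setdefault(a, set()).add(x)
        incident.setdefault(b, set()).add(x)

    def other(x, cell):
        a, b = cells_of[x]
        return b if cell == a else a

    layout = {}
    # Peel the trees hanging off the cycle.
    leaves = [c for c, xs in incident.items() if len(xs) == 1]
    while leaves:
        cell = leaves.pop()
        if len(incident[cell]) != 1:
            continue
        (x,) = incident[cell]
        layout[x] = cell
        incident[cell].clear()
        nxt = other(x, cell)
        incident[nxt].discard(x)
        if len(incident[nxt]) == 1:
            leaves.append(nxt)

    # What remains is the cycle; its minimal element goes to T1.
    m = min(x for x in elements if x not in layout)
    start = cells_of[m][0]
    layout[m] = start
    cell, prev = cells_of[m][1], m
    while cell != start:  # walk around the cycle, filling each cell
        (x,) = incident[cell] - {prev}
        layout[x] = cell
        cell, prev = other(x, cell), x
    return layout


# Cycle b, c, d, e over T1[0], T2[0], T1[1], T2[1], with a hanging off T2[0].
h1 = {"a": 2, "b": 0, "c": 1, "d": 1, "e": 0}.get
h2 = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}.get
print(canonical_cycle_layout(["a", "b", "c", "d", "e"], h1, h2))
```

Note that the rule refers to the minimal element of the cycle (b in this example), not of the whole component (a).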
17
The Canonical Representation
  • Updates efficiently maintain the canonical
    representation
  • Insertions
  • New leaf: check whether the new element is smaller
    than the current min
  • New cycle
  • Same component
  • Merging two components
  • All cases straightforward
  • Deletions
  • Find the new min, split component, ...
  • Requires connecting all elements in the component
    with a sorted cyclic list
  • Memory utilization drops to 25%
  • All cases straightforward

18
Rehashing
  • What if S cannot be stored using h1 and h2 ?
  • Happens with probability Θ(1/n)
  • Can we simply pick new functions?
  • Rare, but very bad worst-case performance
  • Canonical memory implies we need to sample all
    hash functions in advance (theoretical problem)
  • Whenever an item is deleted, need to check
    whether we must roll back to previous hash
    functions
  • A bad item which is repeatedly inserted and
    deleted would cause a rehash every operation!

19
Using a Stash
  • Whenever an insert fails, put a bad item in a
    secondary data structure
  • Bad item: the smallest item that belongs to a cycle
  • Secondary data structure must be SHI in itself
  • Theorem [KMW08]: Pr[|stash| > s] < n^(-s)
  • In practice, keeping the stash as a sorted list is
    probably the best solution
  • Effectively the query time is constant with
    (very) high probability
  • In theory, the stash could be any SHI structure with
    constant lookup time
  • A deterministic hashing scheme, where the
    elements are rehashed whenever the content
    changes [AN96, HMP01]
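A sketch of the practical option, a stash kept as a sorted list. The class name `SortedStash` and the `lookup` helper are hypothetical. A sorted list is canonical, so the stash is itself strongly history independent, and since the stash is expected to hold very few items, searching it adds little to the two table probes.

```python
from bisect import bisect_left, insort


class SortedStash:
    """A stash stored as a sorted list: its layout depends only on its content."""

    def __init__(self):
        self._items = []

    def __contains__(self, x):
        i = bisect_left(self._items, x)
        return i < len(self._items) and self._items[i] == x

    def add(self, x):
        if x not in self:
            insort(self._items, x)

    def remove(self, x):
        i = bisect_left(self._items, x)
        if i < len(self._items) and self._items[i] == x:
            del self._items[i]

    def memory(self):
        return tuple(self._items)


def lookup(x, T1, T2, h1, h2, stash):
    """Two table probes plus a search in the (small) stash."""
    return T1[h1(x)] == x or T2[h2(x)] == x or x in stash


# Different histories, same content, same memory.
s1, s2 = SortedStash(), SortedStash()
for op, x in [("add", 5), ("add", 3), ("remove", 5), ("add", 7)]:
    getattr(s1, op)(x)
for x in (7, 3):
    s2.add(x)
print(s1.memory() == s2.memory())
```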

20
Conclusions and Problems
  • Cuckoo hashing is a robust and flexible hashing
    scheme
  • Easily molded into a history independent data
    structure
  • We don't know how to do this for cuckoo hashing with
    more than 2 hash functions and/or more than 1 element
    per bucket
  • Better memory utilization, better performance,
    but...
  • Expected size of connected component is not
    constant
  • Full performance analysis