Title: History-Independent Cuckoo Hashing
Udi Wieder
Moni Naor
Gil Segev
Weizmann Institute, Israel
Microsoft Research Silicon Valley
Election Day
- Elections for class president
- Each student whispers in Mr. Drew's ear
- Mr. Drew writes down the votes
[Figure: Mr. Drew's notebook, listing the votes in the order they were cast: Carol, Alice, Alice, Bob]
- Problem: Mr. Drew's notebook leaks sensitive information
- First student voted for Carol
- Second student voted for Alice
- This may compromise the privacy of the elections
Election Day
- What about more involved applications?
- Write-in candidates
- Votes which are subsets or rankings
- ...
- A simple solution
- Lexicographically sorted list of candidates
- Unary counters
[Figure: the votes Carol, Alice, Alice, Bob recorded as a sorted list with unary counters: Alice 11, Bob 1, Carol 1]
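To make the contrast concrete, here is a small Python sketch; the function names `notebook` and `hi_tally` are hypothetical. It models "memory" as the returned Python list and ignores how Python itself stores objects. The notebook keeps votes in arrival order, while the tally keeps a lexicographically sorted list of candidates with unary counters, so any two voting orders with the same outcome produce the same representation.

```python
from collections import Counter

def notebook(votes):
    # Mr. Drew's notebook: the votes in the order they were cast
    return list(votes)

def hi_tally(votes):
    # Canonical memory: candidates in lexicographic order, each with a unary counter
    counts = Counter(votes)
    return [(name, "1" * counts[name]) for name in sorted(counts)]

order_a = ["Carol", "Alice", "Alice", "Bob"]
order_b = ["Bob", "Alice", "Carol", "Alice"]

print(notebook(order_a) == notebook(order_b))  # False: the order of votes leaks
print(hi_tally(order_a) == hi_tally(order_b))  # True: same content, same memory
print(hi_tally(order_a))  # [('Alice', '11'), ('Bob', '1'), ('Carol', '1')]
```

Write-in candidates need no special handling: any new name simply takes its place in the sorted order.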
Learning From History
- The two levels of a data structure
- Legitimate interface
- Memory representation
- History independence: the memory representation should not reveal information that cannot be obtained using the legitimate interface
- A simple example: a sorted list
- Canonical memory representation
- Not really efficient...
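A minimal sketch of the sorted-list example, assuming the Python list `memory` stands in for the memory representation; the class name `SortedListSet` is hypothetical. Whatever the sequence of insertions and deletions, equal content yields an identical list. The price is linear-time insertion and deletion, because elements must be shifted to keep the list sorted.

```python
import bisect

class SortedListSet:
    """A set whose memory is a sorted list: a canonical representation."""

    def __init__(self):
        self.memory = []  # the memory representation an observer might see

    def insert(self, x):
        i = bisect.bisect_left(self.memory, x)
        if i == len(self.memory) or self.memory[i] != x:
            self.memory.insert(i, x)  # O(n) element shifting: why this is not efficient

    def delete(self, x):
        i = bisect.bisect_left(self.memory, x)
        if i < len(self.memory) and self.memory[i] == x:
            del self.memory[i]

    def lookup(self, x):
        i = bisect.bisect_left(self.memory, x)
        return i < len(self.memory) and self.memory[i] == x

a = SortedListSet()
for x in (5, 2, 9):
    a.insert(x)

b = SortedListSet()
for x in (9, 7, 2):
    b.insert(x)
b.delete(7)
b.insert(5)

print(a.memory, b.memory)  # [2, 5, 9] [2, 5, 9]
```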
Typical Applications
- Incremental cryptography [BGG94, Mic97]
- Voting [MKSW06, MNS07]
- Set comparison & reconciliation [MNS08]
- Computational geometry [BGV08]
- ...
Our Contribution
A history-independent (HI) dictionary that simultaneously achieves the following:
- Efficiency
- Lookup time: O(1) worst case
- Update time: O(1) expected amortized
- Memory utilization: 50% (25% with deletions)
- Strongest notion of history independence
Notions of History Independence
- Micciancio (1997): oblivious trees
- Motivated by incremental cryptography
- Only considered the shape of the trees and not their memory representation
- Naor and Teague (2001)
- Memory representation
- Weak & strong history independence
Notions of History Independence
Naor and Teague (2001), following Micciancio (1997)
- Weak history independence
- Memory revealed at the end of an activity period
- Any two sequences of operations S1 and S2 that lead to the same content induce the same distribution on the memory representation
- Strong history independence
- Memory revealed several times during an activity period
- Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint induce the same joint distribution on the memory representations at all these points
- Completely randomizing memory after each operation is not good enough
Notions of History Independence
- We consider strong history independence
- Canonical representation (up to initial randomness) implies SHI
- Other direction shown to hold for reversible data structures [HHMPR05]
- Weak & strong are not equivalent
- WHI for reversible data structures is possible without a canonical representation
- Provable efficiency gaps [BP06] (in restricted models)
SHI Dictionaries
| Scheme | Memory utilization | Update time | Lookup time | Deletions | Practical? |
| Naor & Teague 01 | 99% | O(1) expected | O(1) worst case | (mem. util. < 50%) | |
| Blelloch & Golovin 07 | 99% | O(1) expected | O(1) expected | (mem. util. < 50%) | ? |
| Blelloch & Golovin 07 | < 9% | O(1) expected | O(1) worst case | | |
| This work | < 25% (< 50%) | O(1) expected | O(1) worst case | | |
Our Approach
- Cuckoo hashing [PR01]: a simple, practical scheme with worst-case constant lookup time
- Force a canonical representation on cuckoo hashing
- No significant loss in efficiency
- Avoid rehashing by using a small stash
- What happens when hash functions fail?
- Rehashing is problematic in SHI data structures
- All hash functions need to be sampled in advance (theoretical problem)
- When an item is deleted, we may need to roll back to previous functions
- We use a secondary storage to reduce the failure probability exponentially [KMW08]
Cuckoo Hashing
- Tables T1 and T2 with hash functions h1 and h2
- Store x in one of T1[h1(x)] and T2[h2(x)]
- Insert(x)
- Greedily insert in T1 or T2
- If both are occupied, then store x in T1
- Repeat in the other table with the previous occupant
[Figure: tables T1 and T2 before and after an insertion, holding items V, W, X, Y, Z: successful insertion]
Cuckoo Hashing (cont.)
[Figure: tables T1 and T2 holding items U, V, X, Y, Z: failure, rehash required]
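The insertion rule above can be sketched in Python as follows. The tables are lists with `None` for empty slots, and the hash functions are passed in as plain callables. The function name `cuckoo_insert` and the cutoff `max_loop` are assumptions for illustration; a cutoff on the eviction chain is one common way to declare failure. Note that this plain version is not history independent: where items end up depends on the order in which they were inserted.

```python
def cuckoo_insert(T1, T2, h1, h2, x, max_loop=32):
    """Insert x into tables T1, T2 (lists, None = empty slot).

    Returns None on success. If the eviction chain runs longer than max_loop
    (a hypothetical cutoff), returns the key left without a slot: the failure
    case in which plain cuckoo hashing would rehash.
    """
    tables = ((T1, h1), (T2, h2))
    if T1[h1(x)] == x or T2[h2(x)] == x:
        return None  # already stored
    # Greedy step: use a free slot in T1 or T2 if one exists.
    for table, h in tables:
        if table[h(x)] is None:
            table[h(x)] = x
            return None
    # Both occupied: store x in T1, then move each evicted occupant to its other table.
    side = 0
    for _ in range(max_loop):
        table, h = tables[side]
        i = h(x)
        x, table[i] = table[i], x  # place x and pick up the previous occupant
        if x is None:
            return None
        side = 1 - side
    return x
```

For example, with three keys that all hash to slot 0 in both tables, the third insertion keeps evicting the other two and eventually returns one of the three keys, mirroring the failure case in the figure.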
The Cuckoo Graph
- Set $S \subseteq U$ containing $n$ keys
- $h_1, h_2 : U \to \{1, \ldots, r\}$
- Bipartite graph with sets of size $r$; edge $(h_1(x), h_2(x))$ for every $x \in S$
- $S$ is successfully stored $\iff$ every connected component has at most one cycle
- Main theorem: if $r \geq (1+\varepsilon)n$ and $h_1, h_2$ are $\log(n)$-wise independent, then the failure probability is $\Theta(1/n)$
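The storability criterion above can be checked directly: a connected component with v vertices and e edges contains e - v + 1 independent cycles, so it has at most one cycle exactly when e ≤ v. The sketch below (hypothetical name `storable`, assuming distinct keys) keeps vertex and edge counts per component in a union-find structure; T1 slots are vertices 0..r-1 and T2 slots are vertices r..2r-1.

```python
def storable(keys, h1, h2, r):
    """True iff every connected component of the cuckoo graph has at most one cycle."""
    parent = list(range(2 * r))
    edges = [0] * (2 * r)  # edges per component (meaningful only at roots)
    verts = [1] * (2 * r)  # vertices per component (meaningful only at roots)

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for x in keys:
        a, b = find(h1(x)), find(r + h2(x))  # key x is the edge (T1 slot, T2 slot)
        if a != b:
            parent[b] = a
            verts[a] += verts[b]
            edges[a] += edges[b]
        edges[a] += 1
    # At most one cycle per component <=> edges <= vertices in every component.
    return all(edges[v] <= verts[v] for v in range(2 * r) if find(v) == v)
```

Three keys sharing the same pair of slots give a component with 2 vertices and 3 edges, so `storable` returns False, matching the failure case of the previous slide.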
The Canonical Representation
- Assume that S can be stored using h1 and h2
- We force a canonical representation on the cuckoo graph
- Suffices to consider a single connected component
- Assume that S forms a tree in the cuckoo graph (the typical case)
- A tree with k elements spans k+1 locations, so exactly one location must be empty; the choice of the empty location uniquely determines the location of all elements
[Figure: a tree component of the cuckoo graph with elements a, b, c, d, e]
Rule: T1[h1(minimal element)] is empty
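For the tree case, one way to compute the canonical layout is to root the tree at the slot T1[h1(min)], which the rule leaves empty, and store every element at the endpoint of its edge that is farther from the root. This is a sketch assuming the keys are distinct, mutually comparable, and form a single tree component; `canonical_tree_layout` is a hypothetical name, and the result is returned as slot-to-key dictionaries.

```python
from collections import deque

def canonical_tree_layout(keys, h1, h2):
    """Canonical placement for one tree component: T1[h1(min key)] stays empty."""
    adj = {}
    for x in keys:
        u, v = ("T1", h1(x)), ("T2", h2(x))
        adj.setdefault(u, []).append((v, x))
        adj.setdefault(v, []).append((u, x))
    root = ("T1", h1(min(keys)))
    layout = {"T1": {}, "T2": {}}
    seen, queue = {root}, deque([root])
    while queue:  # breadth-first search from the empty root slot
        u = queue.popleft()
        for v, x in adj[u]:
            if v not in seen:  # edge x leads away from the root
                seen.add(v)
                layout[v[0]][v[1]] = x  # store x in the child slot
                queue.append(v)
    return layout["T1"], layout["T2"]
```

In a tree every non-root slot has a unique parent edge, so the layout does not depend on the order in which keys are listed, which is exactly what history independence requires.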
The Canonical Representation (cont.)
- Assume that S has one cycle
- Two ways to assign elements in the cycle
- Each choice uniquely determines the location of
all elements
[Figure: a component of the cuckoo graph with one cycle, elements a, b, c, d, e]
Rule: the minimal element in the cycle lies in T1
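The one-cycle case reduces to the tree case: once the minimal cycle element m is placed at T1[h1(m)], removing its edge leaves a spanning tree of the component, and rooting that tree at T1[h1(m)] fills every remaining slot. The sketch below finds the cycle by repeatedly pruning degree-1 vertices. It assumes distinct, comparable keys forming one connected component with exactly one cycle; `canonical_cycle_layout` is a hypothetical name.

```python
from collections import deque

def canonical_cycle_layout(keys, h1, h2):
    """Canonical placement for one component with exactly one cycle."""
    adj = {}
    for x in keys:
        u, v = ("T1", h1(x)), ("T2", h2(x))
        adj.setdefault(u, []).append((v, x))
        adj.setdefault(v, []).append((u, x))
    # Prune leaves until only the cycle remains; pruned keys hang off the cycle.
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    removed = set()
    leaves = deque(u for u, d in degree.items() if d == 1)
    while leaves:
        u = leaves.popleft()
        for v, x in adj[u]:
            if x not in removed:
                removed.add(x)
                degree[u] -= 1
                degree[v] -= 1
                if degree[v] == 1:
                    leaves.append(v)
    m = min(x for x in keys if x not in removed)  # minimal element on the cycle
    root = ("T1", h1(m))
    layout = {"T1": {h1(m): m}, "T2": {}}  # rule: m lies in T1
    seen, queue = {root}, deque([root])
    while queue:  # the component minus edge m is a tree rooted at m's slot
        u = queue.popleft()
        for v, x in adj[u]:
            if x != m and v not in seen:
                seen.add(v)
                layout[v[0]][v[1]] = x  # store x in the child slot
                queue.append(v)
    return layout["T1"], layout["T2"]
```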
The Canonical Representation
- Updates efficiently maintain the canonical representation
- Insertions
- New leaf: check whether the new element is smaller than the current min
- New cycle
- Same component
- Merging two components
- All cases straightforward
- Deletions
- Find the new min, split the component, ...
- Requires connecting all elements in the component with a sorted cyclic list
- Memory utilization drops to 25%
- All cases straightforward
Rehashing
- What if S cannot be stored using h1 and h2?
- Happens with probability $\Theta(1/n)$
- Can we simply pick new functions?
- Rare, but very bad worst-case performance
- Canonical memory implies we need to sample all hash functions in advance (theoretical problem)
- Whenever an item is deleted, we need to check whether we must roll back to previous hash functions
- A bad item which is repeatedly inserted and deleted would cause a rehash on every operation!
Using a Stash
- Whenever an insert fails, put a bad item in a secondary data structure
- Bad item: the smallest item that belongs to a cycle
- The secondary data structure must itself be SHI
- Theorem [KMW08]: $\Pr[|\text{stash}| > s] < n^{-s}$
- In practice, keeping the stash as a sorted list is probably the best solution
- Effectively, the query time is constant with (very) high probability
- In theory, the stash could be any SHI dictionary with constant lookup time
- E.g., a deterministic hashing scheme where the elements are rehashed whenever the content changes [AN96, HMP01]
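A sketch of the lookup path with the stash kept as a sorted list, as suggested above. A sorted list is canonical for its content, so the stash is itself SHI; a lookup probes T1 and T2 and binary-searches the stash only if neither probe matches. The class and method names are hypothetical, and the logic that decides which bad item is moved to the stash is omitted.

```python
import bisect

class CuckooWithStash:
    """Lookup side of cuckoo hashing with a sorted-list stash (hypothetical sketch)."""

    def __init__(self, r, h1, h2):
        self.T1, self.T2 = [None] * r, [None] * r
        self.h1, self.h2 = h1, h2
        self.stash = []  # sorted list of bad items: canonical for its content

    def add_to_stash(self, x):
        i = bisect.bisect_left(self.stash, x)
        if i == len(self.stash) or self.stash[i] != x:
            self.stash.insert(i, x)

    def lookup(self, x):
        if self.T1[self.h1(x)] == x or self.T2[self.h2(x)] == x:
            return True  # two probes: the usual worst-case constant lookup
        i = bisect.bisect_left(self.stash, x)  # the stash is expected to be tiny
        return i < len(self.stash) and self.stash[i] == x
```

Since the stash rarely holds more than a few items, the binary search adds only a small cost, consistent with the point that query time is effectively constant with high probability.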
Conclusions and Problems
- Cuckoo hashing is a robust and flexible hashing scheme
- Easily molded into a history-independent data structure
- We don't know how to do this for cuckoo hashing (CH) with more than 2 hash functions and/or more than 1 element per bucket
- Better memory utilization, better performance, but...
- Expected size of a connected component is not constant
- Full performance analysis