History-Independent Cuckoo Hashing - PowerPoint PPT Presentation

About This Presentation



Slides: 20

Provided by: MAST180


Transcript and Presenter's Notes


1
History-Independent Cuckoo Hashing
Udi Wieder
Moni Naor
Gil Segev
Weizmann Institute, Israel
Microsoft Research Silicon Valley
2
Election Day
(Figure: Mr. Drew's notebook, listing the votes in the order they were cast: Carol, Alice, Alice, Bob)

Elections for class president
Each student whispers in Mr. Drew's ear
Mr. Drew writes down the votes

Problem: Mr. Drew's notebook leaks sensitive
information
First student voted for Carol
Second student voted for Alice
May compromise the privacy of the elections
3
Election Day

What about more involved applications?
Write-in candidates
Votes which are subsets or rankings
...

A simple solution
Lexicographically sorted list of candidates
Unary counters
(Figure: the votes Carol, Alice, Alice, Bob recorded as Alice 11, Bob 1, Carol 1)
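The "simple solution" can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the notebook is modeled as a list of [candidate, counter] pairs, and the helper names `record_vote` and `notebook_for` are hypothetical. Because the list is kept in lexicographic order and each counter is a unary string, two classes that cast the same votes in different orders end up with identical notebooks.

```python
import bisect


def record_vote(notebook, candidate):
    """Add one vote. The notebook is a list of [candidate, unary_counter]
    entries kept in lexicographic order; write-in names are inserted in place."""
    names = [entry[0] for entry in notebook]
    i = bisect.bisect_left(names, candidate)
    if i < len(notebook) and notebook[i][0] == candidate:
        notebook[i][1] += "1"  # one tally mark per vote
    else:
        notebook.insert(i, [candidate, "1"])  # new candidate at its sorted position


def notebook_for(votes):
    notebook = []
    for candidate in votes:
        record_vote(notebook, candidate)
    return notebook


if __name__ == "__main__":
    print(notebook_for(["Carol", "Alice", "Alice", "Bob"]))
    # [['Alice', '11'], ['Bob', '1'], ['Carol', '1']]
    # A different voting order yields exactly the same notebook:
    assert notebook_for(["Bob", "Alice", "Carol", "Alice"]) == notebook_for(
        ["Carol", "Alice", "Alice", "Bob"]
    )
```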
4
Learning From History

The two levels of a data structure
Legitimate interface
Memory representation

History independence: the memory representation
should not reveal information that cannot be
obtained using the legitimate interface

A simple example: a sorted list
Canonical memory representation
Not really efficient...
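A toy comparison makes the two levels concrete (the class names are hypothetical, chosen for illustration). Both sets answer `contains` identically, so the legitimate interface cannot distinguish two insertion histories with the same final content; the append-only list, however, exposes the insertion order in its memory, while the sorted list has exactly one layout per content. The sorted list pays for this by shifting up to n elements on every insertion.

```python
import bisect


class AppendOnlySet:
    """Memory = elements in insertion order, which leaks the history."""

    def __init__(self):
        self.memory = []

    def insert(self, x):
        if x not in self.memory:
            self.memory.append(x)

    def contains(self, x):
        return x in self.memory


class SortedListSet:
    """Memory = elements in sorted order: canonical, a function of the content only."""

    def __init__(self):
        self.memory = []

    def insert(self, x):
        i = bisect.bisect_left(self.memory, x)
        if i == len(self.memory) or self.memory[i] != x:
            self.memory.insert(i, x)  # shifts up to n elements: O(n) time

    def contains(self, x):
        i = bisect.bisect_left(self.memory, x)
        return i < len(self.memory) and self.memory[i] == x


def build(cls, keys):
    s = cls()
    for k in keys:
        s.insert(k)
    return s


if __name__ == "__main__":
    history_a, history_b = [3, 1, 2], [2, 3, 1]  # same final content {1, 2, 3}
    for cls in (AppendOnlySet, SortedListSet):
        a, b = build(cls, history_a), build(cls, history_b)
        # The legitimate interface cannot tell the two histories apart...
        assert all(a.contains(k) == b.contains(k) for k in range(5))
        # ...but the memory of the append-only set can.
        print(cls.__name__, a.memory, b.memory)
    # AppendOnlySet [3, 1, 2] [2, 3, 1]
    # SortedListSet [1, 2, 3] [1, 2, 3]
```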

5
Typical Applications

Incremental cryptography BGG94, Mic97
Voting MKSW06, MNS07
Set comparison and reconciliation MNS08
Computational geometry BGV08
...

6
Our Contribution
An HI dictionary that simultaneously achieves the
following

Efficiency
Lookup time: O(1) worst case
Update time: O(1) expected amortized
Memory utilization: 50% (25% with deletions)

Strongest notion of history independence

Simple and fast

7
Notions of History Independence

Micciancio (1997): oblivious trees
Motivated by incremental cryptography
Only considered the shape of the trees and not
their memory representation

Naor and Teague (2001)
Memory representation
Weak and strong history independence

8
Notions of History Independence
Naor and Teague (2001), following Micciancio
(1997)

Weak history independence
Memory revealed at the end of an activity period
Any two sequences of operations S1 and S2 that
lead to the same content induce the same
distribution on the memory representation

Strong history independence
Memory revealed several times during an activity
period
Any two sets of breakpoints along S1 and S2 with
the same content at each breakpoint, induce the
same distributions on the memory representation
at all these points
Completely randomizing memory after each
operation is not good enough
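The last bullet can be checked with a small simulation. This is a toy model with hypothetical class names, not anything from the talk. `ReshuffledSet` draws a fresh uniformly random layout after every update, so any single snapshot is a uniform arrangement of the current content and reveals nothing else. An observer who looks twice, however, can tell whether updates happened in between: with no update the two snapshots always match, while an `insert(4); delete(4)` pair in between (which restores the same content) makes them match only about 1/6 of the time for three elements. A canonical layout such as a sorted list gives identical snapshots in both cases.

```python
import bisect
import random


class ReshuffledSet:
    """Toy set that draws a fresh uniformly random memory layout after every update."""

    def __init__(self, rng):
        self.rng = rng
        self.memory = []

    def insert(self, x):
        if x not in self.memory:
            self.memory.append(x)
        self.rng.shuffle(self.memory)

    def delete(self, x):
        if x in self.memory:
            self.memory.remove(x)
        self.rng.shuffle(self.memory)


class SortedSet:
    """Canonical layout: the memory is a function of the content only."""

    def __init__(self):
        self.memory = []

    def insert(self, x):
        if x not in self.memory:
            bisect.insort(self.memory, x)

    def delete(self, x):
        if x in self.memory:
            self.memory.remove(x)


def snapshots_match(s, extra_updates):
    """Snapshot memory twice while the content is {1, 2, 3}; if extra_updates,
    run insert(4) and delete(4) in between. True iff the snapshots are equal."""
    for x in (1, 2, 3):
        s.insert(x)
    first = tuple(s.memory)
    if extra_updates:
        s.insert(4)
        s.delete(4)  # the content is {1, 2, 3} again
    return first == tuple(s.memory)


def match_rate(make_set, extra_updates, trials=6000):
    """Fraction of independent trials in which the two snapshots are equal."""
    return sum(snapshots_match(make_set(), extra_updates) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    print(match_rate(lambda: ReshuffledSet(rng), extra_updates=False))  # 1.0
    print(match_rate(lambda: ReshuffledSet(rng), extra_updates=True))   # about 1/6
    print(match_rate(SortedSet, extra_updates=True))                    # 1.0
```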

9
Notions of History Independence

We consider strong history independence
Canonical representation (up to initial
randomness) implies SHI
Other direction shown to hold for reversible data
structures HHMPR05

Weak and strong are not equivalent
WHI for reversible data structures is possible
without a canonical representation
Provable efficiency gaps BP06 (in restricted
models)

10
SHI Dictionaries
Columns: Memory utilization | Update time | Lookup time | Deletions | Practical?
Naor-Teague 01: O(1) expected; O(1) worst case; 99; (mem. util. < 50%)
Blelloch-Golovin 07: O(1) expected; O(1) expected; 99; (mem. util. < 50%); ?
Blelloch-Golovin 07: O(1) expected; O(1) worst case; < 9
This work: < 25% (< 50%); O(1) expected; O(1) worst case
11
Our Approach

Cuckoo hashing PR01: a simple, practical
scheme with worst-case constant lookup time

Force a canonical representation on cuckoo
hashing
No significant loss in efficiency

Avoid rehashing by using a small stash
What happens when hash functions fail?
Rehashing is problematic in SHI data structures
All hash functions need to be sampled in advance
(theoretical problem)
When an item is deleted, we may need to roll back to
previous functions
We use secondary storage (the stash) to reduce the failure
probability exponentially KMW08

12
Cuckoo Hashing

Tables T1 and T2 with hash functions h1 and h2
Store x in one of T1[h1(x)] and T2[h2(x)]

Insert(x)
Greedily insert in T1 or T2
If both are occupied, store x in T1
Repeat in the other table with the previous occupant

(Figure: tables T1 and T2 holding items V, W, X, Y, Z, shown before and after a successful insertion)
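The insertion procedure above can be sketched as follows. This is plain cuckoo hashing, not yet history independent. Assumptions: h1 and h2 are simulated by salting Python's built-in `hash` with random 64-bit values (the analysis in PR01 relies on stronger hash families), and the eviction chain is capped at a hypothetical `max_loop`, after which the insertion is reported as failed. On failure the item still in hand (not necessarily x) is left unstored, which is exactly the situation that calls for a rehash or a stash.

```python
import random


class CuckooHashTable:
    """Plain cuckoo hashing with tables T1, T2 (not yet history independent)."""

    def __init__(self, r, seed=0, max_loop=100):
        rng = random.Random(seed)
        self.r = r
        # Assumption: h1, h2 are simulated by salting Python's built-in hash.
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))
        self.tables = ([None] * r, [None] * r)  # T1, T2
        self.max_loop = max_loop  # hypothetical cap on the eviction chain

    def _h(self, i, x):
        return hash((self.salts[i], x)) % self.r

    def lookup(self, x):
        # Worst-case constant time: exactly two cells are probed.
        return any(self.tables[i][self._h(i, x)] == x for i in (0, 1))

    def insert(self, x):
        if self.lookup(x):
            return True
        # Greedily use T1 or T2 if either cell is free.
        for i in (0, 1):
            pos = self._h(i, x)
            if self.tables[i][pos] is None:
                self.tables[i][pos] = x
                return True
        # Both occupied: store x in T1 and move the previous occupant to its
        # cell in the other table, repeating until some item lands in a free cell.
        i = 0
        for _ in range(self.max_loop):
            pos = self._h(i, x)
            evicted = self.tables[i][pos]
            self.tables[i][pos] = x
            if evicted is None:
                return True
            x, i = evicted, 1 - i
        return False  # failure: the item in hand (x) is unstored; rehash or stash
```

With a table much larger than the key set, as in `CuckooHashTable(r=1000, seed=1)` holding 20 integer keys, every insertion is expected to succeed and each lookup touches only two cells.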
13
Cuckoo Hashing

(Figure: tables T1 and T2 with items U, V, X, Y, Z. Failure: rehash required)
14
The Cuckoo Graph

Set $S \subseteq U$ containing $n$ keys
$h_1, h_2 : U \to \{1,\dots,r\}$

$S$ is successfully stored if and only if
every connected component has at most one cycle
Main theorem: if $r \ge (1+\epsilon)n$ and $h_1, h_2$ are
$\log(n)$-wise independent, then the failure probability
is $\Theta(1/n)$
Bipartite graph with sets of size $r$:
an edge $(h_1(x), h_2(x))$ for every $x \in S$
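The storability criterion can be tested directly. A connected component with v locations and e keys has at most one cycle exactly when e ≤ v, so a union-find pass that counts both per component suffices. In this sketch `storable` is a hypothetical helper, h1 and h2 are arbitrary Python callables, keys are assumed distinct, and locations are tagged ("T1", i) or ("T2", j) so the two sides of the bipartite graph stay separate.

```python
def storable(keys, h1, h2):
    """True iff every connected component of the cuckoo graph has at most one
    cycle, i.e. no component has more keys (edges) than locations (vertices)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    keys = set(keys)
    for x in keys:  # edge between location T1[h1(x)] and location T2[h2(x)]
        a, b = find(("T1", h1(x))), find(("T2", h2(x)))
        if a != b:
            parent[a] = b

    vertices, edges = {}, {}  # per component root
    for v in list(parent):
        root = find(v)
        vertices[root] = vertices.get(root, 0) + 1
    for x in keys:
        root = find(("T1", h1(x)))
        edges[root] = edges.get(root, 0) + 1
    return all(edges[root] <= vertices[root] for root in edges)


if __name__ == "__main__":
    h1 = {"a": 0, "b": 0, "c": 0}.get  # toy hash functions (hypothetical)
    print(storable(["a", "b"], h1, {"a": 0, "b": 0}.get))               # True: one cycle
    print(storable(["a", "b", "c"], h1, {"a": 0, "b": 0, "c": 0}.get))  # False: two cycles
    print(storable(["a", "b", "c"], h1, {"a": 0, "b": 1, "c": 2}.get))  # True: a tree
```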
15
The Canonical Representation

Assume that S can be stored using h1 and h2
We force a canonical representation on the cuckoo
graph
Suffices to consider a single connected component

Assume that S forms a tree in the cuckoo graph
(the typical case)
A tree on k elements (edges) spans k+1 locations, so one location must be empty. The choice of the
empty location uniquely determines the location
of all elements

(Figure: a tree-shaped component with elements a, b, c, d, e)
Rule: T1[h1(minimal element)] is empty
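The tree rule translates into a breadth-first search. This is a sketch under stated assumptions, not the authors' implementation: the helper name `canonical_tree_layout` is hypothetical, keys are distinct and really form one tree, and locations are tagged ("T1", i) or ("T2", j). Rooting the tree at the location that must stay empty, T1[h1(min key)], every key is stored at its endpoint farther from the root. Since the tree fixes that parent/child relation, the result does not depend on the order in which the keys are listed.

```python
from collections import defaultdict, deque


def canonical_tree_layout(keys, h1, h2):
    """Canonical placement for one tree-shaped component of the cuckoo graph.
    The location T1[h1(min key)] stays empty; rooting the tree there, each key
    is stored at its endpoint farther from the root."""
    adjacency = defaultdict(list)  # location -> [(key, other endpoint)]
    for x in keys:
        u, v = ("T1", h1(x)), ("T2", h2(x))
        adjacency[u].append((x, v))
        adjacency[v].append((x, u))

    root = ("T1", h1(min(keys)))  # the empty location
    layout = {}  # location -> key stored there
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first search away from the empty root
        u = queue.popleft()
        for x, v in adjacency[u]:
            if v not in seen:  # x is the tree edge from parent u to child v
                seen.add(v)
                layout[v] = x  # x occupies the child endpoint
                queue.append(v)
    return layout


if __name__ == "__main__":
    h1 = {"a": 0, "b": 1}.get  # toy hash functions (hypothetical)
    h2 = {"a": 5, "b": 5}.get
    print(canonical_tree_layout(["b", "a"], h1, h2))
    # {('T2', 5): 'a', ('T1', 1): 'b'}   T1[h1('a')] = T1[0] stays empty
```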
16
The Canonical Representation

Assume that S has one cycle
Two ways to assign elements in the cycle
Each choice uniquely determines the location of
all elements

(Figure: a component with one cycle, elements a, b, c, d, e)
Rule: the minimal element in the cycle lies in T1
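The same idea extends to a component with one cycle, sketched below as hypothetical code rather than the paper's procedure. It peels leaves to find the cycle, places the cycle's minimal key in T1 as the rule requires, lets the remaining cycle keys follow around the cycle (each one takes the location the walk is leaving), and orients the attached trees away from the cycle. It assumes the component is connected, has exactly one cycle, and has distinct keys.

```python
from collections import defaultdict, deque


def canonical_unicyclic_layout(keys, h1, h2):
    """Canonical placement for a cuckoo-graph component with exactly one cycle.
    Rule: the minimal key on the cycle lies in T1."""
    adjacency = defaultdict(list)  # location -> [(key, other endpoint)]
    for x in keys:
        u, v = ("T1", h1(x)), ("T2", h2(x))
        adjacency[u].append((x, v))
        adjacency[v].append((x, u))

    # 1. Repeatedly peel off leaves; the keys never peeled form the cycle.
    degree = {u: len(edges) for u, edges in adjacency.items()}
    peeled = set()
    leaves = deque(u for u, d in degree.items() if d == 1)
    while leaves:
        u = leaves.popleft()
        for x, v in adjacency[u]:
            if x not in peeled:  # the leaf's only remaining edge
                peeled.add(x)
                degree[u] -= 1
                degree[v] -= 1
                if degree[v] == 1:
                    leaves.append(v)
    cycle = {x for x in keys if x not in peeled}

    # 2. Put the minimal cycle key m in T1, then walk around the cycle from
    #    m's T2 endpoint: each cycle key occupies the location the walk leaves.
    m = min(cycle)
    start = ("T1", h1(m))
    layout = {start: m}  # location -> key stored there
    prev, u = m, ("T2", h2(m))
    while u != start:
        x, v = next((x, v) for x, v in adjacency[u] if x in cycle and x != prev)
        layout[u] = x
        prev, u = x, v

    # 3. Trees hanging off the cycle point away from it, as in the tree case.
    seen = {u for u, edges in adjacency.items() if any(x in cycle for x, _ in edges)}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for x, v in adjacency[u]:
            if x not in cycle and v not in seen:
                seen.add(v)
                layout[v] = x
                queue.append(v)
    return layout


if __name__ == "__main__":
    # Keys b and c form a 2-cycle between T1[0] and T2[0]; a hangs off T1[0].
    h1 = {"a": 0, "b": 0, "c": 0}.get  # toy hash functions (hypothetical)
    h2 = {"a": 7, "b": 0, "c": 0}.get
    print(canonical_unicyclic_layout(["a", "b", "c"], h1, h2))
```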
17
The Canonical Representation

Updates efficiently maintain the canonical
representation
Insertions
New leaf: check whether the new element is smaller than the
current min
New cycle
Same component
Merging two components
All cases straightforward

Deletions
Find the new min, split the component, ...
Requires connecting all elements in the component
with a sorted cyclic list
Memory utilization drops to 25%
All cases straightforward
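The slide gives no details of the sorted cyclic list, so the following is only one possible reading, with illustrative helper names (`link`, `walk`, `delete_and_split`). Each element of a component points to the next larger element and the largest points back to the smallest, so the pointers depend only on the component's content. After a deletion that splits the component, one walk around the list from the old minimum visits the survivors in increasing order; appending each survivor to its new part therefore yields sorted parts whose first elements are the new minima. Which part an element belongs to would come from the cuckoo graph; here it is a hypothetical `side` callback.

```python
def link(order):
    """Next-pointers of a cyclic list over `order`, which must already be sorted."""
    return {x: order[(i + 1) % len(order)] for i, x in enumerate(order)}


def sorted_cyclic_list(elements):
    """The pointers depend only on the set of elements, not on insertion order."""
    return link(sorted(elements))


def walk(next_ptr, start):
    """Yield every element of the cyclic list once, beginning at `start`."""
    x = start
    while True:
        yield x
        x = next_ptr[x]
        if x == start:
            return


def delete_and_split(next_ptr, component_min, deleted, side):
    """Remove `deleted` and split the rest by side(x) in {0, 1} (a hypothetical
    stand-in for the cuckoo-graph traversal). Returns [(new_min, next_ptr), ...]."""
    parts = ([], [])
    for x in walk(next_ptr, component_min):  # increasing order from the old min
        if x != deleted:
            parts[side(x)].append(x)
    return [(part[0], link(part)) for part in parts if part]


if __name__ == "__main__":
    nxt = sorted_cyclic_list([5, 1, 4, 2, 3])
    print(delete_and_split(nxt, 1, 3, lambda x: 0 if x < 3 else 1))
    # [(1, {1: 2, 2: 1}), (4, {4: 5, 5: 4})]
```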

18
Rehashing

What if S cannot be stored using h1 and h2?
Happens with probability $\Theta(1/n)$

Can we simply pick new functions?
Rare, but very bad worst-case performance
Canonical memory implies we need to sample all
hash functions in advance (theoretical problem)
Whenever an item is deleted, we need to check
whether we must roll back to previous hash
functions
A bad item which is repeatedly inserted and
deleted would cause a rehash on every operation!

19
Using a Stash

Whenever an insert fails, put a bad item in a
secondary data structure
Bad item: the smallest item that belongs to a cycle
The secondary data structure must be SHI in itself

Theorem KMW08: $\Pr[|\mathrm{stash}| > s] < n^{-s}$

In practice, keeping the stash as a sorted list is
probably the best solution
Effectively, the query time is constant with
(very) high probability
In theory, the stash could be any SHI data structure with
constant lookup time, e.g.,
a deterministic hashing scheme, where the
elements are rehashed whenever the content
changes AN96, HMP01
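A lookup that consults the stash after the two table probes could look like this sketch. It follows the practical suggestion of keeping the stash as a sorted list, which is itself a canonical representation; the function names and the toy hash functions in the usage example are hypothetical.

```python
import bisect


def stash_insert(stash, x):
    """Insert into a stash kept as a sorted list, so its layout depends only on its content."""
    i = bisect.bisect_left(stash, x)
    if i == len(stash) or stash[i] != x:
        stash.insert(i, x)


def lookup(x, T1, T2, h1, h2, stash):
    """Probe T1[h1(x)] and T2[h2(x)], then binary-search the (usually tiny) stash."""
    if T1[h1(x)] == x or T2[h2(x)] == x:
        return True
    i = bisect.bisect_left(stash, x)
    return i < len(stash) and stash[i] == x


if __name__ == "__main__":
    def h1(x):  # toy hash functions (hypothetical)
        return x % 4

    def h2(x):
        return (x // 4) % 4

    T1, T2, stash = [None] * 4, [None] * 4, []
    T1[h1(5)] = 5
    for item in (9, 2):  # pretend these two items could not be placed
        stash_insert(stash, item)
    print(stash)                                                       # [2, 9]
    print([lookup(k, T1, T2, h1, h2, stash) for k in (5, 9, 2, 7)])  # [True, True, True, False]
```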

20
Conclusions and Problems

Cuckoo hashing is a robust and flexible hashing
scheme
Easily molded into a history independent data
structure
We don't know how to do this for CH with more
than 2 hash functions and/or more than 1 element
per bucket
Better memory utilization, better performance,
but...
Expected size of connected component is not
constant
Full performance analysis