Title: Anti-Persistence or History Independent Data Structures
1. Anti-Persistence or History Independent Data Structures
- Moni Naor (Weizmann Institute)
- Vanessa Teague (Stanford)
2. Why hide your history?
- Core dumps
- Losing your laptop
  - The entire memory representation of data structures is exposed
- Emailing files
  - The editing history may be exposed (e.g. Word)
- Maintaining lists of people
  - Sports teams, party invitees
3. Making sure that nobody learns from history
- A data structure has
  - A legitimate interface: the set of operations allowed to be performed on it
  - A memory representation
- The memory representation should reveal no information that cannot be obtained from the legitimate interface
4. History of history independence
- The issue has been dealt with in both the Cryptographic and Data Structures communities
- Micciancio (1997): history independent trees
  - Motivation: incremental cryptography
  - Based on the shape of the data structure, not including memory representation
  - Stronger performance model!
- Uniquely represented data structures
  - Treaps (Seidel, Aragon), uniquely represented dictionaries
  - Ordered hash tables (Amble, Knuth 1974)
5. More history
- Persistent data structures: it is possible to reconstruct all previous states of the data structure (Sarnak and Tarjan)
- We want the opposite: anti-persistence
- Oblivious RAM (Goldreich and Ostrovsky)
6. Overview
- Definitions
- History independent open addressing hashing
- History independent dynamic perfect hashing
- Memory Management
- (Union Find)
- Open problems
7. Precise definitions
- A data structure is
  - history independent if any two sequences of operations S1 and S2 that yield the same content induce the same probability distribution on the memory representation
  - strongly history independent if, given any two sets of breakpoints along S1 and S2 such that corresponding points have identical contents, S1 and S2 induce the same probability distributions on the memory representation at those points
8. Relaxations
- Statistical closeness
- Computational indistinguishability
  - Example where this helps: erasing
- Allow some information to be leaked
  - The total number of operations
  - n-history independent: identical distributions if the last n operations were identical as well
- Under-defined data structures: the same query can yield several legitimate answers
  - e.g. an approximate priority queue
  - Define identical content as: there is no suffix T such that the set of permitted results returned by S1 ∘ T differs from the one returned by S2 ∘ T
9. History independence is easy (sort of)
- If it is possible to determine the (lexicographically) first sequence of operations that produces a given contents, just store the result of that sequence
- This gives a history independent version of a huge class of data structures
- Efficiency is the problem
10. Dictionaries
- The operations are insert(x), lookup(x), and possibly delete(x)
- The content of a dictionary is the set of elements currently inserted (those that have been inserted but not deleted)
- Elements x ∈ U, some universe
- Size of the table/memory: N
11. Goal
- Find a history independent implementation of dictionaries with good provable performance
- Develop general techniques for history independence
12. Approaches
- Unique representation
  - e.g. an array in sorted order
  - Yields strong history independence
- Secret randomness
  - e.g. an array in random order
  - Yields only (plain) history independence
- Both are illustrated in the toy sketch below
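A toy illustration of the two approaches, assuming the content is a set of comparable elements (function names are mine, not from the slides):

```python
import random

def unique_representation(contents):
    """Sorted array: one canonical layout per set (strong HI)."""
    return sorted(contents)

def secret_randomness(contents):
    """Uniformly random permutation: the layout's distribution depends
    only on the set, not on the insertion order (plain HI)."""
    arr = list(contents)
    random.shuffle(arr)
    return arr
```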
13. Open addressing: the traditional version
- Each element x has a probe sequence h1(x), h2(x), h3(x), ...
  - Linear probing: h2(x) = h1(x) + 1, h3(x) = h1(x) + 2, ...
  - Double hashing
  - Uniform hashing
- An element is inserted into the first free space in its probe sequence (sketched below)
- A search ends unsuccessfully at a free space
- Efficient space utilization
  - Almost all of the table can be full
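A minimal sketch of the traditional (first-come-first-served) insertion, shown for contrast with the history independent version that follows; probe(x, i) is an assumed helper giving the i-th cell of x's probe sequence, and a free cell is assumed to exist:

```python
def insert_fcfs(table, x, probe):
    """Traditional open addressing: x settles in the first free cell of
    its probe sequence. First-come-first-served, hence history dependent."""
    i = 1
    while table[probe(x, i)] is not None:
        i += 1                            # skip occupied cells
    table[probe(x, i)] = x                # settle in the first free one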
14. Open addressing: the traditional version (continued)
- Not history independent, because later-inserted elements move further along in their probe sequence
[Figure: y probes a cell already holding x; x arrived before y, so y moves on. At the next cell there is no clash, so y is inserted there.]
15. History independent version
- At each cell i, decide elements' priorities independently of insertion order
- Call the priority function pi(x,y)
- If there is a clash, move the element of lower priority
- At each cell, the priorities must form a total order
- A sketch of the resulting insertion procedure follows
16. Insertion
[Figure: x probes a cell holding y; p2(x,y) does not hold, so x moves on and settles in the next cell of its probe sequence.]
17. Search
- Same as in the traditional algorithm
- In an unsuccessful search, we can quit as soon as we find a lower-priority element (sketched below)
- No deletions
  - Problematic in open addressing anyway
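A sketch of search under the same conventions as the insertion sketch above (probe and priority are the assumed helpers):

```python
def lookup(table, x, probe, priority):
    """Return True iff x is in the table. An unsuccessful search stops
    at a free cell, or early at an occupant that x would outrank."""
    step = 1
    while True:
        cell = probe(x, step)
        occupant = table[cell]
        if occupant is None:
            return False                  # free cell: x is absent
        y, y_step = occupant
        if y == x:
            return True
        if priority(cell, x, step, y, y_step):
            return False                  # x would have evicted y: absent
        step += 1
```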
18. Strong history independence
- Claim: for all hash functions and priority functions, the final configuration of the table is independent of the order of insertion
- Conclusion: the scheme is strongly history independent
19. Proof of history independence
- A static insertion algorithm (clearly history independent): gather up the rejects and restart, as in the sketch below
[Figure: rounds of the static algorithm on elements x1, ..., x6, with step annotations such as "p1(x2,x1), so insert x2", "p3(x4,x5) and p3(x4,x6): insert x4 and remove x5", and "p1(x6,x4) and p6(x3,x6), so insert x3".]
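A sketch of the static algorithm from the figure: in each round, every unsettled element tries its next probe position, and the rejects of all clashes are gathered up for the next round. Helper names follow the earlier sketches; this is illustrative, not the authors' code, and it assumes the table has room for all elements:

```python
def static_insert(elements, table_size, probe, priority):
    table = [None] * table_size
    pending = [(x, 1) for x in elements]  # (element, probe step)
    while pending:
        rejects = []
        for x, step in pending:
            cell = probe(x, step)
            occupant = table[cell]
            if occupant is None:
                table[cell] = (x, step)              # settle here
            else:
                y, y_step = occupant
                if priority(cell, x, step, y, y_step):
                    table[cell] = (x, step)          # x wins: y rejected
                    rejects.append((y, y_step + 1))
                else:
                    rejects.append((x, step + 1))    # y stays: x rejected
        pending = rejects                 # gather up the rejects, restart
    return table
```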
20. Proof of history independence
- Nothing moves further in the static algorithm than in the dynamic one
  - By induction on the rounds of the static algorithm
- Vice versa
  - By induction on the steps of the dynamic algorithm
- Hence: strongly history independent
21. Some priority functions
- Global
  - A single priority order, independent of the cell
- Random
  - Choose a random order at each cell
- Youth-rules
  - Call an element younger if it has moved less far along its probe sequence; younger elements get higher priority
22. Youth-rules
[Figure: x clashes with y; p2(x,y) holds because x has taken fewer steps than y, so y is moved. Use a tie-breaker if the step counts are equal. This is a priority function.]
A sketch of this priority rule follows.
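A sketch of the Youth-rules priority under the conventions of the earlier sketches; the tie-breaker here (a plain key comparison) is an assumption of this sketch, since any fixed, history-free total order works:

```python
def youth_rules(cell, a, a_step, b, b_step):
    """True iff a has priority over b at this cell."""
    if a_step != b_step:
        return a_step < b_step   # younger (fewer probe steps) wins
    return a < b                 # tie-breaker: a fixed order on keys
```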
23. Specifying a scheme
- Priority rule
  - Choice of priority functions
  - In Youth-rules: determined by the probe sequence
- Probe functions
  - How they are chosen, maintained, and computed
24. Implementing Youth-rules
- Let each hi be chosen from a pairwise independent collection
  - For any two x and y, the random variables hi(x) and hi(y) are uniform and independent
- Let h1, h2, h3, ... be chosen independently
- Example: hi(x) = ((ai·x + bi) mod U) mod N (sketched below)
- Space: 2 elements per function
- Only log N functions are needed
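A sketch of the pairwise independent probe functions described above; U is assumed to be a prime at least the universe size:

```python
import random

def make_probe(U, N, count):
    """Return probe(x, i) with h_i(x) = ((a_i*x + b_i) mod U) mod N,
    drawing one random pair (a_i, b_i) per probe position: 2 stored
    values per function, and about log N functions suffice."""
    params = [(random.randrange(1, U), random.randrange(U))
              for _ in range(count)]
    def probe(x, i):
        a, b = params[i - 1]              # probe steps are 1-indexed
        return ((a * x + b) % U) % N
    return probe
```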
25. Performance analysis
- Based on the worst-case insertion sequence
- The important parameter is α, the fraction of the table that is used: α·N elements
- We analyze the expected insertion time and search time (number of probes to the table)
- Successful and unsuccessful search have to be distinguished
26. Analysis via the static algorithm
- For insertions, the total numbers of probes in the static and dynamic algorithms are identical
- It is easier to analyze the static algorithm
- Key point: for Youth-rules, in phase i all unsettled elements are at the ith probe in their sequence
  - This assures fresh randomness of hi(x)
27. Performance
- For Youth-rules, implemented as specified:
  - For any sequence of insertions, the expected probe-time for insertion is at most 1/(1-α)
  - For any sequence of insertions, the expected probe-time for successful or unsuccessful search is at most 1/(1-α)
- The analysis is based on the static algorithm
- α is the fraction of the table that is used
28. Comparison to double hashing
- Analysis of double hashing with truly random functions: Guibas and Szemeredi; Lueker and Molodowitch
- The truly random functions can be replaced by log n-wise independent ones (Schmidt and Siegel)
  - log n-wise independence is relatively expensive: either a lot of space or log n time
- Youth-rules is a simple and provably efficient scheme with very little extra storage
  - An extra benefit of considering history independence
29. Other priority functions
- Amble and Knuth: log(1/(1-α)) for the global rule
  - With truly random hash functions
- Experiments show about log(1/(1-α)) for most priority functions tried
30. Other types of data structures
- Memory management (dealing with pointers)
- Memory Allocation
- Other state-related issues
31. Dynamic perfect hashing: the FKS scheme, dynamized
- n elements are to be inserted
- Top-level table: O(n) space, with hash function h
- Low-level tables: O(n) space in total; the bucket holding si elements gets a table of about si² cells with its own hash function hi
- The hi are perfect on their respective sets
- Rechoose h or some hi to maintain perfection and linear space
[Figure: a top-level table hashed by h, pointing to low-level tables with functions h0, ..., hk over buckets of sizes s0, ..., sk; a lookup sketch follows.]
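A sketch of the two-level lookup implied by the figure; buckets[j] is assumed to hold the pair (h_j, table_j) for the j-th low-level table:

```python
def fks_lookup(x, h, buckets):
    """Two-level (FKS-style) lookup in two steps."""
    h_j, table_j = buckets[h(x)]          # step 1: top-level hash
    return table_j[h_j(x)] == x           # step 2: perfect low-level hash
```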
32. A subtle problem: the intersection bias problem
- Suppose we have
  - a set of states σ1, σ2, ...
  - a set of objects h1, h2, ...
  - a way to decide whether hi is good for σj
- Keep a current h as the state changes
  - Change h only if it is no longer good
  - Choose uniformly from the objects that are good for the current σ
- Then this is not history independent (a toy demonstration follows)
  - h is biased towards the intersection of the objects good for the current σ and those good for previous states
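A toy demonstration of the bias, with numbers invented for the example: the objects are {0, 1, 2}, state σ1 is good for {0, 1}, and the next state σ2 is good for {1, 2}. Re-choosing h only when it goes bad skews h toward the intersection {1}:

```python
import random

def bias_demo(trials=100_000):
    counts = {1: 0, 2: 0}
    for _ in range(trials):
        h = random.choice([0, 1])         # uniform over good-for-sigma1
        if h not in (1, 2):               # state changes to sigma2
            h = random.choice([1, 2])     # re-choose only if now bad
        counts[h] += 1
    return counts   # about 75% h=1, not the history-free uniform 50%
```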
33. Dynamized FKS is not history independent
- It does not erase upon deletion
- It uses history-dependent memory allocation
- The hash functions (h, h1, h2, ...) are changed whenever they cease to be good
  - Hence they suffer from the intersection bias problem, since they are biased towards functions that were good for previous sets of elements
  - Hence they leak information about past sets of elements
34. Making it history independent
- Use history independent memory allocation
- Upon deletion, erase the element and rechoose the appropriate hi; this solves the low-level intersection bias problem
- Some other minor changes
- Solve the top-level intersection bias problem...
35. Solving the top-level intersection bias problem
- We can't afford a top-level rehash on every deletion
- Generate two potential top-level functions h(1) and h(2) at the beginning
- Always use the first good one (sketched below)
- If neither is good, rehash at every deletion
- If not using h(1), keep a top-level table for it for easy goodness checking (likewise for h(2))
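A sketch of this rule, to be run per operation; is_good and fresh_pair are assumed helpers (checking goodness for the current element set, and drawing a fresh pair of candidates):

```python
def top_level(candidates, is_good, fresh_pair):
    """Use the first good candidate; if neither is good, rehash."""
    while True:
        for h in candidates:
            if is_good(h):
                return h, candidates      # keep the pair, use first good
        candidates = fresh_pair()         # neither good: draw a new pair
```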
36. Proof of history independence
- The table's state is defined by
  - The current set of elements
  - The top-level hash function
    - Always the first good h(i), or rechosen at each step
  - The low-level hash functions
    - Uniformly chosen from the perfect functions
  - The arrangement of sub-tables in memory
    - Use history-independent memory allocation
  - Some other history independent things
37. Performance
- Lookup takes two steps
- Insertion and deletion take expected amortized O(1) time
  - There is a 1/poly chance that they will take more
38. Open problems
- Better analysis for Youth-rules, as well as for other priority functions, with no random oracles
- Efficient memory allocation
  - Ours is O(s log s)
- Separations
  - Between strong and weak history independence
  - Between history independent and traditional versions, e.g. for Union-Find
- Can persistence and (computational) history independence co-exist efficiently?
39. Conclusion
- History independence can be subtle
- We have two history independent hash tables
  - One based on open addressing: very space efficient, but no deletion
  - Dynamic perfect hashing: allows deletion and constant-time lookup
40. Open addressing: implementing the hash functions
- For all i, generate random independent ai, bi
- hi(x) = ((ai·x + bi) mod U) mod N
  - U is the size of the universe (a prime)
  - N is the size of the hash table
- x's probe sequence is h1(x), h2(x), h3(x), ...
- We need log n hash functions, where n is the number of elements in the table