Title: Anti-Persistence or History Independent Data Structures
1. Anti-Persistence or History Independent Data Structures
- Moni Naor (Weizmann Institute)
- Vanessa Teague (Stanford)
2. Why hide your history?
- Core dumps
- Losing your laptop
  - The entire memory representation of data structures is exposed
- Emailing files
  - The editing history may be exposed (e.g. Word)
- Maintaining lists of people
  - Sports teams, party invitees
3. Making sure that nobody learns from history
- A data structure has
  - A legitimate interface: the set of operations allowed to be performed on it
  - A memory representation
- The memory representation should reveal no information that cannot be obtained from the legitimate interface
4. History of history independence
- The issue has been dealt with in both the Cryptographic and Data Structures communities
- Micciancio (1997): history independent trees
  - Motivation: incremental cryptography
  - Based on the shape of the data structure, not including memory representation
  - Stronger performance model!
- Uniquely represented data structures
  - Treaps (Seidel, Aragon), uniquely represented dictionaries
  - Ordered hash tables (Amble, Knuth 1974)
5. More history
- Persistent data structures: it is possible to reconstruct all previous states of the data structure (Sarnak and Tarjan)
- We want the opposite: anti-persistence
- Oblivious RAM (Goldreich and Ostrovsky)
6. Overview
- Definitions
- History independent open addressing hashing
- History independent dynamic perfect hashing
- Memory Management
- (Union Find)
- Open problems
7. Precise definitions
- A data structure is
  - history independent if any two sequences of operations S1 and S2 that yield the same content induce the same probability distribution on the memory representation
  - strongly history independent if, given any two sets of breakpoints along S1 and S2 such that corresponding points have identical contents, S1 and S2 induce the same probability distributions on the memory representation at those points
8. Relaxations
- Statistical closeness
- Computational indistinguishability
  - Example where this helps: erasing
- Allow some information to be leaked
  - The total number of operations
  - n-history independent: identical distributions if the last n operations were identical as well
- Under-defined data structures: the same query can yield several legitimate answers
  - e.g. an approximate priority queue
  - Define identical content as: there is no suffix T such that the set of permitted results returned by S1 ∘ T differs from the one returned by S2 ∘ T
9. History independence is easy (sort of)
- If it is possible to determine the (lexicographically) first sequence of operations that produces a given contents, just store the result of that sequence
- This gives a history independent version of a huge class of data structures
- Efficiency is the problem
10. Dictionaries
- The operations are insert(x), lookup(x), and possibly delete(x)
- The content of a dictionary is the set of elements currently inserted (those that have been inserted but not deleted)
- Elements x ∈ U, some universe
- Size of the table/memory: N
11. Goal
- Find a history independent implementation of dictionaries with good provable performance
- Develop general techniques for history independence
12. Approaches
- Unique representation
  - e.g. an array in sorted order
  - Yields strong history independence
- Secret randomness
  - e.g. an array in random order
  - Yields only (plain) history independence
- Both are illustrated in the toy sketch below
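A toy illustration of the two approaches, assuming the content is a set of comparable elements (function names are mine, not from the slides):

```python
import random

def unique_representation(contents):
    """Sorted array: one canonical layout per set (strong HI)."""
    return sorted(contents)

def secret_randomness(contents):
    """Uniformly random permutation: the layout's distribution depends
    only on the set, not on the insertion order (plain HI)."""
    arr = list(contents)
    random.shuffle(arr)
    return arr
```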
13. Open addressing: the traditional version
- Each element x has a probe sequence h1(x), h2(x), h3(x), ...
  - Linear probing: h2(x) = h1(x) + 1, h3(x) = h1(x) + 2, ...
  - Double hashing
  - Uniform hashing
- An element is inserted into the first free space in its probe sequence (sketched below)
- A search ends unsuccessfully at a free space
- Efficient space utilization
  - Almost all of the table can be full
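A minimal sketch of the traditional (first-come-first-served) insertion, shown for contrast with the history independent version that follows; probe(x, i) is an assumed helper giving the i-th cell of x's probe sequence, and a free cell is assumed to exist:

```python
def insert_fcfs(table, x, probe):
    """Traditional open addressing: x settles in the first free cell of
    its probe sequence. First-come-first-served, hence history dependent."""
    i = 1
    while table[probe(x, i)] is not None:
        i += 1                            # skip occupied cells
    table[probe(x, i)] = x                # settle in the first free one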
14. Open addressing: the traditional version (continued)
- Not history independent, because later-inserted elements move further along in their probe sequence
[Figure: y probes a cell already holding x; x arrived before y, so y moves on. At the next cell there is no clash, so y is inserted there.]
15. History independent version
- At each cell i, decide elements' priorities independently of insertion order
- Call the priority function pi(x,y)
- If there is a clash, move the element of lower priority
- At each cell, the priorities must form a total order
- A sketch of the resulting insertion procedure follows
16. Insertion
[Figure: x probes a cell holding y; p2(x,y) does not hold, so x moves on and settles in the next cell of its probe sequence.]
17. Search
- Same as in the traditional algorithm
- In an unsuccessful search, we can quit as soon as we find a lower-priority element (sketched below)
- No deletions
  - Problematic in open addressing anyway
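A sketch of search under the same conventions as the insertion sketch above (probe and priority are the assumed helpers):

```python
def lookup(table, x, probe, priority):
    """Return True iff x is in the table. An unsuccessful search stops
    at a free cell, or early at an occupant that x would outrank."""
    step = 1
    while True:
        cell = probe(x, step)
        occupant = table[cell]
        if occupant is None:
            return False                  # free cell: x is absent
        y, y_step = occupant
        if y == x:
            return True
        if priority(cell, x, step, y, y_step):
            return False                  # x would have evicted y: absent
        step += 1
```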
18. Strong history independence
- Claim: for all hash functions and priority functions, the final configuration of the table is independent of the order of insertion
- Conclusion: the scheme is strongly history independent
19. Proof of history independence
- A static insertion algorithm (clearly history independent): gather up the rejects and restart, as in the sketch below
[Figure: rounds of the static algorithm on elements x1, ..., x6, with step annotations such as "p1(x2,x1), so insert x2", "p3(x4,x5) and p3(x4,x6): insert x4 and remove x5", and "p1(x6,x4) and p6(x3,x6), so insert x3".]
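A sketch of the static algorithm from the figure: in each round, every unsettled element tries its next probe position, and the rejects of all clashes are gathered up for the next round. Helper names follow the earlier sketches; this is illustrative, not the authors' code, and it assumes the table has room for all elements:

```python
def static_insert(elements, table_size, probe, priority):
    table = [None] * table_size
    pending = [(x, 1) for x in elements]  # (element, probe step)
    while pending:
        rejects = []
        for x, step in pending:
            cell = probe(x, step)
            occupant = table[cell]
            if occupant is None:
                table[cell] = (x, step)              # settle here
            else:
                y, y_step = occupant
                if priority(cell, x, step, y, y_step):
                    table[cell] = (x, step)          # x wins: y rejected
                    rejects.append((y, y_step + 1))
                else:
                    rejects.append((x, step + 1))    # y stays: x rejected
        pending = rejects                 # gather up the rejects, restart
    return table
```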
20. Proof of history independence
- Nothing moves further in the static algorithm than in the dynamic one
  - By induction on the rounds of the static algorithm
- Vice versa
  - By induction on the steps of the dynamic algorithm
- Hence: strongly history independent
21. Some priority functions
- Global
  - A single priority order, independent of the cell
- Random
  - Choose a random order at each cell
- Youth-rules
  - Call an element younger if it has moved less far along its probe sequence; younger elements get higher priority
22. Youth-rules
[Figure: x clashes with y; p2(x,y) holds because x has taken fewer steps than y, so y is moved. Use a tie-breaker if the step counts are equal. This is a priority function.]
A sketch of this priority rule follows.
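A sketch of the Youth-rules priority under the conventions of the earlier sketches; the tie-breaker here (a plain key comparison) is an assumption of this sketch, since any fixed, history-free total order works:

```python
def youth_rules(cell, a, a_step, b, b_step):
    """True iff a has priority over b at this cell."""
    if a_step != b_step:
        return a_step < b_step   # younger (fewer probe steps) wins
    return a < b                 # tie-breaker: a fixed order on keys
```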
23. Specifying a scheme
- Priority rule
  - Choice of priority functions
  - In Youth-rules: determined by the probe sequence
- Probe functions
  - How they are chosen, maintained, and computed
24. Implementing Youth-rules
- Let each hi be chosen from a pairwise independent collection
  - For any two x and y, the random variables hi(x) and hi(y) are uniform and independent
- Let h1, h2, h3, ... be chosen independently
- Example: hi(x) = ((ai·x + bi) mod U) mod N (sketched below)
- Space: 2 elements per function
- Only log N functions are needed
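A sketch of the pairwise independent probe functions described above; U is assumed to be a prime at least the universe size:

```python
import random

def make_probe(U, N, count):
    """Return probe(x, i) with h_i(x) = ((a_i*x + b_i) mod U) mod N,
    drawing one random pair (a_i, b_i) per probe position: 2 stored
    values per function, and about log N functions suffice."""
    params = [(random.randrange(1, U), random.randrange(U))
              for _ in range(count)]
    def probe(x, i):
        a, b = params[i - 1]              # probe steps are 1-indexed
        return ((a * x + b) % U) % N
    return probe
```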
25. Performance analysis
- Based on the worst-case insertion sequence
- The important parameter is α, the fraction of the table that is used: α·N elements
- We analyze the expected insertion time and search time (number of probes to the table)
- Successful and unsuccessful search have to be distinguished
26. Analysis via the static algorithm
- For insertions, the total numbers of probes in the static and dynamic algorithms are identical
- It is easier to analyze the static algorithm
- Key point: for Youth-rules, in phase i all unsettled elements are at the ith probe in their sequence
  - This assures fresh randomness of hi(x)
27. Performance
- For Youth-rules, implemented as specified:
  - For any sequence of insertions, the expected probe-time for insertion is at most 1/(1-α)
  - For any sequence of insertions, the expected probe-time for successful or unsuccessful search is at most 1/(1-α)
- The analysis is based on the static algorithm
- α is the fraction of the table that is used
28. Comparison to double hashing
- Analysis of double hashing with truly random functions: Guibas and Szemeredi; Lueker and Molodowitch
- The truly random functions can be replaced by log n-wise independent ones (Schmidt and Siegel)
  - log n-wise independence is relatively expensive: either a lot of space or log n time
- Youth-rules is a simple and provably efficient scheme with very little extra storage
  - An extra benefit of considering history independence
29. Other priority functions
- Amble and Knuth: log(1/(1-α)) for the global rule
  - With truly random hash functions
- Experiments show about log(1/(1-α)) for most priority functions tried
30. Other types of data structures
- Memory management (dealing with pointers)
- Memory Allocation
- Other state-related issues
31. Dynamic perfect hashing: the FKS scheme, dynamized
- n elements are to be inserted
- Top-level table: O(n) space, with hash function h
- Low-level tables: O(n) space in total; the bucket holding si elements gets a table of about si² cells with its own hash function hi
- The hi are perfect on their respective sets
- Rechoose h or some hi to maintain perfection and linear space
[Figure: a top-level table hashed by h, pointing to low-level tables with functions h0, ..., hk over buckets of sizes s0, ..., sk; a lookup sketch follows.]
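A sketch of the two-level lookup implied by the figure; buckets[j] is assumed to hold the pair (h_j, table_j) for the j-th low-level table:

```python
def fks_lookup(x, h, buckets):
    """Two-level (FKS-style) lookup in two steps."""
    h_j, table_j = buckets[h(x)]          # step 1: top-level hash
    return table_j[h_j(x)] == x           # step 2: perfect low-level hash
```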
32. A subtle problem: the intersection bias problem
- Suppose we have
  - a set of states σ1, σ2, ...
  - a set of objects h1, h2, ...
  - a way to decide whether hi is good for σj
- Keep a current h as the state changes
  - Change h only if it is no longer good
  - Choose uniformly from the objects that are good for the current σ
- Then this is not history independent (a toy demonstration follows)
  - h is biased towards the intersection of the objects good for the current σ and those good for previous states
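A toy demonstration of the bias, with numbers invented for the example: the objects are {0, 1, 2}, state σ1 is good for {0, 1}, and the next state σ2 is good for {1, 2}. Re-choosing h only when it goes bad skews h toward the intersection {1}:

```python
import random

def bias_demo(trials=100_000):
    counts = {1: 0, 2: 0}
    for _ in range(trials):
        h = random.choice([0, 1])         # uniform over good-for-sigma1
        if h not in (1, 2):               # state changes to sigma2
            h = random.choice([1, 2])     # re-choose only if now bad
        counts[h] += 1
    return counts   # about 75% h=1, not the history-free uniform 50%
```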
33. Dynamized FKS is not history independent
- It does not erase upon deletion
- It uses history-dependent memory allocation
- The hash functions (h, h1, h2, ...) are changed whenever they cease to be good
  - Hence they suffer from the intersection bias problem, since they are biased towards functions that were good for previous sets of elements
  - Hence they leak information about past sets of elements
34. Making it history independent
- Use history independent memory allocation
- Upon deletion, erase the element and rechoose the appropriate hi; this solves the low-level intersection bias problem
- Some other minor changes
- Solve the top-level intersection bias problem...
35. Solving the top-level intersection bias problem
- We can't afford a top-level rehash on every deletion
- Generate two potential top-level functions h(1) and h(2) at the beginning
- Always use the first good one (sketched below)
- If neither is good, rehash at every deletion
- If not using h(1), keep a top-level table for it for easy goodness checking (likewise for h(2))
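A sketch of this rule, to be run per operation; is_good and fresh_pair are assumed helpers (checking goodness for the current element set, and drawing a fresh pair of candidates):

```python
def top_level(candidates, is_good, fresh_pair):
    """Use the first good candidate; if neither is good, rehash."""
    while True:
        for h in candidates:
            if is_good(h):
                return h, candidates      # keep the pair, use first good
        candidates = fresh_pair()         # neither good: draw a new pair
```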
36. Proof of history independence
- The table's state is defined by
  - The current set of elements
  - The top-level hash function
    - Always the first good h(i), or rechosen at each step
  - The low-level hash functions
    - Uniformly chosen from the perfect functions
  - The arrangement of sub-tables in memory
    - Use history-independent memory allocation
  - Some other history independent things
37. Performance
- Lookup takes two steps
- Insertion and deletion take expected amortized O(1) time
  - There is a 1/poly chance that they will take more
38. Open problems
- Better analysis for Youth-rules, as well as for other priority functions, with no random oracles
- Efficient memory allocation
  - Ours is O(s log s)
- Separations
  - Between strong and weak history independence
  - Between history independent and traditional versions, e.g. for Union-Find
- Can persistence and (computational) history independence co-exist efficiently?
39. Conclusion
- History independence can be subtle
- We have two history independent hash tables
  - One based on open addressing: very space efficient, but no deletion
  - Dynamic perfect hashing: allows deletion and constant-time lookup
40. Open addressing: implementing the hash functions
- For all i, generate random independent ai, bi
- hi(x) = ((ai·x + bi) mod U) mod N
  - U is the size of the universe (a prime)
  - N is the size of the hash table
- x's probe sequence is h1(x), h2(x), h3(x), ...
- We need log n hash functions, where n is the number of elements in the table