1
Anti-Persistence, or History Independent Data
Structures
  • Moni Naor (Weizmann Institute), Vanessa Teague
    (Stanford)

2
Why hide your history?
  • Core dumps
  • Losing your laptop
  • The entire memory representation of data
    structures is exposed
  • Emailing files
  • The editing history may be exposed (e.g. Word)
  • Maintaining lists of people
  • Sports teams, party invitees

3
Making sure that nobody learns from history
  • A data structure has
  • A legitimate interface: the set of operations
    allowed to be performed on it
  • A memory representation
  • The memory representation should reveal no
    information that cannot be obtained from the
    legitimate interface

4
History of history independence
  • Issue dealt with in the Cryptography and Data
    Structures communities
  • Micciancio (1997): history independent trees
  • Motivation: incremental crypto
  • Based on the shape of the data structure, not
    including the memory representation
  • Stronger performance model!
  • Uniquely represented data structures
  • Treaps (Seidel & Aragon), uniquely represented
    dictionaries
  • Ordered hash tables (Amble & Knuth 1974)

5
More History
  • Persistent Data Structures: possible to
    reconstruct all previous states of the data
    structure (Sarnak and Tarjan)
  • We want the opposite: anti-persistence
  • Oblivious RAM (Goldreich and Ostrovsky)

6
Overview
  • Definitions
  • History independent open addressing hashing
  • History independent dynamic perfect hashing
  • Memory Management
  • (Union Find)
  • Open problems

7
Precise Definitions
  • A data structure is
  • history independent if any two sequences of
    operations S1 and S2 that yield the same content
    induce the same probability distribution on
    memory representation
  • strongly history independent if given any two
    sets of breakpoints along S1 and S2 s.t.
    corresponding points have identical contents, S1
    and S2 induce the same probability distributions
    on memory representation at those points

8
Relaxations
  • Statistical closeness
  • Computational indistinguishability
  • Example where helpful: erasing
  • Allow some information to be leaked
  • Total number of operations
  • n-history independent: identical distributions if
    the last n operations were identical as well
  • Under-defined data structures: the same query can
    yield several legitimate answers,
  • e.g. approximate priority queue
  • Define identical content: no suffix T such that
    the set of permitted results returned by S1 ∘ T is
    different from the one returned by S2 ∘ T

9
History independence is easy (sort of)
  • If it is possible to determine the
    (lexicographically) first sequence of
    operations that produces a given content, just
    store the result of that sequence
  • This gives a history independent version of a
    huge class of data structures
  • Efficiency is the problem

10
Dictionaries
  • Operations are insert(x), lookup(x) and possibly
    delete(x)
  • The content of a dictionary is the set of
    elements currently inserted (those that have been
    inserted but not deleted)
  • Elements x ∈ U, some universe
  • Size of table/memory: N

11
Goal
  • Find a history independent implementation of
    dictionaries with good provable performance.
  • Develop general techniques for history
    independence

12
Approaches
  • Unique representation
  • e.g. array in sorted order
  • Yields strong history independence
  • Secret randomness
  • e.g. array in random order
  • Yields only (weak) history independence
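The sorted-array approach can be made concrete in a few lines. This is a minimal illustrative sketch (not the authors' code) of a uniquely represented dictionary: the memory layout is a function of the content alone, so any two operation sequences with the same content produce identical representations.

```python
# Sketch: a dictionary whose representation (a sorted array) is
# uniquely determined by its content, hence strongly history
# independent. Class and method names are illustrative.
import bisect

class SortedArrayDict:
    def __init__(self):
        self.items = []                    # always kept in sorted order

    def insert(self, x):
        i = bisect.bisect_left(self.items, x)
        if i == len(self.items) or self.items[i] != x:
            self.items.insert(i, x)        # O(n) shift keeps the unique layout

    def delete(self, x):
        i = bisect.bisect_left(self.items, x)
        if i < len(self.items) and self.items[i] == x:
            self.items.pop(i)

    def lookup(self, x):
        i = bisect.bisect_left(self.items, x)
        return i < len(self.items) and self.items[i] == x

# Any insertion order yields the identical array:
d1, d2 = SortedArrayDict(), SortedArrayDict()
for x in [3, 1, 2]:
    d1.insert(x)
for x in [2, 3, 1]:
    d2.insert(x)
assert d1.items == d2.items == [1, 2, 3]
```

The price, of course, is the O(n) cost per update that the hashing schemes below avoid.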

13
Open addressing traditional version
  • Each element x has a probe sequence
  • h1(x), h2(x), h3(x), ...
  • Linear probing: h2(x) = h1(x)+1, h3(x) = h1(x)+2,
    ...
  • Double hashing
  • Uniform hashing
  • Element is inserted into the first free space in
    its probe sequence
  • Search ends unsuccessfully at a free space
  • Efficient space utilization
  • Almost all the table can be full

14
Open addressing traditional version
Not history independent, because later-inserted
elements move further along in their probe
sequence.
[Diagram: y probes a cell already occupied by x;
x arrived before y, so y moves on and is inserted
at the next free space in its probe sequence.]
15
History independent version
  • At each cell i, decide elements' priorities
    independently of insertion order
  • Call the priority function pi(x,y)
  • If there is a clash, move the element of lower
    priority
  • At each cell, priorities must form a total order
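The priority-rule insertion can be sketched as follows. This is a toy illustration (not the authors' code): the probe sequence here is plain linear probing and the priority rule is a global one ("smaller key wins"), which is one valid per-cell total order.

```python
# Sketch of priority-rule insertion into an open-addressing table.
# Probe sequence and priority rule are illustrative choices; the
# table is assumed never to fill up.
N = 13                        # toy table size

def probe(x, i):
    return (x + i) % N        # toy probe sequence: linear probing

def insert(table, x):
    i = 0                     # x's current step in its probe sequence
    while True:
        c = probe(x, i)
        if table[c] is None:  # free space: settle here
            table[c] = (x, i)
            return
        y, j = table[c]
        if x < y:             # x has priority: evict y, keep probing for y
            table[c] = (x, i)
            x, i = y, j
        i += 1                # the loser moves to its next probe position

def build(keys):
    table = [None] * N
    for x in keys:
        insert(table, x)
    return table

# The final configuration is independent of insertion order:
assert build([5, 18, 31, 44]) == build([44, 31, 18, 5])
```

The assertion at the end checks the order-independence claim of the "Strong history independence" slide on a small colliding example (all four keys probe cell 5 first).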

16
Insertion
[Diagram: x probes a cell occupied by y at step 2;
p2(x,y) does not hold, so x moves on to the next
cell in its probe sequence.]
17
Search
  • Same as in the traditional algorithm
  • In unsuccessful search, can quit as soon as you
    find a lower-priority element

No deletions
  • Problematic in open addressing anyway

18
Strong history independence
  • Claim
  • For all hash functions and priority functions,
    the final configuration of the table is
    independent of the order of insertion.
  • Conclusion
  • Strongly history independent

19
Proof of history independence
A static insertion algorithm (clearly history
independent): in phase i, every unsettled element
tries the ith cell in its probe sequence; at each
cell the highest-priority contender stays (possibly
evicting a previously settled element), and the
rejects are gathered up and restarted in the next
phase.
[Diagram: e.g. p1(x2,x1), so insert x2 and reject
x1; later p3(x4,x5), so insert x4 and remove x5,
which is re-inserted in the next phase.]
20
Proof of history independence
  • Nothing moves further in the static algorithm
    than in the dynamic one
  • By induction on rounds of the static alg.
  • Vice versa
  • By induction on the steps in the dynamic alg.
  • Strongly history independent

21
Some priority functions
  • Global
  • A single priority independent of cell
  • Random
  • Choose a random order at each cell
  • Youth-rules
  • Call an element younger if it has moved less
    far along its probe sequence; younger elements
    get higher priority

22
Youth-rules
[Diagram: x clashes with y; p2(x,y) holds because
x has taken fewer steps than y, so y moves on.]
Use a tie-breaker if steps are equal. This is a
priority function.
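The Youth-rules priority can be written as a small comparison. This is an assumed formulation, not from the slides: each element carries the number of probe steps it has taken, and ties are broken by a fixed total order on keys so that every cell sees a total order.

```python
# Sketch: Youth-rules as a priority predicate at a clashing cell.
# The element that has taken fewer probe steps ("younger") wins;
# ties are broken by comparing the keys themselves.
def youth_rules(x, steps_x, y, steps_y):
    """Return True iff x has priority over y at the clashing cell."""
    return (steps_x, x) < (steps_y, y)

assert youth_rules(10, 0, 3, 2)      # 10 is younger, so 10 wins
assert not youth_rules(10, 2, 3, 0)  # 3 is younger, so 3 wins
assert youth_rules(3, 1, 10, 1)      # equal steps: tie-break on key
```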
23
Specifying a scheme
  • Priority rule
  • Choice of priority functions
  • In Youth-rules: determined by the probe sequence
  • Probe functions
  • How are they chosen?
  • Maintained?
  • Computed?

24
Implementing Youth-rules
  • Let each hi be chosen from a pairwise
    independent collection
  • For any two x and y, the r.v.s hi(x) and hi(y)
    are uniform and independent
  • Let h1, h2, h3, ... be chosen independently
  • Example: hi(x) = ((ai·x + bi) mod U) mod N
  • Space: 2 elements per function
  • Need only log N functions
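The probe functions above are cheap to implement. A minimal sketch, assuming U is a prime at least the universe size (the particular prime and table size below are illustrative):

```python
# Sketch: each h_i is drawn from the pairwise-independent family
# h_i(x) = ((a_i*x + b_i) mod U) mod N. Storage is two integers
# per function, and only about log N functions are kept.
import math
import random

U = (1 << 61) - 1            # a Mersenne prime >= universe size (assumed)
N = 1024                     # toy table size

k = int(math.log2(N))        # number of probe functions kept
coeffs = [(random.randrange(1, U), random.randrange(U)) for _ in range(k)]

def h(i, x):
    a, b = coeffs[i]
    return ((a * x + b) % U) % N

# x's probe sequence is h(0, x), h(1, x), h(2, x), ...
assert all(0 <= h(i, 12345) < N for i in range(k))
```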

25
Performance Analysis
  • Based on worst-case insertion sequence
  • The important parameter: α, the fraction of the
    table that is used (αN elements)
  • Analysis of expected insertion time and search
    time (number of probes to the table)
  • Have to distinguish successful and unsuccessful
    search

26
Analysis via the Static Algorithm
  • For insertions, the total number of probes in
    static and dynamic algorithm are identical
  • Easier to analyze the static algorithm
  • Key point for Youth-rules: in phase i, all
    unsettled elements are at the ith probe in their
    sequence
  • Assures fresh randomness of hi(x)

27
Performance
  • For Youth-rules, implemented as specified:
  • For any sequence of insertions, the expected
    probe-time for insertion is at most 1/(1−α)
  • For any sequence of insertions, the expected
    probe-time for successful or unsuccessful search
    is at most 1/(1−α)
  • Analysis based on the static algorithm
  • α is the fraction of the table that is used

28
Comparison to double hashing
  • Analysis of double hashing with truly random
    functions: Guibas & Szemerédi, Lueker &
    Molodowitch
  • Can be replaced by log n-wise independent
    functions (Schmidt & Siegel)
  • log n-wise independence is relatively expensive:
  • either a lot of space or log n time
  • Youth-rules is a simple and provably efficient
    scheme with very little extra storage
  • Extra benefit of considering history independence

29
Other Priority Functions
  • Amble & Knuth: log(1/(1−α)) for global
  • Truly random hash functions
  • Experiments show about log(1/(1−α)) for most
    priority functions tried

30
Other types of data structures
  • Memory management (dealing with pointers)
  • Memory Allocation
  • Other state-related issues

31
Dynamic perfect hashing: FKS scheme, dynamized
  • n elements to be inserted
  • Top-level table: O(n) space; the top-level
    function h maps each element to a low-level table
  • Low-level tables: O(n) space total; table i gets
    about si² cells, where si is the number of
    elements hashed to it
  • The hi are perfect on their respective sets
  • Rechoose h or some hi to maintain perfection and
    linear space
[Diagram: elements x1, ..., x6 hashed by h into
low-level tables of sizes s0, ..., sk, each with
its own function hi.]
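The two-level layout can be sketched as follows. This is a toy illustration only: the hash functions are fixed stand-ins (h(x) = x mod r, hi(x) = x mod m), whereas the real scheme draws them at random and re-chooses until they are perfect; the point here is the structure that makes lookup take two probes.

```python
# Toy sketch of the two-level FKS layout. Bucket i stores its s_i
# elements in a sub-table of about s_i^2 cells with a collision-free
# function; here we simply grow the sub-table until x mod m is
# injective on the bucket.
def make_fks(elems, r=7):
    buckets = [[] for _ in range(r)]
    for x in elems:
        buckets[x % r].append(x)       # toy top-level h(x) = x mod r
    tables = []
    for bucket in buckets:
        m = max(1, len(bucket) ** 2)   # quadratic space per sub-table
        while True:
            sub = [None] * m
            ok = True
            for x in bucket:
                if sub[x % m] is not None:
                    ok = False         # clash: this m is not perfect
                    break
                sub[x % m] = x
            if ok:
                break
            m += 1
        tables.append(sub)
    return tables

def lookup(tables, x, r=7):
    sub = tables[x % r]                # first probe: top-level table
    return sub[x % len(sub)] == x      # second probe: sub-table

t = make_fks([3, 19, 42, 57, 100])
assert lookup(t, 42) and not lookup(t, 8)
```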
32
A subtle problem: the intersection bias problem
  • Suppose we have
  • a set of states σ1, σ2, ...
  • a set of objects h1, h2, ...
  • a way to decide whether hi is good for σj
  • Keep a current h as states change
  • Change h only if it is no longer good
  • Choose uniformly from the good ones for the
    current σ
  • Then this is not history independent:
  • h is biased towards the intersection of those
    good for the current σ and for previous states

33
Dynamized FKS is not history independent
  • Does not erase upon deletion
  • Uses history-dependent memory allocation
  • Hash functions (h, h1, h2, ...) are changed
    whenever they cease to be good
  • Hence they suffer from the intersection bias
    problem, since they are biased towards functions
    that were good for previous sets of elements
  • Hence they leak information about past sets of
    elements

34
Making it history independent
  • Use history independent memory allocation
  • Upon deletion, erase the element and rechoose the
    appropriate hi. This solves the low-level
    intersection bias problem.
  • Some other minor changes
  • Solve the top-level intersection bias problem...

35
Solving the top-level intersection bias problem
  • Can't afford a top-level rehash on every deletion
  • Generate two candidate h's, ĥ1 and ĥ2, at the
    beginning
  • Always use the first good one
  • If neither is good, rehash at every deletion
  • If not using ĥ1, keep a top-level table for it
    for easy goodness checking (likewise for ĥ2)

36
Proof of history independence
  • The table's state is defined by
  • The current set of elements
  • Top-level hash functions
  • Always the first good ĥi, or rechosen each step
  • Low-level hash functions
  • Uniformly chosen from the perfect functions
  • Arrangement of sub-tables in memory
  • Use history-independent memory allocation
  • Some other history independent things

37
Performance
  • Lookup takes two steps
  • Insertion and deletion take expected amortized
    O(1) time
  • There is a 1/poly chance that they will take more

38
Open Problems
  • Better analysis for Youth-rules, as well as other
    priority functions, with no random oracles
  • Efficient memory allocation
  • ours is O(s log s)
  • Separations
  • Between strong and weak history independence
  • Between history independent and traditional
    versions
  • e.g. for union find
  • Can persistence and (computational) history
    independence co-exist efficiently?

39
Conclusion
  • History independence can be subtle
  • We have two history independent hash tables
  • Based on open addressing
  • Very space efficient but no deletion
  • Dynamic perfect hashing
  • Allows deletion, constant-time lookup

40
Open addressing: implementing hash functions
  • For all i, generate random independent ai, bi
  • hi(x) = ((ai·x + bi) mod U) mod N
  • U: size of universe, a prime
  • N: size of hash table
  • x's probe sequence is h1(x), h2(x), h3(x), ...
  • We need log n hash functions
  • n: the number of elements in the table