Title: Uniform algorithms for deterministic construction of efficient dictionaries
1. Uniform algorithms for deterministic construction of efficient dictionaries
- Milan Ružić
- IT University of Copenhagen
- Faculty of Mathematics
- University of Belgrade
- ESA 2004 / ARCO 2005 presentation
2. The dictionary problem
- How to store a set S ⊆ U and answer membership queries: is x ∈ S?
- In the dynamic dictionary problem, S may change over time.
- Conditions
  - Compute on a unit-cost RAM with word length w and a standard instruction set, including multiplication and division.
  - Finite universe U = {0,1}^w.
  - Use space linear in n = |S|.
3. Randomized solutions
- Started with a static dictionary with O(n) expected construction time, using Ω(nw) random bits (Fredman, Komlós, Szemerédi 82).
- Reached a dynamic dictionary with
  - Constant search time.
  - Constant update time with probability 1 − O(n^-c).
  - Use of only O(log n + log w) random bits.
  - Dietzfelbinger et al. 92
- However, what if
- random bits are not easily available, or
- performance without a guarantee is unacceptable?
4. Deterministic dictionaries with fast lookups
Reference                     Lookup time           Construction time                  Compile-time precomputation
Alon-Naor 94                  O(w / log n)          O(n w log^4 n)                     -
Andersson 96                  O(log w log log n)    O(n)                               O(w^O(1))
Raman 96                      O(1)                  O(n^2 w)                           -
Hagerup, Miltersen, Pagh 01   O(1)                  O(n log n)                         ?(2?(w) w) ?
Our results                   O(t(n))               O(n^(1+1/t(n)) + n t(n) log w)     -
Our results                   O(1)                  O(n w log^2 n)                     -
5. The family of hash functions
- Viewing the problem in a continuous setting - HR.
- A sufficient condition for avoiding collisions
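The formulas for this slide are not preserved. As a minimal sketch only, assume the usual multiplicative scheme with a real-valued multiplier a, hashing into r cells by h_a(x) = floor(r * frac(a*x)). Under that assumption, two keys x and y cannot collide whenever frac(a*(x - y)) lies in [1/r, 1 - 1/r]. The names h, frac and safely_separated, and the values a = 5/16 and r = 4, are hypothetical and chosen for illustration.

```python
from fractions import Fraction
from math import floor


def frac(t):
    """Fractional part of t, a value in [0, 1)."""
    return t - floor(t)


def h(a, x, r):
    """Assumed multiplicative hash into {0, ..., r-1}: floor(r * frac(a * x))."""
    return floor(r * frac(a * x))


def safely_separated(a, d, r):
    """Sufficient (not necessary) test that keys with difference d do not collide."""
    f = frac(a * d)
    return Fraction(1, r) <= f <= 1 - Fraction(1, r)


# Hypothetical example values; exact rationals avoid floating-point rounding.
a = Fraction(5, 16)
r = 4
keys = [3, 7, 10, 12]

for i, x in enumerate(keys):
    for y in keys[:i]:
        print(x, y, h(a, x, r), h(a, y, r), safely_separated(a, x - y, r))
```

The test is one-sided: a pair that fails it may still land in different cells, but a pair that passes it never shares a cell.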
6. The set of good parameters
- The set of multipliers that generate fewer than m collisions on the set of s differences has measure at least …
- We can calculate the measures with numbers of bounded precision.
- The set of good parameters contains sufficiently large intervals
  - that is, there are good multipliers which can be represented by a constant number of machine words.
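The exact bound from this slide is not preserved. As a hedged illustration, assume hashing into r >= 2 cells by h_a(x) = floor(r * frac(a*x)) with a drawn from [0,1), and nonzero integer differences d_1, ..., d_s. Call a difference d unsafe for a when frac(a*d) falls outside [1/r, 1 - 1/r], and let B(a) count the unsafe differences. A standard averaging (Markov) argument then gives a bound of the following shape, which may differ from the one originally shown:

```latex
\begin{align*}
\mu\{a \in [0,1) : \operatorname{frac}(a d_i) \notin [1/r,\, 1 - 1/r]\} &= \frac{2}{r}, \\
\int_0^1 B(a)\, da &= \frac{2s}{r}, \\
\mu\{a \in [0,1) : B(a) < m\} &\ge 1 - \frac{2s}{rm}.
\end{align*}
```

Since every collision comes from an unsafe difference, fewer than m unsafe differences implies fewer than m colliding differences under these assumptions.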
7. Finding a good function
- Problem: Given a set of s differences, deterministically find a multiplier a which produces fewer than m colliding differences.
- Not all differences need to be explicitly stored in memory.
- We use bit-by-bit construction; sometimes several consecutive bits are set at once.
- Choosing a bit is equivalent to choosing a half of a working interval.
- Key observation: sets with relatively small support intervals are insignificant to the current choice.
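The interval-halving idea can be sketched with the method of conditional expectations. This is a simplified stand-in, not the procedure from the talk: it stores every difference explicitly, sets one bit at a time, and does not skip differences with small support. It assumes hashing into r >= 2 cells by floor(r * frac(a*d)) and positive integer differences. The names bad_measure, narrow and average_bad are hypothetical.

```python
from fractions import Fraction
from math import floor


def bad_measure(d, r, lo, hi):
    """Exact measure of {a in [lo, hi) : a*d lies within 1/r of an integer}."""
    t_lo, t_hi = lo * d, hi * d
    eps = Fraction(1, r)
    total = Fraction(0)
    # Each integer k contributes the overlap of (k - eps, k + eps) with [t_lo, t_hi).
    for k in range(floor(t_lo), floor(t_hi) + 2):
        left, right = max(t_lo, k - eps), min(t_hi, k + eps)
        if right > left:
            total += right - left
    return total / d


def narrow(diffs, r, bits):
    """Fix `bits` leading bits of the multiplier by halving the interval [lo, hi)."""
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(bits):
        mid = (lo + hi) / 2
        left = sum(bad_measure(d, r, lo, mid) for d in diffs)
        right = sum(bad_measure(d, r, mid, hi) for d in diffs)
        # Keep the half with the smaller total unsafe measure (next bit 0 or 1).
        if left <= right:
            hi = mid
        else:
            lo = mid
    return lo, hi


def average_bad(diffs, r, lo, hi):
    """Average number of unsafe differences over multipliers in [lo, hi)."""
    return sum(bad_measure(d, r, lo, hi) for d in diffs) / (hi - lo)


diffs = [3, 5, 8, 13, 21]          # hypothetical differences
lo, hi = narrow(diffs, r=8, bits=10)
print(lo, hi, average_bad(diffs, 8, lo, hi))
```

Because the kept half never has more than half of the unsafe measure, the average number of unsafe differences over the working interval never rises above its starting value 2s/r. The sketch returns an interval, so a concrete multiplier taken from it would still need a final check.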
8. Three classes of differences
- The recurrence for measure estimates
- ?1(p1) ?2(p1) E(p1) ? ?(p) E(p)
- Several bits are chosen at once when D_mid = ∅.
- The O(w) term represents the total cost of finding the leftmost bits of keys.
9. Reducing the construction time
- We employ a multi-level hashing scheme. The number of levels can be set by adjusting the parameters m and s.
- The structure of the set of differences
- In the case of O(1) lookup time we set nk? n?, m ? 4n? and r ? n.
- Note on evaluation: When input consists of multi-word keys, full multiplication is usually not necessary.
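A multi-level scheme can be pictured as a tree of hash tables in which lookup time equals the number of levels visited. The sketch below is an illustration under assumptions, not the structure from the talk: it uses the assumed hash floor(r * frac(a*x)) and picks each multiplier by exhaustive search over short fixed-point values, where the talk's deterministic selection procedure would go. The names build and lookup and the defaults r = 4 and bits = 8 are hypothetical.

```python
from fractions import Fraction
from math import floor


def h(a, x, r):
    """Assumed multiplicative hash floor(r * frac(a * x)) with an exact rational a."""
    t = a * x
    return floor(r * (t - floor(t)))


def build(keys, r=4, bits=8):
    """Return ("leaf", keys) or ("node", multiplier, r, children)."""
    keys = list(keys)
    if len(keys) <= 1:
        return ("leaf", keys)
    best = None
    # Exhaustive search over `bits`-bit multipliers, used here only for brevity.
    for k in range(1, 2 ** bits):
        a = Fraction(k, 2 ** bits)
        buckets = [[] for _ in range(r)]
        for x in keys:
            buckets[h(a, x, r)].append(x)
        worst = max(len(b) for b in buckets)
        if best is None or worst < best[0]:
            best = (worst, a, buckets)
    worst, a, buckets = best
    if worst == len(keys):
        # No candidate splits this set, so stop and store it as a leaf.
        return ("leaf", keys)
    return ("node", a, r, [build(b, r, bits) for b in buckets])


def lookup(node, x):
    """Follow one hash evaluation per level, then scan the leaf."""
    while node[0] == "node":
        _, a, r, children = node
        node = children[h(a, x, r)]
    return x in node[1]


tree = build([3, 17, 42, 99, 1000, 65535])
print(lookup(tree, 42), lookup(tree, 43))
```

Raising r or allowing larger leaves shortens the tree, which loosely mirrors trading lookup time against construction effort.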