Title: Batch Codes and Their Applications
1Batch Codes and Their Applications
- Y. Ishai, E. Kushilevitz, R. Ostrovsky, A. Sahai
- Preliminary version in STOC 2004
2Talk Outline
- Batch codes
- Amortized PIR
- via hashing
- via batch codes
- Constructing batch codes
- Concluding remarks
3A Load-Balancing Scenario
x
4What's wrong with a random partition?
- Good on average for oblivious queries.
- However:
- Can't balance adversarial queries
- Can't balance few random queries
- Can't relieve hot spots in multi-user setting
5Example
- 3 devices, 50% storage overhead.
- By how much can the maximal load be reduced?
- Replicating bits is no good: ∃ device s.t. 1/6 of the bits can only be found at this device.
- Factor 2 load reduction is possible
6Batch Codes
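- (n,N,m,k) batch code: encodes an n-bit string x into m buckets of total length N such that any k bits of x can be decoded by reading at most one bit from each bucket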
- Notes
- Rate = n/N
- By default, insist on minimal load per bucket ⇒ m ≥ k.
- Load measured by # of probes.
- Generalizations
- Allow t probes per bucket
- Larger alphabet Σ
7Multiset Batch Codes
- (n,N,m,k) multiset batch code
- Motivation
- Models multiple users (with off-line coordination)
- Useful as a building block for standard batch codes
- Nontrivial even for multisets of the form <i, i, ..., i>
8Examples
- Trivial codes
- Replication: N = kn, m = k
- Optimal m, bad rate.
- One bit per bucket: N = m = n
- Optimal rate, bad m.
- (L, R, L⊕R) code: rate = 2/3, m = 3, k = 2.
- Goal: simultaneously obtain
- High rate (close to 1)
- Small m (close to k)
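The (L, R, L⊕R) example can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function names `encode_lrx` and `decode_pair` are hypothetical, and the sketch assumes an even number of bits. It shows that any two positions (even the same position twice) can be recovered with at most one probe per bucket.

```python
def encode_lrx(x):
    """Split x into halves L, R and return the three buckets [L, R, L xor R]."""
    assert len(x) % 2 == 0, "sketch assumes an even number of bits"
    half = len(x) // 2
    left, right = x[:half], x[half:]
    xor = [a ^ b for a, b in zip(left, right)]
    return [left, right, xor]


def decode_pair(buckets, i, j):
    """Recover (x[i], x[j]) while reading at most one bit from each bucket."""
    half = len(buckets[0])
    side_i, pos_i = divmod(i, half)  # side 0 = L, side 1 = R
    side_j, pos_j = divmod(j, half)
    probes = [0, 0, 0]               # number of reads per bucket

    def read(bucket, pos):
        probes[bucket] += 1
        return buckets[bucket][pos]

    if side_i != side_j:
        # Bits live in different halves: read each one directly.
        xi, xj = read(side_i, pos_i), read(side_j, pos_j)
    else:
        # Same half: read x[i] directly, rebuild x[j] from the other
        # half and the XOR bucket.
        xi = read(side_i, pos_i)
        xj = read(1 - side_j, pos_j) ^ read(2, pos_j)
    assert max(probes) <= 1
    return xi, xj
```

The stored length is 1.5 times the original (rate 2/3), and the load per bucket never exceeds one probe for k = 2.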
9Private Information Retrieval (PIR)
- Goal: allow user to query database while hiding the identity of the data-items she is after.
- Motivation: patent databases, web searches, ...
- Paradox(?): imagine buying in a store without the seller knowing what you buy.
- Note: Encrypting requests is useful against third parties, not against the server holding the data.
10Modeling
- Database: n-bit string x
- User wishes to
- retrieve x_i and
- keep i private
12Some Solutions
- 1. User downloads entire database.
- Drawback: n communication bits (vs. log n + 1 w/o privacy).
- Main research goal: minimize communication complexity.
- 2. User masks i with additional random indices.
- Drawback: gives a lot of information about i.
- 3. Enable anonymous access to database.
- Note: addresses the different security concern of hiding the user's identity, not the fact that x_i is retrieved.
- Fact: PIR as described so far requires Ω(n) communication bits.
13Two Approaches
- Computational PIR [KO97, CMS99, ...]
- Computational privacy
- Based on cryptographic assumptions
- Information-Theoretic PIR [CGKS95, Amb97, ...]
- Replicate database among s servers
- Unconditional privacy against t servers
- Default: t = 1
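To make the information-theoretic setting concrete, here is a sketch of the simplest 2-server scheme with t = 1. It is the textbook linear-communication protocol, not one of the low-communication protocols cited in the talk, and the function names are hypothetical. Each server sees a uniformly random subset of indices, so neither learns anything about i on its own.

```python
import secrets


def make_queries(n, i):
    """User side: a uniformly random subset for server 1, and the same
    subset with index i toggled for server 2."""
    q1 = {idx for idx in range(n) if secrets.randbits(1)}
    q2 = q1 ^ {i}  # symmetric difference flips membership of i
    return q1, q2


def server_answer(x, subset):
    """Server side: XOR of the database bits selected by the query."""
    answer = 0
    for idx in subset:
        answer ^= x[idx]
    return answer


def retrieve(x, i):
    """Run the protocol against two (simulated) replicas of x."""
    q1, q2 = make_queries(len(x), i)
    # Every index except i appears in both queries or in neither,
    # so everything but x[i] cancels out.
    return server_answer(x, q1) ^ server_answer(x, q2)
```

Each server still touches a constant fraction of the database per query, which is the time bottleneck that amortization targets.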
14Communication Upper Bounds
- Computational PIR
- O(n^ε), polylog(n), O(??logn), O(?logn) [KO97, CMS99, ...]
- Information-theoretic PIR
- 2 servers, O(n^{1/3}) [CGKS95]
- s servers, O(n^{1/c(s)}) where c(s) = Ω(s log s / log log s) [CGKS95, Amb97, BIKR02]
- O(log n / log log n) servers, polylog(n)
15Time Complexity of PIR
- Given low-communication protocols, efficiency bottleneck shifts to servers' time complexity.
- Protocols require (at least) linear time per query.
- This is an inherent limitation!
- Possible workarounds
- Preprocessing
- Amortize cost over multiple queries
16Previous Results [BIM00]
- PIR with preprocessing
- s-server protocols with O(n^ε) communication and O(n^{1/s+ε}) work per query, requiring poly(n) storage.
- Disadvantages
- Only work for multi-server PIR
- Storage typically huge
- Amortized PIR
- Slight savings possible using fast matrix multiplication
- Require a large batch of queries and high communication
- Apply also to queries originating from different users.
- This work
- Assume a batch of k queries originate from a single user.
- Allow preprocessing (not always needed).
- Nearly optimal amortization
17Model
Server/s
User
18Amortized PIR via Hashing
- Let P be a PIR protocol.
- Hashing-based amortized PIR
- User picks h ∈_R H, defining a random partition of x into k buckets of size ≈ n/k, and sends h to Server/s.
- Except for 2^{-σ} failure probability, at most t = O(σ·log k) queries fall in each bucket.
- P is applied t times for each bucket.
- Complexity
- Time ≈ kt·T(n/k) ≤ t·T(n)
- Communication ≈ kt·C(n/k)
- Asymptotically optimal up to polylog factors
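The load behaviour of the hashing approach can be explored with a small simulation. This is only a sketch: a table of independent random bucket choices stands in for the hash function h, and the names `bucket_loads` and `max_load` are hypothetical.

```python
import random
from collections import Counter


def bucket_loads(n, k, queries, rng):
    """Randomly partition indices 0..n-1 into k buckets and count how many
    of the queried indices land in each bucket."""
    bucket_of = [rng.randrange(k) for _ in range(n)]  # stands in for h
    return Counter(bucket_of[i] for i in queries)


def max_load(n, k, rng):
    """Largest number of queries in one bucket, for k distinct random queries."""
    queries = rng.sample(range(n), k)
    return max(bucket_loads(n, k, queries, rng).values())
```

Calling `max_load(10**4, 64, random.Random())` repeatedly shows that the fullest bucket usually holds several queries even though the average is one per bucket; that gap is what the parameter t has to absorb.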
19So what's wrong?
- Not much
- Still
- Not perfect
- introduces either error or privacy loss
- Useless for small k
- t = O(σ·log k) overhead dominates
- Cannot hash once and for all
- ∀h ∃ bad k-tuple of queries
- Sounds familiar?
20Amortized PIR via Batch Codes
- Idea: use batch-encoding instead of hashing.
- Protocol
- Preprocessing: Server/s encode x as y = (y_1, y_2, ..., y_m).
- Based on i_1, ..., i_k, User computes the index of the bit it needs from each bucket.
- P is applied once for each bucket.
- Complexity
- Time ≈ Σ_{1≤j≤m} T(N_j) ≤ T(N)
- Communication ≈ Σ_{1≤j≤m} C(N_j) ≤ m·C(n)
- Trivial batch codes imply trivial protocols.
- (L, R, L⊕R) code: 2 queries, 1.5× time, 3× communication
21Constructing Batch Codes
22Overview
- Recall notion
- Main qualitative questions
- 1. Can we get arbitrarily high constant rate (n/N = 1−ε) while keeping m feasible in terms of k (say m = poly(k))?
- 2. Can we insist on nearly optimal m (say m = O(k)) and still get close to a constant rate?
- Several incomparable constructions
- Answer both questions affirmatively.
23Batch Codes from Unbalanced Expanders
- By Hall's theorem, the graph represents an (n, N=|E|, m, k) batch code iff every set S containing at most k vertices on the left has at least |S| neighbors on the right.
- Fully captures replication-based batch codes.
24Parameters
- Non-explicit: N = dn, m = O(k·(nk)^{1/(d−1)})
- d = 3: rate = 1/3, m = O(k^{3/2} n^{1/2}).
- d = log n: rate = 1/log n, m = O(k) ⇒ settles Q2
- Explicit (using [TUZ01, CRVW02])
- Nontrivial, but quite far from optimal
- Limitations
- Rate < 1/2 (unless m = Ω(n))
- For const. rate, m must also depend on n.
- Cannot handle multisets.
25The Subcube Code
- Generalize (L, R, L⊕R) example in two ways
- Trade better rate for larger m
- (Y_1, Y_2, ..., Y_s, Y_1 ⊕ ... ⊕ Y_s)
- still k = 2
- Handle larger k via composition
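The composition step can be sketched symbolically. In the illustration below each bucket is represented by the set of data blocks that are XORed into it. The recursive layout (encode each of the s groups, then add a parity group that XORs matching buckets) is one natural reading of "composition", and `subcube_buckets` is a hypothetical name.

```python
def subcube_buckets(blocks, s):
    """Return bucket labels for the composed code; each label is a frozenset
    of block names whose XOR is stored in that bucket.
    Assumes len(blocks) is a power of s."""
    if len(blocks) == 1:
        return [frozenset(blocks)]
    assert len(blocks) % s == 0
    group_size = len(blocks) // s
    groups = [
        subcube_buckets(blocks[g * group_size:(g + 1) * group_size], s)
        for g in range(s)
    ]
    # Parity group: XOR of the matching buckets across the s groups
    # (symmetric difference of the labels).
    parity = []
    for column in zip(*groups):
        label = frozenset()
        for bucket in column:
            label = label ^ bucket
        parity.append(label)
    return [bucket for group in groups for bucket in group] + parity
```

With `subcube_buckets(list("ABCD"), 2)` this yields nine buckets: A, B, A⊕B, C, D, C⊕D, A⊕C, B⊕D and A⊕B⊕C⊕D.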
26Geometric Interpretation
(Figure: the database is split into blocks A, B, C, D; the buckets shown are A, B, A⊕B, C, D, C⊕D, A⊕C, B⊕D, A⊕B⊕C⊕D.)
27Parameters
- N ≈ k^{log(1+1/s)}·n, m ≈ k^{log(s+1)}
- s = O(log k) gives an arbitrary constant rate with m = k^{O(log log k)} ⇒ almost resolves Q1
- Advantages
- Arbitrary constant rate
- Handles multisets
- Very easy decoding
- Asymptotically dominated by subsequent construction.
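The parameter formulas can be sanity-checked numerically. The sketch below assumes that each composition level multiplies the number of buckets by s+1, multiplies the rate by s/(s+1), and doubles k, which is consistent with N ≈ k^{log(1+1/s)}·n and m ≈ k^{log(s+1)} for logarithms to base 2. The function name is hypothetical.

```python
from fractions import Fraction


def subcube_parameters(s, levels):
    """Exact rate, bucket count m and batch size k after `levels` compositions."""
    rate = Fraction(s, s + 1) ** levels
    m = (s + 1) ** levels
    k = 2 ** levels  # assumption: every level doubles k
    return rate, m, k
```

For s = 2 this gives rate 2/3 with m = 3 for k = 2, and rate 4/9 with m = 9 for k = 4.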
28The Gadget Lemma
Primitive multiset batch code
- From now on, we can choose a convenient n and get the same rate and m(k) for arbitrarily larger n.
29Batch Codes vs. Smooth Codes
- Def. A code C: Σ^n → Σ^m is q-smooth if there exists a (randomized) decoder D such that
- D(i) decodes x_i by probing q symbols of C(x).
- Each symbol of C(x) is probed w/prob ≤ q/m.
- Smooth codes are closely related to locally decodable codes [KT00].
- Two-way relation with batch codes
- q-smooth code ⇒ primitive multiset batch code with k = m/q^2 (ideally would like k = m/q).
- Primitive multiset batch code ⇒ (expected) q-smooth for q = m/k
- Batch codes and smooth codes are very different objects
- Relation breaks when relaxing multiset or primitive
- Gap between m/q and m/q^2 is very significant for high rate case
- Best known smooth codes with rate > 1/2 require q > n^{1/2}
- These codes are provably useless as batch codes.
30Batch Codes from RM Codes
- (s,d) Reed-Muller code over F
- Message viewed as s-variate polynomial p over F of total degree (at most) d.
- Encoded by the sequence of its evaluations on all points in F^s
- Case |F| > d is useful due to a smooth decoding feature: p(z) can be extrapolated from the values of p on any d+1 points on a line passing through z.
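The smooth decoding feature can be illustrated with a bivariate polynomial over a small prime field. This is a sketch under stated assumptions: the field size P = 101, the dictionary representation of the polynomial, and the function names are all illustrative, and it requires P > d + 1 and Python 3.8+ (for modular inverses via `pow`).

```python
P = 101  # a small prime, chosen only for illustration


def poly_eval(coeffs, point):
    """Evaluate a bivariate polynomial {(a, b): c}, meaning c * u^a * v^b, mod P."""
    u, v = point
    return sum(c * pow(u, a, P) * pow(v, b, P) for (a, b), c in coeffs.items()) % P


def decode_along_line(oracle, z, direction, d):
    """Recover p(z) from the values of p at the d+1 points z + t*direction,
    t = 1..d+1, without ever querying z itself."""
    ts = range(1, d + 2)
    values = [
        oracle(((z[0] + t * direction[0]) % P, (z[1] + t * direction[1]) % P))
        for t in ts
    ]
    # The restriction of p to the line is a univariate polynomial of degree
    # at most d in t; Lagrange-interpolate it and evaluate at t = 0.
    result = 0
    for t_j, y_j in zip(ts, values):
        num, den = 1, 1
        for t_m in ts:
            if t_m != t_j:
                num = num * (-t_m) % P
                den = den * (t_j - t_m) % P
        result = (result + y_j * num * pow(den, -1, P)) % P
    return result
```

Different directions through z give different sets of probed points, which is what leaves room for spreading k simultaneous queries over distinct buckets.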
31(Figure: points labeled x_1, x_2, ..., x_n)
s = 2, d ≈ (2n)^{1/2}
- Two approaches for handling conflicts
- Replicate each point t times
- Use redundancy to delete intersections
- Slightly increases field size, but still allows
constant rate.
32Parameters
- Rate = (1/s! − ε), m = k^{1+1/(s−1)+o(1)}
- Multiset codes with constant rate (< 1/2)
- Rate = Ω(1/k^ε), m = O(k) ⇒ resolves Q2 for multiset codes as well
- Main remaining challenge: resolve Q1
33The Subset Code
- Choose s, d such that n ≤ (s choose d)
- Each data bit i ∈ [n] is associated with a set T ⊆ [s] of size d
- Each bucket j ∈ [m] is associated with a set S ⊆ [s] of size ≤ d
- Primitive code: y_S = ⊕_{T ⊇ S} x_T
34Batch Decoding the Subset Code
- Lemma: For each T' ⊆ T, x_T can be decoded from all y_S such that S ∩ T = T'.
- Let L_{T,T'} denote the set of such S.
- Note: {L_{T,T'} : T' ⊆ T} defines a partition of the set of buckets.
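The decoding lemma can be checked exhaustively on a toy instance. The sketch below rests on a reconstruction of the garbled formula, so treat it as an assumption: data bits are indexed by d-subsets T of {0, ..., s−1}, buckets by subsets S of size at most d, and y_S is the XOR of x_T over all T containing S. Function names are hypothetical.

```python
from itertools import combinations


def subset_encode(x, s, d):
    """x maps each d-subset T (a frozenset) to a bit.
    Returns y with y[S] = XOR of x[T] over all T that contain S, for |S| <= d."""
    y = {}
    for size in range(d + 1):
        for members in combinations(range(s), size):
            S = frozenset(members)
            bit = 0
            for T, value in x.items():
                if S <= T:
                    bit ^= value
            y[S] = bit
    return y


def decode_from(y, T, T_prime):
    """Recover x[T] by XORing every y[S] whose intersection with T is T_prime."""
    out = 0
    for S, bit in y.items():
        if S & T == T_prime:
            out ^= bit
    return out
```

Under this definition a d-subset U other than T contributes an even number of times to the XOR and cancels, while T itself contributes exactly once; that is why every choice of T' ⊆ T works.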
35Batch Decoding the Subset Code (contd.)
- Goal: Given T_1, ..., T_k, find subsets T'_1 ⊆ T_1, ..., T'_k ⊆ T_k such that the sets L_{T_i,T'_i} are pairwise disjoint.
- Easy if all T_i are distinct or if all T_i are the same.
- Attempt 1: T'_i is a random subset of T_i
- Problem: if T_i, T_j are disjoint, L_{T_i,T'_i} and L_{T_j,T'_j} intersect w.h.p.
- Attempt 2: greedily assign to T_i the largest T'_i such that L_{T_i,T'_i} does not intersect any previous L_{T_j,T'_j}
- Problem: adjacent sets may block each other.
- Solution: pick random T'_i with bias towards large sets.
36Parameters
- Allows arbitrary constant rate with m = poly(k) ⇒ settles Q1
- Both the subcube code and the subset code can be viewed as sub-codes of the binary RM code.
- The full binary RM code cannot be batch decoded when the rate > 1/2.
37Concluding Remarks: Batch Codes
- A common relaxation of very different combinatorial objects
- Expanders
- Locally-decodable codes
- Problem makes sense even for small values of m,k.
- For multiset codes with m = 3, k = 2, rate 2/3 is optimal.
- Open for m ≥ k + 2.
- Useful building block for distributed data
structures.
38Concluding Remarks: PIR
- Single-user amortization is useful in practice only if PIR is significantly more efficient than download.
- Certainly true for multi-server PIR
- Most likely true also for single-server PIR
- Killer app for lattice-based cryptosystems?