Title: Poorvi Vora
1Information Theory and the Security of Binary
Data Perturbation
- Poorvi Vora
- Dept. of Computer Science
- George Washington University
2Statistical Database
- Database A
- Q = q1, q2, ..., qi, ... (queryable bits) and
- S = s1, s2, ..., si, ... (sensitive bits).
- Data collector B can ask for
- fi(q1, q2, q3, ...) = Xi, qj ∈ Q
3The statistical database security problem
- Can query multiple
- fi(q1, q2, q3, ...) = Xi, qj ∈ Q
- And simultaneously solve
- (perfect zk protocols do not leak additional
information about xi, but Ai are revealed thus
not a traditional cryptographic problem)
4Random Data Perturbation (RDP)
- Used in public health community for twenty odd
years, can be used together with cryptographic
techniques
- If xi perturbed each time, the simultaneous
equations are inconsistent
- fi(q1 + η1i, q2 + η2i, q3 + η3i, ...) = Xi + ηi
- Security and attack characterization open problem
for 20 years though many attempts (Denning,
Adams, Duncan, Landers).
5RDP
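[Figure: two perturbation examples. Salary: values 25,000 and 40,000, noise axis from -25,000 to 25,000, distributions F(x) and G(x). HIV? (answer Yes): binary channel on bits 0 and 1 with transition probabilities q and p = 1-q.]
- stats. over many are accurate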
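The claim that statistics over many records stay accurate can be illustrated with a minimal sketch. It assumes the binary case is a bit flipped independently with probability q; the population size, the value of q, the true fraction 0.1, and all function names are hypothetical choices for illustration.

```python
import random

def randomized_response(true_bits, q, rng):
    """Flip each bit independently with probability q (assumed lie probability)."""
    return [b ^ int(rng.random() < q) for b in true_bits]

def estimate_fraction(reported_bits, q):
    """Unbiased estimate of the true fraction of 1s; requires q != 0.5.

    E[observed] = q + fraction * (1 - 2q), so we invert that relation.
    """
    observed = sum(reported_bits) / len(reported_bits)
    return (observed - q) / (1 - 2 * q)

rng = random.Random(0)          # fixed seed so the run is repeatable
n, q = 200_000, 0.3             # hypothetical population size and lie probability
true_bits = [int(rng.random() < 0.1) for _ in range(n)]
reported = randomized_response(true_bits, q, rng)

true_fraction = sum(true_bits) / n
estimate = estimate_fraction(reported, q)
print(f"true fraction {true_fraction:.4f}, estimate {estimate:.4f}")
```

Any single reported bit is wrong 30% of the time here, yet the aggregate estimate lands close to the true fraction.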
6 Known Security Property of RDP
- m repeated queries
- Em: probability of error
- Em → 0 ⇒ m → ∞
- Chernoff Bound
- m ≥ ln(2/δ) / (0.38 ε²) ⇒ Em < δ
- Probability of lie = 0.5 - ε
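A small sketch of the repeated-query bound, assuming the attacker decodes by majority vote over m independent answers, each a lie with probability 0.5 - ε. The function names and the values of δ and ε are illustrative, not taken from the slides.

```python
import math

def chernoff_queries(delta, eps):
    """Smallest integer m with m >= ln(2/delta) / (0.38 * eps**2)."""
    return math.ceil(math.log(2 / delta) / (0.38 * eps ** 2))

def majority_error(m, eps):
    """Exact probability that a majority vote over m answers is wrong.

    Each answer is a lie independently with probability 0.5 - eps.
    Ties (possible when m is even) are counted as errors.
    Terms are computed in log space to avoid overflow for large m.
    """
    lie = 0.5 - eps
    total = 0.0
    for k in range((m + 1) // 2, m + 1):      # k = number of lies
        log_term = (math.lgamma(m + 1) - math.lgamma(k + 1)
                    - math.lgamma(m - k + 1)
                    + k * math.log(lie) + (m - k) * math.log(1 - lie))
        total += math.exp(log_term)
    return total

delta, eps = 0.05, 0.1                         # hypothetical target error and bias
m = chernoff_queries(delta, eps)
print(m, majority_error(m, eps))
```

The exact error at that m is far below δ, which is expected: the bound is sufficient, not tight. Note that the required m grows as δ shrinks.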
7A simple inference attack
- Query 1 Female?
- Query 2 Over 40?
- Query 3 Losing Calcium?
- Really asking about age and gender
- How does one characterize all such attacks?
- What can one say about security wrt such attacks?
8Our definitions
- Definition
- An inference attack is a set of queries x not
independent of the set of sensitive bits S, i.e.
- I(S; x) ≠ 0
- Definition
- A small error inference attack is one in which
- lim m→∞ Em = 0.
- Definition
- The query complexity per bit, of query sequence x
of length m, as a means of distinguishing among M
possible values of x is
- γm = m/log2 M.
9Recall attack example
- Query 1 Female?
- Query 2 Over 40?
- Query 3 Losing Calcium?
- Query 3 checks answers to Query 1 and 2
- Is a parity-check bit of sorts, but not quite
- If 1 and 2 independent, γ = 3/2
- Em → 0 ⇒ γm → ∞ ?
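The query complexity per bit of this example can be computed directly from the definition γm = m/log2 M. The sketch assumes the two target bits (gender, age bracket) are independent and equally likely, so M = 4; the repetition case, one bit asked m times, is included for contrast. The function name is hypothetical.

```python
import math

def query_complexity_per_bit(m, M):
    """gamma_m = m / log2(M): queries spent per bit of information sought."""
    return m / math.log2(M)

# Three queries to pin down two independent, equally likely bits (M = 4)
gamma_attack = query_complexity_per_bit(m=3, M=4)

# Repetition: the same one-bit query (M = 2) asked m times
gamma_repetition = [query_complexity_per_bit(m, 2) for m in (1, 10, 100)]

print(gamma_attack, gamma_repetition)   # prints: 1.5 [1.0, 10.0, 100.0]
```

For repetition the complexity per bit equals m, so driving the error to zero by repetition alone sends it to infinity.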
10Our analogy (ISIT 03)
- All attacks are communication over channel
- When attacks are codes, x = f(S)
- What B queries is a codeword bit
- What B receives is the transmitted codeword that
he decodes
11Shannon's theorems apply when x = f(S) and γ
constant (ISIT 03)
- Assuming
- x = f(S) (including adaptive, related queries):
queries are channel codes
- γ constant: reliable transmission
- Result
- Em → 0 ⇒ γ ≥ 1/C
- Above this bound, Em → 0 exponentially,
- Below it, Em increases exponentially
12What about the general zero-error inference
attack?
- Not all inference attacks are codes, i.e. x ≠
f(S).
- γ is not necessarily kept constant as m → ∞, i.e.
transmission is not necessarily reliable.
13Thm. 1
- lim m→∞ Em = 0
- ⇒ ∃ {βm}, m = 1, ..., ∞ s.t. γi ≥ βm ∀ i ≥ m, lim m→∞ βm = 1/C
- Proof modifies the converse of Shannon's proof of
the channel coding theorem
14The Proof
- log2 M = H(s^m) = H(s^m | y^m) + I(s^m; y^m)
- ≤ 1 + Em log2 M + I(s^m; y^m)
- ≤ 1 + Em log2 M + mC
- γm = m/log2 M ≥ (1 - Em)/(1/m + C) = βm
- lim m→∞ βm = 1/C
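The last two steps of the proof can be checked numerically. This is a sketch under two assumptions not fixed by the slides: the perturbation is modelled as a binary symmetric channel with lie probability 0.5 - ε (here ε = 0.1), and the error sequence is the hypothetical Em = 1/m. The names `beta` and `bsc_capacity` are illustrative.

```python
import math

def binary_entropy(p):
    """H(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(lie):
    """Capacity in bits per query of a binary symmetric channel."""
    return 1 - binary_entropy(lie)

def beta(m, error_m, capacity):
    """Lower bound on query complexity: (1 - E_m) / (1/m + C)."""
    return (1 - error_m) / (1 / m + capacity)

capacity = bsc_capacity(0.5 - 0.1)
for m in (10, 100, 1_000, 1_000_000):
    error_m = 1 / m                      # hypothetical error sequence, E_m -> 0
    print(m, round(beta(m, error_m, capacity), 3))
print("1/C =", round(1 / capacity, 3))
```

The bound stays below 1/C for every finite m and approaches it as m grows, which is the limit the proof ends on.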
15Thm. 2
- Small error attacks with constant γ ≥ 1/C exist.
- Proof: Follows from channel coding theorem
16Thm. 3
- For data of entropy H, stationary record
sequence, Nr records, and γm the number of
queries per record,
- lim m→∞ Em = 0
- ⇒ ∃ {βm}, m = 1, ..., ∞ s.t. γi ≥ βm ∀ i ≥ m, lim m→∞ βm = H/C
- Proof: Modification of source-channel coding
theorem
17Proof
- Given Theorem 1, smaller lengths can be shown to
violate Shannon's source coding theorem when the
data is stationary.
18Corollary
- γm ≈ ln 2/(2ε²)
- When p = 0.5 - ε
- For any probability of error
- Different from Chernoff bound, does not increase
with a smaller probability of error
- This is the improvement bought over the
repetition code
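A sketch comparing the three quantities in play, assuming a binary symmetric channel with lie probability 0.5 - ε: the exact 1/C, the small-ε approximation ln 2/(2ε²), and the repetition-code query count from the Chernoff bound for several target errors δ. The value ε = 0.05 and the δ values are hypothetical.

```python
import math

def bsc_capacity(lie):
    """Capacity in bits per query of a binary symmetric channel, 0 < lie < 1."""
    entropy = -lie * math.log2(lie) - (1 - lie) * math.log2(1 - lie)
    return 1 - entropy

def chernoff_queries(delta, eps):
    """Repetition-code query count: ln(2/delta) / (0.38 * eps**2)."""
    return math.log(2 / delta) / (0.38 * eps ** 2)

eps = 0.05
inverse_capacity = 1 / bsc_capacity(0.5 - eps)
approximation = math.log(2) / (2 * eps ** 2)

for delta in (1e-1, 1e-3, 1e-6):
    print(delta, round(chernoff_queries(delta, eps)),
          round(inverse_capacity, 1), round(approximation, 1))
```

The Chernoff column grows as δ shrinks, while the other two columns do not depend on δ at all.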
19Where to?
- Block Ciphers as channels for properties of the
key (Filiol, ePrint 2003)
- Attacks on Stream Ciphers as codes over key bits
(Johansson et al, Golic et al, Filiol et al)
- It appears there is a framework (Vora, working
documents)
- all statistical attacks as channel communication
- efficient attacks as codes
- related-input (key, message) attacks as
concatenated codes¹
- Wagner's Cryptanalytic Model (FSE 03) to
determine inner codes
- Do related-key attacks provide an improvement in
efficiency over repeated key attacks?
- ¹ Filiol shows the repeated key attack on block
ciphers as a concatenated code with the outer
code as the repetition code
20Also traffic analysis, e.g. Crowds (Reiter and
Rubin / Lucent and AT&T)
- N nodes, C colluding, pf = probability of forwarding
- At node i+1: probability that node i originated
the message (probability of truth) = 1 - pf(N-C-1)/N
- Probability of any other non-collaborating node
originating the message = pf/N
- Observable information changes the pdf on the
data of interest: the originator of the message
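The two probabilities on this slide, 1 - pf(N-C-1)/N for the observed predecessor and pf/N for each other non-collaborating node, should form a distribution over the N-C honest nodes. A minimal check, with hypothetical values for N, C and pf and illustrative function names:

```python
def predecessor_is_originator(N, C, pf):
    """Probability that the node seen by a colluder originated the message."""
    return 1 - pf * (N - C - 1) / N

def other_node_is_originator(N, pf):
    """Probability that one specific other honest node originated it."""
    return pf / N

N, C, pf = 20, 3, 0.75                 # hypothetical crowd parameters
p_truth = predecessor_is_originator(N, C, pf)
p_other = other_node_is_originator(N, pf)

# One predecessor plus N - C - 1 other honest nodes
total = p_truth + (N - C - 1) * p_other
print(p_truth, p_other, total)
```

The total is 1 up to floating-point rounding, so observing the predecessor shifts probability mass toward it without excluding anyone.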
21The Crowds protocol as a simplex channel
- X: set of originator nodes 0, ..., N-3
- Y: set of predecessor nodes 0, ..., N-3
- F: X → Y, F(X) = Y
- Assumption: all senders equally likely
- P(Y = j | X = i) = pij = pf/N if i ≠ j, 1 - pf(N-2)/N if i = j
22The Crowds protocol
- C = log N + [1 - (N-2)pf/N] log[1 - (N-2)pf/N] + (N-2)(pf/N) log(pf/N)
- C = 2 log 2/N if pf = 1
- 2 log 2/N (N-1)δ² if pf = 1 - δ
- Average path length = (1 - δ)/δ = O(1/δ)
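A numeric sketch of the capacity on this slide. It rests on an assumption: that the capacity takes the symmetric-channel form C = log N + d log d + (N-2) o log o, with d = 1 - (N-2)pf/N and o = pf/N. The log N term and the (N-2) factor are not legible in the slide text; they are included here because this form gives exactly 2 log 2/N at pf = 1. Natural logarithms are used, and the function names are hypothetical.

```python
import math

def crowds_capacity(N, pf):
    """Assumed symmetric-channel capacity (nats) for 0 < pf <= 1."""
    d = 1 - (N - 2) * pf / N      # predecessor is the originator
    o = pf / N                    # each of the other N - 2 honest nodes
    return math.log(N) + d * math.log(d) + (N - 2) * o * math.log(o)

def average_path_length(pf):
    """(1 - delta)/delta with pf = 1 - delta, i.e. pf / (1 - pf); pf < 1."""
    return pf / (1 - pf)

N = 50
print(crowds_capacity(N, 1.0), 2 * math.log(2) / N)
for pf in (0.99, 0.9, 0.5):
    print(pf, round(crowds_capacity(N, pf), 4), round(average_path_length(pf), 1))
```

Under this assumed form, lowering pf shortens paths but raises the capacity available to the observer.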
23The replay attack on Crowds
- Repetition code ⇔ resending message, along
different (randomly chosen) route
- How about attacks corresponding to other codes?
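The replay attack can be sketched as majority decoding of a repetition code. The sketch assumes a single colluding node, honest nodes labelled 0 to N-2 (so that each row of the channel sums to one), and the transition probabilities 1 - pf(N-2)/N and pf/N; the parameter values and function names are hypothetical.

```python
import random
from collections import Counter

def observed_predecessor(originator, N, pf, rng):
    """One use of the assumed channel.

    The colluder sees the true originator with probability 1 - pf*(N-2)/N,
    otherwise one of the other N-2 honest nodes, chosen uniformly
    (probability pf/N each).
    """
    if rng.random() < 1 - pf * (N - 2) / N:
        return originator
    others = [node for node in range(N - 1) if node != originator]
    return rng.choice(others)

def replay_attack(originator, N, pf, m, rng):
    """Force m resends and guess the most frequently observed predecessor."""
    counts = Counter(observed_predecessor(originator, N, pf, rng)
                     for _ in range(m))
    return counts.most_common(1)[0][0]

rng = random.Random(1)                    # fixed seed so the run is repeatable
guess = replay_attack(originator=3, N=10, pf=0.75, m=200, rng=rng)
print(guess)
```

A single observation identifies the originator only 40% of the time with these parameters; 200 resends make the majority guess reliable.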