Title: Private Analysis of Data Sets
1Private Analysis of Data Sets
- Benny Pinkas
- HP Labs, Princeton
2A story
We're experiencing a lot of fraud lately
Here too..
I can't find a pattern to recognize fraud in advance..
Neither can I..
- But, what about
- Patients' privacy
- Business secrets
Maybe we should share information..
Have you heard of secure function evaluation?
This is all theory. It can't be efficient.
3New Opportunities for Interaction
- Between
- Enterprises, and government agencies holding sensitive data.
- P2P users
- Mobile wireless crowds (PDAs, cell phones)
- What about privacy?
- A bidirectional approach
- Finding what is actually needed
- Designing useful and efficient cryptographic tools
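4Cryptographic Protocols for Privacy Preserving Computation
[Figure: two parties with inputs x and y run a protocol. Output: F(x,y) and nothing else, as if a trusted party had received x and y and returned F(x,y) to each of them.]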
5Does the trusted party scenario make sense?
[Figure: the parties send x and y to a trusted party, which returns F(x,y) to each of them.]
- We cannot hope for more privacy
- Does the trusted party scenario make sense?
- Are the parties motivated to submit their true inputs?
- Can they tolerate the disclosure of F(x,y)?
- If so, we can implement the scenario without a
trusted party.
6Secure Function Evaluation [Yao, GMW, BGW]
- F(x,y): a public function.
- Represented as a Boolean circuit C(x,y).
- Implementation
- O(|X|) oblivious transfers. O(|C|) communication.
- Pretty efficient for small circuits! (But what about larger circuits?)
7An equality circuit
[Figure: a circuit with inputs x and y. Output: 1 if x = y, 0 otherwise.]
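To make the equality circuit concrete, here is a minimal sketch in Python that evaluates it gate by gate on bit vectors. The helper names `to_bits` and `equality_circuit` are illustrative and not from the talk. It only shows the Boolean structure (one XOR, one NOT and one AND per input bit); it is evaluated in the clear, with no garbling or oblivious transfer.

```python
def to_bits(value, width):
    """Little-endian list of `width` bits of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def equality_circuit(x_bits, y_bits):
    """Return 1 if the bit vectors are equal, 0 otherwise.

    Uses only XOR, NOT and AND, so the gate count is linear in the bit length.
    """
    assert len(x_bits) == len(y_bits)
    result = 1
    for xb, yb in zip(x_bits, y_bits):
        agree = 1 ^ (xb ^ yb)  # XNOR: 1 exactly when the two bits agree
        result &= agree        # AND all per-bit agreement flags together
    return result


if __name__ == "__main__":
    print(equality_circuit(to_bits(13, 8), to_bits(13, 8)))
    print(equality_circuit(to_bits(13, 8), to_bits(12, 8)))
```

The circuit size grows with the bit length of the inputs, not with their value, which is why such a comparison counts as a small circuit.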
8Cryptographic methods vs. randomization methods
[Figure: trade-off between overhead, inaccuracy and lack of privacy, with "Our goal" marked.]
9Examples of Simple Privacy Preserving Primitives
(with reasonable solutions)
- Is X = Y? Is X > Y?
- What is X ∩ Y? What is the median of X ∪ Y?
- Auctions (negotiations). Many parties, private bids. Compute the winning bidder and the sale price, but nothing else. [NPS]
- Voting
- Add privacy to data mining algorithms (ID3 [LP])
10Private Set Intersection
- with
- Mike Freedman, NYU
- Kobbi Nissim, MSR
11Applications of Set Intersection
Government agency B
Government agency A
People on welfare
Expensive car buyers
Compute intersection and nothing else
12Computing the Intersection
- Private Equality Test (PET)
- Alice: x. Bob: y.
- Output 1 iff x = y
- Privacy preserving solutions
- Cannot use hash functions alone
- [Yao, FNW, NP]
- Generalization: list intersection
- X = {x1, ..., xn}, Y = {y1, ..., yn}
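The slide does not say why hash functions alone are not enough. One standard reason, offered here as an interpretation rather than as the speaker's argument, is that inputs often come from a small domain, so whoever receives a hash can test every candidate. The sketch below assumes a hypothetical 4-digit input and uses SHA-256 from Python's standard library.

```python
import hashlib


def h(value):
    """Hash an integer with SHA-256 (any public hash behaves the same way here)."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def recover_input(digest, domain):
    """Brute-force search: return the domain element whose hash matches, if any."""
    for candidate in domain:
        if h(candidate) == digest:
            return candidate
    return None


# Hypothetical scenario: Alice sends h(x) so that Bob can test whether x == y.
alice_secret = 4711
message_to_bob = h(alice_secret)

# Bob learns x even when x != y, simply by trying the whole (small) domain.
recovered = recover_input(message_to_bob, range(10_000))

if __name__ == "__main__":
    print(recovered)
```

A private equality test must reveal only the single bit "equal or not", which this naive exchange does not achieve.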
13The basic tool: Homomorphic Encryption
- Semantically secure public key encryption
- Given Enc(M1), Enc(M2), can compute (without knowing the decryption key)
- Enc(M1 + M2)
- Enc(c · M1) for any constant c.
- I.e. Enc(a0) · Enc(a1)^x · ... · Enc(an)^(x^n) = Enc(P(x))
- Examples: El Gamal, Paillier, DJ.
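The two homomorphic operations can be tried out with a toy version of Paillier, one of the schemes named above. This is a minimal sketch of textbook Paillier with g = n + 1. The primes are two well-known Mersenne primes chosen only so the example runs instantly; they are far too small and too structured for real use. The function names (`enc`, `dec`, `add`, `scale`, `eval_encrypted_poly`) are illustrative. Requires Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import random

# Toy key: NOT secure, for illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                               # inverse of LAM mod N


def enc(m):
    """Enc(m) = (1 + N)^m * r^N mod N^2, with fresh randomness r."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def dec(c):
    """Recover m from a ciphertext (needs the secret values LAM and MU)."""
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N


def add(c1, c2):
    """Enc(M1), Enc(M2) -> Enc(M1 + M2): multiply the ciphertexts."""
    return c1 * c2 % N2


def scale(c, k):
    """Enc(M), constant k -> Enc(k * M): raise the ciphertext to the power k."""
    return pow(c, k % N, N2)


def eval_encrypted_poly(enc_coeffs, x):
    """Given [Enc(a0), ..., Enc(an)] and a plaintext x, return Enc(P(x)) (Horner)."""
    acc = enc_coeffs[-1]
    for c in reversed(enc_coeffs[:-1]):
        acc = add(scale(acc, x), c)
    return acc


if __name__ == "__main__":
    print(dec(add(enc(20), enc(22))))
    print(dec(eval_encrypted_poly([enc(3), enc(2), enc(1)], 5)))
```

Neither `add`, `scale` nor `eval_encrypted_poly` touches the decryption key, which is the property the protocol relies on.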
14The Scenario
- Client: X = {x1, ..., xn}
- Server: Y = {y1, ..., yn}
- Output
- Client learns X ∩ Y.
- Server learns nothing.
15The Protocol
- Client defines a polynomial of degree n whose roots are x1, ..., xn
- P(y) = (x1 - y)(x2 - y)...(xn - y)
- = an·y^n + ... + a1·y + a0
- Sends to server homomorphic encryptions of coefficients
- Enc(an), ..., Enc(a0)
- (only the client can decrypt)
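The client's first step, expanding the product of (xi - y) into coefficients, can be sketched in the clear as follows. Arithmetic is done modulo a hypothetical `modulus`, standing in for the plaintext space of whatever encryption scheme is used. The function names are illustrative.

```python
def coefficients_from_roots(roots, modulus):
    """Return [a0, a1, ..., an] with P(y) = (x1 - y)...(xn - y) mod `modulus`."""
    coeffs = [1]  # the constant polynomial 1, lowest-degree coefficient first
    for x in roots:
        nxt = [0] * (len(coeffs) + 1)
        for j, a in enumerate(coeffs):
            nxt[j] = (nxt[j] + x * a) % modulus      # contribution of  x * a * y^j
            nxt[j + 1] = (nxt[j + 1] - a) % modulus  # contribution of -a * y^(j+1)
        coeffs = nxt
    return coeffs


def evaluate(coeffs, y, modulus):
    """Evaluate the polynomial at y with Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * y + a) % modulus
    return acc


if __name__ == "__main__":
    coeffs = coefficients_from_roots([2, 5], 101)
    print(coeffs)                    # (2 - y)(5 - y) = 10 - 7y + y^2 mod 101
    print(evaluate(coeffs, 2, 101))  # 2 is a root
```

Every client input evaluates to zero and any other value evaluates to something non-zero, as long as the modulus behaves like a field for the values involved.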
16The Protocol
- Server uses homomorphic properties to compute
- ∀y ∈ Y: Enc(r·P(y) + y) (r is random)
- If y ∈ X∩Y the result is Enc(r·0 + y) = Enc(y), otherwise the result is Enc(random).
- Server sends (permuted) results to C.
- C decrypts, compares to its list.
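Putting both protocol slides together, the whole exchange can be sketched end to end. This is a toy sketch under stated assumptions, not the authors' implementation: it uses textbook Paillier with two Mersenne primes as an insecure demo key, draws a fresh r for every y, and the function names (`client_request`, `server_response`, `client_output`) are made up for the example. Requires Python 3.8+.

```python
import math
import random

# Toy Paillier key held by the client: NOT secure, for illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def enc(m):
    """Public-key operation: Enc(m) = (1 + N)^m * r^N mod N^2."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def dec(c):
    """Secret-key operation, available to the client only."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def client_request(xs):
    """Client: encrypt the coefficients of P(y) = (x1 - y)...(xn - y)."""
    coeffs = [1]
    for x in xs:
        nxt = [0] * (len(coeffs) + 1)
        for j, a in enumerate(coeffs):
            nxt[j] = (nxt[j] + x * a) % N
            nxt[j + 1] = (nxt[j + 1] - a) % N
        coeffs = nxt
    return [enc(a) for a in coeffs]  # [Enc(a0), ..., Enc(an)]


def server_response(enc_coeffs, ys):
    """Server: for each y compute Enc(r*P(y) + y) without the decryption key."""
    results = []
    for y in ys:
        acc = enc_coeffs[-1]
        for c in reversed(enc_coeffs[:-1]):  # Horner's rule on ciphertexts
            acc = pow(acc, y, N2) * c % N2
        r = random.randrange(1, N)           # fresh random mask for this y
        results.append(pow(acc, r, N2) * enc(y) % N2)
    random.shuffle(results)                  # send results in permuted order
    return results


def client_output(xs, responses):
    """Client: decrypt and keep the values that appear in its own list."""
    decrypted = {dec(c) for c in responses}
    return {x for x in xs if x in decrypted}


if __name__ == "__main__":
    client_set = [3, 17, 42]
    server_set = [17, 99, 3, 1000]
    print(client_output(client_set, server_response(client_request(client_set), server_set)))
```

For a server item outside X the client decrypts a masked value that matches one of its own items only with negligible probability, so it learns the intersection and (up to that probability) nothing about the other server items.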
17Security
- Bad server? The server only sees semantically secure encryptions. Learning about C's input means breaking the encryption.
- Bad client? The client can, given only the output X∩Y, simulate her view in the protocol. (I.e. she generates encryptions of items in X∩Y, and of random items.)
18Efficiency
- Client encrypts and decrypts n values
- Communication is O(n)
- Server
- For each input computes Enc(r·P(y) + y), i.e. n exponentiations.
- Total: O(n^2) exponentiations
- Can use hashing to reduce overhead to O(n ln ln n).
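The effect of hashing on the server's work can be illustrated by counting exponentiations. The slide does not say which hashing scheme is meant. The sketch below assumes plain single-hash bucketing with one polynomial per bin, each padded to the largest bin size. That already cuts the cost from n·n to n·(largest bin) but is not claimed to reach the O(n ln ln n) bound stated above. The names `bin_of` and `server_work_hashed` are illustrative.

```python
import hashlib


def bin_of(item, num_bins):
    """Public hash function that both parties use to map an item to a bin."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def server_work_naive(client_items, server_items):
    """One degree-n polynomial: n exponentiations for each server item."""
    return len(client_items) * len(server_items)


def server_work_hashed(client_items, server_items, num_bins):
    """One polynomial per bin, all padded to the maximum bin load (an assumption).

    The server evaluates only the polynomial of the bin its item hashes to.
    Returns (total exponentiations, polynomial degree).
    """
    loads = [0] * num_bins
    for x in client_items:
        loads[bin_of(x, num_bins)] += 1
    degree = max(loads)
    return degree * len(server_items), degree


if __name__ == "__main__":
    client_items = list(range(1000))
    server_items = list(range(500, 1500))
    print(server_work_naive(client_items, server_items))
    print(server_work_hashed(client_items, server_items, num_bins=250))
```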
19Is Approximation easier?
- Can we approximate the size of the intersection (i.e. scalar product) with sublinear overhead?
- Lower bound:
- Approximating |X∩Y| within a 1 ± ε factor requires Ω(n) communication (∀ constant ε).
- True even for randomized algorithms.
- Proof: reduction to Razborov's lower bound for Disjointness.
- Upper bound: protocols with matching overhead.
20Secure Computation of the Kth-ranked element
- with
- Gagan Aggarwal, Stanford
- Nina Mishra, HPL
21Secure Computation of the Kth-ranked element
- Inputs
- A: SA. B: SB.
- Large sets of unique items (∈ D).
- There's also the multi-party scenario
- Output: x ∈ SA ∪ SB
- s.t. |{y : y < x, y ∈ SA ∪ SB}| = k - 1
- Median: k = (|SA| + |SB|) / 2
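As a reference point, this is what a trusted party would compute if both sets were handed to it in the clear. It is a plain, non-private sketch that assumes all items are distinct, as on the slide. The function names are illustrative.

```python
def kth_ranked(sa, sb, k):
    """Return the element x of the union with exactly k - 1 smaller elements."""
    union = sorted([*sa, *sb])  # items are assumed to be unique
    return union[k - 1]


def median_rank(sa, sb):
    """Rank used for the median: k = (|SA| + |SB|) / 2, rounded down."""
    return (len(sa) + len(sb)) // 2


if __name__ == "__main__":
    sa, sb = [1, 5, 9], [2, 4, 8]
    print(kth_ranked(sa, sb, median_rank(sa, sb)))
```

The secure protocols aim to produce the same output while revealing nothing beyond it.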
22Motivation
- Basic statistical analysis of distributed data
- E.g. histogram of salaries in competing businesses in the same area
- Sometimes the parties might want to hide the size of their inputs
23Some information is always revealed
- The Kth-ranked element reveals some information
- Suppose SA = {x1, ..., x1000}
- Median of SA ∪ SB = x400
- Party A now learns that SB contains at least 200 elements smaller than x400
- But she shouldn't learn more
24Results, and previous work
- Previous work: generic constructions, overhead at least linear in k.
- New results
- Two-party: log k secure comparisons of (log D)-bit numbers.
- Multi-party: log D simple computations with (log D)-bit numbers.
25An (insecure) two-party median protocol
[Figure: SA is split at its median mA into LA and RA. SB is split at its median mB into LB and RB. Case shown: mA < mB.]
LA lies below the median, RB lies above the median. New median is same as original median.
Recursion: need log n rounds (suppose each set contains 2^i items)
26Secure two-party median protocol
- A finds median of SA, call it mA. B finds median of SB, call it mB.
- Secure comparison (e.g. a small circuit): mA < mB?
- YES: A deletes x∈SA s.t. x < mA. B deletes x∈SB s.t. x > mB.
- NO: A deletes x∈SA s.t. x > mA. B deletes x∈SB s.t. x < mB.
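The control flow of the two-party median protocol can be sketched as a local simulation. Two assumptions go beyond the slide. First, `secure_less_than` is a hypothetical stand-in that compares in the clear; in a real run it would be a secure comparison protocol. Second, each round discards exactly half of each set by rank, so sizes stay powers of two; this differs slightly from the strict inequalities in the flowchart. The last step takes the minimum of the two remaining items, which in a real run would also be a secure computation.

```python
def secure_less_than(a, b):
    """Hypothetical placeholder for a secure comparison; here it is NOT secure."""
    return a < b


def median_of_union(sa, sb):
    """Median (rank |SA| = |SB|) of the union of two equal-size sets.

    Both sets must have the same power-of-two size. Each party only ever
    learns the outcome of the comparisons, one per round.
    """
    sa, sb = sorted(sa), sorted(sb)
    assert len(sa) == len(sb) and len(sa) & (len(sa) - 1) == 0
    while len(sa) > 1:
        half = len(sa) // 2
        m_a, m_b = sa[half - 1], sb[half - 1]  # each party's own median
        if secure_less_than(m_a, m_b):
            sa, sb = sa[half:], sb[:half]      # A drops low half, B drops high half
        else:
            sa, sb = sa[:half], sb[half:]      # A drops high half, B drops low half
    return min(sa[0], sb[0])


if __name__ == "__main__":
    print(median_of_union([1, 3, 5, 7], [2, 4, 6, 8]))
```

With sets of size 2^i the loop runs i times, so the number of comparisons is logarithmic in the set size.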
27Proof of security
- Simulation: given the protocol's output, each party can simulate the execution of the protocol.
[Figure: SA with the median marked. First comparison: mA < mB. Second comparison: mA > mB.]
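The simulation argument can be checked on small inputs: knowing only its own set and the final median, party A can predict every comparison result, because in each round mA < mB holds exactly when mA is below the median. The sketch below assumes distinct items and the rank-based halving variant of the protocol (an assumption of this example); the comparison is done in the clear and the function names are illustrative.

```python
def run_protocol(sa, sb):
    """Run the median protocol in the clear; return (median, comparison bits)."""
    sa, sb = sorted(sa), sorted(sb)
    bits = []
    while len(sa) > 1:
        half = len(sa) // 2
        bit = sa[half - 1] < sb[half - 1]  # result of "mA < mB" in this round
        bits.append(bit)
        if bit:
            sa, sb = sa[half:], sb[:half]
        else:
            sa, sb = sa[:half], sb[half:]
    return min(sa[0], sb[0]), bits


def simulate_for_a(sa, median):
    """Party A's simulator: uses only A's own input and the protocol output."""
    sa = sorted(sa)
    bits = []
    while len(sa) > 1:
        half = len(sa) // 2
        bit = sa[half - 1] < median        # predicts the comparison result
        bits.append(bit)
        sa = sa[half:] if bit else sa[:half]
    return bits


if __name__ == "__main__":
    sa, sb = [1, 3, 5, 7], [2, 4, 6, 8]
    median, real_bits = run_protocol(sa, sb)
    print(real_bits == simulate_for_a(sa, median))
```

Since A can reproduce the comparison results from the output alone, those results carry no extra information about SB.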
28Arbitrary inputs, arbitrary k
[Figure: SA and SB, with the sizes k and 2^i marked.]
Now, compute the median of two sets of size k.
Size should be a power of 2.
Median of new inputs = kth element of original inputs.
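One way to carry out this reduction is sketched below. The padding rule is an assumption of this example, since the slide does not spell it out: each party keeps its k smallest items and pads with +∞ up to size k; then A pads with +∞ and B pads with -∞ up to the next power of two, so that the median of the padded sets has rank k among the original items. The comparison is done in the clear and all names are illustrative.

```python
import math


def median_of_union(sa, sb):
    """Median protocol on two sorted lists of the same power-of-two size."""
    while len(sa) > 1:
        half = len(sa) // 2
        if sa[half - 1] < sb[half - 1]:    # stand-in for the secure comparison
            sa, sb = sa[half:], sb[:half]
        else:
            sa, sb = sa[:half], sb[half:]
    return min(sa[0], sb[0])


def prepare(items, k, pad_low):
    """Keep the k smallest items, pad to size k, then to the next power of two."""
    size = 1 << (k - 1).bit_length()       # smallest power of two >= k
    smallest = sorted(items)[:k]
    smallest += [math.inf] * (k - len(smallest))
    extra = size - k
    if pad_low:
        return [-math.inf] * extra + smallest
    return smallest + [math.inf] * extra


def kth_ranked_via_median(sa, sb, k):
    """kth-ranked element of the union, computed as a median of padded inputs."""
    return median_of_union(prepare(sa, k, pad_low=False), prepare(sb, k, pad_low=True))


if __name__ == "__main__":
    print(kth_ranked_via_median([1, 5, 9, 12, 20], [2, 4, 8], 3))
```

The -∞ items on B's side shift the target rank up by exactly the number of padding items, which is what makes the median of the new inputs coincide with the kth element of the originals.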
29Conclusions
- Efficient privacy preserving primitives for basic tasks
- Open problems
- Intersection: approximate matching?
- Median: clustering?
- Theory and applications can and should interact
- Tools from the theory of cryptography (e.g. SFE) can be used in applications
- Applications can benefit from rigorous analysis
- There's a lot more to be done