Title: Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining
1Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining
- Yitao Duan and John Canny
- http://www.cs.berkeley.edu/duan
- Berkeley Institute of Design
- Computer Science Division
- University of California, Berkeley
2Goal
- To provide practical solutions with provable
privacy and adequate efficiency in a realistic
adversary model at reasonably large scale
4The Scenario
- Two data miners mine data from n users
- The data miners are semi-honest: they follow the protocol but try to get more info
- Some fraction of users can be malicious: they may input bogus data to disrupt the computation
- A more realistic adversary model than most existing privacy-preserving data mining schemes
5Model
- $d_i \in \mathbb{Z}_\phi^m$, where $\phi$ fits in 32 or 64 bits
6A Practical Solution
- Provable privacy: cryptography
- Efficiency:
- VSS over small field
- Minimize the number of expensive primitives and rely on probabilistic guarantees
- Realistic adversary model: an extremely efficient zero-knowledge proof to bound the L2-norm of a user's vector, which is an effective way to limit the influence malicious users could have on the computation
7Basic Approach
[Figure: diagram labeled $\Sigma$ and $\phi$]
8The Power of Addition
- A large number of popular algorithms can be run with addition-only steps
- Linear algorithms: voting and summation; nonlinear algorithms: regression, SVD, PCA, k-means, ID3, EM, etc.
- All algorithms in the statistical query model [Kearns 93]
- Many other gradient-based numerical algorithms
- A trick used a lot for parallelization in distributed computing [Chu 06, Das 07]
- Addition-only framework has very efficient private implementation in cryptography and admits efficient ZKPs
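To make the addition-only idea concrete, here is a minimal sketch of one k-means iteration in which the only cross-user operation is a vector sum. The function names (`local_contribution`, `add_vectors`, `kmeans_step`) are hypothetical and not taken from the authors' toolkit. The sketch works on plain floats in the clear; a private version would presumably need a fixed-point encoding into the small field, which is not shown.

```python
from typing import List


def nearest(point: List[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def local_contribution(point: List[float], centroids: List[List[float]]) -> List[float]:
    """One user's vector: for each cluster, a slot for the point plus a count.

    Only the slot of the nearest cluster is filled; all others stay zero.
    """
    dim = len(point)
    contrib = [0.0] * (len(centroids) * (dim + 1))
    base = nearest(point, centroids) * (dim + 1)
    contrib[base:base + dim] = point
    contrib[base + dim] = 1.0
    return contrib


def add_vectors(vectors: List[List[float]]) -> List[float]:
    """The only aggregation step: component-wise addition over all users."""
    return [sum(column) for column in zip(*vectors)]


def update_centroids(total: List[float], centroids: List[List[float]]) -> List[List[float]]:
    """Recover new centroids from the aggregated sums and counts."""
    dim = len(centroids[0])
    updated = []
    for j, old in enumerate(centroids):
        base = j * (dim + 1)
        count = total[base + dim]
        if count == 0:
            updated.append(list(old))  # empty cluster: keep the old centroid
        else:
            updated.append([s / count for s in total[base:base + dim]])
    return updated


def kmeans_step(points: List[List[float]], centroids: List[List[float]]) -> List[List[float]]:
    """One k-means iteration expressed as 'local work, then a single sum'."""
    total = add_vectors([local_contribution(p, centroids) for p in points])
    return update_centroids(total, centroids)
```

The aggregator only ever sees the sum returned by `add_vectors`, so that step is the one a private addition protocol would replace.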
9Private Addition
- The computation: secret sharing over small field
- Malicious users: efficient zero-knowledge proof to bound the L2-norm of the user vector
10Big Integers vs. Small Ones
- Most applications work with regular-sized integers (e.g. 32- or 64-bit). Arithmetic operations are very fast when each operand fits into a single memory cell ($10^{-9}$ sec)
- Public-key operations (e.g. used in encryption and verification) must use keys with sufficient length (e.g. 1024-bit) for security. Existing private computation solutions must work with large integers extensively ($10^{-3}$ sec)
- A 6 orders of magnitude difference!
11Private Addition
- $d_i$: user $i$'s private vector. $u_i$, $v_i$ and $d_i$ are all in a small integer field
- $u_i + v_i = d_i$
12Private Addition
- $\mu = \sum_i u_i$
- $\nu = \sum_i v_i$
- $u_i + v_i = d_i$
13Private Addition
- $\mu = \sum_i u_i$
- $\nu = \sum_i v_i$
- $u_i + v_i = d_i$
14Private Addition
- $\mu + \nu = \sum_i d_i$
15Private Addition
- Provable privacy
- Computation on each server is over a small field: same cost as a non-private implementation, O(m) small field operations
- So the cost for privacy is only due to verification
- For that we have a solution that involves only O(log m) large field operations
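The share-and-sum protocol on the Private Addition slides can be sketched as follows. This is a minimal illustration under assumptions: the modulus `PHI` is an arbitrary 31-bit prime chosen for the example, the function names are hypothetical, both "servers" run in one process, and the verification step (the ZKP) is omitted entirely.

```python
import secrets
from typing import List, Tuple

PHI = 2**31 - 1  # assumed small modulus for illustration; fits in 32 bits


def share(d: List[int], phi: int = PHI) -> Tuple[List[int], List[int]]:
    """Split d into two additive shares u, v with u + v = d (mod phi).

    u is uniformly random, so either share alone reveals nothing about d.
    """
    u = [secrets.randbelow(phi) for _ in d]
    v = [(x - r) % phi for x, r in zip(d, u)]
    return u, v


def server_sum(shares: List[List[int]], phi: int = PHI) -> List[int]:
    """What one server does: add up the shares it received, mod phi."""
    return [sum(column) % phi for column in zip(*shares)]


def private_sum(user_vectors: List[List[int]], phi: int = PHI) -> List[int]:
    """End-to-end sketch: users share, each server sums, results are combined."""
    pairs = [share(d, phi) for d in user_vectors]
    mu = server_sum([u for u, _ in pairs], phi)  # server 1: mu = sum of u_i
    nu = server_sum([v for _, v in pairs], phi)  # server 2: nu = sum of v_i
    return [(a + b) % phi for a, b in zip(mu, nu)]  # mu + nu = sum of d_i
```

For example, `private_sum([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns `[12, 15, 18]`. Each server performs only O(m) additions modulo a machine-word-sized integer per user, which matches the cost claim above.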
16The Need for Verification
- Private computation obfuscates user data. A malicious user could input anything.
- Think of a voting scheme: "Please place your vote, 0 or 1, in the envelope"
17Zero Knowledge Proofs
- I can prove that I know X without disclosing what X is.
- I can prove that an encrypted number is a ZERO OR ONE, i.e. a bit. (6 extra numbers needed)
- I can prove that an encrypted number is a k-bit integer. I need 6k extra numbers to do this (!!!)
18Bounding the L2-Norm
- A natural and effective way to restrict a cheating user's malicious influence
- You must have a big vector to produce large influence on the sum
- Perturbation theory bounds system change with norms: $|\sigma_i(A) - \sigma_i(B)| \le \|A - B\|_2$ [Weyl]
- Can be the basis for other checks
- Setting $L = 1$ forces each user to have only 1 vote
19An Efficient ZKP of Boundedness
- Luckily, we don't need to prove that every number in a user's vector is small, only that the vector is small.
- The server asks for some random projections of the user's vector, and expects the user to prove that the square sum of them is small.
- O(log m) public key crypto operations (instead of O(m)) to prove that the L2-norm of an m-dim vector is smaller than L.
- Running time reduced from hours to seconds.
20Random Projection-based L2-Norm ZKP
- Server generates N random m-vectors in $\{-1, 0, 1\}^m$, each entry drawn i.i.d. with probabilities ¼, ½, ¼
- User projects his data onto the N directions and provides a ZKP that the square sum of the projections is $< N L^2 / 2$
- Expensive public key operations are only on the projections and the square sum
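The threshold $N L^2 / 2$ follows from the challenge distribution: each entry $c_j$ has mean 0 and $E[c_j^2] = 1/2$, so for a fixed vector $d$, $E[(c \cdot d)^2] = \|d\|^2 / 2$ and the expected square sum over N projections is $N \|d\|^2 / 2$. The sketch below simulates only this statistical test in the clear. It is not the zero-knowledge protocol itself: no commitments or proofs are involved, and the function names are hypothetical.

```python
import random
from typing import List


def random_challenges(n_proj: int, m: int, rng: random.Random) -> List[List[int]]:
    """N challenge vectors with i.i.d. entries: -1, 0, 1 w.p. 1/4, 1/2, 1/4."""
    return [rng.choices((-1, 0, 1), weights=(1, 2, 1), k=m) for _ in range(n_proj)]


def squared_projection_sum(d: List[int], challenges: List[List[int]]) -> int:
    """Sum over all challenges c of (c . d)^2."""
    return sum(sum(c_j * d_j for c_j, d_j in zip(c, d)) ** 2 for c in challenges)


def accepts(d: List[int], bound: int, challenges: List[List[int]]) -> bool:
    """Accept iff the square sum is below N * L^2 / 2 (integer-only comparison)."""
    n_proj = len(challenges)
    return 2 * squared_projection_sum(d, challenges) < n_proj * bound ** 2


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed only to make the demo repeatable
    m, n_proj, bound = 1000, 50, 100
    challenges = random_challenges(n_proj, m, rng)
    small = [1] * m                  # L2-norm about 31.6, well under the bound
    large = [100] * m                # L2-norm about 3162, far over the bound
    print("small vector accepted:", accepts(small, bound, challenges))
    print("large vector accepted:", accepts(large, bound, challenges))
```

Because the test is probabilistic, vectors with norm close to L can land on either side of the threshold; only vectors well inside or well outside the bound are classified reliably.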
21Effectiveness
22Acceptance/rejection Probabilities
(a) Linear and (b) log plots of probability of
user input acceptance as a function of d/L for
N = 50. (b) also includes probability of
rejection. In each case, the steepest (jagged
curve) is the single-value vector (case 3), the
middle curve is Zipf vector (case 2) and the
shallow curve is uniform vector (case 1)
23Performance Evaluation
- (a) Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length.
- Standard technique takes 6 to 10 hours at $m = 10^6$
24Current Status
- The protocols (the L2-norm ZKP and the private vector addition) have been implemented
- Adding more mid-tier components
- In Java using native code for big integer
- Runs on Linux platform
- Made an open-source toolkit for building privacy-preserving real-world applications
25More info
- duan_at_cs.berkeley.edu
- http://www.cs.berkeley.edu/duan/research/p4p.html
- Thank You!