Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining

About This Presentation

Description:

An effective way to limit the influence malicious users could have on the computation ... In Java using native code for big integer. Runs on Linux platform ... – PowerPoint PPT presentation

Slides: 26

Provided by: duan

Learn more at: http://bid.berkeley.edu

Transcript and Presenter's Notes

Title: Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining

1
Practical Private Computation and Zero-Knowledge
Tools for Privacy-Preserving Distributed Data
Mining

Yitao Duan and John Canny
http://www.cs.berkeley.edu/duan
Berkeley Institute of Design
Computer Science Division
University of California, Berkeley

2
Goal

To provide practical solutions with provable
privacy and adequate efficiency in a realistic
adversary model at reasonably large scale

4
The Scenario

Two data miners mine data from n users
The data miners are semi-honest: they follow the protocol but try to learn more information
Some fraction of the users can be malicious: they may input bogus data to disrupt the computation
A more realistic adversary model than that of most existing privacy-preserving data mining schemes

5
Model
$d_i \in \mathbb{Z}_\phi^m$, where $\phi$ is a small (32- or 64-bit) integer

6
A Practical Solution

Provable privacy: cryptography
Efficiency:
VSS (verifiable secret sharing) over a small field
Minimize the number of expensive primitives and rely on probabilistic guarantees
Realistic adversary model: an extremely efficient zero-knowledge proof to bound the L2-norm of a user's vector. An effective way to limit the influence malicious users could have on the computation

7
Basic Approach

8
The Power of Addition

A large number of popular algorithms can be run with addition-only steps
Linear algorithms: voting and summation. Nonlinear algorithms: regression, SVD, PCA, k-means, ID3, EM, etc.
All algorithms in the statistical query model [Kearns 93]
Many other gradient-based numerical algorithms
A trick used a lot for parallelization in distributed computing [Chu 06, Das 07]
The addition-only framework has a very efficient private implementation in cryptography and admits efficient ZKPs
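
To make the addition-only idea concrete, here is a minimal sketch of one k-means iteration in which the aggregator does nothing but add vectors. The function names and the 1-D toy data are assumptions made for illustration; they are not taken from the presentation.

```python
def kmeans_contribution(x, centroids):
    """One user's contribution: a vector of length 2k holding the point in
    the slot of its nearest centroid and a 1 in the matching count slot."""
    k = len(centroids)
    nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
    contrib = [0.0] * (2 * k)
    contrib[nearest] = x          # per-cluster sum of points
    contrib[k + nearest] = 1.0    # per-cluster count
    return contrib


def kmeans_step(points, centroids):
    """Aggregate all contributions by addition only, then divide sums by counts."""
    k = len(centroids)
    total = [0.0] * (2 * k)
    for x in points:
        contrib = kmeans_contribution(x, centroids)
        total = [t + c for t, c in zip(total, contrib)]
    # Keep the old centroid if no point was assigned to a cluster.
    return [total[j] / total[k + j] if total[k + j] else centroids[j]
            for j in range(k)]


new_centroids = kmeans_step([1.0, 2.0, 9.0, 11.0], [0.0, 10.0])
```

The only cross-user operation is the vector sum inside `kmeans_step`, which is the step a private addition protocol would replace.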

9
Private Addition

The computation: secret sharing over a small field
Malicious users: an efficient zero-knowledge proof to bound the L2-norm of the user's vector

10
Big Integers vs. Small Ones

Most applications work with regular-sized integers (e.g. 32- or 64-bit). Arithmetic operations are very fast when each operand fits into a single memory cell (about $10^{-9}$ sec)
Public-key operations (e.g. those used in encryption and verification) must use keys of sufficient length (e.g. 1024-bit) for security. Existing private computation solutions must work with large integers extensively (about $10^{-3}$ sec)
A difference of 6 orders of magnitude!

11
Private Addition
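$d_i$: user i's private vector. $u_i$, $v_i$ and $d_i$ are all in a small integer field
$u_i + v_i = d_i$
The user sends the share $u_i$ to one server and the share $v_i$ to the other

12
Private Addition

$\mu = \sum_i u_i$
$\nu = \sum_i v_i$
$u_i + v_i = d_i$

13
Private Addition

$\mu = \sum_i u_i$
$\nu = \sum_i v_i$
$u_i + v_i = d_i$

14
Private Addition

$\mu + \nu = \sum_i d_i$
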
15
Private Addition

Provable privacy
Computation on each server is over a small field: the same cost as a non-private implementation, O(m) small-field operations
So the only cost of privacy is the verification
For that we have a solution that involves only O(log m) large-field operations
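
The share-and-sum flow of the Private Addition slides can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the modulus `PHI` (the Mersenne prime 2^31 - 1) and all function names are hypothetical, inputs are assumed to be non-negative integers, and verification is left out entirely.

```python
import secrets

PHI = 2**31 - 1  # assumed small prime modulus; the slides only say "32 or 64-bit"


def share(d, phi=PHI):
    """Split vector d into additive shares u, v with u + v = d (mod phi)."""
    u = [secrets.randbelow(phi) for _ in d]          # uniformly random share
    v = [(x - r) % phi for x, r in zip(d, u)]        # complementary share
    return u, v


def add_vectors(vectors, phi=PHI):
    """Component-wise sum mod phi: the only work each server performs."""
    total = [0] * len(vectors[0])
    for vec in vectors:
        total = [(t + x) % phi for t, x in zip(total, vec)]
    return total


def private_sum(user_vectors, phi=PHI):
    """Simulate both servers in one process and combine their partial sums."""
    shares = [share(d, phi) for d in user_vectors]
    mu = add_vectors([u for u, _ in shares], phi)    # server 1 sees only u_i
    nu = add_vectors([v for _, v in shares], phi)    # server 2 sees only v_i
    return [(a + b) % phi for a, b in zip(mu, nu)]


result = private_sum([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Each server performs only O(m) additions on word-sized integers per user, and neither share alone reveals anything about the user's vector because each is uniformly distributed.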

16
The Need for Verification

Private computation obfuscates user data, so a malicious user could input anything.
Think of a voting scheme: "Please place your vote, 0 or 1, in the envelope"

17
Zero Knowledge Proofs

I can prove that I know X without disclosing what
X is.
I can prove that an encrypted number is a ZERO OR
ONE, i.e. a bit. (6 extra numbers needed)
I can prove that an encrypted number is a k-bit
integer. I need 6k extra numbers to do this (!!!)

18
Bounding the L2-Norm

A natural and effective way to restrict a cheating user's malicious influence
You must have a big vector to produce a large influence on the sum
Perturbation theory bounds system change with norms:
$|\sigma_i(A) - \sigma_i(B)| \le \|A - B\|_2$ [Weyl]
Can be the basis for other checks
Setting L = 1 forces each user to have only 1 vote
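
A small plaintext illustration of the L = 1 remark: for integer vectors, an L2-norm of at most 1 leaves room for a single unit entry. The helper names are hypothetical, and treating the bound as inclusive (norm <= L) is an assumption; in the actual protocol the check is done in zero knowledge, not on the raw vector.

```python
def squared_l2_norm(d):
    """Sum of squares of the entries of d."""
    return sum(x * x for x in d)


def within_bound(d, bound):
    """True if ||d||_2 <= bound; compared in squared form to avoid sqrt."""
    return squared_l2_norm(d) <= bound * bound


one_vote = within_bound([0, 1, 0], 1)      # a single vote passes
two_votes = within_bound([1, 1, 0], 1)     # voting twice fails
heavy_vote = within_bound([0, 5, 0], 1)    # one oversized vote fails
```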

19
An Efficient ZKP of Boundedness

Luckily, we don't need to prove that every number in a user's vector is small, only that the vector as a whole is small.
The server asks for some random projections of the user's vector, and expects the user to prove that their square sum is small.

O(log m) public-key crypto operations (instead of O(m)) to prove that the L2-norm of an m-dimensional vector is smaller than L.
Running time reduced from hours to seconds.

20
Random Projection-based L2-Norm ZKP

Server generates N random m-vectors in $\{-1, 0, 1\}^m$, with entries drawn i.i.d. with probabilities 1/4, 1/2, 1/4
User projects his data onto the N directions and provides a ZKP that the square sum of the projections is $< N L^2 / 2$
Expensive public-key operations are applied only to the projections and the square sum
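
The statistical test behind this ZKP can be sketched in the clear. Each direction entry c_j has mean 0 and E[c_j^2] = 1/2, so the expected square of one projection is ||d||^2 / 2, which is why the threshold is N L^2 / 2. The code below is only the plaintext check with hypothetical function names; it performs no commitments or zero-knowledge proofs, which the real protocol needs so that the server never sees the projections.

```python
import random


def random_directions(n, m, rng):
    """n random m-vectors with i.i.d. entries -1, 0, 1 drawn with
    probabilities 1/4, 1/2, 1/4 (0 is listed twice to get weight 1/2)."""
    return [[rng.choice((-1, 0, 0, 1)) for _ in range(m)] for _ in range(n)]


def projection_square_sum(d, directions):
    """Sum over all directions of the squared inner product with d."""
    return sum(sum(c * x for c, x in zip(direction, d)) ** 2
               for direction in directions)


def accepts(d, directions, bound):
    """Accept iff the square sum is < N * bound^2 / 2 (integer-only form)."""
    n = len(directions)
    return 2 * projection_square_sum(d, directions) < n * bound ** 2


rng = random.Random(0)
dirs = random_directions(50, 1000, rng)     # N = 50 directions, m = 1000
small_ok = accepts([0] * 1000, dirs, 1)     # zero vector is always accepted
large_ok = accepts([10] * 1000, dirs, 1)    # norm far above the bound
```

Only the N projections and their square sum would need public-key operations, regardless of the vector length m.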

21
Effectiveness

22
Acceptance/rejection Probabilities

(a) Linear and (b) log plots of the probability of user input acceptance as a function of d/L for N = 50. (b) also includes the probability of rejection. In each case, the steepest (jagged) curve is the single-value vector (case 3), the middle curve is the Zipf vector (case 2) and the shallowest curve is the uniform vector (case 1)

23
Performance Evaluation

(a) Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length.
The standard technique takes 6 to 10 hours at m = $10^6$

24
Current Status

The protocols (the L2-norm ZKP and the private
vector addition) have been implemented
Adding more mid-tier components
Implemented in Java, using native code for big-integer arithmetic
Runs on the Linux platform
Made an open-source toolkit for building
privacy-preserving real-world applications
