Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining

Provided by: duan
Learn more at: http://bid.berkeley.edu

Transcript and Presenter's Notes


1
Practical Private Computation and Zero-Knowledge
Tools for Privacy-Preserving Distributed Data
Mining
  • Yitao Duan and John Canny
  • http://www.cs.berkeley.edu/duan
  • Berkeley Institute of Design
  • Computer Science Division
  • University of California, Berkeley

2
Goal
  • To provide practical solutions with provable
    privacy and adequate efficiency in a realistic
    adversary model at reasonably large scale

4
The Scenario
  • Two data miners mine data from n users
  • The data miners are semi-honest: they follow the
    protocol but try to learn more information
  • Some fraction of users can be malicious: they may
    input bogus data to disrupt the computation
  • A more realistic adversary model than most
    existing privacy-preserving data mining schemes

5
Model
[Figure; surviving label: f]
$d_i \in \mathbb{Z}_\phi^m$, where $\phi$ is a small integer (under 32 or 64 bits)

6
A Practical Solution
  • Provable privacy: cryptography
  • Efficiency:
  • VSS (verifiable secret sharing) over a small field
  • Minimize the number of expensive primitives and
    rely on probabilistic guarantees
  • Realistic adversary model: an extremely efficient
    zero-knowledge proof to bound the L2-norm of a
    user's vector, which is an effective way to limit
    the influence malicious users could have on the
    computation

7
Basic Approach
[Figure; surviving labels: Σ, f]


8
The Power of Addition
  • A large number of popular algorithms can be run
    with addition-only steps
  • Linear algorithms (voting and summation) and
    nonlinear algorithms (regression, SVD, PCA,
    k-means, ID3, EM, etc.)
  • All algorithms in the statistical query model
    [Kearns 93]
  • Many other gradient-based numerical algorithms
  • A trick used a lot for parallelization in
    distributed computing [Chu 06, Das 07]
  • The addition-only framework has a very efficient
    private implementation in cryptography and admits
    efficient ZKPs
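
As a minimal sketch of what "addition-only" can mean in practice, one k-means iteration can be arranged so that each user computes a vector locally and the only operation across users is a vector sum. The function names and the toy data are illustrative assumptions, not taken from the presentation, and floats are used for readability where a private implementation would encode values as integers in the small field.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def user_contribution(point, centroids):
    """Local step: per cluster, [coordinate sums..., count], flattened."""
    k, dim = len(centroids), len(point)
    vec = [0.0] * (k * (dim + 1))
    base = nearest(point, centroids) * (dim + 1)
    vec[base:base + dim] = point   # this user's coordinates
    vec[base + dim] = 1.0          # this user's count
    return vec

def kmeans_step(points, centroids):
    dim = len(centroids[0])
    contributions = [user_contribution(p, centroids) for p in points]
    # The only operation across users is this element-wise sum.
    total = [sum(col) for col in zip(*contributions)]
    new_centroids = []
    for j, old in enumerate(centroids):
        base = j * (dim + 1)
        count = total[base + dim]
        if count:
            new_centroids.append([s / count for s in total[base:base + dim]])
        else:
            new_centroids.append(list(old))  # empty cluster: keep old centroid
    return new_centroids

points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
centroids = kmeans_step(points, [[1.0, 1.0], [9.0, 9.0]])
print(centroids)  # [[0.0, 1.0], [11.0, 10.0]]
```

Only the line computing `total` touches more than one user's data, so that is the step a private addition protocol would replace.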

9
Private Addition
  • The computation: secret sharing over a small field
  • Malicious users: an efficient zero-knowledge proof
    to bound the L2-norm of the user vector

10
Big Integers vs. Small Ones
  • Most applications work with regular-sized
    integers (e.g. 32- or 64-bit). Arithmetic
    operations are very fast when each operand fits
    into a single memory cell ($10^{-9}$ sec)
  • Public-key operations (e.g. used in encryption
    and verification) must use keys with sufficient
    length (e.g. 1024-bit) for security. Existing
    private computation solutions must work with
    large integers extensively ($10^{-3}$ sec)
  • A 6 orders of magnitude difference!
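
The gap can be observed roughly with a small timing sketch. The operands and moduli below are arbitrary values chosen for illustration (the 1024-bit modulus is not a secure key), and because Python adds interpreter overhead to every operation, the measured ratio will likely be smaller than the native-code figure quoted above.

```python
import timeit

# Small-field multiply: operands and modulus fit in a 64-bit word.
small_a, small_b, small_mod = 123456789, 987654321, (1 << 61) - 1

# Arbitrary 1024-bit values, for timing only.
big_mod = (1 << 1024) - 105
big_base = (1 << 1000) + 12345
big_exp = (1 << 1023) + 6789

def small_op():
    return (small_a * small_b) % small_mod

def big_op():
    # Modular exponentiation, the core of many public-key operations.
    return pow(big_base, big_exp, big_mod)

small_time = timeit.timeit(small_op, number=100_000) / 100_000
big_time = timeit.timeit(big_op, number=20) / 20

print(f"small-field multiply: {small_time:.2e} s")
print(f"1024-bit modexp:      {big_time:.2e} s")
print(f"ratio:                {big_time / small_time:.0f}x")
```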

11
Private Addition
[Figure; surviving labels: $u_i$, $v_i$]
$d_i$: user $i$'s private vector. $u_i$, $v_i$ and $d_i$ are
all in a small integer field
$u_i + v_i = d_i$
12
Private Addition
$\mu = \sum_i u_i$
$\nu = \sum_i v_i$
$u_i + v_i = d_i$
13
Private Addition
[Figure; surviving labels: $\mu$, $\nu$]
$\mu = \sum_i u_i$
$\nu = \sum_i v_i$
$u_i + v_i = d_i$
14
Private Addition
$\mu + \nu$
15
Private Addition
  • Provable privacy
  • Computation on each server is over a small field:
    same cost as a non-private implementation, O(m)
    small-field operations
  • So the cost for privacy is only due to
    verification
  • For that we have a solution that involves only
    O(log m) large field operations
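
The private addition steps can be sketched as follows. This is a minimal illustration of two-server additive secret sharing, assuming a field size `PHI` of $2^{31} - 1$ and hypothetical helper names `share` and `server_sum`. It leaves out the verification step entirely, so it does not protect against malicious users.

```python
import secrets

PHI = 2**31 - 1  # assumed small field size; fits in a 32-bit word

def share(d):
    """Split vector d into two additive shares with u + v = d (mod PHI)."""
    u = [secrets.randbelow(PHI) for _ in d]      # uniformly random share
    v = [(x - r) % PHI for x, r in zip(d, u)]    # complementary share
    return u, v

def server_sum(shares):
    """Each server adds the shares it received, element-wise mod PHI."""
    return [sum(col) % PHI for col in zip(*shares)]

user_data = [[1, 0, 3], [0, 1, 4], [1, 1, 5]]    # toy private vectors d_i
pairs = [share(d) for d in user_data]

mu = server_sum([u for u, _ in pairs])           # server 1 sees only the u_i
nu = server_sum([v for _, v in pairs])           # server 2 sees only the v_i
total = [(a + b) % PHI for a, b in zip(mu, nu)]  # combine the two sums

print(total)  # [2, 2, 12]
```

Each server performs only O(m) additions modulo a small integer per user, which matches the cost of summing the vectors in the clear.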

16
The Need for Verification
  • Private computation obfuscates user data. A
    malicious user could input anything.
  • Think of a voting scheme: "Please place your vote,
    0 or 1, in the envelope"

17
Zero Knowledge Proofs
  • I can prove that I know X without disclosing what
    X is.
  • I can prove that an encrypted number is a ZERO OR
    ONE, i.e. a bit. (6 extra numbers needed)
  • I can prove that an encrypted number is a k-bit
    integer. I need 6k extra numbers to do this (!!!)

18
Bounding the L2-Norm
  • A natural and effective way to restrict a
    cheating user's malicious influence
  • You must have a big vector to produce large
    influence on the sum
  • Perturbation theory bounds system change with
    norms
  • $|\sigma_i(A) - \sigma_i(B)| \le \|A - B\|_2$ [Weyl]
  • Can be the basis for other checks
  • Setting L = 1 forces each user to have only 1
    vote
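
One way to see why a norm bound limits influence uses only the triangle inequality. Assume every accepted vector has L2-norm below $L$, and suppose user $i$ submits some accepted vector $d_i'$ in place of an honest $d_i$ (the symbol $d_i'$ is introduced here for illustration). The sum then moves by less than $2L$, however the cheater chooses $d_i'$:

```latex
\[
\Bigl\| \Bigl(\sum_{j \ne i} d_j + d_i'\Bigr) - \Bigl(\sum_{j \ne i} d_j + d_i\Bigr) \Bigr\|_2
  = \| d_i' - d_i \|_2
  \le \| d_i' \|_2 + \| d_i \|_2
  < 2L
\]
```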

19
An Efficient ZKP of Boundedness
  • Luckily, we don't need to prove that every number
    in a user's vector is small, only that the vector
    is small.
  • The server asks for some random projections of
    the user's vector, and expects the user to prove
    that the square sum of them is small.
  • O(log m) public key crypto operations (instead
    of O(m)) to prove that the L2-norm of an m-dim
    vector is smaller than L.
  • Running time reduced from hours to seconds.

20
Random Projection-based L2-Norm ZKP
  • Server generates N random m-vectors in
    $\{-1, 0, 1\}^m$, with entries drawn i.i.d. with
    probabilities ¼, ½, ¼
  • User projects his data onto the N directions and
    provides a ZKP that the square sum of the
    projections is less than $NL^2/2$
  • Expensive public key operations are only on the
    projections and the square sum
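
The statistical test behind this protocol can be sketched in the clear. With entries of -1, 0, 1 drawn with probabilities ¼, ½, ¼, each squared projection of a vector d has expected value equal to half the squared L2-norm of d, which is where the $NL^2/2$ threshold comes from. The sketch below only evaluates the threshold check on plaintext data. The real protocol proves the same inequality in zero knowledge, and the function names and test vectors are illustrative assumptions.

```python
import random

def random_challenge(m, rng):
    """One m-vector with entries -1, 0, 1 drawn i.i.d. with prob. 1/4, 1/2, 1/4."""
    return rng.choices((-1, 0, 1), weights=(1, 2, 1), k=m)

def passes_check(d, L, N, rng):
    """Plaintext test: is the sum of N squared projections below N * L^2 / 2?"""
    square_sum = 0
    for _ in range(N):
        c = random_challenge(len(d), rng)
        projection = sum(ci * di for ci, di in zip(c, d))
        square_sum += projection ** 2
    return square_sum < N * L ** 2 / 2

rng = random.Random(0)       # fixed seed so the run is repeatable
m, L, N = 1000, 100, 50
small = [1] * m              # L2-norm about 31.6, well under L
large = [10] * m             # L2-norm about 316, well over L

small_ok = passes_check(small, L, N, rng)
large_ok = passes_check(large, L, N, rng)
print(small_ok, large_ok)
```

Because the test is probabilistic, vectors whose norm is close to L can be accepted or rejected by chance. The two vectors above were chosen far from the bound so that the outcome is not in doubt.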

21
Effectiveness
22
Acceptance/rejection Probabilities
(a) Linear and (b) log plots of probability of
user input acceptance as a function of |d|/L for
N = 50. (b) also includes probability of
rejection. In each case, the steepest (jagged
curve) is the single-value vector (case 3), the
middle curve is Zipf vector (case 2) and the
shallow curve is uniform vector (case 1)
23
Performance Evaluation
  • (a) Verifier and (b) prover times in seconds for the
    validation protocol where (from top to bottom) L
    (the required bound) has 40, 20, or 10 bits. The
    x-axis is the vector length.
  • Standard technique takes 6 to 10 hours at m = $10^6$

24
Current Status
  • The protocols (the L2-norm ZKP and the private
    vector addition) have been implemented
  • Adding more mid-tier components
  • Implemented in Java, using native code for
    big-integer arithmetic
  • Runs on Linux platform
  • Made an open-source toolkit for building
    privacy-preserving real-world applications

25
More info
  • duan_at_cs.berkeley.edu
  • http://www.cs.berkeley.edu/duan/research/p4p.html
  • Thank You!