Limiting Privacy Breaches in Privacy Preserving Data Mining

1
Limiting Privacy Breaches in Privacy Preserving
Data Mining
  • In Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART
    Symposium on Principles of Database
    Systems, San Diego, CA, June 2003 (PODS 2003)

Alexandre Evfimievski (Cornell University)
Johannes Gehrke (Cornell University)
Ramakrishnan Srikant (IBM Almaden Research Center)
2
Introduction
  • Two broad approaches in privacy preserving data mining
  • secure multi-party computation approach
  • randomization approach
  • building classification models over
    randomized data
  • discovering association rules over
    randomized data

3
Introduction
  • Privacy
  • We must ensure that the randomization is
    sufficient for preserving privacy
  • e.g., randomize age xi by adding ri (drawn
    uniformly from the segment [-50, 50])
  • if the server receives age 120 from a user,
    then the server has learned that the real
    age of the user is at least 70
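
The age example can be made concrete with a small sketch. It assumes the noise is drawn uniformly from [-50, 50] as on the slide, and additionally assumes (this is not on the slide) that real ages lie in [0, 120]. The names `randomize_age` and `feasible_ages` are illustrative only.

```python
import random

NOISE = 50             # noise is uniform on [-NOISE, NOISE], as on the slide
AGE_RANGE = (0, 120)   # assumed range of real ages (not from the slides)


def randomize_age(age, rng=random):
    """Client side: add uniform noise to the true age."""
    return age + rng.uniform(-NOISE, NOISE)


def feasible_ages(reported):
    """Server side: interval of true ages consistent with a reported value."""
    low = max(AGE_RANGE[0], reported - NOISE)
    high = min(AGE_RANGE[1], reported + NOISE)
    return low, high


print(feasible_ages(120))  # (70, 120): the user must be at least 70
print(feasible_ages(60))   # (10, 110): much less is revealed
```

Values near the edge of the range leak far more than values in the middle, which is why the amount of noise alone does not settle whether the randomization is sufficient.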

4
Introduction
  • Two approaches for quantifying how well a
    randomization method preserves privacy
  • Information theory
  • Privacy breaches

5
Overview
  • The Model
  • N clients C1, ..., CN are connected to one server;
    each Ci has private information xi
  • To ensure privacy, each Ci sends a modified version yi
    of xi to the server
  • The server collects the modified information and
    recovers the statistical properties of the original data

6
Overview
  • Assumptions
  • xi ∈ VX, where VX is a finite set
  • each xi is chosen independently at random
    according to the same fixed probability
    distribution pX (not private)

7
Overview
  • Randomization
  • randomization operator R(x)
  • yi is an instance of R(xi) and is sent to the
    server
  • The set of all possible outputs of R(x) is denoted by VY;
    VY is a finite set
  • For all x ∈ VX and y ∈ VY, the probability
    that R(x) outputs y is denoted by p[x → y]
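
As one illustration of how a server can recover statistical properties from randomized values, consider a simple retention-replacement operator on a small finite set, writing p[x -> y] for the probability that R(x) outputs y. The operator, the retention probability `keep`, and the example distribution are assumptions chosen for this sketch; they are not taken from the slides.

```python
from fractions import Fraction


def transition(x, y, n, keep):
    """p[x -> y]: keep x with probability `keep`, otherwise output a value
    chosen uniformly from all n values (possibly x itself)."""
    p = (1 - keep) / n
    return p + keep if x == y else p


def output_distribution(p_x, keep):
    """Distribution of Y = R(X) given the distribution p_x of X."""
    n = len(p_x)
    return [sum(p_x[x] * transition(x, y, n, keep) for x in range(n))
            for y in range(n)]


def recover_input_distribution(p_y, keep):
    """Invert p_y[y] = keep * p_x[y] + (1 - keep) / n for p_x."""
    n = len(p_y)
    return [(p - (1 - keep) / n) / keep for p in p_y]


keep = Fraction(1, 5)
p_x = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
p_y = output_distribution(p_x, keep)
print(recover_input_distribution(p_y, keep) == p_x)  # True
```

In practice the server only sees an empirical estimate of the output distribution, so the recovered distribution is an estimate whose accuracy improves with the number of clients.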

8
Outline
  • Refined Definition of Privacy Breaches
  • Amplification
  • Itemset Randomization
  • Compression of Randomized Transactions
  • Worst-Case Information

9
Privacy breaches
  • Each possible value x of Ci's private information
    has probability pX(x)
  • Define a random variable X such that
    P[X = x] = pX(x)
  • The randomized value yi is an instance of a
    random variable Y such that
    P[Y = y] = Σx pX(x) · p[x → y], where p[x → y] = P[R(x) = y]
  • The joint distribution of X and Y is
    P[X = x, Y = y] = pX(x) · p[x → y]

10
Privacy breaches
  • Any property Q(x), Q: VX → {true, false}

11
Privacy breaches
  • example
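  • x is between 0 and 1000
  • 1. R1(x) = x with probability 20%, otherwise (80%)
    a value chosen uniformly at random
  • 2. R2(x) = x + ξ (mod 1001), with ξ chosen uniformly
    in {-100, ..., 100}
  • 3. R3(x) = R2(x) with probability 50%, otherwise (50%)
    a value chosen uniformly at random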

12
Privacy breaches
  • 1% → 71.6%
  • 40.5% → 100%

13
Privacy breaches
  • Some property has a very low prior probability
    but becomes likely once we learn that R(X) = y
  • 1% → 71.6%
  • Some property has a probability far from 100%
    but becomes almost 100% probable
  • 40.5% → 100%
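
The two figures can be reproduced with Bayes' rule. The slide text does not state the prior, the exact replacement rule, or which properties are meant, so the sketch below makes three assumptions: P[X = 0] = 1% with the remaining 99% spread evenly over 1..1000; R1 replaces x with one of the other 1000 values; and the second property is "X is not in 200..800", observed through R2(X) = 0. Under these assumptions the results agree with the 71.6% and 100% quoted above.

```python
from fractions import Fraction

N = 1001  # x ranges over 0..1000

# Assumed prior (not given on the slides): P[X = 0] = 1%, rest uniform.
prior = [Fraction(1, 100)] + [Fraction(99, 100) / 1000] * 1000


def r1(x, y):
    """p[x -> y] for R1: keep x with 20%, else one of the other 1000 values."""
    return Fraction(1, 5) if x == y else Fraction(4, 5) / 1000


def r2(x, y):
    """p[x -> y] for R2: y = x + xi (mod 1001), xi uniform in -100..100."""
    shift = (y - x) % N
    return Fraction(1, 201) if shift <= 100 or shift >= N - 100 else Fraction(0)


def posterior(prop, op, y):
    """P[Q(X) | R(X) = y] by Bayes' rule."""
    joint = [prior[x] * op(x, y) for x in range(N)]
    return sum(j for x, j in enumerate(joint) if prop(x)) / sum(joint)


def is_zero(x):
    return x == 0


def outside_200_800(x):
    return not 200 <= x <= 800


print(float(sum(p for x, p in enumerate(prior) if outside_200_800(x))))  # about 0.405
print(float(posterior(is_zero, r1, 0)))          # about 0.716
print(float(posterior(outside_200_800, r2, 0)))  # 1.0
```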

14
Privacy breaches
  • Let ρ1, ρ2 be two probabilities such that ρ1
    corresponds to our intuitive notion of "very
    unlikely" whereas ρ2 corresponds to "likely"
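
The slide text stops before the definition itself. Assuming the usual form of the definition (there is a ρ1-to-ρ2 privacy breach with respect to a property Q if, for some output y, P[Q(X)] ≤ ρ1 while P[Q(X) | Y = y] ≥ ρ2), a direct check over a finite domain might look like the sketch below. The function name `breaching_outputs` and the toy prior and operator are hypothetical.

```python
def breaching_outputs(prior, trans, prop, rho1, rho2):
    """Return the outputs y that cause a rho1-to-rho2 breach for `prop`.

    prior[x]    = P[X = x]
    trans[x][y] = P[R(x) = y]
    prop(x)     = truth value of the property Q on x
    """
    xs = range(len(prior))
    p_q = sum(prior[x] for x in xs if prop(x))
    if p_q > rho1:
        return []  # the property is not "very unlikely" a priori
    breaching = []
    for y in range(len(trans[0])):
        p_y = sum(prior[x] * trans[x][y] for x in xs)
        if p_y == 0:
            continue  # y is never observed
        p_q_given_y = sum(prior[x] * trans[x][y] for x in xs if prop(x)) / p_y
        if p_q_given_y >= rho2:
            breaching.append(y)
    return breaching


# Hypothetical toy data: three values; value 0 is rare but rarely disguised.
prior = [0.02, 0.49, 0.49]
trans = [[0.90, 0.05, 0.05],
         [0.01, 0.90, 0.09],
         [0.01, 0.09, 0.90]]
print(breaching_outputs(prior, trans, lambda x: x == 0, rho1=0.05, rho2=0.5))  # [0]
```

Here observing y = 0 lifts the probability of "X = 0" from 2% to roughly 65%, so output 0 is reported as a breach for ρ1 = 5% and ρ2 = 50%.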

15
Outline
  • Refined Definition of Privacy Breaches
  • Amplification
  • Itemset Randomization
  • Compression of Randomized Transactions
  • Worst-Case Information

16
Amplification
  • Use Def 1 to check privacy breaches
  • 1. There are 2^|VX| possible properties: check them
    all?
  • 2. Without pX of X, how can we use Def. 1?

17
Amplification
18
Amplification
19
Amplification
  • Proof
  • Assume that for a property Q(x) we have a ρ1-to-ρ2
    privacy breach
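
Most of the amplification slides survive here only as titles. Assuming the paper's formulation, an operator is at most γ-amplifying for an output y if p[x1 → y] / p[x2 → y] ≤ γ for all x1, x2, and no ρ1-to-ρ2 breach can occur at that y when ρ2(1 − ρ1) / (ρ1(1 − ρ2)) > γ. Neither quantity needs the prior pX or an enumeration of properties. The sketch below follows that assumed statement; the transition matrix is hypothetical.

```python
def amplification(trans, y):
    """Smallest gamma such that the operator is at most gamma-amplifying
    for output y: max over x1, x2 of p[x1 -> y] / p[x2 -> y]."""
    column = [row[y] for row in trans]
    if min(column) == 0:
        return float("inf")  # some x can never produce y
    return max(column) / min(column)


def breach_ruled_out(trans, y, rho1, rho2):
    """Sufficient condition (assumed from the paper) for no breach at y."""
    return rho2 * (1 - rho1) / (rho1 * (1 - rho2)) > amplification(trans, y)


# Hypothetical operator: keep x with probability 0.4, else uniform over 3 values.
trans = [[0.6, 0.2, 0.2],
         [0.2, 0.6, 0.2],
         [0.2, 0.2, 0.6]]
print(amplification(trans, 0))                # about 3
print(breach_ruled_out(trans, 0, 0.05, 0.5))  # True: the bound is about 19
print(breach_ruled_out(trans, 0, 0.4, 0.5))   # False: the bound is about 1.5
```

The condition is only sufficient: a `False` result means the guarantee does not apply, not that a breach necessarily exists.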

20
Amplification
21
Amplification
22
Outline
  • Refined Definition of Privacy Breaches
  • Amplification
  • Itemset Randomization
  • Compression of Randomized Transactions
  • Worst-Case Information

23
Itemset Randomization
  • Assume that all transactions have the same size m and
    each transaction is an independent instance
  • Select-a-size (with parameters 0 < ρ < 1 and
    p[0], ..., p[m])
  • 1. Select an integer j at random from {0, 1, ...,
    m}, where p[j] = P[j is chosen]
  • 2. Select j items from t, uniformly at random, and put
    them into t', so that |t ∩ t'| = j; each choice
    has probability 1/C(m, j)
  • 3. For each item a ∉ t, toss a coin with P[head] = ρ;
    if head, a is added to t'
  • Probability of the items added in step 3:
    ρ^(m'-j) (1-ρ)^(n-m-(m'-j)), where m' = |t'| and n = |I|
24
Itemset Randomization
  • Denote t' = R(t), m' = |t'|, j = |t ∩ t'|, n = |I|
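
The three steps of select-a-size can be sketched as follows. This is a minimal illustration under the reading of the slide given above (keep j items of t, then insert each other item with probability ρ); the function names, the argument order, and the use of Python's `random` module are choices made here, not part of the presentation.

```python
import random
from math import comb


def select_a_size(t, items, rho, p, rng=random):
    """Randomize transaction t (a set of items drawn from `items`).

    rho  : probability of inserting each item that is not in t
    p[j] : probability of keeping exactly j of the m = len(t) original items
    """
    t = sorted(t)
    m = len(t)
    # Step 1: choose how many original items to keep.
    j = rng.choices(range(m + 1), weights=p)[0]
    # Step 2: keep j items of t, chosen uniformly without replacement.
    kept = set(rng.sample(t, j))
    # Step 3: add every item outside t independently with probability rho.
    added = {a for a in items if a not in t and rng.random() < rho}
    return kept | added


def transition_probability(t, t_prime, n, rho, p):
    """p[t -> t'] for select-a-size, with n = |I| the number of items."""
    m, m_prime = len(t), len(t_prime)
    j = len(set(t) & set(t_prime))
    return (p[j] / comb(m, j)
            * rho ** (m_prime - j) * (1 - rho) ** (n - m - (m_prime - j)))


print(select_a_size({1, 4, 7}, range(10), rho=0.2, p=[0.1, 0.2, 0.3, 0.4]))
```

The transition probability depends on t' only through its size and its overlap with t, which is what makes the operator convenient to analyze.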

25
Itemset Randomization
26
Itemset Randomization
  • Frequent itemsets: try to keep more items of t in t'
  • Given ρ, focus on the p[j]'s
  • Maximize the following expectation

27
Itemset Randomization
  • Select parameters ? and to select ?
    and j

28
Outline
  • Refined Definition of Privacy Breaches
  • Amplification
  • Itemset Randomization
  • Compression of Randomized Transactions
  • Worst-Case Information

29
Compressing randomized transactions
  • Randomized transactions are large
  • - Network resource
  • - Lots of memory

30
Compressing randomized transactions
  • A (Seed, n, q, ρ)-pseudorandom generator is a
    function
  • G: Seed × {1, ..., n} → {0, 1}
  • that has the following properties:
  • - ∀ i: P[G(ξ, i) = 1 | ξ ∈R Seed] = ρ
    (ξ drawn at random from Seed)
  • - ∀ 1 ≤ i1 < ... < iq ≤ n: G(ξ, i1), G(ξ, i2), ...,
    G(ξ, iq) are statistically independent

31
Compressing randomized transactions
  • We are going to represent a randomized
    transaction by a seed ξ ∈ Seed
  • G(ξ, i) = 1 means that item i belongs to the
    randomized transaction
  • There is a mapping τ from seeds to transactions:
  • τ(ξ) = {item i | G(ξ, i) = 1}
  • The set Seed consists of Boolean strings {0, 1}^k, k << n
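
The slides do not say how G is built. One textbook way to obtain q-wise independent values, used here purely as an assumption, is to evaluate a random polynomial of degree below q over a prime field and threshold the result. The prime `P`, the threshold rule, and the names `random_seed`, `G`, and `tau` are illustrative; with this construction each item is included with probability ceil(ρ·P)/P rather than exactly ρ, and the seed is a tuple of q field elements rather than a bit string.

```python
import math
import random

P = 10007  # a prime larger than the number of items n (illustrative choice)


def random_seed(q, rng=random, prime=P):
    """A seed is q coefficients of a polynomial over the integers mod prime."""
    return tuple(rng.randrange(prime) for _ in range(q))


def G(seed, i, rho, prime=P):
    """Pseudorandom bit for item i (1 <= i < prime) under the given seed."""
    value = 0
    for coeff in seed:  # Horner evaluation of the polynomial at i
        value = (value * i + coeff) % prime
    return 1 if value < math.ceil(rho * prime) else 0


def tau(seed, n, rho, prime=P):
    """Map a seed to the transaction {i : G(seed, i) = 1}."""
    return {i for i in range(1, n + 1) if G(seed, i, rho, prime) == 1}


seed = random_seed(q=3)
print(seed, len(tau(seed, n=1000, rho=0.1)))  # the size varies with the seed
```

Sending the seed instead of the transaction costs q numbers rather than up to n item identifiers, which is the compression the slides are after.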

32
Compressing randomized transactions
  • Another randomization operator, similar to
    select-a-size, has parameters 0 < ρ < 1 and p[0], ..., p[m]
  • Given a transaction t and a (Seed, n, q, ρ)
    pseudorandom generator with q ≥ m (size of t),
  • the operator generates the seed ξ = R(t) in
    three steps

33
Compressing randomized transactions
  • 1. Select an integer j at random from {0, 1, ...,
    m}, where p[j] = P[j is chosen]
  • 2. Select j items from t, uniformly at random, and put
    them into t'; W.L.O.G. assume t1, t2, ..., tj
    are selected
  • 3. Select a random seed ξ ∈ Seed such that
    G(ξ, ti) = 1 for i ≤ j and G(ξ, ti) = 0 for j < i ≤ m

34
Outline
  • Refined Definition of Privacy Breaches
  • Amplification
  • Itemset Randomization
  • Compression of Randomized Transactions
  • Worst-Case Information

35
Worst Case information
  • X: random variable; Y = R(X): random variable
  • The mutual information I(X; Y) is
    I(X; Y) = E_{y~Y} KL(p_{X|Y=y} ‖ p_X)
  • I(X; Y) ↓ ⇒ Privacy ↑
  • KL(p1 ‖ p2) is the Kullback-Leibler distance
    between the distributions p1(x) and p2(x) of
    two random variables

36
Worst Case information
  • e.g., VX = {0, 1}, P[X = 0] = P[X = 1] = ½
  • Y1 = R1(X), Y2 = R2(X)
  • P[Y1 = x | X = x] = 0.6
  • P[Y1 = 1-x | X = x] = 0.4
  • P[Y2 = e | X = x] = 0.9999
  • P[Y2 = x | X = x] = 99·10^-6
  • P[Y2 = 1-x | X = x] = 1·10^-6
  • I(X; Y2) << I(X; Y1) ???
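
The comparison on this slide can be checked numerically. The sketch below computes the mutual information of both operators from the probabilities listed above, and also the posterior after observing Y2 = 0. The dictionary layout and function names are illustrative choices.

```python
from math import log2


def mutual_information(prior, channel):
    """I(X; Y) in bits. prior[x] = P[X = x]; channel[x][y] = P[Y = y | X = x]."""
    outputs = channel[0].keys()
    p_y = {y: sum(prior[x] * channel[x][y] for x in prior) for y in outputs}
    total = 0.0
    for x in prior:
        for y in outputs:
            joint = prior[x] * channel[x][y]
            if joint > 0:
                total += joint * log2(joint / (prior[x] * p_y[y]))
    return total


def posterior(prior, channel, x, y):
    """P[X = x | Y = y] by Bayes' rule."""
    p_y = sum(prior[v] * channel[v][y] for v in prior)
    return prior[x] * channel[x][y] / p_y


prior = {0: 0.5, 1: 0.5}
r1 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.4, 1: 0.6}}
r2 = {0: {"e": 0.9999, 0: 99e-6, 1: 1e-6},
      1: {"e": 0.9999, 0: 1e-6, 1: 99e-6}}

print(mutual_information(prior, r1))  # about 0.029 bits
print(mutual_information(prior, r2))  # about 0.00009 bits
print(posterior(prior, r2, 0, 0))     # about 0.99
```

R2 leaks far less information on average, yet in the rare case that Y2 = 0 is revealed the probability of X = 0 jumps from 50% to 99%. Average-case measures such as mutual information do not capture this, which appears to be the point of the question marks on the slide.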

37
Worst Case information
38
Worst Case information
  • Revealing R(X) = y for some y causes a ρ1-to-ρ2
    privacy breach
  • Revealing R(X) = y for some y causes a ρ2-to-ρ1
    privacy breach

39
Conclusion
  • New definition of privacy breaches
  • A general approach: amplification
  • Compressing long randomized transactions by using
    pseudorandom generators
  • Defined several new information-theoretic measures

40
Future work
  • Continuous distribution
  • Tradeoff between privacy and accuracy
  • Combine randomization and secure multi-party
    computation approaches