Personalized Privacy Preservation - PowerPoint PPT Presentation

1
Personalized Privacy Preservation
  • Xiaokui Xiao, Yufei Tao
  • City University of Hong Kong

2
Privacy-preserving data publishing
  • Microdata
  • Purposes
      • Allow researchers to effectively study the
        correlation between various attributes
      • Protect the privacy of every patient

3
A naïve solution
  • It does not work, as the next slide shows.

[Figure: microdata → publish]
4
Inference attack
[Figure labels: an external database (a voter registration list); the published table; quasi-identifier (QI) attributes; an adversary]
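A minimal sketch of the linking step behind this attack, assuming the published table keeps the QI attributes (age, sex, zipcode) and the voter list carries names plus the same attributes. All records and attribute names are hypothetical; only Bill's dyspepsia and Sarah's flu echo the slides. Anyone whose QI values match exactly one published tuple has that tuple's disease disclosed.

```python
# Hypothetical data: attribute names and values are illustrative, not taken from the slides.
QI = ("age", "sex", "zipcode")

published = [  # names removed, QI attributes and disease kept
    {"age": 27, "sex": "M", "zipcode": "13000", "disease": "dyspepsia"},
    {"age": 35, "sex": "F", "zipcode": "59000", "disease": "flu"},
]
voter_list = [  # external database: names plus the same QI attributes
    {"name": "Bill", "age": 27, "sex": "M", "zipcode": "13000"},
    {"name": "Sarah", "age": 35, "sex": "F", "zipcode": "59000"},
    {"name": "Tom", "age": 41, "sex": "M", "zipcode": "22000"},  # hypothetical, not in the table
]


def link(published, external, qi=QI):
    """Join the two tables on the QI attributes and report every person
    whose QI values match exactly one published tuple."""
    by_qi = {}
    for row in published:
        by_qi.setdefault(tuple(row[a] for a in qi), []).append(row["disease"])
    disclosed = {}
    for person in external:
        matches = by_qi.get(tuple(person[a] for a in qi), [])
        if len(matches) == 1:  # a unique match re-identifies the tuple
            disclosed[person["name"]] = matches[0]
    return disclosed


print(link(published, voter_list))  # {'Bill': 'dyspepsia', 'Sarah': 'flu'}
```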
5
Generalization
  • Transform each QI value into a less specific form

[Figure labels: a generalized table; an external database; information loss]
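One way to generalize QI values, as a sketch: the fixed-width age intervals and zipcode-prefix masking below are assumed schemes, not necessarily the ones used in the slides' table, and the function names are hypothetical. The coarser values are the source of the information loss noted on this slide, since an exact age can no longer be recovered from "[20, 29]".

```python
def generalize_age(age, width=10):
    """Replace an exact age with the fixed-width interval that contains it."""
    lo = (age // width) * width
    return f"[{lo}, {lo + width - 1}]"


def generalize_zip(zipcode, keep=2):
    """Keep the first `keep` digits of a zipcode and mask the rest with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


row = {"age": 27, "sex": "M", "zipcode": "13000", "disease": "dyspepsia"}  # hypothetical tuple
generalized = {
    **row,
    "age": generalize_age(row["age"]),
    "zipcode": generalize_zip(row["zipcode"]),
}
print(generalized)  # {'age': '[20, 29]', 'sex': 'M', 'zipcode': '13***', 'disease': 'dyspepsia'}
```

Only the QI attributes are coarsened here; the sensitive attribute is left exact, which is the usual setting before the slides' later discussion of SA generalization.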
6
k-anonymity
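  • The following table is 2-anonymous: every QI group
    (set of tuples with identical QI values) contains
    at least 2 tuples

[Figure labels: quasi-identifier (QI) attributes; sensitive attribute; 5 QI groups]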
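A minimal k-anonymity check, assuming each row is a dict and the QI attributes are age, sex, and zipcode (hypothetical names and values). A QI group is the set of tuples with identical generalized QI values; the table is k-anonymous when no group has fewer than k tuples.

```python
from collections import Counter

QI = ("age", "sex", "zipcode")


def qi_group_sizes(table, qi=QI):
    """Number of tuples sharing each combination of (generalized) QI values."""
    return Counter(tuple(row[a] for a in qi) for row in table)


def is_k_anonymous(table, k, qi=QI):
    """True iff every QI group contains at least k tuples."""
    return all(size >= k for size in qi_group_sizes(table, qi).values())


# Hypothetical generalized table with two QI groups of two tuples each.
table = [
    {"age": "[20, 29]", "sex": "M", "zipcode": "1****", "disease": "pneumonia"},
    {"age": "[20, 29]", "sex": "M", "zipcode": "1****", "disease": "dyspepsia"},
    {"age": "[50, 59]", "sex": "F", "zipcode": "5****", "disease": "flu"},
    {"age": "[50, 59]", "sex": "F", "zipcode": "5****", "disease": "flu"},
]
print(is_k_anonymous(table, 2), is_k_anonymous(table, 3))  # True False
```

Note that the second group is 2-anonymous even though both of its tuples have flu, so anyone linked to that group is known to have flu. That is the weakness raised on the next slide.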
7
Drawback of k-anonymity
  • What is the disease of Linda?

[Figure labels: a 2-anonymous table; an external database]
8
A better criterion: l-diversity
  • Each QI group
      • has at least l different sensitive values, and
      • even its most frequent sensitive value does not
        account for a large fraction of the group's tuples

[Figure labels: a 2-diverse table; an external database]
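A sketch of the two l-diversity conditions above, assuming the same dict-per-row layout (hypothetical attribute names and values). The slide states the second condition loosely; the test used here, that the most frequent value covers at most 1/l of its group, is one common way to make it precise and is an assumption for illustration.

```python
from collections import Counter, defaultdict

QI = ("age", "sex", "zipcode")


def groups(table, qi=QI, sa="disease"):
    """Map each QI group (tuple of QI values) to the list of its sensitive values."""
    out = defaultdict(list)
    for row in table:
        out[tuple(row[a] for a in qi)].append(row[sa])
    return out


def has_l_distinct_values(table, l):
    """Condition 1: each QI group has at least l different sensitive values."""
    return all(len(set(vals)) >= l for vals in groups(table).values())


def most_frequent_within_1_over_l(table, l):
    """Condition 2 (assumed formalization): the most frequent sensitive value
    accounts for at most 1/l of its group's tuples."""
    return all(
        Counter(vals).most_common(1)[0][1] * l <= len(vals)
        for vals in groups(table).values()
    )


table = [  # hypothetical generalized table
    {"age": "[20, 29]", "sex": "M", "zipcode": "1****", "disease": "pneumonia"},
    {"age": "[20, 29]", "sex": "M", "zipcode": "1****", "disease": "dyspepsia"},
    {"age": "[50, 59]", "sex": "F", "zipcode": "5****", "disease": "flu"},
    {"age": "[50, 59]", "sex": "F", "zipcode": "5****", "disease": "flu"},
]
print(has_l_distinct_values(table, 2))  # False: the second group contains only flu
table[3]["disease"] = "bronchitis"
print(has_l_distinct_values(table, 2), most_frequent_within_1_over_l(table, 2))  # True True
```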
9
Motivation 1: personalization
  • Andy does not want anyone to know that he had a
    stomach problem
  • Sarah does not mind at all if others find out
    that she had flu

[Figure labels: a 2-diverse table; an external database]
10
Motivation 2: non-primary case

[Figure: microdata]
11
Motivation 2: non-primary case (cont.)

[Figure labels: a 2-diverse table; an external database]
12
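Motivation 3: sensitive attribute (SA) generalization
  • How many female patients are there with age above
    30?
  • Estimate from the generalized table:
    $4 \times \frac{60 - 30 + 1}{60 - 21 + 1} = 4 \times \frac{31}{40} \approx 3$
  • Real answer: 1

[Figure labels: an external database; a generalized table]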
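The estimate on this slide assumes that ages are spread uniformly over the generalized interval. A small sketch of that calculation, reading the slide as: 4 female tuples whose ages are generalized to [21, 60], and a query range of ages 30 to 60 inclusive (the reading implied by the $60 - 30 + 1$ term). The function name and the inclusive-pair encoding of intervals are hypothetical.

```python
from fractions import Fraction


def estimate_count(n_tuples, interval, query):
    """Expected number of n_tuples tuples whose integer value falls in `query`,
    assuming values are uniform over `interval` (both are inclusive (lo, hi) pairs)."""
    lo, hi = interval
    overlap = min(hi, query[1]) - max(lo, query[0]) + 1
    return n_tuples * Fraction(max(0, overlap), hi - lo + 1)


est = estimate_count(4, (21, 60), (30, 60))
print(est, float(est))  # 31/10 3.1
```

Rounded, this gives the slide's estimate of 3, against a real answer of 1; the gap comes entirely from the uniformity assumption over a wide interval.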
13
Motivation 3: SA generalization (cont.)
  • Generalizing the sensitive attribute is
    beneficial in this case

[Figure labels: a better generalized table; an external database]
14
Personalized anonymity
  • We propose
      • a mechanism to capture personalized privacy
        requirements
      • criteria for measuring the degree of security
        provided by a generalized table
      • an algorithm for generating publishable tables

15
Guarding node
  • Andy does not want anyone to know that he had a
    stomach problem
  • He can specify stomach disease as the guarding
    node for his tuple
  • The data publisher should prevent an adversary
    from associating Andy with stomach disease

16
Guarding node
  • Sarah is willing to disclose her exact symptom
  • She can specify Ø as the guarding node for her
    tuple

17
Guarding node
  • Bill does not have any special preference
  • He can specify his own sensitive value as the
    guarding node for his tuple
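A sketch of how guarding nodes could be checked against a taxonomy of the sensitive attribute. Only the stomach-disease subtree (gastric ulcer, dyspepsia, gastritis) comes from the slides; the other nodes, the child-to-parent `PARENT` encoding, and the use of `None` for Ø are assumptions for illustration. An association violates a tuple's requirement when the associated value lies in the subtree rooted at its guarding node.

```python
# Hypothetical taxonomy (child -> parent). Only the "stomach disease" subtree is from the slides.
PARENT = {
    "gastric ulcer": "stomach disease",
    "dyspepsia": "stomach disease",
    "gastritis": "stomach disease",
    "flu": "respiratory infection",
    "bronchitis": "respiratory infection",
    "pneumonia": "respiratory infection",
    "stomach disease": "any illness",
    "respiratory infection": "any illness",
}


def in_subtree(value, node):
    """True iff `value` is `node` itself or one of its descendants."""
    while value is not None:
        if value == node:
            return True
        value = PARENT.get(value)  # climb one level; the root has no parent
    return False


def violates(guarding_node, value):
    """Linking a tuple to `value` breaches its requirement iff `value` lies under
    the guarding node; None stands for Ø (no requirement at all)."""
    return guarding_node is not None and in_subtree(value, guarding_node)


guarding = {"Andy": "stomach disease", "Sarah": None, "Bill": "dyspepsia"}
print(violates(guarding["Andy"], "gastritis"))  # True
print(violates(guarding["Sarah"], "flu"))  # False
print(violates(guarding["Bill"], "gastric ulcer"))  # False
```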

18
A personalized approach
19
Personalized anonymity
  • A table satisfies personalized anonymity with
    parameter $p_{breach}$
      • iff no adversary can breach the privacy
        requirement of any tuple with a probability above
        $p_{breach}$
  • If $p_{breach} = 0.3$, then no adversary should have
    more than a 30% probability of finding out that
      • Andy had a stomach disease
      • Bill had dyspepsia
      • etc.

20
Personalized anonymity
  • Under personalized anonymity with respect to a
    predefined parameter $p_{breach}$
      • an adversary can breach the privacy requirement
        of any tuple with a probability of at most
        $p_{breach}$
  • We need a method for calculating the breach
    probabilities

What is the probability that Andy had some
stomach problem?
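For the primary case, a simple calculation follows from the reconstruction model: if every individual in a QI group owns exactly one of its tuples and all one-to-one assignments are equally likely, then by symmetry Andy holds any given tuple with probability 1 / (group size). His breach probability is then the number of tuples in the group whose value lies under his guarding node, divided by the group size. A sketch derived from that model, not a reproduction of the paper's formal results; the group contents and the leaf-set encoding of guarding nodes are hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding: each guarding node maps to the set of leaf values in its subtree.
GUARD_LEAVES = {
    "stomach disease": {"gastric ulcer", "dyspepsia", "gastritis"},
    "dyspepsia": {"dyspepsia"},
    "Ø": set(),  # no privacy requirement
}


def primary_breach_probability(group_values, guarding_node):
    """Primary case: (# tuples whose value lies under the guarding node) / (group size)."""
    covered = GUARD_LEAVES[guarding_node]
    return Fraction(sum(v in covered for v in group_values), len(group_values))


def satisfies_personalized_anonymity(group_values, guarding_nodes, p_breach):
    """True iff no individual's breach probability exceeds p_breach."""
    return all(primary_breach_probability(group_values, g) <= p_breach for g in guarding_nodes)


group = ["gastric ulcer", "dyspepsia", "flu", "bronchitis", "pneumonia"]  # hypothetical QI group
print(primary_breach_probability(group, "stomach disease"))  # 2/5
print(satisfies_personalized_anonymity(group, ["stomach disease", "dyspepsia", "Ø"], Fraction(3, 10)))  # False
```

With $p_{breach} = 0.3$ this hypothetical group fails only because of the stomach-disease guarding node (2/5 > 0.3); the dyspepsia node gives 1/5 and Ø gives 0.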
21
Combinatorial reconstruction
  • Assumptions
      • the adversary has no prior knowledge about any
        individual
      • every individual involved in the microdata also
        appears in the external database

22
Combinatorial reconstruction
  • Andy does not want anyone to know that he had
    some stomach problem
  • What is the probability that the adversary can
    find out that Andy had a stomach disease?

23
Combinatorial reconstruction (cont.)
  • Can each individual appear more than once?
      • No: the primary case
      • Yes: the non-primary case
  • Some possible reconstructions

[Figure panels: the primary case; the non-primary case]
24
Combinatorial reconstruction (cont.)
25
Breach probability (primary)
  • There are 120 possible reconstructions in total
  • If Andy is associated with a stomach disease in
    $n_b$ reconstructions, the probability that the
    adversary associates Andy with some stomach
    problem is $n_b / 120$
  • Andy is associated with
      • gastric ulcer in 24 reconstructions
      • dyspepsia in 24 reconstructions
      • gastritis in 0 reconstructions
  • $n_b = 48$
  • The breach probability for Andy's tuple is
    $48 / 120 = 2/5$
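The counts above can be reproduced by brute force. A sketch, assuming a QI group of five tuples whose only stomach diseases are one gastric ulcer and one dyspepsia; the other three diseases and the names other than Andy are hypothetical. In the primary case a reconstruction is a one-to-one assignment of tuples to individuals, so the sketch enumerates permutations.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical QI group: one gastric ulcer, one dyspepsia, no gastritis, three other diseases.
tuples = ["gastric ulcer", "dyspepsia", "flu", "bronchitis", "pneumonia"]
individuals = ["Andy", "P2", "P3", "P4", "P5"]  # P2..P5 are placeholder names
STOMACH = {"gastric ulcer", "dyspepsia", "gastritis"}  # subtree of Andy's guarding node

total = 0
n_b = 0
for assignment in permutations(tuples):  # each individual receives exactly one tuple
    total += 1
    owned = dict(zip(individuals, assignment))
    if owned["Andy"] in STOMACH:
        n_b += 1

print(total, n_b, Fraction(n_b, total))  # 120 48 2/5
```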

26
Breach probability (non-primary)
  • There are 625 possible reconstructions in total
  • Andy is associated with gastric ulcer,
    dyspepsia, or gastritis in 225 reconstructions
  • $n_b = 225$
  • The breach probability for Andy's tuple is
    $225 / 625 = 9/25$
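The slide does not spell out the QI group behind these counts. One composition that reproduces them, assumed here: the group has four tuples, two of which are stomach diseases, and its QI region matches five individuals in the external database. In the non-primary case any individual may own any number of the tuples (including none), giving $5^4 = 625$ reconstructions. A brute-force sketch under that assumption; tuple values and names other than Andy are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed group: 4 tuples (2 stomach diseases) and 5 candidate individuals.
tuples = ["gastric ulcer", "dyspepsia", "flu", "bronchitis"]
individuals = ["Andy", "P2", "P3", "P4", "P5"]  # P2..P5 are placeholder names
STOMACH = {"gastric ulcer", "dyspepsia", "gastritis"}

total = 0
n_b = 0
# Each reconstruction picks an owner for every tuple; owners may repeat.
for owners in product(individuals, repeat=len(tuples)):
    total += 1
    if any(owner == "Andy" and value in STOMACH for owner, value in zip(owners, tuples)):
        n_b += 1

print(total, n_b, Fraction(n_b, total))  # 625 225 9/25

# Same result in closed form under this assumption: Andy misses each of the
# c stomach tuples independently with probability 4/5.
c = sum(v in STOMACH for v in tuples)
print(1 - (1 - Fraction(1, len(individuals))) ** c)  # 9/25
```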

27
Breach probability: formal results
28
Breach probability: formal results
29
More in our paper
  • An algorithm for computing generalized tables
    that
      • satisfy personalized anonymity with a predefined
        $p_{breach}$
      • reduce information loss by applying
        generalization to both the QI attributes and the
        sensitive attribute

30
Experiment settings 1
  • Goal: to show that k-anonymity and l-diversity do
    not always provide sufficient privacy protection
  • Real dataset
      • Pri-leaf
      • Nonpri-leaf
      • Pri-mixed
      • Nonpri-mixed
  • Cardinality: 100k

31
Degree of privacy protection (Pri-leaf)
$p_{breach} = 0.25$ ($k = 4$, $l = 4$)
32
Degree of privacy protection (Nonpri-leaf)
$p_{breach} = 0.25$ ($k = 4$, $l = 4$)
33
Degree of privacy protection (Pri-mixed)
$p_{breach} = 0.25$ ($k = 4$, $l = 4$)
34
Degree of privacy protection (Nonpri-mixed)
$p_{breach} = 0.25$ ($k = 4$, $l = 4$)
35
Experiment settings 2
  • Goal: to show that applying generalization to
    both the QI attributes and the sensitive
    attribute leads to more effective data
    analysis

36
Accuracy of analysis (no personalization)
37
Accuracy of analysis (with personalization)
38
Conclusions
  • k-anonymity and l-diversity are not sufficient
    for the non-primary case
  • Guarding nodes allow individuals to describe
    their privacy requirements better
  • Generalization on the sensitive attribute is
    beneficial

39
Thank you!
  • Datasets and implementation are available for
    download at http://www.cs.cityu.edu.hk/taoyf