Title: Personalized Privacy Preservation
1Personalized Privacy Preservation
- Xiaokui Xiao, Yufei Tao
- City University of Hong Kong
2Privacy preserving data publishing
- Microdata
- Purposes
- Allow researchers to effectively study the correlation between various attributes
- Protect the privacy of every patient
3A naïve solution
- It does not work. See next.
publish
4Inference attack
An external database (a voter registration list)
Published table
Quasi-identifier (QI) attributes
An adversary
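The attack on this slide amounts to a join on the QI attributes. Below is a minimal sketch of that join. All names, rows, and the choice of QI columns (`age`, `sex`, `zipcode`) are hypothetical placeholders, not the tables shown on the slide.

```python
# Hypothetical published table: names removed, QI attributes kept.
published = [
    {"age": 23, "sex": "M", "zipcode": "11000", "disease": "pneumonia"},
    {"age": 41, "sex": "F", "zipcode": "25000", "disease": "flu"},
]

# Hypothetical external database, e.g. a voter registration list.
voters = [
    {"name": "Person A", "age": 23, "sex": "M", "zipcode": "11000"},
    {"name": "Person B", "age": 41, "sex": "F", "zipcode": "25000"},
    {"name": "Person C", "age": 35, "sex": "M", "zipcode": "30000"},
]

QI = ("age", "sex", "zipcode")


def link(published_rows, external_rows, qi=QI):
    """Join the two tables on the QI attributes.

    Returns (name, disease) for every published row whose QI values
    match exactly one record in the external database.
    """
    matches = []
    for row in published_rows:
        key = tuple(row[a] for a in qi)
        hits = [p for p in external_rows if tuple(p[a] for a in qi) == key]
        if len(hits) == 1:
            matches.append((hits[0]["name"], row["disease"]))
    return matches


reidentified = link(published, voters)
```

In this toy data every published row matches a single voter, so removing names alone does not hide who has which disease.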
5Generalization
- Transform each QI value into a less specific form
A generalized table
An external database
Information loss
6k-anonymity
- The following table is 2-anonymous
Quasi-identifier (QI) attributes
Sensitive attribute
5 QI groups
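A table is k-anonymous when every QI group (rows sharing the same generalized QI values) contains at least k rows. The check can be sketched as follows. The rows are hypothetical and are not the table from the slide; the function name `is_k_anonymous` is also just an illustration.

```python
from collections import Counter


def is_k_anonymous(rows, qi, k):
    """True if every combination of QI values occurs in at least k rows."""
    group_sizes = Counter(tuple(row[a] for a in qi) for row in rows)
    return all(size >= k for size in group_sizes.values())


# Hypothetical generalized rows (two QI groups of two rows each).
table = [
    {"age": "[21, 30]", "sex": "M", "disease": "gastric ulcer"},
    {"age": "[21, 30]", "sex": "M", "disease": "dyspepsia"},
    {"age": "[31, 60]", "sex": "F", "disease": "flu"},
    {"age": "[31, 60]", "sex": "F", "disease": "flu"},
]

two_anonymous = is_k_anonymous(table, ("age", "sex"), 2)
three_anonymous = is_k_anonymous(table, ("age", "sex"), 3)
```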
7Drawback of k-anonymity
- What is the disease of Linda?
A 2-anonymous table
An external database
8A better criterion: l-diversity
- Each QI-group
- has at least l different sensitive values
- even the most frequent sensitive value does not have a lot of tuples
A 2-diverse table
An external database
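The first condition above (at least l different sensitive values per QI group) can be checked with a short sketch. It covers only that "distinct values" condition, not the frequency condition, and the rows are hypothetical rather than taken from the slide.

```python
from collections import defaultdict


def is_l_diverse(rows, qi, sensitive, ell):
    """True if every QI group holds at least `ell` distinct sensitive values."""
    values_per_group = defaultdict(set)
    for row in rows:
        values_per_group[tuple(row[a] for a in qi)].add(row[sensitive])
    return all(len(values) >= ell for values in values_per_group.values())


# Hypothetical rows: both groups have two rows, but the second group
# carries a single sensitive value.
table = [
    {"age": "[21, 30]", "sex": "M", "disease": "gastric ulcer"},
    {"age": "[21, 30]", "sex": "M", "disease": "dyspepsia"},
    {"age": "[31, 60]", "sex": "F", "disease": "flu"},
    {"age": "[31, 60]", "sex": "F", "disease": "flu"},
]

two_diverse = is_l_diverse(table, ("age", "sex"), "disease", 2)
one_diverse = is_l_diverse(table, ("age", "sex"), "disease", 1)
```

The toy table has two rows in every group, yet it is not 2-diverse: anyone known to be in the second group must have flu.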
9Motivation 1: Personalization
- Andy does not want anyone to know that he had a stomach problem
- Sarah does not mind at all if others find out that she had flu
A 2-diverse table
An external database
10Motivation 2: Non-primary case
Microdata
11Motivation 2: Non-primary case (cont.)
2-diverse table
An external database
12Motivation 3: SA generalization
- How many female patients are there with age above 30?
- Estimated answer from the generalized table: 4 × (60 − 30 + 1) / (60 − 21 + 1) ≈ 3
- Real answer: 1
An external database
A generalized table
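The estimate on this slide follows from assuming that ages are spread uniformly over the generalized interval. A minimal sketch of that arithmetic is given below, assuming a QI group of 4 tuples generalized to the age interval [21, 60], as the numbers in the formula suggest. The function name and parameters are illustrative only.

```python
def estimate_count(group_size, lo, hi, query_lo, query_hi):
    """Estimate how many of `group_size` tuples, generalized to the integer
    interval [lo, hi], fall into [query_lo, query_hi], assuming the values
    are spread uniformly over the interval."""
    overlap = min(hi, query_hi) - max(lo, query_lo) + 1
    if overlap <= 0:
        return 0.0
    return group_size * overlap / (hi - lo + 1)


# Same numbers as the slide: 4 * (60 - 30 + 1) / (60 - 21 + 1).
estimate = estimate_count(4, 21, 60, 30, 60)
```

The sketch follows the slide's formula, which counts the 31 integer ages from 30 to 60. The estimate is about 3, while the real answer is 1; the gap is the information loss caused by the wide age interval.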
13Motivation 3: SA generalization (cont.)
- Generalization of the sensitive attribute is beneficial in this case
A better generalized table
An external database
14Personalized anonymity
- We propose
- a mechanism to capture personalized privacy requirements
- criteria for measuring the degree of security provided by a generalized table
- an algorithm for generating publishable tables
15Guarding node
- Andy does not want anyone to know that he had a stomach problem
- He can specify stomach disease as the guarding node for his tuple
- The data publisher should prevent an adversary from associating Andy with stomach disease
16Guarding node
- Sarah is willing to disclose her exact symptom
- She can specify Ø as the guarding node for her
tuple
17Guarding node
- Bill does not have any special preference
- He can set the guarding node for his tuple to be the same as his sensitive value
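The three guarding-node choices above can be illustrated with a small taxonomy of the sensitive attribute. In the sketch below only the "stomach disease" subtree (gastric ulcer, dyspepsia, gastritis) is taken from the slides. The other nodes, the `PARENT` mapping, and the function names are hypothetical, and `None` stands in for Ø.

```python
# Child -> parent links of a small sensitive-attribute taxonomy.
PARENT = {
    "gastric ulcer": "stomach disease",
    "dyspepsia": "stomach disease",
    "gastritis": "stomach disease",
    "stomach disease": "any illness",
    "flu": "respiratory infection",          # hypothetical placement
    "respiratory infection": "any illness",  # hypothetical node
}


def in_subtree(value, node):
    """True if `value` equals `node` or lies below it in the taxonomy."""
    while value is not None:
        if value == node:
            return True
        value = PARENT.get(value)
    return False


def is_breach(guarding_node, inferred_value):
    """A breach occurs when the adversary infers a value inside the subtree
    of the guarding node. A guarding node of None (Ø) is never breached."""
    if guarding_node is None:
        return False
    return in_subtree(inferred_value, guarding_node)


andy = is_breach("stomach disease", "dyspepsia")  # Andy guards the whole subtree
sarah = is_breach(None, "flu")                    # Sarah specified Ø
bill = is_breach("dyspepsia", "gastritis")        # Bill guards only his own value
```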
18A personalized approach
19Personalized anonymity
- A table satisfies personalized anonymity with a parameter pbreach
- iff no adversary can breach the privacy requirement of any tuple with a probability above pbreach
- If pbreach = 0.3, then any adversary should have no more than a 30% probability of finding out that
- Andy had a stomach disease
- Bill had dyspepsia
- etc
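Once a breach probability is known for each tuple, the definition reduces to a comparison against pbreach. A minimal sketch follows; the per-person probabilities are illustrative values, not results from the slides, and the function name is an assumption.

```python
from fractions import Fraction


def satisfies_personalized_anonymity(breach_probabilities, p_breach):
    """True if no tuple can be breached with probability above p_breach."""
    return all(p <= p_breach for p in breach_probabilities.values())


# Illustrative breach probabilities, keyed by the owner of each tuple.
probabilities = {"Andy": Fraction(2, 5), "Bill": Fraction(1, 5)}

ok_at_30_percent = satisfies_personalized_anonymity(probabilities, Fraction(3, 10))
ok_at_50_percent = satisfies_personalized_anonymity(probabilities, Fraction(1, 2))
```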
20Personalized anonymity
- Personalized anonymity with respect to a predefined parameter pbreach
- an adversary can breach the privacy requirement of any tuple with a probability at most pbreach
- We need a method for calculating the breach probabilities
What is the probability that Andy had some stomach problem?
21Combinatorial reconstruction
- Assumptions
- the adversary has no prior knowledge about each individual
- every individual involved in the microdata also appears in the external database
22Combinatorial reconstruction
- Andy does not want anyone to know that he had some stomach problem
- What is the probability that the adversary can find out that Andy had a stomach disease?
23Combinatorial reconstruction (cont.)
- Can each individual appear more than once?
- No: the primary case
- Yes: the non-primary case
- Some possible reconstructions
the primary case
the non-primary case
25Breach probability (primary)
- 120 possible reconstructions in total
- Suppose Andy is associated with a stomach disease in nb reconstructions
- The probability that the adversary associates Andy with some stomach problem is nb / 120
- Andy is associated with
- gastric ulcer in 24 reconstructions
- dyspepsia in 24 reconstructions
- gastritis in 0 reconstructions
- nb = 48
- The breach probability for Andy's tuple is 48 / 120 = 2 / 5
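The counts on this slide can be reproduced by brute-force enumeration. The sketch below assumes a setup inferred from the numbers rather than stated in the text: a QI group of four tuples, two of which carry stomach diseases (gastric ulcer and dyspepsia), and five candidate individuals in the external database. All names other than Andy, and the two non-stomach values, are placeholders.

```python
from fractions import Fraction
from itertools import permutations

# Assumed setup (inferred from the counts 120, 24, 24, 0 on the slide).
candidates = ["Andy", "Person B", "Person C", "Person D", "Person E"]
group_values = ["gastric ulcer", "dyspepsia", "other 1", "other 2"]
guarded = {"gastric ulcer", "dyspepsia", "gastritis"}  # under "stomach disease"

# Primary case: each individual owns at most one tuple, so a reconstruction
# assigns four distinct candidates to the four tuples.
reconstructions = list(permutations(candidates, len(group_values)))

# Count reconstructions in which Andy owns a tuple with a guarded value.
n_b = sum(
    any(owner == "Andy" and value in guarded
        for owner, value in zip(owners, group_values))
    for owners in reconstructions
)

breach_probability = Fraction(n_b, len(reconstructions))
```

Gastritis contributes nothing because no tuple in the assumed group carries that value.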
26Breach probability (non-primary)
- 625 possible reconstructions in total
- Andy is associated with gastric ulcer or dyspepsia or gastritis in 225 reconstructions
- nb = 225
- The breach probability for Andy's tuple is 225 / 625 = 9 / 25
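The non-primary counts can be checked the same way, again assuming (from the numbers, not from the text) a QI group of four tuples with two stomach-disease values and five candidate individuals. Here an individual may own several tuples, so each tuple is assigned independently. Names other than Andy and the non-stomach values are placeholders.

```python
from fractions import Fraction
from itertools import product

# Assumed setup (inferred from the counts 625 and 225 on the slide).
candidates = ["Andy", "Person B", "Person C", "Person D", "Person E"]
group_values = ["gastric ulcer", "dyspepsia", "other 1", "other 2"]
guarded = {"gastric ulcer", "dyspepsia", "gastritis"}  # under "stomach disease"

# Non-primary case: every tuple can belong to any candidate, so there are
# 5 ** 4 reconstructions.
reconstructions = list(product(candidates, repeat=len(group_values)))

# Count reconstructions in which Andy owns at least one guarded tuple.
n_b = sum(
    any(owner == "Andy" and value in guarded
        for owner, value in zip(owners, group_values))
    for owners in reconstructions
)

breach_probability = Fraction(n_b, len(reconstructions))
```

Equivalently, nb = 625 − 4 × 4 × 5 × 5 = 225, where the subtracted term counts the reconstructions in which neither stomach-disease tuple is assigned to Andy.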
27Breach probability: Formal results
28Breach probability: Formal results
29More in our paper
- An algorithm for computing generalized tables that
- satisfy personalized anonymity with a predefined pbreach
- reduce information loss by employing generalization on both the QI attributes and the sensitive attribute
30Experiment settings 1
- Goal: to show that k-anonymity and l-diversity do not always provide sufficient privacy protection
- Real dataset
- Pri-leaf
- Nonpri-leaf
- Pri-mixed
- Nonpri-mixed
- Cardinality: 100k
31Degree of privacy protection (Pri-leaf)
pbreach = 0.25 (k = 4, l = 4)
32Degree of privacy protection (Nonpri-leaf)
pbreach = 0.25 (k = 4, l = 4)
33Degree of privacy protection (Pri-mixed)
pbreach = 0.25 (k = 4, l = 4)
34Degree of privacy protection (Nonpri-mixed)
pbreach = 0.25 (k = 4, l = 4)
35Experiment settings 2
- Goal: to show that applying generalization on both the QI attributes and the sensitive attribute leads to more effective data analysis
36Accuracy of analysis (no personalization)
37Accuracy of analysis (with personalization)
38Conclusions
- k-anonymity and l-diversity are not sufficient for the non-primary case
- Guarding nodes allow individuals to describe their privacy requirements better
- Generalization on the sensitive attribute is beneficial
39Thank you!
- Datasets and implementation are available for download at
- http://www.cs.cityu.edu.hk/taoyf