Personalized Privacy Preservation


Transcript and Presenter's Notes

1
Personalized Privacy Preservation

Xiaokui Xiao, Yufei Tao
City University of Hong Kong

2
Privacy preserving data publishing

Microdata
Purposes
Allow researchers to effectively study the
correlation between various attributes
Protect the privacy of every patient

3
A naïve solution

It does not work. See next.

4
Inference attack
An external database (a voter registration list)

Published table
Quasi-identifier (QI) attributes
An adversary
5
Generalization

Transform each QI value into a less specific form
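
As a minimal sketch of what "a less specific form" can mean, the snippet below maps an exact age to a fixed-width interval. The function name `generalize_age` and the width of 10 are assumptions chosen for illustration; the slides do not state how the intervals are chosen.

```python
def generalize_age(age, width=10):
    """Replace an exact age with the interval that contains it.

    Intervals are [1, 10], [11, 20], [21, 30], ... when width is 10.
    The width is a hypothetical choice, not taken from the slides.
    """
    low = (age - 1) // width * width + 1
    return (low, low + width - 1)


print(generalize_age(23))  # interval containing age 23
print(generalize_age(31))  # interval containing age 31
```

A wider interval hides an individual among more candidates but makes the published value less useful, which is the information loss noted on this slide.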

A generalized table
An external database
Information loss
6
k-anonymity

The following table is 2-anonymous

Quasi-identifier (QI) attributes
Sensitive attribute
5 QI groups
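
The k-anonymity condition can be checked mechanically: every combination of QI values must occur in at least k rows. The sketch below assumes a table stored as a list of dictionaries; the column names and rows are hypothetical and do not reproduce the table on the slide.

```python
from collections import Counter


def is_k_anonymous(rows, qi_columns, k):
    """Return True if every QI group contains at least k rows."""
    group_sizes = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return all(size >= k for size in group_sizes.values())


# Hypothetical generalized table with two QI groups of two rows each.
table = [
    {"age": "[21, 30]", "sex": "M", "disease": "gastric ulcer"},
    {"age": "[21, 30]", "sex": "M", "disease": "dyspepsia"},
    {"age": "[31, 40]", "sex": "F", "disease": "flu"},
    {"age": "[31, 40]", "sex": "F", "disease": "bronchitis"},
]

print(is_k_anonymous(table, ["age", "sex"], 2))
print(is_k_anonymous(table, ["age", "sex"], 3))
```
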
7
Drawback of k-anonymity

What is the disease of Linda?

A 2-anonymous table
An external database
8
A better criterion: l-diversity

Each QI-group
has at least l different sensitive values
even the most frequent sensitive value does not
have a lot of tuples
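
A minimal sketch of the first condition only (at least l different sensitive values per QI group) is given below. It does not implement the second condition about the most frequent value, and the table, column names, and function name are hypothetical.

```python
from collections import defaultdict


def is_l_diverse(rows, qi_columns, sensitive_column, l):
    """Return True if every QI group has at least l distinct sensitive values."""
    values_per_group = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in qi_columns)
        values_per_group[key].add(row[sensitive_column])
    return all(len(values) >= l for values in values_per_group.values())


# Hypothetical table: the second group is 2-anonymous but holds one disease only.
table = [
    {"age": "[21, 30]", "sex": "M", "disease": "gastric ulcer"},
    {"age": "[21, 30]", "sex": "M", "disease": "dyspepsia"},
    {"age": "[31, 40]", "sex": "F", "disease": "pneumonia"},
    {"age": "[31, 40]", "sex": "F", "disease": "pneumonia"},
]

print(is_l_diverse(table, ["age", "sex"], "disease", 2))
```

The second group shows the drawback of k-anonymity from the previous slide: anyone known to be in that group is certain to have pneumonia.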

A 2-diverse table
An external database
9
Motivation 1: Personalization

Andy does not want anyone to know that he had a
stomach problem
Sarah does not mind at all if others find out
that she had flu

A 2-diverse table
An external database
10
Motivation 2: Non-primary case

Microdata
11
Motivation 2: Non-primary case (cont.)

2-diverse table
An external database
12
Motivation 3: SA generalization

How many female patients are there with age above
30?
Estimated answer: 4 × (60 − 30 + 1) / (60 − 21 + 1) ≈ 3
Real answer: 1
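
The estimate on this slide appears to assume that ages are spread uniformly over the generalized interval, so a group contributes its size scaled by the fraction of the interval that the query covers. The sketch below reproduces that arithmetic; the function name and the reading of the numbers (4 matching tuples, ages generalized to 21 through 60, query range 30 through 60) are assumptions inferred from the figures in the formula.

```python
def estimate_count(group_size, low, high, query_low, query_high):
    """Estimate how many tuples of a group fall in a query range.

    Assumes values are uniformly distributed over the inclusive
    generalized interval [low, high].
    """
    overlap = max(0, min(high, query_high) - max(low, query_low) + 1)
    return group_size * overlap / (high - low + 1)


estimate = estimate_count(4, 21, 60, 30, 60)
print(estimate)  # 4 * 31 / 40
```

The gap between this estimate and the real answer of 1 is the kind of error that a wide QI interval introduces.
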

An external database
A generalized table
13
Motivation 3: SA generalization (cont.)

Generalization of the sensitive attribute is
beneficial in this case

A better generalized table
An external database
14
Personalized anonymity

We propose
a mechanism to capture personalized privacy
requirements
criteria for measuring the degree of security
provided by a generalized table
an algorithm for generating publishable tables

15
Guarding node

Andy does not want anyone to know that he had a
stomach problem
He can specify stomach disease as the guarding
node for his tuple
The data publisher should prevent an adversary
from associating Andy with stomach disease

16
Guarding node

Sarah is willing to disclose her exact symptom
She can specify Ø as the guarding node for her
tuple

17
Guarding node

Bill does not have any special preference
He can specify the guarding node for his tuple to
be the same as his sensitive value
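
The three guarding-node cases above can be sketched with a small taxonomy over the sensitive attribute. A breach for a tuple means the adversary infers a value that lies in the subtree of its guarding node. The taxonomy below is an assumption: only the grouping of gastric ulcer, dyspepsia, and gastritis under stomach disease comes from the slides, while the "respiratory infection" and "any illness" nodes are hypothetical, and `None` stands in for Ø.

```python
# Child -> parent links of a hypothetical disease taxonomy.
PARENT = {
    "gastric ulcer": "stomach disease",
    "dyspepsia": "stomach disease",
    "gastritis": "stomach disease",
    "flu": "respiratory infection",
    "pneumonia": "respiratory infection",
    "bronchitis": "respiratory infection",
    "stomach disease": "any illness",
    "respiratory infection": "any illness",
}


def is_breach(inferred_value, guarding_node):
    """Return True if inferred_value lies in the subtree of guarding_node."""
    if guarding_node is None:  # Ø: the owner accepts full disclosure
        return False
    node = inferred_value
    while node is not None:
        if node == guarding_node:
            return True
        node = PARENT.get(node)
    return False


print(is_breach("gastric ulcer", "stomach disease"))  # Andy's case
print(is_breach("flu", None))                         # Sarah's case
print(is_breach("dyspepsia", "dyspepsia"))            # Bill's case
```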

18
A personalized approach
19
Personalized anonymity

A table satisfies personalized anonymity with a
parameter pbreach
iff no adversary can breach the privacy
requirement of any tuple with a probability above
pbreach
If pbreach = 0.3, then any adversary should have
no more than a 30% probability of finding out that
Andy had a stomach disease
Bill had dyspepsia
etc.

20
Personalized anonymity

Personalized anonymity with respect to a
predefined parameter pbreach
an adversary can breach the privacy requirement
of any tuple with a probability at most pbreach

We need a method for calculating the breach
probabilities

What is the probability that Andy had some
stomach problem?
21
Combinatorial reconstruction

Assumptions
the adversary has no prior knowledge about each
individual
every individual involved in the microdata also
appears in the external database

22
Combinatorial reconstruction

Andy does not want anyone to know that he had
some stomach problem
What is the probability that the adversary can
find out that Andy had a stomach disease?

23
Combinatorial reconstruction (cont.)

Can each individual appear more than once?
No: the primary case
Yes: the non-primary case
Some possible reconstructions

the primary case
the non-primary case
25
Breach probability (primary)

120 possible reconstructions in total
If Andy is associated with a stomach disease in
nb reconstructions
then the probability that the adversary should
associate Andy with some stomach problem is nb /
120
Andy is associated with
gastric ulcer in 24 reconstructions
dyspepsia in 24 reconstructions
gastritis in 0 reconstructions
nb = 48
The breach probability for Andy's tuple is 48 /
120 = 2 / 5
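
The counts on this slide can be reproduced by brute force under one consistent reading of the example, which is an assumption because the tables are not part of the transcript: the QI group holds 4 tuples (gastric ulcer, dyspepsia, pneumonia, bronchitis), the external database lists 5 candidate individuals, and in the primary case each individual owns at most one tuple. The name Mike is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Assumed example data (the slide's tables are not in the transcript).
CANDIDATES = ["Andy", "Bill", "Ken", "Nash", "Mike"]
GROUP_VALUES = ["gastric ulcer", "dyspepsia", "pneumonia", "bronchitis"]
STOMACH = {"gastric ulcer", "dyspepsia", "gastritis"}


def primary_breach(target):
    """Count reconstructions in which target owns a stomach-disease tuple.

    A reconstruction assigns a distinct owner to every tuple.
    """
    total = breaching = 0
    for owners in permutations(CANDIDATES, len(GROUP_VALUES)):
        total += 1
        if any(o == target and v in STOMACH
               for o, v in zip(owners, GROUP_VALUES)):
            breaching += 1
    return breaching, total


nb, total = primary_breach("Andy")
print(nb, total, Fraction(nb, total))
```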

26
Breach probability (non-primary)

625 possible reconstructions in total
Andy is associated with gastric ulcer or
dyspepsia or gastritis in 225 reconstructions
nb = 225
The breach probability for Andy's tuple is
225 / 625 = 9 / 25
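
For the non-primary case, the same brute-force check works once an individual is allowed to own several tuples. As before, the data is an assumed reading of the example: 4 tuples (gastric ulcer, dyspepsia, pneumonia, bronchitis) and 5 candidate individuals, with Mike as a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# Assumed example data (the slide's tables are not in the transcript).
CANDIDATES = ["Andy", "Bill", "Ken", "Nash", "Mike"]
GROUP_VALUES = ["gastric ulcer", "dyspepsia", "pneumonia", "bronchitis"]
STOMACH = {"gastric ulcer", "dyspepsia", "gastritis"}


def non_primary_breach(target):
    """Count reconstructions in which target owns a stomach-disease tuple.

    Each tuple independently gets any candidate as its owner, so one
    individual may own several tuples.
    """
    total = breaching = 0
    for owners in product(CANDIDATES, repeat=len(GROUP_VALUES)):
        total += 1
        if any(o == target and v in STOMACH
               for o, v in zip(owners, GROUP_VALUES)):
            breaching += 1
    return breaching, total


nb, total = non_primary_breach("Andy")
print(nb, total, Fraction(nb, total))
```

Under these assumptions the result also follows directly: Andy owns neither of the two stomach-disease tuples with probability (4/5)², so the breach probability is 1 − 16/25 = 9/25.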

27
Breach probability: Formal results
28
Breach probability: Formal results
29
More in our paper

An algorithm for computing generalized tables
that
satisfy personalized anonymity with a predefined
pbreach
reduce information loss by employing
generalization on both the QI attributes and the
sensitive attribute

30
Experiment settings 1

Goal: To show that k-anonymity and l-diversity do
not always provide sufficient privacy protection
Real dataset
Pri-leaf
Nonpri-leaf
Pri-mixed
Nonpri-mixed
Cardinality: 100k

31
Degree of privacy protection (Pri-leaf)
pbreach = 0.25 (k = 4, l = 4)
32
Degree of privacy protection (Nonpri-leaf)
pbreach = 0.25 (k = 4, l = 4)
33
Degree of privacy protection (Pri-mixed)
pbreach = 0.25 (k = 4, l = 4)
34
Degree of privacy protection (Nonpri-mixed)
pbreach = 0.25 (k = 4, l = 4)
35
Experiment settings 2