Protecting Privacy when Disclosing Information presentation

1
Protecting Privacy when Disclosing Information

Pierangela Samarati
Latanya Sweeney

2
INTRODUCTION

Today's society places demands on person-specific data.
More and more historically public information is also available electronically.
Combined, these sources can be linked to identify the individuals behind released personal information.
This paper addresses the problem of releasing person-specific data while preserving the anonymity of the individuals involved.
k-anonymity: specific information is ambiguously mapped to at least k persons.

3
EXAMPLE

4
RELATED WORK

Several protection techniques exist for statistical databases: scrambling, adding noise, swapping values, etc.
Suppression and generalization techniques have also been used, but without a formal foundation.
This differs from traditional access control, which protects the data itself; here the goal is to protect the identity of the individuals the data refers to.

5
OUTLINE

A formal foundation for the anonymity problem and for protection against linking.
Quasi-identifiers: attributes that can be exploited for linking.
k-anonymity: the degree of protection of data with respect to inference by linking.
Preferred generalization: allows the user to select among the possible minimal generalizations, for example by choosing which attributes to generalize.
Here, the authors protect the link between identity and data, but not the data itself.

6
DEFINITIONS AND ASSUMPTIONS

Quasi-identifier: Let $T(A_1,\ldots,A_n)$ be a table. A quasi-identifier is a set of attributes $\{A_i,\ldots,A_j\} \subseteq \{A_1,\ldots,A_n\}$ whose release must be controlled.
Goal: allow the release of information in the table only when it relates to at least a given number k of individuals; k is set by the data holder.
k-anonymity requirement: each release of data must be such that every combination of quasi-identifier values can be indistinctly matched to at least k individuals.
Issue: the data holder cannot match the released data against all externally available data, so the requirement cannot be checked directly.

7
DEFINITIONS AND ASSUMPTIONS

Although the data holder knows which attributes are externally available (and so contribute to quasi-identifiers), the specific values held externally cannot be assumed.
Key: translate the requirement into a condition on the released data.
Assumption: all attributes in table PT that are to be released and that are externally available in combination to a data recipient are defined in a quasi-identifier.
This is not a trivial assumption.
Sweeney examines this risk and shows that it cannot be perfectly resolved.
k-anonymity for a table: Let $T(A_1,\ldots,A_n)$ be a table and $Q_T$ be the set of quasi-identifiers of T. T satisfies k-anonymity iff for each $QI \in Q_T$, each sequence of values in $T[QI]$ appears with at least k occurrences in $T[QI]$.
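
The table-level definition above can be checked mechanically by counting value combinations. The following is a minimal sketch under stated assumptions: the table, its attribute names, and the function name `is_k_anonymous` are hypothetical and not taken from the paper.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifier, k):
    """Return True iff every value combination over the quasi-identifier
    occurs in at least k rows (an empty table satisfies this trivially)."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(c >= k for c in counts.values())

# Hypothetical released table; "diagnosis" is not part of the quasi-identifier.
table = [
    {"race": "asian", "zip": "94130", "diagnosis": "flu"},
    {"race": "asian", "zip": "94130", "diagnosis": "asthma"},
    {"race": "black", "zip": "94140", "diagnosis": "flu"},
    {"race": "black", "zip": "94140", "diagnosis": "cold"},
    {"race": "white", "zip": "94140", "diagnosis": "flu"},
]
qi = ("race", "zip")

print(is_k_anonymous(table, qi, 2))      # False: (white, 94140) occurs once
print(is_k_anonymous(table[:4], qi, 2))  # True: every combination occurs twice
```

If a table has several quasi-identifiers, the same check would be repeated once per quasi-identifier.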

8
GENERALIZING DATA

The first approach is based on the definition and use of generalization relationships between domains and between the values that attributes can assume.
Example: $Z_0$ is the ZIP code domain and $Z_1$ is the domain in which the last digit is replaced by 0.
To achieve k-anonymity, map the attribute values in domain $Z_0$ to $Z_1$, where $Z_1$ is more general.
This mapping between domains is stated by means of a generalization relationship, which defines a partial order $\le_D$ on the set Dom of domains.
Each domain $D_i$ has at most one direct generalized domain.
All maximal elements of Dom are singletons (eventually every domain can be generalized to a single value).
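
As an illustration of the ZIP code example above, the sketch below maps values from $Z_0$ to more general domains by zeroing trailing digits. The function name `generalize_zip` and the sample codes are assumptions; the slide only describes the first step ($Z_0$ to $Z_1$), and the second level shown here is an assumed continuation of the same rule.

```python
def generalize_zip(zip_code, level):
    """Replace the last `level` digits with 0; level 0 leaves the code unchanged."""
    if not 0 <= level <= len(zip_code):
        raise ValueError("level out of range")
    keep = len(zip_code) - level
    return zip_code[:keep] + "0" * level

zips = ["94138", "94139", "94141", "94142"]       # hypothetical Z0 values
z1 = [generalize_zip(z, 1) for z in zips]          # values in Z1
z2 = [generalize_zip(z, 2) for z in zips]          # values in an assumed Z2

print(z1)  # ['94130', '94130', '94140', '94140']
print(z2)  # ['94100', '94100', '94100', '94100']
```

Each value has exactly one image per level, which matches the requirement that a domain has at most one direct generalized domain.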

9
DOMAIN AND VALUE GENERALIZATION HIERARCHIES

10
DOMAIN GENERALIZATION HIERARCHY

Let Dom be the set of domains. Given a tuple $DT = \langle D_1,\ldots,D_n \rangle$ such that $D_i \in Dom$ for $i = 1,\ldots,n$, define $DGH_{DT} = DGH_{D_1} \times \cdots \times DGH_{D_n}$, assuming the Cartesian product is ordered by imposing the coordinate-wise order.
Each path from DT to the unique maximal element of $DGH_{DT}$ in the graph defines a possible alternative path that the generalization process can follow.
The set of nodes in each such path, together with the generalization relationship, is called a generalization strategy for $DGH_{DT}$.
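
To make the notion of a generalization strategy concrete, the sketch below enumerates every path from the bottom to the top of a product hierarchy. It assumes each attribute's hierarchy is a chain of known height, so a node can be written as a vector of levels; the function name `strategies` and the heights used are illustrative assumptions.

```python
def strategies(heights):
    """Yield every path from the bottom vector (0, ..., 0) to the top vector
    `heights`, raising exactly one coordinate by one level per step."""
    top = tuple(heights)

    def extend(path):
        node = path[-1]
        if node == top:
            yield list(path)
            return
        for z, h in enumerate(top):
            if node[z] < h:
                nxt = node[:z] + (node[z] + 1,) + node[z + 1:]
                yield from extend(path + [nxt])

    yield from extend([(0,) * len(top)])

# Assumed heights: one attribute with a single generalization level,
# one attribute with two levels.
paths = list(strategies((1, 2)))
for p in paths:
    print(p)
# [(0, 0), (1, 0), (1, 1), (1, 2)]
# [(0, 0), (0, 1), (1, 1), (1, 2)]
# [(0, 0), (0, 1), (0, 2), (1, 2)]
```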

11
GENERALIZED TABLE

$T_j$ is a generalized table of $T_i$, written $T_i \preceq T_j$, iff:
$T_i$ and $T_j$ have the same number of tuples;
the domain of each attribute of $T_j$ (denoted $dom(A_z,T_j)$) is equal to, or a generalization of, the domain of the attribute in $T_i$; and
each tuple $t_i$ in $T_i$ has a corresponding tuple $t_j$ in $T_j$ (and vice versa) such that the value of each attribute in $t_j$ is equal to, or a generalization of, the value of the corresponding attribute in $t_i$.
Not all generalized tables are satisfactory.
An extremely generalized table is not needed if a more specific table exists that satisfies k-anonymity.
This motivates the notion of k-minimal generalization.

12
k-minimal generalization

Distance vector: Let $T_i(A_1,\ldots,A_n)$ and $T_j(A_1,\ldots,A_n)$ be two tables such that $T_i \preceq T_j$. The distance vector of $T_j$ from $T_i$ is the vector $DV_{i,j} = [d_1,\ldots,d_n]$, where $d_z$ is the length of the unique path between $dom(A_z,T_i)$ and $dom(A_z,T_j)$ in $DGH_D$, with $D = dom(A_z,T_i)$.
Given two distance vectors $DV = [d_1,\ldots,d_n]$ and $DV' = [d'_1,\ldots,d'_n]$: $DV \le DV'$ iff $d_i \le d'_i$ for all $i = 1,\ldots,n$; $DV < DV'$ iff $DV \le DV'$ and $DV \ne DV'$.
k-minimal generalization: Let $T_i(A_1,\ldots,A_n)$ and $T_j(A_1,\ldots,A_n)$ be two tables such that $T_i \preceq T_j$. $T_j$ is a k-minimal generalization of $T_i$ iff:
$T_j$ satisfies k-anonymity;
there is no $T_z$ such that $T_i \preceq T_z$, $T_z$ satisfies k-anonymity, and $DV_{i,z} < DV_{i,j}$.

13
EXAMPLE

For $k = 2$, $GT_{[1,0]}$ and $GT_{[0,1]}$ are k-minimal generalizations, but $GT_{[0,2]}$ and $GT_{[1,1]}$ are not.
For $k = 3$, $GT_{[1,0]}$ and $GT_{[0,2]}$ are k-minimal generalizations.
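
The example's conclusions can be reproduced by brute force over all distance vectors. The sketch below is illustrative only: the 12-row table and the two hierarchies (race generalizes to "person" in one step, ZIP loses one trailing digit per step) are assumptions chosen to be consistent with the results stated above, not necessarily the data shown on the slide.

```python
from collections import Counter
from itertools import product

HEIGHTS = (1, 2)  # assumed hierarchy heights: race (1 level), ZIP (2 levels)

def generalize(row, dv):
    """Generalize one (race, zip) tuple by the distance vector dv."""
    race, zip_code = row
    race = race if dv[0] == 0 else "person"
    zip_code = zip_code[:5 - dv[1]] + "0" * dv[1]
    return (race, zip_code)

def satisfies(table, dv, k):
    """True iff the table generalized by dv is k-anonymous."""
    counts = Counter(generalize(r, dv) for r in table)
    return all(c >= k for c in counts.values())

def strictly_less(a, b):
    """DV < DV': coordinate-wise <= and not equal."""
    return a != b and all(x <= y for x, y in zip(a, b))

def k_minimal(table, k):
    """Distance vectors that satisfy k-anonymity and have no smaller satisfying vector."""
    ok = [dv for dv in product(*(range(h + 1) for h in HEIGHTS))
          if satisfies(table, dv, k)]
    return [dv for dv in ok if not any(strictly_less(o, dv) for o in ok)]

# Hypothetical private table: every (race, ZIP) pair occurs exactly once.
table = [(race, z)
         for race in ("asian", "black", "white")
         for z in ("94138", "94139", "94141", "94142")]

print(k_minimal(table, 2))  # [(0, 1), (1, 0)]
print(k_minimal(table, 3))  # [(0, 2), (1, 0)]
```

Exhaustive enumeration is only practical for small hierarchies; it is used here to check the definition, not as the search method the paper proposes.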

14
SUPPRESSING DATA

Suppression is a complementary approach to generalization.
It moderates the generalization process when there is a limited number of outliers (tuples with fewer than k occurrences).
Generalized table with suppression: Let $T_i(A_1,\ldots,A_n)$ and $T_j(A_1,\ldots,A_n)$ be two tables defined on the same attributes. $T_j$ is a generalization of $T_i$, written $T_i \preceq T_j$, iff:
$\mathrm{sizeof}(T_j) \le \mathrm{sizeof}(T_i)$;
for all $z = 1,\ldots,n$, $dom(A_z,T_i) \le_D dom(A_z,T_j)$;
there is an injective mapping between $T_i$ and $T_j$ that associates tuples $t_i$ (in $T_i$) and $t_j$ (in $T_j$) such that $t_i[A_z] \le_V t_j[A_z]$, that is, each value in $t_j$ equals or generalizes the corresponding value in $t_i$.
Minimal required suppression: Let $T_j$ be a generalization of $T_i$ satisfying k-anonymity. $T_j$ enforces minimal required suppression iff there is no $T_z$ such that $T_i \preceq T_z$, $DV_{i,z} = DV_{i,j}$, $\mathrm{sizeof}(T_j) < \mathrm{sizeof}(T_z)$, and $T_z$ satisfies k-anonymity.
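
For a fixed level of generalization, minimal required suppression amounts to removing exactly the outlier tuples: a group with fewer than k members cannot be kept in part, and a group with at least k members never needs to be touched. A minimal sketch, with a hypothetical function name and hypothetical already-generalized tuples:

```python
from collections import Counter

def suppress_outliers(rows, k):
    """Split rows into (kept, suppressed): a row is suppressed iff its
    value combination occurs fewer than k times."""
    counts = Counter(rows)
    kept = [r for r in rows if counts[r] >= k]
    suppressed = [r for r in rows if counts[r] < k]
    return kept, suppressed

# Hypothetical generalized tuples over (race, ZIP).
rows = ([("asian", "94130")] * 2
        + [("black", "94140")]
        + [("white", "94140")] * 3)

kept, suppressed = suppress_outliers(rows, 2)
print(suppressed)  # [('black', '94140')]
print(len(kept))   # 5
```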

15
EXAMPLE

The tuples written in boldface and marked with double lines in each table are the tuples that must be suppressed to achieve k-anonymity with $k = 2$.
Suppressing any proper superset of these tuples would violate minimal required suppression.

16
k-minimal generalization with suppression

Generalization and suppression are used in conjunction to obtain k-anonymity.
There is a tradeoff between generalization and suppression.
The acceptable suppression threshold is MaxSup.
Within the threshold, suppression is considered preferable to generalization.
Reason: generalization affects all the tuples, whereas suppression affects single tuples.
k-minimal generalization with suppression: Let $T_i(A_1,\ldots,A_n)$ and $T_j(A_1,\ldots,A_n)$ be two tables such that $T_i \preceq T_j$, and let MaxSup be the specified threshold of acceptable suppression. $T_j$ is a k-minimal generalization of $T_i$ iff:
1. $T_j$ satisfies k-anonymity;
2. $\mathrm{sizeof}(T_i) - \mathrm{sizeof}(T_j) \le MaxSup$;
3. there is no $T_z$ such that $T_i \preceq T_z$, $T_z$ satisfies conditions 1 and 2, and $DV_{i,z} < DV_{i,j}$.
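
The effect of the threshold can be seen on a small case. The sketch below is a brute-force illustration under stated assumptions: the five-row table, the two hierarchies, and all function names are hypothetical, and each candidate is assumed to apply minimal required suppression (only outliers are removed).

```python
from collections import Counter
from itertools import product

HEIGHTS = (1, 2)  # assumed hierarchy heights: race (1 level), ZIP (2 levels)

def generalize(row, dv):
    """Generalize one (race, zip) tuple by the distance vector dv."""
    race, zip_code = row
    race = race if dv[0] == 0 else "person"
    return (race, zip_code[:5 - dv[1]] + "0" * dv[1])

def required_suppression(table, dv, k):
    """Number of tuples that must be removed for k-anonymity at level dv."""
    counts = Counter(generalize(r, dv) for r in table)
    return sum(c for c in counts.values() if c < k)

def k_minimal_with_suppression(table, k, max_sup):
    """Minimal distance vectors whose required suppression stays within max_sup."""
    ok = [dv for dv in product(*(range(h + 1) for h in HEIGHTS))
          if required_suppression(table, dv, k) <= max_sup]
    return [dv for dv in ok
            if not any(o != dv and all(a <= b for a, b in zip(o, dv)) for o in ok)]

# Hypothetical table with a single outlier, (white, 94142).
table = ([("asian", "94138")] * 2
         + [("black", "94141")] * 2
         + [("white", "94142")])

print(k_minimal_with_suppression(table, 2, 0))  # [(1, 1)]
print(k_minimal_with_suppression(table, 2, 1))  # [(0, 0)]
```

Allowing one suppressed tuple avoids generalizing both attributes for every row, which is the tradeoff the slide describes.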

17
EXAMPLE

18
PREFERENCES

There may be more than one minimal generalization. Which one should be chosen?
Let $T_j$ be a generalization of $T_i$ with distance vector $DV_{i,j} = [d_1,\ldots,d_n]$.
$Absdist_{i,j} = \sum_{z=1}^{n} d_z$ and $Reldist_{i,j} = \sum_{z=1}^{n} d_z / h_z$, where $h_z$ is the height of the DGH of $dom(A_z,T_i)$.
Policies:
Minimum absolute distance (smallest total number of generalization steps)
Minimum relative distance (smallest total number of relative steps)
Maximum distribution (greatest number of distinct tuples)
Minimum suppression (greatest number of tuples retained)
The choice depends on the application.
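
The two distance measures are simple sums over the distance vector. The sketch below computes them and applies the minimum relative distance policy; the hierarchy heights and the candidate vectors are assumed for illustration, and every height is assumed to be nonzero.

```python
from fractions import Fraction

def absdist(dv):
    """Total number of generalization steps."""
    return sum(dv)

def reldist(dv, heights):
    """Sum of steps taken relative to each hierarchy's height (heights must be > 0)."""
    return sum(Fraction(d, h) for d, h in zip(dv, heights))

heights = (1, 2)               # assumed DGH heights for the two attributes
candidates = [(1, 0), (0, 1)]  # assumed k-minimal distance vectors

for dv in candidates:
    print(dv, absdist(dv), reldist(dv, heights))
# (1, 0) 1 1
# (0, 1) 1 1/2

preferred = min(candidates, key=lambda dv: reldist(dv, heights))
print(preferred)  # (0, 1)
```

Here the two candidates tie on absolute distance, so only the relative policy discriminates between them.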

19
COMPUTING A PREFERRED GENERALIZATION

The generalization of the table is obtained by applying generalization to each quasi-identifier independently.
Local minimal generalization: a generalization that is minimal with respect to the set of generalizations in a given strategy.
Theorem: Let $T(A_1,\ldots,A_n) = PT[QI]$ be the table to be generalized and let $DT = \langle D_1,\ldots,D_n \rangle$ be the tuple where $D_z = dom(A_z,T)$, $z = 1,\ldots,n$. Every k-minimal generalization of T is a local minimal generalization for some strategy of $DGH_{DT}$.
By this theorem, following each generalization strategy bottom-up yields a local minimal generalization; from these, the k-minimal generalizations and eventually a preferred generalization are chosen.
If preference policies are considered, the search has to be extended beyond the first result, which may be expensive.

20
IMPROVEMENT

Distance vector between tuples: Let $x = \langle v_1,\ldots,v_n \rangle$ and $y = \langle v'_1,\ldots,v'_n \rangle$ be two tuples of T. The distance vector between x and y is the vector $V_{x,y} = [d_1,\ldots,d_n]$, where $d_i$ is the length of the paths from $v_i$ and $v'_i$ to their closest common ancestor in the VGH.
Theorem: Let $T_i$ and $T_j$ be two tables such that $T_i \preceq T_j$. If $T_j$ is a k-minimal generalization of $T_i$, then $DV_{i,j} = V_{x,y}$ for some tuples x and y in $T_i$ such that either x or y has fewer than k occurrences.
This implies that the distance vector of a minimal generalization falls within the set of vectors between outliers and other tuples in the table.
The authors exploit this property to prune the number of generalizations considered.
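
The distance vector between two tuples can be computed by climbing each value hierarchy until the two values meet. The sketch below assumes each value generalization hierarchy is given as a child-to-parent map, that both values start at the same level, and that the hierarchy has a single top value; the hierarchies and names are hypothetical.

```python
# Assumed value generalization hierarchies (child -> parent).
RACE = {"asian": "person", "black": "person", "white": "person"}
ZIP = {"94138": "94130", "94139": "94130", "94141": "94140", "94142": "94140",
       "94130": "94100", "94140": "94100"}

def steps_to_common_ancestor(a, b, parent):
    """Climb both values one level at a time until they coincide."""
    steps = 0
    while a != b:
        a, b = parent[a], parent[b]
        steps += 1
    return steps

def tuple_distance_vector(x, y, parents):
    """Distance vector V_{x,y}: one entry per attribute."""
    return [steps_to_common_ancestor(a, b, p) for a, b, p in zip(x, y, parents)]

parents = (RACE, ZIP)
print(tuple_distance_vector(("asian", "94138"), ("asian", "94139"), parents))  # [0, 1]
print(tuple_distance_vector(("asian", "94138"), ("black", "94138"), parents))  # [1, 0]
print(tuple_distance_vector(("asian", "94138"), ("black", "94141"), parents))  # [1, 2]
```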

21
ALGORITHM - OUTLINE

All the distinct tuples in $PT[QI]$ are determined, along with their numbers of occurrences.
The distance vectors between each outlier and every tuple in the table are computed.
A DAG is constructed whose nodes are all the distance vectors found.
There is an arc from each vector to all the smallest vectors dominating it in the set.
Each path is followed until a local minimal generalization is found.
As paths may not be disjoint, visited nodes are tracked.
After all the paths are examined, the k-minimal and preferred generalizations are found.
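
The pruning idea behind the outline can be sketched as follows. This is a simplification and not the authors' algorithm: it computes the candidate vectors between outliers and all distinct tuples as in the first two steps, but replaces the DAG traversal with a plain scan of the candidates, and it handles generalization without suppression only. It relies on the theorem that the distance vector of every k-minimal generalization is among these candidates. The hierarchies, table, and names are assumptions.

```python
from collections import Counter

# Assumed value generalization hierarchies (child -> parent), one per attribute.
RACE = {"asian": "person", "black": "person", "white": "person"}
ZIP = {"94138": "94130", "94139": "94130", "94141": "94140", "94142": "94140",
       "94130": "94100", "94140": "94100"}
PARENTS = (RACE, ZIP)

def generalize(row, dv):
    """Climb each attribute value dv[z] levels up its hierarchy."""
    out = []
    for value, parent, steps in zip(row, PARENTS, dv):
        for _ in range(steps):
            value = parent[value]
        out.append(value)
    return tuple(out)

def tuple_distance(x, y):
    """Per-attribute number of steps to the closest common ancestor."""
    dv = []
    for a, b, parent in zip(x, y, PARENTS):
        steps = 0
        while a != b:
            a, b, steps = parent[a], parent[b], steps + 1
        dv.append(steps)
    return tuple(dv)

def k_minimal_candidates(table, k):
    counts = Counter(table)                              # distinct tuples and counts
    outliers = [t for t, c in counts.items() if c < k]
    if not outliers:                                     # already k-anonymous
        return [(0,) * len(PARENTS)]
    candidates = {tuple_distance(o, t) for o in outliers for t in counts}
    ok = [dv for dv in candidates
          if all(c >= k for c in Counter(generalize(r, dv) for r in table).values())]
    # Keep only vectors with no strictly smaller satisfying candidate.
    return sorted(dv for dv in ok
                  if not any(o != dv and all(a <= b for a, b in zip(o, dv)) for o in ok))

# Hypothetical private table: every (race, ZIP) pair occurs exactly once.
table = [(race, z)
         for race in ("asian", "black", "white")
         for z in ("94138", "94139", "94141", "94142")]

print(k_minimal_candidates(table, 2))  # [(0, 1), (1, 0)]
print(k_minimal_candidates(table, 3))  # [(0, 2), (1, 0)]
```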

22
EXISTENCE

Theorem: Let T be a table, $MaxSup \le \mathrm{sizeof}(T)$ be the acceptable suppression threshold, and k be a natural number. If $\mathrm{sizeof}(T) \ge k$, then there is at least one k-minimal generalization for T. If $\mathrm{sizeof}(T) < k$, there are no non-empty k-minimal generalizations for T.
Experiments: cost reduction
Computing distance vectors greatly reduces the cost.
Generalizations are not computed but foreseen by looking at the tuples.
Because the algorithm keeps track of evaluated generalizations, it can stop following a path whenever it reaches a node that has already been visited.
