Protecting Privacy when Disclosing Information PowerPoint PPT Presentation


Title: Protecting Privacy when Disclosing Information


1
Protecting Privacy when Disclosing Information
  • Pierangela Samarati
  • Latanya Sweeney

2
INTRODUCTION
  • Today's society places growing demands on
    person-specific data.
  • More and more historically public information is
    also available electronically.
  • Combined, these sources can be linked to identify
    the individuals behind released data.
  • This paper addresses the problem of releasing
    person-specific data while preserving the
    person's anonymity.
  • k-anonymity: specific information is ambiguously
    mapped to at least k persons.

3
EXAMPLE
4
RELATED WORK
  • Several protection techniques exist for
    statistical databases:
  • scrambling, adding noise, swapping values, etc.
  • Suppression and generalization techniques have
    been used, but without a formal foundation.
  • Different from traditional access control: the
    goal is to protect the identity behind the data,
    not the data itself.

5
OUTLINE
  • Formal foundation for the anonymity problem and
    for protection against linking
  • Quasi-identifiers: attributes that can be
    exploited for linking
  • k-anonymity: degree of protection of data with
    respect to inference by linking
  • Preferred generalization: allows the user to
    select among possible minimal generalizations
    (e.g., by choice of attributes)
  • Here, the authors protect the link between the
    identity and the data, but not the data itself.

6
DEFINITIONS & ASSUMPTIONS
  • Quasi-identifier: Let $T(A_1,\dots,A_n)$ be a
    table. A quasi-identifier is a set of attributes
    $\{A_i,\dots,A_j\} \subseteq \{A_1,\dots,A_n\}$
    whose release must be controlled.
  • Goal: allow the release of information in the
    table only when it relates to at least a given
    number k of individuals; k is set by the data
    holder.
  • k-anonymity requirement: each release of the data
    must be such that every combination of values of
    quasi-identifiers can be indistinctly matched to
    at least k individuals.
  • Issue: the data holder cannot verify this
    directly, because it is impossible for it to
    match the released data against all externally
    available data.

7
DEFINITIONS & ASSUMPTIONS
  • Although the data holder knows which attributes
    are available externally (and so contribute to
    quasi-identifiers), the specific external values
    cannot be assumed.
  • Key: translate the requirement into a condition
    on the released data itself.
  • Assumption: all attributes in table PT that are
    to be released and that are externally available
    in combination to a data recipient are defined in
    a quasi-identifier.
  • This is not a trivial assumption.
  • Sweeney examines this risk and shows that it
    cannot be perfectly resolved.
  • k-anonymity for a table: Let $T(A_1,\dots,A_n)$
    be a table and $QI_T$ be the set of
    quasi-identifiers of T. T satisfies k-anonymity
    iff for each $QI \in QI_T$, each sequence of
    values in $T[QI]$ appears with at least k
    occurrences in $T[QI]$.
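
The table-level definition above can be checked mechanically. The following is a minimal Python sketch, assuming the table is a list of dicts and the quasi-identifier is a tuple of attribute names; the function name, attribute names, and rows are hypothetical and not taken from the presentation.

```python
from collections import Counter

def satisfies_k_anonymity(rows, quasi_identifier, k):
    """Return True if every combination of quasi-identifier values
    that appears in `rows` appears at least k times."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(c >= k for c in counts.values())

# Hypothetical toy table; names and values are illustrative only.
table = [
    {"race": "asian", "zip": "94140", "illness": "flu"},
    {"race": "asian", "zip": "94140", "illness": "cold"},
    {"race": "black", "zip": "94141", "illness": "flu"},
    {"race": "black", "zip": "94141", "illness": "asthma"},
    {"race": "white", "zip": "94142", "illness": "flu"},
]

print(satisfies_k_anonymity(table, ("race", "zip"), 2))      # False: the last row is unique
print(satisfies_k_anonymity(table[:4], ("race", "zip"), 2))  # True
```

Only the quasi-identifier columns are counted; the sensitive attribute (here `illness`) plays no role in the check.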

8
GENERALIZING DATA
  • The first approach is based on the definition and
    use of generalization relationships between
    domains and between the values that attributes
    can assume.
  • Example: $Z_0$ is the ZIP code domain and $Z_1$
    is the domain in which the last digit is replaced
    by 0.
  • To achieve k-anonymity, map the attribute values
    in domain $Z_0$ to $Z_1$, where $Z_1$ is more
    general.
  • This mapping between domains is stated by means
    of a generalization relationship, which is a
    partial order $\le_D$ on the set Dom of domains.
  • Each domain $D_i$ has at most one direct
    generalized domain.
  • All maximal elements of Dom are singletons
    (eventually every domain can be generalized to a
    single value).
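
As a small illustration of the ZIP code mapping described above, here is a Python sketch that follows the slide's convention of replacing trailing digits with 0. The function name `generalize_zip` and the sample ZIP codes are assumptions made for the example.

```python
def generalize_zip(zip_code: str, level: int) -> str:
    """Map a ZIP code from Z0 to Z_level by replacing its last
    `level` digits with 0 (level 0 leaves the value unchanged)."""
    if not 0 <= level <= len(zip_code):
        raise ValueError("level out of range")
    keep = len(zip_code) - level
    return zip_code[:keep] + "0" * level

z0 = ["94138", "94139", "94141", "94142"]
z1 = [generalize_zip(z, 1) for z in z0]
z2 = [generalize_zip(z, 2) for z in z0]
print(z1)  # ['94130', '94130', '94140', '94140']
print(z2)  # ['94100', '94100', '94100', '94100']
```

Each step up the hierarchy merges more distinct values, which is what raises the occurrence counts needed for k-anonymity.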

9
DOMAIN & VALUE GENERALIZATION HIERARCHIES
10
DOMAIN GENERALIZATION HIERARCHY
  • Let Dom be the set of domains. Given a tuple
    $DT = \langle D_1,\dots,D_n \rangle$ such that
    $D_i \in Dom$ for $i = 1,\dots,n$,
    $DGH_{DT} = DGH_{D_1} \times \dots \times DGH_{D_n}$,
    assuming the Cartesian product is ordered by
    imposing coordinate-wise order.
  • Each path from DT to the unique maximal element
    of $DGH_{DT}$ in the graph defines a possible
    alternative path for the generalization process.
  • The set of nodes in each such path, together with
    the generalization relationship, is called a
    generalization strategy for $DGH_{DT}$.
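
To make the notion of a generalization strategy concrete, the sketch below enumerates every path from the bottom to the top of a product hierarchy. It assumes each attribute's hierarchy is a simple chain, so a node is just a tuple of levels; the heights `(1, 2)` (one level for race, two for ZIP) are a hypothetical choice, not something stated on the slide.

```python
def strategies(heights):
    """Yield every generalization strategy: a path from the bottom node
    (0, ..., 0) to the top node `heights`, raising one attribute's
    domain by one level per step."""
    top = tuple(heights)

    def extend(path):
        node = path[-1]
        if node == top:
            yield path
            return
        for z, h in enumerate(top):
            if node[z] < h:
                nxt = node[:z] + (node[z] + 1,) + node[z + 1:]
                yield from extend(path + [nxt])

    yield from extend([tuple(0 for _ in top)])

paths = list(strategies((1, 2)))
for p in paths:
    print(p)
# [(0, 0), (1, 0), (1, 1), (1, 2)]
# [(0, 0), (0, 1), (1, 1), (1, 2)]
# [(0, 0), (0, 1), (0, 2), (1, 2)]
```

With these assumed heights there are three strategies, one for each position at which the single race step can be taken.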

11
GENERALIZED TABLE
  • $T_j$ is a generalized table of $T_i$, written
    $T_i \preceq T_j$, iff:
  • (1) $T_i$ and $T_j$ have the same number of
    tuples;
  • (2) the domain of each attribute of $T_j$
    (denoted $dom(A_z, T_j)$) is equal to, or a
    generalization of, the domain of the same
    attribute in $T_i$; and
  • (3) each tuple $t_i$ in $T_i$ has a corresponding
    tuple $t_j$ in $T_j$ (and vice versa) such that
    the value of each attribute in $t_j$ is equal to,
    or a generalization of, the value of the
    corresponding attribute in $t_i$.
  • Not all generalized tables are satisfactory.
  • An extremely generalized table is not needed if a
    more specific table exists that satisfies
    k-anonymity.
  • This motivates the notion of k-minimal
    generalization.

12
k-minimal generalization
  • Distance vector: Let $T_i(A_1,\dots,A_n)$ and
    $T_j(A_1,\dots,A_n)$ be two tables such that
    $T_i \preceq T_j$. The distance vector of $T_j$
    from $T_i$ is the vector
    $DV_{i,j} = [d_1,\dots,d_n]$, where $d_z$ is the
    length of the unique path between $dom(A_z, T_i)$
    and $dom(A_z, T_j)$ in $DGH_D$, the hierarchy of
    that attribute's domain.
  • Given two distance vectors
    $DV = [d_1,\dots,d_n]$ and
    $DV' = [d'_1,\dots,d'_n]$: $DV \le DV'$ iff
    $d_i \le d'_i$ for all $i = 1,\dots,n$;
    $DV < DV'$ iff $DV \le DV'$ and $DV \ne DV'$.
  • k-minimal generalization: Let
    $T_i(A_1,\dots,A_n)$ and $T_j(A_1,\dots,A_n)$ be
    two tables such that $T_i \preceq T_j$. $T_j$ is
    said to be a k-minimal generalization of $T_i$
    iff:
  • (1) $T_j$ satisfies k-anonymity, and
  • (2) there is no $T_z$ such that
    $T_i \preceq T_z$, $T_z$ satisfies k-anonymity,
    and $DV_{i,z} < DV_{i,j}$.

13
EXAMPLE
  • For $k = 2$, $GT_{[1,0]}$ and $GT_{[0,1]}$ are
    k-minimal generalizations, but $GT_{[0,2]}$ and
    $GT_{[1,1]}$ are not. For $k = 3$, $GT_{[1,0]}$
    and $GT_{[0,2]}$ are k-minimal generalizations.
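
The pattern stated in this example can be reproduced by brute force on a small table. The sketch below is an illustration under stated assumptions: the 12-row table, the two-attribute (race, ZIP) quasi-identifier, the hierarchy heights, and all function names are hypothetical, and the table is not necessarily the one shown on the slide. A vector `(r, z)` means "generalize race by r levels and ZIP by z levels".

```python
from collections import Counter
from itertools import product

RACE_LEVELS = 1  # assumed: R0 -> R1 ("person")
ZIP_LEVELS = 2   # assumed: Z0 -> Z1 -> Z2 (trailing digits replaced by 0)

def generalize(row, vector):
    """Generalize one (race, zip) tuple by the given distance vector."""
    race, zip_code = row
    r, z = vector
    race = "person" if r == 1 else race
    return race, zip_code[:len(zip_code) - z] + "0" * z

def is_k_anonymous(rows, k):
    return all(c >= k for c in Counter(rows).values())

def strictly_less(u, v):
    """DV u < DV v: component-wise <= and not equal."""
    return u != v and all(a <= b for a, b in zip(u, v))

def k_minimal_vectors(rows, k):
    """Distance vectors of all k-minimal generalizations (exhaustive search)."""
    vectors = product(range(RACE_LEVELS + 1), range(ZIP_LEVELS + 1))
    ok = [v for v in vectors
          if is_k_anonymous([generalize(r, v) for r in rows], k)]
    return [v for v in ok if not any(strictly_less(u, v) for u in ok)]

table = [(race, z) for race in ("asian", "black", "white")
         for z in ("94138", "94139", "94141", "94142")]

print(k_minimal_vectors(table, 2))  # [(0, 1), (1, 0)]
print(k_minimal_vectors(table, 3))  # [(0, 2), (1, 0)]
```

For k = 2 the vector (0, 2) also yields a k-anonymous table, but it is not minimal because (0, 1) already suffices.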

14
SUPPRESSING DATA
  • Complementary approach to generalization
  • Used to moderate the generalization process when
    only a limited number of tuples are outliers
    (tuples with fewer than k occurrences).
  • Generalized table with suppression: Let
    $T_i(A_1,\dots,A_n)$ and $T_j(A_1,\dots,A_n)$ be
    two tables defined on the same attributes. $T_j$
    is said to be a generalization of $T_i$, written
    $T_i \preceq T_j$, iff:
  • (1) $|T_j| \le |T_i|$;
  • (2) for all $z = 1,\dots,n$:
    $dom(A_z, T_i) \le_D dom(A_z, T_j)$; and
  • (3) there is an injective mapping from the tuples
    of $T_j$ to the tuples of $T_i$ that associates
    tuples $t_i$ (in $T_i$) and $t_j$ (in $T_j$) such
    that $t_i[A_z] \le_V t_j[A_z]$.
  • Minimal required suppression: Let $T_j$ be a
    generalization of $T_i$ satisfying k-anonymity.
    $T_j$ is said to enforce minimal required
    suppression iff there is no $T_z$ such that
    $T_i \preceq T_z$, $DV_{i,z} = DV_{i,j}$,
    $|T_j| < |T_z|$, and $T_z$ satisfies k-anonymity.
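
At a fixed level of generalization, minimal required suppression amounts to removing exactly the tuples that occur fewer than k times: each of them must go, and removing them is enough. A minimal Python sketch, with a hypothetical function name and sample rows:

```python
from collections import Counter

def suppress_outliers(rows, k):
    """Drop exactly the tuples whose value combination occurs fewer
    than k times; everything else is kept unchanged."""
    counts = Counter(rows)
    return [r for r in rows if counts[r] >= k]

# Hypothetical, already-generalized (race, zip) tuples.
rows = ([("person", "94130")] * 3
        + [("person", "94140")] * 2
        + [("person", "94190")])

print(len(suppress_outliers(rows, 2)))  # 5: only the single 94190 tuple is dropped
print(len(suppress_outliers(rows, 3)))  # 3: the two 94140 tuples are dropped as well
```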

15
EXAMPLE
  • The tuples written in bold face and marked with
    double lines in each table are the tuples that
    must be suppressed to achieve k-anonymity of 2.
    Suppression of any superset would not satisfy
    minimal required suppression.

16
k-minimal generalization with suppression
  • Generalization and suppression are used in
    conjunction to obtain k-anonymity
  • Tradeoff between generalization and suppression
  • Acceptable suppression threshold: MaxSup
  • Within the threshold, suppression is considered
    better.
  • Reason: generalization affects all the tuples,
    whereas suppression affects a single tuple.
  • k-minimal generalization with suppression: Let
    $T_i(A_1,\dots,A_n)$ and $T_j(A_1,\dots,A_n)$ be
    two tables such that $T_i \preceq T_j$, and let
    MaxSup be the specified threshold of acceptable
    suppression. $T_j$ is a k-minimal generalization
    of $T_i$ iff:
  • (1) $T_j$ satisfies k-anonymity;
  • (2) $|T_i| - |T_j| \le MaxSup$; and
  • (3) there is no $T_z$ such that
    $T_i \preceq T_z$, $T_z$ satisfies conditions 1
    and 2, and $DV_{i,z} < DV_{i,j}$.
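
The trade-off between generalization and suppression can be seen on a small example. The sketch below is a brute-force illustration, assuming a hypothetical (race, ZIP) table with one stray ZIP code, hierarchy heights of 1 and 2, and minimal required suppression at each level; none of these specifics come from the presentation.

```python
from collections import Counter
from itertools import product

HEIGHTS = (1, 2)  # assumed heights of the race and ZIP hierarchies

def generalize(row, vector):
    race, zip_code = row
    r, z = vector
    return ("person" if r else race, zip_code[:len(zip_code) - z] + "0" * z)

def suppressed_count(rows, vector, k):
    """Tuples that must be suppressed at this level: those whose
    generalized value occurs fewer than k times."""
    counts = Counter(generalize(row, vector) for row in rows)
    return sum(c for c in counts.values() if c < k)

def strictly_less(u, v):
    return u != v and all(a <= b for a, b in zip(u, v))

def k_minimal_with_suppression(rows, k, max_sup):
    """Minimal distance vectors whose required suppression fits MaxSup."""
    vectors = product(*(range(h + 1) for h in HEIGHTS))
    ok = [v for v in vectors if suppressed_count(rows, v, k) <= max_sup]
    return [v for v in ok if not any(strictly_less(u, v) for u in ok)]

table = [(race, z) for race in ("asian", "black", "white")
         for z in ("94138", "94139", "94141", "94142")]
table.append(("black", "94199"))  # a single outlier

print(k_minimal_with_suppression(table, 2, max_sup=0))  # [(0, 2)]
print(k_minimal_with_suppression(table, 2, max_sup=1))  # [(0, 1), (1, 0)]
```

Allowing one suppressed tuple avoids generalizing every ZIP code by two levels just to absorb a single outlier.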

17
EXAMPLE
18
PREFERENCES
  • There may be more than one minimal
    generalization. Which one to choose?
  • Let $T_j$ be a generalization of $T_i$ with
    distance vector $DV_{i,j} = [d_1,\dots,d_n]$.
  • $Absdist_{i,j} = \sum_{z=1}^{n} d_z$ and
    $Reldist_{i,j} = \sum_{z=1}^{n} d_z / h_z$, where
    $h_z$ is the height of the DGH of
    $dom(A_z, T_i)$.
  • Policies:
  • Minimum absolute distance (smallest total number
    of generalization steps)
  • Minimum relative distance (smallest total number
    of relative steps)
  • Maximum distribution (greatest number of distinct
    tuples)
  • Minimum suppression (greatest number of tuples
    retained)
  • Depends on the application
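
The two distance measures are simple to compute once distance vectors and hierarchy heights are known. A short Python sketch, assuming hypothetical heights `(1, 2)` and candidate vectors chosen for illustration:

```python
def absdist(dv):
    """Absolute distance: total number of generalization steps."""
    return sum(dv)

def reldist(dv, heights):
    """Relative distance: each step is weighted by the height of the
    hierarchy it belongs to."""
    return sum(d / h for d, h in zip(dv, heights))

heights = (1, 2)  # assumed heights of two domain generalization hierarchies

# Same absolute distance, different relative distance.
candidates = [(0, 1), (1, 0)]
print([absdist(v) for v in candidates])           # [1, 1]
print([reldist(v, heights) for v in candidates])  # [0.5, 1.0]
print(min(candidates, key=lambda v: reldist(v, heights)))  # (0, 1)

# Different absolute distance, same relative distance.
print(min([(0, 2), (1, 0)], key=absdist))         # (1, 0)
```

The two policies can therefore disagree, which is why the choice is left to the application.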

19
COMPUTING A PREFERRED GENERALIZATION
  • The generalization is obtained by computing a
    generalization for each quasi-identifier
    independently.
  • Local minimal generalization: the generalization
    that is minimal with respect to the set of
    generalizations in a strategy.
  • Theorem: Let $T(A_1,\dots,A_n) = PT[QI]$ be the
    table to be generalized and let
    $DT = \langle D_1,\dots,D_n \rangle$ be the tuple
    where $D_z = dom(A_z, T)$, $z = 1,\dots,n$. Every
    k-minimal generalization of T is a local minimal
    generalization for some strategy of $DGH_{DT}$.
  • By this theorem, following each generalization
    strategy bottom-up reveals its local minimal
    generalization; from these, the k-minimal
    generalizations and eventually a preferred
    generalization are chosen.
  • If policies are considered, the search has to be
    extended beyond the first result, which might be
    expensive.

20
IMPROVEMENT
  • Distance vector between tuples: Let
    $x = \langle v_1,\dots,v_n \rangle$ and
    $y = \langle v'_1,\dots,v'_n \rangle$ be two
    tuples of T. Their distance vector is
    $V_{x,y} = [d_1,\dots,d_n]$, where $d_i$ is the
    length of the paths from $v_i$ and $v'_i$ to
    their closest common ancestor in the VGH.
  • Theorem: Let $T_i$ and $T_j$ be two tables such
    that $T_i \preceq T_j$. If $T_j$ is a k-minimal
    generalization, then $DV_{i,j} = V_{x,y}$ for
    some tuples x and y in $T_i$ such that either x
    or y has fewer than k occurrences.
  • This implies that the distance vector of a
    minimal generalization falls within the set of
    vectors between outliers and other tuples in the
    table.
  • The authors exploit this property to prune the
    number of generalizations considered.
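
A distance vector between two tuples can be computed attribute by attribute as the lowest level at which the two values coincide. The sketch below assumes chain-shaped value hierarchies for a hypothetical (race, ZIP) pair, with the helper names and heights invented for the example.

```python
def race_at(value, level):
    """Assumed race hierarchy: level 0 is the value, level 1 is 'person'."""
    return value if level == 0 else "person"

def zip_at(value, level):
    """Assumed ZIP hierarchy: replace the last `level` digits with 0."""
    return value[:len(value) - level] + "0" * level

GENERALIZERS = (race_at, zip_at)
HEIGHTS = (1, 2)

def common_ancestor_level(a, b, gen, height):
    """Lowest level at which a and b generalize to the same value."""
    for level in range(height + 1):
        if gen(a, level) == gen(b, level):
            return level
    raise ValueError("values share no ancestor within the hierarchy")

def tuple_distance_vector(x, y):
    return tuple(common_ancestor_level(a, b, g, h)
                 for a, b, g, h in zip(x, y, GENERALIZERS, HEIGHTS))

print(tuple_distance_vector(("asian", "94138"), ("black", "94139")))  # (1, 1)
print(tuple_distance_vector(("asian", "94138"), ("asian", "94141")))  # (0, 2)
```

No table needs to be generalized to obtain these vectors, which is what makes them cheap candidates for pruning.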

21
ALGORITHM - OUTLINE
  • All the distinct tuples in $PT[QI]$ are
    determined, along with their number of
    occurrences.
  • The distance vectors between each outlier and
    every tuple in the table are computed.
  • A DAG is constructed whose nodes are all the
    distance vectors found.
  • There is an arc from each vector to each of the
    smallest vectors dominating it in the set.
  • Each path is followed until a local minimal
    generalization is found.
  • As paths may not be disjoint, visited nodes are
    tracked.
  • After all the paths are examined, the k-minimal
    and preferred generalizations are found.
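
The outline above can be sketched in Python as follows. This is a simplified illustration and not the authors' implementation: it ignores suppression and preference policies, assumes chain-shaped hierarchies for a hypothetical (race, ZIP) quasi-identifier, and uses invented function names and data.

```python
from collections import Counter

ZIP_HEIGHT = 2  # assumed: Z0 -> Z1 -> Z2; race has one level (R0 -> R1)

def generalize(row, vector):
    race, zip_code = row
    r, z = vector
    return ("person" if r else race, zip_code[:len(zip_code) - z] + "0" * z)

def is_k_anonymous(rows, k):
    return all(c >= k for c in Counter(rows).values())

def tuple_dv(x, y):
    """Distance vector between two tuples (lowest common level per attribute)."""
    race = 0 if x[0] == y[0] else 1
    for level in range(ZIP_HEIGHT + 1):
        if generalize(x, (0, level))[1] == generalize(y, (0, level))[1]:
            return (race, level)
    raise ValueError("ZIP codes share no ancestor within the hierarchy")

def strictly_less(u, v):
    return u != v and all(a <= b for a, b in zip(u, v))

def minimal_generalizations(rows, k):
    counts = Counter(rows)                                  # distinct tuples
    outliers = [t for t, c in counts.items() if c < k]
    if not outliers:
        return [(0, 0)]                                     # already k-anonymous
    nodes = {tuple_dv(o, t) for o in outliers for t in counts}
    # Arc from each vector to the smallest vectors dominating it in the set.
    arcs = {u: [v for v in nodes if strictly_less(u, v)
                and not any(strictly_less(u, w) and strictly_less(w, v)
                            for w in nodes)]
            for u in nodes}
    stack = [u for u in nodes if not any(strictly_less(w, u) for w in nodes)]
    visited, found = set(), []
    while stack:                                            # follow each path
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        if is_k_anonymous([generalize(r, node) for r in rows], k):
            found.append(node)                              # local minimal: stop
        else:
            stack.extend(arcs[node])
    return sorted(v for v in found
                  if not any(strictly_less(u, v) for u in found))

table = [(race, z) for race in ("asian", "black", "white")
         for z in ("94138", "94139", "94141", "94142")]
print(minimal_generalizations(table, 2))  # [(0, 1), (1, 0)]
print(minimal_generalizations(table, 3))  # [(0, 2), (1, 0)]
```

Only vectors that occur between an outlier and some tuple are ever tested, and the `visited` set prevents re-evaluating a node reached along two paths.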

22
EXISTENCE
  • Theorem: Let T be a table, $MaxSup \le |T|$ be
    the acceptable suppression threshold, and k be a
    natural number. If $|T| \ge k$, then there is at
    least one k-minimal generalization for T. If
    $|T| < k$, there are no non-empty k-minimal
    generalizations for T.
  • Experiments: cost reduction
  • Computing distance vectors greatly reduces the
    cost.
  • Generalizations are not computed but foreseen by
    looking at the tuples.
  • Because the algorithm keeps track of evaluated
    generalizations, it can stop evaluating whenever
    it reaches a path that has already been visited.