Protecting Respondents' Identities in Microdata Release

1
Protecting Respondents' Identities in Microdata Release
  • Sushil Jajodia

2
References
  • P. Samarati, "Protecting respondents' identities
    in microdata release," IEEE Trans. on Knowledge
    and Data Engineering, vol. 13, no. 6, 2001,
    pp. 1010-1027.
  • See also papers by Latanya Sweeney, CMU

3
Outline
  • The problem
  • Related work
  • Generalizing data
  • Suppressing data
  • Obtaining k-minimal generalization
  • Conclusion

4
Re-identifying Anonymous Data by Linking Attack
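  • Anonymous medical data (no names, but DOB, Sex,
    and Zip are released)
  • Publicly available voter list (Name, Address,
    City, Zip, DOB, Sex, ...)
  • Joining the two on (DOB, Sex, Zip) reveals: Sue
    Carlson has AIDS!
  • (DOB, Sex, Zip) is thus called a quasi-identifier
  • Assumption: the quasi-identifier is identified in
    advance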
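The linking attack above can be sketched in a few lines of Python. This is a minimal illustration that assumes two hypothetical in-memory tables (lists of dicts); the records, the second voter's name, and the helper names `qi_key` and `linking_attack` are invented for the example and are not the slide's data.

```python
# Hypothetical toy records, invented for illustration; they are not the
# tables from the slide.
medical = [  # released without names
    {"dob": "1964-09-27", "sex": "F", "zip": "22030", "disease": "AIDS"},
    {"dob": "1964-09-30", "sex": "M", "zip": "22031", "disease": "flu"},
]
voters = [  # public, with names
    {"name": "Sue Carlson", "dob": "1964-09-27", "sex": "F", "zip": "22030"},
    {"name": "John Doe", "dob": "1970-01-15", "sex": "M", "zip": "22041"},
]

QUASI_ID = ("dob", "sex", "zip")


def qi_key(row):
    """Project a record onto the quasi-identifier attributes."""
    return tuple(row[a] for a in QUASI_ID)


def linking_attack(released, public):
    """Join the two tables on the quasi-identifier and return
    (name, disease) for every released record matching exactly one voter."""
    names_by_qi = {}
    for voter in public:
        names_by_qi.setdefault(qi_key(voter), []).append(voter["name"])
    hits = []
    for record in released:
        names = names_by_qi.get(qi_key(record), [])
        if len(names) == 1:  # a unique match re-identifies the respondent
            hits.append((names[0], record["disease"]))
    return hits


print(linking_attack(medical, voters))  # [('Sue Carlson', 'AIDS')]
```

No name column is needed in the released table; the three quasi-identifier attributes alone are enough to single out the respondent.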

5
K-anonymity
  • Anonymous medical data
  • Publicly available voter list
  • Who has AIDS?
  • k = 2: each (DOB, Sex, Zip) combination in the
    released data is shared by at least two
    respondents
  • The quasi-identifier is identified as
    (DOB, Sex, Zip)
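A minimal sketch of the k-anonymity check, assuming the usual sufficient condition that every combination of quasi-identifier values in the released table appears in at least k tuples. The released table and the `is_k_anonymous` helper are hypothetical.

```python
from collections import Counter


def is_k_anonymous(table, quasi_id, k):
    """True if every combination of quasi-identifier values that occurs in
    `table` occurs in at least k tuples."""
    counts = Counter(tuple(row[a] for a in quasi_id) for row in table)
    return all(c >= k for c in counts.values())


# Hypothetical released table (DOB reduced to the year, Zip to four digits).
released = [
    {"dob": "1964", "sex": "F", "zip": "2203", "disease": "AIDS"},
    {"dob": "1964", "sex": "F", "zip": "2203", "disease": "flu"},
    {"dob": "1970", "sex": "M", "zip": "2204", "disease": "flu"},
    {"dob": "1970", "sex": "M", "zip": "2204", "disease": "asthma"},
]
QI = ("dob", "sex", "zip")

print(is_k_anonymous(released, QI, 2))  # True
print(is_k_anonymous(released, QI, 3))  # False
```

With k = 2, a voter-list lookup on (1964, F, 2203) now returns two candidates, so the attacker cannot tell which of them has AIDS.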

6
Related Work
  • The release of macrodata (i.e., tabular data
    containing aggregates) without privacy breaches
    has been studied more extensively than the
    release of microdata (i.e., specific tuples)
  • Many existing approaches to microdata release
    are based on perturbation (adding noise), which
    sacrifices the truthfulness of the released data
  • Others lack a formal framework
  • This work gives a formal foundation for the
    problem (by formalizing k-anonymity, etc.),
    provides two methods to achieve the goal
    (generalization and suppression), and presents
    algorithms for computing the desired result

7
Generalizing Data: Generalization Hierarchies
  • Domain generalization hierarchy (totally ordered)
    • Zip (Z): Z0 = {22030, 22031, 22041, 22044}
      → Z1 = {2203, 2204} → Z2 = {220}
    • Race (R): R0 = {asian, black, white}
      → R1 = {person}
  • Value generalization hierarchy
    • Zip: 22030, 22031 → 2203; 22041, 22044 → 2204;
      2203, 2204 → 220
    • Race: asian, black, white → person
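The value generalization hierarchies on this slide can be encoded as child-to-parent maps, so that generalizing a value one level is a single lookup. This is a sketch under that representation choice; `ZIP_VGH`, `RACE_VGH`, and `generalize` are illustrative names, not part of the original work.

```python
# Value generalization hierarchies from the slide, as child -> parent maps.
ZIP_VGH = {
    "22030": "2203", "22031": "2203",
    "22041": "2204", "22044": "2204",
    "2203": "220", "2204": "220",
}
RACE_VGH = {"asian": "person", "black": "person", "white": "person"}


def generalize(value, vgh, levels):
    """Climb `levels` steps up a value generalization hierarchy.
    Raises KeyError if asked to climb past the top of the hierarchy."""
    for _ in range(levels):
        value = vgh[value]
    return value


print(generalize("22031", ZIP_VGH, 1))   # 2203
print(generalize("22031", ZIP_VGH, 2))   # 220
print(generalize("white", RACE_VGH, 1))  # person
```

Because each domain hierarchy is totally ordered, a single integer level per attribute is enough to describe how far that attribute has been generalized.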

8
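Generalizing Data: Table Generalization
  • Generalization hierarchy (partially ordered),
    from top to bottom:
    ⟨R1,Z2⟩
    ⟨R1,Z1⟩  ⟨R0,Z2⟩
    ⟨R1,Z0⟩  ⟨R0,Z1⟩
    ⟨R0,Z0⟩
  • Example tables on the slide: PT (the original
    private table), GT[1,0] (at ⟨R1,Z0⟩), and
    GT[0,1] (at ⟨R0,Z1⟩)
  • Table generalization: GT generalizes PT if
    • Both tables have the same number of tuples
    • Every domain of GT is a generalization of (or
      the same as) the corresponding domain of PT
    • Every value is generalized: a bijection exists
      between the tuples of PT and GT such that each
      tuple of GT generalizes its counterpart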
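The partially ordered hierarchy of table generalizations is the product of the two domain hierarchies. The sketch below assumes distance vectors written in ⟨Race, Zip⟩ order, matching the labels GT[1,0] and GT[0,1]; it enumerates the nodes by height and tests the ordering. The helper names are hypothetical.

```python
from itertools import product

# Heights of the two domain generalization hierarchies on the slide:
# Race: R0 -> R1 (height 1), Zip: Z0 -> Z1 -> Z2 (height 2).
HEIGHTS = (1, 2)  # order <Race, Zip>, as in the slide's lattice


def lattice(heights):
    """All distance vectors [r, z] of the generalization hierarchy."""
    return list(product(*(range(h + 1) for h in heights)))


def generalizes(upper, lower):
    """True if `upper` is at or above `lower` in the partial order,
    i.e. every attribute is generalized at least as much."""
    return all(u >= l for u, l in zip(upper, lower))


def height(vector):
    """Length of the path from the vector down to the bottom <R0,Z0>."""
    return sum(vector)


nodes = lattice(HEIGHTS)
for h in range(sum(HEIGHTS), -1, -1):
    print(h, [f"<R{r},Z{z}>" for r, z in nodes if height((r, z)) == h])
# 3 ['<R1,Z2>']
# 2 ['<R0,Z2>', '<R1,Z1>']
# 1 ['<R0,Z1>', '<R1,Z0>']
# 0 ['<R0,Z0>']
```

Vectors such as [1,0] and [0,1] are incomparable under `generalizes`, which is why the hierarchy is only partially ordered.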

9
Generalizing Data: k-minimal Generalization
  • k-minimal generalization: a generalized table GT
    of PT such that
    • GT satisfies k-anonymity
    • GT is minimal: no other generalized table lying
      below GT in the hierarchy (i.e., generalized by
      GT) also satisfies k-anonymity
  • For example, GT[1,0] is a 2-minimal
    generalization and GT[0,1] is a 3-minimal
    generalization
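Using the two hierarchies, k-minimal generalizations can be found by brute force over all distance vectors: keep the vectors whose generalized table is k-anonymous, then drop any vector that has a satisfying vector strictly below it. This is a naive sketch on a hypothetical four-tuple table projected on (Race, Zip), not the slide's table; all helper names are illustrative.

```python
from collections import Counter
from itertools import product

# Hierarchies from the slides, ordered <Race, Zip> as in the lattice.
ZIP_VGH = {"22030": "2203", "22031": "2203", "22041": "2204",
           "22044": "2204", "2203": "220", "2204": "220"}
RACE_VGH = {"asian": "person", "black": "person", "white": "person"}
HIERARCHIES = (RACE_VGH, ZIP_VGH)
HEIGHTS = (1, 2)


def climb(value, vgh, levels):
    """Generalize a single value `levels` steps up its hierarchy."""
    for _ in range(levels):
        value = vgh[value]
    return value


def generalize_table(pt, vector):
    """Generalize every (race, zip) tuple of pt to the levels in `vector`."""
    return [tuple(climb(v, vgh, lvl)
                  for v, vgh, lvl in zip(row, HIERARCHIES, vector))
            for row in pt]


def k_anonymous(table, k):
    """Every distinct tuple (quasi-identifier value) occurs at least k times."""
    return all(c >= k for c in Counter(table).values())


def strictly_below(a, b):
    """a lies strictly below b in the partially ordered hierarchy."""
    return a != b and all(x <= y for x, y in zip(a, b))


def k_minimal(pt, k):
    """Satisfying vectors with no satisfying vector strictly below them."""
    nodes = product(*(range(h + 1) for h in HEIGHTS))
    ok = [v for v in nodes if k_anonymous(generalize_table(pt, v), k)]
    return [v for v in ok if not any(strictly_below(w, v) for w in ok)]


# Hypothetical private table projected on the quasi-identifier (Race, Zip).
PT = [("asian", "22030"), ("black", "22030"),
      ("asian", "22031"), ("black", "22031")]

print(k_minimal(PT, 2))  # [(0, 1), (1, 0)]
print(k_minimal(PT, 3))  # [(1, 1)]
```

For k = 2 this toy table has two incomparable 2-minimal generalizations, so some preference policy is still needed to choose between them.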

10
Suppressing Data: Why? How?
  • Had the last tuple not been there, a one-step
    generalization on Zip would have been enough
  • Suppressing the last tuple lowers the level of
    generalization needed, and thus makes the
    released data more accurate
  • Given a generalization level, the minimal
    required suppression removes all and only the
    tuples that violate k-anonymity (i.e., whose
    quasi-identifier values occur fewer than k times)
  • The definition of k-minimal generalization is
    revised to allow suppression, provided that the
    number of suppressed tuples does not exceed a
    given threshold
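Minimal required suppression can be sketched as follows, assuming the table has already been generalized and is represented as a list of quasi-identifier tuples. The threshold parameter `max_suppressed` and both function names are hypothetical.

```python
from collections import Counter


def minimal_suppression(table, k):
    """Split a generalized table into the tuples to keep and the tuples to
    suppress: exactly those whose quasi-identifier value occurs fewer than
    k times are suppressed."""
    counts = Counter(table)
    kept = [t for t in table if counts[t] >= k]
    suppressed = [t for t in table if counts[t] < k]
    return kept, suppressed


def acceptable(table, k, max_suppressed):
    """The generalization is acceptable if reaching k-anonymity requires
    suppressing at most `max_suppressed` tuples (the threshold)."""
    return len(minimal_suppression(table, k)[1]) <= max_suppressed


# Hypothetical table already generalized one step on Zip (<R0,Z1>); the last
# tuple is the only one whose quasi-identifier value is unique.
gt = [("asian", "2203"), ("asian", "2203"),
      ("black", "2203"), ("black", "2203"), ("white", "2204")]

kept, suppressed = minimal_suppression(gt, 2)
print(suppressed)            # [('white', '2204')]
print(acceptable(gt, 2, 1))  # True
print(acceptable(gt, 2, 0))  # False
```

Suppression removes whole quasi-identifier groups, so the counts of the kept tuples are unchanged and the remaining table is k-anonymous.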

11
Computing k-minimal Generalization
  • The naïve approach
    • Search each path of the table generalization
      hierarchy, bottom-up, for a locally minimal
      generalization
    • Then compare these locally minimal
      generalizations and choose the globally
      preferred result
  • Binary search
    • Based on a simple fact: if a generalization
      fails the k-anonymity requirement, then every
      generalization below it in the hierarchy fails,
      too
    • Do a binary search on the height (length of
      the path to the bottom) of the generalizations
  • Preference policies
    • For example, if the first hierarchy has 100
      elements while the second has two, then [1,0]
      may be much better than [0,1]
    • As another example, the result requiring the
      least suppression may be preferred over those
      with less generalization
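The binary search can be sketched as below. This is an illustration under stated assumptions, not the paper's algorithm verbatim: the toy table, the suppression threshold `max_sup`, the helper names, and the tie-breaking preference policy (fewest suppressed tuples, then smallest vector) are all hypothetical. Suppression is folded into the test at each node, as on the suppression slide.

```python
from collections import Counter
from itertools import product

ZIP_VGH = {"22030": "2203", "22031": "2203", "22041": "2204",
           "22044": "2204", "2203": "220", "2204": "220"}
RACE_VGH = {"asian": "person", "black": "person", "white": "person"}
HIERARCHIES = (RACE_VGH, ZIP_VGH)  # order <Race, Zip>
HEIGHTS = (1, 2)
NODES = list(product(*(range(h + 1) for h in HEIGHTS)))


def climb(value, vgh, levels):
    for _ in range(levels):
        value = vgh[value]
    return value


def generalize_table(pt, vector):
    return [tuple(climb(v, vgh, lvl)
                  for v, vgh, lvl in zip(row, HIERARCHIES, vector))
            for row in pt]


def suppression_needed(pt, vector, k):
    """Number of tuples to suppress after generalizing pt to `vector`."""
    table = generalize_table(pt, vector)
    counts = Counter(table)
    return sum(1 for t in table if counts[t] < k)


def solutions_at(pt, height, k, max_sup):
    """Vectors of the given height meeting k-anonymity within the threshold."""
    return [v for v in NODES
            if sum(v) == height and suppression_needed(pt, v, k) <= max_sup]


def lowest_solutions(pt, k, max_sup):
    """Binary search on height. If a vector of height h is a solution, so is
    any vector one step above it (groups only merge), hence "a solution
    exists at height h" is monotone in h."""
    low, high = 0, sum(HEIGHTS)
    if not solutions_at(pt, high, k, max_sup):
        return []  # even the most general table needs too much suppression
    while low < high:
        mid = (low + high) // 2
        if solutions_at(pt, mid, k, max_sup):
            high = mid
        else:
            low = mid + 1
    return solutions_at(pt, low, k, max_sup)


def preferred(pt, k, max_sup):
    """Hypothetical preference policy: least suppression, ties by vector."""
    return min(lowest_solutions(pt, k, max_sup),
               key=lambda v: (suppression_needed(pt, v, k), v), default=None)


# Hypothetical private table (Race, Zip) whose last tuple is an outlier.
PT = [("asian", "22030"), ("asian", "22031"),
      ("black", "22030"), ("black", "22031"), ("white", "22044")]

print(lowest_solutions(PT, 2, 0))  # [(1, 2)]
print(lowest_solutions(PT, 2, 1))  # [(0, 1), (1, 0)]
print(preferred(PT, 2, 1))         # (0, 1)
```

With no suppression allowed, only the most general table ⟨R1,Z2⟩ works; allowing one suppressed tuple (the outlier) brings the solution down to height 1. The vectors returned at the lowest satisfying height are k-minimal, but k-minimal generalizations that lie at greater heights on other paths are not reported by this search.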

12
Conclusion
  • The problem is to release specific tuples
    (microdata) without exposing individuals'
    privacy to linking attacks
  • The goal is formalized as k-anonymity (i.e.,
    every released tuple can be indistinctly matched
    to at least k individuals)
  • Generalization releases less specific data so
    that k-anonymity can be achieved (e.g.,
    22030 → 220)
  • Suppression removes some of the tuples in order
    to avoid excessive generalization
  • The combination of the two methods yields the
    best result
  • By exploiting the hierarchies, binary search can
    locate the desired optimal solution more quickly