KAnonymity - PowerPoint PPT Presentation

Provided by: yan68
Transcript and Presenter's Notes

Title: KAnonymity


1
K-Anonymity
  • Presented by He Yan
  • Jan.27, 2009

2
Outline
  • Introduction
  • K-Anonymity
  • Vulnerabilities
  • Conclusion

3
Data Publishing and Data Privacy
  • Society is experiencing exponential growth in the
    number and variety of person-specific
    information.
  • This information is valuable in both research
    and business, so data sharing is common.
  • Publishing the data may put the respondents'
    privacy at risk.

4
Challenge
  • How do you publicly release a database without
    compromising individual privacy?
  • The wrong approach: just leave out unique
    identifiers such as name and SSN and hope that
    this works.
  • Why is it wrong? The triple (DOB, gender, ZIP
    code) suffices to uniquely identify at least 87%
    of US citizens in publicly available databases
    (1990 U.S. Census summary data).
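
A minimal sketch of how one might measure this kind of uniqueness on a table. The records below are hypothetical (not census data), and `unique_fraction` is an illustrative helper name, not something from the presentation.

```python
from collections import Counter

# Hypothetical records: (date of birth, gender, ZIP code). Not real data.
records = [
    ("1976-01-21", "M", "53703"),
    ("1976-01-21", "M", "53703"),
    ("1986-04-13", "F", "53706"),
    ("1976-02-28", "F", "53706"),
    ("1990-07-04", "M", "53715"),
]

def unique_fraction(rows):
    """Return the fraction of rows whose value combination occurs exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)

# In this toy table, 3 of the 5 rows are unique on the triple.
print(unique_fraction(records))
```

The higher this fraction, the more respondents can be singled out by anyone who knows the three attributes.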

5
Challenge (Cont'd)
  • Re-identification by linking (example)
  • Result of linking the two tables: Andre has
    heart disease!

[Figure: Hospital Patient Data table linked with Voter Registration Data table]
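
The linking attack can be sketched as a join on the quasi-identifier. Both tables below are hypothetical stand-ins for the hospital and voter tables in the figure, and the field names and the `link` helper are assumptions made for illustration.

```python
# Hypothetical tables; names and values are made up for illustration.
hospital = [  # names removed, but quasi-identifiers kept
    {"dob": "1964-01-21", "gender": "M", "zip": "53715", "problem": "Heart Disease"},
    {"dob": "1976-02-28", "gender": "F", "zip": "53703", "problem": "Flu"},
]
voters = [  # public list that still carries names
    {"name": "Andre", "dob": "1964-01-21", "gender": "M", "zip": "53715"},
    {"name": "Beth", "dob": "1976-02-28", "gender": "F", "zip": "53703"},
    {"name": "Carl", "dob": "1980-05-05", "gender": "M", "zip": "53706"},
]

QI = ("dob", "gender", "zip")

def link(medical, public, qi=QI):
    """Join the two tables on the quasi-identifier; return (name, problem) pairs."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[a] for a in qi), []).append(person["name"])
    matches = []
    for row in medical:
        names = index.get(tuple(row[a] for a in qi), [])
        if len(names) == 1:  # a unique match re-identifies the patient
            matches.append((names[0], row["problem"]))
    return matches

print(link(hospital, voters))
```

Removing the name column from the hospital table does not help here, because the three remaining attributes act as a key into the public table.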
6
Objective
  • Maximize data utility while limiting disclosure
    risk to an acceptable level
  • Moral: any real privacy guarantee must be proved
    and established mathematically.

7
Related Works
  • Statistical Databases
  • The most common approach is to add noise while
    still maintaining some statistical invariant.
  • Disadvantage: it destroys the integrity of the
    data.
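
As one possible illustration of noise addition with a preserved invariant, the sketch below perturbs each value and then shifts the result so the mean is unchanged. The choice of the mean as the invariant, the Gaussian noise, and the function name are all assumptions; the slide does not say which scheme it has in mind.

```python
import random

def perturb_preserving_mean(values, scale=5.0, seed=0):
    """Add Gaussian noise to each value, then shift so the mean is unchanged."""
    rng = random.Random(seed)
    noisy = [v + rng.gauss(0.0, scale) for v in values]
    # Correct the average drift introduced by the noise.
    shift = (sum(values) - sum(noisy)) / len(values)
    return [v + shift for v in noisy]

wages = [30.0, 42.0, 55.0, 61.0]  # hypothetical sensitive values
released = perturb_preserving_mean(wages)
print(released)
```

The mean survives, but no individual released value is true any more, which is the integrity problem noted above.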

8
Related Works (Cont'd)
  • Multi-level Databases
  • Data is stored at different security
    classifications, and users have different
    security clearances.
  • Precise inference is possible. Sensitive
    information is suppressed in order to prevent
    inference.
  • Disadvantages:
  • It is impossible to consider every possible
    inference.
  • Suppression can drastically reduce the quality of
    the data.

9
Related Works (Cont'd)
  • Computer Security
  • Access control and authentication ensure that
    the right people have the right authority over
    the right object at the right time and place.
  • That's not what this work tries to solve. This
    work tries to release as much information as
    possible while the identities of the subjects
    (people) stay protected.

10
Related Works (Cont'd)
  • In summary, the dramatic increase in the
    availability of data from autonomous data
    holders makes the problem more complex.
  • None of these works provides a solution for
    today's data-rich setting.

11
Outline
  • Introduction
  • K-Anonymity
  • Vulnerabilities
  • Conclusion

12
K-Anonymity
  • What is K-Anonymity?
  • If each row in the table cannot be distinguished
    from at least k-1 other rows by looking only at
    a set of attributes, then the table is
    k-anonymized on these attributes.
  • Example
  • Suppose you try to identify a person in a table,
    but the only information you have is their birth
    date and gender. If every birth date and gender
    combination in the table matches at least k
    people, the table adheres to k-anonymity.

13
Classification of Attributes
  • Key Attribute
  • Examples: name, address, cell phone number
  • An attribute that can uniquely identify an
    individual directly
  • Always removed before release
  • Quasi-Identifier
  • Example: 5-digit ZIP code, birth date, gender
  • A set of attributes that can potentially be
    linked with external information to re-identify
    entities
  • 87% of the U.S. population can be uniquely
    identified based on these attributes, according
    to the 1990 U.S. Census summary data.
  • Suppressed or generalized before release
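
Generalization and suppression of quasi-identifier values could look roughly like this. The helper names, the ISO date format, and the use of `*` as the mask character are illustrative assumptions.

```python
def generalize_zip(zip_code, keep_digits=3):
    """Keep the leading digits of a ZIP code and mask the rest with '*'."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)

def generalize_birth_date(date_iso):
    """Reduce an ISO date (YYYY-MM-DD) to its year."""
    return date_iso[:4]

def suppress(_value):
    """Replace a value entirely with '*'."""
    return "*"

print(generalize_zip("53706"))              # 537**
print(generalize_birth_date("1976-02-28"))  # 1976
print(suppress("Female"))                   # *
```

Generalization keeps part of the value and therefore some utility; suppression is the extreme case where nothing of the value remains.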

14
Classification of Attributes (Cont'd)
  • Sensitive Attribute
  • Examples: medical record, wage, etc.
  • Always released directly. These attributes are
    what the researchers need, so which ones are
    released depends on the requirement.

[Figure labels: Key Attribute, Quasi-Identifier, Sensitive Attribute]
15
K-Anonymity Protection Model
  • PT: Private Table
  • RT: Released Table
  • QI: Quasi-Identifier (Ai, ..., Aj)
  • (A1, A2, ..., An): Attributes
  • Definition
  • Let RT(A1, ..., An) be a table and QI_RT be the
    quasi-identifier associated with it. RT is said
    to satisfy k-anonymity if and only if each
    sequence of values in RT[QI_RT] appears with at
    least k occurrences in RT[QI_RT].
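
The definition translates fairly directly into a check: project the table onto the quasi-identifier and count each value sequence. This is a sketch; the dictionary-based row layout, the sample rows, and the name `is_k_anonymous` are assumptions.

```python
from collections import Counter

def is_k_anonymous(table, qi, k):
    """True if every QI value sequence in the table occurs at least k times."""
    counts = Counter(tuple(row[a] for a in qi) for row in table)
    return all(c >= k for c in counts.values())

# Hypothetical released table with generalized quasi-identifiers.
released = [
    {"zip": "537**", "gender": "F", "birth": "1976", "problem": "Flu"},
    {"zip": "537**", "gender": "F", "birth": "1976", "problem": "Hang Nail"},
    {"zip": "537**", "gender": "M", "birth": "1976", "problem": "Broken Arm"},
    {"zip": "537**", "gender": "M", "birth": "1976", "problem": "Flu"},
]

qi = ("zip", "gender", "birth")
print(is_k_anonymous(released, qi, 2))  # True
print(is_k_anonymous(released, qi, 3))  # False
```

Note that the sensitive attribute (`problem`) plays no role in the check; only the projection onto the QI matters.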

16
Example
17
Example
[Figure: Released Table and External Data Source]
Suppose you have an external data table. Even
after linking these two tables, you still don't
know Andre's problem.
18
Outline
  • Introduction
  • K-Anonymity
  • Vulnerabilities
  • Conclusion

19
K-Anonymity Vulnerabilities
  • Even when sufficient care is taken to identify
    the QI, k-anonymity is still vulnerable to
    attacks.
  • Attacks
  • Unsorted Matching Attack
  • Complementary Release Attack
  • Temporal Attack
  • Fortunately, these attacks can be prevented by
    following some best practices.

20
Unsorted Matching Attack
  • This attack is based on the order in which tuples
    appear in the released table.
  • Solution
  • Randomly sort the tuples before releasing.
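
A small sketch of the countermeasure: shuffle a copy of the rows before release so that tuple position carries no information. The function name and the choice of `random.SystemRandom` as the default source of randomness are assumptions.

```python
import random

def shuffled_release(rows, rng=None):
    """Return a randomly ordered copy of the rows; the input is left untouched."""
    rng = rng or random.SystemRandom()
    out = list(rows)
    rng.shuffle(out)
    return out

# Hypothetical generalized rows: (ZIP, gender, problem).
rows = [("537**", "F", "Flu"), ("537**", "M", "Broken Arm"), ("537**", "F", "Hang Nail")]
print(shuffled_release(rows))
```

Each release should be shuffled independently, so that two releases of the same table cannot be aligned row by row.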

21
Complementary Release Attack
  • Different releases can be linked together to
    compromise k-anonymity.
  • Solution
  • Consider all of the released tables before
    releasing the new one, and try to avoid linking.
  • Other data holders may release some data that can
    be used in this kind of attack.
  • In general, this kind of attack is hard to
    prevent completely.

22
Complementary Release Attack (Cont'd)
  • Both released tables are 2-anonymized, with
    QI = (Race, Birth, Gender, ZIP).
  • But linking them on Problem generates the linked
    table LT. See the next slide.

23
Complementary Release Attack (Cont'd)
  • In LT, (White, 1964, male, 02138) and (White,
    1965, female, 02139) are unique.
  • So LT does not satisfy 2-anonymity.
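
The attack can be reproduced on toy data. The two releases below are hypothetical (they are not the tables from the slides) and assume that each Problem value is unique within a release, which is what makes Problem usable as a join key.

```python
from collections import Counter

QI = ("race", "birth", "gender", "zip")

# Hypothetical releases of the same four patients (values are made up).
# Release A generalizes ZIP; release B generalizes birth year and gender.
release_a = [
    {"race": "White", "birth": "1964", "gender": "male", "zip": "0213*", "problem": "chest pain"},
    {"race": "White", "birth": "1964", "gender": "male", "zip": "0213*", "problem": "obesity"},
    {"race": "White", "birth": "1965", "gender": "female", "zip": "0213*", "problem": "hypertension"},
    {"race": "White", "birth": "1965", "gender": "female", "zip": "0213*", "problem": "short breath"},
]
release_b = [
    {"race": "White", "birth": "196*", "gender": "human", "zip": "02138", "problem": "chest pain"},
    {"race": "White", "birth": "196*", "gender": "human", "zip": "02139", "problem": "obesity"},
    {"race": "White", "birth": "196*", "gender": "human", "zip": "02139", "problem": "hypertension"},
    {"race": "White", "birth": "196*", "gender": "human", "zip": "02138", "problem": "short breath"},
]

def is_k_anonymous(table, k, qi=QI):
    """True if every QI value sequence occurs at least k times."""
    counts = Counter(tuple(row[a] for a in qi) for row in table)
    return all(c >= k for c in counts.values())

def link_on_problem(a, b):
    """Join on 'problem', taking birth and gender from a and ZIP from b."""
    by_problem = {row["problem"]: row for row in b}
    linked = []
    for row in a:
        other = by_problem.get(row["problem"])
        if other is not None:
            linked.append({"race": row["race"], "birth": row["birth"],
                           "gender": row["gender"], "zip": other["zip"],
                           "problem": row["problem"]})
    return linked

lt = link_on_problem(release_a, release_b)
print(is_k_anonymous(release_a, 2), is_k_anonymous(release_b, 2))  # True True
print(is_k_anonymous(lt, 2))  # False
```

Each release is safe on its own, but the join recovers the most specific value of every attribute, so every row of the linked table becomes unique.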

24
Temporal Attack
  • Adding or removing tuples may compromise
    k-anonymity protection.
  • Solution: subsequent releases must be based on
    the already released table.
  • ??

25
Outline
  • Introduction
  • K-Anonymity
  • Vulnerabilities
  • Conclusion

26
Summary
  • K-Anonymity: attributes are suppressed or
    generalized until each row is identical to at
    least k-1 other rows.
  • K-Anonymity thus can prevent definite external
    table linkages. At worst, the data released
    narrows down an individual entry to a group of k
    individuals.
  • K-Anonymity guarantees that the data released is
    accurate.

27
Open Issues
  • Identifying a proper quasi-identifier is a hard
    problem.
  • It depends on what the external tables look like.
  • It is hard to predict which external tables will
    be used to infer the sensitive information.
  • How can we find a k-anonymity solution that
    suppresses the fewest cells? This leads to the
    next paper.
  • We can suppress every cell, but this makes the
    data useless.
  • The cost of a k-anonymous solution to a database
    is the number of *s (suppressed cells)
    introduced.
  • A minimum-cost k-anonymity solution suppresses
    the fewest cells necessary to guarantee
    k-anonymity.
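
The cost measure is easy to state in code: count the suppressed cells. This sketch assumes a suppressed cell is written as the single character `*`, and the sample tables are hypothetical.

```python
def suppression_cost(table):
    """Count the cells that were replaced by '*'."""
    return sum(1 for row in table for cell in row if cell == "*")

# Two hypothetical 2-anonymous versions of the same rows: (ZIP, gender, birth year).
partly_suppressed = [
    ("53706", "*", "1976"),
    ("53706", "*", "1976"),
]
fully_suppressed = [
    ("*", "*", "*"),
    ("*", "*", "*"),
]

print(suppression_cost(partly_suppressed))  # 2
print(suppression_cost(fully_suppressed))   # 6
```

Both versions satisfy 2-anonymity, but the first keeps far more information, which is why the fewest-suppressions solution is the one worth searching for.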

28
Open Issues (Cont'd)
  • k-anonymity does not provide privacy if:
  • Sensitive values in an equivalence class lack
    diversity
  • The attacker has background knowledge
  • This leads to the l-diversity paper.

[Figure: a 3-anonymous patient table, annotated with "Lack of diversity" and "Background knowledge (Carl's brother has heart disease)"]
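
The lack-of-diversity problem can be detected by grouping rows into equivalence classes and looking for classes with a single sensitive value. The table below is hypothetical (it is not the table from the figure), and `homogeneous_classes` is an illustrative name.

```python
from collections import defaultdict

def homogeneous_classes(table, qi, sensitive):
    """Return the QI value sequences whose rows all share one sensitive value."""
    classes = defaultdict(set)
    for row in table:
        classes[tuple(row[a] for a in qi)].add(row[sensitive])
    return [key for key, values in classes.items() if len(values) == 1]

# Hypothetical 3-anonymous table.
patients = [
    {"zip": "4790*", "age": "2*", "problem": "Heart Disease"},
    {"zip": "4790*", "age": "2*", "problem": "Heart Disease"},
    {"zip": "4790*", "age": "2*", "problem": "Heart Disease"},
    {"zip": "4760*", "age": "3*", "problem": "Flu"},
    {"zip": "4760*", "age": "3*", "problem": "Cancer"},
    {"zip": "4760*", "age": "3*", "problem": "Heart Disease"},
]

print(homogeneous_classes(patients, ("zip", "age"), "problem"))
```

Anyone known to fall in the first class is revealed to have heart disease, even though the table is 3-anonymous and the individual row cannot be pinpointed.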
29
  • Thank you!