KAnonymity - PowerPoint PPT Presentation


Provided by: yan68

Title: KAnonymity

1
K-Anonymity

Presented by He Yan
Jan. 27, 2009

2
Outline

Introduction
K-Anonymity
Vulnerabilities
Conclusion

3
Data Publishing and Data Privacy

Society is experiencing exponential growth in the
number and variety of person-specific
information.
This information is valuable both in research
and business. Data sharing is common.
Publishing the data may put the respondents'
privacy at risk.

4
Challenge

How do you publicly release a database without
compromising individual privacy?
The Wrong Approach
Just leave out any unique identifiers like name
and SSN and hope that this works.
Why?
The triple (DOB, gender, ZIP code) suffices to
uniquely identify at least 87% of US citizens in
publicly available databases (1990 U.S. Census
summary data).
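
To make the uniqueness claim concrete, here is a minimal sketch that measures how many records in a table are the only ones with their (DOB, gender, ZIP) combination. The records and the helper name `unique_fraction` are hypothetical and chosen only for illustration; they are not the census data behind the 87% figure.

```python
from collections import Counter

# Hypothetical toy records: (date of birth, gender, 5-digit ZIP).
# Names and SSNs are assumed to have been removed already.
records = [
    ("1/21/76", "Male", "53715"),
    ("4/13/86", "Female", "53715"),
    ("2/28/76", "Male", "53703"),
    ("1/21/76", "Male", "53703"),
    ("4/13/86", "Female", "53706"),
    ("4/13/86", "Female", "53706"),
]

def unique_fraction(rows):
    """Fraction of rows whose (DOB, gender, ZIP) triple occurs exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)

fraction = unique_fraction(records)
print(f"{fraction:.0%} of the toy records are unique on (DOB, gender, ZIP)")
```

Only the last two toy records share a triple, so four of the six remain unique even though no name or SSN is present.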

5
Challenge (Cont'd)

Re-identification by linking (Example)
Andre has heart disease!

Hospital Patient Data
Voter Registration Data
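
The linking step can be sketched as a join on the shared attributes. The two tables below are hypothetical stand-ins for the hospital and voter registration data (only the name Andre and his diagnosis come from the slide); the function name `link` and the attribute names are assumptions made for this example.

```python
# Hypothetical "anonymized" hospital data: names removed, other attributes kept.
hospital = [
    {"zip": "53715", "gender": "Male", "dob": "1/21/76", "problem": "Heart Disease"},
    {"zip": "53715", "gender": "Female", "dob": "4/13/86", "problem": "Hepatitis"},
    {"zip": "53703", "gender": "Male", "dob": "2/28/76", "problem": "Broken Arm"},
]

# Hypothetical public voter registration list: names plus the same attributes.
voters = [
    {"name": "Andre", "zip": "53715", "gender": "Male", "dob": "1/21/76"},
    {"name": "Beth", "zip": "55410", "gender": "Female", "dob": "1/10/81"},
]

QI = ("zip", "gender", "dob")

def link(released, external, qi=QI):
    """Join the tables on the shared attributes.

    Returns (name, problem) pairs for people who match exactly one released row.
    """
    matches = []
    for person in external:
        key = tuple(person[a] for a in qi)
        hits = [r for r in released if tuple(r[a] for a in qi) == key]
        if len(hits) == 1:  # a single candidate row means re-identification
            matches.append((person["name"], hits[0]["problem"]))
    return matches

reidentified = link(hospital, voters)
print(reidentified)
```

In this toy setup the join singles out one hospital row for Andre, which is exactly the disclosure the slide describes.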

6
Objective

Maximize data utility while limiting disclosure
risk to an acceptable level
Moral: Any real privacy guarantee must be proved
and established mathematically.

7
Related Works

Statistical Databases
The most common approach is to add noise while
still maintaining some statistical invariant.
Disadvantage
Destroys the integrity of the data.
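
As one possible illustration of noise addition, the sketch below perturbs each value with Gaussian noise and then shifts the results so that the mean (the invariant chosen here) is unchanged. The function name, the noise scale, and the wage figures are assumptions for this example, not a method taken from the presentation.

```python
import random

def add_noise_preserving_mean(values, scale=5.0, seed=0):
    """Perturb each value with Gaussian noise, then re-center to keep the mean."""
    rng = random.Random(seed)
    noisy = [v + rng.gauss(0.0, scale) for v in values]
    # Shift every noisy value by the same amount so the overall mean is restored.
    shift = (sum(values) - sum(noisy)) / len(values)
    return [v + shift for v in noisy]

# Hypothetical wages (in thousands).
wages = [30.0, 42.0, 55.0, 61.0, 78.0]
released_wages = add_noise_preserving_mean(wages)

mean_before = sum(wages) / len(wages)
mean_after = sum(released_wages) / len(released_wages)
print(mean_before, mean_after)
```

The mean survives, but no individual released value is the true one, which is the loss of integrity noted above.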

8
Related Works (Cont'd)

Multi-level Databases
Data is stored at different security
classifications, and users have different
security clearances.
Precise inference is possible. Sensitive
information is suppressed in order to prevent
inference.
Disadvantages
It is impossible to consider every possible
inference
Suppression can drastically reduce the quality of
the data.

9
Related Works (Cont'd)

Computer Security
Access control and authentication ensure that
the right people have the right authority over
the right object at the right time and place.
That's not what this work tries to solve. This
work tries to release as much information as
possible while the identities of the subjects
(people) are protected.

10
Related Works (Cont'd)

In summary, the dramatic increase in availability
of data from autonomous data holders makes the
problem more complex.
None of these works provides a solution for
today's data-rich setting.

11
Outline

Introduction
K-Anonymity
Vulnerabilities
Conclusion

12
K-Anonymity

What is K-Anonymity?
If each row in the table cannot be distinguished
from at least k-1 other rows by looking only at a
set of attributes, then the table is
k-anonymized on these attributes.
Example
Suppose you try to identify a person in a table,
and the only information you have is his birth
date and gender. If at least k people match that
information, the table adheres to k-anonymity.

13
Classification of Attributes

Key Attribute
Name, address, cell phone
Can uniquely identify an individual directly.
Always removed before release.
Quasi-Identifier
5-digit ZIP code, birth date, gender
A set of attributes that can potentially be
linked with external information to re-identify
entities.
87% of the U.S. population can be uniquely
identified based on these attributes, according
to the 1990 Census summary data.
Suppressed or generalized.
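
A minimal sketch of what suppressing or generalizing a quasi-identifier can look like. The function names, the choice of keeping three ZIP digits, and the M/D/YY date format are assumptions made for illustration.

```python
def generalize_zip(zip_code, keep_digits=3):
    """Keep the leading digits of a ZIP code and mask the rest with '*'."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)

def generalize_birth_date(date_str):
    """Reduce an M/D/YY date string to its year part."""
    return date_str.split("/")[-1]

def suppress(_value):
    """Replace a value entirely with '*'."""
    return "*"

# Hypothetical record holding only quasi-identifier attributes.
row = {"zip": "53706", "dob": "2/28/76", "gender": "Female"}

generalized = {
    "zip": generalize_zip(row["zip"]),
    "dob": generalize_birth_date(row["dob"]),
    "gender": suppress(row["gender"]),
}
print(generalized)
```

Generalization keeps a coarser but still truthful value, while suppression removes the value altogether.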

14
Classification of Attributes (Cont'd)

Sensitive Attribute
Medical records, wages, etc.
Always released directly. These attributes are
what the researchers need. It depends on the
requirement.

Key Attribute
Quasi-Identifier
Sensitive Attribute

15
K-Anonymity Protection Model

PT: Private Table
RT: Released Table
QI: Quasi-Identifier $(A_i, \ldots, A_j)$
$(A_1, A_2, \ldots, A_n)$: Attributes
Definition
Let $RT(A_1, \ldots, A_n)$ be a table and $QI_{RT}$ be the
quasi-identifier associated with it. $RT$ is said
to satisfy k-anonymity if and only if each
sequence of values in $RT[QI_{RT}]$ appears with at
least $k$ occurrences in $RT[QI_{RT}]$.
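
The definition translates fairly directly into a check: project the table onto the quasi-identifier and count how often each value sequence occurs. The sketch below assumes rows are dictionaries; the function name `is_k_anonymous` and the sample rows are hypothetical.

```python
from collections import Counter

def is_k_anonymous(table, qi, k):
    """True if every sequence of QI values occurs in at least k rows of the table."""
    # Project each row onto the quasi-identifier attributes and count duplicates.
    counts = Counter(tuple(row[attr] for attr in qi) for row in table)
    return all(count >= k for count in counts.values())

QI = ("zip", "dob", "gender")

# Hypothetical released table with generalized quasi-identifier values.
released = [
    {"zip": "537**", "dob": "76", "gender": "*", "problem": "Flu"},
    {"zip": "537**", "dob": "76", "gender": "*", "problem": "Broken Arm"},
    {"zip": "537**", "dob": "86", "gender": "*", "problem": "Hang Nail"},
    {"zip": "537**", "dob": "86", "gender": "*", "problem": "Flu"},
]

print(is_k_anonymous(released, QI, k=2))
print(is_k_anonymous(released, QI, k=3))
```

The sensitive attribute (`problem` here) is deliberately left out of the projection, since the definition only constrains the quasi-identifier columns.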

16
Example

17
Example

Released Table
External Data Source
Suppose you have an external data table. By
linking these two tables, you still don't know
Andre's problem.

18
Outline

Introduction
K-Anonymity
Vulnerabilities
Conclusion

19
K-Anonymity Vulnerabilities

Even when sufficient care is taken to identify
the QI, K-Anonymity is still vulnerable to
attacks.
Attacks
Unsorted Matching Attack
Complementary Release Attack
Temporal Attack
Fortunately, these attacks can be prevented by
following some best practices.

20
Unsorted Matching Attack

This attack is based on the order in which tuples
appear in the released table.
Solution
Randomly sort the tuples before releasing.
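
A minimal sketch of the shuffling step, assuming the release is held as a list of tuples. The function name `shuffle_for_release` and the sample rows are hypothetical; `random.SystemRandom` is used by default so the order does not depend on a predictable seed.

```python
import random

def shuffle_for_release(rows, rng=None):
    """Return a copy of the rows in random order; the input list is left untouched."""
    rng = rng or random.SystemRandom()
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical generalized tuples: (ZIP, birth year, problem).
table = [
    ("537**", "76", "Flu"),
    ("537**", "76", "Broken Arm"),
    ("537**", "86", "Hang Nail"),
    ("537**", "86", "Flu"),
]

# A seeded generator is passed here only to make the example repeatable.
released_rows = shuffle_for_release(table, random.Random(42))
print(released_rows)
```

Each release should be shuffled independently, so that row position carries no information that links one release to another.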

21
Complementary Release Attack

Different releases can be linked together to
compromise k-anonymity.
Solution
Consider all of the released tables before
releasing the new one, and try to avoid linking.
Other data holders may release some data that can
be used in this kind of attack.
In general, this kind of attack is hard to
prevent completely.

22
Complementary Release Attack (Cont'd)

Both released tables are 2-anonymized, and the QI
is {Race, Birth, Gender, ZIP}.
But linking them on Problem generates the table
LT (see next slide).

23
Complementary Release Attack (Cont'd)

In LT, the tuples (White, 1964, male, 02138) and
(White, 1965, female, 02139) are unique, so LT
doesn't satisfy 2-anonymity.
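
The attack can be reproduced on a small scale. The two releases below are hypothetical (the original tables are not in this transcript) and were constructed so that each is 2-anonymous on its own while the linked table contains the two unique tuples named above. The sketch also assumes every Problem value is unique, so Problem can serve as the join key.

```python
from collections import Counter

QI = ("race", "birth", "gender", "zip")

def is_k_anonymous(table, k, qi=QI):
    """True if every combination of QI values occurs in at least k rows."""
    counts = Counter(tuple(row[a] for a in qi) for row in table)
    return all(c >= k for c in counts.values())

def make_rows(rows):
    """Build row dictionaries from (race, birth, gender, zip, problem) tuples."""
    return [dict(zip(QI + ("problem",), r)) for r in rows]

# Hypothetical release A: gender suppressed, ZIP generalized.
release_a = make_rows([
    ("White", "1964", "*", "0213*", "chest pain"),
    ("White", "1964", "*", "0213*", "obesity"),
    ("White", "1965", "*", "0213*", "hypertension"),
    ("White", "1965", "*", "0213*", "short breath"),
])

# Hypothetical release B of the same data: birth year generalized instead.
release_b = make_rows([
    ("White", "196*", "male", "02138", "chest pain"),
    ("White", "196*", "female", "02139", "obesity"),
    ("White", "196*", "male", "02138", "hypertension"),
    ("White", "196*", "female", "02139", "short breath"),
])

def link_on_problem(a, b):
    """Join two releases on 'problem', keeping the less generalized QI value."""
    b_by_problem = {row["problem"]: row for row in b}
    linked = []
    for row_a in a:
        row_b = b_by_problem[row_a["problem"]]
        merged = {"problem": row_a["problem"]}
        for attr in QI:
            # The value with fewer '*' characters is the more specific one.
            merged[attr] = min(row_a[attr], row_b[attr], key=lambda v: v.count("*"))
        linked.append(merged)
    return linked

lt = link_on_problem(release_a, release_b)
print(is_k_anonymous(release_a, 2), is_k_anonymous(release_b, 2), is_k_anonymous(lt, 2))
```

Each release hides a different attribute, so the join recovers the full quasi-identifier for every row.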

24
Temporal Attack

Adding or removing tuples may compromise
k-anonymity protection.
Solution: Subsequent releases must be based on
the already released table.

25
Outline

Introduction
K-Anonymity
Vulnerabilities
Conclusion

26
Summary

K-Anonymity: attributes are suppressed or
generalized until each row is identical to at
least k-1 other rows.
K-Anonymity thus can prevent definite external
table linkages. At worst, the data released
narrows down an individual entry to a group of k
individuals.
K-Anonymity guarantees that the data released is
accurate.

27
Open Issues

How to identify a proper quasi-identifier is a
hard problem.
It depends on what the external table looks like.
It is hard to predict what external tables will
be used to infer the sensitive information.
How can we find a k-anonymity solution that
suppresses the fewest cells? This leads to the
next paper.
We can suppress every cell, but this makes the
data useless.
The cost of a k-anonymous solution to a database
is the number of *s introduced.
A minimum cost k-anonymity solution suppresses
the fewest number of cells necessary to guarantee
k-anonymity.
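
The cost measure can be sketched by counting suppressed cells and comparing candidate solutions. The tables and function names below are hypothetical; the sketch only evaluates two hand-written candidates and does not search for the minimum-cost solution.

```python
from collections import Counter

def suppression_cost(table):
    """Number of suppressed cells, i.e. cells equal to '*'."""
    return sum(1 for row in table for cell in row if cell == "*")

def is_k_anonymous(table, k):
    """True if every row (taken as a whole tuple) occurs at least k times."""
    counts = Counter(table)
    return all(c >= k for c in counts.values())

# Hypothetical quasi-identifier table: (birth year, gender, ZIP).
original = [
    ("1964", "male", "02138"),
    ("1964", "female", "02138"),
    ("1965", "male", "02139"),
    ("1965", "female", "02139"),
]

# Candidate 1: suppress only the gender column.
suppress_gender = [(birth, "*", zip_code) for birth, _gender, zip_code in original]

# Candidate 2: suppress every cell (always k-anonymous, but useless).
suppress_all = [("*", "*", "*") for _ in original]

for name, candidate in [("gender only", suppress_gender), ("everything", suppress_all)]:
    print(name, is_k_anonymous(candidate, 2), suppression_cost(candidate))
```

Both candidates are 2-anonymous, but suppressing only gender introduces 4 stars while suppressing everything introduces 12.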