Anonymizing Tables for Privacy Protection

Gagan Aggarwal, Tomás Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, An Zhu

Slides: 17

Provided by: cryptoSta

Learn more at: https://crypto.stanford.edu

Transcript and Presenter's Notes

1
Anonymizing Tables for Privacy Protection
Gagan Aggarwal, Tomás Feder, Krishnaram
Kenthapadi, Rajeev Motwani, Rina Panigrahy,
Dilys Thomas, An Zhu
2
An example: Medical Records
SSN and Name are identifying attributes; Disease is the sensitive attribute.
SSN  Name   Age  Race   Zipcode  Disease
614  Sara   31   Cauc   94305    Flu
615  Joan   34   Cauc   94307    Cold
629  Kelly  27   Cauc   94301    Diabetes
710  Mike   41   Afr-A  94305    Flu
840  Carl   41   Afr-A  94059    Arthritis
780  Joe    65   Hisp   94042    Heart problem
614  Rob    46   Hisp   94042    Arthritis
3
Medical Records: De-identify and Release
Disease is the sensitive attribute.
Age  Race   Zipcode  Disease
31   Cauc   94305    Flu
34   Cauc   94307    Cold
27   Cauc   94301    Diabetes
41   Afr-A  94305    Flu
41   Afr-A  94059    Arthritis
65   Hisp   94042    Heart problem
46   Hisp   94042    Arthritis
4
Not sufficient! [Swe02, SS98]
Disease is the sensitive attribute.
Age  Race   Zipcode  Disease
31   Cauc   94305    Flu
34   Cauc   94307    Cold
27   Cauc   94301    Diabetes
41   Afr-A  94305    Flu
41   Afr-A  94059    Arthritis
65   Hisp   94042    Heart problem
46   Hisp   94042    Arthritis
Linked with a public database, the remaining attributes can uniquely identify you!
5
k-anonymity: Problem Definition

Input: A database consisting of n rows, each with m attributes drawn from a finite alphabet.
Goal: Suppress some entries in the table such that each modified row becomes identical to at least k-1 other rows.
The more entries are suppressed, the lower the utility of the modified table.
Objective: Minimize the number of suppressed entries.
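
The definition above can be illustrated with a minimal Python sketch. The function names is_k_anonymous and suppression_cost, the toy table, and the use of "*" to mark a suppressed entry are assumptions made for this example; they are not taken from the slides.

```python
from collections import Counter

STAR = "*"  # assumed marker for a suppressed entry


def is_k_anonymous(rows, k):
    """Return True if every row is identical to at least k-1 other rows."""
    counts = Counter(tuple(row) for row in rows)
    return all(count >= k for count in counts.values())


def suppression_cost(rows):
    """Return the number of suppressed entries in the table."""
    return sum(entry == STAR for row in rows for entry in row)


# Hypothetical already-suppressed table with columns Age, Race, Zipcode.
table = [
    ("41", "Afr-A", STAR),
    ("41", "Afr-A", STAR),
    (STAR, "Hisp", "94042"),
    (STAR, "Hisp", "94042"),
]

print(is_k_anonymous(table, 2))  # True
print(is_k_anonymous(table, 3))  # False
print(suppression_cost(table))   # 4
```

The objective on the slide is then to find, among all suppressed versions of the input that pass is_k_anonymous, one that minimizes suppression_cost.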

6
Medical Records: 2-anonymized table
Age  Race   Zipcode  Disease
*    Cauc   *        Flu
*    Cauc   *        Cold
*    Cauc   *        Diabetes
41   Afr-A  *        Flu
41   Afr-A  *        Arthritis
*    Hisp   94042    Heart problem
*    Hisp   94042    Arthritis
Suppressed entries are shown as *.
Cost = 10
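
To see where the cost of 10 comes from, the sketch below groups the rows as in the 2-anonymized table and suppresses every column on which a group disagrees. Only Age, Race and Zipcode are considered; Disease is left untouched. The helper name anonymize and the explicit grouping are assumptions for illustration, and the sketch does not search for the best grouping.

```python
STAR = "*"  # assumed marker for a suppressed entry


def anonymize(rows, groups):
    """Within each group, suppress every column on which the rows disagree."""
    out = [None] * len(rows)
    for group in groups:
        n_cols = len(rows[group[0]])
        # A column is kept only if all rows in the group share one value.
        keep = [len({rows[i][c] for i in group}) == 1 for c in range(n_cols)]
        for i in group:
            out[i] = tuple(v if keep[c] else STAR
                           for c, v in enumerate(rows[i]))
    return out


# Age, Race, Zipcode from the medical records example.
rows = [
    ("31", "Cauc", "94305"),
    ("34", "Cauc", "94307"),
    ("27", "Cauc", "94301"),
    ("41", "Afr-A", "94305"),
    ("41", "Afr-A", "94059"),
    ("65", "Hisp", "94042"),
    ("46", "Hisp", "94042"),
]
groups = [[0, 1, 2], [3, 4], [5, 6]]  # grouping read off the table above

released = anonymize(rows, groups)
cost = sum(v == STAR for row in released for v in row)
print(cost)  # 10
```

The first group loses two columns in each of three rows (6 entries), and the other two groups lose one column in each of two rows (2 entries each), for a total of 10.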
7
k-anonymity: Results

[MW04]:
NP-hardness for a linear-size alphabet
O(k log k)-approximation algorithm

This work:
NP-hardness (even for a ternary alphabet)
O(k)-approximation for k-anonymity
1.5-approximation for 2-anonymity
2-approximation for 3-anonymity

8
O(k)-approximation algorithm (for k = 3)

Create a complete graph such that:
Each row vector in the table is a vertex.
The weight of an edge is the number of attributes on which the two rows differ (Hamming distance).

Age  Race   Zipcode
31   Cauc   94305
34   Cauc   94307
41   Afr-A  94305
41   Afr-A  94059
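
As an illustration, the edge weights of the complete graph for the four rows above can be computed as follows. This is a small sketch; the function name hamming and the dictionary layout are choices made for this example.

```python
from itertools import combinations


def hamming(a, b):
    """Number of attributes on which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


rows = [
    ("31", "Cauc", "94305"),
    ("34", "Cauc", "94307"),
    ("41", "Afr-A", "94305"),
    ("41", "Afr-A", "94059"),
]

# One weighted edge per unordered pair of rows.
edges = {(i, j): hamming(rows[i], rows[j])
         for i, j in combinations(range(len(rows)), 2)}

for (i, j), w in edges.items():
    print(i, j, w)
# 0 1 2
# 0 2 2
# 0 3 3
# 1 2 3
# 1 3 3
# 2 3 1
```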
9
O(k)-approximation algorithm (for k = 3)

We create a forest as follows:
Each node picks its nearest neighbor and connects to it.
If the resulting graph has a component with only two nodes, connect this component to the second nearest neighbor of one of the two nodes.
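
The two steps above can be sketched in Python as follows. This is not the authors' implementation: the function name build_forest, the union-find bookkeeping, breaking ties by lowest index, and choosing the cheaper of the two candidate edges for a two-node component are all assumptions made for this example. It also assumes the table has at least three rows.

```python
def hamming(a, b):
    """Number of attributes on which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def build_forest(dist):
    """dist[i][j] is the Hamming distance; returns the forest edges (i, j)."""
    n = len(dist)
    parent = list(range(n))  # union-find over components

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []

    def link(u, v):
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that would close a cycle
            parent[ru] = rv
            edges.append((u, v))

    # Step 1: every node connects to its nearest neighbor.
    for u in range(n):
        v = min((w for w in range(n) if w != u),
                key=lambda w: (dist[u][w], w))
        link(u, v)

    # Step 2: attach each two-node component to its closest outside node,
    # which is the second nearest neighbor of one of its two nodes.
    for u in range(n):
        comp = [w for w in range(n) if find(w) == find(u)]
        if len(comp) == 2:
            outside = [w for w in range(n) if find(w) != find(u)]
            a, b = min(((x, w) for x in comp for w in outside),
                       key=lambda e: (dist[e[0]][e[1]], e))
            link(a, b)
    return edges


rows = [
    ("31", "Cauc", "94305"),
    ("34", "Cauc", "94307"),
    ("41", "Afr-A", "94305"),
    ("41", "Afr-A", "94059"),
]
dist = [[hamming(a, b) for b in rows] for a in rows]
forest = build_forest(dist)
print(forest)  # [(0, 1), (2, 3), (0, 2)]
```

On the four-row example, step 1 produces the two-node components {0, 1} and {2, 3}; step 2 joins them through the edge (0, 2), giving a single tree of total weight 5.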

10
An example graph
[Figure: example graph with weighted edges. Legend: nearest-neighbor edges, other edges.]
11
The forest obtained
[Figure: the resulting forest, with edge weights.]
12
O(k)-approximation algorithm (for k = 3)

The forest has:
Components of size at least 3.
The total cost of edges in the forest is no more than the cost of the optimal solution.
In the optimal solution, each node has at least as many *s as its Hamming distance to its second nearest neighbor.
Each node has at most as many *s as the cost of the tree containing the node.
If there is any component with size greater than 5, break it into components of size at least 3 (resp. k).

13
The final partition
[Figure: the final partition of the forest, with edge weights.]
14
Analysis of the algorithm

Cluster the row vectors according to this partition.
Cost incurred ≤ OPT × (size of largest partition) ≤ 5 OPT.
For general k, the cost of this solution is within a factor of max{3k-5, 2k-1} of the cost of the optimal solution.
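
As a quick sanity check, the general bound can be evaluated for a few values of k; for k = 3 it gives 5, which agrees with the 5 OPT figure above. The function name approx_factor is an assumption for this example.

```python
def approx_factor(k):
    """Bound quoted on the slide: max(3k - 5, 2k - 1)."""
    return max(3 * k - 5, 2 * k - 1)


for k in (3, 4, 5, 10):
    print(k, approx_factor(k))
# 3 5
# 4 7
# 5 10
# 10 25
```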

15
Better than O(k)-approximation?

Not possible using only the graph representation.
The graph representation loses information about the structure of the problem.
There exist two instances with:
The same underlying graph
k-anonymity costs differing by a factor of Ω(k)

16
Open problems

Lower bounds on the approximation factor (without assuming the graph representation)
Extend the k-anonymity model to account for changes in the database
Handle inserts, deletes and updates
