Anonymizing Tables for Privacy Protection - PowerPoint PPT Presentation

1
Anonymizing Tables for Privacy Protection
Gagan Aggarwal, Tomás Feder, Krishnaram
Kenthapadi, Rajeev Motwani, Rina Panigrahy,
Dilys Thomas, An Zhu
2
An example: Medical Records
Identifying  Identifying                       Sensitive
SSN          Name         Age  Race   Zipcode  Disease
614          Sara         31   Cauc   94305    Flu
615          Joan         34   Cauc   94307    Cold
629          Kelly        27   Cauc   94301    Diabetes
710          Mike         41   Afr-A  94305    Flu
840          Carl         41   Afr-A  94059    Arthritis
780          Joe          65   Hisp   94042    Heart problem
614          Rob          46   Hisp   94042    Arthritis
3
Medical Records: De-identify and Release
                     Sensitive
Age  Race   Zipcode  Disease
31   Cauc   94305    Flu
34   Cauc   94307    Cold
27   Cauc   94301    Diabetes
41   Afr-A  94305    Flu
41   Afr-A  94059    Arthritis
65   Hisp   94042    Heart problem
46   Hisp   94042    Arthritis
4
Not sufficient! [Swe02, SS98]
                     Sensitive
Age  Race   Zipcode  Disease
31   Cauc   94305    Flu
34   Cauc   94307    Cold
27   Cauc   94301    Diabetes
41   Afr-A  94305    Flu
41   Afr-A  94059    Arthritis
65   Hisp   94042    Heart problem
46   Hisp   94042    Arthritis
Linking these attributes with a public database can uniquely identify you!
5
k-anonymity: Problem Definition
  • Input: A database consisting of n rows, each with m
    attributes drawn from a finite alphabet.
  • Goal: Suppress some entries in the table such
    that each modified row becomes identical to at
    least k-1 other rows.
  • The more entries are suppressed, the lower the utility
    of the modified table.
  • Objective: Minimize the number of suppressed
    entries.
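The definition above can be checked mechanically. The following is a minimal sketch, assuming rows are tuples of strings and a suppressed entry is written as "*"; the names SUPPRESSED, is_k_anonymous and suppression_cost are illustrative and do not come from the presentation.

```python
from collections import Counter

SUPPRESSED = "*"  # assumed marker for a suppressed entry


def is_k_anonymous(rows, k):
    """Return True if every row is identical to at least k-1 other rows."""
    counts = Counter(tuple(row) for row in rows)
    return all(count >= k for count in counts.values())


def suppression_cost(rows):
    """Count the suppressed entries: the quantity the objective minimizes."""
    return sum(cell == SUPPRESSED for row in rows for cell in row)


# A small already-anonymized table (Age, Race, Zipcode).
table = [
    ("41", "Afr-A", "*"),
    ("41", "Afr-A", "*"),
    ("*", "Hisp", "94042"),
    ("*", "Hisp", "94042"),
]
print(is_k_anonymous(table, 2), suppression_cost(table))
```

Checking a candidate table is easy; the hard part, as the results that follow state, is finding the suppression with minimum cost.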

6
Medical Records: 2-anonymized table
Age  Race   Zipcode  Disease
*    Cauc   *        Flu
*    Cauc   *        Cold
*    Cauc   *        Diabetes
41   Afr-A  *        Flu
41   Afr-A  *        Arthritis
*    Hisp   94042    Heart problem
*    Hisp   94042    Arthritis
Suppressed entries are marked with *.
Cost: 10
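The cost of 10 can be reproduced by grouping the rows and suppressing every attribute on which a group disagrees. This is a sketch under assumptions: the grouping {rows 1-3}, {rows 4-5}, {rows 6-7} is read off the table above, only Age, Race and Zipcode are considered, and the helper name anonymize_by_partition is hypothetical.

```python
def anonymize_by_partition(rows, groups, marker="*"):
    """Suppress, within each group, every column whose values are not all equal.

    rows   : list of tuples (the original table)
    groups : list of lists of row indices; each row should appear exactly once
    Returns the modified rows and the number of suppressed entries.
    """
    result = [list(row) for row in rows]
    cost = 0
    for group in groups:
        for col in range(len(rows[0])):
            values = {rows[i][col] for i in group}
            if len(values) > 1:  # the group disagrees on this column
                for i in group:
                    result[i][col] = marker
                cost += len(group)
    return [tuple(row) for row in result], cost


# Age, Race, Zipcode columns of the medical records example.
records = [
    ("31", "Cauc", "94305"),
    ("34", "Cauc", "94307"),
    ("27", "Cauc", "94301"),
    ("41", "Afr-A", "94305"),
    ("41", "Afr-A", "94059"),
    ("65", "Hisp", "94042"),
    ("46", "Hisp", "94042"),
]
groups = [[0, 1, 2], [3, 4], [5, 6]]
anonymized, cost = anonymize_by_partition(records, groups)
print(cost)  # 6 + 2 + 2 = 10
```

Given a fixed grouping, this suppression is forced: a column can stay visible only if the whole group agrees on it. The optimization problem is therefore really about choosing the groups.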
7
k-anonymity: Results
  • [MW04]: NP-hardness for a linear-size alphabet, and an
    O(k log k)-approximation algorithm
  • NP-hardness (even for a ternary alphabet)
  • O(k)-approximation for k-anonymity
  • 1.5-approximation for 2-anonymity
  • 2-approximation for 3-anonymity

8
O(k)-approximation algorithm (for k = 3)
  • Create a complete graph such that:
  • Each row vector in the table is a vertex.
  • The weight of an edge is the number of attributes on
    which the two rows differ (Hamming distance).

Age  Race   Zipcode
31   Cauc   94305
34   Cauc   94307
41   Afr-A  94305
41   Afr-A  94059
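For the four rows above, the graph construction can be sketched as follows. The function names hamming and distance_graph are illustrative, and vertices are identified by 0-based row index.

```python
from itertools import combinations


def hamming(a, b):
    """Number of attributes on which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_graph(rows):
    """Complete graph on the rows: {(i, j): Hamming distance} for i < j."""
    return {(i, j): hamming(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}


rows = [
    ("31", "Cauc", "94305"),
    ("34", "Cauc", "94307"),
    ("41", "Afr-A", "94305"),
    ("41", "Afr-A", "94059"),
]
graph = distance_graph(rows)
for edge, weight in sorted(graph.items()):
    print(edge, weight)
```

The two Afr-A rows differ only in Zipcode, so the edge between them has weight 1; every other pair differs on two or three attributes.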
9
O(k)-approximation algorithm (for k = 3)
  • We create a forest as follows:
  • Each node picks its nearest neighbor and connects
    to it.
  • If the resulting graph has a component with only
    two nodes, connect this component to the second
    nearest neighbor of one of the two nodes.
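The forest construction can be sketched as below. Several details are assumptions rather than part of the slides: ties are broken by row index, a union-find structure skips any edge that would close a cycle, and a two-node component is attached through its cheapest edge to an outside node (for a pair of mutually nearest nodes this is the second nearest neighbor of one of them). The function name nearest_neighbor_forest is hypothetical.

```python
def hamming(a, b):
    """Number of attributes on which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_forest(rows):
    """Return (edges, components) of the forest; requires at least 3 rows."""
    n = len(rows)
    if n < 3:
        raise ValueError("need at least 3 rows to form components of size 3")
    dist = [[hamming(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    parent = list(range(n))  # union-find parents
    edges = []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def connect(u, v):
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that would create a cycle
            parent[ru] = rv
            edges.append((u, v, dist[u][v]))

    def components():
        groups = {}
        for u in range(n):
            groups.setdefault(find(u), []).append(u)
        return list(groups.values())

    # Step 1: every node connects to its nearest neighbor.
    for u in range(n):
        v = min((w for w in range(n) if w != u), key=lambda w: (dist[u][w], w))
        connect(u, v)

    # Step 2: attach each two-node component to its nearest outside node.
    while True:
        pairs = [c for c in components() if len(c) == 2]
        if not pairs:
            break
        pair = pairs[0]
        u, v = min(((a, w) for a in pair for w in range(n) if w not in pair),
                   key=lambda e: (dist[e[0]][e[1]], e))
        connect(u, v)
    return edges, components()


rows = [
    ("31", "Cauc", "94305"),
    ("34", "Cauc", "94307"),
    ("41", "Afr-A", "94305"),
    ("41", "Afr-A", "94059"),
]
forest_edges, forest_components = nearest_neighbor_forest(rows)
print(forest_edges)
```

On the four example rows, step 1 produces the pairs {0, 1} and {2, 3}; step 2 then joins them through the weight-2 edge between rows 0 and 2, leaving a single tree on all four rows.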

10
An example graph
[Figure: example graph with weighted edges. Legend: nearest-neighbor edges, other edges.]
11
The forest obtained
[Figure: the forest obtained, with edge weights.]
12
O(k)-approximation algorithm (for k = 3)
  • The forest has:
  • Components of size at least 3.
  • The total cost of edges in the forest is no more
    than the cost of the optimal solution.
  • In an optimal solution, each node has at least as
    many *s as its Hamming distance to its second
    nearest neighbor.
  • Each node has at most as many *s as the cost of
    the tree containing the node.
  • If there is any component with size greater than
    5, break it into components of size at least 3
    (resp. k).

13
The final partition
[Figure: the final partition of the forest, with edge weights.]
14
Analysis of the algorithm
  • Cluster the row vectors according to this
    partition.
  • Cost incurred ≤ OPT × (size of largest
    partition) ≤ 5 · OPT.
  • For general k, the cost of this solution is
    within a factor of max{3k-5, 2k-1} of the cost of
    the optimal solution.
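The bound can be spelled out as a short derivation. The notation is introduced here for illustration and is not from the slides: C_1, ..., C_r are the components of the final partition, w(C_j) is the total weight of the forest edges inside C_j, and w(F) is the total weight of the forest. The sketch assumes each component is spanned by forest edges, so every row in C_j receives at most w(C_j) suppressed entries.

```latex
\begin{align*}
\text{cost}
  &\le \sum_{j=1}^{r} |C_j| \, w(C_j)
     && \text{each row in } C_j \text{ has at most } w(C_j) \text{ suppressed entries} \\
  &\le \Bigl(\max_{j} |C_j|\Bigr) \sum_{j=1}^{r} w(C_j) \\
  &\le \Bigl(\max_{j} |C_j|\Bigr) \, w(F)
     && \text{the components use disjoint forest edges} \\
  &\le \Bigl(\max_{j} |C_j|\Bigr) \cdot \mathrm{OPT}
     && \text{forest cost is at most } \mathrm{OPT} \\
  &\le 5 \cdot \mathrm{OPT}
     && \text{for } k = 3:\ \max\{3k-5,\ 2k-1\} = \max\{4, 5\} = 5 .
\end{align*}
```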

15
Better than O(k)-approximation?
  • Not possible using only the graph representation
  • The graph representation loses information about the
    structure of the problem
  • There exist two instances with the same underlying
    graph whose k-anonymity costs differ by a factor
    of Ω(k)

16
Open problems
  • Lower bounds on the approximation factor (without
    assuming the graph representation)
  • Extend the k-anonymity model to account for
    changes in the database
  • Handle inserts, deletes and updates