Title: Privacy-preserving Anonymization of Set Value Data
1 Privacy-preserving Anonymization of Set Value Data
Manolis Terrovitis, Nikos Mamoulis
University of Hong Kong
Panos Kalnis
National University of Singapore
www.comp.nus.edu.sg/kalnis
2 Motivation
Helen: 0% Milk, Beer, Pregnancy test
- Attacker can see up to m items
- Any m items
- No distinction between sensitive and
non-sensitive items
3 Motivation (cont.)
Attacker: find all transactions that contain Beer and 0% Milk
Published:
t1: Beer, Milk, Pregnancy test
t2: Cola, Cheese
t3: Milk, Coffee
...
tn: Wine, Beer, Milk
Original:
t1: Beer, 0% Milk, Pregnancy test
t2: Cola, Cheese
t3: 2% Milk, Coffee
...
tn: Wine, Beer, Full-fat Milk
4 k^m-anonymity
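The definition on this slide did not survive extraction. As a minimal sketch, assuming k^m-anonymity means that every combination of at most m items occurring in some transaction is contained in at least k transactions, the property can be checked by brute force. The function name `is_km_anonymous` and the toy data are illustrative and not taken from the slides.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """Return True if every itemset of size <= m that occurs in the data
    is contained in at least k transactions (assumed definition)."""
    support = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, m + 1):
            # Each combination is counted once per transaction containing it.
            for combo in combinations(items, size):
                support[combo] += 1
    return all(count >= k for count in support.values())


# Toy data (hypothetical): four transactions over items a, b, c.
toy = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "c"}]
print(is_km_anonymous(toy, k=2, m=2))
print(is_km_anonymous(toy, k=3, m=2))
```

The brute-force enumeration is exponential in m, which is acceptable only for small m and short transactions.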
5 Related Work: k-Anonymity [Swe02]
NOT suitable for high-dimensionality
Quasi-identifier: Age, ZipCode

(a) Microdata
Age   ZipCode   Disease
42    25000     Flu
46    35000     AIDS
50    20000     Cancer
54    40000     Gastritis
48    50000     Dyspepsia
56    55000     Bronchitis

(b) 2-anonymous microdata
Age     ZipCode       Disease
42-46   25000-35000   Flu
42-46   25000-35000   AIDS
50-54   20000-40000   Cancer
50-54   20000-40000   Gastritis
48-56   50000-55000   Dyspepsia
48-56   50000-55000   Bronchitis
[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
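The difference between the two tables on this slide can be checked mechanically: group the rows by their quasi-identifier values and verify that every group has at least k rows. The sketch below does this for the slide's data; the function name `is_k_anonymous` and the dictionary layout are illustrative choices.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifier, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k rows."""
    groups = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(size >= k for size in groups.values())


def make_rows(records):
    # Helper that turns (Age, ZipCode, Disease) tuples into dictionaries.
    return [dict(zip(("Age", "ZipCode", "Disease"), r)) for r in records]


microdata = make_rows([
    ("42", "25000", "Flu"), ("46", "35000", "AIDS"),
    ("50", "20000", "Cancer"), ("54", "40000", "Gastritis"),
    ("48", "50000", "Dyspepsia"), ("56", "55000", "Bronchitis"),
])
generalized = make_rows([
    ("42-46", "25000-35000", "Flu"), ("42-46", "25000-35000", "AIDS"),
    ("50-54", "20000-40000", "Cancer"), ("50-54", "20000-40000", "Gastritis"),
    ("48-56", "50000-55000", "Dyspepsia"), ("48-56", "50000-55000", "Bronchitis"),
])

qi = ("Age", "ZipCode")
print(is_k_anonymous(microdata, qi, 2))
print(is_k_anonymous(generalized, qi, 2))
```

The check relies on a fixed, small quasi-identifier. In transactional data any combination of items can play that role, which is why the slide marks this model as unsuitable for high dimensionality.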
6 Related Work: L-diversity in Transactions
Requires knowledge of (non-)sensitive attributes
[GTK08] G. Ghinita, Y. Tao, P. Kalnis. On the Anonymization of Sparse High-Dimensional Data. ICDE, 2008.
7 Our Approach: Employs Generalization
Generalization Hierarchy
k=2, m=2
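The hierarchy figure on this slide is not recoverable. As a rough sketch of the idea, a generalization hierarchy can be stored as a child-to-parent map, and a global recoding can be described by a cut through it. The hierarchy below (leaves a1, a2, b1, b2 under A and B, root ALL) and the names `PARENT`, `ancestors`, and `generalize` are hypothetical.

```python
# Hypothetical hierarchy: child -> parent. "ALL" is the root.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "A": "ALL", "B": "ALL"}


def ancestors(item):
    """Return the item followed by all of its ancestors up to the root."""
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(transaction, cut):
    """Replace every item by its ancestor-or-self that belongs to `cut`.

    `cut` is assumed to be a valid cut of the hierarchy, so that each leaf
    has exactly one ancestor-or-self in it; otherwise next() raises.
    """
    return {
        next(node for node in ancestors(item) if node in cut)
        for item in transaction
    }


print(generalize({"a1", "b1"}, cut={"A", "b1", "b2"}))
print(generalize({"a2", "b2"}, cut={"ALL"}))
```

Because the same cut is applied to every transaction, an item is always published at the same level of the hierarchy, which is what global recoding means here.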
8 Lattice of Generalizations
9 Count Tree
[Figure: count tree; node labels not recoverable]
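The count-tree figure is lost, so the following is only a sketch of the general idea under an assumption: the tree is a trie over itemsets of size at most m, kept in sorted item order, where each node stores how many transactions contain the itemset on its path. It counts original items only and leaves out generalized items. The names `build_count_tree` and `support` are illustrative.

```python
from itertools import combinations


def build_count_tree(transactions, m):
    """Build a trie of all itemsets of size <= m with their support counts."""
    root = {"count": 0, "children": {}}
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                node = root
                for item in combo:
                    node = node["children"].setdefault(
                        item, {"count": 0, "children": {}}
                    )
                # Only the node at the end of the path is incremented.
                node["count"] += 1
    return root


def support(tree, itemset):
    """Look up the support of an itemset; 0 if it never occurs."""
    node = tree
    for item in sorted(itemset):
        node = node["children"].get(item)
        if node is None:
            return 0
    return node["count"]


data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
tree = build_count_tree(data, m=2)
print(support(tree, {"a"}), support(tree, {"a", "b"}), support(tree, {"b", "c"}))
```

Any node whose count is below k marks an itemset that would let an attacker narrow the candidates to fewer than k transactions.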
10 Optimal Algorithm
[Figure: illustration of the optimal algorithm; node labels not recoverable]
11 Direct Anonymization
- Solves each problem independently
COUNT(a1,a2)=1
12 Apriori-based Anonymization
- Construct the count-tree incrementally
- Prune unnecessary branches
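The incremental idea can be sketched as a level-wise loop: fix all itemsets of size 1, then size 2, and so on up to m, carrying the generalizations found so far into the next level. This is a simplified, greedy sketch and not the authors' algorithm. It recounts supports with a plain counter instead of a count tree, does no pruning, and picks which item to generalize by a naive rule. The hierarchy `PARENT` and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hierarchy: child -> parent. "ALL" is the root.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "A": "ALL", "B": "ALL"}


def ancestors(item):
    """Return the item followed by all of its ancestors up to the root."""
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def apriori_style_anonymize(transactions, k, m):
    """Return a global recoding (leaf -> published value)."""
    leaves = {item for t in transactions for item in t}
    recode = {leaf: leaf for leaf in leaves}
    for size in range(1, m + 1):  # itemsets of growing size
        while True:
            published = [frozenset(recode[i] for i in t) for t in transactions]
            counts = Counter(
                combo
                for t in published
                for combo in combinations(sorted(t), size)
            )
            rare = sorted(c for c, n in counts.items() if n < k)
            if not rare:
                break  # this level is safe; move on to larger itemsets
            # Naive choice (an assumption): lift the first item of the first
            # rare itemset that still has a parent in the hierarchy.
            target = next((x for x in rare[0] if x in PARENT), None)
            if target is None:
                break  # nothing left to generalize; level stays unsafe
            parent = PARENT[target]
            # Global recoding: every leaf below `parent` is published as it.
            for leaf in leaves:
                if parent in ancestors(leaf):
                    recode[leaf] = parent
    return recode


data = [{"a1", "b1"}, {"a2", "b1"}, {"a1", "b2"}, {"a2", "b2"}]
print(apriori_style_anonymize(data, k=2, m=2))
```

On this toy input every single item already appears twice, so level 1 changes nothing. At level 2 each pair appears once, and generalizing a1 and a2 to A is enough to make every remaining pair appear twice. Generalizing further never lowers the support of smaller itemsets, so earlier levels do not need to be revisited.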
13 Small Datasets (2-15K, BMS-WebView2)
14 Small Datasets (BMS-WebView2)
15 Apriori Anonymization for Large Datasets
|D|    |I|
515K   1657
59K    497
77K    3340
16 Points to Remember
- Anonymization of Transactional Data
- Attacker knows m items
- Any m items can be the quasi-identifier
- Global recoding method
- Optimal solution too slow
- Apriori Anonymization fast and low information loss
- On-going work
- Local recoding (sort by Gray order and partition)
- Transactional data in streaming environments
17 Bibliography on LBS Privacy
- http://anonym.comp.nus.edu.sg