Title: When Random Sampling Preserves Privacy
 1When Random Sampling Preserves Privacy
- Kamalika Chaudhuri, U.C. Berkeley
- Nina Mishra, U. Virginia
 2The Problem
[Diagram: Database → Sanitizer → Sanitized Database]
- Setting:
- Table = set of rows
- Sanitizer: releases each row with probability p
- What are the conditions under which this 
 sanitizer preserves privacy?
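A minimal sketch of the sanitizer described on this slide. It assumes rows are released independently of one another (the slide only says each row is released with probability p) and that a table is a plain Python list. The names `sanitize` and `rng` are illustrative, not from the talk.

```python
import random

def sanitize(table, p, rng=None):
    """Release each row of `table` with probability p.

    Assumption: each row is kept or dropped independently of the others.
    """
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so a row is kept with probability p.
    return [row for row in table if rng.random() < p]

# Hypothetical usage: a table of 10 rows, sampling rate 0.3.
table = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "f"]
sanitized = sanitize(table, 0.3, random.Random(42))
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient for experiments but is not something the slides discuss.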
3Search Data
- AOL released user search data 
- Replaced usernames with random ids 
4Search Data
- Kamalika: Berkeley restaurants; Low degree spanning trees; Tickets to India; Privacy sampling; Airfare Santa Barbara
- Cynthia: Traffic on 101N; Restaurants Mountain View; Rank Aggregation; Memory bound functions; Crypto registration
- Nina: Falafel Charlottesville; Query Auditing; Clustering streaming; Tickets to SFO; Privacy sampling
 5 U.S. Census Data
- Random sample of preprocessed data 
- Removing unique values 
- Merging cells with fewer than a threshold number of individuals
6Privacy Definition [DMNS06]
[Figure: tables T and T', sanitizer output S]
- ε-Indistinguishability
- Two tables T, T' differ by a single row
- S = output of the sanitizer
- Pr[S | T] ≤ (1 + ε) Pr[S | T']
7An Example
- Cannot always get ε-indistinguishability with random sampling
- T = n rows with value 0
- T' = n-1 rows with value 0, 1 row with value 1
- S = 1 row with value 1, s - 1 rows with value 0
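The example can be checked numerically. The sketch below assumes rows are sampled independently and that the output S is viewed as a multiset of values, so that Pr[S | T] is a product of binomial terms, one per value. The function name `output_probability` and the concrete numbers (n = 10, s = 4, p = 0.5) are illustrative choices, not from the slides.

```python
from collections import Counter
from math import comb

def output_probability(sample, table, p):
    """Pr[S | T] under independent row sampling, with S as a multiset of values."""
    table_counts = Counter(table)
    sample_counts = Counter(sample)
    prob = 1.0
    for value in set(table_counts) | set(sample_counts):
        n_v = table_counts[value]   # rows with this value in the table
        s_v = sample_counts[value]  # rows with this value in the sample
        if s_v > n_v:
            return 0.0  # the sample contains a row the table cannot produce
        prob *= comb(n_v, s_v) * p**s_v * (1 - p)**(n_v - s_v)
    return prob

n, s, p = 10, 4, 0.5
T = [0] * n                    # n rows with value 0
T_prime = [0] * (n - 1) + [1]  # n-1 rows with value 0, 1 row with value 1
S = [1] + [0] * (s - 1)        # 1 row with value 1, s-1 rows with value 0

pr_T = output_probability(S, T, p)              # S is impossible under T
pr_T_prime = output_probability(S, T_prime, p)  # S is possible under T'
```

Because S has probability zero under one table and positive probability under its neighbor, no finite ε bounds the ratio of the two probabilities.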
8Privacy Definition [DKMMiNa06, BDMN05]
- (ε, δ)-Indistinguishability
- Two tables T, T' differ by a single row
- S = output of the sanitizer
- With probability at least 1 - δ, Pr[S | T] ≤ (1 + ε) Pr[S | T']
9An Example
- Cannot always get (ε, δ)-indistinguishability for all tables
- A table where all rows have unique values
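To see why unique values are a problem, the sketch below computes, for small tables, the total probability of "bad" outputs, meaning outputs S drawn from T with Pr[S | T] > (1 + ε) Pr[S | T']. It assumes independent row sampling, treats S as a multiset of values, and reads "with probability at least 1 - δ" as a probability over S drawn from T. These are assumptions of this sketch, and `output_probability` and `failure_mass` are hypothetical names.

```python
from collections import Counter
from itertools import product
from math import comb

def output_probability(sample, table, p):
    """Pr[S | T] under independent row sampling, with S as a multiset of values."""
    table_counts = Counter(table)
    sample_counts = Counter(sample)
    prob = 1.0
    for value in set(table_counts) | set(sample_counts):
        n_v, s_v = table_counts[value], sample_counts[value]
        if s_v > n_v:
            return 0.0
        prob *= comb(n_v, s_v) * p**s_v * (1 - p)**(n_v - s_v)
    return prob

def failure_mass(table, neighbor, p, eps):
    """Total Pr[S | T] over outputs S with Pr[S | T] > (1 + eps) * Pr[S | T']."""
    counts = Counter(table)
    values = sorted(counts)
    total = 0.0
    # Enumerate every possible output histogram: 0..n_v kept rows per value.
    for kept in product(*(range(counts[v] + 1) for v in values)):
        sample = dict(zip(values, kept))
        pr_t = output_probability(sample, table, p)
        pr_n = output_probability(sample, neighbor, p)
        if pr_t > (1 + eps) * pr_n:
            total += pr_t
    return total

# All rows unique; the neighbor replaces the row with value 4 by value 5.
T = [1, 2, 3, 4]
T_prime = [1, 2, 3, 5]
bad = failure_mass(T, T_prime, p=0.3, eps=0.5)
```

In this small case the bad outputs are exactly those that contain the changed row, so the failure mass equals the sampling probability p. Under these assumptions δ cannot be smaller than p for such a table.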
10When does Random Sampling preserve Privacy?
- Parameters:
- (ε, δ)-indistinguishability
- k = number of distinct values in T
- t = number of values which occur at most log(k/δ)/ε times in T
- Theorem: This can be guaranteed if
- p < ε (if t = 0)
- p < Õ(ε δ / t)
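The parameters k and t can be read off a table directly. The sketch below assumes the logarithm is natural (the slide does not state the base) and that a table is a list of values. It does not compute the bound on p itself, since Õ hides constants and logarithmic factors that the slide does not give. The name `sampling_parameters` is illustrative.

```python
from collections import Counter
from math import log

def sampling_parameters(table, eps, delta):
    """Return (k, t, threshold) for a table of values.

    k: number of distinct values in the table.
    threshold: log(k/delta)/eps, using the natural log (an assumption).
    t: number of values occurring at most `threshold` times.
    """
    counts = Counter(table)
    k = len(counts)
    threshold = log(k / delta) / eps
    t = sum(1 for c in counts.values() if c <= threshold)
    return k, t, threshold

# Hypothetical table: two frequent values and one value that occurs once.
table = [0] * 100 + [1] * 100 + [2]
k, t, threshold = sampling_parameters(table, eps=0.5, delta=0.1)
```

Here the value 2 falls below the threshold, so t = 1 and the second case of the theorem applies; with that row removed, t = 0 and the condition is simply p < ε.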
11Classification of Values
For (ε, δ)-indistinguishability
Rare Value
Infrequent Value
Common Value 
 12Rare Values
- If a rare value v is observed in a random sample,
- Pr[S | T] > (1 + ε/log(k/δ)) Pr[S | T']
13Common Values
- For a common value v,
- Pr[S | T] ≈ Pr[S | T']
- Typically, the number of rows with a common value is close to its expectation
14Infrequent Values
- For an infrequent value v,
- Pr[S | T] ≈ Pr[S | T']
- Typically, the number of rows with an infrequent value is at most log(k/δ) away from its expected value
15Properties of a Good Sample
- A sample S is ε-indistinguishable if
- No rare values
- The number of rows with common value v is within a constant factor of its expectation
- The number of rows with infrequent value v is at most an additive O(log(k/δ)) more than its expected value
16When does Random Sampling preserve Privacy?
- Such a sample occurs with probability at least 1 - δ if
- p < ε (if t = 0)
- p < Õ(ε δ / t)
17Utility of Random Sampling
- Assuming no rare values 
- Error in the frequency of each value: additive 1/√n
- [DMNS06]: estimates the histogram with an additive error of 1/n in each frequency
- Sampling may give a compact representation of the 
 histogram
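A small experiment illustrating the utility claim, under stated assumptions: rows are sampled independently, the frequency of a value is estimated as its share of the sample (an estimator chosen for this sketch, not stated on the slide), and the table and sampling rate are made up. It only compares estimated and true frequencies and does not reproduce the 1/√n analysis.

```python
import random
from collections import Counter

def estimate_frequencies(sample):
    """Estimate each value's frequency as its share of the sample."""
    total = len(sample)
    if total == 0:
        return {}
    return {value: count / total for value, count in Counter(sample).items()}

rng = random.Random(0)
# Hypothetical table with n = 10000 rows and no rare values.
table = [0] * 6000 + [1] * 3000 + [2] * 1000
p = 0.2

# Release each row independently with probability p.
sample = [row for row in table if rng.random() < p]

true_freq = {v: c / len(table) for v, c in Counter(table).items()}
est = estimate_frequencies(sample)

# Largest additive error over all values in the table.
max_err = max(abs(est.get(v, 0.0) - f) for v, f in true_freq.items())
```

The sample also serves as the compact representation mentioned on the slide: it stores roughly p·n rows instead of n.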
18Conclusions 
- Random sampling preserves privacy only when there 
 are few rare values
- With rare values, the probability of failure can 
 be high
- δ = Ω(1/n) as opposed to 1/2^n [DKMMiNa06, BDMN05]
- Error in estimating the frequency of each value 
 can be high
- Additive 1/√n as opposed to 1/n of [DMNS06]