Title: How may Auditors Inadvertently Compromise Your Privacy
1 How may Auditors Inadvertently Compromise Your Privacy
With Nina Mishra (HP/Stanford). Work in progress.
PORTIA Workshop on Sensitive Data in Medical,
Financial, and Content-Distribution Systems
2 The Setting
Statistical database
- Dataset d = {d1, ..., dn}
- Entries di: Real, Integer, Boolean
- Query q = (f, i1, ..., ik)
- f: Min, Max, Median, Sum, Average, Count
- Bad users will try to breach the privacy of individuals
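A minimal Python sketch of this setting, assuming 1-based indices as in the slide notation. The names `AGGREGATES` and `evaluate` are illustrative helpers, not part of the talk.

```python
from statistics import mean, median

# Aggregate functions f named on the slide; this dictionary is a hypothetical helper.
AGGREGATES = {
    "min": min,
    "max": max,
    "median": median,
    "sum": sum,
    "average": mean,
    "count": len,
}

def evaluate(d, f, indices):
    """Evaluate the query q = (f, i1, ..., ik) on dataset d (1-based indices)."""
    selected = [d[i - 1] for i in indices]
    return AGGREGATES[f](selected)

d = [4, 5, 6]
print(evaluate(d, "sum", [1, 2, 3]))  # 15
print(evaluate(d, "max", [1, 2]))     # 5
```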
3 The Data Privacy Game: an Information-Privacy Tradeoff
[Figure: information functions f and private functions πi over the database d]
- Private functions
- Want to hide πi(d) = di
- Information functions
- Want to reveal query answers f(di1, ..., dik)
- Major question: what may be computed over d (and given to users) without breaching privacy?
- Confidentiality control methods
- Perturbation methods: give noisy answers
- Query restriction methods: limit the queries users may pose, usually imposing some structure (e.g. size/overlap restrictions)
4 Auditing
- AW89 classify auditing as a query restriction method
- Auditing of a statistical database (SDB) involves keeping up-to-date logs of all queries made by each user (not the data involved) and constantly checking for possible compromise whenever a new query is issued
- Partial motivation: may allow for more queries to be posed, if no privacy threat occurs
- Early work: Hofmann 1977; Schlorer 1976; Chin, Ozsoyoglu 1981, 1986
- Recent interest: Kleinberg, Papadimitriou, Raghavan 2000; Li, Wang, Wang, Jajodia 2002; Jonsson, Krokhin 2003
5 Auditing
Here's the answer
OR
Query denied (as the answer would cause privacy loss)
Here's a new query qi+1
Query log q1, ..., qi
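The exchange on this slide can be sketched as a small online auditing loop. This is only an illustration under stated assumptions: the class `OnlineAuditor`, the callback `would_compromise`, and the toy check `single_entry` are hypothetical names, and the real breach test depends on the query type being audited.

```python
class OnlineAuditor:
    """Toy online auditor: logs every query and answers only if the check passes."""

    def __init__(self, data, would_compromise):
        self.data = data
        self.would_compromise = would_compromise  # hypothetical breach test
        self.log = []  # (query, answer or None) pairs; no raw entries are stored

    def ask(self, query):
        f, indices = query  # f is a Python aggregate such as sum or max
        answer = f(self.data[i - 1] for i in indices)
        if self.would_compromise(self.log, query, answer):
            self.log.append((query, None))  # record the denial
            return None                     # "Query denied"
        self.log.append((query, answer))
        return answer                       # "Here's the answer"

def single_entry(log, query, answer):
    """Toy check (an assumption): a query over one entry reveals that entry."""
    return len(query[1]) == 1

auditor = OnlineAuditor([4, 5, 6], single_entry)
print(auditor.ask((sum, [1, 2, 3])))  # 15
print(auditor.ask((max, [2])))        # None (denied)
```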
6 Design choices in Prior Work (1)
- Privacy definition
- Privacy breached (only) when a database entry may be deduced fully, or within some accuracy threshold
- These privacy guarantees do not generally suffice
- Should take into account the adversary's computational power, prior knowledge, access to other databases
- Exact answers given
- Auditors viewed as a way to give quality answers???
7 Design choices in Prior Work (2)
- 3. Which information is taken into account in the auditor decision procedure
- Decision made based on queries q1, ..., qi, qi+1 and their answers a1, ..., ai, ai+1
- Denials ignored
- 4. Offline vs. Online
- Offline auditing: queries and answers checked for compromise at the end of the day
- Only detect breaches
- Online auditing: answer/deny queries on the fly
- Prevent breaches just before they happen
8 Example 1: Sum/Max auditing
di real, sum/max queries, privacy breached if some di learned
q1 = sum(d1,d2,d3)
sum(d1,d2,d3) = 15
q2 = max(d1,d2,d3)
Denied (the answer would cause privacy loss)
Oh well
9 Some Prior Work on Auditors
Approx version in PTIME
Can we use the offline version for online
auditing?
10 After Two Minutes
di real, sum/max queries, privacy breached if some di learned
q1 = sum(d1,d2,d3)
sum(d1,d2,d3) = 15
q2 = max(d1,d2,d3)
Denied (the answer would cause privacy loss)
Oh well
There must be a reason for the denial
q2 is denied iff d1 = d2 = d3 = 5. I win!
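The reasoning on this slide can be replayed in a short Python sketch. It assumes a naive auditor that looks at the true answer and ignores its own denials; the function names are illustrative, and the breach rule (sum and max over the same real entries pin a value only when all entries are equal) is stated as an assumption in the code.

```python
def naive_sum_max_auditor(d):
    """Answer the sum, then answer the max unless it would reveal an entry.

    Assumption: for real-valued entries, knowing both the sum and the max of
    the same entries pins down a value only when every entry equals the max.
    """
    total = sum(d)
    largest = max(d)
    max_reply = None if len(d) * largest == total else largest
    return total, max_reply

def attacker(n, total, max_reply):
    """Infer the whole dataset from a denial of the max query."""
    if max_reply is None:
        return [total / n] * n  # denial implies all entries are equal
    return None                 # max was answered: no entry is pinned down

print(attacker(3, *naive_sum_max_auditor([5, 5, 5])))  # [5.0, 5.0, 5.0]
print(attacker(3, *naive_sum_max_auditor([3, 5, 7])))  # None
```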
11 Example 2: Interval Based Auditing
di ∈ [0,100], sum queries, accuracy threshold 1 (PTIME)
q1 = sum(d1,d2)
Sorry, denied
q2 = sum(d2,d3)
sum(d2,d3) = 50
Denial ⇒ d1,d2 ∈ [0,1] or d1,d2 ∈ [99,100]
sum(d2,d3) = 50 rules out [99,100], so d1,d2 ∈ [0,1] and d3 ∈ [49,50]
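The inference in this example can be traced with a small Python sketch. It assumes that a breach means some entry is confined to an interval of width at most 1, and that the auditor ignores its own denials; `entry_range`, `naive_interval_auditor`, and `attacker` are hypothetical names used only for illustration.

```python
LOW, HIGH = 0.0, 100.0  # domain of each entry, from the slide
WIDTH = 1.0             # assumed breach threshold: interval of width <= 1

def entry_range(pair_sum):
    """Range of each entry, given the sum of two entries in [LOW, HIGH]."""
    return max(LOW, pair_sum - HIGH), min(HIGH, pair_sum)

def naive_interval_auditor(x, y):
    """Deny sum(x, y) if the answer would confine an entry too tightly."""
    lo, hi = entry_range(x + y)
    return None if hi - lo <= WIDTH else x + y

def attacker(reply_q1, reply_q2):
    """q1 = sum(d1,d2) was denied and q2 = sum(d2,d3) was answered."""
    assert reply_q1 is None
    # Denial of q1: d1 and d2 are both in [0,1] or both in [99,100].
    candidates = [(LOW, LOW + WIDTH), (HIGH - WIDTH, HIGH)]
    # Keep the cases where d3 = reply_q2 - d2 can still lie in [LOW, HIGH].
    feasible = [(lo, hi) for lo, hi in candidates
                if reply_q2 - hi <= HIGH and reply_q2 - lo >= LOW]
    if len(feasible) != 1:
        return None
    lo, hi = feasible[0]
    d3_range = (max(LOW, reply_q2 - hi), min(HIGH, reply_q2 - lo))
    return (lo, hi), d3_range

d1, d2, d3 = 0.25, 0.5, 49.5
r1 = naive_interval_auditor(d1, d2)
r2 = naive_interval_auditor(d2, d3)
print(attacker(r1, r2))  # ((0.0, 1.0), (49.0, 50.0))
```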
12 Sounds Familiar?
13 Max Auditing
di real
q1 = max(d1,d2,d3,d4): answer M1234
q2 = max(d1,d2,d3): answer M123 / denied
If denied: d4 = M1234
q3 = max(d1,d2): answer M12 / denied
If denied: d3 = M123
14 Adversary's Success
q1 = max(d1,d2,d3,d4)
q2 = max(d1,d2,d3)
Denied with probability 1/4
If denied: d4 = M1234
q3 = max(d1,d2)
Denied with probability 1/3
If denied: d3 = M123
Success probability: 1/4 + (1 - 1/4) · 1/3 = 1/2
Recover 1/8 of the database!
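The success probability can be checked by enumerating all orderings of four distinct values. The sketch below is only an illustration: `nested_max_auditor` is a hypothetical auditor specialised to this nested query sequence, and it assumes distinct entries in uniformly random order.

```python
from fractions import Fraction
from itertools import permutations

def nested_max_auditor(block, k, previous_max):
    """Reply to max(d1..dk); hypothetical auditor for this nested sequence only.

    Assumes distinct entries. Answering a value below the previous maximum
    would reveal that the entry just dropped equals that maximum, so deny.
    """
    answer = max(block[:k])
    if previous_max is not None and answer < previous_max:
        return None
    return answer

def attack(block):
    """Return (position, value) of a recovered entry, or None on failure."""
    m1234 = nested_max_auditor(block, 4, None)
    if nested_max_auditor(block, 3, m1234) is None:
        return 4, m1234  # q2 denied: d4 = M1234
    if nested_max_auditor(block, 2, m1234) is None:
        return 3, m1234  # q3 denied: d3 = M123 (which equals M1234 here)
    return None

orders = list(permutations([1, 2, 3, 4]))
wins = sum(attack(list(p)) is not None for p in orders)
success = Fraction(wins, len(orders))
print(success)      # 1/2
print(success / 4)  # 1/8 of the entries recovered on average
```

Each block of four entries yields one recovered entry with probability 1/2, which matches the 1/8 figure on the slide.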
15 Boolean Auditing?
di Boolean
qi = sum(di, di+1): answer 1 / denied
qi denied iff di = di+1 ⇒ learn database/complement
Let di, dj, dk not all equal, where qi-1, qi, qj-1, qj, qk-1, qk all denied
sum(di, dj, dk): answer 1 / 2
Recover the entire database!
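The first stage of this attack (learning the database up to complement) can be sketched in Python. The sketch assumes the queries are sums of adjacent pairs and that the naive auditor denies a pair sum of 0 or 2 because it would reveal both bits; the function names are illustrative, and the final disambiguating query is left out.

```python
def naive_pair_auditor(d, i):
    """Reply to sum(d[i], d[i+1]) on Boolean data (0-based i).

    Assumption: a sum of 0 or 2 would reveal both bits, so it is denied.
    """
    s = d[i] + d[i + 1]
    return s if s == 1 else None

def recover_up_to_complement(n, ask):
    """Rebuild the database from the denial pattern, guessing the first bit is 0."""
    guess = [0]
    for i in range(n - 1):
        same = ask(i) is None  # denied iff the neighbours are equal
        guess.append(guess[-1] if same else 1 - guess[-1])
    return guess

secret = [1, 1, 0, 1, 0, 0]
guess = recover_up_to_complement(len(secret), lambda i: naive_pair_auditor(secret, i))
complement = [1 - b for b in guess]
print(secret in (guess, complement))  # True
```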
16 Two Problems
- Obvious problem: denied queries ignored
- Algorithmic problem: not clear how to incorporate denials in the decision
- Subtle problem
- Query denials leak (potentially sensitive) information
- Users cannot decide denials by themselves
17 A Spectrum of Auditors
[Figure: spectrum of auditors, from size/overlap restrictions to algebraic structure, trading utility against privacy]
Note: can work in the unsafe region, but need to prove denials do not leak crucial information
18 Simulatable Auditing
- An auditor is simulatable if a simulator exists s.t.
[Figure: the simulator reproduces the auditor's answer/deny decisions]
Simulation ⇒ denials do not leak information
(cf. self auditors in DN03)
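A hedged illustration on the sum/max example from earlier in the talk. The slide's formula did not survive extraction, so the sketch assumes one common reading of simulatability: the answer/deny decision can be recomputed from the query log and past answers alone, without the data. Both function names are hypothetical.

```python
from fractions import Fraction

def naive_decision(d):
    """Looks at the true max: deny only when answering would reveal the entries."""
    return "deny" if len(d) * max(d) == sum(d) else "answer"

def simulatable_decision(n, sum_answer):
    """Uses only n and the past sum answer, never the data.

    Deny if SOME dataset consistent with the history would be breached by
    answering. The all-equal dataset is always consistent with a sum answer,
    and its max (sum_answer / n) would reveal every entry.
    """
    worst_case_max = Fraction(sum_answer) / n
    return "deny" if n * worst_case_max == sum_answer else "answer"

for d in ([5, 5, 5], [3, 5, 7], [1, 4, 10]):
    print(naive_decision(d), simulatable_decision(len(d), sum(d)))
```

The naive decision changes with the data, so a denial tells the user something. The simulatable decision is the same for every dataset with sum 15, so the user could have predicted it and learns nothing from it; the price in this toy case is that the max query is always denied.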
19 Why Simulatable Auditors do not Leak Information?
20 Summary
- Improper usage of auditors may lead to privacy breaches, due to information leakage in the decision procedure.
- Cell suppression / some k-anonymity methods should be checked similarly
- Should make sure offline auditors do not leak information in decision
- Simulatable auditors provably don't leak information
- Give best utility while still safe
- A launching point for further research on auditors
- Further research
- Auditors with more reasonable privacy guarantees