Statistical Databases - PowerPoint PPT Presentation

1 / 27
About This Presentation
Title:

Statistical Databases

Description:

Statistical Databases Query Auditing Li Xiong CS573 Data Privacy and Anonymity Partial s credit: Vitaly Shmatikov, Univ Texas at Austin – PowerPoint PPT presentation

Number of Views:61
Avg rating:3.0/5.0
Slides: 28
Provided by: WeiZ5
Category:

less

Transcript and Presenter's Notes

Title: Statistical Databases


1
Statistical Databases Query Auditing
  • Li Xiong
  • CS573 Data Privacy and Anonymity

Partial slides credit Vitaly Shmatikov, Univ
Texas at Austin
2
Query Audit Problem
  • Maintaining privacy of data

Auditor
Database
Does answer to Q combined with answers to Q1,,Qn
reveal something?
3
Variations of the Problem
Specifies subset of the variables
Auditor
Database
List of real, integer, or Boolean values
Wants to learn value of some variable
Min, max, median, sum, average, or count of
specified subset
4
Offline vs. Online
  • Offline auditing
  • Given a collection of queries and answers to
    them, check whether anything forbidden was
    revealed
  • Detects privacy breaches after the fact
  • Online auditing
  • Queries are presented to auditor one at a time
    auditor checks if answering the current query (in
    combination with past answers) reveals
    forbidden information
  • Prevents privacy breaches on-the-fly

5
Disclosure measures
  • Full disclosure (exact-value disclosure) the
    exact value of a protected attribute is disclosed
  • Partial disclosure (interval-based disclosure)
    the disclosed range (difference of the lower and
    upper bounds) of the protected attribute is
    smaller than a predefined threshold
  • Probability-based disclosure the posterior
    distribution of the data after answering queries
    is significantly different from its prior
    distribution

6
Offline Auditors for Full Disclosure
  • Sum queries
  • Max and min queries
  • Median and average queries

7
Audit Expert (Chin 1982)
  • Query auditing method for SUM queries
  • A SUM query can be considered as a linear
    equation
  • where is whether record i belongs to the
    query set, xi is the sensitive value, and q is
    the query result
  • A set of SUM queries can be thought of as a
    system of linear equations
  • Maintains the binary matrix representing linearly
    independent queries and update it when a new
    query is issued
  • A row with all 0s except for ith column indicates
    disclosure

8
Offline Auditing for Full Disclosure
  • Arbitrary combinations of aggregate queries
  • It is unlikely there will be efficient on-line
    application algorithms for SUM MAX queries, SUM
    MIN queries, or SUM MAX MIN queries.
  • Example hardness results
  • Theorem. There is no polynomial time
    full-disclosure auditing algorithm for sum and
    max queries unless PNP.

9
Auditing Sum Queries on Booleans
  • Database collection of secret Boolean variables
  • Query specifies subset S of variables
  • Answer sum of variables in S
  • Privacy breach after asking several queries,
    user learns the value of some secret variable(s)
  • Auditing problem given a set of Boolean
    equations, is there a variable that has the same
    value in all solutions?
  • Auditing Boolean attributes, Kleinberg, 2000

10
Why Is This Interesting?
  • Linear Diaphantine equations
  • Query can be safe on real-valued, unbounded data,
    but reveal information when the data are
    discrete, with known bounds
  • Hardness results the auditing problem for
    Boolean values is coNP-complete.

x y w 1 y z 1 x z 1
Real multiple solutions, secure Boolean unique
solution, insecure (why?)
11
Offline Auditor for Partial Disclosure
  • Partial disclosure the disclosed range of the
    protected attribute is smaller than a predefined
    threshold
  • Sum queries
  • Interval-based disclosure monitoring upper and
    lower bounds for all confidential attributes
  • Auditing interval-based inference. Li et
    al. 2002

12
Offline Auditor for Partial Disclosure
  • New query
  • A series of linear programming problems

13
Incremental evaluation of the LP
  • Treats the auditing problem as a series of
    updation problems and updates the bounds with
    certain rules
  • Horizontal updation given the same set of
    queries, the bounds of one variable, how can the
    prior result be modified to get the bounds of
    other variables
  • Vertical updatation given the same set of
    variables, and bounds under the previous queries,
    how can the prior result modified to get the
    bounds when a new query arrives

An efficient online auditing approach to limit
private data disclosure, Lu, 2009
14
Online Auditing
Auditor
Qi1
Database
A or Denied
Previous queries Q1 Qi
Denied if answering Qi1 would cause a privacy
breach
15
Online Auditing
  • Given a sequence of queries and corresponding
    answers, and a new query, determine if the new
    query should be answered or denied in order to
    prevent a privacy breach.
  • Earliest approaches
  • Query set size control
  • Query set overlap control
  • Limited data utility

16
Offline to Online?
  • Can an offline auditor directly solve the online
    auditing problem?
  • Denials leak information!

17
Sounds Familiar?
slide stolen from Kobbi Nissim
Colonel Oliver North, on the Iran-Contra arms deal
On the advice of my counsel I respectfully and
regretfully decline to answer the question based
on my constitutional rights.
David Duncan, former auditor for Enron and
partner in Arthur Andersen
Mr. Chairman, I would like to answer the
committee's questions, but on the advice of my
counsel I respectfully decline to answer the
question based on the protection afforded me
under the Constitution of the United States.
18
Example Sum/Max
  • Variables di are real, privacy breached if
    adversary learns some di

Auditor
Database
Denied
Oh well
Wait there must be a reason why second query was
denied
The only possible reason for denial is if
d1d2d35
19
Online Audit
  • Denials reduce the search space

20
One workaround
  • Deny whenever the offline algorithm does, in
    addition, randomly deny queries.
  • Issues
  • Leakage is not prevented
  • Have to remember which queries were randomly
    denied
  • Semantically determine whether two queries are
    equivalent

21
Simulatable Auditing
  • Observation denials have the potential to leak
    information if the auditor uses information that
    is unavailable to the attacker (the answer to the
    new query)
  • Simlatable auditing the attacker should be able
    to simulate or mimic the auditors decisions to
    answer or deny a query
  • Denials provably do not leak information

Simulatable Auditing, Kenthapadi, 2005
22
Simulatable Auditing
  • An auditor is a function of Q, A and X
  • An auditor is simulatable if there exists a
    simulator that is a function of only Q and A-ai1
    and whose output is always equal to the auditor.

qi1
Auditor
Deny or answer
23
Simulatable Auditing
24
Simulatable Auditing
  • Query-set-size control
  • Query-set-overlap control
  • Audit expert for sum queries

25
Constructing Simulatable Auditor
  • General sufficient condition the auditor should
    determine if there is any possible dataset,
    consistent with all past responses, in which the
    answer to the current query would cause some
    element to be fully disclosed

26
Example revisited Sum/Max
Auditor
Database
Denied
  • Simulatable auditor would always deny the max
    query following a sum query
  • Lose some utility due to the requirement of
    simulatability

27
Challenges
  • Privacy definition
  • Privacy of groups/families
  • Algorithmic limitations
  • Simulatable algorithms computationally
    prohibitive
  • Most work on sum queries, some on max, min,
    median, hardness results on mixed queries
  • Collusion
  • Reduced utility for legitimate users
  • Large audit trail
  • Utility
  • Percentage of denials may not be the best measure
Write a Comment
User Comments (0)
About PowerShow.com