Database Privacy (ongoing work) - PowerPoint PPT Presentation

Slides: 12
Provided by: Shuchi2
Learn more at: http://www.cs.cmu.edu

1
Database Privacy (ongoing work)
  • Shuchi Chawla, Cynthia Dwork, Adam Smith, Larry
    Stockmeyer, Hoeteck Wee

2
You are being watched!
  • Databases abound
  • Population Census
  • Market Research
  • Used for statistical analysis
  • explaining phenomena
  • making predictions
  • Prone to malicious use
  • using an individual's information for marketing or
    discrimination

3
The Privacy vs. Utility trade-off
  • Inherent tension between Privacy and Utility
  • One extreme: no information, complete privacy
  • Other extreme: complete information, no privacy
  • We want a middle path
  • Preserve macroscopic properties
  • statistical/distributional information
  • clustering information
  • Disguise individual identifying information

4
What is privacy?
  • Gavison: protection from being brought to the
    attention of others
  • inherently valuable
  • attention invites further privacy loss
  • Each individual should blend into a sufficiently
    large crowd

5
Application-oriented approaches
  • Statistical approaches
  • Alter the frequency of particular features, while
    preserving means.
  • Alternatively, erase records that reveal too much
  • Do not consider possible privacy breach from
    combining information from different records
  • Query-based approaches
  • Disallow queries that reveal too much
  • Combination of seemingly innocuous queries could
    reveal individual traits
  • Only good for specific applications
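
The weakness of query-based restrictions noted above can be illustrated with a small differencing example. This is only a sketch: the salary table, the names, and the `MIN_GROUP` threshold are hypothetical and not taken from the slides. Each query covers several records and so passes a naive "too small a group" check, yet the difference of the two answers reveals one person's value.

```python
# Hypothetical data; not from the presentation.
salaries = {"alice": 52000, "bob": 48000, "carol": 61000, "dave": 75000}

MIN_GROUP = 3  # assumed rule: refuse queries over fewer than 3 records


def sum_query(db, names):
    """Answer a sum query only if it covers at least MIN_GROUP records."""
    names = list(names)
    if len(names) < MIN_GROUP:
        raise ValueError("query refused: group too small")
    return sum(db[n] for n in names)


# Both queries look innocuous on their own.
everyone = sum_query(salaries, salaries)
all_but_dave = sum_query(salaries, [n for n in salaries if n != "dave"])

# Combining them isolates a single individual's salary.
inferred_dave = everyone - all_but_dave
print(inferred_dave)
```

A direct query about one person is refused by the same rule, which is why such checks only help for specific, restricted applications.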

6
Towards a general approach
  • Allow arbitrary tests and queries
  • Preserve macroscopic properties, but not
    individual records
  • Approach: perturb individual records
    appropriately and publish the entire dataset
  • Perturbation has to be probabilistic

7
A geometric view
  • A first attempt: an oversimplified abstract
    model
  • Simplifying assumption:
  • each attribute is real-valued
  • Think metric space
  • Real Database (RDB)
  • n unlabeled points in d-dimensional space.
  • Sanitized Database (SDB)
  • n new points possibly in a different space.

8
The adversary or Isolator
  • Using SDB and auxiliary information (AUX),
    outputs a point q
  • q isolates a real point x, if it is very close
    to x, but not to many other real points.
  • No way of obtaining privacy if AUX already
    reveals too much!
  • SDB compromises privacy if looking at it
    considerably increases the adversary's
    probability of isolating a point

9
Isolation: a relative notion
  • Tightly clustered points have a smaller radius of
    isolation
  • T-radius of x: distance from x to its T-th nearest
    neighbor
  • x is isolated if B(q, cδ) contains fewer than T
    points, where δ is the distance between q and x
  • x is safe if the distance between x and q is more
    than T-radius/(c-1)
  • c: privacy parameter (a constant)

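The definitions on this slide can be sketched in a few lines of Python. The conventions here are assumptions, not stated in the slides: the T-radius excludes x itself, the ball B(q, r) is closed, the adversary's ball has radius c times the distance from q to x, and the function names and sample points are made up for illustration.

```python
import math


def t_radius(x, points, T):
    """Distance from x to its T-th nearest neighbour.

    Assumes x is one of `points`; its own distance (0) sorts first,
    so index T is the T-th nearest *other* point.
    """
    return sorted(math.dist(x, p) for p in points)[T]


def isolates(q, x, points, c, T):
    """True if the closed ball around q of radius c*dist(q, x)
    contains fewer than T real points."""
    radius = c * math.dist(q, x)
    inside = sum(1 for p in points if math.dist(q, p) <= radius)
    return inside < T


def is_safe(q, x, points, c, T):
    """Sufficient condition from the slide: q is farther from x
    than T-radius / (c - 1). Requires c > 1."""
    return math.dist(q, x) > t_radius(x, points, T) / (c - 1)


# Hypothetical real database: a small cluster plus one outlier.
rdb = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
c, T = 2, 2

print(isolates((10.5, 10.0), (10.0, 10.0), rdb, c, T))  # guess near the outlier
print(isolates((2.0, 0.0), (0.0, 0.0), rdb, c, T))      # guess near the cluster
```

Under these conventions the safe condition implies non-isolation: if the distance δ from q to x exceeds T-radius/(c-1), then cδ > δ + T-radius, so the adversary's ball covers x together with its T nearest neighbors.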
10
Our contribution
  • A precise definition of privacy using T-radii
  • A perturbation algorithm, closely linked to the
    definition of privacy
  • Prove that the algorithm preserves privacy under
    reasonable assumptions
  • Working towards showing that macroscopic
    properties are preserved

11
What about the real world?
  • Lessons from the abstract model
  • High dimensionality is our friend
  • Outliers
  • Our notion of c-isolation deals with them: they
    get perturbed by a very large amount
  • The existence of an outlier may be disclosed
  • Put more on this slide
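
The slides do not spell out the perturbation algorithm, so the following is only an illustrative sketch of why outliers end up perturbed by a large amount. It assumes (hypothetically) that each point receives independent Gaussian noise whose standard deviation equals its own T-radius; `sanitize`, `noise_scale`, and the sample data are invented for this example and are not the authors' method.

```python
import math
import random


def noise_scale(x, points, T):
    """Assumed noise scale: the T-radius of x (distance to its T-th
    nearest neighbour, with x itself a member of `points`)."""
    return sorted(math.dist(x, p) for p in points)[T]


def sanitize(points, T, rng):
    """Return a perturbed copy of `points`; each coordinate gets
    Gaussian noise with standard deviation noise_scale(x)."""
    sdb = []
    for x in points:
        sigma = noise_scale(x, points, T)
        sdb.append(tuple(xi + rng.gauss(0.0, sigma) for xi in x))
    return sdb


# Hypothetical real database: a tight cluster plus one outlier.
rdb = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (50.0, 50.0)]
T = 2

sdb = sanitize(rdb, T, random.Random(0))
for x in rdb:
    print(x, "noise scale:", round(noise_scale(x, rdb, T), 3))
```

With this choice the clustered points move only slightly, while the outlier's noise scale is on the order of its distance to the cluster, so its published position says little about where it really was, although the fact that an outlier exists may still be visible.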

12
What about Outliers?
  • Bill Gates example here
  • Reconsider definition of privacy
  • do not want to disclose the existence of an outlier
  • do not want to disclose anything about an outlier
  • do not want to disclose the identity of an outlier
  • c-isolation falls in the third category