Privacy Research Overview - PowerPoint PPT Presentation

1
Privacy Research Overview
18739A Foundations of Security and Privacy
  • Anupam Datta
  • Fall 2007-08

2
Privacy Research Space
  • What is Privacy? (Philosophy, Law, Public Policy): TODAY
  • Formal Model, Policy Language, Compliance-check
    Algorithms (Programming Languages, Logic): next 3 lectures
  • Implementation-level Compliance (Software Engg,
    Formal Methods)
  • Data Privacy (Databases, Cryptography): TODAY
3
Philosophical studies on privacy
  • Reading
  • Overview article in Stanford Encyclopedia of
    Philosophy
  • http://plato.stanford.edu/entries/privacy/
  • Alan Westin, Privacy and Freedom, 1967
  • Ruth Gavison, Privacy and the Limits of Law, 1980
  • Helen Nissenbaum, Privacy as Contextual
    Integrity, 2004 (more on Nov 8)

4
Westin 1967
  • Privacy and control over information
  • "Privacy is the claim of individuals, groups or
    institutions to determine for themselves when,
    how, and to what extent information about them is
    communicated to others"
  • Relevant when you give personal information to a
    web site and agree to the privacy policy posted
    on the web site
  • May not apply to your personal health information

5
Gavison 1980
  • Privacy as limited access to self
  • "A loss of privacy occurs as others obtain
    information about an individual, pay attention to
    him, or gain access to him. These three elements
    of secrecy, anonymity, and solitude are distinct
    and independent, but interrelated, and the
    complex concept of privacy is richer than any
    definition centered around only one of them."
  • Basis for database privacy definition discussed
    later

6
Gavison 1980
  • On utility
  • "We start from the obvious fact that both perfect
    privacy and total loss of privacy are
    undesirable. Individuals must be in some
    intermediate state, a balance between privacy
    and interaction ... Privacy thus cannot be said to
    be a value in the sense that the more people have
    of it, the better."
  • This balance between privacy and utility will
    show up in data privacy as well as in privacy
    policy languages, e.g. health data could be
    shared with medical researchers

7
Privacy Laws in the US
  • HIPAA (Health Insurance Portability and
    Accountability Act, 1996)
  • Protecting personal health information
  • GLBA (Gramm-Leach-Bliley Act, 1999)
  • Protecting personal information held by financial
    service institutions
  • COPPA (Children's Online Privacy Protection Act,
    1998)
  • Protecting information posted online by children
    under 13
  • More details in lecture on Nov 8.

8
Data Privacy
  • Releasing sanitized databases
  • k-anonymity
  • (c,t)-isolation
  • Differential privacy
  • Privacy preserving data mining

9
Sanitization of Databases
Add noise, delete names, etc.
Real Database (RDB)
Sanitized Database (SDB)
  • Health records
  • Census data
  • Protect privacy
  • Provide useful information (utility)

10
Re-identification by linking
  • Linking two sets of data on shared attributes may
    uniquely identify some individuals
  • Example (Sweeney): de-identified medical data was
    released; Sweeney purchased the voter registration
    list of MA and re-identified the Governor
  • 87% of US population uniquely identifiable by
    5-digit ZIP, sex, dob
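
A linking attack of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the names, and the helper functions `group_by` and `link` are all made up, and a person counts as re-identified only when exactly one voter and exactly one released record share the same (ZIP, sex, dob) value.

```python
from collections import defaultdict

QUASI_IDENTIFIER = ("zip", "sex", "dob")

# Made-up "de-identified" release: names removed, quasi-identifiers kept.
deidentified_medical = [
    {"zip": "02138", "sex": "M", "dob": "1950-03-02", "diagnosis": "hypertension"},
    {"zip": "02139", "sex": "F", "dob": "1962-01-15", "diagnosis": "asthma"},
    {"zip": "02139", "sex": "F", "dob": "1962-01-15", "diagnosis": "diabetes"},
]

# Made-up external list that carries names together with the same attributes.
voter_list = [
    {"name": "Pat Example", "zip": "02138", "sex": "M", "dob": "1950-03-02"},
    {"name": "Sam Sample", "zip": "02139", "sex": "F", "dob": "1962-01-15"},
    {"name": "Lee Nobody", "zip": "02140", "sex": "M", "dob": "1980-09-09"},
]


def group_by(rows, keys):
    """Group rows by their tuple of values on the given attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return groups


def link(released, external, keys=QUASI_IDENTIFIER):
    """Map each uniquely linked name to the sensitive value it exposes."""
    released_groups = group_by(released, keys)
    external_groups = group_by(external, keys)
    reidentified = {}
    for key, people in external_groups.items():
        records = released_groups.get(key, [])
        # Unique on both sides: one named person, one released record.
        if len(people) == 1 and len(records) == 1:
            reidentified[people[0]["name"]] = records[0]["diagnosis"]
    return reidentified


print(link(deidentified_medical, voter_list))
```

In this toy data only one person is uniquely linked; the second voter matches two released records, so the link is ambiguous, and the third matches none.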

11
K-anonymity (1)
  • Quasi-identifier: set of attributes (e.g. ZIP,
    sex, dob) that can be linked with external data
    to uniquely identify individuals in the
    population
  • Make every record in the table indistinguishable
    from at least k-1 other records with respect
    to quasi-identifiers
  • Linking on quasi-identifiers yields at least k
    records for each value of the quasi-identifier
    that occurs in the table
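
The condition above can be checked mechanically. The following is a minimal sketch, assuming the table is a list of dictionaries whose quasi-identifier columns have already been generalized; the function names and the sample rows are hypothetical.

```python
from collections import Counter


def anonymity_level(table, quasi_identifier):
    """Size of the smallest group of rows sharing one quasi-identifier value."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in table)
    return min(counts.values()) if counts else 0


def is_k_anonymous(table, quasi_identifier, k):
    """True iff every row is indistinguishable from at least k-1 other rows."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in table)
    return all(size >= k for size in counts.values())


# Made-up table with generalized ZIP and age columns.
table = [
    {"zip": "0213*", "age": "20-29", "disease": "flu"},
    {"zip": "0213*", "age": "20-29", "disease": "asthma"},
    {"zip": "0214*", "age": "30-39", "disease": "flu"},
    {"zip": "0214*", "age": "30-39", "disease": "cancer"},
]

print(is_k_anonymous(table, ("zip", "age"), 2))
print(is_k_anonymous(table, ("zip", "age"), 3))
```

The check only looks at the quasi-identifier columns; it says nothing about how diverse the sensitive column is within each group.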

12
K-anonymity and beyond
  • Provides some protection: linking on ZIP, age,
    nationality yields 4 records
  • Limitations: lack of diversity in sensitive
    attributes, background knowledge,
    subsequent releases on the same data set
  • Utility: less suppression implies better utility

13
(c,t)-isolation (2)
  • Mathematical definition motivated by Gavison's
    idea that privacy is protected to the extent that
    an individual blends into a crowd.
  • Image courtesy of WaldoWiki: http://images.wikia.com/waldo/images/a/ae/LandofWaldos.jpg

14
Definition of (c,t)-isolation
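  • Let $y$ be any RDB point, and let $\delta_y = \|q - y\|_2$. We
    say that $q$ $(c,t)$-isolates $y$ iff $B(q, c\delta_y)$ contains
    fewer than $t$ points in the RDB, that is,
    $|B(q, c\delta_y) \cap \mathrm{RDB}| < t$.
  • A database is represented by n points in high
    dimensional space
    (one dimension per column)

[Figure: ball of radius $c\delta_y$ around $q$, with $y$ at distance $\delta_y$ from $q$ and labeled RDB points $x_1$, $x_2$, $x_{t-2}$]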
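
The definition translates directly into a small check. This is a sketch under two assumptions that the slide does not settle: the ball B(q, r) is taken to be closed, and database rows are plain tuples of numbers. The function name `isolates` and the sample points are made up, and `math.dist` requires Python 3.8 or later.

```python
import math


def isolates(q, y, rdb, c, t):
    """Return True iff q (c,t)-isolates y with respect to the points in rdb."""
    delta_y = math.dist(q, y)  # Euclidean distance from q to y
    radius = c * delta_y
    # Assumption: B(q, r) is the closed ball of radius r centred at q.
    in_ball = sum(1 for x in rdb if math.dist(q, x) <= radius)
    return in_ball < t


# Made-up 2-column database: one lone point and one small cluster.
rdb = [(0.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# A guess near the lone point isolates it: only that point is in the ball.
print(isolates((0.5, 0.0), (0.0, 0.0), rdb, c=2, t=2))

# A guess inside the cluster does not: three RDB points fall in the ball.
print(isolates((10.5, 10.5), (10.0, 10.0), rdb, c=2, t=2))
```

The two calls mirror the "blending into a crowd" intuition: the lone point can be singled out, while a point in the cluster cannot.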
15
Definition of (c,t)-isolation (contd)
16
Differential Privacy Motivation (3)
  • Guaranteeing that a sanitized database does not
    imply any private information is too hard
  • Auxiliary info: Terry is an inch taller than
    average
  • Sanitized database: the average height is 6 feet
  • Sanitized database only provided non-private
    data, but resulted in private info being learned
  • All surveyors really need is for people to be
    comfortable supplying their private data
  • People will be comfortable if providing data does
    not change the sanitized database enough to be
    noticed

17
Differential Privacy Formalization
  • Want a sanitization function K that maps two
    databases D1 and D2 that differ by one person to
    about the same sanitized databases K(D1) and
    K(D2)
  • Make a disclosure S about as likely with K(D1) as
    K(D2)
  • A randomized function K gives ε-differential
    privacy if for all data sets D1 and D2 differing
    in at most one element and all subsets S of
    Range(K),
    $\Pr[K(D_1) \in S] \le \exp(\varepsilon) \cdot \Pr[K(D_2) \in S]$
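
The slide does not name a concrete mechanism K. One standard choice, assumed here rather than taken from the slides, is the Laplace mechanism applied to a counting query: adding or removing one person changes a count by at most 1, so Laplace noise of scale 1/ε keeps the output densities on neighbouring databases within a factor exp(ε) of each other. The helper names and the height data below are hypothetical.

```python
import math
import random


def noisy_count(database, predicate, epsilon, rng=random):
    """Release a count with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for row in database if predicate(row))
    # The difference of two i.i.d. exponentials with rate epsilon is
    # Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def laplace_density(x, center, epsilon):
    """Density of the released value at x when the true count is center."""
    return (epsilon / 2) * math.exp(-epsilon * abs(x - center))


def is_tall(row):
    return row["height_in"] > 72


# Made-up neighbouring databases: D2 is D1 without one person.
D1 = [{"height_in": 73}, {"height_in": 70}, {"height_in": 75}]
D2 = D1[:-1]

epsilon = 0.5
print(noisy_count(D1, is_tall, epsilon), noisy_count(D2, is_tall, epsilon))

# Worst-case density ratio over a grid of outputs; it should not exceed
# exp(epsilon), up to floating-point rounding.
ratio = max(
    laplace_density(i / 10, 2, epsilon) / laplace_density(i / 10, 1, epsilon)
    for i in range(-50, 101)
)
print(ratio, math.exp(epsilon))
```

Smaller values of ε mean more noise and a tighter bound, which is the privacy/utility balance in a single parameter.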

18
Privacy Preserving Data Mining
  • Reference
  • Y. Lindell and B. Pinkas. Privacy Preserving Data
    Mining, Journal of Cryptology, 15(3):177-206,
    2002.
  • Problem
  • Compute some function of two confidential
    databases without revealing unnecessary
    information
  • Example: intersection of a govt. database of
    suspected terrorists with an airline passengers
    database
  • Approach
  • Cryptographic techniques for secure multiparty
    computation

19
The Security Definition (Slide: Lindell)
Computational indistinguishability: every
probabilistic polynomial-time observer that
receives the input/output distribution of the
honest parties and the adversary outputs 1 upon
receiving the distribution generated in IDEAL
with negligibly close probability to when it is
generated in REAL.
[Figure: IDEAL and REAL]