Overview of Privacy Preserving Techniques - PowerPoint PPT Presentation

Slides: 42
Provided by: keke9

Transcript and Presenter's Notes

Title: Overview of Privacy Preserving Techniques


1
Overview of Privacy Preserving Techniques
2
  • This is a high-level summary of state-of-the-art
    privacy-preserving techniques and research areas
  • Focus on problems and the basic ideas
  • Office hours are changed to Wed 2-5pm

3
Outline
  • Privacy problem in computing
  • Major techniques
  • Data perturbation
  • Data anonymization
  • Cryptographic methods
  • Privacy in different areas
  • Data mining
  • Data publishing
  • Database access/information retrieval
  • Social network
  • Mobile computing

4
Privacy problem
  • Individual privacy
  • Customer data
  • Public data: census data, voting records
  • Health records
  • Locations
  • Online activities
  • etc.
  • Organization privacy
  • Owning collections of personal data
  • Business secrets
  • Legal issues that prevent data sharing
  • etc.

5
Privacy vs. Security
  • Security
  • Assumption: the two parties trust each other, but
    the communication network is not trusted.

[Figure: Alice encrypts the data and sends it over the communication channel; Bob decrypts it]
Bob knows the original data that Alice owns.
6
  • Privacy
  • Parties do not trust each other: curious parties
    (including malicious insiders) may look at
    sensitive content
  • Parties follow protocols honestly (semi-honest
    assumption)
  • (1) Transformation-based methods

[Figure: Alice sends transformed data over the communication channel to Bob, who might be a curious party and works on the transformed data only]
Bob does not know the original data.
7
  • (2) Cryptographic methods

[Figure: Party 1, Party 2, ..., Party n, each holding its own data, run some protocol using cryptographic primitives; labels: statistical info/intermediate result, info from other parties]
8
Privacy-sensitive scenarios
  • Web model
  • Corporate model

[Figure: several users (user 1, ...), each holding private info]
9
Issues with data transformation
  • Techniques performing the transformation
  • Transformation should preserve important
    information
  • How much information is lost
  • How to recover the information from the
    transformed data
  • Methods reconstructing the original data from the
    transformed data
  • Various attacks
  • The cost
  • Transforming data
  • Recovering the important information

10
Transformation techniques
  • Data Perturbation
  • Additive perturbation
  • Multiplicative perturbation
  • Randomized response
  • Data Anonymization
  • k-anonymization

11
Additive Data Perturbation
  • Definition
  • Y = X + e
  • X is the original data column, e is some
    zero-mean random noise, and Y is the perturbed
    data
  • History
  • Census data
  • Statistical databases [14]

[Figure: released data]
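A minimal sketch of additive perturbation Y = X + e, assuming Gaussian noise with a known standard deviation `sigma` (the noise model, the hypothetical `ages` column, and the function names are illustrative assumptions). Because e has zero mean and is independent of X, the mean of Y estimates the mean of X, and Var(X) can be estimated as Var(Y) - sigma².

```python
import random
import statistics

def perturb(column, sigma, rng):
    """Release Y = X + e, where e ~ N(0, sigma^2) is independent zero-mean noise."""
    return [x + rng.gauss(0.0, sigma) for x in column]

def estimate_moments(perturbed, sigma):
    """Estimate mean and variance of X from Y alone, knowing only sigma."""
    mean_x = statistics.fmean(perturbed)                   # E[Y] = E[X] since E[e] = 0
    var_x = statistics.pvariance(perturbed) - sigma ** 2   # Var(Y) = Var(X) + Var(e)
    return mean_x, var_x

rng = random.Random(42)
ages = [rng.uniform(20, 60) for _ in range(50_000)]   # hypothetical sensitive column
released = perturb(ages, sigma=10.0, rng=rng)
mean_est, var_est = estimate_moments(released, sigma=10.0)
print(round(mean_est, 1), round(var_est, 1))
```

Individual released values are far from the originals (noise standard deviation 10), yet aggregate statistics remain recoverable.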
12
Additive Data Perturbation
  • In data mining
  • Perturb only selected data columns
  • Some data mining algorithms care only about the
    distribution of the data column, rather than
    exact record values
  • The distribution can be reconstructed from the
    perturbed data if the noise distribution is known [10,11]
  • Need to develop new DM algorithms (disadvantage)

[Figure: original, perturbed, and reconstructed distributions]
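Distribution reconstruction can be sketched with an iterative Bayesian update (equivalently, an expectation-maximization step), the kind of procedure used by Agrawal and Srikant; whether the cited works [10,11] use exactly this form is an assumption. The sketch also assumes integer-valued data and a known discrete noise distribution `NOISE_PMF`; both, and all function names, are hypothetical.

```python
import random
from collections import Counter

# Hypothetical known noise distribution: offset -> probability (zero mean).
NOISE_PMF = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}

def reconstruct(perturbed, support, noise_pmf, iterations=500):
    """Estimate the distribution of X over `support` from samples of Y = X + e."""
    counts = Counter(perturbed)
    f = {a: 1.0 / len(support) for a in support}  # start from a uniform guess
    for _ in range(iterations):
        new_f = dict.fromkeys(support, 0.0)
        for y, c in counts.items():
            # Posterior P(X = a | Y = y) under the current estimate f.
            weights = {a: noise_pmf.get(y - a, 0.0) * f[a] for a in support}
            total = sum(weights.values())
            if total == 0.0:
                continue
            for a, w in weights.items():
                new_f[a] += c * w / total
        mass = sum(new_f.values())
        f = {a: v / mass for a, v in new_f.items()}  # renormalize to a distribution
    return f

def l1(p, q, keys):
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

rng = random.Random(7)
offsets, probs = zip(*NOISE_PMF.items())
original = [rng.choice([2, 7]) for _ in range(20_000)]            # bimodal X
perturbed = [x + rng.choices(offsets, probs)[0] for x in original]
support = range(0, 10)

true_dist = {a: c / len(original) for a, c in Counter(original).items()}
naive_dist = {a: c / len(perturbed) for a, c in Counter(perturbed).items()}
recon_dist = reconstruct(perturbed, support, NOISE_PMF)
print("perturbed L1:", round(l1(naive_dist, true_dist, support), 3))
print("reconstructed L1:", round(l1(recon_dist, true_dist, support), 3))
```

The histogram of Y smears each mode over neighboring values; the reconstruction moves mass back toward 2 and 7 using only Y and the noise distribution.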
13
Attacks on additive data perturbation
  • Noise e can be filtered out using
  • Random matrix theory
  • Spectral analysis
  • Papers [13,15,16] discuss techniques for reconstructing
    the data (not just the distribution)
  • When is the perturbation effective in preserving
    privacy?
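The intuition behind such filtering can be shown with a toy case; this is not the random-matrix or spectral algorithm of [13,15,16]. When the original columns are strongly correlated, their common signal lives in a low-dimensional subspace, while independent noise spreads over all dimensions, so projecting perturbed records onto that subspace removes most of the noise. The sketch assumes the extreme, hypothetical case where all m columns copy one hidden value, so the principal direction is known to be (1, ..., 1) and projecting onto it amounts to averaging.

```python
import random

def mse(estimates, truth):
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)

rng = random.Random(1)
n_records, m_columns, sigma = 2_000, 10, 1.0
hidden = [rng.uniform(0, 10) for _ in range(n_records)]

# Perfectly correlated original data: every column equals the hidden value.
original = [[h] * m_columns for h in hidden]
# Additive perturbation with independent zero-mean noise in every cell.
perturbed = [[v + rng.gauss(0.0, sigma) for v in row] for row in original]

# Attack: project each record onto the shared direction (1, ..., 1),
# i.e. estimate the hidden value by the row mean of the perturbed record.
filtered = [sum(row) / m_columns for row in perturbed]

single_column_error = mse([row[0] for row in perturbed], hidden)
filtered_error = mse(filtered, hidden)
print(round(single_column_error, 3), round(filtered_error, 3))
```

Averaging m independent noise terms divides the noise variance by m, so the attacker's error drops roughly tenfold here even though every released cell was perturbed.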

14
Additive perturbation of categorical data
  • Transactional data
  • User A clicked url1, url3, url8
  • User B bought items x, y, z
  • Categorical data perturbation
  • Add fake items to, or remove items from, the itemset
  • While preserving some global distribution
  • Widely used in privacy-preserving association
    rule mining
  • Related work [12,17,111]
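A minimal sketch of item-level randomization for transactional data, assuming a hypothetical scheme that is not necessarily the one in [12,17,111]: each real item is kept with probability p, and each absent item is inserted as a fake item with probability q. If s is an item's true support, its observed support is s' = p·s + q·(1 - s), so s can be estimated as (s' - q) / (p - q) when p ≠ q.

```python
import random

def randomize(transaction, universe, p, q, rng):
    """Keep each real item with prob. p; add each absent item with prob. q."""
    return {
        item for item in universe
        if (item in transaction and rng.random() < p)
        or (item not in transaction and rng.random() < q)
    }

def estimate_support(randomized, item, p, q):
    """Invert s' = p*s + q*(1 - s) to estimate the true support s."""
    observed = sum(item in t for t in randomized) / len(randomized)
    return (observed - q) / (p - q)

rng = random.Random(3)
universe = ["x", "y", "z", "w"]
# Hypothetical purchase data: "x" appears in roughly 30% of transactions.
transactions = [{"x", "y"} if rng.random() < 0.3 else {"y"} for _ in range(20_000)]
released = [randomize(t, universe, p=0.9, q=0.1, rng=rng) for t in transactions]

true_support = sum("x" in t for t in transactions) / len(transactions)
est_support = estimate_support(released, "x", p=0.9, q=0.1)
print(round(true_support, 3), round(est_support, 3))
```

Any single released itemset may contain fake items or miss real ones, but the global support of each item remains estimable.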

15
Multiplicative Data Perturbation
  • Definition
  • X is the original data (multiple columns), Y is
    perturbed data
  • Random projection perturbation: Y = PX
  • P is a random projection matrix
  • Rotation perturbation: Y = RX
  • R is a random rotation matrix → distances are
    preserved
  • Geometric perturbation: Y = RX + T + D
  • T is a translation matrix
  • D is a random noise matrix
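A small sketch of geometric perturbation, written per record as y = Rx + t + d. It assumes 3-dimensional records, an R obtained by orthonormalizing random Gaussian vectors with Gram-Schmidt (an orthogonal matrix, possibly including a reflection), a single random translation vector t, and small Gaussian noise d; these choices and the function names are illustrative. R and t alone preserve pairwise Euclidean distances exactly (up to floating-point error); the noise makes them only approximately preserved.

```python
import math
import random

def random_orthogonal(d, rng):
    """Orthonormalize d random Gaussian vectors with Gram-Schmidt."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-9:  # skip (numerically) dependent draws
            basis.append([vi / norm for vi in v])
    return basis  # rows are orthonormal, so this matrix is orthogonal

def perturb(record, R, t, noise_sigma, rng):
    """y = R x + t + d for one record x."""
    rotated = [sum(r_ij * x_j for r_ij, x_j in zip(row, record)) for row in R]
    return [ri + ti + rng.gauss(0.0, noise_sigma) for ri, ti in zip(rotated, t)]

rng = random.Random(5)
d = 3
R = random_orthogonal(d, rng)
t = [rng.uniform(-10, 10) for _ in range(d)]
records = [[rng.uniform(0, 1) for _ in range(d)] for _ in range(50)]

no_noise = [perturb(x, R, t, 0.0, rng) for x in records]
with_noise = [perturb(x, R, t, 0.01, rng) for x in records]

print(math.dist(records[0], records[1]), math.dist(no_noise[0], no_noise[1]))
```

Because distances survive, a distance-based method such as k-nearest neighbors finds the same neighbors on `no_noise` as on `records` (ties aside), which is why existing DM algorithms can run directly on the perturbed data.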

16
Multiplicative Data Perturbation
  • Unique benefits
  • No need to release the perturbation
    parameters
  • E.g., P, R, T, D
  • More robust to spectral analysis
  • Preserves distances (exactly or approximately)
  • Can use many existing DM algorithms directly on
    the perturbed data
  • No need to develop special DM algorithms

17
Attacks on Multiplicative Data Perturbation
  • Independent Component Analysis
  • For any Y = AX
  • If Y is known, the X columns are independent, and no
    more than one X column has a normal distribution,
  • then A and X can be estimated
  • Requires additional info to be an effective
    attack
  • Attackers knowing a few input/output pairs
  • (x1, y1), (x2, y2), ..., (xk, yk)
  • Using these pairs to estimate the perturbation
    parameters
  • Related work [22,23]
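A toy version of the known input/output attack, assuming noise-free rotation perturbation y = Rx in two dimensions and an attacker who knows two pairs (x1, y1), (x2, y2) with linearly independent x1 and x2. Stacking the pairs as columns gives Y_k = R X_k, so R = Y_k X_k^-1, and since R is orthogonal every other released record can be inverted with R^T. The helper names are hypothetical; attacks on noisy, higher-dimensional perturbations need statistical estimation rather than this exact solve.

```python
import math
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def transpose(M):
    return [list(col) for col in zip(*M)]

rng = random.Random(11)
theta = rng.uniform(0, 2 * math.pi)
R_secret = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

records = [[rng.uniform(0, 100), rng.uniform(0, 100)] for _ in range(5)]
released = [matmul(R_secret, [[x], [y]]) for x, y in records]  # 2x1 column vectors

# Attacker knows the first two records and their perturbed versions.
X_known = transpose([records[0], records[1]])                  # columns x1, x2
Y_known = [[released[0][i][0], released[1][i][0]] for i in range(2)]
R_est = matmul(Y_known, inverse_2x2(X_known))                  # R = Y_k X_k^-1

# Undo the rotation on every released record: x = R^T y.
recovered = [[v[0] for v in matmul(transpose(R_est), y)] for y in released]
print(records[4], recovered[4])
```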

18
Randomized Response
  • Definition
  • Problem: estimate the proportion of yes/no answers
    to a sensitive survey question
  • Each user randomly perturbs his or her answer
  • The true probability of a yes answer can still be
    estimated
  • Applications
  • Related work [31,32,33]
  • No attacks have been studied yet.
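A minimal sketch of one classic randomized response design, Warner's scheme (assumed here; the slide does not fix a particular design): each respondent reports the true answer with probability p and the opposite answer otherwise. If π is the true fraction of yes answers, the observed fraction is λ = p·π + (1 - p)(1 - π), so π can be estimated as (λ - (1 - p)) / (2p - 1) whenever p ≠ 1/2.

```python
import random

def respond(truth, p, rng):
    """Report the true answer with probability p, the opposite otherwise."""
    return truth if rng.random() < p else not truth

def estimate_yes_rate(responses, p):
    """Invert lambda = p*pi + (1 - p)*(1 - pi) to estimate pi."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p)) / (2 * p - 1)

rng = random.Random(2024)
p = 0.75
truths = [rng.random() < 0.2 for _ in range(50_000)]   # hypothetical 20% "yes"
responses = [respond(t, p, rng) for t in truths]

true_rate = sum(truths) / len(truths)
estimated_rate = estimate_yes_rate(responses, p)
print(round(true_rate, 3), round(estimated_rate, 3))
```

No single response reveals a respondent's answer with certainty. The parameter p controls the trade-off: values closer to 1/2 give more deniability but a noisier estimate.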

19
Data Anonymization
  • Publishing microdata for research
  • The problem
  • Normally, explicit user IDs (SSNs, names) are
    removed
  • A virtual identifier (quasi-identifier) combines
    multiple attributes that can still identify individuals

[Figure: voting record and medical record]
Together, the two records identify the MA governor's medical info
20
  • k-anonymity
  • Every value of the virtual identifier is shared by
    at least k records
  • Challenges
  • Techniques to efficiently anonymize the tables
  • Risk of privacy breach
  • Information loss
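Checking whether a table satisfies k-anonymity amounts to a group-by over the virtual identifier. A minimal sketch, assuming the table is a list of dicts and that the quasi-identifier columns (the hypothetical `zip`, `age`, and `sex`) are given explicitly:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())

table = [
    {"zip": "476**", "age": "2*", "sex": "F", "disease": "flu"},
    {"zip": "476**", "age": "2*", "sex": "F", "disease": "cancer"},
    {"zip": "479**", "age": "3*", "sex": "M", "disease": "flu"},
    {"zip": "479**", "age": "3*", "sex": "M", "disease": "hepatitis"},
]
qi = ["zip", "age", "sex"]
print(is_k_anonymous(table, qi, 2))  # True: each QI combination appears twice
print(is_k_anonymous(table, qi, 3))  # False
```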

21
Anonymization implementation
  • Generalization [37,40]
  • Suppression [37,43]
  • Multidimensional clustering [47,48,49]
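A sketch of generalization on two hypothetical quasi-identifiers: ZIP codes are generalized by masking trailing digits, and ages by bucketing into ranges, with the generalization level raised until the table is k-anonymous. The hierarchy and the simple level-by-level search are illustrative assumptions, not the algorithms of [37,40]; published methods search for minimal generalizations and may combine them with suppression.

```python
from collections import Counter

def generalize(row, level):
    """Mask `level` trailing ZIP digits; bucket age into (10*level)-wide ranges."""
    zip_code = row["zip"][: len(row["zip"]) - level] + "*" * level
    if level == 0:
        age = str(row["age"])
    else:
        width = 10 * level
        low = row["age"] // width * width
        age = f"{low}-{low + width - 1}"
    return (zip_code, age)

def anonymize(rows, k, max_level=5):
    """Return (level, QI tuples) for the lowest level that is k-anonymous."""
    for level in range(max_level + 1):
        qis = [generalize(r, level) for r in rows]
        if min(Counter(qis).values()) >= k:
            return level, qis
    return None, None  # would need suppression or a different hierarchy

rows = [
    {"zip": "47677", "age": 29}, {"zip": "47602", "age": 22},
    {"zip": "47678", "age": 27}, {"zip": "47905", "age": 43},
    {"zip": "47909", "age": 52}, {"zip": "47906", "age": 47},
]
level, generalized = anonymize(rows, k=3)
print(level, generalized)
```

Each step up the hierarchy merges more records into the same group, which illustrates the information-loss challenge: level 2 is needed here, leaving only "476**"/"479**" and 20-year age ranges.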

22
Risk of privacy breach
  • l-diversity [39]
  • t-closeness [53]
  • m-invariance [52]

[Figure: privacy is not protected]
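k-anonymity alone can fail when every record in a group has the same sensitive value (a homogeneity attack). A minimal sketch of the simplest variant, distinct l-diversity (each group must contain at least l distinct sensitive values), assuming the same hypothetical list-of-dicts table layout:

```python
from collections import defaultdict

def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, min_distinct):
    """True if every QI group contains at least `min_distinct` sensitive values."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[c] for c in quasi_identifiers)].add(row[sensitive])
    return all(len(values) >= min_distinct for values in groups.values())

table = [
    {"zip": "476**", "age": "2*", "disease": "cancer"},
    {"zip": "476**", "age": "2*", "disease": "cancer"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "hepatitis"},
]
qi = ["zip", "age"]
# 2-anonymous, but anyone known to be in the first group must have cancer.
print(is_distinct_l_diverse(table, qi, "disease", 2))  # False
```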
23
Risk of privacy breach
  • Attackers' prior background knowledge
  • Difficult to quantify
  • Bayesian analysis is the major tool
  • Papers [69,70,71,72,73,74]

24
Cryptographic approaches
  • Using the following cryptographic primitives
  • Secure multiparty computation (SMC)
  • Yao's millionaire problem
  • Alice wants to know whether she has more money
    than Bob
  • Neither Alice nor Bob learns exactly how much money
    the other has; Alice learns only the result
  • Oblivious transfer
  • Bob holds n items. Alice wants the i-th item.
  • Bob cannot learn i (Alice's privacy)
  • Alice learns nothing except the i-th item
  • Homomorphic encryption
  • Allows computation on encrypted data
  • E.g., E(X) · E(Y) = E(X + Y)
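To make the homomorphic property concrete, here is a toy sketch of the Paillier cryptosystem, one well-known additively homomorphic scheme (chosen here as an example; the slide does not name a scheme). Multiplying two ciphertexts modulo n² gives an encryption of the sum of the plaintexts. The function names are hypothetical, and the tiny primes are for illustration only and offer no security.

```python
import math
import random

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^-1 mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)

def encrypt(n, m, rng):
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2  # c = g^m * r^n mod n^2

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n  # L(u) = (u - 1) / n, then multiply by mu

rng = random.Random(0)
n, sk = keygen(293, 433)             # toy primes, NOT secure
c1, c2 = encrypt(n, 42, rng), encrypt(n, 100, rng)
c_sum = (c1 * c2) % (n * n)          # operate on ciphertexts only
print(decrypt(n, sk, c_sum))         # 142
```

A party holding only `c1` and `c2` can compute `c_sum` without learning 42 or 100; only the key holder can decrypt the total.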

25
  • Characteristics
  • Pro: preserves total privacy
  • Con: expensive; supports only a limited number of parties
  • Applications: distributed datasets (the corporate
    model)
  • All kinds of data mining algorithms
  • Statistical analysis (matrix, vector computation)
  • Often discussed in two-party scenarios.

26
Privacy-preserving data mining
  • Privacy-preserving data classification
  • Decision tree, naïve Bayes classifier
  • They work on individual column distributions
  • Additive perturbation can be applied
  • Distance-based classifiers
  • Kernel methods, SVMs, linear methods
  • Multiplicative perturbation can be applied
  • Cryptographic protocols

27
  • Data clustering
  • Using similarity measure (distance)
  • Group data items
  • Privacy-Preserving Methods
  • Multiplicative perturbation can be used
  • Cryptographic protocols

28
  • Association rule mining
  • Transactional datasets
  • Find relationships such as {a, b} → c
  • Support: probability that a, b, and c appear together in
    the whole dataset
  • Confidence: given that a and b appear, the probability
    that c also appears
  • Methods
  • Protecting the original transactional data
  • Categorical data perturbation
  • Protecting sensitive rules
  • Rule hiding
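Support and confidence for a rule such as {a, b} → c follow directly from their definitions. A minimal sketch over a small hypothetical transaction list (the function names are illustrative):

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "c", "d"},
    {"b", "c"},
]
print(support(transactions, {"a", "b", "c"}))        # 0.5
print(confidence(transactions, {"a", "b"}, {"c"}))   # 2/3
```

Rule hiding works against exactly these quantities: it modifies transactions so that a sensitive rule's support or confidence drops below the mining thresholds.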

29
  • Stream mining
  • Limited memory, unlimited streaming data
  • Your algorithm can look at each record only once
  • Analysis has to be done incrementally
  • Statistical properties evolve over time
  • Applications
  • Monitoring the correlation between streams
  • Monitoring change of clustering structures
  • Adaptive classifiers
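The one-pass, incremental constraint can be illustrated with Welford's online algorithm for mean and variance, which keeps constant-size state and never revisits earlier records. This is a generic stream-processing example rather than a privacy technique; the class name is hypothetical.

```python
class RunningStats:
    """Welford's one-pass mean/variance: O(1) memory, each record seen once."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the records seen so far."""
        return self._m2 / self.count if self.count else 0.0

stats = RunningStats()
for reading in [4.0, 7.0, 13.0, 16.0]:   # e.g. values arriving on a stream
    stats.update(reading)
print(stats.mean, stats.variance)          # 10.0 22.5
```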

30
  • Privacy-preserving stream mining
  • Private info in data streams
  • Additive perturbation [159]
  • Sensitive rules in output
  • Hiding rules [160]
  • Private search over data streams [155,156]

31
Privacy-preserving data access
  • Goal: allow a user to query a database while hiding
  • The query she submitted
  • The identity of the records in the result
  • Motivation: patent databases, stock quotes, web
    access, and many more

32
Basic Modeling
  • Server holds an n-bit string x
  • n should be thought of as very large
  • User wishes to retrieve x_i and to keep i private

33
Different Scenarios [DuAtallah2000]
34
Private information matching (PIM)
  • Alice does not want Bob to know her query or the
    query result.
  • Bob's database can be private or public
  • Private: Alice should learn only the requested
    content (probing by queries)
  • Public: no restriction (PIMPD)
  • Related work
  • [132,136,143,144,145,148]

35
Secure Storage Outsourcing
  • Bob hosts Alice's encrypted database
  • Alice needs to query the database
  • Query privacy
  • Result privacy
  • Other clients use Alice's outsourced database
    (SSCO)
  • Alice can charge a client only if she knows the
    client queried her database
  • Possible collusion between clients and Bob: how
    can it be prevented?
  • Related work
  • [138,142,147]

36
Naïve private protocol
[Figure: the SERVER holds x = x_1, x_2, ..., x_n; the USER wants x_i]
Server sends the entire database x to the User.
Communication cost: n
Bad news: it has been proved that with a single
server, the minimum communication cost of an
information-theoretically private protocol is n
37
The state-of-the-art
  • Information-theoretic approaches
  • Better protocols are available when the data are
    replicated on two or more servers
  • Cryptographic protocols with cryptographic
    primitives
  • E.g., oblivious transfer protocol
  • Can be expensive
  • Other protocols
  • Combined with perturbation techniques
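The multi-server idea can be sketched with the classic two-server XOR scheme of Chor, Goldreich, Kushilevitz, and Sudan, assuming two non-colluding servers that each hold a copy of x. The user sends a uniformly random index set S to server 1 and S with i toggled to server 2; each server returns the XOR of the requested bits, and XORing the two answers yields x_i. Each server alone sees a uniformly random set and so learns nothing about i. The function names are hypothetical, and this basic version still sends O(n) bits of query; more elaborate schemes reduce that cost.

```python
import secrets

def server_answer(database, index_set):
    """XOR of the database bits at the requested positions."""
    answer = 0
    for j in index_set:
        answer ^= database[j]
    return answer

def retrieve(database_copy1, database_copy2, i):
    n = len(database_copy1)
    s1 = {j for j in range(n) if secrets.randbits(1)}  # uniformly random subset
    s2 = s1 ^ {i}                                      # same subset with i toggled
    # Bits in both sets cancel out; only x_i survives the final XOR.
    return server_answer(database_copy1, s1) ^ server_answer(database_copy2, s2)

x = [1, 0, 1, 1, 0, 0, 1, 0]          # hypothetical n-bit database, replicated
print([retrieve(x, x, i) for i in range(len(x))])   # [1, 0, 1, 1, 0, 0, 1, 0]
```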

38
Privacy in Social Network Data
  • Publishing social network structure
  • Attacks can be applied to reveal the mapping
    [163,167]
  • Characteristics of subgraphs
  • Adversarial background knowledge

Anonymization is the major method
39
Privacy in Mobile computing
  • Location-based services
  • location-aware emergency response,
  • location-based advertisement,
  • location-based entertainment, etc.
  • Location privacy threats
  • Ad spam
  • Visits to clinics and doctors' offices: medical
    info
  • Visits to entertainment districts: lifestyle
  • Visits to political events: unpopular political
    views
  • Physical harm: domestic abuse

40
  • Preserving location privacy
  • User-defined or system-supplied privacy policies
    [BambaLiu2008, BeresfordStajano2003]
  • Extending k-anonymity techniques to location
    cloaking [GedikLiu2008, GruteserGrunwald2002]
  • Pseudonymity of user identities: frequently
    changing internal IDs [BeresfordStajano2003]
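A minimal sketch of location cloaking in the spirit of spatial k-anonymity, assuming a hypothetical trusted anonymizer that knows all current user positions. Instead of the exact location, the service receives the grid-aligned square cell containing the user; the cell size doubles until the cell holds at least k users. Aligning cells to a fixed grid (rather than centering a box on the user) avoids revealing the user's position as the region's center. The doubling strategy, `base_cell`, and all names are illustrative assumptions, not the algorithms of the cited papers.

```python
import math

def cloak(user, others, k, base_cell=1.0, max_steps=20):
    """Return a grid-aligned square (x0, y0, size) containing >= k users.

    The square is the grid cell holding `user`; the cell size doubles until
    the cell also holds at least k - 1 other users.
    """
    size = base_cell
    for _ in range(max_steps):
        x0 = math.floor(user[0] / size) * size
        y0 = math.floor(user[1] / size) * size
        inside = sum(
            1 for (x, y) in others
            if x0 <= x < x0 + size and y0 <= y < y0 + size
        )
        if inside + 1 >= k:
            return (x0, y0, size)
        size *= 2
    return None  # not enough users nearby; the request could be withheld

me = (3.2, 5.7)
nearby = [(3.9, 5.1), (1.5, 6.2), (7.4, 7.9), (2.2, 4.4), (12.0, 1.0)]
print(cloak(me, nearby, k=3))   # (2.0, 4.0, 2.0)
```

Larger k gives stronger anonymity but coarser regions, and therefore less accurate location-based service results.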

41
  • Any other questions?