PrivacyMaxEnt: Integrating Background Knowledge in Privacy Quantification

Learn more at: https://web.ecs.syr.edu

1
Privacy-MaxEnt: Integrating Background Knowledge
in Privacy Quantification
  • Wenliang (Kevin) Du,
  • Zhouxuan Teng,
  • and Zutao Zhu.
  • Department of Electrical Engineering and Computer
    Science
  • Syracuse University, Syracuse, New York.

2
Introduction
  • Privacy-Preserving Data Publishing.
  • The impact of background knowledge
  • How does it affect privacy?
  • How to measure its impact on privacy?
  • Integrate background knowledge in privacy
    quantification.
  • Privacy-MaxEnt: a systematic approach.
  • Based on well-established theories.
  • Evaluation.

3
Privacy-Preserving Data Publishing
  • Data disguise methods
  • Randomization
  • Generalization (e.g. Mondrian)
  • Bucketization (e.g. Anatomy)
  • Our Privacy-MaxEnt method can be applied to
    Generalization and Bucketization.
  • We pick Bucketization in our presentation.

4
Data Sets
Table columns: Identifier, Quasi-Identifier (QI), Sensitive Attribute (SA)
5
Bucketized Data
Table columns: Quasi-Identifier (QI), Sensitive Attribute (SA)
P(Breast cancer | female, college, bucket 1) = 1/4
P(Breast cancer | female, junior, bucket 2) = 1/3
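
Without background knowledge, probabilities like the two above come from counting inside each bucket. A minimal Python sketch, assuming hypothetical bucket contents chosen only to reproduce 1/4 and 1/3 (the slide's actual table is not part of this transcript):

```python
from fractions import Fraction

# Hypothetical bucketized data: each bucket lists the sensitive values it holds.
# These contents are assumptions picked to match the slide's 1/4 and 1/3.
buckets = {
    1: ["breast cancer", "flu", "pneumonia", "hiv"],
    2: ["breast cancer", "flu", "lung cancer"],
}

def p_sa_given_bucket(sa_value, bucket_id):
    """Share of records in the bucket that carry sa_value."""
    values = buckets[bucket_id]
    return Fraction(values.count(sa_value), len(values))

p1 = p_sa_given_bucket("breast cancer", 1)
p2 = p_sa_given_bucket("breast cancer", 2)
print(p1, p2)
```

Every record in a bucket gets the same estimate here, whatever its QI value. Background knowledge changes exactly this.
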
6
Impact of Background Knowledge
  • Background Knowledge
  • It's rare for males to have breast cancer.
  • This analysis is hard for large data sets.

7
Previous Studies
  • Martin et al., ICDE'07.
  • First formal study on background knowledge
  • Chen, LeFevre, Ramakrishnan, VLDB'07.
  • Improves the previous work.
  • They deal with rule-based knowledge.
  • Deterministic knowledge.
  • Background knowledge can be much more
    complicated.
  • Uncertain knowledge

8
Complicated Background Knowledge
  • Rule-based knowledge
  • P(s | q) = 1.
  • P(s | q) = 0.
  • Probability-based knowledge
  • P(s | q) = 0.2.
  • P(s | Alice) = 0.2.
  • Vague background knowledge
  • 0.3 ≤ P(s | q) ≤ 0.5.
  • Miscellaneous types
  • P(s | q1) + P(s | q2) = 0.7
  • One of Alice and Bob has Lung Cancer.

9
Challenges
  • How to analyze privacy in a systematic way for
    large data sets and complicated background
    knowledge?
  • What do we want to compute?
  • P(S | Q), given the background knowledge and
    the published data set.
  • P(S | Q) is a primitive for most privacy metrics.
  • Directly computing P(S | Q) is hard.

10
Our Approach
Consider P(S | Q) as a variable x (a vector).
  • Background knowledge → constraints on x
  • Published data → constraints on x
  • Solve for x: take the most unbiased solution
  • Public information (diagram label)
11
Maximum Entropy Principle
  • Information theory provides a constructive
    criterion for setting up probability
    distributions on the basis of partial knowledge,
    and leads to a type of statistical inference
    which is called the maximum entropy estimate. It
    is the least biased estimate possible on the given
    information.
  • by E. T. Jaynes, 1957.

12
The MaxEnt Approach
  • Background knowledge → constraints on P(S | Q)
  • Published data → constraints on P(S | Q)
  • Constraints → maximum entropy estimate → estimate of P(S | Q)
  • Public information (diagram label)
13
Entropy
Because H(S | Q, B) = H(Q, S, B) - H(Q, B),
constraints should use P(Q, S, B) as variables.
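
The identity above can be checked numerically. The sketch below uses a made-up joint distribution (all values are assumptions for illustration) and computes H(S | Q, B) both from its definition and as H(Q, S, B) - H(Q, B). If P(Q, B) is fixed by the published data, H(Q, B) is a constant, so maximizing H(S | Q, B) amounts to maximizing H(Q, S, B).

```python
import math

# Hypothetical joint distribution P(q, s, b); keys are (q, s, b) triples.
joint = {
    ("q1", "s1", 1): 0.2,
    ("q1", "s2", 1): 0.1,
    ("q2", "s1", 1): 0.1,
    ("q2", "s2", 1): 0.1,
    ("q3", "s3", 2): 0.3,
    ("q3", "s4", 2): 0.2,
}

def entropy(dist):
    """Shannon entropy (natural log) of a distribution given as a dict."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def marginal_qb(joint_dist):
    """Sum out S to obtain P(q, b)."""
    out = {}
    for (q, s, b), p in joint_dist.items():
        out[(q, b)] = out.get((q, b), 0.0) + p
    return out

def conditional_entropy(joint_dist):
    """H(S | Q, B) computed directly from its definition."""
    qb = marginal_qb(joint_dist)
    return -sum(p * math.log(p / qb[(q, b)])
                for (q, s, b), p in joint_dist.items() if p > 0)

h_direct = conditional_entropy(joint)
h_chain = entropy(joint) - entropy(marginal_qb(joint))
print(h_direct, h_chain)
```
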
14
Maximum Entropy Estimate
  • Let vector x = P(Q, S, B).
  • Find the value for x that maximizes its entropy
    H(Q, S, B), while satisfying
  • h1(x) = c1, ..., hu(x) = cu (equality constraints)
  • g1(x) ≤ d1, ..., gv(x) ≤ dv (inequality
    constraints)
  • A special case of Non-Linear Programming.
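
Written out, the problem on this slide has roughly the following shape. The entropy expression is the standard Shannon form, and the direction of the inequalities (≤) is an assumption, since the operators did not survive in the transcript:

```latex
\begin{aligned}
\max_{x}\quad & H(Q,S,B) = -\sum_{q,s,b} P(q,s,b)\,\log P(q,s,b) \\
\text{subject to}\quad & h_i(x) = c_i, \qquad i = 1,\dots,u, \\
& g_j(x) \le d_j, \qquad j = 1,\dots,v,
\end{aligned}
\qquad x = \bigl(P(q,s,b)\bigr)_{q,s,b}.
```

The objective is concave in x, so when all h_i and g_j are linear this is a convex program.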

15
Constraints from Knowledge
Background Knowledge
Constraints on P(Q, S, B)
  • Linear model: quite generic.
  • Conditional probability
  • P(S | Q) = P(Q, S) / P(Q).
  • Background knowledge has nothing to do with B
  • P(Q, S) = P(Q, S, B1) + ... + P(Q, S, Bm).
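
Combining the two identities above, a statement such as P(s | q) = 0.2 becomes a linear equation over the variables P(q, s, b): the sum over buckets of P(q, s, b) equals 0.2 · P(q). A small sketch, assuming P(q) can be read off the published data; the helper names and the toy distribution are hypothetical:

```python
def conditional_to_linear(q, s, prob, p_q, bucket_ids):
    """Turn P(s | q) = prob into  sum_b P(q, s, b) = prob * P(q).

    Returns (coefficients keyed by (q, s, b), right-hand side).
    """
    coeffs = {(q, s, b): 1.0 for b in bucket_ids}
    return coeffs, prob * p_q

def lhs(coeffs, joint):
    """Evaluate the left-hand side of a linear constraint on a joint distribution."""
    return sum(c * joint.get(var, 0.0) for var, c in coeffs.items())

# Toy joint distribution (an assumption) in which P(q1) = 0.5 and P(s1 | q1) = 0.2.
joint = {
    ("q1", "s1", 1): 0.06, ("q1", "s1", 2): 0.04,
    ("q1", "s2", 1): 0.24, ("q1", "s2", 2): 0.16,
    ("q2", "s1", 1): 0.50,
}

coeffs, rhs = conditional_to_linear("q1", "s1", prob=0.2, p_q=0.5, bucket_ids=[1, 2])
print(lhs(coeffs, joint), rhs)
```

Vague knowledge such as 0.3 ≤ P(s | q) ≤ 0.5 would give two linear inequalities built the same way.
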

16
Constraints from Published Data
Published Data Set D
Constraints on P(Q, S, B)
  • Constraints
  • Truth and only the truth.
  • Absolutely correct for the original data set.
  • No inference.

17
Assignment and Constraints
Observation: the original data is one of the
assignments. Constraint: true for all possible
assignments.
18
QI Constraint
Constraint
Example
19
SA Constraint
Constraint
Example
20
Zero Constraint
  • P(q, s, b) = 0, if q or s does not appear in
    bucket b.
  • We can reduce the number of variables.
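
The formulas on the QI and SA constraint slides did not survive in this transcript. The sketch below is therefore an assumption about their shape: it treats them as marginal constraints, where summing P(q, s, b) over s gives the published share of q in bucket b, and summing over q gives the published share of s in bucket b. The zero constraint is applied by creating variables only for (q, s) pairs that occur in the bucket. The function name and the toy bucket are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def bucket_constraints(bucket_id, qi_values, sa_values, n_total):
    """Build variables and marginal constraints for one published bucket."""
    q_counts, s_counts = Counter(qi_values), Counter(sa_values)
    # Zero constraint: only (q, s) pairs present in this bucket get a variable.
    variables = [(q, s, bucket_id) for q in sorted(q_counts) for s in sorted(s_counts)]
    constraints = []
    for q, n in sorted(q_counts.items()):
        # Assumed QI constraint: sum over s of P(q, s, b) = count(q in b) / N.
        constraints.append(({v: 1 for v in variables if v[0] == q}, Fraction(n, n_total)))
    for s, n in sorted(s_counts.items()):
        # Assumed SA constraint: sum over q of P(q, s, b) = count(s in b) / N.
        constraints.append(({v: 1 for v in variables if v[1] == s}, Fraction(n, n_total)))
    return variables, constraints

variables, constraints = bucket_constraints(
    1, ["q1", "q1", "q2"], ["s1", "s2", "s2"], n_total=6)
for coeffs, rhs in constraints:
    print(sorted(coeffs), rhs)
```
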

21
Theoretic Properties
  • Soundness: are they correct?
  • Easy to prove.
  • Completeness: have we missed any constraint?
  • See our theorems and proofs.
  • Conciseness: are there redundant constraints?
  • Only one redundant constraint in each bucket.
  • Consistency: is our approach consistent with the
    existing methods (i.e., when background knowledge
    is Ø)?

22
Completeness w.r.t. Equations
  • Have we missed any equality constraint?
  • Yes!
  • If F1 = C1 and F2 = C2 are constraints, F1 + F2 =
    C1 + C2 is too. However, it is redundant.
  • Completeness Theorem
  • U: our constraint set.
  • All linear constraints can be written as
    linear combinations of the constraints in U.
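
Redundancy of this kind shows up as a rank deficiency in the constraint matrix. The sketch below assumes the marginal (row-sum and column-sum) form of the bucket constraints for one bucket with two QI values and two SA values. It computes the rank with exact fractions and checks that adding F1 + F2 leaves the rank unchanged. The `rank` helper is written here for illustration and is not from the slides.

```python
from fractions import Fraction

def rank(rows):
    """Matrix rank via Gauss-Jordan elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Variable order: P(q1,s1,b), P(q1,s2,b), P(q2,s1,b), P(q2,s2,b).
base = [
    [1, 1, 0, 0],  # sum over s for q1
    [0, 0, 1, 1],  # sum over s for q2
    [1, 0, 1, 0],  # sum over q for s1
    [0, 1, 0, 1],  # sum over q for s2
]
combo = [a + b for a, b in zip(base[0], base[2])]  # F1 + F2

print(rank(base), rank(base + [combo]))
```

Four constraints with rank three means one of them is redundant, which matches the "only one redundant constraint in each bucket" claim for this toy case.
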

23
Completeness w.r.t. Inequalities
  • Have we missed any inequality constraints?
  • Yes!
  • If F = C, then F ≤ C + 0.2 is also valid
    (redundant).
  • Completeness Theorem
  • Our constraint set is also complete in the
    inequality sense.

24
Putting Them Together
Tools: LBFGS, TOMLAB, KNITRO, etc.
  • Background knowledge → constraints on P(S | Q)
  • Published data → constraints on P(S | Q)
  • Constraints → maximum entropy estimate → estimate of P(S | Q)
  • Public information (diagram label)
25
Inevitable Questions
  • Where do we get background knowledge?
  • Do we have to be very very knowledgeable?
  • For P(s | q) type of knowledge:
  • All useful knowledge is in the original data set.
  • Association rules
  • Positive: Q ⇒ S
  • Negative: Q ⇒ ¬S, ¬Q ⇒ S, ¬Q ⇒ ¬S
  • Bound the knowledge in our study.
  • Top-K strongest association rules.
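
One way to picture "Top-K strongest association rules" is to rank rules q ⇒ s mined from the original data. The slides do not say how strength is measured, so the sketch below assumes confidence, P(s | q), with support as a tie-breaker. The records and the function name are hypothetical.

```python
from collections import Counter

def top_k_rules(records, k):
    """Rank rules q => s by confidence P(s | q); ties broken by support count."""
    pair_counts = Counter(records)
    q_counts = Counter(q for q, _ in records)
    rules = [(n / q_counts[q], n, q, s) for (q, s), n in pair_counts.items()]
    rules.sort(key=lambda r: (-r[0], -r[1], r[2], r[3]))
    return [(q, s, conf) for conf, _, q, s in rules[:k]]

# Hypothetical original data as (QI value, SA value) pairs.
records = [
    ("male", "flu"), ("male", "flu"), ("male", "hiv"),
    ("female", "breast cancer"), ("female", "breast cancer"),
    ("female", "breast cancer"), ("female", "flu"),
]

top = top_k_rules(records, k=2)
print(top)
```

Each selected rule would then enter the model as a P(s | q) constraint. Negative rules could be ranked the same way using 1 - P(s | q).
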

26
Knowledge about Individuals
Alice: (i1, q1); Bob: (i4, q2); Charlie: (i9, q5)
Knowledge 1: Alice has either s1 or s4.
Constraint
Knowledge 2: Two people among Alice, Bob, and
Charlie have s4.
Constraint
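
The constraint formulas for this slide are missing from the transcript. As a hedged reconstruction, assuming the model is extended with per-individual probabilities P(s | i) for identifiers i1, i4, and i9, the two statements could be written as linear equations like these:

```latex
\begin{aligned}
\text{Knowledge 1:}\quad & P(s_1 \mid i_1) + P(s_4 \mid i_1) = 1, \\
\text{Knowledge 2:}\quad & P(s_4 \mid i_1) + P(s_4 \mid i_4) + P(s_4 \mid i_9) = 2.
\end{aligned}
```

The second equation fixes the expected number of people with s4 among the three, which is a linear relaxation of "exactly two".
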
27
Evaluation
  • Implementation
  • Lagrange multipliers
  • Constrained optimization → unconstrained
    optimization
  • LBFGS: solving the unconstrained optimization
    problem.
  • Pentium 3 GHz CPU with 4 GB memory.
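
The Lagrange-multiplier step can be sketched on a toy problem with equality constraints A x = c. Setting the gradient of the Lagrangian to zero gives x_i = exp(-1 - Σ_j λ_j A_ji), which leaves an unconstrained problem in λ. The slides solve that problem with LBFGS. This sketch swaps in plain gradient descent with a fixed step to stay dependency-free, and the matrix, right-hand side, step size, and iteration count are all assumptions chosen for the example.

```python
import math

def maxent_dual_descent(A, c, step=0.5, iters=5000):
    """Maximize -sum(x * log x) subject to A x = c via the Lagrangian dual."""
    n_con, n_var = len(A), len(A[0])
    lam = [0.0] * n_con
    x = [0.0] * n_var
    for _ in range(iters):
        # Stationarity of the Lagrangian: x_i = exp(-1 - sum_j lam_j * A[j][i]).
        x = [math.exp(-1.0 - sum(lam[j] * A[j][i] for j in range(n_con)))
             for i in range(n_var)]
        # Gradient of the dual objective with respect to lam_j is c_j - (A x)_j.
        grad = [c[j] - sum(A[j][i] * x[i] for i in range(n_var))
                for j in range(n_con)]
        lam = [l - step * g for l, g in zip(lam, grad)]
    return x

# One bucket, variables ordered P(q1,s1), P(q1,s2), P(q2,s1), P(q2,s2).
A = [
    [1, 1, 0, 0],  # P(q1) = 0.5
    [0, 0, 1, 1],  # P(q2) = 0.5
    [1, 0, 1, 0],  # P(s1) = 0.25 (the P(s2) row is redundant and omitted)
]
c = [0.5, 0.5, 0.25]

x = maxent_dual_descent(A, c)
print([round(v, 6) for v in x])
```

With no background knowledge, the estimate should come out as the product P(q) · P(s) within the bucket, which is the uniform-within-bucket answer that the consistency property refers to. Background knowledge would be added as extra rows of A.
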

28
Privacy versus Knowledge
Estimation accuracy: KL distance between
P(MaxEnt)(S | Q) and P(Original)(S | Q).
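
A sketch of such an accuracy measure follows. The transcript does not give the exact formula, so two things are assumptions: the divergence is taken from the original distribution to the estimate, and the per-q terms are weighted by P(q). The toy distributions are made up.

```python
import math

def conditional_kl(p_true, p_est, p_q):
    """Sum over q of P(q) * KL( P_true(. | q) || P_est(. | q) ).

    Assumes p_est[q][s] > 0 wherever p_true[q][s] > 0.
    """
    total = 0.0
    for q, weight in p_q.items():
        for s, p in p_true[q].items():
            if p > 0:
                total += weight * p * math.log(p / p_est[q][s])
    return total

p_q = {"q1": 1.0}
p_original = {"q1": {"s1": 0.5, "s2": 0.5}}
p_maxent = {"q1": {"s1": 0.25, "s2": 0.75}}

kl_same = conditional_kl(p_original, p_original, p_q)
kl_diff = conditional_kl(p_original, p_maxent, p_q)
print(kl_same, kl_diff)
```

A value of zero means the estimate matches the original conditional distribution exactly. Larger values mean a less accurate estimate.
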
29
Privacy versus # of QI attributes
30
Performance vs. Knowledge
31
Running Time vs. Data Size
32
Iterations vs. Data Size
33
Conclusion
  • Privacy-MaxEnt is a systematic method
  • Model various types of knowledge
  • Model the information from the published data
  • Based on well-established theory.
  • Future work
  • Reducing the # of constraints
  • Vague background knowledge
  • Background knowledge about individuals