Combinations of SDC methods for continuous microdata - PowerPoint PPT Presentation

About This Presentation

Slides: 15

Provided by: aoga

Transcript and Presenter's Notes

Title: Combinations of SDC methods for continuous microdata

1
Combinations of SDC methods for continuous microdata
Anna Oganian
National Institute of Statistical Sciences
2
Introduction

SDC methods have two goals

Methods for continuous microdata

Rankswapping
Additive noise
Resampling
Microaggregation
- microaggregation based on one variable
- microaggregation based on several variables

3
Why combinations?
Methods have very different properties, so by combining them we can improve utility.
Example: microaggregation with z-scores projection.
(Figure: red points are the microaggregated data, green points are the original data.)
For data close to normal we can add normal noise to the microaggregated data:
Mic(O) + N(0, Cov(O) - Cov(Mic(O)))
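
The noise step on this slide can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the presenter's implementation: the helper names `microaggregate_zscores` and `add_compensating_noise` are hypothetical, grouping records by the sum of their z-scores is one assumed reading of "z-scores projection", and clipping negative eigenvalues is an added numerical safeguard.

```python
import numpy as np

def microaggregate_zscores(data, k):
    """Replace each record by the mean of its group of about k records.

    Assumption: groups are formed by sorting records on the sum of
    their z-scores.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    order = np.argsort(z.sum(axis=1), kind="stable")
    n_groups = max(len(data) // k, 1)
    masked = np.empty_like(data, dtype=float)
    for idx in np.array_split(order, n_groups):
        masked[idx] = data[idx].mean(axis=0)
    return masked

def add_compensating_noise(original, masked, rng):
    """Add N(0, Cov(O) - Cov(Mic(O))) noise to the microaggregated data."""
    diff = np.cov(original, rowvar=False) - np.cov(masked, rowvar=False)
    # Build a matrix square root of diff; clip tiny negative eigenvalues
    # that can appear through rounding.
    vals, vecs = np.linalg.eigh(diff)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))
    noise = rng.standard_normal(masked.shape) @ root.T
    return masked + noise

# Synthetic, roughly normal demo data (not from the presentation).
rng = np.random.default_rng(0)
original = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)
masked = microaggregate_zscores(original, k=3)
released = add_compensating_noise(original, masked, rng)
print(np.cov(original, rowvar=False))
print(np.cov(released, rowvar=False))
```

The two printed covariance matrices should be close to each other, because the noise puts back the within-group variation that averaging removed.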
4
Performance measures

Utility: propensity score utility measure (Mi-Ja's work)
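
The slides do not spell out the propensity score measure, so the following is a sketch of one common formulation, stated here as an assumption. The original and masked files are stacked, a logistic model is fitted to predict which file each record came from, and the utility is the mean squared distance between the fitted propensities and the share c of masked records. Smaller values mean the two files are harder to tell apart. The function names, the quadratic design matrix, and the small ridge term are illustrative choices.

```python
import numpy as np

def design(x):
    """Intercept, main effects, squares and pairwise products."""
    n, p = x.shape
    cols = [np.ones(n)] + [x[:, j] for j in range(p)]
    cols += [x[:, i] * x[:, j] for i in range(p) for j in range(i, p)]
    return np.column_stack(cols)

def propensity_utility(original, masked, n_iter=50, ridge=1e-6):
    """Mean of (p_i - c)^2 over the stacked file (assumed definition)."""
    stacked = np.vstack([original, masked]).astype(float)
    stacked = (stacked - stacked.mean(axis=0)) / stacked.std(axis=0)
    a = design(stacked)
    t = np.r_[np.zeros(len(original)), np.ones(len(masked))]
    beta = np.zeros(a.shape[1])
    # Logistic regression fitted with Newton steps; the tiny ridge
    # term only keeps the linear solve stable.
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-a @ beta))
        w = prob * (1.0 - prob)
        grad = a.T @ (t - prob) - ridge * beta
        hess = (a * w[:, None]).T @ a + ridge * np.eye(a.shape[1])
        beta += np.linalg.solve(hess, grad)
    prob = 1.0 / (1.0 + np.exp(-a @ beta))
    c = len(masked) / len(stacked)
    return float(np.mean((prob - c) ** 2))

# Synthetic demo data (not from the presentation).
rng = np.random.default_rng(1)
original = rng.normal(size=(500, 2))
same = propensity_utility(original, original.copy())
shifted = propensity_utility(original, original + np.array([1.0, 0.0]))
print(same, shifted)
```

An unchanged copy of the data scores essentially zero, while a file whose first variable is shifted scores clearly higher.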
Two kinds of disclosure risk (DR)
- identification disclosure
- attribute disclosure

5
DR

Identification disclosure
Disclosure is considered to occur when the intruder can correctly identify a record
in the released data file, that is, relate it to a particular individual.
Attribute disclosure
The intruder's target is the original value of a particular attribute,
for example the salary of a particular individual.
Attribute disclosure therefore measures the information the intruder gains about
an attribute once the masked data are released. More precisely, it measures how
tight the bounds on the original values can be made given the masked data.

6
Examples of attribute disclosure for several
methods

Assumption: the SDC method and its parameters are
released together with the data set.

Rankswapping
Upper and lower bounds can be found for every value in the masked data.
If the rankswapping algorithm is known, the intruder can also find the
distribution of the values within these intervals by running rankswapping
a large number of times on a vector of length N.
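
The simulation idea can be sketched as follows. This is a minimal illustration, not the presenter's algorithm: it assumes a simple rankswapping variant in which each not-yet-swapped rank is exchanged with a random not-yet-swapped rank at most p percent of N positions above it, and the function names are hypothetical.

```python
import numpy as np

def rankswap_ranks(n, p, rng):
    """Run one rankswapping pass on the ranks 0..n-1.

    Returns perm, where perm[r] is the original rank of the record
    whose masked value has rank r.
    """
    window = max(int(n * p / 100), 1)
    perm = np.arange(n)
    swapped = np.zeros(n, dtype=bool)
    for i in range(n):
        if swapped[i]:
            continue
        # Unswapped candidate partners within the window above rank i.
        candidates = np.flatnonzero(~swapped[i + 1 : i + window + 1]) + i + 1
        if candidates.size == 0:
            continue
        j = candidates[rng.integers(candidates.size)]
        perm[i], perm[j] = j, i
        swapped[i] = swapped[j] = True
    return perm

def simulate_original_rank(n, p, rank, n_runs, rng):
    """Original ranks behind one masked rank, over repeated runs."""
    return np.array([rankswap_ranks(n, p, rng)[rank] for _ in range(n_runs)])

rng = np.random.default_rng(2)
n, p = 1000, 5
top = simulate_original_rank(n, p, rank=n - 1, n_runs=200, rng=rng)
low, high = np.percentile(top, [2.5, 97.5])
print(low, high)
```

Mapping the simulated ranks back to the sorted masked values gives an empirical interval for the original value, since rankswapping keeps the set of values unchanged.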
7

Rankswapping example

Suppose data set X has 1000 records and variable j in X is lognormal.
Rankswapping with p = 5 was applied to this data set.
The range of the variable is [0.04, 25.57].
Choose a value in the dense area of the masked data, x_j = 0.50: the lower and
upper bounds for the corresponding original value are 0.42 and 0.58.
Consider the largest value in the masked data, x = 25.57: using the distribution
for the highest rank we can find the 95% confidence interval [5.20, 6.92].
Consider the smallest value in the masked data, x = 0.04: the 95% confidence
interval for the corresponding original value is [0.07, 0.019].
8

Noise addition
The variance of the added noise is known and its mean is 0, so 100(1 - α)%
confidence regions around the masked records x_m can be computed from the
multivariate normal distribution.
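
A minimal sketch of such a region check, under assumptions that are not in the slides: there are exactly two variables, so the chi-square quantile with 2 degrees of freedom has the closed form -2 ln(α), and the noise covariance `noise_cov` is a made-up demo value. The function name `in_confidence_region` is hypothetical.

```python
import numpy as np

def in_confidence_region(candidate, masked_record, noise_cov, alpha=0.05):
    """True if candidate lies in the 100(1 - alpha)% region around masked_record.

    Two-variable case only: the squared Mahalanobis distance of the noise
    is chi-square with 2 degrees of freedom, whose (1 - alpha) quantile
    is -2 * ln(alpha).
    """
    d = np.asarray(candidate, dtype=float) - np.asarray(masked_record, dtype=float)
    dist2 = d @ np.linalg.solve(noise_cov, d)
    return bool(dist2 <= -2.0 * np.log(alpha))

# Demo values (hypothetical, not from the presentation).
noise_cov = np.array([[0.5, 0.1], [0.1, 0.3]])
rng = np.random.default_rng(4)
x_original = np.array([1.0, 2.0])
masked_records = x_original + rng.multivariate_normal([0.0, 0.0], noise_cov, size=5000)
coverage = np.mean([in_confidence_region(x_original, xm, noise_cov) for xm in masked_records])
print(coverage)
```

The printed coverage should be near 0.95: an intruder who knows the noise covariance can draw a region around each masked record that contains the original record about 95% of the time.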

9
Several stages of masking

Ideally, the security of an SDC method should be guaranteed by the masking
algorithm itself and should not depend on keeping the parameters or details
of the algorithm secret.

In cryptography: Data Encryption Standard (DES)
10
Combinations of the methods
Original data → Masking M1 → M1(Original) → Masking M2 → M2(M1(Original))

Decrease the risk
We can even increase the utility of the resulting data
if we combine the methods properly!
For example:
combine microaggregation with noise
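
Applying maskings in stages, as in M2(M1(Original)), amounts to function composition. The sketch below is illustrative only: `combine`, `microaggregate_1d`, and `make_noise` are hypothetical helpers, and the two stages are simple one-variable stand-ins rather than the methods evaluated in the presentation.

```python
from functools import reduce
import numpy as np

def combine(*stages):
    """Chain (name, function) stages left to right; return (label, masking)."""
    label = "_".join(name for name, _ in stages)
    def masking(data):
        return reduce(lambda current, stage: stage[1](current), stages, data)
    return label, masking

def microaggregate_1d(x):
    """Stand-in M1: sort one variable and average groups of about 3."""
    order = np.argsort(x, kind="stable")
    out = np.empty(len(x))
    for idx in np.array_split(order, max(len(x) // 3, 1)):
        out[idx] = x[idx].mean()
    return out

def make_noise(sd, seed):
    """Stand-in M2: add independent normal noise with standard deviation sd."""
    rng = np.random.default_rng(seed)
    return lambda x: x + rng.normal(0.0, sd, size=len(x))

label, mask = combine(("mic3", microaggregate_1d), ("noise", make_noise(0.1, seed=5)))
x = np.array([5.0, 1.0, 3.0, 2.0, 6.0, 4.0])
print(label, mask(x))
```

Keeping each stage as a plain function makes it easy to try different orders and pairings of methods.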

11
Several stages of masking
Or in general case
where
12
Combinations of methods

Microaggregation using z-scores projection, p = 3 → Microaggregation using z-scores projection, p = 3 (Micz03_Micz03)
Microaggregation using z-scores projection, p = 3 → Microaggregation using principal component projection, p = 3 (Micz03_Micpcp03)
Microaggregation using z-scores projection, p = 3 → Multivariate microaggregation, p = 10 (Micz03_Micmul10)
Microaggregation using z-scores projection, p = 3 → Rankswapping, p = 1 (Micz03_Rank1)
Single: Microaggregation using z-scores projection, p = 3 (Micz03)

13
Propensity score utility
Method            sym       sym       sym       sym       nonsym    nonsym    nonsym    nonsym
                  high cor  high cor  low cor   low cor   high cor  high cor  low cor   low cor
                  pos       neg       pos       neg       pos       neg       pos       neg
micz03__micz03    23.5      28.3      37.3      33.9      45.1      180.1     40        94.1
micz03__micpcp03  12.8      9.3       7.9       9.3       5         3.4       8.6       5.8
micz03__micmul10  18.4      16.5      14.8      13        5.5       9.2       11.2      8.7
micz03__rank1     28.6      34.8      27.3      29.4      14.8      42        39.5      26.5
micz03            128.1     281.5     132.1     233.4     592.1     639.4     463.8     639
14
Identification DR
Method            sym       sym       sym       sym       nonsym    nonsym    nonsym    nonsym
                  high cor  high cor  low cor   low cor   high cor  high cor  low cor   low cor
                  pos       neg       pos       neg       pos       neg       pos       neg
micz03__micz03    0.0275    0.0015    0.0035    0.0023    0.0033    0.0004    0.0033    0.0009
micz03__micpcp03  0.0198    0.2516    0.0133    0.0029    0.0806    0.2926    0.0477    0.18
micz03__micmul10  0.0077    0.0046    0.0203    0.0025    0.1122    0.0947    0.0071    0.1265
micz03__rank1     0.0092    0.0119    0.0087    0.0067    0.034     0.0079    0.0096    0.0091
micz03            0.0036    0.0025    0.0024    0.0019    0.0043    0.0011    0.0012    0.0011
