Combinations of SDC methods for continuous microdata

1
Combinations of SDC methods for continuous
microdata
Anna Oganian
National Institute of Statistical Sciences
2
Introduction
  • SDC methods have two goals

Methods for continuous microdata
  • Rankswapping
  • Additive noise
  • Resampling
  • Microaggregation
    - microaggregation based on one variable
    - microaggregation based on several variables

3
Why combinations?
Methods have very different properties, so by
combining them we can improve utility.
Example: microaggregation with z-scores
projection
Red points: microaggregated data. Green points:
original data.
For data close to normal we can add normal
noise: $\mathrm{Mic}(O) + N(0,\ \mathrm{Cov}(O) - \mathrm{Cov}(\mathrm{Mic}(O)))$
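The noise-compensation idea on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation. It assumes that "z-scores projection" means sorting records by the sum of their standardized values, and that records are then averaged in groups of at least k. The function names, the group size, and the synthetic data are all hypothetical.

```python
import numpy as np

def microaggregate_zscores(data, k):
    """Replace each record by its group mean; groups follow the z-score sum order."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    order = np.argsort(z.sum(axis=1))      # assumed one-dimensional projection
    masked = np.empty_like(data)
    n, start = len(data), 0
    while start < n:
        # The last group absorbs the remainder so every group has at least k records.
        end = n if n - start < 2 * k else start + k
        idx = order[start:end]
        masked[idx] = data[idx].mean(axis=0)
        start = end
    return masked

def add_compensating_noise(original, aggregated, rng):
    """Add N(0, Cov(O) - Cov(Mic(O))) noise to the microaggregated data."""
    noise_cov = np.cov(original, rowvar=False) - np.cov(aggregated, rowvar=False)
    noise = rng.multivariate_normal(
        np.zeros(original.shape[1]), noise_cov, size=len(original)
    )
    return aggregated + noise, noise_cov

# Synthetic, roughly normal data (illustrative values only).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.6], [0.6, 1.0]])
original = rng.multivariate_normal([0.0, 0.0], true_cov, size=2000)

aggregated = microaggregate_zscores(original, k=3)
released, noise_cov = add_compensating_noise(original, aggregated, rng)
```

Because the groups partition the records, Cov(O) - Cov(Mic(O)) equals the within-group covariance, so it is a valid (positive semi-definite) noise covariance. The covariance of the released data then matches Cov(O) up to sampling error.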
4
Performance measures
  • Propensity score utility measure (Mi-Ja work)
  • Two kinds of disclosure risk (DR)
    - identification disclosure
    - attribute disclosure

5
Disclosure risk (DR)
  • Identification disclosure
    - Disclosure is considered to occur when the
      intruder can correctly identify a record in the
      released data file, that is, relate it to a
      particular individual.
  • Attribute disclosure
    - The intruder's target is the original value of a
      particular attribute, for example the salary of a
      particular individual.
    - Attribute disclosure therefore measures the
      information the intruder gains about an attribute
      once the masked data are released; more precisely,
      how tight the bounds on the original values can be
      made given the masked data.

6
Examples of attribute disclosure for several
methods
  • Assumption: the SDC method and its parameters
    are released together with the data set

Rankswapping
Upper and lower bounds can be given for every
value in the masked data.
If the rankswapping algorithm is known, the
intruder can find the distribution of the values
within these intervals by running rankswapping a
large number of times on a vector of length N.
7
  • Rankswapping example

Suppose data set X has 1000 records and variable
j in data set X is lognormal. Rankswapping with
p = 5 was applied to this data set. The range of
the variable is [0.04, 25.57]. Choose a value in
the dense area of the masked data, x_j = 0.50; the
lower and upper bounds for the corresponding
original value are [0.42, 0.58]. Consider the
largest value in the masked data, x = 25.57: using
the distribution for the highest rank we can find
the 95% confidence interval [5.20, 6.92]. Consider
the smallest value in the masked data, x = 0.04:
the 95% confidence interval for the corresponding
original value is [0.07, 0.19].
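The bounds and the repeated-run simulation described for rankswapping can be sketched as follows. The swap rule here is an assumption: each value is swapped with a randomly chosen, not yet swapped partner at most p percent of N ranks above it. The released method may differ in detail, and all function names are hypothetical.

```python
import bisect
import random

def rank_swap(values, p, rng):
    """Swap each value with a partner at most p percent of N ranks away."""
    n = len(values)
    window = max(1, int(n * p / 100))
    order = sorted(range(n), key=lambda i: values[i])  # record indices by rank
    masked = list(values)
    swapped = [False] * n                              # indexed by rank
    for r in range(n):
        if swapped[r]:
            continue
        candidates = [s for s in range(r + 1, min(n, r + window + 1))
                      if not swapped[s]]
        if not candidates:
            continue                                   # value stays in place
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        masked[i], masked[j] = values[j], values[i]
        swapped[r] = swapped[s] = True
    return masked, window

def original_bounds(masked_value, sorted_values, window):
    """Bounds on the original value hidden behind one masked value."""
    r = bisect.bisect_left(sorted_values, masked_value)
    low = sorted_values[max(0, r - window)]
    high = sorted_values[min(len(sorted_values) - 1, r + window)]
    return low, high

def simulate_original_ranks(n, p, masked_rank, runs, rng):
    """Original ranks of the record that shows masked_rank, over many runs."""
    ranks = []
    for _ in range(runs):
        masked, _ = rank_swap(list(range(n)), p, rng)
        ranks.append(masked.index(masked_rank))
    return ranks

# Illustrative data: 1000 lognormal values, p = 5 as in the slide's example.
rng = random.Random(0)
original = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]
masked, window = rank_swap(original, p=5, rng=rng)
sorted_values = sorted(original)
low, high = original_bounds(masked[0], sorted_values, window)
```

Rankswapping only permutes the values, so the sorted masked values equal the sorted originals and the intruder can read the bounds off the released file. Empirical quantiles of `simulate_original_ranks` give confidence intervals of the kind quoted for the largest and smallest values.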
8
  • Noise addition
  • The variance of the added noise is known
    and its mean is 0, so 100(1-α)%
    confidence regions around masked records x_m
    can be computed from the multivariate normal
    distribution
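A confidence region of this kind can be sketched for two variables. The region is the set of points whose squared Mahalanobis distance from the masked record, measured with the noise covariance, is below a chi-square quantile. With two degrees of freedom that quantile has the closed form -2 ln(α). The noise covariance below is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
import math
import numpy as np

def in_confidence_region(x, x_masked, noise_cov, alpha=0.05):
    """True if x lies in the 100(1-alpha)% region around x_masked (2 variables)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_masked, dtype=float)
    if diff.shape != (2,):
        raise ValueError("this sketch handles exactly two variables")
    d2 = diff @ np.linalg.solve(noise_cov, diff)   # squared Mahalanobis distance
    threshold = -2.0 * math.log(alpha)             # chi-square(2) quantile at 1 - alpha
    return bool(d2 <= threshold)

# Hypothetical noise covariance, assumed to be released with the data.
noise_cov = 0.1 * np.array([[1.0, 0.5], [0.5, 1.0]])

rng = np.random.default_rng(0)
originals = rng.normal(size=(5000, 2))
masked = originals + rng.multivariate_normal([0.0, 0.0], noise_cov, size=5000)

# Share of original records that fall inside the region around their masked record.
coverage = np.mean([
    in_confidence_region(o, m, noise_cov) for o, m in zip(originals, masked)
])
```

With α = 0.05 the coverage should be close to 0.95, which is the sense in which the region bounds the original values.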

9
Several stages of masking
  • Ideally the security of the SDC method should be
    guaranteed by the masking algorithm and should not
    depend on keeping the parameters or details of
    the algorithm secret.

In cryptography: Data Encryption Standard (DES)
10
Combinations of the methods
Original data → masking M1 → M1(Original) → masking M2 → M2(M1(Original))
  • Decrease the risk
  • We can even increase the utility of the resulting
    data if we combine the methods properly!
  • For example:
    - Combine microaggregation with noise
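The chain M2(M1(Original)) can be sketched as plain function composition. This is only an illustration for a single variable: the two toy maskings, their parameters, and the `compose` helper are hypothetical and stand in for whichever methods are combined.

```python
import random
from functools import reduce

def compose(*maskings):
    """Return a masking that applies the given maskings from left to right."""
    def combined(data):
        return reduce(lambda current, masking: masking(current), maskings, data)
    return combined

def microaggregate(k):
    """Toy univariate microaggregation: average sorted values in groups of k."""
    def masking(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        out = list(values)
        for start in range(0, len(order), k):
            group = order[start:start + k]     # the last group may be smaller than k
            mean = sum(values[i] for i in group) / len(group)
            for i in group:
                out[i] = mean
        return out
    return masking

def add_noise(sd, seed):
    """Toy additive noise with a fixed seed so that runs are repeatable."""
    def masking(values):
        rng = random.Random(seed)
        return [v + rng.gauss(0.0, sd) for v in values]
    return masking

original = [2.0, 9.5, 4.1, 7.3, 1.2, 8.8, 3.3, 6.0, 5.4]
m1 = microaggregate(k=3)
m2 = add_noise(sd=0.1, seed=1)
released = compose(m1, m2)(original)           # M2(M1(Original))
```

Writing each masking as a function from data to data keeps the stages interchangeable, so a different second stage can be tried without touching the first.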

11
Several stages of masking
Or, in the general case
where
12
Combinations of methods
  • Microaggregation using z-scores projection, p=3
    → Microaggregation using z-scores projection, p=3
    (Micz03_Micz03)
  • Microaggregation using z-scores projection, p=3
    → Microaggregation using principal component
    projection, p=3 (Micz03_Micpcp03)
  • Microaggregation using z-scores projection, p=3
    → Multivariate microaggregation, p=10
    (Micz03_Micmul10)
  • Microaggregation using z-scores projection, p=3
    → Rankswapping, p=1 (Micz03_Rank1)
  • Single microaggregation using z-scores
    projection, p=3 (Micz03)

13
Propensity score utility
                  sym       sym       sym       sym       nonsym    nonsym    nonsym    nonsym
                  high cor  high cor  low cor   low cor   high cor  high cor  low cor   low cor
                  pos       neg       pos       neg       pos       neg       pos       neg
micz03__micz03    23.5      28.3      37.3      33.9      45.1      180.1     40        94.1
micz03__micpcp03  12.8      9.3       7.9       9.3       5         3.4       8.6       5.8
micz03__micmul10  18.4      16.5      14.8      13        5.5       9.2       11.2      8.7
micz03__rank1     28.6      34.8      27.3      29.4      14.8      42        39.5      26.5
micz03            128.1     281.5     132.1     233.4     592.1     639.4     463.8     639
14
Identification DR
                  sym       sym       sym       sym       nonsym    nonsym    nonsym    nonsym
                  high cor  high cor  low cor   low cor   high cor  high cor  low cor   low cor
                  pos       neg       pos       neg       pos       neg       pos       neg
micz03__micz03    0.0275    0.0015    0.0035    0.0023    0.0033    0.0004    0.0033    0.0009
micz03__micpcp03  0.0198    0.2516    0.0133    0.0029    0.0806    0.2926    0.0477    0.18
micz03__micmul10  0.0077    0.0046    0.0203    0.0025    0.1122    0.0947    0.0071    0.1265
micz03__rank1     0.0092    0.0119    0.0087    0.0067    0.034     0.0079    0.0096    0.0091
micz03            0.0036    0.0025    0.0024    0.0019    0.0043    0.0011    0.0012    0.0011