Protecting the Confidentiality of Tables by Adding Noise to the Underlying Microdata


1
Protecting the Confidentiality of Tables by
Adding Noise to the Underlying Microdata
  • Paul Massell and Jeremy Funk
  • Statistical Research Division
  • U.S. Census Bureau
  • Washington, DC 20233
  • Paul.B.Massell_at_census.gov

2
Talk Outline
  • Overview of EZS Noise
  • Measuring Effectiveness of Perturbative
    Protection
  • Noise Applied to Weighted Data
  • Noise Applied to Unweighted Data: Random vs.
    Balanced Noise
  • Conclusions and Future Research

3
The EZS Noise Method (Evans, Zayatz, Slanta)
  • Developed by Tim Evans, Laura Zayatz, and John
    Slanta in the 1990s
  • Multiplicative noise is added to the underlying
    microdata, before table creation
  • A noise factor or multiplier is randomly
    generated for each record

4
The EZS Noise Method (Evans, Zayatz, Slanta)
  • The distribution of the multipliers should
    produce unbiased estimates, and ensure that no
    multipliers are too close to 1
  • Weights, both known and unknown to users, are
    combined with the noise factors to obtain noisy
    values for all records
  • When tabulated, in general, sensitive cells are
    changed quite a bit and non-sensitive cells are
    changed only by a small amount

5
Attractive Features of EZS
  • Tables with noisy data are created in
    the same way as the original tables:
    simply replace var X with var X-noisy
  • Tables are automatically additive
  • An approximate value could be released for
    every cell (depends on agency policy)
  • No complementary suppressions
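
Additivity follows because noise is applied once per record, before any tabulation. A minimal sketch of that point, using made-up records and hypothetical field names (`state`, `industry`, `x_noisy`) that are not from the slides:

```python
from collections import defaultdict

# Hypothetical microdata: each record already carries its noisy value.
records = [
    {"state": "MD", "industry": "A", "x_noisy": 105.0},
    {"state": "MD", "industry": "B", "x_noisy": 46.0},
    {"state": "VA", "industry": "A", "x_noisy": 230.0},
    {"state": "VA", "industry": "B", "x_noisy": 17.0},
]

def tabulate(rows, keys, var):
    """Sum `var` over rows, grouped by the tuple of `keys`."""
    table = defaultdict(float)
    for row in rows:
        table[tuple(row[k] for k in keys)] += row[var]
    return dict(table)

interior = tabulate(records, ["state", "industry"], "x_noisy")
by_state = tabulate(records, ["state"], "x_noisy")

# Each state total equals the sum of its interior cells.
for (state,), total in by_state.items():
    parts = sum(v for (s, _), v in interior.items() if s == state)
    assert abs(total - parts) < 1e-9
```

The same `tabulate` call would serve for the original variable; only the variable name changes.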

6
Attractive Features of EZS
  • Linked tables and special tabs are automatically
    protected consistently
  • EZS allows for protection at the company level
    (Census requirement)
  • Ease of implementation compared to methods such
    as cell suppression

7
Measuring Effectiveness of the EZS Method
  • Step 1: Determine which cells in a table are
    sensitive, e.g., using the p% Sensitivity Rule
  • Step 2: Measure level of protection to sensitive
    cells (using protection multipliers)
  • Step 3: Measure amount of perturbation to
    non-sensitive cells (via change graph)

8
The p% Sensitivity Rule
  • Unweighted Data
  • Let T = cell total; x1, x2 = top 2
    contributions
  • Let rem denote the remainder
  • Set rem = T - (x1 + x2)
  • Let prot denote the suggested protection
  • Set prot = (p/100) * x1 - rem
  • If prot > 0, then when Contributor 2 tries to
    estimate x1, rem does NOT provide enough
    uncertainty; additional protection is needed,
    and noise may provide this uncertainty
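
The rule for unweighted data can be sketched in a few lines. The function names and the example figures are illustrative assumptions, not part of the talk:

```python
def p_percent_protection(contributions, p):
    """Return (rem, prot) for one cell under the p% rule, unweighted data.

    `contributions` must hold at least one value; a cell with a single
    contributor is treated as having x2 = 0.
    """
    ordered = sorted(contributions, reverse=True)
    x1 = ordered[0]
    x2 = ordered[1] if len(ordered) > 1 else 0.0
    total = sum(ordered)
    rem = total - (x1 + x2)            # what is left after the top 2
    prot = (p / 100.0) * x1 - rem      # suggested protection
    return rem, prot

def is_sensitive(contributions, p):
    """A cell is sensitive when the remainder gives too little cover."""
    return p_percent_protection(contributions, p)[1] > 0
```

With p = 10, a cell with contributions 100, 50, 5 has rem = 5 and prot = 5, so it is sensitive; a cell with 100, 50, 30, 20 has rem = 50 and prot = -40, so it is not.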

9
p% Sensitivity Rule
  • Weighted Data
  • TA = Fully Weighted Cell Estimate
  • X1 = Largest Cell Respondent Contribution
  • X2 = 2nd Largest Cell Contribution
  • wkn = Known Weights
  • wun = Unknown Weights

10
Extended p% rule with weights and rounding
  • rem = TA - (X1 * wkn1 + X2 * wkn2)
  • prot = ((p/100) * X1 * wkn1) - rem
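
A sketch of the weighted version, written directly from the two formulas on this slide. The function name and sample figures are assumptions, and any rounding adjustment is left out because the transcript does not describe it:

```python
def weighted_p_percent_protection(ta, x1, wkn1, x2, wkn2, p):
    """Return (rem, prot) for one cell of weighted data.

    ta   : fully weighted cell estimate
    x1   : largest respondent contribution, wkn1 its known weight
    x2   : second largest contribution, wkn2 its known weight
    """
    rem = ta - (x1 * wkn1 + x2 * wkn2)
    prot = (p / 100.0) * x1 * wkn1 - rem
    return rem, prot
```

For example, with TA = 310, X1 = 100, X2 = 50, both known weights equal to 2 and p = 10, rem is 10 and prot is 10, so the cell needs protection.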

11
Measuring the Effectiveness of a Perturbative
Protection Method
  • Protection of Sensitive Cells
  • Define Protection Multiplier (PM)
  • PM = abs(perturbation) / prot
  • Find how many (or what %) have PM < 1
  • Data Quality
  • Important: change for non-sensitive cells
  • Less important: over-perturbation for
    sensitive cells
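
The protection check can be sketched as follows. The tuple layout `(original_total, noisy_total, prot)` and the function names are assumptions made for the example:

```python
def protection_multiplier(original_total, noisy_total, prot):
    """PM = abs(perturbation) / prot, defined for sensitive cells only."""
    if prot <= 0:
        raise ValueError("PM is only defined for sensitive cells (prot > 0)")
    return abs(noisy_total - original_total) / prot

def share_under_protected(cells):
    """Fraction of sensitive cells whose PM is below 1.

    `cells` is a non-empty list of (original_total, noisy_total, prot).
    """
    pms = [protection_multiplier(o, n, p) for o, n, p in cells]
    return sum(pm < 1 for pm in pms) / len(pms)
```

A cell with prot = 5 that moves from 155 to 165 has PM = 2 and is fully protected; the same cell moving to 157 has PM = 0.4 and is not.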

12
EZS Noise Factors for Unweighted Data
  • Let X = original microdata value
  • Let Y = perturbed value
  • Let M = noise multiplier, i.e., a draw from a
    specified noise distribution of EZS type
  • Y = X * M

13
Noise Distribution used for all
examples (a = 1.05, b = 1.15): 5% to 15%
noise

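
A sketch of drawing multipliers for this setting. It assumes the multiplier is equally likely to fall in [2 - b, 2 - a] or [a, b] and is uniform within each interval; the transcript gives only a and b, so the shape is an assumption. Symmetry about 1 keeps the expected multiplier at 1, and no draw lands closer to 1 than 5%.

```python
import random

def draw_multiplier(rng, a=1.05, b=1.15):
    """Draw M from [2 - b, 2 - a] or [a, b], each with probability 1/2.

    Assumption: the size of the noise is uniform on [a - 1, b - 1].
    """
    magnitude = rng.uniform(a, b)      # between 5% and 15% above 1
    if rng.random() < 0.5:
        return magnitude               # inflate the record
    return 2.0 - magnitude             # deflate by the same amount

def perturb(x, m):
    """Y = X * M for unweighted data."""
    return x * m

rng = random.Random(2024)              # fixed seed so runs are repeatable
sample = [draw_multiplier(rng) for _ in range(5)]
```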
14
Noise Applied to Weighted Data
  • Key idea: weights (e.g., sample weights)
    provide protection to microdata since users
    typically know weights only roughly (except
    when close to 1)
  • Not necessary to apply full M factor to X unless
    w = 1

15
EZS Noise Factor for Weighted Data
  • Weighted Data
  • For a simple weight w with associated
    uncertainty interval at least as wide as 2bw,
    the noise factor S can be combined with w to
    form the Joint Noise-Weight Factor

16
Noise Formula for Known and Unknown Weights
  • Calculation of Perturbed Values
  • wkn is the known weight
  • wun is the unknown weight.
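
The formula itself is not in the transcript. The sketch below assumes the form usually quoted for EZS noise with weights, Y = X * wkn * (M + (wun - 1)); treat that expression, and the function names, as assumptions rather than the slide's content. It does agree with the earlier remark that the full M factor applies only when the weight is 1.

```python
def joint_noise_weight_factor(m, w):
    """Assumed joint factor M + (w - 1) for a weight users know only roughly."""
    return m + (w - 1.0)

def perturbed_weighted_value(x, m, wkn, wun):
    """Assumed form: Y = X * wkn * (M + (wun - 1))."""
    return x * wkn * joint_noise_weight_factor(m, wun)
```

Under this assumed form, a record with wun = 1 and M = 1.1 is moved by the full 10%, while a record with wun = 20 gets a factor of 20.1 instead of 20, a change of 0.5%.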

17
Noise for Weighted Data: Commodity Flow Survey
(CFS)
  • Measures flow of goods via transport system in
    U.S.
  • Estimates volume and value of each commodity
    shipped by origin, destination, modes of
    transport
  • Used for transport modeling, planning, ...
  • Some users have objected to disclosure
    suppressions

18
Effect of Noise on High Level Aggregate Cells
  • CFS Table: National 2-Digit Commodity
  • Data Quality Measure: 43 cells, 0 are sensitive
  • 41 cells change by 0% - 1%
  • 2 cells change by 1% - 2%
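
A data quality summary of this kind can be produced by binning the absolute percent change of each cell. The sketch below uses made-up cell totals, not CFS figures, and the bin edges are an assumption:

```python
def percent_change(original, noisy):
    """Absolute percent change of a cell total (original must be nonzero)."""
    return 100.0 * abs(noisy - original) / original

def bin_changes(pairs, edges=(1.0, 2.0, 5.0)):
    """Count cells by absolute percent change.

    Returns counts for [0, e1), [e1, e2), ..., [e_last, infinity).
    `pairs` is an iterable of (original_total, noisy_total).
    """
    counts = [0] * (len(edges) + 1)
    for original, noisy in pairs:
        change = percent_change(original, noisy)
        idx = sum(change >= e for e in edges)   # number of edges passed
        counts[idx] += 1
    return counts

# Hypothetical cells: changes of 0.4%, 1.5%, 3% and 15%.
example = [(1000.0, 1004.0), (1000.0, 1015.0), (1000.0, 1030.0), (200.0, 230.0)]
counts = bin_changes(example)
```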

19
CFS Test Table
  • (Origin State by Destination State by 2 digit
    Commodity)
  • 61,174 cells of which 230 are sensitive
  • Data Quality and Protection Assessments
    (following slides)

20
CFS Noise Results: Data Quality Assessment
  • While some cells may receive large doses of
    noise, the vast majority get less than 1% or 2%

21
CFS Random Noise: Protection Assessment
  • Most sensitive cells receive significant noise,
    i.e., 5% to 11%
  • Only 2 out of 230 sensitive cells do not receive
    full protection from noise, as measured by
    Protection Multipliers (PM)

22
Noise for Unweighted Data: Non-Employers
Statistics
  • Special Features of Microdata
  • Unweighted administrative data
  • Only 1 variable to protect: receipts
  • Many small integers (after rounding to
    1000)
  • Special Features of Key Table
  • Many cells have a small number of
    contributors; these include many safe cells
  • Many sensitive cells with only 1 or 2
    contributors

23
NE Noise Results: Data Quality Assessment
  • Lack of weights results in much more distortion
    to non-sensitive cells than occurs for CFS

24
NE Noise Results: Protection Assessment
  • Resembles noise factor distribution, due to
    prevalence of 1 respondent cells in NE test table
    and no weights

25
Noise Balancing
  • Is there a way to improve data quality in this
    situation?
  • Yes, if one can focus on one key table T
  • Idea: balance noise at each cell in a balancing
    sub-table B of T (definition: every micro value
    is in at most one cell of B)
  • Choose noise directions to maximize noise
    cancellation for each cell of B
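
The transcript does not say how noise directions are chosen. One possible greedy heuristic, offered only as an assumption, is to visit the records of a cell of B from largest to smallest and push each one against the running net noise. Noise sizes are again assumed uniform between 5% and 15%.

```python
import random

def balanced_multipliers(values, rng, a=1.05, b=1.15):
    """Pick noise directions within one cell of B so perturbations tend to cancel.

    Greedy heuristic (an assumption, not the method from the talk).
    Returns one multiplier per record, in the input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    multipliers = [1.0] * len(values)
    net = 0.0                                   # running sum of perturbations
    for i in order:
        size = rng.uniform(a, b) - 1.0          # noise magnitude, 5% to 15%
        if net > 0:
            direction = -1.0
        elif net < 0:
            direction = 1.0
        else:
            direction = rng.choice([-1.0, 1.0])  # nothing to cancel yet
        multipliers[i] = 1.0 + direction * size
        net += values[i] * direction * size
    return multipliers
```

With this heuristic the net change to the cell never exceeds the largest single perturbation, while every record still moves by at least 5%. A cell with one contributor gets the same noise it would get without balancing.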

26
Noise Balancing: Supportive NE Characteristics
  • Balancing works especially well for NE because a
    high % of microdata is single unit
  • After balancing interior cells, need to check
    noise effect on aggregate cells in same table
  • Also need to check noise effect in higher and
    lower tables; we call these "trickle up" and
    "trickle down" effects
  • For NE, there are few of these other tables;
    this makes the balancing decision easier

27
NE Balanced Noise: Data Quality Assessment
  • Vast improvement in data quality
  • Resembles that of weighted data in CFS

28
NE Balanced Noise: Protection Assessment
  • Very similar to Random Noise application
  • 91.7% of sensitive cells fully protected

29
Random Noise vs. Balanced Noise: Non Employer Test
Data
  Method      Percent Fully Protected (PM > 1)
  Random      92.14
  Balanced    91.70
  • Data Quality is greatly improved
  • Protection Level is not significantly reduced
  • Thus Balanced Noise is a Good Choice Here

PM density curves on [0,1] are nearly identical
for the 2 methods

30
Conclusions
  • EZS Noise is a useful method for protecting
    tables from a variety of economic programs
  • There are now several variations of the basic EZS
    method; which is best for a survey depends on
    both microdata and table characteristics

31
Future Research
  • 1. Should some sensitive cells be suppressed?
    High-noise cells flagged?
  • 2. How to handle multiple variables?
  • 3. What is the most that users can be told about
    the noise process without compromising data
    protection?
  • 4. How to handle company dynamics (births,
    deaths, mergers, ...)?
  • 5. How to coordinate survey protection?
