Quality Measures for Disclosure Controlled Statistical Data


Title: Quality Measures for Disclosure Controlled Statistical Data


1
Quality Measures for Disclosure Controlled
Statistical Data
  • Natalie Shlomo and Caroline Young
  • ONS and University of Southampton

2
Topics of Discussion
  • Introduction and Motivation
  • Quality Measures for Assessing the Impact of SDC
    methods
  • Demonstration of Software Application
  • Example
  • Conclusions and Future Research

3
Introduction
  • Data suppliers assess disclosure risk before
    releasing statistical data
  • Attribute disclosure: small counts are used to
    identify a statistical unit, and confidential
    information is revealed
  • Data suppliers need to make informed decisions on
    an appropriate SDC method that manages disclosure
    risk
  • SDC methods reduce disclosure risk by perturbing,
    modifying, or summarizing the data depending on
    the format of statistical outputs

4
Introduction
  • The most common forms of statistical outputs are
    tables (containing counts or aggregates) and
    microdata. Data can be collected from surveys or
    censuses and registers
  • Choosing an appropriate SDC method is an
    iterative process
  • Assess trade-off between managing disclosure
    risk and obtaining high quality outputs

5
Introduction
  • Examples of common SDC methods:
  • Pre-tabular methods (implemented on microdata):
    recoding, coarsening and eliminating variables,
    sub-sampling, record swapping or a probabilistic
    perturbation process
  • Post-tabular methods (implemented on tables):
    table redesign (coarsening and recoding),
    suppression and rounding (controlled, full random
    rounding, small cell rounding)
  • For cell suppression, users may want to impute
    suppressed cells with zeros, the average of the
    total suppressed cells, or a weighted average
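
The first two imputation options can be sketched as follows. This is a minimal illustration, not the presenters' implementation: it assumes suppressed cells in a row are marked as None and that the row total is published, and the function names are hypothetical. The weighted-average option is left out because the slide does not say which weights are used.

```python
def impute_zeros(row):
    """Replace every suppressed cell (None) with zero."""
    return [0 if cell is None else cell for cell in row]


def impute_average(row, row_total):
    """Spread the suppressed total evenly over the suppressed cells.

    Assumption: row_total is the published total for the row, so the
    suppressed total is row_total minus the sum of the published cells.
    """
    n_suppressed = sum(1 for cell in row if cell is None)
    if n_suppressed == 0:
        return list(row)
    published = sum(cell for cell in row if cell is not None)
    fill = (row_total - published) / n_suppressed
    return [fill if cell is None else cell for cell in row]


row = [5, None, None, 3]
print(impute_zeros(row))        # [5, 0, 0, 3]
print(impute_average(row, 14))  # [5, 3.0, 3.0, 3]
```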

6
Quality Measures
  • Basic statistics: number of cells and the total
    information in the table; number of zeros, ones,
    and twos; average cell size in the table; maximum
    and minimum average cell size for each row and
    column, and the standard error of these averages
  • For suppressed tables: number and percent of
    suppressed cells and total information lost;
    choice of imputation method
  • For random rounded tables: binomial hypothesis
    test to check for bias in the rounding scheme,
    i.e. whether the expected numbers of cells were
    rounded up and down
  • For all other SDC methods: paired signed-rank test
    to check for no change in the location
7
Quality Measures
  • Distance metrics: distortions to distributions
    on internal cells according to rows. Let $D^k$
    be a table for row k, $|k|$ the number of
    cells in the row, $n_r$ the number of rows, and
    $D^k(c)$ the cell frequency for cell c
  • Hellinger's Distance (HD)
  • Relative Absolute Distance (RAD)
  • Average Absolute Distance per Cell (AAD)
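
The formulas for these three metrics did not survive in the transcript. One common form, assumed here rather than confirmed by the slides, writes $D^k_{orig}(c)$ and $D^k_{pert}(c)$ for the original and perturbed frequency of cell c in row k, $|k|$ for the number of cells in row k, and $n_r$ for the number of rows:

$HD = \frac{1}{n_r} \sum_k \sqrt{\sum_{c \in k} \frac{1}{2} \left( \sqrt{D^k_{pert}(c)} - \sqrt{D^k_{orig}(c)} \right)^2}$

$RAD = \frac{1}{n_r} \sum_k \sum_{c \in k} \frac{|D^k_{pert}(c) - D^k_{orig}(c)|}{D^k_{orig}(c)}$

$AAD = \frac{1}{n_r} \sum_k \sum_{c \in k} \frac{|D^k_{pert}(c) - D^k_{orig}(c)|}{|k|}$

A minimal sketch under those assumptions, with tables given as lists of rows and hypothetical function names:

```python
from math import sqrt


def hellinger_distance(orig, pert):
    """Average over rows of the Hellinger distance between cell counts."""
    per_row = [
        sqrt(sum(0.5 * (sqrt(p) - sqrt(o)) ** 2 for o, p in zip(ro, rp)))
        for ro, rp in zip(orig, pert)
    ]
    return sum(per_row) / len(orig)


def relative_absolute_distance(orig, pert):
    """Average over rows of summed |pert - orig| / orig.

    Assumption: cells that are zero in the original table are skipped
    to avoid division by zero.
    """
    per_row = [
        sum(abs(p - o) / o for o, p in zip(ro, rp) if o > 0)
        for ro, rp in zip(orig, pert)
    ]
    return sum(per_row) / len(orig)


def average_absolute_distance(orig, pert):
    """Average over rows of the mean absolute change per cell."""
    per_row = [
        sum(abs(p - o) for o, p in zip(ro, rp)) / len(ro)
        for ro, rp in zip(orig, pert)
    ]
    return sum(per_row) / len(orig)


original = [[1, 4], [9, 0]]
perturbed = [[0, 4], [9, 1]]
print(hellinger_distance(original, perturbed))
print(relative_absolute_distance(original, perturbed))  # 0.5
print(average_absolute_distance(original, perturbed))   # 0.5
```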

8
Quality Measures
  • Distance metrics: distortions to distributions
    on marginal sub-totals and totals
  • Defined in terms of a sub-total or total of
    cells and the number of totals on a row k

9
Quality Measures
  • Impact on Tests for Independence: Cramér's V
    measure of association, based on the Pearson
    chi-square statistic $\chi^2$
  • Also, the same measure for entropy and the
    Pearson Statistic
  • Variance of Cell Counts
  • For each row
10
Quality Measures
  • Between variance of target variables for
    proportions
  • Let $P^k$ be the proportion in a row k and
    $P$ the overall proportion
  • Between variance
  • For continuous variables, impact on correlations

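
The between-variance formula is missing from the transcript. A plausible form, assumed here, is $BV = \frac{1}{n_r - 1} \sum_k (P^k - P)^2$, where $P^k$ is the proportion in row k, $P$ the overall proportion and $n_r$ the number of rows, with the ratio of the perturbed to the original BV as the quality measure. A minimal sketch under those assumptions, with hypothetical function names:

```python
def between_variance(rows):
    """Between-row variance of a target proportion.

    rows: list of (count_with_attribute, row_total) pairs.
    The overall proportion is the pooled count over the pooled total.
    """
    n_r = len(rows)
    overall = sum(c for c, _ in rows) / sum(t for _, t in rows)
    return sum((c / t - overall) ** 2 for c, t in rows) / (n_r - 1)


def between_variance_ratio(original, perturbed):
    """Ratio of perturbed to original between variance (1.0 = no change)."""
    return between_variance(perturbed) / between_variance(original)


original = [(2, 10), (4, 10), (9, 10)]
perturbed = [(3, 10), (4, 10), (8, 10)]
print(between_variance(original))
print(between_variance_ratio(original, perturbed))
```

A ratio below 1 would indicate that the SDC method has flattened the differences between rows.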
11
Quality Measures
  • Impact on Rank Correlations: sort original
    cell counts and define deciles; repeat on
    perturbed cell counts. The measure is defined
    using the indicator function I and the number
    of rows
  • Log Linear Analysis: ratio of the deviance
    (likelihood ratio test statistic) between
    perturbed table and original table for a
    given model

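
The rank-correlation measure can be sketched as the percentage of counts whose decile differs between the original and perturbed data. The exact formula is not preserved in the transcript, so the details below are assumptions: deciles are assigned from the rank in the sorted counts, ties are broken by position, and the denominator is the number of counts compared. The function names are hypothetical.

```python
def deciles(counts):
    """Assign each count a decile (0-9) from its rank when sorted.

    Ties are broken by position, which is a simplification.
    """
    n = len(counts)
    order = sorted(range(n), key=lambda i: counts[i])
    result = [0] * n
    for rank, i in enumerate(order):
        result[i] = rank * 10 // n
    return result


def percent_decile_changed(original, perturbed):
    """Percent of counts whose decile differs after perturbation."""
    d_orig = deciles(original)
    d_pert = deciles(perturbed)
    changed = sum(1 for a, b in zip(d_orig, d_pert) if a != b)
    return 100.0 * changed / len(original)


original = list(range(10))
perturbed = [1, 0, 2, 3, 4, 5, 6, 7, 8, 9]  # first two counts swap ranks
print(percent_decile_changed(original, perturbed))  # 20.0
```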
12
Risk Measures
  • For Census Data
  • Proportion of small cells (ones and twos) that
    were changed.
  • For Sample Data
  • Probability that a one in the table/microdata is
    a population unique
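
The census risk measure is simple to compute once original and perturbed cell counts are aligned. A minimal sketch, assuming both tables are flattened into lists in the same cell order (the function name is hypothetical):

```python
def percent_small_cells_changed(original, perturbed):
    """Percent of original ones and twos whose value was changed."""
    small = [(o, p) for o, p in zip(original, perturbed) if o in (1, 2)]
    if not small:
        return 0.0
    changed = sum(1 for o, p in small if o != p)
    return 100.0 * changed / len(small)


original = [0, 1, 2, 5, 1]
perturbed = [0, 0, 3, 5, 1]
print(percent_small_cells_changed(original, perturbed))
```

A value of 100 means every one and two in the original table was altered by the SDC method.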

13
  • Part II - Software Application

14
Software Application in SAS
Compares original outputs to disclosure controlled outputs
15
Windows of the program
16
Software Application
17
Software Application
18
Software Application
19
Software Application
20
Example
  • Census table at ward level
  • Sex (2)
  • Long-term illness (2)
  • Economic status (9)
  • Wards (70)

21
Example Output in html format
22
Example Output in html format
Number of small cells: 226 (8.97%)
23
Example Output in html format
Number of suppressed cells: 254
24
Basic Measures of Distortion
25
Basic Measures of Distortion
Column   Cells Moved   Percent Moved
a4       25            35.71
26
Basic Measures of Distortion
Absolute Average Distance: 0.1358
27
More Complex Measures of Distortion
28
More Complex Measures of Distortion
29
More Complex Measures of Distortion
30
Disclosure Risk Assessment
31
Disclosure Risk Assessment
Percent 1s and 2s changed - 100.00
32
Additional Features
  • Error messages (specifying cause of error)
  • Easy to use (click on icon and it runs)
  • Handout explaining measures in simple terms

33
  • Part III - Conclusions and Future Work

34
Risk-Utility Confidentiality Map
35
Conclusions and Future Work
  • Emergence of some guidelines:
    - skewed tables (one or two large columns
      and the rest small columns): prefer
      rounding to cell suppression
    - uniform tables: less information loss
      due to SDC methods, so choose the method with
      the fewest changes to the table
    - sparse tables: totals need to be benchmarked,
      so use controlled rounding (if possible) or
      semi-controlled random rounding
  • Quality measures for users and guidance on how
    to allow for statistical analysis with disclosure
    controlled statistical data

36
Contact Details
  • Natalie Shlomo
  • n.shlomo_at_soton.ac.uk
  • Caroline Young
  • cjy_at_soton.ac.uk