Disclosure control of analytical outputs

Transcript and Presenter's Notes


1
Disclosure control of analytical outputs
  • Felix Ritchie, Office for National Statistics

2
SDC in a research environment
  • Almost all SDC research is concerned with
    • Preparing tables for publication
    • Anonymising datasets for release
  • Little work on the characteristics of the research
    environment
  • Practically no work on the disclosiveness of
    analytical results
  • Does this matter?
    • Data custodians may want to apply inappropriate
      rules
    • Analysts assume analytical results are too
      complex to be disclosive
    • No clear agreement or strategy for dealing with
      analytical results

3
Paper aims to show that, for regressions,
  • The analysts are fundamentally correct
  • There are a small number of identified problems
  • A simple rule is available to assess/quantify
    residual risk
  • Concern over the nature of variables and validity
    of analysis is misplaced

4
Exact identification in a linear regression
1. Direct disclosure
  • Consider the estimated result (restated in the
    sketch below)
  • In theory, one can recover K unknowns by knowing the
    K values of the coefficient vector and all other
    values in the data set (e.g. all Y, all X bar one
    observation)
  • Is this a practical concern?
  • Consider more realistic cases where you don't
    have every other variable.
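
The slide's "estimated result" equation is not reproduced in this transcript; the sketch below restates the standard OLS algebra the bullets appear to rest on (the notation is mine, not the slide's):

```latex
% Standard OLS estimator for y = X\beta + \varepsilon with K regressors,
% equivalently the K normal equations:
\[
  \hat\beta = (X'X)^{-1}X'y, \qquad X'X\hat\beta = X'y .
\]
% The normal equations are K linear equations. An intruder who knows \hat\beta and
% every element of (y, X) except K values faces, in principle, K equations in K
% unknowns; this is why direct disclosure needs near-complete knowledge of the data.
```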

5
Exact identification, cont.
2. Disclosure by differencing - 2 variables
  • Two estimates, one with an additional observation
  • Solving the normal equations and differencing
    (sketched below)
  • 2 equations, 2 unknowns, but in general insoluble
    without full knowledge of the variables because of
    the inverse term
  • So are the analysts correct?
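
A sketch of the differencing set-up, assuming two released regressions whose estimation samples differ by a single observation (x0, y0); again the notation is mine:

```latex
% Regression A uses (X, y); regression B additionally includes one observation (x_0, y_0).
% Normal equations:
%   A:  X'X \hat\beta_A = X'y
%   B: (X'X + x_0 x_0') \hat\beta_B = X'y + x_0 y_0
% Subtracting A from B:
\[
  X'X(\hat\beta_B - \hat\beta_A) + x_0 x_0'\hat\beta_B = x_0 y_0 .
\]
% Even with both coefficient vectors published, the left-hand side still involves the
% unreleased cross-product matrix X'X, so the system generally cannot be solved for
% (x_0, y_0); this is the inverse term referred to above.
```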

6
Exact identification, cont.
2. 2 variable case - exceptions
  • y0 can be identified if means of variables are
    known
  • But this only works for the mean of the
    additional observations
  • Binary explanatory variables
  • Can determine both y0 and X0 (illustrated in the
    sketch below)
  • Works because this is effectively a table
  • But the key point is that variable counts are
    summary statistics for aggregate values, i.e. they
    can be dealt with in the same framework
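
A small numerical illustration of the binary-regressor exception (my own construction; the data, the helper function and the assumption that group counts are published alongside the coefficients are all hypothetical):

```python
# Hypothetical illustration: with one binary regressor the OLS coefficients are just
# group means, so two published regressions whose samples differ by one record
# (plus published group counts) reveal that record exactly.
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = rng.integers(0, 2, size=n).astype(float)   # binary explanatory variable
y = 3.0 + 2.0 * x + rng.normal(size=n)         # continuous outcome

def ols(x, y):
    """Return (intercept, slope) from a regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# "Published" outputs: coefficients and group counts, with and without the last record.
b_all, b_less = ols(x, y), ols(x[:-1], y[:-1])
n1_all, n1_less = int(x.sum()), int(x[:-1].sum())
n0_all, n0_less = n - n1_all, (n - 1) - n1_less

# Intruder's side: intercept = mean of y in the x=0 group, intercept + slope = mean
# in the x=1 group; comparing group totals across the two releases pins down the record.
x0 = 1.0 if n1_all > n1_less else 0.0
if x0 == 1.0:
    y0 = (b_all[0] + b_all[1]) * n1_all - (b_less[0] + b_less[1]) * n1_less
else:
    y0 = b_all[0] * n0_all - b_less[0] * n0_less

print("recovered (x0, y0):", (x0, round(y0, 6)), " true:", (x[-1], round(y[-1], 6)))
```

This is exactly the "effectively a table" point: the regression output is just the two cell means of y by x, so the usual table-differencing considerations apply.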

7
Exact identification, cont.
2. 2 variable case - exceptions
  • Binary dependent variable
  • Works for linear and non-linear regressions

8
Exact identification, cont.
2. 2 variable case - summary
  • So analysts correct in general
  • Specific cases with known information requirement
    can be identified
  • Even non-linear regressions can be differenced to
    identify categorical values
  • Results extend to K-variable case, except that
  • orthogonality of regressors is not a sufficient
    condition for identification
  • an incomplete knowledge of the matrix of
    explanatory variables is a sufficient condition
    for non-disclosiveness, unless
  • a sufficient statistic for Σx_ik exists, in which
    case an intruder can at best only determine Σy_0i
    (see the sketch below)
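
One way to see where the "sufficient statistic for Σx_ik" condition comes from (my reconstruction from the bullets, using the differenced normal equations sketched earlier, with m added observations collected in X_0, y_0):

```latex
% Differencing the normal equations for m added observations (X_0, y_0):
%   X'X(\hat\beta_B - \hat\beta_A) + X_0'X_0\hat\beta_B = X_0'y_0 .
% Taking the intercept row (the first column of X and of X_0 is a column of ones):
\[
  \sum_i y_{0i}
  = \Bigl(\sum_i x_i\Bigr)'(\hat\beta_B - \hat\beta_A)
  + \Bigl(\sum_i x_{0i}\Bigr)'\hat\beta_B ,
\]
% so if the column sums (equivalently the means) of the regressors are available for
% both the original and the added observations, an intruder can recover the sum of the
% added y-values, but nothing finer than that without further information.
```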

9
Exact identification, cont.
3. Prevention
  • In general, the exact values of the variables
    underlying a regression cannot plausibly be
    determined unless
    • the regression consists entirely of categorical
      variables, or
    • has a binary dependent variable
  • and disclosure by differencing is the only possible
    route for identification.
  • A linear regression is completely non-disclosive
    if
    • one or more coefficients is effectively
      suppressed (that is, the coefficient could not
      reasonably be determined from published
      information), and
    • the relevant variable is not orthogonal to all
      other variables

10
Approximate disclosure
4. Calculating the prediction error
  • Calculate the variance for an individual data
    point
  • Can't be determined without the exact value of X?
    It can if you have x1 (see the sketch below)
  • So all you need is R²
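
The slide's own derivation is not reproduced in this transcript; the block below restates the standard fitted-value variance and the link to R² that the final bullet appears to exploit (my reconstruction, not the paper's notation):

```latex
% Variance of the fitted value at an in-sample point x_i:
%   Var(\hat y_i) = \hat\sigma^2 \, x_i'(X'X)^{-1}x_i .
% Assuming the regression includes an intercept, the estimated error variance can be
% written in terms of routinely published quantities:
\[
  \hat\sigma^2 = \frac{(1-R^2)\sum_i (y_i - \bar y)^2}{\,n - K\,},
\]
% so the residual variance, and hence a prediction variance, can be assessed from
% R^2, n, K and the sample variance of y rather than from the underlying microdata.
```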

11
Approximate disclosure
4. Prediction errors, continued
  • Can calculate the minimum predictive error for
    any data point included in the regression by
    substituting the largest value
  • Can calculate the minimum predictive error for
    new data points by using an amended formula (see
    below)
  • But to do this without full knowledge of the
    explanatory variables requires the full set of
    coefficients
  • Suppressing coefficients prevents the prediction
    error from being assessed
  • Even if the coefficients are insignificant
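
The "amended formula" for a point outside the estimation sample is presumably the standard out-of-sample prediction variance (again a reconstruction, not the slide's own equation):

```latex
% Prediction-error variance for a new observation with regressor values x_0:
\[
  \mathrm{Var}(y_0 - \hat y_0) = \sigma^2\bigl(1 + x_0'(X'X)^{-1}x_0\bigr),
\]
% which the slides argue cannot be assessed without either full knowledge of the
% explanatory variables or the full coefficient vector; hence suppressing even an
% insignificant coefficient blocks the calculation.
```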

12
Non-linear regressions
  • The above analysis relies on the linear form of
    the estimator
  • Not the case for non-linear models
  • Differencing is not an issue
  • There may be other issues not identified yet

13
Does statistical validity matter?
  • Above analysis carried out without reference to
    types of variables
  • Multicollinearity, measurement error, influential
    points, outliers, public variables etc. are not
    necessary to prove regressions disclosive
  • Bad regressions don't make for safe regressions
  • Good regressions don't make safe regressions

14
  • Felix Ritchie
  • Microeconomic Analysis
  • Office for National Statistics
  • felix.ritchie@ons.gov.uk