1
STATISTICAL CONFIDENTIALITY IN LONGITUDINAL LINKED DATA: OBJECTIVES AND ATTRIBUTES
Mario Trottini, University of Alicante (Spain)
mario.trottini_at_ua.es
Joint UNECE/Eurostat Work Session on Statistical Confidentiality, Geneva, 9-11 November 2005
2
Problem Definition
Longitudinal Linked Microdata
Microdata that contain observations from two or more related sampling frames, with measurements for multiple time periods for all units of observation (Abowd and Woodcock 2004)
  • How to create the data set?
  • How to disseminate the data?

4
Data Dissemination: Why Is It Difficult?
A data dissemination procedure should:
  1. Allow legitimate users to perform statistical analyses as if they were using the original data
  2. Control the risk of misuse of the data by potential intruders
  3. Be operational
Two issues:
  (i) Objectives are too ambiguous
  (ii) Objectives are conflicting
5
Data Dissemination as a Decision Problem
Step (1) Identify the alternatives
Step (2) Structure the objectives
Step (3) Define suitable attributes
Step (4) Assess the trade-off between the fundamental objectives
7
Outline
  • Identifying the alternatives: review of existing data dissemination procedures
  • Structuring the objectives
    - Theory
    - Current practice
  • Selecting attributes
    - Theory
    - Current practice
  • Conclusions

8
Identifying the Alternatives
Let $M = \{M_k,\ k \in E\}$ denote the class of alternative data dissemination procedures.
Two rationales:
  • Data users and data users' needs are very diverse (Mackie and Bradburn 2000)
  • Combining different methods can produce greater data utility for any level of disclosure risk (Abowd and Lane 2003)

9
Identifying the Alternatives
Let $M = \{M_k,\ k \in E\}$ denote the class of alternative data dissemination procedures.
MORE REALISTIC APPROACH: $M_k$ should be a combination of 1-5
10
Structuring the Objectives: Theory
Information Organization
Overall objective: the best data dissemination
  • Maximize safety
  • Minimize cost
  • Maximize usefulness
Too broad and ambiguous to be of operational use
STRATEGY: Divide an objective into lower-level objectives that clarify the interpretation of the broader objective
11
An Illustration
Usefulness: the data dissemination procedure should allow legitimate data users to perform the statistical analyses of interest as if they were using the data set originally collected.
Sources of ambiguity
12
The Hierarchy
Maximize Usefulness
13
Structuring the Objectives: Current Practice
  • The implicit hierarchy is often incomplete
  • However, only a few of the objectives are taken into account in applications
  • Transparency, accessibility, feasibility are often not considered
14
An Illustration
DATA MASKING
ORIGINAL MICRODATA: $D_{\mathrm{ORIG}}$
  1) Apply some transformation, $T$, to the data: $D_{\mathrm{REL}} = T(D_{\mathrm{ORIG}})$
  2) Release to the user $D_{\mathrm{MASKED}} = (D_{\mathrm{REL}},\ I(T))$
Usefulness assessment: $D = F(D_{\mathrm{ORIG}}) - F(D_{\mathrm{MASKED}})$
IGNORING TRANSPARENCY!
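The masking scheme on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the talk: the transformation T is taken to be additive Gaussian noise, the feature F is the mean or the variance, and the names `mask_additive_noise` and `usefulness_discrepancy` are hypothetical. The last line hints at why transparency matters: a user who is told the noise level, here carried in I(T), can adjust a variance estimate, which a comparison of F on the original and released data alone does not capture.

```python
import random
import statistics


def mask_additive_noise(d_orig, noise_sd, seed=0):
    """Hypothetical transformation T: add independent Gaussian noise to each value."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_sd) for x in d_orig]


def usefulness_discrepancy(d_orig, d_rel, feature=statistics.mean):
    """Absolute difference between a feature F of the original and released data."""
    return abs(feature(d_orig) - feature(d_rel))


# Made-up original microdata (one numeric variable).
d_orig = [21.0, 35.5, 42.0, 58.3, 63.1, 70.4]
noise_sd = 2.0

d_rel = mask_additive_noise(d_orig, noise_sd)
# I(T): what the agency tells the user about the transformation (assumed format).
info_t = {"method": "additive Gaussian noise", "noise_sd": noise_sd}
d_masked = (d_rel, info_t)

# Assessment that looks only at the released values and ignores I(T).
print("discrepancy in mean:", usefulness_discrepancy(d_orig, d_rel))
print("discrepancy in variance:",
      usefulness_discrepancy(d_orig, d_rel, feature=statistics.variance))

# A user who knows I(T) can subtract the known noise variance.
print("noise-adjusted variance:", statistics.variance(d_rel) - info_t["noise_sd"] ** 2)
```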
15
General Guidelines for Structuring the Objectives
  • The definitions of safety, usefulness and cost are problem dependent.
  • However, providing a clear definition of them in any specific data dissemination problem is crucial for the quality of the final decision.
  • The use of hierarchies could be very beneficial in terms of
    1. clarifying the interpretation of the relevant objectives
    2. checking that no relevant aspects of the problem have been ignored
    3. facilitating communication
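As a rough sketch of the second benefit, an objectives hierarchy can be stored as a nested dictionary and compared against a checklist of aspects. The tree below, the "Low data distortion" node and the helper names are hypothetical; only the three top-level objectives and the aspects transparency, accessibility and feasibility come from the slides.

```python
def all_objectives(tree):
    """Collect the names of every objective in a nested-dict hierarchy."""
    names = set()
    for name, children in tree.items():
        names.add(name)
        names |= all_objectives(children)
    return names


def missing_aspects(tree, checklist):
    """Return the checklist items that appear nowhere in the hierarchy."""
    return sorted(set(checklist) - all_objectives(tree))


# Hypothetical implicit hierarchy, as it might be used in an application.
implicit_hierarchy = {
    "Best data dissemination": {
        "Maximize safety": {},
        "Minimize cost": {},
        "Maximize usefulness": {
            "Low data distortion": {},  # assumed lower-level objective
        },
    }
}

checklist = ["Low data distortion", "Transparency", "Accessibility", "Feasibility"]
print(missing_aspects(implicit_hierarchy, checklist))
# prints ['Accessibility', 'Feasibility', 'Transparency']
```

Writing the hierarchy down explicitly is what makes the omission visible; where the missing aspects should sit in the tree remains a modeling decision.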
16
Selecting Attributes: Theory
  1. Natural attributes
  2. Constructed Attributes
  3. Proxy attributes

17
Selecting Attributes: Theory
  1. Natural attributes
  2. Constructed attributes
  3. Proxy attributes

Natural attributes
  • Example: objective "Minimize cost"; (natural) attribute: cost in euros
  • Not very common in SDC

18
The Hierarchy
Maximize Usefulness
19
Selecting Attributes: Theory
  1. Natural attributes
  2. Constructed attributes
  3. Proxy attributes

Constructed attributes: a "subjective scale" constructed out of several aspects typically associated with the objective of interest.

20
Attribute level | Description of attribute level
1 Support | No groups are opposed to the facility and at least one group has organized support for the facility.
0 Neutrality | All groups are indifferent or uninterested.
-1 Controversy | One or more groups have organized opposition, although no groups have action-oriented opposition. Other groups may either be neutral or support the facility.
-2 Action-oriented opposition | Exactly one group has action-oriented opposition. The other groups have organized support, indifference, or organized opposition.
-3 Strong action-oriented opposition | Two or more groups have action-oriented opposition.
Table 1. Constructed attribute for public attitudes (Keeney and Gregory 2005)
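To make the scale in Table 1 concrete, the following sketch maps a list of group stances to an attribute level. The stance labels and the function name are hypothetical encodings chosen for this illustration; the decision rules are read off the table rows.

```python
# Hypothetical stance labels for each interest group.
STANCES = {"organized_support", "neutral", "organized_opposition", "action_opposition"}


def public_attitude_level(stances):
    """Return the Table 1 level (-3..1) for a list of group stances."""
    unknown = set(stances) - STANCES
    if unknown:
        raise ValueError(f"unknown stances: {sorted(unknown)}")
    n_action = sum(s == "action_opposition" for s in stances)
    if n_action >= 2:
        return -3  # strong action-oriented opposition
    if n_action == 1:
        return -2  # action-oriented opposition
    if "organized_opposition" in stances:
        return -1  # controversy
    if "organized_support" in stances:
        return 1   # support
    return 0       # neutrality


print(public_attitude_level(["organized_support", "neutral"]))               # prints 1
print(public_attitude_level(["organized_opposition", "organized_support"]))  # prints -1
print(public_attitude_level(["action_opposition", "action_opposition"]))     # prints -3
```

Each level corresponds to a verbal description that a decision maker can interpret directly, which is the defining feature of a constructed attribute.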
21
Selecting Attributes: Theory
  1. Natural attributes
  2. Constructed attributes
  3. Proxy attributes

Constructed attributes: a "subjective scale" constructed out of several aspects typically associated with the objective of interest.
  • Defining feature: interpretability
  • Not used in SDC

22
Selecting Attributes: Theory
  1. Natural attributes
  2. Constructed attributes
  3. Proxy attributes

Proxy attributes: a proxy attribute reflects the degree to which an associated objective is met but does not directly measure the objective.
23
Proxy Attributes for Usefulness in SDC
GENERAL FORMULATION
  • $D_{\mathrm{ORIG}}$: original data
  • $D_{\mathrm{REL}}$: disseminated data
  • $F(\mathrm{Data})$: some feature of the data
  • $\mathrm{PROXY} = \mathrm{DISCREPANCY}\big(F(D_{\mathrm{ORIG}}),\ F(D_{\mathrm{REL}})\big)$
INTUITION: Low distortion of the data implies nearly correct inferences for nearly all statistical analyses
24
Proxy Attributes for Usefulness in SDC
$\mathrm{PROXY} = \mathrm{DISCREPANCY}\big(F(D_{\mathrm{ORIG}}),\ F(D_{\mathrm{REL}})\big)$

Proxy type | F | DISCREPANCY | References
Discrepancy between summary statistics | Summary statistics | Absolute (relative) difference, percentage variation, mean variation, etc. | Domingo-Ferrer and Torra (2001), Yancey et al. (2002), Oganyan (2003), Grup Crises (2004)
Discrepancy between distributions | Density estimation | Hellinger distance, Kullback-Leibler divergence, other distances | Agrawal and Aggarwal (2001), Gomatam et al. (2004), Karr et al. (2005)
Inference-based proxy | Model-based inferences: estimation, prediction, model selection | Difference in parameter estimates, interval overlaps, discrepancy in model ranking, etc. | Gomatam et al. (2004), Karr et al. (2005)
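Two of the distribution-based discrepancies named above can be sketched as follows. This is a minimal illustration, not the procedure of any of the cited papers: the feature F is assumed to be a histogram over fixed bins, the data values are made up, and the helper names are hypothetical.

```python
import math


def histogram(values, edges):
    """Relative frequencies over bins [edges[i], edges[i+1]); the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); infinite if q is 0 where p is positive."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue
        if b == 0:
            return math.inf
        total += a * math.log(a / b)
    return total


# Made-up original and released values of one variable.
d_orig = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
d_rel = [1, 2, 3, 3, 3, 4, 4, 4, 5, 6]
edges = [0, 2, 4, 6]

p = histogram(d_orig, edges)  # F(D_ORIG)
q = histogram(d_rel, edges)   # F(D_REL)
print("Hellinger distance:", hellinger(p, q))
print("KL divergence:", kl_divergence(p, q))
```

Both numbers depend on the binning, which is one reason a value such as 0.1 is hard to interpret in terms of any particular analysis.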
25
Selecting Attributes: Theory
  1. Natural attributes
  2. Constructed attributes
  3. Proxy attributes

Proxy attributes: defining features
  • Usually easier to handle
  • Require some understanding of the relationship between the objective of interest and the associated objective measured by the proxy
  • (TOO) OFTEN USED IN SDC

26
An Illustration
Goal: Assessing the trade-off between maximizing usefulness and maximizing safety for a given level $c$ of cost
  • Attribute for usefulness (information loss): Hellinger distance (IL)
  • Attribute for safety (disclosure risk): rate of records correctly re-identified (DR)

Data dissemination 1 ($D_1$): $IL(D_1) = 0.4$, $DR(D_1) = 1$
Data dissemination 2 ($D_2$): $IL(D_2) = 0.5$, $DR(D_2) = 0.5$
Which is preferred: $(DR(D_1) - 0.5,\ IL(D_1) + 0.1,\ c)$ or $(DR(D_1),\ IL(D_1),\ c)$?
What does $\Delta IL = 0.1$ mean in terms of fitting a regression model?
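The comparison on this slide can be checked mechanically with a dominance test. The sketch below uses the IL and DR values from the slide; the cost value and the names `Consequence` and `dominates` are assumptions for illustration, and lower values are treated as better on every attribute.

```python
from typing import NamedTuple


class Consequence(NamedTuple):
    il: float    # information loss (Hellinger distance), lower is better
    dr: float    # disclosure risk, lower is better
    cost: float  # cost, lower is better


def dominates(a, b):
    """True if a is no worse than b on every attribute and strictly better on one."""
    no_worse = a.il <= b.il and a.dr <= b.dr and a.cost <= b.cost
    better = a.il < b.il or a.dr < b.dr or a.cost < b.cost
    return no_worse and better


c = 1.0  # hypothetical common cost level
d1 = Consequence(il=0.4, dr=1.0, cost=c)
d2 = Consequence(il=0.5, dr=0.5, cost=c)

print(dominates(d1, d2), dominates(d2, d1))  # prints False False
```

Neither procedure dominates the other, so the choice rests on a value judgment: how much a 0.1 increase in Hellinger distance is worth relative to a 0.5 reduction in DR. With a proxy attribute, that judgment is hard to make because the meaning of the 0.1 for the analyses users care about is unclear.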
27
Attribute Selection: Theory and Current Practice
THEORY
Prescriptive order in attribute selection
  1. Natural attributes
  2. Constructed attributes
  3. Proxy attributes

28
Conclusions

"There is a tendency in all problem solving to move quickly away from the ill-defined to the well-defined, from constraint-free thinking to constrained thinking. There is a need to feel, and perhaps even to measure, progress toward reaching a 'solution' to a decision problem." (Keeney, 1992, page 9)
  • In this talk it is argued that too little effort has been made toward a comprehensive definition of the data dissemination problem in terms of
    - alternatives
    - objectives
    - attributes

29
Conclusions (Cont.)
  • Hierarchies and constructed attributes could be useful tools to address these problems.
  • Although the discussion has not focused on the dissemination of longitudinal linked data as much as desired, I think it is particularly relevant for this type of data given
    - the complexity of the modeling
    - the multiple decision makers involved
    - the different perspectives on disclosure and utility
    that must be accommodated in the final decision.

30
Acknowledgements

Preparation of this paper was supported by the U.S. National Science Foundation under Grant EIA-0131884 to the National Institute of Statistical Sciences. The contents of the paper reflect the author's personal opinion. The National Science Foundation is not responsible for any views or results presented.
31
THANK YOU !