The ARGUS Software of the SDC-project

1
The ARGUS Software of the SDC-project
  • Anco Hundepool
  • Statistics Netherlands
  • Washington, August 1999

2
Statistical Disclosure Control
  • the balance between the need for (more and more)
    information and
  • the privacy of the respondents

3
Statistical Disclosure Control
  • Need for detailed micro data files
  • electronic publications
  • computing power of users
  • Need for more detailed tables
  • But....!!!

4
Statistical Disclosure Control
  • Protection of privacy of respondents
  • persons, enterprises, institutions
  • Respondents must be able to trust Statistical
    Offices!
  • Risks
  • Intruders/ hackers
  • Accidental recognition
  • Advanced record linkage techniques

5
Statistical Disclosure Control
  • Produce safe datafiles and tables
  • Apply data modification techniques
  • Preserve as much information
  • Implemented in ARGUS!

6
Framework of development of ARGUS
  • SDC project
  • partly subsidised by EU (4th Framework)
  • Co-operation between The Netherlands, Italy
    (Spain) and UK

7
General aims of SDC project
  • Methodological research in SDC
  • microdata, tables
  • concerning statistics, OR
  • geographical data
  • (general) SDC Software development
  • microdata (m-ARGUS)
  • tables (t-ARGUS)

8
SDC project members
  • Netherlands
  • CBS (ARGUS)
  • TU-Eindhoven (OR for microdata)
  • Italy
  • Istat (with Univ. of Rome) (Research/testing)
  • CPR-Padova (with Univ. Tenerife) (OR for tabular
    data)

9
SDC project members
  • UK
  • ONS (data)
  • Univ. Manchester (with Univ. of
    Southampton) (Research on SARs)
  • Univ. of Leeds (Geographical data)

10
Main software developed in SDC-project
  • m-ARGUS (CBS and TUE)
  • micro data
  • t-ARGUS (CBS and CPR)
  • tabular data

11
Ideas of m-ARGUS
  • Intruder uses information of identifying
    variables (e.g. region, sex, age, education,
    occupation) to identify records.
  • An identified record then discloses the sensitive
    information it contains

12
m-ARGUS
  • Levels of protection
  • public use files (PUF)
  • microdata files for researchers (MUC):
    universities, under contract, etc.
  • safe-setting

13
Ideas of m-ARGUS
  • a list of combinations of identifying variables
    must be checked
  • find value combinations that are unsafe
  • e.g. a x b x c
  • threshold depends on level of protection
  • Public use files
  • Micro data for researchers (contract)
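
The check for unsafe combinations can be illustrated with a small sketch. It is not the ARGUS implementation: the function name `unsafe_combinations`, the sample records, and the reading of "unsafe" as "occurs fewer than `threshold` times" are assumptions made for illustration.

```python
from collections import Counter

def unsafe_combinations(records, variables, threshold):
    """Return the value combinations of `variables` that occur
    fewer than `threshold` times in `records` (assumed definition)."""
    counts = Counter(tuple(rec[v] for v in variables) for rec in records)
    return {combo: n for combo, n in counts.items() if n < threshold}

# Hypothetical microdata; the variable names follow the examples on slide 11.
records = [
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "M", "age": 71},
    {"region": "South", "sex": "F", "age": 34},
    {"region": "South", "sex": "F", "age": 34},
]

# A stricter level of protection would use a higher threshold.
rare = unsafe_combinations(records, ("region", "sex", "age"), threshold=2)
print(rare)
```

A public use file would typically be checked with a higher threshold than a file released to researchers under contract, so the same records can be safe at one level and unsafe at another.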

14
Ideas of m-ARGUS
  • eliminate the unsafe combinations
  • by global recoding (age -> age group, region
    -> province)
  • local suppression (imputing missing values)
  • interactively/automatically
  • with minimum information loss (entropy)
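
The two techniques and the entropy measure can be sketched as follows. The slides do not say how ARGUS computes entropy-based information loss, so the drop in Shannon entropy used here is one plausible reading, and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def recode_age(age, width=10):
    """Global recoding: map an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress(record, variable, missing=None):
    """Local suppression: replace one value in one record by a missing code."""
    out = dict(record)
    out[variable] = missing
    return out

ages = [31, 34, 38, 42, 47, 71]
bands = [recode_age(a) for a in ages]

# Assumed loss measure: entropy before recoding minus entropy after.
loss = entropy(ages) - entropy(bands)
print(bands, round(loss, 3))
```

Global recoding changes a variable for every record, while local suppression touches a single value, so a tool can trade a coarser code list against a few missing values.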

15
m-ARGUS
  • For microdata
  • Developed in Borland C
  • Windows-95/98
  • Version 3.0 last SDC-version
  • interactive/automatic global recoding
  • automatic local suppression

16
Features of m-ARGUS
  • can handle large microdata files
  • only tables derived from microdata are being
    used
  • flexible global recoding
  • options for automatic mix of global recoding and
    local suppression (TU Eindhoven)

17
Additional features of m-ARGUS
  • Micro-aggregation
  • Top/Bottom coding
  • Rounding
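
A minimal sketch of these three techniques, under stated assumptions: the slides only name the methods, so the univariate fixed-size grouping used for micro-aggregation, the round-half-up rule, and all function names are illustrative choices rather than the ARGUS algorithms.

```python
def top_code(value, ceiling):
    """Top coding: values above `ceiling` are reported as `ceiling`."""
    return min(value, ceiling)

def bottom_code(value, floor):
    """Bottom coding: values below `floor` are reported as `floor`."""
    return max(value, floor)

def round_to_base(value, base):
    """Round a non-negative integer to the nearest multiple of `base`
    (ties round up)."""
    return base * ((value + base // 2) // base)

def microaggregate(values, k):
    """Univariate micro-aggregation: sort the values, form groups of at
    least `k` neighbours, and replace each value by its group mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start + k
        if len(order) - end < k:  # too few left for another full group
            end = len(order)
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result

incomes = [10, 50, 12, 48, 11, 49, 100]
print(microaggregate(incomes, k=3))
print(top_code(250000, 100000), round_to_base(13, 5))
```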

18
m-ARGUS
(Flow diagram)
  • Input: metadata, microdata
  • Steps: generate tables, recoding schemes, global
    recoding, local suppression, micro-aggregation,
    top/bottom coding, rounding
  • Output: report, metadata, microdata

19
m-ARGUS input data
  • Data: fixed-format ASCII
  • Metadata
  • Name
  • Position
  • Missing values (2)
  • Identification level
  • Hierarchical coding
  • Codelist (opt.)
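
Reading a fixed-format record with such metadata could look like the sketch below. This is not the ARGUS metadata file format: the `Variable` class, the 1-based start column, the explicit field length, and the sample line are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    start: int        # 1-based start column (assumed convention)
    length: int       # field width (assumed; not listed on this slide)
    missing: tuple    # up to two missing-value codes
    ident_level: int  # identification level (0 = not identifying, assumed)

def parse_record(line, variables):
    """Cut one fixed-format line into named fields; missing codes become None."""
    out = {}
    for var in variables:
        raw = line[var.start - 1 : var.start - 1 + var.length].strip()
        out[var.name] = None if raw in var.missing else raw
    return out

# Hypothetical layout: region (2 cols), sex (1), age (3), income (6).
variables = [
    Variable("region", 1, 2, ("99", ""), 1),
    Variable("sex", 3, 1, ("9", ""), 2),
    Variable("age", 4, 3, ("999", ""), 3),
    Variable("income", 7, 6, ("999999", ""), 0),
]

record = parse_record("121034 41500", variables)
print(record)
```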

20
Using m-ARGUS
  • reading data file
  • generating tables
  • apply global recodes
  • local suppression
  • generate safe file
  • generate report

21
t-ARGUS

22
Ideas of t-ARGUS
  • identification of sensitive cells using, e.g., the
    dominance rule
  • at least n (e.g. 2) contributors to a cell
  • sum of the largest 3 contributors exceeds 75% of
    the cell total (one large contributor could
    otherwise recalculate the contribution of its
    competitor)
  • easy part
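
The sensitivity check can be sketched as below, assuming the "75" on the slide is a percentage of the cell total. The function name `is_sensitive`, its parameter names, and the treatment of empty cells as not sensitive are illustrative assumptions, not the t-ARGUS code.

```python
def is_sensitive(contributions, n=3, k=75.0, min_contributors=2):
    """Flag a cell as sensitive if it has too few contributors or if its
    `n` largest contributions exceed `k` percent of the cell total."""
    if not contributions:
        return False  # assumption: an empty cell discloses nothing
    if len(contributions) < min_contributors:
        return True
    total = sum(contributions)
    if total <= 0:
        return False  # assumption: the rule targets positive magnitudes
    top = sum(sorted(contributions, reverse=True)[:n])
    return top > k / 100.0 * total

print(is_sensitive([40, 30, 20, 5, 5]))  # top 3 hold 90 of 100
print(is_sensitive([10] * 10))           # top 3 hold 30 of 100
print(is_sensitive([100]))               # a single contributor
```

The slide uses "n" for the minimum number of contributors; the sketch names that parameter `min_contributors` to keep it apart from the number of largest contributors summed by the dominance rule.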

23
Ideas of t-ARGUS
  • Eliminate/protect sensitive cells (hard part)
  • by applying SDC techniques
  • table redesign
  • cell suppression
  • rounding
  • interactively and/or automatically
  • with minimum information loss (e.g. cell weights)

24
Ideas of t-ARGUS
  • cell suppression in tables with marginals
  • identify primary sensitive cells
  • protect primary cells by suppressing additional
    (secondary) cells to prevent recalculation (to
    some approximation)
  • with minimal information loss (CPR)
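
Why secondary suppression is needed can be shown with a simple check. In a table with published marginals, a suppressed cell that is the only suppression in its row or column can be recomputed by subtraction. The sketch below tests only this necessary condition; it is not the optimisation approach developed by CPR, and the function name and cell coordinates are hypothetical.

```python
def exposed_cells(suppressed, n_rows, n_cols):
    """Return suppressed interior cells that are the only suppression in
    their row or column, and so can be recomputed from a published
    marginal by subtraction."""
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for r, c in suppressed:
        row_counts[r] += 1
        col_counts[c] += 1
    return sorted(
        (r, c) for r, c in suppressed
        if row_counts[r] == 1 or col_counts[c] == 1
    )

# One primary suppression alone is recoverable from its row total.
print(exposed_cells({(0, 0)}, 3, 3))

# Adding three secondary suppressions closes that route.
print(exposed_cells({(0, 0), (0, 1), (1, 0), (1, 1)}, 3, 3))
```

Passing this check does not make a table safe: an intruder can still bound suppressed values by combining several rows and columns, which is why the choice of secondary cells is treated as an optimisation problem.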

25
t-ARGUS
  • 3-D tables
  • interactive table redesign
  • primary and secondary cell suppression
  • optimisation routines for automatic cell
    suppression
  • rounding

26
t-ARGUS
(Flow diagram)
  • Input: metadata, microdata
  • Steps: tabulation, codelists, redesign, rounding,
    suppression
  • Output: report, safe table

27
Features of t-ARGUS
  • Initial run through microdata
  • Also determines the top k contributors per cell,
    which identifies the sensitive cells
  • Table redesign possible without going back to
    microdata
  • Uses procedures for secondary cell suppression
    based on state-of-the-art optimisation algorithms
    (CPR)
  • Prepared for linked tables
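
One way to read the first three points: a single pass stores, per cell, the total, the number of contributors, and the k largest contributions, and these summaries can later be merged when cells are combined. The sketch below follows that reading; the function names are hypothetical, and it assumes each record is a distinct contributor.

```python
import heapq
from collections import defaultdict

def summarise_cells(records, k=3):
    """One pass over (cell, value) pairs: keep per cell the total, the
    contributor count, and the k largest contributions."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    top = defaultdict(list)  # per-cell min-heap holding at most k values
    for cell, value in records:
        totals[cell] += value
        counts[cell] += 1
        heap = top[cell]
        if len(heap) < k:
            heapq.heappush(heap, value)
        else:
            heapq.heappushpop(heap, value)  # drops the smallest of k + 1
    return {
        cell: (totals[cell], counts[cell], sorted(top[cell], reverse=True))
        for cell in totals
    }

def merge_cells(a, b, k=3):
    """Combine two cell summaries (table redesign) without the microdata."""
    return (a[0] + b[0], a[1] + b[1], heapq.nlargest(k, a[2] + b[2]))

# Hypothetical microdata: (cell key, contribution).
records = [
    (("A", "x"), 50.0), (("A", "x"), 10.0), (("A", "x"), 30.0),
    (("A", "x"), 5.0), (("A", "y"), 40.0), (("A", "y"), 20.0),
]
summary = summarise_cells(records)
merged = merge_cells(summary[("A", "x")], summary[("A", "y")])
print(summary[("A", "x")], merged)
```

Because the k largest values of a merged cell are always among the k largest of its parts, the dominance rule can be re-evaluated after a redesign from the stored summaries alone.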

28
t-ARGUS
  • Data: fixed-format ASCII
  • Metadata
  • Variable name
  • Start position
  • Field length
  • Status

29
t-ARGUS
  • Apply global recoding
  • Protect file with secondary suppression
  • Rounding
  • Safe table as ASCII or .WK1 (plus report)

30
t-ARGUS
  • Version 2.0 final SDC-version
  • requires commercial OR-solver (Xpress by Dash,
    UK, 600 GBP)

31
Future / CASC
  • Computational Aspects of Statistical
    Confidentiality
  • New European project proposal (2000-2002)
  • Extending ARGUS
  • New research
  • Additional joint USA/EU-project?

32
CASC-m
  • Concentration on business/economic data
  • microaggregation
  • PRAM
  • Noise-addition/ masking

33
CASC-t
  • Hierarchical tables
  • Linked tables
  • Optimal solution vs. heuristics
  • Different input formats

34
CASC-team
  • Statistics Netherlands
  • Istat (Italy)
  • ONS, Univ. Southampton, Manchester, London,
    Plymouth (UK)
  • Bundesamt, IAB (Germany)
  • Stat. Catalunya, Univ. Tenerife (Spain)

35
Contact
  • Anco Hundepool
  • Statistics Netherlands
  • PO box 4000
  • 2200 JM Voorburg
  • The Netherlands
  • email ahnl_at_krypton.vb.cbs.nl
  • fax 31 70 3375990
  • phone 31 70 3375038