Sample surveys: Experiences from the past and challenges for the future - PowerPoint PPT Presentation

1
Sample surveys: Experiences from the past
and challenges for the future
  • Jelke Bethlehem
  • Statistics Netherlands and
  • University of Amsterdam

2
Overview
  • Survey design
  • Data collection
  • Data cleaning
  • Nonresponse problem
  • Analysis of dirty data
  • Protection of privacy
  • Dissemination of survey data

3
Survey design
  • Man-made randomisation
  • Vital for representative samples
  • Allows for computation of accuracy of estimators
    (confidence intervals)
  • Sampling designs
  • Simple random sampling
  • Complex sampling for more accuracy
  • Complex estimates for more accuracy
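
A minimal sketch, not taken from the slides, of how randomisation permits an accuracy statement: it draws a simple random sample without replacement and computes a normal-approximation 95% confidence interval for the population mean. The function name `srs_mean_ci`, the default `z=1.96` and the fixed seed are illustrative assumptions.

```python
import math
import random
import statistics


def srs_mean_ci(population, n, z=1.96, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns (estimate, lower, upper). Assumes 2 <= n <= len(population).
    """
    if not 2 <= n <= len(population):
        raise ValueError("need 2 <= n <= population size")
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # sampling without replacement
    big_n = len(population)
    mean = statistics.fmean(sample)
    s2 = statistics.variance(sample)  # sample variance (divisor n - 1)
    # Standard error with the finite population correction (1 - n/N).
    se = math.sqrt((1 - n / big_n) * s2 / n)
    return mean, mean - z * se, mean + z * se
```

When the sample is the whole population the correction factor is zero and the interval collapses to a point, which is a quick sanity check on the formula.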

4
Survey design
  • Sampling using auxiliary information
  • Discrete auxiliary variable: Stratified
    sampling
  • Continuous auxiliary variable: Unequal
    probability sampling
  • Estimation using auxiliary information
  • Discrete auxiliary variable:
    Post-stratification estimation
  • Continuous auxiliary variable: Regression
    estimator
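
As an illustration of estimation with a continuous auxiliary variable, the following sketch computes the usual regression estimator of a mean, y_bar + b * (X_bar - x_bar), where b is the least-squares slope in the sample and X_bar is the known population mean of the auxiliary variable. The function name and argument names are assumptions made for this example.

```python
import statistics


def regression_estimate(xs, ys, pop_mean_x):
    """Regression estimator of the population mean of y.

    xs, ys: sample values of the auxiliary and target variable.
    pop_mean_x: known population mean of the auxiliary variable.
    """
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("auxiliary variable has no variation in the sample")
    slope = sxy / sxx  # least-squares slope of y on x
    # Shift the sample mean by the slope times the gap in x means.
    return y_bar + slope * (pop_mean_x - x_bar)
```

If the sample mean of x already equals the population mean, the estimator reduces to the plain sample mean of y.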

5
Data collection (PAPI)
  • Face-to-face interviewing
  • Expensive
  • High data quality, high response rates
  • Mail interviewing
  • Cheap
  • Low data quality, low response rates
  • Telephone interviewing
  • Less expensive than face-to-face
  • Better quality than mail
  • Coverage problems
  • Limitations and possibilities

6
Data collection (CAI)
  • Computer-assisted interviewing
  • Easier for interviewers
  • Higher data quality
  • Faster survey processing
  • Forms of CAI
  • CATI: Telephone interviewing
  • CAPI: Face-to-face interviewing
  • CASI: Self-interviewing

7
Data collection (CAI)
  • New possibility
  • CAWI: Web interviewing
  • On-line help
  • CATI problems
  • Incomplete sampling frames
  • Change to mobile phones
  • Instrument documentation
  • Large / complex interviewing instruments
  • Need for automatic documentation tools

8
Data cleaning
  • Errors in PAPI surveys
  • Domain errors
  • Consistency errors
  • Route errors
  • Other measurement errors
  • Errors in CAI surveys
  • Other measurement errors
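
The three checkable error types can be sketched as simple rules on a single record. The field names (`age`, `marital_status`, `has_job`, `hours_worked`) and the limits used are hypothetical and chosen only for illustration.

```python
def find_errors(record):
    """Return a list of error labels found in one questionnaire record."""
    errors = []
    age = record.get("age")

    # Domain error: a value lies outside its valid range.
    if age is None or not 0 <= age <= 120:
        errors.append("domain: age")

    # Consistency error: each value is valid on its own,
    # but the combination is implausible.
    if age is not None and age < 15 and record.get("marital_status") == "married":
        errors.append("consistency: age/marital_status")

    # Route errors: a question was answered that should have been
    # skipped, or skipped although it should have been answered.
    hours = record.get("hours_worked")
    if record.get("has_job") == "no" and hours is not None:
        errors.append("route: hours_worked answered without a job")
    if record.get("has_job") == "yes" and hours is None:
        errors.append("route: hours_worked skipped")

    return errors
```

In a CAI instrument checks of this kind run during the interview, which is why only the other measurement errors remain on the CAI side of the slide.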

9
Data cleaning
  • Data editing
  • Detecting errors
  • Correcting detected errors
  • Traditional data editing: micro-editing
  • All forms are checked
  • Requires a lot of resources (40%)
  • Question
  • Are we over-editing?

10
Data cleaning
  • Alternative 1: Automatic editing
  • Automatic micro-editing
  • Detects errors
  • Determines variable to be changed
  • Uses advanced imputation models
  • Complex algorithms
  • Computer-intensive
  • Not perfect
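
The slides do not say how the variable to be changed is determined. One widely used principle, usually attributed to Fellegi and Holt, is to change as few fields as possible so that all edit rules are satisfied. The brute-force sketch below applies that principle to small categorical domains; it is an assumption about the approach, not the algorithm used at Statistics Netherlands, and all names are hypothetical.

```python
from itertools import combinations, product


def minimal_change(record, domains, edits):
    """Find a smallest set of fields to change so that all edits pass.

    record:  dict field -> value
    domains: dict field -> list of allowed values
    edits:   list of functions record -> bool (True means the rule passes)
    Returns (changed_fields, repaired_record), or (None, None) if no
    combination of allowed values satisfies every rule.
    """
    fields = list(domains)
    for k in range(len(fields) + 1):  # try 0 changes, then 1, then 2, ...
        for subset in combinations(fields, k):
            for values in product(*(domains[f] for f in subset)):
                candidate = dict(record)
                candidate.update(zip(subset, values))
                if all(rule(candidate) for rule in edits):
                    return set(subset), candidate
    return None, None
```

The search grows exponentially with the number of fields, which matches the slide's remark that automatic editing is computer-intensive; production systems use far more efficient formulations.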

11
Data cleaning
  • Alternative 2: Selective editing
  • Split records into critical records and
    non-critical records
  • Only critical records contain errors affecting
    publication figures
  • Automatic editing of non-critical records
  • Manual editing of critical records
  • Requires criterion to assign records to critical
    or non-critical stream
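
One possible criterion, sketched here under stated assumptions, is a score that measures how much a record's deviation from an anticipated value could move the published total. The record keys `weight`, `reported` and `expected`, the score formula and the threshold are all hypothetical; the slides do not specify a criterion.

```python
def split_streams(records, threshold):
    """Split records into (critical, non_critical) by a simple score.

    score = weight * |reported - expected| / anticipated weighted total
    """
    total = sum(r["weight"] * r["expected"] for r in records)
    if total == 0:
        raise ValueError("anticipated total must be non-zero")
    critical, non_critical = [], []
    for r in records:
        score = r["weight"] * abs(r["reported"] - r["expected"]) / abs(total)
        if score > threshold:
            critical.append(r)  # goes to manual editing
        else:
            non_critical.append(r)  # goes to automatic editing
    return critical, non_critical
```

The anticipated value could come from a previous period or from a model; the lower the threshold, the more records end up in the manual stream.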

12
Data cleaning
  • Alternative 3: Macro-editing
  • Perform checks on aggregated data (publication
    tables)
  • In case of an error, drill down to records
    causing the error, and edit these records
  • Minimised data editing efforts
  • Dangers of subjective judgement
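
The check-then-drill-down idea can be sketched as follows: aggregate the records into publication cells, compare each cell total with a reference total (here the previous period), and for suspicious cells list the contributing records, largest first. The keys `cell` and `value`, the reference totals and the 25% tolerance are assumptions for this example.

```python
from collections import defaultdict


def suspicious_cells(records, previous_totals, tolerance=0.25):
    """Return {cell: records sorted by value, largest first} for cells
    whose total deviates from the previous total by more than
    `tolerance` (as a fraction of the previous total)."""
    totals = defaultdict(float)
    for r in records:
        totals[r["cell"]] += r["value"]

    flagged = {}
    for cell, total in totals.items():
        prev = previous_totals.get(cell)
        if not prev:  # no usable reference value: cannot be checked
            continue
        if abs(total - prev) / abs(prev) > tolerance:
            # Drill down to the records behind the suspicious total.
            flagged[cell] = sorted(
                (r for r in records if r["cell"] == cell),
                key=lambda r: r["value"],
                reverse=True,
            )
    return flagged
```

Choosing the tolerance is exactly where the subjective judgement mentioned on the slide enters.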

13
Nonresponse
  • Unit nonresponse
  • No information at all is obtained from the
    respondent
  • Item nonresponse
  • Only some questions are left unanswered
  • Consequences of non-response
  • Fewer observations
  • Wrong conclusions

14
Nonresponse at Statistics Netherlands (percentages)
  • YEAR LFS CSS SWP MOS HOS
  • 1975 14 22 14
  • 1976 28 23 13
  • 1977 12 31 30 19
  • 1978 36 33 22
  • 1979 19 37 35 31 26
  • 1980 39 39 32 26
  • 1981 17 35 32 26
  • 1982 40 36 34 29
  • 1983 18 37 42 34 26
  • 1984 35 36 31
  • 1985 23 31 39 32
  • 1986 29 41 41 34
  • 1987 40 29 41
  • 1988 41 32 45
  • 1989 39 32 42
  • 1990 39 32 45
  • 1991 40 31 43

15
Nonresponse at Statistics Netherlands

16
Nonresponse in the Dutch Labor Force Survey

17
Treatment of nonresponse
  • Unit nonresponse
  • Simple weighting (post-stratification)
  • Linear/multiplicative weighting
  • Calibration
  • Item nonresponse
  • Simple imputation
  • Multiple imputation
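
A minimal sketch of one form of simple imputation for item nonresponse: a missing value is replaced by the mean of the observed values in the same class, and every record is flagged so that analysts can tell imputed values from observed ones. The function and key names are hypothetical, and class-mean imputation is only one of several simple methods.

```python
from collections import defaultdict
import statistics


def impute_class_mean(records, target, class_var):
    """Return copies of the records with missing `target` values
    (None) replaced by the class mean, plus an `imputed` flag.
    Values stay None when the class has no observed donors."""
    observed = defaultdict(list)
    for r in records:
        if r[target] is not None:
            observed[r[class_var]].append(r[target])

    result = []
    for r in records:
        filled = dict(r)  # leave the input records untouched
        filled["imputed"] = r[target] is None
        if filled["imputed"]:
            donors = observed.get(r[class_var])
            if donors:
                filled[target] = statistics.fmean(donors)
        result.append(filled)
    return result
```

Mean imputation understates the variability of the completed data, which is one motivation for multiple imputation.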

18
The weighting problem
  • Question
  • How to compute weights?
  • Answer
  • Use auxiliary variables
  • Auxiliary variables
  • Measured in the survey
  • Population means must be available
  • Are correlated with target variables

19
The weighting problem
Target variable Y, auxiliary variable X, response behavior
  • Find auxiliary variable X that correlates with
    target variable Y
  • No correlation between X and response behavior:
    no bias for Y
  • Correlation between X and response behavior:
    bias for Y

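
The link between response behavior and bias can be made concrete under a random response model, an assumption the slide does not spell out: each person i responds with probability rho_i. The respondent mean then estimates approximately sum(rho_i * y_i) / sum(rho_i), so its bias is about cov(rho, y) / mean(rho). The sketch computes this quantity for a variable directly; function and argument names are hypothetical.

```python
import statistics


def approx_response_bias(y, rho):
    """Approximate bias of the respondent mean of y when person i
    responds with probability rho[i]: cov(rho, y) / mean(rho),
    using the population covariance (divisor N)."""
    if len(y) != len(rho) or not y:
        raise ValueError("y and rho must be non-empty and equally long")
    y_bar = statistics.fmean(y)
    rho_bar = statistics.fmean(rho)
    cov = sum((r - rho_bar) * (v - y_bar) for r, v in zip(rho, y)) / len(y)
    return cov / rho_bar
```

With equal response probabilities the covariance is zero and so is the bias; the bias grows as response behavior becomes more strongly related to the variable.
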
20
The weighting problem
  • Does weighting work?
  • Yes, but only to some extent
  • No proper auxiliary variables available
  • The future
  • More register data will become available
  • Construction of synthetic census files
  • Therefore, more information about nonrespondents

21
The weighting problem
  • Different approach
  • Focus on estimation of response probabilities
  • Politz and Simmons (1949): estimate at-home
    probabilities
  • New methodology for estimating co-operation
    probabilities
  • Stratification by response probability
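
As the Politz and Simmons method is usually described, each respondent is asked on how many of the five preceding evenings he or she was at home at the time of the interview. With k such evenings, the at-home probability is estimated as (k + 1) / 6 and the respondent receives weight 6 / (k + 1). The sketch below follows that description; the data layout is an assumption.

```python
def politz_simmons_mean(respondents):
    """Weighted mean with Politz-Simmons at-home weights.

    respondents: list of (y, k) pairs, where k is the number of the
    previous five evenings (0..5) the respondent was at home.
    """
    numerator = denominator = 0.0
    for y, k in respondents:
        if not 0 <= k <= 5:
            raise ValueError("k must be between 0 and 5")
        weight = 6 / (k + 1)  # inverse of estimated at-home probability
        numerator += weight * y
        denominator += weight
    if denominator == 0:
        raise ValueError("no respondents")
    return numerator / denominator
```

People who are rarely at home get large weights, so they stand in for similar people who were never reached.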

22
Analysis
  • The analysis of dirty data
  • Dependent sample
  • Unequal selection probabilities
  • Bizarre, unspecified distribution
  • Some values are missing
  • Some values are the result of imputation
  • Some values are still subject to error
  • Weights have been added
  • Question
  • May we use standard analysis techniques?

23
Privacy protection
  • Basic question
  • May we release survey data files, even after
    removing name and address information?
  • Disclosure problem
  • Identification: a unique link is established
    between a record in a file and an individual
  • Disclosure: consequently, sensitive information
    is revealed

24
Privacy protection
  • Types of disclosure
  • Disclosure by matching: Survey file is matched
    with name and address file.
  • Disclosure by response knowledge: The knowledge
    that an individual is in the file makes it very
    easy to identify him.
  • Disclosure of rare persons: It is easy to
    identify individuals with unique combinations of
    variable values
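
A simple way to look for rare persons in a file, sketched under the assumption that a set of identifying key variables has been chosen, is to count how often each combination of key values occurs and list the records whose combination occurs only once. The function and variable names are hypothetical.

```python
from collections import Counter


def unique_records(records, key_vars):
    """Return the records whose combination of values on `key_vars`
    occurs exactly once in the file."""
    keys = [tuple(r[v] for v in key_vars) for r in records]
    frequency = Counter(keys)
    return [r for r, k in zip(records, keys) if frequency[k] == 1]
```

A record that is unique in the sample is not necessarily unique in the population, so this is a screening step rather than a measure of actual disclosure risk.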

25
Privacy protection
  • Disclosure protection
  • Reduce number of variables in file
  • Reduce number of categories of a variable
  • Set rare values to missing
  • Delay publication
  • Add noise to data, while maintaining first and
    second moments
  • Contract only for statistical analysis
  • The dilemma
  • Privacy or information?
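
The noise option can be illustrated for a single variable: add random noise, then shift and rescale the result so that the mean and variance of the original values are restored exactly. This univariate sketch does not preserve covariances with other variables, which a full treatment of second moments would require; the function name, the Gaussian noise and the fixed seed are assumptions.

```python
import random
import statistics


def add_noise_preserving_moments(values, noise_sd, seed=0):
    """Perturb values with Gaussian noise, then rescale so the mean
    and (population) standard deviation equal those of the input."""
    rng = random.Random(seed)
    noisy = [v + rng.gauss(0, noise_sd) for v in values]
    mean_orig = statistics.fmean(values)
    sd_orig = statistics.pstdev(values)
    mean_noisy = statistics.fmean(noisy)
    sd_noisy = statistics.pstdev(noisy)
    if sd_noisy == 0:
        raise ValueError("perturbed values have no variation")
    # Standardise the noisy values, then map back to the original scale.
    return [mean_orig + (x - mean_noisy) * sd_orig / sd_noisy for x in noisy]
```

The larger `noise_sd`, the better individual values are masked and the less useful they are at record level, which is the privacy-or-information dilemma in miniature.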

26
Dissemination
  • Data documentation
  • Information = Data + Meta-data
  • Important for secondary analysis
  • Costly and time-consuming
  • Boring, not planned
  • New approaches
  • Data Documentation Initiative (DDI)
  • Various meta-data projects in the 4th and 5th
    Framework Programmes of the EU
  • XML will become the new standard

27
Future of survey sampling
  • Trends
  • More and more use of register data
  • Less use of sample surveys to collect new data
  • Surveys as a quality control instrument for
    register data