Title: Data Quality in Nationwide German Social Surveys
1 Data Quality in Nationwide (German) Social Surveys
- Michael Blohm
- Centre for Survey Research and Methodology (ZUMA), German General Social Survey (ALLBUS)
- Mannheim, Germany
- European Conference on Quality in Survey Statistics, 24-26 April 2006, Cardiff, South Wales
European Centre for Cross-Cultural Surveys
2 Outline
- Background
- Indicator for Data Quality
- Sampling Design and Data Quality
- Changes in Data Quality during the Fielding Period
3 Background I
- Survey costs: increasing
- Response rates: decreasing
4 Background II
- Questions
- Is data quality higher in expensive sampling designs?
- Is data quality higher the higher the response rates are?
5 A) Indicator for Data Quality I
- Research strategy
- Compare the net samples with the German Microcensus
- 7 socio-demographic variables
- Age, sex, level of education, marital status, size of household, nationality, employment status
6 Indicator for Data Quality II
- For distributions: Index of Dissimilarity
- Example: legal marital status
7 Indicator for Data Quality III
- Index of Dissimilarity
- Pro: easy to interpret, takes all cases of a sample into account
- Con: does not consider the relative size of categories and the direction of deviations
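The slides do not spell out the formula. A minimal sketch, assuming the common definition of the Index of Dissimilarity (half the sum of absolute differences between two percentage distributions), could look as follows. The function name and all counts are hypothetical and serve only as an illustration; they are not taken from the presentation.

```python
def index_of_dissimilarity(sample_counts, reference_counts):
    """Half the sum of absolute differences between two percentage
    distributions (assumed definition; result is in percentage points)."""
    categories = set(sample_counts) | set(reference_counts)
    n_sample = sum(sample_counts.values())
    n_reference = sum(reference_counts.values())
    total = 0.0
    for category in categories:
        p = 100.0 * sample_counts.get(category, 0) / n_sample
        q = 100.0 * reference_counts.get(category, 0) / n_reference
        total += abs(p - q)
    return total / 2.0


# Hypothetical counts for legal marital status (illustration only).
survey = {"married": 620, "single": 250, "widowed": 70, "divorced": 60}
microcensus = {"married": 580, "single": 280, "widowed": 80, "divorced": 60}

d = index_of_dissimilarity(survey, microcensus)
print(f"Index of Dissimilarity: {d:.1f} percentage points")
```

Because only absolute differences enter the sum, swapping the two distributions gives the same value. This is the "con" noted above: the index does not show the direction of a deviation.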
8 B) Sampling Design and Data Quality I
9 Sampling Design and Data Quality II
- A) Random-Route (ADM-Design) (3 stages): Constituencies-Households-Individuals
- Definition of target household and target person by interviewer (according to rules)
- B) Address-Random (ADM-Design) (3 stages): Constituencies-Households-Individuals
- Definition of target household by field organization, definition of target persons by interviewer (according to rules)
- C) Sample with named individuals/Register (2 stages): Municipalities-Individuals
- Definition of target persons by field organization/researcher
10 Sampling Design and Data Quality III
- Hypotheses
- The greater the leeway for interviewers with regard to the selection of target persons, the greater the differences to the Microcensus
11 Mean of Summed Index of Dissimilarity and Mean Response Rate by Sampling Design
- Figure: mean Index of Dissimilarity (axis 0 to 10) and response rate in % (axis 0 to 80), by sampling design (SR / RR, AR, Named Indi.)
12 Sampling Design and Data Quality IV
- The more expensive the sampling design ...
- the higher the data quality
- the lower the response rate
13 C) Data Quality During the Course of Fielding Period I
- Research strategy
- Analyses of the deviations between the net samples and the Microcensus during the course of the fielding period
- For distributions, bivariate and multivariate associations
- Only for samples with named individuals (ALLBUS and ESS German Part)
14 Data Quality During the Course of Fielding Period II
- Questions
- The higher the response rates, the higher the data quality?
- Typical sequences?
- Differences between surveys?
15 Data Quality During the Course of Fielding Period III
- Hypotheses
- For distributions: during the fielding period, deviations from the Microcensus should decrease
- For (multivariate) associations: during fieldwork, no change (slight improvement)
16 Distributions: Mean of Index of Dissimilarity for 7 Socio-demographic Variables, by Response Rate, by Survey
17 Distributions: Employment Status, Index of Dissimilarity, by Response Rate, by Survey
18 Distributions: Level of Education, Index of Dissimilarity, by Response Rate, by Survey
19 Bivariate Associations
- Deviations between correlation coefficients for surveys and for the German Microcensus
- N = 15 correlations
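One way to summarise such deviations is sketched below. It rests on assumptions: Pearson correlations and a mean of absolute differences, neither of which the slides specify. The function names and the tiny data set are hypothetical. The slides also do not state which variable pairs make up the 15 correlations.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(data):
    """Map each unordered variable pair to its correlation coefficient."""
    return {
        (a, b): pearson(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


def mean_abs_deviation(survey, reference):
    """Mean absolute difference between corresponding correlations
    (assumed summary measure, not confirmed by the slides)."""
    r_survey = pairwise_correlations(survey)
    r_reference = pairwise_correlations(reference)
    return sum(abs(r_survey[k] - r_reference[k]) for k in r_survey) / len(r_survey)


# Hypothetical toy data (illustration only).
survey_data = {"age": [25, 40, 58, 33, 61], "educ": [3, 2, 1, 3, 1], "hhsize": [1, 3, 2, 4, 2]}
reference_data = {"age": [22, 45, 60, 35, 52], "educ": [3, 2, 1, 2, 1], "hhsize": [1, 4, 2, 3, 2]}

print(round(mean_abs_deviation(survey_data, reference_data), 3))
```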
20 Mean Deviations of Correlation Coefficients (N = 15): ALLBUS and ESS (German Part) vs. Microcensus
21 Multivariate Associations
- Deviations between β-coefficients and constant for surveys and for Microcensus
- Logistic regression models
- Dependent variable: employment status (in paid work / not in work)
- Independent variables: sex, age (4 categories), education (3 categories)
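The model described above can be written out as a sketch. It assumes dummy coding with one reference category for each categorical predictor; the slides do not state the coding. Under that assumption the model has six β-coefficients plus the constant, that is, seven estimates, which would be consistent with the N of 7 reported for the mean deviations. The absolute value in the summary measure is likewise an assumption, and "MC" stands for Microcensus.

```latex
\operatorname{logit} P(\text{in paid work})
  = \beta_0 + \beta_1\,\text{sex}
  + \sum_{j=1}^{3} \beta_{1+j}\,\text{age}_j
  + \sum_{k=1}^{2} \beta_{4+k}\,\text{educ}_k

\bar{d} = \frac{1}{7} \sum_{m=0}^{6}
  \left| \beta_m^{\text{survey}} - \beta_m^{\text{MC}} \right|
```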
22 Mean Deviations of β-Coefficients (N = 7): ALLBUS and ESS (German Part) vs. Microcensus
23 Data Quality During the Course of Fielding Period IV
- Association between response rates and data quality is not clear-cut
- Higher response rates tend to result in higher data quality, but
- in the case of distributions, improvements after a response rate of 30 to 35% has been achieved are marginal
- strong effect of field institute on the level of bias/deviation
24 Conclusions
- Sampling design does matter
- Response rates are not a good indicator for data quality
25 Thank you for your attention!
- http://www.gesis.org/dauerbeobachtung/Allbus/index.htm
26 Index of Dissimilarity: Mean for 7 Socio-demographic Variables, by Response Rate, by Survey
29 Level of Education: Index of Dissimilarity, by Response Rate, by Survey, Age < 50
30 Mean of Summed Index of Dissimilarity and Response Rate by Sampling Design