Title: HOUSEHOLD SURVEYS
1 HOUSEHOLD SURVEYS
- FRAMES AND SAMPLE SELECTION, DATA COLLECTION METHODS AND DATA PROCESSING
Martin Schaaper, OECD Directorate for Science, Technology and Industry, Economic Analysis and Statistics Division
2 FRAMES AND SAMPLE SELECTION
3 General advice in the model surveys
- It is important to minimise sampling error and non-sampling error (bias) by
  - using a population frame which accurately reflects the target population
  - using well-designed samples which are large enough to produce reliable data.
4 Household ICT use surveys
- Target population for OECD model survey
  - is the population about which we are producing estimates (the scope of the survey)
  - individuals aged 16-74 (or broader)
  - households with at least one member in that age group
  - some countries might have other scope restrictions, e.g. excluding non-residents.
5 Household ICT use surveys ctd
- Survey population
  - is the population actually covered by the survey (the coverage)
  - will not usually equal the target population, e.g. it may exclude some households in remote areas.
6 Household ICT use surveys ctd
- Survey frame (population frame)
  - list of units in the survey population
  - should be as complete and accurate as possible
  - ideally contains information to improve efficiency of sample.
- Missing units on the list can lead to bias if those units are different from the remaining population.
- Other problems with frames include duplication, dead units, insufficient information including poor contact information.
- Some countries (including Australia) use a frame of geographic areas rather than statistical units.
- Survey frames for household ICT use surveys vary a lot among OECD countries.
7 Household ICT use surveys ctd
- Samples selected from the frame
  - should produce reliable results for the target population and subgroups of the population (e.g. females, or households with children)
  - stratification e.g. by region, income, household type, degree of urbanisation
  - stratified random sampling e.g. of regions or units from the population register
  - systematic sampling e.g. from an ordered list
  - many countries use two-stage sampling e.g. by region then household (or dwelling)
- OECD member country household ICT use surveys vary a lot in how samples are selected.
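To make two of the selection methods listed above concrete, the following is a minimal sketch of stratified random sampling (the same sampling fraction in every stratum) and of systematic sampling from an ordered list. The frame, the `region` field, the function names and the sampling fraction are all made up for illustration; they do not describe any country's actual design.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, fraction, seed=0):
    """Draw a simple random sample of the same fraction within every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        n = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(units, n))
    return sample

def systematic_sample(ordered_frame, n, seed=0):
    """Take every k-th unit from an ordered list after a random start."""
    rng = random.Random(seed)
    k = len(ordered_frame) // n   # sampling interval
    start = rng.randrange(k)      # random start in [0, k)
    return [ordered_frame[start + i * k] for i in range(n)]

# Hypothetical frame: 1000 households, each tagged with an invented region.
frame = [{"id": i, "region": "urban" if i % 4 else "rural"} for i in range(1000)]

strat = stratified_sample(frame, lambda u: u["region"], fraction=0.1)
syst = systematic_sample(frame, n=50)
```

With equal fractions the strata appear in the sample in the same proportions as in the frame; a real design might instead oversample small strata so that subgroup estimates remain reliable.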
8 Household ICT use surveys ctd
- Sampling within the household
  - some surveys select an individual within a household
  - this should be random e.g. person with nearest birthday.
- Sample size
  - should be enough to produce reliable estimates for the total population and for subgroups according to the output to be produced (e.g. females, or households with children)
  - sample size is a trade-off between reliability and cost
  - sampling error is usually indicated by the standard error (or relative standard error).
- Sample size needs to be higher
  - in a more variable population
  - where the characteristic being measured is rare
  - where the level of detail required is greater.
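The link between sample size, standard error and rarity can be illustrated with a small calculation. This sketch assumes simple random sampling from a large population and an estimated proportion `p` (no finite population correction, no design effect), which is a simplification of the multi-stage designs most household surveys use; the function names are illustrative.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion p estimated from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

def relative_standard_error(p, n):
    """Standard error expressed as a fraction of the estimate itself."""
    return standard_error(p, n) / p

def required_sample_size(p, target_rse):
    """Smallest n for which the RSE of p is at most target_rse.

    Follows from RSE = sqrt((1 - p) / (p * n)) solved for n.
    """
    return math.ceil((1 - p) / (p * target_rse ** 2))

# Same target precision, common versus rare characteristic (illustrative values).
n_common = required_sample_size(p=0.5, target_rse=0.05)
n_rare = required_sample_size(p=0.05, target_rse=0.05)
```

Under these assumptions the rare characteristic needs a far larger sample than the common one for the same relative standard error, and the same reasoning applies to every subgroup for which separate estimates are required.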
9 Household ICT use surveys ctd
- Mandatory versus voluntary surveys
  - both types are found amongst OECD countries
  - non-response will tend to be higher in voluntary surveys, with implications for non-response bias and sampling error.
- Weighting
  - where the frame or the samples drawn are not representative, responses should be weighted according to an independent distribution of the population.
- Censuses
  - some countries have added ICT use questions to population censuses
  - estimates will not be subject to sampling error.
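One common way to weight responses to an independent population distribution is post-stratification: each respondent in a group receives the weight "known population count divided by number of respondents in that group". The sketch below assumes this technique; the age groups and benchmark counts are invented for illustration.

```python
from collections import Counter

def poststratification_weights(respondent_groups, population_counts):
    """Weight per group = independent population count / respondents in that group."""
    sample_counts = Counter(respondent_groups)
    return {g: population_counts[g] / sample_counts[g] for g in sample_counts}

# Hypothetical respondents by age group; the youngest group is under-represented.
respondents = ["16-34"] * 20 + ["35-54"] * 50 + ["55-74"] * 30
# Made-up benchmark counts, e.g. from population estimates.
population = {"16-34": 4000, "35-54": 3500, "55-74": 2500}

weights = poststratification_weights(respondents, population)
weighted_total = sum(weights[g] for g in respondents)
```

After weighting, the respondents in each group sum to that group's benchmark count, so the under-represented group no longer carries too little influence in the estimates.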
10 References
- Australian National Statistical Service (NSS), Basic Survey Design Manual: http://www.nss.gov.au/nss/home.nsf/SurveyDesignDoc?OpenViewRestrictToCategoryBasicSurveyDesign
- Eurostat, Methodological Manual for statistics on the Information Society (covers both business and household use of ICT surveys): http://europa.eu.int/estatref/info/sdds/en/infosoc/metmanual_2006.pdf
- OECD, Guide to Measuring the Information Society, Annex 3 (metadata): http://www.oecd.org/sti/ictmetadata
11 DATA COLLECTION METHODS
12 DATA COLLECTION TECHNIQUE
The following main techniques of collecting data are used, to a greater or lesser extent, for ICT use surveys:
- PERSONAL INTERVIEW
  - Face-to-face
  - Telephone interview
- POSTAL SURVEY (MAILED QUESTIONNAIRE)
- ELECTRONIC SURVEY
  - E-mail
  - Web based
13 PERSONAL INTERVIEW
Personal interview is a data collection technique used by most OECD countries for collecting data on household and individual access to and use of ICT.
- Interview surveys can generally be conducted by one of two methods
- FACE-TO-FACE INTERVIEW (PERSONAL IN-HOME SURVEY)
  - Traditional face-to-face interview (paper and pencil)
  - Computer assisted personal interviewing (CAPI)
- TELEPHONE INTERVIEW
  - Traditional telephone interviews
  - Computer assisted telephone dialling
  - Computer assisted telephone interviewing (CATI)
Interviews are generally easier for the respondent, especially if what is sought is opinions or impressions (individual behaviour can be observed and exchange of material/information between interviewer and respondent is possible).
14 FACE-TO-FACE INTERVIEW
- Strengths
  - Interaction
  - Opportunity for terms to be explained
  - Opportunity to probe or ask follow-up questions
  - Recommended for long surveys
  - Recommended for locations where telephone or Internet penetration is low
  - High response rate
- Weaknesses
  - Can be very time consuming
  - Resource intensive and very expensive
  - Interviewers have to be well trained
The face-to-face interview, very often in combination with a CAPI system (see later), is the most widely applied data collection method for the household survey on ICT use.
15 TELEPHONE INTERVIEW
- Strengths
  - Enables information to be gathered rapidly
  - Allows for personal contact
  - Allows follow-up questions to be asked
  - More flexible than face-to-face interviews
- Weaknesses
  - Not all telephone numbers are publicly listed
  - People often don't like the intrusion of a call to their homes
  - More difficult to contact the right person
  - Telephone interviews have to be relatively short
  - Inability to use visual aids
  - Lower response rate than face-to-face interviews
Telephone interviewing (CATI system) is also a common collection technique used in OECD countries for the household survey on ICT use.
16 COMPUTER ASSISTED INTERVIEW (CAPI/CATI)
- Strengths (advantages compared to paper and pen interviews)
  - No routing errors
  - Customising of questions
  - Data quality
  - Time - automatic clean data
- Weaknesses (compared to paper and pen interviews)
  - Need for highly experienced interviewers
  - Time - the construction and programming takes time.
  - Costs ?
A CAPI or CATI system is commonly used among the OECD countries for personal (face-to-face or telephone) interviews in household surveys (see further).
17 POSTAL SURVEY (MAILED QUESTIONNAIRE)
- Strengths
  - They are relatively inexpensive to administer.
  - They allow the respondent to fill it out at their own convenience.
- Weaknesses
  - Response rates from mail surveys are often very low
  - Data quality
  - Not suitable for very complex issues
  - Long time delays
18 ELECTRONIC (ONLINE) SURVEY
- Strengths
  - Less expensive than paying for postage or for interviewers
  - Easy manipulation of data
  - Respondents may answer more honestly
  - Data quality
  - Time saving
- Weaknesses
  - Population and sample limited
  - Security
  - More instruction may be necessary
  - May have technical problems with hardware and software.
19 MANDATORY VERSUS VOLUNTARY SURVEY
- Voluntary surveys are usually cheaper, quicker and easier to manage.
- The advantage of a mandatory survey is usually the higher response rate, thereby reducing the risk of serious non-response bias.
- However, a mandatory survey implies making several attempts to contact the respondent or sending several reminders. This process usually makes the collection period longer as one needs to wait a longer time for all responses.
20 DATA PROCESSING
21 DATA PROCESSING
- Coding, checking, data entry, editing and monitoring the whole data processing procedure.
- The main aim is to produce a data file free from errors.
- Systematic and sustained follow up, including a system of reminders
- Data quality control to identify errors can be executed
  - on-line, at the moment of the data capture by the interviewer
  - in the statistical institute (using electronic questionnaire)
  - after the data entry process
22 MEASUREMENT ERRORS
- Invalid response
- Relationship error
- Compulsory question left unanswered
- Suspicious values
23 EDITING
- Editing can identify only noticeable errors
- Records should only be transferred to the final computer file after they have passed through all the edit checks without a failure.
- Nevertheless, errors may still occur.
24 Main editing checks
- Structure checks
- Range edits
- Sequencing checks
- Duplication and omissions
- Logic edits
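The checks listed above can be sketched as a small validation routine. The field names (`age`, `used_internet`, `frequency`, `years_online`) and the specific rules are hypothetical, chosen only to show one example of each kind of check; the 16-74 age range is the target population scope mentioned earlier in the presentation.

```python
REQUIRED_FIELDS = ("id", "age", "used_internet", "frequency", "years_online")

def edit_checks(records):
    """Return (record id, failed check) pairs for a batch of captured questionnaires."""
    failures = []
    seen_ids = set()
    for rec in records:
        rid = rec.get("id")
        # Duplication: the same questionnaire id captured more than once.
        if rid in seen_ids:
            failures.append((rid, "duplicate"))
        seen_ids.add(rid)
        # Structure check: every expected field must be present.
        if any(field not in rec for field in REQUIRED_FIELDS):
            failures.append((rid, "structure"))
            continue
        # Range edit: age must fall inside the survey scope.
        if not 16 <= rec["age"] <= 74:
            failures.append((rid, "range"))
        # Sequencing check: non-users should have skipped the follow-up question.
        if rec["used_internet"] == "no" and rec["frequency"] is not None:
            failures.append((rid, "sequencing"))
        # Logic edit: years of Internet use cannot exceed the respondent's age.
        if rec["years_online"] is not None and rec["years_online"] > rec["age"]:
            failures.append((rid, "logic"))
    return failures

records = [
    {"id": 1, "age": 30, "used_internet": "yes", "frequency": "daily", "years_online": 10},
    {"id": 2, "age": 80, "used_internet": "no", "frequency": None, "years_online": None},
    {"id": 3, "age": 40, "used_internet": "no", "frequency": "weekly", "years_online": None},
    {"id": 4, "age": 20, "used_internet": "yes", "frequency": "daily", "years_online": 25},
    {"id": 4, "age": 20, "used_internet": "yes", "frequency": "daily", "years_online": 5},
    {"id": 5, "age": 50},
]
failures = edit_checks(records)
```

A record would be passed to the final file only if it produces no entry in `failures`; omissions could be detected by comparing `seen_ids` against the list of questionnaires that were expected.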
25 NON-RESPONSE TREATMENT
- Two types
  - item non-response
  - unit non-response
- Non-response treatment
  - Unit non-response is generally handled by adjusting the weight of the households and/or individuals that responded to the survey to compensate for those that did not respond.
  - Item non-response is generally dealt with by imputation.
26 Effect of non-response on the quality of the data
- Non-response (unit as well as item non-response) can seriously affect the quality of the data collected in a survey
  - The characteristics of non-respondents may differ from those of respondents, which introduces bias.
  - Reduction of the sample size (overall or for certain questions) will increase the variance of the estimates.
  - Impact on total cost of survey
  - Could be an indicator of poor overall quality of the survey and thus create an image or confidence problem.
27 UNIT NON-RESPONSE HOUSEHOLD SURVEY
- Ineligible cases
  - Out-of-scope case (selected element is not in the target population)
  - Other ineligible
- Eligible cases
  - Non-contact (e.g. no one was at home)
  - Refusal (e.g. selected individual was contacted but refused to take part in the survey)
  - Rejected interview (e.g. the selected individual did take part but the survey form cannot be used because it was poorly filled out)
  - Other non-response
28 WEIGHTING ADJUSTMENT FOR UNIT NON-RESPONSE
- Weighting classes
  - In order to implement non-response adjustments, it is necessary to create weighting classes. It is desirable to divide the sample into "response homogeneity groups/classes".
  - Within these classes the response rates should be as homogeneous as possible, and the response rate should differ among the classes. Data used to form these classes must be available for both non-respondents and respondents.
  - For household surveys, information about demographic (age, gender, ethnicity), geographical (urban/rural, zip code) or socioeconomic (employment, income) variables is usually available from administrative data.
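A minimal sketch of a weighting-class adjustment follows, assuming the usual approach of inflating each respondent's weight by the inverse of the weighted response rate in its class. The class variable (`cls`), the starting weights and the response pattern are invented for illustration.

```python
from collections import defaultdict

def nonresponse_adjusted_weights(sample):
    """Inflate respondents' weights so they also represent non-respondents in their class.

    sample: list of dicts with keys 'weight', 'cls' and 'responded'.
    Returns copies of the responding units with adjusted weights.
    """
    total = defaultdict(float)      # weighted count of all sampled units per class
    responded = defaultdict(float)  # weighted count of respondents per class
    for unit in sample:
        total[unit["cls"]] += unit["weight"]
        if unit["responded"]:
            responded[unit["cls"]] += unit["weight"]
    return [
        {**unit, "weight": unit["weight"] * total[unit["cls"]] / responded[unit["cls"]]}
        for unit in sample
        if unit["responded"]
    ]

# Hypothetical sample: urban response rate 3/6, rural response rate 3/4.
sample = (
    [{"cls": "urban", "weight": 10.0, "responded": i < 3} for i in range(6)]
    + [{"cls": "rural", "weight": 10.0, "responded": i < 3} for i in range(4)]
)
adjusted = nonresponse_adjusted_weights(sample)
```

As long as every class contains at least one respondent, the adjusted weights sum to the same total as the original sample weights; classes with lower response rates simply receive larger adjustment factors.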
29 ITEM NON-RESPONSE TREATMENT HOUSEHOLD SURVEY
Sampling units with very high item non-response are better classified as total (unit) non-response. In surveys on households' and individuals' access to and use of ICT there are some systematic patterns in the occurrence of non-response. For example, non-response is likely to be higher among older or less educated respondents, as they are more at risk of not understanding the questions. We can take this into account by imputing within strata or classes. But when the research variable itself (e.g. Internet use) may be the critical factor for the willingness or ability to provide an answer, the risk remains of wrongly imputing the data of ICT users (who feel concerned and happily answer the questions) to non-users of ICT (who drop out because they consider themselves not concerned by the survey). The logical solution to this problem would be not to impute at all.
30 IMPUTATION HOUSEHOLD SURVEY
Deductive methods: These methods are related to heuristics rather than to modelling. They try to deduce the most logical answer using the available information for the household or individual. In general, such procedures will be part of the validation checks and not of the non-response treatment.
Imputing the mean or mode: This method consists of imputing missing values by the mean observed in the group of respondents in the case of numerical variables, or by the mode in the case of categorical or binary variables. The big advantage of this method is that it is very easy to implement and to explain.
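Mean or mode imputation is simple enough to sketch in a few lines. The version below imputes within classes, as suggested earlier for item non-response; the variable names and the data are hypothetical, and a class with no observed values is left unimputed.

```python
from collections import Counter, defaultdict
from statistics import mean

def impute_within_classes(records, variable, class_var, numeric):
    """Fill missing values (None) with the class mean (numeric) or class mode (categorical)."""
    observed = defaultdict(list)
    for rec in records:
        if rec[variable] is not None:
            observed[rec[class_var]].append(rec[variable])
    fill = {
        cls: mean(vals) if numeric else Counter(vals).most_common(1)[0][0]
        for cls, vals in observed.items()
    }
    # Classes without any observed value stay missing (fill.get returns None).
    return [
        {**rec, variable: fill.get(rec[class_var])} if rec[variable] is None else dict(rec)
        for rec in records
    ]

# Hypothetical responses: weekly hours online and broadband access by age group.
records = [
    {"age_group": "16-34", "hours": 10, "broadband": "yes"},
    {"age_group": "16-34", "hours": 14, "broadband": "yes"},
    {"age_group": "16-34", "hours": None, "broadband": None},
    {"age_group": "55-74", "hours": 2, "broadband": "no"},
    {"age_group": "55-74", "hours": 4, "broadband": "no"},
    {"age_group": "55-74", "hours": None, "broadband": None},
]
step1 = impute_within_classes(records, "hours", "age_group", numeric=True)
step2 = impute_within_classes(step1, "broadband", "age_group", numeric=False)
```

Imputing the class mean leaves the class average unchanged but understates the variability of the variable, which is one reason the method is easy to explain yet should be used with care.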
31 WEIGHTING / GROSSING UP METHODS
Household survey: The weighting factors are to be calculated taking into account, in particular, the probability of selection and external data relating to the distribution of the population being surveyed, where such external data are held to be sufficiently reliable. As the sampling design used differs strongly across countries, it is difficult to present one-size-fits-all guidelines. Moreover, the weighting procedures / grossing up methods are usually determined by the sampling design used.
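As an illustration of how the probability of selection enters the weights, the sketch below assumes a hypothetical two-stage design and uses the inverse of the overall selection probability as the design weight, then grosses up sample values to a population total. The probabilities and responses are invented; an actual survey would follow its own design and would normally adjust these weights further using external data.

```python
def design_weight(*stage_probabilities):
    """Inverse of the overall selection probability across all sampling stages."""
    overall = 1.0
    for p in stage_probabilities:
        overall *= p
    return 1.0 / overall

def grossed_up_total(values, weights):
    """Weighted sum of sample values, an estimate of the population total."""
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical two-stage design: region selected with probability 0.2,
# household selected within the region with probability 0.01.
w = design_weight(0.2, 0.01)

# 1 = household has Internet access, 0 = it does not (made-up responses).
has_internet = [1, 0, 1, 1]
estimated_households_online = grossed_up_total(has_internet, [w] * len(has_internet))
```

Each sampled household here stands for roughly 500 households in the population, so three households with access gross up to about 1500.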
32 REFERENCES
- EUROSTAT
  - Methodological Manual for statistics on the Information Society (2006): http://europa.eu.int/estatref/info/sdds/en/infosoc/metmanual_2006.pdf
- OECD
  - Guide to Measuring the Information Society (2005): http://www.oecd.org/sti/ictmetadata
  - Metadata for OECD Countries' ICT Collections 2004/2005: http://www.oecd.org/sti/ictmetadata
33 THANK YOU!
- martin.schaaper_at_oecd.org