Title: Paradata Collection Research for Social Surveys at Statistics Canada
1. Paradata Collection Research for Social Surveys at Statistics Canada
- François Laflamme
- International Total Survey Error Workshop (ITSEW), Quebec, June 2011
2. Outline
- Data collection organization
- Data collection challenges
- Paradata
- Sources
- Database
- Paradata Research
- Objectives
- Scope
- Past research
- Current and future plans
3. Data Collection Organization
- 3 regions and 8 locations across Canada
- Head Office (HO) collects quarterly and annual business survey data
- All other business surveys moved to the regions
- Interviewers (1,500) collect the following data
- Face-to-face CAPI/PAPI (monthly 100,000 attempts)
- Concurrent surveys
- CATI call centres (5) (monthly 900,000 calls)
- Household, agriculture and business surveys
- Concurrent surveys
- Unionization and operational constraints
4. Data Collection Challenges
- Handling increasingly sophisticated and numerous data requirements
- Maintaining acceptable response rates
- Ensuring highest quality of data collected
- Optimizing capacity
- Balancing work within and between Regional Offices
- Retention of workforce
- Reducing / maintaining collection costs
- Developing and deploying surveys consistently, cost-effectively and in a timely manner
- Keeping abreast of evolving survey collection methodologies and technologies (e.g. multi-mode surveys)
- Taking into account operational constraints
5. Paradata Sources
- Paradata is data collection process information
- Paradata sources
- Call and contact information
- Audit trail (interview keystrokes)
- Interviewer administrative and payroll information
- Interviewer notes and observations (not used extensively)
- Can be enhanced with
- Sample design and sample unit information
- Capacity and planning assumptions
- Budget and target figures
- Paradata from previous cycle or supplement surveys
6. Paradata Database
- Paradata Database includes
- Call/attempt information for both
- Computer-Assisted Telephone Interview (CATI) surveys
- Computer-Assisted Personal Interview (CAPI) surveys
- Interviewer payroll information
- Processed and standardized information
- Raw files always available
- Historical information since 2003
- Updated on a daily basis
- Prior to 2006, used for reporting purposes, not for research
- Audit trail kept separately
7. Paradata Research
- Paradata can be used for
- Operational research (including survey management)
- Essentially before and during data collection
- Methodological research
- Historically, the focus is after data collection (e.g. non-response and measurement errors)
- Often a grey zone between the two types of research
- Need to make the link between operational and methodological research
2020-12-16
Statistics Canada Statistique Canada
8. Paradata Research Objectives
- Better understand data collection process
- Identify potential operational efficiencies
- Evaluate new data collection initiatives
- Provide timely feedback and information
- Data collection survey management (Active Management)
- Maintain and improve data quality
- Improve the way surveys are conducted and managed
9. Paradata Research Scope
- Initial focus on
- CATI social surveys
- RDD, cross-sectional, longitudinal surveys
- Call and contact information
- Extended to
- CATI agriculture surveys
- CAPI surveys
- Payroll information
- Audit trail
- And more recently to
- Business surveys
10. Past Research
- Initial analysis
- Effort spent: calls and system time
- Reaching respondents: contact rate, sequence of calls, best time to call, contact versus interview, etc.
- Active management
- Customized reports
- Dashboard of key survey performance indicators
- Impact of cap on calls
- On response rates, survey estimates and costs
- Production and cost analysis
- Relationship between production and cost
- Productivity indicators and survey cost analysis
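As an illustration of the "contact rate" and "best time to call" analyses listed above, here is a minimal Python sketch. The record layout, the outcome codes and the three time slots are assumptions made for the example; they are not the structure of the Statistics Canada paradata database.

```python
from collections import defaultdict

# Hypothetical call records: (case_id, hour_of_day, outcome).
# Outcome codes are illustrative assumptions, not official codes.
CALLS = [
    ("A", 10, "no_contact"),
    ("A", 18, "contact"),
    ("A", 19, "interview"),
    ("B", 10, "no_contact"),
    ("B", 14, "contact"),
    ("B", 15, "no_contact"),
    ("C", 11, "contact"),
    ("C", 18, "interview"),
]

# A call counts as a contact if someone was reached, interviewed or not.
CONTACT_OUTCOMES = {"contact", "interview"}


def time_slot(hour):
    """Map an hour of the day to a coarse slot (assumed boundaries)."""
    if hour < 12:
        return "morning"
    if hour < 17:
        return "afternoon"
    return "evening"


def contact_rate_by_slot(calls):
    """Return {slot: contacts / calls} over all call attempts."""
    totals = defaultdict(int)
    contacts = defaultdict(int)
    for _case, hour, outcome in calls:
        slot = time_slot(hour)
        totals[slot] += 1
        if outcome in CONTACT_OUTCOMES:
            contacts[slot] += 1
    return {slot: contacts[slot] / totals[slot] for slot in totals}


def best_slot(calls):
    """Slot with the highest contact rate."""
    rates = contact_rate_by_slot(calls)
    return max(rates, key=rates.get)


if __name__ == "__main__":
    for slot, rate in contact_rate_by_slot(CALLS).items():
        print(f"{slot}: {rate:.2f}")
    print("best slot:", best_slot(CALLS))
```

In practice such rates would likely be broken down further (by day of week, call sequence number or survey), but the calculation stays the same: contacts divided by attempts within each group.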
11. Past Research (2 of 3)
- Pace of interview (PoINT)
- CAPI surveys - Initial investigations
- Basic analysis: attempts, time spent, contact rates
- Paradata quality and consistency
- Productivity and cost relationship
- Interaction between CAPI surveys
- Responsive Collection Design for CATI surveys
- Active management
- Identify a series of new indicators to assess data collection quality and performance (e.g. representativity, productivity and cost, a measure of the responding potential of in-progress cases)
- Implementation - two pilot surveys
- Analysis
12. Past Research (3 of 3)
- Many ad hoc research projects
- Interviewer productivity by level of experience
- Interaction between concurrent surveys
- System time versus non-system time, etc.
- Research increased knowledge about data collection process and practices
- Demonstrate potential benefits, based on facts (empirical data)
- Investigate, test and implement new collection strategies and tools
- Think outside the box
- Strike a balance between theory and practice
- Focus on operationally viable projects
- Communicate and share information
- Documentation, papers, presentations, seminars, etc.
13. Distribution of Calls and Time by Collection Phase
- More calls and system time spent after a first contact for both respondents and non-respondents
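A breakdown like the one summarized on this slide could be computed along the following lines. This is only a sketch: the record layout is hypothetical, and the choice to count the first-contact call itself in the "up to first contact" phase is an assumption, not necessarily the definition used in the original analysis.

```python
# Hypothetical call records: (case_id, call_sequence, system_seconds, made_contact).
CALLS = [
    ("A", 1, 30, False),
    ("A", 2, 120, True),
    ("A", 3, 60, False),
    ("A", 4, 900, True),
    ("B", 1, 20, False),
    ("B", 2, 40, False),
    ("C", 1, 200, True),
    ("C", 2, 50, False),
    ("C", 3, 45, False),
]


def split_by_phase(calls):
    """Total calls and system time before/after each case's first contact.

    Assumption: the call that achieves the first contact is counted in the
    'up_to_first_contact' phase; every later call for that case is 'after'.
    """
    summary = {
        "up_to_first_contact": {"calls": 0, "seconds": 0},
        "after_first_contact": {"calls": 0, "seconds": 0},
    }
    contacted = set()
    # Process each case's calls in call-sequence order.
    for case_id, _seq, seconds, made_contact in sorted(
        calls, key=lambda c: (c[0], c[1])
    ):
        phase = (
            "after_first_contact" if case_id in contacted else "up_to_first_contact"
        )
        summary[phase]["calls"] += 1
        summary[phase]["seconds"] += seconds
        if made_contact:
            contacted.add(case_id)
    return summary


if __name__ == "__main__":
    for phase, totals in split_by_phase(CALLS).items():
        print(phase, totals)
```

Running the same split separately for final respondents and non-respondents would give the two-group comparison the slide refers to.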
14. Relationship between Production and Cost Throughout Survey Cycle
- Strong relationship
- Most distributions have the same shape
- System time is a good predictor for payroll hours
- Ratios of cost to production can be used to derive productivity indicators
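To make the two quantitative points above concrete, the sketch below fits a simple least-squares line predicting payroll hours from system hours and computes a cost-to-production ratio (payroll hours per completed interview). All figures are invented for illustration, and a straight-line fit is an assumption of this example; the slides do not state which model was used.

```python
# Hypothetical weekly totals for one survey (illustrative numbers only).
SYSTEM_HOURS = [100, 200, 300, 400]
PAYROLL_HOURS = [170, 320, 470, 620]
COMPLETED_INTERVIEWS = [50, 80, 94, 100]


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def hours_per_complete(payroll_hours, completes):
    """Cost-to-production ratio: payroll hours per completed interview."""
    return [h / c for h, c in zip(payroll_hours, completes)]


if __name__ == "__main__":
    slope, intercept = fit_line(SYSTEM_HOURS, PAYROLL_HOURS)
    print(f"payroll_hours = {slope:.2f} * system_hours + {intercept:.2f}")
    for week, ratio in enumerate(
        hours_per_complete(PAYROLL_HOURS, COMPLETED_INTERVIEWS), start=1
    ):
        print(f"week {week}: {ratio:.2f} payroll hours per complete")
```

A rising hours-per-complete ratio over the collection period is one way such a cost-to-production ratio can serve as a productivity indicator.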
15. Survey Productivity Indicators
- Based on time
- Completed Interview System Time / Total System Time Ratios
- Productivity ratios decrease during collection period for CATI
- Longitudinal CATI survey (SLID) shows larger decreases
- Productivity for CAPI survey is higher and more stable
- This ratio is affected by interview length and response rate
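The time-based ratio defined above can be written as a short Python sketch. The weekly figures and the tuple layout are hypothetical; only the ratio itself (completed-interview system time over total system time) comes from the slide.

```python
# Hypothetical weekly totals:
# (collection_week, completed_interview_system_seconds, total_system_seconds)
WEEKLY_TOTALS = [
    (1, 3600, 6000),
    (2, 3000, 6000),
    (3, 1500, 5000),
]


def productivity_ratio(completed_seconds, total_seconds):
    """Share of system time spent on interviews that ended as completes.

    Returns None when no system time was logged, to avoid dividing by zero.
    """
    if total_seconds == 0:
        return None
    return completed_seconds / total_seconds


def ratios_by_week(weekly_totals):
    """Return {week: productivity ratio} for each collection week."""
    return {
        week: productivity_ratio(completed, total)
        for week, completed, total in weekly_totals
    }


if __name__ == "__main__":
    for week, ratio in ratios_by_week(WEEKLY_TOTALS).items():
        print(f"week {week}: {ratio:.2f}")
```

With these invented numbers the ratio falls from week to week, which mirrors the pattern the slide reports for CATI surveys. Because longer interviews and higher response rates both raise the numerator, comparisons between surveys need to take those two factors into account.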
16. Current and Future Research Plans
- Focus on strategies to improve the way data collection is conducted and managed
- Hence the research needs to
- Be sound and operationally viable
- Lead to more cost-effective collection and sample design strategies
- Lead to data quality improvements
17. Current and Future Research Plans (2 of 3)
- Responsive Collection Design (RCD) - ongoing
- Full RCD for SLID 2011 (including embedded experiment for 1st call)
- Improve current RCD strategy (e.g. propensity models, phase-in of RCD, new conditions for decision making, cost-efficiency objective)
- RCD for CAPI surveys
- Documentation
- CATI cost-efficient framework (5 dimensions)
- Metrics used for costing and budgeting
- Optimal resource allocation within and between surveys (2)
- Collection process and practices
- Operational constraints
- Investigate approaches and assumptions to plan data collection for multi-mode surveys
18. Current and Future Research Plans (3 of 3)
- Paradata course
- Describe the paradata (e.g. type, contents, quality, etc.)
- Applications of paradata to plan, manage, monitor, assess and improve the survey process
- Share experiences
- Long and short versions
- Other paradata research projects
- Sample coordination for CAPI surveys
- Consolidate and extend the use of audit trail
- RCD - Theoretical framework
- Simulation and optimization projects
- Ad hoc research
19. Potential Issues for Discussion
- Are there important gaps in paradata research? If so
- Which type of research needs to be done?
- What are the research priorities?
- Any specific research with respect to TSE?
- Sharing information (communication)
- Paradata working group, conferences/events (paradata sessions in many international events), international network: Is it enough/too much? Is it efficient?
- Potential collaboration between organizations - can it be improved?
- What is the most efficient organizational structure for this type of research?
20. For more information, please contact
François Laflamme francois.laflamme_at_statcan.gc.ca