1
Use of Administrative Data in Statistics Canada's
Annual Survey of Manufactures
  • Steve Matthews and Wesley Yung
  • May 16, 2004
  • The United Nations Statistical Commission and
    Economic Commission for Europe
  • Conference of European Statisticians

2
Outline
  • Introduction
  • Tax data programs at Statistics Canada
  • The Annual Survey of Manufactures (ASM)
      • Overview
      • Strategy for use of tax data
  • Analytical studies
  • Conclusions and Future Work

3
Introduction
  • Desire to increase use of tax data
      • Reduce respondent burden
      • Reduce survey costs
  • Can be used at many stages of the survey process
      • Stratification
      • Survey data validation
      • Edit and imputation
      • Estimation

4
Tax Data Programs at Statistics Canada
  • Tax data available to Statistics Canada
      • Collected by Canada Revenue Agency (CRA)
      • Access via a data-sharing agreement
      • To be used only for statistical purposes
  • Two extensive tax data programs
      • Unincorporated businesses (T1)
      • Incorporated businesses (T2)

5
Tax Data Programs at Statistics Canada (cont'd)
  • T1 - Population
      • Unincorporated businesses
      • Account for a small share of revenues
  • Administrative Data
      • Sample-based
      • Limited set of variables
      • Edit and imputation is applied
      • Weighted benchmarked estimates

6
Tax Data Programs at Statistics Canada (cont'd)
  • T2 - Population
      • Incorporated businesses
      • Account for a large share of revenues
  • Administrative Data
      • Census-based
      • Extensive set of variables
      • Edit and imputation is applied
      • Microdata are produced

7
The Annual Survey of Manufactures
  • Manufacturing is an important sector of the Canadian
    economy
      • 17% of GDP
  • Annual Survey of Manufactures
      • Take-none portion and survey portion
      • Extensive questionnaire (financial and commodity)
      • Data requirements (pseudo-census)

8
The Annual Survey of Manufactures (cont'd)
  • Target population
      • Drawn from Statistics Canada's Business Register
        (BR)
      • All businesses classified to manufacturing
  • Sample design
      • Non-survey portion
          • Administrative data
      • Survey portion
          • Stratified SRS (stratum = NAICS x Province x
            Size)
          • Small take-some / Large take-some / Take-all
          • Collected via mail-out / mail-back, follow-up via
            telephone
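The stratified design above can be sketched as follows. This is a minimal illustration, not the ASM's production sampling system: the record layout (`naics`, `province`, `size` keys), the size-class labels and the per-stratum sample sizes are all hypothetical, and the frame is assumed to contain only the survey portion (the non-survey portion is covered by administrative data). Within each NAICS x Province x Size stratum a simple random sample is drawn without replacement, and each selected unit receives the design weight N_h / n_h, so take-all strata end up with weight 1.

```python
import random
from collections import defaultdict

def stratified_srs(frame, sample_sizes, seed=0):
    """Draw an SRS without replacement within each stratum.

    frame:        list of dicts with hypothetical keys 'id', 'naics', 'province', 'size'
    sample_sizes: dict mapping (naics, province, size) -> n_h
    Returns a list of (unit_id, design_weight) pairs.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        # Stratum = NAICS x Province x Size class
        strata[(unit["naics"], unit["province"], unit["size"])].append(unit["id"])

    sample = []
    for key, ids in strata.items():
        N_h = len(ids)
        n_h = min(sample_sizes.get(key, 0), N_h)
        if n_h == 0:
            continue
        chosen = rng.sample(ids, n_h)
        weight = N_h / n_h  # take-all strata (n_h == N_h) get weight 1
        sample.extend((uid, weight) for uid in chosen)
    return sample
```

With this convention the design weights in each stratum sum to that stratum's population count N_h, which is a quick sanity check on any sample drawn this way.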

9
The Annual Survey of Manufactures (cont'd)
  • Edit and Imputation
      • Edits applied to ensure accuracy and coherence
      • Extensive imputation to produce a pseudo-census
        dataset
          • Historical imputation
          • Ratio imputation
          • Nearest-neighbour donor imputation
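The three imputation methods listed above can be illustrated with small helper functions. This sketch assumes one common form of each method, applied within a single imputation class: historical imputation scales the unit's previous-year value by the year-over-year trend among units that reported in both years; ratio imputation multiplies an auxiliary variable (for example frame revenue) by the respondents' ratio of totals; nearest-neighbour donor imputation copies the value of the respondent whose auxiliary value is closest. The function names and inputs are hypothetical, and the actual ASM imputation classes, auxiliary variables and order of methods are not given in the slides.

```python
def historical_imputation(prev_value, pairs):
    """Previous-year value times the trend among units reporting in both years.

    pairs: list of (previous_year_y, current_year_y) for respondents in the class.
    """
    trend = sum(cur for _, cur in pairs) / sum(prev for prev, _ in pairs)
    return prev_value * trend

def ratio_imputation(x_missing, respondents):
    """Ratio imputation: y* = (sum of y_r / sum of x_r) * x for the missing unit.

    respondents: list of (x, y) pairs, x being an auxiliary variable.
    """
    sum_x = sum(x for x, _ in respondents)
    sum_y = sum(y for _, y in respondents)
    return (sum_y / sum_x) * x_missing

def nearest_neighbour_donor(x_missing, donors):
    """Copy y from the donor whose auxiliary x is closest to the recipient's x."""
    _, y_donor = min(donors, key=lambda d: abs(d[0] - x_missing))
    return y_donor
```

For example, with respondents (100, 80), (200, 150) and (300, 250), ratio imputation gives a unit with x = 50 an imputed value of 50 x 480 / 600.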

10
The Annual Survey of Manufactures (cont'd)
  • Estimation
      • Non-survey portion (tax data)
          • Total Expenses only
          • T1 weighted domain estimates
          • T2 aggregates from administrative census dataset
      • Survey portion (survey data and imputed data)
          • Aggregates from pseudo-census dataset
      • Domains of interest: NAICS and Province
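To make the estimation strategy concrete, the sketch below combines the three sources into one Total Expenses estimate per NAICS x Province domain. It assumes, for illustration only, that the domain total is simply the sum of the T1 weighted estimate, the T2 administrative aggregate and the survey-portion pseudo-census aggregate; the record layouts and the function name `domain_totals` are hypothetical.

```python
from collections import defaultdict

def domain_totals(t1_sample, t2_census, pseudo_census):
    """Combine the sources into Total Expenses per (NAICS, province) domain.

    t1_sample:     iterable of (domain, weight, y) for sampled T1 records
    t2_census:     iterable of (domain, y) for all T2 records in the non-survey portion
    pseudo_census: iterable of (domain, y) for every survey-portion unit,
                   where y is either reported or imputed
    """
    totals = defaultdict(float)
    for domain, w, y in t1_sample:
        totals[domain] += w * y  # T1: weighted domain estimate
    for domain, y in t2_census:
        totals[domain] += y      # T2: aggregate of administrative census records
    for domain, y in pseudo_census:
        totals[domain] += y      # survey portion: aggregate of pseudo-census records
    return dict(totals)
```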

11
Analytical Studies
  • Motivation for two studies
      • Which survey variables should be replaced with tax data?
      • What are the effects of the strategy on final
        estimates for all variables?

Study 1: Data comparison
Study 2: Impact analysis
12
Analytical Study 1
  • Study to select appropriate variables
      • Comparison of data reported via the survey
        and via tax
      • Simple businesses only
      • Assess suitability of tax data as a substitute for
        survey data

Based on 6,000 businesses
13
Analytical Study 1 (cont'd)
  • Correlation Analysis
      • Wide range of correlations
      • Total Expenses: 0.9
      • Total Energy Expenses: -0.10
  • Reporting Patterns
      • Share of businesses with the same pattern (zero or
        positive) in both sources
      • Total Expenses: 99%
      • Total Energy Expenses: 50%

14
Analytical Study 1 (cont'd)
  • Distribution of Ratios
      • Examined histograms; fraction of ratios between 0.9
        and 1.1
      • Total Expenses: 60%
      • Total Energy Expenses: 16%
  • Population Estimates
      • Relative difference between tax-based and survey-based
        estimates
      • Total Expenses: 3%
      • Total Energy Expenses: 28%
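The four Study 1 measures can be computed for one variable roughly as below. This is a sketch under stated assumptions: survey and tax values are paired for the same businesses, the ratio is taken as tax over survey with zero survey values skipped, the population estimates use hypothetical design weights, and the relative difference is taken in absolute value against the survey-based estimate. The slides do not confirm any of these conventions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_sources(survey, tax, weights):
    """Agreement measures between survey and tax values of one variable.

    survey, tax: values for the same businesses, in the same order
    weights:     hypothetical design weights for the population estimates
    """
    n = len(survey)
    # Reporting pattern: both sources zero, or both positive
    same_pattern = sum((s > 0) == (t > 0) for s, t in zip(survey, tax)) / n
    # Tax/survey ratios, skipping units with a zero survey value
    ratios = [t / s for s, t in zip(survey, tax) if s != 0]
    near_one = sum(0.9 <= q <= 1.1 for q in ratios) / len(ratios)
    # Weighted population estimates from each source
    est_survey = sum(w * s for w, s in zip(weights, survey))
    est_tax = sum(w * t for w, t in zip(weights, tax))
    return {
        "correlation": pearson(survey, tax),
        "same_pattern_pct": 100 * same_pattern,
        "ratio_near_one_pct": 100 * near_one,
        "rel_diff_pct": 100 * abs(est_tax - est_survey) / est_survey,
    }
```

The Pearson coefficient is written out by hand so the sketch has no dependency on a particular Python version.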

15
Analytical Study 1 (cont'd)
  • Selected several variables for direct
    substitution
      • Section totals and sub-totals
      • Expenses, revenues, inventories, etc.
  • Remaining variables are imputed
      • Imputation: assign the distribution of details
        within each total
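When a total is substituted directly from tax data, its details still have to be filled in. One simple way to assign the distribution of details within each total, assumed here for illustration, is to prorate the tax total using detail shares taken from another source, such as the unit's previous-year breakdown or a donor's. The function name and the detail names are hypothetical.

```python
def prorate_details(tax_total, detail_shares):
    """Split a directly substituted total into detail items.

    detail_shares: mapping detail name -> share or un-normalized weight,
                   e.g. from a donor's or the unit's previous-year breakdown.
    The returned details add up to tax_total (up to floating-point rounding).
    """
    total_weight = sum(detail_shares.values())
    if total_weight <= 0:
        raise ValueError("detail shares must sum to a positive value")
    return {k: tax_total * v / total_weight for k, v in detail_shares.items()}
```

For instance, `prorate_details(1000.0, {"materials": 30, "wages": 50, "energy": 20})` splits the total 30/50/20, so the imputed details stay coherent with the substituted total.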

16
Analytical Study 1 - Conclusions
  • Distinctly different results for different
    variables
  • Direct substitution seems feasible for totals
  • Direct substitution not recommended for details
  • Use standard methods to impute other variables

17
Analytical Study 2
  • Analysis to evaluate impact of tax data strategy
      • Bias
          • Comparison of estimates from different scenarios
      • Variance
          • Shao-Steel approach for variance estimation
          • Reflects variance from sampling and imputation
          • Assumes equal probability of response within
            imputation class

18
Analytical Study 2 (cont'd)
  • Scenarios

Scenario    Tax data used in imputation                  Estimator         Variance
HT No Tax   None (ratio imputation based on frame        Horvitz-Thompson  Sampling + imputation
            revenues)
PC No Tax   None (ratio imputation based on frame        Pseudo-census     Imputation
            revenues)
PC Tax      Non-response (in or out of sample):          Pseudo-census     Imputation
            direct substitution and ratio imputation
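For the HT No Tax scenario, the sampling part of the variance under stratified SRS without replacement follows the standard formula V = sum over h of N_h^2 (1 - n_h/N_h) s_h^2 / n_h, and the CV is the square root of V divided by the estimated total. The sketch below implements only that sampling component; it does not implement the Shao-Steel imputation variance used in the study, and its input layout is hypothetical. Take-all strata (n_h = N_h) contribute no sampling variance, and strata with a single sampled unit are skipped because s_h^2 cannot be estimated from one value.

```python
import math
from statistics import variance

def stratified_ht(strata):
    """Horvitz-Thompson total, sampling variance and CV under stratified SRSWOR.

    strata: list of (N_h, sample_values) pairs, one per stratum.
    """
    total = 0.0
    var = 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        total += N_h / n_h * sum(ys)  # each unit carries design weight N_h / n_h
        if 1 < n_h < N_h:
            s2 = variance(ys)  # sample variance with n_h - 1 denominator
            var += N_h ** 2 * (1 - n_h / N_h) * s2 / n_h
    cv = math.sqrt(var) / total if total else float("nan")
    return total, var, cv
```

For a domain, the same calculation would be applied to the domain's values only (zero for units outside it); that extension is left out to keep the sketch short.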
19
Analytical Study 2 (cont'd)
  • Comparison of resulting estimates for Total
    Expenses
      • Relative difference (%) from the HT No Tax estimate of
        Total Expenses
      • NAICS3 x Province: median value over all such
        domains

Scenario     All Manufacturing   NAICS3 x Province
PC No Tax    1.8                 0.0
PC Tax       0.5                 1.3
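For NAICS3 x Province, the table reports the median over domains of each scenario's relative difference from the HT No Tax estimate. A minimal sketch of that summary follows, assuming the relative difference is taken in absolute value and expressed in percent (the slides do not say whether it is signed); the domain keys and values are hypothetical.

```python
from statistics import median

def relative_difference_pct(estimate, reference):
    """Absolute relative difference (%) of a scenario estimate from the reference."""
    return 100.0 * abs(estimate - reference) / reference

def median_rel_diff(scenario, reference):
    """Median relative difference over domains present in both mappings.

    scenario, reference: dicts mapping domain key -> estimated total.
    """
    common = scenario.keys() & reference.keys()
    return median(relative_difference_pct(scenario[d], reference[d]) for d in common)
```

The same median step can be applied to the domain-level CVs summarized in the CV tables.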
20
Analytical Study 2 (cont'd)
  • Comparison of estimated CVs for Total Expenses
      • Coefficient of variation (%) of Total Expenses
      • NAICS3 x Province: median value over all such
        domains

Scenario     All Manufacturing   NAICS3 x Province
HT No Tax    0.3                 1.5
PC No Tax    0.3                 1.5
PC Tax       0.1                 0.7
21
Analytical Study 2 (cont'd)
  • Comparison of resulting estimates for Total
    Energy Expenses
      • Relative difference (%) from the HT No Tax estimate of
        Total Energy Expenses
      • NAICS3 x Province: median value over all such
        domains

Scenario     All Manufacturing   NAICS3 x Province
PC No Tax    1.2                 0.0
PC Tax       0.8                 1.2
22
Analytical Study 2 (cont'd)
  • Comparison of estimated CVs for Total Energy
    Expenses
      • Coefficient of variation (%) of Total Energy Expenses
      • NAICS3 x Province: median value over all such
        domains

Scenario     All Manufacturing   NAICS3 x Province
HT No Tax    0.3                 1.8
PC No Tax    0.4                 1.8
PC Tax       0.4                 1.8
23
Analytical Study 2 - Conclusions
  • Bias
      • Small relative differences between estimated
        totals from the scenarios
  • Variance
      • Relatively low CV for all options
      • Tax substitution variables: Scenario 3 (PC Tax) most
        efficient
      • Non-tax substitution variables: Scenario 1 (HT No Tax)
        most efficient
  • Analytical capabilities
      • Scenarios 2 and 3 (pseudo-census) provide most detail

24
Conclusions
  • Results used to select the 2004 strategy: PC Tax
      • Meets needs of data users
      • Reduced cost and response burden
      • Maintains (or improves) quality
  • Striving to further increase use of tax data
      • Increased portion of population
      • Increased number of variables

25
Future Work
  • Editing of tax data
      • Approach similar to that used for survey data
  • Potential to expand list of direct substitution
    variables
  • Indirect use of tax data
      • More adaptive models
  • Quality indicators
      • Account for increased variance and potential for
        bias due to imputation