Title: Use of Administrative Data in Statistics Canada
Use of Administrative Data in Statistics Canada's
Annual Survey of Manufactures
- Steve Matthews and Wesley Yung
- May 16, 2004
- The United Nations Statistical Commission and
Economic Commission for Europe - Conference of European Statisticians
Outline
- Introduction
- Tax data programs at Statistics Canada
- The Annual Survey of Manufactures (ASM)
- Overview
- Strategy for use of tax data
- Analytical studies
- Conclusions and Future Work
Introduction
- Desire to increase use of tax data
- Reduce respondent burden
- Reduce survey costs
- Can be used at many stages of survey process
- Stratification
- Survey data validation
- Edit and imputation
- Estimation
Tax Data Programs at Statistics Canada
- Tax data available to Statistics Canada
- Collected by Canada Revenue Agency (CRA)
- Access via a data-sharing agreement
- To be used only for statistical purposes
- Two extensive tax data programs
- Unincorporated businesses (T1)
- Incorporated businesses (T2)
Tax Data Programs at Statistics Canada (cont'd)
- T1 - Population
- Unincorporated businesses
- Account for small share of revenues
- Administrative Data
- Sample-based
- Limited set of variables
- Edit and imputation is applied
- Weighted benchmarked estimates
Tax Data Programs at Statistics Canada (cont'd)
- T2 - Population
- Incorporated businesses
- Account for large share of revenues
- Administrative Data
- Census-based
- Extensive set of variables
- Edit and imputation is applied
- Micro-data is produced
The Annual Survey of Manufactures
- Manufacturing is an important sector of the Canadian economy
- 17% of GDP
- Annual Survey of Manufactures
- Take-none Portion and Survey Portion
- Extensive questionnaire (financial and commodity)
- Data requirements (pseudo-census)
The Annual Survey of Manufactures (cont'd)
- Target population
- Drawn from Statistics Canada's Business Register (BR)
- All businesses classified to manufacturing
- Sample design
- Non-survey portion
- Administrative data
- Survey portion
- Stratified SRS (Stratum = NAICS x Province x Size)
- Small take-some / Large take-some / Take-all
- Collected via mail-out / mail-back, follow-up via
telephone
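The sample design above can be illustrated with a minimal sketch of stratified simple random sampling, assuming a toy frame in which each unit carries a stratum label (NAICS, province, size class). The function name, the record keys, and the sample sizes are hypothetical and are not taken from the ASM; a take-all stratum is simply one whose sample size equals its population size.

```python
import random
from collections import defaultdict

def stratified_srs(frame, sample_sizes, seed=0):
    """Select a stratified simple random sample without replacement.

    frame: list of dicts with hypothetical keys 'id' and 'stratum'.
    sample_sizes: dict mapping stratum -> n_h. If n_h >= N_h the stratum
    is take-all and every unit gets a design weight of 1.
    Returns a list of (unit, design_weight) pairs with weight N_h / n_h.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in frame:
        by_stratum[unit["stratum"]].append(unit)

    sample = []
    for stratum, units in by_stratum.items():
        n_h = min(sample_sizes.get(stratum, 0), len(units))
        if n_h == 0:
            continue  # stratum not sampled (e.g. handled by administrative data)
        weight = len(units) / n_h
        for unit in rng.sample(units, n_h):
            sample.append((unit, weight))
    return sample

# Toy frame: stratum = (NAICS, province, size class); all values are made up.
frame = (
    [{"id": i, "stratum": ("311", "ON", "small")} for i in range(100)]
    + [{"id": 100 + i, "stratum": ("311", "ON", "large")} for i in range(20)]
    + [{"id": 200 + i, "stratum": ("311", "ON", "take-all")} for i in range(5)]
)
sizes = {
    ("311", "ON", "small"): 10,
    ("311", "ON", "large"): 10,
    ("311", "ON", "take-all"): 5,
}
sample = stratified_srs(frame, sizes)
```

The design weights sum to the frame size, which is a quick check that the weights were assigned consistently.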
The Annual Survey of Manufactures (cont'd)
- Edit and Imputation
- Edits applied to ensure accuracy and coherence
- Extensive imputation to produce pseudo-census dataset
- Historical imputation
- Ratio imputation
- Nearest-neighbour donor imputation
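The three imputation methods named above can be sketched in their textbook forms. This is an illustration only: the slides do not give the exact ASM specifications, so the function names, the optional trend factor, and the use of a single auxiliary variable `x` (for example frame revenue) are assumptions.

```python
def historical_impute(previous_value, trend=1.0):
    """Carry forward last year's value, optionally adjusted by a trend factor."""
    return previous_value * trend

def ratio_impute(x_missing, respondents):
    """Ratio imputation within an imputation class.

    respondents: list of (x, y) pairs, where x is an auxiliary variable
    and y is the variable of interest. The imputed value is x_missing * R
    with R = sum(y) / sum(x) over respondents.
    """
    sum_x = sum(x for x, _ in respondents)
    sum_y = sum(y for _, y in respondents)
    return x_missing * sum_y / sum_x

def nearest_neighbour_impute(x_missing, respondents):
    """Donor imputation: copy y from the respondent whose x is closest."""
    _, donor_y = min(respondents, key=lambda r: abs(r[0] - x_missing))
    return donor_y

# Made-up respondents in one imputation class: (auxiliary x, reported y)
respondents = [(100.0, 50.0), (200.0, 110.0), (400.0, 180.0)]
ratio_value = ratio_impute(350.0, respondents)              # 350 * 340 / 700
donor_value = nearest_neighbour_impute(350.0, respondents)  # donor with x = 400
```

Ratio imputation smooths over the class, while donor imputation reproduces an actually observed value, which tends to matter for detailed variables with many zeros.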
The Annual Survey of Manufactures (cont'd)
- Estimation
- Non-survey portion (tax data)
- Total Expenses only
- T1 weighted domain estimates
- T2 aggregates from administrative census dataset
- Survey portion (survey data and imputed data)
- Aggregates from pseudo-census dataset
- Domains of interest: NAICS and Province
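The estimation scheme above adds three pieces per domain: a weighted T1 estimate, a T2 aggregate from the administrative census dataset, and an aggregate from the pseudo-census dataset. A minimal sketch for a variable such as Total Expenses (the only one the slides list for the non-survey portion), with a hypothetical record layout and made-up numbers:

```python
from collections import defaultdict

def domain_totals(t1_sample, t2_census, pseudo_census):
    """Combine the three sources into one total per domain.

    t1_sample: list of (domain, weight, value) for sampled T1 records.
    t2_census: list of (domain, value) for all T2 records.
    pseudo_census: list of (domain, value) for survey-portion units,
    whether reported or imputed.
    """
    totals = defaultdict(float)
    for domain, weight, value in t1_sample:
        totals[domain] += weight * value   # weighted domain estimate
    for domain, value in t2_census:
        totals[domain] += value            # census-based aggregate
    for domain, value in pseudo_census:
        totals[domain] += value            # unweighted pseudo-census aggregate
    return dict(totals)

# Domains labelled as NAICS-Province; all values are illustrative.
totals = domain_totals(
    t1_sample=[("311-ON", 20.0, 10.0), ("311-ON", 20.0, 15.0)],
    t2_census=[("311-ON", 300.0), ("311-QC", 250.0)],
    pseudo_census=[("311-ON", 1000.0), ("311-QC", 800.0)],
)
```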
Analytical Studies
- Motivation for two studies
- Which variables should be replaced?
- What are the effects of the strategy on final
estimates for all variables?
- Study 1: Data comparison
- Study 2: Impact analysis
Analytical Study 1
- Study to select appropriate variables
- Comparison of reported data collected via survey and tax
- Simple businesses only
- Assess suitability for substitution of survey data
- Based on 6,000 businesses
Analytical Study 1 (cont'd)
- Correlation Analysis
- Wide range of correlations
- Total Expenses 0.9
- Total Energy Expenses -0.10
- Reporting Patterns
- Same pattern (zero or positive) for individual businesses
- Total Expenses: 99%
- Total Energy Expenses: 50%
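Both diagnostics above are simple to compute from paired survey and tax values for the same businesses. The sketch below uses the standard Pearson correlation and defines "same reporting pattern" as both sources being zero or both being positive; the function names and the toy data are illustrative assumptions, not the study's code or data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def same_pattern_rate(survey, tax):
    """Share of businesses where both sources are zero or both are positive."""
    same = sum((s > 0) == (t > 0) for s, t in zip(survey, tax))
    return same / len(survey)

# Made-up paired values for four businesses
survey = [10.0, 0.0, 30.0, 40.0]
tax = [12.0, 5.0, 29.0, 0.0]
rate = same_pattern_rate(survey, tax)
corr = pearson(survey, tax)
```

A variable can show a high agreement rate and still correlate poorly, so the two measures are best read together.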
Analytical Study 1 (cont'd)
- Distribution of Ratios
- Examined histograms and the fraction of ratios between 0.9 and 1.1
- Total Expenses: 60%
- Total Energy Expenses: 16%
- Population Estimates
- Relative difference between tax and survey-based estimates
- Total Expenses: 3%
- Total Energy Expenses: 28%
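The two summaries above can be sketched as follows. The slides do not state which source is the numerator of the ratio or the base of the relative difference; this sketch assumes tax over survey in both cases, and it skips businesses with a zero survey value when forming ratios. Names and data are hypothetical.

```python
def share_of_ratios_in_band(survey, tax, low=0.9, high=1.1):
    """Fraction of tax/survey ratios that fall within [low, high]."""
    ratios = [t / s for s, t in zip(survey, tax) if s > 0]
    in_band = sum(low <= r <= high for r in ratios)
    return in_band / len(ratios)

def relative_difference(tax_estimate, survey_estimate):
    """Relative difference of the tax-based estimate from the survey-based one."""
    return (tax_estimate - survey_estimate) / survey_estimate

# Made-up paired values; ratios are 1.05, 0.75, 1.00 and 1.25
survey = [100.0, 200.0, 50.0, 80.0]
tax = [105.0, 150.0, 50.0, 100.0]
share = share_of_ratios_in_band(survey, tax)
rel_diff = relative_difference(sum(tax), sum(survey))
```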
Analytical Study 1 (cont'd)
- Selected several variables for direct substitution
- Section totals and sub-totals
- expenses, revenues, inventories, etc.
- Remaining variables are imputed
- Imputation: assign distribution of details within each total
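One way to read the last point is that a total is taken directly from tax data and the detail variables are then filled in so that they add up to that total. A minimal sketch, assuming the distribution of details comes from some source of shares (for example a donor record or the unit's own history; the slides do not say which), with hypothetical names and values:

```python
def prorate_details(tax_total, detail_shares):
    """Distribute a substituted total across detail variables.

    detail_shares: dict mapping detail name -> non-negative share or amount
    taken from a reference record. The shares are rescaled so that the
    details sum to tax_total.
    """
    total_share = sum(detail_shares.values())
    if total_share == 0:
        raise ValueError("reference distribution sums to zero")
    return {
        name: tax_total * share / total_share
        for name, share in detail_shares.items()
    }

# Reference distribution in the proportions 1 : 6 : 3 (made up)
details = prorate_details(1000.0, {"energy": 1.0, "materials": 6.0, "wages": 3.0})
```

Because the details are rescaled to the total, the substituted tax value is preserved exactly and only its breakdown is imputed.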
Analytical Study 1 - Conclusions
- Distinctively different results for different variables
- Direct substitution seems feasible for totals
- Direct substitution not recommended for details
- Use standard methods to impute other variables
Analytical Study 2
- Analysis to evaluate impact of tax data strategy
- Bias
- Comparison of estimates from different scenarios
- Variance
- Shao-Steel approach for variance estimation
- Reflects variance from sampling and imputation
- Assume equal probability of response within
imputation class
Analytical Study 2 (cont'd)
Scenario  | Tax Data Used in Imputation | Estimator | Variance
HT No Tax | None (ratio imputation based on frame revenues) | Horvitz-Thompson | Sampling + Imputation
PC No Tax | None (ratio imputation based on frame revenues) | Pseudo-census | Imputation
PC Tax    | Non-response (in or out of sample): direct substitution, ratio imputation | Pseudo-census | Imputation
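The scenarios differ mainly in the estimator. A Horvitz-Thompson total weights the sampled units, while a pseudo-census total sums a reported or imputed value for every unit in the population, so its variance comes from imputation rather than sampling. The sketch below shows both in their generic forms; the function names, inputs, and the constant imputation rule are illustrative assumptions.

```python
def horvitz_thompson_total(sample):
    """sample: list of (design_weight, value) for sampled units."""
    return sum(weight * value for weight, value in sample)

def pseudo_census_total(population, reported, impute):
    """Sum a value for every population unit.

    population: iterable of unit ids.
    reported: dict mapping unit id -> reported (or tax-substituted) value.
    impute: function taking a unit id and returning an imputed value
    for units with no reported value.
    """
    return sum(
        reported[unit] if unit in reported else impute(unit)
        for unit in population
    )

# Made-up inputs
ht_total = horvitz_thompson_total([(10.0, 5.0), (10.0, 7.0), (1.0, 100.0)])
pc_total = pseudo_census_total(
    population=[1, 2, 3, 4],
    reported={1: 5.0, 3: 7.0},
    impute=lambda unit: 6.0,  # placeholder for ratio imputation or tax substitution
)
```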
Analytical Study 2 (cont'd)
- Comparison of resulting estimates for Total Expenses
- Relative difference from HT No Tax, Total Expenses (%)
- NAICS3 x Province: median value for all such domains
Scenario  | All Manufacturing | NAICS3 x Province
PC No Tax | 1.8               | 0.0
PC Tax    | 0.5               | 1.3
Analytical Study 2 (cont'd)
- Comparison of estimated CVs for Total Expenses
- Coefficient of variation, Total Expenses (%)
- NAICS3 x Province: median value for all such domains
Scenario  | All Manufacturing | NAICS3 x Province
HT No Tax | 0.3               | 1.5
PC No Tax | 0.3               | 1.5
PC Tax    | 0.1               | 0.7
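For reference, a coefficient of variation is the estimated standard error divided by the estimate, and the domain-level figure reported above is the median of the CVs over all domains. A short sketch, assuming a variance estimate is already available for each domain (the Shao-Steel variance estimation itself is not reproduced here); names and values are hypothetical:

```python
import math
from statistics import median

def cv_percent(estimate, variance):
    """Coefficient of variation in percent: 100 * sqrt(variance) / estimate."""
    return 100.0 * math.sqrt(variance) / estimate

def median_domain_cv(domains):
    """domains: dict mapping domain -> (estimate, variance estimate)."""
    return median(cv_percent(est, var) for est, var in domains.values())

# Made-up (estimate, variance) pairs for three NAICS3 x Province domains
domains = {
    ("311", "ON"): (1000.0, 100.0),
    ("311", "QC"): (500.0, 100.0),
    ("312", "ON"): (200.0, 16.0),
}
median_cv = median_domain_cv(domains)
```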
Analytical Study 2 (cont'd)
- Comparison of resulting estimates for Total Energy Expenses
- Relative difference from HT No Tax, Total Energy Expenses (%)
- NAICS3 x Province: median value for all such domains
Scenario  | All Manufacturing | NAICS3 x Province
PC No Tax | 1.2               | 0.0
PC Tax    | 0.8               | 1.2
Analytical Study 2 (cont'd)
- Comparison of estimated CVs for Total Energy Expenses
- Coefficient of variation, Total Energy Expenses (%)
- NAICS3 x Province: median value for all such domains
Scenario  | All Manufacturing | NAICS3 x Province
HT No Tax | 0.3               | 1.8
PC No Tax | 0.4               | 1.8
PC Tax    | 0.4               | 1.8
Analytical Study 2 - Conclusions
- Bias
- Small relative difference between estimated totals from scenarios
- Variance
- Relatively low CV for all options
- Tax substitution variables: Scenario 3 (PC Tax) most efficient
- Non-tax substitution variables: Scenario 1 (HT No Tax) most efficient
- Analytical capabilities
- Scenarios 2 and 3 (pseudo-census) provide most detail
Conclusions
- Results used to select 2004 strategy: PC Tax
- Meets needs of data users
- Reduced cost and response burden
- Maintain (improve) quality
- Striving to further increase use of tax data
- Increased portion of population
- Increased number of variables
Future Work
- Editing of tax data
- Similar approach to survey data approach
- Potential to expand list of direct substitution variables
- Indirect use of tax data
- More adaptive models
- Quality indicators
- Account for increased variance and potential for
bias due to imputation