Methodology of Allocating Generic Field to its Details

Title: General to Detail Allocation
Author: Jessica Andrews
Last modified by: Carl Girard
Created Date: 9/20/2006 7:39:26 PM

1
Methodology of Allocating Generic Field to its Details

Jessica Andrews
Nathalie Hamel
François Brisebois
ICESIII - June 19, 2007

2
Outline

Background Information on Tax Data
Objective
Current Methodology
Other Methodologies Considered
Comparison of the Methodologies
Future Work and Conclusions

3
Tax Data

Statistics Canada receives annual data from the Canada Revenue Agency (CRA) on incorporated (T2) businesses
Tax data
Balance Sheet
Income Statement
88 different Schedules

4
Tax Data

About 700 different fields to report
Most companies provide only 30-40 fields
Only 8 fields are actually required by CRA (section totals):
Non-farm revenue
Non-farm expenses
Farm revenue
Farm expenses
Assets
Liabilities
Shareholder Equity
Net Income/Loss

5
Objective

To impute the missing detail variables
Why?
Tax data users need detailed data (tax replacement project (TRP))
Different concepts and definitions between tax and survey data
A subset of details linked to the same generic can be mapped to different survey variables (Chart of Accounts)

6
Challenges to meet

Methodology must:
Work well for a large number of details
Be capable of dealing with both rarely reported and frequently reported details
Give good micro results for tax replacement, but also give good macro results when examined at the NAICS or full database level

7
First attempt to complete Tax Data

Edit rules
Outlier detection within a record
Deterministic edits (to ensure the record balances within each section)
Review and manual corrections
Overlap between fiscal periods
Negative values
Consistency edits between tax variables
Outlier detection between records (Hidiroglou-Berthelot)
CORTAX balancing edits
Deterministic imputation of key variables
Inventories
Depreciation
Salaries and wages

8
GDA Concepts

Corporations can use either generic or detail fields to report their results

Type      Field   Description                                   Case 1   Case 2   Case 3
Generic   8810    Office expenses amount                        100      -        30
Detail    8811    Office stationery and supply expense amount   -        20       -
Detail    8812    Office utilities expense amount               -        30       10
Detail    8813    Data processing expense amount                -        50       60
Total                                                           100      100      100

9
GDA Concepts

A block is defined by a generic field and its details
The generic field is not a total
The goal is to impute the most significant detail variables when a generic amount has been reported
GDA: Generic to Detail Allocation

10
Current method

Uses imputation classes based on industry code and size of company
First 2 digits of NAICS (about 25 industries)
Three revenue size classes (boundaries at 5 and 25 million)
Calculates ratios within imputation classes for each block
Uses all non-zero and non-missing details
Uses only details reported at least 10% of the time (5% for the General Farm Expense block)
Assigns ratios to businesses that reported a generic amount
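
The ratio step on this slide can be sketched roughly as follows. This is a minimal illustration, not the production code: it assumes ratios are computed as a ratio of sums within one imputation class, that the reporting threshold is measured against all businesses in the class, and that each business is a plain dict of detail code to amount. The names `class_ratios`, `allocate`, and `min_share` are hypothetical.

```python
from collections import defaultdict


def class_ratios(records, min_share=0.10):
    """Allocation ratios for one block within one imputation class.

    records:   list of dicts, detail code -> reported amount
    min_share: minimum share of businesses that must report a detail
               (assumed interpretation of the reporting threshold)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        for code, amount in rec.items():
            if amount:  # ignore zero and missing (None) details
                totals[code] += amount
                counts[code] += 1
    n = len(records)
    # Keep only details reported often enough in this class
    kept = {c: t for c, t in totals.items() if counts[c] / n >= min_share}
    grand = sum(kept.values())
    if grand == 0:
        return {}
    return {c: t / grand for c, t in kept.items()}


def allocate(generic_amount, ratios):
    """Spread a generic amount over the details using the class ratios."""
    return {c: generic_amount * r for c, r in ratios.items()}


# Toy class: 19 businesses report two details, 1 also reports a rare third one
records = [{"8811": 20.0, "8812": 80.0}] * 19
records.append({"8811": 20.0, "8812": 80.0, "8813": 5.0})
ratios = class_ratios(records)
imputed = allocate(1000.0, ratios)
```

In this toy class, detail 8813 is reported by only 1 business in 20, so it falls below the 10% threshold and receives no share of the generic amount.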

11
Current method

Originally proposed as a solution with good macro (aggregate) results
Good micro (business) level results are now needed for TRP
Problems:
Imputation classes are frequently not homogeneous in terms of distribution
There is a large number of small imputation classes

12
Other methods considered

Historic imputation method
Scores method
Cluster method

13
Historic imputation method

Assumes the distribution of details is the same from one year to the next
Problems:
A change in business strategies or properties is not taken into account
Most businesses that reported details in the previous year also report them in the current year, leaving few businesses that could be imputed with this method (5% on all blocks tested)
Requires the use of another method for the remaining businesses
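
A minimal sketch of the historic idea, assuming a business's previous-year details are available as a dict of detail code to amount. The function name and the choice to return `None` when no history exists are illustrative assumptions.

```python
def historic_allocation(generic_amount, prev_year_details):
    """Spread this year's generic amount using last year's detail shares."""
    reported = {c: a for c, a in prev_year_details.items() if a}
    total = sum(reported.values())
    if total == 0:
        return None  # no usable history: another method is required
    return {c: generic_amount * a / total for c, a in reported.items()}


# Business reported 20 / 30 / 50 last year and only a generic of 200 this year
result = historic_allocation(200.0, {"8811": 20.0, "8812": 30.0, "8813": 50.0})
no_history = historic_allocation(200.0, {})
```

The `None` branch corresponds to the last problem listed above: a business without previous-year details still needs a fallback method.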

14
Scores method

Uses response/non-response models for each detail
Groups businesses into imputation classes on the basis of percentiles of response probability
Calculates ratios within imputation classes
Assigns ratios to businesses with a generic
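
The class-forming step can be illustrated for a single detail. The sketch below assumes the response probabilities have already been estimated by some response model (not shown) and simply splits businesses into equal-sized groups by rank. The function name, the number of classes, and the input layout are hypothetical.

```python
def percentile_classes(probabilities, n_classes=4):
    """Map business id -> class index (0 = lowest response probability).

    probabilities: dict business id -> estimated probability of reporting
                   the detail (assumed to come from a fitted model)
    """
    ranked = sorted(probabilities, key=probabilities.get)
    n = len(ranked)
    return {biz: i * n_classes // n for i, biz in enumerate(ranked)}


probs = {"a": 0.10, "b": 0.40, "c": 0.60, "d": 0.90}
classes = percentile_classes(probs, n_classes=2)
```

With one model per detail, each detail yields its own set of classes. The sketch leaves open how to combine them for a block with many details.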

15
Scores method

Problems:
A model must be created for each detail
It is difficult to decide how to handle blocks with many (5 or more) frequently reported details
This method was excluded because of its difficulty in coping with blocks with a moderate to large number of details

16
Cluster method

Divides businesses into imputation classes on the basis of response patterns to details
Uses a clustering or dominant detail method
Uses discriminatory models (parametric or not) to assign businesses with a generic to imputation classes
Calculates ratios within imputation classes
Assigns ratios to businesses with a generic

17
Cluster method

Problems:
For certain blocks it can be difficult to find good variables on which to discriminate
It is unclear how often the clustering method and models should be reviewed

18
Comparing the methods

Estimate distributions of known data for year n from ratios calculated for year n-1
Create a benchmark file:
Keep businesses that reported details in both years n-1 and n
Move all details into the generic field in year n
Calculate ratios from businesses in year n-1 for all methods
Assign ratios to businesses in year n
Compare the results to the reported fields
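
The benchmark procedure can be sketched as follows, assuming the year n-1 ratios are already available as one dict (the same ratios for every business, which is a simplification) and the year n reported details are kept per business. All names are hypothetical.

```python
def benchmark_errors(year_n_details, ratios):
    """Per-business difference between estimated and reported details.

    year_n_details: dict business id -> {detail code: reported amount}
    ratios:         dict detail code -> ratio computed from year n-1
    """
    errors = {}
    for biz, details in year_n_details.items():
        generic = sum(details.values())  # collapse details into the generic
        codes = set(details) | set(ratios)
        errors[biz] = {
            c: generic * ratios.get(c, 0.0) - details.get(c, 0.0)
            for c in codes
        }
    return errors


year_n = {"A": {"8811": 20.0, "8812": 80.0}}
errs = benchmark_errors(year_n, {"8811": 0.5, "8812": 0.5})
```

Because the estimated details are built from the collapsed generic, the errors for a business sum to zero whenever the ratios sum to 1. The comparison therefore measures how the amount is distributed, not its total.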

19
Comparing the methods

Compare the results at the micro (business) and the macro (aggregate) levels
Compare true and estimated distributions

20
Comparing the methods

Macro statistics (SSE and SSEP)
for the jth detail in the block

21
Comparing the methods

Micro Statistics
Median Pseudo CV
for the jth detail and ith business in the block

22
Comparing the methods

Micro Statistics
Median Pearson Contingency Coefficient
for the jth detail and ith business in the block
f values represent the marginal distributions
d^2 represents the degree of dependency (depends on n, r and c)
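
For reference, the textbook Pearson contingency coefficient for an r x c table of counts is C = sqrt(chi2 / (chi2 + n)), where chi2 is the usual chi-square statistic built from the marginal totals. The sketch below implements only that textbook definition. How the true and estimated distributions of a business are arranged into such a table is not specified here, so the 2 x 2 inputs are purely illustrative.

```python
import math


def pearson_contingency(table):
    """Textbook Pearson contingency coefficient for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table has no observations")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


independent = pearson_contingency([[10, 20], [20, 40]])
dependent = pearson_contingency([[10, 0], [0, 10]])
```

The coefficient is 0 for an independent table. Its maximum depends on the table dimensions, which is why it is below 1 even for the perfectly dependent 2 x 2 example.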

23
Comparing the methods

We show results for Block 8230, Other Revenue
This block has 20 details covering the distribution of revenue
Important for clients as it is used in many surveys
The scores method is not shown as it is difficult to implement with this many details

24
Comparing the methods

Other revenue, fields 8230 to 8250
8230 Other revenue
8231 Foreign exchange gains/losses
8232 Income/loss of subsidiaries/affiliates
8233 Income/loss of other divisions
8234 Income/loss of joint ventures
...
8248 Insurance recoveries
8249 Expense recoveries
8250 Bad debt recoveries

25
Results

Block 8230: micro statistics (first four columns) and macro statistics (SSE, SSEP)

Method             Median Pseudo CV   IQR    Median Pearson Cont. Coeff.   IQR    SSE      SSEP
Current Method     1.08               0.43   0.66                          0.14   2.2e20   120
Cluster Method     0.34               1.39   0.36                          0.63   2.8e20   12
Historic Cluster   0.51               0.99   0.10                          0.7    9.9e19   4.5

26
Cluster methodology

Most blocks use dominant detail (attractor) x clusters to define the imputation classes
A business i belongs to cluster j of attractor x, where x > 50%, if
$y_{ij} / \sum_{k=1}^{J} y_{ik} > x$
where $y_{ij}$ is the total value reported by business i in detail j and J is the number of details in the block. If this statement is not true for any detail, the business is assigned to cluster J+1.
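
The dominant-detail rule on this slide can be sketched as: a business goes to the cluster of the detail that holds more than a threshold share of its total reported details, and otherwise to a residual cluster. The sketch assumes non-negative amounts (so at most one detail can exceed 50%). The `"OTHER"` label for the residual cluster and the function name are hypothetical.

```python
def dominant_detail_cluster(details, threshold=0.5):
    """Return the code of the dominant detail, or "OTHER" if none dominates.

    details:   dict detail code -> amount reported by one business
    threshold: share of the block total a detail must exceed
    """
    total = sum(details.values())
    if total > 0:
        for code, amount in details.items():
            if amount / total > threshold:
                return code
    return "OTHER"  # residual cluster for businesses with no dominant detail


dominated = dominant_detail_cluster({"8231": 80.0, "8232": 20.0})
mixed = dominant_detail_cluster({"8231": 40.0, "8232": 35.0, "8233": 25.0})
```

A higher threshold produces purer clusters but sends more businesses to the residual cluster.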

27
Cluster methodology

Distribution ratios to details are calculated for each cluster
Discriminatory models are then created (nonparametric for most blocks) to assign businesses with a generic to clusters
Models use variables on industry (NAICS), location (province), size (revenue, log revenue), and details and totals of details in other blocks

28
Cluster methodology

Generic amounts are assigned to details in one of the following 3 ways:
If a generic amount and no details are reported, ratios are assigned as calculated
If a generic amount and all details with a ratio greater than 0 are reported, ratios are assigned as calculated
If a generic amount and some but not all details are reported, ratios are pro-rated and the generic is assigned only to the details that were not reported
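
The three cases above can be sketched in one function. This is an illustration under assumptions: ratios are given per detail for the business's cluster, "pro-rated" is read as rescaling the ratios of the unreported details so they sum to 1, and the function returns only the amounts imputed from the generic (not the final detail values). The name `allocate_generic` is hypothetical.

```python
def allocate_generic(generic, reported, ratios):
    """Amounts imputed to each detail from one business's generic amount.

    generic:  amount reported in the generic field
    reported: dict detail code -> amount already reported by the business
    ratios:   dict detail code -> cluster ratio
    """
    positive = {c: r for c, r in ratios.items() if r > 0}
    reported_codes = {c for c, a in reported.items() if a}
    missing = {c: r for c, r in positive.items() if c not in reported_codes}
    if not reported_codes or not missing:
        # Cases 1 and 2: no details reported, or every detail with a
        # positive ratio reported: use the ratios as calculated
        use = positive
    else:
        # Case 3: only some details reported: pro-rate over the rest
        use = missing
    scale = sum(use.values())
    return {c: generic * r / scale for c, r in use.items()}


ratios = {"8811": 0.2, "8812": 0.3, "8813": 0.5}
case1 = allocate_generic(100.0, {}, ratios)
case2 = allocate_generic(100.0, {"8811": 1.0, "8812": 1.0, "8813": 1.0}, ratios)
case3 = allocate_generic(50.0, {"8813": 60.0}, ratios)
```

In the third call, detail 8813 is already reported, so the generic of 50 is split between 8811 and 8812 in proportion 0.2 to 0.3.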

29
Cluster methodology

Gives better micro results
Improved data for tax replacement
Macro results remain similar to the current methodology
Micro results are consistent year to year

30
Future work and conclusions

The cluster methodology will be implemented for reference year 2006 for the Income Statement
Model fitting and implementation for the Balance Sheet will follow
Models and clustering methods will be reviewed as deemed appropriate

31
Contact Information