European Conference on Quality in Survey Statistics

1
European Conference on Quality in Survey
Statistics
2
Exploring the statistical utilisation of
financial statements to estimate sub-regional
economic variables
  • P. Calia - C. Filippucci
  • University of Bologna - Italy
  • Filippucci_at_stat.unibo.it

3
Aim of the work
  • Obtain statistical data at the local level (SLL,
    sub-regional level) and by economic sector that
    are not provided by official statistics (value
    added, output, labour cost), using AIDA, a digital
    database of company financial statements
  • SLL: cluster of municipalities based on
    commuting flows
  • The Emilia-Romagna region and the manufacturing
    sector are considered, for the year 2002

4
Aim of the work
  • THE PROBLEM
  • to deal with unit selection through an
    appropriate estimation strategy for the variables

5
Why use accounting data
  • Moreover
  • Information is available on a large number of
    businesses, many more than in sample surveys
  • The respondent burden is reduced
  • Micro data are supplied for modelling economic
    behaviour

6
Outline of the paper
  • Do business accounts provide the necessary
    information to estimate the economic variables of
    interest?
  • AIDA reliability
  • AIDA's coverage of the population
  • Strategy for estimation

7
AIDA
  • Published by Bureau van Dijk
  • Collects financial statements of Italian
    companies since 1996
  • Accessible via web after subscription
  • 18,796 firms located in Emilia-Romagna out of
    365,000 (year 2002)

8
AIDA (cont.)
  • It is possible to:
  • select firms by different criteria (economic
    activity, location, identification codes, legal
    form, etc.)
  • select variables of interest (accounting items
    and ratios, user-defined indexes, structural
    economic variables, stakeholders, etc.)
  • perform peer group comparisons and simple
    statistical analysis
  • export data to be processed with other software

9
AIDA (cont.)
  • Companies required to present financial
    statements are 26% of firms but 72% in terms of
    employment.
  • In AIDA they are more than 99%
  • Total coverage in the manufacturing sector is
    14% (firms) and 61% (employees); it improves
    when considering the above group: 24% (firms)
    and 84% (employees)
  • AIDA includes only firms with an estimated 2002
    production value > 500,000 euros

10
From business accounts to SNA concepts
  • The problem has been addressed by the UN (UN,
    2000): success depends on the formats used for
    compiling financial statements
  • In Italy, the structure, format and contents of
    business accounts are established by law within
    the EU framework
  • Output components and costs are shown separately

11
From business accounts to SBS concepts (cont.)
  • Exact correspondence cannot be established for
    all variables (item detail and aggregation)
  • In Italy, the adequacy of accounting data for EC
    regulations on business statistics has been
    studied by the National Statistical Institute
    (Istat)
  • accounting data cover about 65% of the variables
    considered by structural business surveys
  • high correspondence between accounting and
    survey values
  • Accounting data are used by Istat to impute
    missing data in structural business surveys

12
From business accounts to SBS concepts (cont.)
  • Macrovariables in the frame of SNA are not
    exactly the same
  • BUT
  • Financial statements are the only source for
    obtaining fairly good indicators of firms'
    performance at sub-regional and activity-sector
    level

13
Production value (output)
14
Value added at factor costs
15
Reliability
  • Two strategies:
  • Internal checks
  • External checks, by comparison with ASIA 2002
  • The year 2002 is considered because ASIA refers
    to 2002
  • ASIA: permanent archive of all active firms in
    industry and services, updated yearly. It is the
    frame for all the business surveys

16
Issue 3: available information
  • From ASIA E-R (2002):
  • Fiscal code (unique id)
  • Name of the enterprise
  • Activity code (Ateco 2002, NACE Rev. 1.1)
  • Geographical location
  • Legal form
  • Number of employees
  • From AIDA:
  • Fiscal code and business register code (CCIAA)
  • Name of the enterprise
  • Activity code (Ateco 2002, NACE Rev. 1.1)
  • Geographical location
  • Legal form
  • Date of birth
  • End date of financial statement
  • Accounting items from income statement

17
Preliminary check
  • 1. Internal checks: code duplication and missing
    data. Essentially no problems
  • 2. External checks
  • Linking AIDA and ASIA (by fiscal code, NACE Rev.
    1.1 codes, location and legal form)
  • Some problems: very few, or easy to recover and
    to correct using ASIA

18
Issue 3: Preliminary check (cont.)
  • ONLY 35 records are duplicated by fiscal code
    (same company)
  • Different CCIAA code: only 2, clearly recording
    errors
  • Different geographical location (33 companies)
  • Different legal form (2 companies)
  • Different economic activity code (17 companies)
  • Differences in accounting items (5 companies)
  • Records whose geographical location matches that
    of ASIA are retained: 18,761 unique records
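
The deduplication rule above (one record per fiscal code, preferring the record whose location agrees with ASIA) could be sketched as follows. This is only an illustration: the field names `fiscal_code` and `location`, the function name and the toy records are all assumptions, not taken from the presentation.

```python
def deduplicate(aida_records, asia_location):
    """Keep one AIDA record per fiscal code.

    When a fiscal code appears more than once, prefer the record whose
    location equals the location recorded in ASIA (hypothetical rule
    modelled on the slide; field names are invented for the example).
    """
    kept = {}
    for rec in aida_records:
        code = rec["fiscal_code"]
        reference = asia_location.get(code)
        matches = rec["location"] == reference
        # Take the first record seen, then replace it only if the new one
        # matches ASIA and the stored one does not.
        if code not in kept or (matches and kept[code]["location"] != reference):
            kept[code] = rec
    return list(kept.values())


# Toy data, invented for illustration
aida_records = [
    {"fiscal_code": "A1", "location": "Bologna"},
    {"fiscal_code": "A1", "location": "Modena"},
    {"fiscal_code": "B2", "location": "Parma"},
]
asia_location = {"A1": "Modena", "B2": "Parma"}

unique_records = deduplicate(aida_records, asia_location)
```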

19
Issue 3: Preliminary check (cont.)
  • Missing data in:
  • Fiscal code (12 records, recovered in AIDA)
  • Geographical location (15 records, 14 recovered
    in AIDA)
  • Legal form (859 records), easy to obtain from ASIA
  • Economic activity (43 records)

20
Issue 2: Preliminary check (cont.)
  • LINKING BY FISCAL CODE WITH ASIA
  • 18,023 companies out of 18,761 match
  • 738 are not in ASIA: NOT CONSIDERED IN OUR
    ANALYSIS
  • Possible reasons:
  • not eligible according to ASIA (which excludes
    businesses in sectors A, B, L, P, Q and division
    91, and legal units classified as institutions or
    belonging to the private non-profit sector)
  • not resident in Emilia-Romagna in 2002 and moved
    later (location and other characteristics in AIDA
    refer to the last balance sheet available)
  • no longer active in 2002 but required to present
    a balance sheet (e.g. liquidation, bankruptcy)
  • acquired by another enterprise during that year
21
Issue 3: Preliminary check (cont.)
  • Problems in the comparison (because of point 2,
    previous slide)
  • Ateco code: 7,982 records in AIDA with values
    different from ASIA (reduced to 3,245 considering
    2-digit codes)
  • Geographical location: after normalisation, 814
    businesses are located differently (371 change
    SLL)
  • Legal form: after normalisation, 1,416
    enterprises have a different legal form (of which
    859 with missing data)
  • In the following analysis, data from ASIA have
    been chosen

22
Framework for estimation
  • The sample is not random: a predictive approach
    is needed for estimation (Royall, 1988; Royall and
    Pfeffermann, 1982; Särndal, 1996)
  • The predictive approach is based on a
    statistical model in the population relating the
    variable of interest to auxiliary variables.
  • The model should be able to make the selection
    process ignorable
  • It is not a behavioural model: its only task
    is to incorporate the relationships between
    observed and non-observed units

23
Framework for estimation (cont.)
  • The model is the only source for inference
  • From this model, predictions for non-observed
    units are obtained
  • Auxiliary variables are chosen in order to
    identify groups where the selection process is
    not informative (Estevao et al., 1995)

24
Framework for estimation (cont.)
  • Finite population of N units
  • Y: variable to be estimated
  • X: auxiliary variables known for each unit
  • Subsets of observed and non-observed units
  • Sample units in a sample s
  • Selection mechanism for a sample s

25
Framework for estimation (cont.)
  • In the model approach it is necessary to specify:
  • the function connected to the selection
  • the distribution of Y in the finite population,
    on the basis of a priori knowledge
  • the joint distribution of Y and the selection
  • the distribution based on the observations

26
Framework for estimation (cont.)
  • When making inference, the selection process
    can be ignored if, given Z, selection does not
    depend on the variable of interest (Rubin, 1976)
  • If, in a sample, the variable of interest is
    independent of unit selection once the known Z
    is taken into account,
  • then the distribution of the variable of
    interest among the selected units is equal to the
    distribution of observations obtained ignoring
    the selection (inference is possible referring
    only to the observations and the model)

27
Framework for estimation (cont.)
  • In a non-random sample, selection is represented
    by Z, where Z is an indicator variable
  • Given s, inference on Y for the group selected
    by means of Z is obtained from the distribution
    of the observations, ignoring the selection
  • To make inference on the other groups as well,
    the distribution of Y in those groups should be
    known
  • If X are variables known for all the units in the
    population and, given X, Y does not depend on Z,
    then the selection on Z can be ignored
  • Inference on quantities of the finite population
    also requires X for the non-observed units

28
Framework for estimation (cont.)
  • The first task for model specification is to
    find out the variables driving the selection

29
Variables driving the selection
  • Legal form is the crucial variable for
    inclusion in AIDA and it is not possible to
    use it as an auxiliary variable
  • Possible auxiliary variables are: size (number
    of employees), sector of activity, sub-regional
    area (SLL)
  • To define their relevance, coverage and
    distributions in AIDA are compared with the
    population

30
Tables by legal form
31
Coverage by size
  • SMALL FIRMS ARE NOT REPRESENTED
  • - > 20 employees: more than 74% (firms), 76.5%
    (employees)
  • - 1 employee: 1.2%
  • - 2-9 employees: 6% (firms) and 9% (employees)
  • - Distributions: in AIDA 22% of firms have fewer
    than 10 employees vs 80% in the population.
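
Coverage rates of this kind can be computed from the population frame once each firm is flagged as present or absent in the archive. The sketch below is illustrative only: the record layout (`fiscal_code`, `size_class`, `employees`), the function name and the toy figures are assumptions, and it presumes every size class has at least one employee.

```python
from collections import defaultdict


def coverage_by_class(population, archive_codes):
    """Return {size_class: (firm coverage %, employee coverage %)}."""
    # Per class: [firms, covered firms, employees, covered employees]
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for firm in population:
        t = totals[firm["size_class"]]
        t[0] += 1
        t[2] += firm["employees"]
        if firm["fiscal_code"] in archive_codes:
            t[1] += 1
            t[3] += firm["employees"]
    return {c: (100 * t[1] / t[0], 100 * t[3] / t[2]) for c, t in totals.items()}


# Toy population frame and archive, invented for illustration
population = [
    {"fiscal_code": "A1", "size_class": "2-9", "employees": 4},
    {"fiscal_code": "B2", "size_class": "2-9", "employees": 6},
    {"fiscal_code": "C3", "size_class": "20+", "employees": 30},
    {"fiscal_code": "D4", "size_class": "20+", "employees": 70},
]
archive_codes = {"B2", "C3", "D4"}

coverage = coverage_by_class(population, archive_codes)
print(coverage["2-9"])  # (50.0, 60.0)
```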

32
Coverage by size
  • Excluding proprietorships and partnerships,
    coverage rises
  • - > 20 employees: more than 90% (firms and
    employees)
  • - 1 employee: 9%
  • - 2-9 employees: 30-40%
  • - distributions get closer, especially for
    employees

33
Tables by size
34
Coverage by sector
  • Coverage by sector ranges from 5% to 60%
    (firms) and from 30% to 80% (employment)
  • 50% of sectors have a coverage greater than
    17% (firms) and 58% (employment)
  • AIDA and ASIA distributions are not very
    different: the simple index of dissimilarity is
    0.22 and 0.11
  • Considering the subgroup of companies and
    cooperatives, coverage improves and distributions
    get closer
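
The slides do not give the formula for the "simple index of dissimilarity". A common definition, assumed here, is half the sum of the absolute differences between the two relative frequency distributions; it is 0 for identical distributions and 1 for disjoint ones. The function name and the sector counts below are invented for the example.

```python
def dissimilarity_index(counts_a, counts_b):
    """Half the sum of absolute differences between two share distributions."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    categories = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a.get(c, 0) / total_a - counts_b.get(c, 0) / total_b)
        for c in categories
    )


# Toy firm counts by 2-digit sector, invented for illustration
aida_counts = {"15": 30, "17": 50, "29": 20}
asia_counts = {"15": 40, "17": 40, "29": 20}

d = dissimilarity_index(aida_counts, asia_counts)
print(round(d, 3))  # 0.1
```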

35
Table by sector (2-digit code)
36
Coverage by SLL
  • Coverage by SLL is highly variable: it ranges
    between 3% and 23% (firms) and between 19% and
    78% (employees)
  • coverage in 50% of SLL is greater than 10.6%
    (firms) and 53% (employees)
  • AIDA and ASIA distributions are not very
    different: simple dissimilarity indexes are very
    low (respectively 0.12 for firms and 0.06 for
    employees)
  • Considering the subgroup of companies and
    cooperatives, coverage improves and distributions
    get closer

37
From the above analysis
  • From the evidence presented it is clear that the
    variables considered in studying coverage are
    important in driving the selection, hence they
    are relevant for implementing a model-based
    estimation strategy
  • Moreover, they are important from an economic
    point of view.

38
From the above analysis (cont.)
  • Because of severe undercoverage of small firms,
    only firms with more than 1 employee (excluding
    mostly proprietorships and partnerships) are
    considered
  • As a consequence, totals will be underestimated,
    even though the economic weight of firms with one
    employee is negligible: they are 29% of the
    population but represent only 2.8% of employees

39
BLU predictor
  • The total of Y:
    $T = \sum_{i \in s} y_i + \sum_{i \notin s} y_i$
  • The predictor of T:
    $\hat{T} = \sum_{i \in s} y_i + \sum_{i \notin s} \hat{y}_i$
  • We need a predictor only for the non-observed
    units of the population
  • The form of the predictor, its unbiasedness and
    optimality depend on the model
  • The general specification of a BLU predictor, as
    well as the unbiasedness conditions, are obtained
    assuming a general regression model
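
As an illustration of the prediction form of a total, the sketch below adds the observed values to predictions for the non-observed units. It assumes the simplest possible model, a common mean within each group with constant variance, under which the group sample mean is used as the prediction; this is not necessarily the model adopted in the presentation, and the function name, group labels and values are invented.

```python
from collections import defaultdict


def predict_total(observed, non_observed_groups):
    """Observed sum plus group-mean predictions for non-observed units.

    observed: list of (group, y) pairs for units in the sample.
    non_observed_groups: list with the group label of each unit outside it.
    Raises KeyError if a non-observed unit belongs to a group with no
    observed unit, since no prediction is available in that case.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, y in observed:
        sums[group] += y
        counts[group] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    observed_total = sum(y for _, y in observed)
    predicted_total = sum(means[g] for g in non_observed_groups)
    return observed_total + predicted_total


# Toy data, invented for illustration
observed = [("small", 10.0), ("small", 14.0), ("large", 100.0)]
non_observed_groups = ["small", "small", "large"]

total = predict_total(observed, non_observed_groups)
print(total)  # 248.0
```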

40
Model Groups
  • In our case, SLL are the estimation domains: they
    form a partition of the population of N units
  • The population is to be divided into H groups
    able to explain most of the variance in Y.
  • Groups are based on size and economic activity
    according to two strategies:
  • 1 - groups are obtained by cross-classifying
    firm size (number of employees) and economic
    activity
  • 2 - economic activity defines the groups and
    size enters the model as a continuous variable
    with a different parameter for each group

41
Model Groups (cont.)
  • SLL (estimation domain) could also be a relevant
    auxiliary variable to establish the model groups
  • SLL are not relevant only if their effect is
    captured by activity sector (many SLL are
    specialised in one activity)
  • As a consequence, SLL could also be used to form
    groups

42
Superpopulation models
  • Model 1: group h (size by sector) effect, equal
    across domains
  • Model 2: group (size x sector) effect plus
    independent domain (q) effect
  • Model 3: interaction effect of size (continuous)
    with sector, plus independent domain effect
  • Model 4: interaction effect of size (continuous)
    with sector
  • Model 5: group effect (size by sector), different
    across domains
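
The formulas for the five specifications did not survive in the transcript. One possible way to write them, offered only as a sketch, is given below. The notation is assumed: $y_{hqk}$ is the value for firm $k$ in group $h$ and domain (SLL) $q$, $x_{hqk}$ is its size, $\mu$ and $\beta$ are fixed group parameters, $\delta_q$ is a domain effect and $\varepsilon_{hqk}$ is the error term. In the third and fourth lines $h$ indexes activity sector only, and whether those models also include an intercept is not stated in the slides.

```latex
\begin{align*}
\text{(1)}\quad & y_{hqk} = \mu_h + \varepsilon_{hqk} \\
\text{(2)}\quad & y_{hqk} = \mu_h + \delta_q + \varepsilon_{hqk} \\
\text{(3)}\quad & y_{hqk} = \beta_h x_{hqk} + \delta_q + \varepsilon_{hqk} \\
\text{(4)}\quad & y_{hqk} = \beta_h x_{hqk} + \varepsilon_{hqk} \\
\text{(5)}\quad & y_{hqk} = \mu_{hq} + \varepsilon_{hqk}
\end{align*}
```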
43
Superpopulation models (cont.)
  • Group effects are fixed
  • The model specification requires the definition
    of the variance and covariance structure of the
    observations
  • The simplest structure is homoskedasticity
  • We do not expect homoskedasticity: variability
    should depend on size and/or economic activity

44
Superpopulation models (cont.)
  • The analysis of the residuals from the model
    estimation for value added confirms our
    expectation
  • We concentrated on two main heteroskedasticity
    structures
  • A) variance depending on x (x: size, k: firm)
  • B) variance depending on m (m: activity sector
    or size)
  • A further alternative considers $x^2$ in A and B
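
To show why the variance assumption matters, the sketch below fits a regression through the origin, $y = \beta x + \varepsilon$, with error variance proportional to $x^\gamma$. Both the through-origin mean and the power form of the variance are assumptions made for illustration, since the slide formulas are not available. With $\gamma = 1$ the weighted least squares slope reduces to the ratio $\sum y / \sum x$; with $\gamma = 2$ it becomes the mean of the ratios $y/x$.

```python
def wls_slope(x, y, gamma=1.0):
    """Weighted least squares slope for y = beta * x, Var(error) ~ x**gamma.

    Each unit is weighted by 1 / x**gamma, so all x must be positive.
    """
    numerator = sum(xi * yi / xi**gamma for xi, yi in zip(x, y))
    denominator = sum(xi * xi / xi**gamma for xi in x)
    return numerator / denominator


# Toy data, invented for illustration: size (employees) and value added
size = [2, 4, 10]
value_added = [6, 10, 32]

slope_linear = wls_slope(size, value_added, gamma=1.0)     # sum(y) / sum(x)
slope_quadratic = wls_slope(size, value_added, gamma=2.0)  # mean of y / x
print(slope_linear)  # 3.0
```

The two slopes differ (3.0 against about 2.9 on these toy values), so predictions for non-observed firms, and hence the estimated totals, depend on which variance structure is assumed.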

45
Superpopulation models (cont.)
  • Model selection is based on the usual criteria
    for evaluating model optimality
  • - model fit indicators (AIC, likelihood ratio
    test)
  • - standard errors
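
For reference, AIC is $2p - 2\ln L$ for a model with $p$ estimated parameters and maximised likelihood $L$ (lower is better), and the likelihood ratio statistic for nested models is twice the difference in maximised log-likelihoods. The log-likelihoods, parameter counts and specification names below are invented purely to show the mechanics; they are not results from the presentation.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2p - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def likelihood_ratio_statistic(ll_restricted, ll_full):
    """LR statistic for nested models: 2 * (ln L_full - ln L_restricted)."""
    return 2 * (ll_full - ll_restricted)


# Hypothetical fits: (maximised log-likelihood, number of parameters)
candidates = {"spec_a": (-1250.0, 12), "spec_b": (-1235.0, 20)}

scores = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
best = min(scores, key=scores.get)
lr = likelihood_ratio_statistic(candidates["spec_a"][0], candidates["spec_b"][0])
print(best, scores[best], lr)  # spec_b 2510.0 30.0
```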

46
Main findings (cont.)
  • Estimation under model 5 is not suitable
    because it is possible only with a high
    aggregation of groups, which is not considered
    useful
  • The heteroskedasticity structure plays a
    fundamental role; in particular, the two
    alternatives tested are relevant.
  • When $x^2$ is considered, further improvements
    are obtained

47
Main findings (cont.)
  • 1) Under assumption (A)
  • in models 2 and 3 the SLL effect is not
    significant (according to the usual tests), hence
    models 1 and 4 (SLL not considered) should be
    preferred

48
Main findings (cont.)
  • 2) Under assumption (B)
  • a) many specifications run into estimation
    problems (the likelihood does not behave properly)
  • b) when estimation is possible, there are cases
    where the SLL effect is significant
  • c) this happens in model 3 when the variance is
    modelled by m (activity sector or size)
  • d) according to AIC, the best result is obtained
    using activity sector

49
Main findings (cont.)
  • As a consequence:
  • The models to be considered are (1) and (3), but
    looking at AIC the variance specification (A)
    gives the worse results
  • Model (3) is to be preferred, also because it
    includes a significant SLL effect

50
Conclusions
  • AIDA is the only source for estimating the main
    economic variables at sub-regional level and by
    economic activity sector
  • It is possible to reach quite a good
    approximation of national accounting variables
    and use it to obtain estimates of firms'
    performance at sub-regional and activity-sector
    level

51
Conclusions (cont.)
  • Problems of estimation due to undercoverage and
    selection bias are managed by predictive
    inference
  • The approach has been implemented with
    different model specifications and variance
    assumptions
  • Model (3) is to be preferred, under the
    assumption that variance depends on firm size and
    activity sector

52
Further work
  • Further checks of model robustness will be
    carried out
  • - Effect of influential observations
  • - Effects of different group aggregations
  • - Sensitivity analysis