1
Quality Challenges in Processing Administrative
Data to Produce Short-term Labour Cost Statistics
M. Carla Congia, Silvia Pacini, Donatella Tuzi
(tuzi_at_istat.it) Istat - Italy
European Conference on Quality 2008 in Official
Statistics, Session on Administrative data.
Rome, 8-11 July 2008
2
Administrative data Session
Presentation Outline
  • The Italian Oros Survey
  • The peculiarities of the administrative source
    used
  • The quality strategy in a context of timely and
    extensive use of administrative data
  • Final remarks

Q2008. Rome, 8-11 July 2008
3
The Oros Survey
Since 2003 the Italian NSI has released quarterly
indicators on gross wages and total labour cost
(Oros Survey) covering enterprises of all sizes in
the private non-agricultural sector. Indices are
released 70 days after the end of the reference
quarter. In the past this information was
collected monthly only for large firms, through
the Survey on Large Enterprises (> 500
employees). The Oros Survey was planned to fill
this gap in Italian statistics, using
administrative data (declarations of employees'
social contributions to the National Social
Security Institute, INPS) for Small and Medium
Enterprises, integrated with the survey data on
Large Enterprises (LES).
Today the Oros Survey is an innovative example in
Italy of administrative data used extensively to
produce timely business statistics.
4
The Administrative Sources
All Italian non-agricultural firms in the private
sector, with at least one employee (roughly 12
million employees and 1.3 million employers per
year) have to pay monthly social security
contributions to INPS.
  • INPS administrative register (AR): contains
    structural information for each administrative
    unit (administrative id., fiscal code, name,
    legal form, dates of registration and
    cancellation, etc.). About 4 million records each
    quarter. Transmitted at the end of the reference
    quarter.
  • Employers' monthly declaration (DM10 form): a
    highly detailed grid organized in administrative
    codes, with information on employment by type,
    paid days, wage bills, social contributions,
    credit terms and tax reliefs. Each DM10 is spread
    over several records (on average 8 records per
    unit). About 10 million records each month.
    Transmitted 35 days after the end of the
    reference quarter.
5
Peculiarities of the Administrative Source
  • Unlike survey data, the use of an
    administrative source:
  • reduces the financial costs of direct
    collection and avoids further response burden on
    enterprises
  • satisfies the growing demand for timely and
    detailed statistical information, for multiple
    statistical aims.
  • Yet data collection is beyond the control of the
    NSI, which therefore needs information about the
    quality of the administrative data used.
  • Close relationships and coordination with the
    administrative institutions help to reduce the
    risk of data quality problems arising from
    dependence on the data supplier.
  • In this respect, the Oros Survey does not differ
    from other register-based statistics.

6
Peculiarities of the Administrative Source (2)
  • What makes the Oros Survey peculiar with respect
    to other register-based statistics is the
    timeliness of its release, which obliges Istat to
    acquire data without any prior checks or
    aggregation (completely raw). This raises unusual
    statistical quality issues:
  • the processing of a huge quantity of complex
    data in a very short time
  • the lack of standardized metadata to translate
    administrative information
  • the continuous changes in administrative
    definitions and concepts.
  • The acquisition of raw information allows Istat
    to monitor most of the processing aspects, but
    hard work is needed to guarantee a high standard
    of quality.
  • A pervasive quality strategy has been
    implemented, covering the whole Oros production
    process.

7
The Quality Strategy in the Oros Production
Process
8
The Administrative Register
  • The AR is used as a representation of the current
    population.
  • However:
  • it suffers from over-coverage problems (temporary
    suspensions and firm closures are
    under-recorded)
  • the economic activity code is drawn from the
    Italian Business Register (BR) (90% of the Oros
    active units)
  • hard work is needed to outline the estimation
    frame (exclusion of units not belonging to the
    Oros target population)
  • special attention is paid to the quality of the
    fiscal code as the leading matching variable.

9
Preliminary Checks and Retrieval of the
Statistical Variables
Meta-information on laws, regulations,
contribution rates, codes and other technical
aspects of Social Security is collected promptly
and kept up to date in a standardized metadata
database built in-house. It is necessary to carry
out:
  • preliminary checks on raw data and correction of
    errors in codes, duplicated records and
    inconsistencies with current legislation
  • translation of the administrative data into
    statistical variables, through complex additions
    and subtractions of a huge number of wage and
    contribution items identified by numerous
    administrative codes (currently more than 5,000)
  • estimation of some components for which
    information is not available in the
    administrative form (e.g. employers' injury
    insurance premiums and severance payments).

In this step each DM10 is reorganized into 1 record.
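
The translation step above can be sketched as a signed lookup followed by a sum per unit. This is only an illustration: the three administrative codes, the variable names and the function collapse_dm10 are invented here, while the real metadata database holds more than 5,000 codes.

```python
from collections import defaultdict

# Hypothetical metadata: administrative code -> (statistical variable, sign).
# These codes are invented for illustration only.
CODE_MAP = {
    "W001": ("wages", +1),
    "C010": ("social_contributions", +1),
    "R020": ("social_contributions", -1),  # a rebate is subtracted
}

def collapse_dm10(records):
    """Collapse (unit_id, code, amount) records into one record per unit."""
    units = defaultdict(lambda: defaultdict(float))
    unknown = []
    for unit_id, code, amount in records:
        if code not in CODE_MAP:
            # Codes missing from the metadata are set aside for a preliminary check.
            unknown.append((unit_id, code))
            continue
        variable, sign = CODE_MAP[code]
        units[unit_id][variable] += sign * amount
    return {u: dict(v) for u, v in units.items()}, unknown

records = [
    ("A1", "W001", 10000.0),
    ("A1", "C010", 3000.0),
    ("A1", "R020", 500.0),
    ("B2", "W001", 2000.0),
    ("B2", "X999", 50.0),
]
collapsed, unknown = collapse_dm10(records)
print(collapsed)
print(unknown)
```

Keeping the signs in the lookup table rather than in the code means that a change in legislation only requires an update of the metadata.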
10
Treatment of Measurement Errors
Once the statistical data are available, a more
traditional micro editing procedure is set up,
but given the huge number of units it relies
heavily on selective criteria. A score function
assigns to each of the 1.3 million units the
probability that an error occurs in the target
variables.
Cut-off thresholds are fixed to select anomalous
values, but their identification is strongly
affected by the significant tails in the
distribution of the target variables:
  • very low per capita wages (e.g. units with only
    supplementary earnings)
  • negative per capita other labour costs (e.g.
    social contribution rebates).
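
A minimal sketch of selective editing with a cut-off threshold follows. The Oros score function is not described in the slides, so a robust distance from the median (in units of the median absolute deviation) is swapped in as an assumption. The function names, the threshold of 5 and the unit values are illustrative.

```python
from statistics import median

def robust_scores(values):
    """Distance from the median, in units of the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    return [abs(v - med) / mad for v in values]

def select_for_review(units, threshold=5.0):
    """units: list of (unit_id, per_capita_value); return ids above the cut-off."""
    scores = robust_scores([value for _, value in units])
    return [uid for (uid, _), score in zip(units, scores) if score > threshold]

# Illustrative per capita other labour costs (euro), not real Oros records.
units = [("u1", 430.0), ("u2", 450.0), ("u3", 410.0), ("u4", 470.0),
         ("u5", 440.0), ("u6", 6900.0), ("u7", -1350.0)]
flagged = select_for_review(units)
print(flagged)
```

Flagged units are only candidates for review: as noted above, a negative value may be a legitimate rebate rather than an error.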

11
Figure 1. Distribution of the per capita other
labour costs (euro values) in the Oros
manufacturing small and medium enterprises, July
2007.
Mean 450, Median 430, Max 6,900, Min -1,350
12
Treatment of Measurement Errors (2)
The edit and imputation rules are based on known
functional relations among the analyzed variables.
They aim to evaluate and maintain, at unit record
level, both cross-sectional and longitudinal
consistency, using information on the closest
months.
The number of monthly edits is generally not high,
but even a single oversight may have a significant
effect.
Example: quarterly changes of the Oros wage index
in the Wholesale and retail trade sector (G). In
the third quarter of 2007 the number of employees
of one unit was affected by a measurement error:
73,000 part-time workers were reported, and the
imputed value is 2. Left uncorrected, the error
would have implied a change of 0.8% instead of 3%.
This step is mainly interactive. Given the nature
of the data, experience has shown that automatic
corrections are best avoided.
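
A longitudinal consistency check of the kind described above can be sketched as a comparison with the closest months. The values 73,000 and 2 come from the example; the other monthly values, the function name longitudinal_suspect and the ratio of 10 are assumptions.

```python
from statistics import median

def longitudinal_suspect(series, month, max_ratio=10.0):
    """Flag series[month] if it is far from the median of the other months."""
    others = [value for m, value in series.items() if m != month]
    reference = median(others)
    value = series[month]
    if reference == 0 or value == 0:
        return value != reference
    ratio = max(value, reference) / min(value, reference)
    return ratio > max_ratio

# Part-time workers of one unit; all months except 2007-08 are assumed values.
part_time = {"2007-06": 2, "2007-07": 2, "2007-08": 73000, "2007-09": 2}
suspect = longitudinal_suspect(part_time, "2007-08")
print(suspect)
```

The function only flags the value. In line with the interactive approach described above, the correction is left to a reviewer.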
13
Treatment of Non-response Errors
In the Oros Survey non-responses are units
delivering the DM10 with a delay. Nevertheless,
95-98% of the Oros population is represented in
the preliminary administrative data. Given the
tested missing-at-random (MAR) nature of the
missing units and their limited number in the
preliminary data, they do not significantly
affect the Oros wage and other labour cost
changes.
Units belonging to Temporary Employment Agencies
(TEA) are an exception, because of their strong
characterization.
They are about 100 units, accounting for 3% of
total employment in the private sector (20% in
sector K - Real estate, renting and business
activities). The absence of even a few of these
units may significantly affect changes in the per
capita indicators.
14
Treatment of Non-response Errors (2)
  • Singling out non-responding TEA units is not an
    easy task:
  • the population under study is represented by the
    current AR, which suffers from over-coverage
    problems (a list of respondents is not
    available). It follows that the active status of
    each unit must be predicted through a
    longitudinal analysis of its activity in the
    nearby quarters
  • given the strongly dynamic nature of TEA, hard
    work is necessary to follow their frequent
    changes (e.g. mergers, split-ups, etc.) over time
    and to separate real non-responses from
    non-active units.

Imputation of missing data is deterministic and
largely based on past information on
non-respondents and panel information on the
current respondents.
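
One possible deterministic rule consistent with this description is to carry forward the previous value of a non-respondent, adjusted by the growth observed in the panel of current respondents. The exact Oros rule is not given in the slides, so the function impute_missing, the unit ids and the figures below are assumptions. The sketch also assumes that the missing unit has already been judged active.

```python
def impute_missing(previous, current):
    """previous/current: dict unit_id -> employment.

    Units present in `previous` but absent from `current` are imputed as
    their previous value times the growth rate of the respondent panel.
    """
    panel = [u for u in current if u in previous]
    growth = sum(current[u] for u in panel) / sum(previous[u] for u in panel)
    imputed = {u: previous[u] * growth for u in previous if u not in current}
    return growth, imputed

# Hypothetical TEA units; tea3 has not yet delivered its declaration.
previous = {"tea1": 1000.0, "tea2": 2000.0, "tea3": 500.0}
current = {"tea1": 1100.0, "tea2": 2200.0}
growth, imputed = impute_missing(previous, current)
print(growth, imputed)
```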
15
Integration with Survey Data on Large Enterprises
In the Oros estimates special attention is given
to Large Enterprises (LE, firms with more than
500 employees). In the Italian non-agricultural
sector LE account for about 1,000 units employing
2 million workers.
  • In the past, integration of survey data on LE
    was strongly motivated by the insufficient
    representation of these units in the preliminary
    administrative data.
  • Nowadays the INPS source guarantees good
    coverage of these units but, as experience has
    suggested, the use of the statistical source
    provides higher quality data:
  • enterprises can be re-contacted in case of
    non-response or suspected measurement errors
  • more rapid and efficient management of the
    frequent legal changes these units are subject
    to (e.g. mergers, split-ups, acquisitions, etc.).

16
Integration with Survey Data on Large Enterprises
(2)
  • Combining survey and administrative data
    involves specific quality aspects:
  • harmonisation of variables
  • record matching: the fiscal code is the main
    linking variable, but ambiguities may arise
    because of formal errors or different updating
    times in the two sources (mergers, hive-offs and
    split-ups might be recorded in different
    periods). Great effort goes into avoiding
    omissions and duplications, using supplementary
    information (legal name, number of employees,
    etc.).

About 12% of LES employment is manually reviewed
and matched to the corresponding administrative
firms.
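
The matching logic can be sketched as an exact join on a normalized fiscal code, with every ambiguous or unmatched case routed to manual review. The record layout, the fiscal codes and the firm names below are invented, and the real procedure also uses supplementary information such as the number of employees.

```python
def normalize(code):
    """Remove formal differences such as surrounding spaces and letter case."""
    return code.strip().upper()

def match_records(survey, admin):
    """survey/admin: lists of dicts with 'fiscal_code' and 'name' keys."""
    index = {}
    for rec in admin:
        index.setdefault(normalize(rec["fiscal_code"]), []).append(rec)
    matched, review = [], []
    for rec in survey:
        hits = index.get(normalize(rec["fiscal_code"]), [])
        if len(hits) == 1:
            matched.append((rec["name"], hits[0]["name"]))
        else:
            # No candidate or several candidates: left to manual review.
            review.append(rec["name"])
    return matched, review

# Hypothetical records, for illustration only.
survey = [
    {"fiscal_code": " 01234567890", "name": "Alfa"},
    {"fiscal_code": "09876543210", "name": "Beta"},
    {"fiscal_code": "05555555555", "name": "Gamma"},
]
admin = [
    {"fiscal_code": "01234567890", "name": "ALFA SPA"},
    {"fiscal_code": "09876543210", "name": "BETA SRL"},
    {"fiscal_code": "09876543210", "name": "BETA SRL (old)"},
]
matched, review = match_records(survey, admin)
print(matched, review)
```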
17
Checks on Macro Data
Final checks on macro data are a key step towards
the quality target: they identify possible
residual errors that may affect the estimates.
These checks are mainly based on:
  • analytic and graphical inspection of the time
    series at sub-population level: pre-defined
    statistical measures must stay within acceptance
    boundaries
  • automatic detection of outliers based on TERROR,
    an application of the software TRAMO-SEATS, where
    the detection of suspected errors is based on
    REG-ARIMA model estimates
  • comparison with figures from other statistical
    sources (e.g. National Accounts, Indices of wages
    according to collective agreements, etc.)
  • variable relationships, whose coherence has to
    be guaranteed (e.g. the ratio of other labour
    costs to wages, etc.).

If any error is detected, a drill-down to micro
data may be necessary.
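
The coherence check on variable relationships can be sketched as a test of a ratio against acceptance boundaries. This does not reproduce TERROR or any REG-ARIMA modelling. The boundaries, the aggregate figures and the function out_of_bounds are assumptions.

```python
def out_of_bounds(series, lower, upper):
    """Return the periods whose value falls outside [lower, upper]."""
    return [period for period, value in series.items()
            if not lower <= value <= upper]

# Hypothetical per capita aggregates (euro) for one sub-population.
wages = {"2007Q1": 1000.0, "2007Q2": 1020.0, "2007Q3": 1030.0}
other_costs = {"2007Q1": 400.0, "2007Q2": 410.0, "2007Q3": 620.0}

# Ratio of other labour costs to wages, checked against assumed boundaries.
ratio = {q: other_costs[q] / wages[q] for q in wages}
suspects = out_of_bounds(ratio, lower=0.35, upper=0.45)
print(suspects)
```

A period returned by the check is a starting point for the drill-down to micro data.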
18
Internal Oros Quality Reporting
  • The quarterly documentation and updating of the
    Oros production process is a fundamental task in
    the general quality strategy:
  • metadata are archived
  • methodological information is documented
  • imputed data are flagged (and pre-imputation
    data are archived)
  • quality indicators on the impact of imputation
    are calculated.

The documentation of the Oros process guarantees
its reproducibility and repeatability.
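
Two simple indicators of the impact of imputation can be computed from the flags: the share of imputed units and the share of the aggregate accounted for by imputed values. The slides do not define the Oros indicators, so these definitions, the function name and the figures are assumptions.

```python
def imputation_indicators(records):
    """records: list of (final_value, imputed_flag) pairs for one variable."""
    total = sum(value for value, _ in records)
    imputed_total = sum(value for value, flag in records if flag)
    rate = sum(1 for _, flag in records if flag) / len(records)
    share = imputed_total / total
    return rate, share

# Hypothetical final values with their imputation flags.
records = [(100.0, False), (50.0, True), (250.0, False), (100.0, False)]
rate, share = imputation_indicators(records)
print(rate, share)
```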
19
Final Remarks
  • The Oros Survey was:
  • developed without any previous experience in the
    use of administrative data for the production of
    short-term official statistics
  • gradually implemented, learning by doing.
  • High timeliness, frequent changes in Social
    Security laws and regulations, and highly
    detailed raw data imply relevant and unusual
    quality problems, managed through:
  • close relationships and coordination with the
    administrative institution
  • a pervasive quality strategy along the whole
    production process
  • highly skilled human resources to handle the
    wide-ranging and non-conventional processing
    aspects, which are subject to frequent
    modifications
  • systematic documentation of the production steps.

Less standardizable than a traditional survey
quality strategy?
20
References
Baldi C., Ceccato F., Cimino E., Congia M.C.,
Pacini S., Rapiti F., Tuzi D. (2004) Use of
Administrative Data to produce Short Term
Statistics on Employment, Wages and Labour Cost.
Essays, n. 15/2004, Istat, Rome.
Caporello G., Maravall A. (2002) A tool for
quality control of time series data. Program
TERROR. Bank of Spain.
Eurostat (2003) Quality assessment of
administrative data for statistical purposes.
Doc. Eurostat/A4/Quality/03/item6, available on
the web site
http://epp.eurostat.ec.europa.eu/pls/portal/docs/PAGE/PGP_DS_QUALITY/TAB47141301/DEFINITION_2.PDF
Istat, CBS, SFSO, Eurostat (2007) Recommended
Practices for Editing and Imputation in
Cross-Sectional Business Surveys, available on
the web site
http://edimbus.istat.it/dokeos/document/document.php?openDir2FRPM_EDIMBUS
Thank you for your attention
Donatella Tuzi, tuzi_at_istat.it