Implementing Coding Tools for a New Classification

Slides: 27
Provided by: cuf5
Learn more at: https://unece.org

1
Implementing Coding Tools for a New Classification
  • John Perry, UK Office for National Statistics

2
Operation 2007 - The players
  • In the UK: the Standard Industrial
    Classification of Economic Activities (SIC)
    (current version: SIC 2003)
  • In Europe: NACE, the Nomenclature générale
    des activités économiques dans les
    Communautés européennes (current version: NACE
    Rev 1.1)
  • In the UN: ISIC, the International Standard
    Industrial Classification of all Economic
    Activities (current version: ISIC Rev
    3.1)

3
The UK SIC
  • is a 5 digit classification system
  • is required, by EU legislation, to be identical
    to NACE down to and including the 4 digit Class
    level
  • contains a national 5th digit level which does
    not exist in NACE

4
The Results: changes in structure
5
ACTR as an aid to coding
  • ACTR: Automatic Coding by Text Recognition
  • Developed by Statistics Canada
  • ONS standard tool for coding, initially industry
    and occupation
  • Replaces Precision Data Coder for industry coding
  • Determines a code from a text description
  • Extent of automation of process is controlled by
    parameters

6
Knowledge Bases: SIC2003
  • ACTR relies heavily on indexes of standard
    descriptions
  • Business descriptions from responses to the
    Business Register Survey
  • Published index for the SIC2003
  • The short descriptions for each SIC2003 code
  • Standard descriptions for construction industry
    statistics
  • Trade code descriptions for PAYE (Pay As You Earn
    Tax) employers
  • Farm type descriptions
  • With a total of more than 30,000 standard descriptions

7
How ACTR works
  • Each input description is converted to a standard
    form
  • This is compared with the standard forms of
    descriptions held in the knowledge base
  • The closeness is presented as a score between 0
    and 10
  • The system has rules to determine whether the
    score is sufficient to confirm a match
  • Requires a score of more than 7.5 to code
    automatically (our setting, which may differ for
    other data sets)
  • Lower scores are passed to interactive
    coding
  • Coding does not depend on the order in which the
    knowledge bases are checked
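
The matching steps above can be sketched in a few lines of Python. This is a toy illustration, not ACTR's actual algorithm: the stop-word list, the naive plural stripping and the Dice-style overlap score are all assumptions, chosen only so that the sketch yields standard forms like those on the "ACTR Process" slide. The 7.5 threshold is the setting quoted above.

```python
# Toy sketch of description matching; NOT the real ACTR algorithm.
STOP_WORDS = {"AND", "OF", "THE", "FOR"}  # assumed stop-word list
AUTO_CODE_THRESHOLD = 7.5  # setting quoted in the slides


def standard_form(description: str) -> list[str]:
    """Upper-case, drop stop words, strip a trailing 'S', sort the words."""
    words = []
    for raw in description.upper().split():
        word = "".join(ch for ch in raw if ch.isalpha())
        if not word or word in STOP_WORDS:
            continue
        if len(word) > 3 and word.endswith("S"):
            word = word[:-1]  # naive singularisation (assumption)
        words.append(word)
    return sorted(set(words))


def score(input_words: list[str], index_words: list[str]) -> float:
    """Dice-style word overlap scaled to 0..10 (assumed scoring rule)."""
    if not input_words or not index_words:
        return 0.0
    shared = len(set(input_words) & set(index_words))
    return 10.0 * 2 * shared / (len(input_words) + len(index_words))


def route(match_score: float) -> str:
    """Code automatically only when the score exceeds the threshold."""
    return "automatic" if match_score > AUTO_CODE_THRESHOLD else "interactive"


supplied = standard_form("Horticultural services")
entry = standard_form("Sales and service of horticultural machinery")
print(" ".join(supplied), "|", " ".join(entry))
print(route(score(supplied, entry)))
```

The overlap score here will not reproduce ACTR's own scores; it only shows how a partial match can fall short of the automatic-coding threshold and be routed to interactive coding.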

8
Extract from Business Register Survey
Questionnaire
9
(No Transcript)
10
(No Transcript)
11
(No Transcript)
12
ACTR Process
  • Supplied text: Horticultural services
  • Standard form: HORTICULTURAL SERVICE
  • Best fit index entry: Sales and service of
    horticultural machinery
  • Standard form: HORTICULTURAL MACHINERY SALE SERVICE
  • Score is 6.911 (out of 10)
  • ACTR prefers SIC 2003 code: 51880 (Wholesale of
    agricultural machinery and accessories)

13
(No Transcript)
14
Interactive coding
  • Scores below 7.5 are passed to clerical staff for
    coding interactively
  • The system presents options in descending order
    of score
  • If none of the choices appears suitable, staff modify
    the description
  • Once a decision is made, the person coding
    confirms the choice
  • The index description is then held on the IDBR.
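
A minimal sketch of how options might be ordered for the person coding, assuming each candidate is held as a description, a code and a score. The `Candidate` type and `options_for_coder` function are hypothetical names; only the first candidate and its score of 6.911 come from the slides, and the second score is invented for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    description: str
    sic2003: str
    score: float


def options_for_coder(candidates: list[Candidate], limit: int = 5) -> list[Candidate]:
    """Return the best-scoring candidates first, at most `limit` of them."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:limit]


candidates = [
    # Invented score, for illustration only.
    Candidate("Growing of vegetables etc", "01120", 4.2),
    # Entry and score taken from the "ACTR Process" slide.
    Candidate("Sales and service of horticultural machinery", "51880", 6.911),
]

for option in options_for_coder(candidates):
    print(f"{option.score:6.3f}  {option.sic2003}  {option.description}")
```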

15
Introducing the SIC2007 (NACE Rev 2)
  • New index files
  • SIC2007 headings
  • SIC2007 index
  • Initially code forward from the SIC2003 using
    bridging codes: these are codes for each
    knowledge base entry that link the SIC2003 and
    SIC2007
  • Later will change to code backwards from the
    SIC2007
  • Eventually dual coding will cease

16
Impact of ACTR on IDBR at Micro Level
  • Existing SIC 2003 is 01120 (Growing of vegetables
    etc)
  • The preferred ACTR SIC 2003 is 51880 (Wholesale
    of agricultural machinery and accessories)
  • The SIC 2007 comes from the bridging code:
  • SIC 2003: 51880
  • Bridging code: MTOLR
  • SIC 2007: 46610
  • SIC 2003 code will change but only when agreed
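
The bridging-code lookup above can be pictured as a small table keyed by bridging code. The sketch below is an assumption about layout only: the dictionary structure and the `dual_code` function are hypothetical, and the single entry is the example from this slide.

```python
# Bridging code -> (SIC 2003 code, SIC 2007 code).
# The one entry is the example from the slides; the layout is assumed.
BRIDGING_CODES = {
    "MTOLR": ("51880", "46610"),
}


def dual_code(bridging_code: str) -> dict[str, str]:
    """Return both classification codes linked by one bridging code."""
    sic2003, sic2007 = BRIDGING_CODES[bridging_code]
    return {"sic2003": sic2003, "sic2007": sic2007}


print(dual_code("MTOLR"))
```

Because each knowledge base entry carries one bridging code, a single match yields both the SIC 2003 and the SIC 2007 code.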

17
Conversion to SIC2007
  • ACTR will deal with units that have a suitable
    business description
  • Conversion tables will deal with:
  • Units with descriptions that ACTR is unable to
    code (vague descriptions)
  • Units without a description
  • Units supplied through administrative sources
    (existing VAT traders, PAYE employers, Registered
    Companies)

18
Creation of Conversion Tables
  • Tables have been created to convert units from
    SIC2003 to SIC2007
  • Using ACTR bridging codes
  • Coding existing data through ACTR
  • Producing cross-tabulation of SIC2003 to SIC2007
  • Allocating on a probability basis rounded to
    nearest 5
  • Validate relationships against the acceptable
    range of industries
  • Best fit tables also produced for users who
    cannot accommodate probability based conversion
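
The table-building steps above can be sketched as follows, assuming that "rounded to nearest 5" means percentages rounded to the nearest 5 percent. The codes are placeholders and the counts are made up for illustration; the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up dual-coded units: (old code, new code). Placeholder codes only.
dual_coded = (
    [("2003-A", "2007-X")] * 62
    + [("2003-A", "2007-Y")] * 30
    + [("2003-A", "2007-Z")] * 8
)


def cross_tabulate(pairs):
    """Count units for every (old code, new code) combination."""
    table = defaultdict(Counter)
    for old, new in pairs:
        table[old][new] += 1
    return table


def probability_table(table, step=5):
    """Percentage of each old code going to each new code, rounded to `step`."""
    result = {}
    for old, counts in table.items():
        total = sum(counts.values())
        result[old] = {
            new: step * round(100 * n / total / step) for new, n in counts.items()
        }
    return result


def best_fit_table(table):
    """Single most frequent new code for each old code."""
    return {old: counts.most_common(1)[0][0] for old, counts in table.items()}


crosstab = cross_tabulate(dual_coded)
print(probability_table(crosstab))
print(best_fit_table(crosstab))
```

Rounded percentages need not sum to 100 in general, so a production table would need a rule for reconciling them; that step is not described in the slides and is left out here.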

19
Coding process
20
Impact on the IDBR at the Macro Level
  • Impact on SIC 2003 is only on those reporting
    units that have business descriptions for local
    units, where ACTR can code.
  • ACTR codes: 620,000
  • ACTR does not code: 210,000
  • No business description: 340,000
  • Administrative data only: 1,660,000
  • Total local units: 2,830,000
  • SIC 2007 comes from the bridging codes only where
    ACTR codes; otherwise SIC 2007 comes from
    conversion from SIC 2003
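
As a quick check on the figures above, the four categories can be summed and expressed as shares of all local units. The counts are those quoted on this slide; the variable names are illustrative.

```python
# Local unit counts as quoted on the slide.
local_units = {
    "ACTR codes": 620_000,
    "ACTR does not code": 210_000,
    "No business description": 340_000,
    "Administrative data only": 1_660_000,
}

total = sum(local_units.values())
shares = {name: round(100 * count / total, 1) for name, count in local_units.items()}

print(f"Total local units: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The categories add up to the stated total of 2,830,000, and only the units that ACTR codes (a little over a fifth) receive their SIC 2007 code from bridging codes; the rest rely on the conversion tables.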

21
SIC 2003
  • A AGRICULTURE, HUNTING AND FORESTRY
  • B FISHING
  • C MINING AND QUARRYING
  • D MANUFACTURING
  • E ELECTRICITY, GAS AND WATER SUPPLY
  • F CONSTRUCTION
  • G WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR
    VEHICLES
  • H HOTELS AND RESTAURANTS
  • I TRANSPORT, STORAGE AND COMMUNICATION
  • J FINANCIAL INTERMEDIATION
  • K REAL ESTATE, RENTING AND BUSINESS ACTIVITIES
  • L PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY
    SOCIAL
  • M EDUCATION
  • N HEALTH AND SOCIAL WORK
  • O OTHER COMMUNITY, SOCIAL AND PERSONAL SERVICE
    ACTIVITIES
  • P PRIVATE HOUSEHOLDS EMPLOYING STAFF AND
    UNDIFFERENTIATED
  • Q EXTRA-TERRITORIAL ORGANISATION AND BODIES

22
Impact at SIC 2003 broad industry level
(provisional counts)
23
SIC 2007
  • A Agriculture, Forestry And Fishing
  • B Mining And Quarrying
  • C Manufacture
  • D Electricity, Gas, Steam And Air Conditioning
    Supply
  • E Water Supply; Sewage, Waste Management And
    Remediation Activities
  • F Construction
  • G Wholesale And Retail Trade; Repair Of Motor
    Vehicles And Motorcycles
  • H Transportation And Storage
  • I Accommodation And Food Service Activities
  • J Information And Communication
  • K Financial And Insurance Activities
  • L Real Estate Activities
  • M Professional, Scientific And Technical
    Activities
  • N Administrative And Support Service Activities
  • O Public Administration And Defence; Compulsory
    Social Security
  • P Education
  • Q Human Health And Social Work Activities
  • R Arts, Entertainment And Recreation
  • S Other Service Activities
  • T Activities Of Households
  • U Activities Of Extraterritorial Organisations
    And Bodies

24
Correspondence between SIC 2003 and SIC 2007 for
local units coded by ACTR
25
Implementation timetable
26
Conclusions
  • The ACTR tool delivers considerable savings in
    terms of cost and burden on businesses compared
    to traditional survey approaches.
  • The knowledge base is portable (i.e. independent
    of the coding engine), so it can be shared with
    any interested parties, e.g. administrative data
    suppliers, to increase the consistency of coding.
  • The use of bridging codes permits simultaneous
    coding to multiple classification systems,
    essential if periods of dual-coding are required.
  • The knowledge base approach can help to inform
    the development of future versions of a
    classification, by providing a reference frame of
    business activity descriptions.