Implementing Coding Tools for a New Classification - PowerPoint PPT Presentation

Title: Implementing Coding Tools for a New Classification Author: cuffe Last modified by: maurizio Created Date: 8/8/2006 1:14:00 PM Document presentation format – PowerPoint PPT presentation

Slides: 27
Provided by: cuffe
Transcript and Presenter's Notes


1
Implementing Coding Tools for a New Classification
  • John Perry, UK Office for National Statistics

2
Operation 2007 - The players
  • In the UK: the Standard Industrial
    Classification of Economic Activities (SIC)
    (current version: SIC 2003)
  • In Europe: NACE, the Nomenclature générale
    des activités économiques dans les
    Communautés européennes (current version: NACE
    Rev 1.1)
  • In the UN: ISIC, the International Standard
    Industrial Classification of all Economic
    Activities (current version: ISIC Rev
    3.1)

3
The UK SIC
  • is a 5 digit classification system
  • is required, by EU legislation, to be identical
    to NACE down to and including the 4 digit Class
    level
  • contains a national 5th digit level which does
    not exist in NACE

4
The Results: changes in structure
                          SIC 2003   SIC 2007
NACE Classes                   514        615
NACE Classes not split         414        537
UK Sub Class splits            285        191
Total Sub Classes              699        728
5
ACTR as an aid to coding
  • ACTR: Automatic Coding by Text Recognition
  • Developed by Statistics Canada
  • ONS standard tool for coding, initially industry
    and occupation
  • Replaces Precision Data Coder for industry coding
  • Determines a code from a text description
  • Extent of automation of process is controlled by
    parameters

6
Knowledge Bases SIC2003
  • ACTR relies heavily on indexes of standard
    descriptions
  • Business descriptions from responses to the
    Business Register Survey
  • Published index for the SIC2003
  • The short descriptions for each SIC2003 code
  • Standard descriptions for construction industry
    statistics
  • Trade code descriptions for PAYE (Pay As You Earn
    Tax) employers
  • Farm type descriptions
  • With a total of more than 30,000 standard descriptions

7
How ACTR works
  • Each input description is converted to a standard
    form
  • This is compared with the standard forms of
    descriptions held in the knowledge base
  • The closeness is presented as a score between 0
    and 10
  • The system has rules to determine whether the
    score is sufficient to confirm a match
  • Requires a score of more than 7.5 to code
    automatically (our setting which may differ for
    other data sets)
  • Lower scores are passed through interactive
    coding
  • Coding does not depend on the order in which the
    knowledge bases are checked
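
The steps above can be sketched in a few lines of Python. This is only a toy illustration, not ACTR's actual algorithm: the stop-word list, the trailing-S stemming rule and the Dice-overlap score are all assumptions made for the sketch, and the function names are hypothetical. Only the 0 to 10 scale and the 7.5 threshold come from the slide.

```python
import re

# Assumed stop-word list and threshold name; only the 7.5 value comes from the slide.
STOP_WORDS = {"AND", "OF", "THE", "FOR", "IN"}
AUTO_CODE_THRESHOLD = 7.5


def standardise(description):
    """Upper-case, drop stop words, strip a trailing S, and sort the words."""
    words = re.findall(r"[A-Z]+", description.upper())
    stems = {
        w[:-1] if w.endswith("S") and len(w) > 3 else w
        for w in words
        if w not in STOP_WORDS
    }
    return tuple(sorted(stems))


def closeness(a, b):
    """Toy 0-10 score: Dice overlap of two standard forms (not ACTR's formula)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return 10.0 * 2 * len(sa & sb) / (len(sa) + len(sb))


def code_description(text, knowledge_base):
    """Return (best_code, best_score, route) for one input description."""
    query = standardise(text)
    best_code, best_score = None, 0.0
    for entry, code in knowledge_base.items():
        s = closeness(query, standardise(entry))
        if s > best_score:
            best_code, best_score = code, s
    route = "automatic" if best_score > AUTO_CODE_THRESHOLD else "interactive"
    return best_code, best_score, route


# Two sample index entries, used here purely for illustration.
knowledge_base = {
    "Sales and service of horticultural machinery": "51880",
    "Growing of vegetables": "01120",
}
result = code_description("Horticultural services", knowledge_base)
print(result[0], round(result[1], 3), result[2])
```

With these sample entries the toy score falls below the threshold, so the description is routed to interactive coding. The real ACTR score for the same pair will differ, because its matching rules are not reproduced here.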

8
Extract from Business Register Survey
Questionnaire
9
(No Transcript)
10
(No Transcript)
11
(No Transcript)
12
ACTR Process
  • Supplied text: Horticultural services
  • HORTICULTURAL SERVICE
  • Best fit index entry: Sales and service of
    horticultural machinery
  • HORTICULTURAL MACHINERY SALE SERVICE
  • Score is 6.911 (out of 10)
  • ACTR prefers SIC 2003 code 51880 (Wholesale of
    agricultural machinery and accessories)

13
(No Transcript)
14
Interactive coding
  • Scores below 7.5 are passed to clerical staff for
    coding interactively
  • The system presents options in descending order
    of score
  • If none of the choices appears good, staff modify
    the description
  • Once a decision is made, the person coding
    confirms the choice
  • The index description is then held on the IDBR.
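
A minimal sketch of the interactive step, assuming each candidate is a small record with a description, a code and a score. The record layout and both function names are hypothetical, and the 4.0 score is made up for the example. Only the descending ordering and the idea of keeping the chosen index description come from the slide.

```python
def rank_candidates(candidates, limit=5):
    """Return candidates sorted by score, highest first, truncated to `limit`."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:limit]


def confirm_choice(candidate):
    """Build the record kept once the coder confirms: index description and code."""
    return {
        "index_description": candidate["description"],
        "code": candidate["code"],
    }


candidates = [
    # Hypothetical score, for illustration only.
    {"description": "Growing of vegetables", "code": "01120", "score": 4.0},
    {
        "description": "Sales and service of horticultural machinery",
        "code": "51880",
        "score": 6.911,
    },
]

options = rank_candidates(candidates)  # shown to the coder, best score first
record = confirm_choice(options[0])    # the coder accepts the top option
print(record)
```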

15
Introducing the SIC2007 (NACE Rev 2)
  • New index files
  • SIC2007 headings
  • SIC2007 index
  • Initially code forward from the SIC2003 using
    bridging codes: these are codes for each
    knowledge base entry that link the SIC2003 and
    SIC2007
  • Later will change to code backwards from the
    SIC2007
  • Eventually dual coding will cease

16
Impact of ACTR on IDBR at Micro Level
  • Existing SIC 2003 is 01120 (Growing of vegetables
    etc)
  • The preferred ACTR SIC 2003 is 51880 (Wholesale
    of agricultural machinery and accessories)
  • The SIC 2007 comes from the bridging code
  • SIC 2003: 51880
  • Bridging code: MTOLR
  • SIC 2007: 46610
  • SIC 2003 code will change but only when agreed
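
The lookup in this example can be sketched as a table keyed on the SIC 2003 code and the bridging code. The slides do not describe how bridging codes are stored, so the keyed-dictionary layout and the names `BRIDGE` and `forward_code` are assumptions. The single entry is the one quoted above.

```python
# Assumed layout: (SIC 2003 code, bridging code) -> SIC 2007 code.
BRIDGE = {
    ("51880", "MTOLR"): "46610",
}


def forward_code(sic2003, bridging_code):
    """Return the SIC 2007 code for a knowledge base entry, or None if unmapped."""
    return BRIDGE.get((sic2003, bridging_code))


print(forward_code("51880", "MTOLR"))
```

Returning `None` for an unmapped pair leaves the unit to be handled by another route rather than guessing a code.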

17
Conversion to SIC2007
  • ACTR will deal with units that have a suitable
    business description
  • Conversion tables will deal with:
  • Units with descriptions that ACTR is unable to
    code (vague descriptions)
  • Units without a description
  • Units supplied through administrative sources
    (existing VAT traders, PAYE employers, Registered
    Companies)

18
Creation of Conversion Tables
  • Tables have been created to convert units from
    SIC2003 to SIC2007
  • Using ACTR bridging codes
  • Coding existing data through ACTR
  • Producing cross-tabulation of SIC2003 to SIC2007
  • Allocating on a probability basis rounded to
    nearest 5
  • Validate relationships against the acceptable
    range of industries
  • Best fit tables also produced for users who
    cannot accommodate probability based conversion
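
One way to picture the table-building steps is the sketch below. It assumes that "rounded to nearest 5" means percentage points, which the slide does not state, and it uses placeholder codes rather than real SIC codes. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_conversion_table(pairs):
    """Cross-tabulate (sic2003, sic2007) pairs into rounded percentage shares."""
    crosstab = defaultdict(Counter)
    for old, new in pairs:
        crosstab[old][new] += 1

    table = {}
    for old, counts in crosstab.items():
        total = sum(counts.values())
        # Assumption: shares are percentages rounded to the nearest 5 points.
        table[old] = {
            new: 5 * round(100 * n / total / 5) for new, n in counts.items()
        }
    return table


def best_fit(table):
    """Pick the single most likely SIC 2007 code for each SIC 2003 code."""
    return {old: max(shares, key=shares.get) for old, shares in table.items()}


# Placeholder codes standing in for units already dual coded through ACTR.
pairs = [("old1", "newA")] * 7 + [("old1", "newB")] * 3 + [("old2", "newC")] * 4

table = build_conversion_table(pairs)
print(table)
print(best_fit(table))
```

Rounded shares need not sum to exactly 100 for every code, so a real table would need a rule for reconciling them. The validation against an acceptable range of industries is not shown here.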

19
Coding process
20
Impact on the IDBR at the Macro Level
  • Impact on SIC 2003 is only on those reporting
    units that have business descriptions for local
    units, where ACTR can code.
  • ACTR codes: 620,000
  • ACTR does not code: 210,000
  • No business description: 340,000
  • Administrative data only: 1,660,000
  • Total local units: 2,830,000
  • SIC 2007 comes from the bridging codes only where
    ACTR codes; otherwise SIC 2007 comes from
    conversion from SIC 2003
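
As a quick check on the figures above, the four groups can be summed and expressed as shares of all local units. The counts are the ones on the slide. The variable names are just for this sketch.

```python
counts = {
    "ACTR codes": 620_000,
    "ACTR does not code": 210_000,
    "No business description": 340_000,
    "Administrative data only": 1_660_000,
}

total = sum(counts.values())
shares = {group: round(100 * n / total, 1) for group, n in counts.items()}

print(total)
for group, share in shares.items():
    print(f"{group}: {share}%")
```

The four groups add up to the 2,830,000 total. On these counts roughly a fifth of local units take their SIC 2007 code from the bridging codes, and the rest rely on conversion from SIC 2003.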

21
SIC 2003
A AGRICULTURE, HUNTING AND FORESTRY
B FISHING
C MINING AND QUARRYING
D MANUFACTURING
E ELECTRICITY, GAS AND WATER SUPPLY
F CONSTRUCTION
G WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES
H HOTELS AND RESTAURANTS
I TRANSPORT, STORAGE AND COMMUNICATION
J FINANCIAL INTERMEDIATION
K REAL ESTATE, RENTING AND BUSINESS ACTIVITIES
L PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL
M EDUCATION
N HEALTH AND SOCIAL WORK
O OTHER COMMUNITY, SOCIAL AND PERSONAL SERVICE ACTIVITIES
P PRIVATE HOUSEHOLDS EMPLOYING STAFF AND UNDIFFERENTIATED
Q EXTRA-TERRITORIAL ORGANISATION AND BODIES
22
Impact at SIC 2003 broad industry level
(provisional counts)
Section       Starting stock     In    Out   Net Change
A and B              167,000    0.5    0.6         -0.1
C, D and E           180,000    5.9    5.2          0.7
F                    260,000    1.4    0.9          0.5
G                    530,000    2.4    2.5         -0.1
H                    188,000    2.3    1.6          0.7
I                    116,000    2.7    2.4          0.3
J                     58,000    6.5    3.3          3.2
K                    872,000    1.2    1.3         -0.1
L                     29,000   10.4   11.1         -0.7
M, N and O           432,000    2.9    3.8         -0.9
23
SIC 2007
A Agriculture, Forestry And Fishing
B Mining And Quarrying
C Manufacture
D Electricity, Gas, Steam And Air Conditioning Supply
E Water Supply; Sewage, Waste Management And Remediation Activities
F Construction
G Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles
H Transportation And Storage
I Accommodation And Food Service Activities
J Information And Communication
K Financial And Insurance Activities
L Real Estate Activities
M Professional, Scientific And Technical Activities
N Administrative And Support Service Activities
O Public Administration And Defence; Compulsory Social Security
P Education
Q Human Health And Social Work Activities
R Arts, Entertainment And Recreation
S Other Service Activities
T Activities Of Households
U Activities Of Extraterritorial Organisations And Bodies
24
Correspondence between SIC 2003 and SIC 2007 for
local units coded by ACTR
25
Implementation timetable
December 2006   NACE published
January 2007    SIC 2007 is published on NS website
February 2007   Development and tuning of data coder (ACTR): first release on 2007 basis, subject to revision
June 2007       Re-coding using ACTR
August 2007     New release of ACTR, using SIC 2007 index
November 2007   SIC 2007 Index published (consistent with ACTR August 2007)
January 2008    SIC 2007 fully implemented on the Register
2008 ????       ACTR SIC 2003 overwrites historic SIC 2003
26
Conclusions
  • The ACTR tool delivers considerable savings in
    terms of cost and burden on businesses compared
    to traditional survey approaches.
  • The knowledge base is portable (i.e. independent
    of the coding engine), so it can be shared with
    any interested parties, e.g. administrative data
    suppliers, to increase the consistency of coding.
  • The use of bridging codes permits simultaneous
    coding to multiple classification systems,
    essential if periods of dual-coding are required.
  • The knowledge base approach can help to inform
    the development of future versions of a
    classification, by providing a reference frame of
    business activity descriptions.