Title: Implementing Coding Tools for a New Classification
1Implementing Coding Tools for a New Classification
- John Perry, UK Office for National Statistics
2Operation 2007 - The players
- In the UK: the Standard Industrial Classification of Economic Activities (SIC) (current version SIC (2003))
- In Europe: NACE, the Nomenclature générale des activités économiques dans les Communautés européennes (current version NACE Rev 1.1)
- In the UN: ISIC, the International Standard Industrial Classification of all Economic Activities (current version ISIC Rev 3.1)
3The UK SIC
- is a 5 digit classification system
- is required, by EU legislation, to be identical to NACE down to and including the 4 digit Class level
- contains a national 5th digit level which does not exist in NACE
4The Results: changes in structure
5ACTR as an aid to coding
- ACTR: Automatic Coding by Text Recognition
- Developed by Statistics Canada
- ONS standard tool for coding, initially industry and occupation
- Replaces Precision Data Coder for industry coding
- Determines a code from a text description
- Extent of automation of process is controlled by parameters
6Knowledge Bases: SIC2003
- ACTR relies heavily on indexes of standard descriptions
- Business descriptions from responses to the Business Register Survey
- Published index for the SIC2003
- The short descriptions for each SIC2003 code
- Standard descriptions for construction industry statistics
- Trade code descriptions for PAYE (Pay As You Earn Tax) employers
- Farm type descriptions
- With a total of more than 30,000 standard descriptions
7How ACTR works
- Each input description is converted to a standard form
- This is compared with the standard forms of descriptions held in the knowledge base
- The closeness is presented as a score between 0 and 10
- The system has rules to determine whether the score is sufficient to confirm a match
- Requires a score of more than 7.5 to code automatically (our setting, which may differ for other data sets)
- Lower scores are passed through interactive coding
- Coding does not depend on the order in which the knowledge bases are checked
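The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not describe ACTR's actual parsing rules or scoring formula, so the stop-word list, the crude plural stripping and the Dice word-overlap score below are hypothetical stand-ins, and the two small knowledge bases are toy examples built from descriptions quoted elsewhere in this presentation.

```python
import re

# Hypothetical stop-word list; ACTR's real parsing rules are not given in the slides.
STOP_WORDS = {"AND", "OF", "THE", "FOR"}
AUTO_THRESHOLD = 7.5  # the setting quoted on the slide; other data sets may differ


def standardise(text):
    """Upper-case, drop stop words and crudely strip a plural 's'."""
    words = re.findall(r"[A-Z]+", text.upper())
    return frozenset(
        w[:-1] if w.endswith("S") and len(w) > 3 else w
        for w in words
        if w not in STOP_WORDS
    )


def closeness(a, b):
    """Dice word overlap scaled to 0..10 (a stand-in, not ACTR's formula)."""
    if not a or not b:
        return 0.0
    return 10.0 * 2 * len(a & b) / (len(a) + len(b))


def code_description(text, knowledge_bases):
    """Return (score, index description, code, automatic?) for the best match."""
    query = standardise(text)
    # Pool every entry from every knowledge base before ranking, so the
    # order in which the knowledge bases are supplied does not matter.
    scored = [
        (closeness(query, standardise(description)), description, code)
        for base in knowledge_bases
        for description, code in base
    ]
    best_score, best_description, best_code = max(scored)
    return best_score, best_description, best_code, best_score > AUTO_THRESHOLD


# Toy knowledge bases (illustrative only)
published_index = [("Sales and service of horticultural machinery", "51880")]
headings = [("Wholesale of agricultural machinery and accessories", "51880")]
result = code_description("Horticultural services", [published_index, headings])
```

With this toy score, "Horticultural services" matches the horticultural machinery entry at about 6.67 out of 10, which is not the figure ACTR itself would report but falls below the 7.5 threshold in the same way, so the case would go to interactive coding.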
8Extract from Business Register Survey
Questionnaire
12ACTR Process
- Supplied text: Horticultural services
- HORTICULTURAL SERVICE
- Best fit index entry: Sales and service of horticultural machinery
- HORTICULTURAL MACHINERY SALE SERVICE
- Score is 6.911 (out of 10)
- ACTR prefers SIC 2003 code 51880 (Wholesale of agricultural machinery and accessories)
14Interactive coding
- Scores below 7.5 are passed to clerical staff for coding interactively
- The system presents options in descending order of score
- If none of the choices appears good, staff modify the description
- Once a decision is made, the person coding confirms the choice
- The index description is then held on the IDBR (Inter-Departmental Business Register).
15Introducing the SIC2007 (NACE Rev 2)
- New index files
- SIC2007 headings
- SIC2007 index
- Initially code forward from the SIC2003 using bridging codes: these are codes for each knowledge base entry that link the SIC2003 and SIC2007
- Later will change to code backwards from the SIC2007
- Eventually dual coding will cease
16Impact of ACTR on IDBR at Micro Level
- Existing SIC 2003 is 01120 (Growing of vegetables etc)
- The preferred ACTR SIC 2003 is 51880 (Wholesale of agricultural machinery and accessories)
- The SIC 2007 comes from the bridging code
- SIC 2003 51880
- Bridging code MTOLR
- SIC 2007 46610
- SIC 2003 code will change but only when agreed
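A bridging code can be thought of as a key that resolves to one code in each classification. The following is a minimal sketch assuming a simple lookup table. The slides do not say how bridging codes are stored, so the `BRIDGE` dictionary and the function names are hypothetical. The single entry is the example from this slide.

```python
# Bridging code -> (SIC 2003 code, SIC 2007 code); structure is an assumption.
BRIDGE = {
    "MTOLR": ("51880", "46610"),
}


def code_forward(sic2003, bridging_code):
    """Derive the SIC 2007 code from a SIC 2003 code and its bridging code."""
    old, new = BRIDGE[bridging_code]
    if old != sic2003:
        raise ValueError(f"bridging code {bridging_code} does not belong to {sic2003}")
    return new


def code_backward(sic2007, bridging_code):
    """The later arrangement: start from SIC 2007 and derive SIC 2003."""
    old, new = BRIDGE[bridging_code]
    if new != sic2007:
        raise ValueError(f"bridging code {bridging_code} does not belong to {sic2007}")
    return old


sic2007 = code_forward("51880", "MTOLR")
```

Because both codes hang off the same bridging code, a single coding decision yields both classifications, which is what makes a period of dual coding practical.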
17Conversion to SIC2007
- ACTR will deal with units that have a suitable business description
- Conversion tables will deal with:
  - Units with descriptions that ACTR is unable to code (vague descriptions)
  - Units without a description
  - Units supplied through administrative sources (existing VAT traders, PAYE employers, Registered Companies)
18Creation of Conversion Tables
- Tables have been created to convert units from SIC2003 to SIC2007
- Using ACTR bridging codes
- Coding existing data through ACTR
- Producing cross-tabulation of SIC2003 to SIC2007
- Allocating on a probability basis rounded to nearest 5
- Validate relationships against the acceptable range of industries
- Best fit tables also produced for users who cannot accommodate probability based conversion
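The table-building steps above might be sketched as follows. This assumes that "rounded to nearest 5" means percentages rounded to the nearest 5 percentage points, which the slide does not state explicitly, and it uses placeholder codes ("A", "X", "Y") rather than real SIC codes. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def cross_tabulate(pairs):
    """Count units for each (SIC2003, SIC2007) pair of dual-coded units."""
    table = defaultdict(Counter)
    for old, new in pairs:
        table[old][new] += 1
    return table


def round_to_5(percentage):
    """Round a non-negative percentage to the nearest multiple of 5."""
    return 5 * int(percentage / 5 + 0.5)


def probability_table(table):
    """For each SIC2003 code, the rounded percentage going to each SIC2007 code."""
    result = {}
    for old, targets in table.items():
        total = sum(targets.values())
        result[old] = {new: round_to_5(100 * n / total) for new, n in targets.items()}
    return result


def best_fit_table(table):
    """For each SIC2003 code, the single most frequent SIC2007 code."""
    return {old: targets.most_common(1)[0][0] for old, targets in table.items()}


# Placeholder data: three units with old code "A", two recoded "X" and one "Y"
pairs = [("A", "X"), ("A", "X"), ("A", "Y")]
table = cross_tabulate(pairs)
probabilities = probability_table(table)
best_fit = best_fit_table(table)
```

Rounded percentages need not always sum to 100, so a production table would presumably need an adjustment step. The validation against the acceptable range of industries is not modelled here.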
19Coding process
20Impact on the IDBR at the Macro Level
- Impact on SIC 2003 is only on those reporting units that have business descriptions for local units, where ACTR can code.
- ACTR codes: 620,000
- ACTR does not code: 210,000
- No business description: 340,000
- Administrative data only: 1,660,000
- Total local units: 2,830,000
- SIC 2007 comes from the bridging codes only where ACTR codes; otherwise SIC 2007 comes from conversion from SIC 2003
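As a quick check on these figures, the shares of local units in each group can be worked out directly from the counts on this slide. The variable names are illustrative only.

```python
# Local unit counts as given on the slide
local_units = {
    "ACTR codes": 620_000,
    "ACTR does not code": 210_000,
    "No business description": 340_000,
    "Administrative data only": 1_660_000,
}

total = sum(local_units.values())

# Percentage of all local units in each group, to one decimal place
shares = {group: round(100 * count / total, 1) for group, count in local_units.items()}

# Units whose SIC 2007 comes from the conversion tables rather than bridging codes
via_conversion = total - local_units["ACTR codes"]
via_conversion_share = round(100 * via_conversion / total, 1)
```

The four groups sum to the stated total of 2,830,000. About 21.9% of local units receive their SIC 2007 code through ACTR bridging codes, and the remaining 78.1% (2,210,000 units) rely on conversion from SIC 2003.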
21SIC 2003
- A AGRICULTURE, HUNTING AND FORESTRY
- B FISHING
- C MINING AND QUARRYING
- D MANUFACTURING
- E ELECTRICITY, GAS AND WATER SUPPLY
- F CONSTRUCTION
- G WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES
- H HOTELS AND RESTAURANTS
- I TRANSPORT, STORAGE AND COMMUNICATION
- J FINANCIAL INTERMEDIATION
- K REAL ESTATE, RENTING AND BUSINESS ACTIVITIES
- L PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL
- M EDUCATION
- N HEALTH AND SOCIAL WORK
- O OTHER COMMUNITY, SOCIAL AND PERSONAL SERVICE ACTIVITIES
- P PRIVATE HOUSEHOLDS EMPLOYING STAFF AND UNDIFFERENTIATED
- Q EXTRA-TERRITORIAL ORGANISATION AND BODIES
22Impact at SIC 2003 broad industry level
(provisional counts)
23SIC 2007
- A Agriculture, Forestry And Fishing
- B Mining And Quarrying
- C Manufacture
- D Electricity, Gas, Steam And Air Conditioning Supply
- E Water Supply; Sewage, Waste Management And Remediation Activities
- F Construction
- G Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles
- H Transportation And Storage
- I Accommodation And Food Service Activities
- J Information And Communication
- K Financial And Insurance Activities
- L Real Estate Activities
- M Professional, Scientific And Technical Activities
- N Administrative And Support Service Activities
- O Public Administration And Defence; Compulsory Social Security
- P Education
- Q Human Health And Social Work Activities
- R Arts, Entertainment And Recreation
- S Other Service Activities
- T Activities Of Households
- U Activities Of Extraterritorial Organisations And Bodies
24Correspondence between SIC 2003 and SIC 2007 for
local units coded by ACTR
25Implementation timetable
26Conclusions
- The ACTR tool delivers considerable savings in terms of cost and burden on businesses compared to traditional survey approaches.
- The knowledge base is portable (i.e. independent of the coding engine), so it can be shared with any interested parties, e.g. administrative data suppliers, to increase the consistency of coding.
- The use of bridging codes permits simultaneous coding to multiple classification systems, essential if periods of dual-coding are required.
- The knowledge base approach can help to inform the development of future versions of a classification, by providing a reference frame of business activity descriptions.