Title: Computer-Assisted Coding of Text CASCOT
1Computer-Assisted Coding of Text (CASCOT)
- Software demonstration
- Rob Jones and Peter Elias
2Structure of presentation
- Background: manual text coding
- Development of software: history, aims
- CASCOT demonstration
3Coding text to a classification
- Coding is the process of assigning each of the possible answers to one of a pre-defined set of categories.
- The full set of categories is termed a classification. Examples are:
- SOC 2000 (Standard Occupational Classification 2000)
- SIC 92 (Standard Industrial Classification 1992)
- Three parts to a classification: the structure, the index and the classification rules.
4Text responses in surveys
- Q: What is your job title?
- Q: Briefly describe your duties.
- Q: What does the organisation you work for mainly make or do?
5Manual coding procedures
- Manual methods
- code books
- temporary labour
- query resolution systems.
- No standardised approach; major variations between institutions, companies, etc. in quality of coding.
- Time-consuming, expensive.
6Development of software
- CASOC: Pascal/C text coding software for DOS, 1993-2001.
- CASCOT: Java text coding software for any operating system.
- CASOC was ad hoc development, funded from sales revenue.
- CASCOT funded by ESRC.
7Occupational coding in practice
- Quality of coding reflects quality of text available for coding.
- Need rules which specify how to deal with problems such as ambiguous job titles (e.g. engineer, teacher).
- Need to be aware that machine coding of text can introduce bias.
- Need to establish trade-off between accuracy and cost.
8Cascot
- Cascot will provide:
- A list of recommendations.
- Code, title, best matching index entry, and certainty score.
- Certainty Score
- Approximates the probability that the recommended code is correct.
- This is represented by a number in the range 0-100.
- People are never 100% right. A computer can't be 100% right either.
10Text Input Area
11civil engineer
Type job title
Press enter, or click Code button
12Recommendations Table
13Code
Score
Group Title
Index Entry
Recommendations Table
14Classification Structure
15Index Entries
16Output
17Best recommendation selected automatically.
Select another by clicking a different line.
18Structure and index entry list will change.
19And output has changed.
20Change selection via structure
21Index entry list will change.
22And output has changed.
23Reading from a file
- Instead of typing every job title in, we can read job titles from a file.
- File must be in an acceptable format.
24Reading from a file.
- Simplest file: each line is a job title.
- But how do we know which job title is for which person? (Solution: use a delimited file.)
25Reading from a file.
- Tab delimited file.
- Each line: Person ID, TAB, Job Title
26Reading from a file.
- Comma delimited file.
- Each line: Person ID, comma, Job Title
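
The two delimited layouts above can be illustrated with a short Python sketch. This is not Cascot's own reader (Cascot is a Java program and its internals are not shown in these slides); the function name `read_job_titles` and the sample person IDs are made up for illustration.

```python
import csv
import io


def read_job_titles(text, delimiter="\t"):
    """Yield (person_id, job_title) pairs from delimited text.

    Hypothetical helper: assumes the person ID is the first field
    and the job title is the second field on every line.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        yield row[0].strip(), row[1].strip()


# Made-up sample data in both layouts.
sample_tab = "1001\tcivil engineer\n1002\tteacher\n"
sample_csv = "1001,civil engineer\n1002,teacher\n"

tab_records = list(read_job_titles(sample_tab))
csv_records = list(read_job_titles(sample_csv, delimiter=","))
```

Keeping the person ID alongside the job title is what lets each code be matched back to the right respondent afterwards.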
27Recording codes from Cascot
- Rather than having to copy the code produced by Cascot, we can have Cascot record the codes to a file.
- Open an Output File.
- One line written for each piece of text coded.
28Output Items
- After coding we have the following facts:
- The text that was coded.
- The code it was given.
- The title for that code.
- The best matching index entry within that code.
- The score Cascot assigned the match.
- Each of these facts is an Output Item.
- We can choose which we wish to output (on the screen or to a file).
- Can also output items from the input file.
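
As a rough illustration of choosing Output Items, the sketch below writes one tab-delimited line per coded text, with an optional first row of column titles. The item names, the function `write_output` and the sample values (including the code `9999`) are placeholders invented for this example; they are not Cascot's file format or real classification codes.

```python
import csv
import io

# Hypothetical names for the five output items listed above.
ALL_ITEMS = ("text", "code", "title", "index_entry", "score")


def write_output(results, items, with_header=True):
    """Return tab-delimited output containing only the chosen items."""
    unknown = [i for i in items if i not in ALL_ITEMS]
    if unknown:
        raise ValueError(f"unknown output items: {unknown}")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    if with_header:
        writer.writerow(items)  # first row holds the column titles
    for result in results:
        writer.writerow([result[i] for i in items])
    return buf.getvalue()


# One made-up result; every value here is a placeholder.
results = [{
    "text": "civil engineer",
    "code": "9999",
    "title": "Example group title",
    "index_entry": "example index entry",
    "score": 87,
}]
output = write_output(results, ("text", "code", "score"))
```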
29Example Using Files
Input file (tab delimited).
30Example Using Files.
31Step 1 Open Input File.
32Select file, click open.
33Confirm / Select File Format.
34Choose selection options.
35 Click ok.
36Input File Details
First job title coded
37Step 2 Choose Output Items.
38Step 2 Choose Output Items.
Click Edit.
40Available Items
41Current Items
42Current Output
43To add score click Add
44Then, click OK
45Step 3 Open Output File.
47Select file, or type in name for new file.
Click Save
MyOutputFile.txt
48You will be asked if you wish to make the first
output row be column titles.
49Output File Details
50Select the preferred recommendation.
Or navigate to the correct code.
Once you are happy with the code - click 'Accept'
51The next job title appears. (Automatically
read from file after Accept)
52Select the preferred recommendation.
Or navigate to the correct code.
Once you are happy with the code - click 'Accept'
53If you don't know the code, or wish to defer coding to a more expert coder, click No Conclusion.
54Output set to zeros.
55The no conclusion output is not final until you
click Accept.
56Example Using Files
Input file (tab delimited).
57Example Using Files
Output file (Output items: Input Record, Code, Title, Score)
58A fully automated run.
- Rather than clicking Accept to agree to the best recommendation every time, we can automate the process.
- But how good is this?
- Example follows:
- Random sample of real data
- 1200 unique job titles
- Coded automatically, sorted by score.
62Skipping some pages ..
64Skipping some more pages ..
66Skipping some more pages ..
68Skipping some more pages ..
70Skipping some more pages ..
72Semi automatic coding.
- Job titles with high certainty scores tend to be right.
- Humans agree with Cascot for high scores.
- Job titles with low certainty scores tend to be wrong.
- We need human intelligence to decide the correct code when we have low certainty.
- Automatically agree to high scores but have a human decision for low scores.
- What score threshold?
- A small study by IER, University of Warwick, shows manual coders happy with 70-75 (some with 60).
- Balance between time (money) vs. quality.
- Best practice: sort input file alphabetically by job title.
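
The sorting step can be done before the file is handed to Cascot. A minimal Python sketch, using made-up records, is shown here. One plausible benefit (not stated on the slide) is that identical titles end up next to each other, so a human coder meets them together.

```python
# Made-up (person_id, job_title) records, in arbitrary order.
records = [
    ("3", "teacher"),
    ("1", "civil engineer"),
    ("2", "Teacher"),
]

# Sort alphabetically by job title, ignoring case.
# sorted() is stable, so equal titles keep their original order.
sorted_records = sorted(records, key=lambda rec: rec[1].casefold())
```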
73Automated and Assisted Modes
- Requires input and output files.
- Threshold level: certainty score.
- Assisted mode
- score below threshold: user prompted
- Fully Automatic mode
- score below threshold: no code/zeros written
- Set Automation using Options > Automation from the menu bar.
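
The difference between the two modes comes down to what happens when the score falls below the threshold. The Python sketch below is an illustration of that rule only, not Cascot's implementation; the function `decide`, the mode names and the `"0000"` placeholder (assuming a four-digit code) are assumptions made for this example.

```python
NO_CONCLUSION = "0000"  # assumed placeholder for "zeros written"


def decide(code, score, threshold, mode):
    """Return (action, code_to_write) for one recommendation.

    Scores at or above the threshold are accepted automatically.
    Below it, assisted mode asks the user and fully automatic mode
    writes the no-conclusion placeholder.
    """
    if score >= threshold:
        return "accept", code
    if mode == "assisted":
        return "prompt_user", None
    if mode == "automatic":
        return "no_conclusion", NO_CONCLUSION
    raise ValueError(f"unknown mode: {mode!r}")


# Example with a threshold of 70 and a placeholder code.
high = decide("9999", 85, 70, "automatic")
low_assisted = decide("9999", 40, 70, "assisted")
low_automatic = decide("9999", 40, 70, "automatic")
```

A lower threshold sends fewer titles to a human and saves time, at the cost of accepting more doubtful codes.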
74Using additional information to aid coding.
- Ambiguous job title.
- Coding manually: look at other questions.
- E.g. Q: Briefly describe your duties.
- Do the same with Cascot.
- But: the data must be present in the input file.
- Best if the input file is a delimited file.
75Teacher is ambiguous.
Click View Record Button
76This Information can be used to determine that we
want Secondary Teacher
77Click X to close.
78Now select Secondary Teachers
79And Accept to move on.